With the growing availability of high-quality reference genome sequences, accurate and efficient reference-based assembly methods are in strong demand. Many algorithms have been designed to map reads onto a genome sequence with the aim of improving the accuracy of the reconstructed genome. One challenge in this setting is that some reads align to multiple locations because of repetitive regions in the genome.
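To make the repeat problem concrete, the following minimal Python sketch (our own illustration, not part of EIM) lists every position where a read matches a toy genome exactly. The toy genome contains the repeat ACGTACGT twice, so the read has two equally good placements and a mapper has no sequence evidence to choose between them. The genome, the read, and the `exact_hits` function are hypothetical.

```python
def exact_hits(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `genome`,
    including overlapping occurrences."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        # Restart one base later so overlapping occurrences are also found.
        start = genome.find(read, start + 1)
    return hits


# Toy genome containing the repeat "ACGTACGT" twice.
genome = "TTACGTACGTGGCCACGTACGTAA"
read = "ACGTACGT"
print(exact_hits(genome, read))  # → [2, 14]
```

A read with more than one hit is a multi-mapping read. Placing it arbitrarily at one of the hits is one way errors enter a reference-based assembly.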
Our goal is to reduce the error rate of the rebuilt genome. To this end, we reduce the search space for reads that align to the genome only with mismatches, insertions, or deletions, which lowers the probability of mapping them incorrectly. We propose the EIM pipeline, which consists of three steps: ExactMapping, InExactMapping, and MergingContigs, so that reads that match exactly and reads that match inexactly are aligned in two separate phases. The results show that this two-step mapping of reads onto the contigs generated by a mapper such as Bowtie2, BWA, or Yara reduces the error rate of the contigs.
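The two-phase idea can be illustrated with a toy Python sketch. This is not the EIM implementation, which relies on real mappers such as Bowtie2, BWA, or Yara. The sketch makes several assumptions of its own: reads differ from the reference by substitutions only (no insertions or deletions); mapping is done by brute-force scanning; the "reduced search space" is modelled by letting inexact reads try only start positions whose span touches a base left uncovered by the exact phase; and merging is a simple per-position majority vote. All function names (`exact_phase`, `inexact_phase`, `merge_contigs`, `two_phase_assembly`) are hypothetical.

```python
from collections import Counter


def exact_phase(reference, reads):
    """Hypothetical ExactMapping analogue: place reads that occur exactly once
    in the reference; defer reads with zero hits or several (repeat) hits."""
    placed, deferred = [], []
    for read in reads:
        hits = [i for i in range(len(reference) - len(read) + 1)
                if reference[i:i + len(read)] == read]
        if len(hits) == 1:
            placed.append((hits[0], read))
        else:
            deferred.append(read)
    return placed, deferred


def coverage(length, placements):
    """Mark reference positions covered by at least one placed read."""
    covered = [False] * length
    for pos, read in placements:
        for j in range(pos, pos + len(read)):
            covered[j] = True
    return covered


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def inexact_phase(reference, reads, covered, max_mismatches=1):
    """Hypothetical InExactMapping analogue: allow up to `max_mismatches`
    substitutions, but only at start positions whose span includes at least
    one base not covered by the exact phase (the assumed restricted space)."""
    placed = []
    for read in reads:
        n = len(read)
        candidates = [i for i in range(len(reference) - n + 1)
                      if not all(covered[i:i + n])]
        scored = [(hamming(reference[i:i + n], read), i) for i in candidates]
        scored = [(d, i) for d, i in scored if d <= max_mismatches]
        if not scored:
            continue  # no acceptable placement: leave the read unmapped
        best = min(d for d, _ in scored)
        best_positions = [i for d, i in scored if d == best]
        if len(best_positions) == 1:  # keep only unambiguous placements
            placed.append((best_positions[0], read))
    return placed


def merge_contigs(length, placements):
    """Merging sketch: majority vote per position; an uncovered position
    ends the current contig."""
    columns = [Counter() for _ in range(length)]
    for pos, read in placements:
        for offset, base in enumerate(read):
            columns[pos + offset][base] += 1
    contigs, current = [], []
    for column in columns:
        if column:
            current.append(column.most_common(1)[0][0])
        elif current:
            contigs.append("".join(current))
            current = []
    if current:
        contigs.append("".join(current))
    return contigs


def two_phase_assembly(reference, reads, max_mismatches=1):
    """Run the exact phase, then the restricted inexact phase, then merge."""
    exact, deferred = exact_phase(reference, reads)
    covered = coverage(len(reference), exact)
    inexact = inexact_phase(reference, deferred, covered, max_mismatches)
    return merge_contigs(len(reference), exact + inexact)


# Toy sample differing from the reference by one substitution (G -> C at position 8).
reference = "GATTACACGTTAGC"
reads = ["GATTA", "TACAC", "ACCTT", "TTAGC"]
print(two_phase_assembly(reference, reads))  # → ['GATTACACCTTAGC']
```

In this example the read ACCTT has no exact hit, so it is deferred. The exact phase leaves only position 8 uncovered, so the inexact phase tests just the five start positions whose span includes it, rather than every position in the reference. This is the intuition behind shrinking the search space for inexact reads.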
Our pipeline can be run using the following source code:
Genome sequences derived from E. coli K12 and their corresponding simulated read sets:
Arabidopsis thaliana genome sequence derived from the TAIR10 reference based on variations of the Bur-0 strain: TAIR10-bur
Fatemeh Zare-Mirakabad
Department of Mathematics and Computer Science
Amirkabir University of Technology, 424 Hafez Ave, Tehran, Iran
Telephone: +982164545674
Email: f.zare@aut.ac.ir
Last modified: 8:21 PM 3/13/2018