Assessing the impact of exact reads on reducing
the error rate of read mapping

Given the growing availability of high-quality genome sequences, accurate and efficient reference-based assembly methods are in strong demand. Many algorithms have been designed for mapping reads onto a genome sequence, each trying to improve the accuracy of the reconstructed genome. One challenge in this problem arises when reads align to multiple locations because of repetitive regions in the genome.

Our goal is to decrease the error rate of the rebuilt genome. To this end, we reduce the search space for reads that align to the genome only with mismatches, insertions, or deletions, which lowers the probability of incorrect read mapping. We propose the EIM pipeline, divided into three steps: ExactMapping, InExactMapping, and MergingContigs, in which exact and inexact reads are aligned in two separate phases. The results show that two-step mapping of reads onto the contigs generated by a mapper such as Bowtie2, BWA, or Yara is effective in reducing the error rate of the contigs.
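The idea of aligning exact and inexact reads in two separate phases can be illustrated with a toy sketch. This is not the EIM implementation: the function names (`find_all`, `hamming`, `two_phase_map`), the `max_mismatches` parameter, the mismatch-only (Hamming) comparison, and the rule that the second phase skips windows already fully covered by exact reads are all assumptions made for illustration. The real pipeline works on mapper output and also handles insertions and deletions.

```python
def find_all(reference, read):
    """Return every start position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def two_phase_map(reference, reads, max_mismatches=1):
    """Toy two-phase mapping: exact reads first, then the remaining reads."""
    covered = [False] * len(reference)
    exact, leftover = {}, []

    # Phase 1 (hypothetical stand-in for ExactMapping): place reads that
    # match the reference without any mismatch and mark the bases they cover.
    for read in reads:
        hits = find_all(reference, read)
        if hits:
            exact[read] = hits
            for pos in hits:
                for i in range(pos, pos + len(read)):
                    covered[i] = True
        else:
            leftover.append(read)

    # Phase 2 (hypothetical stand-in for InExactMapping): search the
    # remaining reads with mismatches, skipping windows that exact reads
    # already cover completely. This skip rule is an assumed way of
    # shrinking the search space, not the rule used by the authors.
    inexact = {}
    for read in leftover:
        hits = []
        for pos in range(len(reference) - len(read) + 1):
            if all(covered[pos:pos + len(read)]):
                continue
            window = reference[pos:pos + len(read)]
            if hamming(read, window) <= max_mismatches:
                hits.append(pos)
        if hits:
            inexact[read] = hits
    return exact, inexact


if __name__ == "__main__":
    exact, inexact = two_phase_map("ACGTACGTTTGACCA", ["ACGT", "TTGAC", "GACGA"])
    print("exact:", exact)
    print("inexact:", inexact)
```

In this toy input, the read `ACGT` occurs twice in the reference, which mirrors the multi-location problem caused by repeats, while `GACGA` has no exact occurrence and is only considered in the second phase.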


Downloads

Our pipeline can be executed using the following source code:

Genome sequences derived from E. coli K12 and their corresponding simulated read sets:

Arabidopsis thaliana genome sequence derived from TAIR10 reference based on bur-0 strain variations: TAIR10-bur


Contact us

Fatemeh Zare-Mirakabad

Department of Mathematics and Computer Science
Amirkabir University of Technology, 424 Hafez Ave, Tehran, Iran
Telephone: +982164545674
Email:
f.zare@aut.ac.ir

Last modified: 8:21 PM 3/13/2018