Alignment is the first step in most RNA-seq analysis pipelines, and the accuracy of downstream analyses depends heavily on it. Unlike most steps in the pipeline, alignment is particularly amenable to benchmarking with simulated data. We comprehensively benchmarked 14 common splice-aware aligners for base-, read-, and exon junction-level accuracy and compared default with optimized parameters. We found that performance varied by genome complexity, and that accuracy and popularity were poorly correlated. The most widely cited tool underperformed on most metrics, particularly when using default settings.
We thank A. Srinivasan for his help administrating the PMACS cluster. We thank N. Lahens, T. Grosser, D. Sarantopoulou, F. Coldren, E. Scarci, and E. Ricciotti for support and helpful discussions. This work was funded in part by the National Heart Lung and Blood Institute (U54HL117798, G.A.F.) and The National Center for Advancing Translational Sciences (UL1-TR-001878, G.A.F.).
The authors declare no competing financial interests.
Supplementary Figures 1–15, Supplementary Notes 1–10 and Supplementary Tables 1–43. (PDF 6431 kb)
Information about the tools involved in the comparison. (XLSX 54 kb)
Statistics and accuracy metrics of tweaked alignment on Human. (XLSX 59 kb)
Statistics and accuracy metrics of tweaked alignment on Malaria. (XLSX 1629 kb)
Statistics and accuracy metrics of default alignment on Human and Malaria (latest tool versions). (XLSX 65 kb)
Statistics and accuracy metrics of default alignment on Human. (XLSX 17 kb)
Statistics and accuracy metrics of default alignment on Malaria. (XLSX 17 kb)
Statistics and accuracy metrics achieved by the best tweaked alignment on Human. (XLSX 26 kb)
Statistics and accuracy metrics achieved by the best tweaked alignment on Malaria. (XLSX 72 kb)
Statistics and accuracy metrics of default alignment on Human including/omitting annotation. (XLSX 50 kb)
Statistics and accuracy metrics of default alignment on Malaria including/omitting annotation. (XLSX 31 kb)
Computational performance metrics of default alignment on Human. (XLS 78 kb)
Computational performance metrics of default alignment on Malaria. (XLS 78 kb)
Statistics and accuracy metrics of short anchored reads alignment on Human. (XLSX 310 kb)
Statistics and accuracy metrics of simulated adapters alignment on Human. (XLSX 313 kb)
Statistics and accuracy metrics of canonical and noncanonical junctions on Human. (XLSX 148 kb)
All scripts used in this analysis. (ZIP 3592 kb)
About this article
Cite this article
Baruzzo, G., Hayer, K., Kim, E. et al. Simulation-based comprehensive benchmarking of RNA-seq aligners. Nat Methods 14, 135–139 (2017). https://doi.org/10.1038/nmeth.4106