HISAT: a fast spliced aligner with low memory requirements

Nature Methods volume 12, pages 357–360 (2015)

Abstract

HISAT (hierarchical indexing for spliced alignment of transcripts) is a highly efficient system for aligning reads from RNA sequencing experiments. HISAT uses an indexing scheme based on the Burrows-Wheeler transform and the Ferragina-Manzini (FM) index, employing two types of indexes for alignment: a whole-genome FM index to anchor each alignment and numerous local FM indexes for very rapid extensions of these alignments. HISAT's hierarchical index for the human genome contains 48,000 local FM indexes, each representing a genomic region of 64,000 bp. Tests on real and simulated data sets showed that HISAT is the fastest system currently available, with equal or better accuracy than any other method. Despite its large number of indexes, HISAT requires only 4.3 gigabytes of memory. HISAT supports genomes of any size, including those larger than 4 billion bases.
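
The two-level scheme described above can be illustrated with a toy sketch. The code below is not HISAT's implementation: it builds a naive FM index by sorting suffixes, uses one such index over the whole sequence to place an anchor, and then searches the rest of the read only in the small local index that covers the anchor. The class `FMIndex`, the helpers `local_index_id`, `build_local_indexes` and `anchor_and_extend`, the toy sequence, and the 32-bp window are all assumptions made for illustration. The sketch also ignores any overlap between neighboring local regions and any handling of reads that cross a window boundary.

```python
# Toy sketch of a hierarchical FM index (hypothetical names, not HISAT's code).

LOCAL_SIZE = 64_000  # region size per local index, as stated in the abstract


class FMIndex:
    """Naive FM index built by sorting all suffixes (fine for toy inputs only)."""

    def __init__(self, text):
        s = text + "$"  # '$' is the end-of-text sentinel and sorts first
        self.sa = sorted(range(len(s)), key=lambda i: s[i:])
        self.bwt = "".join(s[i - 1] for i in self.sa)
        alphabet = sorted(set(self.bwt))
        # C[c]: number of characters in the text strictly smaller than c
        self.C, total = {}, 0
        for c in alphabet:
            self.C[c] = total
            total += self.bwt.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i]
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, ch in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (ch == c)

    def locate(self, pattern):
        """Backward search; returns sorted start positions of pattern."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return []
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


def local_index_id(pos, local_size=LOCAL_SIZE):
    """Which local index covers a genomic position (no overlap assumed)."""
    return pos // local_size


def build_local_indexes(genome, local_size):
    """One small FM index per fixed-size window of the genome."""
    return [FMIndex(genome[s:s + local_size])
            for s in range(0, len(genome), local_size)]


def anchor_and_extend(global_index, local_indexes, local_size, anchor, rest):
    """Place `anchor` with the global index, then look for `rest`
    downstream of it using only the local index covering the anchor."""
    results = []
    for a in global_index.locate(anchor):
        w = local_index_id(a, local_size)
        offset = w * local_size
        for r in local_indexes[w].locate(rest):
            if offset + r >= a + len(anchor):
                results.append((a, offset + r))
    return results


# Toy "genome": two exons separated by a 13-bp intron.
exon1, intron, exon2 = "AAAACCGGTCA", "GTAAGTTTTTCAG", "CTGACAAAA"
genome = exon1 + intron + exon2
window = 32  # toy window size, chosen so both exons fall in one window

global_index = FMIndex(genome)
local_indexes = build_local_indexes(genome, window)

# A spliced read: the end of exon1 followed by the start of exon2.
print(anchor_and_extend(global_index, local_indexes, window,
                        "CCGGTCA", "CTGAC"))  # [(4, 24)]
```

With the figures given in the abstract, 48,000 regions of 64,000 bp cover about 3.07 billion bases. In the sketch, the second search touches only a 32-character index rather than the whole sequence, which is the intuition behind using small local indexes for extension.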

Accessions

Gene Expression Omnibus

Acknowledgements

We thank G. Pertea and L. Song for their invaluable contributions to our discussions on HISAT. We also thank C. Trapnell for the use of his TuxSim simulation program. This work was supported in part by the National Human Genome Research Institute (US National Institutes of Health) under grants R01-HG006102 and R01-HG006677 to S.L.S.

Author information

Affiliations

  1. Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

    • Daehwan Kim, Ben Langmead & Steven L Salzberg
  2. Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, Maryland, USA.

    • Daehwan Kim, Ben Langmead & Steven L Salzberg
  3. Department of Computer Science, Johns Hopkins University, Baltimore, Maryland, USA.

    • Ben Langmead & Steven L Salzberg

Contributions

D.K., B.L. and S.L.S. performed the analysis and discussed the results of HISAT. D.K. implemented HISAT. D.K., B.L. and S.L.S. wrote the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Daehwan Kim, Ben Langmead or Steven L Salzberg.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–11, Supplementary Tables 1–7 and Supplementary Note

About this article

DOI

https://doi.org/10.1038/nmeth.3317
