
  • An Erratum to this article was published on 07 November 2014

This article has been updated

Abstract

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A–selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intraplatform (Spearman rank R > 0.86) and interplatform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection among all platforms. For intact RNA, gene expression profiles from rRNA-depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.

Change history

  • 10 October 2014

    In the version of this article initially published, author Jeffrey Rosenfeld's middle initial “A” was omitted. The error has been corrected in the HTML and PDF versions of the article.

Accessions

Primary accessions

Gene Expression Omnibus

Acknowledgements

We greatly appreciate the contribution and distribution of reference sample RNA from L. Shi (FDA) and his valuable interactions to assist in the planning of this study. This work was supported with funding from the National Institutes of Health (NIH), including R01HG006798, R01NS076465, R24RR032341, as well as funds from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts and the STARR Consortium (I7-A765).

We thank the following contributors for their technical wisdom, including laboratory expertise, data analysis and bioinformatics contributions, and technical design guidance and consultation. Without their help, this study would not have been possible: D. Stopka (Memorial Sloan-Kettering Cancer Institute), G. Grove (Penn State Univ.), D. Hannon (Penn State Univ.), K. Jones (NIH/NCI/SAIC), C. Raley (NIH/NCI/SAIC), H. O'Geen (UC Davis), D. Zheng (Univ. Illinois-Urbana), O. Nguyen (UC Davis), Z.-W. Lu (UC Davis), J. Spisak (Cornell Univ.), D. Lin (NIH/NIAID), J. Pillardy (Cornell Univ.), P.-Y. Wu (Georgia Institute of Technology), J. Phan (Emory Univ.), D. Oschwald (New York Genome Center), H. Arnold (PerkinElmer), S. Tyndale (Univ. Southern California), H. Truong (Univ. Southern California), Y. Zhang (Univ. Florida), N. Panayotova (Univ. Florida), D. Moraga (Univ. Florida), S. Shanker (Univ. Florida), and N. Barker (US Army Environmental Quality Research Program).

We would also like to thank the platform vendors, Illumina, Life Technologies, Pacific Biosciences and Roche Life Sciences, for their support of this study, and their distinguished scientists for providing technical expertise and assistance in study designs, protocols, new methods development and significant contributions of reagents and sequencing kits. In particular, alphabetically by vendor: G. Schroth (Illumina); M. Gallad, J. Smith, T. Bittick, R. Setterquist and G. Scott (Life Technologies); J. Korlach, S. Turner and E. Tseng (Pacific Biosciences); and K. Fredrickson and C. Teiling (Roche Life Sciences).

We are sincerely appreciative of the Association of Biomolecular Resource Facilities (ABRF) for supporting this study and the contributing ABRF Research Groups. Special thanks to our ABRF executive board liaison A. Perera (Stowers Institute for Medical Research).

Author information

Author notes

    • Ryan Kim

    Present address: Korean Bioinformation Center (KOBIC), National Center for Biological Research Resource Information, Daejeon, South Korea.

    • Sheng Li & Scott W Tighe

    These authors contributed equally to this work.

Affiliations

  1. Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA.

    • Sheng Li, Sagar Chhangawala, Jorge Gandara, Paul Zumbo & Christopher E Mason

  2. The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medical College, New York, New York, USA.

    • Sheng Li, Sagar Chhangawala, Jorge Gandara, Paul Zumbo & Christopher E Mason

  3. Vermont Cancer Center, University of Vermont, Burlington, Vermont, USA.

    • Scott W Tighe

  4. Keck School of Medicine, University of Southern California, Los Angeles, California, USA.

    • Charles M Nicolet

  5. The Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, Pennsylvania, USA.

    • Deborah Grove

  6. HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA.

    • Shawn Levy & Cynthia Hendrickson

  7. Interdisciplinary Center for Biotechnology Research, University of Florida, Gainesville, Florida, USA.

    • William Farmerie

  8. Memorial Sloan-Kettering Cancer Institute, New York, New York, USA.

    • Agnes Viale

  9. Roy J. Carver Biotechnology Center, University of Illinois, Urbana, Illinois, USA.

    • Chris Wright

  10. Biotechnology Resource Center, Institute of Biotechnology, Cornell University, Ithaca, New York, USA.

    • Peter A Schweitzer & George S Grills

  11. Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland, USA.

    • Yuan Gao & Dewey Kim

  12. NIH/NCI/SAIC-Frederick, Gaithersburg, Maryland, USA.

    • Joe Boland, Belynda Hicks & David Roberson

  13. Genome Center, University of California, Davis, Davis, California, USA.

    • Ryan Kim

  14. Center for Genetic Medicine, Northwestern University, Chicago, Illinois, USA.

    • Nadereh Jafari

  15. NIH/NHLBI, Bethesda, Maryland, USA.

    • Nalini Raghavachari

  16. Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Starkville, Mississippi, USA.

    • Natàlia Garcia-Reyero

  17. Division of High Performance and Research Computing, University of Medicine and Dentistry of New Jersey, Newark, New Jersey, USA.

    • Jeffrey A Rosenfeld

  18. PerkinElmer Inc., Seattle, Washington, USA.

    • Todd Smith

  19. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Jason G Underwood

  20. Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA.

    • May Wang

  21. Pathonomics LLC, Philadelphia, Pennsylvania, USA.

    • Don A Baldwin

  22. The Feil Family Brain and Mind Research Institute, New York, New York, USA.

    • Christopher E Mason

Authors

  1. Sheng Li

  2. Scott W Tighe

  3. Charles M Nicolet

  4. Deborah Grove

  5. Shawn Levy

  6. William Farmerie

  7. Agnes Viale

  8. Chris Wright

  9. Peter A Schweitzer

  10. Yuan Gao

  11. Dewey Kim

  12. Joe Boland

  13. Belynda Hicks

  14. Ryan Kim

  15. Sagar Chhangawala

  16. Nadereh Jafari

  17. Nalini Raghavachari

  18. Jorge Gandara

  19. Natàlia Garcia-Reyero

  20. Cynthia Hendrickson

  21. David Roberson

  22. Jeffrey A Rosenfeld

  23. Todd Smith

  24. Jason G Underwood

  25. May Wang

  26. Paul Zumbo

  27. Don A Baldwin

  28. George S Grills

  29. Christopher E Mason

Contributions

All authors are members of the Association of Biomolecular Resource Facilities Next-Generation Sequencing (ABRF-NGS) Consortium. S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. managed the project. S.W.T., C.M.N., D.G., S.L., W.F., A.V., C.W., P.A.S., Y.G., D.K., J.B., B.H., R.K., N.J., N.R., J.G., N.G.-R., C.H., D.R., J.R., T.S., J.G.U., C.E.M. and P.Z. performed sequencing. S.L., S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. designed the analyses. S.L., P.A.S., J.G.U., P.Z., C.E.M. and D.K. performed the data analyses. S.L., P.Z., M.W., D.K., J.G.U. and C.E.M. made the figures. S.L., S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. wrote and revised the manuscript. The ABRF-NGS Consortium members contributed to the design and execution of the study.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Don A Baldwin, George S Grills or Christopher E Mason.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–39 and Supplementary Tables 1–8

Zip files

  1. Supplementary Software

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/nbt.2972
