Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study

  • An Erratum to this article was published on 07 November 2014

Abstract

High-throughput RNA sequencing (RNA-seq) greatly expands the potential for genomics discoveries, but the wide variety of platforms, protocols and performance capabilities has created the need for comprehensive reference data. Here we describe the Association of Biomolecular Resource Facilities next-generation sequencing (ABRF-NGS) study on RNA-seq. We carried out replicate experiments across 15 laboratory sites using reference RNA standards to test four protocols (poly-A–selected, ribo-depleted, size-selected and degraded) on five sequencing platforms (Illumina HiSeq, Life Technologies PGM and Proton, Pacific Biosciences RS and Roche 454). The results show high intra-platform (Spearman rank R > 0.86) and inter-platform (R > 0.83) concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection among all platforms. For intact RNA, gene expression profiles from rRNA depletion and poly-A enrichment are similar. In addition, rRNA depletion enables effective analysis of degraded RNA samples. This study provides a broad foundation for cross-platform standardization, evaluation and improvement of RNA-seq.
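
As a minimal illustration of the concordance statistic quoted in the abstract, the sketch below computes a Spearman rank correlation between two expression vectors. It is not the study's analysis code: the function names `average_ranks` and `spearman` and the read counts for the two sites are hypothetical, chosen only to show how a rank-based R value is obtained.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


# Hypothetical read counts for the same five genes measured at two sites.
site_a = [120, 15, 0, 980, 47]
site_b = [100, 22, 1, 1500, 130]

rho = spearman(site_a, site_b)
print(f"Spearman R = {rho:.2f}")
```

Because the statistic depends only on ranks, it is insensitive to differences in sequencing depth or scale between sites, which is one reason it suits cross-platform comparisons of expression measures.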

Figure 1: Experimental design and sequencing platforms.
Figure 2: Transcript coverage across all genes detected.
Figure 3: Intra- and inter-platform variation of RNA-seq transcript metrics.
Figure 4: Inter-platform consistency of splicing and differential expression analysis.
Figure 5: Differentially expressed genes in ribo-depleted and poly-A–enriched libraries.

Accession codes

Primary accessions

Gene Expression Omnibus

Change history

  • 10 October 2014

    In the version of this article initially published, author Jeffrey Rosenfeld's middle initial “A” was omitted. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

We greatly appreciate the contribution and distribution of reference sample RNA from L. Shi (FDA) and his valuable interactions to assist in the planning of this study. This work was supported with funding from the National Institutes of Health (NIH), including R01HG006798, R01NS076465, R24RR032341, as well as funds from the Irma T. Hirschl and Monique Weill-Caulier Charitable Trusts and the STARR Consortium (I7-A765).

We thank the following contributors for their technical wisdom, including laboratory expertise, data analysis and bioinformatics contributions, and technical design guidance and consultation. Without their help, this study would not have been possible: D. Stopka (Memorial Sloan-Kettering Cancer Institute), G. Grove (Penn State Univ.), D. Hannon (Penn State Univ.), K. Jones (NIH/NCI/SAIC), C. Raley (NIH/NCI/SAIC), H. O'Geen (UC Davis), D. Zheng (Univ. Illinois-Urbana), O. Nguyen (UC Davis), Z.-W. Lu (UC Davis), J. Spisak (Cornell Univ.), D. Lin (NIH/NIAID), J. Pillardy (Cornell Univ.), P.-Y. Wu (Georgia Institute of Technology), J. Phan (Emory Univ.), D. Oschwald (New York Genome Center), H. Arnold (PerkinElmer), S. Tyndale (Univ. Southern California), H. Truong (Univ. Southern California), Y. Zhang (Univ. Florida), N. Panayotova (Univ. Florida), D. Moraga (Univ. Florida), S. Shanker (Univ. Florida), and N. Barker (US Army Environmental Quality Research Program).

We would also like to thank the platform vendors, Illumina, Life Technologies, Pacific Biosciences and Roche Life Sciences, for their support of this study, and their distinguished scientists for providing technical expertise and assistance in study designs, protocols, new methods development and significant contributions of reagents and sequencing kits. In particular, alphabetically by vendor: G. Schroth (Illumina); M. Gallad, J. Smith, T. Bittick, R. Setterquist and G. Scott (Life Technologies); J. Korlach, S. Turner and E. Tseng (Pacific Biosciences); and K. Fredrickson and C. Teiling (Roche Life Sciences).

We are sincerely appreciative of the Association of Biomolecular Resource Facilities (ABRF) for supporting this study and the contributing ABRF Research Groups. Special thanks to our ABRF executive board liaison A. Perera (Stowers Institute for Medical Research).

Author information

Contributions

All authors are members of the Association of Biomolecular Resource Facilities Next-Generation Sequencing (ABRF-NGS) Consortium. S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. managed the project. S.W.T., C.M.N., D.G., S.L., W.F., A.V., C.W., P.A.S., Y.G., D.K., J.B., B.H., R.K., N.J., N.R., J.G., N.G.-R., C.H., D.R., J.R., T.S., J.G.U., C.E.M. and P.Z. performed sequencing. S.L., S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. designed the analyses. S.L., P.A.S., J.G.U., P.Z., C.E.M. and D.K. performed the data analyses. S.L., P.Z., M.W., D.K., J.G.U. and C.E.M. made the figures. S.L., S.W.T., C.M.N., D.A.B., G.S.G. and C.E.M. wrote and revised the manuscript. The ABRF-NGS Consortium members contributed to the design and execution of the study.

Corresponding authors

Correspondence to Don A. Baldwin, George S. Grills or Christopher E. Mason.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–39 and Supplementary Tables 1–8 (PDF 10745 kb)

Supplementary Software (ZIP 168 kb)

About this article

Cite this article

Li, S., Tighe, S., Nicolet, C. et al. Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study. Nat Biotechnol 32, 915–925 (2014). https://doi.org/10.1038/nbt.2972
