Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories

A Corrigendum to this article was published on 10 March 2014

This article has been updated

Abstract

RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.
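The point about small-RNA sequencing can be illustrated with a little arithmetic. The sketch below is not the consortium's pipeline: the function names and the toy read counts are hypothetical, chosen only to show that when miRNA counts are scaled by the total miRNA reads in a sample, a varying share of rRNA fragments changes the miRNA content but not the relative abundance of each miRNA.

```python
def mirna_fraction(mirna_counts, rrna_reads):
    """Fraction of all reads in a sample that come from miRNAs."""
    mirna_total = sum(mirna_counts.values())
    return mirna_total / (mirna_total + rrna_reads)


def relative_abundance(mirna_counts):
    """Scale each miRNA count to counts per million miRNA reads."""
    mirna_total = sum(mirna_counts.values())
    return {name: 1e6 * count / mirna_total
            for name, count in mirna_counts.items()}


# Hypothetical toy samples with the same sequencing depth (20,000 reads)
# but very different amounts of competing rRNA fragments.
sample_a = {"miR-1": 8000, "miR-2": 2000}   # with 10,000 rRNA reads
sample_b = {"miR-1": 800, "miR-2": 200}     # with 19,000 rRNA reads

fraction_a = mirna_fraction(sample_a, rrna_reads=10000)
fraction_b = mirna_fraction(sample_b, rrna_reads=19000)
abundance_a = relative_abundance(sample_a)
abundance_b = relative_abundance(sample_b)
```

In this toy case the miRNA fraction differs tenfold between the two samples, while the per-miRNA values scaled to total miRNA reads are identical. Scaling by all sequenced reads instead would make every miRNA in the second sample look ten times less abundant.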

Figure 1: Basic quality statistics in mRNA sequencing across laboratories.
Figure 2: Detection of outliers in mRNA sequencing.
Figure 3: Sources of variation in mRNA expression levels.
Figure 4: Modeling of hidden confounding factors with PEER effectively removes biases in mRNA-seq data.
Figure 5: Basic quality statistics in sRNA sequencing across laboratories.
Figure 6: sRNA heterogeneity does not disturb quantification of individual miRNAs.

Accession codes

Primary accessions

ArrayExpress

Change history

  • 08 November 2013

    In the version of this article initially published, in the list of members of the GEUVADIS consortium, Stylianos E Antonorakis should have been Stylianos E Antonarakis. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

This project was funded by the European Commission 7th Framework Program (FP7) (261123; GEUVADIS); the Swiss National Science Foundation (130326, 130342), the Louis Jeantet Foundation, and ERC (260927) (E.T.D.); NIH-NIMH (MH090941) (E.T.D., R.G.); Spanish Plan Nacional SAF2008–00357 (NOVADIS), the Generalitat de Catalunya AGAUR 2009 SGR-1502, and the Instituto de Salud Carlos III (FIS/FEDER PI11/00733) (X.E.); Spanish Plan Nacional (BIO2011-26205) and ERC (294653) (R.G.); ESGI, READNA (FP7 Health-F4-2008-201418), Spanish Ministry of Economy and Competitiveness (MINECO) and the Generalitat de Catalunya (I.G.G.); FP7/2007-2013, ENGAGE project, HEALTH-F4-2007-201413, and the Centre for Medical Systems Biology within the framework of The Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) (P.A.C.t.H. & G.-J.B.v.O.); The Swedish Research Council (C0524801, A028001) and the Knut and Alice Wallenberg Foundation (2011.0073) (A.-C.S.); EMBO long-term fellowship ALTF 225-2011 (M.R.F.); Emil Aaltonen Foundation and Academy of Finland fellowships (T.L.). We acknowledge the SNP&SEQ Technology Platform in Uppsala for sequencing, and the Swedish National Infrastructure for Computing (SNIC-UPPMAX) for compute resources for data analysis. The authors would like to thank P.G.M. van Overveld for help with preparation of the figures.

Author information

Contributions

P.A.C.t.H., M.R.F., J.A., M.S., I.P., S.Y.A., J.F.J.L., H.P.J.B., M.B., O.K. and T.L. performed the analyses. P.A.C.t.H., A.-C.S., R.G., X.E., J.T.d.D., G.-J.B.v.O., I.G.G. and E.T.D. designed the study. E.T.D. and T.L. coordinated the study. P.A.C.t.H. drafted the manuscript, which was subsequently revised by all co-authors.

Corresponding authors

Correspondence to Peter A C 't Hoen or Tuuli Lappalainen.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Full lists of members and affiliations appear at the end of the paper.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1, 3, 5 and Supplementary Note (PDF 1862 kb)

Supplementary Table 2 (TXT 4459 kb)

Supplementary Table 4 (TXT 1000 kb)

About this article

Cite this article

't Hoen, P., Friedländer, M., Almlöf, J. et al. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat Biotechnol 31, 1015–1022 (2013). https://doi.org/10.1038/nbt.2702

  • DOI: https://doi.org/10.1038/nbt.2702
