• A Corrigendum to this article was published on 10 March 2014

This article has been updated

Abstract

RNA sequencing is an increasingly popular technology for genome-wide analysis of transcript sequence and abundance. However, understanding of the sources of technical and interlaboratory variation is still limited. To address this, the GEUVADIS consortium sequenced mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals in seven sequencing centers, with a large number of replicates. The variation between laboratories appeared to be considerably smaller than the already limited biological variation. Laboratory effects were mainly seen in differences in insert size and GC content and could be adequately corrected for. In small-RNA sequencing, the microRNA (miRNA) content differed widely between samples owing to competitive sequencing of rRNA fragments. This did not affect relative quantification of miRNAs. We conclude that distributing RNA sequencing among different laboratories is feasible, given proper standardization and randomization procedures. We provide a set of quality measures and guidelines for assessing technical biases in RNA-seq data.
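The observation that variable rRNA content need not disturb relative miRNA quantification can be illustrated with a minimal sketch. It assumes a simple per-sample scaling of miRNA read counts to counts per million miRNA reads. This scaling is an illustrative choice, not necessarily the normalization used by the consortium, and the function name, miRNA names and read counts below are all hypothetical.

```python
import math


def relative_abundance(counts):
    """Scale per-miRNA read counts to counts per million miRNA reads in one sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no miRNA reads")
    return {name: 1e6 * n / total for name, n in counts.items()}


# Hypothetical samples with the same underlying miRNA proportions.
# In the second sample, rRNA fragments are assumed to have taken up most of
# the sequencing capacity, so far fewer reads map to miRNAs overall.
sample_low_rrna = {"miR-A": 8000, "miR-B": 2000}
sample_high_rrna = {"miR-A": 800, "miR-B": 200}

rel_low = relative_abundance(sample_low_rrna)
rel_high = relative_abundance(sample_high_rrna)

# Relative abundances agree even though absolute miRNA read counts differ tenfold.
for name in rel_low:
    print(name, rel_low[name], rel_high[name])
```

Because each miRNA is expressed as a fraction of the miRNA reads in its own sample, reads lost to rRNA fragments change the depth of miRNA coverage but not these proportions, as long as the loss affects all miRNAs equally.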

Change history

  • 08 November 2013

    In the version of this article initially published, in the list of members of the GEUVADIS consortium, Stylianos E Antonorakis should have been Stylianos E Antonarakis. The error has been corrected in the HTML and PDF versions of the article.

Acknowledgements

This project was funded by the European Commission 7th Framework Program (FP7) (261123; GEUVADIS); the Swiss National Science Foundation (130326, 130342), the Louis Jeantet Foundation, and ERC (260927) (E.T.D.); NIH-NIMH (MH090941) (E.T.D., R.G.); Spanish Plan Nacional SAF2008–00357 (NOVADIS), the Generalitat de Catalunya AGAUR 2009 SGR-1502, and the Instituto de Salud Carlos III (FIS/FEDER PI11/00733) (X.E.); Spanish Plan Nacional (BIO2011-26205) and ERC (294653) (R.G.); ESGI, READNA (FP7 Health-F4-2008-201418), Spanish Ministry of Economy and Competitiveness (MINECO) and the Generalitat de Catalunya (I.G.G.); FP7/2007-2013, ENGAGE project, HEALTH-F4-2007-201413, and the Centre for Medical Systems Biology within the framework of The Netherlands Genomics Initiative (NGI)/Netherlands Organisation for Scientific Research (NWO) (P.A.C.t.H. & G.-J.B.v.O.); The Swedish Research Council (C0524801, A028001) and the Knut and Alice Wallenberg Foundation (2011.0073) (A.-C.S.); EMBO long-term fellowship ALTF 225-2011 (M.R.F.); Emil Aaltonen Foundation and Academy of Finland fellowships (T.L.). We acknowledge the SNP&SEQ Technology Platform in Uppsala for sequencing, and the Swedish National Infrastructure for Computing (SNIC-UPPMAX) for compute resources for data analysis. The authors would like to thank P.G.M. van Overveld for help with preparation of the figures.

Author information

Author notes

    • Michael Sammeth

    Present address: Bioinformatics Laboratory, National Laboratory of Scientific Computing, Petropolis, Rio de Janeiro, Brazil.

    • Marc R Friedländer & Jonas Almlöf

    These authors contributed equally to this work.

Affiliations

  1. Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands.

    • Peter A C 't Hoen, Irina Pulyakhina, Seyed Yahya Anvar, Jeroen F J Laros, Henk P J Buermans, Gert-Jan B van Ommen, Maarten van Iterson & Johan T den Dunnen

  2. Netherlands Bioinformatics Centre, Leiden, The Netherlands.

    • Peter A C 't Hoen, Jeroen F J Laros & Johan T den Dunnen

  3. Centre for Genomic Regulation (CRG), Barcelona, Catalonia, Spain.

    • Marc R Friedländer, Michael Sammeth, Xavier Estivill, Roderic Guigó, Jean Monlong, Esther Lizano, Gabrielle Bertier & Pedro G Ferreira

  4. Universitat Pompeu Fabra (UPF), Barcelona, Catalonia, Spain.

    • Marc R Friedländer, Michael Sammeth, Xavier Estivill, Roderic Guigó, Jean Monlong, Esther Lizano, Gabrielle Bertier & Pedro G Ferreira

  5. CRG Hospital del Mar Research Institute, Barcelona, Catalonia, Spain.

    • Marc R Friedländer, Michael Sammeth, Xavier Estivill, Roderic Guigó, Jean Monlong, Esther Lizano & Pedro G Ferreira

  6. CIBER in Epidemiology and Public Health (CIBERESP), Barcelona, Catalonia, Spain.

    • Marc R Friedländer, Xavier Estivill, Jean Monlong & Esther Lizano

  7. Department of Medical Sciences, Molecular Medicine and Science for Life Laboratory, Uppsala University, Uppsala, Sweden.

    • Jonas Almlöf, Olof Karlberg, Mathias Brännvall & Ann-Christine Syvänen

  8. Centro Nacional de Análisis Genómico (CNAG), Barcelona, Catalonia, Spain.

    • Michael Sammeth, Ivo G Gut, Paolo Ribeca, Thasso Griebel, Sergi Beltran, Marta Gut & Katja Kahlem

  9. Leiden Genome Technology Center, Leiden University Medical Center, Leiden, The Netherlands.

    • Seyed Yahya Anvar, Jeroen F J Laros, Henk P J Buermans & Johan T den Dunnen

  10. Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.

    • Emmanouil T Dermitzakis, Stylianos E Antonarakis, Tuuli Lappalainen, Thomas Giger, Halit Ongen, Ismael Padioleau & Helena Kilpinen

  11. Institute for Genetics and Genomics in Geneva (iG3), University of Geneva, Geneva, Switzerland.

    • Emmanouil T Dermitzakis, Stylianos E Antonarakis, Tuuli Lappalainen, Thomas Giger, Halit Ongen, Ismael Padioleau & Helena Kilpinen

  12. Swiss Institute of Bioinformatics, Geneva, Switzerland.

    • Emmanouil T Dermitzakis, Stylianos E Antonarakis, Tuuli Lappalainen, Thomas Giger, Halit Ongen, Ismael Padioleau & Helena Kilpinen

  13. European Bioinformatics Institute, Hinxton, United Kingdom.

    • Alvis Brazma, Paul Flicek, Mar Gonzàlez-Porta, Natalja Kurbatova, Andrew Tikhonov & Liliana Greger

  14. Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel, Kiel, Germany.

    • Stefan Schreiber, Philip Rosenstiel, Matthias Barann, Daniela Esser & Robert Häsler

  15. Institute of Human Genetics, Helmholtz Zentrum München, Neuherberg, Germany.

    • Thomas Meitinger, Tim M Strom, Thomas Wieland & Thomas Schwarzmayr

  16. Max Planck Institute for Molecular Genetics, Berlin, Germany.

    • Hans Lehrach, Ralf Sudbrak, Marc Sultan & Vyacheslav Amstislavskiy

  17. Fundacion Publica Galega de Medicina Xenomica SERGAS, Genomic Medicine Group CIBERER, Universidade de Santiago de Compostela, Santiago de Compostela, Spain.

    • Angel Carracedo

Consortia

  1. The GEUVADIS Consortium

    Full lists of members and affiliations appear at the end of the paper.

Authors

  1. Peter A C 't Hoen

  2. Marc R Friedländer

  3. Jonas Almlöf

  4. Michael Sammeth

  5. Irina Pulyakhina

  6. Seyed Yahya Anvar

  7. Jeroen F J Laros

  8. Henk P J Buermans

  9. Olof Karlberg

  10. Mathias Brännvall

  11. Johan T den Dunnen

  12. Gert-Jan B van Ommen

  13. Ivo G Gut

  14. Roderic Guigó

  15. Xavier Estivill

  16. Ann-Christine Syvänen

  17. Emmanouil T Dermitzakis

  18. Tuuli Lappalainen

Contributions

P.A.C.t.H., M.R.F., J.A., M.S., I.P., S.Y.A., J.F.J.L., H.P.J.B., M.B., O.K. and T.L. performed the analyses. P.A.C.t.H., A.-C.S., R.G., X.E., J.T.d.D., G.-J.B.v.O., I.G.G. and E.T.D. designed the study. E.T.D. and T.L. coordinated the study. P.A.C.t.H. drafted the manuscript, which was subsequently revised by all co-authors.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Peter A C 't Hoen or Tuuli Lappalainen.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–15, Supplementary Tables 1, 3, 5 and Supplementary Note

Text files

  1. Supplementary Text and Figures

    Supplementary Table 2

  2. Supplementary Text and Figures

    Supplementary Table 4

About this article

DOI

https://doi.org/10.1038/nbt.2702
