Abstract
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10 Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.
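The contrast between accurate relative measurements and inaccurate absolute measurements can be illustrated with a toy model. The sketch below is not the SEQC analysis pipeline. It assumes a purely multiplicative, gene-specific bias that is identical in both samples (gene names, abundances and bias values are all hypothetical) and shows that such a bias distorts absolute levels but cancels in a between-sample log2 ratio. Real platform biases need not follow this simple model.

```python
import math

# Hypothetical true abundances for three genes in two samples (arbitrary units).
true_a = {"gene1": 100.0, "gene2": 20.0, "gene3": 5.0}
true_b = {"gene1": 50.0, "gene2": 40.0, "gene3": 5.0}

# Hypothetical gene-specific multiplicative biases (assumed values), applied
# identically to both samples, e.g. differing capture or amplification efficiency.
bias = {"gene1": 2.0, "gene2": 0.5, "gene3": 3.0}


def measure(true_levels, gene_bias):
    """Assumed measurement model: observed = bias * true, per gene."""
    return {g: gene_bias[g] * level for g, level in true_levels.items()}


def log2_ratio(x, y):
    """Relative expression: per-gene log2 ratio of sample x over sample y."""
    return {g: math.log2(x[g] / y[g]) for g in x}


obs_a = measure(true_a, bias)
obs_b = measure(true_b, bias)

true_lfc = log2_ratio(true_a, true_b)
obs_lfc = log2_ratio(obs_a, obs_b)

for g in true_a:
    # Absolute levels differ from the truth, but the log2 ratios agree.
    print(g, "observed A:", obs_a[g], "true A:", true_a[g],
          "observed log2 ratio:", obs_lfc[g], "true log2 ratio:", true_lfc[g])
```

Under this assumed model the bias term appears in both numerator and denominator of each ratio and divides out, whereas any absolute estimate inherits the bias directly.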
Accession codes
Change history
09 September 2014
In the version of this article initially published online, the superscript 95 for the footnote for “these authors contributed equally to this work” was omitted for the first three authors. The error has been corrected for the print, PDF and HTML versions of this article.
Acknowledgements
All SEQC (MAQC-III) participants freely donated their time and reagents for the completion and analyses of the project. Many participants contributed to the sometimes-heated discussions on the topic of this paper during numerous e-mail exchanges, teleconferences and face-to-face project meetings. The common conclusions and recommendations reported in this paper evolved from this extended discourse. The authors gratefully acknowledge support by the National Center for Biotechnology Information (NCBI)'s Supercomputing Center, the FDA's Supercomputing Center, China's National Supercomputing Center of Tianjin, the Vienna Scientific Cluster High Performance Computing Facility (VSC), the Vienna Science and Technology Fund (WWTF), Baxter, the Austrian Institute of Technology, and the Austrian Centre of Biopharmaceutical Technology. This work was supported in part by China's Program of Global Experts. This work was supported in part by the US National Institutes of Health (NIH) grants R01CA163256, R01HG006798, R01NS076465, R44HG005297, U54CA119338, P01HG00205, R24GM102656 and the Intramural Research Program of the NIH, National Library of Medicine, National Institute of Environmental Health Sciences (NIEHS) Z01 ES102345-04, Shriners Research Grant 85500, an Australia National Health and Medical Research Council (NH&MRC) Project grant (1023454) and Victorian State Government Operational Infrastructure Support (Australia), the National 973 Key Basic Research Program of China (2010CB945401), the National Natural Science Foundation of China (31240038 and 31071162), and the Science and Technology Commission of Shanghai Municipality (11DZ2260300). We greatly appreciate SAS Institute, Inc. for kindly hosting several face-to-face meetings of the SEQC (MAQC-III) project.
Author information
Contributions
Project coordination: US Food and Drug Administration.
Project lead: Weida Tong & Leming Shi.
Manuscript lead: David P. Kreil.
Scientific management: David P. Kreil, Christopher E. Mason, Weida Tong & Leming Shi.
Next-generation sequencing technology lead: Christopher E. Mason.
The following authors contributed to project leadership: Zhenqiang Su, Paweł P. Łabaj, Sheng Li, Jean Thierry-Mieg, Danielle Thierry-Mieg, Wei Shi, Charles Wang, Gary P. Schroth, Robert A. Setterquist, John F. Thompson, Wendell D. Jones, Wenzhong Xiao, Weihong Xu, Roderick V. Jensen, Reagan Kelly, Joshua Xu, Ana Conesa, Cesare Furlanello, Hanlin Gao, Huixiao Hong, Nadereh Jafari, Stan Letovsky, Yang Liao, Fei Lu, Edward J. Oakeley, Zhiyu Peng, Craig A. Praul, Javier Santoyo-Lopez, Andreas Scherer, Tieliu Shi, Gordon K. Smyth, Frank Staedtler, Peter Sykacek, Xin-Xing Tan, E. Aubrey Thompson, Jo Vandesompele, May D. Wang, Jian Wang, Russell D. Wolfinger, Jiri Zavadil, Weida Tong, David P. Kreil, Christopher E. Mason & Leming Shi.
The following authors contributed equally to this work: Zhenqiang Su, Paweł P. Łabaj & Sheng Li.
Ethics declarations
Competing interests
Some of the SEQC (MAQC-III) Consortium members are employed by companies that provide services or manufacture products or equipment related to gene expression profiling, as can be seen from the affiliations provided by the manuscript authors.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–46, Supplementary Tables 1–15 and Supplementary Notes (PDF 24302 kb)
Supplementary Data 1
RNA-seq read coverage flanking all 250 candidate junctions considered for validation. (TXT 870 kb)
Supplementary Data 2
Employed qPCR primer sequences, qPCR results and expression level estimates, as well as the corresponding RNA-seq expression level estimates for the 173 performed assays. (XLS 121 kb)
Supplementary Data 3
Supplementary Data 3 (ZIP 38371 kb)
About this article
Cite this article
SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat Biotechnol 32, 903–914 (2014). https://doi.org/10.1038/nbt.2957
DOI: https://doi.org/10.1038/nbt.2957