Article

Spliced synthetic genes as internal controls in RNA sequencing experiments

Nature Methods volume 13, pages 792–798 (2016)

Abstract

RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
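
As a rough illustration of how spike-ins added at known concentrations can serve as a quantitative reference, the sketch below fits measured abundance against known input on a log2 scale and derives a between-sample scaling factor from the spike-ins shared by two samples. It is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the function names `fit_log_log` and `scaling_factor`, the median-ratio approach, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def fit_log_log(known_conc, measured):
    """Least-squares fit of log2(measured) against log2(known concentration).

    Returns (slope, intercept, r_squared). Pairs containing a non-positive
    value are skipped because their logarithm is undefined (for example,
    spike-ins that were not detected).
    """
    pairs = [(math.log2(c), math.log2(m))
             for c, m in zip(known_conc, measured) if c > 0 and m > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two detected spike-ins")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0:
        raise ValueError("known concentrations must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = 1.0 if syy == 0 else (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def scaling_factor(spike_a, spike_b):
    """Median ratio (sample A / sample B) over spike-ins detected in both.

    Assumption: the same spike-in mixture was added to both samples in equal
    amounts, so multiplying sample B's measurements by this factor places
    them on sample A's scale.
    """
    ratios = [a / b for a, b in zip(spike_a, spike_b) if a > 0 and b > 0]
    if not ratios:
        raise ValueError("no spike-ins detected in both samples")
    return statistics.median(ratios)


if __name__ == "__main__":
    # Toy values invented for illustration only; not data from the study.
    known = [1, 2, 4, 8]            # relative input concentrations
    sample_a = [10, 20, 40, 80]     # measured abundance, sample A
    sample_b = [5, 10, 20, 40]      # measured abundance, sample B
    print(fit_log_log(known, sample_a))
    print(scaling_factor(sample_a, sample_b))
```

In a sketch like this, a slope near 1 with a high r_squared over the spike-in range would suggest that measured abundance tracks input concentration, while departures at the low end hint at where quantification becomes unreliable.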

Accessions

Primary accessions

Gene Expression Omnibus

Acknowledgements

The authors would like to thank the following funding sources: Australian National Health and Medical Research Council (NHMRC) Australia Fellowship (1062470 to T.R.M. and 1062606 to W.Y.C.). S.A.H. and I.W.D. are supported by Australian Postgraduate Award scholarships. The contents of the published material are solely the responsibility of the administering institution, a participating institution or individual authors and do not reflect the views of NHMRC. The authors would also like to thank D. Thomson and M. Smith (Garvan Institute of Medical Research) for helpful discussions during manuscript preparation.

Author information

Author notes

    • Simon A Hardwick & Wendy Y Chen

    These authors contributed equally to this work.

Affiliations

  1. Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.

    • Simon A Hardwick, Wendy Y Chen, Ted Wong, Ira W Deveson, James Blackburn, John S Mattick & Tim R Mercer

  2. St Vincent's Clinical School, Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia.

    • Simon A Hardwick, Wendy Y Chen, James Blackburn, John S Mattick & Tim R Mercer

  3. School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, New South Wales, Australia.

    • Ira W Deveson

  4. Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, Brisbane, Queensland, Australia.

    • Stacey B Andersen & Lars K Nielsen

Authors

  1. Simon A Hardwick

  2. Wendy Y Chen

  3. Ted Wong

  4. Ira W Deveson

  5. James Blackburn

  6. Stacey B Andersen

  7. Lars K Nielsen

  8. John S Mattick

  9. Tim R Mercer

Contributions

T.R.M. and J.S.M. conceived the project, designed sequins and in silico chromosome, and conceived experiments. W.Y.C. and S.B.A. performed experimental work. J.B. performed qRT-PCR validation. L.K.N. contributed supervision and manuscript preparation. S.A.H., T.W. and T.R.M. performed bioinformatic analyses. S.A.H., I.W.D. and T.R.M. prepared the manuscript.

Competing interests

Garvan Institute of Medical Research has filed a patent application (PCT/AU2015/050797) on some techniques described in this study.

Corresponding author

Correspondence to Tim R Mercer.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–14

About this article

DOI

https://doi.org/10.1038/nmeth.3958
