Comparative analysis of RNA sequencing methods for degraded or low-input samples

Journal name: Nature Methods
Volume: 10
Pages: 623–629
DOI: 10.1038/nmeth.2483

Abstract

RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development.

Figures

  1. Methods for total RNA-seq.

    Salient details for five protocols for total RNA-seq are shown. DSN-lite, RNase H and Ribo-Zero were tested for low-quality samples. SMART was tested for low-quantity samples. NuGEN, which generates double-stranded (ds)-cDNA that is amplified with Ribo-SPIA (Single Primer Isothermal Amplification), was tested for both types of samples. RNA and matching cDNA are black, adaptors and primers are colored and rRNA is gray. ss-cDNA, single-stranded cDNA. (T)30 is composed of 30 T bases.

  2. Metrics for sequence alignment and uniformity of coverage.

    (a–d) Performance of each library with respect to percentage of reads mapping to rRNA (a), percentage of duplicated reads (b), proportion of reads mapping to exons (solid), introns (hatched) and intergenic regions (white) (c) and evenness of coverage (d). The mean coefficient of variation is shown in d for the 1,000 most highly expressed transcripts in each library. (e) Locally weighted scatter-plot smoothing (LOWESS) fits of the percentage of the transcript length covered for transcripts at each expression level. Transcript coverage was aggregated for all isoforms of each gene. TPM, transcripts per million.

  3. 5′-to-3′ sequence coverage.

    (a) Normalized coverage by position. For each library, the average relative coverage is shown at each relative position along the transcripts' length. (b,c) Percentage of annotated 5′ (b) and 3′ ends (c) covered by reads.

  4. Expression metrics.

    (a) Pearson correlation coefficient between each library and the control Total library. (b,c) Scatter plots (b) and quantile-quantile (Q-Q) plots (c) between a low-quality library (RNase H) or a low-quantity library (SMART) and the control Total library. For Q-Q plots, if the two samples originated from the same distribution, the points will lie on a straight line. TPM, transcripts per million.

  5. Length and GC biases in expression metrics.

    (a,b) Pearson correlation coefficient between each library (columns) and the control Total library for either all transcripts (top row in a,b), transcripts with different lengths (a) or transcripts with different GC content (b). The numbers of transcripts expressed in the control Total library in the 1–1,000 bin, 1,001–5,000 bin and >5,000 bin for transcript length were 3,716, 38,088 and 7,050, respectively. The numbers of transcripts expressed in the control Total library in the ≤37% bin, 37–62% bin and >62% bin for GC content were 2,358, 42,660 and 3,836, respectively.

  6. Performance for actual degraded samples.

    Key metrics for RNase H, Ribo-Zero and Total libraries from pancreas and formalin-fixed, paraffin-embedded (FFPE) kidney RNA. (a) Percentage of reads mapping to rRNA. (b) Proportion of reads mapping to exons (solid), introns (hatched) and intergenic regions (white). (c) Mean coefficient of variation for the 1,000 most highly expressed transcripts in each library. (d) Pearson correlation coefficient between each library and a control Total library.
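
Two of the metrics named in the captions for Figures 2, 4 and 6, the mean coefficient of variation of coverage and the Pearson correlation coefficient against a control library, can be illustrated with a minimal Python sketch. This is not the authors' pipeline. The function names, the toy data and the use of the population standard deviation are all assumptions made for illustration, and the captions do not say whether expression values were transformed before correlation.

```python
from math import sqrt
from statistics import mean, pstdev


def coefficient_of_variation(coverage):
    """CV of per-base coverage for one transcript: standard deviation / mean.

    Assumption: population standard deviation; the source does not specify.
    """
    mu = mean(coverage)
    if mu == 0:
        raise ValueError("transcript has no coverage")
    return pstdev(coverage) / mu


def mean_cv(coverage_by_transcript):
    """Average the per-transcript CV over a set of transcripts."""
    return mean([coefficient_of_variation(c) for c in coverage_by_transcript.values()])


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: per-base read depth for two transcripts.
coverage = {"tx_even": [10, 10, 10, 10], "tx_uneven": [0, 20, 0, 20]}

# Hypothetical expression values (TPM) for the same transcripts in two libraries.
library_tpm = [1.0, 2.0, 3.0, 4.0]
control_tpm = [2.0, 4.0, 6.0, 8.0]

print(mean_cv(coverage))
print(pearson(library_tpm, control_tpm))
```

A lower mean coefficient of variation indicates more even coverage along transcripts, and a Pearson coefficient closer to 1 indicates closer agreement with the control library.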

Accession codes

Primary accessions

Gene Expression Omnibus

Change history

Corrected online 02 December 2013
In the version of this article initially published, in the Online Methods "RNase H libraries" section, the sentence beginning with "We added 5 μl preheated RNase H...." should have read "We added 5 μl preheated RNase H reaction mix that contains 10 U of Hybridase Thermostable RNase H (Epicentre), 0.5 μmol Tris-HCl, pH 7.5, 1 μmol NaCl and 0.2 μmol MgCl2 to the RNA and DNA oligo mix, incubated this mixture at 45 °C for 30 min and then placed it on ice." The errors have been corrected in the HTML and PDF versions of this article.

Author information

  1. These authors contributed equally to this work.

    • Xian Adiconis &
    • Diego Borges-Rivera

Affiliations

  1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Xian Adiconis,
    • Diego Borges-Rivera,
    • Rahul Satija,
    • David S DeLuca,
    • Michele A Busby,
    • Aaron M Berlin,
    • Andrey Sivachenko,
    • Dawn Anne Thompson,
    • Alec Wysoker,
    • Timothy Fennell,
    • Andreas Gnirke,
    • Nathalie Pochet,
    • Aviv Regev &
    • Joshua Z Levin
  2. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Aviv Regev
  3. Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Aviv Regev

Contributions

J.Z.L., X.A. and A.R. conceived the research. X.A. prepared the cDNA libraries. D.B.-R., R.S., N.P., M.A.B. and A.R. developed and performed computational analysis. D.S.D. contributed code. D.S.D., A.M.B., A.S., A.W. and T.F. helped with computational analysis. D.A.T., N.P., A.R. and J.Z.L. supervised the research. J.Z.L., X.A., D.B.-R. and A.R. wrote the paper. R.S., A.G. and D.S.D. assisted in editing the paper.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (i) (47 MB)

    Supplementary Figures 1–7

  2. Supplementary Text and Figures (ii) (21 MB)

    Supplementary Figures 8 and 9

  3. Supplementary Text and Figures (iii) (782 KB)

    Supplementary Tables 1–5 and 7 and Supplementary Notes 1–5

Excel files

  1. Supplementary Table 6 (20 KB)

    RNase H oligonucleotide sequences