Comprehensive comparative analysis of strand-specific RNA sequencing methods

Journal name: Nature Methods
Volume: 7
Pages: 709–715
DOI: 10.1038/nmeth.1491

Abstract

Strand-specific, massively parallel cDNA sequencing (RNA-seq) is a powerful tool for transcript discovery, genome annotation and expression profiling. There are multiple published methods for strand-specific RNA-seq, but no consensus exists as to how to choose between them. Here we developed a comprehensive computational pipeline to compare library quality metrics from any RNA-seq method. Using the well-annotated Saccharomyces cerevisiae transcriptome as a benchmark, we compared seven library-construction protocols, including both published and our own methods. We found marked differences in strand specificity, library complexity, evenness and continuity of coverage, agreement with known annotations and accuracy for expression profiling. Weighing each method's performance and ease, we identified the dUTP second-strand marking and the Illumina RNA ligation methods as the leading protocols, with the former benefitting from the current availability of paired-end sequencing. Our analysis provides a comprehensive benchmark, and our computational pipeline is applicable for assessment of future protocols in other organisms.

Figures

    Figure 1: Methods for strand-specific RNA-seq.

    (a,b) Salient details for differential adaptor methods, including RNA ligation (ref. 29), SMART (ref. 30) and NNSR priming (ref. 31) (a), and for differential marking methods (b). USER, uracil-specific excision reagent. mRNA is shown in gray and cDNA in black. For differential adaptor methods, 5′ adaptors are shown in blue, and 3′ adaptors are shown in red.

    Figure 2: Key criteria for evaluation of strand-specific RNA-seq libraries.

    (a–d) Categories of quality assessment were complexity (a), strand specificity (b), evenness of coverage (c) and comparison to known transcript structure (d). Double-stranded genome with gene ORF orientation (blue arrows) and UTRs (blue lines) are shown along with mapped reads (black and red arrows, reads mapped to sense and antisense strands, respectively).

    Figure 3: Complexity of single- and paired-end libraries.

    (a,b) Percentage of unique reads mapping out of the total number of mapped reads, when considering only single-mapped reads (a; all libraries) or uniquely mapped pairs (b; only paired-end libraries).

    Figure 4: Strand specificity and evenness of transcript coverage.

    (a) Strand specificity (percentage antisense) and evenness of coverage (average coefficient of variation (CV)) for all libraries. (b) Relative gene coverage at each percentile of a gene's length, averaged across all genes in each library. The 5′ end is on the left. (c) Percentage of genes with 5′-end and 3′-end coverage in each library.

    Figure 5: Continuity of transcript coverage.

    (a) Average number of segments (separated by at least five bases of zero coverage) weighted by the average expression of each gene, in each library. (b) Lowess fit for each library. (c–e) Plots for the dUTP method (c), the 3′ split adaptor method (d) and the SMART method (e). In c–e, a Lowess fit is shown as a red curve, and each gene is represented by a blue dot.

    Figure 6: Digital expression profiling using strand-specific RNA-seq.

    (a,b) Pearson correlation coefficient (a) and r.m.s. error (b) for each library when compared to a pooled reference, the control library and Agilent microarrays (right). (c,d) Scatter (left), Q-Q (middle) and MA (right) plots for the best performing (dUTP; c) and worst performing (NNSR; d) libraries, in comparison to the control library. The scatter plots show the fraction of total reads for each gene (blue dot) in the control library against a strand-specific library. The Q-Q plot shows the level at each quantile (rank) of expression in the control library against the strand-specific library. A slope = 1 line is shown for reference (red). The MA plot shows for each gene (dot) the difference in expression levels between the control and strand-specific libraries (M; y axis) compared to their mean expression level (A; x axis). Red and blue dashed lines indicate twofold and onefold difference in expression, respectively.
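
The quality metrics named in the figure captions above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the choice to identify a read by its (chromosome, strand, start) tuple, the use of per-base coverage lists and the log2 scale for the MA values are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def percent_unique(reads):
    """Complexity: percentage of distinct reads among all mapped reads.
    Assumption: a read is identified by a (chromosome, strand, start) tuple."""
    return 100.0 * len(set(reads)) / len(reads)


def percent_antisense(sense_reads, antisense_reads):
    """Strand specificity: percentage of reads mapping to the antisense strand."""
    return 100.0 * antisense_reads / (sense_reads + antisense_reads)


def coverage_cv(coverage):
    """Evenness: coefficient of variation of per-base coverage along one gene."""
    return pstdev(coverage) / mean(coverage)


def count_segments(coverage, min_gap=5):
    """Continuity: number of covered segments separated by at least
    min_gap consecutive bases of zero coverage."""
    segments = 0
    gap = min_gap  # treat the gene start as if preceded by a long gap
    for depth in coverage:
        if depth > 0:
            if gap >= min_gap:
                segments += 1
            gap = 0
        else:
            gap += 1
    return segments


def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rms_error(x, y):
    """Root-mean-square error between two expression vectors."""
    return math.sqrt(mean([(a - b) ** 2 for a, b in zip(x, y)]))


def ma_values(control, library):
    """Per-gene (M, A) pairs: M is the difference and A the mean of the
    expression levels. Assumption: levels are compared on a log2 scale."""
    pairs = []
    for c, s in zip(control, library):
        lc, ls = math.log2(c), math.log2(s)
        pairs.append((lc - ls, (lc + ls) / 2))
    return pairs
```

For example, `count_segments([3, 2, 0, 0, 1, 0, 0, 0, 0, 0, 4])` counts two segments, because only the run of five zero-coverage bases is long enough to split the coverage, while the two-base gap is not.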

Accession codes

Referenced accessions

Gene Expression Omnibus

References

  1. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
  2. Wilhelm, B.T. et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453, 1239–1243 (2008).
  3. Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
  4. Yassour, M. et al. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing. Proc. Natl. Acad. Sci. USA 106, 3264–3269 (2009).
  5. Marioni, J.C., Mason, C.E., Mane, S.M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
  6. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  7. Pan, Q., Shai, O., Lee, L.J., Frey, B.J. & Blencowe, B.J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
  8. Wang, E.T. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
  9. Sultan, M. et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321, 956–960 (2008).
  10. Guttman, M. et al. Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat. Biotechnol. 28, 503–510 (2010).
  11. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
  12. Core, L.J., Waterfall, J.J. & Lis, J.T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).
  13. Parkhomchuk, D. et al. Transcriptome analysis by strand-specific sequencing of complementary DNA. Nucleic Acids Res. 37, e123 (2009).
  14. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R. & Weissman, J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
  15. He, Y., Vogelstein, B., Velculescu, V.E., Papadopoulos, N. & Kinzler, K.W. The antisense transcriptomes of human cells. Science 322, 1855–1857 (2008).
  16. Schaefer, M., Pollex, T., Hanna, K. & Lyko, F. RNA cytosine methylation analysis by bisulfite sequencing. Nucleic Acids Res. 37, e12 (2009).
  17. Jaffe, D.B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. 13, 91–96 (2003).
  18. Xu, Z. et al. Bidirectional promoters generate pervasive transcription in yeast. Nature 457, 1033–1037 (2009).
  19. Guo, J., Wu, T., Bess, J., Henderson, L.E. & Levin, J.G. Actinomycin D inhibits human immunodeficiency virus type 1 minus-strand transfer in in vitro and endogenous reverse transcriptase assays. J. Virol. 72, 6716–6724 (1998).
  20. Gentleman, R., Carey, V., Huber, W., Irizarry, R. & Dudoit, S. (eds.). Bioinformatics and Computational Biology Solutions Using R and Bioconductor, 473 (Springer, Secaucus, NJ, 2005).
  21. Yang, Y.H. et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 30, e15 (2002).
  22. Croucher, N.J. et al. A simple method for directional transcriptome sequencing using Illumina technology. Nucleic Acids Res. 37, e148 (2009).
  23. Lipson, D. et al. Quantification of the yeast transcriptome by single-molecule sequencing. Nat. Biotechnol. 27, 652–658 (2009).
  24. Ozsolak, F. et al. Direct RNA sequencing. Nature 461, 814–818 (2009).
  25. Affymetrix / Cold Spring Harbor Laboratory ENCODE Transcriptome Project. Post-transcriptional processing generates a diversity of 5′-modified long and short RNAs. Nature 457, 1028–1032 (2009).
  26. Li, H. et al. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc. Natl. Acad. Sci. USA 105, 20179–20184 (2008).
  27. Mamanova, L. et al. FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat. Methods 7, 130–132 (2010).
  28. Linsen, S.E. et al. Limitations and possibilities of small RNA digital gene expression profiling. Nat. Methods 6, 474–476 (2009).
  29. Lister, R. et al. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133, 523–536 (2008).
  30. Zhu, Y.Y., Machleder, E.M., Chenchik, A., Li, R. & Siebert, P.D. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques 30, 892–897 (2001).
  31. Armour, C.D. et al. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat. Methods 6, 647–649 (2009).
  32. Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).

Author information

  1. These authors contributed equally to this work.

    • Joshua Z Levin &
    • Moran Yassour

Affiliations

  1. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts, USA.

    • Joshua Z Levin,
    • Moran Yassour,
    • Xian Adiconis,
    • Chad Nusbaum,
    • Dawn Anne Thompson,
    • Andreas Gnirke &
    • Aviv Regev
  2. Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Moran Yassour &
    • Aviv Regev
  3. School of Engineering and Computer Science, Hebrew University, Jerusalem, Israel.

    • Moran Yassour &
    • Nir Friedman
  4. Alexander Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel.

    • Nir Friedman
  5. Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Aviv Regev

Contributions

J.Z.L., M.Y., X.A., D.A.T., N.F. and A.R. wrote the paper. J.Z.L., M.Y., X.A., C.N., D.A.T., N.F., A.G. and A.R. assisted in editing the paper. D.A.T. prepared the poly(A)+ RNA. J.Z.L. and X.A. prepared the cDNA libraries. M.Y., N.F. and A.R. developed and performed the computational analysis. J.Z.L., X.A., M.Y., N.F. and A.R. conceived the research.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (2M)

    Supplementary Figures 1–5, Supplementary Tables 1–5, Supplementary Notes 1–3
