An Erratum to this article was published on 06 September 2018.

This article has been updated


RNA-seq is increasingly used for quantitative profiling of small RNAs (for example, microRNAs, piRNAs and snoRNAs) in diverse sample types, including isolated cells, tissues and cell-free biofluids. The accuracy and reproducibility of the currently used small RNA-seq library preparation methods have not been systematically tested. Here we report results obtained by a consortium of nine labs that independently sequenced reference, 'ground truth' samples of synthetic small RNAs and human plasma-derived RNA. We assessed three commercially available library preparation methods that use adapters of defined sequence and six methods using adapters with degenerate bases. Both protocol- and sequence-specific biases were identified, including biases that reduced the ability of small RNA-seq to accurately measure adenosine-to-inosine editing in microRNAs. We found that these biases were mitigated by library preparation methods that incorporate adapters with degenerate bases. MicroRNA relative quantification between samples using small RNA-seq was accurate and reproducible across laboratories and methods.
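The abstract contrasts sequence-specific bias with accurate between-sample quantification. A toy calculation can show why both can hold at once. The Python sketch below is not the study's analysis. For illustration only, it assumes that each RNA sequence is captured with a fixed multiplicative efficiency, and that this efficiency is the same in every sample prepared with the same protocol.

Under that assumption, the efficiency of a given microRNA cancels when its abundance is compared between samples. It does not cancel when two different sequences are compared within one sample. That second case is what happens when reads from an A-to-I edited microRNA (sequenced as A-to-G) are compared with reads from its unedited form. The sequence names (miR-X, miR-Y, ref), abundances and efficiencies are hypothetical.

```python
import math


def expected_counts(abundance, efficiency, depth=1_000_000):
    """Expected read counts under a hypothetical toy model: each sequence is
    captured in proportion to abundance * efficiency, and the sequencing depth
    is split across sequences in proportion to that product."""
    weighted = {seq: abundance[seq] * efficiency[seq] for seq in abundance}
    total = sum(weighted.values())
    return {seq: depth * w / total for seq, w in weighted.items()}


def log2_ratio_vs_reference(counts_a, counts_b, seq, ref):
    """Log2 fold change of `seq` between samples A and B, with each sample
    normalized to a reference sequence assumed to be equimolar in both."""
    return math.log2((counts_a[seq] / counts_a[ref]) / (counts_b[seq] / counts_b[ref]))


def observed_editing_fraction(true_fraction, eff_edited, eff_unedited):
    """Fraction of reads carrying the edited base (read as G) when the edited
    and unedited isoforms are captured with different efficiencies."""
    edited = true_fraction * eff_edited
    unedited = (1.0 - true_fraction) * eff_unedited
    return edited / (edited + unedited)


# Hypothetical capture efficiencies for a single library preparation protocol.
efficiency = {"miR-X": 0.05, "miR-Y": 0.80, "ref": 0.30}

# Hypothetical pools: miR-X is 4-fold more abundant in pool A than in pool B.
pool_a = {"miR-X": 4.0, "miR-Y": 1.0, "ref": 1.0}
pool_b = {"miR-X": 1.0, "miR-Y": 1.0, "ref": 1.0}

counts_a = expected_counts(pool_a, efficiency)
counts_b = expected_counts(pool_b, efficiency)

# Within pool B, miR-X and miR-Y are equimolar, yet their counts differ 16-fold.
within_sample_ratio = counts_b["miR-Y"] / counts_b["miR-X"]

# Between pools, the efficiency of miR-X cancels and the 4-fold (log2 = 2) change is recovered.
fold_change = log2_ratio_vs_reference(counts_a, counts_b, "miR-X", "ref")

# A true 20% editing level is under-reported if the edited isoform is captured half as well.
edit_fraction = observed_editing_fraction(0.20, eff_edited=0.5, eff_unedited=1.0)

print(f"within-sample miR-Y/miR-X ratio: {within_sample_ratio:.1f}")
print(f"between-sample log2 fold change of miR-X: {fold_change:.2f}")
print(f"observed editing fraction: {edit_fraction:.3f}")
```

Normalizing to a reference sequence present at the same amount in both pools is a simplification chosen here to keep the arithmetic exact. With total-count normalization, the per-sequence efficiency still cancels, but a global scale factor remains that depends on the overall composition of each pool.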


Change history

  • 31 July 2018

    In the version of this article initially published online, the text "Beth Israel Deaconess Medical Center/Dana Farber Cancer Institute (BIDMC/ DFCI)" was inserted into the last sentence in the right-hand column of p. 10, beginning "It is worth noting...". In addition, on p. 2, the acronym for The Cancer Genome Atlas was given as TGCA rather than TCGA; and on p. 3, UUTR should have been defined as University of Utrecht, the Netherlands. Finally, ref. 48 was cited in the Online Methods after "4N_Xu protocol was performed as previously described" (ref. 35); this extra citation has been deleted. The errors have been corrected in the print, PDF and HTML versions of this article.


References


  1. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
  2. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).
  3. Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat. Methods 7, 709–715 (2010).
  4. Identification and remediation of biases in the activity of RNA ligases in small-RNA deep sequencing. Nucleic Acids Res. 39, e141 (2011).
  5. T4 RNA ligase 2 truncated active site mutants: improved tools for RNA analysis. BMC Biotechnol. 11, 72 (2011).
  6. High-efficiency RNA cloning enables accurate quantification of miRNA expression by deep sequencing. Genome Biol. 14, R109 (2013).
  7. Elimination of ligation dependent artifacts in T4 RNA ligase to achieve high efficiency and low bias microRNA capture. PLoS One 9, e94619 (2014).
  8. Addressing bias in small RNA library preparation for sequencing: a new protocol recovers microRNAs that evade capture by current methods. Front. Genet. 6, 352 (2015).
  9. Reducing ligation bias of small RNAs in libraries for next generation sequencing. Silence 3, 4 (2012).
  10. Small RNA deep sequencing reveals a distinct miRNA signature released in exosomes from prion-infected neuronal cells. Nucleic Acids Res. 40, 10937–10949 (2012).

  11. Deep sequencing of RNA from immune cell-derived vesicles uncovers the selective incorporation of small non-coding RNA biotypes with potential regulatory functions. Nucleic Acids Res. 40, 9272–9285 (2012).

  12. Characterization of human plasma-derived exosomal RNAs by deep sequencing. BMC Genomics 14, 319 (2013).
  13. Cerebrospinal fluid extracellular vesicles undergo age dependent declines and contain known and novel non-coding RNAs. PLoS One 9, e113116 (2014).
  14. Small RNA deep sequencing discriminates subsets of extracellular vesicles released by melanoma cells – evidence of unique microRNA cargos. RNA Biol. 12, 810–823 (2015).
  15. Assessment of small RNA sorting into different extracellular fractions revealed by high-throughput sequencing of breast cell lines. Nucleic Acids Res. 43, 5601–5616 (2015).
  16. Quantitative and qualitative analysis of small RNAs in human endothelial cells and exosomes provides insights into localized RNA processing, degradation and sorting. J. Extracell. Vesicles 4, 26760 (2015).
  17. Identification of extracellular miRNA in human cerebrospinal fluid by next-generation sequencing. RNA 19, 712–722 (2013).
  18. The landscape of microRNA, Piwi-interacting RNA, and circular RNA in human saliva. Clin. Chem. 61, 221–230 (2015).
  19. Diverse human extracellular RNAs are widely detected in human plasma. Nat. Commun. 7, 11106 (2016).
  20. MicroRNA profiling in aqueous humor of individual human eyes by next-generation sequencing. Invest. Ophthalmol. Vis. Sci. 57, 1706–1713 (2016).
  21. Plasma extracellular RNA profiles in healthy and cancer patients. Sci. Rep. 6, 19413 (2016).
  22. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
  23. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
  24. Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster. BMC Genomics 17, 28 (2016).
  25. Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat. Biotechnol. 31, 1015–1022 (2013).
  26. SEQC/MAQC-III Consortium. A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. Nat. Biotechnol. 32, 903–914 (2014).
  27. International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 39, D19–D21 (2011).
  28. International Nucleotide Sequence Database Collaboration. The Sequence Read Archive: explosive growth of sequencing data. Nucleic Acids Res. 40, D54–D56 (2012).
  29. Vesiclepedia: a compendium for extracellular vesicles with continuous community annotation. PLoS Biol. 10, e1001450 (2012).
  30. ExoCarta as a resource for exosomal research. J. Extracell. Vesicles 1, 18374 (2012).
  31. EVpedia: an integrated database of high-throughput data for systemic analyses of extracellular vesicles. J. Extracell. Vesicles 2, 20384 (2013).
  32. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  33. Integration of extracellular RNA profiling data using metadata, biomedical ontologies and Linked Data technologies. J. Extracell. Vesicles 4, 27497 (2015).
  34. The NIH Extracellular RNA Communication Consortium. J. Extracell. Vesicles 4, 27493 (2015).

  35. An improved protocol for small RNA library construction using high-definition adapters. Methods Next Gener. Seq. 2 2, (2015).
  36. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res. 38, e131 (2010).
  37. RNA-ligase-dependent biases in miRNA representation in deep-sequenced small RNA cDNA libraries. RNA 17, 1697–1712 (2011).
  38. Bias in ligation-based small RNA sequencing library construction is determined by adaptor and RNA structure. PLoS One 10, e0126049 (2015).
  39. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
  40. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res. 40, 4288–4297 (2012).
  41. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  42. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol. 15, R29 (2014).
  43. MicroRNAs in body fluids: the mix of hormones and biomarkers. Nat. Rev. Clin. Oncol. 8, 467–477 (2011).
  44. Modulation of microRNA processing and expression through RNA editing by ADAR deaminases. Nat. Struct. Mol. Biol. 13, 13–21 (2006).
  45. Redirection of silencing targets by adenosine-to-inosine editing of miRNAs. Science 315, 1137–1140 (2007).
  46. Systematic characterization of A-to-I RNA editing hotspots in microRNAs across human cancers. Genome Res. 27, 1112–1125 (2017).
  47. Conserved microRNA editing in mammalian evolution, development and disease. Genome Biol. 15, R83 (2014).
  48. Limitations and possibilities of small RNA digital gene expression profiling. Nat. Methods 6, 474–476 (2009).
  49. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).
  50. What does research reproducibility mean? Sci. Transl. Med. 8, 341ps12 (2016).
  51. GC-content normalization for RNA-Seq data. BMC Bioinformatics 12, 480 (2011).
  52. Removing technical variability in RNA-seq data using conditional quantile normalization. Biostatistics 13, 204–216 (2012).
  53. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3, 1724–1735 (2007).
  54. Detecting and correcting systematic variation in large-scale RNA sequencing data. Nat. Biotechnol. 32, 888–895 (2014).
  55. Large-scale profiling of microRNAs for The Cancer Genome Atlas. Nucleic Acids Res. 44, e3 (2016).
  56. UNAFold: software for nucleic acid folding and hybridization. Methods Mol. Biol. 453, 3–31 (2008).



Acknowledgements

We are grateful to J.S. Rozowsky, R. Kitchen, S.L. Subramanian, W. Thistlethwaite, M.B. Gerstein, A. Milosavljevic and N. Sakhanenko for facilitating access to the exceRpt pipeline and for helpful conversations and suggestions. We are also grateful to P.A.C. 't Hoen for helpful discussions. We acknowledge funding support from the NIH Extracellular RNA Communication Common Fund grants: U01 grants HL126499 to M.T., HL126496 to D.J.G. and K.W., HL126493 to D.J.E. and P.G.W., HL126494 to L.C.L., HL126495 to J.E.F. and HL126497 to I.G., and UH3 grant TR000891 to K.V.K.-J. M.D.G. acknowledges initial support from a Rio Hortega Fellowship (CM10/00084) and later support from a Martin Escudero Fellowship. E.N.M.N.-'t H. and T.A.P.D. received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement number 337581 and from the Netherlands Organization for Scientific Research (NWO Enabling Technologies project no. 435002022). Y.E.W. received funding from the Dana-Farber Strategic Plan Initiative. K.W. received funding from DOD (W911NF-10-2-0111) and DTRA (HDTRA1-13-C-0055). Research reported in this publication was also supported by the National Cancer Institute of the NIH under Award Number P30CA046592 through the use of the following Cancer Center Shared Resource at the University of Michigan: DNA Sequencing. D.J.G. also acknowledges a special technology support award from the Pacific Northwest Research Institute to his lab. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Author notes

    Maria D Giraldez and Ryan M Spengler contributed equally to this work.


Affiliations

  1. Department of Internal Medicine, Hematology/Oncology Division, University of Michigan, Ann Arbor, Michigan, USA.
    • Maria D Giraldez, Ryan M Spengler & Muneesh Tewari

  2. Pacific Northwest Research Institute, Seattle, Washington, USA.
    • Alton Etheridge & David J Galas

  3. Lung Biology Center, Department of Medicine, University of California San Francisco, San Francisco, California, USA.
    • Paula M Godoy, Andrea J Barczak & David J Erle

  4. Department of Obstetrics, Gynecology, and Reproductive Sciences and Sanford Consortium for Regenerative Medicine, University of California, San Diego, La Jolla, California, USA.
    • Srimeenakshi Srinivasan, Peter L De Hoff & Louise C Laurent

  5. Department of Medicine, Division of Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA.
    • Kahraman Tanriverdi & Jane E Freedman

  6. Neurogenomics, The Translational Genomics Research Institute (TGen), Phoenix, Arizona, USA.
    • Amanda Courtright & Kendall Van Keuren-Jensen

  7. Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA.
    • Shulin Lu, Joseph Khoory & Ionita Ghiran

  8. Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA.
    • Renee Rubio & Yaoyu E Wang

  9. Institute for Systems Biology, Seattle, Washington, USA.
    • David Baxter & Kai Wang

  10. Department of Biochemistry & Cell Biology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands.
    • Tom A P Driedonks & Esther N M Nolte-'t Hoen

  11. Leiden Genome Technology Center, Department of Human Genetics, Leiden University Medical Center, Leiden, the Netherlands.
    • Henk P J Buermans

  12. Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA.
    • Hui Jiang & Muneesh Tewari

  13. Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
    • Hui Jiang

  14. Cardiovascular Research Institute and the Department of Medicine, Division of Pulmonary, Critical Care, Sleep, and Allergy, University of California San Francisco, San Francisco, California, USA.
    • Prescott G Woodruff

  15. Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA.
    • Muneesh Tewari

  16. Biointerfaces Institute, University of Michigan, Ann Arbor, Michigan, USA.
    • Muneesh Tewari




Contributions

M.D.G. designed, led and coordinated the overall study. This comprised contributions throughout the entire process, including designing, preparing and distributing synthetic RNA pools, creating detailed instructions for the participating labs, performing experiments, coordinating experimental work and communication across labs, and organizing and managing all aspects of the project. In addition, she interpreted study results and was the primary writer of the manuscript. R.M.S. led the computational analyses and data management for this study, including processing data, designing and performing data analyses, identifying and applying methods to visualize data and results, and coordinating data and metadata incoming from collaborating laboratories. He also designed the composition of the ratiometric pools, interpreted results and contributed to the manuscript by preparing the figures, drafting figure legends and writing the computational methods. A.E. contributed to experimental design and preparation of synthetic RNA pools. He also performed experimental work, interpreted results and provided comments on the manuscript. In addition, he developed and contributed the core in-house 4N protocol, variations of which were then used by multiple laboratories. M.T., D.J.G. and D.J.E. helped to design the study and interpret the results, along with contributions from the rest of the study team, including L.C.L., P.G.W., K.V.K.-J., I.G., Y.E.W., K.W., J.E.F., H.J., E.N.M.N.-'t H. and H.P.J.B. contributed to design of statistical analyses and data interpretation. P.M.G., A.J.B., S.S., P.L.D.H., K.T., A.C., S.L., J.K., R.R., D.B. and T.A.P.D. carried out experiments. M.T. and D.J.G. supervised the overall study and did primary editing of the manuscript with substantial input from D.J.E. All of the authors contributed to reviewing, editing and/or providing comments on the manuscript.

Competing interests

The spouse of L.C. Laurent is an employee of Illumina, Inc., the manufacturer of the TruSeq Small RNA Library Preparation Kit. The equity interest of L.C. Laurent and her spouse in Illumina, Inc. represents much less than 1% of the company. The other authors declare no competing financial interests.

Corresponding authors

Correspondence to Maria D Giraldez, David J Galas or Muneesh Tewari.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–20, Supplementary Note 1, and Supplementary Protocols 1–7

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Tables

    Supplementary Tables 1–10

About this article

Publication history