Abstract
Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.


Acknowledgements
We thank S. Tavaré, M. Waterman, P. Calabrese, G. Hannon, and members of the Hannon lab and the Smith lab for their help, advice and input. This work was supported by US National Institutes of Health National Human Genome Research Institute grants (R01 HG005238 and P50 HG002790).
Author information
Contributions
T.D. and A.D.S. designed the method, implemented the software, performed the analysis and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Figures 1–2, Supplementary Tables 2–3 (PDF 4640 kb)
Supplementary Table 1
Properties of data sets used in evaluating estimates of library complexity. (XLSX 50 kb)
Supplementary Software
Preseq source code and manual. (ZIP 165 kb)
About this article
Cite this article
Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013). https://doi.org/10.1038/nmeth.2375
Further reading
- T Cell Clonal Dynamics Determined by High-Resolution TCR-β Sequencing in Recipients after Allogeneic Hematopoietic Cell Transplantation. Biology of Blood and Marrow Transplantation (2020)
- Predicting the Number of Bases to Attain Sufficient Coverage in High-Throughput Sequencing Experiments. Journal of Computational Biology (2020)
- Single molecule poly(A) tail-seq shows LARP4 opposes deadenylation throughout mRNA lifespan with most impact on short tails. eLife (2020)
- ATAC-seq normalization method can significantly affect differential accessibility analysis and interpretation. Epigenetics & Chromatin (2020)
- Blockade of BAFF Reshapes the Hepatic B Cell Receptor Repertoire and Attenuates Autoantibody Production in Cholestatic Liver Disease. The Journal of Immunology (2020)