Predicting the molecular complexity of sequencing libraries

Daley, Timothy; Smith, Andrew D

doi:10.1038/nmeth.2375

Brief Communication
Published: 24 February 2013


Nature Methods volume 10, pages 325–327 (2013)


Abstract

Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
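As background for the prediction problem described above, the sketch below implements the classical Good-Toulmin series (Good & Toulmin, 1956), a nonparametric estimate of how many new distinct molecules further sequencing would reveal. It is an illustration under stated assumptions, not the authors' empirical Bayesian method: the function names and the toy read list are hypothetical, and the raw series is only reliable when the additional sequencing is at most as large as the initial experiment (t ≤ 1).

```python
from collections import Counter


def counts_histogram(molecule_ids):
    """Return {j: n_j}, where n_j is the number of distinct molecules
    observed exactly j times in the initial sequencing run."""
    per_molecule = Counter(molecule_ids)          # molecule -> times observed
    return dict(Counter(per_molecule.values()))   # times observed -> # molecules


def good_toulmin_new_distinct(hist, t):
    """Classical Good-Toulmin estimate of the expected number of NEW distinct
    molecules seen if t times as many additional reads are sequenced.

    Delta(t) = sum_{j >= 1} (-1)^(j+1) * t^j * n_j

    Assumption: restricted to 0 <= t <= 1, because the alternating series
    can diverge when extrapolating beyond the size of the initial experiment.
    """
    if not 0 <= t <= 1:
        raise ValueError("this sketch only supports 0 <= t <= 1")
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in hist.items())


# Hypothetical toy data: each string stands for one sequenced molecule.
reads = ["a", "a", "b", "c", "c", "c", "d"]
hist = counts_histogram(reads)                 # {2: 1, 1: 2, 3: 1}
print(good_toulmin_new_distinct(hist, 0.5))    # 0.875
```

With two molecules seen once, one seen twice and one seen three times, sequencing half as many reads again is expected to add 2(0.5) − 1(0.5)² + 1(0.5)³ = 0.875 new distinct molecules. The restriction to t ≤ 1 is what makes long-range prediction from shallow sequencing difficult with this series alone.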


**Figure 1: Difficulties in predicting library complexity from initial shallow sequencing.**

**Figure 2: Library complexity can be estimated in terms of distinct molecules sequenced or distinct loci identified.**



Acknowledgements

We thank S. Tavaré, M. Waterman, P. Calabrese, G. Hannon, and members of the Hannon lab and the Smith lab for their help, advice and input. This work was supported by US National Institutes of Health National Human Genome Research Institute grants (R01 HG005238 and P50 HG002790).

Author information

Authors and Affiliations

Department of Mathematics, University of Southern California, Los Angeles, California, USA
Timothy Daley
Molecular and Computational Biology, University of Southern California, Los Angeles, California, USA
Andrew D Smith


Contributions

T.D. and A.D.S. designed the method, implemented the software, performed the analysis and wrote the manuscript.

Corresponding author

Correspondence to Andrew D Smith.

Ethics declarations

Competing interests

The authors declare no competing financial interests.


About this article

Cite this article

Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013). https://doi.org/10.1038/nmeth.2375


Received: 23 October 2012
Accepted: 18 January 2013
Published: 24 February 2013
Issue Date: April 2013
DOI: https://doi.org/10.1038/nmeth.2375
