Predicting the molecular complexity of sequencing libraries


Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
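To make the prediction problem concrete, the sketch below uses the classical Good-Toulmin estimator rather than the empirical Bayesian method introduced here. From a shallow run it tabulates how many distinct molecules were seen exactly j times (n_j), then estimates the number of new distinct molecules expected from an additional t-fold sequencing effort as the alternating sum of (-1)^(j+1) t^j n_j. The function names and the toy read labels are hypothetical, and the series is only reliable for t up to 1, which is part of why extrapolating far beyond the initial experiment is difficult.

```python
from collections import Counter


def counts_histogram(molecule_ids):
    """Return {j: n_j}: the number of distinct molecules observed exactly j times."""
    per_molecule = Counter(molecule_ids)          # molecule -> times observed
    return dict(Counter(per_molecule.values()))   # j -> number of molecules seen j times


def good_toulmin_new_distinct(hist, t):
    """Estimate new distinct molecules from t-fold additional sequencing.

    Classical Good-Toulmin series: sum_j (-1)^(j+1) * t^j * n_j.
    Assumption: 0 <= t <= 1; the raw series is unstable beyond that range.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("the raw Good-Toulmin series is only reliable for 0 <= t <= 1")
    return sum((-1) ** (j + 1) * (t ** j) * n_j for j, n_j in hist.items())


# Toy example: each label stands for one sequenced read of a hypothetical molecule.
reads = ["a", "a", "b", "c", "c", "c", "d"]
hist = counts_histogram(reads)
extra_half = good_toulmin_new_distinct(hist, 0.5)  # half as many reads again
extra_full = good_toulmin_new_distinct(hist, 1.0)  # doubling the experiment
```

In this toy data two molecules are seen once, one twice and one three times, so doubling the experiment is estimated to yield 2 - 1 + 1 = 2 new distinct molecules.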

Figure 1: Difficulties in predicting library complexity from initial shallow sequencing.
Figure 2: Library complexity can be estimated in terms of distinct molecules sequenced or distinct loci identified.



We thank S. Tavaré, M. Waterman, P. Calabrese, G. Hannon, and members of the Hannon lab and the Smith lab for their help, advice and input. This work was supported by US National Institutes of Health National Human Genome Research Institute grants (R01 HG005238 and P50 HG002790).

Author information




T.D. and A.D.S. designed the method, implemented the software, performed the analysis and wrote the manuscript.

Corresponding author

Correspondence to Andrew D Smith.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Note, Supplementary Figures 1–2, Supplementary Tables 2–3 (PDF 4640 kb)

Supplementary Table 1

Properties of data sets used in evaluating estimates of library complexity. (XLSX 50 kb)

Supplementary Software

Preseq source code and manual. (ZIP 165 kb)

About this article

Cite this article

Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013).
