Quantifying biogenic bias in screening libraries


In lead discovery, libraries of 10⁶ molecules are screened for biological activity. Given that more than 10⁶⁰ drug-like molecules are thought possible, such screens might be expected never to succeed. The fact that they do, even occasionally, implies a biased selection of library molecules. We have developed a method to quantify the bias in screening libraries toward biogenic molecules. With this approach, we consider what is missing from screening libraries and how they can be optimized.
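The scale argument above can be made concrete with a back-of-envelope calculation. The sketch below uses only the two figures quoted in the abstract (10⁶ library molecules, 10⁶⁰ possible drug-like molecules). It assumes, hypothetically, that a library is a uniform random sample of drug-like space, and the number of actives per target (`n_actives`) is an illustrative placeholder, not a value from the article. The function names are hypothetical helpers for illustration only.

```python
from fractions import Fraction

# Figures quoted in the abstract.
LIBRARY_SIZE = 10**6       # molecules in a typical screening library
DRUG_LIKE_SPACE = 10**60   # drug-like molecules thought possible


def sampled_fraction(library_size: int, space_size: int) -> Fraction:
    """Exact fraction of drug-like space that a library covers."""
    return Fraction(library_size, space_size)


def expected_random_hits(library_size: int, space_size: int, n_actives: int) -> Fraction:
    """Expected number of actives in a library drawn uniformly at random
    from drug-like space (hypergeometric mean: library_size * n_actives / space_size).

    n_actives is a hypothetical input; the article gives no such figure.
    """
    return Fraction(library_size * n_actives, space_size)


if __name__ == "__main__":
    frac = sampled_fraction(LIBRARY_SIZE, DRUG_LIKE_SPACE)
    print(f"fraction of space screened: {float(frac):.0e}")
    # Even granting a (hypothetical) billion actives for a target,
    # an unbiased random library would be expected to contain essentially none.
    hits = expected_random_hits(LIBRARY_SIZE, DRUG_LIKE_SPACE, n_actives=10**9)
    print(f"expected hits from a random library: {float(hits):.0e}")
```

Exact rational arithmetic (`Fraction`) is used so that ratios like 10⁶/10⁶⁰ are not subject to floating-point rounding. Under these assumptions a random library would essentially never contain an active, which is the gap the abstract attributes to biased library selection.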


Figure 1
Figure 2: Compounds in screening libraries are biased toward biogenic molecules.
Figure 3: Biogenic bias increases with molecular size.
Figure 4: Core ring structures common among drugs and related molecules.




This work was supported by US National Institutes of Health grant GM59957 to B.K.S. J.H. was supported by a Marie Curie fellowship from the 6th Framework Program of the European Commission; M.J.K. was supported by a US National Science Foundation graduate fellowship; C.L. was supported by a fellowship from the Max Kade Foundation.

Author information




The project was conceived by J.H. and B.K.S. J.H. undertook most of the calculations, with molecular proof checking by J.J.I. and C.L. and algorithmic assistance from M.J.K. J.H. and B.K.S. wrote the manuscript, which was read and commented on by the other authors.

Corresponding author

Correspondence to Brian K Shoichet.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–4 and Supplementary Tables 1–3 (PDF 306 kb)


About this article

Cite this article

Hert, J., Irwin, J., Laggner, C. et al. Quantifying biogenic bias in screening libraries. Nat Chem Biol 5, 479–483 (2009). https://doi.org/10.1038/nchembio.180
