  • Review Article

Combinatorial informatics in the post-genomics era

Key Points

  • A practical and cost-effective embodiment of a chemogenomics approach to drug discovery involves the following steps:

      • Gene sequences for targets that have been identified by genomics approaches are cloned and expressed as target proteins that are suitable for screening with a probe library of small, drug-like chemical compounds.

      • These compounds are screened to find active hits using a quantitative universal binding assay.

      • Initial hits or quantitative structure–activity data emerging from the binding assay are analysed and used to formulate a selection strategy for the synthesis of additional compounds with improved properties.

      • These compounds are selected from a computer database of synthetically accessible analogues of the initial probe library, constructed using verified synthetic protocols and characterized by an extensive set of computed drug-related molecular properties.

      • The selected compounds are synthesized by parallel-synthesis methods and are subsequently tested to elaborate the structure–activity profile of the target under investigation, and to refine the selection criteria for additional rounds of chemical synthesis and biological testing.

      • In each iteration, priority is assigned to the synthetic candidates using a multiobjective optimization process designed to ensure that compounds are not only optimized for target binding affinity, but also have drug-like characteristics that will allow them to be used directly as tool compounds in appropriate cellular or biological model systems.

  • The potential for improved performance using such a strategy lies in the ability to rapidly follow up on initial hits through intelligent selection of related compounds from a computer database of synthetically accessible analogues with predefined synthesis recipes and predicted property profiles.

  • To address the full spectrum of targets emerging from genomics-based efforts, it will be necessary to physically screen probe libraries that span a wide range of chemotypes and contain hundreds of thousands of compounds.

  • These libraries will derive from synthetic strategies that could, in theory, produce billions of related analogues, which far exceeds the capabilities of conventional chemical-database management systems and data-modelling tools.

  • Thus, the following key questions need to be addressed:

      • How can huge combinatorial libraries be generated, represented, accessed, searched and manipulated?

      • What are the most appropriate chemical-property spaces, and how can they best be computed, sampled, visualized and validated?

      • What are the most effective ways to design, execute and analyse a combinatorial-chemistry experiment?

  • Successful deployment of such a system requires a new generation of computational tools that work effectively on a massive scale.
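
The iterative screen, select, synthesize and test cycle outlined in the key points can be illustrated with a toy loop. This is only a sketch under loud assumptions: the "library" is a range of integers, `toy_assay` stands in for a binding assay, and `toy_rank` stands in for a selection strategy; none of these names or functions come from the article.

```python
import random

def design_cycle(library, assay, rank, rounds=3, batch=4, seed=0):
    """Repeatedly test a batch, then pick the next batch from untested compounds."""
    rng = random.Random(seed)
    tested = {}  # compound -> measured activity
    # A random batch stands in for the initial probe-library screen (assumption).
    batch_ids = rng.sample(sorted(library), batch)
    for _ in range(rounds):
        for compound in batch_ids:
            tested[compound] = assay(compound)
        untested = [c for c in library if c not in tested]
        if not untested:
            break
        # Prioritize untested compounds using what has been measured so far.
        untested.sort(key=lambda c: rank(c, tested), reverse=True)
        batch_ids = untested[:batch]
    return tested

# Hypothetical stand-ins: compounds are integers, activity peaks at compound 70.
library = range(100)

def toy_assay(compound):
    return -abs(compound - 70)

def toy_rank(compound, tested):
    best_so_far = max(tested, key=tested.get)
    return -abs(compound - best_so_far)  # prefer close analogues of the best hit

results = design_cycle(library, toy_assay, toy_rank)
best = max(results, key=results.get)
```

Each round tests only a small batch, so the loop explores 12 of the 100 hypothetical compounds; a real selection strategy would rank on computed properties rather than on an integer distance.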

Abstract

The multitude of potential drug targets emerging from genome sequencing demands new approaches to drug discovery. A chemogenomics strategy, which involves the generation of small-molecule compounds that can be used both as tools to probe biological mechanisms and as leads for drug-property optimization, provides a highly parallel, industrialized solution. Key to the success of this strategy is an integrated suite of chemi-informatics applications that can allow the rapid and directed optimization of chemical compounds with drug-like properties using 'just-in-time' combinatorial chemical synthesis. An effective embodiment of this process requires new computational and data-mining tools that cover all aspects of library generation, compound selection and experimental design, and work effectively on a massive scale.

Figure 1: A practical and cost-effective embodiment of a chemogenomics strategy.
Figure 2: Virtual-library generation.
Figure 3: Representative chemical classes in the 3DP probe library.
Figure 4: Diversity and drug likeness of the 3DP probe library.
Figure 5: Iterative library design.
Figure 6: Multiobjective library design.
Figure 7: Combinatorial networks.

Author information

Corresponding author

Correspondence to Dimitris K. Agrafiotis.

Related links

FURTHER INFORMATION

Chemical Informatics Societies and Professional Organizations

LINKS

3-Dimensional Pharmaceuticals

Glossary

PROBE LIBRARY

A collection of diverse compounds that is aimed at discovering hits across a wide variety of biological targets.

MULTIOBJECTIVE OPTIMIZATION

The solution to a problem that involves the simultaneous optimization of multiple design objectives.
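
One common way to handle several objectives at once is to keep only the candidates that no other candidate beats on every objective (the Pareto front). The sketch below assumes two hypothetical scores per compound, both to be maximized; the compound names and numbers are invented for illustration and this is not claimed to be the method used in the article.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep candidates that are not dominated by any other candidate."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(dominates(other, scores) for other in candidates.values())
    }

# Hypothetical (binding score, drug-likeness score) pairs, higher is better.
candidates = {
    "A": (0.9, 0.2),
    "B": (0.6, 0.7),
    "C": (0.5, 0.5),  # worse than B on both objectives
    "D": (0.2, 0.9),
}
front = pareto_front(candidates)
```

Here compound C drops out because B is better on both scores, while A, B and D each represent a different trade-off.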

VIRTUAL LIBRARY

A computer representation of a collection of chemical compounds.

COMBINATORIAL LIBRARY

A collection of compounds that are derived from the systematic application of a synthetic sequence on a prescribed set of building blocks.

ENUMERATION

The process of constructing the connection tables of the combinatorial products from their respective building blocks, as prescribed by the reaction sequence.

CLIPPED REAGENT

The (potentially modified) part of a reagent that becomes part of the final product.

CONNECTION TABLE

A computer representation of the atoms and bonds that comprise a molecule. This is the computer equivalent of a chemical sketch of a molecule.

LAZY ENUMERATION

The on-demand virtual synthesis of combinatorial products.
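
As a rough illustration of on-demand enumeration, a generator can yield products one at a time instead of building the whole library in memory. The reagent labels below are hypothetical placeholders, and joining labels with a hyphen merely stands in for constructing a real connection table.

```python
from itertools import islice, product

def lazy_products(*reagent_lists):
    """Yield one combinatorial product at a time, never storing the full library."""
    for combo in product(*reagent_lists):
        # A real system would build a connection table here (assumption: labels only).
        yield "-".join(combo)

amines = ["A1", "A2", "A3"]
acids = ["B1", "B2"]
aldehydes = ["C1", "C2"]

# Only the first three of the 3 x 2 x 2 = 12 possible products are generated.
first_three = list(islice(lazy_products(amines, acids, aldehydes), 3))
```

Because nothing is generated until requested, the same pattern works when the reagent lists are large enough to imply billions of products.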

MOLECULAR PERCEPTION

The computational detection of important structural features, such as rings, aromaticity, stereochemistry and topological symmetry, from the molecule's connection table.

MOLECULAR DIVERSITY

The chemical-information content of a collection of compounds. The concept is often context dependent.

GRAPH THEORY

Formally, a connection table for a molecule records its chemical structure as a graph — a set of vertices (the atoms) linked by edges (the bonds). This allows mathematical analyses to be used to classify the structure or calculate molecular properties.
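
A minimal sketch of this idea, assuming a hand-written connection table for the heavy atoms of ethanol (C, C, O), shows how atoms and bonds map onto graph vertices and edges. The data layout is an assumption chosen for brevity, not a standard file format.

```python
# Connection table for the heavy atoms of ethanol: two carbons and one oxygen.
atoms = ["C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1)]  # (atom index, atom index, bond order)

def adjacency(atoms, bonds):
    """Turn the bond list into a graph: each vertex maps to its neighbours."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b, _order in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    return neighbours

graph = adjacency(atoms, bonds)
# Vertex degree (number of bonded heavy atoms) is a simple graph-derived property.
degrees = {i: len(n) for i, n in graph.items()}
```

The middle carbon has degree 2 and the terminal atoms have degree 1; graph-based descriptors build on quantities of this kind.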

FINGERPRINT

A set of binary numbers (1s and 0s) that are used to characterize a molecular structure. Each bit signifies the presence (1) or absence (0) of one or more structural features in the target molecule.
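
The bit-per-feature idea can be sketched as follows. The feature list is hypothetical, and the molecule is represented simply as a set of feature names, since detecting features from a real structure would require a cheminformatics toolkit.

```python
# Hypothetical feature dictionary: one bit position per structural feature.
FEATURES = ["aromatic ring", "amide", "hydroxyl", "halogen"]

def fingerprint(features_present):
    """Return a list of bits: 1 if the feature is present, 0 if absent."""
    return [1 if feature in features_present else 0 for feature in FEATURES]

# A made-up molecule that contains an amide and a hydroxyl group.
bits = fingerprint({"amide", "hydroxyl"})
```

Real fingerprints typically use hundreds or thousands of bit positions, but the encoding principle is the same.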

PHARMACOPHORE

The ensemble of steric and electronic features that are necessary to ensure optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological function. Only molecules that interact at the same receptor site in the same way share a common pharmacophore.

DRUG LIKENESS

The thesis that drugs have certain common properties that differentiate them from other ordinary chemicals.

LOG P

The octanol/water partition coefficient is the ratio of the compound's solubility in octanol to its solubility in water. The logarithm of this partition coefficient is called log P. It provides an estimate of the compound's ability to pass through a cell membrane.

OUTLIER

A point that, because of observation noise, does not follow the characteristics of the input (or desired response) data.

CURSE OF DIMENSIONALITY

The sparsity of data in higher dimensions.
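
A small calculation makes the sparsity concrete: if each axis of a property space is divided into a fixed number of bins, the number of cells grows exponentially with the number of dimensions, so a fixed number of compounds can occupy only a shrinking fraction of them. The function name and the numbers are illustrative assumptions.

```python
def max_occupied_fraction(n_points, bins_per_axis, n_dims):
    """Upper bound on the fraction of cells that n_points can occupy."""
    cells = bins_per_axis ** n_dims
    return min(1.0, n_points / cells)

# One million hypothetical compounds, 10 bins per axis.
in_6d = max_occupied_fraction(1_000_000, 10, 6)    # 10**6 cells
in_10d = max_occupied_fraction(1_000_000, 10, 10)  # 10**10 cells
```

In six dimensions the compounds could in principle fill every cell, whereas in ten dimensions they can cover at most one cell in ten thousand.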

QUADRATIC COMPLEXITY

Quadratic complexity means that if the size of the problem doubles, the computational time that is required by the algorithm to solve it quadruples. The complexity (or order) of an algorithm is an important criterion for comparing algorithms that involve the analysis of large data sets.
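
Comparing every compound with every other compound is a typical source of quadratic cost. The sketch below simply counts the pairwise comparisons; the function name is an illustrative assumption.

```python
def pairwise_comparisons(n):
    """Number of distinct pairs among n items: n(n - 1)/2."""
    return n * (n - 1) // 2

small = pairwise_comparisons(1000)
large = pairwise_comparisons(2000)
ratio = large / small  # close to 4 when the problem size doubles
```

Doubling the collection from 1,000 to 2,000 compounds raises the number of comparisons from 499,500 to 1,999,000, roughly a fourfold increase.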

LIPINSKI RULE OF 5

For compounds that are not substrates of biological transporters, poor absorption and permeation are more likely to occur when there are more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, the molecular mass is greater than 500 Da, or the log P is greater than 5.
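
The four thresholds translate directly into a simple check. This is a minimal sketch; the function name and the example property values are hypothetical and do not describe any real compound.

```python
def rule_of_five_violations(h_donors, h_acceptors, mol_mass, log_p):
    """Count how many of the four rule-of-5 thresholds a compound exceeds."""
    return sum([
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
        mol_mass > 500,     # molecular mass above 500 Da
        log_p > 5,          # log P above 5
    ])

# Hypothetical property values for two made-up compounds.
small_polar = rule_of_five_violations(h_donors=2, h_acceptors=5, mol_mass=350.0, log_p=3.1)
large_greasy = rule_of_five_violations(h_donors=6, h_acceptors=12, mol_mass=650.0, log_p=5.5)
```

The first compound exceeds none of the thresholds and the second exceeds all four; values exactly at a threshold (for example, a mass of 500 Da) do not count as violations under the wording above.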

BIOISOSTERISM

The idea that a chemical group in a biologically active molecule can be replaced by another chemical group without loss of activity.

COMBINATORIAL OPTIMIZATION

The number of different combinations of k objects out of a set of n objects is given by the binomial coefficient C(n, k) = n!/[(n − k)! k!]. This can be used to calculate the number of distinct k1×k2×...×kd combinatorial arrays in an n1×n2×...×nd combinatorial library. For example, there are approximately 10^40 different 10×10×10 arrays in a 100×100×100 library.
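
The array count quoted above can be checked with a few lines of code, assuming that an array is formed by independently choosing ki reagents out of ni at each position, so the counts multiply. The function name is an illustrative assumption.

```python
import math

def count_arrays(library_dims, array_dims):
    """Number of distinct k1 x ... x kd arrays in an n1 x ... x nd library."""
    return math.prod(math.comb(n, k) for n, k in zip(library_dims, array_dims))

# 10 x 10 x 10 arrays drawn from a 100 x 100 x 100 library.
n_arrays = count_arrays((100, 100, 100), (10, 10, 10))
order_of_magnitude = math.log10(n_arrays)
```

The result is about 5 × 10^39, which is consistent with the "approximately 10^40" figure in the definition.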

COMBINATORIAL NEURAL NETWORK

A combinatorial neural network (CNN) is a neural network that is trained to predict molecular properties of combinatorial products from pertinent features of their respective building blocks.

FEATURE SELECTION

A computational technique that attempts to identify a small subset of features that are most relevant to a particular machine learning task.

About this article

Cite this article

Agrafiotis, D., Lobanov, V. & Salemme, F. Combinatorial informatics in the post-genomics era. Nat Rev Drug Discov 1, 337–346 (2002). https://doi.org/10.1038/nrd791

  • DOI: https://doi.org/10.1038/nrd791
