Combinatorial informatics in the post-genomics era

Agrafiotis, Dimitris K.; Lobanov, Victor S.; Salemme, F. Raymond

doi:10.1038/nrd791

Review Article
Published: 01 May 2002

Nature Reviews Drug Discovery volume 1, pages 337–346 (2002)

Key Points

A practical and cost-effective embodiment of a chemogenomics approach to drug discovery involves the following steps:
- Gene sequences for targets that have been identified by genomics approaches are cloned and expressed as target proteins that are suitable for screening with a probe library of small, drug-like chemical compounds.
- These compounds are screened to find active hits using a quantitative universal binding assay.
- Initial hits or quantitative structure–activity data emerging from the binding assay are analysed and used to formulate a selection strategy for the synthesis of additional compounds with improved properties.
- These compounds are selected from a computer database of synthetically accessible analogues of the initial probe library, constructed using verified synthetic protocols and characterized by an extensive set of computed drug-related molecular properties.
- The selected compounds are synthesized by parallel-synthesis methods and are subsequently tested to elaborate the structure–activity profile of the target under investigation, and refine the selection criteria for additional rounds of chemical synthesis and biological testing.
- In each iteration, priority is assigned to the synthetic candidates using a multiobjective optimization process designed to assure that compounds are not only optimized for target binding affinity, but also have drug-like characteristics that will allow them to be used directly as tool compounds in appropriate cellular or biological model systems.
The potential for improved performance using such a strategy lies in the ability to rapidly follow up on initial hits through intelligent selection of related compounds from a computer database of synthetically accessible analogues with predefined synthesis recipes and predicted property profiles.
To address the full spectrum of targets emerging from genomics-based efforts, it will be necessary to physically screen probe libraries that span a wide range of chemotypes and contain hundreds of thousands of compounds.
These libraries will derive from synthetic strategies that could, in theory, produce billions of related analogues, which far exceeds the capabilities of conventional chemical-database management systems and data-modelling tools.
Thus, the following key questions need to be addressed:
- How can huge combinatorial libraries be generated, represented, accessed, searched and manipulated?
- What are the most appropriate chemical-property spaces, and how can they best be computed, sampled, visualized and validated?
- What are the most effective ways to design, execute and analyse a combinatorial-chemistry experiment?
Successful deployment of such a system requires a new generation of computational tools that work effectively on a massive scale.
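
The prioritization step in the key points above, in which synthetic candidates are ranked on binding affinity and drug-like character at the same time, can be illustrated with a minimal Python sketch. It is not the authors' method: the weighted-sum scoring, the `Candidate` fields, the weights and all numerical values are assumptions chosen only to show how two objectives might be traded off when picking the next batch from a virtual library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical entry in a virtual library of accessible analogues."""
    name: str
    predicted_affinity: float  # assumed score in [0, 1], higher is better
    drug_likeness: float       # assumed score in [0, 1], higher is better


def priority(c: Candidate, w_affinity: float = 0.5, w_drug: float = 0.5) -> float:
    """Collapse the two objectives into one score (weighted sum, an assumption)."""
    return w_affinity * c.predicted_affinity + w_drug * c.drug_likeness


def select_for_synthesis(candidates, n, **weights):
    """Return the n highest-priority candidates for the next synthesis round."""
    ranked = sorted(candidates, key=lambda c: priority(c, **weights), reverse=True)
    return ranked[:n]


# Illustrative values only; they are not taken from the article.
virtual_library = [
    Candidate("analogue-A", 0.90, 0.20),
    Candidate("analogue-B", 0.70, 0.80),
    Candidate("analogue-C", 0.40, 0.90),
    Candidate("analogue-D", 0.85, 0.60),
]

batch = select_for_synthesis(virtual_library, n=2)
print([c.name for c in batch])
```

With equal weights the potent but poorly drug-like analogue-A is passed over in favour of more balanced candidates; setting `w_drug=0.0` would rank on affinity alone. A weighted sum is only one of several ways to combine objectives, and the weights themselves would need to be justified for a real campaign.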

Abstract

The multitude of potential drug targets emerging from genome sequencing demands new approaches to drug discovery. A chemogenomics strategy, which involves the generation of small-molecule compounds that can be used both as tools to probe biological mechanisms and as leads for drug-property optimization, provides a highly parallel, industrialized solution. Key to the success of this strategy is an integrated suite of chemi-informatics applications that can allow the rapid and directed optimization of chemical compounds with drug-like properties using 'just-in-time' combinatorial chemical synthesis. An effective embodiment of this process requires new computational and data-mining tools that cover all aspects of library generation, compound selection and experimental design, and work effectively on a massive scale.

**Figure 1: A practical and cost-effective embodiment of a chemogenomics strategy.**

**Figure 2: Virtual-library generation.**

**Figure 3: Representative chemical classes in the 3DP probe library.**

**Figure 4: Diversity and drug likeness of the 3DP probe library.**

**Figure 6: Multiobjective library design.**

Author information

Authors and Affiliations

3-Dimensional Pharmaceuticals, Inc., 665 Stockton Drive, Exton, 19341, Pennsylvania, USA
Dimitris K. Agrafiotis, Victor S. Lobanov & F. Raymond Salemme

Corresponding author

Correspondence to Dimitris K. Agrafiotis.

Glossary

PROBE LIBRARY: A collection of diverse compounds that is aimed at discovering hits across a wide variety of biological targets.
MULTIOBJECTIVE OPTIMIZATION: The solution to a problem that involves the simultaneous optimization of multiple design objectives.
VIRTUAL LIBRARY: A computer representation of a collection of chemical compounds.
COMBINATORIAL LIBRARY: A collection of compounds that are derived from the systematic application of a synthetic sequence on a prescribed set of building blocks.
ENUMERATION: The process of constructing the connection tables of the combinatorial products from their respective building blocks, as prescribed by the reaction sequence.
CLIPPED REAGENT: The (potentially modified) part of a reagent that becomes part of the final product.
CONNECTION TABLE: A computer representation of the atoms and bonds that comprise a molecule. This is the computer equivalent of a chemical sketch of a molecule.
LAZY ENUMERATION: The on-demand virtual synthesis of combinatorial products.
MOLECULAR PERCEPTION: The computational detection of important structural features, such as rings, aromaticity, stereochemistry and topological symmetry, from the molecule's connection table.
MOLECULAR DIVERSITY: The chemical-information content of a collection of compounds. The concept is often context dependent.
GRAPH THEORY: Formally, a connection table for a molecule records its chemical structure as a graph — a set of vertices (the atoms) linked by edges (the bonds). This allows mathematical analyses to be used to classify the structure or calculate molecular properties.
FINGERPRINT: A set of binary numbers (1s and 0s) that are used to characterize a molecular structure. Each bit signifies the presence (1) or absence (0) of one or more structural features in the target molecule.
PHARMACOPHORE: The ensemble of steric and electronic features that are necessary to ensure optimal supramolecular interactions with a specific biological target and to trigger (or block) its biological function. Only molecules that interact at the same receptor site in the same way share a common pharmacophore.
DRUG LIKENESS: The thesis that drugs have certain common properties that differentiate them from other ordinary chemicals.
LOG P: The octanol/water partition coefficient is the ratio of the compound's solubility in octanol to its solubility in water. The logarithm of this partition coefficient is called log P. It provides an estimate of the compound's ability to pass through a cell membrane.
OUTLIER: A point that, because of observation noise, does not follow the characteristics of the input (or desired response) data.
CURSE OF DIMENSIONALITY: The sparsity of data in higher dimensions.
QUADRATIC COMPLEXITY: Quadratic complexity means that if the size of the problem doubles, the computational time that is required by the algorithm to solve it quadruples. The complexity (or order) of an algorithm is an important criterion for comparing algorithms that involve the analysis of large data sets.
LIPINSKI RULE OF 5: For compounds that are not substrates of biological transporters, poor absorption and permeation are more likely to occur when there are more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, the molecular mass is greater than 500 Da, or the log P is greater than 5.
BIOISOSTERISM: The idea that a chemical group in a biologically active molecule can be replaced by another chemical group without loss of activity.
COMBINATORIAL OPTIMIZATION: The number of different combinations of $k$ objects out of a set of $n$ objects is given by the binomial coefficient $\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$. This can be used to calculate the number of distinct $k_1 \times k_2 \times \cdots \times k_d$ combinatorial arrays in an $n_1 \times n_2 \times \cdots \times n_d$ combinatorial library, namely $\prod_{i=1}^{d} \binom{n_i}{k_i}$. For example, there are approximately $10^{40}$ different $10 \times 10 \times 10$ arrays in a $100 \times 100 \times 100$ library.
COMBINATORIAL NEURAL NETWORK (CNN): A neural network that is trained to predict molecular properties of combinatorial products from pertinent features of their respective building blocks.
FEATURE SELECTION: A computational technique that attempts to identify a small subset of features that are most relevant to a particular machine learning task.
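
Two of the glossary entries above, COMBINATORIAL OPTIMIZATION and LAZY ENUMERATION, lend themselves to a short worked example. The Python sketch below counts the possible arrays in a library by multiplying binomial coefficients, and generates products on demand instead of storing them. The function names, the reagent labels and the string join that stands in for building a real connection table are all hypothetical simplifications, not part of the article.

```python
from itertools import islice, product
from math import comb, log10


def count_arrays(library_dims, array_dims):
    """Number of distinct k1 x ... x kd arrays in an n1 x ... x nd library."""
    total = 1
    for n, k in zip(library_dims, array_dims):
        total *= comb(n, k)  # ways to choose k reagents out of n at this position
    return total


def lazy_products(*reagent_lists):
    """Yield products one at a time; nothing is held in memory up front.

    Joining labels with '-' is a placeholder for real virtual synthesis.
    """
    for combo in product(*reagent_lists):
        yield "-".join(combo)


# The glossary example: 10 x 10 x 10 arrays in a 100 x 100 x 100 library.
n_arrays = count_arrays((100, 100, 100), (10, 10, 10))
print(f"roughly 10^{log10(n_arrays):.1f} possible arrays")

# Hypothetical reagent labels for a three-component library.
amines = [f"amine{i}" for i in range(100)]
acids = [f"acid{i}" for i in range(100)]
aldehydes = [f"aldehyde{i}" for i in range(100)]

# Only the first three of the 1,000,000 products are ever constructed here.
first_three = list(islice(lazy_products(amines, acids, aldehydes), 3))
print(first_three)
```

The count lands just under $10^{40}$, consistent with the glossary's approximate figure, which is why exhaustive search over arrays is not an option. The generator shows the other half of the idea: a library can be searched or sampled without ever materializing every product.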

About this article

Cite this article

Agrafiotis, D., Lobanov, V. & Salemme, F. Combinatorial informatics in the post-genomics era. Nat Rev Drug Discov 1, 337–346 (2002). https://doi.org/10.1038/nrd791

Issue Date: 01 May 2002
DOI: https://doi.org/10.1038/nrd791

This article is cited by

Natural products in modern life science
- Lars Bohlin
- Ulf Göransson
- Anders Backlund
Phytochemistry Reviews (2010)
Smart drug discovery leveraging innovative technologies and predictive knowledge
- Tomi K Sawyer
Nature Chemical Biology (2006)
Molecular similarity and diversity in chemoinformatics: From theory to applications
- Ana G. Maldonado
- J. P. Doucet
- Bo-Tao Fan
Molecular Diversity (2006)
