Integration of virtual and high-throughput screening

Bajorath, Jürgen

doi:10.1038/nrd941

Review Article
Published: 01 November 2002

Jürgen Bajorath^1,2

Nature Reviews Drug Discovery volume 1, pages 882–894 (2002)

Key Points

High-throughput screening (HTS) and virtual screening (VS) have progressed rather independently over the years. However, these disciplines have similar goals and are highly complementary. There are good indications that drug discovery research will increasingly benefit from an integrated approach to screening.
A diverse array of VS methods has been developed, including structural queries, pharmacophores, molecular fingerprints, QSAR models, diverse cluster analysis tools, statistical techniques and docking calculations. In addition, VS techniques have been implemented to filter large databases for compounds with desired or undesired chemical groups, drug-like character, preferred solubility and absorption characteristics, or oral bioavailability.
Both small-molecule- and structure-based VS have recently produced several success stories in the search for novel inhibitors or antagonists of diverse biological targets.
Some VS methods have been introduced or adapted for the analysis of HTS data, taking into account that such data sets are usually noisy and error prone. Prominent among these methods are different partitioning and clustering algorithms that can derive predictive models of biological activity from screening data.
Similar approaches are used to interface HTS and VS directly. At present, this is best accomplished by the application of iterative screening strategies, such as focused or sequential screening. Although the details of such strategies can differ considerably, they have in common that small subsets of compounds are computationally selected from large databases and assayed. On the basis of the obtained results, the search for biologically active molecules is further refined in subsequent iterations.
In several case studies, sequential screening has yielded significant improvements in hit rates over random screening. It is not uncommon for iterative screening to achieve hit rates between 10% and 40% (by markedly reducing the number of compounds to be tested).
As the size of compound databases and the number of available screening targets rapidly increase, it is conceivable that combined computational and biological screening might soon become a focal point of pharmaceutical research, despite the advances that are being made in the HTS arena towards even higher throughput.
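
The iterative strategy described in the points above can be illustrated with a small sketch. It is a toy example under stated assumptions, not the procedure of any cited study: fingerprints are represented as sets of on-bit positions, the next subset is chosen by nearest-neighbour Tanimoto similarity to the hits found so far (one of many possible selection rules), and the names `sequential_screen`, `hit_rate`, `library` and `actives` are hypothetical. The `assay` callable stands in for the biological experiment.

```python
from typing import Callable, Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # positions of the bits that are set on


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def sequential_screen(
    library: Dict[str, Fingerprint],
    assay: Callable[[str], bool],
    seed_ids: List[str],
    batch_size: int,
    rounds: int,
) -> Dict[str, bool]:
    """Assay a seed subset, then repeatedly assay the untested compounds
    most similar to the hits found so far. Returns id -> assay outcome."""
    results = {cid: assay(cid) for cid in seed_ids}
    for _ in range(rounds):
        hits = [cid for cid, active in results.items() if active]
        untested = [cid for cid in library if cid not in results]
        if not hits or not untested:
            break  # nothing to learn from, or nothing left to test

        def score(cid: str) -> float:
            # Best similarity of this compound to any known hit.
            return max(tanimoto(library[cid], library[h]) for h in hits)

        ranked = sorted(untested, key=score, reverse=True)
        for cid in ranked[:batch_size]:
            results[cid] = assay(cid)
    return results


def hit_rate(results: Dict[str, bool]) -> float:
    """Fraction of assayed compounds that were active."""
    return sum(results.values()) / len(results) if results else 0.0


# Invented toy data: three structurally related actives among eight compounds.
library = {
    "A1": frozenset({1, 2, 3, 4}),
    "A2": frozenset({1, 2, 3, 5}),
    "A3": frozenset({1, 2, 4, 5}),
    "B1": frozenset({10, 11, 12}),
    "B2": frozenset({10, 11, 13}),
    "B3": frozenset({20, 21, 22}),
    "B4": frozenset({30, 31, 32}),
    "B5": frozenset({40, 41}),
}
actives = {"A1", "A2", "A3"}

results = sequential_screen(
    library, lambda cid: cid in actives, ["A1", "B1"], batch_size=2, rounds=1
)
print(sorted(results), hit_rate(results))
```

In this invented data set, one refinement round assays four of the eight compounds and finds all three actives (a hit rate of 0.75, against 0.375 for the library as a whole). Real screening data are noisy, so the figures serve only to show the mechanics of selecting, assaying and refining.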

Abstract

High-throughput and virtual screening are important components of modern drug discovery research. Typically, these screening technologies are considered distinct approaches, as one is experimental and the other is theoretical in nature. However, given their similar tasks and goals, these approaches are much more complementary to each other than often thought. Various statistical, informatics and filtering methods have recently been introduced to foster the integration of experimental and in silico screening and maximize their output in drug discovery. Although many of these ideas and efforts have not yet proceeded much beyond the conceptual level, there are several success stories and good indications that early-stage drug discovery will benefit greatly from a more unified and knowledge-based approach to biological screening, despite the many technical advances towards even higher throughput that are being made in the screening arena.

**Figure 1: Representative molecular descriptors and their classification.**

**Figure 2: Different methods and tools for virtual screening.**

**Figure 3: Recognition of remote-similarity relationships.**

**Figure 4: Clustering versus partitioning.**

**Figure 5: Generation of low-dimensional chemical spaces for cell-based partitioning.**

**Figure 6: Results of a virtual-screening benchmark 'experiment'.**

**Figure 7: A two-descriptor model for passive absorption.**

**Figure 8: Structural similarity versus biological activity.**

**Figure 9: Strategies for sequential screening.**

Acknowledgements

The author is grateful to F. Stahura for critical review of the manuscript and help with illustrations.

Author information

Authors and Affiliations

Department of Computer-Aided Drug Discovery, Albany Molecular Research, Inc. (AMRI), Bothell Research Center (AMRI–BRC), 18804 North Creek Parkway, Bothell, 98011, Washington, USA
Jürgen Bajorath
Department of Biological Structure, University of Washington, Seattle, 98195, Washington, USA
Jürgen Bajorath

Glossary

SUBSTRUCTURE: A defined structural fragment of a molecule.
PHARMACOPHORE: The spatial arrangement of chemical groups or features in a molecule that are known or thought to determine its activity. The most popular pharmacophore models consist of three or four points separated by defined distance ranges. In most cases, pharmacophore geometry is not known from experiment, but is predicted.
MOLECULAR GRAPH: A two-dimensional representation of the connectivity pattern in a molecule, with atoms shown as vertices and bonds as edges.
QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIP (QSAR): QSAR analysis refers to methods that relate structural features of molecules to biological activity in quantitative terms. In most cases, QSAR analysis attempts to establish linear relationships between selected structural features in a series of related molecules and their known level of activity. If successful, models derived from training sets can be applied to predict molecules with higher potency.
BINARY BIT STRING: A series of 1 or 0 characters. Each bit position is either set 'on' (that is, set to 1) or 'off' (0), and can account for the presence or absence of a specific feature.
TANIMOTO COEFFICIENT: The most popular metric for the quantitative comparison of binary molecular fingerprints. This coefficient is defined as T_c = b_c/(b_1 + b_2 − b_c). In this formulation, b_1 represents the number of bits that are set on in the first fingerprint, b_2 is the number of bits that are set on in the second fingerprint, and b_c is the number of bits common to both fingerprints. If the T_c value is 1, then the compared fingerprints are identical.
COMBINATORIAL PROBLEM: As used here, the term describes the situation that the number of possible pairwise comparisons c grows with the number of objects n according to the formula c = n(n − 1)/2. So, if n becomes increasingly large, methods that rely on pairwise comparisons of, for example, database molecules become computationally infeasible.
BINNING: This process divides coordinate axes into intervals (typically of equal size). If binning is applied to the axes of 2D and 3D coordinate systems, grids and cells are obtained, respectively.
NEURAL NETWORK: Artificial neural networks are collections of mathematical models that are interconnected and organized in different layers. Given this architecture, the models correspond to neurons and the connections to synapses of the nervous system. Neural network simulations are analogous to an adaptive learning process. So, neural nets are typically trained to distinguish between different objects and their properties in learning sets, and the resulting models are then applied to make predictions on test sets.
QUANTITATIVE STRUCTURE–PROPERTY RELATIONSHIP (QSPR): A variation of the QSAR approach, in which structural features of molecules are not quantitatively related to biological activity, but instead to physical properties, such as aqueous solubility or passive absorption.
DRUG-LIKE: The concept of 'drug-likeness' is based on the premise that drugs share specific molecular characteristics that systematically distinguish them from other synthetic or natural compounds.
LOGP(O/W): The logarithm of the octanol/water partition coefficient (often abbreviated logP) describes the solubility of a compound in octanol (hydrophobic solvent) relative to its solubility in water (polar solvent).
RULE-OF-FIVE: On the basis of statistical analysis of known drugs, candidate compounds are likely to have unfavourable absorption, permeation and bioavailability characteristics if they contain more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a logP greater than 5 and/or a molecular mass of more than 500 Da.
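As a filter, the rule amounts to four threshold checks. The sketch below assumes that the four property values have already been calculated elsewhere; the `Compound` class, its field names and the two example compounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    h_bond_donors: int
    h_bond_acceptors: int
    logp: float
    mol_mass: float  # in Da


def rule_of_five_violations(c: Compound) -> list:
    """Return the rule-of-five criteria that a compound exceeds."""
    violations = []
    if c.h_bond_donors > 5:
        violations.append("more than 5 hydrogen-bond donors")
    if c.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen-bond acceptors")
    if c.logp > 5:
        violations.append("logP greater than 5")
    if c.mol_mass > 500:
        violations.append("molecular mass of more than 500 Da")
    return violations


# Hypothetical property values, for illustration only
small = Compound(h_bond_donors=2, h_bond_acceptors=5, logp=3.1, mol_mass=342.4)
large = Compound(h_bond_donors=7, h_bond_acceptors=12, logp=6.3, mol_mass=712.9)

print(rule_of_five_violations(small))       # []
print(len(rule_of_five_violations(large)))  # 4
```

How many violations should lead to rejection of a compound is a design choice left to the caller.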
PRINCIPAL COMPONENT ANALYSIS (PCA): A mathematical method that captures the variance in a data set with respect to chosen variables, and transforms correlated variables into a smaller number of uncorrelated ones for data presentation.
GENETIC ALGORITHM: Computational implementation of a problem-solving approach that uses principles of biological competition and population dynamics. Model parameters are encoded in a 'chromosome', and are varied. Chromosomes yield possible solutions to a given problem by means of a fitness function. Chromosomes that correspond to the best intermediate solutions are subjected to operations that are analogous to gene recombination and mutation to produce the next generation. This process continues until solutions reach a predefined convergence criterion.
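The cycle described above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the chromosome is a list of bits, the fitness function simply counts the bits that are set on, the fitter half of the population is kept as parents, the best chromosome is carried over unchanged, and the run stops when all bits are on or after a fixed number of generations.

```python
import random


def fitness(chromosome):
    """Toy fitness function: the number of bits that are set on."""
    return sum(chromosome)


def evolve(n_bits=20, pop_size=30, mutation_rate=0.02, max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness observed in each generation
    for _ in range(max_generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        if history[-1] == n_bits:  # predefined convergence criterion
            break
        parents = population[: pop_size // 2]  # best intermediate solutions
        children = [population[0][:]]  # keep the current best unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)  # single-point recombination
            child = mother[:cut] + father[cut:]
            # mutation: flip each bit with a small probability
            child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), len(history))
```

In a screening context, the chromosome would instead encode model parameters such as a descriptor selection, and the fitness function would score the resulting model.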
ADME: Absorption, distribution, metabolism and excretion are important effects that determine the in vivo characteristics of drug (candidate) molecules.
DECISION TREE: A data set is successively divided at decision points. At each point, a 'yes' or 'no' decision is made for each object, dividing the data into smaller and smaller subsets along the tree. All objects in a given subset share the same signature of 'yes' or 'no' decisions.
BINARY DESCRIPTORS: These types of descriptor capture two defined states (and not continuous value ranges). Typical examples include a specific substructure or bond pattern. The feature detected by a binary descriptor is either 'present' (state 1) or 'absent' (state 2). Application of binary descriptors allows the classification of molecular data sets by means of decision trees.
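The link between binary descriptors and decision trees can be illustrated with a short sketch. It makes a simplifying assumption: the same sequence of questions is asked along every branch, so each leaf of the tree is identified by its signature of 'yes' or 'no' decisions. The molecule names and descriptor names are hypothetical.

```python
from collections import defaultdict


def partition_by_decisions(objects, questions):
    """Group objects by their signature of yes/no answers to the questions."""
    leaves = defaultdict(list)
    for name, features in objects.items():
        # One binary decision per descriptor: is the feature present?
        signature = tuple("yes" if q in features else "no" for q in questions)
        leaves[signature].append(name)
    return dict(leaves)


# Hypothetical molecules, each described by the set of features it contains
molecules = {
    "mol_A": {"aromatic_ring", "carboxylic_acid"},
    "mol_B": {"aromatic_ring"},
    "mol_C": {"aromatic_ring", "carboxylic_acid"},
    "mol_D": set(),
}
questions = ["aromatic_ring", "carboxylic_acid"]

for signature, members in partition_by_decisions(molecules, questions).items():
    print(signature, members)
```

Here mol_A and mol_C end up in the same subset because they share the signature ('yes', 'yes').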
SCAFFOLD: Often defined as the core structure of a small molecule, the scaffold is typically a ring system that has diverse chemical groups attached. Accordingly, it is obtained by removal of these attached groups.
CHEMOTYPE: A family of molecules that has a unique core structure or scaffold.
PHYLOGENETIC TREE: This classification structure has its origin in biology to describe evolutionary relationships. It classifies a family of objects into 'most-similar' sets by subdividing them at branch points into successively smaller subsets with increasing object similarity. The final subsets represent unique leaves of the tree. Different from a simple decision tree, a phylogenetic tree structure can create multiple branches at each point.
BAYES' THEOREM: A mathematical formulation that determines the probability that a specific result was due to a particular cause, if multiple possible causes exist. For example, a molecular database consists of 50% synthetic reagents, 30% drug-like molecules and 20% natural products. If the activity rates of synthetic compounds, drug-like molecules and natural products are 1%, 50% and 15%, respectively, what is the probability that a given biological activity in this database is represented by a natural product?
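The question posed in this example can be worked through numerically. The sketch below uses only the proportions and activity rates stated above; the variable and function names are hypothetical.

```python
# Proportion of each compound class in the database (prior probabilities)
priors = {
    "synthetic reagents": 0.50,
    "drug-like molecules": 0.30,
    "natural products": 0.20,
}
# Probability that a compound of each class is active
activity_rates = {
    "synthetic reagents": 0.01,
    "drug-like molecules": 0.50,
    "natural products": 0.15,
}


def posterior(cause: str) -> float:
    """P(cause | active) = P(active | cause) P(cause) / sum over all causes."""
    total = sum(priors[c] * activity_rates[c] for c in priors)
    return priors[cause] * activity_rates[cause] / total


# 0.20 * 0.15 / (0.50 * 0.01 + 0.30 * 0.50 + 0.20 * 0.15) = 0.03 / 0.185
print(f"{posterior('natural products'):.1%}")  # 16.2%
```

So, although natural products make up 20% of the database, they account for about 16% of the observed activity, because drug-like molecules are active far more often.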
SIMILARITY PARADOX: In the context of virtual screening (VS), minor chemical modifications of otherwise similar molecules can render them either active or inactive. VS calculations are expected to identify series of molecules that share the same scaffold. However, if only inactive compounds were selected for testing, VS analysis would have 'failed', although a relevant chemotype was identified. This highlights potential problems associated with the selection of only one or a few representative molecules from a series of similar ones.
ANALOGUE: A member of a series of closely related molecules that has only minor chemical modifications that distinguish it from others belonging to this chemotype. Analogues of active molecules are often generated to improve potency and/or other compound characteristics, such as solubility or oral availability.

About this article

Cite this article

Bajorath, J. Integration of virtual and high-throughput screening. Nat Rev Drug Discov 1, 882–894 (2002). https://doi.org/10.1038/nrd941

Issue Date: 01 November 2002
DOI: https://doi.org/10.1038/nrd941
