  • Review Article

Integration of virtual and high-throughput screening

Key Points

  • High-throughput screening (HTS) and virtual screening (VS) have progressed rather independently over the years. However, these disciplines have similar goals and are highly complementary. There are good indications that drug discovery research will increasingly benefit from an integrated approach to screening.

  • A diverse array of VS methods has been developed, including structural queries, pharmacophores, molecular fingerprints, QSAR models, diverse cluster analysis tools, statistical techniques and docking calculations. In addition, VS techniques have been implemented to filter large databases for compounds with desired or undesired chemical groups, drug-like character, preferred solubility and absorption characteristics, or oral bioavailability.

  • Both small-molecule- and structure-based VS have recently produced several success stories in the search for novel inhibitors or antagonists of diverse biological targets.

  • Some VS methods have been introduced or adapted for the analysis of HTS data, taking into account that such data sets are usually noisy and error prone. Prominent among these methods are different partitioning and clustering algorithms that can derive predictive models of biological activity from screening data.

  • Similar approaches are used to interface HTS and VS directly. At present, this is best accomplished by the application of iterative screening strategies, such as focused or sequential screening. Although the details of such strategies can differ considerably, they have in common that small subsets of compounds are computationally selected from large databases and assayed. On the basis of the obtained results, the search for biologically active molecules is further refined in subsequent iterations.

  • In several case studies, sequential screening has yielded significant improvements in hit rates over random screening. It is not uncommon for iterative screening to achieve hit rates between 10% and 40% while markedly reducing the number of compounds to be tested.

  • As the size of compound databases and the number of available screening targets rapidly increase, it is conceivable that combined computational and biological screening might soon become a focal point of pharmaceutical research, despite the advances that are being made in the HTS arena towards even higher throughput.
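The iterative strategy outlined in the key points (select a small subset computationally, assay it, refine the search using the results) can be sketched roughly as follows. This is a toy illustration only: the function `sequential_screen`, the set-based fingerprints, the library contents and the stand-in `toy_assay` are all hypothetical, and ranking untested compounds by similarity to known hits is just one of many possible selection schemes.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sequential_screen(library, assay, initial_ids, batch_size, n_rounds):
    """Assay a small batch per round; pick each new batch by similarity to hits so far."""
    tested = {}                      # compound id -> True (active) / False (inactive)
    batch = list(initial_ids)        # round 1 uses a caller-supplied starting subset
    for _ in range(n_rounds):
        for cid in batch:
            tested[cid] = assay(cid)
        hits = [cid for cid, active in tested.items() if active]
        untested = [cid for cid in library if cid not in tested]
        if not hits or not untested:
            break                    # nothing to learn from, or nothing left to test
        # Rank untested compounds by their highest similarity to any known hit.
        untested.sort(
            key=lambda cid: max(tanimoto(library[cid], library[h]) for h in hits),
            reverse=True,
        )
        batch = untested[:batch_size]
    return tested


# Hypothetical six-compound library; each fingerprint is a set of feature indices.
library = {
    "c1": frozenset({1, 2, 3}),
    "c2": frozenset({1, 2, 4}),
    "c3": frozenset({7, 8, 9}),
    "c4": frozenset({1, 2, 5}),
    "c5": frozenset({8, 9, 10}),
    "c6": frozenset({3, 11, 12}),
}


def toy_assay(cid):
    """Stand-in for an experiment: 'active' if features 1 and 2 are both present."""
    return {1, 2} <= library[cid]


results = sequential_screen(library, toy_assay, ["c1", "c3"], batch_size=2, n_rounds=2)
found = sorted(cid for cid, active in results.items() if active)
print(f"tested {len(results)} of {len(library)} compounds, hits: {found}")
```

In this toy setting, the second round is steered towards compounds that resemble the first hit, so fewer compounds need to be assayed than in an exhaustive screen. Real sequential screening would replace `toy_assay` with experimental data and would typically use a statistical model rather than a plain similarity ranking.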

Abstract

High-throughput and virtual screening are important components of modern drug discovery research. Typically, these screening technologies are considered distinct approaches, as one is experimental and the other is theoretical in nature. However, given their similar tasks and goals, these approaches are much more complementary to each other than often thought. Various statistical, informatics and filtering methods have recently been introduced to foster the integration of experimental and in silico screening and maximize their output in drug discovery. Although many of these ideas and efforts have not yet proceeded much beyond the conceptual level, there are several success stories and good indications that early-stage drug discovery will benefit greatly from a more unified and knowledge-based approach to biological screening, despite the many technical advances towards even higher throughput that are made in the screening arena.

Figure 1: Representative molecular descriptors and their classification.
Figure 2: Different methods and tools for virtual screening.
Figure 3: Recognition of remote-similarity relationships.
Figure 4: Clustering versus partitioning.
Figure 5: Generation of low-dimensional chemical spaces for cell-based partitioning.
Figure 6: Results of a virtual-screening benchmark 'experiment'.
Figure 7: A two-descriptor model for passive absorption.
Figure 8: Structural similarity versus biological activity.
Figure 9: Strategies for sequential screening.
Figure 10: Frequent hitters.

Acknowledgements

The author is grateful to F. Stahura for critical review of the manuscript and help with illustrations.

Related links

DATABASES

LocusLink

α1-adrenoceptor

BCL-xL

carbonic anhydrase

β-lactamase

monoamine oxidase

oestrogen receptor

tyrosine phosphatase-1B

VLA-4

FURTHER INFORMATION

Cancer.gov

Daylight Chemical Information Systems

MDL

Glossary

SUBSTRUCTURE

A defined structural fragment of a molecule.

PHARMACOPHORE

The spatial arrangement of chemical groups or features in a molecule that are known or thought to determine its activity. The most popular pharmacophore models consist of three or four points separated by defined distance ranges. In most cases, pharmacophore geometry is not known from experiment, but is predicted.

MOLECULAR GRAPH

A two-dimensional representation of the connectivity pattern in a molecule, with atoms shown as vertices and bonds as edges.

QUANTITATIVE STRUCTURE–ACTIVITY RELATIONSHIP (QSAR)

QSAR analysis refers to methods that relate structural features of molecules to biological activity in quantitative terms. In most cases, QSAR analysis attempts to establish linear relationships between selected structural features in a series of related molecules and their known level of activity. If successful, models derived from training sets can be applied to predict molecules with higher potency.

BINARY BIT STRING

A series of 1 or 0 characters. Each bit position is either set 'on' (that is, set to 1) or 'off' (0), and can account for the presence or absence of a specific feature.

TANIMOTO COEFFICIENT

The most popular metric for the quantitative comparison of binary molecular fingerprints. This coefficient is defined as Tc = bc/(b1 + b2 − bc). In this formulation, b1 represents the number of bits that are set on in the first fingerprint, b2 is the number of bits that are set on in the second fingerprint, and bc is the number of bits common to both fingerprints. If the Tc value is 1, then the compared fingerprints are identical.
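As a minimal sketch of this definition, the coefficient can be computed for two fingerprints written as binary bit strings. The function name `tanimoto`, the example fingerprints and the handling of two all-zero fingerprints are illustrative assumptions, not taken from the article.

```python
def tanimoto(fp1: str, fp2: str) -> float:
    """Tanimoto coefficient Tc = bc / (b1 + b2 - bc) for two equal-length bit strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    b1 = fp1.count("1")  # bits set on in the first fingerprint
    b2 = fp2.count("1")  # bits set on in the second fingerprint
    # bits set on at the same position in both fingerprints
    bc = sum(1 for x, y in zip(fp1, fp2) if x == "1" and y == "1")
    denominator = b1 + b2 - bc
    # Assumed convention: two fingerprints with no bits set are treated as identical.
    return 1.0 if denominator == 0 else bc / denominator


fp_a = "11010010"  # hypothetical 8-bit fingerprint, 4 bits on
fp_b = "10010110"  # hypothetical 8-bit fingerprint, 4 bits on, 3 shared with fp_a
print(tanimoto(fp_a, fp_b))  # 3 / (4 + 4 - 3)
print(tanimoto(fp_a, fp_a))  # identical fingerprints
```

Here three bits are common to both fingerprints, so Tc = 3/5, and comparing a fingerprint with itself gives Tc = 1.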

COMBINATORIAL PROBLEM

As used here, the term describes the situation in which the number of possible pairwise comparisons c grows with the number of objects n according to the formula c = n(n − 1)/2. So, as n becomes large, methods that rely on pairwise comparisons of, for example, database molecules become computationally infeasible.
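To give a feel for this growth, the short Python sketch below evaluates c = n(n − 1)/2 for a few database sizes. The sizes are arbitrary illustrations and the function name `pairwise_comparisons` is an assumption.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n objects: c = n(n - 1)/2."""
    return n * (n - 1) // 2


# Illustrative database sizes (assumed values)
for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9,} molecules -> {pairwise_comparisons(n):,} comparisons")
```

A thousandfold increase in n therefore leads to roughly a millionfold increase in the number of comparisons.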

BINNING

This process divides coordinate axes into intervals (typically of equal size). If binning is applied to the axes of 2D and 3D coordinate systems, grids and cells are obtained, respectively.
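One possible way to implement equal-width binning is sketched below in Python. Each axis is divided into the same number of intervals, and a point is mapped to the tuple of its interval indices along the axes. The function names, the axis ranges and the example descriptor values are assumptions made for illustration.

```python
def bin_index(value: float, lo: float, hi: float, n_bins: int) -> int:
    """Index of the equal-width interval of [lo, hi] that contains value."""
    if not lo <= value <= hi:
        raise ValueError("value lies outside the axis range")
    width = (hi - lo) / n_bins
    # The upper boundary hi is assigned to the last interval
    return min(int((value - lo) / width), n_bins - 1)


def bin_point(point, ranges, n_bins):
    """Interval indices of a point along each coordinate axis."""
    return tuple(bin_index(v, lo, hi, n_bins) for v, (lo, hi) in zip(point, ranges))


# Two hypothetical descriptor axes, each divided into 5 intervals
ranges = [(0.0, 10.0), (0.0, 1.0)]
print(bin_point((3.2, 0.95), ranges, n_bins=5))  # (1, 4)
```

Molecules that receive the same index tuple fall into the same partition, which avoids explicit pairwise comparisons.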

NEURAL NETWORK

Artificial neural networks are collections of mathematical models that are interconnected and organized in different layers. Given this architecture, the models correspond to neurons and the connections to synapses of the nervous system. Neural network simulations are analogous to an adaptive learning process. So, neural nets are typically trained to distinguish between different objects and their properties in learning sets, and the resulting models are then applied to make predictions on test sets.

QUANTITATIVE STRUCTURE–PROPERTY RELATIONSHIP (QSPR)

A variation of the QSAR approach, in which structural features of molecules are not quantitatively related to biological activity, but instead to physical properties, such as aqueous solubility or passive absorption.

DRUG-LIKE

The concept of 'drug-likeness' is based on the premise that drugs share specific molecular characteristics that systematically distinguish them from other synthetic or natural compounds.

LOGP(O/W)

The logarithm of the octanol/water partition coefficient (often abbreviated logP) describes the solubility of a compound in octanol (hydrophobic solvent) relative to its solubility in water (polar solvent).

RULE-OF-FIVE

On the basis of statistical analysis of known drugs, candidate compounds are likely to have unfavourable absorption, permeation and bioavailability characteristics if they contain more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors, a logP greater than 5 and/or a molecular mass of more than 500 Da.
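The four criteria can be expressed as a simple filter. The Python sketch below assumes that the property values have already been calculated by some other tool; the `Compound` class, its field names and the example values are hypothetical. It only lists which criteria a compound exceeds and does not decide how many violations should lead to rejection.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    h_bond_donors: int
    h_bond_acceptors: int
    logp: float
    molecular_mass: float  # in Da


def rule_of_five_violations(c: Compound) -> list:
    """Return the rule-of-five criteria that the compound exceeds."""
    checks = {
        "more than 5 hydrogen-bond donors": c.h_bond_donors > 5,
        "more than 10 hydrogen-bond acceptors": c.h_bond_acceptors > 10,
        "logP greater than 5": c.logp > 5,
        "molecular mass of more than 500 Da": c.molecular_mass > 500,
    }
    return [name for name, violated in checks.items() if violated]


# Hypothetical property values
print(rule_of_five_violations(Compound(2, 5, 3.1, 350.0)))   # []
print(rule_of_five_violations(Compound(6, 12, 5.6, 650.0)))  # all four criteria
```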

PRINCIPAL COMPONENT ANALYSIS (PCA)

A mathematical method that captures the variance in a data set with respect to chosen variables, and transforms correlated variables into a smaller number of uncorrelated ones for data presentation.

GENETIC ALGORITHM

Computational implementation of a problem-solving approach that uses principles of biological competition and population dynamics. Model parameters are encoded in a 'chromosome', and are varied. Chromosomes yield possible solutions to a given problem by means of a fitness function. Chromosomes that correspond to the best intermediate solutions are subjected to operations that are analogous to gene recombination and mutation to produce the next generation. This process continues until solutions reach a predefined convergence criterion.
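The cycle described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a method from the article: the chromosome is a list of bits, the fitness function simply counts the bits set to 1, the better half of each generation is kept unchanged as parents, and all names and parameter values are hypothetical.

```python
import random


def fitness(chromosome):
    """Toy fitness function (assumption): number of bits set to 1."""
    return sum(chromosome)


def crossover(a, b, rng):
    """Single-point recombination of two parent chromosomes."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def evolve(n_bits=20, pop_size=30, rate=0.02, max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(max_generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == n_bits:  # predefined convergence criterion
            break
        parents = population[: pop_size // 2]  # best intermediate solutions
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children  # next generation
    return max(population, key=fitness)


best = evolve()
print(fitness(best), best)
```

Because the parents are carried over unchanged, the best fitness in the population can never decrease from one generation to the next. In practical applications the chromosome would encode model parameters and the fitness function would score the resulting model.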

ADME

Absorption, distribution, metabolism and excretion are important effects that determine the in vivo characteristics of drug (candidate) molecules.

DECISION TREE

A data set is successively divided at decision points. At each point, a 'yes' or 'no' decision is made for each object, dividing the data into smaller and smaller subsets along the tree. All objects in a given subset share the same signature of 'yes' or 'no' decisions.

BINARY DESCRIPTORS

These types of descriptor capture two defined states (and not continuous value ranges). Typical examples include a specific substructure or bond pattern. The feature detected by a binary descriptor is either 'present' (state 1) or 'absent' (state 2). Application of binary descriptors allows the classification of molecular data sets by means of decision trees.
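A simplified Python sketch of this idea follows. It assumes that every molecule is described by the same set of binary descriptors and that the tree asks the same question at each level, so that each terminal subset is identified by its signature of yes/no decisions. The molecule names and descriptor names are hypothetical.

```python
from collections import defaultdict


def partition_by_signature(molecules, descriptor_order):
    """Group molecules by their sequence of yes/no decisions along the tree."""
    leaves = defaultdict(list)
    for name, descriptors in molecules.items():
        # One decision per tree level: is the feature present (True) or absent (False)?
        signature = tuple(descriptors[d] for d in descriptor_order)
        leaves[signature].append(name)
    return dict(leaves)


# Hypothetical molecules with two binary descriptors each
molecules = {
    "mol_a": {"aromatic_ring": True, "carboxylic_acid": False},
    "mol_b": {"aromatic_ring": True, "carboxylic_acid": True},
    "mol_c": {"aromatic_ring": True, "carboxylic_acid": False},
    "mol_d": {"aromatic_ring": False, "carboxylic_acid": False},
}
leaves = partition_by_signature(molecules, ["aromatic_ring", "carboxylic_acid"])
for signature, members in leaves.items():
    print(signature, members)
```

Here mol_a and mol_c end up in the same subset because they share the signature (True, False).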

SCAFFOLD

Often defined as the core structure of a small molecule, the scaffold is typically a ring system that has diverse chemical groups attached. Accordingly, it is obtained by removal of these attached groups.

CHEMOTYPE

A family of molecules that has a unique core structure or scaffold.

PHYLOGENETIC TREE

This classification structure has its origin in biology to describe evolutionary relationships. It classifies a family of objects into 'most-similar' sets by subdividing them at branch points into successively smaller subsets with increasing object similarity. The final subsets represent unique leaves of the tree. Different from a simple decision tree, a phylogenetic tree structure can create multiple branches at each point.

BAYES' THEOREM

A mathematical formulation that determines the probability that a specific result was due to a particular cause, if multiple possible causes exist. For example, a molecular database consists of 50% synthetic reagents, 30% drug-like molecules and 20% natural products. If the activity rates of synthetic compounds, drug-like molecules and natural products are 1%, 50% and 15%, respectively, what is the probability that a given biological activity in this database is represented by a natural product?
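The question posed in this example can be answered with a short calculation, sketched here in Python using only the numbers given above. The function and variable names are illustrative assumptions. The probability that an active compound is a natural product is (0.20 × 0.15)/(0.50 × 0.01 + 0.30 × 0.50 + 0.20 × 0.15) = 0.03/0.185, or about 16%.

```python
def posterior(priors, likelihoods, cause):
    """P(cause | result) according to Bayes' theorem."""
    # Total probability of the result over all possible causes
    evidence = sum(priors[c] * likelihoods[c] for c in priors)
    return priors[cause] * likelihoods[cause] / evidence


# Database composition and activity rates from the example
priors = {"synthetic reagent": 0.50, "drug-like": 0.30, "natural product": 0.20}
activity_rates = {"synthetic reagent": 0.01, "drug-like": 0.50, "natural product": 0.15}

p = posterior(priors, activity_rates, "natural product")
print(f"{p:.3f}")  # 0.162
```

So, although natural products make up 20% of the database, they account for only about 16% of the active compounds, because the drug-like molecules have a much higher activity rate.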

SIMILARITY PARADOX

In the context of virtual screening (VS), minor chemical modifications of otherwise similar molecules can render them either active or inactive. VS calculations are expected to identify series of molecules that share the same scaffold. However, if only inactive compounds were selected for testing, VS analysis would have 'failed', although a relevant chemotype was identified. This highlights potential problems associated with the selection of only one or a few representative molecules from a series of similar ones.

ANALOGUE

A member of a series of closely related molecules that has only minor chemical modifications that distinguish it from others belonging to this chemotype. Analogues of active molecules are often generated to improve potency and/or other compound characteristics, such as solubility or oral availability.

Cite this article

Bajorath, J. Integration of virtual and high-throughput screening. Nat Rev Drug Discov 1, 882–894 (2002). https://doi.org/10.1038/nrd941
