Introduction

The process of novel drug discovery and development is generally recognized to be time-consuming, risky and costly. The typical drug discovery and development cycle, from concept to market, takes approximately 14 years1, and the cost ranges from 0.8 to 1.0 billion USD2. Rapid developments in combinatorial chemistry and high-throughput screening technologies have provided an environment to expedite the drug discovery process by enabling huge libraries of compounds to be screened and synthesized in a short time3,4. Although the investment in new drug development has grown significantly in the past decades, the output has not grown in proportion to the investment because of the low efficiency and high failure rate in drug discovery5. Consequently, various approaches have been developed to shorten the research cycle and reduce the expense and risk of failure for drug discovery. Computer-aided drug design (CADD) is one of the most effective methods for reaching these goals.

CADD is a widely used term that represents computational tools and sources for the storage, management, analysis and modeling of compounds. It covers many aspects of drug discovery, including computer programs for designing compounds, tools for systematically assessing potential lead candidates and the development of digital repositories for studying chemical interactions6. In the post-genomic era, benefiting from the dramatic increase in biomacromolecule and small molecule information, computational tools can be applied to most aspects of the drug discovery and development process, from target identification and validation to lead discovery and optimization; the tools can even be applied to preclinical trials5,7,8,9, which greatly alters the pipeline for drug discovery and development. Figure 1 shows a flowchart for the tasks that computational approaches have been applied to and the computational methods used at each stage. The use of computational tools could reduce the cost of drug development by up to 50%10.

Figure 1

Multiple computational drug discovery approaches that have been applied in various stages of the drug discovery and development pipeline, including target identification and validation, lead discovery and optimization, and preclinical tests.

The commonly used computational drug discovery approaches can be categorized into structure-based drug design (SBDD), ligand-based drug design (LBDD) and sequence-based approaches. SBDD methods, such as molecular docking and de novo drug design, rely on knowledge of the structure of the target macromolecule, which is mainly obtained from crystal structures, NMR data and homology models11. In the absence of three-dimensional (3D) structures of potential targets, LBDD tools, including quantitative structure-activity relationship (QSAR), pharmacophore modeling, molecular field analysis and 2D or 3D similarity assessment, can provide crucial insights into the nature of the interactions between drug targets and ligands, which allows predictive models that are suitable for lead discovery and optimization to be constructed12. In recent years, to deal with situations in which neither the target structure nor the ligand information is available, sequence-based approaches that use bioinformatic methods to analyze and compare multiple sequences have been developed to identify potential targets from scratch and to conduct lead discovery13,14. Currently, no single method can fulfill the practical needs of drug discovery and development. Therefore, combinational and hierarchical strategies that employ multiple computational approaches have been frequently and successfully used.

The efficiency, accuracy and speed of these computational methods largely depend on several technical aspects, including conformation generation and sampling, scoring functions, optimization algorithms, and molecular similarity calculations7,11,15. In this paper, we focus on these topics and the widely used computational tools in the fields of target identification and lead discovery and address some of the most recent methodologies, platforms and applications.

Methodologies and platforms

Some remarkable methodologies and platforms focused on computational drug discovery and development have been developed and constructed. In this section, several methodologies and platforms that involve target identification, docking-based virtual screening, conformation sampling, scoring functions, molecular similarity calculation, virtual library design and sequence-based drug design are summarized. These aspects are intimately linked, and improvements in any aspect could benefit the others (Figure 2).

Figure 2

Important methodologies and platforms in the computational drug discovery field introduced and discussed in this article, with a focus on target identification and lead discovery fields.

Target identification

As the first stage in the drug discovery pipeline, the identification of drug targets from large quantities of candidate macromolecules is both important and challenging16. The current major tools for target identification are genomic and proteomic approaches, which are laborious and time-consuming17. Therefore, to complement the experimental methods, computational tools and platforms, including reverse docking and pharmacophore mapping, have been developed.

TarFisDock is a web server that identifies drug targets using a reverse docking strategy to seek all possible binding proteins for a given small molecule18. The development of TarFisDock was based on the widely used docking program DOCK (version 4.0)19,20. This platform consists of a front-end web interface written in PHP and HTML, with MySQL as the database system. DOCK is used as a back-end tool for reverse docking. TarFisDock could be a valuable tool for identifying potential targets for a compound with known biological activity, a newly isolated natural product or an existing drug whose pharmacological mechanism is unclear. In addition, this platform is able to find potential targets that could be responsible for the toxicity and side effects of a drug, which could allow for the prediction of the off-target effects of a drug candidate. Indeed, studies have shown that off-target effects have been largely responsible for the high attrition rate in drug development21. Furthermore, TarFisDock could provide valuable information for constructing drug target networks in order to study drug-target interactions in a more systematic way. The reliability of this methodology has been tested on vitamin E and 4H-tamoxifen by identifying their putative binding proteins. The results indicated that TarFisDock could predict 50% of the reported corresponding targets. However, this method still has certain limitations: (1) the protein entries are not sufficient to cover all the protein information of disease-related genomes; (2) the flexibility of the proteins is not considered during the docking procedure; and (3) the scoring function, which was intended to evaluate small molecules, may not be accurate enough for evaluating reverse docking18.
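The reverse docking idea can be illustrated with a minimal Python sketch: one ligand is scored against every target in a collection, and the targets are ranked by score. This is not TarFisDock's implementation; the function name `reverse_dock`, the `score_fn` callback (standing in for a real docking engine such as DOCK) and the `top_fraction` cut-off are assumptions introduced here for illustration only.

```python
import math
from typing import Callable, Dict, List, Tuple


def reverse_dock(
    ligand: str,
    targets: Dict[str, str],
    score_fn: Callable[[str, str], float],
    top_fraction: float,
) -> List[Tuple[str, float]]:
    """Score one ligand against every target and keep the best-ranked fraction.

    `score_fn(ligand, target_structure)` is a hypothetical stand-in for a
    docking run; it is assumed to return an energy-like score where a more
    negative value means a more favourable predicted interaction.
    """
    scored = [(name, score_fn(ligand, structure)) for name, structure in targets.items()]
    scored.sort(key=lambda item: item[1])  # most negative (best) first
    n_keep = max(1, math.ceil(len(scored) * top_fraction))
    return scored[:n_keep]


if __name__ == "__main__":
    # Toy scores in place of real docking results (illustrative values only).
    toy_scores = {"T1": -32.0, "T2": -15.5, "T3": -41.2, "T4": -8.0}
    hits = reverse_dock(
        "ligand.mol2",
        {name: name + ".pdb" for name in toy_scores},
        lambda lig, struct: toy_scores[struct[:-4]],
        top_fraction=0.5,
    )
    print(hits)
```

The ranked list is only a set of candidate targets; as the limitations above indicate, the candidates still need experimental confirmation.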

A web-accessible potential drug target database (PDTD) was constructed for TarFisDock. This database currently contains more than 1100 protein entries with 3D structures obtained from the Protein Data Bank. The general information for these proteins was extracted from the literature and several online databases, such as TTD22, DrugBank23, and Thomson Pharma. This database contains diverse information on more than 830 potential drug targets, and each drug target has structures in both the PDB and MOL2 formats. Information on related diseases, biological functions and associated signaling pathways has also been collected. All of the targets were classified according to their function and their related diseases. PDTD has a keyword search function for parameters such as the PDB ID, the target name and the disease name24. As a comprehensive and unique repository of drug targets, it could be used for in silico drug target identification, virtual screening, and the discovery of secondary effects for existing drugs.

Another important issue in target identification is finding the best interaction mode between the potential target candidates and the small molecule probes. In addition to the reverse docking method, pharmacophore modeling and mapping can be used to identify the optimal interaction mode. A pharmacophore model is the spatial arrangement of features essential for a molecule to interact with a specific target receptor. PharmMapper is the first web-based tool to use a 'reverse' pharmacophore mapping approach to predict potential drug targets against any given small molecule25. However, the PharmMapper server requires a sufficient number of available pharmacophore models that describe the binding modes of known ligands at the binding sites. Thus, a large, in-house database of pharmacophore models annotated with their target information was constructed (PharmTargetDB). The target protein structures in complex with small molecules were carefully extracted from the DrugBank26, BindingDB27, PDBBind28, and PDTD24 databases, and over 7000 pharmacophore models (covering information for over 1500 drug targets) based on the complex structures were generated. A sequential combination of triangle hashing (TriHash) and genetic algorithm (GA) optimization was adopted to identify the pharmacophore that best fit the task. Benefiting from the highly efficient and robust triangle hash mapping method, PharmMapper is computationally efficient and has the ability to carry out high throughput screens. The algorithm is highly automated, and the interface is user friendly. For experienced users, optional parameters controlling speed and accuracy and the candidate targets subset can be freely customized. The major limitation of the program is that the pharmacophore database only includes drug targets that have PDB structures with a co-crystallized ligand. However, PharmTargetDB is updated periodically as the number of structures deposited in PDB grows25.
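The idea behind triangle hashing can be sketched as follows: each triplet of pharmacophore features is reduced to a key that does not change under rotation or translation, and database models are indexed by those keys so that a query only needs hash lookups. The sketch below is a simplified illustration, not PharmMapper's actual TriHash scheme; the key definition (feature-type pairs plus binned distances), the bin width and all function names are assumptions, and the subsequent GA refinement step is omitted.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def triangle_keys(features, bin_width=1.0):
    """Return rotation- and translation-invariant keys for all feature triplets.

    `features` is a list of (feature_type, (x, y, z)) tuples. Each key is the
    sorted tuple of the triangle's three edges, where an edge is described by
    its two feature types and its length binned to `bin_width` angstroms.
    """
    keys = set()
    for a, b, c in combinations(features, 3):
        edges = []
        for (t1, p1), (t2, p2) in ((a, b), (b, c), (a, c)):
            edges.append((tuple(sorted((t1, t2))), int(dist(p1, p2) // bin_width)))
        keys.add(tuple(sorted(edges)))
    return keys


def build_index(models):
    """Map every triangle key to the names of the models that contain it."""
    index = defaultdict(set)
    for name, features in models.items():
        for key in triangle_keys(features):
            index[key].add(name)
    return index


def candidate_models(index, query_features):
    """Rank models by the number of triangle keys shared with the query."""
    votes = defaultdict(int)
    for key in triangle_keys(query_features):
        for name in index.get(key, ()):
            votes[name] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])


if __name__ == "__main__":
    models = {
        "model_A": [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.5, 0.0, 0.0)),
                    ("hydrophobe", (0.0, 4.5, 0.0))],
        "model_B": [("donor", (0.0, 0.0, 0.0)), ("acceptor", (8.5, 0.0, 0.0)),
                    ("hydrophobe", (0.0, 9.5, 0.0))],
    }
    # The query is model_A translated and listed in a different order.
    query = [("hydrophobe", (10.0, 14.5, 10.0)), ("donor", (10.0, 10.0, 10.0)),
             ("acceptor", (13.5, 10.0, 10.0))]
    print(candidate_models(build_index(models), query))
```

Binning makes the lookup tolerant of small coordinate differences, at the cost of missed matches when a distance falls near a bin boundary; a production scheme would have to handle that case.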

Docking-based virtual screening

Virtual screening based on molecular docking has become one of the most widely used methods of SBDD. The primary criteria for any docking method are docking accuracy, scoring accuracy, and computational efficiency, which are all strongly influenced by the conformational searching method29,30. Molecular docking is a typical optimization problem; therefore, it is difficult to obtain the global optimum solution. Most conformational optimization methods in docking programs can only deal with a single objective, such as the binding energy, shape complementarity, or chemical complementarity. This type of method is not effective for solving real-world problems, which normally involve multiple objectives31. Therefore, an optimization algorithm that comprises several objectives and results in more reasonable and robust binding modes between ligands and macromolecules is urgently needed.

A newly developed docking methodology, GAsDock, uses an entropy-based multi-population GA to optimize the binding poses between small molecules and macromolecule receptors32. Information entropy was employed in the GA for optimization, and contracted space was used as the convergence criterion, ensuring that GAsDock can converge rapidly and steadily. A validation test docking known inhibitors into the binding pockets of thymidine kinase (TK) and HIV-1 reverse transcriptase (RT) indicated that GAsDock is more accurate than other docking programs, such as GOLD33, FlexX33, DOCK33, Surflex30, and Glide29. To increase the accuracy and speed of the process, an improved adaptive genetic algorithm has been developed that supports a flexible docking method. Some advanced techniques, such as a multi-population genetic strategy, an entropy-based searching technique with self-adaptation and a quasi-exact penalty, were introduced into this algorithm. A new iteration scheme was also employed in conjunction with these techniques to speed up the optimization and convergence processes, making this method significantly faster than the old method34. In addition, two sets of multi-objective optimization (MO) methods, denoted MOSFOM (Multi-Objective Scoring Function Optimization Methodology), that simultaneously consider both the energy score and the contact score were developed. MOSFOM primarily emphasizes a new strategy to obtain the most reasonable binding conformation and increase the hit rates rather than accurately predicting the binding free energy31.
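Considering two scores simultaneously, rather than collapsing them into one number, is usually formalized through Pareto dominance: a pose is kept if no other pose is at least as good on both objectives and strictly better on one. The following Python sketch shows that filter for an energy score and a contact score. It is a generic illustration and not the MOSFOM algorithm itself; the function name and the convention that lower values are better for both scores are assumptions.

```python
from typing import List, Tuple

Pose = Tuple[str, float, float]  # (pose_id, energy_score, contact_score)


def pareto_front(poses: List[Pose]) -> List[str]:
    """Return the ids of poses that are not dominated on either objective.

    Both scores are assumed to be expressed so that lower is better.
    """
    front = []
    for pose_id, energy, contact in poses:
        dominated = any(
            (e2 <= energy and c2 <= contact) and (e2 < energy or c2 < contact)
            for _, e2, c2 in poses
        )
        if not dominated:
            front.append(pose_id)
    return front


if __name__ == "__main__":
    poses = [("a", -10.0, -5.0), ("b", -8.0, -7.0), ("c", -7.0, -4.0)]
    # "c" is worse than "a" on both scores, so only "a" and "b" remain.
    print(pareto_front(poses))
```

The quadratic comparison is adequate for the handful of poses retained per ligand; larger populations would call for a sorting-based approach.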

Conformation sampling

One of the imperative aspects of drug design and development is to perceive the bioactive conformations of the small molecules that determine the physical and biological properties of the molecules. Many drug discovery methods, such as molecular docking, pharmacophore construction and matching, 3D database searching, 3D-QSAR, and molecular similarity analysis, involve a conformational sampling procedure to generate conformations of small molecules in the binding pocket and a scoring phase to rank these conformations. A practical conformation ensemble should guarantee that the conformers are energetically reasonable and span the conformational space in an appropriate amount of time. Other sophisticated criteria, such as pharmacophore and binding pocket mapping, have also been implemented to sample the conformers, making the conformation generation process a multi-objective optimization process35.

A highly efficient conformational generation method named Cyndi, which is based on the multi-objective evolution algorithm (MOEA), has been developed. Using multiple objectives to control energy accessibility as well as geometric diversity, Cyndi is capable of searching the conformational space in nearly constant time and of sampling the Pareto frontier at which both the energy and diversity features are favored. The conformers are encoded into GA individuals with information on the dihedral torsions of the rotatable bonds; the VDW and the torsional energy terms are two distinctive objectives for separating the generated conformers in energy space using the Tripos force field36. Cyndi ensures that the generated conformation ensemble simultaneously meets multiple criteria, such as low energy and geometric diversity, instead of concentrating on just one criterion35. Recently, Cyndi was updated to incorporate the MMFF94 force field to more rationally assess the conformational energy. Cyndi has been compared with MacroModel, as integrated in Maestro V7.5 (Schrödinger Inc), with a focus on the balance between the sampling depth of the conformational space and the computational cost of each algorithm. MacroModel was shown to have comparable performance to Cyndi in terms of retrieving the bioactive conformations, while Cyndi performed better at discovering bioactive conformations in the shortest amount of time with regard to the efficiency of the conformation sampling37.
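Encoding a conformer as a vector of dihedral torsions makes the usual GA operators straightforward to write. The sketch below shows one possible encoding with mutation, crossover and a torsion-space distance that could serve as a diversity measure. It is a minimal illustration under assumptions: the operator definitions, the 30-degree mutation step and every function name are chosen here for clarity and are not taken from Cyndi, and the energy objectives (which require a force field) are left out.

```python
import random
from typing import List


def wrap(angle: float) -> float:
    """Map an angle in degrees into the -180..180 range."""
    return (angle + 180.0) % 360.0 - 180.0


def random_conformer(n_rotatable: int, rng: random.Random) -> List[float]:
    """One GA individual: a torsion angle for each rotatable bond."""
    return [wrap(rng.uniform(-180.0, 180.0)) for _ in range(n_rotatable)]


def mutate(torsions: List[float], rng: random.Random, step: float = 30.0) -> List[float]:
    """Perturb one randomly chosen torsion by up to `step` degrees."""
    child = list(torsions)
    i = rng.randrange(len(child))
    child[i] = wrap(child[i] + rng.uniform(-step, step))
    return child


def crossover(a: List[float], b: List[float], rng: random.Random) -> List[float]:
    """One-point crossover between two parents of equal length."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def torsion_distance(a: List[float], b: List[float]) -> float:
    """Mean absolute angular difference, respecting the 360-degree periodicity."""
    return sum(abs(wrap(x - y)) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    parent_a = random_conformer(5, rng)
    parent_b = random_conformer(5, rng)
    child = mutate(crossover(parent_a, parent_b, rng), rng)
    print(child, torsion_distance(child, parent_a))
```

In a multi-objective setting, each individual would additionally carry its energy terms and a diversity value, and selection would operate on the non-dominated set rather than on a single fitness number.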

Scoring function

The scoring function is an essential component in virtual screening. One major scoring method is the knowledge-based scoring method, which typically extracts structural information from experimentally determined protein-ligand complexes and employs the Boltzmann law to transform the atom pair preferences into distance-dependent pairwise potentials38,39,40,41. The potential of mean force (PMF) scoring function can convert structural information into free energy without any knowledge of the binding affinities and is therefore expected to be more applicable. This method implicitly balances many opposing contributions to binding, such as solvation effects, conformational entropy and interaction enthalpy40. Several remarkable methodologies focused on these fields are introduced below.
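The conversion of atom-pair statistics into a pairwise potential can be sketched with the standard inverse Boltzmann relation, A(r) = -kT ln[p_obs(r) / p_ref(r)], where p_obs is the observed distance distribution for an atom-type pair and p_ref is a reference distribution. The Python sketch below applies this relation to binned counts. It is a simplified illustration: the function name, the handling of empty bins and the choice of reference state are assumptions, and published PMF functions add corrections (for example, for ligand volume) that are not shown.

```python
import math
from typing import List, Optional, Sequence

# k_B * T in kcal/mol at 298.15 K
KT_298K = 0.0019872041 * 298.15


def boltzmann_inversion(
    observed: Sequence[float],
    reference: Sequence[float],
    kT: float = KT_298K,
) -> List[Optional[float]]:
    """Turn binned pair counts into a distance-dependent potential.

    `observed[i]` and `reference[i]` are counts for distance bin i. Both are
    normalized to probabilities before taking the ratio. Bins with a zero
    count are returned as None because the logarithm is undefined there.
    """
    total_obs = sum(observed)
    total_ref = sum(reference)
    potential: List[Optional[float]] = []
    for n_obs, n_ref in zip(observed, reference):
        if n_obs == 0 or n_ref == 0:
            potential.append(None)
            continue
        ratio = (n_obs / total_obs) / (n_ref / total_ref)
        potential.append(-kT * math.log(ratio))
    return potential


if __name__ == "__main__":
    # Bin 0 is over-represented relative to the reference (favourable),
    # bin 1 is under-represented (unfavourable).
    print(boltzmann_inversion([30, 10, 20], [20, 20, 20]))
```

Summing such potentials over all protein-ligand atom pairs within a cut-off distance gives the score of a pose.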

A kinase family-specific PMF scoring function named kinase-PMF was developed with a kinase data set of 872 complexes from the PDB database to assess the binding of ATP-competitive kinase inhibitors42. This scoring function inherits the functional form and atom type of PMF0443. Compared to eight other commonly used scoring methods, kinase-PMF had the highest success rate in identifying not only positive compounds from decoys but also crystal conformations. Thus, this method could allow researchers to screen and optimize hit compounds in kinase inhibitor development42.

An improved PMF scoring function named KScore, which is based on several diverse training sets and a newly defined atom-typing scheme using 23 redefined ligand atom types, 17 protein atom types and 28 newly introduced atom types for nucleic acids, has been developed. In comparison with the existing PMF potentials, such as PMF99 and PMF04, the pairwise potentials for different atom types used in KScore have been significantly improved, particularly in the field of reflecting experimental phenomena, including the interaction distances and the strengths of hydrogen bonding, electrostatic interactions, VDW interactions, cation-π interactions and aromatic stacking. KScore is a powerful tool for distinguishing strong binders from a series of compounds and can be applied to large-scale virtual screening. In addition, further improvements should be possible by modifying the atom-typing scheme and diverse training sets44. KScore has been integrated into the previously mentioned molecular docking program GAsDock32.

On the basis of the concept and formalism of PMF and a novel iteration method, a knowledge-based scoring function named IPMF was developed. This scoring function integrates additional experimental binding affinity information into the knowledge base as complementary data to the generally used structural information. The employed iteration method is to extract the 3D structural information and the binding affinity information in order to yield an “enriched” knowledge-based model. The performance of IPMF was evaluated by scoring a diverse set of 219 protein-ligand complexes and comparing the results to seven commonly used scoring functions. As a result, the IPMF score performs best in the activity prediction test. In addition, when re-ranking binding poses, IPMF also demonstrated marginal improvements over the other evaluated knowledge-based scoring functions. These results suggest that the additional binding affinity information can be used not only for developing scoring functions but also for improving their ability to predict binding affinities. The IPMF approach provides a well-defined scheme to introduce binding information into typical statistical potentials, which may be applicable to other knowledge-based scoring functions45.

Molecular similarity methods

As the cornerstone of structure-activity relationship (SAR) and structural clustering analysis, molecular similarity is a pivotal concept in LBDD. Similarity-based virtual screening and candidate ranking are considered to be among the most powerful tools in medicinal chemistry46,47 and have been successfully applied in a number of cases. Similarity searching programs can generally be categorized into 2D and 3D similarity according to whether 3D conformation information is considered. 2D similarity methods are efficient for quickly profiling neighboring compounds. However, they may to some extent provide different hits for the same queries, as different 2D similarity definitions target different aspects of the information. These methods also tend to discover close structural analogues instead of novel scaffold hits48. In contrast, 3D similarity methods typically consider multiple aspects of the 3D conformation, including pharmacophores, molecular shapes, and molecular fields. 3D methods can be conveniently used to accomplish scaffold hopping to identify novel compounds.
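A common 2D similarity measure is the Tanimoto coefficient computed on structural fingerprints: the number of features shared by two molecules divided by the number of features present in either. The Python sketch below assumes that each fingerprint is already available as a set of "on" bit indices; fingerprint generation is outside its scope, and the function names are illustrative.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Rank library compounds by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: -item[1])


if __name__ == "__main__":
    library = {"cpd1": {2, 3, 4}, "cpd2": {1, 2, 3}, "cpd3": {7, 8}}
    print(rank_by_similarity({1, 2, 3}, library))
```

Because the result depends entirely on which substructures the fingerprint encodes, two fingerprint definitions can rank the same library differently, which is the behavior described above.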

Based on the pharmacophore matching approach, which was used as the engine of the previously mentioned PharmMapper Server25, a method named SHAFTS (SHApe-FeaTure Similarity) has been developed for rapid 3D molecular similarity calculation. This method adopts hybrid similarity metrics of molecular shape and colored (or labeled) chemistry groups annotated by pharmacophore features for 3D calculation and ranking in order to integrate the strengths of both pharmacophore matching and volumetric similarity approaches. The triplet hashing method is used to rapidly enumerate molecular alignment poses. The hybrid similarity consists of shape-density overlaps and pharmacophore feature fit values and is used to score and rank alignment modes. SHAFTS achieved superior performance in terms of both overall and early stage enrichments of known actives and chemotypes compared to other ligand-based methods48. SHAFTS has been integrated into ChemMapper Server (unpublished result).

Spherical harmonics (SH) are a set of orthogonal spherical functions that can easily represent the shape of a closed curved surface, such as a molecular surface. SH expansion theory has been successfully applied in virtual screening, protein-ligand recognition, binding pocket modeling, molecular fragment similarity, and so forth. SHeMS is a novel molecular shape similarity comparison method derived from SH expansion. In this method, the SH expansion coefficients are weighted to calculate similarity, leading to a distinct contribution of overall and detailed features to the final score. In addition, the reference set for optimization can be configured by the user, which allows for system-specific and customized comparisons. A retrospective VS experiment on the directory of useful decoys (DUD) database and principal component analysis (PCA) reveal that SHeMS provides dramatically improved performance over the original SH (OSH) and ultra-fast shape recognition (USR) methods49.

Virtual library construction

De novo drug design aims to chemically fill the binding sites of target macromolecules. One of the critical challenges of this process is to select fragment sets that have the best potential to be parts of new drug leads for a given target. Virtual library construction including focused library, targeted library and primary screening library has been suggested as one way to overcome this challenge50. Another challenge is to set up proper criteria for product judgement. To solve this problem, drug-likeness and structural diversity have been introduced into library design to reduce the size and increase the screening efficiency of the constructed libraries.

Focused libraries concentrate on one particular target and are built on the basis of a lead compound or pharmacophore, while targeted libraries are designed to seek drug leads against specific targets14. A new efficient approach that adopts the advantages of both focused and targeted libraries and integrates technologies from docking-based virtual screening and drug-like analysis was established to build, optimize and assess focused libraries. A software package named LD1.0 was successfully developed using the new approach51. Building blocks are selected from given fragment databases to create a series of virtual libraries. The virtual libraries are then optimized by library-based GA and evaluated on the basis of specified criteria such as docking energy, molecular diversity and drug-likeness. GA retains libraries with higher scores and creates new libraries to form the next generation of focused libraries. Once the termination condition is satisfied, GA optimization ends51.
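The library-level GA described above can be sketched as a loop in which each individual is a whole library of building blocks rather than a single molecule. The Python sketch below is an illustration under assumptions and does not reproduce LD1.0: `score_library` is a hypothetical callback that would combine docking energy, molecular diversity and drug-likeness into one number (higher is better), the termination condition is a fixed number of generations, and the recombination and mutation operators are chosen here for simplicity.

```python
import random
from typing import Callable, List, Sequence


def evolve_libraries(
    fragments: Sequence[str],
    score_library: Callable[[List[str]], float],
    library_size: int,
    n_libraries: int = 20,
    generations: int = 50,
    keep_fraction: float = 0.5,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[str]:
    """Evolve a population of fragment libraries and return the best one."""
    rng = random.Random(seed)
    population = [rng.sample(list(fragments), library_size) for _ in range(n_libraries)]
    for _ in range(generations):
        # Retain the higher-scoring libraries.
        population.sort(key=score_library, reverse=True)
        survivors = population[: max(2, int(n_libraries * keep_fraction))]
        children = []
        while len(survivors) + len(children) < n_libraries:
            parent_a, parent_b = rng.sample(survivors, 2)
            # Recombine: draw the child from the union of both parents.
            pool = list(dict.fromkeys(parent_a + parent_b))
            child = rng.sample(pool, library_size)
            # Mutate: occasionally swap in a fragment not yet in the child.
            if rng.random() < mutation_rate:
                outside = [f for f in fragments if f not in child]
                if outside:
                    child[rng.randrange(library_size)] = rng.choice(outside)
            children.append(child)
        population = survivors + children
    return max(population, key=score_library)


if __name__ == "__main__":
    toy_fragments = ["frag%02d" % i for i in range(20)]
    # Toy objective: prefer fragments with high index (stands in for a real score).
    best = evolve_libraries(toy_fragments, lambda lib: sum(int(f[4:]) for f in lib), 3)
    print(best)
```

Because the best libraries are carried over unchanged, the top score in the population cannot decrease from one generation to the next.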

Sequence-based drug design

The 3D structures of most proteins have not yet been determined, and many of these proteins do not even have a known ligand. In this situation, neither structure-based methods nor ligand-based methods can be employed to conduct drug discovery and development research. Therefore, a method to predict ligand-protein interactions (LPIs) in the absence of 3D or ligand information is urgently needed. Recently, a sequence-based drug design model for LPI was constructed solely on the basis of the primary sequence of proteins and the structural features of small molecules using the support vector machine (SVM) approach13. This model was trained using 15 000 LPIs between 626 proteins and over 10 000 active compounds collected from the Binding Database52. In the validation test of this model, nine novel active compounds against four pharmacologically important targets were found using only the sequence of the target. This is the first example of a successful sequence-based drug design campaign13.
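A classifier of this kind needs each protein-ligand pair to be expressed as a single numeric vector. The Python sketch below shows one simple way to build such a vector, by concatenating the amino acid composition of the sequence with a list of ligand descriptors. The descriptors used in the cited model are not specified here, so amino acid composition, the function names and the example descriptor values are all assumptions for illustration; the resulting vectors would then be passed to an SVM implementation for training.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [n / total for n in counts]


def pair_features(sequence: str, ligand_descriptors: Sequence[float]) -> List[float]:
    """Feature vector for one protein-ligand pair: protein part, then ligand part."""
    return amino_acid_composition(sequence) + list(ligand_descriptors)


if __name__ == "__main__":
    # Hypothetical ligand descriptors, e.g. scaled molecular weight and logP.
    vector = pair_features("MKTAYIAKQR", [0.35, 0.12])
    print(len(vector), vector[:3])
```

Known interacting pairs supply the positive training examples, while negative examples have to be chosen separately, for instance by pairing proteins with compounds not reported to bind them.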

Applications

The newly developed computational drug discovery approaches have been successfully applied in several cases, which suggests that these methods may further emphasize the role of computational drug discovery in the drug R&D workflow.

Application of computational methods to target identification

The combinational strategy of the reverse docking tool TarFisDock and the PDTD database has been successfully used to identify the targets of several bioactive compounds whose in vivo targets are unknown. Colonization of the human stomach by the pathogenic bacterium Helicobacter pylori is a major cause of gastrointestinal illnesses. However, because of the lack of mature protein targets, discovering anti-H pylori agents is a daunting task. Using an active natural product discovered by anti-H pylori screening as a probe, potential binding proteins were screened from PDTD using the reverse docking tool TarFisDock. A subsequent homology search indicated that among the 15 candidates discovered by reverse docking, only diaminopimelate decarboxylase (DC) and peptide deformylase (PDF) had homologous proteins in the H pylori genome. Enzymatic assays demonstrated that the natural product and one of its analogs are potent inhibitors against H pylori PDF (HpPDF), with IC50 values of 10.8 and 1.25 μmol/L, respectively. The X-ray crystal structures of apo-HpPDF and inhibitor-HpPDF complexes were determined, demonstrating at the atomic level that HpPDF is a potential target for screening new anti-H pylori agents53.

A natural component of ginger, [6]-gingerol, has been reported to exhibit anti-inflammatory and antioxidant properties and exert substantial anticarcinogenic and antimutagenic activities54. Despite its potential efficacy in cancer, the mechanism by which it exerts its chemopreventive effects was elusive. By using TarFisDock, [6]-gingerol was docked to each target in PDTD to identify its potential in vivo targets. The top 2% of protein hits from the ranked list were considered to be potential target candidates. Subsequent experimental data revealed that [6]-gingerol can effectively suppress tumor growth in nude mice by inhibiting leukotriene A4 hydrolase (LTA4H). These findings indicated a crucial role for LTA4H in cancer and supported the anticancer role of [6]-gingerol in targeting LTA4H to prevent colorectal cancer55.

Sphingosine-1-phosphate (S1P) is a sphingolipid metabolite that regulates many cellular and physiological processes, including cell growth, survival, movement, angiogenesis, vascular maturation, immunity and lymphocyte trafficking56,57,58. Although S1P could exert its biological function by binding to five S1P receptors on the cytomembrane, considerable evidence has suggested that S1P has direct intracellular targets. Using an in silico target identification approach, S1P was discovered to specifically bind to the histone deacetylases HDAC1 and HDAC2 to regulate histone acetylation59. S1P was also found to be a missing cofactor for the E3 ubiquitin ligase TRAF260. These achievements illustrate the pivotal role of S1P in the “inflammation-cancer” chain-related TNFα signaling pathway and in the regulation of gene expression and transcription.

Applications of computational methods in lead discovery

RhoA, one of the best-characterized members of the Rho GTPase family, is essential for multiple cellular processes, including cytoskeletal rearrangement, gene expression, membrane trafficking as well as cell adhesion, migration, differentiation, proliferation and apoptosis61,62,63. This protein is a promising target for treating cardiovascular diseases. Using a docking-based virtual screening strategy in conjunction with chemical synthesis and bioassays, a series of first-in-class small molecule RhoA inhibitors were discovered from the SPECS database. A hierarchical docking strategy was adopted: DOCK4.019 was used for the initial screening, and the standard DOCK score was used to rank the resulting list; the top 3000 candidates were further docked and ranked by their new scores with Glide in standard precision (SP) mode29,64. In the end, eight compounds showed high RhoA inhibition activities, and two of them showed significant inhibitory effects against PE-induced contraction in thoracic aorta artery rings65.
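The hierarchical strategy amounts to a two-stage funnel: a fast scoring method ranks the whole database, and only the top-ranked subset is passed to a slower, more accurate method. The Python sketch below captures that control flow. The two scoring callbacks are hypothetical placeholders for real docking runs, lower scores are assumed to be better, and the function and parameter names are introduced here for illustration.

```python
from typing import Callable, Iterable, List, TypeVar

C = TypeVar("C")


def hierarchical_screen(
    compounds: Iterable[C],
    fast_score: Callable[[C], float],
    slow_score: Callable[[C], float],
    keep_top: int,
    final_top: int,
) -> List[C]:
    """Rank everything with the fast method, then re-rank the best with the slow one."""
    stage1 = sorted(compounds, key=fast_score)[:keep_top]
    stage2 = sorted(stage1, key=slow_score)[:final_top]
    return stage2


if __name__ == "__main__":
    # Toy scores in place of docking results (illustrative values only).
    fast = {"c1": -30.0, "c2": -25.0, "c3": -40.0, "c4": -10.0}
    slow = {"c1": -9.1, "c2": -7.4, "c3": -6.0, "c4": -9.9}
    print(hierarchical_screen(fast, fast.get, slow.get, keep_top=3, final_top=2))
```

In the toy data, "c4" has the best second-stage score but never reaches that stage, which illustrates the trade-off of the funnel: compounds discarded by the fast filter cannot be recovered later.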

Insulin-like growth factor-1 receptor (IGF-1R), a receptor tyrosine kinase, plays a pivotal role in signaling pathways involved in cell growth, proliferation and apoptosis66. IGF-1R has been shown to be overexpressed in many human cancers, which suggests it might be a promising target for cancer therapy67. Pharmacophore-based virtual screening combined with molecular docking was applied hierarchically to discover IGF-1R inhibitors. Beginning with the complex crystal structure of IGF-1R and its inhibitor, pyridine-2-one, the key interactions between the protein and the ligand at the ATP-binding site were used to construct a pharmacophore model. The SPECS database was screened using this model. The top ranked hits were then docked to the ATP-binding site using Glide29,64. This strategy led to the identification of a series of novel thiazolidine-2,4-dione analogues as potential IGF-1R inhibitors; the molecules demonstrate favorable inhibitory potency and selectivity for IGF-1R over the insulin receptor (IR)68.

A prospective application of the LBDD program SHAFTS is the discovery of novel inhibitors for p90 ribosomal S6 protein kinase 2 (RSK2). Overexpression and aberrant activation of RSK2 have been linked to many human diseases, such as breast cancer, prostate cancer, and human head and neck squamous cell carcinoma69. Using the putative 3D conformations of two weakly binding RSK2 inhibitors with moderate activity as the query templates, 16 compounds with IC50 lower than 20 μmol/L, which would be missed by conventional 2D methods, were identified via chemotype switching directed by the SHAFTS calculation. The most potent hits show low micromolar inhibitory activities specifically for RSK2, and one compound also exhibits potent anti-migration activity in MDA-MB-231 tumor cells70.

In another study, a series of novel small molecule inhibitors of cyclophilin A (CypA) were identified using a de novo drug design approach. CypA plays an essential role in many biological processes, including enhancing the rate of protein folding/unfolding71,72, inhibiting the serine/threonine phosphatase activity of calcineurin73,74, facilitating viral replication and infection75,76, and inducing neuroprotective/neurotrophic effects77,78. In addition, CypA has been reported to be overexpressed exclusively in cancer cells, particularly in solid tumors, suggesting that CypA is an important regulator of carcinogenesis79. The identification of potent, structurally novel small molecule CypA inhibitors is urgently needed, as most currently available CypA inhibitors are natural products and peptide analogs that may face pharmacokinetic problems. Using the fragment structures of previously discovered CypA inhibitors80 as building blocks, a focused combinatorial library containing 255 molecules was designed using the LD1.051 program. By applying a docking-based virtual screening strategy that targets the binding pocket of CypA, 16 compounds were selected for synthesis and bioassay. According to the experimental results, these compounds all showed high CypA inhibitory activities. The most potent of the newly identified CypA inhibitors has a binding affinity and inhibitory activity approximately 10 times higher than those of the best previously known inhibitor81.

Outlook

Great progress has been made in methodology development and the application of computational drug discovery, resulting in a paradigm change in both industry and academia. Taking advantage of computational methods, potent hits can be obtained in a matter of weeks82. Searching for new chemical entities has led to the construction of high quality datasets and libraries that can be optimized for either molecular diversity or similarity. In addition, distributed computing has become more popular in large-scale virtual screening, in part because of increasingly powerful technology6.

Although it is apparent that computational drug discovery methods have great potential, one should not rely on computational techniques in a black box manner and should beware of the Garbage In-Garbage Out (GIGO) phenomenon. The in silico components in research must still be coupled with experimental resources, and computational discovery tools are not substitutes for the more important in cerebro component9,83,84. In the future, in addition to increasing the accuracy and effectiveness of existing technologies, the most important trend in the computational drug discovery field will be the integration of computational chemistry and biology together with chemoinformatics and bioinformatics, which will result in a new field known as pharmacoinformatics14,85. Inspired by the completion of the human genome and numerous pathogen genomes, great efforts will be made to understand the role of gene products in order to exploit their functions, which could be of great help for discovering new drug targets86. Computational methods involving target identification will attract more attention87,88, and designed small molecules will also be extensively used as probes for functional research89,90.