An iterative compound screening contest method for identifying target protein inhibitors using the tyrosine-protein kinase Yes

Chiba, Shuntaro; Ishida, Takashi; Ikeda, Kazuyoshi; Mochizuki, Masahiro; Teramoto, Reiji; Taguchi, Y-h.; Iwadate, Mitsuo; Umeyama, Hideaki; Ramakrishnan, Chandrasekaran; Thangakani, A. Mary; Velmurugan, D.; Gromiha, M. Michael; Okuno, Tatsuya; Kato, Koya; Minami, Shintaro; Chikenji, George; Suzuki, Shogo D.; Yanagisawa, Keisuke; Shin, Woong-Hee; Kihara, Daisuke; Yamamoto, Kazuki Z.; Moriwaki, Yoshitaka; Yasuo, Nobuaki; Yoshino, Ryunosuke; Zozulya, Sergey; Borysko, Petro; Stavniichuk, Roman; Honma, Teruki; Hirokawa, Takatsugu; Akiyama, Yutaka; Sekijima, Masakazu

doi:10.1038/s41598-017-10275-4

Download PDF

Article
Open access
Published: 20 September 2017

An iterative compound screening contest method for identifying target protein inhibitors using the tyrosine-protein kinase Yes

Shuntaro Chiba^1,2,
Takashi Ishida^1,2,3,
Kazuyoshi Ikeda⁴,
Masahiro Mochizuki⁵,
Reiji Teramoto⁶,
Y-h. Taguchi ORCID: orcid.org/0000-0003-0867-8986⁷,
Mitsuo Iwadate⁸,
Hideaki Umeyama⁸,
Chandrasekaran Ramakrishnan ORCID: orcid.org/0000-0003-4622-2496⁹,
A. Mary Thangakani¹⁰,
D. Velmurugan¹⁰,
M. Michael Gromiha⁹,
Tatsuya Okuno¹¹,
Koya Kato¹²,
Shintaro Minami¹³,
George Chikenji¹²,
Shogo D. Suzuki³,
Keisuke Yanagisawa³,
Woong-Hee Shin¹⁴,
Daisuke Kihara^14,15,
Kazuki Z. Yamamoto¹⁶,
Yoshitaka Moriwaki ORCID: orcid.org/0000-0003-0448-9790¹⁷,
Nobuaki Yasuo³,
Ryunosuke Yoshino^17,18,
Sergey Zozulya^19,20,
Petro Borysko^19,20,
Roman Stavniichuk¹⁹,
Teruki Honma^1,3,21,
Takatsugu Hirokawa^22,23,24,
Yutaka Akiyama^1,2,3,22,24 &
…
Masakazu Sekijima^1,2,3,18,24

Scientific Reports volume 7, Article number: 12038 (2017) Cite this article

3739 Accesses
24 Citations
18 Altmetric
Metrics details

Subjects

Abstract

We propose a new iterative screening contest method to identify target protein inhibitors. After conducting a compound screening contest in 2014, we report results acquired from a contest held in 2015 in this study. Our aims were to identify target enzyme inhibitors and to benchmark a variety of computer-aided drug discovery methods under identical experimental conditions. In both contests, we employed the tyrosine-protein kinase Yes as an example target protein. Participating groups virtually screened possible inhibitors from a library containing 2.4 million compounds. Compounds were ranked based on functional scores obtained using their respective methods, and the top 181 compounds from each group were selected. Our results from the 2015 contest show an improved hit rate when compared to results from the 2014 contest. In addition, we have successfully identified a statistically-warranted method for identifying target inhibitors. Quantitative analysis of the most successful method gave additional insights into important characteristics of the method used.

Chemical space docking enables large-scale structure-based virtual screening to discover ROCK1 kinase inhibitors

Article Open access 28 October 2022

Paul Beroza, James J. Crawford, … Christian Lemmen

CACHE (Critical Assessment of Computational Hit-finding Experiments): A public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding

Article 15 February 2022

Suzanne Ackloo, Rima Al-awar, … Timothy M. Willson

Chemical proteomics reveals the target landscape of 1,000 kinase inhibitors

Article Open access 30 October 2023

Maria Reinecke, Paul Brear, … Bernhard Kuster

Introduction

Introducing a new drug to a market has become an enormous undertaking because of expanding research and development costs, which are estimated at over one billion USD^1,2,3,4. With a view to reducing these costs, computational technology-driven approaches have been proven to be useful and have begun to be applied at various stages of the drug discovery campaign, including from target identification to clinical phases^{3, 5}. For these stages, including the hit-compound identification for a target molecule, many computational methods have been devised to find compounds that are active from a compound library without resorting to high-throughput screening.

These computational methods use various approaches and experimental information; however, they are often divided into two categories: structure-based (SB) and ligand-based (LB). SB methods use an atomic-level structure of a target molecule. Most typical SB methods are molecular docking approaches that search the complex structure of a ligand, included in a compound library, and a target-molecule structure based on a scoring function. A ranking of docked compounds is calculated using these scores⁶. In contrast, LB methods use information of known active and/or inactive compounds related to a target molecule. LB methods generally calculate a ranking of compounds in a library using techniques such as a similarity search and machine learning⁷. Currently, various methods based on both SB and LB algorithms have been proposed for identifying hit compounds^6,7,8.

Although these methods are reasonably designed and seem to have the ability to enrich potent compounds toward higher ranks from a compound library, there are no set standards because the performance of a method often depends on the target molecule⁹. Hence, we cannot choose a method suitable for a specific target molecule before conducting experimental assessments. Thus, designating all resources for one method is risky. However, this risk may be reduced by collecting data from various computational methods. In addition, after conducting experimental assays, we can obtain information regarding a suitable method for the target.

To evaluate various methods for a target molecule, we held a compound-screening contest in 2014 to find inhibitors of the tyrosine-protein kinase Yes as an example target from a 2.2-million-compound library¹⁰. Ten groups participated in the contest and, in total, 600 compound-inhibition rates for enzymatic activity were assayed. We showed that the connected diversity of compounds proposed from all participant groups was larger than that proposed by any single group. This enabled the diversified screening of the compound library with reasonable methods. As a result, two compounds were identified as hit compounds. We had speculated that we could find methods that were significantly more likely to provide hit compounds than others based on the contest’s results. However, this was not possible with a statistically significant measure because of the shortage in number of assayed compounds. In the previous contest, the most successful group found 2 hit compounds from 55 compounds assayed. Provided that an average hit rate was 2/600, the p-value calculated by the binomial test for the group was 0.015. Taking the problem of multiple comparisons into account by the Bonferroni correction, there were no methods that outperformed others. In addition, the experiment may fail to detect other good methods, because, even if a method has a 3% hit-rate potential, 18.7% of the trials would return 0 hit compounds with 55 assays. Thus, many more assays are required for reliable evaluation.

To evaluate our approach for collecting various methods to reduce the risk of allocating all resources towards one method, and for obtaining useful information regarding promising methods, we conducted another contest in this study. We increased the number of compounds to be assayed for each group to more than 180. We chose the same target molecule as in the previous contest, i.e., the tyrosine-protein kinase Yes, because participants could use protein structural information as well as active and inactive compound information for this target, as well as related kinases in the same family. While the structure of Yes has not been reported, many homologous protein structures are deposited in the Protein Data Bank (PDB)¹¹ (e.g., 1Y57 (Unphosphorylated state of the tyrosine-protein kinase Src. Positives to Yes=92%)¹², 2SRC (Phosphorylated state of the tyrosine-protein kinase Src, positives=92%)¹³, 1OPK (the tyrosine-protein kinase Abl, positives=63%))¹⁴. Experimental information from active and inactive compounds for the target are deposited in open databases, such as BindingDB^{15, 16}, ChEMBL¹⁷, DrugBank¹⁸, and PubChem¹⁹.

The compound screening contest was organized by the Initiative for Parallel Bioinformatics (IPAB). It started on January 15, 2015 and ended on March 20, 2015. Eleven groups participated in the contest. The participants were asked to propose a prioritized set of 400 compounds. We selected approximately top 180 compounds from the prioritized list from each group and, in total, 1,991 unique compounds were assayed. Ten potent compounds with half-maximal inhibitory concentrations (IC₅₀) less than 10 μmol L⁻¹ were identified. Overview of the procedure is shown in Fig. 1. Among the 11 methods, a successful method was identified for this target in terms of hit rate, and the salient features of this method are discussed.

Methods

Preparation of compound library

A compound library was originally provided by Enamine Ltd. and contained 2,382,017 of the available compounds in their inventory. We searched the known inhibitors of Src-family kinases shown in Table S1 (Supporting Information) that met with certain criteria from ChEMBL (version 19)²⁰ and BindingDB^{15, 16} to eliminate them from the original library. These criteria included compounds with IC₅₀ <10 μmol L⁻¹, K _i <10 μmol L⁻¹, K _d <10 μmol L⁻¹, and inhibition rates >30%, where we did not take experimental conditions into consideration. We found 3,528 unique compounds, hereafter referred to as the known inhibitors of the contest, among which 24 compounds were identified and eliminated from the original compound library. We also excluded compounds interacting with a number of proteins. We searched compounds that inhibit more than four proteins from ChEMBL, using the same inhibition criteria, and 5009 compounds were identified. This number was reduced to 245 compounds by filtering for drug-likeness, as defined in Table S2 (Supporting Information). From there, 54 compounds were identified and eliminated from the original library. Finally, the processed library contained 2,381,939 compounds, and it was distributed to participants of the contest. All compound IDs in this study correspond to Enamine Ltd. IDs.

Methods participated

We accepted 11 groups, which are referred to as G1−G11 hereafter, which proposed various methods (shown in Table 1). A detailed description from groups and proposed compounds of SMILES in prioritized order are given in the section Methods used by each group of the supporting information and supplemental materials. Here, we briefly describe each method.

Table 1 Summary of methods used by participant groups.

Full size table

G1: A structure-activity-relationship (SAR) model was built employing balanced random forests²¹. Ligand descriptors of PubChem bioactive data²² for Yes kinase were used as the training set, in which seven compounds with IC₅₀ <1 nmol L⁻¹ were selected as active compounds and the other 832 compounds were designated as inactive.

G2: An SAR model was built employing a deep neural network model, in which descriptors of randomly-chosen 80% of the PubChem bioactive data²² were used as a training set and the other 20% comprised the test set, each of which contained active and non-active compounds. Promising compounds based on the SAR model were selected, followed by a filtering of drug-likeness and diverse selection.

G3: Compounds that were physicochemically similar to those of known inhibitors were filtered using a modified QED²³. A randomized tree model²⁴ was built on the bases of the concatenated descriptors of known inhibitors, their target kinases, and experimental conditions (concentration of reagents) and was applied to filter compounds. Out-of-bag validation showed a good correlation between predicted and experimental values. The filtered compounds were re-ranked by three metrics: (1) the original ranking, (2) prioritized by ligand efficiency based on the number of heavy atoms, and (3) the novelty of compounds to the top 1,000 of the original ranking compared with Src-family inhibitors. The proposed compounds were rotationally picked up from the three ranks.

G4: The Yes protein structure was built using BLAST search with the Yes sequence. Homologous proteins having a ligand of the Yes sequence were searched and the bound ligands were remapped to the built protein, which was used for the docking²⁵ of known inhibitors considering remapped ligands. Based on the ability to pick up inhibitors, Yes and ligand pairs were selected. These structures were used for the docking of library compounds.

G5: The crystal structures of Abl kinase, available for both IN and OUT conformations^{14, 26}, were taken as templates and respective structures were built for Yes kinase²⁷. PD166326, a type I inhibitor (IN), and imatinib, a type II inhibitor (OUT) were co-crystallized with Abl kinase and docked with the IN and OUT models built for Yes kinase. On the basis of physicochemical properties, the initial compound library was filtered. Actives and decoys¹⁰ were added to the filtered compounds and subjected to docking²⁸ combined with pharmacophore-based virtual screening^{29, 30}. The same set of actives and decoys were included to validate the screening results. Finally, the top hit compounds from the pharmacophore-based virtual screening of DFG-IN and DFG-OUT conformations were applied.

G6: A virtual screening method³¹ was applied to the compound library that performed 3D structural comparison based on a multiple-ligand template built from known multiple inhibitors using a geometric hashing technique. If a steric clash between a compound and the target protein was found, a score for a given ligand pose was penalized. Twenty complex structures of homologous proteins of Yes and its ligands deposited in the PDB were selected on the basis of the ability to discriminate actives from decoys through docking. The selected 20 proteins and their bound ligands were superimposed by the protein structure alignment program MICAN^{32, 33} to the Yes structure model built based on the closest homology of Yes²⁷.

G7: A deep neural network was trained based on physicochemical and topological descriptors of active and inactive ligands. Hyperparameters of the deep neural network (e.g., a number of hidden layers) were also optimized using a random search³⁴ based on receiver-operating-characteristic (ROC) curves calculated using 5-fold cross-validation procedures in terms of known ligands. The model that gave the best ROC curve was applied to filter the compound library.

G8: The target protein structure was built from homologous proteins and its binding pocket was converted into three-dimensional Zernike descriptors (3DZD). Ligand structures from the compound library were also converted to the 3DZD and the compatibility of each ligand to the pocket was used to select a potential inhibitor.

G9: Homologous proteins of Yes were downloaded from the PDB and docking pockets that were distant from the ATP/substrate-binding pockets were searched to find allosteric sites. Among the prepared candidate structures, two structures that showed higher docking scores from a relatively small number of compounds were chosen for the production run. Docked compounds were prepared by filtering similar known inhibitors (85% similarity) from the compound library. Visual inspection was applied to eliminate compounds that did not have drug-likeness.

G10: First, known potent compounds were used to filter the compound library to be used for subsequent docking. The Yes protein and ligand complex structure were built by homology modeling, followed by a molecular dynamics (MD) simulation of the complex to relax the structure. The 40-ns structure of the complex was used for docking.

G11: Protein ligand complex structures were built from three homologous proteins. Docking of active and inactive compounds was applied to each structure and the ability to separate active from inactive compounds was evaluated. Those displaying reasonable ability were used for docking of the compound library. Docking poses of high-ranked compounds were re-ranked using scores that considered the similarity and dissimilarity of docking poses among active and inactive compounds.

Screening of Compounds

Experimental procedure and screening of potential inhibitors

All inhibitory assays of the phosphorylation activity of Yes were performed in accordance with the Promega Technical Manual for the ADP-Glo™ Kinase Assay (Fitchburg, WI, USA. Catalog number: V9102). The human recombinant Yes [a.a. 2–543 (end)] was purchased from BPS Bioscience (catalog number: 40488). The details of the assay protocol and reagent information are given elsewhere¹⁰. Here, we briefly describe the screening of the compounds based on inhibition activity and results.

The screening was conducted in three steps consisting of the first screening, the second screening, and the IC₅₀ determination, as can be illustrated in Fig. 1b. First, we determined the inhibition rates of the 1,991 compounds. Each compound was placed in 4 wells of a 384-well plate. In total, 80 compounds were assayed on one plate and the other wells were used for positive and negative controls. The compounds were randomly placed on plates so that compounds proposed by one group were not placed on a plate together. A mean of four inhibition rates for each compound was compared to criteria for the first screening. These criteria included that the inhibition rate was greater than 25% and the inhibition rate was greater than the mean plus three-fold of the standard deviation of the plate on which the compound was assayed, where observed inhibition rates of positive and negative controls were not taken into consideration. As a results, 68 compounds passed the screening. Information of the dropped compounds is given in Table S3 of the Supporting Information. Second, the inhibition rates of the screened compounds were determined on one plate using the same procedure of the first screening, where compounds were dissolved from fresh powder. As a result, 16 compounds showed inhibition rates greater than the threshold of the second screening (i.e., approximately 50%). Information for screened and dropped compounds is given in Table S4 of the Supporting Information. Screened compounds were then evaluated for their IC₅₀ values. The chemical structure and assay results of these compounds are given in Table 2.

Table 2 IC₅₀ values of compounds that passed the validation assay (the 2^nd screening).

Full size table

Among the 16 compounds, 10 compounds showed an IC₅₀ less than our hit criterion, which was an IC₅₀ less than 10 μmol L⁻¹, as shown in Table 2. These compounds showed a clear dose-response relationship (DRR) as can be seen in Figure S1 of the Supporting Information. As for Z1252403274 and Z275023406, which showed a good DRR, they were not defined as hit compounds because of insufficient potency. The other four compounds, Z50080378, Z57745307, Z50080181, and Z1283491630, did not reveal a DRR, having “inhibition activity” around 50% in the whole range of concentrations, which may be due to their non-specific interactions with the target (promiscuous protein binding, protein aggregation) or solubility-related issues. For these reasons, these compounds were excluded from consideration. Note that we confirmed that the threshold used for the second screening was reasonable as can be seen in Figure S2 of the Supporting Information.

The 10 hit compounds were compared to the pan-assay interference compounds (PAINS) filters, filters A, B, and C described in the literature³⁵, which suggests potential functional groups of frequent hitters extracted from HTS assays. We found that the 10 compounds do not have these potential functional groups. This means that all the hit compounds are promising for further investigation. It should be noted that some hit compounds have “questionable” chemotypes from a medicinal chemistry point of view, i.e., hydrazones (Z49895016, Z57745314, Z57745304, Z295464022) and a potential Michael acceptor (Z449737600), which may present a reactivity/toxicity liability. In the present study, we did not exclude them because only the biochemical assays, not cell-based, were used for screening in this study. This strongly decreases the chance of getting false positives with these compounds during the primary screen. The emphasis was also placed on avoiding the potential loss of any active scaffolds identified by the competing computational groups, rather than on the early elimination of less desirable chemical series. Substituting hydrazones with their non-reactive isosteres during hit-to-lead optimization is a feasible medicinal chemistry endeavor as is illustrated by some research publications^36,37,38. We will discuss these hydrazone-containing compounds in more details in the following section. The Michael acceptor could be substituted by an amide group between an acid and a cyclical secondary amine, to retain molecular rigidity.

Discussion

Hit rate of assayed compounds

The total number of hit compounds was 10 (Table 2), seven of which were proposed by G3, two by G10, and one by G11. G3 outperformed the other groups in terms of the number of hits and potencies of the compounds and was followed by G10 and G11 as can be clearly seen in Fig. 2a. Performances of these methods in terms of a hit rate, compared to an average rate of all the methods, can be evaluated by the binomial test while eliminating the problem of multiple comparisons by applying the Bonferroni correction. Assuming the hit rate of all the compounds is 10/1991, the p-values for G3, G10, and G11 were 4 × 10⁻⁵, 0.2, and 0.6, respectively. Hence, we confirmed that the method of G3 was statistically warranted.

We can also evaluate these methods by how they enriched active compounds in their prioritized ranks. The hit compounds of these methods were enriched toward higher ranks as can be seen in Fig. 2b. These methods showed better enrichment compared to the average rate.

These results suggest that the methods employed by these groups could reasonably distinguish active compounds. In the present study, we will mainly focus on these three methods.

Comparison of ligand-based and structure-based methods

The proposed methods were classified into LB and SB approaches. The LB approaches were defined as those methods that used active and/or inactive ligand information for relevant kinases regardless of the incorporation of protein structure. The SB approaches included methods that used protein structure and did not use ligand information for filtering of the compound library.

Hit rate

The groups that found hit compounds, G3, G10, and G11, can be classified into LB, LB→SB, and SB methods as tabulated in Table 1. G3 and G10 used ligand information in a direct way to filter the compound library; only G11 used ligand information in an indirect way, i.e., the selection of a protein structure used for docking. In this sense, it was only a single compound that was proposed by an SB method. Hence, compared to LB methods, it was very difficult to find a hit compound using an SB method in this study.

The proposed compounds from G3 were selected based on the three prioritized ranks (see the explanation of G3 in the section Methods participated). Four and three of the hit compounds of G3 were found by the original rank (1) and ligand-efficiency-based rank (2), respectively. No compounds were found from the novelty-based rank, which may indicate that finding novel compounds using an LB approach is difficult.

Novelty

It is of great importance to obtain a number of novel hit compounds in drug discovery³⁹. We compared which hit compounds from LB or SB gave novel compounds in this study. First, we calculated similarities between each hit compounds and known Src-family inhibitors defined in the Preparation of compound library section. Among the similarities calculated for each of the compounds, the maximum value was assigned to the compound as the max similarity. The most novel compound was proposed by G11 (SB), which used docking for the selection of compounds, as can be shown in Fig. 3. The second was proposed by G10 (LB→SB), which used known inhibitors to filter the compound library followed by docking. Almost all the other compounds were proposed by G3 (LB), which used known active and inactive compounds to build a compound filter. Among seven hits from G3, four compounds were hydrazone (Z49895016, Z57745314, Z57745304, Z295464022) and had a similar scaffold as can be shown by their structures. This was because the training set G3 used contained 65 known hydrazone-containing compounds, in which 58 compounds had inhibition rate greater than 50%, in the total number of compounds used of 2040. Among the compounds used, 56 compounds of the hydrazone-containing compounds were derived from Published Kinase Inhibitor Set (PKIS), which collected results of kinase panel experiments of 367 kinase inhibitors and was released from GlaxoSmithKline. A similar scaffold was reported by a clustering analysis of PKIS⁴⁰. This shows a clear dependency of the LB method on training data set used. Hence, we could say that an LB method is more likely to give similar hit compounds to known inhibitors in our contest. Conversely, one can resort to a method that uses an SB approach to obtain novel hit compounds. We also confirmed that hit compounds that were proposed by different groups were not similar to each other (see Figure S3 of the Supporting Information).

Characteristics of the most successful method

Among all the groups, the hit rate of compounds proposed by G3 was statistically confirmed to be higher than the others. We summarize the salient characteristics of the method here. G3 employed a machine learning technique based on a training set that combined three kinds of data for known active and inactive compounds of the Src and relevant kinase families. These data included compound descriptors, experimental conditions when the inhibition rate was measured, and target protein information. Inhibition rates were used for training the model instead of inhibition constants or IC₅₀ values because inhibition rates for compounds were relatively abundant. In some cases, G3 used inhibition rates that were measured for the determination of IC₅₀ values. Among the LB methods, experimental conditions and protein information were not used except for G3. Here we focus on these two characteristics and investigate the significance of incorporating these features.

Incorporation of experimental conditions for machine learning

G3 included several experimental conditions, as compiled in Table 3, when training the machine learning model using inhibition rates. This was based on the fact that an inhibition rate of a compound depends on experimental conditions (e.g., concentrations of compounds, enzyme, and ATP) and that experimental conditions can differ in different studies. Hence, incorporating these conditions in the training data sounds reasonable. Experimental conditions that accompanied the known compound information that G3 used were diverse, as seen in Table 3. The range of concentrations was broad, indicating that it is dangerous to build an SAR model based only on inhibition rates or IC₅₀ values from different experimental studies.

Table 3 Range of experimental conditions used for training a machine learning technique^a.

Full size table

To test the significance of incorporating experimental conditions, G3 conducted an OOB validation with and without experimental conditions. Excluding the experimental condition made prediction accuracy (R ²) decrease from 0.82 to 0.44. We believe that, especially in the case of building an SAR model as G3 conducted, considering experimental conditions would be crucially important if data sets are based on several experimental conditions. As which experimental conditions were significantly important was not clear in this study, further investigation and validation of the insights we obtained are needed. Further, incorporating substrate concentration, which was not used by G3, may help improve prediction accuracy.

Incorporation of protein information for machine learning

G3 used compound information for Src, Tec, and Abl kinase families, which are closely related⁴² (The ChEMBL IDs and references used are tabulated in Table S6 of the Supporting Information). While some compounds interact with a broad range of kinases, others have selectivity to a specific kinase^{43, 44}. To evaluate the selectivity of compounds that G3 used, we clustered the compounds into three groups and calculated the hit rates of compounds in each cluster with respect to each kinase group, in which the hit criterion was defined to be 50% inhibition. As can be seen in Fig. 4a, each cluster did not interact with each kinase family equally. This means that there is some selectivity of compounds in the three kinase families. To evaluate the selectivity of compounds within the Src family, we clustered the compounds that had experimental information available into three groups. As can be seen in Fig. 4b, the selectivity persists in these groups. The selectivity of Group 3 of Src-family kinases was different from the other two groups. This may be consistent with the fact that Group 3 is distantly related to the other groups⁴⁵. Hence, incorporating protein information to compound descriptors and experimental conditions may improve prediction accuracy.

An OOB validation showed that excluding protein descriptors made the prediction accuracy worse, i.e., the R ² decreased from 0.82 to 0.73, indicating that distinguishing protein targets was meaningful in this study. As the trained model can provide a potency of a compound for each kinase used in the training set, we could obtain a selective compound for a specific kinase. Interestingly, only combining a compound’s descriptor and protein information did not improve the prediction accuracy compared to using compound descriptors simply as a training set, i.e., the R ² was only improved from 0.43 to 0.44 by introducing protein information. This means that protein information becomes useful when it is used with experimental conditions for a training set.

Comparison to the previous contest

Comparing this study with the previous contest would give useful information. As we noted about the previous contest¹⁰, collecting various computational methods enables diversified screening in the chemical space of the contest library compared with a single method, as can be seen in Fig. 5 and Figure S4 of the Supporting Information. This reflects the diversity of hit compounds, as can be seen in Fig. 5b and Figure S3 of the Supporting Information. The contest-based approach can provide diverse hit compounds than a single method can do. In addition, comparing the chemical diversity of hit compounds of this study (Fig. 5b) to the previous contest (Fig. 5a), hit compounds obtained in this study had broader diversity.

The total hit rate improved from 2/600 to 10/1991 (hit compounds/assayed compounds). This improvement is remarkable considering that we eliminated known inhibitors of the Src-family from the contest library this time. In the previous contest, we eliminated known inhibitors of Yes, but all the hit compounds in the previous contest were known inhibitors of other Src-family kinases.

As we have discussed in “Experimental procedure and screening of potential inhibitors” and “Comparison of ligand-based and structure-based methods” sections, we decided not to exclude the hydrzones (Z49895016, Z57745314, Z57745304, Z295464022) and the potential Michael acceptor (Z449737600) from the hit list. However, it would be worth comparing this study to the previous contest with eliminating them from the list, because regarding them as possible compounds for lead optimization remains a matter of debate. The total hit rate decreases from 10/1991 to 5/1991, which is comparable to the previous hit rate of 2/600. Even though the questionable compounds were eliminated, considering the absence of known Src-family inhibitors in the compound library used, improvement of the second contest is warranted.

We speculate that iterative participation provides the opportunity for improvement in each method because the three groups that proposed hit compounds participated in both contests. Note that 92% of the compounds in the compound library in this study were included in the previous contest library and the ten hit compounds were also included in the compound library of the previous contest.

We expected to distinguish promising methods by increasing the number of compounds assayed. However, even if the number of assayed compounds for each group was reduced to the approximate number of assayed compounds in the previous contest (55), almost all hit compounds can be found (see Fig. 2b). The p-values for G3, G10, and G11 improve to 6 × 10⁻⁷, 0.04, and 0.26, respectively. Hence, the method of G3 is statistically warranted. Apparently, we could reduce the number of compounds to assay in this sense. However, a sufficient number of compounds to assay is necessary to detect a method with a modest hit rate. If a method has a hit rate of 3%, at least one hit compound can be found in 99.6% of the time in this experiment that assayed 180 compounds for each group. In this regard, the experiment did not miss promising methods with a significant hit rate.

Conclusion

The compound screening contest to predict potential inhibitors of the tyrosine-protein kinase Yes from the 2.4-million-compound library was held not only to identify potent inhibitors for the target and but also to benchmark various methods based on the same experimental conditions, in which 11 groups participated. Among 1,991 assayed compounds, ten hit compounds with IC₅₀ values less than 10 μmol L⁻¹ were identified, which are not likely to be frequent hitters in terms of the fact that they passed PAINS filters. Comparing this study with the previous contest, which was held by the same organizer with the same target¹⁰, the hit rate improved and the diversity of hit compounds grew broader.

The participating groups employed various approaches, which were classified as LB or SB approaches. Comparison of the LB and SB approaches by the three groups which proposed hits showed that the LB approach was more likely to give hit compounds, whereas the SB approach gives more novel hit compounds in our contest.

The characteristics of the most successful LB method, which identified seven hit compounds, were studied in terms of the training data set that the group used for a machine learning technique. We found that incorporation of experimental conditions, e.g., concentration of compounds under which inhibition rates were measured, significantly contributed to the prediction accuracy. In addition, the incorporation of protein descriptors to distinguish known compounds’ target kinase was found partly to contribute to improved prediction accuracy.

We confirmed that a contest-based approach to identify potential inhibitors of a target protein can be successful in identifying promising hit compounds. Moreover, it can provide an initial benchmark of various methods and suggests promising approaches for the target system. Extensive exploitation and further investigation of these methods should lead to additional novel hit compounds in the drug discovery process.

References

Paul, S. M. et al. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat Rev Drug Discov 9, 203–214, doi:10.1038/nrd3078 (2010).
CAS PubMed Google Scholar
Morgan, S., Grootendorst, P., Lexchin, J., Cunningham, C. & Greyson, D. The cost of drug development: a systematic review. Health Policy 100, 4–17, doi:10.1016/j.healthpol.2010.12.002 (2011).
Article PubMed Google Scholar
Loging, W. T. Bioinformatics and Computational Biology in Drug Discovery and Development. (Cambridge University Press, 2016).
DiMasi, J. A., Grabowski, H. G. & Hansen, R. W. Innovation in the pharmaceutical industry: New estimates of R&D costs. J. Health Econ. 47, 20–33, doi:10.1016/j.jhealeco.2016.01.012 (2016).
Article PubMed Google Scholar
Ou-Yang, S. S. et al. Computational drug discovery. Acta Pharmacol. Sin. 33, 1131–1140, doi:10.1038/aps.2012.109 (2012).
Article CAS PubMed PubMed Central Google Scholar
Meng, X. Y., Zhang, H. X., Mezei, M. & Cui, M. Molecular docking: a powerful approach for structure-based drug discovery. Curr. Comput. Aided Drug Des. 7, 146–157 (2011).
Article CAS PubMed PubMed Central Google Scholar
Acharya, C., Coop, A., Polli, J. E. & Mackerell, A. D. Jr. Recent advances in ligand-based drug design: relevance and utility of the conformationally sampled pharmacophore approach. Curr. Comput. Aided Drug Des. 7, 10–22 (2011).
Article CAS PubMed PubMed Central Google Scholar
Lionta, E., Spyrou, G., Vassilatis, D. K. & Cournia, Z. Structure-based virtual screening for drug discovery: principles, applications and recent advances. Curr. Top. Med. Chem. 14, 1923–1938 (2014).
Article CAS PubMed PubMed Central Google Scholar
von Korff, M., Freyss, J. & Sander, T. Comparison of Ligand- and Structure-Based Virtual Screening on the DUD Data Set. J. Chem. Inf. Model. 49, 209–231, doi:10.1021/ci800303k (2009).
Article Google Scholar
Chiba, S. et al. Identification of potential inhibitors based on compound proposal contest: Tyrosine-protein kinase Yes as a target. Sci. Rep. 5, 17209, doi:10.1038/srep17209 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242, doi:10.1093/nar/28.1.235 (2000).
Article ADS CAS PubMed PubMed Central Google Scholar
Cowan-Jacob, S. W. et al. The crystal structure of a c-Src complex in an active conformation suggests possible steps in c-Src activation. Structure 13, 861–871, doi:10.1016/j.str.2005.03.012 (2005).
Article CAS PubMed Google Scholar
Xu, W. Q., Doshi, A., Lei, M., Eck, M. J. & Harrison, S. C. Crystal structures of c-Src reveal features of its autoinhibitory mechanism. Mol. Cell 3, 629–638, doi:10.1016/S1097-2765(00)80356-1 (1999).
Article CAS PubMed Google Scholar
Nagar, B. et al. Structural Basis for the Autoinhibition of c-Abl Tyrosine Kinase. Cell 112, 859–871, doi:10.1016/S0092-8674(03)00194-6 (2003).
Article CAS PubMed Google Scholar
Liu, T. Q., Lin, Y. M., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 35, D198–D201, doi:10.1093/nar/gkl999 (2007).
Article CAS PubMed Google Scholar
Chen, X., Lin, Y., Liu, M. & Gilson, M. K. The Binding Database: data management and interface design. Bioinformatics 18, 130–139, doi:10.1093/bioinformatics/18.1.130 (2002).
Article CAS PubMed Google Scholar
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107, doi:10.1093/nar/gkr777 (2012).
Article CAS PubMed Google Scholar
Knox, C. et al. DrugBank 3.0: a comprehensive resource for ‘omics’ research on drugs. Nucleic Acids Res 39, D1035–1041, doi:10.1093/nar/gkq1126 (2011).
Article CAS PubMed Google Scholar
Li, Q., Cheng, T., Wang, Y. & Bryant, S. H. PubChem as a public resource for drug discovery. Drug Discov Today 15, 1052–1057, doi:10.1016/j.drudis.2010.10.003 (2010).
Article CAS PubMed PubMed Central Google Scholar
Bento, A. P. et al. The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42, D1083–D1090, doi:10.1093/nar/gkt1031 (2014).
Article CAS PubMed Google Scholar
Svetnik, V. et al. Random forest: a classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci. 43, 1947–1958, doi:10.1021/ci034160g (2003).
Article CAS PubMed Google Scholar
Patel, P. R. et al. Identification of potent Yes1 kinase inhibitors using a library screening approach. Bioorg. Med. Chem. Lett. 23, 4398–4403, doi:10.1016/j.bmcl.2013.05.072 (2013).
Article CAS PubMed PubMed Central Google Scholar
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98, doi:10.1038/nchem.1243 (2012).
Article CAS PubMed PubMed Central Google Scholar
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach Learn 63, 3–42, doi:10.1007/s10994-006-6226-1 (2006).
Article MATH Google Scholar
Takaya, D. et al. Bioinformatics based ligand-docking and in-silico screening. Chem. Pharm. Bull. 56, 742–744, doi:10.1248/cpb.56.742 (2008).
Article CAS PubMed Google Scholar
Nagar, B. et al. Crystal Structures of the Kinase Domain of c-Abl in Complex with the Small Molecule Inhibitors PD173955 and Imatinib (STI-571). Cancer Res. 62, 4236–4243 (2002).
CAS PubMed Google Scholar
Fiser, A. & Sali, A. MODELLER: Generation and refinement of homology-based protein structure models. Method Enzymol 374, 461–491, doi:10.1016/S0076-6879(03)74020-8 (2003).
Article CAS Google Scholar
Friesner, R. A. et al. Extra Precision Glide: Docking and Scoring Incorporating a Model of Hydrophobic Enclosure for Protein−Ligand Complexes. J. Med. Chem. 49, 6177–6196, doi:10.1021/jm051256o (2006).
Article CAS PubMed Google Scholar
Salam, N. K., Nuti, R. & Sherman, W. Novel Method for Generating Structure-Based Pharmacophores Using Energetic Analysis. J. Chem. Inf. Model. 49, 2356–2368, doi:10.1021/ci900212v (2009).
Article CAS PubMed Google Scholar
Loving, K., Salam, N. K. & Sherman, W. Energetic analysis of fragment docking and application to structure-based pharmacophore hypothesis generation. J. Comput. Aided Mol. Des. 23, 541–554, doi:10.1007/s10822-009-9268-1 (2009).
Article ADS CAS PubMed Google Scholar
Okuno, T., Kato, K., Terada, T. P., Sasai, M. & Chikenji, G. VS-APPLE: A Virtual Screening Algorithm Using Promiscuous Protein-Ligand Complexes. J. Chem. Inf. Model. 55, 1108–1119, doi:10.1021/acs.jcim.5b00134 (2015).
Article CAS PubMed Google Scholar
Minami, S., Sawada, K. & Chikenji, G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(alpha) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 14, 24, doi:10.1186/1471-2105-14-24 (2013).
Article CAS PubMed PubMed Central Google Scholar
Minami, S., Sawada, K. & Chikenji, G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. Plos One 9, e107959, doi:10.1371/journal.pone.0107959 (2014).
Article ADS PubMed PubMed Central Google Scholar
Bergstra, J. & Bengio, Y. Random Search for Hyper-Parameter Optimization. J Mach Learn Res 13, 281–305 (2012).
MathSciNet MATH Google Scholar
Baell, J. B. & Holloway, G. A. New Substructure Filters for Removal of Pan Assay Interference Compounds (PAINS) from Screening Libraries and for Their Exclusion in Bioassays. J. Med. Chem. 53, 2719–2740, doi:10.1021/jm901137j (2010).
Article CAS PubMed Google Scholar
Mayer, N. et al. Structure-activity studies in the development of a hydrazone based inhibitor of adipose-triglyceride lipase (ATGL). Bioorganic & medicinal chemistry 23, 2904–2916, doi:10.1016/j.bmc.2015.02.051 (2015).
Article CAS Google Scholar
Yogeeswari, P., Menon, N., Semwal, A., Arjun, M. & Sriram, D. Discovery of molecules for the treatment of neuropathic pain: synthesis, antiallodynic and antihyperalgesic activities of 5-(4-nitrophenyl)furoic-2-acid hydrazones. Eur. J. Med. Chem. 46, 2964–2970, doi:10.1016/j.ejmech.2011.04.021 (2011).
Article CAS PubMed Google Scholar
Senger, M. R., Fraga, C. A., Dantas, R. F. & Silva, F. P. Jr. Filtering promiscuous compounds in early drug discovery: is it a good idea? Drug Discov Today 21, 868–872, doi:10.1016/j.drudis.2016.02.004 (2016).
Article CAS PubMed Google Scholar
Owens, P. K. et al. A decade of innovation in pharmaceutical R&D: the Chorus model. Nat. Rev. Drug Discov. 14, 17–28, doi:10.1038/nrd4497 (2015).
Article CAS PubMed Google Scholar
Dranchak, P. et al. Profile of the GSK published protein kinase inhibitor set across ATP-dependent and-independent luciferases: implications for reporter-gene assays. Plos One 8, e57888, doi:10.1371/journal.pone.0057888 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Durant, J. L., Leland, B. A., Henry, D. R. & Nourse, J. G. Reoptimization of MDL keys for use in drug discovery. J. Chem. Inf. Comput. Sci. 42, 1273–1280, doi:10.1021/ci010132r (2002).
Article CAS PubMed Google Scholar
Manning, G., Whyte, D. B., Martinez, R., Hunter, T. & Sudarsanam, S. The protein kinase complement of the human genome. Science 298, 1912–1934, doi:10.1126/science.1075762 (2002).
Article ADS CAS PubMed Google Scholar
Anastassiadis, T., Deacon, S. W., Devarajan, K., Ma, H. & Peterson, J. R. Comprehensive assay of kinase catalytic activity reveals features of kinase inhibitor selectivity. Nat. Biotechnol. 29, 1039–1045, doi:10.1038/nbt.2017 (2011).
Article CAS PubMed PubMed Central Google Scholar
Davis, M. I. et al. Comprehensive analysis of kinase inhibitor selectivity. Nat. Biotechnol. 29, 1046–1051, doi:10.1038/nbt.1990 (2011).
Article CAS PubMed Google Scholar
Roskoski, R. Jr. Src protein-tyrosine kinase structure, mechanism, and small molecule inhibitors. Pharmacol. Res. 94, 9–25, doi:10.1016/j.phrs.2015.01.003 (2015).
Article CAS PubMed Google Scholar
Canvas v. 2.8 (Schrödinger, LLC, New York, NY, 2016).
Duan, J. X., Dixon, S. L., Lowrie, J. F. & Sherman, W. Analysis and comparison of 2D fingerprints: Insights into database screening performance using eight fingerprint methods. J. Mol. Graph. Model. 29, 157–170, doi:10.1016/j.jmgm.2010.05.008 (2010).
Article CAS PubMed Google Scholar
Lloyd, S. P. Least-Squares Quantization in Pcm. IEEE Trans. Inf. Theory 28, 129–137, doi:10.1109/Tit.1982.1056489 (1982).
Article MathSciNet MATH Google Scholar
Yap, C. W. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem. 32, 1466–1474, doi:10.1002/jcc.21707 (2011).
Article CAS PubMed Google Scholar
Rogers, D. & Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 50, 742–754, doi:10.1021/ci100050t (2010).
Article CAS PubMed Google Scholar
Carhart, R. E., Smith, D. H. & Venkataraghavan, R. Atom pairs as molecular features in structure-activity studies: definition and applications. J. Chem. Inf. Comput. Sci. 25, 64–73, doi:10.1021/ci00046a002 (1985).
Article CAS Google Scholar
van Westen, G. J. et al. Which compound to select in lead optimization? Prospectively validated proteochemometric models guide preclinical development. Plos One 6, e27518, doi:10.1371/journal.pone.0027518 (2011).
Article ADS PubMed PubMed Central Google Scholar
Sandberg, M., Eriksson, L., Jonsson, J., Sjöström, M. & Wold, S. New Chemical Descriptors Relevant for the Design of Biologically Active Peptides. A Multivariate Characterization of 87 Amino Acids. J. Med. Chem. 41, 2481–2491, doi:10.1021/jm9700575 (1998).
Article CAS PubMed Google Scholar
Umeyama, H. & Iwadate, M. FAMS and FAMSBASE for protein structure. Curr. Protoc. Bioinformatics Chapter 5, Unit5 2, doi:10.1002/0471250953.bi0502s04 (2004).
O’Boyle, N. M. et al. Open Babel: An open chemical toolbox. J. Cheminform. 3, 33, doi:10.1186/1758-2946-3-33 (2011).
Article PubMed PubMed Central Google Scholar
LigPrep v. 3.2 (Schrödinger, LLC, New York, NY, 2014).
Glide v. 6.0 (Schrödinger, LLC, New York, NY, 2014).
Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749, doi:10.1021/jm0306430 (2004).
Article CAS PubMed Google Scholar
Halgren, T. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47, 1750–1759, doi:10.1021/jm030644s (2004).
Article CAS PubMed Google Scholar
Hawkins, P. C. D., Skillman, A. G., Warren, G. L., Ellingson, B. A. & Stahl, M. T. Conformer Generation with OMEGA: Algorithm and Validation Using High Quality Structures from the Protein Databank and Cambridge Structural Database. J. Chem. Inf. Model. 50, 572–584, doi:10.1021/ci100031x (2010).
Article CAS PubMed PubMed Central Google Scholar
Ko, J., Park, H. & Seok, C. GalaxyTBM: template-based modeling by building a reliable core and refining unreliable local regions. BMC Bioinformatics 13, 1–8, doi:10.1186/1471-2105-13-198 (2012).
Article Google Scholar
Hennequin, L. F. et al. N-(5-Chloro-1,3-benzodioxol-4-yl)−7-[2-(4-methylpiperazin-1-yl)ethoxy]−5- (tetrahydro-2H-pyran-4-yloxy)quinazolin-4-amine, a Novel, Highly Selective, Orally Available, Dual-Specific c-Src/Abl Kinase Inhibitor. J. Med. Chem. 49, 6465–6488, doi:10.1021/jm060434q (2006).
Article CAS PubMed Google Scholar
Witucki, L. A. et al. Mutant Tyrosine Kinases with Unnatural Nucleotide Specificity Retain the Structure and Phospho-Acceptor Specificity of the Wild-Type Enzyme. Chemistry & Biology 9, 25–33, doi:10.1016/S1074-5521(02)00091-1 (2002).
Article CAS Google Scholar
Xu, W., Harrison, S. C. & Eck, M. J. Three-dimensional structure of the tyrosine kinase c-Src. Nature 385, 595–602, doi:10.1038/385595a0 (1997).
Article ADS CAS PubMed Google Scholar
Hu, B., Zhu, X., Monroe, L., Bures, M. G. & Kihara, D. PL-PatchSurfer: a novel molecular local surface-based method for exploring protein-ligand interactions. Int. J. Mol. Sci. 15, 15122–15145, doi:10.3390/ijms150915122 (2014).
Article CAS PubMed PubMed Central Google Scholar
Shin, W. H., Christoffer, C. W., Wang, J. & Kihara, D. PL-PatchSurfer2: Improved Local Surface Matching-Based Virtual Screening Method That Is Tolerant to Target and Ligand Structure Variation. J. Chem. Inf. Model. 56, 1676–1691, doi:10.1021/acs.jcim.6b00163 (2016).
Article CAS PubMed Google Scholar
Fleury, D., Sarubbi, E., Courjaud, A., Guitton, J. & Ducruix, A. Structure of the unphosphorylated c-terminal tail segment of the src kinase and its role in src activity regulation. To be published.
Bauerova-Hlinkova, V., Dvorsky, R., Perecko, D., Povazanec, F. & Sevcik, J. Structure of RNase Sa2 complexes with mononucleotides–new aspects of catalytic reaction and substrate recognition. FEBS J 276, 4156–4168, doi:10.1111/j.1742-4658.2009.07125.x (2009).
Article CAS PubMed Google Scholar
OEDOCKING v. 3.2.0.2 (Santa Fe, NM).
McGann, M. F. R. E. D. and HYBRID docking performance on standardized datasets. J. Comput. Aided Mol. Des. 26, 897–906, doi:10.1007/s10822-012-9584-8 (2012).
Article ADS CAS PubMed Google Scholar
McGann, M. FRED pose prediction and virtual screening accuracy. J. Chem. Inf. Model. 51, 578–596, doi:10.1021/ci100436p (2011).
Article CAS PubMed Google Scholar
Soding, J., Biegert, A. & Lupas, A. N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–248, doi:10.1093/nar/gki408 (2005).
Article PubMed PubMed Central Google Scholar
Buchan, D. W., Minneci, F., Nugent, T. C., Bryson, K. & Jones, D. T. Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res. 41, W349–357, doi:10.1093/nar/gkt381 (2013).
Article PubMed PubMed Central Google Scholar
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1-2, 19–25, doi:10.1016/j.softx.2015.06.001 (2015).
Article ADS Google Scholar
Maier, J. A. et al. ff14SB: Improving the Accuracy of Protein Side Chain and Backbone Parameters from ff99SB. J. Chem. Theory Comput. 11, 3696–3713, doi:10.1021/acs.jctc.5b00255 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wang, J., Wolf, R. M., Caldwell, J. W., Kollman, P. A. & Case, D. A. Development and testing of a general amber force field. J. Comput. Chem. 25, 1157–1174, doi:10.1002/jcc.20035 (2004).
Article CAS PubMed Google Scholar
Hawkins, P. C. D., Skillman, A. G. & Nicholls, A. Comparison of shape-matching and docking as virtual screening tools. J. Med. Chem. 50, 74–82, doi:10.1021/jm0603365 (2007).
Article CAS PubMed Google Scholar
Thomsen, R. & Christensen, M. H. MolDock: a new technique for high-accuracy molecular docking. J. Med. Chem. 49, 3315–3321, doi:10.1021/jm051197e (2006).
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We gratefully acknowledge the financial support of Research Organization for Information Science and Technology (RIST), The Japan Biological Informatics Consortium (JBIC), HPCTECH Corporation, Schrödinger K.K., IMSBIO Co., Ltd., Dassault Systemes Biovia K.K., DiscoveResource, Inc., DataDirect Networks Japan, Inc., DELL, Namiki Shoji Co., Ltd., NEC Corporation, MITSUI KNOWLEDGE INDUSTRY CO., LTD., Lisit, Co., Ltd., Leave a Nest Co., Ltd., Level Five Co., Ltd., which made it possible to complete our contest. We are deeply grateful to Tokyo Institute of Technology, Ministry of Economy, Trade and Industry (METI), New Energy and Industrial Technology Development Organization (NEDO), Japan Pharmaceutical Manufacturers Association (JPMA), Information Processing Society of Japan (IPSJ), Japanese Society of Bioinformatics (JSBi), Chem-Bio Informatics (CBI) Society, PC Cluster Consortium, The Science News Ltd., Nikkan Kogyo Shimbun Ltd. We would like to offer our special thanks to Mr. Toshiaki Miyaki, Dr. Kazuki Ohno, and Ms. Kanako Ozeki. This work is partially supported by the Research Complex Program “Wellbeing Research Campus: Creating new values through technological and social innovation” from Japan Science and Technology Agency, JST and the Platform Project for Supporting Drug Discovery and Life Science Research (Basis for Supporting Innovative Drug Discovery and Life Science Research) from Japan Agency for Medical Research and Development, AMED.

Author information

Authors and Affiliations

Advanced Computational Drug Discovery Unit, Institute of Innovative Research, Tokyo Institute of Technology, J3-23 4259 Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Japan
Shuntaro Chiba, Takashi Ishida, Teruki Honma, Yutaka Akiyama & Masakazu Sekijima
Education Academy of Computational Life Sciences, Tokyo Institute of Technology, J3-141 4259 Nagatsuta-cho, Midori-ku, Yokohama, 226-8501, Japan
Shuntaro Chiba, Takashi Ishida, Yutaka Akiyama & Masakazu Sekijima
Department of Computer Science, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
Takashi Ishida, Shogo D. Suzuki, Keisuke Yanagisawa, Nobuaki Yasuo, Teruki Honma, Yutaka Akiyama & Masakazu Sekijima
Level Five Co. Ltd., Shiodome Shibarikyu Bldg., 1-2-3 Kaigan, Minato-ku, Tokyo, 105-0022, Japan
Kazuyoshi Ikeda
IMSBIO Co., Ltd., Level 6 OWL TOWER, 4-21-1 Higashi-Ikebukuro, Toshima-ku, Tokyo, 170-0013, Japan
Masahiro Mochizuki
Forerunner Pharma Research, Co., Ltd., Yokohama Bio Industry Center, 1-6 Suehiro-cho, Tsurumi-ku, Yokohama, 230-0045, Japan
Reiji Teramoto
Department of Physics, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Y-h. Taguchi
Department of Biological Sciences, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
Mitsuo Iwadate & Hideaki Umeyama
Department of Biotechnology, Bhupat Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, 600036, Tamilnadu, India
Chandrasekaran Ramakrishnan & M. Michael Gromiha
CAS in Crystallography and Biophysics and Bioinformatics Facility, University of Madras, Chennai, 600025, Tamilnadu, India
A. Mary Thangakani & D. Velmurugan
Division of Neurogenetics, Nagoya University Graduate School of Medicine, 65 Tsurumai, Showa-ku, Nagoya, 466-8550, Japan
Tatsuya Okuno
Department of Computational Science and Engineering, Nagoya University, Furocho, Chikusa, Nagoya, 464-8603, Japan
Koya Kato & George Chikenji
Department of Complex Systems Science, Graduate School of Information Science, Nagoya University, Furocho, Chikusa, Nagoya, 464-8601, Japan
Shintaro Minami
Department of Biological Sciences, Purdue University, Indiana, 47907, USA
Woong-Hee Shin & Daisuke Kihara
Department of Computer Science, Purdue University, Indiana, 47907, USA
Daisuke Kihara
Isotope Science Center, The University of Tokyo, 2-11- 16, Yayoi, Bunkyo-ku, Tokyo, 113-0032, Japan
Kazuki Z. Yamamoto
Department of Biotechnology, The University of Tokyo, 1-1-1 Yayoi, Bunkyo-ku, Tokyo, 113-8657, Japan
Yoshitaka Moriwaki & Ryunosuke Yoshino
Global Scientific Information and Computing Center, Tokyo Institute of Technology, 2-12-1, Ookayama, Meguro-ku, Tokyo, 152-8550, Japan
Ryunosuke Yoshino & Masakazu Sekijima
Bienta/Enamine Ltd., 78 Chervonotkatska Street, Kyiv, 02660, Ukraine
Sergey Zozulya, Petro Borysko & Roman Stavniichuk
National Taras Shevchenko University of Kyiv, 64/13 Volodymyrska Street, Kyiv, 01601, Ukraine
Sergey Zozulya & Petro Borysko
Center for Life Science Technologies, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Teruki Honma
Molecular Profiling Research Center for Drug Discovery, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
Takatsugu Hirokawa & Yutaka Akiyama
Division of Biomedical Science, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, Ibaraki, 305-8575, Japan
Takatsugu Hirokawa
Initiative for Parallel Bioinformatics, Level 14 Hibiya Central Building, 1-2-9 Nishi-Shimbashi Minato-Ku, Tokyo, 105-0003, Japan
Takatsugu Hirokawa, Yutaka Akiyama & Masakazu Sekijima

Authors

Shuntaro Chiba
View author publications
You can also search for this author in PubMed Google Scholar
Takashi Ishida
View author publications
You can also search for this author in PubMed Google Scholar
Kazuyoshi Ikeda
View author publications
You can also search for this author in PubMed Google Scholar
Masahiro Mochizuki
View author publications
You can also search for this author in PubMed Google Scholar
Reiji Teramoto
View author publications
You can also search for this author in PubMed Google Scholar
Y-h. Taguchi
View author publications
You can also search for this author in PubMed Google Scholar
Mitsuo Iwadate
View author publications
You can also search for this author in PubMed Google Scholar
Hideaki Umeyama
View author publications
You can also search for this author in PubMed Google Scholar
Chandrasekaran Ramakrishnan
View author publications
You can also search for this author in PubMed Google Scholar
A. Mary Thangakani
View author publications
You can also search for this author in PubMed Google Scholar
D. Velmurugan
View author publications
You can also search for this author in PubMed Google Scholar
M. Michael Gromiha
View author publications
You can also search for this author in PubMed Google Scholar
Tatsuya Okuno
View author publications
You can also search for this author in PubMed Google Scholar
Koya Kato
View author publications
You can also search for this author in PubMed Google Scholar
Shintaro Minami
View author publications
You can also search for this author in PubMed Google Scholar
George Chikenji
View author publications
You can also search for this author in PubMed Google Scholar
Shogo D. Suzuki
View author publications
You can also search for this author in PubMed Google Scholar
Keisuke Yanagisawa
View author publications
You can also search for this author in PubMed Google Scholar
Woong-Hee Shin
View author publications
You can also search for this author in PubMed Google Scholar
Daisuke Kihara
View author publications
You can also search for this author in PubMed Google Scholar
Kazuki Z. Yamamoto
View author publications
You can also search for this author in PubMed Google Scholar
Yoshitaka Moriwaki
View author publications
You can also search for this author in PubMed Google Scholar
Nobuaki Yasuo
View author publications
You can also search for this author in PubMed Google Scholar
Ryunosuke Yoshino
View author publications
You can also search for this author in PubMed Google Scholar
Sergey Zozulya
View author publications
You can also search for this author in PubMed Google Scholar
Petro Borysko
View author publications
You can also search for this author in PubMed Google Scholar
Roman Stavniichuk
View author publications
You can also search for this author in PubMed Google Scholar
Teruki Honma
View author publications
You can also search for this author in PubMed Google Scholar
Takatsugu Hirokawa
View author publications
You can also search for this author in PubMed Google Scholar
Yutaka Akiyama
View author publications
You can also search for this author in PubMed Google Scholar
Masakazu Sekijima
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors made substantial contributions to this study and article. S.C., T.I., Y.A. and M.S. organized and operated the contest. S.Z., P.B., and R.S. carried out inhibitory assays. S.C., T.I., K.I., M.M., Te.H., and Ta.H. evaluated data. R.T., H.h.T., M.I., H.U., M.M., C.R., A.M.T., D.V., M.M.G., T.O., K.K., S.M., G.C., S.D.S., K.Y., W.H.S., D.K., K.Z.Y., Y.M., N.Y., and R.Y. participated the contest and predicted hit compound for target protein by their method. S.C., T.I., and M.S. wrote the main manuscript text. All authors approved this version to be published.

Corresponding author

Correspondence to Masakazu Sekijima.

Ethics declarations

Competing Interests

The authors declare that they have no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Chiba, S., Ishida, T., Ikeda, K. et al. An iterative compound screening contest method for identifying target protein inhibitors using the tyrosine-protein kinase Yes. Sci Rep 7, 12038 (2017). https://doi.org/10.1038/s41598-017-10275-4

Download citation

Received: 09 March 2017
Accepted: 07 August 2017
Published: 20 September 2017
DOI: https://doi.org/10.1038/s41598-017-10275-4

This article is cited by

MERMAID: an open source automated hit-to-lead method based on deep reinforcement learning
- Daiki Erikawa
- Nobuaki Yasuo
- Masakazu Sekijima
Journal of Cheminformatics (2021)
Identification of key interactions between SARS-CoV-2 main protease and inhibitor drug candidates
- Ryunosuke Yoshino
- Nobuaki Yasuo
- Masakazu Sekijima
Scientific Reports (2020)
Molecular Dynamics Simulation reveals the mechanism by which the Influenza Cap-dependent Endonuclease acquires resistance against Baloxavir marboxil
- Ryunosuke Yoshino
- Nobuaki Yasuo
- Masakazu Sekijima
Scientific Reports (2019)
QEX: target-specific druglikeness filter enhances ligand-based virtual screening
- Masahiro Mochizuki
- Shogo D. Suzuki
- Yutaka Akiyama
Molecular Diversity (2019)
A prospective compound screening contest identified broader inhibitors for Sirtuin 1
- Shuntaro Chiba
- Masahito Ohue
- Masakazu Sekijima
Scientific Reports (2019)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Methods

Preparation of compound library

Methods participated

Screening of Compounds

Experimental procedure and screening of potential inhibitors

Discussion

Hit rate of assayed compounds

Comparison of ligand-based and structure-based methods

Hit rate

Novelty

Characteristics of the most successful method

Incorporation of experimental conditions for machine learning

Incorporation of protein information for machine learning

Comparison to the previous contest

Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing Interests

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links