Introduction

Long QT syndrome (LQTs) occurs frequently and is associated with a potential risk of life-threatening torsades de pointes (TdP). Many drugs have been withdrawn from the market for LQTs toxicity1,2,3. The human ether-à-go-go-related gene (hERG) product is a tetrameric potassium channel that plays an important role in cardiac action potential. Currently, a large number of clinical cases of drug-induced or acquired LQTs have been ascribed to the blockage of the hERG channel4,5. Therefore, it is necessary to measure the hERG inhibition activity of candidate compounds during drug discovery. Many experimental assays have been used to detect hERG blockage, such as patch clamp assays, radioligand binding assays, and fluorescence-based assays6. However, these assays are time-consuming, labor-intensive and costly. It is of high interest to develop computational methods that can rapidly and accurately assess the hERG liability of a compound before it is synthesized or purchased from a commercial library.

In past years, many hERG blockage prediction approaches have been reported and can be roughly categorized as structure-based or ligand-based. Because the crystal structure of the hERG potassium channel has not been resolved as yet, the structure-based approaches rely mainly on homology modeling. These structure-based approaches are useful for explaining the binding effect of closely related analogues or a small collection of compounds. However, due to the lack of a suitable template three-dimensional (3D) structure with sufficient sequence identity, the modeled hERG structure shows limited ability to substantially improve the prediction capabilities of these approaches.

With the rapid accumulation of hERG blockage data for various compounds, more ligand-based approaches were reported. In 2002, Cavalli et al used the CoMFA method analysis for 31 drugs and built a four-point pharmacophore model that included a positively charged tertiary amine flanked by three aromatic or hydrophobic centers, which was validated using 6 compounds (r2=0.744)7. In 2003, Keseru et al developed traditional and hologram QSAR (HQSAR) models with five descriptors, including ClogP, molar refractivity (CMR), partial negative surface area (PNSA1), polarizability (W2), and hydrophobicity (D3), to predict hERG affinities, for a test set of 13 compounds (r=0.75)8. In 2006, Yoshida et al carried out 2D-quantitative structure activity relationship (2D-QSAR) studies on 104 hERG channel blockers with descriptors that included the octanol/water partition coefficient, topological polar surface area, molecular diameter, the summed surface area of the atoms and an indicator variable representing the experimental conditions for a test set containing 18 compounds (r2=0.688)9. In 2008, Li et al performed a classification model of hERG blockage for 495 compounds based on GRIND descriptors and the support vector machine (SVM) method, which achieved an accuracy of 92% for the training set but only an accuracy of 72% for the test set of 66 WOMBAT-PK compounds10. In 2011, Shen et al performed a model with 4D-fingerprints (4D-FPs) and traditional 2D and 3D VolSurf-like molecular descriptors, based on the PubChem hERG Bioassay data set containing 876 compounds — the accuracy for this model was 87% for an external test set of 356 compounds11. However, the PubChem hERG Bioassay data set was assembled from diverse sources and measured by various experimenters, which might cause the resulting model to be less reliable. In 2010, Doddareddy et al developed linear discriminant analysis (LDA) and SVM models based on a large dataset of 2644 compounds. Extended-connectivity fingerprints were used to describe chemical space. The best SVM-ECFP_6 model showed 88% accuracy for the external test set, which contained 255 compounds12. In 2013, Wang et al reviewed recent developments in computational prediction of hERG blockage, and they proposed that more reliable experimental data and a consensus modeling strategy are required to improve the performance of current computational models13.

hERG blockage data for chemicals are quickly accumulated, and a QSAR model based on a large dataset is a good approach to accurately predict the property of hERG blockage. Although Shen used PubChem containing a large amount of data and obtained a good prediction, the 4D-FP descriptors were generated based on estimations of the conformation energy profiles of molecules by molecular dynamics simulation, which is difficult to obtain11. So far, the largest dataset used for hERG blockage prediction was compiled by Doddareddy et al. They constructed SVM models that yielded good prediction accuracy but showed limited ability to interpret the mechanism of action. In this study, we used Doddareddy's data set and constructed models with a Laplacian-corrected Bayesian classification method, which considers the effects of some features in identifying toxic compounds. Simple molecular properties and extended-connectivity fingerprints were used as descriptors. These models performed well for the training and test sets and could be used for large-scale prediction.

Materials and methods

Datasets

The work reported here used the hERG blockage data from Doddareddy and co-workers12, which contained 2644 compounds, including 1479 literature compounds tested for effects on the hERG channel and 1165 FDA-approved drugs. The dataset was divided into training and test sets as described12. Activity data were expressed as an IC50 value, measured by either patch clamp or radioligand binding assays, which were conducted in human HEK-293 and CHO cell lines. Although there were some differences in the pIC50 values, these two assay results showed a consistent trend. Moreover, because we choose to build qualitative (classification) models, the quantitative variability in different assays can be considerably reduced by setting appropriate activity thresholds for separating blockers from nonblockers. Here, we employed Doddareddy's criteria, ie, the compounds with IC50 value <10 μmol/L were labeled as blockers and those with IC50 value >30 μmol/L and FDA-approved drugs as nonblockers.

Calculation of molecular properties

Many previous studies have revealed that molecular properties are relevant to hERG blockage, including the molecular lipid-water partition coefficient (ClogP), molecular weight (MW), and the molecular surface area information8,9. In this study, molecular weight (MW), molecular fractional polar surface area (FPSA), and lipophilicity (ALogP) were used. We also calculated the pKa of the most positive basic nitrogen (pKa basic) based on previous pharmacophore models7,14,15. These four molecular properties were used as simple molecular descriptors for hERG blockage modeling and were calculated with the “Calculate Molecular Properties” protocol within Discovery Studio (version 3.0; Accelrys, San Diego, CA, USA) and components in Pipeline Pilot (version 7.5; Accelrys, San Diego, CA, USA).

Extended-connectivity fingerprints (ECFPs) are a class of 2D fingerprints for capturing molecular features based on a variant of the Morgan algorithm16. An example of the generation process of ECFP for benzamide is shown in Figure 1. The ECFPs' capture atom information based on the Daylight atomic invariants rule17 and the identifiers are hashed into a single 32-bit integer value18. The number of iterations performed is determined by the maximum diameter of the neighborhoods requested. For example, “ECFP_4” generates features around each atom up to a diameter of 4, which requires two iterations. We also calculated the fingerprints with the “Calculate Molecular Properties” protocol within the Discovery Studio (version 3.0; Accelrys, San Diego, CA, USA).

Figure 1
figure 1

Example to show the ECFP iterative generation procedure.

PowerPoint slide

Modeling methods

Laplacian-corrected Bayesian classifier models were generated using Discovery Studio (version 3.0; Accelrys, San Diego, CA, USA)19,20,21. Bayesian categorization model is a statistical classification method, which use knowledge of probability and statistics based on Bayes' theorem (Eq 1):

Where A is the classification of model, B is the observed value, P(A) is the prior probability or marginal probability of classification A without considering any B factors, P(B) is the prior or marginal probability of the value, P(B|A) is the posterior probability (the probability of value B if classification A is true), and P(A|B) is the probability that classification A is true given the observed data B (also called the posterior probability)22.

We choose to build a Laplacian-corrected Bayesian classifier because it considers the complexity of the model as well as the likelihood and then picks the simplest model to explain observed data, which can avoid overfitting. The Bayesian classification technique was widely used in ADME/T predictions23,24,25. In our modeling process, the “good” samples (blockers) must be labeled first; then the model learns to distinguish the good samples from the bad samples (nonblockers). The learn-by-example process worked as follows: given a sample compound structure, the features of the sample were generated and then converted into Boolean forms. A bin was defined to count the frequency of the fingerprints and continuous values in a given range. Finally, the number of occurrences of each feature in the blocker subset, as well as in all samples, was collected. In addition, for each feature, a weight was calculated using the Laplacian-adjusted probability estimate. The Laplacian-adjusted process can be summarized as follows (Eq 2, 3, 4):

where a feature Fi is contained in B samples, and A of those B samples are active. K is a constant [virtual samples of P(Active) were added K times to stabilize the estimator to ensure more weight was assigned to the features that occurred more frequently and little weight was assigned to those that occurred less frequently]. When n features of a compound were generated, a cumulative score of feature contributions to the actives likeness (Pcombined) was computed. More details concerning the method have been described by Xia et al26.

Two different models were built: one used simple molecular properties as descriptors, and the other used simple molecular properties and ECFP fingerprints as descriptors. The Bayesian models with leave-one-out (LOO) validation were built with the training set and then evaluated with external test sets using the performance metrics described below.

Model evaluation

To evaluate the performance of the models, we used the parameter sensitivity (SE, also referred to as recall or the true-positive rate, the percentage of active compounds correctly predicted), specificity (SP, also known as the true-negative rate, the percentage of inactive compounds correctly predicted) and the global accuracy (AC, the total percentage correctly predicted). These parameters are defined as follows (Eq 5, 6, 7):

In Eqs 5,6,7, TP is the number of true positives (active compounds that are correctly predicted), TN is the number of true negatives (inactive compounds that are correctly predicted), FN is the number of false negatives (active compounds that are incorrectly predicted to be inactive), and FP is the number of false positives (inactive compounds that are incorrectly predicted to be active). The above parameters were calculated for the training and the test sets.

Results and discussion

Bayesian classifiers

The leave-one-out (LOO) statistical results for the training set for Bayesian classifiers are summarized in Table 1. The first Bayesian model based on the four simple descriptors (Model 1) showed a high sensitivity of 91% but with poor specificity of 70%. Then the ECFPs fingerprint descriptors were added to the simple descriptors; the best classifier was the model based on the simple descriptors+ECFP_14 (Model 2), which had a sensitivity of 96%, a specificity of 96%, and an overall accuracy of 96%, outperforming the best model of Doddareddy et al.

Table 1 LOO results of the Bayesian models for the training set.

The two models were then validated with test set I, which comprised 255 molecules. As shown in Table 2, the sensitivity and specificity for Model 2 were 90% and 92%, respectively. Model 1 also gave a good sensitivity of 91% but a relatively poor specificity of 75%. Model 1 performed well on the blocking molecules but poorly on the non-blocking molecules. The performance was improved when ECFP_14 was added, especially for the predictive capability for nonblockers. This result suggested that ECFP_14 is useful for identifying the common structural features of compounds that may avoid interaction with the hERG channel.

Table 2 Results of the Bayesian models for the test set I.

Molecular features important for hERG

Many molecular descriptors have been proven to be helpful for hERG blockage prediction, including 2D descriptors, such as ClogP, compound molar refractivity (CMR), polarizability, partial positive/negative surface area, compound diameter, topological polar surface area, static polar surface area, fragment fingerprints and shape descriptors, 3D descriptors, such as GRIND descriptors, and 4D descriptors, such as 4D-FPs8,9,10,11,12,25,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49. Analysis of our simple, interpretable molecular properties for the compounds in the training set indicated that the molecular weight (MW), fractional polar surface area (FPSA), lipophilicity (ALogP) and pKa of the most positive basic nitrogen (pKa_basic) can preliminarily predict hERG blockage. The means and the standard deviations for the four descriptors in the training and test sets are summarized in Table 3. As shown, hERG blocking molecules have a larger molecular weight (mean value of approximately 430) compared with the nonblockers (mean value of approximately 330). In addition, compared with the nonblockers, the hERG blocking molecules have a lower FPSA value, a larger ALogP value and more basic nitrogen pKa. The mean values of the four molecular properties for the training set were listed in Figure 2. A test for significant differences is also shown. The four properties for the blockers are all significantly different from those for nonblockers (P≥0.01). The models also generated the features' statistics of the normalized probability contribution to the model (Figure 3). The features' normalized probability is the final contribution of a feature to the model's prediction, which means that the presence of a feature increases or decreases the likelihood for a compound to be a member of the good subset. The four molecular properties and their contributions to the activity identification were mapped. A positive normalized probability means that a molecule with the corresponding property value is likely to be hERG blockers, and a negative normalized probability means a molecule is likely to be nonblockers. From this figure, we can clearly find the favorable distribution ranges for properties of different molecules that cause a molecule to be more likely to be blockers or nonblockers. For example, molecules with an ALogP of −11.97−1, MW of 0227, FPSA of 0.431.00, and pKa_basic of 01.54 are likely to be nonblockers.

Table 3 Mean physicochemical properties for the training set and test set molecules.
Figure 2
figure 2

The mean value of (A) ALogP, (B) MW, (C) FPSA and (D) pKa_basic of the training set. cP<0.01 vs inactive.

PowerPoint slide

Figure 3
figure 3

The normalized probability of the four molecular properties (A) ALogP, (B) MW, (C) FPSA, and (D) pKa_basic of the training set.

PowerPoint slide

Previous studies have revealed that most hERG inhibitors have two main characteristics: hydrophobic groups forming π–π stacking and protonated groups forming cation-π interaction with hERG channel residues50,51. It has also been suggested that, with increasing hydrophobicity and molecular diameter, the hERG inhibitory effect of a compound increases9,37. In our study, we found that molecules with larger ALogP and MW but smaller FPSA are more likely to be blockers, which could be ascribed to the large pore size in hERG as well as the lipophilic character of the pore cavity37. Generally, a nitrogen positive ionizable center has been considered a common feature for many hERG blockers7,50,51,52. We calculated the pKa value of the most basic nitrogen of compounds (pKa_basic) and found that the pKa_basic of non-blocking molecules is likely to be zero. The basic site with a large pKa is more likely to be protonated, and molecules with nitrogen positive ionizable centers are more likely to form a cation-π interaction with hERG channel residues.

During Laplacian-modified Bayesian modeling, the following statistics for each feature were calculated: the number of molecules in which the feature occurred, the number of those molecules that were blockers, and the measure of how different this is from the hit rate as a whole (the ratio that would be expected if the feature was occurring randomly across the blockers and nonblockers). The Bayesian score represents the final contribution of a feature to the model prediction, which takes into account the total number of occurrences of the feature, ensuring more weight is placed on features that occur more often and little weight on those for which there are very few occurrences. The top 20 fingerprint features that made the most positive contribution to the model and the top 20 that made the most negative contribution were generated as good and bad features. These fingerprint features are shown in Figure 4. The top 10 good features have a Bayesian score of 0.636, which matches to all 37 blockers (Figure 4A). These molecules are a series of 1,2,4-triazol-3-yl-thiopropyl-tetrapydrobenzazepines, which are potent and selective dopamine D3 receptor antagonists. It has been reported that D3 receptor antagonists tend to exhibit hERG toxicity53. The top 11–20 of the good features are from sertindole analogues, Wombat-PK database molecules and h5-HT2A receptor antagonists, which are mostly strong hERG blockers48,54,55. The good features can be used as structural alerts for hERG toxicity. The top 20 bad fingerprint features are listed in Figure 4B: all of these features only occurred in the nonblockers. From the prediction results we can see that the prediction for the nonblockers has been improved significantly when fingerprints were added into the descriptors: this may play important role in identifying nonblockers. For example, a series of sertindole analogues were included in our data set50, in which the most inactive analogue (compound 5) contains the fragment B13 (Figure 5B), which occurred 104 times in the whole data set but only 2 times in the blockers. As seen from the Figure 6, the two molecules (compound 4, 5) are different from each other only at one group, but the hERG channel IC50 values show a 100-fold difference. Thus, it can be hypothesized that the existence of the B13 fragment regulates the molecule to avoid the binding of the hERG channel. These fragments may affect the hERG binding affinity of compounds by changing their entire molecular shape and other physicochemical properties. Combined with the good features, the bad features may act as regulatory factors for the structural alerts listed in Figure 4A, which increases the overall accuracy of hERG toxicity prediction.

Figure 4A
figure 4

The good features (A) generated in the model. For the annotation of each fragment, G (good features) are bin IDs, the string of numbers show the bin value defining the fingerprints feature, the second line shows the frequencies this feature occurred in “good” samples and in the entire data set. The Bayesian score shows the final contribution of the feature to the model prediction.

PowerPoint slide

Figure 4B
figure 5

The bad features (B) generated in the model. For the annotation of each fragment, B (bad features) are bin IDs, the string of numbers show the bin value defining the fingerprints feature, the second line shows the frequencies this feature occurred in “good” samples and in the entire data set. The Bayesian score shows the final contribution of the feature to the model prediction.

PowerPoint slide

Figure 5
figure 6

The structures of two Sertindole analogues compound 4 (A) and compound 5 (B). The B13 fragment presents in compound 5.

PowerPoint slide

Prediction on an external dataset

We also applied our models to Doddareddy's experimentally validated dataset, (hereafter referred to as test set II), which contained 50 compounds predicted as blockers from the Chembridge database and 10 in-house predicted nonblockers. In Doddareddy's study, hERG blockage activity was measured by radioligand binding assays of displacement of [3H]astemizole binding to cell membranes expressing the hERG channel at 10 μmol/L concentration. Eighteen compounds of 50 predicted hERG blockers showed more than 50% displacement of [3H]astemizole in a single-point assay and 4 of those 18 showed a low micromolar and even lower nanomolar activity. We categorized the dataset into 18 blockers and 42 nonblockers based on Doddareddy's experimental results. The prediction results are listed in Table 4. Model 1 was able to identify 16 of 18 blockers, corresponding to a TP rate of 0.89, which showed a higher sensitivity than Doddareddy's. The 4 high activity compounds, VH01, VH06, VH19, and VH47, were also identified. In contrast, Model 2 correctly predicted 11 of 18 blockers and showed an increased SP of 0.57. The compounds in test set II were measured by a radioligand binding assay based on the displacement of [3H]astemizole binding, which is different from the assays used for the training set and test set I. From both Doddareddy's and our studies, we found these different assays show some variability in measuring hERG blockage, which may account for the lower prediction accuracies of these models on test set II. Doddareddy et al also developed models individually from patch clamp data and radioligand binding data, but obtained far fewer correct prediction results than the models generated using combined data from different assays. Nevertheless, compared to Doddareddy's results, our models showed more balanced results for identifying active and inactive hERG inhibitors, and they can avoid missing too many blockers during early screening.

Table 4 Results of the Bayesian models for the test set II.

Conclusion

In this study a Laplacian-modified Bayesian model was developed for a large data set using physicochemical properties and extended connectivity fingerprints as descriptors, and a good predictive performance was achieved, which was better than the results of the reference studies. The four simple descriptors (MW, FPSA, ALogP, pKa_basic) preliminarily predicted hERG blockage. Some structural alerts for hERG blockage and some regulating fragments contained in nonblockers were proposed. This work may provide guidance for medicinal chemists working in drug discovery and lead optimization.

Author contribution

Ming-yue ZHENG and Xiao-min LUO conceived and designed the experiments; Li-li LIU, Jing LU, and Yin LU performed the experiments; Li-li LIU, Ming-yue ZHENG and Xiao-min LUO analyzed the data; Li-li LIU, Ming-yue ZHENG and Xiao-min LUO wrote the paper: All authors discussed the results and commented on the manuscript. All authors have given approval to the final version of the manuscript.