Introduction

Phosphatidylinositol 3-kinases (PI3Ks) are a group of plasma membrane-associated lipid kinases-enzymes that their phosphorylation plays a critical regulatory role in the cellular processes1,2. In The cellular regulatory mechanism, kinases and phosphatases catalyze activation and deactivation processes via phosphorylation and dephosphorylation of PI3Ks, respectively3. In response to various external stimuli such as oncogenes, growth factors, hormones, and environmental variations, PI3Ks are phosphorylated through conversions of phosphatidylinositol (4,5)-bisphosphate (PIP2) to phosphatidylinositol (3,4,5)-trisphosphate (PIP3)4,5. PIP3 serves as a docking site of effector proteins such as protein kinase B (PKB/Akt) that act as the second messenger molecule in the cellular membranes1. This intracellular signaling pathway has an important role in regulating diverse cellular processes such as cell growth, differentiation, proliferation, survival, and migration6,7,8. Reversible phosphorylation of inositol lipids controls diverse functions in cells. Deregulation of this pathway occurs by various genetic and epigenetic mechanisms in a wide range of tumors9,10,11. PI3Ks are divided into classes I, II, and III based on the differences in their structures and specific substrates12,13. According to the regulator proteins and signaling pathways, class I PI3Ks are further subdivided into class IA and class IB. Class IA PI3Ks contains three enzyme isoforms, PI3Kα, PI3Kβ, and PI3Kδ; while PI3Kγ is the only member of class IB and its corresponding signal primarily is generated by G-protein coupled receptors (GPCRs). PI3Kδ and PI3Kγ can generate intracellular signals to regulate immune cells. These two enzyme isoforms are being investigated for cancer treatment in the clinic14,15,16. PI3Kα and PI3Kβ are involved in the regulation of cell survival and metabolism17,18,19,20. Overall, PI3Kγ controls a critical switch between immune stimulation and suppression during inflammation and cancer21. The abnormal expression of PI3Kγ is the result of the mutation and deficiency of phosphatase and tensin homolog on chromosome ten (PTEN)1,22.

In competition with ATP, PI3Kγ inhibitors bind to the ATP cofactor binding site in the active form of kinases to block PI3Kγ activity through stabilizing inactive kinase conformations23. PI3Kγ has attracted attention as a potential drug target in treating advanced solid tumors, leukemia, inflammatory, and various autoimmune diseases. Over the past two decades, PI3Ks (especially PI3Kγ) inhibitors have been attracting extensive interest and more than 600 medicinal chemistry-based publications and patents to date show the importance of these compounds24. Enormous efforts have been dedicated to the development of highly efficient, safer, potent, and selective isoform PI3Kγ inhibitors. Gangadhara et al. discovered a class of PI3Kγ inhibitors. They proposed that the cyclopropyl ethyl moiety of these inhibitors induces a significant conformational change in both the kinase and helical domains of PI3Kγ which results in blocking the ATP–binding site25. Other research groups also discovered a series of potent and selective PI3Kγ inhibitors, some of which are: azaisoindolinones26, benzothiazoles27, 7-substituted triazolopyridine (CZC24758)28, IPI-549 (through optimization of isoquinolinone)29, and 7-azaindole isoindolinone30. Drew et al. designed potent PI3Kγ inhibitors based on the differences of IPI-549 and AZ2 in the binding modes interaction with ATP binding site of PI3Kγ31. SAR study on 6-aryl-2-amino-triazolopyridines was performed by Bell et al.32 Zhu et al. provide an overview to discuss the structure‐selectivity‐activity relationship of existing clinical PI3Kγ inhibitors33 and, ultimately Taha et al. used ligand‐based modeling and virtual screening followed by in vitro analysis to discover nanomolar PI3Kγ inhibitors34.

Despite all these efforts, indeed selecting isoenzyme compounds is difficult; due to the high sequence homology among the PI3K isoforms. Therefore, the discovery and development of PI3Kγ-selective inhibitors are still quite challenging.

Cost and time consumption are disadvantages of the in vivo-in vitro assays during drug development. QSAR analysis in a primary screening through selecting and proposing the most potent drug candidates causes to prioritize the synthesis of effective drugs; subsequently, pharmaceutical research can be more efficient. Halder and Cordeiro reported the QSAR-Co tool for predicting the activity of inhibitor compounds against different isoforms of PI3Ks, under various experimental conditions35.

In this research, QSAR analysis was carried out on a large and diverse set of potent and selective isoform-specific PI3Kγ inhibitors using an artificial neural network and multivariate linear regression. The interpretability, clarity, and understandability of the models presented by MLR make it a good choice for modeling. At the same time, the complex relationship between the chemical structures of PI3Kγ inhibitors and their biological response is the best justification for using the ANN-based nonlinear method.

Classification as another aspect of QSAR modeling can also be mentioned that was developed on qualitative categorical responses36. In the simplest case, chemical compounds are classified into two categories active/inactive based on their biological activity. The mapping function based on the output variables is employed to predict the class or category for a given observation. In the case of two classes, binary classification is applied. Due to the high sequence homology among the PI3K isoforms, here we used regression-based QSAR models for a quantitative study.

Most of the compounds used in this research have been recently synthesized or evaluated experimentally. Also, to increase the application domain of the models, these compounds were investigated in a wide range of pIC50 (− log IC50). The selectivity of these compounds for PI3Kγ over the other PI3K isoforms is confirmed by X-ray crystallography. To take into account safety profiles related to absorption, distribution, metabolism, elimination, and toxicity (ADMET) during the prediction of activity, Speck-Planche and Cordeiro introduced the multitasking model for quantitative structure biological effect relationships (mtk-QSBER)37,38,39. In the present work, Lipinski’s rule of five was used to check the drug-likeness of compounds40. Moreover, to further assess the models, external verification was performed using another group of PI3Kγ inhibitors with high structural diversity and a wide range of activity.

Materials and methods

Data sets

In this study, 245 compounds of PI3Kγ inhibitors collected from published literature18,24,28,29,30,31,32,41 were used for QSAR modeling. It is worth mentioning that, after removing duplicate molecules from the above references, a data set consisting of 256 molecules was collected. Then 11 compounds were removed from the data set, including seven molecules that were too different structurally for investigation in the application domain of the models and four molecules whose pIC50 values were out of the considered range significantly. Thus, the final data set was reduced to 245 molecules. All minimum inhibitory concentration (IC50) values of molecules were converted into the corresponding pIC50. The structure of these molecules and their corresponding values of PI3Kγ inhibitory activity (pIC50) are presented in Supplementary Table S1. Also, simplified molecular input line entry specification (SMILES) strings of molecules are provided in Supplementary Table S2.

Drug-likeness assessment

For assessment of drug-likeness of a molecule, Lipinski’s rule of five was employed40. Based on the distribution of molecular properties (molecular weight, H-bond donors, H-bond acceptors, and logP) among several thousand drugs of USAN (United States Adopted Name) data set, the percents of drugs that are predicted to have poor absorption or permeation are specified in Table 1. In this Table, the ClogP parameter is calculated based on the substructure (atomic group) contribution. In comparison with other estimation methods, ClogP has a better agreement with experimental results. This parameter as a criterion of lipophilicity affects the permeability, accumulation, absorption, bioavailability, and drug cytotoxicity.

Table 1 Assessment of the drug-likeness (solubility and permeability of a molecule) based on Lipinski’s rule of five.

For 245 compounds involved in the modeling process, the aforementioned properties were calculated using Dragon 5.5 software package42, except the ClogP that the Data warrior software43 was used to calculate it. Calculated parameters for these 245 compounds are presented in Supplementary Table S3. The result of checking them by Lipinski’s rule of five confirmed their favorable drug-likeness as shown in Table 1.

Descriptors calculation and feature selection

In the first stage of the molecular modeling, SMILES strings of the structures were saved in SDF (Structure Data File) format; then, Open Babel software44 was applied to convert them into the HyperChem HIN format. Following the modeling process in the HyperChem 8 software package45, the molecular mechanics force field (MM+) procedure was used to pre-optimization of 3D structures to lower energy levels. Then, the Semi-empirical methods including PM3 and AM1, which belong to quantum chemistry methods, were used to optimize the structures geometrically and electronically, respectively. Root mean square gradient equal to 0.001 kcal Å−1 mol−1 was determined as the critical value of optimization. The most stable optimized conformer of each structure was selected and saved. Subsequently, Dragon 5.5 software package42 was used to compute 2D autocorrelation descriptors (a total of 96 of such descriptors) and, using the former optimized geometrics, three-dimensional (3D) descriptors including Randic molecular profiles, geometrical, RDF, 3D-MoRSE, WHIM, and GETAWAY categories (a total of 41, 74, 150, 160, 99 and 197 of such descriptors, respectively).

Among 22 different classes of descriptors computable by the Dragon software, 3D and 2D autocorrelation descriptors most probably have a successful performance in 2D-QSAR modeling based on the results of our previous studies on the inhibitory activity of anti-cancer drug candidates46,47. More details about these descriptors and the superior features that make them most appropriate for modeling are provided at the end of the manuscript. To avoid overfitting, during the QSAR model development, objective feature selection was used to reduce the redundant and unnecessary information. In this way, descriptors that are zero and constant for all molecules were discarded from the descriptors pool. Additionally, as a rule highly correlated (R > 0.90), from each pair of descriptors that have a correlation coefficient greater than 0.9, only one remains in the descriptors pool and the other one is eliminated. Following the feature selection, based on the above described, the number of descriptors is reduced from 817 to 290 numbers, up to this stage. Subsequently, subjective feature selection involves the genetic algorithm tool in selecting the most relevant set of descriptors that were not collinear48.

This algorithm is based on the theoretical principles of Darwin's theory of evolution and is highly welcomed to multivariate analysis. GA runs based on the following steps:

Initially, many subsets of descriptors are randomly generated that serve as chromosomes, where the descriptors included in each subset play the role of the gene. Then, for each subset of descriptors (each chromosome), the MLR model was developed separately. Based on the goodness prediction of inhibitory activity, the chromosomes are evaluated. The correlation coefficient (Q2) value plays the role of the fitness function which is calculated based on employing the leave-one-out cross-validation (LOO-CV) method on each chromosome separately (LOO-CV is described in the next part of this research). Each subset of descriptors located on a chromosome is encoded with a string of binary 1 and 0 values. Based on the modeling results, if the descriptor corresponding to each gene is effective in predicting the inhibitory activity, its value is equal to 1, otherwise, it is taken to be 0. This function leads to the expulsion of the worst subsets. Then two types of modification are operated randomly including crossover through the replacement of the corresponding sections of the two parent chromosomes from two points (Duble) and mutation that is operated through randomly changing a position of a parent chromosome to change its value. In this way, the child chromosomes are extracted, and according to what was previously described, their fitness is computed. The best children replace the worst parent to improve the primary population. This process is subsequently repeated until the most relevant set of descriptors with the highest convergence are selected or criteria defined to stop the algorithm are achieved. In this work using MATLAB software49, GA was run based on the optimal parameters presented in Table 2. By selecting the most suitable descriptors, the number of them reduced from 290 to 56 cases.

Table 2 Parameters of the genetic algorithm.

Dataset splitting

One of the most common methods in QSAR model evaluation is external validation that is performed through dividing the whole data set into the training and test sets by a ratio of 4:1. It is highly critical that both groups must be a reliable representatives of the entire dataset in terms of molecular structure, biological activity, and physicochemical property.

Among the various methods of data splitting, DUPLEX50 and Kennard–Stone51 algorithms are more welcomed; because, they perform data splitting according to the aforementioned conditions, which are introduced below.

The DUPLEX algorithm based on the PCA

Considering the large number of structural descriptors, this approach causes the total space of structures to be counted for data splitting and helps to uniform distribution (homogenity) of data set into the training and test. Based on this algorithm, first, principal component analysis (PCA) was performed on the entire data set including 290 relevant descriptors. Then a new activity was calculated through establishing a principal component regression (PCR) between original experimental inhibitory activity and PCs. Subsequently, the results were provided to the DUPLEX algorithm to splitting the data set based on the following process:

In the beginning, the two most distant (i.e. most dissimilar) objects are removed from the dataset and placed into the training set. From the remaining points, the next pair which are farthest apart are picked up and placed into the test set. Among the remaining points, two points are moved to the training set with the greatest distance from each other, again. Then, from the remaining objects, the one which is furthest away from those previously selected as the training set, is moved to the test set. The process is repeated until each set contains a certain number of molecules. By employing this method, the uniform distribution of data between the training and test sets was guaranteed not only in the properties but also in the structures.

Kennard–Stone algorithm

In a similar approach to DUPLEX, Kennard–Stone algorithm ensures that each point of the test set is close to at least one point of the training set. This algorithm uses the following equation to split a dataset into training and test set:

$$\mathrm{Objective \; function }= \sum_{\mathrm{i}=1}^{\mathrm{k}+1}\left\{{[\upmu (\mathrm{i})}_{\mathrm{train}}-{\upmu \left(\mathrm{i}\right)}_{\mathrm{test}}]+{[\upsigma (\mathrm{i})}_{\mathrm{train}}-{\upsigma \left(\mathrm{i}\right)}_{\mathrm{test}}]\right\},$$
(1)

k represents the number of inputs, while μ and σ are labels for mean and standard deviation of the input or output variable, respectively.

Euclidean distance EDx (p,q) is employed by this algorithm to ensure the uniform distribution of the selected subset in the data space as below:

$${ED}_{x}\left(p, q\right)=\sqrt{\sum_{j=1}^{n}{[{x}_{p}\left(j\right)-{x}_{q}(j)]}^{2}} \mathrm{p},\mathrm{ q}\in \left[1, M\right].$$
(2)

Based on the several reports, in the data splitting process, the superiority and high quality of the DUPLEX algorithm over other methods have been confirmed52,53,54 so in this research, the modeling process was performed using the training set obtained from DUPLEX.

In our research following data splitting by DUPLEX algorithm, Kennard-stone algorithm, and Random data splitting by Minitab software55, GA-MLR models were established on the training set and then generalized to the test and validation sets. More details of data splitting and model validation using these methods are presented in the section “QSAR modeling results”.

Statistical factors and methods used in the model evaluation and validation

Since the external and internal validation of the model is an essential step in QSAR analysis, several statistical parameters were employed to assess the performance of the models, which are briefly described in Table 3, and equations used in calculation them have been presented. Williams plot-based analysis is explained later (Determination of the application domain of the model).

Table 3 Model performance parameters and their related equations.

Model development

The SPSS software56 establishes multivariate linear regression by receiving the data matrix consist of the most suitable 3D and 2D autocorrelations descriptors selected by GA and the corresponding inhibitory activity of each x-vectors. According to the stepwise procedure, the entry of the descriptors into the model continues until the R2 value is strengthened significantly and the root mean square error (RMSE) value is weakened by entering the new descriptor. Of course, simultaneously the value of the Fisher's test (F) parameter is controlled so that it accepts its optimum value. Very high and very low values of F lead to overfitting and underfitting errors, respectively. One of the valid criteria in monitoring the optimum value of F is the variance inflation factor (VIF) which shows the correlation between the descriptors (described in continue further); during the modeling process, its value must be kept less than 5. In addition, the coefficients of the descriptors should be acceptable values based on their standard deviation. The criterion for stopping the entry of descriptors into the model is that by entering the new descriptor, the statistical performance factors do not improve significantly.

The above approach prevents overfitting. The variance inflation factor (VIF) test, ensures that the modeling process is not accompanied with multicollinearity and is calculated as below:

$$\mathrm{VIF}=\frac{1}{1-{\mathrm{R}}_{\mathrm{j}}^{2}},$$
(3)

where R2j is the square of the correlation coefficient between descriptors during the model development. VIF equal to 1 indicates that the j-th descriptor is not correlated to the remaining ones. To accept the model, the VIF value should be between 1 and 5, but in the case of VIF values higher than 10, there is significant multicollinearity; so the model must be corrected.

Leave-one-out cross-validation (LOO-CV)

During the LOO-CV as one of the internal validation methods, each molecule is removed from the data matrix and the remaining molecules are employed to model development. Using the extracted model, the molecule that was kept out is predicted; this process is repeated for all molecules.

Y-randomization test

To ensure that the developed model does not arise from chance, y-randomization is performed through the scrambled biological activity. This procedure is repeated twenty times randomly; then a new regression is established using the same parameters of the original model. Low values of R2y-random and Q2y-random in the new models with shuffled pIC50, confirm the efficiency and robustness of the main developed model.

Description of the artificial neural network theory, briefly

Artificial neural networks algorithms inspired by biological neural networks in the human brain57. The application of ANN is based on this hypothesis that a given training data to construct a model, can learn and generalize from previously seen examples. During the learning, the algorithm extracts the rules and relationships governing the experimental data through their processing. The extracted information is inducted into the network. ANN, as a nonlinear modeling technique, has been used extensively in QSAR analysis and constructed from an input layer, hidden layer(s), and an output layer. The number of input neurons to the network is equal to descriptors used in the linear model development. The weight parameter determines the effect of the input layer on the output layer, which is adjusted during training with the feed-forward back-propagation approach. The trial and error procedure on the training set is employed to optimize the size of the hidden layers. The criterion for this assessment is the average square error (MSE) which acts as a performance function. In the present study, the Bayesian regularization algorithm was used to train with a feed-forward approach and sigmoid as a hidden layer transfer function was employed. Experimental pIC50 acts as one output layer. A large number of hidden layers causes the developed model to have an overfitting problem; however, a too-small number of hidden layers leads to fault tolerance and weakens the generalization capability of the net. In the current study, the implementation of the above approach resulted in the 10-3-1 network architecture. Separate validation of the model was performed by one-tenth of the training set selected randomly. In this way, the performance of the ANN was monitored; through evaluation predicted values of the validation set during the training of the network. The training is stopped when the results for the validation set are not significantly improved.

Results

QSAR modeling results

The DUPLEX algorithm was employed to dividing total 245 PI3Kγ inhibitors into the training (196 molecules) and test sets (49 molecules). The MLR model was developed using a relevant set of descriptors selected by GA and was evaluated by the test set

$$\begin{aligned} {\text{pIC}}_{{{5}0}} & = { 9}.{67}0 \, \left( { \pm 0.{935}} \right) \, - \, 0.{891 }\left( { \pm 0.{158}} \right){\text{Mor12p}} + \, 0.{246 }\left( { \pm 0.0{46}} \right){\text{RDF}}0{1}0{\text{e}} - \, 0.{6}0{4 }\left( { \pm 0.{115}} \right){\text{Mor14u}} \\ & \quad - \, 0.{54}0 \, \left( { \pm 0.{11}0} \right){\text{Mor15m}} - { 1}.{727 }\left( { \pm 0.{3}0{2}} \right){\text{GATS6p}} - \, 0.{732 }\left( { \pm 0.{183}} \right){\text{Mor19m}} + \, 0.0{38 }\left( { \pm 0.00{8}} \right){\text{Te}} \\ & \quad - { 12}.{244 }\left( { \pm {4}.{271}} \right){\text{G2v}} - 0.0{39 }\left( { \pm 0.0{14}} \right){\text{Mor}}0{\text{2v}} + \, 0.{718 }\left( { \pm 0.{293}} \right){\text{GATS4p,}} \\ \end{aligned}$$
(4)
$${\text{n}}_{{{\text{training}}}} = { 196},{\text{ R}}^{{2}} = \, 0.{623},{\text{ R}}^{{2}}_{{{\text{adj}}}} = \, 0.{6}0{2},{\text{ RMSE }} = \, 0.{473},{\text{ F }} = { 3}0.{546,}$$
$${\text{n}}_{{{\text{test}}}} = { 49},{\text{ R}}^{{2}} = \, 0.{662},{\text{ RMSEP }} = \, 0.{451}{\text{.}}$$

R2 and RMSE values calculated for training and test sets using the DUPLEX algorithm, Kennard–Stone algorithm, and random data splitting are provided in Table 4.

Table 4 Calculated R2 and RMSE parameters for training and test sets separately, following the data splitting process by three methods.

Figure 1 is plotted based on the data splitting pattern by Kennard–Stone algorithm on 245 compounds used in this study.

Figure 1
figure 1

Plot of the data splitting pattern using Kennard–Stone algorithm on 245 compounds.

As a general rule, a QSAR model is considered to be predictive if calculated values of R2, Q2, and R2pred are higher than 0.6, 0.6, and 0.5, respectively58,59; therefore, robustness and stability of the GA-MLR model are confirmed based on the obtained statistical performance. The values of 3D and 2D autocorrelations descriptors appearing in model 1 (Eq. 4) are presented in Supplementary Table S4. These descriptors are briefly introduced in Supplementary Table S5. Based on model 1 (Eq. 4), the prominent role of 3D-MoRSE descriptors in combination with 2D autocorrelations can be further evaluated. Minimal multicollinearity between the selected descriptors is confirmed by the VIF index with values less than 3.893 (Table 5); therefore, an informative and optimal GA-MLR model has been built. Based on model 1 (Eq. 4) the predicted values of pIC50 for training and test sets are provided in Table 6. Using the same descriptors selected by the GA-MLR model, ANN was also established on 196 compounds as a training set and was validated by the remaining 49 compounds as the test set. The prediction performance confirms the stability and efficiency of ANN:

$${\text{n}}_{{{\text{training}}}} = { 196},{\text{ R}}^{{2}} = \, 0.{642},{\text{ R}}^{{2}}_{{{\text{adj}}}} = \, 0.{61}0,{\text{ RMSE }} = \, 0.{464,}$$
$${\text{n}}_{{{\text{test}}}} = { 49},{\text{ R}}^{{2}} = \, 0.{615},{\text{ RMSEP }} = \, 0.{5}00.$$
Table 5 The correlation coefficient of descriptors and corresponding VIF values based on model 1 (Eq. 4).
Table 6 Experimental pIC50 values for 245 PI3Kγ inhibitors used as training and test sets and corresponding predicted values for them based on the MLR and ANN methods.

The performance of ANN is relatively better than MLR, in the case of the training set (R2train = 0.642 for ANN in comparison to R2train = 0.623 for MLR); conversely, in the case of the test set, the MLR has the better prediction performance (R2test = 0.662 for MLR in compare to ANN with R2train = 0.615). The calculated values of pIC50 for training and test sets using of ANN technique can be seen in Table 6.

Out-of-sample testing validation

We carried out the out-of-sample testing, as a validation method, to indicate the robustness and stability of the model and to show that the test set selected by the DUPLEX algorithm is representative. Using Minitab software, 49 molecules were selected randomly as a test set from the data set (245 molecules); then the QSAR model was established on the 196 remaining compounds. This model was employed to predict the inhibitory activity of the test set.

The above-mentioned process was repeated 10 times. The results were presented in Table 7; including R2 of training and test sets and VIF. These results are in good agreement with the accepted values for these parameters except for the fifth iteration (in this case the R2 value is slightly less than 0.5 for the test set). These results also confirm that the descriptors are relevant and model 1 (Eq. 4) is predictive. Also, the maximum value obtained for the VIF parameter at each time of out-of-sample testing validation is less than 5, so the established models are not involved with multicollinearity error. The test compounds that were selected randomly at each time of the aforementioned validation method are presented in Supplementary Table S6. In Table 8, the total 56 descriptors selected by GA are listed and descriptors with the highest frequency of iterations in the established models were bold.

Table 7 Out-of-sample testing validation results based on the random dataset splitting.
Table 8 The total 56 descriptors selected by GA carried out to out-of-sample testing validation.

A notable point is that frequent descriptors in the models established based on this validation method are also included in model 1 (Eq. 4). Since model 1 (Eq. 4), is well confirmed by the recent validation, the methodology used for QSAR modeling can be considered valid; especially, in the case of feature selection by GA and data splitting by duplex algorithm. Statistical performance parameters represented in Table 7, also verify that model 1 (Eq. 4) is not involved with overfitting problem60,61.

Subsequently, the GA-MLR model was developed using whole 245 compounds (no splitting) for complementary evaluations of the model.

$$\begin{aligned} {\text{pIC}}_{{{5}0}} & = { 9}.{4}0{2 }\left( { \pm 0.{829}} \right) \, - \, 0.{933 }\left( { \pm 0.{141}} \right){\text{Mor12p}} + \, 0.{27}0 \, \left( { \pm 0.0{41}} \right){\text{RDF}}0{1}0{\text{e}} - \, 0.{589 }\left( { \pm 0.{1}0{3}} \right){\text{Mor14u}} \\ & \quad - \, 0.{536 }\left( { \pm 0.0{98}} \right){\text{Mor15m}} - { 1}.{728 }\left( { \pm 0.{269}} \right){\text{GATS6p}} - \, 0.{81}0 \, \left( { \pm 0.{167}} \right){\text{Mor19m}} \\ & \quad + \, 0.0{36}\left( { \pm 0.00{7}} \right){\text{Te}} + \, 0.{77}0 \, \left( { \pm 0.{264}} \right){\text{GATS4p}} - {11}.{635 }\left( { \pm {3}.{658}} \right){\text{G2v}} - 0.0{39 }\left( { \pm 0.0{13}} \right){\text{Mor}}0{\text{2v}} \\ \end{aligned}$$
(5)
$$\begin{aligned} & {\text{n }} = { 245},{\text{ R}}^{{2}} = \, 0.{632},{\text{ R}}^{{2}}_{{{\text{adj}}}} = \, 0.{616},{\text{ Q}}^{{2}}_{{{\text{LOO}}}} = \, 0.{6}00,{\text{ RMSE }} = \, 0.{476},{\text{ F }} = { 4}0.{114}, \, \\ & \quad {\text{RMSE}}_{{{\text{CV}}}} = \, 0.{623},{\text{R}}^{{2}}_{{{\text{y}} - {\text{random}}}} = \, 0.0{11},{\text{Q}}^{{2}}_{{{\text{y}} - {\text{random}}}} = \, 0.000{6},{\text{PRESS }} = { 94}.{677}{\text{.}} \\ \end{aligned}$$

External verification of the QSAR modes

In order to, first, further evaluate the model robustness, second, to investigate the application domain of the models, and, third, to compare the effectiveness of the models in the face of novel structures, a diverse set of PI3Kγ inhibitors consisting of 45 compounds, out of the training and test sets17,25,33,62,63,64,65 (Supplementary Table S7), were used to external verification of the predictive QSAR models. SMILES strings of these molecules are presented in Supplementary Table S8. External verification was carried out based on the following process: first, by considering model 1 (Eq. 4), the corresponding descriptors were extracted for each structure of the validation set (Supplementary Table S9). Then, predicted pIC50 values of these compounds were calculated through the generalization of the MLR and ANN models to them (Table 9).

Table 9 The experimental pIC50 values of 45 PI3Kγ inhibitors used as a validation set and corresponding predicted values for them based on the MLR and ANN methods.

Simultaneous comparison of training, test, and validation sets based on the QSAR analysis results

Displaying the training, test, and validation sets in one graphical plot provides a clear insight into the molecular distribution, goodness of fit in three subsets of PI3Kγ inhibitors, and ultimately, a more accurate assessment of the application domain of the models. In the following, each of these three aspects is explained in detail.

Data set distribution in terms of standard deviation

Based on the MLR and ANN models, standard deviation ((pIC50)Exp− (pIC50) pred) of PI3Kγ inhibitors versus their corresponding (pIC50)Exp values have been displayed in Fig. 2. The random and uniform distribution of the data on both sides of standard deviation equal to zero can be seen not only in the training set but also in the test and validation sets as a reliable representative of the entire data set. These results confirm that the systematic error did not occur during the model development.

Figure 2
figure 2

Dispersion plots of standardized residuals versus experimental values of the pIC50 during the QSAR model development on PI3Kγ inhibitors.

Regression results of the validation set in comparison to the training and test sets

Modeling efficiency has been evaluated based on the training and test sets in section “QSAR modeling results”; here, we will focus on the model evaluation based on the validation set. A scatter plot of the predicted pIC50 versus the experimental values during the QSAR model development on 196 PI3Kγ inhibitors has been displayed in Fig. 3. Based on these observations, both proposed models have good predictive performance; nevertheless, ANN is superior to MLR in the face of the validation set. (R2valid. = 0.648 for ANN in comparison to R2valid. = 0.532 for MLR). Since ANN is a nonlinear modeling algorithm, considered to be more efficient with high flexibility. Also, using three methods for data splitting, the performance statistical parameters were obtained as: DUPLEX algorithm (R2 = 0.532, RMSE = 0.566), Kennard–Stone algorithm, (R2 = 0.552, RMSE = 0.571) and random data splitting (R2 = 0.532, RMSE = 0.517). Based on the following reasons both models, especially ANN, are robust and approved:

  1. 1.

    The wide variety of structures with pIC50 from 5.00 to 9.30 was used in the validation process

  2. 2.

    QSAR models established on the small number of compounds tend to have better prediction performance than the models developed on a large data set. On the other hand, the efficiency of the QSAR model built from the large data set, consisting of diverse chemical structures and a wide range of pIC50, may seem low due to confounding factors66. However, a model that has been established on more compounds may have a wider applicability domain67 which will be described in the next section.

Figure 3
figure 3

Scatter plots of the predicted versus experimental pIC50 values for MLR (A) and ANN (B) models constructed on PI3Kγ inhibitory activity.

Determination of the application domain of the model

One of the main aspects of QSAR modeling is determining the application domain (AD) of the model. AD is defined as a chemical space constructed by the descriptors and biological responses used in QSAR model development on the training set. Using this approach, the model efficiency in the face of new compounds that may have not been synthesized is assessed. Williams plot-based analysis was used to determine AD. Williams plots represented in Fig. 4 are based on the MLR and ANN model results. This figure makes it possible to comparison the validation set with the training and test set simultaneously in terms of structural similarity and inhibitory activity.

Figure 4
figure 4

Williams plots-based analyses to compare training, test, and validation sets during the MLR and ANN models development on the PI3Kγ inhibitory.

Acceptable limits of structural similarity and inhibitory activity, have been marked with vertical dash line (warning leverage) and horizontal dotted lines, respectively. It can be observed that all of the 290 compounds, consisting of 196, 49, and 45 molecules as training, test, and validation set respectively, are within the boundaries of acceptable standard deviation (± 3δ). To get a better insight from the structural similarity and biological activity in Fig. 4, the molecules close to the boundaries, have been also specified by their corresponding numbers. A chemical structure with high leverage (h > h*) in the training has high influences ability in the modeling process, thus the chemical in the training set is not an outlier for the response fitting. h* is calculated as follow:

$${h}^{*}=3\frac{(K+1)}{n}.$$
(6)

n and k are the numbers of training compounds and descriptors in the model, respectively.

None of the compounds belonging to the test set is X outlier (h* = 0.17). However, two molecules that belong to the validation set are out of the structural similarity threshold. In interpreting these observations, the following explanations can be noted:

  1. 1.

    The whole molecules of the test set are placed in the acceptable limits of structural similarity and standard deviation; it may be because that the models are developed based on the samples with a wide range of structures and pIC50. The proposed models with wide application domine can predict the test set with credibility.

  2. 2.

    The four molecules of the training set with a leverage value greater than 0.17, means that these compounds are very dominant in determining the model; in other words, they are good “influence points” and can be indicators of high accuracy and robustness of the model68. Moreover, these compounds have been accurately predicted by model 1 (Eq. 4) with the lowest standard deviation.

  3. 3.

    In the case of the validation set, as can be seen, only two compounds are X outlier; however, it should be noted that in the routine QSAR studies, evaluation is limited to the test set but in this research, for further assessment of the models, a wide variety of structures were employed as a validation set, out of the training and test sets. As shown in Fig. 4, two X outlier compounds of the validation set are also located on the left side of compound 100 with lower leverage than it. Therefore, according to the description provided in “Materials and methods”, the predicted results of these two compounds can be also accepted. In addition, these molecules are also well predicted by model 1 (Eq. 4). Accordingly, the high stability, predictive ability, and robustness of the MLR and ANN models were assessed in the face of new compounds.

Descriptors interpretation

In-depth insight about structural descriptors helps chemists in the design of new effective drugs through the interpretation of QSAR models. For example, several factors are involved in the binding of a ligand to a target such as van der Waals volumes and surfaces, polarizability, hydrophobicity, lipophobicity, etc. Four categories of descriptors which are entered in the models are defined in Supplementary Table S5 and briefly described here:

Characteristics and capabilities of the four different categories of descriptors

3D-MoRSE descriptors

This category of descriptors was introduced in 1996 by Schuur and et al.69. Range of scattering parameter values (0–31 Å−1) and variety of weighting schemes (unweighted and weighted with atomic mass, atomic van der Waals volume, atomic Sanderson electronegativity, and atomic polarizability) has given them high flexibility and pervasively. 3D-MoRSE descriptors can be employed successfully to extract information from the entire structure of the molecule that results to discriminate a large and diverse set of compounds correctly. These descriptors are sensitive to the presence of specific molecular fragments.

Based on the coordinates, 3D structure, and electron diffraction of molecules, 3D-MoRSE descriptors provide information that calculates by summing atom weights as the following expression:

$$\mathrm{I}\left(\mathrm{s}\right)=\sum_{\mathrm{i}=2}^{\mathrm{N}}\sum_{\mathrm{j}=1}^{\mathrm{i}-1}{\mathrm{A}}_{\mathrm{i}}{\mathrm{A}}_{\mathrm{j}}\frac{{\mathrm{sinsr}}_{\mathrm{ij}}}{{\mathrm{sr}}_{\mathrm{ij}}},$$
(7)

where I is diffraction intensity of electron diffraction, s is the scattering parameter (Angle X-ray scattering), rij is the interatomic distance between i-th and j-th atoms, N indicates the number of atoms, and Ai and Aj expressed the different atomic properties as the weight that were mentioned above. The wide range of scattering parameters is calculated at 32 evenly distributed values at scattering angle(s) in the range of 0–31 Å−1 from the 3D atomic coordinates of molecules based on the above function.

RDF descriptors

In this category, molecular descriptors are calculated through radial basis functions centered on different interatomic distances and are based on the probability of finding an atom in interatomic space with an r radius70. In this category, atoms can also be weighted by various atomic properties (atomic mass, polarizability, etc.). They are independent of factors such as the size of a molecule that is dependent on the number of atoms and focus on describing the 3D arrangement of atoms. These descriptors are effective in providing properties that refer to the morphology of molecular such as steric hindrance, planar or non-planar structure, etc. Another feature of these descriptors which makes them a suitable choice for QSAR analysis is that they are invariant against translation and rotation of a molecule.

WHIM descriptors

These descriptors as statistical indexes are obtained through the projection of atoms on the Cartesian coordinate71. To calculate them, the most stable conformer with minimum energy is used. They can cover 3D information about different characteristics of molecular structure such as size, shape, symmetry, and atomic distribution. Specific information can be obtained from any subset of these descriptors.

2D autocorrelation descriptors

2D autocorrelation descriptors are calculated based on the molecular graph to represent the topological structure of the compounds72. In this class of descriptors, interatomic topology distance is considered based on the length of the types of atomic pairs. Atoms are visualized as the set of discrete points in space and atomic properties including atomic masses, atomic van der Waals volumes, atomic Sanderson electronegativities, and atomic polarizabilities were used to evaluate at that points73. This class of descriptors in combination with 3D-MoRSE descriptors, discuss chemical space between the compounds74. Depending on they are unweighted or weighted with atomic mass, atomic van der Waals volume, atomic Sanderson electronegativity, and atomic polarizability provide a wide variety of information.

Interpretation of the type and coefficient of descriptors in the model compared with X-ray structures of target-ligand complexes

It is noteworthy that in the models established in this research, 3D-MoRSE descriptors have a prominent presence. The negative sign of the coefficients for descriptors weighted with van der Waals volume (G2v and Mor02v) and the positive sign of the coefficients for descriptors weighted with atomic Sanderson electronegativity (RDF010e and Te) is favorable for increasing the inhibitory activities of molecules. The negative sign of the coefficients for Mor12p and GATS6p that were weighted with atomic polarizability, show that smaller or negative values for these descriptors are favorable for increasing the activities of molecules. In the case of descriptors weighted with atomic mass (Mor15m and Mor19m), the negative sign of coefficients is favorable for increasing the inhibitory activities of molecules.

These results, clearly are in good agreement with the experimental observations. This claim is confirmed through an investigation of the X-ray crystal structure which is used following the experimental assays to demonstrate the binding of the ligand with the target site (ATP-binding pocket) of the PI3Kγ enzyme18,24. Interaction between drug structures and PI3Kγ enzyme occurs through strongly electronegative atoms such as N, O, or F. These atoms in the role of hydrogen bond acceptor or hydrogen bond donor (NH and OH) bind to the corresponding polar group R of residual amino acids including Aspartic acid, Glycine, Glutamine, Tryptophan, Lysine, Serine and so on.

Based on the empirical observations, there is a direct relationship between the polarity of the compounds and their inhibitory activity. In a similar situation, the type and coefficients of the descriptors present in the models, show that by increasing the electronegative atoms (N, O, or F) in the structures, their inhibitory activity is enhanced.

Devinyak et al.75 reported that in 3D-MoRSE descriptors weighted by atomic van der Waals volume, and atomic polarizability, significantly decreases the effect of Hydrogen and diminishes the roles of Nitrogen, Oxygen, and Fluorine. In other words, the presence of Oxygen and Nitrogen atoms in the structures reduces the values of these descriptors. Since these descriptors have negative-sign coefficients in the model, smaller values for them are favorable and lead to an increase in their inhibitory activity.

They also showed that 3D-MoRSE descriptors weighted by atomic mass, practically eliminate the role of Hydrogen atoms, while significantly increasing the effect of Phosphorus, Sulfur, and Chlorine on the values of these descriptors. Considering the negative sign of coefficients for this class of descriptors entered in our developed model, larger values of them, in agreement with the experimental observations, lead to the decrease of drug inhibitory activity.

In the case of RDF010e and two other descriptors that were weighted by atomic Sanderson electronegativity coefficients in the models have a positive sign. The presence of the electronegative atoms such as N, O, F, or Cl in the structure of these compounds, increases the value of the aforementioned descriptors; consequently, leads to an increase in the predicted inhibitory activity. In agreement with the empirical observations, these results confirm the stability and correctness of the models, again.

Based on the above interpretation, the total descriptors used in the modeling are in good agreement with the experimental results except for GATS4p. This descriptor was the last choice in the stepwise modeling approach with SPSS software and has the least influence in prediction activity. Elimination of GATS4p has no significant effect on the predictive ability of the models.

Altogether, polar regions strengthen the inhibitory activity of the molecules used as inhibitors of PI3Kγ enzyme. while hydrophobic substitution such as bulky groups and long carbon chain substitution weaken it. Comparing the structures presented in Supplementary Table S1 with experimental activities and modeling results gives a better impression of these structures. For example, the pairs or the series of following compounds can be compared: the series 107, 108 and 109, the pairs 96 and 97, 102 and 103, 191 and 192, 193 and 194, 202 and 203, 209 and 228, 229 and 230, 167 and 168, and so on.

Based on the above discussions the stability and efficiency of the QSAR model were confirmed; so, it can be employed in the face of the external structures in the application domain of the model.

Conclusion

These compounds were previously confirmed as selective isoform-specific PI3Kγ inhibitors by X-ray crystallography. Drug-likeness of them was also confirmed well, based on Lipinski’s rule of five, before using them in the modeling process. QSAR analysis and its evaluation were performed on a diverse set of PI3Kγ inhibitors in a wide range of pIC50 using MLR and ANN models. The out-of-sample testing, as a validation method, was carried out 10 times with different test set selected randomly. The results indicate that descriptors are relevant, the model is predictive, and not facing overfitting. The models were assessed also successfully using another set of compounds out of training and test sets with various structures. To further evaluate the robustness and interpretability of the models and to ensure the accuracy of the methodology used in the modeling process, these models were interpreted based on the type and coefficients of the descriptors included in the models. Results are in good agreement with X-ray structures of target-ligand complexes.