Quantitative Structure-Activity Relationship Model for HCVNS5B inhibitors based on an Antlion Optimizer-Adaptive Neuro-Fuzzy Inference System

The global prevalence of hepatitis C Virus (HCV) is approximately 3% and one-fifth of all HCV carriers live in the Middle East, where Egypt has the highest global incidence of HCV infection. Quantitative structure-activity relationship (QSAR) models were used in many applications for predicting the potential effects of chemicals on human health and environment. The adaptive neuro-fuzzy inference system (ANFIS) is one of the most popular regression methods for building a nonlinear QSAR model. However, the quality of ANFIS is influenced by the size of the descriptors, so descriptor selection methods have been proposed, although these methods are affected by slow convergence and high time complexity. To avoid these limitations, the antlion optimizer was used to select relevant descriptors, before constructing a nonlinear QSAR model based on the PIC50 and these descriptors using ANFIS. In our experiments, 1029 compounds were used, which comprised 579 HCVNS5B inhibitors (PIC50 < ~14) and 450 non-HCVNS5B inhibitors (PIC50 > ~14). The experimental results showed that the proposed QSAR model obtained acceptable accuracy according to different measures, where R2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\boldsymbol{R}}}^{{\boldsymbol{2}}}$$\end{document} was 0.952 and 0.923 for the training and testing sets, respectively, using cross-validation, while RLOO2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${{\boldsymbol{R}}}_{{\boldsymbol{L}}{\boldsymbol{O}}{\boldsymbol{O}}}^{{\boldsymbol{2}}}$$\end{document} was 0.8822 using leave-one-out (LOO).

In recent years, the quantitative structure-activity relationship)QSAR(model has attracted much attention in pharmaceutical research because it can produce high-quality leads in the early stages of drug discovery 12 . In addition, QSAR reduces the costs of experiments and the failure rate when identifying lead compounds. Hansch et al. were the first to apply QSAR more than 50 years ago 13 , but Cros actually developed the QSAR concept in the toxicology field in 1863 by establishing a relationship between the toxicity of alcohols and their water solubility 14 . Thus, QSAR was developed in physical chemistry, before being applied to data sets containing many thousands of compounds with varied molecular structures, while it also evolved from modest regression approaches to wide-ranging statistical and machine learning methods. The QSAR technique has been used extensively in academic research, industry, and governmental institutions throughout the world. QSAR methods are employed to guide lead optimization approaches 15 . QSAR models can be applied broadly to evaluate the potential effects of chemicals, materials, and nanomaterials on human health and the environment. In general, QSAR is used to determine the relationship between the chemical structural features (molecular descriptors) of compounds and their biological activity by using mathematical equations. For example, Hansch et al. proposed a relationship between lipophilicity and biological potency based on a study by Veldstra 16 . Fujita et al. used quantum-chemical calculations to measure the differences in activity regulation in growing plants 17 , where the experimental value of the octanol-water property (logP) was measured. Other studies have employed computational approaches to determine different effects of substituents on potency 18,19 .
Dimensionality can be used to define the type of QSAR used according to the chemical structure dimension. one-dimensional (1D)-QSAR describes the activity based on the total molecular properties such as logp and constitutional properties (no. of atoms of oxygen, nitrogen, etc.). 2D-QSAR associates the activity with the chemical structure such as a pharmacophore.3D-QSAR relates the activity to the interaction fields of molecules. 4D-QSAR is represented by using a group of 3D-ligand conformations, 5D-QSAR considers diverse induced-fit models of 4D-complexes, and 6D-QSAR includes various solvation models 20 .
QSAR models can be calculated using the following two groups of methods 21 . The first group comprises linear methods such as linear regression, partial least-squares, multiple linear regression, and principal component analysis (PCA). However, these methods are affected by various limitations, e.g., the interactions between the dependent and independent variables cannot be interpreted easily due to nonlinear relationships. In contrast to linear methods, the second group comprising nonlinear QSAR methods can determine the nonlinear mappings based on physicochemical and biological descriptors of the molecules 22,23 while they also avoid some drawbacks of the linear QSAR methods. The nonlinear QSAR methods include artificial neural networks (ANNs), k-nearest neighbors, Bayesian neural nets 20 , fuzzy mappings 24 and the adaptive neuro-fuzzy inference system (ANFIS) 25 .
ANNs have been applied to determine activity levels 26 , but this method is sensitive to the parameters and it can be trapped by local optima. Therefore, other methods may be used such as support vector machines (SVMs) 27 and ANFIS 28 . ANFIS is a combination of ANN and fuzzy logic systems, which exploits the advantages of both (i.e., good reasoning using fuzzy logic systems and the simplicity of ANN), and it obtains good results for regression problems (especially the QSAR problem) 29 .
However, ANFIS has some limitations because it is influenced by the approach used to learn its parameters. In addition, a large number of descriptors may affect the performance of ANFIS (or any other QSAR model). Therefore, selecting the best descriptors is very important for reducing the computational cost and eliminating irrelevant descriptors that might reduce the accuracy of ANFIS. Several methods can be used to select the optimal subset of descriptors, including a method 25 comprising two stages where the first employed a genetic algorithm (GA) to select suitable descriptors for the inhibitory activity of cathepsin K, before ANFIS was then used to predict the bioactivity values for cathepsin K. PCA has also been used to determine the largest eigenvectors representing the best descriptors, where these descriptors were then used as inputs for ANFIS and SVM to predict biologically active catechol structures 21 . However, most descriptor selection approaches, such as GA, are affected by limitations such as slow convergence and time complexity, while PCA and other dimension reduction methods change the original data set. Therefore, the antlion optimizer (ALO) was proposed to solve this problem 30 . ALO is a recently developed swarm intelligence technique, which emulates the natural interactions between antlions and ants. The ALO algorithm has several advantages such as small number of parameters, free gradient and good ability to balance between the exploration and exploitation 30 . Therefore, it has been applied in several applications such as, Esha Gupta et al. have proposed method based on ALO algorithm to determine the optimal parameters of primary governor loop of thermal generators, also it used to solve the process planning and scheduling functions in the manufacturing process 31 , and to find the optimal sizing and location of renewable distributed generations 32 . In addition, it has been used in several power system problem for example, load forecasting 33 , economic power dispatch 34 , and load frequency control 35 .
According to the previous literature, in this study, we developed an accurate, simple, reliable, and less computationally expensive technique for calculating bioactivity values by combining the ALO algorithm with ANFIS. The proposed model, called ALO-ANFIS, comprises two phases. In the first phase, the ALO algorithm was used to determine relevant descriptors. In the second phase, the ANFIS method was used in QSAR for modeling the relationships between the bioactivities of 1029 HCVNS5B compounds and their structural descriptors (those selected by ALO). In order to evaluate the performance of the selected descriptors, the data set was split into training and testing sets, where the training set comprised 772 compounds for refining the model and the testing set comprised 257 appropriately selected chemicals for testing the model. The accuracy of the ALO-ANFIS model was assessed using leave-one-out (LOO), Y-randomization, and external validation techniques.
The remainder of this paper is organized as follows. In Section 2, we introduce the methods and provide a brief discussion of the data set and their chemical descriptors, as well as the approach used for splitting the data and the basic concepts employed in ANFIS and ALO. In Section 3, we explain the proposed QSAR model. The experimental results are presented in Section 4. In Section 5, we give our conclusions and suggestions for future research.

Methods
Data sets and chemical descriptors. Chemical descriptors define the construction and function of designated chemicals. Many descriptors are now easy to calculate due to advances in computer technology 36 . These descriptors are classified according to several types such as compositional descriptors, topological descriptors, quantum descriptors, electronic parameters, and geometric parameters. Descriptors may be integers, such as the molecular weight, which describes the entire compound, or substitutions such as a steric effect constant, which refers to a precise fragment or group. Integral descriptors are used only for specific compounds, whereas substitutional descriptors can be used for several compounds, e.g., trichloromethane and trichloroacetic acid have equal steric effect constant values for chlorine 37 .
In this study, 1029 compounds were used to build the QSAR model, which comprised 579 HCV NS5B inhibitors, (PIC 50 < ~14) and 450 non-HCVNS5B inhibitors (PIC 50 > ~14) 45 . These compounds were collected from previous studies and extracted from the BindingDB and CHEMBL databases 10,46 . The chemical descriptors were calculated using the DataWarrior package 47 . Thus, DataWarrior was used to calculate the properties of compound such as the drug-likeness, atom counts, and functional groups, where these descriptors represented 29 properties, as shown in Table 1.
Data splitting. In order to evaluate the performance of the proposed model, the training and testing sets were determined using the cross-validation method. In the cross-validation, the data set was split into a number of classes, before selecting one class as the testing set, whereas the other classes were used to construct the training set. We performed a 10-fold cross validation, with nine classes representing the training set and one representing the testing set, where this process was performed 10 times and the average accuracy was calculated based on all the runs.
Adaptive Neuro-Fuzzy Inference System. The ANFIS method combines fuzzy logic and ANN 48,49 . In recent years, ANFIS has attracted much attention because of its many applications to renewable energy 50 and wind prediction 51 .
In general, the Takagi-Sugeno inference method is one of most popular methods used in ANFIS 48 . This method generates a nonlinear mapping of fuzzy rules from the input space to the output space by using a number of fuzzy IF-THEN rules. The definition of the first-order Sugeno type is given by Eq. (1) 45   where Q 1 , Q 2 and P 1 , P 1 , P 2 are the member functions (MFs) for x and y, respectively, and l i , m i , and n i (i = 1, 2) are the associated parameters of the output functions. The structure of the ANFIS model comprises five layers (Fig. 2), where each layer represents part of the fuzzy inference system (FIS). In particular, the first layer represents the fuzzification process, the second layer is responsible for computing the inference rules and their firing strengths, the third layer is the normalization layer, the adoption layer in FIS is represented by the fourth layer, and the defuzzification output is represented by the last layer. The details of each layer are given in the following. In the structure of ANFIS, the FIS parameters are encoded as weights between layers in the ANN, which allows optimization methods (used in ANN) to be used to determine the optimal FIS parameters.
The first layer receives the input data (x and z) at node i and then computes the membership values (A i and B i ) of the MFs as the output of each node, as given by Eq. (2): where μ A and μ B are the MFs defined as: where ρ i and σ i are the data mean and standard deviation, respectively, which represents the premise parameters set (note that Eq. (3) is the generalized Gaussian MF) 48 . The second layer computes the firing strength of a rule (ω i ) by each node as follows.
The third layer computes the normalization of the firing strength w ( ) i as:  The fourth layer nodes compute the output using the following equation: where l i , m i , and n i are the consequent parameters for node i.
The final layer computes the output of the whole model where this layer comprises a single node: The ANFIS model employs two sets of adjustable parameters: the premise and consequent parameters, which need to be updated throughout the learning phase until the optimal global values are achieved based on the desired response.
Antlion Optimizer. The Antlion Optimizer (ALO) algorithm starts by generating two random populations of antlions and ants, before assessing each position in these two populations using a fitness function, and the optimal ant position is then found 30 . In general, the ants update their solution based on the antlions, which are selected by random walk or according to an elite individual. Each ant's position is updated based on the antlion selected by random walk, where the random walks are normalized according to the following equation 30 : i i i t and d i t represent the minimum and maximum of the random walk, and the minimum and maximum of the i-th variable in the t-th iteration, respectively.
The antlion traps affect the random walks of ants, so the ants move within a hyper-sphere (defined by c and d) around a selected antlion Antlion j t . This process is emulated by the ALO algorithm as follows 30 : where d t and c t are the maximum and minimum of all the variables, respectively. The next process is defined as trap building where the roulette wheel method is used to select Antlion j t based on a fitness function. Thus, a fitter antlion has a higher likelihood of catching the ants.
After building traps, the antlions shoot sands outward from the pit center when ants are in the trap. This behavior is emulated by making c t and d t (defining the radius of the hyper-sphere of ant random walks) decrease with respect to the current iteration t as: where w is a constant for adjusting the exploitation performance and its value is given as: When the ant prey reaches the bottom of the pit, the antlion catches it and takes it into the sand to begin eating. The mathematical definition of this process is given as: where Ant i t is the position of the i-th ant in the t-th iteration. This equation considers that antlions catch prey if they are fitter than other antlions. The position of the antlion is updated to the latest position of the hunted ant to enhance its chance of catching new prey.
In addition to selecting an antlion by random walk, it can be determined using an elite individual strategy, where the best position is used to represent the elite individual. The movement of each ant around a selected antlion may be selected simultaneously by the roulette wheel method and the elite individual strategy, which is formulated as 30 : where R A t and R E t are the random walk around the antlion selected by the roulette wheel and the best position selected by the elite individual strategy, respectively.

Proposed QSAR model
In this section, we explain the proposed QSAR model for predicting the activity of HCV NS5B inhibitors and non-inhibitors. This approach is called the ALO-ANFIS QSAR model and it comprises two phases, where the first is the descriptor selection phase and the second is the PIC 50 prediction phase, as given by Algorithm 1. The details of each phase are explained in the following subsections, where the first step in the proposed algorithm is dividing the HCV NS5B data set into training and testing sets using suitable method (such as 10-fold cross-validation method). Feature selection phase. In this phase, the ALO algorithm is used to select the most relevant features from the training set as follows.
The ALO algorithm starts by generating two populations of ants and antlions, which each solution is converted into binary vector (representing the selected features) using the following equation 52 : ij ij where ε ∈ [0, 1] is a threshold and x ij represents the jth feature of x i . For example, if x i = 01100, then the second and third features are selected.
The fitness function f i is calculated for each solution as: where α ∈ [0, 1] is a random number, and |x| and D are the number of selected descriptors and the total number of features, respectively. The Ŷ i and Y i represent the predict and the actual value of PIC 50 , respectively, for the training set. Also, N_S is the total number of samples in the PIC 50  Determine the best solution (Elite) by selecting the best fAL best . 8.
Choose the antlion using the roulette wheel method (antlions RW ). 10.
Use Eq. (8)  Compute the membership values for each node in the first layer 5.
Computes the firing strength of the rule for each node in the second layer 6.
Compute the normalized firing strength in the third layer. 7.
Compute the output for each node in the fourth layer using Eq. (6). 8.
If the difference between the ŷ and y is smaller than ε, 10. Break 11. Else 12.
Update the parameters of the ANFIS model using the backpropagation method. Where the complexity of Quicksort in best case is O (NlogN) and in the worst case is O(N 2 ). Therefore, the proposed ALO-ANFIS model has complexity

QSAR Model.
The experiments were implemented in Matlab and run in a 64-bit Windows environment. The parameters comprised a population size N = 25, maximum number of iterations = 100, and the stopping condition was reaching the maximum number of iterations. We evaluated the performance of the proposed model by changing the maximum number of iterations, but we found that when the maximum number of iterations exceeded 100, the increase in performance became very small as the computational time increased. In addition, when the maximum number of iterations was less than 100, the computational time was reduced but the performance of the proposed model was greatly degraded. The same effects were obtained when the size of the population was varied.

Molecular docking.
After filtration the 1029 chemical structures to reach 140 structures, molecular docking process was carried out to validate the QSAR technique, supplementary information. The protein crystal structure, 3HHK, were cleaned and treated as described in literature 53 ; the crystal structure was used because of its high resolution 1.7 Å. Chimera was used for crystal structure preparation 54 , only one chain of 3HHK protein was used, all ligands or any solvent molecules were discarded, and polar hydrogens were added by MGLTools.
The MTiOpenScreen is a virtual screening server, which uses Autodock Vina as a docking tool 55 to investigate the highly potent and selective molecular compounds as mentioned before 56 . When using a big data set reaches to 1000 structures, a certain criteria will be applied to ensure a good absorption/penetration of the drugs. this criteria includes Lipenski rule of five (no more than 5 hydrogen bonds, no more than 10 hydrogen bond acceptors, the molecular weight less than or equal to 500 daltons and an octanol-water partition coefficient not be greater than 5) and other two criteria the rotatable bonds less than 10 and the polar surface area will be less than 140 Å 57 .
Molegro molecular view 58 was used to extract the docked inhibitors from their receptors for the graphic representation.
represents the data dispersion around zero deviation, which is defined by: (c) Coefficient of determination (R 2 ) measures the goodness of fit between the results obtained by a method and the actual data. If the value of R 2 is closer to 1, the correlation between the experimental and predicted values is better. R 2 is defined by: Predictive criteria. In addition, we used the criteria proposed by Tropsha et al. 59 for determining whether a model is predictive (when they are satisfied) or not (when they are not satisfied), as follows: where R 2 represents the regression correlation coefficient of between Y and Ŷ (the actual values of PIC 50 and their predicted values, respectively) in both sets (training and testing), O 2 represent the regression correlation coefficients between Ŷ and Y (Yversus Ŷ ) through the origin, and K and K'represent the slopes of the regression lines through the origin (for more information about these measures, please refer to Tropsha et al. 59 .
Moreover, in another study 60 , R 2 to was modified to determine the difference between R O 2 and ′ R O 2 as: where Xis the trained model constructed from the training set and x i is the principal component vector of the element considered. The warning leverage value (h * ) is computed by 3K + 1/N_S, where K represents the total number of independent variables. The probability of accordance between the predicted and experimental values is high for the elements (molecules) in the training set when h < h * . In addition, h > h * indicates that this element will enhance the QSAR model when the element is present in the training set. However, if this element is in the testing set, this indicates that the predicted value is unreliable. If the residual of this element is low, then this element cannot be considered as an outlier. Therefore, in order to determine the applicable domain of the QSAR model, the standardized residual and the leverage must be considered at the same time.

Molecular docking evaluation.
To validate the molecular docking of the current data set structures, the ligand 7ZZ was re-docked to HCVNS5b or 3HHK as mentioned before, the RMSD between the experimental and docked 7ZZ structure was less than 2 angstrom, see Fig. 3.  Results and discussion. The HCVNS5B QSAR model was built using the proposed ALO-ANFIS approach.
The computed Pearson's correlation coefficients between the selected descriptors (molecular flexibility (var1), molecular complexity (var2), non-C/H atoms (var3), electronegative atoms (var4), stereo centers (var5), rotatable bonds (var6), rings (var7), aromatic atoms (var8), and symmetric atoms (var9)) are shown in Fig. 4, which indicates there were positive and negative correlations between various descriptors, but the values were not high so no further descriptors were removed. Figures 5-6 show the predicted values and the experimental values for the training set and testing set, respectively, which indicate that the predicted PIC 50 values agreed well with the experimental values (prediction percentage error was less than 5%). The model did not exhibit proportional and systematic error because the distribution of the residuals to both sides of zero was random. In addition, Fig. 7 shows the regression plot for the training set, testing set, and all the actual PIC 50 values. Table 2 shows the results obtained for the proposed ALO-ANFIS model according to the set of measures, which indicates that the RMSEand MSE values were 0.1104 and 0.012, respectively. The value of R var 2 was higher than 0.6, but all of the results indicate that the goodness of fit for the proposed model was satisfactory. However, several QSAR models obtained good fits (such as ANN with a sufficient number of layers and neurons) but these models were not predictive.
Therefore, several criteria must be satisfied in order to determine whether a model is predictive, as mentioned in the previous section. These criteria are divided into internal and external validation measures 59 . To assess the internal measures, the model was evaluated using only the available data (training) and no other external data (testing set) (e.g., the Y-randomization test). In contrast to the internal measures, the external measures depended on the testing data that had not been used already.
To test the predictability of the proposed model, we divided the data into a training set (823) and testing set (206) (which were selected randomly and they had not been used for training the model). The statistical measures obtained for the ALO-ANFIS model using the training and testing sets are given in Table 2. The RMSE of the proposed model was small for the testing set (~0.588), but after comparing this value with those reported previously 25,26 , we considered that this value was high, possibly because the testing set was larger than those used in previous studies. In addition, the R 2 of values for the training and testing sets were 0.952, and 0.923, respectively, which are greater than 0.5.
Moreover, the results in Table 2 satisfy the criteria mentioned above 59 , where the R 2 value obtained for the testing set was greater than 0.6, and the R 2 value for the training set (or Q 2 ) was greater than 0.5. In addition, the value of R 2 was very close to R 0 2 (and ′ R ), We also obtained evidence that the ALO-ANFIS QSAR model is robust and predictive based on the chance correlation results produced using the LOO test and the Y-randomization test. The results of the LOO cross-validation for the training set (in Table 2) as R LOO 2 and RMSE LOO were 0.8822 and 0.4440, respectively, so it is reasonable to use the ALO-ANFIS model in QSAR. In the Y-randomization test, we constructed a number of     Table 3. Results of the Y-randomization test.     models (set to 10) by randomly permuting the data in the original model, and the expected values of R 2 and Q 2 for these constructed models were smaller than those for the original ALO-ANFIS QSAR model, as shown in Table 3. The applicable domain for the proposed ALO-ANFIS QSAR model was determined using the Williams graph shown in Fig. 8, where the leverage value and standardized residuals are plotted. Figure 8 shows that one molecule has a h value higher than h * = 0.5, which indicates that this molecule represents an outlier for the structure; therefore, the predicted values for this molecule should be considered as being extrapolated using the ALO-ANFIS QSAR model.
Comparison with other models. The performance of the proposed QSAR model is compared with other two models namely, PSO-ANFIS and GA-ANFIS (see the supplementary file for more information about PSO and GA) as given in Table 4. From this table it can be concluded that the PSO-ANFIS model is better than GA-ANFIS model in terms of R 2 and RMSE for both training and testing sets. However, the proposed QSAR  model is still better than the other two models except the R 2 for training set the PSO-ANFIS is better. In addition, the ALO-ANFIS selects smaller number of descriptors than the other models with less times computational.
Moreover, (Figs 9-12) show the actual PIC 50 value versus the predicted values by using PSO-ANFIS and GA-ANFIS, respectively, for training set (testing set). From these figures we can observed that the PSO-ANFIS is closed to the actual values than the GA-ANFIS, especially, in the testing set this goodness of fit is appeared.
From all the pervious results we can conclude that the proposed ALO-ANFIS model is better than the other two models in terms of performance measures (R 2 , RMSE Time(s), No. Descriptors).
From the previous results we concluded that the 9 features that selected by the proposed QSAR model have the largest influencing the HCVNS5B inhibitors. However, the proposed model cannot write its output as a regression equation to describe the relation between the 9 descriptors and the PIC 50 , so, it is difficult to interpret the obtained results. However, the harmony between these descriptors create potent HCV inhibitors as the following: Molecular flexibility: The word flexibility is depending on the chemical graph and refers to the molecular mass, branches, rings and heteroatoms. The importance of molecular flexibility in chemistry and biology are many. As in receptor-ligand interactions, the flexible molecules near pharmacophores may led to reaction inhibition and in chemical reactions, it may led to intermolecular interactions and consequently physical changes 62 .
Molecular complexity: It represents the sum of bond connectivity's of molecular structures. It is a simple tool to design a synthetic pathway to a specific molecule. The relation between molecular complexity and biological activity is mentioned before 63 , it is noticed that more simple molecular structure will produce more potent drug 64 .  Hydrogen bonding: Is a weak bond which formed between hydrogen atom and an electronegative atom like Oxygen, nitrogen and sulfur. There are two types of Hydrogen bonds, if it is formed between molecules, it will be named intermolecular or within a molecule and it will be called intramolecular 65 .
Number of rotatable bonds (nrotb): It represents the change in conformational entropy of a molecule 66 . It also express molecular flexibility and oral bioavailability of drugs 67 . Searle and Williams revealed that each rotatable bond introduces about 1.2-1.6 kcal/mol in changing of binding free energy, assuming complete loss of rotational freedom 68 .
Rings and Aromatic Rings: Debnath et al. 69 , noticed that increasing hydrophobicity is in linear relation with mutagenic effect.
Symmetric atoms: It represents 3D property and it describes atom distribution with respect to some invariant reference frames 70 .
Finally, from the previous discussion it can be observed that the proposed ALO-ANFIS QSAR model provides an efficiency and effectiveness. Since it can selecting an optimal subset of descriptors that increase the regression accuracy. This promising results are achieved due to the ALO algorithm has good ability to balance between the exploration and exploitation during the search process about the optimal solution. Also, due to the good properties of the ANFIS model to solve the QSAR regression problem, since it combines ANN and fuzzy logic system. Molecular docking and QSAR. In Fig. 13, Amino acids were colored according to hydropathy while ligand according to atom type in electrostatic interactions in A and C view. Also ligand was colored according to hydrophobicity in B and D image segments. Hydrogen bond was represented in blue dashed line. The size of compound in C is more accommodate to protein binding site than A, this explain A(the lowest hit) with binding energy = −5.1 kcal/mol and C (the best hit) with binding energy = −13.5 kcal/mol.
In the current study, the highly potent and selective molecular compounds 56 were depend on several descriptors like molecular flexibility (var1), molecular complexity (var2), non-C/H atoms (var3), electronegative atoms (var4), stereo centers (var5), rotatable bonds (var6), rings (var7), aromatic atoms (var8) symmetric atoms (var9), so a curve will be made between these descriptors versus binding energy and PIC50 as in Fig. 14. From this figure, it can be observed that the value of the PIC 50 is increased with increase the value of the following descriptors (var2, var3, var4, var5, and var7), however, with the other descriptors its value is decreased, with small value, by increasing their value. In addition, the value of binding energy is decreased with decreasing the value of var2, var5, and var7 descriptors, and increased with other descriptors.

Conclusion
HCV NS5B is an interesting target for antiviral therapy with limited side effects and it has been the subject of extensive trials to develop nucleoside and non-nucleoside inhibitors. In this study, we used a series of HCV NS5B inhibitors to build a QSAR model. The chemical descriptors were calculated using the DataWarrior package. However, any QSAR model is usually influenced by the number of descriptors and the regression method employed. Therefore, in this study, we developed a new QSAR model for assessing inhibitors and non-inhibitors of HCV NS5B. The proposed model comprised two stages, where we used the ALO algorithm to determine the most relevant descriptors related to the PIC 50 values in the first stage, and ANFIS was then used to determine the nonlinear relationships between the selected descriptors and the PIC 50 values in the second stage. The results obtained by the proposed model indicate that it is an acceptable approach for predicting the activity of drugs as HCV NS5B inhibitors.
According to the promising results obtained using the proposed model, we will apply this method to other complex problems in drug design in future research, as well as other applications such as wind speed prediction by making suitable improvements.