Prediction and molecular field view of drug resistance in HIV-1 protease mutants

Conquering the mutational drug resistance is a great challenge in anti-HIV drug development and therapy. Quantitatively predicting the mutational drug resistance in molecular level and elucidating the three dimensional structure-resistance relationships for anti-HIV drug targets will help to improve the understanding of the drug resistance mechanism and aid the design of resistance evading inhibitors. Here the MB-QSAR (Mutation-dependent Biomacromolecular Quantitative Structure Activity Relationship) method was employed to predict the molecular drug resistance of HIV-1 protease mutants towards six drugs, and to depict the structure resistance relationships in HIV-1 protease mutants. MB-QSAR models were constructed based on a published data set of Ki values for HIV-1 protease mutants against drugs. Reliable MB-QSAR models were achieved and these models display both well internal and external prediction abilities. Interpreting the MB-QSAR models supplied structural information related to the drug resistance as well as the guidance for the design of resistance evading drugs. This work showed that MB-QSAR method can be employed to predict the resistance of HIV-1 protease caused by polymorphic mutations, which offer a fast and accurate method for the prediction of other drug target within the context of 3D structures.


Scientific Reports
| (2022) 12:2913 | https://doi.org/10.1038/s41598-022-07012-x www.nature.com/scientificreports/ drug resistance, they are not suitable for the large scale prediction of the resistance of mutants against drugs due to being time-consuming and offering limited predictive accuracy.
Previously we have developed a method called as MB-QSAR (Mutation dependent Biomacromolecular Quantitative Structure Activity Relationship), which allows to rapidly predict the drug resistance accurately and supplies sufficient structural information directly related to the drug resistance. MB-QSAR method has been successfully applied on the prediction of the herbicide resistance of Acetohydroxyacid Synthases (AHAS) causing by the single or double mutations 29,30 , as well as the ligand binding affinity to ShHTL7 mutants 31 . Here we extend MB-QSAR method to predict the resistance of HIV-1 PR mutants from real patient sequences containing large numbers of mutations towards six HIV-1 PR inhibitors (SQV, IDV, RTV, NFV, APV and LPV, Fig. 1). We obtained well prediction accuracy for the binding affinity of drugs to HIV PR mutants. The interpretation of these MB-QSAR models revealed the molecular field view of the drug resistance in HIV-1 PR mutants, which provides insight into the understanding of the drug resistance mechanisms and structural information for the design of resistance evading inhibitors. Compared to those methods mentioned above, MB-QSAR method is capable of predicting the binding affinity of PIs to various HIV-1 PR mutants fast and accurately, and providing the structural basis of the drug resistance in HIV-1 PR mutants.

Results and discussion
MB-QSAR method is based on the traditional small molecule 3D-QSAR methodology 32,33 , in which a series of proteins mutants were treated as "analogies" been targeted by the same small molecule. MB-QSAR method assumes that a suitable sampling of the molecular field values in the inhibitor binding pocket of the mutants can yield models which can quantitatively predict the drug resistance of new mutants and provide information to help the understand of the drug resistance mechanisms and the design of resistance evading inhibitors (Fig. 2).
Here we have constructed the MB-QSAR models for six drugs (SQV, IDV, RTV, NFV, APV and LPV) against HIV PR mutants. The statistical results of MB-QSAR/CoMFA models for the six drugs were shown in Table 1. Six statistical parameters, including the q 2 , ONC, r 2 , SEE, F-value and r pred 2 value, were obtained to assess the quality of MB-QSAR models. In general, our MB-QSAR/CoMFA models for the six drugs were quite well considering their cross-validated squared correlation coefficient q 2 values were higher than 0.6 using 4 or 3 components and the high r 2 values. The higher F-values and the lower SEE also indicated our models had higher explanatory power.
In these MB-QSAR/CoMFA models, the contributions of the steric and electrostatic fields are approximately 60% and 40%, respectively, which indicated that the steric field plays more important role in the HIV-1 PR mutants confer resistance to PIs. The test sets were used to verify the external predictive power of the models. For all of the MB-QSAR/CoMFA models, the r pred 2 values are higher than 0.7 (except for NFV, which has an r pred 2 value of 0.603), indicating a high prediction accuracy for all of the MB-QSAR/CoMFA models. Compared to CoMFA methods, CoMSIA can utilize up to five different molecular fields (steric, electrostatic, hydrophobic, hydrogen bond donor, and hydrogen bond acceptor field) as well as their combinations to construct QSAR models. We tested all 31 possible field combinations to generate the MB-QSAR/CoMSIA models. The field (combinations) display highest q 2 value and external predictive ability (r pred 2 ) were chosen as the MB-QSAR/CoMSIA models for the six drugs (Table S2). These models mostly involved steric and hydrophobic fields (Table S2), indicating an important role of steric interaction in the binding of PIs to HIV PR mutants.
The obtained CoMFA and CoMSIA models for the six drugs were used to predict the relative pK i value for the test set. As shown in Fig. 3 and Fig. S1, all the errors between the experimental pK i values and the predicted pK i Here we derived the 3D coefficient contour maps of CoMFA models which can show the structural impacts on the binding of drugs, thus can provide a view of the drug resistance mechanisms ( Fig. 4 and Fig. S2). The contours were mapped on the structure of wild-type HIV PR complexed with inhibitors. The CoMFA steric interactions are represented by green and yellow contours, while CoMFA electrostatic interactions are shown with red and blue contours. The bulky substituents in HIV PR are favorable in the green regions of steric contours for enhancing the inhibitory activity, while those in yellow regions may lead to a decrease in inhibitory activity. Meanwhile, in the map of the electrostatic field, the blue contours indicate that electropositive charges in HIV PR are favored for inhibitory activity, while the red contour designates an increase in inhibitory activity of the electronegative charges.
As shown in Fig. 4 and Fig. S2, the yellow contours surrounding the residueGly48 and Gly48′ in the complex of HIV PR with SQV, IDV, NFV and LPV indicated that the steric interactions were not favorable for the binding of PIs, which are consistent with the fact that introduction bulky residue in this site can caused resistance to PIs 34 . The Gly48 of HIV-1 PR mainly interacts with these inhibitors through the interaction between its backbone atoms (Fig. S3). The introduction of bulky residues such as valine and methionine in this site can disrupt the The structures of a series HIV-1 PR mutants were constructed and aligned, then the molecular field values in the inhibitor binding pocket were computed using probe atoms. The PLS regression method was used to correlate the pK i values and the calculated molecular field descriptors to achieve the MB-QSAR models. Based on the constructed MB-QSAR models, the molecular drug resistance of HIV-1 PR mutants could be predicted, and interpreting the MB-QSAR models could yield molecular field view for the interaction between the inhibitors and the HIV-1 PR mutants. www.nature.com/scientificreports/ interaction between the backbone of Gly48 and PIs thus introduce the resistance to these drugs. The green and yellow contours between residue of Val82 and PIs, indicating that the mutations of Val82 to different residues can introduce favorable and unfavorable steric interaction towards different PIs, such as V82F mutation should be unfavorable for the binding of SQV and RTV, while the V82F or V82L is unfavorable for the binding of NFV and APV, respectively, which again agreed with these mutations can cause resistance to the corresponding drugs.
Residue of Ile50 and Ile84 also involved in the HIV-1 PR resistance mutations. The yellow contours between Ile50 and SQV, NFV and IDV; as well as between Ile84 and SQV, IDV and NFV, indicated that the mutations of Ile50 and Ile84 to larger residues caused steric effect can confer the resistance of HIV-1 PR towards the above mentioned PIs.
In the electrostatic fields ( Fig. 4 and Fig. S2), red and blue contours were found between residues Asp30 and several PIs (SQV, RTV, APV, LPV and IDV), indicating that negative and positively charge may contribute to the binding of PIs, which indicated that mutation of Asp30 such as D30N mutation can increase or decrease the binding PIs (Table S3). It is interesting to find that, there are blue contours between Arg8 and two PIs (RTV and APV). This may suggest that the positive charge of the Arg8 is essential for the binding of RTV and APV to HIV-1 PR. Although Arg8 is not involved in the drug resistant mutation of HIV-1 PR, the orientation of the side chain of Arg8 is affected by the mutation of other residues, which resulted in the change of molecular field around this residue.
The contour maps derived from the MB-QSAR models also provide information for the design of resistance evading inhibitors towards HIV-1 PR mutants. The yellow contours between the HIV-1 PR and the P1 site of several PIs (SQV, IDV, NFV, APV and LPV, Fig. 4and Fig. S2) indicate that the substitutions of phenyl group with small groups might increase the resistance evading abilities for the PIs. In the meantime, green contours appear between HIV-1 PR and P1' site of PIs (SQV, NFV and RTV) suggest that increasing the steric interaction for this site and HIV-1 PR can result in better resistance evading abilities. While for APV, a smaller group is favorable to interact with the P1′ site of HIV PR mutants. In the electrostatic filed, the blue contours between Asp30 and the P2 or P2' site of PIs, indicated that a positive charged substitution at P2 site should increase the binding of PIs towards HIV-1 PR mutants. In summary, analyzing the contour maps in HIV-1 PIs yielded many clues for the design of resistance evading inhibitors for HIV-1 PR mutants.
The PI-resistant mutations of HIV-1 PR have been classified into major and minor mutations depending on their effect in antiviral therapy 34 . Our MB-QSAR study of HIV-1 PR involved 59 HIV-1 PR variants, which cover most of the major and minor mutation sites (Fig. S4). The major mutation sites are mostly located in the active site to directly interact with substrate or inhibitors, while the minor mutation sites are the residues mostly located outside of the active site. The minor mutations can influence the binding of the inhibitors or substrates through perturbations of the active site by the transmitted conformational effects. The real patient sequence of HIV-1 PR usually contain multiple major and minor mutations, thus the resistance of HIV-1 PR mutant in patients to PIs are conferred by both major and minor mutations. It can be seen from Fig. S5 that variants with major and minor mutation would cause structural change in the HIV PR, especially on the "flap" region that plays essential role on the binding of inhibitors. In the meantime, these mutations also caused the change on the electrostatic potential  Fig. S5b and c). The effect caused by the mutations on the binding of PIs to HIV-1 PR mutants are regards as the change of the molecular field values in the active site in HIV-1 PR mutants in our MB-QSAR studies. We showed that our MB-QSAR method could be employed to accurately predict the resistance of HIV mutants to PIs caused by both major and minor mutations. Currently there are two categories of method were developed to predict the resistance of HIV-1 PR mutants to inhibitors: sequence based and structure based methods. The sequence based method represents a fast prediction method, however the resistance caused by the mutation is not considered in the context of the 3D structure of the target protein. The structure-based methods, such as molecular dynamics simulation, are computationally expensive. To build the MB-QSAR model fot the HIV-1 PR mutants to inhibitors, it only took several hours to run the program on a desktop computer with a Intel(R) Core(TM)i5-8250 CPU. Our work thus presents a fast, structure based method capable of accurately predict the binding affinity of PIs to various HIV-1 PR mutants.

Conclusions
It is a great challenge to find the "perfect" drugs to conquer the drug resistance against HIV. In this work, the MB-QSAR method was employed to predict and provide a molecular field view of drug resistance in HIV-1 protease mutants. Reliable MB-QSAR models were constructed for six drugs (SQV, IDV, RTV, NFV, APV and LPV) with accurate prediction abilities for the prediction of drug resistance for a series of HIV-1 PR mutants. The relationships between the structures of protease mutants and the drug resistance were derived from these models. Interpretation of the relationships supplies important structure information will benefit the understanding of the HIV-1 PR mutational drug resistance mechanism, and provide important clues for the design of efficient resistance-evading inhibitors as well as help to rationalize and personalize the therapeutic decision-making process. Considering that the problem of mutation-induced resistance cuts across virtually all infectious diseases, we believe our method may be extended to a wide range of drug targets besides HIV.

Materials and methods
Biological data. The inhibition constant values (K i ) of six HIV PIs to different HIV-1 PR mutants were taken from research groups of Dunn [35][36][37][38] and Konvalinka [39][40][41][42][43][44] . A total of 60 HIV-1 PR variants including the wild type and mutants were listed in Table S1. The K i values of mutants were converted to relative K i against the wildtype and subsequently treated to relative pK i values and calibrated with an artificial number of 8.0 to make the relative pK i value of wild-type to 9.0 (relative pK i = log(relative K i ) + 8.0). . The hydrogen atoms were added to these structures by SYBYL6.9 (The Asp25 and Asp25′ were protonated according to their bound inhibitor: Asp25 was protonated when bound to SQV, IDV or LPV; Asp25′ was protonated when bound to RTV, NFV or APV, respectively 46 .) The Gastger-Mashii charges were assigned to small molecules and the amber charges were assigned to proteins. The complex structures were first minimized for 1000 times using the Tripos force field and the Powell method. Then the inhibitor and residues within 8 Å from the inhibitor were minimized to a 0.01 kcal/ (mol*Å) convergences. The residues which were 8-16 Å from the inhibitor were kept rigid and considered the interactions with the interesting region residues and the other residues 16 Å away from the inhibitor were ignored during the second minimization process.

MB-QSAR modeling.
Prior to MB-QSAR modeling, the structures were aligned with the backbone atoms of the residues in the inhibitor binding pocket within 3-6 Å away from the inhibitor with respect to the one of wild-type, according to our previous studies 29,30 . The atom by atom least-square fit was used in the alignment. The inhibitors were removed from the complex structures after alignment. For each of drugs, the proteins were divided into training set and test set. MB-QSAR models were constructed based on the training set. The test set was used to evaluate the external predictively of these models. Special cares were taken to ensure the appropriate ranges and distributions of the pK i values for training and test set, respectively.
To calculate the molecular field values, the lattice were centered on the inhibitors with the edge extended 4 Å away from the edge of inhibitor and a grid spacing of 2 Å. The CoMFA fields were calculated with a distancedependent dielectric constant (1/r), and a sp 3 carbon atom with + 1.0 charges serving as the probe atom were used to calculate the steric and the electrostatic field values. An energy cutoff value of 30 kcal/mol was used for both the steric and electrostatic fields. In CoMSIA studies, five indices (steric (S), electrostatic (E), hydrophobic (H), hydrogen-bond donor (D) and hydrogen-bond acceptor (A) descriptors) were calculated with the same lattice as in the CoMFA fields calculation, using the probe atom with a radius of 1.0 Å, a charge of + 1.0 and a unit hydrophobicity value. A Gaussian-type distance dependence function was used between the grid points and atoms of the proteins.
The CoMFA and CoMSIA field values were used as independent variables, while the relative pK i values were used as dependent variables in the partial least squares (PLS) regression analyses to derive the MB-QSAR models. The cross-validation with the leave-one-out (LOO) option was carried out and the SAMPLS method was used in CoMSIA to obtain the optimal number of components (ONC), and the ONC was used to generate the PLS regression models by non-cross-validated analysis. In the case of CoMSIA analysis, 31 analyses were carried out using the five fields separately and in all possible combinations. All the QSAR calculations were done in SYBYL6.9.