Introduction

Antiretroviral (ARV) drug resistance still persists despite recent improvements in antiretroviral therapy1. As the viral genome continues to accumulate mutations under the selective pressures of therapy2, surviving viral populations inevitably become less sensitive to one or more drugs over time. HIV reservoirs and its existence as a quasispecies3 means that an ARV should ideally inhibit a pool of slightly different conformations of receptor targets. The decreasing efficacy of drug binding over time means that patients may have to switch to more difficult treatment regimens with the possibility of experiencing more severe side-effects if no better-tolerated alternative exists. At the same time, ARVs are a finite resource which should be used with proper timing failing which resistance develops sooner. In order to design more robust ARVs and/or improve onto existing resistance prediction methods, additional knowledge of the motions associated with resistance may be helpful. However this is not straight-forward, as patterns of resistance mutations in HIV are complex4, and may require special consideration in order to extract deeply-engrained behaviour. In this manuscript, we focus on HIV protease, which is a crucial enzyme for viral maturation, and is a well-established HIV drug target5,6. There are minor differences as to how various functional segments of HIV protease are defined in literature (possibly due to its high variability) for which we show some of the structural features in Fig. 1. Protease inhibitors (PIs) competitively inhibit the enzyme7, which under normal circumstances process the viral polyproteins Gag and Gag-Pol8. Multi-drug resistance within members of the PI class is not uncommon due to their long period of use and their three-dimensional and electrostatic similarities9,10.

Figure 1
figure 1

Functional regions within HIV protease. Grey spheres depict the residues constituting the binding cavity.

In resource-available settings drug efficacy can be inferred by monitoring viral load or CD4 cell counts, yet when available, knowledge of genotypic information can improve the choice of therapy to be used11. However, drug resistance mechanisms in HIV are not fully-understood12. Previous research has evaluated various computational modelling approaches over the years in order to predict or understand drug resistance mechanisms in HIV protease, ranging from the use of delaunay triangulations from static protease structures13, binding energies from molecular docking14,15 to elastic network modelling applied to coarse-grained models using a uniform spring stiffness16, 100 fs molecular dynamics (MD)15,17 and many more, as reviewed by Cao and co-workers18. Here we adopt a structural approach using MD applied to 100 highly-resistant and 100 hyper-susceptible HIV sequences against eight docked protease inhibitors using the labelled sequence data available from the Stanford HIVdb19. We focus on the majority subtype (B), with the sequences containing rare residues removed, as described in our previous work20. 3D structures of the 200 protease sequences are built using homology modelling21 with a common drug-bound template for each target. Ligand docking with the eight ARVs, namely atazanavir (ATV), darunavir (DRV), fosamprenavir (FPV), indinavir (IDV), lopinavir (LPV), nelfinavir (NFV), saquinavir (SQV) and tipranavir (TPV) then gives 1,600 3D structures of drug-bound protease complexes. FPV is the amprenavir prodrug which is released in its active form upon hydrolysis22.

Proteins are in constant motion23 and drug-binding alters their dynamics24 so each case requires its own MD run. Allowing for replications, a total of 3,200 MD runs are performed. Each run is about 2 ns, so that in total the MD simulations amount to about 6,400 ns. Such a design was necessary in order to cover as much resistance-related complexity from as large a number of independent observations such that highly-conserved patterns would emerge within the noise from protein dynamics in order to minimize bias while maintaining biological variance, for a reasonable amount of CPU hours. The observation of conserved resistance-related dynamics across this high number of independent short simulations of PI-bound receptor complexes shows that highly drug-resistant sequences may be structurally-detectable in a short amount of simulated time. Considerable amounts of conformational sampling are typically required to observe motions that are of large amplitude25,26 or rare27. Same applies for increasing the accuracy of binding free-energy estimations (for instance between a ligand and a receptor), which comes with increased computational costs28,29. We circumvent these issues in this context, by describing two short and specific motions that are detectable very early in all-atom dynamics simulations of the retroviral protease, using the idea of preferential attachment applied to local residue motion. This concept stems from the tendency of initially highly-connected nodes to attract new connections, leading to a global behaviour referred to as being scale-free, in non-random graph topologies30. We use this idea to reinforce the detection of significantly different (smaller and larger) pairwise residues distances derived from statistical tests of averages on the premise that a residue will most likely be at a given distance within an ensemble if there is larger number of other residue pairs supporting the observed difference.

The in silico methods used to simulate MD of drug-receptor complexes are partially stochastic31,32,33 - we therefore mitigate chance events by calculating statistical properties of each ensemble and applying the network degree centrality (connectivity) measure. Networks are an intuitive way of representing relational data using nodes and edges34, with much of the underlying ideas having emerged from insights made by representing social networks as graphs35. Network analysis has thus evolved into an ideal tool for examining inherently complex biological contexts such as single nucleotide variation analysis36,37, protein-protein interactions38,39, gene co-expression data34,40, intra-protein networks37,41,42,43,44 and allosteric modulation analysis45. A tool that uses network analysis over MD simulations is given in44. Network graphs are composed of nodes and edges, where each node represents a particular object while an edge is drawn between any node pair to represent a shared property. Edges can either be directed, in which case a relationship does not entail reciprocation or conversely be undirected whereby connections are mutual. Additionally, edges can be weighted or binary. While the former preserves information continuity, the latter only denotes the presence or absence of a connection34. Undirected binary edges are used in our case to independently represent a significantly larger or a smaller distance on separate graphs. In our analysis, we used the degree centrality, which is simply the number of neighbours adjacent to a given node. Adjacencies can be represented as a square matrix. The degree centrality then is the row (or column) sum from the matrix.

By calculating degree centrality within graphs generated from statistically-inferred edges between Cα atoms, we remarkably find structural features of HIV protease that differ between susceptible and resistant sequences, with further conservation occurring across all 8 ARVs. The results can form the basis for more robust ARV design and better prediction of drug resistance. Further, the combination of molecular dynamics, network centrality and statistical analyses used here provides an alternative way of analysing the effects of non-synonymous mutations, and should be applicable to other diseases whereby protein variants can be independently simulated as two ensembles of variants, ideally comprising over 30 samples in each case, to represent cases and controls to benefit from normalizing properties of the central limit theorem46 for the t-test. Given our performance at predicting conserved motion using this method, such an approach could be very insightful if similar motions were to be picked up in non-B subtypes of varying residue composition as our method highlights the most definitively distinct dominant motions prevailing between drug resistance and susceptibility within the confines of their ensembles. Schematics illustrating the experimental workflows are shown in Fig. 2 for the distance-based approach and in (Supplementary Fig. S1) for the angle-based method.

Figure 2
figure 2

Experimental work-flow for the distance-based network construction and analysis from MD trajectories.

Results and Discussion

In this study, two replicas of 1600 MD simulations were performed totalling 6400 ns, over 100 highly-resistant and 100 hyper-susceptible HIV protease structures complexed with eight docked protease inhibitors. As a quality control for all the MD simulations, Cα root mean square deviation (RMSD) values were first computed to exclude any error in periodic boundary corrections. A condensed representation of the mean and the standard deviations of RMSD values for each ARV is depicted in Supplementary Fig. S2. The runs were found to display slightly higher variation (in red) for the first 100 ps before stabilizing (yellow to white) thereafter in each case. We then begin the experiment with a global assessment of the distributions of protein compactness using the radius of gyration (Rg) across drug ensembles, as shown in Fig. 3 to more local evaluations, namely pairwise residue distances and Cα angles from receptors (Figs 4, 5, 6 and 7).

Figure 3
figure 3

Distributions of Rg values for protease inihibitor complexes containing ATV (a), DRV (b), FPV (c), IDV (d), LPV (e), NFV (f), SQV (g) and TPV (h). Resistant ensembles are shaded in red while susceptible ensembles are in grey.

Figure 4
figure 4

Normalized degree centralities of significantly larger (red lines) and smaller (black lines) distances observed in resistant ensembles for 8 FDA-approved protease inhibitor complexes, namely ATV (a), DRV (b), FPV (c), IDV (d), LPV (e), NFV (f), SQV (g) and TPV (h). The top 5 residue positions with the highest connectivities are labelled at the peaks in each graph. Inserted underneath are the functional protease residues depicted as colored dots, namely the fulcrum , the elbow , the flap , the cantilever , the interface and the binding cavity residues .

Figure 5
figure 5

Mapping of the edges for top-ranked degree centralities onto HIV 3D protease structures for the significantly larger (left) and smaller (right) distances observed in resistant ensembles for 8 FDA-approved protease inhibitor complexes, namely ATV (a), DRV (b), FPV (c), IDV (d), LPV (e), NFV (f), SQV (g) and TPV (h).

Figure 6
figure 6

Heat map of residue positions with significantly larger Cα angles in the resistant ensemble for each PI. The hierarchical cluster tree is displayed on the left. The first replicate is at the left and the second replicate is at the right.

Figure 7
figure 7

Heat map of residue positions with significantly smaller Cα angles in the resistant ensemble for each PI. The hierarchical cluster tree is displayed on the left. The first replicate is at the left and the second replicate is at the right.

Global assessment via radius of gyration

Previous work described distinct mechanisms associated with ARV drug resistance that all point to active site expansion, namely (1) impaired hydrophobic sliding shown in the G48T/L89M double mutant with saquinavir47, (2) reduced dimer stability in L24I, I50V and F53L mutants9 and (3) single or co-operative distal mutations48,49. Additional research on multi-drug resistant HIV protease further described an expanded active site pocket50,51 and cavity expansion due to atomic volume for the HIV protease mutants V82A and I84V5. As Rg measures compaction by calculating the RMSD of each atomic centre-of-mass with respect to that of the whole molecule, we generally expected that higher values would be associated with resistance ensembles. However, our simulations showed no global tendency towards a less compact (larger Rg) series of conformations in the resistant ensemble compared to the drug-susceptible state, as shown in Fig. 3. Taken individually, only DRV, NFV and SQV display larger Rg values in their resistant ensemble. The reverse is actually observed in FPV and IDV. No appreciable shifts were observed for ATV, LPV and even less for TPV. Of notable interest are the slight shifts in the means of the Rg distributions across all drug ensembles, which are suspected to either be inherited from the template used for modelling or the docked drugs themselves, which could be propagating a different set of local receptor-ligand signals towards various parts of the receptor. Similar trends were observed upon replication (Supplementary Fig. S3).

We hypothesize that compactions may instead be observable at a residue level and that these might be masked by more chaotic motions happening globally. Therefore we further investigated each protein-ligand complex by increasing experimental sensitivity by calculating time-averaged one-tailed t-tests from pairwise residue distances obtained from aggregations of independent MD simulations before summarizing and analysing the results with network analysis.

Local evaluation via statistical tests coupled with network analysis

Results from Bonferroni-moderated t-tests are transformed into a network graph as explained in the Materials and Methods section for the distances, and are represented as normalized degree centrality plots onto which several architectural features of HIV protease are mapped using the coloring scheme from Fig. 1. Time averages of the distances for each residue pair are combined across proteins within an ensemble, instead of using the actual distances, mainly for computational efficiency, but also made the distributions of the variables more normal, as per the central limit theorem. For all MD simulations, an initial region of higher RMSD fluctuation (100 ps) was discarded to reduce residual effects coming from prior equilibration. We further filter out stochastic variations by basing ourselves on the network concept of preferential attachment, which is the tendency of scale-free networks to attach new nodes to those that already have high connectivities52. The five top-ranked residue positions are subsequently prioritized on the basis of their connectivities showing statistically-significant larger or smaller distances across the ensembles.

It is remarkable to note that despite the stochasticity of different sections of the experiment compounded with protein variations, our results show features common to all drug complexes. The base of the cantilever (very close to the 60’s loop) is drawn closer to the catalytic core for all drug complexes (Figs 4 and 5) in the resistant state. All the mutations present in each ARV’s resistance ensemble are shown in Supplementary Table S4, where known accessory and major drug resistance mutations (DRMs)53 are shown in bold black and red fonts respectively. From our simulations, a similarly conserved behavior is not immediately apparent from the degree centralities of those residues with larger distances in the susceptible ensemble, but can be seen from their mapping onto protease 3D structures. A lateral widening involving the elbow region and/ or the 10’s loop of the fulcrum is generally observed across all PIs. Part of this motion is described by Hornak and co-workers54 as events leading to flap opening, involving a concerted downward motion of the cantilever, fulcrum and flap elbow with an upward motion of the catalytic aspartate from the floor of the binding cavity. Further, we observed that residues of the receptor cavity do not appear amongst any of the top-ranked residues for each drug and ensemble, and behave in a quite opposite manner, with very low degree centralities. Occasional spikes did manifest themselves for some cavity residues, but these can be ignored as they may be chance events that would not be ranked similarly upon replication of the experiment, as seen in Supplementary Fig. S5. Low degree centralities in both ensembles (i.e. neither larger in the resistant nor in the susceptible ensembles) would point to the fact there is no consistent motion within the binding cavity that would define the state of drug-resistance or susceptibility, at least not within the time limit and conformational landscapes explored. This hints at a receptor pocket that is very malleable with multiple internal cavity dynamics that can lead to similar states, both within and between PI drug classes. A second scenario that could result in such low connectivities would be that cavity residues move in a coordinated manner across ensembles irrespective of drug exposure, which is unlikely.

ARV-specific results

In Fig. 4a for ATV residues showing smaller distances in the resistant ensemble include positions 70, 71 (on chains A and B) and position 69–71 (chain B), while larger distances are at positions 36, 37, 73 (chain A) and at positions 36, 73 (chain B). Mapping these positions onto protein structures (Fig. 5a) shows that regions predicted to be larger in the resistant ensemble move in a lateral outward direction, favoring a wider conformation, while regions predicted to be smaller in the resistance ensemble (or larger in the susceptible ensemble) show an upward motion with respect to the flaps. The residues involved in widening and shortening around the binding cavity show a high level of symmetry between each monomer of the protein. A very similar profile was obtained upon replication, with residue 36 peaking from within chain A instead of chain B. Such conservation in behavior may find direct application in drug resistance prediction or for feature augmentation for improving machine learning prediction of resistance. Mutations present in the ATV resistance ensemble include both accessory DRMs 10IVF, 32I, 33FV, 34Q, 46LI, 48V, 53L, 54LVM, 60E, 62V, 64VM, 71IV, 73STA, 90M, 93LM and a major DRM 84V, in addition to multiple other variations. In the case of DRV (Figs 4b and 5b), residues at positions 71 and 72 (chain A) and positions 69, 71, 72 (chain B) move closer to the catalytic wall in the resistance ensemble in a symmetric fashion. Larger distances are at positions 10 (chain A) and 10, 21, 37, 54 (chain B). The elbow movement is not mirrored in chain A, however position 10 in chains A and B move away from the plane spanning the surface of the page, showing another way of active site expansion in addition to elbow flaring associated with the resistance ensemble. Residue 54 (chain B) is also seen to move away from the the binding cavity, but same does not occur under the replication (Supplementary Fig. S5). DRV’s resistance ensemble includes amongst other variations, the accessory DRMs 11I, 32I, 33F, 89V and the major DRMs 47V, 54LM, 84V.

For FPV (Figs 4c and 5c), smaller distances in the resistance ensemble are at positions 71 (chain A) and 69–72 (chain B), while larger distances in the resistance ensemble are at positions 15–17 (chain A) and 16, 73 (chain B). As in DRV lateral expansion is observed, but mainly involves the 10’s region in addition to the surface residue 73 in both replicates. The constriction behavior is also reproduced very closely in the replicate (Supplementary Fig. S5). The major DRM includes 84V while accessory DRMs include 10IVF, 32I, 46LI, 47V, 54LVM, 73S, 76V, 82TA, 90M in addition to other variations.

IDV (Figs 4d and 5d) displays smaller distances in its resistance ensemble at positions 63, 69–71 (chain A) and 71 (chain B). Larger distances are observed for the same ensemble at positions 16, 73 (chain A) and 16, 73, 93 (chain B). Upon replication, same residues were found to be involved in expansion, while only chain A showed the cantilever loop compaction towards the active site. Once more, the cantilever residue 73 is found to contribute to lateral widening in both chains. Replication identified identical residues involved in expansion at the 10’s loop from both chains, while those involved in compaction included residues 69–71 only from chain A (Supplementary Fig. S5). Major DRMs in the IDV resistance ensemble include 46LI, 82FTA, 84V and the accessory mutations 10IV, 20R, 32I, 36I, 54V, 71TV, 73SA, 76V, 77I, 90M.

Resistance in LPV (Figs 4e and 5e) was associated with smaller distances in the resistance ensemble at positions 70, 71 (chain A) and position 69–71 (chain B), while larger distances were located at positions 73, 93 (chain A) and positions 34, 36, 73 (chain B). Replicate runs are very concordant for residues involved in expansion with the exception of residues 36 and 81 in chain B, which rank differently despite displaying similar trends. Those involved in compaction again point to the cantilever residues of both chains, whereby residue 71 is replaced by 69 in the replicate. The major DRMs of the resistance ensemble consist of 32I, 47VA, 76V, 82SFTA while accessory DRMs comprise 10IFV, 20RM, 24I, 33F, 46LI, 50V, 53L, 54LTVMS, 63P, 71VT, 73S, 84V, 90M.

In the case of NFV (Figs 4f and 5f), shorter distances for the resistance ensemble were at positions 69–71 (chain A) and positions 70, 71 (chain B), while larger distances for the same ensemble were at positions 20, 36 (chain A) and at positions 20, 36, 73 (chain B). In the replicate run, a very similar profile is observed, however it would appear that residue 20 (part of the fulcrum) and 36 (close to elbow) are moving in concert during expansion. Contraction is observed as for other ARVs, close to the cantilever loop region. Major DRMs of the resistance ensemble consist of 30 N, 90 M and accessory DRMS consist of 10IF, 36I, 46LI, 71TV, 77I, 82FA, 84V, 88D.

SQV (Figs 4g and 5g) displays smaller distances in the resistance ensemble at positions 70, 71 (chain A) and positions 69, 71, 72 (chain B). Larger distances for the same ensemble are observed at positions 73, 89 (chain A) and positions 18, 20, 73 (chain B). Very similar symmetric compaction is observed at the fulcrum region as seen for other ARVs on both chains and the widening peak positions are also very similar despite a slightly changed degree ranking. DRMs for the resistance ensemble include the majors 48 V, 90 M and the accessory mutations 10I, 54LV, 62V, 71TV, 73S, 77I, 82A, 84V.

In the case of TPV (Figs 4h and 5h), smaller distances in the resistances are at positions 33, 60, 71 (chain A) and positions 70, 71 (chain B), while larger distances are at positions 16, 20 (chain A) and positions 15–17 (chain B). Replication reproduced compactions once more, close to the cantilever loop but also included the buried residue at position 33 on chain A, surrounded by the 80’s loop, the cantilever and the elbow regions. According to our simulation conditions, compaction at this region appears to be specific to TPV. Lateral expansions, though not identical, are also closely reproduced around the 10’s loop region. The resistance ensemble includes the major DRMs 47V, 58E, 74P, 82LT, 83D, 84V and the accessory DRMs 10V, 33F, 36IV, 43T, 46L, 54VM, 89VM.

Local evaluation via statistical tests coupled with angular distributions

We further investigated receptor backbone movement by comparing angle distributions occurring at protein Cα atoms. Absolute conservation was observed at residue position 84 in only one of the replicates in Fig. 6. Positions 75 and 84 however displayed strongly conserved larger angles in the resistance ensemble, including ATV, DRV, FPV, IDV, LPV, NFV and SQV. At 99% confidence), one-tailed t-tests did not detect any strong conservation of global angular behavior - both for the same drug replicated and across all drugs as shown by the non-reproducible clustering patterns in Figs 6 and 7. This supports the fact that the enzyme is very malleable, even in the closed conformation complexed with the drug and points to the fact that multiple residue arrangements along the backbone can lead to the same effect.

In conclusion, HIV protease inhibitors are used to delay the symptoms associated with late stages of the infection, however resistance is unrelenting due to the virus’s resilience to the current drug designs. Drug resistance patterns are complex. Nevertheless, our large scale simulations show that despite various DRMs and additional variations, lateral expansion and fulcrum compaction are conserved in the drug resistance state, both within and between different types of PIs. The observation of conserved lateral expansion provides additional support for investigating alternative drug-targeting sites rather than the active site, as done in55. The results may be hinting at (1) conserved mechanistic ripple effects emanating from certain similarities in PI drug design, which possibly hints at how crucial these preliminary early movements are in leading to a less-favorable drug positioning within the active site, or (2) a well-conserved pair of local motions associated with drug resistance lying underneath the complexity of DRM patterns.

Analysis of the backbone motions hinted that there is no single angular trajectory leading to resistance, even for the same sequence. Knowledge of characteristic motions around similar energy wells may be an interesting and inexpensive route for supplementing extant drug resistance prediction approaches in HIV subtype B. Given phenotypically-labelled protease sequences from other subtypes, a similar experimental design may prove to be quite useful in extracting conserved local motions. Additionally, this approach could theoretically extend to proteases harbouring indel mutations by selecting homologous residues after simulation, shedding more light on sequences that are more divergent from the consensus B subtype.

We have used MD simulations coupled with network centrality measures to identify common structural features in drug-resistant mutations of HIV protease. As opposed to the conventional ways of constructing residue contact networks using distance cut-offs, we used statistical tests, thus mitigating the known effect of edge discontinuity34 which may arise when pairwise distances are very close to, but not bound by, the chosen cut-off distance. To our knowledge, this method is novel, although elastic network models were used to determine the functional effects of variants in other proteins56. While the Anisotropic (ANM) and Gaussian Network Models (GNM) are based on the application of Hooke’s potential on a single structure with a uniform spring constant, our method is based on the more thorough Newtonian mechanical simulation employing an all-atom forcefield to analyse a large number of independent observations. We expect that our method will be highly useful in other cases for analysing protein structural variations. For instance, one could use a subset of validated antimalarial drug targets from artemisinin-resistant variants and another batch of sequences for artemisinin-susceptible variants and extract subtle motions hidden within the protein dynamics.

Methods

Dataset preparation

HIV subtype B protease sequence variants labelled with fold drug resistance ratios were obtained from the Stanford HIVdb unfiltered dataset19. These were reconstituted and filtered as explained in20. After ranking the sequences based on decreasing average distance for each of the 8 PIs, 100 highly-resistant and 100 hyper-susceptible sequences were short-listed, using cut-offs defined in57. These two classes of sequences are henceforth referred as to the resistant and the susceptible ensembles respectively. Sequences are provided in Supplementary Dataset S6. Pandas 0.21.058 was used for dataset storage and manipulation. Seaborn 0.7.1 and matplotlib 2.1.059 were used for plotting.

Homology modelling

Modeller (version 9.16) was used to model each of the protein sequences in their closed receptor conformation. The main criteria for choosing the templates, in order of selection were (1) presence of the PI complexed within the a closed conformation receptor active and (2) high resolution of the crystal structures. These two characteristics were determined to be important in giving a good starting point for observing comparable changes within short dynamics simulations. High resolution (<1.55 Å) crystal structure templates were thus retrieved for each of the 8 available PIs from the HIVdb dataset (PDB accessions: 3NU360, 3EL961, 2HS162, 2AVO63, 2O4S64, 3EL561, 2NMZ65 and 3SPK66). Very slow refinement was used, with a random seed set at −10000, while model quality was assessed using z-DOPE scores. As a preparation for molecular docking, template crystal structures were systematically preprocessed to only retain high-occupancy side-chain rotamers. The last rotamers for each concerned residue were kept in cases of equal occupancies. Interfacial water was retained from each template crystal structure by choosing any water molecule shared between the ligand and receptor flaps (ILE50 from chains A and B), at an intersecting distance of 3 Å, except for the case of TPV, which does not require such for stability.

Ligand docking

Flexible ligand docking was performed using AutoDock Vina (version 1.1.2)33 to place each PI in its respective receptor variant. The docking center (20.147, 29.716, 16.093) was picked from a saquinavir atom from template 2NMZ subsequently used as reference to align the totality of the homology models using ProDy67. Receptors were protonated to pH7 using PDB2PQR (version 2.1.0)68 using the PROPKA method before merging non-polar hydrogen atoms and assigning Gasteiger partial charges using the prepare_receptor4.py tool from AutoDockTools (ADT)69, whilst having interfacial water present. Ligands were fully-protonated using ADT’s prepare_ligand4.py tool. A grid box size of dimensions 20 × 26 × 20 Å3 and an exhaustiveness value of 16 was chosen for ligand docking at the designated grid center.

Molecular dynamics

The previously-protonated receptors were used, while parameters for the docked ligand poses were determined using ACPYPE70 after full protonation using VEGA (version 3.1.1)71. All 8 × 200 complexes were prepared for molecular dynamics using GROMACS (version 2016.1)72. The AMBER03 forcefield was used with a short-range non-bonded interaction cut-off distance of 1.2 nm. Long-range electrostatics were handled using the smooth Particle Mesh Ewald algorithm. Energy-minimization was performed using the method of steepest descent after neutralizing charges using 0.15 M sodium chloride in SPC-modeled water within a triclinic periodic box. A 50 ps temperature equilibration (at 310 K) was followed by 50 ps of pressure equilibration (1 atm) with time steps of 2 fs and finally a 2 ns production MD was performed at the same temperature, pressure and time step. All MD runs were distributed over a 2400-core queue with 24 cores per job using GNU Parallel (version 20160422)73, managed by the PBS Professional scheduler over the lengau cluster (Centre for High Performance Computing (CHPC)).

Trajectory analysis

After generating MD trajectories, the proteins were centered and rotations/ translations were removed using the trjconv command in GROMACS. RMSD values were first evaluated to detect any failure in correcting periodic boundary conditions. These plots identified an initial period of fluctuation spanning the first 100 ps, which were dropped from any subsequent analysis. Rg values were calculated to have an overview of the levels of compaction observed in the resistance ensemble compared to susceptible ensemble for each drug investigated. Thereafter, local analyses were performed: (a) Welsch t-tests were evaluated over pairwise residue distances across the ensembles. To do so, pairwise Cβ (and Cα for glycine) atom distances from each trajectory were time-averaged within each ensemble. For each drug, each pairwise residue distance was aggregated into separate two-dimensional arrays - one for each ensemble. The t-tests were then performed between each analogous array at a 99% confidence level. (b) Similarly, the time-averaged angles between Cα residue triplets were computed for each complex within an ensemble and compared against the analogous array of time-averaged angles in the other ensemble using t-tests. Only those angles corresponding to the negative logarithm (base 10) p-value being above 2.5 standard deviations were retained for either of the larger or smaller angles in the resistance ensembles. Bonferroni correction was applied in both approaches to correct for multiple testing and reduce chances of false positives. Finally, the angles found to be significant for each drug were clustered by average linkage from the matrix of pairwise Euclidean distances. The MDTraj library (version 1.9.1)74 was used in Python 3.5 for trajectory distance and angle calculations. Numpy 1.13.3 and scipy 1.0.075 were used for general computations and statistical tests respectively.

Network analysis

Network graphs were built from nodes corresponding to Cβ (or glycine Cα) atoms. Edges were obtained from significant p-values obtained from independent t-tests performed on arrays of time-averaged pairwise residue distances. In other words, each time-averaged pairwise distance 〈Dij〉 for a given protein concatenated to those of other proteins within the ensemble. Each array of 〈Dij〉 values is then compared to its corresponding position in the other ensemble of \(\langle {D^{\prime} }_{ij}\rangle \) values using 2 sample t-tests. In order to expose more information, one-tailed tests were performed to determine whether distances are larger or smaller between the resistance ensembles. Same method was applied for all drugs. Finally the node degree centralities were calculated and the top 5 most central nodes for both higher and lower distances were shown as text labels for each drug. Network construction and analysis were performed using the NetworkX library (version 1.11)76. Edge mappings onto protein structures were generated using the NGLview library (version 1.0)77.