The free energy landscape of the oncogene protein E7 of human papillomavirus type 16 reveals a complex interplay between ordered and disordered regions

When present, structural disorder makes it very challenging to characterise the conformational properties of proteins. This is particularly the case of proteins, such as the oncogene protein E7 of human papillomavirus type 16, which contain both ordered and disordered domains, and that can populate monomeric and oligomeric states under physiological conditions. Nuclear magnetic resonance (NMR) spectroscopy is emerging as a powerful method to study these complex systems, most notably in combination with molecular dynamics simulations. Here we use NMR chemical shifts and residual dipolar couplings as structural restraints in replica-averaged molecular dynamics simulations to determine the free energy landscape of E7. This landscape reveals a complex interplay between a folded but highly dynamical C-terminal domain and a disordered N-terminal domain that forms transient secondary and tertiary structures, as well as an equilibrium between a high-populated (98%) dimeric state and a low-populated (2%) monomeric state. These results provide compelling evidence of the complex conformational heterogeneity associated with the behaviour and interactions of this disordered protein associated with disease.

The C-terminal half of E7 (E7C) contains another conserved region, CR3 (residues , that binds one zinc ion (Zn 2+ ) by two highly conserved CXXC motifs coordinated in a finger-type arrangement 10 . The structure and length of the E7 zinc-finger is unique and not shared by any cellular protein with zinc-finger DNA-binding domains 10,18 . To date, structural characterisations by X-ray crystallography of CR3 have only been achieved using short constructs of the HPV-1a 19 and HPV-45 variants 10 . In both structures, CR3 adopts a folded three-dimensional conformation with a relative arrangement of hydrophobic surfaces that facilitates its dimerization at native conditions.
To contribute to the understanding of the structural mechanisms associated with HPV-related diseases, high-resolution information of the full-length protein expressed by the most high-risk variant are necessary but the entire protein failed to produce X-ray crystals due to its heterogeneous structure. The use of NMR spectroscopy, instead, offers particularly favourable opportunities for the characterisation of proteins that populate simultaneously both ordered and disordered states [20][21][22][23] . Recently, an NMR characterisation of E7 has been reported with a nearly complete sequence-specific assignment of 15 N, 1 H N , 13 C α , 13 C′ NMR chemical shifts 12,24 . The NMR chemical shifts together with 15 N relaxation measurements confirmed a heterogeneous structural and dynamical behaviour, with a highly flexible E7N and a more structured E7C. Despite the latter domain appeared to be folded, the low concentration of the sample required to prevent protein aggregation 12,24 and the heterogeneous intensity of the NMR lines prevented the obtainment of its solution structure. Indeed, the analysis of NOESY experiments revealed only intra-residue and sequential correlations, insufficient to validate a structural model.
In this study we provide a detailed structural description of the full-length E7 in terms of a structural ensemble at nearly atomic resolution. The structural ensemble was determined using all-atom replica averaged metadynamics (RAM) simulations restrained with experimental measurements, a method that combines the sampling efficiency of metadynamics with the on-the-fly modification of the force field with experimental measurements 25 . For this purpose, we have additionally measured HN residual dipolar couplings (RDCs) and complemented them with the previously assigned 15 N, 1 H N , 13 C α , 13 C′ and 13 C β NMR chemical shifts (BMRB entries 19442 and 26069) as experimental restraints. NMR chemical shifts and RDCs are simple population-weighted observables averaged over all conformers that interconvert on timescales faster than the millisecond. As such, they can provide information about the dynamics of structural fluctuations of proteins enabling the determination of the structural ensembles that they populate 20 . The combined use of these experimental data enabled us to obtain a converged sampling in the metadynamics simulation and to characterise major structural features of the high-risk HPV-16 E7 in the context of the full-length protein.

Results
The free energy landscape of E7. In addition to previously reported assignments of the backbone 15 N, 1 H N , 13 C α , 13 C′ and 13 C β NMR chemical shifts 12,24 , here we measured RDCs for the backbone amide groups (HN) of E7 under conditions of fractional protein alignment (see Materials and Methods). The HN signals of the residues of E7C in the full-length construct were too weak and broad to provide measurable RDCs. Instead, we were able to determine 31 HN RDC values for the residues of E7N, that were subsequently used in the structure calculations for the full length E7 (Table S1).
To extract information about the structure and dynamics of E7 provided by the measured chemical shifts and RDCs, we incorporated them as replica-averaged structural restraints in molecular dynamics simulations, using the replica-averaged metadynamics (RAM) method 25,26 (see Materials and Methods). In this approach, the sampling of the conformational space is carried out in an enhanced manner using the metadynamics method, and the force field is modified on-the-fly using experimental data as replica-averaged structural restraints 25,26 ; as a result an ensemble of conformations is generated that is consistent with the maximum entropy principle [27][28][29][30] . In this view, the generated ensemble is the most probable one given both the force field and the experimental data used. The RAM method has been previously shown to accurately characterise the free energy landscape of folded proteins 31,32 and, what is more challenging, the free energy landscape of denatured states 26,33,34 and intrinsically disordered proteins [35][36][37] .
Application of the RAM method to full-length E7 provides the description of the structure and dynamics at a fraction of computational cost needed for the standard molecular dynamics simulations to reach convergence 25 . After a total simulation time of 2.4 μs, we obtained a converged sampling resulting in a free energy landscape within a statistical uncertainty of less than 2 kJ/mol (Fig. 1). The structures in the final ensemble cover a broad range of values of the collective variables (CVs, see Materials and Methods) used in the simulations ( Figure S1): the number of transiently formed antiparallel β-strand segments in E7N of both monomers, between 0 and 4 three-residue segments; the sum of backbone dihedral angles in E7N of both monomers, between 60 rad and 120 rad, and the radius of gyration of the system, between 2.0 nm and 7.5 nm, with values above 4.3 nm corresponding to the dissociation of the dimer into two monomers. The effects of the chemical shift and RDC restraints on the underlined Amber03w force field 38 were very small during the simulations, with an average energy of 190 kJ/mol per replica, a value about 4 orders of magnitude smaller than the average total energy of 1.43 × 10 6 kJ/ mol per replica. These results illustrate how the subtle but highly quantitative corrections to the force field introduced by the replica-averaged experimental restraints induce significant changes in the structural ensemble because of the weakness and transient nature of the interactions between different regions of disordered proteins. The changes to the E7 ensemble are illustrated by comparing the RAM free energy landscape ( Fig. 1) with the free energy landscape obtained using the same procedure, but without experimental restraints ( Figure S2).
The E7 ensemble reveals an equilibrium between monomers and dimers. The RAM ensemble of E7, as described above, can be visualised by its free energy landscape plotted as a function of the radius of gyration and the number of transiently formed β-strand segments in the E7N domain (Fig. 1). A clear feature that emerges from this free energy landscape is that E7 is characterised by two well-separated basins, corresponding to a monomeric state and a dimeric state. The most populated basin corresponds to dimeric conformations with two subunits interacting mainly through the solvent-exposed hydrophobic and polar residues in E7C, homologous to E7 proteins from HPV-1a 19 and HPV-45 10 variants. The low populated basin corresponds to two dissociated monomers. The dimeric basin appears in the free energy landscape with a free energy of 10 ± 2 kJ/mol lower than the basin with two dissociated monomers. The two basins have therefore statistical weights of 98% and 2%, respectively, derived from the Boltzmann distribution at room temperature. The presence of a monomeric state is unlikely to be induced by the force field used in the simulations, as the population of the monomeric state is much lower in the unrestrained ensemble, with the dimeric state having a free energy by 14 ± 2 kJ/mol lower than the monomeric state ( Figure S2). Another characteristic that can be extracted from the free energy landscape of E7 ( Fig. 1) is that within each of the two basins there are several local minima, which is a feature typical of free energy landscapes of intrinsically disorder proteins 26,39 . This conclusion is confirmed by the broad distribution of the root mean square distances between Cα atoms (Cα-rmsd) calculated among all pairs of E7 monomers in both monomeric and dimeric states (15 ± 3 Å).
Characterisation of the structure and dynamics of the ordered E7C domain. Close inspection of the E7C domain in the calculated structural ensemble reveals a globular domain with a β1β2α1β3α2 secondary structure connectivity (Fig. 2a), homologous to structures of E7C from the HPV-1a 19 and HPV-45 10 variants. A main antiparallel β-sheet is formed by the β1 (residues 52-57) and β2 (residues 63-68) strands, which pack against the main α1 helix (residues 73-84), and the short β3 strand (88-91) that extends into right-handed α2 helix (residues 92-95) (Fig. 2a). The folded structure is connected to CR2 with an 11-residue long coiled structure (residues 41-51), which we refer to as the 'linker' region. The CXXC motifs coordinating the Zn 2+ ions are located in the turn connecting β1 and β2 strands and in the short C-terminal α2 helix (Fig. 2a). Zn 2+ is tetrahedrally coordinated to the Cys residues (Cys59, Cys61, Cys91 and Cys94) and this geometry is maintained in all structures.
To characterise in more detail the conformational heterogeneity of the E7C domain, we calculated the root mean square fluctuations (rmsf) of the backbone positions in the RAM ensemble by only considering the E7C domain (Fig. 2b). Rmsf values were averaged for each residue and showed, as expected, the largest values for the 11-residue linker region. This linker in the RAM ensemble populates random coil and polyproline II (PPII) conformations (total populations of 79% and 21%, respectively), which is matching closely the values calculated from the experimental chemical shifts using the δ2D method 40 (Fig. 3). The presence of PPII conformations in this linker is confirmed in the Ramachandran map extracted from the RAM ensemble (Fig. 2c), where several residues including Ala42, Ala45, Glu46, Asp48, His51 and Tyr52 (beside Pro41 and Pro47) have a certain population of ψ/ϕ angles in the PPII-defined region of the conformational space 41 .

Figure 1.
Free energy landscape of HPV-16 E7 representing the monomer-dimer equilibrium. The free energy landscape from the RAM ensemble is plotted as a function of the radius of gyration of the dimeric system (x-axis) and the number of 3-residue segments with antiparallel β-strand content in the disordered E7N domain of monomer 1 (y-axis). Representative conformations of E7 are shown in light blue; in the dimer state, the second monomer is shown in green. The antiparallel β-strand content in the disordered E7N domain is generally low and mainly comprises of short β-bridges of less than 3 residues in length. Radius of gyration values above 4.3 nm stem from the dissociation of the dimer into two monomers and represent various conformations of the isolated monomers. Representative states of both the monomer and the dimer are associated with different regions of the free energy landscape.
www.nature.com/scientificreports www.nature.com/scientificreports/ The rmsf profile for the folded part of the E7C domain shows larger values than those typically observed for folded domains 42 (Fig. 2b). This feature is also reflected in the secondary structural population of the E7C domain (Fig. 3). In particular, the antiparallel β-sheet can be considered as being marginally stable with its constituent β strands showing significant populations of random coil conformations (β1 up to 60%, β2 up to 35% and β3 up to 20%). This finding was expected from the secondary structural populations extracted from the experimental chemical shifts 12,24 and further confirmed by the average structure calculated from the RAM ensemble (Fig. 3,  Secondary structure populations of E7 derived from backbone 15 N, 13 C′, 13 C α , 13 C β , 1 H N chemical shifts using the δ2D method (bars). For comparison, secondary structure populations are also obtained using an alternative method, by back-calculating them from the RAM ensemble; they are depicted by the dotted blue (α-helix), red (β-sheet) and green (PPII) lines (the remaining population, which is random coil, is not depicted here for clarity). The stretches with missing bars are due to missing chemical shift assignments in the NMR spectra and mainly concern the folded part of the E7C domain. The average structure of the E7C domain calculated from the RAM ensemble is depicted in the inset and shows partial population of β strands, in agreement with the secondary structural population. (2019) 9:5822 | https://doi.org/10.1038/s41598-019-41925-4 www.nature.com/scientificreports www.nature.com/scientificreports/ inset). The dynamic nature of the E7C domain was previously inferred based on the relatively high 15 N transverse relaxation rates (R 2 ) measured for this domain 12 .
Structure and dynamics of the disordered E7N domain. Close inspection of the structures of the E7N domain in the RAM ensemble reveals the presence of high flexibility and substantial disorder. Most of the residues of this domain form a random coil structure with a total population of 66 ± 5% (Fig. 3). In addition to the coiled conformation, the residues in the E7N domain show a propensity to form PPII conformations, which account for 26 ± 5% of the total secondary structure populations (Fig. 3). The presence of PPII regions in the RAM ensemble is also confirmed in the Ramachandran maps of residues 1-40 (Fig. 6a), in which the PPII region is occupied, at least transiently, by the majority of residues. The PPII motif is especially present in the region with unusual amount of acidic side chains that stretches over residues 30 to 39 (total population of 39 ± 5%, Fig. 3), in agreement with CD measurements 9 . α-helices in the RAM ensemble are mainly present as 3 10 helices of short length at the N-terminal of the E7N domain and their total population is below the standard error of δ2D (below 5%). The total population of β-sheet is little above the standard error of δ2D (population of 7 ± 5%), and is mainly located at the N-terminal end of the structure. In order to understand the level of compaction of E7N, we then analysed the tertiary contacts present in the contact map of the RAM ensemble (Figs 4 and 5). From this analysis it is evident that the E7N domain mainly exists in extended conformations. Transient contacts are only formed between N-terminal residues and the rest  Figure S3). Interestingly, residue H51 is placed at the exit of the folded part of E7C and makes both inter-and intra-molecular contacts (Fig. 5). Intermolecular contacts are made with the small helix α2 (Val90, Cys94) and β1-β2 loop Lys60) of E7C (top). This same residue makes intramolecular contacts with Tyr23 (lower right).
www.nature.com/scientificreports www.nature.com/scientificreports/ of E7N and between residues Tyr23/Tyr25 in E7N and His51/His73 in E7C. The former contacts are formed within the flexible N-terminal end of E7N that tends to fold back and form transient contact with the rest of E7N (Fig. 4, lower left). The latter contacts are formed between the E7N and E7C domains and stem from ring stacking Figure 5. The linker region in E7 can form both intra-and inter-molecular contacts. We illustrate this conformational heterogeneity by presenting two structures in which His51 forms intra-molecular contacts with Tyr23 in the monomer (left structure) and inter-molecular contacts with Lys60, Val90 and Cys94 in the dimer (right structure). www.nature.com/scientificreports www.nature.com/scientificreports/ between Tyr23, Tyr25, His51 and His73 (Fig. 4). Interestingly, it was previously shown that residues in this region (residues 24-26) have elevated 15 N R 2 rates in comparison to the rest of the E7N domain 12 and this could be a consequence of the exchange process on the NMR time scale. These tertiary contacts are present in 27% of structures in the ensemble, and hence make the LXCXE motif (LYCYE in HPV-16, residues 22-25) not always fully accessible to bind pRb. When it comes to secondary structure, the LXCXE motif is in the extended conformation with a clear propensity for random coil (population of 74 ± 5%) and PPII (population of 21 ± 5%).
Unlike the LXCXE motif, the CKII phosphorylation site composed of S31-S32 is in the extended conformation in all structures of the RAM ensemble. These two residues are part of the acidic-rich stretch (residues 30-39, Fig. 6b), whose extended conformation is stabilised by electrostatic repulsion of the negative side chains. When we calculate electrostatic potential 43,44 of the whole HPV-16 E7 monomer, some interesting features emerge ( Figure S4). The most obvious feature is the overall negative potential coming from HPV-16 E7 with smaller regions of positive potential. This large negative potential likely plays a significant role in the conformation of HPV-16 E7; it keeps E7N domain extended and not bound to the folded E7C. Qualitative examination of the electrostatic potential shows largest values for residues 30 Validation of the RAM ensemble. The E7 ensemble was first validated by back-calculating NMR chemical shifts using the Sparta + predictor, which was not used during the RAM simulation 45,46 , and by comparing the obtained values to the experimentally measured chemical shifts. The RMSDs for each nucleus type were: 1. Secondary structure populations in the RAM ensemble were validated using chemical shift-based secondary structure predictor δ2D that calculates secondary structural population from the measured chemical shifts (Fig. 3,  bars). These values were compared to the secondary structure population calculated from the RAM ensemble using DSSP 47 and a PPII definition reported recently 41 (Fig. 3, dashed line). We found high coefficient of correlations for the α-helical content (0.93, p < 0.001), for the β-sheet content (0.77, p < 0.001) and for the PPII content (0.81, p < 0.001), which confirms the convergence of the RAM simulations. Overall, the differences in secondary structural populations (Fig. 3) are within the uncertainties of the methods used to calculate them (standard errors of both δ2D and DSSP are about 5%).
The E7 ensemble was then cross-validated against the experimentally measured HN RDCs used as restraints in the simulation. In this procedure, different sets of experimentally measured RDCs were used for tensor calculation from the ensemble and for back calculation of RDCs. The level of agreement was very high, with an average value of Q factor of 0.27 ( Figure S5). This agreement is significantly better than that for the unrestrained ensemble (Q factor of 0.95).
The ensemble was verified also against the experimentally determined Stokes radius for the E7N (1-40) peptide 9 . When the radius of gyration of residues 1-40 was back-calculated from the RAM ensemble, it produced an overall average value of R g = 1.70 ± 0.34 nm. This value corresponds to a Stokes radius of 2.04 ± 0.40 nm, which is consistent with the measured experimental values of 2.09 nm and 2.13 nm 9 .
Finally, the ensemble was validated by recording small-angle-X-ray scattering (SAXS) data on the entire E7 protein at increasing protein concentration values. According to the Guinier approximation 48 the scattering intensity plotted as ln I(s) versus the squared momentum transfer is a linear function, were R g is the slope of the linear fit. Our data on full-length E7 (Figure S6), provide a R g = 3.34 ± 0.05 nm, a value that matches the average value of the dimeric state provided by the energy landscape, R g = 3.16 nm. Furthermore, the SAXS profile back-calculated and averaged over from the RAM ensemble using Crysol 49 was highly consistent with the experimental profile (χ 2 = 0.83) (Fig. 7). The average SAXS profile back-calculated from the unrestrained ensemble using Crysol showed no compatibility with the experiment (χ 2 = 3.56) (Fig. 7).

Discussion
Although conformationally heterogeneous states of proteins are involved in a wide range of biological processes, a detailed knowledge of their structure and dynamics is still very challenging to obtain. This is also the case for the oncogenic protein E7 of HPV-16, which is formed by the mainly globular E7C domain and the disordered E7N domain. Indeed, it has been shown crucial to obtain atomic information of E7 in its full form, as different components of the protein appear to have complementary roles when interacting with cellular partners and shorter constructs display lower affinities 4,16,50 . Therefore, the determination of structure and dynamics of the full-length E7 obtained in this study, including the presence of equilibrium between a monomeric state and a dimeric state, offers an opportunity to uncover the presence and functional relevance of its conformationally heterogeneous states.
We have obtained here a structural ensemble representing the monomer/dimer equilibrium of full-length E7 using NMR chemical shifts complemented with HN RDCs measured for the purpose of this study. The experimental measurements were incorporated as replica-averaged structural restraints in molecular dynamics simulations, using the RAM method, and produced an ensemble of conformations according to the maximum entropy principle 27,28 . The free energy landscape of E7 shows that the dimer conformation is dominant, and that it is stabilised by 10 ± 2 kJ/mol in comparison to the monomeric conformation. The results are in agreement with structural evidence available on the construct of E7C from different viral types that showed the presence of the dimeric state at neutral pH 10,19 . Moreover, the analysis of 15 N relaxation rates of E7 from HPV-16 in its full form www.nature.com/scientificreports www.nature.com/scientificreports/ also suggests that E7C is affected by conformational exchange 12,24 . The existence of the predominant dimeric form in solution was further confirmed by SAXS data on the full-length protein.
Although there is limited atomic resolution information for the E7N domain for any HPV virus variant, this domain has been broadly characterised as disordered 12,13,51 . Therefore, one of our aims here was to give a more detailed picture of the E7N domain structure in order to better understand its high promiscuity in the cell 13 . The structural ensemble that we have reported here confirms the overall high conformational heterogeneity of this domain, which can adopt both compact and extended conformations and is characterised with high percentage of random coil population. However, the structural ensemble also demonstrates that the E7N domain does not behave as a random coil with no preferred backbone conformation. Rather, this domain tends to adopt a PPII conformation along the whole sequence with a total population of 26%. Interestingly, 23 consecutive residues stretching between P17 and I39 have a significant population of ϕ/ψ angles characteristic of PPII in the Ramachandran plot. This extended conformation is caused by highly negative electrostatic potential surface of the E7 monomer with a net charge of −16.
Apart to the stability contribution, the extended conformations might have a specific functional significance, as they can promote binding to many different interaction partners by increasing accessibility of its interaction motifs. Indeed, the acidic-rich stretch of residues 30-39 have the highest tendency to populate PPII (39 ± 5%), which is stabilized by electrostatic repulsion of its negatively charged side chains. Residues S31 and S32 are positioned in this acidic-rich region, which make them accessible to phosphorylation by CKII that in turn further increases the PPII content of the E7N domain 9 . Moreover, keeping E7N and E7C apart might be an important feature to render them independent one from the other and fully accessible, but still have complementary roles when interacting with cellular partners positioned next to each other.

Conclusions
The atomic resolution structural ensemble of E7 from HPV-16 reported here reveals very heterogeneous structural and dynamical properties that go beyond the current schematic view of a disordered E7N domain and an ordered E7C domain, and involve an equilibrium between a low populated monomeric state and a high populated dimeric state. The availability of this structural ensemble opens the possibility of atomic resolution characterisations of post-translational modifications and interactions with many partners known to bind different motifs in E7 of HPV-16. As applications of the RAM simulations restrained with experimental measurements, as well as other complementary methods for characterising protein structural ensembles 23,39 , continues to grow so too will our understanding of how protein dynamics control function. These developments will offer an increasingly detailed understanding of structure-dynamics-function relationship in HPV-16 E7, and of intrinsically disordered proteins more in general. www.nature.com/scientificreports www.nature.com/scientificreports/

Materials and Methods
NMR measurements of HN RDCs of E7N. 15 N labelled E7 from HPV-16 was expressed and purified as previously reported 12 . A sample of 150 μM 15 N labelled E7 in 50 mM HEPES, 1 mM DTT, 50 mM KCl at pH 7.5 was used for the measurement of backbone HN RDCs. 1 H-15 N backbone splittings were measured through the IPAP approach 52 . The in-phase and antiphase components of the 15 N signal in indirect dimension of 2D 1 H-15 N correlation spectra were acquired and stored separately. By appropriate linear combinations the two doublet components were separated and used to determine the splitting in the 15 N indirect dimension. For each cross peak, traces in indirect dimension were extracted. The 1 H-15 N splittings were determined by shifting one trace respect to the other. The 2D 1 H-15 N NMR experiments were acquired with 4 scans per increment (32 dummy scans), with a relaxation delay of 1.3 s and with 3072 × 2048 data points and 16.2 × 36 ppm in the 1 H direct and 15 N indirect dimensions, respectively. The experiments were acquired on the 15 N labelled E7 sample before and after addition of the orienting medium. The difference between the 1 H-15 N splittings provides the HN RDCs, with an estimated error of ±0.1 Hz 52 .
A mixture of C5E8 and octanol, which forms a dilute liquid crystalline phase, was used as orienting medium 53 . In particular, 16 μl of C 5 E 8 and 7 μl of octanol was added to a 550 μl E7 sample (3%, R = 0.87, 2 H splitting of 15 Hz). RDCs values could only be determined for residues belonging to the E7 N-terminal part (31 residues, Table S1) as signals for E7C were generally weak and broad, and in part not assigned. Molecular dynamics simulations. Molecular dynamics simulations of E7 were performed using the Amber03w force field 54 and TIP4P/2005 water model 55 , which are particularly suitable for studying unfolded states and intrinsically disordered proteins over a wide range of thermodynamic conditions 38 . All simulations were run using GROMACS 4.5 56 modified with PLUMED2 57 .
A starting model for HPV-16 E7 dimer was generated using homology modelling from the amino acid sequence (UniProt code P03129) and the NMR solution structure of HPV-45 E7 dimer (PDB ID: 2EWL, 65% sequence identity) used as a template for E7C. The homology modelling was performed using Rosetta software 58 implemented on the Robetta server. The starting structure was solvated with 34,682 water molecules and neutralized with 32 Na + ions in a water box of 1070 nm 3 of volume. A high-temperature (450 K) 30 ns preliminary simulation was used to select six starting conformations for each of six replicas (see below). Each conformation was then subsequently relaxed at 300 K for 10 ns. A time step of 2 fs was used together with LINCS constraints for all simulations 59 . Van der Waals interactions were implemented with a cut-off at 0.9 nm, and long-range electrostatic effects were treated with the particle mesh Ewald method 60 .
The RAM simulations were carried out by combining two advanced sampling methods, replica exchange 61 and metadynamics 62 . First, replica exchange is particularly effective in overcoming the multiple minima problem on a rugged free energy surface through the exchange of conformations between multiple replicas. Second, metadynamics efficiently computes free energies and explores the reaction pathways in the space of specific functions of atomic coordinates, called collective variables (CVs) 62 . After performing initial tests, we carried out RAM simulations using six replicas (five replicas biased by five CVs and one unbiased replica) that showed sufficient to recover E7 dynamics with high efficiency [25][26][27]63 . The following five CVs have been employed: the number of 3-residue segments with antiparallel β-strand content of E7N monomer 1 (i) and monomer 2 (ii), the sum of backbone dihedral angles of E7N monomer 1 (iii) and monomer 2 (iv), and the radius of gyration of the whole protein E7 (v). Using the RAM scheme, six replicas were simulated in parallel at 300 K with experimental restraints applied on the average value of back-calculated 15 N, 1 H N , 13 C α , 13 C′ and 13 C β NMR chemical shifts and 1 H N -15 N RDCs using a previously detailed procedure 32,63 . All replicas were simulated in the canonical ensemble at constant volume and by thermosetting the system with the velocity rescaling thermostat 64 . Each replica evolved for 400 ns (2.4 μs of simulation time in total), with exchange trials every 50 ps.
The convergence of the sampling was assessed by monitoring the differences of the free energies at increasing simulation length during the simulations. After the first 350 ns per replica, the free energy landscapes were stable within 2 kJ/mol, suggesting that all the relevant minima in the landscape have been found ( Figure S1). The free energy landscape of the full length E7 was reconstructed using a standard weighted histogram analysis 65 (Fig. 1) as a function of two CVs. The final ensemble consists of 2450 conformations of the full length E7.

SAXS measurements of HPV-16 E7.
The SAXS experiments were performed at the EMBL P12 beamline, DESY, Hamburg, Germany 66 , at E7 protein concentrations 1, 2 and 4 mg/ml in 50 mM HEPES, 1 mM DTT, 50 mM KCl, pH 7.5 at 298 K. The data were recorded using a Pilatus 2 M detector (DECTRIS, Switzerland) with 20 frames of 0.05 s exposure time, at a sample-detector distance 3.00 m and wavelength λ = 0.124 nm. No measurable radiation damage was detected by comparison of successive time frames. The data were processed with the ATSAS package 67 using standard procedures, corrected for buffer contribution, and extrapolated to infinite dilution. The radius of gyration was computed using the Guinier approximation 48 : where I(s) is the measured intensity, I 0 s is the forwarded scattering, R g is the radius of gyration and s the squared momentum transfer. The momentum transfer s is defined by s = 4πsinθ/λ, where λ is the X-ray wavelength (0.124 nm) and 2θ is the scattering angle. The goodness of fit to the experimental SAXS profile was determined using the discrepancy defines as , where µ i and σ i are mean and variance of the measured scattering intensities.