Globular proteins contain cavities/voids that play specific roles in controlling protein function. Elongated cavities provide migration channels for the transport of ions and small molecules to the active center of a protein or enzyme. Using Monte Carlo and Molecular Dynamics on fully atomistic protein/water models, a new computational methodology is introduced that takes into account the protein's dynamic structure and maps all the cavities in and on the surface. To demonstrate its utility, the methodology is applied to study cavity structure in myoglobin and five of its mutants. Computed cavity and channel size distributions reveal significant differences relative to the wild type myoglobin. Computer visualization of the channels leading to the heme center indicates restricted ligand access for the mutants consistent with the existing interpretations. The new methodology provides a quantitative measure of cavity structure and distributions and can become a valuable tool for the structural characterization of proteins.
According to the philosophy of the Latin poet Lucretius, “all things are made up of matter and void, without which atoms could not move”1. Scientists employ the same distinction today to help explain aspects of polymer chain mobility and the transport of small molecules through polymers. Numerous globular proteins are known to possess permanent cavities and channels2. Some are polar and filled with water while others are lined mostly by hydrophobic residues and are empty or are only partially occupied. The dynamics of the internal cavities and channels of a protein have evolutionary value in ligand diffusion, drug delivery, and the control of reactivity3. For example, the internal cavities in myoglobin (MB) are the docking sites transiently occupied by the ligands O2, CO, and NO during their trajectory through the protein, and define a pathway for migration to and from the heme, which is deeply buried. In addition, a cavity containing a molecule can catalyze a chemical process, as illustrated by the mechanism of the reaction of NO with oxyhemoglobin or oxymyoglobin4.
Because the time evolution of cavity creation and destruction inside a protein links protein structure, dynamics, and function, it has attracted much attention in the scientific community5,6,7. Computer simulations have been widely used to study the properties of these cavities. Molecular Dynamics (MD) simulations that treat water explicitly and account for protein fluctuations serve as the method of choice for computational exploration of ligand diffusion pathways and cavity dynamics3,8,9,10,11,12,13,14,15,16,17,18,19,20. However, the MD simulation timescale is generally shorter than the timescale of the ligand diffusion process, and computations are expensive even for a single realization of a ligand escape event. Because the process is stochastic, a single realization is in most cases not enough to compare theory with experiment, especially when kinetics and thermodynamics are of concern. The kinetics and thermodynamics of the process have been studied with enhanced simulation techniques such as Locally Enhanced Sampling11, Umbrella Sampling8, free energy perturbation12, metadynamics18, potential of mean force calculations10 and, recently, the string method18. Despite great progress in these methodologies, a need exists for a method that is fast, takes into account protein conformational changes caused by thermal fluctuations and solvent interactions, and maps out cavity distributions and all possible diffusion pathways. Herein we present a new method that combines the Cavity Energetic Sizing Algorithm (CESA)21,22, which identifies cavities within a dynamic protein, with a new Surface Atom Characterization Algorithm (SACA) (see Figure 1), which finds the cavities on the exterior.
The presence of hydrophobic cavities inside MB has been known for a long time23, yet only recently has there been experimental evidence for ligand pathways24. Such possible routes were suggested already in the pioneering simulations of Elber and Karplus11 and confirmed in more recent atomistic MD simulations10. Some of these studies have shown evidence for a single major pathway for ligand migration that would require movement of the distal histidine amino acid (“His gate”)25,26,27,28,29. However, other investigators have suggested the existence of multiple pathways for ligand diffusion in myoglobin13,14,30. Whether a “single path” or “multiple paths”, the ligand needs entrances on the protein surface to reach the heme center.
To elucidate how ligands migrate through MB, tryptophan (Trp) exchanges have been used widely by experimental groups15,28,31. Single point mutations at the His64 gate, in the distal pocket near the iron atom, and in the Xe4 and Xe1 cavities exhibit dramatic changes in ligand binding kinetics when compared to the wild type28,31. What is the effect of mutations on the cavity structure? How do cavity sizes and distributions change with mutations? Is there any correlation between the cavity distributions and measured experimental observables?
To investigate the effect of Trp exchanges on cavity distributions and ligand migration pathways, we have studied wild type myoglobin (MB) and five of its mutants (H64W, L29W, V68W, I107W, and L104W). Our approach allows us to map out all cavities in the protein interior and exterior. Our results show that the mutations change the structure of the ligand migration pathways and the number of exterior cavities, and they elucidate for the first time the ligand migration pathways and cavity distributions of the mutants in atomic detail. We also observe a relation between the cavity properties of the protein and the kinetics of ligand binding.
Figure 2 shows the surface and cavity distributions from a snapshot taken from the MD simulations. The cavity distribution and surface structure vary in time, while the chemical compositions of both the surface atoms and the cavity atoms stay about the same during the 15 ns of simulation time. A heuristic method is used to test the ergodicity of the estimates (see Figure 3 and Methods for more detail). We compute the average surface area from snapshots taken from the simulation trajectory. SACA finds the average surface area of MB to be 16,550 ± 414 Å², in good agreement with the surface area computed by a widely used method32. Using a procedure similar to SACA we also compute the protein volume. We find that the average volume of MB is about 19,000 Å³ and fluctuates by ±10% during the simulation (Figure 4). Surprisingly, the surface area fluctuates far less (about ±2.5%). The computed average density of MB is about 1.6 g/cm³, which is close to the experimentally measured value of 1.5 g/cm³ (ref. 33).
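The density quoted above follows from a unit conversion between a molar mass and a molecular volume. The sketch below illustrates that arithmetic only. The molar mass of 17,800 g/mol (holo-myoglobin, approximate) is an assumption introduced for illustration and is not stated in the text; the volume is the average reported above.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
ANGSTROM3_TO_CM3 = 1.0e-24    # 1 cubic angstrom expressed in cubic centimetres

def density_g_per_cm3(molar_mass_g_per_mol, volume_angstrom3):
    """Mass density of a single molecule occupying the given volume."""
    mass_g = molar_mass_g_per_mol / AVOGADRO
    volume_cm3 = volume_angstrom3 * ANGSTROM3_TO_CM3
    return mass_g / volume_cm3

# Assumed approximate molar mass of myoglobin with heme (not from the text).
ASSUMED_MOLAR_MASS = 17800.0
# Average protein volume reported in the text.
AVERAGE_VOLUME = 19000.0

rho = density_g_per_cm3(ASSUMED_MOLAR_MASS, AVERAGE_VOLUME)
print(f"estimated density: {rho:.2f} g/cm^3")
```

Under this assumed molar mass the estimate rounds to the value of about 1.6 g/cm³ given above; a ±10% change in volume shifts the density by roughly the same relative amount.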
To compare the cavity distributions of MB and its mutants, we compute the cavity size distributions from the MD trajectories (Figure 5). Figure 6 displays the position of each mutation on the protein. We find significant differences in the distributions. MB shows a bimodal distribution with cavity sizes peaked at 4 Å (major) and 6 Å (minor). The major peak shifts to smaller cavity sizes in the Trp mutants. In addition, the bimodal nature of the distribution disappears in all mutants except V68W. Trp exchanges are found to reduce the size of large cavities; for example, Trp mutations of the Leu29 and His64 residues decrease the largest cavity size from 7.5 Å to 6 Å.
The distributions in Figure 5 are obtained from the entire 15 ns trajectories. Figure 7 shows the dynamics of the cavity size distributions. Each histogram corresponds to a snapshot in time. We observe large cavities breaking up (between 1 ns and 3 ns) and re-forming (between 7 ns and 9 ns).
One intriguing observation is the relationship between the experimental rate constants for oxygen rebinding kinetics and the cavity size distributions that we compute for MB and its mutants. In Table 1 we compare the experimental rate constants from the kinetic scheme proposed by Scott and Gibson23 with the average cavity size computed with our method. Except for V68W, the average cavity sizes correlate well with the bimolecular binding rate (kO2), which decreases in the order WT ≈ I107W ≈ L104W > H64W > L29W. The ligand entry rate (kentry) is also found to correlate well with the number of possible entrances on the protein exterior: the more entrances, the higher the rate of ligand entry (see Table 1). Furthermore, the ligand escape rate increases as the average cavity size decreases. For example, MB has the largest average cavity diameter, around 4.90 Å, whereas L29W has the smallest, with an average cavity size of 4.08 Å. The bimolecular binding rate kO2 of MB is about 54 times faster than that of L29W. However, the ligand escape rate (kescape) for MB is about 60% less than that of L29W. An increase in cavity size is accompanied by an increase in bimolecular binding and ligand entry, suggesting the importance of interior voids for gas transport. The insertion of Trp in myoglobin, especially at the Leu29 site, causes the most severe breaking of larger cavities, which likely inhibits O2 binding to the heme28.
To quantitatively compare the cavity size distributions of MB and its mutants, the cumulative distributions are computed (see Figure 8). Interestingly, MB has a distribution that is shifted towards larger cavity sizes compared to all of its mutants. In MB, about 60% of the cavities are larger than the kinetic diameter of the oxygen molecule34 (dOxygen = 3.46 Å), whereas the fraction is smaller for the mutants. In particular, only about 40% of the cavities in V68W are as large as or larger than the oxygen molecule. This is consistent with the extremely low oxygen binding rate of V68W (around 0.2 μM⁻¹ s⁻¹, versus about 16 μM⁻¹ s⁻¹ for MB)28. The mutants L104W and I107W have similar cumulative distributions, which is reflected in their kinetics (Table 1). The same can be said for H64W and L29W.
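The comparison against the oxygen kinetic diameter amounts to reading a cumulative distribution at a threshold. A minimal sketch of that calculation follows; the cavity sizes in `toy_sizes` are invented for illustration and are not data from this study, and the function names are hypothetical.

```python
from bisect import bisect_left

O2_KINETIC_DIAMETER = 3.46  # angstrom, the value quoted in the text

def cumulative_distribution(sizes):
    """Empirical cumulative distribution as (size, fraction <= size) pairs."""
    ordered = sorted(sizes)
    n = len(ordered)
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]

def fraction_at_least(sizes, threshold):
    """Fraction of cavities whose diameter is >= threshold."""
    ordered = sorted(sizes)
    n_below = bisect_left(ordered, threshold)  # count of sizes strictly below
    return (len(ordered) - n_below) / len(ordered)

# Invented cavity diameters in angstrom (illustration only).
toy_sizes = [2.9, 3.1, 3.46, 3.8, 4.0, 4.2, 4.9, 5.5, 6.1, 7.4]

frac = fraction_at_least(toy_sizes, O2_KINETIC_DIAMETER)
cdf = cumulative_distribution(toy_sizes)
print(f"fraction of cavities that can hold O2: {frac:.2f}")
```

Applied to the cavity diameters pooled over all trajectory snapshots, the same function would give the per-protein fractions discussed above.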
The span distribution is also affected by Trp exchanges. Here we show MB and the V68W mutant as an example. The longest span in MB is compared with that in mutant V68W (Figure 9-a, c). Figure 9-b, d magnifies the distribution of spans that are wider than the oxygen molecule. This is important because, in order to transport O2, a span must be at least as wide as the gas molecule. Only 7–10% of the total spans in MB and V68W have a diameter larger than or equal to the kinetic diameter of oxygen. The longest span with a diameter larger than or equal to that of O2 has a length of about 16.5 Å in MB, whereas it is significantly shorter in V68W (about 12 Å). The same is true for the other mutants (not shown here). The decrease in span length is accompanied by an increase in the number of short span elements in V68W, which again suggests the segmentation of longer spans.
Figure 2 (right panel) shows the cavities from a single MD snapshot. The cavities from a single configuration are disconnected. The pathways along which the ligand moves, given enough time, are estimated by combining the cavities obtained from all MD snapshots. Combining the cavities explored in different time frames allows us to determine all unique entrance and exit pathways provided by the dynamics of the protein. It also allows us to visualize as continuous the otherwise disconnected pathways inside the protein. Figure 9 shows such a pathway for MB. We observe that the major pathway from the protein surface to the heme passes through some of the Xe binding sites that have been determined experimentally. For wild type myoglobin, Xe1, Xe2 and Xe4 are on the ligand migration pathway, while the Xe3 binding site is slightly off the major pathway (see Figure 10).
In addition, the figure shows the cavities on the protein exterior with a diameter larger than or equal to the oxygen kinetic diameter (red spheres). The cavities on the surface are also obtained by merging the surface voids from all snapshots. Of course, not all cavities on the surface are exits or entrances for the gas molecule. If a cavity on the surface is connected to a cavity that reaches the heme, we call it a possible exit/entrance. The total number of such exit/entrance channels is reported in Table 1. From Figure 10, we can see that all pathways from the heme to the surface are on one side of the protein and close to the heme, which supports the interpretation5 that ligand entry to and exit from myoglobin occur through short and direct channel(s) between the heme pocket and the solvent. In contrast to single-channel interpretations25,26,27,28,29, our simulations argue that multiple channels for ligand entry/exit are possible. The same conclusion was reached in the extensive MD simulations of Ruscio et al.14 and Maragliano et al.13. However, our data suggest that the pathways to and in MB are very close to the heme group, whereas Ruscio et al.14 propose different routes as alternative pathways. A comparison is shown in Figure 10. The difference in the results may be attributed to differences in methodology (pathfinder versus our approach) or in the force fields used in the simulations.
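The definition of a possible exit/entrance (a surface cavity connected, through merged cavities, to one that reaches the heme) can be phrased as a graph reachability problem. The sketch below is one possible illustration and not the implementation used in this work. The overlap rule, the `heme_cutoff` distance, and the `is_surface` predicate are assumptions; in practice the surface test would come from the surface-atom analysis.

```python
import math
from collections import deque

def overlap(a, b):
    """Two spherical cavities (center, radius) touch if their spheres intersect."""
    (ca, ra), (cb, rb) = a, b
    return math.dist(ca, cb) < ra + rb

def reachable_from(cavities, seeds):
    """Breadth-first search over the overlap graph, starting from seed indices."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        i = queue.popleft()
        for j in range(len(cavities)):
            if j not in seen and overlap(cavities[i], cavities[j]):
                seen.add(j)
                queue.append(j)
    return seen

def possible_entrances(snapshots, heme_center, heme_cutoff, is_surface):
    """Merge cavities from all snapshots and return the surface cavities
    that are connected to a cavity lying within heme_cutoff of the heme."""
    merged = [cav for snap in snapshots for cav in snap]
    seeds = [i for i, (c, _) in enumerate(merged)
             if math.dist(c, heme_center) <= heme_cutoff]
    connected = reachable_from(merged, seeds)
    return [merged[i] for i in sorted(connected) if is_surface(merged[i])]

# Toy data: two snapshots; the cavity in snapshot B bridges those in snapshot A.
heme = (0.0, 0.0, 0.0)
snap_a = [((1.0, 0.0, 0.0), 1.0), ((5.0, 0.0, 0.0), 1.0)]
snap_b = [((3.0, 0.0, 0.0), 1.2), ((0.0, 9.0, 0.0), 1.0)]

def near_surface(cavity):
    # Hypothetical surface test: farther than 4.5 angstrom from the heme.
    return math.dist(cavity[0], heme) >= 4.5

entrances = possible_entrances([snap_a, snap_b], heme, 1.5, near_surface)
single_snapshot = possible_entrances([snap_a], heme, 1.5, near_surface)
```

In the toy data a single snapshot yields no entrance, while merging both snapshots connects the heme-side cavity to one surface cavity. The isolated surface cavity in snapshot B is not counted because nothing links it to the heme.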
The cavities found on the exterior differ between MB and its mutants: MB has more entrances/exits than its mutants (Table 1). The most dramatic difference is seen for the mutation of Val68 to Trp, for which a large decrease in the number of entrances is observed. Interestingly, the rate of ligand entry (kentry) in V68W is slower by a factor of 160 (ref. 28). The decrease in the number of entrances found in our study explains why the entry rate changes so dramatically in V68W.
Our simulations show that in WT MB, the pathway from the heme to the surface goes through the Xe1, Xe2 and Xe4 cavities, while the Xe3 cavity does not contribute to the main gas diffusion pathway. The pathway also goes through His64, which opens and closes in time, supporting earlier studies25,26,27,28,29,30 and recapitulating that His64 is the gate for O2 transport25. We also observe dramatic changes in the ligand migration pathway for the mutants (Figure 11). Insertion of Trp at the Val68 and Ile107 positions blocks the cavity leading to the Xe4 binding site, leaving the main pathway going only to Xe1 and Xe2. Trp insertion at the Leu104 position, on the other hand, blocks the pathway leading to the Xe1 cavity. The changes in the main pathway observed in our simulations were predicted by Brunori and Olson5 and are verified here for the first time by computer simulations.
A new protocol is developed to study the cavity and surface properties of globular proteins. MB and five mutants (H64W, L29W, V68W, I107W and L104W) were studied as examples. It is shown that Trp exchanges not only change the major ligand pathway, as was predicted5, but also change the cavity size distributions in the interior and on the exterior of the protein. Native MB is found to have the largest average cavity size in the interior and more possible entrances/exits on the exterior when compared to its mutants. We found that the major pathway for the ligand has multiple exits/entrances and that they are on one side of MB (close to the heme), in accord with experiments. Insertion of Trp at the His64, Leu29, Val68, Ile107 and Leu104 positions reduces the number of possible entrances to the protein. Mutations also reduce the average cavity size and block the pathways connecting the heme to the surface. The blocked pathways stay blocked during the time scale of the simulations. Our method clearly shows that the Ile107 and Val68 to Trp mutations block the Xe4 cavity, whereas the L104W mutation blocks the Xe1 cavity. However, the Leu104 to Trp mutation has little effect on the interior cavities of myoglobin. Finally, we observe a relationship between the average cavity size and both the bimolecular binding rate and the ligand escape rate. The number of possible entrances is also found to be correlated with the rate of ligand entry.
We aim to find all the cavities within the protein as well as the ones on the protein surface. To find the cavities on the surface, one has to identify the surface atoms. Lee & Richards35,36 proposed a method to identify molecular surfaces by probing the van der Waals surface with a probe atom comparable in size to a water molecule. This approach is generally accepted as the definition of a molecular surface, or “solvent-accessible surface”, and is supported by a wide range of algorithms37,38,39. The alternative computational method described below, the Surface Atom Characterization Algorithm (SACA), not only computes the surface area but also identifies the surface atoms at the same time. The steps involved in implementing SACA are as follows:
1. Enclose the protein configuration within a box.
2. Find the Center of Mass (COM) of the protein and emit random rays from the COM (see Figure 1-a). Collect the coordinates of each intersected atom defined by its van der Waals volume. For each ray, the surface atoms along the ray line are the atomic pair with the largest center-to-center distance.
3. Define a sphere centered at the COM with a diameter equal to the minimum distance of separation between surface atoms. Within this sphere, randomly choose test points and send rays from each test point to capture the remaining surface atoms.
4. Repeat the process (step 3) until the number of surface atoms converges.
5. To calculate the protein surface area, assume that on average half of each atom's surface area contributes to the protein surface.
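The ray-casting idea in the steps above can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: each ray is treated as a full line through its origin, the two extreme intersected atoms along that line are tagged as surface atoms, and the area is taken as half of each tagged atom's sphere area. The function names, the toy lattice, and the number of rays are assumptions. Additional origins near the COM (step 3) could be handled by calling `surface_atoms` again with other origins and taking the union of the results.

```python
import math
import random

def center_of_mass(coords, masses):
    total = sum(masses)
    return tuple(sum(m * c[k] for m, c in zip(masses, coords)) / total
                 for k in range(3))

def random_unit_vector(rng):
    """Uniform random direction from normalized Gaussian components."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)

def surface_atoms(coords, radii, origin, n_rays, rng):
    """Indices of atoms that are outermost along random lines through origin."""
    surface = set()
    for _ in range(n_rays):
        d = random_unit_vector(rng)
        hits = []
        for i, (c, r) in enumerate(zip(coords, radii)):
            rel = tuple(c[k] - origin[k] for k in range(3))
            t = sum(rel[k] * d[k] for k in range(3))   # position along the line
            perp2 = sum(x * x for x in rel) - t * t    # squared distance to the line
            if perp2 <= r * r:                         # line pierces the vdW sphere
                hits.append((t, i))
        if hits:
            surface.add(min(hits)[1])   # outermost atom on one side
            surface.add(max(hits)[1])   # outermost atom on the other side
    return surface

def surface_area(surface, radii):
    """Half of each surface atom's sphere area, as assumed in step 5."""
    return sum(0.5 * 4.0 * math.pi * radii[i] ** 2 for i in surface)

# Toy "protein": a 3 x 3 x 3 lattice of identical atoms; atom 13 is buried.
coords = [(2.0 * i, 2.0 * j, 2.0 * k)
          for i in (-1, 0, 1) for j in (-1, 0, 1) for k in (-1, 0, 1)]
radii = [1.2] * len(coords)
masses = [12.0] * len(coords)

com = center_of_mass(coords, masses)
found = surface_atoms(coords, radii, com, 2000, random.Random(0))
area = surface_area(found, radii)
```

In the toy lattice the central atom is pierced by every line but is never the outermost hit, so it is not tagged, while the 26 outer atoms are. Convergence (step 4) could be monitored by repeating the call until the size of the accumulated set stops growing.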
To find the cavities in the protein matrix we used the Cavity Energetic Sizing Algorithm (CESA)21,22. The method was developed by us and has previously been applied to cavities in polymers (see refs. 21,22 for more detail). A similar idea has been implemented to study the cavities in proteins10,14 and proved to be useful. The results obtained agreed well with long MD simulations and Locally Enhanced Sampling. In CESA we insert a probe particle (gas molecule) inside the protein and compute the potential energy of the probe particle. A Monte Carlo procedure is then used to move the particle and effectively sample the cavity space.
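As a rough illustration of moving a probe with a Monte Carlo procedure, the sketch below runs a Metropolis random walk of a single probe in a field of fixed atoms and records the lowest-energy position visited. It is not the CESA implementation of refs. 21,22. The 12-6 Lennard-Jones form, the parameters `epsilon` and `sigma`, the step size, and the value of `kT` are all assumptions chosen for the example.

```python
import math
import random

def probe_energy(pos, atoms, epsilon=0.2, sigma=3.2):
    """Sum of 12-6 Lennard-Jones terms between the probe and every atom."""
    energy = 0.0
    for atom in atoms:
        r = max(math.dist(pos, atom), 1e-6)  # guard against division by zero
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

def metropolis_probe(start, atoms, n_steps, step=0.3, kT=0.6, rng=None):
    """Metropolis walk of the probe; returns the best position, its energy,
    and the list of positions occupied after each step."""
    rng = rng or random.Random()
    pos, e = start, probe_energy(start, atoms)
    best_pos, best_e = pos, e
    visited = [pos]
    for _ in range(n_steps):
        trial = tuple(x + rng.uniform(-step, step) for x in pos)
        e_trial = probe_energy(trial, atoms)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            pos, e = trial, e_trial
            if e < best_e:
                best_pos, best_e = pos, e
        visited.append(pos)
    return best_pos, best_e, visited

# Toy cavity: eight atoms on the corners of a cube enclose an empty region.
cage = [(x, y, z) for x in (-3.0, 3.0) for y in (-3.0, 3.0) for z in (-3.0, 3.0)]
start = (1.0, 1.0, 1.0)
best_pos, best_e, visited = metropolis_probe(start, cage, 500, rng=random.Random(1))
```

The lowest-energy position plays the role of a cavity center in the sense used below, and the set of visited positions gives a picture of the space the probe can sample at the chosen temperature.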
The cavities in this model are assumed to be spherical, with the center of each sphere located at a minimum of the potential. Two spherical cavities overlap if the distance between their centers is less than the sum of their diameters. Overlapping cavities are grouped into a new entity called a cluster (Figure 1-b). Each cavity belongs to only one cluster and each cluster contains at least one cavity. The span is defined as the maximum distance between any two points that lie within the cluster. It is calculated by finding the two cavity centers in the cluster that are farthest apart: the span is the linear distance between these two centers plus the radius of each of the two end cavities.
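The clustering and span definitions can be sketched with a small union-find. This is a hedged illustration with hypothetical function names. The overlap test uses `scale * (r_i + r_j)`: the default `scale=1.0` is the usual geometric condition for intersecting spheres (an assumption of this sketch), while `scale=2.0` corresponds to the sum-of-diameters wording in the text. Treating the span of a single-cavity cluster as its diameter is also an assumption.

```python
import math
from itertools import combinations

def cluster_cavities(cavities, scale=1.0):
    """Group overlapping (center, radius) cavities into clusters of indices."""
    parent = list(range(len(cavities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(cavities)), 2):
        (ci, ri), (cj, rj) = cavities[i], cavities[j]
        if math.dist(ci, cj) < scale * (ri + rj):
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(cavities)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def span(cluster, cavities):
    """Distance between the two farthest centers plus the two end radii."""
    if len(cluster) == 1:
        return 2.0 * cavities[cluster[0]][1]  # assumed: span of one sphere
    i, j = max(combinations(cluster, 2),
               key=lambda p: math.dist(cavities[p[0]][0], cavities[p[1]][0]))
    return math.dist(cavities[i][0], cavities[j][0]) + cavities[i][1] + cavities[j][1]

# Toy cavities: three overlapping spheres in a row and one isolated sphere.
toy = [((0.0, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0),
       ((3.0, 0.0, 0.0), 1.0), ((10.0, 0.0, 0.0), 0.8)]
clusters = sorted(cluster_cavities(toy))
spans = [span(c, toy) for c in clusters]
```

For the toy input the three spheres in a row form one cluster with a span of 5 Å (3 Å between the end centers plus two 1 Å radii), and the isolated sphere forms its own cluster.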
Cavities and spans are not static entities. As the protein changes configuration, they also change their shape and size. To study the dynamics of the cavities we used MD simulations, which generate protein configurations at room temperature in the presence of water. In this study wild type myoglobin (PDB entry 2MB5)28 and five of its mutants (PDB entries 3OGB, 2BW9, 2OH9, 2OHB, 1CPW)28 were solvated in a 60 Å cubic box. Water molecules with any atom within 1.8 Å of a protein atom were removed. Typically, 200 ± 20 water molecules remain in the protein's interior. To ensure charge neutrality of the simulation box, some water molecules are replaced with chloride or sodium ions if the protein has a net charge. The final size of each system is about 18,300 atoms. Periodic boundary conditions were used in all directions. Van der Waals interactions were cut off beyond a distance of 9 Å, and the cutoff for electrostatics was also 9 Å. Long-range electrostatic forces were summed using the particle mesh Ewald method40.
Molecular dynamics (MD) simulations were performed using the MOIL software41. OPLS parameters were used for the protein42 and TIP3P for water43. The geometry of the water molecules was fixed with the matrix variant of the SHAKE algorithm44, while SHAKE was not applied to the bonds of the protein. The equations of motion were integrated with the velocity Verlet scheme with a time step of 1.5 fs. Total energy conservation was tested by running a single simulation in the NVE ensemble with a time step of 1.5 fs. The drift was less than 0.2% over a period of 1 ns, suggesting that our integration protocol is acceptable. For production runs we used the same settings but sampled in the NVT ensemble. To maintain constant temperature we used velocity scaling. Each system was equilibrated at 300 K for 1 ns, followed by a production run of 15 ns. We computed Root Mean Square Deviations (RMSD) from the crystal structures with which we began our simulations. Our results show an average RMSD of 2–2.5 Å, with fluctuations around this value during the simulations, suggesting that the starting structures are stable despite the mutations. We recorded the coordinates every 3 ps for further analysis.
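An energy-conservation check of the kind described above can be expressed as a relative drift over the run. The sketch below fits a straight line to the total energy and reports the fitted change as a percentage of the mean energy. This particular definition of drift is an assumption, since the text does not state how the drift was measured, and the energy series is invented for illustration. The snapshot count follows from the stated 15 ns run length and 3 ps recording interval.

```python
def relative_drift_percent(times_ns, energies):
    """Least-squares slope times run length, as a percentage of the mean energy."""
    n = len(times_ns)
    t_mean = sum(times_ns) / n
    e_mean = sum(energies) / n
    slope = (sum((t - t_mean) * (e - e_mean) for t, e in zip(times_ns, energies))
             / sum((t - t_mean) ** 2 for t in times_ns))
    run_length = times_ns[-1] - times_ns[0]
    return 100.0 * abs(slope * run_length / e_mean)

# Invented NVE energy trace over 1 ns with a small linear drift (illustration only).
times = [i / 10 for i in range(11)]              # ns
energies = [-50000.0 + 50.0 * t for t in times]  # arbitrary energy units

drift = relative_drift_percent(times, energies)

# Bookkeeping from the stated protocol: 15 ns saved every 3 ps.
n_snapshots = round(15_000 / 3)
print(f"drift = {drift:.3f} %, snapshots per trajectory = {n_snapshots}")
```

With this toy trace the drift is about 0.1%, which would pass the 0.2% criterion mentioned above.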
We used a measure to test the ergodicity of the estimates45. Consider an observable q with a mean 〈q〉N obtained from N sampling points. The standard deviation of the N points is σN. If the sampling is made from a uniform distribution, the measure c(N) should approach a constant value as N increases. A plot of c(N) as a function of simulation time is shown for the cavity size (see Figure 3). Our result shows a constant c(N) even after five nanoseconds. We also report the mean, median, and variance of the distributions as a function of simulation time; these show no drift at longer times and suggest that a 15 ns simulation is statistically sufficient for analysis. To estimate the uncertainty, we divide the data into two sequential blocks and compute the average and standard deviation from these blocks.
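The two-block uncertainty estimate can be sketched as follows. The text does not specify exactly how the standard deviation is formed, so this example assumes it is the sample standard deviation of the two block means; the input series is invented and the function name is hypothetical.

```python
import statistics

def two_block_estimate(series):
    """Split a time series into two sequential halves and return
    (mean of the block means, sample standard deviation of the block means)."""
    half = len(series) // 2
    blocks = [series[:half], series[half:]]
    means = [statistics.fmean(block) for block in blocks]
    return statistics.fmean(means), statistics.stdev(means)

# Invented cavity-size series in angstrom (illustration only).
toy_series = [4.0, 5.0, 4.0, 5.0, 5.0, 6.0, 5.0, 6.0]
average, uncertainty = two_block_estimate(toy_series)
print(f"{average:.2f} +/- {uncertainty:.2f}")
```

Because each block is a contiguous stretch of the trajectory, the spread between block means reflects slow drifts that a naive standard error over correlated snapshots would underestimate.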