Insights into protein sequencing with an α-Hemolysin nanopore by atomistic simulations

Single molecule protein sequencing would represent a disruptive burst in proteomic research with important biomedical impacts. Due to their success in DNA sequencing, nanopore based devices have been recently proposed as possible tools for the sequencing of peptide chains. One of the open questions in nanopore protein sequencing concerns the ability of such devices to provide different signals for all the 20 standard amino acids. Here, using equilibrium all-atom molecular dynamics simulations, we estimated the pore clogging in α-Hemolysin nanopore associated to 20 different homopeptides, one for each standard amino acid. Our results show that pore clogging is affected by amino acid volume, hydrophobicity and net charge. The equilibrium estimations are also supported by non-equilibrium runs for calculating the current blockades for selected homopeptides. Finally, we discuss the possibility to modify the α-Hemolysin nanopore, cutting a portion of the barrel region close to the trans side, to reduce spurious signals and, hence, to enhance the sensitivity of the nanopore.

protein translocation has a number of peaks close to the number of amino acids of the analyzed protein. On the computational side, the possibility to exploit the adhesion of the peptide chain on 2D materials to get a step-like translocation 33,34 has been recently explored as a possible approach to control the protein transport through the pore. (ii) Distinguishability. The signal level associated to a single amino acid (AA) has to allow the unambiguous identification of the AA. Several experimental [8][9][10]20,35 and computational 33,35,36 works have shown that also very small changes in peptide composition can be potentially detected by nanopores. In this respect, systematic analysis of the capability of nanopores to distinguish among all the different residues are highly needed, a remarkable recent example being the work by Farimani et al. 36 on the computational assessment of the peptide sequencing capability of a MoS 2 pore.
In the present study, we focused on the distinguishability of different AA in α-Hemolysin (αHL), the most widely employed pore in nanopore sensing 2,[18][19][20]35,37,38 . To this aim, as a preliminary case study, we analyzed the differences of homopeptide chains inserted in the αHL. First, we employed an extensive set of non-equilibrium all-atom MD simulations ( µ  s 8 in total) to calculate the current levels associated to four different neutral homopeptides. Our results show that, as expected, large residues correspond to lower current values. Interestingly, we find that an equilibrium quantity derived from continuum quasi-1D argument and indicated as "pore clogging estimator" is linearly correlated to the measured current blockages. The estimation of relative conductance is a factor four less computational demanding with respect to non equilibrium runs allowing us to explore all the standard amino acids. Our results show that αHL pore clogging is affected not only by amino acid volume, but also by hydrophobicity and net charge. In particular, charged residues leave more room to electrolyte motion compared to uncharged one, hence, for similar residue volume, they give rise to a smaller clogging.

Results and Discussion
Ionic currents for selected homopeptides. We studied the ionic current for four different homopeptide chains clogging the αHL nanopore via all-atom molecular dynamics simulations. The four amino acids composing the homopeptides are alanine (Ala), phenylalanine (Phe), tryptophan (Trp) and glutamine (Gln) and, for each of them, we prepared five independent replicas. The system set-up is sketched in Fig. 1a. The αHL nanopore is embedded into a double-layer lipid membrane and immersed in a 2M KCl electrolyte solution, for a total of about 310 K atoms. After equilibration, the homopeptide is imported into the pore using steered molecular dynamics 39 . The frame with the central residue closer to the main pore constriction (Glu 111) is selected as starting configuration for the production runs. A constant and homogeneous external electric field E = (0, 0, E z ) corresponding to a trans membrane voltage ΔV = 1 V is applied parallel to the pore axis. Each simulation was run for 240 ns and ionic current is estimated as the time average after discarding an initial transient of 64 ns, see Methods. The current blockage is defined as ΔI/I 0 = (I 0 − I)/I 0 , with I the average current measured with the homopeptide inside the pore and I 0 the empty pore value. (a) The system is constituted by an α-Hemolysin (αHL, blue) nanopore embedded into a lipid membrane (gray). A 35-residues homopeptide (orange chain) is imported into the nanopore with the central residue close to the pore constriction. The simulation box is filled up by 2M KCl electrolyte solution, that, for the sake of clarity, is not shown. A constant and homogeneous external electric field E = (0, 0, E z ) parallel to the pore axis is applied. (b) Average current blockage ΔI/I 0 = (I 0 − I)/I 0 , with I the average current measured with the homopeptide inside the pore and I 0 the empty pore value, for four different homopeptides, Ala, Phe, Gln, Trp. The data are obtained averaging the current blockades of five replicas for each homopeptide. Error bars are estimated by considering current blockades from independent replicas as independent measurements. (c) Molecular structure and Van der Waals volume of the four amino acids 45 . Figure 1b shows the average current blockage ΔI/I 0 for each homopeptide, which are obtained averaging the current blockade of 5 replicas for each homopeptide, while error bars are estimated by considering current blockades from independent replicas as independent measurements. Fig. S1a, instead, reports ΔI/I 0 for each single replica. As expected, ΔI/I 0 roughly reflects the steric hindrance of each amino acid, see Fig. 1c. Indeed, the lower blockage corresponds to Ala (VdW volume, V A = 88.6 Å 3 ) 40 and the largest to Trp (V W = 227.8 Å 3 ) while Gln (V Q = 143.9 Å 3 ) and Phe (V F = 189.9 Å 3 ) blockages are in between the Ala and Trp values.
Interestingly, significant differences among replicas of the same homopeptide are found for Ala, Gln, and Phe, while, Trp replicas do not show any significant variability, see Section S1 and Fig. S1b of Supporting Information. This occurrence can be explained in term of the capability of smaller amino acids to explore a larger number of conformations inside the pore. electrolyte occupancy. The above presented results indicate that the size of the side chain is correlated to the current blockage; the larger the side chain, the deeper the current drop. A similar results was find also for DNA and Mpsa (another biological pore used for sensing), by Bhattacharya et al. 41 where it was shown that the number of water molecules displaced from the nanopore by the DNA determines the ionic current blockade, whereas the steric and conformational (base-stacking) properties of the DNA determine the amount of water displaced.
To better investigate the role of peptide conformation on the current drop, we formulated the following simple theoretical model. In a quasi-1D continuum description, the pore resistance is expressed as where the z− axis coincides with the pore axis, the pore goes from z = 0 to z = L, ρ(z) is the electrolyte resistivity and A(z) is the area of the pore section available to the electrolyte. Access resistances are neglected.
To estimate A(z) from our non-equilibrium runs, we divided the system in cubic cells of size Δx = Δy = Δz = 1 Å, and, for each frame, we used the VMD Volmap plug-in 42 to compute the occupancy map of the electrolyte, m x,y,z , where x, y, z indicate the cell, m x,y,z = 1 if the cell is within a Van der Waals radius of at least one water or ion atoms and m x,y,z = 0 elsewhere. Then, we averaged m x,y,z over all frames and normalized it with the bulk value. The resulting averaged and normalized occupancy map is indicated with M x,y,z . As already discussed in Aksimentiev and Schulten 38 , "electrolyte pockets" are present close to constriction, see Fig. S2. The pockets do not contribute to the ion current. To filter out these pockets, we defined a trans → cis available channel as the pore region accessible to the electrolyte when moving from the barrel entrance towards the vestibule. This procedure excludes reentrant pockets directed towards the trans side, see Fig. S2 of Supporting Information. The same procedure is applied to get a cis → trans accessible pore, and the final occupancy map ∼ M x,y,z is obtained as the intersection of the trans → cis and cis → trans accessible pores, see section S2 of Supporting Information for details. Figure 2c reports slices of ∼ M x,y,z for the four homopeptides. The regions available for the electrolyte transport between the two sides of the membrane are indicated in blue.
The occupancy map ∼ M x,y,z allows direct estimation of the pore section A(z) that can be calculated summing ∼ M x,y,z on slices of width Δz normal to the pore axis, in formula www.nature.com/scientificreports www.nature.com/scientificreports/ Consequently, the resistance, Eq. (1), can be approximated as i N 1 z z where i = 1 and i = N z correspond to the slice at pore trans and vestibule entrances, respectively, and we assumed that the resistivity ρ is constant along the pore. A similar quasi-1D model was recently applied in 43 . Figure 2b reports the inverse of the available section profile, − A z 1 for the five Ala replicas (solid lines) and for the empty pore (black dashed line). In the vestibule region, z ∈ (60, 100) Å, the difference between the empty and the clogged α HL is negligible, indicating that, the contribution of the moiety of the homopeptide in the vestibule region to the pore resistance, Eq. (3), is almost unrelevant. More evident differences are present in the barrel region, z ∈ (5,50) Å, and, in particular in the main αHL constriction,  z 50 Å. Interestingly, A(z) −1 has also a peak for  z 20 Å. This is due to the non-polar Leu-135 residues that forms an isolated hydrophobic ring inside the barrel. Since the available section for the electrolyte passage is smaller in this region, we will indicate it as secondary barrel constriction. The hydrophobic nature of Leu-135 ring was shown to be relevant for DNA translocation through αHL 44 .
To quantify the correlation between the pore clogging and the ionic current, we defined the pore clogging estimator as where  R 0 refers to the empty pore. Equation (4) is inspired by the definition of the current blockage. Indeed, ΔI/I 0 = 1 − I/I 0 , hence, using Ohm law, ΔI/I 0 = 1 − R 0 /R. The above discussed model is based on several hypotheses that are violated by the actual α HL pore shape. In particular, the continuum assumption is not justified at nanoscale, moreover, the model implicitly assumes a smooth variation of A(z) along the pore axis. In addition, we considered a homogeneous electrolyte resistivity ρ. Nevertheless, although a strict quantitative agreement with the blockage ΔI/I 0 and b is not expected, b results to be highly correlated with the measured ΔI/I 0 (Pearson correlation coefficient r = 0.8), see Fig. 3.
pore clogging for all amino acids. Stimulated by the good correlation among the measured ΔI/I 0 and the pore clogging estimator b, Eq. (4), we looked for a less computational demanding strategy to estimate b. We repeated the protocol described in the previous section using, as input data, 64 ns equilibrium runs (E z = 0, only last 32 ns used for statistics) instead of the original 240 ns non-equilibrium trajectories. The resulting equilibrium pore clogging estimator is indicated as b eq . The result discussed in section S3 of the supporting information, show that, although the value of b slightly changes when using equilibrium or non equilibriums runs as input, the correlation is still very good for Ala, Gln and Trp while deviations are obtained for Phe that, at equilibrium, show a larger clogging compared to non equilibrium runs. Figure S3c reports the equilibrium clogging profile for Phe. It is apparent that for replica F1 the clogging in the main α HL constriction is much higher than for the other replicas. We argue that this single outlier is responsible of the high deviation from equilibrium and non equilibrium average clogging for Phe.
The relatively small computational cost of the equilibrium simulations needed to estimate b eq allows us to explore the blockage features of all the amino acids. For each homopeptide, we run five different replicas. Figure 4a shows the pore clogging estimator b eq Vs the apparent amino acid volume V a 45 . A very good correlation is evident for all uncharged residues, while charged residues lie below the regression line. Indeed, charged residues leave more room to electrolyte solution compared to uncharged one. Similarly, although less evident, polar www.nature.com/scientificreports www.nature.com/scientificreports/ residues (green) show, on average, a lower b eq than hydrophobic ones b eq , see section S4 of the Supplementary Material for a statistical analysis. This occurrence can be explained as a combination of two concomitant effects. First, hydrophobic, hydrophilic and charged residues affect the structure of the first shells of the electrolyte solution surrounding them in different ways. Indeed, concerning water molecules, hydrophilic and charged residues induce a more compact layering with respect to hydrophobic ones, see, e.g. 46 . Secondly, charged peptides are slightly more stretched with respect to uncharged residues (see, section S5), increasing the effective cross-section of the clogged pore available for electrolyte motion. In addition, we observed that charged homopeptides also induce an overall increases of the total number of ions inside the pore. Indeed, the ratio between ions and water molecules inside the narrow pore region (barrel plus constriction) is 0.061 ± 0.002 for charged residues and 0.035 ± 0.001 for uncharged ones. These values can be compared with the empty pore one, 0.041, indicating that, despite the confinement, charged residues are able to partially carry their counterion shells inside the narrowest regions of the pore. In summary, on average, for a similar amino acid volume, the pore clogging is minimum for charged homopeptides and it progressively increases moving to hydrophilic and hydrophobic residues. This evidence suggest that, although the main feature controlling the pore clogging is the volume of the amino acid, also charge and hydrophobicity play a role. For completeness, Fig. S4 reports the inverse area profiles for each charged residue and for the corresponding hydrophobic residue with a similar volume while the correlation of the amino acid accessible surface area S 47 and pore clogging b eq is reported in Fig. S5.
Concerning the uncharged residues, the more evident outlier in Fig. 4a is Leu. Indeed, although its volume is the same of its isomer Ile, b eq is much larger. A close inspection to the inverse of the available section profile 1/A(z) indicates that this discrepancy is mainly due to clogging of the secondary barrel constriction,  z 20 Å. In particular, we observed that in some replicas, the Leu-homopeptide forms a short α− helix in the portion that occupies the secondary constriction, see Fig. S6.
The effect of secondary barrel constriction can be, in principle, eliminated using a truncated αHL as the one reported in 48 where it was shown that αHL pores are stable also when the large portion of the trans side of the barrel are deleted. We explored this possibility with our model calculating the summation in Eq. (3) only for the α HL region going from the residues Ile 136 to Asn 123, approximatively 20 Å from the native trans barrel entrance, to the vestibule. Figure 4b reports the corresponding b eq Vs V a plot where Leu lies close to the regression line.
It is worth noting that recent experiments indicate that αHL is able to distinguish among three-block peptides where the central neutral residues were Alanine and Triptophan 35 , or Isoleucine and Serine 49 . Moreover, very recently Piquet et al. 9 , showed that also Aerolysin nanopore is able to discriminate between two different ten-residue long homopeptides made by Arginine (R) and Lysine (K) and one heteropolymeric peptides, (K) 5 -(R) 5 . Taken together, the cited experimental results and our simulations suggest that biological pores can for all residues Vs the amino acid volume V a . Yellow circles corresponds to hydrophobic residues, green squares to polar, blue up-triangles to positively charged residues and red down-triangles to negatively charged ones. The dashed line is the minimum square fit. Panel a reports the b eq calculated on the entire pore while panel b refers to the b eq calculated removing the last part of the barrel including the secondary constriction, see the sketch in the inset. Error bars are estimated by considering b eq from independent replicas as independent measurements and they are reported only when larger than symbols.
www.nature.com/scientificreports www.nature.com/scientificreports/ potentially been employed for protein sequencing although several challenging issues, such as the translocation control, need to be solved 11 .

Conclusion
Nanopore based protein sequencing devices have two fundamental requirements: (i) the signal-to-monomer matching, which implies that the capture and the translocation speed needs to be controlled, and the distinguishability of the signals associated to the different amino acids 11 . In the present study, we focused on the distinguishability of different amino acids in α-Hemolysin. As a preliminary case, we studied homopeptides occupying the whole pore. We first performed an extensive set of non-equilibrium all-atom MD simulations to calculate the ion current blockade induced by four different homopeptides. Inspired by a quasi-1D model for pore conductance, we defined the pore clogging estimator and showed that it correlates with the observed current blockage from non-equilibrium runs. The estimation of relative conductance is a factor four less computational demanding than non-equilibrium runs allowing us to explore all the 20 standard amino acids. Our results show that amino acid volume is the main feature that rules the pore clogging and, consequently, the current blockage. In addition, our results indicate that also hydrophobicity plays a role. Indeed, for similar amino acid volumes, charged residues are associated to a smaller pore clogging than uncharged ones and slight, but significant, differences are observed also between hydrophobic and polar amino acids. Our results suggest that α HL is potentially able to discern among the different residues. For some set of residues with very similar volume, however, the pore clogging is very close and the expected current blockage signal as well. In these cases, long current recordings and signal post processing would be required to distinguish among them 11 . Furthermore, our study provides a set of structural and chemical-physical information about nanopore protein sequencing that can pave the road to improve the distinguishability of the signal associated to a single amino acid. Our simulation protocol can be easily generalized to other pores or to systematically study the effect of modifications of αHL pore with the aim to propose mutations that can be ad hoc designed to amplify the signal differences among the 20 amino acids or to reduce the noise (as discussed, for instance, concerning the cut of the last part of the barrel) 48 . Indeed, this kind of membrane protein engineering is already routinely used with different biotechnological applications. system setup. All-atom Molecular Dynamics (MD) simulations were performed using the NAMD software 50 . The CHARMM36 force field 51 was employed to model lipid, protein, and TIP3P water molecules 52 . NBFIX corrections were applied for ions 53 .
The membrane-αHL system has been assembled using a protocol similar to the one used in other works 38,54,55 . In brief, the system was built starting from the αHL crystal structure PDB_ID: 7AHL 37 downloaded from the OPM database 56 . The POPC lipid membrane, the water molecules, and the ions for neutralizing the system were added using VMD 42 . Then, the system is minimized and a 60 ps NVT simulation (time step 0.2 fs) was run with external forces applied to the water molecules to avoid their penetration into the membrane and the pore. Lipid heads have been constrained to their initial position by means of harmonic springs (spring constant k = 1 kcal/ (mol 2 )) acting on the phosphorus. A second equilibration run (1 ns NPT flexible cell, time step 1 f s) was performed to compact the membrane. During this run, the lipid heads were unconstrained. The third, and last, equilibration step consists of a NPT constant area simulation (2 ns, time step 2 fs) where all the atoms are unconstrained and no external forces act on the water molecules. The resulting periodic box, after the equilibration, has the following size: L x = 127.5 Å, L y = 127.1 Å, and L z = 180.0 Å, and the total number of atoms is ~290000. Initial configurations of peptides are generated by using the PEPFOLD server 57 and then separately equilibrated in a triperiodic water box. Then, the two systems were merged, ions (2M KCl) were added using VMD, and a short NPT equilibration is performed (2 ns, constant area NPT) until L z reaches a stationary value. The resulting box has dimensions L x = 127.5 Å, L y = 127.1 Å (i.e. the same as the original equilibrated αHL-membrane box) while .  L 186 2 z Å (slightly different values are get for each homopeptide) and the overall number of atom is ~310000.
peptide insertion. For each replica, a dedicated Steered Molecular Dynamics simulations was employed to bring the peptides at the pore's lumen entrance (trans side) and, then, into the pore. In particular, the peptide N-terminus was placed at ~15 Å from the αHL's trans entrance and then pulled inside the nanopore using a constant velocity Steered Molecular Dynamics (SMD) simulation at pulling speed v SMD = 0.025 Å/ps. The total SMD simulation time is t SMD = 17 ns that corresponds to a motion of the pulled atom of Δ SMD = 425 Å during which the peptide crosses the αHL two times. The initial configurations for the subsequent non equilibrium (E > 0) and equilibrium (E = 0) production runs for, respectively, ionic current and b eq measurements (see next section) were chosen among the ones of the second passage. Since such SMD method can force the polymer to adopt highly stretched conformations 58 , we checked that the homopeptide relax toward equilibrium by computing the time evolution of gyration radius. The average relaxation time is 10 ns, see Section S6 and Fig. S7 of Supporting Information for details, hence, in the production runs, we discarted the first part of the simulations from the average calculation. Moreover, for selected homopeptides, we also repeated the pulling protocol in the opposite direction and repeated the calculation of pore clogging b eq , see Section S5 and Table S3. No significant differences are observed, suggesting that we sampled an equilibrium state and not a highly stretched conformation induced by the SMD. For reader convenience, we mention that protocols that allow to introduce solutes inside biological nanopores reducing possible conformational distortions induced by the pulling force were proposed in the literature 58,59 . Current measurement. We then select the frame for which the Cα of the central residue of the homopeptide is closer to the pore constriction defined as the average position of the seven copies of amino acid Met-113 of the αHL heptamer. This configuration was used for non-equilibrium runs where uniform and constant external www.nature.com/scientificreports www.nature.com/scientificreports/ electric field E = (0, 0, E z ) was applied perpendicularly to the lipid bilayer. This protocol was shown to be equivalent to the application of a constant voltage ΔV = E z L z 38,60,61 . Each simulation was run for 240 ns and snapshots are saved every Δt = 40 ps. The average current in the interval [t, t + Δt] is estimated as where q i and z i are charge and the z-coordinate of the i-th atom, respectively. The K + and Cl − currents were computed by restricting the sum over the atoms of corresponding type 38 . The mean current is obtained via a time average of I(t) after discarding a transient of 64 ns. Details on the statistical comparison of the current traces are reported in section S1 of the Supporting Information. As often occours in all-atom simulations, the driving voltage does not reflect the typical experimental conditions, but it is necessary for reducing the noise/signal ratio of the ionic current measurements. Current measurement have been carried out at ΔV = 1V. Although ΔV = 1V we can be outside of the linear response region, see e.g. extensive simulation at various voltages reported in 55,62 , we expect that the relative blockage ΔI/I 0 is not strongly affected by the large ΔV.