Introduction

Nanopores have demonstrated great versatility in biosensing, as they can be employed to detect and analyze biological samples at the single-molecule level1,2,3,4,5,6,7,8,9,10,11,12,13,14. In nanopore sensing, the interaction of the molecule with the nanopore, e.g. its translocation through the pore, alters one or more properties of the system that can be recorded by appropriate instruments. The most commonly used approach is the so-called resistive pulse, where the changes in the nanopore electric resistance induced by the molecule are associated with molecule properties. Another promising approach, based on the tunneling effect, measures the alteration of the transverse current along the membrane plane5,15,16,17.

A potentially disruptive application of nanopore sensors is single-molecule protein sequencing. Compared to DNA sequencing, nanopore protein sequencing poses several challenges due to the large number of monomers to be distinguished (20 amino acids versus 4 bases), the non-uniform charge of the polypeptide chains (the amino acids forming the protein can be neutral, positively or negatively charged) and the complex structure of proteins and peptides. The last two features make even the mere capture of the protein by the nanopore difficult. Indeed, since proteins can be both positively and negatively charged, electrophoresis cannot be used to induce the capture unless specific charged tags are added to the protein termini18,19. To overcome these difficulties, other approaches, such as dielectrophoretic trapping20,21,22,23,24 and electro-osmotic flow6,7,25,26, have been proposed. In addition, the complex interplay between unfolding, capture and translocation typically results in a non-homogeneous multistep co-translocational unfolding process18,27,28,29,30,31,32.

Once the molecule is captured, the fundamental requirements for a nanopore-based sequencing device are, in essence, two11. (i) Signal-to-monomer matching. Each signal has to be unambiguously associated with a specific monomer in the protein sequence. In sequencing strategies where the entire chain is sequentially imported into the pore, this implies that the translocation speed needs to be controlled. Kennedy et al.8 showed that a homogeneous translocation can be achieved using sodium dodecyl sulfate (SDS), an anionic compound that denatures the protein, providing it with a negatively charged shell. They found that the current trace associated with the protein translocation has a number of peaks close to the number of amino acids of the analyzed protein. On the computational side, the possibility of exploiting the adhesion of the peptide chain to 2D materials to obtain a step-like translocation33,34 has recently been explored as a possible approach to control protein transport through the pore. (ii) Distinguishability. The signal level associated with a single amino acid (AA) has to allow the unambiguous identification of the AA. Several experimental8,9,10,20,35 and computational33,35,36 works have shown that even very small changes in peptide composition can potentially be detected by nanopores. In this respect, systematic analyses of the capability of nanopores to distinguish among all the different residues are highly needed, a remarkable recent example being the work by Farimani et al.36 on the computational assessment of the peptide sequencing capability of a MoS2 pore.

In the present study, we focused on the distinguishability of different AAs in α-Hemolysin (αHL), the most widely employed pore in nanopore sensing2,18,19,20,35,37,38. To this aim, as a preliminary case study, we analyzed the differences among homopeptide chains inserted in the αHL. First, we employed an extensive set of non-equilibrium all-atom MD simulations (\(\simeq 8\,\mu s\) in total) to calculate the current levels associated with four different neutral homopeptides. Our results show that, as expected, larger residues correspond to lower current values. Interestingly, we find that an equilibrium quantity derived from a continuum quasi-1D argument, indicated as the “pore clogging estimator”, is linearly correlated with the measured current blockages. The estimation of this relative conductance is a factor of four less computationally demanding than the non-equilibrium runs, allowing us to explore all the standard amino acids. Our results show that αHL pore clogging is affected not only by amino acid volume, but also by hydrophobicity and net charge. In particular, charged residues leave more room for electrolyte motion than uncharged ones; hence, for similar residue volume, they give rise to a smaller clogging.

Results and Discussion

Ionic currents for selected homopeptides

We studied the ionic current for four different homopeptide chains clogging the αHL nanopore via all-atom molecular dynamics simulations. The four amino acids composing the homopeptides are alanine (Ala), phenylalanine (Phe), tryptophan (Trp) and glutamine (Gln) and, for each of them, we prepared five independent replicas. The system set-up is sketched in Fig. 1a. The αHL nanopore is embedded in a lipid bilayer and immersed in a 2M KCl electrolyte solution, for a total of about 310,000 atoms. After equilibration, the homopeptide is imported into the pore using steered molecular dynamics39. The frame with the central residue closest to the main pore constriction (Glu 111) is selected as the starting configuration for the production runs. A constant and homogeneous external electric field E = (0, 0, Ez), corresponding to a transmembrane voltage ΔV = 1 V, is applied parallel to the pore axis. Each simulation was run for 240 ns and the ionic current is estimated as the time average after discarding an initial transient of 64 ns, see Methods. The current blockage is defined as ΔI/I0 = (I0 − I)/I0, with I the average current measured with the homopeptide inside the pore and I0 the empty pore value.

Figure 1

Ionic current measurements. (a) The system consists of an α-Hemolysin (αHL, blue) nanopore embedded in a lipid membrane (gray). A 35-residue homopeptide (orange chain) is imported into the nanopore with the central residue close to the pore constriction. The simulation box is filled with 2M KCl electrolyte solution that, for the sake of clarity, is not shown. A constant and homogeneous external electric field E = (0, 0, Ez) parallel to the pore axis is applied. (b) Average current blockage ΔI/I0 = (I0 − I)/I0, with I the average current measured with the homopeptide inside the pore and I0 the empty pore value, for four different homopeptides, Ala, Phe, Gln, Trp. The data are obtained by averaging the current blockades of five replicas for each homopeptide. Error bars are estimated by considering current blockades from independent replicas as independent measurements. (c) Molecular structure and Van der Waals volume of the four amino acids45.

Figure 1b shows the average current blockage ΔI/I0 for each homopeptide, obtained by averaging the current blockades of the five replicas, while error bars are estimated by considering current blockades from independent replicas as independent measurements. Fig. S1a, instead, reports ΔI/I0 for each single replica. As expected, ΔI/I0 roughly reflects the steric hindrance of each amino acid, see Fig. 1c. Indeed, the lowest blockage corresponds to Ala (VdW volume, VA = 88.6 Å3)40 and the largest to Trp (VW = 227.8 Å3), while the Gln (VQ = 143.9 Å3) and Phe (VF = 189.9 Å3) blockages lie between the Ala and Trp values.
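The averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code of the study: the function names are hypothetical, the currents are placeholder numbers rather than simulation results, and the error bar is assumed here to be the standard error of the mean over replicas (the text only states that replicas are treated as independent measurements).

```python
from math import sqrt
from statistics import mean, stdev


def blockage(i_clogged, i_open):
    """Relative current blockage, Delta I / I0 = (I0 - I) / I0."""
    return (i_open - i_clogged) / i_open


def replica_average(values):
    """Mean and standard error of the mean over independent replicas.

    Assumption: the error bar is the standard error; the source does not
    specify which estimator was used.
    """
    return mean(values), stdev(values) / sqrt(len(values))


# Placeholder currents in nA (NOT values from this study).
I0 = 2.0
replica_currents = [0.50, 0.60, 0.55, 0.45, 0.40]

blockages = [blockage(i, I0) for i in replica_currents]
avg, sem = replica_average(blockages)
print(f"blockage = {avg:.3f} +/- {sem:.3f}")
```

Computing the blockage per replica first and averaging afterwards keeps the replica-to-replica spread visible, which is the quantity reported in Fig. S1a.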

Interestingly, significant differences among replicas of the same homopeptide are found for Ala, Gln, and Phe, while Trp replicas do not show any significant variability, see Section S1 and Fig. S1b of Supporting Information. This can be explained in terms of the capability of smaller amino acids to explore a larger number of conformations inside the pore.

Electrolyte occupancy

The results presented above indicate that the size of the side chain is correlated with the current blockage: the larger the side chain, the deeper the current drop. A similar result was also found for DNA in MspA (another biological pore used for sensing) by Bhattacharya et al.41, who showed that the number of water molecules displaced from the nanopore by the DNA determines the ionic current blockade, whereas the steric and conformational (base-stacking) properties of the DNA determine the amount of water displaced.

To better investigate the role of peptide conformation on the current drop, we formulated the following simple theoretical model. In a quasi-1D continuum description, the pore resistance is expressed as

$$R={\int }_{0}^{L}\frac{\rho (z)}{A(z)}\,dz,$$
(1)

where the z-axis coincides with the pore axis, the pore goes from z = 0 to z = L, ρ(z) is the electrolyte resistivity and A(z) is the area of the pore section available to the electrolyte. Access resistances are neglected.

To estimate A(z) from our non-equilibrium runs, we divided the system into cubic cells of size Δx = Δy = Δz = 1 Å and, for each frame, we used the VMD Volmap plug-in42 to compute the occupancy map of the electrolyte, mx,y,z, where x, y, z indicate the cell, mx,y,z = 1 if the cell is within a Van der Waals radius of at least one water or ion atom and mx,y,z = 0 elsewhere. Then, we averaged mx,y,z over all frames and normalized it with the bulk value. The resulting averaged and normalized occupancy map is indicated with Mx,y,z. As already discussed in Aksimentiev and Schulten38, “electrolyte pockets” are present close to the constriction, see Fig. S2. The pockets do not contribute to the ion current. To filter out these pockets, we defined a trans → cis available channel as the pore region accessible to the electrolyte when moving from the barrel entrance towards the vestibule. This procedure excludes reentrant pockets directed towards the trans side, see Fig. S2 of Supporting Information. The same procedure is applied to get a cis → trans accessible pore, and the final occupancy map \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) is obtained as the intersection of the trans → cis and cis → trans accessible pores, see Section S2 of Supporting Information for details. Figure 2c reports slices of \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) for the four homopeptides. The regions available for the electrolyte transport between the two sides of the membrane are indicated in blue.
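The pocket-filtering step can be illustrated with a small directional flood fill. The exact procedure is given in Section S2 of the Supporting Information, which is not reproduced here, so the sketch below is only one plausible reading: a cell is taken as accessible in the trans → cis direction if it is occupied by electrolyte and can be reached from the entrance slice by moving in-plane or towards increasing z, never backwards. The function names, the binary input map and the 4-neighbour in-plane connectivity are all assumptions made for illustration.

```python
from collections import deque


def directional_access(occ):
    """Cells reachable from slice z = 0 moving in-plane or towards larger z.

    occ[z][y][x] is True where the (thresholded) occupancy map marks
    electrolyte. Returns a mask of the same shape.
    """
    nz, ny, nx = len(occ), len(occ[0]), len(occ[0][0])
    acc = [[[False] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        # Seeds: all occupied cells of the entrance slice, or occupied cells
        # lying directly ahead of an accessible cell of the previous slice.
        queue = deque(
            (y, x)
            for y in range(ny)
            for x in range(nx)
            if occ[z][y][x] and (z == 0 or acc[z - 1][y][x])
        )
        for y, x in queue:
            acc[z][y][x] = True
        # In-plane flood fill (4-connectivity) restricted to this slice.
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if (0 <= yy < ny and 0 <= xx < nx
                        and occ[z][yy][xx] and not acc[z][yy][xx]):
                    acc[z][yy][xx] = True
                    queue.append((yy, xx))
    return acc


def accessible_channel(occ):
    """Intersection of the trans -> cis and cis -> trans accessible regions."""
    forward = directional_access(occ)
    backward = directional_access(occ[::-1])[::-1]
    return [
        [[f and b for f, b in zip(f_row, b_row)]
         for f_row, b_row in zip(f_plane, b_plane)]
        for f_plane, b_plane in zip(forward, backward)
    ]
```

In this reading, a dead-end pocket that opens towards the cis side is reached in the cis → trans pass but not in the trans → cis pass, so the intersection removes it; a pocket opening towards the trans side is removed symmetrically.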

Figure 2

Accessible volume estimation. (a) The panel reports a cut of the 3D averaged occupancy map \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) for the empty pore. Blue areas correspond to regions that are fully accessible to the electrolyte, \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}=1\), while white ones do not contribute to the volume useful for ion transport between the two sides of the membrane, \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}=0\). (b) Inverse of the accessible area Az along the pore. The empty pore profile is plotted as a dashed black line. The two peaks at \(z\simeq 50\) Å and \(\simeq 20\) Å correspond to the central αHL constriction and to the constriction close to the barrel entrance, respectively. The five solid lines refer to the five Ala replicas. (c) Slices of \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) normal to the pore z-axis passing through the two constrictions (z = 19 Å and z = 48 Å) and the vestibule (z = 70 Å) for the four homopeptides.

The occupancy map \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) allows direct estimation of the pore section A(z) that can be calculated summing \({\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\) on slices of width Δz normal to the pore axis, in formula

$${A}_{z}=\sum _{{\rm{x}},{\rm{y}}}\,{\tilde{M}}_{{\rm{x}},{\rm{y}},{\rm{z}}}\,{\rm{\Delta }}x\,{\rm{\Delta }}y\mathrm{.}$$
(2)

Consequently, the resistance, Eq. (1), can be approximated as

$$\tilde{R}=\sum _{i=1}^{{N}_{{\rm{z}}}}\,\frac{\rho }{{A}_{{\rm{z}}}}{\rm{\Delta }}z,$$
(3)

where i = 1 and i = Nz correspond to the slice at pore trans and vestibule entrances, respectively, and we assumed that the resistivity ρ is constant along the pore. A similar quasi-1D model was recently applied in43.
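Equations (2) and (3) translate directly into two short sums. The sketch below assumes the filtered occupancy map is available as a nested list indexed as mask[z][y][x], with entries equal to 1 (accessible) or 0; the function names and the default unit resistivity are illustrative choices, not part of the original protocol.

```python
def section_areas(mask, dx=1.0, dy=1.0):
    """Eq. (2): A_z = sum over x, y of M~_{x,y,z} * dx * dy, one value per slice.

    mask[z][y][x] holds the filtered occupancy (1 accessible, 0 otherwise);
    dx and dy are the cell sizes in Angstrom.
    """
    return [sum(sum(row) for row in plane) * dx * dy for plane in mask]


def pore_resistance(areas, rho=1.0, dz=1.0):
    """Eq. (3): R~ = sum over slices of rho / A_z * dz, with constant rho."""
    if any(a <= 0 for a in areas):
        raise ValueError("a slice with zero accessible area blocks the pore")
    return sum(rho / a * dz for a in areas)


# Toy map with two slices: 4 accessible cells, then 2 accessible cells.
toy_mask = [
    [[1, 1], [1, 1]],
    [[1, 0], [1, 0]],
]
areas = section_areas(toy_mask)
print(areas, pore_resistance(areas))
```

Because the resistance is a sum of 1/A_z terms, the narrowest slices dominate, which is why the constrictions appear as peaks in the inverse-area profile.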

Figure 2b reports the inverse of the available section profile, \({A}_{{\rm{z}}}^{-1}\), for the five Ala replicas (solid lines) and for the empty pore (black dashed line). In the vestibule region, z ∈ (60, 100) Å, the difference between the empty and the clogged αHL is negligible, indicating that the contribution of the portion of the homopeptide in the vestibule region to the pore resistance, Eq. (3), is nearly irrelevant. More evident differences are present in the barrel region, z ∈ (5, 50) Å, and, in particular, in the main αHL constriction, \(z\simeq 50\) Å. Interestingly, A(z)−1 also has a peak at \(z\simeq 20\) Å. This is due to the non-polar Leu-135 residues that form an isolated hydrophobic ring inside the barrel. Since the available section for the electrolyte passage is smaller in this region, we will refer to it as the secondary barrel constriction. The hydrophobic nature of the Leu-135 ring was shown to be relevant for DNA translocation through αHL44.

To quantify the correlation between the pore clogging and the ionic current, we defined the pore clogging estimator as

$$b=1-\frac{{\tilde{R}}_{0}}{\tilde{R}},$$
(4)

where \({\tilde{R}}_{0}\) refers to the empty pore. Equation (4) is inspired by the definition of the current blockage. Indeed, ΔI/I0 = 1 − I/I0, hence, using Ohm's law at fixed voltage, ΔI/I0 = 1 − R0/R. The model discussed above is based on several hypotheses that are violated by the actual αHL pore shape. In particular, the continuum assumption is not justified at the nanoscale; moreover, the model implicitly assumes a smooth variation of A(z) along the pore axis. In addition, we considered a homogeneous electrolyte resistivity ρ. Nevertheless, although a strict quantitative agreement between the blockage ΔI/I0 and b is not expected, b turns out to be highly correlated with the measured ΔI/I0 (Pearson correlation coefficient r = 0.8), see Fig. 3.
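As a minimal sketch of Eq. (4) and of the correlation test, the snippet below computes b from two resistances and the Pearson coefficient between two lists of values. The function names are hypothetical and no data from the study are used; any inputs passed to these functions would have to come from the reader's own analysis.

```python
from math import sqrt


def clogging_estimator(r_clogged, r_empty):
    """Eq. (4): b = 1 - R~0 / R~, with R~0 the empty-pore resistance."""
    return 1.0 - r_empty / r_clogged


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

Since b depends only on the ratio of two resistances computed with the same constant ρ, the resistivity cancels and never needs to be specified: b is fixed by the accessible-area profiles alone.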

Figure 3

Pore clogging b vs. measured current blockage ΔI/I0. The linear regression line is reported in dashed blue, Pearson correlation coefficient r = 0.8.

Pore clogging for all amino acids

Stimulated by the good correlation between the measured ΔI/I0 and the pore clogging estimator b, Eq. (4), we looked for a less computationally demanding strategy to estimate b. We repeated the protocol described in the previous section using, as input data, 64 ns equilibrium runs (Ez = 0, only the last 32 ns used for statistics) instead of the original 240 ns non-equilibrium trajectories. The resulting equilibrium pore clogging estimator is indicated as beq. The results discussed in Section S3 of the Supporting Information show that, although the value of b changes slightly when using equilibrium or non-equilibrium runs as input, the correlation is still very good for Ala, Gln and Trp, while deviations are obtained for Phe which, at equilibrium, shows a larger clogging compared to the non-equilibrium runs. Figure S3c reports the equilibrium clogging profile for Phe. It is apparent that for replica F1 the clogging in the main αHL constriction is much higher than for the other replicas. We argue that this single outlier is responsible for the large deviation between the equilibrium and non-equilibrium average clogging for Phe.

The relatively small computational cost of the equilibrium simulations needed to estimate beq allows us to explore the blockage features of all the amino acids. For each homopeptide, we ran five different replicas. Figure 4a shows the pore clogging estimator beq vs. the apparent amino acid volume Va45. A very good correlation is evident for all uncharged residues, while charged residues lie below the regression line. Indeed, charged residues leave more room for the electrolyte solution than uncharged ones. Similarly, although less evidently, polar residues (green) show, on average, a lower beq than hydrophobic ones (yellow), see Section S4 of the Supplementary Material for a statistical analysis. This can be explained as a combination of two concomitant effects. First, hydrophobic, hydrophilic and charged residues affect the structure of the first shells of the electrolyte solution surrounding them in different ways. Indeed, concerning water molecules, hydrophilic and charged residues induce a more compact layering than hydrophobic ones, see, e.g.46. Secondly, charged peptides are slightly more stretched than uncharged ones (see Section S5), increasing the effective cross-section of the clogged pore available for electrolyte motion. In addition, we observed that charged homopeptides also induce an overall increase in the total number of ions inside the pore. Indeed, the ratio between ions and water molecules inside the narrow pore region (barrel plus constriction) is 0.061 ± 0.002 for charged residues and 0.035 ± 0.001 for uncharged ones. These values can be compared with the empty pore one, 0.041, indicating that, despite the confinement, charged residues are able to partially carry their counterion shells inside the narrowest regions of the pore.
In summary, on average, for a similar amino acid volume, the pore clogging is minimum for charged homopeptides and it progressively increases moving to hydrophilic and hydrophobic residues. This evidence suggests that, although the main feature controlling the pore clogging is the volume of the amino acid, charge and hydrophobicity also play a role. For completeness, Fig. S4 reports the inverse area profiles for each charged residue and for the corresponding hydrophobic residue with a similar volume, while the correlation between the amino acid accessible surface area S47 and the pore clogging beq is reported in Fig. S5.

Figure 4

Pore clogging estimator beq for all residues vs. the amino acid volume Va. Yellow circles correspond to hydrophobic residues, green squares to polar ones, blue up-triangles to positively charged residues and red down-triangles to negatively charged ones. The dashed line is the least-squares fit. Panel a reports the beq calculated on the entire pore, while panel b refers to the beq calculated removing the last part of the barrel including the secondary constriction, see the sketch in the inset. Error bars are estimated by considering beq from independent replicas as independent measurements and they are reported only when larger than the symbols.

Concerning the uncharged residues, the most evident outlier in Fig. 4a is Leu. Indeed, although its volume is the same as that of its isomer Ile, its beq is much larger. A close inspection of the inverse of the available section profile 1/A(z) indicates that this discrepancy is mainly due to clogging of the secondary barrel constriction, \(z\simeq 20\) Å. In particular, we observed that in some replicas the Leu homopeptide forms a short α-helix in the portion that occupies the secondary constriction, see Fig. S6.

The effect of the secondary barrel constriction can, in principle, be eliminated using a truncated αHL such as the one reported in48, where it was shown that αHL pores are stable even when a large portion of the trans side of the barrel is deleted. We explored this possibility with our model by calculating the summation in Eq. (3) only for the αHL region going from the residues Ile 136 to Asn 123, approximately 20 Å from the native trans barrel entrance, to the vestibule. Figure 4b reports the corresponding beq vs. Va plot, where Leu lies close to the regression line.

It is worth noting that recent experiments indicate that αHL is able to distinguish among three-block peptides where the central neutral residues were Alanine and Tryptophan35, or Isoleucine and Serine49. Moreover, very recently Piguet et al.9 showed that the Aerolysin nanopore is also able to discriminate between two different ten-residue-long homopeptides made of Arginine (R) and Lysine (K) and one heteropolymeric peptide, (K)5-(R)5. Taken together, the cited experimental results and our simulations suggest that biological pores can potentially be employed for protein sequencing, although several challenging issues, such as the translocation control, need to be solved11.

Conclusion

Nanopore-based protein sequencing devices have two fundamental requirements: (i) the signal-to-monomer matching, which implies that the capture and the translocation speed need to be controlled, and (ii) the distinguishability of the signals associated with the different amino acids11. In the present study, we focused on the distinguishability of different amino acids in α-Hemolysin. As a preliminary case, we studied homopeptides occupying the whole pore. We first performed an extensive set of non-equilibrium all-atom MD simulations to calculate the ion current blockade induced by four different homopeptides. Inspired by a quasi-1D model for pore conductance, we defined the pore clogging estimator and showed that it correlates with the observed current blockage from non-equilibrium runs. The estimation of relative conductance is a factor of four less computationally demanding than non-equilibrium runs, allowing us to explore all the 20 standard amino acids. Our results show that amino acid volume is the main feature that rules the pore clogging and, consequently, the current blockage. In addition, our results indicate that charge and hydrophobicity also play a role. Indeed, for similar amino acid volumes, charged residues are associated with a smaller pore clogging than uncharged ones, and slight, but significant, differences are also observed between hydrophobic and polar amino acids. Our results suggest that αHL is potentially able to discern among the different residues. For some sets of residues with very similar volume, however, the pore clogging values are very close, and so are the expected current blockage signals. In these cases, long current recordings and signal post-processing would be required to distinguish among them11.

Furthermore, our study provides a set of structural and chemical-physical information about nanopore protein sequencing that can pave the way to improving the distinguishability of the signal associated with a single amino acid. Our simulation protocol can be easily generalized to other pores, or used to systematically study the effect of modifications of the αHL pore, with the aim of proposing mutations designed ad hoc to amplify the signal differences among the 20 amino acids or to reduce the noise (as discussed, for instance, concerning the cut of the last part of the barrel)48. Indeed, this kind of membrane protein engineering is already routinely used for different biotechnological applications.

System setup

All-atom Molecular Dynamics (MD) simulations were performed using the NAMD software50. The CHARMM36 force field51 was employed to model lipid, protein, and TIP3P water molecules52. NBFIX corrections were applied for ions53.

The membrane-αHL system was assembled using a protocol similar to the one used in other works38,54,55. In brief, the system was built starting from the αHL crystal structure PDB_ID: 7AHL37 downloaded from the OPM database56. The POPC lipid membrane, the water molecules, and the ions for neutralizing the system were added using VMD42. Then, the system was minimized and a 60 ps NVT simulation (time step 0.2 fs) was run with external forces applied to the water molecules to avoid their penetration into the membrane and the pore. Lipid heads were constrained to their initial position by means of harmonic springs (spring constant k = 1 kcal/(mol Å2)) acting on the phosphorus atoms. A second equilibration run (1 ns NPT flexible cell, time step 1 fs) was performed to compact the membrane. During this run, the lipid heads were unconstrained. The third, and last, equilibration step consisted of an NPT constant-area simulation (2 ns, time step 2 fs) where all the atoms were unconstrained and no external forces acted on the water molecules. The resulting periodic box, after the equilibration, has the following size: Lx = 127.5 Å, Ly = 127.1 Å, and Lz = 180.0 Å, and the total number of atoms is ~290000. Initial configurations of the peptides were generated using the PEPFOLD server57 and then separately equilibrated in a triperiodic water box. Then, the two systems were merged, ions (2M KCl) were added using VMD, and a short NPT equilibration was performed (2 ns, constant area NPT) until Lz reached a stationary value. The resulting box has dimensions Lx = 127.5 Å, Ly = 127.1 Å (i.e. the same as the original equilibrated αHL-membrane box), while \({L}_{{\rm{z}}}\simeq 186.2\) Å (slightly different values are obtained for each homopeptide), and the overall number of atoms is ~310000.

Peptide insertion

For each replica, a dedicated Steered Molecular Dynamics (SMD) simulation was employed to bring the peptide to the pore’s lumen entrance (trans side) and, then, into the pore. In particular, the peptide N-terminus was placed at ~15 Å from the αHL’s trans entrance and then pulled inside the nanopore using a constant-velocity SMD simulation at pulling speed vSMD = 0.025 Å/ps.

The total SMD simulation time is tSMD = 17 ns, which corresponds to a displacement of the pulled atom of ΔSMD = 425 Å, during which the peptide crosses the αHL two times. The initial configurations for the subsequent non-equilibrium (E > 0) and equilibrium (E = 0) production runs for, respectively, ionic current and beq measurements (see next section) were chosen among the ones of the second passage. Since such an SMD method can force the polymer to adopt highly stretched conformations58, we checked that the homopeptide relaxes toward equilibrium by computing the time evolution of the gyration radius. The average relaxation time is \(\simeq 10\) ns, see Section S6 and Fig. S7 of Supporting Information for details; hence, in the production runs, we discarded the first part of the simulations from the average calculation. Moreover, for selected homopeptides, we also repeated the pulling protocol in the opposite direction and repeated the calculation of the pore clogging beq, see Section S5 and Table S3. No significant differences are observed, suggesting that we sampled an equilibrium state and not a highly stretched conformation induced by the SMD. For the reader's convenience, we mention that protocols that allow solutes to be introduced inside biological nanopores while reducing possible conformational distortions induced by the pulling force have been proposed in the literature58,59.
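Two quantities in this paragraph are simple enough to check by hand: the displacement of the pulled atom, ΔSMD = vSMD · tSMD = 0.025 Å/ps × 17000 ps = 425 Å, and the gyration radius used to monitor relaxation. The sketch below reproduces both under stated assumptions: the function names are hypothetical, and the gyration radius is taken here as the mass-weighted definition, which the text does not specify.

```python
from math import sqrt


def pulled_distance(speed_A_per_ps, time_ns):
    """Displacement (Angstrom) of the pulled atom in constant-velocity SMD."""
    return speed_A_per_ps * time_ns * 1000.0  # 1 ns = 1000 ps


def radius_of_gyration(coords, masses):
    """Mass-weighted gyration radius of a set of atoms.

    coords: list of (x, y, z) tuples in Angstrom; masses: matching list.
    Assumption: mass weighting; an unweighted variant would use unit masses.
    """
    total = sum(masses)
    com = [sum(m * r[k] for m, r in zip(masses, coords)) / total
           for k in range(3)]
    sq = sum(m * sum((r[k] - com[k]) ** 2 for k in range(3))
             for m, r in zip(masses, coords))
    return sqrt(sq / total)


print(pulled_distance(0.025, 17.0))  # close to 425 Angstrom
```

Evaluating radius_of_gyration frame by frame along a trajectory gives the time series from which a relaxation time can be read off.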

Current measurement

We then selected the frame in which the central residue of the homopeptide is closest to the pore constriction, defined as the average position of the seven copies of amino acid Met-113 of the αHL heptamer. This configuration was used for the non-equilibrium runs, where a uniform and constant external electric field E = (0, 0, Ez) was applied perpendicularly to the lipid bilayer. This protocol was shown to be equivalent to the application of a constant voltage ΔV = EzLz38,60,61. Each simulation was run for 240 ns and snapshots were saved every Δt = 40 ps. The average current in the interval [t, t + Δt] is estimated as

$$I(t)=\frac{1}{{\rm{\Delta }}t\,{L}_{{\rm{z}}}}\sum _{i=1}^{N}\,{q}_{{\rm{i}}}[{z}_{{\rm{i}}}(t+{\rm{\Delta }}t)-{z}_{i}(t)]$$
(5)

where qi and zi are the charge and the z-coordinate of the i-th atom, respectively. The K+ and Cl− currents were computed by restricting the sum to the atoms of the corresponding type38. The mean current is obtained via a time average of I(t) after discarding a transient of 64 ns. Details on the statistical comparison of the current traces are reported in Section S1 of the Supporting Information. As often occurs in all-atom simulations, the driving voltage does not reflect the typical experimental conditions, but it is necessary for reducing the noise-to-signal ratio of the ionic current measurements. Current measurements were carried out at ΔV = 1 V. Although at ΔV = 1 V we may be outside of the linear response region, see e.g. the extensive simulations at various voltages reported in55,62, we expect that the relative blockage ΔI/I0 is not strongly affected by the large ΔV.
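Equation (5) and the subsequent time average can be sketched as follows. This is an illustration under explicit assumptions rather than the analysis code of the study: function names are hypothetical, charges are in units of the elementary charge, lengths in Å and times in ps, and displacements are corrected with a minimum-image step along z in case coordinates are wrapped into the periodic box (the source does not say how wrapping was handled).

```python
# 1 e/ps = 1.602176634e-19 C / 1e-12 s = 160.2176634 nA
E_PER_PS_TO_NA = 160.2176634


def ionic_current(charges, z_start, z_end, dt_ps, lz):
    """Eq. (5): I = 1 / (dt * Lz) * sum_i q_i [z_i(t + dt) - z_i(t)], in nA.

    charges in e, z coordinates and lz in Angstrom, dt_ps in ps.
    Assumption: a minimum-image correction undoes periodic wrapping along z.
    """
    total = 0.0
    for q, z0, z1 in zip(charges, z_start, z_end):
        dz = z1 - z0
        dz -= lz * round(dz / lz)  # minimum image along z
        total += q * dz
    return total / (dt_ps * lz) * E_PER_PS_TO_NA


def mean_current(trace, dt_ps=40.0, transient_ns=64.0):
    """Time average of I(t) after discarding the initial transient."""
    skip = int(round(transient_ns * 1000.0 / dt_ps))  # 64 ns / 40 ps = 1600
    kept = trace[skip:]
    if not kept:
        raise ValueError("trace is shorter than the discarded transient")
    return sum(kept) / len(kept)
```

Passing only the K+ or only the Cl− atoms to ionic_current gives the per-species currents, mirroring the restriction of the sum described in the text.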