Sequencing proteins with transverse ionic transport in nanochannels

{\it De novo} protein sequencing is essential for understanding cellular processes that govern the function of living organisms and all post-translational events and other sequence modifications that occur after a protein has been constructed from its corresponding DNA code. By obtaining the order of the amino acids that composes a given protein one can then determine both its secondary and tertiary structures through structure prediction, which is used to create models for protein aggregation diseases such as Alzheimer's Disease. Mass spectrometry is the current technique of choice for {\it de novo} sequencing. However, because some amino acids have the same mass the sequence cannot be completely determined in many cases. Here, we propose a new technique for {\it de novo} protein sequencing that involves translocating a polypeptide through a synthetic nanochannel and measuring the ionic current of each amino acid through an intersecting {\it perpendicular} nanochannel. To calculate the transverse ionic current blockaded by a given amino acid we use a Monte Carlo method along with Ramachandran plots to determine the available flow area, modified by the local density of ions obtained from molecular dynamics and the local flow velocity ratio derived from the Stokes equation. We find that the distribution of ionic currents for each of the 20 proteinogenic amino acids encoded by eukaryotic genes is statistically distinct, showing this technique's potential for {\it de novo} protein sequencing.


Supplementary Information
Molecular Dynamics The amino acid is centered along the z-axis according to the geometric center in z of its terminal N and C atoms, while the molecule is centered in the xy-plane according to the geometric center in x and y of its terminal N atom and a nearest neighboring amino acid's terminal N atom. To fix the rotation angle between amino acids, as a convention, the terminal N atoms always have y = 0, as is the case in Fig. 1. Since PRO has more rigid dihedral angles, we need to center it with the help of two neighboring GLYs, which have flexible dihedral angles, on each side of a single PRO. The nearest neighbor GLYs are configured to have dihedral angles that compensate for those of PRO while the farthest neighbor GLYs are configured to be straight, so that PRO and the two straight GLYs are directed along the longitudinal axis while the two straight GLYs are aligned in the xy-plane. As a result, we can center and then isolate PRO by using the usual centering method on the geometric average of the two straight GLYs, staying consistent with the choice of angles for the rest of the amino acids.
Once the amino acid is isolated, we solvate the system into a right hexagonal prism with regular hexagonal xy-planes having a height of 11 nm and an apothem of 5.9 nm to be used in NAMD2 with periodic boundary conditions in all three dimensions of space. This configuration gives every atom from every amino acid at least 4.8 nm of water padding in the unit cell, or in other words at least 9.6 nm of water between any atom and the closest atom in any neighboring periodic image. We then passivate and ionize the system to about 1 M of KCl, a typical biological solute. The size of the ions will certainly change the average local concentrations near the amino acid, which may then affect the ionic transport. We utilize the CHARMM22 with CMAP force field 1, 2 for all of the amino acid, TIP3P water, and ion interactions. Each amino acid was held fixed throughout the run so that it would not diffuse around and the surrounding solution could equilibrate and be analyzed consistently. After equilibrating at 0 K and progressively ramping up the temperature to 310 K, the system is allowed to evolve in an NPT ensemble first for 1 ns followed by an NVT ensemble for 5 ns, all with 1 fs time steps and 1 ps coordinate recordings. The temperature is held fixed using a Langevin thermostat with a damping coefficient of 5 ps −1 . The first ns of the NVT production run is discarded as transient, leaving 4 ns of run time, or 4000 coordinate snapshots, to analyze radial concentration profiles.
Velocity calculation Due to the small length scales of the intersecting portion of our nanochannel system, we can use the Stokes equation, where µ is the dynamic viscosity of the fluid, E ⊥ is the external electric field applied in the y direction,v i is the transverse velocity through the cross section,ĝ i is the local number density, and ion-ion interactions are ignored. Here the flow velocity is independent of y due to the fact that the transverse nanochannel length is much larger than the diameter of the longitudinal nanochannel, 2R, and that 2R is comparable to the diameter of the transverse nanochannel (as depicted in Fig. 1). Independence from z is similarly due to the longitudinal nanochannel's length being much larger than 2R but we must also choose R to be large enough for ions to diffuse along z after they enter the longitudinal channel at y = ±R. This will make any variation along z have negligible impact on the end result of an average v i over all rotational conformations. In our case we simply use the dynamic viscosity of water (µ = 7.5 × 10 −4 Pa · s), even though the viscosity of water with ions will vary slightly, 3 and a reasonable value of E ⊥ = 5 × 10 8 N/C taken from. 3 However, we must transform this equation into one that depends on r > to obtain v i . Since v i and g i are averages over all rotational orientations, the problem is condensed to the region x > 0 and [θ i , θ f ].
With θ i and θ f close enough to π/2, we can approximate r > as x − x • , where x • is some constant, since at least for the amino acid backbone the contour lines of r > resemble those of x. With these approximations we obtain This will give us the rough form of v i /v i,b between the following boundary conditions: r > = (R − r • )/2 is approximately halfway between the vdW surface and the longitudinal nanochannel surface and is also our upper bound on r > as the domain of v i . To obtain v i for the entire range necessary for Eq. (2), we require that r b =r > = (R − r • )/2 sincer > must be in the bulk as well. From our pRDF plots (see Fig. 2) we learn that the bulk concentrations start at approximately r > = 15Å, meaning that r b ≥ 15Å. Therefore for our model to work we have to take R ≥ 30+max{r • } = 34.16 A, where the max is over all amino acids, and then in the interest of minimizing the bulk ionic current we choose R = 35Å.

3/4
Monte Carlo The Ramachandran plots that we use in our Monte Carlo calculations account for a 250 pN longitudinal force that is applied to the polypeptide chain (ubiquitin and polyglycine in 4 ) to pull it through the nanochannel. The pulling force acts to limit the phase space available to the dihedral angle pair (φ , ψ), making the configurations that are close to straight (ψ = φ = 180 • ) much more appealing. 4 PRO and GLY have significantly different plots from the rest of the amino acids due to how the side chain of PRO bonds with its own amine nitrogen, part of the amino acid backbone, leading to restricted dihedrals while GLY has a hydrogen instead of a side chain leading to more freedom in the dihedrals. This way, while the rest of the amino acids are described by the Ramachandran plot of ubiquitin, which contains all 20 of the proteinogenic amino acids encoded by eukaryotic genes and well represents 18 of them, we describe PRO with the (φ , ψ) plot from the isolated PRO values within ubiquitin and GLY with the Ramachandran plot of the polyglycine analog of ubiquitin. 4 With these Ramachandran plots we use Monte Carlo sampling to obtain (φ , ψ) pairs that we then implement on a lone amino acid, where the number of realizations is dependent on the size of the domain of the Ramachandran plot (1408 realizations for ubiquitin). We also rotate the amino acid in φ ′ ∈ [0, 2π) by all multiples of π/12. Then the amino acid is projected onto the y = 0 plane and using Monte Carlo (1000 realizations) we calculate the area outside of the effective surface, called A i , yet within either π/2 sector of radius R/2 centered around z = 0, where the ion i will fit according to its vdW radius.

Maximum velocity
Since v b is the maximum velocity between the amino acid and the longitudinal nanochannel surface as aforementioned, we can use Eq. (S1) to obtain the max ofv, which is equivalent to v b . In this case we focus on the velocity within the bulk region, namely from the midpoint between the amino acid and the channel surface, x = x mid = (R + r • )/2, to the channel surface, x = R. We employ the following boundary conditions, which are very similar to Eq. (S3). By assuming a constant bulk concentration, g b , over this region we quickly come to a parabolic solution to Eq. (S1) as well as determining v b = qzg b E ⊥ (R − r • ) 2 /4µ, wherez i has been simplified toz since both ion species have the same valency. Then we have v b = 154.47 m/s by choosing a reasonable r • = 4Å, a necessity in making v b independent of the amino acid under study, which is more likely in experiment. In fact, the absolute value of the velocity determined from the Stokes equation is known to differ from experiment, 3 as opposed to the velocity ratio that we have utilized thus far. However, these differences appear to be systematic, 3 and can be solved by dividing v b in half, resulting in the corrected v b = 77.23 m/s.