Abstract
Natural proteins are composed of 20 proteinogenic amino acids and their post-translational modifications (PTMs). However, because no suitable nanopore sensor has been able to discriminate simultaneously between all 20 amino acids and their PTMs, direct sequencing of proteins with nanopores has not yet been realized. Here, we present an engineered hetero-octameric Mycobacterium smegmatis porin A (MspA) nanopore containing a single Ni2+ modification. It enables full discrimination of all 20 proteinogenic amino acids and 4 representative modified amino acids: Nω,N’ω-dimethyl-arginine (Me-R), O-acetyl-threonine (Ac-T), N4-(β-N-acetyl-d-glucosaminyl)-asparagine (GlcNAc-N) and O-phosphoserine (P-S). Assisted by machine learning, an accuracy of 98.6% was achieved. Amino acid supplement tablets and amino acids released from peptides by peptidase digestion were also analyzed using this strategy. This capacity for simultaneous discrimination of all 20 proteinogenic amino acids and their PTMs suggests the potential to achieve protein sequencing using this nanopore-based strategy.
Data availability
Data supporting the findings of this study are given in the main text and the Supplementary Information. All data used to train, evaluate and test the machine learning model are available on figshare (https://figshare.com/articles/software/Amino_acid-classifier/23995890). Source data are provided with this paper.
Code availability
The custom machine learning code is available on figshare as ‘Amino acid-classifier’ (https://figshare.com/articles/software/Amino_acid-classifier/23995890).
Acknowledgements
This project was funded by the National Key R&D Program of China (grant no. 2022YFA1304602, to S.H.), National Natural Science Foundation of China (grant no. 22225405 and no. 31972917, to S.H.), the Fundamental Research Funds for the Central Universities (grant no. 020514380257 to S.H.), Programs for high-level entrepreneurial and innovative talents introduction of Jiangsu Province (individual and group program, to S.H.), Natural Science Foundation of Jiangsu Province (grant no. BK20200009, to S.H.), State Key Laboratory of Analytical Chemistry for Life Science (grant no. 5431ZZXM2204, to S.H.) and the China Postdoctoral Science Foundation (grant no. 2021M691508 and grant no. 2022T150308, to Y.W.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
S.H., K.W. and S.Z. conceived the project. S.Z. and K.W. performed the pore engineering. K.W., X.Y., X.L. and W.S. performed the measurements. X.Z. and W.L. conducted the molecular dynamics simulations. Y.W., P.F. and Y.X. designed the machine learning algorithms. K.W. and Y.W. prepared the supplementary videos. P.Z. set up the instruments. S.H. and K.W. wrote the paper. S.H. supervised the project.
Ethics declarations
Competing interests
S.H., S.Z., K.W. and Y.W. have filed patents describing the preparation of heterogeneous MspA and applications thereof. All other authors have no competing interests.
Peer review
Peer review information
Nature Methods thanks Jeff Nivala, Sukanya Punthambaker and Meni Wanunu for their contribution to the peer review of this work. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Simultaneous sensing of leucine and isoleucine.
The measurements were carried out as described in Methods. A 1.5 M KCl buffer (1.5 M KCl, 10 mM CHES, pH 9.0) was used. A transmembrane voltage of +100 mV was continuously applied. Nickel sulfate was added to trans at a final concentration of 50 μM. (a) The chemical structures of leucine (Leu, L) and isoleucine (Ile, I). Leucine and isoleucine are isomers with identical mass. (b) Top: A representative trace acquired during simultaneous sensing of leucine and isoleucine. Each amino acid was added to cis at a final concentration of 1 mM. Bottom: Representative events of leucine and isoleucine, taken from the positions marked with red arrows in the continuous trace (top). I0 represents the open pore current of MspA-NTA-Ni. Events caused by leucine and isoleucine are easily identifiable. (c) The event scatter plot of ∆I versus S.D. generated from the results in (b). A total of 274 successive events were used to generate the statistics. Although leucine and isoleucine have identical molecular weights, they are fully discriminated by the nanopore.
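As a rough illustration of how each event becomes one point in a ∆I versus S.D. scatter plot, the sketch below reduces a list of current samples to those two values. The definitions used (∆I as the open pore current I0 minus the mean current during the event, and S.D. as the standard deviation of the current during the event) are assumptions made for this example, as are the function name and the sample values; the definitions actually used in the study are given in its Methods.

```python
from statistics import mean, pstdev


def event_features(event_current_pa, open_pore_current_pa):
    """Return (delta_I, sd) for one blockade event.

    Assumed definitions, for illustration only:
      delta_I = I0 - mean current during the event
      sd      = population standard deviation of the current during the event
    """
    delta_i = open_pore_current_pa - mean(event_current_pa)
    sd = pstdev(event_current_pa)
    return delta_i, sd


# Hypothetical current samples in pA, invented for this example.
I0 = 100.0
event_a = [80.0, 82.0, 78.0, 80.0]
event_b = [70.0, 70.5, 69.5, 70.0]

# One (delta_I, sd) pair per event, i.e. one point per event in the scatter plot.
features = [event_features(ev, I0) for ev in (event_a, event_b)]
for delta_i, sd in features:
    print(f"delta_I = {delta_i:.2f} pA, S.D. = {sd:.3f} pA")
```

Two analytes with the same mass can still separate in this plane if they differ in blockade depth, in current noise, or in both.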
Extended Data Fig. 2 Machine-learning assisted identification of twenty-four amino acids.
(a) The machine-learning workflow. Sensing events acquired with twenty proteinogenic amino acids and four modified amino acids were collected to form a database. Three hundred events were randomly selected from each amino acid class to form a labeled dataset. Five event features (ΔI, S.D., skew, kurt and toff) were extracted from the events to form a feature matrix. After evaluation with ten-fold cross-validation, the quadratic SVM model was selected as the optimal model, with a validation accuracy of 98.6% (Supplementary Table 9). (b) The confusion matrix for classification of the twenty-four amino acids with the trained quadratic SVM model. Each row of the matrix represents the true class and each column represents the predicted class. (c) The scatter plot of ∆I versus S.D. generated from nanopore measurements of the 20 proteinogenic amino acids (gray dots) and the four amino acids containing PTMs (colored dots). One hundred successive events of each amino acid were used to generate the statistics. The distribution of the four modified amino acids can be fully discriminated from that of the twenty proteinogenic amino acids.
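The evaluation steps described in (a) and (b) can be sketched with the Python standard library alone. The example below is not the authors' code: it uses synthetic five-feature vectors for three hypothetical classes, and it swaps in a nearest-centroid classifier for the quadratic SVM so that it runs without third-party packages. It shows only the bookkeeping: 300 events per class, a stratified ten-fold split, and a confusion matrix with rows as true classes and columns as predicted classes. All names and numbers in the block are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_synthetic_events(class_centres, n_per_class, rng):
    """Draw n_per_class Gaussian feature vectors around each class centre."""
    data = []
    for label, centre in class_centres.items():
        for _ in range(n_per_class):
            data.append(([rng.gauss(c, 0.5) for c in centre], label))
    return data


def stratified_folds(data, k, rng):
    """Split data into k folds containing equal numbers of each class."""
    by_class = defaultdict(list)
    for item in data:
        by_class[item[1]].append(item)
    folds = [[] for _ in range(k)]
    for items in by_class.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)
    return folds


def fit_centroids(train):
    """Stand-in classifier (not an SVM): mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def cross_validate(data, labels, k, rng):
    """k-fold cross-validation; matrix[true][predicted] counts held-out events."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    folds = stratified_folds(data, k, rng)
    for i, held_out in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        centroids = fit_centroids(train)
        for x, y in held_out:
            matrix[index[y]][index[predict(centroids, x)]] += 1
    correct = sum(matrix[i][i] for i in range(len(labels)))
    return matrix, correct / len(data)


# Hypothetical class centres in the order (dI, SD, skew, kurt, toff).
rng = random.Random(0)
centres = {
    "class_A": [10.0, 1.0, 0.0, 3.0, 5.0],
    "class_B": [20.0, 2.0, 1.0, 4.0, 8.0],
    "class_C": [30.0, 3.0, -1.0, 5.0, 2.0],
}
data = make_synthetic_events(centres, 300, rng)
matrix, accuracy = cross_validate(data, sorted(centres), 10, rng)
for label, row in zip(sorted(centres), matrix):
    print(label, row)
print("validation accuracy:", accuracy)
```

Because every event is held out exactly once, each row of the matrix sums to the number of events in that class, and the diagonal divided by the total gives the cross-validated accuracy. With scikit-learn available, a quadratic SVM could replace the stand-in classifier, for example `sklearn.svm.SVC(kernel="poly", degree=2)`, though whether that matches the authors' implementation is not stated here.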
Supplementary information
Supplementary Information
Materials, Supplementary Tables 1–9, Supplementary Figs. 1–30, References
Supplementary Video 1
Single-channel recording of glycine. The measurements were performed with MspA-NTA-Ni in a 1.5 M KCl buffer (1.5 M KCl, 10 mM CHES, pH 9.0). A voltage of +100 mV was continuously applied. Nickel sulfate was added to trans at a final concentration of 50 μM. Glycine was added to cis at a final concentration of 2 mM. All glycine events are marked with ‘G’ above the trace. The trace is played back at twice the speed of data acquisition. This video demonstrates the consistency of events produced by a single type of amino acid.
Supplementary Video 2
Single-channel recording of histidine. The measurements were performed with MspA-NTA-Ni in a 1.5 M KCl buffer (1.5 M KCl, 10 mM CHES, pH 9.0). A voltage of +100 mV was continuously applied. Nickel sulfate was added to trans at a final concentration of 50 μM. Histidine was added to cis at a final concentration of 2 mM, and two characteristic types of events were immediately observed, marked with ‘H1’ and ‘H2’ above the trace. The trace is played back at twice the speed of data acquisition. This video shows an amino acid that produces two types of events.
Supplementary Video 3
Sensing of an amino acid mixture. The measurements were performed with MspA-NTA-Ni in a 1.5 M KCl buffer (1.5 M KCl, 10 mM CHES, pH 9.0). A voltage of +100 mV was continuously applied. Nickel sulfate was added to trans at a final concentration of 50 μM. For demonstration purposes, five amino acids (glycine, asparagine, isoleucine, arginine and glutamic acid) were used as representative analytes for simultaneous sensing. Each analyte was added to cis at a final concentration of 1 mM. The five amino acids can be clearly distinguished and were automatically recognized by machine learning. The trace is played back at twice the speed of data acquisition. This video shows simultaneous sensing of amino acids that produce visually different event features.
Supplementary Video 4
Simultaneous sensing of asparagine (N) and N4-(β-N-acetyl-d-glucosaminyl)-asparagine (GlcNAc-N). The measurements were performed with MspA-NTA-Ni in a 1.5 M KCl buffer (1.5 M KCl, 10 mM CHES, pH 9.0). A voltage of +100 mV was continuously applied. Nickel sulfate was added to trans at a final concentration of 50 μM. N and GlcNAc-N were simultaneously added to cis, each at a final concentration of 2 mM. Two types of events corresponding to N and GlcNAc-N could be easily identified during the recording, and the identity of each event is labeled on the trace. The trace is played back at twice the speed of data acquisition. This video shows discrimination between modified and unmodified amino acids.
Supplementary Video 5
A cartoon demonstration of the sensing strategy. This demonstration provides a schematic overview of the sensing strategy.
Source data
Source Data Fig. 1
Statistical Source Data
Source Data Fig. 2
Statistical Source Data
Source Data Fig. 3
Statistical Source Data
Source Data Fig. 4
Statistical Source Data
Source Data Fig. 5
Statistical Source Data
Source Data Fig. 6
Statistical Source Data
Source Data Extended Data Fig./Table 1
Statistical Source Data
Source Data Extended Data Fig./Table 2
Statistical Source Data
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, K., Zhang, S., Zhou, X. et al. Unambiguous discrimination of all 20 proteinogenic amino acids and their modifications by nanopore. Nat Methods (2023). https://doi.org/10.1038/s41592-023-02021-8
DOI: https://doi.org/10.1038/s41592-023-02021-8