The identification and quantification of proteins lags behind DNA-sequencing methods in scale, sensitivity, and dynamic range. Here, we show that sparse amino acid–sequence information can be obtained for individual protein molecules for thousands to millions of molecules in parallel. We demonstrate selective fluorescence labeling of cysteine and lysine residues in peptide samples, immobilization of labeled peptides on a glass surface, and imaging by total internal reflection microscopy to monitor decreases in each molecule's fluorescence after consecutive rounds of Edman degradation. The obtained sparse fluorescent sequence of each molecule was then assigned to its parent protein in a reference database. We tested the method on synthetic and naturally derived peptide molecules in zeptomole-scale quantities. We also fluorescently labeled phosphoserines and achieved single-molecule positional readout of the phosphorylated sites. We measured >93% efficiencies for dye labeling, survival, and cleavage; further improvements should enable studies of increasingly complex proteomic mixtures, with the high sensitivity and digital quantification offered by single-molecule sequencing.
Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).
da Costa, J.P., Santos, P.S.M., Vitorino, R., Rocha-Santos, T. & Duarte, A.C. How low can you go? A current perspective on low-abundance proteomics. Trends Analyt. Chem. 93, 171–182 (2017).
Makarov, A. et al. Performance evaluation of a hybrid linear ion trap/orbitrap mass spectrometer. Anal. Chem. 78, 2113–2120 (2006).
Makarov, A., Denisov, E., Lange, O. & Horning, S. Dynamic range of mass accuracy in LTQ Orbitrap hybrid mass spectrometer. J. Am. Soc. Mass Spectrom. 17, 977–982 (2006).
Hawkridge, A.M. in Quantitative Proteomics (eds. Eyers, C.E. & Gaskell, S.) 3–21 (The Royal Society of Chemistry, Cambridge, 2014).
Swaminathan, J., Boulgakov, A.A. & Marcotte, E.M. A theoretical justification for single molecule peptide sequencing. PLOS Comput. Biol. 11, e1004080 (2015).
Yao, Y., Docter, M., van Ginkel, J., de Ridder, D. & Joo, C. Single-molecule protein sequencing through fingerprinting: computational assessment. Phys. Biol. 12, 055003 (2015).
Zhao, Y. et al. Single-molecule spectroscopy of amino acids and peptides by recognition tunnelling. Nat. Nanotechnol. 9, 466–473 (2014).
Wilson, J., Sloman, L., He, Z. & Aksimentiev, A. Graphene nanopores for protein sequencing. Adv. Funct. Mater. 26, 4830–4838 (2016).
Kennedy, E., Dong, Z., Tennant, C. & Timp, G. Reading the primary structure of a protein with 0.07 nm³ resolution using a subnanometre-diameter pore. Nat. Nanotechnol. 11, 968–976 (2016).
Sampath, G. Amino acid discrimination in a nanopore and the feasibility of sequencing peptides with a tandem cell and exopeptidase. RSC Advances 5, 30694–30700 (2015).
Kolmogorov, M., Kennedy, E., Dong, Z., Timp, G. & Pevzner, P.A. Single-molecule protein identification by sub-nanopore sensors. PLOS Comput. Biol. 13, e1005356 (2017).
Edman, P. Method for determination of the amino acid sequence in peptides. Acta Chem. Scand. 4, 283–293 (1950).
Hernandez, E.T., Swaminathan, J., Marcotte, E.M. & Anslyn, E.V. Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing. New J. Chem. 41, 462–469 (2017).
Hermodson, M.A., Ericsson, L.H., Titani, K., Neurath, H. & Walsh, K.A. Application of sequenator analyses to the study of proteins. Biochemistry 11, 4493–4502 (1972).
Phatnani, H.P. & Greenleaf, A.L. Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev. 20, 2922–2936 (2006).
Stevens, S.M. Jr. et al. Enhancement of phosphoprotein analysis using a fluorescent affinity tag and mass spectrometry. Rapid Commun. Mass Spectrom. 19, 2157–2162 (2005).
Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14, 618–630 (2013).
Ohshiro, T. et al. Detection of post-translational modifications in single peptides using electron tunnelling currents. Nat. Nanotechnol. 9, 835–840 (2014).
Nivala, J., Marks, D.B. & Akeson, M. Unfoldase-mediated protein translocation through an α-hemolysin nanopore. Nat. Biotechnol. 31, 247–250 (2013).
Rosen, C.B., Rodriguez-Larrea, D. & Bayley, H. Single-molecule site-specific detection of protein phosphorylation with a nanopore. Nat. Biotechnol. 32, 179–181 (2014).
Wettenhall, R.E., Aebersold, R.H. & Hood, L.E. Solid-phase sequencing of 32P-labeled phosphopeptides at picomole and subpicomole levels. Methods Enzymol. 201, 186–199 (1991).
Pushkarev, D., Neff, N.F. & Quake, S.R. Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847–850 (2009).
Anderson, N.L. & Anderson, N.G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867 (2002).
McLachlin, D.T. & Chait, B.T. Improved beta-elimination-based affinity purification strategy for enrichment of phosphopeptides. Anal. Chem. 75, 6826–6836 (2003).
Laursen, R.A. Solid-phase Edman degradation: an automatic peptide sequencer. Eur. J. Biochem. 20, 89–102 (1971).
Guizar-Sicairos, M., Thurman, S.T. & Fienup, J.R. Efficient subpixel image registration algorithms. Opt. Lett. 33, 156–158 (2008).
Cannon, B., Pan, C., Chen, L., Hadd, A.G. & Russell, R. A dual-mode single-molecule fluorescence assay for the detection of expanded CGG repeats in Fragile X syndrome. Mol. Biotechnol. 53, 19–28 (2013).
Das, S.K., Darshi, M., Cheley, S., Wallace, M.I. & Bayley, H. Membrane protein stoichiometry determined from the stepwise photobleaching of dye-labelled subunits. ChemBioChem 8, 994–999 (2007).
Shimazaki, H. & Shinomoto, S. A method for selecting the bin size of a time histogram. Neural Comput. 19, 1503–1527 (2007).
Mutch, S.A. et al. Deconvolving single-molecule intensity distributions for quantitative microscopy measurements. Biophys. J. 92, 2926–2943 (2007).
We thank B. Cannon and R. Russell for early assistance with single-molecule imaging, M. Gadush for assistance with peptide synthesis, I. Riddington, J. Dinser, and K. Suhr for assistance in mass spectrometry analysis of fluorescently labeled peptides, Z. Simpson and J. Rybarski for assistance with image analysis, A. Ellington for many fruitful discussions, and the Texas Advanced Computing Center for high-performance computing. This work was supported by fellowships from the HHMI (to J.S.) and NSF (DGE-1610403 to A.A.B.), and by grants from DARPA (N66001-14-2-4051 to E.V.A. and E.M.M.), NIH (DP1 GM106408, R01 GM076536, and R35 GM122480 to E.M.M.), CPRIT (to E.M.M.), and the Welch Foundation (F-1515 to E.M.M. and F-0046 to E.V.A.).
J.S., A.M.B., E.M.M., and E.V.A. are cofounders and shareholders of Erisyon Inc. J.S., E.M.M., and E.V.A. are co-inventors on granted US patent PCT/US2012/043769. J.S., A.A.B., E.T.H., J.L.B., A.M.J., E.V.A., and E.M.M. are co-inventors on pending US patent PCT/US2015/050099.
Integrated supplementary information
Supplementary Figure 1 Only select fluorophores exhibit fluorescence stability toward Edman reagents.
(A) Fluorophores (spanning four fluorescent channels, denoted by bar colors) were tested for their percentage change in fluorescence intensity in PBS buffer, following a 24-hour incubation with TFA or pyridine/PITC (shown as pyridine). Dyes marked with boxes exhibited only moderate changes (<20%) in fluorescence. Data are presented as the mean across the replicates shown, with error bars of +/- s.d. where n ≥ 3. (B) Images of fluorophore-coupled Tentagel beads illustrate fluorescence changes caused by Edman reagents. In the case of BODIPY-FL (left panels), the fluorescence intensity decreases with TFA incubation, while there is a spectral redshift with pyridine incubation. In contrast, Atto647N (right panels) is stable in both color and intensity under both conditions. Scale bar, 200 μm.
Supplementary Figure 2 Adaptation of flow-cell, TIRF-microscopy, and computer-controlled fluidic system for Edman sequencing.
We modified a Bioptechs FCS2 perfusion chamber (A) for Edman sequencing by substituting the silicone gaskets as indicated by red arrows, with perfluoroelastomer gaskets resistant to the Edman chemistry. (B) Eleven different polymeric materials sourced through a number of vendors were cut into 2cm x 1cm strips and tested for inertness following 24 hours of TFA incubation. Kalrez-0040 showed the least change in volume, excellent inertness to pyridine/PITC, and good compressibility (shore durometer A = 70); we used it for all subsequent experiments. Teflon (polytetrafluoroethylene) gaskets, although suitably inert, were not compressible and caused leaks when used in the perfusion chamber. Image of flow chamber is adapted from vendor (Bioptechs) supplied image. (C) Edman sequencing was implemented using a syringe pump (3-way valve configuration) and a 10-port multi-position valve system automated to exchange solvents through polytetrafluoroethylene tubing into the imaging perfusion chamber attached to the stage of a TIRF microscope.
Supplementary Figure 3 Bead-based assays confirm bulk Edman sequencing of fluorescently labeled amino acids.
(A) Specific binding of TMR to functionalized Tentagel beads occurs at the periphery, and its density can be measured by image processing. Peripheral bead fluorescence intensities were calculated as the area under the radial fluorescence intensity distribution, normalized to negative-control beads bound non-specifically with free TMR lacking the NHS group to control for background fluorescence. (B) Edman degradation can be used to determine the positions of fluorescently labeled lysine residues in synthetic peptides using bulk fluorescence measurements. Bar charts indicate the normalized average of the fluorescence intensity per bead +/- s.d. across image fields. Raw fluorescence intensities and field counts are reported in Supplementary File 1.
Supplementary Figure 4 Nitrogen-purged methanol with Trolox significantly decreases photobleaching and fluorophore blinking.
Fluorescence loss is greatly reduced for the TMR-labeled peptide (fmoc)-K*A under constant laser illumination in nitrogen-purged 1 mM Trolox in methanol (top, green curve, n=2802) versus imaging in methanol alone (bottom, red curve, n=5942), imaged with consecutive 0.5 sec frames. Solid lines represent fits of each data set to a single exponential decay function. Imaging in methanol/Trolox resulted in a t1/2 of approx. 105 seconds. * indicates TMR conjugated to lysine.
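As a rough illustration of how a half-life such as the approx. 105 s value above follows from a single-exponential fit, the sketch below fits ln(fraction remaining) against time by least squares and converts the rate constant to t1/2 = ln 2 / k. The data are synthetic and noise-free, and the function names are hypothetical; this is not the authors' analysis code.

```python
import math


def fit_single_exponential(times, fractions):
    """Least-squares fit of ln(fraction) = ln(a) - k * t; returns (a, k)."""
    logs = [math.log(f) for f in fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


def half_life(k):
    """Half-life of a single-exponential decay with rate constant k."""
    return math.log(2) / k


# Synthetic decay sampled every 0.5 s (the frame time quoted in the caption),
# generated with an assumed half-life of 105 s.
true_k = math.log(2) / 105.0
times = [0.5 * i for i in range(600)]
fractions = [math.exp(-true_k * t) for t in times]

a, k = fit_single_exponential(times, fractions)
print(f"fitted amplitude = {a:.3f}, t1/2 = {half_life(k):.1f} s")
```

A log-linear fit is used here only because it needs no external libraries; with noisy counts near zero, a nonlinear fit on the untransformed data would usually be the safer choice.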
(A) Schematic of assay. A layer of APTES was covalently formed on a glass surface by siloxane bond formation (-Si-O-Si-). (B) Counts of single Atto647N-NHS fluorophores (plotted as mean count, shown above, per microscopy field +/- s.d.), immobilized via a stable amide linkage to the silane layer, remain unchanged with repeated experimental cycles of Edman chemistry and after washes with wash buffer.
Overview of computational analysis to (1) identify fluorescent peptides as peaks in TIRF images, (2) align imaged peaks across consecutive Edman cycles, and (3) identify the cycle at which each dye is removed, retaining “well-behaved” peptides. Dye positions are assigned using a maximum likelihood statistical model (see Online Methods), based on the empirical observation (4) that fluorescence intensities for one dye are log-normally distributed with a (log-normal) mean μ and standard deviation σ. Intensities for higher numbers of dyes are well fit by log-normals with mean μ + ln(dye count) − Qc, where Qc is a dye-dye interaction factor, and standard deviation σ (Supplementary Fig. 9). (5) For each peptide, the number of dyes present after each Edman cycle is inferred by fitting observed intensities to each of the possible monotonically decreasing step functions (for up to 5 dyes), selecting the function that maximizes a quality of fit scored using the lognormal probability density functions. (6) Counts of individual molecules exhibiting different step drop patterns are summarized in histograms.
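The step-function fit in step (5) can be sketched as an exhaustive search over non-increasing dye-count sequences, each scored by its summed lognormal log-likelihood. This is a minimal sketch under stated assumptions, not the published implementation: the interaction factor Qc is assumed to apply only when two or more dyes are present, the zero-dye state is modeled as a lognormal background with a hypothetical log-mean `bg_mu`, and the single-dye intensity scale is invented for illustration. The σ and Qc values are the optimized ones quoted in Supplementary Figure 9.

```python
import math
from itertools import combinations_with_replacement


def log_pdf_lognormal(x, mean_log, sigma):
    """Log of the lognormal probability density at x."""
    z = (math.log(x) - mean_log) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2.0 * math.pi))


def expected_log_mean(n_dyes, mu, q_c, bg_mu):
    """Log-mean for n dyes: mu + ln(n) - Qc (Qc assumed to apply only for n >= 2)."""
    if n_dyes == 0:
        return bg_mu  # assumption: background treated as its own lognormal
    interaction = q_c if n_dyes > 1 else 0.0
    return mu + math.log(n_dyes) - interaction


def best_step_function(intensities, mu, sigma, q_c, bg_mu, max_dyes=5):
    """Return the non-increasing dye-count sequence with the highest likelihood."""
    best, best_ll = None, -math.inf
    # Drawing with replacement from a descending range yields exactly the
    # non-increasing sequences of the required length.
    levels = range(max_dyes, -1, -1)
    for counts in combinations_with_replacement(levels, len(intensities)):
        ll = sum(
            log_pdf_lognormal(x, expected_log_mean(n, mu, q_c, bg_mu), sigma)
            for x, n in zip(intensities, counts)
        )
        if ll > best_ll:
            best, best_ll = counts, ll
    return best, best_ll


# Illustrative intensities (arbitrary units) for a molecule that loses one
# dye after cycle 2 and the other after cycle 4.
observed = [1500.0, 1450.0, 1020.0, 980.0, 55.0, 48.0]
best, best_ll = best_step_function(
    observed, mu=math.log(1000.0), sigma=0.20, q_c=0.30, bg_mu=math.log(50.0)
)
print(best)
```

For 12 frames and up to 5 dyes the search covers 6188 candidate sequences, so brute-force enumeration stays cheap per molecule.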
Fluorescent spots were identified using a Gaussian peak fitting algorithm, then images were aligned between experimental cycles based on positions of fiducial markers. Each spot’s position was extracted from the aligned images, with the spot categorized in each frame as either ON or OFF depending on the presence of a well-fitting peak. Spot intensities in each frame were then measured using Mexican hat photometry.
Supplementary Figure 8 The sequence positions of multiple amino acid types can be determined by labeling each type with a distinct dye.
The histogram displays counts of individual molecules of the doubly labeled peptide GK‡AGAGAC♦AGAYG (summarizing data from 400 image fields) indicating the Edman cycle numbers at which the two dyes were removed. An example of an individual doubly-labeled molecule is shown in the extracted TIRF images at bottom right. ‡ indicates Janelia Fluor 549 conjugated to lysine and ♦indicates Atto647N conjugated to cysteine.
(A) Fluorophore intensities follow a lognormal model. Photometries in the last frame before a dye-labeled peptide permanently turns OFF predominantly represent the intensity of a single fluorophore, regardless of how many dyes a peptide initially started with. The distribution of these photometries is consistently lognormal across multiple experiments as shown by the Q-Q plot. Each point indicates a percentile of the lognormal photometry histogram of GK*AGAG, GC♦AGC♦AGAG, N-Acetyl-GK*AGAG, N-Acetyl-GC♦AGC♦AGAG, QC♦C♦TSIC♦SLYN with n=97137, n=273411, n=72905, n=155097 peptides, respectively. * indicates TMR conjugated to lysine and ♦ indicates Atto647N conjugated to cysteine. (B, C) Statistical models for photometry of multiple dyes were refined using forward simulation to optimize the values of the lognormal shape parameter σ* and dye-dye interaction factor Qc. Photometry distributions of peptides with the observed (blue) and simulated (orange) dye sequence [2,2,2,2,2,1,1,1,0,0,0,0] for (B) a poor parameter choice, showing results of an overestimated σ* and underestimated Qc, and for (C) the optimized values of σ* = 0.20 and Qc = 0.30. Data shown for the initial condition, 3 mock cycles, and 4 Edman cycles, n=273411 peptides. The remaining cycles, with 0 dyes, are omitted.
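A Q-Q check of lognormality like the one described in (A) can be sketched by comparing standardized percentiles of ln(intensity) with standard normal quantiles; points close to a straight line suggest a lognormal distribution. The intensities below are simulated rather than measured, the single-dye scale of 1000 is an arbitrary assumption, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def qq_points(intensities, n_points=99):
    """Pairs of (theoretical normal quantile, standardized observed log quantile)."""
    logs = sorted(math.log(x) for x in intensities)
    m, s = mean(logs), pstdev(logs)
    points = []
    for i in range(1, n_points + 1):
        p = i / (n_points + 1)
        theoretical = NormalDist().inv_cdf(p)
        observed = (logs[int(p * (len(logs) - 1))] - m) / s
        points.append((theoretical, observed))
    return points


def pearson_r(points):
    """Pearson correlation of the Q-Q points; near 1 means near-linear."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Simulated single-dye photometries with shape parameter 0.20.
rng = random.Random(0)
simulated = [rng.lognormvariate(math.log(1000.0), 0.20) for _ in range(5000)]
r = pearson_r(qq_points(simulated))
print(f"Q-Q correlation: {r:.4f}")
```

The correlation coefficient is only a convenient one-number summary; inspecting the plotted points, especially in the tails, is more informative.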
Supplementary Figure 10 Monte Carlo simulation of fluorosequencing closely matches the experimentally observed error distribution.
(A) Experimental sequence histograms of GC♦AGC♦AGAG (A, left panel) and its N-acetylated control (A, middle panel) replotted from Fig. 4B for easy comparison. (B) Simulated fluorosequencing of GC♦AGC♦AGAG with errors (B, right panel) and its N-acetylated control (B, middle panel) result in observed (simulated) signal (B, left panel) that closely matches observed experimental data in A. ♦indicates Atto647N conjugated to cysteine.
Supplementary Figure 11 High-resolution mass spectrometry confirmation of the purity of the doubly labeled peptide GC♦AGC♦AGAG.
(A) Peptide structure with expected mass/charge (m/z) for its +2 charge state and top 5 isotopic variants. Single dye conjugates have expected m/z values of 1493.73 (+1 charge state). (B) High resolution mass spectrometry of the HPLC purified peptide confirms the presence of two Atto647N dyes. ♦indicates Atto647N conjugated to cysteine.
(A) To better understand the range of Edman efficiency across amino acids, we sequenced two synthetic peptides differing only by the first amino acid, AK†AGAGRYG and PK†AGAGRYG. These were chosen for their historic ease, in the case of alanine, or difficulty, in the case of proline, of Edman sequencing. Experiments were performed in triplicate (bar chart represents the mean +/- s.d. across experiments), sequencing both peptides simultaneously in a multiplexed flow cell, and results were averaged and compared with simulation to determine error rates. Error rates of 3% dye destruction, 30% surface degradation were identical across the two samples, while Edman cleavage efficiency was 95% and 91% for the alanine and proline samples, respectively. (B) Consistent with the higher Edman cleavage efficiencies observed for alanine, a peptide composed of glycine/alanine repeats (GAGAGAGAGC♦ARRYRRG) with a fluorophore at the 10th position sequenced efficiently (top histogram, 400 fields) and was well fit by simulations with a 97% Edman cleavage efficiency, 3% dye destruction, and 4% surface degradation rate (bottom histogram). (Note that “dud” dye rates cannot be determined through simulation for singly labeled peptides.) † indicates Atto647 conjugated to lysine and ♦ indicates Atto647N conjugated to cysteine.
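To give a feel for how such error rates shape a sequencing histogram, the sketch below runs a simplified Monte Carlo simulation of a singly labeled peptide with its dye at position 10, using the rates quoted in (B). The per-cycle event order (surface loss, then dye destruction, then Edman cleavage) and all function names are assumptions made for illustration; this is not the simulation code supplied with the article.

```python
import random
from collections import Counter


def simulate_drop_cycle(dye_position, n_cycles, edman_eff, dye_loss,
                        surface_loss, rng):
    """Return the cycle at which fluorescence is lost, or None if it survives."""
    residues_removed = 0
    for cycle in range(1, n_cycles + 1):
        if rng.random() < surface_loss:   # peptide detaches from the surface
            return cycle
        if rng.random() < dye_loss:       # dye destroyed by the chemistry
            return cycle
        if rng.random() < edman_eff:      # one N-terminal residue is cleaved
            residues_removed += 1
            if residues_removed == dye_position:
                return cycle
    return None


def drop_histogram(n_molecules, seed=0, **params):
    """Tally drop cycles over many independently simulated molecules."""
    rng = random.Random(seed)
    return Counter(
        simulate_drop_cycle(rng=rng, **params) for _ in range(n_molecules)
    )


hist = drop_histogram(
    20000, dye_position=10, n_cycles=15,
    edman_eff=0.97, dye_loss=0.03, surface_loss=0.04,
)
for cycle in range(1, 16):
    print(cycle, hist.get(cycle, 0))
```

In this toy model the peak falls at the labeled position, with a low plateau at earlier cycles from dye and surface losses and a tail at later cycles from failed cleavage steps.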
Supplementary Figure 13 Full one- and two-dye fluorosequencing histograms for peptides in Fig. 5a and Fig. 5d confirm signal for the expected sequence patterns.
(A) Histograms tallying counts of molecules sequenced from a mixture of GC♦AGC♦AGAG with GAGC♦GAC♦GAGAD (left panel, 98 fields) and GAC♦C♦AGAAD with GAGC♦GAC♦GAGAD (right panel, 49 fields). (B) Histograms of sequenced peptide RK†TTRK†M, 49 fields. ♦ indicates Atto647N conjugated to cysteine and † indicates Atto647N coupled to lysine residues.
Supplementary Figure 14 One-, two-, and three-dye fluorosequencing histograms for insulin peptides (Fig. 5b) confirm signal for the expected sequence patterns.
Plotted are the background-adjusted one, two, and three dye histograms for insulin A2 chain (QC♦C♦TSIC♦SLYNE, measured across 100 image fields), A3 chain (NYC♦N, 120 fields), B1 chain (FVNQHLC♦GSHLVE, 72 fields), and B2 chain (ALYLVC♦GE, 100 fields), respectively. ♦ indicates Atto647N conjugated to cysteine.
Supplementary Figure 15 Modeling the expected effects of experimentally determined error rates on protein identification.
The effects of experimental errors were modeled on the expected identification rates of sets of human proteins in the subcellular compartments examined in Fig. 1C. Each curve plots coverage of uniquely identifiable proteins as a function of Edman cycles performed, for the scenario of labeling only cysteines and lysines on peptides formed by GluC proteolysis, which cleaves after glutamate or aspartate, assuming error rates of 94% Edman cleavage efficiency, 5% dye destruction, 5% surface degradation, and 7% “dud” dyes. Modeling was performed using the Monte Carlo procedure of ref. 6, as described in Online Methods. Application to more complex samples in the future could potentially be achieved through reduction of these errors, introduction of additional amino-acid-specific labels, or a combination. The consecutive covalent labeling of Cys, Lys, Asp/Glu, Trp, and the amino terminus of model peptides has been reported (ref. 14), so there is no intrinsic barrier to extending the approach to additional amino acid types, provided the labels can be independently distinguished.
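The error-free version of this identification problem can be sketched in a few lines: digest each protein after Glu or Asp, record the positions of Cys and Lys within the first N Edman cycles of each peptide, and count a protein as identifiable if at least one of its patterns occurs in no other protein. The toy sequences, protein names, and function names below are hypothetical, and the sketch ignores all experimental errors as well as any residues lost to surface attachment.

```python
from collections import defaultdict


def gluc_digest(protein):
    """Split a sequence after every glutamate (E) or aspartate (D)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "ED":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def fluorosequence(peptide, n_cycles, labeled="CK"):
    """Positions (1-based) and identities of labeled residues seen in n_cycles."""
    return tuple(
        (i, aa)
        for i, aa in enumerate(peptide[:n_cycles], start=1)
        if aa in labeled
    )


def uniquely_identifiable(database, n_cycles):
    """Proteins with at least one non-empty pattern found in no other protein."""
    owners = defaultdict(set)
    for name, sequence in database.items():
        for peptide in gluc_digest(sequence):
            pattern = fluorosequence(peptide, n_cycles)
            if pattern:  # unlabeled peptides give no signal
                owners[pattern].add(name)
    return {next(iter(names)) for names in owners.values() if len(names) == 1}


# Hypothetical three-protein database for illustration only.
toy_db = {"P1": "GCAGKAEAAKD", "P2": "GCAGKAECCD", "P3": "AAAE"}
for n in (2, 10):
    print(n, sorted(uniquely_identifiable(toy_db, n)))
```

With only two cycles, P1 cannot be distinguished because its sole visible pattern is shared with P2; extending to ten cycles exposes a lysine position unique to P1, which mirrors why coverage rises with the number of Edman cycles performed.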
Supplementary Figures 1–15 (PDF 3211 kb)
Supplementary Tables 1 and 2 (PDF 187 kb)
Statistics and reproducibility of photometry measurements (XLSX 17 kb)
Supporting computer code for image processing algorithms and Monte Carlo simulations (ZIP 895 kb)
About this article
Cite this article
Swaminathan, J., Boulgakov, A., Hernandez, E. et al. Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures. Nat Biotechnol 36, 1076–1082 (2018). https://doi.org/10.1038/nbt.4278