Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures

Abstract

The identification and quantification of proteins lags behind DNA-sequencing methods in scale, sensitivity, and dynamic range. Here, we show that sparse amino acid–sequence information can be obtained from individual protein molecules, for thousands to millions of molecules in parallel. We demonstrate selective fluorescence labeling of cysteine and lysine residues in peptide samples, immobilization of labeled peptides on a glass surface, and imaging by total internal reflection fluorescence (TIRF) microscopy to monitor decreases in each molecule's fluorescence after consecutive rounds of Edman degradation. The resulting sparse fluorescent sequence of each molecule was then assigned to its parent protein in a reference database. We tested the method on synthetic and naturally derived peptide molecules in zeptomole-scale quantities. We also fluorescently labeled phosphoserines and achieved single-molecule positional readout of the phosphorylated sites. We measured >93% efficiencies for dye labeling, survival, and cleavage; further improvements should enable studies of increasingly complex proteomic mixtures, with the high sensitivity and digital quantification offered by single-molecule sequencing.


Figure 1: Overview of single-molecule fluorosequencing.
Figure 2: Fluorescent amino acid positions can be determined at single-molecule sensitivity.
Figure 3: Stepwise decreases in fluorescence intensity occur at the Edman cycles that correspond to the removal of the dye-labeled amino acids.
Figure 4: Fluorescent sequences can be interpreted computationally to identify dye positions and quantify errors.
Figure 5: Fluorosequencing can discriminate individual peptide molecules in zeptomole-scale mixtures and uniquely identify their parent proteins.
Figure 6: Direct single-molecule sequencing of phosphoserine positions within RNA polymerase II C-terminal-domain repeat peptides.

References

  1. Liu, Y., Beyer, A. & Aebersold, R. On the dependency of cellular protein levels on mRNA abundance. Cell 165, 535–550 (2016).

  2. da Costa, J.P., Santos, P.S.M., Vitorino, R., Rocha-Santos, T. & Duarte, A.C. How low can you go? A current perspective on low-abundance proteomics. Trends Analyt. Chem. 93, 171–182 (2017).

  3. Makarov, A. et al. Performance evaluation of a hybrid linear ion trap/orbitrap mass spectrometer. Anal. Chem. 78, 2113–2120 (2006).

  4. Makarov, A., Denisov, E., Lange, O. & Horning, S. Dynamic range of mass accuracy in LTQ Orbitrap hybrid mass spectrometer. J. Am. Soc. Mass Spectrom. 17, 977–982 (2006).

  5. Hawkridge, A.M. in Quantitative Proteomics (eds. Eyers, C.E. & Gaskell, S.) 3–21 (The Royal Society of Chemistry, Cambridge, 2014).

  6. Swaminathan, J., Boulgakov, A.A. & Marcotte, E.M. A theoretical justification for single molecule peptide sequencing. PLOS Comput. Biol. 11, e1004080 (2015).

  7. Yao, Y., Docter, M., van Ginkel, J., de Ridder, D. & Joo, C. Single-molecule protein sequencing through fingerprinting: computational assessment. Phys. Biol. 12, 055003 (2015).

  8. Zhao, Y. et al. Single-molecule spectroscopy of amino acids and peptides by recognition tunnelling. Nat. Nanotechnol. 9, 466–473 (2014).

  9. Wilson, J., Sloman, L., He, Z. & Aksimentiev, A. Graphene nanopores for protein sequencing. Adv. Funct. Mater. 26, 4830–4838 (2016).

  10. Kennedy, E., Dong, Z., Tennant, C. & Timp, G. Reading the primary structure of a protein with 0.07 nm³ resolution using a subnanometre-diameter pore. Nat. Nanotechnol. 11, 968–976 (2016).

  11. Sampath, G. Amino acid discrimination in a nanopore and the feasibility of sequencing peptides with a tandem cell and exopeptidase. RSC Adv. 5, 30694–30700 (2015).

  12. Kolmogorov, M., Kennedy, E., Dong, Z., Timp, G. & Pevzner, P.A. Single-molecule protein identification by sub-nanopore sensors. PLOS Comput. Biol. 13, e1005356 (2017).

  13. Edman, P. Method for determination of the amino acid sequence in peptides. Acta Chem. Scand. 4, 283–293 (1950).

  14. Hernandez, E.T., Swaminathan, J., Marcotte, E.M. & Anslyn, E.V. Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing. New J. Chem. 41, 462–469 (2017).

  15. Hermodson, M.A., Ericsson, L.H., Titani, K., Neurath, H. & Walsh, K.A. Application of sequenator analyses to the study of proteins. Biochemistry 11, 4493–4502 (1972).

  16. Phatnani, H.P. & Greenleaf, A.L. Phosphorylation and functions of the RNA polymerase II CTD. Genes Dev. 20, 2922–2936 (2006).

  17. Stevens, S.M. Jr. et al. Enhancement of phosphoprotein analysis using a fluorescent affinity tag and mass spectrometry. Rapid Commun. Mass Spectrom. 19, 2157–2162 (2005).

  18. Shapiro, E., Biezuner, T. & Linnarsson, S. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nat. Rev. Genet. 14, 618–630 (2013).

  19. Ohshiro, T. et al. Detection of post-translational modifications in single peptides using electron tunnelling currents. Nat. Nanotechnol. 9, 835–840 (2014).

  20. Nivala, J., Marks, D.B. & Akeson, M. Unfoldase-mediated protein translocation through an α-hemolysin nanopore. Nat. Biotechnol. 31, 247–250 (2013).

  21. Rosen, C.B., Rodriguez-Larrea, D. & Bayley, H. Single-molecule site-specific detection of protein phosphorylation with a nanopore. Nat. Biotechnol. 32, 179–181 (2014).

  22. Wettenhall, R.E., Aebersold, R.H. & Hood, L.E. Solid-phase sequencing of ³²P-labeled phosphopeptides at picomole and subpicomole levels. Methods Enzymol. 201, 186–199 (1991).

  23. Pushkarev, D., Neff, N.F. & Quake, S.R. Single-molecule sequencing of an individual human genome. Nat. Biotechnol. 27, 847–850 (2009).

  24. Anderson, N.L. & Anderson, N.G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 1, 845–867 (2002).

  25. McLachlin, D.T. & Chait, B.T. Improved beta-elimination-based affinity purification strategy for enrichment of phosphopeptides. Anal. Chem. 75, 6826–6836 (2003).

  26. Laursen, R.A. Solid-phase Edman degradation: an automatic peptide sequencer. Eur. J. Biochem. 20, 89–102 (1971).

  27. Guizar-Sicairos, M., Thurman, S.T. & Fienup, J.R. Efficient subpixel image registration algorithms. Opt. Lett. 33, 156–158 (2008).

  28. Cannon, B., Pan, C., Chen, L., Hadd, A.G. & Russell, R. A dual-mode single-molecule fluorescence assay for the detection of expanded CGG repeats in Fragile X syndrome. Mol. Biotechnol. 53, 19–28 (2013).

  29. Das, S.K., Darshi, M., Cheley, S., Wallace, M.I. & Bayley, H. Membrane protein stoichiometry determined from the stepwise photobleaching of dye-labelled subunits. ChemBioChem 8, 994–999 (2007).

  30. Shimazaki, H. & Shinomoto, S. A method for selecting the bin size of a time histogram. Neural Comput. 19, 1503–1527 (2007).

  31. Mutch, S.A. et al. Deconvolving single-molecule intensity distributions for quantitative microscopy measurements. Biophys. J. 92, 2926–2943 (2007).


Acknowledgements

We thank B. Cannon and R. Russell for early assistance with single-molecule imaging, M. Gadush for assistance with peptide synthesis, I. Riddington, J. Dinser, and K. Suhr for assistance in mass spectrometry analysis of fluorescently labeled peptides, Z. Simpson and J. Rybarski for assistance with image analysis, A. Ellington for many fruitful discussions, and the Texas Advanced Computing Center for high-performance computing. This work was supported by fellowships from the HHMI (to J.S.) and NSF (DGE-1610403 to A.A.B.), and by grants from DARPA (N66001-14-2-4051 to E.V.A. and E.M.M.), NIH (DP1 GM106408, R01 GM076536, and R35 GM122480 to E.M.M.), CPRIT (to E.M.M.), and the Welch Foundation (F-1515 to E.M.M. and F-0046 to E.V.A.).

Author information

Contributions

J.S., A.A.B., E.T.H., A.M.B., J.L.B., A.M.J., E.V.A., and E.M.M. designed and analyzed the experiments or interpreted the data. J.S., E.T.H., A.M.B., J.L.B., and J.M. performed the experiments. J.S., A.A.B., E.T.H., A.M.B., E.V.A., and E.M.M. wrote and edited the manuscript.

Corresponding authors

Correspondence to Eric V Anslyn or Edward M Marcotte.

Ethics declarations

Competing interests

J.S., A.M.B., E.M.M., and E.V.A. are cofounders and shareholders of Erisyon Inc. J.S., E.M.M., and E.V.A. are co-inventors on granted US patent PCT/US2012/043769. J.S., A.A.B., E.T.H., J.L.B., A.M.J., E.V.A., and E.M.M. are co-inventors on pending US patent PCT/US2015/050099.

Integrated supplementary information

Supplementary Figure 1 Only select fluorophores exhibit fluorescence stability toward Edman reagents.

(A) Fluorophores (spanning four fluorescent channels, denoted by bar colors) were tested for the percentage change in their fluorescence intensity in PBS buffer following a 24-hour incubation with TFA or pyridine/PITC (shown as pyridine). Dyes marked with boxes exhibited only moderate changes (<20%) in fluorescence. Data are presented as the mean across replicates (number of replicates shown), with error bars indicating ± s.d. where n ≥ 3. (B) Images of fluorophore-coupled Tentagel beads illustrate the fluorescence changes caused by Edman reagents. For BODIPY-FL (left panels), fluorescence intensity decreases with TFA incubation, whereas pyridine incubation causes a spectral redshift. In contrast, Atto647N (right panels) is stable in both color and intensity under both conditions. Scale bar, 200 μm.
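The panel A quantity can be sketched per replicate as follows. This is a minimal illustration, not the authors' analysis code: it assumes paired before/after intensity readings for each replicate and applies the <20% cutoff to the absolute value of the mean change; the function names and the numbers in the example are hypothetical.

```python
from __future__ import annotations

from statistics import mean, stdev


def percent_change(before: list[float], after: list[float]) -> tuple[float, float | None]:
    """Mean percent change in intensity across paired replicates.

    The s.d. is returned only when n >= 3, as in the caption; otherwise None.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need matching, non-empty replicate lists")
    changes = [100.0 * (a - b) / b for b, a in zip(before, after)]
    sd = stdev(changes) if len(changes) >= 3 else None
    return mean(changes), sd


def is_reagent_stable(mean_change: float, threshold: float = 20.0) -> bool:
    """Hypothetical criterion: |mean percent change| below `threshold`."""
    return abs(mean_change) < threshold


# Illustrative numbers only (not data from the paper).
m, sd = percent_change([1000.0, 1000.0, 1000.0], [900.0, 950.0, 850.0])
print(m, sd, is_reagent_stable(m))
```

Reporting the s.d. only for n ≥ 3 mirrors the caption; with fewer replicates the function returns the mean alone.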

Supplementary Figure 2 Adaptation of flow-cell, TIRF-microscopy, and computer-controlled fluidic system for Edman sequencing.

(A) We modified a Bioptechs FCS2 perfusion chamber for Edman sequencing by replacing its silicone gaskets (red arrows) with perfluoroelastomer gaskets resistant to the Edman chemistry. (B) Eleven polymeric materials from several vendors were cut into 2 cm × 1 cm strips and tested for inertness after 24 hours of TFA incubation. Kalrez-0040 showed the least change in volume, excellent inertness to pyridine/PITC, and good compressibility (Shore A durometer hardness of 70); we used it for all subsequent experiments. Teflon (polytetrafluoroethylene) gaskets, although suitably inert, were not compressible and caused leaks when used in the perfusion chamber. The image of the flow chamber is adapted from an image supplied by the vendor (Bioptechs). (C) Edman sequencing was implemented with a syringe pump (3-way valve configuration) and an automated 10-port multi-position valve that exchanged solvents through polytetrafluoroethylene tubing into the imaging perfusion chamber mounted on the stage of a TIRF microscope.

Supplementary Figure 3 Bead-based assays confirm bulk Edman sequencing of fluorescently labeled amino acids.

(A) TMR binds specifically at the periphery of functionalized Tentagel beads, and its density can be measured by image processing. Peripheral bead fluorescence intensity was calculated as the area under the radial distribution of fluorescence intensity, normalized to negative-control beads bound non-specifically with free TMR lacking the NHS group, to control for background fluorescence. (B) Edman degradation can determine the positions of fluorescently labeled lysine residues in synthetic peptides from bulk fluorescence measurements. Bar charts indicate the normalized average fluorescence intensity per bead ± s.d. across image fields. Raw fluorescence intensities and field counts are reported in Supplementary File 1.
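The peripheral-intensity measure in panel A can be sketched with NumPy. This sketch assumes each bead's centre is already known, uses 1-pixel-wide radial bins (so the area under the curve is a simple sum), and integrates over a hypothetical ring r_min..r_max covering the bead periphery; the function names are illustrative and are not taken from the Supplementary Software.

```python
from __future__ import annotations

import numpy as np


def radial_profile(img: np.ndarray, center: tuple[float, float]) -> np.ndarray:
    """Mean intensity in 1-pixel-wide rings around `center` (row, col)."""
    rows, cols = np.indices(img.shape)
    r = np.hypot(rows - center[0], cols - center[1]).astype(int)
    sums = np.bincount(r.ravel(), weights=img.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)  # guard against empty rings


def peripheral_intensity(img, center, r_min: int, r_max: int) -> float:
    """Area under the radial profile for radii r_min..r_max (rectangle rule, 1-pixel bins)."""
    return float(radial_profile(img, center)[r_min:r_max + 1].sum())


def normalized_peripheral(bead_img, bead_center, control_img, control_center,
                          r_min: int, r_max: int) -> float:
    """Bead signal relative to a negative-control bead integrated over the same ring."""
    control = peripheral_intensity(control_img, control_center, r_min, r_max)
    return peripheral_intensity(bead_img, bead_center, r_min, r_max) / control
```

The negative control is integrated over the same ring so that the ratio reflects specifically bound dye rather than ring geometry.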

Supplementary Figure 4 Nitrogen-purged methanol with Trolox significantly decreases photobleaching and fluorophore blinking.

Fluorescence loss under constant laser illumination is greatly reduced for the TMR-labeled peptide (fmoc)-K*A in nitrogen-purged 1 mM Trolox in methanol (top, green curve, n = 2802) compared with methanol alone (bottom, red curve, n = 5942), imaged with consecutive 0.5-s frames. Solid lines represent fits of each data set to a single-exponential decay. Imaging in methanol/Trolox resulted in a t1/2 of approximately 105 seconds. * indicates TMR conjugated to lysine.
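The half-life follows from the fitted decay constant k as t1/2 = ln 2 / k. The sketch below assumes the plotted curve is the fraction of molecules still fluorescent in each 0.5-s frame (all values positive) and fits a straight line to the log values. This log-linear fit weights points differently from a nonlinear least-squares fit of the exponential, so it is an approximation of, not necessarily the same as, the fits shown in the figure.

```python
import math

import numpy as np

FRAME_S = 0.5  # frame interval in seconds, as stated in the caption


def fit_half_life(fraction_on) -> float:
    """Fit fraction_on(t) = A * exp(-k t) by least squares on log values; return ln(2)/k in s."""
    y = np.log(np.asarray(fraction_on, dtype=float))
    t = np.arange(len(y)) * FRAME_S
    slope, _intercept = np.polyfit(t, y, 1)
    return math.log(2) / -slope


# Synthetic, noiseless check with a 105 s half-life (not measured data).
t = np.arange(400) * FRAME_S
curve = np.exp(-math.log(2) / 105.0 * t)
print(round(fit_half_life(curve), 3))
```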

Supplementary Figure 5 The aminosilane surface is stable to Edman chemistry.

(A) Schematic of assay. A layer of APTES was covalently attached to a glass surface through siloxane bond formation (-Si-O-Si-). (B) Counts of single Atto647N-NHS fluorophores immobilized via a stable amide linkage to the silane layer (plotted as the mean count per microscopy field ± s.d., with values shown above the bars) remain unchanged over repeated cycles of Edman chemistry and after washes with wash buffer.

Supplementary Figure 6 Image-processing pipeline.

Overview of computational analysis to (1) identify fluorescent peptides as peaks in TIRF images, (2) align imaged peaks across consecutive Edman cycles, and (3) identify the cycle at which each dye is removed, retaining “well-behaved” peptides. Dye positions are assigned using a maximum-likelihood statistical model (see Online Methods), based on the empirical observation (4) that fluorescence intensities for one dye are log-normally distributed with (log-normal) mean μ and standard deviation σ. Intensities for higher numbers of dyes are well fit by log-normal distributions with mean μ + ln(n) − Qc, where n is the dye count and Qc is a dye–dye interaction factor, and standard deviation σ (Supplementary Fig. 9). (5) For each peptide, the number of dyes present after each Edman cycle is inferred by fitting the observed intensities to each of the possible monotonically decreasing step functions (for up to 5 dyes) and selecting the function that maximizes a quality-of-fit score computed from the log-normal probability density functions. (6) Counts of individual molecules exhibiting different step-drop patterns are summarized in histograms.
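Step (5) can be illustrated by brute-force enumeration of every non-increasing dye-count sequence. This is a minimal sketch under stated assumptions, not the code in the Supplementary Software: it reads the caption's model as a log-scale mean of μ for one dye and μ + ln(n) − Qc for n ≥ 2 dyes, and it models zero-dye frames with a hypothetical zero-centred Gaussian background of width bg_sd, which the caption does not specify. The example trace is synthetic.

```python
import math
from itertools import combinations_with_replacement

MAX_DYES = 5  # the caption considers step functions for up to 5 dyes


def mean_log_intensity(n, mu, qc):
    # Assumed reading of the caption: mu for one dye, mu + ln(n) - qc for n >= 2.
    return mu if n == 1 else mu + math.log(n) - qc


def log_likelihood(x, n, mu, sigma, qc, bg_sd):
    """Log probability density of photometry x given n dyes."""
    if n == 0:
        # Hypothetical background model: Gaussian centred on zero.
        return -0.5 * (x / bg_sd) ** 2 - math.log(bg_sd * math.sqrt(2 * math.pi))
    if x <= 0:
        return -math.inf  # a log-normal has no density at x <= 0
    z = (math.log(x) - mean_log_intensity(n, mu, qc)) / sigma
    return -0.5 * z * z - math.log(x * sigma * math.sqrt(2 * math.pi))


def best_step_function(intensities, mu, sigma, qc, bg_sd):
    """Non-increasing dye-count sequence (<= MAX_DYES) with the highest total log-likelihood."""
    table = [[log_likelihood(x, n, mu, sigma, qc, bg_sd) for n in range(MAX_DYES + 1)]
             for x in intensities]
    best, best_score = None, -math.inf
    # combinations_with_replacement yields non-decreasing tuples; reversing
    # each one gives every monotonically decreasing step function once.
    for combo in combinations_with_replacement(range(MAX_DYES + 1), len(intensities)):
        seq = combo[::-1]
        score = sum(table[t][n] for t, n in enumerate(seq))
        if score > best_score:
            best, best_score = seq, score
    return best


# Illustrative trace (not real data): two dyes, then one, then none.
mu = math.log(5000.0)
two = math.exp(mean_log_intensity(2, mu, 0.30))
trace = [two, two, 5000.0, 5000.0, 0.0, 0.0]
print(best_step_function(trace, mu, sigma=0.20, qc=0.30, bg_sd=200.0))
```

Enumeration is cheap here: for T frames and up to 5 dyes there are C(T + 5, 5) candidate step functions, so a 12-frame experiment has 6,188 candidates per peptide.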

Supplementary Figure 7 Image-processing summary.

Fluorescent spots were identified using a Gaussian peak fitting algorithm, then images were aligned between experimental cycles based on positions of fiducial markers. Each spot’s position was extracted from the aligned images, with the spot categorized in each frame as either ON or OFF depending on the presence of a well-fitting peak. Spot intensities in each frame were then measured using Mexican hat photometry.
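One common way to estimate the frame-to-frame drift used for alignment is phase correlation, sketched below with NumPy. This is a swapped-in, simplified technique: it assumes a pure circular translation and recovers it only to the nearest pixel, whereas the subpixel registration of ref. 27 refines such an estimate. Applying it to a crop around the fiducial markers rather than to the whole frame, and the function name, are assumptions for illustration.

```python
import numpy as np


def estimate_shift(reference: np.ndarray, moving: np.ndarray) -> tuple:
    """Integer (row, col) shift that, applied to `moving` with np.roll, aligns it to `reference`."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the array midpoint correspond to negative shifts.
    return tuple(int(p - s) if p > s // 2 else int(p) for p, s in zip(peak, corr.shape))


# Synthetic example: a random "frame" drifted by (3, -5) pixels.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
frame1 = np.roll(frame0, (3, -5), axis=(0, 1))  # simulated stage drift
shift = estimate_shift(frame0, frame1)
aligned = np.roll(frame1, shift, axis=(0, 1))
```

Normalizing the cross-power spectrum to unit magnitude sharpens the correlation peak, which makes the integer estimate robust before any subpixel refinement.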

Supplementary Figure 8 The sequence positions of multiple amino acid types can be determined by labeling each type with a distinct dye.

The histogram displays counts of individual molecules of the doubly labeled peptide GKAGAGACAGAYG (summarizing data from 400 image fields) indicating the Edman cycle numbers at which the two dyes were removed. An example of an individual doubly labeled molecule is shown in the extracted TIRF images at bottom right. ‡ indicates Janelia Fluor 549 conjugated to lysine and ♦ indicates Atto647N conjugated to cysteine.

Supplementary Figure 9 Measurement and modeling of fluorophore intensities.

(A) Fluorophore intensities follow a log-normal model. Photometries in the last frame before a dye-labeled peptide permanently turns OFF predominantly represent the intensity of a single fluorophore, regardless of how many dyes the peptide initially carried. The distribution of these photometries is consistently log-normal across multiple experiments, as shown by the Q-Q plot. Each point indicates a percentile of the log-normal photometry histogram of GK*AGAG, GCAGCAGAG, N-acetyl-GK*AGAG, N-acetyl-GCAGCAGAG, QCCTSICSLYN with n=97137, n=273411, n=72905, n=155097 peptides, respectively. * indicates TMR conjugated to lysine; ♦ indicates Atto647N conjugated to cysteine. (B, C) Statistical models for the photometry of multiple dyes were refined using forward simulation to optimize the values of the log-normal shape parameter σ* and the dye–dye interaction factor Qc. Photometry distributions of peptides with the observed (blue) and simulated (orange) dye sequence [2,2,2,2,2,1,1,1,0,0,0,0] are shown for (B) a poor parameter choice, with an overestimated σ* and an underestimated Qc, and (C) the optimized values σ* = 0.20 and Qc = 0.30. Data are shown for the initial condition, 3 mock cycles, and 4 Edman cycles (n = 273411 peptides); the remaining cycles, with 0 dyes, are omitted.
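Both panels can be sketched with the Python standard library: a normal Q-Q comparison of log intensities (panel A) and a forward simulation of photometry for one dye-count sequence (panels B and C). The mean model (log-scale mean μ for one dye, μ + ln(n) − Qc for n ≥ 2), the value μ = ln 5000, and recording zero-dye cycles as 0.0 are assumptions for illustration; σ* = 0.20 and Qc = 0.30 are the optimized values from the caption.

```python
import math
import random
from statistics import NormalDist


def lognormal_qq(intensities):
    """(theoretical normal quantile, sorted log intensity) pairs for a Q-Q plot.

    Points on a straight line indicate log-normal intensities; the slope
    estimates sigma and the intercept estimates the log-scale mean.
    """
    logs = sorted(math.log(x) for x in intensities if x > 0)
    n = len(logs)
    std_normal = NormalDist()
    theo = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theo, logs))


def simulate_photometry(dye_counts, n_peptides, mu, sigma, qc, seed=0):
    """Forward-simulate per-cycle photometry for peptides following one dye-count sequence.

    Hypothetical mean model: mu for one dye, mu + ln(n) - qc for n >= 2 dyes;
    zero-dye cycles are recorded as 0.0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_peptides):
        row = []
        for n in dye_counts:
            if n == 0:
                row.append(0.0)
            else:
                m = mu if n == 1 else mu + math.log(n) - qc
                row.append(rng.lognormvariate(m, sigma))
        rows.append(row)
    return rows


# Dye-count sequence from panels B and C; mu is an assumed single-dye level.
dye_seq = [2, 2, 2, 2, 2, 1, 1, 1, 0, 0, 0, 0]
sims = simulate_photometry(dye_seq, 2000, mu=math.log(5000.0), sigma=0.20, qc=0.30)
```

Comparing the simulated per-cycle distributions with the observed ones, and adjusting σ* and Qc until they agree, is the forward-simulation idea the caption describes; how the agreement was scored is not stated here.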

Supplementary Figure 10 Monte Carlo simulation of fluorosequencing closely matches the experimentally observed error distribution.

(A) Experimental fluorosequencing histograms of GCAGCAGAG (left panel) and its N-acetylated control (middle panel), replotted from Fig. 4B for comparison. (B) Simulated fluorosequencing of GCAGCAGAG with errors (right panel) and of its N-acetylated control (middle panel) yields a simulated observed signal (left panel) that closely matches the experimental data in A. ♦ indicates Atto647N conjugated to cysteine.

Supplementary Figure 11 High-resolution mass spectrometry confirmation of the purity of the doubly labeled peptide GCAGCAGAG.

(A) Peptide structure with the expected mass/charge (m/z) for its +2 charge state and top 5 isotopic variants. Single-dye conjugates would have an expected m/z of 1493.73 (+1 charge state). (B) High-resolution mass spectrometry of the HPLC-purified peptide confirms the presence of two Atto647N dyes. ♦ indicates Atto647N conjugated to cysteine.
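The m/z values in panel A follow from the standard relation for protonated ions, m/z = (M + z·1.007276)/z, with isotopic peaks spaced by about 1.00336/z (the ¹³C–¹²C mass difference). The sketch below uses only the single-dye value quoted in the caption; it does not reproduce the doubly labeled peptide's values, which would require the dye's exact mass. Treating the "top 5" isotopic variants as the first five peaks of the ladder is a simplification, since the most abundant peaks of a large molecule need not start at the monoisotopic one.

```python
from __future__ import annotations

PROTON_MASS = 1.007276    # Da
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def neutral_mass(mz: float, z: int) -> float:
    """Neutral monoisotopic mass M from the m/z of an [M + zH]z+ ion."""
    return mz * z - z * PROTON_MASS


def mz_for_charge(mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + z * PROTON_MASS) / z


def isotope_ladder(mass: float, z: int, n_peaks: int = 5) -> list[float]:
    """m/z of the monoisotopic peak and the next n_peaks - 1 13C isotopic peaks."""
    mono = mz_for_charge(mass, z)
    return [mono + k * C13_C12_DELTA / z for k in range(n_peaks)]


# Worked example with the caption's single-dye value (+1 ion, m/z 1493.73).
m_single = neutral_mass(1493.73, 1)
print(round(m_single, 4), round(mz_for_charge(m_single, 2), 4))
```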

Supplementary Figure 12 Effects of peptide composition and length on Edman cleavage efficiencies.

(A) To better understand the range of Edman efficiencies across amino acids, we sequenced two synthetic peptides differing only in the first amino acid, AKAGAGRYG and PKAGAGRYG. These were chosen because alanine has historically been easy, and proline difficult, to sequence by Edman degradation. Experiments were performed in triplicate (bar chart represents the mean ± s.d. across experiments), sequencing both peptides simultaneously in a multiplexed flow cell; results were averaged and compared with simulations to determine error rates. Error rates of 3% dye destruction and 30% surface degradation were identical across the two samples, whereas Edman cleavage efficiencies were 95% and 91% for the alanine and proline samples, respectively. (B) Consistent with the higher Edman cleavage efficiency observed for alanine, a peptide composed of glycine/alanine repeats (GAGAGAGAGCARRYRRG) with a fluorophore at the 10th position sequenced efficiently (top histogram, 400 fields) and was well fit by simulations with a 97% Edman cleavage efficiency, 3% dye destruction, and 4% surface degradation rate (bottom histogram). (Note that “dud” dye rates cannot be determined through simulation for singly labeled peptides.) † indicates Atto647N conjugated to lysine and ♦ indicates Atto647N conjugated to cysteine.
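The four error modes named in these captions (Edman cleavage efficiency, dye destruction, surface degradation, and "dud" dyes) can be illustrated with a minimal Monte Carlo sketch. It is not the simulation used by the authors (ref. 6 and the Supplementary Software): the per-cycle order of events, the independence of all errors, treating surface degradation as loss of the whole molecule, and the function names are assumptions. The example uses the cleavage, destruction, and surface rates from panel B, the 7% dud rate from Supplementary Fig. 15, and the peptide GCAGCAGAG with labeled cysteines.

```python
import random
from collections import Counter


def simulate_peptide(seq, labeled, n_cycles, p_edman, p_destroy, p_surface, p_dud, rng):
    """Dye count after 0..n_cycles Edman cycles for one simulated molecule.

    Assumed per-cycle order: surface loss, then Edman cleavage of the
    N-terminal residue, then independent destruction of each remaining dye.
    """
    # 0-based positions carrying a working dye ("dud" dyes never fluoresce).
    dyes = {i for i, aa in enumerate(seq) if aa in labeled and rng.random() >= p_dud}
    nterm = 0  # index of the current N-terminal residue
    attached = True
    counts = [len(dyes)]
    for _ in range(n_cycles):
        if attached and rng.random() < p_surface:
            attached = False  # molecule lost from the surface
        if not attached:
            counts.append(0)
            continue
        if nterm < len(seq) and rng.random() < p_edman:
            dyes.discard(nterm)  # cleaved residue leaves with its dye
            nterm += 1
        dyes = {d for d in dyes if rng.random() >= p_destroy}
        counts.append(len(dyes))
    return counts


def pattern_histogram(seq, labeled, n_cycles, n_molecules, seed=0, **rates):
    """Tally dye-count patterns, skipping molecules that are dark from the start."""
    rng = random.Random(seed)
    hist = Counter()
    for _ in range(n_molecules):
        counts = simulate_peptide(seq, labeled, n_cycles, rng=rng, **rates)
        if counts[0] > 0:
            hist[tuple(counts)] += 1
    return hist


rates = dict(p_edman=0.97, p_destroy=0.03, p_surface=0.04, p_dud=0.07)
hist = pattern_histogram("GCAGCAGAG", {"C"}, 6, 5000, **rates)
print(hist.most_common(3))
```

With error-free rates the sketch returns [2, 2, 1, 1, 1, 0, 0] for GCAGCAGAG: one dye is lost at cycle 2 and the other at cycle 5, matching the cysteine positions. The error terms spread molecules into off-pattern histogram bins.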

Supplementary Figure 13 Full one- and two-dye fluorosequencing histograms for peptides in Fig. 5a and Fig. 5d confirm signal for the expected sequence patterns.

(A) Histograms tallying counts of molecules sequenced from a mixture of GCAGCAGAG with GAGCGACGAGAD (left panel, 98 fields) and of GACCAGAAD with GAGCGACGAGAD (right panel, 49 fields). (B) Histograms of the sequenced peptide RKTTRKM (49 fields). ♦ indicates Atto647N conjugated to cysteine and † indicates Atto647N coupled to lysine residues.

Supplementary Figure 14 One-, two-, and three-dye fluorosequencing histograms for insulin peptides (Fig. 5b) confirm signal for the expected sequence patterns.

Plotted are the background-adjusted one-, two-, and three-dye histograms for the insulin A2 chain (QCCTSICSLYNE, measured across 100 image fields), A3 chain (NYCN, 120 fields), B1 chain (FVNQHLCGSHLVE, 72 fields), and B2 chain (ALYLVCGE, 100 fields) peptides. ♦ indicates Atto647N conjugated to cysteine.

Supplementary Figure 15 Modeling the expected effects of experimentally determined error rates on protein identification.

The effects of experimental errors on expected identification rates were modeled for sets of human proteins from the subcellular compartments examined in Fig. 1C. Each curve plots the coverage of uniquely identifiable proteins as a function of the number of Edman cycles performed, for a scenario in which only cysteines and lysines are labeled on peptides generated by GluC proteolysis (which cleaves after glutamate or aspartate), assuming 94% Edman cleavage efficiency, 5% dye destruction, 5% surface degradation, and 7% “dud” dyes. Modeling was performed using the Monte Carlo procedure of ref. 6, as described in Online Methods. Extending the approach to more complex samples could potentially be achieved by reducing these errors, introducing additional amino acid–specific labels, or both. The consecutive covalent labeling of Cys, Lys, Asp/Glu, Trp, and the amino terminus of model peptides has been reported (ref. 14), so there is no intrinsic barrier to extending the approach to additional amino acid types, provided the labels can be distinguished from one another.
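The identifiability idea behind these curves can be shown with an error-free toy version: digest each protein with the GluC rule, keep only the labeled Cys/Lys positions within the first n cycles, and count proteins that own at least one fluorosequence no other protein produces. This deliberately ignores the error rates above, which the caption handles with the Monte Carlo procedure of ref. 6. The toy proteome is hypothetical, and treating windows with no labeled residue as uninformative is an assumption.

```python
from __future__ import annotations

from collections import defaultdict


def gluc_digest(protein: str) -> list[str]:
    """Cleave after every E or D, the GluC rule stated in the caption."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "DE":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def fluorosequence(peptide: str, n_cycles: int, labeled: str = "CK") -> str:
    """Labeled residues within the first n_cycles positions; '.' everywhere else."""
    window = peptide[:n_cycles].ljust(n_cycles, ".")
    return "".join(aa if aa in labeled else "." for aa in window)


def uniquely_identifiable(proteome: dict[str, str], n_cycles: int, labeled: str = "CK") -> set[str]:
    """Proteins with at least one fluorosequence shared by no other protein."""
    owners = defaultdict(set)
    for name, seq in proteome.items():
        for pep in gluc_digest(seq):
            fs = fluorosequence(pep, n_cycles, labeled)
            if fs.strip("."):  # assume label-free windows carry no information
                owners[fs].add(name)
    return {next(iter(names)) for names in owners.values() if len(names) == 1}


# Hypothetical toy proteome; coverage versus cycle count traces a curve like the caption's.
toy = {"p1": "GCAKEAAD", "p2": "GACKE", "p3": "GCAKD"}
for cycles in range(1, 6):
    print(cycles, len(uniquely_identifiable(toy, cycles)) / len(toy))
```

In the toy example, p1 and p3 yield identical fluorosequences (.C.K.), so only p2 becomes identifiable. This illustrates why adding label types, as the caption suggests, would raise coverage.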

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 3211 kb)

Life Sciences Reporting Summary (PDF 1975 kb)

Supplementary Tables

Supplementary Tables 1 and 2 (PDF 187 kb)

Supplementary Data 1

Statistics and reproducibility of photometry measurements (XLSX 17 kb)

Supplementary Software

Supporting computer code for image processing algorithms and Monte Carlo simulations (ZIP 895 kb)

About this article

Cite this article

Swaminathan, J., Boulgakov, A., Hernandez, E. et al. Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures. Nat Biotechnol 36, 1076–1082 (2018). https://doi.org/10.1038/nbt.4278
