Multiplex assessment of protein variant abundance by massively parallel sequencing


Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.

Fig. 1: Overview of VAMP-seq.
Fig. 2: VAMP-seq abundance scores for PTEN and TPMT.
Fig. 3: Biochemical features influencing intracellular protein abundance.
Fig. 4: PTEN variant abundance classes across PTEN hamartoma tumor syndrome and cancer.
Fig. 5: TPMT variant abundance classes across pharmacogenomics phenotypes.
Fig. 6: Additional drug- and disease-related genes are compatible with VAMP-seq.

  1. Shirts, B. H., Pritchard, C. C. & Walsh, T. Family-specific variants and the limits of human genetics. Trends Mol. Med. 22, 925–934 (2016).

  2. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  3. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).

  4. Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 9, 2267–2284 (2014).

  5. Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).

  6. Manolio, T. A. et al. Bedside back to bench: building bridges between basic and clinical genomic research. Cell 169, 6–12 (2017).

  7. Starita, L. M. et al. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200, 413–422 (2015).

  8. Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).

  9. Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).

  10. Redler, R. L., Das, J., Diaz, J. R. & Dokholyan, N. V. Protein destabilization as a common factor in diverse inherited disorders. J. Mol. Evol. 82, 11–16 (2016).

  11. Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).

  12. Lee, M. S. et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res. 70, 4880–4890 (2010).

  13. Tai, H. L., Krynetski, E. Y., Schuetz, E. G., Yanishevski, Y. & Evans, W. E. Enhanced proteolysis of thiopurine S-methyltransferase (TPMT) encoded by mutant alleles in humans (TPMT*3A, TPMT*2): mechanisms for the genetic polymorphism of TPMT activity. Proc. Natl Acad. Sci. USA 94, 6444–6449 (1997).

  14. Kim, I., Miller, C. R., Young, D. L. & Fields, S. High-throughput analysis of in vivo protein stability. Mol. Cell. Proteomics 12, 3370–3378 (2013).

  15. Klesmith, J. R., Bacik, J.-P., Wrenbeck, E. E., Michalczyk, R. & Whitehead, T. A. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl Acad. Sci. USA 114, 2265–2270 (2017).

  16. Yen, H.-C. S., Xu, Q., Chou, D. M., Zhao, Z. & Elledge, S. J. Global protein stability profiling in mammalian cells. Science 322, 918–923 (2008).

  17. Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).

  18. Jain, P. C. & Varadarajan, R. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library. Anal. Biochem. 449, 90–98 (2014).

  19. Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).

  20. Johnston, S. B. & Raines, R. T. Conformational stability and catalytic activity of PTEN variants linked to cancers and autism spectrum disorders. Biochemistry 54, 1576–1582 (2015).

  21. Wu, H. et al. Structural basis of allele variation of human thiopurine-S-methyltransferase. Proteins 67, 198–208 (2007).

  22. Ward, W. W., Prentice, H. J., Roth, A. F., Cody, C. W. & Reeves, S. C. Spectral perturbations of the Aequorea green-fluorescent protein. Photochem. Photobiol. 35, 803–808 (1982).

  23. Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).

  24. Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 54, 315–322 (2004).

  25. Kauzmann, W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63 (1959).

  26. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).

  27. Lee, J. O. et al. Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association. Cell 99, 323–334 (1999).

  28. Song, M. S., Salmena, L. & Pandolfi, P. P. The functions and regulation of the PTEN tumour suppressor. Nat. Rev. Mol. Cell Biol. 13, 283–296 (2012).

  29. Nguyen, H.-N. et al. A new class of cancer-associated PTEN mutations defined by membrane translocation defects. Oncogene 34, 3737–3743 (2015).

  30. Walker, S. M., Leslie, N. R., Perera, N. M., Batty, I. H. & Downes, C. P. The tumour-suppressor function of PTEN requires an N-terminal lipid-binding motif. Biochem. J. 379, 301–307 (2004).

  31. Das, S., Dixon, J. E. & Cho, W. Membrane-binding and activation mechanism of PTEN. Proc. Natl Acad. Sci. USA 100, 7491–7496 (2003).

  32. Vazquez, F., Ramaswamy, S., Nakamura, N. & Sellers, W. R. Phosphorylation of the PTEN tail regulates protein stability and function. Mol. Cell. Biol. 20, 5010–5018 (2000).

  33. Wei, Y., Stec, B., Redfield, A. G., Weerapana, E. & Roberts, M. F. Phospholipid-binding sites of phosphatase and tensin homolog (PTEN): Exploring the mechanism of phosphatidylinositol 4,5-bisphosphate activation. J. Biol. Chem. 290, 1592–1606 (2015).

  34. Naguib, A. et al. PTEN functions by recruitment to cytoplasmic vesicles. Mol. Cell 58, 255–268 (2015).

  35. Hobert, J. A. & Eng, C. PTEN hamartoma tumor syndrome: an overview. Genet. Med. 11, 687–694 (2009).

  36. Melbārde-Gorkuša, I. et al. Challenges in the management of a patient with Cowden syndrome: case report and literature review. Hered. Cancer Clin. Pract. 10, 5 (2012).

  37. Staal, F. J. T. et al. A novel germline mutation of PTEN associated with brain tumours of multiple lineages. Br. J. Cancer 86, 1586–1591 (2002).

  38. Nelen, M. R. et al. Novel PTEN mutations in patients with Cowden disease: Absence of clear genotype–phenotype correlations. Eur. J. Hum. Genet. 7, 267–273 (1999).

  39. Whiffin, N. et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet. Med. 19, 1151–1158 (2017).

  40. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).

  41. Hollander, M. C., Blumenthal, G. M. & Dennis, P. A. PTEN loss in the continuum of common cancers, rare syndromes and mouse models. Nat. Rev. Cancer 11, 289–301 (2011).

  42. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).

  43. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831 (2017).

  44. Papa, A. et al. Cancer-associated PTEN mutants act in a dominant-negative manner to suppress PTEN protein function. Cell 157, 595–610 (2014).

  45. Leslie, N. R. & Longy, M. Inherited PTEN mutations and the prediction of phenotype. Semin. Cell Dev. Biol. 52, 30–38 (2016).

  46. Wang, H. et al. Allele-specific tumor spectrum in Pten knockin mice. Proc. Natl Acad. Sci. USA 107, 5142–5147 (2010).

  47. Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000).

  48. Aguissa-Touré, A.-H. & Li, G. Genetic alterations of PTEN in human melanoma. Cell. Mol. Life Sci. 69, 1475–1491 (2012).

  49. Hodges, L. M. et al. Very important pharmacogene summary. Pharmacogenet. Genomics 21, 152–161 (2011).

  50. Relling, M. V. et al. Clinical pharmacogenetics implementation consortium guidelines for thiopurine methyltransferase genotype and thiopurine dosing: 2013 update. Clin. Pharmacol. Ther. 93, 324–325 (2013).

  51. Liu, C. et al. Genomewide approach validates thiopurine methyltransferase activity is a monogenic pharmacogenomic trait. Clin. Pharmacol. Ther. 101, 373–381 (2017).

  52. Appell, M. L. et al. Nomenclature for alleles of the thiopurine methyltransferase gene. Pharmacogenet. Genomics 23, 242–248 (2013).

  53. Hamdan-Khalil, R. et al. In vitro characterization of four novel non-functional variants of the thiopurine S-methyltransferase. Biochem. Biophys. Res. Commun. 309, 1005–1010 (2003).

  54. Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 1–7 (2016).

  55. Relling, M. et al. New Pharmacogenomics Research network: an open community catalyzing research and translation in precision medicine. Clin. Pharmacol. Ther. 102, 897–902 (2017).

  56. Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug Targets 15, 65–79 (2014).

  57. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  58. Rubin, A. F. et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18, 1–15 (2017).

  59. Krauthammer, M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat. Genet. 44, 1006–1014 (2012).

  60. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).


We thank J. Underwood and K. Munson of the UW PacBio Sequencing Services for assistance with long-read sequencing; A. Leith of the UW Foege Flow Lab and L. Gitari and D. Prunkard of the UW Pathology Flow Cytometry Core Facility for assistance with cell sorting; and B. Shirts and C. Pritchard in the UW Department of Lab Medicine for advice. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. This work was supported by the National Institute of General Medical Sciences (1R01GM109110 and 5R24GM115277 to D.M.F., P50GM115279 to M.V.R. and W.E.E., National Cancer Institute R01CA096670 to S.B. and P30CA21765 to M.V.R.) and an NIH Director’s Pioneer Award (DP1HG007811 to J.S.). K.A.M. is an American Cancer Society Fellow (PF-15-221-01), and was supported by a National Cancer Institute Interdisciplinary Training Grant in Cancer (2T32CA080416). M.A.C. and V.E.G. are supported by the National Science Foundation Graduate Research Fellowship. J.N.D. is supported by a National Institute of General Medical Sciences Training Grant (T32GM007454). J.S. is an Investigator of the Howard Hughes Medical Institute. D.M.F. is a Canadian Institute for Advanced Research Azrieli Global Scholar.

Author information

D.M.F., J.S., K.A.M. and L.M.S. conceived of, designed and managed the experiments and analyses, and wrote the manuscript; J.J.S. and B.M. cloned expression constructs and libraries and prepared and performed NGS sequencing; K.A.M., M.A.C. and A.K. provided constructs and data for additional disease genes and pharmacogenes; M.K. wrote the scripts to extract barcodes and variable regions from long-read sequences; J.N.D. assisted in using the ACMG guidelines to reclassify PTEN variants; R.J.H. provided constructs for TPMT experiments; V.E.G. designed the website; and S.B., W.E.E., M.V.R. and W.Y. provided clinical data for TPMT comparison.

Corresponding authors

Correspondence to Jay Shendure or Douglas M. Fowler.

Ethics declarations

Competing interests

The authors declare that the variant functional data presented herein are copyrighted, and may be freely used for non-commercial purposes. Licensing for commercial use may benefit the authors. The authors declare no additional competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Validation experiments of EGFP-fusions for assessing PTEN and TPMT steady-state abundance.

a, Representative gating strategy for mTagBFP2 negative, mCherry positive cells containing 15,000 recombined cells. b, PTEN variant EGFP:mCherry ratio geometric means as a fraction of WT, for known and previously uncharacterized PTEN low-abundance variants. Error bars denote 95% confidence intervals of the mean (red), with individual data points shown in grey. Each variant was assessed in at least 3 independent experiments. c, Similar plot for TPMT, with error bars denoting 95% confidence intervals of the mean (red), with individual data points shown in grey. All variants were independently assessed three times, except variants p.Asp15Tyr, p.Arg64Ser, p.Ala80Pro, p.Ile143Thr, p.Lys238Glu, p.Tyr240Cys, which were assessed twice. d, Scatterplot comparison of WT-normalized EGFP:mCherry ratios for EGFP- or 15-aa split-GFP fused PTEN variants. Values are the mean of 3 independently performed experiments. n = 6 samples. “r” and “ρ” denote Pearson’s and Spearman’s correlation coefficients, respectively
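The quantity plotted in panels b to d, the geometric mean of the EGFP:mCherry ratio expressed as a fraction of WT, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the per-cell input lists are hypothetical, and the authors' own analysis (Supplementary Dataset 5, an R Markdown file) may differ in detail.

```python
from statistics import geometric_mean

def wt_normalized_ratio(egfp, mcherry, wt_egfp, wt_mcherry):
    """Geometric mean of per-cell EGFP:mCherry ratios, as a fraction of WT.

    All four arguments are hypothetical per-cell fluorescence lists;
    every value must be strictly positive.
    """
    variant = geometric_mean([g / m for g, m in zip(egfp, mcherry)])
    wild_type = geometric_mean([g / m for g, m in zip(wt_egfp, wt_mcherry)])
    return variant / wild_type
```

A value near 1 would indicate WT-like steady-state abundance, and a value well below 1 a low-abundance variant.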

Supplementary Figure 2 Correlations between PTEN and TPMT VAMP-seq replicates.

a, b, Pairwise VAMP-seq abundance score correlations between replicate sorting experiments for PTEN (a) and TPMT (b). n values are the number of variants scored in both experiments. Replicates 5 and 6 for TPMT contained a subset of mutagenized positions different from those mutagenized in replicates 1 through 4, with both subsets mixed together for Replicates 7 and 8. Pearson’s correlation coefficients are shown. Score numbers in this figure correspond to experiment numbers in Supplementary Table 1
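A pairwise replicate comparison of this kind restricts each correlation to the variants scored in both experiments. The Python sketch below illustrates the idea; the dictionary layout (variant name mapped to score) and the function names are assumptions made for the example, not the authors' code.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def replicate_correlation(rep_a, rep_b):
    """Return (r, n) over the variants scored in both replicates."""
    shared = sorted(set(rep_a) & set(rep_b))
    r = pearson_r([rep_a[v] for v in shared], [rep_b[v] for v in shared])
    return r, len(shared)
```

Returning n alongside r mirrors the caption, where n is the number of variants scored in both experiments.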

Supplementary Figure 3 Validation analyses for VAMP-seq-derived abundance scores.

a, b, Scatterplot comparison of VAMP-seq abundance scores (x-axis) and individually assessed log10-transformed, WT-normalized geometric means of the EGFP:mCherry ratios for various PTEN (a) and TPMT (b) variants (see also Supplementary Figure 1b, c). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. c, PTEN VAMP-seq scores for variants whose steady-state expression was characterized by western blot analysis in previous publications (see Supplementary Table 9). d, Scatterplot comparing TPMT VAMP-seq scores (y-axis) and previously published abundance values from western blots (see Supplementary Table 10). e, Nonsense variant VAMP-seq scores by amino acid position, for PTEN (top) and TPMT (bottom). WT abundance score (1.0) shown as a blue line. N-terminal nonsense variants append a small number of residues to EGFP, which does not affect its abundance. C-terminal nonsense variants remove a small number of residues from PTEN or TPMT, which also does not impact abundance. f, Missense variant abundance score density plots for PTEN (gray) and TPMT (green). The thresholds of the 5% lowest synonymous variant scores are shown, for each protein, by the dotted lines. g, h, Scatterplot comparing positional median PTEN (g) and TPMT (h) VAMP-seq scores to PSIC evolutionary conservation scores for each position (Sunyaev et al.). i, j, Positional median PTEN (i) and TPMT (j) abundance scores for positions found in various secondary structure types, with the red line denoting the median value for the group. n values denote the number of positions that fell into each category
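Panel f marks, for each protein, the threshold of the 5% lowest synonymous variant scores. A minimal Python sketch of such a cutoff follows. It assumes a linearly interpolated 5th percentile, since the caption does not state the percentile convention, and the function names are hypothetical.

```python
def lower_percentile(scores, q=0.05):
    """q-th quantile of scores using linear interpolation (an assumed convention)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = q * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])

def fraction_below(scores, threshold):
    """Fraction of scores that fall strictly below the threshold."""
    return sum(1 for s in scores if s < threshold) / len(scores)
```

Applying `fraction_below` to missense scores with the synonymous-derived threshold gives the share of missense variants that score lower than nearly all synonymous variants.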

Supplementary Figure 4 Biochemical feature associations with VAMP-seq-derived abundance scores.

a, Scatterplot comparing abundance score (y-axis) to in vitro characterized melting temperatures of select PTEN variants (Johnston et al.). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. b, A plot of positional median scores for PTEN positions with potential hydrogen bonds or salt bridges. A position was considered intolerant only if it had 5 or more variants and more than 90% of the abundance scores were at or below the score threshold containing the lowest 5% of synonymous variants. Red bars denote median abundance score values. n = 26 for substitution intolerant, and n = 50 for the remaining positions. c, Substitution-intolerant PTEN positions with potential polar contacts, clustered by distance based on PDB coordinates (PDB: 1d5r). Positions within 11 Å of each other were considered part of a group. The dashed line shows the 11 Å distance cutoff. d, Histogram of the number of PTEN missense variants per position in COSMIC. Substitution-intolerant positions potentially involved in polar contacts with counts in COSMIC greater than 7 are labeled in red. e, Minimum distance of all PTEN positions (gray) or elevated-abundance positions (red) from known phospholipid-binding positions. The black line denotes a 7 Å distance. A position was considered elevated in abundance only if it had 5 or more variants and there were more than 5 variants with scores above the median of the synonymous distribution. f, VAMP-seq scores for variants at position S385, with a synonymous variant in black, negatively charged variants in red, positively charged variants in blue, and all other variants in gray
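The substitution-intolerance rule stated in panel b (5 or more variants, with more than 90% of abundance scores at or below the synonymous-derived threshold) can be written as a short Python predicate. The function and parameter names are hypothetical; only the two numeric criteria come from the caption.

```python
def is_substitution_intolerant(scores, threshold, min_variants=5, min_fraction=0.9):
    """Apply the caption's rule to the abundance scores at one position.

    scores: abundance scores of the variants observed at the position.
    threshold: score cutoff containing the lowest 5% of synonymous variants.
    """
    if len(scores) < min_variants:
        return False
    n_low = sum(1 for s in scores if s <= threshold)
    # "More than 90%" is a strict inequality, so exactly 90% does not qualify.
    return n_low / len(scores) > min_fraction
```

The variant-count requirement keeps sparsely sampled positions from being called intolerant on the basis of a few scores.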

Supplementary Figure 5 PTEN variant abundance classification and relationship to germline and somatic variation.

a, Illustrative examples of variant abundance classifications, with the dotted line representing the threshold above which 95% of synonymous variants reside. Points represent the VAMP-seq score for each representative variant, with error bars denoting the 95% confidence interval derived from experimental replicates. n values are 3, 5, 2, and 4 for p.Thr2Asp, p.Thr5Ala, p.Glu7His, and p.Lys6Ile, respectively. b, Frequencies of each PTEN abundance class for each PTEN ClinVar interpretation, as well as for all possible SNVs with abundance classifications. c, Abundance scores and classes for PTEN variants with allele counts highly unlikely to be causal for Cowden’s Syndrome. d, Frequencies of all observed PTEN variants across different cancer types in the TCGA and AACR GENIE data. Highly recurrent PTEN variants are labeled in red. e, Western blot analysis of a clonal line stably expressing WT or missense variants of N-terminally HA-tagged PTEN. This line was derived independently from the line used to generate the data shown in Figure 4f. This experiment was independently performed twice with similar results. f, Comparison of PTEN abundance scores with changes in folding energies predicted by Rosetta using the ddg_monomer protocol. Variants are shown as gray circles, with the exception of those with Rosetta ΔΔG predictions greater than 17, which are marked by a black “x” at a ΔΔG value of 17. Contour lines are colored by the regional density of points. Previously or newly identified PTEN dominant negative variants shown as blue points with blue labels
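Panel a combines a score, its 95% confidence interval and the synonymous-derived threshold into an abundance class. The caption does not spell out the decision rule, so the Python sketch below assumes a four-class scheme in which the confidence interval decides between a firm and a tentative call; the class labels and the function name are likewise assumptions.

```python
def classify_abundance(score, ci_lower, ci_upper, threshold):
    """Assign an abundance class from a score, its 95% CI and a threshold.

    Assumed rule: a call is firm only when the whole confidence
    interval lies on the same side of the threshold as the score.
    """
    if not ci_lower <= score <= ci_upper:
        raise ValueError("score must lie within its confidence interval")
    if score < threshold:
        return "low" if ci_upper < threshold else "possibly low"
    return "WT-like" if ci_lower > threshold else "possibly WT-like"
```

Under this assumed rule, a variant measured in few replicates and therefore carrying a wide interval tends to land in one of the two tentative classes.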

Supplementary Figure 6 Flow chart of PTEN p.Ile135Lys pathogenicity reinterpretation using VAMP-seq data.

The ACMG/AMP joint criteria for classifying variants were used, with low abundance classification by VAMP-seq considered strong experimental support of pathogenicity (PS3). Without functional data, there is no strong or very strong evidence of pathogenicity for this variant; therefore, pathogenic criteria cannot be fulfilled and the variant remains classified as likely pathogenic. With low abundance data, PS3 can be used and pathogenic criteria are met

Supplementary Figure 7 Relationship of TPMT variant abundance to drug sensitivity.

a, Scatterplot comparing abundance scores and previously characterized red blood cell (RBC) activity from patients. b, c, Scatterplots comparing individually assessed, WT-normalized EGFP:mCherry geometric means to previously published values of average RBC activity (b), or average patient dosage intensity (c). Dose intensity is the dose where 6-MP becomes toxic to the patient before reaching the 100% protocol dose of 75 mg/m². r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. n = 6 samples for each plot. d, Western blotting results for individually expressed TPMT variant GFP fusions. Each variant was blotted with 45, 15, and 5 µg of total protein input per lane. This experiment was performed once

Supplementary Figure 8 Protein stability indices for most human protein N-terminal EGFP fusions.

A histogram of protein stability indices from Yen et al. Protein stability index values for proteins tested in the VAMP-seq assay are shown as dashed vertical lines. Protein stability indices were not available for PTEN, CYP2C9, CYP2C19, and PMS2

Supplementary Figure 9 Amplification and sequencing technical replicates for PTEN.

Scatterplots comparing variant frequency derived from replicate PCR amplification and sequencing for each of the four bins in every PTEN experiment are shown

Supplementary Figure 10 Scheme to determine total frequency filtering threshold value.

a, b, Scatterplots showing the total frequencies and weighted average values of WT (black), synonymous variants (red), or non-terminal nonsense variants (blue) for each experiment, for PTEN and TPMT respectively. A combination of synonymous variant coefficient of variation (c and d), synonymous variant mean (black) and median (red) (e and f), and total number of scored missense variants (g and h) for PTEN (c, e, and g) and TPMT (d, f, and h) were assessed at increasing total frequency filtering threshold values to obtain the threshold value that we required across the four bins for a variant to be included in the analyses we present. The 1 × 10^-4.75 total frequency threshold used for the final analysis is displayed as a dotted line in each plot
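The filter described here keeps a variant only if its total frequency across the four bins reaches the chosen threshold. The Python sketch below illustrates that filter together with a weighted average over bins. The bin weights (0.25, 0.5, 0.75, 1.0), the summation of per-bin frequencies and all names are assumptions made for illustration, since the caption gives only the threshold value.

```python
TOTAL_FREQ_THRESHOLD = 10 ** -4.75   # threshold stated in the caption
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)  # assumed weights for bins 1 to 4

def passes_frequency_filter(bin_freqs, threshold=TOTAL_FREQ_THRESHOLD):
    """True if the variant's summed frequency over all bins meets the threshold."""
    return sum(bin_freqs) >= threshold

def weighted_average(bin_freqs, weights=BIN_WEIGHTS):
    """Frequency-weighted mean of the bin weights for one variant."""
    total = sum(bin_freqs)
    if total <= 0:
        raise ValueError("variant has no reads in any bin")
    return sum(f * w for f, w in zip(bin_freqs, weights)) / total
```

Raising the threshold trades the number of scored variants against the noise of low-count variants, which is the trade-off panels c to h explore.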

Supplementary Figure 11 Statistics for the PTEN library.

a, Barcode counts from independent amplifications of the barcoded PTEN library plasmid preparation used for recombination. n = 67,162 data points. r denotes Pearson’s correlation coefficient. b, A filter based on a minimum count of 200 was imposed (black dotted line), resulting in 40,560 unique barcodes. c, The barcode-variant map was used to determine the frequencies of different types of sequences in the plasmid preparation of the barcoded PTEN library. d, Nucleotide biases at the degenerate codon for the single amino acid PTEN variants. e, Amino acid biases of the single amino acid variants of the PTEN library, with the frequencies expected from perfect NNK mutagenesis shown in red. f, Number of substitutions observed at each position of the PTEN protein amongst the 40,560 barcodes in the PTEN library plasmid preparation. g, Distribution of number of substitutions per position in the PTEN protein. h, Distribution of single amino acid variant frequencies in the PTEN library (black), along with an illustrative log-normal distribution that closely fits the PTEN data (red), shown as a density plot (top panel), or a cumulative distribution function plot (bottom panel). i, Sampling simulations of observed and hypothetical PTEN libraries, displaying the fraction of the 8,040 possible PTEN single amino acid and nonsense variants observed for increasing sampling sizes, with a step size of 1. Results of sampling from the PTEN variant frequency distribution observed in the library plasmid preparation are shown in black. Results of sampling hypothetical, uniformly distributed libraries containing either the subset of single amino acid variants observed in the PTEN library plasmid preparation (dark gray), or all possible PTEN single amino acid variants (light gray) are shown for comparison
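The expected frequencies from perfect NNK mutagenesis referred to in panel e follow directly from the standard genetic code: N is any base and K is G or T, giving 32 equally likely codons. The Python sketch below derives them; the function name is hypothetical, and the calculation assumes every NNK codon is equally represented.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def nnk_expected_frequencies():
    """Expected residue frequencies when all 32 NNK codons are equally likely."""
    codons = [a + b + k for a in "ACGT" for b in "ACGT" for k in "GT"]
    counts = Counter(CODON_TABLE[c] for c in codons)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}
```

Under this model leucine, serine and arginine are each encoded by three of the 32 codons and the single remaining stop codon (TAG) by one, so deviations from these proportions indicate bias in the library.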

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Tables 1–3, 5–10 and Supplementary Note

Reporting Summary

Supplementary Dataset 1

Dataset of PTEN variant scores, classifications, and annotations

Supplementary Dataset 2

Dataset of TPMT variant scores, classifications, and annotations

Supplementary Dataset 3

Dataset of PTEN residue scores, classifications, and annotations

Supplementary Dataset 4

Dataset of TPMT residue scores, classifications, and annotations

Supplementary Dataset 5

R Markdown file recreating all of the analyses

Supplementary Table 4

Table of PTEN variant pathogenicity reclassifications that are possible with abundance data

Cite this article

Matreyek, K.A., Starita, L.M., Stephany, J.J. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet 50, 874–882 (2018).
