Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.
Shirts, B. H., Pritchard, C. C. & Walsh, T. Family-specific variants and the limits of human genetics. Trends Mol. Med. 22, 925–934 (2016).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 9, 2267–2284 (2014).
Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).
Manolio, T. A. et al. Bedside back to bench: building bridges between basic and clinical genomic research. Cell 169, 6–12 (2017).
Starita, L. M. et al. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200, 413–422 (2015).
Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).
Redler, R. L., Das, J., Diaz, J. R. & Dokholyan, N. V. Protein destabilization as a common factor in diverse inherited disorders. J. Mol. Evol. 82, 11–16 (2016).
Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).
Lee, M. S. et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res. 70, 4880–4890 (2010).
Tai, H. L., Krynetski, E. Y., Schuetz, E. G., Yanishevski, Y. & Evans, W. E. Enhanced proteolysis of thiopurine S-methyltransferase (TPMT) encoded by mutant alleles in humans (TPMT*3A, TPMT*2): mechanisms for the genetic polymorphism of TPMT activity. Proc. Natl Acad. Sci. USA 94, 6444–6449 (1997).
Kim, I., Miller, C. R., Young, D. L. & Fields, S. High-throughput analysis of in vivo protein stability. Mol. Cell. Proteomics 12, 3370–3378 (2013).
Klesmith, J. R., Bacik, J.-P., Wrenbeck, E. E., Michalczyk, R. & Whitehead, T. A. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl Acad. Sci. USA 114, 2265–2270 (2017).
Yen, H.-C. S., Xu, Q., Chou, D. M., Zhao, Z. & Elledge, S. J. Global protein stability profiling in mammalian cells. Science 322, 918–923 (2008).
Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).
Jain, P. C. & Varadarajan, R. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library. Anal. Biochem. 449, 90–98 (2014).
Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).
Johnston, S. B. & Raines, R. T. Conformational stability and catalytic activity of PTEN variants linked to cancers and autism spectrum disorders. Biochemistry 54, 1576–1582 (2015).
Wu, H. et al. Structural basis of allele variation of human thiopurine-S-methyltransferase. Proteins 67, 198–208 (2007).
Ward, W. W., Prentice, H. J., Roth, A. F., Cody, C. W. & Reeves, S. C. Spectral perturbations of the Aequorea green-fluorescent protein. Photochem. Photobiol. 35, 803–808 (1982).
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 54, 315–322 (2004).
Kauzmann, W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63 (1959).
Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
Lee, J. O. et al. Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association. Cell 99, 323–334 (1999).
Song, M. S., Salmena, L. & Pandolfi, P. P. The functions and regulation of the PTEN tumour suppressor. Nat. Rev. Mol. Cell Biol. 13, 283–296 (2012).
Nguyen, H.-N. et al. A new class of cancer-associated PTEN mutations defined by membrane translocation defects. Oncogene 34, 3737–3743 (2015).
Walker, S. M., Leslie, N. R., Perera, N. M., Batty, I. H. & Downes, C. P. The tumour-suppressor function of PTEN requires an N-terminal lipid-binding motif. Biochem. J. 379, 301–307 (2004).
Das, S., Dixon, J. E. & Cho, W. Membrane-binding and activation mechanism of PTEN. Proc. Natl Acad. Sci. USA 100, 7491–7496 (2003).
Vazquez, F., Ramaswamy, S., Nakamura, N. & Sellers, W. R. Phosphorylation of the PTEN tail regulates protein stability and function. Mol. Cell. Biol. 20, 5010–5018 (2000).
Wei, Y., Stec, B., Redfield, A. G., Weerapana, E. & Roberts, M. F. Phospholipid-binding sites of phosphatase and tensin homolog (PTEN): Exploring the mechanism of phosphatidylinositol 4,5-bisphosphate activation. J. Biol. Chem. 290, 1592–1606 (2015).
Naguib, A. et al. PTEN functions by recruitment to cytoplasmic vesicles. Mol. Cell 58, 255–268 (2015).
Hobert, J. A. & Eng, C. PTEN hamartoma tumor syndrome: an overview. Genet. Med. 11, 687–694 (2009).
Melbārde-Gorkuša, I. et al. Challenges in the management of a patient with Cowden syndrome: case report and literature review. Hered. Cancer Clin. Pract. 10, 5 (2012).
Staal, F. J. T. et al. A novel germline mutation of PTEN associated with brain tumours of multiple lineages. Br. J. Cancer 86, 1586–1591 (2002).
Nelen, M. R. et al. Novel PTEN mutations in patients with Cowden disease: Absence of clear genotype–phenotype correlations. Eur. J. Hum. Genet. 7, 267–273 (1999).
Whiffin, N. et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet. Med. 19, 1151–1158 (2017).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
Hollander, M. C., Blumenthal, G. M. & Dennis, P. A. PTEN loss in the continuum of common cancers, rare syndromes and mouse models. Nat. Rev. Cancer 11, 289–301 (2011).
Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831 (2017).
Papa, A. et al. Cancer-associated PTEN mutants act in a dominant-negative manner to suppress PTEN protein function. Cell 157, 595–610 (2014).
Leslie, N. R. & Longy, M. Inherited PTEN mutations and the prediction of phenotype. Semin. Cell Dev. Biol. 52, 30–38 (2016).
Wang, H. et al. Allele-specific tumor spectrum in Pten knockin mice. Proc. Natl Acad. Sci. USA 107, 5142–5147 (2010).
Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000).
Aguissa-Touré, A.-H. & Li, G. Genetic alterations of PTEN in human melanoma. Cell. Mol. Life Sci. 69, 1475–1491 (2012).
Hodges, L. M. et al. Very important pharmacogene summary. Pharmacogenet. Genomics 21, 152–161 (2011).
Relling, M. V. et al. Clinical pharmacogenetics implementation consortium guidelines for thiopurine methyltransferase genotype and thiopurine dosing: 2013 update. Clin. Pharmacol. Ther. 93, 324–325 (2013).
Liu, C. et al. Genomewide approach validates thiopurine methyltransferase activity is a monogenic pharmacogenomic trait. Clin. Pharmacol. Ther. 101, 373–381 (2017).
Appell, M. L. et al. Nomenclature for alleles of the thiopurine methyltransferase gene. Pharmacogenet. Genomics 23, 242–248 (2013).
Hamdan-Khalil, R. et al. In vitro characterization of four novel non-functional variants of the thiopurine S-methyltransferase. Biochem. Biophys. Res. Commun. 309, 1005–1010 (2003).
Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 1–7 (2016).
Relling, M. et al. New Pharmacogenomics Research network: an open community catalyzing research and translation in precision medicine. Clin. Pharmacol. Ther. 102, 897–902 (2017).
Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug Targets 15, 65–79 (2014).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Rubin, A. F. et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18, 1–15 (2017).
Krauthammer, M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat. Genet. 44, 1006–1014 (2012).
Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).
We thank J. Underwood and K. Munson of the UW PacBio Sequencing Services for assistance with long-read sequencing; A. Leith of the UW Foege Flow Lab and L. Gitari and D. Prunkard of the UW Pathology Flow Cytometry Core Facility for assistance with cell sorting; and B. Shirts and C. Pritchard in the UW Department of Lab Medicine for advice. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. This work was supported by the National Institute of General Medical Sciences (1R01GM109110 and 5R24GM115277 to D.M.F.; P50GM115279 to M.V.R. and W.E.E.), the National Cancer Institute (R01CA096670 to S.B.; P30CA21765 to M.V.R.) and an NIH Director’s Pioneer Award (DP1HG007811 to J.S.). K.A.M. is an American Cancer Society Fellow (PF-15-221-01) and was supported by a National Cancer Institute Interdisciplinary Training Grant in Cancer (2T32CA080416). M.A.C. and V.E.G. are supported by the National Science Foundation Graduate Research Fellowship. J.N.D. is supported by a National Institute of General Medical Sciences Training Grant (T32GM007454). J.S. is an Investigator of the Howard Hughes Medical Institute. D.M.F. is a Canadian Institute for Advanced Research Azrieli Global Scholar.
The authors declare that the variant functional data presented herein are copyrighted, and may be freely used for non-commercial purposes. Licensing for commercial use may benefit the authors. The authors declare no additional competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
Supplementary Figure 1 Validation experiments of EGFP-fusions for assessing PTEN and TPMT steady-state abundance.
a, Representative gating strategy for mTagBFP2-negative, mCherry-positive (recombined) cells; the example shown contains 15,000 recombined cells. b, Geometric means of EGFP:mCherry ratios, expressed as a fraction of WT, for known and previously uncharacterized low-abundance PTEN variants. Error bars denote 95% confidence intervals of the mean (red); individual data points are shown in gray. Each variant was assessed in at least three independent experiments. c, As in b, for TPMT. All variants were assessed in three independent experiments, except p.Asp15Tyr, p.Arg64Ser, p.Ala80Pro, p.Ile143Thr, p.Lys238Glu and p.Tyr240Cys, which were assessed twice. d, Scatterplot comparing WT-normalized EGFP:mCherry ratios for PTEN variants fused to EGFP or to a 15-aa split-GFP fragment. Values are the mean of three independently performed experiments; n = 6 samples. r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively.
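The WT normalization in panels b and c can be sketched in a few lines. This is a minimal sketch, assuming per-cell EGFP and mCherry intensities are available as plain lists; the function names are hypothetical, and the Student's t interval is our assumption about how the 95% confidence intervals of the mean were computed across independent experiments.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
# Only small replicate counts are covered; this table is an assumption of the sketch.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def geometric_mean_ratio(egfp, mcherry):
    """Geometric mean of per-cell EGFP:mCherry ratios (mean taken in log space)."""
    logs = [math.log(g / m) for g, m in zip(egfp, mcherry)]
    return math.exp(statistics.fmean(logs))


def fraction_of_wt(variant_cells, wt_cells):
    """Variant geometric-mean ratio divided by the WT geometric-mean ratio.

    Each argument is an (egfp, mcherry) pair of equal-length intensity lists.
    """
    return geometric_mean_ratio(*variant_cells) / geometric_mean_ratio(*wt_cells)


def mean_with_ci95(values):
    """Mean of per-experiment values with a t-based 95% confidence interval."""
    n = len(values)
    mean = statistics.fmean(values)
    half_width = T_CRIT_95[n - 1] * statistics.stdev(values) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)


# Hypothetical per-cell intensities for one experiment.
wt = ([100, 200, 400], [100, 100, 100])       # ratios 1, 2, 4 -> geometric mean 2
variant = ([50, 100, 200], [100, 100, 100])   # ratios 0.5, 1, 2 -> geometric mean 1
print(fraction_of_wt(variant, wt))            # about 0.5
print(mean_with_ci95([0.45, 0.50, 0.55]))     # three hypothetical experiments
```

Averaging in log space is what makes the result a geometric mean, which keeps a few very bright cells from dominating the ratio.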
a, b, Pairwise correlations of VAMP-seq abundance scores between replicate sorting experiments for PTEN (a) and TPMT (b). n is the number of variants scored in both experiments. TPMT replicates 5 and 6 covered a different subset of mutagenized positions from replicates 1–4; both subsets were mixed together for replicates 7 and 8. Pearson’s correlation coefficients are shown. Score numbers in this figure correspond to experiment numbers in Supplementary Table 1.
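Several of these figures report Pearson's r and Spearman's ρ between sets of scores. A minimal pure-Python sketch follows; it assumes each replicate is a dictionary mapping a variant name to its abundance score (a hypothetical data layout with made-up keys and values), restricts the comparison to variants scored in both replicates, and handles ties in Spearman's ρ with average ranks. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same statistics.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def shared_variant_correlation(rep_a, rep_b):
    """Correlate two replicate score dicts over the variants scored in both."""
    shared = sorted(set(rep_a) & set(rep_b))
    xs = [rep_a[v] for v in shared]
    ys = [rep_b[v] for v in shared]
    return len(shared), pearson(xs, ys), spearman(xs, ys)


# Hypothetical replicate scores; var_d was scored in only one replicate.
rep1 = {"var_a": 0.31, "var_b": 0.92, "var_c": 0.48, "var_d": 1.05}
rep2 = {"var_a": 0.27, "var_b": 0.88, "var_c": 0.55}
n, r, rho = shared_variant_correlation(rep1, rep2)
print(n, round(r, 3), rho)
```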
a, b, Scatterplots comparing VAMP-seq abundance scores (x axis) with individually assessed, log10-transformed, WT-normalized geometric means of EGFP:mCherry ratios for selected PTEN (a) and TPMT (b) variants (see also Supplementary Figure 1b,c). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. c, PTEN VAMP-seq scores for variants whose steady-state expression was characterized by western blot in previous publications (see Supplementary Table 9). d, Scatterplot comparing TPMT VAMP-seq scores (y axis) with previously published abundance values from western blots (see Supplementary Table 10). e, VAMP-seq scores of nonsense variants by amino acid position for PTEN (top) and TPMT (bottom). The WT abundance score (1.0) is shown as a blue line. N-terminal nonsense variants append only a few residues to EGFP, which does not affect its abundance; C-terminal nonsense variants remove only a few residues from PTEN or TPMT, which also does not affect abundance. f, Density plots of missense variant abundance scores for PTEN (gray) and TPMT (green). Dotted lines mark, for each protein, the threshold below which the lowest-scoring 5% of synonymous variants fall. g, h, Scatterplots comparing positional median VAMP-seq scores with PSIC evolutionary conservation scores for PTEN (g) and TPMT (h) (Sunyaev et al.). i, j, Positional median abundance scores for PTEN (i) and TPMT (j) positions in each secondary structure type; red lines denote group medians. n is the number of positions in each category.
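The synonymous-variant threshold in panel f is the 5th percentile of synonymous scores. This sketch assumes linear interpolation between order statistics (the default convention of `numpy.percentile`); the interpolation rule actually used is not stated here, and the function names and example scores are hypothetical. It compares point estimates only and ignores replicate confidence intervals.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def synonymous_threshold(synonymous_scores, q=5):
    """Score below which the lowest-scoring q% of synonymous variants fall."""
    return percentile(synonymous_scores, q)


def low_abundance(missense_scores, threshold):
    """Missense variants whose point score lies below the threshold."""
    return {v for v, s in missense_scores.items() if s < threshold}


# Hypothetical scores: 21 synonymous variants spread evenly from 0.80 to 1.20.
synonymous = [0.80 + 0.02 * i for i in range(21)]
cutoff = synonymous_threshold(synonymous)          # 0.82 for this toy example
missense = {"var_a": 0.30, "var_b": 0.90, "var_c": 0.81}
print(round(cutoff, 3), sorted(low_abundance(missense, cutoff)))
```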
a, Scatterplot comparing abundance scores (y axis) with in vitro melting temperatures of selected PTEN variants (Johnston et al.). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. b, Positional median scores for PTEN positions with potential hydrogen bonds or salt bridges. A position was considered substitution intolerant only if it had five or more variants and more than 90% of its abundance scores were at or below the threshold containing the lowest-scoring 5% of synonymous variants. Red bars denote median abundance scores. n = 26 substitution-intolerant positions and n = 50 remaining positions. c, Substitution-intolerant PTEN positions with potential polar contacts, clustered by distance using coordinates from PDB 1D5R. Positions within 11 Å of each other were considered part of the same group; the dashed line shows the 11 Å cutoff. d, Histogram of the number of PTEN missense variants per position in COSMIC. Substitution-intolerant positions potentially involved in polar contacts with more than 7 counts in COSMIC are labeled in red. e, Minimum distance from known phospholipid-binding positions for all PTEN positions (gray) and for elevated-abundance positions (red). The black line denotes a distance of 7 Å. A position was considered elevated in abundance only if it had five or more variants and more than five of them scored above the median of the synonymous distribution. f, VAMP-seq scores for variants at position Ser385: the synonymous variant in black, negatively charged substitutions in red, positively charged substitutions in blue and all other substitutions in gray.
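The position-level rules in panels b, c and e translate directly into code. The sketch below follows the caption wording for substitution-intolerant and elevated-abundance positions; for the 11 Å grouping it assumes single-linkage clustering, which is equivalent to grouping positions connected by chains of pairwise distances of at most 11 Å. Which atoms supply each residue's coordinates, the function names and the toy coordinates are all assumptions.

```python
import math


def is_substitution_intolerant(scores, threshold, min_variants=5):
    """At least min_variants scored, and >90% of scores at or below the threshold."""
    if len(scores) < min_variants:
        return False
    return sum(s <= threshold for s in scores) / len(scores) > 0.9


def is_elevated(scores, synonymous_median, min_variants=5):
    """At least min_variants scored, and more than five above the synonymous median."""
    if len(scores) < min_variants:
        return False
    return sum(s > synonymous_median for s in scores) > 5


def distance_groups(coords, cutoff=11.0):
    """Single-linkage groups of positions, cut at `cutoff` angstroms.

    coords maps a residue position to an (x, y, z) tuple (ASSUMED: one
    representative atom per residue). Uses union-find over all pairs.
    """
    parent = {p: p for p in coords}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    positions = list(coords)
    for i, a in enumerate(positions):
        for b in positions[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups = {}
    for p in positions:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy coordinates: positions 1-3 chain together at 10 Å spacing; 4 is isolated.
toy = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (20.0, 0.0, 0.0), 4: (50.0, 0.0, 0.0)}
print(distance_groups(toy))   # [[1, 2, 3], [4]]
```

Under single linkage, positions 1 and 3 share a group even though they are 20 Å apart, because each is within 11 Å of position 2; a complete-linkage cut would split them.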
Supplementary Figure 5 PTEN variant abundance classification and relationship to germline and somatic variation.
a, Illustrative examples of variant abundance classifications; the dotted line marks the threshold above which 95% of synonymous variants reside. Points show the VAMP-seq score of each representative variant, and error bars denote 95% confidence intervals derived from experimental replicates. n = 3, 5, 2 and 4 for p.Thr2Asp, p.Thr5Ala, p.Glu7His and p.Lys6Ile, respectively. b, Frequency of each PTEN abundance class for each PTEN ClinVar interpretation, and for all possible SNVs with abundance classifications. c, Abundance scores and classes for PTEN variants whose allele counts make them highly unlikely to be causal for Cowden syndrome. d, Frequencies of all observed PTEN variants across cancer types in the TCGA and AACR GENIE data; highly recurrent PTEN variants are labeled in red. e, Western blot analysis of a clonal line stably expressing WT or missense variants of N-terminally HA-tagged PTEN. This line was derived independently of the line used to generate the data shown in Figure 4f. The experiment was performed independently twice with similar results. f, Comparison of PTEN abundance scores with changes in folding energy (ΔΔG) predicted by Rosetta using the ddg_monomer protocol. Variants are shown as gray circles, except those with predicted ΔΔG greater than 17, which are plotted as a black “x” at ΔΔG = 17. Contour lines are colored by the local density of points. Previously or newly identified dominant-negative PTEN variants are shown as blue points with blue labels.
Supplementary Figure 6 Flow chart of PTEN p.Ile135Lys pathogenicity reinterpretation using VAMP-seq data.
The ACMG/AMP joint criteria for classifying variants were used, with a low-abundance classification by VAMP-seq considered strong experimental support for pathogenicity (PS3). Without functional data, there is no strong or very strong evidence of pathogenicity for this variant; the criteria for a pathogenic classification therefore cannot be met, and the variant remains classified as likely pathogenic. With the low-abundance data, PS3 applies and the criteria for a pathogenic classification are met.
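The combining rules behind this flow chart can be written as a small function. This sketch encodes the pathogenic and likely-pathogenic combinations from the ACMG/AMP guidelines (Richards et al.) as counts of pathogenic criteria at each strength level. It ignores benign criteria, and the evidence counts in the example are hypothetical rather than the criteria actually met by p.Ile135Lys; it only illustrates how one added strong criterion (PS3) can move a variant with no other strong evidence from likely pathogenic to pathogenic.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Number of pathogenic criteria met at each ACMG/AMP strength level."""
    very_strong: int = 0  # PVS1
    strong: int = 0       # PS1-PS4 (PS3 = well-established functional studies)
    moderate: int = 0     # PM1-PM6
    supporting: int = 0   # PP1-PP5


def is_pathogenic(e):
    pvs, ps, pm, pp = e.very_strong, e.strong, e.moderate, e.supporting
    return (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )


def is_likely_pathogenic(e):
    pvs, ps, pm, pp = e.very_strong, e.strong, e.moderate, e.supporting
    return (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )


def classify(e):
    if is_pathogenic(e):
        return "pathogenic"
    if is_likely_pathogenic(e):
        return "likely pathogenic"
    return "insufficient pathogenic evidence"


# Hypothetical evidence: three moderate criteria and no strong evidence.
before = Evidence(moderate=3)
after = Evidence(strong=1, moderate=3)  # add PS3 from a low-abundance call
print(classify(before), "->", classify(after))  # likely pathogenic -> pathogenic
```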
a, Scatterplot comparing abundance scores with previously characterized red blood cell (RBC) activity from patients. b, c, Scatterplots comparing individually assessed, WT-normalized EGFP:mCherry geometric means with previously published values of average RBC activity (b) or average patient dose intensity (c). Dose intensity reflects the dose at which 6-MP became toxic to the patient, short of the 100% protocol dose of 75 mg/m². r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. n = 6 samples for each plot. d, Western blots of individually expressed TPMT variant GFP fusions. Each variant was blotted with 45, 15 and 5 µg of total protein input per lane. This experiment was performed once.
A histogram of protein stability indices from Yen et al. Protein stability index values for proteins tested in the VAMP-seq assay are shown as dashed vertical lines. Protein stability indices were not available for PTEN, CYP2C9, CYP2C19, and PMS2
Scatterplots comparing variant frequencies derived from replicate PCR amplifications and sequencing, for each of the four bins in every PTEN experiment.
a, b, Scatterplots showing the total frequencies and weighted average values of WT (black), synonymous variants (red) and non-terminal nonsense variants (blue) in each experiment, for PTEN (a) and TPMT (b). To choose the minimum total frequency across the four bins required for a variant to be included in our analyses, we assessed, at increasing filtering thresholds, the synonymous variant coefficient of variation (c, d), the synonymous variant mean (black) and median (red) (e, f), and the total number of scored missense variants (g, h), for PTEN (c, e, g) and TPMT (d, f, h). The total frequency threshold of 1 × 10^−4.75 used for the final analysis is shown as a dotted line in each plot.
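The two per-variant quantities in panels a and b (total frequency across the four bins and a frequency-weighted average) and the 1 × 10^−4.75 filter can be sketched as follows. The bin weights (0.25, 0.5, 0.75 and 1 for bins ordered from lowest to highest abundance) and the rescaling that places the median synonymous score at 1 and the median nonsense score at 0 are assumptions of this sketch; the rescaling is merely consistent with a WT abundance score of 1.0. Function names and frequencies are hypothetical.

```python
import statistics

FREQ_THRESHOLD = 10 ** -4.75           # minimum summed frequency across the four bins
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)   # ASSUMED weights, lowest- to highest-abundance bin


def weighted_average(bin_freqs, weights=BIN_WEIGHTS):
    """Frequency-weighted mean bin weight for one variant, or None if filtered out."""
    total = sum(bin_freqs)
    if total < FREQ_THRESHOLD:
        return None
    return sum(w * f for w, f in zip(weights, bin_freqs)) / total


def rescale(raw_scores, synonymous_raw, nonsense_raw):
    """ASSUMED normalization: synonymous median -> 1.0, nonsense median -> 0.0."""
    syn = statistics.median(synonymous_raw)
    non = statistics.median(nonsense_raw)
    return {v: (s - non) / (syn - non) for v, s in raw_scores.items()}


# Hypothetical per-bin frequencies (fraction of reads in each bin).
freqs = {
    "var_a": [1e-3, 0.0, 0.0, 0.0],     # only in the lowest bin
    "var_b": [0.0, 0.0, 0.0, 1e-3],     # only in the highest bin
    "var_c": [1e-6, 1e-6, 1e-6, 1e-6],  # too rare: filtered out
}
raw = {v: weighted_average(f) for v, f in freqs.items()}
kept = {v: s for v, s in raw.items() if s is not None}
print(rescale(kept, synonymous_raw=[1.0], nonsense_raw=[0.25]))
```

Dividing by the summed frequency makes the weighted average insensitive to how abundant a variant is in the library, which is why a separate frequency filter is needed to discard variants too rare to score reliably.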
a, Barcode counts from independent amplifications of the barcoded PTEN library plasmid preparation used for recombination. n = 67,162 data points. r denotes Pearson’s correlation coefficient. b, A filter based on a minimum count of 200 was imposed (black dotted line), resulting in 40,560 unique barcodes. c, The barcode-variant map was used to determine the frequencies of different types of sequences in the plasmid preparation of the barcoded PTEN library. d, Nucleotide biases at the degenerate codon for the single amino acid PTEN variants. e, Amino acid biases of the single amino acid variants of the PTEN library, with the frequencies expected from perfect NNK mutagenesis shown in red. f, Number of substitutions observed at each position of the PTEN protein amongst the 40,560 barcodes in the PTEN library plasmid preparation. g, Distribution of number of substitutions per position in the PTEN protein. h, Distribution of single amino acid variant frequencies in the PTEN library (black), along with an illustrative log-normal distribution that closely fits the PTEN data (red), shown as a density plot (top panel), or a cumulative distribution function plot (bottom panel). i, Sampling simulations of observed and hypothetical PTEN libraries, displaying the fraction of the 8,040 possible PTEN single amino acid and nonsense variants observed for increasing sampling sizes, with a step size of 1. Results of sampling from the PTEN variant frequency distribution observed in the library plasmid preparation are shown in black. Results of sampling hypothetical, uniformly distributed libraries containing either the subset of single amino acid variants observed in the PTEN library plasmid preparation (dark gray), or all possible PTEN single amino acid variants (light gray) are shown for comparison
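Two calculations in this figure are easy to reproduce in outline: the amino-acid frequencies expected from perfect NNK mutagenesis (panel e) and a coverage curve from repeated sampling (panel i). This sketch uses the standard genetic code; the sampling routine draws variants with replacement in proportion to their library frequency, which is our assumption about how the simulation was set up, and the names and toy library are hypothetical.

```python
import random
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nnk_expected_frequencies():
    """Expected amino-acid (and stop) frequencies for an ideal NNK codon (K = G or T)."""
    codons = ["".join(c) for c in product(BASES, BASES, "GT")]
    counts = Counter(CODON_TABLE[c] for c in codons)
    return {aa: n / len(codons) for aa, n in counts.items()}


def coverage_curve(freqs, n_possible, sample_sizes, seed=0):
    """Fraction of n_possible variants seen after k frequency-weighted draws, per k."""
    rng = random.Random(seed)
    variants = list(freqs)
    weights = [freqs[v] for v in variants]
    seen, curve, drawn = set(), [], 0
    for k in sorted(sample_sizes):
        # Extend the same sample rather than redrawing, so the curve is cumulative.
        seen.update(rng.choices(variants, weights=weights, k=k - drawn))
        drawn = k
        curve.append(len(seen) / n_possible)
    return curve


nnk = nnk_expected_frequencies()
print(nnk["L"], nnk["W"], nnk["*"])  # 0.09375 0.03125 0.03125

# Toy library: 10 equally frequent variants out of 20 possible.
toy_library = {f"var_{i}": 1.0 for i in range(10)}
print(coverage_curve(toy_library, n_possible=20, sample_sizes=[1, 10, 1000]))
```

Under NNK, Leu, Arg and Ser each have three of the 32 codons, Ala, Gly, Pro, Thr and Val two, the remaining amino acids one, and TAG is the only stop codon. A library containing only half of the possible variants can never exceed a coverage of 0.5, however deeply it is sampled, which is the contrast panel i draws between the observed library and a complete one.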
Supplementary Figures 1–11, Supplementary Tables 1–3, 5–10 and Supplementary Note
Dataset of PTEN variant scores, classifications, and annotations
Dataset of TPMT variant scores, classifications, and annotations
Dataset of PTEN residue scores, classifications, and annotations
Dataset of TPMT residue scores, classifications, and annotations
R Markdown file recreating all of the analyses
Table of PTEN variant pathogenicity reclassifications that are possible with abundance data
About this article
Cite this article
Matreyek, K.A., Starita, L.M., Stephany, J.J. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet 50, 874–882 (2018). https://doi.org/10.1038/s41588-018-0122-z