Multiplex assessment of protein variant abundance by massively parallel sequencing

Matreyek, Kenneth A.; Starita, Lea M.; Stephany, Jason J.; Martin, Beth; Chiasson, Melissa A.; Gray, Vanessa E.; Kircher, Martin; Khechaduri, Arineh; Dines, Jennifer N.; Hause, Ronald J.; Bhatia, Smita; Evans, William E.; Relling, Mary V.; Yang, Wenjian; Shendure, Jay; Fowler, Douglas M.

doi:10.1038/s41588-018-0122-z

Article
Published: 21 May 2018

Nature Genetics volume 50, pages 874–882 (2018)

Abstract

Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.

**Fig. 2: VAMP-seq abundance scores for PTEN and TPMT.**

**Fig. 3: Biochemical features influencing intracellular protein abundance.**

**Fig. 4: PTEN variant abundance classes across PTEN hamartoma tumor syndrome and cancer.**

**Fig. 5: TPMT variant abundance classes across pharmacogenomics phenotypes.**

**Fig. 6: Additional drug- and disease-related genes are compatible with VAMP-seq.**

References

Shirts, B. H., Pritchard, C. C. & Walsh, T. Family-specific variants and the limits of human genetics. Trends Mol. Med. 22, 925–934 (2016).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, 980–985 (2014).
Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 9, 2267–2284 (2014).
Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).
Manolio, T. A. et al. Bedside back to bench: building bridges between basic and clinical genomic research. Cell 169, 6–12 (2017).
Starita, L. M. et al. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200, 413–422 (2015).
Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).
Redler, R. L., Das, J., Diaz, J. R. & Dokholyan, N. V. Protein destabilization as a common factor in diverse inherited disorders. J. Mol. Evol. 82, 11–16 (2016).
Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).
Lee, M. S. et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res. 70, 4880–4890 (2010).
Tai, H. L., Krynetski, E. Y., Schuetz, E. G., Yanishevski, Y. & Evans, W. E. Enhanced proteolysis of thiopurine S-methyltransferase (TPMT) encoded by mutant alleles in humans (TPMT*3A, TPMT*2): mechanisms for the genetic polymorphism of TPMT activity. Proc. Natl Acad. Sci. USA 94, 6444–6449 (1997).
Kim, I., Miller, C. R., Young, D. L. & Fields, S. High-throughput analysis of in vivo protein stability. Mol. Cell. Proteomics 12, 3370–3378 (2013).
Klesmith, J. R., Bacik, J.-P., Wrenbeck, E. E., Michalczyk, R. & Whitehead, T. A. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl Acad. Sci. USA 114, 2265–2270 (2017).
Yen, H.-C. S., Xu, Q., Chou, D. M., Zhao, Z. & Elledge, S. J. Global protein stability profiling in mammalian cells. Science 322, 918–923 (2008).
Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).
Jain, P. C. & Varadarajan, R. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library. Anal. Biochem. 449, 90–98 (2014).
Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).
Johnston, S. B. & Raines, R. T. Conformational stability and catalytic activity of PTEN variants linked to cancers and autism spectrum disorders. Biochemistry 54, 1576–1582 (2015).
Wu, H. et al. Structural basis of allele variation of human thiopurine-S-methyltransferase. Proteins 67, 198–208 (2007).
Ward, W. W., Prentice, H. J., Roth, A. F., Cody, C. W. & Reeves, S. C. Spectral perturbations of the Aequorea green-fluorescent protein. Photochem. Photobiol. 35, 803–808 (1982).
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 322, 315–322 (2004).
Kauzmann, W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63 (1959).
Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
Lee, J. O. et al. Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association. Cell 99, 323–334 (1999).
Song, M. S., Salmena, L. & Pandolfi, P. P. The functions and regulation of the PTEN tumour suppressor. Nat. Rev. Mol. Cell Biol. 13, 283–296 (2012).
Nguyen, H.-N. et al. A new class of cancer-associated PTEN mutations defined by membrane translocation defects. Oncogene 34, 3737–3743 (2015).
Walker, S. M., Leslie, N. R., Perera, N. M., Batty, I. H. & Downes, C. P. The tumour-suppressor function of PTEN requires an N-terminal lipid-binding motif. Biochem. J. 379, 301–307 (2004).
Das, S., Dixon, J. E. & Cho, W. Membrane-binding and activation mechanism of PTEN. Proc. Natl Acad. Sci. USA 100, 7491–7496 (2003).
Vazquez, F., Ramaswamy, S., Nakamura, N. & Sellers, W. R. Phosphorylation of the PTEN tail regulates protein stability and function. Mol. Cell. Biol. 20, 5010–5018 (2000).
Wei, Y., Stec, B., Redfield, A. G., Weerapana, E. & Roberts, M. F. Phospholipid-binding sites of phosphatase and tensin homolog (PTEN): Exploring the mechanism of phosphatidylinositol 4,5-bisphosphate activation. J. Biol. Chem. 290, 1592–1606 (2015).
Naguib, A. et al. PTEN functions by recruitment to cytoplasmic vesicles. Mol. Cell 58, 255–268 (2015).
Hobert, J. A. & Eng, C. PTEN hamartoma tumor syndrome: an overview. Genet. Med. 11, 687–694 (2009).
Melbārde-Gorkuša, I. et al. Challenges in the management of a patient with Cowden syndrome: case report and literature review. Hered. Cancer Clin. Pract. 10, 5 (2012).
Staal, F. J. T. et al. A novel germline mutation of PTEN associated with brain tumours of multiple lineages. Br. J. Cancer 86, 1586–1591 (2002).
Nelen, M. R. et al. Novel PTEN mutations in patients with Cowden disease: Absence of clear genotype–phenotype correlations. Eur. J. Hum. Genet. 7, 267–273 (1999).
Whiffin, N. et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet. Med. 19, 1151–1158 (2017).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
Hollander, M. C., Blumenthal, G. M. & Dennis, P. A. PTEN loss in the continuum of common cancers, rare syndromes and mouse models. Nat. Rev. Cancer 11, 289–301 (2011).
Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831 (2017).
Papa, A. et al. Cancer-associated PTEN mutants act in a dominant-negative manner to suppress PTEN protein function. Cell 157, 595–610 (2014).
Leslie, N. R. & Longy, M. Inherited PTEN mutations and the prediction of phenotype. Semin. Cell Dev. Biol. 52, 30–38 (2016).
Wang, H. et al. Allele-specific tumor spectrum in Pten knockin mice. Proc. Natl Acad. Sci. USA 107, 5142–5147 (2010).
Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000).
Aguissa-Touré, A.-H. & Li, G. Genetic alterations of PTEN in human melanoma. Cell. Mol. Life Sci. 69, 1475–1491 (2012).
Hodges, L. M. et al. Very important pharmacogene summary. Pharmacogenet. Genomics 21, 152–161 (2011).
Relling, M. V. et al. Clinical pharmacogenetics implementation consortium guidelines for thiopurine methyltransferase genotype and thiopurine dosing: 2013 update. Clin. Pharmacol. Ther. 93, 324–325 (2013).
Liu, C. et al. Genomewide approach validates thiopurine methyltransferase activity is a monogenic pharmacogenomic trait. Clin. Pharmacol. Ther. 101, 373–381 (2017).
Appell, M. L. et al. Nomenclature for alleles of the thiopurine methyltransferase gene. Pharmacogenet. Genomics 23, 242–248 (2013).
Hamdan-Khalil, R. et al. In vitro characterization of four novel non-functional variants of the thiopurine S-methyltransferase. Biochem. Biophys. Res. Commun. 309, 1005–1010 (2003).
Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 1–7 (2016).
Relling, M. et al. New Pharmacogenomics Research network: an open community catalyzing research and translation in precision medicine. Clin. Pharmacol. Ther. 102, 897–902 (2017).
Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug Targets 15, 65–79 (2014).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Rubin, A. F. et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18, 1–15 (2017).
Krauthammer, M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat. Genet. 44, 1006–1014 (2012).
Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).

Acknowledgements

We thank J. Underwood and K. Munson of the UW PacBio Sequencing Services for assistance with long-read sequencing; A. Leith of the UW Foege Flow Lab and L. Gitari and D. Prunkard of the UW Pathology Flow Cytometry Core Facility for assistance with cell sorting; and B. Shirts and C. Pritchard in the UW Department of Lab Medicine for advice. The authors would like to acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of study authors. This work was supported by the National Institute of General Medical Sciences (1R01GM109110 and 5R24GM115277 to D.M.F., P50GM115279 to M.V.R. and W.E.E., National Cancer Institute R01CA096670 to S.B. and P30CA21765 to M.V.R.) and an NIH Director’s Pioneer Award (DP1HG007811 to J.S.). K.A.M. is an American Cancer Society Fellow (PF-15-221-01), and was supported by a National Cancer Institute Interdisciplinary Training Grant in Cancer (2T32CA080416). M.A.C. and V.E.G. are supported by the National Science Foundation Graduate Research Fellowship. J.N.D. is supported by a National Institute of General Medical Sciences Training Grant (T32GM007454). J.S. is an Investigator of the Howard Hughes Medical Institute. D.M.F. is a Canadian Institute for Advanced Research Azrieli Global Scholar.

Author information

These authors contributed equally: Kenneth A. Matreyek, Lea M. Starita.

Authors and Affiliations

Department of Genome Sciences, University of Washington, Seattle, WA, USA
Kenneth A. Matreyek, Lea M. Starita, Jason J. Stephany, Beth Martin, Melissa A. Chiasson, Vanessa E. Gray, Martin Kircher, Arineh Khechaduri, Ronald J. Hause, Jay Shendure & Douglas M. Fowler
Department of Medical Genetics, University of Washington, Seattle, WA, USA
Jennifer N. Dines
School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
Smita Bhatia
Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN, USA
William E. Evans, Mary V. Relling & Wenjian Yang
Howard Hughes Medical Institute, Seattle, WA, USA
Jay Shendure
Department of Bioengineering, University of Washington, Seattle, WA, USA
Douglas M. Fowler
Genetic Networks Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada
Douglas M. Fowler

Contributions

D.M.F., J.S., K.A.M. and L.M.S. conceived of, designed and managed the experiments and analyses, and wrote the manuscript; J.J.S. and B.M. cloned expression constructs and libraries and prepared and performed NGS sequencing; K.A.M., M.A.C. and A.K. provided constructs and data for additional disease genes and pharmacogenes; M.K. wrote the scripts to extract barcodes and variable regions from long-read sequences; J.N.D. assisted in using the ACMG guidelines to reclassify PTEN variants; R.J.H. provided constructs for TPMT experiments; V.E.G. designed the website; and S.B., W.E.E., M.V.R. and W.Y. provided clinical data for TPMT comparison.

Corresponding authors

Correspondence to Jay Shendure or Douglas M. Fowler.

Ethics declarations

Competing interests

The authors declare that the variant functional data presented herein are copyrighted, and may be freely used for non-commercial purposes. Licensing for commercial use may benefit the authors. The authors declare no additional competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Validation experiments of EGFP-fusions for assessing PTEN and TPMT steady-state abundance.

a, Representative gating strategy for mTagBFP2 negative, mCherry positive cells containing 15,000 recombined cells. b, PTEN variant EGFP:mCherry ratio geometric means as a fraction of WT, for known and previously uncharacterized PTEN low-abundance variants. Error bars denote 95% confidence intervals of the mean (red), with individual data points shown in grey. Each variant was assessed in at least 3 independent experiments. c, Similar plot for TPMT, with error bars denoting 95% confidence intervals of the mean (red), with individual data points shown in grey. All variants were independently assessed three times, except variants p.Asp15Tyr, p.Arg64Ser, p.Ala80Pro, p.Ile143Thr, p.Lys238Glu, p.Tyr240Cys, which were assessed twice. d, Scatterplot comparison of WT-normalized EGFP:mCherry ratios for EGFP- or 15-aa split-GFP fused PTEN variants. Values are the mean of 3 independently performed experiments. n = 6 samples. “r” and “ρ” denote Pearson’s and Spearman’s correlation coefficients, respectively
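The WT-normalized readout described in panels b and c can be sketched as follows. This is a minimal illustration, assuming per-cell EGFP:mCherry ratios are already available as plain lists; the function names and the numbers are hypothetical and are not taken from the authors' analysis code.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all > 0")
    return math.exp(sum(math.log(v) for v in values) / len(values))

def wt_normalized_ratio(variant_ratios, wt_ratios):
    """Variant EGFP:mCherry geometric mean as a fraction of the WT geometric mean."""
    return geometric_mean(variant_ratios) / geometric_mean(wt_ratios)

# Hypothetical per-cell EGFP:mCherry ratios (illustrative numbers only).
wt_cells = [1.0, 2.0, 4.0]         # geometric mean is 2.0
variant_cells = [0.25, 0.5, 1.0]   # geometric mean is 0.5
fraction_of_wt = wt_normalized_ratio(variant_cells, wt_cells)
print(f"variant abundance as a fraction of WT: {fraction_of_wt:.2f}")
```

Working in log space keeps the calculation stable for the skewed, strictly positive distributions that fluorescence ratios typically follow.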

Supplementary Figure 2 Correlations between PTEN and TPMT VAMP-seq replicates.

a, b, Pairwise VAMP-seq abundance score correlations between replicate sorting experiments for PTEN (a) and TPMT (b). n values are the number of variants scored in both experiments. Replicates 5 and 6 for TPMT contained a subset of mutagenized positions different from those mutagenized in replicates 1 through 4, with both subsets mixed together for Replicates 7 and 8. Pearson’s correlation coefficients are shown. Score numbers in this figure correspond to experiment numbers in Supplementary Table 1
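A pairwise replicate comparison of this kind can be sketched as below: Pearson's r is computed only over variants scored in both experiments, and n is the size of that overlap. The dictionary layout, function name and scores are assumptions made for illustration.

```python
import math

def pearson_on_shared(scores_a, scores_b):
    """Return (n, r) for the variants present in both score dictionaries."""
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared variants")
    xs = [scores_a[v] for v in shared]
    ys = [scores_b[v] for v in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return n, cov / math.sqrt(var_x * var_y)

# Hypothetical abundance scores for two replicate sorts.
replicate_1 = {"p.Thr2Asp": 0.2, "p.Thr5Ala": 0.9, "p.Glu7His": 0.5, "p.Lys6Ile": 0.7}
replicate_2 = {"p.Thr2Asp": 0.4, "p.Thr5Ala": 1.8, "p.Glu7His": 1.0, "p.Pro38Ser": 0.3}
n_shared, r = pearson_on_shared(replicate_1, replicate_2)
print(n_shared, round(r, 3))
```

Variants seen in only one replicate (here p.Lys6Ile and p.Pro38Ser) are excluded, which is why n differs between replicate pairs.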

Supplementary Figure 3 Validation analyses for VAMP-seq-derived abundance scores.

a, b, Scatterplot comparison of VAMP-seq abundance scores (x-axis) and individually assessed log₁₀-transformed, WT-normalized geometric means of the EGFP:mCherry ratios for various PTEN (a) and TPMT (b) variants (see also Supplementary Figure 1b, c). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. c, PTEN VAMP-seq scores for variants whose steady-state expression was characterized by western blot analysis in previous publications (see Supplementary Table 9). d, Scatterplot comparing TPMT VAMP-seq scores (y-axis) and previously published abundance values from western blots (see Supplementary Table 10). e, Nonsense variant VAMP-seq scores by amino acid position, for PTEN (top) and TPMT (bottom). WT abundance score (1.0) shown as a blue line. N-terminal nonsense variants append a small number of residues to EGFP, which does not affect its abundance. C-terminal nonsense variants remove a small number of residues from PTEN or TPMT, which also does not impact abundance. f, Missense variant abundance score density plots for PTEN (gray) and TPMT (green). The thresholds of the 5% lowest synonymous variant scores are shown, for each protein, by the dotted lines. g, h, Scatterplot comparing positional median PTEN (g) and TPMT (h) VAMP-seq scores to PSIC evolutionary conservation scores for each position (Sunyaev et al.). i, j, Positional median PTEN (i) and TPMT (j) abundance scores for positions found in various secondary structure types, with the red line denoting the median value for the group. n values denote the number of positions that fell into each category.

Supplementary Figure 4 Associations of biochemical features with VAMP-seq-derived abundance scores.

a, Scatterplot comparing abundance score (y-axis) to in vitro characterized melting temperatures of select PTEN variants (Johnston et al.). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. b, A plot of positional median scores for PTEN positions with potential hydrogen bonds or salt bridges. A position was considered intolerant only if it had 5 or more variants and more than 90% of the abundance scores were at or below the score threshold containing the lowest 5% of synonymous variants. Red bars denote median abundance score values. n = 26 for substitution intolerant, and n = 50 for the remaining positions. c, Substitution-intolerant PTEN positions with potential polar contacts, clustered by distance based on PDB coordinates (PDB: 1d5r). Positions within 11 Å of each other were considered part of a group. The dashed line shows the 11 Å distance cutoff. d, Histogram of the number of PTEN missense variants per position in COSMIC. Substitution-intolerant positions potentially involved in polar contacts with counts in COSMIC greater than 7 are labeled in red. e, Minimum distance of all PTEN positions (gray) or elevated-abundance positions (red) from known phospholipid-binding positions. The black line denotes a 7 Å distance. A position was considered elevated in abundance only if it had 5 or more variants and there were more than 5 variants with scores above the median of the synonymous distribution. f, VAMP-seq scores for variants at position S385, with a synonymous variant in black, negatively charged variants in red, positively charged variants in blue, and all other variants in gray
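The substitution-intolerance rule stated in panel b translates directly into a small predicate. The sketch below assumes the synonymous-variant threshold has already been computed and is passed in; the value 0.7 and the example scores are hypothetical.

```python
def is_substitution_intolerant(scores, threshold, min_variants=5, min_fraction=0.9):
    """Apply the panel b rule to the abundance scores observed at one position.

    A position is intolerant only if it has at least `min_variants` scored
    variants and more than `min_fraction` of them are at or below `threshold`
    (the score bounding the lowest 5% of synonymous variants).
    """
    if len(scores) < min_variants:
        return False
    n_low = sum(1 for s in scores if s <= threshold)
    return n_low / len(scores) > min_fraction

threshold = 0.7  # hypothetical synonymous-variant threshold

mostly_low = [0.1] * 11 + [0.95]       # 11 of 12 low (about 92%)
exactly_90_percent = [0.1] * 9 + [1.0]  # 9 of 10 low: not "more than 90%"
too_few_variants = [0.1] * 4            # fewer than 5 scored variants

for scores in (mostly_low, exactly_90_percent, too_few_variants):
    print(len(scores), is_substitution_intolerant(scores, threshold))
```

The strict inequality matters: a position with exactly 90% low-abundance variants does not qualify under the rule as worded.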

Supplementary Figure 5 PTEN variant abundance classification and relationship to germline and somatic variation.

a, Illustrative examples of variant abundance classifications, with the dotted line representing the threshold above which 95% of synonymous variants reside. Points represent the VAMP-seq score for each representative variant, with error bars denoting the 95% confidence interval derived from experimental replicates. n values are 3, 5, 2, and 4 for p.Thr2Asp, p.Thr5Ala, p.Glu7His, and p.Lys6Ile, respectively. b, Frequencies of each PTEN abundance class for each PTEN ClinVar interpretation, as well as for all possible SNVs with abundance classifications. c, Abundance scores and classes for PTEN variants with allele counts highly unlikely to be causal for Cowden syndrome. d, Frequencies of all observed PTEN variants across different cancer types in the TCGA and AACR GENIE data. Highly recurrent PTEN variants are labeled in red. e, Western blot analysis of a clonal line stably expressing WT or missense variants of N-terminally HA-tagged PTEN. This line was derived independently from the line used to generate the data shown in Figure 4f. This experiment was independently performed twice with similar results. f, Comparison of PTEN abundance scores with changes in folding energies predicted by Rosetta using the ddg_monomer protocol. Variants are shown as gray circles, with the exception of those with Rosetta ΔΔG predictions greater than 17, which are marked by a black “x” at a ΔΔG value of 17. Contour lines are colored by the regional density of points. Previously or newly identified PTEN dominant-negative variants are shown as blue points with blue labels.
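Panel a compares each variant's score and 95% confidence interval with the synonymous-variant threshold. One plausible way to turn that comparison into classes is sketched below. The four class names and the exact decision rule are assumptions made for illustration, since the caption does not spell them out, and the threshold and inputs are hypothetical.

```python
def classify_abundance(score, ci_lower, ci_upper, threshold):
    """Assumed four-way rule based on the score and its 95% CI.

    - score and upper CI bound below the threshold   -> "low"
    - score below, but upper CI bound reaches it     -> "possibly low"
    - score at/above, but lower CI bound is below it -> "possibly WT-like"
    - score and lower CI bound above the threshold   -> "WT-like"
    """
    if not ci_lower <= score <= ci_upper:
        raise ValueError("score must lie within its confidence interval")
    if score < threshold:
        return "low" if ci_upper < threshold else "possibly low"
    return "WT-like" if ci_lower > threshold else "possibly WT-like"

threshold = 0.7  # hypothetical: 95% of synonymous variants score above this

examples = {
    "clearly low":    (0.20, 0.10, 0.30),
    "borderline low": (0.60, 0.45, 0.80),
    "borderline WT":  (0.80, 0.65, 0.95),
    "clearly WT":     (1.00, 0.90, 1.10),
}
for label, (score, lo, hi) in examples.items():
    print(label, "->", classify_abundance(score, lo, hi, threshold))
```

Using the interval as well as the point estimate keeps poorly replicated variants out of the confident classes.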

Supplementary Figure 6 Flow chart of PTEN p.Ile135Lys pathogenicity reinterpretation using VAMP-seq data.

The ACMG/AMP joint criteria for classifying variants were used, with low abundance classification by VAMP-seq considered strong experimental support of pathogenicity (PS3). Without functional data there is no strong or very strong evidence of pathogenicity for this variant; therefore, the pathogenic criteria cannot be fulfilled and the variant remains classified as likely pathogenic. With low abundance data, PS3 can be used and the pathogenic criteria are met.

Supplementary Figure 7 Relationship of TPMT variant abundance to drug sensitivity.

a, Scatterplot comparing abundance scores and previously characterized red blood cell (RBC) activity from patients. b, c, Scatterplots comparing individually assessed, WT-normalized EGFP:mCherry geometric means to previously published values of average RBC activity (b), or average patient dosage intensity (c). Dose intensity is the dose where 6-MP becomes toxic to the patient before reaching the 100% protocol dose of 75 mg/m². r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. n = 6 samples for each plot. d, Western blotting results for individually-expressed TPMT variant GFP fusions. Each variant was blotted with 45, 15, and 5 µg of total protein input per lane. This experiment was performed once

Supplementary Figure 8 Protein stability indices for most human protein N-terminal EGFP fusions.

A histogram of protein stability indices from Yen et al. Protein stability index values for proteins tested in the VAMP-seq assay are shown as dashed vertical lines. Protein stability indices were not available for PTEN, CYP2C9, CYP2C19, and PMS2.

Supplementary Figure 9 Amplification and sequencing technical replicates for PTEN.

Scatterplots comparing variant frequency derived from replicate PCR amplification and sequencing for each of the four bins in every PTEN experiment are shown.

Supplementary Figure 10 Scheme to determine total frequency filtering threshold value.

a, b, Scatterplots showing the total frequencies and weighted average values of WT (black), synonymous variants (red), or non-terminal nonsense variants (blue) for each experiment, for PTEN (a) and TPMT (b), respectively. A combination of synonymous variant coefficient of variation (c and d), synonymous variant mean (black) and median (red) (e and f), and total number of scored missense variants (g and h) for PTEN (c, e, and g) and TPMT (d, f, and h) was assessed at increasing total frequency filtering threshold values to obtain the threshold value that we required across the four bins for a variant to be included in the analyses we present. The 1 × 10^−4.75 total frequency threshold used for the final analysis is displayed as a dotted line in each plot.
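The filtering step can be illustrated with a short sketch. It assumes that a variant's total frequency is the sum of its frequencies in the four sort bins and that the weighted average uses evenly spaced bin weights of 0.25, 0.5, 0.75 and 1.0. Both choices are assumptions for illustration and are not stated in the caption; only the 10^−4.75 threshold comes from the text.

```python
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)  # assumed weights for bins 1 to 4
FREQ_THRESHOLD = 10 ** -4.75          # threshold quoted in the caption

def total_frequency(bin_freqs):
    """Assumed definition: sum of a variant's frequencies across the bins."""
    return sum(bin_freqs)

def weighted_average(bin_freqs, weights=BIN_WEIGHTS):
    """Frequency-weighted mean of the bin weights for one variant."""
    if len(bin_freqs) != len(weights):
        raise ValueError("expected one frequency per bin")
    total = total_frequency(bin_freqs)
    if total <= 0:
        raise ValueError("variant was not observed in any bin")
    return sum(f * w for f, w in zip(bin_freqs, weights)) / total

def passes_filter(bin_freqs, threshold=FREQ_THRESHOLD):
    """Keep a variant only if its total frequency reaches the threshold."""
    return total_frequency(bin_freqs) >= threshold

# Hypothetical variants: one well sampled, one too rare to score reliably.
well_sampled = (1e-5, 1e-5, 1e-5, 1e-5)
rare = (1e-6, 0.0, 0.0, 0.0)
for freqs in (well_sampled, rare):
    print(passes_filter(freqs), round(weighted_average(freqs), 3))
```

The rare variant still yields a weighted average, but one based on very few reads, which is the kind of noisy score a frequency filter is meant to exclude.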

Supplementary Figure 11 Statistics for the PTEN library.

a, Barcode counts from independent amplifications of the barcoded PTEN library plasmid preparation used for recombination. n = 67,162 data points. r denotes Pearson’s correlation coefficient. b, A filter based on a minimum count of 200 was imposed (black dotted line), resulting in 40,560 unique barcodes. c, The barcode-variant map was used to determine the frequencies of different types of sequences in the plasmid preparation of the barcoded PTEN library. d, Nucleotide biases at the degenerate codon for the single amino acid PTEN variants. e, Amino acid biases of the single amino acid variants of the PTEN library, with the frequencies expected from perfect NNK mutagenesis shown in red. f, Number of substitutions observed at each position of the PTEN protein amongst the 40,560 barcodes in the PTEN library plasmid preparation. g, Distribution of number of substitutions per position in the PTEN protein. h, Distribution of single amino acid variant frequencies in the PTEN library (black), along with an illustrative log-normal distribution that closely fits the PTEN data (red), shown as a density plot (top panel), or a cumulative distribution function plot (bottom panel). i, Sampling simulations of observed and hypothetical PTEN libraries, displaying the fraction of the 8,040 possible PTEN single amino acid and nonsense variants observed for increasing sampling sizes, with a step size of 1. Results of sampling from the PTEN variant frequency distribution observed in the library plasmid preparation are shown in black. Results of sampling hypothetical, uniformly distributed libraries containing either the subset of single amino acid variants observed in the PTEN library plasmid preparation (dark gray), or all possible PTEN single amino acid variants (light gray) are shown for comparison
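The "frequencies expected from perfect NNK mutagenesis" in panel e follow from the standard genetic code: an NNK codon has any base at the first two positions and G or T at the third, giving 32 equally likely codons. A minimal sketch that enumerates them (variable and function names are illustrative):

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def nnk_expected_frequencies():
    """Expected amino acid (and stop, '*') frequencies for an ideal NNK codon."""
    counts = Counter(
        CODON_TABLE[first + second + third]
        for first, second, third in product(BASES, BASES, "GT")
    )
    total = sum(counts.values())  # 32 NNK codons
    return {aa: n / total for aa, n in counts.items()}

nnk_freqs = nnk_expected_frequencies()
for aa, freq in sorted(nnk_freqs.items(), key=lambda item: -item[1]):
    print(aa, f"{freq:.4f}")
```

Even a perfect NNK library is therefore not uniform: Leu, Ser and Arg are each encoded by three of the 32 codons, while residues such as Met and Trp, and the single remaining stop codon (TAG), are encoded by one.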

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–11, Supplementary Tables 1–3, 5–10 and Supplementary Note

Reporting Summary

Supplementary Dataset 1

Dataset of PTEN variant scores, classifications, and annotations

Supplementary Dataset 2

Dataset of TPMT variant scores, classifications, and annotations

Supplementary Dataset 3

Dataset of PTEN residue scores, classifications, and annotations

Supplementary Dataset 4

Dataset of TPMT residue scores, classifications, and annotations

Supplementary Dataset 5

R Markdown file recreating all of the analyses

Supplementary Table 4

Table of PTEN variant pathogenicity reclassifications that are possible with abundance data

About this article

Cite this article

Matreyek, K.A., Starita, L.M., Stephany, J.J. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat Genet 50, 874–882 (2018). https://doi.org/10.1038/s41588-018-0122-z

Received: 19 September 2017
Accepted: 29 March 2018
Published: 21 May 2018
Issue Date: June 2018
DOI: https://doi.org/10.1038/s41588-018-0122-z
