Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Shirts, B. H., Pritchard, C. C. & Walsh, T. Family-specific variants and the limits of human genetics. Trends Mol. Med. 22, 925–934 (2016).
  2. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  3. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, 980–985 (2014).
  4. Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protoc. 9, 2267–2284 (2014).
  5. Gasperini, M., Starita, L. & Shendure, J. The power of multiplexed functional analysis of genetic variants. Nat. Protoc. 11, 1782–1787 (2016).
  6. Manolio, T. A. et al. Bedside back to bench: building bridges between basic and clinical genomic research. Cell 169, 6–12 (2017).
  7. Starita, L. M. et al. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200, 413–422 (2015).
  8. Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
  9. Yue, P., Li, Z. & Moult, J. Loss of protein structure stability as a major causative factor in monogenic disease. J. Mol. Biol. 353, 459–473 (2005).
  10. Redler, R. L., Das, J., Diaz, J. R. & Dokholyan, N. V. Protein destabilization as a common factor in diverse inherited disorders. J. Mol. Evol. 82, 11–16 (2016).
  11. Berger, A. H., Knudson, A. G. & Pandolfi, P. P. A continuum model for tumour suppression. Nature 476, 163–169 (2011).
  12. Lee, M. S. et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Cancer Res. 70, 4880–4890 (2010).
  13. Tai, H. L., Krynetski, E. Y., Schuetz, E. G., Yanishevski, Y. & Evans, W. E. Enhanced proteolysis of thiopurine S-methyltransferase (TPMT) encoded by mutant alleles in humans (TPMT*3A, TPMT*2): mechanisms for the genetic polymorphism of TPMT activity. Proc. Natl Acad. Sci. USA 94, 6444–6449 (1997).
  14. Kim, I., Miller, C. R., Young, D. L. & Fields, S. High-throughput analysis of in vivo protein stability. Mol. Cell. Proteomics 12, 3370–3378 (2013).
  15. Klesmith, J. R., Bacik, J.-P., Wrenbeck, E. E., Michalczyk, R. & Whitehead, T. A. Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl Acad. Sci. USA 114, 2265–2270 (2017).
  16. Yen, H.-C. S., Xu, Q., Chou, D. M., Zhao, Z. & Elledge, S. J. Global protein stability profiling in mammalian cells. Science 322, 918–923 (2008).
  17. Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).
  18. Jain, P. C. & Varadarajan, R. A rapid, efficient, and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library. Anal. Biochem. 449, 90–98 (2014).
  19. Cabantous, S., Terwilliger, T. C. & Waldo, G. S. Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein. Nat. Biotechnol. 23, 102–107 (2005).
  20. Johnston, S. B. & Raines, R. T. Conformational stability and catalytic activity of PTEN variants linked to cancers and autism spectrum disorders. Biochemistry 54, 1576–1582 (2015).
  21. Wu, H. et al. Structural basis of allele variation of human thiopurine-S-methyltransferase. Proteins 67, 198–208 (2007).
  22. Ward, W. W., Prentice, H. J., Roth, A. F., Cody, C. W. & Reeves, S. C. Spectral perturbations of the Aequorea green-fluorescent protein. Photochem. Photobiol. 35, 803–808 (1982).
  23. Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
  24. Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 322, 315–322 (2004).
  25. Kauzmann, W. Some factors in the interpretation of protein denaturation. Adv. Protein Chem. 14, 1–63 (1959).
  26. Rocklin, G. J. et al. Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175 (2017).
  27. Lee, J. O. et al. Crystal structure of the PTEN tumor suppressor: implications for its phosphoinositide phosphatase activity and membrane association. Cell 99, 323–334 (1999).
  28. Song, M. S., Salmena, L. & Pandolfi, P. P. The functions and regulation of the PTEN tumour suppressor. Nat. Rev. Mol. Cell Biol. 13, 283–296 (2012).
  29. Nguyen, H.-N. et al. A new class of cancer-associated PTEN mutations defined by membrane translocation defects. Oncogene 34, 3737–3743 (2015).
  30. Walker, S. M., Leslie, N. R., Perera, N. M., Batty, I. H. & Downes, C. P. The tumour-suppressor function of PTEN requires an N-terminal lipid-binding motif. Biochem. J. 379, 301–307 (2004).
  31. Das, S., Dixon, J. E. & Cho, W. Membrane-binding and activation mechanism of PTEN. Proc. Natl Acad. Sci. USA 100, 7491–7496 (2003).
  32. Vazquez, F., Ramaswamy, S., Nakamura, N. & Sellers, W. R. Phosphorylation of the PTEN tail regulates protein stability and function. Mol. Cell. Biol. 20, 5010–5018 (2000).
  33. Wei, Y., Stec, B., Redfield, A. G., Weerapana, E. & Roberts, M. F. Phospholipid-binding sites of phosphatase and tensin homolog (PTEN): Exploring the mechanism of phosphatidylinositol 4,5-bisphosphate activation. J. Biol. Chem. 290, 1592–1606 (2015).
  34. Naguib, A. et al. PTEN functions by recruitment to cytoplasmic vesicles. Mol. Cell 58, 255–268 (2015).
  35. Hobert, J. A. & Eng, C. PTEN hamartoma tumor syndrome: an overview. Genet. Med. 11, 687–694 (2009).
  36. Melbārde-Gorkuša, I. et al. Challenges in the management of a patient with Cowden syndrome: case report and literature review. Hered. Cancer Clin. Pract. 10, 5 (2012).
  37. Staal, F. J. T. et al. A novel germline mutation of PTEN associated with brain tumours of multiple lineages. Br. J. Cancer 86, 1586–1591 (2002).
  38. Nelen, M. R. et al. Novel PTEN mutations in patients with Cowden disease: Absence of clear genotype–phenotype correlations. Eur. J. Hum. Genet. 7, 267–273 (1999).
  39. Whiffin, N. et al. Using high-resolution variant frequencies to empower clinical genome interpretation. Genet. Med. 19, 1151–1158 (2017).
  40. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
  41. Hollander, M. C., Blumenthal, G. M. & Dennis, P. A. PTEN loss in the continuum of common cancers, rare syndromes and mouse models. Nat. Rev. Cancer 11, 289–301 (2011).
  42. Kandoth, C. et al. Mutational landscape and significance across 12 major cancer types. Nature 502, 333–339 (2013).
  43. AACR Project GENIE Consortium. AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831 (2017).
  44. Papa, A. et al. Cancer-associated PTEN mutants act in a dominant-negative manner to suppress PTEN protein function. Cell 157, 595–610 (2014).
  45. Leslie, N. R. & Longy, M. Inherited PTEN mutations and the prediction of phenotype. Semin. Cell Dev. Biol. 52, 30–38 (2016).
  46. Wang, H. et al. Allele-specific tumor spectrum in Pten knockin mice. Proc. Natl Acad. Sci. USA 107, 5142–5147 (2010).
  47. Bonneau, D. & Longy, M. Mutations of the human PTEN gene. Hum. Mutat. 16, 109–122 (2000).
  48. Aguissa-Touré, A.-H. & Li, G. Genetic alterations of PTEN in human melanoma. Cell. Mol. Life Sci. 69, 1475–1491 (2012).
  49. Hodges, L. M. et al. Very important pharmacogene summary. Pharmacogenet. Genomics 21, 152–161 (2011).
  50. Relling, M. V. et al. Clinical pharmacogenetics implementation consortium guidelines for thiopurine methyltransferase genotype and thiopurine dosing: 2013 update. Clin. Pharmacol. Ther. 93, 324–325 (2013).
  51. Liu, C. et al. Genomewide approach validates thiopurine methyltransferase activity is a monogenic pharmacogenomic trait. Clin. Pharmacol. Ther. 101, 373–381 (2017).
  52. Appell, M. L. et al. Nomenclature for alleles of the thiopurine methyltransferase gene. Pharmacogenet. Genomics 23, 242–248 (2013).
  53. Hamdan-Khalil, R. et al. In vitro characterization of four novel non-functional variants of the thiopurine S-methyltransferase. Biochem. Biophys. Res. Commun. 309, 1005–1010 (2003).
  54. Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 1–7 (2016).
  55. Relling, M. et al. New Pharmacogenomics Research network: an open community catalyzing research and translation in precision medicine. Clin. Pharmacol. Ther. 102, 897–902 (2017).
  56. Dillon, L. M. & Miller, T. W. Therapeutic targeting of cancers with loss of PTEN function. Curr. Drug Targets 15, 65–79 (2014).
  57. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
  58. Rubin, A. F. et al. A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18, 1–15 (2017).
  59. Krauthammer, M. et al. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma. Nat. Genet. 44, 1006–1014 (2012).
  60. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).



Acknowledgements

We thank J. Underwood and K. Munson of the UW PacBio Sequencing Services for assistance with long-read sequencing; A. Leith of the UW Foege Flow Lab and L. Gitari and D. Prunkard of the UW Pathology Flow Cytometry Core Facility for assistance with cell sorting; and B. Shirts and C. Pritchard in the UW Department of Lab Medicine for advice. The authors acknowledge the American Association for Cancer Research and its financial and material support in the development of the AACR Project GENIE registry, as well as members of the consortium for their commitment to data sharing. Interpretations are the responsibility of the study authors. This work was supported by the National Institute of General Medical Sciences (1R01GM109110 and 5R24GM115277 to D.M.F.; P50GM115279 to M.V.R. and W.E.E.), the National Cancer Institute (R01CA096670 to S.B. and P30CA21765 to M.V.R.) and an NIH Director’s Pioneer Award (DP1HG007811 to J.S.). K.A.M. is an American Cancer Society Fellow (PF-15-221-01) and was supported by a National Cancer Institute Interdisciplinary Training Grant in Cancer (2T32CA080416). M.A.C. and V.E.G. are supported by the National Science Foundation Graduate Research Fellowship. J.N.D. is supported by a National Institute of General Medical Sciences Training Grant (T32GM007454). J.S. is an Investigator of the Howard Hughes Medical Institute. D.M.F. is a Canadian Institute for Advanced Research Azrieli Global Scholar.

Author information

Author notes

  1. These authors contributed equally: Kenneth A. Matreyek, Lea M. Starita.


Affiliations

  1. Department of Genome Sciences, University of Washington, Seattle, WA, USA

    Kenneth A. Matreyek, Lea M. Starita, Jason J. Stephany, Beth Martin, Melissa A. Chiasson, Vanessa E. Gray, Martin Kircher, Arineh Khechaduri, Ronald J. Hause, Jay Shendure & Douglas M. Fowler

  2. Department of Medical Genetics, University of Washington, Seattle, WA, USA

    Jennifer N. Dines

  3. School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA

    Smita Bhatia

  4. Department of Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN, USA

    William E. Evans, Mary V. Relling & Wenjian Yang

  5. Howard Hughes Medical Institute, Seattle, WA, USA

    Jay Shendure

  6. Department of Bioengineering, University of Washington, Seattle, WA, USA

    Douglas M. Fowler

  7. Genetic Networks Program, Canadian Institute for Advanced Research, Toronto, Ontario, Canada

    Douglas M. Fowler




Contributions

D.M.F., J.S., K.A.M. and L.M.S. conceived of, designed and managed the experiments and analyses, and wrote the manuscript; J.J.S. and B.M. cloned expression constructs and libraries and prepared and performed next-generation sequencing; K.A.M., M.A.C. and A.K. provided constructs and data for additional disease genes and pharmacogenes; M.K. wrote the scripts to extract barcodes and variable regions from long-read sequences; J.N.D. assisted in using the ACMG guidelines to reclassify PTEN variants; R.J.H. provided constructs for TPMT experiments; V.E.G. designed the website; and S.B., W.E.E., M.V.R. and W.Y. provided clinical data for TPMT comparison.

Competing interests

The authors declare that the variant functional data presented herein are copyrighted, and may be freely used for non-commercial purposes. Licensing for commercial use may benefit the authors. The authors declare no additional competing interests.

Corresponding authors

Correspondence to Jay Shendure or Douglas M. Fowler.

Integrated supplementary information

  1. Supplementary Figure 1 Validation experiments of EGFP-fusions for assessing PTEN and TPMT steady-state abundance.

    a, Representative gating strategy for mTagBFP2-negative, mCherry-positive cells containing 15,000 recombined cells. b, PTEN variant EGFP:mCherry ratio geometric means as a fraction of WT, for known and previously uncharacterized PTEN low-abundance variants. Error bars denote 95% confidence intervals of the mean (red), with individual data points shown in grey. Each variant was assessed in at least 3 independent experiments. c, Similar plot for TPMT, with error bars denoting 95% confidence intervals of the mean (red) and individual data points shown in grey. All variants were independently assessed three times, except p.Asp15Tyr, p.Arg64Ser, p.Ala80Pro, p.Ile143Thr, p.Lys238Glu and p.Tyr240Cys, which were assessed twice. d, Scatterplot comparison of WT-normalized EGFP:mCherry ratios for EGFP- or 15-aa split-GFP-fused PTEN variants. Values are the mean of 3 independently performed experiments. n = 6 samples. “r” and “ρ” denote Pearson’s and Spearman’s correlation coefficients, respectively.

  2. Supplementary Figure 2 Correlations between PTEN and TPMT VAMP-seq replicates.

    a, b, Pairwise VAMP-seq abundance score correlations between replicate sorting experiments for PTEN (a) and TPMT (b). n values are the number of variants scored in both experiments. Replicates 5 and 6 for TPMT contained a subset of mutagenized positions different from those mutagenized in replicates 1 through 4, with both subsets mixed together for replicates 7 and 8. Pearson’s correlation coefficients are shown. Score numbers in this figure correspond to experiment numbers in Supplementary Table 1.

  3. Supplementary Figure 3 Validation analyses for VAMP-seq-derived abundance scores.

    a, b, Scatterplot comparison of VAMP-seq abundance scores (x-axis) and individually assessed log10-transformed, WT-normalized geometric means of the EGFP:mCherry ratios for various PTEN (a) and TPMT (b) variants (see also Supplementary Figure 1b, c). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. c, PTEN VAMP-seq scores for variants whose steady-state expression was characterized by western blot analysis in previous publications (see Supplementary Table 9). d, Scatterplot comparing TPMT VAMP-seq scores (y-axis) and previously published abundance values from western blots (see Supplementary Table 10). e, Nonsense variant VAMP-seq scores by amino acid position, for PTEN (top) and TPMT (bottom). WT abundance score (1.0) shown as a blue line. N-terminal nonsense variants append a small number of residues to EGFP, which does not affect its abundance. C-terminal nonsense variants remove a small number of residues from PTEN or TPMT, which also does not impact abundance. f, Missense variant abundance score density plots for PTEN (gray) and TPMT (green). The thresholds of the 5% lowest synonymous variant scores are shown, for each protein, by the dotted lines. g, h, Scatterplot comparing positional median PTEN (g) and TPMT (h) VAMP-seq scores to PSIC evolutionary conservation scores for each position (Sunyaev et al.). i, j, Positional median PTEN (i) and TPMT (j) abundance scores for positions found in various secondary structure types, with the red line denoting the median value for the group. n values denote the number of positions that fell into each category.

  4. Supplementary Figure 4 Biochemical feature associations with VAMP-seq-derived abundance scores.

    a, Scatterplot comparing abundance score (y-axis) to in vitro characterized melting temperatures of select PTEN variants (Johnston et al.). r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. b, A plot of positional median scores for PTEN positions with potential hydrogen bonds or salt bridges. A position was considered intolerant only if it had 5 or more variants and more than 90% of the abundance scores were at or below the score threshold containing the lowest 5% of synonymous variants. Red bars denote median abundance score values. n = 26 for substitution-intolerant positions, and n = 50 for the remaining positions. c, Substitution-intolerant PTEN positions with potential polar contacts, clustered by distance based on PDB coordinates (PDB: 1d5r). Positions within 11 Å of each other were considered part of a group. The dashed line shows the 11 Å distance cutoff. d, Histogram of the number of PTEN missense variants per position in COSMIC. Substitution-intolerant positions potentially involved in polar contacts with counts in COSMIC greater than 7 are labeled in red. e, Minimum distance of all PTEN positions (gray) or elevated-abundance positions (red) from known phospholipid-binding positions. The black line denotes a 7 Å distance. A position was considered elevated in abundance only if it had 5 or more variants and there were more than 5 variants with scores above the median of the synonymous distribution. f, VAMP-seq scores for variants at position S385, with a synonymous variant in black, negatively charged variants in red, positively charged variants in blue, and all other variants in gray.

  5. Supplementary Figure 5 PTEN variant abundance classification and relationship to germline and somatic variation.

    a, Illustrative examples of variant abundance classifications, with the dotted line representing the threshold above which 95% of synonymous variants reside. Points represent the VAMP-seq score for each representative variant, with error bars denoting the 95% confidence interval derived from experimental replicates. n values are 3, 5, 2, and 4 for p.Thr2Asp, p.Thr5Ala, p.Glu7His, and p.Lys6Ile, respectively. b, Frequencies of each PTEN abundance class for each PTEN ClinVar interpretation, as well as for all possible SNVs with abundance classifications. c, Abundance scores and classes for PTEN variants with allele counts highly unlikely to be causal for Cowden syndrome. d, Frequencies of all observed PTEN variants across different cancer types in the TCGA and AACR GENIE data. Highly recurrent PTEN variants are labeled in red. e, Western blot analysis of a clonal line stably expressing WT or missense variants of N-terminally HA-tagged PTEN. This line was derived independently from the line used to generate the data shown in Figure 4f. This experiment was independently performed twice with similar results. f, Comparison of PTEN abundance scores with changes in folding energies predicted by Rosetta using the ddg_monomer protocol. Variants are shown as gray circles, with the exception of those with Rosetta ΔΔG predictions greater than 17, which are marked by a black “x” at a ΔΔG value of 17. Contour lines are colored by the regional density of points. Previously or newly identified PTEN dominant-negative variants are shown as blue points with blue labels.

  6. Supplementary Figure 6 Flow chart of PTEN p.Ile135Lys pathogenicity reinterpretation using VAMP-seq data.

    The ACMG/AMP joint criteria for classifying variants were used, with low abundance classification by VAMP-seq considered strong experimental support of pathogenicity (PS3). Without functional data there is no strong or very strong evidence of pathogenicity for this variant; therefore, the pathogenic criteria cannot be fulfilled and the variant remains classified as likely pathogenic. With low abundance data, PS3 can be used and the pathogenic criteria are met.

  7. Supplementary Figure 7 Relationship of TPMT variant abundance to drug sensitivity.

    a, Scatterplot comparing abundance scores and previously characterized red blood cell (RBC) activity from patients. b, c, Scatterplots comparing individually assessed, WT-normalized EGFP:mCherry geometric means to previously published values of average RBC activity (b), or average patient dosage intensity (c). Dose intensity is the dose where 6-MP becomes toxic to the patient before reaching the 100% protocol dose of 75 mg/m². r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively. n = 6 samples for each plot. d, Western blotting results for individually expressed TPMT variant GFP fusions. Each variant was blotted with 45, 15, and 5 µg of total protein input per lane. This experiment was performed once.

  8. Supplementary Figure 8 Protein stability indices for most human protein N-terminal EGFP fusions.

    A histogram of protein stability indices from Yen et al. Protein stability index values for proteins tested in the VAMP-seq assay are shown as dashed vertical lines. Protein stability indices were not available for PTEN, CYP2C9, CYP2C19, and PMS2.

  9. Supplementary Figure 9 Amplification and sequencing technical replicates for PTEN.

    Scatterplots comparing variant frequency derived from replicate PCR amplification and sequencing for each of the four bins in every PTEN experiment are shown.

  10. Supplementary Figure 10 Scheme to determine total frequency filtering threshold value.

    a, b, Scatterplots showing the total frequencies and weighted average values of WT (black), synonymous variants (red), or non-terminal nonsense variants (blue) for each experiment, for PTEN (a) and TPMT (b). A combination of synonymous variant coefficient of variation (c and d), synonymous variant mean (black) and median (red) (e and f), and total number of scored missense variants (g and h) for PTEN (c, e, and g) and TPMT (d, f, and h) were assessed at increasing total frequency filtering threshold values to obtain the threshold value that we required across the four bins for a variant to be included in the analyses we present. The 1 × 10^-4.75 total frequency threshold used for the final analysis is displayed as a dotted line in each plot.

  11. Supplementary Figure 11 Statistics for the PTEN library.

    a, Barcode counts from independent amplifications of the barcoded PTEN library plasmid preparation used for recombination. n = 67,162 data points. r denotes Pearson’s correlation coefficient. b, A filter based on a minimum count of 200 was imposed (black dotted line), resulting in 40,560 unique barcodes. c, The barcode-variant map was used to determine the frequencies of different types of sequences in the plasmid preparation of the barcoded PTEN library. d, Nucleotide biases at the degenerate codon for the single amino acid PTEN variants. e, Amino acid biases of the single amino acid variants of the PTEN library, with the frequencies expected from perfect NNK mutagenesis shown in red. f, Number of substitutions observed at each position of the PTEN protein amongst the 40,560 barcodes in the PTEN library plasmid preparation. g, Distribution of number of substitutions per position in the PTEN protein. h, Distribution of single amino acid variant frequencies in the PTEN library (black), along with an illustrative log-normal distribution that closely fits the PTEN data (red), shown as a density plot (top panel), or a cumulative distribution function plot (bottom panel). i, Sampling simulations of observed and hypothetical PTEN libraries, displaying the fraction of the 8,040 possible PTEN single amino acid and nonsense variants observed for increasing sampling sizes, with a step size of 1. Results of sampling from the PTEN variant frequency distribution observed in the library plasmid preparation are shown in black. Results of sampling hypothetical, uniformly distributed libraries containing either the subset of single amino acid variants observed in the PTEN library plasmid preparation (dark gray), or all possible PTEN single amino acid variants (light gray) are shown for comparison.
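The substitution-intolerance rule stated in the Supplementary Figure 4 caption (5 or more variants at a position, with more than 90% of abundance scores at or below the threshold containing the lowest 5% of synonymous variants) can be illustrated with a minimal sketch. The function names, the linear-interpolation quantile, and all scores below are assumptions made for illustration only; they are not taken from the authors' analysis code or data.

```python
from collections import defaultdict


def lower_quantile(values, q=0.05):
    """Quantile by linear interpolation (an assumed convention, not the paper's)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def intolerant_positions(missense, threshold, min_variants=5, min_fraction=0.9):
    """Return positions with >= min_variants scored variants of which
    more than min_fraction have scores at or below the threshold."""
    by_position = defaultdict(list)
    for position, score in missense:
        by_position[position].append(score)
    intolerant = []
    for position, scores in sorted(by_position.items()):
        if len(scores) < min_variants:
            continue  # too few scored variants to call the position
        low = sum(score <= threshold for score in scores)
        if low / len(scores) > min_fraction:
            intolerant.append(position)
    return intolerant


# Hypothetical synonymous-variant scores, used only to set the threshold.
synonymous = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00, 1.05, 1.10,
              1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45, 1.50, 1.55, 1.60]
threshold = lower_quantile(synonymous, 0.05)

# Hypothetical (position, abundance score) pairs for missense variants.
missense = (
    [(10, s) for s in (0.1, 0.2, 0.3, 0.1, 0.4, 0.2)]  # all low
    + [(11, s) for s in (0.1, 0.2, 1.0, 0.9, 0.3)]     # mixed
    + [(12, s) for s in (0.1, 0.2, 0.1)]               # fewer than 5 variants
)
result = intolerant_positions(missense, threshold)
print(threshold, result)
```

In this toy input only position 10 meets both conditions; position 11 has too many tolerated substitutions and position 12 has too few scored variants.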
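The "frequencies expected from perfect NNK mutagenesis" mentioned in the Supplementary Figure 11 caption follow directly from the standard genetic code: an NNK codon has any base at the first two positions and G or T at the third, giving 32 equally likely codons. The sketch below enumerates them; the function and variable names are illustrative, and equal representation of every codon is the idealized assumption.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_expected_frequencies():
    """Expected residue frequencies if all 32 NNK codons are equally likely."""
    codons = ["".join(c) for c in product("ACGT", "ACGT", "GT")]
    counts = Counter(CODON_TABLE[codon] for codon in codons)
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


frequencies = nnk_expected_frequencies()
for aa, freq in frequencies.items():
    print(aa, round(freq, 4))
```

Under this idealization, leucine, serine and arginine are each encoded by 3 of the 32 codons, a single stop codon (TAG) remains, and every amino acid is represented at least once.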
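The sampling simulations described for Supplementary Figure 11i compare how quickly a skewed library and a uniform library cover the set of possible variants as more samples are drawn. A minimal sketch of that idea follows. The library size of 1,000, the log-normal parameters, and the sample sizes are hypothetical and chosen only to keep the example small; the caption refers to 8,040 possible PTEN variants and a step size of 1.

```python
import random


def fraction_observed(weights, n_draws, seed=0):
    """Draw n_draws variants with replacement, proportional to weights,
    and return the fraction of distinct variants seen at least once."""
    rng = random.Random(seed)
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return len(set(drawn)) / len(weights)


n_variants = 1000  # hypothetical library size
weight_rng = random.Random(1)

# Skewed library: variant frequencies drawn from a log-normal distribution
# (mu=0, sigma=1 are assumed values, not fitted to any data).
skewed = [weight_rng.lognormvariate(0.0, 1.0) for _ in range(n_variants)]
# Idealized library: every variant equally frequent.
uniform = [1.0] * n_variants

for n_draws in (500, 1000, 3000):
    print(
        n_draws,
        round(fraction_observed(skewed, n_draws), 3),
        round(fraction_observed(uniform, n_draws), 3),
    )

skewed_cov = fraction_observed(skewed, 3000)
uniform_cov = fraction_observed(uniform, 3000)
```

For a uniform library of V variants, the expected coverage after N draws is 1 − (1 − 1/V)^N, so a skewed library generally needs more draws to reach the same coverage because its rare variants are sampled less often.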

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–11, Supplementary Tables 1–3, 5–10 and Supplementary Note

  2. Reporting Summary

  3. Supplementary Dataset 1

    Dataset of PTEN variant scores, classifications, and annotations

  4. Supplementary Dataset 2

    Dataset of TPMT variant scores, classifications, and annotations

  5. Supplementary Dataset 3

    Dataset of PTEN residue scores, classifications, and annotations

  6. Supplementary Dataset 4

    Dataset of TPMT residue scores, classifications, and annotations

  7. Supplementary Dataset 5

    R Markdown file recreating all of the analyses

  8. Supplementary Table 4

    Table of PTEN variant pathogenicity reclassifications that are possible with abundance data
