Article

Interpreting short tandem repeat variations in humans using mutational constraint

Abstract

Identifying regions of the genome that are depleted of mutations can distinguish potentially deleterious variants. Short tandem repeats (STRs), also known as microsatellites, are among the largest contributors of de novo mutations in humans. However, per-locus studies of STR mutations have been limited to highly ascertained panels of several dozen loci. Here we harnessed bioinformatics tools and a novel analytical framework to estimate mutation parameters for each STR in the human genome by correlating STR genotypes with local sequence heterozygosity. We applied our method to obtain robust estimates of the impact of local sequence features on mutation parameters and used these estimates to create a framework for measuring constraint at STRs by comparing observed versus expected mutation rates. Constraint scores identified known pathogenic variants with early-onset effects. Our metric will provide a valuable tool for prioritizing pathogenic STRs in medical genetics studies.
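As a rough illustration of what "comparing observed versus expected mutation rates" could look like in practice, the sketch below computes a Z-score-style constraint value for a single STR locus. The function name, the log10 scale, the use of a standard error for normalization, and all numeric values are assumptions made for illustration; they are not taken from the paper, whose exact scoring procedure is not described in this abstract.

```python
def constraint_z(observed_log10_mu, expected_log10_mu, standard_error):
    """Hypothetical constraint score: how far the observed log10 mutation
    rate falls from the expected one, in units of standard error.
    Negative values mean the locus mutates less than expected."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (observed_log10_mu - expected_log10_mu) / standard_error


# Made-up example loci: (observed log10 rate, expected log10 rate, standard error).
# These numbers are placeholders, not results from the study.
loci = {
    "locus_A": (-5.0, -3.5, 0.5),  # observed rate well below expectation
    "locus_B": (-3.4, -3.5, 0.5),  # observed rate close to expectation
}

scores = {name: constraint_z(*values) for name, values in loci.items()}

# Rank loci from most to least constrained (most negative score first).
ranked = sorted(scores, key=scores.get)
```

Under these assumptions, a strongly negative score flags a locus whose observed mutation rate is much lower than its sequence features would predict, which is the kind of signal a constraint metric is meant to surface.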

Acknowledgements

We thank N. Patterson, M. Daly, Y. Wan, and A. Goren for helpful discussions. D.R. was supported by NIH grants GM100233 and HG006399 and is a Howard Hughes Medical Institute investigator. M.G. was supported by NIH/NIMH grant 1U01MH105669-01. Y.E. holds a Career Award at the Scientific Interface from the Burroughs Wellcome Fund. This study was supported in part by National Institute of Justice grant 2014-DN-BX-K089 (Y.E., T.W., M.G.) and by a generous gift from Paul and Andria Heafy (Y.E.).

Author information

Author notes

    • Melissa Gymrek, Thomas Willems, David Reich & Yaniv Erlich

    Present addresses: Department of Medicine and Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA (M.G.), Computational Genomics, Vertex Pharmaceuticals, Boston, Massachusetts, USA (T.W.), Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA (D.R.) and MyHeritage.com, Or Yehuda, Israel (Y.E.).

    • David Reich & Yaniv Erlich

    These authors contributed equally to this work.

Affiliations

  1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • Melissa Gymrek
  2. New York Genome Center, New York, New York, USA.

    • Melissa Gymrek, Thomas Willems & Yaniv Erlich
  3. Department of Medicine, University of California, San Diego, La Jolla, California, USA.

    • Melissa Gymrek
  4. Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California, USA.

    • Melissa Gymrek
  5. Computational and Systems Biology Program, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Thomas Willems
  6. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

    • David Reich
  7. Howard Hughes Medical Institute, Harvard Medical School, Boston, Massachusetts, USA.

    • David Reich
  8. Department of Computer Science, Fu Foundation School of Engineering, Columbia University, New York, New York, USA.

    • Yaniv Erlich

Contributions

M.G., D.R., and Y.E. conceived the study. M.G. prepared the initial manuscript and performed analyses. T.W. developed the likelihood-maximization procedure and helped design analyses. All authors contributed to the development of the mutation model and mutation rate estimation technique.

Competing interests

Y.E. is the Chief Science Officer of MyHeritage.com and consults for companies that operate in the DNA forensics domain.

Corresponding author

Correspondence to Melissa Gymrek.

Supplementary information

PDF files

  1.

    Supplementary Text and Figures

    Supplementary Figures 1–14, Supplementary Tables 1–5 and Supplementary Note

  2.

    Life Sciences Reporting Summary