
  • Review Article

Sequencing and characterizing short tandem repeats in the human genome

Abstract

Short tandem repeats (STRs) are highly polymorphic sequences, found throughout the human genome, that consist of repeated copies of a 1–6-bp motif. Over 1 million variable STR loci are known, some of which regulate gene expression and influence complex traits, such as height. Moreover, variants at at least 60 STR loci cause genetic disorders, including Huntington disease and fragile X syndrome. Accurately identifying and genotyping STR variants is challenging, particularly mapping short reads to repetitive regions and inferring the lengths of expanded repeats. Recent advances in sequencing technology and in computational tools for genotyping STRs from sequencing data promise to help overcome this challenge, resolve genetically unsolved cases and address the ‘missing heritability’ of polygenic traits. Here, we compare STR genotyping methods, analytical tools and their applications in understanding the effect of STR variation on health and disease. We identify emerging opportunities to refine genotyping and quality-control approaches and to integrate STRs into variant-calling workflows and large cohort analyses.
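The abstract singles out inferring expanded repeat lengths from short reads as a core difficulty. The toy sketch below, which assumes error-free reads and known flanking sequences, shows the underlying logic. Only a read that contains both flanks (a "spanning" read, in the terminology used by callers such as ExpansionHunter) pins down an allele's repeat count. A read anchored on one side only ("flanking") or lying entirely within the repeat ("in-repeat") gives just a lower bound. The function names and flank sequences are hypothetical; real callers also model sequencing errors, PCR stutter and interrupted motifs.

```python
# Toy illustration only: exact string matching, no sequencing errors,
# no stutter, no interrupted motifs. Function names and flank sequences
# are hypothetical and not taken from any real STR caller.
from __future__ import annotations


def longest_tandem_run(seq: str, motif: str) -> tuple[int, int]:
    """Return (start, copies) of the longest perfect run of `motif` in `seq`.

    Returns (-1, 0) if the motif does not occur.
    """
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best_start, best_count = -1, 0
    for i in range(len(seq) - k + 1):
        count = 0
        # Count consecutive, non-overlapping copies beginning at position i.
        while seq.startswith(motif, i + count * k):
            count += 1
        if count > best_count:  # strict '>' keeps the leftmost longest run
            best_start, best_count = i, count
    return best_start, best_count


def classify_read(read: str, motif: str, left_flank: str, right_flank: str) -> tuple[str, int]:
    """Label a read by how much of the STR locus it covers.

    - 'spanning': both flanks present, so `copies` is the allele's repeat count.
    - 'flanking': one flank present, so `copies` is only a lower bound.
    - 'in_repeat': no flank present, so `copies` is only a lower bound.
    """
    start, copies = longest_tandem_run(read, motif)
    if copies == 0:
        return "no_repeat", 0
    end = start + copies * len(motif)
    left_ok = read[:start].endswith(left_flank)
    right_ok = read[end:].startswith(right_flank)
    if left_ok and right_ok:
        return "spanning", copies
    if left_ok or right_ok:
        return "flanking", copies
    return "in_repeat", copies


if __name__ == "__main__":
    LEFT, RIGHT = "GATTACA", "TTGGCC"  # hypothetical flanking sequences
    print(classify_read(LEFT + "CAG" * 12 + RIGHT, "CAG", LEFT, RIGHT))  # ('spanning', 12)
    print(classify_read(LEFT + "CAG" * 40, "CAG", LEFT, RIGHT))          # ('flanking', 40)
    print(classify_read("CAG" * 50, "CAG", LEFT, RIGHT))                 # ('in_repeat', 50)
```

Once an allele is longer than the read, no spanning read can exist: a 150-bp read lying inside a CAG expansion can report at most 50 copies. Estimating longer expansions therefore means combining flanking and in-repeat reads statistically, or turning to long reads.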


Fig. 1: Causes and consequences of short tandem repeat variation.
Fig. 2: Timeline of developments in short tandem repeat genotyping tools and applications.
Fig. 3: Methods overview of short-read short tandem repeat callers.
Fig. 4: Short tandem repeat association analysis.
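Fig. 4 concerns STR association analysis. As a minimal sketch of one common length-based formulation (an assumption, not necessarily the one depicted in the figure), each sample's two allele repeat counts are summed into a single "length dosage", and a quantitative trait is regressed on that dosage by ordinary least squares. The function names are hypothetical, the p-value uses a normal approximation that is only reasonable for large cohorts, and real analyses add covariates such as ancestry principal components.

```python
# Minimal sketch of a length-based STR association test. Assumptions (not
# taken from the article): two allele repeat counts per sample, predictor =
# their sum, no covariates, normal approximation for the p-value.
from __future__ import annotations

import math
from typing import Sequence


def length_dosage(genotypes: Sequence[tuple[int, int]]) -> list[int]:
    """Per-sample predictor: sum of the two allele repeat counts."""
    return [a + b for a, b in genotypes]


def str_length_association(
    genotypes: Sequence[tuple[int, int]], trait: Sequence[float]
) -> tuple[float, float, float, float]:
    """OLS of trait on length dosage; returns (beta, intercept, t, p_approx)."""
    x = length_dosage(genotypes)
    n = len(x)
    if n != len(trait) or n < 3:
        raise ValueError("need >= 3 samples with matching trait values")
    mx = sum(x) / n
    my = sum(trait) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in trait)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, trait))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and trait must both vary")
    beta = sxy / sxx
    intercept = my - beta * mx
    r2 = sxy * sxy / (sxx * syy)
    # t statistic for the slope: t = r * sqrt((n - 2) / (1 - r^2)).
    if r2 >= 1.0:
        t = math.copysign(math.inf, beta)
    else:
        t = math.copysign(math.sqrt(r2 * (n - 2) / (1.0 - r2)), beta)
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided, normal approximation
    return beta, intercept, t, p


if __name__ == "__main__":
    # Synthetic toy data, not real measurements.
    genotypes = [(10, 12), (11, 11), (12, 14), (10, 10), (13, 15), (12, 12)]
    trait = [12.2, 12.0, 14.1, 11.0, 15.2, 13.0]
    beta, intercept, t, p = str_length_association(genotypes, trait)
    print(f"beta={beta:.3f} intercept={intercept:.3f} t={t:.2f} p={p:.2g}")
```

Summing allele lengths assumes an additive effect of repeat length; loci where only very long alleles matter, as in expansion disorders, may be better tested with a threshold-based encoding.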


References

  1. Horton, C. A. et al. Short tandem repeats bind transcription factors to tune eukaryotic gene expression. Science 381, eadd1250 (2023).

    Article  CAS  PubMed  Google Scholar 

  2. Ziaei Jam, H. et al. A deep population reference panel of tandem repeat variation. Nat. Commun. 14, 6711 (2023). This work provides an ensemble calling framework for tandem repeats and a phased haplotype panel to impute tandem repeats.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  3. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

    Article  CAS  PubMed  ADS  Google Scholar 

  4. Halman, A., Dolzhenko, E. & Oshlack, A. STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data. Hum. Mutat. 43, 859–868 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021). This review article provides a succinct overview of the timeline of discovery and advances in the understanding of repeat expansion disorders.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Willems, T. et al. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014). The first paper, to our knowledge, to catalogue the variation of STRs genome wide.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Gall-Duncan, T., Sato, N., Yuen, R. K. C. & Pearson, C. E. Advancing genomic technologies and clinical awareness accelerates discovery of disease-associated tandem repeat sequences. Genome Res. 32, 1–27 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  9. Fotsing, S. F. et al. The impact of short tandem repeat variation on gene expression. Nat. Genet. 51, 1652–1659 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Lamina, C. et al. A systematic evaluation of short tandem repeats in lipid candidate genes: riding on the SNP-wave. PLoS ONE 9, e102113 (2014).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  11. Levinson, G. & Gutman, G. A. Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4, 203–221 (1987).

    CAS  PubMed  Google Scholar 

  12. Huang, Q.-Y. et al. Mutation patterns at dinucleotide microsatellite loci in humans. Am. J. Hum. Genet. 70, 625–634 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Sulovari, A. et al. Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl Acad. Sci. USA 116, 23243–23253 (2019). Human-specific tandem repeat expansions identified in evolutionary history analysis.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  14. Steely, C. J., Watkins, W. S., Baird, L. & Jorde, L. B. The mutational dynamics of short tandem repeats in large, multigenerational families. Genome Biol. 23, 253 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Beecroft, S. J. et al. A Māori specific RFC1 pathogenic repeat configuration in CANVAS, likely due to a founder allele. Brain 143, 2673–2680 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  16. Tsuchiya, M. et al. RFC1 repeat expansion in Japanese patients with late-onset cerebellar ataxia. J. Hum. Genet. 65, 1143–1147 (2020).

    Article  CAS  PubMed  Google Scholar 

  17. Sobczak, K. et al. Structural diversity of triplet repeat RNAs. J. Biol. Chem. 285, 12755–12764 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Thys, R. G., Lehman, C. E., Pierce, L. C. T. & Wang, Y.-H. DNA secondary structure at chromosomal fragile sites in human disease. Curr. Genomics 16, 60–70 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  19. Capra, J. A., Paeschke, K., Singh, M. & Zakian, V. A. G-quadruplex DNA sequences are evolutionarily conserved and associated with distinct genomic features in Saccharomyces cerevisiae. PLoS Comput. Biol. 6, e1000861 (2010).

    Article  MathSciNet  PubMed  PubMed Central  ADS  Google Scholar 

  20. Lago, S. et al. Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome. Nat. Commun. 12, 3885 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  21. Hamanaka, K. et al. Genome-wide identification of tandem repeats associated with splicing variation across 49 tissues in humans. Genome Res. 33, 435–447 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  22. Lee, J. E. & Cooper, T. A. Pathogenic mechanisms of myotonic dystrophy. Biochem. Soc. Trans. 37, 1281–1286 (2009).

    Article  CAS  PubMed  Google Scholar 

  23. Zu, T. et al. Non-ATG-initiated translation directed by microsatellite expansions. Proc. Natl Acad. Sci. USA 108, 260–265 (2011).

    Article  CAS  PubMed  ADS  Google Scholar 

  24. Ordway, J. M. et al. Ectopically expressed CAG repeats cause intranuclear inclusions and a progressive late onset neurological phenotype in the mouse. Cell 91, 753–763 (1997).

    Article  CAS  PubMed  Google Scholar 

  25. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  26. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Gymrek, M., Golan, D., Rosset, S. & Erlich, Y. lobSTR: a short tandem repeat profiler for personal genomes. Genome Res. 22, 1154–1162 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Highnam, G. et al. Accurate human microsatellite genotypes from high-throughput resequencing data using informed error profiles. Nucleic Acids Res. 41, e32 (2013).

    Article  CAS  PubMed  Google Scholar 

  29. Miyatake, S. et al. Rapid and comprehensive diagnostic method for repeat expansion diseases using nanopore sequencing. NPJ Genom. Med. 7, 62 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Chintalaphani, S. R., Pineda, S. S., Deveson, I. W. & Kumar, K. R. An update on the neurological short tandem repeat expansion disorders and the emergence of long-read sequencing diagnostics. Acta Neuropathol. Commun. 9, 98 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  31. Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 47, e90 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Dolzhenko, E. et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Dashnow, H. et al. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol. 23, 257 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Dolzhenko, E. et al. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data. Genome Biol. 21, 102 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Chiu, R., Rajan-Babu, I.-S., Friedman, J. M. & Birol, I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol. 22, 224 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Fang, L. et al. DeepRepeat: direct quantification of short tandem repeats on signal data from nanopore sequencing. Genome Biol. 23, 108 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Dolzhenko, E. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 27, 1895–1903 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Ibañez, K. et al. Whole genome sequencing for the diagnosis of neurological repeat expansion disorders in the UK: a retrospective diagnostic accuracy and prospective clinical validation study. Lancet Neurol. 21, 234–245 (2022). This study assessed the diagnostic utility of whole-genome sequencing to detect pathogenic repeat expansions associated with neurological conditions.

    Article  PubMed  PubMed Central  Google Scholar 

  39. Mitra, I. et al. Patterns of de novo tandem repeat mutations and their role in autism. Nature 589, 246–250 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  40. Jakubosky, D. et al. Discovery and quality analysis of a comprehensive set of structural variants and short tandem repeats. Nat. Commun. 11, 2928 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  41. Vijayaraghavan, P. et al. The genomic landscape of short tandem repeats across multiple ancestries. PLoS ONE 18, e0279430 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Stranneheim, H. et al. Integration of whole genome sequencing into a healthcare setting: high diagnostic rates across multiple clinical entities in 3219 rare disease patients. Genome Med. 13, 40 (2021). This study implemented a clinical diagnostic workflow to detect pathogenic variants, including STRs, in the rare disease setting.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Lowther, C. et al. Systematic evaluation of genome sequencing for the diagnostic assessment of autism spectrum disorder and fetal structural anomalies. Am. J. Hum. Genet. 110, 1454–1469 (2023).

    Article  CAS  PubMed  Google Scholar 

  44. Southern, E. M. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98, 503–517 (1975).

    Article  CAS  PubMed  Google Scholar 

  45. de Leeuw, R. H. et al. Diagnostics of short tandem repeat expansion variants using massively parallel sequencing and componential tools. Eur. J. Hum. Genet. 27, 400–407 (2019).

    Article  PubMed  Google Scholar 

  46. Sanger, F., Nicklen, S. & Coulson, A. R. DNA sequencing with chain-terminating inhibitors. Proc. Natl Acad. Sci. USA 74, 5463–5467 (1977).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  47. Saiki, R. K. et al. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230, 1350–1354 (1985).

    Article  CAS  PubMed  ADS  Google Scholar 

  48. Warner, J. P. et al. A general method for the detection of large CAG repeat expansions by fluorescent PCR. J. Med. Genet. 33, 1022–1026 (1996).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  49. Schwartz, D. C. et al. Ordered restriction maps of Saccharomyces cerevisiae chromosomes constructed by optical mapping. Science 262, 110–114 (1993).

    Article  CAS  PubMed  ADS  Google Scholar 

  50. Wyner, N., Barash, M. & McNevin, D. Forensic autosomal short tandem repeats and their potential association with phenotype. Front. Genet. 11, 884 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  51. Forster, P. et al. A short tandem repeat-based phylogeny for the human Y chromosome. Am. J. Hum. Genet. 67, 182–196 (2000).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Tishkoff, S. A. et al. Global patterns of linkage disequilibrium at the CD4 locus and modern human origins. Science 271, 1380–1387 (1996).

    Article  CAS  PubMed  ADS  Google Scholar 

  53. Oberlé, I. et al. Instability of a 550-base pair DNA segment and abnormal methylation in fragile X syndrome. Science 252, 1097–1102 (1991).

    Article  PubMed  ADS  Google Scholar 

  54. Verkerk, A. J. et al. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905–914 (1991).

    Article  CAS  PubMed  Google Scholar 

  55. La Spada, A. R., Wilson, E. M., Lubahn, D. B., Harding, A. E. & Fischbeck, K. H. Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature 352, 77–79 (1991).

    Article  PubMed  ADS  Google Scholar 

  56. Kedzierska, K. Z. et al. SONiCS: PCR stutter noise correction in genome-scale microsatellites. Bioinformatics 34, 4115–4117 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  57. Kamsteeg, E.-J. & Gilissen, C. A comprehensive assay for resolving repeat expansions to the base pair. Clin. Chem. 69, 213–215 (2023).

    Article  PubMed  Google Scholar 

  58. Liu, L. et al. Comparison of next-generation sequencing systems. J. Biomed. Biotechnol. 2012, 251364 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  59. Jain, M., Olsen, H. E., Paten, B. & Akeson, M. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  60. Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).

    Article  CAS  PubMed  ADS  Google Scholar 

  62. Rhoads, A. & Au, K. F. PacBio sequencing and its applications. Genom. Proteom. Bioinform. 13, 278–289 (2015).

    Article  Google Scholar 

  63. Tang, H. et al. Profiling of short-tandem-repeat disease alleles in 12,632 human whole genomes. Am. J. Hum. Genet. 101, 700–715 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Mousavi, N. et al. TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics 37, 731–733 (2021).

    Article  CAS  PubMed  Google Scholar 

  65. Das, S. et al. Methylation analysis of the fragile X syndrome by PCR. Genet. Test. 1, 151–155 (1997).

    Article  CAS  PubMed  Google Scholar 

  66. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022). This paper describes the comprehensive catalogue of repeat elements generated from the complete Telomere-to-Telomere (T2T) human genome.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  67. Hoyt, S. J. et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Zhao, X. et al. Expectations and blind spots for structural variation detection from long-read assemblies and short-read genome sequencing technologies. Am. J. Hum. Genet. 108, 919–928 (2021). This study assessed the additional utility of long-read sequencing relative to short-read sequencing in detecting structural variation, including repetitive elements.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  69. Dohm, J. C., Peters, P., Stralis-Pavese, N. & Himmelbauer, H. Benchmarking of long-read correction methods. Nar. Genom. Bioinform 2, lqaa037 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  70. Mitsuhashi, S. et al. Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads. Genome Biol. 20, 58 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  71. Giesselmann, P. et al. Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol. 37, 1478–1481 (2019).

    Article  CAS  PubMed  Google Scholar 

  72. Dolzhenko, E. et al. Resolving the unsolved: comprehensive assessment of tandem repeats at scale. Preprint at bioRxiv https://doi.org/10.1101/2023.05.12.540470 (2023).

  73. Ameur, A., Kloosterman, W. P. & Hestand, M. S. Single-molecule sequencing: towards clinical applications. Trends Biotechnol. 37, 72–85 (2019).

    Article  CAS  PubMed  Google Scholar 

  74. Mahmoud, M. et al. Utility of long-read sequencing for All of Us. Preprint at bioRxiv https://doi.org/10.1101/2023.01.23.525236 (2023).

  75. Olson, N. D. et al. PrecisionFDA truth challenge V2: calling variants from short and long reads in difficult-to-map regions. Cell Genom. 2, 100129 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Chen, Y., Zhang, Y., Wang, A. Y., Gao, M. & Chong, Z. Accurate long-read de novo assembly evaluation with inspector. Genome Biol. 22, 312 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  77. Kronenberg, Z. N. et al. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat. Commun. 12, 1935 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  78. Chiu, R., Rajan-Babu, I.-S., Birol, I. & Friedman, J. M. Linked-read sequencing for detecting short tandem repeat expansions. Sci. Rep. 12, 9352 (2022).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  79. Shinde, D., Lai, Y., Sun, F. & Arnheim, N. Taq DNA polymerase slippage mutation rates measured by PCR and quasi-likelihood analysis: (CA/GT)n and (A/T)n microsatellites. Nucleic Acids Res. 31, 974–980 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Kristmundsdóttir, S., Sigurpálsdóttir, B. D., Kehr, B. & Halldórsson, B. V. popSTR: population-scale detection of STR variants. Bioinformatics 33, 4041–4048 (2017).

    Article  PubMed  Google Scholar 

  81. Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022). This study performed genome-wide STR genotyping in over 150,000 individuals in the UK Biobank.

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  82. Viguera, E., Canceill, D. & Ehrlich, S. D. Replication slippage involves DNA polymerase pausing and dissociation. EMBO J. 20, 2587–2595 (2001).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  83. Halman, A. & Oshlack, A. Accuracy of short tandem repeats genotyping tools in whole exome sequencing data. F1000Res 9, 200 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Dashnow, H. et al. STRetch: detecting and discovering pathogenic short tandem repeat expansions. Genome Biol. 19, 121 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  85. Tankard, R. M. et al. Detecting expansions of tandem repeats in cohorts sequenced with short-read sequencing data. Am. J. Hum. Genet. 103, 858–873 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999). This algorithm detects tandem repeats in a reference sequence and is used by several long-read and short-read STR genotyping tools.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  87. Weisburd, B., Tiao, G. & Rehm, H. L. Insights from a genome-wide truth set of tandem repeat variation. Preprint at bioRxiv https://doi.org/10.1101/2023.05.05.539588 (2023).

  88. Rajan-Babu, I.-S. et al. Genome-wide sequencing as a first-tier screening test for short tandem repeat expansions. Genome Med. 13, 126 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Seixas, A. I. et al. A pentanucleotide ATTTC repeat insertion in the non-coding region of DAB1, mapping to SCA37, causes spinocerebellar ataxia. Am. J. Hum. Genet. 101, 87–103 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Rafehi, H. et al. Bioinformatics-based identification of expanded repeats: a non-reference intronic pentamer expansion in RFC1 causes CANVAS. Am. J. Hum. Genet. 105, 151–165 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  91. Bakhtiari, M. et al. Variable number tandem repeats mediate the expression of proximal genes. Nat. Commun. 12, 2075 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  92. Reynolds, H. M. et al. Rapid genome sequencing identifies a novel de novo variant for neonatal congenital myasthenic syndrome. Cold Spring Harb. Mol. Case Stud. 8, a006242 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  93. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  94. Zarate, S. et al. Parliament2: accurate structural variant calling at scale. Gigascience 9, giaa145 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  95. Ishiura, H. et al. Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat. Genet. 50, 581–590 (2018).

    Article  CAS  PubMed  Google Scholar 

  96. LaCroix, A. J. et al. GGC repeat expansion and exon 1 methylation of XYLT1 is a common pathogenic variant in Baratela-Scott syndrome. Am. J. Hum. Genet. 104, 35–44 (2019).

    Article  CAS  PubMed  Google Scholar 

  97. Mizuguchi, T. et al. Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases. J. Hum. Genet. 64, 191–197 (2019).

    Article  CAS  PubMed  Google Scholar 

  98. Amarasinghe, S. L. et al. Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  99. Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Ummat, A. & Bashir, A. Resolving complex tandem repeats with long reads. Bioinformatics 30, 3491–3498 (2014).

    Article  CAS  PubMed  Google Scholar 

  101. Liu, Q., Zhang, P., Wang, D., Gu, W. & Wang, K. Interrogating the ‘unsequenceable’ genomic trinucleotide repeat disorders by long-read sequencing. Genome Med. 9, 65 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  102. Bolognini, D., Magi, A., Benes, V., Korbel, J. O. & Rausch, T. TRiCoLOR: tandem repeat profiling using whole-genome long-read sequencing data. Gigascience 9, giaa101 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  103. Guo, R. et al. RepLong: de novo repeat identification using long read sequencing data. Bioinformatics 34, 1099–1107 (2018).

    Article  CAS  PubMed  Google Scholar 

  104. Brown, S. D., Dreolini, L., Wilson, J. F., Balasundaram, M. & Holt, R. A. Complete sequence verification of plasmid DNA using the Oxford Nanopore Technologies’ MinION device. BMC Bioinform 24, 116 (2023).

    Article  CAS  Google Scholar 

  105. De Roeck, A. et al. NanoSatellite: accurate characterization of expanded tandem repeat length and sequence through whole genome long-read sequencing on PromethION. Genome Biol. 20, 239 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  106. Gamaarachchi, H. et al. Fast nanopore sequencing data analysis with SLOW5. Nat. Biotechnol. 40, 1026–1029 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  107. Tsai, Y.-C. et al. Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT sequencing of repeat-expansion disease causative genomic regions. Preprint at bioRxiv https://doi.org/10.1101/203919 (2017).

  108. Wallace, A. D. et al. CaBagE: a Cas9-based background elimination strategy for targeted, long-read DNA sequencing. PLoS ONE 16, e0241253 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Loose, M., Malla, S. & Stout, M. Real-time selective sequencing using nanopore technology. Nat. Methods 13, 751–754 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Stevanovski, I. et al. Comprehensive genetic diagnosis of tandem repeat expansion disorders with programmable targeted nanopore sequencing. Sci. Adv. 8, eabm5386 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Pemberton, T. J., Sandefur, C. I., Jakobsson, M. & Rosenberg, N. A. Sequence determinants of human microsatellite variability. BMC Genomics 10, 612 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  112. Jakubosky, D. et al. Properties of structural variants and short tandem repeats associated with gene expression and complex traits. Nat. Commun. 11, 2927 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  113. Mukamel, R. E. et al. Protein-coding repeat polymorphisms strongly shape diverse human phenotypes. Science 373, 1499–1505 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  114. Margoliash, J. et al. Polymorphic short tandem repeats make widespread contributions to blood and serum traits. Cell Genom. 3, 100458 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Gymrek, M. et al. Abundant contribution of short tandem repeats to gene expression variation in humans. Nat. Genet. 48, 22–29 (2016). The first paper, to our knowledge, to associate STR variation with gene expression genome-wide.

    Article  CAS  PubMed  Google Scholar 

  116. Grapotte, M. et al. Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. Nat. Commun. 12, 3297 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  117. Martin-Trujillo, A., Garg, P., Patel, N., Jadhav, B. & Sharp, A. J. Genome-wide evaluation of the effect of short tandem repeat variation on local DNA methylation. Genome Res. 33, 184–196 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  118. Chen, L.-S., Tassone, F., Sahota, P. & Hagerman, P. J. The (CGG)n repeat element within the 5’ untranslated region of the FMR1 message provides both positive and negative cis effects on in vivo translation of a downstream reporter. Hum. Mol. Genet. 12, 3067–3074 (2003).

    Article  CAS  PubMed  Google Scholar 

  119. Tassone, F. et al. Elevated FMR1 mRNA in premutation carriers is due to increased transcription. RNA 13, 555–562 (2007).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Grünewald, T. G. P. et al. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat. Genet. 47, 1073–1078 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Saini, S., Mitra, I., Mousavi, N., Fotsing, S. F. & Gymrek, M. A reference haplotype panel for genome-wide imputation of short tandem repeats. Nat. Commun. 9, 4397 (2018).

    Article  PubMed  PubMed Central  ADS  Google Scholar 

  122. Gymrek, M., Willems, T., Reich, D. & Erlich, Y. Interpreting short tandem repeat variations in humans using mutational constraint. Nat. Genet. 49, 1495–1501 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  123. Trost, B. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  124. Wen, J. et al. Rare tandem repeat expansions associate with genes involved in synaptic and neuronal signaling functions in schizophrenia. Mol. Psychiatry 28, 475–482 (2023).

    Article  CAS  PubMed  Google Scholar 

  125. Boland, C. R. & Goel, A. Microsatellite instability in colorectal cancer. Gastroenterology 138, 2073–2087.e3 (2010).

    Article  CAS  PubMed  Google Scholar 

  126. Kanopiene, D. et al. Endometrial cancer and microsatellite instability status. Open. Med. 10, 70–76 (2015).

    CAS  Google Scholar 

  127. Hause, R. J., Pritchard, C. C., Shendure, J. & Salipante, S. J. Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 22, 1342–1350 (2016).

    Article  CAS  PubMed  Google Scholar 

  128. Erwin, G. S. et al. Recurrent repeat expansions in human cancer genomes. Nature 613, 96–102 (2023).

    Article  CAS  PubMed  ADS  Google Scholar 

  129. Cuomo, A. S. E., Nathan, A., Raychaudhuri, S., MacArthur, D. G. & Powell, J. E. Single-cell genomics meets human genetics. Nat. Rev. Genet. 24, 535–549 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Rafehi, H. et al. Unexpected diagnosis of myotonic dystrophy type 2 repeat expansion by genome sequencing. Eur. J. Hum. Genet. 31, 122–124 (2023).

    Article  CAS  PubMed  Google Scholar 

  131. Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  132. Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  133. Goodrich, J. K. et al. Determinants of penetrance and variable expressivity in monogenic metabolic conditions across 77,184 exomes. Nat. Commun. 12, 3505 (2021).

    Article  CAS  PubMed  PubMed Central  ADS  Google Scholar 

  134. Kingdom, R. et al. Rare genetic variants in genes and loci linked to dominant monogenic developmental disorders cause milder related phenotypes in the general population. Am. J. Hum. Genet. 109, 1308–1316 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  135. Wright, C. F. et al. Assessing the pathogenicity, penetrance, and expressivity of putative disease-causing variants in a population setting. Am. J. Hum. Genet. 104, 275–286 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Zeman, A. et al. Spinocerebellar ataxia type 8 in Scotland: genetic and clinical features in seven unrelated cases and a review of published reports. J. Neurol. Neurosurg. Psychiatry 75, 459–465 (2004).

  137. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  138. Shi, Y. et al. Characterization of genome-wide STR variation in 6487 human genomes. Nat. Commun. 14, 2092 (2023).

  139. Steinbach, P., Gläser, D., Vogel, W., Wolf, M. & Schwemmle, S. The DMPK gene of severely affected myotonic dystrophy patients is hypermethylated proximal to the largely expanded CTG repeat. Am. J. Hum. Genet. 62, 278–285 (1998).

  140. Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).

  141. Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022).

  142. Lundström, O. S. et al. WebSTR: a population-wide database of short tandem repeat variation in humans. J. Mol. Biol. 435, 168260 (2023).

  143. Huang, B. et al. Genome-wide selection inference at short tandem repeats. Preprint at bioRxiv https://doi.org/10.1101/2022.05.12.491726 (2022).

  144. Fazal, S. et al. RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci. Preprint at bioRxiv https://doi.org/10.1101/2023.03.22.533484 (2023).

  145. Tørresen, O. K. et al. Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res. 47, 10994–11006 (2019).

  146. Nelson, D. L., Orr, H. T. & Warren, S. T. The unstable repeats-three evolving faces of neurological disease. Neuron 77, 825–843 (2013).

  147. Ho, T. H. et al. Muscleblind proteins regulate alternative splicing. EMBO J. 23, 3103–3112 (2004).

  148. Zhang, N. & Ashizawa, T. RNA toxicity and foci formation in microsatellite expansion diseases. Curr. Opin. Genet. Dev. 44, 17–29 (2017).

  149. Kimpton, C. et al. Evaluation of an automated DNA profiling system employing multiplex amplification of four tetrameric STR loci. Int. J. Leg. Med. 106, 302–311 (1994).

  150. De Baere, E. et al. Spectrum of FOXL2 gene mutations in blepharophimosis-ptosis-epicanthus inversus (BPES) families demonstrates a genotype-phenotype correlation. Hum. Mol. Genet. 10, 1591–1600 (2001).

  151. MacDonald, M. E. et al. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 72, 971–983 (1993).

  152. Mahadevan, M. et al. Myotonic dystrophy mutation: an unstable CTG repeat in the 3′ untranslated region of the gene. Science 255, 1253–1255 (1992).

  153. Dolzhenko, E. et al. REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats. Genome Med. 14, 84 (2022).

  154. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

  155. Park, J., Kaufman, E., Valdmanis, P. N. & Bafna, V. TRviz: a Python library for decomposing and visualizing tandem repeat sequences. Bioinform. Adv. 3, vbad058 (2023).

  156. Ohta, T. & Kimura, M. A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22, 201–204 (1973).

  157. Schlötterer, C. & Tautz, D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20, 211–215 (1992).

  158. Lai, Y. & Sun, F. The relationship between microsatellite slippage mutation rate and the number of repeat units. Mol. Biol. Evol. 20, 2123–2131 (2003).

  159. Sun, J. X. et al. A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161–1165 (2012).

Acknowledgements

The authors thank H. Nicholas for substantial contributions to editing the manuscript and E. Dolzhenko, L. Hiatt, M. Goldberg, A. Quinlan and B. Weisburd for valuable discussions.

Author information

Contributions

H.A.T. and H.D. researched the literature. All authors contributed substantially to discussion of the content, wrote the article and reviewed and/or edited the manuscript before submission.

Corresponding authors

Correspondence to Harriet Dashnow or Daniel G. MacArthur.

Ethics declarations

Competing interests

D.G.M. is a paid adviser to GlaxoSmithKline, Insitro, Variant Bio and Overtone Therapeutics and has received research support from AbbVie, Astellas, Biogen, BioMarin, Eisai, Merck, Pfizer and Sanofi-Genzyme; none of these activities are related to the work presented here. I.W.D. has previously received travel and accommodation expenses from ONT to speak at conferences. I.W.D. has a paid consultant role with Sequin Pty Ltd. H.A.T. and H.D. declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks Melissa Gymrek, who co-reviewed with Helyaneh Ziaei Jam; and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Glossary

Abandonware

Software that is no longer maintained by its developer.

Allelic dropout

One of the two alleles at a locus is undetected, leading to a false homozygous genotype.

Graph genome

Representation of the genome as a graph, in which sequence segments are nodes connected by edges so that alternative alleles form branching paths.

Mate pairs

Paired reads obtained from opposite ends of a DNA fragment.

Mismatch error rate

Rate of insertion, deletion or substitution errors.

Motif structure

Repetitive sequence pattern that characterizes an STR locus.

Novel STR

An STR locus that is absent from the reference genome or has a motif or coordinates that differ from those in the reference genome.

PCR stutter artefacts

A distribution of erroneous STR allele sizes generated during PCR amplification owing to DNA polymerase slippage.

Regular expressions

Character sequence pattern convention used for efficient sequence matching.

Variable number tandem repeat

A repetitive sequence in the genome consisting of a repeating motif greater than 6 bp.

Variant-calling pipelines

Computational workflows used to identify genetic variants from sequencing data.
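To make the 'Regular expressions' and 'Motif structure' entries concrete, the following is a minimal Python sketch, not taken from any published STR caller, that uses a regular expression to locate uninterrupted runs of a known motif in a DNA string. The function name `find_tandem_repeats`, the `min_copies` threshold and the 0-based, half-open coordinates are illustrative assumptions. Real STR genotypers must additionally handle interrupted repeats, sequencing errors and reads that do not span the full repeat.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(seq: str, motif: str, min_copies: int = 3) -> List[Tuple[int, int, int]]:
    """Hypothetical helper: report uninterrupted runs of `motif` in `seq`.

    Returns (start, end, copies) tuples using 0-based, half-open
    coordinates (as in Python slicing). Only exact, pure repeats in the
    phase of the given motif are found.
    """
    motif = motif.upper()
    if not 1 <= len(motif) <= 6:
        # Glossary convention: motifs longer than 6 bp are VNTRs, not STRs.
        raise ValueError("STR motifs must be 1-6 bp long")
    # For motif "CAG" and min_copies=3 this builds the pattern "(?:CAG){3,}".
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    hits = []
    # finditer yields non-overlapping matches; the greedy {n,} quantifier
    # extends each match to the end of the uninterrupted run.
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(motif)
        hits.append((match.start(), match.end(), copies))
    return hits


if __name__ == "__main__":
    example = "TTCAGCAGCAGCAGAACAGCAGTT"
    print(find_tandem_repeats(example, "CAG"))  # [(2, 14, 4)]
```

In the example, the four-copy CAG run at positions 2 to 14 is reported, while the later two-copy run falls below the assumed `min_copies` threshold of 3. Because the pattern is anchored on the motif as written, the same run searched with a rotated motif such as AGC would be reported with shifted coordinates, which is one reason STR catalogues fix a canonical motif and coordinates for each locus.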

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tanudisastro, H.A., Deveson, I.W., Dashnow, H. et al. Sequencing and characterizing short tandem repeats in the human genome. Nat Rev Genet (2024). https://doi.org/10.1038/s41576-024-00692-3

  • DOI: https://doi.org/10.1038/s41576-024-00692-3
