Patterns of de novo tandem repeat mutations and their role in autism


Autism spectrum disorder (ASD) is an early-onset developmental disorder characterized by deficits in communication and social interaction and restrictive or repetitive behaviours1,2. Family studies demonstrate that ASD has a substantial genetic basis with contributions both from inherited and de novo variants3,4. It has been estimated that de novo mutations may contribute to 30% of all simplex cases, in which only a single child is affected per family5. Tandem repeats (TRs), defined here as sequences of 1 to 20 base pairs in size repeated consecutively, comprise one of the major sources of de novo mutations in humans6. TR expansions are implicated in dozens of neurological and psychiatric disorders7. Yet, de novo TR mutations have not been characterized on a genome-wide scale, and their contribution to ASD remains unexplored. Here we develop new bioinformatics methods for identifying and prioritizing de novo TR mutations from sequencing data and perform a genome-wide characterization of de novo TR mutations in ASD-affected probands and unaffected siblings. We infer specific mutation events and their precise changes in repeat number, and primarily focus on more prevalent stepwise copy number changes rather than large expansions. Our results demonstrate a significant genome-wide excess of TR mutations in ASD probands. Mutations in probands tend to be larger, enriched in fetal brain regulatory regions, and are predicted to be more evolutionarily deleterious. Overall, our results highlight the importance of considering repeat variants in future studies of de novo mutations.


Fig. 1: Identifying de novo TR mutations in the SSC cohort.
Fig. 2: Patterns of autosomal TR mutations.
Fig. 3: TR mutation burden in ASD.
Fig. 4: Prioritizing TR mutations by fitness effects.

Data availability

All TR genotypes and mutation calls are available through SFARI base accession code: SFARI_SSC_WGS_2b. Per-locus selection scores computed by SISTR are provided in Supplementary Data 1. The BrainSpan dataset is available at The NHGRI GWAS catalogue is available at

Code availability

The (1) MonSTR software for identifying TR mutations and (2) SISTR software for prioritizing TR mutations are open source and available on GitHub. The code used to generate figures and results for this study is available at (


  1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders 5th edn (2013).
  2. Rosti, R. O., Sadek, A. A., Vaux, K. K. & Gleeson, J. G. The genetic landscape of autism spectrum disorders. Dev. Med. Child Neurol. 56, 12–18 (2014).
  3. Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
  4. Iakoucheva, L. M., Muotri, A. R. & Sebat, J. Getting to the cores of autism. Cell 178, 1287–1298 (2019).
  5. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
  6. Willems, T., Gymrek, M., Poznik, G. D., Tyler-Smith, C. & Erlich, Y. Population-scale sequencing data enable precise estimates of Y-STR mutation rates. Am. J. Hum. Genet. 98, 919–933 (2016).
  7. Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19, 286–298 (2018).
  8. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
  9. Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 (2017).
  10. Mousavi, N., Shleizer-Burko, S., Yanicky, R. & Gymrek, M. Profiling the genome-wide landscape of tandem repeat expansions. Nucleic Acids Res. 47, e90 (2019).
  11. An, J. Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).
  12. Gymrek, M., Willems, T., Reich, D. & Erlich, Y. Interpreting short tandem repeat variations in humans using mutational constraint. Nat. Genet. 49, 1495–1501 (2017).
  13. Payseur, B. A., Jing, P. & Haasl, R. J. A genomic portrait of human microsatellite variation. Mol. Biol. Evol. 28, 303–312 (2011).
  14. Sun, J. X. et al. A direct characterization of human mutation based on microsatellites. Nat. Genet. 44, 1161–1165 (2012).
  15. Michaelson, J. J. et al. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151, 1431–1442 (2012).
  16. O’Roak, B. J. et al. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485, 246–250 (2012).
  17. Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
  18. Ellegren, H. Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24, 400–402 (2000).
  19. Huang, Q. Y. et al. Mutation patterns at dinucleotide microsatellite loci in humans. Am. J. Hum. Genet. 70, 625–634 (2002).
  20. Weber, J. L. & Wong, C. Mutation of human short tandem repeats. Hum. Mol. Genet. 2, 1123–1128 (1993).
  21. Amos, W., Kosanović, D. & Eriksson, A. Inter-allelic interactions play a major role in microsatellite evolution. Proc. R. Soc. Lond. B 282, 20152125 (2015).
  22. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLOS Comput. Biol. 6, e1001025 (2010).
  23. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
  24. Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47 (D1), D886–D894 (2019).
  25. Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
  26. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
  27. Trost, B. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020).
  28. Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
  29. Grünewald, T. G. et al. Chimeric EWSR1-FLI1 regulates the Ewing sarcoma susceptibility gene EGR2 via a GGAA microsatellite. Nat. Genet. 47, 1073–1078 (2015).
  30. Breuss, M. W. et al. Autism risk in offspring can be assessed through quantification of male sperm mosaicism. Nat. Med. 26, 143–150 (2020).
  31. Mousavi, N. et al. TRTools: a toolkit for genome-wide analysis of tandem repeats. Bioinformatics btaa736 (2020).
  32. Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
  33. Willems, T. et al. Genome-wide profiling of heritable and de novo STR variations. Nat. Methods 14, 590–592 (2017).
  34. Huang, W., Li, L., Myers, J. R. & Marth, G. T. ART: a next-generation sequencing read simulator. Bioinformatics 28, 593–594 (2012).
  35. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at (2013).
  36. Quinlan, A. R. BEDTools: the Swiss-army tool for genome feature analysis. Bioinformatics 47, 11–34 (2014).
  37. Schuelke, M. An economic method for the fluorescent labeling of PCR fragments. Nat. Biotechnol. 18, 233–234 (2000).
  38. Krebs, M. O. et al. Absence of association between a polymorphic GGC repeat in the 5′ untranslated region of the reelin gene and autism. Mol. Psychiatry 7, 801–804 (2002).
  39. Ernst, J. & Kellis, M. ChromHMM: automating chromatin-state discovery and characterization. Nat. Methods 9, 215–216 (2012).
  40. Buniello, A. et al. The NHGRI–EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47 (D1), D1005–D1012 (2019).
  41. Miller, J. A. et al. Transcriptional landscape of the prenatal human brain. Nature 508, 199–206 (2014).
  42. Fotsing, S. F. et al. The impact of short tandem repeat variation on gene expression. Nat. Genet. 51, 1652–1659 (2019).
  43. Fu, Y. X. & Chakraborty, R. Simultaneous estimation of all the parameters of a stepwise mutation model. Genetics 150, 487–497 (1998).
  44. Haasl, R. J. & Payseur, B. A. Microsatellites as targets of natural selection. Mol. Biol. Evol. 30, 285–298 (2013).
  45. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate—a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
  46. Battle, A., Brown, C. D., Engelhardt, B. E. & Montgomery, S. B. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
  47. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).


This study was supported by the Simons Foundation Autism Research Initiative (SFARI Grant no. 630705). I.M. was additionally supported by a predoctoral fellowship from the Autism Science Foundation. M.G. was additionally supported in part by the Office of The Director, National Institutes of Health under Award Number DP5OD024577 and NIH/NHGRI grants R01HG010149 and R21HG010070. K.E.L. was supported by the National Institutes of Health grant R35GM119856. We thank J. Gleeson, J. Sebat, A. Palmer and A. Goren for helpful comments on this study.

Author information




I.M. performed TR genotyping, identification of de novo mutations and downstream analyses, and helped to write the manuscript. B.H. developed SISTR and performed analysis of TR selection scores in the SSC cohort. N. Mousavi helped to design GangSTR analysis and filtering settings, and performed analyses to evaluate MonSTR. N. Ma performed capillary electrophoresis validation experiments. M.L. performed TR annotation for identification of determinants of TR mutation rates. R.Y. designed AWS cloud analysis pipelines. S.S.-B. helped to design and set up validation experiments. K.E.L. conceived the SISTR method, supervised analysis of TR selection scores and drafted the manuscript. M.G. conceived the study, designed and performed analyses, and drafted the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Kirk E. Lohmueller or Melissa Gymrek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Anders Børglum, Thomas Bourgeron, Anthony Hannan and Ryan Layer for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Evaluation of MonSTR using simulated data.

a, Evaluation of a naive TR mutation-calling method. WGS was simulated for probands with mutations and controls with no mutation under three different scenarios for a range of mean sequencing coverages (Methods). Top plots show the sensitivity (blue line). Bottom plots show the false positive rate (FPR). Shaded bars show the percent of transmissions called as mutation (blue), no mutation (dark grey), or no call (light grey). b, Evaluation of MonSTR’s default model-based method. Plots are the same as in a, but are based on MonSTR’s default model (Supplementary Methods). Note that FPR lines are not visible because all are at 0%. c, Evaluation of TR mutation calling using default model-based MonSTR settings as a function of mutation size. The top plot is the same as in a, b, and shows the sensitivity to detect mutations as a function of their size. The bottom plot compares the called mutation size (y-axis) with the true simulated mutation size (x-axis). Bubble sizes show the number of mutation calls represented at each point. d, Evaluation of TR mutation calling as a function of mutation size after quality filtering. Plots are the same as in c, but use the stringent quality filters in MonSTR that were applied to analyse the SSC cohort. Compared to default settings, sensitivity is decreased, especially for larger expansions, but inferred mutation sizes are unbiased. All plots are based on simulation of 100 randomly chosen TR loci (Methods). Panels c, d show results for scenario #1.

Extended Data Fig. 2 Genome-wide de novo TR mutation rate patterns.

a, Distribution of average TR mutation rates by period. For each repeat unit length (x-axis), bars give the genome-wide estimated TR mutation rate (y-axis, log10 scale). Average mutation rates were computed as the total number of mutations divided by the total number of children analysed. The numbers of TRs considered (rounded to the nearest 1,000) in each category are annotated. b, TR mutation rate vs. length. The x-axis shows the TR reference length (hg38) and the y-axis shows the log10 mutation rate estimated across all TRs with each reference length. Colours denote different repeat unit lengths. c, Number of TR mutations observed for CODIS markers. Red dots show observed mutation counts. Black dots show expected mutation counts and lines give 95% confidence intervals based on mutation rates reported by NIST (Methods). Each x-axis category denotes a separate CODIS marker. The total number of children analysed is annotated above each marker. d, Observed TR mutation counts concordant with MUTEA. Boxes show the distribution of log10 mutation rates estimated by MUTEA12 (y-axis) at each TR with a given number of mutations observed in SSC children (x-axis). Black middle lines give medians and boxes span from the 25th percentile (Q1) to the 75th percentile (Q3). Whiskers extend to Q1-1.5*IQR (minima) and Q3+1.5*IQR (maxima), where IQR gives the interquartile range (Q3-Q1). Data are shown for n = 548,724 TRs for which MUTEA estimates were available. e, Determinants of TR mutation rates. The Poisson regression coefficient is shown for each feature in models trained separately for each repeat unit length (Methods). Features marked with an asterisk denote significant effects (two-sided P < 0.01 after Bonferroni correction for the number of features tested across all models). Nominal P-values are annotated above each plot. Error bars give 95% confidence intervals.
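The box-plot convention described for panel d can be made concrete with a short sketch. This is an illustration only, not code from the study: the helper name `box_stats` and the toy values are hypothetical, and the quartile interpolation rule (`method="inclusive"`) is an assumption, because the legend does not state which rule was used.

```python
from statistics import median, quantiles


def box_stats(values):
    """Return the median, quartiles and 1.5*IQR whisker limits for a sample."""
    # Quartile rule is an assumption; other interpolation rules give slightly
    # different Q1 and Q3 on small samples.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "whisker_low": q1 - 1.5 * iqr,   # Q1 - 1.5*IQR
        "whisker_high": q3 + 1.5 * iqr,  # Q3 + 1.5*IQR
    }


# Toy log10 mutation rates (made-up numbers, not data from the study)
toy_rates = [-6.0, -5.5, -5.0, -4.5, -4.0, -3.5, -3.0, -2.5, -2.0]
stats = box_stats(toy_rates)
```

Applied separately to the TRs in each mutation-count category, a function like this would give the five numbers drawn for each box.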

Extended Data Fig. 3 Biases in TR mutation sizes.

a, Mutation size distributions by repeat unit length. Histograms show the distribution (y-axis, fraction of total) of de novo TR mutation sizes for each repeat unit length (x-axis, number of repeat units). Mutations <0 denote contractions and >0 denote expansions. Colours denote different repeat unit lengths (grey = homopolymers; red = dinucleotides; gold = trinucleotides; blue = tetranucleotides; green = pentanucleotides; purple = hexanucleotides). b, c, Mutation size distributions by parental origin. Histograms show the distribution of de novo TR mutation sizes for mutations arising in the paternal (b) and maternal (c) germlines (homopolymers excluded). d, e, Mutation directionality bias in homozygous vs. heterozygous parents. In each plot, the x-axis gives the size of the parent allele relative to the reference genome (hg38). The y-axis gives the mean mutation size in terms of number of repeat units across all mutations with a given parent allele length. A separate coloured line is shown for each repeat unit length (red = dinucleotides; gold = trinucleotides; blue = tetranucleotides; green = pentanucleotides). Plots are restricted to mutations that were successfully phased to either the mother or the father for which the parent of origin was homozygous (d) or heterozygous (e). To restrict to highest confidence mutations, these plots are based only on mutations with step size of ±1 and for which the child had more than 10 enclosing reads supporting the de novo allele.

Extended Data Fig. 4 Power to detect per-locus TR mutation enrichments.

a, Number of recurrent mutations required to reach genome-wide significance. We performed a Fisher’s exact test to test for an excess of mutations in probands (n = 1,593) vs. non-ASD siblings (n = 1,593), for a different number of hypothetical mutation counts in probands (x-axis) and assuming 0 mutations observed in non-ASD siblings. The black line shows the two-sided P-value (log10 scale) obtained for each test. The grey dashed line denotes the P-value required to meet a genome-wide significance of P < 0.05 with Bonferroni multiple testing correction. b, Sample sizes required to identify genome-wide significant TRs. The x-axis shows sample size (log10 scale) in terms of the number of quad families analysed. Each line represents a different rate of mutation at a particular TR in probands, assuming 0 mutations at that TR in siblings (blue = 0.001%; orange = 0.01%; green = 0.05%; red = 0.1%; purple = 0.3%). The y-axis shows the power to detect a specific TR at genome-wide significance for each rate. c, Quantile–quantile plots for per-locus TR mutation burden testing. For each TR we performed a Fisher’s exact test to test for an excess of mutations in probands vs. siblings. The x-axis gives expected −log10 P-values under a null (uniform) distribution. The y-axis gives observed −log10 P-values from burden tests. Each dot represents a single TR. Black = all TRs. Gray = homopolymers excluded.
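The calculation behind panel a can be sketched with the Python standard library. This is a minimal illustration rather than the authors' code: the helper names `fisher_two_sided` and `min_recurrent_mutations` are hypothetical, and the number of TRs tested (`n_tests`) is a placeholder assumption, because the legend does not give the count used for the Bonferroni correction.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables no more likely than the observed one
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


def min_recurrent_mutations(n_probands, n_siblings, n_tests, alpha=0.05):
    """Smallest proband mutation count (with 0 in siblings) passing Bonferroni."""
    threshold = alpha / n_tests
    for k in range(1, n_probands + 1):
        # Table rows: probands, siblings; columns: mutated, not mutated
        if fisher_two_sided(k, n_probands - k, 0, n_siblings) < threshold:
            return k
    return None


# n_tests = 1,000,000 is a hypothetical number of TRs tested, not from the paper
k_needed = min_recurrent_mutations(1593, 1593, n_tests=1_000_000)
```

With equal group sizes and no sibling mutations, the two-sided P-value is roughly halved by each additional proband mutation, which is why the required count grows only logarithmically with the number of loci tested.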

Extended Data Fig. 5 TR mutation burden near SNPs associated with ASD and related traits.

a, b, Bars show mean TR mutation counts in probands (red) vs. non-ASD siblings (blue) for TRs within 50 kb of published GWAS-associated SNPs (ASD = autism spectrum disorder; SCZ = schizophrenia; EA = educational attainment) considering (a) all TR mutations (ASD n = 4,213; SCZ n = 22,811; EA n = 25,668 TR mutations) or (b) TR mutations for which the mutant allele frequency is >5% in controls (SSC parents) (ASD n = 2,774; SCZ n = 14,661; EA n = 16,364 TR mutations). Error bars give 95% confidence intervals around the mean. Single asterisks denote nominally significant increases (Mann–Whitney one-sided P < 0.05). Double asterisks denote trends that are significant after Bonferroni correction for the six categories tested. Circles and squares show counts for females and males, respectively.

Extended Data Fig. 6 Proband de novo TR mutations enriched in brain-expressed genes.

a, Ratio of median expression in proband-only genes to control-only genes across time points. The heatmap shows the ratio of the median expression of genes with only proband mutations (n = 268 genes) to that of genes with only mutations in non-ASD siblings (n = 242 genes). Each row shows a different brain structure from the BrainSpan dataset. Each column shows a different developmental time point. The black vertical line separates pre-natal from post-natal time points. Gray boxes indicate no data was available for that time point. Brain structure acronyms are defined in the legend of Fig. 3c. b, Proband TR mutations enriched for brain expression STRs. The quantile–quantile plot shows the distribution of expression STR (eSTR) unadjusted P-values based on associating TR length with gene expression in Brain-Caudate samples in the GTEx cohort46. eSTR association P-values are two-sided and are based on t-statistics computed using linear regression analyses performed previously. Each point represents a TR by gene association test using a linear regression model42. The x-axis gives expected −log10 P-values and the y-axis gives observed −log10 P-values. Red points show TRs with at least one de novo mutation in probands and 0 in controls. Blue points show TRs with at least one de novo mutation in controls and 0 in probands. We found no significant difference in either Brain-Cerebellum or the other 15 non-brain tissues analysed in that study, which we expected should not be relevant to ASD (not shown).

Extended Data Fig. 7 All coding and 5′ UTR mutations to novel alleles.

a, Mutations in probands at coding or 5′ UTR TRs to unobserved alleles. Each panel shows a de novo TR mutation observed in ASD probands to an allele (x-axis, repeat copy number) not observed in SSC parents. Black histograms give the allele counts in parents. Red arrows denote the allele resulting from each specified de novo TR mutation. Pedigrees show genotypes of parents and the child with the mutation (probands = black diamonds; non-ASD siblings = white diamonds). The text below pedigrees gives the gene and region in which the mutation occurred. b, Mutations in non-ASD siblings at coding or 5′ UTR TRs to unobserved alleles. Plots are the same as in a, but show mutations in non-ASD siblings.

Extended Data Fig. 8 TR mutation burden in ASD excluding homopolymers.

a, Mutation burden by gene annotation. b, Mutation burden by frequency of the allele arising by de novo mutation. The x-axis stratifies mutations based on non-overlapping bins of the frequency of the de novo allele in healthy controls (SSC parents). “All” includes all mutations. For other allele frequency bins, only TRs for which precise copy numbers could be inferred in at least 80% of SSC parents are included (Methods). AF = allele frequency. In both plots, the y-axis gives RR in probands vs. non-ASD siblings. Dots show estimated relative risk and lines give 95% confidence intervals. Gray = all samples; green = males only; purple = females only. Both plots show only TRs with repeat unit length >1bp.

Extended Data Fig. 9 A method to estimate selection coefficients for short TRs (STRs).

a, STR mutation model. Mutation is modelled by a stochastic mutation matrix with length-dependent mutation rates and mutation sizes following a geometric distribution with a directional bias towards the central allele. Unless otherwise indicated, alleles are specified in terms of the number of repeat units away from the central, or modal, allele at each STR. b, STR selection model. Negative selection is modelled by a diploid selection surface constructed as a function of the fitness of the individual alleles. The fitness of each allele is calculated as a function of a selection coefficient s, where the central allele has optimal fitness (w = 1), and the fitness of other alleles is a function of the number of repeat units away from the optimal allele. c, Example output of forward simulations of allele frequencies. The simulation starts with one ancestral (“optimal”) allele. As s increases, variability in the resulting allele frequency distributions decreases as the less fit alleles are removed by natural selection. d, Overview of per-STR selection inference using Approximate Bayesian Computation. For each STR, the method takes a prior on s, mutation model, and demographic parameters, and the observed allele frequency distribution as input. It outputs a posterior distribution of s and a P-value from a likelihood ratio test of whether a model with selection fits better than a model without selection (s = 0).
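The ingredients in panels a and b can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from SISTR: the log-linear length dependence of the mutation rate, the linear form of the directional bias (`beta`), the linear decay of allele fitness with `s`, the additive diploid fitness, and all function names and parameter values are hypothetical.

```python
def allele_fitness(a, s):
    """Fitness of an allele `a` repeat units from the optimum (assumed linear decay)."""
    return max(0.0, 1.0 - s * abs(a))


def genotype_fitness(a1, a2, s):
    """Diploid fitness as the mean of the two allele fitnesses (assumed additive form)."""
    return 0.5 * (allele_fitness(a1, s) + allele_fitness(a2, s))


def mutation_matrix(max_offset, mu0, length_slope, p_geom, beta):
    """Row-stochastic matrix of per-generation allele transition probabilities.

    Alleles are offsets (in repeat units) from the central allele.
    """
    alleles = list(range(-max_offset, max_offset + 1))
    n = len(alleles)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(alleles):
        # Length-dependent mutation rate (assumed log-linear in allele size), capped
        mu = min(0.5, mu0 * 10 ** (length_slope * a))
        # Probability that a mutation is an expansion: below 0.5 for long alleles,
        # above 0.5 for short ones, which pulls alleles back toward the centre
        p_up = min(1.0, max(0.0, 0.5 - beta * a))
        for j, b in enumerate(alleles):
            step = b - a
            if step == 0:
                continue
            direction = p_up if step > 0 else 1.0 - p_up
            # Geometric distribution over the absolute step size
            matrix[i][j] = mu * direction * p_geom * (1 - p_geom) ** (abs(step) - 1)
        # Remaining mass (no mutation, or steps outside the allele range) stays put
        matrix[i][i] = 1.0 - sum(matrix[i])
    return alleles, matrix


# Hypothetical parameter values chosen only to exercise the functions
alleles, M = mutation_matrix(max_offset=5, mu0=1e-3, length_slope=0.1,
                             p_geom=0.9, beta=0.05)
```

A forward simulation like the one in panel c would repeatedly apply such a matrix to the allele frequency vector, reweight genotypes by fitness, and resample under a demographic model; those steps are omitted here.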

Extended Data Fig. 10 Evaluation of SISTR.

a, Comparison of true vs. inferred per-locus selection coefficients. The x-axis shows the true simulated value of s, and the y-axis shows the mean s value inferred by SISTR across 200 simulation replicates. b, Power to detect negative selection as a function of s. The x-axis shows the true simulated value of s, and the y-axis gives the power to reject the null hypothesis that s = 0. Left, middle, and right panels show results using models for dinucleotide, trinucleotide, and tetranucleotide TRs, respectively. Colours denote different optimal allele lengths. c, Inferred genome-wide distribution of s is robust to prior choice and demographic models. We applied SISTR genome-wide using 2 different demographic models (Supplementary Methods) and 3 different prior distributions (left panels) on s. Right panels show the inferred genome-wide distribution of s using different combinations of priors and demographic models. Only loci inferred to be under selection (adjusted SISTR P < 1%) are included in the histograms. Red, yellow, and blue denote dinucleotides (n = 29,874), trinucleotides (n = 39,250), and tetranucleotides (n = 13,099), respectively. d, Genes containing coding STRs under strong selection are more missense-constrained. The x-axis gives the missense constraint Z-score reported by Gnomad47. The y-axis gives the frequency of genes with each missense Z-score. e, Genes containing coding STRs under strong selection are more loss-of-function intolerant. The x-axis gives the pLI score measuring loss of function intolerance of each gene reported by Gnomad. For d, e, black bars show the distribution for all genes containing an STR not inferred to be under selection (n = 177; adjusted SISTR P ≥ 1%) and red bars show the distribution for all genes containing an STR inferred to be under selection (n = 21; adjusted SISTR P < 1%). Vertical lines show medians of each distribution. For c–e, SISTR P-values are one-sided and based on the likelihood ratio test described in the Supplementary Methods.

Supplementary information

Supplementary Information

Supplementary Methods; Supplementary Note; Supplementary Discussion. The supplementary information provides detailed descriptions of the MonSTR and SISTR methods, analysis of the contribution of genomic sequence features to TR mutation rates, further analyses comparing de novo TR mutations to published GWAS loci, and additional discussion points.

Reporting Summary

Supplementary Data

Supplementary Dataset 1: Per-locus selection coefficients inferred using SISTR.

Supplementary Tables

This file contains Supplementary Tables 1-8.

Peer Review File


Cite this article

Mitra, I., Huang, B., Mousavi, N. et al. Patterns of de novo tandem repeat mutations and their role in autism. Nature 589, 246–250 (2021).
