Approximately half of the human genome consists of repetitive DNA sequences, collectively known as the repeatome. The repeatome includes more than one million tandem repeats (sections of DNA in which a sequence is replicated many times in tandem), whose biology remains largely unexplored. More than 50 diseases are known to be caused by expansion of a tandem-repeat sequence in a single gene; among them are Huntington’s disease and fragile X syndrome1. But less well understood is the role of tandem repeats in polygenic diseases, which have more-complex genetic underpinnings. Writing in Nature, Mitra et al.2 use a newly developed bioinformatics approach to identify tandem repeats associated with one such condition, autism spectrum disorder.
Autism spectrum disorder (ASD) is highly prevalent, affecting approximately 1–2% of children in the United States (see go.nature.com/38sqvhd), although this varies internationally. It is characterized by atypical neurodevelopment, communication deficits, atypical social functioning, restricted interests and repetitive behaviours. Although progress is being made3 in discovering the genetic basis of ASD, it remains poorly understood. Changes in the number of copies of large segments of DNA, along with other genetic variants, have been previously implicated3, but the capacity to systematically investigate tandem repeats genome-wide has been optimized only in the past few years, thanks to advances in DNA sequencing and bioinformatics1.
Mitra et al. analysed tandem repeats in people who have ASD, and in their immediate families, using a newly developed bioinformatics tool that they named MonSTR. The tool processes DNA sequences from tandem-repeat regions across the genome to determine the likelihood that a de novo mutation (one that occurs only in the person who has ASD, and not in their parents) led to a change in a tandem repeat.
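To make the idea of a de novo tandem-repeat mutation concrete, the following is a minimal sketch of a Mendelian-consistency check in a parent-child trio. It is not MonSTR's method: the tool described above estimates a likelihood from sequence data, whereas this toy version assumes that repeat copy numbers are already known without error. The function name `naive_de_novo_allele` and the genotype encoding (a pair of copy numbers per person) are illustrative assumptions.

```python
from typing import Optional, Tuple

# Hypothetical encoding: repeat copy numbers on a person's two alleles.
Genotype = Tuple[int, int]


def naive_de_novo_allele(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[int]:
    """Return a child allele that neither parent carries, or None.

    Assumes error-free genotypes. Real callers must model sequencing
    and genotyping uncertainty, which this sketch ignores.
    """
    a, b = child
    # Mendelian-consistent: one allele from the mother, the other from the father.
    if (a in mother and b in father) or (b in mother and a in father):
        return None
    parental = set(mother) | set(father)
    novel = [allele for allele in child if allele not in parental]
    # Report only the simple case of exactly one allele absent from both
    # parents; ambiguous patterns are left uncalled.
    return novel[0] if len(novel) == 1 else None


# Child carries 14 copies, which neither parent has.
print(naive_de_novo_allele((10, 14), (10, 12), (11, 12)))  # 14
# Child's alleles can be explained by inheritance.
print(naive_de_novo_allele((10, 12), (10, 11), (12, 12)))  # None
```

The check treats any allele absent from both parents as new, which is why genotyping errors would inflate the count in real data and why a probabilistic model is needed in practice.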
The analysis revealed that tandem-repeat mutations are significantly more common in people who have ASD than in their unaffected siblings, with mutations more likely to cause the repeat to expand than to contract (Fig. 1). Many of the expansions occurred in DNA regions that drive the expression of genes involved in fetal brain development. To determine the likelihood that a mutation would be deleterious, Mitra et al. developed another bioinformatics tool (named SISTR) based on an evolutionary model of tandem-repeat variation. This analysis revealed 25 mutations, present in individuals with ASD, that were likely to be the most harmful; these mutations are rare in the general population, presumably because they are strongly selected against. Furthermore, some of these tandem-repeat mutations occurred in genes that had previously been associated with ASD, increasing the likelihood that they are directly involved in the disorder.
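The two comparisons described above (mutation burden in affected children versus unaffected siblings, and expansions versus contractions) can be illustrated with a short sketch. The record layout, the function names and the numbers are all hypothetical toy values chosen for illustration; they are not data from the study, and no statistical test is applied.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional


class Mutation(NamedTuple):
    child_status: str   # "proband" (has ASD) or "sibling" (unaffected)
    parent_copies: int  # repeat copy number of the parental allele
    child_copies: int   # repeat copy number of the mutated allele in the child


def direction(m: Mutation) -> str:
    """Classify a mutation by whether the repeat grew or shrank."""
    if m.child_copies == m.parent_copies:
        raise ValueError("copy number unchanged: not a mutation")
    return "expansion" if m.child_copies > m.parent_copies else "contraction"


def summarize(
    mutations: Iterable[Mutation], children_per_group: Dict[str, int]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Per group: mean mutations per child and fraction that are expansions."""
    mutations = list(mutations)
    summary = {}
    for status, n_children in children_per_group.items():
        dirs = Counter(direction(m) for m in mutations if m.child_status == status)
        total = sum(dirs.values())
        summary[status] = {
            "mutations_per_child": total / n_children,
            "expansion_fraction": dirs["expansion"] / total if total else None,
        }
    return summary


# Toy records, invented for illustration only.
toy = [
    Mutation("proband", 10, 12),
    Mutation("proband", 20, 23),
    Mutation("proband", 15, 14),
    Mutation("sibling", 10, 11),
    Mutation("sibling", 8, 7),
]
print(summarize(toy, {"proband": 2, "sibling": 2}))
```

Normalizing by the number of children in each group matters because the groups need not be the same size; a real analysis would also test whether the difference between groups is statistically significant.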
Mitra and colleagues’ work echoes a recent study by Trost and colleagues4, which examined tandem repeats in ASD using a different, complementary approach. Trost et al. compared tandem-repeat lengths in thousands of genomes from people who had ASD and from those who did not (some family members, and some unrelated), using another recently developed bioinformatics tool. This approach allowed the authors to study more cases and controls than Mitra et al. did (approximately 17,000 genomes in total, compared with approximately 6,500). The tool used by Trost and colleagues, named ExpansionHunter Denovo5, is optimized for detecting longer repeats (at least 150 base pairs of DNA) and their expansions, whereas MonSTR also allows analysis of relatively short tandem repeats, and of both expansions and contractions.
As Mitra and colleagues have done, Trost et al. identified many tandem-repeat mutations associated with autism. But because the two studies used quite different approaches, the identified mutations were largely complementary, rather than overlapping. Another difference is that Trost and colleagues cannot say exactly how many of the mutations are inherited, rather than de novo, because many of the people in their control groups were unrelated to the people with ASD. And Trost et al. found specific associations between tandem-repeat expansions and particular clinical features associated with cognitive function, notably lower IQ and adaptive ability, which Mitra and colleagues did not investigate. Analysis of these clinical subtypes will be valuable in future studies of tandem repeats in ASD.
Together, the large number of non-overlapping tandem-repeat mutations identified in these two papers provides compelling evidence that tandem repeats contribute significantly to the genetic burden associated with ASD. These studies will inform our understanding of the mechanisms that underlie the condition, as well as future approaches to diagnosis and treatment. However, ASD is a complex and varied disorder. Although both studies involved thousands of individuals, larger international replication studies are needed to establish how robust these genetic associations are, and how commonly each mutation is associated with ASD in general, as well as with each clinical subtype.
In addition, both studies are correlative. There is not yet any direct evidence that the identified mutations have a role in disease, except in the case of a small subset identified by Trost and colleagues. This consists of mutations that occur in genes whose repeat expansions are known to cause other tandem-repeat disorders1, including fragile X syndrome, myotonic dystrophy and Friedreich ataxia. Therefore, an urgent priority now is to test the role of all of the identified tandem repeats in animal models, and in human stem-cell cultures derived from people who have ASD and their families. These further studies will enable key questions to be addressed. Does each tandem repeat regulate aspects of brain development and function (as well as peripheral systems implicated in ASD)? If so, what mechanisms are involved? And are specific tandem repeats associated with subtypes of ASD or particular co-morbidities, given that the condition often occurs with one or more other disorders, including epilepsy, intellectual disability and attention deficit hyperactivity disorder?
Many other avenues for future research are also apparent. For example, do some tandem-repeat lengths confer an evolutionary advantage? If so, perhaps ASD associated with tandem-repeat mutations results from an evolved mechanism that makes tandem-repeat length more variable (a form of genetic plasticity) than that of non-repetitive sequences1. For a given gene, although certain tandem-repeat lengths might have selective advantages, it is possible that extremely long or short repeats (and perhaps, in particular environmental contexts, repeats of a particular length) could be associated with human disorders.
Another question to be addressed is whether each tandem-repeat mutation is differentially detrimental on different genomic and environmental backgrounds, causing or promoting ASD in some settings but not in others. We know that many tandem repeats are highly responsive to environmental signals that lead to altered epigenetic modifications (molecular marks on DNA that can alter gene expression without changing the underlying DNA sequence). Indeed, some tandem repeats might themselves act as regulators of these modifications1.
More broadly, the repeatome — and tandem repeats, in particular — should now be systematically studied across a range of common human disorders, including cancer, diabetes and brain disorders such as schizophrenia and depression. Genome-wide association studies (GWAS), which link genetic variants to traits and disorders of interest, have improved our understanding of these polygenic disorders, but substantial gaps remain6. It has been proposed that such ‘missing heritability’ might be explained, at least in part, by tandem repeats7. But many of these are highly variable, and so are unlikely to be discovered by GWAS approaches, which map genes on the basis of variation at single nucleotides7. The approaches outlined in the current papers2,4 and elsewhere1,6,8,9 provide a road map for identifying some of this missing heritability across a broad spectrum of human traits and disorders.
In the long term, taking a comprehensive approach to analysing and understanding tandem repeats (and the repeatome more generally) could enable mutations to be corrected in the clinic, using therapeutic tools such as CRISPR–Cas gene-editing techniques and drugs10. It is possible that such approaches, as well as other future therapies developed thanks to our expanded understanding of repeatome biology, will form a foundation of precision medicine in the twenty-first century.
Nature 589, 200-202 (2021)