Genome-wide detection of tandem DNA repeats that are expanded in autism

Trost, Brett; Engchuan, Worrawat; Nguyen, Charlotte M.; Thiruvahindrapuram, Bhooma; Dolzhenko, Egor; Backstrom, Ian; Mirceta, Mila; Mojarad, Bahareh A.; Yin, Yue; Dov, Alona; Chandrakumar, Induja; Prasolava, Tanya; Shum, Natalie; Hamdan, Omar; Pellecchia, Giovanna; Howe, Jennifer L.; Whitney, Joseph; Klee, Eric W.; Baheti, Saurabh; Amaral, David G.; Anagnostou, Evdokia; Elsabbagh, Mayada; Fernandez, Bridget A.; Hoang, Ny; Lewis, M. E. Suzanne; Liu, Xudong; Sjaarda, Calvin; Smith, Isabel M.; Szatmari, Peter; Zwaigenbaum, Lonnie; Glazer, David; Hartley, Dean; Stewart, A. Keith; Eberle, Michael A.; Sato, Nozomu; Pearson, Christopher E.; Scherer, Stephen W.; Yuen, Ryan K. C.

doi:10.1038/s41586-020-2579-z

Article
Published: 27 July 2020

Brett Trost ORCID: orcid.org/0000-0003-4863-7273^1,2^na1,
Worrawat Engchuan^1,2^na1,
Charlotte M. Nguyen^1,2,3^na1,
Bhooma Thiruvahindrapuram^1,2^na1,
Egor Dolzhenko⁴,
Ian Backstrom¹,
Mila Mirceta^1,3,
Bahareh A. Mojarad¹,
Yue Yin¹,
Alona Dov^1,3,
Induja Chandrakumar¹,
Tanya Prasolava¹,
Natalie Shum^1,3,
Omar Hamdan^1,2,
Giovanna Pellecchia ORCID: orcid.org/0000-0003-4747-3473^1,2,
Jennifer L. Howe^1,2,
Joseph Whitney^1,2,
Eric W. Klee^5,6,
Saurabh Baheti⁵,
David G. Amaral⁷,
Evdokia Anagnostou⁸,
Mayada Elsabbagh ORCID: orcid.org/0000-0002-7311-9059⁹,
Bridget A. Fernandez¹⁰,
Ny Hoang^1,3,
M. E. Suzanne Lewis^11,12,
Xudong Liu¹³,
Calvin Sjaarda ORCID: orcid.org/0000-0002-9787-1915¹³,
Isabel M. Smith^14,15,
Peter Szatmari^16,17,18,
Lonnie Zwaigenbaum¹⁹,
David Glazer ORCID: orcid.org/0000-0002-6407-8646²⁰,
Dean Hartley²¹,
A. Keith Stewart^6,22,
Michael A. Eberle ORCID: orcid.org/0000-0001-8965-1253⁴,
Nozomu Sato ORCID: orcid.org/0000-0002-8906-2798¹,
Christopher E. Pearson^1,3,
Stephen W. Scherer ORCID: orcid.org/0000-0002-8326-1999^1,2,3,23 &
Ryan K. C. Yuen ORCID: orcid.org/0000-0001-7273-4968^1,2,3

Nature volume 586, pages 80–86 (2020)

Abstract

Tandem DNA repeats vary in the size and sequence of each unit (motif). When expanded, these tandem DNA repeats have been associated with more than 40 monogenic disorders¹. Their involvement in disorders with complex genetics is largely unknown, as is the extent of their heterogeneity. Here we investigated the genome-wide characteristics of tandem repeats that had motifs with a length of 2–20 base pairs in 17,231 genomes of families containing individuals with autism spectrum disorder (ASD)^2,3 and population control individuals⁴. We found extensive polymorphism in the size and sequence of motifs. Many of the tandem repeat loci that we detected correlated with cytogenetic fragile sites. At 2,588 loci, gene-associated expansions of tandem repeats that were rare among population control individuals were significantly more prevalent among individuals with ASD than their siblings without ASD, particularly in exons and near splice junctions, and in genes related to the development of the nervous system and cardiovascular system or muscle. Rare tandem repeat expansions had a prevalence of 23.3% in children with ASD compared with 20.7% in children without ASD, which suggests that tandem repeat expansions make a collective contribution to the risk of ASD of 2.6%. These rare tandem repeat expansions included previously undescribed ASD-linked expansions in DMPK and FXN, which are associated with neuromuscular conditions, and in previously unknown loci such as FGF14 and CACNB1. Rare tandem repeat expansions were associated with lower IQ and adaptive ability. Our results show that tandem DNA repeat expansions contribute strongly to the genetic aetiology and phenotypic complexity of ASD.
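
The collective contribution quoted above is the difference between the two reported prevalences. A minimal worked check, using only the two percentages given in the abstract (the variable names are illustrative and do not come from the paper):

```python
# Prevalence of rare tandem repeat expansions, as reported in the abstract.
prevalence_asd = 23.3      # % of children with ASD
prevalence_non_asd = 20.7  # % of children without ASD

# The collective contribution is taken here as the simple difference in prevalence.
excess = round(prevalence_asd - prevalence_non_asd, 1)
print(f"Excess prevalence: {excess} percentage points")
```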

**Fig. 1: Genome analysis of tandem repeats.**

**Fig. 2: Functional analysis of rare tandem repeat expansions (frequency of <0.1% in the 1000 Genomes Project).**

**Fig. 3: Clinical analysis of rare tandem repeat expansions in individuals with ASD.**

Data availability

Access to the MSSNG and SSC genome-sequencing data can be obtained by completing data access agreements (https://research.mss.ng and https://www.sfari.org/resource/sfari-base, respectively). The 1000G genome-sequencing data are publicly available via Amazon Web Services (s3://1000genomes/1000G_2504_high_coverage/data). Source data are provided with this paper.

Code availability

Code used in this manuscript is available at GitHub (https://github.com/bjtrost/tandem-repeat-expansions-in-ASD).

References

López Castel, A., Cleary, J. D. & Pearson, C. E. Repeat instability as the basis for human diseases and as a potential target for therapy. Nat. Rev. Mol. Cell Biol. 11, 165–170 (2010).
Yuen, R. K. C. et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017).
Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Bamshad, M. J., Nickerson, D. A. & Chong, J. X. Mendelian gene discovery: fast and furious with no end in sight. Am. J. Hum. Genet. 105, 448–455 (2019).
Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–753 (2009).
Vorstman, J. A. S. et al. Autism genetics: opportunities and challenges for clinical translation. Nat. Rev. Genet. 18, 362–376 (2017).
Ozonoff, S. et al. Recurrence risk for autism spectrum disorders: a Baby Siblings Research Consortium study. Pediatrics 128, e488–e495 (2011).
Risch, N. et al. Familial recurrence of autism spectrum disorder: evaluating genetic and environmental contributions. Am. J. Psychiatry 171, 1206–1213 (2014).
Fernandez, B. A. & Scherer, S. W. Syndromic autism spectrum disorders: moving from a clinically defined to a molecularly defined approach. Dialogues Clin. Neurosci. 19, 353–371 (2017).
De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Sanders, S. J. et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron 87, 1215–1233 (2015).
Yuen, R. K. C. et al. Genome-wide characteristics of de novo mutations in autism. NPJ Genom. Med. 1, 16027 (2016).
Marshall, C. R. et al. Structural variation of chromosomes in autism spectrum disorder. Am. J. Hum. Genet. 82, 477–488 (2008).
Brandler, W. M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science 360, 327–331 (2018).
Bourgeron, T. From the genetic architecture to synaptic plasticity in autism spectrum disorder. Nat. Rev. Neurosci. 16, 551–563 (2015).
Tammimies, K. et al. Molecular diagnostic yield of chromosomal microarray analysis and whole-exome sequencing in children with autism spectrum disorder. J. Am. Med. Assoc. 314, 895–903 (2015).
An, J.-Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).
Jiang, Y. H. et al. Detection of clinically relevant genetic variants in autism spectrum disorder by whole-genome sequencing. Am. J. Hum. Genet. 93, 249–263 (2013).
Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
Gaugler, T. et al. Most genetic risk for autism resides with common variation. Nat. Genet. 46, 881–885 (2014).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Hannan, A. J. Tandem repeat polymorphisms: modulators of disease susceptibility and candidates for ‘missing heritability’. Trends Genet. 26, 59–65 (2010).
Bahlo, M. et al. Recent advances in the detection of repeat expansions with short-read next-generation sequencing. F1000Res. 7, 736 (2018).
Cortese, A. et al. Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia. Nat. Genet. 51, 649–658 (2019).
Sato, N. et al. Spinocerebellar ataxia type 31 is associated with “inserted” penta-nucleotide repeats containing (TGGAA)_n. Am. J. Hum. Genet. 85, 544–557 (2009).
Rafehi, H. et al. Bioinformatics-based identification of expanded repeats: a non-reference intronic pentamer expansion in RFC1 causes CANVAS. Am. J. Hum. Genet. 105, 151–165 (2019).
Hagerman, R. J. et al. Fragile X-associated neuropsychiatric disorders (FXAND). Front. Psychiatry 9, 564 (2018).
Dolzhenko, E. et al. ExpansionHunter Denovo: a computational method for locating known and novel repeat expansions in short-read sequencing data. Genome Biol. 21, 102 (2020).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Olson, J. E. et al. Characteristics and utilisation of the Mayo Clinic Biobank, a clinic-based prospective collection in the USA: cohort profile. BMJ Open 9, e032707 (2019).
Subramanian, S., Mishra, R. K. & Singh, L. Genome-wide analysis of microsatellite repeats in humans: their abundance and density in specific genomic regions. Genome Biol. 4, R13 (2003).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Willems, T., Gymrek, M., Highnam, G., Mittelman, D. & Erlich, Y. The landscape of human STR variation. Genome Res. 24, 1894–1904 (2014).
Bignell, G. R. et al. Signatures of mutation and selection in the cancer genome. Nature 463, 893–898 (2010).
Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19, 286–298 (2018).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Yuen, R. K. C. et al. Whole-genome sequencing of quartet families with autism spectrum disorder. Nat. Med. 21, 185–191 (2015).
Banerjee-Basu, S. & Packer, A. SFARI Gene: an evolving database for the autism research community. Dis. Model. Mech. 3, 133–135 (2010).
Trost, B. et al. A comprehensive workflow for read depth-based identification of copy-number variation from whole-genome sequence data. Am. J. Hum. Genet. 102, 142–155 (2018).
Takiyama, Y. et al. Single sperm analysis of the CAG repeats in the gene for Machado–Joseph disease (MJD1): evidence for non-Mendelian transmission of the MJD1 gene and for the effect of the intragenic CGG/GGG polymorphism on the intergenerational instability. Hum. Mol. Genet. 6, 1063–1068 (1997).
Dean, N. L. et al. Transmission ratio distortion in the myotonic dystrophy locus in human preimplantation embryos. Eur. J. Hum. Genet. 14, 299–306 (2006).
Shoubridge, C. et al. Is there a Mendelian transmission ratio distortion of the c.429_452dup(24bp) polyalanine tract ARX mutation? Eur. J. Hum. Genet. 20, 1311–1314 (2012).
Ekström, A.-B., Hakenäs-Plate, L., Samuelsson, L., Tulinius, M. & Wentz, E. Autism spectrum conditions in myotonic dystrophy type 1: a study on 57 individuals with congenital and childhood forms. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 147B, 918–926 (2008).
Lagrue, E. et al. A large multicenter study of pediatric myotonic dystrophy type 1 for evidence-based management. Neurology 92, e852–e865 (2019).
Dolzhenko, E. et al. Detection of long repeat expansions from PCR-free whole-genome sequence data. Genome Res. 27, 1895–1903 (2017).
Dolzhenko, E. et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).
Tsai, L. Y. & Beisler, J. M. The development of sex differences in infantile autism. Br. J. Psychiatry 142, 373–378 (1983).
Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584 (2020).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Ester, M., Kriegel, H., Sander, J. & Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd International Conference on Knowledge Discovery and Data Mining (AAAI, 1996).
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput. Biol. 11, e1004219 (2015).
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Lee, J. J. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat. Genet. 50, 1112–1121 (2018).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Zhu, M. et al. Using ERDS to infer copy-number variants in high-coverage genomes. Am. J. Hum. Genet. 91, 408–421 (2012).
Abyzov, A., Urban, A. E., Snyder, M. & Gerstein, M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21, 974–984 (2011).

Acknowledgements

We thank The Centre for Applied Genomics, especially J. Buchanan, B. Kellam, S. Lamoureux, T. Nalpathamkalam, R. Patel, W. Sung and Z. Wang. We also thank G. K. W. Wong, S. Walker and A. Paterson; the participating families in MSSNG (www.mss.ng), and Autism Speaks staff, M. Quirbach and V. Seifer, who manage the MSSNG and AGRE programs; the families at the participating SSC sites, and the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). Project funding is from Autism Speaks, the Canadian Institute for Advanced Research (CIFAR), the KRG Children’s Charitable Foundation, The Petroff Family Fund, The Kazman Family Fund, The Marigold Foundation, Genome Canada, the Canada Foundation for Innovation, the Government of Ontario, Canadian Institutes for Health Research (CIHR), the Natural Sciences and Engineering Research Council (NSERC), Brain Canada, Kids Brain Health Network, Province of Ontario Neurodevelopmental Disorders (POND) Network and the Ontario Brain Institute (OBI). R.K.C.Y. is supported by The Hospital for Sick Children’s Research Institute, SickKids Catalyst Scholar in Genetics, NARSAD Young Investigator award, Dataset Analysis Grant from Autism Speaks, the University of Toronto McLaughlin Centre and the Nancy E.T. Fahrner Award. B. Trost was funded by the Canadian Institutes for Health Research Banting Postdoctoral Fellowship and the Brain Canada Canadian Open Neuroscience Platform Research Scholar Award. C.M.N. and B.A.M. were supported by the Restracomp Award from The Hospital for Sick Children, and M.M. by the Ontario Graduate Scholarship. M.E.S.L. is funded by an Investigator Grant Award Program (IGAP) salary award from BC Children’s Hospital Research Institute and a Genome British Columbia Strategic Initiative Grant, D.G.A. by a National Institute of Mental Health Grant (1R01MH103371), E.A. by the Dr. Stuart D. Sims Chair in Autism, L.Z. by the Stollery Children’s Hospital Foundation Chair in Autism Research, P.S. by the Patsy and Jamie Anderson Chair in Child and Youth Mental Health, C.E.P. by the Canada Research Chair in Disease-Associated Genome Instability and S.W.S. holds the GlaxoSmithKline Chair in Genome Sciences at the University of Toronto and The Hospital for Sick Children.

Author information

These authors contributed equally: Brett Trost, Worrawat Engchuan, Charlotte M. Nguyen, Bhooma Thiruvahindrapuram

Authors and Affiliations

Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Ontario, Canada
Brett Trost, Worrawat Engchuan, Charlotte M. Nguyen, Bhooma Thiruvahindrapuram, Ian Backstrom, Mila Mirceta, Bahareh A. Mojarad, Yue Yin, Alona Dov, Induja Chandrakumar, Tanya Prasolava, Natalie Shum, Omar Hamdan, Giovanna Pellecchia, Jennifer L. Howe, Joseph Whitney, Ny Hoang, Nozomu Sato, Christopher E. Pearson, Stephen W. Scherer & Ryan K. C. Yuen
The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada
Brett Trost, Worrawat Engchuan, Charlotte M. Nguyen, Bhooma Thiruvahindrapuram, Omar Hamdan, Giovanna Pellecchia, Jennifer L. Howe, Joseph Whitney, Stephen W. Scherer & Ryan K. C. Yuen
Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
Charlotte M. Nguyen, Mila Mirceta, Alona Dov, Natalie Shum, Ny Hoang, Christopher E. Pearson, Stephen W. Scherer & Ryan K. C. Yuen
Illumina, San Diego, CA, USA
Egor Dolzhenko & Michael A. Eberle
Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Eric W. Klee & Saurabh Baheti
Center for Individualized Medicine, Mayo Clinic, Rochester, MN, USA
Eric W. Klee & A. Keith Stewart
MIND Institute and Department of Psychiatry and Behavioral Sciences, University of California Davis School of Medicine, Sacramento, CA, USA
David G. Amaral
Holland Bloorview Kids Rehabilitation Hospital, University of Toronto, Toronto, Ontario, Canada
Evdokia Anagnostou
Montreal Neurological Institute and Azrieli Centre for Autism Research, McGill University, Montreal, Quebec, Canada
Mayada Elsabbagh
Discipline of Genetics, Faculty of Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland and Labrador, Canada
Bridget A. Fernandez
Medical Genetics, University of British Columbia (UBC), Vancouver, British Columbia, Canada
M. E. Suzanne Lewis
BC Children’s Hospital Research Institute, Vancouver, British Columbia, Canada
M. E. Suzanne Lewis
Department of Psychiatry, Queen’s University, Kingston, Ontario, Canada
Xudong Liu & Calvin Sjaarda
Department of Pediatrics, Dalhousie University, Halifax, Nova Scotia, Canada
Isabel M. Smith
IWK Health Centre, Halifax, Nova Scotia, Canada
Isabel M. Smith
Department of Psychiatry, University of Toronto, Toronto, Ontario, Canada
Peter Szatmari
Centre for Addiction and Mental Health, Toronto, Ontario, Canada
Peter Szatmari
Department of Psychiatry, The Hospital for Sick Children, Toronto, Ontario, Canada
Peter Szatmari
Department of Pediatrics, University of Alberta, Edmonton, Alberta, Canada
Lonnie Zwaigenbaum
Verily Life Sciences, South San Francisco, CA, USA
David Glazer
Autism Speaks, New York, NY, USA
Dean Hartley
Division of Hematology, Mayo Clinic, Rochester, MN, USA
A. Keith Stewart
McLaughlin Centre, University of Toronto, Toronto, Ontario, Canada
Stephen W. Scherer

Contributions

R.K.C.Y. conceived and designed the study. B. Trost and C.M.N. developed the tandem repeat detection pipeline, with additional contributions from E.D., M.A.E. and R.K.C.Y. W.E. performed statistical analyses, with additional contributions from B. Trost, B. Thiruvahindrapuram, I.C., G.P. and R.K.C.Y. I.B., M.M., T.P., N. Shum and N. Sato performed laboratory experiments for validation. B. Thiruvahindrapuram, B.A.M., Y.Y., A.D. and G.P. performed miscellaneous data analysis. O.H. and J.W. helped to process cloud-based data and provided general technical assistance. J.L.H. managed the collection of samples and phenotypes. E.W.K., S.B. and A.K.S. assisted with population control datasets. D.G.A., E.A., M.E., B.A.F., N.H., M.E.S.L., X.L., C.S., I.M.S., P.S. and L.Z. chose the phenotypic assessment tools and recruited, diagnosed and examined the participants. D.G., D.H. and S.W.S. supervised, managed and coordinated genomic data from MSSNG. R.K.C.Y. wrote the manuscript, with additional input from B. Trost, W.E., B. Thiruvahindrapuram, N. Sato, C.E.P. and S.W.S. R.K.C.Y. supervised the study, with additional input from C.E.P. and S.W.S. All authors read, reviewed and approved the final manuscript.

Corresponding author

Correspondence to Ryan K. C. Yuen.

Ethics declarations

Competing interests

E.D. and M.A.E. are, or were, employees of Illumina, a public company that develops and markets systems for genetic analysis. D.G.A. is on the Scientific Advisory Boards of Stemina Biomarkers Discovery and Axial Therapeutics. E.A. has served as a consultant to Roche and Quadrant, has received grant funding from Roche, has received royalties from APPI and Springer, has received in-kind support from AMO Pharmaceuticals and has received editorial honoraria from Wiley. D.G. is employed by Verily. A.K.S. has a consulting role for Amgen, Bristol-Myers Squibb, Celgene, Ionis, Janssen, Oncopeptides, Ono, Roche, Seattle Genetics and Takeda, and has received research funding from Amgen, Celgene and Janssen. S.W.S. serves on the Scientific Advisory Committees of Population Bio and Deep Genomics; intellectual property originating from his research and held at The Hospital for Sick Children is licensed to Lineagen and separately to Athena Diagnostics. The strategies for genome-wide analysis and interpretation of tandem DNA repeats from genome sequence have been filed under reference H8313086USP (US provisional application number 62/951671) with the US Patent and Trademark Office.

Additional information

Peer review information Nature thanks Thomas Bourgeron, Anders Børglum and Anthony Hannan for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Study design.

a, Schematic workflow of the tandem repeat detection and analyses. ¹Tandem repeats here are defined as those with 2–20-bp repeat motifs that span at least 150 bp. ²Rare expansions here are defined as tandem repeat expansions that are outliers according to size and occur in <0.1% of population controls from the 1000G. Note that EHdn only approximates the size and location of a given tandem repeat; thus, we use the term ‘region’ to refer to a genomic segment detected in this way, and reserve ‘location’ or ‘locus’ for sites that have been more precisely mapped. b, Genome-sequencing cohorts used for each analysis performed in this study. Numbers above each cohort represent the number of samples that remained after curation (Supplementary Notes).
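
The two definitions in this caption can be read as simple threshold checks. The sketch below is an illustration only: the function names are hypothetical, the size-outlier step is not modelled, and the use of 2,504 control samples is an assumption taken from the 1000G sample count reported in Extended Data Fig. 2.

```python
def is_candidate_tandem_repeat(motif: str, span_bp: int) -> bool:
    """Size criteria from the caption: 2-20-bp motif spanning at least 150 bp."""
    return 2 <= len(motif) <= 20 and span_bp >= 150


def is_rare_in_controls(carriers: int, n_controls: int) -> bool:
    """True when an expansion occurs in fewer than 0.1% of control samples."""
    if n_controls <= 0:
        raise ValueError("n_controls must be positive")
    return carriers / n_controls < 0.001


# Hypothetical inputs; 2,504 is the assumed number of 1000G control samples.
print(is_candidate_tandem_repeat("CTG", 180))
print(is_rare_in_controls(2, 2504))
print(is_rare_in_controls(3, 2504))
```

Under that assumption, an expansion seen in at most two control samples would fall below the 0.1% threshold, whereas one seen in three would not.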

Extended Data Fig. 2 Distribution of the number of tandem repeats detected by EHdn.

The number of tandem repeats detected by EHdn in a given sample is stratified by cohort, sequencing platform and DNA library preparation method (a; n = 2,504, 594, 1,220, 6,634 and 9,096 for 1000G/Illumina NovaSeq/PCR-free, MSSNG/Illumina HiSeq 2000 or 2500/PCR-based, MSSNG/Illumina HiSeq X/PCR-based, MSSNG/Illumina HiSeq X/PCR-free and SSC/Illumina HiSeq X/PCR-free, respectively) and predicted ancestry for samples in the ‘MSSNG/Illumina HiSeq X/PCR-free’ category (b; n = 157, 301, 247, 287, 4,841, 687 and 114 for admixed, AFR, AMR, EAS, EUR, OTH and SAS, respectively). Ancestry designations were derived from the 1000G ‘super populations’ (https://www.internationalgenome.org/category/population): AFR, African; AMR, admixed American; EAS, East Asian; EUR, European; OTH, other; SAS, South Asian. The centre of each box plot indicates the median, the lower and upper hinges correspond to the first and third quartiles, and the minima and maxima are 1.5× the interquartile range below or above the median, respectively.

Extended Data Fig. 3 Quality control for the detection of tandem repeats.

a–c, Histogram (left) and normal QQ plots (right) of the number of tandem repeats detected by EHdn for all samples (a), samples for which the number of tandem repeats was within mean ± 2 s.d. (b) and samples for which the number of tandem repeats was within mean ± 3 s.d. (c). Of the three distributions, data in c are the closest to a normal distribution.
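
A filter of the kind compared in panels b and c can be sketched as follows. This is a minimal illustration, not the study's code: the function name and the toy counts are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def within_k_sd(counts: dict[str, int], k: float) -> dict[str, int]:
    """Keep samples whose tandem repeat count lies within mean ± k sample s.d."""
    values = list(counts.values())
    centre, spread = mean(values), stdev(values)
    lower, upper = centre - k * spread, centre + k * spread
    return {s: c for s, c in counts.items() if lower <= c <= upper}


# Toy per-sample counts (hypothetical values, not from the study).
toy_counts = {f"S{i}": 1000 + i for i in range(20)}
toy_counts["S_outlier"] = 5000
kept = within_k_sd(toy_counts, k=3)
print(sorted(set(toy_counts) - set(kept)))
```

Note that the mean and s.d. are computed once on the full set, so a single extreme sample inflates the s.d. it is judged against; whether the study recomputed these values after filtering is not stated in the caption.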

Extended Data Fig. 4 The number of unique motifs in each repeat-containing region.

The number of unique motifs (y axis) in each repeat-containing region (x axis) is shown for all autosomal chromosomes and chromosomes X and Y.

Extended Data Fig. 5 Distributions of gnomAD gene constraints.

The distributions of gnomAD observed/expected (o/e) upper bounds are shown for genes with rare tandem repeat expansions near TSSs (n = 32 genes) and splice junctions (n = 80 genes), compared with other genes (n = 19,567 genes) (one-sided Wilcoxon rank-sum test). The centre of each box plot indicates the median of the o/e upper bounds, and the minima and maxima indicate the o/e upper bounds that lie 3× the interquartile range from the median.

Extended Data Fig. 6 Transmission tests.

a–c, Odds ratios calculated as ratios of the transmission events of genic large tandem repeats to those in intergenic regions. Only individuals with ASD of European ancestry in SSC (a; n = 1,808), MSSNG (b; n = 2,010) and both SSC and MSSNG (c; n = 3,818) were considered. d–f, Odds ratios calculated as ratios of the transmission events of large tandem repeats (99th percentile of length distribution) in a particular functional element to those in intergenic regions. Only individuals with ASD of European ancestry in SSC (d), MSSNG (e) and both SSC and MSSNG (f) were considered. Fisher’s exact tests were used to estimate the odds ratios and 95% confidence intervals are shown as error bars.
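
For readers unfamiliar with the test, the calculation on a single 2 × 2 table can be sketched as below. The table layout is an assumption (rows: genic and intergenic repeats; columns: transmitted and non-transmitted), the function names are hypothetical, and the odds ratio shown is the simple cross-product ratio rather than the conditional estimate that some statistical packages report alongside Fisher's exact test. Confidence intervals are not computed in this sketch.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio (a*d)/(b*c) for the table [[a, b], [c, d]].

    Returns infinity when b or c is zero.
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]]."""
    row1, col1, total = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    p_observed = prob(a)
    lowest = max(0, row1 - (total - col1))
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)
```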

Extended Data Fig. 7 Transmission gene-set enrichment.

Odds ratios calculated as ratios of the transmission events of large tandem repeats (99th percentile of length distribution) in particular gene sets to those in intergenic regions. Only individuals with ASD of European ancestry in SSC (a; n = 1,808), MSSNG (b; n = 2,010) and both SSC and MSSNG (c; n = 3,818) were considered. Gene sets that were enriched in the burden analysis of rare tandem repeat expansions between children with ASD and their siblings without ASD in SSC are labelled. Red bars indicate significant enrichment in individuals with ASD (FWER < 25%). Fisher’s exact tests were used to estimate the odds ratios and 95% confidence intervals are shown as error bars.

Extended Data Fig. 8 Methods for sizing of the CTG repeat in DMPK.

a, Although short CTG repeats were correctly sized by ExpansionHunter (the results matched those of fragment analysis exactly), slight discrepancies were observed in the estimates for premutation alleles between ExpansionHunter (EH) and PCR-based fragment analysis. Note that the length of the premutation CTG repeats (42 CTGs) was close to the read length of the HiSeq X platform (150 bp). N/A, not available. b, Predictions of the presence of longer CTG repeats were validated by repeat-primed PCR, although the size estimated by ExpansionHunter was shown to be an underestimate (the saw-tooth pattern of repeat-primed PCR extended beyond the predicted size). Repeat-primed PCR experiments were consistently reproduced at least three times for the large expansions. Repeat sizing experiments of PCR-amplifiable samples were consistently reproduced at least twice.
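
The arithmetic behind the note on read length in panel a can be made explicit with a short sketch. The function names are hypothetical; the only inputs taken from the legend are the motif (CTG), the premutation size (42 copies) and the 150-bp read length.

```python
def repeat_tract_bp(motif: str, copies: int) -> int:
    """Length in bp of an uninterrupted repeat tract."""
    return len(motif) * copies


def flank_left_in_read(motif: str, copies: int, read_length_bp: int = 150) -> int:
    """Bases left for flanking sequence if one read contains the whole tract.

    A negative value means the tract is longer than the read, so no single
    read can span it.
    """
    return read_length_bp - repeat_tract_bp(motif, copies)
```

For 42 CTGs the tract is 126 bp, which leaves 24 bp of a 150-bp read for flanking sequence on both sides combined.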

Extended Data Fig. 9 Validation of tandem repeats detected by EHdn.

a–d, Tandem repeats detected in CACNB1. e–h, Tandem repeats detected in FXN. a, e, Integrative Genomics Viewer read pile-up showing the reads aligning to the loci in CACNB1 and FXN in two families for which tandem repeat expansions were detected in the child (bottom). In both families, the expansion is transmitted from the mother to the child (samples highlighted in red in b and f). b, f, Image of the gel electrophoresis showing two bands that correspond to the expanded and unexpanded alleles in the mother and child. The father has only the unexpanded allele. Results from PCR and gel electrophoresis were consistently reproduced at least twice for CACNB1 and FXN loci (Supplementary Fig. 9). c, g, Chromatogram of the Sanger sequencing analysis of the expanded non-reference tandem repeat in the mother. d, h, Chromatogram of the Sanger sequencing analysis of the expanded non-reference tandem repeat in the child. Sanger sequencing was performed using the DNA of the expanded alleles, which was extracted from the gels.

Extended Data Table 1 Molecularly unmapped rare folate-sensitive fragile sites overlapped with GC-rich tandem repeats

Supplementary information

Supplementary Information

This file contains Supplementary Notes (additional results that could not be included in the manuscript due to space constraints) and Supplementary Figures 1-9.

Reporting Summary

Supplementary Tables

This file contains Supplementary Tables 1-15.

Supplementary Data

This file contains source data for Supplementary Figures 1, 3-5 and 7.

Source data

Source Data Fig. 1

Source Data Fig. 2

Source Data Fig. 3

Source Data Extended Data Fig. 2

Source Data Extended Data Fig. 3

Source Data Extended Data Fig. 5

Source Data Extended Data Fig. 6

Source Data Extended Data Fig. 7

About this article

Cite this article

Trost, B., Engchuan, W., Nguyen, C.M. et al. Genome-wide detection of tandem DNA repeats that are expanded in autism. Nature 586, 80–86 (2020). https://doi.org/10.1038/s41586-020-2579-z

Received: 16 November 2019
Accepted: 05 June 2020
Published: 27 July 2020
Issue Date: 01 October 2020
DOI: https://doi.org/10.1038/s41586-020-2579-z
