Abstract
The factors driving or preventing pathological expansion of tandem repeats remain largely unknown. Here, we assessed the FGF14 (GAA)·(TTC) repeat locus in 2,530 individuals by long-read and Sanger sequencing and identified a common 5′-flanking variant in 70.34% of alleles analyzed (3,463/4,923) that represents the phylogenetically ancestral allele and is present on all major haplotypes. This common sequence variation is present nearly exclusively on nonpathogenic alleles with fewer than 30 GAA-pure triplets and is associated with enhanced stability of the repeat locus upon intergenerational transmission and increased Fiber-seq chromatin accessibility.
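As a quick arithmetic check on the allele frequency quoted above, the following minimal sketch recomputes the percentage from the two counts given in the abstract. The variable names are illustrative assumptions and are not taken from the authors' codebase.

```python
# Counts quoted in the abstract; variable names are hypothetical.
cfv_alleles = 3463      # alleles carrying the common 5'-flanking variant
total_alleles = 4923    # alleles analyzed

# Fraction of analyzed alleles that carry the variant.
cfv_frequency = cfv_alleles / total_alleles

# Format as a percentage with two decimals, matching the abstract's precision.
cfv_percent = f"{cfv_frequency:.2%}"
print(cfv_percent)
```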
Data availability
The data created through the All of Us Program Long Read Data release CDRv7 (April 2023: https://support.researchallofus.org/hc/en-us/articles/14769699298324-v7-Curated-Data-Repository-CDR-Release-Notes-2022Q4R9-versions) are available through the All of Us Research Program researcher workbench (https://researchallofus.org/). The Care4Rare-SOLVE data are available through Genomics4RD (https://www.genomics4rd.ca) via controlled access requests to genomics4rd@cheo.on.ca. The data created as part of Genomic Answers for Kids are available through NIH/NCBI dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002206.v4). Genome sequences of the 79 great apes generated by Prado-Martinez et al. (ref. 17) were downloaded from the Sequence Read Archive under the accession number PRJNA189439. Patient-level whole-genome sequencing data are not publicly available, as they could compromise privacy and have not been consented for open sharing. Data from Sanger sequencing have not been consented for sharing. Additional data that support the findings of this study are available on request from the corresponding author (M.C.D.).
Code availability
Code for running TRGT and relevant data analysis is available through our manuscript companion repository: https://github.com/ZuchnerLab/FGF14FlankingVariant and https://doi.org/10.5281/zenodo.11239003 (ref. 25).
References
Pellerin, D. et al. Deep intronic FGF14 GAA repeat expansion in late-onset cerebellar ataxia. N. Engl. J. Med. 388, 128–141 (2023).
Rafehi, H. et al. An intronic GAA repeat expansion in FGF14 causes the autosomal-dominant adult-onset ataxia SCA27B/ATX-FGF14. Am. J. Hum. Genet. 110, 1018 (2023).
Méreaux, J. L. et al. Clinical and genetic keys to cerebellar ataxia due to FGF14 GAA expansions. EBioMedicine 99, 104931 (2024).
Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02057-3 (2024).
Sakamoto, N. et al. GGA*TCC-interrupted triplets in long GAA*TTC repeats inhibit the formation of triplex and sticky DNA structures, alleviate transcription inhibition, and reduce genetic instabilities. J. Biol. Chem. 276, 27178–27187 (2001).
Koenig, Z. et al. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res. https://doi.org/10.1101/gr.278378.123 (2024).
Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).
Duffy, M. F. et al. Divergent patterns of healthy aging across human brain regions at single-cell resolution reveal links to neurodegenerative disease. Preprint at bioRxiv https://doi.org/10.1101/2023.07.31.551097 (2023).
Bonnet, C. et al. Optimized testing strategy for the diagnosis of GAA-FGF14 ataxia/spinocerebellar ataxia 27B. Sci. Rep. 13, 9737 (2023).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Giesselmann, P. et al. Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol. 37, 1478–1481 (2019).
Gamaarachchi, H. et al. Fast nanopore sequencing data analysis with SLOW5. Nat. Biotechnol. 40, 1026–1029 (2022).
Samarakoon, H. et al. Flexible and efficient handling of nanopore sequencing signal data with slow5tools. Genome Biol. 24, 69 (2023).
Samarakoon, H., Ferguson, J. M., Gamaarachchi, H. & Deveson, I. W. Accelerated nanopore basecalling with SLOW5 data format. Bioinformatics 39, btad352 (2023).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Holt, J. M. et al. HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 40, btae042 (2024).
Prado-Martinez, J. et al. Great ape genetic diversity and population history. Nature 499, 471–475 (2013).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://ar5iv.labs.arxiv.org/html/1303.3997 (2013).
Dolzhenko, E. et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).
Dolzhenko, E. et al. REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats. Genome Med. 14, 84 (2022).
Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).
Goujon, M. et al. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699 (2010).
Girdhar, K. et al. Chromatin domain alterations linked to 3D genome organization in a large cohort of schizophrenia and bipolar disorder brains. Nat. Neurosci. 25, 474–483 (2022).
Jha, A. et al. DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools. Preprint at bioRxiv https://doi.org/10.1101/2023.04.20.537673 (2023).
Danzi, M. FGF14 Flanking Variant Manuscript Codebase. Zenodo https://doi.org/10.5281/zenodo.11239003 (2024).
Acknowledgements
We thank all the individuals who participated in this study. We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study. We thank the Centre d’Expertise et de Services Génome Québec for assistance with Sanger sequencing. We thank Pacific Biosciences Applications lab in Menlo Park, CA, and Pacific Biosciences bioinformatics team for HiFi sequencing and alignment of the Care4Rare Canada dataset. This work was supported by the National Institutes of Health (NIH) National Institute of Neurological Disorders and Stroke (grant 2R01NS072248-11A1 to S.Z.), the NIH National Human Genome Research Institute (grant R21HG013397 to M.C.D. and S.Z.), the All of Us Data and Research Center through support of the long-read demo projects (to M.C.D. and S.Z.), the Fondation Groupe Monaco (to B.B.), the Canadian Institutes of Health Research (grant 189963 to B.B.) and the Care4Rare Canada Consortium, funded in part by Genome Canada and the Ontario Genomics Institute (grant OGI-147 to K.M.B.), the Canadian Institutes of Health Research (grant GP1-155867 to K.M.B.), Ontario Research Fund, Genome Quebec and the Children’s Hospital of Eastern Ontario Foundation. This work was also supported by the European Joint Programme on Rare Diseases, under the EJP RD COFUND-EJP 825575 via the Deutsche Forschungsgemeinschaft grant 441409627 as part of the PROSPAX consortium (to M.S. and B.B.), and the National Key R&D Program of China (grant 2021YFA0805200 to H.J.). This work was supported in part by the Bioinformatics for Next Generation Sequencing shared resource facility within the Tisch Cancer Institute at the Icahn School of Medicine at Mount Sinai, which is partially supported by NIH grant P30CA196521. 
This work was also supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. This work was also supported by the Australian Medical Research Future Fund Genomics Health Futures Mission Grants 2007681 and 2023126 (to I.W.D.). N.M.T. is supported by the National Institute on Drug Abuse (grant RF1DA048810) and the National Institute of Neurological Disorders and Stroke (grant R01NS106229). H.H. is supported by the Wellcome Trust, the UK Medical Research Council and the UCLH/UCL Biomedical Research Centre. The Nussenzweig laboratory is supported by the Intramural Research Program of the NIH funded in part with federal funds from the National Cancer Institute under contract HHSN261201500003. G.R. is supported by an EL2 Investigator Grant (APP2007769) from the Australian National Health and Medical Research Council (NHMRC). S.A. is supported by the National Institute on Drug Abuse (grant DP1DA056018). We thank the generous donors to the Genomic Answers for Kids program (to T.P.) at Children’s Mercy Kansas City. D.P. and G.F.D.G. hold fellowship awards from the Canadian Institutes of Health Research. The funders had no role in the conduct of this study.
Contributions
D.P., B.B., S.Z. and M.C.D. designed or conceptualized the study. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., I.S., C.K.S., A.R., V.R., M.W., C.B., C.A., A.A., C.P., D.H., N.M.T., K.D., P.J.L., N.G.L., M.R., H.H., M.S., K.U., A.N., M.N., Z.C., H.J., I.W.D., G.R., S.A., M.A.E., K.M.B., T.P., B.B., S.Z. and M.C.D. acquired data. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., K.M.B., T.P., B.B., S.Z. and M.C.D. analyzed or interpreted data. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., I.S., C.K.S., A.R., V.R., M.W., C.B., C.A., A.A., C.P., D.H., N.M.T., K.D., P.J.L., N.G.L., M.R., H.H., M.S., K.U., A.N., M.N., Z.C., H.J., I.W.D., G.R., S.A., M.A.E., K.M.B., T.P., B.B., S.Z. and M.C.D. drafted or revised the manuscript for intellectual content.
Ethics declarations
Competing interests
E.D. is an employee of Pacific Biosciences. M.S. has received consultancy honoraria from Ionis, Prevail, Orphazyme, Servier, Reata, GenOrph and AviadoBio, all unrelated to the present manuscript. M.A.E. is an employee of Pacific Biosciences. S.Z. received consultancy honoraria from Neurogene, Aeglea BioTherapeutics and Applied Therapeutics and is an unpaid officer of the TGP Foundation, all unrelated to the present manuscript. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Relationship between 5′-flanking sequences and FGF14 GAA repeat lengths.
a, Schematic representation of the FGF14 gene, isoform 1b with the location of the (GAA)n·(TTC)n repeat locus in the first intron. The sequences of the reference 5′-flanking sequence (5′-RFS), the common 5′-flanking variant (5′-CFV; C4 variant), and the C1, C2, C3, C5, and short flanking variant sequences are shown. The sequences of the C1 through C5 variants are highlighted in blue. The sequences are presented relative to the positive strand (genomic context). b, Swarmplot related to Fig. 1c showing repeat lengths as estimated by PacBio HiFi sequencing for 4,382 alleles (from the Genomic Answers for Kids, Care4Rare-SOLVE, and All of Us cohorts) including each of the C1 through C5 sequence variants, separated into subgroups based on the presence of a single terminal adenine (A) or dual terminal adenines (AA). No alleles with C2AA or C5A 5′-flanking variants were found. Three C4 alleles not counted as part of the 5′-CFV group were observed with a single terminal adenine. This plot also extends the y-axis to show the two alleles of over 800 repeat units carrying the 5′-RFS that were not plotted in Fig. 1c for visual clarity. The color of the data points corresponds to the GAA repeat motif purity (a color legend is shown in the top right corner of the plot). The green dashed horizontal line indicates 30 GAA repeats. Abbreviations: 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence.
Extended Data Fig. 2 Relationship between 3′-flanking sequences and FGF14 GAA repeat lengths.
Distribution of repeat lengths estimated by PacBio HiFi sequencing for 4,382 alleles (data from the Genomic Answers for Kids, Care4Rare-SOLVE, and All of Us cohorts) in relation to variations observed in the 3′-flanking sequence of the FGF14 repeat locus. The distribution of FGF14 GAA repeat lengths in alleles with the common 3′-flanking sequence, alleles with the single nucleotide variation rs61965263, and alleles with other rare flanking variants is illustrated. The color of the data points corresponds to the GAA repeat motif purity (a color legend is shown in the top right corner of the plot).
Extended Data Fig. 3 Analysis of parent-offspring transmission of the FGF14 repeat according to the 5′-flanking sequence variant.
a, Analysis of GAA repeat size changes across 411 intergenerational transmissions (from the Genomic Answers for Kids cohort) as estimated by PacBio HiFi sequencing. Contractions are plotted below the dashed identity line, while expansions are plotted above that line. b, Change in GAA repeat length across 411 intergenerational transmissions (from the Genomic Answers for Kids cohort) as measured by PacBio HiFi sequencing, separated by flanking variant group and parental allele size. The number of intergenerational transmission events in each group is indicated below the x-axis. The y-axis shows the change in repeat length from parent to child. Contractions are plotted below the dashed lines, while expansions are plotted above them. Random noise was applied along the x-axis within each category to reduce overlap between data points. This panel extends Fig. 2b by plotting the 12 additional intergenerational events involving alleles carrying a C1, C2, C5, or other rare 5′-flanking sequence variant. In a and b, red dots represent alleles passed from mother to child, while blue dots represent alleles passed from father to child. Abbreviations: 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence.
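The quantities plotted in panel b can be illustrated with a minimal sketch. It assumes the change in repeat length is simply the child's repeat count minus the parent's, and that the x-axis noise is a small uniform offset around each category position. The function names, the jitter width, and the example repeat lengths are hypothetical; this is not the authors' analysis code, which is available from the companion repository.

```python
import random


def repeat_change(parent_len: int, child_len: int) -> int:
    """Change in GAA repeat units from parent to child (assumed: child minus parent)."""
    return child_len - parent_len


def classify_transmission(delta: int) -> str:
    """Label a transmission by the sign of the change in repeat length."""
    if delta > 0:
        return "expansion"    # would be plotted above the dashed line
    if delta < 0:
        return "contraction"  # would be plotted below the dashed line
    return "stable"


def jittered_x(category_index: int, rng: random.Random, width: float = 0.3) -> float:
    """Offset a category's x position by uniform noise to reduce point overlap."""
    return category_index + rng.uniform(-width / 2, width / 2)


# Hypothetical (parent, child) repeat lengths, for illustration only.
example_pairs = [(20, 20), (45, 52), (60, 57)]
rng = random.Random(0)  # fixed seed so the jitter is reproducible
for parent_len, child_len in example_pairs:
    delta = repeat_change(parent_len, child_len)
    x = jittered_x(0, rng)
    print(parent_len, child_len, delta, classify_transmission(delta), round(x, 3))
```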
Extended Data Fig. 4 Haplotype analysis of the FGF14 flanking sequence variants.
Haplotype analysis of the FGF14 flanking sequence variants for 1,674 individuals (from the Genomic Answers for Kids and All of Us cohorts). a, Visualization of haplotypes for 1,674 individuals physically phased through the FGF14 repeat locus. Each row represents one of two alleles per individual. A color-coded legend adjacent to the dendrogram indicates the flanking variant group on each allele: green for 5′-CFV, red for 5′-RFS, blue for degenerate 5′-CFV sequences, and cyan for other flanking sequences. The 15 columns in the plot represent the variant status of each allele for 15 common SNVs derived from the 1000 Genomes and Human Genome Diversity Project (HGDP) dataset, with black indicating reference genotype and tan showing alternate genotype. A vertical red line marks the location of the GAA repeat locus. The heatmap on the right side of the plot displays the GAA repeat length of each allele, with yellow indicating larger repeats and dark blue smaller ones. b, Frequency distribution of the 10 major haplotype groups for each flanking sequence variant. The lower-left pie chart shows the distribution of the 10 major haplotype groups in the 1000 Genomes and HGDP dataset. The number of alleles plotted in each pie chart is given above the chart. Abbreviations: 1KG, 1000 Genomes; 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence; HGDP, Human Genome Diversity Project.
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–9.
About this article
Cite this article
Pellerin, D., Del Gobbo, G.F., Couse, M. et al. A common flanking variant is associated with enhanced stability of the FGF14-SCA27B repeat locus. Nat Genet 56, 1366–1370 (2024). https://doi.org/10.1038/s41588-024-01808-5
DOI: https://doi.org/10.1038/s41588-024-01808-5