  • Brief Communication

A common flanking variant is associated with enhanced stability of the FGF14-SCA27B repeat locus

Abstract

The factors driving or preventing pathological expansion of tandem repeats remain largely unknown. Here, we assessed the FGF14 (GAA)·(TTC) repeat locus in 2,530 individuals by long-read and Sanger sequencing and identified a common 5′-flanking variant in 70.34% of alleles analyzed (3,463/4,923) that represents the phylogenetically ancestral allele and is present on all major haplotypes. This common sequence variation is present nearly exclusively on nonpathogenic alleles with fewer than 30 GAA-pure triplets and is associated with enhanced stability of the repeat locus upon intergenerational transmission and increased Fiber-seq chromatin accessibility.
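
As a quick arithmetic check of the proportion reported above, the following minimal Python sketch recomputes the allele frequency from the stated counts. The variable names are illustrative and are not taken from the authors' code.

```python
# Counts as stated in the abstract; variable names are illustrative only.
alleles_with_variant = 3463
alleles_analyzed = 4923

# Share of analyzed alleles carrying the common 5'-flanking variant, in percent.
frequency_percent = 100 * alleles_with_variant / alleles_analyzed
print(f"{frequency_percent:.2f}% of analyzed alleles carry the variant")
```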

Fig. 1: A common 5′-flanking sequence variant is associated with smaller FGF14 GAA repeat sizes.
Fig. 2: Intergenerational instability of the FGF14 repeat locus.

Data availability

The data created through the All of Us Program Long Read Data release CDRv7 (April 2023: https://support.researchallofus.org/hc/en-us/articles/14769699298324-v7-Curated-Data-Repository-CDR-Release-Notes-2022Q4R9-versions) are available through the All of Us Research Program researcher workbench (https://researchallofus.org/). The Care4Rare-SOLVE data are available through Genomics4RD (https://www.genomics4rd.ca) via controlled access requests to genomics4rd@cheo.on.ca. The data created as part of Genomic Answers for Kids are available through NIH/NCBI dbGaP (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002206.v4). Genome sequences of the 79 great apes generated by Prado-Martinez et al. (ref. 17) were downloaded from the Sequence Read Archive under the accession number PRJNA189439. Patient-level whole-genome sequencing data are not publicly available, as they could compromise privacy and have not been consented for open sharing. Data from Sanger sequencing have not been consented for sharing. Additional data that support the findings of this study are available on request from the corresponding author (M.C.D.).

Code availability

Code for running TRGT and relevant data analysis is available through our manuscript companion repository: https://github.com/ZuchnerLab/FGF14FlankingVariant and https://doi.org/10.5281/zenodo.11239003 (ref. 25).

References

  1. Pellerin, D. et al. Deep intronic FGF14 GAA repeat expansion in late-onset cerebellar ataxia. N. Engl. J. Med. 388, 128–141 (2023).

  2. Rafehi, H. et al. An intronic GAA repeat expansion in FGF14 causes the autosomal-dominant adult-onset ataxia SCA27B/ATX-FGF14. Am. J. Hum. Genet. 110, 1018 (2023).

  3. Méreaux, J. L. et al. Clinical and genetic keys to cerebellar ataxia due to FGF14 GAA expansions. EBioMedicine 99, 104931 (2024).

  4. Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02057-3 (2024).

  5. Sakamoto, N. et al. GGA*TCC-interrupted triplets in long GAA*TTC repeats inhibit the formation of triplex and sticky DNA structures, alleviate transcription inhibition, and reduce genetic instabilities. J. Biol. Chem. 276, 27178–27187 (2001).

  6. Koenig, Z. et al. A harmonized public resource of deeply sequenced diverse human genomes. Genome Res. https://doi.org/10.1101/gr.278378.123 (2024).

  7. Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).

  8. Duffy, M. F. et al. Divergent patterns of healthy aging across human brain regions at single-cell resolution reveal links to neurodegenerative disease. Preprint at bioRxiv https://doi.org/10.1101/2023.07.31.551097 (2023).

  9. Bonnet, C. et al. Optimized testing strategy for the diagnosis of GAA-FGF14 ataxia/spinocerebellar ataxia 27B. Sci. Rep. 13, 9737 (2023).

  10. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  11. Giesselmann, P. et al. Analysis of short tandem repeat expansions and their methylation state with nanopore sequencing. Nat. Biotechnol. 37, 1478–1481 (2019).

  12. Gamaarachchi, H. et al. Fast nanopore sequencing data analysis with SLOW5. Nat. Biotechnol. 40, 1026–1029 (2022).

  13. Samarakoon, H. et al. Flexible and efficient handling of nanopore sequencing signal data with slow5tools. Genome Biol. 24, 69 (2023).

  14. Samarakoon, H., Ferguson, J. M., Gamaarachchi, H. & Deveson, I. W. Accelerated nanopore basecalling with SLOW5 data format. Bioinformatics 39, btad352 (2023).

  15. Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).

  16. Holt, J. M. et al. HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing. Bioinformatics 40, btae042 (2024).

  17. Prado-Martinez, J. et al. Great ape genetic diversity and population history. Nature 499, 471–475 (2013).

  18. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://ar5iv.labs.arxiv.org/html/1303.3997 (2013).

  19. Dolzhenko, E. et al. ExpansionHunter: a sequence-graph-based tool to analyze variation in short tandem repeat regions. Bioinformatics 35, 4754–4756 (2019).

  20. Dolzhenko, E. et al. REViewer: haplotype-resolved visualization of read alignments in and around tandem repeats. Genome Med. 14, 84 (2022).

  21. Waterhouse, A. M., Procter, J. B., Martin, D. M., Clamp, M. & Barton, G. J. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics 25, 1189–1191 (2009).

  22. Goujon, M. et al. A new bioinformatics analysis tools framework at EMBL-EBI. Nucleic Acids Res. 38, W695–W699 (2010).

  23. Girdhar, K. et al. Chromatin domain alterations linked to 3D genome organization in a large cohort of schizophrenia and bipolar disorder brains. Nat. Neurosci. 25, 474–483 (2022).

  24. Jha, A. et al. DNA-m6A calling and integrated long-read epigenetic and genetic analysis with fibertools. Preprint at bioRxiv https://doi.org/10.1101/2023.04.20.537673 (2023).

  25. Danzi, M. FGF14 Flanking Variant Manuscript Codebase. Zenodo https://doi.org/10.5281/zenodo.11239003 (2024).

Acknowledgements

We thank all the individuals who participated in this study. We gratefully acknowledge All of Us participants for their contributions, without whom this research would not have been possible. We also thank the National Institutes of Health’s All of Us Research Program for making available the participant data examined in this study. We thank the Centre d’Expertise et de Services Génome Québec for assistance with Sanger sequencing. We thank Pacific Biosciences Applications lab in Menlo Park, CA, and Pacific Biosciences bioinformatics team for HiFi sequencing and alignment of the Care4Rare Canada dataset. This work was supported by the National Institutes of Health (NIH) National Institute of Neurological Disorders and Stroke (grant 2R01NS072248-11A1 to S.Z.), the NIH National Human Genome Research Institute (grant R21HG013397 to M.C.D. and S.Z.), the All of Us Data and Research Center through support of the long-read demo projects (to M.C.D. and S.Z.), the Fondation Groupe Monaco (to B.B.), the Canadian Institutes of Health Research (grant 189963 to B.B.) and the Care4Rare Canada Consortium, funded in part by Genome Canada and the Ontario Genomics Institute (grant OGI-147 to K.M.B.), the Canadian Institutes of Health Research (grant GP1-155867 to K.M.B.), Ontario Research Fund, Genome Quebec and the Children’s Hospital of Eastern Ontario Foundation. This work was also supported by the European Joint Programme on Rare Diseases, under the EJP RD COFUND-EJP 825575 via the Deutsche Forschungsgemeinschaft grant 441409627 as part of the PROSPAX consortium (to M.S. and B.B.), and the National Key R&D Program of China (grant 2021YFA0805200 to H.J.). This work was supported in part by the Bioinformatics for Next Generation Sequencing shared resource facility within the Tisch Cancer Institute at the Icahn School of Medicine at Mount Sinai, which is partially supported by NIH grant P30CA196521. 
This work was also supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this paper was supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. This work was also supported by the Australian Medical Research Future Fund Genomics Health Futures Mission Grants 2007681 and 2023126 (to I.W.D.). N.M.T. is supported by the National Institute on Drug Abuse (grant RF1DA048810) and the National Institute of Neurological Disorders and Stroke (grant R01NS106229). H.H. is supported by the Wellcome Trust, the UK Medical Research Council and the UCLH/UCL Biomedical Research Centre. The Nussenzweig laboratory is supported by the Intramural Research Program of the NIH funded in part with federal funds from the National Cancer Institute under contract HHSN261201500003. G.R. is supported by an EL2 Investigator Grant (APP2007769) from the Australian National Health and Medical Research Council (NHMRC). S.A. is supported by the National Institute on Drug Abuse (grant DP1DA056018). We thank the generous donors to the Genomic Answers for Kids program (to T.P.) at Children’s Mercy Kansas City. D.P. and G.F.D.G. hold fellowship awards from the Canadian Institutes of Health Research. The funders had no role in the conduct of this study.

Author information

Contributions

D.P., B.B., S.Z. and M.C.D. designed or conceptualized the study. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., I.S., C.K.S., A.R., V.R., M.W., C.B., C.A., A.A., C.P., D.H., N.M.T., K.D., P.J.L., N.G.L., M.R., H.H., M.S., K.U., A.N., M.N., Z.C., H.J., I.W.D., G.R., S.A., M.A.E., K.M.B., T.P., B.B., S.Z. and M.C.D. acquired data. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., K.M.B., T.P., B.B., S.Z. and M.C.D. analyzed or interpreted data. D.P., G.F.D.G., M.C., E.D., S.K.N., W.A.C., I.R.L.X., M.-J.D., G.S., G.M.-R., I.S., C.K.S., A.R., V.R., M.W., C.B., C.A., A.A., C.P., D.H., N.M.T., K.D., P.J.L., N.G.L., M.R., H.H., M.S., K.U., A.N., M.N., Z.C., H.J., I.W.D., G.R., S.A., M.A.E., K.M.B., T.P., B.B., S.Z. and M.C.D. drafted or revised the manuscript for intellectual content.

Corresponding author

Correspondence to Matt C. Danzi.

Ethics declarations

Competing interests

E.D. is an employee of Pacific Biosciences. M.S. has received consultancy honoraria from Ionis, Prevail, Orphazyme, Servier, Reata, GenOrph and AviadoBio, all unrelated to the present manuscript. M.A.E. is an employee of Pacific Biosciences. S.Z. received consultancy honoraria from Neurogene, Aeglea BioTherapeutics and Applied Therapeutics and is an unpaid officer of the TGP Foundation, all unrelated to the present manuscript. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Relationship between 5′-flanking sequences and FGF14 GAA repeat lengths.

a, Schematic representation of the FGF14 gene, isoform 1b with the location of the (GAA)n·(TTC)n repeat locus in the first intron. The sequences of the reference 5′-flanking sequence (5′-RFS), the common 5′-flanking variant (5′-CFV; C4 variant), and the C1, C2, C3, C5, and short flanking variant sequences are shown. The sequences of the C1 through C5 variants are highlighted in blue. The sequences are presented relative to the positive strand (genomic context). b, Swarmplot related to Fig. 1c showing repeat lengths as estimated by PacBio HiFi sequencing for 4,382 alleles (from the Genomic Answers for Kids, Care4Rare-SOLVE, and All of Us cohorts) including each of the C1 through C5 sequence variants, separated into subgroups based on the presence of a single terminal adenine (A) or dual terminal adenines (AA). No alleles with C2AA or C5A 5′-flanking variants were found. Three C4 alleles not counted as part of the 5′-CFV group were observed with a single terminal adenine. This plot also extends the y-axis to show the two alleles of over 800 repeat units carrying the 5′-RFS that were not plotted in Fig. 1c for visual clarity. The color of the data points corresponds to the GAA repeat motif purity (a color legend is shown in the top right corner of the plot). The green dashed horizontal line indicates 30 GAA repeats. Abbreviations: 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence.
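
The 30-repeat line in this plot, like the abstract's "fewer than 30 GAA-pure triplets", depends on counting repeat units. One simple way to count uninterrupted GAA triplets in an allele sequence is sketched below in Python. This is an illustrative assumption rather than the authors' TRGT-based workflow; the function names and the toy sequence are hypothetical.

```python
import re

# Threshold taken from the legend (green dashed line at 30 GAA repeats).
GAA_THRESHOLD = 30

def longest_pure_gaa_run(seq: str) -> int:
    """Return the number of triplets in the longest uninterrupted GAA run."""
    runs = re.findall(r"(?:GAA)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def is_below_threshold(seq: str, threshold: int = GAA_THRESHOLD) -> bool:
    """True when the longest pure GAA run is shorter than the threshold."""
    return longest_pure_gaa_run(seq) < threshold

# Toy sequence (hypothetical): two GAA units, a GGA interruption, three GAA units.
toy_allele = "GAAGAA" + "GGA" + "GAAGAAGAA"
print(longest_pure_gaa_run(toy_allele), is_below_threshold(toy_allele))
```

Counting only the longest uninterrupted run is one possible reading of "GAA-pure"; a purity score expressed as a fraction of matching bases would need a different calculation.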

Extended Data Fig. 2 Relationship between 3′-flanking sequences and FGF14 GAA repeat lengths.

Distribution of repeat lengths estimated by PacBio HiFi sequencing for 4,382 alleles (data from the Genomic Answers for Kids, Care4Rare-SOLVE, and All of Us cohorts) in relation to variations observed in the 3′-flanking sequence of the FGF14 repeat locus. The distribution of FGF14 GAA repeat lengths in alleles with the common 3′-flanking sequence, alleles with the single nucleotide variation rs61965263, and alleles with other rare flanking variants is illustrated. The color of the data points corresponds to the GAA repeat motif purity (a color legend is shown in the top right corner of the plot).

Extended Data Fig. 3 Analysis of parent-offspring transmission of the FGF14 repeat according to the 5′-flanking sequence variant.

a, Analysis of GAA repeat size changes across 411 intergenerational transmissions (from the Genomic Answers for Kids cohort) as estimated by PacBio HiFi sequencing. Contractions are plotted below the dashed identity line while expansions are plotted above that line. b, Change in GAA repeat length across 411 intergenerational transmissions (from the Genomic Answers for Kids cohort) as measured by PacBio HiFi sequencing, separated by flanking variant group and parental allele size. The number of intergenerational transmission events in each group is indicated below the x-axis. The y-axis shows the change in repeat length from parent to child. Contractions are plotted below the dashed lines while expansions are plotted above them. Random noise was applied along the x-axis within each category to improve data visualization. This panel extends Fig. 2b by plotting the 12 additional intergenerational events involving alleles carrying a C1, C2, C5, or other rare 5′-flanking sequence variant. In a and b, red dots represent alleles passed from mother to child, while blue dots represent alleles passed from father to child. Abbreviations: 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence.
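
The legend describes each transmission by the change in repeat length from parent to child, with contractions below zero and expansions above. A minimal Python sketch of that bookkeeping follows. The function name and the example records are hypothetical; they are not drawn from the study data or the authors' repository.

```python
from collections import Counter

def classify_transmission(parent_len: int, child_len: int) -> str:
    """Label one parent-to-child transmission by the sign of the length change."""
    delta = child_len - parent_len  # change in repeat units, child minus parent
    if delta > 0:
        return "expansion"
    if delta < 0:
        return "contraction"
    return "stable"

# Hypothetical records: (flanking group, parent repeat units, child repeat units).
transmissions = [
    ("5'-CFV", 12, 12),
    ("5'-CFV", 15, 15),
    ("5'-RFS", 80, 95),
    ("5'-RFS", 120, 104),
]

# Tally outcomes per flanking variant group.
summary = Counter(
    (group, classify_transmission(parent, child))
    for group, parent, child in transmissions
)
for (group, outcome), count in sorted(summary.items()):
    print(group, outcome, count)
```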

Extended Data Fig. 4 Haplotype analysis of the FGF14 flanking sequence variants.

Haplotype analysis of the FGF14 flanking sequence variants for 1,674 individuals (from the Genomic Answers for Kids and All of Us cohorts). a, Visualization of haplotypes for 1,674 individuals physically phased through the FGF14 repeat locus. Each row represents one of two alleles per individual. A color-coded legend adjacent to the dendrogram indicates the flanking variant group on each allele: green for 5′-CFV, red for 5′-RFS, blue for degenerate 5′-CFV sequences, and cyan for other flanking sequences. The 15 columns in the plot represent the variant status of each allele for 15 common SNVs derived from the 1000 Genomes and Human Genome Diversity Project (HGDP) dataset, with black indicating reference genotype and tan showing alternate genotype. A vertical red line marks the location of the GAA repeat locus. The heatmap on the right side of the plot displays the GAA repeat length of each allele, with yellow indicating larger repeats and dark blue smaller ones. b, Frequency distribution of the 10 major haplotype groups for each flanking sequence variant. The lower-left pie chart shows the distribution of the 10 major haplotype groups in the 1000 Genomes and HGDP dataset. The number of alleles plotted in each pie chart is given above the chart. Abbreviations: 1KG, 1000 Genomes; 5′-CFV, common 5′-flanking variant; 5′-RFS, reference 5′-flanking sequence; HGDP, Human Genome Diversity Project.

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–9.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pellerin, D., Del Gobbo, G.F., Couse, M. et al. A common flanking variant is associated with enhanced stability of the FGF14-SCA27B repeat locus. Nat Genet 56, 1366–1370 (2024). https://doi.org/10.1038/s41588-024-01808-5

  • DOI: https://doi.org/10.1038/s41588-024-01808-5
