Attenuated huntingtin gene CAG nucleotide repeat size in individuals with Lynch syndrome

DNA mismatch repair (MMR) is thought to contribute to the onset and progression of Huntington disease (HD) by promoting somatic expansion of the pathogenic CAG nucleotide repeat in the huntingtin gene (HTT). Here, we studied constitutional HTT CAG repeat size in two cohorts of individuals with Lynch syndrome (LS) carrying heterozygous loss-of-function variants in the MMR genes MLH1 (n = 12/60; Lund cohort/Bochum cohort, respectively), MSH2 (n = 15/88) and MSH6 (n = 21/23), and in controls (n = 19/559). Because it was unknown which HTT allele segregated with the LS allele, the sum of CAG repeats for both HTT alleles was calculated for each individual. In the larger Bochum cohort, the sum of CAG repeats was lower in the MLH1 subgroup than in controls (MLH1 35.40 ± 3.6 vs. controls 36.89 ± 4.5 CAG repeats; p = 0.014). All LS genetic subgroups in the Bochum cohort displayed lower frequencies of unstable HTT intermediate alleles and lower HTT somatic CAG repeat expansion index values than controls. Collectively, our results indicate that MMR gene haploinsufficiency could have a restraining impact on constitutional HTT CAG repeat size and support the notion that the MMR pathway is a driver of nucleotide repeat expansion diseases.

The mean somatic HTT CAG expansion index (EI) value, which is typically increased in tissues from individuals with HD 17, did not differ significantly between individuals with LoF variants in MLH1 (EI = 0.099), MSH2 (EI = 0.122) or MSH6 (EI = 0.100) and controls (EI = 0.131) (Table 1). Notably, however, all LS genetic subgroups showed a lower mean EI value than controls (Table 1).

Discussion
Investigation of HTT CAG repeat size in lymphocyte DNA from 217 individuals in two different LS cohorts showed a small but statistically significant reduction in CAG repeat size in a subgroup of 60 MLH1 LoF heterozygotes from the larger Bochum cohort. In the Bochum cohort, the frequencies of HTT intermediate alleles and the somatic EI values were consistently lower in all LS genetic subgroups than in controls, although none of these differences was individually statistically significant. The CAG repeat size in the Bochum MLH1 LS subgroup remained significantly smaller than in controls also after removal of individuals with HTT intermediate alleles.
Nucleotide repeat instability and an increased mutational burden are known phenomena in dMMR cancers in LS patients following a somatic "second hit" of the remaining wild-type MMR allele 16, and in all tissues of individuals with constitutional biallelic MMR deficiency 18. However, to the best of our knowledge, MMR gene haploinsufficiency in humans has not to date been reported to affect constitutional nucleotide repeat size. A recent whole genome sequencing (WGS) study of non-neoplastic tissue samples from individuals with LS failed to detect any changes in the repertoire of mutational processes or in mutation rates 19. Yet, subtle nucleotide repeat variations could have escaped detection by WGS, whose accuracy in short tandem repeat (STR) regions is limited compared with PCR fragment-based analyses 20.
Consistent with previous population-based observations 21, we found large variation in CAG repeat size between HTT alleles, both within and between individuals, which, together with the lack of parental HTT repeat-size data, precluded a more detailed interpretation of our data. Determining which HTT allele cosegregated with the LS allele in each individual, e.g., by LS family trio analyses, would have considerably strengthened data interpretation, allowing us to identify individuals, or certain repeat-size intervals including variable CAA interruptions, that may account for the observed repeat variation in the Bochum LS cohort.
Although HTT intermediate alleles appear under-represented in the Bochum LS cohort, especially among individuals with MLH1-associated LS (1.7%), compared with the controls used (5.2%) and with reported population-based frequencies (6.8%; 21), these data should be interpreted with caution given the limited size of the cohort subgroups and the absence of LS family trio data. Possibly, the observed frequency of HTT intermediate alleles in LS in our study could reflect intergenerational CAG repeat contractions of such alleles into the normal repeat-size interval. However, HTT intermediate alleles alone do not account for the observed CAG repeat-size reduction in the Bochum MLH1 LS subgroup, as removing this category of alleles from our calculations had little impact.
Somatic EI values did not differ significantly between the LS subgroups and controls, but the values were consistently lower in all LS genetic subgroups. As EI values normally increase with age 22, and the mean age in the Bochum LS group was higher than in the control group, the difference in EI between the two groups could be underestimated. The use of age-matched controls would clearly have sharpened the interpretation of EI values.
Experimentally, in mouse models of HD, there is long-standing evidence that reduced expression of the MMR proteins Msh2, Msh3, Mlh1 or Mlh3 counteracts somatic CAG repeat expansion 23,24 and attenuates the experimental HD pathogenic process 25. More recently, reduced expression of the endo- and exonuclease Fan1 was shown to promote somatic CAG repeat expansion in an Mlh1-dependent manner, i.e., suppression of Mlh1 blocked the repeat expansion caused by Fan1 deficiency 26,27. There is now mounting evidence that the MMR pathway contributes to the expansion of unstable pathogenic nucleotide repeats in HD and other human hereditary neurodegenerative diseases 14. Given the present results and current knowledge in this field of research, it could be speculated that individuals with LS are less prone to HTT CAG repeat expansion.
In summary, this study indicates that MMR gene haploinsufficiency, in particular for MLH1, could be associated with a propensity for reduced constitutional HTT CAG repeat size. Further investigations, e.g., with larger LS case samples and LS family trio WGS analyses, are required to confirm our results.
Additional studies should also be encouraged to explore the possible impact of MMR gene haploinsufficiency on other nucleotide repeat regions in the human genome.

Cohort information
Lymphocyte DNA was retrieved from two cohorts of index individuals diagnosed with LS in Sweden and Germany (Lund cohort and Bochum cohort, respectively), carrying germline class 4 (likely pathogenic) or class 5 (pathogenic) variants in MLH1, MSH2, MSH6 or PMS2 according to the variant classification criteria of the American College of Medical Genetics and Genomics (ACMG) 28 or the International Society for Gastrointestinal Hereditary Tumours (InSiGHT) variant database 29, and from controls (Fig. 1).

HTT CAG repeat size estimation and somatic expansion index calculation
HTT germline CAG repeat size was estimated using standard protocols for PCR amplification and capillary electrophoresis fragment analysis, with a validated accuracy of ± 1 CAG repeat for alleles with < 45 repeats and ± 3 CAG repeats for alleles with 45 or more repeats. The PCR primers were, for the Lund cohort, HD1: 5′-ATG AAG GCC TTC GAG TCC CTC AAG TCC TTC-3′ and HD3: 5′-Hex-GGC GGT GGC GGC TGT TGC TGC TGC TGC-3′ as described 30, and, for the Bochum cohort, Hu4 (forward): 6-FAM-5′-ATG GCG ACC CTG GAA AAG CTG ATG AA-3′ and Hu5 (reverse): 5′-GGC GGT GGC GGC TGT TGC TGC TGC TGC TGC-3′ as described 31,32. A canonical glutamine-encoding repeat sequence in HTT was assumed. PCR products were resolved on an ABI 3500XL Genetic Analyzer (Applied Biosystems) with GeneScan 500-ROX as internal size standard and analyzed using GeneMapper v6 (Lund cohort) or GeneMapper v4.1 (Bochum cohort) software. Somatic CAG repeat EI values were calculated from GeneMapper peak-height data as described 17, considering only expansion peaks to the right of the highest (modal allele) peak; 250 consecutively selected individuals from the Bochum control group served as controls for this analysis.
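The EI calculation described above can be illustrated with a minimal sketch. It assumes a peak-height formulation along the lines of the method commonly used for this index: each peak passing a relative-height threshold is normalized to the summed height of all retained peaks, multiplied by its distance in repeats from the modal peak, and summed over the expansion peaks only. The function name `expansion_index`, the dictionary input and the 20% threshold default are hypothetical choices, not parameters reported in this study, and which HTT allele's peak cluster is analyzed is left to the caller.

```python
from typing import Mapping


def expansion_index(peaks: Mapping[int, float], rel_threshold: float = 0.2) -> float:
    """Hypothetical sketch of a somatic CAG expansion index (EI).

    peaks: CAG repeat length -> peak height for ONE allele's peak cluster,
           e.g. heights exported from fragment-analysis software.
    rel_threshold: peaks lower than this fraction of the modal peak are ignored
                   (assumed value; not reported in the study).
    """
    if not peaks:
        raise ValueError("peak profile is empty")
    # Modal allele = highest peak; its repeat length is the reference point.
    modal_len, modal_height = max(peaks.items(), key=lambda item: item[1])
    # Drop low (noise) peaks relative to the modal peak height.
    kept = {length: h for length, h in peaks.items() if h >= rel_threshold * modal_height}
    total = sum(kept.values())
    # Normalized height x repeat change, summed over expansion peaks only
    # (peaks to the right of the modal peak); contraction peaks contribute nothing.
    return sum((h / total) * (length - modal_len)
               for length, h in kept.items() if length > modal_len)


if __name__ == "__main__":
    # Invented toy profile (not study data): modal peak at 17 repeats.
    profile = {16: 40.0, 17: 100.0, 18: 30.0, 19: 10.0}
    print(round(expansion_index(profile), 4))        # → 0.1765 (19-repeat peak below cutoff)
    print(round(expansion_index(profile, 0.05), 4))  # → 0.2778 (all peaks kept)
```

Because the threshold decides which minor peaks count, EI values are only comparable between groups processed with the same threshold and normalization.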

Statistical analyses
CAG repeat size was converted to integers according to clinical genetic laboratory diagnostic routines 30. The methodological estimation error of ± 1 repeat was not incorporated into the statistical calculations. Since the methods used in this study do not reveal which HTT allele has cosegregated with the LS-associated variant, the sum of HTT CAG repeats in each individual was calculated and used in all analyses except for somatic EI calculations. The mean sum of CAG repeats, with standard deviation (SD) and 95% confidence interval (CI), was calculated for each MMR gene subgroup and for controls. Group means were compared using Student's t-test, and P-values < 0.05 were considered significant. Bonferroni correction was applied to adjust for multiple comparisons (MLH1, MSH2 and MSH6 vs. controls, i.e., three comparisons), after which P-values < 0.017 (0.05/3) were considered significant. Calculations were performed using SPSS Statistics for Windows (SPSS Inc., Chicago, Ill., USA).
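As a hedged illustration of the comparison described above, the sketch below recomputes a pooled-variance (Student's) two-sample t-test from group summary statistics and applies the Bonferroni-adjusted threshold 0.05/3. It assumes that the ± values reported in the abstract for the Bochum MLH1 subgroup and controls are SDs; because those inputs are rounded, the result can only approximate the reported P = 0.014. The function names are hypothetical. The two-sided P-value uses the standard identity P = I_{ν/(ν+t²)}(ν/2, 1/2), with the regularized incomplete beta function evaluated by a continued fraction, so only the Python standard library is needed.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def student_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance two-sample t-test from means, SDs and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    t = (m1 - m2) / se
    # Two-sided P = I_{df/(df+t^2)}(df/2, 1/2).
    p = reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return t, df, p


if __name__ == "__main__":
    # Bonferroni: three comparisons (MLH1, MSH2, MSH6 vs. controls).
    alpha = 0.05 / 3
    # Bochum MLH1 (n = 60) vs. controls (n = 559); +/- values assumed to be SDs.
    t, df, p = student_t_from_summary(35.40, 3.6, 60, 36.89, 4.5, 559)
    print(f"t = {t:.2f}, df = {df}, two-sided P = {p:.4f}, "
          f"significant after Bonferroni (P < {alpha:.3f}): {p < alpha}")
```

A pooled-variance test is used here because the section names Student's t-test; Welch's unequal-variance variant would give a different P-value when group SDs and sizes differ as much as they do in this comparison.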

Ethics approval
This study was approved by the Regional Ethical Review Board in Lund, Sweden (application no. 2013/468 and application no. 2015/211), approved, or waived following anonymization procedures, by the Swedish Ethical Review Agency (application no. 2019-02312 and application no. 2021-06254-02, respectively), and approved by the Ethics Review Board of the Ruhr University in Bochum, Germany (application no. 18-6563-BR). Informed written consent was required and obtained from all individuals (Bochum cohort) or waived (Lund cohort) following anonymization of DNA samples prior to HTT CAG repeat size analysis (application no. 2021-06254-02).
No individual-level data are published in this study.All methods were performed in accordance with the relevant local guidelines and regulations.

Figure 1. Flow-chart and description of the Lund cohort (a) and the Bochum cohort (b) with numbers of included and excluded individuals, gender distribution, and Lynch syndrome genetic subcategories with loss-of-function variants in MLH1, MSH2 and MSH6, respectively, and controls.

Figure 3. Boxplot of the sum of CAG repeats in the Bochum cohort from individuals with Lynch syndrome caused by loss-of-function variants in MLH1, MSH2 and MSH6, and controls. Outlier (MSH6 n = 1, 50 CAG repeats) is not shown. *P = 0.014.

Figure 2. Boxplot of the sum of CAG repeats in the Lund cohort from individuals with Lynch syndrome caused by loss-of-function variants in MLH1, MSH2 and MSH6, and controls. Outlier (MLH1 n = 1, 55 CAG repeats) is not shown.

Table 1. Summary of HTT CAG repeat size characteristics in the study cohorts. The mean sum of HTT CAG repeats, the fraction of individuals with HTT intermediate alleles (27-35 CAG repeats), and the mean somatic expansion index (EI) value in the MLH1, MSH2 and MSH6 Lynch syndrome subgroups and controls in the Lund cohort and Bochum cohort, respectively, are shown. SD, standard deviation. NA, not analyzed.

allele interval. The fraction of individuals with an intermediate allele among individuals with LS did not differ significantly from that in controls, but the fraction was consistently lower in all LS genetic subgroups (Table 1). A subgroup of the Lund cohort was previously presented in a preprint (Dalene Skarping et al. 2022, MedRxiv, https://doi.org/10.1101/2022.05.28.22275723). Controls in the present study were individuals diagnosed during 1999-2011 with colorectal cancers that were MMR proficient by immunohistochemistry and from whom tumor tissue DNA had also been archived (Lund cohort), or self-reported healthy university students (Bochum cohort). Controls from Bochum were excluded if they or any of their close relatives had neurological and/or mental illness, as assessed by a self-report questionnaire. Individuals with LS-associated missense variants predicted to cause single amino acid substitutions were excluded to avoid variants with partial LoF or an unclear pathogenic mechanism. Other types of LS-associated variants, i.e., nonsense variants, variants altering the reading frame or splicing, and deletions or duplications of one or more exons, were considered complete LoF alleles. Individuals with variants in the MMR gene PMS2 were excluded because of the limited number of such individuals in both cohorts (Fig. 1).
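The per-individual quantities summarized in Table 1 can be illustrated with the hypothetical helpers below, which sum the two HTT CAG allele sizes of each individual (the allele cosegregating with the LS variant being unknown) and flag individuals carrying at least one intermediate allele of 27-35 repeats. The function names and the toy genotypes are invented for illustration, and allele sizes are assumed to be already rounded to integers, as described in the statistical analyses.

```python
from statistics import mean, stdev
from typing import Iterable, Tuple

# 27-35 CAG repeats (inclusive), the intermediate-allele interval used in Table 1.
INTERMEDIATE = range(27, 36)


def cag_sum(genotype: Tuple[int, int]) -> int:
    """Sum of both HTT alleles; used because LS-allele cosegregation is unknown."""
    return genotype[0] + genotype[1]


def has_intermediate(genotype: Tuple[int, int]) -> bool:
    """True if at least one allele falls in the 27-35 repeat interval."""
    return any(allele in INTERMEDIATE for allele in genotype)


def summarize(genotypes: Iterable[Tuple[int, int]]):
    """Mean and SD of per-individual CAG sums, and fraction with an intermediate allele."""
    gts = list(genotypes)
    sums = [cag_sum(g) for g in gts]
    frac_intermediate = sum(has_intermediate(g) for g in gts) / len(gts)
    return mean(sums), stdev(sums), frac_intermediate


if __name__ == "__main__":
    # Invented toy genotypes (allele 1, allele 2); not study data.
    toy_group = [(17, 19), (15, 22), (18, 28), (17, 17)]
    m, sd, frac = summarize(toy_group)
    print(f"mean sum = {m:.2f}, SD = {sd:.2f}, intermediate fraction = {frac:.1%}")
```

Note that an individual is counted once in the intermediate-allele fraction even if both alleles fall in the 27-35 interval.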