Development and validation in 500 female samples of a TP-PCR assay to identify AFF2 GCC expansions

Over 100 X-linked intellectual disability genes have been identified, with triplet repeat expansions at the FMR1 (FRAXA) and AFF2 (FRAXE) genes being the causative agent in two of them. The absence of FRAXE pathognomonic features hampers early recognition, delaying testing and molecular confirmation. Hence, our laboratory uses a multiplex PCR-based strategy to genotype both FRAXA and FRAXE. However, AFF2 expansions are missed giving rise to an uninformative result in around 20% of female samples. To rule out undetected expansions and confirm homozygosity Southern blot analysis is performed being labour- and resource-intensive. The aim of this study is to develop a timely and economic triplet-primed amplification (TP-PCR) screening strategy to size the AFF2 GCC repeat and accurately assess homozygosity as well as pinpoint multiplex-PCR false negatives in female samples. In order to achieve this, validation was performed in a cohort of 500 females with a previous uninformative FRAXE PCR result. Interestingly, the presence of a T > C SNP (rs868949662), contiguous to the GCC repetitive tract, allows triplet primer binding in two additional repeats, increasing the discrimination power of the TP-PCR assay in heterozygous and homozygous samples. Twelve alleles outside the normal range were recognized: eight intermediate and four premutated, which seems relevant considering the rarity of the AFF2 expansions. All genotypes are concordant with that obtained by Southern blotting, confirming this as a strict, reproducible and low-cost homozygosity screening strategy that enables the identification of small expanded alleles missed by the routine multiplex-PCR due to allele dropout. Overall, this assay is capable of spotting multiplex-PCR false negatives besides identifying alleles up to > 80 GCC repeats. Furthermore, the occurrence of intermediate repeat sizes with unexpected frequency, introduces new areas of clinical research in this cohort in understanding these less explored AFF2 repeat sizes and newly associated phenotypes.

www.nature.com/scientificreports/ multiplex PCR-based strategy to determine simultaneous allele sizing of FRAXA and FRAXE loci in both males and females that fulfil clinical criteria for Fragile X Syndrome, idiopathic intellectual disability (FXS, MIM #300624) 23 . In the presence of an expanded allele, the PCR result is usually inconclusive as the amplification of expanded alleles is inefficient in high repeat sizes. Furthermore, around 20% of female samples are uninformative due to possible homozygosity (both alleles share the same number of repeats) for one or both loci. These observations prompted us to develop a rapid triplet-primed amplification (TP-PCR) screening strategy to size the AFF2 GCC repeat and accurately assess homozygosity in female samples. In the case of FMR1 a commercially available TP-PCR kit is used. Overall, this study allowed validation of a homozygosity screening strategy, spotting false negatives from the multiplex-PCR allowing the identification of FRAXE-related expansions with putative diagnostic utility.

TP-PCR development and optimization.
For the optimization of the TP-PCR assay conditions, we used four male and female samples, previously characterized by sequencing and Southern blot analyses (Supplementary Fig. S1). Taking advantage of the (GCC)2GCT(GCC)2C sequence located 52 nucleotides downstream of the unstable GCC tract, specific primer sequence and ligation conditions allowed the simultaneous amplification of the triplet repeat profile and the determination of the allele total length in one assay. The use of a (GCC)5 primer allows ligation to occur at (GCC)2GCT(GCC)2C, with SNP rs1333167094 "T" being unnoticed (Fig. 1). The TP-PCR electropherogram reflects the total repeat length as well as each GCC triplet, represented as a repetitive stutter pattern where the leftmost peak corresponds to repeat number 5 and each beyond that being represented as an individual peak until the end of the repeat tract is reached. The downstream peak cor- Grey squares represent tandem GCC repeats at the AFF2 gene 5′ UTR. Black squares represent annealing of "CGG primer" within the GCC tract and in a second priming site overriding the rs1333167094 "T" SNP (when GCT is located exactly in the middle of the (CGG)5C primer sequence). Above is an electropherogram illustration after capillary electrophoresis on automatic sequencer, reflecting the total repeat length (higher rightmost peak), that corresponds to the distance between the forward primer and the sole GCC priming site 52 bp downstream of the end of the repeat. The image is not at scale; (B) Partial AFF2 sequence (GRCh37/hg19, NM_002025.3) harbouring the primer binding sites. Annealing sites within the GCC repeats (light blue), generating stutters, and at (GCC)2GCT(GCC)2G sequence located 52 nucleotides downstream (dark blue). "Forward primer" and corresponding binding site (yellow); "CGG primer" (blue arrow and dashed purple line) (CGG)5C; Tail primer (green arrow) and SNPs c.-413T > C rs868949662 (*) (highest population MAF: 0.02) and c.-355T > C rs1333167094 (#) (highest population MAF: < 0.01) are also indicated. In case of rs868949662 "C", the nucleotide following the 3rd GCC triplet is a "C" hampering its recognition, as a result instead of the three additional GCC triplets, only two are recognized (region underlined); www.nature.com/scientificreports/ responds to the distance between the forward primer and the sole GCC priming site 52 bp downstream of the end of the repeat. For instance, a peak of around 197 bp corresponds to 18 GCCs matching the thirteen (plus five triplets of the "CGG primer") consecutive fragments differing by 3 bp (Fig. 1). Our TP-PCR assay is also able to discriminate alleles differing by a single repeat. In samples carrying larger alleles, a continuous series of uninterrupted triplet fragments is displayed, overlapping the total repeat size peak. Despite being visible triplets with an RFU greater than the height of the noise signal, only those with 30 RFUs or above are considered, allowing to accurately size expanded alleles up to 80 GCCs (Fig. 1C,ii). In cases where alleles are visible with a stuttering pattern past the established threshold of 80 repeats, further testing is mandatory following the classical/routine workflow (e.g. Southern blotting).

TP-PCR validation.
Following optimization, a total of 500 female samples, previously classified as uninformative by multiplex-PCR (due to observed homoallelism), were analyzed. In our cohort, FRAXE alleles ranged between 11 to 107 GCC triplets, in which alleles with 15 repeats are the most frequent. As expected, a homozygous normal result was confirmed by TP-PCR in the vast majority of the samples tested (n = 483; 96.6%), showing 13 different genotypes ( Table 1). The most common genotype was 15/15 which is expected because the 15 GCC allele is the most common in the European population. Allele distribution is similar to that obtained by Clark et al., with the 15 (n = 692), 18 (n = 84) and 16 (n = 76) GCC alleles being the most common, followed by those with 20 GCC (n = 29) suggesting a bimodal distribution (Table 1) 8 . The unique design of the "CGG primer" in the reverse sense, with a 5′ tail and (CGG)5C at its 3′ terminus, enables the annealing anywhere within the GCC tract as well as rs1333167094 "T" SNP overriding leading to primer ligation in a second distal site. This occurs because the non-complementary "T" is located exactly in the middle of the (CGG)5 primer's sequence. In case of rs868949662 "C", the primer is unable to bind due to the absence of ligation in 3′ end nucleotide of the primer. We speculate that the presence of a nucleotide interspersed at the GCC stretch would likely produce a partial peak dropout as a consequence of the absence of annealing as that interspersed nucleotide is located towards the primer 3′ end. As no partial or total allele dropout was observed in our cohort, we anticipate that interspersions in the AFF2 repetitive region are scarcer than the AGG interruption(s) in the FMR1 gene repetitive region.
Unexpected alleles. The majority of normal-sized alleles were identically sized by both multiplex-PCR and TP-PCR. The discrepant result observed in one sample, multiplex-PCR (15/15) and TP-PCR (15/27) might be explained by the preferential amplification of the smallest allele when using three primer pairs (multiplexing FMR1, AFF2 and ARX genes), as there is a 36 bp (12 GCCs) difference between the size of each allele. This bias towards the smallest allele seems to be occurring more frequently with AFF2 than FMR1 gene alleles (in our experience, observed even with singleplex PCR). In 6 samples a second expanded allele was observed and in 11 a second unexpected normal-sized allele could be depicted by the triplet repeat profile but not by the total repeat length fragment (Fig. 2). These results prompted sequencing of the repetitive region in those samples, which revealed that the presence of the SNP minor allele "C" at the end of the GCC region [NM_002025.3:c.-413T > C (rs868949662)], resulted in the extension of the repeat tract by three repeats. The nucleotide following the last GCC triplet is a "C", hampering its recognition due to non-complementarity in the very last nucleotide (3′) of the primer (because of the last nucleotide, the primer needs to bind the "G" of the next triplet for amplification to occur). As a consequence, the "CGG primer" only binds to two extra repeats, without interfering in the total fragment length, and thus explaining the discrepant results. This SNP is described with a frequency of 0.021 (gnomAD), and contributes to the discriminatory power of the TP-PCR assay in heterozygous and homozygous samples carrying this variant. In samples where total length fragment shows a difference in two GCCs, when compared to TP-PCR profile, sequencing should be performed to confirm exact GCC number.  S2).

Discussion
A new and accurate screening method for the analysis of the AFF2 gene repetitive region was developed, allowing the quantification of alleles up to 100 GCCs. Among the 500 female samples, seventeen showed alleles with distinct GCC sizes of which three premutations and two intermediate-sized alleles were confirmed by Southern blotting, the remaining were confirmed to have homozygous AFF2 alleles. The previous multiplex-PCR also failed to detect a second normal-sized allele in eleven samples (allele categorizing followed Murray et al. 1996) 7 . Overall, this TP-PCR assay allows a clear distinction between true homoallelism and the presence of an expanded allele and thus proves to be an accurate, reproducible, and low-cost homozygosity screening strategy. There is little information on FRAXE probably due to its rarity and/or mild phenotypic presentation that can be overlooked in the intellectually disabled population. In our cohort a 1:250 premutation carrier ratio was observed that otherwise would be overlooked as homozygous. This frequency seems quite high when compared to that of FRAXA and justifies per se the importance of such a study. AFF2 intermediate and premutated alleles have been associated with disease due to the production of abnormal mRNA levels, similarly to other neurodegenerative disorders caused by triplet repeat expansions 24 . Association between AFF2 and Parkinson's disease clinical features such as gait disorder, visual hallucination and urinary incontinence have been observed, however studies with larger cohorts are needed to further understand the AFF2 role in the disease progression and manifestation 25,26 . It also highlights the importance of the molecular characterization notably in highly heterogeneous and broad conditions such as non-syndromic intellectual disability, for proper medical care and follow-up as well as accurate genetic counselling 27 . There is a relatively high degree of variation of the FRAXE repeat size in particular, and the extensive data available from this study opens areas of research into understanding phenotype associated www.nature.com/scientificreports/ with relatively unexplored repeat sizes. This could be particularly interesting for the lower repeat sizes occurring with high frequency at FRAXE in this population. As the data can be linked to the newly associated phenotypes, it will provide a resource for researchers worldwide.

Methods
Studies fully respect to the Declaration of Helsinki (1964 and amendments), the Oviedo Convention and its protocols (1997)