Introduction

Short tandem repeats (STRs), also known as microsatellites or simple sequence repeats, are important contributors to evolutionary and pathological processes1,2,3,4,5. Numerous STRs within gene regulatory regions may be linked to the evolution of human and non-human primates through various mechanisms, such as regulation of gene expression6,7,8,9. In a number of instances, there are indications of a link between STRs and late-onset neurocognitive disorders (NCDs) in humans, such as Alzheimer’s disease (AD) and Parkinson’s disease (PD)10,11,12.

RASGEF1C (RasGEF Domain Family Member 1C), located on chromosome 5q35.3, contains a (GGC)-repeat of 13 repeats that spans its core promoter and 5′ UTR (RASGEF1C-201 ENST00000361132.9)13. Based on Ensembl 103 (ensembl.org), the transcript containing the (GGC)-repeat has the highest transcript support level (TSL:1) among the annotated isoforms of this gene. The protein encoded by RASGEF1C is a guanine nucleotide exchange factor (GEF) (https://www.genecards.org/cgi-bin/carddisp.pl?gene=RASGEF1C), and primarily interacts with a number of the ZNF family members, such as ZNF507, ZNF235, ZNF25, and ZNF612 (https://version11.string-db.org/cgi/network.pl?taskId=QkiV955revRw). In humans, RASGEF1C is predominantly expressed in the brain (https://www.proteinatlas.org/ENSG00000146090-RASGEF1C/tissue), and aberrant regulation of this gene occurs in late-onset NCDs, such as AD14.

The abundance of polymorphic CGG repeats in the human genome suggests a broad involvement in neurological diseases15. Large GGC expansions are strictly linked to neurological disorders of predominant neurocognitive impairment, such as fragile X-associated tremor/ataxia syndrome, neuronal intranuclear inclusion disease (NIID), and oculopharyngeal muscular dystrophy16,17,18,19,20,21.

Here we sequenced the RASGEF1C (GGC)-repeat in a human sample consisting of patients with late-onset NCDs and controls. We also analyzed the status of this STR across several primate and non-primate species.

Materials and methods

Subjects

Two hundred sixty-nine unrelated Iranian subjects of ≥ 60 years of age, consisting of late-onset NCD patients (n = 115) and controls (n = 154), were recruited from the provinces of Tehran, Qazvin, and Rasht. In each NCD case, the Persian version22 of the Abbreviated Mental Test Score (AMTS)23 was administered (AMTS < 7 was an inclusion criterion for NCD), medical records were reviewed for all participants, and CT scans were obtained where possible. Furthermore, in a number of subjects, the Mini-Mental State Exam (MMSE)24 was administered in addition to the AMTS; an MMSE score of < 24 was an inclusion criterion for NCD.

The AMTS is currently one of the most accurate primary screening instruments for identifying probable NCD25. The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults, and can be used for NCD screening in Iran22.

The control group was selected based on an AMTS of > 7 and an MMSE of > 24, lack of major medical history, and a normal CT scan where possible. The cases and controls were matched based on age, gender, ethnicity, and residential district. The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research. All methods were performed in accordance with the relevant guidelines and regulations.

Allele and genotype analysis of the RASGEF1C (GGC)-repeat

Genomic DNA was obtained from peripheral blood using a standard salting out method. PCR reactions for the amplification of the RASGEF1C (GGC)-repeat were set up with the following primers.

Forward: GAGGGTGAACTGGGTTTTGG.

Reverse: ACTCTAGCGGCTGAAAGAAG.
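As a quick sanity check on the primer pair above, the following minimal Python sketch computes the GC fraction of each primer and provides a reverse-complement helper, which is convenient when a (GGC)-repeat is read on the opposite strand as (GCC). The function names are illustrative assumptions and are not part of the study's workflow.

```python
# Primer sequences as listed in the text (5' to 3').
FORWARD = "GAGGGTGAACTGGGTTTTGG"
REVERSE = "ACTCTAGCGGCTGAAAGAAG"

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(primer), "nt, GC fraction:", gc_fraction(primer))
```

Both primers are 20 nucleotides long; the helper only handles the four unambiguous bases and would need extending for IUPAC ambiguity codes.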

PCR reactions were carried out with a GC-TEMPase 2 × master mix (Amplicon) in a thermocycler (Peqlab-PEQStar) using a touchdown protocol: initial denaturation at 95 °C for 5 min; 20 cycles of denaturation at 95 °C for 45 s, annealing for 45 s starting at 67 °C (decreasing by 0.5 °C per cycle), and extension at 72 °C for 1 min; 30 cycles of denaturation at 95 °C for 40 s, annealing at 57 °C for 45 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. Every sample included in this study was genotyped by Sanger sequencing with the forward primer, using an ABI 3130 DNA sequencer (Supplementary Information 1 and 2).
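The annealing temperatures implied by the touchdown protocol can be listed with a short Python sketch. It assumes that the first touchdown cycle anneals at 67 °C and that each subsequent touchdown cycle is 0.5 °C lower, so the last touchdown cycle anneals at 57.5 °C before the 30 fixed cycles at 57 °C. The function name and parameters are illustrative.

```python
def touchdown_annealing(start_c=67.0, step_c=0.5, touchdown_cycles=20,
                        fixed_c=57.0, fixed_cycles=30):
    """Return the annealing temperature (deg C) for every PCR cycle.

    Assumption: cycle 1 anneals at start_c and the temperature drops by
    step_c on each subsequent touchdown cycle.
    """
    temps = [start_c - step_c * i for i in range(touchdown_cycles)]
    temps += [fixed_c] * fixed_cycles
    return temps


if __name__ == "__main__":
    schedule = touchdown_annealing()
    for cycle, temp in enumerate(schedule, start=1):
        print(f"cycle {cycle:2d}: anneal at {temp:.1f} C")
```

Under these assumptions the protocol comprises 50 amplification cycles in total.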

Analysis of the RASGEF1C (GGC)-repeat across vertebrates

Ensembl 103 (https://www.ensembl.org/index.html) was used to analyze the interval between + 1 and + 100 of the TSS of RASGEF1C in all species in which this gene was annotated and the relevant region was sequenced. The CodonCode Aligner (https://www.codoncode.com) and Ensembl alignment programs (http://www.ensembl.org) were used for the sequence alignments across species.
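Counting the repeat length within such an interval can be sketched as follows. This is a minimal Python illustration and not the tool used in the study: it finds the longest uninterrupted run of a motif in a sequence and also checks the reverse-complement motif, since a (GGC)-repeat appears as (GCC) on the opposite strand. The function names and the toy sequences are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_tandem_run(sequence: str, motif: str = "GGC") -> int:
    """Return the largest number of consecutive, uninterrupted motif copies."""
    pattern = re.compile(f"(?:{re.escape(motif.upper())})+")
    runs = [len(m.group(0)) // len(motif)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def longest_run_either_strand(sequence: str, motif: str = "GGC") -> int:
    """Longest run of the motif or of its reverse complement."""
    rc_motif = motif.upper().translate(_COMPLEMENT)[::-1]
    return max(longest_tandem_run(sequence, motif),
               longest_tandem_run(sequence, rc_motif))


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real RASGEF1C data.
    toy_long = "ACGT" + "GGC" * 6 + "TTA"
    toy_short = "AAGGCGGCTT"
    print(longest_tandem_run(toy_long))
    print(longest_tandem_run(toy_short))
```

In practice the input would be the + 1 to + 100 window extracted from each species' annotated sequence; interrupted repeats are counted as separate runs by this sketch.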

Statistical analysis

The P-values were calculated using the Two-by-Two Table of the OpenEpi calculator (https://www.openepi.com/TwobyTwo/TwobyTwo.htm)26.
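For readers who want to see how a mid-P exact value for a two-by-two table can be obtained, the sketch below uses the hypergeometric distribution with the table margins fixed. It is an illustration under stated assumptions, not OpenEpi's implementation: the one-sided value is the upper tail with half the probability of the observed table, and the two-sided value simply doubles the smaller tail. Results from OpenEpi may therefore differ slightly. The function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X = k) when drawing `draws` items from `total`, `successes` marked."""
    return comb(successes, k) * comb(total - successes, draws - k) / comb(total, draws)


def mid_p_upper(a: int, b: int, c: int, d: int) -> float:
    """Upper-tail mid-P for the table [[a, b], [c, d]] with fixed margins.

    a: cases with the feature, b: cases without,
    c: controls with the feature, d: controls without.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(hypergeom_pmf(k, total, col1, row1) for k in range(a + 1, k_max + 1))
    return tail + 0.5 * hypergeom_pmf(a, total, col1, row1)


def mid_p_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided mid-P by doubling the smaller tail (one common convention)."""
    upper = mid_p_upper(a, b, c, d)
    lower = 1.0 - upper  # lower-tail mid-P is the complement of the upper one
    return min(1.0, 2.0 * min(upper, lower))


if __name__ == "__main__":
    # Counts for genotypes lacking the 6-repeat: 11 of 115 patients, 3 of 154 controls.
    print(mid_p_upper(11, 104, 3, 151))
    print(mid_p_two_sided(11, 104, 3, 151))
```

Exact integer binomial coefficients from `math.comb` avoid the overflow and rounding problems of factorial-based floating-point formulas for sample sizes in this range.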

Statement of ethics

The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research.

Results

Predominant abundance of the RASGEF1C (GGC)6 in human

We detected six alleles, at 5, 6, 7, 8, 9, and 11 repeats, of which the predominant allele was the 6-repeat (Figs. 1 and 2). The frequency of (GGC)6 was 0.85 in the controls and 0.78 in the NCD group (Fig. 2). The next most frequent alleles, at much lower frequencies, were the 8-repeat in the NCD group and the 11-repeat in the controls.

Figure 1

Electropherogram of the predominantly abundant allele at 6-repeats in the human RASGEF1C gene, in the context of a 6/6 genotype.

Figure 2

Allele frequency of the RASGEF1C (GGC)-repeat in NCD patients and controls. The 6-repeat was the predominant allele in both groups.

Significant enrichment of divergent genotypes (genotypes that lacked the 6-repeat) in the NCD group

We detected significant enrichment of genotypes that lacked the 6-repeat allele in the NCD group. Eleven of 115 patients harbored such genotypes (Mid-P exact = 0.004) (Table 1, Figs. 3 and 4), whereas 3 of 154 controls did so (p = 0.05). The divergent genotypes consisted of the 7-, 8-, 9-, and 11-repeat alleles, and included both heterozygous and homozygous genotypes.
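The bookkeeping behind allele frequencies and divergent genotypes can be sketched in a few lines of Python. The genotype list below is a toy example, not the study data, and the function names are assumptions made for illustration. A genotype is represented as a pair of repeat counts and is called divergent when neither allele is the 6-repeat.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of (allele1, allele2) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def is_divergent(genotype, reference_allele=6):
    """True when the genotype lacks the predominant (reference) allele."""
    return reference_allele not in genotype


if __name__ == "__main__":
    # Toy genotypes for illustration only (repeat counts per chromosome).
    toy_genotypes = [(6, 6), (6, 6), (6, 8), (8, 8), (7, 9)]
    print(allele_frequencies(toy_genotypes))
    print(sum(is_divergent(g) for g in toy_genotypes), "divergent genotypes")
```

The resulting counts of divergent and non-divergent genotypes per group are what would populate the two-by-two table used for the statistical comparison.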

Table 1 NCD patients and controls harboring divergent genotypes (lacking the 6-repeat).
Figure 3

Genotype frequency of the RASGEF1C (GGC)-repeat in NCD patients and controls. While the 6/6 genotype was the predominant genotype in both groups, an excess of divergent genotypes was detected in the NCD group.

Figure 4

Electropherogram of the non-6 genotypes (divergent genotypes) at the RASGEF1C (GGC) in NCD patients. (A) 7/7, (B) 7/8, (C) 7/9, (D) 8/8, (E) 8/11.

Among the divergent genotypes, in 5 patients (4% of the NCD group) we detected genotypes that were not detected in the control group (hence the term “disease-only”) (Mid-P exact = 0.007) (Table 1, Fig. 4).

Patients harboring the divergent genotypes spanned a wide age range, from 60 to 78 years, and showed moderate to severe neurocognitive dysfunction. Possible diagnoses also varied, and included AD in patients 1, 5, 9, and 10 and vascular dementia in patients 2 and 11.

In line with a higher frequency of the 8-repeat in the NCD group, we found a significant excess of the 8/8 genotype in this group in comparison to the control group (Mid-P exact = 0.01).

Although not statistically significant (p = 0.05), two control individuals harbored the 11/11 genotype, which was not detected in the NCD group (Table 1). The frequency of the 11-repeat allele was also found to be higher in the controls vs. NCDs.

RASGEF1C (GGC)-repeat expanded specifically in primates, and was at maximum length in human

Across all the species studied, the (GGC)-repeat was at maximum length in human. While in primates the minimum repeat length was 4-repeats (Fig. 5), the maximum length of (GGC)-repeat detectable in non-primates was 2-repeats (Fig. 6), indicating that this STR expanded specifically in primates.

Figure 5

Sequence alignment of the RASGEF1C (GGC)-repeat across primate species, using CodonCode. The (GGC)-repeat was at maximum length in human.

Figure 6

Sequence alignment of the RASGEF1C (GGC)-repeat across non-primates, using the Ensembl alignment program. Human is depicted as the reference sequence, and the reverse strand is shown. A (GGC)-repeat of > 2 repeats was not detected in non-primates.

Discussion

We propose that there is natural selection for the 6-repeat of the RASGEF1C (GGC)n in human. This proposition is based not only on the predominant abundance of the 6-repeat allele in the human subjects studied, but also on the significant enrichment of divergent genotypes, which lack this allele, in the NCD group. A number of divergent genotypes were detected in the NCD group only. Evidence of natural selection for an abundant allele in human has been previously reported by our group in the instance of the exceptionally long CA-repeat in the core promoter of the human NHLH2 gene, where genotypes lacking the predominantly abundant allele (the 21-repeat) were enriched in patients afflicted with late-onset NCD1. It is commonly assumed that genes influencing health in later life are not subject to natural selection. However, findings on the APOE alleles and several other NCD susceptibility loci27,28 indicate that natural selection does act on such alleles. RASGEF1C may also be linked to other, yet to be identified phenotypes, which may impose natural selection at the (GGC)-repeat.

Based on the AceView database29, the brain expression of RASGEF1C is at the lowest quantile level in human in comparison with several other primates (https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly), which coincides with the maximum length of the (GGC)-repeat in human vs. all other primates. A (GGC)-repeat of similar length range in another gene, Reelin (RELN), which is critical to neurodevelopment in human, shows declining luciferase activity with increasing GGC repeat number30. However, it should be noted that the RASGEF1C (GGC)-repeat is only one of a number of regulatory factors that may affect gene expression, and a link can only be established through future studies.

While aberrant repeat-associated non-AUG (RAN) translation, DNA hypermethylation, and polyglycine protein translation are established mechanisms linked to large (GGC)-repeat expansions (> 100 repeats) in a spectrum of neurological disorders in human5,31,32, the mechanisms underlying repeat selection at the range of the RASGEF1C (GGC)-repeat, and its link to late-onset NCD, need to be clarified in future studies.

Despite the high prevalence and debilitating characteristics of late-onset NCDs, genetic studies in this group of disorders have for the most part yielded genes with mild to modest effect33. In a novel approach, we selected the patient group based on late-onset NCD as an entity, without differentiating the NCD subtypes. The advantage of this approach was to eliminate the often-ambiguous diagnoses made for the NCD subtypes, which frequently co-occur and overlap with respect to their clinical and pathophysiological manifestations34,35,36, and are associated with “probable” and “possible” conclusions for the most part (DSM-5).

The RASGEF1C (GGC)-repeat expanded beyond 2-repeats in primates, and was at maximum length in human. It may be speculated that this locus participates in characteristics and phenotypes that have dramatically diverged in human, such as the higher order brain functions.

Our data warrant further functional studies on the (GGC)-repeat and sequencing this repeat in larger sample sizes and various human populations afflicted with major neurological disorders.

Conclusion

We provide a pilot study on repeat length selection at the human RASGEF1C (GGC)-repeat, at 6-repeats, and significant enrichment of genotypes lacking this allele in patients with late-onset NCD. Indications of natural selection for predominantly abundant STR alleles, and of divergent genotypes, reveal a previously underappreciated feature of STRs in human evolution and disease.