Natural selection at the RASGEF1C (GGC) repeat in human and divergent genotypes in late-onset neurocognitive disorder

Expression dysregulation of the neuron-specific gene, RASGEF1C (RasGEF Domain Family Member 1C), occurs in late-onset neurocognitive disorders (NCDs), such as Alzheimer’s disease. This gene contains a (GGC)13, spanning its core promoter and 5′ untranslated region (RASGEF1C-201 ENST00000361132.9). Here we sequenced the (GGC)-repeat in a sample of human subjects (N = 269), consisting of late-onset NCDs (N = 115) and controls (N = 154). We also studied the status of this STR across various primate and non-primate species based on Ensembl 103. The 6-repeat allele was the predominant allele in the controls (frequency = 0.85) and NCD patients (frequency = 0.78). The NCD genotype compartment consisted of an excess of genotypes that lacked the 6-repeat (divergent genotypes) (Mid-P exact = 0.004). A number of those genotypes were not detected in the control group (Mid-P exact = 0.007). The RASGEF1C (GGC)-repeat expanded beyond 2-repeats specifically in primates, and was at maximum length in human. We conclude that there is natural selection for the 6-repeat allele of the RASGEF1C (GGC)-repeat in human, and significant divergence from that allele in late-onset NCDs. STR alleles that are predominantly abundant and genotypes that deviate from those alleles are underappreciated features, which may have deep evolutionary and pathological consequences.

Here we sequenced the RASGEF1C (GGC)-repeat in a sample of humans, consisting of late-onset NCDs and controls. We also analyzed the status of this STR across several primate and non-primate species.

Materials and methods
Subjects. Two hundred sixty-nine unrelated Iranian subjects of ≥ 60 years of age, consisting of late-onset NCD patients (n = 115) and controls (n = 154) were recruited from the provinces of Tehran, Qazvin, and Rasht. In each NCD case, the Persian version 22 of the Abbreviated Mental Test Score (AMTS) 23 was implemented (AMTS < 7 was an inclusion criterion for NCD), medical records were reviewed in all participants, and CT-scans were obtained where possible. Furthermore, in a number of subjects, the Mini-Mental State Exam (MMSE) Test 24 was implemented in addition to the AMTS. A score of < 24 was an inclusion criterion for NCD.
The AMTS is currently one of the most accurate primary screening instruments to increase the probability of NCD 25 . The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults, and can be used for NCD screening in Iran 22 .
The control group was selected based on an AMTS score of > 7 and an MMSE score of > 24, lack of major medical history, and normal CT-scan where possible. The cases and controls were matched based on age, gender, ethnicity, and residential district. The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research. All methods were performed in accordance with the relevant guidelines and regulations.
Allele and genotype analysis of the RASGEF1C (GGC)-repeat. Genomic DNA was obtained from peripheral blood using a standard salting out method. PCR reactions for the amplification of the RASGEF1C (GGC)-repeat were set up with the following primers.

Analysis of the RASGEF1C (GGC)-repeat across vertebrates.
Ensembl 103 (https://www.ensembl.org/index.html) was used to analyze the interval between + 1 and + 100 of the TSS of the RASGEF1C in all the species in which this gene was annotated and the relevant region was sequenced. The CodonCode Aligner (https://www.codoncode.com) and Ensembl alignment programs (http://www.ensembl.org) were implemented for the sequence alignments across the species.
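As an illustration of how repeat length could be compared across species once the relevant interval has been retrieved, the following minimal sketch counts the longest uninterrupted run of a repeat unit in a sequence. It is not the procedure used in this study, which relied on alignment programs; the function name and the toy sequences are hypothetical and do not represent Ensembl data.

```python
import re


def longest_repeat_run(sequence: str, unit: str = "GGC") -> int:
    """Return the largest number of uninterrupted tandem copies of `unit`."""
    seq = sequence.upper()
    # The lookahead lets a run start at any position, so runs in every
    # reading frame are considered.
    pattern = re.compile(r"(?=((?:%s)+))" % re.escape(unit.upper()))
    longest = 0
    for match in pattern.finditer(seq):
        longest = max(longest, len(match.group(1)) // len(unit))
    return longest


# Hypothetical toy sequences for illustration only (not real species data).
toy_sequences = {
    "species_a": "ACGTGGCGGCGGCGGCTTA",
    "species_b": "ACGTGGCGGCTTAGGCA",
}
for name, seq in toy_sequences.items():
    print(name, longest_repeat_run(seq))
```

A count of this kind only captures pure, uninterrupted repeats; interrupted repeats would still need to be inspected in the alignment itself.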
Statistical analysis. The P-values were calculated using the Two-by-Two Table of the OpenEpi calculator (https://www.openepi.com/TwobyTwo/TwobyTwo.htm) 26.
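For readers who wish to see what a mid-P exact test on a two-by-two table involves, a minimal sketch is given below. It assumes fixed margins (a hypergeometric null distribution) and a two-sided P-value obtained by doubling the smaller one-sided mid-P; the conventions used by OpenEpi may differ, so the output is not expected to reproduce the reported values exactly. The function name is hypothetical, and the control count in the usage example is a placeholder, not a result of this study.

```python
from math import comb


def mid_p_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided mid-P exact P-value for the 2x2 table [[a, b], [c, d]].

    Margins are treated as fixed, so `a` follows a hypergeometric
    distribution under the null hypothesis of no association.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    denominator = comb(total, col1)

    def pmf(x: int) -> float:
        return comb(row1, x) * comb(total - row1, col1 - x) / denominator

    # Mid-P counts only half of the probability of the observed table.
    half_observed = 0.5 * pmf(a)
    upper = sum(pmf(x) for x in range(a + 1, highest + 1)) + half_observed
    lower = sum(pmf(x) for x in range(lowest, a)) + half_observed
    # Assumed convention: double the smaller one-sided mid-P, capped at 1.
    return min(1.0, 2.0 * min(upper, lower))


# 11 of 115 patients carried divergent genotypes (from the text);
# the control count below is a HYPOTHETICAL placeholder.
hypothetical_control_divergent = 3
p_value = mid_p_two_sided(11, 115 - 11,
                          hypothetical_control_divergent,
                          154 - hypothetical_control_divergent)
print(f"two-sided mid-P = {p_value:.4f}")
```

Compared with the Fisher exact test, the mid-P variant is less conservative because it assigns half weight to the observed table.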

Statement of ethics. The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research.

Results
Predominant abundance of the RASGEF1C (GGC)6 in human. We detected six alleles at 5, 6, 7, 8, 9, and 11-repeats, of which the predominant allele was the 6-repeat (Figs. 1 and 2). The frequency of (GGC)6 was 0.85 and 0.78 in the controls and NCD group, respectively (Fig. 2). At significantly lower frequencies, the 8 and 11 repeats ranked next in the NCD group and controls, respectively.
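Allele frequencies of this kind follow directly from the genotype calls: each subject contributes two alleles, and the frequency of an allele is its count divided by twice the number of subjects. A minimal sketch is shown below, assuming genotypes are recorded as pairs of repeat numbers; the function name and the toy genotypes are hypothetical and are not the study data.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {repeat_number: frequency} from diploid genotypes.

    Each genotype is a pair (allele1, allele2) of repeat numbers.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: count / total for allele, count in sorted(counts.items())}


# Hypothetical toy genotypes for illustration only.
toy_genotypes = [(6, 6), (6, 6), (6, 8), (8, 8), (6, 11)]
print(allele_frequencies(toy_genotypes))
```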
Significant enrichment of divergent genotypes (genotypes that lacked the 6-repeat) in the NCD group. We detected significant enrichment of genotypes that lacked the 6-repeat allele in the NCD group. Eleven out of 115 patients harbored such genotypes (Mid-P exact = 0.004) (Fig. 4).
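The definition of a divergent genotype used here (neither allele is the 6-repeat) can be stated compactly in code. The sketch below is illustrative only, with hypothetical function names and toy genotypes; it returns the counts of divergent and non-divergent genotypes in a group, which form one row of the two-by-two table used for the case-control comparison.

```python
def is_divergent(genotype, predominant=6):
    """True if neither allele of the genotype is the predominant repeat."""
    return predominant not in genotype


def divergent_counts(genotypes, predominant=6):
    """Return (divergent, non_divergent) counts for a group of genotypes."""
    divergent = sum(1 for g in genotypes if is_divergent(g, predominant))
    return divergent, len(genotypes) - divergent


# Hypothetical toy genotypes for illustration only.
toy_group = [(6, 6), (8, 8), (5, 7), (6, 11)]
print(divergent_counts(toy_group))
```

Under this definition a heterozygous genotype such as 6/8 is not divergent, because it retains one copy of the 6-repeat.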
Patients harboring the divergent genotypes spanned a wide age range, from 60 to 78 years, and showed moderate to severe neurocognitive dysfunction. Possible diagnoses also varied, such as AD in patients 1, 5, 9, and 10 and vascular dementia in patients 2 and 11.
In line with a higher frequency of the 8-repeat in the NCD group, we found a significant excess of the 8/8 genotype in this group in comparison to the control group (Mid-P exact = 0.01).
Although not statistically significant (p = 0.05), two control individuals harbored the 11/11 genotype, which was not detected in the NCD group (Table 1). The frequency of the 11-repeat allele was also found to be higher in the controls vs. NCDs.

RASGEF1C (GGC)-repeat expanded specifically in primates, and was at maximum length in human.
Across all the species studied, the (GGC)-repeat was at maximum length in human. While in primates the minimum repeat length was 4-repeats (Fig. 5), the maximum length of (GGC)-repeat detectable in non-primates was 2-repeats (Fig. 6), indicating that this STR expanded specifically in primates.

Discussion
We propose that there is natural selection for the 6-repeat of the RASGEF1C (GGC)n in human. This proposition is based not only on the predominant abundance of the 6-repeat allele in the human subjects studied, but also on the significant enrichment of divergent genotypes lacking this allele in the NCD compartment. A number of divergent genotypes were detected in the NCD group only. Evidence of natural selection for an abundant allele in human has been previously reported by our group in the instance of the exceptionally long CA-repeat in the core promoter of the human NHLH2 gene, and enrichment of genotypes lacking the predominantly abundant allele (the 21-repeat) in patients afflicted with late-onset NCD 1. It is commonly assumed that genes influencing health in later life are not subject to natural selection. However, findings on the APOE alleles and several other NCD susceptibility loci 27,28 indicate that natural selection does act on such alleles. RASGEF1C may also be linked to other yet to be identified phenotypes, which may impose natural selection at the (GGC)-repeat. Based on the AceView database 29, in comparison with several primates, the brain expression of RASGEF1C has the lowest quantile expression level in human (https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly), which [...], which is critical to neurodevelopment in human, shows declining luciferase activity with increasing GGC repeat number 30. However, it should be noted that the RASGEF1C (GGC)-repeat is among a number of other regulatory factors, which may also affect gene expression, and a link can only be established through future studies.
While aberrant repeat-associated non-AUG (RAN) translation, DNA hypermethylation, and polyglycine protein translation are established mechanisms linked to large (GGC)-repeat expansions (> 100 repeats) in a spectrum of neurological disorders in human 5,31,32, the mechanisms underlying repeat selection at the range of the RASGEF1C (GGC)-repeat and also its link to late-onset NCD need to be clarified in future studies.
Despite the high prevalence and debilitating characteristics of late-onset NCDs, genetic studies in this group of disorders have resulted in a number of genes with mild to modest effect for the most part 33. In a novel approach, we selected the patient group based on late-onset NCD as an entity, without differentiating the NCD subtypes. The advantage of this approach was to eliminate the often-ambiguous diagnoses made for the NCD subtypes, which frequently co-occur and overlap with respect to clinical and pathophysiological manifestations 34,35,36, and are associated with "probable" and "possible" conclusions for the most part (DSM-5).
The RASGEF1C (GGC)-repeat expanded beyond 2-repeats in primates, and was at maximum length in human. It may be speculated that this locus participates in characteristics and phenotypes that have dramatically diverged in human, such as the higher order brain functions.
Our data warrant further functional studies on the (GGC)-repeat and sequencing this repeat in larger sample sizes and various human populations afflicted with major neurological disorders.

Conclusion
We provide a pilot study on repeat length selection at the human RASGEF1C (GGC)-repeat, at 6-repeats, and significant enrichment of genotypes lacking this allele in patients with late-onset NCD. Indication of natural selection for predominantly abundant STR alleles, together with divergent genotypes, reveals a previously underappreciated feature of STRs in human evolution and disease.

Data availability
Raw data are available in Supplementary Information 1 and 2.