Expression dysregulation of the neuron-specific gene, RASGEF1C (RasGEF Domain Family Member 1C), occurs in late-onset neurocognitive disorders (NCDs), such as Alzheimer’s disease. This gene contains a (GGC)13 repeat spanning its core promoter and 5′ untranslated region (RASGEF1C-201 ENST00000361132.9). Here we sequenced the (GGC)-repeat in a sample of human subjects (N = 269), consisting of late-onset NCDs (N = 115) and controls (N = 154). We also studied the status of this STR across various primate and non-primate species based on Ensembl 103. The 6-repeat allele was the predominant allele in the controls (frequency = 0.85) and NCD patients (frequency = 0.78). The NCD genotype compartment consisted of an excess of genotypes that lacked the 6-repeat (divergent genotypes) (Mid-P exact = 0.004). A number of those genotypes were not detected in the control group (Mid-P exact = 0.007). The RASGEF1C (GGC)-repeat expanded beyond 2-repeats specifically in primates, and was at maximum length in human. We conclude that there is natural selection for the 6-repeat allele of the RASGEF1C (GGC)-repeat in human, and significant divergence from that allele in late-onset NCDs. STR alleles that are predominantly abundant and genotypes that deviate from those alleles are underappreciated features, which may have deep evolutionary and pathological consequences.
Short tandem repeats (STRs), also known as microsatellites/simple sequence repeats, are an important source of evolutionary and pathological processes1,2,3,4,5. Numerous STRs within gene regulatory regions may have a link with the evolution of human and non-human primates through various mechanisms, such as gene expression regulation6,7,8,9. In a number of instances, there are indications of a link between STRs and late-onset neurocognitive disorders (NCDs) in human such as Alzheimer’s disease (AD) and Parkinson’s disease (PD)10,11,12.
RASGEF1C (RasGEF Domain Family Member 1C), located on chromosome 5q35.3, contains a (GGC)-repeat of 13-repeats, spanning its core promoter and 5′ UTR (RASGEF1C-201 ENST00000361132.9)13. Based on Ensembl 103 (ensembl.org), the transcript containing the (GGC)-repeat is at the highest support level annotated for the transcript isoforms of this gene (TSL:1). The protein encoded by RASGEF1C is a guanine nucleotide exchange factor (GEF) (https://www.genecards.org/cgi-bin/carddisp.pl?gene=RASGEF1C), and primarily interacts with a number of the ZNF family members, such as ZNF507, ZNF235, ZNF25, and ZNF612 (https://version11.string-db.org/cgi/network.pl?taskId=QkiV955revRw). In human, RASGEF1C is predominantly expressed in the brain (https://www.proteinatlas.org/ENSG00000146090-RASGEF1C/tissue), and aberrant regulation of this gene occurs in late-onset NCDs, such as AD14.
The abundance of polymorphic CGG repeats in the human genome suggests a broad involvement in neurological diseases15. Large GGC expansions are strictly linked to neurological disorders of predominant neurocognitive impairment, such as fragile X-associated tremor and ataxia syndrome, neuronal intranuclear inclusion disease (NIID), and oculopharyngeal muscular dystrophy16,17,18,19,20,21.
Here we sequenced the RASGEF1C (GGC)-repeat in a sample of humans, consisting of late-onset NCDs and controls. We also analyzed the status of this STR across several primate and non-primate species.
Materials and methods
Two hundred sixty-nine unrelated Iranian subjects of ≥ 60 years of age, consisting of late-onset NCD patients (n = 115) and controls (n = 154), were recruited from the provinces of Tehran, Qazvin, and Rasht. In each NCD case, the Persian version22 of the Abbreviated Mental Test Score (AMTS)23 was implemented (AMTS < 7 was an inclusion criterion for NCD), medical records were reviewed in all participants, and CT-scans were obtained where possible. Furthermore, in a number of subjects, the Mini-Mental State Exam (MMSE) Test24 was implemented in addition to the AMTS. An MMSE score of < 24 was an inclusion criterion for NCD.
The AMTS is currently one of the most accurate primary screening instruments to increase the probability of NCD25. The Persian version of the AMTS is a valid cognitive assessment tool for older Iranian adults, and can be used for NCD screening in Iran22.
The control group was selected based on AMTS of > 7 and MMSE > 24, lack of major medical history, and normal CT-scan where possible. The cases and controls were matched based on age, gender, ethnicity, and residential district. The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research. All methods were performed in accordance with the relevant guidelines and regulations.
Allele and genotype analysis of the RASGEF1C (GGC)-repeat
Genomic DNA was obtained from peripheral blood using a standard salting out method. PCR reactions for the amplification of the RASGEF1C (GGC)-repeat were set up with the following primers.
PCR reactions were carried out with a GC-TEMPase 2 × master mix (Amplicon) in a thermocycler (Peqlab-PEQStar) under the following touchdown PCR conditions: 95 °C for 5 min; 20 cycles of denaturation at 95 °C for 45 s, annealing for 45 s at 67 °C (decreasing by 0.5 °C per cycle), and extension at 72 °C for 1 min; 30 cycles of denaturation at 95 °C for 40 s, annealing at 57 °C for 45 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min. Every sample included in this study was genotyped by Sanger sequencing with the forward primer, using an ABI 3130 DNA sequencer (Supplementary Information 1 and 2).
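The touchdown program can be laid out cycle by cycle. The following is a minimal sketch, assuming that the first touchdown cycle anneals at 67 °C and that each subsequent cycle is 0.5 °C lower; the function names are illustrative, and ramp times between steps are ignored.

```python
def touchdown_annealing_temps(start_c=67.0, step_c=0.5, touchdown_cycles=20,
                              plateau_c=57.0, plateau_cycles=30):
    """Return the annealing temperature (in °C) of every cycle, in order."""
    # Assumption: cycle 1 anneals at start_c, then drops by step_c per cycle.
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    plateau = [plateau_c] * plateau_cycles
    return touchdown + plateau


def total_hold_seconds():
    """Sum of the programmed hold times only (ramping is not included)."""
    initial_denaturation = 5 * 60
    touchdown_phase = 20 * (45 + 45 + 60)   # denature + anneal + extend
    plateau_phase = 30 * (40 + 45 + 60)
    final_extension = 10 * 60
    return initial_denaturation + touchdown_phase + plateau_phase + final_extension


temps = touchdown_annealing_temps()
print(len(temps), temps[0], temps[19], temps[20])
print(total_hold_seconds() / 60, "min of programmed hold time")
```

Under these assumptions, the last touchdown cycle anneals at 57.5 °C, just above the 57 °C used for the remaining 30 cycles.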
Analysis of the RASGEF1C (GGC)-repeat across vertebrates
Ensembl 103 (https://www.ensembl.org/index.html) was used to analyze the interval between + 1 and + 100 of the TSS of the RASGEF1C in all the species in which this gene was annotated and the relevant region was sequenced. The CodonCode Aligner (https://www.codoncode.com) and Ensembl alignment programs (http://www.ensembl.org) were implemented for the sequence alignments across the species.
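Repeat length in a retrieved interval can be checked programmatically. The sketch below is an illustration only and not the procedure used in this study: it counts the longest uninterrupted run of GGC units in a sequence string, reading the repeat in the GGC frame. The function name and the toy sequence are hypothetical.

```python
import re


def longest_ggc_run(sequence):
    """Return the largest number of consecutive GGC units in `sequence`."""
    # Each match is a maximal uninterrupted run of GGC units.
    runs = re.findall(r"(?:GGC)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


toy_sequence = "ATGGCGGCGGCTT"  # hypothetical example, not a RASGEF1C sequence
print(longest_ggc_run(toy_sequence))
```

The same run could equally be read in the GCG or CGG frame; a fuller analysis would need to handle those frames and any interruptions within the repeat.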
Statement of ethics
The subjects' informed consent was obtained (from their guardians where necessary) and their identities remained confidential throughout the study. The research was approved by the Ethics Committee of the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran, and was consistent with the principles outlined in an internationally recognized standard for the ethical conduct of human research.
Predominant abundance of the RASGEF1C (GGC)6 in human
We detected six alleles at 5, 6, 7, 8, 9, and 11-repeats, of which the predominant allele was the 6-repeat (Figs. 1 and 2). The frequency of (GGC)6 was at 0.85 and 0.78 in the controls and NCD group, respectively (Fig. 2). At significantly lower frequencies, the 8 and 11 repeats ranked next in the NCD group and controls, respectively.
Significant enrichment of divergent genotypes (genotypes that lacked the 6-repeat) in the NCD group
We detected significant enrichment of genotypes that lacked the 6-repeat allele in the NCD group. Eleven out of 115 patients harbored such genotypes (Mid-P exact = 0.004) (Table 1, Figs. 3 and 4), whereas 3 out of 154 controls harbored them (p = 0.05). The divergent genotypes consisted of the 7, 8, 9 and 11 repeat alleles, and both heterozygous and homozygous genotypes were detected in that genotype compartment.
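A mid-P exact comparison of the two groups can be sketched as follows. This is a minimal illustration, assuming a one-sided (upper-tail) mid-P computed from the hypergeometric distribution of a 2 × 2 table with fixed margins. The text does not state the sidedness or the exact procedure behind the reported values, so the output need not reproduce them. The function name is hypothetical.

```python
from math import comb


def mid_p_upper(a, b, c, d):
    """One-sided (upper-tail) mid-P value for a 2x2 table.

    a, b: cases with / without the feature
    c, d: controls with / without the feature
    """
    n_cases = a + b
    n_feature = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_feature)

    def pmf(k):
        # Hypergeometric probability of k cases carrying the feature.
        return comb(n_cases, k) * comb(n_total - n_cases, n_feature - k) / denom

    k_max = min(n_cases, n_feature)
    upper_tail = sum(pmf(k) for k in range(a + 1, k_max + 1))
    # Mid-P counts only half of the probability of the observed table.
    return upper_tail + 0.5 * pmf(a)


# Divergent genotypes: 11 of 115 patients vs. 3 of 154 controls.
p = mid_p_upper(11, 104, 3, 151)
print(f"one-sided mid-P = {p:.4f}")
```

Counting only half of the observed table's probability makes the mid-P less conservative than the standard Fisher exact p-value, which counts it in full.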
Among the divergent genotypes, in 5 patients (4% of the NCD group) we detected genotypes that were not detected in the control group (hence the term “disease-only”) (Mid-P exact = 0.007) (Table 1, Fig. 4).
Patients harboring the divergent genotypes spanned a wide age range, from 60 to 78 years, and showed moderate to severe neurocognitive dysfunction. Possible diagnoses also varied, such as AD in patients 1, 5, 9, and 10 and vascular dementia in patients 2 and 11.
In line with a higher frequency of the 8-repeat in the NCD group, we found a significant excess of the 8/8 genotype in this group in comparison to the control group (Mid-P exact = 0.01).
Although not statistically significant (p = 0.05), two control individuals harbored the 11/11 genotype, which was not detected in the NCD group (Table 1). The frequency of the 11-repeat allele was also found to be higher in the controls vs. NCDs.
RASGEF1C (GGC)-repeat expanded specifically in primates, and was at maximum length in human
Across all the species studied, the (GGC)-repeat was at maximum length in human. While in primates the minimum repeat length was 4-repeats (Fig. 5), the maximum length of (GGC)-repeat detectable in non-primates was 2-repeats (Fig. 6), indicating that this STR expanded specifically in primates.
We propose that there is natural selection for the 6-repeat of the RASGEF1C (GGC)n in human. This proposition is based not only on the predominant abundance of the 6-repeat allele in the human subjects studied, but also on the significant enrichment of divergent genotypes lacking this allele in the NCD compartment. A number of divergent genotypes were detected in the NCD group only. Evidence of natural selection for an abundant allele in human has been previously reported by our group in the case of the exceptionally long CA-repeat in the core promoter of the human NHLH2 gene, and enrichment of genotypes lacking the predominantly abundant allele (the 21-repeat) in patients afflicted with late-onset NCD1. It is commonly assumed that genes influencing health in later life are not subject to natural selection. However, findings on the APOE alleles and several other NCD susceptibility loci27,28 indicate that natural selection does act on such alleles. RASGEF1C may also be linked to other yet to be identified phenotypes, which may impose natural selection at the (GGC)-repeat.
Based on the AceView database29, in comparison with several primates, the brain expression of RASGEF1C falls in the lowest expression quantile in human (https://www.ncbi.nlm.nih.gov/IEB/Research/Acembly), which coincides with the maximum length of the (GGC)-repeat in human vs. all other primates. A (GGC)-repeat of similar length range in another gene, Reelin (RELN), which is critical to neurodevelopment in human, shows declining luciferase activity with increasing GGC repeat number30. However, it should be noted that the RASGEF1C (GGC)-repeat is only one of a number of regulatory factors that may affect gene expression, and a link can only be established through future studies.
While aberrant repeat-associated non-AUG (RAN) translation, DNA hypermethylation, and polyglycine protein translation are established mechanisms linked to large (GGC)-repeat expansions (> 100 repeats) in a spectrum of neurological disorders in human5,31,32, the mechanisms underlying repeat selection at the range of the RASGEF1C (GGC)-repeat, and its link to late-onset NCD, need to be clarified in future studies.
Despite the high prevalence and debilitating characteristics of late-onset NCDs, genetic studies in this group of disorders have, for the most part, identified genes of mild to modest effect33. In a novel approach, we selected the patient group based on late-onset NCD as an entity, without differentiating the NCD subtypes. The advantage of this approach was to eliminate the often-ambiguous diagnoses made for the NCD subtypes, which frequently co-occur and overlap with respect to the clinical and pathophysiological manifestations34,35,36, and are associated with “probable” and “possible” conclusions for the most part (DSM-5).
The RASGEF1C (GGC)-repeat expanded beyond 2-repeats in primates, and was at maximum length in human. It may be speculated that this locus participates in characteristics and phenotypes that have dramatically diverged in human, such as the higher order brain functions.
Our data warrant further functional studies on the (GGC)-repeat and sequencing this repeat in larger sample sizes and various human populations afflicted with major neurological disorders.
We provide a pilot study on repeat length selection at the human RASGEF1C (GGC)-repeat, at 6-repeats, and significant enrichment of genotypes lacking this allele in patients with late-onset NCD. Indications of natural selection for predominantly abundant STR alleles, and of divergent genotypes, reveal a previously underappreciated feature of STRs in human evolution and disease.
AMTS: Abbreviated Mental Test Score
MMSE: Mini-Mental State Exam
RASGEF1C: RasGEF Domain Family Member 1C
STR: Short tandem repeat
TSS: Transcription start site
Afshar, H. et al. Natural selection at the NHLH2 core promoter exceptionally long CA-repeat in human and disease-only genotypes in late-onset neurocognitive disorder. Gerontology 66(5), 514–522. https://doi.org/10.1159/000509471 (2020) (Epub 2020 Sep 2).
Sulovari, A. et al. Human-specific tandem repeat expansion and differential gene expression during primate evolution. Proc. Natl. Acad. Sci. U. S. A. 116(46), 23243–23253. https://doi.org/10.1073/pnas.1912175116 (2019) (Epub 2019 Oct 28).
Flynn, J. M., Caldas, I., Cristescu, M. E. & Clark, A. G. Selection constrains high rates of tandem repetitive DNA mutation in Daphnia pulex. Genetics 207(2), 697–710 (2017).
Watts, P. C. et al. Stabilizing selection on microsatellite allele length at arginine vasopressin 1a receptor and oxytocin receptor loci. Proc. R. Soc. B Biol. Sci. 284(1869), 20171896 (2017).
Hannan, A. J. Tandem repeats mediating genetic plasticity in health and disease. Nat. Rev. Genet. 19(5), 286–298. https://doi.org/10.1038/nrg.2017.115 (2018) (Epub 2018 Feb 5).
Khademi, E. et al. Support for “Disease-Only” genotypes and excess of homozygosity at the CYTH4 primate-specific GTTT-repeat in Schizophrenia. Genet. Test Mol. Biomark. 21(8), 485–490. https://doi.org/10.1089/gtmb.2016.0422 (2017) (Epub 2017 Jul 19).
Bushehri, A., Barez, M. R., Mansouri, S. K., Biglarian, A. & Ohadi, M. Genome-wide identification of human- and primate-specific core promoter short tandem repeats. Gene 587(1), 83–90. https://doi.org/10.1016/j.gene.2016.04.041 (2016) (Epub 2016 Apr 22).
Mohammadparast, S., Bayat, H., Biglarian, A. & Ohadi, M. Exceptional expansion and conservation of a CT-repeat complex in the core promoter of PAXBP1 in primates. Am. J. Primatol. 76(8), 747–756 (2014).
Bilgin Sonay, T. et al. Tandem repeat variation in human and great ape populations and its impact on gene expression divergence. Genome Res. 25(11), 1591–1599. https://doi.org/10.1101/gr.190868.115 (2015) (Epub 2015 Aug 19).
Afshar, H. et al. Evolving evidence on a link between the ZMYM3 exceptionally long GA-STR and human cognition. Sci. Rep. 10(1) (2020).
Rosas, I. et al. Role for ATXN1, ATXN2, and HTT intermediate repeats in frontotemporal dementia and Alzheimer’s disease. Neurobiol. Aging. 87, 139.e1-139.e7. https://doi.org/10.1016/j.neurobiolaging.2019.10.017 (2020) (Epub 2019 Nov 1).
Darvish, H. et al. Biased homozygous haplotypes across the human caveolin 1 upstream purine complex in Parkinson’s disease. J. Mol. Neurosci. 51(2), 389–393. https://doi.org/10.1007/s12031-013-0021-9 (2013) (Epub 2013 May 4).
Namdar-Aligoodarzi, P. et al. Exceptionally long 5′ UTR short tandem repeats specifically linked to primates. Gene 569(1), 88–94 (2015).
Li, Q. S., Sun, Y. & Wang, T. Epigenome-wide association study of Alzheimer’s disease replicates 22 differentially methylated positions and 30 differentially methylated regions. Clin. Epigenetics 12(1), 1–14 (2020).
Annear, D. J. et al. Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease. Sci. Rep. 11(1), 1–11 (2021).
Jiao, B. et al. Identification of expanded repeats in NOTCH2NLC in neurodegenerative dementias. Neurobiol. Aging. 89, 1421.e1-e7 (2020).
LaCroix, A. J. et al. GGC repeat expansion and exon 1 methylation of XYLT1 is a common pathogenic variant in Baratela-Scott syndrome. Am. J. Hum. Genet. 104(1), 35–44 (2019).
Ma, D. et al. Association of NOTCH2NLC repeat expansions with Parkinson disease. JAMA Neurol. 77(12), 1559–1563 (2020).
Sone, J. et al. Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease. Nat. Genet. 51(8), 1215–1221 (2019).
Ajjugal, Y., Kolimi, N. & Rathinavelan, T. Secondary structural choice of DNA and RNA associated with CGG/CCG trinucleotide repeat expansion rationalizes the RNA misprocessing in FXTAS. Sci. Rep. 11(1), 1–17 (2021).
Kumutpongpanich, T. et al. Clinicopathologic features of oculopharyngodistal myopathy with LRP12 CGG repeat expansions compared with other oculopharyngodistal myopathy subtypes. JAMA Neurol. 78(7), 853–863. https://doi.org/10.1001/jamaneurol.2021.1509 (2021).
Foroughan, M. et al. Validity and reliability of Abbreviated Mental Test Score (AMTS) among older Iranian. Psychogeriatrics 17(6), 460–465 (2017).
Hodkinson, H. Evaluation of a mental test score for assessment of mental impairment in the elderly. Age Ageing 1(4), 233–238 (1972).
Folstein, M. A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198 (1975).
Carpenter, C. R. et al. Accuracy of dementia screening instruments in emergency medicine: A diagnostic meta-analysis. Acad. Emerg. Med. Off. J. Soc. Acad. Emerg. Med 26(2), 226–245 (2019).
Sullivan, K. M., Dean, A. & Soe, M. M. OpenEpi: A web-based epidemiologic and statistical calculator for public health. Public Health Rep. 124(3), 471–474. https://doi.org/10.1177/003335490912400320 (2009).
Drenos, F. & Kirkwood, T. B. Selection on alleles affecting human longevity and late-life disease: the example of apolipoprotein E. PLoS One. 5(4), e10022 (2010).
Raj, T. et al. Alzheimer disease susceptibility loci: Evidence for a protein network under natural selection. Am. J. Hum. Genet. 90(4), 720–726 (2012).
Thierry-Mieg, D. & Thierry-Mieg, J. AceView: A comprehensive cDNA-supported gene and transcripts annotation. Genome Biol. 7(1), 1–14 (2006).
Persico, A. M., Levitt, P. & Pimenta, A. F. Polymorphic GGC repeat differentially regulates human reelin gene expression levels. J. Neural Transm. (Vienna). 113(10), 1373–1382. https://doi.org/10.1007/s00702-006-0441-6 (2006) (Epub 2006 Apr 11).
Sutcliffe, J. S. et al. DNA methylation represses FMR-1 transcription in fragile X syndrome. Hum. Mol. Genet. 1(6), 397–400. https://doi.org/10.1093/hmg/1.6.397 (1992).
Boivin, M. et al. Translation of GGC repeat expansions into a toxic polyglycine protein in NIID defines a novel class of human genetic disorders: The polyG diseases. Neuron 109(11), 1825-1835.e5. https://doi.org/10.1016/j.neuron.2021.03.038 (2021) (Epub 2021 Apr 21).
de Frutos-Lucas, J. et al. Does APOE genotype moderate the relationship between physical activity, brain health and dementia risk? A systematic review. Ageing Res. Rev. 64, 101173 (2020).
Karantzoulis, S. & Galvin, J. E. Distinguishing Alzheimer’s disease from other major forms of dementia. Expert Rev. Neurother. 11(11), 1579–1591 (2011).
Lin, Y.-F. et al. Genetic overlap between vascular pathologies and Alzheimer’s dementia and potential causal mechanisms. Alzheimers Dement. 15(1), 65–75 (2019).
Noori, A., Mezlini, A. M., Hyman, B. T., Serrano-Pozo, A. & Das, S. Systematic review and meta-analysis of human Transcriptomics reveals Neuroinflammation, deficient energy metabolism, and Proteostasis failure across Neurodegeneration. Neurobiol. Dis. 167, 105225 (2020).
We wish to thank all the participants taking part in this research. This research was funded by the University of Social Welfare and Rehabilitation Sciences.
This research was funded by the University of Social Welfare and Rehabilitation Sciences, Tehran, Iran.
The authors declare no competing interests.
Cite this article
Jafarian, Z., Khamse, S., Afshar, H. et al. Natural selection at the RASGEF1C (GGC) repeat in human and divergent genotypes in late-onset neurocognitive disorder. Sci Rep 11, 19235 (2021). https://doi.org/10.1038/s41598-021-98725-y