Identification of a Nuclear Respiratory Factor 1 Recognition Motif in the Apolipoprotein E Variant APOE4 linked to Alzheimer’s Disease

Alzheimer’s disease affects tens of millions of people worldwide and its prevalence continues to rise. It is caused by a combination of a subject’s heredity, environment, lifestyle, and medical condition. The most significant genetic risk factor for late onset Alzheimer’s disease is a variant of the apolipoprotein E gene, APOE4. Here we show that the single nucleotide polymorphism rs429358 that defines APOE4 is located in a short sequence motif repeated several times within exon 4 of apolipoprotein E, reminiscent of the structure of transcriptional enhancers. A JASPAR database search predicts that the T to C transition in rs429358 generates a binding motif for nuclear respiratory factor NRF1. This site appears to be part of a binding site cluster for this transcription factor on exon 4 of APOE. This de novo NRF1 binding site has therefore the potential to affect the expression of multiple genes in its genomic vicinity. Our in silico analysis, suggesting a novel function for APOE4 at the DNA level, offers a potential mechanism for the observed tissue specific neurodegeneration and the role of environmental factors in Alzheimer’s disease etiology.

Human apolipoprotein E (APOE) plays a key role in the regulation of lipid transport in the central nervous system and in the plasma through its interaction with low-density lipoprotein receptors 1 and it is involved in many other biological processes not directly linked to its lipid transport function 2. The APOE gene is polymorphic, with different alleles, designated ε2, ε3 and ε4, at a single gene locus 3. The three major isoforms, APOE ε2 (APOE2), APOE ε3 (APOE3), and APOE ε4 (APOE4), differ from one another by single nucleotide C/T transitions at two locations in exon 4 of APOE, resulting in cysteine/arginine substitutions at two positions: residues 130 and 176 in the synthesized protein containing the signal peptide, corresponding to residues 112 and 158 in the mature APOE protein 4. APOE3 evolved from the ancestral allele APOE4 5 and is the allele with the highest frequency in the present-day human population. It is thus considered the normal isoform for APOE functions 2. APOE2 is associated with the genetic disorder type III hyperlipoproteinemia 3. The APOE4 allele was linked to late-onset familial and sporadic Alzheimer's disease [6][7][8], and genome-wide association studies 9 confirmed the APOE4 locus as the most significant genetic risk factor for Alzheimer's disease. The risk of developing Alzheimer's disease increases with each copy of the APOE4 variant compared with the APOE3/APOE3 genotype: the odds ratio (OR) is 2.6 (APOE2/APOE4) and 3.2 (APOE3/APOE4) with one copy of the APOE4 allele, and increases to 14.9 with two copies (APOE4/APOE4) 10. In contrast, the APOE2 allele is protective against Alzheimer's disease, with an OR of 0.6 for APOE2/APOE2 individuals. APOE4 is also associated with an earlier age of onset: the mean age of clinical onset is 68 years for APOE4 homozygotes versus 84 years for subjects not carrying the APOE4 allele 8.
Clinical and epidemiological data have indicated that, depending on the population and the study, between 40 and 80% of Alzheimer's disease patients are APOE4 carriers 11, with the penetrance of homozygous APOE4 estimated at 60-80% 12. These data show that the magnitude of the effect of APOE4 on Alzheimer's disease is closer to that observed for major genes in Mendelian diseases, such as BRCA1 in breast cancer, than to that of the low-risk common alleles identified by recent genome-wide association studies in complex diseases 13. A series of hypotheses have been proposed to explain the association of the APOE4 allele with Alzheimer's disease: impairment of the antioxidative defense system, dysregulation of neuronal signaling pathways, disruption of cytoskeletal structure and function, altered phosphorylation of microtubule associated protein tau (MAPT) and the formation of neurofibrillary tangles, depletion of cytosolic androgen receptor levels in the brain, potentiation of Aβ-induced lysosomal leakage and apoptosis in neuronal cells, or promotion of endosomal abnormalities linked to Aβ overproduction (reviewed in ref. 14). In the brain, apolipoprotein E is expressed by about 75% of astrocytes under normal conditions, with the highest level of expression in the olfactory bulb and in Bergmann glia in the cerebellum 15,16. Neuronal expression in human brain tissue is barely detectable but is increased in areas affected by ischemia 17. Several Apoe mouse models have been established to study the mechanisms underlying the pathogenic actions of APOE4 and its potential relationship to Alzheimer's disease pathology. However, expression of APOE4 in astrocytes under the control of the glial fibrillary acidic protein promoter did not lead to typical Alzheimer's-like neuropathology 18, nor did aged APOE4 transgenic mouse brains show any evidence of senile plaques 19.
Further, APOE isoforms were expressed under the control of the physiological mouse promoter in Apoe−/− mice to investigate their roles in cardiovascular function 20. Mice having targeted replacements of the intrinsic murine Apoe gene with the three human APOE alleles recapitulate many of the phenotypic cardiovascular effects seen in humans with these same isoforms 20. Even though APOE4 stimulated the accumulation of Aβ42 and hyperphosphorylated tau in these animals at 4 months of age, the formation of tangles and senile plaques was not reported 21. Therefore, the effects of APOE protein isoforms on cholesterol and lipid metabolism are faithfully represented in animal models, but these models do not display typical Alzheimer's disease hallmarks as a consequence of human APOE4 isoform protein expression. Further, it is striking that most mammals carry the arginine found in the APOE4 isoform at position 130, indicating that this protein structure is sufficient to perform all physiological functions of APOE. Given these considerations, we hypothesized that the APOE4 allele may not cause Alzheimer's disease solely through the resulting change in the protein sequence but may also act at the DNA level to control the expression of genes located in the vicinity of APOE4. To test this, using a bioinformatics approach, we examined APOE exon 4 for the presence of sequence elements typically observed in transcriptional enhancers, including transcription factor binding motifs and short repeat sequences.

Results
Analysis of the DNA Sequence Overlapping the APOE ε Alleles. The APOE ε alleles are determined by two SNPs, rs429358 and rs7412. APOE4 harbors a C at position 19:44908684 and at position 19:44908822. APOE2 harbors a T at both positions, while APOE3 harbors a T at position 19:44908684 and a C at position 19:44908822. Both SNPs are located inside exon 4 of the APOE gene, 138 bp apart (Fig. 1a). We aligned the genomic DNA sequences overlapping these SNPs and observed that they display a high degree of similarity. A core segment including the two SNPs and their immediate vicinity displayed 68% identity with no gaps over a stretch of 22 nucleotides (Fig. 1b). APOE*2 is a rare variant situated in the lipid binding region of APOE, in which valine 236 is substituted by glutamic acid (V254E in the full length sequence, rs199768005, Fig. 1c). This variant is significantly associated with a marked reduction in risk of Alzheimer's disease (P = 7.5 × 10⁻⁵; OR = 0.10 [0.03 to 0.45]) 22. We noticed that the DNA sequence harboring this SNP is also very similar (75% identity with no gaps over a segment of 24 nucleotides) to the DNA sequence encompassing rs429358, the genetic variant determining the APOE4 status (Fig. 1d). The DNA sequence encompassing rs7412 displays a similarity of 63% to this 24 nucleotide long motif. These observations of a high degree of DNA sequence similarity in three separate regions (i.e., rs429358, rs7412, rs199768005) affecting the susceptibility to Alzheimer's disease led us to define the sequence "TGGAGGACGTGCGCGGCCGCCTGG" as the "APOE4 motif" (the rs429358 nucleotide observed in APOE4 is highlighted with bold/underline).
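The allele definitions given above can be written as a small lookup. The following is a minimal sketch that uses only the positions and bases stated in the text; the function and variable names are hypothetical, and base combinations not described in the text are rejected rather than interpreted.

```python
# Chromosome 19 coordinates of the two SNPs, as stated in the text.
RS429358_POS = 44908684
RS7412_POS = 44908822

# (base at rs429358, base at rs7412) -> epsilon allele, as described in the text.
EPSILON_BY_HAPLOTYPE = {
    ("T", "T"): "APOE2",
    ("T", "C"): "APOE3",
    ("C", "C"): "APOE4",
}


def call_epsilon_allele(rs429358_base, rs7412_base):
    """Return the epsilon allele for one haplotype (hypothetical helper)."""
    key = (rs429358_base.upper(), rs7412_base.upper())
    if key not in EPSILON_BY_HAPLOTYPE:
        # Combinations not covered by the text are not interpreted here.
        raise ValueError(f"haplotype {key} is not described in the text")
    return EPSILON_BY_HAPLOTYPE[key]


# Distance between the two SNPs in base pairs.
snp_distance = RS7412_POS - RS429358_POS
```

The computed distance reproduces the 138 bp separation reported for the two SNPs.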

Analysis of APOE Exon 4 DNA Sequence.
Variants ε2, ε3, and ε4 are embedded in a CpG island (CGI) overlapping the end of intron 3 and exon 4 of the APOE gene that is highly methylated in the human brain. This APOE CGI can function either as a transcriptional enhancer or as a silencer in a luciferase-based reporter system, depending on cell type and promoter construct 23,24. Enhancers generally represent a modular arrangement of short sequence motifs, each interacting with a specific cellular transcription factor or regulatory protein that is responsible for turning transcription on or off in a different set of cells, or at different times 25. Given the observed activity of APOE exon 4 on gene transcription 23,24 and our identification of the APOE4 motif within Alzheimer's disease determining SNPs, we inspected the exon 4 DNA sequence for the presence of additional APOE4 motif-like structural elements. This search revealed the presence of the 24 nucleotide-long APOE4 motif at 8 locations on APOE exon 4, each with at least the level of ungapped identity to the consensus (63%) observed within the three sequence elements defined by the three AD-associated SNPs (Fig. 2). The 8 occurrences, in 5′ to 3′ order, were 67%, 100%, 67%, 63%, 63%, 63%, 79% and 75% identical to the APOE4 motif (Fig. 2). Hence, it appears that exon 4 of APOE harbors a modular short-sequence arrangement typical of enhancers. These repeats, however, were found nowhere else in the APOE gene.
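The ungapped identity values quoted above can be illustrated with a simple sliding-window comparison. This is a minimal sketch, not the BLAST-based procedure described in the Methods: the function names are hypothetical, and the threshold of 15 matching positions out of 24 is an assumption chosen because 15/24 = 62.5% is consistent with the reported 63%.

```python
MOTIF = "TGGAGGACGTGCGCGGCCGCCTGG"  # 24-nt APOE4 motif from the text


def count_matches(a, b):
    """Count identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))


def scan_for_motif(sequence, motif=MOTIF, min_matches=15):
    """Return (start, window, matches, percent_identity) for every window
    of the forward strand that reaches at least min_matches identical bases."""
    sequence = sequence.upper()
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        matches = count_matches(window, motif)
        if matches >= min_matches:
            hits.append((start, window, matches, 100.0 * matches / width))
    return hits


# Hypothetical toy input: the motif flanked by arbitrary bases.
toy_sequence = "ACGT" + MOTIF + "TTTT"
toy_hits = scan_for_motif(toy_sequence)
```

Counting matches as integers avoids rounding ambiguities at the threshold; percent identity is derived only for reporting.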
Prediction of a NRF1 Transcription Factor Binding Site within the APOE4 Motif Sequence. Most enhancers exert their regulatory function through binding of cell-type specific transcription factors. Thus, we performed an in silico search of the DNA sequence of the APOE4 motif for putative transcription factor binding sites using binding profiles from the JASPAR CORE database of experimentally defined transcription factor binding sites for eukaryotes. A score is calculated for the probed sequence that provides a measure of similarity to the transcription factor consensus sequence. We submitted the APOE3 DNA sequence to the same query for comparison purposes. Results of the analysis are presented in Table 1, and show that the region of interest (APOE4 motif) leads to statistically significant hits for two transcription factor binding motifs, HIF1A::ARNT (Hypoxia-inducible factor 1, alpha::Aryl hydrocarbon receptor nuclear translocator) and NRF1 (Nuclear Respiratory factor 1). HIF1A::ARNT is a heterodimeric transcription factor composed of the alpha subunit HIF1A and the beta subunit ARNT. A binding motif for this transcription factor was found in both the APOE4 and the APOE3 sequence, with similar scores of 11.2 and 9.6, respectively. The T to C transition is situated at the edge of the consensus motif in the predicted binding site sequence, a position where every nucleotide can be found with similar frequency. The nucleotide change in the APOE3 to APOE4 transition is thus not expected to affect binding. More importantly, screening of the APOE4 motif identified a binding motif for NRF1 with a score of 11.9. The NRF1 binding motif was not identified in the APOE3 query sequence when a stringent relative profile score threshold cut-off of 90% (Table 1) was applied. Hence, the rs429358 T to C transition in the APOE4 motif creates a novel consensus binding motif for NRF1 (Fig. 3a). This predicted binding site is located on the reverse strand (Fig. 3b), which is not unusual, as enhancer sequences can be positioned in either forward or reverse orientation, inside, downstream, or upstream of the regulated gene, and most transcription factor binding sites can occur in both orientations in promoters or enhancers. In order to assess the relative strength of the NRF1 binding motif in APOE4, a second JASPAR screen with a lower relative profile score threshold cut-off of 80% was performed (Table 2). Under these less stringent conditions, the novel NRF1 binding motif in APOE4 retained its score of 11.9 (Table 1), while the APOE3 sequence resulted in a score of 3.9 (Table 2). As a comparison, the highest score to be expected for the NRF1 consensus sequence in JASPAR is 18.1, while a score of 0 signifies that the sequence has equal probability of being a functional or a random site. Moreover, the APOE4 variant replaces a non-consensus T nucleotide (A on the reverse strand) present in APOE3, which has 0 occurrences at this position in the nucleotide frequency matrix of the NRF1 consensus sequence, with a highly conserved, consensus-matching C nucleotide (G on the reverse strand), which has 4275 occurrences (Fig. 3c).
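To make the notions of profile score, relative score threshold, and reverse-strand hits concrete, the following minimal sketch scores sequences against a position frequency matrix. It assumes a log-odds position weight matrix with a uniform background and a relative score defined as (score − min) / (max − min). The 4-column matrix, the pseudocount, and all function names are hypothetical; this is neither the JASPAR NRF1 profile nor the JASPAR implementation.

```python
import math

BASES = "ACGT"

# Hypothetical 4-column position frequency matrix (counts per base and column).
# Invented for illustration only; NOT the JASPAR NRF1 profile.
TOY_PFM = {
    "A": [0, 10, 0, 2],
    "C": [0, 0, 18, 3],
    "G": [20, 0, 2, 10],
    "T": [0, 10, 0, 5],
}


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert counts to log2-odds weights (pseudocount value is an assumption)."""
    width = len(pfm["A"])
    pwm = []
    for i in range(width):
        total = sum(pfm[b][i] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2(((pfm[b][i] + pseudocount * background) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, site):
    """Sum the per-position log-odds weights for a site of matrix width."""
    return sum(pwm[i][base] for i, base in enumerate(site))


def relative_score(pwm, site):
    """Rescale the score to [0, 1] between the worst and best possible site."""
    best = sum(max(col.values()) for col in pwm)
    worst = sum(min(col.values()) for col in pwm)
    return (score(pwm, site) - worst) / (best - worst)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(sequence, pwm, threshold=0.9):
    """Return (start, strand, score, relative_score) for hits on either strand."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        for strand, site in (("+", window), ("-", reverse_complement(window))):
            rel = relative_score(pwm, site)
            if rel >= threshold:
                hits.append((start, strand, score(pwm, site), rel))
    return hits


toy_pwm = pfm_to_pwm(TOY_PFM)
```

In this toy matrix, replacing the first base of the best-scoring site `GACG` with a base that has zero counts in that column (`AACG`) drops the relative score below a 90% threshold, which mirrors, in miniature, how a single base change at a highly conserved position can create or abolish a predicted site.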

Presence of Additional NRF1 Binding Motifs in APOE Exon 4. Clustering of multiple binding sites for the same transcription factor, so-called homotypic clusters of transcription factor binding sites, is a prevalent feature of human cis-regulatory elements. These transcription factor clusters can be found both in distant enhancer elements and in promoter regions, and appear to play an active role in gene regulation 26. Thus, we investigated whether other NRF1 binding motifs could be detected on APOE exon 4. We subjected the entire sequence of APOE exon 4 to a JASPAR database search for NRF1 binding motifs. Six NRF1 binding sites with scores ranging from 10.5 to 14.5 were predicted when a stringent relative profile score threshold of 90% was applied (Table 3). Locations of these NRF1 binding motifs on the exon 4 sequence are shown in Fig. 4. As a comparison, screening of the neighboring APOE intron 3 did not lead to any hits for NRF1.

Discussion
We have shown in this study, using a bioinformatics approach, that the DNA sequences spanning polymorphisms linked to Alzheimer's disease are conserved and contain short sequence spans of what we defined as the APOE4 motif. We have shown that this DNA motif is repeated several times within exon 4 of apolipoprotein E, which harbors these Alzheimer's disease alleles. Moreover, our in silico analysis of transcription factor binding sites using the JASPAR 2014 database revealed that the change of the T nucleotide (APOE3) to a C nucleotide (APOE4) is sufficient to create a de novo NRF1 binding motif. We suggest that the peculiar structural feature on exon 4 could function as a transcriptional enhancer element and be implicated in the machinery that regulates DNA transcription in the genomic vicinity of APOE4. Transcriptional enhancer elements can control transcriptional activity of genes located on the same (cis) chromosome or on different (trans) chromosomes. In the case of cis transcriptional activation, 98% of chromatin loops anchored at a promoter are located within a range of 2 Mb of the enhancer's location 27, indicating that the vast majority of genes regulated by the enhancer are located within 2 Mb of the enhancer's chromosomal position. Hence, the de novo APOE4 NRF1 binding site could regulate multiple genes on chromosome 19 located within this genomic distance of the APOE gene.
Table 3. Predicted NRF1 binding sites on APOE4 exon 4. The de novo NRF1 binding motif present in the APOE4 variant is indicated in italics.
Our finding of a single nucleotide change leading to the generation of a de novo NRF1 site in APOE4 is in line with other studies that have shown that single nucleotide variants can affect gene expression. For example, the blond-associated allele at rs12821256 alters a binding site for the lymphoid enhancer-binding factor 1 (LEF1) and reduces LEF1 transcription factor responsiveness in keratinocytes 28. Preaxial polydactyly, a frequently observed congenital limb malformation, results from single point mutations within the Sonic Hedgehog (SHH) regulator, designated ZRS, which lies within intron 5 of the LMBR1 gene 1 Mb from its target gene 29,30. The importance of disease-associated allele polymorphisms affecting transcription has recently been highlighted in neurodegenerative disorders. Notably, it was demonstrated that a polymorphic NRF2/sMAF binding site in MAPT (Tau) is strongly associated with differential risk for Parkinson's disease 31. Further, a risk variant for Parkinson's disease in a distal enhancer of alpha synuclein (SNCA) was shown to modulate target gene expression 32.
NRF1 is a homodimeric transcription factor that mediates the expression of key metabolic genes and of a range of nuclear genes essential for mitochondrial biogenesis 33, including subunits of the respiratory chain complexes and constituents of the mtDNA transcription and replication machinery. NRF1 plays an important role in the coupling between energy consumption, energy generation, and neuronal activity 34. NRF1 has also been associated with the regulation of neurite outgrowth 35, glucose metabolism 36, response to exogenous oxidants 37 and hepatitis B infection 38. Moreover, the expression of NRF1 is increased in aged subjects 39.
NRF1 has also been found to be a potentially important factor for Alzheimer's disease using network topology analysis of microarray data from post-mortem brains 40. In addition, a panel of neurodegenerative disease-related genes, such as PARK2, PINK1, PARK7, GPR37, PSENEN, and MAPT, have been recognized as NRF1 targets 41. Apart from advanced age, traumatic brain injury, episodes of brain ischemia, poorly controlled diabetes, and common infections are known risk factors influencing Alzheimer's disease onset, progression and outcome. Thus, the NRF1 binding motif created by the APOE4 variant offers a potential mechanism to link these environmental signals to aberrant gene expression causing Alzheimer's disease. The preferential expression of the APOE protein in glia of the cerebellum and olfactory bulb is difficult to reconcile with the well documented histopathological progression of AD 42. Gene expression mediated by the NRF1 binding motif could provide a mechanism for the observed tissue specificity of AD neurodegeneration. The functional role of the predicted NRF1 recognition motif in the expression of genes within the genomic vicinity of APOE, and how these genes link to AD neurodegeneration, remain to be elucidated by biochemical and molecular studies.
Nuclear respiratory factor 1, or NRF1, is also known as ALPHA-PAL (HGNC: 7996; Ensembl: ENSG00000106459; UniProtKB: Q16656).

Methods
Search for sequence similarities within the APOE gene. The search for sequences similar to the APOE4 motif within the APOE gene was performed using the NCBI Homo sapiens Nucleotide Basic Local Alignment Search Tool/Blastn (http://blast.ncbi.nlm.nih.gov/) with the default parameter values for short sequences.

Prediction of Transcription Factor Binding Sites created by APOE4. Transcription factor binding sites were predicted with the software JASPAR 2014 43 (http://jaspar.genereg.net/). JASPAR is an open-access collection of curated, non-redundant profiles derived from published collections of experimentally defined transcription factor binding sites for eukaryotes. Sensitivity and specificity are affected by the relative score threshold (default 80%). Submitted sequences were analyzed against the "CORE Vertebrata" database using a relative profile score threshold setting of 90% to report only the most likely sites 44, because experimentally reported binding sites in DNA are frequently found among the highest-scoring sequences 45. Position Frequency Matrix cell numbers indicate the number of sequences having base x in column y. Sequence logos 46 are graphical representations of a transcription factor consensus binding site, in which nucleotides are sized and sorted relative to their occurrence at each position. Values range from 0 bits (no base preference) to 2 bits (single base occurrence).
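The 0 to 2 range of a sequence logo follows from the information content of each matrix column. The sketch below computes it from position frequency matrix counts as 2 minus the Shannon entropy in bits, assuming a four-letter DNA alphabet and omitting the small-sample correction that logo software may apply. The function names and the example counts are hypothetical.

```python
import math


def column_information_bits(counts):
    """Information content of one DNA matrix column in bits (0 to 2).

    counts maps each of A, C, G, T to the number of sequences with that base.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("column has no observations")
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


def letter_heights(counts):
    """Height of each base in the logo: its frequency times the column information."""
    total = sum(counts.values())
    info = column_information_bits(counts)
    return {base: (count / total) * info for base, count in counts.items()}


# Hypothetical example columns.
no_preference = {"A": 5, "C": 5, "G": 5, "T": 5}
single_base = {"A": 0, "C": 0, "G": 20, "T": 0}
```

A column with equal counts for all four bases carries 0 bits, and a column in which only one base occurs carries the full 2 bits, matching the two ends of the range described above.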