Large expansions of a non-coding GGGGCC-repeat in the first intron of the C9orf72 gene are a common cause of both amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD). G-rich sequences have a propensity for forming highly stable quadruplex structures in both RNA and DNA termed G-quadruplexes. G-quadruplexes have been shown to be involved in a range of processes including telomere stability and RNA transcription, splicing, translation and transport. Here we show using NMR and CD spectroscopy that the C9orf72 hexanucleotide expansion can form a stable G-quadruplex, which has profound implications for disease mechanism in ALS and FTD.
Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disorder in which the loss of motor neurons in brain and spinal cord causes progressive weakness and paralysis, ultimately leading to death from respiratory failure1. Frontotemporal dementia (FTD) is one of the most common forms of young-onset dementia and is characterized by the progressive degeneration of the frontal and anterior temporal lobes, leading to changes in personality or language impairment2,3. ALS and FTD share numerous similarities at the genetic and neuropathological level, clinically they can co-occur, and they have been proposed to be part of the same spectrum of disease4.
Large expansions of the non-coding GGGGCC-repeat in the first intron of the C9orf72 gene have been recently demonstrated to cause ALS and FTD5,6. Whilst the unaffected normal control population carries <30 repeats of this hexanucleotide, approximately 8–10% of FTD and ALS European patients carry very large expansions which have been reported to range between 700 and 1,600 repeats5. The introns containing these large expansions are transcribed and indeed in patients the GGGGCC-repeat expansion is detectable by in situ hybridization in nuclear RNA foci5.
Guanine (G)-quadruplexes are highly stable nucleic acid secondary structures formed from short tracts of G-rich sequence associating together. These can occur in both DNA and RNA and consist of stacks of planar layers of G-tetrad units, also named G-quartets, in which the G bases are arranged in a square cyclic pattern held together by eight hydrogen bonds (Fig. 1a). When stacked, the tetrads form a central channel lined by guanine O6 atoms, which can interact with metal cations. Monovalent metal cations bound in this central channel significantly affect the stability (K+ > Na+ > Li+) and topology of the folded G-quadruplex structure7.
The presence of RNA G-quadruplexes has been demonstrated in vitro and in vivo, and they have been found in the transcripts of diverse organisms, ranging from viruses to humans8,9. Computational algorithms used to predict G-quadruplex forming sequences suggest that there are approximately 197,000 G-quadruplex forming sequences in the human genome9. Interestingly, their presence is particularly enriched in the regulatory 5′ UTR, first intron and 3′ UTR regions of transcripts10,11. RNA G-quadruplexes are recognized and bound by specific proteins and have been implicated in a wide range of biological processes including alternative splicing, RNA transport, translation regulation, RNA degradation and telomere stability12,13,14.
The GGGGCC expansion of C9orf72 forms runs of adjacent G-repeats and raises the possibility these could form G-quadruplexes. The secondary structure of the hexanucleotide repeat is very likely to be involved in determining the proteins it interacts with. These interactions may play a critical role in disease – as has been shown in Myotonic Dystrophy, where similar non-coding RNA repeat expansions form nuclear foci which sequester key RNA-binding proteins thereby causing functional defects15.
Thus, given the importance of expanded non-coding repeats in other RNA repeat expansion diseases15, we have investigated the secondary structure of the C9orf72 hexanucleotide repeat by biophysical methods to determine whether it forms RNA G-quadruplexes.
In silico analysis predicts C9orf72 GGGGCC-repeats form G-quadruplexes
The GGGGCC-expansion lies in the 5′ region of C9orf72 intron 1 (Fig. 2a). Control individuals are reported to have <30 repeats, but the majority of controls have 2 repeats5,6,16,17. Nonetheless, using the EuQuad database18, which identifies G-quadruplex forming sequences in eukaryotic genomes, the three GGGGCC-repeats and adjacent GGGGC nucleotides present in the human reference C9orf72 gene sequence are recognized as a G-quadruplex forming sequence. We therefore termed this sequence the minimal C9orf72 G-quadruplex repeat unit (C9Gru). The prediction is supported by the G-quadruplex analysis tool QGRS Mapper9: the full C9orf72 genomic sequence containing the C9Gru sequence shows one predicted four-stacked G-quadruplex with a high predictive score (Fig. 2a). When a pathogenic sequence containing 800 hexamer (GGGGCC) repeats is used as input, the number of predicted four-stacked G-quadruplexes increases accordingly (Fig. 2b).
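As a rough illustration of the kind of pattern search that underlies such predictions, the sketch below scans a sequence for four G-tracts separated by short loops. It is a simplified folding rule and not the EuQuad or QGRS Mapper algorithm: the function name find_g4_motifs, the minimum tract length of four guanines and the loop range of 1 to 7 nucleotides are assumptions chosen for illustration, and no predictive score is computed.

```python
import re

# Assumed motif: four runs of at least `tract` guanines separated by loops
# of `min_loop`..`max_loop` nucleotides. This is a simplified rule and does
# not reproduce the scoring used by EuQuad or QGRS Mapper.
def find_g4_motifs(seq, tract=4, min_loop=1, max_loop=7):
    g_run = "G{%d,}" % tract
    loop = "[ACGTU]{%d,%d}?" % (min_loop, max_loop)  # shortest loop first
    pattern = re.compile("%s(?:%s%s){3}" % (g_run, loop, g_run))
    # finditer reports non-overlapping matches as (start, end) offsets
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]

# C9Gru: three GGGGCC repeats followed by GGGGC (23 nucleotides)
C9GRU = "GGGGCC" * 3 + "GGGGC"
hits = find_g4_motifs(C9GRU)

# Hypothetical pathogenic-length input: 800 uninterrupted GGGGCC repeats
expanded_hits = find_g4_motifs("GGGGCC" * 800)
```

Under these assumptions the C9Gru sequence yields a single motif, while the 800-repeat input yields one non-overlapping motif per four repeats, which is in line with the prediction count rising with repeat number.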
GGGGCC-repeats form RNA G-quadruplexes
1D 1H NMR is a key technique in providing unequivocal experimental evidence of G-tetrad and quadruplex formation. We used NMR to investigate the structure of an RNA oligonucleotide consisting of the C9Gru sequence. NMR analysis in 10 mM potassium phosphate buffer, pH 7.0, prior to refolding in the presence of quadruplex stabilizing cations, revealed the presence of imino proton peaks between 12 and 14 ppm, a region of the spectrum characteristic of Watson-Crick (W-C) hydrogen bonding (Supplementary Fig. S1). Raising the potassium chloride concentration to 40 mM and annealing the RNA allowed the C9Gru oligonucleotide to be refolded in the presence of stabilizing cations. At room temperature the 1D 1H NMR spectrum showed the presence of a broad envelope of peaks between 10 and 11.5 ppm, and several distinct peaks characteristic of G-tetrad formation (Fig. 3). The appearance of peaks consistent with tetrad formation coincides with the loss of W-C peaks between 12 and 13 ppm. The buffered C9Gru RNA oligonucleotide was then heated stepwise (+5 K) from 298 K to 333 K, without a significant change observed in the spectra, indicating the stability of the folded quadruplex and retention of the G-tetrads. As clear and distinct peaks could not be differentiated within the imino region, further structural analysis was not pursued.
GGGGCC G-quadruplexes are highly stable and parallel oriented
G-quadruplexes can adopt three possible topologies according to the directions of the strands: parallel, anti-parallel or mixed. Riboguanosine has a strong propensity to adopt the ‘anti’ conformation and therefore to give rise to parallel G-quadruplex structures. Circular dichroism (CD) spectroscopy is a standard method used to analyse structural features of G-quadruplexes19,20. We used this technique to characterize the structural conformation of the C9Gru RNA sequence in a 40 mM KCl buffer. This showed a positive peak at 262 nm and a negative peak at 237 nm (Fig. 4a), which are the hallmarks of a parallel-oriented G-quadruplex structure19.
To study the stability of the C9Gru G-quadruplex structure, we performed melting experiments by increasing the temperature from 15°C to 95°C using a 1°C/min gradient. The CD spectra were measured at 10°C intervals and overlaid closely from 15°C to 75°C, with a reduction of the 262 nm peak only starting at 85°C, indicating a high stability of the formed structure (Fig. 4a). Folding was fully reversible, but lagged behind the unfolding, indicating either a slow refolding process or intermolecular quadruplex formation (Fig. 4b).
To gain information on whether the G-quadruplexes are formed intermolecularly or intramolecularly, we analysed the melting profile after diluting the C9Gru RNA oligonucleotide 10-fold (0.46 μM) in buffer containing 40 mM KCl. The spectrum displayed the same characteristic maximum and minimum around 262 and 237 nm, respectively. Furthermore, the structure remained highly stable and retained the same amount of structure at 95°C as seen in the concentrated sample (Supplementary Fig. S2). Because the stability did not depend on RNA concentration, this result indicates the formation of an intramolecular G-quadruplex19.
GGGGCC G-quadruplex stability is cation dependent
The stability of the G-quadruplex structure is strongly influenced by the presence of monovalent cations between the G-quartet stacks. K+ promotes stable folding over Na+ and Li+ ions. We therefore compared the C9Gru RNA structure in the presence of either K+ or Li+. CD results showed a reduction of the formed structure in the presence of Li+, with a reduction of the 262 nm peak by ~30%, and a reduction in stability of the quadruplex and the cooperativity of the fold (Fig. 4c).
Together, these CD data show that GGGGCC RNA repeats fold into a very stable, parallel intramolecular G-quadruplex structure.
Our investigations show the C9orf72 GGGGCC-hexanucleotide repeat forms G-quadruplexes and these adopt a parallel topology as illustrated in the model in Figure 1b. Our structural data confirm in silico predictions that attribute a high probability of forming G-quadruplexes to the GGGGCC-repeat.
With regard to quadruplex topology, in contrast to DNA G-quadruplexes, RNA G-quadruplexes generally disfavour folded topologies with phosphate backbones running antiparallel containing mixed syn and anti glycosidic bonds. A preference of the ribose sugar puckers for the C3′-endo conformation favours the anti-conformation and so an all-parallel topology21. Structural studies have suggested that stability is derived through the 2′-OH groups and their intra- and intermolecular interactions with the ribose O4′ atom for C3′-endo puckers and the N2 guanine amine for C2′-endo puckers22. However, this preference is not absolute and examples of RNA adopting the syn-conformation exist23. Our results show that, similar to most other RNA G-quadruplexes, the C9orf72 GGGGCC-repeat G-quadruplex adopts a parallel conformation. The GGGGCC-repeat G-quadruplexes, similar to other RNA G-quadruplexes, are very thermodynamically stable at physiological intracellular potassium ion concentrations. Our CD data obtained from the RNA oligomer dilution also suggest that quadruplexes form intramolecularly. In the context of the C9orf72 expansion, where the hexanucleotide repeats are >700, it remains to be assessed whether quadruplexes form by association of adjacent repeats or from distant tracts of the sequence.
The ALS-FTD causing expansions in C9orf72 are formed by a very large number of repeats, which is thought to range between 700 and 1,600 hexamers5. Although we have shown the propensity of the basic repeats composing this expansion to form G-quadruplexes, the extended structure of this very long RNA sequence is not yet known. Evidence of ternary structure from RNA transcripts of telomeric repeat (TERRA) sequences containing several hundred G-quadruplex-forming hexamers indicates a “bead on a string model” with stacked quadruplex dimers, using 3′ or 5′ interfaces, connected by the intervening linkers to the next pairing24,25. Therefore, we would predict that many quadruplexes would be formed within the C9orf72 expansion, and that quadruplexes would stack as dimers.
Three transcript variants (V1, V2, V3) have been described for the C9orf72 gene: V2 and V3 utilize exon 1a and therefore include the hexanucleotide repeat, while V1 utilizes the alternative exon 1b therefore excluding the hexanucleotide repeat, which is located upstream of the transcription start site5 (Fig. 2a). Real time RT-PCR data from patient brain and cell lines has shown that the presence of the expanded repeat causes a reduction in V1 transcription5,16. V2 and V3 were shown to be transcribed in patient frontal cortex cDNA, which was confirmed by the detection of the hexanucleotide repeats in nuclear foci in patients' brains by in situ hybridization5. A second study used real-time PCR to show that V2 levels were reduced in patient frontal cortex16. Therefore further studies are required to clarify the extent of transcription of the expanded repeat in V2 and V3 in relevant patient tissues. As GGGGCC RNA repeats form G-quadruplexes, it is likely that the DNA sequence also forms these structures. There is evidence that DNA G-quadruplexes on the template strand (also known as the non-coding strand) can inhibit transcription26. It has also been suggested that G-quadruplexes forming on the coding or non-template strand, as is the case with the C9orf72 repeats, could enhance transcription by keeping the template strand single stranded26. This would be consistent with transcription through the repeat and the identification of nuclear RNA foci containing the repeat in patient tissue5. Further work will be required to determine the extent and mechanism of transcription of the C9orf72 repeat sequence. Whether disease is caused by a toxic effect of the GGGGCC-repeat RNA, by a loss of function of C9orf72 or by both mechanisms is yet to be determined.
Repeat expansions in the 3′ UTR of DMPK and in intron 1 of ZNF9 were shown to cause Myotonic Dystrophy type 1 and 2 respectively (DM1 and DM2)27,28. These mutations are similar to C9orf72 expansions in being located in non-coding regions of the respective genes and in forming nuclear RNA foci in patient tissue and cells. Although loss of transcript has been suggested to play a role in both DM1 and DM229,30, the fact that two repeat sequences located in entirely different genes can cause such similar disease features, implies a potential common pathogenic mechanism by RNA gain-of-function31. Indeed, it has been shown that RNA binding proteins, such as MBNL1, are sequestered away from their normal RNA targets by interaction with the expanded repeats32. This leads to aberrant functional downstream effects which directly cause aspects of the myotonic dystrophy phenotype33,34,35. The expanded repeats that cause DM1 fold into a stable hairpin structure that mimics the normal MBNL1 RNA-binding site36,37. These findings suggest it is possible that the expanded GGGGCC-repeat may act in a similar fashion. In this light, the secondary structure of the RNA repeat expansion is crucial in determining which proteins it binds to.
G-quadruplexes have been shown to play a role in a variety of biological processes13,14. A number of these functions are carried out by the interaction with G-quadruplex binding proteins and the sequestration of these proteins could lead to downstream effects that play a role in the disease pathogenesis. While many DNA G-quadruplex binding proteins have been identified, relatively few proteins have been confirmed to bind RNA G-quadruplexes38. A prominent example of an RNA G-quadruplex binding protein is the fragile X mental retardation protein FMRP39, but it is currently unknown whether FMRP is sequestered by the C9orf72 expanded GGGGCC-repeats. Further work will be required to determine whether specific proteins bind to the repeats and what role they might play in disease pathogenesis. If RNA G-quadruplex binding proteins were sequestered by the expanded repeats, thousands of potential target genes with predicted G-quadruplex structures in their regulatory regions could be affected40.
An intriguing finding, which carries relevance for neurons, is that G-quadruplexes in the 3′ UTR of mRNAs are necessary and sufficient for the localization of these mRNAs to dendrites, through interactions with proteins such as FMRP12. Whether the accumulation of the C9orf72 expanded RNA causes sequestration of proteins relevant in mRNA transport needs to be addressed. However, this highlights the possibility of RNA alterations in C9orf72 ALS/FTD that are not limited to RNA quantity and splicing defects. Therefore, the study of C9orf72 ALS/FTD may need to couple quantitative RNA sequencing studies with RNA localization investigations in order to fully understand disease pathogenesis. Finally, small molecules have been identified that interact with G-quadruplexes13 and our data suggest that they may have use as potential therapeutics in ALS and FTD caused by C9orf72 repeat expansions.
RNA oligonucleotide sample preparation and annealing
An HPLC purified RNA oligonucleotide of sequence GGGGCCGGGGCCGGGGCCGGGGC was purchased from Integrated DNA Technologies and supplied as a lyophilized powder and reconstituted in ultrapure water to a stock concentration of 2.5 mM. We termed this sequence the C9orf72 minimal G-quadruplex repeat unit (C9Gru). Annealing was carried out by heating the samples to 90°C and allowing them to cool overnight to 20°C.
NMR data was acquired using a Bruker AVANCE 500 NMR spectrometer operating at a proton resonance frequency of 500.13 MHz and equipped with a QNP cryoprobe. 1D 1H NMR spectra of the RNA sample in 90%H2O/10%D2O were acquired and processed using Topspin (version 2.1, Bruker Biospin, Karlsruhe) and excitation sculpting pulse sequence (Bruker pulse program zgesp). Spectra of the RNA sample were acquired before annealing in the presence of 10 mM potassium phosphate buffer, pH 7.0, and after annealing in the presence of 40 mM of KCl, and 10 mM potassium phosphate buffer, pH 7.0. Samples were equilibrated in both cases at a calibrated temperature of 298 K. Data was accumulated centered at the solvent resonance with 1024 transients over a frequency width of 10.33 kHz into 32 K data points for an acquisition time aq = 1.58 s with a relaxation delay d1 = 2 s between transients using a 90° radiofrequency (r.f.) pulse (p1 = 13 μs at a power level of 2 dB).
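The acquisition parameters quoted above can be cross-checked with a short calculation. The sketch below assumes the usual Bruker convention that the acquisition time equals the number of acquired points (real plus imaginary) divided by twice the spectral width in Hz; the function names are hypothetical and 32 K is taken to mean 32 × 1024 points.

```python
# Assumed convention: aq = TD / (2 * SW), with TD counting real and
# imaginary points together and SW given in Hz.
def acquisition_time(td_points, sweep_width_hz):
    return td_points / (2.0 * sweep_width_hz)

# Spectral width expressed in ppm for a given spectrometer frequency (MHz)
def sweep_width_ppm(sweep_width_hz, spectrometer_mhz):
    return sweep_width_hz / spectrometer_mhz

aq = acquisition_time(32 * 1024, 10.33e3)   # seconds
sw_ppm = sweep_width_ppm(10.33e3, 500.13)   # ppm
```

With these inputs the acquisition time comes out at roughly 1.59 s, consistent with the stated aq of 1.58 s, and the spectral width corresponds to a window of about 20.7 ppm centred on the solvent resonance.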
RNA concentrations ranged from 4.6 to 0.46 μM and are indicated in the results section. The following buffers were used: 40 mM KCl, 10 mM potassium phosphate, pH 7.0; 50 mM LiCl, 10 mM sodium phosphate, 0.35 mM KCl, pH 7.0. CD experiments were performed at temperatures between 15°C and 95°C, with a 1°C/min temperature gradient, using a Jasco J715 spectropolarimeter (Jasco Hachioji, Tokyo, Japan) equipped with a Jasco peltier temperature control system. A CD spectrum of the buffer was recorded and subtracted from the spectrum obtained for the RNA-containing solution. Data were zero-corrected between 340–350 nm.
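The buffer subtraction and zero-correction steps can be sketched as follows. This is a minimal illustration and not the instrument software's procedure: the helper name correct_cd and the dictionary representation of a spectrum are hypothetical, and zero-correction is assumed here to mean subtracting the mean signal in the 340 to 350 nm window. The numbers are made up purely to exercise the function and are not measured data.

```python
# Hypothetical helper: each spectrum maps wavelength (nm) to ellipticity.
# Step 1 subtracts the buffer blank point by point; step 2 shifts the
# result so that its mean over the zero window is zero (an assumption
# about how the zero-correction was applied).
def correct_cd(sample, blank, zero_window=(340.0, 350.0)):
    diff = {wl: sample[wl] - blank[wl] for wl in sample}
    low, high = zero_window
    window = [value for wl, value in diff.items() if low <= wl <= high]
    offset = sum(window) / len(window)
    return {wl: value - offset for wl, value in diff.items()}

# Illustrative values only, not experimental data
sample = {237.0: -3.0, 262.0: 11.0, 340.0: 1.0, 350.0: 1.0}
blank = {237.0: 0.5, 262.0: 0.5, 340.0: 0.5, 350.0: 0.5}
corrected = correct_cd(sample, blank)
```

Applying the offset after the blank subtraction keeps the two corrections separate, so either one can be inspected or changed independently.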
We thank the UK Medical Research Council (E.M.C.F., P.F., A.N.), the Motor Neurone Disease Association (E.M.C.F., P.F.), Alzheimer's Research UK and MHMS General Charitable Trust (A.M.I.) for funding. PF is funded by a Medical Research Council/Motor Neurone Disease Association Lady Edith Wolfson Fellowship.