Introduction

Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disorder in which the loss of motor neurons in the brain and spinal cord causes progressive weakness and paralysis, ultimately leading to death from respiratory failure1. Frontotemporal dementia (FTD) is one of the most common forms of young-onset dementia and is characterized by the progressive degeneration of the frontal and anterior temporal lobes, leading to changes in personality or language impairment2,3. ALS and FTD share numerous similarities at the genetic and neuropathological level, can co-occur clinically, and have been proposed to be part of the same disease spectrum4.

Large expansions of the non-coding GGGGCC-repeat in the first intron of the C9orf72 gene have recently been shown to cause ALS and FTD5,6. Whilst unaffected controls carry <30 repeats of this hexanucleotide, approximately 8–10% of European FTD and ALS patients carry very large expansions, which have been reported to range between 700 and 1,600 repeats5. The introns containing these large expansions are transcribed, and in patients the GGGGCC-repeat expansion is detectable by in situ hybridization in nuclear RNA foci5.

Guanine (G)-quadruplexes are highly stable nucleic acid secondary structures formed when short tracts of G-rich sequence associate. They can occur in both DNA and RNA and consist of stacks of planar G-tetrad units, also named G-quartets, in which four G bases are arranged in a square cyclic pattern held together by eight hydrogen bonds ( Fig. 1a ). When stacked, the tetrads form a central channel lined by guanine O6 atoms, which can coordinate metal cations. Monovalent metal cations bound in this central channel significantly affect the stability (K+ > Na+ > Li+) and topology of the folded G-quadruplex structure7.

Figure 1

Schematic representation of the parallel stranded GGGGCC RNA G-quadruplex.

(a) Tetrads are stabilized by eight hydrogen bonds (black lines), with metal ions sitting centrally (orange circle) and phosphate backbones laterally (blue squares). (b) Four stacked G-tetrads with anti glycosidic torsion angles; the phosphate backbones are connected through propeller loops, each comprising the two cytosines, which ensures a parallel topology.

The presence of RNA G-quadruplexes has been demonstrated in vitro and in vivo, and they have been found in the transcripts of diverse organisms, ranging from viruses to humans8,9. Computational algorithms used to predict G-quadruplex forming sequences suggest that there are approximately 197,000 G-quadruplex forming sequences in the human genome9. Interestingly, they are particularly enriched in the regulatory 5′ UTR, first intron and 3′ UTR regions of transcripts10,11. RNA G-quadruplexes are recognized and bound by specific proteins and have been implicated in a wide range of biological processes including alternative splicing, RNA transport, translation regulation, RNA degradation and telomere stability12,13,14.

The GGGGCC expansion of C9orf72 forms runs of adjacent G-repeats and raises the possibility these could form G-quadruplexes. The secondary structure of the hexanucleotide repeat is very likely to be involved in determining the proteins it interacts with. These interactions may play a critical role in disease – as has been shown in Myotonic Dystrophy, where similar non-coding RNA repeat expansions form nuclear foci which sequester key RNA-binding proteins thereby causing functional defects15.

Thus, given the importance of expanded non-coding repeats in other RNA repeat expansion diseases15, we have investigated the secondary structure of the C9orf72 hexanucleotide repeat by biophysical methods to determine whether it forms RNA G-quadruplexes.

Results

In silico analysis predicts C9orf72 GGGGCC-repeats form G-quadruplexes

The GGGGCC-expansion lies in the 5′ region of C9orf72 intron 1 ( Fig. 2b ). Control individuals are reported to have <30 repeats, but the majority of controls have 2 repeats5,6,16,17. Nonetheless, the human reference C9orf72 gene sequence contains three GGGGCC-repeats followed by an adjacent GGGGC, and the EuQuad database18, which identifies G-quadruplex forming sequences in eukaryotic genomes, recognizes this stretch as a G-quadruplex forming sequence. We therefore termed this sequence the minimal C9orf72 G-quadruplex repeat unit (C9Gru). The prediction is supported by the G-quadruplex analysis tool QGRS Mapper9: the full C9orf72 genomic sequence containing the C9Gru sequence shows one predicted four-stacked G-quadruplex with a high predictive score ( Fig. 2a ). When a pathogenic sequence containing 800 hexamer (GGGGCC) repeats is used as input, the number of predicted four-stacked G-quadruplexes increases accordingly ( Fig. 2b ).
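As a rough illustration of how a motif-based search flags this sequence, the sketch below scans for four runs of four guanines separated by short loops. It is a simplified pattern match under stated assumptions, not the scoring scheme used by QGRS Mapper or EuQuad; the function name `find_g4_motifs` and the loop limits of 1 to 7 nucleotides are illustrative choices that do not come from this study.

```python
import re


def find_g4_motifs(seq, tract_len=4, min_loop=1, max_loop=7):
    """Return (start, end, matched_sequence) for each non-overlapping motif.

    Hypothetical helper: the motif is four G-tracts of `tract_len` guanines
    separated by three loops of min_loop..max_loop nucleotides. No score is
    computed, so this is only a presence/absence check.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    tract = "G{%d}" % tract_len
    # Lazy quantifier: prefer the shortest loop that still allows a match.
    loop = "[ACGT]{%d,%d}?" % (min_loop, max_loop)
    pattern = re.compile(tract + (loop + tract) * 3)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


# C9Gru: three GGGGCC repeats plus the adjacent GGGGC (23 nucleotides).
C9GRU = "GGGGCC" * 3 + "GGGGC"
c9gru_hits = find_g4_motifs(C9GRU)

# A model pathogenic tract of 800 hexamer repeats.
expanded_hits = find_g4_motifs("GGGGCC" * 800)

print(c9gru_hits)
print(len(expanded_hits))
```

With this pattern the C9Gru sequence yields a single motif, and the 800-repeat tract yields 200 non-overlapping motifs, because each motif together with its trailing CC linker spans four hexamer repeats.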

Figure 2

The C9orf72 hexanucleotide repeat is predicted to form a G-quadruplex structure.

The G-quadruplex prediction tool QGRS Mapper was used to identify potential G-quadruplex forming sequences in the entire C9orf72 genomic DNA sequence, which is shown with exons highlighted in red and the GGGGCC repeats in green. QGRS Mapper provides a G-score (plotted in blue) which indicates the likelihood of G-quadruplex formation. (a) The highest G-score in the C9orf72 reference sequence from Ensembl GRCh37 corresponds to the three GGGGCC repeats and adjacent GGGGC, which we have termed the minimal C9orf72 G-quadruplex repeat unit (C9Gru). (b) 800 GGGGCC repeats were added, which is representative of the disease-causing expansions, leading to an extended region of high G-score. Image is adapted from QGRS Mapper (http://bioinformatics.ramapo.edu/QGRS/index.php). E1a = exon 1a; E1b = exon 1b.

GGGGCC-repeats form RNA G-quadruplexes

1D 1H NMR is a key technique in providing unequivocal experimental evidence of G-tetrad and quadruplex formation. We used NMR to investigate the structure of an RNA oligonucleotide consisting of the C9Gru sequence. NMR analysis in 10 mM potassium phosphate buffer, pH 7.0, prior to refolding in the presence of quadruplex stabilizing cations, revealed the presence of imino proton peaks between 12 and 14 ppm, a region of the spectrum characteristic of Watson-Crick (W-C) hydrogen bonding ( Supplementary Fig. S1 ). Raising the potassium chloride concentration to 40 mM and annealing the RNA allowed the C9Gru oligonucleotide to be refolded in the presence of stabilizing cations. At room temperature the 1D 1H NMR spectrum showed a broad envelope of peaks between 10 and 11.5 ppm and several distinct peaks characteristic of G-tetrad formation ( Fig. 3 ). The appearance of peaks consistent with tetrad formation coincides with the loss of W-C peaks between 12 and 13 ppm. The buffered C9Gru RNA oligonucleotide was then heated stepwise (in +5 K increments) from 273 K to 333 K, with no significant change observed in the spectra, indicating the stability of the folded quadruplex and retention of the G-tetrads. As clear and distinct peaks could not be differentiated within the imino region, further structural analysis was not pursued.

Figure 3

NMR analysis of the C9orf72 GGGGCC RNA hexanucleotide repeat shows formation of G-quadruplexes.

The 1D proton spectrum of the C9Gru RNA oligonucleotide annealed in 10 mM potassium phosphate, 40 mM KCl buffer, pH 7.0, at 298 K. Peaks in the imino proton region (arrow) between 10 and 11.5 ppm correspond to quadruplex formation.

GGGGCC G-quadruplexes are highly stable and parallel oriented

G-quadruplexes can adopt three possible topologies according to the directions of the strands: parallel, anti-parallel or mixed. Riboguanosine has a strong propensity to adopt the ‘anti’ conformation and therefore to give rise to parallel G-quadruplex structures. Circular dichroism (CD) spectroscopy is a standard method used to analyse structural features of G-quadruplexes19,20. We used this technique to characterize the structural conformation of the C9Gru RNA sequence in a 40 mM KCl buffer. The spectrum showed a positive peak at 262 nm and a negative peak at 237 nm ( Fig. 4a ), which are the hallmarks of a parallel-oriented G-quadruplex structure19.

Figure 4

CD analysis shows the GGGGCC RNA G-quadruplex structures are very stable, cation dependent and parallel oriented.

CD spectra of the C9Gru RNA oligonucleotide (4.6 μM) show a positive peak at 262 nm and a negative peak at 237 nm, which is characteristic of parallel-oriented G-quadruplex structures. (a) and (b) represent the temperature unfold and refold spectra respectively, with the RNA oligonucleotide in 40 mM KCl, 10 mM potassium phosphate buffer. The peak at 262 nm only decreases at 85°C, indicating a very stable structure. Temperature unfold (c) and refold (d) spectra of identical RNA in 50 mM LiCl, 10 mM sodium phosphate buffer. The characteristic cation dependence of G-quadruplex structures was confirmed by the observed reduced stability in the presence of Li+ ions.

To study the stability of the C9Gru G-quadruplex structure, we performed melting experiments by increasing the temperature from 15°C to 95°C using a 1°C/min gradient. The CD spectra were measured at 10°C intervals and overlaid closely from 15°C to 75°C, with a reduction of the 262 nm peak starting only at 85°C, indicating high stability of the folded structure ( Fig. 4a ). Folding was fully reversible but lagged behind unfolding, indicating either a slow refolding process or intermolecular quadruplex formation ( Fig. 4b ).

To determine whether the G-quadruplexes form intermolecularly or intramolecularly, we analysed the melting profile after diluting the C9Gru RNA oligonucleotide 10-fold (0.46 μM) in buffer containing 40 mM KCl. The spectrum displayed the same characteristic maximum and minimum around 262 and 237 nm, respectively. Furthermore, the structure remained highly stable and retained the same amount of structure at 95°C as seen in the concentrated sample ( Supplementary Fig. S2 ). Because the stability of an intermolecular assembly would be expected to fall on dilution, this concentration independence indicates the formation of an intramolecular G-quadruplex19.

GGGGCC G-quadruplex stability is cation dependent

The stability of the G-quadruplex structure is strongly influenced by the presence of monovalent cations between the G-quartet stacks. K+ promotes stable folding over Na+ and Li+ ions. We therefore compared the C9Gru RNA structure in the presence of either K+ or Li+. CD results showed a reduction of the formed structure in the presence of Li+, with a reduction of the 262 nm peak by ~30% and a reduction in stability of the quadruplex and the cooperativity of the fold ( Fig. 4c ).

Together, these CD data show that GGGGCC RNA repeats fold into a very stable, parallel intramolecular G-quadruplex structure.

Discussion

Our investigations show the C9orf72 GGGGCC-hexanucleotide repeat forms G-quadruplexes and these adopt a parallel topology as illustrated in the model in Figure 1b . Our structural data confirm in silico predictions that attribute a high probability of forming G-quadruplexes to the GGGGCC-repeat.

With regard to quadruplex topology, in contrast to DNA G-quadruplexes, RNA G-quadruplexes generally disfavour folded topologies in which phosphate backbones run antiparallel and the glycosidic bonds are a mixture of syn and anti. A preference of the ribose sugar pucker for the C3′-endo conformation favours the anti conformation and so an all-parallel topology21. Structural studies have suggested that stability is derived through the 2′-OH groups and their intra- and intermolecular interactions with the ribose O4′ atom for C3′-endo puckers and the N2 guanine amine for C2′-endo puckers22. However, this preference is not absolute and RNA examples that depart from it exist23. Our results show that, similar to most other RNA G-quadruplexes, the C9orf72 GGGGCC-repeat G-quadruplex adopts a parallel conformation. The GGGGCC-repeat G-quadruplexes, similar to other RNA G-quadruplexes, are thermodynamically very stable at physiological intracellular potassium ion concentrations. Our CD data obtained from the RNA oligomer dilution also suggest that quadruplexes form intramolecularly. In the context of the C9orf72 expansion, where there are >700 hexanucleotide repeats, it remains to be assessed whether quadruplexes form by association of adjacent repeats or from distant tracts of the sequence.

The ALS-FTD causing expansions in C9orf72 are formed by a very large number of repeats, thought to range between 700 and 1,600 hexamers5. Although we have shown the propensity of the basic repeats composing this expansion to form G-quadruplexes, the extended structure of this very long RNA sequence is not yet known. Evidence of higher-order structure from telomeric repeat-containing RNA (TERRA) transcripts with several hundred G-quadruplex-forming hexamers indicates a “beads on a string” model, in which stacked quadruplex dimers, using 3′ or 5′ interfaces, are connected by the intervening linkers to the next pair24,25. Therefore, we would predict that many quadruplexes would form within the C9orf72 expansion and that these quadruplexes would stack as dimers.

Three transcript variants (V1, V2, V3) have been described for the C9orf72 gene: V2 and V3 utilize exon 1a and therefore include the hexanucleotide repeat, while V1 utilizes the alternative exon 1b and therefore excludes the hexanucleotide repeat, which is then located upstream of the transcription start site5 ( Fig. 2a ). Real-time RT-PCR data from patient brain and cell lines have shown that the presence of the expanded repeat causes a reduction in V1 transcription5,16. V2 and V3 were shown to be transcribed in patient frontal cortex cDNA, which was confirmed by the detection of the hexanucleotide repeats in nuclear foci in patients' brains by in situ hybridization5. A second study used real-time PCR to show that V2 levels were reduced in patient frontal cortex16. Therefore, further studies are required to clarify the extent of transcription of the expanded repeat in V2 and V3 in relevant patient tissues. As GGGGCC RNA repeats form G-quadruplexes, it is likely that the DNA sequence also forms these structures. There is evidence that DNA G-quadruplexes on the template strand (also known as the non-coding strand) can inhibit transcription26. It has also been suggested that G-quadruplexes forming on the coding or non-template strand, as is the case with the C9orf72 repeats, could enhance transcription by keeping the template strand single stranded26. This would be consistent with transcription through the repeat and the identification of nuclear RNA foci containing the repeat in patient tissue5. Further work will be required to determine the extent and mechanism of transcription of the C9orf72 repeat sequence. Whether disease is caused by a toxic effect of the GGGGCC-repeat RNA, by a loss of function of C9orf72 or by both mechanisms is yet to be determined.

Repeat expansions in the 3′ UTR of DMPK and in intron 1 of ZNF9 were shown to cause Myotonic Dystrophy type 1 and 2 respectively (DM1 and DM2)27,28. These mutations are similar to C9orf72 expansions in being located in non-coding regions of the respective genes and in forming nuclear RNA foci in patient tissue and cells. Although loss of transcript has been suggested to play a role in both DM1 and DM229,30, the fact that two repeat sequences located in entirely different genes can cause such similar disease features, implies a potential common pathogenic mechanism by RNA gain-of-function31. Indeed, it has been shown that RNA binding proteins, such as MBNL1, are sequestered away from their normal RNA targets by interaction with the expanded repeats32. This leads to aberrant functional downstream effects which directly cause aspects of the myotonic dystrophy phenotype33,34,35. The expanded repeats that cause DM1 fold into a stable hairpin structure that mimics the normal MBNL1 RNA-binding site36,37. These findings suggest it is possible that the expanded GGGGCC-repeat may act in a similar fashion. In this light, the secondary structure of the RNA repeat expansion is crucial in determining which proteins it binds to.

G-quadruplexes have been shown to play a role in a variety of biological processes13,14. A number of these functions are carried out by the interaction with G-quadruplex binding proteins and the sequestration of these proteins could lead to downstream effects that play a role in the disease pathogenesis. While many DNA G-quadruplex binding proteins have been identified, relatively few proteins have been confirmed to bind RNA G-quadruplexes38. A prominent example of an RNA G-quadruplex binding protein is the fragile X mental retardation protein FMRP39, but it is currently unknown whether FMRP is sequestered by the C9orf72 expanded GGGGCC-repeats. Further work will be required to determine whether specific proteins bind to the repeats and what role they might play in disease pathogenesis. If RNA G-quadruplex binding proteins were sequestered by the expanded repeats, thousands of potential target genes with predicted G-quadruplex structures in their regulatory regions could be affected40.

An intriguing finding, which carries relevance for neurons, is that G-quadruplexes in the 3′ UTR of mRNAs are necessary and sufficient for the localization of these mRNAs to dendrites, through interactions with proteins such as FMRP12. Whether the accumulation of the C9orf72 expanded RNA causes sequestration of proteins relevant in mRNA transport needs to be addressed. However, this finding highlights the possibility of RNA alterations in C9orf72 ALS/FTD that are not limited to RNA quantity and splicing defects. Therefore, the study of C9orf72 ALS/FTD may need to couple quantitative RNA sequencing studies with RNA localization investigations in order to fully understand disease pathogenesis. Finally, small molecules have been identified that interact with G-quadruplexes13, and our data suggest that they may have use as potential therapeutics in ALS and FTD caused by C9orf72 repeat expansions.

Methods

RNA oligonucleotide sample preparation and annealing

An HPLC-purified RNA oligonucleotide of sequence GGGGCCGGGGCCGGGGCCGGGGC was purchased from Integrated DNA Technologies. It was supplied as a lyophilized powder and reconstituted in ultrapure water to a stock concentration of 2.5 mM. We termed this sequence the C9orf72 minimal G-quadruplex repeat unit (C9Gru). Annealing was carried out by heating the samples to 90°C and allowing them to cool overnight to 20°C.

NMR spectroscopy

NMR data were acquired using a Bruker AVANCE 500 NMR spectrometer operating at a proton resonance frequency of 500.13 MHz and equipped with a QNP cryoprobe. 1D 1H NMR spectra of the RNA sample in 90% H2O/10% D2O were acquired with an excitation sculpting pulse sequence (Bruker pulse program zgesp) and processed using Topspin (version 2.1, Bruker Biospin, Karlsruhe). Spectra of the RNA sample were acquired before annealing in the presence of 10 mM potassium phosphate buffer, pH 7.0, and after annealing in the presence of 40 mM KCl and 10 mM potassium phosphate buffer, pH 7.0. Samples were equilibrated in both cases at a calibrated temperature of 298 K. Data were accumulated centered at the solvent resonance with 1024 transients over a frequency width of 10.33 kHz into 32 K data points, giving an acquisition time aq = 1.58 s, with a relaxation delay d1 = 2 s between transients and a 90° radiofrequency (r.f.) pulse (p1 = 13 μs at a power level of 2 dB).
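The acquisition parameters quoted above are linked by simple arithmetic, which the following sketch checks. It assumes the common Bruker convention in which the 32 K data points count real and imaginary points together, so that aq = TD / (2 × SW); that convention, the function names, and the rough experiment-time estimate (which ignores pulse durations and any dummy scans) are assumptions for illustration rather than details stated in the text.

```python
def acquisition_time_s(total_points, sweep_width_hz):
    """Acquisition time, assuming TD counts real + imaginary points."""
    return total_points / (2.0 * sweep_width_hz)


def sweep_width_ppm(sweep_width_hz, spectrometer_mhz):
    """Convert a frequency width in Hz to ppm at the given proton frequency."""
    return sweep_width_hz / spectrometer_mhz


TD = 32 * 1024          # 32 K data points
SW_HZ = 10330.0         # 10.33 kHz frequency width
SF_MHZ = 500.13         # proton resonance frequency
TRANSIENTS = 1024
RELAXATION_DELAY_S = 2.0

aq = acquisition_time_s(TD, SW_HZ)
sw_ppm = sweep_width_ppm(SW_HZ, SF_MHZ)
# Approximate total time: one acquisition plus one relaxation delay per transient.
experiment_min = TRANSIENTS * (aq + RELAXATION_DELAY_S) / 60.0

print(f"aq = {aq:.3f} s, width = {sw_ppm:.2f} ppm, ~{experiment_min:.0f} min")
```

Under these assumptions the computed acquisition time is close to 1.59 s, in line with the quoted 1.58 s, and the sweep width of roughly 20.7 ppm comfortably covers the imino region between 10 and 14 ppm.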

Circular dichroism

RNA concentrations ranged from 0.46 to 4.6 μM and are indicated in the results section. The following buffers were used: 40 mM KCl, 10 mM potassium phosphate, pH 7.0; 50 mM LiCl, 10 mM sodium phosphate, 0.35 mM KCl, pH 7.0. CD experiments were performed at temperatures between 15°C and 95°C, with a 1°C/min temperature gradient, using a Jasco J715 spectropolarimeter (Jasco, Hachioji, Tokyo, Japan) equipped with a Jasco Peltier temperature control system. A CD spectrum of the buffer was recorded and subtracted from the spectrum obtained for the RNA-containing solution. Data were zero-corrected between 340 and 350 nm.
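The two processing steps described above, buffer subtraction and zero correction, can be sketched as follows. This is a minimal illustration that interprets "zero-corrected" as subtracting the mean signal in the 340 to 350 nm window, where the RNA is not expected to show a CD signal; that interpretation, the function name `process_cd`, and the toy ellipticity values are assumptions and are not data from this study.

```python
def process_cd(wavelengths_nm, sample_mdeg, buffer_mdeg, zero_window=(340.0, 350.0)):
    """Subtract the buffer spectrum, then shift so the zero window averages to 0."""
    if not (len(wavelengths_nm) == len(sample_mdeg) == len(buffer_mdeg)):
        raise ValueError("all spectra must share the same wavelength grid")

    # Step 1: point-by-point buffer subtraction.
    subtracted = [s - b for s, b in zip(sample_mdeg, buffer_mdeg)]

    # Step 2: baseline offset taken as the mean signal inside the zero window.
    low, high = zero_window
    baseline = [v for w, v in zip(wavelengths_nm, subtracted) if low <= w <= high]
    if not baseline:
        raise ValueError("no data points fall inside the zero window")
    offset = sum(baseline) / len(baseline)

    return [v - offset for v in subtracted]


# Toy values only (hypothetical, in millidegrees) to show the mechanics.
wavelengths = [237.0, 262.0, 340.0, 345.0, 350.0]
sample = [-3.0, 12.0, 1.0, 1.2, 0.8]
buffer_blank = [0.5, 0.5, 0.5, 0.5, 0.5]

corrected = process_cd(wavelengths, sample, buffer_blank)
print([round(v, 2) for v in corrected])
```

Applying the same correction to every spectrum in a melting series keeps the 262 nm peak heights comparable between temperatures and between the K+ and Li+ buffers.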