Introduction

More than 40 neurological diseases are caused by microsatellite repeat expansions, including Huntington disease (HD), fragile X syndrome (FXS), myotonic dystrophy types 1 and 2 (DM1, DM2), C9orf72 amyotrophic lateral sclerosis (ALS)/frontotemporal dementia (FTD), spinocerebellar ataxia (SCA) types 1, 2, 3, 6, 7, 8, 10, 12, 17, 31, and 36, and Fuchs endothelial corneal dystrophy (FECD) [1,2,3,4].

Microsatellite repeats are tandem stretches of 2–10 nucleotides in the DNA. These repeats, which are normally found within the genome, are often polymorphic in length. At some genetic loci, microsatellite repeats become genetically unstable [5, 6] and result in disease when the sequence length exceeds a certain threshold. This size threshold differs for each repeat-harboring gene. In HD, which is caused by a CAG•CTG expansion mutation within the huntingtin (HTT) gene, disease-causing mutations typically range from 40–70 CAG repeats, whereas individuals with DM1 harbor hundreds to thousands of CTG•CAG repeats in the DMPK gene [7, 8].

Disease-causing repeat expansion mutations can be found in exons, introns, and 5′ or 3′ untranslated regions (UTRs). The location of an expansion mutation within its gene has historically been used to group expansion diseases into one of the following mechanistic categories: (1) protein gain of function (GOF), exemplified by HD and other polyglutamine expansion diseases, in which the expansion mutation is contained within an ORF and translated into polyGln-expanded motifs; (2) RNA GOF, for mutations located in the non-coding regions of their corresponding genes. One clear example occurs in DM1, where CUG-expanded DMPK transcripts accumulate as intranuclear RNA foci and sequester RNA binding proteins belonging to the muscleblind-like (MBNL) family, which prevents them from performing their function; (3) protein loss of function (LOF), exemplified by FXS. FXS is caused by expansions of >200 CGG repeats located in the promoter and 5′-UTR of the FMR1 gene. The expanded CGG region undergoes methylation, leading to transcriptional silencing of the FMR1 gene and, therefore, the absence or loss of function of the FMR1 protein [9].

A “Pathological mystery”

Although significant data support the contribution of expanded RNAs and expanded polyglutamine proteins in specific dominantly inherited repeat expansion disorders, the molecular mechanisms underlying these diseases are still unclear. Several observations point to a lack of correlation between the pathological findings and the expression of the corresponding expanded polyglutamine protein or RNA, suggesting that some pieces of the “pathogenic mechanism” puzzle remain to be uncovered.

Some examples come from the CAG/polyGln expansion disorders, from cases in which the established causative mutant protein accumulates in brain regions not typically associated with neurodegeneration. For example, white matter alterations reported in HD or SCA3 can be detected at very early stages of the disease, but these changes appear to occur in the absence of, or with minimal accumulation of, huntingtin or ataxin-3 polyGln aggregates. These polyGln aggregates preferentially accumulate inside the nuclei of specific types of neurons [7, 10,11,12,13]. One clear example is the HD cerebellum, where recent reports describe white matter alterations [7, 14, 15] that are mainly negative for polyglutamine [16] and have not yet been sufficiently explored despite signs of possible cerebellar dysfunction in HD. Another example is the accumulation of SCA7 polyGln aggregates in the cerebral cortex [17], a brain region typically not considered to be a primary site of neuropathology. Additionally, several studies propose that soluble or oligomeric polyglutamine proteins are toxic and that the aggregates themselves may be neuroprotective [18]. Taken together, these observations suggest that, in addition to polyGln, other molecular factors contribute to disease.

An additional pathological puzzle is illustrated by fragile X tremor ataxia syndrome (FXTAS). After the disease was first recognized in elderly men who carry CGG premutation alleles in FMR1, large ubiquitin-positive inclusions, which colocalize with CGG-expansion RNAs, were observed in neurons and astrocytes throughout the brain [19,20,21,22,23,24]. While the number of inclusions correlated with the size of the CGG repeat expansion [21, 22], they did not contain the CGG-expanded mutant protein FMRP, and in a FXTAS mouse model, the ubiquitin aggregates contained a very limited amount of FMR1 mRNA [25]. These data suggested that additional molecular components contribute to these FXTAS inclusions.

Changing perspectives: bidirectional transcription and RAN translation

The additional discovery that bidirectional transcription occurs at the DM1 [26] and SCA8 [27] loci offered a different view of how expansion mutations could cause disease. For DM1, small 21-nt RNAs were reported [26]. For SCA8, both CUG and CAG expansion transcripts are expressed [27, 28]. Bidirectional transcription is now recognized as common across many microsatellite expansion loci [29,30,31], raising the possibility that sense and antisense transcripts and the resulting proteins contribute to disease.

The discovery of repeat-associated non-AUG (RAN) translation in 2011 [32] added another level of complexity to possible molecular mechanisms underlying repeat diseases. Several types of microsatellite expansion mutations have been shown to undergo RAN translation. Because RAN translation can occur in all three reading frames across both sense and antisense expanded transcripts, a cocktail of up to six proteins may result from a single mutation. RAN translation was initially reported in SCA8 and DM1 [32] and has become a fast-growing field with nine different RAN diseases reported to date. The current list includes SCA8 (CTG•CAG expansion), DM1 (CTG•CAG), C9orf72 ALS/FTD (GGGGCC•GGCCCC), fragile X tremor ataxia syndrome (FXTAS) and fragile X premature ovarian insufficiency (FXPOI) (CGG•CCG), HD (CAG•CTG), SCA31 (TGGAA•TTCCA), DM2 (CCTG•CAGG), and FECD (CTG•CAG) [16, 30, 32,33,34,35,36,37,38,39]. Here we review the pathological features of RAN protein accumulation across these diseases.
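The "up to six proteins" count follows from reading a pure repeat tract in three frames on each strand. The sketch below is only an illustration of that bookkeeping: the helper names (`reverse_complement`, `repeat_peptide`, `ran_products`) are hypothetical, it assumes the standard genetic code, and it ignores flanking sequence, translation efficiency, and whether a given product is actually detected in tissue.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def repeat_peptide(unit: str, frame: int) -> str:
    """Shortest repeating peptide motif encoded by a pure repeat of `unit`,
    read from reading frame 0, 1, or 2. A '*' marks a stop codon."""
    tract = unit * 4
    # 3 * len(unit) nucleotides cover one full period of the codon pattern.
    window = tract[frame:frame + 3 * len(unit)]
    peptide = "".join(CODON_TABLE[window[i:i + 3]] for i in range(0, len(window), 3))
    # period == len(peptide) always matches, so this loop always returns.
    for period in range(1, len(peptide) + 1):
        if len(peptide) % period == 0 and peptide == peptide[:period] * (len(peptide) // period):
            return peptide[:period]


def ran_products(unit: str) -> dict:
    """Repeat motifs from the three frames of the sense and antisense strands."""
    antisense = reverse_complement(unit)
    return {
        "sense": [repeat_peptide(unit, f) for f in range(3)],
        "antisense": [repeat_peptide(antisense, f) for f in range(3)],
    }


print(ran_products("CAG"))
```

For a CAG•CTG repeat this yields Gln, Ser, and Ala motifs from the CAG strand and Leu, Cys, and Ala motifs from the CTG strand, so two of the six frames encode the same homopolymer (polyAla) with different flanking sequence.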

RAN protein accumulation and disease pathology

SCA8

Spinocerebellar ataxia type 8 (SCA8) is a dominantly inherited progressive ataxia caused by a CTG•CAG expansion in the overlapping ATXN8OS and ATXN8 genes. SCA8 symptoms include limb and gait ataxia, dysarthria, and nystagmus [40]. Although SCA8 is generally an adult-onset disease, infantile and juvenile forms have been reported [41].

The CUG-expanded ATXN8OS transcripts lead to the formation of RNA foci and the sequestration of MBNL proteins [42]. The initial observation of SCA8 CUG expansion transcripts [43] was followed by the discovery of a CAG expansion transcript that encodes an ATG-initiated polyGln protein [27]. Later, polyAla [32] and polySer [33] RAN proteins, expressed from ATXN8 CAG expansion transcripts, were found.

The RAN polyAla protein (translated from the GCA frame) was detected in the soma and dendrites of Purkinje cells from SCA8 human postmortem tissue and SCA8 BAC mice [32]. In contrast, SCA8 RAN polySer protein (from the AGC frame) accumulates in the cerebellum of SCA8 patients and SCA8 transgenic mice [33] (Fig. 1). Curiously, the ATG-initiated polyGln protein and RAN polySer protein show strikingly different patterns of accumulation, with intranuclear polyGln aggregates appearing primarily in Purkinje cells, whereas RAN polySer accumulates primarily in subcortical and deep white matter regions of the cerebellum (Table 1). SCA8 polyGln and polySer aggregates are also detected in similar regions of the frontal cortex, hippocampus, and brain stem, but do not colocalize [33].

Fig. 1 RAN proteins identified in patient tissue

Table 1 Summary of RAN proteins identified in neurologic disease

In SCA8 BAC transgenic mice, polySer aggregates increase with age and disease progression, initially accumulating in the brain stem at two months of age, which is consistent with the early motor defects observed in these mice [27, 33]. In both SCA8 mice and human postmortem tissue, SCA8 polySer preferentially accumulates in white matter regions that also show signs of axonal breakage and demyelination. In contrast, no signs of axonal loss or demyelination were found in brain regions negative for polySer accumulation [33].

Taken together, these data support possible roles for polyGln, RAN polyAla, and RAN polySer proteins in SCA8 pathology. The location of polyGln and RAN polyAla suggests a role in Purkinje cell loss [32], and the accumulation of RAN polySer in white matter regions suggests its involvement in demyelination [32, 33, 44, 45]. Additionally, the CUG transcripts form RNA foci in Purkinje cells. It is also possible that RAN proteins expressed from CUG expansion transcripts will be found in the future and may contribute to disease.

C9orf72 ALS/FTD

C9orf72 amyotrophic lateral sclerosis and frontotemporal dementia (C9-ALS/FTD) is a dominantly inherited disorder caused by a GGGGCC hexanucleotide repeat expansion in the first intron of the C9orf72 gene. This mutation is the most common known genetic cause of both familial and sporadic ALS and FTD [46,47,48].

C9orf72 ALS is characterized by the degeneration of upper and lower motor neurons, which leads to muscle weakness and paralysis, respiratory failure, and death, usually within 2–5 years [49]. For FTD, marked neurodegeneration occurs in the frontal and anterior temporal lobes, leading to speech difficulties and cognitive and behavioral abnormalities, such as loss of empathy, abrupt mood changes, disinhibition, and behavioral changes [50, 51].

The C9orf72 expansion mutation is bidirectionally transcribed, and both sense (GGGGCC) and antisense (GGCCCC) RNA foci accumulate throughout the brain and spinal cord [52]. The expanded transcripts also undergo RAN translation, generating C9-polyGlyPro (GP), C9-polyGlyArg (GR), and C9-polyGlyAla (GA) RAN proteins from the sense transcript, and C9-polyGlyPro (GP), C9-polyProArg (PR), and C9-polyProAla (PA) proteins from the antisense transcript [30, 31, 53].

Although the 2011 discovery of the GGGGCC mutation that connects ALS/FTD with the microsatellite repeat disorders is quite recent [46, 47], there has been an intense research focus on understanding the pathogenic mechanisms. Data supporting C9orf72 protein LOF, sense and antisense RNA GOF, and RAN protein toxicity are all being hotly pursued. Here we focus on the role of C9 RAN proteins in disease.

RAN proteins are found in multiple regions in postmortem CNS tissue, including frontal and motor cortex, hippocampus, cerebellum, and spinal cord. RAN proteins typically accumulate as cytoplasmic or perinuclear cytoplasmic aggregates, primarily in neurons. While RAN protein aggregates have been observed in motor neurons, they are relatively rare in these ALS-vulnerable cells, yet abundant in other brain regions that are not typically thought to be affected in ALS patients, including the cerebellum. These data raise questions about whether or not RAN proteins are a primary driver of disease [54,55,56,57]. It is also possible that the cells most vulnerable in C9 ALS/FTD patients, including motor neurons, are highly sensitive to RAN proteins and die before aggregates are visible, or that cells that had RAN protein aggregates had already died and are no longer detectable in most postmortem cases. RAN-positive immunostaining shows variable density, with some brain areas showing intense and clustered accumulation of C9-RAN proteins, while other regions show a more scattered pattern of RAN protein aggregates. C9-RAN protein aggregates colocalize with p62 and are TDP-43 negative [30, 31, 34, 35, 57, 58]. Although TDP-43 aggregates do not colocalize with C9 RAN proteins, cytoplasmic TDP-43 aggregates are a hallmark feature of C9 and other forms of ALS. A recent Drosophila study links RAN protein accumulation, karyopherin-α pathology, and TDP-43 accumulation and mislocalization [59].

Additional experimental models support a toxic role for individual C9 RAN proteins in the absence of RNA GOF effects: PR and GR have been shown to be highly toxic in multiple model systems including cultured cells [60,61,62,63,64,65], zebrafish [66], Drosophila [60, 65, 67,68,69,70,71,72], and mice [73], where they can affect development, motor performance, and cellular function. GA and PA proteins have also been reported to be toxic in a variety of in vitro and in vivo models [66, 67, 74,75,76,77,78], and GA accumulation has been correlated with neurodegeneration across both C9 ALS and FTD pathology [79].

The cellular pathways affected by RAN protein overexpression are numerous and include ER stress [73, 75], oxidative stress and DNA damage [80], protein translation abnormalities [72], nuclear transport deficits [69, 70, 81,82,83], and stress granule formation [73]. Nevertheless, it is less clear whether RAN proteins play a major role in disease when expressed at endogenous levels. To address this issue, a number of BAC transgenic mouse models have been developed using the endogenous human promoters [84,85,86,87]. Surprisingly, two of these models showed molecular phenotypes, including the accumulation of RAN proteins, but did not develop phenotypic manifestations of the disease. The other two models developed both molecular and disease phenotypes. The Zhu model showed mild behavioral abnormalities and subtle neuronal loss in the hippocampus. The Liu et al. model developed classic features of ALS and FTD including weight loss, paralysis, motor neuron loss, cortical and hippocampal degeneration, muscle denervation, anxiety-like behavior, and decreased survival. In this model, RAN protein aggregates are found in regions showing neurodegeneration and increase with disease onset and phenotype.

FXTAS and FXPOI

Fragile X tremor ataxia syndrome (FXTAS) is a late-onset neurodegenerative disorder caused by a CGG repeat expansion of 55–200 repeats in the 5′UTR of the FMR1 gene on the X chromosome. The disease primarily affects older men, who develop progressively worsening tremor, gait ataxia, parkinsonism, and dementia [20, 88,89,90,91]. A pathological hallmark of FXTAS is the accumulation of eosinophilic, ubiquitin-positive nuclear inclusions in both neurons and astrocytes [21, 22]. In contrast to fragile X syndrome (FXS) patients, in whom expansion mutations (>200 repeats) shut down gene expression, FMR1 transcript levels are 2–8 times higher in FXTAS patients than in control individuals [92, 93]. FXTAS CGG-expansion transcripts form RNA foci that have been reported to sequester RBPs including Purα [94], hnRNP A2/B1 [95], and Sam68 [96], causing splicing deficits in FXTAS brain samples and neurodegeneration in Drosophila models [95,96,97].
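The FMR1 repeat-length ranges quoted in this review can be restated as a small lookup. This is a sketch for orientation only, not a diagnostic rule: the function name `classify_fmr1_cgg` is hypothetical, the cut-offs are the ones given in the text (55–200 CGG repeats for the premutation range, >200 for FXS), and the label for alleles below 55 repeats is a placeholder because the review does not subdivide that range.

```python
def classify_fmr1_cgg(repeats: int) -> str:
    """Map an FMR1 CGG repeat count onto the ranges quoted in the text."""
    if repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if repeats > 200:
        # Expansion associated with transcriptional silencing of FMR1.
        return "full mutation (FXS range)"
    if repeats >= 55:
        # Range in which FMR1 transcripts are expressed and can undergo RAN translation.
        return "premutation (FXTAS/FXPOI range)"
    return "below premutation range"


for n in (30, 88, 98, 250):
    print(n, classify_fmr1_cgg(n))
```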

FMR1 CGG-expansion RNAs undergo RAN translation in a length-dependent manner, producing polyGly and polyAla proteins in the GGC and GCG reading frames, respectively [36]. PolyGly accumulation was detected in FXTAS fly and mouse models, and in the frontal cortex, cerebellum, and hippocampus of postmortem FXTAS brains [36]. Homopolymeric RAN proteins are also expressed from antisense CCG expansion RNAs. The accumulation of antisense polyPro (CCG), polyArg (CGC), and polyAla (GCC) has been detected in cell culture [98]. PolyPro accumulates in hippocampus, cortex, midbrain, and pons, while polyAla is detected in hippocampus, cortex, and midbrain [98] (Fig. 1, Table 1).

Interestingly, polyGly, polyPro, and polyAla RAN proteins all preferentially accumulate as neuronal perinuclear or intranuclear inclusions, which are also ubiquitin-positive, a previously established pathological marker of FXTAS [21, 22, 99], suggesting that RAN proteins contribute to disease. Further evidence supporting a role for polyGly in disease comes from a mouse model study. Sellier et al. compared FXTAS mice with or without a close cognate initiation codon required for polyGly expression [100]. The first model, expressing both CGG-expanded RNA and polyGly aggregates, showed brain inflammation, Purkinje cell loss, motor impairment, and decreased survival. In contrast, no significant pathology was observed in mice expressing CGG-expansion transcripts in the absence of polyGly protein. The expression of the FMR1 polyGly protein also causes death in neuronal cell cultures [100]. The polyGly repeat motif was shown to be necessary for protein aggregation, and the C-terminal region is required to drive cell toxicity, possibly through interactions with LAP2b and disruption of the nuclear lamina [100].

Fragile X premature ovarian insufficiency (FXPOI), like FXTAS, is caused by expansions of 55–200 CGGs in the 5′UTR of the FMR1 gene. FXPOI causes early (≤40 years old) ovarian dysfunction in women carrying the repeat expansion, who are also at risk of developing FXTAS later in life [88, 101, 102]. For many years it has been known that CGG-expanded FMR1 transcripts are upregulated in FXPOI patients and that ubiquitin-positive intranuclear inclusions accumulate in the ovarian stromal cells of FXPOI patients [103]. A recent study identified RAN polyGly aggregates, which are also positive for ubiquitin, in ovarian stromal cells of a single FXPOI patient [104]. PolyGly/ubiquitin aggregates are also found in FXTAS knock-in mice carrying 98 CGG repeats. Animal studies show that polyGly aggregates increase with age and are abundant in 40-week-old mice. Older animals also show a higher number of primordial follicles (1.5 fold) compared to wild type animals [104]. Additionally, RAN polyGly aggregates are found in the pituitary gland of FXTAS mice but not control mice [104], suggesting that polyGly might contribute to ovarian failure by affecting the hypothalamus-pituitary-adrenal axis.

Taken together, these studies show RAN proteins accumulate in patient samples and in animal models. While there is substantial evidence that polyGly is toxic, additional work will be required to understand the contributions that polyGly and other known and yet to be detected RAN proteins play in FXTAS and FXPOI.

HD

Huntington disease (HD) is a dominantly inherited disorder caused by a CAG•CTG expansion mutation in the first exon of the HTT gene [2, 7]. The disease is characterized by severe motor, cognitive, and psychiatric alterations that typically present in adulthood but can also manifest early in life as more severe and faster-progressing juvenile-onset cases. HD is fully penetrant at 40 or more CAG•CTG repeats, and there is a clear correlation between longer repeat expansions and earlier onset and more severe forms of the disease.

The HD expansion mutation results in an abnormally expanded polyGln tract in the huntingtin (HTT) protein. Since the discovery of the causative gene in 1993 [105], most research has focused on understanding the toxic role of the mutant, polyGln-expanded, ATG-initiated HTT protein [7, 106]. RNA-mediated toxicity has also been reported to play a role in HD through the expression of CUG-expanded antisense transcripts [107] and the generation of CAG microRNA species [108]. However, critical aspects of HD, such as the differential vulnerability of specific brain regions and the distinct and more severe pathology in juvenile-onset cases, are not fully understood.

The possibility of RAN translation occurring in HD has been explored by different groups, with polyAla and polySer expressed from HTT minigenes in cultured cells [32, 109]. In 2015, Banez-Coronel et al. [16] demonstrated that RAN translation can also occur across relatively small repeat expansions located in protein coding regions, and that it occurs in vivo. The HD CAG/CTG expansion expressed four novel homopolymeric RAN proteins from both sense and antisense transcripts. Banez-Coronel et al. showed that polyAla, polySer, polyLeu, and polyCys RAN proteins accumulate in several brain regions including caudate/putamen, cortical white matter, and cerebellum (Fig. 1, Table 1), with RAN protein signal dramatically higher in the cerebellum of juvenile-onset cases showing severe cerebellar atrophy.

HD-RAN protein accumulation is variable and can be detected as increased nuclear or cytoplasmic staining, with soluble or aggregated patterns. Although HD-RAN proteins are detected in neurons, they are more frequently found in glial cells. Strikingly, prominent RAN protein staining was observed in the white matter bundles of the striatum, Bergmann glia, white matter regions of the cerebellum, and white matter regions around the dentate nuclei, regions in which polyGln aggregates are absent or minimal. These data suggest RAN proteins contribute to the white matter abnormalities reported in HD [110,111,112,113,114].

Another important link between HD-RAN accumulation and disease pathology comes from the frequent colocalization of RAN proteins with active Caspase3. These data suggest that RAN-positive cells are damaged or undergoing cell death. HD-RAN proteins are more frequently found in brain regions showing astrogliosis and microglial activation, and in areas showing severe atrophy [16]. All four identified HD-RAN proteins are toxic to neural cells in vitro, independent of RNA-mediated effects [16].

Taken together, these findings demonstrate that sense and antisense RAN proteins accumulate in HD brains and correlate with sites of degenerative changes.

DM2

Myotonic dystrophy type 2 (DM2) is caused by an intronic CCTG expansion in the CNBP gene [115]. This multisystemic disease, which causes muscle weakness and myotonia, also affects the heart, the eye, the endocrine system, and the brain. CNS features include executive function deficits and white matter abnormalities [116,117,118,119,120,121]. DM2 CCUG-expanded transcripts form nuclear RNA foci, which sequester the muscleblind (MBNL) family of RNA binding proteins, causing abnormalities in RNA localization and processing [1, 122, 123]. For many years, RNA GOF effects have been considered to be the major driver of disease in DM2. However, a recent study by Zu et al. [38] showed that the DM2 tetranucleotide expansion is bidirectionally transcribed and produces both sense (CCUG) and antisense (CAGG) expansion transcripts. These transcripts, in turn, undergo RAN translation, generating tetrapeptide leucine-proline-alanine-cysteine (LPAC) and glutamine-alanine-glycine-arginine (QAGR) repeat expansion proteins without an AUG-initiation codon. These RAN proteins accumulate in the cortex, hippocampus, and striatum of DM2 patient brains with specific patterns (Fig. 1): LPAC accumulates as small cytoplasmic punctate aggregates that can be perinuclear or located in the cell processes, while QAGR immunostaining is primarily nuclear, with small aggregates at or in close proximity to the nuclear membrane (Table 1).
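Because a four-nucleotide unit is not a multiple of the codon length, every reading frame of a pure tetranucleotide repeat cycles through the same four codons, so each strand encodes a single tetrapeptide repeat regardless of frame. The sketch below illustrates this for the DM2 repeat under simplifying assumptions: it uses the standard genetic code, models an uninterrupted repeat with no flanking sequence, and the helper names (`translate_repeat`, `is_rotation`) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_repeat(unit: str, frame: int, n_codons: int) -> str:
    """Translate `n_codons` codons of an endless repeat of `unit`, starting at
    nucleotide offset `frame`. RNA input (U) is converted to DNA letters (T)."""
    unit = unit.upper().replace("U", "T")

    def nt(i: int) -> str:
        return unit[i % len(unit)]

    return "".join(
        CODON_TABLE[nt(i) + nt(i + 1) + nt(i + 2)]
        for i in range(frame, frame + 3 * n_codons, 3)
    )


def is_rotation(a: str, b: str) -> bool:
    """True if `a` is a cyclic rotation of `b`."""
    return len(a) == len(b) and a in b + b


for label, unit, motif in (("sense", "CCUG", "LPAC"), ("antisense", "CAGG", "QAGR")):
    for frame in range(3):
        peptide = translate_repeat(unit, frame, 8)
        # Every frame yields the same tetrapeptide repeat, shifted in register.
        print(label, frame, peptide, is_rotation(peptide[:4], motif))
```

In this idealized model the three frames differ only in register; in a real transcript they would also differ in the sequence flanking the repeat, which the sketch does not capture.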

LPAC is consistently detected in grey matter regions of the brain. LPAC aggregates are found primarily in neurons, and occasionally in glia. There is considerable variability in the frequency of LPAC-positive cells within the brain, even within the same brain regions. Interestingly, LPAC-positive regions show abundant macrophage staining, suggesting that inflammation can be a trigger that favors RAN translation or that RAN LPAC proteins trigger inflammatory processes. In contrast, QAGR primarily accumulates in the white matter regions of the brain and QAGR-positive white matter regions show rarefaction of fibers, suggesting a role for QAGR in axonal loss and increased water intercalation [38] (Table 1). Additionally, both LPAC and QAGR are toxic to cultured cells in the absence of RNA-mediated effects, supporting a pathogenic role of these RAN proteins. Zu et al. also showed that MBNL overexpression decreases RAN protein levels by sequestering CCUG expansion transcripts within the nuclei. When CCUG repeats are not sequestered they can be exported into the cytoplasm where they undergo RAN translation.

DM1

Myotonic dystrophy type 1 is a dominantly inherited neuromuscular disorder with multisystemic features including myotonia, progressive muscle weakness and wasting, cataracts, cardiac effects, testicular atrophy, and CNS abnormalities [118]. The disease is caused by a CTG•CAG expansion in the 3′UTR of the DMPK gene. Healthy individuals have 5–38 repeats while DM1 patients carry large expansions containing hundreds or thousands of repeats [8].

DM1 is generally considered an RNA GOF disease based on a wealth of genetic and biochemical data and on the location of the causative expansion in a non-coding region [124, 125]. CUG expansion transcripts expressed from the DM1 locus form nuclear RNA foci, which sequester the RBP MBNL1 and cause alternative splicing abnormalities [126, 127].

The detection of a polyGln RAN protein in DM1 raises the possibility that RAN proteins contribute to this disease [32]. DM1 polyGln nuclear aggregates have been detected in cardiac myocytes of DM1 mice, as well as in skeletal muscle and peripheral leukocytes in DM1 human samples [32] (Fig. 1, Table 1). The colocalization of DM1 polyGln with caspase 8 in human leukocytes suggests polyGln RAN proteins can be toxic. While RAN proteins expressed in the polyGln frame are found in several DM1 mouse and human tissues, additional work and better antibody tools are needed to understand when and where DM1 RAN proteins accumulate, whether proteins from multiple frames are expressed, and whether they accumulate in the brain, a common theme in other repeat expansion disorders.

SCA31

Spinocerebellar ataxia type 31 (SCA31) is an adult-onset autosomal dominant neurodegenerative disorder that shows progressive cerebellar ataxia and Purkinje cell degeneration [128]. SCA31 is associated with a TGGAA•TTCCA pentanucleotide repeat expansion. The mutation is located in an intronic region shared by the genes brain expressed, associated with Nedd4 (BEAN1) from one DNA strand, and thymidine kinase 2 (TK2) from the opposite strand [129, 130]. In support of an RNA GOF model, Niimi et al. showed that UGGAA expansion transcripts form nuclear RNA foci in Purkinje cells [131]. Additionally, Ishiguro et al. [37] demonstrated the accumulation of a tryptophan-asparagine-glycine-methionine-glutamic acid (WNGME) pentapeptide repeat protein in vivo. This repeat is distinct from other expansion mutations in that an ATG codon, which may be used for translation initiation, is embedded throughout the repeat tract.
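The embedded start codon can be seen by writing out the repeat tract. The sketch below is illustrative only: it assumes the standard genetic code and a pure, uninterrupted repeat, and the helper names (`embedded_atg_offset`, `peptide_from_atg`) are hypothetical. It says nothing about whether initiation at this ATG actually occurs in vivo.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def embedded_atg_offset(unit: str):
    """Offset of the first ATG in a pure repeat of `unit`, or None if the
    repeat contains no ATG. RNA input (U) is converted to DNA letters (T)."""
    unit = unit.upper().replace("U", "T")
    # Three copies are enough to expose any ATG spanning a unit boundary.
    offset = (unit * 3).find("ATG")
    return None if offset < 0 else offset


def peptide_from_atg(unit: str, n_codons: int) -> str:
    """Translate `n_codons` codons of the repeat starting at its first ATG."""
    start = embedded_atg_offset(unit)
    if start is None:
        raise ValueError("no ATG codon in this repeat")
    unit = unit.upper().replace("U", "T")
    nts = [unit[i % len(unit)] for i in range(start, start + 3 * n_codons)]
    return "".join(CODON_TABLE["".join(nts[i:i + 3])] for i in range(0, len(nts), 3))


print(embedded_atg_offset("UGGAA"), peptide_from_atg("UGGAA", 10))
# Repeat units from the other diseases reviewed here, for comparison.
for other in ("CAG", "CTG", "CGG", "CCG", "GGGGCC", "GGCCCC", "CCTG", "CAGG", "TTCCA"):
    print(other, embedded_atg_offset(other))
```

Translation from the embedded ATG reads Met-Glu-Trp-Asn-Gly, which is the WNGME pentapeptide repeat in a different register, whereas none of the other repeat units listed contains an ATG on the strand shown.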

Both RNA foci and WNGME RAN proteins accumulate in an SCA31 fly model and induce severe neurodegeneration and a shortened life span. SCA31 RAN pentapeptides accumulate as granular structures in the cell body and dendrites of Purkinje cells in the cerebellum of SCA31 patients, but not in control cases (Fig. 1, Table 1). Interestingly, the RBPs TDP-43, FUS, and hnRNPA2B1 bind to the UGGAA RNA expansion, acting as chaperones, and reducing RNA foci, RAN protein levels and toxicity in the fly eye [37]. Although new studies are necessary to further unravel the pathogenic mechanisms in SCA31, these results suggest that RNA GOF, RAN protein GOF or both mechanisms are involved in SCA31 onset and progression.

FECD

The most recent addition to the list of RAN protein disorders is Fuchs endothelial corneal dystrophy (FECD). FECD is an inherited degenerative disease that severely affects the corneal endothelium and results in corneal edema and, in severe cases, vision loss [132, 133].

There are several genes associated with FECD, but the most specific genetic association known is a CTG•CAG expansion in the third intron of TCF4 [3].

Similar to DM1 and SCA8, CUG expansion transcripts expressed from TCF4 form nuclear RNA foci that colocalize with MBNL1, which is believed to underlie the aberrant RNA splicing found in the cornea of FECD patients [134]. Additional experiments by Soragni et al. [39] showed that overexpression of CUG expansion transcripts from FECD minigenes generates RAN polyCys proteins in cultured cells, which cause toxicity and oxidative stress in immortalized corneal endothelial cells. PolyCys is also detected in the corneal endothelium of patients with FECD [39] (Fig. 1, Table 1). Additionally, this group detected a polyGln protein in FECD fibroblasts. Similar to other expansion diseases, further studies are needed to understand the contributions that RNA GOF and RAN proteins have in FECD.

Common aspects across RAN protein diseases

Although each RAN-positive disorder has its own set of distinct clinical features and pathology, individual RAN proteins, even if they harbor the same repeat motif, should be considered distinct because they often have unique flanking sequences and tissue accumulation patterns. Nevertheless, there are several common themes underlying RAN translation and RAN-mediated pathology that provide insight.

Repeat length dependence

RAN translation is often repeat-length-dependent, with a length threshold required for RAN protein production and increased RAN protein accumulation with longer repeat tracts. Length-dependent RAN translation has been observed using a variety of minigene constructs for SCA8 [32], HD [108], FXTAS [36], and C9orf72 ALS/FTD [30].

In FXTAS cell culture experiments, RAN polyAla proteins are detected from constructs expressing 88 CGG repeats but not from constructs with 30 repeats [36]. Similarly, for HD, transfection experiments using HTT exon1 minigenes showed that RAN polyAla proteins were detected in cells expressing ≥50 CAG repeats, whereas RAN polySer was detected with 35 CAG repeats. The polySer protein showed progressively increasing aggregation with longer repeat lengths [16]. The correlation of longer repeats with higher RAN protein levels and increased RAN protein aggregation may contribute to the repeat-length-dependent genetic anticipation that is found in most of the repeat expansion disorders.

White matter accumulation

The prominent RAN protein accumulation in damaged white matter brain regions in HD, DM2, and SCA8 cases [16, 33, 38] supports the hypothesis that RAN proteins play a role in the white matter alterations that can occur very early in each of these diseases. These white matter abnormalities have historically been considered a consequence of neuronal death, since the expected toxic CUG RNA or polyGln proteins do not accumulate in these regions. The findings of astrogliosis, demyelination, and microglial and/or caspase activation in white matter regions with abundant RAN protein accumulation offer an exciting new pathogenic hypothesis that needs further study. It is possible that RAN white matter accumulation will be found in other known and yet to be identified RAN protein diseases [135,136,137].

Variable RAN protein accumulation

Immunohistochemistry studies have shown that RAN protein aggregates frequently show variable densities. For example, some brain areas show clustered accumulation and high-density staining, while other areas within the same brain region (e.g., the hippocampus, cerebellum, or frontal cortex) show less frequent or no RAN-positive cells [30, 33, 108]. These data suggest that localized molecular or environmental triggers can activate RAN translation. The observation of RAN translation clusters in regions of infarction (DM2) [38] suggests that external factors such as hypoxia, inflammatory molecules, free radicals, specific metabolites, or energy deficits might promote RAN translation [138,139,140]. Another possibility is that RAN proteins or factors required for RAN translation can propagate from cell to cell, facilitating RAN accumulation in neighboring cells. Such cell-to-cell transmission has been demonstrated for C9orf72 GA, GP, and PA RAN proteins in cultured cells [141, 142].

A clear understanding of the mechanisms underlying variable or patchy RAN protein accumulation may provide valuable knowledge about disease progression and regional susceptibility, a possibility suggested by Zu et al. [30].

Prospecting for proteins in other diseases

RAN translation is a rapidly moving field, with nine different repeat disorders identified as RAN-positive diseases since the discovery of RAN translation by Zu et al. in 2011 [32]. The demonstration that RAN translation can occur from repeats located in both coding and non-coding regions suggests that RAN proteins may be found in additional polyGln expansion disorders.

Interestingly, recent reports show that ATXN8OS expansions and FXTAS mutations, as well as FXTAS-polyGly proteins, are found in autopsy brains from patients with progressive supranuclear palsy (PSP) and Parkinsonism [143, 144]. These findings highlight that known or yet to be discovered repeat expansion mutations may contribute to the symptomatology of additional neurological disorders.

Better detection tools are badly needed to identify novel RAN-translated proteins and to characterize diseases already known to express one or more RAN proteins [36, 39, 100]. Reduced protein solubility, low protein levels, and age- or stress-dependent RAN protein expression [138,139,140] all present technical challenges for protein detection in patient samples. Additionally, with the exception of polyGln antibodies, it has been difficult to generate specific antibodies to detect homopolymeric repeat tracts, which would allow screening for RAN proteins across diseases with the same repeat expansion mutation (e.g., the CAG•CTG SCAs). It is also likely that, with the recent advances in DNA sequencing, including long-read sequencing technology [145], novel repeat expansion diseases will be identified.

Blocking RAN proteins as a therapeutic strategy

The discovery of RAN proteins in an increasing number of expansion disorders and the various lines of evidence that support their pathogenic role highlight the need to develop strategies to block RAN translation. This would allow scientists to test and better understand the contribution of RNA GOF vs. RAN protein toxicity and may lead to the development of therapeutic strategies.

The use of antisense oligonucleotides (ASOs) to degrade expansion RNAs has received considerable attention. ASOs provide a platform to degrade multiple types of disease-causing expansion transcripts and would therefore also reduce the levels of RAN proteins expressed from those transcripts. However, ASO-based therapeutic efforts have focused on targeting the expanded transcript that was initially discovered and not the transcript from the other strand [146,147,148,149,150,151,152].

The recurrent observations that bidirectional transcription and RAN translation occur across both strands of the repeat expansion mutation suggest that targeting both sense and antisense transcripts may be required for some of these disorders. It will also be important to consider the possibility that downregulation of sense transcripts may lead to the upregulation of antisense transcripts in some of these diseases [153, 154].

An interesting alternative therapeutic approach would be to block RAN translation. This could be achieved using strategies that prevent expansion RNAs from leaving the nucleus, including the overexpression of RBPs such as MBNL [38] or TDP-43 [37], or by reducing levels of factors that favor RAN translation, such as the eukaryotic initiation factor 3F (eIF3F) [33].

Additional therapeutic approaches can be aimed at preventing the toxicity of specific RAN proteins. Efforts aligned with this strategy include the generation of drugs that prevent RAN protein function and/or interactions, targeting RAN proteins for clearance through ubiquitination, or the use of antibodies to specifically target individual or sets of RAN proteins. For example, anti-RAN protein antibodies have been shown to prevent protein propagation in cell culture [141].

Conclusions

RAN translation has now been reported in nine diseases caused by repeat expansion mutations. Different microsatellite motifs, including tri-, tetra-, penta-, and hexanucleotide repeat expansions, are permissive for RAN translation, suggesting RAN translation may be a common process across most microsatellite expansion disorders. The development of better tools to detect RAN products is likely to identify new RAN-positive diseases as well as novel RAN protein motifs. Because the production of both sense and antisense RAN proteins is a common feature found in many of these disorders, developing therapeutic strategies to target both sense and antisense RAN proteins will be important. Finally, the development of new in vivo models that allow both sense and antisense expression will help the community to test and better understand the promise and limitations of various therapeutic strategies.