Main

Cancer is a genetic disease – the result of dysregulation of the gene networks that maintain normal cellular identity, growth and differentiation. Only a small proportion of cancers are attributed to inheritable single-gene disorders, usually involving non-synonymous mutations in the coding sequence of protein-coding genes, such as BRCA1 in familial breast cancer (Miki et al, 1994) and Rb1, which causes familial retinoblastoma (Du and Pogoriler, 2006). The majority of cancers arise via somatic mutations. The susceptibility to develop sporadic cancer is complex, comprising various genetic and environmental factors. The heritable component of cancer susceptibility is dependent on the cancer type and is in many cases considerable. Despite extensive study, the majority of the genetic component of cancer susceptibility has not yet been linked to individual genes, highlighting significant deficiencies in our understanding of the molecular basis of cancer development. A key development in unravelling the complex genetics of cancer may be the shift in focus from looking exclusively at the protein-coding components of the genome to consideration of the role of variation in regulatory elements.

The combination of various genome-wide approaches, best typified by the ENCODE project (Dunham et al, 2012), has stimulated a dramatic reassessment of the information content of the human genome. Rather than islands of protein-coding genes in a sea of junk DNA, it is increasingly apparent that much of the genome, far more than expected, encodes regulatory information. Indeed, the ENCODE project recently concluded that although only ∼1.2% of the genome is protein-coding, at least 20% shows biological function and over 80% exhibits biochemical indices of function. The non-protein-coding regions of the genome serve not only as a substrate for DNA-binding proteins that in turn govern both the expression and 3D architecture of the genome, but also as a template for the transcription of vast numbers of noncoding RNAs (Carninci et al, 2005), which exhibit exquisitely cell-specific and developmentally dynamic expression patterns (Dinger et al, 2008; Mercer et al, 2008, 2010; Pang et al, 2009; Cabili et al, 2011; Khaitan et al, 2011) and are capable of transacting a wide repertoire of regulatory functions (Amaral et al, 2008; Mercer et al, 2009).

Based on transcript size, these noncoding RNAs can be grouped into two major classes: small noncoding RNAs (<200 bp) and long noncoding RNAs (lncRNAs; >200 bp, up to ∼100 kb). Long noncoding RNAs share many features of mRNAs; they are frequently transcribed by RNA polymerase II, polyadenylated and can show complex splicing patterns. Long noncoding RNAs are found in sense or antisense orientation to protein-coding genes, within introns of protein-coding genes or in intergenic regions of the genome. While many lncRNAs may function in cis (through the act of their transcription), a significant proportion of lncRNAs have intrinsic RNA-mediated functions in trans (Guttman et al, 2011). Although the functions of only a minority have been described, their dynamic and regulated expression suggests that many more may be functional. Long noncoding RNAs can operate through a variety of mechanisms (reviewed by Rinn and Chang, 2012) and their importance in many aspects of cell differentiation and homeostasis is well established. In Figure 1, we summarise a number of these mechanisms and provide examples of lncRNAs involved in different steps of cancer progression.

Figure 1

Generalised mechanisms and associated examples of lncRNAs involved in cancer progression. Long noncoding RNAs act through a variety of mechanisms such as remodelling of chromatin (A), transcriptional co-activation or -repression (B), protein inhibition (C), as post-transcriptional modifiers (D) or decoy elements (E). Consequently, mis-expression of lncRNAs can lead to changed expression profiles of various target genes involved in different aspects of cell homeostasis.

Regulatory roles of lncRNAs in cancer

The mechanisms through which lncRNAs contribute to the regulatory networks that underpin cancer development are diverse. Accumulating evidence suggests that a major role of lncRNAs is to guide the site specificity of chromatin-modifying complexes to effect epigenetic changes (Mattick and Gagen, 2001; Mattick et al, 2009). At least 38% of lncRNAs present in several human tissues bind to the polycomb repressive complex 2 or the chromatin-modifying proteins CoREST and SMCX (Khalil et al, 2009). Others bind to trithorax chromatin-activating complexes and/or activated chromatin (Dinger et al, 2008). The well-characterised lncRNAs ANRIL, XIST, HOTAIR and KCNQ1OT1 are able to recruit epigenetic modifiers to specific loci to reprogram the chromatin state. Recent studies have linked their mis-expression to diverse cancers (ANRIL: prostate cancer, XIST: female cancers, HOTAIR: breast cancer, KCNQ1OT1: colorectal cancer) (Figure 1A; Gutschner and Diederichs, 2012).

Other lncRNAs have been found to be key regulators of the protein signalling pathways underlying carcinogenesis. The lncRNA lincRNA-p21 contains binding sites for the tumour suppressor p53 in its promoter and is directly activated by p53 in response to DNA damage. LincRNA-p21 associates with heterogeneous nuclear ribonucleoprotein K and localises this protein to the promoters of genes that are downregulated in the canonical p53 pathway and in p53-mediated apoptosis, where it maintains their repression (Figure 1B; Huarte et al, 2010). Thus, similar to its activator p53, lincRNA-p21 may play an important role in tumour suppression by operating as a transcriptional repressor.

To achieve replicative immortality, cancerous cells need to override the cellular mechanisms inhibiting proliferation. Telomeres are the protective ends of chromosomes, composed of several kilobases of short repeats. These ends are progressively shortened during cell division until they reach a critical length triggering cell death or senescence. However, the majority of cancer cells circumvent this loss by expressing telomerase, an enzyme that adds telomeric repeats to the 3′ end of chromosomes. Recent studies have demonstrated that telomeric ends are transcribed into a lncRNA named TERRA, which binds telomerase, inhibiting its activity in vitro (Redon et al, 2010). In many cancer cells TERRA is downregulated, providing a possible link to the longevity of cancer cells by telomerase-mediated lengthening of chromosomal ends (Ng et al, 2009; Figure 1C).

Some lncRNAs are constituents of macromolecular complexes with roles in RNA processing. The lncRNA MALAT1 is thought to act at a post-transcriptional level (Figure 1D) by controlling alternative splicing of pre-mRNAs. It modulates the levels of active (phosphorylated) serine/arginine (SR) splicing factors (Tripathi et al, 2010). MALAT1 is upregulated in several cancer types and its overexpression has been linked to an increase in cell proliferation and migration in lung and colorectal cancer cells (Schmidt et al, 2011; Xu et al, 2011). A recent study indicated that MALAT1 may also have a role in the regulation of gene expression, but not alternative splicing, in lung metastasis, highlighting the controversial nature of MALAT1’s molecular mechanism (Gutschner et al, 2013).

Other lncRNAs can also act as decoys, sequestering biomolecules and preventing them from fulfilling their cellular functions. An example of this mechanism is represented by the tumour suppressor gene PTEN and its pseudogene PTENP1. While PTENP1 has a disrupted open reading frame that precludes it from encoding a functional protein, its 3′UTR is well conserved. By ectopic expression, Poliseno et al showed that the PTENP1 3′UTR can increase PTEN expression by binding to microRNAs that downregulate PTEN expression. Thus, PTENP1 plays an important role in cancer biology, restricting cell proliferation by acting as a microRNA decoy for the tumour suppressor PTEN (Poliseno et al, 2010; Figure 1E).

In summary, lncRNAs can act through a number of mechanisms to control cancer state. Despite growing knowledge about the molecular mechanisms of lncRNA functions in cancer, the modes of action of most lncRNAs remain unclear. A broader understanding of the mechanisms of action of lncRNAs, and the regulatory pathways, hierarchies and networks in which they operate, will greatly increase our understanding of their functions in cancer and open new therapeutic avenues to modulate their function.

Genetic association of lncRNAs and cancer

As mentioned above, cancer susceptibility has a considerable heritable component that varies with cancer type (Figure 2A). Surprisingly, the large majority of genome-wide association studies (GWAS) identify cancer risk loci outside of protein-coding regions. We utilised a comprehensive GWAS catalogue (http://www.genome.gov/gwastudies/) to assess the genomic context of SNPs, filtered for association with cancer-related conditions. Of 301 SNPs currently linked to cancer (Supplementary Table 1), only 12 (3.3%) change the protein amino-acid sequence. Most are located in the introns of protein-coding genes (40%) or intergenic regions (44%) (Figure 2B), raising the question of the function of these noncoding loci and their role in cancer development. While some may contribute to cancer risk through cis-regulatory interactions, many of these loci may be transcribed into noncoding RNAs.
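The tally described above can be sketched in a few lines of Python. This is a minimal illustration only: the records are toy placeholders, not the 301 SNPs of Supplementary Table 1, and the function name `context_percentages` and the context labels are assumptions introduced here rather than part of the GWAS catalogue.

```python
from collections import Counter

# Toy records (hypothetical, NOT the SNPs from Supplementary Table 1).
# Each tuple is (snp_id, genomic_context); the IDs are placeholders.
toy_snps = [
    ("snp_a", "intergenic"),
    ("snp_b", "intron"),
    ("snp_c", "intergenic"),
    ("snp_d", "missense"),
    ("snp_e", "intron"),
]


def context_percentages(records):
    """Return {context: percentage of records}, rounded to one decimal place."""
    counts = Counter(context for _, context in records)
    total = sum(counts.values())
    return {ctx: round(100.0 * n / total, 1) for ctx, n in counts.items()}


percentages = context_percentages(toy_snps)

# Print the contexts from most to least frequent.
for ctx, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{ctx}: {pct}%")
```

Applied to a real export of cancer-filtered catalogue entries, the same grouping would yield the kind of distribution shown in Figure 2B, provided each SNP has first been assigned a single genomic context.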

Figure 2

Heritability and genomic distribution of SNPs in cancer. (A) The heritable component of risk for common cancers. Despite many cancers having sizeable genetic components, the identity of most of these heritable risk factors is currently unknown. Table adapted from SNPedia (Cariaso and Lennon, 2012). (B) Genomic distribution (%) of SNPs in selected cancer types. The majority of cancer-related SNPs are located in noncoding regions of the genome (intergenic or intronic) and only a small number are found in coding regions.

Recently several examples of functional lncRNAs transcribed from cancer risk loci have been reported. For example, SNP rs944289 in the 14q13.3 region, which is strongly associated with papillary thyroid carcinoma (PTC), affects the function of a tumour suppressor lncRNA (Jendrzejewski et al, 2012). A thyroid-specific lncRNA, termed PTC susceptibility candidate 3 (PTCSC3), that was strongly downregulated in PTC was identified in this region and it was found that the repression was caused by the associated SNP. The risk allele alters the binding of the C/EBP proteins to the PTCSC3 promoter, reducing gene expression. PTCSC3 exhibits tumour suppressor activity, controlling the expression of genes involved in DNA replication and repair, tumour morphology, cell movement and cell death. This study is to our knowledge the first to link a cancer-associated SNP to a mechanism of action by altering the expression of a tumour suppressor lncRNA.

ANRIL, a large lncRNA gene spanning 126 kb adjacent to p14/ARF, is located in a GWAS ‘hot spot’ linked to many complex diseases, including type-2 diabetes, coronary artery disease and, recently, cancer (Pasmant et al, 2011). ANRIL interacts with polycomb group proteins and may add repressive histone marks to the p15/CDKN2B-p16/CDKN2A-p14/ARF locus, suppressing cell proliferation. Intriguingly, disease-associated SNPs in ANRIL cluster by disorder; vascular conditions are associated with the 3′ end of the transcript, while cancer susceptibility SNPs map to the 5′ region (Figure 3). This may reflect the mode of action of ANRIL. Polymorphisms in different regions of the ANRIL RNA may affect the RNA–DNA or RNA–protein interactions necessary for ANRIL-induced gene silencing. The region mutated may cause changes in site selectivity of epigenetic programming, with some interactions required for vascular function while other interactions are required for cell cycle maintenance. Further analysis of the structure–function effects of ANRIL polymorphisms may unravel the complexity of this GWAS ‘hot spot’.

Figure 3

The GWAS ‘hot spot’ at the ANRIL locus. Multiple disease-associated SNPs map to the ANRIL locus, including various cancers (blue) and vascular-related conditions (red). These individual polymorphisms may affect ANRIL function differently, resulting in diverse diseases.

It is likely that many more lncRNAs are transcribed from cancer loci, as these loci have typically not been examined in a targeted manner and low-abundance RNAs originating from them may not have been detected or characterised. Even apparent gene desert regions including the extensively studied prostate cancer 8q24 locus produce a lncRNA that may be involved in prostate carcinogenesis (Chung et al, 2011). Recently we described a technology termed RNA Capture-Seq, which is capable of detecting lowly expressed and highly tissue-specific transcripts (Mercer et al, 2012). When applied to gene deserts, whole forests of previously unannotated transcripts were identified. This technique will allow the focused re-examination of cancer-associated regions for novel and rare noncoding transcripts that may be important regulators of carcinogenesis. Combined, these studies suggest that many noncoding cancer risk loci identified by GWAS are transcribed into lncRNAs with important regulatory functions in cancer biology.

Diagnostics and therapeutic potential of lncRNAs in cancer

The discovery that lncRNAs are key regulators in cancer transformation and progression leads to intriguing possibilities of application for diagnostics and therapeutics. Many lncRNAs are expressed in a tissue- and cancer-type restricted manner and have already been shown to be useful as prognostic markers. HOTAIR expression, for example, is strongly increased in primary tumours and metastases of breast cancer patients, and expression levels correlate positively with a poor outcome (Gupta et al, 2010). In a recent study, Yang et al proposed that high expression levels of HOTAIR could serve as a biomarker to predict tumour recurrence in patients with hepatocellular carcinomas (Yang et al, 2011).

The use of noncoding RNAs in diagnostics has intrinsic advantages over protein-coding RNAs. Although lncRNAs may require post-transcriptional modifications or protein interactions to function, because the mature product is the functional end-product, measurement of its expression directly represents the levels of the active molecule. In contrast, mRNA levels are only indirectly indicative of the levels of the functional product of coding genes (the encoded protein). Long noncoding RNA levels may have a higher correlation with particular cancer states and thus be more useful diagnostic tools. Noncoding RNAs are often stable in human serum and thus measuring either individual marker RNAs (e.g., by qPCR) or the entire transcriptome (e.g., RNA-seq) may allow the non-invasive generation of reliable and actionable clinical indicators (Tong and Lo, 2006). For example, the lncRNA prostate cancer gene 3 (PCA3) is highly associated with prostate cancer and is routinely used to indicate prostate cancer risk (Progensa PCA3 urine test) in urine samples, thereby avoiding unnecessary prostate biopsies (de la Taille, 2007).

Clinical transcriptomics will greatly impact on the medical treatment of cancer. It is foreseeable that clinicians may use analysis of tumour transcriptomes on initial diagnosis, enabling a personalised treatment regime rather than a generic alternative. This unbiased approach avoids preconceptions about which molecular pathways (coding or noncoding) may underlie the disease. Additionally, this information may allow more accurate prognostic predictions, for example, by determining the expression of molecular markers of prognosis and metastasis (e.g., HOTAIR). Subsequently, the individual cancer may be monitored by transcriptomics to detect progression, recurrence and metastasis. Long noncoding RNAs are typically more cell-type specific than protein-coding genes (Cabili et al, 2011) and may allow estimation of the cellular composition of a tumour by marking a specific cell population (e.g., cancer stem cells) (Chan et al, 2013).

Finally, the cell-type specificity of cancer-associated lncRNAs and their regulatory networks can aid the development of targeted therapies. H19, a lncRNA with oncogenic properties, is upregulated in a wide range of tumours. To treat H19-driven cancer types, a plasmid (BC-819) carrying diphtheria toxin under the control of the H19 regulatory sequence has been developed to target cells overexpressing H19. Intratumoral injection of BC-819 was successfully applied in patients with bladder, ovarian and pancreatic cancer to reduce tumour size (Smaldone and Davies, 2010). Several studies also indicate that the reduction of MALAT1 expression levels by siRNAs can influence the migratory and proliferative potential of lung adenocarcinoma and cervical cancer cells in culture (Guo et al, 2010; Tano et al, 2010). Similarly to H19/BC-819, expression of these lncRNA-specific siRNAs in a tumour-restricted manner will allow precise targeting of further tumour types without excessive harm to healthy tissue.

Conclusion

In summary, lncRNAs play integral roles in the control of cellular growth, division and differentiation. The perturbation of lncRNA expression can contribute to the development and progression of cancer. While a large proportion of cancer susceptibility is heritable, the underlying genetic components are largely unknown. Here we posit that a large proportion of the cancer risk may be explained by lncRNAs transcribed from cancer-associated loci. These RNAs exert their functions through a diverse range of mechanisms. Characterisation of these lncRNA genes and their modes of action will allow their use for improved cancer diagnosis, monitoring of progression and targeted therapies.