Introduction

ERG (ETS-related gene) is a member of the E-26 transformation-specific (ETS) family of transcription factors.1, 2 There are 30 identified ETS family genes, 28 of which are found in the human genome.3, 4, 5 ETS genes are evolutionarily conserved across metazoa and are thought to have arisen 600–700 million years ago.6, 7, 8 Research in several vertebrate model organisms shows that ETS proteins are nuclear DNA-binding phosphoproteins that act as activators or repressors of transcription.4, 9, 10, 11, 12 The ETS transcription factors are required for development and differentiation across a wide range of tissues and cell types, with roles in embryogenesis,13 vasculogenesis,14 angiogenesis,15, 16 haematopoiesis17 and neuronal development.18 Their target genes are involved in the regulation of cellular architecture,19 cell migration,20 invasion21 and cell permeability.22, 23

The ERG gene was first described in 1987 by Reddy et al.24 in human colorectal carcinoma cells; the gene resides on chromosome 21. Phylogenetic research suggests that ERG evolved from a series of ETS gene duplications during the Cambrian explosion around 542 million years ago.25

ERG’s roles in development and normal physiology

A detailed description of ERG’s roles in development and physiology is beyond the scope of this review; here we briefly outline key features. In normal development, ERG is initially highly expressed in the embryonic mesoderm and endothelium where it has a critical role in the formation of the vascular system, the urogenital tract and in bone development.15, 26 ERG is also expressed at high levels in embryonic neural crest cells during their migratory phase.27 ERG expression decreases during vascular development28 but continues to regulate the pluripotency of haematopoietic stem cells,29 endothelial cell (EC) homeostasis30, 31 and angiogenesis.15, 16 ERG expression is not restricted to development: in the adult mouse it is expressed in endothelial tissue including adrenal, cartilage, heart, spleen, lymphatic endothelial and eosinophil cells.28

During mouse embryonic development, ERG is initially expressed in ECs,13 particularly the amniotic membrane, in the blood vessels surrounding the neural tube,32 the vasculature of the heart and in precartilage.28, 33 ERG is essential for maintaining vascular integrity and the viability of the embryo. ERG maintains vascular stability by tight regulation of the WNT/β-catenin signalling pathway and the transcriptional control of EC-specific genes (angiopoietin 2, endoglin, vWF, VEGF-A and VE-cadherin).26, 30 Consistent with these observations, ERG knockout in mice leads to embryonic lethality associated with vascular defects.30

ERG has also been shown to have a major role in the cellular response to vascular inflammation, where it works to maintain endothelial tube formation and EC barrier function.22, 23 Inhibition of ERG in human umbilical vein ECs leads to loss of cell–cell contact and inhibits tube formation.15, 16 ERG mediates junction stability via transcriptional activation of the genes encoding the adherens glycoprotein VE-cadherin and the tight junction protein claudin 5 (CLDN5). Knockdown of ERG is associated with significant increases in endothelial permeability because of changes in cell structure.15, 16, 23 ERG also inhibits vascular inflammation via the repression of genes such as ICAM-1, interleukin-8 (IL-8) and vascular cell adhesion protein (VCAM).

Furthermore, ERG is required for definitive haematopoiesis, normal haematopoietic stem cell function and the maintenance of normal peripheral blood platelet numbers.34 T- and B-cell lymphocytes both arise from haematopoietic stem cells. ERG is continuously expressed in B-lymphocytes from early pre-B cells to mature B cells,35 whereas in T-lymphocytes ERG expression is only detected transiently during T-lineage specification and is silent in mature T-lymphocytes.36 The aberrant expression of ERG in T cells promotes T-cell acute lymphoblastic leukaemia, resulting in the accumulation of immature lymphoblasts.34 Murine studies have shown that a serine to proline substitution (S329P) in the DNA-binding domain of ERG leads to an inability to transactivate target genes; in the context of the haematopoietic lineage, this results in a reduction of mature platelets, erythrocytes and leucocytes.17, 34, 35, 36

ERG is also expressed in mesodermal cells that form precartilage.32 In chicken, ERG is expressed in cartilaginous skeletal primordia.37 In adult mice, ERG is constitutively expressed in the articular chondrocytes of transient cartilage in order to prevent their differentiation into hypertrophic cells.38, 39, 40 ERG’s expression in chondrocytes has also been studied in chicken in which an ERG variant was cloned and called C-1-1.38 The variant lacks 27 amino acids that are normally located upstream of ERG’s DNA-binding domain. C-1-1 is a splice isoform of ERG in which exon 7 is skipped. However, although C-1-1 is expressed in developing articular chondrocytes, full-length ERG is more prominently expressed in prehypertrophic chondrocytes in the growth plate. Forced overexpression of C-1-1 from a viral vector maintained chondrocytes in an immature state preventing the replacement of cartilage with bone. As we will see later, increased skipping of ERG’s exon 7 has also been associated with the progression of prostate cancer.41

Structure of ERG protein

Full-length ERG is a 486 amino-acid 54 kDa transcription factor.3, 24 What identifies the ETS family uniquely is a specific DNA-binding domain called the ETS DNA-binding domain (EBD). It is an 85-amino-acid domain that consists of three α-helices supported by a four-strand anti-parallel β-sheet (Figure 1). This forms a winged helix-turn-helix motif where the third α-helix (H3) contacts the major groove of DNA and confers the principal DNA-binding activity. This is achieved by the EBD, which recognises DNA sequences that contain a core GGA(A/T) motif.42, 43, 44, 45 Direct contact with the DNA is made between two arginines within the third helix and the two guanines of the GGA(A/T) sequence.46 The amino acids directly flanking the EBD interact with the minor groove of DNA and a water molecule, effectively anchoring the protein to the DNA backbone.47 Conserved within the EBD are three tryptophan residues separated by 17–18 amino acids that create the integral structure of the EBD by forming a hydrophobic core around which the α-helices can be arranged.46, 48, 49, 50, 51, 52 This type of conformation can be observed in other families of transcription factors; for example, the DNA-binding, helix-turn-helix domain of the oncogenic transcription factor MYB has three conserved, tryptophan-rich repeated regions. Each region consists of three tryptophan residues separated by 18–19 amino acids.53 The tryptophan triplicates form a hydrophobic core in each repeat,54 which provide a scaffold for the protein’s helix-turn-helix binding domain.55

Figure 1

ERG1 protein schematic. The open reading frame of the full-length ERG protein is 486 amino acids long. Functional sites include a phosphorylation site (amino-acid 106); a protein–protein interaction pointed domain (PNT) at 125–209; a TAD at 210–272; an NID at 273–289; the EBD at 290–378; the CID at 379–388; and the C-terminal transactivation domain (CTD) at 410–486. ERG amino-acids in the EBD are shown below the corresponding α-helices and β-strands. Amino-acid residues that contact DNA are starred *; the same residues are involved across all ETS classes but only labelled in class I (adapted from Ng et al.29). The two arginines that bind the GGA of the ETS-binding site consensus are shown in bold. The tyrosine that substitutes for leucine in class I proteins is underlined.
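The domain boundaries listed in the caption can be captured in a small lookup table. The following is a minimal sketch, assuming the residue ranges are 1-based and inclusive as written in the caption; the names ERG_DOMAINS and domain_at are illustrative and do not come from any published tool.

```python
from typing import Optional

# Residue ranges (1-based, inclusive) copied from the Figure 1 caption.
# The table and helper names are hypothetical, for illustration only.
ERG_DOMAINS = [
    ("PNT", 125, 209),
    ("TAD", 210, 272),
    ("NID", 273, 289),
    ("EBD", 290, 378),
    ("CID", 379, 388),
    ("CTD", 410, 486),
]
ERG_LENGTH = 486  # full-length ERG, as stated in the caption


def domain_at(residue: int) -> Optional[str]:
    """Return the annotated domain containing a residue, or None if the
    residue lies outside every range listed in the caption."""
    if not 1 <= residue <= ERG_LENGTH:
        raise ValueError(f"residue {residue} is outside 1..{ERG_LENGTH}")
    for name, start, end in ERG_DOMAINS:
        if start <= residue <= end:
            return name
    return None
```

For example, domain_at(354) and domain_at(329) both return "EBD", which agrees with the text placing Tyr354 and the S329P substitution in the DNA-binding domain, whereas the phosphorylation site at residue 106 falls outside the listed ranges.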

Analysis of the ERG protein predicts that the N-terminus contains a site for phosphorylation by protein kinase C and a pointed (PNT) domain. The PNT domain is 65 amino acids long and forms a monomeric, five-helix bundle that is thought to aid heterodimerisation with protein partners including other members of the ETS family (ETS1 and 2, ETV1, ETV6, FLI1 and ELK3) and with associated factors including DNA-dependent protein kinases, the androgen receptor (AR) and the AP-1 complex.56, 57 Although specific to ETS proteins, PNT domains form part of the larger sterile alpha motif (SAM) family of protein domains. SAM domains are known to be involved in diverse protein–protein interactions including self-association.58 ETV6 is an ETS member with a PNT domain that is able to self-associate;59 it is apparent that ERG can also form homodimers with itself via the PNT domain and the ETS-binding domain.60, 61 Studies have shown that the PNT domain has another potential function: in GABPα, ETS1 and ETS2 the PNT domain acts as a docking platform for mitogen-activated protein (MAP) kinases leading to phosphorylation of adjacent residues and enhanced transactivation activity.62, 63, 64, 65 Consistent with this observation, ERG contains a site close to its PNT domain, which has been shown to be phosphorylated by protein kinase C, IκB kinase and protein kinase B. It is presumed that ERG’s PNT domain also serves as a protein kinase docking platform.66

The middle part of ERG contains a transcriptional activation domain (TAD). TAD is also known as the central alternative domain or the alternative domain. This region also contains a negative regulatory domain.67 The C-terminus of the protein contains the ETS-binding domain including a nuclear localisation signal; adjacent is an additional, smaller transactivation domain, the C-terminal TAD.63 The TAD increases transactivation and is involved in binding protein partners including the AP-1 complex. Both the TAD and the C-terminal TAD can be inhibited by the negative regulatory domain. The EBD is essential for DNA recognition and is also involved in the recruitment of AP-168 and co-activators including histone acetyltransferases.69 The C-terminal transactivation domain has some involvement in heterodimerisation, but it is not involved in homodimerisation. Its main role appears to be in allosteric autoinhibition of ERG’s EBD.

The mechanism of autoinhibition is performed by two stretches of amino acids that directly flank ERG’s EBD. These regions are designated as the N-terminal inhibitory domain (NID), which consists of a randomly coiled formation; and the C-terminal inhibitory domain (CID), which consists of a small α-helix. The NID is found within the negative regulatory domain and the CID is situated on the boundary between the EBD and C-terminal activation domains. These inhibitory domains form a hydrophobic cage that acts primarily to bury the first α-helix (H1) of the EBD. In the absence of DNA, the NID is also able to bind H3 of the EBD. In the presence of DNA containing the ETS, GGA(A/T) sequence a specific tyrosine residue (Tyr354) within the EBD lies perpendicular to H3; in this position it is able to form hydrogen bonds with the target DNA. In the absence of DNA binding, Tyr354 rotates 90° and binds to the NID. It is also suggested that other proteins may interact with the NID to displace it and reinstate ERG’s DNA-binding abilities.70 This type of regulatory mechanism can be found in other ETS proteins. In ETS1, the CID (H4) can align in an anti-parallel manner with the H1, locking them together preventing access to the EBD.46 Instead in ETV6, the CID forms two separate helical structures, only one of which sterically blocks the EBD.71, 72

DNA-binding properties of ERG

The binding specificity of individual ETS transcription factors is not yet fully known, although they share a GGA(A/T) core sequence. In general, ETS transcription factor-binding targets encompass sequences of approximately 15–20 bp in length.42, 46, 73, 74 In order to determine binding preferences, several groups have tried to categorise the ETS family members through the similarity of the ETS binding domain.47, 75, 76 A classification system designed by Wei et al.47 defined five classes (I, IIa, IIb, III and IV) that are derived from binding site preference. Although all members of the ETS family bind the core sequence GGA(A/T), differentiation between the classes is associated with the surrounding sequences. ERG belongs to class I, which contains the largest number of ETS factors (ERG, ETS1 and 2, ETV1–5, ELK1, ELK3, ELK4, ERF, FEV, FLI1 and GABPα). This class of ETS members prefers the extended sequence ACC(GGAA)NT, whereas classes IIa, IIb and III prefer CCC(GGAA)NT. Class IV preference is for CCC(GGAT)NT. Note that the class I target sequence begins with an A. This binding preference is facilitated by the substitution of a leucine residue in the fourth β-strand with a tyrosine or phenylalanine (Figure 1). This results in a reduced affinity for cytosine and a preference for adenine.47
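The class preferences described above can be illustrated with a simple pattern scan. This is a minimal sketch, assuming exact string matching on a single strand with N standing for any base; real binding preferences are graded rather than all-or-nothing, and the function names and the EXTENDED_SITES table are illustrative rather than part of any published tool.

```python
import re

# Extended site preferences quoted in the text (N = any base).
# Labels follow the classification summarised above; names are hypothetical.
EXTENDED_SITES = {
    "class I": "ACCGGAANT",
    "classes IIa/IIb/III": "CCCGGAANT",
    "class IV": "CCCGGATNT",
}

# Lookahead so that overlapping GGA(A/T) cores are all reported.
CORE = re.compile(r"(?=(GGA[AT]))")


def to_regex(site: str) -> re.Pattern:
    """Turn a site containing N wildcards into a compiled regex."""
    return re.compile(site.replace("N", "[ACGT]"))


def find_cores(seq: str) -> list[int]:
    """Return 0-based start positions of every GGA(A/T) core on this strand."""
    return [m.start() for m in CORE.finditer(seq.upper())]


def classify_sites(seq: str) -> list[tuple[int, str]]:
    """Report (core position, class label) for each core whose three upstream
    and two downstream bases fit one of the extended sites."""
    seq = seq.upper()
    hits = []
    for pos in find_cores(seq):
        start = pos - 3
        window = seq[start:pos + 6]  # 3 upstream + 4 core + 2 downstream
        if start < 0 or len(window) < 9:
            continue  # not enough flanking sequence to judge
        for label, site in EXTENDED_SITES.items():
            if to_regex(site).fullmatch(window):
                hits.append((pos, label))
    return hits
```

Only the given strand is scanned; a fuller treatment would also scan the reverse complement and score partial matches with a position weight matrix instead of an exact pattern.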

ETS transcription factors also bind sites that do not conform to the core consensus sequence. ETS factor SPI (class I) binds sequences that lack the GGA(A/T) core, including sequences in the macrophage scavenger receptor (AGAGAAGT) and IL-1 beta (IL-1; GCAGAAGT) promoters in which the core sequence is AGAA.77 Binding specificity is also affected by post-translational modifications and protein–protein interactions. ERG has been shown to work in partnership with other proteins to alter DNA structure locally. To this effect, ERG cooperates with the SRY-related HMG box transcription factor SoxD to bind the major and minor groove of DNA. This interaction induces changes in the local DNA double helix geometry, facilitating transcription. Similarly, it has been demonstrated that ERG and the AP-1 complex (Fos+Jun) together form a pincer-like structure around the major groove of a DNA double helix. The C-terminal H3 of ERG’s EBD faces the N-terminus of Jun in an anti-parallel manner. This pairing is able to introduce a bend in the local DNA structure, facilitating access for the transcriptional machinery.61, 78

Alternative promoters and alternative splicing of ERG

There are several descriptions of ERG’s gene and exon/intron structure.24, 79, 80, 81, 82 Here, we use the classification proposed by Zammarchi et al. in 2013.83 The ERG locus is approximately 300 kb long and includes at least 12 exons. There are three mutually exclusive alternative promoters (PI–PIII) and consequently three alternative first exons (1a, 1b and 1c) and translation start sites. Exons 4 and 7b of ERG are cassette exons and are commonly subject to exon skipping. There are also alternative polyadenylation sites in exon 7b, exon 11 and exon 12.3, 52, 83 As a result, up to 30 alternative ERG transcripts are expressed, encoding at least 15 protein variants. The protein variants can include three different N-termini, two alternative transactivation domains (generated by the skipping or retention of exon 7 and exon 7b) and three different C-termini (Figure 2). The splice isoforms denoted ERG2 (NM_004449) and ERG3 (NM_182918) are the main isoforms expressed in most endothelial, myeloid and lymphoid haematopoietic progenitor cells.84

Figure 2

Complexity of ERG isoforms. ERG isoforms arise from the use of alternative promoters (PI–PIII). Sites of alternative polyadenylation are also shown (black triangles). ERG splice variants are shown below; start codons are indicated by an arrow and stop codons by an asterisk (*). Adapted from Kim et al.68

The third alternative promoter (PIII) is most frequently activated in normal tissues, whereas in prostate cancer the second alternative promoter (PII) is the main driver of ERG transcription. What regulates ERG transcription is not yet fully understood. However, it is clear that the ERG promoters are epigenetically regulated and susceptible to hypermethylation in cancer.85 The ERG promoters contain two CpG islands (located +571 and +1415 upstream of the transcriptional start site). Hypermethylation of these islands leads to transcriptional repression of ERG in T-lymphoblastic leukaemia.86

In mice, a region 85 kb downstream of ERG’s promoter (termed the ERG +85 enhancer) is highly active in T-cell acute lymphoblastic leukaemia. This region shows strong binding of stem cell leukaemia, lymphoblastic leukaemia-associated haematopoiesis regulator 1 and LIM domain only 2 transcription factors. In the human ERG gene, this enhancer region is immediately upstream of ERG’s exon 4. In human T-cell acute lymphoblastic leukaemia cell lines, the expression of ERG is increased by the binding of ETS (ERG, FLI1), GATA (GATA3) and E-box (stem cell leukaemia, lymphoblastic leukaemia-associated haematopoiesis regulator 1 and LIM domain only 2) transcription factors to the +85 enhancer; this is associated with increased leukaemic cell proliferation.34 The binding of ERG to ETS motifs within its own promoter has also been demonstrated in prostate cancer; thus ERG can transactivate its own promoter. This positive feedback loop is associated with increased invasiveness of prostate cancer cell lines.87, 88

Isoforms of ERG interact with each other, as well as with other ETS family members (FLI1, ETV1 and SPI1) via the PNT and/or ETS-binding domain.89 ERG isoforms that lack the 81-bp exon 7 (Δ81 isoforms) or the 72-bp exon 7b (Δ72 isoforms) are expressed in chicken, mouse and human tissues; when included, these exons add, in frame, 27 and 24 amino acids, respectively. Mice that overexpress the Δ81 isoform die at birth from respiratory failure; they are also smaller and their skeletons are hypo-mineralised.90 In cell lines, the expression of ERG isoforms that include exon 7b results in increased proliferation and invasion of prostate cancer cells;81 and both exon 7 and exon 7b inclusion increases in advanced prostate cancer (pathological stage T3).41 As exons 7 and 7b encode part of the TAD (Figure 1), alternative splicing is likely to modulate ERG’s effect on the transcription of target genes.60
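The in-frame arithmetic for these two cassette exons is easy to check: an exon preserves the reading frame when its length is a multiple of three, and it then contributes one amino acid per codon. A minimal sketch follows, using the exon lengths quoted above; the helper name is illustrative.

```python
# Exon lengths (bp) quoted in the text; the helper name is hypothetical.
EXON_LENGTHS_BP = {"exon 7": 81, "exon 7b": 72}


def codons_if_in_frame(length_bp: int):
    """Return the number of whole codons if the exon length is a multiple
    of three (frame preserved), otherwise None (frameshift)."""
    codons, remainder = divmod(length_bp, 3)
    return codons if remainder == 0 else None


for exon, length in EXON_LENGTHS_BP.items():
    print(exon, length, "bp ->", codons_if_in_frame(length), "amino acids")
```

Both lengths divide evenly by three, so skipping or retaining either exon changes the protein length without shifting the downstream reading frame.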

Involvement of ERG in prostate cancer

Over the last decade, ERG has been increasingly implicated in the aetiology of prostate cancer. In 2005, a paper published by Tomlins et al.79 showed that ERG is overexpressed in a high proportion of prostate carcinomas as a result of a gene fusion with the androgen-driven promoter of the TMPRSS2 gene. Prostate epithelia do not normally express ERG.89 ERG is one of the most consistently overexpressed oncogenes in malignant prostate cancer 91, 92 and is a driver event in the transition from prostatic intraepithelial neoplasia (PIN) to carcinoma.93 In prostate cancer, high expression of ERG is also associated with advanced tumour stage, high Gleason score, metastasis and shorter survival times.94 ERG is also implicated in other cancers, including Ewing’s sarcoma and leukaemia. For example, ERG-positive acute T-lymphoblastic leukaemias are four times more likely to relapse.95 The overexpression of ERG is one of the key factors in transforming localised, aggressive cancer into metastatic cancer.96 High levels of ERG are implicated in loss of cell polarity, changes in cell adhesion, nuclear pleomorphism promoting hyperplasia and PIN in mouse prostate epithelia.97

Aberrant ERG expression has a major impact on cell invasion and epithelial–mesenchymal transition (EMT) through the upregulation of the FZD4 gene, a member of the frizzled family of receptors.69 Higher levels of FZD4 increase the expression of mesenchymal markers and reduce the expression of epithelial markers. ERG overexpression also leads to the loss of E-cadherin expression (a marker of EMT), as well as increased cell mobility and invasion.69, 98, 99 Enhanced cell mobility and migration also result from ERG’s transactivation of the EMT-related gene vimentin. Vimentin is highly expressed in actively migrating cells but not in stationary cells. It is a key component of the cytoskeleton in which it has a role in the re-organisation of actin filaments in migrating cells.100, 101 High levels of ERG increase cell invasion via the activation of matrix metalloproteases (MMPs), the plasminogen activator pathway and the WNT-signalling pathway.21, 102 ERG upregulates MMP1 and indirectly modulates the activation of MMP3 and of secreted protein acidic and rich in cysteine. These genes regulate EC proliferation and induce loss of focal adhesion, alteration of cell morphology and barrier function.16, 103 Other ERG-regulated genes involved in EMT and cell invasion include RhoA,16, 23 VEGF-R2/FLK1 (ref. 5) and Zeb1/Zeb2.98

ERG is clearly implicated in metastasis. CXCR4 is a type 4 C-X-C chemokine receptor that is upregulated by ERG in ~80% of primary prostate cancers and promotes metastasis to bone tissue.20, 66, 104, 105 Its ligand, the chemokine stromal-derived factor-1 is produced by the bone marrow. Cells that express the membrane-bound CXCR4 receptor metastasise to sites of stromal-derived factor-1 release.106 Furthermore, the ADAMTS1 gene (encoding a disintegrin and metalloproteinase with a thrombospondin motif) is upregulated by ERG in prostate cancer cells. Cells that overexpress ADAMTS1 display excessive matrix deposition and chemotactic attraction towards fibroblasts.107, 108, 109 The downregulation or inactivation of the tumour-suppressor SMAD4 and the upregulation of osteopontin are associated with biochemical recurrence and lethal metastasis. ERG activates osteopontin transcription; and there is evidence of a reciprocal relationship between the expression of SMAD4 and ETS-regulated genes such as VEGF-A and MMP-9.75

ERG represses a number of prostate epithelium-specific genes (KLK3, best known as PSA; SLC45A3/prostein; C15ORF; MSMB/PSP94; and SCGB1D2). This suggests that ERG promotes the de-differentiation of prostate epithelium.104 ERG may also have a role in cell lineage selection as its overexpression causes stem cell surface markers (such as CD49F) normally expressed by the basolateral layer of the prostate to be expressed in luminal cells.97 It is the basal cell layer and stem cells of the prostate that show the biggest response to ERG overexpression, resulting in ductal dysplasia and PIN lesions.110, 111

ERG and the AR: transcriptional cross-talk in prostate cancer

The use of chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) has revealed a complex network of transcriptional cross-talk between ERG, the androgen receptor (AR) and epigenetic programming in the context of prostate cancer. AR signalling is crucial for the lineage-specific differentiation of prostate epithelia; ERG is able to disrupt differentiation and maintain cells in a de-differentiated state.104 ERG can achieve this disruption via several mechanisms: through physical interaction with the AR protein, through binding to the promoter of AR itself and by binding to the promoters of downstream, AR-regulated genes.112 AR and ERG bind a wide range of sites in target genes. Binding sites that accommodate both AR and ERG are located both in distal enhancers and proximal promoters, most similar in location to AR-specific sites. ERG also appears to cooperate with histone deacetylase complexes (HDACs) and the polycomb protein EZH2 to modulate AR’s transcriptional output, inhibiting epithelial differentiation.113

Thus, it appears that one of ERG’s roles is to attenuate androgen-regulated transcription. The knockdown of ERG in prostate cancer cells leads to AR induction and the reversal of ERG’s transcriptional regulation programme; for example, the promoters of PSA, trefoil factor 3 and prostein are repressed by ERG and induced by AR.114, 115 The transcription factor MYC is upregulated by ERG. MYC upregulation is linked to increased cell survival, invasion, androgen independence and biochemical recurrence. Loss of ERG recruits the AR to the promoter of c-MYC, blocking its transcriptional activation.116, 117 Conversely, androgen deprivation in prostate cells can result in a cooperative interaction between ERG and the transforming growth factor β/bone morphogenic pathway; the latter is an initiator of EMT closely linked to WNT signalling.99 The cooperation is mainly achieved through interactions with transforming growth factor β and SMAD3 to control mesenchymal differentiation.39 Inhibition of AR-regulated gene transcription is further enhanced by ERG at the epigenetic level when HDAC1–3 and the H3K27 methyltransferase EZH2 are recruited to AR/ERG-binding sites. Once recruited to these sites, they can act as co-repressors aiding ERG-mediated transcriptional repression.113 This is well illustrated by ERG’s upregulation of EMT, orchestrated by ERG through the epigenetic silencing of WNT-signalling pathway repressors in collaboration with HDAC1.35, 36 HDAC1 is highly expressed in ERG-positive prostate cancers69 and its upregulation is mediated by ERG’s repression of the CREB-binding (CBP/p300) histone acetyltransferase. CBP/p300 activates the tumour-suppressor p53, which in turn inhibits the activation of HDAC1.118, 119 ERG and HDAC1 can form a protein complex along with the histone methyltransferase ESET (ERG-associated protein with a SET domain) and the co-repressors of transcription mSin3A and mSin3B to mediate transcriptional silencing.120, 121 ESET is required to keep cells in a pluripotent state122, 123 and may be one of the ways in which ERG overexpression contributes to cellular de-differentiation.

Adding to this already complex transcriptional regulatory partnership is microRNA-mediated regulation. It has become clear that microRNAs have a role in transcriptional regulation in prostate cancer, and several are implicated in the ERG/AR network. MiR-221 is downregulated in ERG-positive tumours and is linked to recurrence and metastasis after surgery.124 The downregulation of orphan receptor small heterodimer partner by miR-141 leads to the promotion of transcriptional activity by AR.125 The microRNA miR-200c can prevent ERG-directed EMT by repressing downstream effectors such as Zeb1 and vimentin; however, in turn ERG is able to directly bind to and prevent transcription of miR-200c. This results in the restoration of expression of miR-200c target genes and the re-establishment of EMT, cell migration and invasion characteristics.126 ERG itself is a direct target of miR-145 and miR-30. These microRNAs can bind ERG mRNA at specific sequences in the 3’ UTR and work as potential tumour suppressors, blocking translation and downregulating ERG protein expression. Not surprisingly, the expression of these microRNAs is low in ERG-positive prostate cancers. The effect of miR-30 on ERG expression is even considered a possible mechanism in the progression to androgen-independent prostate cancer.127, 128

TMPRSS2–ERG fusions in prostate cancer

ERG is involved in gene translocations in Ewing’s sarcoma and acute myeloid leukaemia (specifically EWS-ERG and TLS/FUS-ERG).96, 129, 130, 131, 132, 133, 134 Chromosomal re-arrangements that produce fusion genes were generally thought to be uncommon in epithelial cancers such as prostate cancer but a break-through study by Tomlins et al.79 showed a recurring fusion between the promoter of the transmembrane protease serine 2 (TMPRSS2) gene and ERG. TMPRSS2 is a transmembrane protease135 expressed in the epithelium of normal prostate glands and found in semen. In prostate cancer, TMPRSS2 is detected in the apical membrane of secretory epithelia, in the lumen of the glands and in the basal cells.136, 137 The biological function of TMPRSS2 is complex; it has been shown to regulate sodium absorption in human airway epithelia,138 the activation of influenza139, 140, 141 and even severe acute respiratory syndrome (SARS) replication.142, 143 In the prostate, TMPRSS2 is cleaved and can activate protease-activated receptor-2 as part of a signal transduction pathway associated with inflammation, metastasis and invasion.144

In prostate cancer, the promoter region of TMPRSS2 becomes fused to the coding region of ERG. The promoter of TMPRSS2 contains androgen-sensitive elements145 and subsequently this fusion drives the overexpression of ERG in the presence of androgens.79 Fusions are caused by chromosomal translocation or by interstitial deletion of the intergenic region between TMPRSS2 and ERG. Both genes are located on chromosome 21, approximately 3 Mb apart.146, 147, 148, 149 Deletions may occur because of fragile sites and breakpoints found in intron 2 of ERG and in introns 1 and 2 of TMPRSS2.149 An alignment of these breakpoint regions shows them to be very similar to Alu repeat elements (80% homology).150 Androgen may drive the fusion by initiating chromatin looping via the AR transcription complex, bringing the ERG and TMPRSS2 loci together. This, in combination with DNA double-strand break repair, can then lead to the deletion of the interstitial 2.8 Mb of DNA and result in a fusion gene.87, 151

Why does the TMPRSS2:ERG fusion occur? Androgen signalling leads to recruitment of the AR and TOP2B to breakpoint regions within the regulatory regions of the TMPRSS2 and ERG genes, where they induce double-strand breaks and gene recombination events.152 Thus, fusions are thought to occur as a result of long-term exposure to androgens, increased AR activity and inhibition of PIWIL1 (Piwi-like protein 1), a protein that prevents double-strand breaks.153 Recent findings have suggested that formation of the TMPRSS2:ERG translocation represents a distinct subset of prostate cancer and that overexpression of ERG may cause structural changes in chromatin topology and DNA damage repair.154, 155, 156, 157 Fusions generated by interstitial deletion rather than translocation are more prevalent in end-stage, castration-resistant prostate cancer.158

Several variants may be generated by differing combinations of TMPRSS2 and ERG exons (Figure 3). The most common fusion variant contains either exon 1, or exons 1 and 2, of TMPRSS2 fused with exon 4 of ERG. There are many TMPRSS2–ERG fusion transcripts. The resulting ERG proteins include full-length ERG, N-truncated ERG and variants with premature stop codons. Fusions in which TMPRSS2 provides a translation start site in frame with the ERG open reading frame are associated with more aggressive cancer characterised by seminal vesicle invasion.159, 160, 161, 162, 163, 164

Figure 3

TMPRSS2:ERG fusion types in prostate cancer. White boxes represent the TMPRSS2 exons (labelled T1–T4), grey boxes represent ERG exons (E2 to E11), white boxes with underlined numbers indicate a retained fragment of TMPRSS2 intron I and underlined numbers in grey boxes signify different variants of ERG retained intron III. Black triangles indicate translation start and * ERG’s normal translation stop site. Black rectangles indicate early stop sites created by frameshifts.

The TMPRSS2:ERG fusion is a remarkably common event in prostate cancer (~50%).79, 160, 161, 162, 163 The occurrence of the fusion increases in frequency from high-grade PIN (10–20%)162, 165, 166, 167 to carcinoma (30–80%).146, 161, 163, 165, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177 Normal prostate tissue does not usually present with TMPRSS2:ERG fusions;168 however, normal tissue adjacent to a site of prostate cancer occasionally contains the fusion (15.6%).178 Interestingly, sites of high-grade PIN containing the fusion are found adjacent to areas of aggressive fusion-positive cancer and both share the same fusion type.146 Fusions have also been detected at low frequency (6–8.3%) in benign prostatic hyperplasia.178, 179 This could indicate that the fusion is an early-stage event and that its presence in benign prostatic hyperplasia could increase the risk of developing carcinoma.

The Gleason system is a prognostic grading system in which grades run from 1 (well-differentiated cells) to 5 (poorly differentiated cells). The most common grade present plus the highest grade give the overall Gleason score. Evidence suggests that the fusion is found more often in moderately to poorly differentiated samples (Gleason score >6).174 The presence of the fusion also correlates with disease recurrence after surgery; 79% of patients with the fusion are more likely to relapse.169 Patients with early onset prostate cancer that includes ERG fusions develop biochemical relapse, but those lacking ERG fusions do not.180 In contrast, other studies indicate that the fusion is associated with favourable prognosis, lower-grade cancers and lack of seminal vesicle invasion.177, 181, 182 The use of alternative TMPRSS2 first exons could impact pathogenesis of the fusion. TMPRSS2 can make use of two initial exons (T0 and T1). The most commonly utilised is exon T1; it forms part of the most frequently detected TMPRSS2:ERG fusion (T1-E4). The alternative exon, T0, lies approximately 4 kb upstream of T1 and appears to be prostate specific. Although the use of the T0 exon does not result in a different ERG protein, it appears that prostate cancers that express the T0-containing variant are of lower pathological stage and associated with more favourable prognosis. Therefore, the presence of a T0-containing fusion may be an indicator of a less aggressive tumour.170, 183 Copy number variation may also have a role in prognostic outcome. Increased copy numbers of the TMPRSS2 and ERG loci along with the presence of a deletion fusion are linked to poor outcome.177 Single copy fusions are associated with lower Gleason scores, whereas increased fusion copy numbers are associated with higher Gleason scores.184 This implies that a higher dosage of ERG leads to a more severe disease phenotype, which is consistent with ERG’s oncogenic role.
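The scoring arithmetic described at the start of this paragraph can be written out explicitly. The sketch below assumes each grade is an integer from 1 to 5 and that the score is the sum of the most common grade and the highest grade, as stated in the text; the function names are illustrative and this is not a clinical tool.

```python
def gleason_score(most_common_grade: int, highest_grade: int) -> int:
    """Sum the most common grade and the highest grade (each 1 to 5),
    following the definition given in the text."""
    for grade in (most_common_grade, highest_grade):
        if not 1 <= grade <= 5:
            raise ValueError(f"grade {grade} is outside the range 1..5")
    if highest_grade < most_common_grade:
        raise ValueError("highest grade cannot be below the most common grade")
    return most_common_grade + highest_grade


def exceeds_score_six(score: int) -> bool:
    """True for scores above 6, the range in which the text reports the
    fusion to be found more often."""
    return score > 6
```

For instance, a sample whose most common grade is 3 and whose highest grade is 4 scores 7 and therefore falls in the moderately to poorly differentiated group, whereas a 3 + 3 sample scores 6 and does not.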

The overexpression of TMPRSS2:ERG in mice leads them to develop PIN and a disrupted basal cell layer (a prime indicator of invasive carcinoma). The overexpression of ERG in cell lines increases invasive abilities via activation of the urokinase plasminogen pathway. In fact, there are several indicators that ERG facilitates the PIN to prostate cancer transition. Forced overexpression of TMPRSS2:ERG in the prostate cancer cell line PC3 (fusion negative, with trace ERG expression) keeps cells in a de-differentiated state, leading to a significant increase in cell migration and invasion.92 Furthermore, two genes that are directly induced by TMPRSS2:ERG are MMP9 and PLXNA2 (Plexin A2). These genes act to break down the extracellular matrix and as a signal for axonal growth cone guidance molecules, respectively.185 Upregulation of the microtubule-forming protein β-III tubulin has also been tightly associated with the TMPRSS2:ERG fusion and with deletion of phosphatase and tensin homologue deleted on chromosome 10 (PTEN), particularly in tumours with a high Gleason score.186 TMPRSS2:ERG has been shown to physically interact with poly (ADP-ribose) polymerase 1 (PARP1) and the catalytic subunit of DNA-dependent protein kinase. These proteins act as co-factors in ERG-driven invasion of prostate cells and contribute to further DNA damage by inducing double-strand breaks.156

TMPRSS2:ERG expression is also linked to stromal changes, the promotion of EMT and an aggressive prostate cancer phenotype.94, 98 ERG also activates the promoter of EZH2 in prostate cancer cells, promoting cancer growth and progression by epigenetically silencing tumour-suppressor genes such as NKX3.1. NKX3.1, a homeobox transcription factor, is a negative regulator of TMPRSS2 expression187 and is also essential for early prostate differentiation.188 Loss of NKX3.1 allows the transcription of the TMPRSS2:ERG fusion gene to proceed uninhibited.189 Interestingly, EZH2 has been shown to repress ERG transcription in normal prostate cell lines but to have no effect in cancer cell lines.85 High expression of the polycomb gene EZH2 in localised prostate cancer is a clinical predictor of poor prognosis190 and the resulting hypermethylation of glutathione S-transferase pi 1 (GSTP1) is considered to be a crucial event in early prostate cancer development.191

In the absence of AR activity, TMPRSS2:ERG can be regulated by other androgen-independent mechanisms, including by ERG itself88 or even by the oestrogen receptor ERα. TMPRSS2:ERG fusions are associated with a distinct genetic signature that is consistent with ER signalling. Expression of TMPRSS2:ERG decreases in response to an ERβ agonist, but increases in response to an ERα agonist.112, 192

ERG as a diagnostic and prognostic indicator in prostate cancer

Clearly, the spectrum of target genes and biological processes associated with ERG is complex. As a result, the value of ERG as a prognostic or diagnostic indicator of prostate cancer is currently much debated. Conflicting data have suggested that ERG overexpression is associated with aggressive disease, indolent disease, early-stage cancer and later-stage cancer, and that it is an indicator both of early biochemical recurrence and of better recurrence-free survival. These discrepancies are most probably due to heterogeneity in sample collection methods, screening, sample types and processing.

However, on the whole, recent data point towards ERG fusion being a relatively early-stage event in the progression to malignant prostate cancer. It has been suggested that there are two main types of malignant prostate cancer: ETS+ (those containing ERG or other ETS gene fusions) and ETS− (those without ERG/ETS fusions). ERG overexpression in conjunction with loss of PTEN or TP53 is able to transform high-grade PIN into invasive carcinoma with increased cell migration.97 Therefore, it is thought that only the concomitant loss or inactivation of a tumour-suppressor gene is required for the progression to a more aggressive, invasive phenotype.97, 103 Consistent with this theory, lesions in the PTEN and TP53 tumour-suppressor genes are associated with ETS+ tumours.193, 194 The loss, mutation or inhibition of PTEN, TP53 and other tumour-suppressor genes are thought to be the triggers for invasion and metastasis.195, 196

ERG status can act as an indicator of pathological stage, but in isolation it is not necessarily related to biochemical recurrence or survival; this would require further confirmation of PTEN and TP53 status.197 TMPRSS2:ERG fusions can be detected with quantitative PCR in the urine of patients with suspected prostate cancer. Urine samples are taken before biopsy and the results correlate with tissue-based fluorescence in situ hybridisation results, suggesting the potential for a non-invasive diagnostic test.198 It is now reasonable to expect that ERG testing will become part of routine clinical practice. Table 1 summarises the association between different biological features of ERG, pathological consequences and clinical outcomes.

Table 1 The biological complexity of ERG and its clinical impact

ERG-based therapies

Together, the findings described in the previous sections convincingly implicate ERG in several aspects of the biology of prostate cancer. Overwhelming evidence suggests that ERG does contribute to worse outcomes and is involved in the regulation of signalling pathways that are dysregulated. ERG is strongly implicated in several processes that are relevant to prostate cancer, including invasion and metastasis, EMT, epigenetic reprogramming, differentiation and inflammation.

Having discussed the involvement of ERG in prostate cancer and its utility in diagnostic tests, we turn our attention to potential ERG-based therapies. Owing to the high prevalence of TMPRSS2:ERG fusions in prostate cancer, ERG proteins and their co-factors offer an attractive target for novel therapies. The enzyme PARP1 has been shown to be a required co-factor for ERG proteins in prostate cancer cells. Treatment with the PARP inhibitor olaparib significantly reduced the invasive abilities of ERG+ cells.156 Exposure of ERG+/PTEN− prostate cells to the PARP inhibitor rucaparib was shown to sensitise the cells to low-dose radiation. This sensitisation occurred via DNA damage, activation of senescence and reduction of clonogenic survival.199

Similarly, inhibiting HDAC partners of ERG could prevent the progression of prostate cancer. ERG-positive cell lines treated with the HDAC inhibitors trichostatin A, MS-275 and suberoylanilide hydroxamic acid displayed growth inhibition and cell death. Furthermore, HDAC inhibition interfered with AR transport by sequestering AR in the cytoplasm and preventing nuclear transport.200 The use of the HDAC inhibitors trichostatin A and valproic acid significantly decreases TMPRSS2:ERG expression at both the mRNA and protein level; this is concurrent with an increase in acetylation of p53, increased apoptosis and the upregulation of the cell cycle control gene CDKN1A (linked with cell cycle arrest and senescence).119

Other inhibitors function by directly targeting ERG itself. The small molecule inhibitor YK-4-279 can directly bind to ERG and inhibit its transcriptional activity. This is mediated by interfering with ERG protein–protein interactions rather than ERG–DNA binding. In ERG-positive prostate cancer cell lines, ERG inhibition by YK-4-279 leads to decreased motility, invasion and metastasis.201 A DNA-binding inhibitor, DB1255 (di-(thiophene-phenyl-amidine)), targets the core GGA(A/T) consensus sequence within an ETS-binding site and prevents the ETS-binding domain from binding it.202
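To make the GGA(A/T) core consensus concrete, the following minimal sketch scans a DNA sequence for candidate ETS core motifs on both strands. It is an illustration only: the function name and example sequences are hypothetical, and a match to this four-base core does not by itself indicate a functional ERG-binding site or a DB1255 target.

```python
import re

# Core ETS motif GGA(A/T) on the forward strand; lookahead keeps overlapping hits.
FORWARD_CORE = re.compile(r"(?=GGA[AT])")
# Reverse complement of GGA(A/T) is (A/T)TCC, used to find reverse-strand sites.
REVERSE_CORE = re.compile(r"(?=[AT]TCC)")


def find_ets_core_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every GGA(A/T) core match."""
    seq = sequence.upper()
    hits = [(m.start(), "+") for m in FORWARD_CORE.finditer(seq)]
    hits += [(m.start(), "-") for m in REVERSE_CORE.finditer(seq)]
    return sorted(hits)


# Hypothetical example sequence containing one forward-strand core (GGAA).
print(find_ets_core_sites("ACCGGAAGT"))
```

Because the core is only four bases long, matches occur frequently by chance in genomic DNA; a position weight matrix covering the flanking bases would be the usual next step for any real analysis.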

Targeting ERG for rapid degradation is another avenue for potential treatment. The deubiquitinase enzyme ubiquitin-specific peptidase 9, X-linked (USP9X) has been shown to deubiquitinate ERG in vitro, leading to stabilisation of the protein. Knockdown of USP9X resulted in increased ubiquitination and degradation of ERG. A similar effect was seen using a direct inhibitor of USP9X, the compound WP1130, a second-generation tyrphostin derivative. Treatment of ERG-positive cells with WP1130 resulted in ERG degradation both in vivo and in vitro.203

A novel method for direct inhibition of ERG has been achieved in vivo. Long-term knockdown of the two most common variants of the TMPRSS2:ERG fusion (T1-E4 and T2-E4, see Figure 3) has been successfully performed in mouse xenograft models using small interfering RNA delivered in non-toxic liposomal nanovectors (1,2-dioleoyl-sn-glycero-3-phosphatidylcholine). After 4 weeks of treatment, tumour growth inhibition, reduced tumour weight and increased cell death were observed with minimal toxicity.204 This approach could be used in the future to personalise treatment by targeting specific oncogenic fusions within a tumour.

Conclusions and future perspectives

The occurrence of ERG overexpression in prostate cancer has been well established over the last decade. Although some debate remains as to the prognostic implications of this event, there is an emerging role for its diagnostic value as an early indicator of prostate cancer development, with ERG overexpression being found in benign prostatic hyperplasia and PIN, as well as in later-stage carcinoma and castration-resistant cancers. Prognostically, there is evidence to suggest that the TMPRSS2:ERG gene fusion event is linked to early relapse and biochemical recurrence. ERG’s ability to regulate a wide network of genes implicated in differentiation, growth, motility, invasion and epigenetic control is a hallmark of its oncogenic potential. To link a specific gene so clearly to a specific type of cancer is a very rare occurrence in the field of cancer research. This review has focused on prostate cancer; however, ERG is also implicated in Ewing’s sarcoma and acute myeloid leukaemia. It is reasonable to expect that ERG will turn out to be involved in several other types of cancer.

The use of small molecule inhibitors to interfere with ERG’s ability to interact with protein partners and co-factors (such as PARP and HDACs), or to inhibit its DNA-binding properties and stability, is only beginning to be explored. Further research is required before the full story of ERG’s role in prostate cancer can be understood. There is no doubt that diagnostic tests and therapies that are based on ERG will provide new opportunities in the treatment of prostate cancer.