Introduction

Fragile sites are specific loci that are vulnerable to breaks and constrictions when chromosomes are exposed to replication stress, acting as genomic ‘fault lines’1. Recently, Li et al. provided the most detailed account to date of structural variants (SVs) in the cancer genome, deriving 16 distinct signatures of structural rearrangement. Each signature was characterised by an over-representation of a particular SV class, size, replication timing and genomic location. The authors also compared the co-occurrence of these signatures with known pathogenic mutations in key DNA repair genes (e.g. ATM, BRCA1, BRCA2). The fragile site signature showed only moderate co-occurrence with alterations in DNA repair genes, instead being characterised by deletions and tandem duplications at chromosomal fragile sites2. The genes in closest proximity to the most commonly affected fragile sites highlighted by Li et al.2 are shown in Table 1. While the mechanism underpinning these sites is not fully understood, several mechanisms have been proposed to explain why fragile sites are so vulnerable to breaks.

Table 1 Most commonly altered fragile sites ranked from most to least affected by structural variation in the cancer genome.

Fragile sites associate with extreme genomic stress

Large genes are considered more likely to harbour fragile sites3, with the most common fragile sites (CFSs) corresponding to the largest actively transcribed genes or transcription units (TUs) in both human and mouse cells4. Active TUs >1 Mb are known to be reliable predictors of chemically induced copy-number variant (CNV) hotspots4. In the most affected fragile sites described by Li et al.2, all genes were ≥0.89 Mb with an average length of 1.54 Mb (Table 1). The transcription-dependent double-fork failure (TrDoFF) model proposes that genomic instability may arise from cellular stress induced by transcription during replication5,6,7. Curiously, increased transcription in large TUs does not necessarily increase instability and may even increase stability at these sites4,8. The TrDoFF model suggests that large TUs could promote simultaneous failure of two converging replication forks through the formation of RNA:DNA hybrid structures known as R loops5. Alternatively, large TUs may create late-replicating domains, which prolong transcription into S-phase and disrupt the initiation of DNA replication at origins (origin firing)5. Because large TUs fail to complete DNA replication in S-phase, these regions have also been shown to uniquely exhibit mitotic DNA synthesis (MiDAS)9,10. Sites of MiDAS may be defined through a method known as MiDAS-seq and are evident as well-defined twin peaks that merge into a single peak as M-phase progresses11. These peaks are conserved between cell lines and encompass all known CFSs as well as regions resembling CFSs11. Consequently, the presence of MiDAS is an indicator that cells are experiencing DNA replication stress9. Within these unreplicated regions, fragile site breaks occur, creating a deletion CNV in the DNA that spans the TU5. This is supported by experimental evidence in primary cells that shows clusters of double-stranded gene breaks and translocations that localise to the gene bodies of longer genes12. The alternative possibility is the formation of a copy-number gain.

Okazaki fragments are short stretches of DNA (150–200 base pairs long in eukaryotes) that are synthesised discontinuously on the lagging strand. At fragile sites, duplications (CNV gains) may also occur, theoretically following fork stalling, when the 3′ end of a nascent Okazaki fragment disengages and anneals to the lagging-strand template of a nearby active replication fork13. This is known as the fork stalling and template switching (FoSTeS) model14. In more contemporary work, the FoSTeS model has been superseded by the microhomology-mediated break-induced replication (MMBIR) model. This model proposes that a single double-strand end results from replication fork collapse in a cell under stress and that, as part of the stress response, the repair molecules RecA/Rad51 become downregulated, preventing double-strand break repair. As a consequence, the 3′ end from the collapsed fork anneals to any single-stranded template with sufficient microhomology. This annealing typically occurs in front of or behind the position of the fork collapse, leading to gene deletion or duplication, respectively14.
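The relationship between the re-annealing position and the resulting copy-number change can be illustrated with a deliberately simplified sketch. It treats the template as a plain string, ignores strand complementarity and all of the enzymology, and uses a hypothetical function name (`template_switch`), an invented toy sequence and an arbitrary microhomology length of three bases. It is intended only to show why annealing ahead of the collapse point skips sequence (deletion) while annealing behind it copies sequence twice (duplication).

```python
def template_switch(template: str, collapse: int, k: int = 3):
    """Toy model of microhomology-mediated template switching.

    `collapse` is the position at which synthesis stopped, so the nascent
    strand corresponds to template[:collapse]. Its last k bases act as the
    microhomology that searches for another annealing site.
    Returns a list of (outcome, product) pairs, one per alternative site.
    """
    microhomology = template[collapse - k:collapse]  # 3' end of the nascent strand
    results = []
    start = template.find(microhomology)
    while start != -1:
        resume = start + k  # synthesis resumes just after the annealed bases
        if resume != collapse:  # ignore the original, uninterrupted position
            # Annealing ahead of the collapse skips sequence; behind repeats it.
            outcome = "deletion" if resume > collapse else "duplication"
            results.append((outcome, template[:collapse] + template[resume:]))
        start = template.find(microhomology, start + 1)
    return results


# Invented toy sequence with the 3-base motif "GCA" at three positions.
toy_template = "AAGCATTTTGCACCCCGCAGG"
for outcome, product in template_switch(toy_template, collapse=12):
    print(outcome, len(product) - len(toy_template), product)
```

In this toy setting the product is longer than the template when the 3′ end re-anneals behind the collapse point and shorter when it re-anneals ahead of it, mirroring the duplication and deletion outcomes described above.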

The alternative breakage–fusion–bridge cycle model proposes that double-strand breaks in the DNA are bridged, joining the Watson and Crick strands, and that progressive cycles of breakage and fusion create a series of tandem inverted gene duplications13. However, the exact mechanism behind these gene duplications remains unclear and a number of other plausible models exist13.

Irrespective of the mechanism, experimental evidence shows a clear correlation between fragile sites and copy-number changes4,14. In cell models, genome instability occurs in cells treated with DNA replication-stress-inducing agents, eventually resulting in CNVs in the genome15. Mapping of the resulting CNVs follows these genomic fault lines and large genes, including those identified in the Li et al. study2,15. While deletions at these loci are more common with chemically induced replication stress, gains have also been observed in cells15. If these alterations provide a fitness advantage, then it seems feasible that the frequency of alterations may increase through clonal selection.

Many of the genes that harbour these fragile sites and CNVs have already been implicated in oncogenesis and have well-established roles in cancer development and/or progression, e.g. the tumour suppressors FHIT and WWOX16. However, some sites are poorly understood, such as the site at the N-acetylated alpha-linked acidic dipeptidase like-2 (NAALADL2) gene.

The fragile site in NAALADL2 may have a functional role in tumourigenesis

NAALADL2 was identified as the fifth most altered site in a pan-cancer analysis by Li et al. It is a giant gene spanning 1.37 Mb, approximately 45 times larger than the average gene, which is usually between 10 and 15 kbp17,18. The biological role of NAALADL2 and its relevance in oncogenesis are relatively understudied. However, data exist implicating NAALADL2 in tumour development and progression19,20,21,22,23.

Genome-wide association studies (GWAS) have linked single-nucleotide polymorphisms (SNPs) in NAALADL2 to risk in breast and lung cancers, and several studies have identified SNPs within the NAALADL2 locus that are associated with prostate cancer risk or aggression20,22,23,24,25. A GWAS of 12,518 prostate cancer cases identified rs78943174 within the NAALADL2 locus as one of two loci associated with a high Gleason sum score22, leading to suggestions that NAALADL2 could be a potentially valuable therapeutic target21. Other SNPs in NAALADL2 have been found in TP53 and GATA2 binding sites and associated with reduced time to biochemical recurrence in patients undergoing radical prostatectomy20,25. SNPs within the NAALADL2 locus have been shown to be in linkage disequilibrium (LD) with SNPs associated with an increased risk of prostate cancer (PCa), suggesting possible synergy or, alternatively, that one of these genes represents a false-positive association26.

NAALADL2 protein expression has previously been shown to be increased in higher-stage and higher-grade cancers27. Its overexpression in prostate cancer cell lines can lead to altered extracellular matrix binding and to increased growth and invasive capabilities. Cell lines overexpressing NAALADL2 had altered transcription of genes in pathways involving the cell cycle, cell adhesion, epithelial to mesenchymal transition and cytoskeletal remodelling, suggesting a potential functional role in tumour progression; however, the specific nature of its mechanism remains elusive27.

We recently published a report on the association of somatic copy-number gains at the NAALADL2 locus with an aggressive prostate cancer phenotype28. Copy-number gains in NAALADL2 were found to occur in 15.99% (95% CI: 13.02–18.95) of primary prostate cancers, with increasing frequency in metastatic, castrate-resistant and neuroendocrine disease. This contrasts with the pattern of NAALADL2 CNVs across all tumour types, where losses occurred more frequently than gains2. Gains in NAALADL2 were associated with clinical hallmarks of aggressive prostate cancer, including tumour stage, Gleason grade, reduced time to disease recurrence following radical prostatectomy, increased likelihood of a multi-focal tumour, positive surgical margins and lymph node metastasis28. Importantly, of the 465 genes that were frequently co-amplified with this locus, 47.5% displayed a significant increase at the transcriptional level compared to just 2.36% that were downregulated28. This suggests that a gain or loss may have a predictable effect on transcription, and that the function of any affected gene is therefore important. Copy-number gains in the locus co-occurred with gains in 67 nearby oncogenes, including BCL6, ATR, TERC and PI3K family members, and were associated with the altered transcription of 473 oncogenes, activating pro-proliferative transcription processes28. Therefore, the consequences of potential breakage at these fragile sites can be highly significant.
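For readers less familiar with how a confidence interval around a frequency such as the one above is obtained, the following is a minimal sketch of the normal-approximation (Wald) interval for a proportion. The counts used (94 of 588) are hypothetical and are not taken from the source, which reports only the percentage and its interval. They were chosen for illustration because they give a frequency close to the reported one, and the interval method actually used in the cited study is not stated here.

```python
import math


def wald_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    p = successes / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


# Hypothetical counts for illustration only (not reported in the source).
gains, tumours = 94, 588
low, high = wald_interval(gains, tumours)
print(f"{gains / tumours:.2%} (95% CI: {low:.2%} to {high:.2%})")
```

The Wald interval is the simplest choice and behaves reasonably at this sample size and frequency. For rarer events or smaller cohorts, alternatives such as the Wilson interval are generally preferred.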

Ren et al. proposed a small signature of five proteins (Ki-67, Cyclin E, POLD3, γH2AX and FANCD2) associated with DNA replication stress across several tumour types29. We observed significant (albeit small) increases in the corresponding mRNA transcripts of the genes encoding these proteins: MKI67 (Log2 FC: 0.63, padj = 0.000059), CCNE1 (Log2 FC: 0.29, padj = 0.018), POLD3 (Log2 FC: 0.13, padj = 0.011) and FANCD2 (Log2 FC: 0.35, padj = 0.000087) in patients with NAALADL2 gain compared to diploid carriers (no changes in H2AFX expression)28. This supports the hypothesis that patients with gains in this region have increased replication stress. In the case of NAALADL2, this correlates with a CNV in a potentially clinically significant fragile site, as summarised in Fig. 1. Given the large size of the NAALADL2 gene, replication stress at this site may increase the chance of breakage and the formation of an SV. Alternatively, it may be that once a duplication event has occurred, transcription of such large transcripts could be responsible for increasing the replication stress.
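To put the size of these effects in more familiar terms, the log2 fold changes quoted above can be converted to linear fold changes and percentage increases. The short sketch below does this for the four reported transcripts. The variable and function names are illustrative, and the POLD3 value is assumed to be on a log2 scale like the others.

```python
# Log2 fold changes quoted in the text (NAALADL2 gain vs diploid).
# Assumption: the POLD3 value is log2-scaled like the other three.
log2_fold_changes = {
    "MKI67": 0.63,
    "CCNE1": 0.29,
    "POLD3": 0.13,
    "FANCD2": 0.35,
}


def to_linear_fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change to a linear (ratio-scale) fold change."""
    return 2.0 ** log2_fc


linear = {gene: to_linear_fold_change(v) for gene, v in log2_fold_changes.items()}

for gene, fc in linear.items():
    percent_increase = (fc - 1.0) * 100.0
    print(f"{gene}: {fc:.2f}-fold ({percent_increase:.0f}% higher)")
```

On this scale, the largest change (MKI67) corresponds to roughly a 1.5-fold increase and the smallest (POLD3) to an increase of under 10%, consistent with the description of these changes as significant but small.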

Fig. 1: Overview of evidence surrounding the fragile site NAALADL2’s association with aggressive PCa.

a The frequency of NAALADL2 amplifications increases with Gleason grade, tumour stage and local metastasis in PCa. b Upper: Location of NAALADL2 on chromosome 3; the lightning symbol indicates the location of the fragile site. The red box indicates the extent of the region that can co-amplify with the NAALADL2 genomic region surrounding 3q26.31, which is rich in oncogenes. Lower: Pictograph displaying nearby oncogenes co-amplified with NAALADL2 in PCa. The x-axis shows the genomic location of genes within the amplicon; the y-axis represents significant co-occurrence (−Log10 p-value). c Increased copy number results in increased transcription of oncogenes through the ‘gene dosage’ effect as well as downstream activation of other oncogenes. The diagram shows tumour cells replicating after a number of pro-proliferative mRNA signalling pathways have become activated.

Importantly, unlike for FHIT and WWOX, it currently remains unclear whether the associations between the NAALADL2 fragile site and this gene signature are related to the protein function of NAALADL2. This seems plausible given that the locus surrounding the NAALADL2 fragile site is rich in oncogenes: upon breakage, gains frequently co-occur, leading to concurrent changes in the expression of pro-proliferative genes that could drive clonal expansion28,30. This raises the possibility that the location of a fragile site and the proximity of any oncogenes may be used to predict its significance in disease. The majority of research into the NAALADL2 fragile site has been in prostate cancer. Furthermore, just as the interaction between fragile sites and copy-number alterations (CNAs) is often cell-type specific, it is likely that fragile site SV signatures are specific to certain tumour types, and this could prove to be a worthwhile area of research. This is supported by the findings of Li et al., who noted that tumours of the gastrointestinal tract, such as colorectal and oesophageal adenocarcinomas, showed higher rates of the fragile site signature, whereas prostate cancer overall showed little enrichment for it. It may prove useful to further assess these sites in these specific tumour types2.

Conclusion

Tumours with copy-number changes that occur predominantly at fragile sites represent a distinct class of structural variation in the cancer genome. The clinical significance of many of these sites remains unexplored, as evidenced by the frequently altered fragile site within NAALADL2 that has only recently attracted scientific interest. Research into this gene has highlighted the possibility that the function of the encoded protein may not be the only factor influencing the impact of structural variants. Given the broader effects and scale of the CNVs that may occur along these fault lines in the absence of significant DNA repair defects, fragile sites are likely to represent important sites in the cancer genome that have so far been largely overlooked.