Introduction

Genomic stability is constantly susceptible to irregularities that arise from endogenous and exogenous sources. One of the greatest threats to genomic integrity and implicated as a driving factor in numerous diseases is oxidative stress (OS), which is defined as imbalance between the production of reactive oxygen species (ROS) and antioxidant defense1,2,3. OS leads to various DNA damage such as DNA single-stranded breaks (SSBs) and DNA double-stranded breaks (DSBs). Whereas DSBs are among the most deleterious lesions, SSBs can occur upwards of 10,000 daily per cell, representing the most abundant type of DNA damage4. To better preserve integrity of the genome, various DNA repair and DNA damage response (DDR) pathways have evolved to mitigate the deleterious effects of OS, functioning effectively in healthy individuals4. It has been widely accepted that deficiencies in DNA repair and DDR pathways have been implicated in human diseases such as cancer and neurodegenerative disorders.

Previous studies have elucidated several critical proteins that regulate and coordinate various DNA repair and DDR pathways. Whereas MRN complex (Mre11, Rad50, and Nbs1) as well as BRCA1 and BRCA2 promote homology recombination (HR) -directed DSB repair5,6,7, Poly(ADP-ribose) polymerase 1 (PARP1) and X-ray repair cross-complementing protein 1 (XRCC1) have been implicated in SSB repair pathway8. Generally speaking, DSBs trigger the activation of ATM (ataxia-telangiectasia mutated) ā€“ Checkpoint kinase 2 (Chk2) DDR pathway2,4. On the other hand, SSBs and DNA replication stress trigger the activation of ATM and Rad3-related (ATR)- Checkpoint kinase 1 (Chk1) DDR pathway9,10,11. Proliferating cell nuclear antigen (PCNA) coordinates the faithful and processive DNA replication and repair of the genome when needed12,13. In addition to its critical role in ATR-Chk1 DDR pathway, TopBP1 participates in DSB repair directly14,15,16,17. Therefore, prior knowledge demonstrates that these proteins crosstalk among different DNA repair and DDR pathways.

Oxidative stress-induced DNA damage is primarily repaired by base excision repair (BER) pathway2. Oxidative DNA damage is first recognized and processed by various DNA glycosylases including but not limited to MUTYH, OGG1, NTH1, and NTHL118. Defects in BER genes predispose to hereditary cancers. Whereas MUTYH has been established in the etiology of a colorectal cancer predisposition syndrome19, a recent case study reports that MUTYH germline and somatic aberrations are implicated in pancreatic ductal adenocarcinoma and breast cancer oncogenesis20. OGG1 is also implicated in association with lung cancer21. Furthermore, NTHL1 mutations or deficiencies are associated with multi-cancer phenotype including colorectal cancer and breast cancer22. In addition, nucleotide excision repair (NER) protein XPC may regulate OGG1 expression and participates in BER pathway to repair oxidative DNA damage, suggesting crosstalk between different DNA repair pathways23,24.

AP endonuclease 1 (APE1, also known as APEX1, Apn1, or Ref-1) and AP endonuclease 2 (APE2, also known as APEX2 or Apn2) have been implicated in many regulatory mechanisms in the maintenance of genome stability, especially in BER pathway8,25,26. In addition to its transcriptional activity of redox regulation, APE1 is the main AP endonuclease with high endonuclease activity yet weak exonuclease activity to promote BER8,25,27. In contrast, APE2 has high exonuclease activity but weak endonuclease activity11,28. APE2 is composed of three functional domains: N-terminal EEP, PCNA-interacting (PIP) motif, and a highly-conserved C-terminal zinc finger Zf-GRF10. APE2 interacts with PCNA via two different modes, promoting 3ā€²-5ā€² SSB end resection for the activation of ATR-Chk1 DDR pathway9,10,11. Accumulating evidence has suggested that abnormal expression of APE1 is implicated in cancer such as gastric cancer and ovarian cancer29,30,31. Both APE1 and APE2 have previously been found to be upregulated in multiple myeloma cell lines32. However, it remains unknown whether the expression of APE2 in patient-derived tumor tissues is altered when compared with non-malignant tissues across multiple cancer types.

When genetic irregularities arise within DNA repair and DDR mechanisms, OS-induced damage may not be properly repaired, leading to genome instability and compromised protein production2,3,33. Copy number variations (CNVs), point mutations, and irregular gene expression patterns are examples of genomic alterations that have the potential to impact protein structure and function. Here, we conduct a multi-cancer bioinformatics analysis of APE2 at DNA, mRNA, and protein levels from publicly available data across multiple studies and cancer tissue types. In this study, we examine genomic alterations (somatic mutations and CNVs) occurring in APE2 in vivo from 14 cancer types and if APE2 is differentially expressed in 6 types of tumor tissue compared with non-malignant tissue. Furthermore, we analyze mRNA expression between APE2 and 13 critical DNA repair and DDR proteins. To the best of our knowledge, this work is the first to provide patient-derived evidence showing that abnormal expression of APE2 is implicated in multiple cancer types and that APE2 expression is correlated with the expression of various DNA repair and DDR proteins. We also discuss the function and biology of APE2 in genome integrity.

Methods

Data sets

Multiple-study data for genomic alteration events in 14 different cancer types was retrieved from the cBioPortal for Cancer Genomics. Although the cBio repository contains data for 30 different primary sites, only data for these 14 cancer types adhered to the following criteria at the time of download: (1) sample size greater than 250, (2) alteration data available and/or (3) sufficient gene expression data for the same tissue available from TCGA. Alteration events include amplifications (high-level amplification and typically focal), gains (a few additional copies but usually broad), heterozygous deletions (shallow loss), homozygous deletions (deep loss), and protein-level somatic mutations (docs.ciboportal.org). Study-specific details on genomic alteration event data creation can be found within individual studies. A list of all studies - with links ā€“ from which datasets have been provided is available on the web (cbioportal.org/datasets).

Gene expression quantification data of sequenced mRNA from TCGA for 6 cancer types was downloaded from Genome Data Portal (GDC) v14.0 via GDC's transfer tool. Sample size n ā‰„ 20 where matched tumor and non-malignant tissue data was available for each individual was the determining factor by which cancer types were chosen for gene expression analysis. See Discussion for more information on an exception made to this criterion. Tissue samples for gene expression quantification data were originally obtained from patients at tissue source sites34 such as Columbia University, Mayo Clinic, Duke University, and the University of Pittsburgh. Tissue samples were then sent to one of two biospecimen collection sites (BCRs) where RNA was isolated, clinical data standardized, and analyte distribution conducted. Samples and associated data were then sent to a genome characterization or sequencing center (GCC or GSC, respectively) where data was sequenced with Illumina HiSeq technologies. For gene expression quantification, mRNA transcripts were aligned and the .bam files wereĀ quantified with HTSeq. Read count data from HTSeq and from STAR are available, as well as fragments per kilobase of transcript per million mapped readsā€™ (FPKM) and FPKM-UQ normalized data. Read count data is often used for global differential expression analysis of gene sets. FPKM normalization method is much more specific, taking into consideration gene length differences and is often a preferred method for gene-gene expression comparisons35. Since we focus on single gene and gene-gene pair analysis in this study, we use the FPKM normalized data where read count is divided by gene length.

Data pre-processing

Genomic alteration event files containing information on APE2 per cancer type were downloaded from cBioPortal website in TSV format. Files were then programmatically parsed for relevant information from tumor tissue samples, removing duplicate data and any individuals for whom alteration event data was either not available or not explored. To reduce redundancy, only the first record, for onlyĀ tumor tissue samples which had more than one record, was retained for analyses.

File pre-processing for mRNA gene expression quantification included renaming files from UUID to the associated TCGA barcode. Two new sets of files were created after parsing original files: one set contained data for all tumor data for APE2 and 13 additional genes implicated in DDR pathways, one set contained only expression data from individuals for whom both tumor and matched non-malignant tissue samples were available. TableĀ S3 shows sample sizes for tumor-only tissue and matched tissues per cancer type in our analysis.

Genomic alteration event analysis

To characterize the frequency of genomic alteration events in APE2, the frequency of each event type per cancer type was calculated in Excel and plotted in R. Genomic alteration plots were created in R and figures representing APE2 protein domains and mutations ā€“ along with R-generated plots ā€“ were created in PowerPoint.

mRNA analysis

To find the difference in mRNA expression values of APE2 between tumor and matched non-malignant tissue, a paired two-sided t-test was conducted and boxplots were created in R. To find correlations between mRNA expression of APE2 to other genes implicated in DDR pathways, FPKM values were log-transformed, Pearsonā€™s correlation run and scatterplots created in R.

Bioinformatics tools

Data from the GDC portal was downloaded using the provided transfer tool. For pre-processing of data, scripts written in Python, utilizing the os.walk module, were used to parse, create, and rename files. To find matching TCGA barcode per UUID, UUIDtoBarcode from the R-package TCGAUtils was used36. From R-package ggplot2 and dependencies, genomic alteration frequency plots and somatic mutation plots were created using ggplot and scatterplots created with ggscatter37. All R-packages were implemented, and statistical analyses and plots were created, with RMarkdown in RStudio 1.1.45637,38,39,40,41. All Python scripts were run on Ubuntu 18.04LTS from Windows 64-bit OS. Figures created in PowerPoint were done so using Microsoft Office 2019 PowerPoint Version 1912 (Build 12325.20240 Click-to-Run).

Results

Genomic alterations in APE2 across all cancer types

To verify the occurrence of APE2-situated genomic alterations in cancer patients, data from 14 cancer-types in the cBiolPortal database was programmatically analyzed. From all samples for which data were available and had been evaluated for genomic alterations (nā€‰=ā€‰21,769), genomic alterations in APE2 occurred at ~17% frequency and appeared in each cancer type (Fig.Ā 1). Cancers with the highest frequency of total events were skin (24.35%), liver (23.88%), and breast (23.72%) (Fig.Ā 1, TableĀ S1). From the total alteration events observed, heterozygous deletions occurred most frequently (51.18%), gains next (40.77%), then amplifications (3.77%) and mutations (3.11%) with homozygous deletions occurring at the lowest frequency (1.17%). CNVs in APE2 were found at varying levels across each cancer type. Gains occurred most frequently in lymphoid cancer (14.30%), amplifications most often in prostate cancer (3.28%), heterozygous deletions observed most frequently in liver tumor tissue (15.42%), and head and neck cancerous samples revealed the highest frequency of homozygous deletions (1.07%) (Fig.Ā 1, TableĀ S1). Frequency of somatic mutations in APE2 ranged from 2.89% (uterus) to 0.10% (head and neck), although its somatic mutations did not appear in kidney nor pancreas tumor tissue.

Figure 1
figure 1

Frequency of genomic alteration events in APE2 across 14 cancer types and its mRNA expression analysis between tumor tissue and matched non-malignant tissue from 6 different cancer types. cBioPortal analysis of APE2 genomic alterations including gains (Gain), amplifications (Amp), heterozygous deletions, homozygous deletions, and somatic mutations (Mute). Redundant samples were removed before analysis. Multiple somatic mutation events within a single sample and duplicate mutations across different individuals were each distinctly counted. The image was created with PowerPoint and R-package ggplot2 - ā€˜ggplotā€™ (AlterationEvents.Rmd).

Occurrences of coding truncations and amino acid changes in APE2

In addition to the analysis of APE2 genomic alterations, protein consequences of somatic mutations in APE2 were analyzed. Protein-level annotations of somatic mutations in APE2 were extracted from genomic alteration event data for 12 cancer types (TableĀ S2). A total of 117 mutations were found in APE2 with uterine (40), lung (22), and skin (10 melanoma, and 6 non-melanoma) tumor tissue revealing the highest number of mutations (Fig.Ā 2A, TableĀ S2). Mutation events were comprised mainly of missense mutations with several nonsense mutations which create premature stop codons, frameshift deletions which are deletions of chunks of protein sequence that create a shift in remaining amino acids, and alternative splices denoted with an ā€˜Xā€™ indicating a translation termination codon at the site (Fig.Ā 2Aā€“C). Tumor tissues with the highest frequency of somatic mutations in APE2 were uterus (2.89%ā€‰=ā€‰40 out of 1,386), skin (2.47%ā€‰=ā€‰16 out of 649) and lung (0.78%ā€‰=ā€‰22 out of 2,831) tumor samples (Fig.Ā 2B).

Figure 2
figure 2

Somatic mutations in APE2 at the protein level across multiple cancer types. (A) mutation counts of different mutations in APE2 protein in 12 different cancer types. (B) missense mutations in APE2 protein in uterus, skin, and lung cancers. Mutations from non-melanoma patients are highlighted with small red triangles.Ā (C) representative splicing and truncations in APE2 protein in uterus, lung, and skin cancers. The images were created by PowerPoint and R-package ggplot2 - ā€˜ggplotā€™ (LolliPlotMutes.Rmd).

Notably, out of the 117 total mutations across multiple cancer types, 29 Arginine residues (~25%) were mutated to other residues, suggesting a distinct feature of APE2 missense mutations (Fig.Ā 2A,B, TableĀ S2). An Arginine to Cysteine mutation was found at position 173 (i.e., R173C) in one breast cancer and two uterine tumor samples. A R222ā€‰C and a R222H mutations were found in uterine cancer, and the R222ā€‰C mutation was also found in a brain cancer sample. The Arginine residue at position 465 was mutated to Histidine (i.e., R465H) in skin and cervical tumor samples, and to Cysteine (i.e., R465C) in more than one individual with uterine cancer. In particular, there are 8 Arginine mutations localized in the extreme C-terminus Zf-GRF motif of APE2 (i.e., R465C, R465H, R490H, R491C, R497S, R499W, R508Q, R516M). Our previous studies have shown that R473A, R473E, R502A, and R502E mutant APE2 in Xenopus (homologous to R479 and R508 mutants in human APE2) is deficient in ssDNA binding10, and that G483A-R484A double mutants of Xenopus APE2 (homologous to G489A-R490A double mutants in human APE2) are defective for ssDNA binding and PCNA interaction11. We speculate that the Arginine missense mutations in human APE2 identified from this study may compromise its ssDNA binding and/or PCNA interaction, leading to compromised exonuclease activity and associated genome instability.

Further distinction of skin cancer mutations into melanoma and non-melanoma tumor-types revealed three of the six non-melanoma mutations were Proline to Leucine residue changes (i.e., P165L, P463L and P467L) (Fig.Ā 2B, TableĀ S2). Interestingly, P463L and P467L flank the start of the Zf-GRF domain and P165L falls in the middle of the EEP domain.

Additionally, four nonsense mutations of APE2 were found in lung cancers (Q235*, Q380*, and E448*) and uterine cancer (E444*) creating premature stop codons (Fig.Ā 2C). Of the two frameshift deletions found, one was in uterine tissue (T356Cfs*37) and one in lung cancer (L464Cfs*14) (Fig.Ā 2C). Furthermore, three alternative splices were revealed in uterus (X141, X190) and skin cancers (X214) (Fig.Ā 2C).

Upregulation of APE2 mRNA expression in tumor tissue

After establishing the occurrence of alterations in APE2 at the genomic level, APE2 expression at the mRNA level was analyzed. To find if APE2 is differentially expressed in cancer patients, gene expression quantification data was computationally compared between tumor and non-malignant tissue using a two-sided t-test in R. Samples referred to as ā€˜matchedā€™ indicate individuals where non-malignant and tumor tissue were both available in the data (TablesĀ S3 and S4). FPKM (fragments per kilobase of exon model per million reads mapped) values of mRNA sequencing data for APE2 in matched samples revealed significant upregulation (Ī±ā€‰=ā€‰0.05) in tumor tissue compared with non-malignant tissues across 5 cancer types including kidney (nā€‰=ā€‰126), breast (nā€‰=ā€‰112), lung (nā€‰=ā€‰106), liver (nā€‰=ā€‰58), and uterine (nā€‰=ā€‰23) cancers (Fig.Ā 3Aā€“D,F). Matched samples from prostate cancer patients (nā€‰=ā€‰52) did not present a significant difference in APE2 FPKM values (Fig.Ā 3E).

Figure 3
figure 3

APE2 mRNA expression between tumor tissue and matched non-malignant tissue per individual from 6 different cancer types including kidney (A), breast (B), lung (C), liver (D), prostate (E), and uterus (F). A paired two-sided t-test was conducted to determine significance of difference. df (degree of freedom) and p values are listed in each panel. The images were created by PowerPoint and R-default ā€˜boxplotā€™ (matchedGeneExp.Rmd).

To establish a baseline for finding correlations - between expression patterns in APE2 and 13 other DNA repair and DDR pathway genes ā€“ in tumor tissue, matched samples were further analyzed for differential expression in these genes. Notably, in most, if not all, cancer types analyzed, mRNA expression of 11 out of the 13 DNA repair and DDR genes are upregulated in tumor tissue compared with that in non-malignant tissue. Analysis of these genes revealed significant upregulation in APE1, BRCA1, Chk1, Chk2, and TopBP1 for all cancer types (Figs.Ā S1ā€“S5). Like APE2, PCNA, BRCA2 and XRCC1 were also significantly upregulated in all cancer types except prostate (Figs.Ā S6ā€“S8). PARP1 was upregulated in all cancer types except kidney cancer (Fig.Ā S9). mRNA expression of ATR in tumor tissue was increased compared with matched non-malignant tissue in liver, lung, uterus, and prostate cancers, but not in breast and kidney cancers (Fig.Ā S10). Rad50 mRNA expression was upregulated in almost all cancer types except uterine cancer (Fig.Ā S11).

Intriguingly, mRNA expression of ATM and Mre11 was upregulated in tumor tissue compared with matched samples in some cancer types but downregulated in other cancer types (Figs.Ā S12ā€“S13). ATM expression was upregulated in kidney and liver cancers, but not in lung and prostate cancers, while it was downregulated in breast and uterine cancers (Fig.Ā S12). Mre11 expression was upregulated in lung and liver cancers, but not in kidney, prostate, and uterus cancers, and was significantly downregulated in breast cancer (Fig.Ā S13).

Patterns in mRNA expression between APE2 and other DNA repair and DDR genes

Because co-expression and interactive partners of APE2 are not yet well known, it is significant to compare expression levels of APE2 to other DNA repair and DDR genes. After establishing differential expression patterns for reach gene per cancer types, FPKM values of APE2 were correlated with 13 other DNA repair and DDR genes in tumor tissue only for breast (nā€‰=ā€‰1,105), lung (nā€‰=ā€‰1,028), kidney (nā€‰=ā€‰891), uterine (nā€‰=ā€‰552), prostate (nā€‰=ā€‰499), and liver (nā€‰=ā€‰407) cancers (TablesĀ S3 and S5). While APE2 had a significant (Ī± = 0.05) positive correlation with PCNA, APE1, XRCC1, PARP1, Chk1, and Chk2 across all 6 cancer types, groupings of DNA repair and DDR genes were found to be correlated with APE2 in different patterns in different cancer types (Figs.Ā 4, and S14ā€“S18). In liver cancer, correlations of APE2 with these 13 DNA repair and DDR genes were all positive (Fig.Ā S14). Out of the 13 other genes, 11 had the significantly strongest relationship per gene with APE2 in liver cancer (determined by R-value and Ī±ā€‰=ā€‰0.05) (Fig.Ā S14); however, direction, strength, and significance of correlations varied in other cancer types.

Figure 4
figure 4

Correlation between mRNA expression of APE2 and 13 other DNA repair and DDR proteins in tumor tissues of breast cancer. (Aā€“F) are positive correlation of APE2 mRNA expression with that of PCNA (A), APE1 (B) Chk1 (C), Chk2 (D), XRCC1 (E), and PARP1 (F). (G-I) No correlation of APE2 mRNA expression with that of BRCA1 (G), TopBP1 (H), and BRCA2 (I). (Jā€“M) show negative correlation of APE2 mRNA expression with that of Mre11 (J), ATR (K), Rad50 (L), and ATM (M). Pearsonā€™s R and p values are listed in each panel. The images were created by PowerPoint and R-package ggplot2 - ā€˜ggscatterā€™ (GeneExpressionAllCancer.Rmd).

PCNA, APE1, XRCC1, and PARP1

Across all 6 cancer types, APE2 expression was significantly positively correlated with the expression of PCNA, APE1, PARP1 and XRCC1 yet expression ranges varied per cancer type and gene-gene pair. The positive correlation between the expression of APE2 and PCNA was most significant in prostate cancer. However, neither APE2 nor PCNA expression values was significantly different in matched prostate tumor (Figs.Ā 3E and S6E), thus there is no baseline to strengthen the implications of this APE2-PCNA correlation. Weak APE2-APE1 expression correlations were observed in lung and kidney cancers, which is consistent with the significant role of APE1 and APE2 in SSB repair11,42,43. When APE2 and PARP1 values were correlated, liver and uterine cancer samples showed more variability whereas very little spread occurred in breast and kidney cancer samples. The highest positive correlation of APE2 and XRCC1 occurred in liver cancer.

Chk1 and Chk2

APE2 mRNA expression was positively and significantly correlated with the mRNA expression of Chk1 and Chk2 in all 6 cancer types (breast, liver, prostate, uterine, lung, and kidney) (Figs.Ā 4, S14ā€“S18). It is noted that the strongest positive correlation of APE2 with Chk1 and Chk2 is found in lung cancer (Fig.Ā S17).

BRCA1 and BRCA2

When APE2 expression was correlated with expression of BRCA1 and BRCA2, the top 4 strongest positive correlations were, in order starting with strongest, liver, lung, prostate, and uterus. Whereas there were no significant correlations of APE2 to BRCA1 and BRCA2 in breast cancer (Fig.Ā 4), kidney cancer samples exhibited a negative correlation of APE2-BRCA2 and a positive correlation of APE2-BRCA1 (Fig.Ā S18, and TableĀ S5).

TopBP1 and ATR

The APE2-ATR relationship was positive in liver and lung cancer but negative in breast cancer. Furthermore, the APE2-ATR correlation was not significant in prostate, uterine, nor kidney cancers. Although a positive correlation of APE2-TopBP1 was revealed in liver, prostate, uterus, and lung cancers, the APE2-TopBP1 correlation was not significant in kidney nor breast cancer.

ATM, Mre11, and Rad50

The APE2-ATM correlation was negative in all cancer types expect liver. APE2 did not have a significant relationship with Mre11 or Rad50 in uterine nor prostate cancers. The correlation of APE2 with Mre11 was positive in liver and lung cancers, and negative in breast and kidney cancers. In addition, the APE2-Rad50 correlation was negative in breast, kidney, and lung cancers, yet positive in liver cancer.

Discussion

Function and biology of APE2

The human APE2 gene was first cloned and briefly characterized in 200044. However, the understanding of APE2 in DNA repair and DDR pathways has been derived from studies of model organisms such as Xenopus and yeast9,10,11,45,46,47,48,49, despite some biochemical and sub-cellular localization characterization of human APE2 protein28,50,51. Recent genetic screens identified APE2 as a synthetic lethal target in BRCA1- and BRCA2-deficient colonic and ovarian cancer cell lines52. Although the exact underlying mechanism remains unknown, this report suggests that APE2 may contribute to different DNA repair pathways other than BRCA1- and BRCA2-mediated DSB repair. Consistent with this, our series of studies using Xenopus egg extract system suggest that APE2 plays a direct role in SSB repair via the 3ā€²-5ā€² SSB end resection11,26,43. Of note, gene expression of APE2 and APE1 has been found up-regulated in multiple myeloma (MM) patients and MM cells, which may lead to dysregulation of HR via regulating Rad51 expression32. These new evidences suggest that APE2 contributes to genome integrity via different mechanisms.

A prior study has shown that APE2-knock out (KO) mice are viable but develop immune response defects and growth retardation53. Subsequent characterization of APE2-KO mice revealed the significance of APE2 in B cell development and immunoglobulin class switch recombination54,55. Interestingly, recent evidences suggest that ATR and BER pathways are involved in regulating the expression of Programmed death-ligand 1 (PD-L1), an important player for immunotherapy56,57,58. As APE2 has been shown in both BER and ATR pathways11,50, it is interesting to test whether APE2 is directly involved in cancer immunotherapy in future studies.

Due to the lack of in-dept knowledge about APE2 functions in human diseases, the purpose of this study has been to contribute knowledge from such an angle to investigate the genomic alterations and abnormal expression of APE2 in human cancers. Using data from samples across multiple cancer types has been advantageous in providing a snapshot of APE2 characteristics in cancer yet there are limitations which must also be acknowledged.

Genomic alterations of APE2 in cancer samples

Our data from 21,769 cancer patients revealed that genomic alterations of APE2 occur in all cancer types analyzed. This observation suggests the potential involvement of APE2 in cancer development, although future studies are needed to test this question directly. Frequencies of CNVs in APE2 for gains, amplifications, heterozygous deletions, and homozygous deletions vary significantly per cancer type. It is important to note that prior studies from which data was obtained did not provide CNV information for all patients32 and frequencies observed in this study may be relatively lower than those in the actual disease population, warranting further investigation.

We note some unique features of APE2 somatic mutations. The relatively high rate of Arginine mutations dominating APE2 somatic mutations across several cancer types are a distinct feature. The identified Proline to Leucine mutations flanking the Zf-GRF domain (P463L and P467L) may affect its secondary structure and potential function to interact with ssDNA and PCNA10. Future investigations are needed to test whether these point mutations in APE2 affect its function in SSB repair and signaling via different capacity of interacting with ssDNA and PCNA or its nuclease activity per se.

Abnormal expression of APE2 in cancer patients

Our results on APE2 mRNA expression from matched tumor and non-malignant tissue demonstrate its overexpression in 5 out of 6 cancer types (Fig.Ā 3). This pattern of APE2 overexpression is also found in PCNA, BRCA2, and XRCC1 (Figs.Ā S6ā€“S8). Consistent with our findings, Kumar et al. recently reported that APE2 is overexpressed at the levels of mRNA and protein in MM cell lines and 112 MM patients from two datasets32. Previous studies have demonstrated that PCNA interacts with APE2 to regulate its exonuclease activity11,28,45,47,50. A recent genetic screen has revealed that APE2 is a synthetic lethal target in BRCA2-deficient cells52. Although APE2 and XRCC1 are involved in BER pathway, it seems that they contribute to SSB repair via different mechanisms43.

A prior study revealed that mRNA expression of APE1 and PARP1 is upregulated in tumor tissue compared with that in non-malignant tissues from 53 paired colorectal cancer patients59. This is consistent with our observation of upregulation of APE1 mRNA (Fig.Ā S1), and increased PARP1 mRNA expression (Fig.Ā S9). Although upregulation or downregulation of individual genes in the BER pathway is often found in tumor tissue over non-malignant tissue, overall BER capacity has been proposed a determinant of prognosis and therapy response to DNA-damaging 5-fluorouracial in colon cancer patients60.

Correlation of APE2 mRNA expression with other DNA repair and DDR proteins

We have shown that APE2 expression is positively correlated with the expression of PCNA, APE1, PARP1, XRCC1, Chk1, and Chk2 across all six cancer types analyzed in this work (Figs.Ā 4, S14ā€“S18). This may be interpreted that APE2 is involved in BER and SSB repair pathways as same as PCNA, APE1, XRCC1, and PARP111,28,42,43. Recent studies showing the role of APE2 in Chk1 phosphorylation in oxidative stress could provide feasible explanation for the positive correlation found between APE2 and Chk19,10. Interestingly, the negative correlation of APE2 expression with ATM expression in all cancer types analyzed (excepting liver), suggests that APE2 and ATM may be in different DDR pathways.

Due to tumor heterogeneity, there is no ā€˜one-size-fits-allā€™ formula that can be applied to the prediction, prevention, or therapy of cancer or other human diseases. In this study we demonstrate the varying gene expression and genomic alteration profiles of APE2 by tissue site and even by individual in multiple cancers. Future studies that include more tissue sites, additional metadata, and consider other potential molecular partners of APE2 will be useful in establishing the genomic landscape in which certain APE2 characteristics can be implicated as predictive markers or therapeutic targets.

Consideration of sample size and data availability

The presence of matched samples in TCGA dataset provides clear indication of whether APE2 changes in tumor tissue per individual, yet many of the cancer type sample sizes of matched tissues in TCGA were too small to provide statistical value. Due to the prevalence of some cancer types over others, it is infeasible to obtain consistently large quantities of data for all cancer types. Furthermore, some tumor tissue samples are available in ā€˜study-worthyā€™ amounts, yet the collection of matched non-malignant tissues is not available due to the nature of the tissue.

Gene expression quantification data was useful in understanding the transcriptome surrounding APE2. In this study, data from multiple cancer types reveals not only APE2 characteristics in disease but suggests tissue-dependent or tissue-specific characteristics. All available data from TCGA has been subject to uniform protocol and strict quality control, indications of valid and consistent datasets. However, TCGA data from bladder and colon cancer studies contained identical gene expression quantification data and thus, due to lack of clarity, were not included in the current study. Protein expression data from The Cancer Proteome Atlas (TCPA) is available for many of the same individuals in the TCGA datasets we have analyzed here. An analysis of matched protein expression could be duly advantageous for examining APE2 transcription levels to the level of expressed protein and to compare translation of APE2 with other DNA repair and DDR genes. Regardless, APE2 is not included in the TCGA/TCPA protein quantification analyses, deeming the protein expression data inefficient for the current study.

Downloadable files of somatic mutation data from cBio contained conflicting information from that which could be visualized on the web interface of cBioPortal. For example, somatic mutations A372T and G462E found in skin cancer were not included in the files but do appear on the cBioPortal website. These missing mutations were found and manually added to analyses. There were a total of 633 melanoma patients whose available CNV information was analyzed and recorded (491 normal copies of APE2; 58 gains; 82 heterozygous deletions; 2 amplifications. TableĀ S2). There were an additional 10 melanoma patients and 6 non-melanoma patients for whom APE2 somatic mutations were recorded. Therefore, the ~24% genomic alteration frequency from skin cancer patients was derived primarily from melanoma patients (Fig.Ā 1).

Likewise, genomic alteration data from cBio is missing for a large number of samples, resulting in slightly less robust results. Furthermore, ploidy and purity of copy number calls could be different across studies. Due to a general lack of prior manual review that could ensure uniform quality of data available from cBioPortal, the CNV data has the potential to represent a number of false positives and false negatives (docs.cbiopotal.org). Despite minor limitations with sample sizes and lack of availability for particular data-types, we believe the magnitude and overall quality of data analyzed here has been sufficient to capture certain patient-derived characteristics of APE2 in disease.