Introduction

Genomic instability affects cancer development and evolution, causing drug resistance and poor prognosis and thus impacting therapy outcomes in the clinic1,2,3. Hence, the concept of “targeting genomic instability and/or aneuploidy for cancer therapy” has been proposed4. For contemporary targeted drug development, genomic information is critical5. Although some signatures of genomic instability in select organs have been identified [e.g.,6], the genes involved in genomic instability in cancer have remained elusive, preventing researchers from designing specific agents for targeted therapies. Gene expression analysis of pan-cancer datasets indicated that increased mitotic signatures and decreased immune signatures were characteristic of cancers with high copy number alterations (CNA)7, suggesting roles for mitotic mis-regulation in generating CNA and for immune functions in antagonizing cancer cells with CNA. Although the notion of immunosurveillance of genomic instability and aneuploidy has long been proposed, few of the genes involved have been identified and the molecular mechanisms remain to be determined8,9.

Results with transgenic mouse models from our and other laboratories have indicated dual effects of genomic instability on cancer, in both tumor suppression and oncogenesis10,11. Mitosis-targeting genomic instability models (chromosome instability [CIN] models; e.g., Mad2, BubR1, Sgo1) have demonstrated the role of genomic instability as a disease modifier, resulting in tumor proneness in organs including the colon, lung, and liver later in life12,13,14,15,16,17. Although genomic instability is prevalent in most solid tumors, based on the tumor profile in genomic instability transgenic mice, we hypothesized that genomic instability has prominent effects on cancer development and/or disease modification in the colon, liver, and lung18. To identify specific genes involved in genomic instability in human lung adenocarcinoma, we developed a novel data mining strategy, GE-CNA, which identifies all genes whose expression is associated with increased or decreased tumor CNA18. Pathway analysis revealed that (a) amplification/insertion CNA is facilitated by over-expression of DNA replication stressors and suppressed by a broad range of immune cells (T-, B-, NK-cells, leukocytes), and (b) deletion CNA is facilitated by over-expression of mitotic regulator genes and suppressed predominantly by leukocytes guided by leukocyte extravasation signaling. Among the 39 CNA- and survival-associated genes, purine metabolism (PPAT, PAICS), the immune-regulating CD4-LCK-MEF2C and CCL14-CCR1 axes, and ALOX5 emerged as survival-critical pathways. These pathways/genes are potential drug targets for lung adenocarcinoma therapy18.

With the lung cancer results in hand, we continued the GE-CNA analysis with liver and colon cancers, anticipating that a similar gene profile, and thus common genes for targeting genomic instability, would emerge. Because naturally occurring polyploidization in the liver complicates the CNA datasets and their analysis, we focused on colon cancer. In the United States, colorectal cancer (CRC) is expected to cause about 52,580 deaths during 2022, and is the second most common cause of cancer deaths when cancer deaths for men and women are combined19. Thus, CRCs remain a major target for prevention and therapy development. In CRCs, tumor development is associated with progressive mutational accumulation, as indicated in the “Vogelgram”20. Functional analysis of the frequently mutated genes indicated that mutations in each of these genes (e.g., APC, TP53, FBXW7/hCDC4, PI3K-PTEN, K-RAS) can cause genomic instability, directly or indirectly21. Thus, part of the genomic instability in CRCs is linked to mutations in key oncogenic/tumor-suppressing genes. In addition, epigenetic modulations, environmental challenges from microbiota, and transcriptomic and microRNA changes, which are also suggested to affect genomic instability, have been reported [e.g.,22,23,24,25,26]. Among these events impacting genomic instability, transcriptomic alterations, especially over-expression, are the most feasible to manipulate with drugs, whereas restoring mutated genes is technically difficult. However, transcriptomic alterations associated with genomic instability in CRCs have not been comprehensively identified, and our understanding of the impact of the transcriptomic landscape on genomic instability in CRCs remains incomplete. Hence, we set out to apply the GE-CNA data mining approach to identify genes and pathways involved in genomic instability in CRCs via transcriptomic mis-regulation.

Materials and methods

GE-CNA analysis

We downloaded the Colorectal Adenocarcinoma (TCGA, PanCancer Atlas, 2018) datasets from cBioportal (https://www.cbioportal.org/study/summary?id=coadread_tcga_pan_can_atlas_2018)27,28, a publicly available database. All following methods were carried out in accordance with relevant guidelines and regulations. The datasets included survival and clinical data for 594 patients. Among these patients, gene expression profiles and copy number alterations were available for 592 patients, and whole exome sequencing (WES) mutation profiles for 528 patients. We used the batch-normalized gene expression Z-scores computed by RSEM29 from Illumina HiSeq_RNASeqV2 data. The downloaded copy number alterations (CNA) had been estimated by GISTIC 2.0 (ref. 30). Neutral or no-change CNA was indicated by 0. Gain/amplification CNA was indicated by a positive value, while a negative value indicated deletion CNA. Amplification CNAs and deletion CNAs were analyzed jointly and separately.

In the gene expression file, we had 20,471 genes of 592 subjects. We excluded 3073 genes that were missing in more than 1/3 of subjects. The included genes were complete in all subjects. We sorted each gene by its expression in all subjects and selected the top 10 and bottom 10 subjects. The selected subjects were assigned to a high expression group and a low expression group, accordingly. Next, we extracted the subjects’ CNA counts in the high and low expression groups from the CNA file. Student’s t-test was used to examine the difference in CNA counts in the high group vs. the low group. Multiple-testing was corrected by q-value31. The significance level was 0.05.
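The grouping and comparison step for a single gene can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R implementation: the function names, the dictionary-based inputs, and the toy data are hypothetical. It computes only the pooled-variance (Student's) t statistic and its degrees of freedom; converting the statistic to a p-value and applying the q-value correction are omitted.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List, Sequence, Tuple


def extreme_groups(expression: Dict[str, float], n: int = 10) -> Tuple[List[str], List[str]]:
    """Return (high, low) subject IDs: the n highest and n lowest expressors."""
    ranked = sorted(expression, key=expression.get)  # ascending by expression
    return ranked[-n:], ranked[:n]


def student_t(high: Sequence[float], low: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    n1, n2 = len(high), len(low)
    pooled = ((n1 - 1) * variance(high) + (n2 - 1) * variance(low)) / (n1 + n2 - 2)
    if pooled == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t_stat = (mean(high) - mean(low)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t_stat, n1 + n2 - 2


def ge_cna_statistic(expression: Dict[str, float], cna_counts: Dict[str, int], n: int = 10):
    """Compare CNA counts between the high and low expression groups of one gene."""
    high_ids, low_ids = extreme_groups(expression, n)
    return student_t([cna_counts[s] for s in high_ids], [cna_counts[s] for s in low_ids])


# Toy data for one gene in six subjects (not real data); n=2 keeps it small.
toy_expression = {"s1": -2.0, "s2": -1.0, "s3": 0.0, "s4": 0.5, "s5": 1.5, "s6": 3.0}
toy_cna = {"s1": 10, "s2": 14, "s3": 20, "s4": 25, "s5": 40, "s6": 44}
t_value, dof = ge_cna_statistic(toy_expression, toy_cna, n=2)
```

A positive statistic corresponds to more CNAs in the high expression group, and a negative statistic to fewer.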

Further, we divided the significant genes into two groups: higher expression that resulted in more CNAs and higher expression that resulted in fewer CNAs. We employed the bioinformatics tool IPA (Ingenuity Pathway Analysis, QIAGEN, Inc., https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis) to conduct the gene set enrichment analyses32. The Benjamini–Hochberg corrected p-value33 provided by IPA was reported and evaluated at the significance level of 0.05. Also, we presented the pathway graphs from IPA.
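For readers unfamiliar with the Benjamini–Hochberg adjustment reported by IPA, the following is a minimal standard-library Python sketch of the procedure. It is offered only as an illustration; IPA computes these values internally, and the function name and example p-values here are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values for four pathways (not results from this study).
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [p < 0.05 for p in example_adjusted]
```

Note that this adjustment is distinct from the q-value method used for the gene-level tests.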

We examined the association between gene alteration and overall survival using the Cox Proportional-Hazards (CoxPH) model. All available variables (age, sex, race, and tumor stage) were considered as covariates; age and tumor stage were retained for adjustment because their univariate CoxPH p-values were < 0.05. Race groups with small numbers of patients were combined, so the race variable analyzed in the CoxPH model had two levels: White and Other. The sub-levels under each of stages 1 to 4 were combined, which resulted in four tumor stage levels in the analysis. We excluded patients with incomplete data. The Hazard Ratio (HR) and p-value of each gene were reported. The definitions of “altered” and “unaltered” subjects were taken from cBioportal. Briefly, an altered subject was a subject having any type of high-level CNA amplification, CNA homozygous deletion, or WES mutation. Otherwise, a subject was considered unaltered. We compared gene expression levels between the altered and unaltered groups using the Wilcoxon rank sum test. The significance level was 0.05. We presented the survival curves and boxplots by altered/unaltered group. We implemented all statistical analyses using R (v4.0.3) and R packages.
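The altered/unaltered definition can be expressed as a short Python sketch. It assumes that discrete copy number calls are coded with +2 for high-level amplification and -2 for homozygous deletion; this coding, the function names, and the toy samples are assumptions made for illustration and are not stated in the text. The CoxPH fit and the Wilcoxon rank sum test themselves are not reproduced here.

```python
from typing import List, Sequence, Tuple

# Assumed discrete coding (hypothetical for this sketch).
HIGH_LEVEL_AMPLIFICATION = 2
HOMOZYGOUS_DELETION = -2


def is_altered(cna_call: int, has_mutation: bool) -> bool:
    """A subject is altered if it has a high-level amplification,
    a homozygous deletion, or a WES mutation in the gene."""
    return cna_call in (HIGH_LEVEL_AMPLIFICATION, HOMOZYGOUS_DELETION) or has_mutation


def split_by_alteration(
    samples: Sequence[Tuple[float, int, bool]]
) -> Tuple[List[float], List[float]]:
    """Split expression values into (altered, unaltered) groups.

    Each sample is (expression_z_score, cna_call, has_mutation).
    """
    altered = [expr for expr, call, mutated in samples if is_altered(call, mutated)]
    unaltered = [expr for expr, call, mutated in samples if not is_altered(call, mutated)]
    return altered, unaltered


# Toy samples for one gene (not real data).
toy_samples = [(1.2, 2, False), (-0.3, 0, False), (0.8, 1, True), (-1.5, -2, False), (0.1, -1, False)]
altered_expr, unaltered_expr = split_by_alteration(toy_samples)
```

The two resulting lists are the inputs one would pass to a rank-based two-group comparison of expression levels.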

The main reason for using only the extreme high and low gene expression groups is to increase statistical power by enriching for the causal genetic factors and increasing their effect size. Because 592 is not a large sample size to split into subsets, we used all samples to maximize the power of the study.

To gauge the magnitude of an HR, we used the following benchmarks: when comparing two groups, small (not trivial, but possibly inconsequential), medium (likely consequential), and large (very likely consequential) HRs are approximately 1.3, 1.9, and 2.8, respectively34.

Availability of data and materials

We obtained original tumor data from the cBioportal (https://www.cbioportal.org/study/summary?id=coadread_tcga_pan_can_atlas_2018)27,28, which is a publicly available database. The data were openly available for download. Main data generated or analyzed during this study are included in this published article and its supplementary information files. All the datasets used and/or analyzed during the current study will be available from the corresponding author on reasonable request.

Results

We applied GE-CNA to 592 CRCs in the TCGA database (Fig. 1). Supplementary Table 1 shows 247 genes whose high expression is associated with high tumor CNA, and which are thus annotated as CNA facilitators. Functional annotation and pathway analysis indicated that (i) the genes are functionally diverse and (ii) there was no statistically significant enrichment (corrected P < 0.05) of a specific pathway. The lack of specific enrichment is a major difference from the previous results in lung adenocarcinoma, which showed enrichment of mitotic regulators and DNA replication pathways18.

Figure 1

Identifying genes associated with Copy Number Alterations in colon adenocarcinoma with the “Gene Expression to Copy Number Alterations” (“GE-CNA”) approach. For all genes, we recorded CNA for high expressor tumors (N = 10) and for low expressor tumors (N = 10). The CNA from the “high expressor” and “low expressor” groups were compared using an unpaired t-test for each gene, testing the correlation between gene expression and numbers of CNA (q-value < 0.05). Genes whose high expression was associated with high CNA were annotated as CNA facilitators, while genes whose high expression was associated with low CNA were annotated as CNA suppressors. Genes specifically associated with a type of CNA ([a] amplification/insertion [amp/ins] CNA, often associated with Microsatellite Instability [MIN], and [b] deletion CNA, often associated with mitotic error-mediated Chromosome Instability [CIN]) were identified. Figure was generated with cBioportal (https://www.cbioportal.org/datasets).

Supplementary Table 2 shows 253 genes whose high expression is associated with low tumor CNA, and which are thus annotated as CNA suppressors. The enriched pathways (corrected P < 0.05) were: Interferon Signaling (BAK1, BCL2, IFIT3, IFNG, JAK2, STAT2), Antigen Presentation Pathway (CLIP, MHC II-alpha), Heme Biosynthesis II (ALAS1, CPOX, FECH), Natural Killer Cell Signaling (HSPA5, IFNG, IL15, JAK2, KIR2DL4, MAP2K1, MTOR, NCR1, ULBP3), Retinoic acid Mediated Apoptosis Signaling (TRAIL-R, PARP), JAK/Stat Signaling (JAK2, MAP2K1, MTOR, PIAS2, SOCS6, STAT2), Glucocorticoid Receptor Signaling (HSP90, HSP70, NCOR, TFIIA, OXPHOS), Heme Biosynthesis from Uroporphyrinogen-III I (CPOX, FECH), and Glutathione Redox Reactions II (GSR, PDIA3) (Fig. 2). Functionally, these pathways cover (i) immune function and its regulation (Interferon signaling, Antigen Presentation, Natural Killer cell signaling); (ii) growth signaling (JAK/STAT, Glucocorticoid receptor); (iii) apoptosis (Retinoic acid); (iv) heme biosynthesis (ALAS1, CPOX, FECH); and (v) glutathione redox signaling.

Figure 2

Pathway analysis of CNA suppressors. The 247 CNA facilitator genes in Supplementary Table 1 did not show significant enrichment in a pathway. The 253 CNA suppressor genes in Supplementary Table 2 were further subcategorized to amplification/insertion CNA suppressors (Supplementary Table 5) and deletion CNA suppressors (Supplementary Table 6). Amp/ins CNA suppressors include only 23 genes, while deletion CNA suppressors include 253 genes, suggesting that CRC cells with amplification/insertion CNA and deletion CNA are suppressed through different modalities. Deletion CNA suppressor genes show enrichment in the (A) Antigen Presentation Pathway, (B) Interferon signaling pathway, and (C) JAK-STAT signaling pathway, suggesting that CRC cells carrying CIN-associated deletion CNA are targeted by these immune-associated pathways and that they represent an immunosurveillance mechanism of CIN cells in CRC. Purple highlighting indicates particular genes with significant GE-CNA correlations and/or a cluster of such genes in the IPA pathways. Figures were generated with IPA (Ingenuity Pathway Analysis, QIAGEN, Inc., https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis).

To obtain further mechanistic insight into CNA generation/suppression in CRC, we asked whether amplification/insertion CNA and deletion CNA are differentially affected by different sets of genes. In lung adenocarcinoma, amplification/insertion CNA was facilitated by 161 genes whose main functions are in the DNA replication and repair pathways, suggesting that amplification/insertion CNA is predominantly driven by MIN or CIN caused by DNA replication stress18. In contrast, deletion CNA was associated with 187 genes that were enriched with known mitotic regulators, suggesting a link between mitotic errors and deletion CNA in lung adenocarcinoma. In CRCs, we identified 28 genes associated with amplification/insertion CNA increases (Amp/ins CNA facilitators; Supplementary Table 3), and 20 genes associated with deletion CNA increases (Deletion CNA facilitators; Supplementary Table 4). The number of identified genes is several-fold smaller than in the lung, and the genes were not significantly concentrated in particular pathways, nor were they the same genes identified in lung adenocarcinoma, indicating organ specificity in the profile. Yet, there are limited similarities; a few of the genes in Supplementary Tables 3 and 4 are indeed involved in DNA metabolism and/or mismatch repair. For example, ASTE1/HT001 encodes a nuclease associated with MIN35,36,37. Recently, ASTE1 was identified as a downstream effector of the shieldin complex and a structure-specific DNA endonuclease that specifically cleaves single-stranded DNA and 3′ overhang DNA38. DNASE1 encodes Deoxyribonuclease 1, which may be involved in clearance of cell-free DNA that serves as a circulating tumor marker, as well as playing a role in SLE pathogenesis39. Genes involved in RNA metabolism are also noted. DDX27 encodes a putative RNA helicase. PRPF6 encodes pre-mRNA processing factor 6. RPS6KA6 encodes ribosomal protein S6 kinase A6, a kinase downstream of the ERK/MAPK pathway, and is being investigated as an inhibition target for various cancers40. SMG5 encodes SMG5 nonsense-mediated mRNA decay factor, which is thought to provide a link to the mRNA degradation machinery involving exonucleolytic pathways41. Therefore, nucleic acid metabolism emerged as a factor affecting CNA in CRC.

The CNA suppressor genes in Supplementary Table 2 were further subcategorized to amplification/insertion CNA suppressors (Supplementary Table 5) and deletion CNA suppressors (Supplementary Table 6). Supplementary Table 5 includes only 23 genes, and Supplementary Table 6 includes 253 genes, suggesting that CRC cells with amplification/insertion CNA and deletion CNA may be suppressed through different modalities, which agrees with results from lung adenocarcinoma. Pathway analysis indicated that (a) amplification/insertion CNA suppressor genes show enrichment in Maturity Onset Diabetes of Young (MODY) Signaling (FABP2, GAPDH), NADH Repair (GAPDH), and Heme Biosynthesis from Uroporphyrinogen-III I (FECH) pathways; and (b) deletion CNA suppressor genes show enrichment in Antigen Presentation Pathway (Fig. 2A), Interferon Signaling (Fig. 2B), Heme Biosynthesis II, Natural Killer Cell Signaling, Retinoic acid Mediated Apoptosis Signaling, JAK/Stat Signaling (Fig. 2C), Glucocorticoid Receptor Signaling, Heme Biosynthesis from Uroporphyrinogen-III I, and Glutathione Redox Reactions II pathways. The enrichment profiles suggest that cells with amplification/insertion CNA are suppressed with metabolic modulations, while cells with deletion CNA are targeted by immune cells and/or by growth and cell death-related signaling, also affected by redox signaling.

The notable differences in pathway profiling results between lung adenocarcinoma and CRC led us to hypothesize that the total number of CNA is different between lung adenocarcinoma and CRC; one of the cancer types would show higher CNA. We compared total CNA numbers by cancer stages (Fig. 3A). In both cancers, cancer CNA increases over stages. In all types of CNA, in all stages, lung adenocarcinoma showed higher CNA than did CRC. The differences were significant in stages 1, 2, and 3 (corrected P < 0.05). Only in stage 4, due to an increase of CNA in CRC, did the gap in CNA numbers shrink to a non-significant level (Bonferroni corrected p-value = 0.13). The results were the same for amplification/insertion CNA (Fig. 3B) and for deletion CNA (Fig. 3C); CNA were consistently higher in lung adenocarcinoma than in CRC, regardless of the type. Based on the gene profile differences and CNA numbers between lung adenocarcinoma and CRC, we suspect that (a) major CNA generation mechanisms vary among cancers; (b) a transcriptome-driven mechanism is dominant in lung adenocarcinoma, while a mutation-driven mechanism is prominent in CRC; and (c) a transcriptome-driven mechanism of CNA generation is more aggressive than a mutation-driven mechanism.

Figure 3

Lung adenocarcinomas carry higher CNA than do CRCs at all stages and in both types of CNA (amp/ins CNA and deletion CNA). (A) At all stages, lung adenocarcinomas carry higher numbers of CNA (all types of CNA) than do CRCs (green: lung adenocarcinoma, orange: CRC). The difference is particularly notable at earlier stages. For stages 1–3, the difference was statistically significant (Bonferroni corrected p-value < 0.05). The trend is the same in both (B) Amp/ins CNA and (C) deletion CNA. Figures were generated from R v4.0.3 (https://www.R-project.org/).

The genes whose expression levels are associated with CNA are all potential targets to modulate genomic instability, which would affect therapy outcome. However, even if modulation of the gene expression can curtail genomic instability, if the modulation does not affect patients’ survival, the modulation approach would be futile. With this reasoning, we applied secondary screening, searching for genes whose expression levels are also significantly associated with survival rate of patients (P < 0.05). The secondary screening to identify genes whose expression levels were associated both with CNA and survival rate (i.e., “survival-critical”) yielded 11 genes from 247 CNA facilitators in Supplementary Table 1, and 16 genes from 253 CNA suppressors in Supplementary Table 2 (Table 1, Table 2). As indicated in Table 1, all the 27 select “survival-critical” genes showed significant differences in average CNA/CNV between high expressor and low expressor.
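The secondary screening amounts to intersecting two significance filters, which can be sketched in Python as follows. The function name, the dictionary inputs, and the placeholder gene labels and values are hypothetical and serve only to illustrate the logic; they are not results from this study.

```python
from typing import Dict, List


def survival_critical(
    cna_q_values: Dict[str, float],
    survival_p_values: Dict[str, float],
    alpha: float = 0.05,
) -> List[str]:
    """Genes significant in the GE-CNA test (q < alpha) whose association
    with survival is also significant (p < alpha)."""
    return sorted(
        gene
        for gene, q_value in cna_q_values.items()
        # Genes lacking a survival result are treated as non-significant.
        if q_value < alpha and survival_p_values.get(gene, 1.0) < alpha
    )


# Placeholder genes and values (not real data).
toy_q = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.03, "GENE_D": 0.04}
toy_p = {"GENE_A": 0.02, "GENE_B": 0.01, "GENE_C": 0.30}
selected = survival_critical(toy_q, toy_p)
```

In this toy example only GENE_A passes both filters: GENE_B fails the CNA test, GENE_C fails the survival test, and GENE_D has no survival result.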

Table 1 Data for Gene Expression and Copy Number Alteration (GE-CNA) on initially-identified 27 “survival critical” genes.
Table 2 List of 18 (27) survival critical genes.

The 11 CNA facilitator-survival critical genes were CAPS, CCDC115, ATP6AP1, NBEAP1, SPANXC, TIGD6, C7ORF13, TMEM184A, F8A1, LZTS3, and OLMALINC. Notably, three of these (CAPS/calcyphosin, CCDC115/coiled-coil domain containing 115, ATP6AP1/ATPase H+ transporting accessory protein 1) are involved in ion transport and/or vacuolar ATPase (V-ATPase), and two (TMEM184A/Transmembrane protein 184A, F8A1/Coagulation Factor VIII Associated 1) are involved in vesicle transport. Together, these genes suggest a novel survival-critical role of Golgi trafficking in CRC and in CNA management. Two (SPANXC/SPANX family member C, and C7ORF13 [LINC01006]/long intergenic non-protein coding RNA 1006) are normally expressed in a testis-specific manner, and their expression in gastric cancers is associated with EMT, migration, and metastasis41,42,43. TIGD6 (Tigger Transposable Element derived 6) is a DNA-mediated transposon with similarity to the centromere component Cenp B. Based on the Cenp B homology, TIGD6 expression was suspected to interfere with mitotic fidelity and the structural integrity of the genome. However, no strong centromere binding of a TIGD6-EGFP fusion protein was observed, although binding on the chromosome arms and a low level of binding at centromeres were seen44. Thus, how TIGD6 affects genomic stability currently remains unclear.

The 16 CNA suppressor-survival critical genes were WARS, FOXD4L1, VWA5B2, DDB2, EPOR, ROBO3, PKIB, TMED6, APOBEC3D, B3GNT4, CLCN3, FOXD4, ZNF683, EP400P1 (also known as EP400NL), KLHDC7B, and MT1G. Among these, the involvement of EPOR (Erythropoietin receptor; involved in JAK2-MAPK/PI3K/STAT signaling), DDB2 (Damage specific DNA binding protein 2; involved in UV damage repair and xeroderma pigmentosum), ROBO3 (Roundabout guidance receptor 3; involved in migration or neurite outgrowth), and MT1G (Metallothionein 1G; involved in protection against oxidative stress and metals) in various cancers is well documented, with hundreds of publications. Three are transcription factors (FOXD4L1; Forkhead Box D4 Like 1, FOXD4; Forkhead Box D4, ZNF683; Zinc Finger Protein 683). Three are transmembrane proteins involved in trafficking (TMED6; Transmembrane p24 trafficking protein 6, B3GNT4; UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 4, CLCN3; Chloride voltage-gated channel 3). Three are immunomodulators (ZNF683, WARS; Tryptophanyl-tRNA synthetase 1, APOBEC3D; Apolipoprotein B mRNA editing enzyme catalytic subunit 3D). The APOBEC family enzymes are single-stranded DNA (ssDNA) cytosine-to-uracil (C-to-U) deaminases and are involved in HIV-1 restriction and in mutation generation in cancer. As such, APOBEC enzymes have been proposed as targets for virus and cancer therapy via hypomutation, and small molecule inhibitors are under development45. Four are involved in growth regulation (EPOR, PKIB, KLHDC7B, MT1G).

Next, we used tumor data to analyze expression alteration (“altered” vs. “not altered”; definition in the Methods section) and hazard ratio (HR), and tested whether expression alteration correlates with survival (see Methods for the benchmarks of HR magnitude34; generally, a medium-large HR is > 1.3). The correlations were categorized as (a) lower altered expression with improved survival, (b) higher altered expression with improved survival, (c) lower altered expression with decreased survival, and (d) higher altered expression with decreased survival (Fig. 4). From the standpoint of drug development, developing inhibitor(s) for genes in category (a) or (d) would be most feasible, while developing enhancer(s) of a gene or its function to target categories (b) or (c) remains difficult. For category (a), decreased TIGD6 or TMED6 expression was associated with improved survival (HR 1.16204E-07 [TMED6], 0.455 [TIGD6]) (Table 1; Fig. 4A). For category (b), higher altered expression of DDB2 (HR 2.86E-06), WARS (HR 0.788), or KLHDC7B (HR 0.881) was associated with improved survival (Fig. 4B). As DDB2, WARS, and KLHDC7B are assessed functionally as CNA suppressors, their increased expression may antagonize high genomic instability. For category (c), decreased MT1G (HR 2.478), CLCN3 (HR 3.564), or CAPS (HR 1.908) expression was associated with poorer survival (Fig. 4C). For category (d), higher altered expression of APOBEC3D (HR 4.55), EP400NL (HR 3.792), B3GNT4 (HR 2.354), ZNF683 (HR 1.957), FOXD4 (HR 1.788), FOXD4L1 (HR 1.426), or PKIB (HR 1.468) was associated with decreased survival (Fig. 4D). On the other hand, ROBO3 is a gene whose overexpression has been consistently observed in CRC, and its possible involvement in EMT and malignant progression has been reported46,47. Yet, overexpression of ROBO3 showed only small effects on survival in CRCs (HR 1.058). This finding suggests that the amount of ROBO3 expression alone may not be a strong indicator of benefit or disadvantage for survival in CRCs (Fig. 4E). Overall, this analysis identified nine potential target genes (medium-large HR [> 1.3]; TIGD6, TMED6, APOBEC3D, EP400NL, B3GNT4, ZNF683, FOXD4, FOXD4L1, PKIB) for inhibitor development, and four genes (DDB2, MT1G, CLCN3, CAPS) for enhancer development.

Figure 4

CNA facilitator/suppressor genes affecting patients’ survival (“survival-critical”). For 18 genes, expression levels correlate with both CNA and patients’ survival in CRC (i.e., “survival-critical” genes). The genes represent potential targets for drug development. There are four categories, as follows. (A) Lower altered expression with improved survival. For TMED6 and TIGD6, lower expression was associated with improved survival; thus, they are potential inhibitor development targets. Hazard ratio (HR) < 1 (i.e., decreased risk). “Altered” (red), “Not Altered” (green). (B) Higher altered expression with improved survival. For DDB2, WARS, and KLHDC7B, higher expression was associated with improved survival; thus, they are potential enhancer development targets. (C) Lower altered expression with decreased survival. For MT1G, CLCN3, and CAPS, lower expression was associated with decreased patients’ survival. For HR > 1, expression alterations increase risk. As benchmarks for the magnitude of HR, small, medium, and large HRs comparing two groups would be approximately 1.3, 1.9, and 2.8, respectively34. (D) Higher altered expression with decreased survival. For APOBEC3D, EP400NL, B3GNT4, ZNF683, FOXD4, FOXD4L1, and PKIB, higher expression was associated with decreased survival; thus, they are potential targets for inhibitors. (E) ROBO3 is consistently shown to be over-expressed in CRCs. This finding is corroborated by the present study. However, the impact of ROBO3 expression on patients’ survival in CRCs is small (not trivial, but possibly inconsequential), with HR 1.058. Figures were generated with cBioportal and with R v4.0.3.

Discussion

At the outset of this project, we anticipated that a similar profile between lung and colon would emerge and that a set of genomic instability genes common among cancers would be identified. This expectation was based on (a) pan-cancer analysis of oncogenes that indicated recurring sets of oncogenic pathways common among various cancers (e.g., KRAS, TP53), and (b) extrapolation from previous pan-cancer analysis of CNA-associated pathways7. However, the results were surprising: (a) over-expression of mitotic genes was less involved in generating genomic instability in the colon, and (b) the CNA-suppressing pathways, including immune surveillance, were only partly similar to those in the lung. The results suggest that the generation and suppression mechanisms of tumor genomic instability depend on the organ, and that therapeutic modalities targeting genomic instability must be tailored for the target organ.

Although CNA suppression pathways were only partly similar, common to lung and colon were the Antigen Presentation, Interferon Signaling, and Natural Killer Cell Signaling pathways, suggesting the presence of both common/non-organ specific and organ-specific immune components for genomic instability surveillance. This observation may extend to a basis for developing highly organ-specific cancer immuno-prevention or therapies.

This study identified RNA metabolism regulators (e.g., DDX27, PRPF6, SMG5) as influencers of genomic instability in CRC. A mechanistic link between RNA regulators and genomic instability had not been fully explained. Recently, in pancreatic cancer, mRNA regulators/RNA-binding splicing factors were identified as methylation targets of PRMT1 (Protein Arginine Methyl Transferase 1). Inhibition of the methylation with a specific inhibitor affects splice site selection and functional protein expression of the downstream targets. Many of the downstream target proteins, including Cyclin D, were cell cycle and proliferation regulators. Thus, PRMT1 inhibition indirectly caused growth-static effects and genomic instability48. We speculate that transcriptomic disturbance of RNA metabolism genes may affect genomic stability in CRC through a similar, indirect mechanism.

Supporting the validity of the GE-CNA approach, many of the identified pathways have also been identified in cancer (chemo)prevention and therapy studies, including apoptosis, redox signaling, JAK-STAT signaling, and inflammation pathways. The heme biosynthesis pathway, however, is under-investigated in cancer. As it is newly identified with this unbiased approach, further study is warranted. Regarding MODY signaling, the potential link between diabetes and cancer has been a subject of interest. Meta-analysis indicated that type 2 diabetes (T2D) was associated with the incidence of several cancers, especially prostate and liver cancer, and with mortality from pancreatic cancer. In bias analyses, the proportion of studies with a true effect size larger than an RR of 1.1 (i.e., 10% increased risk in individuals with T2D) was nearly 100% for liver, pancreatic, and endometrial cancer; 86% for gallbladder cancer; 67% for kidney cancer; 64% for colon cancer; and 62% for colorectal cancer49, indicating a modest level of positive association between CRC and diabetes. However, microsatellite instability was reported to be inversely associated with T2D in CRC50. The inverse association between diabetes and MIN-CRC is consistent with our identification of MODY signaling as a suppressor of amplification/insertion CNA, a MIN trait.

Other genes/pathways of interest include APOBEC3D (HR 4.6), due to its strong HR, and B3GNT4 (HR 2.4), due to its relation to mucin function. APOBEC3D encodes a double-domain deaminase and is a member of the APOBEC3 gene family51. APOBEC3 proteins form the Apolipoprotein B Editing Complex and mediate intrinsic responses to infection by retroviruses [e.g., HIV52], but can also act as a strong mutagenic factor53. In breast cancer, expression of APOBEC3B is increased and associated with mutation load and poor outcome, while high APOBEC3C-H expression was linked to a favorable prognosis for both cancer progression and mortality54. A recent study showed a causal relationship between APOBEC3B induction and DNA replication stress and CIN in early breast and lung cancer evolution55. Our results with APOBEC3D likely indicate a parallel with APOBEC3B in breast cancer, that is, a mutagenic activity of APOBEC3D in CRCs, and suggest a survival benefit from a specific inhibitor of APOBEC3D.

B3GNT4 is a member of the B3GNT family of transmembrane Golgi enzymes that catalyze the transfer of N-acetylglucosamine from UDP-GlcNAc onto Gal beta 3 (GlcNAc beta 6) GalNAc-mucin. These enzymes function in the elongation and branching of O-linked oligosaccharide chains of mucin glycoproteins, and thus in the complete functional maturation of mucins. Mucins play pivotal mucosal barrier functions in the intestine, and their dysfunction is associated with colitis and CRC56,57. However, only limited reports portray the importance of mucin maturation enzymes or their value in cancer drug development58. B3GNT3 was reported as a novel marker correlated with metastasis and poor clinical outcome in cervical cancer59, but to our knowledge this is the first report of potential clinical significance for B3GNT4 in cancers.

Overall, the present study identified genomic instability genes via transcriptomic alterations in CRC, providing an unbiased portrait of genes that may or may not have been identified through previous hypothesis-driven studies. Indeed, this study identified CIN and MIN genes as predicted, as well as a number of genes whose mechanism of generating genomic instability is yet to be investigated. The new results from CRC allow us to compare the profile with that of lung adenocarcinoma. The comparison indicated organ specificity in the genes influencing tumor genomic instability and suggests the value of a tailored approach for targeting genomic instability. We identified nine genes whose inhibition may lead to better survival (HR > 1.3; TIGD6, TMED6, APOBEC3D, EP400NL, B3GNT4, ZNF683, FOXD4, FOXD4L1, PKIB) and four genes for which an enhancer may benefit CRC patients’ survival (DDB2, MT1G, CLCN3, CAPS) via genomic instability modulation. These 13 genes with potential clinical relevance carry diverse functions, thus implicating multiple pathways leading to genomic instability rather than a single central network. With promising target genes identified, further drug development is warranted.