Centromere and kinetochore gene misexpression predicts cancer patient survival and response to radiotherapy and chemotherapy

Chromosomal instability (CIN) is a hallmark of cancer that contributes to tumour heterogeneity and other malignant properties. Aberrant centromere and kinetochore function causes CIN through chromosome missegregation, leading to aneuploidy, rearrangements and micronucleus formation. Here we develop a Centromere and kinetochore gene Expression Score (CES) signature that quantifies the centromere and kinetochore gene misexpression in cancers. High CES values correlate with increased levels of genomic instability and several specific adverse tumour properties, and prognosticate poor patient survival for breast and lung cancers, especially early-stage tumours. They also signify high levels of genomic instability that sensitize cancer cells to additional genotoxicity. Thus, the CES signature forecasts patient response to adjuvant chemotherapy or radiotherapy. Our results demonstrate the prognostic and predictive power of the CES, suggest a role for centromere misregulation in cancer progression, and support the idea that tumours with extremely high CIN are less tolerant to specific genotoxic therapies.


Supplementary Note 1. Misregulation of a subset of CEN/KT genes in cancers.
To better understand CEN/KT gene misregulation in cancers, we analyzed TCGA RNA-seq data for different types of cancer. A recent study demonstrated a strong correlation between the FoxM1 transcription factor and kinetochore gene expression, and proposed that CEN/KT genes are simultaneously up-regulated by FoxM1 in cancers [1]. Consistent with this observation, our gene expression correlation network analyses detected strong correlations among many CEN/KT genes in diverse cancer types (Spearman's rho, r_s = 0.4, p<0.05) (Supplementary Fig. 2). However, the number of genes and the correlation coefficients in this network vary greatly among cancers, suggesting that the strength of gene expression correlation differs substantially within and between cancer types. For example, in bladder, cervical and uterine cancers, this sub-network contains far fewer components and significant correlations than in acute myeloid leukemia (AML), lung adenocarcinoma or lower grade brain cancer (Supplementary Fig. 2). We conclude that overexpression of CEN/KT genes can be coordinated by FoxM1 and other factors, but that regulatory relationships differ significantly among cancer types and even among individuals within the same type.
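As an illustration of the correlation network analysis described in this note, the following minimal sketch computes pairwise Spearman correlations between genes and keeps the pairs above a threshold as network edges. It is not the pipeline used for Supplementary Fig. 2: the gene names and expression values are hypothetical toy data, treating 0.4 as a minimum rho is an assumption, and the p<0.05 filter is omitted for brevity.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def correlation_network(expression, min_rho=0.4):
    """Return {(gene1, gene2): rho} for gene pairs with rho >= min_rho.

    min_rho=0.4 is an assumed reading of the threshold quoted in the text.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        rho = spearman(expression[g1], expression[g2])
        if rho >= min_rho:
            edges[(g1, g2)] = rho
    return edges

# Hypothetical expression values for three genes across six samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [1.2, 1.9, 3.5, 3.9, 5.5, 5.8],
    "GENE_C": [6.0, 1.0, 4.0, 2.0, 5.0, 3.0],
}
network = correlation_network(expression)
```

In this toy input only the GENE_A/GENE_B pair is retained as an edge. Counting edges and nodes per cancer type in this way is one simple means of comparing network size between cohorts.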

Supplementary Note 2. A subset of CEN/KT genes have significant prognostic value in multiple cancers.
We tested whether misregulation of individual CEN/KT genes has prognostic value for cancer patients by performing meta-analyses on expression microarray datasets for multiple cancer types. These analyses were first performed on >3,000 human breast cancer clinical samples using BC-GenExMiner 3.0 [2], and then on breast, lung, gastric and ovarian cancers using the K-M Plotter database [3]. For breast cancers analyzed with BC-GenExMiner, overexpression of 22 individual CEN/KT genes and reduced expression of CENP-C are significantly associated with poor any event (AE)-free survival (p<0.05) and poor metastatic relapse-free survival (MRFS) (p<0.05) (Supplementary Table 3). Eleven of the identified genes (CENP-A, -C, -N, -H, -I, -M, -K, -L, HJURP, MIS18A and MIS18B) are required for new CENP-A assembly, implying an important role in breast cancer progression. Notably, misexpression of nine essential CEN/KT genes (CENP-T, -S, -P, -Q, -R, M18BP1, PMF1, MIS12 and NSL1), including M18BP1, which is known to be essential for CENP-A assembly, showed no overall prognostic value in the meta-analysis using BC-GenExMiner. Analysis using the K-M Plotter database identified many of the same genes (Supplementary Table 4) [3]. We conclude that the prognostic value of individual CEN/KT gene misexpression can vary, even among genes whose functions are intimately related, suggesting distinct roles and regulation in cancer progression.
Moreover, we analyzed the prognostic value of CEN/KT gene expression for overall survival and disease progression in over 1,600 lung cancer patients, over 350 gastric cancer patients, and a smaller number (n<150) of stage I and stage II ovarian cancer patients, using K-M Plotter [3]. We identified 20 CEN/KT genes whose misexpression impacts lung cancer prognosis when up- or down-regulated (p<0.05) (Supplementary Table 8). These results suggest that expression levels of many CEN/KT genes are effective predictors of breast, lung, gastric, and early stage ovarian cancer prognosis.
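The single-gene survival comparisons described in this note can be illustrated with a minimal Kaplan-Meier sketch. It does not reproduce BC-GenExMiner or K-M Plotter: the patient values are hypothetical, the median expression cutoff is an assumption (the cutoffs used in the meta-analyses are not stated here), and no significance test is included.

```python
from statistics import median

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival_probability)].

    times  : follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    One point is returned per distinct time at which an event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored patients leave the risk set
    return curve

# Hypothetical toy cohort: expression of one gene, follow-up in months,
# and whether death was observed (1) or the patient was censored (0).
expression = [2.1, 8.4, 3.3, 7.9, 1.5, 9.2, 2.8, 6.7]
months = [60, 12, 48, 20, 72, 8, 55, 30]
died = [0, 1, 1, 1, 0, 1, 0, 0]

# Assumed cutoff: split patients at the median expression value.
cutoff = median(expression)
high = [(t, e) for x, t, e in zip(expression, months, died) if x > cutoff]
low = [(t, e) for x, t, e in zip(expression, months, died) if x <= cutoff]

high_curve = kaplan_meier(*zip(*high))
low_curve = kaplan_meier(*zip(*low))
```

In this toy cohort the high-expression curve drops at 8, 12 and 20 months while the low-expression curve drops only once, at 48 months. A real analysis would add a log-rank test or Cox regression to assess whether such a difference is significant.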

Supplementary Note 3. CES signature in breast cancer ILCs.
We examined breast invasive lobular carcinoma (ILC) using the TCGA breast adenocarcinoma dataset. Briefly, we found that ILCs have significantly lower CES than IDCs, and detected significant correlation between CES and both fraction of CNA and mutation frequency within the ILC subcohort (Supplementary Fig. 6A and Supplementary Table 10). ILCs are predominantly luminal A subtype [4], which has the lowest average CES among all molecular subtypes (Supplementary Fig. 5). Because most ILCs belong to luminal A subtype (65/77), we also compared ILCs and IDCs within luminal A subtype. There is no significant difference in CES values between IDCs and ILCs within luminal A subtype (Supplementary Fig. 6B). Within ILCs, we did not detect significant association between high CES and any particular ILC subtype (Supplementary Fig. 6C). We also detected significant correlations between CES and mutation frequency and fraction of CNA for both ILCs and IDCs within luminal A subtype (Supplementary Table 10).

Supplementary Note 4. Prognostic performance of the CES for TCGA datasets.
We evaluated the prognostic value of the CES signature using TCGA breast adenocarcinoma and lung cancer datasets (Supplementary Tables 12 and 13). For breast cancer, we observed a significant difference in overall survival across CES tertiles, but the low CES group appears to have worse survival than the intermediate CES group, and has similar survival to the high CES group (Supplementary Fig. 23A). We note that the TCGA dataset at this time suffers from very short follow-up times (median follow-up is 1.8 years for overall survival, Supplementary Table 12). This problem significantly affects survival analyses, since most breast cancer patients are expected to live longer than 5-10 years after initial diagnosis under the current standards of care. Indeed, even though we detected highly significant differences in overall survival across PAM50 molecular subtypes (Supplementary Fig. 23B), Kaplan-Meier graph and Cox regression analysis on PAM50 subtypes did not show a significant difference even between basal-like and luminal A subtypes (Supplementary Fig. 23B and Supplementary Table 28), indicating that short follow-up or other problems affect the dataset even when it is tested against a well-established marker.
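The tertile grouping and the follow-up concern raised above can be sketched as follows. This is an illustration only: the CES values and follow-up times are hypothetical, assigning tertiles by rank position is an assumption about how the groups were formed, and the plain median of follow-up times is a simplification of how median follow-up may be estimated.

```python
from statistics import median

def assign_tertiles(scores):
    """Label each score 'low', 'intermediate' or 'high' by rank position.

    Assumption: tertiles are formed by sorting the scores and cutting the
    ordered list into three equal-sized groups.
    """
    names = ("low", "intermediate", "high")
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for position, idx in enumerate(order):
        labels[idx] = names[position * 3 // n]
    return labels

# Hypothetical toy cohort: one CES value and follow-up time per patient.
ces = [10.2, 14.8, 12.1, 9.5, 15.3, 11.7, 13.0, 16.1, 8.9]
follow_up_years = [1.2, 2.5, 0.8, 3.1, 1.6, 0.5, 2.0, 1.1, 4.0]

labels = assign_tertiles(ces)

# Median follow-up for the whole cohort and for each CES tertile.
overall_median = median(follow_up_years)
median_by_group = {
    name: median(t for t, g in zip(follow_up_years, labels) if g == name)
    for name in ("low", "intermediate", "high")
}

# Share of patients followed for at least 5 years, the lower end of the
# survival range quoted in the text for breast cancer patients.
share_long = sum(t >= 5 for t in follow_up_years) / len(follow_up_years)
```

When the share of patients followed for five years or more is close to zero, as in this toy cohort, few events have been observed and survival differences between tertiles are hard to detect regardless of the marker tested.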
For the TCGA lung ADC dataset, CES is a significant prognostic factor in both Kaplan-Meier survival analysis and multivariate Cox regression (Supplementary Fig. 23C and Supplementary Table 29), even though the dataset also has short follow-up times (Supplementary Table 13). This is probably because lung ADC patients have significantly shorter median survival after initial diagnosis than breast cancer patients. Because high CES is also a predictive marker for better response to adjuvant chemotherapy for lung cancer patients (Figure 7), we removed all samples treated with chemotherapy before we analyzed the prognostic value of the CES signature.
For the TCGA lung SCC dataset, the CES signature does not significantly prognosticate overall survival (Supplementary Fig. 23D), similar to the result from the meta-analysis (Figure 5C and Supplementary Fig. 16C). However, the CES signature is not only a prognostic marker but also a predictive marker for lung cancer patient outcome after adjuvant chemotherapy or radiotherapy. As we pointed out earlier, it is likely that adjuvant chemotherapy improved survival for high CES patients in the dataset. Unfortunately, the TCGA lung SCC dataset at this time does not provide chemotherapy information (Supplementary Table 13), so we cannot address this issue.
In summary, the CES signature shows significant prognostic value for the TCGA breast cancer and lung ADC datasets, but not for the lung SCC dataset. However, more careful analyses raised concerns about short follow-up times for the breast cancer dataset and the lack of treatment information for the lung SCC dataset.