Proteomics-derived basal biomarker DNA-PKcs is associated with intrinsic subtype and long-term clinical outcomes in breast cancer

Precise biomarkers are needed to guide better diagnostics and therapeutics for basal-like breast cancer, for which DNA-dependent protein kinase catalytic subunit (DNA-PKcs) has been recently reported by the Clinical Proteomic Tumor Analysis Consortium as the most specific biomarker. We evaluated DNA-PKcs expression in clinically-annotated breast cancer tissue microarrays and correlated results with immune biomarkers (training set: n = 300; validation set: n = 2401). Following a pre-specified study design per REMARK criteria, we found that high expression of DNA-PKcs was significantly associated with stromal and CD8+ tumor-infiltrating lymphocytes. Within the basal-like subtype, tumors with low DNA-PKcs and high tumor-infiltrating lymphocytes displayed the most favourable survival. DNA-PKcs expression by immunohistochemistry identified estrogen receptor-positive cases with a basal-like gene expression subtype. Non-silent mutations in PRKDC were significantly associated with poor outcomes. Integrating DNA-PKcs expression with validated immune biomarkers could guide patient selection for DNA-PKcs targeting strategies, DNA-damaging agents, and their combination with immune checkpoint blockade.


INTRODUCTION
While gene expression profiling has refined breast cancer prognosis and helped guide treatment choices 1-3 , few advancements have been made in identifying practical biomarkers that can aid in tailoring treatments for the aggressive basal-like intrinsic subtype of breast cancer [4][5][6] .
The gene expression-defined basal-like breast cancer subtype is currently clinically approximated by triple-negative immunohistochemical (IHC) status, characterized by combined negativity for estrogen receptor (ER), progesterone receptor, and human epidermal growth factor receptor-2 (Her2). However, this IHC definition identifies a group with a heterogeneous biology 7-9 that consists of at least four major molecular subgroups termed basal immune activated, basal immune suppressed, mesenchymal and luminal androgen receptor 10,11 . These subgroups have been repeatedly shown to differ in their clinical outcomes and exhibit a complex repertoire of somatic mutations, highlighting the complexity of guiding therapeutic choices for triple-negative breast cancers, including those with basal-like molecular biology 7,10 .
In an attempt to identify improved diagnostic tools and therapeutic options for this aggressive group of cancers, more precise basal biomarkers have been recently proposed based on new proteomic profiling data. A mass spectrometry-based analysis performed by the Clinical Proteomic Tumor Analysis Consortium group using fresh frozen materials from 122 TCGA breast cancer specimens reported DNA-dependent protein kinase catalytic subunit (DNA-PKcs) to be the most specific biomarker for the basal-like subtype 12,13 .
DNA-PKcs, encoded by PRKDC, is a member of the phosphatidylinositol 3-kinase-related family of protein kinases that plays a critical role in the cellular response to DNA damage and the repair of double-strand breaks 14 . In response to DNA damage, the catalytic subunit, DNA-PKcs, is recruited to the double-strand break site, where it binds the Ku70/Ku80 heterodimer to form the DNA-PK serine/threonine-protein kinase complex 15 . This complex plays an important role in DNA damage response (DDR) and maintenance of genomic stability through the nonhomologous end-joining DNA repair pathway 16 . Once bound, DNA-PKcs phosphorylates and coordinates the activation of other proteins that mediate nonhomologous end-joining DNA repair 17,18 . Recently, DNA-PKcs has been proposed as an actionable therapeutic target for DNA damage in breast and several other tumor types [19][20][21][22] , with DNA-PKcs inhibitors being actively assessed in clinical trials 14,23 . These findings, along with recent evidence supporting the efficacy of DNA-PKcs inhibition in preclinical models, support the development of DNA-PKcs targeting strategies in breast cancer 24 .
In the context of triple-negative breast cancer heterogeneity, PRKDC has been reported to be highly expressed in the basal immune activated and basal immune-suppressed molecular subgroups, while being depleted in the luminal androgen receptor and mesenchymal breast carcinomas 11 . DNA damage and double-strand break repair pathways have been shown to be specifically upregulated in basal-like breast cancers due to their high aberrant activation resulting from the DDR deficits, high mutational load, and genomic instability that characterize these tumors 8,25,26 .
In this study, we evaluated the prognostic capacity of the basal biomarker DNA-PKcs on a large tissue microarray series representing early-stage breast cancer patients. Following a prespecified study design, we tested the hypothesis that high expression of DNA-PKcs identifies cases with basal-like features and poor clinical outcomes. We further explored the value of combining DNA-PKcs IHC assessment with key immune biomarkers in the context of basal-like heterogeneity. In addition, we investigated the utility of DNA-PKcs as a basal biomarker in ER-positive breast cancers, correlating results with biological intrinsic gene expression subtype and genomic data.

RESULTS
DNA-PKcs expression is associated with basal-like characteristics, adverse clinicopathological features and poor survival in the UBC series
A total of 300 cases were evaluable for DNA-PKcs expression by immunohistochemistry (IHC) in the UBC series. Representative images of IHC expression of DNA-PKcs are displayed in Fig. 1. High expression of DNA-PKcs was found in 20.3% (n = 67) of cases and was associated with features of aggressive disease including grade 3 histology, lymphovascular invasion, ER negativity, EGFR expression, CK5/6 expression, high proliferation index (Ki-67 ≥ 14%), and triple-negative status (Supplementary Table 1). DNA-PKcs expression was significantly higher in cases with the IHC core basal phenotype (defined as ER-negative, progesterone receptor-negative, Her2-negative, and [EGFR+ or CK5+]) 27 compared to non-core basal cases (Fig. 2a and Supplementary Table 1). Additionally, DNA-PKcs expression was associated with high numbers of cytotoxic T-cells (CD8+ iTILs, Supplementary Table 1). When matching IHC expression data for DNA-PKcs with CD8+ iTILs on the same tissue core, a weak but significant correlation was observed (Fig. 2b).
The median follow-up for the UBC cohort was 12.7 years; tumors with high expression of DNA-PKcs were found to be significantly associated with lower breast cancer-specific survival (HR 2.04, 95% CI 1.19-3.52, p = 0.01) when compared to cases with low DNA-PKcs (Fig. 2c).
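As a reader-side sanity check, a reported hazard ratio and its 95% confidence interval can be converted back into an approximate standard error, z statistic, and two-sided p-value. The sketch below assumes a Wald-type interval that is symmetric on the log scale; the function name `wald_p_from_hr_ci` is hypothetical, and the published p-values may come from a different test (for example log-rank), so only rough agreement should be expected.

```python
import math


def wald_p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE, z and two-sided p from a hazard ratio and its 95% CI.

    Assumption: the interval is exp(log(HR) +/- z_crit * SE), i.e. a Wald
    interval on the log-hazard scale. Rounding in the reported numbers
    limits the precision of the recovered values.
    """
    if not (0 < lower < hr < upper):
        raise ValueError("expected 0 < lower < hr < upper")
    log_hr = math.log(hr)
    # Width of the interval on the log scale equals 2 * z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided tail probability of a standard normal variable.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# UBC series values quoted above: HR 2.04, 95% CI 1.19-3.52.
se, z, p = wald_p_from_hr_ci(2.04, 1.19, 3.52)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

For the UBC values this back-calculation gives a p-value of approximately 0.01, in line with the reported figure.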
Validation of the prognostic significance of DNA-PKcs in the BC Cancer series
Observations from the UBC series were next validated in the larger, independent BC Cancer series (Supplementary Data 1), in which the mean age at diagnosis was 58.9 years and the median duration of follow-up was 12.5 years (Table 1).
Of the primary tumor samples, 2401 were interpretable for DNA-PKcs immunostaining. The original version of the BC Cancer series TMA had 3992 cases 28,29 , but since that time many cores have been cut through and source blocks exhausted, such that interpretable data for the current study could be generated for 2401 cases. Among these, high expression of DNA-PKcs was observed in 25.7% (618/2401 cases) (Table 1). A significant association was observed between tumors with high expression of DNA-PKcs and adverse pathological features including grade 3 histology, high Ki-67 proliferation index (defined as ≥14%), hormone receptor negativity, Her2 positivity, expression of the basal biomarkers CK5/6 and EGFR, and a triple-negative phenotype (p < 0.001) (Table 1). In addition, DNA-PKcs expression was found to be significantly associated with core basal tumors (p < 0.001) (Fig. 3a; Table 1). Using prespecified criteria and the scoring methodology as published by others 30 , we analyzed the correlation of DNA-PKcs expression with infiltrating lymphocytes (stromal H&E sTILs and CD8+ iTILs). We found that tumors categorized with high DNA-PKcs expression were highly significantly associated with the expression of these immune biomarkers (p < 0.001) (Table 1). Further assessment of DNA-PKcs expression revealed significantly higher scores in cases characterized by high H&E sTILs and CD8+ iTILs (Fig. 3b).
We next evaluated the prognostic significance of DNA-PKcs expression in the BC Cancer series, and confirmed that cases with high DNA-PKcs expression are associated with poor breast cancer-specific survival (BCSS) (HR 1.38, 95% CI 1.17-1.62; p < 0.001) (Fig. 3c; Supplementary Fig. 1a-1b).
Next, we performed multivariate analysis using a Cox proportional hazards model to assess the independent prognostic relevance of DNA-PKcs expression adjusted for clinicopathological variables (age, tumor size, histological grade, axillary lymph node status), breast cancer subtypes, and systemic treatments. High expression of DNA-PKcs remained an independent prognostic factor indicative of poor BCSS (HR 1.33, 95% CI 1.10-1.60; p = 0.002) (Table 2).
Prognostic stratification of basal cases based on the combination of DNA-PKcs and immune biomarkers
The core basal phenotype defined by a 5-biomarker immunopanel (ER-negative, progesterone receptor-negative, Her2-negative, and [EGFR+ or CK5+]) has been previously shown to more specifically identify cases with the basal-like gene expression subtype 27 and to provide superior prognostic information when compared to an IHC definition that is based simply on triple-negative expression for the estrogen, progesterone and Her2 receptors 31 . Thus, we specifically examined the prognostic significance of DNA-PKcs expression in the IHC-based core basal (vs non-core basal) subtype and found that tumors characterized by both high expression of DNA-PKcs and a core basal phenotype displayed the worst BCSS (HR 1.66, 95% CI 1.23-2.22; p = 0.001) compared to other groups (Fig. 3d).
Given that PRKDC, the gene encoding DNA-PKcs, is characteristic of both the basal immune activated and basal immune-suppressed RNA-based subgroups of triple-negative breast cancer 11 , we investigated the prognostic significance of the combination of key immune biomarkers (sTILs and CD8+ iTILs) and DNA-PKcs expression status within the core basal subtype. We found that low DNA-PKcs expression concurrent with the presence of stromal TILs correlated with superior survival in the core basal tumors (HR 0.42, 95% CI 0.22-0.78; p = 0.005) (Fig. 4a). Similar results were observed when we used ≥30% as the cutpoint for defining high levels of H&E sTILs, a value used by others in recently published studies 32,33 (Supplementary Fig. 2).
The cytotoxic T-cell subset showed an even more marked association with good prognosis: cases with low DNA-PKcs that had high levels of CD8+ iTILs were associated with a significantly better BCSS (HR 0.26, 95% CI 0.13-0.55; p < 0.001) (Fig. 4b), defining a group of patients with disease-specific survival better than 80% even 15 years after being diagnosed with triple-negative breast cancer.
DNA-PKcs and mRNA PRKDC expression are associated with PAM50 intrinsic subtype and poor clinical outcomes
To date, successful basal biomarkers that have been validated against gold-standard gene expression assays are mostly limited to the triple-negative breast cancer setting 31 with very few applicable in the context of ER positivity 34,35 . However, a subset of tumors with basal-like gene expression that is ER-positive has been documented in the literature [36][37][38] . Thus, we aimed to assess the value of DNA-PKcs as a basal marker on datasets with gene expression profile data that include ER-positive cases. We tested the association between DNA-PKcs IHC expression and PAM50 intrinsic subtype on a set of 825 cases in the BC Cancer series previously profiled by quantitative reverse transcription-polymerase chain reaction for PAM50 gene expression 39 . The majority of these cases corresponded to clinically ER+ patients that were treated with adjuvant tamoxifen 39 ; a total of 571 had available data for both mRNA PAM50 intrinsic subtype and DNA-PKcs expression by IHC. Basal-like PAM50 tumors were characterized by higher IHC scores for DNA-PKcs expression when compared to the other PAM50 subtypes (p < 0.001) (Fig. 5).
We next assessed the expression of the DNA-PKcs gene (PRKDC) at the transcriptomic level using data from the TCGA invasive breast cancer cohort 8 (Fig. 6a). Higher PRKDC expression was significantly associated with basal-like PAM50 subtype and ER negativity (Fig. 6b, c). In addition, high PRKDC expression was also associated with the basal-like gene signature within ER+ tumors in the TCGA cohort (Supplementary Fig. 3). We further validated the association between PRKDC expression and the basal-like PAM50 subtype using data obtained from a contemporary collection of primary breast cancer tissues from women enrolled in the SCAN-B trial 40 (NCT02306096) (Fig. 6d). High PRKDC was significantly associated with poor disease-free survival rates in the TCGA cohort (Fig. 6e) and when applying KMplotter to 35 publicly available Gene Expression Omnibus datasets 41 (Fig. 6f). Taken together, our results show that both DNA-PKcs protein and PRKDC transcript are biomarkers that help identify basal-like cases within both ER+ and ER− breast cancers.
Fig. 1 Representative images for nuclear expression of DNA-PKcs by immunohistochemistry. A case with negative staining for DNA-PKcs (IHC score = 0) is shown in (a), a case with 10% positivity and weak intensity (IHC score = 2) is shown in (b), a case with 40% positivity and moderate intensity (IHC score = 6) is shown in (c), and a case with 90% positivity and strong intensity (IHC score = 12) is shown in (d). The images were acquired at 20× objective magnification (200× original magnification) for the tissue microarray cores. Scale bar of 100 µm is shown. Abbreviations: IHC, immunohistochemistry.
PRKDC non-silent somatic mutations are associated with poor clinical outcomes
We next aimed to correlate our findings with somatic mutations in the PRKDC gene encoding DNA-PKcs. The somatic mutations previously published from a subset of 640 tamoxifen-treated, clinically ER+ patients from the BC Cancer series were used in this analysis 42 (Supplementary Data 1). Among those cases, 420 also had data for DNA-PKcs expression by IHC generated for the current study, of which 16 had non-silent and 8 had silent mutations in PRKDC, whereas 396 were wild type. The majority of non-silent mutations were missense (14/16), with 1 additional nonsense and 1 frameshift (Fig. 7a). Four of the 14 missense mutations with available data for IHC DNA-PKcs expression were predicted to be damaging to the protein function using the "Mutation Assessor" tool (Supplementary Data 1). When testing the association between mutation status and DNA-PKcs expression by IHC, the two cases with truncating mutations were negative for IHC expression (Fig. 7a). In addition, the majority of cases with missense mutations displayed low expression for DNA-PKcs by IHC. A comparison of DNA-PKcs IHC expression between wild type vs. non-silent mutated cases was insufficiently powered to observe a significant association due to the small number of mutated cases (Fig. 7b).
We further investigated the prognostic implications of PRKDC somatic mutations in this cohort and found that cases classified as having non-silent mutations in PRKDC exhibited significantly poorer clinical outcomes when compared to cases characterized with wild-type PRKDC (HR 2.21, 95% CI 1.08-4.53; p = 0.03) (Fig. 7c). The TCGA dataset showed non-silent somatic mutations in PRKDC in 8 of 818 cases, which despite limited power showed a similarly significant adverse prognostic association (Fig. 7d). Amongst ER-negative cases, only 2 of 179 in the TCGA cohort and 6 of 233 in the SCAN-B cohort had PRKDC somatic mutations, numbers too small for meaningful survival analyses.

DISCUSSION
In this study, we evaluated the prognostic capacity of a basal subtype biomarker, DNA-PKcs, derived from a published high-quality comprehensive proteomic profiling study on TCGA breast cancer samples 12 . Following prespecified study design and methodology adhering to REMARK criteria on both training and validation cohorts 43 , we demonstrate, using large cohorts of clinically-annotated breast cancer cases, that IHC expression of DNA-PKcs is associated with the basal-like subtype, high-risk clinicopathological factors, and poor prognosis. These findings are consistent with previous reports showing that high expression of PRKDC is associated with poor clinicopathological features and clinical outcomes in breast cancer at the transcriptomic level 19,44 .
The association of high expression of DNA-PKcs with poor clinical outcomes in our cohort was more evident within ER− cases than within ER+ cases. These findings might be explained by preclinical studies showing that ER signaling regulates DNA damage response targets including DNA-PKcs and ATM, with the majority of ER+ tumors displaying relatively low protein expression of DNA-PKcs 45 . In addition, DNA damage processes are particularly characteristic of basal tumors, when compared to ER+ tumors that have less genomic instability 8,25,26 , thus consistent with our observation of low overall protein expression of DNA-PKcs within the majority of ER+ when compared to ER− breast cancers. Interestingly, a dual role for DNA-PKcs has been further suggested in preclinical models, as a tumor suppressor in premalignant stages maintaining genome integrity; while in an aggressive and advanced stage, DNA-PKcs could indicate high genomic instability, thus acting as an oncogenic driver 23 .
A previous study by the Nottingham breast cancer group reported that IHC expression of DNA-PKcs was significantly associated with good clinical outcomes in breast cancer 46 . These observations were mainly seen in the ER+ subgroup and, as the authors noted, are contradictory to the preponderance of the preclinical literature showing that DNA-PKcs phosphorylates and stabilizes ER and hence that low levels of DNA-PKcs would be expected to contribute to a reduced ER signaling resulting in less aggressive ER+ tumors [47][48][49][50] . Furthermore, the authors noted the discordance of their outcome associations from those reported in other transcriptomic studies in breast cancer 19,44 and several other tumors 20,21,23,51 . The apparent discordance with our current study and other transcriptomic studies [19][20][21]23,44,51 might be because the Nottingham study applied data-driven cutpoints to maximize outcome differences in their data set, in contrast to our study that applied a prespecified externally-validated cutpoint first on a breast cancer training set and then on a larger independent validation set. The discordant findings might also be explained by the complexity of DNA-PKcs expression due to changes in protein post-translational modifications that are involved in the DNA damage repair process [52][53][54] and could affect the role of ER signaling in regulating DDR targets including DNA-PKcs 45 .
In our study, we demonstrated the capacity of DNA-PKcs as a basal biomarker that is applicable even in the setting of ER positivity, validated its association with the basal-like PAM50 subtype and correlated results with PRKDC mutational status in a large subset of ER+ cases. Within the clinically ER+ group, cases with low DNA-PKcs expression were luminal by PAM50 gene expression while those with high DNA-PKcs profiled as basal-like.
Profiling of the ER+ cases for mutations in the PRKDC gene further showed that non-silent mutations correlated with poor survival. Interestingly, the original study 42 that performed targeted sequencing on the ER+ subset of cases included in this study reported non-silent somatic mutations in PRKDC to be one of the topmost significant poor outcome drivers in ER+ cases. These mutations have been further reported to be associated with downregulated ATM levels 55 , potentially driving resistance to endocrine therapy 42 .
In support of our findings, mutations in PRKDC have been previously implicated in breast cancer initiation and progression 44,56 . In our study, the majority of non-silent mutations (including both missense and truncating) resulted in a lower DNA-PKcs protein expression. The majority of PRKDC missense mutations we identified would result in either impaired function or a lower expression of the DNA-PKcs protein. While only a third of missense mutations were predicted to be damaging, the majority of other "likely benign" mutated cases still correlated with poor disease-specific survival. These findings suggest that these mutated cases, being defective for DNA damage repair, are impaired in their capacity to maintain genomic stability and consequently evolve to behave aggressively. PRKDC mutant breast tumors (including those bearing loss-of-function mutations) are characterized by high mutational load and genomic instability 56 , suggesting that these tumors should correlate with poor prognosis regardless of DNA-PKcs protein levels. Since the fraction of mutated PRKDC tumors in breast cancer is very low (1%), our main finding that low expression of DNA-PKcs correlates with good survival is driven by wild-type tumors. In this context, DNA-PKcs IHC expression and PRKDC mutation should be considered in combination to define a specific subgroup with better prognostication. In relation to treatment, the expression of DNA-PKcs has been reported to drive resistance to chemotherapy and radiotherapy in preclinical models 19,57 , whereas inhibition of DNA-PKcs has been shown to sensitize breast cancer cells to these treatment modalities 22,24 .
DNA-PKcs is a key regulator for maintaining genomic integrity, cell cycle, and DNA repair through forming complexes with Ku70/80 to mediate DDR 16,18 . However, its aberrant high expression in tumors could be indicative of inherent DDR deficits and high genomic instability that drive the resistance of these tumors to chemoradiotherapy. High DNA-PKcs levels have also been shown to be induced after chemotherapy or radiotherapy treatments in a manner not dependent on endogenous levels of DNA damage, but rather on drug-induced levels precipitated by damaging agents [57][58][59][60] . Thus, the prognostic and predictive capacity of DNA-PKcs may be best assessed on specimens taken from primary tumors prior to systemic chemotherapy and/or metastatic disease.
In the context of basal-like breast cancer heterogeneity, our study shows that tumors exhibiting the core basal-like phenotype had a higher expression of DNA-PKcs when compared to tumors characterized as non-core basal. Furthermore, we found a significant association between high DNA-PKcs and high numbers of TILs and cytotoxic CD8+ lymphocytes. These findings might be explained by the high genomic instability that characterizes basal tumors with high DNA-PKcs expression, resulting in an accumulation of genetic alterations and a consequent high mutational load 61 that could lead to neoantigen production, inducing an immunogenic antitumor response (basal-like immune hot phenotype). While the process of neoantigen production is an established consequence of genomic alterations, the immunogenicity of these tumors is highly dependent on the preexisting inflammatory milieu of the host 62 and thus the basal subset exhibiting high expression for DNA-PKcs, but low expression of sTILs and CD8+ iTILs, represents a "basal immune cold" subset. This basal subset exhibited the worst survival in our study.
To date, the major success made in treating triple-negative breast cancer patients, with PARP inhibitors targeting the homologous recombination pathway, has been mainly limited to a small fraction of patients who harbor BRCA mutations or homologous recombination pathway defects 4,63,64 . However, limited data exist on the role of PARP inhibitors across the diverse subsets of triple-negative, basal-like and sporadic homologous recombination-deficient breast cancers 64,65 .
DNA-PKcs, as one of the key proteins involved in the DDR, could aid in matching basal patients to DNA-PKcs targeting strategies, DNA-damaging agents, or PARP inhibitors 24 . Specifically, since PARP inhibitors induce the nonhomologous end-joining process among homologous recombination-deficient tumors, DNA-PKcs could represent a promising therapeutic target in this particular setting 66 . Furthermore, with the contribution of the DNA damage process to high mutational load and antigenicity 67 , neoantigen production could be further increased as a result of mutations induced by DNA-damaging agents. It has been shown that in response to DNA-damaging chemotherapy, DDR can promote signalling pathways resulting in a release of proinflammatory cytokines including type I interferon and nuclear factor-κB 68 .
Our study has several limitations. While our primary hypothesis has been tested on large cohorts of patients following a prespecified design and scoring methodology, yielding powered positive results, future studies using samples from clinical trials are critical to establishing the capacity of DNA-PKcs as a biomarker to predict benefit from DNA-damaging agents, PARP inhibitors and/or immunotherapies among basal breast cancer patients. Additionally, the prognostic capacity of DNA-PKcs in the context of basal immune heterogeneity was based on evaluating sTILs and CD8+ iTILs in these tumors. Given the contribution of many cell populations, protein components, and their cross-talk to form an effective anti-tumor immune response, the enumeration of cytotoxic T cells in tumors is insufficient to characterize complex immune distinctions. Furthermore, our study included pretreatment specimens from early-stage breast cancer patients; DNA-PKcs expression could be upregulated after exposure to chemotherapy or radiotherapy during subsequent tumor progression. Thus, the prognostic and predictive capacity of DNA-PKcs expression should be further evaluated after exposure to chemoradiation, particularly amongst basal patients who progress to metastatic disease.
In conclusion, this study demonstrates the prognostic capacity of DNA-PKcs, a basal breast cancer biomarker derived from comprehensive proteomic profiling of breast cancer. The integration of DNA-PKcs expression along with established immune biomarkers stratifies major risk differences within the basal-like subtype. Such findings, when applied on clinical trial series, would aid in matching basal patients to DNA-PKcs targeting strategies, DNA-damaging agents, and their combination with immune checkpoint blockade.
Fig. 6 Analysis of PRKDC expression using publicly-available breast cancer datasets. a Oncoprint outlining the biological classifications of 825 cases included in the TCGA invasive breast cancer cohort according to ER status, PAM50 subtype, RPPA cluster, and PRKDC mRNA expression as determined by microarray. b-c Boxplots showing that the expression level of PRKDC, as derived from microarray in the TCGA invasive breast cancer cohort, is significantly associated with basal-like PAM50 intrinsic subtype. The median (center bar), and the third and first quartiles (upper and lower edges, respectively) are shown. c Data were obtained through the cBioPortal for Cancer Genomics database 74 . d Raincloud plots showing that the expression level of PRKDC, as derived from RNA-seq on the SCAN-B breast cancer cohort, is significantly associated with basal-like PAM50 intrinsic subtype. e-f Kaplan-Meier survival curves showing the association between PRKDC expression and DFS on cases from the TCGA invasive breast cancer cohort (e) and 35 Gene Expression Omnibus breast cancer datasets (f). Plots were generated using the bc-GenExMiner v4.5 75

METHODS
Study cohorts
Two independent, well-annotated cohorts corresponding to patients diagnosed with stage I-III breast cancer were included in the current study. The staining protocol, scoring criteria, and clinical data analysis were first evaluated on a set of female patients diagnosed with invasive breast cancer (n = 330) at the University of British Columbia (UBC) hospital between 1998 and 2002, designated as the UBC series. The second cohort was used for subsequent detailed analyses and comprises primary invasive breast cancer cases diagnosed in the province of British Columbia at the British Columbia Cancer Agency between 1986 and 1992, referred to as the BC Cancer series. These patients were treated in accordance with the provincial guidelines during the specified time period. The characteristics of these cohorts have been described previously 28,29 .
Patients diagnosed with ductal carcinoma in situ only, metastatic disease at presentation, and those who received neoadjuvant therapies were excluded.

Ethics approval and study design
This study was approved by the research ethics board of UBC and the BC Cancer Breast Cancer Outcomes unit (approval number: H17-01207). The current hypothesis-based retrospective biomarker study was conducted in accordance with the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) guidelines 43 . Prespecified assessment criteria were used for IHC scoring of the biomarkers of interest. Potential hypotheses were initially tested in the UBC series, with the independent BC Cancer series used for subsequent validation studies following a formal prespecified analysis plan approved at a meeting of the Breast Cancer Outcomes Unit at BC Cancer. Consent for the use of previously assembled patient specimens was obtained under a waiver of informed consent policy without identification of patient information.

Tissue microarrays and immunohistochemistry
Formalin-fixed paraffin-embedded tumor blocks of primary surgical specimens were used to construct a series of 0.6 mm core tissue microarrays (TMAs) for both study cohorts as described previously 69 . For the UBC series, duplicate 0.6 mm cores were extracted from each pathology block and embedded into three TMA recipient blocks, while seventeen TMA blocks needed to be constructed to represent the BC Cancer series (1 core per patient) 69 . Serial 4μm sections from these TMAs were previously stained for the following IHC biomarkers included in this study: ER, progesterone receptor, HER2, Ki-67, cytokeratin (CK5/6), and EGFR. The detailed protocols for IHC staining, scoring criteria of these biomarkers, and definitions of IHC-based breast cancer subtypes have been described previously 31 . Stromal tumor-infiltrating lymphocytes were assessed as per recommendations of the International TIL Working Group 70 . To assess the suitability of TMAs for assessing stromal tumorinfiltrating lymphocytes (sTILs), we scored sTILs on digitized full-face hematoxylin and eosin (H&E) stained sections and corresponding 0.6 mm TMA cores from 317 cases from the BC Cancer series. A good correlation (spearman rho = 0.67) was observed ( Supplementary Fig. 4). Hence for further analyses TMAs were utilized with a 10% cutoff as described previously 71 . In addition, TMA sections were scored for CD8 + TILs in the intraepithelial compartment (iTILs) using established, analytically validated IHC staining and interpretation methods as previously published by our group 72,73 . We chose to analyze CD8 + iTILs based on previous observations that this biomarker could define the relevant subset of cytotoxic immune cells that drives anti-tumor immune response and associates with good prognosis 72 . Array sections at 4 µm were mounted on glass slides and baked for an hour at 60°C to prepare for staining on a Ventana Discovery XT automated stainer (Ventana Medical Systems, Tucson, AZ). 
Antigen retrieval was performed using Cell Conditioning 1 antigen retrieval (Ventana Medical Systems) followed by 2 h of primary antibody incubation at room temperature, and detected using a ChromoMap DAB Detection Kit (Ventana Medical Systems). IHC staining of DNA-PKcs was performed with anti DNA-PKcs rabbit monoclonal primary antibody (clone Y393, dilution 1:500, Abcam, cat# ab32566). Slides were then incubated with a secondary antibody (UltraMap anti-Rb HRP) for an additional 16 min. Separate TMAs that included normal breast, breast, and ovarian cancer tissues were used as positive controls. The stained TMA slides were digitally scanned and DNA-PKcs nuclear expression in the tumor cells was visually scored by a pathologist blinded to clinical data.
Scoring of DNA-PKcs was performed following published criteria previously employed by other groups, using an IHC scoring system based on the proportion and intensity of nuclear staining observed in the invasive carcinoma cells 30 . The positivity proportion scores were captured as a continuous variable for each core and then categorized into five scores (0-4) as follows: 0, no positive tumor cells; 1, <10%; 2, 10-34%; 3, 35-74%; and 4, ≥75%. The staining intensity was reported as weak (1), moderate (2), or strong (3). The IHC score was computed by multiplying the proportion category (0-4) by the intensity score. This computed IHC score ranged from 0 to 12 and was used for the final scoring of DNA-PKcs by IHC. For cases with duplicate cores, the higher IHC score was used for analysis. Low expression of DNA-PKcs was defined as an IHC score of <6, whereas a score of ≥6 was assigned to tumors with high expression of DNA-PKcs. All slides were scanned digitally using a Bliss System (Bacus Laboratories/Olympus America, Lombard, IL, USA).
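The scoring rule above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names are hypothetical, and the handling of fractional percentages at the band edges (for example, 34.5% falling in category 2) is an assumption, since the text gives the bands only in whole percentages.

```python
def proportion_category(percent_positive: float) -> int:
    """Map the percentage of positive tumor nuclei to a 0-4 category.

    Bands follow the text: 0 = none, 1 = <10%, 2 = 10-34%,
    3 = 35-74%, 4 = >=75%. Fractional values are assigned by the
    lower bound of each band (an assumption, not from the source).
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if percent_positive == 0:
        return 0
    if percent_positive < 10:
        return 1
    if percent_positive < 35:
        return 2
    if percent_positive < 75:
        return 3
    return 4


def ihc_score(percent_positive: float, intensity: int) -> int:
    """IHC score = proportion category (0-4) x intensity (1-3), range 0-12."""
    category = proportion_category(percent_positive)
    if category == 0:
        return 0  # no positive cells, so intensity is not applicable
    if intensity not in (1, 2, 3):
        raise ValueError("intensity must be 1 (weak), 2 (moderate) or 3 (strong)")
    return category * intensity


def case_score(cores: list[tuple[float, int]]) -> int:
    """For cases with duplicate cores, keep the higher IHC score."""
    return max(ihc_score(percent, intensity) for percent, intensity in cores)


def expression_group(score: int) -> str:
    """Dichotomize: IHC score >= 6 is high expression, < 6 is low."""
    return "high" if score >= 6 else "low"
```

For example, a case with duplicate cores scored as 20% moderate and 40% moderate staining would be passed as `case_score([(20, 2), (40, 2)])` and then classified with `expression_group`.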

STATISTICAL ANALYSIS
IBM SPSS (version 25) and R statistical software were used for performing statistical analyses. Descriptive statistics were computed for continuous and categorical variables. The assessment of IHC expression scores against categorical groups was performed using the two-sided Wilcoxon rank-sum test for pair-wise comparisons and the Kruskal-Wallis rank-sum test for comparisons among more than two groups. Chi-square or Fisher exact tests were used to assess associations between DNA-PKcs expression and clinicopathological variables or expression of other biomarkers. Survival analysis was performed using breast cancer-specific survival (BCSS) as the prespecified primary endpoint, defined as the period between the date of diagnosis and the date of death attributed to breast cancer. Patients who were alive at the end of the follow-up period or who died due to causes other than breast cancer were censored. Cumulative survival probabilities were estimated by Kaplan-Meier methodology and differences in the survival rates between groups were calculated by log-rank testing. Cox proportional hazard modelling was used to compute univariate and multivariate analyses; hazard ratios with 95% confidence intervals were reported for each variable. Multivariate analysis was adjusted for clinicopathological variables including age at diagnosis, tumor size, grade, and nodal status. P-values of less than 0.05 were considered statistically significant.
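As an illustration of how the BCSS endpoint and its censoring rule feed a Kaplan-Meier estimate, a minimal pure-Python sketch follows. The analyses in this study were run in SPSS and R; the function names and status labels below are hypothetical and serve only to make the censoring logic explicit.

```python
from collections import Counter


def bcss_event(status: str) -> bool:
    """Return True only for deaths attributed to breast cancer.

    Patients alive at last follow-up or who died of other causes are
    censored. The status labels are hypothetical placeholders.
    """
    if status not in {"alive", "died_breast_cancer", "died_other"}:
        raise ValueError(f"unknown status: {status}")
    return status == "died_breast_cancer"


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  : follow-up time from diagnosis for each patient
    events : True if the patient had the event, False if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one event occurred.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve
```

Patients censored at a given time are counted in the risk set at that time and removed afterwards, which is the usual convention when events and censorings are tied.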
Bioinformatic analyses using publicly-available breast cancer datasets
The expression of PRKDC mRNA was assessed at the transcriptomic level using the TCGA cohort of breast invasive carcinomas 8 and the Sweden Cancerome Analysis Network-Breast (SCAN-B) cohort 40 (NCT02306096). TCGA data including PRKDC expression, PAM50 subtypes, reverse phase protein assay (RPPA) clusters, and IHC ER status were obtained through cBioPortal 74 . SCAN-B was accessed using the bc-GenExMiner v4.5 publicly-available tool 75 . Survival analyses for PRKDC mRNA expression were performed using the bc-GenExMiner v4.5 and the previously-established KMplotter analysis platform 41 curated from 35 Gene Expression Omnibus datasets accessed using https://kmplot.com/analysis/. Kaplan-Meier survival curves were generated by partitioning cases according to the median mRNA expression.
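The median partition used for the survival curves can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and assigning cases exactly at the median to the low group is an assumption, as the text does not state how ties were handled by the online tools.

```python
from statistics import median


def split_by_median(expression: dict[str, float]) -> dict[str, str]:
    """Label each case 'high' or 'low' relative to the cohort median.

    expression maps a case identifier to its PRKDC mRNA expression value.
    Cases equal to the median are labeled 'low' (an assumption).
    """
    if not expression:
        raise ValueError("expression must contain at least one case")
    cutoff = median(expression.values())
    return {
        case: "high" if value > cutoff else "low"
        for case, value in expression.items()
    }
```

The two resulting groups would then be compared with Kaplan-Meier curves and a log-rank test.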

Analysis of PRKDC mutation data
Somatic mutations in a subset of 640 tamoxifen-treated, clinically ER+ primary tumors from the BC Cancer series were available from a previous study that performed targeted sequencing of 83 biologically important genes including PRKDC 42 . Mutation lollipop diagrams were generated using the cBioPortal Mutation Mapper tool. The functional impact of PRKDC mutations was categorized using "Mutation Assessor" together with information from the PolyPhen 76 and SIFT 77 tools.