E-cadherin breast tumor expression, risk factors and survival: Pooled analysis of 5,933 cases from 12 studies in the Breast Cancer Association Consortium

E-cadherin (CDH1) is a putative tumor suppressor gene implicated in breast carcinogenesis. Yet whether breast cancer risk factors or survival differ by E-cadherin tumor expression is unclear. We evaluated E-cadherin tumor expression by immunohistochemistry on tissue microarrays (TMAs) of 5,933 female invasive breast cancers from 12 studies in the Breast Cancer Association Consortium. H-scores were calculated, and case-case odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression. Survival analyses were performed using Cox regression models. All analyses were stratified by estrogen receptor (ER) status and histologic subtype. E-cadherin low cases (N = 1,191, 20%) were more frequently of lobular histology, low grade, >2 cm, and HER2-negative. Among ER-positive tumors, loss of E-cadherin expression (score < 100) was marginally associated with menopausal hormone use (ever vs. never users, OR = 1.24, 95% CI = 0.97–1.59), and the association was stronger for complete loss of E-cadherin (H-score = 0; OR = 1.57, 95% CI = 1.06–2.33). Breast cancer-specific mortality was unrelated to E-cadherin expression in multivariable models. Low E-cadherin expression is associated with lobular histology, other tumor characteristics and menopausal hormone use, with no evidence of an association with breast cancer-specific survival. These data support loss of E-cadherin expression as an important marker of tumor subtypes.

All studies were approved by the Institutional Review Board at each institution, and all subjects provided informed consent or did not opt out, depending on national regulations. All methods were performed in accordance with the relevant guidelines and regulations, and a list of the ethical approval committees is provided at the end of this manuscript.
Risk factor information. The 12 participating studies provided information on one or more of the following breast cancer risk factors: family history of breast cancer in first-degree relatives; reproductive factors (age at menarche, parity and age at first full-term birth); oral contraceptive (OC) use among women ≤50 years of age; menopausal hormone use and the type of menopausal hormone ever used; and anthropometric measures (body mass index [BMI] and height). Because not all studies captured menopausal status, we used age ≤50 and >50 years as proxies for pre- and postmenopausal status, respectively.

E-cadherin tumor tissue measurements. Routinely prepared formalin-fixed paraffin-embedded (FFPE) blocks of invasive breast tumors were used to construct tissue microarray (TMA) blocks at each study center. One hundred and forty-two TMA slides with tumor samples from 6,010 individual patients (1-2 cores per patient) were prepared for E-cadherin staining. TMAs from all participating studies were stained centrally in the Experimental Pathology Laboratory at the National Cancer Institute (NCI) to ensure consistency across sites and to avoid batch effects arising from systematic variation in staining procedures. We recognize that the study could not control for pre-analytic variables in tissue fixation and processing; to mitigate this, the Experimental Pathology Laboratory at NCI carefully re-titrated the IHC assay to provide a stable assay across all samples.
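The age-based coding described above can be sketched as follows. This is a minimal illustration, not the consortium's code: the function names are hypothetical, and treating women older than 50 as missing for the OC analysis is our reading of "OC use among women ≤50 years of age".

```python
from typing import Optional


def menopausal_proxy(age_at_diagnosis: float) -> str:
    """Age-based proxy for menopausal status: <= 50 -> 'pre', > 50 -> 'post'."""
    return "pre" if age_at_diagnosis <= 50 else "post"


def oc_use_for_analysis(age_at_diagnosis: float, ever_oc: Optional[bool]) -> Optional[bool]:
    """Return OC use only for women aged <= 50; older women get None (not analyzed).

    Assumption: the exclusion is applied at the analysis step, not at data collection.
    """
    if age_at_diagnosis > 50:
        return None
    return ever_oc


print(menopausal_proxy(50), menopausal_proxy(51))  # pre post
print(oc_use_for_analysis(45, True), oc_use_for_analysis(60, True))  # True None
```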
IHC staining was performed on a Benchmark ULTRA autostainer (Ventana Medical Systems, Tucson, AZ). TMA sections were deparaffinized with xylene and graded alcohols; antigen retrieval was performed with citrate buffer pH 9 (Dako) for 20 minutes in a pressure cooker. The primary mouse monoclonal anti-E-cadherin antibody (clone NCH-38, 1:500; Dako, Carpinteria, CA) was applied at room temperature for 2 hours. The antigen-antibody complex was detected using EnVision+ (Dako), and DAB was applied for 20 minutes. Slides were counterstained with hematoxylin, dehydrated and coverslipped. Slides were imaged with a Hamamatsu NanoZoomer (Bridgewater, NJ) at 20× magnification and cataloged using the SlidePath Digital Image Hub (Leica Biosystems, Wetzlar, Germany).
As our primary interest was in clinically relevant E-cadherin expression, we used the H-scoring system as proposed and evaluated in previous publications [23][24][25]. Two cytotechnologists, blinded to all clinical data, assessed digital images of TMA spots using the SlidePath Digital Image Hub. For each TMA spot, manual readings recorded image quality (unsatisfactory, limited or satisfactory), the percentage of cells positively stained for E-cadherin (0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90 or 100%) and staining intensity (0 = negative, 1 = weak, 2 = intermediate, 3 = strong). Reproducibility of IHC scoring was assessed on 200 images scored by the two cytotechnologists and a pathologist (M.E.S.); inter- and intra-observer agreement was excellent (weighted kappa ≥0.90; P < 0.001). A summary E-cadherin score was calculated as the product of the percentage of positive tumor cells and the staining intensity (range 0-300) [23][24][25]. For patients with multiple spots, the maximum score across spots was used for analysis. The median and interquartile range of the E-cadherin score did not vary substantially across the 12 studies (Supplementary Table 2). Tumors with a score <100 were classified as E-cadherin low and those with a score ≥100 as E-cadherin high; this cut-point was informed by the known relationship between E-cadherin expression and lobular histology reported in the literature 23,32,33. Representative images are shown in Supplementary Figure 1. For sensitivity analyses, we also evaluated a more stringent definition of E-cadherin loss as a score of 0 (i.e., no E-cadherin expression).
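The scoring rules above can be expressed as a short sketch. It assumes that readings are already available as (percentage, intensity) pairs per spot and that spots rated unsatisfactory were removed beforehand; the Methods do not state how image quality entered the calculation. Function names are hypothetical.

```python
from typing import Iterable, Tuple

# Values the readers could record, as listed in the Methods.
PERCENT_LEVELS = {0, 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
INTENSITY_LEVELS = {0, 1, 2, 3}  # 0 = negative, 1 = weak, 2 = intermediate, 3 = strong


def spot_score(percent_positive: int, intensity: int) -> int:
    """Summary score for one TMA spot: % positive cells x intensity (range 0-300)."""
    if percent_positive not in PERCENT_LEVELS or intensity not in INTENSITY_LEVELS:
        raise ValueError("reading outside the recorded categories")
    return percent_positive * intensity


def patient_score(spots: Iterable[Tuple[int, int]]) -> int:
    """Patient-level score: the maximum across that patient's spots."""
    return max(spot_score(p, i) for p, i in spots)


def classify(score: int) -> str:
    """Primary cut-point: < 100 -> 'low', >= 100 -> 'high'."""
    return "low" if score < 100 else "high"


def complete_loss(score: int) -> bool:
    """Stringent sensitivity definition: no E-cadherin expression (score == 0)."""
    return score == 0


# Two spots from one hypothetical patient: 40% weak-intermediate vs. 90% weak.
score = patient_score([(40, 2), (90, 1)])
print(score, classify(score), complete_loss(score))  # 90 low False
```

Taking the maximum means a single well-stained core is enough to place a tumor in the E-cadherin high group.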

Assessment of other tumor markers.
Assessment of the tumor markers ER, progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2), and the definition of positive expression, varied across studies. For the largest group of cases (N = 1,891, 32%), ER status was extracted primarily from medical records; 15% (N = 908) had ER status obtained from IHC staining of whole sections and 28% (N = 1,685) from IHC staining of TMAs. Previous publications from groups participating in the current study show good concordance between marker status from medical records and standardized measurements from TMA analyses 34-36.

Statistical analysis. Case-case analyses. Because all staining was performed at the NCI to minimize batch effects, a pooled analysis was conducted using data from all 12 studies. We performed case-case analyses to assess whether risk factor associations differed by E-cadherin tumor expression. Logistic regression models were used to estimate case-case ORs and 95% CIs, with E-cadherin low vs. E-cadherin high expression as the outcome and risk factors as the explanatory variables. The ORs were interpreted as the risk factor associations with E-cadherin low disease compared with E-cadherin high disease. For each risk factor, the category shown in the literature to be associated with the lowest overall breast cancer risk was selected as the reference category. Thus, a case-case OR >1 can be interpreted to mean that the risk factor is more strongly associated with E-cadherin low tumors than with E-cadherin high tumors (OR(E-cadherin low vs. control) > OR(E-cadherin high vs. control)). Heterogeneity by E-cadherin subtype was tested using a global F test 37. Because E-cadherin expression may vary by age and study site 23,30,31, all models were adjusted for age (in 10-year categories) and study site.
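For a single binary risk factor with no covariates, the case-case OR reduces to the cross-product ratio of a 2×2 table of E-cadherin status by exposure; the maximum-likelihood logistic regression estimate equals this ratio, and its Wald interval equals Woolf's interval. The sketch below uses that crude form with hypothetical counts. The published estimates were additionally adjusted for age and study site, which requires fitting the full logistic model rather than this shortcut.

```python
import math
from typing import Tuple


def case_case_or(exp_low: int, unexp_low: int, exp_high: int, unexp_high: int,
                 z: float = 1.959964) -> Tuple[float, float, float]:
    """Crude case-case OR (E-cadherin low vs. high) for a binary exposure,
    with a Woolf (log-scale Wald) 95% CI.

    exp_low, unexp_low:   E-cadherin low cases, exposed / reference category
    exp_high, unexp_high: E-cadherin high cases, exposed / reference category
    """
    cells = (exp_low, unexp_low, exp_high, unexp_high)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive for the Woolf interval")
    log_or = math.log((exp_low * unexp_high) / (unexp_low * exp_high))
    se = math.sqrt(sum(1.0 / n for n in cells))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts (not study data): ever vs. never users of menopausal hormones.
or_, lo, hi = case_case_or(exp_low=60, unexp_low=40, exp_high=200, unexp_high=200)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")  # OR = 1.50, 95% CI = 0.96-2.34
```

As in the text, an OR above 1 here means that exposure is more common among E-cadherin low than among E-cadherin high cases; an interval that includes 1, as in this made-up example, corresponds to a "marginal" result.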
Given that ER status is an important marker of etiologic heterogeneity 38, we stratified all analyses by ER status (ER+, ER−). Among ER-positive tumors, we also evaluated associations after stratification by histology (lobular, ductal/mixed); this was not done among ER-negative tumors because of small numbers. To assess variation in results across studies for risk factors that showed evidence of a differential association by E-cadherin expression, we fitted study × risk factor interaction terms and estimated p-heterogeneity by study using the likelihood ratio test; P < 0.20 was considered suggestive evidence of between-study heterogeneity 37. In sensitivity analyses, we also assessed risk factor associations using a more stringent definition of E-cadherin loss (a score of 0).
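The between-study heterogeneity check compares the log-likelihoods of models with and without the study × risk factor interaction terms. The sketch below computes the likelihood ratio statistic and its chi-square p-value using only the Python standard library. The log-likelihood values are hypothetical, and the degrees of freedom (11 for a binary risk factor across 12 studies) assume one interaction term per non-reference study.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1.

    Starts from the closed forms Q(x; 1) = erfc(sqrt(x/2)) and Q(x; 2) = exp(-x/2),
    then applies Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    k, q = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (2, math.exp(-x / 2))
    while k < df:
        if x > 0:  # the added term is exactly 0 when x == 0
            q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def lr_test(loglik_reduced: float, loglik_full: float, df: int) -> Tuple[float, float]:
    """Likelihood ratio test: reduced model vs. model adding study x risk factor terms."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods; 11 df = (12 studies - 1) x (2 categories - 1).
stat, p = lr_test(loglik_reduced=-1500.0, loglik_full=-1492.0, df=11)
print(f"LR = {stat:.1f}, P = {p:.3f}, suggestive heterogeneity: {p < 0.20}")
```

The liberal P < 0.20 threshold from the text is applied only in the final comparison, so it can be changed without touching the test itself.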
Survival analysis. For survival analysis, we further excluded patients with distant metastases at diagnosis of the primary tumor (N = 63) and those missing vital status (N = 174). In total, 5,696 invasive breast cancer cases from 12 BCAC studies were included in the survival analysis. A total of 1,085 deaths were observed within 10 years of diagnosis, 671 due to cancer. Survival time for each case was calculated as the time from the date of diagnosis to the date of death or censoring. Analyses were left-truncated at study entry (delayed entry) to allow inclusion of prevalent cases. End of follow-up was defined as the date of death, the date of last follow-up or 10 years after diagnosis, whichever came first. Hazard ratios (HRs) and 95% CIs for all-cause and breast cancer-specific mortality were estimated using Cox regression models stratified by study site. Multivariable Cox models were adjusted for potential confounders: age at diagnosis (in 10-year categories), tumor grade (well/moderately differentiated, poorly differentiated, or unknown), tumor size (≤2, >2 cm, or unknown), node status (positive, negative, or unknown), HER2 status (positive, negative, or unknown), and histology (ductal/mixed, lobular, other, or unknown). To assess whether associations varied by tumor characteristics, we also estimated HRs and 95% CIs by ER status (positive, negative, unknown), HER2 status (positive, negative, unknown) and, among ER-positive tumors, histology (lobular, ductal/mixed, other/unknown).
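The follow-up rules above can be sketched as a per-patient data step. Time is measured from diagnosis, prevalent cases enter at study entry, and follow-up ends at death, last contact or 10 years. This is an illustration only: the field names are hypothetical, and converting days to years with 365.25 days per year is our assumption. The resulting (entry, exit, event) values could then be passed to any Cox implementation that supports delayed entry and strata, for example lifelines' CoxPHFitter through its entry_col and strata arguments.

```python
from datetime import date
from typing import Optional, Tuple

MAX_YEARS = 10.0        # follow-up truncated at 10 years after diagnosis
DAYS_PER_YEAR = 365.25  # assumption: conversion used for illustration


def years_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_YEAR


def survival_record(diagnosis: date, study_entry: date, last_date: date,
                    died: bool, died_of_breast_cancer: bool
                    ) -> Optional[Tuple[float, float, int, int]]:
    """Return (entry, exit, all-cause event, breast cancer event) in years since
    diagnosis, or None if there is no observed time at risk.

    last_date is the date of death for decedents, otherwise the date of last follow-up.
    """
    entry = max(0.0, years_between(diagnosis, study_entry))  # delayed entry for prevalent cases
    observed = years_between(diagnosis, last_date)
    exit_time = min(observed, MAX_YEARS)                      # administrative censoring at 10 years
    if exit_time <= entry:
        return None
    death_in_window = died and observed <= MAX_YEARS          # deaths after 10 years are censored
    return entry, exit_time, int(death_in_window), int(death_in_window and died_of_breast_cancer)


# Prevalent case: diagnosed 2000, enrolled 2001, died of breast cancer in 2005.
print(survival_record(date(2000, 1, 1), date(2001, 1, 1), date(2005, 1, 1), True, True))
# entry about 1.0 year, exit about 5.0 years, both event indicators = 1
```

Counting time from diagnosis while starting the risk set at study entry is what keeps prevalent cases from contributing person-time during which they could not have been observed to die.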
All statistical tests were two-sided with 5% type-I error. All pooled analyses were performed using the SAS software version 9.3 (SAS Institute, Inc, Cary, NC).

Study and tumor characteristics by E-cadherin expression.
The median age at breast cancer diagnosis was 52 years, with some variation by study. The proportion of E-cadherin low tumors ranged from 10% to 31% across studies (Supplementary Table 2). Table 1 presents the distribution of clinicopathologic features by level of E-cadherin tumor expression (low/high). Compared with E-cadherin high tumors, E-cadherin low tumors were more likely to be lobular, well/moderately differentiated (low grade), larger (>2 cm) and HER2-negative (P ≤ 0.005; Table 1). These associations were generally consistent across studies (Supplementary Table 3). Table 2 presents risk factor associations for ER-positive breast cancers overall and stratified by histology. Among ER-positive cases, ever use of menopausal hormones (compared with never use) was marginally associated with E-cadherin low vs. E-cadherin high tumors (OR = 1.24, 95% CI = 0.97-1.59, P-het = 0.08). No consistent associations with E-cadherin status were observed for age at menarche, number of live births, age at live birth, or anthropometric measures (BMI and height; Supplementary Table 4).

Case-case analyses of risk factor associations with E-cadherin tissue expression among ER-positive tumors overall and stratified by histology.
Among women with ER-positive tumors, we observed a difference by E-cadherin status for number of live births: women with one birth were less likely to have E-cadherin low tumors than women with two or more births (OR = 0.74, 95% CI = 0.58-0.95; Table 2). However, there was no trend, given the result for nulliparous women. This relationship was driven by ductal/mixed tumors; in contrast, among lobular tumors, nulliparous women had more frequent loss of E-cadherin than women with two or more live births. The other breast cancer risk factors examined (family history of breast cancer, age at menarche, age at menopause, age at first birth, OC use and menopausal hormone use) did not exhibit heterogeneity by E-cadherin expression.
Among ER-positive breast cancers of ductal histology, none of the breast cancer risk factors examined exhibited heterogeneity in their associations by E-cadherin expression (Table 2 and Supplementary Table 4). Analyses using a score of 0 to define E-cadherin loss are presented in Supplementary Tables 5 and 6. In these sensitivity analyses, we observed a stronger relationship between E-cadherin loss and ER expression (Supplementary Table 5). In the risk factor analysis (Supplementary Table 6), ever users of menopausal hormones were more likely than never users to have E-cadherin loss (score = 0) among ER-positive tumors (OR = 1.57, 95% CI = 1.06-2.33, P = 0.02); other factors did not show significant differences.

Case-case analyses of risk factor associations with E-cadherin tissue expression among ER-negative tumors.
We did not find differences by E-cadherin expression among ER-negative breast cancers (Table 3), although the result for OC use was marginal: cases who reported ever use of OCs were more likely to have E-cadherin low than E-cadherin high tumors (OR = 1.97, 95% CI = 0.96-4.06, P-het = 0.06). No significant associations with E-cadherin status were observed for anthropometric measures, including BMI and height (Supplementary Table 4). There were too few ER-negative cases to evaluate the more stringent cut-point of 0 for defining E-cadherin loss.

E-cadherin expression and survival by tumor subtypes.
The mean follow-up time was 9.6 years; results for all-cause and breast cancer-specific survival are presented in Table 4. E-cadherin expression was not significantly associated with survival in multivariable models, either overall or in any of the tumor subtypes.

Discussion
In our study of nearly 6,000 breast cancer patients with centrally stained and scored TMA slides, E-cadherin loss was significantly associated with lobular histology, consistent with previous work. Apart from menopausal hormone use, we found limited evidence that E-cadherin loss varied by risk factors or was associated with 10-year breast cancer-specific survival within tumor subtypes.
Analysis of tumor characteristics showed that E-cadherin loss was significantly associated with lobular histology, low grade, larger tumor size and HER2 negativity, consistent with previous studies 39. Lobular breast cancers feature noncohesive cells that are individually dispersed or arranged in a single-file pattern, a phenotype that has been attributed to dysregulation of cell-cell adhesion, primarily through loss of E-cadherin protein expression 40,41. Because of this single-file growth pattern, lobular breast cancers tend to be harder to detect by screening and are therefore larger at diagnosis 39, consistent with our finding that E-cadherin loss was associated with larger tumor size.
We also observed a relationship between low E-cadherin expression and ever use of menopausal hormones among ER-positive tumors, which was more pronounced with the more stringent cut-point for E-cadherin loss (a score of 0). Numerous studies have shown menopausal hormone use, particularly combined estrogen-progestin therapy, to be more strongly associated with lobular than with ductal tumors, and reduced use of menopausal hormones has been associated with a declining incidence of lobular cancers at the population level [11][12][13][15][16][17][19][20][21][22]42,43. In addition, among ER-negative breast cancers, ever users of OCs were almost twice as likely as never users to have E-cadherin low tumors, although this difference was not statistically significant. Because the relationships of menopausal hormone and OC use with breast cancer risk appear to be strongly influenced by recent exposure, a true association in our data may have been attenuated by our reliance on ever rather than current use. Given prior epidemiologic studies and in vitro data showing that estrogen may lower E-cadherin expression, the observed relationship with OC or menopausal hormone use may be biologically plausible 44.
In our analysis of breast cancer risk factors among ER-positive tumors, women with one birth less frequently had E-cadherin low tumors than women with two or more live births. We saw the opposite relationship among lobular tumors, where nulliparous women had more frequent loss of E-cadherin than women with two or more live births. Whether E-cadherin loss in tumors is related to reproductive characteristics requires evaluation in larger datasets. With regard to genetic factors, mutation profiling studies targeting CDH1 suggest that mutations are rare (33/507 based on TCGA data) and unlikely to explain loss of expression 45,46.
We did not observe associations between E-cadherin and breast cancer specific survival in multivariable models as reported in previous studies [24][25][26][27][28] . Our data, based on the largest analysis of its kind, do not support E-cadherin as an important marker of survival in breast cancer patients.
Strengths of our study include its large pooled design and centrally stained and scored E-cadherin data, which reduced systematic bias that might otherwise have been introduced across participating studies. Limitations include limited power for analyses of tumor subgroups because of smaller numbers, especially for survival analysis. Although staining of E-cadherin was performed centrally, the percentage of lobular breast cancers with E-cadherin loss was lower than expected. This may reflect variation in the classification of histologic subtypes, but could also indicate that molecular profiling methods, or more detailed image analysis of the cellular compartment in which E-cadherin is stained, are needed to define E-cadherin loss 47,48. Indeed, the percentage of E-cadherin low lobular carcinomas varied by study, which could reflect tissue factors that influenced staining; variability in classifying cancers as lobular, including potential sampling issues if TMAs did not fully capture lobular morphology; or factors related to the populations and the relative frequency of risk exposures.

Age ≤50 years was used as a proxy for premenopausal status and age >50 years as a proxy for postmenopausal status. Abbreviations: OR = odds ratio, CI = confidence interval, ER = estrogen receptor. Note: For each variable, the category shown in the literature to be associated with the lowest overall breast cancer risk was selected as the reference category. The case-case OR may be interpreted as the ratio of the case-control ORs for E-cadherin low tumors (vs. controls) and E-cadherin high tumors (vs. controls). A case-case OR >1 may suggest that the risk factor is more strongly associated with E-cadherin low tumors than with E-cadherin high tumors (OR(E-cadherin low vs. control) > OR(E-cadherin high vs. control)).
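The note's reading of the case-case OR as a ratio of case-control ORs can be checked with crude 2×2 counts. When both case groups are compared with the same controls, the control odds cancel. The counts below are hypothetical, since the case-case design itself does not use controls. The identity is exact only for these crude ORs; for adjusted models it remains an interpretation rather than an arithmetic identity.

```python
from fractions import Fraction
from typing import Tuple

Counts = Tuple[int, int]  # (exposed, reference category)


def odds(counts: Counts) -> Fraction:
    exposed, unexposed = counts
    return Fraction(exposed, unexposed)


def check_ratio_identity(low: Counts, high: Counts, controls: Counts) -> Fraction:
    """Return the crude case-case OR after checking that it equals OR_low / OR_high."""
    or_low_vs_control = odds(low) / odds(controls)
    or_high_vs_control = odds(high) / odds(controls)
    case_case = odds(low) / odds(high)
    assert case_case == or_low_vs_control / or_high_vs_control  # control odds cancel
    return case_case


# Hypothetical counts: OR_low = 5/2 and OR_high = 5/3, so the case-case OR is 3/2.
print(check_ratio_identity(low=(60, 40), high=(200, 200), controls=(300, 500)))  # 3/2
```

Exact fractions are used so that the equality check is not affected by floating-point rounding.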
In summary, this study provides limited evidence for heterogeneity in risk factor associations or for differences in survival by E-cadherin tumor tissue expression. Our data are consistent with molecular profiling studies showing distinctive expression of genes associated with E-cadherin signaling among ER-positive ductal and lobular carcinomas 10,49. Because data suggest that genetic susceptibility factors may influence loss of E-cadherin expression, evaluating genetic susceptibility markers in relation to E-cadherin loss might provide new insights into pathways of E-cadherin loss consistent with histology analysis 23,50. Future studies using comprehensive molecular subtyping data, including histology, hormone markers and mRNA, might provide new insights into common and distinct molecular pathways of E-cadherin loss as well as tumor heterogeneity 51.

Availability of data and materials. The datasets generated and/or analyzed during the current study are not publicly available due to privacy and ethical approvals but are available from the corresponding author on reasonable request.

Table 4. 10-year hazard ratios (HRs) and 95% confidence intervals (CIs) for all-cause and breast cancer-specific mortality according to E-cadherin tumor expression (low/high): a pooled analysis of 12 participating Breast Cancer Association Consortium studies.

a Hazard ratios and 95% confidence intervals were estimated using stratified Cox models (stratified by study site), adjusted for age at diagnosis (in 10-year categories).

b For analyses by ER status, hazard ratios and 95% confidence intervals were estimated using stratified Cox models (stratified by study site), adjusted for age at diagnosis (in 10-year categories), tumor grade (well/moderately differentiated, poorly differentiated, or unknown), tumor size (≤2, >2 cm, or unknown), axillary node involvement (negative, positive, unknown), histology (ductal/mixed, lobular, other/unknown), and HER2 status (positive, negative, or unknown).
For analyses by histology among ER+ tumors, hazard ratios and 95% confidence intervals were estimated using stratified Cox models (stratified by study site), adjusted for age at diagnosis (in 10-year categories), tumor grade (well/moderately differentiated, poorly differentiated, or unknown), tumor size (≤2, >2 cm, or unknown), axillary node involvement (negative, positive, unknown), and HER2 status (positive, negative, or unknown). For analyses by HER2 status, hazard ratios and 95% confidence intervals were estimated using stratified Cox models (stratified by study site), adjusted for age at diagnosis (in 10-year categories), tumor grade (well/moderately differentiated, poorly differentiated, or unknown), tumor size (≤2, >2 cm, or unknown), axillary node involvement (negative, positive, unknown), histology (ductal/mixed, lobular, other/unknown), and ER status (positive, negative, unknown).