Sex differences in esophageal cancer overall and by histological subtype

Esophageal cancer is the seventh most common cancer worldwide and the sixth leading cause of cancer-related death, and its global incidence is expected to rise by 140% over the 10 years leading up to 2025. Overall incidence is higher in males, while data on prognosis and survival are not yet well established. The goal of this study was to carry out a comprehensive analysis of differences between sexes and other covariates in patients diagnosed with primary esophageal cancer. Data from 2005 to 2020 were obtained from the University Hospitals (UH) Seidman Cancer Center and from 2005 to 2018 from SEER. Patients were categorized according to histological subtype and divided according to sex. The Pearson chi-square test was used to compare variables of interest by sex, and the influence of sex on survival was assessed with Kaplan-Meier analysis, log-rank tests and Cox proportional hazards regression models. A total of 1205 patients were included in the analysis. Across all types, sex differences were found for age at diagnosis, histology, smoking status and prescription of NSAIDs; in SCC, they were found for age at diagnosis and alcoholism. Survival analysis did not show differences between males and females in univariable or multivariable models. Males have a higher incidence of esophageal cancer and its two main subtypes, but none of the comprehensive set of variables analyzed was strongly or uniquely correlated with this sex difference in incidence, nor were they associated with a sex difference in survival.

As in other types of cancer, sex differences in incidence are also seen in esophageal cancer. In the United States, 76% of adenocarcinoma cases from 1973 to 2012 occurred in white males 16,17. It is estimated that the odds of EAC are 7-10 times greater, and the odds of SCC 3-4 times greater, in males than in females 18. Sex has also been shown to be an independent prognostic marker in SCC but not in EAC, with females having better survival [19][20][21][22]. In addition, there is a report of greater regional recurrence and distant metastasis in males compared with females, indicating better disease control after radiotherapy in females 19.
Despite published data showing sex differences in esophageal cancer, the root causes are still poorly understood. To our knowledge, no studies have analyzed sex differences across a large spectrum of variables in both SCC and EAC. We hypothesized that there may be differences in epidemiological criteria, risk factors or treatment patterns that explain the sex differences in incidence and outcomes. Therefore, using robust databases, we carried out a comprehensive analysis of differences between sexes and other covariates in patients diagnosed with primary esophageal cancer.

Methods
Data were obtained from the University Hospitals (UH) Seidman Cancer Center research data repository, which consists of patient records from 2005 to 2020. The data repository is based on CAISIS, an open-source, web-based cancer data management system that integrates research with patient care and draws on disparate sources (Soarian, NGS Labs, Sunrise Clinical Manager, Tumor Registry, Via Oncology, OnCore, MosiaQ, PRO tools and others) to provide comprehensive data on the UH Seidman cancer patient population 23. Patient records were deidentified, and all analyses were performed in accordance with relevant guidelines and regulations, respecting the Declaration of Helsinki. The study, with a waiver of informed consent, was approved by the University Hospitals of Cleveland Institutional Review Board (IRB).
The initial cohort included patients ≥ 18 years old who were diagnosed with primary malignant esophageal cancer between 2005 and 2020 (ICD codes C15.XX, C49.A1 and 150.XX) 24, 25 . Patients were excluded from analysis if they had missing sex information, unknown date of diagnosis or a prior history of cancer. The cohort selection for analysis is described in Fig. 1.
Data extracted from the UH platform for each patient included basic demographics (such as age at diagnosis, sex and race), comorbidities, histology/subtype, staging, laboratory results, vital signs, medications, and cancer treatment information (chemotherapy, hormone therapy, immunotherapy, surgery, radiation). Treatment information was included only if it was related to esophageal cancer or to the anatomical location of the esophagus. From the list of medications, we selected the drugs and drug classes commonly used in the treatment of esophageal cancer, as well as drugs that can be risk factors 26. Patients with a recorded date of death, obtained from the EMR and state records, were considered deceased.
Our final analysis included 29 categorical variables, grouped into General Characteristics, Cancer Characteristics, Risk Factors and General Treatment. General Characteristics variables included age at diagnosis, sex, median income, race and ethnicity. Cancer Characteristics variables included histological subtype, clinical stage and pathological stage. Risk Factors included Charlson comorbidity score 27, smoking status and the presence or absence of the following comorbidities: obesity, BE, alcoholism, achalasia, previous gastrectomy, gastritis, gastroesophageal reflux, H. pylori infection and long-term use of NSAIDs. General Treatment variables included whether the patient received the following therapies: chemotherapy, immunotherapy, radiation (of the esophagus or nearby 38,28,29 . Race was categorized as white, black, or other. Ethnicity was categorized as Hispanic, non-Hispanic or other. Estimated median income was determined by the patient's zip code and categorized as < $43,235, $43,235-$64,446, or > $64,446 (25%, 50% and 75% percentiles). The risk factors were selected from each patient's list of comorbidities based on those most closely related to esophageal cancer according to the literature and clinical experience 30,31. Histological subtype was categorized as Squamous Cell Carcinoma (SCC; ICD-O-3 8050-8084) or Esophageal Adenocarcinoma (EAC; ICD-O-3 8140-8384) 32. The categorization process is summarized in supplementary Table S1.
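The categorization rules stated above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code used in the study: the function names and the "Other/unknown" label are assumptions, and the treatment of values that fall exactly on an income cut point is also assumed, since the text does not specify it.

```python
def categorize_histology(icd_o_3_code: int) -> str:
    """Map an ICD-O-3 morphology code to the subtype ranges given in the text.

    The 'Other/unknown' label for codes outside both ranges is an assumption.
    """
    if 8050 <= icd_o_3_code <= 8084:
        return "SCC"
    if 8140 <= icd_o_3_code <= 8384:
        return "EAC"
    return "Other/unknown"


def categorize_income(median_income: float) -> str:
    """Bin a zip-code-level median income using the cut points given in the text.

    Assumption: values exactly equal to a cut point fall in the middle bin.
    """
    if median_income < 43235:
        return "< $43,235"
    if median_income <= 64446:
        return "$43,235-$64,446"
    return "> $64,446"


if __name__ == "__main__":
    # 8070 lies in the squamous range, 8140 in the adenocarcinoma range.
    for code in (8070, 8140, 8000):
        print(code, categorize_histology(code))
    for income in (40000, 50000, 70000):
        print(income, categorize_income(income))
```

Keeping each rule in a small pure function makes it straightforward to apply the same categorization to a second data source, as the comparison with SEER requires.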
SEER data were used to compare and validate our findings against the general population. Data were obtained with SEER*Stat software from the SEER Research Plus Database for esophageal cancer diagnoses between 2005 and 2018 33. The variables analyzed were categorized following the methodology applied to the UH database and included sex, age at diagnosis, race, ethnicity, histology, staging, chemotherapy, radiotherapy, surgery, vital status and median survival.
The sample was divided according to sex as male or female. The Pearson chi-square test was used to compare variables of interest by sex, excluding patients with missing values, with p < 0.05 considered significant. The influence of sex on survival was first assessed using Kaplan-Meier analysis, generating median survival by sex with 95% confidence intervals (95% CI), and log-rank tests by sex. After the model assumptions were checked, Cox proportional hazards regression models were used to fit univariable and multivariable models of overall survival by sex, and by sex and histological subtype (EAC and SCC). The variables selected for the multivariable models, overall and by histological subtype, were those with p < 0.20 in the univariable model and those of clinical importance. Variables found to be correlated by chi-square test were not included in the final model. All analyses were performed using RStudio 1.2.1335 software 34.
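To make two of the methods named above concrete, the following is a minimal Python sketch of a Pearson chi-square test on a contingency table and a Kaplan-Meier survival estimate. It is not the study's code (the analyses were run in RStudio); the function names are hypothetical, the data in the example are invented toy values, and the p-value helper only handles the 2 x 2 case (one degree of freedom).

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_one_dof(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy 2 x 2 table (rows: sex, columns: a binary variable); values are invented.
    stat, dof = pearson_chi_square([[10, 20], [20, 10]])
    print(stat, dof, p_value_one_dof(stat))
    # Toy follow-up data in months; values are invented.
    curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
    print(curve, median_survival(curve))
```

In practice a statistics package would also supply the confidence intervals, log-rank tests and Cox models described above; the sketch only shows the core calculations.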

Results
www.nature.com/scientificreports/
All types of esophageal cancer. Using data from 2005 to 2020, we analyzed a total of 1205 patients across all types of esophageal cancer, with 75.8% (913) males and 24.2% (292) females, a male:female ratio of about 3:1. The evolution of cases by year is shown in Fig. 2. For general characteristics (Table 1), sex differences existed only for age at diagnosis (p < 0.001), with a predominance of females > 70 years old (46.9% of females) and males between 56 and 70 years old (48.2% of males). There were no significant differences for median income (p = 0.12), race (p = 0.06) or ethnicity (p = 0.21). For cancer characteristics (Table 2), we found a difference in histology (p < 0.001), with a predominance of EAC in both groups (79.2% in males and 56.5% in females), and no significant differences in clinical staging (p = 0.21) or pathological staging (p = 0.08). Among risk factors (Table 3), there was a difference in smoking status (p = 0.01), with a predominance of former smokers in males overall and by histological subtype (58.1% in males and 45% in females overall), and no difference for Charlson score (p = 0.28), obesity (p = 0.11), BE (p = 0.22), alcoholism (p = 0.35), achalasia (p = 0.64), previous gastrectomy (p = 0.17), gastritis (p = 0.28), gastroesophageal reflux (p = 0.42), H. pylori infection (p = 0.80) or long-term use of NSAIDs (p = 0.96).
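The sex proportions and the approximate 3:1 ratio follow directly from the reported counts, as this short Python check shows. The variable names are illustrative; only the counts 913 and 292 come from the text.

```python
# Counts reported for the UH cohort (all types of esophageal cancer).
males = 913
females = 292
total = males + females

pct_male = round(100 * males / total, 1)
pct_female = round(100 * females / total, 1)
male_female_ratio = males / females

print(f"total = {total}")
print(f"males = {pct_male}%, females = {pct_female}%")
print(f"male:female ratio = {male_female_ratio:.2f}:1")
```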
In the survival analysis, summarized in Fig. 7, differences were found in the univariable models for all types, EAC and SCC. In the multivariable models, males showed a higher risk of death for all types of esophageal cancer and for SCC, but not for EAC (HR for males = 1.01, CI = 0.97-1.06, p = 0.35).

Discussion
The primary objective of this work was to assess sex differences across a large spectrum of variables and to assess the potential effects of these differences on survival for esophageal cancer and its two main histological subtypes (Adenocarcinoma, EAC, and Squamous Cell Carcinoma, SCC). We believe that our main contribution to the field is the robust, high-quality and detailed information available in our institutional database, which, through the integration of disparate sources, enabled us to carry out a comprehensive analysis, adding variables and information that help to understand the epidemiology of sex differences in esophageal cancer. This study showed that, like other cancers, esophageal cancer and its two main histological subtypes (EAC and SCC) occur more often in males than in females, in both our institutional database and SEER, corroborating literature reports of higher incidence in males 38,35. The mechanisms that explain these sex differences are not fully understood and seem to be multifactorial, mainly involving hormonal and genomic factors 38,36. Our study also showed that, regarding risk factors, there are differences only in smoking status and only when analyzing all types of esophageal cancer (probably due to the inclusion of other/unknown histology diagnoses in this group), in line with findings that risk factors do not seem to be associated with the higher incidence in males 37,38. Looking at demographic factors, we noted differences in age at diagnosis for all types of esophageal cancer and for SCC, with females tending to be diagnosed at older ages. These findings can contribute to the hypothesis that estrogen may inhibit esophageal carcinogenesis and thus be protective for females in the pre-menopausal stage [38][39][40]. Regarding cancer characteristics (Table 2), the only difference seen is in histological subtype.
Besides our institutional database confirming the trends of higher rates of EAC for both sexes, it interestingly also showed
*Adjusted for: age at diagnosis, race, ethnicity, histology, obesity, Barrett's, gastrectomy, gastritis, gastroesophageal reflux, chemotherapy, surgery and radiotherapy. **Adjusted for: age at diagnosis, smoking status, obesity, Barrett's, alcoholism, achalasia, gastrectomy, gastritis, gastroesophageal reflux, H. pylori, chemotherapy, surgery and radiotherapy. ***Adjusted for: race, Barrett's, gastritis, gastroesophageal reflux, H. pylori, radiotherapy, surgery, smoking status and alcoholism.
Excluding the differences already mentioned in age at diagnosis, smoking status and histological subtype, the only other differences between males and females are in alcoholism (only for SCC) and NSAID prescription (only for all types of esophageal cancer). None of the other variables analyzed showed any sex differences. The differences in smoking status and alcoholism seem to be explained by behavioral differences at the population level and, together with the other differences found, do not seem to have an impact on outcomes. Regarding outcomes, there was no statistical significance, but a tendency toward higher hazard ratios for males can be seen. The literature is conflicting about sex differences in survival: while some studies report worse outcomes in males, others report no differences, as in our study 19,20,22,[42][43][44]. In addition, we observed changes in diagnosis over time (Fig. 2), with an increasing trend in esophageal cancer overall, an increasing number of EAC cases and a downward trend in the number of SCC cases, especially in male patients. These trends corroborate
Interestingly, SEER data showed different patterns from our population.
These differences reported between our database and SEER reflect differences in quality of care and in the populations treated within the US, for example the underrepresentation of Hispanic patients and the higher rates of treatment in our population. Our institution is located in the state of Ohio, which, according to the Ohio Department of Health, has an average of 781 new esophageal cancer cases per year, with an incidence rate of 5.2 per 100,000 (23% higher than the US rate) and an average of 625 annual cases in males vs 156 in females. All providers are required by law to report all cancers diagnosed and/or treated in the state to the Ohio Cancer Incidence Surveillance System (OCISS).
This study has several limitations. Our institutional database is based on the specific population followed at the University Hospitals Seidman Cancer Center, so our sample does not represent populations outside this context. Since this service also receives patients already diagnosed and being treated at other services, some of the information in the EMR may be incomplete. In addition, given the retrospective nature of this study, some valuable variables (such as the location of the cancer in the esophagus) are not available, and some variables have a high number of NAs/unknown values (such as histological subtype and clinical staging); this missing information can lead to a loss of statistical power. Also, the median income variable was derived from the patient's zip code and thus could be subject to some misclassification. On the other hand, we analyzed a large number of variables with detailed information, giving new insights beyond what is already published in the field. Additional studies with other databases, larger cohorts and prospective designs are needed to corroborate and further investigate the findings reported here.
In summary, we found that males have a higher incidence of esophageal cancer and its two main subtypes (EAC and SCC), but none of the comprehensive set of variables analyzed was strongly or uniquely correlated with this sex difference in incidence, nor were they associated with a sex difference in survival.