Prevalence of comorbidities and their impact on survival among older adults with the five most common cancers in Taiwan: a population study

Because of the cancer incidence increase and population aging in Taiwan, we aimed to assess the cancer prevalence, to summarize the comorbidities of older patients with the five most common cancers (i.e., breast, colorectal, liver, lung, and oral), and to develop a Taiwan cancer comorbidity index (TCCI) for studying their actual prognosis. The linkage of the Taiwan Cancer Registry, Cause of Death Database, and National Health Insurance Research Database was used. We followed the standard statistical learning steps to obtain a survival model with good discriminatory accuracy in predicting death due to noncancer causes, from which we obtained the TCCI and defined comorbidity levels. We reported the actual prognosis by age, stage, and comorbidity level. In Taiwan, cancer prevalence nearly doubled in 2004–2014, and comorbidities were common among older patients. Stage was the major predictor of patients' actual prognoses. For localized and regional breast, colorectal, and oral cancers, comorbidities correlated with noncancer-related deaths. Compared with the US, the chances of dying from comorbidities in Taiwan were lower and the chances of dying from cancer were higher for breast, colorectal, and male lung cancers. These actual prognoses could help clinicians and patients in treatment decision-making and help policymakers in resource planning.


Study population. This study was based on the linkage of the Taiwan Cancer Registry (TCR), National Health Insurance Research Database (NHIRD), and Cause of Death Database (TCOD); they were used in our earlier studies [30][31][32].
The TCR collects information on patients with primary cancers at all hospitals in Taiwan with 50 or more beds. The quality of the TCR is improving and was reviewed previously 22,33 . The TCR included 1,934,198 records for 1979-2014, with one record for each primary cancer. After basic data checks and cleaning using birth date and sex, 1,852,694 cancer cases involving 1,699,907 patients were included.
Taiwan's National Health Insurance (NHI) program (implemented in 1995 by the NHI Administration) provides compulsory universal health insurance and covers all health care services for more than 99% of Taiwan's population. It is characterized by good accessibility, short waiting times, and low cost, among others; however, problems with the system include poor gatekeeping of specialist services; patients can self-schedule hospital visits without a general practitioner's referral 34 . The NHIRD is built on data from this program, and we used the 2000-2015 data in this study. While data in the NHIRD have proven to be valuable resources for health science research, there are limitations 35,36 .
TCOD has included the unique underlying cause, not the multiple cause, of death for individuals in Taiwan since 1971 and used the national identification card number (NICN) since 1985 37 . The original TCOD contained 4,191,373 individual records for 1985-2016. Eliminating inconsistencies in the NICN, sex, birth date, death date, and cause of death yielded 4,054,632 unique death records for this period.
Using the linked datasets, we studied the prevalence of cancer survivors, including patients with invasive or noninvasive cancers. We considered an individual a cancer survivor at the end of 2014 if he/she was included in the TCR for the period 1979-2014, not included in the TCOD for the period 1985-2014, but included in the NHIRD for the period 2000-2014; thus, there might be some minor underestimation of cancer prevalence due to cancers diagnosed before 1979. For each cancer survivor, the time interval from his/her cancer diagnosis to the end of 2014 was his/her survival time. Similarly, we considered cancer prevalence in 2004 and 2009 in this study.
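The survivor definition above is a set operation on the three linked sources, and a minimal sketch may make it concrete. The function names, the use of plain identifier sets, and the 365.25-day year are illustrative assumptions and are not taken from the study's code.

```python
from datetime import date

def prevalent_survivors(tcr_ids, tcod_ids, nhird_ids):
    """Identifiers counted as cancer survivors at the index date.

    tcr_ids:   patients in the cancer registry for the registry period
    tcod_ids:  patients with a death record up to the index date
    nhird_ids: patients appearing in the insurance database
    (All three arguments are hypothetical Python sets of patient identifiers.)
    """
    # In the registry, not recorded as dead, and present in the insurance data.
    return (set(tcr_ids) - set(tcod_ids)) & set(nhird_ids)

def survival_years(diagnosis_date, index_date=date(2014, 12, 31)):
    """Time from diagnosis to the index date, in years (365.25-day year assumed)."""
    return (index_date - diagnosis_date).days / 365.25
```

Changing `index_date` to the end of 2004 or 2009 gives the corresponding quantities for the other two prevalence years.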
For comorbidity, we studied patients whose first invasive primary cancer was breast, colorectal, liver, lung, or oral cancer and was diagnosed between 2004 and 2014. Table S1 presents the ICD9 codes for these cancers. From 2004, 27 hospitals were required, and from 2007 all participating hospitals were required, to use the TCR long form to collect more information on patients with these cancers, including the stage. Table S2 reports the numbers of patients with these five cancers in the TCR. We refer to the linkage of the TCR from 2004 to 2014, the NHIRD from 2000 to 2014, and the TCOD from 2004 to 2016, restricted to patients whose first primary cancer was one of the five cancers, as the Linked Dataset. Figure 1 outlines the workflow of this comorbidity study. After forming the Linked Dataset, we determined the time interval prior to cancer diagnosis for comorbidity assessment and formed training sets, validation sets, and test sets for TCCI development, validation, and evaluation, which are described in detail in Fig. 2. Details are given below.
Noncancer death. This study used the Surveillance, Epidemiology, and End Results Program (SEER) cause-specific death classification algorithm to define noncancer-related death 38. This algorithm has been essential for cancer survivorship studies [9][10][11]20. Our earlier publication used this algorithm and showed that cause-specific survival and relative survival for common cancers in Taiwan are comparable, thus suggesting the validity of this SEER algorithm in Taiwan 32.
Comorbidity definition. The comorbidities considered in this study were mainly adopted from those in the NCICI, with two modifications. First, we modified mild liver disease by incorporating viral hepatitis B and C to reflect their high prevalence in Taiwan and their roles in cancer development 39,40. Second, we included hypertension without and with complications, which Stedman and colleagues mentioned when discussing the limitations of their study 20 and whose associations with cancer have been widely studied; see, for example, Seretis and colleagues and Dima and colleagues 41,42. We considered 18 comorbidities in this study; they are shown in Table 1, and their ICD9 codes are shown in Table S3. They included 16 of the 19 comorbidities defining the Charlson Comorbidity Index but excluded solid cancer, leukemia, and lymphoma because of our study focus 43.
Intervals defining comorbidity. Because comorbidity assessment depends on the time interval before cancer diagnosis, we followed Maringe and colleagues to determine the interval for comorbidity assessment for this study 44. Longer intervals for comorbidity assessment provide more information for each patient but include fewer patients in the study because the NHIRD started in 1995 and adopted ICD9 exclusively in 2000. With this in mind, Table S4 presents the hazard ratios from Cox regression models that each included a single comorbidity as the covariate of interest and used time from cancer diagnosis to noncancer-related death as the outcome, based on all patients with colorectal cancer in the TCR from 2006-2014. Tables S4-1 and S4-2 concern patients aged 15-64 and 65-94, respectively. We considered three comorbidity assessment time intervals (30, 54, and 78 months before the cancer diagnosis) and explored the sex-specific comorbidity effect.
Following Maringe and colleagues, we excluded comorbidities that appeared only in the six months immediately before the cancer diagnosis, to reduce the number of comorbidities caused by the cancer itself. A patient was said to have a specific comorbidity if their inpatient files contained a diagnosis of this comorbidity within the earlier 24, 48, or 72 months or if their outpatient files contained two diagnoses of this comorbidity in these periods with a gap > 1 month. The Supplementary Materials and Table S4 give more details in this regard. For the age group 65-94, Table S4-2 shows that for the vast majority of the comorbidities, the differences in hazard ratios were small among these three assessment periods but not small between sexes; thus, we decided to consider sex-specific assessment with a 30-month period to include more patients in the study.
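The comorbidity rule above (one inpatient diagnosis, or two outpatient diagnoses more than one month apart, within the assessment window that ends six months before the cancer diagnosis) can be sketched as follows. This is a minimal illustration for the 30-month choice: the day counts used to approximate months, the function name, and the input layout are assumptions, not the study's implementation.

```python
from datetime import date, timedelta

EXCLUSION_DAYS = 183  # about 6 months right before diagnosis (assumed day count)
WINDOW_DAYS = 730     # about 24 months of look-back before that (assumed day count)
MIN_GAP_DAYS = 30     # "gap > 1 month" approximated as more than 30 days (assumed)

def has_comorbidity(diagnosis_date, inpatient_dates, outpatient_dates):
    """Return True if the claim dates for one comorbidity satisfy the rule."""
    window_end = diagnosis_date - timedelta(days=EXCLUSION_DAYS)
    window_start = window_end - timedelta(days=WINDOW_DAYS)

    def in_window(d):
        return window_start <= d < window_end

    # A single inpatient diagnosis inside the window is sufficient.
    if any(in_window(d) for d in inpatient_dates):
        return True

    # Otherwise require two outpatient diagnoses more than MIN_GAP_DAYS apart;
    # such a pair exists exactly when the earliest and latest visits qualify.
    visits = sorted(d for d in outpatient_dates if in_window(d))
    return len(visits) >= 2 and (visits[-1] - visits[0]).days > MIN_GAP_DAYS
```

Setting `WINDOW_DAYS` to roughly 48 or 72 months would give the two longer assessment intervals considered in Table S4.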
Taiwan cancer comorbidity index. We followed the standard three steps in statistical learning (i.e., model training, model selection, and model assessment) to obtain a survival model with good discriminatory accuracy in predicting death due to noncancer causes; see, for example, Chapter 7 of Hastie, Tibshirani, and Friedman 45 . We acquired data from the NHIRD and TCOD for each patient with the five studied cancers in the TCR during 2004-2014, aged between 15 and 94 years. This dataset was called "Five-Cancer". We report in Tables S5-1, for those aged 15-64, and S5-2, for those aged 65-94, the numbers and percentages of these patients who had any of the 18 comorbidities and who were alive at the end of 2016. Five-Cancer was randomly divided into three disjoint parts: one half was the training set, one quarter was the validation set, and the remaining quarter was the test set. Five-Cancer had a total of 501,572 patients, as shown in Table S2.
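The random division into one half, one quarter, and one quarter can be sketched as below. The function name, the fixed seed, and the handling of remainders are illustrative assumptions; the actual construction is the one described in Fig. 2 and the Supplementary Methods.

```python
import random

def split_half_quarter_quarter(patient_ids, seed=0):
    """Randomly split identifiers into disjoint training, validation, and test sets."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible (assumption)
    rng.shuffle(ids)
    n_train = len(ids) // 2
    n_valid = len(ids) // 4
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # the remainder, about one quarter
    return train, valid, test
```

If each patient is assigned once in this way and subsets by age, sex, or cancer site are formed afterwards by filtering, a patient stays in the same part in every subset, which is one simple way to keep nested subsets compatible.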
Using the Five-Cancer training set, we fitted Cox regression models with time from diagnosis to noncancer-related death as the outcome. Censoring events included cancer-associated deaths or loss to follow-up as per the linkage of the TCR, TCOD, and NHIRD. Table S6-1 presents the estimated coefficients of the Cox model that included as covariates all 18 comorbidities and the pairwise interactions among the 11 most common comorbidities ("Main18&11"). Three of the estimated main effects of the comorbidities were negative. We deleted the comorbidities with negative coefficients altogether and refitted the model until all the main effects were positive; whenever a comorbidity was deleted, interaction terms involving it were also deleted. The resulting model was termed "Main18&11.ND", for which Table S6-2 presents the hazard ratios and coefficients. We note that the negative coefficients of the main effects may result from interactions or residual confounders. The motivation for deleting them was to increase interpretability, although doing so does not address the possible issue of bias.
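The delete-and-refit procedure can be sketched as a short loop. Here `fit` is a hypothetical user-supplied callable that fits a Cox model on the given terms and returns a mapping from each term to its estimated coefficient; it stands in for whatever fitting routine is used and is not a real library function. Main effects are represented as strings and interactions as pairs of strings, which is also an assumption of this sketch.

```python
def drop_negative_main_effects(main_terms, interaction_terms, fit):
    """Refit until no main effect has a negative coefficient.

    main_terms:        list of comorbidity names
    interaction_terms: list of (name_a, name_b) tuples
    fit:               hypothetical callable, fit(terms) -> {term: coefficient}
    """
    mains = list(main_terms)
    inters = list(interaction_terms)
    while True:
        coefs = fit(mains + inters)
        negative = {m for m in mains if coefs[m] < 0}
        if not negative:
            return mains, inters, coefs
        # Remove the offending comorbidities altogether ...
        mains = [m for m in mains if m not in negative]
        # ... together with every interaction term that involves one of them.
        inters = [pair for pair in inters if not (set(pair) & negative)]
```

The loop terminates because each pass either returns or removes at least one main effect.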
A patient's TCCI in this study was defined to be the sum of the coefficients in the Cox model Main18&11.ND corresponding to the patient's comorbid conditions and interaction terms. We chose this for its excellent performance and simplicity.
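As an illustration of this definition, the sketch below sums the main-effect weights of the conditions a patient has and the weights of the interaction terms for which both conditions are present. The weights in the usage example are made-up placeholders and are not the values of Table 2 or Table S6-2.

```python
def tcci(conditions, main_weights, interaction_weights):
    """Sum of model coefficients matching a patient's comorbid conditions.

    conditions:          iterable of condition names the patient has
    main_weights:        {condition: coefficient}
    interaction_weights: {(condition_a, condition_b): coefficient}
    """
    present = set(conditions)
    score = sum(w for c, w in main_weights.items() if c in present)
    # An interaction term contributes only when both of its conditions are present.
    score += sum(w for (a, b), w in interaction_weights.items()
                 if a in present and b in present)
    return score

# Hypothetical weights, for illustration only.
example_main = {"diabetes": 0.3, "CHF": 0.6}
example_inter = {("diabetes", "CHF"): -0.1}
example_score = tcci({"diabetes", "CHF"}, example_main, example_inter)
```

Conditions that were deleted from the model simply have no entry in the weight tables and so contribute nothing to the score.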
In fact, we systematically considered 24 subsets of Five-Cancer defined by age, sex, and cancer site and divided each of them randomly into a training set, a validation set, and a test set. These divisions were compatible among these 24 subsets in the sense that if one subset was included in another subset, the training set, validation set, and test set of the former were included in the counterparts of the latter. The formation of these sets is detailed in Fig. 2. We fitted several Cox regression models to each of the 24 training sets and computed the time-dependent area under the receiver operating characteristic curve (AUC) at 1 year, 2 years, and 5 years from diagnosis in each validation set. This AUC, a predictive accuracy measure, is the time-dependent extension of the analysis by Heagerty and Zheng 46. Table S7 presents the 5-year AUCs evaluated in each of the sex- and site-specific validation sets of cancer patients aged 65-94. According to Tables S7-1-S7-9, the more intuitive Cox model Main18&11.ND trained by Five-Cancer generally performed very well across all these validation sets. The Supplementary Methods detail the construction of the training sets, validation sets, and test sets and the Cox models considered and assessed.
TCCI and comorbidity levels. Based on the TCCI and clinical judgment, we followed Cho and colleagues and Edwards and colleagues to consider three comorbidity levels 9,11. Patients with none of the 18 comorbidities were coded as 0. Patients were considered to have a severe comorbidity and coded as 2 if their TCCIs were > 0.66 or they had severe illnesses, such as COPD, liver dysfunction, chronic renal failure, dementia, or congestive heart failure, which frequently lead to organ failure or systemic dysfunction and usually require adjusting the cancer treatment. We note that according to NCICI, patients with exactly one comorbidity are coded 2 if and only if they have NCI index weights > 0.66 9; this statement also held true in this study, except for those with COPD only. Patients coded as neither 0 nor 2 were coded as 1 and said to have a low/moderate comorbidity. Note that the cutoff of 0.66 was coincidentally the same as that of Edwards et al. 9. Note also that AMI is usually not excluded from cancer clinical trials unless it is within 12 months prior to randomization; see, for example, the protocol in Krop and colleagues 47. This may be the reason that patients were not coded 2 if AMI was the only comorbidity. In evaluating the effect of targeted therapy on lung cancer patients, we studied patients aged 30-94 years with distant stage by histology, although TCCI was not evaluated for younger patients.
Statistical analysis. In this study, all Cox models, both for choosing the time interval for comorbidity assessment and for constructing the TCCI based on the training sets, were fitted using the R package 'survival'. The time-dependent AUCs for the Cox models based on the validation datasets were obtained using the R package 'risksetROC', studied by Heagerty and Zheng 46. The actual prognoses were computed using the R package 'cmprsk' 48, which estimates the subdistributions of a competing risk.
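The three comorbidity levels described under "TCCI and comorbidity levels" can be sketched as a small decision rule. The set of severe conditions below copies the examples named in the text, which are introduced with "such as" and may not be exhaustive; the condition labels and the function name are assumptions for illustration.

```python
# Examples of severe illnesses named in the text (possibly not exhaustive).
SEVERE_CONDITIONS = {
    "COPD",
    "liver dysfunction",
    "chronic renal failure",
    "dementia",
    "congestive heart failure",
}
TCCI_CUTOFF = 0.66

def comorbidity_level(conditions, tcci_score):
    """Return 0 (none), 1 (low/moderate), or 2 (severe)."""
    present = set(conditions)
    if not present:
        return 0  # none of the 18 comorbidities
    if tcci_score > TCCI_CUTOFF or present & SEVERE_CONDITIONS:
        return 2  # high index or at least one severe illness
    return 1      # everything else
```

Under this rule a patient whose only comorbidity is AMI is coded 1 as long as the index does not exceed the cutoff, matching the remark about AMI in the text.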
All methods were performed in accordance with the relevant guidelines and regulations of Scientific Reports. This study used only datasets for which all personal information had been deidentified by the Health and Welfare Data Science Center, Ministry of Health and Welfare of Taiwan. There was no patient contact for the study; therefore, there was no patient consent process.

Results
A rapid increase in the number of cancer survivors. Figure 3 shows that the total number of cancer survivors increased drastically, from 314,107 in 2004 to 610,712 in 2014, and the number of long-term survivors who survived > 15 years increased from 29,953 in 2004 to 115,021 in 2014, a fourfold increase, which was much faster than that in the US 49. Figure S1 presents the corresponding numbers for each of the five most common cancers, indicating that breast cancer in women had more long-term survivors than the other four cancers. Thus, cancer survivorship warrants immediate attention in Taiwan.

Prevalence of comorbidities.
Among the elderly patients in Taiwan, hypertension, diabetes, ulcer disease, COPD, and CVD were the most common comorbidities, with a prevalence higher than 10%; liver disease, CHF, and CRF were the next most common comorbidities, with a prevalence of 5-10%. There were differences in comorbidity prevalence compared to those in the US and UK 9,20,50. For example, CHF and PVD had higher ranks in the US and UK, and liver disease and ulcer disease had higher ranks in Taiwan. In Taiwan, the noncancer and oral cancer cohorts had the fewest comorbidities; patients with liver cancers had the most comorbidities, and 25%-28% of patients with breast, colorectal and lung cancer had no comorbidities. In the US, breast cancer had a similar comorbidity prevalence to the noncancer cohort, and lung cancer had a much higher comorbidity prevalence 9. However, COPD was most prevalent in patients with lung cancer in both the US and Taiwan. We present the weights for computing the TCCI for each patient in Table 2, which is the same as Table S6-2 and was used to define three levels of comorbidity. Based on these, Table 3 reports the numbers and percentages of patients by stage, age, and comorbidity levels for each cancer.
Here, the cancer stage follows the SEER summary stage described in Table S8, which converts the stage at diagnosis from the tumor, node, metastasis (TNM) staging system to the SEER summary stage. Tables S9-S11 provide additional information about Table 3. Table 3 shows that comorbidity prevalence increased with age; breast cancer, colorectal cancer, and liver cancer had more patients diagnosed with early stages, oral cancer had more in regional stage, and lung cancer had the majority in late stage.
Figure 4 presents the 5-year probabilities of dying from cancer, dying from competing causes, and survival stratified by sex, stage, age, and comorbidity level for the five cancers. Table S12 reports their actual values and the corresponding 1-year and 2-year probabilities; Figure S2 presents the corresponding figures. Strata with fewer than 100/50 patients are marked with */+. Among patients with localized and regional stage cancers, those with older age or severe comorbidity had lower survival rates, mainly due to increased deaths from competing causes. For patients with distant-stage cancers, age and comorbidities had a reduced effect, and the chances of dying from cancer were high. Although comorbidities affected both cancer-related and noncancer-related deaths, the effect was larger for noncancer-related deaths; this observation was in line with an Australian study of colorectal cancer, which included patients aged 18-80+ 51. Stage had a much larger effect on survival than age or comorbidity. Thus, the impact of age, comorbidity, and stage on the actual prognosis was generally similar to that reported in the US 10.
Despite these similarities, there were considerable differences between Taiwan and the US. In Taiwan, patients with local or regional breast, colorectal, and lung cancers had lower chances of dying from competing causes and higher chances of dying from cancer, except for women with lung cancer. Figure 4 shows that patients with liver and lung cancer had the highest probabilities of cancer-related death, and their comorbidities had smaller influences on death. Figure 4 also shows that patients with oral cancers had a better prognosis than those with liver and lung cancers when there were enough patients in the strata.
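The probabilities of dying from cancer and from competing causes by a given horizon, as shown in Figure 4, are cumulative incidences under competing risks. The sketch below is a plain implementation of the standard nonparametric (Aalen-Johansen type) estimator, written for illustration only; it is not the 'cmprsk' code used in the study, and the event coding (0 censored, 1 cancer death, 2 noncancer death) and function name are assumptions.

```python
def cumulative_incidence(times, events, horizon):
    """Nonparametric cumulative incidence of each cause by `horizon`.

    times:  follow-up times
    events: 0 = censored, 1 = cancer death, 2 = noncancer death (assumed coding)
    Returns {1: P(cancer death by horizon), 2: P(noncancer death by horizon)}.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0            # probability of no event just before the current time
    cif = {1: 0.0, 2: 0.0}
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = {1: 0, 2: 0}
        n_here = 0              # everyone leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1] in deaths:
                deaths[data[i][1]] += 1
            n_here += 1
            i += 1
        for cause in cif:
            # Chance of being event-free until t, then dying of this cause at t.
            cif[cause] += event_free * deaths[cause] / n_at_risk
        event_free *= 1 - (deaths[1] + deaths[2]) / n_at_risk
        n_at_risk -= n_here
    return cif
```

The estimated probability of being alive at the horizon is one minus the sum of the two cumulative incidences, so the three quantities shown for each stratum add up to one.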

Lung cancer subtypes.
For patients with distant lung cancer and aged 30-94, Fig. 5 shows that the overall survival for lung adenocarcinoma (ADC) was better than that for squamous cell carcinoma (SCC) of the lung and that for small cell lung cancer (SCLC); the difference was most obvious for one-year overall survival. It also shows that for lung ADC, the overall survival was better in 2011-2014 than in 2004-2010, and the difference was also most obvious for one-year overall survival. It also shows that the one-year overall survival was best for women with lung ADC, next for men with lung ADC, and worst for men with lung SCC. Supplementary  Figures S3-S5 include other prognoses. Tables S13-S14 present the corresponding point estimates, confidence intervals, and other related statistics.
All of these findings are consistent with the 2011 Taiwan NHI Program policy that reimburses patients with late-stage lung ADC who have EGFR mutations for tyrosine kinase inhibitors (TKIs); EGFR mutations are common among never-smoking female lung ADC patients in Taiwan 52 .
The above observations from the 1-year probabilities became less prominent for the 2-year probabilities and nearly vanished for the 5-year probabilities (Figures S2-S3). This may reflect the palliative nature of the TKIs.
Table S7 reports the AUCs regarding the 5-year survival of noncancer deaths for indices based on different models and training sets. It is interesting to see from Tables S7-1-S7-9 that the AUCs did not change much by deleting the comorbidities with negative coefficients but did decrease clearly by backward stepwise variable selection, where one of the coefficients was still negative; see Table S6-3. They also varied little with training sets. Table S7- [...] higher than those in the US 20. The observation that male cancer patients had smaller AUCs might be caused by more deaths due to lung cancer.

Discussion
This study reports that in Taiwan, the number of cancer survivors increased rapidly, comorbidities among older patients with cancer were common, and the comorbidity profile among Taiwanese older patients differed from those in the US and UK. Using the three comorbidity levels defined by the TCCI and clinical judgment, we reported the actual prognoses of patients with the five most common cancers, indicating that stage was the major predictor of patients' actual prognoses but for localized and regional breast, colorectal, and oral cancers, comorbidities correlated with noncancer-related deaths. Compared with the US, the chances of dying from comorbidities in Taiwan were lower, and the chances of dying from cancer were higher for breast, colorectal, and male lung cancers. These findings highlight the challenge of coordinating multidisciplinary cancer treatment and survivorship care and prompt future studies to determine whether cancer patients in Taiwan receive similar treatments for their comorbidities as their noncancer counterparts and whether their cancer treatments are unnecessarily modified.
Here are some remarks on the methodology of the TCCI. Although we modified the condition of mild liver disease by adding viral hepatitis B and C and included hypertension to reflect the high prevalence of these diseases in Taiwan, all the remaining comorbidities were adopted from the CCI and NCICI. While all these comorbidities are well established, we found that some of them had negative coefficients for their main effects in the resulting Cox models, suggesting the existence of correlation, interaction, or residual confounders among them. To make the model more intuitive and facilitate communication, we considered the procedure to eliminate the comorbidities with negative coefficients. Tables S7-1-S7-9 show that this procedure resulted in a more intuitive model without sacrificing performance.
These tables also suggest that stepwise variable selection may suffer severe disadvantages. All the above are in line with those discussed in Steyerberg 53 and Harrell 54. Because we strictly followed standard model development, selection, and assessment procedures, with the test sets held back until the final assessment, and because the AUCs reported in Table S7-10 were based on these test sets, the performance of the TCCI is likely reliable. Finally, we note that in the model selection step, we chose Main18&11, instead of Main11, because the former performed better in 7 of the 9 cancers, although only slightly.
Compared with the SEER studies, the chances of dying from competing causes are lower and those of dying from cancer are higher in Taiwan for local and regional breast, colorectal and male lung cancers 9 . This seems to be in line with the results based on net survival. Indeed, a comparison between the 5-year cancer cause-specific survival in Taiwan during 2000-2010, based on Table 3 in Chien and colleagues 32 , and that in the US SEER study during 1992-2004, based on Table 3 in Howlader and colleagues 38 , suggests that cancer survival of the breast and the colon and rectum in Taiwan seemed to be poorer than those in the US.
Comparing Table 2 with Stedman et al. 20 suggests that COPD and chronic renal failure (CRF) exhibited the largest difference in hazard ratios. While the large hazard ratio for CRF might reflect the serious renal disease problem in Taiwan 55 , further studies are needed to understand the low hazard ratios for COPD in Taiwan. Because tobacco smoking is an important risk factor for both lung cancer and COPD and a large proportion of lung cancer patients are never-smokers in Taiwan 52,56 , it might be worthwhile to study the prognosis of lung cancer by smoking status.
A recent study suggested that targeted therapies may have contributed to the reduced mortality from non-small-cell lung cancer in the US population 57,58. Our results on the actual prognoses for patients with distant-stage disease provide additional population-level support for the positive effects of recent advances in lung cancer treatment on patient outcomes, reflecting the 2011 reimbursement policy of the Taiwan NHI program. Figure 4 indicates that for localized liver cancer, 5-year overall survival rates were better for those at comorbidity level 1 than for those without comorbidities. This might be related to the 2003 NHI policy that reimburses antiviral medications 59 and suggests a future study that considers the actual prognoses of patients with liver cancer separately for those with and without hepatitis viral infections.
A major strength of this study is that the TCCI was developed and evaluated in a large dataset by following standard statistical learning methods; in addition, the comorbid conditions were selected from a literature review, and the comorbidity assessment period was decided empirically. Table S7 exemplifies the advantage of a large training set in terms of predictive performance.
There are some limitations to this study. The comorbid conditions, assessed by the administrative dataset NHIRD, do not reflect their severity. Another limitation is that including only comorbidities with positive main effects in defining TCCI promotes communication but may cause some bias.
Effects of comorbidities on actual prognoses have been studied in Australia, England, and the US 51,60,61. Although there are underlying similarities between our study and theirs, comparisons suggest that we should take into account additional risk factors, such as treatment and exposures, to obtain more precise prognoses. In particular, the role of socioeconomic status could be explored 50. It is also desirable to improve the TCCI by including more comorbid conditions and by basing it on cohorts covering more cancer sites, for other uses in geriatric oncology 62.

Conclusions
The rapid increase in long-term cancer survivors and the widespread comorbidities among older cancer patients in Taiwan demand attention to their actual prognoses. In addition to providing information for patients and clinicians regarding treatment decisions and for policymakers regarding resource allocation, this study proposed TCCI and suggested important future research topics, which may also be relevant to geriatric oncology in other parts of the world.

Data availability
All the datasets used in this study were provided by, and all the analyses were carried out in, one of the secure labs of the Health and Welfare Data Science Center, Ministry of Health and Welfare, Taiwan. All the data are de-identified. For information on how to apply for access to these datasets, please follow the instructions at https://www.apre.mohw.gov.tw/. To request the data from this study, please contact the corresponding author, Dr. I-Shou Chang (ischang@nhri.org.tw), for more detailed information.