A national database analysis for factors associated with thyroid cancer occurrence

To examine the associations between thyroid cancer and environmental factors, we analyzed the nationally representative sample cohort of the Korean National Health Insurance Service database from 2006 to 2015. The cohort was categorized according to age, body mass index (BMI), income, residential area, frequency of exercise, frequency of alcohol drinking, diet, presence or absence of hyperthyroidism, presence or absence of hypothyroidism, and smoking status. Among men, age ≥ 55 years (HR 0.68, 95% CI 0.53–0.88), lower income (0.57, 0.40–0.80), and current smoking (0.69, 0.55–0.85) were associated with lower thyroid cancer occurrence, whereas BMI ≥ 25 kg/m² (1.51, 1.26–1.82), higher income (1.44, 1.19–1.76), urban residence (1.24, 1.03–1.49), and presence of hypothyroidism (3.31, 2.38–4.61) or hyperthyroidism (2.46, 1.75–3.46) were associated with higher occurrence. Among women, age ≥ 55 years (0.63, 0.56–0.71), moderate alcohol drinking (0.87, 0.77–0.99), and current smoking (0.56, 0.37–0.85) were associated with lower thyroid cancer occurrence, whereas BMI ≥ 25 kg/m² (1.41, 1.26–1.57), frequent exercise (1.21, 1.07–1.36), higher income (1.18, 1.06–1.32), urban residence (1.17, 1.06–1.29), and presence of hypothyroidism (1.60, 1.40–1.82) or hyperthyroidism (1.38, 1.19–1.61) were associated with higher occurrence. In conclusion, age ≥ 55 years and current smoking were associated with lower thyroid cancer occurrence, while BMI ≥ 25 kg/m², higher income, urban residence, hypothyroidism, and hyperthyroidism were associated with higher occurrence in both men and women.
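The hazard ratios above are reported with 95% CIs only. As a rough sketch, assuming each interval is a Wald interval that is symmetric on the log scale (an assumption; the text does not say how the intervals were computed), the standard error of log(HR) and an approximate two-sided p-value can be back-calculated from the published numbers. The function names are illustrative.

```python
import math

def log_hr_se_from_ci(lower: float, upper: float, z: float = 1.959964) -> float:
    """Approximate SE of log(HR), assuming a Wald CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def wald_p_value(hr: float, se: float) -> float:
    """Two-sided p-value for H0: HR = 1 under the normal approximation."""
    z = abs(math.log(hr)) / se
    # erfc(z / sqrt(2)) equals 2 * (1 - Phi(z)) for the standard normal CDF Phi.
    return math.erfc(z / math.sqrt(2))

# Current smoking among men: HR 0.69 (95% CI 0.55-0.85)
se = log_hr_se_from_ci(0.55, 0.85)
p = wald_p_value(0.69, se)
print(f"SE(log HR) = {se:.3f}, approximate p = {p:.4f}")
```

Because the published HR and CI limits are rounded, the recovered values are approximations only.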

Thyroid cancer is the most common endocrine malignancy worldwide, and its incidence has increased globally [1–4]. However, apart from exposure to radiation during childhood, the causes of thyroid cancer have not been elucidated [1]. This contrasts with the well-documented causal effects of lifestyle or environmental factors, such as smoking, alcohol drinking, and a high-sodium diet, on cancers of the lung, liver, and stomach, respectively. Many environmental factors have been proposed to be associated with thyroid cancer occurrence; however, their roles remain controversial [1].
For example, iodine uptake has been linked with the development of thyroid cancer; however, the evidence is inconclusive [5,6]. Likewise, the reported associations between thyroid cancer and environmental factors, such as obesity, physical activity, socio-economic status (SES), alcohol drinking, and smoking, differ among studies, many of which relied on small samples and biased population selection [7–10]. Moreover, these studies investigated the association of each factor with thyroid cancer separately and thus did not account for confounding among these factors.
In recent years, large-scale national health insurance databases have become publicly available in several countries, including South Korea [11,12]. Analyzing such databases allows investigators not only to assess comprehensively the associations of multiple environmental factors with a disease and the confounding among them, but also to perform these analyses in a population that is representative of an entire nation. Therefore, we comprehensively analyzed the associations between thyroid cancer and environmental factors by extracting a thyroid cancer patient group and a non-cancer control group from the nationwide insurance database. Development of thyroid cancer according to smoking status was analyzed separately for each gender because of demographic differences (thyroid cancer is more common in women) that may suggest heterogeneity of thyroid cancer between the two genders, and because the variables may be distributed differently between the two genders. For example, social taboos against smoking by women may elicit different response patterns between the genders in health surveys (Table 2).
The percentage of current smokers was 43.6% among men and 2.7% among women. During 2,074,182.2 person-years of follow-up, 2106 participants developed thyroid cancer and underwent surgery for it (incidence rate, 1.01 per 1000 person-years). For men, only current smoking was significantly associated with a decreased thyroid cancer risk in the multivariable adjusted model (HR 0.69, 95% CI 0.55–0.85). Former smoking was not associated with an increased risk of developing thyroid cancer compared with never smoking. Likewise, current smoking was associated with a decreased risk of thyroid cancer among women in the multivariable adjusted model, whereas former smoking was not.
Scientific Reports | (2020) 10:17791 | https://doi.org/10.1038/s41598-020-74546-3 www.nature.com/scientificreports/
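The incidence rate quoted above is the number of incident cases divided by the total person-years of follow-up, scaled to 1000 person-years. A minimal sketch of that calculation using the reported figures (the function name is illustrative, not from the source):

```python
def incidence_per_1000(cases: int, person_years: float) -> float:
    """Incidence density: incident cases per 1000 person-years of observation."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000 * cases / person_years

# 2106 incident cases over 2,074,182.2 person-years of follow-up
rate = incidence_per_1000(2106, 2_074_182.2)
print(f"{rate:.3f} per 1000 person-years")
```

The same function applies unchanged to each smoking-status group when its own case count and person-years are supplied.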
The multivariable adjusted model demonstrated an association between pack-years of smoking and the development of thyroid cancer among men (Supplement 3). The risk of developing thyroid cancer was significantly lower in each pack-year group than in never smokers. However, pack-years of smoking was not associated with thyroid cancer occurrence among women.

Discussion
According to our study, younger age, BMI ≥ 25 kg/m², higher income, urban residence, and presence of hypothyroidism or hyperthyroidism were associated with higher thyroid cancer occurrence in both genders, and current smoking was associated with lower occurrence in both genders, whereas less physical activity and moderate alcohol drinking were associated with lower thyroid cancer occurrence only in women. Pack-years of smoking was associated with a decreased risk of thyroid cancer in men, but not in women.
This study is the first to comprehensively analyze the effects of lifestyle and environmental factors on thyroid cancer occurrence using a national database. Previous national database studies investigated the impact of individual environmental factors on thyroid cancer occurrence. Using the National Health Insurance database of Taiwan, two separate groups concluded that the presence of hyperthyroidism, higher SES, and urban residence are associated with an increased risk of thyroid cancer occurrence [12,13]. Likewise, a 7-year follow-up study of the KNHIS database demonstrated that higher BMI is associated with an increased risk of thyroid cancer [11,14]. However, these studies investigated the effects of individual factors, whereas the present study conducted a multivariable adjusted analysis of epidemiologic factors to reduce confounding. Insurance databases, however, are designed for cost-claim analysis rather than research, which may raise questions about the validity of thyroid cancer diagnoses [15]. Nevertheless, because of reimbursement policies, diagnostic codes for cancer tend to be more accurate than those for benign conditions [16]. In addition, we strengthened diagnostic accuracy by requiring registration of multiple cancer codes (to exclude unconfirmed diagnoses) followed by a thyroid operation code. Therefore, the diagnosis of thyroid cancer in our study can be considered reliable.
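The case definition described above (multiple cancer-code claims followed by a thyroid operation code) can be expressed as a simple filter over one patient's claims. This is a minimal sketch under stated assumptions: the `Claim` record layout is hypothetical, `C73` (the ICD-10 code for malignant neoplasm of the thyroid gland) is assumed as the cancer code, `THY_OP` is a hypothetical placeholder for the operation code, and "multiple" is taken to mean at least two claims. The exact codes and thresholds used by the authors are not given in this text.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Claim:
    """Hypothetical claim record; real KNHIS field names are not given in the text."""
    claim_date: date
    code: str

CANCER_CODE = "C73"        # assumption: ICD-10 thyroid cancer code
OPERATION_CODE = "THY_OP"  # hypothetical placeholder for a thyroid operation code
MIN_CANCER_CLAIMS = 2      # assumption: "multiple" means at least two claims

def is_confirmed_case(claims: list[Claim]) -> bool:
    """True if there are >= MIN_CANCER_CLAIMS cancer-code claims and an
    operation-code claim on or after the first cancer-code claim."""
    cancer_dates = sorted(c.claim_date for c in claims if c.code == CANCER_CODE)
    if len(cancer_dates) < MIN_CANCER_CLAIMS:
        return False  # a single code may reflect an unconfirmed diagnosis
    first_cancer = cancer_dates[0]
    return any(c.code == OPERATION_CODE and c.claim_date >= first_cancer for c in claims)

patient = [
    Claim(date(2010, 3, 1), "C73"),
    Claim(date(2010, 3, 20), "C73"),
    Claim(date(2010, 4, 5), "THY_OP"),
]
print(is_confirmed_case(patient))  # True
```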
National database analysis is valuable because such databases represent an entire population or nation; therefore, their findings are more generalizable than those of individual studies. Our results are representative of the South Korean population and are in accordance with previous studies [14–18]. Young age, high SES, urban residence, obesity, and hypothyroidism have been linked to an increased risk of thyroid cancer occurrence in both national database and individual studies. Although young age is well documented to be associated with lower thyroid cancer mortality [19], it is not always associated with lower thyroid cancer occurrence. Multiple genetic alterations are found more frequently in younger patients [17]. For example, RET/PTC1 rearrangement, a key somatic genetic alteration in papillary thyroid cancer development, occurs more frequently in younger patients [18]. Such factors may underlie national database findings of an increased risk of lymph node metastasis from papillary thyroid cancer in younger patients [20].
The associations of thyroid cancer with higher SES and urban residence can be attributed to easier access to health care [21]. Individuals who reside in cities tend to earn more and to undergo more frequent medical checkups, which may explain the increase in thyroid cancer occurrence [22,23]. Likewise, obesity has been associated with a higher occurrence of thyroid cancer in many studies [11]. Although the underlying mechanism is not fully understood, insulin resistance or diabetes, which is associated with obesity, may be a risk factor for tumor development [24]. Hyperthyroidism and hypothyroidism have also been associated with thyroid cancer. Thyroid-stimulating antibody may affect proto-oncogenes such as RET and TRK during the development of thyroid cancer in hyperthyroidism, and an increased thyroid-stimulating hormone level may lead to nodule and cancer growth in patients with hypothyroidism [25]. Furthermore, frequent ultrasonography in these patients may increase the chance of detecting thyroid cancer [26]. The effect of smoking on thyroid cancer has been investigated extensively in many individual studies. In general, active cigarette smoking is one of the best-known causes of cancer. While some data support a carcinogenic effect of smoking in thyroid cancer [27], others demonstrate no significant effect [28,29], and some even indicate that cigarette smoking protects against thyroid cancer [30,31]. Subsequently, a meta-analysis demonstrated that current smokers are less likely to be diagnosed with thyroid cancer than non-smokers [7], and the prevailing view is consistent with this result. The theory that smoking exerts a protective effect against thyroid cancer is based on the finding that cigarette extracts exhibit properties similar to those of thyroid hormones and thus may act as partial agonists of the thyroid hormone receptor [32].
This may mimic a 'thyroid-stimulating hormone suppression effect', which would lead to decreased stimulation of the thyroid gland and, in turn, less tumorigenesis. Our study is the first to confirm this association using a national database, which strengthens the generalizability of the association between current smoking and decreased thyroid cancer occurrence.
Although national insurance databases have their strengths, they can reveal only associations, not causality, because of their observational nature. Such causal relationships must be demonstrated in individual studies investigating the mechanisms underlying thyroid carcinogenesis. Furthermore, clinical information, such as cancer stage, histological type of thyroid cancer (e.g., medullary or anaplastic cancer), and laboratory data (e.g., thyroid function test results), was not included in this study. In addition, the questionnaires completed by the participants did not include any information on the duration of the variables. This was further complicated by the fact that smoking was self-reported in two different formats (never/former/current vs. pack-years). Consequently, subgroup analyses using such clinical and temporal information could not be conducted. Finally, our data cannot explain the non-linear effect of smoking pack-years or its different effects in the two genders. Additional, meticulously designed studies are needed to elucidate the effect of smoking pack-years on thyroid cancer.
In conclusion, our study provides representative data on the environmental factors associated with thyroid cancer occurrence. Age ≥ 55 years and current smoking were associated with lower thyroid cancer occurrence, while BMI ≥ 25 kg/m², higher income, urban residence, hypothyroidism, and hyperthyroidism were associated with higher thyroid cancer occurrence in both men and women. Further individual studies focusing on the mechanism underlying thyroid carcinogenesis are needed to establish whether these associations are causal.

Methods
Database. Around 97% of the South Korean population is enrolled in national health insurance, and the KNHIS has publicly disclosed its national medical care claims database. This database contains reimbursement claims from all medical facilities in the nation, linked to personal data including SES, medical procedures, diagnostic codes, prescription drugs, information about the medical facility, inpatient and outpatient medical costs, and dental services. In 2006, 48,222,537 individuals held Korean citizenship and maintained health insurance membership. Among them, the KNHIS selected 1,021,208 individuals (approximately 2%) by proportionate stratified random sampling based on 2142 strata (defined by age, gender, residential area, and income level) to form the nationally representative sample database (national sample cohort). All of their medical claim information was prospectively collected up to 2015, and newborn cohorts were added annually to replace the deceased and to maintain the representativeness of the national sample cohort. Personal and sensitive information was replaced with anonymized identifiers. We obtained institutional review board approval before conducting this analysis (No. GCIRB2019-318).
decile, and lower class: 8th–10th decile), residential area (metropolitan cities and others), frequency of exercise (< 3 times per week and ≥ 3 times per week), alcohol drinking (never, < 3 times per week, and ≥ 3 times per week), diet (vegetarian, meat only, and both), and smoking status (never, former, and current smokers). Smoking pack-years was calculated as the number of packs smoked per day multiplied by the number of years of previous or current smoking among current smokers.
The duration of smoking was provided as a categorical variable; therefore, the median of each category was used to calculate pack-years, which were subsequently categorized as 0, < 10, 10–19.9, and ≥ 20 for men and 0, < 5, 5–9.9, and ≥ 10 for women (Table 3). The study endpoints were defined as all-cause mortality or the first incidence of thyroid cancer up to 2015.
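Two steps of the cohort construction and exposure coding described above can be sketched in code. The sketch is illustrative only and rests on assumptions not stated in the source. Proportionate stratified sampling is shown as simple random sampling without replacement within each stratum, with each stratum's sample size rounded to the nearest integer; the exact KNHIS procedure is not described in this text (1,021,208 / 48,222,537 ≈ 2.1%). The smoking-duration categories and representative values in `DURATION_YEARS` are hypothetical placeholders, with category midpoints standing in for the medians described in the text. Only the gender-specific pack-year cut points come from the text.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")

def proportionate_stratified_sample(
    population: Sequence[T],
    stratum_of: Callable[[T], Hashable],
    fraction: float,
    seed: int = 0,
) -> list[T]:
    """Draw the same sampling fraction independently within each stratum."""
    rng = random.Random(seed)
    strata: dict[Hashable, list[T]] = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample: list[T] = []
    for members in strata.values():
        k = round(len(members) * fraction)     # proportionate allocation
        sample.extend(rng.sample(members, k))  # without replacement
    return sample

# Hypothetical smoking-duration categories (years) -> representative value.
# Midpoints stand in for category medians; the open-ended top value is an assumption.
DURATION_YEARS = {"<5": 2.5, "5-9": 7.5, "10-19": 15.0, "20-29": 25.0, ">=30": 35.0}

# Pack-year cut points and labels from the text, by gender.
PACK_YEAR_GROUPS = {
    "M": ((10.0, 20.0), ("0", "<10", "10-19.9", ">=20")),
    "F": ((5.0, 10.0), ("0", "<5", "5-9.9", ">=10")),
}

def pack_years(packs_per_day: float, duration_category: str) -> float:
    """Packs smoked per day multiplied by the representative years of smoking."""
    return packs_per_day * DURATION_YEARS[duration_category]

def pack_year_group(py: float, sex: str) -> str:
    """Assign a pack-year value to its gender-specific category."""
    (low, high), labels = PACK_YEAR_GROUPS[sex]
    if py <= 0:
        return labels[0]
    if py < low:
        return labels[1]
    if py < high:
        return labels[2]
    return labels[3]

# Toy population of (id, sex, area); strata are the (sex, area) combinations.
people = [(i, "M" if i % 2 else "F", "urban" if i % 5 else "rural") for i in range(10_000)]
sample = proportionate_stratified_sample(people, lambda p: (p[1], p[2]), 0.02)
print(len(sample))                                         # 200
print(pack_year_group(pack_years(1.0, "10-19"), "M"))      # 10-19.9
print(pack_year_group(pack_years(0.5, "5-9"), "F"))        # <5
```

Keeping the cut points in one table makes the different male and female categorizations explicit and easy to audit.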
Statistical analysis. Independent variables were compared between the thyroid cancer and control groups using the χ²-test, and multivariable analysis was conducted with Cox regression. Because the incidence of thyroid cancer differs significantly by gender, analyses were performed separately for men and women. All variables were also compared according to smoking status using the χ²-test for categorical variables. For smoking, the incidence density (per 1000 person-years) was calculated as the number of incident cases divided by the total person-years of observation of all persons in each group. Multivariable Cox proportional hazards regression models were used to calculate the incidence, survival rate, hazard ratio (HR), and 95% confidence interval (CI) for smoking status and pack-years. The model was adjusted for age, BMI, exercise, income, residential area, hypothyroidism, hyperthyroidism, alcohol drinking, and diet (Table 1). All statistical analyses were performed using RStudio, version 1.0.136, with a two-sided significance level of 0.05.