Introduction

Thyroid cancer is globally the most common endocrine malignancy, and its incidence has increased worldwide1,2,3,4. However, the causes of thyroid cancer, apart from exposure to radiation during childhood, have not been elucidated1. This is in contrast with the well-documented causal effects of lifestyle or environmental factors such as smoking, alcohol drinking, and high-sodium diet on cancers of the lung, liver, and stomach, respectively. Many environmental factors have been proposed to be associated with thyroid cancer occurrence; however, this remains controversial1.

For example, iodine uptake has been linked with the development of thyroid cancer; however, the evidence is inconclusive5,6. Likewise, reported associations between thyroid cancer and environmental factors, such as obesity, physical activity, socio-economic status (SES), alcohol drinking, and smoking, differ among studies with small and biased population selection7,8,9,10. Moreover, these studies investigated the association of each factor with thyroid cancer separately and thus neglected the confounding influence among these factors.

In recent years, large-scale databases of national insurance systems have become publicly available in several countries, including South Korea11,12. Analyzing such databases allows investigators not only to comprehensively assess the associations of multiple environmental factors with a disease and their confounding influence, but also to perform such analyses using a population that is representative of an entire nation. Therefore, we comprehensively analyzed the associations between thyroid cancer and environmental factors by extracting a thyroid cancer patient group and a control non-cancer group from the nation-wide insurance database.

Results

Among the 1,021,208 individuals in the Korean National Health Insurance Service (KNHIS) national sample cohort database, 176,387 and 104,588 received health examinations in 2006 and 2007, respectively. During the selection process, 2106 eligible patients with thyroid cancer and 232,680 participants without a history of thyroid cancer were enrolled as cases and controls, respectively. The median follow-up periods for the case and control groups were 4.79 and 9.07 years, respectively. The baseline characteristics of the study population are shown in Table 1. The χ2-test indicated that all variables differed significantly between cases and controls (p < 0.05, Table 1).

Table 1 Baseline characteristics of the study population.

The results of the multivariable survival analysis for men and women are shown in Supplements 1 and 2, respectively. Age ≥ 55 years (HR 0.68, 95% CI 0.53–0.88), lower income (HR 0.57, 95% CI 0.40–0.80), and current smoking (HR 0.69, 95% CI 0.55–0.85) were associated with lower thyroid cancer occurrence among men. BMI ≥ 25 kg/m2 (HR 1.51, 95% CI 1.26–1.82), higher income (HR 1.44, 95% CI 1.19–1.76), urban residence (HR 1.24, 95% CI 1.03–1.49), and presence of hypothyroidism (HR 3.31, 95% CI 2.38–4.61) or hyperthyroidism (HR 2.46, 95% CI 1.75–3.46) were associated with higher thyroid cancer occurrence among men (p < 0.05, Supplement 1).

Age ≥ 55 years (HR 0.63, 95% CI 0.56–0.71), moderate alcohol drinking (HR 0.87, 95% CI 0.77–0.99), and current smoking (HR 0.56, 95% CI 0.37–0.85) were associated with lower thyroid cancer occurrence among women. BMI ≥ 25 kg/m2 (HR 1.41, 95% CI 1.26–1.57), frequent exercise (HR 1.21, 95% CI 1.07–1.36), higher income (HR 1.18, 95% CI 1.06–1.32), urban residence (HR 1.17, 95% CI 1.06–1.29), and presence of hypothyroidism (HR 1.60, 95% CI 1.40–1.82) or hyperthyroidism (HR 1.38, 95% CI 1.19–1.61) were associated with higher thyroid cancer occurrence among women (p < 0.05, Supplement 2).
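As a rough consistency check on estimates such as those above, a Wald z-statistic and two-sided p-value can be approximately recovered from a hazard ratio and its 95% CI. The sketch below assumes the interval was built symmetrically on the log scale, as is usual for Cox model output; the function name `wald_from_ci` is illustrative and is not part of the original analysis.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE of log(HR), Wald z, and two-sided p from a 95% CI.

    Assumes the CI was computed as exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p

# Current smoking among men, as reported in the text: HR 0.69 (95% CI 0.55-0.85)
se, z, p = wald_from_ci(0.69, 0.55, 0.85)
print(f"SE={se:.3f}, z={z:.2f}, p={p:.4f}")
```

An interval that excludes 1 corresponds to p < 0.05 under this approximation, which matches how the associations are reported here.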

Development of thyroid cancer according to smoking status was analyzed separately for each gender (Table 2), for two reasons. First, the incidence of thyroid cancer differs between the genders (it is more common in women), which may reflect heterogeneity of thyroid cancer between men and women. Second, the variables may be reported differently by the two genders; for example, social taboos against women smoking tobacco may elicit different response patterns in health surveys. The percentage of current smokers was 43.6% among men and 2.7% among women. During 2,074,182.2 person-years of follow-up, 2106 participants developed thyroid cancer and underwent surgery for it (incidence rate, 1.01 per 1000 person-years). For men, only current smoking was significantly associated with a decreased thyroid cancer risk after adjustment for multiple variables (HR 0.69, 95% CI 0.55–0.85). Former smoking was not associated with an increased risk of developing thyroid cancer compared with never smoking. Likewise, current smoking was associated with a decreased risk of thyroid cancer among women in the multivariable adjusted model, whereas former smoking was not.

Table 2 Development of thyroid cancer according to smoking status.

The multivariable adjusted model demonstrated an association between pack-years of smoking and development of thyroid cancer among men (Supplement 3). The risk of developing thyroid cancer was significantly lower in each of the pack-years groups than in never smokers. However, pack-years of smoking was not associated with thyroid cancer occurrence among women.

Discussion

According to our study, younger age, BMI ≥ 25 kg/m2, higher income, urban residence, and presence of hypothyroidism or hyperthyroidism were associated with higher thyroid cancer occurrence in both genders. Current smoking was associated with lower thyroid cancer occurrence in both genders, while less physical activity and moderate alcohol drinking were associated with lower thyroid cancer occurrence only in women. Pack-years of smoking was associated with a decreased risk of thyroid cancer in men, but not in women.

This study is the first to comprehensively analyze the effects of lifestyle and environmental factors on thyroid cancer occurrence using a national database. Previous national database studies investigated the impact of individual environmental factors on thyroid cancer occurrence. Using the National Health Insurance database of Taiwan, two separate groups concluded that the presence of hyperthyroidism, higher SES, and urban residence are associated with an increased risk of thyroid cancer occurrence12,13. Likewise, a 7-year follow-up study of the KNHIS database demonstrated that higher BMI is associated with an increased risk of thyroid cancer11,14. However, these studies investigated the effects of individual factors, whereas the present study conducted multivariable adjusted analysis of epidemiologic factors to reduce confounding effects. Furthermore, insurance databases are designed for cost-claim analysis, not for research purposes, which may lead to questions about the validity of thyroid cancer diagnosis15. However, because of the reimbursement policies, diagnostic codes for cancer tend to be more accurate than those for benign conditions16. In addition, we strengthened the diagnostic accuracy by requiring registration of multiple cancer codes (to exclude unconfirmed diagnosis) followed by a thyroid operation code. Therefore, diagnosis of thyroid cancer in our study can be considered reliable.

National database analysis is valuable because the sample is representative of an entire population or nation; its results are therefore more generalizable than those of individual studies. Our results are representative of the South Korean population and are in accordance with previous studies14,15,16,17,18. Young age, high SES, urban residence, obesity, and hypothyroidism have been linked to an increased risk of thyroid cancer occurrence in both national database and individual studies. Although young age is well documented to be associated with lower thyroid cancer mortality19, it is not always associated with lower thyroid cancer occurrence. Multiple genetic alterations are found more frequently in younger patients17. For example, RET/PTC1 rearrangement, which is a key somatic genetic alteration in papillary thyroid cancer development, occurs more frequently in younger patients18. Such factors may underlie national database-based results demonstrating an increased risk of papillary thyroid cancer lymph node metastasis in younger patients20.

The associations of thyroid cancer with higher SES and urban residence can be attributed to easier access to healthcare21. Individuals who reside in cities earn more and tend to receive frequent medical check-ups, which may explain the increase in thyroid cancer occurrence22,23. Likewise, obesity has been associated with a higher occurrence of thyroid cancer in many studies11. Although the underlying mechanism is not fully understood, insulin resistance or diabetes, both of which are associated with obesity, may be a risk factor for tumor development24. Hyperthyroidism and hypothyroidism have also been associated with thyroid cancer. Thyroid-stimulating antibody may affect proto-oncogenes such as RET and TRK during the development of thyroid cancer in hyperthyroidism, and an increased thyroid-stimulating hormone level may lead to nodule and cancer growth in patients with hypothyroidism25. Furthermore, frequent ultrasonography performed in these patients may increase the chance of detecting thyroid cancer26.

The effect of smoking on thyroid cancer has been extensively investigated in many individual studies. In general, active cigarette smoking is one of the most well-known causes of cancer. While some data support a carcinogenic effect of smoking in thyroid cancer27, others demonstrate no significant effect28,29, and some even indicate that cigarette smoking protects against thyroid cancer30,31. Subsequently, a meta-analysis demonstrated that current smokers are less likely to be diagnosed with thyroid cancer than non-smokers7, and the general opinion is in accordance with this result. The theory that smoking exerts a protective effect against thyroid cancer is based on the finding that cigarette extracts exhibit properties similar to those of thyroid hormones and thus may act as thyroid hormone receptor partial agonists32. This may mimic a ‘thyroid stimulating hormone suppression effect’, which would lead to decreased thyroid gland stimulation and, in turn, less tumorigenesis. Our study is the first to confirm this association using a national database, which strengthens the generalizability of the association between current smoking and decreased thyroid cancer occurrence.

Although national insurance databases have their strengths, analyses of them are observational and can therefore reveal only associations, not causality. Causal relationships must be demonstrated in individual studies investigating the mechanisms underlying thyroid carcinogenesis. Furthermore, clinical information, such as cancer stage, type of thyroid cancer (e.g., medullary or anaplastic cancer), and laboratory data (e.g., thyroid function test results), was not included in this study. In addition, the questionnaires completed by the participants did not include any information on the duration of the variables. This was complicated by the fact that the self-reported smoking data were recorded in two different formats (never/former/current vs. pack-years). Consequently, subgroup analyses pertaining to such clinical and temporal information could not be conducted. Finally, the non-linear effect of smoking pack-years and the difference in its effect between the genders could not be explained by our data. Additional data from a carefully designed study may be needed to elucidate the effect of smoking pack-years on thyroid cancer.

In conclusion, our study provides representative data about the environmental factors that are associated with thyroid cancer occurrence. Age ≥ 55 years and current smoking were associated with lower thyroid cancer occurrence, while BMI ≥ 25 kg/m2, higher income, urban residence, hypothyroidism, and hyperthyroidism were associated with higher thyroid cancer occurrence in both men and women. Further individual studies focusing on the mechanism underlying thyroid carcinogenesis are needed to elucidate the causality between these factors and thyroid cancer occurrence.

Methods

Database

Around 97% of the South Korean population is enrolled in national health insurance, and the KNHIS has publicly disclosed its national medical care claims information database. This database contains reimbursement claims from all medical facilities in the nation, which are paired with personal information data including SES, medical procedures, diagnostic codes, prescription drugs, information about the medical facility, inpatient and outpatient care medical costs, and dental services. There were 48,222,537 individuals who had Korean citizenship and maintained health insurance membership in 2006. Among them, the KNHIS selected 1,021,208 (approximately 2%) individuals by proportionate stratified random sampling based on 2142 strata (defined by age, gender, residential area, and income level) to form the nationally representative sample database (national sample cohort). All of their medical claim information was prospectively collected up to 2015, and newborn cohorts were added annually to replace the deceased and to maintain the representativeness of the national sample cohort. Personal and sensitive information was replaced with information identifiers. We acquired institutional review board approval prior to conducting this analysis (No. GCIRB2019-318).
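The idea of proportionate stratified random sampling can be illustrated with a minimal sketch in which the same fraction is drawn at random from every stratum. This is not the KNHIS procedure itself: the function name, the toy population, and its six strata are hypothetical, and only the sampling fraction (1,021,208 of 48,222,537) is taken from the text.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw (approximately) the same fraction at random from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # stratum size times sampling fraction
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical toy population of (id, age_group, sex) tuples: 6 strata of 1000 each.
population = [(i, i % 3, i % 2) for i in range(6000)]
fraction = 1_021_208 / 48_222_537  # roughly 2%, as in the text
sample = proportionate_stratified_sample(
    population, stratum_of=lambda p: (p[1], p[2]), fraction=fraction
)
print(len(sample))
```

Because every stratum contributes in proportion to its size, the sample keeps the population's mix of age, gender, residential area, and income level.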

Study population

This was a retrospective study of a prospective national cohort: the KNHIS national sample cohort. Individuals who made any healthcare examination claims in 2006 or 2007 were enrolled for analysis. Only medical claim information from 2006 was evaluated for individuals who made claims in both years. Individuals who had the thyroid cancer code (Korean Classification of Disease (KCD) code C73) registered more than once and a thyroid surgery code (operation codes that included all types of thyroidectomies: P4551, P4552, P4554, or P4561) registered subsequently were categorized as the thyroid cancer group. Individuals who had never had the thyroid cancer code registered were classified as the control group. Individuals who satisfied at least one of the following criteria were excluded from the analysis: (1) minors younger than 18 years old at the time of enrollment; (2) history of thyroid cancer diagnosis before enrollment or of any other cancer diagnosis regardless of the time period; and (3) variables containing missing values (Fig. 1).
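The case and control definitions above can be sketched as a small classification function. The record layout (a list of `(date, kind, code)` tuples) and the function name are hypothetical. Two further assumptions are labeled in the comments: "subsequently" is read as surgery on or after the first C73 registration, and individuals with a C73 code who do not meet the case definition are left unclassified.

```python
from datetime import date

THYROID_CANCER_CODE = "C73"
THYROIDECTOMY_CODES = {"P4551", "P4552", "P4554", "P4561"}

def classify(claims):
    """Return 'case', 'control', or 'unclassified' for one person's claims.

    claims: list of (date, kind, code), with kind 'dx' (diagnosis) or 'op' (operation).
    This record layout is hypothetical.
    """
    dx_dates = sorted(d for d, kind, code in claims
                      if kind == "dx" and code.startswith(THYROID_CANCER_CODE))
    if not dx_dates:
        return "control"  # thyroid cancer code never registered
    op_dates = [d for d, kind, code in claims
                if kind == "op" and code in THYROIDECTOMY_CODES]
    # Assumption: the surgery must fall on or after the first C73 registration.
    if len(dx_dates) > 1 and any(d >= dx_dates[0] for d in op_dates):
        return "case"
    # Assumption: a C73 code without confirmation is neither case nor control.
    return "unclassified"

example = [
    (date(2008, 1, 10), "dx", "C73"),
    (date(2008, 2, 3), "dx", "C73"),
    (date(2008, 3, 1), "op", "P4552"),
]
print(classify(example))
```

The exclusion criteria (age, prior cancer, missing values) would be applied separately before this step.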

Figure 1 Data flow diagram of the cohort study.

Data collection

The study population was categorized according to age, body mass index (BMI), income, residential area, frequency of exercise, frequency of alcohol drinking, diet, presence or absence of hyperthyroidism, presence or absence of hypothyroidism, and smoking data (smoking status and pack-years) at the time of examination (i.e., 2006–2007). The categorization was performed as follows: age (< 55 years and ≥ 55 years), BMI (< 25 kg/m2: normal, and ≥ 25 kg/m2: overweight), income (middle class: 4th–7th decile, upper class: 1st–3rd decile, and lower class: 8th–10th decile), residential area (metropolitan cities and others), frequency of exercise (< 3 times per week and ≥ 3 times per week), alcohol drinking (never, < 3 times per week and ≥ 3 times per week), diet (vegetarian, meat only, and both), and smoking status (never, former, and current smokers). Smoking pack-years was calculated as the number of packs smoked per day multiplied by the number of years of previous or current smoking among current smokers. The duration of smoking was provided as a categorical variable; therefore, the median of each category was used to calculate the pack-years, which were subsequently categorized into 0, < 10, 10–19.9, and ≥ 20 for men and 0, < 5, 5–9.9, and ≥ 10 for women (Table 3). The study endpoints were defined as all-cause mortality or the first incidence of thyroid cancer up to 2015.
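The pack-years calculation and grouping can be sketched as follows. The cut-points and group labels come from the text. The function names and the example duration category (20 to 29 years) are hypothetical, and the "median of each category" is interpreted here as the midpoint of the category's range, which is an assumption.

```python
# Sex-specific pack-year cut-points and group labels, as given in the text.
CUTS = {"men": (10, 20), "women": (5, 10)}
LABELS = {
    "men": ("< 10", "10-19.9", ">= 20"),
    "women": ("< 5", "5-9.9", ">= 10"),
}

def pack_years(packs_per_day, duration_category):
    """Packs per day times the midpoint of a (low, high) duration category in years."""
    low, high = duration_category
    return packs_per_day * (low + high) / 2

def pack_year_group(value, sex):
    """Map a pack-years value to its sex-specific group label."""
    if value == 0:
        return "0"
    first_cut, second_cut = CUTS[sex]
    low_label, mid_label, high_label = LABELS[sex]
    if value < first_cut:
        return low_label
    if value < second_cut:
        return mid_label
    return high_label

# Hypothetical respondent: half a pack per day, duration category 20-29 years.
py = pack_years(0.5, (20, 29))
print(py, pack_year_group(py, "men"), pack_year_group(py, "women"))
```

The same pack-years value can fall into different groups for men and women because the cut-points are lower for women.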

Table 3 Variables in the cohort study.

Statistical analysis

Independent variables were compared between the thyroid cancer and control groups using the χ2-test. The incidence of thyroid cancer differs significantly according to gender; therefore, analyses were performed separately for men and women. All variables were also compared according to smoking status using the χ2-test for categorical variables. Multivariable analysis was conducted using the Cox regression model.
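For a two-level variable, the χ2 comparison between two groups reduces to a 2 × 2 table. The sketch below computes the Pearson statistic without continuity correction and its p-value for 1 degree of freedom. The counts are hypothetical and are not study data, and the original analysis was run in R, not with this code.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: rows = cases/controls, columns = exposed/unexposed.
chi2, p = chi2_2x2(10, 20, 30, 40)
print(f"chi2={chi2:.3f}, p={p:.3f}")
```

Variables with more than two levels, such as smoking status, need the general r × c form of the test with (r − 1)(c − 1) degrees of freedom.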

For smoking, the incidence density (per 1000 person-years) was obtained as the ratio of the number of incident cases to the sum of the observation periods (person-years) of all persons in each group. Multivariable Cox proportional hazards regression models were used to calculate the incidence, survival rate, hazard ratio (HR), and 95% confidence interval (CI) of smoking status and pack-years. The model was adjusted for the following variables: age, BMI, exercise, income, residential area, hypothyroidism, hyperthyroidism, alcohol drinking, and diet (Table 1). All statistical analyses were performed using RStudio, version 1.0.136, with a two-sided significance level of 0.05.
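The incidence density is a simple ratio and can be sketched as below. The function name is illustrative. The inputs in the usage line are the overall case count and person-years reported in the Results, which give a rate of roughly 1.0 per 1000 person-years.

```python
def incidence_density(cases, person_years, per=1000):
    """Incident cases divided by total observation time, scaled per `per` person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return cases / person_years * per

# Overall figures reported in the Results: 2106 cases over 2,074,182.2 person-years.
rate = incidence_density(2106, 2_074_182.2)
print(f"{rate:.3f} per 1000 person-years")
```

Applying the same function to the cases and person-years within each smoking category gives group-specific rates.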