Introduction

Anthropogenic environmental pollutants are believed to account for a sizable portion of the worldwide incidence of cancer1. Over the past decades, hundreds of confirmed and suspected environmental carcinogens have been identified. Industrialization has led to air pollution from dust storms, smoke, fumes, and toxic gas emissions from thermal power plants, coal mines, petroleum, and chemicals. As a result, ambient air pollution has become one of the most significant environmental risks to health. In addition, exposure to outdoor air pollution poses an urgent public health challenge worldwide because it is ubiquitous, affects everyone, and has numerous adverse human health effects, including cancer2,3. According to the International Agency for Research on Cancer, air pollution is a Group 1 carcinogen. In addition to causing lung cancer, it is also associated with an increased risk of other types of cancer4. Therefore, studies on the toxicological effects of these anthropogenic ambient air pollutants and their impact on human organs are urgently needed.

Examining pollutant profiles can inform policy-making on environmental and public health countermeasures. Exhaust emissions from transportation include total hydrocarbons (THC) and nitrogen oxides (NOx). Exhaust gases emitted by industrial plants contain a variety of pollutants, such as volatile organic compounds (VOCs), nitrogen dioxide (NO2), and carbon monoxide (CO)1. Outdoors, petrochemical solvent spillage, evaporated fuels, and biogenic emissions produce air pollutants in the form of gases, VOCs, and particulate matter (PM). Although regarded as low-carbon emissions, electricity generation from natural gas power plants produces airborne nonmethane hydrocarbons (NMHC) and NOx in the long run. Notably, after a series of photochemical reactions between VOCs (including hydrocarbons) and NOx, the concentration of ground-level ozone (O3) increases, leading to poor air quality5. Therefore, effective control of hydrocarbons (HC) can indirectly reduce the ground-level O3 concentration to improve regional air quality.

VOCs are defined in several ways. Total hydrocarbons (THCs) are also VOCs when used in a broader sense. Oxygenated hydrocarbons, such as alcohols and aldehydes, are not considered THCs. HCs are the most toxic organic gases in vehicle emissions. Methane, an HC, is neither photoreactive nor toxic; conversely, “nonmethane hydrocarbons” are known to be reactive in ambient air. Although slight differences may occur among different geographical regions5, a source apportionment study of non-biogenic, anthropogenic NMHC as an air pollutant in Delhi, India, found that the primary source of NMHC was traffic vehicle emissions (petrol and diesel), with 38% from petroleum, 32% from liquefied petroleum gas, 16% from solid fuel combustion, and 14% from diesel6.

The global cancer burden estimation report, GLOBOCAN 2020, was published in 2021, showing urinary bladder cancer (UBC) as the 12th most common cancer worldwide, accounting for 3% of all cancer burden7. In 2020, there were 573,278 newly diagnosed UBCs, with 212,536 new deaths the same year7. Furthermore, although the overall number of UBC cases is relatively minor compared with lung cancer, colorectal cancer, and liver cancer, the recurrence rate of bladder cancer is high at up to 70%. Fortunately, over the years, through government-led air pollution control, workplace health promotion, industrial occupational hazard exposure prevention, bans on aristolochic acid-containing medicines starting from November 4, 20038, and a substantial reduction in the cigarette smoking rate in Taiwan, the age-specific incidence rates of UBC in Taiwan are expected to decrease by > 25% from 2016 to 20259,10,11. Despite all these efforts, it is still important to examine the magnitude of UBC development risk due to specific ambient air pollutants.

Studies on the association between air pollution and UBC have revealed contradictory results. A few studies with positive results suggest that air pollution, particularly PM, is associated with a higher incidence of UBC12,13,14,15,16,17,18,19. In contrast, other studies did not detect such an association20,21,22,23. People living near chemical factories24 harbor a higher risk of developing UBC. However, the culprit pollutants are thought to be airborne polycyclic aromatic hydrocarbons (PAHs) and motor vehicle engine exhaust. In addition, several studies have suggested that air pollution (ambient PM2.5, traffic air pollution, and petrochemical air pollutant emissions) is associated with increased bladder cancer mortality in Taiwan16,18,20. However, to our knowledge, no studies have investigated the ambient air hydrocarbon pollutants THC and NMHC.

We hypothesized that exposure to the ambient air pollutants THC and NMHC would increase the risk of developing UBC. Data from the National Health Insurance Research Database (NHIRD) and government environmental databases were used to examine whether long-term exposure to HC in ambient air increased UBC risk among people aged ≥ 20 years in Taiwan. This is one of the first studies to investigate the risk of UBC associated with exposure to ambient air THC and NMHC pollution.

Results

Study population characteristics

This study tracked 589,135 initially cancer-free individuals aged 20 years and above (Fig. 1). The mean age was 42.5 ± 15.7 years. Males accounted for 50.5% of the total population. The most prevalent medical comorbidities were dyslipidemia (32.5%), hypertension (24.0%), diabetes mellitus (23.5%), chronic liver disease (21.6%), and gout (18.7%). We also examined risk factors for UBC in the entire cohort, which included smoking-related diagnoses with a prevalence of 12.3%, alcohol use disorder (4.1%), morbid obesity (2.1%), spinal cord injury (1.2%), chronic cystitis (0.8%), chronic kidney disease (4.8%), and pesticide exposure (0.3%). Furthermore, 54.3%, 31.4%, 6.4%, and 0.9% of the participants lived in areas with high, medium-high, medium, and low levels of urbanization, respectively. A total of 5416 study subjects originated from the six black-foot disease endemic regions, accounting for 0.9% of the entire study cohort. The demographic data and comorbid states among tertiles of THC and NMHC are presented in Tables 1 and 2, respectively, with T1 and T3 being the lowest and highest levels of the daily average of the respective pollutant. Individuals exposed to either pollutant at the T1 level had more comorbidities and risk factors than those under T3 exposure (Tables 1 and 2).

Figure 1

The study flow diagram.

Table 1 Odds ratios of incident urinary bladder cancer by tertile of the total hydrocarbons’ exposure and characteristics of the cohorts.
Table 2 Odds ratios of incident urinary bladder cancer by each tertile of nonmethane hydrocarbons exposure and characteristics of the cohorts.

Air pollution exposure

Over the 10-year exposure period, the mean daily average THC concentration was 2.25 ppm (SD = 0.13), and that of NMHC was 0.29 ppm (SD = 0.09). Table S1 shows the Pearson’s correlation analysis for the 12 air pollutants over the 10-year exposure period. Pollutants whose correlation coefficient with a targeted pollutant had an absolute value < 0.3, denoting low correlation strength, qualified as controlling pollutants in the multiple-pollutant models of the targeted pollutants, THC and NMHC. For THC, these were SO2 (r = 0.089), PM10 (r = − 0.238), and PM2.5 (r = − 0.287); for NMHC, they were SO2 (r = 0.159) and CH4 (r = 0.243) (Supplementary Table S1). The summary statistics of the air pollutants over the 10-year exposure period are shown in Supplementary Table S2. The distributions of daily average concentrations of air pollutants over the 10-year exposure period are shown in Supplementary Figs. S1 and S2 (SO2, CO2, CO, O3, PM10, and PM2.5 are shown in Fig. S1; NOx, NO, NO2, THC, NMHC, and CH4 in Fig. S2). Both targeted pollutants showed a long-term monotonic downward trend over time by the Mann–Kendall test. Sen’s method, used to estimate the slope of the trend over the study period, revealed a downward trend of − 0.03 units per year for THC (95% CI, − 0.04 to − 0.02; p < 0.001) and − 0.01 units per year for NMHC (95% CI, − 0.01 to − 0.01; p < 0.001) (Supplementary Fig. S3).

Ambient air THC, NMHC exposure, and incident UBC

Among the target pollutant concentrations to which all cohort participants were exposed, pollutant levels were categorized into tertiles, with T1 being the lowest and T3 being the highest. At the end of the follow-up period, prolonged exposure to THC increased the number of newly diagnosed UBC in a dose-dependent manner: 60.9 cases per 100,000 study individuals under T1 concentration exposure, 221.2 under T2 exposure, and 651.8 under T3 exposure (Table 1). For T2 exposure, the odds ratio (OR) was 3.64 (95% CI 2.96–4.46) when compared to that of T1 exposure. Moreover, when comparing T3 to T1, the OR was 10.76 (95% CI 8.92–12.99) (Table 1). Ambient air NMHC exposure was associated with 170.0/100,000 enrollees over the entire study period for the lowest T1 exposure; 349.5 for T2; and 426.7 for T3, indicating a dose-dependent effect (Table 2). The OR was 2.06 (95% CI 1.80–2.35) for T2 exposure when compared to that of T1 exposure. The OR was 2.52 (95% CI 2.21–2.87) for T3 compared to T1 (Table 2).
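As an illustrative check of the arithmetic behind these figures, the following minimal sketch derives crude odds ratios directly from the cumulative incidences per 100,000 quoted above. It assumes that the reported ORs are unadjusted and that each incidence can be treated as a simple proportion; the function and variable names are hypothetical and not taken from the study's analysis code. Because the inputs are rounded rates rather than raw counts, the results are expected to agree with Tables 1 and 2 only approximately.

```python
def odds(cases_per_100k: float) -> float:
    """Convert a cumulative incidence per 100,000 into odds."""
    p = cases_per_100k / 100_000
    return p / (1 - p)


def crude_odds_ratio(exposed_per_100k: float, reference_per_100k: float) -> float:
    """Odds ratio of an exposed tertile relative to the reference tertile."""
    return odds(exposed_per_100k) / odds(reference_per_100k)


# Cumulative incidences per 100,000 quoted in the text (T1 is the reference).
incidence = {
    "THC": {"T1": 60.9, "T2": 221.2, "T3": 651.8},
    "NMHC": {"T1": 170.0, "T2": 349.5, "T3": 426.7},
}

# Crude OR of T2 and T3 versus T1 for each targeted pollutant.
crude_or = {
    pollutant: {
        tertile: crude_odds_ratio(rate, rates["T1"])
        for tertile, rate in rates.items()
        if tertile != "T1"
    }
    for pollutant, rates in incidence.items()
}

for pollutant, by_tertile in crude_or.items():
    for tertile, value in by_tertile.items():
        print(f"{pollutant} {tertile} vs T1: crude OR = {value:.2f}")
```

The values obtained this way fall within rounding distance of the reported ORs, which suggests that the ORs in Tables 1 and 2 are crude estimates, in contrast to the adjusted HRs reported in Tables 3 and 4.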

Tables 3, 4, and Supplementary Table S3 present the single- and multiple-pollutant models per 0.13 ppm increase in THC or 0.09 ppm increase in NMHC across different stratifications. We fitted two-pollutant models for THC by controlling for the concomitant exposure to SO2, PM10, or PM2.5. For NMHC, we fitted two-pollutant models controlling for the concomitant exposure to SO2 or CH4, and a three-pollutant model controlling for both SO2 and CH4.

Table 3 Crude and adjusted hazard ratios of developing urinary bladder cancer during long-term THC or NMHC air pollutants exposure at a standard deviation (SD) increment controlled for PM2.5 and other air pollutants.
Table 4 Cox regression-derived adjusted hazard ratios for incident urinary bladder cancer associated with each tertile of ambient THC or NMHC exposure stratified by sex.

Without controlling for confounding air pollutants, the adjusted hazard ratio (HR) for UBC development was 1.83 (95% CI 1.75–1.91; p < 0.001) per 0.13 ppm increase in THC; after controlling for PM2.5, the adjusted HR was even higher at 2.09 (95% CI 1.99–2.19). The adjusted HR was 1.37 (95% CI 1.32–1.43; p < 0.001) per 0.09 ppm increase in ambient NMHC concentration. After controlling for SO2 and CH4, the adjusted HR was 1.10 (95% CI 1.06–1.15). When NMHC was controlled only for SO2, the most important confounding air pollutant, the adjusted HR was 1.36 (95% CI 1.30–1.42; p < 0.001) (Table 3).

Table 4 presents the Cox proportional hazards regression analysis of the two targeted pollutant categories divided into tertiles. The lowest tertile was used as the reference in each case, and the estimated HRs were adjusted for age, sex, lag 0–2, season, ambient temperature, level of urbanization, black-foot endemic region, and comorbidities. These results were consistent with those obtained from earlier multivariate analyses. In the overall population, exposure to the highest tertile (T3) of THC significantly increased the risk of UBC, with an adjusted HR (95% CI) of 3.16 (2.54–3.94; p < 0.001) (Table 4). The following results show the adjusted HRs associated with each tertile of airborne THC or NMHC exposure stratified by sex. For THC, the adjusted HRs of UBC for T3 were 3.16 (2.40–4.16; p < 0.001) for males and 3.11 (2.15–4.49; p < 0.001) for females. For NMHC, the adjusted HRs for T3 were 1.98 (1.57–2.49; p < 0.001) for males and 1.83 (1.33–2.51; p < 0.001) for females. When data regarding sex were stratified or merged for analysis, statistically significant correlations of adjusted HRs were measured for T2 and T3 compared with T1. The analysis demonstrated a dose-dependent association between the targeted pollutants and UBC risk, with no sex difference in the magnitude of risk.

Sensitivity analyses

To assess sex-specific differences and the confounding effect of diabetes mellitus status on the association between the targeted pollutants and UBC development, and to reveal any unexpected hidden relationships, we performed sensitivity analyses to estimate effects within each underlying stratum. The results demonstrated no unexpected relationship, sex-specific differences, or ameliorating impact in non-diabetics (Supplementary Tables S3 and S4).

Before controlling for other pollutants, newly diagnosed UBC in males was significantly positively associated with the daily average concentration over the 10-year period for THC and NMHC with adjusted HR (95% CI) of 1.78 (1.68–1.88; p < 0.001) and 1.34 (1.27–1.41; p < 0.001) (Supplementary Table S3). Moreover, the hazards were numerically larger in females showing that the adjusted HR (95% CI) for developing incident UBC by THC and NMHC was 1.91 (1.78–2.06; p < 0.001) and 1.43 (1.33–1.53; p < 0.001). Among them, THC controlling for PM2.5 resulted in adjusted HRs for males and females of 2.04 (1.92–2.17; p < 0.001) and 2.19 (2.02–2.37; p < 0.001), respectively. NMHC controlling for SO2 resulted in adjusted HRs for males and females of 1.31 (1.24–1.38; p < 0.001) and 1.43 (1.33–1.54; p < 0.001), respectively.

Before controlling for other pollutants, newly diagnosed UBC in people with diabetes mellitus was significantly positively associated with the daily average concentration over the 10-year period for THC and NMHC, with adjusted HRs (95% CI) of 1.99 (1.84–2.16; p < 0.001) and 1.48 (1.36–1.60; p < 0.001), respectively (Supplementary Table S4). However, the magnitude of these risks was not significantly reduced in non-diabetics. Namely, the adjusted HRs for newly diagnosed UBC in people without diabetes mellitus were 1.76 (1.67–1.86; p < 0.001) for THC and 1.33 (1.27–1.40; p < 0.001) for NMHC. Among them, THC controlling for PM2.5 resulted in adjusted HRs for people with and without diabetes mellitus of 2.34 (2.14–2.55; p < 0.001) and 2.00 (1.88–2.11; p < 0.001), respectively. NMHC controlling for SO2 resulted in adjusted HRs for people with and without diabetes mellitus of 1.48 (1.37–1.61; p < 0.001) and 1.31 (1.24–1.38; p < 0.001), respectively.

Cumulative incidences of UBC compared between different tertiles

We observed slight changes in the effects of THC and NMHC after controlling for other pollutants, and the directions of the effect estimates did not change, suggesting that our findings were robust against potential confounders. The appropriateness of the Cox proportional hazards model is supported by the plot in the upper panel of Fig. 2, showing the log (− log [survival function]) versus survival time. Cumulative UBC incidence for the targeted pollutants was assessed using the Kaplan–Meier method (Fig. 2, lower panel), presenting a clear trend of increased UBC risk with increased exposure to each targeted pollutant. Over the entire follow-up period, when T1 exposure was set as the reference for comparison, the adjusted HRs for T2 and T3 exposure to THC were 1.23 (0.98–1.54) and 3.16 (2.54–3.94), respectively; the corresponding adjusted HRs for NMHC T2 and T3 exposure were 1.67 (1.41–1.97) and 1.94 (1.61–2.34), respectively (Table 4). In addition, statistically significant differences in UBC occurrence were observed among the tertiles of the targeted pollutant categories (log-rank test; p < 0.001). A visual summary of the dose–response effect across escalating exposure tertiles is shown in Fig. 3.

Figure 2

The log minus log survival plot by ambient air THC and NMHC pollutants, and the cumulative incidence of urinary bladder cancer in individuals among THC and NMHC pollutants’ tertile. Upper panel: To evaluate the proportional hazards (PH) assumption involving the comparison of estimated − ln (− ln) survival curves over different THC and NMHC tertiles, the plot of log (− log [survival function]) versus survival time in THC and NMHC air pollutants was constructed. The graphical approach showing parallel curves over time provides support of the PH assumption. Lower panel: Cumulative incidence of urinary bladder cancer for individuals among tertiles of THC and NMHC pollutants. The tertile values, in ppm (THC, NMHC), are as follows: THC (lowest tertile, T1: < 2.10; medium tertile, T2: ≥ 2.10 and < 2.25; and highest tertile, T3: ≥ 2.25); NMHC (lowest tertile, T1: < 0.26; medium tertile, T2: ≥ 0.26 and < 0.34; and highest tertile, T3: ≥ 0.34).

Figure 3

A schematic diagram summarizing the study results of the risk of developing urinary bladder cancer associated with long-term exposure to different tertiles of airborne THC or NMHC concentration. The magnitudes of risk show a significant exposure–response relationship.

Discussion

This nationwide population-based cohort study linked national insurance claims data to open government data to investigate the association between long-term exposure to ambient air pollutants, THC or NMHC in Taiwan, and UBC risk. Our novel analysis shows a positive correlation between exposure to HC (THC or NMHC) in the ambient air for 10 years and UBC risk in people aged ≥ 20 years. Each additional unit of SD (0.13 ppm; 0.09 ppm) concentration of THC and NMHC increases the risk of bladder cancer by 83% and 37%, respectively. Furthermore, sensitivity analyses showed that these relationships did not change according to sex or presence or absence of diabetes. A large body of epidemiological evidence has indicated that diabetes is an independent risk factor for increased rates of heterogeneous types of cancer occurrence and death. The incidence and mortality of various types of cancer, including UBC, have a modest increase in patients with diabetes25. The magnitude of the risk of UBC mortality in people with diabetes in terms of the HR, after being adjusted for baseline age, smoking status, and body mass index, was increased by 40% when compared to non-diabetics25. We therefore performed sensitivity analyses and observed a consistent effect on patients without diabetes: the association in our study remained after controlling for simultaneous exposure to other pollutants, particularly PM2.5.

Many studies have demonstrated that air pollution adversely impacts health, and may result in problems including cancer; however, few have focused on specific pollutants in ambient air2,14,15,20,23,26,27,28. Most studies investigated the role of PM2.5, NO2, NOx, SO2, and respirable elemental carbon as a proxy for diesel exhaust in the overall incidence of cancers, including UBC2,14,23. By contrast, our study presents a novel analysis of the association between the long-term exposure to ambient NMHC and THC and the risk of UBC. VOCs, including benzene and formaldehyde, in diesel engine exhaust emissions can be positively correlated with THC emissions, contributing to aggravated ground-level O3 pollution under intense solar radiation, high temperatures, and low humidity. Our prespecified models demonstrated that, after controlling for PM2.5, health comorbidities, level of urbanization, black-foot disease endemic region, season, and ambient temperature, each 0.13 ppm increase in THC concentration was associated with a two-fold increase in the risk of incident UBC even in people without diabetes mellitus (adjusted HR 2.00; 95% CI 1.88–2.11). When replacing PM2.5 with SO2, an increase of 74% in the risk of developing UBC was still observed (adjusted HR 1.74; 95% CI 1.65–1.84).

A recent systematic review and meta-analysis study pointed out that petroleum industry work was associated with a modest increased risk of various cancers, including UBC (effect size = 1.25, 95% CI 1.09–1.43)29. A Spanish case–control study indicated that living more than 40 years in a city with more than 100,000 inhabitants was associated with an increased risk for UBC (OR = 1.30, 95% CI 1.04–1.63)30. Emissions of PAHs and diesel from industries near the residence, as evaluated by experts, were associated with an increased risk (OR = 1.29, 95% CI 0.85–1.98)30. In addition, previous research has shown that, for urban bus drivers and tramway employees who were employed for > 3 months, the risk of bladder cancer (standardized incidence ratio [SIR] = 1.4, 95% CI 1.2–1.6) was significantly increased22. The SIRs and 95% CIs for bladder cancer in road transportation workers compared with those in the whole population were 1.26 and 1.03–1.52, respectively15. Nevertheless, assuming occupational exposure, such as among road transportation workers, motor vehicle mechanics and repairers, garage mechanics, underground mines workers, and petroleum industry workers, as a proxy to ambient air pollution may be invalid since it is possible that the route of entry into the human body might have been via skin contact or even through the mucosal organs and not from inhalational hazard sources. Similarly, using petrol station density or annual industrial waste gas emissions to represent exposure to air pollution may also be groundless due to poor representation of specific pollutants. Thus, our study specifically targeted air pollutants such as THC and NMHC to offer new evidence on the impact of these precise components on the development of UBC.

In addition, we used multiple-pollutant models to adjust the association between HC and the incidence of bladder cancer for co-pollutants. We found that, compared with the THC single-pollutant model, the model for THC controlling for PM2.5 yielded a higher risk of bladder cancer. Although previous studies used mortality rather than new cancer incidence as endpoints, they still showed that PM2.5 has a significant deleterious effect on bladder cancer17,18. In an analysis of 623,048 ACS CPS-II participants in the United States, there was a significant adverse association between PM2.5 and bladder cancer mortality (HR per 4.4 µg/m3, 1.13; 95% CI 1.03–1.23; N = 1324)17. Thus, in the multiple-pollutant model, co-exposure to PM2.5 may amplify the bladder cancer risk associated with the target pollutants.

Sufficient or convincing evidence links UBC carcinogenesis to genetic susceptibility, cigarette smoking, diet, endemic Schistosoma haematobium infestation, and environmental pollution exposure31. However, the biological mechanism of the urinary tract carcinogenesis related to ambient air pollution remains unclear. Air pollution contains various mutagens and carcinogens, which may play a role in chronic systemic inflammation, oxidative stress, and DNA damage in tissues other than the lungs12,14,24,32,33. Multiple lines of indirect evidence have shed light on the mechanistic explanation for ambient air THC and NMHC-related UBC carcinogenesis. Environmental pollutants can induce oxidative stress and inflammation32,33,34,35. However, there is no evidence in animal studies for such a relationship between ambient hydrocarbon exposure and UBC, even in the case of the pollutant PAHs. Fundamentally, PAHs are local carcinogens, i.e., they trigger cancer development at the site of exposure, such as the lung with inhalation, skin with dermal application, or stomach and duodenum with oral ingestion, in animal models36,37,38. The biological plausibility of hydrocarbon pollutant-induced lower urinary tract carcinogenesis remains elusive in light of the research findings from studies with other sources of PAHs. Until more convincing and direct evidence is available for ambient air THC- and NMHC-induced UBC carcinogenesis, the canonical oxidative and inflammatory mechanisms and other mechanisms could be investigated in THC and NMHC carcinogenesis models in future research.

The strengths of this study are as follows. First, this is a nationwide study using a large population derived from the NHIRD, which contains the medical care data of 22.96 million people (99% of Taiwan’s population) under the National Health Insurance Plan. Second, this study is based on a 10-year long-term follow-up, which can provide adequate follow-up time to assess UBC development. Third, few epidemiological studies have assessed the association between airborne HC and UBC in Asia. Many studies have dealt with an increase in mortality, rather than cancer incidence. These endpoints are different in that the increase in mortality from UBC is due to the promoting effect of pollutants on cancer progression, whereas an increase in UBC incidence indicates the presence of the carcinogenic effect of air pollutants. Fourth, we considered and included the most important known risk factors for UBC in our calculation models and examined the possible synergistic effects between air pollutants. Finally, this study explored the association between HC and UBC through sex and diabetes stratification.

This linkage dataset cohort study has several limitations. First, because study subjects might relocate to a new place, exposure to air pollutants may change accordingly if the geographic location has a different level of pollutant concentration. The impact of this transition was difficult to measure. However, for most subjects in the cohort, a fixed postal address for one exposure time was reliable. Furthermore, the postal code-based study methodology does not consider that the individuals would be away from their residences, and their exposures at these times could be very different. Second, we could not measure occupational exposure to other hazardous chemicals in the workplace. Although we utilized the smoking-related diagnoses as a proxy for heavy smokers, this adjustment is considered limited in its ability to evaluate confounding by cigarette smoking. Thus, residual confounding by cigarette smoking, which is not easily evaluated in this type of analysis, needs to be considered. Lastly, individual dietary habit records were not available to us. We were therefore unable to measure the possible exposure to food chain carcinogens in the current study. Although we do not believe that the lack of data on individual dietary habits and other lifestyle factors had a large enough effect to significantly alter the risk of developing UBC in this large population-based cohort study, it may have introduced biases, thereby affecting the association’s estimates.

Conclusions

This ambispective cohort study offered new evidence to suggest that long-term exposure to THC and NMHC may be a risk factor for UBC. The results indicate a possible link between HC and UBC risk. Further, in a stratified analysis of the population in Taiwan by sex or diabetes status, long-term exposure to the two target pollutant categories was associated with an increased risk of UBC.

The leading cause and mechanism of the disease remain unclear. Therefore, the best prevention method currently available is to avoid dangerous factors as much as possible or reduce exposure to hazardous environments. Early detection and treatment are also important. With the global aging trend, the prevalence and burden of UBC may also increase. Acknowledging pollutant sources that are harmful to health can provide information for risk management strategies and help decision-makers formulate more targeted air pollution regulations. These findings may have important public health implications for preventing UBC.

Methods

Ethics approval and consent to participate

This study was approved after a full ethical review by the Institutional Review Board (IRB) of the China Medical University, Taichung, Taiwan (approval number: CMUH104-REC2-115 [CR-6]). In addition, because de-identified/anonymized data were used from the NHIRD, the IRB waived the requirement to obtain informed consent from the study participants. All experiments were performed according to confidentiality guidelines set forth by the Taiwan Personal Information Protection Act regulations. The entire study was conducted in accordance with the Declaration of Helsinki.

Data sources for linkage dataset ambispective cohort research

Health data were obtained from the Longitudinal Health Insurance Database 2000 (LHID2000) within the NHIRD, including claims data for 1 million randomly selected individuals, from 1996 to 201339. The NHIRD, established in 1996 in Taiwan, contains healthcare data of 22.96 million people (99% of Taiwan’s population) under a universal health insurance program, including all claims data (ambulatory care claims and inpatient claims) and prescriptions dispensed at pharmacies, the registry for beneficiaries, registry for medical facilities, and registry for medical specialists. To establish demographic characteristics for research, patient-level information is gathered by linking these data files using the identification number of insured individuals. As recorded in the database, each individual’s health and disease status was assigned an International Classification of Disease, Ninth Revision, Clinical Modification (ICD-9-CM) until recently, when ICD-10-CM was implemented. To enhance the reliability of the NHIRD data, the observation period was set as 2000–2013.

In addition, the Environment Resource Datasets40 are publicly available from open government data. This dataset was obtained by the Environmental Protection Administration of Taiwan, which determined ambient pollutants and temperatures at 76 monitoring stations across Taiwan, from 1993 to 2013. In this linkage database research, we used the postal code location as a proxy for the residence location from the NHIRD dataset and matched the postal code locations to the corresponding air quality monitoring stations in the Environmental Protection Administration (EPA) Open Dataset.

Study design and study population

A nationwide linkage database ambispective cohort design was used for this study from January 1, 2000, to December 31, 2013. The selection of the study subjects is depicted in Fig. 1. Among the 1 million subjects in the LHID2000 database, individuals aged 20 years and above were enrolled on January 1, 2000 (n = 594,297). Those with missing or unknown records for sex and birth were excluded. We also excluded those with UBC (n = 351) or cancer from any other primary site (n = 10,415) diagnosed before the beginning of the study period; those with only one claim record during the study period (n = 304); and, to avoid reverse causation bias, those with an outcome diagnosis made before July 2003 (n = 14,696). Ultimately, 594,297 subjects were selected for further merging with the EPA dataset through postal codes linked to the location of the air quality monitoring stations. A further 5162 enrollees lacked postal code information or EPA monitoring station data and were therefore excluded. After the linkage dataset merge, we finally tracked 589,135 study subjects for the research.

Exposure modeling

Guided by our research hypothesis, we set out to measure exposure to the targeted pollutants. Our devised exposure model for this study incorporated ambient concentrations of targeted pollutants over time while simultaneously addressing personal exposures tracked with the residential information and the duration of contact as input variables to estimate the cumulative individual exposure from inhalation. We have previously reported a similar exposure modeling approach, which has drawn acceptance from the exposure and outcomes research community41. We determined the concentrations of 12 ambient air pollutants monitored by the EPA in Taiwan over a prespecified study period. The study targets were THCs and NMHCs. To examine the association between long-term exposure to targeted air pollutants and the development of newly diagnosed UBC, we measured the risk magnitude after controlling for other non-targeted pollutants over the exposure period. Non-targeted pollutants were included in the subsequent multiple-pollutant analyses. These were selected based on weak correlations (absolute Pearson’s correlation coefficients < 0.3) of target pollutants with 10 other monitored air pollutants: sulfur dioxide (SO2); ozone (O3); carbon monoxide (CO); carbon dioxide (CO2); nitrogen oxides (NOX); nitrogen monoxide (NO); nitrogen dioxide (NO2); particulate matter < 10 μm in size (PM10); particulate matter < 2.5 μm in size (PM2.5); and methane (CH4) (Supplementary Table S1). Daily air quality data were collected at 76 monitoring stations from July 1, 1993, to December 31, 2013, and maintained by the EPA40. The locations where air pollutants were recorded were selected to form an integrated geographic information system. Using this system, each study patient was linked to the appropriate monitoring region by postal code, and the change in residence was considered through insurance registration during the study period.
A patient’s long-term exposure to each air pollutant was defined as the cumulative concentration during the measurement period (i.e., 10 years before the survival date) averaged per day. Therefore, the long-term exposure to each air pollutant (LEAPij) (i = SO2, O3, CO, CO2, NOX, NO, NO2, PM10, PM2.5, THC, NMHC, and CH4) for a patient living in the region served by the air quality monitoring station j was calculated as follows41,42:

$$\mathrm{LEAP}_{ij} = \frac{\sum\nolimits_{t=m}^{n} \mathrm{AP}_{ijt}}{d}$$

where APijt is the ambient air pollution level for pollutant category i recorded at monitoring station j on day t, m is the start date of the measurement period (10 years before the survival date), n is the end date of the measurement period (survival date), and d is the number of days in the measurement period.
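The LEAP calculation can be illustrated with a minimal Python sketch. It is not the study's code: the function name `leap`, the dictionary layout of the daily readings, and the toy values are hypothetical. It also assumes that d counts only the days with a recorded value, whereas the definition above takes d as the number of days in the measurement period; the two agree when the monitoring record is complete.

```python
from datetime import date

def leap(daily_levels, start, end):
    """Average daily concentration of one pollutant at one station over [start, end].

    daily_levels maps a date to the recorded concentration (hypothetical layout).
    """
    # Collect AP_ijt for every day t from m (start) to n (end), inclusive.
    window = [level for day, level in daily_levels.items() if start <= day <= end]
    if not window:
        return None  # no monitoring data in the measurement period
    # Assumption: d counts only the days that have a recorded value.
    return sum(window) / len(window)

# Toy daily readings (made-up numbers) for one pollutant at one station.
readings = {date(2010, 1, 1): 2.0, date(2010, 1, 2): 4.0, date(2010, 1, 3): 6.0}
print(leap(readings, date(2010, 1, 1), date(2010, 1, 3)))
```

In a full analysis, a patient who changed residence would contribute readings from more than one station j; that step is omitted here.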

This research also investigated the long-term trends of the airborne pollutants THC and NMHC, using the Mann–Kendall test to assess statistically whether either pollutant showed a monotonic upward or downward trend over time. The test evaluates the null hypothesis, H0, of no trend (the observations are randomly ordered in time) against the alternative hypothesis, H1, that an increasing or decreasing monotonic trend is present. Sen’s method (the Theil–Sen estimator) was used to estimate the slope of these trends43.
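As an illustration of the trend analysis, the following Python sketch computes the Mann–Kendall statistic with a normal approximation and Sen's slope for an evenly spaced series. It is a simplified version under stated assumptions: no correction for tied values, no adjustment for serial correlation or seasonality, and the function names and toy series are hypothetical rather than taken from the study.

```python
import math
from statistics import median

def mann_kendall(values):
    """Return (S, Z, two-sided p) for an evenly spaced time series."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    # Variance of S under H0, assuming no tied values.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return s, z, p

def sen_slope(values):
    """Median of all pairwise slopes (Theil-Sen estimator), unit time spacing."""
    n = len(values)
    slopes = [(values[j] - values[i]) / (j - i)
              for i in range(n - 1) for j in range(i + 1, n)]
    return median(slopes)

# Made-up annual mean concentrations, for illustration only.
series = [2.9, 2.7, 2.8, 2.5, 2.4, 2.2, 2.3, 2.0]
print(mann_kendall(series), sen_slope(series))
```

A negative S and a negative slope together indicate a downward monotonic trend; the p value indicates whether H0 of no trend can be rejected.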

Study outcomes

From the included population, we identified people who received a first-time diagnosis of either invasive or in situ UBC during the study period, based on ICD-9-CM codes 188 for invasive carcinoma and 233.7 for urinary bladder carcinoma-in-situ, respectively. Individuals were considered to have UBC if they visited an outpatient clinic ≥ 3 times with a UBC diagnosis or had been hospitalized because of UBC44. The earliest hospitalization or outpatient visit with UBC diagnosis was assigned as the diagnosis date and served as the newly diagnosed date of UBC for all subsequent analyses. We defined survival (the expected duration of time until the outcome event) with an endpoint date of either UBC diagnosis, death, or December 31, 2013, the final observation date, whichever occurred first.
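The case definition and the endpoint rule above can be sketched in Python as follows. The claim layout (a list of date and setting pairs that already carry a UBC code) and the function names are hypothetical; the sketch only mirrors the rules stated in this section.

```python
from datetime import date

def ubc_diagnosis_date(claims, min_outpatient=3):
    """Return the earliest UBC claim date if the case definition is met, else None.

    claims: list of (visit_date, setting) pairs that carry a UBC code,
    with setting equal to "outpatient" or "inpatient" (hypothetical layout).
    """
    outpatient = [d for d, setting in claims if setting == "outpatient"]
    inpatient = [d for d, setting in claims if setting == "inpatient"]
    # Case definition: at least three outpatient visits or any hospitalization.
    if len(outpatient) >= min_outpatient or inpatient:
        return min(outpatient + inpatient)
    return None

def endpoint_date(diagnosis, death, study_end=date(2013, 12, 31)):
    """Earliest of UBC diagnosis, death, or the final observation date."""
    return min(d for d in (diagnosis, death, study_end) if d is not None)

claims = [(date(2008, 3, 1), "outpatient"),
          (date(2008, 4, 2), "outpatient"),
          (date(2008, 6, 9), "outpatient")]
print(endpoint_date(ubc_diagnosis_date(claims), death=None))
```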

Comorbidities as confounding factors for UBC outcome

Information on comorbid conditions of patients was determined from the LHID2000 based on ICD-9-CM codes. The following comorbidities were considered essential: hypertension (401–405); chronic cystitis (595.1, 595.2); smoking-related diagnosis (305.1, 491.0, 491.2, 492.8, 496, 523.6, 989.84, V15.82, 649.0); alcohol use disorders (265.2, 291, 303, 305.0, 357.5, 425.5, 535.3, 571.0, 571.1, 571.2, 571.3, 980.0, V11.3); morbid obesity (278, 646.1, 649.1, 649.2, V45.86, V65.3, V77.8); spinal cord injury (806, 952, 336.1)47; chronic liver disease (571, 572.2–572.9); diabetes mellitus (249, 250, 648.8, 648.0); gout (274); chronic kidney disease (403, 404, 582.9, 585, 646.2, 792.5, 996.1, 999); pesticide exposures (989.1, 989.2, 989.3, 989.4); and dyslipidemia (272). These were identified and defined according to the diagnostic history collected from at least three outpatient visits or a single hospital admission before the survival date.

Levels of urbanization and the historic black-foot disease endemic regions as confounders

Seven clusters of urbanization stratification were grouped into four levels: high, medium–high, medium, and low urbanization, according to the previously published consensus methodology45. Clusters of high bladder cancer incidence rate in the black-foot disease endemic regions included six southwestern coastal townships where people unknowingly drank arsenic-contaminated well water before tap water installation in 1979–200346,47,48. Study subjects originating from these regions were identified, and origin from these regions was treated as a confounder for adjustment in the multivariate Cox models.

Statistical analysis

The Chi-squared test (for categorical variables) and one-way analysis of variance (for continuous variables) were used to test for differences in demographic characteristics and distribution of comorbidities among tertiles of the targeted pollutant concentrations. UBC risk associated with each targeted pollutant category, expressed as hazard ratios (HRs) with 95% confidence intervals (CIs), was examined using Cox proportional hazards regression, considering potential confounders. To control for the confounding effects of other pollutants, we fitted multiple-pollutant models that adjusted each targeted pollutant for co-pollutants selected because they were only weakly correlated with it (i.e., the absolute value of the correlation coefficient between the two air pollutants was lower than 0.3; Supplementary Table S1). To avoid potential collinearity problems, we did not include highly correlated pollutants in the same regression model. The effect of each targeted pollutant on the risk of newly diagnosed UBC was estimated as the adjusted HR per standard deviation (SD) change in concentration over the follow-up period.
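The co-pollutant selection rule can be illustrated with a short Python sketch that keeps only pollutants whose absolute Pearson correlation with the target is below 0.3. The function names and the toy series are hypothetical, and the sketch covers only the screening step, not the Cox regression itself.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def weakly_correlated(target, others, threshold=0.3):
    """Names of co-pollutants with |r| below the threshold against the target."""
    return [name for name, series in others.items()
            if abs(pearson_r(target, series)) < threshold]

# Made-up concentration series, for illustration only.
thc = [1.0, 2.0, 3.0, 4.0, 5.0]
co_pollutants = {"A": [2.0, 4.0, 6.0, 8.0, 10.0],   # tracks the target closely
                 "B": [1.0, -1.0, 0.0, -1.0, 1.0]}  # unrelated to the target
print(weakly_correlated(thc, co_pollutants))
```

Only the pollutants returned by `weakly_correlated` would enter the same model as the target, which is how the collinearity concern described above is handled.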

Local research has identified a V/U-shaped relationship between air pollutants and ambient temperature, with significant effects at both temperature extremes in the region49. Therefore, to control for the impact of weather conditions on air pollution and UBC, ambient temperature was included as one of the confounding factors in the pollutant models. Additionally, to control for short-term pollutant exposure effects, we used a lag of 0–2 days (the average concentration on the day of the UBC diagnosis and on the one and two days before) for all air pollutants as one of the adjusting factors. Because air pollutant levels vary with weather conditions, season is usually considered an important modifier of ambient air pollution-related biological effects in East Asia19. In the present study, multiple-pollutant models were fitted for the two targeted pollutants. The independent effect of each targeted pollutant was estimated after adjustment for age, sex, comorbidities, level of urbanization, lag of 0–2 days, season (seasonal trends in UBC onset), and ambient temperature, while controlling for other pollutants that showed weak correlations. The concentration data of the targeted pollutants were then divided into tertiles (T1, T2, and T3), and adjusted HRs with 95% CIs were re-calculated.
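Two of the derived variables described above, the lag 0–2 day average and the tertile grouping, can be sketched in Python. The function names and data are hypothetical, and the tertile cut points come from the default method of `statistics.quantiles`; the study does not state which quantile convention was used.

```python
from datetime import date, timedelta
from statistics import quantiles

def lag_0_2_mean(daily_levels, diagnosis_date):
    """Mean concentration on the diagnosis day and the one and two days before."""
    days = [diagnosis_date - timedelta(days=k) for k in range(3)]
    values = [daily_levels[d] for d in days if d in daily_levels]
    return sum(values) / len(values) if values else None

def tertile_labels(exposures):
    """Label each exposure value T1, T2, or T3 using two tertile cut points."""
    low_cut, high_cut = quantiles(exposures, n=3)
    labels = []
    for value in exposures:
        if value <= low_cut:
            labels.append("T1")
        elif value <= high_cut:
            labels.append("T2")
        else:
            labels.append("T3")
    return labels

# Made-up values, for illustration only.
levels = {date(2012, 5, 1): 1.0, date(2012, 5, 2): 2.0, date(2012, 5, 3): 6.0}
print(lag_0_2_mean(levels, date(2012, 5, 3)))
print(tertile_labels([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```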

Attributable risk proportion (ARP), expressed as a percentage, estimates the proportion of UBC in the study population that is attributable to ambient air pollutant exposure. The incidence of exposure in the study population (not the entire general population) was estimated to calculate ARP. ARP was calculated from the 2 by 2 table of exposure and outcome as follows: odds ratio (OR) = (a × d)/(b × c); study population exposure (SPe) = c/(c + d); and ARP (%) = 100 × (SPe × (OR − 1))/(1 + (SPe × (OR − 1))).
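The ARP formula can be checked with a small Python sketch. The cell layout is an assumption, because the text does not spell out the 2 by 2 table: here a and b are cases (exposed, unexposed) and c and d are non-cases (exposed, unexposed), which makes SPe = c/(c + d) the exposure proportion among non-cases. The function name and the counts are hypothetical.

```python
def attributable_risk_proportion(a, b, c, d):
    """ARP (%) from 2 by 2 cell counts.

    Assumed layout: a = exposed cases, b = unexposed cases,
    c = exposed non-cases, d = unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    sp_e = c / (c + d)                 # study population exposure
    excess = sp_e * (odds_ratio - 1)
    return 100 * excess / (1 + excess)

# Made-up counts: OR = 6, SPe = 1/3.
print(attributable_risk_proportion(30, 10, 20, 40))
```

When OR equals 1 the excess term is zero, so the ARP is 0%, which matches the interpretation that no cases are attributable to the exposure.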

Sensitivity analyses examined whether the effects of pollutant categories differed between males and females. In addition, because studies have linked diabetes to a higher risk of UBC50, we stratified by diabetes status to explore whether the pollutant categories had a significant impact in the non-diabetic population. Kaplan–Meier analysis was used to determine the cumulative incidence of UBC, and the log-rank test was used to evaluate the difference among tertiles of concentrations of the target pollutants. The analyses were performed using the MetaTrial Platform and Statistical Product and Service Solutions (Version 22). All statistical tests were two-sided; p values < 0.05 were considered statistically significant.