Abstract
Humans can be exposed to multiple pollutants in the air and surface water. These environments are non-static, trans-boundary and correlated, creating a complex network, and significant challenges for research on environmental hazards, especially in real-world cancer research. This article reports on a large study (377 million people in 30 provinces of China) that evaluated the combined impact of air and surface water pollution on cancer. We formulate a spatial evaluation system and a common grading scale for co-pollution measurement, and validate assumptions that air and surface water environments are spatially connected and that cancers of different types tend to cluster in areas where these environments are poorer. We observe “dose–response” relationships in both the number of affected cancer types and the cancer incidence with an increase in degree of co-pollution. We estimate that 62,847 (7.4%) new cases of cancer registered in China in 2016 were attributable to air and surface water pollution, and the majority (69.7%) of these excess cases occurred in areas with the highest level of co-pollution. The findings clearly show that the environment cannot be considered as a set of separate entities. They also support the development of policies for cooperative environmental governance and disease prevention.
Introduction
Air and surface water are basics for human survival1,2. The rapid urbanisation and industrialisation in the past century came at the price of environmental deterioration3, which in turn, caused multiple hazards to population health4,5. It is estimated that over 90% of the world population in 2021, especially those in developing countries, lived in places where the World Health Organisation (WHO) standard on fine particulate matter, with a diameter of ≤ 2.5 microns (PM2.5), was not met6. This is alarming because there is sufficient evidence to show that PM2.5 has a causative association with lung cancer7,8,9. Other air pollutants are also gaining increased attention. For instance, nitrogen dioxide (NO2) has been linked to breast cancer10,11, and the WHO introduced this pollutant to its monitoring database in 202212. Surface water is similar to air in some ways (for example, it is non-static13, transboundary14, and has multiple exposure routes15,16), and these pose challenges for research. While previous studies have investigated the impact of specific types of water pollution at a local level17, there remains a limited understanding of the potential health effects of exposure to combined pollutants and their association with cancer incidence. The global number of cancer cases is expected to double in approximately half a century for multiple reasons, such as population aging18. From a spatial perspective, studies have observed overlapping distribution patterns of various cancer types19,20, indicating the presence of potential common environmental causes. However, the existing evidence regarding environmental carcinogenicity has been developed in a fragmented manner21, lacking a comprehensive evaluation system for understanding the holistic relationship between the real-world environment and cancer.
Here, we hypothesised that air and surface water environments would be spatially connected and cancers of different types would tend to cluster in areas with poor environmental conditions22,23. To address these hypotheses and facilitate future work, we established a spatial evaluation system that harmonises nationwide data on air, surface water, and cancer incidence in China. We also developed a graded scale of co-pollution that makes it possible to transform a complex network with multiple pollutants and multiple types of cancers to enable quantifiable evaluation of the relationships. Finally, we demonstrated the graphic consistency in air and surface water quality and cancer incidence. By reshaping cross-industry monitoring data into a minable data resource, we highlight a unique opportunity to accelerate the generation of knowledge to support the development of policies for cooperative environmental governance and disease prevention.
Results
Spatial evaluation system for environment and cancer
There is considerable public concern in China about both the environment and cancer24,25. Different industries in the country have separately established one of the world’s largest monitoring networks on air (real-time data from the China National Environmental Monitoring Centre)26,27, surface water (monthly release through the Environmental Quality Monitoring Network)28, and cancer (annual report from the China Cancer Registry System)29. These cover all the provincial-level regions that make up the Chinese mainland (Supplementary Information Fig. 1a–c). However, they differ in their spatial-temporal scales, and air, water and cancer data demonstrate considerable spatial heterogeneity across the country. These barriers prevent academics, cross-industry workers and the government from appraising the environment–cancer relationship. As a fundamental step to overcome this challenge, we “harmonised” these industry data as a Spatial Evaluation System for Environment and Cancer (SESEC, Table 1).
To spatially integrate the three types of national industry data, we defined the prefecture-level area as the basic unit. Any unit that simultaneously contained all three components (air monitoring site, water monitoring section, and cancer registry institute [CRI]) was included and denoted an analysis unit. All of these analysis units (totalling 219) constituted the study area, covering a population of 377 million (Supplementary Information Fig. 1d and Supplementary Information Table 1). For pollutants, if a specific analysis unit contained multiple monitoring points, we calculated the average value of the pollutants from these points to represent the average pollution level of that unit. This definition makes the units independent (i.e., non-overlapping both geographically and for information about the environment and outcome) for spatial analysis, having no restriction on the distribution or number of data points for the three components within one basic analytic unit, and therefore preserves the “natural” pattern of data sources to a large extent.
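The unit-building rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `build_analysis_units`, the record layout and the toy values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def build_analysis_units(air_sites, water_sections, cancer_registries):
    """Keep prefectures that contain all three components and average
    the monitoring points that fall inside each of them.

    air_sites / water_sections: iterables of (prefecture, concentration)
    cancer_registries: set of prefectures that host at least one CRI
    """
    def average_by_prefecture(records):
        grouped = defaultdict(list)
        for prefecture, value in records:
            grouped[prefecture].append(value)
        return {p: mean(v) for p, v in grouped.items()}

    air = average_by_prefecture(air_sites)
    water = average_by_prefecture(water_sections)
    # A prefecture is an analysis unit only if air, water and cancer data coexist.
    eligible = set(air) & set(water) & set(cancer_registries)
    return {p: {"air": air[p], "water": water[p]} for p in sorted(eligible)}


# Hypothetical toy records for one pollutant per medium.
air_sites = [("A", 40.0), ("A", 50.0), ("B", 30.0), ("C", 60.0)]
water_sections = [("A", 2.0), ("B", 4.0), ("B", 6.0)]
cancer_registries = {"A", "C"}

units = build_analysis_units(air_sites, water_sections, cancer_registries)
print(units)
```

In this toy input only prefecture A has all three components, so B (no registry) and C (no water section) are dropped, which mirrors how the study area was restricted to units with complete data.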
Items considered in the SESEC are extensive and tailored to the current situation in China (Table 1). The environmental quality items considered for assessment included six air pollutants recommended by the WHO Global Air Quality Guidelines30, as well as 13 surface water organic pollutant indicators. These specific pollutants were chosen based on criteria such as testability, comprehensiveness, and pollution share rate31. It is worth mentioning that surface water metal pollutants were not included in the assessment because they are already subjected to strict control measures in China and are present at very low concentrations. Threshold concentrations for the environmental quality items were set using the annual limit values recommended by the National Ambient Air Quality Standard for air pollutants26,32. They were based on the 75th percentiles of the national levels for water pollutant indicators because there is no standard for health effect evaluation33. For cancer items, the SESEC included 13 cancer types, selected because of their high incidence, increasing trend or low survival rate34,35. Collectively, these cancers accounted for 77% of all new cases in 2016. All the items were based on their annual average levels (with a few exceptions), to smooth temporal fluctuations (e.g., seasonal effect) and therefore better reflect a relative “stable” exposure–outcome relationship36.
We observed very high cross-country heterogeneity in the concentration levels of all 19 environmental items and in the incidence of the 13 types of cancer, as well as high spatial autocorrelation among pollutants (descriptive data in Supplementary Information Tables 2, 3). There were some high correlations within and across the air, surface water and cancer elements in the system (Supplementary Information Fig. 2). For instance, the Spearman’s correlation coefficients were up to 0.88 within air pollutants (PM2.5 and particulate matter with diameter ≤ 10 microns [PM10]), 0.75 within water pollutants (permanganate index [COD_Mn] and total phosphorus [TP]), and 0.50 between air and water pollutants (PM10 and total nitrogen [TN]). The incidence rates of different cancer types showed similar patterns, with correlation coefficients up to 0.72 (breast and kidney cancer). Most air and water pollutants were also correlated with cancers. For instance, the correlation coefficient was 0.44 between PM10 and oesophageal cancer and 0.34 between COD_Mn and lung cancer. These figures provide a preliminary glance at the complex network in the SESEC and serve as primary proof of the inter-dependency between the air and surface water environments and human cancer.
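For readers who want to reproduce this kind of pairwise screening, Spearman's coefficient is simply Pearson's correlation computed on ranks. The sketch below is a dependency-free illustration with tied values given their average rank; the function names and the toy numbers are assumptions, not part of the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical unit-level values: a pollutant and a cancer incidence rate.
pm10 = [55.0, 80.0, 62.0, 95.0, 70.0]
incidence = [12.1, 15.0, 14.2, 19.8, 13.0]
print(round(spearman(pm10, incidence), 3))
```

Because only ranks are used, the coefficient is insensitive to the measurement units of the pollutants, which is convenient when air and water indicators are compared side by side.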
Distribution of co-pollution measured on a common graded scale
The environment as a whole is very difficult to understand, given the large heterogeneity and complex relationships that exist both within and across environmental media. This hinders progress in the exploration of the relationship between the environment and diseases such as cancer. There are separate standards for classifying the quality of air or surface water, for example, the Air Quality Index (AQI)37 or Water Quality Index (WQI)38, but these classification systems use single-factor evaluation (i.e., the single pollutant with the highest pollution level), and do not take into account the spatial relation between monitoring sites or sections39. There is also no measurement tool to quantify and compare the degree of co-pollution in different places. We, therefore, proposed an approach to translate the complex environmental network relationship into common graded scales so as to quantify their combined effects on cancer occurrence.
The pollution grade was achieved in three successive steps. (1) For each pollutant, using its threshold concentration as the cut-off value, we used a modified local Moran’s index to identify the aggregation characteristics of the spatial distribution of the pollutant. (2) Assuming a high correlation among pollutants, we applied the principle of “combining items with similar features” to facilitate the assessment of the combined effect of multiple pollutants. (3) Based on the combination of similar features, the whole space was divided into four progressive grades, enabling the transformation of a multi-dimensional complex network into a one-dimensional quantifiable co-pollution grade.
Specifically, to identify spatial patterns, we used the threshold concentration assigned to each pollutant item (as shown in Table 1) and calculated a modified local Moran’s I33,40. Through conversion to a binary variable, the modified local Moran’s index is intrinsically equivalent to converting the original local Moran’s index with the mean as the cut-off value to the threshold concentration as the cut-off value, such that we can identify the high value in the sense of threshold concentration specific to each pollutant. This index helped us identify six cluster patterns: a high–high cluster (HH), low–low cluster (LL), high–low outlier (HL), low–high outlier (LH), high–not clustered (HN), and low–not clustered (LN) (detailed in the Methods). This standardisation process treats all pollutants equally, regardless of their measurement units. It also considers the non-static and transboundary characteristics of air and water pollutants. The geographical details regarding clustering or outliers of pollutants (as shown in Fig. 1) can provide valuable insights for regionally tailored environmental policy-making. We focused on the high-level pollutants (“H” for short) and counted the number of “H”s to grade the degree of pollution in air or surface water. Different grades across environmental media can be further tabulated as a matrix to show patterns of co-pollution (Fig. 2a). This matrix network could also be simplified as a common graded scale of co-pollution by merging cells with similar patterns (Fig. 2b).
Based on our empirical grouping on the common graded scale, 78 basic analytic units (35.6%) had high-level pollution in both air and surface water, which corresponds to Grade IV on the scale. These areas were mainly distributed in the Beijing–Tianjin–Hebei region (i.e., the Capital Economic Circle), the Huaihe River basin (which has a dense water network and population) and the Fen-Wei Plain (downstream of the Yellow River). All 19 pollutants exceeded the thresholds in these areas, with the exposure rate (proportion of basic analytic units exceeding the threshold for each pollutant) of PM2.5 and PM10 approaching 100% (Fig. 2c). At the other extreme, 32 (14.6%) basic analytic units had very low-level pollution of both air and surface water (Grade I). In these areas, all in southern China, only 7 pollutants exceeded the thresholds, and the exposure rate was low, with the highest exposure rate of 59.4% for PM2.5. In between, 65 (29.7%) and 44 (20.1%) basic analytic units were classified as Grade II (low-level pollution in both) and III (moderate-level co-pollution), respectively. Note that very few areas had either high-level pollution in the air but low-level pollution in surface water (11 [5.0%] basic analytic units, scattered in Northern and Central China) or the opposite (four [1.8%] basic analytic units, in border areas in the southeast and southwest). This reinforces the spatial connection between air and surface water pollution and the validity of the proposed grading system of co-pollution.
We quantified the degrees of pollution uniformly and simultaneously in both air and surface water, and showed a spatial connection between them. Although the final grade of co-pollution is affected by the thresholds used for the pollutants (e.g., the number of areas of a higher grade would be reduced if a less strict criterion, say the 80th percentile, were used for water pollutants), the overall pattern would not change, and the present results are supported by previous knowledge of environmental problems. For example, the Grade IV areas are mainly distributed in populated regions where air pollution (such as the Beijing–Tianjin–Hebei region) or water pollution (such as the Huaihe River basin) has aroused public concern41,42. The pollution may be due to gases, wastewater and solid waste from the chemical-based industrial structure, the road freight-based transport structure and the coal-based energy structure43. Our results further stress the co-pollution problems in these areas, i.e., the possibility of shared pollution sources for both air and surface water. This suggests that coordinated governance across sectors is required to balance economic development and the environment. China is also a microcosm of the discrepancies in both pollution degree and patterns that exist worldwide. No uniform development model could fit all areas. Tailored environmental policies are therefore needed.
Cancer incidence in relation to environmental pollution
Stephen Paget studied the patterns of cancer metastasis and then proposed the Seed-and-Soil Theory44. This states that metastasis depends on interactions between cancer cells (the ‘seeds’) and specific organ microenvironments (the ‘soil’) and that cancer cells exhibit preferences when metastasising to organs. We assumed that cancers, when viewed from the spatial perspective, also have preferences for particular environmental conditions, i.e., they tend to cluster in areas with particular environmental characteristics. We examined whether cancers of different types display similar spatial patterns in the population.
Interestingly, we found good consistency in the spatial distributions between the cancer incidence and the co-pollution grade. The spatial consistency was especially clear for lung, stomach and oesophageal cancers, the three most common cancers in China (Supplementary Information Fig. 3)34.
To provide some insights about the spatial consistency, we showed that Grade IV areas had the highest levels of incidence of seven types of cancer, including oesophageal (incidence rate ratio [RR] of 2.502 compared to Grade I, an increase in risk of 150.2%), gallbladder (1.790), pancreatic (1.686), kidney (1.639), stomach (1.469), breast (1.374), and lung (1.289). In Grade II and III areas, the incidence of one and five types of cancer, respectively, was significantly higher than in Grade I areas (Fig. 3). There was a “dose-response” relationship between the number of affected cancer types and the cancer incidence with an increase in co-pollution grade. This relationship remained consistent across different grouping schemes used to define the co-pollution grade, as indicated in Supplementary Information Fig. 4. This sensitivity analysis further strengthens the evidence supporting the combined impact of environmental conditions on cancer outcomes.
Looking at the effect of specific pollutants on specific cancers, all 19 pollutants had potentially important effects on at least one cancer type filtered by Shapley additive explanations (SHAP) analysis45 (Fig. 4). Among these, eight pollutants (four air pollutants, PM10, PM2.5, NO2 and ozone [O3], and four water pollutants, COD_Mn, petroleum, dissolved oxygen [DO], cyanide) showed significant positive effects (Table 2). The per capita gross domestic product, the fraction of the population aged 65 years and older, and the urbanisation rate were also identified as significant contributors to cancer risks (spatial patterns presented in Supplementary Information Fig. 5). After adjusting for these social factors, the observed effects of the pollutants remained stable (Supplementary Information Fig. 6). However, we found no positive correlation between the natural environment and liver cancer after adjusting for these social factors. This suggests that social factors may be more important than the natural environment for liver cancer, which is primarily driven by hepatitis B and C infections in China46.
A relationship between NO2 and breast cancer has been established47, and some previous studies pointed to a similar relationship with colorectal cancer47 and leukaemia48. We extended these findings to nine cancer types, including colorectal (RR = 1.132), gallbladder (1.102), pancreatic (1.172), lung (1.042), breast (1.119), kidney (1.126), and brain (1.056) cancers, leukaemia (1.099) and lymphoma (1.233). These observations could be used to reinforce the rationality for including NO2 in the WHO ambient air quality database12. We confirmed that PM2.5 has a causal relationship with lung cancer (RR = 1.188). Our findings also suggest this known Type I carcinogen21 may have an effect on leukaemia (RR = 1.298). We also observed a relation between COD_Mn and three types of cancers, including pancreatic (RR = 1.089), breast (1.274), and kidney (1.177). COD_Mn is extensively utilised in China as a comprehensive indicator for assessing nitrite and organic pollutants in surface water49. The combination of nitrite with amines can generate nitrosamine, which is a known carcinogen. In addition, direct carcinogens and pre-carcinogens are organic substances that have the potential to induce DNA changes50,51. This biological basis supports our findings. In situations where testing capacity is insufficient, our study suggests that COD_Mn could serve as a simplified indicator for assessing both nitrite and organic pollutants, thereby providing valuable information on cancer risks associated with water pollution.
Recognising that pollutants do not exist in isolation, it is important to acknowledge that these substances also do not act independently. Even pollutants that have not yet been acknowledged as carcinogenic can still have an impact on the risk of cancer in populations. This impact may arise from intricate interactions with known carcinogens. Findings from several previous studies have provided support for this assumption33,52, underscoring the urgent need for further investigation into the potential network mechanisms that connect multiple pollutants and the development of cancer. Exploring these complex interactions is crucial for effectively managing the risks associated with pollutants and preventing cancer.
Acting upon the environment-attributable cancer burden
Understanding the environmental–cancer relationship is a premise for motivating actions, but this knowledge alone does not provide a sufficient basis for the formulation of environmental governance and disease prevention policies. In this section of the report, we provide estimates of the number of excess cancer cases related to air and surface water environments in areas of different co-pollution grades. This is used as a call for growth in both academic and political interest in environmental health and collaborative efforts across sectors.
Overall, there were 62,847 excess cases in the basic analytic units in 2016, which means 7.4% of total cancer cases were attributable to air and surface water pollution. As the co-pollution grade increased, the number of pollutants that could explain the excess cases increased, from three in Grade I areas to eight in Grade IV areas. The number of types of cancers that were attributable also increased, from five in Grade I areas to 10 in Grade IV areas (Fig. 5a and Supplementary Information Table 4).
The cancer spectrum attributable to different co-pollution grades was affected by the patterns of pollution (Fig. 5b). For example, PM2.5 as a single dominating pollutant (59.4% of the basic analytic units) in Grade I areas explained 523 (4.0%) excess cases of lung cancer. In Grade II areas, 66.2% of the basic analytic units were exposed to PM10, and there were more excess cases (1763, 11.1%) of oesophageal cancer than in Grade I areas (91, 0.58%). The number of basic analytic units exposed to NO2 increased from 0.0% in Grade I and 1.5% in Grade II to 22.7% in Grade III. The excess cases of colorectal cancer and breast cancer also increased significantly, reaching 2218 (15.9%) and 1863 (13.3%) in Grade III, respectively. The Grade IV areas were exposed to the largest number of pollutants and had the highest excess cases across all cancer types, 43,827 in total (accounting for 69.7% of total excess cases). These findings could be used by local governments to scale up countermeasures.
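The headline shares quoted in this section can be checked with simple arithmetic. The snippet below only re-derives figures already reported in the text; the implied number of registered cases is an approximation, because it is back-calculated from the rounded 7.4% figure rather than taken from the registry.

```python
# Figures reported in the text for 2016.
excess_total = 62_847        # excess cases across all analytic units
excess_grade_iv = 43_827     # excess cases in Grade IV areas
attributable_fraction = 0.074  # 7.4% of total cases, as rounded in the text


def share_percent(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


grade_iv_share = share_percent(excess_grade_iv, excess_total)

# Approximate only: derived from a rounded percentage, not from registry counts.
implied_registered_cases = round(excess_total / attributable_fraction)

print(grade_iv_share)            # share of excess cases in Grade IV areas (%)
print(implied_registered_cases)  # rough total of registered cases in the units
```

The first value reproduces the 69.7% quoted for Grade IV areas, and the second indicates that the analytic units recorded on the order of 850,000 new cases in that year.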
Discussion
In this study, we integrated nationwide data on air, surface water and cancer and consolidated the methodological basis for examining the relationships between multiple pollutants and multiple types of cancers within this giant system. Data availability is fundamental to speed up future research and actions. The development of the SESEC in this study benefited from the establishment of monitoring sites, sections and CRIs across the country. Maintaining and improving coverage of this infrastructure requires considerable and costly effort. Our work provides a way to make use of the available data, and we encourage researchers in China and elsewhere to build upon knowledge that could inform environmental protection and cancer control policies. To facilitate the replication and modification of our work, we have included an overview of our study design, analytic methods, and underlying assumptions in Fig. 6. This information serves as a useful guide for researchers seeking to build upon our work and contribute to this important field.
An important aspect of the work is its contributions to the development of spatial analysis, an interdisciplinary field of geography, epidemiology, statistics and ecology. This represents a research paradigm using an ecosystem perspective, which is more macroscopic than the traditional lenses for observation, such as the molecular, cellular and individual patient levels. Because our 219 analytical units covered nearly all CRIs (477/487), our evaluation of cancer risk is unbiased. As anticipated, however, for all non-included units, the levels of air, water, and related sociological factors were significantly lower than those in the study area (Supplementary Information Table 5). This phenomenon illustrates the focus of national monitoring data and also highlights that coordinating economic development and environmental governance in less-developed areas will be a more effective choice.
The unique nature of air and surface water as environments to which individuals are continuously exposed from birth presents several challenges when attempting to establish associations between these exposures and cancer outcomes. Challenges include factors such as the lack of quantification approaches for individual exposure levels, the intricate interplay between genetic and environmental factors and unknown time lags. In addressing these challenges, spatial analysis offers an alternative perspective to understand the environmental effects. In this analysis, we emphasised the spatial consistency of data rather than the temporal sequence of the dataset. The large-scale urbanisation process and rapid economic development in China began in the early 1990s, leading to concentrated air and water pollution, particularly in the eastern coastal areas, which resulted in substantial health effects. After more than 20 years of continuous efforts, the overall air and water quality in China has been gradually improving33. Therefore, from a logical standpoint, correlating the occurrence of cancers several years ago with the latest water and air pollution data would likely underestimate the risks. We have reason to believe that the connection effect between the poorer water and air quality in China 10 or 15 years ago and the subsequent occurrence of cancers would be stronger. Furthermore, it is important to note that no country’s administrative data are specifically designed for a particular research topic. However, certain studies (such as ours) must rely on national-level data to be more credible and comprehensive. This is because it is difficult to encompass the health effects of air and water on even large populations through population-based studies.
This study encompasses a vast majority of China’s geographical area, with a population of 377 million. As a result, the data on the macroscopic system can be viewed as “parameters” reflecting the environmental conditions and population outcomes in specific locations. These parameters exhibit relative stability over time, as demonstrated in Supplementary Information Figs. 7, 8. By utilising these parameters, we can bypass the limitations inherent in population-based studies and gain valuable insights into the relationships between environmental factors and cancer outcomes. The wide range in geography that far exceeds human impact also means that these parameters demonstrated significant variations (i.e., randomness) that can help reveal interesting patterns that may suggest causation. However, establishing causation for cancer is very challenging, and will require close cooperation between different industries and disciplines to effectively control the increasing disease burden.
Methods
Data source and processing
We obtained data on air pollutants, surface water contaminants and population cancer incidence from the China National Environmental Monitoring Centre (CNEMC), the Ministry of Ecology and Environment, and the China Cancer Registry Annual Report (2019), respectively.
The establishment of the air quality monitoring stations followed the guidelines outlined in the Technical Regulation for Selection of Ambient Air Quality Monitoring Stations53. The selection of monitoring stations follows the principles of representativeness, comparability, integrity, foresight and stability, taking into account factors such as environmental and socio-economic characteristics comprehensively (information about each monitoring station is available elsewhere27).
The establishment of the surface water monitoring sections followed the guidelines outlined in the National Technical Specifications for Surface Water Environmental Monitoring54. These sections were strategically chosen to accurately represent the natural characteristics of river network density, runoff supply, and hydrological features. For records that fell below the limits of detection (LOD), we adopted 1/2 LOD for processing according to the technical specification for surface water quality assessment published by the Ministry of Ecology and Environment of the People’s Republic of China55. In some monitoring sections (ranging from 1.8% to 9.2% per month), data were unavailable owing to factors such as dry conditions, freezing, or other reasons. The proportion of missing data was small and it presented spatial randomness, so it may have little influence on the overall effect estimation. To address this issue and ensure a comprehensive analysis, we used averaging techniques over the analytic year for each specific section. By calculating the average concentrations over the course of the year, we also aimed to smooth out temporal fluctuations, which are common in air and surface water pollutants, and provide a more comprehensive reflection of the potential long-term effects on cancer incidence. To establish high-value status and determine thresholds for multiple water quality pollutants, we employed a practical criterion of computing the 75th percentile values for each pollutant. This approach provides sufficient variability in various pollutant indicators and their binary conditions when performing high-value analysis. Furthermore, using the 75th percentile as a threshold offers a uniform way to determine thresholds for multiple pollutants, as we have observed their correlation with multiple types of cancer33.
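The preprocessing steps for water data (half-LOD substitution, annual averaging over the months that are available, and a 75th-percentile threshold) can be sketched as follows. This is an illustration under stated assumptions: the function names, the use of `None` for a missing month, the toy readings and the linear-interpolation percentile convention are all choices made for the example, since the text does not specify them.

```python
def annual_mean(monthly_values, lod):
    """Average the available months of one section.

    None marks a month without data (e.g. a dry or frozen section);
    readings below the limit of detection are replaced by LOD / 2.
    """
    observed = [lod / 2 if v < lod else v for v in monthly_values if v is not None]
    return sum(observed) / len(observed) if observed else None


def percentile(values, q):
    """q-th percentile by linear interpolation between order statistics
    (one common convention; the source does not state which one was used)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# Hypothetical monthly readings of one pollutant at five sections (LOD = 0.1).
sections = {
    "s1": [0.02, None, 0.30, 0.20],
    "s2": [0.4, 0.4],
    "s3": [0.2, 0.2],
    "s4": [1.0, 1.0],
    "s5": [0.6, None, 0.6],
}
means = {name: annual_mean(vals, lod=0.1) for name, vals in sections.items()}
threshold = percentile(list(means.values()), 75)
# High-value status: 1 if the annual mean exceeds the threshold, else 0.
high = {name: m > threshold for name, m in means.items()}
print(threshold, high)
```

Averaging over the available months is what lets a section with a few missing months stay in the analysis, at the cost of assuming that the gaps are not systematically related to pollution levels.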
To account for social determinants of cancer risk, we obtained data on several key factors from China Statistical Yearbook 202056. Specifically, we looked at per capita gross domestic product (GDP), the fraction of the population aged 65 years and older, and the urbanisation rate. The base map was obtained from the China Resource and Environment Science and Data Centre57.
Modified local Moran’s I index for identification of spatial clustering patterns
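We utilised a modified version of Moran’s I, originally designed for continuous variables, to analyse the spatial clustering patterns of high-value status regarding the pollutants33. This modified Moran’s I accounts for categorical variables. Written in the standard local Moran’s I form, with the binary indicator substituted for the continuous variable, the index for analytic unit i is:
\[{I}_{i}=\frac{{x}_{i}^{\ast }-{\bar{x}}^{\ast }}{{S}^{2}}\sum _{j=1,j\ne i}^{n}{w}_{ij}\left({x}_{j}^{\ast }-{\bar{x}}^{\ast }\right),\qquad {S}^{2}=\frac{1}{n}\sum _{i=1}^{n}{\left({x}_{i}^{\ast }-{\bar{x}}^{\ast }\right)}^{2}\]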
where \({x}^{\ast }\) is a binary variable that takes on values of 0 or 1. It serves as an indicator variable to determine whether the concentration x exceeds a certain threshold. If the concentration x exceeds the threshold, \({x}^{\ast }=1\); otherwise, if the concentration x does not exceed the threshold, \({x}^{\ast }=0\). Thus it follows a two-point distribution. \({\bar{x}}^{\ast }\) is the mean of the binary variable. n is the number of analytic units. \({x}_{i}^{\ast }\) and \({x}_{j}^{\ast }\) represent the values of the binary variable for the ith and jth analytic units, respectively. The weight wij is defined as the inverse of the distance between neighbouring units i and j. To calculate this weight, we used the minimum distance that ensures each unit has at least one neighbouring unit as the distance threshold.
As with the original local Moran’s index, six types of spatial patterns were derived from the value of the local variable, the value of the local Moran’s index, and the result of its hypothesis test (Table 3).
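A minimal numerical sketch of the binary local Moran's I is given below. It assumes the standard local Moran's I form with variance normalised by n, unstandardised inverse-distance weights inside a distance band, and planar coordinates; these details, the function names and the toy data are assumptions for illustration. The hypothesis test needed to assign the six patterns is deliberately left out.

```python
import math


def min_connecting_distance(coords):
    """Smallest distance band that gives every unit at least one neighbour."""
    return max(
        min(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        for i, p in enumerate(coords)
    )


def binary_local_moran(values, threshold, coords, max_dist):
    """Return (x_star, local_I) for one pollutant.

    x_star[i] is 1 if values[i] exceeds the threshold, else 0;
    w_ij = 1 / d_ij for units within max_dist of each other, else 0.
    """
    n = len(values)
    x_star = [1 if v > threshold else 0 for v in values]
    mean_star = sum(x_star) / n
    dev = [x - mean_star for x in x_star]
    s2 = sum(d * d for d in dev) / n
    if s2 == 0:
        raise ValueError("all units share the same status; the index is undefined")
    local_i = []
    for i in range(n):
        lag = 0.0
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if 0 < d <= max_dist:
                lag += dev[j] / d  # inverse-distance weighted neighbour deviation
        local_i.append(dev[i] * lag / s2)
    return x_star, local_i


# Hypothetical units on a line: two polluted units next to two clean ones.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [10.0, 10.0, 1.0, 1.0]
band = min_connecting_distance(coords)
x_star, local_i = binary_local_moran(values, threshold=5.0, coords=coords, max_dist=band)
print(x_star, local_i)
```

In the toy example the two end units have a positive index because their only neighbour shares their status, while the two middle units sit on the boundary between high and low and score zero.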
For each analytic unit, any pollutant identified as HH, HL or HN was defined as “H”. The pollution level of air or surface water in each analytic unit (1 to 3, low to high) was defined based on the number of “H”s: for air, units with 0–1, 2, and 3–6 “H” pollutants were defined as levels 1, 2, and 3, respectively. For water, units with 0–1, 2–5, and 6–13 “H” pollutants were defined as levels 1, 2, and 3, respectively. Thus, based on the 3 × 3 cross table of levels of both air and surface water pollution, we defined the co-pollution grade as follows: Grade I (both air and water pollution at low levels, 1-1); Grade II (1-2 or 2-1); Grade III (1-3, 3-1, or 2-2); and Grade IV (2-3, 3-2, or 3-3). The grouping was based on three criteria: order consistency, sufficient interval, and appropriate size for each group. We examined an alternative grouping scheme that met these conditions to assess the robustness of the dose-response relation between the co-pollution degree and cancer risk.
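The grading rule is a small lookup, sketched here for concreteness. The cut-offs and the 3 × 3 mapping follow the description above, with the 1-3, 3-1 and 2-2 cells taken as Grade III (the third of the four progressive grades); the function and constant names are illustrative assumptions.

```python
H_PATTERNS = {"HH", "HL", "HN"}  # patterns counted as high-level ("H")

# (largest "H" count for level 1, largest "H" count for level 2)
AIR_CUTOFFS = (1, 2)    # 0-1 -> level 1, 2 -> level 2, 3-6 -> level 3
WATER_CUTOFFS = (1, 5)  # 0-1 -> level 1, 2-5 -> level 2, 6-13 -> level 3

GRADE_BY_LEVELS = {
    (1, 1): "I",
    (1, 2): "II", (2, 1): "II",
    (1, 3): "III", (3, 1): "III", (2, 2): "III",
    (2, 3): "IV", (3, 2): "IV", (3, 3): "IV",
}


def pollution_level(n_high, cutoffs):
    """Map the number of "H" pollutants in one medium to a level from 1 to 3."""
    level1_max, level2_max = cutoffs
    if n_high <= level1_max:
        return 1
    if n_high <= level2_max:
        return 2
    return 3


def co_pollution_grade(air_patterns, water_patterns):
    """Grade one analytic unit from the cluster patterns of its pollutants."""
    air_h = sum(p in H_PATTERNS for p in air_patterns)
    water_h = sum(p in H_PATTERNS for p in water_patterns)
    levels = (
        pollution_level(air_h, AIR_CUTOFFS),
        pollution_level(water_h, WATER_CUTOFFS),
    )
    return GRADE_BY_LEVELS[levels]


# Hypothetical unit: three "H" air pollutants and six "H" water pollutants.
air = ["HH", "HN", "HL", "LL", "LN", "LH"]
water = ["HH"] * 6 + ["LL"] * 7
print(co_pollution_grade(air, water))
```

Keeping the mapping as an explicit table makes it easy to swap in an alternative grouping scheme for a sensitivity analysis without touching the counting logic.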
A mixed modelling strategy for identification of cancer-specific key pollutants
We adopted a mixed strategy of machine learning (SHAP analysis45) and classical statistics (negative binomial regression58) to identify key pollutants with an active role in the mixtures of pollutants affecting cancer incidence.
SHAP is a game theory-based framework that has been used to explain various supervised learning models without the need to know the exact structure inside the model. By providing both group and individual interpretations as well as information about the direction of the variable’s effects on outcomes, SHAP has been widely used in medical research as a more flexible approach to model interpretation. In this study, SHAP allowed us to estimate the degree of impact of each pollutant on cancer incidence. The SHAP value \({\phi }_{j}\) of pollutant j was calculated as follows:
where F is the set of all pollutants, S is any subset of F, “| |” denotes the number of elements in the set and “!” denotes factorial. The SHAP value reflects the importance of pollutant j as the weighted average, across all subsets S, of the difference between the predicted values with and without pollutant j. SHAP can be applied to any machine learning model; we used the random forest algorithm. To train the model fully, we set the number of trees to 1000 and the minimum split sample size to 2, and placed no restriction on tree depth.
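The weighted average described above can be illustrated by computing it exactly for a toy model. The sketch assumes the standard Shapley form, in which the subsets S are drawn from F without pollutant j and each is weighted by |S|!(|F| − |S| − 1)!/|F|!. It does not reproduce the study's random forest pipeline: the value function `toy_value` and its effect sizes are hypothetical, and the pollutant names serve only as labels.

```python
from itertools import combinations
from math import factorial


def shapley_value(j, features, value):
    """Exact Shapley value of feature j.

    `value(S)` must return the model prediction when only the features in
    the frozenset S are available.
    """
    others = [f for f in features if f != j]
    n = len(features)
    phi = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            # Marginal contribution of j when added to subset s
            phi += weight * (value(s | {j}) - value(s))
    return phi


def toy_value(s):
    """Hypothetical prediction: additive effects plus one pairwise interaction."""
    effects = {"PM2.5": 2.0, "NO2": 1.0, "permanganate_index": 0.5}
    total = sum(effects[f] for f in s)
    if "PM2.5" in s and "NO2" in s:
        total += 1.0  # interaction term, split equally between the two features
    return total


features = ["PM2.5", "NO2", "permanganate_index"]
phi = {f: shapley_value(f, features, toy_value) for f in features}
```

The values sum to the difference between the prediction with all features and the prediction with none. Exact enumeration grows exponentially with the number of features, so it is practical only for small toy cases like this one.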
The larger the SHAP value, the greater the effect of the pollutant on cancer incidence. Because the variability of SHAP values for variables beyond rank 10 tended to stabilise in most cancer types, we considered the 10 leading pollutants (approximately half of the total number) for each cancer type, which ensured an adequate number of variables and consistent screening criteria. For the selected pollutants that contributed positively to cancer risk, we conducted a multivariable negative binomial regression analysis to quantify the magnitude of their effects as incidence rate ratios (RRs). The RRs were adjusted for the other pollutants to account for potential confounding. We also examined the influence of social factors by estimating the RRs both with and without additional adjustment for per capita gross domestic product, the fraction of the population aged 65 years and older, and the urbanisation rate. Given the lack of prior knowledge regarding the specific cancer effects of the multiple pollutants under investigation, this combined approach was intended to enhance the validity and reliability of our findings.
Population attributable fraction for quantifying cancer burden
We used the population attributable fraction (PAF)59 to quantify the cancer burden attributable to pollution in each grade, applying the continuous exposure method to each grade separately. The PAF was calculated for each cancer type, with the contribution of the ith pollutant in the jth (j = I–IV) grade determined using the following formula:
where x represents the pollutant concentration, and min and max are the minimum and maximum concentration values of each pollutant. RRi(x) is the incidence rate ratio at concentration x, obtained by negative binomial regression with adjustment for other pollutants and social factors. The upper limit of RRi(x) is restricted to the value of the rate ratio at the threshold concentration to obtain the minimum estimate of PAF. Pij(x) is the concentration distribution of the ith pollutant in the jth grade. Pi*(x) is the theoretical minimum risk exposure distribution, which assumes that all units are exposed to concentrations below the threshold. Comparing the actual and assumed (reference) conditions yields an estimate of the attributable burden.
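A discretised version of this calculation is sketched below. It assumes the standard continuous-exposure form, PAF = (∫ RR(x) P(x) dx − ∫ RR(x) P*(x) dx) / ∫ RR(x) P(x) dx over [min, max], with the integrals replaced by sums over concentration bins. The log-linear rate ratio, its coefficient and both distributions are hypothetical; any cap on RR is left to the caller through the `rr` function.

```python
import math


def paf_continuous(rr, actual, counterfactual):
    """Discretised PAF for one pollutant in one grade.

    `actual` and `counterfactual` map concentration -> probability mass
    (each summing to 1); `rr` maps concentration -> rate ratio.
    """
    burden_actual = sum(rr(x) * p for x, p in actual.items())
    burden_reference = sum(rr(x) * p for x, p in counterfactual.items())
    return (burden_actual - burden_reference) / burden_actual


def rr(x):
    """Hypothetical log-linear rate ratio (coefficient chosen for illustration)."""
    return math.exp(0.01 * x)


# Hypothetical exposure distribution in one grade, and a reference distribution
# in which every unit sits at a concentration below the threshold
actual = {20.0: 0.25, 40.0: 0.5, 60.0: 0.25}
counterfactual = {20.0: 1.0}
paf = paf_continuous(rr, actual, counterfactual)
```

When the actual distribution already equals the reference distribution, the function returns zero, which is a convenient sanity check.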
The combined PAF of co-pollution in air and water in the jth grade was calculated following ref. 60:
where PAFij is the PAF for the single pollutant calculated in Eq. (3).
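As a worked illustration of combining single-pollutant fractions, the sketch below assumes the multiplicative form commonly used for multiple risk factors, PAFj = 1 − Πi (1 − PAFij), which treats the pollutants as acting independently. Both the form and the input values are assumptions made for illustration.

```python
def combined_paf(single_pafs):
    """Combine single-pollutant PAFs as 1 - prod(1 - PAF_i) (assumed form)."""
    remaining = 1.0
    for paf in single_pafs:
        remaining *= 1.0 - paf
    return 1.0 - remaining


# Hypothetical single-pollutant PAFs for one grade
example = combined_paf([0.10, 0.05, 0.02])
```

Under this form the combined fraction is always smaller than the simple sum of the single-pollutant fractions and cannot exceed 1.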
ArcGIS software version 10.8 (Esri, Redlands, CA, USA) was used to calculate the modified local Moran’s index and to visualise all maps. Other analyses used SAS software (version 9.4), R (version 4.2.1) and the “SHAP” package in Python (version 3.10). The code for the key steps is available in the Supplementary Information (Supplementary Notes 1–3).
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Sources of the raw public datasets used in the paper are summarised in the ‘Methods’ section. Data on air pollutants can be obtained from the China National Environmental Monitoring Centre (https://air.cnemc.cn:18007/). Data on surface water can be obtained from the Ministry of Ecology and Environment of China (http://www.cnemc.cn/). Data on cancer incidence can be obtained from the China Cancer Registry Annual Report. The sharing of ecological environment data, including surface water and air pollutants, is governed by a series of national and local technical standards, and the authors are not authorised to disclose the original data. For access to the original data, please contact the relevant departments listed in the ‘Methods’ section. Source data are provided with this paper.
Code availability
The code used in this study can be obtained in the Supplementary Information.
References
Lelieveld, J., Evans, J. S., Fnais, M., Giannadaki, D. & Pozzer, A. The contribution of outdoor air pollution sources to premature mortality on a global scale. Nature 525, 367–371 (2015).
Rodell, M. et al. Emerging trends in global freshwater availability. Nature 557, 651–659 (2018).
Seifollahi-Aghmiuni, S., Kalantari, Z., Egidi, G., Gaburova, L. & Salvati, L. Urbanisation-driven land degradation and socioeconomic challenges in peri-urban areas: Insights from Southern Europe. Ambio 51, 1446–1458 (2022).
Li, X. et al. Urbanization and health in China, thinking at the national, local and individual levels. Environ. Health 15, 32 (2016).
Nweke, O. C. & Sanders, W. H. 3rd Modern environmental health hazards: a public health issue of increasing significance in Africa. Environ. Health Perspect. 117, 863–870 (2009).
World Health Organization. Ambient (outdoor) air pollution. https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (2022).
Hamra, G. B. et al. Outdoor particulate matter exposure and lung cancer: a systematic review and meta-analysis. Environ. Health Perspect. 122, 906–911 (2014).
Hill, W. et al. Lung adenocarcinoma promotion by air pollutants. Nature 616, 159–167 (2023).
Kulhánová, I. et al. The fraction of lung cancer incidence attributable to fine particulate air pollution in France: Impact of spatial resolution of air pollution models. Environ. Int. 121, 1079–1086 (2018).
Amadou, A. et al. Long-term exposure to nitrogen dioxide air pollution and breast cancer risk: A nested case-control within the French E3N cohort study. Environ. Pollut. 317, 120719 (2023).
Hystad, P. et al. Exposure to traffic-related air pollution and the risk of developing breast cancer among women in eight Canadian provinces: a case-control study. Environ. Int. 74, 240–248 (2015).
World Health Organization. Ambient (outdoor) air pollution. Geneva, Switzerland (2016).
Zhang, H. et al. Changes in China’s river water quality since 1980: management implications from sustainable development. npj Clean Water 6, 45 (2023).
Jin, H., Chen, X., Zhong, R. & Liu, M. Influence and prediction of PM2.5 through multiple environmental variables in China. Sci. Total Environ. 849, 157910 (2022).
Lu, Y. et al. Impacts of soil and water pollution on food safety and health risks in China. Environ. Int. 77, 5–15 (2015).
Schraufnagel, D. E. et al. Air pollution and noncommunicable diseases: A review by the forum of international respiratory societies’ environmental committee, part 2: Air pollution and organ systems. Chest 155, 417–426 (2019).
Zhang, H., Li, H., Gao, D. & Yu, H. Source identification of surface water pollution using multivariate statistics combined with physicochemical and socioeconomic parameters. Sci. Total Environ. 806, 151274 (2022).
Soerjomataram, I. & Bray, F. Planning for tomorrow: global cancer incidence and the role of prevention 2020-2070. Nat. Rev. Clin. Oncol. 18, 663–672 (2021).
Cassetti, T., La Rosa, F., Rossi, L., D’Alò, D. & Stracci, F. Cancer incidence in men: a cluster analysis of spatial patterns. BMC Cancer 8, 344 (2008).
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249 (2021).
WHO IARC. Agents Classified by the IARC Monographs, Volumes 1–132: WHO International Agency for Research on Cancer. (2022).
Ma, T. et al. China’s improving inland surface water quality since 2003. Sci. Adv. 6, eaau3798 (2020).
Han, L., Zhou, W., Pickett, S. T., Li, W. & Qian, Y. Multicontaminant air pollution in Chinese cities. Bull. World Health Organ. 96, 233–242e (2018).
Cao, M. et al. Cancer screening in China: The current status, challenges, and suggestions. Cancer Lett. 506, 120–127 (2021).
Liu, J. & Diamond, J. China’s environment in a globalizing world. Nature 435, 1179–1186 (2005).
Ministry of Ecology and Environment of China. Report on the State of the Ecology and Environment in China. (2015).
China Environmental Monitoring Station. National urban air quality real-time release platform. https://air.cnemc.cn:18007/ (2023).
China National Environmental Monitoring Centre. National Surface Water Environmental Quality Monitoring Network. http://www.cnemc.cn/ (2021).
National Cancer Center of China. Annual Report of the China Cancer Registry 2019. Beijing: China People’s Medical Publishing House. (2019).
World Health Organization. WHO Global Air Quality Guidelines: Particulate Matter (PM2.5 and PM10), Ozone, Nitrogen Dioxide, Sulfur Dioxide and Carbon Monoxide. Geneva (2021).
Ministry of Ecology and Environment of China. Environmental Quality Standards for Surface Water. GB 3838–2002 (2002).
Ministry of Ecology and Environment of China. Ambient air quality standards. GB 3095–2012 (2012).
Wang, Z. et al. Spatial association of surface water quality and human cancer in China. npj Clean Water 6, 53 (2023).
Zhang, S. et al. Cancer incidence and mortality in China, 2015. J. Natl. Cancer Cent. 1, 2–11 (2021).
Zeng, H. et al. Cancer survival in China, 2003–2005: a population-based study. Int. J. Cancer 136, 1921–1930 (2015).
Yang, L. et al. Burden of lung cancer attributable to ambient fine particles and potential benefits from air quality improvements in Beijing, China: A population-based study. Sci. Total Environ. 738, 140313 (2020).
Tan, X. et al. A review of current air quality indexes and improvements under the multi-contaminant air pollution exposure. J. Environ. Manag. 279, 111681 (2021).
Nong, X., Shao, D., Zhong, H. & Liang, J. Evaluation of water quality in the South-to-North Water Diversion Project of China using the water quality index (WQI) method. Water Res. 178, 115781 (2020).
Davalos, A. D., Luben, T. J., Herring, A. H. & Sacks, J. D. Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Ann. Epidemiol. 27, 145–153 e141 (2017).
Anselin, L. Local Indicators of Spatial Association—LISA. Geogr. Anal. 27, 93–115 (1995).
Environmental Planning Institute, Ministry of Ecology and Environment. Report on Air Pollution Control Strategy of the 14th Five-Year Plan in Beijing-Tianjin-Hebei and Surrounding Areas (2021).
Yang, G. & Zhuang, D. Atlas of the Huai River Basin Water Environment: Digestive Cancer Mortality. 1 edn, Springer Dordrecht (2014).
Ministry of Ecology and Environment of China. Report on the State of the Ecology and Environment in China. (2021).
Paget, S. The distribution of secondary growths in cancer of the breast. Lancet 133, 571–573 (1889).
Lundberg, S. M. & Lee, S. I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30, 4765–4774 (2017).
Shi, J. F. et al. Is it possible to halve the incidence of liver cancer in China by 2050? Int. J. Cancer 148, 1051–1065 (2021).
Datzmann, T. et al. Outdoor air pollution, green space, and cancer incidence in Saxony: a semi-individual cohort study. BMC Public Health 18, 715 (2018).
Raaschou-Nielsen, O., Ketzel, M., Harbo Poulsen, A. & Sørensen, M. Traffic-related air pollution and risk for leukaemia of an adult population. Int. J. Cancer 138, 1111–1117 (2016).
Ministry of Ecology and Environment of China. Water quality-Determination of permanganate index, GB11892-89 (1989).
Picetti, R. et al. Nitrate and nitrite contamination in drinking water and cancer risk: A systematic review with meta-analysis. Environ. Res. 210, 112988 (2022).
Dixon, K. & Kopras, E. Genetic alterations and DNA repair in human carcinogenesis. Semin. Cancer Biol. 14, 441–448 (2004).
Hendryx, M., Conley, J., Fedorko, E., Luo, J. & Armistead, M. Permitted water pollution discharges and population cancer and non-cancer mortality: toxicity weights and upstream discharge effects in US rural-urban areas. Int. J. Health Geogr. 11, 9 (2012).
Ministry of Ecology and Environment of the People’s Republic of China. Technical regulation for selection of ambient air quality monitoring stations. (2013).
Ministry of Ecology and Environment of the People’s Republic of China. Technical Specifications for Surface Water Environmental Monitoring. (2022).
Ministry of Ecology and Environment of the People’s Republic of China. Technical Specification for Surface Water Quality Assessment. (2022).
National Bureau of Statistics of China. China Statistical Yearbook (2020).
Xu, X. Multi-annual district and county administrative boundary data of China. Resource and environmental science data registration and publication system. (2023).
Hilbe, J. M. Negative Binomial Regression. (Cambridge University Press, 2011).
Mansournia, M. A. & Altman, D. G. Population attributable fraction. Br. Med. J. 360, k757 (2018).
Ezzati, M. et al. Estimates of global and regional potential health gains from reducing multiple major risk factors. Lancet 362, 271–280 (2003).
Acknowledgements
We would like to thank the CAMS Innovation Fund for Medical Sciences [grant numbers 2017-I2M-1-009 and 2021-1-I2M-022] for their support of this work. The funder played no role in the study design, data collection, analysis and interpretation of data, or the writing of this manuscript.
Author information
Contributions
J.J. and C.J. contributed to the study's conception and design. L.Z., W.G., C.Y. and Y.S. contributed to data analysis. J.J., L.Z., Z.W., W.G., C.Y. and Y.S. contributed to data interpretation. J.Z., W.H., Y.H. and W.C. contributed to literature research. F.X., X.G. and H.L. contributed to data collection. P.W., Y.C., Y.Z. and J.D. contributed to figure design. J.J., L.Z., Z.W., W.G., C.Y. and Y.S. wrote the manuscript. All authors had access to the raw data, gave critical revisions for important intellectual content and gave final approval of the version to be published.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Jianguang Ji, who co-reviewed with Yishan Liu, and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jiang, J., Zhang, L., Wang, Z. et al. Spatial consistency of co-exposure to air and surface water pollution and cancer in China. Nat Commun 15, 7813 (2024). https://doi.org/10.1038/s41467-024-52065-3