Multiple air pollutant exposure and lung cancer in Tehran, Iran

Lung cancer is the most rapidly increasing malignancy worldwide, with an estimated 2.1 million cases in the latest (2018) World Health Organization (WHO) report. The objective of this study was to investigate the association between air pollution and lung cancer in Tehran, Iran. Residential area information for the latest registered lung cancer cases diagnosed between 2014 and 2016 (N = 1,850) was obtained from the population-based cancer registry of Tehran. Long-term average exposure to PM10, SO2, NO, NO2, NOX, benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene (BTEX), and total BTEX (TBTEX) in the 22 districts of Tehran was estimated using land use regression models. Latent profile analysis (LPA) was used to generate multi-pollutant exposure profiles. Negative binomial regression analysis was used to examine the association between air pollutants and lung cancer incidence. The districts with higher concentrations for all pollutants were mostly in downtown and around the railway station. Higher district-level concentrations of NOx (IRR = 1.05, for each 10 unit increase in air pollutant), benzene (IRR = 3.86), toluene (IRR = 1.50), ethylbenzene (IRR = 5.16), p-xylene (IRR = 9.41), o-xylene (IRR = 7.93), m-xylene (IRR = 2.63) and TBTEX (IRR = 1.21) were significantly associated with higher lung cancer incidence. Districts with a higher multiple air-pollution profile also had a higher lung cancer incidence (IRR = 1.01). Our study shows a positive association between air pollution and lung cancer incidence. The association was strongest, in descending order, for p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and toluene.

www.nature.com/scientificreports/

Humans are simultaneously exposed to a complex mixture of air pollutants; therefore, many researchers have investigated a multiple-pollutant approach for assessing air pollution exposure 8 , because in single-pollutant models it is not clear whether an observed association reflects the effect of the specific pollutant under study or the effect of coinciding pollutants 9 . However, there is no consensus on the method used for measuring multiple ambient air pollutants simultaneously. Previous review studies have evaluated multiple pollutants by using methods that can be classified into three groups: dimension reduction, variable selection, and grouping of observations 4,8,10,11 . However, Caban-Martinez, et al. 12 and Kolpacoff, et al. 13 have used latent profile analysis (LPA) to identify subgroups of cancer patients with different multi-pollution profiles. LPA is a probabilistic, model-based variant of traditional cluster analysis that better handles outliers and unequal cluster sizes 14 . This method enables the identification of possible unobservable subgroups, or latent classes, in a population using a number of related observable variables. With this method, complex relations between groups of risk factors and disease outcomes, such as cancer, which may not be best explained by a single-pollutant model, can be better understood. It also reduces the dimensionality of exposure data and decreases the burden of multiple testing, while enhancing the power of statistical analysis 13 .
Knowledge about geographical patterns of multiple pollution helps policy makers to target high-risk regions for more intense interventions. Air pollution exacerbates the health disparity among socioeconomic groups, because areas of low socioeconomic status are usually more polluted 15,16 . The relation between air pollution exposure and health outcomes can also, in theory, be modified by socioeconomic status, through differences in access to medical care and a healthy diet, and by biological factors, such as age and psychological stress. However, the hypothesis that residents with a low socioeconomic position face more severe consequences from air pollution is debated 16 .
In the current study, we aimed to examine the association between single and multiple ambient air pollutants and lung cancer incidence in Tehran, Iran.

Methodology
Research location. This study was conducted in Tehran, a megacity that is the most populous city in Iran and Western Asia, with a resident population of about 9 million and a daytime population of over 14 million people. According to the World Population Review report, Tehran's 2020 population was estimated at 9,134,708. Tehran has the third-largest metropolitan area in the Middle East and is ranked 24th in the world by the population of its metropolitan area. The city covers about 730 km² and consists of 22 municipal districts with different concentrations of ambient air pollutants 17 .
Data sources. Lung cancer data. In total, 1850 patients residing in Tehran were diagnosed with lung cancer (trachea, bronchus and lung) between 2014 and 2016. The latest residential address of each of these patients in Tehran was obtained from the Cancer Department of the Ministry of Health of Iran. According to the officials of the cancer registry, the recorded addresses are more than 90% accurate.
The geographical coordinates (longitude and latitude) of each patient's residential address were determined and marked on the GIS map of Tehran.
Exposure assessment. The annual mean concentrations of PM 10 , SO 2 , NO, NO 2 , NO X , benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene (BTEX), and total BTEX in the 22 districts of Tehran were obtained from previously developed land use regression (LUR) models. The LUR models for PM 10 , SO 2 , NO, NO 2 and NO X in Tehran were developed based on measurements conducted at 23 sites in Tehran in 2010 18,19 . The models for volatile organic compounds (VOCs) were developed based on measurements at 179 sampling sites from April 2015 to May 2016 20,21 .
Confounding covariates. Population-based data were extracted from the Urban Health Equity Assessment and Response Tool (Urban HEART-2) study, which was conducted in the 22 districts of Tehran in 2011 and serves as a data repository of district-level variables, such as population density, per capita urban green space, smoking rates and life expectancy. A detailed description of the Urban HEART-2 study can be found elsewhere 22 .
The socio-economic development status of the 22 districts of Tehran was extracted from a study conducted by Sadeghi et al. 23 . In brief, sixteen economic and social indicators were combined to estimate the level of development in the 22 districts of Tehran, based on exploratory factor analysis (EFA) and principal component analysis (PCA). These multivariate statistical techniques reduce the number of variables in a dataset to a smaller number of "dimensions" that explain most of the variation in the dataset using a few estimated latent variables 23 .
In the study by Sadeghi et al., the variables used for estimating the social dimension included adult literacy level (among 30-59 year olds), elderly literacy level (60 years and older), the proportion of university graduates in the total population, the proportion of males with university education, the proportion of females with university education, and the percentage of the population that uses the internet. The variables used for estimating the economic dimension included women's economic participation rate, the proportion of employees with high-rank jobs, the proportion of households with cars, the proportion of households with computers, indicators of household access to public facilities, the proportion of households who own their home, the proportion of homes meeting civil standards, the average price per square meter of residential building land, the average selling price per square meter of residential building area and the average monthly rent per square meter of residential area for each district. A detailed description of these variables can be found elsewhere 23 . A higher socio-economic development score indicated a higher socio-economic level. This variable did not have a specific unit; its minimum was 36.6 and its maximum was 67.4 23 .
Statistical analyses. Latent profile analysis (LPA) was used to construct multiple-pollution profiles 14 . A series of LPAs was performed, ranging from two to seven latent profiles. The 12 air pollutants (PM 10 , SO 2 , NO, NO 2 , NO X , benzene, toluene, ethylbenzene, m-xylene, p-xylene, o-xylene and TBTEX) were used as LPA indicators. All 12 pollutants were grand-mean centered to facilitate interpretation; grand-mean centering subtracts the overall mean of a variable from each of its values 24 . To identify the best fitting model, measures of relative statistical fit and the interpretability of the profile structure were used. Models with low values of the AIC, BIC and aBIC metrics and a significant Bootstrap Likelihood Ratio Test (BLRT) were preferred 25 . The absolute value of the log likelihood is not recommended for model selection, because this value gradually improves as more parameters are added to the model. Entropy was also evaluated for each model; values closer to 1 suggest better discrimination of the latent classes 26 . In addition, the interpretability and parsimony of the candidate models were compared. Models containing a profile with less than 5% of the sample are considered spurious 27 and are not acceptable. Data preparation was done in Stata version 14. LPA was performed using the mixture modeling procedure of Mplus version 7.4 (Muthén & Muthén, 1998-2015), with the robust maximum likelihood (MLR) estimator. Missing data were addressed using full information maximum likelihood (FIML). To examine how the multiple-pollution profiles differ in terms of each air pollution component, one-way ANOVA and post hoc follow-up tests were used. Kolmogorov-Smirnov tests were used to test the normality of the pollutants and, because the data were normally distributed, Pearson's correlation was used to estimate the correlation between pollutants. As the number of lung cancer cases was over-dispersed, negative binomial (NB) regression was performed to estimate the incidence rate ratios (IRR) and their 95% confidence intervals (CI) for each air pollutant and for the multiple-pollution profiles, adjusted for age, sex, and smoking at the district level.
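The fit indices named above can be sketched in a few lines. This is a minimal illustration, assuming the standard textbook definitions of AIC, BIC, sample-size-adjusted BIC and relative entropy; it is not the Mplus implementation, and the function names, log-likelihood, parameter count and posterior probabilities are hypothetical.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return AIC, BIC and sample-size-adjusted BIC for one fitted model."""
    deviance = -2.0 * log_lik
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    # aBIC uses (n + 2) / 24 in place of n inside the log penalty.
    abic = deviance + n_params * math.log((n_obs + 2.0) / 24.0)
    return {"AIC": aic, "BIC": bic, "aBIC": abic}

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; each row holds one unit's posterior
    class-membership probabilities. Values near 1 mean clear separation."""
    n_units = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # p * log(p) tends to 0 as p tends to 0
                total += -p * math.log(p)
    return 1.0 - total / (n_units * math.log(n_classes))

# Hypothetical inputs for illustration only (not values from this study).
print(information_criteria(log_lik=-512.3, n_params=50, n_obs=1653))
print(relative_entropy([[0.98, 0.01, 0.01], [0.05, 0.90, 0.05], [0.02, 0.03, 0.95]]))
```

Lower values of the three information criteria indicate a better trade-off between fit and complexity, which is why the raw log likelihood alone is not used for selection.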
In order to calculate the population attributable fraction (PAF), the risk estimates for air pollutants were obtained from the results of negative binomial (NB) regression analysis.
The PAF for air pollutants treated as continuous variables was estimated using an equation described previously 28 . In this equation, RR unit is the relative risk for each one-unit increment in exposure to the air pollutant and χ is the average exposure.
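As a sketch of what such an expression can look like, one commonly used form of the PAF for a continuous exposure, written with the symbols defined above, is the following. Whether it matches the equation in reference 28 exactly is an assumption.

```latex
\[
\mathrm{PAF}
  = \frac{RR_{\mathrm{unit}}^{\,\chi} - 1}{RR_{\mathrm{unit}}^{\,\chi}}
  = 1 - RR_{\mathrm{unit}}^{-\chi}
\]
```

Under this form, the PAF is 0 when RR unit equals 1 or χ equals 0, and it approaches 1 as either the unit relative risk or the average exposure grows.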
Statistical analyses were performed using Mplus version 7.4, Stata version 14 (Stata Corp LLC; College Station, TX, USA) and ArcGIS version 10.8.
As the data were obtained in aggregated and anonymous form, informed consent from individuals or their families was not required. The data are not publicly available, but can be requested in aggregated and/or anonymous form through a formal request to the Ministry of Health of Iran. The ethics approval code of this project was IR.KMU.REC.1398.230. All methods were carried out in accordance with relevant guidelines and regulations.

Results
Basic information about the area under study is shown in Table 1. The total number of lung cancer cases in 2014-2016 was 1850 in all districts of Tehran. We had to exclude the data of subjects who lived in remote suburbs of Tehran, for which estimation of air pollutants was not possible. Eventually, 1653 cases entered the final analyses. The distribution of lung cancer patients in different districts of Tehran is shown in Fig. 1. The highest numbers of patients per 100,000 population were in districts 12, 6 and 11, respectively.
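The per-100,000 figures used to rank districts follow from simple arithmetic, sketched below. The district labels, case counts and populations are hypothetical placeholders, not values from Table 1 or Fig. 1, and the function name is illustrative.

```python
def cases_per_100k(cases, population):
    """Crude number of cases per 100,000 population for one district."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population

# Hypothetical districts: (label, cases over the study period, population).
districts = [("A", 120, 250_000), ("B", 60, 400_000), ("C", 95, 180_000)]

# Rank districts from highest to lowest crude rate.
ranked = sorted(districts, key=lambda d: cases_per_100k(d[1], d[2]), reverse=True)
for label, cases, population in ranked:
    print(label, round(cases_per_100k(cases, population), 1))
```

Because the cases are counted over the whole 2014-2016 window, a figure computed this way is a cumulative count per 100,000 rather than an annual rate.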
The spatial distributions and average levels of ambient air pollutants in different districts of Tehran are shown in Fig. 2. Districts with higher concentrations of pollutants were mostly in downtown (districts 6, 7, 11, 12 and 14), around the railway (districts 16 and 17), and in a few of the southern districts of the city (district 18). District 16 had the highest concentration of SO 2 , and districts 9, 2 and 6 had the highest concentrations of NO, NO 2 and NOx during these years. District 12 had the highest concentration of VOC pollutants.
Fit indices for the different LPA models are displayed in Table 2. All solutions provided acceptable classification accuracy, as indicated by entropy values close to 1. Although models with four, five, six, and seven latent profiles had lower AIC, BIC, and aBIC than the models with two and three latent profiles, these models included classes with less than 1% of the sample. Therefore, the three latent profile model was preferred. The multi-pollution profiles are shown in Fig. 4. Profile 1 had the lowest scores for all pollutants, except SO 2 . We labeled this profile as "low multiple-pollution". Profile 3 had the highest scores for all pollutants. We labeled this profile as "high multiple-pollution". Profile 2 was in between and was labeled "medium multiple-pollution".
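The selection logic described here, preferring low information criteria but rejecting solutions with very small classes, can be sketched as follows. The fit summaries are hypothetical and are not the values in Table 2; the function name, the dictionary keys and the use of BIC alone as the ranking criterion are simplifying assumptions.

```python
def choose_profile_model(candidates, min_share=0.05):
    """Return the lowest-BIC candidate whose smallest profile holds at
    least `min_share` of the sample; smaller profiles count as spurious."""
    admissible = [m for m in candidates if min(m["class_shares"]) >= min_share]
    if not admissible:
        raise ValueError("no candidate satisfies the minimum profile size")
    return min(admissible, key=lambda m: m["bic"])

# Hypothetical fit summaries for illustration only.
candidates = [
    {"profiles": 2, "bic": 980.0, "class_shares": [0.60, 0.40]},
    {"profiles": 3, "bic": 940.0, "class_shares": [0.45, 0.35, 0.20]},
    {"profiles": 4, "bic": 925.0, "class_shares": [0.44, 0.35, 0.205, 0.005]},
]

best = choose_profile_model(candidates)
print(best["profiles"])
```

In this toy input the four-profile model has the lowest BIC but contains a class with 0.5% of the sample, so the three-profile model is returned, mirroring the reasoning in the paragraph above.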
Summary statistics for each pollutant in different profiles are shown in Table 3. There was a significant difference between the means of all pollutants in the three profiles, except SO 2 . Table 4 shows the IRR estimates by single-pollutant and multiple-pollutant multivariable negative binomial regression models, adjusted for age, gender, socioeconomic status, life expectancy and smoking prevalence. In single-pollutant models, p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX were significantly associated with increased lung cancer incidence in model 3, which was adjusted for age, gender, socioeconomic status, life expectancy and smoking prevalence.
In multi-pollutant models, the high multiple-air-pollutants profile was associated with higher lung cancer incidence when compared with the low multiple-air-pollutants profile.
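For readers less familiar with IRRs, the sketch below shows how a coefficient from a log-link count model such as negative binomial regression is converted to an IRR with a Wald 95% confidence interval, and how it is rescaled to a larger exposure increment (for example, 10 units). The coefficient and standard error are hypothetical and not taken from Table 4; the function name is illustrative.

```python
import math

def irr_with_ci(beta, se, step=1.0, z=1.96):
    """IRR and Wald confidence interval for a `step`-unit increase in
    exposure, given a log-link regression coefficient and its standard error."""
    point = math.exp(step * beta)
    lower = math.exp(step * (beta - z * se))
    upper = math.exp(step * (beta + z * se))
    return point, lower, upper

# Hypothetical coefficient per 1-unit increase (not a value from this study).
beta, se = 0.005, 0.002
print(irr_with_ci(beta, se))           # per 1-unit increase
print(irr_with_ci(beta, se, step=10))  # per 10-unit increase
```

An interval that excludes 1 corresponds to a statistically significant association at the chosen level; rescaling the increment changes the size of the IRR but not its significance.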
The fractions of lung cancers attributable to each air pollutant are shown in Fig. 5. The highest fractions belong to m-xylene, o-xylene, and TBTEX.

Discussion
This study was the first to investigate the effect of single and multiple ambient air pollutants on lung cancer in Iran. The findings suggest that ambient air pollutants, especially p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX, were associated with lung cancer. Several previous studies have also shown a strong association between air pollution and respiratory mortality 29,30 and respiratory diseases 31,32 , including chronic obstructive pulmonary disease (COPD), asthma, bronchitis, and decreased lung function 33 . Recently, some studies have shown an association between air pollution and lung cancer as well 3 . Iran's national cancer registry data indicate an approximately sevenfold increase in lung cancer incidence over a 27-year span (1990 to 2016), both in the whole country and in the capital city, Tehran 34 . In the past, lung cancer had been mainly attributed to direct tobacco smoke exposure. However, its increased incidence in never-smokers in recent years shows that there are other risk factors that need to be discovered 35 . Some of the probable risk factors for lung cancer in never-smokers could be environmental exposures, such as air pollution, occupational carcinogens, radon and infections 36 . In Taiwan, air pollution was associated with the incidence of lung cancer in never-smokers 37 ; and the results of lung cancer screening programs in China and the United States in 2018 showed that the incidence of lung cancer in never-smokers was significantly higher in China than in the United States. Their data suggested that inclusion of ambient air pollution could improve lung cancer risk models, especially for non-smokers 38 .
Air pollutants have been reported to be correlated in many studies. Faridi et al. reported positive correlations between PM 2.5 , PM 10 , NO 2 , SO 2 , CO and O 3 in Tehran 39 , and another study from Los Angeles County also reported correlations between multiple ambient air pollutants 40,41 . Studies from Spain have shown positive correlations between PM 10 and PM 2.5 and between nitrogen oxides (NO 2 and NO). The correlations between nitrogen oxides (NO and NO 2 ) and particulate matter (PM 10 ) are probably due to the common sources of these pollutants, which are traffic, heating systems, industries, and other combustion processes 42,43 . High spatial correlations between exposure variables preclude multivariable-adjusted analysis in air pollution studies. In the present study, because of the high correlation between pollutants, we used LPA models to investigate the association between multiple pollutants and lung cancer incidence. Exposure profile modelling for multiple exposures has been used in previous epidemiologic studies on health outcomes such as blood pressure 44 , low birth weight 40 , total mortality 45 , respiratory mortality 46 , and lung cancer in nonsmokers 47 .

Table 1. Description of the study area, air pollution level, and district-level covariates. *Socio-economic status score according to the 16 variables mentioned in the method section. This variable does not have a unit. The lowest value of this score was 36.6 and the highest was 67.4.

Number of districts: 22
Number of lung cancer patients: 1653
Age of patients (year), median (1st-3rd quartile range): 65.5 (60-71)
Gender of patients (male)

This method enables identifying possible unobservable subgroups, or latent classes, in a population using a number of related exposure variables and can help better understand the complex relations between risk factors and health outcomes, such as cancer, that may not be best explained by a single exposure 13 48 . The inversion phenomenon, fossil fuel consumption of old vehicles, low-quality fuel, population congestion, and high-traffic highways, and the existence of several factories in the south of Tehran such as iron and steel industries, are other reasons for the high level of BTEX in Tehran city 48,49 .
In the present study, about 70% of lung cancer patients were men and the prevalence of smoking among men was about 16%. However, even after adjusting for smoking prevalence, the association between air pollutants and lung cancer remained significant.
Su et al. conducted an ecological study on ambient air pollution and overall cancer incidence in Taiwan. Their results showed positive correlations between PM 2.5 , SO 2 , NO x , and O 3 levels and age-adjusted total cancer incidence rates 50 ; and a study conducted on data from 2002 and 2011 in Brazil showed that traffic density and NO 2 were associated with an increased incidence of respiratory cancers 51 . In a large population of 16,209 Norwegian men, after a 27-year follow-up the risk ratios for developing lung cancer attributed to NO x and SO 2 exposure were 1.08 (95% CI = 1.06-1.11) and 1.03 (95% CI = 0.77-1.38), respectively 52 . A cohort study conducted from January 1998 to December 2009 in four Northern Chinese cities, Tianjin, Shenyang, Taiyuan, and Rizhao, showed that the combined effect of NO 2 and PM 10 resulted in a significant increase in mortality from lung cancer 53 , and each ~ 5 μg/m 3 increase in outdoor PM 2.5 concentration was associated with a 2% (95% CI: 1%-5%) increased risk of lung cancer. However, no associations were observed for O 3 or O x and lung cancer 54 . In our analysis, NO 2 was associated with an increased risk of lung cancer as well. In addition, a meta-analysis of 14 studies showed that the pooled risks of lung cancer mortality for PM 2.5 and PM 10 were RR = 1.14 (95% CI: 1.07-1.21) and RR = 1.07 (95% CI: 1.03-1.11), respectively 55 .

Strengths and limitations.
An important limitation of our study was the short time interval between the recorded pollutants and the lung cancer incidence data. However, these were the latest available data on lung cancer in Tehran at the time we started our study. Also, the study design was ecological with no individual-level data, and the PAF would have been better estimated using individual-level data with adjustments for important confounding covariates. However, the results of this study still have important implications for public health, underscoring the need to reduce air pollution. Tehran includes 22 districts, and the number of women with lung cancer was small in some districts, which prevented us from performing further analyses in gender subgroups. However, the gender distribution was adjusted for in the analyses. Also, pathological data about cancer cell types were not available, which prevented us from performing a separate analysis by pathological type.
A novelty of our study was estimating the simultaneous effect of several air pollutants on lung cancer incidence. This helps provide a holistic picture of the effect of complex air pollution mixtures on human disease.

Conclusion
This is the first study to examine the associations between multiple air pollutants and lung cancer incidence in Iran. The findings suggest that lung cancer was associated with ambient air pollution in Tehran, and this association was stronger for p-xylene, o-xylene, ethylbenzene, benzene, m-xylene and TBTEX. Air pollution is a serious problem in Tehran, and decreasing the concentrations of air pollutants should be a key goal for policy makers to reduce the number of lung cancer cases in Tehran.