Urbanites’ mental health undermined by air pollution

The rising mental health difficulties of the urban population in developing countries may be attributed to the high levels of air pollution. However, nationwide large-scale empirical works that examine this claim are rare. In this study, we construct a daily mental health metric using the volume of mental-health-related queries on the largest search engine in China, Baidu, to test this hypothesis. We find that air pollution causally undermines people’s mental health and that this impact becomes stronger as the duration of exposure to air pollution increases. Heterogeneity analyses reveal that men, middle-aged people and married people are more vulnerable to the impact of air pollution on mental health. More importantly, the results also demonstrate that the cumulative effects of air pollution on mental health are smaller for people living in cities with a higher gross domestic product per capita, more health resources, larger areas of green land and more sports facilities. Finally, we estimate that with a one-standard-deviation increase of fine particulate matter (26.3 μg m−3), the number of people who suffer from mental health problems in China increases by approximately 1.15 million. Our findings provide quantitative evidence for the benefits of reducing air pollution to promote mental health and well-being. The impacts of air pollution on mental health have been previously documented but rarely using nationwide large-scale data. This study investigates the short-term and long-term impacts of air pollution on urbanites’ mental health by leveraging national real-time internet search data in China.

Rapid economic development in China has been accompanied by an increase in material living standards as well as a rise in mental health difficulties, especially for urban inhabitants. According to the estimation of the World Health Organization (WHO), the number of people suffering from depression had reached 54.8 million in China (approximately 4.2% of the Chinese population) in 2015 (ref. 1 ), and this number is increasing 2 . Mental disorders worsen human health 3 , lower productivity 4 and reduce quality of life 5,6 . They have become a major contributor to the Chinese disease burden, accounting for ~3.1-7.3% of years lived with disability in 2015 (ref. 1 ). Noticeable changes in social, economic and physical systems in China in recent decades are major factors that lead to the risks of mental disorders. Air pollution, the primary environmental challenge for China and a notable byproduct of China's industrialization 7 , is likely to exacerbate the risk factors for mental disorders.
Recently, a few empirical studies have shed light on the threats that air pollution poses to mental health. For example, Zhang et al. 8 investigated the effects of air pollution on respondents' mental health status over the previous month in China. Xue et al. 9 and Yang et al. 10 used survey data from China to estimate the impact of long-term exposure to air pollution. Newbury et al. 11 and Bakolis et al. 12 examined the association between air pollution and mental health on the basis of survey data from the United Kingdom, but the former study focused only on adolescent psychotic experiences. Some researchers have also utilized depression-related admissions to hospitals to explore the impact of air pollution on mental health [13][14][15][16] .
These prior studies provide an initial understanding of the relationship between air pollution and mental health. However, we still lack a large-scale nationwide quantification of the mental health risks posed by air pollution, especially with regard to the different effects of varied durations of pollution exposure. Previous studies employing approaches such as surveys [8][9][10] are insufficient to capture the real-time impact of air pollution on a national scale. It is unclear whether their findings hold across cities, considering the large economic disparities among different cities in China. For example, a notable negative association between fine particulate matter (PM2.5) concentration and mental health was identified in Shijiazhuang 13 but not in Beijing 16.
This study begins to fill this knowledge gap by utilizing a unique nationwide dataset with a fine-grained time series. Specifically, we use real-time internet search data from 1 March 2019 to 31 December 2019 across 252 cities in China to estimate the impact of short-term and long-term exposure to air pollution on urbanites' mental health at the city level. We have access to the daily queries of all users on the largest search engine in China, Baidu (Baidu.com), which allows us to obtain real-time mental-health-related queries (MHQs). Web search queries provide ample information about the interests, concerns and intentions of the overall population 17, making them a valuable source of information about health trends 18,19. Compared with conventional approaches, MHQs enable us to leverage the advantages of big data to quantify the impact of air pollution on the whole population and assess the number of affected people.
To accurately filter MHQs from daily search queries, we selected a group of search terms to capture two common mental health problems, depression and anxiety 1 (see the details in the Methods and Supplementary Note 1). We obtained 360 million geotagged queries related to mental health and measured people's daily mental health for each city by aggregating the MHQ data to the city level (Fig. 1). Note that the higher the volume of MHQs per capita, the worse the mental health of the population in that area (Supplementary Fig. 1).
This study answers three research questions. First, do real-time exposure and long-term exposure to air pollution have different effects on Chinese urbanites' mental health? Using the fixed effects model, we quantitatively analysed the duration of exposure to air pollution that leads to a decline in people's mental health, and we show that the effects become larger as the duration of exposure to air pollution increases (approximately six times larger for 60 days of long-term exposure compared with daily exposure). Second, how does air pollution …
Analysis https://doi.org/10.1038/s41893-022-01032-1

Results
Before conducting the empirical analyses, we first adopted real mental-health-related cases (MHCs) from Haodf (Haodf.com), a leading online health care platform in China, to validate whether the MHQs are related to people's mental health in a local area (see the details in Supplementary Note 1). Figure 2 shows a strong correlation between MHQs on Baidu and MHCs on Haodf (Spearman's correlation is 0.70, and P < 0.01). To further validate the MHQs, we also employed the text of search queries; that is, we used the top ten most frequent search queries that are most likely to reflect mental health issues and examined the correlation between MHQs and these queries (see the details in Supplementary Note 1). Supplementary Fig. 3 also reveals a strong correlation (Spearman's correlation is 0.90, and P < 0.01). We thus found that MHQs can capture the mental health status of the population in a city.
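The validation step above relies on Spearman's rank correlation between two city-level series. As a minimal sketch of how such a coefficient can be computed (the function names and the toy numbers are illustrative assumptions, not the study's data or code), the ranks of each series are correlated with the ordinary Pearson formula:

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy example (made-up numbers): query volumes and case counts for five cities.
mhq_volume = [120, 340, 90, 560, 410]
mhc_count = [10, 25, 12, 60, 40]
rho = spearman(mhq_volume, mhc_count)
```

In practice a statistics library would also report the P value; this sketch only covers the coefficient.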

Short-term exposure to air pollution
We first estimated the short-term effect of air pollution on urbanites' mental health using equation (2). As shown in columns 1 and 2 of Table 1, there is a significantly positive association between air pollution (air quality index (AQI) and PM2.5) and MHQs, suggesting that people's mental health status declines when they are temporarily exposed to air pollution. The coefficient in column 2 indicates that a one-standard-deviation increase in PM2.5 concentration (that is, 26.3 μg m−3) …
In addition to air pollution, extreme weather might also affect people's mental health 20. Accordingly, we compared the impact of air pollution with that of an extreme storm, Typhoon Lekima, which had the most severe influence in China in 2019, to help us understand the effect size. We found that the impact of air pollution on mental health issues was smaller than that of Typhoon Lekima (for example, a one-standard-deviation increase in PM2.5 concentration increased the volume of MHQs by 0.0067 standard deviations, whereas Typhoon Lekima was associated with a 0.0141-standard-deviation increase in the search volume of MHQs; Supplementary Note 2 and Supplementary Table 6). Nonetheless, severe air pollution occurs frequently in China, unlike extreme storms. For example, 53.0% of cities' daily PM2.5 concentrations exceeded the WHO limit in the sample period.
The relationship between air pollution and mental health might be spurious as a result of omitted variables that vary with days on the city level. For instance, traffic congestion may affect a city's air pollution and exacerbate people's emotions. The results would be biased if such omitted variables were not controlled in the model specification. We leveraged an instrumental variable (IV) to address the endogeneity issues caused by omitted variables, which helped us identify the causal effect of air pollution on mental health.
We constructed the IV, Neighbour, by combining the wind direction with the air pollution level of neighbouring areas following prior research 21 (Methods). The insight behind this method is that the formation and dissipation of air pollution are heavily affected by meteorological conditions 22; the wind can bring exogenous variation to a city's air pollution level by blowing pollutant emissions from neighbouring regions to the city 21,23. Neighbour is an ideal IV as it is unlikely to influence the social and economic activities of a city, except by varying the city's air pollution. This estimation also addresses the classic measurement error issue in using station-based data 21. The IV results reveal that the coefficients of AQI and PM2.5 remain positive and statistically significant (columns 3-6 in Table 1; see the full results in Supplementary Table 7). The results are robust to altering the weight and the range in calculating the IV (see the details in Supplementary Note 3 and Supplementary …).

Table 1 note: Columns 3-6 present the second-stage regression results from employing two-stage least squares on equation (2), where the IV is Neighbour. Here d_ij is the distance between city i and grid j; w_ijt = 1.0 when calculating the IV (see the details in the Methods and Supplementary Note 3). Robust standard errors are clustered at the city level. P values are reported in parentheses. ***P < 0.01 (two-sided test).
In addition to the IV estimation, our baseline results are robust to various checks (Supplementary Note 3 and Supplementary Table 11). First, Supplementary Table 11a documents similar results with the alternative measurement of the dependent variable, where we accounted for the different population sizes of regions by using the ratio of MHQs instead of the natural logarithm of MHQs. Second, we included day-of-week fixed effects and quarter fixed effects to absorb the possible interference from different weekdays and seasons (Supplementary Table 11b). Third, we removed duplicate queries of each user on each day; that is, we counted only one effective query if a user searched many MHQs in one day (Supplementary Table 11c). Fourth, we used two-way clustered standard errors (that is, simultaneously clustering by cities and days) to capture the unspecified correlations between observations for different cities on the same day (Supplementary Table 11d). Finally, we controlled for the number of Baidu users to mitigate the concern that the usage ratios of Baidu in different regions might affect the MHQs (Supplementary Table 11e). We also conducted a placebo test using false search queries (see the details in Supplementary Note 3 and Supplementary Table 13).

Long-term exposure to air pollution
We employed equation (3) to estimate the long-term effects of air pollution. We controlled for transitory exposure to air pollution in this model to evaluate the relative importance of transitory and cumulative effects. Table 2 reports the results using four windows of air pollution exposure: 7-day, 14-day, 30-day and 60-day exposures. As shown in Table 2, the impact of air pollution becomes stronger as the duration of exposure increases, even when controlling for the real-time effects of air pollution. The coefficient in column 4 suggests that a one-standard-deviation increase in PM 2.5 concentration during the past 60 days is related to a 0.0206-standard-deviation increase in MHQs (over six times the short-term effect of PM 2.5 , 0.0033). These findings reveal that the proportion of people who suffer from mental health problems strikingly increases if the air quality in the city remains at a poor level for a long time. The results are quite similar when using the AQI to reflect air pollution (Supplementary Table 23).
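The exposure windows above can be illustrated with a short sketch. The Methods define cumulative exposure as the mean pollution level over the current day and the preceding k − 1 days; the function below computes that trailing mean for one city's daily series. The function name and the choice to return `None` until a full window is available are assumptions made for illustration, not details reported in the study.

```python
def trailing_mean(series, k):
    """k-day trailing mean including the current day.

    Returns None for days with fewer than k observations available
    (an illustrative choice; the study does not state how it treats them).
    """
    if k <= 0:
        raise ValueError("k must be a positive number of days")
    out = []
    window_sum = 0.0
    for t, value in enumerate(series):
        window_sum += value
        if t >= k:
            # Drop the observation that just left the k-day window.
            window_sum -= series[t - k]
        out.append(window_sum / k if t >= k - 1 else None)
    return out


# Toy daily PM2.5 readings for one city (made-up values, in μg m−3).
pm25 = [30.0, 50.0, 40.0, 60.0, 20.0]
exposure_3d = trailing_mean(pm25, 3)
```

The same function can be called with k = 7, 14, 30 or 60 to build the four exposure windows.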

Heterogeneity analyses
We further examined how air pollution impacts mental health among different demographic and socio-economic groups. We considered four demographic characteristics: gender, education, age and marriage. First, we compared the effects of air pollution on the male and female subsamples (Supplementary Table 14). Figure 3a shows the estimated results coupled with 95% confidence intervals. As shown in Fig. 3a, the effect on mental health becomes larger for both men and women as the duration of exposure to air pollution increases. The results also reveal that although the impact of short-term exposure to air pollution is not significantly different between men and women, men are more vulnerable than women as the duration of exposure increases. This finding is similar to the study by Zhang et al. 24 , who found that the effect of air pollution on cognitive performance is larger for men than for women.
We repeated our analysis to explore heterogeneous effects on people with different educational attainments. Figure 3b and Supplementary Table 15 display the estimated coefficients for two subsamples with different educational attainments, that is, high school or below versus college or above. As shown in Fig. 3b, the effects of air pollution are significant for both groups, but there are no significant differences in the impacts on people with high or low levels of educational attainment.
To learn how air pollution affects mental health for different age cohorts, we estimated equation (3) for each age cohort. Supplementary Table 16 reports the numerical results, and Fig. 3c shows the graphical results. In general, we found that the effects (both short-term and long-term effects) of air pollution on mental health present an inverted-U shape as people age. Specifically, compared with the youngest age cohort (that is, 18-24) and the oldest age cohort (that is, 55-64), the effect on mental health caused by air pollution is more pronounced for the middle-aged cohorts (that is, 25-54).
We also compared the married group with the unmarried group to examine whether the effects of exposure to air pollution are different. Figure 3d presents the graphical results (the numerical results are in Supplementary Table 17), indicating that the impact of air pollution on mental health is stronger for married people than for unmarried people, especially when they are exposed to air pollution over the long term.
We next explored how the effects of exposure to air pollution vary with the socio-economic characteristics of each city. We examined the impact of heterogeneity with respect to economic development, health resources, living conditions and sports facilities using gross domestic product (GDP) per capita, the number of hospitals, the area of green land and the number of gyms. Figure 4 reports the graphical results (Supplementary Tables 18-21 report the numerical regression results). We found that the effects of long-term exposure to air pollution are smaller for people living in cities with higher GDP per capita, more health resources, larger areas of green land and more sports facilities.
Prior literature has demonstrated that impoverished countries (that is, low- and middle-income countries) suffer from more mental disorders 25,26. Our study reveals that this inequality remains within a large country such as China. People living in cities with lower economic development suffer from more mental health problems caused by air pollution. An environment with a higher quality of living could improve people's subjective well-being, thus alleviating the effect of air pollution. More health resources could enable people to conveniently receive professional treatment for mental disorders and reduce mental health problems. More accessible green spaces in urban areas are associated with decreased mental illnesses (for example, anxiety disorder) 27, thereby mitigating the effect of air pollution. Physical exercise is also beneficial to promoting people's mental health 28,29. Hence, the impact of air pollution could be weakened in cities with more sports facilities, as citizens have more opportunities to access them. Our findings imply that improving the living conditions and public social welfare of cities can effectively reduce the negative effects of air pollution on urbanites' mental health. We also obtain consistent results using the AQI to measure the air quality of each city (see the details in Supplementary Note 4, Supplementary Figs. 6 and 7, and Supplementary Tables 24-31).

Additional analysis
To further investigate whether air pollution has different effects on depression and anxiety (the two most common mental illnesses 1), we separated the MHQs into two subsamples: a depression-related query subsample and an anxiety-related query subsample (Supplementary Table 2). We then estimated the impacts on depression and anxiety on the basis of equation (3). Supplementary Table 22 presents the results. It shows that air pollution has statistically significant impacts on both depression- and anxiety-related queries. However, the likelihood of suffering depression is higher under long-term exposure to air pollution (see the details in Supplementary Note 5).

Discussion
Air pollution is a severe environmental problem around the world, especially in developing countries such as China. Previous studies have documented the adverse effects of air pollution on people's health [30][31][32][33][34] , but research on mental health using nationwide large-scale data remains scarce. This study investigates the short-term and long-term impacts of air pollution on urbanites' mental health by leveraging national real-time internet search data in China. Our findings suggest that both short-term and long-term exposures to air pollution exacerbate mental health problems, manifested by the statistically significant increase in search queries about these problems. The effects become larger as the duration of exposure to air pollution increases.
Furthermore, the heterogeneity analyses reveal that the impact of air pollution is stronger for the male group, the middle-aged group and married people, especially when the duration of exposure to air pollution increases. We did not find statistically significant differences for people with different educational attainment. More importantly, the results also demonstrate that the cumulative effects of air pollution on mental health are smaller for people living in cities with higher GDP per capita, more health resources, larger areas of green land and more sports facilities.
A nationwide study is needed to evaluate the representative exposure-response links between air pollution and mental health given the research limitations of prior studies. Our findings confirm the causal effects of air pollution on human mental health. The extensive heterogeneity analyses in this study also provide a comprehensive understanding of how air pollution affects people's mental health under various demographic and socio-economic conditions. Moreover, this study helps estimate how many people suffer from mental health problems as a result of air pollution. Given the market share of the Baidu search engine (69.5%) 35 and the percentage of internet users who used the search engine service in China in 2019 (approximately 49.6%) 36, this study found that a one-standard-deviation increase in PM2.5 (that is, 26.3 μg m−3) increases the number of people who suffer from mental health problems in China by approximately 1.15 million.

Baidu search data
According to the report of the China Internet Network Information Centre 36 , the total number of internet users using the Baidu search engine reached 694 million in 2019, where Baidu accounts for approximately 70% of the market share 35 . The search queries on Baidu are thus representative and able to reflect the activities of most populations.
We partnered with Baidu to obtain all search queries after 1 March 2019 on the Baidu search engine. To filter the MHQs from daily search queries, we selected a group of search terms related to two mental health problems, depression and anxiety. According to WHO 1, depression and anxiety are the two most common mental disorders and are highly prevalent in the overall population. We obtained seven keywords for depression and eight keywords for anxiety referring to the definitions of WHO 1 (analyses based on each keyword are presented in Supplementary Note 6). We then translated these keywords to Chinese following the processes in Supplementary Fig. 2 to build the corresponding Chinese search keywords (Supplementary Note 1 and Supplementary Table 2). On the basis of these keywords, we constructed a regular expression (equation (1)) to filter the MHQs from daily search queries on Baidu, where A is a set of selected keywords representing depression and anxiety and B is a set of words that are used to exclude irrelevant queries. The elements included in A and B are reported in Supplementary Table 2. The rule of this regular expression is that a query is an effective MHQ if at least one keyword in set A is included in the query and no keyword in set B occurs in the query.
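The filtering rule described above (at least one keyword from set A, and no keyword from set B) can be sketched as a single regular expression with one positive and one negative lookahead. This is an illustration only: the English keywords below are hypothetical placeholders (the study's actual Chinese keywords are listed in its Supplementary Table 2), and the exact expression the authors used is not reproduced here.

```python
import re

# Hypothetical placeholder keyword sets, NOT the study's actual terms.
INCLUDE_A = ["depression", "depressed", "anxiety"]
EXCLUDE_B = ["lyrics", "movie"]


def build_mhq_pattern(include, exclude):
    """Compile a pattern that matches a query containing at least one
    `include` keyword and none of the `exclude` keywords."""
    if not include:
        raise ValueError("the include set A must not be empty")
    inc = "|".join(re.escape(word) for word in include)
    parts = ["^"]
    if exclude:
        exc = "|".join(re.escape(word) for word in exclude)
        # Negative lookahead: reject the query if any B keyword occurs anywhere.
        parts.append(f"(?!.*(?:{exc}))")
    # Positive lookahead: require at least one A keyword somewhere in the query.
    parts.append(f"(?=.*(?:{inc}))")
    return re.compile("".join(parts), re.DOTALL)


MHQ_PATTERN = build_mhq_pattern(INCLUDE_A, EXCLUDE_B)


def is_mhq(query):
    """Return True if the query counts as a mental-health-related query."""
    return MHQ_PATTERN.search(query) is not None
```

For example, `is_mhq("how to cope with anxiety")` is true under these placeholder sets, whereas `is_mhq("anxiety song lyrics")` is false because an excluded word is present.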
Using the filtering rule, we obtained 360 million geotagged search queries (with a 95.1% positive ratio) related to mental health from 252 cities from 1 March 2019 to 31 December 2019. We adopted this period to avoid confounding factors from COVID-19 because many cities were locked down at the start of 2020, which might affect people's mental health 37 .

Online doctor consultation data
In recent years, online doctor consultation has been an important way of broadening the channels available to patients. The online consultation records the actual mental health cases. Accordingly, we collected online doctor consultation data from a leading online health care platform in China, Haodf, to validate the association between MHQs on Baidu and MHCs on Haodf. The MHCs were selected on the basis of the disease classification on Haodf and included typical mental disorders, such as depressive disorder, anxiety disorder and insomnia (Supplementary Note 1). We then aggregated the data to the city level.

Fig. 4 caption (fragment): … Supplementary Tables 18-21; the error bars depict the 95% confidence intervals. We used the median values to separate the high group from the low group for each pair of heterogeneity analyses.

Air quality data
We obtained air quality data from China's Ministry of Ecology and Environment, which provides real-time monitoring data covering all prefectural cities. The original dataset included hourly readings of the AQI and specific pollutants, including PM 2.5 concentrations, from 1,605 monitoring stations. We aggregated the station-hour-level air pollution data to city-day-level data by averaging all station data within the corresponding city each day.
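The aggregation step above can be sketched as follows. This is a minimal illustration that assumes a pooled mean over all station-hour readings in a city on a given day; the record layout and the handling of missing readings are assumptions, since the study does not describe them.

```python
from collections import defaultdict


def city_day_means(readings):
    """Average station-hour readings into city-day values.

    `readings` is an iterable of (city, station_id, date, hour, value) tuples;
    missing values (None) are skipped. Returns {(city, date): mean value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for city, _station_id, date, _hour, value in readings:
        if value is None:
            continue  # assumed treatment of missing monitor readings
        sums[(city, date)] += value
        counts[(city, date)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy records (made-up values): two stations in one city, one in another.
records = [
    ("CityA", "s1", "2019-03-01", 0, 10.0),
    ("CityA", "s1", "2019-03-01", 1, 20.0),
    ("CityA", "s2", "2019-03-01", 0, 30.0),
    ("CityA", "s2", "2019-03-01", 1, None),
    ("CityB", "s9", "2019-03-01", 0, 55.0),
]
daily = city_day_means(records)
```

If stations report different numbers of hours, averaging station-day means first would give a slightly different result; which variant the authors used is not stated.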

Weather data
We considered five common weather variables: temperature, sunshine, humidity, precipitation and wind speed. The weather data were collected from the China Meteorological Data Service Centre, which provides daily meteorological data from 699 meteorological stations covering most cities. We calculated the distance between each city and all meteorological stations. We then used the meteorological data from the nearest station as a proxy for the city weather conditions. We excluded cities for which the distance to the nearest meteorological station is more than 40 km. This reduced the number of our sample cities to 252. We also varied this threshold to verify the robustness of our results (Supplementary Note 3 and Supplementary Table 12).
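The nearest-station matching with a 40 km cut-off can be sketched as below. The great-circle (haversine) distance, the use of a single coordinate pair per city and all names are assumptions for illustration; the study does not specify how distances were computed.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_nearest_station(cities, stations, max_km=40.0):
    """Map each city to (nearest station, distance in km).

    `cities` and `stations` map names to (lat, lon). Cities whose nearest
    station is farther than `max_km` are dropped from the result.
    """
    matched = {}
    for city, (clat, clon) in cities.items():
        best_name, best_dist = None, float("inf")
        for name, (slat, slon) in stations.items():
            dist = haversine_km(clat, clon, slat, slon)
            if dist < best_dist:
                best_name, best_dist = name, dist
        if best_name is not None and best_dist <= max_km:
            matched[city] = (best_name, best_dist)
    return matched


# Toy coordinates (hypothetical): c2 has no station within 40 km and is excluded.
cities = {"c1": (0.0, 0.0), "c2": (10.0, 10.0)}
stations = {"near": (0.0, 0.1), "far": (0.0, 1.0)}
matches = match_nearest_station(cities, stations)
```

Changing `max_km` reproduces the kind of threshold variation used in the robustness check.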

Demographic and socio-economic data
To explore heterogeneity, we assembled demographic and socio-economic data. The demographic data were at the individual level and were provided by the Baidu User Profile Platform, which integrates the user information from over 55 Baidu products (apps) and creates a comprehensive profile for more than 90% of individual users in Baidu. The profile data were generated by collecting the users' actual information and predicting unknown information on the basis of comprehensive machine learning algorithms. We collected each individual's demographic information from the platform and constructed mental health measures for different groups (such as male versus female) for each city. The socio-economic data were at the city level and came from the 2019 China City Statistical Yearbook compiled by the Chinese National Bureau of Statistics. The yearbook provides a series of city information, including GDP per capita, number of hospitals and area of green land. We collected each city's socio-economic information from the yearbook and divided the sample cities into high- and low-value groups on the basis of the median value of each characteristic.

Baseline regression model
We used a fixed effects panel model to estimate the effects of air pollution on mental health. First, we estimated the short-term effects of air pollution on the basis of the econometric specification in equation (2), where $\ln(\mathrm{MHQs})_{it}$ is the natural logarithm of the number of MHQs in city $i$ on date $t$; $\mathrm{AirPollution}_{it}$ represents the pollution level in city $i$ on date $t$, measured by AQI and PM2.5 concentration; $X_{it}$ is a vector of weather controls; $\mu_{i,m}$ indicates city-month fixed effects, which not only control for time-invariant confounders but also absorb time-varying characteristics for each city (such as local business cycle dynamics); and $\lambda_t$ indicates date fixed effects, accounting for common shocks to all cities on a given day. The coefficient $\alpha_1$ reflects the transitory impact of air pollution on mental health and is expected to be positive.
Next, we estimated the long-term effects of air pollution on mental health by focusing on the primary pollutant, PM2.5 (equation (3)). Here $\mathrm{PM}_{2.5,it}$ represents the contemporaneous level of air pollution in city $i$ on date $t$, and $\frac{1}{k}\sum_{n=0}^{k-1}\mathrm{PM}_{2.5,i,t-n}$ is the mean level of air pollution in the past $k$ days, which measures cumulative exposure. The control variables are the same as in equation (2). The coefficient $\alpha_1$ reflects the short-term impact of air pollution on mental health, and $\alpha_2$ reflects the impact of long-term exposure on mental health. Including both variables in one model helps us evaluate the relative importance of transitory and cumulative effects in the association between air pollution and mental health.
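The two specifications described above can be written out as follows. This is a reconstruction from the variable definitions in the text rather than a copy of the published equations: the intercept $\alpha_0$, the coefficient vector $\beta$ on the weather controls and the error term $\varepsilon_{it}$ are assumed notation.

```latex
% Equation (2): short-term (contemporaneous) effect
\ln(\mathrm{MHQs})_{it} = \alpha_0 + \alpha_1\,\mathrm{AirPollution}_{it}
  + X_{it}\beta + \mu_{i,m} + \lambda_t + \varepsilon_{it}

% Equation (3): contemporaneous effect plus k-day cumulative exposure
\ln(\mathrm{MHQs})_{it} = \alpha_0 + \alpha_1\,\mathrm{PM}_{2.5,it}
  + \alpha_2\,\frac{1}{k}\sum_{n=0}^{k-1}\mathrm{PM}_{2.5,i,t-n}
  + X_{it}\beta + \mu_{i,m} + \lambda_t + \varepsilon_{it}
```

With $k = 1$ the cumulative term reduces to the contemporaneous level, so the exposure windows of 7 to 60 days are what separate $\alpha_2$ from $\alpha_1$.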

IV estimation
We introduced an IV and employed the two-stage least squares approach to address the potential endogeneity issues in equation (2). In this study, we followed the idea of Zheng et al. 21 by utilizing "cross-boundary air pollution flows" to construct the IV.
To build the cross-boundary spillover measure of local air pollution, we procured a dataset from Tracking Air Pollution in China (http://tapdata.org), which provides real-time PM2.5 data with a spatial resolution of 10 km × 10 km (in a regular grid of 0.1° × 0.1°) 38. We also used the PM2.5 data from the monitoring stations to construct the IV and obtained very similar results (see the details in Supplementary Note 3 and Supplementary Tables 9 and 10). Using the data, we constructed the IV, $\mathrm{Neighbour}_{it}$, where $\mathrm{GridPollution}_{jt}$ represents grid cell $j$'s pollution level on date $t$ measured by PM2.5 concentration; $w_{ijt}$ represents grid cell $j$'s weight, which depends on the wind direction of city $i$ on date $t$ and the relative direction of grid cell $j$ to city $i$; and $d_{ij}$ is the distance between city $i$ and grid cell $j$. The weights of upwind grid cells are larger than those of grid cells in other directions, and the weights of downwind grid cells are set to zero because their impacts on city $i$'s air pollution level are minimal. $\mathrm{Neighbour}_{it}$ therefore measures how city $i$'s air pollution level on date $t$ is affected by the PM2.5 concentration from nearby grid cells on the same day.
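One form of the instrument that is consistent with the definitions above is an inverse-distance, wind-weighted sum over neighbouring grid cells. The expression below is an assumed reconstruction in the spirit of the cross-boundary approach the study follows; the published equation may differ, for example in normalization.

```latex
\mathrm{Neighbour}_{it} = \sum_{j} w_{ijt}\,\frac{1}{d_{ij}}\,\mathrm{GridPollution}_{jt}
```

Under this form, a closer grid cell (smaller $d_{ij}$) or a more directly upwind one (larger $w_{ijt}$) contributes more to the instrument, and downwind cells with $w_{ijt} = 0$ drop out.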
To minimize the interference from agglomeration economies on a group of cities, which may produce correlations between the IV and city i's economic activities 21, we excluded all the neighbouring grid cells for which the distance to city i is within 120 km. The correlation between grid cell j's PM2.5 concentration and city i's air pollution level should be small if the distance between city i and grid cell j is long. We thus also excluded grid cells outside 300 km from city i given that 90% of the sample days' wind speeds are smaller than 3.4 m s−1 (that is, 312 km d−1, the greatest distance the wind can travel per day).
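The distance band and wind weighting can be combined in a short sketch for a single city-day. It assumes an inverse-distance weighted sum, takes the wind-direction weights as given inputs (their actual values are described in the study's supplementary material and are not reproduced here), and treats the 120 km and 300 km bounds as inclusive, which is an arbitrary choice.

```python
def neighbour_iv(grid_cells, min_km=120.0, max_km=300.0):
    """Wind-weighted, inverse-distance sum of neighbouring PM2.5 for one city-day.

    `grid_cells` is an iterable of (distance_km, wind_weight, pm25) tuples.
    Cells closer than `min_km` or farther than `max_km` are ignored, and a
    wind_weight of 0 (a downwind cell) contributes nothing.
    """
    total = 0.0
    for distance_km, wind_weight, pm25 in grid_cells:
        if distance_km < min_km or distance_km > max_km:
            continue  # outside the 120-300 km band
        total += wind_weight * pm25 / distance_km
    return total


# Toy grid cells (hypothetical distances, weights and concentrations).
cells = [
    (100.0, 1.0, 50.0),  # too close: excluded
    (150.0, 1.0, 30.0),  # upwind, inside the band
    (200.0, 0.0, 80.0),  # downwind: weight zero
    (300.0, 0.5, 60.0),  # cross-wind, on the outer boundary
    (350.0, 1.0, 40.0),  # too far: excluded
]
iv_value = neighbour_iv(cells)
```

The resulting value would then serve as the excluded instrument for the city's own pollution level in the first stage of the two-stage least squares estimation.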

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
The air quality data are available from https://air.cnemc.cn. The weather data are available from http://data.cma.cn. The socio-economic data are available from https://data.cnki.net/area/Yearbook/Single/N2020050229. The spatial PM2.5 data that were used to generate the IV are available from http://tapdata.org. The online doctor consultation data are available from https://open.haodf.com/opendata/home. After anonymization and aggregation, there are no privacy issues for the aggregated mental health query data and related demographic data for academic purposes. However, these data are still sensitive for the respective communities since they were extracted from user-generated data. Therefore, the aggregated mental health query data and related demographic data that support the findings of this study are available from the corresponding author Jingbo Zhou upon reasonable request.
Corresponding author(s): Jingbo Zhou, Meng Li, Jizhou Huang, Dejing Dou
Last updated by author(s): Nov 23, 2022

Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.

Study description
This study constructs a daily city-level mental health metric based on the volume of mental health-related queries (MHQs) on Baidu, the largest search engine in China, to test how air pollution affects the mental health of urbanites. We use real-time Internet search data from March 1, 2019 to December 31, 2019 across 252 cities in China to estimate this relationship. We further investigate whether the impact varies with demographic and socio-economic factors.

Research sample
The sample consists of all search queries submitted on Baidu.com by users in 252 cities in China. Because Baidu.com is the biggest search engine in China, with about 70% of the market share, search behavior on Baidu.com can accurately reflect the mental health status of people in a city.

Sampling strategy
The sample of 252 cities was selected on the basis of data availability. We excluded cities whose nearest meteorological station is more than 40 km away.
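The station-distance rule can be illustrated with a short nearest-neighbour check. This is a sketch under stated assumptions rather than the procedure actually used: it assumes each city is reduced to one representative (latitude, longitude) point and that great-circle distance is the relevant measure. The function names, city labels and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def cities_near_station(cities, stations, max_km=40.0):
    """Return the names of cities whose nearest station is within max_km."""
    kept = []
    for name, (lat, lon) in cities.items():
        nearest = min(
            haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in stations
        )
        if nearest <= max_km:
            kept.append(name)
    return kept


# Hypothetical inputs: two cities and two meteorological stations.
cities = {"city_a": (30.0, 120.0), "city_b": (35.0, 110.0)}
stations = [(30.1, 120.1), (33.0, 110.0)]
sample = cities_near_station(cities, stations)
```

Here `city_a` has a station roughly 15 km away and stays in the sample, whereas the closest station to `city_b` is more than 200 km away, so `city_b` is dropped.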

Data collection
Mental health-related query data were collected and filtered from the Baidu search engine. Air pollution data were collected from China's Ministry of Ecology and Environment. Weather data were collected from the China Meteorological Data Service Center. The demographic and socio-economic data were collected from the 2019 China City Statistical Yearbook.

Timing and spatial scale
The sample period runs from March 1, 2019 to December 31, 2019. We chose March 1, 2019 as the start date because it is the earliest date for which we were permitted to access the search query data. We chose December 31, 2019 as the end date to avoid confounding from COVID-19: one-third of Chinese cities were locked down at the start of 2020 to curb transmission of the virus, which might have affected people's mental health.

Data exclusions
No data were excluded from the analysis.