## Main

Rapid economic development in China has been accompanied by an increase in material living standards as well as a rise in mental health difficulties, especially for urban inhabitants. According to the estimation of the World Health Organization (WHO), the number of people suffering from depression had reached 54.8 million in China (approximately 4.2% of the Chinese population) in 2015 (ref. 1), and this number is increasing2. Mental disorders worsen human health3, lower productivity4 and reduce quality of life5,6. They have become a major contributor to the Chinese disease burden, accounting for ~3.1–7.3% of years lived with disability in 2015 (ref. 1). Noticeable changes in social, economic and physical systems in China in recent decades are major factors that lead to the risks of mental disorders. Air pollution, the primary environmental challenge for China and a notable byproduct of China’s industrialization7, is likely to exacerbate the risk factors for mental disorders.

Recently, a few empirical studies have shed light on the threats that air pollution poses to mental health. For example, Zhang et al.8 investigated the effects of air pollution on respondents’ mental health status over the previous month in China. Xue et al.9 and Yang et al.10 used survey data from China to estimate the impact of long-term exposure to air pollution. Newbury et al.11 and Bakolis et al.12 examined the association between air pollution and mental health on the basis of survey data from the United Kingdom, but the former study focused only on adolescent psychotic experiences. Some researchers have also utilized depression-related admissions to hospitals to explore the impact of air pollution on mental health13,14,15,16.

These prior studies provide an initial understanding of the relationship between air pollution and mental health. However, we still lack a large-scale nationwide quantification of the mental health risks posed by air pollution, especially with regard to the different effects with varied durations of pollution exposure. Previous studies employing approaches such as surveys8,9,10 are insufficient to capture the real-time impact of air pollution on a national scale. It is unclear whether their findings hold across cities, considering the large economic disparities among different cities in China. For example, a notably negative effect between fine particulate matter (PM2.5) concentration and mental health was identified in Shijiazhuang13 but not in Beijing16.

This study begins to fill this knowledge gap by utilizing a unique nationwide dataset with a fine-grained time series. Specifically, we use real-time internet search data from 1 March 2019 to 31 December 2019 across 252 cities in China to estimate the impact of short-term and long-term exposure to air pollution on urbanites’ mental health at the city level. We have access to the daily queries of all users on the largest search engine in China, Baidu (Baidu.com), which allows us to obtain real-time mental-health-related queries (MHQs). Web search queries provide ample information about the interests, concerns and intentions of the overall population17, making them a valuable source of information about health trends18,19. MHQs enable us to leverage the advantages of big data to quantify the impact of air pollution on the whole population and assess the number of affected people compared with conventional approaches.

To accurately filter MHQs from daily search queries, we selected a group of search terms to capture two common mental health problems—depression and anxiety1 (see the details in the Methods and Supplementary Note 1). We obtained 360 million geotagged queries related to mental health and measured people’s daily mental health for each city by aggregating the MHQ data to the city level (Fig. 1). Note that the higher the volume of MHQs per capita, the worse the mental health of the population in that area (Supplementary Fig. 1).

This study answers three research questions. First, do real-time exposure and long-term exposure to air pollution have different effects on Chinese urbanites’ mental health? Using the fixed effects model, we quantitatively analysed the duration of exposure to air pollution that leads to a decline in people’s mental health, and we show that the effects become much larger as the duration of exposure to air pollution increases (approximately six times larger for 60 days of long-term exposure compared with daily exposure). Second, how does air pollution unequally affect people’s mental health for different population groups? The large-scale MHQs enable us to investigate whether the impact on mental health varies by different demographic characteristics (for example, gender, marriage and age) and socio-economic characteristics (for example, economic development, health resources and living conditions). Finally, we estimated how many people suffer from mental health issues as a result of air pollution. This estimation can shed light on the risks and losses posed by air pollution to people’s mental health nationwide.

## Results

Before conducting the empirical analyses, we first adopted real mental-health-related cases (MHCs) from Haodf (Haodf.com), a leading online health care platform in China, to validate whether the MHQs are related to people’s mental health in a local area (see the details in Supplementary Note 1). Figure 2 shows a highly correlated relationship between MHQs on Baidu and MHCs on Haodf (Spearman’s correlation is 0.70, and P < 0.01). To further validate the MHQs, we also employed the text of search queries—that is, we used the top ten most frequent search queries that are most likely to reflect mental health issues and examined the correlation between MHQs and these queries (see the details in Supplementary Note 1). Supplementary Fig. 3 also reveals a highly correlated relationship (Spearman’s correlation is 0.90, and P < 0.01). We thus found that MHQs can capture the mental health status of the population in a city.
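The validation step above relies on Spearman’s rank correlation between city-level MHQ and MHC counts. The following is a minimal sketch of that statistic, assuming two equal-length lists of city-level counts; the function names and the sample numbers are hypothetical and are not taken from the study’s data or code.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical city-level counts, for illustration only.
mhq_counts = [120, 340, 90, 560, 210]
mhc_counts = [15, 30, 12, 20, 70]
print(spearman(mhq_counts, mhc_counts))
```

Because the statistic depends only on ranks, it is insensitive to the very different scales of search counts and consultation counts.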

### Short-term exposure to air pollution

We first estimated the short-term effect of air pollution on urbanites’ mental health using equation (2). As shown in columns 1 and 2 of Table 1, there is a significantly positive association between air pollution (air quality index (AQI) and PM2.5) and MHQs, suggesting that people’s mental health status declines when they are temporarily exposed to air pollution. The coefficient in column 2 indicates that a one-standard-deviation increase in PM2.5 concentration (that is, 26.3 μg m−3) is associated with a 0.0033-standard-deviation increase in search queries about mental health on Baidu. The magnitude is practically notable given the large volume of search queries. For example, a 0.0033-standard-deviation increase translates into an increase of approximately 1.44 million MHQs during our study period. While this query volume is not equal to the actual increase in the number of people with mental health problems, it should be highly correlated18. According to our statistics, each user on Baidu searched for mental health information 7.8 times on average, and the average ratio of the located queries was 46.41% during our sample period. Therefore, an increase in MHQs of 1.44 million means an increase of approximately 0.398 million people (that is, 1.44 million/7.8/0.4641) who suffer from mental health problems due to air pollution. It is worth noting that the true value should be higher than this number because not all people look up mental-health-related information through the search engine. The results for other pollutants (that is, PM10, SO2, CO, NO2 and O3) are reported in Supplementary Table 5.
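The conversion from additional queries to affected people in the paragraph above can be reproduced in a few lines. This is a sketch of the arithmetic only; the function and parameter names are hypothetical, and the inputs are the figures quoted in the text.

```python
def queries_to_people(extra_queries, queries_per_user, located_ratio):
    """Convert an increase in geolocated MHQs into an estimated number of people.

    Dividing by queries_per_user collapses repeated searches by the same user;
    dividing by located_ratio scales up for queries that could not be geolocated.
    """
    return extra_queries / queries_per_user / located_ratio


people = queries_to_people(
    extra_queries=1.44e6,   # additional MHQs quoted in the text
    queries_per_user=7.8,   # average MHQs per Baidu user
    located_ratio=0.4641,   # share of queries with a location
)
print(f"{people / 1e6:.3f} million people")  # prints "0.398 million people"
```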

In addition to air pollution, extreme weather might also affect people’s mental health20. Accordingly, to help us understand the effect size, we compared the impact of air pollution with that of an extreme storm, Typhoon Lekima, which had the most severe influence in China in 2019. We found that the impact of air pollution on mental health issues was smaller than that of Typhoon Lekima: a one-standard-deviation increase in PM2.5 concentration increased the volume of MHQs by 0.0067 standard deviations, whereas Typhoon Lekima was associated with a 0.0141-standard-deviation increase in the search volume of MHQs (Supplementary Note 2 and Supplementary Table 6). Nonetheless, severe air pollution occurs frequently in China, unlike extreme storms. For example, 53.0% of cities’ daily PM2.5 concentrations exceeded the WHO limit in the sample period.

The relationship between air pollution and mental health might be spurious as a result of omitted variables that vary daily at the city level. For instance, traffic congestion may both raise a city’s air pollution and worsen people’s mood. The results would be biased if such omitted variables were not controlled for in the model specification. We leveraged an instrumental variable (IV) to address the endogeneity issues caused by omitted variables, which helped us identify the causal effect of air pollution on mental health.

We constructed the IV, Neighbour, by combining the wind direction with the air pollution level of neighbouring areas following prior research21 (Methods). The insight behind this method is that the formation and dissipation of air pollution are heavily affected by meteorological conditions22; the wind can bring exogenous variation to a city’s air pollution level by blowing pollutant emissions from neighbouring regions to the city21,23. Neighbour is an ideal IV as it is unlikely to influence the social and economic activities of a city other than through the city’s air pollution. This estimation also addresses the classic measurement error issue in using station-based data21. The IV results reveal that the coefficients of AQI and PM2.5 remain positive and statistically significant (columns 3–6 in Table 1; see the full results in Supplementary Table 7). The results are robust to altering the weight and the range in calculating the IV (see the details in Supplementary Note 3 and Supplementary Table 8). This supports a causal effect of air pollution on people’s mental health.

In addition to the IV estimation, our baseline results are robust to various checks (Supplementary Note 3 and Supplementary Table 11). First, Supplementary Table 11a documents similar results with an alternative measurement of the dependent variable, where we accounted for the different population sizes of regions by using the ratio of MHQs instead of the natural logarithm of MHQs. Second, we included day-of-week fixed effects and quarter fixed effects to absorb the possible interference from different weekdays and seasons (Supplementary Table 11b). Third, we removed duplicate queries of each user on each day; that is, we counted only one effective query if a user searched many MHQs in one day (Supplementary Table 11c). Fourth, we used two-way clustered standard errors (that is, clustering simultaneously by city and day) to capture the unspecified correlations between observations for different cities on the same day (Supplementary Table 11d). Finally, we controlled for the number of Baidu users to mitigate the concern that the usage ratios of Baidu in different regions might affect the MHQs (Supplementary Table 11e). We also conducted a placebo test using false search queries (see the details in Supplementary Note 3 and Supplementary Table 13).

### Long-term exposure to air pollution

We employed equation (3) to estimate the long-term effects of air pollution. We controlled for transitory exposure to air pollution in this model to evaluate the relative importance of transitory and cumulative effects. Table 2 reports the results using four windows of air pollution exposure: 7-day, 14-day, 30-day and 60-day exposures. As shown in Table 2, the impact of air pollution becomes stronger as the duration of exposure increases, even when controlling for the real-time effects of air pollution. The coefficient in column 4 suggests that a one-standard-deviation increase in PM2.5 concentration during the past 60 days is related to a 0.0206-standard-deviation increase in MHQs (over six times the short-term effect of PM2.5, 0.0033). These findings reveal that the proportion of people who suffer from mental health problems strikingly increases if the air quality in the city remains at a poor level for a long time. The results are quite similar when using the AQI to reflect air pollution (Supplementary Table 23).

### Heterogeneity analyses

We further examined how air pollution impacts mental health among different demographic and socio-economic groups. We considered four demographic characteristics: gender, education, age and marriage. First, we compared the effects of air pollution on the male and female subsamples (Supplementary Table 14). Figure 3a shows the estimated results coupled with 95% confidence intervals. As shown in Fig. 3a, the effect on mental health becomes larger for both men and women as the duration of exposure to air pollution increases. The results also reveal that although the impact of short-term exposure to air pollution is not significantly different between men and women, men are more vulnerable than women as the duration of exposure increases. This finding is similar to the study by Zhang et al.24, who found that the effect of air pollution on cognitive performance is larger for men than for women.

We repeated our analysis to explore heterogeneous effects on people with different educational attainments. Figure 3b and Supplementary Table 15 display the estimated coefficients for two subsamples with different educational attainments—that is, high school or below versus college or above. As shown in Fig. 3b, the effects of air pollution are significant for both groups, but there are no significant differences in the impacts on people with high or low levels of educational attainment.

To learn how air pollution affects mental health for different age cohorts, we estimated equation (3) on the basis of five cohorts: 18–24, 25–34, 35–44, 45–54 and 55–64 years. Supplementary Table 16 reports the numerical results, and Fig. 3c shows the graphical results. In general, we found that the effects (both short-term and long-term effects) of air pollution on mental health present an inverted-U shape as people age. Specifically, compared with the youngest age cohort (that is, 18–24) and the oldest age cohort (that is, 55–64), the effect on mental health caused by air pollution is more pronounced for the middle-aged cohorts (that is, 25–54).

We also compared the married group with the unmarried group to examine whether the effects of exposure to air pollution are different. Figure 3d presents the graphical results (the numerical results are in Supplementary Table 17), indicating that the impact of air pollution on mental health is stronger for married people than for unmarried people, especially when they are exposed to air pollution over the long term.

We next explored how the effects of exposure to air pollution vary with the socio-economic characteristics of each city. We examined the impact of heterogeneity with respect to economic development, health resources, living conditions and sports facilities using gross domestic product (GDP) per capita, the number of hospitals, the area of green land and the number of gyms. Figure 4 reports the graphical results (Supplementary Tables 18–21 report the numerical regression results). We found that the effects of long-term exposure to air pollution are smaller for people living in cities with higher GDP per capita, more health resources, larger areas of green land and more sports facilities.

Prior literature has demonstrated that impoverished countries (that is, low- and middle-income countries) suffer from more mental disorders25,26. Our study reveals that this inequality remains within a large country such as China. People living in cities with lower economic development suffer from more mental health problems caused by air pollution. An environment with a higher quality of living could improve people’s subjective well-being, thus alleviating the effect of air pollution. More health resources could enable people to conveniently receive professional treatment for mental disorders and reduce mental health problems. More accessible green spaces in urban areas are associated with decreased mental illnesses (for example, anxiety disorder)27, thereby mitigating the effect of air pollution. Physical exercise is also beneficial to promoting people’s mental health28,29. Hence, the impact of air pollution could be weakened in cities with more sports facilities, as citizens have more opportunities to access them. Our findings imply that improving the living conditions and public social welfare of cities can effectively reduce the negative effects of air pollution on urbanites’ mental health. We also obtain consistent results using the AQI to measure the air quality of each city (see the details in Supplementary Note 4, Supplementary Figs. 6 and 7, and Supplementary Tables 24–31).

To further investigate whether air pollution has different effects on depression and anxiety (the two most common mental illnesses1), we separated the MHQs into two subsamples: a depression-related query subsample and an anxiety-related query subsample (Supplementary Table 2). We then estimated the impacts on depression and anxiety on the basis of equation (3). Supplementary Table 22 presents the results. It shows that air pollution has statistically significant impacts on both depression- and anxiety-related queries. However, the likelihood of suffering depression is higher under long-term exposure to air pollution (see the details in Supplementary Note 5).

## Discussion

Air pollution is a severe environmental problem around the world, especially in developing countries such as China. Previous studies have documented the adverse effects of air pollution on people’s health30,31,32,33,34, but research on mental health using nationwide large-scale data remains scarce. This study investigates the short-term and long-term impacts of air pollution on urbanites’ mental health by leveraging national real-time internet search data in China. Our findings suggest that both short-term and long-term exposures to air pollution exacerbate mental health problems, manifested by the statistically significant increase in search queries about these problems. The effects become larger as the duration of exposure to air pollution increases. Furthermore, the heterogeneity analyses reveal that the impact of air pollution is stronger for the male group, the middle-aged group and married people, especially when the duration of exposure to air pollution increases. We did not find statistically significant differences for people with different educational attainment. More importantly, the results also demonstrate that the cumulative effects of air pollution on mental health are smaller for people living in cities with higher GDP per capita, more health resources, larger areas of green land and more sports facilities.

A nationwide study is needed to evaluate the representative exposure–response links between air pollution and mental health given the research limitations of prior studies. Our findings confirm the causal effects of air pollution on human mental health. The extensive heterogeneity analyses in this study also provide a comprehensive understanding of how air pollution affects people’s mental health under various demographic and socio-economic conditions. Moreover, this study helps estimate how many people suffer from mental health problems as a result of air pollution. Given the market share of the Baidu search engine (69.5%)35 and the percentage of internet users who used the search engine service in China in 2019 (approximately 49.6%)36, this study found that a one-standard-deviation increase in PM2.5 (that is, 26.3 μg m−3) results in approximately 1.15 million people with mental health problems (that is, 0.398 million/0.695/0.496). Overall, our findings provide evidence that campaigns to mitigate air pollution, despite potentially large and unavoidable control costs, can bring extra benefits by improving people’s mental health.
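As a worked version of the scaling in the preceding paragraph, the adjustment divides the search-based estimate by the two coverage shares quoted in the text. The symbol $N$ is introduced here only for illustration and does not appear in the original analysis:

$$N \approx \frac{0.398\ \text{million}}{0.695\times 0.496}=\frac{0.398\ \text{million}}{0.3447}\approx 1.15\ \text{million}.$$

Written this way, the calculation treats Baidu’s market share and the search engine usage rate as independent multiplicative coverage factors, which is a simplifying assumption.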

## Methods

### Baidu search data

According to the report of the China Internet Network Information Centre36, the total number of internet users using search engines reached 694 million in 2019, and Baidu accounts for approximately 70% of the search engine market35. The search queries on Baidu are thus representative and able to reflect the activities of most of the population.

We partnered with Baidu to obtain all search queries after 1 March 2019 on the Baidu search engine. To filter the MHQs from daily search queries, we selected a group of search terms related to two mental health problems—depression and anxiety. According to WHO1, depression and anxiety are the two most common mental disorders and are highly prevalent in the overall population. We obtained seven keywords for depression and eight keywords for anxiety referring to the definitions of WHO1 (analyses based on each keyword are presented in Supplementary Note 6). We then translated these keywords to Chinese following the processes in Supplementary Fig. 2 to build the corresponding Chinese search keywords (Supplementary Note 1 and Supplementary Table 2). On the basis of these keywords, we constructed a regular expression to filter the MHQs from daily search queries on Baidu:

$$\mathrm{Regex}=[\mathbf{A}]*\,![\mathbf{B}], \tag{1}$$

where A is a set of selected keywords representing depression and anxiety and B is a set of words that are used to exclude irrelevant queries. The elements included in A and B are reported in Supplementary Table 2. Under this rule, a query is an effective MHQ if it contains at least one keyword in set A and no keyword in set B.
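The filtering rule can be sketched as a small predicate: keep a query if it matches at least one inclusion term and no exclusion term. This is an illustration only; the English terms below are hypothetical placeholders, whereas the actual keyword sets are Chinese and listed in Supplementary Table 2, and the function name is not from the study.

```python
import re


def build_mhq_filter(include_terms, exclude_terms):
    """Return a predicate implementing: any term of A present, no term of B present."""
    if not include_terms:
        raise ValueError("at least one inclusion keyword is required")
    include_re = re.compile("|".join(re.escape(t) for t in include_terms))
    exclude_re = (
        re.compile("|".join(re.escape(t) for t in exclude_terms))
        if exclude_terms
        else None
    )

    def is_mhq(query):
        if not include_re.search(query):
            return False  # no keyword from set A
        # Reject the query if any keyword from set B occurs in it.
        return exclude_re is None or not exclude_re.search(query)

    return is_mhq


# Hypothetical placeholder keyword sets (not the study's actual terms).
set_a = ["depression", "anxiety"]
set_b = ["tropical depression", "economic depression"]
is_mhq = build_mhq_filter(set_a, set_b)

for q in ["symptoms of depression", "tropical depression forecast", "weather today"]:
    print(q, "->", is_mhq(q))
```

Escaping each term with `re.escape` keeps keywords literal, so punctuation inside a keyword cannot change the pattern.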

Using the filtering rule, we obtained 360 million geotagged search queries (with a 95.1% positive ratio) related to mental health from 252 cities from 1 March 2019 to 31 December 2019. We adopted this period to avoid confounding factors from COVID-19 because many cities were locked down at the start of 2020, which might affect people’s mental health37.

### Online doctor consultation data

In recent years, online doctor consultation has been an important way of broadening the channels available to patients. The online consultation records the actual mental health cases. Accordingly, we collected online doctor consultation data from a leading online health care platform in China, Haodf, to validate the association between MHQs on Baidu and MHCs on Haodf. The MHCs were selected on the basis of the disease classification on Haodf and included typical mental disorders, such as depressive disorder, anxiety disorder and insomnia (Supplementary Note 1). We then aggregated the data to the city level.

### Air quality data

We obtained air quality data from China’s Ministry of Ecology and Environment, which provides real-time monitoring data covering all prefectural cities. The original dataset included hourly readings of the AQI and specific pollutants, including PM2.5 concentrations, from 1,605 monitoring stations. We aggregated the station–hour-level air pollution data to city–day-level data by averaging all station data within the corresponding city each day.
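The aggregation from station–hour readings to city–day values can be sketched as a grouped mean. The record layout below is an assumption made for illustration, as is the choice to pool all station–hour readings of a city and day into one average rather than averaging per station first; the text does not specify which of the two was used.

```python
from collections import defaultdict
from statistics import mean


def city_day_means(readings):
    """Average station-hour readings into one value per (city, day).

    readings: iterable of (city, station_id, timestamp, value), where
    timestamp is a string starting with 'YYYY-MM-DD' (assumed layout).
    Missing readings (None) are skipped.
    """
    buckets = defaultdict(list)
    for city, _station_id, timestamp, value in readings:
        if value is None:
            continue
        day = timestamp[:10]  # keep only the calendar date
        buckets[(city, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical readings for illustration.
sample = [
    ("CityX", "s1", "2019-03-01 00:00", 10.0),
    ("CityX", "s2", "2019-03-01 01:00", 30.0),
    ("CityX", "s1", "2019-03-02 00:00", 50.0),
]
print(city_day_means(sample))
```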

### Weather data

We considered five common weather variables: temperature, sunshine, humidity, precipitation and wind speed. The weather data were collected from the China Meteorological Data Service Centre, which provides daily meteorological data from 699 meteorological stations covering most cities. We calculated the distance between each city and all meteorological stations. We then used the meteorological data from the nearest station as a proxy for the city weather conditions. We excluded cities for which the distance to the nearest meteorological station is more than 40 km. This reduced the number of our sample cities to 252. We also varied this threshold to verify the robustness of our results (Supplementary Note 3 and Supplementary Table 12).
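The station-matching rule described above (nearest station, dropped if farther than 40 km) might look like the following sketch. The great-circle (haversine) distance and the use of a single coordinate pair per city are assumptions; the text does not state which distance measure or city reference point was used.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def match_nearest_station(cities, stations, max_km=40.0):
    """Map each city to (station_id, distance_km); drop cities beyond max_km.

    cities, stations: dicts of name -> (lat, lon). Both are hypothetical inputs.
    """
    matches = {}
    for city, (clat, clon) in cities.items():
        station_id, dist = min(
            ((sid, haversine_km(clat, clon, slat, slon))
             for sid, (slat, slon) in stations.items()),
            key=lambda pair: pair[1],
        )
        if dist <= max_km:
            matches[city] = (station_id, dist)
    return matches
```

Changing `max_km` reproduces the kind of threshold variation the robustness check refers to.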

### Demographic and socio-economic data

To explore heterogeneity, we assembled demographic and socio-economic data. The demographic data were at the individual level and were provided by the Baidu User Profile Platform, which integrates the user information from over 55 Baidu products (apps) and creates a comprehensive profile for more than 90% of individual users in Baidu. The profile data were generated by collecting the users’ actual information and predicting unknown information on the basis of comprehensive machine learning algorithms. We collected each individual’s demographic information from the platform and constructed mental health measures for different groups (such as male versus female) for each city. The socio-economic data were at the city level and came from the 2019 China City Statistical Yearbook compiled by the Chinese National Bureau of Statistics. The yearbook provides a series of city information, including GDP per capita, number of hospitals and area of green land. We collected each city’s socio-economic information from the yearbook and divided the sample cities into high- and low-value groups on the basis of the median value of each characteristic.
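The median split used for the city-level heterogeneity analyses can be sketched as follows. Assigning cities exactly at the median to the low group is an assumption, and the input values are hypothetical.

```python
from statistics import median


def median_split(values_by_city):
    """Label each city 'high' or 'low' relative to the sample median.

    Cities exactly at the median are labelled 'low' (an assumption; the
    text does not say how ties are handled).
    """
    cutoff = median(values_by_city.values())
    return {
        city: "high" if value > cutoff else "low"
        for city, value in values_by_city.items()
    }


# Hypothetical GDP-per-capita figures for illustration.
gdp_per_capita = {"CityA": 48000, "CityB": 91000, "CityC": 62000, "CityD": 130000}
print(median_split(gdp_per_capita))
```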

### Baseline regression model

We used a fixed effect panel model to estimate the effects of air pollution on mental health. First, we estimated the short-term effects of air pollution on the basis of the following econometric specification:

$$\ln(\mathrm{MHQs})_{it}=\alpha_{0}+\alpha_{1}\mathrm{AirPollution}_{it}+\alpha_{2}\mathbf{X}_{it}+\mu_{i,m}+\lambda_{t}+\epsilon_{it}, \tag{2}$$

where $\ln(\mathrm{MHQs})_{it}$ is the natural logarithm of the number of MHQs in city i on date t; $\mathrm{AirPollution}_{it}$ represents the pollution level in city i on date t, measured by AQI and PM2.5 concentration; $\mathbf{X}_{it}$ is a vector of weather controls; $\mu_{i,m}$ indicates city–month fixed effects, which can not only control for time-invariant confounders but also absorb time-varying characteristics for each city (such as local business cycle dynamics); and $\lambda_{t}$ indicates date fixed effects, accounting for common shocks for all cities in a given day. The coefficient $\alpha_{1}$ reflects the transitory impact of air pollution on mental health and is expected to be positive.
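To illustrate how fixed effects remove city-specific and date-specific components, the sketch below applies the within transformation on a balanced panel with a single regressor. It is deliberately simplified relative to equation (2): it uses city and date fixed effects rather than city–month and date fixed effects, omits the weather controls, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def two_way_demean(panel):
    """Subtract city and date means (adding back the grand mean).

    panel: dict {(city, date): value}; assumed balanced, i.e. every city
    is observed on every date.
    """
    overall = mean(panel.values())
    by_city, by_date = defaultdict(list), defaultdict(list)
    for (city, date), value in panel.items():
        by_city[city].append(value)
        by_date[date].append(value)
    city_mean = {c: mean(v) for c, v in by_city.items()}
    date_mean = {d: mean(v) for d, v in by_date.items()}
    return {
        (c, d): v - city_mean[c] - date_mean[d] + overall
        for (c, d), v in panel.items()
    }


def fe_slope(y, x):
    """Slope of y on x after removing city and date fixed effects."""
    y_dm, x_dm = two_way_demean(y), two_way_demean(x)
    numerator = sum(x_dm[key] * y_dm[key] for key in x_dm)
    denominator = sum(x_dm[key] ** 2 for key in x_dm)
    return numerator / denominator
```

On a balanced panel this demeaning gives the same slope as a regression with a full set of city and date dummies; with city–month effects or unbalanced data, a dedicated fixed effects estimator would be needed instead.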

Next, we estimated the long-term effects of air pollution on mental health by focusing on the primary pollutant—PM2.5:

$$\begin{aligned}\ln(\mathrm{MHQs})_{it}={}&\alpha_{0}+\alpha_{1}{\mathrm{PM}_{2.5}}_{it}+\alpha_{2}\cdot \frac{1}{k}\sum_{n=0}^{k-1}{\mathrm{PM}_{2.5}}_{i,t-n}\\ &+\alpha_{3}\mathbf{X}_{it}+\mu_{i,m}+\lambda_{t}+\epsilon_{it}.\end{aligned} \tag{3}$$

Here ${\mathrm{PM}_{2.5}}_{it}$ represents the contemporaneous level of air pollution in city i on date t, and $\frac{1}{k}\sum_{n=0}^{k-1}{\mathrm{PM}_{2.5}}_{i,t-n}$ is the mean level of air pollution over the past k days (including day t), which measures cumulative exposure. The control variables are the same as in equation (2). The coefficient α1 reflects the short-term impact of air pollution on mental health, and α2 reflects the impact of long-term exposure. Controlling for both variables in one model helps us evaluate the relative importance of transitory and cumulative effects in the association between air pollution and mental health.
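The cumulative exposure term in equation (3) is a trailing k-day mean that includes the current day (n runs from 0 to k − 1). A minimal sketch for one city’s daily series follows; returning `None` for days with fewer than k observations is an assumption, since the text does not say how incomplete windows were handled.

```python
def trailing_mean(series, k):
    """Trailing k-day mean, including the current day.

    series: daily values for one city, ordered by date with no gaps.
    Entry t of the result averages days t-k+1 .. t; days with an
    incomplete window are returned as None (an assumption).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    result = []
    window_sum = 0.0
    for t, value in enumerate(series):
        window_sum += value
        if t >= k:
            window_sum -= series[t - k]  # drop the day that left the window
        result.append(window_sum / k if t >= k - 1 else None)
    return result


# Hypothetical daily PM2.5 readings for one city.
pm25 = [35.0, 50.0, 41.0, 80.0, 62.0]
for k in (2, 3):
    print(k, trailing_mean(pm25, k))
```

In the study the same construction is applied with k equal to 7, 14, 30 and 60.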

### IV estimation

We introduced an IV and employed the two-stage least squares (2SLS) approach to address the potential endogeneity issues in equation (2). In this study, we followed the idea of Zheng et al.21 by utilizing “cross-boundary air pollution flows” to construct the IV.
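The logic of two-stage least squares can be sketched for the simplest case of one endogenous regressor, one instrument and no controls or fixed effects. This is an illustration of the general technique under those simplifying assumptions, not the specification estimated in the study, and the function names are hypothetical.

```python
from statistics import mean


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )


def two_stage_least_squares(z, x, y):
    """Just-identified 2SLS point estimate with instrument z for regressor x."""
    # Stage 1: project the endogenous regressor on the instrument.
    first_stage = ols_slope(z, x)
    mz, mx = mean(z), mean(x)
    x_hat = [mx + first_stage * (a - mz) for a in z]
    # Stage 2: regress the outcome on the fitted values.
    return ols_slope(x_hat, y)


# Toy data: u is an unobserved confounder uncorrelated with z.
z = [-1, -1, 1, 1]
u = [-1, 1, -1, 1]
x = [zi + ui for zi, ui in zip(z, u)]          # x depends on z and u
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]  # true effect of x is 2
print("OLS:", ols_slope(x, y), "2SLS:", two_stage_least_squares(z, x, y))
```

Only the point estimate is shown here; standard errors taken naively from the second stage are not valid and need the usual 2SLS correction.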

To build the cross-boundary spillover measure of local air pollution, we procured a dataset from Tracking Air Pollution in China (http://tapdata.org), which provides real-time PM2.5 data with a spatial resolution of 10 km × 10 km (in a regular grid of 0.1° × 0.1°)38. We also used the PM2.5 data from the monitoring stations to construct the IV and obtained very similar results (see the details in Supplementary Note 3 and Supplementary Tables 9 and 10). Using the data, we constructed the IV:

$$\mathrm{Neighbour}_{it}=\frac{1}{k}\sum_{j=1}^{k}w_{ijt}\times \mathrm{GridPollution}_{jt},\quad d_{ij}\in (120,300]\ \mathrm{km}, \tag{4}$$

where $\mathrm{GridPollution}_{jt}$ represents grid cell j’s pollution level on date t measured by PM2.5 concentration; $w_{ijt}$ represents grid cell j’s weight, which depends on the wind direction of city i on date t and the relative direction of grid cell j to city i; and $d_{ij}$ is the distance between city i and grid cell j. The weights of upwind grid cells are larger than those of grid cells in other directions, and the weights of downwind grid cells are set to zero because their impacts on city i’s air pollution level are minimal. $\mathrm{Neighbour}_{it}$ therefore measures how city i’s air pollution level on date t is affected by the PM2.5 concentration from nearby grid cells on the same day.

To minimize the interference from agglomeration economies on a group of cities, which may produce correlations between the IV and city i’s economic activities21, we excluded all the neighbouring grid cells for which the distance to city i is within 120 km. The correlation between grid cell j’s PM2.5 concentration and city i’s air pollution level should be small if the distance between city i and grid cell j is long. We thus also excluded grid cells outside 300 km from city i given that 90% of the sample days’ wind speeds are smaller than 3.4 m s−1 (that is, approximately 294 km d−1, the greatest distance the wind can travel per day).
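The construction of the Neighbour instrument for one city and one day can be sketched as below. The cosine weighting is a hypothetical choice that merely satisfies the stated properties (largest weight upwind, zero weight downwind); the actual weights follow Zheng et al.21 and are not reproduced here. The input layout and the meteorological convention for wind direction (degrees clockwise from north, direction the wind blows from) are also assumptions.

```python
from math import cos, radians


def neighbour_iv(cells, wind_from_deg, min_km=120.0, max_km=300.0):
    """Wind-weighted mean PM2.5 of grid cells in the (min_km, max_km] ring.

    cells: iterable of (distance_km, bearing_deg, pm25), where bearing_deg
    is the compass bearing from the city to the grid cell.
    wind_from_deg: compass direction the wind blows from.
    """
    ring = [(bearing, pm25) for dist, bearing, pm25 in cells
            if min_km < dist <= max_km]
    if not ring:
        return None
    total = 0.0
    for bearing, pm25 in ring:
        # Hypothetical weight: 1 for a cell directly upwind, falling to 0
        # at 90 degrees off the wind axis, and 0 for all downwind cells.
        weight = max(0.0, cos(radians(bearing - wind_from_deg)))
        total += weight * pm25
    return total / len(ring)  # the 1/k factor in equation (4)


# Hypothetical grid cells around one city; wind from the north.
cells = [(200.0, 0.0, 100.0), (200.0, 180.0, 50.0), (100.0, 0.0, 500.0)]
print(neighbour_iv(cells, wind_from_deg=0.0))
```

The 300 km outer bound matches the wind travel distance quoted above, since 3.4 m s−1 over 86,400 s is roughly 294 km.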

### Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.