Stage-specific, Nonlinear Surface Ozone Damage to Rice Production in China

China is one of the most heavily polluted nations and is also the largest agricultural producer. There are relatively few studies measuring the effects of pollution on crop yields in China, and most are based on experiments or simulation methods. We use observational data to study the impact of increased air pollution (surface ozone) on rice yields in Southeast China. We examine nonlinearities in the relationship between rice yields and ozone concentrations and find that an additional day with a maximum ozone concentration greater than 120 ppb is associated with a yield loss of 1.12% ± 0.83% relative to a day with maximum ozone concentration less than 60 ppb. We find that increases in mean ozone concentrations, SUM60, and AOT40 during panicle formation are associated with statistically significant yield losses, whereas such increases before and after panicle formation are not. We conclude that heightened surface ozone levels will potentially lead to reductions in rice yields that are large enough to have implications for the global rice market.


II. Administrative Divisions in China
The local administration in China can be described as a top-down system with four layersprovinces, prefectures, counties, and villages. A province usually contains several prefectures, and each prefecture contains several counties, with several villages under each county's governance. Our study uses county as the smallest observational unit. We focus on five provinces in southeastern China, namely Guangxi, Guangdong, Fujian, Jiangxi, and Hainan. The panel data set we construct contains 231 counties located in 53 prefectures in the five provinces.
Most rice is double-cropped in the five provinces. According to the National Bureau of Statistics, the proportions of double-cropped rice for Fujian, Guangxi, and Jiangxi are 45.10%, 92.83%, and 86.06%, respectively; Guangdong and Hainan only plant double-cropped rice. Because we do not have information of cropping patterns at the county level, we treat all observations in our merged data as double croppers. The growing season for single rice in Fujian, Guangxi, and Jiangxi virtually overlaps with the late growing season in those provinces, which makes our definition of growing stages valid even if a county is not a double-cropper.

III. Empirical approach
We use a multivariate panel model with county and year fixed effects to measure the impact of O 3 on rice yield. The general form of this model is given by where y irt represents the average rice yield for county i in province r in year t, o irt is a vector of ozone measures, w irt is a vector of environmental factors, x rt is a vector of additional control variables at the province level, including fertilizer application and natural disasters, where r indexes provinces, α ir represents county fixed effects that control for time-invariant characteristics of county i, λ t represents year fixed effects that capture general changes across years like technological advances and policy changes, ε irt is the error term. The standard errors are clustered at the prefecture level.

Rice yields
The planted area is the sum of planted areas for the early and late growing seasons. The units of measure are hectares (ha) for the planted area and metric tons (mt) for output. We calculate the average yield by dividing the annual total output by the annual total planted area. Yields are measured in mt per ha. Observations with yields above 20 mt per ha were dropped from the dataset because they are likely reporting errors. The data contain original observations for 315 counties. Among them, 279 counties were successfully merged with our other data. Among the merged counties, 231 counties display non-zero planted areas in all three years. This sample corresponds to about 20% of China's rice planted area and 18% of its rice output (Table 1).

Ozone measures
We construct three different O 3 measures for our baseline regression: simple average, SUM60, and AOT40. In line with the literature, we construct these ozone measures based on an 8-hour time window (10 am -5 pm) each day. We also perform sensitivity analyses with various time windows.
For our main regression, the simple average measure is defined as The SUM60 measure is defined as 10,17] oc irt,d,h oc irt,d,h ≥ 60 .
In the equations above, oc irt,d,h represents the O 3 concentration (measured in ppb) for county i in province r during hour h of day d in year t, where h = 1, 2, ..., 24 and d ∈ D. D denotes the set of days corresponding to the period that we construct the ozone measure over, andd denotes the maximum number of days in that period.
Scientific evidence suggests rice is most sensitive to O 3 pollution during panicle formation (PF) 1-3 . To measure this statistically, we divide the growing season into three periods: before, during, and after panicle formation (labeled as pre-PF, PF, and post-PF, respectively). Table 2 shows the dates of these periods for the different provinces in our data set. These dates are determined according to the rice progress report by the National Bureau of Statistics of China. Because we do not observe rice yields specific to the early and late seasons, we create pre-PF, PF, and post-PF measures that average over the two growing seasons. This only imposes the assumption that the marginal effects in the early and the late season are equal, which is arguably valid because the crop variety typically does not differ among the early and late seasons.
We construct three variables for each ozone measure over the three subperiods and label them using superscripts, i.e., (1).

Weather variables
The weather data contain daily precipitation, and daily maximum, minimum, and average temperature. The county-level data were compiled from the station-level as follows. First, the inverse distance weighted (IDW) method was used to interpolate the climate data at 825 weather stations across China to a spatial field with grid spacing of 500m. Then, the values of each grid located in one county were averaged. We only used county-level weather data for the counties in the five provinces. The units of measure are degrees centigrade for temperature, and millimeters for precipitation. The period we construct the weather variables over encompasses the early and late growing seasons in a year; specifically, this period is defined as Mar 15 -Oct 15 for counties in Fujian, Jiangxi, and Guangxi, and Mar 15 -Nov 15 for counties in Guangdong and Hainan. 1 Following the literature on climate change impacts on crop yields 4,5 , we construct three weather variables over the period defined above: (1) growing degree days, (2) heating degree days, and (3) total precipitation. We define growing degree days as follows. We denote the average temperature in county i in province r on day d in year t as T irt,d , and define the daily degree days (8 • C -29 • C) as After calculating D irt,d for each county i in province r for each day d in year t, we sum over the entire growing season and obtain the growing degree days GD irt for each county i in year t, i.e., where D denotes all days in the period defined earlier. We define heating degree days as Total precpitation T P irt for each county i is defined as where P irt,d represents the total precipitation in county i in day d of year t.
In the regression results, these environmental factors are divided by 1000 to scale up the magnitude of coefficients. This adjustment does not influence the estimation of other variables and the interpretation of their coefficients.

Fertilizer application
The fertilizer data provide a nutrient-based fertilizer index for each province. The index is calculated based on effective chemical compounds, namely nitrogen, phosphorus, and potassium, specifically applied on rice on a per-hectare basis. The measure is comparable across regions because it is based on the quantity of effective nutrients. In the raw data, the fertilizer index is provided separately for four different categories, namely early Indica rice, middle Indica rice, late Indica rice, and single-cropped Japonica rice. For each province in each year, the four measures are weighted by the share of production corresponding to each category, and a single index of fertilizer use is obtained for each province-year combination. 2

Natural disasters
Four province-level statistics are reported for natural disasters: area covered by floods, area covered by droughts, area affected by floods, and area affected by droughts. The measure of area covered is defined as the area where the disaster reduces more than 10% of the total crop production; and area affected is defined as the area where the disaster reduces more than 30% of the total crop production. To make the measure comparable between different provinces, we calculate a disaster severity index using a ratio-based measure. In particular, we obtain the total arable area at the provincial level, and divide area covered and area affected by the total arable area for every province. We place these ratios into four categories: flood (severe or mild) and drought (severe or mild). We construct these bins only during panicle formation, i.e., D in equations 9 -12 corresponds to PF in Table 2. We use the same variables to control for environmental factors, fertilizer application, and natural disasters as in the baseline regression. The regression equation can be writen as

IV. Measuring nonlinearities in
V. Discussion of the Results Table 3 shows the estimated results for equation (1) using simple average O 3 , SUM60, and AOT40. Among the coefficients associated with O 3 variables, only variables during PF are statistically significant. A one ppb increase in daytime average O 3 during PF is associated with a 0.48% decrease in rice yield. A one ppmh increase in SUM60 (AOT40) is associated with a 0.74% (1.56%) decrease in rice yield. Table 4 presents estimated results for equation (13). Only coefficients associated with daily maximum O 3 above 100 ppb are statistically significant. Specifically, during PF, one more day with its maximum O 3 over 100 (120) ppb is associated with a 0.94% (1.28%) decrease in rice yield. This means the accumulation of extremely high O 3 exposures is particularly detrimental to rice yields.
The weather variables are not statistically significant. In contrast to statistical studies on climate change and crop yields using long panels, we use a short panel to estimate the relationship between yield and O 3 pollution. Because our short panel consists of only three years, the year-to-year variation in weather variables is not substantial and hence we do not find statistically significant weather effects on yields using our data set. Among other control variables, only the fertilizer index displays a significantly positive effect on rice yields. Table 5 shows the decomposition of variation in rice yields explained by different sets of explanatory variables. We focus on the within-R 2 of a certain set of explanatory variables because the explanatory power in our estimation strategy is mostly provided by within-county variation. The ratio in parentheses below the within-R 2 indicates the portion of yield variation explained by the corresponding set of variables. It is evident that most of the within-county variation comes from O 3 variables. For the control variables, while fertilizer application and natural disasters provide moderate explanatory power, only a very small portion of yield variation is explained by the variation in weather variables.
To address any concern of potential multi-collinearity, we perform correlation tests conditional on the fixed effects. It is worth noting that the identification of partial effects in our estimation strategy relies on year-by-year variation within the same county. Hence, the multi-collinearity issue arises only if the within-county variation of different explanatory variables are highly correlated. We show results of the correlation tests in Tables 6 and 7. The former controls for county fixed effects, and the latter controls for both county and year fixed effects. We do not find high correlation between O 3 variables and other explanatory variables and therefore we are confident that our estimation of O 3 coefficients are not subject to a multi-collinearity problem.
To verify that our empirical results are not sensitive to the chosen daytime window, we perform sensitivity analyses by varying the definition of daytime windows. We repeat our analysis by using the following time windows: 9am-5pm, 10am-6pm, 9am-6pm, 8am-6pm, 9am-7pm, and 8am-7pm. The estimated coefficients associated with O 3 variables are reported in Table 8. The results are consistent with our findings as in Tables 3 and 4.
We estimate the yield changes from 2006 to 2010 caused by variation in O 3 during PF based on our regression results. Figure 3 shows the spatial distribution of the impact based on the three O 3 measures: simple average, SUM60, and AOT40. The three different measures produce consistent estimates of the impact from 2006 to 2010. While some counties in Guangxi, Fujian, and Jiangxi experienced some yield recovery due to the alleviation in O 3 during PF, the majority of counties, especially in Guangdong, experienced substantial yield losses because of the higher O 3 pollution during PF in 2010 relative to 2006.

VI. O 3 damage over the entire growing season
We use an alternative model to study the O 3 damage ignoring differences across stages of plant growth. We construct the 8-hour simple average, SUM60, and AOT40 using the entire growing season (GS). These regression results are presented in Table 9. We find that a 1 ppb increase in GS simple average concentration lowers yield by 0.60% ± 0.34%, whereas an additional ppmh in GS SUM60 and AOT40 is associated with a 0.16% ± 0.16% and 0.42% ± 0.31% yield loss, respectively. Figure 2 compares the coefficients estimated from models with and without the stage specificity. The GS coefficient of the simple average measure is similar to the PF coefficient in the baseline, while the SUM60 and AOT40 coefficients are smaller in magnitude than their PF counterparts. This is because SUM60 and AOT40 are not "average measures". A one ppb decrease in simple average ozone concentration for GS implies a one ppb decrease for PF, which is consistent with the finding that GS and PF coefficients are very close. However, a one ppmh reduction in SUM60 or AOT40 over the entire GS is equivalent to a reduction of 1/3 ppmh before, during and after PF. Since changes in SUM60 or AOT40 before and after PF are not associated with yield changes at a statistically significant level, the 1 ppmh reduction in SUM60 or AOT40 over the entire GS has a similar impact as a 1/3 ppmh reduction during PF. Therefore the benefit arising from this reduction would be roughly a third of a 1ppmh reduction during PF. To confirm our reasoning, using our regression estimates for PF, a 1/3 ppmh reduction in SUM60 and AOT40 would reduce yields by 0.25% ± 0.17% and 0.53% ± 0.38%, respectively, which are similar up to sampling variability to the estimates of a 1 ppmh reduction in SUM60 and AOT40 over the GS.
We believe that our GS estimates are underestimates of O 3 damage over the entire growing season because the potential omitted variables drive point estimates of pre-and post-PF effects to be positive (although statistically insignificant), while positive coefficients are illogical since no evidence shows O 3 can potentially benefit crop growth. An experimental study in China 6 concludes that one additional ppmh of AOT40 is associated with a 0.95% reduction in rice yield, which is larger in magnitude than our GS estimate of a 1 ppmh increase in AOT40. It is important to note that these two estimates are not readily comparable as our definition of GS encompasses two consecutive growing seasons, we only observe average yield for both seasons and our sample is not localized to a small region as in an experimental study. If we assume an additional ppmh is evenly increased across two seasons and its impact is homogeneous across the two seasons, the yield reduction associated with a 1 ppmh increase in AOT40 will be roughly 0.84% based on our GS estimate, which is slightly lower than 0.95%. Tang et al. (2014) 7 estimate the increases in ozone risk in terms of AOT40 from 2000 to 2020 for different regions of China, and calculate the loss of rice harvest by using the yield-ozone relationship established in Wang et al. (2013) 6 . Tang et al. (2014) project that rice production loss will be as high as 7.9 -12.1% from 2000 to 2020 due to ozone pollution. If this projection was based on our GS estimate using AOT40, the projected production loss would be around 7.0 -10.7%. Hence, our results using county-level longitudinal data confirm the findings in Tang et al (2014).    Note: This figure compares ozone coefficients obtained from a regression using the simple average, SUM60 and AOT40 constructed over the entire GS with those constructed separately over three periods: the period before PF (Pre-PF), the PF period (PF), and the period after PF (Post-PF). Black dots represent point estimates, and gray bars represent 95% confidence intervals based on prefecture-level cluster-robust standard errors. The regression includes control variables for weather, fertilizer application, and natural disasters, as well as county and year fixed effects.

Estimated changes based on SUM60
Estimated changes based on AOT40 Note: Magnitudes of changes are estimated by multiplying PF coefficients reported in Table 3       Note: This table reports the regression results using ozone variables constructed separately before, during, and after panicle formation (pre-PF, PF, and post-PF). The ozone variable used in the regression is indicated by the column header. Prefecture-level cluster-robust standard errors are reported in parentheses. Significance levels: * p < 0.10, * * p < 0.05, * * * p < 0.01 Significance levels: * p < 0.10, * * p < 0.05, * * * p < 0.01 Note: The R 2 statistics are obtained by regressing the dependent variable on a certain set of explanatory variables (indicated in "Variables Included"). The ratio in the parentheses indicates the portion of yield variation explained by the corresponding set of explanatory variables. Fixed effects (FE's) include both county and year fixed effects.    .0062 (0.00201) * * * (0.00194) * * * (0.00180) * * * (0.00163) * * * (0.00164) * * * (0.00150) * * * Daily Max.: >120 ppb -0.0114 -0.0115 -0.0103 -0.0092 -0.0094 -0.0085 (0.00397) * * * (0.00395) * * * (0.00360) * * * (0.00326) * * * (0.00329) * * * (0.00300) * * * Note: The first three panels correspond to the specification using simple average, SUM60, and AOT40 with three stages. The last panel corresponds to the specification based on ozone bins during PF. In each panel, each column reports ozone coefficients with a different daytime window as indicated by the header. All regressions include control variables of weather, fertilizer application, and natural disasters. Prefecture-level cluster-robust standard errors are reported in parentheses. Significance levels: * p < 0.10, * * p < 0.05, * * * p < 0.01 Note: This table reports the regression results using Simple average, SUM60, and AOT40 constructed over the entire growing season (GS). The ozone variable used in the regression is indicated by the column header. Prefecture-level cluster-robust standard errors are reported in the parentheses. Significance levels: * p < 0.10, * * p < 0.05, * * * p < 0.01 Note: After each ozone variable is constructed during the panicle formation, we first-difference the data to obtain within-county year-to-year variation of that ozone variable. Then we pool all year-to-year variation and calculate the proportion of this pooled dataset with year-to-year variation falling into a specific range, as labeled in the "Variation" columns.