Risk analysis of maize yield losses in mainland China at the county level

Food security in China is under additional stress due to climate change. The risk analysis of maize yield losses is crucial for sustainable agricultural production and climate change impact assessment. It is difficult to quantify this risk because of the constraints on the high-resolution data available. Moreover, the current results lack spatial comparability due to the area effect. These challenges were addressed by using long-term county-level maize yield and planting area data from 1981 to 2010. We analyzed the spatial distribution of maize yield loss risks in mainland China. A new comprehensive yield loss risk index was established by combining the reduction rate, coefficient of variation, and probability of yield reduction after removing the area effect. A total of 823 counties were divided into areas of lowest, low, moderate, high, and highest risk. High risk in maize production occurred in Heilongjiang and Jilin Provinces, the eastern part of Inner Mongolia, the eastern part of Gansu-Xinjiang, west of the Loess Plateau, and the western part of the Xinjiang Uygur Autonomous Region. Most counties in Northeast China were at high risk, while the Loess Plateau, middle and lower reaches of the Yangtze River and Gansu-Xinjiang were at low risk.

www.nature.com/scientificreports/ estimated by the method proposed in IPCC 5th assessment reports, is determined by three indicators: exposure degree, sensitivity, and adaptive capacity 24 . Many researchers have selected a method that directly quantifies the risk of crop yield loss based on historical time series yield data [25][26][27] , which could integrate climatic conditions and socioeconomic components, as these factors are directly reflected in the data. Various indices, which include the coefficient of variation (CV) and yield reduction rate, have been commonly used to indicate the risk of losses 25,27 . However, assessments are often conducted at a coarse resolution, such as at the provincial or district level 6 , or at a high resolution over small regions 27,28 , which has significant limitations because of the difficulty in accessing yield data at high spatial resolution. In addition, these studies focused on only the economic responses to sown area size or used the percentage of the affected areas to planting areas to show the degree of damage/exposure 24,[28][29][30] , without considering the area effects (see the next paragraph for further explanation). These choices induced the lack of spatial comparability in the results. Most of the farmland in China is distributed in the eastern region, including Heilongjiang, Hebei, Henan, and Shandong Provinces, which host large crop-planting areas in most counties 31 . The planting scale of a county affects the yield loss risk result when field-based or farm-based observed yields are aggregated by county. For instance, the increase/decrease of the yield in some fields will offset the decrease/increase in other fields when the field yields are summed, and then the fluctuation or variation in yield will be underestimated. Moreover, the increasing crop area is threatening food security from increased competition for land for food production 23 . Thus, the actual risk should be higher. In contrast, the provinces in western China, such as Qinghai Province, the Tibet Autonomous Region, and the Xinjiang Uygur Autonomous Region (Xinjiang), have relatively small crop-planting areas in most counties 31 , where the estimated yield variation or loss risk could be higher than those of the counties with large planting areas. This suggests that the risk of small crop-planting areas could be overestimated in this area. Twenty-four counties (in western and eastern China) were randomly selected from the alphabetically ordered data of maize yield and planting area as a case study to test the hypothesis (Fig. 1). The CV of the annual yield per unit area of maize from 1981 to 2010 was calculated for each of the 24 counties to reflect the variation in maize yield. The Pearson's pairwise correlation coefficient between planting area and CV was − 0.53, which indicates that there is a strong negative relationship between planting area and CV. The influence of planting area scale, which induced the overestimation or underestimation of yield loss risk, is referred to as the "area effect". This study is the first to consider the area effect in the analysis of yield loss risk. The area effect will be removed or mitigated by introducing the indicator of the standardized planting area in our study.
This study aims to provide high-resolution information on the spatial distribution of yield loss risk based on a new comprehensive risk index, which was established by combining the reduction rate, CV, and probability of yield reduction after removing the area effect. The proposed index can improve the spatial comparability of risk. The results may be crucial for agricultural decision-support systems and climate change assessments.

Result
Distribution of the coefficient of variation ( CV). The CV indicates the stability of maize production.
A high CV means that the maize yield fluctuates greatly between years and that the yield is vulnerable to both climatic and socioeconomic conditions. The CV identified three regions with high variations covering (1) Table 1).  www.nature.com/scientificreports/ Distribution of risk index ( I R ). The risk of maize yield loss increases with I R , which means that the yield is vulnerable to climatic conditions. The two regions that were identified to exhibit a high risk covered (1) parts of Heilongjiang and Jilin Provinces and the eastern part of Inner Mongolia and (2) the western part of Xinjiang (Fig. 3). Agricultural zones I, II and IV had 38.61%, 57.35% and 24.56% of their counties in the highest I R level (see Table 2). Agricultural zones V and VII had 32.26% and 30% of their counties in the high I R level. Agricultural zone VI had more counties with the lowest risk, low risk, and moderate risk than the other zones (see Table 2). The comparison of Figs. 2 with 3 indicates that there are some differences between the spatial distributions of the CV and I R . The reason for these differences might be that I R is related to only climatic conditions such as precipitation, temperature, sunshine hours and soil type. However, the CV is related to the local socioeconomic conditions (technological developments, infrastructure and investment level) in addition to the climatic conditions.

Performance of two comprehensive risk indices ( CRI and ICRI). The risk of maize yield loss increases
with the comprehensive risk index. A high value of the comprehensive risk index means that the maize yield is dramatically impacted by both climatic and socioeconomic conditions. Figures 4 and 5 show the distributions of the comprehensive risk index without considering the planting area effect and after removing the area effect, respectively. After the effect was removed, the improved comprehensive risk index (ICRI) identified three   5). More counties in zones I, III, and III were at high risk after the area effect was removed, while the risk in zone VIII decreased dramatically, especially in the western (Xinjiang) and eastern (Hetao area) parts of zone VIII (Figs. 4 and 5). Across China, the degree of yield loss risk increased gradually from south to north and from east to west (Fig. 5). As shown in Table 3

Discussion
There are many different methods of crop risk assessment 17,[24][25][26][27] . Quantifying the risk by the index calculated from historical time series yield data is often used [25][26][27] . Many indices, including the CV, reduction rate and probability of yield reduction, and a comprehensive index combining several indices, can indicate the fluctuation in crop yield 25,27,28 . However, the planting area of a county affects the risk assessment when field-based observed yields are aggregated at the county scale. As a consequence, the risk is underestimated or overestimated. In this study, the indicator of standardized planting area was introduced into the comprehensive risk index to remove this effect and improve the spatial comparability of the results. In this research, natural and socioeconomic conditions were considered in the assessment of maize production risks. I R is related to only climatic conditions such as precipitation, temperature, sunshine hours and soil type, and the CV indicates that the fluctuations between years are related to both local socioeconomic conditions (technological developments, infrastructure and investment level) and climate conditions, while the ICRI integrates the CV , the average reduction rate and I R after removing the area effect. The ICRI indicates the risk due to both climatic and socioeconomic conditions.
In agricultural zones I and II, including some parts of Heilongjiang, Jilin and Liaoning Provinces, and the eastern part of Inner Mongolia (the Inner Mongolia Plateau), there is a region with high CV , I R and ICRI values, indicating that the maize yield in these areas varies considerably between years and that those areas are prone to disasters and changes in the socioeconomic level. Yield fluctuations increase from the south to the north in zone I 34,35 . This result occurs because the regions in these areas are rain-fed dryland cropping areas that are entirely dependent on limited and erratic rainfall 30 . The percentage of annual effective irrigated area to the sown area in Jilin and Heilongjiang Province accounted for no more than 26%, which was less than the 50th percentile of www.nature.com/scientificreports/ the proportion of all provinces (Fig. 6). A significant increase in the maize water requirements will occur in the future because precipitation has been continually decreasing, especially during the maize growing season 7,36,37 , while only approximately 6% of this region is irrigated 38 . Thus, drought is the greatest agro-meteorological disaster that occurs with the highest frequency, covers the largest area, and causes the most considerable agricultural production losses in this area 30 . The degree of drought disaster risk increases gradually from south to north and from east to west 29 , similar to the distribution of the yield loss risk. The areas that already correspond to a high yield loss risk will be significantly damaged by climate change. Therefore, the need for adjustment and management is urgent. The ICRI increased in most counties in zone III after the area effect was removed. The distribution of the ICRI is consistent with the distribution of the drought disaster risk, which indicates a high risk in most areas in zone III, and the areas with the high risk and the highest risk accounted for more than 60% of the study area 12,30 . The high-risk areas of summer maize yield loss in Henan Province are distributed in the south and northwest parts of the province 39,40 , which is consistent with our results (Fig. 5, orange to red areas in the south part of zone III, north part of zone V, and southeast part of zone IV).
The yield loss risk increased significantly for some counties in zone VI, as indicated by the ICRI , because most areas of zone VI are rain-fed regions with a low proportion of effective irrigated area (Fig. 6). This zone is prone  www.nature.com/scientificreports/ to spring and summer drought and heat stress, especially in the central and southwestern Sichuan Basin 6,41,42 , and the agricultural infrastructure is weak 30 .
The yield loss risk decreased in most counties in agricultural zone VIII. The sunshine and significant day/ night temperature differences in this zone were suitable for maize growth 43 , although the annual precipitation was less than 400 mm/year (only 296 mm/year from 1981 to 2000) (Table 4). Thus, maize could not be grown in this area without irrigation 44,45 . In fact, irrigation is extensively used to relieve water shortages in these areas. The percentages of effective irrigated area to sown area accounted for 91.01% and 33.64% in Xinjiang and Inner Mongolia, respectively; these results are greater than the 50th percentile of the proportion of all provinces (Fig. 6). Thus, maize production in the western part (Xinjiang Province) and eastern part of zone VIII (Hetao area) had a low yield loss risk 6,46 . In turn, the I R was higher in these areas than in other areas, which was consistent with   www.nature.com/scientificreports/ the trend of the average degree of drought hazard 34 , while the actual yield loss risk was lower, which indicated that the dependence on management such as irrigation cannot be underestimated. Wind has played a significant role in maize yield loss 39,47 . Wind-induced lodging reduces the grain-filling rate and decreases the kernel weight 47,48 . In addition, the light penetration through the upper layer of vegetation is reduced by lodging 49 . It can decrease the ear number and kernel number per ear 48,50 . Lodging also can increase the yield loss of mechanical harvest 50,51 . Wind effects were not analyzed in our study but should be studied in the future. conclusion 1. The area effect is the limiting factor for the spatial comparability of risk assessment because the planting area of a county affects the risk assessment when field-based or farm-based observed yields are aggregated by county. risk (e.g., drought) and socioeconomic components (e.g., irrigation level); thus, the ICRI is reasonable. This index can be used to accurately compare the risk of maize yield loss in different areas.

Data and methods
Agricultural data and meteorological data. This study used the total yield (kg) and planting area (ha) of maize for 2,247 counties in China from 1981 to 2010; these data were collected by the Ministry of Agriculture and Rural Affairs of the People's Republic of China. The dataset contains summer maize and spring maize, while not all of the counties planted both types of maize. The yield per unit area was determined by the planting area and total yield. The counties used in this study were required to have at least fifteen records over the 1981-2010 period. The planting area sizes ranged from 2 ha in Lang County, Tibet, to 199,043 ha in Nongan County, Jilin Province. The data of annual effective irrigated area and annual total sown area at the provincial level from 1981 to 2010 were collected by the National Bureau of Statistics (https ://data.stats .gov.cn/); however, data were only available from 1996 to 2010 for Chongqing City.
The daily temperature and precipitation data from 740 stations during 1981-2000, which were collected by the China Meteorological Administration, were used. Study area. The study area included the main maize production regions in mainland China. The annual total yield in each county was calculated by the temporal average. The main production counties were those with an annual average total yield greater than the 50th percentile of the annual average total yield of all the counties. Finally, 823 counties were selected for analysis (in light green to dark blue areas, Fig. 7). There are nine main agricultural zones in mainland China 52 . The annual average precipitation (P, mm) and temperature (T, ℃) in each zone were calculated by the spatial and temporal averages of the daily temperature and precipitation data. The number of maize-growing counties (N), number of main maize-growing counties (N m ), and percentage (%) of N m to N in each zone are listed below (Table 4).
In this study, the observed maize yield per unit area (kg/ha) was used to obtain the CV. The yield per unit area time series was decomposed into three components by using the linear moving average method: the meteorological yield, the trending yield (or technical yield in some studies), and the error 53,54 . Ultimately, the meteorological yield was further processed to obtain the relative meteorological yield, in which the negative values were taken as the object to obtain the probability of reduction, risk index, and ICRI.

Coefficient of variation ( CV). The CV of yield per unit area indicates the variations in yield caused by
climatic and socioeconomic conditions. The equation is as follows: where Y i is the i th observed yield per unit area (kg ha −1 ); Y is the mean of Y i during the period 1981-2010; and n is the total number of observations, which is at least 15.
Meteorological yield ( Y w ) and trending yield ( Y t ). The observed maize yield per unit area is impacted by natural conditions (temperature and precipitation) and socioeconomic components (technological progress and infrastructure improvements). The yield can be divided into three parts: trending yield ( Y t ), meteorological yield ( Y w ) and random output/error ( ε ). The equation is as follows: where Y is the annual observed maize yield per unit area. Since the random yield ε is quite small, it can be ignored. Furthermore, the simplified Eq. (1) is: In addition, the trending yield and meteorological yield calculated with the 5-year linear moving average method met three criteria that determine trend models 56 , and this method can smooth irregularities and high-frequency variations in the trends 28 . Thus, the five-year linear moving average method was employed to simulate the trending yield. The time series of Y was divided into sequence segments according to the time step (k), which is 5 in this study. The number of segments is n − k + 1 . The linear regression for each segment ( j ) is as follows: where Y j (t) is the tth trend yield in segment j , k j and b j are estimated from a set of Y and t in segment j with the least squares method, and t is the rank index of each observed year. There can be more than one simulated value for each t in segment 2 to n − k + 1 . Finally, the trending yield of each t is a moving average:  where Y w (t) Y(t) are the tth meteorological yield and actual yield per unit area, respectively.
Relative meteorological yield ( Y r ). The Y r values are comparable since they are not impacted by the socioeconomic component 15 . The corresponding equation is as follows: where a negative Y r (t) is defined as the tth reduction rate 3 .
Average yield reduction rate ( R). The average yield reduction rate ( R ) was determined by the negative value of Y r (t) . The corresponding equation is as follows: where n is the number of negative values of Y r (t).

Risk index of yield loss ( I R )
. I R results from the integration of different levels of reduction rates ( R i ) and their probability of occurrence ( P i ). The greater the value of I R is, the greater the risk of yield losses.
Because the climate factors that affect crop yield exhibit a normal distribution, it is argued that the Y r series should also exhibit a normal distribution. The normal distribution test was performed on Y r to verify this assumption. Due to the small sample size, the Lilliefors goodness-of-fit test was chosen. For a few samples that did not fit the normal distribution, the normal conversion was conducted by the logarithmic method.
There is no fixed standard for the division of the range of R i . The China national standard (GB/T24438.1-2009) roughly divides R i into three ranges, 0.1-0.3, 0.3-0.8, and 0.8-1, to indicate three levels of damaged crop area. The threshold values of R i for identifying different levels of drought (mild, moderate, severe, and extreme drought) are 0.1, 0.2, and 0.3 57 . A value of 0.05 was used to determine whether the crop was impacted by a disaster 58 . Based on the above threshold, R i was divided into four ranges: Comprehensive risk index of yield loss ( CRI). The CRI combines CV , R , and I R . A larger CRI means a greater risk of losses. Due to the inconsistent units of the four variables, standardization is first performed using the extreme difference method. The standardized CV , R , and I R were calculated using the following equation: where x is CV/R/I R , and x s is the standardized x.
CRI is the comprehensive risk index without considering the area effect.
where CV s , R s and I Rs are the standardized versions of CV , R , and I R . The weights of these three indicators are the same 28,58,59 .
Improved comprehensive risk index ( ICRI ) of yield loss. R , CV and I R exhibit close positive correlations with the yield loss risk, while planting area size ( S ) exhibits a negative correlation with this risk because the increase/decrease in the yield of a field will offset the decrease/increase in another field in the same region. The ICRI is the comprehensive risk index after removing the planting area effect. The main maize growing counties were divided into lowest-, low-, moderate-, high-and highest-risk areas.
The ICRI equation is as follows: where S S is the standardized planting area size calculated using Eq. (11).

Data availability
All the data analyzed in this study are included in this published article. The original data are available from the corresponding author upon reasonable request.