Background & Summary

Food security is the foundation of human survival and national security. According to the Food and Agriculture Organization (FAO), nearly 10% of the world’s population suffered from hunger in 20201. With the rising global population, global food demand will increase by 100–110% in 2050 compared to 20052, which may significantly challenge food security3,4. Maize is one of the most widely planted cereals in the world, and sustainable maize production plays a crucial role in meeting the rapidly growing food demand and ensuring national and global food security. Maize represents an important source of food for humans and animals and industrial raw material5,6. Since 2001, maize has surpassed rice as the world’s second-largest cereal1. In 2019, maize accounted for about 12% of global crop production1.

Large-scale and long-term distribution maps of maize are essential for maintaining food security and achieving sustainable development7,8. Distribution maps of maize are not only crucial data for simulating maize yield9, but also the basis for identifying maize phenology10,11. Additionally, a distribution map of maize can be used as a reference for predicting future maize distributions by exploring the driving forces of crop distribution patterns12. Because of the use of fertilizers such as nitrogen, large emissions of the long-lived greenhouse gas N2O are associated with the maize planting process13,14. Therefore, the distribution map of maize can also be an important source of data for simulating greenhouse gas emissions from agricultural ecosystems and plays an important role in determining the regional budget for greenhouse gases15,16. In addition, compared with C3 plants, a C4 crop type such as maize has stronger photosynthetic capacity17. To accurately estimate the global crop gross primary productivity, mapping the long-term spatial distribution of maize is necessary18.

As the world’s second-largest maize producer, China produced 260.95 million tons of maize in 2019, accounting for 22.72% of global maize production1. However, in recent years, climate change and extreme weather events have profoundly affected China’s agricultural production19,20,21. Specifically, maize can be cultivated in various geographical regions and environmental conditions, including large areas with no irrigation22, and is highly sensitive to climate change. Studies have shown that from 1979 to 2016, for every 1 °C increase in temperature, China’s maize yield decreased by 1.7%23. In addition, Yuan et al.24 found that the planting area of maize in China has significantly increased with the improvement of economic returns, leading to substantial changes in its spatial distribution pattern. Therefore, long-term and accurate monitoring of maize distribution is of great significance for reducing economic losses and ensuring food security25.

At present, methods based on remote sensing data are the basic approach for identifying the regional-scale maize planting area and its dynamics. Remote sensing data has the advantages of temporal and spatial continuity, high updating frequency, and low acquisition cost. Currently, many studies have attempted to map the distribution of maize at the provincial and national levels in China based on remote sensing data26,27. For example, Zhang et al.26 mapped the distribution of maize in 11 provinces in Northeast and North China in 2010 using MODIS (Moderate Resolution Imaging Spectroradiometer) data with a spatial resolution of 250 m. Luo et al.27 generated a 1000-m spatial resolution dataset of the distribution of rice, wheat, and maize in China from 2000 to 2015 based on MODIS data, by comparing the phenological periods of each pixel with the reference phenological periods of the three crops. However, China is one of the countries with the most severe cropland fragmentation issues28. Since the implementation of the household contract responsibility system in 1979, the absolute average allocation of land has exacerbated the degree of cropland fragmentation, resulting in extremely scattered cropland in China29,30,31. The report shows that the average cropland area per household in China was 0.58 hectares in 201432. In addition, there are significant differences in planting habits among Chinese households, as farmers are free to choose which crops to plant, leading to high heterogeneity in crop types. Thus, one MODIS pixel with resolutions of 250 to 1000 m typically covers the fields of 10 to 172 households, resulting in a large amount of misclassification that affects the accuracy of maize maps31.
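The range of 10 to 172 households per MODIS pixel follows directly from the pixel footprint and the reported average cropland area of 0.58 hectares per household. As a quick check, assuming square pixels and that every household holds exactly the average area:

$$\frac{{\left(250\,{\rm{m}}\right)}^{2}}{0.58\,{\rm{ha}}}=\frac{6.25\,{\rm{ha}}}{0.58\,{\rm{ha}}}\approx 10.8,\qquad \frac{{\left(1000\,{\rm{m}}\right)}^{2}}{0.58\,{\rm{ha}}}=\frac{100\,{\rm{ha}}}{0.58\,{\rm{ha}}}\approx 172.4$$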

There have been several efforts to generate distribution maps of maize using high spatial resolution satellite datasets, which can effectively avoid mixed-pixel issues. You et al.33 produced annual crop maps for the main crops (maize, soybean, and rice) of Northeast China from 2017 to 2019 based on Sentinel-2 data at 10-m resolution, with an overall accuracy of 81–86%. Recently, Shen et al.22 obtained a 30-m resolution maize map for 22 provinces of China from 2016 to 2020 using Landsat and Sentinel-2 satellite data, with an average overall accuracy of 79.13%. These studies have focused more on recent years’ maize mapping. However, a long-term high-resolution dataset of maize distribution in China is still lacking. The limitation is mainly due to the lack of corresponding long-term high spatiotemporal resolution remote sensing data. Although Landsat data have a high spatial resolution (30 m) and a long-term span, their temporal resolution is low (16 days) and tends to be affected by cloud contamination. In Southern China, for example, from 1984 to 2017, there were fewer than 10 cloud-free observations per year, making it challenging to accurately map maize due to the lack of critical crop growth information34. Sentinel-2 data (10 m or 20 m, 5 days) can balance spatial and temporal resolution well but lack historical time series due to the more recent launch of the satellite (e.g., Sentinel-2A was launched in 2015).

A fused data product has the potential to overcome the above problems. It can fuse frequently revisited but coarse spatial resolution images and infrequently revisited but high spatial resolution images to reconstruct long-term high spatiotemporal resolution images, providing data support for improving crop classification accuracy35,36,37,38,39,40. For example, Yin et al.41 produced a rice distribution map of the Sanjiang Plain by fusing MODIS and Landsat data, and the results showed that the overall accuracy improved by 6.07% based on the fused data compared to using only Landsat data. Ding et al.42 used a fusion of MODIS and Landsat data to identify rice in Nanchang County of China and achieved an overall accuracy of 93.66%. However, these studies were conducted on small reference datasets, and long-term identification of maize over large regions based on fused data is still lacking. Recently, a study has combined MODIS data and Landsat data to generate a long-term high spatiotemporal resolution fused dataset of Normalized Difference Vegetation Index (NDVI) (Integrating ENvironmental VarIable spatiotemporal fusion dataset, InENVI) over a large regional scale, which can be used for high-precision crop mapping43.

Therefore, based on the newly produced high spatiotemporal resolution and long time series (2001–2020) NDVI fused dataset (i.e., InENVI) in China, this study produced high-resolution maps of maize distribution from 2001 to 2020 by the Time-Weighted Dynamic Time Warping (TWDTW) method. The specific objectives were: (1) to generate a publicly available database of long-term high-resolution maize distribution maps in China; (2) to evaluate the accuracy of the maize distribution dataset using field surveys, Google Earth samples, unmanned aerial vehicle images (UAV), and county-level statistical data; and (3) to analyse the temporal and spatial variation characteristics of maize distribution in China. The generated dataset provides important foundational data for estimating maize production, identifying maize phenology, and monitoring food security.

Methods

Study area

Based on agricultural statistical data, 30 provinces and municipalities in mainland China planted maize in 2020. This study focused on mapping maize in 22 of these provinces and municipalities, accounting for over 99% of the total maize planting area in China for 2020, including Anhui, Gansu, Hebei, Heilongjiang, Henan, Inner Mongolia, Jiangsu, Jilin, Liaoning, Ningxia, Shaanxi, Shandong, Shanxi, Tianjin, Xinjiang, Chongqing, Guangxi, Guizhou, Hubei, Hunan, Sichuan, and Yunnan (Fig. 1). In 2020, the 22 provinces produced a total of 259 million tons of maize, with a total planting area of 40.93 million hectares (http://www.moa.gov.cn/), which increased by about 17.05 million hectares from 2001 (Fig. 2). The major maize planting provinces in China, including Heilongjiang, Jilin, Shandong, Inner Mongolia, Henan, Hebei, and Liaoning, accounted for over half of the total planting area in China.

Fig. 1

Location of study area and field survey samples. Red dots indicate maize samples and blue dots indicate non-maize samples, including other crops, forests, shrubs, water, buildings, and sheds. All samples were collected in 2019 through field surveys and © Google Earth.

Fig. 2

Changes in maize planting area in 22 provinces and municipalities in China from 2001 to 2020.

In China, maize can be divided into spring maize and summer maize based on the sowing season. Summer maize is mainly planted in the Huang-Huai-Hai region, including Henan, Shandong, Jiangsu, and Anhui; Hebei, Shanxi, Shaanxi, and Xinjiang plant both spring maize and summer maize; other provinces mainly plant spring maize. Summer maize is usually rotated with other winter crops (such as summer maize-winter wheat rotation). The planting time is typically from late May to mid-to-late June, with harvest occurring from mid-to-late September to early October. Spring maize is only planted once a year, with a wide range of planting times across different regions, from March to May. The harvest time varies depending on the planting time, mostly from August to October.

Fused dataset

In this study, the latest fused InENVI NDVI dataset was used to generate the maize distribution maps in China43. This dataset is based on the nonlinear relationship between MODIS NDVI and Landsat NDVI to reconstruct high spatial resolution NDVI data43. The dataset has a wide spatial coverage (China), long time series (2001–2020), and high spatiotemporal resolution (30 m, 8 days), making it a high-quality dataset for long-term identification of maize planting areas.

Agriculture data

This study assessed the accuracy of the maize distribution maps using field surveys and statistical data. In 2019, field surveys were conducted at 600 randomly selected sites across 11 provinces in China. In August 2018, three UAV (eBee, senseFly Ltd., Switzerland) images covering approximately 0.1 square kilometers and containing about 6000 pixel samples at a spatial resolution of 30 × 30 m were collected in Ningxia, Shaanxi, and Inner Mongolia. Additionally, maize fields and other land cover types were interpreted using high-resolution satellite images from Google Earth in 22 provinces for 2002, 2005, 2007, 2008, 2011, 2012, 2013, and 2019. In total, 54,281 samples with a spatial resolution of 30 × 30 m were obtained through field surveys and visual interpretation of Google Earth. Among them, 22,070 samples were maize, and 32,211 samples were other crops, forests, shrubs, water, buildings, and sheds.

In addition, this study evaluated the accuracy of the identified maize distribution using county-level and provincial-level statistical data obtained from the National Bureau of Statistics (NBS) of China (http://data.stats.gov.cn/). Accuracy assessment was conducted using county-level statistical data from 17 provinces from 2001 to 2020.

Time-weighted dynamic time warping

In this study, the TWDTW method proposed by Maus et al.44 and Dong et al.45 was used to identify the spatial distribution of maize in China, and the classification was run separately for each year. The TWDTW method is an improved version of the Dynamic Time Warping (DTW) algorithm, which measures the similarity between two non-linear time series curves by calculating a distance between them. Let the time series X = {x1, x2, …, xn} be the curve of an unknown pixel and the time series Y = {y1, y2, …, ym} be the standard curve of a known maize pixel, with lengths n and m, respectively. The DTW algorithm measures the similarity between the two time series using the Euclidean distance and can flexibly warp and stretch X to align it with Y. Let dbase(i, j) denote the distance matrix obtained by calculating the Euclidean distance between every pair of points in X and Y. It is calculated as follows46:

$$\begin{array}{c}{d}_{base\left(i,j\right)}=\left|{x}_{i}-{y}_{j}\right|\end{array}$$
(1)

where xi ∈ X, i = 1, 2, …, n, and yj ∈ Y, j = 1, 2, …, m. Each matrix element dbase(i, j) represents the alignment distance between xi and yj.

On the basis of the distance matrix dbase(i, j), the cumulative distance matrix di, j is obtained recursively by adding to each element the minimum of its three neighboring cumulative distances:

$$\begin{array}{c}{d}_{i,j}={d}_{base\left(i,j\right)}+min\left\{{d}_{i-1,j},{d}_{i-1,j-1},{d}_{i,j-1}\right\}\end{array}$$
(2)
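The recursion in Eqs. 1 and 2 can be illustrated with a few lines of Python. This is a minimal sketch rather than the implementation used in this study: the function name `dtw_distance` is hypothetical, the two curves are plain lists of NDVI values, and the border of the cumulative matrix is padded with infinities so that the alignment must start at the first point of both curves and end at the last.

```python
def dtw_distance(x, y):
    """Cumulative DTW distance between two sequences (sketch of Eqs. 1-2)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] is the cumulative distance for aligning x[:i] with y[:j].
    # The extra first row and column of infinities handle the borders.
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            base = abs(x[i - 1] - y[j - 1])      # Eq. 1: pointwise distance
            d[i][j] = base + min(d[i - 1][j],    # Eq. 2: cheapest predecessor
                                 d[i - 1][j - 1],
                                 d[i][j - 1])
    return d[n][m]


# A curve and a time-shifted copy of it align perfectly under plain DTW.
shifted = dtw_distance([0, 0, 1, 2, 1, 0], [0, 1, 2, 1, 0, 0])
```

Because plain DTW assigns no cost to the shift itself, two curves with the same shape but very different timing are treated as identical, which is the behavior that the time weights in TWDTW are designed to penalize.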

The DTW algorithm calculates the distance between two sequences by finding a warping path with the smallest stretching or compressing distance in the cumulative distance matrix, and the points on the path are the points where the two sequences are warped and aligned. The final distance is used to characterize the similarity between the time series X and Y. TWDTW improves DTW by adding time weights, which prevent the curves from being over-stretched or over-compressed during time matching and thereby preserve the seasonal timing of crops. The calculation of the distance matrix dbase(i, j) in TWDTW becomes:

$$\begin{array}{c}{d}_{base\left(i,j\right)}={\omega }_{i,j}+\left|{x}_{i}-{y}_{j}\right|\end{array}$$
(3)
$$\begin{array}{c}{\omega }_{i,j}=\frac{1}{1+{e}^{-\alpha \left(g\left({t}_{i},{t}_{j}\right)-\beta \right)}}\end{array}$$
(4)

This study calculated the time weights ωi,j using a logistic model, where g(ti, tj) is the elapsed time in days between the observation dates ti and tj, α is the steepness, and β is the midpoint. Following the parameters suggested by Belgiu and Csillik (2018), α and β were set to 0.1 and 50, respectively, which gives lower penalties for time warps shorter than 50 days and higher penalties for time warps longer than 50 days.
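The effect of the logistic time weight in Eqs. 3 and 4 can be sketched as follows. The function names are hypothetical, and two simplifications are assumed: g(ti, tj) is taken as the absolute difference between observation dates expressed as day of year, and both curves are aligned end to end, which may differ from the matching strategy of the original TWDTW implementation.

```python
import math


def time_weight(elapsed_days, alpha=0.1, beta=50.0):
    """Logistic time weight of Eq. 4 (steepness alpha, midpoint beta)."""
    return 1.0 / (1.0 + math.exp(-alpha * (elapsed_days - beta)))


def twdtw_distance(x, tx, y, ty, alpha=0.1, beta=50.0):
    """Time-weighted DTW distance (sketch of Eqs. 2-4).

    x, y   : NDVI values of the pixel curve and the standard curve
    tx, ty : observation dates in days (assumed: day of year)
    """
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            elapsed = abs(tx[i - 1] - ty[j - 1])          # assumed form of g
            base = time_weight(elapsed, alpha, beta) + abs(x[i - 1] - y[j - 1])
            d[i][j] = base + min(d[i - 1][j], d[i - 1][j - 1], d[i][j - 1])
    return d[n][m]


ndvi = [0.2, 0.8, 0.2]
same_timing = twdtw_distance(ndvi, [100, 150, 200], ndvi, [100, 150, 200])
late_timing = twdtw_distance(ndvi, [100, 150, 200], ndvi, [180, 230, 280])
```

With these settings, a warp of exactly 50 days receives a weight of 0.5, and the same curve observed 80 days later yields a larger distance than the curve observed on the reference dates, even though the NDVI values are identical.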

First, this study created a potential maize distribution map based on the NDVI time series, retaining only pixels with NDVI greater than 0.3 at any time during the maize growing period, to reduce the number of pixels to be classified. Then, fifty field samples of maize were randomly selected in each province and their NDVI time series were averaged to obtain the standard seasonal NDVI curves for spring and summer maize (Fig. 3). For each pixel, the similarity between the standard NDVI seasonal curve and the pixel’s seasonal curve was calculated, with higher similarity indicating a higher probability that the pixel is maize. For provinces that plant both spring and summer maize, we calculated the similarity between the pixel’s seasonal curve and each of the two standard curves separately and took the higher of the two as the similarity for the pixel. Finally, we selected the n pixels with the highest similarity to the standard curve as the identified maize pixels, with n chosen so that the total area of the selected pixels equals the statistical planting area of maize in the given province. The similarity threshold for each province was thus determined by the provincial statistical area of maize.
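The final selection step can be sketched as a ranking problem. The snippet below is an illustration under stated assumptions, not the processing chain of this study: `select_maize_pixels` is a hypothetical name, similarity is represented by the TWDTW distance (smaller distance means higher similarity), each pixel is assumed to cover 30 × 30 m = 0.09 hectares, and n is obtained by rounding the provincial statistical area divided by the pixel area.

```python
def select_maize_pixels(distances, stat_area_ha, pixel_area_ha=0.09):
    """Pick the n most maize-like candidate pixels for one province.

    distances     : TWDTW distance of each candidate pixel to the standard curve
    stat_area_ha  : provincial statistical maize area in hectares
    pixel_area_ha : area of one pixel (assumed 0.09 ha for a 30 m pixel)

    Returns the sorted indices of the selected pixels and the largest
    distance among them, which acts as the province's threshold.
    """
    n = min(len(distances), round(stat_area_ha / pixel_area_ha))
    # Rank candidates from most similar (smallest distance) to least similar.
    order = sorted(range(len(distances)), key=lambda k: distances[k])
    chosen = order[:n]
    threshold = distances[chosen[-1]] if chosen else None
    return sorted(chosen), threshold


# Hypothetical distances for five candidate pixels and an area of 0.27 ha.
pixels, threshold = select_maize_pixels([0.9, 0.1, 0.5, 0.3, 0.7], 0.27)
```

Tying n to the statistical area means the mapped provincial total matches the statistics by construction, so the county-level comparison, rather than the provincial total, is the informative check on spatial allocation.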

Fig. 3

Standard seasonal curves of maize in 22 provinces in 2019, including (a) summer maize and (b) spring maize.

Statistical analysis

In this study, the identification accuracy of maize was evaluated based on the field surveys conducted in 2019. Fifty field samples were randomly selected in each province to determine the standard seasonal curve of maize. The remaining samples were used to calculate three accuracy indicators: Producer’s Accuracy (PA), User’s Accuracy (UA), and Overall Accuracy (OA). PA is the percentage of surveyed maize samples that are correctly identified as maize; UA is the percentage of samples identified as maize that are confirmed as maize by the field survey; OA is the percentage of all samples that are correctly identified. The three accuracy metrics are calculated as:

$$\begin{array}{c}{\rm{PA}}=\frac{TP}{TP+FN}\times 100 \% \end{array}$$
(5)
$$\begin{array}{c}{\rm{UA}}=\frac{TP}{TP+FP}\times 100 \% \end{array}$$
(6)
$$\begin{array}{c}{\rm{OA}}=\frac{TP+TN}{TP+TN+FP+FN}\times 100 \% \end{array}$$
(7)

where TP is the number of correctly classified maize samples, TN is the number of correctly classified non-maize samples, FP is the number of non-maize samples classified as maize, and FN is the number of maize samples classified as non-maize.
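As a small worked example, the three metrics can be computed from the four confusion-matrix counts. The sketch follows the verbal definitions given above: PA is normalized by all surveyed maize samples (TP + FN) and UA by all samples identified as maize (TP + FP). The function name and the counts are hypothetical and are not taken from Tables 1 or 2.

```python
def accuracy_metrics(tp, tn, fp, fn):
    """Return (PA, UA, OA) in percent from confusion-matrix counts.

    tp: maize classified as maize        tn: non-maize classified as non-maize
    fp: non-maize classified as maize    fn: maize classified as non-maize
    """
    pa = 100.0 * tp / (tp + fn)                   # share of true maize that was found
    ua = 100.0 * tp / (tp + fp)                   # share of mapped maize that is real
    oa = 100.0 * (tp + tn) / (tp + tn + fp + fn)  # share of all samples that are correct
    return pa, ua, oa


# Hypothetical counts for illustration only.
pa, ua, oa = accuracy_metrics(tp=80, tn=90, fp=10, fn=20)
```

A low PA indicates omission of maize fields, whereas a low UA indicates that other land cover types were committed to the maize class.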

In addition, this study compared the identified maize planting area with the county-level statistical area. The coefficient of determination (R2), the slope of the regression line between the identified area and the statistical area, and the relative mean absolute error (RMAE) were calculated for the 17 provinces with county-level statistical data. The calculation formulas of R2 and RMAE are as follows:

$${R}^{2}=1-\frac{{\sum }_{i=1}^{n}{\left(I{A}_{i}-S{A}_{i}\right)}^{2}}{{\sum }_{i=1}^{n}{\left(\overline{SA}-S{A}_{i}\right)}^{2}}$$
(8)
$$RMAE=\frac{{\sum }_{i=1}^{n}\left|S{A}_{i}-I{A}_{i}\right|}{{\sum }_{i=1}^{n}S{A}_{i}}$$
(9)

where SAi and IAi are the statistical area and identified area of the ith county, and n represents the number of counties in a given province.
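Eqs. 8 and 9 translate directly into code. The following is a minimal sketch with a hypothetical function name and made-up county areas. Note that R2 as defined in Eq. 8 measures agreement with the 1:1 line rather than with a fitted regression line, so it can become negative when the identified areas deviate strongly from the statistics.

```python
def r2_and_rmae(stat_area, ident_area):
    """County-level R2 (Eq. 8) and RMAE (Eq. 9) for one province.

    stat_area  : statistical maize area SA_i of each county
    ident_area : identified maize area IA_i of each county (same order)
    """
    n = len(stat_area)
    mean_sa = sum(stat_area) / n
    ss_res = sum((ia - sa) ** 2 for sa, ia in zip(stat_area, ident_area))
    ss_tot = sum((mean_sa - sa) ** 2 for sa in stat_area)
    r2 = 1.0 - ss_res / ss_tot
    rmae = sum(abs(sa - ia) for sa, ia in zip(stat_area, ident_area)) / sum(stat_area)
    return r2, rmae


# Hypothetical areas (in thousand hectares) for three counties.
r2, rmae = r2_and_rmae([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```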

Data Records

The 30 m CCD-Maize dataset from 2001 to 2020 is available at https://doi.org/10.57760/sciencedb.08490 (ref. 47). The dataset is provided in GeoTIFF format, with pixel values of 1 for maize and 0 for non-maize. A total of 440 GeoTIFF files are stored under 20 folders, and each folder represents the maize maps of 22 provinces in a specific year from 2001 to 2020.

Technical Validation

Accuracy assessment

Field survey assessment

We quantitatively evaluated the accuracy of maize distribution maps based on field survey samples from 2002, 2005, 2007, 2008, 2011, 2012, 2013, and 2019. On average, the OA of maize identification in 22 provinces was 80.06%, with UA and PA of 77.32% and 80.98%, respectively (Tables 1, 2). Jilin had the highest OA of 94.04%, while Chongqing had the lowest at 64.81% (Tables 1, 2). There was a large variation in UA and PA across provinces, with lower accuracy found in several provinces in Southern China, such as Hunan, Hubei, Jiangsu, Guangxi, Chongqing, and Guizhou (Table 2). In some provinces, such as Henan Province, the accuracy varies greatly between early years and recent years, with the OA of 82.72% in 2019 and only 53.06% in 2013 (Table S1). Figure 4a1,b1,c1 show large maize fields captured by UAV in August 2018 at three sites in Ningxia, Shaanxi, and Inner Mongolia, with most fields showing a dark green color indicating maize, and a small area of other crops (such as rice, displaying light green). As shown in Fig. 4, the maize distribution map produced in this study accurately identified the location of maize and distinguished buildings and wide roads. However, there were some misclassifications, with some light green rice fields being misclassified as maize (Fig. 4a2).

Table 1 Confusion matrices of the distribution map of maize in northern provinces.
Table 2 Confusion matrices of the distribution map of maize in southern provinces.
Fig. 4

Classification maps at UAV sites of Ningxia (a1,a2), Shaanxi (b1,b2), and Inner Mongolia (c1,c2). All UAV pictures were taken in August 2018. Red indicates the identified maize pixel. The coordinates of the center points of a1 and a2 are (105°48′36″ E, 37°32′24″ N); the coordinates of the center point of b1 and b2 are (109°44′04″ E, 34°33′0″ N); the coordinates of the center point of c1 and c2 are (110°27′0″ E, 40°33′13″ N).

Statistical data assessment

The accuracy of the maize maps was further validated based on county-level statistical data in this study. A total of 17 provinces with county-level statistical data were used for validation, with the total number of counties ranging from 1131 to 1614. The results showed that the maize distribution maps produced in this study could effectively reproduce the spatial variability of the maize planting area (Fig. 5). From 2001 to 2020, the identified areas and the statistical areas of maize showed a strong correlation, with the scatters concentrated near the 1:1 line. The slope of the regression line between the maize classified area and the statistical area ranged from 0.881 to 0.974, the RMAE ranged from 0.279 to 0.497, and the R2 was above 0.65, with a maximum of 0.903 in 2018 (Fig. 5).

Fig. 5

County-level comparison of identified and statistical planting areas of all provinces. (a–t) show 2001–2020. The red dashed lines indicate the 1:1 line, and the red solid lines indicate the regression lines.

Moreover, this study analyzed the consistency between the identified maize area and the statistical area for each province. The results showed that the average R2 of all provinces from 2001 to 2020 ranged from 0.477 to 0.868, the slope of the regression line ranged from 0.712 to 1.001, and the RMAE ranged from 0.264 to 0.644 (Fig. 6). As shown in Fig. 7a,b, large identification errors were found in the middle and lower reaches of the Yangtze River Plain (i.e., Jiangsu, Hubei) and mountainous regions (i.e., Gansu, Ningxia, Yunnan, Sichuan), with R2 below 0.6. The northeast region (i.e., Liaoning, Jilin) showed the best identification accuracy, with Liaoning achieving the highest R2 of 0.868, followed by Jilin with R2 of 0.813 (Figs. 6, 7a). The identification accuracy of major maize-producing provinces such as Jilin, Liaoning, and Inner Mongolia was higher than that of other provinces from 2001 to 2020. The identification accuracy of provinces such as Jiangsu, Shandong, Henan, and Gansu fluctuated greatly between 2001 and 2020. For example, the R2 of Jiangsu was only 0.203 in 2003 but reached 0.739 in 2019 (Fig. 6). In terms of interannual variation of the average identification accuracy, both R2 and RMAE showed significant temporal trends, with R2 gradually increasing and RMAE gradually decreasing from 2001 to 2020 (Fig. 7c). In 2018, RMAE reached a minimum of 0.350, and R2 was 0.750. Overall, the maize distribution maps produced in this study accurately reproduced the spatial variability of maize planting area and also achieved good performance at the annual scale.

Fig. 6

County-level comparison of identified and statistical planting areas in each province. (a) R2, (b) RMAE, and (c) Slope of county-level identified maize areas compared to agricultural statistical data in 17 provinces for 2001–2020.

Fig. 7

Spatial comparison of (a) R2 and (b) RMAE of identified and statistical planting areas, and (c) trend change in average R2 and RMAE from 2001 to 2020 over the 17 provinces with the county-level statistical area.

Spatiotemporal patterns of maize

Based on the maize distribution maps generated by this study, we first analyzed the ratio of pixels with continuous maize planting in China from 2001 to 2020. As shown in Fig. 8, the maize planting in China generally showed a high frequency of continuous planting, with 50.74% of pixels having continuously planted maize for over 10 years and 19.80% for less than 5 years. The frequency of continuous maize plantings varied largely across different regions. The Northeast and Northern China, two major maize production regions, had relatively high frequencies of continuous maize planting, with over 67.13% of pixels planted for more than 10 years (Fig. 8). In contrast, Hunan had the lowest frequency of continuous maize planting, with 56.12% of pixels planted for less than 5 years (Fig. 8).
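A planting-frequency layer of this kind can be derived by summing the annual binary maps pixel by pixel. The sketch below rests on two assumptions that are not stated in the text: planting frequency is taken as the number of years in which a pixel is mapped as maize, and percentages are computed relative to pixels mapped as maize in at least one year. Function names and the toy maps are hypothetical.

```python
def planting_frequency(annual_maps):
    """Number of maize years per pixel.

    annual_maps : list of flattened 0/1 maps, one per year, all the same length
    """
    return [sum(pixel_years) for pixel_years in zip(*annual_maps)]


def share_of_planted(freq, condition):
    """Percentage of ever-planted pixels whose frequency meets a condition."""
    planted = [f for f in freq if f > 0]
    return 100.0 * sum(1 for f in planted if condition(f)) / len(planted)


# Toy example: three years, four pixels.
maps = [[1, 0, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 0, 0]]
freq = planting_frequency(maps)
pct_two_or_more = share_of_planted(freq, lambda f: f >= 2)
```

For the 20-year dataset, the same functions would be applied with conditions such as `lambda f: f > 10` or `lambda f: f < 5`.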

Fig. 8

Distribution of maize planting frequency in China from 2001 to 2020. Panels 1–5 on the right and bottom are the zoomed-in maps, indicating the local details of different provinces and regions, including Northeast, North China, South China, Southwest, and Xinjiang.

In addition, we analyzed the distribution of different-sized patches based on the generated maize distribution maps in China (Fig. 9). As shown in Fig. 9, the proportion of large patches (defined in this study as patches with over 1,000 pixels, an area of about 90 hectares or larger) in major maize-producing provinces such as Hebei, Henan, Jilin, Liaoning, and Shandong was very high, reaching over 50%. However, in most southern provinces, such as Guangxi, Guizhou, Hubei, Hunan, Sichuan, Yunnan, and Chongqing, the proportion of large patches was relatively low, at only 13.88%. The proportion of patches of different sizes in each province depends largely on the degree of cropland fragmentation in China. In this study, we used the proportion of small patches (defined in this study as patches with 10 or fewer pixels, an area of about 0.9 hectares or less) as an indicator to measure the fragmentation of maize distribution maps. From 2001 to 2020, the most severely fragmented areas were generally in South and Southwest China, while the degree of fragmentation in Northeast China was much smaller (Fig. 9).
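Patch sizes can be obtained from a binary maize map by connected-component labeling. The sketch below is illustrative only and makes two assumptions that the text does not specify: pixels belong to the same patch when they share an edge (4-connectivity), and the proportion of small patches is counted by number of patches rather than by area. At 30 m resolution each pixel covers 0.09 hectares, so 10 pixels correspond to 0.9 hectares and 1,000 pixels to 90 hectares. Function names are hypothetical.

```python
from collections import deque


def patch_sizes(grid):
    """Sizes (in pixels) of all 4-connected maize patches in a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one patch.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes


def small_patch_share(sizes, max_pixels=10):
    """Percentage of patches with max_pixels or fewer pixels."""
    return 100.0 * sum(1 for s in sizes if s <= max_pixels) / len(sizes)


toy_map = [[1, 1, 0, 0],
           [1, 0, 0, 1],
           [0, 0, 1, 1]]
toy_sizes = patch_sizes(toy_map)
```

For province-scale rasters, a compiled labeling routine from an image-processing library would be far more practical than this pure-Python loop; the logic is the same.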

Fig. 9

Proportion of patches with different numbers of pixels (0–10^8) on the maize map of each province. (a–t) indicate 2001–2020.

The fragmentation of maize maps can be further classified into three classes (Fig. 10a). Class I includes provinces with a small proportion of small patches (less than 15%), including Heilongjiang, Inner Mongolia, Jilin, Liaoning, Henan, Hebei, Shanxi, and Shandong, which are major maize planting provinces. Class II has a proportion of small patches between 15% and 30%, including Yunnan, Guizhou, Sichuan, Jiangsu, Hubei, and Gansu. Class III has a proportion of small patches greater than 30%, including Chongqing, Guangxi, and Hunan. Among them, Hunan has the highest proportion of small patches, reaching 67.52%, indicating that maize planting in this province is highly fragmented. Provinces with higher proportions of small patches (i.e., classes II and III) are mostly mountainous areas, where maize planting is more scattered. It should be noted that although Jiangsu is adjacent to Anhui, there is a significant difference in the proportions of small patches. This is because Jiangsu promotes rice planting to the north of the Huaihe River, where rice is widely planted, while maize planting is relatively less concentrated and more scattered, leading to a higher proportion of small patches (i.e., 24.33%). In contrast, in Anhui, north of the Huaihe River, maize is the main crop, while south of the Huaihe River, rice is the main crop, resulting in more concentrated maize planting and a lower proportion of small patches (i.e., 12.80%) (Fig. 10a). Additionally, the average proportion of small patches has gradually decreased from 2001 to 2020 (Fig. 10b), indicating a gradual reduction of cropland fragmentation in China.
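The three-class scheme amounts to a simple threshold rule on the proportion of small patches. In the sketch below the function name is hypothetical, and the handling of values exactly equal to 15% or 30% is an assumption, since the text only gives the open ranges.

```python
def fragmentation_class(small_patch_pct):
    """Map the proportion of small patches (percent) to Class I, II, or III."""
    if small_patch_pct < 15.0:
        return "I"
    if small_patch_pct <= 30.0:   # boundary handling is an assumption
        return "II"
    return "III"


# Proportions reported in the text for three provinces.
classes = {
    "Anhui": fragmentation_class(12.80),
    "Jiangsu": fragmentation_class(24.33),
    "Hunan": fragmentation_class(67.52),
}
```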

Fig. 10

Statistics of the proportion of patches with 10 or fewer pixels (small patches). (a) spatial distribution of the proportion of small patches, and (b) the distribution of the average proportion of small patches in all provinces from 2001 to 2020.

Comparison with other studies

We compared the dataset produced in this study with an existing product, a 30-m maize identification product for China from 2017 to 2020, produced by Shen et al.22, which was based on the NDVI composited from Landsat 7, Landsat 8 and Sentinel-2 images. Taking 2020 as an example, we selected study areas in Henan and Shandong (see the green boxes in Fig. 11a) and compared the two products. As shown in Fig. 11b1,c1, the maize identification map of Shen et al.22 showed clear striping problems. This is due to the large number of missing values in the Landsat and Sentinel-2 data for these areas, which cannot be fully restored by common gap-filling interpolation methods48, thus affecting the identification accuracy45. In contrast, the maize distribution map produced in this study, which was based on the NDVI fused dataset, largely overcame this problem, and the identified maize fields are more complete (Fig. 11b2,c2).

Fig. 11

Comparison of the maize map of this study and Shen et al.22. (a) Maize planting area mapping based on fused products of 2020 in China; (b1,c1) partial zoomed-in maps of the identification results of Shen et al.22; (b2,c2) partial zoomed-in maps of the identification results based on the fused dataset.

Based on field survey data of 2019, the average OA of this study is slightly higher than that of Shen et al.22. Specifically, the OA in Anhui and Jiangsu has significantly improved compared to Shen et al.22, while that in Gansu and Yunnan provinces has decreased (Fig. 12). In addition, this study calculated the average R2 at the county level from 2017 to 2020, and the results show that the average R2 based on the fused product was 0.877, which is higher than that of Shen et al.22 (0.822). Overall, the maize distribution maps produced in this study have achieved higher identification accuracy than the product of Shen et al.22.

Fig. 12

Comparison of the overall accuracy of the maize planting distribution map in 2019 of this study and Shen et al.22.

Additionally, we used independent sample sets from You et al.33 to verify the identification accuracy of our maize maps. On average, the OA of maize identification in 3 provinces was 85.85%, with UA and PA of 80.95% and 82.80%, respectively (Table 3). Liaoning had the highest OA of 91.50%, while Heilongjiang had the lowest OA of 78.52% (Table 3). In addition, we compared our maize distribution map with that of You et al.33, and the overlap percentages in Heilongjiang, Jilin, and Liaoning were 72.37%, 77.56%, and 78.08%, respectively. Figure 13 shows the overlap of the two products in Heilongjiang in 2019. Our product overestimated maize in high-latitude areas, whereas the two products were highly consistent in lower-latitude areas.

Table 3 Confusion matrices of the distribution map of maize in Northeast provinces using independent sample sets from You et al.33.
Fig. 13

Comparison of the maize map in Heilongjiang for 2019 of this study and You et al.33. (a) Maize planting area mapping based on this study and You et al.33; (b,c) partial zoomed-in maps of this study and You et al.33.

Limitations and prospects

Although the long-term annual maize distribution map produced in this study performed well in spatial and interannual variability, there are still some uncertainties. In this study, the small patch proportion was used as an indicator of cropland fragmentation to analyze the relationship between cropland fragmentation and maize identification accuracy. As shown in Fig. 14a, there was a negative correlation between maize identification accuracy and the proportion of small patches, with identification accuracy decreasing as the proportion of small patches increased. When the proportion of small patches was higher (greater than 15%, i.e., Class II), the identification accuracy and the proportion of small patches showed a stronger correlation: the R2 of Class II is 0.460, about twice that of Class I (0.212). In addition, the interannual variation of small patch proportion and identification accuracy shows a significant negative correlation, with an R2 of 0.806 and a slope of −0.037 (Fig. 14b). In terms of spatial distribution, they also show a certain negative correlation, with an R2 of 0.305 and a slope of −0.007 (Fig. 14c). Vintrou et al.49 also found that the identification accuracy was linearly correlated with the average patch size calculated on the crop maps (R2 = 0.8), and the identification accuracy continued to improve as the average patch size increased. Numerous studies have consistently shown that a higher degree of cropland fragmentation leads to increased uncertainty in the accuracy of crop mapping50,51. Overall, our results indicated that the maize identification accuracy decreased as the degree of cropland fragmentation increased. Therefore, mapping maize in the other provinces and municipalities, where maize areas are limited and fragmented, would be of little value.

Fig. 14

Comparison between the proportion of small patches and identification accuracy. (a) relationship between the proportion of small patches and the identification accuracy for Class I and Class II, (b) interannual relationship between the proportion of small patches and the identification accuracy, and (c) spatial comparison of the proportion of small patches and the identification accuracy.

The identification accuracy is also affected by the quality of remote sensing data. A recent study found that the number of cloud-free satellite images largely determines how well the seasonal changes of vegetation indices can be recovered, which in turn affects the accuracy of crop classification45. Although the long-term, high spatiotemporal resolution NDVI fused dataset used in this study has effectively recovered most of the missing data in the original Landsat data, the data recovery effect is still limited in some areas with severe cloud cover52. We calculated the proportion of Landsat missing values filled in the fused dataset in the study area from 2001 to 2020. As shown in Fig. 15, the filling percentage in Southern China was relatively high. Due to the influence of the East Asian summer monsoon in this region, there may be long periods of cloudy and rainy weather during the maize growing season, making it difficult to obtain cloud-free satellite images53. Taking Yunnan as an example, the average filling percentage from 2001 to 2020 was 75.26% (Fig. 15). When the filling percentage of the fused dataset is too high, it can cause some spectral distortion and blur52, which is an important factor contributing to the low identification accuracy in these provinces.

Fig. 15

Averaged filling percentage of fused data in the study area from 2001 to 2020.

The uncertainty of CCD-Maize is also highly related to the identification method used in this study, which is a phenology-based identification method. Maize has a very similar phenological cycle to many other summer crops, such as soybean, peanut, and rice, which greatly increases the difficulty of identifying maize. In most southern provinces, rice is the dominant summer crop and maize identification accuracy tends to be lower than in northern provinces (Fig. 7a,b). A previous study also found that maize had the lowest identification accuracy compared to rice and winter wheat distribution maps identified using the 250-m MODIS data27. Using only one index (such as NDVI) is not enough to fully distinguish maize from other summer crops22. To improve identification accuracy, further research is needed to explore the differences in vegetation indexes or surface albedo between maize and other summer crops. The red-edge bands from Sentinel-2 data are considered to be very useful for vegetation monitoring and are widely used for crop classification54,55,56. Currently, multi-source data fusion mainly focuses on Landsat and MODIS data. In the future, using additional satellite data for spatiotemporal fusion, such as Sentinel-2, may improve fusion accuracy, leading to higher precision maize distribution maps.