Abstract
Strengthening industrial pollution control along the Yangtze River is a fundamental national policy of China, yet detailed spatial distribution data on chemical industrial parks (CIPs) are lacking. This study used random forest (RF) classification and active learning to generate a distribution map of CIPs along the Yangtze River at 10-m resolution. Spectral and texture features were extracted from Sentinel-2 imagery and combined with Points of Interest (POI) data to construct a multidimensional feature space. CIPs were then classified on Google Earth Engine (GEE) using partitioned training. Technical validation along the entire Yangtze River demonstrates an overall accuracy of approximately 80%. Compared with traditional manual surveys, this approach saves considerable time and cost and yields more timely results. As the first publicly available CIPs map within a 5-km range along the Yangtze River, this dataset provides a scientific basis for the fine-grained governance of chemical industries in the region and offers a model for the accurate identification of chemical industry sites.
Background & Summary
Since the 1990s, the global chemical industry has undergone extensive restructuring and relocation1. Since 2011, China has been the world’s largest chemical industry market, with sales reaching around $2.04 trillion in 2021, and it has contributed half of the global chemical market growth over the past two decades2. Simultaneously, various factors have prompted chemical enterprises to cluster together3,4, forming chemical industrial parks (CIPs) with well-established infrastructure, strict management systems, and clear geographical boundaries5. However, CIPs are also high-risk areas, often characterized by large-scale operations and high-density storage of chemicals, making them prone to hazards and major accidents6,7. Fires, explosions, or chemical leaks in these areas can trigger catastrophic domino effects8,9,10,11, resulting in significant property damage, casualties, environmental pollution, and ethical concerns12,13. For instance, in November 2005, an accidental explosion at the Jilin Petrochemical Group in China led to nitrobenzene contamination in the Songhua River, causing environmental pollution throughout the basin and triggering an international dispute between China and Russia14. In March 2019, a major explosion occurred in the Ecological CIP in Xiangshui County, Jiangsu Province, resulting in 78 deaths, 76 severe injuries, 640 hospitalizations, and direct economic losses of 1.986 billion yuan15. Consequently, the public is highly sensitive to the risks posed by CIPs.
Furthermore, factors such as water availability, convenient transportation, and favourable conditions for discharging emissions have led chemical enterprises to establish factories along rivers and coastlines16. As China’s main inland waterway, the Yangtze River offers lower shipping costs and convenient transport of raw materials and products, making its vicinity a major hub for the chemical industry. Statistics indicate that there are over 400,000 chemical enterprises in the entire Yangtze River Basin, with chemical output accounting for approximately 46% of China’s total17. These enterprises are primarily concentrated in regions such as Jiangsu, Zhejiang, Shanghai, Hubei, Sichuan, and Chongqing, where high population density and numerous sources of risk create a challenging environmental situation. In recent years, as China has intensified its efforts to promote green development, protection of the Yangtze River has reached unprecedented levels. Governments at all levels along the Yangtze River are actively pursuing the transition of CIPs, while protection of the shoreline is increasingly prioritized18. The Yangtze River Protection Law of the People’s Republic of China, promulgated in 2020, aims to strengthen ecological protection and restoration in the Yangtze River Basin. Effective from March 1, 2021, the law explicitly prohibits the construction or expansion of CIPs within a 1-km range along the Yangtze River (https://www.mee.gov.cn/ywgz/fgbz/fl/202012/t20201227_814985.shtml).
However, in reviewing the released data on the distribution of CIPs in China, we were able to obtain only official statistical information, which lacks the spatial detail needed for finer insights. Additionally, large-scale field surveys in the region are costly and time-consuming, making it difficult to capture the distribution of CIPs accurately and in a timely manner. Recognizing the importance of the CIPs distribution along the Yangtze River for protection efforts, we provide a publicly available dataset that maps CIPs along the river19. This dataset serves as a scientific basis for future refined governance of chemical industries along the Yangtze River. It also offers guidance for addressing chemical industry pollution of rivers, not only along the Yangtze River but also along other rivers worldwide.
At present, the use of Deep Learning (DL) for classifying land features in high-resolution remote sensing imagery has become increasingly popular20,21. However, training often requires a large number of labeled samples; otherwise, the model may easily overfit the limited training samples and perform poorly on new, unseen data22. Currently, there is a lack of sample datasets covering the general features of CIPs. Most studies rely on labeled samples of storage tanks for training, which cannot be applied to the entire park area in practice23. Furthermore, research tends to focus on smaller areas such as urban centers, which limits large-scale, timely, and accurate mapping. In contrast, Machine Learning (ML) applications in remote sensing are more mature and classify fine details of land features with increasing clarity24,25. In this context, applying the RF method on GEE to Sentinel-2 imagery offers a viable solution for classifying CIPs. By embedding training samples region by region and refining the classifier iteratively through active learning, it is possible to predict CIPs within the study area. This approach enables timely and accurate recognition of CIPs in remote sensing imagery on a large scale26.
The CIPs map19 provided by this study, covering a 5-km range along the Yangtze River with a resolution of 10 m in 2021, fills a gap in the comprehensive detection of CIPs along the Yangtze River. By openly sharing this dataset, it can reveal the spatial patterns, density ranges, and proximity to urban areas, water bodies, and environmentally sensitive areas of CIPs along the river. This dataset will support the formulation and implementation of relevant policies for the protection of the Yangtze River.
Methods
Study area
The Yangtze River (Fig. 1), the world’s third-longest river at 6363 km, has an annual average runoff of approximately 9600 billion m³. As one of the most densely populated and industrialized regions globally, the Yangtze River Basin has developed into a world-class economic belt27. Encompassing 11 provinces and municipalities, the basin covers an area of around 2.0523 million km², accounting for 21.4% of China’s total land area. In 2022, the region achieved a total GDP of 55.98 trillion yuan, 46.5% of China’s total, with a population of 609 million, 43.1% of the national population. Compared with other regions in China, urbanization in the Yangtze River Economic Belt has also increased rapidly, with the urban population proportion rising from 49.25% in 2010 to 62.96% in 2021.
The Yangtze River plays a crucial role as a lifeline for human existence, a catalyst for economic growth, and a protector of China’s natural heritage. Therefore, this study focuses on the core area within a 5-kilometer range along the main stream of the Yangtze River, identifying the locations of chemical industrial parks.
Main input data
Sentinel-2 imagery data
Sentinel-2 provides global multispectral remote sensing data at a resolution of 10 m. Its higher spatial resolution reduces mixed pixels compared with NASA’s Landsat data, making it a better choice for precise identification of CIPs28. When extracting Sentinel-2 imagery along the Yangtze River using Google Earth Engine (GEE), we selected the autumn window from October to December to avoid the heavier cloud, fog, and vegetation cover that characterize the Yangtze River climate in other seasons.
Feature extraction is crucial for classification based on remote sensing imagery. A series of spectral and texture features (Table S1) are computed from Sentinel-2’s multispectral data to construct a multidimensional feature space and enhance inter-class differences. Specifically, spectral features such as the Normalized Difference Vegetation Index (NDVI), Soil Adjusted Vegetation Index (SAVI), Modified Normalized Difference Water Index (MNDWI), and Normalized Difference Built-up Index (NDBI) are utilized. NDVI enhances the differentiation between vegetation and CIPs, while SAVI further corrects NDVI values influenced by soil brightness in areas with low vegetation cover. MNDWI, a widely used water index, enhances the separability between CIPs and water bodies. NDBI helps separate CIPs, whose dense hard surfaces produce higher index values, from ordinary residential built-up areas, which have fewer buildings and more green space.
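The four spectral indices can be illustrated with a minimal sketch. It uses the commonly cited formulas for these indices and assumes the usual Sentinel-2 band roles (green B3, red B4, near-infrared B8, shortwave-infrared B11); the exact definitions in Table S1 are not reproduced here, and the function names and reflectance values below are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else 0.0


def spectral_indices(green, red, nir, swir1, soil_factor=0.5):
    """Compute NDVI, SAVI, MNDWI and NDBI from reflectance values in the range 0-1."""
    ndvi = normalized_difference(nir, red)
    # SAVI adds a soil-brightness correction factor L (0.5 is a common default).
    savi = (nir - red) * (1 + soil_factor) / (nir + red + soil_factor)
    mndwi = normalized_difference(green, swir1)
    ndbi = normalized_difference(swir1, nir)
    return {"NDVI": ndvi, "SAVI": savi, "MNDWI": mndwi, "NDBI": ndbi}


# Hypothetical reflectances for a vegetated pixel and a paved industrial pixel.
vegetation = spectral_indices(green=0.08, red=0.05, nir=0.45, swir1=0.20)
industrial = spectral_indices(green=0.18, red=0.20, nir=0.25, swir1=0.30)
```

With these illustrative values, the vegetated pixel has a high NDVI and a negative NDBI, whereas the industrial pixel has a positive NDBI, which is the kind of contrast the feature space is meant to capture.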
Moreover, the texture features of CIPs’ factory facilities in remote sensing imagery differ significantly from those of other artificial land covers. In this study, texture features are computed using the widely used Gray-Level Co-occurrence Matrix (GLCM), including mean, variance, contrast, dissimilarity, entropy, correlation, inverse difference moment, and angular second moment. Generally, larger window sizes provide coarser texture information, and the contribution of texture features to classification accuracy depends on both texture scale and object scale. Considering the characteristic scale of storage, manufacturing, and transportation equipment within CIPs, a GLCM window size of 3 × 3, corresponding to an actual object area of 900 m², is selected. After extracting spectral and texture features, a multidimensional attribute feature space is constructed through concatenation, and classification is performed on this feature space based on RF and active learning strategies.
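To make the GLCM features concrete, the following sketch builds a co-occurrence matrix for a single 3 × 3 window and derives five of the listed measures. It is an illustration under stated assumptions, not the implementation used in this study: the horizontal pixel offset, the four gray levels, the symmetric counting, and all function names are choices made here for brevity.

```python
import math


def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalised gray-level co-occurrence matrix of a 2-D integer window."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    dr, dc = offset
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1  # the pair (i, j)
                counts[j][i] += 1  # and its mirror, so the matrix is symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Contrast, dissimilarity, entropy, ASM and IDM of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
        "asm": sum(v * v for _, _, v in cells),             # angular second moment
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),  # inverse difference moment
    }


# Two hypothetical 3 x 3 windows quantised to gray levels 0-3.
flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
striped = [[0, 3, 0], [0, 3, 0], [0, 3, 0]]
flat_features = texture_features(glcm(flat, levels=4))
striped_features = texture_features(glcm(striped, levels=4))
```

The uniform window yields zero contrast and an angular second moment of 1, while the striped window yields high contrast, which shows how regular structures such as rows of tanks could stand out from homogeneous surfaces.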
Chemical industrial park POI data
Open social data has been widely employed in remote sensing research, with POI, representing geospatial and classification information, playing a crucial role29,30. POI is typically sourced from online maps, geographic information platforms, and other sources, serving as coordinate points in high-resolution electronic maps linked to human activity information31. In this study, we utilized POI data related to “chemical industry” along the river obtained from the Baidu Big Data platform (https://map.baidu.com/) as the basis for identifying CIPs.
Chemical industrial parks in remote sensing imagery are often large, complex targets containing various features with strong semantic information, including chemical storage tanks, production plants, and water treatment facilities. The scale difference between large and small chemical industrial facilities is also apparent, with dense storage tanks being a typical feature in remote sensing imagery. It is therefore crucial to first use the chemical industrial park POI for manual annotation to delineate the boundaries of these parks. Combining this information with an RF classifier for partitioned recognition then enables the rapid detection of CIPs on a large scale.
China administrative boundary data
The boundaries of China’s administrative divisions can be downloaded from the National Geographical Information Catalog Service (www.webmap.cn). This data will be used to conduct statistical analysis of the distribution of CIPs within city and county units along the Yangtze River. Along the main stream of the Yangtze River, there are a total of 27 cities and 142 county-level units. Table 1 shows the main information and sources of all data used in this study.
Modelling
Random forest classifier
We chose the RF classifier to identify CIPs from the multidimensional feature space mainly because of its effectiveness in modelling nonlinear features32. RF has also been widely applied in remote sensing fields such as vegetation mapping, water body extraction, and urban facility recognition33,34,35.
RF is an ensemble classifier consisting of a large number of decision trees, where the final classification result is determined by a majority vote over all trees36. RF involves two random sampling processes. The first is bootstrapping, an inherent step in the RF method: the initial training samples are resampled with replacement to train each base classifier. The second is the random selection of the features used to train each decision tree. Together, these random processes make RF robust to noise and outliers. Compared with other classifiers such as support vector machines, RF is much simpler to parameterize. Only a few parameters need to be adjusted, including the number of decision trees, the maximum number of splits for each tree, and the number of features used to train each tree. The optimal values for these three parameters in this study were determined through grid search as 100, 10, and 5, respectively.
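The two random sampling processes and the majority vote can be sketched in a few lines. This is a deliberately simplified toy: one-split decision stumps stand in for full decision trees, and the data, the 25 trees, and the 2 features per tree are hypothetical values chosen for readability rather than the 100, 10, and 5 reported above.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Return the (feature, threshold, left label, right label) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]


def fit_forest(rows, labels, n_trees, n_features, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Random process 1: bootstrap, i.e. resample the training set with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random process 2: restrict this learner to a random subset of features.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all base learners."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical samples: features 0 and 1 separate the classes, feature 2 is noise.
rows = [
    (0.10, 0.20, 0), (0.15, 0.25, 1), (0.20, 0.10, 0), (0.25, 0.30, 1),
    (0.80, 0.70, 0), (0.85, 0.90, 1), (0.90, 0.75, 0), (0.95, 0.80, 1),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels, n_trees=25, n_features=2, seed=0)
```

An individual stump may be misled by an unlucky bootstrap sample or by the noise feature, but the vote across all learners averages such errors out, which is the source of the robustness described above.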
Training details
We used the GEE API to build the RF classifier. Because the study area extends along the entire Yangtze River, CIPs in different sections of the river differ visually owing to the surrounding environment and terrain. In particular, the upper reaches of the Yangtze are mountainous, with steep and rugged terrain and dense vegetation, whereas the downstream areas are predominantly plains surrounded by farmland. This produces large intra-class variation, making it difficult to train a single high-precision RF classifier for the entire Yangtze River. By modelling in regional divisions, we aimed to reduce intra-class differences and obtain more accurate local predictions, thereby improving overall classification performance. Considering the importance of spatial generalization of the model, we chose the administrative units of the 27 cities along the Yangtze River as the modelling units for regional division.
In terms of sample selection, we first obtained chemical industry POI along the Yangtze River in 2021 through web crawling, and then conducted visual annotation to strictly select training samples. To ensure the correctness and representativeness of the collected CIPs samples, we iteratively checked the remote sensing images. Another important aspect was the selection of non-CIPs samples, as it helps reduce false positives in the classification (i.e., predicting non-CIPs as CIPs). Considering that non-CIPs samples consist of multiple land cover types, we collected these negative samples from more refined categories such as forests, grasslands, farmlands, water bodies, and bare lands, as well as specific impervious-surface categories such as steel plants, cement plants, manufacturing plants, and conventional residential areas. After completing the labelling process, we gathered 3000 samples of CIPs and an equivalent number of samples from non-CIPs areas.
Active learning
During the mapping process, it is challenging to generate an accurately classified map in a single attempt. In such cases, we aim to further improve the CIPs classification performance through an iterative process based on active learning, using a coarse-to-fine strategy. In fact, active learning is a preferable choice for remote sensing classification when labelled samples are limited37. Firstly, active learning requires training the classifier with initial labelled data, then using the trained classifier to predict the partition dataset. The incorrectly predicted and unlabelled data are then returned to experts for further annotation. Finally, the classifier is retrained using both the initial and newly labelled data to avoid potential prediction errors. This process runs iteratively until satisfactory classification results are obtained.
In this study, we combined the active learning strategy with the RF classifier to refine the CIPs classification results. Misclassified pixels were carefully selected and re-labelled to retrain the RF model. We focused on two types of errors, missed CIPs pixels and false positive predictions, to increase the robustness of the classifier against these typical errors. This process iterates until the required accuracy of CIPs classification is met. By incorporating active learning, classification accuracy can be improved with only a small additional annotation cost, making it a good choice for large-scale high-precision CIPs mapping in this study. Figure 2 provides an overview of our workflow.
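The iterative loop described above can be sketched as follows. This is a toy under stated assumptions: a one-dimensional threshold classifier stands in for the RF model, the `oracle` function plays the role of the expert who re-labels misclassified samples, and the feature values, batch size, and target accuracy are all hypothetical.

```python
def train(samples):
    """Stand-in classifier: a threshold midway between the two labelled classes."""
    highest_negative = max(x for x, y in samples if y == 0)
    lowest_positive = min(x for x, y in samples if y == 1)
    return (highest_negative + lowest_positive) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


def active_learning(labelled, pool, oracle, target_accuracy=0.95, batch=2, max_rounds=10):
    """Retrain until the pool accuracy reaches the target or the round limit is hit."""
    labelled = list(labelled)
    history = []
    for _ in range(max_rounds):
        threshold = train(labelled)
        # The expert inspects the predictions and flags the misclassified samples.
        errors = [x for x in pool if predict(threshold, x) != oracle(x)]
        accuracy = 1 - len(errors) / len(pool)
        history.append(accuracy)
        if accuracy >= target_accuracy:
            break
        # A small batch of flagged samples is re-labelled and added to the training set.
        labelled.extend((x, oracle(x)) for x in errors[:batch])
    return threshold, history


def oracle(x):
    """Hypothetical ground truth: values above 0.5 belong to the positive (CIP) class."""
    return 1 if x > 0.5 else 0


initial = [(0.0, 0), (0.1, 0), (0.7, 1), (0.9, 1)]
pool = [0.15, 0.25, 0.42, 0.44, 0.46, 0.48, 0.55, 0.65, 0.8, 0.95]
threshold, history = active_learning(initial, pool, oracle)
```

In this toy run the first round corrects false positives and the second corrects a missed positive, mirroring the two error types targeted in the study, and the pool accuracy rises from 0.6 to 0.9 to 1.0 with only three newly labelled samples.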
Data Records
This study provides a 10-m resolution raster map of CIPs along the Yangtze River within a 5-km range for the year 2021. It is the first publicly released dataset of high-resolution chemical industrial parks data in the Yangtze River region. The data format is GeoTIFF, and the spatial reference system is WGS-84. The map contains two values, where 1 represents CIPs and 0 represents non-CIPs area. The CIPs map data can be loaded into GIS software such as ArcGIS and QGIS for data visualization and spatial analysis. This dataset will be freely available to all users on the figshare repository19 (https://figshare.com/articles/dataset/The_Yangtse_River_CIPs_10m_2021_rar/25566132).
In addition, we have zoomed in on nine typical CIPs along the Yangtze River for detailed analysis, as shown in Fig. 3. It can be observed that the distribution of CIPs is relatively concentrated, often formed by the aggregation of multiple small-scale chemical factories. Most of them exhibit characteristics of extending along the Yangtze River. This dataset provides a more accurate basis for understanding the specific locations and planning governance of chemical industries along the Yangtze River.
Technical Validation
This section describes the technical and accuracy validation of the CIPs map. First, we constructed a comprehensive test dataset covering the entire Yangtze River to evaluate the accuracy of the mapping results. We then present several detailed CIPs classification maps to qualitatively assess the accuracy of the dataset. These steps are intended to ensure the reliability and precision of the generated CIPs map, thereby enhancing its utility and credibility for various applications following its release.
Specifically, we randomly selected samples of both CIPs and non-CIPs along the Yangtze River based on the CIPs dataset. We divided the data into 70% for training and 30% for testing, resulting in nearly 2000 test samples, including approximately 1000 positive samples (i.e., CIPs) and 1000 negative samples (non-CIPs). All test samples and training samples are mutually independent and have no spatial intersections.
We assessed the classification performance by computing the confusion matrix, overall accuracy, and Kappa coefficient for both the test samples (Table 2) and training samples (Table S2). The overall accuracy on the test dataset is approximately 79.37%. Producer’s accuracy (PA) is the proportion of reference samples of a category that the model classifies correctly. For category CIP, the PA is approximately 80.20%; for category Non-CIP, the PA is approximately 78.54%. User’s accuracy (UA) is the proportion of samples predicted as a category that truly belong to it. For category CIP, the UA is approximately 79.06%; for category Non-CIP, the UA is approximately 79.70%. The Kappa coefficient measures the agreement between the model’s predictions and the reference labels beyond what would be expected by chance. Its value ranges from −1 to 1, where 1 indicates complete agreement, 0 indicates agreement no better than chance, and −1 indicates complete disagreement. Here, the Kappa coefficient is approximately 0.587, indicating that the model’s predictions agree with the reference data considerably better than chance. The accuracy on the training set is very high, with an overall accuracy of 99.73% and a PA of 99.33% for category CIP. Although the accuracy on the test set is lower than that on the training set, considering that the test dataset is distributed along the entire Yangtze River, these results provide us with substantial confidence in the identified distribution of CIPs.
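For readers who wish to recompute these metrics, the sketch below derives overall accuracy, PA, UA, and the Kappa coefficient from a confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the values in Table 2, and the function name is an assumption made here.

```python
def accuracy_report(matrix):
    """matrix[i][j] = number of reference-class-i samples predicted as class j."""
    n = len(matrix)
    total = sum(map(sum, matrix))
    reference_totals = [sum(row) for row in matrix]
    predicted_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    correct = [matrix[k][k] for k in range(n)]
    overall = sum(correct) / total
    # Producer's accuracy: correct samples divided by reference samples per class.
    producers = [c / r for c, r in zip(correct, reference_totals)]
    # User's accuracy: correct samples divided by predicted samples per class.
    users = [c / p for c, p in zip(correct, predicted_totals)]
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * p for r, p in zip(reference_totals, predicted_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return {"overall": overall, "producers": producers, "users": users, "kappa": kappa}


# Hypothetical counts; rows are reference classes (CIP, Non-CIP), columns are predictions.
example = [[80, 20],
           [21, 79]]
report = accuracy_report(example)
```

For these illustrative counts the overall accuracy is 0.795 and the Kappa coefficient is 0.59. With balanced classes the expected chance agreement is 0.5, so Kappa is roughly twice the amount by which overall accuracy exceeds 0.5.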
To better illustrate the classification results of CIPs, we collected several detailed CIPs identification results, including Suzhou, Wuxi, Changzhou, Nanjing, Wuhan, Jingzhou, and Chongqing, covering major cities along the upper, middle, and lower reaches of the Yangtze River and different terrains, as shown in Fig. 4. All CIPs are accurately identified and displayed using this research dataset.
Furthermore, based on the CIPs dataset19 provided by this study along the Yangtze River region and relevant Sentinel-2 imagery, a large number of image and vector mask samples can be obtained. In the future, further development of DL-based semantic segmentation models for CIPs in high-resolution remote sensing images will be pursued.
Usage Notes
Through this study, we have released the high-resolution map of CIPs19 along the Yangtze River in 2021. It is important to note that each pixel in the published CIPs map represents an area of 100 m² (10 m × 10 m). We filtered out speckle noise smaller than 3 × 3 pixels (i.e., 900 m²), so the smallest identifiable chemical industrial park is 900 m², which is much smaller than the actual statistical area of CIPs.
Rapid urban expansion of cities along the river and industrialization without proper planning have led CIPs to emerge along the river, posing one of the risks of China’s early rapid urbanization38. With the provided CIPs dataset19, these parks’ locations and distributions can be easily identified. In the Supplementary file (Figs. S1, S2, S3), we have provided detailed statistical information on the chemical industrial parks.
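The relationship between pixel counts, area, and the speckle filter can be illustrated with a short sketch. It assumes that speckle is removed as 4-connected patches of fewer than 9 pixels; the exact filtering procedure used to produce the dataset may differ, and the function names and the small mask are hypothetical.

```python
from collections import deque


def remove_small_patches(mask, min_pixels=9):
    """Zero out 4-connected groups of 1-pixels that contain fewer than min_pixels pixels."""
    rows, cols = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects every pixel of the patch containing (r, c).
            patch, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(patch) < min_pixels:
                for y, x in patch:
                    cleaned[y][x] = 0
    return cleaned


def area_m2(mask, pixel_size_m=10):
    """Total area of the 1-pixels, given square pixels with the stated side length."""
    return sum(map(sum, mask)) * pixel_size_m ** 2


# Hypothetical mask: one 3 x 3 park and one isolated speckle pixel.
mask = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = remove_small_patches(mask)
```

Here the isolated pixel is removed and the 3 × 3 patch is kept, so the mapped area drops from 1000 m² to 900 m², the minimum park size stated above.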
We have also plotted a scatter map showing the distribution and area of CIPs along the Yangtze River by location along the river (Fig. 5a). Dividing the river into upper, middle, and lower reaches, the area of CIPs in the upper reaches is 24.85 km²; these parks are sparsely distributed and have the smallest average area, 27 hm², and are located mainly in the city of Chongqing. In the middle reaches, the area of CIPs is 65.62 km², with an average park area of 41 hm², mainly in Wuhan, Yichang, and Jingzhou cities. The lower reaches have the largest and most densely distributed CIPs area, covering 133.87 km² and accounting for 61.20% of the total area of CIPs along the Yangtze River. The average park area there is also the largest, at 44 hm², mainly in contiguous areas such as Nanjing and Suzhou-Wuxi-Changzhou. Additionally, there is a segment in the upper reaches where no CIPs were identified. This segment corresponds to the Three Gorges Dam (111.05°E, 30.84°N) and Gezhou Dam (111.29°E, 30.73°N), an area that is strictly protected by central government policies39 and constrained by steep terrain on both sides, so no CIPs have been built there.
Given the precise distribution of CIPs along the Yangtze River, the flood hazard risks they face under future climate change deserve attention40. The hazards are especially significant in the flat terrain of the middle and lower reaches, where CIPs are highly concentrated and directly exposed to flooding41. Furthermore, secondary disasters triggered by floods, such as collapses and landslides, occur mainly in mountainous areas in the upper reaches and will exacerbate safety risks around CIPs.
The pattern of human settlement along riverbanks and coasts overlaps significantly with the distribution of CIPs, which rely heavily on water transportation for import and export. The challenge of chemical industrial pollution is prevalent in developing countries, particularly in regions with large rivers42,43. The law that took effect in China in 2021 explicitly prohibits the construction or expansion of CIPs within 1 km of the Yangtze River, underscoring the urgent need to understand the current distribution of chemical industries along the river to support policy implementation. To realize the vision of the Yangtze River as a truly world-class green golden waterway, it is imperative to shut down non-compliant chemical facilities, promoting green development in the Yangtze River Economic Belt44 and the Beautiful China initiative. To sum up, we comprehensively detected CIPs within a 5-km range along key areas of the Yangtze River, producing a high-resolution distribution map. This provides authentic and reliable data to protect the Yangtze River while also serving as a valuable data source for studies on planning, land use, and ecological conservation along the river. Our research offers a paradigm for the precise identification and regulation of chemical pollution hotspots along waterways globally, benefiting other countries facing similar challenges.
Code availability
The GEE code for classifying CIPs based on Sentinel-2 imagery can be accessed on GitHub at the following link (https://github.com/Songwm26/CIPs21_ScientificData_main). The code, written in JavaScript, includes all the steps mentioned in this article, such as feature calculation, random forest training, and so on. Additionally, we provide data on the area of CIPs within different distance ranges for counties and cities along the Yangtze River.
References
Grimes, S. China’s Evolving Role in the Chemical Global Value Chain. The Chinese Economy 56(6), 441–458 (2023).
Hong, S., Jie, Y., Li, X., & Liu, N. China’s chemical industry: new strategies for a new era. McKinsey & Company (2019).
Reniers, G. L., Ale, B. J. M., Dullaert, W. & Soudan, K. Designing continuous safety improvement within chemical industrial areas. Safety Science 47(5), 578–590 (2009).
Arunraj, N. S. & Maiti, J. A methodology for overall consequence modeling in chemical industry. Journal of hazardous materials 169(1-3), 556–574 (2009).
Hu, X. et al. Land‐use planning risk estimates for a chemical industrial park in China–A longitudinal study. Process Safety Progress 37(2), 124–133 (2018).
Chrysoulakis, N., Adaktylou, N. & Cartalis, C. Detecting and monitoring plumes caused by major industrial accidents with JPLUME, a new software tool for low-resolution image analysis. Environmental Modelling & Software 20(12), 1486–1494 (2005).
Hou, Y. Environmental accident and its treatment in a developing country: a case study on China. Environmental monitoring and assessment 184(8), 4855–4859 (2012).
Cozzani, V., Gubinelli, G., Antonioni, G., Spadoni, G. & Zanelli, S. The assessment of risk caused by domino effect in quantitative area risk analysis. Journal of hazardous Materials 127(1-3), 14–30 (2005).
Reniers, G., & Cozzani, V. (Eds.). Domino effects in the process industries: modelling, prevention and managing. Newnes (2013).
Chen, C., Reniers, G. & Khakzad, N. Cost-benefit management of intentional domino effects in chemical industrial areas. Process Safety and Environmental Protection 134, 392–405 (2020).
Zeng, T., Chen, G., Yang, Y., Chen, P. & Reniers, G. Developing an advanced dynamic risk analysis method for fire-related domino effects. Process Safety and Environmental Protection 134, 149–160 (2020).
Shi, W. & Zeng, W. Application of k-means clustering to environmental risk zoning of the chemical industrial area. Frontiers of Environmental Science & Engineering 8, 117–127 (2014).
Zhang, Y., Deng, Y., Zhao, Y. & Ren, H. Using combined bio-omics methods to evaluate the complicated toxic effects of mixed chemical wastewater and its treated effluent. Journal of hazardous materials 272, 52–58 (2014).
Fu, W., Fu, H., Skøtt, K. & Yang, M. Modeling the spill in the Songhua River after the explosion in the petrochemical plant in Jilin. Environmental Science and Pollution Research 15, 178–181 (2008).
Zhang, N., Shen, S. L., Zhou, A. N. & Chen, J. A brief report on the March 21, 2019 explosions at a chemical factory in Xiangshui, China. Process Safety Progress 38(2), e12060 (2019).
Liu, X. Y. The petrochemical park fire safety planning study based on fire risk analysis. Advanced Materials Research 518, 1045–1051 (2012).
Houming, Z. & Hailin, Q. Study on “Heavy chemical industry encircling the river” in the Yangtze River Economic Belt. Chinas Natl. Cond. Strength 4, 38–40 (2017).
Chen, D. & Hou, L. J. Strengthening efficient usage, protection, and restoration of Yangtze River shoreline. Water Science and Engineering 14(4), 257–259 (2021).
Song, W., Chen, M., & Tang, Z. The Yangtse River_CIPs_10m_2021.rar, figshare, https://doi.org/10.6084/m9.figshare.25566132.v3 (2024).
Mao, L. et al. Large-scale automatic identification of urban vacant land using semantic segmentation of high-resolution remote sensing images. Landscape and Urban Planning 222, 104384 (2022).
Lu, W., Tao, C., Li, H., Qi, J. & Li, Y. A unified deep learning framework for urban functional zone extraction based on multi-source heterogeneous data. Remote Sensing of Environment 270, 112830 (2022).
Chen, B. et al. Multi-modal fusion of satellite and street-view images for urban village classification based on a dual-branch deep neural network. International Journal of Applied Earth Observation and Geoinformation 109, 102794 (2022).
Robinson, C., Bradbury, K. & Borsuk, M. E. Remotely sensed above-ground storage tank dataset for object detection and infrastructure assessment. Scientific Data 11(1), 67 (2024).
Mao, W., Lu, D., Hou, L., Liu, X. & Yue, W. Comparison of machine-learning methods for urban land-use mapping in Hangzhou city, China. Remote Sensing 12(17), 2817 (2020).
Feng, Q. et al. A 10-m national-scale map of ground-mounted photovoltaic power stations in China of 2020. Scientific Data 11(1), 198 (2024).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote sensing of Environment 202, 18–27 (2017).
Lu Dadao. Building an Economic Belt is the Optimal Choice for Economic Development Layout—The Huge Potential for Economic Development in the Yangtze River Economic Belt. Geographical Science (07), 769-772 (In Chinese) (2014).
Phiri, D. et al. Sentinel-2 data for land cover/use mapping: A review. Remote Sensing 12(14), 2291 (2020).
Du, S., Du, S., Liu, B., Zhang, X. & Zheng, Z. Large-scale urban functional zone mapping by integrating remote sensing images and open social data. GIScience & Remote Sensing 57(3), 411–430 (2020).
Zhang, X., Du, S. & Wang, Q. Hierarchical semantic cognition for urban functional zones with VHR satellite images and POI data. ISPRS Journal of Photogrammetry and Remote Sensing 132, 170–184 (2017).
Su, Y., Zhong, Y., Zhu, Q. & Zhao, J. Urban scene understanding based on semantic and socioeconomic features: From high-resolution remote sensing imagery to multi-source geographic datasets. ISPRS Journal of Photogrammetry and Remote Sensing 179, 50–65 (2021).
Breiman, L. Random forests. Machine learning 45, 5–32 (2001).
Ko, B. C., Kim, H. H. & Nam, J. Y. Classification of potential water bodies using Landsat 8 OLI and a combination of two boosted random forest classifiers. Sensors 15(6), 13763–13777 (2015).
Wu, H., Lin, A., Xing, X., Song, D. & Li, Y. Identifying core driving factors of urban land use change from global land cover products and POI data using the random forest method. International Journal of Applied Earth Observation and Geoinformation 103, 102475 (2021).
Belgiu, M. & Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS journal of photogrammetry and remote sensing 114, 24–31 (2016).
Pal, M. Random forest classifier for remote sensing classification. International journal of remote sensing 26(1), 217–222 (2005).
Settles, B. Active learning literature survey. https://minds.wisconsin.edu/handle/1793/60660 (2009).
Chen, M., Liu, W. & Lu, D. Challenges and the way forward in China’s new-type urbanization. Land use policy 55, 334–339 (2016).
Zhang, Q. & Lou, Z. The environmental changes and mitigation actions in the Three Gorges Reservoir region, China. Environ. Sci. Policy 14, 1132–1138 (2011).
Overpeck, J. T., Meehl, G. A., Bony, S. & Easterling, D. R. Climate data challenges in the 21st century. Science 331(6018), 700–702 (2011).
Yang, J. et al. The role of satellite remote sensing in climate change studies. Nature climate change 3(10), 875–883 (2013).
Roy, M. & Shamim, F. Research on the impact of industrial pollution on River Ganga: A Review. International Journal of Prevention and Control of Industrial Pollution 6(1), 43–51 (2020).
El Gohary, R. Agriculture, industry, and wastewater in the Nile Delta. Int. J. Sci. Res. Agric. Sci 22, 159–172 (2015).
Chen, Y. et al. The development of China’s Yangtze River Economic Belt: How to make it in a green way. Science Bulletin 62(9), 648–651 (2017).
Acknowledgements
This research was supported by the National Natural Science Foundation of China (grant nos. 42121001 and 42171204).
Author information
Contributions
W.M.S. wrote the code, generated the data and contributed to manuscript writing and revision. M.X.C. designed the study and organized the research, manuscript writing and revision. Z.P.T. contributed to manuscript revision.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Song, W., Chen, M. & Tang, Z. A 10-m scale chemical industrial parks map along the Yangtze River in 2021 based on machine learning. Sci Data 11, 843 (2024). https://doi.org/10.1038/s41597-024-03674-6
DOI: https://doi.org/10.1038/s41597-024-03674-6