A big data approach to assess progress towards Sustainable Development Goals for cities of varying sizes

Liu, Yu; Huang, Bo; Guo, Huadong; Liu, Jianguo

doi:10.1038/s43247-023-00730-8

Article
Open access
Published: 10 March 2023

Communications Earth & Environment volume 4, Article number: 66 (2023)

Abstract

Cities are the engines for implementing the Sustainable Development Goals (SDGs), which provide a blueprint for achieving global sustainability. However, knowledge gaps exist in quantitatively assessing progress towards SDGs for different-sized cities. There is a shortage of relevant statistical data for many cities, especially small cities, in developing/underdeveloped countries. Here we devise and test a systematic method for assessing SDG progress using open-source big data for 254 Chinese cities and compare the results with those obtained using statistical data. We find that big data is a promising alternative for tracking the overall SDG progress of cities, including those lacking relevant statistical data (83 Chinese cities). Our analysis reveals decreasing SDG Index scores (representing the overall SDG performance) with the decrease in the size of Chinese cities, suggesting the need to improve SDG progress in small and medium cities to achieve more balanced sustainability at the (sub)national level.

Introduction

The Sustainable Development Goals (SDGs)¹ adopted by all members of the United Nations call for concerted efforts to achieve global social, economic, and environmental well-being. National governments have demonstrated strong commitment to the SDGs, but cities are critical actors in implementing the sustainability agenda—an estimated 65% of the 169 targets underlying the 17 SDGs require city engagement². As the centre of social and technological innovations, cities will continue to drive the achievement of the SDGs³. Nevertheless, rapid urban development has also introduced pressing social and environmental problems—such as various inequalities⁴, air pollution⁵, and a lack of infrastructure⁶—all of which threaten city prospects. Thus, local municipal governments globally are integrating the SDGs into their development plans to address these challenges and participate in a global dialogue^7,8.

Implementing and achieving the SDGs requires measuring and assessing progress in different contexts and determining development priorities. Quantitative assessments of SDG progress have been undertaken at the global^9,10, regional¹¹, national¹², and subnational¹³ levels by various government and nongovernment organisations. Among them, the SDG Index score (arithmetic mean of 17 individual SDG scores) has been highlighted as useful for comparing the overall SDG performance of different countries and provinces. The indicator framework and systematic methods arising from such research are essential for understanding SDG progress and the actions to take next¹⁴, which should be communicated to the intended target audience in a way that is easy to interpret¹⁵. At the city level, from 2016 to 2021, nearly 80 voluntary local reviews were submitted by city governments in different countries to report their progress¹⁶, while most of these reviews focused on status descriptions and governance arrangements regarding the SDGs and offered little in terms of setting baselines or evaluating progress towards SDG targets. Transforming the SDGs and their targets into a data-driven management tool to quantify progress is crucial for formulating evidence-based strategies and refining resource allocation¹¹. However, only some large cities or capital cities/provincial capitals have measured their progress towards 15 or 16 of the 17 SDGs^2,17. Large-scale sustainability assessments of all 17 SDGs for all cities of varying sizes in a specific country are still limited. The shortage of relevant statistical data in many cities in developing and underdeveloped countries has worsened the situation. 
Among the cities at the prefecture level or higher in China, the number of small cities and their total land area are larger than those of large cities, but small cities face a more serious data shortage problem (see details in Table S5), thereby hindering the development of holistic strategies for promoting city sustainability. Thus, there is an urgent need to develop systematic methods to address the shortage of relevant statistical data in quantifying city-level progress towards SDGs, especially for small cities.

The wide availability of big data with five important characteristics (large amount, fewer properties, high data generation speed, great variety of data formats and sources, and high economic benefits)¹⁸ provides tremendous opportunities to monitor SDG progress. This capability has been highlighted for indicators and targets in SDG assessment studies^19,20. More than a quarter of the publications pertaining to SDG assessment using big data have focused on the indicator (target) monitoring of SDGs 1.1.1 (the international poverty line), 1.1.2 (national poverty lines), 6.6.1 (water-related ecosystems), and 15.3.1 (degraded land), which underlie SDGs 1 (no poverty), 6 (clean water and sanitation), and 15 (life on land)²¹. Multiple types of big data (e.g., nighttime light (NTL) satellite imagery, point of interest (POI) data, and OpenStreetMap data) have been integrated to construct a variety of monitoring indicators that reflect the current status of cities in a timely and efficient way to help assess the SDGs. The same big data can also be applied to monitor multiple SDGs. For example, NTL satellite imagery was used not only to represent economic growth (SDG 8)²² but also to map poverty (SDG 1)²³ and estimate inequality (SDG 10)²⁴. On the other hand, machine learning models—including random forest²⁵, boosted regression trees²⁶, and artificial neural networks (ANNs)²⁷—have been used in monitoring processes to improve evaluation efficiency²¹. However, these studies have focused only on the assessment of one or a few indicators (targets) of a specific SDG, and they have lacked an overall consideration of multiple SDGs. A comprehensive evaluation is a fundamental step for identifying the priorities that cities should pursue in implementing the SDGs. Therefore, it is necessary to integrate multisource big data and machine learning models into the overall assessment of SDGs.

In response, this study constructed a generic indicator system using open-source big data and developed an ANN model to efficiently assess the overall SDG progress for cities of varying sizes (Fig. S1). The proposed systematic methods are not limited to investigating the city-level SDG Index in China and can also be applied to other countries with appropriate adjustments. In detail, for 254 Chinese cities with relatively sufficient statistical data, we first evaluated their performance of 17 individual SDGs and the SDG Index with 54 statistical data indicators (details are available at the figshare repository²⁸). Then, the individual SDG scores were used to select the most suitable big data monitoring indicators for the generic indicator system, and the SDG Index scores were used as the expected output to train the ANN model. Finally, we applied the developed ANN model to evaluate the SDG Index for 83 Chinese cities with a severe lack of relevant statistical data. Overall, this study assessed the SDG Index for 337 cities at the prefecture level or above in China in 2017 with multisource and low-cost big data and compared the varying sustainability of cities with different population sizes, spatial locations, and income levels. Evidence-based policy recommendations were provided for cities to optimise their development paths and achieve the SDGs in accordance with local contexts.

Results

City-level individual SDG performance

Figure 1a shows the average performance of 254 Chinese cities, including large, medium, and small cities (see classification in Table S3), regarding each SDG. Although large cities showed relatively better performance on most individual SDG scores, they also faced challenges in reducing income inequalities and increasing global partnerships, parts of SDG 10 (reduced inequalities) and SDG 17 (partnerships for the goals), respectively. Compared with large cities, small and medium cities performed better on SDG 15 (life on land), although they still scored poorly. Large and medium cities showed similar performance regarding SDG 13 (climate action) and scored higher than small cities. In terms of the five critical SDG dimensions, most cities did not make particularly good progress in the dimension of peace (SDG 16) and faced a challenge in forming partnerships (SDG 17). In contrast, Chinese cities performed well in the planet dimension, particularly in terms of clean water popularisation (SDG 6) and waste discharge and treatment (SDG 12), scoring more than 77 on average. Regarding the people and prosperity dimensions, most cities achieved better gender equality (SDG 5) and energy production and consumption (SDG 7) but needed to invest more effort into improving medical equipment (SDG 3), transportation networks and technological innovations (SDG 9), and coordinated development (SDG 10).

**Fig. 1: Comparison of average individual SDG scores for different city groups.**

The performance of cities located in different regions regarding each SDG can be found in Fig. 1b–e. Eastern cities (e.g., Shenzhen) performed better on most of the SDGs but scored the lowest on SDG 14 (life below water) due to the poor water quality in coastal waters. Benefiting from the reform and opening-up policy, eastern coastal cities also showed better performance on SDG 17 (partnerships for the goals) due to their open and inclusive foreign investment environment, and they experienced rapid economic development (SDG 8) and gradually improved infrastructure (SDG 3, SDG 4, and SDG 9). Central cities performed better on energy consumption (SDG 7 and SDG 12), while northeastern cities scored the highest on SDG 2 (zero hunger) due to their fertile agricultural lands. Western cities scored lower on urban sustainability because of their limited ecological assets (SDG 15), inadequate educational facilities (SDG 4), and lower levels of external communication (SDG 17).

In terms of the indicator scores for individual SDGs (details are available at the figshare repository²⁸), most cities performed well in drinking water popularity, gas popularity, and domestic waste disposal, relevant to SDG 6 (clean water and sanitation), SDG 7 (affordable and clean energy), and SDG 12 (responsible consumption and production). However, many cities met challenges regarding road network construction and highway transportation, relevant to SDG 9 (industry, innovation, and infrastructure) and SDG 11 (sustainable cities and communities), with lower average normalised indicator scores across all 54 indicators. Regarding the indicators related to research and patents, parts of SDG 9 (industry, innovation, and infrastructure), large cities performed better than small and medium cities. For SDG 4 (quality education), a difference existed in the indicator pertaining to higher education in different-sized cities, but the compulsory education indicators performed similarly.

Harnessing big data to assess the SDGs with an ANN model

To obtain more comprehensive and accurate evaluation results, we constructed a variety of indicators using open-source big data for each SDG and conducted a correlation analysis with the individual SDG scores calculated using statistical data (see details in the Supplementary Methods section). Finally, 18 big data monitoring indicators were selected to assess the SDGs (Table S1). As shown in Fig. 2, the 18 big data monitoring indicators were highly correlated with their corresponding individual SDG scores at the 0.05 significance level and partially correlated with the other individual SDG scores. Except for SDG 15, which was calculated with the same indicators as before, the correlation between the NIC (nighttime light intensity of construction land) and the SDG 8 score was the highest (0.621). For multiple indicators monitoring the same SDG, the DME (density of existing manufacturing enterprises on construction land) and DRE (density of existing research enterprises on construction land) were significantly correlated with the SDG 9 score, and the Pearson correlation coefficients were 0.462 and 0.574, respectively.

**Fig. 2: Correlations of individual SDG scores with big data monitoring indicators.**

Using the same evaluation methods as those applied to the statistical data, we calculated the SDG Index scores for 254 cities based on the big data monitoring indicator framework (see details in the Supplementary Results section). Although the evaluation results using the two different data types were relatively highly correlated, the accuracy of our evaluation results using big data could still be improved. Hence, we developed an ANN model using the 18 big data monitoring indicators to assess different-sized cities’ progress towards SDGs (see details in the Supplementary Results section). The low value of the root-mean-square error (RMSE = 3.13) and the high value of the coefficient of determination (R² = 0.7625) for the test set (Fig. 3d) indicate that the trained ANN model performs well and is reasonably accurate. In contrast to collecting and calculating 54 statistical data indicators to measure the SDG Index scores of the different-sized cities, we provide a way of using only 18 big data monitoring indicators and a trained ANN model with comparable evaluation accuracy. Our results indicate that big data can be used as a comprehensive and representative data source to monitor progress towards SDGs independently, and the developed assessment method is superior to the previous method using statistical data in terms of efficiency, availability, and cost.
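The two test-set metrics quoted above can be illustrated with a minimal sketch. It assumes that R² is computed as one minus the ratio of the residual to the total sum of squares; the paper does not state which definition was used, and the function names and sample scores below are hypothetical.

```python
import math


def rmse(expected, predicted):
    """Root-mean-square error between expected and predicted scores."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(expected, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_expected = sum(expected) / len(expected)
    ss_res = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    ss_tot = sum((e - mean_expected) ** 2 for e in expected)
    if ss_tot == 0:
        raise ValueError("expected values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical SDG Index scores: statistical-data values vs. model output.
expected_scores = [40.0, 50.0, 60.0]
predicted_scores = [42.0, 50.0, 58.0]
print(rmse(expected_scores, predicted_scores))
print(r_squared(expected_scores, predicted_scores))
```

Both functions take the statistical-data SDG Index as the expected output and the model estimate as the prediction, mirroring how the test set is described in the text.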

**Fig. 3: Proposed ANN model performance.**

Figure 3a–c shows the spatial distribution of the expected output, actual output, and errors for the trained ANN model, and Table 1 shows its performance for cities in different groups. The results suggest that the ANN model can be used to accurately assess the SDG progress for different-sized cities, not just the large cities that are the subject of existing SDG assessment reports^2,17. For equitable and harmonious global development, small and medium cities should also be assessed for their progress towards SDGs. The best performance of the ANN model was found in high-income cities, with the lowest mean absolute error (MAE) (1.370) and the highest correlation (R) (0.939).

**Table 1: ANN model performance for cities in different groups.**

City-level SDG Index performance

Instead of focusing on the subtle differences between the two SDG Index scores derived using statistical data and big data, it may be more important to consider the performance of cities under the same indicator framework. Since the trained ANN model performed well in cities with different sizes, economic levels, and geographical locations, we used it to calculate the SDG Index scores for 83 other prefecture-level cities that lacked relevant statistical data.

Figure 4a shows the spatial distribution of the SDG Index scores of all 337 cities at the prefecture level or higher (excluding the newly established Sansha City) using big data monitoring indicators and the developed ANN model. We found that higher SDG Index scores were present in Eastern China, especially the coastal regions in Guangdong Province, Zhejiang Province, and Jiangsu Province, which formed a high-value agglomeration area, while the SDG Index scores of western cities were relatively lower, especially for cities in Tibet and Qinghai Province. Provincial capital cities in the central region performed better on the SDG Index, especially Changsha City, Wuhan City, and Zhengzhou City, all of which ranked in the top 15. For the northeastern region, cities in Liaoning Province performed better than those in Heilongjiang Province. Overall, in most provinces, provincial capital cities rather than other prefecture-level cities tended to have higher SDG Index scores.

**Fig. 4: Comparison of SDG Index scores for different city groups.**

Figure 4b–d shows the distribution of the SDG Index scores according to the different group information across 337 cities in China. Regarding the cities located in different regions, the median SDG Index score in the eastern region was the highest (50.13), followed by that in the central (46.90), northeastern (43.77), and western (41.31) regions. The SDG Index scores of the cities were more evenly distributed in the northeastern region, concentrated around the median in the eastern region, and more concentrated in the third quartile (Q3) in the western region. In terms of different city sizes, the average SDG Index score of the large cities was the highest (49.66), followed by that of the medium (46.50) and small cities (42.10). The distribution of the SDG Index scores for the medium cities was more balanced, while that of most small cities was comparatively lower. The SDG Index scores showed differences across cities with varying income levels. Specifically, the median SDG Index score of high-income cities was more than 50, while that of lower-middle-income cities was less than 40. More lower-middle-income cities than high-income and upper-middle-income cities had SDG Index scores around and below the third quartile (Q3). Overall, large cities with high-income levels located in the eastern region had higher SDG Index scores (the average and median were 53.92 and 54.42, respectively), which is corroborated by Fig. 4a.

Discussion

Addressing development challenges and making notable SDG progress in different-sized cities require quantitative evaluations³. We quantify the performance of 17 individual SDGs for 254 cities with statistical data and assess the SDG Index for all 337 different-sized cities with open-source big data. The developed evaluation methods lay a solid foundation for United Nations members to assess the city-level performance of SDGs. At the same time, the assessment results provide a scientific reference for Chinese cities to achieve sustainability.

Specifically, big data with high availability provide a timely and reliable initial assessment of SDGs, which helps cities of varying sizes detect development challenges as early as possible. In contrast to integrating a series of global composite indexes to estimate the SDG Index²⁹, this study used multisource and low-cost big data to construct simple but representative indicators for individual SDGs and then developed an ANN model to estimate the SDG Index score, which can help governments at all levels monitor and evaluate SDG progress, especially in cities that have limited statistical data availability and weak big data processing capabilities. Compared to traditional statistics, constructing big data monitoring indicators can save considerable human effort and financial costs since all big data used in this study are publicly available and most of them are globally available. To make assessment results more suitable for local situations, cities in other countries can use other open-source big data (e.g., land use data from GlobeLand30³⁰) as a supplement or alternative to construct the big data indicator system of SDGs and apply local assessment results with statistical data as the expected output to train the ANN model. In addition, unlike statistical data, which are released with a lag of at least one year (e.g., in city-level statistical yearbooks), big data can reflect the strengths and weaknesses of urban development in the current year, providing an initial direction that cities should prioritise before formulating relevant policies and allowing timely monitoring during the implementation period. Furthermore, although big data has been proven to be a valuable alternative to statistical data for assessing the overall SDG progress, indicators using official statistics can reflect more detailed information about urban sustainability.
Therefore, we suggest using the big data indicator framework and associated methods to conduct an overall assessment of SDGs to identify development priorities and then combining relevant statistical data (if available) for detailed analyses and targeted solutions.

A much faster and more ambitious governmental response is needed to measure SDG performance, address local development challenges, and achieve the SDGs by 2030. On the one hand, more specific and frequent statistical data at the city or finer scales are needed. The main data gaps for each SDG in Chinese cities compared with the official SDG indicator framework proposed by the United Nations were provided (details are available at the figshare repository²⁸). Taking SDG 3 (good health and well-being) as an example, data on hospital beds and doctors that relate to SDG 3.8 and SDG 3.c are commonly used to evaluate medical conditions³¹, as is done in this paper, while data on the prevalence of various diseases (e.g., HIV, tuberculosis, and hepatitis B) related to other indicators of SDG 3 can only be found at the provincial and national levels. Making these data public is crucially important for improving health care databases and allocating specialised medical facilities. Governments and statistical departments can also pay more attention to unifying the statistical method and calibre of indicators used in different statistical yearbooks, such as the traffic death rate (an indicator of SDG 3), net primary enrolment rate (an indicator of SDG 4), and proportion of small-scale industries in total industry value added (an indicator of SDG 9), which can only be collected in some cities. On the other hand, we call for more powerful support in building a long-term big data evaluation platform to promote the achievement of the SDGs. Since 2015, many studies have quantified SDG targets and indicators using multisource big data^32,33; however, different localised indicators for the same SDG target and different spatiotemporal resolutions for big data (e.g., remote-sensing images) targeting the same SDG indicator diminish the comparability and certainty of evaluation results³⁴. 
Based on enhancing cooperation among local governments, academics, big data vendors, and those interested in city sustainability, a possible route is to establish an SDG indicator monitoring platform to automatically generate big data assessment results on a regular basis (see details in the Supplementary Discussion section). This platform could define a common set of indicators, use public high-resolution big data, and provide appropriate processing methods for others to follow.

From large to medium and then to small cities, the average SDG Index scores calculated using both statistical data and big data tended to decrease (Figs. 3 and 4). Factors such as positive governmental support, rapid land urbanisation, and complete infrastructure may promote the sustainable development of larger cities. For instance, the results of the correlation analysis (Table S6) showed that local financial expenditure per capita, road network density, and the ratio of the urban land area to the total land area significantly and positively affected the scores of SDG 3 (good health and well-being), SDG 9 (industry, innovation, and infrastructure), and SDG 17 (partnerships for the goals). In fact, these three goals are the three worst-performing goals for small cities and three of the top four goals with the largest gaps in their average scores compared to large cities. The geographical distribution of different-sized cities and urban scaling characteristics may further exacerbate the score differences of these three SDGs and the SDG Index^13,35. More than half of the small cities were located in the western region, while nearly half of the large cities were concentrated in the eastern region (details are available at the figshare repository²⁸). Compared to western cities with rugged topography and a great distance from the coast³⁶, the relatively level ground in Eastern China provides a vital condition for the establishment of well-connected roads, further contributing to the gathering of capital and talent and the development of various industries (SDG 9)³⁷. The implementation of the reform and opening-up policy provided a major opportunity for eastern coastal cities (especially large cities such as Shenzhen City and Zhuhai City) to attract foreign capital (SDG 17) and drive the development of medical care (SDG 3) by increasing local financial expenditures³⁸. Urban scaling laws have been widely proven to be applicable to cities in China^35,39.
Urban indicators related to socioeconomic activities, such as GDP growth (SDG 8), mobile phone and internet use (SDG 9), and foreign capital investment (SDG 17), have superlinear relations with population size³⁵, indicating the increasing returns to scale effect of urban indicators³⁹. China achieved rapid urbanisation by prioritising the economic and infrastructural development of large cities⁴⁰, which have a greater ability to maintain growth due to their superior resources (e.g., more urban construction land quotas)⁴¹.

To achieve more balanced city-level sustainability, it is necessary to explore the central role of integrated socioeconomic communities (e.g., metropolitan areas, urban agglomerations, and economic belts) in driving the development of medium and small cities⁴². As the core cities of urban agglomerations, one or more large cities with advantages in location, political resources, industries, and population size can support the development of neighbouring small and medium cities through various means, including establishing industrial cooperation and transfer mechanisms, strengthening transportation links, and promoting the flow of people and materials⁴³. The rational division of industries and functions between small, medium, and large cities within integrated socioeconomic communities can lead to effective resource allocation and improve overall sustainability through complementary advantages. At the same time, the formation of integrated socioeconomic communities can reduce the negative spatial externalities caused by intracity competition and play a radiating role in promoting the development of surrounding cities.

Moreover, the Chinese government should prioritise the SDGs that lag behind others, especially SDG 10 (reduced inequalities), SDG 15 (life on land), and SDG 17 (partnerships for the goals). More effective and powerful policies are needed to promote holistic sustainability and advance SDG progress (see details in the Supplementary Discussion section).

Conclusions

This study developed an ANN model with multisource and low-cost big data to efficiently assess different-sized cities’ overall SDG performance. In this way, this study provides a solid foundation for elaborating the important role of big data in assessing progress towards SDGs, which can also be extended to other countries and cities. The assessment results show that individual SDG scores and SDG Index scores varied across city sizes, geographical locations, and income levels. Compared to large cities, small cities (especially western cities) scored relatively low on the SDG Index. Constructing integrated socioeconomic communities is a possible solution to improve the development of small and medium cities and promote the achievement of balanced sustainability at the subnational level.

In the future, two focal points can facilitate the evaluation and achievement of the SDGs. One is to encourage the government and related organisations to increase investments and financial support in SDG data and evaluation systems. Due to data limitations, not all 17 SDGs were measured using appropriate big data monitoring indicators, and the constructed city-level indicator framework with statistical data still does not fully reflect progress towards SDGs. The other is to suggest that researchers explore the trade-offs and synergies of the SDGs at the city level and integrate these interactions into the assessment models. Although the SDG Index calculated using statistical data provides a straightforward and easy-to-interpret way to identify development priorities and compare the overall SDG progress at different scales, its calculation methods lack the quantification of interactions among the SDGs.

Methods

Calculation of individual SDG and SDG Index scores using statistical data

The United Nations adopted more than 230 indicators to measure the 17 SDGs, but most are defined globally or nationally. We developed a localised indicator framework with statistical data mainly considering a combination of the global indicator framework for the SDGs and targets of the United Nations⁴⁴, the SDG Index and Dashboards Report 2018⁹, the Sustainable Development Report 2021¹⁰, and China’s SDG progress assessment papers and reports at various scales^{13,31,45,46,47}. Recognising the importance of developing an indicator framework covering all 17 SDGs for the city-level assessment⁴⁸, we selected as many indicators as possible for each SDG based on data availability and consistency in the statistical calibre at the city level. Our indicator evaluation system includes 54 indicators, with an average of 3 indicators for each SDG, and nearly 70% of the indicators can be found in previous SDG progress reports and published papers (details are available at the figshare repository²⁸). Unlike the global sustainable development reports, we selected indicators according to the SDG challenges facing China and the input (e.g., food, energy, and water) and output (e.g., innovation, technology, and information) systems of prefecture-level cities. We then used data from local statistical sources for a more nuanced analysis to show the efforts of different-sized cities in China in implementing and achieving the 17 SDGs.

We set the best performance (upper bound) and lower bound for each SDG indicator to decrease the influence of extreme values. The rules for setting the SDG upper and lower bounds were similar to those used in previous studies^9,10,13. For the SDG indicators that have significant ideal values or technical optimum values, such as “gender equality” and “hazardous waste generated”, we used the corresponding fixed quantitative value as the upper bound. The principle “leave no one behind” has been used to determine the best way to access essential living resources and basic infrastructures, such as drinking water and internet coverage. For the other SDG indicators, we adopted the average top 5 performers as the upper bounds. For all indicators, we used the value at the 2.5th percentile as the lower bound. Since some SDG indicators are negative indicators, such as carbon dioxide emissions per capita, a lower value represents better performance.
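The bound-setting rules can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the percentile uses linear interpolation, negative indicators are handled by flipping their sign so that "top 5" and "2.5th percentile" always refer to best and worst performance, and the names `percentile`, `indicator_bounds`, and `fixed_upper` are hypothetical.

```python
def percentile(sorted_values, q):
    """Percentile (q in [0, 100]) of an ascending list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * q / 100.0
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    fraction = position - low
    return sorted_values[low] + (sorted_values[high] - sorted_values[low]) * fraction


def indicator_bounds(values, higher_is_better=True, fixed_upper=None):
    """Return (lower_bound, upper_bound) for one indicator across all cities.

    Assumption: for negative indicators the values are sign-flipped first,
    so the lower bound is always the worst end and the upper bound the best.
    """
    oriented = sorted(v if higher_is_better else -v for v in values)
    lower = percentile(oriented, 2.5)          # value at the 2.5th percentile
    top_five = oriented[-5:]                   # five best performers
    upper = sum(top_five) / len(top_five)      # average of the top 5
    if not higher_is_better:
        lower, upper = -lower, -upper          # undo the sign flip
    if fixed_upper is not None:
        upper = fixed_upper                    # ideal or technical optimum value
    return lower, upper


# Hypothetical raw values for 41 cities.
raw_values = [float(v) for v in range(1, 42)]
print(indicator_bounds(raw_values))
print(indicator_bounds(raw_values, higher_is_better=False))
```

For a negative indicator the returned lower bound is numerically larger than the upper bound, which is consistent with a lower raw value representing better performance.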

Next, the indicator for each SDG was normalised to a standard scale of 0 to 100 to increase the comparability among the SDGs.

$${I}_{nij}=100\times \frac{{I}_{nij}^{{\prime} }-min({I}_{nij}^{{\prime} })}{max({I}_{nij}^{{\prime} })-min({I}_{nij}^{{\prime} })}$$

(1)

where n, i, and j represent the indicator, SDG, and city, respectively. ${I}_{{nij}}$ is the normalised score of indicator n under the ith SDG for city j. ${I}_{nij}^{{\prime} }$ represents the raw data value. $max({I}_{nij}^{{\prime} })$ and $min({I}_{nij}^{{\prime} })$ indicate the upper bound (best performance) and lower bound (worst performance) of indicator n under the ith SDG, respectively.

For the normalised indicator score, a higher score means better performance, and a score of 100 represents the best performance of an indicator towards achieving the SDGs. The indicator scores were assigned values of 100 or 0 if their original performance was better than the upper bound or worse than the lower bound, respectively.
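As a minimal sketch of this normalisation and clipping step, the function below rescales one raw value given its bounds. The function name and the sample values are hypothetical. Treating a negative indicator by passing a best value that is smaller than the worst value is an assumption about how the formula is applied, not something the text spells out.

```python
def normalise(raw, worst, best):
    """Rescale a raw indicator value to 0-100 and clip it at the bounds.

    worst is the lower bound and best is the upper bound of performance.
    For a negative indicator (lower raw value is better), pass best < worst.
    """
    score = 100.0 * (raw - worst) / (best - worst)
    return max(0.0, min(100.0, score))


print(normalise(50, worst=0, best=100))   # 50.0
print(normalise(120, worst=0, best=100))  # 100.0 (better than the upper bound)
print(normalise(2, worst=10, best=0))     # 80.0 (negative indicator)
```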

Determining weights is key to achieving reliable evaluation results, but there is not enough evidence to establish which weighting method is best^9,10. Since each indicator is equally important within each SDG and all 17 SDGs need to be achieved in all countries⁹, all 17 SDGs were equally weighted in this study, and the indicators within each SDG were equally weighted, with each weight equal to the reciprocal of the number of indicators belonging to that SDG. Finally, we calculated the 17 individual SDG scores and the SDG Index scores of 254 cities in China in 2017 to measure urban engagement in achieving the SDGs.

The calculation of the 17 individual SDG scores is as follows:

$${S}_{{ij}}=\mathop{\sum }\limits_{n=1}^{k}({w}_{{ni}}\times {I}_{{nij}})$$

(2)

$${w}_{{ni}}=\frac{1}{k}$$

(3)

where ${S}_{{ij}}$ is the ith SDG score of city j, $k$ is the number of indicators under the ith SDG, and ${w}_{{ni}}$ is the weight of indicator n under the ith SDG.

The SDG Index scores can be calculated using the following equations:

$${{SDGI}}_{j}=\mathop{\sum }\limits_{i=1}^{m}({W}_{i}\times {S}_{{ij}})$$

(4)

$${W}_{i}=\frac{1}{m}$$

(5)

where ${{SDGI}}_{j}$ is the SDG Index score of city j, m is the total number of individual SDG scores, and ${W}_{i}$ is the weight of the ith SDG score.
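The two-level equal weighting in Eqs. (2) to (5) amounts to a mean of means, which the following sketch illustrates for one city. The function names and the indicator scores are hypothetical.

```python
def sdg_scores(indicator_scores):
    """Eqs. 2-3: average the normalised indicators within each SDG (weight 1/k)."""
    return {sdg: sum(vals) / len(vals) for sdg, vals in indicator_scores.items()}


def sdg_index(scores):
    """Eqs. 4-5: average the individual SDG scores (weight 1/m)."""
    return sum(scores.values()) / len(scores)


# Hypothetical normalised indicator scores for one city and two SDGs
city = {"SDG1": [100.0, 50.0], "SDG2": [20.0]}
scores = sdg_scores(city)
print(scores)             # {'SDG1': 75.0, 'SDG2': 20.0}
print(sdg_index(scores))  # 47.5
```

Because the goals are averaged after the indicators, an indicator belonging to an SDG with few indicators carries more weight in the index than one belonging to an SDG with many; pooling all three indicators above would give about 56.7 instead of 47.5.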

To test the stability of the evaluation results, uncertainty and sensitivity analyses for the SDG scores were also performed, and the detailed process and analysis results can be found in the Supplementary Methods.

Calculation of SDG Index scores using multisource big data

Big data from multiple sources, including remote sensing–associated data (e.g., NTL imagery and land use data) and geospatial big data (e.g., POI data, company information big data, gridded population data, and road networks), were used to derive the SDG monitoring indicator values. A correlation analysis was used to select the most suitable proxy-monitoring indicators calculated using big data for individual SDGs. The selected big data monitoring indicators should be highly correlated with their corresponding individual SDG scores at the 0.05 significance level. For each SDG, as few representative big data monitoring indicators as possible were ultimately selected to build the big data evaluation framework to increase global availability while ensuring satisfactory evaluation results of the SDG Index.

The Supplementary Methods section shows the detailed process of constructing the big data monitoring indicators, and Fig. S2 provides an example. Table S1 shows the final big data assessment framework, including the SDGs, corresponding proposed monitoring indicators, big data types, and big data sources.

The assessment methods using statistical data were also applied to the big data monitoring indicators to assess city-level progress towards SDGs (see details in the Supplementary Results section). To overcome the limitations of previous methods and improve the accuracy of the assessment results, we developed an ANN model using big data to assess different-sized cities’ progress towards the SDGs. As an intelligent learning system, ANNs can model complex nonlinear relationships with an appropriate network structure and transfer function and make decisions with a massive parallel processing capability⁴⁹. Through their parameter configuration and multiple iterations, ANNs can achieve better modelling performance with optimal weights and higher prediction accuracy than most traditional statistical tools^50,51; thus, they have been widely used in the research field of urban sustainability^27,52. As one of the most extensively applied ANN models, the backpropagation (BP) neural network has been chosen due to its easy operation, strong self-adaptability, and good performance in evaluation and forecasting⁵³. A BP neural network is a multilayer feedforward network consisting of an input layer, one or more hidden layers, and an output layer with random connection weights (Fig. S7). Compared with commonly used evaluation methods (e.g., entropy weight method), BP neural networks with topological structures are good at handling fuzzy information while considering multiple factors due to their distributed processing capability and fault tolerance⁵².

The output of the first hidden layer is:

$${O}_{j}=f\left(\mathop{\sum }\limits_{i=1}^{n}{w}_{{ij}}{x}_{i}-{d}_{j}\right)\qquad j=1,2,\ldots ,l$$

(6)

where ${x}_{i}$ represents the input data in the input layer, ${w}_{{ij}}$ represents the weight between the ith neuron of the input layer and the jth neuron of the hidden layer, $f$ is the transfer function, n is the number of neurons in the input layer, l is the number of neurons in the hidden layer, and ${d}_{j}$ represents the threshold of the jth hidden neuron. The outputs of subsequent hidden layers (e.g., the second hidden layer) are calculated in the same way as in Formula (6), with the output of the previous hidden layer serving as the input of the next hidden layer.

The number of neurons in each hidden layer (l) is:

$$l=a+\sqrt{n+m}\qquad1 \, < \, a \, < \, 10$$

(7)

where m is the number of neurons in the output layer.
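As a worked illustration of Eq. (7), the sketch below lists the integer hidden-layer sizes the rule allows for a given number of input and output neurons. It assumes that a may take any real value strictly between 1 and 10; the function name is hypothetical.

```python
import math


def hidden_size_candidates(n_inputs, n_outputs):
    """Integer sizes l = a + sqrt(n + m) reachable with 1 < a < 10 (Eq. 7)."""
    root = math.sqrt(n_inputs + n_outputs)
    lowest = math.floor(root + 1) + 1   # smallest integer strictly above root + 1
    highest = math.ceil(root + 10) - 1  # largest integer strictly below root + 10
    return list(range(lowest, highest + 1))


# 18 input neurons and 1 output neuron, as in the model reported in this section
print(hidden_size_candidates(18, 1))  # [6, 7, 8, 9, 10, 11, 12, 13, 14]
```

The nine-neuron hidden layers reported for the final model fall inside this range.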

The output of the output layer is as follows:

$${Y}_{k}=f\left(\mathop{\sum }\limits_{j=1}^{l}{O}_{j}{w}_{{jk}}-{d}_{k}\right)\qquad k=1,2,\ldots ,m$$

(8)

where ${w}_{{jk}}$ represents the weight between the last hidden layer and the output layer.
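The forward pass described by Eqs. (6) and (8) can be sketched in a few lines. The weights, thresholds, and network size below are made-up values for illustration. Using tanh in the hidden layer and an identity function at the output is an assumption, since the text does not name the transfer function.

```python
import math


def layer(inputs, weights, thresholds, f):
    """Eqs. 6 and 8: output_j = f(sum_i weights[i][j] * x_i - d_j)."""
    return [
        f(sum(w_row[j] * x for w_row, x in zip(weights, inputs)) - d)
        for j, d in enumerate(thresholds)
    ]


def forward(x, layers):
    """Feed x through a list of (weights, thresholds, transfer function) layers."""
    for weights, thresholds, f in layers:
        x = layer(x, weights, thresholds, f)
    return x


def identity(v):
    return v


# Hypothetical toy network: 2 inputs -> 2 hidden neurons -> 1 output
net = [
    ([[0.5, -0.5], [0.25, 0.25]], [0.0, 0.0], math.tanh),
    ([[2.0], [1.0]], [0.5], identity),
]
y = forward([1.0, 2.0], net)
print(y)
```

Adding a second hidden layer only means adding another tuple to the list, which mirrors the statement that each hidden layer takes the previous layer's output as its input.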

The mean squared error is used as the error function for the BP neural network model:

$${E}_{k}=\frac{1}{2}{\sum }_{k}{({F}_{k}-{Y}_{k})}^{2}$$

(9)

where ${F}_{k}$ represents the expected output.

The Levenberg‒Marquardt algorithm is used as the network training function to improve the convergence speed and model accuracy:

$$x\left(k+1\right)=x\left(k\right)-{[{\boldsymbol{J}}^{T}{\boldsymbol{J}}+\mu {\boldsymbol{I}}]}^{-1}{\boldsymbol{J}}^{T}e$$

(10)

where ${{{{{\boldsymbol{J}}}}}}$ is the Jacobian matrix, ${{{{{\boldsymbol{I}}}}}}$ is a unit matrix, and $e$ is the network error. When the value of $\mu$ is zero, it is the same as Newton’s method. When $\mu$ is large, it becomes the gradient descent algorithm with a small step size.

For this study, the input data were 18 big data monitoring indicators for each city, and the expected output data were their SDG Index scores calculated using statistical data. A total of 254 cities were randomly divided into 3 groups as follows: 70% were used for training, 15% were used to validate and stop training the network before overfitting, and 15% were used as an independent dataset to test the generalised network. By using MATLAB (R2021a), the number of hidden layers and the number of neurons in each layer were adjusted several times to improve the accuracy of the results. The final adopted ANN model consisted of an eighteen-neuron input layer, two nine-neuron hidden layers, and a one-neuron output layer (see details in the Supplementary Results section). Compared with a single hidden layer structure, the multihidden layer structure is more suitable for mapping complex relations due to its stronger generalisation capability and higher prediction accuracy⁵¹.
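The random 70/15/15 division of the 254 cities can be sketched as follows. The function name, the seed, and the rounding rule for the group sizes are assumptions; the text does not report the exact group sizes, and MATLAB's own data division routine may round differently.

```python
import random


def split_cities(city_ids, train=0.70, val=0.15, seed=0):
    """Shuffle the ids and cut them into training, validation, and test groups."""
    ids = list(city_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train * len(ids))
    n_val = round(val * len(ids))
    # Whatever remains after the first two groups forms the test set
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_cities(range(254))
print(len(train_ids), len(val_ids), len(test_ids))  # 178 38 38
```

Fixing the seed makes the split reproducible, which matters when the number of hidden layers and neurons is tuned over repeated runs.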

Data availability

The data generated and analysed supporting the findings of this study are accessible at https://doi.org/10.6084/m9.figshare.22005461. Raw data are available from the following sources. Statistical data are available from China national population sample survey in 2015, City-level Statistical Communique on National Economic and Social Development (2017) (https://www.cnstats.org/tjgb/), China City Statistical Yearbook (2016-2018), and China Urban Construction Statistical Yearbook (2016-2018) (https://data.cnki.net/Yearbook). Carbon emission data were collected from the Carbon Emission Account & Datasets (https://www.ceads.net/data/county/). Data related to government performance are available from the Research Report on Financial Transparency of Municipal Governments in China (https://www.sppm.tsinghua.edu.cn/__local/4/EE/0C/08CAEBBFCA6ABF51DEED995A9B5_3A0F9DEB_5E32BA.pdf?e=.pdf) and the Ranking of Political and Business Relations in Chinese Cities (http://nads.ruc.edu.cn/zkcg/ndyjbg/c9ad75bec3024ec0bb24e4fc6b7d3c14.htm). Marine data are available from the Bulletin on Ecological and Environmental Quality of China’s Coastal Waters (https://www.mee.gov.cn/hjzl/sthjzk/jagb/201808/P020191217742220289047.pdf). Company information big data are available from TianYanCha.com (https://www.tianyancha.com/). POI data are available from Amap.com (https://lbs.amap.com/api/webservice/guide/api/search). Road network data are available from OpenStreetMap (https://download.geofabrik.de/asia/china.html). Population data are available from WorldPop (https://www.worldpop.org/geodata/summary?id=24923). Land use data are available from the Geographical Information Monitoring Cloud Platform (http://www.dsac.cn/DataProduct/Detail/200804). Nighttime light data are available from the Earth Observations Group (EOG) (https://eogdata.mines.edu/nighttime_light/annual/v20/).

Code availability

The original code of the ANN model was generated by the Neural Fitting App of MATLAB (R2021a) (https://www.mathworks.com/help/deeplearning/ref/neuralnetfitting-app.html).

References

United Nations. Sustainable Development Goals: 17 Goals to Transform Our World https://www.un.org/sustainabledevelopment/sustainable-development-goals/ (UN, 2015).
Lafortune, G. et al. The 2019 SDG index and dashboards report for European Cities https://www.sdgindex.org/reports/sdg-index-and-dashboards-report-for-european-cities/ (Sustainable Development Solutions Network, 2019).
Wiedmann, T. & Allen, C. City footprints and SDGs provide the untapped potential for assessing city sustainability. Nat. Commun. 12, 3758 (2021).
Hu, F. Z. Global city development and urban wage inequality in China. Asian Geogr. 38, 73–91 (2021).
Gariazzo et al. A multi-city air pollution population exposure study: Combined use of chemical-transport and random-Forest models with dynamic population data. Sci. Total Environ. 724, 138102 (2020).
Jenks, M. J. et al. Compact cities: Sustainable urban forms for developing countries. Taylor & Francis (2000).
Shenzhen Municipal Government. Sustainable Development Plan of Shenzhen (2017-2030) http://www.sz.gov.cn/zfgb/2018/gb1052/content/mpost_5018701.html (Shenzhen Municipal Government, 2018).
Mayor of the City of Bonn. Voluntary Local Review: Agenda 2030 on the local level. In: Implementation of the UN Sustainable Development Goals in Bonn. https://sdgs.un.org/sites/default/files/2020-10/Voluntary-Local-Review-Bericht-englisch.pdf (Mayor of the City of Bonn, 2020).
Sachs, J., Schmidt-Traub, G., Kroll, C., Lafortune, G. & Fuller, G. SDG Index and Dashboards Report 2018 https://www.sdgindex.org/reports/sdg-index-and-dashboards-2018 (Pica, 2018).
Sachs, J., Kroll, C., Lafortune, G., Fuller, G. & Woelm, F. The Decade of Action for the Sustainable Development Goals: Sustainable Development Report 2021 https://www.sdgindex.org/reports/sustainable-development-report-2021/ (Cambridge: Cambridge University Press, 2021).
Allen, C. et al. Indicator-based assessments of progress towards the sustainable development goals (SDGs): a case study from the Arab region. Sustain. Sci. 12, 975–989 (2017).
Allen, C., Reid, M., Thwaites, J., Glover, R. & Kestin, T. Assessing national progress and priorities for the Sustainable Development Goals (SDGs): experience from Australia. Sustain. Sci. 15, 521–538 (2020).
Xu, Z. et al. Assessing progress towards sustainable development over space and time. Nature 577, 74–78 (2020).
Newig, J. et al. Communication regarding sustainability: Conceptual perspectives and exploration of societal subsystems. Sustainability 5, 2976–2990 (2013).
Miola, A. & Schiltz, F. Measuring sustainable development goals performance: How to monitor policy action in the 2030 Agenda implementation? Ecol. Econ. 164, 106373 (2019).
UN-Habitat. Voluntary Local Reviews https://unhabitat.org/topics/voluntary-local-reviews (UN-Habitat, 2021).
Lynch, A., LoPresti, A. & Fox, C. The 2019 US Cities Sustainable Development Report. https://www.sustainabledevelopment.report/reports/2019-us-cities-sustainable-development-report/ (Sustainable Development Solutions Network, 2019).
Wamba, S. F., Akter, S., Edwards, A., Chopin, G. & Gnanzou, D. How ‘big data’ can make big impact: Findings from a systematic review and a longitudinal case study. Int. J. Prod. Econ. 165, 234–246 (2015).
Avtar, R., Aggarwal, R., Kharrazi, A., Kumar, P. & Kurniawan, T. A. Utilizing geospatial information to implement SDGs and monitor their Progress. Environ. Monit. Assess. 192, 1–21 (2020).
Kashyap, R., Fatehkia, M., Tamime, R. A. & Weber, I. Monitoring global digital gender inequality using the online populations of Facebook and Google. Demogr. Res. 43, 779–816 (2020).
Allen, C., Smith, M., Rabiee, M. & Dahmm, H. A review of scientific advancements in datasets derived from big data for monitoring the Sustainable Development Goals. Sustain. Sci. 16, 1701–1716 (2021).
Keola, S., Andersson, M. & Hall, O. Monitoring economic development from space: using nighttime light and land cover data to measure economic growth. World Dev. 66, 322–334 (2015).
Elvidge, C. D. et al. A global poverty map derived from satellite data. Comput. Geosci. 35, 1652–1660 (2009).
Ivan, K., Holobâcă, I.-H., Benedek, J. & Török, I. Potential of night-time lights to measure regional inequality. Remote Sens. 12, 33 (2020).
Ghazaryan, G. et al. Monitoring of urban sprawl and densification processes in Western Germany in the light of SDG Indicator 11.3.1 based on an automated retrospective classification approach. Remote Sens. 13, 1694 (2021).
Asadikia, A., Rajabifard, A. & Kalantari, M. Systematic prioritisation of SDGs: Machine learning approach. World Dev. 140, 105269 (2021).
Gue, I. H. V., Ubando, A. T., Tseng, M. L. & Tan, R. R. Artificial neural networks for sustainable development: a critical review. Clean Technol. Environ. 22, 1449–1465 (2020).
Liu, Y., Huang, B., Guo, H. & Liu, J. Supplementary material for the article: A big data approach to assess progress towards Sustainable Development Goals for cities of varying sizes. figshare https://doi.org/10.6084/m9.figshare.22005461 (2023).
Mirghaderi, S. H. Using an artificial neural network for estimating sustainable development goals index. Manag. Environ. 31, 1023–1037 (2020).
Chen, J. et al. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 103, 7–27 (2015).
Sun, X. WWF-UK. 2018 China SDGs Indicators and Progress Assessment Report (Summary) https://www.wwfchina.org/content/press/publication/2019/SDG%20%E6%8A%A5%E5%91%8A%E8%8B%B1%E6%96%87%E7%AE%80%E6%9C%AC.pdf (World Wide Fund for Nature, 2018).
Pomati, M. & Nandy, S. Measuring multidimensional poverty according to national definitions: operationalising target 1.2 of the Sustainable Development Goals. Soc. Indic. Res. 148, 105–126 (2020).
Ram, R. Attainment of multidimensional poverty target of sustainable development goals: a preliminary study. Appl. Econ. Lett. 28, 696–700 (2021).
Huang, C. L. et al. Big earth data supports sustainable cities and communities: progress and challenges. [in Chinese]. Bull. Chin. Acad. Sci. 36, 914–922 (2021).
Zhou, C., Gong, M., Xu, Z. & Qu, S. Urban scaling patterns for sustainable development goals related to water, energy, infrastructure, and society in China. Resour. Conserv. Recycl. 185, 106443 (2022).
Gai, K. Study on The Coordination between Ecological Environment and Economic Development in West China. [in Chinese]. PhD thesis, Southwestern University of Finance and Economics (2008).
Wang, X. & Team, T. C. S. Reprint of “China geochemical baselines: Sampling methodology”. J. Geochem. Explor. 154, 17–31 (2015).
Ortuño-Padilla, A., Espinosa-Flor, A. & Cerdán-Aznar, L. Development strategies at station areas in Southwestern China: the case of Mianyang city. Land Use Policy 68, 660–670 (2017).
Lei, W., Jiao, L., Xu, G. & Zhou, Z. Urban scaling in rapidly urbanising China. Urban Stud. 59, 1889–1908 (2022).
Brelsford, C., Lobo, J., Hand, J. & Bettencourt, L. M. Heterogeneity and scale of sustainable development in cities. Proc. Natl Acad. Sci. 114, 8963–8968 (2017).
Keuschnigg, M., Mutgan, S. & Hedström, P. Urban scaling and the regional divide. Sci. Adv. 5, eaav0042 (2019).
Fang, C. & Yu, D. Urban agglomeration: An evolving concept of an emerging phenomenon. Landsc. Urban Plan. 162, 126–136 (2017).
Tian, Y. et al. Regional industrial transfer in the Jingjinji urban agglomeration, China: An analysis based on a new “transferring area-undertaking area-dynamic process” model. J. Clean Prod. 235, 751–766 (2019).
United Nations Statistics Division. SDG Indicators https://unstats.un.org/sdgs/indicators/indicators-list (UNSD, 2017).
Wang, Y. et al. Spatial variability of sustainable development goals in China: A provincial level evaluation. Environ. Dev. 35, 100483 (2020).
Ma, Y. J. & Ai, X. P. Evaluation of sustainable urbanization development in Jilin province based on the 2030 sustainable development goals (in Chinese). Sci. Geogr. Sin. 39, 487–495 (2019).
Chen, J. et al. Deqing’s Progress Report on Implementing the 2030 Agenda for Sustainable Development https://unhabitat.org/sites/default/files/2021/06/deqing_2017_en.pdf (Deqing, 2018).
Dawood, T., Elwakil, E., Novoa, H. M. & Delgado, J. F. G. Toward urban sustainability and clean potable water: Prediction of water quality via artificial neural networks. J. Clean Prod. 291, 125266 (2021).
Cui, K. & Jing, X. Research on prediction model of geotechnical parameters based on BP neural network. Neural. Comput. Appl. 31, 8205–8215 (2019).
Wang, J. Z., Wang, J. J., Zhang, Z. G. & Guo, S. P. Forecasting stock indices with back propagation neural network. Expert Syst. Appl. 38, 14346–14355 (2011).
Paliwal, M. & Kumar, U. A. Neural networks and statistical techniques: A review of applications. Expert Syst. Appl. 36, 2–17 (2009).
Li, X., Fong, P. S., Dai, S. & Li, Y. Towards sustainable smart cities: An empirical comparative assessment and development pattern optimization in China. J. Clean Prod. 215, 730–743 (2019).
Deng, Y., Xiao, H., Xu, J. & Wang, H. Prediction model of PSO-BP neural network on coliform amount in special food. Saudi J. Biol. Sci. 26, 1154–1160 (2019).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2022YFB3903700) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA19090108).

Author information

Authors and Affiliations

Department of Geography and Resource Management, The Chinese University of Hong Kong, Hong Kong, China
Yu Liu & Bo Huang
Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Hong Kong, China
Bo Huang
Department of Sociology, The Chinese University of Hong Kong, Hong Kong, China
Bo Huang
International Research Center of Big Data for Sustainable Development Goals, Beijing, China
Huadong Guo
Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China
Huadong Guo
Center for Systems Integration and Sustainability, Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI, USA
Jianguo Liu

Contributions

B.H. and Y.L. designed the research. Y.L. contributed and checked the data. Y.L. and B.H. built the models and carried out analyses. Y.L. and B.H. wrote the original manuscript. Y.L., B.H., H.G., and J.L. interpreted the findings and revised the manuscript.

Corresponding author

Correspondence to Bo Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Earth & Environment thanks Bhavya Alankar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Joe Aslin.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, Y., Huang, B., Guo, H. et al. A big data approach to assess progress towards Sustainable Development Goals for cities of varying sizes. Commun Earth Environ 4, 66 (2023). https://doi.org/10.1038/s43247-023-00730-8

Received: 30 April 2022
Accepted: 23 February 2023
Published: 10 March 2023
DOI: https://doi.org/10.1038/s43247-023-00730-8
