Introduction

The past few decades have been characterized by extensive urbanization in many parts of the world, and especially in developing countries1,2,3,4. While urbanization has been linked to rapid economic growth, it has also been associated with negative environmental (e.g. pollution, ecosystem degradation)2,5,6,7,8,9,10,11 and socio-economic impacts such as the unavailability or lack of access to public services (e.g. education, healthcare, transport, waste management, sanitation)12,13,14. Such negative effects can be the outcome of various planning, socioeconomic, and political factors, and result in some areas modernizing rapidly when urbanizing, while others remain stuck in unsustainable urban development paths15,16. Some scholars have argued that the quality of life of urban residents decreases as urban agglomerations increase in size3,17, while others have argued that it improves due to scaling effects related to economic prosperity and community wellbeing18,19.

In this context, urban livability has emerged as an important concept in urban policy and practice discourses around the world, including for cities that are rapidly expanding or planned from scratch20,21,22. Urban livability is a multi-dimensional concept that mirrors the overall quality of life in an urban setting23,24,25 (see “Conceptual framework of urban livability” for definitions and dimensions). Beyond dimensions related to the availability of or access to services, urban livability can be influenced by demographic transitions (e.g. population aging and shrinking), socioeconomic change (e.g. boom-and-bust economic cycles), cultural factors, and environmental change26,27,28,29.

The assessment of urban livability has been attracting similar attention30. A growing number of studies have explored urban livability using very diverse conceptual frameworks and methodological approaches, which are usually influenced by the thematic or geographical focus of the study. For example there has been a proliferation of urban livability city rankings that use diverse indicators to capture different aspects of urban livability at the city scale31,32. Some academic studies have adopted similarly aggregated approaches to assess urban livability, developing composite urban livability indices. Examples include rankings/comparisons of European cities through composite urban livability indices33,34,35,36 or the distribution patterns of such indices in Australian cities37 or Singapore38. Studies have also explored relationships between composite urban livability indices (or their individual indicators) with different aspects linked to urban activity such as transport choice in Melbourne39 or urban form in Vancouver40.

Such conventional approaches to urban livability assessment mostly rely either on data for a limited set of indicators or on highly aggregated data that are poorly adapted to the wealth of the increasingly available spatially-explicit demographic, socio-economic and Point-of-Interest (POI) data in cities around the world. Such spatially disaggregated datasets can provide rich information categorized by type and location, and as their availability and quality improve globally they are increasingly applied in tourism studies and urban planning41,42,43,44. Arguably, POI data-driven approaches can also be used for assessing urban livability, and have the advantage of establishing geographic patterns of urban livability45. For example, studies have used a Levenberg–Marquardt backpropagation algorithm (LMBP) to assess residential livability patterns using spatial POI data (in Yinchuan City)46 or developed a life convenience index using Analytical Hierarchy Process (AHP) analysis with POI data (in Guangzhou and Shenzhen)47. Studies have also used POIs for more focused applications, such as to relate urban livability with urban transport networks, urban form and social activities (in Hong Kong)48. Collectively, such approaches to urban livability assessment reduce reliance on potentially unavailable or inconsistent official statistics and enable the use of open-source data that is readily available and frequently updated.

However, despite the growing availability and popularity of open data-driven approaches to studying urban phenomena (incl. urban livability)46,47,49, there are substantial gaps in the literature. The main challenge often lies in the difficulty of meaningfully reconciling and combining different open-source datasets due to differences in spatial resolution, coverage, quality or accessibility. Other challenges include data overload, decision paralysis, and the failure to properly generalize or utilize large open-source datasets for livability assessment50.

China offers an ideal setting for developing and implementing data-driven approaches for the assessment of urban livability. Rapid urbanization and urban transformation have increased the demand for urban livability assessments, while the relevant research environment is rapidly evolving. Notably, rapid urbanization has caused major challenges such as air pollution51, traffic congestion52, and limited accessibility to amenities53,54,55. Although the livability of Chinese cities shows signs of improvement, many cities are still characterized by low urban livability, with for example residents expressing moderate satisfaction, particularly concerning transportation20,26,56. Previous studies have proposed assessment methods, but they often lean towards traditional highly aggregated techniques57,58. However, political constraints usually limit access to high-quality spatially-disaggregated socioeconomic data. This challenge can be alleviated by the increasing availability of open-source spatial data, especially given China’s technological advancement in the past decades. However, even when such data is used there are constraints posed by data integration and/or low spatial resolution46,47,59,60, with the full capabilities of POIs remaining underutilized and studies often overlooking crucial factors of urban livability such as housing quality (e.g. prices, building ages).

Here we develop and apply a data-driven approach to assess urban livability using POIs and other readily available open-source spatially-explicit data. We focus on Shanghai, which is a city characterized by significant variability in terms of socioeconomic characteristics and access to services within its boundaries61. Our approach combines open-source data related to residential building clusters (RBC), population distribution, POIs, and other infrastructure-related aspects (i.e. transportation network) (see Methods). Firstly, we employ advanced data processing techniques to clean and filter only relevant datasets to minimize the noise and redundancy in the data, ensuring that only pertinent information is retained. Secondly, we use dimensional reduction techniques suggested in the literature50 to generalize the open-source data to ensure that our analysis is robust yet interpretable. This way, we effectively circumvent the problem of decision paralysis and extract meaningful insights about urban livability. Beyond outlining the main patterns of urban livability within Shanghai and the factors affecting them (see Results), we critically discuss the findings and potential of this type of open-source data-driven approach for assessing urban livability in China and other rapidly urbanizing areas in developing and emerging economies (see “Discussion”).

Results

Characteristics and distribution of RBCs and POIs in Shanghai

Figure 1 shows the spatial distribution of RBCs in Shanghai, as derived from the real estate data. Combining this with the population density data, we identify a rather uneven distribution with different clusters depending on different city attributes. Overall, due to its rapid urbanization, Shanghai has experienced a significant concentration of population and increasing property prices. Before the 1980s, the urban area of Shanghai was mainly located in Puxi, on the west bank of the Huangpu River, with the Pudong area on the eastern bank of the river beginning to urbanize in the following decades. This is clearly shown in the distribution of the population, housing price and age of the RBCs, as outlined in Fig. 1a–d.

Fig. 1: Spatial distribution and characteristics of RBCs in Shanghai.

Panel (a): Population; Panel (b): Price; Panel (c): Building age; and Panel (d): Spatial distribution of RBCs in relation to bus routes and roads.

In more detail, Fig. 1a suggests that the majority of the population is concentrated in the northwest area of the city, while the rest of the population is relatively evenly distributed. The largest concentration in population in RBCs (Level 9) is found in the northwest region. Here, levels 1 to 9 indicate the RBC’s population density (increasing scale) and refer to 100 m population mesh data from Worldpop’s China population 202062. Figure 1b presents the distribution of house prices in the RBCs scattered in the urban area of Shanghai. With the Bund as the center, RBCs with the highest prices cluster in a circular pattern, and in general, the average house price in the western region of Shanghai is higher than in the surrounding areas. In terms of building age in the RBCs, there are no significant differences across the study area. Then, Fig. 1c illustrates that most of the RBCs were built after the 1980s when the rapid urbanization of Shanghai started, with newer RBCs emerging in the southeast areas such as Pudong that were developed later. However, we still can observe some areas with much older RBCs in areas such as Jingan and Hongkou. Moreover, Fig. 1d presents a detailed layout of Shanghai’s transportation and infrastructure network in relation to RBCs distribution. It highlights the interconnectedness of RBCs with the city’s bus and road networks, illustrating the mobility options available for residents.

Table 1 outlines the number of POIs by type and district. The results indicate that named entities are the most frequent POI types in Shanghai, accounting for 14.88% of the total identified POIs. This could be attributed to the fact that Address entities are generally the names of geographic entities, including villages, buildings, roads, and bridges, among others. The second most frequent POI type is shops, accounting for around 13.42% of the total, followed by life service (communication service station, post office, ticket shop, laundry shop, print shop, real estate agency, public service, public toilet, etc.; accounts for 10.57%), indoor facilities (10.26%), and real estate (a set of office buildings, residential areas, dormitories, private buildings of enterprises, etc.; accounts for 9.68%). Specifically, daily life services include services such as hairdressing, mailing, laundry, and grid services, among others, while the types of activities included in indoor services are similar to those of life services, with the exception that these are only provided within buildings.

Table 1 Types and distribution of POIs in the different regions of Shanghai

Distribution of livability dimensions

Below we provide scores for each of the major livability aspects reflected by the POIs in Table 1. For more straightforward reference, Figs. 2, 3 provide livability distribution scores according to the 5 main categories of POIs (see “Methods”). Then, we report the results of the spatial analysis using the SDE method (see “Methods”) to show livability trends within Shanghai (Fig. 4).

Fig. 2: Livability score distribution in 1 km concentric circles around RBCs.

Each panel represents livability score distribution in the study area based on a comparatively higher preference for each specific urban livability dimension (see “Methods”). The livability dimensions include: education (Panel a), medical services (Panel b), recreation (Panel c), living services (Panel d) and transportation services (Panel e). In each of these panels, we keep constant the age, price, population density, and access to transport for each RBC, and we change the relative weights among the POI categories to reflect areas that are better in terms of access for the specific POI category. Areas denoted in dark red and blue color represent respectively the areas of Shanghai with the highest and the lowest value of urban livability for each dimension.

Fig. 3: Livability score distribution in 2 km concentric circles around RBCs.

Each panel represents livability score distribution in the study area based on comparatively higher preference for each specific urban livability dimension (see “Methods”). The livability dimensions include: education (Panel a), medical services (Panel b), recreation (Panel c), living services (Panel d) and transportation services (Panel e). In each of these panels we keep constant the age, price, population density and access to transport for each RBC and we change the relative weights among the POI categories to reflect areas that are better in terms of access for the specific POI category. Thus areas denoted in dark red and blue color represent respectively the areas of Shanghai with the highest and the lowest value of urban livability for each dimension.

Fig. 4: Livability trend analysis of 1 and 2 km radius.

The ellipses denote areas with high livability scores for each dimension. The main map depicts livability trends within a 1 km radius using solid-line circles. The inset map presents livability trends for the 2 km radius analysis. Areas falling within multiple circles denote areas with high livability scores for multiple dimensions.

In terms of the major POI categories, Fig. 2 shows the livability distribution of major areas calculated within 1 km surrounding each RBC. Firstly, Panels 2a and 2b show similar scores and patterns for education and medical service. Since education and medical services are the major types of public services in China, these two types of POIs are distributed almost in the form of bundles around RBCs, showing a similar spatial distribution. Additionally, because the calculation of livability takes into account the potential diversity of educational and medical entities from the POI data, similar scores indicate that the educational and medical resources are concentrated and not diversely distributed in the city. Note that the Low-to-High legend in Fig. 2 indicates the recommendation level of areas in Shanghai associated with the specific livability dimension.

In terms of recreation livability (see results in Panel 2c), Shanghai has relatively few relevant POIs overall. These are clustered mainly in the western region of the city, which is in line with the population and house price distribution of RBCs. The livability scores for transportation (Panel 2e) and living service livability (Panel 2d) show opposite trends against the distribution of RBCs. This happens because RBCs with excessively clustered POIs will receive lower scores due to the potential congestion within a certain geographic distance. The livability scores regarding living service are generally evenly distributed throughout the city, with the average score for this dimension being the highest in the downtown area of Shanghai, where residents have easier access to diverse and numerous services.

Interestingly, the livability score distribution follows different patterns if estimated across concentric circles with a 2 km radius around RBCs (Fig. 3). The most visible difference compared to Fig. 2 relates to the distribution of livability scores associated with education and medical services. The highest livability for education service is located in the middle of the downtown area, while that of medical service is located in the north. Also, there are more RBCs with higher livability scores in medical service than in education service.

Next, livability trend analysis is delineated in Fig. 4 by confining RBCs with livability results within the radius of 1 and 2 km areas with varied residential demands. The trend of livability scores in Shanghai city follows an east-west distribution, with no substantial difference between the analysis for 1 km and 2 km radius. With the exception of recreation, we observe that the high livability areas are relatively consistent, and are contained in the central northwest areas of the city (e.g. Hongkou district, Yangpu district, etc.). Conversely, areas with high livability in terms of recreation are mostly concentrated in a relatively smaller area with several clusters, as the relevant POI data are mainly concentrated in the older districts that contain old temples, churches, and gardens, which tend to be more distant from RBCs (e.g. Huangpu district). The largest dispersion comes from the transport service, involving residents’ commuting needs in the downtown area of Shanghai. The livability distribution of transport service covers the largest area as well, in line with the characteristics of a metropolis with intensive needs of commuting and accessibility.
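To illustrate how a standard deviational ellipse (SDE) of the kind drawn in Fig. 4 can be obtained from a set of point locations, the following minimal Python sketch may help. It is not the authors' implementation: it assumes projected coordinates in metres, uses the population covariance of the points, and returns a one-standard-deviation ellipse. The function name `standard_deviational_ellipse` and the sample points are hypothetical.

```python
import math

def standard_deviational_ellipse(points):
    """Return (center, semi_major, semi_minor, theta) for 2D points.

    Assumptions (not taken from the paper): coordinates are projected
    (e.g. metres), the covariance is divided by n, and theta is the angle
    of the major axis in radians, counter-clockwise from the x (east) axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")

    # Mean centre of the point set.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n

    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n

    # Closed-form eigen-decomposition of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)

    semi_major = math.sqrt(half_trace + spread)
    semi_minor = math.sqrt(max(half_trace - spread, 0.0))
    return (mx, my), semi_major, semi_minor, theta

# Hypothetical points stretched along the east-west axis.
sample = [(-2.0, 0.0), (2.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
center, semi_major, semi_minor, theta = standard_deviational_ellipse(sample)
```

For the sample points the major axis lies along the x axis (theta equal to zero), which corresponds to an east-west orientation. GIS packages often apply additional scaling factors to the axes, so absolute ellipse sizes may differ from this sketch.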

Discussion

Trends in urban livability patterns

Shanghai is one of the most developed and populous Chinese cities. Its rapid urbanization has been characterized by rapid economic and population growth in the past decades, particularly through the increased migration from rural areas26,63. As a result, the local government is facing the challenge of improving livability within the city. In this study, we provide a data-driven approach to estimate urban livability through multiple open-source datasets related to real estate, population, and POIs. By taking RBCs as the starting point of this research (Fig. 1), we estimate livability scores by integrating information on RBC characteristics, population density, access to transport, and availability and diversity of POIs, within a 1 km and 2 km radius of RBCs. Although livability scores are at the individual RBC level, for the purpose of this paper, we elicit livability patterns at the city scale. For this reason, we estimate five categories of livability scores that reflect the main functions of the POIs, namely education, medical/health services, recreation, transportation, and living services (Figs. 2, 3).

In more detail, the results show that, other than recreation service, livability scores are spatially distributed east-west in the downtown area of the city. In this sense, the spatial distribution of the livability dimensions is uneven. Living services have the highest livability score on average and are followed by transportation services, both of which are essential for the needs of citizens. Livability scores for living service are the highest in the downtown areas of Shanghai, and are highly correlated with the living habits of citizens. Recreation areas in the downtown of the city are mainly concentrated in the west, in contrast to the east-west distribution of RBCs.

Medical and educational facilities (and related services) are relatively dispersed and mainly concentrated in the western part of the city. In general, different RBCs are matched with different medical and educational facilities, and this, in consequence, affects to a large degree the housing price and the density of the RBCs. This is consistent with a study that examined 35 major Chinese cities and concluded that Shanghai is one of the three worst cities in terms of convenience (including the availability of and access to educational and medical facilities/services)64.

Improving access to medical and educational facilities/services would be critical for enhancing urban livability in city areas that have a low density of (or are far from) such facilities/services, considering that accessing them beyond a 2 km distance from the respective RBCs could be costly and inconvenient. In this sense, an expansion of medical and educational facilities to the north, east and south should be considered. However, this would require substantial and coordinated action, which is not necessarily without precedent in China. For instance, from 2009 onwards, the “Action Plan for Speeding up the Development of the South Area” has been implemented in Beijing to improve the development and livability of the southern part of the city, partly through the development of businesses, and educational and medical facilities65.

Beyond these critical services, it becomes important to ensure the comfort and convenience of citizens by enhancing the diversity and availability of living services and recreation opportunities66. Increasing government investment in urban infrastructure and economic incentives for such services can become a feasible means to approach these issues67,68.

Here, we observe that the uneven spatial distribution of recreation opportunities indicates the need for additional recreation and leisure areas in some parts of the city. This could be achieved by increasing green spaces in an east-west spatial distribution (Fig. 4). However, many rapidly urbanizing cities suffer from the lack of green spaces, with efforts to develop large-scale green spaces69 often challenged by the lack of available space70. As even small green areas can positively contribute to the well-being of urban residents and essentially increase the livability of urban areas71,72, increasing their quantity and quality has become a major task for urban planners in the efforts to enhance urban livability73,74,75,76.

Despite the past efforts to increase the extent of public green spaces in Shanghai77, access remains an issue78. Despite the existence of a number of green spaces in the part of Shanghai that we have investigated, their spatial distribution is unbalanced, which can negatively contribute to the equality of access to green spaces79. This is also reflected in other studies, pointing out that the unequal access to public green spaces in Shanghai is largely due to either their shortage following historical city planning legacies or the limited access to existing green spaces due to the preferential development of socioeconomic status78. Since equality of access to green spaces is an essential aspect of urban livability, as pointed out by the expanding literature on sustainable and livable cities80,81,82,83, urban planners should consider how to ensure the spatially equal distribution of recreation spaces to ensure that the livability of some areas is not compromised.

We should point out that due to the unique methodological approach adopted in this study, our results are not directly comparable with other spatially-explicit urban livability studies. However, we see certain similarities with other studies in terms of the clear urban livability differences between city areas discussed above. Such differences have been identified in other spatially-explicit urban livability assessments in cities as diverse as Singapore38, Melbourne39, Vancouver40, and Wuhan49 among several others. Similar to our study, these studies have identified hotspots for the different urban livability dimensions, which gives rise to livability trade-offs in some areas.

Furthermore, our study expands on other studies that have explored different dimensions of urban livability in Shanghai. Similar to ours, such studies have identified the interconnected nature of urban livability challenges in the city, and the centrality of the transport sector84,85, and the built environment and green spaces86,87, all of which are considered in our study. The areas of Shanghai that exhibit higher urban livability (for all or some dimensions) are to some extent similar to the areas of Shanghai showing greater urban vitality88 and street livability89, despite the rather distinct methodological approaches adopted in these studies.

Lessons learned and limitations

Collectively, this analysis points to the fact that the combination of these datasets can provide a valuable lens for assessing urban livability and identifying sub-city patterns, which is generally lacking from aggregate urban livability assessments at the entire city level20,90,91. Current spatial analyses of Shanghai predominantly concentrate on aspects pertaining to air pollution84, mobility85, and temperature86,87, while our study specifically addresses the spatial patterns of urban livability in Shanghai. Our data-driven approach with open-source data can help circumnavigate the constraints posed by the lack of availability of spatially explicit socioeconomic data (see “Introduction”). As a result, our data-driven approach can offer a valuable and complementary lens for identifying “gaps” in livability and offering recommendations for the needs of urban residents. Despite its comprehensiveness, the proposed analytical approach has certain challenges and limitations that need to be considered when interpreting our results.

The first limitation relates to the extraction, manipulation and integration of various spatially explicit datasets with different coverage, resolution and focus. For example, in order to circumnavigate the lack of available data about building and population distribution, for major indicators of livability such as housing age, price and location, we focused our analysis at the RBC level that contains multiple buildings (rather than the individual building level), and then we superimposed population on these RBCs using a different dataset. This approach creates certain uncertainties given the different spatial resolutions of the different datasets, which should be considered when interpreting the results.

Secondly, despite the multi-dimensional approach to urban livability assessment, it was not possible to consider some important aspects of urban livability. For example, it was not possible to incorporate eco-environmental indicators such as temperature, hydrology, air quality (e.g. ambient concentration of NOx, PM2.5 and other pollutants), or waste management92. Although air quality is an important dimension of urban livability, due to the approach used to estimate livability scores and the format/extent of the publicly available ambient air quality data in Shanghai, it was not possible to map air pollution data in the right form by gridding or rasterizing them. Similarly, recreation is mostly captured through green spaces, as currently available data did not enable us to integrate in the analysis other urban spaces with recreational potential such as public squares, sports facilities, cultural venues, and waterfront areas. In the future, more granular air quality and public space datasets could be utilized to consider added aspects of urban livability related to environmental quality and recreation, as a means of providing more comprehensive urban livability assessments.

Thirdly, though the RBC data are extracted from the largest and most comprehensive public data platform on real estate, it is still possible that not all residential building information was publicly available online at the time of data extraction. Although this may affect the reliability of our samples, we believe that the results provide a good representation of urban livability patterns within the city. Future studies should include building information from multiple sources to improve the accuracy of the livability evaluation.

Fourth, it is not possible to establish a composite livability score across all dimensions of livability. The reason is that the denominator of each livability score (see Eq. 10 in “Methods”) is calculated from a variant of the Gini-Simpson method, and its value ranges between 0 and 1. This value represents the likelihood that the area is surrounded by POIs of different types, with lower values denoting a high concentration of POIs of specific types.
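For readers unfamiliar with the Gini-Simpson index, the following minimal Python sketch shows its standard form, 1 minus the sum of squared type shares, which can be read as the probability that two randomly drawn POIs are of different types. The study uses a variant of this method that is not reproduced here, so the function `gini_simpson` and the POI type labels below are illustrative assumptions only.

```python
from collections import Counter

def gini_simpson(poi_types):
    """Standard Gini-Simpson diversity index: 1 - sum(p_i ** 2).

    `poi_types` is a list of POI type labels around one location.
    The value lies in [0, 1); values near 0 indicate that one type
    dominates, higher values indicate a more diverse mix of types.
    Note: this is the textbook form, not the paper's variant.
    """
    counts = Counter(poi_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical POI type lists around two locations.
mixed = ["school", "clinic", "park", "shop"]
concentrated = ["shop", "shop", "shop", "clinic"]

mixed_score = gini_simpson(mixed)
concentrated_score = gini_simpson(concentrated)
```

In this example the evenly mixed list scores higher than the list dominated by a single type, matching the interpretation that lower values denote a high concentration of specific POI types.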

Finally, we must point out that this study demonstrates the feasibility of utilizing readily available open-source data to assess urban livability at the sub-city level. By relying on open-source data, this method can be applied in cities facing a lack of spatially explicit official data or low-quality/resolution spatial data. This approach offers a comprehensive means to identify priority areas for urban planning in Shanghai and potentially in other cities in China and beyond, provided relevant datasets are accessible. However, it is essential to acknowledge that, although our methodology can be adapted for broader applications, our analysis is specific to Shanghai’s geographical context. The unique socio-cultural, economic, and infrastructural characteristics of each city necessitate caution when comparing and generalizing such findings.

Methods

Conceptual framework, research approach and dataset structure

The growing consensus within empirical and conceptual urban livability studies (see also citations in Introduction) indicates that urban livability is a multi-dimensional concept, with no single or unified definition and conceptualization. The over-emphasis on economic aspects in urban livability metrics has been criticized within this literature as failing to account for the externalities of economic growth in urbanizing contexts25,93. Some of the dominant approaches to urban livability assessments such as city livability rankings have been criticized as highly aggregated and failing to provide the holistic understanding needed about how interrelationships between different aspects of the urban environment contribute to urban livability93.

Considering these points, it is both challenging and contentious to assess urban livability through holistic and commonly agreed approaches and metrics25,94,95. The growing consensus within the literature96,97,98 suggests that the design and implementation of urban livability assessments/metrics should: (a) consider multiple dimensions of urban livability, (b) reflect the specific context of the assessments and the goals of the researchers, and (c) integrate insights/methods from multiple fields. Broadly speaking, urban livability assessments should consider dimensions that are integral to the quality of life, such as a healthy living environment, good access to transport/amenities/services, and interactions between citizens and their surroundings28,66,99.

Some of the dimensions commonly considered in urban livability assessments have included the availability, access, and quality of social services, amenities, green spaces, and transport, as well as the broader socio-cultural environment and security20,25,93. Access to transportation has been very prominent in urban livability assessments100, as well as characteristics of the built-up form and spatial quality, among others101.

Considering the main dimensions/aspects of urban livability as considered in both the international and the Chinese literature, and our aim of showing how to leverage open-source data to capture some of these dimensions, here we focus on three specific dimensions of urban livability:

(a) housing characteristics;

(b) accessibility to transport;

(c) availability and accessibility to multiple amenities and services (specifically education, recreation, transportation, health and living services).

Considering the general lack of open-source and high-resolution population distribution and socioeconomic data at the city level in China (see constraints in the Introduction), we base our research approach on the level of residential building clusters (RBCs), and their access to transport options and POIs. This is because such data can be readily extracted from different online databases and maps that are both quite up to date and could be combined with transport and POI datasets (see Introduction). Here RBCs are defined as relatively independent areas (usually a community in China) with a certain scale of population and land, and a concentration of residential buildings, public buildings, green spaces, roads, and various facilities, surrounded by city streets or natural boundaries.

First, we assume that the building age and price are important elements of housing livability. On the one hand, the building age can be viewed as a proxy for basic residential quality and access to public services such as sanitation. For example, building age can be a good proxy of access to different sanitation services, as houses built prior to the 1990s seldom have a private flush toilet102,103. On the other hand, price is another important indicator of housing livability, as a higher property price could indicate better facilities and/or environmental privileges104, reflecting the generally higher willingness-to-pay for living in areas with higher livability105. For the purpose of this assessment, we extracted the location, sales prices and age of RBCs from Lianjia, which is the largest real estate brokerage firm in China106 (see “Data Collection” for more information on the underlying data). This was achieved through the development and application of a custom-made web-crawling program in the programming language Python. This information was extracted in July 2020 and was used to generate a citizen affluence indicator (see “Data Analysis” for more details) (Fig. 5).

Fig. 5 Research approach for the urban livability assessment.

Second, we assume that the population density in a certain residential area (i.e. in and around an RBC) is another crucial aspect affecting urban livability25. In particular urban livability might be affected in areas characterized by high population density due to crowding and lack of access to space, as well as higher competition for access to services105. For the purpose of this study, and considering the lack of readily available official datasets on population distribution (see Introduction), we use population data at 100 m x 100 m resolution derived from Worldpop (see https://www.worldpop.org/) (see “Data Analysis” and Fig. 5).

Third, as already discussed, access to transport services is a major dimension of urban livability (see Introduction and beginning of this section), as it affects all sorts of daily activities, including access to employment or recreational opportunities, among others107,108. In summary, for the purpose of this analysis, we extracted multi-layer geoinformation data from different open-source databases, including for road networks and bus lines (Fig. 5).

Fourth, access to different services is closely associated with the convenience and comfort of an area, and has been an integral dimension of many urban livability assessments (see Introduction and beginning of this section). POIs can point to locations that provide services catering to the needs of urban residents, ranging from shops to educational facilities, health facilities, sports venues, and green spaces. Arguably, the availability and diversity of POIs, and their distance from residential spaces, can affect urban livability, as residential neighborhoods with lower availability, diversity and access to such POIs could be considered less livable due to the lack of sufficient and necessary facilities and service providers59. In this sense, the presence and diversity of POIs in the vicinity of RBCs is a crucial element for the livability evaluation presented in this study. In summary, for the purpose of this study, we extract data for several types of POIs from the public map API (Fig. 5).

Open-source data for these urban livability dimensions is combined to generate livability scores for the different RBCs (Fig. 5) (see “Data Analysis”). We develop maps showing urban livability distribution across the city for 1 km and 2 km distances, which denote distances that RBC residents can comfortably traverse on foot. We should note that the radius of the assessment can determine livability scores, with different studies adopting different options according to their context. For example, studies have used a radius of 800 m when assessing livability aspects related to transit systems, as this value distinguishes whether people choose to walk or take public transit systems109,110,111. Other radius choices for transport have included 750 m for a metro-specific analysis and 1000 m for taxi-based trip analysis112. Similarly, five radii were selected to analyse livability at different aggregation levels in a study in Vancouver, Canada113. For different modes of commuting, 500 m was regarded as the threshold at which residents choose public transport, while 1500 m was the maximum distance for residents to walk to work114. Other studies have followed the “15 minute” concept, where a 15-min walking radius (about 850 m) has been proposed as appropriate for assessing the accessibility to essential services, while a radius choice of 2 km was applied around railway stations115. Here we assess the impact of accessibility on livability scores within a walking distance or a distance accessible by public transit systems. Reflecting the literature above, we use both 1 and 2 km to judge whether POIs fall within walking distance or not (see sub-section “Interpolation and direction analysis for POIs” below).

Study site

Shanghai is a major metropolitan area located in the Yangtze Delta in eastern China. It has a population of approximately 24.9 million residents as of 2021, up from 16.7 million in 2001116. It is a global financial hub with a Regional Domestic Product of >4,321.4 billion RMB (>669.8 billion USD) in 2021, increasing 12-fold since 2011116. The economy is dominated by the tertiary sector that accounts for about 75% of the Regional Domestic Product116. Considering its sheer size, economic performance and global connections, Shanghai is one of the most important cities in China. Furthermore, due to its economic dynamism, Shanghai has been rapidly growing over the past decades, being in a constant state of flux.

The city has 16 districts, 107 sub-districts, and 18,539 km of total road network, out of which 845 km are highways and 5,494 km are city roads117. In tandem with population increase and economic growth, there has been a huge boom in construction, with 59.8-92.3 Mm2 of new floor space constructed annually between 2011 and 2021116. Further to new construction, there are also significant renovation efforts, with the city completing the renovation of 753,000 m2 of old houses in the downtown area and the comprehensive renewal of 7.09 Mm2 of three types of old houses, benefiting over 130,000 households (by end of 2020). Also, in 2016, the Shanghai Urban Planning and Land Resources Administration Bureau published the “Guidelines for 15minutes community life circle planning”, which aimed to improve neighborhood facilities across the city. In 2021, the local government published the “Suggestions on accelerating the renewal and transformation of old houses” and enacted the “Urban Renewal Ordinance of Shanghai” to lay the legislative foundation for urban renewal throughout the city.

However, despite Shanghai being one of the most developed regions in China, urban livability in the city varies according to different indicators, and in some cases fluctuates or remains unchanged over time. For example, while the extent of urban green space has increased by 70% between 2011 and 2021, the number of visitors has fluctuated and has remained practically unchanged116. Similarly, the number of cultural spaces such as museums, cinemas and theaters has fluctuated and does not show a clear trend over time116. In this context, some studies have linked the rapid economic growth of the city with significant improvements in urban livability60, particularly through the development of new urban infrastructure and improved environmental conditions (following the World Expo 2010). Conversely, others have noted the possibility of major livability differences within the city, due to, for example, the uneven distribution of urban greenery clusters in the downtown areas118 or the monotonous nature of the built environment in some areas due to the shift of many activities from the downtown to the periphery88.

Such possible differences in urban livability within the city make a strong case for developing urban livability assessments that make full use of the newly available open-source spatially-disaggregated data. In this study we primarily focus on Shanghai’s central districts, namely Xuhui, Putuo, Yangpu, Pudong, Hongkou, Changning, Jingan, and Huangpu. These districts span the majority of Shanghai’s central business areas, contain most of the population, are densely populated, economically vibrant, and offer diverse amenities. However, they also face challenges related to traffic congestion, high living costs, and social inequality, which make them an ideal focus of our analysis. Furthermore, this focus allows us to fully use the POI data within the downtown area118,119, while reducing computational demand.

Data collection

Basic information about the RBCs is extracted from the public data access portal of the real estate brokerage company Lianjia (see https://sh.lianjia.com/). The metadata of interest for each RBC used in this study include (a) ordered numeric IDs, (b) community name, (c) district name, (d) address, (e) housing price (in RMB/m2), (f) construction year, (g) building structure, (h) operation fee, (j) operation company, (k) construction company, (l) building area (in m2), (m) number of households in RBC, and (n) location latitude and longitude. Notably, the population data of RBCs (i.e. number of households) were calibrated to the 100 m x 100 m data provided by Worldpop.

Overall, we extracted 15,994 valid records for RBCs in Shanghai in 2020, as discussed in more depth in the Results. Each RBC is identified with a single point whose longitude and latitude correspond to the central point of the RBC. In order to simplify the calculation of housing prices, all prices are divided into the 9 classes presented in Table 2. The number of RBCs is approximately normally distributed across the 9 price classes.

Table 2 Housing price ranges and number of RBCs within each price range

Information for POIs is available from map data services for commercial and academic purposes. In this study, we extract and preprocess the POI data through the public map API (see https://lbsyun.baidu.com/index.php?title=webapi/guide/webservice-placeapi). We extract approximately 15 million lines of unique POI records in Shanghai up to the end of 2020. The taxonomy for POI is provided by the map services and generally falls under 23 first-level categories, including transport, hotels, sports, public services, hospitals, and businesses, among others. Furthermore, there are 235 second-level categories in the POI dataset, and 2,008 third-level categories, which offer more significant details about the functionality of different POIs. However, as the complexity increases significantly when granular types of POI data are considered, in this study, we only use the first-level categories (see detailed first-level categories in Table 3). In addition, reflecting the literature on urban livability outlined in the Introduction, we narrow down these 23 first-level categories into five major categories reflecting major aspects of urban livability, namely education, medical service, recreation, transportation service, and living services (see Table 3).

Table 3 Types of POIs relevant to the five domains of urban livability

Furthermore, since the POI dataset is relatively large for computing, in order to simplify the calculations, we first exclude POI data that are more than 2 km away from the downtown area and RBCs, and then calculate the distance between each RBC and nearby POIs in the processed samples. This processing does not affect the evaluation of livability but helps to focus on POIs in the vicinity of RBCs, thus reducing computational complexity.

Finally, information about the accessibility of RBCs to the transport facilities is retrieved from transport-related POI data. This category may include a variety of different functional positions, such as a parking lot that may belong to the traffic authority and is open for the public as a parking facility. In this case, in order to apply the transport-related POI data to measure the accessibility of RBCs to the public transport facilities, we extract data on bus stops and metro entrances. We identify this data from the API through their IDs and types. In total, we extracted data for 13,334 validated bus stops and 1,506 metro entrances for the urban area of Shanghai in 2020.

Data analysis

As outlined in “Data Collection”, the major factors considered in the urban livability assessment include housing price, building age, population density, access to transport, and diversity of POIs within a given area. The livability assessment is conducted in two steps. The first step calculates the normalized factor scores based on the characteristics of RBCs. The second step combines these scores with the POI distribution and weights to generate five different livability scores, one for each of the five major POI categories.

Normalized factor scores

In the first step, we measure the normalized factor scores of the major livability factors. The normalized factor score \({{\boldsymbol{S}}}^{{\boldsymbol{i}}}\) combines, for each RBC \({\boldsymbol{i}}\), the values of five factors: housing price (\({{\boldsymbol{PriceS}}}^{{\boldsymbol{i}}}\)), building age (\({{\boldsymbol{AgeS}}}^{{\boldsymbol{i}}}\)), population density (\({{\boldsymbol{DensityS}}}^{{\boldsymbol{i}}}\)), access to transport (\({{\boldsymbol{AccessibilityS}}}^{{\boldsymbol{i}}}\)), and POI diversity (\({{\boldsymbol{Diversity}}}^{{\boldsymbol{i}}}\)). The score \({{\boldsymbol{S}}}^{{\boldsymbol{i}}}\) is formulated as the equally weighted sum of the logarithms of the factor values (see Eq. 1), and we use \({\bf{ln}}\left({\boldsymbol{x}}+{\bf{1}}\right)\) so that the logarithm is defined and non-negative even when a factor value is zero.

$${{\boldsymbol{S}}}^{{\boldsymbol{i}}}=\frac{{\bf{1}}}{{\bf{5}}}\left[\begin{array}{c}{\bf{ln}}\left({\boldsymbol{Price}}{{\boldsymbol{S}}}^{{\boldsymbol{i}}}+{\bf{1}}\right)+{\bf{ln}}\left({\boldsymbol{Age}}{{\boldsymbol{S}}}^{{\boldsymbol{i}}}+{\bf{1}}\right)+{\bf{ln}}\left({{\boldsymbol{DensityS}}}^{{\boldsymbol{i}}}+{\bf{1}}\right)\\ +{\bf{ln}}\left({{\boldsymbol{AccessibilityS}}}^{{\boldsymbol{i}}}+{\bf{1}}\right)+{\bf{ln}}\left({{\boldsymbol{Diversity}}}^{{\boldsymbol{i}}}+{\bf{1}}\right)\end{array}\right]$$
(1)

where the weight 1/5 ensures that all normalized factor values are equally weighted in the calculation of the standardized score. In addition to housing price and building age, factors including population density, access to transport, and POI diversity are derived from the data collection. Firstly, population density is defined as the average population per building of an RBC. This metric essentially measures the average living space of an RBC given its population and building area120,

$${\boldsymbol{D}}{{\boldsymbol{ensity}}}^{{\boldsymbol{i}}}={\boldsymbol{Po}}{{\boldsymbol{p}}}^{{\boldsymbol{i}}}/{{\boldsymbol{Build}}}^{{\boldsymbol{i}}}$$
(2)

where \({\boldsymbol{Po}}{{\boldsymbol{p}}}^{{\boldsymbol{i}}}\) and \({{\boldsymbol{Build}}}^{{\boldsymbol{i}}}\) are the area \({\boldsymbol{i}}\)’s population and number of buildings, respectively. It is noteworthy that the building area is not available in the city area data, and we therefore use the number of buildings as a proxy variable to approximate the spatial differences between city areas.

In terms of transport-related scores, our approach evaluates the accessibility of RBCs to transport-related POIs. Since our approach focuses on the location of each RBC rather than the individuals who use these transportation facilities, we construct a location-based accessibility index121,112. As POIs represent opportunities for residents from RBCs, we apply a cumulative opportunity method (CUM) to calculate the accessibility index112. Notably, since we only use transit points as destinations and do not have enough path data to estimate accurate travel time, opportunities in the CUM equation can be simplified to be the number of bus stops or metro stations, and the travel cost is assumed to be a desirable and maximum walking distance. Therefore, the POI-based accessibility index can be written as:

$${{\boldsymbol{Accessibility}}}^{{\boldsymbol{i}}}=\frac{{\sum }_{j}{\delta }_{j}^{i}{O}_{j}}{{\sum }_{j}{O}_{j}}$$
(3)

where \({\delta }_{j}^{i}\) is a threshold function with a value of “1” if the distance between destination j and i is within 1 km or 2 km, and a value of “0” otherwise. \({O}_{j}\) represents the number of POIs at destination j, which is “1” in our case.
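As an illustration of Eq. 3, the following minimal sketch counts the share of transit points that fall within a given radius of an RBC. It assumes great-circle (haversine) distances between latitude/longitude pairs, which is our own choice for the example; the function names and the coordinates are hypothetical and do not come from the study data.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def accessibility(rbc, stops, radius_m):
    """Eq. 3 with O_j = 1: fraction of transit points within radius_m of the RBC."""
    if not stops:
        return 0.0  # assumption: no destinations means zero accessibility
    reachable = sum(
        1 for lat, lon in stops
        if haversine_m(rbc[0], rbc[1], lat, lon) <= radius_m
    )
    return reachable / len(stops)

# Hypothetical coordinates, for illustration only.
rbc = (31.2304, 121.4737)
stops = [(31.2310, 121.4740), (31.2400, 121.4737), (31.3000, 121.4737)]
acc_1km = accessibility(rbc, stops, 1_000)
acc_2km = accessibility(rbc, stops, 2_000)
```

Widening the radius from 1 km to 2 km can only keep or increase the index, since every point reachable at 1 km is also reachable at 2 km.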

For POI diversity, we apply an entropy-based diversity method to measure POI diversity within 1 km and 2 km concentric circles from each RBC. For each city area, this diversity indicator essentially captures the variety of POIs surrounding an RBC, and describes the distribution of different POIs surrounding an RBC122. The POI diversity metric is computed through (Eq. 4),

$${\boldsymbol{Diversit}}{{\boldsymbol{y}}}^{{\boldsymbol{i}}}=-\mathop{\sum}\limits_{{\boldsymbol{k}}={\bf{1}}}^{{\boldsymbol{K}}}{\boldsymbol{p}}\left({{\boldsymbol{x}}}^{{\boldsymbol{ik}}}\right){\bf{ln}}{\boldsymbol{p}}\left({{\boldsymbol{x}}}^{{\boldsymbol{ik}}}\right)$$
(4)

where \({\boldsymbol{p}}\left({{\boldsymbol{x}}}^{{\boldsymbol{ik}}}\right)\) denotes the share of POIs of type \({\boldsymbol{k}}\) among all POIs surrounding RBC \({\boldsymbol{i}}\), and \({\boldsymbol{K}}\) is the number of POI types. The higher the score, the higher the POI diversity surrounding an RBC.
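The entropy in Eq. 4 can be sketched as follows, assuming that the POIs around an RBC are available as a simple list of category labels. The function name and the sample labels are hypothetical.

```python
import math
from collections import Counter

def poi_diversity(categories):
    """Eq. 4: Shannon entropy (natural log) of the POI category shares."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an RBC with no nearby POIs has zero diversity
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical labels, for illustration only.
mixed = poi_diversity(["education", "medical", "recreation", "transport", "living"])
single = poi_diversity(["living"] * 10)
```

With five equally represented categories the entropy reaches its maximum of ln(5), while a neighborhood with only one category scores zero.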

As POI diversity is defined as an index, it is excluded from normalization. We thus normalize only the housing price, building age, population density, and access to transport factors through min-max normalization, which converts a vector of values into the range \([0,1]\). The normalization processes are given in (Eq. 5) to (Eq. 8),

$${{\boldsymbol{PriceS}}}^{{\boldsymbol{i}}}=({{\boldsymbol{Price}}}^{{\boldsymbol{i}}}-{{\boldsymbol{Price}}}_{{\boldsymbol{min}}})/({{\boldsymbol{Price}}}_{{\boldsymbol{max }}}-{{\boldsymbol{Price}}}_{{\boldsymbol{min }}})$$
(5)
$${{\boldsymbol{AgeS}}}^{{\boldsymbol{i}}}=({{\boldsymbol{Age}}}^{{\boldsymbol{i}}}-{{\boldsymbol{Age}}}_{{\boldsymbol{min}}})/({{\boldsymbol{Age}}}_{{\boldsymbol{max}}}-{{\boldsymbol{Age}}}_{{\boldsymbol{min}}})$$
(6)
$${\boldsymbol{D}}{{\boldsymbol{ensityS}}}^{{\boldsymbol{i}}}=({\boldsymbol{D}}{{\boldsymbol{ensity}}}^{{\boldsymbol{i}}}-{{\boldsymbol{Density}}}_{{\boldsymbol{min}}})/({{\boldsymbol{Density}}}_{{\boldsymbol{max}}}-{{\boldsymbol{Density}}}_{{\boldsymbol{min}}})$$
(7)
$${{\boldsymbol{AccessibilityS}}}^{{\boldsymbol{i}}}=({{\boldsymbol{Accessibility}}}^{{\boldsymbol{i}}}-{{\boldsymbol{Accessibility}}}_{{\boldsymbol{min}}})/({{\boldsymbol{Accessibility}}}_{{\boldsymbol{max}}}-{{\boldsymbol{Accessibility}}}_{{\boldsymbol{min}}})$$
(8)

where “S” is the suffix that indicates the respective variable is normalized, and the raw variables with “min” and “max” subscripts denote the minimum and maximum values of the variable.
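The min-max normalization of Eqs. 5-8 and the equally weighted log score of Eq. 1 can be sketched together as below. This is a minimal illustration under our own assumptions: the function names and sample prices are hypothetical, and mapping a constant factor to zero is a choice made here to avoid division by zero, not something specified in the text.

```python
import math

def min_max(values):
    """Eqs. 5-8: rescale a list of raw factor values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: constant factors map to 0
    return [(v - lo) / (hi - lo) for v in values]

def factor_score(price_s, age_s, density_s, access_s, diversity):
    """Eq. 1: mean of ln(x + 1) over the five factor values of one RBC."""
    factors = (price_s, age_s, density_s, access_s, diversity)
    return sum(math.log1p(x) for x in factors) / 5

# Hypothetical prices in RMB/m2, for illustration only.
price_s = min_max([30_000, 55_000, 80_000])
score_all_zero = factor_score(0, 0, 0, 0, 0)
score_all_one = factor_score(1, 1, 1, 1, 1)
```

Because ln(0 + 1) = 0, an RBC at the minimum of every factor receives a score of zero, which matters for any later step that divides by this score.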

Livability score by RBC

The second step estimates the livability scores for each RBC. Here, we need to balance two things, diversity of POIs and possibly higher preference of some residents for certain POIs. Regarding the first, we hypothesize that urban residents prefer areas with a concentration of POIs that have a considerable diversity46, which will be an important determinant for urban livability beyond a simple count of POIs. To reflect the possibly different preferences of urban residents for access to POIs, we give different weights to certain demands to evaluate the accessibility of POIs, and the weights can be described by (Eq. 9):

$${{\boldsymbol{w}}}^{{\boldsymbol{ik}}}={{\boldsymbol{N}}}^{{\boldsymbol{k}}}/\mathop{\sum}\limits_{{\boldsymbol{k}}}{{\boldsymbol{N}}}^{{\boldsymbol{k}}}$$
(9)

where \({{\boldsymbol{N}}}^{{\boldsymbol{k}}}\) represents the total number of POIs in the \({\boldsymbol{k}}\)-th category. The weight variable \({{\boldsymbol{w}}}^{{\boldsymbol{ik}}}\) measures the ratio of the number of POIs within category \({\boldsymbol{k}}\) to the total number of POIs across the five major categories.

Hence, we define the livability score as a tradeoff between the balance index of POIs and the average score of the five factors, namely housing price, building age, population density, access to transport, and POI diversity. Therefore, given the average score of the five factors, the livability score is measured by the balance index of POIs surrounding the RBC \({\boldsymbol{i}}\) using the Gini-Simpson Index, denoted as \({{\boldsymbol{L}}}^{{\boldsymbol{ik}}}\). The index was originally developed to measure the concentration of populations across different species123,124 and can avoid predetermining the weights of subsidiary indicators125. \({{\boldsymbol{L}}}^{{\boldsymbol{ik}}}\) can be estimated according to (Eq. 10), as:

$${{\boldsymbol{L}}}^{{\boldsymbol{ik}}}=\left({\bf{1}}-{{\boldsymbol{w}}}^{{\boldsymbol{ik}}}\times \frac{{\sum }_{{\boldsymbol{k}}={\bf{1}}}^{{\bf{5}}}\left({{\boldsymbol{P}}}^{{\boldsymbol{ik}}}\left({{\boldsymbol{P}}}^{{\boldsymbol{ik}}}-{\bf{1}}\right)\right)}{{\sum }_{{\boldsymbol{k}}}^{{\bf{5}}}{{\boldsymbol{P}}}^{{\boldsymbol{ik}}}\left({\sum }_{{\boldsymbol{k}}}^{{\bf{5}}}{{\boldsymbol{P}}}^{{\boldsymbol{ik}}}-{\bf{1}}\right)}\right)/{{\boldsymbol{S}}}^{{\boldsymbol{i}}}$$
(10)

where \({{\boldsymbol{P}}}^{{\boldsymbol{ik}}}\) denotes the number of \({\boldsymbol{k}}\)-th type of POIs surrounding area \({\boldsymbol{i}}\), and \({{\boldsymbol{S}}}^{{\boldsymbol{i}}}\) is the normalized factor score defined in the previous sub-section.
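To make Eqs. 9 and 10 concrete, the sketch below computes the category weights and then one livability score per category for a single RBC. All names and counts are hypothetical, and raising an error when fewer than two POIs are nearby or when the factor score is zero is our own assumption, since the text does not state how such cases are handled.

```python
def category_weights(citywide_counts):
    """Eq. 9: share of each category in the total number of POIs."""
    n = sum(citywide_counts.values())
    return {k: v / n for k, v in citywide_counts.items()}

def livability_scores(nearby_counts, weights, s_i):
    """Eq. 10: one score per category k for an RBC with factor score s_i."""
    total = sum(nearby_counts.values())
    if total < 2 or s_i == 0:
        # Assumption: the ratio in Eq. 10 is undefined in these cases.
        raise ValueError("need at least two nearby POIs and a non-zero S^i")
    concentration = (
        sum(p * (p - 1) for p in nearby_counts.values()) / (total * (total - 1))
    )
    return {k: (1 - weights[k] * concentration) / s_i for k in nearby_counts}

# Hypothetical counts, for illustration only.
citywide = {"education": 100, "medical": 100, "recreation": 200,
            "transport": 100, "living": 500}
nearby = {"education": 2, "medical": 2, "recreation": 2,
          "transport": 2, "living": 2}
weights = category_weights(citywide)
scores = livability_scores(nearby, weights, s_i=0.5)
```

In this example the nearby POIs are evenly spread, so the concentration term is small (1/9) and the scores differ across categories only through the weights.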

Interpolation and direction analysis for POIs

When estimating urban livability differences and understanding their patterns, it is important to be aware of the distribution of factors that may affect different livability scores. To obtain a deeper understanding of the different POI scores, we use the Standard Deviational Ellipse (SDE) and inverse distance-weighted (IDW) interpolation to detect the spatial distribution of different livability scores. This step is a further analysis for understanding how the livability differences are generated.

As an effective trend analysis tool, the SDE approach is widely applied for capturing the spatial features of targeted elements126,127. By calculating the standard distance of multiple points in two directions, the ellipse intuitively describes the dynamic trends of the points128. This analysis allows us to investigate the overlapping areas (which score highly under multiple preferences), and understand the distribution or trend of higher scores of RBCs within the study area127.

Both analyses are conducted with ESRI ArcMap 10.4. After extracting the building cluster point data from Lianjia, the general housing price and population distribution in Shanghai can be calculated through (Eq. 11) as:

$$\bar{I}=\mathop{\sum }\limits_{i=1}^{n}\frac{{D}_{i}^{-w}}{\mathop{\sum }\nolimits_{j=1}^{n}{D}_{j}^{-w}}{I}_{i}$$
(11)

where \({D}_{i}\) is the geographic distance between the interpolation location \(({x}_{0},{y}_{0})\) and data point \(i\), calculated by \({D}_{i}=\sqrt{{\left({x}_{0}-{x}_{i}\right)}^{2}+{\left({y}_{0}-{y}_{i}\right)}^{2}}\), \({I}_{i}\) is the observed value at data point \(i\), \(w\) is a weighting term, \(\bar{I}\) is the interpolation value, and \(n\) refers to the number of available data points.
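Inverse distance weighting estimates the value at an unsampled location as a weighted average of observed values, with weights that decay with distance. The sketch below is a plain-Python illustration of this idea rather than the ArcMap implementation used in the study; the function name, the default power of 2, and the sample points are assumptions made for the example.

```python
import math

def idw(x0, y0, samples, power=2.0):
    """Interpolate at (x0, y0) from samples given as (x, y, value) tuples."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        d = math.hypot(x0 - x, y0 - y)  # planar Euclidean distance
        if d == 0:
            return value  # the location coincides with a data point
        w = d ** -power
        weighted_sum += w * value
        weight_total += w
    if weight_total == 0:
        raise ValueError("at least one sample is required")
    return weighted_sum / weight_total

# Hypothetical data points (x, y, value), for illustration only.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
midpoint = idw(1.0, 0.0, samples)
on_sample = idw(0.0, 0.0, samples)
```

A larger power makes nearby data points dominate the estimate, while a power close to zero approaches a simple average of all observations.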

After capturing the housing price distribution, we use SDE to analyze the spatial distribution by classifying housing prices into nine categories. Levels 1-8 indicate housing prices from 10,000 to 80,000 CNY/m2, while Level 9 refers to housing prices >80,000 CNY/m2 (Table 2). SDE is applied to investigate the direction and spatial-temporal features of housing prices. The ellipse center, x and y-axis and azimuth are estimated through (Eq. 12)-(Eq. 17):

$$A=\mathop{\sum }\limits_{i=1}^{n}{\widetilde{x}}_{i}^{2}-\mathop{\sum }\limits_{i=1}^{n}{\widetilde{y}}_{i}^{2}$$
(12)
$$B=\sqrt{{\left(\mathop{\sum }\limits_{i=1}^{n}{\widetilde{x}}_{i}^{2}-\mathop{\sum }\limits_{i=1}^{n}{\widetilde{y}}_{i}^{2}\right)}^{2}+4{\left(\mathop{\sum }\limits_{i=1}^{n}{\widetilde{x}}_{i}{\widetilde{y}}_{i}\right)}^{2}}$$
(13)
$$C=2\mathop{\sum }\limits_{i=1}^{n}{\widetilde{x}}_{i}{\widetilde{y}}_{i}$$
(14)
$$\theta =\arctan \frac{A+B}{C}$$
(15)
$${\partial }^{x}=\sqrt{\mathop{\sum }\limits_{i=1}^{n}{({w}_{i}{\widetilde{x}}_{i}\cos \theta -{w}_{i}{\widetilde{y}}_{i}\sin \theta )}^{2}/\mathop{\sum }\limits_{i=1}^{n}{w}_{i}^{2}}$$
(16)
$${\partial }^{y}=\sqrt{\frac{\mathop{\sum }\nolimits_{i=1}^{n}{\left({w}_{i}{\widetilde{x}}_{i}\sin \theta -{w}_{i}{\widetilde{y}}_{i}\cos \theta \right)}^{2}}{\mathop{\sum }\nolimits_{i=1}^{n}{w}_{i}^{2}}}$$
(17)
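The rotation angle in Eqs. 12-15 can be illustrated with the short sketch below. It assumes that \({\widetilde{x}}_{i}\) and \({\widetilde{y}}_{i}\) are deviations of each point from the mean center, which is the usual convention for the SDE but is not stated explicitly above. The function name, the sample points, and the fallback used when \(C=0\) are hypothetical, and the axis lengths of Eqs. 16-17 are left out.

```python
import math

def sde_rotation(points):
    """Return the mean center and the angle theta of Eq. 15 (in radians)."""
    n = len(points)
    if n == 0:
        raise ValueError("at least one point is required")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Assumption: x~ and y~ are deviations from the mean center.
    dev = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(dx * dx for dx, _ in dev)
    syy = sum(dy * dy for _, dy in dev)
    sxy = sum(dx * dy for dx, dy in dev)
    a = sxx - syy                              # Eq. 12
    b = math.sqrt(a * a + 4 * sxy * sxy)       # Eq. 13
    c = 2 * sxy                                # Eq. 14
    if c == 0:
        # Assumption: with no cross term, align with x if it has more spread.
        theta = math.pi / 2 if a > 0 else 0.0
    else:
        theta = math.atan((a + b) / c)         # Eq. 15
    return (mean_x, mean_y), theta

# Hypothetical points on a diagonal line, for illustration only.
center, theta = sde_rotation([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

For points lying on the diagonal, the angle comes out as 45 degrees, and the same angle is obtained whether it is read from the x-axis or from the y-axis.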

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.