Background & Summary

Since the start of the COVID-19 pandemic, public health measures intended to control disease spread have been used with near-global application. Amongst these measures, social distancing (also referred to as physical distancing) has been widely advised, with individuals required to maintain physical distance between themselves and others outside their household. Distances specified by governments have typically been between 1 and 2 metres or 6 feet1. Social distancing guidelines or requirements have been adopted as a public health measure or non-pharmaceutical intervention (NPI), intended to slow or stop transmission, in nearly all countries globally during the COVID-19 pandemic. Universal application of such requirements assumes that social distancing is possible everywhere, however the feasibility of populations being able to comply with social distancing requirements can vary geographically, both between and within countries2,3. Contextual factors affecting social distancing feasibility include population density, urban form and the built environment and a range of socioeconomic factors, such as occupation, reliance on daily wages, shared water, sanitation and hygiene facilities or dependence on public transport. In urban areas particularly, high population and built densities, can mean social distancing is all but impossible in some locations2,4,5,6.

Locations where population density and urban form are constraints for social distancing, have not been systematically identified. Existing maps or datasets related to social distancing feasibility have been limited to individual cities or single countries at most, with approaches predominantly focussing on urban form and infrastructure. For cities in a number of countries, maps considering pavement (sidewalk) width as a social distancing constraint for pedestrians have been developed, for example New York (https://www.sidewalkwidths.nyc/), London (https://www.underscorestreets.com/social-distancing) and various cities in the Netherlands (https://covid19.social-glass.tudelft.nl/). These maps have been possible due to the availability of highly-detailed data on urban infrastructure, including pavement widths. Generally such data with sufficient spatial coverage is only available for individual cities or countries with advanced geospatial data systems.

Other approaches have been used to map and identify social distancing constraints in low- and middle-income countries. Macharia et al.7 developed a sub-national social vulnerability index for COVID-19 for Kenya. Although this index doesn’t specifically consider social distancing feasibility, it includes measures related to socioeconomic deprivation, associated with difficulty in practising social distancing, such as informal employment and shared sanitation facilities, alongside the proportion of the population resident in informal settlement and internally-displaced person camps7. The index was applied at the sub-county level (administrative unit level 2), highlighting sub-national variations in vulnerability, but not at the intra-urban scale. Bhardwaj et al.8 used data on population density and estimates of building height, alongside access to WASH facilities to identify “hotspots for contagion and vulnerability” in Kinshasa, Cairo and Mumbai8. In identifying high risk locations, building height data were used to calculate an adjusted population density measure considering “livable floor space”, based on assumptions of the height of a floor. However, these adjusted population density estimates may be highly uncertain, given that the input gridded population datasets (from WorldPop or Facebook) do not consider building height in the spatial allocation of population to grid cells9. For informal settlements in South Africa, Gibson and Rush10 identified challenges in social distancing, based on the small distance between dwelling units. These studies utilised building outlines that were manually digitised from satellite imagery, and therefore were limited to a few locations due to data availability.

Given the identified data gap, this paper outlines an ease of social distancing index for sub-Saharan Africa, intended to identify locations where social distancing in urban areas is likely to be very difficult. The index incorporates residential population density and urban form metrics, which are calculated from new geospatial datasets available for sub-Saharan Africa, with index values mapped for small spatial units within urban areas. The geographic variation in social distancing feasibility highlighted by this dataset is relevant in planning disease response efforts, for both COVID-19 and future pandemic preparedness, and also can be beneficial for urban planning, development and risk identification. The datasets produced11,12 are available through the WorldPop Open Population Repository (https://wopr.worldpop.org/?/SocialDistancing).

Methods

To calculate ease of social distancing index values for small spatial units, the following steps were taken: 1) definition of urban extents; 2) creation of spatial units within urban extents; 3) calculation of the built score component of the index from building footprint data; 4) calculation of the population density score component of the index from gridded population data; and 5) calculation of the index values from the built score and population density score components for each spatial unit. A schematic overview of this process is provided in Fig. 1.

Fig. 1
figure 1

An overview of the data processing steps involved in estimating the ease of social distancing index for urban areas in sub-Saharan Africa. These steps consist of: (1) defining urban extents, (2) creating spatial units, (3) calculating the built score and (4) population density score for each spatial unit, and (5) from these scores, calculating the index values. The colours representing each step are used in the subsequent figures also.

Defining urban extents

The ease of social distancing index has been calculated for urban areas in 50 countries/territories/dependencies in sub-Saharan Africa. The index is calculated specifically for urban areas due to the high and growing proportion of the population resident in urban areas13,14, and there generally being greater social distancing constraints in urban settings. The urban areas for which index values have been calculated, are based on two GHSL (Global Human Settlement Layer) datasets: GHS-SMOD v2.015 and GHS Urban Centre Database (UCDB) 2015 v1.216. The GHS-SMOD dataset is a raster of settlement types, while the GHS-UCDB dataset provides locations (points and polygons) of urban centres. The GHS-UCDB urban centre polygons correspond to the urban centre (class 30) in the GHS-SMOD dataset. The GHS-UCDB dataset was used as the basis for selecting urban centres for inclusion in the index dataset, with the GHS-SMOD dataset used to define the extent of the urban area surrounding each urban centre.

The GHS-UCDB dataset includes a data quality field (QA2_1V); all urban centres in the GHS-UCDB dataset classified as being true positives (QA2_1V = 1) were selected for inclusion in the index dataset. For these selected urban centres, the GHS-SMOD data was used to define the spatial extent of each urban area, for which index values were calculated. In defining urban extents, the intention was to ensure that the full urban area was included, as opposed to accurately delimiting the urban area. SMOD class values of 21–30 (peri-urban, semi-dense urban cluster, dense urban cluster, urban centre) representing urban/peri-urban settlement types were all considered to be urban, and the grid cells corresponding to these classes were reclassified to create a binary urban/not-urban raster. All grid cells in the binary urban raster that were spatially contiguous with an urban centre, were considered to be part of the urban area surrounding the urban centres. Given the relatively coarse spatial resolution of the GHS-SMOD data (1 km) and the potential growth in urban settlements since 2015, the spatially contiguous urban grid cells were buffered by 3 km to try and ensure the full urban area was included. A distance of 3 km was chosen after testing a range of buffer distances and visually assessing the buffered area against the urban extent visible in recent satellite imagery. Finally, a convex hull was created around each location of contiguous urban grid cells to create a smooth polygon boundary around the buffered grid cells. If convex hulls spatially overlapped, the urban areas within these convex hulls were considered to be a single urban area, and a new convex hull was created around the combined urban areas. In the case of urban areas being near national boundaries, national administrative boundaries were used to spatially clip the urban extents. Similarly for urban areas in coastal locations, urban extents were clipped to the coastline. In total, for the 50 countries in sub-Saharan Africa covered by this dataset, there were 1,373 urban extent polygons covering an area of 239,050 km2, associated with 1,551 named locations in the GHS-UCDB dataset.

Defining spatial units within urban extents

As the social distancing index dataset covers 50 countries, no sufficiently small administrative or statistical unit existed across all countries to be the unit of analysis within urban areas. Instead, small spatial units were created within each urban extent (Fig. 2), for which index values were calculated. The boundaries of the custom spatial units were defined by recognisable features as far as possible, such as roads, rivers and railways. Data on these linear features were supplemented by additional line and polygon features, such as the boundaries of land use types (e.g. military areas, airports, hospital grounds, golf courses). Data on all features used in defining the boundaries of the custom spatial units was extracted from OpenStreetMap17 and is detailed in Table 1. Similar approaches utilising OpenStreetMap (OSM) data to create spatial units based on recognisable features have been used in sub-Saharan African cities for studies on urban land use classification18,19, slum mapping20 and semi-automated approaches to create census enumeration units21.

Fig. 2
figure 2

A schematic representation of the data processing steps involved in the creation of the spatial units for which the index is ultimately calculated.

Table 1 The input datasets used in creating the mapped index outputs for urban areas in sub-Saharan Africa, including the data file format, purpose, source and description.

The extracted features from OpenStreetMap (OSM) were intersected to create polygons. In many locations on the urban fringe, there was a sparsity of features or land use boundaries, meaning that the polygons created by intersecting features could be quite large. To address this, further efforts were made to subdivide large polygons located on the urban fringe. Settlement extents22, classified as either BUAs (built-up areas) or SSAs (small settled areas), were used to identify locations on the urban fringe. Polygons which intersected the boundary of a BUA or SSA settlement extent, and were greater than 100,000 m2 in area, were considered to be large polygons on the urban fringe which should be further subdivided. For these selected polygons, if OSM data for residential areas was available, these were included as additional features to subdivide the existing polygons. The area of every polygon was then calculated. A minimum area constraint of 10,000 m2 (1 hectare) was applied; any polygons less than 1 hectare in area were merged with neighbouring polygons iteratively, until the area constraint was met. The resulting polygons were the spatial units of analysis for which the social distancing index is calculated, and are most representative of single street blocks, or groups of street blocks, in the urban centre.

If an urban extent had fewer than 30 spatial units, the urban extent was excluded from the output dataset as it was considered to most likely not actually constitute part of an urban area. Such instances either occurred along national boundaries, where an urban extent polygon spanned the national boundary but significant urban settlement was only present in one country. Alternatively urban extents with less than 30 spatial units occurred where an urban centre was very small, and insufficient features in OpenStreetMap meant it was not possible to define suitable spatial units within the urban extent. A threshold of 30 spatial units was chosen following testing of a range of threshold values.

Calculating population density and urban form metrics

Calculating estimates of population density requires data on the geographic distribution of population. Typically, population distribution data provides a static representation of population, for example population counts derived from censuses are based on residential address locations. The spatial distribution of population though is not static, and changes constantly as people go about their daily lives. Although census data are generally considered to be the standard source for population counts, the necessity of enumerating the population at their residential location means that the population distribution derived from census data is inherently more representative of the population at night, than during the day23. During the daytime, populations are likely to be at school or work, running errands or undertaking other activities outside their residential location, altering the spatial distribution of population. During the COVID-19 pandemic, changes in population mobility and implementation of measures intended to reduce disease transmission, such as school closures, requirements to work from home and curfews, have also affected the spatial distribution of population24,25.

High-resolution data on population distributions is most commonly derived from censuses and therefore represent residential populations (e.g. gridded datasets from WorldPop). Data on other population distributions are available, although these vary in terms of their spatial coverage and resolution. Oak Ridge National Laboratory’s LandScan Global dataset26 represents ambient population at 1 km spatial resolution globally, LandScan USA27 provides daytime and nighttime population distributions for the USA, while Pop24723 provides time-specific population distributions for England. Considerations around social distancing feasibility are not limited to residential settings, with social distancing a necessity in a wide-range of settings where other transmission-mitigating measures are not in place. However, given the availability of data on population distributions with suitably high spatial resolution, and the general lack of additional transmission-mitigating measures in residential settings, this work focuses on social distancing feasibility considering residential population density.

The ease of social distancing index is calculated from metrics of residential population density and space occupied by buildings, calculated for each spatial unit (Fig. 3). Residential population density was calculated from WorldPop high-resolution gridded population count datasets28 with a spatial resolution of 3 arc seconds (approximately 100 m at the Equator). There are other gridded population datasets available with similar or finer spatial resolutions (e.g. GHS-POP or HRSL)29, however only WorldPop population datasets are constrained to grid cells with Ecopia building footprints. As the index relies on measures of both population density and space occupied by buildings (calculated from Ecopia building footprints), it made most sense to use gridded population estimates that were spatially constrained based on the same building footprint data. WorldPop gridded population datasets are created primarily with two approaches: top-down and bottom-up estimation modelling30. The top-down approach creates gridded estimates through the spatial disaggregation of enumerated census population counts or projections for administrative units. In contrast, the bottom-up approach takes enumerated population counts for small areas, for example from household survey listings, and uses a geostatistical model to estimate population for all grid cells, including in unsampled locations30. Particularly for locations where population projections are uncertain due to a long time period having elapsed since the last census, bottom-up modelled population estimates are likely to provide a more realistic estimate of population counts and spatial distribution. For this reason, where WorldPop bottom-up population estimates are available, these have been used as the data source for the population density component of the ease of social distancing index. For countries where bottom-up estimates were not available, gridded population estimates produced using a top-down constrained approach were used instead.

Fig. 3
figure 3

A schematic representation of the data processing steps involved in the calculation of the built and population density scores, and the subsequent calculation of the ease of social distancing index values for each spatial unit.

The WorldPop top-down approach involves dasymetric modelling of census counts or projections, using a Random Forest model to estimate a weighting layer from a range of ancillary gridded covariates31. In the “constrained” approach, a binary settlement mask is used to constrain the disaggregation to grid cells identified as having one or more buildings present. In the case of top-down constrained population datasets for sub-Saharan African countries, the binary settlement mask is based on Ecopia building footprints. Ancillary covariates derived from the building footprints, such as building count, building area, variation in building area and distance to edge of settled area are also included as inputs to the Random Forest model (used to estimate the dasymetric weighting layer). Further details of the gridded population datasets are provided in Table 1.

To calculate a mean population density for each spatial unit, a population density raster was first created for each country by dividing the grid cell values in the population count raster, by the surface area for each grid cell, maintaining a spatial resolution of 3 arc seconds. From the density raster, an estimate of mean population density was then calculated for each spatial unit, using zonal statistics. In the event that a spatial unit was of a size or shape such that there were no grid cell centroids of the population raster located within it, a mean zonal statistic could not be calculated. Instead, for the centroid of the spatial unit, an estimate of mean population density was calculated by interpolating neighbouring grid cell values. For spatial units in close proximity to a national boundary, there were some locations where it was not possible to calculate a mean population density, due to differences in the spatial extents of datasets resulting in missing population estimates. Where this occurred, spatial units have been assigned a no data value (−99) for the population density score.

The second metric used in the index was a measure of space occupied by buildings, calculated from building footprint polygons. Developments in computing power and the increasing availability of very high-resolution satellite imagery, have enabled new datasets of building footprint polygons to be produced using feature extraction techniques. For sub-Saharan Africa, existing building footprint datasets include those from Microsoft (https://www.microsoft.com/en-us/maps/building-footprints), Google (https://sites.research.google/open-buildings) and Ecopia (https://www.ecopiatech.com/global-feature-extraction). The differences between these datasets, and the extent to which they can be used interchangeably has not been systematically explored and this will be a focus of future work. Only the Ecopia building footprints have coverage for all countries in sub-Saharan Africa and hence were used in calculating the built metric of the social distancing index.

For each spatial unit, the proportion of space occupied by buildings was calculated by summing the total building footprint area, divided by the area of the spatial unit. In instances where building footprints spanned more than one polygon, the building footprint polygon was split into two or more smaller polygons along the boundary of the spatial unit. This metric was solely derived from building footprint data and does not consider building height or associated floor space. To incorporate measures of available floor space would require detailed building height data for all cities across sub-Saharan Africa, which was not available. In assessing social distancing feasibility, estimating available floor space is important in areas where high population densities are associated with high-rise buildings which have a relatively small building footprint. In the context of sub-Saharan African urban areas, the prevalence of high-rise buildings varies but they are typically not the norm for residential dwellings. More commonly high population densities are associated with informal settlement areas5, where building height is commonly one- or two-storeys. Whilst data on building height were not available to incorporate into the index, a new urban morphological dataset32, available for major cities globally, provides a classification of urban form which considers building height. Comparing the ease of social distancing index values with this urban morphological classification for 12 cities (Supplementary Information Fig. 1), indicates that for the majority of cities, areas of high- and mid-rise buildings are not associated with high index values.

To convert the estimates of mean population density and proportion of space occupied by buildings into scores, the estimates were then classified with values assigned between 1 and 10. Whilst this is an arbitrary and subjective choice of classes, the values are easy to interpret and it provides a level of detail that adequately distinguishes between different levels. Values for the proportion of space occupied by buildings in a spatial unit were assigned values between 1 and 10; spatial units in which 10% or less of the space was occupied by buildings were assigned a value of 1, greater than 10% but less than or equal to 20% were assigned a value of 2. Classified values were assigned in this linear fashion up to a value of 10, which was assigned where over 90% of a spatial unit was occupied by buildings. If a spatial unit had no building footprints within it, then a value of 0 was assigned (Fig. 4).

Fig. 4
figure 4

The population density and built scores are evenly weighted in the index, calculated as the mean of the two scores (top). Examples of the spacing of buildings and population for each score value is included, with the built score shown with building footprints representing both a single and multitude of buildings. The population density score is calculated based on a hexagon tessellation with a range of distancing parameters, with the classification of built and population scores provided (bottom).

Mean population density values were classified based on the population density that is possible with different distancing parameters. To estimate population density with a range of distancing parameters, an idealised model of perfect spacing between people, based on a hexagon tessellation was used. In this conceptualisation, the hexagon represents the space available for one person, with the person considered as the hexagon centroid (Fig. 4). The distance between the centroid of a hexagon and the midpoint of a side is termed the apothem (d), with the distance between centroids of neighbouring polygons being twice the apothem length. Considering a social distancing requirement of 2 m distance, for all individuals in a spatial unit to maintain 2 m from each other, whilst being able to move, an apothem of 3 m is required. This distance ensures that a person can move 2 m in any direction, whilst still maintaining a distance of 2 m from any other person. The area of a hexagon with an apothem of 3 m (i.e. the area needed for one person to maintain 2 m distance whilst being able to move) is 31.18 m2, which corresponds to a population density of 32,075 people per km2. Spatial units with a mean population density of greater than or equal to 32,075 people per km2 are assigned a value of 10 - the maximum population density score.

Population density scores were assigned to the remaining population density estimates using threshold values derived for increased spacing between people. The hexagon apothem was increased in increments of 1 m, up to 11 m. A hexagon with an apothem of 11 m covers an area of 419.16 m2 for which the corresponding maximum population density is 2,386 people per km2. A value of 1 was assigned to spatial units with non-zero population density estimates less than 2,386 people per km2. The population density thresholds are outlined in Fig. 4. Any spatial unit with 0 estimated population was assigned a value of 0. The calculation of population density thresholds are theoretical and do not account for obstructions which may be present and affect the space available for individuals to move within and distance from each other. The distances used to calculate population density thresholds will in reality most likely be reduced, due to the space within built environments rarely being open and free from obstructions.

Calculating the social distancing index

The index value for each spatial unit was calculated as the mean value of the population and built scores, resulting in index values ranging from 0 to 10. An index value of 10 would indicate a very high population density (≥32,075 population per km2) and a very high density of buildings (>90% of spatial unit area is occupied by buildings); a combination of factors which would mean that social distancing is extremely difficult. In contrast, an index value of 1 would indicate a relatively low population density and considerable space available around buildings, likely associated with fewer physical constraints for social distancing. A value of 0 indicates that a spatial unit has no buildings and no estimated population, and consequently no social distancing constraints are anticipated. A no data value (−99) was assigned for any spatial units which had a no data value for the population score. Mapped index values for West Africa are shown in Fig. 5, with detailed maps included for the capital cities of Sierra Leone, Ghana and Cameroon.

Fig. 5
figure 5

Mapped ease of social distancing index outputs are shown for West Africa, with examples for Freetown, Sierra Leone (left column), Accra, Ghana (centre column) and Yaoundé, Cameroon (right column). The lower panes show larger-scale maps (middle rows) with examples for locations with higher index values, shown both with the mapped index values and the spatial units overlaid with Maxar satellite imagery from 2018–2020 (bottom row). The higher index values (paler colours) indicate greater expected difficulty in practising social distancing.

Data Records

The ease of social distancing index datasets11,12 for urban areas in 50 sub-Saharan African countries are openly available to download from the WorldPop Open Population Repository (https://wopr.worldpop.org/?/SocialDistancing) in Shapefile format. To download files for multiple countries at once, we recommend using the WOPR API or wopr R package (see relevant tabs at https://apps.worldpop.org/woprVision/ for guidance). The datasets can also be viewed interactively through the GRID3 Data Hub (https://data.grid3.org/). Further details of the datasets, including a description of each field, are outlined in Table 2.

Table 2 Details of the output data files of the ease of social distancing index.

Technical Validation

The ease of social distancing index values are calculated from metrics derived from gridded population estimates and building footprints. These input datasets were already checked by the data producers to ensure they comply with the intended quality-standards. In terms of the building footprints, the data producers implemented both automated checks and a manual review process. This includes the manual review of a randomly-selected area of 50 km2, for every 1000 km2 of processed imagery33, and checks to ensure that the dataset meets specified quality requirements (≥90% valid interpretation)34. Gridded population datasets are harder to validate, particularly at fine spatial resolutions or at the grid-cell level29. Gridded population datasets for sub-Saharan African countries, produced using the WorldPop top-down constrained approach, utilise the best-available census-derived data on population totals, carefully selected geospatial covariates and building footprint datasets. Output datasets are checked to ensure that grid cell values sum to the administrative unit totals used as input for the dasymetric modelling, have a sufficiently high explained variance (e.g. >0.8) and spot-checked through visual comparison with satellite imagery basemaps. For WorldPop gridded population datasets produced using bottom-up modelling approaches, similar geospatial covariates and building footprint datasets are used, with high-quality population enumeration datasets, typically collected in partnership with national statistical offices. The resulting gridded population datasets include uncertainty estimates at the grid cell level, and for aggregated totals. A review process involving calculating various goodness of fit metrics, cross-validation, comparison of the outputs with alternative population data sources and recent high-resolution satellite imagery is undertaken. Where possible the estimates are also reviewed by staff from national statistical offices. In the context of this work, additional visual checks of mapped input datasets were implemented. For the resulting index values, mapped outputs were checked on a country-by-country basis, and the statistical distributions of index values reviewed. Any issues identified in this process are listed in the data release statement Appendix E11,12.

It is difficult to validate if the ease of social distancing index values reflect the reality of social distancing feasibility within and between urban areas. However, in the context of urban sub-Saharan Africa, populations in slums and/or informal settlements are recognised as experiencing challenges in social distancing2,3,4,5,6. Consequently, we would expect that high index values would be associated with these types of settlements. UN-Habitat characterises informal settlements as residential areas where inhabitants do not have security of tenure, often have limited access to basic services and infrastructure, with housing that may not meet relevant building and planning regulations, and may be located in environmentally-hazardous locations35. However, informal settlements take many forms36 and there is no accepted formal definition of slums or information settlements37. Locations of informal settlements and slums are also not routinely mapped in a standardised way, and are often home to populations that can be excluded from official statistics38,39.

Mapped datasets of informal settlements do exist for some cities, for example informal settlements in Cape Town, South Africa are available as mapped extent polygons (https://africaopendata.org/dataset/city-of-cape-town-gis-data). Recent work by the International Growth Centre (IGC) with Ordnance Survey (OS), characterised informal settlements in Lusaka, Zambia, with detailed maps of settlement types40. These spatial datasets provide mapped locations of informal settlements, but are limited in their coverage to individual cities and cannot be assumed to be consistent in the definition used in mapping informal settlements. The ease of social distancing index values, and the associated population density and built scores, were compared with the maps of informal settlements in Lusaka and Cape Town (Fig. 6). Spatial units in these cities were classified as one of two settlement types: informal settlements or other (not informal settlements). For Lusaka, the IGC/OS detailed maps of settlement types provided information at the building level; the majority settlement type for each spatial unit was determined and reclassified to the binary informal settlement classes. In contrast, the informal settlement dataset for Cape Town was less spatially-detailed and consisted of polygons representing informal settlement extents. We took a conservative approach and considered any spatial unit with 50% or more of its area within a settlement polygon as being an informal settlement. For spatial units within the areas considered to be informal settlements (“Informal”), the proportion of spatial units with each combination of population density and built scores was calculated, and compared against those spatial units in the areas that are not classified as informal settlements (“Other”) within the same spatial extent. An additional comparison with areas classified as planned residential in Lusaka, is shown in Supplementary fig. 2.

Fig. 6
figure 6

Considering all spatial units within the urban extents of (a) Cape Town, South Africa and (b) Lusaka, Zambia, the population density score (POPscore) is plotted against the built score (BUILTscore). Spatial units are classified according to settlement type, as either being within an informal settlement (yellow) or not, i.e. all other locations (blue). The size of the circle denotes the proportion of spatial units with each combination of POPscore and BUILTscore values in the two settlement types.

Figure 6 shows that a greater proportion of the spatial units in areas of informal settlement had higher population density and built score values, than spatial units in other types of settlement. For Lusaka, 92.3% of spatial units in informal settlements had a population density score of 7 or greater, while for other settlement types, only 35.8% of spatial units did. Similarly for Cape Town, 84.9% of spatial units in informal settlements had a population density score of 7 or greater, while for other settlement types, only 20.4% of spatial units did. Focussing on the highest population density scores, 58.8% and 66.7% of spatial units in informal settlements in Lusaka and Cape Town respectively, had a population density score of 9 or greater with the same score only for 16.6% (Lusaka) and 7.4% (Cape Town) of spatial units in other settlement types. Built score values are consistently lower than population density score values, but higher values are still observed for spatial units within informal settlements than other settlement types. 45.4% of spatial units within informal settlements in Lusaka and 69.7% of spatial units within informal settlements in Cape Town had a built score value greater than or equal to 4. In comparison, only 18.9% and 31.6% of spatial units in other settlement types in Lusaka and Cape Town respectively had a built score value greater than or equal to 4. The higher score values found in informal settlement locations are reflected in the index values also, with 61.7%/73.4% of spatial units in informal settlements (Lusaka/Cape Town) having an index value of 6 or greater, compared to 19.3%/12.0% of spatial units in other settlement types (Lusaka/Cape Town). Examining index scores for areas of informal settlements in Lusaka and Cape Town, confirms that high index values, indicating greater difficulty in social distancing, are found in locations where social distancing is recognised as being more difficult. This comparison would ideally be repeated across all urban extents in the ease of social distancing index dataset, but this is not possible given the limited mapped data on informal settlement extents.

Gibson and Rush10 assessed social distancing feasibility by calculating the distance between buildings in two informal settlements in Cape Town. In both settlements: Masiphumelele and Klipfontein Glebe, it was identified that social distancing would be very difficult given the dense arrangement of buildings10. Settlements with these names are included in the Cape Town informal settlement dataset described previously, however the spatial extents of the settlements differ and only overlap for the settlement of Masiphumelele. The ease of social distancing index values for spatial units in Masiphumelele are generally high (minimum: 8, maximum: 8.5); the index values are in agreement with Gibson and Rush’s findings for Masiphumelele. For the settlement of Klipfontein Glebe only a qualitative assessment was carried out due to the spatial mismatch in the area identified as Klipfontein Glebe. The area considered to be Klipfontein Glebe in Gibson and Rush’s work did not have particularly high index values. The low index values for this area are driven by low population density scores, whilst built scores remain relatively high. Reviewing the values of the gridded population dataset for Klipfontein Glebe shows surprisingly low values given the number and density of built structures. The gridded population estimates used in calculating index values for South Africa are top-down estimates (i.e. disaggregated from census projections). The discrepancy between the population score and the built score may be the result of a potential under-enumeration of the population in this area during the census or growth of settlement since the last census. This is consistent with the findings of Thomson et al.41 who identified underestimates of population in gridded datasets for slum areas in Nigeria and Kenya. Future work to improve the accuracy of gridded population estimates in informal settlements, such as the development of bottom-up modelled estimates42,43, will likely benefit derived datasets such as the ease of social distancing index.

Usage Notes

The ease of social distancing index datasets provide estimates of social distancing feasibility for urban areas across sub-Saharan Africa. The index values are calculated for small spatial units, providing mapped estimates at high spatial resolution. The datasets are available to download in Shapefile format and users can work with these in Geographic Information Systems (GIS) software or other software with spatial analysis capabilities. These datasets can support a range of applications, both directly associated with COVID-19 response and more broadly related to public health, urban planning and accessibility. For example, an analysis to identify locations with poor access to COVID-19 testing, could also consider where social distancing is most difficult as locations with both poor access to testing and difficulty in social distancing may be susceptible to rapid community transmission that is not detected through standard testing programmes. In such locations, infected individuals are unlikely to be able to effectively self-isolate at home and therefore to prevent further community transmission, support and provision of accessible facilities nearby to self-isolate are likely needed5. Given potential differences in testing- or treatment-seeking behaviour, the ease of social distancing index may be beneficial in planning sampling frames for COVID-19 seroprevalence surveys in terms of stratification by neighbourhoods or settlement types. The index datasets may also have a role in community advocacy, for example in providing quantitative data and mapped outputs to communities experiencing difficulties in social distancing and associated challenges with overcrowding. Aside from the COVID-19 pandemic, the ease of social distancing index can be useful in identifying locations susceptible to other risks. Locations with high population densities and overcrowding, particularly when combined with poor ventilation, can provide conditions favourable for transmission of other airborne infectious diseases, such as influenza, tuberculosis or measles44. Other physical hazards are also a concern in such locations; for example, buildings in close proximity and constructed from flammable materials are at particular risk of rapid fire spread - conditions that are commonly found within informal settlements10.

The following limitations have been identified with the ease of social distancing datasets. Firstly, the index is calculated from data on urban form and population density to capture factors affecting social distancing feasibility at a high spatial resolution. The building footprint data were created through automated feature extraction of satellite imagery, and represent the spatial extent of buildings but do not include information on building height or use (e.g. whether the building is used for residential purposes). The imagery used in the extraction of the building footprints was acquired over multiple years. Greater than 80% of the imagery for the area covered by the ease of social distancing index is from 2018 or 201945, however some imagery therefore predates 2018, and cloud cover in the satellite imagery may introduce false negatives into the building footprint datasets. The time point of the input gridded population datasets also varies. For the majority of locations where WorldPop top-down population datasets based on projected census figures were used, the projected estimates were for 2020. For locations where other population estimate datasets have been used (Burkina Faso, Ghana, Nigeria, Mozambique, South Sudan, Sierra Leone, Zambia and the Democratic Republic of the Congo), the time point of the population estimates varies between 2015 and 2019 as estimates were only available for a single year. Population density was estimated from a single dataset for each country, however other gridded population datasets at similar spatial resolutions are available29. Between datasets, there can be marked variation in estimates of population at the grid cell level29,41,46,47. This is potentially a considerable source of uncertainty for analyses using gridded population datasets, including the ease of social distancing index. Future work will explore alternative datasets and potential methods for integrating multiple datasets.

Secondly, the gridded population datasets provide estimates of residential or nighttime population density. Given that significant diurnal population movements occur within cities, the index values may well change if estimates of daytime population density were used instead. There is however a lack of datasets representing daytime population distributions within urban areas, e.g. LandScan USA27, Pop24723 and Batista e Silva et al.48 provide daytime population estimates, but are limited to the USA, England and Europe respectively. Other data on population mobility such as call detail records (CDRs) from mobile phones or Facebook mobility data, could be used to capture these different spatial distributions where available (e.g.24,49,50). Population movements that occur at particular times and in particular places may also influence social distancing feasibility, for example transport hubs and markets. In these locations large numbers of people can congregate at certain times, and integrating data on population mobility could help identify such locations.

Thirdly, in addition to population density and urban form, socioeconomic factors influence social distancing feasibility, however accurate data on these factors which is georeferenced at a granular level is rarely available. Socioeconomic factors such as employment in the informal sector/reliance on daily wages, use of communal WASH facilities and dependence on crowded public transport can have a multiplicative effect in increasing difficulty in social distancing. Conversely, a secure income source from employment which it is possible to do whilst working from home, can increase the feasibility of social distancing. Such individual or community-level factors will also influence social distancing feasibility. As the index values are solely based on estimates of population density and urban form, which themselves have their own limitations, the dataset should be considered as one source to guide response efforts, but should not be relied on as the sole basis for decision-making. Finally, the ease of social distancing index dataset is limited in its spatial coverage to urban areas in 50 countries in sub-Saharan Africa. This index has been developed with these contexts in mind, but social distancing challenges will also be present in smaller towns and some rural locations within the countries covered by the dataset. Future work will look to expand the coverage of the index datasets, both outside major towns and cities, through the development of a gridded version of the index, and to a greater number of countries. We will also look to incorporate additional factors, for example data on building height, and explore alternative sources for calculating population density, including self-reported slum population values.