Main

Urban density is often considered a contributing factor in the transmission of infectious diseases. Although urban agglomerations have been connected to the risk of disease throughout history, the rapid population growth in cities during the industrialization period of the nineteenth century, and the rise of epidemiological methods, more directly linked cities to disease spread1. As increasing demand outpaced infrastructure capacity, the resultant overcrowded housing, congestion and insufficient sanitary and public health services led to widespread disease outbreaks, particularly among disadvantaged populations. Since higher urban densities are associated with a greater probability of close contacts, it follows that the rate of infectious disease transmission should increase with population density, represented mathematically by a scaling function2,3. In response, residents living in higher-density communities may acknowledge this risk, and modify their behavior to reduce the potential for exposure to infected individuals4. Emerging technologies, such as food home-delivery apps, and changing labor market structures, which enable new opportunities for remote work for some occupations, contribute to the viability of social distancing behaviors for certain groups5,6. Studies on density and disease spread have been constrained by the confounding effects of the positive correlation between population density and contact probability, and the potential mitigating influence of behavior change among urban residents7,8. The ability of a particular community to shift mobility behavior through social distancing, however, is mediated by racial and ethnic disparities, social and cultural norms, political ideologies, education, occupation and income, among other household and neighborhood attributes9,10,11. 
Beyond this polarity, the influence of density on local health risk is further shaped by a range of social determinants, including access to health care and housing conditions12. Thus, transmission rates can vary considerably across different communities in the same city13.

Taken together, it is important to consider the role of urban population density in disease transmission at the scale of the neighborhood, where variations in built environment and socioeconomic characteristics can be observed and analyzed14,15,16. During what is considered the first wave of the coronavirus disease 2019 (COVID-19) pandemic in the United States, many cities and states issued ‘stay-at-home’ or ‘shelter-in-place’ orders to encourage, or mandate, social distancing and the reduction of close contacts outside of the home17. This behavioral intervention was intended to reduce exposure density and thus minimize the spread of the disease. The response to these orders, however, was not uniform9,18. Such interventions have been found to have had a range of negative effects on individuals and communities, from mental health and emotional well-being, to educational outcomes and financial burdens, reinforcing the need to understand the relationship between stay-at-home orders, neighborhood context and transmission risk19,20,21. Measuring and understanding contact heterogeneity resulting from social distancing practices within neighborhoods is also a necessary step in evaluating the effect of urban density on infection rates, as well as the role social and economic disparities play in increasing risk for vulnerable communities.

There has been growing interest in social distancing behavior during the COVID-19 pandemic, as measured using various data sources, such as aggregated mobility data derived from smartphone geolocation information and geotagged social media activity22,23,24. However, the value of analyzing social distancing independent of the built environment context, and vice versa, is limited. For instance, a household in a rural community sheltering-in-place would reduce their potential risk of exposure to an infected individual less than a similar household in an urban environment simply as a function of the lower likelihood of close contacts in a less-dense neighborhood. Thus, we must consider both the baseline risk of transmission in a particular place, and the impact of social distancing behavior on an adjusted risk basis.

Several recent studies have specifically examined the effect of urban density on the transmission of COVID-19 in different global contexts25,26,27,28,29,30,31. These studies report inconsistent findings on the link between population density and COVID-19 transmission rates31,32. Most are limited by the geographic scale of the analysis, the granularity of COVID-19 outcome data and the extent to which social distancing behavior is appropriately represented27,30,33,34,35,36,37,38. First, density and disease transmission are mediated by the likelihood of close contacts in urbanized areas. Effectively accounting for mobility behaviors when examining differential socio-spatial impacts of density creates nontrivial data and computational challenges39. Second, many studies analyze the effects of density at the county or regional level, which can obscure localized variations in urban density, particularly at the neighborhood scale36,40, or focus on a single city, which can constrain the generalizability of findings30. Finally, it is important to account for local health policy context during the study period, particularly as it relates to stay-at-home orders. Many cities and states adopted these orders in the early stages of the pandemic to limit disease transmission. However, the timing of these orders, and the extent of the associated mandates, varied considerably across the United States, given local and regional differences in disease prevalence and political ideology41,42.

In this Article, we analyze the effects of urban residential population density on COVID-19 infection rates at the neighborhood (census tract) level, while controlling for changes in mobility behavior and other social determinants of health. We model both residential population density and exposure density across neighborhoods in 15 US cities to identify and evaluate how racial and income disparities influence the effects of density and the adoption of mitigating behaviors in response to COVID-19 transmission risk. We address three questions related to local public health policy and the impact of density and land use on infection rates during the first wave of the COVID-19 pandemic: (1) how did neighborhoods in different urban and socioeconomic contexts respond to social distancing policies during the early stage of the pandemic?; (2) what are the effects of land use and density on social distancing behaviors and COVID-19 infection rates? and (3) how do racial and income disparities influence mitigating behaviors in higher-density neighborhoods? To examine heterogeneity in neighborhood effects and capture regional variations, we focus our study on a diversity of cities in the United States for the time periods before, during and after stay-at-home orders were issued in each respective city.

Our approach is as follows: we first employ an indicator of exposure density as a measure of social distancing behavior to quantify changes in neighborhood activity associated with stay-at-home orders, using mobility data provided by VenPath, Inc. This indicator captures changes in human mobility behavior by both the volume of activity in a given place and the nature of that activity across land-use types. Exposure density provides a measure of local adoption of social distancing behaviors accounting for both migration patterns and changes in typical routines. We then analyze the effects of density and land use on COVID-19 infection rates by neighborhood, controlling for measured changes in exposure density. We define density here as the residential population per unit area of residential land use in a given census tract. Finally, we identify neighborhood racial and income disparities in observed COVID-19 outcomes by analyzing socioeconomic and demographic correlates of infection rates across neighborhoods with similar densities and land-use typologies. Throughout our analysis, we account for appropriate covariates that influence health outcomes. Our findings provide insight into the role of urban density on disease transmission and the effect of mitigating behaviors on neighborhood health risk, supporting targeted and equity-driven policy decisions.

Results

Neighborhood disparities in social distancing behavior

We calculate the average exposure density (as defined in Methods) for each census tract in the 15 cities (Supplementary Table 1) included in the study before the COVID-19 pandemic period (prepandemic or ‘typical’ period, covering the 2 weeks from 16 February to 29 February 2020), in the 2-week period after the stay-at-home order was issued for each respective city, and the 2-week period following the lifting of the stay-at-home order, also known as the ‘phase 1’ reopening (Supplementary Table 1). The neighborhood exposure density for each city is presented in Supplementary Fig. 2.

Figure 1 (selected cities) and Supplementary Fig. 3 (all cities) visualize the spatial patterns of neighborhood exposure density change over the study period (for additional details, see Supplementary Information). In a majority of cities, the overall activity volume in downtown areas decreased after the stay-at-home order compared with pre-COVID-19 levels, while activities in the peripheral areas remained relatively constant or increased, a result of a shift to more localized activity around residential areas. The nature of activity changed over this time period, with a substantial increase in the proportion of activity in residential land uses and commensurate decreases in nonresidential (office, retail, school and so on) and outdoor (park, sidewalks and so on) areas. Varying levels of social distancing behavior adoption across the country are observed. In Miami, Florida, for example, activity proportions by land use type are found to be relatively unchanged before, during and after the state-wide stay-at-home order, indicating a relatively stable mobility pattern. On the other hand, nonresidential and outdoor activities in New York City decreased approximately 20% after the stay-at-home order when compared with prepandemic levels. Independent of the total volume of activity in New York City, and the migration of population out of the urban core, those residents that remained changed their mobility behavior to avoid activities outside of the home.

Fig. 1: Census tract-level exposure density changes between the prepandemic period and the period after the stay-at-home order (left), and between the stay-at-home period and the period after phase 1 reopening (right).
Austin, TX (top), Chicago, IL (middle) and New York City, NY (bottom) are shown here as examples. Maps for all cities are found in Supplementary Fig. 3. The minimum and maximum values of −0.4 and 0.4 are threshold levels for the lowest and highest bins, respectively, for visual representation.

Figure 2 shows the neighborhood distribution of exposure density change across the studied cities. Neighborhoods in New York City, on average, experienced a decrease in exposure density of approximately 49%, and almost all neighborhoods adopted social distancing behaviors after the stay-at-home order (represented as the gray-colored area under the curve). In the case of neighborhoods in Las Vegas, Nevada, on the other hand, average exposure density decreased by only 6%, and approximately half of all neighborhoods did not adjust their typical mobility behaviors. Overall, neighborhoods in east coast cities, including Boston, New York City, Philadelphia and Washington DC, exhibit larger decreases in exposure density when compared with the other cities in the study group, as represented by positively skewed exposure density distributions. The distributions of the neighborhood response to the lifting of the stay-at-home order are shown as dotted curves. For example, the exposure density curve for New York City after its phase 1 reopening demonstrates a clear positive shift from the stay-at-home period. The mean value for the exposure density change curve indicates a return to near-normal levels, on average, of neighborhood activity. As the figures represent, exposure density change varied considerably across cities and within different neighborhoods in each city. Despite similar social distancing policies, there are clear disparities in community social distancing behaviors.
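As a minimal sketch of how the change values reported above could be computed, the following assumes that exposure density change is expressed as a relative change from the prepandemic baseline (so that −0.49 corresponds to a 49% decrease) and that map values are clamped to the ±0.4 display thresholds noted in the Fig. 1 legend. The function names and the tract values are hypothetical; the exposure density indicator itself is defined in Methods and is not reproduced here.

```python
def relative_change(baseline, current):
    """Relative change from baseline; -0.49 means a 49% decrease."""
    if baseline <= 0:
        raise ValueError("baseline exposure density must be positive")
    return (current - baseline) / baseline


def clip_for_display(change, lower=-0.4, upper=0.4):
    """Clamp a change value to the lowest and highest map bins."""
    return max(lower, min(upper, change))


def share_reducing(changes):
    """Fraction of tracts whose exposure density fell below baseline."""
    return sum(1 for c in changes if c < 0) / len(changes)


# Hypothetical (baseline, stay-at-home) exposure densities for four tracts.
tracts = [(100.0, 51.0), (80.0, 76.0), (50.0, 55.0), (120.0, 48.0)]
changes = [relative_change(b, s) for b, s in tracts]
display_values = [clip_for_display(c) for c in changes]
```

The share of tracts with a negative change corresponds conceptually to the shaded area under each curve in Fig. 2.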

Fig. 2: Neighborhood distribution of exposure density change across the studied cities.
The solid lines represent exposure density change after the stay-at-home order and dotted curves show the change after the lifting of the order.

Effects of density on social distancing and infection rates

After observing community differences in mobility behaviors during the first wave of the COVID-19 pandemic, we focus on the effect of neighborhood residential population density on mitigating behaviors and COVID-19 infection rates. Specifically, we hypothesize that neighborhoods with similar density and land-use characteristics will have similar social distancing behavioral responses and comparable infection rates, after controlling for other social determinants of health. Residents in higher-density neighborhoods may perceive a higher risk of infection due to the greater probability of close contacts with potentially infected individuals in their everyday lives. As a risk-mitigation measure, this awareness could lead to mobility behavior changes to reduce the likelihood of interaction with others outside of the household or family unit. Conversely, residents in lower-density neighborhoods may not perceive an increased risk of transmission in their communities, where the number of random social interactions may be relatively limited. This could result, then, in more modest mobility behavior changes in response to social distancing restrictions. Other factors, such as political ideology, demographic and socioeconomic characteristics, and risk tolerance are also considered.

To test this hypothesis, we first group neighborhoods in all 15 cities based on density and land-use characteristics using k-means clustering, an unsupervised machine learning method. The algorithm is applied to a total of 6,216 census tract neighborhoods after removing outliers based on population size (for descriptions of the methodology and land use data, see Methods). The clustering output identifies five distinct neighborhood groups with similar built environment characteristics across the cities in the study group. Neighborhoods assigned to the same cluster group, although they may not be in the same city, share similar land use profiles and population densities. Figure 3 visualizes the spatial pattern of the clusters for selected cities (for all cities, see Supplementary Information) and Supplementary Table 2 presents descriptive statistics for the built environment features for each cluster group.
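The grouping step can be illustrated with a small, self-contained version of k-means (Lloyd's algorithm). This is a sketch under stated assumptions rather than the study's implementation: the feature values are invented, only two features and two clusters are used for brevity (the study identifies five groups), and the function name `kmeans` and its parameters are hypothetical. In practice, features on different scales would typically be standardized before clustering.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct observations.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Hypothetical tract features: (scaled density, residential land-use share).
features = [
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
]
labels, centroids = kmeans(features, k=2)
```

Tracts that receive the same label share similar feature profiles regardless of which city they belong to, which is the property the cluster groups rely on.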

Fig. 3: The results of the neighborhood clustering algorithm based on density and land use, with Austin, TX (left), Chicago, IL (middle) and New York City, NY (right) as examples. The number of census tract neighborhoods assigned to each clustered group is shown in the legend.
The results for all cities can be found in Supplementary Fig. 4. CT, census tract.

The five neighborhood groups have distinct patterns of exposure density change, as shown in Supplementary Table 2. Group 1 (defined as ‘ex-urban residential’ given the relatively high proportions of residential and open space land uses in these neighborhoods) and group 2 (‘institutional and industrial’) neighborhoods decreased overall activity 26% and 27%, respectively, which represents the lowest rate of exposure density change across the five groups. Residents in these neighborhood groups may be more likely to maintain their behaviors and activity outside of the home because of a lower perceived risk of close contacts. ‘Low-density residential’ (group 3, characterized by one- and two-family housing) and ‘mixed-use’ (group 4) neighborhoods show similar exposure density change rates, decreasing overall by 34% and 36%, respectively, with a noted shift away from activities in nonresidential and outdoor areas. Activity levels in ‘high-density’ neighborhoods (group 5), which includes urban core neighborhoods, dropped dramatically, approximately 61% when compared with baseline (prepandemic) levels. Residents in this cluster group appear to adjust their normal mobility behaviors substantially in response to social distancing mandates. Additionally, neighborhoods in group 5 see many visitors for various purposes (for example, commuting, tourism, education, work and so on), which may influence risk perception and thus behavior change. Furthermore, there were considerable decreases in activity volume in group 5 neighborhoods resulting from population out-migration, thus accounting for the large decrease in exposure density. The relationship between neighborhood density and social distancing behavior is more clearly demonstrated in the scatter plots of population density and exposure density change for each neighborhood group. As Fig. 4 shows, there is a statistically significant negative relationship between residential population density and exposure density change within each neighborhood group. This suggests that neighborhoods with higher residential population density reduced their exposure density more than lower-density neighborhoods.

Fig. 4: Scatter plots of residential population density and exposure density change for each cluster group.
The linear best fit is shown.

The magnitude of the density effect on infection rates can be estimated using a spatial contact model2,43,44. Airborne transmission of respiratory viruses generally follows a density-dependent curve within a relatively lower-density setting that saturates in a higher-density setting based on a frequency-dependent constraint (Fig. 5a and ref. 2). From empirical data as shown in Fig. 5b, however, we observe a quadratic function as the best-fit relationship between residential population density and neighborhood infection rates (here, we use cumulative COVID-19 case rates through the second week of August 2020). This suggests that within higher-density communities, there are mitigating behaviors that reduce transmission rates from what would be expected based on density alone.

Fig. 5: Relationship between residential population density and COVID-19 infection rate.
a, Illustration of the conceptual relationship between population density and infection rate. b, A scatter plot of neighborhood residential population density and infection rates for the first wave of the COVID-19 pandemic. The generalized best fit curve (y = −120.2x² + 2,282x − 8,062) results in a higher R² (0.027) than a linear relationship (R² = 0.00026). c, A conceptual diagram for identifying neighborhoods with and without mitigating behaviors.
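The turning point of the fitted curve in Fig. 5b can be located directly from its coefficients. The sketch below assumes that x is the natural logarithm of residential population density in residents per square mile; this scaling is not stated in the legend, but under it the peak corresponds to a density close to the value of approximately 13,260 residents per square mile reported in the Discussion. For a quadratic, the point referred to in the text as the inflection point is the vertex of the parabola, at x = −b/(2a). Variable and function names are illustrative.

```python
import math

# Coefficients of the best-fit curve reported in the Fig. 5 legend.
A, B, C = -120.2, 2282.0, -8062.0


def fitted_rate(x):
    """Fitted case rate at x (x assumed to be ln of residential density)."""
    return A * x**2 + B * x + C


def turning_point():
    """Vertex of the parabola, where the fitted case rate peaks."""
    x_peak = -B / (2 * A)
    return x_peak, fitted_rate(x_peak)


x_peak, y_peak = turning_point()
# Back-transform, under the assumption that x = ln(density).
density_at_peak = math.exp(x_peak)
```

Because the leading coefficient is negative, fitted case rates rise with density up to this point and decline beyond it.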

We assume that mitigation effects begin at the inflection point of the curve, and identify communities with densities greater than the x value of the inflection point and case rates lower than its associated y value (the lower-right shaded quadrant). Neighborhoods in this quadrant are assumed to exhibit mitigating behaviors (for example, mobility changes to reduce exposure density) to lower case rates below what would be expected on the basis of residential density alone. The magnitude of the mitigation effect can be measured by the relative vertical distance between the inflection point in the curve and a given neighborhood’s case rate (for a detailed description, see Methods). Supplementary Table 3 represents the summary of mitigation effects for each clustered group (city-specific mitigation effects are summarized in Supplementary Table 4 and Supplementary Fig. 5).
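The quadrant rule and the mitigation magnitude described above can be sketched as follows. The normalization of the vertical distance by the case rate at the turning point is an assumption made for illustration (the exact definition is given in Methods), and the function names and numerical values are hypothetical.

```python
def shows_mitigation(density_x, case_rate, x0, y0):
    """True if a tract lies in the lower-right quadrant of (x0, y0)."""
    return density_x > x0 and case_rate < y0


def mitigation_effect(case_rate, y0):
    """Vertical distance below y0, relative to y0 (assumed normalization)."""
    return (y0 - case_rate) / y0


# Hypothetical turning point and tracts: (density measure, case rate).
x0, y0 = 9.5, 200.0
tracts = {"tract_a": (10.2, 150.0), "tract_b": (10.4, 260.0), "tract_c": (8.9, 120.0)}

mitigating = {
    name: mitigation_effect(rate, y0)
    for name, (x, rate) in tracts.items()
    if shows_mitigation(x, rate, x0, y0)
}
```

Only `tract_a` qualifies in this toy example: `tract_b` is dense but has a case rate above the threshold, and `tract_c` falls below the density threshold.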

To understand differences between higher-density neighborhoods with and without mitigating behaviors, we examine exposure density change, total residential population change, activity proportion change and socioeconomic characteristics of neighborhoods above and below the estimated inflection point of case rates for each cluster. We specifically examine the role of racial composition and income on disparities in infection rates and exposure density change, controlling for neighborhood density and land-use characteristics. We analyze statistically significant differences within and across cluster groups using t-tests and a logistic regression model (for detailed descriptions of the data and methodology, see Methods).
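For readers less familiar with the group comparison, a two-sample t statistic can be computed as sketched below. The variant used in the study is not specified in this section; Welch's unequal-variance form is shown here as an assumption, and the sample values are invented.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Squared standard errors of each group mean.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical attribute values for tracts with and without mitigation.
with_mitigation = [1.0, 2.0, 3.0, 4.0, 5.0]
without_mitigation = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = welch_t(with_mitigation, without_mitigation)
```

The resulting statistic would then be compared against a t distribution with the estimated degrees of freedom to obtain a P value.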

The t-test results are presented in Table 1, and the results of the logistic regression model are presented in Supplementary Table 5. Race and ethnicity are found to be significant factors in mitigating behavior for higher-density neighborhoods. For example, the percentage of non-Hispanic White population is positively associated with mitigating behavior (making up 41.65% of the population of neighborhoods with mitigating behavior versus 15.94% of those without, odds ratio 1.789), indicating that neighborhoods with larger proportions of non-Hispanic White population reduced exposure density after controlling for urban density. On the other hand, the proportion of Black population is found to have mixed outcomes depending on neighborhood group. The effect for Asian and Hispanic populations is found to be consistent across the different neighborhood clusters. Neighborhoods with a greater reduction in exposure density are more likely to have a higher proportion of Asian households and fewer Hispanic households. Overall, more racially diverse neighborhoods are less likely to change their mobility behavior, particularly in higher-density communities.

Table 1 Demographic and socioeconomic characteristics of higher-density neighborhoods with and without mitigating behavior. Statistically significant features shown

Educational attainment and job occupation are also found to be important determinants of social distancing and exposure density change. Neighborhoods where social distancing behavior adoption is greatest tend to have populations with higher educational attainment (based on the percentage of the population with a bachelor’s degree or higher) and a larger percentage of employees working in finance, real estate, professional, scientific and management occupations. On the other hand, neighborhoods without mitigating behaviors tend to have a higher proportion of employees working in manufacturing, wholesale, retail, transportation and health care, occupations more likely to be considered essential or less conducive to work-from-home arrangements45. This finding is observed in the results of both the t-tests and the logistic model, where the odds ratio for essential workers is 0.519.

Income, health insurance coverage and neighborhood health characteristics are found to be among the influencing factors in the behavioral responses to social distancing guidelines, after controlling for land use and density. Neighborhoods with higher median household incomes (measured as the normalized difference from the city mean) reduced exposure density more than relatively lower-income neighborhoods, resulting in a larger mitigation effect in high-density areas. The results for households without health insurance and residents with underlying health conditions suggest that socioeconomically disadvantaged and vulnerable neighborhoods are less likely to adopt social distancing behaviors, despite higher health risk factors, when compared with other neighborhoods with similar built environments. This is a result, in part, of the higher likelihood of households to be employed in essential occupations and/or those that cannot be done remotely. Finally, political ideology is shown to be a significant determinant of social distancing compliance across all clustered neighborhood groups. In areas with similar density and land use types, neighborhoods in Republican-governed states are less likely to engage in social distancing behaviors. Additionally, neighborhoods with higher proportions of residents who voted for Democrat Joe Biden in the 2020 Presidential election are more likely to engage in social distancing behavior compared with communities with more citizens who voted for Republican Donald J. Trump.

Discussion

Our study provides evidence for the effect of urban density on COVID-19 infection rates at the scale of the neighborhood, while accounting for changes in mobility behavior and social distancing within communities. We note that our study has several limitations, including the effects of city and neighborhood testing differentials on reported COVID-19 case data, the absence of local COVID-19 case data for three cities in the study group, and the unobserved impacts of physical distancing and the use of masks. These constraints are described in detail in Methods.

We observe two critical findings. First, the risk of COVID-19 infection is, in general, positively correlated with density; however, as density increases, the likelihood of the adoption of mitigating behaviors increases, which counteracts the potential transmission risk associated with higher-density neighborhoods. The effect of neighborhood density on COVID-19 infection rates, therefore, is found to be nonlinear, influenced by socio-spatial disparities in income, racial and ethnic composition, and political ideology. Case rates increase with density to a point, measured here to be approximately 13,260 residents per square mile of residential land area, at which case rates begin to decrease. After controlling for neighborhood density and land-use characteristics, we observe that this decrease is driven, in part, by the adoption of mitigating behaviors in the form of social distancing and changes in mobility. Residents in relatively higher-density communities change their mobility behavior most, perhaps in response to the perception of higher transmission risk, and this results in a mitigating effect on infection rates below what might be expected from density factors alone. While infection rates tend to decrease after a specific population density, there is significant variation after accounting for neighborhood socioeconomic and demographic characteristics.

If social distancing behavioral responses stem solely from disparities in risk perception based on residential population density, we would expect that most, if not all, higher-density neighborhoods should exhibit a mitigation effect, either from behavior change (shifting activities from nonresidential to residential areas) or a decrease in population (out-migration from higher risk areas). On the contrary, we observe that there are higher-density communities without mitigating behaviors, such as in the South Bronx and Flatbush in New York City, Dorchester in Boston and East Garfield Park in Chicago (as illustrated by communities above the horizontal line from the inflection point of the density-infection rate curve and spatially visualized in Supplementary Fig. 5). This suggests that while density influences social distancing behavior, additional factors contribute to exposure density change and infection rates more than density alone.

Second, we find that only certain groups are able to change their mobility behavior sufficiently to offset the increased risk due to density. The observed differences in mitigating behavior are directly correlated with the racial and economic composition of the neighborhood. We find that, after controlling for neighborhood population density and land-use mix, communities without mitigating mobility behaviors tend to have a greater proportion of racial and ethnic minorities, lower median incomes and more critical vulnerabilities, including overcrowded housing, lack of health insurance and higher likelihood of underlying health conditions. Furthermore, the proportion of essential workers in nonmitigating neighborhoods is almost double that of neighborhoods with mitigating behaviors. This indicates that neighborhoods facing the greatest risk from COVID-19 infection are least able to change their mobility behavior in response to social distancing mandates, putting these neighborhoods at continued risk for COVID-19 transmission despite stay-at-home orders. These findings suggest a paradox for social distancing mandates—while they are intended to protect the most vulnerable, these policies may have a disparate impact on at-risk communities that are unable, for a wide range of reasons, to meaningfully change mobility behaviors. As such, while stay-at-home guidelines and the resultant changes in behavior reduce the risk of infection for some, the ancillary negative effects on health and well-being from sheltering in place may have created additional burdens for vulnerable communities, without a considerable reduction in transmission risk.

Taken together, while infection rates increase with density, behavior change in higher-density neighborhoods appears to outweigh the intrinsic risk associated with densely populated communities. However, low-income and minority communities, facing cascading health challenges, are found to be least able to modify their mobility behavior and therefore experienced a disproportionate burden of COVID-19 infection risk during the first wave of the pandemic. The implications of this for equitable health policy interventions are substantial. If the response to stay-at-home orders is not uniform and vulnerable communities are least able to adopt social distancing behaviors, then not only do these communities face greater risk of infection due to a higher probability of exposure, they also bear the burden of the negative impacts associated with stay-at-home mandates.

Methods

Data

This study relies on several data sources. A descriptive summary is presented in Supplementary Tables 6–8. The primary data used for calculating the exposure density index are anonymized smartphone geolocation pings obtained from VenPath, a data-marketplace company providing mobile-application data based on more than 200 smartphone applications across the United States. The dataset contains approximately 250 billion geotagged data points associated with 4.5 million unique devices hourly, and covers the period from 1 February to 13 July 2020. Due to the sensitivity of the data, raw data were collected from the dedicated Amazon Web Services S3 cloud storage service using the Amazon Web Services command line interface and processed within the secured and access-controlled environment of the New York University (NYU) High Performance Computing facility. To further ensure anonymity and data privacy, after classifying mobile activity data into one of the representative types using land-use information (as described further in Methods), geolocation data were aggregated to a 250-meter grid and further averaged at the neighborhood level (operationalized here as the census tract, but other areal units could be used with the method, as shown in Supplementary Fig. 1). The data processing and data management plan were approved by NYU’s institutional review board (approval no. IRB-FY2018-1645).
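The grid aggregation step can be illustrated with a short sketch that assigns each point to a 250-meter cell and counts points per cell. It assumes that ping locations have already been converted to projected coordinates in meters; the function names and coordinates are hypothetical and do not reflect the actual processing pipeline.

```python
import math
from collections import Counter

CELL_SIZE_M = 250.0


def grid_cell(x_m, y_m, cell_size=CELL_SIZE_M):
    """Integer (column, row) of the grid cell containing a projected point."""
    return (math.floor(x_m / cell_size), math.floor(y_m / cell_size))


def aggregate_pings(pings):
    """Count pings per grid cell, discarding the exact point locations."""
    return Counter(grid_cell(x, y) for x, y in pings)


# Hypothetical projected ping locations (meters).
pings = [(10.0, 20.0), (240.0, 249.9), (260.0, 10.0), (499.9, 250.0)]
cell_counts = aggregate_pings(pings)
```

Only cell-level counts are retained after this step, which is what allows subsequent averaging at the census tract level without reference to individual device locations.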

For the purpose of assigning an activity type to the mobility geolocation data, we used a range of city-specific land-use data combined with building footprint data, administrative boundaries and road networks. These data were sourced from official administrative records obtained through publicly available open data platforms. Local road network data were obtained for each city from the OpenStreetMap platform using QGIS software with dedicated plugins. Each data layer was converted into the Geographic Coordinate System corresponding with the mobility data and clipped to the local municipality (city) boundary extent. A full list of ancillary data used for this rasterization process and analysis is provided in Supplementary Table 7.

To examine neighborhood-level COVID-19 infection rates, we collected available data on confirmed COVID-19 cases for 12 cities from multiple data sources at the zip code level (localized COVID-19 data were not available for three of the initial sample of 15 cities). On the basis of the number of reported COVID-19 cases for each city, we used spatial interpolation to adjust to the census tract level. Specifically, the infection rate is defined as the number of confirmed cases per 10,000 residential population. COVID-19 case data sources are presented in Supplementary Table 8. Additionally, we retrieved multiple ancillary datasets for neighborhood demographic, socioeconomic and political characteristics. We used the US Census Bureau 2019 5-year estimate American Community Survey database for census tract data on race, ethnicity, household type, housing, income and job occupation attributes46. Voting patterns and political affiliation were extracted from the standardized precinct 2020 presidential data, developed and shared by the New York Times47. We also obtained census tract data on neighborhood health conditions, specifically chronic disease and health indicators provided by the Centers for Disease Control and Prevention48. We used these data to estimate the percentage of neighborhood residents with underlying medical conditions. These datasets are publicly available and acquired from open data platforms and government databases.

Land-use rasterization and mobility behavior classification

We considered activity outside of residential land uses to be associated with a greater risk of infection. While within-household transmission rates have been found to be an important component of COVID-19 disease spread49, our study focuses on the role of urban density in infection risk and, therefore, emphasizes the likelihood of close contacts with those outside the family or household unit.

The rasterization and mobility activity classification methodology is illustrated in Supplementary Fig. 6. To differentiate between residential and nonresidential activities, we first created a 1-meter resolution rasterized basemap with specific land-use type labels for each city. The rasterized basemap integrates city and administrative boundaries, street and sidewalk networks, land-use classifications and building footprints derived from various data sources (as presented in Supplementary Table 7). For each city, the extent of the study area was defined using the administrative boundaries of the municipality, converted into the projected coordinate system (EPSG:3857) and rasterized at a 1-square-meter resolution. Similarly, street network, land use and building footprint data were also reprojected, rasterized, encoded and aligned with the city boundaries raster. After processing, these rasters were then stacked such that each pixel represents a specific land-use/land-cover class. Land cover information is categorized into multiple integer values to represent different standardized land-use types (such as 10 for residential properties, 30 for commercial buildings, and so on; for the full list of classes used, see Supplementary Table 9). In addition to the associated land-use classification label, each pixel is identified by its positional index referenced to the bottom-left corner of the raster, as well as its geographical coordinates.

This information is used for assigning mobility geolocation data to a specific land-use grid cell. To do so, each mobility ping was spatially joined to a corresponding land-use raster pixel based on its location. Calculation of the changes in mobility behavior and exposure density is based on the activity levels aggregated to a 250-meter × 250-meter grid and normalized by the land-use distribution within each grid cell. To estimate those changes, we count the hourly number of unique devices within each land-use category based on the matched raster pixel in a grid cell. This process helps to further ensure data privacy and account for uncertainty in the device geolocation accuracy. Land-use assignment is used as a proxy for the nature of the activity occurring in the respective cell. It allows us to differentiate between activities occurring within four distinct groups, namely residential areas, nonresidential areas (for example, commercial and retail), outdoor areas (for example, park and playground) and major roads associated predominantly with vehicular traffic (as presented in Supplementary Table 9). Activities occurring on major road networks are excluded from the analysis. To align with the spatial resolution of the ancillary datasets, the resultant activity and exposure density metrics are aggregated and reported at the census tract level.
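The ping-to-pixel join and the hourly unique-device count described above can be illustrated with a minimal Python sketch. It is not the PySpark pipeline used in the study. The function name, the in-memory dictionary standing in for the raster, and the land-use codes 50 and 90 are hypothetical; only codes 10 (residential) and 30 (commercial) are named in the text.

```python
import math
from collections import defaultdict

# Hypothetical mapping from integer land-use codes to the four activity groups.
# Codes 10 and 30 are named in the text; 50 and 90 are placeholders.
GROUP_BY_CODE = {10: "residential", 30: "nonresidential",
                 50: "outdoor", 90: "major_road"}

CELL_SIZE_M = 250  # aggregation grid size stated in the text


def count_unique_devices(pings, land_use, x_min, y_min):
    """Count unique devices per (hour, grid cell, land-use group).

    pings: iterable of (device_id, hour, x, y) in projected meters.
    land_use: dict mapping a 1 m pixel index (col, row), counted from the
        bottom-left corner of the raster, to an integer land-use code.
    x_min, y_min: projected coordinates of the raster's bottom-left corner.
    """
    devices = defaultdict(set)
    for device_id, hour, x, y in pings:
        col = math.floor(x - x_min)  # 1 m pixel column
        row = math.floor(y - y_min)  # 1 m pixel row
        group = GROUP_BY_CODE.get(land_use.get((col, row)))
        if group is None or group == "major_road":
            continue  # unmatched pings and major-road activity are dropped
        cell = (col // CELL_SIZE_M, row // CELL_SIZE_M)
        devices[(hour, cell, group)].add(device_id)
    return {key: len(ids) for key, ids in devices.items()}
```

Storing device identifiers in a set per key means that repeated pings from one device within the same hour, cell and land-use group are counted once, which matches the unique-device counting described in the text.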

Initial data processing of each land-use layer was performed using QGIS software followed by the rasterization process using the Python environment within NYU’s Center for Urban Science and Progress Research Computing Facility (CUSP RCF). Mobile activity mapping was executed inside NYU High Performance Computing’s Hadoop cluster using the PySpark environment.

Measuring social distancing behavior using exposure density

To measure neighborhood behavioral responses to social distancing across multiple cities, we updated and expanded our exposure density metric developed previously9. This accounts for the land-use rasterization process for each city. Exposure density is based on the probability of contact (contact rate) with others outside of residential land uses. It measures the likelihood of random contact with a potentially infected individual given a unit area, land-use category and defined time interval. This can be expressed as the neighborhood population, including both infectious and susceptible subgroups and accounting for population changes over the study period, multiplied by the proportion of activity in nonresidential and outdoor land-use settings, normalized by the total area of nonresidential and outdoor land uses within a specified geographic boundary. Our exposure density metric accounts for not only the volume of people active in a given area, but also the extent of activity occurring outside of predominantly residential areas.

Here, we define exposure density \({E}_{r(t)}\) as the population within a unit area during a given time period. \({E}_{r(t)}\) is the number of unique devices in a selected geographic area multiplied by the proportion of activity occurring in nonresidential and outdoor land uses. This measure approximates a contact probability for mobility activities occurring outside of residential areas. Therefore, exposure density is specified as:

$${E}_{r(t)}=\frac{{N}_{(t)}\times {P}_{{\mathrm{NRO}}(t)}}{{A}_{{\mathrm{NRO}}}},$$
(1)

where \({E}_{r(t)}\) is the exposure density for a given temporal unit, \({N}_{(t)}\) is the total population in a given geographic area, \({P}_{{\mathrm{NRO}}(t)}\) is the proportion of activity occurring in nonresidential or outdoor land uses (as a proxy for contact probability) and \({A}_{{\mathrm{NRO}}}\) is the total area of nonresidential and outdoor land uses in the selected geographic unit. \({P}_{{\mathrm{NRO}}(t)}\) takes into account not only how many people stay in a given area, but also how many people are active outside of their presumed place of residence. \({P}_{{\mathrm{NRO}}(t)}\) is given by:

$${P}_{{\mathrm{NRO}}(t)}=\frac{{N}_{{\mathrm{NR}}(t)}+{N}_{{\mathrm{O}}(t)}}{{N}_{(t)}},$$
(2)

where \({N}_{{\mathrm{NR}}(t)}\) is the hourly number of unique devices in nonresidential land uses and \({N}_{{\mathrm{O}}(t)}\) is the hourly number of unique devices in outdoor land uses at the time t. If the population of a given neighborhood all remain in their presumed home locations, as measured by the mobility data, then the exposure density would be equal to zero.
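Equations (1) and (2) can be evaluated directly, as in the following minimal sketch. The function name and the example values are hypothetical, and the area unit is an assumption because the text does not state one.

```python
def exposure_density(n_total, n_nonres, n_outdoor, area_nro):
    """Exposure density for one geographic unit and one time step.

    Eq. (2): P_NRO = (N_NR + N_O) / N
    Eq. (1): E_r   = N * P_NRO / A_NRO

    n_total: total number of unique devices N in the unit.
    n_nonres, n_outdoor: devices in nonresidential (N_NR) and outdoor (N_O) land uses.
    area_nro: total nonresidential and outdoor area A_NRO (unit assumed, e.g. km^2).
    """
    if n_total <= 0 or area_nro <= 0:
        raise ValueError("n_total and area_nro must be positive")
    p_nro = (n_nonres + n_outdoor) / n_total
    return n_total * p_nro / area_nro


# Hypothetical tract: 1,000 devices, 150 nonresidential, 50 outdoor, 0.5 km^2.
print(exposure_density(1000, 150, 50, 0.5))
```

Substituting equation (2) into equation (1) shows that the metric reduces algebraically to \(({N}_{{\mathrm{NR}}(t)}+{N}_{{\mathrm{O}}(t)})/{A}_{{\mathrm{NRO}}}\), so it is zero whenever no devices are observed outside residential land uses.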

We measure the change in exposure density and activity proportions in different land-use categories for each census tract before, during and after the respective stay-at-home order was implemented for each city. This is represented by:

$${E}_{r{\mathrm{change}}({t}_{1})}=\frac{\left({E}_{r({t}_{1})}-{E}_{r({t}_{0})}\right)}{{E}_{r({t}_{0})}},$$
(3)
$${P}_{L{\mathrm{change}}({t}_{1})}=\frac{\left({P}_{L({t}_{1})}-{P}_{L({t}_{0})}\right)}{{P}_{L({t}_{0})}},$$
(4)

where \({E}_{r({t}_{1})}\) is the 14-day average exposure density after the stay-at-home order, \({E}_{r({t}_{0})}\) is the 14-day average exposure density before the stay-at-home order (prepandemic period) and \({P}_{L({t}_{1})}\) is the 14-day average proportion of activities occurring in different land use types L (residential, nonresidential and outdoor) in a given temporal period. \({E}_{r{\mathrm{change}}({t}_{1})}\) and \({P}_{L{\mathrm{change}}({t}_{1})}\) provide a measure of social distancing behavior, accounting for both the volume of activity and its nature.
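The relative change in equations (3) and (4) can be sketched as follows. The function name is hypothetical, and the use of one value per day in each 14-day window is an assumption made for illustration.

```python
from statistics import mean


def relative_change(before, after):
    """Relative change between two averaging windows, as in Eqs. (3) and (4).

    before: values in the baseline window t0 (for example, 14 daily values).
    after: values in the comparison window t1.
    Returns (mean(after) - mean(before)) / mean(before).
    """
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline average must be nonzero")
    return (mean(after) - baseline) / baseline


# Hypothetical tract: exposure density falls from 400 to 100 on average.
print(relative_change([400] * 14, [100] * 14))  # -0.75
```

A value of -0.75 corresponds to a 75% reduction relative to the baseline window. The same function applies to the land-use activity proportions in equation (4).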

Estimating the effects of neighborhood density

One hypothesis we consider is that if social distancing behavior is independent of neighborhood density, there should be no statistically significant differences in exposure density change between neighborhoods with varying built environment characteristics. Specifically, we test this hypothesis by clustering neighborhoods based on their density and land-use composition, and then examining whether exposure density changes vary across the resultant neighborhood groups.

To identify neighborhood clusters based on density and built environment characteristics, we apply a k-means clustering algorithm based on five input variables: residential population density, percentage of residential land cover, percentage of nonresidential land cover, percentage of industrial or institutional land cover and percentage of open space land cover. Residential population density is defined as the residential population divided by the area of residential land cover. The k-means algorithm is specified as:

$$J=\mathop{\sum }\limits_{j=1}^{k}\mathop{\sum }\limits_{i=1}^{n}| | {x}_{i}^{(\,j)}-{c}_{j}| {| }^{2},$$
(5)

where J is the sum of squares of the distance of each data point to its assigned vector, \({x}_{i}^{(j)}\) is each data point, \({c}_{j}\) is a centroid for cluster j, k is the number of clusters and n is the number of data points. To find cluster centers representing certain regions of the data, k-means clustering conducts an iterative process. After optimization, it was determined that five (k = 5) clusters minimized the total sum of squares of distance between data points and the respective centroid of the cluster to which the data point belongs.
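The iterative process and the objective J in equation (5) can be illustrated with a minimal pure-Python version of Lloyd's algorithm. This is a sketch, not the implementation used in the study. The random initialization, the stopping rule and all names are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids, J) with J as in Eq. (5)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    j_total = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, j_total
```

In the study's setting each point would be a five-element tuple of the input variables for one census tract. Because the variables are on different scales, standardizing them before clustering is a common choice, although the text does not state whether this was done.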

We applied this clustering analysis to all census tract neighborhoods (6,216 census tracts, excluding census tracts with residential land cover accounting for less than 5% of the total area), independent of city identification. The resultant clustered neighborhood groups were then integrated with exposure density measures to examine statistically significant differences in social distancing behavior adoption between groups by using a one-way analysis of variance (ANOVA) and a Tukey’s test for post hoc analysis.
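The one-way ANOVA compares between-group and within-group variability in exposure density change. The following minimal sketch computes only the F statistic, with hypothetical names and toy values. The p-value and Tukey's post hoc test are omitted and would normally come from a statistics package (for example, `scipy.stats.f_oneway` for the ANOVA).

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA.

    groups: list of lists, one list of observations per neighborhood cluster.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with three groups of three observations each.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

For the toy data the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = (26/2)/(6/6) = 13 on 2 and 6 degrees of freedom.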

Mitigating behaviors in higher-density neighborhoods

We first plotted the relationship between residential population density and COVID-19 infection rates for each neighborhood, segmented by clustered group, as a descriptive analysis of neighborhood distributions (Fig. 5). To measure mitigation effects in higher-density neighborhoods, we identified a best-fit curve using a quadratic equation. The inflection point of each curve marks the point where infection rates break from the frequency-dependent constraint, as shown in Fig. 5. Neighborhoods with case rates below this threshold are considered to have lower than expected case rates, and thus demonstrate a mitigation effect. This is specified as:

$${N}_{i}=\left\{\begin{array}{ll}1\quad &{{{\rm{if}}}}\,{D}_{i}\ge {x}_{v}\,{{{\rm{and}}}}\,{C}_{i}\le {y}_{v}\\ 0\quad &{{{\rm{if}}}}\,{D}_{i}\ge {x}_{v}\,{{{\rm{and}}}}\,{C}_{i} > {y}_{v}\end{array}\right.,$$
(6)

where \({N}_{i}\) is a binary variable equal to 1 if a neighborhood exhibits mitigating behavior, and 0 otherwise, \({D}_{i}\) is the residential population density of neighborhood i, \({C}_{i}\) is the COVID-19 case rate for neighborhood i and \(({x}_{v},{y}_{v})\) is a vertex (inflection point) of the fitted curve (Fig. 5c). The mitigation effect for each neighborhood is calculated based on the vertical distance between the y value of the inflection point and the case rate for each neighborhood specified as:

$${M}_{i}=\left\{\begin{array}{ll}\frac{({y}_{v}-{C}_{i})}{{y}_{v}}\quad &{{{\rm{if}}}}\,{D}_{i}\ge {x}_{v}\,{{{\rm{and}}}}\,{C}_{i}\le {y}_{v}\\ 0\quad &{{{\rm{if}}}}\,{D}_{i}\ge {x}_{v}\,{{{\rm{and}}}}\,{C}_{i} > {y}_{v}\end{array}\right.,$$
(7)

where \({M}_{i}\) is the estimated mitigation effect of neighborhood i, \({C}_{i}\) is the COVID-19 case rate of neighborhood i and \({y}_{v}\) is the y value of the inflection point of the fitted curve. \({M}_{i}\) indicates the modeled COVID-19 case rate reduction resulting from mitigating behaviors.
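Equations (6) and (7) can be illustrated with a short sketch that takes the coefficients of an already fitted quadratic curve \(y=a{x}^{2}+bx+c\). The function names and coefficient values are hypothetical. Returning `None` for neighborhoods with \({D}_{i}<{x}_{v}\) is an assumption, since the equations are defined only for \({D}_{i}\ge {x}_{v}\).

```python
def parabola_vertex(a, b, c):
    """Vertex (x_v, y_v) of the quadratic y = a*x**2 + b*x + c."""
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic curve")
    x_v = -b / (2 * a)
    y_v = c - b * b / (4 * a)
    return x_v, y_v


def mitigation(density, case_rate, x_v, y_v):
    """Return (N_i, M_i) following Eqs. (6) and (7).

    density: residential population density D_i.
    case_rate: COVID-19 case rate C_i.
    Returns None when D_i < x_v (assumption: such tracts are out of scope).
    """
    if density < x_v:
        return None
    if case_rate <= y_v:
        return 1, (y_v - case_rate) / y_v  # mitigation observed
    return 0, 0.0  # higher-density tract without mitigation


# Hypothetical fitted coefficients, not taken from the study.
x_v, y_v = parabola_vertex(-0.5, 20.0, 100.0)  # vertex at (20.0, 300.0)
print(mitigation(25.0, 240.0, x_v, y_v))
```

With these illustrative values, a tract with a density of 25 and a case rate of 240 lies 20% below the vertex case rate of 300, so it is flagged as mitigating with \({M}_{i}\) of about 0.2.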

We compared higher-density neighborhoods with mitigation (\({D}_{i}\ge {x}_{v}\) and \({C}_{i}\le {y}_{v}\)) and those without mitigation (\({D}_{i}\ge {x}_{v}\) and \({C}_{i} > {y}_{v}\)) within and across clustered neighborhood groups (Fig. 5c). We analyzed neighborhood socioeconomic, demographic and political attributes, including race and ethnicity, age group, household characteristics, education and occupation, neighborhood health burdens and political affiliation. T-tests were used to identify statistically significant differences between neighborhoods.

In addition, a logistic regression model was used to identify statistically significant correlates of neighborhood characteristics and mitigating behaviors, controlling for the built environment context. Specifically, the dependent variable is a binary for mitigating behavior adoption (1: neighborhoods with mitigation, 0: neighborhoods without mitigation) and independent variables include demographic, socioeconomic and political features.

Limitations

There are several limitations to our analysis. First, we rely on case rate data at the neighborhood level for our study. It is acknowledged that testing rates varied across cities and communities during the first wave of the pandemic; however, testing data are not consistently available at the local level in the studied cities. In addition, as with all COVID-19 infection data, reported infections are based on residential location, not necessarily on the place where infection occurred. Second, our exposure density measure captures the relative density of mobile devices aggregated to the neighborhood scale. We cannot infer from these data the spatial proximity of devices to draw any conclusions regarding person-to-person physical distancing (for example, remaining 6 feet apart, as suggested by the Centers for Disease Control and Prevention guidelines). Finally, land-use and exposure density measures are collected and estimated for all 15 cities in the study group, whereas local COVID-19 infection data were only available for 12 of these cities. Thus, our study of the relationship between urban density and exposure density is based on the full sample, while the link between density and infection rates is estimated from the 12 cities for which complete data are available. The study group is described in more detail in Supplementary Information.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.