Introduction

Crime is a serious and costly challenge to many urban areas. There is a large heterogeneity in crime rates observed across and within cities. Much work has focused on economic factors, such as education, job opportunities, and poverty1,2, and sociological factors, such as social control and collective efficacy3,4. Importantly, as proposed by environmental criminology theory5, some of this heterogeneity may be due to the characteristics of the physical environments of different neighborhoods. For example, how individuals in a neighborhood engage with their physical environment in the sense of where they choose to spend time may also influence crime. In this study, we analyze neighborhood networks constructed through cell phone mobility data to test sociological and psychological hypotheses on the relationship between specific physical environment variables, e.g., tree canopy, and sociological variables, e.g., local street activity, and their relationship with crime.

A growing body of research examines the impact of the physical environment on crime, through such features as climate6, vacant lots or buildings7, ambient and artificial light8, or disorder9. Another factor that has been much less studied is the impact of urban greenspace10. Some research has demonstrated a negative relationship between crime levels and various types of urban greenspace, such as tree canopy11, vegetation levels12, and greened lots13, while others have failed to find a relationship between greenspace and crime at all14. In at least one case, researchers found a significant positive relationship between parks and crime, as crime was found to be clustered in and around greenspace15. Relatedly, Troy and colleagues16 found a mostly negative relationship between tree canopy and crime across an urban-rural gradient, except for in certain areas at the interface between residential and industrial areas, where the association was positive. A potential problem is that these studies have used the static physical presence of greenery, either binary or as a quantified amount, as their independent variable. This coarse measure may be leading to equivocal results, as it is uncertain how residents interact with the available greenspace. Accounting for differences in experiential engagement may be critical in determining the efficacy of urban greenspace17. As such, it becomes important to quantify how individuals interact with greenspace in their city in terms of quality, type, and amount of interaction because such variations likely affect the relationships between greenspace and crime. However, doing so is not trivial and requires unique data and analyses that allow researchers to monitor, en masse, how individuals interact with different physical environments in their cities.

The actual mechanisms by which greenspaces affect crime remain uncertain. One potential sociological mechanism is that urban greenspaces could increase residential street activity. For example, trees and grass can create pleasant public spaces where neighbors can interact and spend time outside18,19 and are associated with more walking trips20. Thus, urban greenspace can motivate individuals to spend more time on the streets of their neighborhood. This increase of “eyes on the street” then can help prevent criminal behavior21,22,23, which is in accordance with the theory of crime prevention through environmental design24. Through the lens of routine activity theory, residents spending time outside within their neighborhood may be effective guardians against crime25. Interactions with urban parks have been shown to increase feelings of place attachment, which in turn increases guardianship19. Additionally, busy streets have been proposed to empower communities by helping promote social cohesion26, which leads to safer neighborhoods27,28.

Another potential mechanism relating crime and greenspace is psychological: greenspace may restore attentional functioning29. Long-term and acute exposures to greenspace are associated with improvements in cognitive functioning30. These improvements in attentional functioning resulting from experiencing urban greenspace led to reduced aggression for adults living in public housing projects31. These results are in accordance with theory, suggesting that attention is an underlying psychological resource that influences self-control29. Therefore, any intervention that might increase attentional capacity, such as interactions with nature, would increase self-control and subsequently reduce criminal behavior. Additionally, reduced attentional fatigue throughout a group of co-located people may allow them the cognitive resources to be more vigilant or social, both of which can help create a safer neighborhood, as described above. This cognitive mechanism suggests that urban greenspaces may relate to crime through two pathways: (1) at an individual level, an effect of enhancing the cognitive resources required for self-control, which would then lead to reduced crime, and (2) at a neighborhood level, an effect of increasing social interaction and street activity, which would then also lead to crime reductions.

An alternative hypothesis is that the causal direction for an association between crime and greenspace runs in the opposite direction. According to this hypothesis, interactions with greenspace do not cause less crime, but rather more crime prevents individuals from interacting with greenspace as they do not feel safe visiting their neighborhood parks. Conversely, less crime would cause increases in park visits. Additionally, under this hypothesis, high crime neighborhoods may have less physical greenspace if tree maintenance and planting are neglected in these areas due to concerns for worker safety, for example. In this way, fear of crime could also decrease local street activity.

Here, we approach this problem using unique cell phone-based mobility datasets from tens of thousands of residents, with which we can measure the amount of street activity in a neighborhood and the amount of active engagement that residents have with greenspace through visits to parks. This makes it possible to tease apart each of these factors’ associations with crime. Smartphone penetration is extremely high in American cities. Virtually all smartphones include a GPS chip, as well as applications that can retrieve the device’s physical coordinates. This allows for the recording of human mobility with high granularity and volume32. By identifying park visits within such data, our study interrogates the impact of realized access to parks, as distinct from their potential for use captured by more traditional sources like park area or land use. In this study, we are able to determine whether (1) street activity and exposure to urban greenspace add unique information to a model predicting crime, and (2) intentional greenspace contact (i.e., park visits) and incidental greenspace contact (i.e., tree canopy) have unique associations with crime. To achieve this, we analyzed crime data in two large, diverse, urban locations in the US. We first analyzed crime data over a 1-year period in Chicago. We then independently repeated the same analysis in New York City to confirm that the relationships found were not specific to Chicago. It is possible that visiting any cultural amenity may be related to less crime, which would indicate no special role for a park visit. As such, we also investigated whether museum visits would have the same association as park visits. In both cities, we found that park visits and street activity uniquely and significantly predicted reduced crime (controlling for income, education, and other demographic factors), with these variables having associations of similar size in most models. Tree canopy was significant only in models for Chicago. In contrast, we failed to find a significant relationship between museum visits and crime in either city.

We also conducted an exploratory directed acyclic graph analysis to determine if there were direct or indirect relationships between crime and these variables. We found direct relationships between park visits and crime, as well as local street activity and crime. These results suggest important, independent, and significant roles for the physical and social environments of cities in potentially reducing crime in urban areas.

Results

Chicago

Figure 1 shows choropleth maps for the number of park visits, tree canopy, street activity, and crime rates for the City of Chicago. We ran four spatial error models, individually adding our independent variables of interest, with non-violent crime as the dependent variable. The first model only included tree canopy, the second model included tree canopy and park visits, the third model included tree canopy and street activity, and the fourth model included all three variables of interest (see Table 1). The models controlled for a number of socioeconomic variables, including unemployment, income, poverty, crowded housing, residential stability, foreign born population, size of the resident population, working population, and educational attainment, after first regressing out percent Black and percent Hispanic. We found that tree canopy, park visits, and street activity all had significant, negative associations with non-violent crime in each of the models. The model that included all of the predictor variables had the best fit, as indicated by the lowest Akaike information criterion (AIC).

Fig. 1: Choropleth maps of Chicago.

a Number of monthly park visits, b Percent tree canopy, c Local street activity (as percentage), d Total crime rate (per 1000 resident population, log-transformed). Airports and census tracts with missing data have been removed. Total crime rate shown for visualization purposes only; all linear model analysis was done separately for violent and non-violent crime using crime counts while adjusting for residential and working population.

Table 1 Spatial error models for non-violent crime in Chicago.

For violent crime, tree canopy, park visits, and street activity all showed significant negative associations across all models after controlling for all of our confounding variables (see Table 2). Again, the model of best fit was the model that included all three of these variables.

Table 2 Spatial error models for violent crime in Chicago.

As crime was log-transformed and the independent variables were standardized, we can determine the percent change in crime associated with each of the significant predictor variables. In the models with all variables included, a 5% increase in street activity was associated with 6.9% and 9% less non-violent and violent crime, respectively. An increase in park visits equal to 25% of the average number of visits was associated with 4.9% and 6.8% less non-violent and violent crime, respectively. An increase of 5% tree canopy was associated with 3.3% less violent and non-violent crime. As a comparison, 9.9% less poverty was associated with the same amount less violent crime as a 5% increase in street activity, while 7.4% less poverty was associated with the same amount less as an increase in park visits equal to 25% of the average number of visits.
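The conversion from a model coefficient to a percent change can be sketched as follows. This is a minimal illustration, assuming a natural-log outcome and z-scored predictors; the coefficients and standard deviations are hypothetical placeholders, not values from Tables 1 or 2, and the function names are ours.

```python
import math


def percent_change(beta, delta_raw, sd_raw):
    """Percent change in crime for a raw-unit change in a z-scored predictor.

    beta      -- coefficient on the z-scored predictor (outcome is ln(crime))
    delta_raw -- change in the predictor, in its original units
    sd_raw    -- standard deviation that was used to z-score the predictor
    """
    return 100.0 * (math.exp(beta * delta_raw / sd_raw) - 1.0)


def equivalent_change(beta_ref, delta_ref, sd_ref, beta_other, sd_other):
    """Raw-unit change in another predictor giving the same change in ln(crime)."""
    return (beta_ref * delta_ref / sd_ref) * sd_other / beta_other


# Hypothetical values, for illustration only.
beta_street, sd_street = -0.20, 10.0    # street activity, percentage points
beta_poverty, sd_poverty = 0.15, 12.0   # poverty, percentage points

street_effect = percent_change(beta_street, 5.0, sd_street)
poverty_equiv = equivalent_change(beta_street, 5.0, sd_street, beta_poverty, sd_poverty)
print(round(street_effect, 2), round(poverty_equiv, 2))
```

A negative result from `equivalent_change` for poverty means that a reduction in poverty of that many percentage points matches the reference change.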

We then investigated whether museum visits have the same association with crime as park visits, in order to determine if different types of amenities may be interchangeable. We failed to find a significant relationship for museum visits with either violent or non-violent crime, in models both with and without park visits (see Supplementary Table 1 for models).

New York City

Figure 2 shows the choropleth maps for number of park visits, tree canopy, street activity, and total crime rate for New York City at the census tract level. Controlling for all of our confounding variables, we found that park visits and street activity had significant, negative associations with non-violent crime, while tree canopy was not significant (see Table 3). Park visits and street activity each added unique information to the model, and the model with all variables had the lowest AIC. For violent crime, park visits and street activity had significant, negative associations with crime (see Table 4), and provided unique information to the model. Tree canopy, again, was not significant. In the models with all variables included, a 5% increase in street activity was associated with 5.0% and 2.7% less non-violent and violent crime, respectively. An increase in park visits equal to 25% of the average number of visits was associated with 4.8% and 5.7% less non-violent and violent crime, respectively. To compare the strength of association, 7.6% less poverty was associated with the same amount less violent crime as an increase in park visits equal to 25% of the average number of visits, while 3.6% less poverty was associated with the same amount less crime as a 5% increase in street activity. As in Chicago, we examined whether museum visits would have similar associations with crime. However, we failed to find a significant relationship between museum visits and crime in New York City, in models both with and without park visits included (see Supplementary Table 2 for models).

Fig. 2: Choropleth maps of New York City.

a Number of monthly park visits, b Percent tree canopy, c Local street activity (as percentage), d Total crime rate (per 1000 resident population, log-transformed). Airports and census tracts with missing data have been removed. Total crime rate shown for visualization purposes only; all linear model analysis was done separately for violent and non-violent crime using crime counts while adjusting for residential and working population.

Table 3 Spatial error models for non-violent crime in New York City.
Table 4 Spatial error models for violent crime in New York City.

Exploratory directed acyclic graph analysis

Directed acyclic graph (DAG) models can be used to determine direct and indirect relationships between variables from observational data33. We used the fast causal inference (FCI) algorithm34 to determine if our variables of interest had direct or indirect relationships with crime. Unlike other DAG algorithms, FCI does not assume that there are no hidden or latent variables. Given that crime is a complex social phenomenon, it is likely that our model does not include all variables that influence crime, making FCI a reasonable approach. As our models for violent and non-violent crime were similar across linear regressions within the two cities, we combined violent and non-violent crime into one measure of total crime (log(violent crime + non-violent crime)). Figure 3 shows the direct connections to or from total crime in Chicago (Fig. 3a) and New York City (Fig. 3b). We found direct relationships between park visits and crime, and between street activity and crime, in both cities. Population also showed a direct relationship with crime in both cities, while poverty and percent foreign born population showed direct relationships with crime in Chicago, and percent unemployed and working population showed direct relationships with crime in New York City. Most relationships were found to be bidirectional, indicating that the relationship is influenced in both directions. Bidirectional arrows can also indicate that a hidden variable is directly related to each of the nodes. We did not find a direct relationship between tree canopy and crime or between museum visits and crime in either city, nor did the models show direct relationships between park visits, tree canopy, and local street activity. Supplementary Figs. 1 and 2 show connections found between all variables for Chicago and New York City, respectively.

Fig. 3: Direct relationships to and from crime found in DAG models.

a Chicago; b New York City. Note: Regular arrowhead indicates positive relationship. Black triangle indicates negative relationship. Open circle indicates that the algorithm cannot tell whether the edge is one-directional or bidirectional.

Discussion

We used an extensive dataset on human mobility from mobile devices to find significant, negative associations of tree canopy, park visits, and street activity with crime in two large US cities. We first conducted a pilot analysis in Chicago, and then replicated the results in New York City in a preregistered report. By comparing the models, which included park visits, local street activity, or both, we saw that interactions with greenspace and street activity accounted for unique variance in predicting crime. This lends support to the idea that multiple pathways are at work to explain the associations of greenspace and street activity with reduced crime and suggests that the influence of greenspace may be due, in part, to psychological/cognitive mechanisms (i.e., attentional functioning) and not only to sociological mechanisms. The DAG models also indicated separable, direct paths for park visits and local street activity to crime, while no direct relationship was observed between tree canopy and crime. The observation of multiple pathways for these relationships will help design future research on interventions to reduce crime that focus on both individual and neighborhood factors. We also found that park visits are more strongly associated with crime than museum visits in both cities, suggesting that these amenities are not interchangeable. Additionally, park visits showed a direct relationship with crime, while museum visits did not.

Our results are in line with previous research findings, which uncovered negative associations between greenspace and crime10,11,16,35,36. However, our “physical presence of greenspace” variables (i.e., Tree Canopy, Grass Coverage) had weaker and less consistent associations with reduced crime, compared to our “use of greenspace” variable (i.e., Park Visits). By including this variable, we show the importance of determining residents’ realized access to, or engagement with, greenspace in order to investigate its relationship with crime. Our findings of negative associations between street activity and crime are also consistent with prior theoretical and empirical work21,22,28,37. As greenspace usage and street activity added unique information to our models, and both showed direct relationships to crime in the DAGs, more studies are needed to investigate how these two neighborhood-level characteristics may work together and separately to influence crime levels.

The associations for each of these independent variables were stronger in Chicago, where overall crime levels were also higher than in New York City at the time of the study. This could be for several reasons. The parks in Chicago and New York City may have different facilities, sizes, and landscaping. The proportion of trees lining streets compared to those on private property may also vary, and this may drive differences in the strength of associations as well. While both cities showed negative correlations between tree canopy and both violent and non-violent crime, only in Chicago did tree canopy stay significant in models controlling for socio-demographic variables. These two cities have very different populations, geographies, and baseline crime levels, so the convergence of results for park visits and street activity in our analysis provides substantial evidence for the consistent influence of these neighborhood characteristics across environments. Future work should include additional cities to further test the generalizability of these associations. Additionally, given prior work showing that various neighborhood characteristics, such as social cohesion and walkability, had different associations with crime in cities outside of the US and Western Europe38,39,40,41, conducting similar research for cities around the world is critical to deepen our understanding of how physical and social environments influence crime.

Given the observed strength of associations, the results suggest that support for green infrastructure and, importantly, its use (including community programs to facilitate local street activity) could provide cost-effective ways to address crime, while additionally providing many other socioeconomic and health co-benefits. For those measures to be most effective, it will also be important to understand sociocultural elements that influence how people voluntarily engage with greenspace42. The cost of crime is difficult to calculate; however, one method estimates that the direct and indirect costs of violent crime in Chicago were $5.31 billion in 201043. Thus, for example, the 6.8% less violent crime associated with an increase in monthly park visits equal to 25% of the average is equivalent to approximately $361 million in total savings, although the amount actually saved would likely be lower due to the bidirectionality of the relationship.
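The savings figure follows from multiplying the two quantities quoted above. A minimal check of that arithmetic, using only the numbers stated in the text (the variable names are ours):

```python
# Figures quoted in the text above.
violent_crime_cost_usd = 5.31e9   # estimated 2010 cost of violent crime in Chicago
associated_reduction = 0.068      # 6.8% less violent crime

savings_usd = violent_crime_cost_usd * associated_reduction
print(f"${savings_usd / 1e6:.0f} million")
```

As the text notes, this treats the full association as a one-directional effect, so it is best read as an upper bound.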

While we were able to conduct an exploratory DAG analysis to investigate direct and indirect relationships in our observational data, future research is needed to investigate the causal nature of the relationships between urban greenspace and crime. There are some limitations to these models. First, the FCI algorithm does not account for the spatial nature of our data. Additionally, FCI can only find equivalence classes of ancestral graphs, or possible graph structures, as opposed to the exact true DAG that explains the data. It is also difficult to speculate about the possible latent variables that could be causing some of the dependencies. Thus, future work using interventions remains critical for examining true causal relationships. For example, while our models did not find a direct relationship between tree canopy and crime, longitudinal studies examining the physical presence of greenspace lend support to a causal relationship between greenery and lower crime, with less crime being observed after increased greening or tree planting13, and more crime being observed after a natural event that led to a reduction in trees (e.g., the Emerald Ash Borer infestation that killed many Ash trees35). The detailed mechanism behind these observations remains unclear44. A tree-lined street may indicate cues of social order or property that is cared for, indicating territoriality, which could lead to less crime45,46. Alternatively, the cognitive benefits attained after exposure to natural elements may lead to less crime, as increased attentional functioning has been shown to mediate reduced aggression29,30,31. Natural environments also have been shown to increase positive affect47 as well as pro-social behavior48, both of which could translate to lower crime levels in neighborhoods where residents visit parks more or there is greater tree canopy. Future research could also explore what neighborhood characteristics are associated with high levels of street activity28. Given that our exploratory DAG models showed bidirectional relationships between crime and park visits in both cities, longitudinal studies could be used to determine how much of the effect is due to each direction. For example, studies could compare crime levels before and after interventions to improve accessibility or localized campaigns to increase park usage, after verifying the effectiveness of such interventions.

While the theories that guided the design of this study suggest that it is park visits and local street activity that cause less crime, it is also important to recognize that there remain practical impacts for individuals if future studies show the opposite causal direction between these factors. In that case, individuals living in high crime areas are disadvantaged in ways beyond the direct effects of crime exposure49: crime or fear of crime, in addition to other barriers to greenspace use such as lack of time or transit access, leaves them disproportionately unable to take advantage of the benefits that both urban greenspace and local street activity can provide18,23,30. In this way, crime prevention may help open paths to more equitable, realized access to greenspace17.

While exploring a large dataset, this study is limited by the sample of smartphone users that create our mobility data in Chicago and New York City. During the study period over four fifths of US adults in cities had a smartphone50. The educated and wealthy are modestly more likely to own a smartphone, while children and the elderly are less likely to. However, these data have been shown to be reasonably representative across census tract populations and constitute a substantial subset of the residents for each of these major cities51. Additionally, our measure of local street activity could be improved as it does not measure the quality or type of social interactions that residents may engage in while being active in their neighborhood. Certain street activity behaviors, and the resulting social interactions, may be more or less influential on crime levels. Being able to quantify the nature of local social interactions would be of great utility, but is also a very difficult proposition empirically that would require the use of additional datasets.

In addition to providing insights into the relationship between greenspace, street activity, and crime, this study demonstrates how cell phone mobility data, a large-scale data source that will keep improving in the near future, can be leveraged to quantify neighborhood characteristics at scales previously impossible to access52. These near-continuous empirical measures of mobility behavior can be used to address questions from a range of fields, including urban planning, health sciences, sociology, geography, and psychology, and can help to revolutionize how social scientists conduct research.

In conclusion, utilizing cell phone trace data to study human mobility presents a framework for examining intentional behaviors in cities that have practical implications for urban planners and policy makers, and theoretical implications for how greenspace and local street activity influence crime and other social behaviors. Realized park access, tree canopy and local street activity are all associated with safer neighborhoods. Our results support multiple pathways for the associations between greenspace and local street activity with crime. These data also support the notion that much of our behavior is determined by environmental factors and is not solely attributable to individual choices44. Ensuring equitable access to urban greenspace and support for neighborhood amenities to promote local street activity may be ways to help cities reduce crime, leading to more sustainable and inclusive cities, ecologically and socially.

Methods

Experimental design

To assess the relationships of physical greenspace, use of greenspace, and local street activity with crime, we analyzed cell phone trace data, LiDAR land cover data, and open source crime and demographics data, all aggregated to the census tract level, in spatially appropriate linear models in Chicago and New York City. Data analysis for Chicago served as a pilot analysis, and preregistration was completed for the New York City analysis (https://osf.io/3thza). Several changes to the experimental design were introduced during peer review, including the addition of percent foreign born and residential stability as independent variables, and shifting the years of crime data to more closely align to the cell phone mobility data. Directed acyclic graph analysis was completed to investigate the presence of direct or indirect connections between our variables of interest. This analysis is exploratory (i.e., not preregistered).

Statistical models

All models were run using census tracts as the spatial units of measurement. Our dependent variable was either violent or non-violent crime. The independent variables were tree canopy, grass coverage, park visits, distance traveled to parks, museum visits, local street activity, population, working population, median household income, percent unemployed, percent living below the poverty line, percent living in crowded housing, percent foreign born, percent residential stability, percent with less than a high-school diploma, percent with a bachelor’s degree or higher, percent Black, and percent Hispanic. Detailed descriptions of these variables and their sources are given in the following paragraphs of this section. All independent variables were z-scored, with median household income, population, and working population first being log-transformed due to their positive skew. See Supplementary Figs. 3 and 4 for the correlation tables of all variables in Chicago and New York City, respectively. Non-violent and violent crime were also log-transformed. Because some census tracts in both Chicago and New York City had no reported violent crimes, violent crime counts were increased by 1 before log-transformation. We used a two-step regression for all our models. We first regressed out percent Black population and percent Hispanic population from crime, either non-violent or violent, using a simple linear model. This allows us to statistically adjust for previously shown associations between race/ethnicity and crime. There is no theoretical justification for these associations; rather, race and ethnicity are proxies that can indicate forms of residential inequality that cannot be directly measured. Given the spatial nature of the data, we then ran all models as hierarchical linear models with census tract as the unit of measurement and neighborhood as a random intercept, using the residuals of the first linear regression as the dependent variable. In Chicago, neighborhoods are officially called Community Areas, while in New York City the equivalent areas are defined as Neighborhood Tabulation Areas. Thus, this hierarchical model places a census tract within its larger neighborhood in an attempt to account for its spatial location. However, if the hierarchical model had significant spatial autocorrelation as indicated by global Moran’s I, we then conducted Lagrange multiplier diagnostics using a queen contiguity spatial weights matrix to determine whether a spatial lag model or spatial error model was more appropriate. In most cases, a spatial error model was the most appropriate.
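Two pieces of this pipeline, the first-step residualization and the global Moran's I check, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses natural logs, binary (unstandardized) neighbor weights, and hypothetical toy data, and it omits the hierarchical model and the Lagrange multiplier diagnostics. In the study, Moran's I was computed on the hierarchical model, whereas here it is applied to the first-step residuals purely for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_residuals(y, predictors):
    """Residuals of y after an OLS fit on the predictor columns plus an intercept."""
    rows = [[1.0] + list(p) for p in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


def morans_i(values, neighbors):
    """Global Moran's I with binary weights; neighbors maps index -> set of indices."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(len(nb) for nb in neighbors.values())
    cross = sum(dev[i] * dev[j] for i, nb in neighbors.items() for j in nb)
    return (n / w_sum) * cross / sum(d * d for d in dev)


# Hypothetical toy data for five tracts (not from the study).
violent_counts = [0, 4, 9, 2, 15]
pct_black = [10.0, 40.0, 55.0, 20.0, 70.0]
pct_hispanic = [30.0, 25.0, 10.0, 45.0, 5.0]

log_crime = [math.log(c + 1) for c in violent_counts]  # +1 handles zero counts
step_one_resid = ols_residuals(log_crime, [pct_black, pct_hispanic])

# Toy contiguity: tracts in a line, each touching its immediate neighbors.
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(morans_i(step_one_resid, chain))
```

Under queen contiguity, the neighbor sets would contain every tract that shares an edge or a vertex with the tract in question. A dedicated spatial statistics package would normally supply the weights matrix, the significance test for Moran's I, and the lag-versus-error diagnostics.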

Directed acyclic graphs (DAGs)

DAGs were computed using the fast causal inference (FCI) algorithm34. FCI allows for the discovery of direct and indirect relationship structure in observational data while allowing for the presence of an unknown number of hidden, or confounding, variables, as opposed to other DAG algorithms, such as PC or Greedy Equivalence Search, which do not allow for hidden variables33.
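Constraint-based algorithms such as FCI build their graphs from repeated conditional independence tests. The sketch below shows one common choice for continuous data, Fisher's z test on a partial correlation. It is an illustration only: the test actually used in this analysis is not stated here, the full FCI skeleton search and orientation rules are not reproduced, and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: the (partial) correlation is zero."""
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - n_cond - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


# Hypothetical data: x and y are both driven by z plus unrelated noise.
z = [float(v) for v in range(1, 11)]
noise_x = [1.0, -1.0] * 5
noise_y = [1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0]
x = [a + e for a, e in zip(z, noise_x)]
y = [a + e for a, e in zip(z, noise_y)]

r_marginal = pearson(x, y)
r_partial = partial_corr(x, y, z)
print(fisher_z_pvalue(r_marginal, len(x), 0), fisher_z_pvalue(r_partial, len(x), 1))
```

In this toy example x and y are strongly correlated marginally but not once z is conditioned on, which is the pattern that leads a constraint-based search to drop the direct edge between them.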

Land cover data

Light detection and ranging (LiDAR) data for Chicago and New York City were downloaded from the University of Vermont’s Spatial Analysis Lab website. LiDAR data, collected in 2010 at 2 ft resolution, were classified into seven land cover variables: trees, grass, road/rail, building, bare soil/sand, water, and pavement (other than road). Percent tree canopy and percent grass coverage were calculated for each census tract in ArcGIS, version 10.5.1.
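The per-tract percentages amount to counting classified cells. A minimal sketch follows, assuming equal-area raster cells that have already been assigned to tracts and a denominator that includes every land cover class. The authors performed this step in ArcGIS; the function name and toy data here are hypothetical.

```python
from collections import Counter, defaultdict


def percent_cover(cells, target):
    """Percent of classified cells in each tract that belong to the target class.

    cells -- iterable of (tract_id, land_cover_class) pairs, one per raster cell
    """
    counts = defaultdict(Counter)
    for tract_id, cover in cells:
        counts[tract_id][cover] += 1
    return {t: 100.0 * c[target] / sum(c.values()) for t, c in counts.items()}


# Hypothetical toy raster: four cells in tract A, two in tract B.
cells = [
    ("A", "trees"), ("A", "grass"), ("A", "building"), ("A", "trees"),
    ("B", "water"), ("B", "pavement"),
]
tree_pct = percent_cover(cells, "trees")
grass_pct = percent_cover(cells, "grass")
print(tree_pct, grass_pct)
```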

Cell phone trace data

Location data for this study were recorded in May 2017 by applications on users’ phones, and provided by Carto. Each data record or “ping” consists of a latitude/longitude coordinate, with a timestamp, an estimated precision, and a unique device identifier. While specific apps are not identified, the provider does provide product categories. These cover photos, texting, navigation, weather, music, dating, and many others. The largest share of data comes through Software Development Kits (SDKs) that are themselves embedded in other applications. All variables were constructed at the census tract level in order to mitigate risk and prevent the identification of individual behaviors. Carto collected cell phone trace data in accordance with privacy laws and no identifiable information on the participants was provided to the authors.

OpenStreetMap (OSM) data define the locations of roads, parks, and museums. Roads and railways are used to identify individuals in transit (and not “actively” in a park or neighborhood). For this purpose, only highways and arterials are used. These are identified using OSM tags: motorway, trunk, primary, and secondary highways (and their links), and rail and subway railways. Similarly, park boundaries are defined when the leisure tag is park, playground, garden, dog_park, nature_reserve, recreation_ground, or golf_course, or if landuse is recreation_ground, natural is beach, or boundary is protected_area. Museums are identified as tourism tags of museum, aquarium, or zoo, or the amenity of planetarium.

Points within 10 km of the Census Bureau’s “place” definition of New York City and Chicago are selected for analysis. These are associated with census tracts, parks, museums, and major roads and railways as already defined. This is achieved through point-in-polygon merges. Road centerlines are converted to polygons using a 10 m buffer. Locations can be recorded simultaneously by multiple apps and so duplicates are dropped. Pings with precision that is either undefined (often from the Wi-Fi network) or worse than half a kilometer are also removed. To avoid imputing “visits” to parks or neighborhoods as seen from the freeway, locations flagged along major roads and railways are removed. Home locations are then defined for each device as the modal census tract location between midnight and 6 a.m. local time. In Chicago, this location could be defined for N = 95,000 users. In New York City, this location could be defined for N = 191,000 users. Each device is thus associated with a residence, and has a series of other locations—each within a defined census tract, and possibly within a park or museum. Data were processed using Open Science Grid53.
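The home-location rule above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the input layout (tuples of device identifier, local timestamp, and tract identifier), the function name `home_tracts`, and the tie-breaking behavior are assumptions. Pings are assumed to have already been deduplicated, filtered for precision, and stripped of major-road locations.

```python
from collections import Counter, defaultdict
from datetime import datetime


def home_tracts(pings):
    """Return the modal nighttime census tract for each device.

    pings: iterable of (device_id, local_datetime, tract_id) tuples.
    Assumption: "between midnight and 6 a.m." is read as 00:00 <= t < 06:00.
    Ties go to the tract seen first (an arbitrary choice for this sketch).
    """
    night_counts = defaultdict(Counter)
    for device, timestamp, tract in pings:
        if 0 <= timestamp.hour < 6:
            night_counts[device][tract] += 1
    # Devices with no nighttime pings get no home location.
    return {
        device: counts.most_common(1)[0][0]
        for device, counts in night_counts.items()
    }


# Toy data: identifiers are made up for illustration.
example = [
    ("dev_a", datetime(2017, 5, 1, 1, 30), "tract_1"),
    ("dev_a", datetime(2017, 5, 2, 3, 10), "tract_1"),
    ("dev_a", datetime(2017, 5, 2, 4, 45), "tract_2"),
    ("dev_a", datetime(2017, 5, 2, 14, 0), "tract_3"),
    ("dev_b", datetime(2017, 5, 1, 12, 0), "tract_3"),
]
print(home_tracts(example))
```

In this toy input, `dev_b` has only daytime pings and therefore receives no home tract, which mirrors the statement that a home location could be defined only for a subset of devices.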

The location data are used to derive three variables: park and museum visitation rates, and street activity. Park and museum visits were defined as the number of times over the month that a device records a location in a park or museum. One visit is counted per resource, per day. This means that short or long stays, or multiple visits on a single day are not distinguished. To suppress spurious visits to parks adjacent to the place of residence, visits within 100 meters of the home location were not counted. For this purpose only, the home location is defined as the centroid of all nighttime locations in the home tract. Each variable is averaged over users at the level of the census tract. In this way, park and museum visits account for how often residents of a census tract visit parks and museums located anywhere in the city, as opposed to how often the parks within a particular census tract are visited. This methodology accounts for the appeal, safety, and accessibility of a resource, and the ability of residents to use it.
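The visit-counting rule above (one visit per resource, per day, averaged over the residents of each home tract) can be sketched as follows. The data layout and the function name `mean_visits_by_home_tract` are assumptions made for illustration, and the 100 m exclusion around the home location is omitted for brevity.

```python
from collections import defaultdict


def mean_visits_by_home_tract(pings, home):
    """Average monthly visit count per resident, grouped by home tract.

    pings: iterable of (device_id, day, resource_id); resource_id is None
           when the ping falls outside every park or museum.
    home:  dict mapping device_id -> home tract.
    Assumption: pings near the home location have already been removed.
    """
    visits = defaultdict(set)
    for device, day, resource in pings:
        if resource is not None and device in home:
            # A set collapses repeated pings in the same resource on the same day.
            visits[device].add((resource, day))

    totals = defaultdict(int)
    residents = defaultdict(int)
    for device, tract in home.items():
        residents[tract] += 1
        totals[tract] += len(visits[device])  # zero for devices with no visits
    return {tract: totals[tract] / residents[tract] for tract in residents}


# Toy data: identifiers are made up for illustration.
home = {"dev_a": "tract_1", "dev_b": "tract_1", "dev_c": "tract_2"}
pings = [
    ("dev_a", "2017-05-01", "park_1"),
    ("dev_a", "2017-05-01", "park_1"),  # same park, same day: not a new visit
    ("dev_a", "2017-05-01", "park_2"),
    ("dev_a", "2017-05-02", "park_1"),
    ("dev_b", "2017-05-01", None),
    ("dev_c", "2017-05-03", "park_1"),
]
print(mean_visits_by_home_tract(pings, home))
```

Residents with no recorded visits still count in the denominator, so the result is a rate per resident of the home tract rather than per visitor.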

Local street activity is derived following Saxon23 to quantify residents’ use of their own neighborhoods. It is the share of residents’ recorded ping coordinates in the immediate vicinity of their home, averaged by census tract. To express this formally, we denote the share of coordinates recorded by user u in census tract \(\ell\) by \(A_\ell ^u\). We aggregate users by home location h, averaging the visitation rates over the set of users \(R_h\) resident there. If \(\left\| {R_h} \right\|\) is the cardinality of \(R_h\), this is \(\hat A_{h\ell } = \mathop {\sum}\nolimits_{u \in R_h} {A_\ell ^u/\left\| {R_h} \right\|}\). Next, define the vicinity \(V_h\) of h as the k nearest neighbors to the home tract (not including the home tract itself). The local street activity would then be naively defined as the sum over locations in the vicinity, \({\mathrm{local}} = \mathop {\sum}\nolimits_{\ell \in V_h} {\hat A_{h\ell }}\).

However, census tracts vary in population, and the vicinity should not depend on the Census Bureau’s definitions of tracts. The value of k is therefore defined separately for each home location as the maximum number of tracts containing fewer than N = 40,000 total people. Index tracts by their distance from the reference tract, and denote the population of tract k by \(n_k\) and the cumulative population of the k nearest tracts by \(N_k\). Then k is the smallest number such that \(N_k + n_{k + 1} > N\), with N = 40,000. This allows the local street activity to be defined as

$${\mathrm{local}} = \frac{{\left( {\mathop {\sum}\nolimits_{\ell \in V_h} {\hat A_{h\ell }} } \right) + \hat A_{h,k + 1}\frac{{N - N_k}}{{n_{k + 1}}}}}{{1 - \hat A_{hh}}}$$
(1)

In this expression, the naive sum within \(V_h\) is corrected for the fraction of activity in tract k + 1 required to reach the N = 40,000 person threshold. The overall street activity is then the share of out-of-home locations \(1 - \hat A_{hh}\) that happen within the local space. Previous work has found that these specific constructed variables are not sensitive to either the selection criteria of devices entering the sample, or the rate at which users generate data through the location-based services23.
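A small worked sketch may make the population-threshold correction concrete. The function below is a hypothetical implementation of the definition as described in the text, not the code used in the study. It assumes that neighboring tracts are already ranked by distance from the home tract, that the home tract's own population does not count toward the 40,000-person threshold, and that the home share is strictly less than 1.

```python
def local_street_activity(shares, home, ranked_neighbors, populations,
                          threshold=40_000):
    """Share of out-of-home activity that falls within the local vicinity.

    shares:           tract -> average share of residents' pings in that tract
                      (the A-hat values for one home tract, home included)
    home:             identifier of the home tract
    ranked_neighbors: other tracts ordered by increasing distance from home
    populations:      tract -> resident population
    """
    cumulative = 0  # running cumulative population of the tracts included so far
    local = 0.0
    for tract in ranked_neighbors:
        pop = populations[tract]
        if cumulative + pop > threshold:
            # This is tract k + 1: include only the fraction of its activity
            # needed to reach the population threshold, then stop.
            local += shares.get(tract, 0.0) * (threshold - cumulative) / pop
            break
        cumulative += pop
        local += shares.get(tract, 0.0)
    # Normalise by the share of activity that happens outside the home tract.
    return local / (1.0 - shares.get(home, 0.0))


# Toy numbers, made up for illustration.
shares = {"h": 0.5, "t1": 0.2, "t2": 0.1, "t3": 0.08, "t4": 0.05}
populations = {"t1": 10_000, "t2": 20_000, "t3": 20_000, "t4": 5_000}
value = local_street_activity(shares, "h", ["t1", "t2", "t3", "t4"], populations)
print(round(value, 4))
```

In this toy case the two nearest tracts hold 30,000 people, so half of the third tract's activity (10,000 of its 20,000 residents) is added before dividing by the out-of-home share of 0.5.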

Crime data

Crime data for 2017 in Chicago and New York City were obtained from each city’s open data portal. This year was chosen to coincide with the cell phone mobility data. Crime data were also analyzed separately for May 2017 only, to precisely match the timeframe of the cell phone trace data (Supplementary Tables 3 and 4). Crimes were categorized as violent or non-violent and then aggregated to the census tract level. Any crime without location data, or with location listed as the precinct headquarters, was removed. In Chicago, violent crimes included assault, battery, criminal sexual assault, homicide, kidnapping, robbery, and sex offense (31.6% of total crime). In New York City, violent crimes included murder and non-negligent manslaughter, homicide, robbery, felony assault, and kidnapping & related offenses (7.5% of total crime). Locations are not reported for rape and other sex-related crimes in New York City data and were thus not included in the crime count. As some census tracts in both Chicago and New York City had no reported violent crimes, all violent crime counts were increased by 1 before log-transformation. In Chicago, the mean violent crime count was 105 (SD = 92.8) and the mean non-violent crime count was 227 (SD = 225). In New York City, the mean violent crime count was 16 (SD = 16.6) and the mean non-violent crime count was 198 (SD = 178).

Demographic data

Demographic data were downloaded from the U.S. Census Bureau using the American Community Survey 5-year estimates (2012–2017). Working population was computed as the total number of jobs in the census tract from the Workplace Area Characteristics table of the Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics for 2017. Census tracts with no resident population (e.g., airports) and those missing other demographic data were removed. Seven hundred ninety-two census tracts in Chicago and 2098 census tracts in New York City were included in the models.

Statistical analysis

All analysis was completed in R, version 3.6.3. R packages used for data processing, visualization, and analysis were: corrplot, lme4, pcalg, RColorBrewer, rgdal, spdep, spatialreg, tidycensus, tidyverse, and tigris. Linear regressions to adjust for race and ethnicity used the lm function. Hierarchical linear model regressions used the lmer function (lme4 package) with REML set to FALSE. Shapefiles were read using readOGR (rgdal package). Spatial error regressions used the errorsarlm function (spatialreg package). For the spatial models, neighbors were defined using the poly2nb function with queen set to TRUE, and the spatial weights matrix was defined using the nb2listw function with style set to “W.” Spatial autocorrelation was tested using moran.mc and Lagrange multiplier tests were conducted using lm.LMtests. DAG models were run using the fci function (pcalg package) with indepTest set to “gaussCItest”, skel.method set to “stable”, and alpha equal to 0.05. Figures 1 and 2 were generated using RColorBrewer and spplot. Supplementary Figs. 1 and 2 were generated outside of R using an online implementation of GraphViz. Supplementary Figs. 3 and 4 were generated using corrplot.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.