Main

The origin of cities can be traced back thousands of years to the rise of human civilization. As human populations grew and settled in fertile regions, the need for organized communities and shared resources became crucial. These early settlements gradually evolved into cities, serving as centers of political, economic and cultural activities1. Cities provided a hub for trade, governance, innovation and socialization, attracting individuals from diverse backgrounds. Over time, cities have grown in size and complexity, shaping the trajectory of human history and becoming integral to our modern world2. Cities are an embodiment of self-organization that exemplifies how humans have evolved through intricate interactions among each other and with their surroundings3.

Cities exhibit emergent behaviors characterized by urban scaling laws3,4, akin to other complex systems, such as turbulent fluid flows5 or ant colonies6. Urban scaling laws describe the relationship between population size and certain features of cities, such as the gross domestic product (GDP). A large body of literature has been dedicated to the study of urban scaling laws, leading to many insightful conclusions about the present and future of urbanization. For example, quantities related to the volume occupied by infrastructure, such as road surface area, built area and power cable lengths, exhibit sublinear scaling3. Quantities related to household needs, such as water and electricity consumption, tend to have linear scaling with population size3. Lastly, variables related to social outputs, such as GDP, number of new patents and research and development jobs, tend to have a superlinear scaling3. The larger a city is, the less (more) its infrastructure (social output) per capita will be; household needs per capita, instead, will not vary with the population size.

The emergence of urban scaling laws can be attributed to social interactions3,4, associated with reciprocating thoughts and experiences that allow for information exchange7. Superlinear scaling of the economy and sublinear scaling of infrastructural needs indicate clear benefits of living in cities; yet not every social interaction is conducive to a scaling law that favors urbanization. As we have seen during the COVID-19 pandemic, social interactions in cities are also a vector to support the propagation of infectious diseases8, leading to superlinear scaling of incidence with population9. To a less understood extent, there is also evidence of superlinear scaling of crime in cities10.

Regardless of the kind of crime, superlinear scaling has almost always been observed3,11,12,13. For example, thefts show superlinear scaling in the United States and in countries in Central and South America (Mexico and Colombia) and Europe (Belgium, Denmark, France, Italy, Spain and the United Kingdom)13. Likewise, murder and aggravated assault scale superlinearly with population size for cities in the United States12. Interestingly, an equivalent scaling has been found for homicides, which encompass lawful and unlawful killing of a person14. A few hypotheses have been posited to explain superlinear scaling of crime. Some scholars12,15,16 have proposed crimes to be an output of social interactions, thereby scaling equivalently to any other socio-economic output (GDP, contagious disease rates, number of patents and so on). Others have suggested that superlinear scaling is due to the fact that the prevalence of police officers scales sublinearly with population size, so that criminals are winning in number17. Others have described superlinear scaling of crimes within a theory that combines economic complexity and cultural evolution, so that the larger the city, the more it will offer ‘factors’ that allow its residents to commit a crime11. Another potential mechanism may relate to ‘societal differences in material inequality’18, whereby higher levels of inequality in larger cities19 will lead to higher incidence of violent crime.

In the United States, many of these crimes are committed with guns; for example, about 80% of murders in the United States in 2021 involved guns20. Firearm injuries have been a leading cause of death in the country for years, surpassing the number of deaths due to car accidents in 35 US states in 2020 (ref. 21). Whereas some authors have demonstrated that firearm homicides scale superlinearly with population size in Brazil22, to date, an analogous relationship has not been examined in the United States. To fill this literature gap and test the hypothesis of superlinear scaling of firearm violence in the United States, we examine urban scaling of the incidences of both firearm homicides and armed robberies.

Upon developing a scaling law for firearm violence, we focus on firearm ownership and accessibility, two other key elements of the US firearm ecosystem23,24,25. Recent statistics suggest that the number of firearms in the United States has reached 393.3 million26, more than one firearm per person, placing the United States at the very top rank of gun possession globally27. Consistent with these figures, the number of licensed firearm dealers has reached about 78,000 shops, passing “all McDonald’s, Burger King, Subway and Wendy’s locations combined, and twice the number of US post offices”28. We examine urban scaling of the percentage of suicide deaths committed with a firearm, scaled by the city population. In the absence of a national registry, this quantity has been shown to be one of the best proxies of the prevalence of firearm ownership (Methods). The rationale behind this proxy is that the more guns are used as a means to take one’s own life, the more likely they are prevalent in that community. The prevalence of federal firearm-selling licenses, instead, offers a direct measure of the ease of access to (legal) firearms25.

Within the framework of urban scaling, we study a long-standing question in firearm research: why do people buy guns? A popular theory29,30,31 advocates for self-protection, whereby people will purchase weapons as they fear for themselves and their loved ones to be victims of violence. Quantitative evidence in favor of this explanation is limited, due to difficulties in teasing out cause-and-effect relationships from complex datasets. We propose to use the variance of cities with respect to the scaling laws to help address this technical limitation. Specifically, we investigate the triad consisting of the incidence of homicides (as a measure of violence that would trigger a desire for better self-protection), firearm ownership and prevalence of federal firearm-selling licenses.

Deviations from scaling laws are typically referred to as scale-adjusted metropolitan indicators (SAMIs)32. SAMIs are used to disentangle local features from population size, providing a true measure of local urban performance at different scales. They have been previously employed in various attempts to understand urban structures and relationships between cities32. For example, whereas per capita homicide rates have been increasing from 1990 to 2010 in Brazilian cities, the averages of the SAMIs for cities above and below the scaling law have been approaching zero, suggesting that more violent cities (above the scaling law) have been experiencing a reduction in homicides and less violent ones (below the scaling law) an increase in homicides33. Analyses with the SAMIs should yield more reliable conclusions, as no spurious relationships among urban features would appear due to the scaling laws. By applying tools of causal discovery on the SAMIs, we demonstrate an influence of firearm accessibility and homicides on firearm ownership, which we use to formulate a Cobb–Douglas model34 to predict firearm ownership in the country. The use of such models in the study of urban scaling phenomena has been recently explored by Sarkar et al.35 and Ribeiro et al.36 in the context of urban economies and CO2 emissions, respectively.

The study has two main contributions. First, we apply urban scaling theory to detail the role of population size on firearm violence, ownership and accessibility in US cities. We demonstrate that firearm violence scales superlinearly, like a social output, in contrast with firearm ownership and accessibility. Both these quantities exhibit sublinear scaling, similar to the scaling observed with respect to infrastructure: there are fewer firearms and selling licenses per capita as cities grow in size, yet firearm violence per capita increases. Second, we propose a broadly applicable methodology that combines causal discovery tools and SAMIs to infer associations in urban science. Using our methodology, we offer support in favor of the theory of self-protection as a driver of firearm ownership.

Results

Urban scaling

Urban scaling laws are expressed as simple power law models

$$y_i = c\,n_i^{\beta}\,e^{\xi_i}, \tag{1}$$

where i ∈ {1, …, N} such that N is the number of cities, yi is the urban feature of city i (such as GDP) and ni is the population size of city i (metropolitan and micropolitan statistical areas, MSAs and MicroSAs, respectively; Methods). c is a constant, and β is the scaling exponent. We say that the scaling is superlinear when β > 1, sublinear when β < 1 and linear when β = 1. c and β are usually found by fitting a linear model using ordinary least squares regression on the logarithmically transformed data3. The ξis are the SAMIs, which capture the deviation of each city from the nominal scaling in equation (1)32. SAMIs are dimensionless measures, independent of the city size, which offer a metric of performance of any city with respect to others.
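As a minimal sketch of how equation (1) is usually fitted, the snippet below estimates β, c and the SAMIs by ordinary least squares on log-transformed data. The function name `fit_scaling_law`, the use of natural logarithms and the synthetic populations are illustrative assumptions; this is not the authors' code or data.

```python
import math

def fit_scaling_law(populations, values):
    """Fit y = c * n**beta * exp(xi) by OLS on natural-log data.

    Returns (beta, c, samis), where samis[i] is the residual xi_i.
    """
    x = [math.log(n) for n in populations]
    y = [math.log(v) for v in values]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    beta = sxy / sxx                      # scaling exponent
    log_c = y_mean - beta * x_mean        # intercept in log space
    samis = [yi - (log_c + beta * xi) for xi, yi in zip(x, y)]
    return beta, math.exp(log_c), samis

# Synthetic cities (hypothetical values): a power law with exponent 1.15,
# multiplied by known deviations that play the role of exp(xi).
populations = [1e4, 5e4, 1e5, 5e5, 1e6, 5e6]
true_xi = [0.2, -0.1, 0.0, 0.1, -0.3, 0.1]
values = [1e-5 * n ** 1.15 * math.exp(xi) for n, xi in zip(populations, true_xi)]

beta, c, samis = fit_scaling_law(populations, values)
```

Because the fit includes an intercept, the estimated SAMIs sum to zero by construction; a positive SAMI marks a city above the scaling law and a negative one a city below it.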

We investigate urban scaling of five urban quantities in the United States: incidence of homicides, incidence of firearm homicides, prevalence of federal firearm-selling licenses, incidence of armed robberies and prevalence of firearm ownership. The incidence of homicides or firearm homicides and the prevalence of federal firearm-selling licenses can be found from the Centers for Disease Control and Prevention (CDC) and from the Listing of Federal Firearms Licensees from the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF), respectively (Methods). For armed robberies, we employ the Gun Violence Archive dataset published on Kaggle (Methods). Because there is no accessible data about firearm ownership in the United States, we use the percentage of suicide deaths committed with firearms as a proxy, following established practices (Methods). Specifically, for each city, we estimate the prevalence of firearm ownership as the fraction of the number of suicide deaths committed with a firearm over the total number of suicides, scaled by the population size of the city (Methods). Suicide data can also be directly accessed from the CDC (Methods).

Urban scaling for the incidence of firearm homicides (Fig. 1a), incidence of armed robberies (Fig. 1b), prevalence of federal firearm-selling licenses (Fig. 1c) and prevalence of firearm ownership (Fig. 1d) reveals nonlinear scaling. Specifically, in agreement with observations on firearm violence in Brazil22, the incidences of firearm homicides and armed robberies in the United States scale superlinearly with β = 1.15 (β ∈ (1.10, 1.19); c = 10^−5.46; σ = 0.15) and β = 1.10 (β ∈ (1.05, 1.14); c = 10^−5.69; σ = 0.14), respectively. In line with survey results from Pew Research Center on gun ownership in urban versus rural America37, prevalence of firearm ownership scales sublinearly with β = 0.95 (β ∈ (0.94, 0.97); c = 10^0.01; σ = 0.01). Likewise, we find that prevalence of federal firearm-selling licenses scales sublinearly with β = 0.66 (β ∈ (0.63, 0.69); c = 10^−1.47; σ = 0.04). Superlinear scaling is observed for homicide incidence with β = 1.12 (β ∈ (1.08, 1.15); c = 10^−5.08; σ = 0.11) (Supplementary Fig. 1), in strong agreement with the scaling reported by Bilal et al.14, based on a smaller dataset (376 US cities) and an earlier time period (up to 2016). Scaling results for the incidence of suicides and firearm suicides are presented in Supplementary Discussion 1. Results about the SAMIs of the three main variables are in Supplementary Discussion 2. Scaling results obtained by varying the number of cities are examined in Supplementary Discussion 3.

Fig. 1: Results on urban scaling of firearm violence, accessibility and ownership in the United States.

a–d, The dots identify incidence of firearm homicide as a function of the combined population in 810 cities from 2014 to 2019 (a), incidence of armed robbery as a function of the combined population in 649 cities from 2013 to 2018 (b), prevalence of federal firearm-selling licenses as a function of the combined population in 833 cities from 2014 to 2019 (c) and prevalence of firearm ownership as a function of the combined population in each of the 833 cities from 2014 to 2019 (d). Data about firearm homicides were intentionally hidden for areas with fewer than ten homicides per the privacy policy of the CDC.

Analysis of associations based on SAMIs

To study relationships among the SAMIs of homicide incidence, prevalence of firearm ownership and prevalence of federal firearm-selling licenses, we adopt the framework pioneered by Pearl, which relies on conditional independence among variables38. Through conditional independence, one can delve into the nature of the associations that are seen from scatter plots between the SAMIs (Fig. 2a–c), hinting at the independence between homicide incidence and prevalence of federal firearm-selling licenses and at their associations with prevalence of firearm ownership. We rely on conditional mutual information, a model-free measure of dependence among variables39 (Methods).

Fig. 2: Results from the study of SAMIs of violence, firearm ownership and firearm accessibility.

a–c, Scatter plots and histograms showing marginal and pairwise joint distributions of the SAMIs of the three examined variables: homicide incidence and prevalence of firearm ownership (a), prevalence of federal firearm-selling licenses and prevalence of firearm ownership (b) and prevalence of federal firearm-selling licenses and homicide incidence (c). The contour plots visually represent the joint distribution; the more tilted the contours, the higher the pairwise linear association between the variables. d, Collider structure discovered through our framework and estimation of the fitted Cobb–Douglas model in equation (2) versus data; F and βF = 0.95 represent prevalence of firearm ownership and its scaling exponent, L and βL = 0.66 the prevalence of federal firearm-selling licenses and its scaling exponent, H and βH = 1.12 homicide incidence and its scaling exponent and β1 and β2 are the exponents of the Cobb–Douglas model for H and L, respectively.

Mutual information results on pairwise dependencies and conditional mutual information results on triple-wise dependencies are summarized in Table 1. For a 5% significance level, the SAMIs of the prevalence of firearm ownership are dependent on the SAMIs of homicide incidence and on the SAMIs of the prevalence of federal firearm-selling licenses. These pairwise dependencies still hold when conditioning on the third variable; that is, the prevalence of firearm ownership depends on the incidence of homicides also when conditioning on the prevalence of federal firearm-selling licenses, and it depends on the prevalence of federal firearm-selling licenses also when conditioning on the incidence of homicides. These results alone address only symmetric, bidirectional relationships. However, the incidence of homicides and the prevalence of federal firearm-selling licenses are not dependent, and conditioning on the prevalence of firearm ownership makes them dependent. Under standard causal learning assumptions38, this is the classic case of a ‘collider’38, where one can conclude that both the incidence of homicides and the prevalence of federal firearm-selling licenses influence the prevalence of firearm ownership (Fig. 2d); Methods provide details on the deduction of the collider, and Supplementary Discussion 3 provides robustness tests of our claims with respect to the use of alternative measures of violence, accounting for the presence of external confounders (such as income inequality or urbanicity) and use of synthetic data. We urge prudence in interpreting these findings as true cause-and-effect relationships due to the inherent inability to test for assumptions of causal learning from data, such as causal sufficiency, faithfulness and Markovianity. The observed associations could, in fact, emerge due to direct or indirect causality40.
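The collider signature described above (two variables that are marginally independent but become dependent once a third is conditioned on) can be illustrated with a toy discrete example. The sketch below uses plug-in estimates of mutual information on binary data; it is only an illustration of the logic, not the estimator or significance test applied to the continuous SAMIs in this study, and the function names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in nats from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (cnt / n) * math.log(cnt * n / (px[x] * py[y]))
        for (x, y), cnt in pxy.items()
    )

def conditional_mutual_information(triples):
    """Plug-in estimate of I(X;Y|Z): weighted average of I(X;Y) within each Z stratum."""
    n = len(triples)
    strata = {}
    for x, y, z in triples:
        strata.setdefault(z, []).append((x, y))
    return sum(len(p) / n * mutual_information(p) for p in strata.values())

# Toy collider (hypothetical): X and Y are independent fair bits, Z = X XOR Y.
samples = [(x, y, x ^ y) for x, y in product([0, 1], repeat=2)] * 25

mi_xy = mutual_information([(x, y) for x, y, _ in samples])   # no marginal dependence
cmi_xy_given_z = conditional_mutual_information(samples)      # dependence appears given Z
```

In this toy case, X and Y carry no information about each other until Z is known, which mirrors the pattern reported for homicide incidence and licenses once firearm ownership is conditioned on.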

Table 1 Results of the analysis of associations based on scale-adjusted metropolitan indicators to explain firearm ownership in US cities

To infer the signs of associations between homicide incidence and the prevalence of firearm ownership and between the prevalence of federal firearm-selling licenses and the prevalence of firearm ownership, we perform a Spearman partial correlation (Table 1). We register that the associations are both positive: the worse a city is performing with respect to homicides, the more firearms it has. Similarly, the more a city has access to firearms, the higher its firearm ownership. To stress the importance of using the SAMIs instead of the per capita rates, we note that Spearman correlation between per capita rates for homicide incidence and the prevalence of federal firearm-selling licenses would yield a spurious significant negative correlation (ρ = −0.157 and p < 0.001; Supplementary Discussion 4 provides details on how spurious correlations might arise).
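A Spearman partial correlation can be sketched as the standard first-order partial correlation formula applied to rank-transformed data. The snippet below is a minimal illustration under that assumption; the helper names and the six SAMI values are hypothetical and are not taken from Table 1.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def spearman_partial(x, y, z):
    """Spearman correlation between x and y after controlling for z."""
    rx, ry, rz = ranks(x), ranks(y), ranks(z)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical SAMIs for six cities (homicides, licenses, ownership).
sami_h = [0.3, -0.2, 0.1, -0.4, 0.5, 0.0]
sami_l = [-0.1, 0.2, 0.4, -0.3, 0.0, 0.1]
sami_f = [0.25, 0.05, 0.45, -0.6, 0.4, 0.1]

rho_hf_given_l = spearman_partial(sami_h, sami_f, sami_l)
```

A positive value indicates that cities ranking higher on homicide SAMIs also tend to rank higher on ownership SAMIs once the ranking on licenses is accounted for.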

We calibrate a Cobb–Douglas model to predict the prevalence of firearm ownership in terms of the incidence of homicide and the prevalence of federal firearm-selling licenses (Methods). The fitted powers for incidence of homicides and prevalence of federal firearm-selling licenses are β1 = 0.45 (β1 ∈ (0.41, 0.48)) and β2 = 0.68 (β2 ∈ (0.62, 0.74)), respectively, with R² = 0.61 (Fig. 2d). Given that β1 + β2 > 1, returns to scale are increasing, meaning that an increase in both the incidence of homicides and the prevalence of federal firearm-selling licenses by a factor k produces an increase in the prevalence of firearm ownership by more than a factor k.
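The returns-to-scale statement can be checked numerically. The sketch below assumes the generic Cobb–Douglas form F = A·H^β1·L^β2 with the fitted exponents quoted above; the exact form of equation (2) is given in Methods and is not reproduced here, so the function, the constant A = 1 and the input values are illustrative assumptions.

```python
def cobb_douglas(h, l, a=1.0, beta1=0.45, beta2=0.68):
    """Generic Cobb-Douglas form F = a * h**beta1 * l**beta2 (assumed form)."""
    return a * h ** beta1 * l ** beta2

k = 2.0                                   # scale both inputs by the same factor
base = cobb_douglas(3.0, 5.0)             # arbitrary, hypothetical inputs
scaled = cobb_douglas(k * 3.0, k * 5.0)
ratio = scaled / base                     # equals k ** (beta1 + beta2) = 2 ** 1.13, about 2.19
```

Doubling both inputs therefore more than doubles the output under this form, regardless of the starting values of H and L.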

Discussion

Over the last 15 years, the theory of urban scaling has provided critical insight into the present and future of urbanization3. The superlinear scaling of GDP, number of new patents and research and development jobs with population size is a compelling argument for cities to grow and thrive. Yet not all urban features that grow superlinearly are actually beneficial to cities. Several studies have documented that crime also grows superlinearly with population size11,12,13, with crime per capita being higher in larger cities than smaller ones. In this Article, we apply urban scaling theory to the study of firearm violence, ownership and accessibility in the United States.

Unlike most nations, the United States has a strong tradition of individual gun ownership41 and the Second Amendment of its Constitution guarantees the right to bear arms42. In agreement with the recent survey by Pew Research Center37 that broadly identified that 46% of adults in rural areas own a firearm compared with only 19% of adults in urban areas, we document sublinear scaling of prevalence of firearm ownership. At β = 0.95, per capita firearm prevalence decreases with the population size of a city so that per capita prevalence would drop by a factor of two when comparing a small community of 1,000 people with a large city of millions. Sublinear scaling is also found with respect to the prevalence of federal firearm-selling licenses, suggesting that licensed dealers locate proportionally more in smaller cities, probably to serve the increased fraction of gun owners.

Sublinear scaling has been systematically discovered in previous urban scaling studies addressing different forms of infrastructure volume of cities (road surface area, built area and power cable lengths and so on)3. Could this imply that firearms are some sort of infrastructure for US cities or at least that some segments of the American society view them as such? Infrastructure generally refers to the physical and organizational structures and facilities necessary for the functioning of a society. Firearms are not infrastructure in this traditional sense, and many Americans raise issues of public safety and violence around firearms43. Yet, as detailed by Boine et al.41, “gun culture is not monolithic”; there are many attitudes and behaviors related to firearms in the country. The network of gun dealers and owners in the country may be viewed by some segments as infrastructure needed for self-defense, recreational purposes and criminal activity. In this vein, the observed sublinear scalings are not unexpected.

In contrast to firearm ownership and accessibility, the incidence of firearm violence scales superlinearly with population size. By associating firearm violence with criminality, this finding is consistent with the view that criminality is a socio-economic output of a city3,12, which follows a superlinear scaling, such as GDP, number of new patents and research and development jobs. Particularly relevant to our observations is the recent theory by Gomez-Lievano et al.11, which explains superlinear scaling of socio-economic output in terms of the economic complexity of the specific phenomenon and cultural evolution. Within this theory, the number of concurrent factors M that are required for a specific phenomenon controls the scaling exponent, proportionality factor and variance. As such, when comparing homicides with firearm homicides, one should see an increase in the complexity of the phenomenon by at least one, that is, the action of securing a firearm. In agreement with this theory, we register an increase in the scaling exponent, a decrease in the intercept and an increase of the variance, when comparing scaling laws for firearm homicides and homicides (Supplementary Discussion 5).

Urban scaling laws detail the nominal behavior of cities with respect to their population size, but, obviously, they do not capture the totality of the variance about urban quantities. Deviations from scaling laws are called SAMIs—a relevant commodity for scoring cities’ performance and understanding urban systems at different scales. Here we put forward an approach to discover associations between urban features using SAMIs in urban science, focusing on answering a long-standing question in firearm research: why do people in the United States buy guns? In our previous work44, we studied nation- and state-level patterns of firearm ownership as functions of mass shootings, media coverage of shootings and media coverage of firearm regulations. Through time series analysis, we discovered an influence of media coverage of firearm regulations on firearm prevalence. Such a relationship supports the hypothesis that people buy guns as they fear that new regulations may be enacted to curtail their rights to own firearms, offering a statistical basis to the anecdotal observation of increased firearm sales at the time of the election of President Obama when stricter regulations were on the horizon45.

Our previous work44 does not offer support for another known driver of firearm prevalence, that is, self-protection29,30,31. People may want to purchase a firearm under the fear that they and their loved ones could be the victims of a crime. Working with finer data from 99 geographic areas (nationally representative counties and county clusters from the General Social Survey46), Rosenfeld et al.31 created a structural equations model to study the relationship between social trust and firearm ownership, while controlling for several conditions, including firearm violence. Their results point at an indirect effect of social trust on firearm ownership through firearm violence, whereby they propose that “people arm themselves in response to mistrust only insofar as mistrust is translated into high levels of firearm violence”. Despite the important insights the work offered, it is based on a relatively small dataset of 99 areas and, most importantly, it uses a model that is not designed to discover causal structures: as acknowledged in the paper, “the result holds only if we (the authors) have properly specified the causal direction of the relationship between firearm homicide and firearm prevalence”.

Here we revisit the hypothesis of self-protection as a driver of firearm ownership by combining the conditional independence framework of Pearl38 with urban scaling theory. In agreement with Rosenfeld et al.31, we discover that people may purchase firearms due to a desire for self-protection, as expressed by the level of violence they will experience in their city. The effect of such a driver is moderated by another association, between firearm ownership and accessibility, a result previously hinted at by Chao et al.25. Importantly, increasing the number of variables in the analysis or modifying the way violence is measured does not affect the nature of these associations (Supplementary Discussion 3). On the basis of these findings, we calibrate a Cobb–Douglas function that is consistent with urban scaling to predict firearm ownership in terms of the incidence of homicides and the prevalence of federal firearm-selling licenses in cities.

This study is not free of limitations. In particular, we identify two main limitations related to the data collection. The first one is the lack of a direct measure of firearm ownership in the United States, which forces the community to utilize proxies that can be estimated from available data. On the basis of the literature47,48,49, we choose to use the fraction of suicide deaths committed with a firearm as a valid proxy of firearm ownership. Such a proxy is based on the premise that people use a firearm for suicide only as a function of their ownership of a firearm, thereby discounting personal choice in the selection of the way to take their own life. The second limitation is in the notion of firearm accessibility, which is purely based on the prevalence of federal firearm-selling licenses, thereby discounting other means (legal or illegal) to purchase a firearm. As detailed by Wintemute50, “guns sold by licensed dealers account for only about 60% of the guns sold in the United States. Guns sold by private parties, collectors and unlicensed vendors at gun shows account for 40% of all gun sales”; none of these routes to gun ownership is part of our study. In addition, we should acknowledge that federal firearm-selling licenses do not impose limitations on where to trade, thereby potentially straining the association between the prevalence of federal firearm-selling licenses and ease of access to firearms in a specific city. In principle, an authorized dealer can also sell online51; presently, we have no ability to track these activities.

Despite these limitations, our effort provides critical, city-level insight into the firearm ecosystem, which helps detail scaling laws and underpinning associations about firearm violence, ownership and accessibility. Alongside these insights, our work contributes a methodology to study associations between urban features based on SAMIs, which is broadly applicable to urban science.

Methods

Data

Homicide data were collected from the CDC Wonder causes of death database at a county level52. Specifically, the ‘ICD-10 Codes’ cause of death was filtered according to the code ‘X85-Y09 (Assault)’. The codes for homicides with firearm were ‘X93 (Assault by handgun discharge)’, ‘X94 (Assault by rifle, shotgun and larger firearm discharge)’ and ‘X95 (Assault by other and unspecified firearm discharge)’. In Supplementary Discussion 3, we expand the definition of homicides to include deaths assigned to injuries of ill-defined intent, Code ‘Y10-Y34 (Event of undetermined intent)’. Interestingly, the superlinear scaling is not affected by potential undercounting of deaths due to injuries of ill-defined intent as reported by the CDC52. These deaths are more frequent in smaller urban areas so that it is prudent to ensure that they would not affect the superlinear nature of the scaling of homicide incidence. Our analysis points to a modest reduction in the scaling exponent, from β = 1.12 ∈ (1.08, 1.15) to β = 1.07 ∈ (1.04, 1.11), a decrease in the variance from σ = 0.11 to σ = 0.08 and an increase in the intercept from c = 10^−5.08 to c = 10^−4.70.

From the same source, we retrieved suicide data at the county level using the codes ‘X60–X84 (Intentional self-harm)’. The codes for suicide with firearm were ‘X72 (Intentional self-harm by handgun discharge)’, ‘X73 (Intentional self-harm by rifle, shotgun and larger firearm discharge)’ and ‘X74 (Intentional self-harm by other and unspecified firearm discharge)’. From the CDC Wonder database, we also collected population data. For city i, we estimated prevalence of firearm ownership (F_i) from the incidence of suicides (S_i), the incidence of suicides with firearm (SF_i) and the population (n_i), as F_i = n_i SF_i / S_i. This proxy has been validated against data about the percentage of households reporting ownership of a firearm across 170 cities49, 21 states47 and the nine Census regions48 with correlation coefficients of 0.86, 0.90 and 0.93, respectively.
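The proxy amounts to scaling the population by the fraction of suicides committed with a firearm. A minimal sketch follows; the function name, the input checks and the example city are hypothetical and serve only to make the formula concrete.

```python
def firearm_ownership_proxy(population, suicides, firearm_suicides):
    """Return F_i = n_i * SF_i / S_i for one city."""
    if suicides <= 0:
        raise ValueError("proxy is undefined when a city records no suicides")
    if not 0 <= firearm_suicides <= suicides:
        raise ValueError("firearm suicides must lie between 0 and total suicides")
    return population * firearm_suicides / suicides

# Hypothetical city: combined population of 600,000 over the study window,
# 120 suicides in total, 66 of them with a firearm.
f_i = firearm_ownership_proxy(600_000, 120, 66)
```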

Data about armed robberies were obtained from the Gun Violence Archive dataset on Kaggle that ends in 2018 (ref. 53), as incidents that include the word ‘armed robbery’ in the incident’s characteristic. The geographical coordinates of the armed robberies data were aggregated on a county level using the ArcGIS software54 and the US census shape files55. To convert from county codes of homicides, firearm homicides, suicides, firearm suicides and armed robberies to MSAs and MicroSAs, we relied on data from the US Bureau of Labor Statistics56, year 2013.

Data on federal firearm-selling licenses at the zip code level were collected from the ATF Listing of Federal Firearms Licensees, which begins in 2014 (ref. 57). Specifically, we counted the number of licenses in August for each studied year and aggregated them over the years. The dataset contained eight types of license as follows: ‘type 01 Dealer in Firearms Other Than Destructive Devices’, ‘type 02 Pawnbroker in Firearms Other Than Destructive Devices’, ‘type 06 Manufacturer of Ammunition for Firearms’, ‘type 07 Manufacturer of Firearms Other Than Destructive Devices’, ‘type 08 Importer of Firearms Other Than Destructive Devices’, ‘type 09 Dealer in Destructive Devices’, ‘type 10 Manufacturer of Destructive Devices’ and ‘type 11 Importer of Destructive Devices’. Type 01 is by far the most common (Supplementary Table 1). All the types of license allow for selling except for type 06 (ref. 58), which we consistently excluded from our study. To convert from zip codes to MSAs and MicroSAs, we relied on data from the Missouri Census Data Center59 (CBSA/ZIP, 2010 geographies). Firearm licenses in a zip code belonging to more than one MSA or MicroSA were counted multiple times to account for the fact that persons from different, neighboring cities could purchase a firearm from the same seller. For example, zip code 57785 belongs to both Spearfish and Rapid City, South Dakota, and persons from both cities may buy a firearm there.
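The counting rule described above (exclude type 06, and credit a license to every city that shares its zip code) can be sketched as follows. The function name, the record layout and all counts are hypothetical; only the assignment of zip code 57785 to both Spearfish and Rapid City comes from the text, and `00000` is a placeholder zip code.

```python
from collections import defaultdict

# License types that allow selling; type 06 (ammunition manufacturer) is excluded.
SELLING_TYPES = {"01", "02", "07", "08", "09", "10", "11"}

def count_licenses_by_city(licenses, zip_to_cities):
    """Count selling licenses per city; a zip code shared by several
    cities contributes its licenses to each of them."""
    counts = defaultdict(int)
    for zip_code, license_type in licenses:
        if license_type not in SELLING_TYPES:
            continue
        for city in zip_to_cities.get(zip_code, []):
            counts[city] += 1
    return dict(counts)

# Hypothetical records: (zip code, license type).
licenses = [("57785", "01"), ("57785", "06"), ("57785", "07"), ("00000", "01")]
zip_to_cities = {
    "57785": ["Spearfish, SD", "Rapid City, SD"],
    "00000": ["Rapid City, SD"],
}
city_counts = count_licenses_by_city(licenses, zip_to_cities)
```

Under this rule the total across cities can exceed the number of licenses in the listing, which is the intended consequence of counting shared zip codes more than once.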

To improve the reliability of the data, we combined all the variables over six-year windows. Incidence of homicides, incidence of firearm homicides, prevalence of federal firearm-selling licenses and firearm prevalence were computed over six years, from 2014 to 2019. We stopped in 2019 before the COVID-19 pandemic hit the United States, bringing new insecurities in people’s lives that might have sparked more firearm purchases60,61. Incidence of armed robberies was also computed over six years, but one year before (from 2013 to 2018) as the Kaggle dataset did not contain information past 2018. As a result, population refers to the combined population over six years, measures of incidence refer to total counts over six-year intervals and measures of prevalence should be intended over the combined population over the six-year interval. For example, incidence of homicides in city i means the total number of homicides suffered by city i from 2014 to 2019; likewise, firearm prevalence in city i should be viewed as the combined number of firearm owners in city i from 2014 to 2019.

All the data acquired were aggregated over MSAs and MicroSAs because they are one of the functional definitions of cities62. Out of the 388 MSAs and 541 MicroSAs studied, seven MSAs and five MicroSAs in Puerto Rico were removed as they were not part of the CDC dataset. An additional 26 MSAs and 58 MicroSAs were removed as they experienced no homicides or had no federal firearm-selling licenses (firearm prevalence did not have null values for these cities), resulting in a total of 833 statistical areas, or cities. The three main variables (incidence of homicides, firearm prevalence and prevalence of federal firearm-selling licenses) were studied for this same set of 833 cities to ensure consistency in the analysis of associations. For the scaling analysis of other variables (armed robberies and firearm homicides), we removed cities that had null values of those variables, separately (for example, cities that did not experience armed robberies but suffered firearm homicides were still considered in the firearm homicides scaling). Hence, we studied 366 MSAs and 283 MicroSAs for armed robberies and 355 MSAs and 455 MicroSAs for firearm homicides.

Urban scaling

To find β and the \({\xi }_{i}\) values for each of the variables, the logarithmically transformed data were fitted to a linear model using ordinary least squares regression. To compute the standard errors for the \({R}^{2}\) and the 95% confidence interval on β, we took into account heteroscedasticity. Heteroscedasticity in regression is the dependence between the variance of the residuals and the regressor, which is typically accounted for in urban scaling fits63. The White test of heteroscedasticity64 indicates the presence of heteroscedasticity for all three variables: incidence of homicides (p value < \(1{0}^{-4}\)), prevalence of firearm ownership (p value = 0.033) and prevalence of federal firearm-selling licenses (p value < \(1{0}^{-4}\)). Linear regression estimations were performed using the Python package Statsmodels65. In the text, we also report the variance σ computed for the SAMIs.
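As an illustration of this fitting step, the following sketch estimates a scaling exponent on synthetic data using plain NumPy, so that the robust covariance formula is visible. It is not the Statsmodels code used in the study: the function name `fit_scaling_law`, the choice of the HC0 (White) sandwich estimator, the normal-approximation factor 1.96 and all data values are assumptions made for the example.

```python
import numpy as np


def fit_scaling_law(population, y):
    """OLS fit of ln(y) = ln(y0) + beta * ln(N), with HC0 robust errors (assumed)."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    X = np.column_stack([np.ones_like(log_n), log_n])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ X.T @ log_y
    residuals = log_y - X @ coef  # deviations from the scaling law (SAMIs)
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * residuals[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv
    se_beta = np.sqrt(cov[1, 1])
    beta = coef[1]
    # 95% interval from a normal approximation (assumption of this sketch)
    ci95 = (beta - 1.96 * se_beta, beta + 1.96 * se_beta)
    return {"ln_y0": coef[0], "beta": beta, "ci95": ci95, "samis": residuals}


# Synthetic cities with a known exponent of 1.12 and noise that shrinks with size.
rng = np.random.default_rng(0)
population = 10 ** rng.uniform(4, 7, size=500)
noise_sd = 0.6 - 0.03 * np.log(population)
y = 0.01 * population ** 1.12 * np.exp(rng.normal(0.0, noise_sd))

fit = fit_scaling_law(population, y)
```

The residuals returned as `samis` are the quantities on which the later association analysis operates.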

Cobb–Douglas model

The Cobb–Douglas model is written as

$$F=C{H}^{{\beta }_{1}}{L}^{{\beta }_{2}},$$
(2)

where F is prevalence of firearm ownership, H incidence of homicides and L prevalence of federal firearm-selling licenses. From the scaling analyses, we know that \(F \sim {N}^{{\beta }_{F}}\), \(H \sim {N}^{{\beta }_{H}}\) and \(L \sim {N}^{{\beta }_{L}}\), with βF = 0.95, βH = 1.12 and βL = 0.66 as the scaling exponents of prevalence of firearm ownership, homicide incidence and prevalence of federal firearm-selling licenses, respectively. To retrieve \(F \sim {N}^{{\beta }_{F}}\) from the Cobb–Douglas model, we set the following constraint:

$${\beta }_{1}{\beta }_{H}+{\beta }_{2}{\beta }_{L}={\beta }_{F}.$$
(3)

The resulting model has only two free parameters, C and β1, because substituting the constraint for β2 allows it to be written as follows:

$$\ln F-\frac{{\beta }_{F}}{{\beta }_{L}}\ln L=\ln C+{\beta }_{1}\left(\ln H-\frac{{\beta }_{H}}{{\beta }_{L}}\ln L\right),$$
(4)

so that the calibration is performed the same way as the scaling laws.
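A minimal sketch of this calibration is given below. It regresses the left-hand side of equation (4) on the bracketed term of the right-hand side and then recovers β2 from the constraint in equation (3). The scaling exponents are those quoted in the text; the function name `fit_constrained_cobb_douglas` and the synthetic, noise-free data are assumptions used only for illustration.

```python
import numpy as np

# Scaling exponents quoted in the text.
BETA_F, BETA_H, BETA_L = 0.95, 1.12, 0.66


def fit_constrained_cobb_douglas(F, H, L):
    """Estimate C and beta_1 from equation (4); beta_2 follows from equation (3)."""
    ln_f, ln_h, ln_l = (np.log(np.asarray(v, dtype=float)) for v in (F, H, L))
    y = ln_f - (BETA_F / BETA_L) * ln_l
    x = ln_h - (BETA_H / BETA_L) * ln_l
    X = np.column_stack([np.ones_like(x), x])
    (ln_c, beta1), *_ = np.linalg.lstsq(X, y, rcond=None)
    beta2 = (BETA_F - beta1 * BETA_H) / BETA_L
    return np.exp(ln_c), beta1, beta2


# Synthetic data generated from a known model (hypothetical values, no noise).
rng = np.random.default_rng(1)
H = rng.uniform(1.0, 100.0, size=50)
L = rng.uniform(1.0, 50.0, size=50)
true_beta1 = 0.3
true_beta2 = (BETA_F - true_beta1 * BETA_H) / BETA_L
F = 2.0 * H ** true_beta1 * L ** true_beta2

C, beta1, beta2 = fit_constrained_cobb_douglas(F, H, L)
```

Because β2 is computed from the constraint rather than fitted, the recovered exponents satisfy equation (3) by construction.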

Analysis of associations based on SAMIs

Traditionally, teasing out associations in complex systems has relied on the use of time series66, whereby causality has been typically rooted in a Wiener–Granger sense67 or in dynamical systems theory68. In the former case, a link from X to Z corresponds to the possibility of reducing the uncertainty in the prediction of the future Z from knowledge about the history of X67. In the latter case, a link from X to Z is related to the variables belonging to the same dynamical system68. The use of time series is problematic, if not unfeasible, when working with urban data, which have high spatial resolution (100−1,000 cities) and low time resolution (yearly sampling for a few years). To address this issue, we leveraged the conditional independence framework originally developed by Pearl38.

The conditional independence framework does not require temporal data as it uses directed acyclic graphs (DAGs) to describe causal structures from conditional independence tests38. The framework is based on several assumptions, one of them being acyclicity, that is, two variables cannot be drivers of one another. This assumption may not hold true in many applications. The framework could be extended to temporal data66, where one may argue for cyclic causality; however, such a route is not feasible for urban data such as ours, where only a handful of temporal snapshots are available. Another assumption of the framework is that there are no unobserved variables. Should this assumption not hold, one might propose different DAGs with hidden variables that are compatible with the set of independence tests of Table 1. Specifically, should we relax the assumption and allow for unobserved variables, we would conclude that: (1) either L → F or there exists an unobserved common cause of L and F or both; and (2) either H → F or there exists an unobserved common cause of H and F or both. Practically, contemplating this possibility calls for exploring larger DAGs with more variables (Supplementary Discussion 3).

The main component of the conditional independence framework is a statistical test for independence between two variables conditioned on others. Typically, simple correlations are employed; however, correlation assumes a linear relationship between the variables, which might not hold for all real-world systems. Rank correlations relax the assumption of linearity, but they assume a monotonic relationship between the variables, which also might not hold. To test for possible nonlinear relationships between random variables, we rely on an information-theoretic measure known as conditional mutual information69.

Conditional mutual information makes no assumption about the underlying causal mechanism. The concept of conditional mutual information is based on the fundamental notion of (differential) entropy of a random variable X

$$H(X\,)=-\int\nolimits_{-\infty }^{\infty }p(x)\log p(x){\mathrm{d}}x.$$
(5)

The entropy of a random variable measures the unpredictability of its outcomes or the average surprise its outcomes can carry; hence it can be written also as \(H(X)=\langle -\log p(x)\rangle\), measured in nats39. Mutual information is defined as the amount of information shared between two random variables, X and Z, I(X, Z) = H(X) − H(X|Z) = H(Z) − H(Z|X). Mutual information serves as a measure of statistical association, in the sense that if X and Z are independent, then I(X, Z) = 0. Conditional mutual information is defined as the shared information between two random variables X and Z given a set of variables W

$$I(X,Z| W\,)=H(X| W\,)+H(Z| W\,)-H(X,Z| W\,).$$
(6)
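For intuition, equation (6) can be evaluated in closed form when the variables are jointly Gaussian, because the differential entropy of a d-dimensional Gaussian with covariance Σ is \(\frac{1}{2}\ln [{(2\pi e)}^{d}\det \Sigma ]\). The sketch below uses this special case only as an illustration; it is not the nonparametric estimator employed in the study, and the function names and the covariance matrix are assumptions of the example.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance matrix cov."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)


def gaussian_cmi(cov, x, z, w):
    """I(X, Z | W) from equation (6); x, z, w are lists of variable indices."""
    def joint_entropy(idx):
        return gaussian_entropy(cov[np.ix_(idx, idx)]) if idx else 0.0

    h_w = joint_entropy(w)
    h_x_given_w = joint_entropy(x + w) - h_w   # H(X | W) = H(X, W) - H(W)
    h_z_given_w = joint_entropy(z + w) - h_w
    h_xz_given_w = joint_entropy(x + z + w) - h_w
    return h_x_given_w + h_z_given_w - h_xz_given_w


# Hypothetical example: W drives both X and Z (X = W + noise, Z = W + noise),
# with unit variances. Variable order is [X, Z, W].
cov = np.array([[2.0, 1.0, 1.0],
                [1.0, 2.0, 1.0],
                [1.0, 1.0, 1.0]])

mi_unconditional = gaussian_cmi(cov, [0], [1], [])   # X and Z share information
cmi_given_w = gaussian_cmi(cov, [0], [1], [2])       # vanishes once W is known
```

In this example X and Z are associated only through W, so the mutual information is positive, whereas the conditional mutual information given W is zero.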

Because the true probability distributions are unknown, we rely on the estimator proposed by refs. 70,71 and the corresponding statistical testing scheme proposed by ref. 72 to assess the significance of the conditional mutual information. This estimator does not require fitting a kernel for the probability distribution and is based on the nearest neighbor statistic, which provides good estimates for small datasets. Usually, to test the significance, the estimated mutual information is compared to a surrogate distribution generated by randomly shuffling data of the X variable to destroy the relationship between X and Z. However, such an approach will also destroy the relationship between X and W, which does not produce a correct null distribution. The algorithm in ref. 72 proposes a local permutation scheme of the nearest neighbors that conserves the information between X and W while destroying the one between X and Z, thereby offering an adequate statistical test.

The conditional independence tests were performed on the SAMIs to control for the role of the population size in cities and to ground the associations in the variations of cities from the nominal behavior that scaling laws would predict. Interestingly, using per capita rates rather than SAMIs would beget spurious results in the presence of underpinning sublinear or superlinear scalings (Supplementary Discussion 4 provides details on how spurious correlations might appear). The estimation and the statistical tests have only two free parameters, the number of nearest neighbors for the estimate kCMI and the number of nearest neighbors that shall be permuted for the significance test kperm. Following the suggested parameters in ref. 72, we set kCMI ≈ 0.15N and kperm = 10. The surrogate distributions were each made from 10,000 null estimates from the randomly permuted data.
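The logic of the local permutation test can be sketched as follows. This is a simplified illustration and not the algorithm of ref. 72 in full: values of X are swapped only among samples that are close in W, so that the X–W relationship is approximately preserved while the X–Z relationship is destroyed. To keep the example short, the absolute partial correlation is used as a stand-in test statistic in place of the nearest-neighbor conditional mutual information estimator, only 200 surrogates are drawn instead of 10,000, and all function names and data are hypothetical.

```python
import numpy as np


def nearest_neighbors_in_w(w, k_perm):
    """Indices of the k_perm nearest samples in W (max norm), self included."""
    w = np.asarray(w, dtype=float).reshape(len(w), -1)
    dist = np.abs(w[:, None, :] - w[None, :, :]).max(axis=2)
    return np.argsort(dist, axis=1)[:, :k_perm]


def local_permutation_indices(neighbors, rng):
    """For each sample, draw a replacement index among its W-neighbors,
    preferring indices that have not been used yet."""
    n = neighbors.shape[0]
    used = set()
    perm = np.empty(n, dtype=int)
    for i in rng.permutation(n):
        candidates = rng.permutation(neighbors[i])
        j = next((c for c in candidates if c not in used), candidates[-1])
        perm[i] = j
        used.add(int(j))
    return perm


def abs_partial_corr(x, z, w):
    """Stand-in statistic (assumption): |corr| of x and z after regressing out w."""
    design = np.column_stack([np.ones(len(x)), w])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    rz = z - design @ np.linalg.lstsq(design, z, rcond=None)[0]
    return abs(np.corrcoef(rx, rz)[0, 1])


def local_permutation_test(x, z, w, statistic, k_perm=10, n_surrogates=200, seed=0):
    """Return the observed statistic and the fraction of surrogates at least as large."""
    rng = np.random.default_rng(seed)
    neighbors = nearest_neighbors_in_w(w, k_perm)
    observed = statistic(x, z, w)
    null = np.array([
        statistic(x[local_permutation_indices(neighbors, rng)], z, w)
        for _ in range(n_surrogates)
    ])
    return observed, float(np.mean(null >= observed))


# Synthetic data: W drives X; Z either depends on X directly or only on W.
rng = np.random.default_rng(2)
n = 200
w = rng.normal(size=n)
x = w + rng.normal(size=n)
z_dependent = x + 0.1 * rng.normal(size=n)
z_independent = w + rng.normal(size=n)

obs_dep, p_dep = local_permutation_test(x, z_dependent, w, abs_partial_corr)
obs_ind, p_ind = local_permutation_test(x, z_independent, w, abs_partial_corr)
```

Replacing `abs_partial_corr` with a conditional mutual information estimator would leave the permutation logic unchanged, as the test only requires a statistic that takes x, z and w as arguments.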

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.