Rural–urban scaling of age, mortality, crime and property reveals a loss of expected self-similar behaviour

The urban scaling hypothesis has improved our understanding of cities; however, rural areas have been neglected. We investigated rural–urban population density scaling in England and Wales using 67 indicators of crime, mortality, property, and age. Most indicators exhibited segmented scaling about a median critical density of 27 people per hectare. Above the critical density, urban regions preferentially attract young adults (25–40 years) and lose older people (> 45 years). Density scale adjusted metrics (DSAMs) were analysed using hierarchical clustering, networks, and self-organizing maps (SOMs) revealing regional differences and an inverse relationship between excess value of property transactions and a range of preventable mortality (e.g. diabetes, suicide, lung cancer). The most striking finding is that age demographics break the expected self-similarity underlying the urban scaling hypothesis. Urban dynamism is fuelled by preferential attraction of young adults and not a fundamental property of total urban population.

A number of issues have been noted when fitting power laws to urban scaling data sets 22 and particularly when data sets have null values or zeros 23 . In the data considered here, this issue is occasionally severe. Although progress has been made on these problems we note the following: (1) The analysis of scale adjusted metrics 2,8,10 assumes that the power law fits are an incomplete explanation of the data. Specifically, the approach 2,8,10 assumes the residuals around a power law fit contain explainable variance and are not random relative to other residuals. (2) Power variance models (e.g. Taylor's law 22,24 ) are good models of the noise in some instances and across limited scales. However, segmented fluctuation scaling occurs at least in the case of crime 24 . (3) Alternatives to power law models have been presented 22,25 but the extent to which the problems driving their development apply to density scales is unknown. In this context, power law models and the segmented modifications used here remain useful for understanding scale in human systems despite their limitations.
If two arrays of DSAMs corresponding to indicators (X, Y) over a set of n regions are represented by $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, a range of similarity measures (sm) can be computed. A region in this context is a defined land area of some size. Here, it represents administrative areas in the UK (unitary authorities, non-metropolitan districts, metropolitan boroughs, and London boroughs) but could be any defined region for which indicator data is available. We considered 6 similarity measures: Pearson correlation ($r(X, Y)$), Spearman correlation ($S(\mathrm{rg}_X, \mathrm{rg}_Y)$), Kendall correlation ($K(X, Y)$), cosine similarity ($c(X, Y)$), and Jaccard similarities ($J(X, Y)$) to investigate the inter-relationships between the DSAMs.
The matrix of similarity measures ($sm_{ij}$) generated for each pair of indicator DSAMs (e.g. mortality, property, crime and age) was analyzed by hierarchical clustering based on a distance, $\delta_{ij} = 2(1 - sm_{ij})$.
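The conversion from a similarity measure to the clustering distance can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code used in the study (which was run in R); the indicator names and DSAM values are made-up toy numbers, and the helper names `pearson`, `cosine` and `distance_matrix` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation r(X, Y) between two equal-length arrays."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine(x, y):
    """Cosine similarity c(X, Y)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def distance_matrix(dsams, sim=pearson):
    """Distance delta_ij = 2 * (1 - sm_ij) for every pair of indicators."""
    names = list(dsams)
    return {(a, b): 2 * (1 - sim(dsams[a], dsams[b]))
            for a in names for b in names}

# Toy DSAMs (residuals) for three indicators over four regions (illustrative only).
toy_dsams = {
    "burglary": [0.2, -0.1, 0.4, -0.3],
    "robbery":  [0.1, -0.2, 0.5, -0.4],
    "dementia": [-0.2, 0.1, -0.4, 0.3],   # exact mirror image of "burglary"
}
toy_dist = distance_matrix(toy_dsams)
```

With this definition, perfectly correlated indicators sit at distance 0 and perfectly anti-correlated ones at distance 4; the resulting matrix is what a hierarchical clustering routine would take as input.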

Results and discussion
Overview of regions. England and Wales consist of 348 regions including unitary authorities, non-metropolitan districts, metropolitan boroughs, and London boroughs. The regions ranged in area from 289 ha (City of London, England) up to 518,037 ha (Powys, Wales). Regional populations ranged from 2158 (Isles of Scilly, England) to 1,070,912 (Birmingham, England) while population density ranged from 0.25 people per hectare
Rural-urban scaling. The density scaling model gave reasonable fits to power laws (e.g. Figs. 1, S1 to S4).
Regions did not stand out relative to the scaling laws with the notable exception of the City of London. This region was an obvious outlier in 23 separate metrics and was so extreme that it merits special attention (e.g. Fig. 1). The City of London is a small 289 hectare region within the greater London metropolitan area with a small resident population (7355) and a much larger (> 350,000) daytime population. Scaling laws have been shown to change depending on whether resident or floating population is considered 26. In our work, many crime indicators gave positive deviations consistent with daytime population. However, dementia mortality and, to a lesser extent, lung cancer exhibited extreme negative deviation. The generally reduced incidence of dementia in the high population density portion of the scaling plot is intriguing. The trend can be partly explained by a lower proportion of older people. However, the exponents for age and dementia are incommensurate: dementia mortality decreases to a greater extent than the reduction in older people. This makes the City of London, which is nearly a factor of 10 below expectations, even more remarkable, and future studies of dementia risk should consider a more detailed look at this group of people.
The density scaling exponents (Figs. 2a,b, S1, S2, S3, and S4; Tables S1 and S2) for crime and property were similar to those observed previously 7 when parliamentary constituencies were used to define areas. Approximately half of crime metrics followed simple power laws: ASB, Burglary, Vehicle Crime, Violent Crime, Other Crime, Bike Theft, Weapons and Order. The remainder exhibited segmented scaling. Drugs, Other Theft, Theft from the Person and Robbery accelerated while Shoplifting and CD&A were inhibited in high density regions. This heterogeneity of behaviors is a challenge to crime opportunity theory 27,28 and situational action theories 29,30. A simple power law suggests uniformly increasing opportunities or criminogenic settings, but critical densities with both acceleration and inhibition require a clearer picture of what these opportunities and criminogenic settings represent. Similarly, the observation of a single relationship defining burglary across all scales challenges the notion of designed environments 31 for reducing this and the other crime types showing single-exponent behavior. The behavior of the eight single power-law crime types is remarkably robust over the entire land area of England and Wales.
Examination of mortality (Figs. 2c, S3, and Table S2) revealed that in rural regions except for 5 types of cancer (liver, stomach, lung, larynx and uterine cancer) and homicide, all mortality indicators exhibited sub-linear to linear scaling. In high density regions, all mortality except homicide was strongly sublinear. The dramatic improvement in mortality can be understood by examining the scaling of age groups.
Population density (Figs. 2d, S4; Table S2) had a profound influence on age demographics. High density regions attract young adults aged 25-39 and people age 45 and over preferentially leave. Although density exponents are not directly comparable to conventional ones 32 , the strength of the super-linear attraction for young people (β H = 1.46 for the 30-34 age group) may be sufficient to explain almost all reported super-linear economic indicators 1,2,33 . This can entirely explain the acceleration of Robbery in high density areas (Figs. 3, S1, and Table S2). Age has a strong influence on the exponent for mortality indicators. For example, kidney cancer and dementia show sublinear scaling in high density regions for the general population (Fig. S5, Table S3). When the two oldest age groups are considered, the protective effect of high density remains but is less pronounced. The data is suggestive for homicide having a single scaling exponent when considered using only the 30-34 age group, however, the data is too sparse at high density to reach a robust conclusion with this data set. If this observation holds beyond the UK, it is probably an important underlying mechanism for many effects observed in the urban scaling literature. As a minimum, age groups break the universal self-similarity of the urban scaling hypothesis. Scaling is not constant across age groups.
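As a rough worked example of what this exponent implies, assume the density scaling takes the usual power-law form $y = y_0 \rho^{\beta}$, where $\rho$ is population density and $y$ is the indicator density. This functional form and the symbols $y$, $y_0$ and $\rho$ are assumptions made for illustration, since the model equation is not reproduced in this section. Doubling the density above the critical point then gives:

```latex
\frac{y(2\rho)}{y(\rho)} = 2^{\beta_H} = 2^{1.46} \approx 2.75
\qquad \text{versus} \qquad
2^{1} = 2 \quad \text{for linear (per capita) scaling.}
```

Under that assumption, a doubling of density would be accompanied by roughly a 2.75-fold increase in the density of 30-34 year olds, rather than the 2-fold increase a per capita model would predict.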
From a policy perspective, these findings are important. Mortality and health are primarily understood in per capita terms. As noted above, UK National Health Service funding is provided through clinical commissioning groups using a formula based primarily on a constant per capita cost [12][13][14] . This per capita model may significantly  16 . The regions north of the "north-south" divide have a lower population density and DSAM metrics make clear that the excess mortality is per capita and is commensurate with rural metrics.
Critical densities. Fifty-one out of sixty-seven indicators (6 crime, 8 property, 21 mortality and 16 age) exhibited a critical density (Fig. 4) distributed around a median of 27 p/h. This is similar to the average value of 30 p/h for 19 indicators in our previous work 7. Although a bimodal density histogram is observed (Fig. 4b), a single distribution dominates. This is remarkable considering that the critical densities arise from a wide range of indicators including crime, property, mortality and age. The exceptions to the rule include four age groups (aged 5-9, aged 10-14, aged 40-44, aged 45-49). The 40-49 age range is the boundary between the young adults who are super-linearly attracted to high density urban regions and the elderly who preferentially leave. It is likely that, were the age ranges defined differently, no critical point would be observed, and the change in exponent around the critical values for all four is relatively small. Without these transitional age groups, only two exceptions remain. For the 45 indicators with critical densities in the same distribution, there is currently no explanation. There is no explanation for why mortality, crime, and property scaling pivots around a critical density. Age group behavior is important, but there is no explanation for the preferential attraction of young people to regions above a critical density. The critical density appears robustly near 27 p/h, but the reason it appears at that scale is unclear. The physics of percolation transitions 34-36 may offer solutions, but a unifying statistical mechanics which predicts a transition in human behavior (crime), health (mortality), economics (property transaction values), and age demographics at a critical density remains to be found.
The most striking finding is the division of the bulk of mortality indicators into two groups.
One group clustered with the elderly and tended to have positive correlation with certain types of property DSAMs. The other group, nearly all of which are to some degree preventable (  . The extent to which the magnitude of property transaction value exceeds scaling expectations protects against a wide range of mortality from preventable conditions ranging from homicide to uterine cancer. These conclusions were generally reinforced by all correlation measures (Figs. S13 and S14). The similarity measures were less informative (Figs. S15 and S16). A limitation of the heatmap and clustering (Fig. 5) is the pairwise structure which does not display the significance of the correlations. A network accounting for this was created by bootstrapping the Pearson correlation with 2000 replications for every pair of metrics to identify correlations significant at 99% confidence. The resulting network (Fig. 6) has 66 nodes including all metrics except bone cancer which had no statistically significant correlations. There were 784 significant connections out of 2211 possible and the optimal modularity score (0.472) partitioned the network into 3 communities very similar to the clusters in the indicator heatmap (Fig. 5).
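The bootstrap screening of correlations described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a percentile confidence interval and calls a correlation significant when that interval excludes zero, which is one common choice and may differ from the interval type used in the study (which relied on the R boot package). The function names and the toy data are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length arrays."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_corr_ci(x, y, n_boot=2000, level=0.99, seed=0):
    """Percentile bootstrap interval for r, resampling regions with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which one variable has no variance.
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue
        stats.append(pearson(xs, ys))
    stats.sort()
    alpha = (1 - level) / 2
    lo = stats[math.floor(alpha * (len(stats) - 1))]
    hi = stats[math.ceil((1 - alpha) * (len(stats) - 1))]
    return lo, hi

def is_significant(x, y, **kwargs):
    """Assumed criterion: the bootstrap interval excludes zero."""
    lo, hi = bootstrap_corr_ci(x, y, **kwargs)
    return lo > 0 or hi < 0

# Number of distinct indicator pairs among 67 indicators, as quoted in the text.
n_pairs = 67 * 66 // 2

# Toy data: x and y are strongly related, z is an alternating sequence.
x = [float(v) for v in range(30)]
y = [2 * v + ((7 * int(v)) % 5 - 2) for v in x]
z = [(-1.0) ** int(v) for v in x]
```

In a network built this way, an edge would be drawn between two indicators only when `is_significant` returns true for their DSAM arrays.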

Correlation and hierarchical clustering of DSAMs by category. Correlation analysis and hierarchi
Specifically, the network analysis found three modules containing: the elderly and mortality; children, middle-aged people and property; and young adults and crime. There were only two exceptions to this pattern, suicide and cancer of the larynx, which clustered with young adults and crime. These two were also most closely related to each other in the clustering analysis. Cancer of the larynx has long been associated with alcohol 37 and smoking 38 and preventative measures beyond cessation are limited. The association with suicide as well as the positive correlations of cancer of the larynx and suicide with ASB, CD&A, violence, accidents, diabetes, liver and lung cancers suggests health care delivery focusing on mental health 39,40 , alcohol 41 , and community safety may be beneficial for this group. Considering these types of mortality as long term responses to violence, stress, and mental illness could lead to more efficient prevention strategies.
Analysis of DSAMs by region. To understand regional behavior the clustering and correlation analysis was repeated on the transpose of the matrix of DSAMs such that it was presented by region rather than indicator (Fig. 7). Although heterogeneity is seen, broadly two clusters appear with universal anti-correlation at the extreme ends. The two extreme ends (e.g. Stoke-on-Trent vs. Bromley) live in nearly opposite worlds. If crime and mortality are above expectation in one it is below in the other. A geomap of the two clusters (Fig. S17) divided North England, Wales and the Midlands from Southern England with some exceptions.
Self-organizing maps. The simple geomap (Fig. S17) did not provide sufficient understanding of regional heterogeneity apparent in the cluster analysis. Regions are also affected by age demographics and their importance needs to be understood better. To explore regional behavior, the 348 regions were distributed onto an 8 by 8 hexagonal self-organizing map (Fig. S18). After 350 iterations convergence was reached (Fig. S19) with 4 clusters containing 2, 95, 190 and 61 regions which were colored orange, red, blue and green, respectively (Fig. 8).
The   (Fig. S21) which shows the similarity to be related to high crime and property DSAMs and low mortality. It is important to note that the SOM classification is not based on correlation. Thus, the large group of more neutral indicators form part of the overall picture. The cluster primarily in the South of England is characterized by low mortality, a younger age demographic, and high property DSAMs. The remaining cluster (blue) represents most of the area of England and Wales. These are generally average for age, crime, and mortality with below expectation property DSAMs.

Conclusion
This study represents an advance in our understanding of scaling behavior while challenging the urban scaling hypothesis. It supports the general concept of scaling by making clear the problems of per capita models when applied to health outcomes. However, incommensurate scaling in different age demographics is a challenge. The scaling hypothesis considers all people as equal participants in the acceleration of life in cities. The data here shows that much of that acceleration depends on the ability of urban regions to attract young adults. Observed urban scaling is a consequence of separate scale related processes that define the behavior of specific age demographics around a critical transition in human behavior at the rural and urban boundary.
The consequences of this are great. There have now been many studies making clear that linear per capita measures are biased 1,4,7,10,11. The current study is the first to extend this to mortality from non-transmissible diseases and age demographics. Epidemiologists studying excess death need to understand the bias of per capita models. For example, the observed northern excess mortality in the UK 16 reflects mortality at low population density rather than north-south division. The north mostly falls into a single category (the blue region of Fig. 7) and this region does not have exceptional mortality for the population densities. Policy makers need to understand the limitations of linear per capita models. In terms of mortality outcomes, there are large cumulative economies of scale between the most rural and the highest density urban regions. This is a consequence of scale related changes in age groups, of scaling behavior across all population densities, and of conditions where high density areas provide protection (e.g. dementia). Within this context, health care resourcing is skewed in favor of population dense regions.
The robust rural-urban division near 25-30 p/h makes clear that ignoring rural regions is a missed opportunity for researchers studying urban systems. The existence of a rural-urban boundary justifies the study of cities while providing a clearer comparison against which claims about urban areas can be made. The lack of a clear explanation for critical densities and why they appear in such a consistent place is an important unsolved problem.
The success of DSAMs and related methodologies 2,8,10,11 makes clear that any set of scaling laws provides an incomplete picture of both rural and urban landscapes. Although they may appear to be, the residuals are not randomly distributed around the scaling law whatever the model. They are extensively correlated and reveal persistent structure and regional variation.

Materials and methods
Data sets. Data on mortality and age were provided by NOMIS (https://www.nomisweb.co.uk), a database service for labour market statistics run by the University of Durham on behalf of the UK Office of National Statistics. To anonymise the mortality data, NOMIS sets values ≤ 2 to 0 and values of 3 and 4 to 5, causing some distortion of low values and rare events. The age demographic data is model adjusted for a particular year based on the most recent census. Population, land area, crime and property information were obtained from the UK Home Office and Land Registry via UKCrimeStats (https://www.ukcrimestats.com), which provides alignment of public data sets using geographic shape files obtained from the Ordnance Survey Boundary Line dataset. Data covering the period from 2013-2017 were captured on 20/03/2019. A total of 67 indicators were obtained (Table 1).
Self-organizing maps. A self-organizing map is an iterative approach to representing high dimensional datasets in a low dimensional space 46,47 using a pre-defined array of nodes, m, arranged in a "grid-like" structure. We selected an 8 × 8 hexagonal array of nodes initialized to a random weight $w_{ij}$ in the interval [0, 1]. This array was the largest n × n array without empty nodes 48. The nodes were then updated after introducing each regional DSAMs input vector $x_1, \ldots, x_n$ at iteration $t$. The distance, $D_j$, was obtained by calculating the Euclidean distance between the input vector and weight vector for all units such that:
$$D_j = \sqrt{\sum_{i} \left(x_i(t) - w_{ij}(t)\right)^2}$$
The input vector (regional DSAMs) was assigned to the unit index $j$ that has the minimum Euclidean distance. The weight vector $w_{ij}$ is updated on the "winning" unit $j$ after each iteration such that:
$$w_{ij}(t+1) = w_{ij}(t) + \alpha(t)\left(x_i(t) - w_{ij}(t)\right)$$
where $x(t)$ is the input vector's instance at iteration $t$ (with components $x_i(t)$), $w_{ij}(t)$ is the old weight, $w_{ij}(t+1)$ is the new weight and $\alpha$ is the learning rate in the interval [0, 1], which decreases with $t$ to ensure the network converges. After the learning phase, all observations (i.e. regions) are positioned into a node within the map. If two or more observations are positioned within the same node this shows similarity.
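The winner-selection and weight-update steps described above can be sketched in a few lines. This is a deliberately simplified illustration, not the kohonen package used in the study: it updates only the winning node, as in the description, and ignores the hexagonal grid topology and any neighbourhood function. The linear decay of the learning rate, the starting value `alpha0` and all function names are assumptions.

```python
import math
import random

def best_matching_unit(x, weights):
    """Index j of the node whose weight vector has minimum Euclidean distance to x."""
    dists = [math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))
             for w in weights]
    return min(range(len(weights)), key=dists.__getitem__)

def som_step(x, weights, alpha):
    """Assign x to its winning node and move that node's weights towards x."""
    j = best_matching_unit(x, weights)
    weights[j] = [w + alpha * (xi - w) for xi, w in zip(x, weights[j])]
    return j

def train_som(data, n_nodes=64, n_iter=350, alpha0=0.5, seed=0):
    """Train 64 nodes (an 8 x 8 array, topology ignored) on the input vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights in [0, 1], as in the description.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(n_iter):
        alpha = alpha0 * (1 - t / n_iter)  # assumed linear decay of the learning rate
        for x in data:
            som_step(x, weights, alpha)
    return weights

# Single update on a two-node toy map: the input is closest to node 1.
toy_weights = [[0.0, 0.0], [1.0, 1.0]]
winner = som_step([0.8, 0.6], toy_weights, alpha=0.5)
```

After training, calling `best_matching_unit` on each regional vector gives the node to which that region is assigned; regions sharing a node are the ones the map treats as similar.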
The nodes were clustered by the standardized gap statistic 49,
$$\mathrm{Gap}_n(k) = E^*_n\left[\log(W_k)\right] - \log(W_k)$$
where $k$ is the number of clusters, $W_k$ is the pooled within-cluster sum of squares around the cluster means and $E^*_n$ denotes expectation under a sample of size $n$ from the reference distribution 49, usually a uniform distribution (i.e. a distribution with no obvious clustering). An estimate of $E^*_n[\log(W_k)]$ is obtained by simulating $B$ samples of $\log(W^*_k)$, each of size $n$, generated from a Monte Carlo sample $X^*_1, \ldots, X^*_n$ drawn from the reference distribution. In each case, $E^*_n[\log(W_k)]$ is an average of $B$ samples of $\log(W^*_k)$. Therefore, assuming the reference distribution is a uniform distribution, a large gap statistic means that the clustering structure does not resemble uniformly distributed observations. Thus, the optimal number of clusters $k$ occurs when $\mathrm{Gap}(k) \geq \mathrm{Gap}(k+1) - s_{k+1}$. Here, $s_k$ is the simulation error in $E^*_n[\log(W_k)]$.
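The selection rule for the number of clusters can be illustrated with a short sketch. It assumes the convention of the original gap statistic method, namely choosing the smallest k that satisfies the inequality and taking the simulation error as the standard deviation of the B reference values multiplied by sqrt(1 + 1/B). The function names and all numbers below are made up for illustration and are not results from this study.

```python
import math

def gap_and_error(log_wk, ref_log_wk):
    """Gap(k) and its simulation error s_k from B reference values of log(W*_k)."""
    b = len(ref_log_wk)
    mean_ref = sum(ref_log_wk) / b
    gap = mean_ref - log_wk
    sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_log_wk) / b)
    return gap, sd * math.sqrt(1 + 1 / b)

def optimal_k(gap, s):
    """Smallest k with Gap(k) >= Gap(k + 1) - s_{k+1}; falls back to the largest k."""
    ks = sorted(gap)
    for k in ks[:-1]:
        if gap[k] >= gap[k + 1] - s[k + 1]:
            return k
    return ks[-1]

# Toy gap curve for k = 1..5 (illustrative values only).
toy_gap = {1: 0.10, 2: 0.25, 3: 0.45, 4: 0.60, 5: 0.62}
toy_s = {k: 0.05 for k in toy_gap}
chosen_k = optimal_k(toy_gap, toy_s)

# Toy reference sample for a single k: observed log(W_k) = 2.0, B = 3 simulations.
toy_gap_value, toy_error = gap_and_error(2.0, [2.5, 2.7, 2.6])
```

In the toy curve the gap keeps rising by more than the simulation error until k = 4, after which the gain from a fifth cluster falls within the error, so four clusters are selected.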
Data analysis. The data were analyzed using the statistical software R version (3.6.2) 50 78 , and png (0.1-7) 79 packages. The data were log transformed and analyzed by piecewise regression. The Davies test was used to test the significance of any changes of slope with a 99% confidence level set for inclusion of a second segment. The Davies test and Akaike (AIC) and Bayesian (BIC) information criteria were used to select single and double exponential models. The residuals from the selected model were computed and used directly as DSAMs. Correlation and similarity measures were investigated including Pearson, Spearman and Kendall correlation, cosine similarity, and Jaccard distance using the proxy package, computed in a pairwise manner for all indicator metrics and regions. The Pearson correlation and uncertainties were bootstrapped using the boot package to find significant connections at 95% confidence. The obtained connections for both the indicators and the regions were used to form positive and negative networks. The networks were constructed using Gephi version (0.9.2) 80. The self-organizing maps (SOM) were constructed using the kohonen package to investigate regional characteristics. A range of clustering methods were deployed on the SOM using the package factoextra to find an optimal number of clusters. These clusters are represented in the regional maps.
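The piecewise regression and residual steps can be sketched as below. This is a simplified stand-in for the R workflow described above, not a reproduction of it: the breakpoint is chosen by a grid search over candidate values rather than estimated iteratively, and the Davies test and AIC/BIC model selection are not implemented. All function names and the synthetic data (a breakpoint at 27 p/h with slopes 1.0 and 1.46) are assumptions for illustration.

```python
import math

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def least_squares(design, y):
    """Ordinary least squares via the normal equations; returns (beta, residuals)."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(design, y)]
    return beta, resid

def fit_segmented(log_d, log_y, candidates):
    """Continuous two-segment fit in log-log space; breakpoint picked by lowest RSS.

    Each candidate needs data on both sides, otherwise the system is singular.
    Returns (rss, breakpoint, [intercept, low slope, change in slope], residuals).
    """
    best = None
    for c in candidates:
        design = [[1.0, x, max(x - c, 0.0)] for x in log_d]
        beta, resid = least_squares(design, log_y)
        rss = sum(r * r for r in resid)
        if best is None or rss < best[0]:
            best = (rss, c, beta, resid)
    return best

# Synthetic noiseless data with a change of slope at 27 people per hectare.
c_true = math.log10(27)
log_d = [-0.6 + 0.1 * i for i in range(28)]
log_y = [0.5 + 1.0 * x + 0.46 * max(x - c_true, 0.0) for x in log_d]
rss, c_hat, beta, dsams = fit_segmented(log_d, log_y, [1.0, 1.2, c_true, 1.6, 1.8])
```

The residuals returned in `dsams` play the role of the density scale adjusted metrics: on real data they would be non-zero and would be carried forward to the correlation, network and SOM analyses.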

Data availability
All data generated or analysed during this study are included in this published article (and its supplementary information files). This data was compiled from a range of publicly available sources as noted in the manuscript. These are provided as the following files: S1_data_raw.csv, S1_data_densities.csv, S1_data_cluster_means.csv, and S1_data_residuals.csv.
Code availability