A model for simulating emergent patterns of cities and roads on real-world landscapes

Emergence of cities and road networks have characterised human activity and movement over millennia. However, this anthropogenic infrastructure does not develop in isolation, but is deeply embedded in the natural landscape, which strongly influences the resultant spatial patterns. Nevertheless, the precise impact that landscape has on the location, size and connectivity of cities is a long-standing, unresolved problem. To address this issue, we incorporate high-resolution topographic maps into a Turing-like pattern forming system, in which local reinforcement rules result in co-evolving centres of population and transport networks. Using Italy as a case study, we show that the model constrained solely by topography results in an emergent spatial pattern that is consistent with Zipf’s Law and comparable to the census data. Thus, we infer the natural landscape may play a dominant role in establishing the baseline macro-scale population pattern, that is then modified by higher-level historical, socio-economic or cultural factors.

1 Calculation of transport distances across landscapes 1

.1 Topographic datasets
We used the Shuttle Radar Topography Mission (SRTM) 1 Arc-Second Global elevation dataset for the digital elevation model, and the SRTM Water Body, and Global River Classification (GloRiC) datasets for ocean, lakes and rivers. These datasets are publicly available, distributed by the United States Geological Survey (USGS) 1 and World Wildlife Fund (WWF) 2,3 . In this study, we used the region ranging from 5 • to 18.55 • longitude, to 36.5 • to 47.5 • latitude that contains Italy and the surrounding Mediterranean Sea. The SRTM datasets contain the information on the type of cell, such as land, ocean, lake, or river, and the elevation in meters for each cell type. The data were downsized to 3 arcsecond spacing by median resampling to reduce the effects of outliers and reduce the computational complexity. To supplement the river network, rivers in the GloRic dataset greater than hydrologic class 1 were included.

Least-cost path analysis
The topography of the landscape was used to determine the best route between two points, specifically avoiding mountains, lakes, rivers or oceans, where a direct path may be too difficult to follow (see Figure S1).  Figure S1. A schematic illustration of the least cost path analysis. (a) Rather than take a direct route (dashed line) over mountainous terrain, for example, we take a detour (solid line) to avoid obstacles in the landscape. (b) The transportation cost r to adjacent cells depends both on the slope s and the cell type (land, ocean, lake or river) of the origin and destination cell. The least cost path minimises the cost along the entire route.
We used standard GIS least cost path analysis 4 to calculate the route for any detour, taking into account the topography. Thus, for each cell in the topographic dataset, the transportation cost to move to its eight adjacent neighbours was evaluated. The cost depends on both the slope, and cell type of the origin and destination cells. The least cost path between any pair of cells was then calculated on the grid graph to minimize the accumulated cost along the overall length of the route, r.
In the following sections, we set out the transportation costs for traversing land, oceans, lakes, and rivers, and transfers between each cell type. Then, we define critical parameters in the equations to reflect the transport costs for two different transport epochs, specifically a historical scenario dominated by walking, and a modern scenario with motorised vehicles.

Transportation costs for traversing each cell type
First, we formulate the transportation cost for movement across land. Although a number of factors can affect the cost of movement, such as land cover, land use, and climate, the land slope s has been used in most previous studies 4 . From a range of possible slope-based cost functions, we chose a simple quadratic function, with the critical slopeŝ 5 , given by g(s;ŝ) = 1 + sŝ 2 . This function is more appropriate for wheeled vehicles than other functions 4 and the resulting least-cost paths are in good agreement with actual historical routes when combined with an additional cost component for avoiding streams 6 . The parameter s controls the sensitivity to the slope, giving a transition point at which the cost of a straight path on the sloped-hill exceeds a switchback one 5 . If the slope is zero, the cost-based distance equals the direct distance. The cost function g is also symmetric and negative slopes have the same absolute cost to positive ones. As the result, the transport distance between locations was independent of the direction travelled. In other words, the spatial interactions between locations are assumed to be based on round-trip that are accumulated over years, since the time-scale of the simulation runs over millennia, Nevertheless, it should be noted that the model proposed in this paper is not restricted to symmetric distances, but can also accommodate an asymmetric cost function, as the model is defined on a weighted, directed graph.
In addition, we introduced an absolute critical slope parameters that constrains the maximum slope permissible -if the absolute value of slope exceeds this parameter, the cost was set to infinity. As a consequence, the cost-based distance for the route r i j between a pair of adjacent land cells (i, j) is given by: where d i j is the direct distance between the cells. Next, we set out the transportation cost for ships across oceans, rivers or lakes. The empirical cost functions in these situations are not as well characterised as land transportation, so the same quadratic function was used, but with an additional multiplying factor B > 0: In practice, the 'slope' was zero for oceans and lakes, but followed the land gradient for rivers. The factor B determines the cost ratio of ship transportation to that on land. The lower the value of B, the more effective transport by ship becomes. This factor B can be different for ocean, river and lake cell types.
Next, we set out the cost function to transition between different cell types. For example, moving from a land cell to an adjacent ocean cell requires transfer to some form of marine transport. To take account of the time penalty associated with transfer, an additive cost A > 0 was included: We note that the transit penalty A is defined in units of distance, rather than time. Finally, the cost functions described above were combined to account for the movement across the four individual types of cells (land, ocean, lake, or river), and the 10 different transfer cases between cell types. These movement costs were described by the same function with different parameter sets,ŝ,s, A, and B: (Lake − Lake) (S6) = B River−River · g(s i j ;ŝ River−River ,s River−River ) · d i j (River − River) (S7) = A Land−Ocean + g(s i j ;ŝ Land−Ocean ,s Land−Ocean ) · d i j (Land − Ocean) (S8) = A Land−Lake + g(s i j ;ŝ Land−Lake ,s Land−Lake ) · d i j (Land − Lake) (S9) = A Land−River + g(s i j ;ŝ Land−River ,s Land−River ) · d i j (Land − River) (S10) = A Ocean−Lake + g(s i j ;ŝ Ocean−Lake ,s Ocean−Lake ) · d i j (Ocean − Lake) (S11) = A Ocean−River + g(s i j ;ŝ Ocean−River ,s Ocean−River ) · d i j (Ocean − River) (S12) = A Lake−River + g(s i j ;ŝ Lake−River ,s Lake−River ) · d i j (Lake − River). (S13) The parametersŝ,s, A, B and the typical distance R in the distance-based factor f (r) = r R exp(r/R) in equation (1) were adjusted to match the transportation system in the epoch under consideration.

Parameter specification for scenarios dominated by walking
We assume that movement of population and goods in ancient times was achieved predominantly through walking, or assisted by ox carts and horse-drawn carriages. Therefore, we assume that the typical distance travelled, R, was 8km by considering the distance covered during a trip of several hours using these transportation modes. The critical slopeŝ Land−Land has been estimated previously in the range of 8-12% to match known historical routes 7 . It has been also reported that the slopes of Roman roads may have reached 12% in some exceptional locations 8 . Thus, a value of 12% was used for the critical slopê s Land−Land and 18% for the absolute critical slopes Land−Land for this transport epoch.
For transport across water, the cost ratio B relative to transport across land was specified in addition toŝ ands. This parameter determined the typical distance for movement by ships as R/B. We set B to 0.2 for rivers, lakes, and oceans, which means that the typical distance scale was 40km over water in this epoch, which corresponds to a trip lasting several hours for a ship moving at 4-6 knots. The parameters between river cells,ŝ River−River ,s River−River , were assumed to be the same as that on the adjacent land, since there is little information on the slope-dependency.
For movements between different cell types, we specify a transfer penalty A. At interfaces between land and water the penalty A Land−Ocean , A Land−Lake , A Land−River was assumed to be 1km. With the low value of B described above, these values of A gave a relative advantage to transport by ship in ancient times. On the other hand, the slope-dependent cost at land-water interfaces was assumed to be larger 6 , which corresponds to dividing the critical slope by an additional factor. Here, we used a factor of 3, and the modified critical slopeŝ Land−River ,ŝ Land−Lake ,ŝ Land−Ocean becomes 4%. The absolute slopess Land−River , s Land−Lake ,s Land−Ocean were set as 18% as in the case of land transport. No additional penalties or slope-dependent costs were included for transitions between different types of water bodies, such as a move from a river cell to a ocean cell: A Ocean−Lake = A Ocean−River = A Lake−River = 0 andŝ Ocean−Lake =ŝ Ocean−River =ŝ Lake−River =s Ocean−Lake =s Ocean−River =s Lake−River = ∞.
These parameters were determined based on prior knowledge on the ancient era independently from the actual population data. Among the parameters, we focus on the typical distance R, which critically determines the spatial distribution of the population as discussed later in Section 7, and checked whether the chosen value (8km) is optimal or not for minimizing the difference between the simulation outcome and the empirical data. Using the KL distance to evaluate the difference, we calculated the optimal R in the Roman era as shown in Figure S2. The chosen value (8km) is not exactly optimal, but almost close to the minimum. KL distance from the simulation data at R=40km to the 2011 census data at Provincia level Figure S2. Optimal parameter for R in the Roman era. Starting from the initial state by the estimates of population in the Roman era, we ramped increase in R upto 40km as in Figure 5. The x-axis indicates the initial value of the parameter R. The final simulation state at R = 40km were compared with the census population in 2011 by KL distance (y-axis).

Parameter specification for scenarios with motorised vehicles
In the modern period, we assume that the major mode of land transport predominantly shifted to automotive vehicles on roads, even though other modes of transportations are also available. The distance scale R on land was set to be 40 km, to reflect a typical one-hour car trip for commuting. The maximum permissible slope for a road was commensurate with the design speed. Thus, in Italy, the maximum slope is in the range of 4-6% for urban streets, and 10% for local roads 9 , although steeper roads over these permissible limits still exist. Therefore, a value of 5% was used for the critical slopeŝ Land−Land that served as a soft limit above which the slope significantly affected the transport cost in the quadratic function (S1). The absolute critical slopē s Land−Land was set at 12%, which gives an upper limit for the permissible slope.
For transport across oceans and lakes in the modern era, B Ocean−Ocean , B Lake−Lake were both set to 0.7, which gives a distance scale for movement by ship of 57.14km, corresponding to a few hours trip at around 20 knots. At 'ports' on the shore, the additive penalties for transfer were given by A Land−Ocean = A Land−Lake = 20km, corresponding to a loss of half an hour of land transport. The critical slope parametersŝ Land−Lake ,ŝ Land−Ocean ,s Land−Lake ,s Land−Ocean were set to be the same value as the land layer described above.
We assume that transport across rivers is via bridges and does not involve transfer between vehicles and ships. Thus, the cost ratio B River−River was set to 1, and no transit penalty was applied (A Land−River = 0). The critical slopesŝ River−River ,s River−River were the same as those on land. However, in a limited number of cases where a path went from land to ocean or lake via a river, transport may be carried by ships. Thus, a transfer penalty for a ship between a river cell and a ocean/lake cell was included, such that A River−Ocean = A River−Lake = 20km. The other parameters at the interface between different types of water bodies were as follows: A Ocean−Lake = 0 andŝ Ocean−Lake =ŝ Ocean−River =ŝ Lake−River =s Ocean−Lake =s Ocean−River =s Lake−River = ∞.

Defining four scenarios for the underlying space: Flatland, Coastline, Elevation, and Water Bodies
We considered four scenarios for the underlying space in the model: Flatland, Coastline, Elevation, and Water Bodies. Starting from an idealised isotropic or homogeneous condition, the additional landscape information was integrated into the underlying space in a step-by-step manner. For each scenario, the distances were calculated between each pair of cells by the least-cost path analysis.
The 'Flatland' scenario comprised an idealized isotropic space, where all cells were classed as land and had an identical elevation. To remove the boundary effect of running the simulation in a rectangular space, periodic boundary conditions were used. Furthermore, to follow the traditional assumption of a regular idealized space, this idealized space was approximated as an euclidean space, whilst latitude and longitude co-ordinates are usually defined on an ellipsoid. In the least-cost path analysis, all tiles of 1 × 1 arcsecond in the dataset were treated as euclidean rectangles, all with the same size defined at the centre tile of the target region.
In the 'Coastline' scenario, ocean cells were included while other cells were land. All land cells maintained the same elevation and no transport across oceans was permitted by setting an infinite penalty for A Land−Ocean .
In the 'Elevation' scenario, the topographic dataset was used without modification, and the slope-based cost for land transportation was included. However, transport on rivers, lakes, and oceans was still prohibited by including an infinite penalty at the transition between these cell types.
In the 'Water Bodies' scenario, transport over rivers, lakes, and oceans were integrated by setting appropriate values of the least-cost path parameters described above. In this scenario, water bodies were no longer barriers, but could act as transport routes between locations if the costs were lower than alternative routes on land.

Census in 2011
The census in 2011 was taken from the GEOSTAT 2011 grid dataset, distributed from Eurostat, EFGS 10 . In Fig. 3, this census data was aggregated by Italian administrative units (Regione and Provincia). The boundary information of these administrative units in 2011 was obtained from the Italian National Institute of Statistics 11 .

Census in 1911
The Provincia population in 1911 ( Fig. 6) was taken from a data repository 12 , originally from the Almanach de Gotha 13 . The boundary information of Provincia in 1911 was obtained from the Italian National Institute of Statistics 11 .

Estimated city population sizes in the ancient Roman period (164CE)
The population of the ancient Roman cities shown in Fig. 5a were obtained from the open-access database of the Oxford Roman Economy project 14,15 . The cities that existed at 164CE in the target region were selected from the database. The populations of the cities were estimated using a scaling law between city area and population, as proposed by Hanson and Ortman 16 . The sum of city populations over the cities shown in Fig. 5 was 2,800,832, which is less than the estimated population in the entire area of about 12 million 17 . The urbanisation rate rarely exceeded 20 percent of total population at this time 18 , thus the difference between these numbers represents the rural population outside the cities in total. The spatial distribution of the rural population is less clear because of methodological difficulties, but it is probably higher close to urban sites in highly urbanized areas, whereas it may be distributed further away from urban sites in low-populated regions 19 . We therefore modelled the spatial distribution of the rural population by a biased distribution to the locations close to populated cities, which is proportional to a distance-discounted score s(i) at each location i: where j u denotes the location of a city u and r i j u is the least-cost distance specified for walking with R = 8km.

Estimated population distribution during 1300 -1861
During 1300-1861, the populations in Italian cities with greater than 5000 residents were taken from the Italian Urban Population database 20,21 . The locations of the cities were determined by matching their names with the records in the GEONAMES database 22 . The rural population in total outside the cities was calculated from the urbanisation ratio during the ages 20, 23 , and its spatial distribution is determined in the same way to that in 164CE-a distribution proportional to the distance-discounted score to populated cities, given by the equation (S14) with R = 8km. In Fig. 6, the population distributions during 1300-1861 were aggregated using the Provincia boundaries in 1911 following re-unification. This is because (i) a common agglomeration is required to compare the value of KL distance across the middle

5/19
ages. (ii) High-resolution shape files for Provincia in 1911 are available in digital format from the Italian National Institute of Statistics 11 . (iii) Provincia in 1911 covers the whole Italian Peninsula, including Roma, Verona, and Vicenza that were excluded from the Provincia dataset in 1861.

GPS-tracked traffic data
The traffic data ( Figure S3) is publicly available from OpenStreetMap 24 . We used a set of GPS tracks in Italy, collected upto 2013. A track is an ordered list of GPS points describing a path. We excluded uncorrected data which only comprised a single point, and selected the tracks which had a minimum 1km trip from a location in Italy. The resultant dataset has 63999 tracks with 90600639 points. The number of these points were counted in cells 45 arcsecond × 30 arcsecond (about 1km), to give the traffic data describing the number of people passing through each cell. It is noted that anonymous contributors who uploaded the GPS tracks may not necessarily be representative samples from Italy as a whole. Thus, to correct the sample, counts were weighted by the populations in 2011 within the provinces where the tracks originated.

The Landscape-Transport-Population reinforcement model
In this section, we present the technical details of the system and numerical simulations.
As described in the main text, the Landscape-Transport-Population (LTP) reinforcement model is defined by equations (1) and (2) as follows:

6/19
where f (r) = r R exp(r/R). N is the number of locations (nodes) and the indices i and j range from 1 to N. N i is the set of locations that connect to location i. The popularity score x i (t) and the transport connectivity w i j (t) are the variables of this dynamical system. The parameters ε, d were fixed for all simulations: ε = 0.02, d = 0.81. As described in Section 7, the homogeneous solution was destabilized at d c = 0.8 without any dependence on ε. Values of d c just above the critical point were selected. R and r i j were determined by the topography under the given scenario and epoch described in Section 1.

A reaction-diffusion system on an adaptive network
The dynamical system described by equations (1) and (2) is derived from diffusion dynamics on an adaptive network. Diffusion is a fundamental process associated with many physical and social phenomena operating on real-world networks, and it is considered as a paradigm for dynamical processes on networks. The state of each node at time t is represented by the current fraction of the diffusive population at the node, x i (t). The network topology with N nodes is given by the adjacency matrix a i j , whilst the link weight from j-th to i-th node for each existing link is denoted by w i j (t). We assume that the population diffuses over the network through the movement of many random agents. Thus the diffusion process is described by: where d is a diffusion constant that controls the strength of diffusion over the whole system. The second and third terms are the inward and outward flows of the population, respectively. The first term F(x) describes a reaction process at the node, which represents the intrinsic dynamics of the population except for the diffusion process. We here assume a simple dissipation process that converges to a capacity 1/N, given by where κ is a decay constant. Thus the total popularity score, ∑ i x i (t), always converges to 1, according to the equation, . This diffusion dynamics can be rewritten as: , and I I I is the identity matrix. If κ + d = 1, then this equation gives the well-known map iteration (2) that is used to evaluate the PageRank centrality 25 . The dispersion force in equation (2) is given by the dissipation process F(x). In the extreme situation with d = 0, in which all locations are isolated, the population dynamics are governed by ∆x i = F(x i ), and the population merely converges to the constant capacity 1/N. However, if d > 0, the population diffuses over the network whose connections are simultaneously reinforced by the local population size. If d > d c , populated places spontaneously emerge on the network as shown in Section 7.
A possible extension of the model is to include space-varying capacity C i , instead of the constant capacity 1/N over all locations as described above. This means that the local dynamics at each node described by equation (S16) can be rewritten as: (S17) The equation (1), has a global stable solution under fixed popularity x i , This form is relevant to standard spatial interaction models, such as the "gravity model" and "distance-discounted attractivity", which describes the spatial interactions between two locations by terms of their population and a "deterrence function" that scales with distance r i j . Among the variations of the gravity model, we choose one of the simplest types to remove any additional parameters except R, even though models with more parameters generally give a better match to a given set of data. We combined the form x i x j r i j in the in the early classical papers 26,27 , with the additional exponential factor of exp(−d i j /R), which can represent a critical distance R caused by some physical constraints. It is noted that w i j is scale invariant and only the relative ratio among w i j is important in equation (2). The equation (1) implies that the connections w i j between locations are progressively updated to converge the equilibrium (1'), rather than the connections are instantly built at any moment.
The equations (1) and (2) are well-known, quite simple by themselves: if uncoupled, they have a single global stable solution, and the dynamical behaviour is easily understood. However, if the equations are combined, the co-evolution of the variables x i (t) and w i j (t) leads to rich phenomena of pattern formation with multi-stability.

Coarse-graining: The spatial unit of the system
The least-cost distance was evaluated on the high-resolution topographic dataset. However, this spatial resolution is too high to perform numerical simulations. Thus, the LTP model was run on coarse-grained cells of 270 (longitude) × 180 (latitude) arc-second (about 6.25km × 5.57km). This gave a total of 39,600 coarse-grained cells in the target region. Thus, each cell had 5,400 location points from the original topographic dataset. If the majority of these points were land (above 50%) the identity of the coarse-grained cell was set to be land. Otherwise, it was assumed to be water. This gave a total of 18,702 land cells in the target region, where populations could be assigned. Therefore, the total number of locations N in the simulation was 18,702, and indices i and j in equations (1) and (2) run over these locations.
The transport distance r i j between coarse-grained cells i and j was calculated by the least-cost distance between representative points located near the centre of the cell, avoiding water bodies and steep slopes that might increase unnecessary local transportation costs. To be precise, the land point chosen had the minimal slope within the central 1/4 × 1/4 of the cell. Figure  S4 shows the representative points superimposed on the topographic map.

Using cutoff distances to ensure non-negative connectivity weights
For an extremely large transport distance r i j , equation (1) potentially leads to negative w i j (t) which is inconsistent with the meaning of the variable. Non-negative values of w i j (t) can be guaranteed by the condition 1 − ε f (r i j ) g 0 for positive ε, x i (t), and x j (t) with a non-negative initial value w i j (0).
This condition leads to a cutoff distance r max i j = 1/ f (1/ε). Thus, a set of locations N i are connected to location i if the distances to them are below the cutoff. The transport connectivity w i j (t) is defined within the cutoff distance. For any pair of (i, j) beyond the cutoff distance, w i j (t) is undefined, and the locations are assumed to be disconnected. It is noted that because of the exponential term in f (r), the connections beyond the cut-off distance are extremely small (ε order), and negligible. The cutoff distance can be interpreted as the upper limit above which direct transportation is too expensive to maintain, and some stopovers are needed. Owing to the cutoff distance, some cells have no connections to any other cells, and were excluded from the simulations.
It should be noted that all locations interact via the adaptive network even with the cutoff distance. For example, a population increase in Naples induces not only population changes in neighboring places, but also the changes in the connections around it. This induces further re-organization of the adjacent population and traffic flows, which continue to propagate to other locations beyond the cutoff distance.

Initial conditions
The initial condition of the dynamical system, {x i (0), w i j (0)} for unconstrained scenarios, is given by, where Z is a normalized constant to ensure that ∑ x i (0) = 1, and ξ x i and ξ w i j are small fluctuations taken from uniform distributions in [−0.01, 0.01] and (0, 0.01), respectively. These small fluctuations are added to avoid the artefactual situation in numerical simulations where the output would otherwise remain in the homogeneous state, and is a standard approach used in studies of pattern formation.
In contrast, the initial condition in the 'historical' scenario ( Fig. 5) is unique. In this condition, x i (0) is given by the estimates of the population in 164CE, normalized so that the sum of them equals 1. w i j (0) is given by the above equation.

Numerical simulations
The least-cost path analysis and numerical simulations of the equations (1) and (2) were performed using parallel processing with 36 processors.

Stationary state
By updating the system variables x i (t) and w i j (t) using equations (1) and (2), the dynamical system converged to a fixed state {x * i , w * i j } in all scenarios and all initial conditions that were tested. The convergence was confirmed by observing the following measure at regular intervals, τ: 1 The dynamical system given by equations (1) and (2) exhibits multistability. Depending on the initial condition, the system converges to different fixed points.

Population estimates from the popularity score x i
After obtaining the stationary state {x * i , w * i j } for equations (1) and (2), the population X i at a location i was calculated from the popularity score x * i output by the model. The score x * i is the probability of random walker being present, with a total summation normalized to 1. To obtain the actual population, the probabilities were rescaled by the total population X total obtained from the census. However, the dispersion term in equation (1), ensured that there was always a tiny positive probability for the presence of a random walker even at tops of mountains in the Alps. Therefore, the minimal score x min was used as the reference for zero population, and the remaining scores scaled so that their summation equalled X total : . (S20) In Fig. 2, the total population X total is given by the summation of the census populations in the depicted region. We note that the census does not cover all countries present in the region, such as North Africa or former Yugoslavia, thus the total population X total is underestimated. In Figs. 3 and 5, where the simulation results are compared to the aggregated populations by administrative units in Italy, X total is the total population of these administrative units in the census.

Traffic estimates from the transport connectivity w i j
The traffic between locations was calculated from the transport connectivity w * i j output by the model. In equation (2), the flow from location j to location i per step is given by Using the population X i and X j described in (S20), the bidirectional traffic W i j between the locations is given by, The traffic W i j is then assigned to the least-cost path between i and j on the topographic dataset, corresponding to the transport distance r i j . Using a dummy variable δ i j (x) that returns 1 only if a point x of the topographic dataset is on the path between i and j and otherwise 0, the net trafficŴ (x) passing through each point x was calculated as:

A counterfactual model of geographical determinism
In the main text, a counterfactual model, named Geographical Determinism was introduced, whereby the pattern of population was determined directly by the external topographic information, without the internal reinforcement dynamics between populated places and their transport connectivities. We here describe the detailed structure of the model and clarify the differences from the LTP reinforcement model. In the geographical determinism model, instead of equation (1), the evolution of transport connectivity w i j is described by, where the popularity scores at the endpoint locations do not influence the evolution of the weight. This equation has an equilibrium w * i j , This result means that the transport connectivity w * i j is given by the inverse of distance-based cost f (r i j ). Using this inverse cost w * i j , we obtain the equilibrium of the popularity score x * i from equation (2).

10/19 5 Aggregate traffic at the Provincia level
The aggregate traffic at the Provincia level are shown in Figure S5. The figure depicts quantitative comparison between the traffic dataset for Italy and the LTP simulation results for different landscape scenarios. Error bars depict the standard deviation (n=64) with different initial conditions. Inclusion of more landscape features progressively reduces the variance at each location, stabilises the output, and improves the match between histogram profiles from simulation and traffic data.  6   3   0   3   6   TO  VC  NO  CN  AT  AL  AO  IM  SV  GE  SP  VA  CO  SO  MI  BG  BS  PV  CR  MN  BZ  TN  VR  VI  BL  TV  VE  PD  RO  UD  GO  TS  PC  PR  RE  MO  BO  FE  RA  FC  PU  AN  MC  AP  MS  LU  PT  FI  LI  PI  AR  SI  GR  PG  TR  VT  RI  RM  LT  FR  CE  BN  NA  AV  SA  AQ  TE  PE  CH  CB  FG  BA  TA  BR  LE  PZ  MT  CS  CZ  RC  TP  PA  ME  AG  CL  EN  CT  RG  SR  PN  IS  BI  LC  LO  RN  PO  KR  VV  VB  MB

Validation by other metrics
In the main text, we evaluate the difference between the simulation result and the empirical data (Figure 3 g,h), and found that the KL distance progressively decreases as the landscape factors are included.
In this section, we validate the improvement by other metrics, namely Pearson's correlation and cosine similarity. These metrics are given by, Cosine similarity = ∑ i P(i)Q(i) where P, Q denote the spatial distributions in empirical data (P) and simulation (Q), respectively. The index i of the distributions (P, Q) indicates the individual Provincia.P,Q are the means of the distributions. Figure S6 shows these metrics for the population distribution for Provincia. For the metrics, the inclusion of the landscape factors improves the metrics and reduces the variations among the different initial states. Figure S7 shows the same metrics, but for the traffic distribution, which exhibits a similar improvement to that for the population distribution.

Linear stability analysis for the homogeneous solution to the model
In this Section, we provide a linear stability analysis for the homogeneous solution of the model, to reveal the criteria under which populated places spontaneously emerge. Figure S8 shows time-sliced images to illustrate the dynamical behavior of the model in the Flatland scenario. Populated places spontaneously emerge as a result of the reinforcement dynamics starting from an almost homogeneous initial distribution, even if there is no topographic inhomogeneity. As shown below, this behaviour is driven by destabilization of the homogeneous solution against a finite wavelength of perturbation, which is a general mechanism of pattern formation.
It should be noted that the model is defined as a network-based system with reinforced connections, not as a classical activator-inhibitor system diffusing in free space. Thus, unlike traditional pattern formation models, the difference in diffusion constants is not the key parameter leading to destabilization in the model. Here we clarify the conditions leading to destabilisation, and discuss the contribution made by the model parameters in this transition.

Homogeneous solution
Let us consider the homogeneous solution of equations (1) and (2), .

Linearized system
First, we derive the linearized equation for δ x i (t) = x i (t) − x * i , δ w i j (t) = w i j (t) − w * i j . Substituting equations (S24) and (S25) into equation (2), we obtain where and C j , and δ i j denotes the Kronecker delta function. It is easy to verify that the linearized equation for (1) is , which means that ∑ j δ x j (t) decays in time with the constant rate 0 < d < 1. Therefore, we can use the initial condition ∑ j δ x j (0) = 0 without loss of generality.

Eigenvalue equation for the linearized equation of the homogeneous solution
In the literature on pattern formation, linear stability of the homogeneous solution is usually estimated by analyzing the linear stability of all spatial modes. For systems having spatial translational symmetry, this spatial mode coincides with the spatial Fourier component. However, it is not trivial in the LTP model to know whether we can confirm stability using spatial modes, because some components of the eigenvectors in the Jacobian are labelled by two locations, i.e., w i j has two suffixes i and j.
In order to clarify and solve this problem, let us introduce an N(N + 1)-dimensional vector v v v = (δ x 1 , δ x 2 , · · · , δ x N , δ w 11 , δ w 12 , · · · , δ w 1N , δ w 21 , · · · , δ w 2N , δ w 31 , · · · , δ w NN ) T , namely, where +x, gives the largest integer not larger than x. Then equations (S26) and (S29) can be summarized by where T T T is the Jacobian matrix with the size N(N + 1) × N(N + 1). If T T T is diagonalizable, v v v(t) can be written by the superposition of the eigenvectors of T T T , as where φ φ φ a = (y i j . The corresponding eigenvalue is Λ a , and c a is a constant determined by the initial conditions.

14/19
In order to represent the eigenvectors using quantities only allocated to a single site, let us eliminate z (a) i j from the linearized equations, and rewrite them with y (a) i only. Since φ φ φ a is an eigenvector, its component y i and z i j terms must satisfy the linearized equation (S29) by replacing δ x i and δ w i j with y i and z i j . Recalling equation (S33), we have which leads us to The above equation allows us to convert z i . In this way, we can investigate the linear stability of this system on a spatial coordinate system. Finally, substituting equation (S36) into equation (S26), we can rewrite the eigenvalue equation for Λ in terms of y y y as where .
(S38) In this model, from equation (S29), the transition probability has an equilibrium δ w i j (t) = δ x i (t) + δ x j (t) N f (r i j ) in the linear approximation. For ε → ∞, this equilibrium is achieved immediately when x i changes. This can be obtained from equation (S36) by taking the limit ε → ∞. This result is reasonable, because ε determines the relaxation time scale of w. In other words, the terms Λ a − 1 in the denominator of equation (S36) represent the effect of slow relaxation of w.

LTP reinforcement model without topographic inhomogeneity
Let us consider a specific case, where N sites align on a one-dimensional ring. We further assume no topographic inhomogeneity, i.e. a flat landscape. This means we assume translational symmetry of the system, C j = C holds for all j, and J i j and K i jk reduce to The distance r i j between points i and j (i f j) is given by r i j = min( j − i, N + i − j). Let us discuss the linear stability of specific cases to see how instability of the homogeneous state takes place.

Nearest neighbor coupling case
For the simplest case, let us assume that only the nearest neighbors are coupled, i.e., f (r i j ) = 1 for j = i ± 1 and f (r i j ) = ∞ otherwise. In this case, 1 f (r i j ) hold. Therefore, we

15/19
have It is noted that H i j does not depend on R, owing to the assumption of nearest-neighbor coupling. The matrix H is circulant and symmetric. Hence, its eigenvalues are real and can be represented by using the matrix elements as where κ = 0, 1 N , 2 N , · · · , N−1 N . The associated eigenvectors are and its complex conjugate, where i = √ −1.
Here Nκ gives the wave number of the eigenvector. In the continuum limit N → ∞, κ can take 0 f κ < 1. Because of the symmetry, we focus on the range 0 f κ < 0.5.
Although the eigenvalue Λ κ can be explicitly obtained by solving the quadratic equation (S43) with respect to Λ κ , we can estimate the instability of the homogeneous state more easily using the fact that the eigenvalues are real. Thus the instability occurs by crossing either Λ κ = 1 or Λ κ = −1. Let us concentrate on the former case. Let Λ κ = 1, Eq. (S43) can be rewritten as It can be easily verified from the fact that the peak of F (κ) is 5/4 for κ = 1/6. Therefore, the mode with κ = 1/6 becomes unstable at d = 4/5 for the first time when d is increased.

Next nearest neighbor coupling
To discuss the effect of parameter R, which was removed by the assumption of nearest-neighbor coupling in the previous section, we next consider the case that each node is coupled with both the nearest neighbors and the next nearest neighbors. The coupling is given as f (r i j ) = 1 R exp( 1 R ) for j = i ± 1, 2 R exp( 2 R ) for j = i ± 2, and f (r i j ) = ∞ otherwise. Let us focus on calculating the bifurcation point where the homogeneous state loses linear stability. As in the nearest neighbor coupling case, elements of the H matrix for Λ = 1, where the instability occurs, are obtained as Note that this matrix is identical to the one for the nearest neighbor case for the limit R → 0. Finally, at the critical point with eigenvalue Λ = 1, the relationship between the wave number κ of the perturbation in Fourier mode, and the parameters d and R are given as follows: − 8e − 1 R cos (6πκ) − 2e − 2 R cos (8πκ) =: F (κ; R, d). (S48)

Role of model parameters
The model without topographic inhomogeneity has only three parameters: the typical distance scale R, the diffusion constant d, and the time-scale ratio between the two dynamical processes ε. Using the result of the linear stability analysis given by equation (S48), we clarify the role that these parameters play in destabilization of the homogeneous solution. First, the parameter ε does not appear in equation (S48). Thus, ε is not involved in this type of destabilization. Second, the diffusion constant d controls the destabilization that triggers spontaneous emergence of populated places. Figure S9a shows the stable and unstable area in (R, d) parameter space, evaluated from equation (S48). This analytical result is numerically confirmed in Figure S9b, in which we plot the value, N · [max i x i (t) − min i x i (t)] at the stationary state. This value should be zero if the homogeneous solution is stable. The boundary between stable and unstable regions depends only on the diffusion constant d, and not on R. By increasing d, F (κ; R, d) increases and hits the line of F (κ; R, d) = 1 at the critical point d c = 0.8, as shown in Figure S10a. Figure S10b,c show the behaviour of the model just below or above the critical point, supporting the analytical prediction.
Third, the typical distance scale, R, controls the wavelength of the destabilized Fourier mode at the critical point d c , but R does not itself trigger destabilization as shown above. At the critical point d c , the eigenvalue Λ = 1 at a finite wave number κ c of the perturbation in Fourier mode. This means that the unstable mode κ c starts to separate from the homogeneous solution, and leads to the emergence of populated places. Thus, κ c determines the number of populated places in the finite space (N locations). In other words, it controls the interval distance between the populated places. By increasing R, the unstable mode κ c decreases, as shown in Figure S11a. This corresponds to fewer populated places with longer distances between them. The numerical simulations support this analytical prediction ( Figure S11b).
Thus the parameters of the proposed LTP model have different effects on the pattern formation of populated places that quite distinct and intuitively understandable: (1) The time-scale ratio ε is unrelated to the destabilization; (2) The diffusion constant d determines the destabilization condition d c = 0.8; (3) The typical distance scale R determines the spacing between populated places at the destabilization point.