Urbanity and the dynamics of language shift in Galicia

Sociolinguistic phenomena often involve interactions across different scales and result in social and linguistic changes that can be tracked over time. Here, we focus on the dynamics of language shift in Galicia, a bilingual community in northwest Spain. Using historical data on Galician and Spanish speakers, we show that the rate at which shift dynamics unfold correlates inversely with the internal complexity of a region (approximated by the proportion of urban area). Less complex areas converge faster to steady states, while more complex ones sustain transitory dynamics longer. We further explore the contextual relevance of each region within the network of regions that constitute Galicia. The network is observed to sustain or reverse the dynamic rates. This model can introduce a competition between the internal complexity of a region and its contextual relevance in the network. Harnessing these sociodynamic features may prove useful in policy making to limit conflicts.


1.-Quantitative information for all the regions.
In Supplementary Table 1 we present all the relevant quantitative data for each of the regions considered in the analysis of the main text, plus the averages over the four groups (A to D). The parameter c was obtained by fitting the experimental data as described in the main text. The percentage of urban population for each region (u) was calculated directly from data available in [1,2]. The values of K were estimated for a situation similar to Galicia. The growth factors were calculated as described in the Methods section considering all previous values. Additional details are in the caption.
Supplementary Table 1: Quantitative information for all the regions considered in the analysis presented in the main text [2]. Data are grouped according to the classification introduced in the main text, given explicitly in the first column. The second column is a reference number and the third column gives the name of each region. The 4th and 5th columns give the values of c obtained from the experimental fits and their standard deviation. The 6th column is the average of c for each group of regions (<c>), with the corresponding error in the 7th column. The 8th column gives the fraction of urban population (u), calculated as the total urban population in the region over the total population in that region, and the 9th column gives the averages for each group. The 10th column contains the K values used for the simulation of the Galician case, and the 11th column the average of the K parameters for each region. The 12th column lists the growth factors calculated for each region and the 13th column the corresponding averages for each group. The values of the c parameter are fits of the original data in [2].

2.-Definition of Singular Population Entities.
The definition of singular population entity according to the Spanish Instituto Nacional de Estadística (INE, Spanish Statistical Office) is "any inhabitable area of the municipality, inhabited or exceptionally uninhabited, clearly differentiated within it, and which is known by a specific denomination that identifies it without the possibility of confusion".
The threshold of 5000 inhabitants was chosen in this manuscript to discriminate between rural and urban settlements on the basis of existing regulation: for locations with 5000 inhabitants or more, new regulations regarding construction and council representation apply. Also, direct inspection of the raw data shows a discontinuity in the series at precisely 5000 inhabitants (most likely due to the current legislation).
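As an illustration of how the urban fraction u follows from this threshold, the short Python sketch below sums the population of all singular population entities with 5000 inhabitants or more and divides by the total population of the region. The function name and the example populations are hypothetical and are not taken from the data in [1,2].

```python
URBAN_THRESHOLD = 5000  # inhabitants; threshold stated in the text


def urban_fraction(entity_populations, threshold=URBAN_THRESHOLD):
    """Fraction of a region's population living in entities at or above the threshold.

    `entity_populations` holds the number of inhabitants of each singular
    population entity in the region (hypothetical input format).
    """
    total = sum(entity_populations)
    if total == 0:
        raise ValueError("region has no inhabitants")
    # Entities with exactly 5000 inhabitants count as urban ("5000 or more").
    urban = sum(p for p in entity_populations if p >= threshold)
    return urban / total


# Made-up region with five entities (illustrative numbers only).
example_region = [12000, 5000, 4999, 800, 201]
u = urban_fraction(example_region)
```

In this made-up region, 17000 of the 23000 inhabitants live in entities at or above the threshold, so u is 17000/23000.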

3.-On the normalization of the variables (total population)
The model without the network term (Eqs. 1 in the main text) was analyzed theoretically, and it was demonstrated that the total population remains equal to one [3]. We did not perform a similar analysis for the model that includes the network term (Eqs. 2 in the main text), as it is beyond the scope of this manuscript, but simple simulations for multiple sets of parameters show that the total population in each node remains equal to 1. In Supplementary Fig. 1a we show the evolution of the three variables for an arbitrary region (node of the network). The total population is also plotted, and its value is always equal to 1, as expected. Supplementary Fig. 1b shows the final values of the three variables at the stationary state for all the nodes/regions in the network. Note that the total population is always equal to one.
The remaining subplots in Supplementary Fig. 1 show the same type of results for completely different values of the parameters. In all the cases considered, the total population is always equal to 1.

4.-Effect of the structure of Ki
All the results shown throughout the manuscript consider the simplest coupling between the nodes that accounts for the amount of urban population in each node. Here we analyze other, more complex, situations in order to provide evidence for the generality of the results.
Throughout the main manuscript, we analyze the temporal evolution of the speakers. As shown in the main text, the network term is given by Eq. (1). In the paper, the results were evaluated choosing the network weight factor as = (1 + ) and the adjacency matrix elements as = 1/ . The initial condition of the two independent variables is set to (0.3, 0.3) with a random perturbation of up to 10%. Note that the total population is maintained equal to 1 by fixing the third variable as 1 minus the sum of the other two. The equations were integrated using Euler's method with a time step Δt = 10⁻⁴. The results are discussed throughout the main text.
a) Symmetric coupling. Another parameterization of the adjacency matrix is possible. In this case, as the network weight factor is still the same, the coupling coefficient becomes symmetric. It corresponds to a situation where a region feels each of the other nodes of the network not only because of its own amount of urban population but also because of the amount of urban population in the other nodes. This means that a very rural region will not feel the network at all, while urban nodes will feel only those other nodes that are mostly urban.
The initial conditions and solving method used are as in the case discussed in the main text.
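The numerical setup described in this section (two initial fractions near 0.3 with a random perturbation of up to 10%, the third fixed by normalization, and Euler's method with a time step of 10⁻⁴) can be sketched in Python as follows. This is only a minimal sketch: the right-hand side `placeholder_rhs` is a hypothetical linear exchange between the three groups and is not the model of the main text, the perturbation is assumed to be relative and signed, and all function names and rate constants are illustrative. Its only purpose is to show that an Euler step preserves the total population whenever the three derivatives sum to zero.

```python
import random

DT = 1e-4  # Euler time step quoted in the text


def initial_state(rng, base=0.3, max_rel_perturbation=0.10):
    """Two fractions near `base`, each perturbed by up to 10% (assumed relative
    and signed); the third fraction closes the total to 1."""
    first = base * (1.0 + rng.uniform(-max_rel_perturbation, max_rel_perturbation))
    second = base * (1.0 + rng.uniform(-max_rel_perturbation, max_rel_perturbation))
    return [first, second, 1.0 - first - second]


def placeholder_rhs(state):
    """Hypothetical linear exchange between groups (NOT the equations of the paper).

    The third derivative is minus the sum of the other two, so the total
    population has zero time derivative by construction.
    """
    first, second, third = state
    gain_first = 0.8 * third - 0.5 * first
    gain_second = 0.6 * third - 0.4 * second
    return [gain_first, gain_second, -(gain_first + gain_second)]


def euler_integrate(state, rhs, dt=DT, n_steps=20000):
    """Explicit Euler integration: state <- state + dt * rhs(state)."""
    for _ in range(n_steps):
        derivatives = rhs(state)
        state = [s + dt * d for s, d in zip(state, derivatives)]
    return state


rng = random.Random(0)  # fixed seed so the sketch is reproducible
start = initial_state(rng)
end = euler_integrate(start, placeholder_rhs)
total_drift = abs(sum(end) - 1.0)  # deviation of the total population from 1
```

With any right-hand side whose components sum to zero, `total_drift` stays at the level of floating-point round-off, which is the kind of check reported in the supplementary figures.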
The results are in Supplementary Fig. 2. The plots on the left show the temporal evolution of the three variables of a single node of the network as well as the total population (which remains equal to 1, as expected). The plots in the right column show the stationary state for each of the 20 regions/nodes in the network with this configuration. Each row corresponds to a different set of the coupling parameters. Note that the total population still remains normalized to unity. For this particular type of coupling, we did not find the reverse situation. This can be understood by analyzing the coupling coefficients: the coupling factor strongly connects urban regions and dismisses connections between rural zones. Information is more accessible for urban zones.
b) Non-symmetric coupling coefficient weighted by the total urban population in each node. Another option for the adjacency matrix is built from uj, the percentage of urban population in node j, and pj, the absolute population of node j; thus (uj · pj) is the total urban population in node j. In this case, the contribution of each node is weighted by its total urban population, in such a way that regions with low population play a less significant role than those with large populations (which in our case usually correspond to predominantly Spanish-speaking regions).
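The qualitative difference between the two alternative couplings can be illustrated with a small Python sketch. It assumes that the symmetric weights are proportional to the product of the urban fractions of the two nodes and that the population-weighted ones are proportional to the total urban population of the neighbouring node; the absence of any normalization, the zero diagonal, the function names, and the example values are all assumptions made for illustration only.

```python
def symmetric_weights(u):
    """Assumed symmetric coupling: weight between i and j proportional to u[i] * u[j]."""
    n = len(u)
    return [[u[i] * u[j] if i != j else 0.0 for j in range(n)] for i in range(n)]


def urban_population_weights(u, p):
    """Assumed non-symmetric coupling: node i feels node j in proportion to the
    total urban population of j, u[j] * p[j]."""
    n = len(u)
    return [[u[j] * p[j] if i != j else 0.0 for j in range(n)] for i in range(n)]


# Three made-up nodes: fully rural, mostly rural, mostly urban.
u = [0.0, 0.2, 0.9]            # urban fraction of each node (illustrative)
p = [10_000, 50_000, 300_000]  # total population of each node (illustrative)

sym = symmetric_weights(u)
wtd = urban_population_weights(u, p)
```

Under these assumptions, the fully rural node has an all-zero row in `sym`, so it does not feel the network at all, whereas in `wtd` it still feels the large urban node while contributing nothing to the others.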
The plots on the right of Supplementary Fig. 4 show the temporal evolution of the three variables of a single node of the network as well as the total population (which remains equal to one, as expected).