The role of geography in the complex diffusion of innovations

The urban–rural divide is increasing in modern societies, calling for geographical extensions of social influence modelling. Improved understanding of innovation diffusion across locations and through social connections can provide new insights into the spread of information, technological progress and economic development. In this work, we analyze the spatial adoption dynamics of iWiW, an Online Social Network (OSN) in Hungary, and uncover empirical features of spatial adoption in social networks. During its entire life cycle from 2002 to 2012, iWiW reached up to 300 million friendship ties among 3 million users. We find that the number of adopters as a function of town population follows a scaling law that reveals strongly concentrated early adoption in large towns and less concentrated late adoption. We also discover a strengthening distance decay of spread over the life cycle, indicating a high fraction of distant diffusion in early stages but the dominance of local diffusion in late stages. The spreading process is modelled within the Bass diffusion framework, which enables us to compare the differential equation version of the model with an agent-based version run on the empirical network. Although both model versions can capture the macro trend of adoption, they have limited capacity to describe the observed trends of urban scaling and distance decay. We find, however, that incorporating adoption thresholds, defined by the fraction of social connections that adopt a technology before the individual adopts, improves the network model's fit to the urban scaling of early adopters. Controlling for the threshold distribution enables us to eliminate the bias induced by local network structure on predicting local adoption peaks. Finally, we show that geographical features such as distance from the innovation origin and town size influence the prediction of adoption peaks at local scales in all model specifications.


Supporting Information 1: Spatial diffusion and churn over the product life-cycle
Video on spatial diffusion and churn of iWiW. Nodes denote towns and links represent invitations sent across towns between 2002 and 2012 on a monthly basis. Node size shows the number of users who had registered in the town by the given month, and node color shows the share of those registered users who still log in. Adoption started in Budapest (the capital) and spread first to its surroundings and to other major regional subcenters. The vast majority of invitations were sent from Budapest in the initial phase of diffusion, and subcenters began to drive the spread when diffusion sped up in the middle of the life cycle. A large fraction of users continued to log in to the website even after Facebook entered the country in 2008. Collective churn started in 2010, and the rate of active users dropped quickly in most towns. The exceptions are small villages in the countryside, where people have difficulty adopting new waves of social media innovation.
For the video on spatial diffusion and churn, go to https://vimeo.com/251494015

Supporting Information 2: Correlation of Bass model predictions and geographical characteristics
Prediction Error correlates negatively with the month of the adoption peak, indicating that the Bass model predicts peaks better in towns that adopt late. Town size and Distance from the Capital are negatively correlated with each other (ρ = −0.32). We find that $q_i$ is significantly smaller in large towns than in small towns. There is a significant negative correlation between the month of Predicted Peak and $q_i$, whereas the corresponding correlation with $p_i$ is positive. The standard errors of $p_i$ and $q_i$ correlate strongly with the respective parameters. Further correlations of $SEq_i$ indicate that $q_i$ is estimated significantly more accurately in towns where adoption peaks late, but less accurately in towns far from Budapest.
Figure S1: Pearson correlation coefficients of Bass model parameters and geographical characteristics of towns. Peak denotes the month of the observed adoption peak in a town; P. Peak is the month of the adoption peak predicted by the Bass model; P. Error is the predicted month of the adoption peak minus the observed peak; $p_i$ and $q_i$ denote Bass model parameters; $SEp_i$ and $SEq_i$ are the standard errors of the estimated parameters; Pop denotes $\log_{10}$ of town population and Dist denotes $\log_{10}$ of the distance from Budapest in kilometers.
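For the standard closed-form Bass model, the adoption rate peaks at $t^* = \ln(q/p)/(p+q)$. The sketch below shows how a town-level predicted peak and the P. Error quantity of Figure S1 (predicted minus observed peak month) could be computed from fitted $p_i$ and $q_i$. It assumes model time t = 0 corresponds to a known calendar month (the hypothetical `start_month` argument) and that $q_i > p_i$, since otherwise the Bass curve has no interior peak. Function names are illustrative, not taken from the authors' code.

```python
import math

def bass_peak_time(p: float, q: float) -> float:
    """Time of the maximum adoption rate in the standard Bass model.

    Returns 0.0 when q <= p, because the adoption rate then decreases
    monotonically from t = 0 and has no interior peak.
    """
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    if q <= p:
        return 0.0
    return math.log(q / p) / (p + q)

def peak_prediction_error(p: float, q: float, observed_peak_month: float,
                          start_month: float = 0.0) -> float:
    """Predicted peak month minus observed peak month (P. Error).

    start_month is a hypothetical offset mapping model time t = 0 to the
    calendar month in which the town's adoption curve is assumed to start.
    """
    return start_month + bass_peak_time(p, q) - observed_peak_month

# Illustrative (not fitted) parameters
print(round(bass_peak_time(0.01, 0.1), 2))  # 20.93
```

A negative P. Error then means the model places the peak earlier than observed.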

Supporting Information 3: Network sampling for ABM
To model diffusion in the empirical social network, we sample the full network of 3 million nodes while preserving the distribution of nodes across locations and network communities. To do so, we identify the community structure of the full network with the Louvain algorithm, which assigns every node to one community. We then take 5%, 10%, and 20% samples and stop sampling when the p-values of the Kolmogorov-Smirnov tests comparing both the town and the community distributions of the sampled and full node lists are larger than 0.95. Finally, we connect the sampled nodes with the ties that link them in the full network and exclude nodes that are not part of the giant component.
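One way to implement the stopping rule above is rejection sampling: draw a uniform random node sample of the target size and accept it only when both KS p-values exceed 0.95. The sketch assumes integer town and community labels per node (the community labels would come from a Louvain partition computed beforehand) and uses `scipy.stats.ks_2samp`. Whether the authors redrew whole samples or grew them incrementally is not stated, so the loop and the `max_tries` cap are assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

def sample_nodes(town, community, fraction, p_threshold=0.95,
                 max_tries=1000, seed=0):
    """Uniform node sample whose town and community label distributions
    pass two-sample KS tests against the full node list.

    Returns (sampled indices, town p-value, community p-value).
    """
    town = np.asarray(town)
    community = np.asarray(community)
    n = len(town)
    k = int(round(fraction * n))
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        idx = rng.choice(n, size=k, replace=False)
        p_town = ks_2samp(town[idx], town).pvalue
        p_comm = ks_2samp(community[idx], community).pvalue
        # Accept only when neither distribution differs detectably
        if p_town > p_threshold and p_comm > p_threshold:
            return idx, p_town, p_comm
    raise RuntimeError("no sample passed the KS criterion")
```

The KS test treats label IDs as ordered values, which mirrors the description above but is a heuristic for categorical data. The final step, keeping only ties among sampled nodes and then the giant component, could be done with networkx, for example `G.subgraph(nodes)` followed by `max(nx.connected_components(H), key=len)`.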
In Table S1, we compare structural characteristics of the 5%, 10%, and 20% sample networks with those of the full network. Link density in the sampled networks is of the same order of magnitude as in the full network; however, the smaller the sample, the higher the density. Global clustering (the ratio of closed triangles among all possible triangles) is identical across samples and is around half of that of the full network. The fraction of links that connect individuals across towns is identical in the samples and in the full network. In Figure S2, we plot the degree distribution and the distance decay of connections for each sample and for the full network. The sample degree distributions lack the high probability of low degrees (k < 10) that characterizes the full network. Further, the probability of ties at short distances ($d < 10^{1.5}$) deviates positively from the generally observed distance decay in the full network. This deviation is present in the sample networks as well, but only to a lesser extent. In sum, the 10% sample cannot fully represent the fraction of low-degree nodes and short-distance links of the full network. Consequently, density is higher and global clustering is lower in the sampled network than in the full network. In our understanding, this slight bias does not affect the consistency of our findings, since urban scaling of adoption and distance decay of spreading show similar patterns in the full network and in the 10% sample we use in the ABM.
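The Table S1 quantities can be computed with networkx roughly as follows. This is a minimal sketch that assumes each node carries a hypothetical `town` attribute; global clustering is taken as `nx.transitivity`, i.e. three times the number of triangles divided by the number of connected triples, which matches the definition given above.

```python
import networkx as nx

def structural_summary(G: nx.Graph, town_attr: str = "town") -> dict:
    """Density, global clustering and share of cross-town links."""
    m = G.number_of_edges()
    # Count edges whose endpoints live in different towns
    cross = sum(1 for u, v in G.edges()
                if G.nodes[u][town_attr] != G.nodes[v][town_attr])
    return {
        "density": nx.density(G),
        "global_clustering": nx.transitivity(G),
        "cross_town_share": cross / m if m else 0.0,
    }
```

Running it on the full network and on each sample yields one row per network, comparable to Table S1.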

Supporting Information 4: Calibration of ABM parameters and their influence on adoption
Fitting the ABM to the diffusion data
We fit our basic ABM to the diffusion data using the method of Xiao et al. [1]. The first step of the fitting is to find the linear transformation between the macroscopic p and q parameters of the solution of the Bass differential equation (see Eq. 1) and the microscopic $p_{ABM}$ and $q_{ABM}$ parameters that drive neighborhood adoption in the ABM (see Eq. 3).
Second, we fit the Bass DE solution to the empirical adoption curve, again using the nonlinear least squares method. From this fit, we obtain $\hat{p} = 0.0001570$ and $\hat{q} = 0.1047$. Substituting these values into Eq. S1, we obtain the initial estimates $p_{ABM}^0 = 0.0001939$ and $q_{ABM}^0 = 0.1191$ for the microscopic parameters.
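A nonlinear least-squares fit of the Bass solution can be sketched with `scipy.optimize.curve_fit`. The sketch assumes Eq. 1 is the standard closed-form cumulative Bass curve $m F(t)$ with $F(t) = (1 - e^{-(p+q)t}) / (1 + (q/p) e^{-(p+q)t})$ and a free market potential m; the paper's exact parametrization and the Eq. S1 mapping to the ABM parameters are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def bass_cumulative(t, p, q, m):
    """Cumulative adopters m * F(t) from the closed-form Bass solution."""
    e = np.exp(-(p + q) * t)
    return m * (1.0 - e) / (1.0 + (q / p) * e)

def fit_bass(t, y):
    """Nonlinear least-squares estimates of (p, q, m)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    guess = (0.001, 0.05, y.max())
    lower = (1e-8, 1e-6, 1e-9)
    upper = (1.0, 2.0, np.inf)
    # Bounds keep p, q, m positive; x_scale="jac" helps because p and m
    # differ by many orders of magnitude
    (p, q, m), _ = curve_fit(bass_cumulative, t, y, p0=guess,
                             bounds=(lower, upper), x_scale="jac")
    return p, q, m
```

On noise-free synthetic data generated with `bass_cumulative`, the fit recovers the generating parameters; on empirical monthly data the starting guess may need tuning.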
Starting from the pair $(p_{ABM}^0, q_{ABM}^0)$, we set up a grid in the (p, q) parameter space with ∆p = 0.00001 and ∆q = 0.01. We run ABMs for the (p, q) pairs on this grid and characterize the goodness of fit of each ABM with respect to the empirical data by the sum of squared deviations of the ABM adoption curve from the empirical adoption curve (SSE). We keep track of the grid points already visited, the SSE at each grid point, and the two grid points with the lowest SSE so far. In each search step, we take these two points, run ABMs, and calculate the SSE for all of their neighboring grid points (p ± ∆p, q ± ∆q) that have not been visited yet. We then determine the two new lowest-SSE grid points and continue the search. When the two selected lowest-SSE points have no unvisited neighbors left, we stop the search and select the parameter pair with the lowest SSE as the parameters of the fitted ABM. Our final parameters after this optimization step are $p_{ABM}^{opt} = 0.0001940$ and $q_{ABM}^{opt} = 0.1191$.

We then select the (h, l) combinations for which the error is below the threshold $\log_{10}\mathrm{SSE} < 10.2$. For these, we calculate the Pearson correlation between the simulated and the empirical peak adoption times of the largest towns (population greater than 5,000) in the dataset. As an alternative ABM, we select h = 0.2 and l = 0.2, since this combination gives the highest correlation (ρ = 0.12) apart from the original h = 0, l = 0 model, for which ρ = 0.14. Figure S3 illustrates T(x, h, l) from Eq. 5 (left) and the CDF of ABM adoption for various levels of h and l (right). Figure S4 illustrates the transformation function $T(N_j(t), h = 0.2, l = 0.2)$ and the empirical distribution of $N_j(t)$ on the full network. In this paper, we do not aim to develop a perfect T that weights adoption probability so as to reproduce the empirical $N_j(t)$ distribution.
Instead, we intend to modify adoption probability in a simple way motivated by the threshold distribution. Our approach captures the fact that the distribution of $N_j(t)$ peaks between 0.4 and 0.6 (Figure S4). However, empirical $N_j(t)$ values below 0.3 are relatively rare (these belong to high-degree individuals, as reported in Figure 3B), which our T does not reflect.
Figure S4: Threshold distribution and its modulation.
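The neighborhood grid search over $(p_{ABM}, q_{ABM})$ described earlier in this section can be sketched as below. The `sse` callable stands in for "run the ABM at (p, q) and compare with the empirical curve", which is hypothetical here. Grid points are stored as integer offsets to avoid floating-point keys, and reading "(p ± ∆p, q ± ∆q)" as the 8 surrounding grid points is an assumption; so is the `max_evals` safety cap.

```python
from typing import Callable, Dict, Tuple

GridKey = Tuple[int, int]

def neighborhood_search(sse: Callable[[float, float], float],
                        p0: float, q0: float,
                        dp: float = 1e-5, dq: float = 0.01,
                        max_evals: int = 10_000):
    """Expand the two lowest-SSE grid points until neither has an
    unvisited neighbour; return the best (p, q) and its SSE."""
    def coords(key: GridKey) -> Tuple[float, float]:
        return p0 + key[0] * dp, q0 + key[1] * dq

    visited: Dict[GridKey, float] = {(0, 0): sse(p0, q0)}
    while len(visited) < max_evals:
        best_two = sorted(visited, key=visited.get)[:2]
        # Unvisited 8-neighbours of the two best points, kept in the
        # positive quadrant so that p and q stay valid probabilities
        new = {(i + di, j + dj)
               for (i, j) in best_two
               for di in (-1, 0, 1)
               for dj in (-1, 0, 1)
               if (i + di, j + dj) not in visited
               and min(coords((i + di, j + dj))) > 0}
        if not new:
            break
        for key in sorted(new):
            visited[key] = sse(*coords(key))
    best = min(visited, key=visited.get)
    return coords(best), visited[best]
```

Because each ABM run is expensive, caching SSE values in `visited` is what keeps the search affordable.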

The influence of the transformation function on adoption probability
Substituting Eq. 5 for T in Eq. 3 gives the adoption probability as a function of $N_j(t)$, which equals $p_{ABM} + N_j(t) \times q_{ABM}$ when h = 0.0 and l = 0.0, and $p_{ABM} + (-1.6 N_j(t)^3 + 1.6 N_j(t)^2 + 0.8 N_j(t)) \times q_{ABM}$ when h = 0.2 and l = 0.2. We substitute the $p_{ABM}$ and $q_{ABM}$ values and plot the adoption probabilities as a function of $N_j(t)$ in Figure S5 (left), together with their difference (right). Setting h = 0.2 and l = 0.2 instead of h = 0.0 and l = 0.0 slightly decreases the adoption probability for $N_j(t)$ below about 0.15 but yields a higher probability for $N_j(t)$ between about 0.15 and 0.85. The additional probability of the h = 0.2, l = 0.2 setting is highest at $N_j(t) \approx 0.6$. The adoption probability of the h = 0.2, l = 0.2 setting declines for $N_j(t)$ above about 0.86, such that the probability at $N_j(t) = 1$ is approximately equal to the probability at $N_j(t) = 0.7$.
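A short numerical check of the two adoption-probability curves can make these statements concrete. The sketch assumes the optimized $p_{ABM}^{opt}$ and $q_{ABM}^{opt}$ from the calibration are the values being substituted, and takes the cubic for $T(x, h=0.2, l=0.2)$ exactly as written above (Eq. 5 itself is not reproduced). The difference equals $q_{ABM} \, x \, (-1.6x^2 + 1.6x - 0.2)$, so its sign changes at the roots of the quadratic factor.

```python
import numpy as np

# Assumed: optimised microscopic parameters from the calibration step
P_ABM = 0.0001940
Q_ABM = 0.1191

def transform_h02_l02(x):
    """T(x, h=0.2, l=0.2) in the cubic form stated in the text."""
    return -1.6 * x**3 + 1.6 * x**2 + 0.8 * x

def adoption_probability(x, thresholded=False):
    """p_ABM + T(x) * q_ABM, where T(x) = x for h = l = 0."""
    x = np.asarray(x, dtype=float)
    t = transform_h02_l02(x) if thresholded else x
    return P_ABM + t * Q_ABM

x = np.linspace(0.0, 1.0, 101)
diff = adoption_probability(x, thresholded=True) - adoption_probability(x)
print(x[np.argmax(diff)])  # largest gain near x = 0.6
print(np.sort(np.roots([-1.6, 1.6, -0.2])))  # sign changes near 0.146 and 0.854
```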

Supporting Information 5: Urban scaling estimates with control variables
To understand whether urban scaling of adoption is governed by demographic characteristics of towns, we run multiple OLS regressions with the number of adopters in each life-cycle stage as the dependent variable. Independent variables include town population (log) and further measures that previous studies have used to predict adoption rates or to investigate inequalities: development level (average income [2]), inequality (Gini coefficient of income [3]), internet infrastructure and media presence (Telecom Composite Index, Number of TV, Number of School PC [2]), physical barriers to social interaction within towns (Rail-River Division [3]), segregation (Ethnic Entropy [3]), and town hierarchy (Subregion Centre [2]).
We find that the urban scaling coefficients reported in Figures 4A and 4C are robust to these controls. Economic development of towns, measured by average salary, increases adoption in all phases of the life cycle, whereas development in terms of telecommunication infrastructure facilitates adoption in the Innovation phase only.
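A regression of this form, with log adopters on log population plus controls, can be sketched with NumPy's least-squares solver. This is an illustration, not the authors' estimation code: it assumes towns with zero adopters have already been dropped (the logarithm is undefined there), leaves the choice and transformation of control columns to the user, and returns point estimates only, without standard errors.

```python
import numpy as np

def scaling_with_controls(adopters, population, controls):
    """OLS of log10(adopters) on log10(population) and control variables.

    controls: 2-D array (towns x variables). Returns the coefficients
    [intercept, beta_population, gamma_1, gamma_2, ...].
    """
    y = np.log10(np.asarray(adopters, dtype=float))
    X = np.column_stack([np.ones(len(y)),
                         np.log10(np.asarray(population, dtype=float)),
                         np.asarray(controls, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

The second coefficient is the urban scaling exponent, so a value that stays stable as controls are added is what "robust" means here.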

Supporting Information 6: Estimates and confidence intervals of urban scaling coefficients in the ABM sample
We regress the logarithm of the number of adopters in towns on the logarithm of town population using ordinary least squares. Table S3 details Figure 4C by reporting 95% confidence intervals for each estimate. All coefficients are significantly above 1, indicating super-linear scaling, meaning that adoption concentrates in large towns.
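Checking super-linearity amounts to asking whether the lower bound of the slope's confidence interval exceeds 1. A minimal sketch using `scipy.stats.linregress` and a t-based interval follows; it assumes towns with zero adopters are excluded and that the classical OLS standard error is appropriate (robust standard errors would change the interval).

```python
import numpy as np
from scipy import stats

def scaling_ci(adopters, population, level=0.95):
    """Slope of log10(adopters) on log10(population) with a t-based CI."""
    x = np.log10(np.asarray(population, dtype=float))
    y = np.log10(np.asarray(adopters, dtype=float))
    res = stats.linregress(x, y)
    t_crit = stats.t.ppf(0.5 + level / 2, df=len(x) - 2)
    half = t_crit * res.stderr
    return res.slope, (res.slope - half, res.slope + half)
```

Scaling is super-linear at the chosen level when the returned lower bound is above 1.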

Supporting Information 7: Assortativity of adoption fuels peak prediction bias in large towns
Connections between individuals with a similar tendency to adopt, that is, assortative mixing, are crucial in spatial spreading. However, predicting the likelihood of adoption is the aim of diffusion models, so labeling individuals a priori in these models would be paradoxical. To illustrate adoption assortativity in our data in Figure S6A, we calculated the number of links $W_{ij}$ between groups i and j and compared it to the expected number of ties $E(W_{ij})$, which assumes that links are distributed randomly across groups in proportion to each group's total number of links:

$$E(W_{ij}) = \frac{\sum_{j'} W_{ij'} \, \sum_{i'} W_{i'j}}{\sum_{i',j'} W_{i'j'}}.$$

We transformed the ratio $x = W_{ij} / E(W_{ij})$ into the [−1, 1) interval using $(x-1)/(x+1)$. This indicator is positive if the observed number of ties exceeds the expected number of ties and negative otherwise. The plot suggests that assortative mixing fragments the network into Innovators and Early Adopters, who are only loosely connected to Late Majority and Laggard users.
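The expected-ties comparison and its rescaling can be written in a few lines of NumPy. This sketch assumes W is a square matrix of link counts between adopter groups with no empty rows or columns (an empty group would make the expected count zero).

```python
import numpy as np

def mixing_ratio(W):
    """(x - 1) / (x + 1) with x = W_ij / E(W_ij) under random mixing.

    E(W_ij) = (row sum i) * (column sum j) / (total number of links).
    Values lie in [-1, 1): -1 for no observed ties, 0 for observed == expected.
    """
    W = np.asarray(W, dtype=float)
    expected = np.outer(W.sum(axis=1), W.sum(axis=0)) / W.sum()
    x = W / expected
    return (x - 1) / (x + 1)
```

For two groups that only link internally, `[[10, 0], [0, 10]]`, the diagonal becomes 1/3 and the off-diagonal −1.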
To characterize assortativity at the town level, we classified each user into the adopter categories defined by Rogers [4] and calculated Newman's assortativity coefficient r [5] for every town. This indicator takes the value 0 when there is no assortative mixing by adopter type and a positive value when links between identical adopter types are more frequent than links between different adopter types. Figure S6B shows the similarity of peers in each town using Newman's r index of assortative mixing [5]. In many towns, the empirical data show stronger assortativity than ABM(h=0.0, l=0.0). This phenomenon is due to differences in adoption time lags, depicted in Figure S6C. Here we contrast ABM(h=0.0, l=0.0) and ABM(h=0.2, l=0.2) with the empirical data in terms of the average difference between the adoption time of each ego and the adoption times of his/her network neighbors. The ABM differs from the empirical data in how fast individuals follow their connections. These observations confirm that assortative mixing in terms of adoption tendency is an important feature of the spatial diffusion of innovation.
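For a categorical attribute such as the Rogers adopter category, Newman's coefficient is $r = (\sum_i e_{ii} - \sum_i a_i b_i) / (1 - \sum_i a_i b_i)$, where e is the normalized mixing matrix and a, b are its row and column sums. A self-contained sketch, assuming undirected edges and integer-coded types (networkx's `attribute_assortativity_coefficient` is a library alternative):

```python
import numpy as np

def newman_r(edges, types, n_types):
    """Newman's assortativity coefficient for a categorical node attribute.

    edges: iterable of undirected (u, v) pairs; types: mapping node -> int
    in [0, n_types). Each edge is counted in both directions so that the
    mixing matrix e is symmetric.
    """
    e = np.zeros((n_types, n_types))
    for u, v in edges:
        e[types[u], types[v]] += 1
        e[types[v], types[u]] += 1
    e /= e.sum()
    a, b = e.sum(axis=1), e.sum(axis=0)
    ab = float(a @ b)
    # Undefined (division by zero) if every node has the same type
    return (np.trace(e) - ab) / (1.0 - ab)
```

Computing it on the edges inside each town, with the five Rogers categories coded 0 to 4, gives one r per town.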
To test how assortative mixing influences the spatial prediction of ABM(h=0.0, l=0.0) in Figure S6D, we regressed the prediction error on Newman's r using ordinary least squares, with the number of OSN users in each town as weights. The ABM predicted adoption earlier than observed in the majority of small towns, where no assortative mixing was found. In contrast, the ABM predicted adoption later than observed in large towns, where Innovators and Early Adopters were only loosely connected to the Early and Late Majority and Laggards. If we do not include weights in the regression, the point estimate of assortativity is not significant. These findings confirm that assortativity in terms of adoption probability influences diffusion [6,7] and fuels peak prediction bias in large towns.
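A weighted regression of this kind can be sketched by scaling each row by the square root of its weight and solving ordinary least squares. The sketch returns point estimates only; significance testing needs standard errors, for which a package such as statsmodels (its `WLS` model) is one option.

```python
import numpy as np

def weighted_ols(y, x, w):
    """Weighted least squares of y on [1, x] with weights w.

    Rows are scaled by sqrt(w) so that plain least squares minimises
    sum(w * residual**2). Returns (intercept, slope).
    """
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]
```

Here y would be each town's peak prediction error, x its Newman r, and w its number of OSN users.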
We find that Population and Distance influence ABM Prediction Error, as reported in the main text. Prediction is also slightly late in relatively developed towns (measured by average income). The remaining socio-economic variables, however, do not have significant point estimates.
Note: 95% CI in parentheses. *p<0.1; **p<0.05; ***p<0.01