In recent years Italy has been involved in massive migration flows and, consequently, migrant integration is becoming a urgent political, economic and social issue. In this paper we apply quantitative methods, based on probability theory and statistical mechanics, to study the relative integration of migrants in Italy. In particular, we focus on the probability distribution of a classical quantifier that social scientists use to measure migrant integration, that is, the fraction of mixed (natives and immigrants) married couples, and we study, in particular, how it changes with respect to the migrant density. The analysed dataset collected yearly by ISTAT (Italian National Institute of Statistics), from 2002 to 2010, provides information on marriages and population compositions for all Italian municipalities. Our findings show that there are strong differences according to the size of the municipality. In fact, in large cities the occurrence of mixed marriages grows, on average, linearly with respect to the migrant density and its fluctuations are always Gaussian; conversely, in small cities, growth follows a square-root law and the fluctuations, which have a much larger scale, approach an exponential quartic distribution at very small densities. Following a quantitative approach, whose origins trace back to the probability theory of interacting systems, we argue that the difference depends on how connected the social tissue is in the two cases: large cities present a highly fragmented social network made of very small isolated components while villages behave as percolated systems with a rich tie structure where isolation is rare or completely absent. Our findings are potentially useful for policy makers; for instance, the incentives towards a smooth integration of migrants or the size of nativist movements should be predicted based on the size of the targeted population.
Systems made of a large number of components can be suitably analysed with probability theory and statistical mechanics formalism. This comes from the fact that each component can be mapped into a random variable and the behaviour of the entire system encoded on their joint probability distribution. The simplest, although somehow idealised, case is when the random variables are mutually independent. In statistical physics this is often referred to as a perfect gas or, in general, as a system of non-interacting particles where their global behaviour is fully and easily deducible from that of the single ones. From the mathematical point of view a perfect gas is described by a joint probability distribution that is the product of the probability distribution of each particle. Under very general assumptions, when the particles are of the same type and have each a regular distribution, their large sum converges to a smooth function of the natural parameters and, suitably normalised, can be proved to have Gaussian fluctuations, according to the well known Central Limit Theorem (Feller, 1960).
Real systems, nevertheless, are rarely described tout court by such an elementary scheme. This is because interaction among parts is ubiquitous and the independence among the components is rather an exception than the rule. Still, it has been understood that, even in the presence of interaction, systems display generically a smooth behaviour and Gaussian fluctuations, apart from special points in the parameter space. Those points, called critical points, display non-Gaussian fluctuations and their exhaustive comprehension and classification is among the main challenges of probability theory and statistical physics (Liggett, 1985; Parisi, 1988). Also within the social science, the relevance of the interaction among agents is a growing focus of research (Scheinkman, 2018; Horst and Scheinkman, 2006; Glaeser and Scheinkman, 2001; Bialek et al., 2014; Brock and Durlauf, 2001a, b; Durlauf, 1999) that is progressively generalising the original social choice paradigm (McFadden, 2001) based on the independent agents assumption.
In this paper, we investigate a data set collected by the Italian National Institute of Statistics (ISTAT) for the years 2002 to 2010 on the social choice (Weber, 1978) (for a native) of marrying a person from the host country (i.e., Italy) or from a different one. This quantifier is used, among other classical ones (Portes and Sensenbrenner, 1993; Rannala and Mountain, 1997; Agliari et al., 2014; Barra et al., 2014), to study the level of integration of migrants. The fraction of mixed marriages mmix, collected each year and for each municipality, is studied vs. the fraction γ of immigrants on the total population. While previous studies of analogous databases from Spain (Barra et al., 2014; Agliari et al., 2014; Barra et al., 2016), France, Germany and Switzerland (Agliari et al., 2015), all reported an average smooth law for the evolution of mixed marriages vs. the migrant’s percentage (alternating a square-root behaviour vs. a linear growth), the Italian database does not show the existence of an underlying average law if analyzed as a whole: in other terms, it is not possible to map the statistical sample into a well defined function mmix(γ). Since this feature is generally the signature of a mixture of two, or more, different phenomena that need to be disentangled we split the database into large and small municipalities with respect to a proper tuning of a threshold θ over the population size. In this case, analyzing separately the two resulting ensembles, a clear functional dependence was recovered for each of them matching two behaviours: for large municipalities the quantifier mmix grows linearly with γ (i.e., mmix∝γ), while for small municipalities a square-root function emerges (i.e.,). In this last case γc turns out to be positive and close to zero. By further analyzing the quantifier probability distribution around the critical point, we find that large municipalities display Gaussian fluctuations while small municipalities fluctuate according to a quartic exponential distribution close to γc. To further confirm our findings we performed the same analysis by splitting municipalities according to their relative population densities, rather than their size, hence in low and high-density areas and we found analogous results. While this is partly due to the natural correlation that the larger the municipalities, the higher the density of people they contain, a detailed inspection of this point was necessary (see Clark (1951) and “Analysis of the data distributions” in the Supplementary Information for a discussion).
The previous results can then be interpreted in terms of a probabilistic model of monomer-dimer type, the mean-field version of a class of statistical mechanical models used in condensed-matter statistical physics to describe the deposit of diatomic molecules on lattices (Heilmann and Lieb, 1972). The dimer corresponds here to a married couple and the monomer to the unmarried person. The monogamic rule of the social setting is the equivalent of the forbidden configuration made of two dimers on the same vertex. Our interpretation is based on a series of rigorous mathematical works (Alberici et al., 2014a, b; Alberici and Contucci, 2014; Alberici et al., 2015, 2016a, b) where it has been shown that, in the presence of an imitative interaction among vertices (monomers), the model displays a phase transition: the dimer densities have a square-root growth by the critical point, where quartic exponential fluctuations are observed at the scale N3/4 (rather than quadratic at the usual scale N1/2, i.e., the standard Gaussian scenario). This result is also typical of a large class of mean-field ferromagnetic spin models whose behaviour was understood in the works (Ellis and Newman, 1978a, b; Ellis et al., 1980; Ellis and Rosen, 1982). The imitative interaction, from a sociological modelistic point of view (Nowak, 2006; Hauert and Doebeli, 2004), is seen as the trust-bond among two people. In the works (Barra et al., 2014; Gallo et al., 2009) it was seen that this type of interaction is strong enough to cause, when the interaction graph is percolated, a mean-field phase transition with square-root singularity. This allows us to conclude that in small municipalities the trust social network is percolated while it is fragmented in large municipalities thus confirming classical theories on alienation (Durkheim, 1897).
Our conclusions lend to sociological studies and are potentially useful for policy makers. The incentives towards a smooth integration of migrants can in fact be predicted to be proportional to the size of the targeted population for sparse networks (large cities) while are sensibly smaller for percolated ones (Burioni et al., 2015).
Database and observables
The source of our database is the ISTAT (Italian National Institute of Statistics). We consider the yearly collected data in the time window 2002–2010 for 8100 municipalities (comuni, the smallest administrative units of the Italian territory), distributed all over the country. For every municipality the resident population is provided, divided into native citizens and immigrants. Data for the marriages are split into three types: couples which are composed by two Italians, or one Italian and one immigrant (mixed marriage), or two immigrants. ISTAT also provides information about the surface S (in Km2) of each municipality (ISTAT, 2018), in such a way that the densities (ρ, people per Km2) for native citizens, immigrants and total population can be deduced.
Labelling with i each municipality, and with t each year we define:
M(i,t): the total number of marriages (this includes marriages where partners are both natives, or both foreign-born, or mixed)
Mmix(i,t): the number of mixed marriages;
mmix(i,t): the fraction of mixed marriages (mmix=Mmix/M);
Nnat(i,t): the native resident population;
Nimm(i,t): the immigrant resident population;
N(i,t): the overall population (N = Nnat+Ninn);
γ(i,t): the fraction of immigrants on the total population (γ = Nimm/N);
Γ(i,t): the fraction of potential cross-links between the two subsets of the population (Γ = γ(1−γ));
S(i): the surface of the municipality (independent of time);
ρ(i,t): the total population density (ρ = N/S).
As anticipated, the index of municipalities i ranges from 1 to 8100, while the index t ranges from 1 to 9 for the years from 2002 to 2010. During the considered time window the resident population and the number of marriages evolve as summarized in Fig. 1 (panels a–c).
In the following, we will focus on the averages and on the fluctuations of the (suitably normalized) number of mixed marriages as functions of Γ,Footnote 1 and we can therefore merge the data entries into a unique catalogue, regardless of their coordinates in space and time, ordering them by increasing values of Γ (see Fig. 1, panel d). Such raw data are then properly “binned” over Γ in order to highlight a possible functional dependence. We investigated two binning procedures: constant information and constant step. With the first the intervals on Γ are chosen in such a way to include a fixed number of points. The resulting set of averaged data will have a non-uniform spacing, with larger width where data are less dense. The second has intervals of the same size but the statistical robustness of each bin varies. We thoroughly checked the quantitative consistence of the results of the two methods emphasising the use of the first for the analysis at small Γ and of the second for the whole histograms and related distributions.
We finally notice that our analysis goes up to Γ ≈ 0.1. Beyond that value there is <5% of the data and, beside us being mainly interested on the neighbour of Γ = 0, its statistical robustness would be insufficient for our purposes.
Analysis of mean values
The analysis of the mean values is performed on the quantifier mmix as a function of Γ according to the binning procedures explained in the previous section. The advantage of normalising the mixed marriages with the total number of marriages is twofold. First, it takes care of the finite size fluctuations due to the fact that the database includes a mixture of municipalities of all sizes. Second, it allows us to momentarily skip the search of the proper volume of the phenomenon, i.e., the normalisation scale to observe the intensive quantities and their averages through the law of the large numbers as well as their fluctuations and their limiting behaviour.
Our first attempt to identify a law mmix(Γ), if any, has been to consider the whole database at once, i.e., with all the municipalities included. In this case the average values turned out to be irregular in Γ and unstable with respect to the binning procedures. The lack of a functional relation between the observable and the parameter Γ can be interpreted as the signature of the mixture of two, or more, different laws to be disentangled.
We identified such laws according to two (correlated) parameters: the population size of the municipality and the density (per unit area) of its population. What we found is that mmix(Γ) grows linearly for large size (or high-density) cities
while smaller cities (or less dense ones) display a square-root growth
The two different growth laws have been observed (Barra et al., 2014; Agliari et al., 2014, 2015; Contucci and Sandell, 2015, 2016) and successfully explained (Alberici et al., 2014a, b; Alberici and Contucci, 2014; Alberici et al., 2015; McFadden, 2001; Brock and Durlauf, 2001a, b; Agliari and Barra, 2011; Barra and Agliari, 2012; Barra and Contucci, 2010; Agliari et al., 2008) in terms of different social network structures (see also the section “A review of the statistical mechanical model” in the Supplementary Information for a short but self-contained treatment). The first type of behaviour can be associated to a lack of citizen’s proximity interaction, namely a lack of a (percolated) trust networkFootnote 2 of mutual influences (Granovetter, 1983; Watts and Strogatz, 1998). This is also an implicit and indirect confirmation of the general sociological theories about alienation and anomie in large cities (Durkheim, 1897). Conversely, the second type of behaviour typically emerges in the presence of citizen’s interaction on a percolated network.
In order to identify the two subsets of municipalities, we analysed the coefficient of determination for the linear and for the square-root fit, as a function of a trial threshold used to split the database into two complementary ones, according to either the population size or the population density. It turns out that (see Fig. 2, panel a) and are, respectively, monotonically increasing and decreasing, thus identifying a crossover regime that we take as threshold: for the size people and for the density people per Km2. The two thresholds are clearly correlated (see “Analysis of the data distribution” in the Supplementary Information) since high densities are usually reached only on large cities. It is worth to mention that, as explained in the Supplementary Information (see “Analysis of the data distribution”), the critical thresholds θ c and are not far away from their respective medians. This guarantees that the data sets of each bipartition are robust enough as expected a priori since otherwise we would have seen a clear functional law even without splitting the original one.
With the two databases identified by large and small cities, or cities at high or low density, we can finally analyse the data sets and check the behaviour of the quantifier in each of them: results are shown in Fig. 2 panels c and e, for small and large cities, respectively. They display a clear square-root scaling in the first case and a linear one in the second. Similarly, for sparse and dense cities, respectively, the results are shown in Fig. 2 panels d and f (see also “Statistical analysis and robustness tests for mean values” in the Supplementary Information).
To summarise, the analysis above has outlined two key variables (i.e., population size and density), intrinsically connected, responsible for the heterogeneity among municipalities, when looked in terms of mixed marriages. In particular, for both the variables, we found a critical threshold according to which homogenous subsets of municipalities can be determined and a percolative or non-percolative regime for the interaction identified.
Analysis of fluctuations
In the previous section, we analysed the behaviour of the first moment of the fraction of number of mixed marriages and we found the existence of two different laws for the quantity mmix versus Γ. Here, we extend our investigation to its second moment, i.e., the fluctuations, and also to the type of its limiting distribution.
Such analysis, beside being crucially relevant on itself, is of fundamental importance to validate the theoretical picture that we advanced. Our model, in fact, predicts precise quantitative differences in the behaviour of the fluctuations around the critical point Γ c (see “A review of the statistical mechanical model” in the Supplementary Information). In the absence of a percolating-interaction, fluctuations should always be Gaussian. When the interaction is strong and percolating instead we expect Gaussian fluctuations away from the critical point, while in the vicinity of the critical point the model predicts a quartic exponential limiting distribution.
In order to compare empirical data with theoretical laws a paramount step to be solved is the identification of the proper “size” of the interacting system. Dealing with a matching problem among two groups of sizes Nnat and Nimm, the natural candidate definition, to be verified and tested, seems to be
To this purpose we introduce an intensive random variable, i.e., the ratio
where the exponent β at the denominator is a trial positive number to be experimentally identified through the Law of Large Numbers. The filtering procedure to be implemented in this case, unlike in the analysis of the previous section where the size problem was avoided by studying the ratio of two extensive observables, must be able to select, within the data set, all those sub-clusters of data that are approximatively at the same Γ and at the same Ω. To this aim we mesh the (Γ,Ω) space in such a way that municipalities falling within the k-th region can be considered as homogeneous, namely municipalities corresponding to observables Γ(i,t) and Ω(i,t) which fulfill, simultaneously, and , constitute the k-th sample. Of course, the partition of the space (Γ,Ω), that is, ultimately, the choice of δΓ and δΩ, must provide a proper trade-off, which ensures that each sample is statistically large enough but, still, relatively homogeneous. We can then define the average over the k-th sample as follows
where, with some abuse of notation, in the denominator there is the cardinality of the k-th sample. Basically, represents the average number of marriages within those municipalities which share the same effective size and the same fraction of immigrants. Analogously, the normalised fluctuations (around μi,t) are defined as
where the exponent α at the denominator is another tuneable parameter to be experimentally fixed through the Central Limit Theorem away from the critical point and to a possibly different law, if any, in its vicinity. Hereafter we shall drop the subindex k for the sake of simplicity. More explicitly, we are left to prove the existence of two suitable exponents and such that
namely, when and , μ and Δ shall not exhibit any dependence on the system size thus making possible direct comparison with statistical-mechanical theories. In particular, we are interested in identifying any possible breakdown of the Eq. 8 as this could provide the signature of a possible critical behaviour (expected as ). For this reason a separate analysis shall be conducted by focusing on the small municipalities database at progressively smaller Γ because those are, respectively, the phase and point at which a phase transition has been found on the previous section. The results for μ are summarised in Fig. 3 where the choice of the trial size is indeed confirmed since, for , data points corresponding to different sizes Ω merge together. Similarly, the results for Δ are summarised by Fig. 4, where the set of data corresponding to different sizes Ω collapse when , confirming the Central Limit Theorem.
Finally, we investigate the behaviour of Δi,t(α,Γ,Ω) in the vicinity of Γ = 0. We expect that in this region there exists a suitable exponent α c , that keeps the quantity finite for large sizes, namely the fluctuations of Mmix pertaining to different values of Ω collapse to a finite value (and do not grow in N). We expect, moreover, that
our theory suggesting α c = 0.75 (Alberici et al., 2016a). To check this we look at the distribution of the raw data falling in three Γ-bins approaching Γ = 0 for α = 0.5 and α = 0.75. The results are shown in Fig. 5. When the normalisation exponent is α = 0.5 (left panels) and the data are fitted against a Gaussian distribution, the quality of the fit progressively deteriorates while Γ approaches 0. In particular one sees that the coefficient R2 goes from 0.981 in panel e to 0.956 in panel a, which suggests that near Γ = 0 the Gaussian regime for the fluctuations does not hold any longer. When instead we choose α = 0.75 and fit the data against a quartic exponential (see the right panels of Fig. 5) we observe an opposite trend i.e., approaching Γ = 0 the quality of the fit is progressively improving. Namely R2 = 0.999 in panel b, while far from the critical point R2 = 0.586 in panel (f). This picture is in complete agreement with the statistical-mechanics prediction (Alberici et al., 2015, 2016b; Alberici and Mingione, 2017) especially considering the finite size effects that real data carry with them. To test the compatibility of those effects with the model we have made numerical simulations, reported in the Supplementary Information (see “Further robustness tests on fluctuations”), that show how the boundedness of the fluctuations with the normalisation α c = 3/4 is consistent with those found on real data.
The study of migration fluxes and their relative integration has a long tradition in Italy, a country that has been constantly exposed to immigration phenomena. There are therefore excellent accounts from the classical sociological perspectives (Colombo, 2012; Pastore and Ponzo, 2016). Our approach in this study has been based on data and on mathematical modelling and data analysis methods developed within the hard sciences, in particular statistical physics. What we have learnt is that large cities and small villages have very different mechanisms to integrate the immigrants, as far as the mixed marriage observable is concerned. While in large cities the growth of integration follows linearly the increasing presence of immigrants, in villages the functional dependence is of square-root type. This result confirms (see also (Barra et al., 2014; Agliari et al., 2015)) that the collective behaviour of integration phenomena obeys to either two of the social paradigms: independent agents vs. correlated ones. One further confirmation step was obtained in the present work, i.e., the verification that also at the fluctuation level one can see the difference of the two phases. At very small immigration densities the central limit theorem is violated and a different limiting distribution appears, at much larger scales N3/4 instead of N1/2, that turns out to be a quartic exponential. The idea that statistical mechanics could shed some light on the social sciences was discussed by Durlauf almost two decades ago (Durlauf, 1999). The results reached in this work provide an instance of concrete evidence in favour of that idea by identifying a matching between data and theory. The conclusion that we reached concerning the percolation of the trust network and the size of the municipality provides very suggestive forecasts on different phenomena. One of them is the vote of the country on sensitive topics like pro or anti immigration policies. Small villages (percolated case) will display highly polarised votes almost totally concentrated on one of the two positions. Such voting results are quite difficult to predict based on preliminary polls. Large cities instead (fragmented network) will display a voting result that comes from the superposition of the two opinions. A well done survey, therefore, will correctly predict the outcome of the elections.
The datasets analysed during the current study are available on demand in the ISTAT repository: https://contact.istat.it.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Since Γ=γ(1−γ), for small percentages of migrants γ (i.e., in a neighborhood of zero, which is the main focus of the present work), Γ and γ are basically indistinguishable.
With the term trust network we mean not just the usual social network of acquaintances, but rather a subnetwork where only the most significant connections are retained in such a way that the behaviour of nearest neighbours is actually strongly correlated.
EA, AB, AP are grateful to GNFM-INdAM Progetto-Giovani Agliari-2016; AB further acknowledges support by Salento University; AP further acknowledges support by the Engineering and Physical Sciences Research Council (EPSRC), Grant No. EP/L505110/1, by The Alan Turing Institute EPSRC grant EP/N510129/1 and seed project SF029 “Predictive graph analytics and propagation of information in networks”; CV acknowledges financial supports from Fondo di Ateneo per la ricerca 2015, Università di Modena e Reggio Emilia and FIRB Grant RBFR10N90W. PC acknowledges financial support from PRIN project Statistical Mechanics and Complexity (2015K7KK8L). Finally and most importantly we want to dedicate this work to our missed colleague and friend Ignacio Gallo. Ignacio has been a driving force in the mathematical approach to quantitative sociology and his ideas had a profound impact on our understanding of the field.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.