Linking theory and empirics: a general framework to model opinion formation processes

We introduce a minimal opinion formation model that is flexible enough to reproduce a broad variety of existing micro-influence assumptions and models. At the same time, the model can be easily calibrated on real data, upon which it imposes only a few requirements. From this perspective, our model can be considered a bridge connecting theoretical studies of opinion formation models and empirical research on social dynamics. We investigate the model analytically, using a mean-field approximation, and numerically. Our analysis is exemplified by recently reported empirical data drawn from an online social network. Employing these data for model calibration, we demonstrate that the model can reproduce fragmented and polarized social systems. Furthermore, we generate an artificial society whose properties are quantitatively and qualitatively similar to those observed empirically at the macro scale. This becomes possible after we extend the model with two important communication features: selectivity and personalization algorithms.


Introduction
Models of opinion dynamics (aka social-influence models) concern how individuals change their opinions in response to information they receive from their social environments.
Understanding the processes of opinion dynamics is important due to its applications in many fields, including policy-making, business, and marketing. This branch of modeling is naturally interdisciplinary, attracting scholars from different fields, including social psychology, control theory, and physics. Although the theoretical side of these models has advanced considerably, applying them to describe real social processes remains an open problem (Castellano et al., 2009; Flache et al., 2017; Mäs, 2019; Proskurnikov & Tempo, 2017, 2018). This issue is rooted in the complex nature of social systems. More precisely, it is extremely difficult to calibrate the parameters of the underlying social-influence models, which operate with entities that are hard to formalize. One prominent example is opinions themselves, which are intrinsic components of such models (Mäs, 2019).
The proliferation of online social networks (OSNs) has made it possible to identify the dynamics of users' opinions on a large scale by applying machine learning techniques (Barberá, 2014; Chang et al., 2017). Combining these methods with tools elaborated in the field of social network analysis, one can obtain both the dynamics of opinions and information on the social connections between individuals (Newman, 2018a, 2018b). Further, recent research has proposed a methodology to identify not only the structure of ties but also their weights, which describe how well these ties conduct social influence (aka influence networks) (Ravazzi et al., 2021). This information may be effectively integrated into existing social-influence models, allowing one to calibrate and validate them and, further, to make necessary predictions.
However, integrating information gathered from OSNs into the models is hampered by the fact that there is a substantial range of opinion dynamics models, and each model may require a specific data format. Hence, empirical data gathered through an experiment (most likely an expensive and time-consuming one) may be useful in one case but unusable, or in need of substantial extra processing, in others. Thus, each new dataset on opinion dynamics will potentially have a limited area of application, in the sense that it can be investigated only by a restricted number of opinion formation models.
Therefore, we propose a quite general and, nonetheless, minimal model of opinion formation. On the one hand, the model is extremely flexible and can approximate a broad variety of the existing micro-influence assumptions and models. On the other hand, our model can be easily calibrated on empirical data, upon which it imposes a relatively small number of requirements. From this perspective, this model can serve as a bridge, connecting theoretical studies on opinion formation models on the one hand and empirical research on the other. We investigate the model analytically and numerically and exemplify our analysis with recently reported empirical data drawn from an online social network.
The remainder of this paper is organized as follows. Section 2 reviews the related literature. In Section 3, we elaborate upon the model and discuss it. Section 4 presents the analytical results obtained from use of the model. In Section 5, we describe the design of numerical experiments that we use to investigate the model's behavior. Section 6 presents the results of the numerical experiments, and in Section 7, we discuss them. Concluding remarks are provided in Section 8, and, finally, the Appendix includes supplemental information.

Literature
Social-influence models are numerous, and it is extremely difficult to classify all of them correctly. However, most of them can be grouped into main classes to some extent. Within this paper, we concentrate on the micro-level models, whereby the opinions of every individual are initialized and can both change and cause changes. In such models, the modeler analyzes how influence processes at the micro scale affect the resulting state of the social system at the macro scale. In contrast, so-called macro-level models describe the behavior of macroscopic variables, such as populations of individuals espousing particular opinions (Rashevsky, 1939).
The literature emphasizes three main classification criteria: (1) time in the model, continuous (Abelson, 1964) or discrete (DeGroot, 1974); (2) opinions, continuous (Friedkin & Johnsen, 1990) or discrete (Moretti et al., 2013; Sznajd-Weron & Sznajd, 2000); and (3) the micro-assumption of social influence (see below). Because empirical data are typically gathered in discrete time, we will focus hereafter on discrete-time models. Besides, we will assume that opinions are represented by scalar quantities describing individuals' positions on a single issue. More complex situations arise when several topics (logically connected or independent) are analyzed at once (Friedkin et al., 2016). However, gathering data on individuals' opinions on two or more topics simultaneously is a challenging and costly task. On this basis, we will focus on scalar opinions.
There are three main micro-assumptions regarding social influence, built upon a continuous opinion space (Flache et al., 2017). In that space, in the case of one-dimensional opinions, one can say that an individual's opinion is affected by positive (aka assimilative) influence from a different opinion if the former moves towards the source of influence (DeGroot, 1974;French Jr, 1956). However, the literature on social psychology stipulates that if opinions are too distant, then the positive influence may not be accepted. Hence, the concept of bounded confidence has been introduced whereby only individuals espousing sufficiently close opinions may communicate (Deffuant et al., 2000). In turn, if individuals' opinions become more distant following communication, then such a mechanism is termed negative (aka dissimilative) influence (Macy et al., 2003). Note that in the case of a discrete opinion space, these assumptions are rather meaningless unless opinion values are ordered.
The positive influence mechanism explains elegantly how individuals reach agreement, and bounded confidence can model a situation when a social system is characterized by persistent disagreement (opinion fragmentation), whereas negative influence is one of the possible mechanisms explaining opinion polarization-the process in which individuals' opinions are stretched to the polar positions of the opinion space (Banisch & Olbrich, 2019).
Thus, two camps with diametrically opposite opinions appear, a state of the social system that may have potentially dangerous consequences because it impedes democratic processes in general and consensus reaching in particular (Prasetya & Murata, 2020). An important challenge in the field of opinion dynamics models is to determine the settings under which a model can generate stable opinion polarization (Flache et al., 2017). Possible solutions here, apart from negative influence, are mass media (Prasetya & Murata, 2020), social feedback processes (Banisch & Olbrich, 2019), social influence structure (Friedkin, 1999), argument exchange (Banisch & Olbrich, 2021; Mäs & Flache, 2013), social identity (Törnberg et al., 2021), or mistrust (Adams et al., 2021).

Model Setup
We consider a system of N agents connected by a social network G. Each agent's opinion may take one of m values from the set O = {o_1, …, o_m} that represents a discrete opinion space, a construction that is extensively studied in the sociophysics literature (Axelrod, 1997; Castellano et al., 2009; Clifford & Sudbury, 1973; Galam, 1986). In some situations, one may assume that a binary relation is predetermined on the opinion space whereby o_1 and o_m stand for the most radical and polar positions in that space. Depending on the context, we will endow these quantities with different values.
In our model, time is discrete; we denote the opinion of agent a at time t by o_a(t) ∈ {o_1, …, o_m}. The population of agents holding opinion o_k at time t is described by the quantity n_k(t) ∈ {0, 1, …, N}. Note that we assume that the system is "conservative" (agents do not leave the system, and there are no incoming agents): Σ_{k=1}^{m} n_k(t) = N for any t.
A key element of the model is a 3-D matrix P = (p_{i,j,k}) ∈ ℝ^{m×m×m}, where i, j, k ∈ {1, …, m}, which governs the opinion dynamics that unfold on the social network. This matrix, which we will refer to as the transition matrix hereafter, prescribes the probabilities of opinion shifts. Each opinion shift is a move in the opinion space that results from peer influence processes. The first index in p_{i,j,k} stands for an agent's current opinion o_i, the second index describes the opinion o_j of an influence source, and the last index represents the potential opinion o_k of the target agent at the next time point. In other words, p_{i,j,k} is the probability that an agent with opinion o_i will switch their opinion to o_k after being influenced by an agent holding opinion o_j:

p_{i,j,k} = P[o_a(t + 1) = o_k | o_a(t) = o_i, o_{a←}(t) = o_j],

where o_{a←} denotes the opinion of the source of influence. As such, we should require Σ_{k=1}^{m} p_{i,j,k} = 1 for any i and j. Note that p_{i,j,i} represents the probability of staying at the current position after an interaction with opinion o_j. In the following, it will be convenient to represent different transition matrices by considering their slices over the first index. We will denote these slices, which are row-stochastic 2-D matrices, by P_{i,:,:} ∈ ℝ^{m×m}. In brief, the matrix P_{i,:,:} outlines the behavior of an agent who holds opinion o_i: its rows indicate the opinion of an influence source, and its columns represent potential opinion options. Thus, the number of parameters in the transition matrix depends only on the number of possible opinion values rather than on the total number of agents.
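As a minimal sketch, the transition matrix can be stored as a nested array whose slices over the first index must be row-stochastic. The helper below checks the constraint Σ_k p_{i,j,k} = 1; the function name and the uniform placeholder values are ours, not the paper's:

```python
# Minimal sketch: the transition matrix P = (p_{i,j,k}) as a nested list,
# where p[i][j][k] is the probability that an agent with opinion o_{i+1},
# influenced by opinion o_{j+1}, switches to o_{k+1} (0-based indices).
# The uniform values below are an illustrative placeholder.
m = 3
p = [[[1.0 / m] * m for _ in range(m)] for _ in range(m)]

def is_valid_transition_matrix(p, tol=1e-9):
    """Check that every slice P_{i,:,:} is row-stochastic with entries in [0, 1]."""
    return all(
        abs(sum(row) - 1.0) < tol and all(0.0 <= q <= 1.0 for q in row)
        for slice_i in p for row in slice_i
    )

print(is_valid_transition_matrix(p))  # True
```

Note that the storage cost is m^3 probabilities regardless of the number of agents N, which is what makes the parametrization compact.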
To illustrate the organization of the transition matrix, let us consider the following example for the opinion space O = {o_1, o_2, o_3}:

P_{1,:,:} = [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
P_{2,:,:} = [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
P_{3,:,:} = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3], [1/3, 1/3, 1/3]].

According to this transition matrix, an agent who holds opinion o_1 is a conformist who completely follows the opinion of an influence source. In turn, agents who hold position o_2 are so-called stubborn agents who do not change their opinions in the presence of peer influence. Agents with opinion o_3 act in a purely random fashion regardless of who influences them.

Now let us introduce how the opinion dynamics protocol is organized (see Figure 1). At each time point t, a randomly chosen agent a is influenced by one of their neighbors b in the social network (the neighbor is also chosen at random). The opinion of the focal agent o_a(t) then changes (or remains the same) according to the probability distribution established in the transition matrix. This influence mechanism is asymmetric: the opinion of agent b does not change following the interaction.
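One tick of this protocol can be sketched as follows. The three-agent triangle network, the 0-based opinion indices, and the function names are our illustrative assumptions; the tensor encodes the conformist/stubborn/random example above:

```python
import random

# One tick of the opinion dynamics protocol, as a sketch (0-based indices).
# The transition tensor encodes the example from the text:
# opinion 0 conformist, opinion 1 stubborn, opinion 2 purely random.
p = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],         # conformist: copy the source
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],         # stubborn: always keep opinion 1
    [[1/3, 1/3, 1/3]] * 3,                     # random: uniform over opinions
]
opinions = [0, 1, 2]                           # o_a(t) for agents a = 0, 1, 2
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}  # a triangle network

def step(opinions, neighbors, p, rng=random):
    """Influence a random agent by a random neighbor; the neighbor is unchanged."""
    a = rng.randrange(len(opinions))
    b = rng.choice(neighbors[a])
    i, j = opinions[a], opinions[b]
    opinions[a] = rng.choices(range(len(p)), weights=p[i][j])[0]

for _ in range(5):
    step(opinions, neighbors, p)
print(opinions)  # some opinion vector over {0, 1, 2}
```

The asymmetry of influence is visible in the code: only `opinions[a]` is resampled, while the neighbor's opinion is read but never written.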

Flexibility of the model
This model is extremely general and can capture a broad set of micro-influence mechanisms introduced in the literature. First, the classic voter model (Clifford & Sudbury, 1973) can be obtained as a special case of our model: it suffices to set p_{i,j,k} = 1 for k = j and p_{i,j,k} = 0 otherwise, so that the focal agent always adopts the opinion of the influence source.

The notion of bounded confidence is based on the idea that an individual perceives (positive) influence only if the opinion of the influence source is not too far from their own opinion (Flache et al., 2017). Bounded confidence may take a strict form, whereby only agents with sufficiently similar opinions may influence each other, or a mild form, whereby agents with different opinions may communicate but with a small probability (Kurahashi-Nakamura et al., 2016; Mäs & Flache, 2013).

In Example 4, we do not pay attention to so-called leapfrog opinion shifts: situations in which an agent's opinion o_i moves towards the opinion of an influence source o_j with a magnitude greater than the distance between o_i and o_j, so that the focal agent's opinion skips over that of the influence source. Leapfrog opinion shifts are rarely considered in theoretical studies but may nonetheless be encountered in empirical environments (Friedkin et al., 2021; Kozitsin, 2020, 2021). In principle, such situations may be attributed to measurement errors (Carpentras & Quayle, 2021).
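Two of these special cases can be sketched as constructors of the transition tensor. The voter-model construction follows directly from the text; the strict bounded-confidence variant below, with its cutoff `eps` measured in opinion-index steps and 0-based indexing, is our illustrative assumption:

```python
# Special cases of the transition matrix, as sketches (0-based indices).
def voter(m):
    """Voter model: the focal agent always adopts the source's opinion."""
    return [[[1.0 if k == j else 0.0 for k in range(m)]
             for j in range(m)] for i in range(m)]

def bounded_confidence(m, eps=1):
    """Strict bounded confidence: adopt the source's opinion if it is within
    eps index steps of the current one; otherwise keep the current opinion."""
    p = [[[0.0] * m for _ in range(m)] for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if abs(i - j) <= eps:   # close enough: adopt the source opinion
                p[i][j][j] = 1.0
            else:                   # too distant: no influence is perceived
                p[i][j][i] = 1.0
    return p

p4 = bounded_confidence(4, eps=1)
print(p4[0][3])  # distant source has no effect: [1.0, 0.0, 0.0, 0.0]
```

A mild form of bounded confidence would replace the hard cutoff with an adoption probability that decays with the distance |i - j|; the tensor entries then simply interpolate between the two branches above.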
Acting in a similar fashion, one could adjust the values of the transition matrix to represent more complex microscopic assumptions on social influence, such as moderated positive influence or combinations of positive and negative influence in which coexistence may take quite nontrivial forms (Kozitsin, 2021;Takács et al., 2016).
If the binary relation on the opinion space is introduced, then the limit m → ∞ provides an approximation of a continuous opinion space (without loss of generality, we may consider the interval [0, 1]), which has gained substantial attention in the literature (Mastroeni et al., 2019).
The flexibility of our model is not without limitations. The model assumes that agents with similar opinions act identically, on average, in similar situations (that is, when exposed to comparable influence opinions), an assumption that reduces the model's predictive power because not all ties transmit influence on an equal basis.
In other words, our model can reproduce only the average patterns of opinion formation processes and approximate only anonymized forms of continuous opinion models (whereby all influence weights are equal). This issue makes the model less flexible than, for example, the DeGroot model, which allows individuals to allocate different influence weights to their peers.
On the other hand, our model can easily explain the situation when an agent acts differently (by choosing positive or negative shift) as a response to the same influence opinion.
This ability arises (i) from the stochasticity of the model and (ii) because the model allows encoding of different opinion-changing strategies depending on the current opinion of the focal individual. In the case of the DeGroot model (which is linear in its canonical form), the same can be achieved by introducing nonlinearity into the influence matrix, an assumption that makes the model significantly more sophisticated.

Model identification
To calibrate the elaborated model, one needs to know (1) the trajectories of individual opinions and (2) the history of the individuals' communications. More precisely, for a given individual a, one must know their opinion o_a before communication, the opinion of the influence source o_{a←}, and the focal agent's opinion after communication. One should first discretize the experimental opinions if they are initially continuous. Here, one should find the most appropriate discretization step, which is a compromise: a step that is too large loses potentially useful information on individuals' opinion trajectories, whereas too small a step sharply increases the number of transition matrix elements, which then become difficult to interpret, and gives way to unnecessary data fluctuations. The resulting discrete opinions (for convenience, we denote them similarly) can be used to estimate the transition matrix elements:

p̂_{i,j,k} = #{agents who made the opinion change o_i → o_k after being influenced by opinion o_j} / #{agents with opinion o_i who were influenced by opinion o_j},

where #{…} denotes the cardinal number of the set. To put it simply, p̂_{i,j,k} is computed as the fraction of individuals who made the opinion change o_i → o_k among those whose opinion is o_i and who were influenced by opinion o_j. To compute all the transition matrix components, one needs substantial opinion diversity, which ensures that all combinations of o_i and o_j are represented in the data. Otherwise, the available statistics would be insufficient to calibrate the transition matrix. Individuals' opinions should be represented at least twice in the data: before and after interactions. However, longer opinion trajectories are useful because they provide more room for analysis.
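The estimator above can be sketched as a counting procedure over observed interactions. The record format (0-based triples of opinion before, source opinion, opinion after), the function name, and the toy data are our illustrative assumptions:

```python
from collections import defaultdict

# Estimating transition probabilities from observed interactions, as a sketch.
# Each record is (i, j, k): opinion before, source opinion, opinion after.
def estimate(records, m):
    counts = defaultdict(int)   # occurrences of the shift (i, j) -> k
    totals = defaultdict(int)   # occurrences of the exposure pair (i, j)
    for i, j, k in records:
        counts[(i, j, k)] += 1
        totals[(i, j)] += 1
    # p_hat[i][j][k] = #{shifts i -> k under influence j} / #{pairs (i, j)};
    # None marks combinations absent from the data (insufficient statistics).
    return [[[counts[(i, j, k)] / totals[(i, j)] if totals[(i, j)] else None
              for k in range(m)] for j in range(m)] for i in range(m)]

records = [(0, 1, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)]
p_hat = estimate(records, 2)
print(p_hat[0][1])  # [1/3, 2/3]
```

The `None` entries make the data-diversity requirement concrete: any (o_i, o_j) exposure pair that never occurs in the data leaves the corresponding row of the transition matrix unidentified.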
After the transition matrix is estimated, one should first analyze it to understand the average patterns of opinion dynamics and then make predictions about the future behavior of the system. Examples of the transition matrices estimated from empirical data will be presented below in this paper.
Note that we do not impose any requirements on the nature of empirical opinions. They could be discrete, in which case we consider low-dimensional transition matrices, or continuous, in which case we first discretize them. In principle, the history of an individual's communications (with whom they talked), which is highly difficult to retrieve in nonlaboratory settings, can be replaced by simpler structures of social connections, such as a friendship network, which can be retrieved from the Web relatively easily. Of course, more detailed information on how individuals interact with each other will make the model more precise, but in the following sections, we will demonstrate that even a simple friendship network may serve as a good approximation of the actual communication network, in the sense that it can simulate artificial social systems consistent with empirics at the macro scale.

Mean-field approximation
The model elaborated above can be studied using a mean-field approximation. Within this section, we assume that the social network is a complete graph, whereby each agent can communicate with every other agent.
Establishing the scaled time τ = t/N and the scaled time step Δτ = 1/N, for large N we obtain the nonlinear autonomous system of differential equations (see Appendix for details):

dx_k/dτ = Σ_{i=1}^{m} Σ_{j=1}^{m} p_{i,j,k} x_i x_j − x_k,  k = 1, …, m,   (1)

where x_k = n_k/N, which should be equipped with the initial condition

x_k(0) = x_k^0,  k = 1, …, m,   (2)

where Σ_{k=1}^{m} x_k^0 = 1 and x_k^0 ∈ [0, 1]. Note that one of the equations in (1) is redundant.

Equilibrium points of the system (1) are given by:

x_k* = f_k(x*),  k = 1, …, m,

where we use the notation f_k(x) = Σ_{i=1}^{m} Σ_{j=1}^{m} p_{i,j,k} x_i x_j. Because the right side of (1) is polynomial, the Cauchy problem (1)-(2) has a unique solution, which is an analytic function of the parameters p_{i,j,k} and x_k^0. In other words, small changes in p_{i,j,k} and x_k^0 lead to small perturbations of the solution.
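As a quick numerical sanity check, the mean-field system can be integrated with forward Euler. For the voter-model tensor (p_{i,j,k} = 1 iff k = j), every point of the simplex is an equilibrium, so the trajectory should stay where it starts; all names and the step sizes below are our assumptions:

```python
# Forward-Euler integration of the mean-field system
#   dx_k/dtau = sum_{i,j} p[i][j][k] * x_i * x_j - x_k,
# as a sketch (0-based indices).
def mean_field(p, x0, dtau=0.01, steps=2000):
    m = len(p)
    x = list(x0)
    for _ in range(steps):
        f = [sum(p[i][j][k] * x[i] * x[j]
                 for i in range(m) for j in range(m)) - x[k]
             for k in range(m)]
        x = [x[k] + dtau * f[k] for k in range(m)]
    return x

# Voter-model tensor: p[i][j][k] = 1 iff k == j. Then f_k(x) = x_k on the
# simplex, every point is an equilibrium, and the trajectory should stay put.
voter = [[[1.0 if k == j else 0.0 for k in range(3)]
          for j in range(3)] for i in range(3)]
x = mean_field(voter, [0.2, 0.3, 0.5])
print([round(v, 6) for v in x])  # ≈ [0.2, 0.3, 0.5]
```

For a generic tensor the same routine converges to the (stable) equilibria given by x_k* = f_k(x*), which is how the analytical predictions below can be cross-checked numerically.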
In the following two subsections, we will exemplify the elaborated analytical results with the low-dimensional cases m = 2 and m = 3, enriching them with data describing real opinion dynamics processes.

Analysis of the binary opinion space
If m = 2, which corresponds to the binary opinion space, then system (1) takes the quite simple form:

dx_1/dτ = p_{1,1,1} x_1² + (p_{1,2,1} + p_{2,1,1}) x_1 x_2 + p_{2,2,1} x_2² − x_1,
dx_2/dτ = p_{1,1,2} x_1² + (p_{1,2,2} + p_{2,1,2}) x_1 x_2 + p_{2,2,2} x_2² − x_2.   (3)

One of the two equations in system (3) is redundant. Substituting x_2 = 1 − x_1 into the first one, we get:

dx_1/dτ = F(x_1) := p_{1,1,1} x_1² + (p_{1,2,1} + p_{2,1,1}) x_1 (1 − x_1) + p_{2,2,1} (1 − x_1)² − x_1.

If x_1* ∈ [0, 1] is an equilibrium point, then it must satisfy F(x_1*) = 0. The sign of the quantity F′(x_1*) determines the stability properties of this equilibrium: it is asymptotically stable if F′(x_1*) < 0 and unstable if F′(x_1*) > 0.
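The binary equilibrium analysis can be sketched numerically: bracket the roots of F on [0, 1] and classify each by the sign of F′. The probabilities below are hypothetical stand-ins, not the empirical matrix (4):

```python
# Equilibria of the binary mean-field equation, as a sketch. With x = x_1 and
# x_2 = 1 - x, equilibria solve F(x) = 0, and an equilibrium is asymptotically
# stable when F'(x*) < 0. The probabilities here are hypothetical.
p111, p121, p211, p221 = 0.9, 0.3, 0.1, 0.2

def F(x):
    return p111*x**2 + (p121 + p211)*x*(1 - x) + p221*(1 - x)**2 - x

def dF(x, h=1e-6):
    """Central-difference derivative of F."""
    return (F(x + h) - F(x - h)) / (2 * h)

# Crude root bracketing: scan [0, 1] for sign changes of F.
xs = [i / 1000 for i in range(1001)]
roots = [(a + b) / 2 for a, b in zip(xs, xs[1:])
         if F(a) == 0 or F(a) * F(b) < 0]
for r in roots:
    print(round(r, 4), "stable" if dF(r) < 0 else "unstable")
```

For these hypothetical values F reduces to the quadratic 0.7x² − x + 0.2, which has exactly one root inside [0, 1] (near 0.24), and F′ is negative there, so it is the unique stable equilibrium.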
Let us now equip the model with empirical data from the dataset recently reported in Kozitsin (2020, 2021), where the author analyzed longitudinal data representing the dynamics of (continuous) opinions of a large-scale sample of OSN users (hereafter, Dataset). Using the first two opinion snapshots from Dataset (see Appendix for details), we obtain transition matrix (4). One can observe that transition matrix (4) is very different from the one that represents the voter model (see Example 2); in the current case, agents rarely change their opinions when they are exposed to challenging positions. In turn, they have a nonzero chance of changing their opinion after being exposed to the same position, a phenomenon that was referred to in (Krueger et al., 2017) as anticonformity. Nonetheless, the likelihood of an opinion shift slightly increases when two agents with opposite positions communicate, compared to when they hold similar opinions.

Analysis of the triple opinion space
Let us now increase the dimensionality of the opinion space by one and consider the triple opinion space O = {o_1, o_2, o_3}. In this configuration, opinions o_1 and o_3 stand for antagonistic positions, whereas o_2 is somewhat neutral, located in the center. For brevity, we do not describe how system (1) appears in this case. Instead, we immediately obtain the transition matrix (5) from empirics in a manner similar to the previous subsection. The slices of this transition matrix reflect several remarkable features.

First, larger opinion differences between individuals increase the rate of positive influence. One can observe this trend by inspecting the slices P_{1,:,:} and P_{3,:,:}: for example, the second and third columns of P_{1,:,:} are arranged so that their values increase with the column index. Further, individuals located in the middle of the opinion space may make negative shifts: p_{2,1,3} > 0, p_{2,3,1} > 0. However, according to (5), positive shifts are more likely than negative ones. System (1) augmented with transition probabilities (5) has only one meaningful (located in the unit square) equilibrium point, which can be obtained graphically or by solving an optimization problem that minimizes the residual of the equilibrium conditions (see Figure 2). We find that x_1* ≈ 0.269, x_2* ≈ 0.611 (and, correspondingly, x_3* ≈ 0.12). The Jacobian matrix at the equilibrium point has two distinct negative eigenvalues; hence, we can identify the behavior of the phase curves near the equilibrium point, which is an asymptotically stable nodal sink.
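The optimization route to the equilibrium can be sketched as a grid search over the simplex minimizing the squared residual Σ_k (f_k(x) − x_k)². Since the empirical matrix (5) is not reproduced here, the tensor built below (mild positive influence plus weak anticonformity) is a hypothetical stand-in, and all names are ours:

```python
# Locating an equilibrium of system (1) for m = 3 by minimizing the squared
# residual sum_k (f_k(x) - x_k)^2 over a grid on the simplex. The tensor is
# a hypothetical stand-in for the empirical matrix (5).
m = 3
p = [[[0.0] * m for _ in range(m)] for _ in range(m)]
for i in range(m):
    for j in range(m):
        if i == j:
            p[i][j][i] += 0.9                    # mostly keep a shared opinion
            for k in range(m):
                if k != i:
                    p[i][j][k] += 0.1 / (m - 1)  # weak anticonformity
        else:
            p[i][j][i] += 0.5                    # keep the current opinion, or
            p[i][j][i + (1 if j > i else -1)] += 0.5  # step toward the source

def f(x):
    return [sum(p[i][j][k] * x[i] * x[j] for i in range(m) for j in range(m))
            for k in range(m)]

n = 200  # grid resolution over the simplex {x >= 0, sum(x) = 1}
best_err, best_x = float("inf"), None
for a in range(n + 1):
    for b in range(n + 1 - a):
        x = [a / n, b / n, (n - a - b) / n]
        err = sum((fk - xk) ** 2 for fk, xk in zip(f(x), x))
        if err < best_err:
            best_err, best_x = err, x
print(best_x, best_err)
```

Because f maps the simplex to itself continuously, a fixed point is guaranteed to exist (Brouwer), so the grid minimum is always a good starting point for a local refinement such as Newton's method.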

Numerical experiments
We perform extensive numerical experiments to investigate the behavior of our model and, more specifically, support the analytical results derived in Section 4. Besides, we would like to understand whether our model can generate artificial social systems close (in some respects) to those observed empirically. We recognize that it is unlikely that we will be able to predict the opinion trajectories of specific individuals. However, we hope to predict the system's dynamics at the macroscopic level.
We build our further analysis on the investigation into how the model works on synthetic random networks by employing the following macroscopic metrics.

M1.
The fraction of individuals x_k(t) who hold opinion o_k, for k ∈ {1, …, m}. The combination of variables x_1(t), …, x_m(t) represents public opinion at time t. In what follows, we will refer to them as public opinion variables.

M2. Assortativity coefficient
ρ(t), defined in (7), which measures whether the system at hand is homophilic (connected nodes tend to have similar opinions). In (7), the vector o(t) = [o_1(t) … o_N(t)]^T stands for the current agents' opinions, the adjacency matrix A = (A_{ab}) ∈ {0,1}^{N×N} describes the structure of the social network G, M is the number of edges in the network, and d_a represents node a's degree. To put it simply, (7) measures how similar neighboring opinions are, compared with a configuration in which edges are placed at random (Newman, 2003). For homophilic networks (most empirically observed social networks are homophilic), metric (7) takes positive values (assortative mixing).
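In essence, this coefficient is the Pearson correlation of opinion values across the two endpoints of each edge, following Newman's formulation. A minimal sketch on an edge-list representation (variable names and the toy graph are ours):

```python
# Opinion assortativity as the Pearson correlation of opinion values across
# edge endpoints, a sketch following Newman's formulation. `edges` is an
# undirected edge list and x[a] is agent a's numeric opinion.
def assortativity(edges, x):
    M = len(edges)
    s_prod = sum(x[u] * x[v] for u, v in edges) / M            # mean of x_u * x_v
    s_mean = sum((x[u] + x[v]) / 2 for u, v in edges) / M      # mean endpoint opinion
    s_sq = sum((x[u]**2 + x[v]**2) / 2 for u, v in edges) / M  # mean squared opinion
    return (s_prod - s_mean**2) / (s_sq - s_mean**2)

# Two like-minded pairs plus one cross-camp edge: mildly homophilic.
edges = [(0, 1), (2, 3), (1, 2)]
x = [1, 1, -1, -1]
print(assortativity(edges, x))  # 1/3: assortative mixing
```

A perfectly homophilic graph (only within-camp edges) yields 1, a perfectly heterophilic one yields -1, and random edge placement yields values near 0, matching the interpretation of (7) above.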

M3.
The dissimilarity coefficient σ(t), which effectively assesses the current level of polarization (Banisch & Olbrich, 2019; Flache & Macy, 2011). This measure is defined as the standard deviation of all pairwise opinion distances and takes values ranging from σ = 0 (no polarization: all opinions are equal) to σ = 1 (the highest level of polarization: individuals are divided into two equally sized camps located on the edges of the opinion space), provided that opinions lie in the interval [−1, 1].
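The definition translates directly into code (a sketch; the function name is ours, and we use the population standard deviation over all unordered pairs):

```python
from itertools import combinations
from math import sqrt

# Dissimilarity as the standard deviation of all pairwise opinion distances,
# with opinions assumed to lie in [-1, 1].
def dissimilarity(x):
    d = [abs(a - b) for a, b in combinations(x, 2)]
    mean = sum(d) / len(d)
    return sqrt(sum((v - mean) ** 2 for v in d) / len(d))

print(dissimilarity([0.0, 0.0, 0.0]))             # 0.0: no polarization
print(dissimilarity([-1.0] * 100 + [1.0] * 100))  # close to 1: two equal camps
```

For two equal camps at the opposite ends of [-1, 1], the value approaches 1 as the number of agents grows (for N agents it equals sqrt(N(N-2))/(N-1)), which is the "highest polarization" benchmark described above.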

Simulation settings
In the experiments, we consider N = 2000 agents, who are endowed with randomly generated initial opinions. Unless otherwise stated, opinions are initialized from the generalized Bernoulli distribution with a uniform vector of probabilities (x_k(0) = 1/m for k ∈ {1, …, m}). For large N, the initial opinion configuration is characterized by ρ ≈ 0, σ ≈ 0.5. Note that in the case of the complete graph, the assortativity coefficient is always equal to zero. In each experiment, a new network is generated, as well as new initial opinion values.
Apart from the complete graph model (under which we derived the mean-field approximation and which we use only for the analysis of public opinion dynamics), we employ four synthetic graph models that are widely used in the social simulation literature (Giardini & Vilone, 2021; Perra & Rocha, 2019): (i) the Erdős-Rényi network, (ii) the random geometric network, (iii) the Watts-Strogatz network, and (iv) the Barabási-Albert network. Detailed information on the network configurations is presented in Table 1; all model parameters are tuned to ensure that the resulting networks are connected and have approximately the same density (the same number of edges M). Experiments typically lasted no more than one million iterations, a time interval that is sufficiently large to inspect the model's behavior. We repeated each experiment 20 times to obtain more precise estimates. We also tested different initial opinion configurations; however, we found that they have no influence on the asymptotic behavior of the model.

Transition matrices
We estimate the transition matrix using information from Dataset and concentrate on the first two opinion snapshots (see Appendix for details). We analyze the cases m = 2, m = 3, and m = 10. The first two opinion space configurations require only a small number of parameters, which is useful for interpretation and demonstration purposes. In contrast, the tenfold opinion space provides a more precise approximation of the underlying social processes. Further increases in m may lead to unnecessary fluctuations in the data and a sharp increase in the number of transition matrix elements. Transition matrices for the binary and triple opinion spaces have already been introduced in (4) and (5). The tenfold transition matrix is partially presented (and discussed) in the Appendix; its full representation can be found in the Online Supplementary Materials.

Measurements, hypotheses, and expectations
The immediate observation that can be made from the estimated transition matrices is that the system has no stable states at the microscopic level: for every opinion vector, there is a nonzero probability that a randomly chosen agent will change their opinion at the next time point (even after exposure to the same opinion). However, in the following, we will demonstrate that the system has a stable converging tendency from the perspective of the macroscopic metrics. Kozitsin (2020, 2021) reported that the social system under consideration is homophilic, with an assortativity coefficient of approximately 0.14, which may be considered a not particularly strong (but noticeable) rate of homophily. He also observed that users' opinions tend to stretch out to the edges of the opinion space in such a way that the fraction of individuals holding middle-located (moderate) opinions decreases, individuals disposed on the left edge grow in number, and right-opinion users tend to keep their number or slightly decrease. To gain a more systematic understanding of the system, we calculate the metrics M1-M3 using different discretization strategies for all three opinion snapshots from Dataset (see Table 2). We observe the following dynamical patterns: (i) growth of the population of individuals espousing left-side opinions, a decrease of those who hold middle-side opinions, and a relatively small decrease of right-side opinion persons; (ii) an extremely small increasing trend in the homophily level; (iii) an extremely small increasing trend in the polarization rate. As such, we hypothesize that the model calibrated on the same data should be able to achieve similar metric values (hereafter, reference values) at some point of its evolution and, further, demonstrate the same dynamic patterns near this point. (Note: the dissimilarity coefficient is not informative in the case of the binary opinion space; therefore, we do not calculate it.)
The dynamics of the public opinion variables in the case of the tenfold opinion space are too voluminous to present here; one can find this information in Kozitsin (2021) if necessary.

Macroscopic behavior of the model
Our experiments reveal that, regardless of the network topology, the behavior of the public opinion variables remains the same. The evolution of the model can be decomposed into two periods (see Figure 3, panels A, B). In the first, the camp populations x_1(t), …, x_m(t) converge nearly monotonically to the theoretical predictions x_1*, …, x_m* (see Subsections 4.2, 4.3). In the second, the system fluctuates around these limiting values. At the beginning, the system is characterized by almost zero assortativity because opinions are assigned at random. After a simulation has been initiated, the system rapidly becomes homophilic and then fluctuates in a positive region, demonstrating a relatively low rate of homophily (see Figure 3, panel C). For example, for the binary opinion space, the maximal observed assortativity rate is 0.05 (under the WS1 topology), a value that is far from the empirical reference value (0.109). A similar difference in assortativity values was discerned in the case of the tenfold opinion space (0.07 in simulations against the reference value of 0.141).
Further, we found that more clustered networks (random geometric, WS1, and WS2) tend to produce more homophilic systems (see Figure 3, panel D).
The typical behavior of the polarization coefficient can be easily predicted because we know how public opinion evolves: starting from some point (determined by the initial opinion distribution), the polarization coefficient should first drift to the limiting value that characterizes the polarization of the stationary-state opinion distribution {x_1*, …, x_m*}. Depending on the initial opinion configuration, this stage of evolution may feature growth (if the initial polarization level is lower than the limiting value) or decline (if it exceeds the limiting value). After the limiting value is reached, the polarization coefficient should fluctuate around it. Further, this limiting value should not depend on the network topology because the latter does not affect the stationary-state opinion distribution. Our numerical experiments (see Figure 3, panels E, F) confirm this proposal. We do not observe any relation between the network topology and the system's asymptotic polarization.
The presented results indicate that the elaborated model demonstrates a stable converging tendency: its macroscopic parameters first tend to some limiting values and then oscillate around them. Depending on the particular macroscopic metric, the corresponding limiting values may (the assortativity coefficient) or may not (public opinion variables, dissimilarity) be affected by the network topology. However, none of them are sensitive to the initial opinion configuration. These findings contradict the work of Banisch and Olbrich (2019) and Stern and Livan (2021), who reported that the community organization of the network is one of the key factors of polarization. Instead, we found that this organization makes the network more homophilic. In the limit t → ∞, the system features opinion fragmentation, which is characterized by persistent disagreement between agents. Further, one can also observe opinion polarization if the system begins from a densely concentrated opinion distribution. For example, if opinions are concentrated near the center of the opinion space, then at the initial stage of the system's evolution, opinions will be prone to antagonistic or anticonformity-based interactions (in Kozitsin (2020), this phenomenon was attributed to the striving for uniqueness) and, thus, will move towards the extreme opinion values o_1 and o_m. However, the increasing pairwise distances between agents' opinions will give way to assimilative interactions, which are less likely to occur if opinions are too close. This process will continue until a balance between assimilative and antagonistic interactions is reached.
The model exhibits good predictability with respect to public opinion dynamics: the mean-field approximation derived in Section 4 is plausible not only for complete graphs (the assumption under which it was obtained) but also in settings where social connections have a rather complex structure. Next, the model can reproduce positive assortativity (i.e., it can create a homophilic social system); however, the observed homophily rates are far lower than the empirical reference values. Further, the dissimilarity demonstrated by the system after stabilization (slightly above the corresponding reference value) provides a clue that the system can reach the reference polarization level if its initial degree of polarization is lower than the asymptotic one; this situation could arise if initial opinions are densely concentrated (for example, near the center of the opinion space).
To understand whether the simulated system can demonstrate behavior similar to the empirically observed one (see Table 2), we use the triple opinion space because it requires only a few macroscopic metric values to be analyzed (by contrast, in the case m = 10, we would need to inspect the behavior of 12 variables). A reasonably close match between simulations and empirics can be achieved if one begins a simulation run from the opinion distribution x_1(0) = 0.07, x_2(0) = 0.79, x_3(0) = 0.14 (see Figure 4). In this case, at moment t ≈ 20,000, metrics M1 and M3 match the reference values as well as their local dynamic patterns (see Table 2). An exception is the assortativity coefficient, whose simulated values lie below the reference one. These findings lead us to the following question: What modifications should be added to the model to make it reproduce the empirically observed homophily?

Possible modifications
Let us present some possible explanations for the observed discord in the assortativity coefficient between simulations and empirics, along with feasible avenues for resolving this conflict.

Methodological errors in Dataset.
The observed divergence between numerical experiments and empirics may stem from methodological errors in obtaining the underlying empirical data. Because these data were derived from a natural experiment in which individuals' opinions were estimated using heuristics (more precisely, it was assumed that users' opinions are reflected by the information sources they are subscribed to), the empirical reference values we strive to match could themselves be incorrect. Unfortunately, we cannot fix this problem; hence, we leave this issue beyond the scope of this article and assume that the reference values are identified correctly.

Noise in the estimated transition matrices. A slightly different idea is to suppose that the reference values of the macroscopic metrics are correct (i.e., individuals' opinions were estimated faithfully) but the transition matrix is identified with errors. The point is that errors in opinion identification naturally lead to mistakes in the estimated transition matrix (this is precisely what the previous paragraph discusses); however, to identify the transition matrix accurately, apart from knowing the opinion values, we should also be able to determine the influence individuals are exposed to (see Subsection 3.3 and the Appendix for details). This requires uncovering the influence network (Ravazzi et al., 2021), which is also a challenging task. Note that in Dataset, all influence weights are assumed to be equal (the influence directed at a user is the average of the opinions of the user's friends), an assumption that is unlikely to be true: some ties may be more successful in transmitting social influence than others (Bond et al., 2012). Further, an influence system retrieved from an OSN is likely incomplete because it neglects influence beyond the online world (more precisely, beyond a particular OSN). One more effect stems from our algorithm of transition matrix identification and the nature of the data (see the Appendix for details): the opinion dynamics presented in Dataset are identified under the assumption of many-to-one interactions, whereas the model assumes one-to-one interactions. Besides, factors such as selectivity or personalization algorithms (see below) may dominate. As a result, the real transition matrix may differ from the estimated one.
To model mistakes in the estimated transition matrix, we consider the binary opinion space, in which these mistakes may be parametrized by two variables (their effect is analyzed in the Data noisiness subsection below).

Selectivity. Selectivity is a well-documented tendency of social actors that we could not ignore: individuals create ties with those having similar opinions and break connections that promote uncomfortable information (Holme & Newman, 2006; Lewis et al., 2012; Neubaum et al., 2021; Sasahara et al., 2021). Along with (assimilative) social influence, selectivity is considered a main driver that makes social networks homophilic. As such, we may hypothesize that adding selectivity to the model will increase the level of homophily, which is one of our purposes. However, the question is how selectivity affects polarization.
There is a line of research devoted to modeling coevolutionary processes whereby social influence mechanisms are combined with the dynamics of social graphs driven by selectivity (Holme & Newman, 2006). We incorporate selectivity into our model by introducing the selectivity rate, a parameter from [0, 1] that governs how often an agent rewires an uncomfortable tie instead of changing opinion (see Implementation details below).

Personalization. Another relevant line of research concerns personalization algorithms. Results in this area indicate that personalization may both amplify and reduce polarization, subject to the underlying opinion formation model. Further, scholars argue that personalization algorithms may foster the formation of echo chambers and thus increase the level of homophily. Besides, it is unlikely that the opinion dynamics presented in Dataset were unaffected by personalization algorithms. On this basis, we need to incorporate personalization into our model. For our purposes, we employ one of the simplest approaches, whereby communications between individuals may be declined. More precisely, the selected agents i and j do not communicate (and the system goes to the next time step) with probability equal to the personalization rate if their opinions differ by more than Δ.
Implementation details. We implement selectivity and personalization in the model only in the cases of the triple and tenfold opinion spaces. Importantly, we combine them: if the selectivity and personalization rates are both positive, then agents can both change their connections and be affected by the personalization algorithm, which occurs in the following fashion (see Figure 5). First, the personalization algorithm checks whether |x_i(t) − x_j(t)| is greater than Δ. If it is not, communication between the agents is allowed, and agent j changes their opinion as usual. Otherwise (if the opinions are too distant), the personalization algorithm activates and prohibits the communication with probability equal to the personalization rate, in which case nothing happens (the system goes to the next time step). With the complementary probability, communication between agents i and j is allowed, and this permitted communication can go in two different ways.
In the first one (which occurs with probability equal to the selectivity rate), agent j decides to renew their social environment by replacing tie (i, j) because it makes agent j uncomfortable. The second way implies that agent j accepts influence from agent i and follows the standard opinion dynamics protocol (whatever it leads to). Note that if we set both the selectivity and personalization rates to zero in the resulting model (Model 2), we return to the previously elaborated model (Model 1).
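The interaction protocol of Model 2 described above can be sketched as follows. This is a sketch, not the authors' implementation: the identifiers (graph, opinions, selectivity_rate, personalization_rate, delta) are our own naming, and the transition matrix is indexed with the influenced agent's opinion first, as in the Appendix.

```python
import random

def model2_step(graph, opinions, T, selectivity_rate, personalization_rate,
                delta, rng=random):
    """One interaction of Model 2 (a sketch under our own naming).

    graph    : dict mapping each agent to the set of its neighbors
    opinions : dict mapping each agent to its opinion index 0, ..., m-1
    T        : T[own][influencer] is a probability distribution over new opinions
    """
    j = rng.choice(sorted(graph))            # influenced agent
    if not graph[j]:
        return                               # isolated agent: nothing happens
    i = rng.choice(sorted(graph[j]))         # influence source (a neighbor of j)
    if abs(opinions[i] - opinions[j]) > delta:
        # opinions are distant: the personalization algorithm may block the contact
        if rng.random() < personalization_rate:
            return                           # communication declined this step
        # contact allowed; selectivity may rewire the tie instead of influencing
        if rng.random() < selectivity_rate:
            candidates = sorted(set(graph) - graph[j] - {j})
            if candidates:
                k = rng.choice(candidates)
                graph[j].discard(i); graph[i].discard(j)   # drop uncomfortable tie (i, j)
                graph[j].add(k); graph[k].add(j)           # replace it with tie (j, k)
            return
    # standard opinion dynamics: j samples a new opinion from the transition matrix
    probs = T[opinions[j]][opinions[i]]
    opinions[j] = rng.choices(range(len(probs)), weights=probs)[0]
```

Setting both rates to zero reduces the step to the standard protocol of Model 1, as noted above.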

Figure 5. A graphical sketch of Model 2. The interaction of agent i (the influence source) and agent j (the influenced agent), with opinions x_i and x_j respectively, is driven by personalization (which may exclude the communication) and selectivity (which may lead to the replacement of edge (i, j) instead of the standard procedure of opinion change).
We employ the idea of noise in the transition matrix only for the binary opinion space because, under the assumption of symmetric noise, it requires only two parameters. Hence, we do not investigate how the data noisiness affects polarization patterns and concentrate only on the behavior of the assortativity coefficient.
Model 2 is investigated only under the empirically calibrated transition matrices derived for the triple and tenfold opinion spaces. This approach gives us the opportunity to analyze the effects of the selectivity and personalization factors on the dynamics of homophily and polarization. In Model 2, we use threshold Δ = 0 if m = 3 (applied to opinion values o_1 = 0, o_2 = 1, o_3 = 2) and threshold Δ = 3 if m = 10 (applied to opinion values o_1 = 0, …, o_10 = 9).
We recognize that the implementation of selectivity and personalization affects not only the opinion dynamics itself but also the transition matrix that we observe and estimate from outside (for example, the transition matrices estimated from Dataset). From this perspective, the most faithful approach would be to find the (ideal) transition matrix that, combined with selectivity and personalization, produces simulated social dynamics whose estimated transition matrix coincides with the one obtained from Dataset. However, for the sake of simplicity, we build our analysis upon the estimated transition matrix, assuming that this matrix does not depend on the selectivity and personalization factors.

Data noisiness
The ideal-typical behavior of the system in the presence of noise in the transition matrix remains the same. Thus, we concentrate on how the two noise parameters affect the limiting behavior of the assortativity coefficient.
Our analysis reveals that for a given network topology, the maximal value of assortativity depends positively on both noise parameters (see Figure 6). This result is intuitively clear. On the one hand, by increasing the first parameter, we reduce the likelihood that like-minded agents will have different opinions after an interaction. On the other hand, higher values of the second parameter amplify the probability of opinion adoption; thus, neighboring agents are more likely to espouse similar positions after an interaction. The minimal disturbance (in the Euclidean metric) we should apply to the transition matrix to achieve an acceptable value of the assortativity coefficient is 0.02 for the first parameter and 0 for the second (in the case of highly clustered networks).
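A perturbation of this kind can be sketched as follows. Note that the parameter names (eps_keep, eps_adopt) and the exact form of the perturbation are our assumptions, chosen only to match the qualitative description above: one parameter reduces the chance that like-minded agents diverge, and the other amplifies the probability of opinion adoption.

```python
import numpy as np

def perturb_binary_matrix(T, eps_keep, eps_adopt):
    """Perturb a binary-opinion transition matrix (a sketch; parametrization is ours).

    T[i, j, k] : probability that an agent with opinion i ends up with opinion k
                 after interacting with an agent holding opinion j (rows sum to 1).
    eps_keep   : reduces the chance that a like-minded pair diverges
    eps_adopt  : amplifies the probability of adopting the influencer's opinion
    """
    P = T.astype(float).copy()
    for i in (0, 1):
        flip = 1 - i
        # like-minded pair: move up to eps_keep of the "flip" mass back to "stay"
        shift = min(eps_keep, P[i, i, flip])
        P[i, i, flip] -= shift
        P[i, i, i] += shift
        # disagreeing pair: move up to eps_adopt of the "stay" mass to "adopt"
        shift = min(eps_adopt, P[i, flip, i])
        P[i, flip, i] -= shift
        P[i, flip, flip] += shift
    return P

# hypothetical baseline matrix; the disturbance size can be read off as
# the Euclidean distance np.linalg.norm(P - T)
T = np.array([[[0.98, 0.02], [0.80, 0.20]],
              [[0.20, 0.80], [0.02, 0.98]]])
P = perturb_binary_matrix(T, 0.02, 0.0)
```

The clipping with min() keeps every entry a valid probability, so each perturbed row still sums to one.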

Macroscopic behavior of Model 2
The presence of selectivity and personalization does not alter the qualitative behavior of the system (such behavior is inherited from Model 1). Besides, we have observed no situations in which the social network becomes disconnected. Nonetheless, we notice that selectivity and personalization affect the limiting values of the macroscopic metrics (see Figure 7). Recall that our purpose is to determine a combination of the selectivity and personalization rates that would raise the assortativity coefficient to 0.14 at some point. Figure 7 indicates that higher selectivity rate values lead to more homophilic systems, as expected. However, personalization has the opposite effect on assortativity. Note that for large personalization rate values, the effect of selectivity is reduced.
In the limiting case in which the personalization rate tends to 1, we obtain a system characterized by a zero level of assortativity regardless of the selectivity rate. In this case, only agents with sufficiently similar opinions can interact; thus, all they can do is engage in antagonistic interactions. The most homophilic system is obtained if the selectivity rate equals 1 and there is no personalization; then, the assortativity coefficient reaches approximately 0.5. Interestingly, the effect of topology observed for Model 1 (more clustered networks produce more homophilic systems) disappears when we increase the selectivity or personalization rates. Both selectivity and personalization have a positive effect on the system's polarization; however, in contrast to the assortativity coefficient, the dissimilarity coefficient varies within a relatively narrow interval. We also found other parameter combinations that could generate a system consistent with the empirics: for example, only a slight asynchrony is observed if we set the selectivity rate to 0.2 and the personalization rate to 0 or 0.2 (i.e., by varying the level of personalization). However, we find that the resulting level of assortativity then displays an apparent discord with the empirics (see Appendix).
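The assortativity discussed throughout this section can be measured, for instance, as the Pearson correlation between the opinions at the two ends of an edge. The sketch below implements this standard estimator; whether it coincides exactly with the coefficient used in the paper is our assumption.

```python
def opinion_assortativity(edges, opinion):
    """Pearson correlation of opinions across edge endpoints.

    edges   : list of undirected edges (u, v)
    opinion : dict mapping each node to a numeric opinion value
    """
    # count each undirected edge in both directions so the measure is symmetric
    pairs = [(opinion[u], opinion[v]) for u, v in edges]
    pairs += [(y, x) for x, y in pairs]
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    var = sum((x - mx) ** 2 for x in xs) / n   # equals the variance of ys by symmetry
    return cov / var if var > 0 else 0.0

# perfectly homophilic toy network: both edges connect like-minded nodes
r = opinion_assortativity([(0, 1), (2, 3)], {0: 0, 1: 0, 2: 1, 3: 1})
# r == 1.0; swapping opinions so every edge crosses camps gives -1.0
```

A value near zero thus corresponds to ties placed independently of opinions, matching the limiting case described above.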

Discussion
Selectivity and personalization, which we implemented in the model, substantially advanced it and made it more realistic. As a result, we managed to simulate an artificial society that demonstrates properties similar to those observed empirically at the macro scale (at a particular point in time). Our findings on the (possibly) most appropriate settings that generate empirically acceptable systems may be employed in several ways. On the one hand, these settings may offer an opportunity to understand how strong personalization and, particularly, selectivity are in the referenced OSN from which the empirical data were gathered (see Appendix). More precisely, our results indicate that without selectivity, our artificial systems cannot reproduce the desired empirics, under the assumption that the transition matrix is estimated correctly. This knowledge can be used in further studies involving this OSN.
However, this reasoning may be wrong if the transition matrix is estimated with errors. Our current results do not answer the question of whether changes in the transition matrix could lead to a full coincidence between simulations and empirics, but they at least hint that this could be the case. This problem requires additional analysis.
Next, knowing the current state of the system, we can predict its future evolution at the macro scale. However, only short-term predictions are meaningful because the transition matrix likely changes over long time intervals, reflecting events that occur both in this very OSN and beyond it. To be more specific, we calculated the transition matrix employing the second and third opinion snapshots from Dataset (recall that the previous matrices were computed using the first two) and obtained a transition matrix that differs from the previous one. From this perspective, the opinion dynamics of our model can be understood in terms of the evolution of the transition matrix.

Conclusion
In this paper, we introduced a minimal opinion formation model.

Appendix B. Dataset organization
Dataset contains information on the dynamics of the political preferences of a large sample (approximately 1.6 M) of users of VKontakte, the most popular Russian social network. The sample was constructed by randomly choosing active individuals (at least one platform interaction per month) who meet a set of natural criteria (see Kozitsin, 2020, 2021 for details). Further, the sample was cleared of isolated subgroups of online friends so that the resulting social network (whereby edges represent online friendships) consists of a single connected component. Note that Dataset is built under the assumption that friendship connections are static, which represents one of its main disadvantages.

Appendix C. Transition matrix estimation
Let us now describe how we integrate the information from Dataset into the model. We will consider the case m = 2; other situations are handled analogously. We discretize the empirical opinion scale [0, 1] by endowing users who have opinion values in the interval [0, 0.5] with the new opinion o_1 (say, o_1 = −1). Analogously, those who have opinions in the interval [0.5, 1] are marked with o_2 (say, o_2 = 1). A similar procedure is applied to the average opinions of users' friends. Next, for each i, j, and k, we calculate the quantity p_{i,j,k} as follows:

p_{i,j,k} = #{users who made the opinion change o_i → o_k and whose friends' average opinion is o_j} / #{users whose opinion is o_i and whose friends' average opinion is o_j},   (A1)

where #{…} denotes the cardinality of a set. To put it simply, (A1) is the fraction of individuals who made the opinion change o_i → o_k among those whose opinion is o_i and whose friends' average opinion equals o_j. To avoid noise, we additionally require that an opinion shift should have a magnitude of more than 0.05 on the continuous scale. Note that in (A1), we use the first two snapshots to calibrate the transition matrix's elements. Acting analogously, one can estimate the transition matrix from the second and third opinion snapshots.
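The estimation procedure above can be sketched for the case m = 2 as follows. The variable names, the treatment of the interval boundary at 0.5, and the convention that sub-threshold shifts count as "no change" are our assumptions.

```python
from collections import defaultdict

def estimate_transition_matrix(x_old, x_new, friends, threshold=0.5, min_shift=0.05):
    """Estimate p[i, j, k] from two opinion snapshots, mirroring (A1) for m = 2.

    x_old, x_new : dicts, user -> opinion in [0, 1] at the two snapshots
    friends      : dict, user -> non-empty iterable of the user's friends
    """
    def discretize(value):
        return 0 if value <= threshold else 1   # o_1 vs o_2

    counts = defaultdict(int)   # (i, j, k) -> number of observed transitions
    totals = defaultdict(int)   # (i, j)    -> number of users in that condition
    for u in x_old:
        i = discretize(x_old[u])
        avg = sum(x_old[v] for v in friends[u]) / len(friends[u])
        j = discretize(avg)                      # many-to-one influence proxy
        totals[i, j] += 1
        if abs(x_new[u] - x_old[u]) > min_shift:  # ignore tiny (noisy) shifts
            k = discretize(x_new[u])
        else:
            k = i                                 # treated as "no change"
        counts[i, j, k] += 1
    return {key: counts[key] / totals[key[0], key[1]] for key in counts}
```

Each returned value is exactly the fraction in (A1): transitions o_i → o_k among users in condition (o_i, o_j).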
We should note that the substantial opinion diversity present in Dataset ensures that all combinations of o_i and o_j are available in sufficient quantity, providing the opportunity to estimate all the transition matrix elements (see Table C1). It is also important to note that our algorithm of transition matrix estimation assumes many-to-one interactions (because we approximate the influence directed at a user by the average opinion of their friends), whereas our model is built on the idea of one-to-one interactions. One could suggest transforming the model and using the many-to-one assumption in simulations; however, we do not do so. The reason is rather technical: the averaging procedure may not be applicable to discrete opinion spaces.