Interacting innovation processes

In this work, we introduce a general model for a collection of innovation processes in order to model and analyze the interaction among them. We provide theoretical results, analytically proven, and we show how the proposed model fits the behaviors observed in some real data sets (from Reddit and Gutenberg). It is worth mentioning that the given applications are only examples of the potentialities of the proposed model and related results: due to its abstractness and generality, it can be applied to many interacting innovation processes.

Analyzing the innovation process, that is, the underlying mechanisms through which novelties emerge, diffuse and trigger further novelties, is of primary importance in many areas (biology, linguistics, social science and others [8,9,13,15,25,31,32,34,35,36,37,39]). We can define novelties (or innovations) as the first-time occurrences of some event. A widely used mathematical object that models an innovation process is an urn model with infinitely many colors, also known as a species sampling sequence [16,27,45]. Let $C_1$ be the first observed color. Then, given the colors $C_1, \dots, C_t$ of the first $t$ extractions, the color of the $(t+1)$-th extracted ball is new (i.e. not already drawn in the previous extractions) with a probability $Z^*_t$, which is a function of $C_1, \dots, C_t$ (sometimes called the "birth probability"), and it is equal to the already observed color $c$ with probability $P_{c,t} = \sum_{n=1}^{t} Q_{n,t} I_{\{C_n = c\}}$, where $Q_{n,t}$ is a function of $C_1, \dots, C_t$. The quantities $Z^*_t$ and $Q_{n,t}$ specify the model: precisely, $Z^*_t$ describes the probability of having a new color (that is, a novelty) at time-step $t+1$, and $Q_{n,t}$ is the weight at time-step $t$ associated to extraction $n$, with $1 \le n \le t$, so that the probability of having at time-step $t+1$ the "old" color $c$ is proportional to the total weight at time-step $t$ associated to that color (a reinforcement mechanism, sometimes called the "weighted preferential attachment" principle). Note that the number of possible colors is not fixed a priori: new colors continuously enter the system. We can see the urn with infinitely many colors as the space of possibilities, while the sequence of extracted balls with their colors represents the history that has actually been realized.
The Blackwell-MacQueen urn scheme [10,27] provides the most famous example of an innovation process. According to this model, at time-step $t+1$ a new color is observed with probability given by a deterministic function of $t$, that is $Z^*_t = z^*(t) = \theta/(\theta + t)$, where $\theta > 0$, and an old color is observed with a probability proportional to the number $K_{c,t}$ of times that color was extracted in the previous extractions: $Q_{n,t} = q_n(t) = 1/(\theta + t)$, i.e. $P_{c,t} = K_{c,t}/(\theta + t)$. This is the "simple" preferential attachment rule, also called the "popularity" principle. This urn model is also known as the Dirichlet process [12] or as Hoppe's model [19] and, in terms of random partitions, it corresponds to the so-called Chinese restaurant process [28]. Afterwards, it was extended by introducing an additional parameter, and the resulting model has been called the Poisson-Dirichlet model [22,28,29,38]. More precisely, for the Poisson-Dirichlet model, we have
$$Z^*_t = \frac{\theta + \gamma D_t}{\theta + t}, \qquad Q_{n,t} = \frac{1 - \gamma / K_{C_n,t}}{\theta + t}, \qquad \text{and so} \qquad P_{c,t} = \frac{K_{c,t} - \gamma}{\theta + t}, \tag{1}$$
where $0 \le \gamma < 1$, $\theta > -\gamma$ and $D_t$ denotes the number of distinct extracted colors until time-step $t$.
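The predictive rule of the Poisson-Dirichlet model can be turned into a sampler almost verbatim. The following is a minimal sketch, not code from the paper: the function name `sample_poisson_dirichlet` and its arguments are illustrative assumptions, and colors are labelled by their order of first appearance.

```python
import random

def sample_poisson_dirichlet(t_max, theta, gamma, seed=0):
    """Draw t_max colors: new with prob. (theta + gamma*D_t)/(theta + t),
    old color c with prob. (K_{c,t} - gamma)/(theta + t)."""
    if t_max < 1:
        raise ValueError("t_max must be at least 1")
    if not (0 <= gamma < 1 and theta > -gamma):
        raise ValueError("need 0 <= gamma < 1 and theta > -gamma")
    rng = random.Random(seed)
    counts = [1]      # counts[c] = K_{c,t}; the first draw creates color 0
    sequence = [0]    # sequence[t-1] = C_t
    for t in range(1, t_max):
        d_t = len(counts)  # number of distinct colors seen so far
        p_new = (theta + gamma * d_t) / (theta + t)
        if rng.random() < p_new:
            counts.append(1)
            sequence.append(d_t)
        else:
            # weights K_{c,t} - gamma are strictly positive since K >= 1 > gamma
            weights = [k - gamma for k in counts]
            c = rng.choices(range(d_t), weights=weights)[0]
            counts[c] += 1
            sequence.append(c)
    return sequence, counts
```

For instance, `sample_poisson_dirichlet(2000, theta=1.0, gamma=0.5)` returns the sequence of drawn colors together with the final counts $K_{c,t}$; the number of distinct colors is `len(counts)`.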
From an applicative point of view, as an innovation process, the Poisson-Dirichlet process has the merit of reproducing in many cases the correct basic statistics, namely Heaps' law [17,18] and the (generalized) Zipf's law [46,47,48], which quantify, respectively, the rate at which new elements appear and the frequency distribution of the elements. In particular, Heaps' law states that the number $D_t$ of distinct observed elements (i.e. colors, according to the metaphor of the urn) when the system consists of $t$ elements (i.e. after $t$ extractions from the urn) follows a power law with an exponent smaller than or equal to 1 and, for the Poisson-Dirichlet model, we have $D_t \propto t^{\gamma}$ for $0 < \gamma < 1$ (while $D_t \propto \ln(t)$ for $\gamma = 0$).
Recently, a new model, called the urn with triggering, which includes the Poisson-Dirichlet process as a particular case, has been introduced and studied [1,40,41,42]. This model is based on Kauffman's principle of the adjacent possible [23]: the model starts with an urn containing a finite number of balls with distinct colors and, whenever a color is extracted for the first time, a set of balls with new colors is added to the urn. This represents Kauffman's idea that, when a novelty occurs, it triggers further potential novelties. In particular, the urn with triggering has the merit of providing a very clear representation of the evolution dynamics of the Poisson-Dirichlet process. An urn initially contains $N_0 > 0$ balls of distinct colors. Then, at each time-step $t+1$, a ball is drawn at random from the urn and (a) if the color of the extracted ball is new, i.e. it has not been extracted in the previous extractions, then we replace the extracted ball by $\hat{\rho}$ balls of the same color as the extracted ball plus $(\nu + 1)$ balls of distinct new colors, i.e. not already present in the urn; (b) if the color of the extracted ball is old, i.e. it has already been extracted in the previous extractions, we replace the extracted ball by $(1 + \rho)$ balls of the same color as the extracted one.
It is easy to verify that, when the balance condition $\hat{\rho} + \nu = \rho$ is satisfied (this means that at each time-step the number of balls added to the urn is always $\rho$, regardless of the outcome of the extraction), the above updating rule gives rise to the above probabilities (1), taking $\rho > \nu \ge 0$, $\theta = N_0/\rho$ and $\gamma = \nu/\rho$.
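The updating rules (a) and (b) can be sketched in a few lines by tracking ball counts rather than individual balls. This is an illustrative sketch under the balance condition, with hypothetical names (`urn_with_triggering`, `unseen`, `seen`); it also makes the bookkeeping behind $\theta = N_0/\rho$ and $\gamma = \nu/\rho$ explicit, since the urn always holds $N_0 + \nu D_t$ balls of never-drawn colors out of $N_0 + \rho t$ balls in total.

```python
import random

def urn_with_triggering(t_max, n0, rho, nu, seed=0):
    """Simulate the urn with triggering under the balance condition."""
    rho_hat = rho - nu  # balance condition: rho_hat + nu = rho
    if n0 <= 0 or nu < 0 or rho_hat <= 0:
        raise ValueError("need n0 > 0 and rho > nu >= 0")
    rng = random.Random(seed)
    unseen = n0     # balls whose color has never been drawn
    seen = []       # seen[c] = number of balls of the already-drawn color c
    distinct = []   # distinct[t-1] = D_t
    for _ in range(t_max):
        total = unseen + sum(seen)
        if rng.random() * total < unseen:
            # rule (a): drawn ball -> rho_hat copies + (nu + 1) new colors
            unseen += nu            # net change: -1 + (nu + 1)
            seen.append(rho_hat)
        else:
            # rule (b): drawn ball -> (1 + rho) copies, i.e. rho extra balls
            c = rng.choices(range(len(seen)), weights=seen)[0]
            seen[c] += rho
        distinct.append(len(seen))
    return distinct, unseen, seen
```

With integer parameters such as `n0=3, rho=4, nu=2`, the returned counts satisfy `unseen == n0 + nu * distinct[-1]` and `unseen + sum(seen) == n0 + rho * t_max`, so the probability of a new color at each step is exactly $(\theta + \gamma D_t)/(\theta + t)$.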
Since it is important to understand how different innovation processes affect each other, this work aims at introducing and analyzing a model for a finite network of innovation processes. In the proposed model, for each node $h$, the probability of observing a new or an old item depends not only on the path of observations recorded for $h$ itself, but also on the outcomes registered for the other nodes $j \ne h$. More precisely, we introduce a system of $N$ urns with triggering that interact with each other as follows: (i) the probability of exploitation of an old item $c$ by node $h$, i.e. the probability of extracting from urn $h$ a color $c$ already drawn in the past from an urn of the system (not necessarily from $h$ itself), has an increasing dependence not only on the number of times $c$ has been observed in node $h$ itself (which could even be zero), but also on the number of times $c$ has been observed in each of the other nodes; (ii) the probability of production (or exploration) of a novelty for the entire system by node $h$, i.e. the probability of extracting from urn $h$ a color never extracted before from any of the urns in the system, has an increasing dependence not only on the number of novelties produced by $h$ itself in the past, but also on the number of novelties produced by each of the other nodes in the past.
In particular, (ii) means that Kauffman's principle of the adjacent possible is at the "system level": when urn $h$ produces a novelty for the system, this fact triggers further potential novelties in all the urns of the system, not only in urn $h$ itself. The two different dependencies described above ((i) and (ii)) are tuned by two different matrices (called $\Gamma$ and $W$ in the sequel).
Despite the large number of scientific works regarding interacting urns with a finite set of colors (see, for instance, [2,3] and the references therein), in the existing literature we have found only a few papers about a collection of interacting (in the same sense as in the present work) urns with infinitely many colors, namely [14,21,43]. In the model provided in [14] (see Example 3.8 in that paper), there is a finite collection of Dirichlet processes with random reinforcement. More precisely, in that model there is a random weight $W_{t,h}$ associated to the extraction at time-step $t$ from urn $h$, so that the probability of extracting from urn $h$ an old color $c$ (here, the term "old" refers to urn $h$, that is, a color already extracted from urn $h$) is proportional to the weight associated to that color, specifically $\sum_{n=1}^{t} W_{n,h} I_{\{C_{n,h} = c\}} / (\theta + \sum_{n=1}^{t} W_{n,h})$. The interaction across the urns is introduced by means of the weights, which could be stochastically dependent: each $W_{n,h}$ may be the same for each urn $h$, or a function of the observed outcomes of the other urns, or a function of some common (observable or latent) variables. This model is different from ours: we consider Poisson-Dirichlet processes, not only Dirichlet processes, and, differently from the model in [14], for us the notion of "old" or "new" color refers to the entire system, not to each single urn, and Kauffman's principle of the adjacent possible is at the system level, as explained above. Our work and [21] share the fact that the proposed models are both a collection of urns with triggering with an interacting dynamics that brings Kauffman's principle of the adjacent possible from the single agent to the network of agents. Adopting the terminology of [21], we can say that both interacting mechanisms are based on the construction and the updating of a "social" urn for each network node, from which the extractions take place, but the construction and the updating rules of the social urns are deeply different in the two models. In particular, differently from [21], we introduce the notion of "new" and "old" at the system level. In [21] the authors focus on the novelties in each sequence (novelty in $h$ = first appearance in $h$ of a new item), which they call "discoveries", while we also study the sequence of the novelties for the whole system produced by each agent. Furthermore, in [21] the extraction of an "old" item in a certain network node does not affect the other nodes, whereas in our model we also have an interacting reinforcement mechanism for the "old" items: the probability of the extraction of an old item depends on the number of times it has been observed in all the nodes. This allows us to get a specific result on the distribution of the observations in the system among the different items observed. Finally, [43] provides a multi-agent version of the urn with triggering model, which is specific for describing the birth and the evolution of social networks.
While the model we propose is extremely general and may also be employed in other contexts, it has been tested on two real data sets: one taken from the social content aggregation website Reddit, collected, processed and made freely available on the web by the authors of [24], and one obtained from the on-line library Project Gutenberg, which is a collection of public domain books. We show that both data sets exhibit empirical behaviors that are in accordance with those predicted by the proven theoretical results.
The rest of the paper is structured as follows. In Section 1 we introduce the model and we explain the role played by each model parameter. In Section 2 we illustrate the theoretical results and we show how some real innovation processes can be well described using the proposed model. Section 3 is devoted to the discussion of the achieved results and the presentation of possible future developments. Finally, the supplementary material collects the analytical proofs of all the presented theoretical results.

Methods
The model we propose essentially consists of a finite system of interacting urns with triggering. More precisely, suppose we have $N$ urns (which may represent $N$ different agents of a system), labeled from 1 to $N$. At time-step 0, the colors inside each urn are different from those in the other urns. Let $N_{0,h} > 0$ be the number of balls, all of distinct colors, inside urn $h$. Then, at each time-step $t \ge 1$, one ball is drawn at random from each urn and, for any $h = 1, \dots, N$, urn $h$ is updated according to the colors extracted from urn $h$ itself and from all the other urns $j \ne h$ as follows:
• if the color of the ball extracted from urn $h$ is "new" (i.e., it appears for the first time in the system), then we replace (inside urn $h$) the extracted ball by $\hat{\rho}_{h,h} > 0$ balls of the same color plus $(\nu_{h,h} + 1)$, with $\nu_{h,h} \ge 0$, balls of distinct "new" colors (i.e. not already present in the system);
• if the color of the ball extracted from urn $h$ is "old" (i.e., it has already been extracted in the system), we add $\rho_{h,h} > 0$ balls of the same color into urn $h$;
• for each $j \ne h$, if the color of the ball extracted from urn $j$ is "new" (i.e., it appears for the first time in the system), then into urn $h$ we add $\hat{\rho}_{j,h} \ge 0$ balls of the same color as the one extracted from urn $j$ plus $\nu_{j,h} \ge 0$ balls of distinct "new" colors (i.e. not already present in the system);
• for each $j \ne h$, if the color of the ball extracted from urn $j$ is "old" (i.e., it has already been extracted in the system), then into urn $h$ we add $\rho_{j,h} \ge 0$ balls of the same color as the one extracted from urn $j$.
As already pointed out, the terms "new" and "old" refer to the entire system: a "new" color is a color that has never been extracted from any urn of the system, while an "old" color is a color that has already been extracted from at least one urn of the system, although it may never have been extracted from some other urns in the system.
We assume that the "new" colors added to a certain urn are always different from those added to the other urns (at the same time-step or in the past). Because of this fact, together with the assumption that initially the colors in the urns are different from each other, we cannot have the same new color extracted simultaneously from different urns. In other words, we cannot have the same novelty produced simultaneously by different agents of the system. Therefore, for each observed new color (novelty) $c$, there exists a unique urn (agent), say $j^*(c)$, in the system that produced it. However, in a time-step following its first extraction, color $c$ could also be extracted from another urn $h \ne j^*(c)$, as a consequence of the interaction among the urns (agents). Indeed, the "contamination" of the color-set of urn $h$ with the colors present in the other urns is possible by means of the interaction terms $\hat{\rho}_{j,h}$ and/or $\rho_{j,h}$ in the above model dynamics.
As in the standard Poisson-Dirichlet model, we assume the balance condition
$$\hat{\rho}_{j,h} + \nu_{j,h} = \rho_{j,h}, \quad \text{i.e.} \quad \hat{\rho}_{j,h} = \rho_{j,h} - \nu_{j,h}, \tag{2}$$
so that, at each time-step, each urn $j$ contributes to increase the number of balls inside urn $h$ by $\rho_{j,h} \ge 0$, with $\rho_{h,h} > 0$. Therefore, at each time-step, the number of balls added to urn $h$ is $\rho_h = \sum_{j=1}^{N} \rho_{j,h} > 0$. Hence, if we denote by $C_{t+1,h}$ the color extracted from urn $h$ at time-step $t+1$, we have
$$Z^*_{t,h} = P(C_{t+1,h} = \text{"new"} \mid \text{past}) = \frac{N_{0,h} + \sum_{j=1}^{N} \nu_{j,h} D^*_{t,j}}{N_{0,h} + \rho_h t},$$
where $D^*_{t,j}$ denotes the number, until time-step $t$, of distinct observed colors extracted for the first time from urn $j$, that is, the number of distinct novelties for the whole system "produced" by urn (agent) $j$ until time-step $t$. Moreover, for each old color $c$, we have
$$P_t(h,c) = P(C_{t+1,h} = c \mid \text{past}) = \frac{\sum_{j=1}^{N} \rho_{j,h} K_t(j,c) - \nu_{j^*(c),h}}{N_{0,h} + \rho_h t},$$
where $K_t(j,c)$ denotes the number of times the color $c$ has been extracted from urn $j$ until time-step $t$ and $j^*(c)$ denotes the urn from which the color $c$ has been extracted for the first time. (Note that $\rho_{j^*(c),h} = 0$ implies $\nu_{j^*(c),h} = 0$ by the balance condition.)
Without loss of generality, to ease the notation we adopt a different parametrization by setting
$$\theta_h = \frac{N_{0,h}}{\rho_h}, \qquad \gamma_{j,h} = \frac{\nu_{j,h}}{\rho_h}, \qquad \lambda_{j,h} = \frac{\hat{\rho}_{j,h}}{\rho_h} \qquad \text{and} \qquad w_{j,h} = \frac{\rho_{j,h}}{\rho_h}, \tag{3}$$
where $\theta_h > 0$, $0 \le \gamma_{j,h} \le 1$ with $\gamma_{j,h} < 1$ for $j = h$, $0 \le \lambda_{j,h} \le 1$ with $\lambda_{j,h} > 0$ for $j = h$, and $0 \le w_{j,h} \le 1$ with $w_{j,h} > 0$ for $j = h$. This choice can be read as a normalization of the parameters since, for each $h = 1, \dots, N$, we have $\sum_j w_{j,h} = 1$ and so, by the balance condition, $0 \le \sum_j \gamma_{j,h} < 1$ and $0 < \sum_j \lambda_{j,h} \le 1$. With the new parametrization, we obtain
$$Z^*_{t,h} = \frac{\theta_h + \sum_{j=1}^{N} \gamma_{j,h} D^*_{t,j}}{\theta_h + t} \tag{4}$$
and, for each "old" color $c$,
$$P_t(h,c) = \frac{\sum_{j=1}^{N} w_{j,h} K_t(j,c) - \gamma_{j^*(c),h}}{\theta_h + t} = \frac{\lambda_{j^*(c),h} + \sum_{j=1}^{N} w_{j,h} \big(K_t(j,c) - I_{\{j = j^*(c)\}}\big)}{\theta_h + t}. \tag{5}$$
Note that the probability that urn (agent) $h$ will produce at time-step $t+1$ a novelty for the entire system has an increasing dependence on the number $D^*_{t,j}$ of novelties produced by urn (agent) $j$ until time-step $t$, and the parameter $\gamma_{j,h}$ regulates this dependence. In other words, Kauffman's principle of the adjacent possible is at the "system level": for each pair $(j,h)$ of urns in the system, the parameter $\gamma_{j,h}$ quantifies how much the production of a novelty by urn $j$ induces potential novelties in urn $h$. On the other hand, the probability that from urn $h$ we will extract at time-step $t+1$ an old color $c$ has an increasing dependence on the number $K_t(j,c)$ of times the color $c$ has been drawn from urn $j$ until time-step $t$, and the parameter $w_{j,h}$ quantifies how much the number $K_t(j,c)$ leads toward a future extraction of a ball of color $c$ from urn $h$.
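The dynamics in the normalized parametrization can be simulated directly from the counts $D^*_{t,j}$ and $K_t(j,c)$, without storing the urn contents. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `simulate_interacting_urns` is hypothetical, the matrices are nested lists indexed as `Gamma[j][h]` $= \gamma_{j,h}$ and `W[j][h]` $= w_{j,h}$, and the probability of a new color and the weight of an old color $c$ in urn $h$ are taken as $(\theta_h + \sum_j \gamma_{j,h} D^*_{t,j})/(\theta_h + t)$ and $\sum_j w_{j,h} K_t(j,c) - \gamma_{j^*(c),h}$, respectively.

```python
import random

def simulate_interacting_urns(t_max, theta, Gamma, W, seed=0):
    """Simulate N interacting urns; columns of W must sum to 1 and
    0 <= Gamma[j][h] <= W[j][h] (so that lambda = W - Gamma >= 0)."""
    rng = random.Random(seed)
    n = len(theta)
    producer = []        # producer[c] = j*(c), the urn that first drew color c
    counts = []          # counts[c][j] = K_t(j, c)
    d_star = [0] * n     # d_star[j] = D*_{t,j}
    history = []         # history[t-1] = (D*_{t,1}, ..., D*_{t,N})
    for t in range(t_max):
        draws = []
        for h in range(n):   # all urns draw from the state at time-step t
            p_new = (theta[h] + sum(Gamma[j][h] * d_star[j] for j in range(n)))
            p_new /= theta[h] + t
            if rng.random() < p_new:
                draws.append(None)   # a novelty for the whole system
            else:
                weights = [
                    max(0.0, sum(W[j][h] * counts[c][j] for j in range(n))
                        - Gamma[producer[c]][h])
                    for c in range(len(producer))
                ]
                draws.append(rng.choices(range(len(producer)), weights=weights)[0])
        for h, c in enumerate(draws):   # simultaneous update of the system
            if c is None:
                c = len(producer)       # fresh label: novelties never coincide
                producer.append(h)
                counts.append([0] * n)
                d_star[h] += 1
            counts[c][h] += 1
        history.append(tuple(d_star))
    return history, counts, producer
```

A possible call, with parameters chosen only for illustration, is `simulate_interacting_urns(500, [1.0, 1.0], [[0.4, 0.1], [0.2, 0.3]], [[0.7, 0.4], [0.3, 0.6]])`. The `max(0.0, ...)` guard only protects against floating-point round-off when $\lambda_{j^*(c),h} = 0$.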
As particular cases, the case $N = 1$ reduces to the classical Poisson-Dirichlet process with parameters $\theta > 0$ and $0 \le \gamma < 1$, and the case of independence corresponds to $w_{j,h} = 0$ (and so $\gamma_{j,h} = \lambda_{j,h} = 0$) for each $j \ne h$. In the latter case, by the model definition, the colors are not shared by the agents, because each urn has colors different from those inside the other urns. Indeed, for $N$ independent Poisson-Dirichlet processes the probability of having colors in common is null.
Chinese restaurant metaphor. It is also worthwhile to recall that a standard metaphor used to represent the random partition induced by the Poisson-Dirichlet process, that is, the random partition of the extracted balls among the observed colors, is the "Chinese restaurant" metaphor: suppose we have a restaurant with infinitely many tables and, at each time-step $t+1$, a customer enters and sits at a table, with probabilities $Z^*_t$ and $P_{c,t}$ given in (1) as the probabilities of sitting at an empty table and at an already occupied table, respectively. The random partition induced at time-step $t$ is the random allocation of the customers who have arrived until time-step $t$ among the occupied tables. The interacting model introduced above can be represented with a similar metaphor. More precisely, suppose we have a restaurant with infinitely many tables where, at each time-step, $N$ customers enter simultaneously. Each customer belongs to a specific category $h = 1, \dots, N$. Then, at time-step $t+1$, the probability that the customer belonging to category $h$ sits at an empty table is $Z^*_{t,h}$ defined in (4) and the probability that she sits at an already occupied table is $P_t(h,c)$ defined in (5). We cannot have customers belonging to different categories who simultaneously occupy the same empty table. However, the sharing of a table by multiple categories is possible, after the first occupation of the table, because of the presence of the interaction terms $\lambda_{j,h}$ and $w_{j,h}$ in (5). The probability $Z^*_{t,h}$ increases not only with the number of distinct tables occupied by customers of category $h$ until time-step $t$, but also with the numbers of distinct tables occupied by customers of each other category $j \ne h$. The parameters $\gamma_{j,h}$ rule these dependencies. Similarly, the probability $P_t(h,c)$ naturally has an increasing dependence on the number of customers already seated at that table, but each of these customers has a different weight, i.e. $w_{j,h}$, according to her category: indeed, the parameter $w_{j,h}$ regulates how much the number of customers of category $j$ sitting at a table drives a customer of category $h$ to choose that table. For the sake of clarity, we have summarized in Table 1 how the quantities and the events involved in the proposed model can be interpreted through either the urn metaphor or the Chinese restaurant metaphor.
Matrix notation. In order to present the theoretical results, we set $\Gamma$, $W$, $\Lambda$ equal to the nonnegative $N \times N$ square matrices with elements $\gamma_{j,h}$, $w_{j,h}$ and $\lambda_{j,h}$, respectively. We recall that, by the balance condition (2) and the reparametrization (3), we have
$$W = \Gamma + \Lambda, \qquad W^\top \mathbf{1} = \mathbf{1}, \qquad \mathbf{0} \le \Gamma^\top \mathbf{1} < \mathbf{1}, \qquad \mathbf{0} < \Lambda^\top \mathbf{1} \le \mathbf{1},$$
where $\mathbf{1}$ and $\mathbf{0}$ denote the vectors with all the components equal to 1 and 0, respectively, and the inequalities are meant componentwise. As observed above, the matrix $\Gamma$ rules the production of potential novelties and, in particular, its off-diagonal elements regulate the interaction among the agents with respect to this issue, while the matrix $W$ rules the interaction among the agents with respect to the choice of an old item.

Results
In this section we first present the theoretical results and then the empirical results related to two real data sets. The proofs of the former are collected in the supplementary material, which may be found together with the online version at [5].

Theoretical results.
The first result states that, if $\Gamma$ is irreducible, that is, the graph with the agents as nodes and with $\Gamma$ as the adjacency matrix is strongly connected, then $D^*_{t,h} \propto t^{\gamma^*}$ a.s. for all $h = 1, \dots, N$, that is, all the $D^*_{t,h}$ grow with the same Heaps' exponent $\gamma^* \in (0,1)$. This means that, at the steady state, all the agents of the network produce innovations for the system at the same rate. In addition, the ratio $D^*_{t,h}/D^*_{t,j}$ provides a strongly consistent estimator of the ratio $u_h/u_j$ of the relative centrality scores (with respect to $\Gamma$) of the two nodes $h$ and $j$. More precisely, we have
Theorem 2.1. Suppose that the matrix $\Gamma$ is irreducible. Denote by $\gamma^* \in (0,1)$ the Perron-Frobenius eigenvalue of $\Gamma$, by $v$ the corresponding right eigenvector with strictly positive entries and such that $v^\top \mathbf{1} = 1$ and, finally, denote by $u$ the corresponding left eigenvector with strictly positive entries and $v^\top u = 1$. Then, for each $h = 1, \dots, N$, we have
$$t^{-\gamma^*} D^*_{t,h} \xrightarrow{a.s.} D^{**}_{\infty,h},$$
where $D^{**}_{\infty,h}$ is a finite strictly positive random variable. Moreover, for each pair of indexes $h, j = 1, \dots, N$, we have
$$\frac{D^*_{t,h}}{D^*_{t,j}} \xrightarrow{a.s.} \frac{u_h}{u_j}.$$
As a consequence, since the number $D^*_t$ of distinct items observed in the entire system until time-step $t$ coincides by model definition with $\sum_{h=1}^{N} D^*_{t,h}$, we also have that this number grows as $t^{\gamma^*}$, i.e. $t^{-\gamma^*} D^*_t$ converges almost surely to a finite strictly positive random variable.
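The quantities appearing in Theorem 2.1 can be computed numerically for a given matrix $\Gamma$. The following is a minimal pure-Python sketch based on power iteration; the function names `perron` and `centrality_scores` are illustrative, and the iteration is applied to $I + \Gamma$ (an assumption of this sketch, not a step taken from the paper) because the shift makes an irreducible matrix primitive and therefore guarantees convergence.

```python
def perron(matrix, iters=2000):
    """Perron-Frobenius eigenvalue and positive right eigenvector
    (normalized to sum 1) of an irreducible nonnegative matrix."""
    n = len(matrix)
    x = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iters):
        # multiply by (I + matrix); its dominant eigenvalue is 1 + gamma*
        y = [x[i] + sum(matrix[i][k] * x[k] for k in range(n)) for i in range(n)]
        s = sum(y)
        eigenvalue = s - 1.0   # because sum(x) == 1 before the product
        x = [yi / s for yi in y]
    return eigenvalue, x

def centrality_scores(Gamma):
    """Return (gamma*, v, u) with Gamma v = gamma* v, u^T Gamma = gamma* u^T,
    v^T 1 = 1 and v^T u = 1."""
    n = len(Gamma)
    gamma_star, v = perron(Gamma)
    transposed = [[Gamma[j][i] for j in range(n)] for i in range(n)]
    _, u = perron(transposed)   # left eigenvector = right eigenvector of Gamma^T
    scale = sum(vi * ui for vi, ui in zip(v, u))
    u = [ui / scale for ui in u]
    return gamma_star, v, u
```

For the illustrative matrix `[[0.4, 0.1], [0.2, 0.3]]` the eigenvalues are 0.5 and 0.2, so the sketch returns $\gamma^* = 0.5$, $v = (1/2, 1/2)$ and $u = (4/3, 2/3)$, i.e. a limit ratio $u_1/u_2 = 2$.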
Furthermore, if we denote by $(D_{t,h})$ the discovery process [21] for agent $h$, that is, if we denote by $D_{t,h}$ the number of distinct items adopted by agent $h$ until time-step $t$, then we have $D^*_{t,h} \le D_{t,h} \le D^*_t$ and so we get $D_{t,h} = O(t^{\gamma^*})$ and $1/D_{t,h} = O(t^{-\gamma^*})$, which, in particular, imply that, when the quantities $D_{t,h}$ have an asymptotic power law behavior, they necessarily have the same Heaps' exponent, equal to $\gamma^*$. In addition, dividing the above inequalities by $t^{\gamma^*}$ and applying Theorem 2.1, we obtain, almost surely,
$$0 < \lim_{t} \frac{D^*_{t,h}}{t^{\gamma^*}} \le \liminf_{t} \frac{D_{t,h}}{t^{\gamma^*}} \le \limsup_{t} \frac{D_{t,h}}{t^{\gamma^*}} \le \lim_{t} \frac{D^*_{t}}{t^{\gamma^*}} < +\infty.$$
The second result of the present work states that, if $W$ is irreducible, that is, the graph with the agents as nodes and $W$ as the adjacency matrix is strongly connected, then, for each observed item $c$, the number of times item $c$ has been adopted by agent $h$ grows linearly. Moreover, at the steady state, the times item $c$ has been adopted in the whole system are uniformly distributed among the agents. This concept can be reformulated more clearly using the metaphor of the Chinese restaurant: the limit composition of each table $c$ is the uniform one (with respect to the categories). More precisely, we have
Theorem 2.2. Suppose that the matrix $W$ is irreducible. Then, for each $h = 1, \dots, N$, we have
$$\frac{K_t(h,c)}{t} \xrightarrow{a.s.} K_\infty(c)$$
for each observed color $c$ in the system, where $K_\infty(c)$ is a suitable random variable that takes values in $(0,1]$ and does not depend on $h$. As a consequence, for each $h = 1, \dots, N$, we also have that
$$\frac{K_t(h,c)}{\sum_{j=1}^{N} K_t(j,c)} \xrightarrow{a.s.} \frac{1}{N}.$$

Empirical results.
In this subsection we show that the behaviors predicted by the previous theoretical results match the ones we actually observe in two different real data sets: one taken from the social content aggregation website Reddit, collected, processed and made freely available on the web by the authors of [24] at https://github.com/corradomonti/demographic-homophily, and one obtained from the on-line library Project Gutenberg at https://www.gutenberg.org/. In order to illustrate these examples, we adopt the metaphor of the Chinese restaurant and so, for each of them, we identify the customers' categories and the tables we are looking at. In both examples, we consider $N = 2$ categories with their sequences of customers who select the tables. See Table 2 for a guide on how to interpret the quantities and the events of interest in the considered data sets in terms of the Chinese restaurant metaphor.
We analyze the processes $(D^*_{t,h})$ and $(D_{t,h})$, with $h = 1, 2$, and the composition of the tables, constructed starting from the real data, in order to verify whether they exhibit a behavior along time in agreement with the theoretical results of the previous section. Specifically, we point out: 1) the power law behavior of the processes $(D^*_{t,h})$ and $(D_{t,h})$, with $h = 1, 2$; 2) the fact that the above processes increase with the same Heaps' exponent (the constant $\gamma^*$ in Theorem 2.1); 3) the convergence of the ratio $D^*_{t,1}/D^*_{t,2}$, or equivalently of the difference $\log_{10}(D^*_{t,1}) - \log_{10}(D^*_{t,2})$, as $t \to +\infty$; 4) the convergence of the composition of the tables toward the uniform one (as stated in Theorem 2.2). For points 1) and 2), we follow the standard method in the literature: we provide the $\log_{10}$-$\log_{10}$ plot of the considered processes and the estimate of the common slope of the corresponding lines by a least squares interpolation. The goodness of fit of the provided lines with the same slope is supported by the extremely high value of the $R^2$ index. Regarding point 3), we plot the observed sequence $\log_{10}(D^*_{t,1}) - \log_{10}(D^*_{t,2})$ along time, in order to highlight how its fluctuations decrease along time and how it asymptotically stabilizes. The limit of this process is estimated as the difference between the intercepts of the two lines obtained for $D^*_{t,1}$ and $D^*_{t,2}$ in the $\log_{10}$-$\log_{10}$ plot. This value, denoted as $\hat{u}$, represents an estimate of the difference $u = \log_{10}(u_1/u_2) = \log_{10}(u_1) - \log_{10}(u_2)$, where $r = u_1/u_2$ is the limit quantity in the second part of Theorem 2.1, which is also the ratio of the centrality scores with respect to $\Gamma$ of the two categories. Finally, for point 4), we plot the quantiles of the distribution of the proportion $K_t(1,c)/(K_t(1,c) + K_t(2,c))$, from the least populated table $c$ to the most populated one, in order to appreciate their convergence toward 1/2.
Reddit data set. This data set consists of a collection of news items, and of the comments associated to each of them, for the period 2016-2020, downloaded from the r/news community on the website Reddit at https://www.reddit.com/r/news, which is devoted to the discussion of news articles about events in the United States and the rest of the world. Each news item is associated with the author who posted it. Moreover, the data set contains the specific topic the news item belongs to (we refer to [24] for details about the topic classification) and each comment is also assigned a measurement of its sentiment, expressed as a real value in $(-1, 1)$. It corresponds to the "compound" score given by the VADER (Valence Aware Dictionary and sEntiment Reasoner) Sentiment Analysis [20], which is a lexicon and rule-based sentiment analysis tool, specifically designed for sentiments expressed in social media.
Here we consider only the comments on news items belonging to the topic "Politics". Moreover, we categorize the sentiment variable, following [6]: precisely, we define it as "positive" if the provided sentiment value is larger than +0.35 and "negative" if the provided sentiment value is lower than −0.35. Any comment with an original sentiment value that lies between −0.35 and +0.35 has been removed. Summing up, we consider all the comments on news items regarding the topic "Politics" with a sentiment value larger than +0.35 (positive) or lower than −0.35 (negative). This provides a total of 3 016 990 comments in the negative sentiment category and 2 602 173 comments in the positive sentiment category.
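The thresholding step described above can be sketched as follows. The function names and the input layout (chronologically ordered pairs of news author and compound score) are assumptions made for illustration, not the format of the data set in [24].

```python
def categorize_sentiment(score, threshold=0.35):
    """Map a compound score in (-1, 1) to 'positive', 'negative' or None."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return None  # comments within [-threshold, +threshold] are discarded

def split_by_sentiment(comments, threshold=0.35):
    """comments: iterable of (author, score) pairs in chronological order.
    Returns, for each sentiment category, the sequence of commented authors."""
    sequences = {"negative": [], "positive": []}
    for author, score in comments:
        label = categorize_sentiment(score, threshold)
        if label is not None:
            sequences[label].append(author)
    return sequences
```

The two returned lists play the role of the two customer sequences (one per sentiment category), with authors as tables.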
We are interested in the sequence of authors who receive at least one comment with negative sentiment for the news they post and in the analogous sequence related to comments with positive sentiment. As explained above, we illustrate that these two sequences exhibit the asymptotic behaviors predicted by the proposed model. For this purpose, we first identify the main quantities related to the Chinese restaurant version of the model: each sentiment category is a customer category (category 1 = negative sentiment and category 2 = positive sentiment) and the authors represent the tables. Therefore, when at time-step $t$ a news item receives a comment with a specific sentiment, the author who posted it is "new" or "old" for that specific sentiment category if, respectively, she has not or has already received a comment within that sentiment category. Analogously, the author is "new" or "old" for the entire system (the whole collection of comments) if, respectively, she has not or has already received any comment. In order to obtain two sequences of comments of the same length, as required by the model, we have randomly removed some comments from the negative sentiment category, i.e. the one containing more comments. In addition, we have verified that no author is commented for the first time simultaneously with two comments of different sentiment.
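The construction of the observed processes from two (or more) sequences of tables can be sketched as below. This is a hedged illustration: `equalize_lengths` and `discovery_processes` are hypothetical helper names, the random thinning keeps the chronological order of the retained items (an assumption of this sketch), and the check on simultaneous novelties mirrors the verification mentioned in the text.

```python
import random

def equalize_lengths(seq_a, seq_b, seed=0):
    """Randomly remove items so that both sequences have the same length,
    preserving the order of the retained items."""
    rng = random.Random(seed)
    n = min(len(seq_a), len(seq_b))
    def thin(seq):
        keep = sorted(rng.sample(range(len(seq)), n))
        return [seq[i] for i in keep]
    return thin(seq_a), thin(seq_b)

def discovery_processes(sequences):
    """sequences[h][t] = table chosen by category h at time-step t + 1.
    Returns the paths of (D*_{t,h})_h and (D_{t,h})_h as lists of tuples."""
    n = len(sequences)
    seen_system = set()                  # tables already occupied by anyone
    seen_by = [set() for _ in range(n)]  # tables already used by category h
    d_star, d = [0] * n, [0] * n
    path_star, path = [], []
    for items in zip(*sequences):
        fresh = [x for x in items if x not in seen_system]
        if len(set(fresh)) != len(fresh):
            raise ValueError("same novelty produced simultaneously by two categories")
        for h, x in enumerate(items):
            if x not in seen_system:     # "new" for the entire system
                d_star[h] += 1
            if x not in seen_by[h]:      # "new" for category h
                seen_by[h].add(x)
                d[h] += 1
        seen_system.update(items)
        path_star.append(tuple(d_star))
        path.append(tuple(d))
    return path_star, path
```

By construction the output satisfies $D^*_{t,h} \le D_{t,h} \le \sum_j D^*_{t,j}$ at every time-step.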
For each sentiment category $h = 1, 2$, the observed quantity $D^*_{t,h}$ (i.e. the number, until time-step $t$, of distinct authors whose first received comment belongs to sentiment category $h$) shows a power law growth along time. We can observe the same behavior for $D_{t,h}$ (i.e. the number, until time-step $t$, of distinct authors who have received at least one comment belonging to sentiment category $h$). Figure 1 provides the asymptotic behavior of these processes in $\log_{10}$-$\log_{10}$ scale, where we can also appreciate how the lines exhibit the same slope, which indicates that the processes have the same Heaps' exponent. This is exactly in accordance with the first result of Theorem 2.1. The value of the Heaps' exponent, estimated as the common slope of the lines in the $\log_{10}$-$\log_{10}$ plot, is $\gamma^* = 0.781$. Figure 2 shows the convergence of the process $\log_{10}(D^*_{t,1}) - \log_{10}(D^*_{t,2})$ toward the estimated limit value $\hat{u} = -0.727$, computed as the difference between the intercepts of the two regression lines for the two processes $(D^*_{t,h})$ in Figure 1. This value is an estimate of the quantity $u = \log_{10}(r)$, where $r = u_1/u_2 = 10^{u}$ is the limit in the second result of Theorem 2.1.
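One way to obtain a common slope and separate intercepts is a joint least squares fit of the lines $\log_{10} D_h(t) = a_h + b \log_{10} t$ with a shared $b$. The sketch below assumes this joint formulation (the text does not spell out the exact fitting procedure) and uses the hypothetical name `fit_common_slope`; the estimate of $u$ is then the difference of the first two intercepts.

```python
import math

def fit_common_slope(processes):
    """processes[h][t-1] = value of the h-th process at time-step t.
    Least squares fit of log10(value) = a_h + b * log10(t) with a shared b.
    Returns (b, [a_0, a_1, ...])."""
    xs, ys = [], []
    for values in processes:
        pts = [(math.log10(t), math.log10(d))
               for t, d in enumerate(values, start=1) if d > 0]
        xs.append([p[0] for p in pts])
        ys.append([p[1] for p in pts])
    x_bar = [sum(x) / len(x) for x in xs]
    y_bar = [sum(y) / len(y) for y in ys]
    # pooled within-group covariance and variance give the shared slope
    sxy = sum((x - x_bar[h]) * (y - y_bar[h])
              for h in range(len(xs)) for x, y in zip(xs[h], ys[h]))
    sxx = sum((x - x_bar[h]) ** 2 for h in range(len(xs)) for x in xs[h])
    slope = sxy / sxx
    intercepts = [y_bar[h] - slope * x_bar[h] for h in range(len(xs))]
    return slope, intercepts
```

For example, with `slope, a = fit_common_slope([d_star_1, d_star_2])`, the Heaps' exponent estimate is `slope` and the estimated limit of $\log_{10}(D^*_{t,1}) - \log_{10}(D^*_{t,2})$ is `a[0] - a[1]`.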
Regarding the table composition, Figure 3 shows the proportion of comments with negative sentiment received by an author over the total number of received comments. More precisely, we plot the quantiles of the empirical distribution of this proportion, from the least commented author to the most commented one. In order to construct the quantiles, we have listed the authors (tables) from the least commented to the most commented, removing those commented fewer than 10 times (tables with fewer than 10 customers); then we have grouped these authors taking intervals of equal length (0.5 in $\log_{10}$ scale). Finally, within each group, we have computed the quantiles of the empirical distribution of the proportion of the comments with negative sentiment the authors have received with respect to the total number of received comments. We can appreciate how these quantiles get closer to 1/2 (the uniform composition) as the number of received comments increases. This is in accordance with Theorem 2.2.
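The grouping and quantile computation just described can be sketched as follows. The names are hypothetical, the bins are assumed to start at multiples of 0.5 on the $\log_{10}$ scale, and quantiles are computed by linear interpolation; none of these details is specified in the text.

```python
import math

def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def composition_quantiles(tables, min_size=10, width=0.5, qs=(0.25, 0.5, 0.75)):
    """tables: dict mapping a table to (K(1, c), K(2, c)).
    Groups tables by floor(log10(total) / width) and returns, for each group,
    the requested quantiles of the proportion K(1, c) / total."""
    groups = {}
    for k1, k2 in tables.values():
        total = k1 + k2
        if total < min_size:
            continue   # drop tables with fewer than min_size customers
        bin_index = int(math.floor(math.log10(total) / width))
        groups.setdefault(bin_index, []).append(k1 / total)
    return {
        b: tuple(quantile(sorted(props), q) for q in qs)
        for b, props in sorted(groups.items())
    }
```

Under Theorem 2.2, the quantiles in the groups with larger totals are expected to concentrate around 1/2 when $N = 2$.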
Gutenberg data set. We downloaded this data set from the on-line library Project Gutenberg. It consists of a collection of over 70 000 free ebooks. After selecting only those written in English and classifying them into different topics, we decided to focus on two particular literary genres: "Western" and "History". For each of them, we have considered all the words contained in seven books, for a total of 480 460 words for "Western" and 476 948 words for "History" (after a slight pre-processing: e.g. removal of punctuation, spaces, numbers and words with 1 or 2 characters, and extraction of the stem of the words by means of Dr. Martin Porter's stemming algorithm [30]).
We are interested in the two sequences of words for the two different literary genres and, as explained at the beginning of this subsection, we would like to check that these two sequences exhibit the asymptotic behaviors predicted by the proposed model. In order to do so, we first identify the main quantities related to the Chinese restaurant version of the model: each literary genre is a customer category (category 1 = "Western" and category 2 = "History") and the words represent the tables. Therefore, each word is "new" or "old" for a specific literary genre if, respectively, it has not or has already been used within that genre. Analogously, each word is "new" or "old" for the entire system if, respectively, it has not or has already been used within any considered book. In order to obtain two sequences of words of the same length, as required by the model, we have randomly removed some words from the category "Western", i.e. the one containing more words. In addition, we have verified that no new word appears for the first time simultaneously in both genres.
For each literary genre h = 1, 2, the observed quantity D * t,h (i.e. the number, until time-step t, of distinct words whose first appearance has been in literary genre h) shows a power law growth along time. The same behavior is shown by D t,h (i.e. the number, until time-step t, of distinct words used in literary genre h). Figure 4 provides the asymptotic behavior of these processes in log 10 − log 10 scale, where we can also appreciate that the lines exhibit the same slope, which indicates that the processes have the same Heaps' exponent. This is exactly in accordance with the first result of Theorem 2.1. The value of the common Heaps' exponent, estimated as the common slope of the lines in the log 10 − log 10 plot, is γ * = 0.466. Figure 5 shows the convergence of the process log 10 (D * t,1 ) − log 10 (D * t,2 ) toward the estimated limit value u = −0.238, computed as the difference between the intercepts of the two lines for the two processes (D * t,h ) in Figure 4. This value is an estimate of the quantity u = log 10 (r), where r = u 1 /u 2 = 10 u is the limit in the second result of Theorem 2.1. Compared with the Reddit data set, the convergence here is slower.
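One way to obtain a common slope and the difference of the intercepts is a single least-squares fit with one shared slope and two separate intercepts. The sketch below assumes NumPy and strictly positive counts at the time-steps passed in; the function name `fit_common_slope` is hypothetical, and the exact fitting procedure behind Figure 4 may differ.

```python
import numpy as np

def fit_common_slope(t, d1, d2):
    """Fit log10(d_h) = a_h + slope * log10(t) for h = 1, 2 with a shared slope.

    Returns (slope, u, r) with u = a_1 - a_2 and r = 10**u.
    """
    x = np.log10(np.asarray(t, dtype=float))
    y = np.concatenate([np.log10(np.asarray(d1, dtype=float)),
                        np.log10(np.asarray(d2, dtype=float))])
    m = len(x)
    design = np.zeros((2 * m, 3))
    design[:, 0] = np.concatenate([x, x])  # shared slope
    design[:m, 1] = 1.0                    # intercept of process 1
    design[m:, 2] = 1.0                    # intercept of process 2
    (slope, a1, a2), *_ = np.linalg.lstsq(design, y, rcond=None)
    u = a1 - a2
    return slope, u, 10.0 ** u
```

For the values reported above, u = −0.238 corresponds to r = 10^(−0.238) ≈ 0.578.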
Regarding the table composition, Figure 6 shows the proportion of times a word has been used in the topic "Western" over the total number of times it has been used in the entire system. As in the previous application, we plot the quantiles of the empirical distribution of this proportion along the frequency of the words in the system, from the least frequent to the most frequent. To construct the quantiles, we listed the words (tables) from the least frequent to the most frequent, removing those that appeared fewer than 10 times (tables with fewer than 10 customers), and then grouped the remaining words into intervals of equal length (0.5 in log 10 scale). Finally, within each group, we computed the quantiles of the empirical distribution of the proportion of times the words have been used in the topic "Western" with respect to the total number of times they have been used in the entire system. These quantiles get closer to 1/2 (the uniform composition) as the frequency of the word increases, in accordance with Theorem 2.2.
Figure 6. Gutenberg data set. Real data: Quantiles of the distribution of the proportion of times the words have been used in the topic "Western" along their frequency in the system, from the least frequent to the most frequent.

Discussion
In this work we have introduced a general model in order to analyze a system of N interacting innovation processes. The interaction among the processes is ruled by two matrices Γ and W . The first one regulates the production of potential novelties, while the second one tunes the interaction with respect to the choice of an old item. When the matrix Γ is irreducible, we have proven that the numbers D * t,h , with h = 1, . . ., N , of distinct novelties for the entire system produced by agent h until time-step t have an asymptotic power law behavior with a common Heaps' exponent 0 < γ * < 1. Moreover, we have proven that the ratio D * t,h /D * t,j converges almost surely toward the ratio u h /u j of the relative centrality scores of h and j. Finally, when the matrix W is irreducible, we have proven that, for each observed item c, the number of times item c has been adopted by agent h (i.e. the number of customers of category h sitting at table c) grows linearly, and the proportion of times it has been adopted by agent h over the number of times it has been adopted in the whole system converges almost surely to 1/N (i.e. the asymptotic composition of table c, with respect to the N different categories, is the uniform one).
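As a small numerical illustration of the quantities γ * and u, the sketch below computes the leading eigenvalue of a non-negative irreducible matrix and the associated left eigenvector with NumPy. The function name `perron_left` and the normalization of u to unit sum are assumptions (only the ratios u h /u j enter the results), and the row/column convention for Γ used here may differ from the one adopted in the main text.

```python
import numpy as np

def perron_left(gamma):
    """Leading eigenvalue and left eigenvector (normalized to sum 1) of `gamma`."""
    gamma = np.asarray(gamma, dtype=float)
    # Left eigenvectors of gamma are the (right) eigenvectors of its transpose.
    vals, vecs = np.linalg.eig(gamma.T)
    k = np.argmax(vals.real)      # for an irreducible non-negative matrix this is the Perron root
    u = vecs[:, k].real
    return vals[k].real, u / u.sum()   # dividing by the sum also fixes the sign

gamma_star, u = perron_left([[0.3, 0.2], [0.1, 0.4]])
```

For this example matrix, the characteristic polynomial is λ² − 0.7λ + 0.1, so γ * = 0.5, and the left eigenvector satisfies u 2 = 2 u 1 , i.e. u 1 /u 2 = 1/2.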
In order to highlight the potentialities of the proposed model and of the related results in the study of the interaction among innovation processes, we have illustrated that the behaviors predicted by the theoretical results match the ones observed in two real data sets. One interesting research line for the future is to study the speed of convergence for the limits given in the theoretical results, in order to develop statistical instruments for an accurate inference on the two interaction matrices, Γ and W , from real data. Regarding this issue, it is important to note that the value γ * and the vector u = (u h ) h=1,...,N do not uniquely determine the matrix Γ. In other terms, given the estimates of γ * and of u, there exist infinitely many matrices Γ that could have generated those estimated values. This is a tough point to deal with, and further theoretical results are needed if we want to recover the model parameters from the data. In the Supplementary material, we present an idea for a first estimation of the interaction matrices in the case N = 2: after the estimation of γ * and log 10 (r) = log 10 (u 1 /u 2 ) as the common slope and the difference of the intercepts, respectively, of the lines related to the observed processes (D * t,h ), h = 1, 2, plotted in log 10 − log 10 scale, we can consider parametric families of matrices Γ and W compatible with these estimated values and perform a Maximum Likelihood Estimation (MLE) in order to detect the remaining parameters that best fit the data. However, to obtain a robust MLE procedure, we need to reduce the number of parameters by imposing some restrictions on them, for instance the symmetry of the matrices. We have tested this procedure on some simulations and the results are collected in the Supplementary material.
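The lack of identifiability can be seen directly for N = 2. The sketch below builds two different non-negative matrices sharing the same leading eigenvalue γ * = 0.5 and the same left eigenvector ratio r = 0.5. The parametrization by the two off-diagonal entries `g12` and `g21` (NumPy row/column positions) and the function name `gamma_from_targets` are assumptions made for illustration; this is not necessarily the parametric family used in the Supplementary material.

```python
import numpy as np

def gamma_from_targets(gamma_star, r, g12, g21):
    """2x2 matrix G with (r, 1) @ G = gamma_star * (r, 1), given its off-diagonal entries."""
    g11 = gamma_star - g21 / r     # from r*g11 + g21 = gamma_star * r
    g22 = gamma_star - r * g12     # from r*g12 + g22 = gamma_star
    G = np.array([[g11, g12], [g21, g22]])
    if (G < 0).any():
        raise ValueError("off-diagonal entries too large for a non-negative matrix")
    return G

A = gamma_from_targets(0.5, 0.5, g12=0.2, g21=0.1)   # [[0.3, 0.2], [0.1, 0.4]]
B = gamma_from_targets(0.5, 0.5, g12=0.4, g21=0.2)   # [[0.1, 0.4], [0.2, 0.3]]
```

Both matrices have strictly positive off-diagonal entries (hence are irreducible) and a positive left eigenvector (r, 1) for the eigenvalue 0.5, which is therefore their leading eigenvalue; the data-driven estimates of γ * and r alone cannot distinguish between them.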
Regarding the model assumptions, we point out that the balance condition (2) forces the Heaps' exponents to be strictly smaller than 1. Since eliminating this condition in the case of a single process (N = 1) makes an exponent equal to 1 possible [1,40,41,42], it is plausible that the same holds for N ≥ 2. Therefore, a second research line for the future is to investigate the proposed model without assuming the balance condition. Moreover, the balance condition forces w h,j (the parameter governing the interaction in the selection of an old item) to be large whenever γ h,j (the parameter tuning the interaction with respect to the production of potential novelties) is large and, vice versa, γ h,j is necessarily small whenever w h,j is small. On the contrary, the proposed model without the restriction of the balance condition may include cases where γ h,j is large, but w h,j is small.
Another model assumption that could be removed is the simultaneity of the extractions from all the urns of the system (i.e. of the arrivals of the customers for all the categories). Indeed, this condition forces all the processes of the system to have the same number of observations. This variant of the model could be obtained by inserting a selection mechanism for the urn from which the extraction at a certain time-step will be performed (i.e. for the category of the customer who will enter the restaurant at a certain time-step). This selection could be driven by a reinforcement mechanism on the number of times an urn (category) has been selected.
Finally, regarding the assumption of irreducibility in the theoretical results, we underline that when the matrices are not irreducible, it is possible to decompose them into irreducible sub-matrices such that the union of the spectra of the sub-matrices coincides with the spectrum of the original matrix. Then, a deeper analysis starting from the present theory is needed, in the same spirit as [7,2,3]. See also the Supplementary material for a heuristic argument to deduce the rate at which each D * t,h grows in the case of a general (i.e. not necessarily irreducible) matrix Γ.

SUPPLEMENTARY MATERIAL FOR INTERACTING INNOVATION PROCESSES: CASE STUDIES FROM REDDIT AND GUTENBERG
Appendix S1.Analytical proofs Denote by X * t,h the random variable that takes value 1 when the ball extracted from urn h at time-step t has a new (for all the system) color and is equal to 0 otherwise.Then Z * t,h defined in (4) coincides with P (X * t+1,h = 1 | past) = E[X * t+1,j | past] and D * t,j can be written as t n=1 X * n,j .Therefore, since we have we obtain the following dynamics for Z * t,h : where r t,h = 1/(θ h + t + 1) = 1/(t + 1) + O h (1/t 2 ).The corresponding vectorial dynamics for Z = (Z t,1 , . . ., Z t,N ) is We prove the following key result: Theorem S1.1.Under the same assumptions and notation of Theorem 2.1, we have ∞ is an integrable strictly positive random variable.Proof.We firstly want to decompose the vectorial process Z * t based on the Jordan representation of the matrix Γ.Specifically, for any γ ∈ Sp(Γ ) \ γ * , we can denote as J γ the Jordan block and with U γ and V γ the matrices whose columns are, respectively, the left and right (possibly generalized) eigenvectors of Γ associated to the eigenvalue γ, i.e.
Then, we can consider the decomposition where (note that Z * * t is non-negative but not bounded by 1 as Z * t ) so that we have In the following steps, we are going to show that Z * * t converges almost surely and in mean to an integrable random variable Z * * ∞ such that P ( Z * * ∞ > 0) = 1 and that each Z * * γ,t converges almost surely to zero.In particular, this last task will be done separately for the eigenvalues with |γ| < γ * and with |γ| = γ * .Remember that the assumption that Γ (or, equivalently, Γ ) is irreducible ensures that γ * is real, simple and |γ| ≤ γ * for any γ ∈ Sp(Γ ).In the sequel of the proof, the symbol F t denotes the past until time-step t.
Study of Z * * t .By multiplying equation (S:0) by v we obtain Then, multiplying everything by ζ t+1 and using the relation (S:0) Therefore, we have ).Since γ * > 0 and so t ζ t+1 /t 2 ∼ t 1/t 1+γ * < +∞, the process Z * * t is a non-negative almost (super-)martingale, almost surely convergent toward a finite random variable Z * * ∞ (see Appendix S1.2).Then, using Theorem S1.3, we can prove that P ( Z * * ∞ > 0) = 1.Indeed, if we define the stochastic process W = (W t ) t≥0 , taking values in the interval [0, 1], as From Theorem S1.3 applied to (W t ) with δ = γ * , we get that ζ t W t converges almost surely to a random variable with values in (0, +∞).This random variable is obviously also the almost sure limit of Z * * t and so we can conclude that P ( Z * * ∞ > 0) = 1.Furthermore, we can observe that, for each t, we have | and thus, since the last series is finite, we have sup t E[ Z * * t ] < +∞.By Fatou's lemma, this fact implies that Z * * ∞ is integrable.Now, we are ready to prove Lemma S1.2, whose statement and proof is postponed at the end of the present proof.A first consequence of this lemma is that the convergence of Z * * t to Z * * ∞ is also in mean.Indeed, from (S:0), since sup where V * t is defined in the statement of Lemma S1.2.Then, we find where we have used Lemma S1.2 in order to say that the first series is finite.Therefore, we have sup t E[( Z * * t ) 2 ] < +∞ and so ( Z * * t ) t is uniformly integrable and we can conclude that Z * * t converges to Z * * ∞ also in mean.
Dynamics of Z * * γ,t .By multiplying equation (S:0) by ζ t+1 we get where (S:0) 2 converges a.s. to zero.To this end, by multiplying equation (S:0) by V γ , we have Then, since for any real matrix A we can write we have that Then, regarding the first term, we note that and so Therefore, since γ * > |γ| and by Lemma S1.2, the process B * * t 2 is a non-negative almost supermartingale that converges almost surely.Moreover, by applying the expectation we obtain which, since t (γ * − |γ|)/(t + 1) = +∞, by Lemma S1.2 and Lemma S1.6, we can conclude that B * * t a.s.
Study of Z * * γ,t with |γ| = γ * . From the Perron-Frobenius theory, we know that each eigenvalue with maximum modulus is simple. Then, set b t = v γ Z * * so that, since we have Z * * γ,t = u γ v γ Z * * = u γ b t , it is enough to prove that |b t | almost surely converges to zero. To this end, by multiplying equation (S:0) by v γ , we have Then, using (S:0), we have that Then, regarding the first term we have that and so Therefore, since γ * > Re(γ) and by Lemma S1.2, the process |b * * t | 2 is a non-negative almost supermartingale that converges almost surely. Moreover, by applying the expectation, we obtain Since Then, if Γ is irreducible, we have Proof. First notice that by definition Then, denoting by v min the minimum element of v, which is strictly positive since Γ is irreducible, we have that N j=1 Z * j,t ≤ v Z * t /v min = Z * t /v min . Therefore, we have This concludes the proof.
Proof of Theorem 2.1.Leveraging on Theorem S1.1, we can prove Theorem 2.1.Indeed, by the previous convergence results for (Z * t,h ) t , we have and so, by Lemma S1.8, we get As a consequence, we obtain Proof of Theorem 2.2.Recall from (5) that, for any color c already present in the network at time t, P t (h, c) = P (C t+1,h = c| past) denotes the conditional probability that the extraction at time-step t + 1 from urn h gives the old color c, while K t (h, c) indicates the number of times the color c has been drawn from urn h until time-step t.
First of all, we observe that, from (5), we have Let C so that 0 ≤ U A t = α t+1 Y t ≤ C. Using the Taylor expansion of the function f 2 x 2 with x 0 ∈ (0, x)) with a = H t and x = U A t+1 , we have eventually (so that H t ≥ 1) and so, recalling that W t = H t /s t a.s.
∼ H t /t, we get Therefore, we have and so, for θδ > 1, since H t → +∞, we can conclude that the above conditional expectation is eventually negative. This proves that, for each θ > 1/δ, (t/H θ t ) t is eventually a (positive) supermartingale and so, for each θ > 1/δ, it converges almost surely to a finite random variable. Since θ > 1/δ is arbitrary, we necessarily have that t/H θ t converges almost surely to zero. This fact concludes the proof. Now we are ready for the proof of the previous theorem.
Proof.(of Theorem S1.3) where ∆ n , R 1,n , R 2,n are all non-negative sequences of random variables.Then (Y n ) is called nonnegative almost super-martingale.
By [33], we know that it almost surely converges on If n ∆ n and n Q n are almost surely convergent, then (L n ) n converges almost surely to a finite random variable.
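For reference, the usual form of the notion of non-negative almost supermartingale, due to Robbins and Siegmund, can be written as follows. The symbol \mathcal{F}_n for the filtration and the exact arrangement of the terms are assumptions based on the standard statement of that theorem, and may differ in detail from the display used in this appendix.

```latex
% (Y_n) non-negative and adapted to (\mathcal{F}_n); \Delta_n, R_{1,n}, R_{2,n} \ge 0 and adapted.
E\bigl[\,Y_{n+1} \mid \mathcal{F}_n\,\bigr] \;\le\; (1+\Delta_n)\,Y_n + R_{1,n} - R_{2,n}
\qquad \text{for all } n .
% Robbins--Siegmund: on the event
\Bigl\{\, \textstyle\sum_n \Delta_n < +\infty,\ \sum_n R_{1,n} < +\infty \Bigr\}
% the sequence (Y_n) converges almost surely to a finite random variable
% and, moreover, \sum_n R_{2,n} < +\infty almost surely.
```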

Appendix S2. Heuristics
We describe here a heuristic argument (also employed in [21]), useful for detecting the rate at which each D * t,h grows along time in the case of a general matrix Γ.
The dynamics that rules the vectorial process D * t = (D whose general solution is given by d * (z) = e Γz c.Now, the term e Γz can be expressed using the canonical Jordan form of the matrix Γ, so that we obtain where γ 1 , . . ., γ r are the distinct eigenvalues of Γ, p 1 , . . ., p r are the sizes of the corresponding Jordan blocks and c i are suitable vectors related to c and to the generalized eigenvectors of Γ.Indeed, we can write Γ as P JP −1 , where J is its canonical Jordan form and P is a suitable invertible matrix of generalized eigenvectors.Therefore, we have e Γz = P e Jz P −1 , where e Jz is a block matrix with blocks of the form e J k z with J k block in J. On the other hand, if J k = γ k I + N k is a generic Jordan block of Γ with size p k and associated to the eigenvalue γ k , we have Changing the variable from z to t, we find and so the rate at which D * t,h increases is given by the leading term in the expression of d * h (t).
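The exponential of a single Jordan block has a standard closed form, recalled here as a worked step. The change of variable z = ln t is an assumption, chosen because it turns e^{γz} into the power law t^{γ} appearing in the results; N_k denotes the nilpotent part of the block, as in the text.

```latex
% N_k is nilpotent with N_k^{p_k} = 0, and \gamma_k I commutes with N_k, hence
e^{J_k z} \;=\; e^{\gamma_k z}\, e^{N_k z}
          \;=\; e^{\gamma_k z} \sum_{m=0}^{p_k-1} \frac{z^m}{m!}\, N_k^m .
% With z = \ln t (assumed change of variable):
e^{J_k \ln t} \;=\; t^{\gamma_k} \sum_{m=0}^{p_k-1} \frac{(\ln t)^m}{m!}\, N_k^m .
```

Each component of d * (t) is therefore a linear combination of terms of the form t^{γ_k} (ln t)^m with 0 ≤ m ≤ p_k − 1, and the dominant one is given by the eigenvalue with the largest real part and, among equal real parts, by the highest power of ln t.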
In particular, when Γ is irreducible, the above general formula leads, for each D * t,h , to the same asymptotic behavior t γ * , with γ * equal to the leading eigenvalue of Γ (recall that γ * is simple and so the logarithmic term is not present). However, it is important to note that this heuristic argument gives the right rate at which each D * t,h grows, but it does not provide any information about the limit random variable: we can deduce that, for each h, the quantity D * t,h /(u h t γ * ), where u is the vector of the relative centrality scores, converges almost surely to a certain random variable (first statement of Theorem 2.1), but we cannot affirm that these limit random variables are all equal, and this last fact is fundamental in order to obtain the second statement of Theorem 2.1. Nevertheless, the merit of this heuristic is that, from (S:0), we can get the rate at which each D * t,h grows for any matrix Γ.
Appendix S3. A preliminary idea for the estimation of the interaction in the case N = 2
In this section, for the case N = 2, we provide a parametric family for the matrix Γ = (γ j,h ) j,h=1,2 such that its leading eigenvalue γ * and the ratio r = u 1 /u 2 of the components of its corresponding left eigenvector coincide with some given values. More precisely, given the values γ * ∈ (0, 1) and r ∈ (0, 1], the matrices (S:0) are non-negative, irreducible, such that 1 Γ < 1 and have the leading eigenvalue equal to γ * and the ratio of the components of the corresponding left eigenvector equal to r. Moreover, we can define a parametric family for the matrix W = (w j,h ) j,h=1,2 , adding two further parameters, as Note that the above matrices W (x 1 , x 2 , y 1 , y 2 ) are non-negative, irreducible and such that 1 W = 1 . The balance condition is satisfied by construction.
Given a data set such that the observed processes exhibit asymptotic behaviors in accordance with the provided theoretical results of the model, the above parametric families for the two interaction matrices Γ and W can be used for performing a Maximum Likelihood Estimation (MLE) procedure. In detail:
1) estimate the quantity γ * as the common slope of the lines in the log 10 − log 10 plot of the processes (D * t,h ), with h = 1, 2;
2) estimate the quantity r as 10 u , where u is the difference between the intercepts of the lines in the log 10 − log 10 plot of the processes (D * t,h ), with h = 1, 2 (note that, in order to employ the above parametric families of matrices, we need to label the two categories so that the estimated value for r is ≤ 1, i.e. u ≤ 0);
3) consider the matrices Γ(x 1 , x 2 ) and W (x 1 , x 2 , y 1 , y 2 ) related to the estimated values for γ * and r;
4) perform an MLE procedure in order to estimate from the data the interaction parameters x 1 , x 2 , y 1 and y 2 and, possibly, the initial parameters θ 1 and θ 2 .
However, in order to get a robust estimation, we may want to reduce the number of parameters by imposing some conditions on them: for instance, we can take θ 1 and θ 2 equal to some given values and restrict to matrices Γ(x 1 , x 2 ) and W (x 1 , x 2 , y 1 , y 2 ) that are symmetric (which means that the interaction mechanism is symmetric, i.e. the influence of h = 1 on h = 2 is equal to the one of h = 2 on h = 1). The general formula of the likelihood function that we have to maximize is: where I E denotes the indicator function of the event E, Z * t,h and P t (h, c) are given in (4) and in (5), respectively, and (c t,1 ) 1,...,T and (c t,2 ) 1,...,T are the two observed sequences of items (colors/tables) for the two agents (urns/categories) h = 1, 2.
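A likelihood built from the ingredients just listed would plausibly take the following product form. This is a sketch under stated assumptions, not a transcription of the paper's display: it assumes that, given the past, the extractions from the two urns at the same time-step are conditionally independent, and it conditions on the first observations (the treatment of the initial time-step is left unspecified).

```latex
% Assumed form: one factor per time-step t and per urn h.
\mathcal{L}(x_1, x_2, y_1, y_2, \theta_1, \theta_2)
  \;=\; \prod_{t=1}^{T-1} \prod_{h=1}^{2}
        \bigl( Z^{*}_{t,h} \bigr)^{ I_{\{ c_{t+1,h} \text{ is new for the system} \}} }
        \prod_{c \,\text{old}} P_t(h,c)^{ I_{\{ c_{t+1,h} = c \}} } ,
% where "c old" ranges over the items already observed in the system up to time-step t,
% and Z^{*}_{t,h}, P_t(h,c) depend on the parameters through \Gamma(x_1,x_2),
% W(x_1,x_2,y_1,y_2), \theta_1 and \theta_2.
```

In practice one would maximize the logarithm of such a product, which turns it into a sum over time-steps and urns.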
We now present a simulation study aimed at highlighting the performance of the estimation procedure obtained by following steps 1)-4) of the algorithm proposed above. In order to reduce the number of parameters to be estimated, we set θ 1 = θ 2 = 1 and we impose that both Γ and W must be symmetric. This assumption, combined with the condition W 1 = 1, implies that Γ and W can be uniquely identified by four parameters, e.g. γ 1,1 , γ 1,2 , γ 2,2 , w 1,2 . For each choice of Γ and W , 100 independent innovation processes following the model presented in this work have been generated until the time-step T = 10 4 . Then, we have applied steps 1)-4) to the data generated by each simulation, so obtaining a set of 100 estimates of γ * , r, x 1 , x 2 , y 1 and y 2 which fulfill the symmetry condition, i.e. each one leading to symmetric estimated matrices Γ and W . The results of this simulation study are collected in Table S3, where the mean values and the standard deviations of the estimated elements are compared with the true ones used for generating the data. Regarding the elements of the two matrices and γ * , the estimation procedure works very well in all the cases. Regarding r, we can note that the estimated values are "sensitive" to the strength of the interaction term γ 1,2 : the higher the interaction term, the better the estimation.
Table S3. Simulation results of the estimation procedure described in steps 1)-4) with θ 1 = θ 2 = 1 and assuming Γ and W symmetric. Each parameter has been estimated by 100 independent simulated processes generated until time-step T = 10 4 .
In order to complete the picture, we have also checked how the results can be affected by the choice of θ h and, in particular, whether choosing a wrong value of θ h in the likelihood could considerably worsen the estimation of Γ and W . To this end, we have considered some of the scenarios presented in Table S3 and we have computed the estimates of the elements of Γ and W for two different values of θ h , including the cases when the value of θ h used to generate the simulated data sets is different from the value of θ h used to compute the likelihood. The results of this simulation study on the "sensitivity" to the parameter θ h are collected in Table S4. In general, the results seem to be quite robust to the choice of θ h used in the likelihood. Therefore, the problem of using the "right" θ h in the likelihood does not seem as important as one might expect. However, the performance of the estimation procedure does worsen considerably when the data are generated with high values of θ h . This is probably due to the fact that, when θ h is large, the asymptotic behaviors of the innovation processes are reached after a number of time-steps which is much larger than the value T = 10 4 used in this simulation study.
In conclusion, the estimation procedure provided in this subsection is only a first step toward the estimation of the interaction between two innovation processes. Additional simulations and analyses are needed. In particular, we need to understand how to test the restrictions on the parameters, for example how to provide a test on the symmetry of the interaction mechanism.
Table S4. Simulation results of the estimation procedure described in steps 1)-4) with θ 1 = θ 2 and assuming Γ and W symmetric. Each parameter has been estimated by 100 independent simulated processes generated until time-step T = 10 4 .

Figure 1. Reddit data set. Linear behavior of (D * t,h ) and (D t,h ) along time, for h = 1, 2, in log 10 − log 10 scale. The dashed lines are obtained by a least square interpolation. The goodness of fit R 2 index is 0.9984. The estimated common slope is γ * = 0.781.

Figure 3. Reddit data set. Real data: Quantiles of the distribution of the proportion of comments with negative sentiment received by the authors along their number of received comments, from the least commented to the most commented.

Figure 4. Gutenberg data set.Linear behavior of (D * t,h ) and (D t,h ) along time, for h = 1, 2, in log 10 − log 10 scale.The dashed lines are obtained by a least square interpolation.The goodness of fit R 2 index is 0.9937.The estimated common slope is γ * = 0.466.

Table S4. Column 1: value of θ 1 = θ 2 = θ Data used to generate the data. Column 2: value of θ 1 = θ 2 = θ Likelihood put in the likelihood function. Columns 3-6: elements of the interacting matrices Γ and W used to generate the data. Columns 7-10: mean values and standard deviations of the elements of the 100 estimated interacting matrices Γ and W .

Table 1. Correspondence table between the model, the urn metaphor and the Chinese restaurant metaphor.
D * t,h . Model: number, until time-step t, of distinct novelties for the whole system produced by agent h. Urn metaphor: number, until time-step t, of distinct colors observed in the whole system and extracted for their first time from urn h. Chinese restaurant metaphor: number, until time-step t, of distinct tables occupied for their first time by a customer belonging to category h.
D t,h . Model: number, until time-step t, of distinct items adopted by agent h. Urn metaphor: number, until time-step t, of distinct colors extracted from urn h. Chinese restaurant metaphor: number, until time-step t, of distinct tables occupied by at least one customer belonging to category h.
K t (h, c). Model: number, until time-step t, of times agent h has adopted item c. Urn metaphor: number, until time-step t, of times color c has been extracted from urn h. Chinese restaurant metaphor: number, until time-step t, of customers belonging to category h and sitting at table c.