
# Sample space reducing cascading processes produce the full spectrum of scaling exponents

• Scientific Reports volume 7, Article number: 11223 (2017)
• doi:10.1038/s41598-017-09836-4

## Abstract

Sample Space Reducing (SSR) processes are simple stochastic processes that offer a new route to understand scaling in path-dependent processes. Here we define a cascading process that generalises the recently defined SSR processes and is able to produce power laws with arbitrary exponents. We demonstrate analytically that the frequency distributions of states are power laws with exponents that coincide with the multiplication parameter of the cascading process. In addition, we show that imposing energy conservation in SSR cascades allows us to recover Fermi’s classic result on the energy spectrum of cosmic rays, with the universal exponent −2, which is independent of the multiplication parameter of the cascade. Applications of the proposed process include fragmentation processes or directed cascading diffusion on networks, such as rumour or epidemic spreading.

## Introduction

Practically all complex adaptive systems exhibit fat-tailed distribution functions in the statistics of their dynamical variables. Often these distribution functions are exact or almost exact asymptotic power laws, $p(x)\sim x^{-\lambda}$. While Gaussian statistics can often be traced back to a single origin, the central limit theorem, several routes exist for the origin of power laws. These include (i) Yule-Simon processes (preferential attachment)1,2,3, (ii) multiplicative processes with constraints, see refs 4 and 5, (iii) criticality6, (iv) self-organized criticality and cascading processes7,8,9,10, (v) constraint optimization11,12,13, and (vi) sample space reducing (SSR) processes14,15,16. Most of these mechanisms are able to explain specific values of the exponent λ or a range of exponents. None of them, however, explains the full range of exponents from zero to infinity, $\lambda\in[0,\infty)$, in a straightforward way. Depending on the generative process, exponents belong to different ranges. For example, exponents near critical points are often fractional and lie within the range $\lambda\in[1/2,5/2]$ 6. Many exponents for avalanche processes are found within the range $\lambda\in(0,3)$ 17. Some processes, like preferential attachment, can formally explain a wide range of exponents, although deviations from the standard values $\lambda\in(2,3.5)$ are hard to map to realistic underlying stochastic dynamics18. Here we show that the combination of cascading processes with SSR processes is able to do exactly that: it provides a single one-parameter model that produces the full spectrum of all possible scaling exponents. The parameter is nothing but the multiplication ratio of the cascading process. Finally, we show that generic disintegration processes can be mapped one-to-one to SSR cascades with a superimposed conservation condition on whatever magnitude is represented by the states. This mapping allows us to derive a remarkable result: the histogram of visits to each state follows a power law with exponent −2, regardless of the multiplication parameter. This result may have important consequences for understanding generic properties of disintegration processes and the ubiquity of the exponent −2 in nature. For example, it allows us to recover Fermi’s classic result on the energy spectrum of cosmic rays, appealing only to combinatorial properties of the cascade. In addition, we provide a rigorous proof of that result in appendix A, as a new contribution to the study of the random partition of the interval.

Cascading processes have played an important role in the understanding of power-law statistics in granular media7,8,9,10, 19, earthquakes20,21,22, precipitation23, 24, dynamics of combinatorial evolution25, or failure in networks26,27,28. The scaling exponents of the probability distribution functions of quantities such as avalanche sizes, energy distributions, visiting times, event durations, etc., are found within a relatively narrow band. Cascading processes are often history-dependent processes in the sense that, for a particular event to take place, the temporal order of microscopic events is important. Recent progress in the understanding of the generic statistics of history-dependent processes and their relation to power laws was made in refs 14,15,16 and 29. Perhaps the simplest history-dependent processes are the sample space reducing (SSR) processes14, which explain the origin of scaling in a very simple and intuitive way. They have been used in various applications in computational linguistics30, fragmentation processes14, and diffusion on directed networks and search processes15.

An SSR process is a stochastic process whose sample space reduces as it evolves in time. Such processes can be depicted in a simple way, see Fig. 1(a). Imagine a set of N states in a system, labelled by i = 1, 2, …, N. The states are ordered by the label. The only rule that defines the SSR process is that transitions between states may only occur from higher to lower labels. This means that a transition from state j to state i, j → i, is possible only if j > i. When the lowest state i = 1 is reached, the process stops or is re-started. It was shown in ref. 14 that this dynamics leads to Zipf’s law in the frequency of state visits, i.e. the probability to visit state i is given by $p(i)\propto i^{-1}$. This scaling law is extremely robust and occurs for a wide class of prior probabilities15. We now show that the combination of SSR processes with the simplest cascading process allows us to obtain a mechanism that can produce power laws with any scaling exponent. We will comment on how the model can be used to recover Fermi’s classic result on the cosmic ray spectrum31, 32. This is possible by imposing energy conservation on SSR cascading processes. We discuss other potential cases where the theory of SSR cascades might apply.

To define SSR cascades, imagine a system with a set of N ordered states, each of which has a prior probability of appearing, $q_1,\ldots,q_{N-1}$, with state N being the starting point of the cascade. The process starts at t = 0 with μ balls at state N, which jump to states $i_1,\ldots,i_\mu<N$ with probabilities proportional to $q_{i_1},\ldots,q_{i_\mu}$, respectively. Suppose that the μ balls landed on states $i_1,\ldots,i_\mu$, respectively. At the next timestep, t = 1, each of these μ balls divides into μ new balls, which all jump to states below their original state. The multiplicative process continues downwards. Whenever a ball hits the lowest state, it is eliminated from the system. Effectively, we superimpose a multiplicative process, characterized by the multiplicative parameter μ, on the SSR process described above, see Fig. 1(c). The case μ = 1 is exactly the standard SSR process, where no new elements are created, and the case μ < 1 corresponds to the noisy SSR process, where there is the possibility that the process gets cut at some step, see Fig. 1(b).

The derivation of the visiting distribution of this cascading SSR process follows the arguments found in ref. 15. We first define the cumulative prior distribution function g(k),

$g(k)=\sum_{i\le k}q_i .$

Without a multiplication factor μ the transition probabilities p(i|j) determine the probability to reach state i at timestep t + 1, given that the system is in state j at time t, and are given by

$p(i|j)=\begin{cases}\dfrac{q_i}{g(j-1)} & \text{for } i<j\\ 0 & \text{for } i\ge j .\end{cases}$
(1)

In an SSR cascade, if there is an element sitting at state j at time t, there are now μ trials to reach any state i < j at t + 1. Since the number of particles is not conserved throughout the process, we talk about the expected number of jumps from j to i. Since the jumps of the individual balls from j to i are independent, the expected number of jumps from j to i, which we denote by $n(j\to i)$, can be approximated as follows:

$n(j\to i)=\mu\,p(i|j).$
(2)

We denote the expected number of elements that will hit state i in a given SSR cascade by $n_i$. Up to a factor, the sequence $n_1,\ldots,n_N$ is identical to the histogram of visits. From equations (1) and (2) we get

$n_i=\sum_{j>i}n(j\to i)\,n_j=\mu q_i\sum_{j>i}\frac{n_j}{g(j-1)}.$

Dividing by $q_i$, subtracting the resulting expression for $n_{i+1}/q_{i+1}$ from that for $n_i/q_i$, and re-arranging terms, we find

$\frac{n_{i+1}}{q_{i+1}}\left(1+\mu\frac{q_{i+1}}{g(i)}\right)=\frac{n_i}{q_i},$

or, when applied iteratively

$n_i=n_1\frac{q_i}{q_1}\prod_{1<j\le i}\left(1+\mu\frac{q_j}{g(j-1)}\right)^{-1}.$

Since $\mu q_j/g(j-1)$ is typically small, the product term is well approximated by

$\prod_{1<j\le i}(\cdots)^{-1}=\exp\left[-\sum_{1<j\le i}\log\left(1+\mu\frac{q_j}{g(j-1)}\right)\right]\approx\exp\left(-\mu\sum_{1<j\le i}\frac{q_j}{g(j-1)}\right)\approx\exp\left(-\mu\int_1^i\frac{dg}{dx}\frac{1}{g(x)}\,dx\right)\approx\exp\left(-\mu\log\frac{g(i)}{q_1}\right)=\left(\frac{g(i)}{q_1}\right)^{-\mu},$

where we used $q_j\sim dg/dx|_j$ and $\log(1+x)\sim x$. Finally, we have

$n_i\sim\frac{n_1}{q_1^{1-\mu}}\left(\frac{q_i}{g(i)^{\mu}}\right).$
(3)
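As a numerical sanity check of equation (3), the recursion for $n_i$ can be iterated exactly and compared with the approximation $q_i/g(i)^{\mu}$. The following Python sketch is only an illustration: it assumes the normalisation $n_N=1$ (a single element starting at state N) and, by default, uniform priors, and the function names `expected_visits` and `approx_visits` are hypothetical, not taken from the original work.

```python
def cumulative(q):
    """g[k] = q_1 + ... + q_k, with g[0] = 0 (q[0] holds q_1)."""
    g = [0.0]
    for q_i in q:
        g.append(g[-1] + q_i)
    return g

def expected_visits(n_states, mu, q=None):
    """Iterate n_i = mu * q_i * sum_{j>i} n_j / g(j-1), assuming n_N = 1."""
    if q is None:
        q = [1.0 / (n_states - 1)] * (n_states - 1)  # uniform priors q_1..q_{N-1}
    g = cumulative(q)
    n = [0.0] * (n_states + 1)                       # index 0 is unused
    n[n_states] = 1.0
    tail = n[n_states] / g[n_states - 1]             # running sum of n_j / g(j-1), j > i
    for i in range(n_states - 1, 0, -1):
        n[i] = mu * q[i - 1] * tail
        if i > 1:
            tail += n[i] / g[i - 1]
    return n

def approx_visits(n_states, mu, q=None):
    """Right-hand side of equation (3) up to a constant factor: q_i / g(i)**mu."""
    if q is None:
        q = [1.0 / (n_states - 1)] * (n_states - 1)
    g = cumulative(q)
    a = [0.0] * (n_states + 1)
    for i in range(1, n_states):
        a[i] = q[i - 1] / g[i] ** mu
    return a

if __name__ == "__main__":
    exact = expected_visits(5000, 2.5)
    approx = approx_visits(5000, 2.5)
    # If equation (3) holds, this ratio is nearly the same for all i >> mu.
    for i in (200, 2000):
        print(i, exact[i] / approx[i])
```

For uniform priors the exact recursion reduces to $n_i=n_{i+1}(1+\mu/i)$, so the deviation from a pure power law is of relative order $\mu(\mu-1)/(2i)$ and vanishes for $i\gg\mu$.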

For equal prior probabilities, $q_i=\frac{1}{N-1}$ for all states, we get the expected visiting probability to be

$n_i\sim i^{-\mu}.$

The multiplication factor μ becomes the scaling exponent; for μ = 1 the standard SSR process is recovered14. Figure (2) shows numerical results which are in perfect agreement with the theoretical predictions. Note that the argument also holds for non-integer μ, where, on average, μ balls are created at every step. In the numerical implementation, a non-integer μ is introduced as follows: let $\mu=\lfloor\mu\rfloor+\delta$, with δ < 1. Then, with probability δ, $\lfloor\mu\rfloor+1$ balls are created and, with probability 1 − δ, $\lfloor\mu\rfloor$ balls are created. Multiplication factors μ < 1 are also possible, reproducing the previously defined noisy SSR case, see Fig. (1b) and ref. 14. In this situation the process is restarted at each step with probability 1 − μ.
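The cascade and the rule for non-integer μ described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for Fig. (2): priors are uniform, a cascade starts from a single element at state N, balls are processed one at a time rather than in synchronous timesteps (which leaves the visit counts statistically unchanged), and for μ < 1 a cut cascade simply ends instead of being restarted. The names `n_offspring` and `ssr_cascade_visits` are hypothetical.

```python
import math
import random

def n_offspring(mu, rng):
    """Number of new balls: floor(mu) + 1 with probability delta, else floor(mu)."""
    base = math.floor(mu)
    delta = mu - base
    return base + 1 if rng.random() < delta else base

def ssr_cascade_visits(n_states, mu, rng):
    """Run one SSR cascade; visits[i] counts the balls that landed on state i."""
    visits = [0] * (n_states + 1)   # index 0 is unused
    visits[n_states] = 1            # a single element starts at the top state N
    active = [n_states]             # balls that still have to split and jump
    while active:
        j = active.pop()
        for _ in range(n_offspring(mu, rng)):
            i = rng.randint(1, j - 1)   # uniform jump to any state below j
            visits[i] += 1
            if i > 1:                   # balls that hit state 1 are eliminated
                active.append(i)
    return visits

if __name__ == "__main__":
    rng = random.Random(0)
    n_states, mu, runs = 100, 1.5, 200
    total = [0] * (n_states + 1)
    for _ in range(runs):
        for i, v in enumerate(ssr_cascade_visits(n_states, mu, rng)):
            total[i] += v
    # Empirical ratio of visits to states 5 and 20 next to the prediction (20/5)**mu.
    print(total[5] / total[20], (20 / 5) ** mu)
```

Because the number of balls grows roughly like $N^{\mu}$, N and μ should be kept moderate when experimenting with this sketch.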

We numerically compute the cascade size distribution as a function of the number of states N and μ. For a given realisation of a cascade ψ with initial sample space N and multiplicative parameter μ, starting with a single element at N, we define the cascade size, $s_{\mu,N}(\psi)$, as the number of elements of the cascade ψ that reach state 1, $n_1(\psi)$. Numerical analysis suggests that the cascade size distribution $f(s_{\mu,N})$ can be well approximated by a Γ distribution33. For the sake of simplicity, we drop the subscripts μ and N for s. We thus find a purely phenomenological equation that reads

$f(s)\propto s^{\alpha-1}e^{-\lambda s},\quad \langle s\rangle\propto N^{\mu}e^{a\mu},\quad \sigma^{2}\propto N^{b\mu}e^{\left(\frac{1}{2}+a\right)\mu}\langle s\rangle,$
(4)

with a = 0.82, b = 0.9, $\alpha=\langle s\rangle^{2}/\sigma^{2}$, $\lambda=\langle s\rangle/\sigma^{2}$. Numerical results and fits are shown in Fig. (3). The inset shows that the approximation for 〈s〉 is highly accurate.
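The relations $\alpha=\langle s\rangle^{2}/\sigma^{2}$ and $\lambda=\langle s\rangle/\sigma^{2}$ are the usual moment-matching conditions for a Γ distribution, whose mean is α/λ and whose variance is α/λ². A small Python sketch of this step is given below; the function name is hypothetical, and the use of the population variance of the sample is an assumption made here for illustration.

```python
import statistics

def gamma_params_from_moments(sizes):
    """Moment-matched Gamma parameters: alpha = <s>^2 / sigma^2, lam = <s> / sigma^2."""
    mean = statistics.fmean(sizes)
    var = statistics.pvariance(sizes)   # population variance (an assumption of this sketch)
    return mean * mean / var, mean / var

if __name__ == "__main__":
    # Toy cascade sizes: mean 3 and variance 2.
    alpha, lam = gamma_params_from_moments([1, 2, 3, 4, 5])
    print(alpha, lam)
```

In practice `sizes` would hold the simulated values of $n_1(\psi)$ for many cascades at fixed N and μ.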

### Energy conservation and the prevalence of −2 exponent

In the following we derive the statistics of visits to the states of our system when the cascade obeys an energy conservation constraint. Remarkably, we see that the histogram of visits to each state along the whole cascading process follows a power law with exponent −2, regardless of the multiplication parameter of the avalanche. The strategy followed here is based on partition through renormalization and works basically as follows: out of an interval $[0,1]$, a partition is performed by throwing μ random numbers between 0 and 1 and then renormalizing them such that their sum is 1, as in Fig. (4a,b)34. Our strategy can be seen as a particular choice of Dirichlet partitioning of the interval35. This strategy is different from the random selection of breaking points of the interval, described, e.g., in refs 33 and 36. In appendix A we provide a complete proof of our result.

To study SSR cascades with a superimposed conservation law, let us assume, without any loss of generality, that the states 1, 2, …, N are associated with energy levels

$E_1,E_2,\ldots,E_N.$

Energy conservation imposes the following constraint: if a particle with energy E splits into μ particles $i_1,i_2,\ldots,i_\mu$, with respective energies $E_{i_1},E_{i_2},\ldots,E_{i_\mu}$, then:

$\sum_{k=1}^{\mu}E_{i_k}=E.$
(5)

We are interested in the energy spectrum, i.e. the number of observations of particles at a particular energy level at any point of the cascading process, n(E).

The first task is to impose the energy conservation constraint given in equation (5) on the scheme of transition probabilities of the SSR cascade. To do so, we use a rescaling technique and assume that the energy spectrum is continuous. The rescaling technique is outlined in Fig. (4). Let us ignore energy conservation for the moment and define a continuous uniform random variable u on the interval $[0,1]$. Let $u_1,u_2,\ldots,u_\mu$ be μ independent realisations of u, see Fig. (4a). Let us suppose that we are at level E. From this sequence of random variables one can derive the target sites of the newly created particles in an SSR avalanche with multiplicative parameter μ as

$u_1\cdot E,\ u_2\cdot E,\ \ldots,\ u_\mu\cdot E.$

This is the continuous version of the SSR cascade described above. Now we define a new random variable, $\varphi_\mu$, which is the sum of μ realisations of the random variable u:

$\varphi_\mu=\sum_{k\le\mu}u_k.$

The sum of μ realisations of a random variable u uniformly distributed on the interval $[0,1]$, $\varphi_\mu$, follows the Irwin-Hall distribution, $f_\mu(\varphi_\mu)$ 33. This means that one can construct, for each set of μ realisations of the random variable u, a rescaled sequence, see Fig. (4b),

$\varphi_\mu^{-1}u_1\cdot E,\ \ldots,\ \varphi_\mu^{-1}u_\mu\cdot E,$

whose elements sum up to the total energy E,

$\varphi_\mu^{-1}\sum_{i\le\mu}u_i\cdot E=E.$

Thus, by imposing energy conservation we actually expect the following sequence of rescaled energies $E_{i_1},E_{i_2},\ldots,E_{i_\mu}$ for the emerging particles, where

$E_{i_k}=\varphi_\mu^{-1}(u_k\cdot E).$
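The rescaling step can be written down directly. The following Python sketch assumes an integer μ and uses a hypothetical function name, `split_energy`; it draws μ uniform random numbers, computes their sum (one realisation of the Irwin-Hall variable) and rescales so that the fragments add up to the parent energy.

```python
import random

def split_energy(energy, mu, rng):
    """Split `energy` into mu parts: draw mu uniforms and rescale them by their sum."""
    u = [rng.random() for _ in range(mu)]
    phi = sum(u)                                # one realisation of the Irwin-Hall variable
    return [energy * u_k / phi for u_k in u]    # E_{i_k} = u_k * E / phi

if __name__ == "__main__":
    parts = split_energy(10.0, 4, random.Random(0))
    # The fragments sum to the parent energy up to floating-point rounding.
    print(parts, sum(parts))
```

Each fragment is individually bounded by the parent energy, so the SSR property (jumps only to lower states) is preserved by construction.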

This rescaling approach assumes that the μ new particles behave independently. The crucial issue is to map this process into a cascade, see Fig. (4c,d). To approach this problem, we first study the expected number of particles that jump to a given state if a given value of $\varphi_\mu$ occurs. We then average over all potential values of $\varphi_\mu$. We assume that the expected number of particles jumping from E to E′, $n(E\to E')$, goes as ~μp(E′|E). Taking into account the rescaling imposed by energy conservation, $p(E'|E)\sim\frac{\varphi_\mu}{E}$, see Fig. (4), one has that:

$n(E\to E',\varphi_\mu)=\begin{cases}\mu\dfrac{\varphi_\mu}{E} & \text{for } \varphi_\mu E'<E\\ 0 & \text{for } \varphi_\mu E'\ge E .\end{cases}$
(6)

Assuming a continuous spectrum of energies, one has, for a given value of $\varphi_\mu$, that the expected number of particles that will visit state E at some point of the cascade, $n(E,\varphi_\mu)$, is:

$n(E,\varphi_\mu)=\int_{\varphi_\mu E}^{\infty}\mu\frac{\varphi_\mu}{E'}\,n(E')\,dE',$

where n(E′) is the total number of particles that are expected to visit state E′ during the cascade. n(E) is obtained by averaging $n(E,\varphi_\mu)$ over all potential values of $\varphi_\mu$, distributed according to the Irwin-Hall distribution, $f_\mu$:

$n(E)=\int_0^{\mu}f_\mu(\varphi_\mu)\left\{\int_{\varphi_\mu E}^{\infty}\mu\frac{\varphi_\mu}{E'}\,n(E')\,dE'\right\}d\varphi_\mu .$
(7)

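Differentiating n(E) with respect to E (the inner integral depends on E only through its lower limit $\varphi_\mu E$), one arrives at the following equation with displacement:

$\frac{dn}{dE}=-\frac{\mu}{E}\int_0^{\mu}\varphi_\mu f_\mu(\varphi_\mu)\,n(\varphi_\mu E)\,d\varphi_\mu .$

Assuming that $n(E)\propto E^{-\alpha}$, we arrive at the following self-consistent equation for α:

$\alpha=\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu ,$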
(8)

whose only solution converges to α = 2 for large μ, leading to the general result

$n(E)\propto E^{-2}.$
(9)

In appendix A we provide a rigorous derivation of this result. In spite of the asymptotic nature of the proof given in appendix A, numerical simulations show an excellent agreement with this theoretical prediction, even for small μ. In Fig. (5) we show the frequency plots for $10^5$ avalanches with μ = 2.5, 3.5, 4.5, where the convergence of the histograms to $E^{-2}$ is clearly visible. This result is the same as that obtained from Fermi’s particle acceleration model to explain the spectrum of cosmic rays31, 32. Here we derived it on the basis of simple combinatorial reasoning about SSR processes with a superimposed constraint.
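A rough numerical illustration of the $E^{-2}$ law can be obtained with a short simulation. The Python sketch below is not the code behind Fig. (5); it assumes an integer μ, an initial energy `e0`, and a lower cutoff `e_min` below which particles are recorded but no longer split. Both function names and the simple two-bin estimator of the exponent are choices made here for illustration.

```python
import math
import random

def energy_cascade(e0, e_min, mu, rng):
    """Return the energies of all particles created in one cascade started at e0."""
    energies = []
    stack = [e0]
    while stack:
        e = stack.pop()
        u = [rng.random() for _ in range(mu)]
        phi = sum(u)                      # rescaling factor enforcing energy conservation
        for u_k in u:
            child = e * u_k / phi
            energies.append(child)
            if child >= e_min:            # particles below the cutoff are not split further
                stack.append(child)
    return energies

def estimate_exponent(energies, lo, mid, hi):
    """Estimate alpha in n(E) ~ E**(-alpha) from the counts in [lo, mid) and [mid, hi).

    Assumes mid / lo == hi / mid == r; for a pure power law the ratio of the
    two counts is then r**(alpha - 1).
    """
    n_low = sum(1 for e in energies if lo <= e < mid)
    n_high = sum(1 for e in energies if mid <= e < hi)
    r = mid / lo
    return 1.0 + math.log(n_low / n_high) / math.log(r)

if __name__ == "__main__":
    rng = random.Random(0)
    pooled = []
    for _ in range(10):
        pooled.extend(energy_cascade(1.0, 1e-4, 3, rng))
    # Estimated exponent; the theory above predicts a value close to 2.
    print(estimate_exponent(pooled, 1e-4, 1e-3, 1e-2))
```

The bins are chosen well below the initial energy so that the first few generations, which still remember the starting condition, do not dominate the estimate.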

## Discussion

### Appendix A: Derivation of exponent −2 for cascades with energy conservation

We derive the main result of the section on energy conservation, equation (9). The strategy followed is summarised in Fig. (4). An alternative view is given in Fig. (6) in this appendix.

We start with the definition of the Irwin-Hall distribution: let u be a random variable whose probability density is uniform in the interval $[0,1]$. Let $u_1,\ldots,u_\mu$ be a sequence of independent drawings of the random variable u and $\varphi_\mu$ a random variable defined over the interval $[0,\mu]$ as:

$\varphi_\mu=\sum_{k\le\mu}u_k .$
(A1)

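The probability density that governs the random variable $\varphi_\mu$ is the Irwin-Hall distribution, $f_\mu$. Now we go to equation (7),

$n(E)=\int_0^{\mu}f_\mu(\varphi_\mu)\left\{\int_{\varphi_\mu E}^{\infty}\mu\frac{\varphi_\mu}{E'}\,n(E')\,dE'\right\}d\varphi_\mu .$

Differentiating,

$\frac{dn}{dE}=\frac{d}{dE}\int_0^{\mu}f_\mu(\varphi_\mu)\left\{\int_{\varphi_\mu E}^{\infty}\mu\frac{\varphi_\mu}{E'}\,n(E')\,dE'\right\}d\varphi_\mu=\int_0^{\mu}f_\mu(\varphi_\mu)\frac{d}{dE}\left\{\int_{\varphi_\mu E}^{\infty}\mu\frac{\varphi_\mu}{E'}\,n(E')\,dE'\right\}d\varphi_\mu=-\frac{\mu}{E}\int_0^{\mu}\varphi_\mu f_\mu(\varphi_\mu)\,n(\varphi_\mu E)\,d\varphi_\mu ,$

one arrives at the following equation with displacement:

$\frac{dn}{dE}=-\frac{\mu}{E}\int_0^{\mu}\varphi_\mu f_\mu(\varphi_\mu)\,n(\varphi_\mu E)\,d\varphi_\mu .$

Assuming that $n(E)\propto E^{-\alpha}$, we arrive at the following self-consistent equation for α:

$\alpha=\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu .$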

This is equation (8), which is what we have to solve. The following theorem states that the solution in the limit of large μ is α = 2, independent of μ. To demonstrate it we need to prove five lemmas. After the demonstration, we approach the solution α → 2 using a mean-field approach. Finally, we report a side observation concerning the behaviour of the average value of a random variable following the Irwin-Hall distribution.

Theorem: The only α satisfying the following equation:

$\lim_{\mu\to\infty}\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu=2$
(A2)

is α = 2.

To prove this theorem, we first observe that a random variable following the Irwin-Hall distribution converges to a random variable following a normal distribution with average $\frac{\mu}{2}$ and standard deviation $\sqrt{\frac{\mu}{12}}$.

Lemma 1: Let $\varphi_\mu$ be a random variable following the Irwin-Hall distribution. Let Y be a random variable following a normal distribution centred at 0 and with standard deviation 1. Then, the following limit holds:

$\varphi_\mu\to\frac{\mu}{2}+\sqrt{\frac{\mu}{12}}\,Y(0,1)$

in distribution.

Proof: The average value of a uniformly distributed random variable u is $E(u)=\frac{1}{2}$ and its standard deviation is $\sigma=\sqrt{\frac{1}{12}}$. By the central limit theorem, one has that, for an i.i.d. sequence of random variables $u_1,\ldots,u_\mu$:

$\frac{\sum_{i\le\mu}u_i-\frac{\mu}{2}}{\sqrt{\frac{\mu}{12}}}\to Y,$

where Y is a random variable following a normal distribution centred at 0 and with standard deviation 1. Therefore, by realising that $\varphi_\mu$ is actually a sum of μ i.i.d. random variables u, one has:

$\varphi_\mu\to\frac{\mu}{2}+\sqrt{\frac{\mu}{12}}\,Y(0,1),$

as we wanted to demonstrate.☐

This implies that the Irwin-Hall distribution $f_\mu$ can be fairly approximated by a normal distribution $\Phi_\mu(x)$ with mean $\frac{\mu}{2}$ and standard deviation $\sqrt{\frac{\mu}{12}}$. However, one must be careful with this approximation: it can introduce into the integral that we want to solve, $\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu$, a singularity at 0 that does not exist for the Irwin-Hall distribution. Therefore, for the interval [0, 1) we will maintain the original form of the distribution. In the following lemma we demonstrate that this has no impact in the limit of large μ.

Lemma 2: Let $\Phi_\mu$ be a normal distribution with mean $\frac{\mu}{2}$ and standard deviation $\sigma_\mu=\sqrt{\frac{\mu}{12}}$. Then, $(\forall\epsilon>0)(\exists N):(\forall\mu>N)$

$\left|\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu-\int_1^{\mu}\varphi_\mu^{1-\alpha}\Phi_\mu(\varphi_\mu)\,d\varphi_\mu\right|<\epsilon .$

Proof: From lemma 1 we know that $(\forall\epsilon'>0)(\exists N):(\forall\mu>N)$

$\left|\int_1^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu-\int_1^{\mu}\varphi_\mu^{1-\alpha}\Phi_\mu(\varphi_\mu)\,d\varphi_\mu\right|<\epsilon' .$

Now we observe that the Irwin-Hall distribution can be defined per intervals using different polynomials. In the case of the interval [0, 1), the polynomial reads:

$f_\mu(\varphi_\mu)=\frac{1}{(\mu-1)!}\,\varphi_\mu^{\mu-1};\quad\varphi_\mu\in[0,1) .$
(A3)

Computing the integral directly, one has:

$\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu=\int_0^{1}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu+\int_1^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu ,$

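where the first integral, according to equation (A3), leads to:

$\int_0^{1}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu=\frac{1}{(\mu-1)!}\int_0^{1}\varphi_\mu^{\mu-\alpha}\,d\varphi_\mu=\frac{1}{(\mu-1)!\,(\mu-\alpha+1)} .$

Now take $\delta\in(0,1)$ and define $\epsilon(\mu,\delta)$ as:

$\epsilon(\mu,\delta)\equiv\frac{1+\delta}{(\mu-1)!\,(\mu-\alpha+1)}$

From lemma 1, $(\forall\epsilon'>0)(\exists N):(\forall\mu>N)$ we can define the following bound:

$\left|\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu-\int_1^{\mu}\varphi_\mu^{1-\alpha}\Phi_\mu(\varphi_\mu)\,d\varphi_\mu\right|<\epsilon'+\epsilon(\mu,\delta) .$

Finally, we observe that

$\lim_{\mu\to\infty}\epsilon(\mu,\delta)=0,$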

which demonstrates the lemma.☐

Lemma 3: The function G(α) defined by the integral

$G(\alpha)=\int_1^{\mu}\varphi_\mu^{1-\alpha}\Phi(\varphi_\mu)\,d\varphi_\mu ,$

is strictly decreasing.

Proof: It is enough to compute the derivative:

$\frac{d}{d\alpha}\int_1^{\mu}\varphi_\mu^{1-\alpha}\Phi(\varphi_\mu)\,d\varphi_\mu=\int_1^{\mu}\left(\frac{d}{d\alpha}\varphi_\mu^{1-\alpha}\right)\Phi(\varphi_\mu)\,d\varphi_\mu=-\int_1^{\mu}\varphi_\mu^{1-\alpha}\log\varphi_\mu\,\Phi(\varphi_\mu)\,d\varphi_\mu<0,$

since the term inside the integral, $\varphi_\mu^{1-\alpha}\log\varphi_\mu\,\Phi(\varphi_\mu)$, is strictly positive in the interval (1, μ).☐

Now take a monotonically increasing function, $\phi(\mu)$, that grows more slowly than the standard deviation $\sigma_\mu=\sqrt{\frac{\mu}{12}}$. For convenience, we define it as:

$\phi(\mu)\equiv\left(\frac{\mu}{12}\right)^{\frac{1}{4}} .$
(A4)

Clearly:

$\lim_{\mu\to\infty}\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\Phi(\varphi_\mu)\,d\varphi_\mu=1 .$
(A5)

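Now suppose that α = 2. Then, thanks to Lemma 2, one has that $(\forall\epsilon>0)(\exists N):(\forall\mu>N)$

$\left|\int_0^{\mu}\frac{f_\mu(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu-\int_1^{\mu}\frac{\Phi_\mu(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu\right|<\epsilon .$

From this we derive the fourth lemma of our demonstration:

Lemma 4: $(\forall\epsilon>0)(\exists N):(\forall\mu>N),$

$\left|\int_1^{\mu}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu-\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu\right|<\epsilon .$

Proof: We need to compute the parts that fall outside the integration limits and see that their contribution vanishes. First, we see that:

$\int_{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}^{\mu}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu<\left(\frac{\mu}{2}+\sigma_\mu\phi(\mu)\right)e^{-\phi^{2}(\mu)-O(\log\mu)}<\left(\frac{\mu}{2}+\sigma_\mu\phi(\mu)\right)e^{-\mu} .$

Analogously,

$\int_1^{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu<\left(\frac{\mu}{2}-\sigma_\mu\phi(\mu)\right)e^{-\mu} .$

Now we define:

$\epsilon_1(\mu)=\left(\frac{\mu}{2}+\sigma_\mu\phi(\mu)\right)e^{-\mu},\quad\epsilon_2(\mu)=\left(\frac{\mu}{2}-\sigma_\mu\phi(\mu)\right)e^{-\mu},$

so that

$\left|\int_1^{\mu}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu-\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu\right|<\epsilon_1(\mu)+\epsilon_2(\mu),$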

demonstrating the lemma.☐

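Corollary of Lemma 4: $(\forall\epsilon>0)(\exists N):(\forall\mu>N),$

$\left|\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu-\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu\right|<\epsilon .$

Proof: By direct application of lemmas 2 and 4.☐

Now we define the following functions of the limits of the integral $\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\ldots$:

$r_1(\mu)\equiv\left(\frac{\mu}{2}+\sigma_\mu\phi(\mu)\right)^{-1}=\frac{2}{\mu}-\frac{2\sigma_\mu\phi(\mu)}{\mu\left(\frac{\mu}{2}+\sigma_\mu\phi(\mu)\right)},\quad r_2(\mu)\equiv\left(\frac{\mu}{2}-\sigma_\mu\phi(\mu)\right)^{-1}=\frac{2}{\mu}+\frac{2\sigma_\mu\phi(\mu)}{\mu\left(\frac{\mu}{2}-\sigma_\mu\phi(\mu)\right)} .$

Clearly,

$r_{1,2}(\mu)\sim\frac{2}{\mu}+O\left(\mu^{-\frac{5}{4}}\right),$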
(A6)

where the subscript 1,2 means that both functions satisfy the property.

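Lemma 5: $(\forall\epsilon>0)(\exists N):(\forall\mu>N)$

$\left|r_{1,2}(\mu)\cdot\mu\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\Phi(\varphi_\mu)\,d\varphi_\mu-\mu\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu\right|<\epsilon .$

Proof: We first observe that, by replacing $\varphi_\mu^{-1}$ by its values at the integration limits, we have the following chain of inequalities, in terms of the above defined functions $r_{1,2}(\mu)$ (note that $r_1(\mu)<r_2(\mu)$):

$r_1(\mu)\cdot\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\Phi(\varphi_\mu)\,d\varphi_\mu<\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\frac{\Phi(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu<r_2(\mu)\cdot\int_{\frac{\mu}{2}-\sigma_\mu\phi(\mu)}^{\frac{\mu}{2}+\sigma_\mu\phi(\mu)}\Phi(\varphi_\mu)\,d\varphi_\mu .$

Therefore, it is enough to demonstrate that $(\forall\epsilon'>0)(\exists N):(\forall\mu>N)$

$\left|\mu r_1(\mu)-\mu r_2(\mu)\right|<\epsilon' .$

This can be proven directly from equation (A6), leading to:

$\left|\mu r_1(\mu)-\mu r_2(\mu)\right|\sim O\left(\mu^{-\frac{1}{4}}\right),$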

which demonstrates the lemma.☐

Collecting lemmas 1, 2, 4, and 5, we have demonstrated that, under the assumption that α = 2,

$\mu r_1(\mu)\to\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu .$

From equation (8), the only remaining issue in demonstrating the consistency of our hypothesis is to show that indeed $\lim_{\mu\to\infty}\mu r_1(\mu)=2$. It is not difficult to check, from equation (A6), that:

$\lim_{\mu\to\infty}\mu r_1(\mu)=\lim_{\mu\to\infty}\left[2+O\left(\mu^{-\frac{1}{4}}\right)\right]=2 .$

So far we have demonstrated that the solution α = 2 is consistent with the statement of the theorem. Now it remains to demonstrate that this is the only solution. To see that, we observe that, thanks to lemma 3, we know that the function μG(α) is decreasing. In addition, we have proven that the statement of the theorem is consistent for α = 2. Therefore, if α = 2 + β, with β > 0, then:

$\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu<2+\beta ,$

which contradicts the statement of the theorem. The same happens if one imposes α = 2 − β, with β > 0, since one gets:

$\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu>2-\beta ,$

which is again inconsistent, thereby proving that α = 2 is the only solution.☐

By direct application of equation (8), Lemma 5 provides the last piece needed to demonstrate the theorem.☐

### Mean-field approach

In a less rigorous way, we observe that we can approach the solution as follows: we know that the expected value of a random variable $\varphi_\mu$ following the Irwin-Hall distribution $f_\mu$ is:

$E(\varphi_\mu)=\frac{\mu}{2} .$

Now assume that $\varphi_\mu\approx\frac{\mu}{2}$. This implies that, in the integral of the statement of the theorem, equation (A2), we replace $f_\mu(\varphi_\mu)$ by $\delta\left(\varphi_\mu-\frac{\mu}{2}\right)$, where δ is the Dirac δ function:

$\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu\approx\mu\int_0^{\mu}\varphi_\mu^{1-\alpha}\,\delta\left(\varphi_\mu-\frac{\mu}{2}\right)d\varphi_\mu .$

Solving the integral, and thanks to equation (8), we obtain the following relation:

$\frac{\alpha}{\mu}=\left(\frac{\mu}{2}\right)^{1-\alpha},$

whose only solution is α = 2.

We end by observing that the theorem that we demonstrated has a curious consequence: let $\varphi_\mu$ be a random variable following the Irwin-Hall distribution. We observe that, if α = 2, then:

$\frac{\alpha}{\mu}=\int_0^{\mu}\varphi_\mu^{1-\alpha}f_\mu(\varphi_\mu)\,d\varphi_\mu=\int_0^{\mu}\frac{f_\mu(\varphi_\mu)}{\varphi_\mu}\,d\varphi_\mu=E\left(\frac{1}{\varphi_\mu}\right)\to\frac{2}{\mu} .$

We know that $E(\varphi_\mu)=\frac{\mu}{2}$. Therefore, a direct consequence of the theorem is that:

$E\left(\frac{1}{\varphi_\mu}\right)\to\frac{1}{E(\varphi_\mu)} .$
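The asymptotic statement $\mu E(1/\varphi_\mu)\to 2$ can be inspected numerically for integer μ ≥ 2. The Python sketch below is an illustration only: it uses the standard piecewise-polynomial form of the Irwin-Hall density and a plain midpoint rule, and both function names and the step size are choices made here. For μ = 2 the integral can be done by hand and gives $2\,E(1/\varphi_2)=4\ln 2\approx 2.77$, which shows that the limit value 2 is approached from above as μ grows.

```python
import math

def irwin_hall_pdf(x, mu):
    """Density of the sum of mu independent uniform(0, 1) variables (integer mu >= 1)."""
    if x < 0 or x > mu:
        return 0.0
    total = 0.0
    for k in range(int(math.floor(x)) + 1):
        total += (-1) ** k * math.comb(mu, k) * (x - k) ** (mu - 1)
    return total / math.factorial(mu - 1)

def mu_times_mean_inverse(mu, steps_per_unit=500):
    """Midpoint-rule estimate of mu * E(1/phi_mu), i.e. mu times the integral of f_mu(x)/x."""
    n = mu * steps_per_unit
    h = mu / n
    integral = 0.0
    for s in range(n):
        x = (s + 0.5) * h                 # midpoints avoid evaluating at x = 0
        integral += irwin_hall_pdf(x, mu) / x
    return mu * integral * h

if __name__ == "__main__":
    for mu in (2, 5, 10, 20):
        print(mu, mu_times_mean_inverse(mu))
```

The alternating sum in the density loses precision for large μ in floating-point arithmetic, so this sketch is only meant for moderate values of μ.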

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Yule, G. U. A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J. C. Willis. Philos. Trans. R. Soc. London B 213, 21 (1925).

2. Simon, H. A. On a class of skew distribution functions. Biometrika 42, 425 (1955).

3. Barabási, A.-L. & Albert, R. Emergence of scaling in random networks. Science 286, 509 (1999).

4. Mitzenmacher, M. A Brief History of Generative Models for Power Law and Lognormal Distributions. Internet Mathematics 1(2), 226 (2003).

5. Newman, M. E. J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46(5), 323 (2005).

6. Stanley, H. E. Introduction to Phase Transitions and Critical Phenomena (Oxford University Press, Oxford) (1987).

7. Bak, P., Tang, C. & Wiesenfeld, K. Self-organized criticality: An explanation of the 1/f noise. Phys. Rev. Lett. 59, 381 (1987).

8. Kadanoff, L. P., Nagel, S. R., Wu, L. & Zhou, S.-M. Scaling and universality in avalanches. Phys. Rev. A 39, 6524 (1989).

9. Jensen, H. J. Self-Organized Criticality (Cambridge University Press, Cambridge) (1996).

10. Christensen, K. & Moloney, N. R. Complexity and Criticality (Imperial College Press, London, UK) (2005).

11. Mandelbrot, B. An Informational Theory of the Statistical Structure of Languages. In Communication Theory, Jackson, W. editor, 486–502 (Butterworth, Woburn, MA) (1953).

12. Harremoës, P. & Topsøe, P. Maximum Entropy Fundamentals. Entropy 3, 191 (2001).

13. Corominas-Murtra, B., Fortuny, J. & Solé, R. V. Emergence of Zipf’s law in the evolution of communication. Phys. Rev. E 83, 036115 (2011).

14. Corominas-Murtra, B., Hanel, R. & Thurner, S. Understanding scaling through history-dependent processes with collapsing sample space. Proc. Natl. Acad. Sci. USA 112(17), 5348 (2015).

15. Corominas-Murtra, B., Hanel, R. & Thurner, S. Extreme robustness of scaling in sample space reducing processes explains Zipf-law in diffusion on directed networks. New Journal of Physics 18(9), 093010 (2016).

16. Hanel, R., Thurner, S. & Gell-Mann, M. How multiplicity determines entropy and the derivation of the maximum entropy principle for complex systems. Proc. Natl. Acad. Sci. USA 111, 6905 (2014).

17. Paczuski, M., Maslov, S. & Bak, P. Avalanche dynamics in evolution, growth, and depinning models. Phys. Rev. E 53, 414 (1996).

18. Jackson, M. O. Social and Economic Networks (Princeton University Press, Princeton, NJ) (2010).

19. Frette, V., Christensen, K., Malthe-Sorensen, A., Feder, J., Jossang, T. & Meakin, P. Avalanche dynamics in a pile of rice. Nature 379, 49 (1996).

20. Sornette, A. & Sornette, D. Self-organized criticality and earthquakes. EPL (Europhysics Letters) 9(3), 197 (1989).

21. Turcotte, D. L. Fractals and Chaos in Geology and Geophysics, 2nd ed. (Cambridge University Press, Cambridge) (1997).

22. Corral, Á. Long-term clustering, scaling, and universality in the temporal occurrence of earthquakes. Phys. Rev. Lett. 92(10), 108501 (2004).

23. Peters, O. & Neelin, D. Critical phenomena in atmospheric precipitation. Nature Physics 2, 393 (2006).

24. Corral, Á., Osso, A. & Llebot, J. E. Scaling of tropical-cyclone dissipation. Nature Physics 6, 693 (2010).

25. Thurner, S., Klimek, P. & Hanel, R. Schumpeterian economic dynamics as a quantifiable minimum model of evolution. New Journal of Physics 12, 075029 (2010).

26. Boss, M., Summer, M. & Thurner, S. Contagion flow through banking networks. Lecture Notes in Computer Science 3038, 1070 (2004).

27. Buldyrev, S. V., Parshani, R., Paul, G., Stanley, H. E. & Havlin, S. Catastrophic cascade of failures in interdependent networks. Nature 464(7291), 1025 (2010).

28. Thurner, S., Farmer, J. D. & Geanakoplos, J. Leverage causes fat tails and clustered volatility. Quantitative Finance 12, 695 (2012).

29. Hanel, R. & Thurner, S. Generalized (c,d)-Entropy and Aging Random Walks. Entropy 15, 5324–5337 (2013).

30. Thurner, S., Hanel, R., Liu, B. & Corominas-Murtra, B. Understanding Zipf’s law of word frequencies through sample-space collapse in sentence formation. Journal of the Royal Society Interface 12, 20150330 (2016).

31. Fermi, E. On the Origin of the Cosmic Radiation. Phys. Rev. 75, 1169 (1949).

32. Longair, M. S. High Energy Astrophysics, Vol. 2: Stars, the Galaxy and the Interstellar Medium (Cambridge University Press, Cambridge, MA) (2008).

33. Feller, W. An Introduction to Probability Theory and its Applications, Vols I and II, third edition (John Wiley and Sons, New York, NY) (1968).

34. Kingman, J. F. C. Poisson processes, Oxford Studies in Probability, vol. 3 (The Clarendon Press, Oxford University Press, New York) (1993).

35. Huillet, T. Sampling formulae arising from random Dirichlet populations. Communications in Statistics - Theory and Methods 34(5), 1019–1040 (2005).

36. Krapivsky, P. L. & Ben-Naim, E. Scaling and multiscaling in models of fragmentation. Phys. Rev. E 50, 3502–3507 (1994).

37. Hanel, R., Corominas-Murtra, B., Liu, B. & Thurner, S. Fitting Power-laws in empirical data with estimators that work for all exponents. PLoS One 12(2), e0170920 (2016).

## Acknowledgements

This work was supported by the Austrian Science Fund FWF under the P29032 and P 29252 projects. We acknowledge an anonymous reviewer for the constructive comments on our first version of the manuscript.

## Author information

### Affiliations

1. Section for the Science of Complex Systems, CeMSIIS, Medical University of Vienna, Spitalgasse 23, A-1090, Vienna, Austria

• Bernat Corominas-Murtra, Rudolf Hanel & Stefan Thurner

2. Complexity Science Hub Vienna, Josefstädterstrasse 39, 1080, Vienna, Austria

• Bernat Corominas-Murtra, Rudolf Hanel & Stefan Thurner

3. Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, 87501, USA

• Stefan Thurner

4. IIASA, Schlossplatz 1, 2361, Laxenburg, Austria

• Stefan Thurner

### Contributions

B.C.-M., R.H. and S.T. designed, performed the research and wrote the manuscript.

### Competing Interests

The authors declare that they have no competing interests.

### Corresponding author

Correspondence to Stefan Thurner.