Branching into the Unknown: Inferring collective dynamical states from subsampled systems

When studying the dynamics of complex systems, one can rarely sample the state of all components. We show that this spatial subsampling typically leads to severe underestimation of the risk of instability in systems with propagation of events. We analytically derived a subsampling-invariant estimator and applied it to non-linear network simulations and case reports of various diseases, recovering a close relation between vaccination rate and spreading behavior. The estimator can be particularly useful in countries with unreliable case reports, and promises early warning if e.g. antibiotic resistant bacteria increase their infectiousness. In neuroscience, subsampling has led to contradictory hypotheses about the collective spiking dynamics: asynchronous-irregular or critical. With the novel estimator, we demonstrated for rat, cat and monkey that collective dynamics lives in a narrow subspace between the two. Functionally, this subspace can combine the different computational properties associated with the two states.

How can we infer properties of a high-dimensional dynamical system if we can only observe a very small part of it? This problem of spatial subsampling is common to almost every area of research where spatially extended, time evolving systems are investigated. For example, in many diseases the number of reported infections may be much lower than the unreported ones [1,2], or in the financial system only a subset of all banks is evaluated when assessing the risk of developing system wide instability [3] ("stress test"). Spatial subsampling is particularly severe when recording neuronal spiking activity, because the number of neurons that can be recorded with ms precision is vanishingly small compared to the number of all neurons in a brain area [4,5] (Fig. 1a).
Here, we show for a large class of systems, namely systems that can be approximated by a process with 1 st order autoregressive representation (PAR), that subsampling leads to a strong overestimation of their stability. Besides the conventional AR(1) process, PARs include e.g. branching processes [6,7,8] and Kesten processes [9]. PARs are ubiquitously used as an first order approximation to assess the self-sustaining dynamics and stability of systems with propagating activity, including epidemic spread of infectious diseases [10,11], cell proliferation [12], evolution [13] (further biological applications in [14]), neutron processes in nuclear power reactors [15], spread of bankruptcy [16,17], evolution of stock prices [18,19], or the propagation of spiking activity in neural networks [20,21] (Fig. 1b). In these systems, overestimating their stability can fa-cilitate catastrophes, e.g. when misjudging the risk that a disease becomes epidemic, or that a brain area becomes epileptic. Correct risk prediction is essential to timely initiate counter actions which can mitigate the propagation of events. We analytically derived the origin of the estimation bias and developed a novel estimator, which we analytically proof to be unbiased under subsampling. To assure that a PAR is a reasonable approximation of the evaluated complex system, and to exclude contamination through potential nonstationarities, we included a set of automated, datadriven tests.
In a PAR, the activity in the next time step, A t+1 , depends linearly on the current activity A t . In addition, it allows external input, e.g. drive from stimuli or other brain areas, with a mean rate h, yielding the autoregressive representation Here, · | · denotes the conditional expectation. The stability of A t is solely governed by m, for example the mean number of spikes triggered by one spike in a neuron. The activity is stationary if m < 1, while it grows exponentially if m > 1. The state m = 1 separates the stable from the unstable regime. Especially close to this transition, a correct estimate of m is vital to assess the risk that A t develops a large, potentially devastating cascade or avalanche of events (e.g. an epileptic seizure or an epidemic disease outbreak), either generically or via a minor increase in m. Figure 1: Spatial subsampling. a. In complex networks, such as the brain, often only a small subset of all units can be sampled (spatial subsampling); figure created using TREES [22]. b. In a branching network (BN), an active unit (e.g. a spiking neuron, infected individual, or defaulting bank) activates some of its neighbors in the next time step. Thereby activity can spread over the system. As the subsampled activity at may significantly differ from the actual activity At, spatial subsampling can impair inferences about the dynamical properties of the full system. c. In recurrent networks (BN, Bak- Tang Conventional estimation of m uses linear regression [23] (r 1 ) of activity at time t and t + 1, because r 1 directly returnsm owing to the autoregressive representation. This estimation of m is consistent if the full activity A t is known [23,24]. However, under subsampling it can be strongly biased, as we show here. To derive the bias quantitatively, we only assume that the subsampled activity a t is in expectation linear in A t , a t | A t = α A t + β with two constants α and β (Def. S1). This represents, for example, sampling a fraction α of all neurons in a brain area. Then the conventional estimator is biased by m (α 2 Var[A t ] / Var[a t ]−1) (Corollary S5). The bias vanishes when all units are sampled (α = 1, Fig. 1c,d). We emphasize that this bias is inherent to subsampling and cannot be overcome by obtaining longer recordings.
The novel estimator takes a different approach (Def. S2). Instead of directly using the biased regression r 1 , we make use of our analytical result that all regressions r k between a t and a t+k are biased by the same factor (Fig. 1e, Theorem S4). Hence, the exponential relation [25] r k = m k (Theorem S1) under full sampling translates to under subsampling. The factor b is, in general, not known. However, one does not need to know it to estimate m from the exponential relation, which serves as the heart of our new multiple regression (MR) estimator (Fig. 1f, Corollary S2 and Theorem S4). The MR estimator is unbiased under subsampling, because the system itself evolves independently of the sampling process: While subsampling biases each regression r k by decreasing the mutual dependence between subsequent observations (a t , a t+k ), the temporal decay in r k ∼m k = e −k/τ remains unaffected. Here, τ = −(log m) −1 refers to the autocorrelation time of stationary (subcritical) processes, where autocorrelation and regression r k are equal. Thus for subcritical PARs, subsampling decreases the autocorrelation strength r k , while the autocorrelation time τ is preserved. Making use of this result allows for an unbiased estimate of m even when sampling only a single unit (Fig. 1f ).
PARs are typically only a first order approximation of real world event propagation. However, their mathematical structure allowed for an analytical derivation of the subsampling bias and the unbiased estimator. To show that the MR estimator returns correct results also for more complex systems, we applied it to more complex simulated systems: a branching network [21] (BN) and the non-linear Bak-Tang-Wiesenfeld model [26] (BTW). In contrast to generic PARs, these models (a) run on recurrent networks and (b) are of finite size. In addition, the second model shows (c) completely deterministic propagation of activity instead of the stochastic propagation that characterizes PARs, and (d) the activity of each unit depends of many past time steps, not only one. Both models approximate neural activity propagation in the cortex [20,21,4,5,27,28]. In both models, the numerical estimates very well match the analytically derived bias (dashed lines in Fig. 1c, see Eq. S3), although the BN and BTW are only approximated by a PAR. The bias is considerable: For example, sampling 10% or 1% of the neurons in a BN with m = 0.9 resulted in the wrong estimatesm = r 1 = 0.312, or evenm = 0.047, respectively. Thus a process fairly close to instability (m = 0.9) is mistaken as Poisson-like (m = 0.047 ≈ 0) just because sampling is constrained to 1% of the neurons. Thereby the risk that systems may develop instabilities is tremendously underestimated.
MR estimation is readily applicable to subsampled data, because it only requires a sufficiently long time series a t , and the assumption that in expectation a t is proportional to A t . Hence, in general it suffices to sample the system randomly, without even knowing the system size N , the number of sampled units n, nor any moments of the underlying process. Importantly, one can obtain an unbiased estimate of m, even when sampling only a very small fraction of the process, under homogeneity even when sampling only one single unit (Figs. 1c,d). This robustness makes the estimator readily applicable to any system that can be approximated by a PAR.
Application to disease case reports. We used the novel estimator to estimate the "reproductive number" m for incidence time series of different diseases. Here, m determines the disease spreading behavior and has been deployed to predict the risk of epidemic outbreaks [11]. However, the problem of subsampling or underascertainment has always posed a challenge [1,2]. Our MR estimator promises to infer the reproductive number m directly from the subsampled time series, without even knowing the degree of under-ascertainment.
First, we investigated measles, a contagious infection and a leading cause of death among children, because the vaccination percentage for measles differs in each country, and this is expected to impact the spreading behavior through m. We evaluated worldwide case and vaccination reports for 124 countries provided by the WHO since 1980 (Fig. 2a, see Supp. 8). The reproductive numberm ranged between 0 and 0.93, and in line with our prediction clearly decreased with increasing vaccination percentage in the respective country (Spearman rank correlation: r = −0.343, p < 10 −4 , Fig. 2b), even though the time series comprised only up to 35 data points per country.
Second, we estimated the reproductive numbers for three under-ascertained [2,29] diseases in Germany with highly different infectiousness: norovirus, measles, and invasive meticillin-resistant Staphylococcus aureus (MRSA, an antibiotic-resistant germ classically associated with health care facilities [30,31], Figs. 2c,d), and quantified for the first time their propagation behavior. MR estimation returned the highestm = 0.98 for Disease propagation. In epidemic models, the reproductive number m can serve as an indicator for the infectiousness of a disease within a population, and predict the risk of large incidence bursts. We have estimatedm from incidence time series of measles infections for 124 countries worldwide (see Supp. 7); as well as norovirus, measles, and invasive meticillin-resistant Staphylococcus aureus (MRSA) infections in Germany. a. MR estimation of m is shown for measles infections in three different countries. Errorbars here and in all following figures indicate 1SD or the corresponding 16% to 84% confidence intervals if asymmetric. b. The reproductive numbersm decrease with the vaccination rate (Spearman rank correlation: r = −0.342, p < 10 −4 ). c. Reproductive numbersm for norovirus, measles and MRSA infections in Germany. d. Weekly case report time series for these infections surveyed by the Robert-Koch-Institute. norovirus, compliant with its high infectiousness [32]. For measles we found an intermediatem = 0.88, reflecting the vaccination rate of about 97%. For MRSA we identified m = 0, confirming that transmission is still minor in Germany [33]. However, a future increase of transmission is feared and would pose a major public health risk [34]. Such an increase could be detected by our estimator, even in countries where case reports are incomplete.
Spiking activity in vivo is subcritical. We applied the MR estimator to in vivo spiking activity of cortical networks to investigate two contradictory hypothesis about collective spiking dynamics. One hypothesis suggests that the collective dynamics is "asynchronous irregular" (AI), i.e. neurons spike independently of each other and in a Poisson manner (m = 0), e.g. reflecting a balanced state [35,36,37]. The other hypothesis suggests that neuronal networks operate at criticality (m = 1) [20,38,39,40,41], thus in a particularly sensitive state close to a phase transition. These different hypotheses have distinct implications for the coding strategy of the brain: Criticality is characterized by long-range correlations in space and time, and in models optimizes performance in tasks that profit from long reverberation of the activity in the network [42,21,43,44]. In contrast, the typical balanced state minimizes redundancy [45,46,47] and supports fast network responses [35].
Analyzing in vivo spiking activity from Macaque monkey prefrontal cortex during a memory task [48], anesthetized cat visual cortex with no stimulus [49], and rat hippocampus during a foraging task [50,51] returnedm to be between 0.963 and 0.998 (median m = 0.984), corresponding to auto-correlation times τ between 100 ms and 2000 ms (Fig. 3d, Fig. S3). This clearly suggests that spiking activity in vivo is neither AI-like (m = 0), nor consistent with a critical state (m = 1), which shows τ → ∞. Instead cortical activity lives in a narrow subspace covering only 5% of the physiologically plausible range (m ∈ [0, 1], as instability is not physiologically sustainable).
The MR estimator can distinguish these states with the necessary precision, because tiny changes in m have large impact on the network dynamics if the network is close to m = 1. In fact, the estimator would allow for 100 times higher precision when distinguishing critical from non-critical BN, assuming in vivo-like subsampling (sampling n = 100 from N = 10 4 neurons, Fig.  3d). With larger N , this discrimination becomes even more sensitive (detailed error estimates: Fig. S2 and Supp. 6). As the number of neurons in a given brain area is typically much higher than N = 10 4 in the simulation, finite size effects are not likely to account for the observed deviation from criticality = 1 − m ≈ 10 −2 in vivo, supporting that in rat, cat, and monkey the brain does not operate in a critical state.
The in vivo recordings showed clear fluctuations for the population spiking activity (Fig. 3a). Such fluctuations could in principle arise from non-stationarities, which could in turn lead to misestimation of m. This is unlikely for three reasons: First, our MR estimator rejects recordings that show any signature of common non-stationarities ( Fig. S1, see Supp. 5). Second, recordings in cat visual cortex were acquired in absence of any stimulation, excluding stimulus related non-stationarities. Third, when splitting the spike recording into short windows, the window-to-window variation ofm in the recording did not differ from that of a stationary in vivo-like BN (p = 0.3, Figs. 3b,c). The in vivo-like BN was set up with the same branching ratio m, spike rate a t , number of sampled neurons n, and duration as the experimental recording (for the cat n = 50, m = 0.98, a t /n = 7.9 Hz, recording of 295 s length).
We want to stress, that cortical dynamcis is more complicated than a simple PAR, because heterogeneity of neuronal morphology and function, non-trivial network topology, and the complexity of neurons themselves are likely to have a profound impact onto the population dynamics. Nevertheless, we found an exponential relation r k = b m k for the population activity of all considered species (insets of Fig. 3b,c, Fig.  S3), indicating that cortical dynamics can at least to a large extent be approximated by a PAR, despite the diversity of the underlying system. Because of the analytical tractability of such generic models, this analysis allows valuable insight into reverberant dynamics and the stability of cortical activity. Future analysis can then study the effect of relevant, additional parameters. In order to test for the applicability of a PAR approximation, we defined several tests (see Supp. 5) and included only those experiments, where the approximation by a PAR was considered appropriate (14 out of 21, Fig. S3). Note, that these tests were defined very conservative. For example, we excluded all experiments that showed an offset in the slopes r k , because this offset is, strictly speaking, not explained by a PAR and might indicate non-stationarities.
Using the novel MR estimator, we have shown that in vivo spiking dynamics in three different species operates neither at criticality, nor in an AI state, but in a regime with finite correlation times of hundreds of milliseconds, in agreement with the results by [52]. In summary, because of its broad applicability, in particular because it requires only the knowledge of the recorded time series, but no assumption about the size of the full system, the number of sampled units, or any of the moments of the full process, we expect that MR estimation can substantially contribute to the understanding of real-world dynamical systems in diverse fields of research where subsampling prevails.
We emphasize that the MR estimator only requires the subsampled recording a t of a system with full activity A t conforming with the definition below. It is not necessary to know either the full system size, the number of subsampled units, nor any of the moments of the full process A t .

Supp. 2 Branching processes
In a branching process (BP) with immigration [6,7,8] each unit i produces a random number y t,i of units in the subsequent time step. Additionally, in each time step a random number h t of units immigrates into the system (drive). Mathematically, BPs are defined as follows [6,7]: Let y t,i be independently and identically distributed non-negative integer-valued random variables following a law Y with mean m = Y and variance σ 2 = Var[Y ]. Further, Y shall be non-trivial, meaning it satisfies P[Y = 0] > 0 and P[Y = 0] + P[Y = 1] < 1. Likewise, let h t be independently and identically distributed non-negative integer-valued random variables following a law H with mean rate h = H and variance ξ 2 = Var [H]. Then the evolution of the BP A t is given recursively by i.e. the number of units in the next generation is given by the offspring of all present units and those that were introduced to the system from outside. The stability of BPs is solely governed by the mean offspring m. In the subcritical state, m < 1, the population converges to a stationary distribution A ∞ with mean A ∞ = h/(1 − m). At criticality (m = 1), A t asymptotically exhibits linear growth, while in the supercritical state (m > 1) it grows exponentially.

Supp. 3 Subsampling
To derive the MR estimator for subsampled data, subsampling is implemented in a parsimonious way, according to the following definition: Definition S1 (Subsampling). Let {A t } t∈N be a BP and {a t } t∈N a sequence of random variables. Then {a t } t∈N is called a subsampling of {A t } t∈N if it fulfills the following three conditions: (i) Let t , t ∈ N, t = t. Then the conditional random variables (a t |A t = j) and (a t |A t = l) are independent for any j, l ∈ N. If A t = A t then (a t |A t = j) and (a t |A t = j) are identically distributed.
(ii) Let t ∈ N. Conditioning on a t does not add further information to the process: The two random variables (A t+1 | A t = j, a t = l) and (A t+1 | A t = j) are identically distributed for any j, l ∈ N.
Thus the subsample a t is constructed from the full process A t based on the three assumptions: (i) The sampling process does not interfere with itself, and does not change over time. Hence the realization of a subsample at one time does not influence the realization of a subsample at another time, and the conditional distribution the subsampled a t and a(t ) do not necessarily take the same value. (ii) The subsampling does not interfere with the evolution of A t , i.e. the process evolves independent of the sampling. (iii) On average a t is proportional to A t up to a constant term. It will be shown later, that the novel estimator is applicable to any time series a t that was acquired from a BP conforming with this definition of subsampling. We will demonstrate possible applications at the hand of two examples: 1. Diagnosing infections with probability α. For example, when a BP A t represents the spread of infections within a population, each infection may be diagnosed with probability α ≤ 1, depending on the sensitivity of the test and the likelihood that an infected person consults a doctor. If each of the A t infections is diagnosed independent of the others, then the number of diagnosed cases a t follows a binomial distribution a t ∼ Bin(A t , α). Then a t |A t = j = α j is given by the expected value of the binomial distribution. This implementation of subsamling conforms with the definition, with the sampling probability α and the constant in (iii) being identical here.
2. Sampling a subset of system components. In a different application, a high-dimensional system of interacting units forms the substrate on which activation propagates. Often, the states of a subset units are observed continuously, for example by placing electrodes that record the activity of the same neurons over the entire recording (Fig. 1b). This implementation of subsampling in finite size systems is mathematically approximated as follows: If n out of all N model units are sampled, the probability to sample a t active units out of the actual A t active units follows a hypergeometric distribution, a t ∼ Hyp(N, n, A t ). As a t | A t = j = j n / N , this representation satisfies Def. S1 with α = n / N . Choosing this special implementation of subsampling allows to derive predictions for the Fano factor under subsampling and the spike count cross correlation. First, evaluate Var[a t ] further in terms of A t : This expression precisely determines the variance Var[a t ] under subsampling from the properties A t and Var[A t ] of the full process, and from the parameters of subsampling n and N . Using Eq. (S2), we could predict the linear regression slopesr k under subsampling (see Theorem S4, Eq. (S15)) in more detail: is constant when subsampling a given (stationary) system, and quantifies the factor by whichm is biased when using the conventional estimate for m. It depends on N , n and the first two moments of A t and is thus known for a BP. This relation was used for Fig. 1c.

Supp. 4 MR estimation
We here derive an estimator for the mean offspring m based on the autoregressive representation of the BP, This novel estimator is based on multistep regressions [25] (MR estimator), which generalize (S4) to arbitrary time steps k. From iteration of Eq. (S4), it is easy to see that Definition S2 (Multistep regression estimator). Consider a subsampled BP {a t } of length T . Let k max ∈ N, k max ≥ 2. Then multistep regression (of k max -th order) estimates m in the following way: 1. For k = 1, . . . , k max , estimate the sloper k of linear regression between the pairs {(a t , a t+k )} T −k t=0 , e.g. by least square estimation (Fig. 1e).
2. Based on the relation r k = b · m k , estimateb andm by minimizing the sum of residuals with the collection of slopes {r k } kmax k=1 obtained from step 1 (Fig. 1f ).
Thenm is the multistep regression (MR) estimate of the mean offspring m. For the application to experimental data, we further applied tests to identify nonstationarities (see Supp. 5).
We first prove that the MR estimator is consistent in the fully sampled case, and will then show the consistency under subsampling. First, we need the following result about the individual linear regression slopeŝ r k under full sampling: Theorem S1. The sloper k , obtained from A t under full sampling, is a consistent estimator for m k . If the process is subcritical, thenŝ k is also a consistent estimator for h 1−m k 1−m . Remark. For k = 1, these results were already obtained by [23,56,24], and details can be found in these sources. Based on their proofs, we here show the generalization to k timesteps.
Proof. Let k ∈ N, i ∈ {0, . . . , k − 1}. Construct a new random process by starting at time i and taking every k-th time step of the original process A t . This new process is given by A (k,i) t = A i+k·t with the index t ∈ N. Hence, the "time" t of this new process relates to the time t of the old process as t = i + k · t . For a time series of length T , let r (k,i) be the least square estimator for the slope andŝ (k,i) the least square estimator for the intercept of linear regression on all pairs (A . We will derive that r (k,i) is a consistent and unbiased estimator for m k . According to [24], it is sufficient to show that the evolution of A (k,i) t can be rewritten as , as this is a stochastic regression equation. Hence, consider We now show that ( holds. Hence, satisfies a linear stochastic regression equation with slope m k and intercept h 1−m k 1−m . The least square estimators return unbiased and consistent estimates for the slope and intercept in the subcritical case, i.e. the estimators converge in probability [23,56,24]: In the critical and supercritical cases, onlyr (k,i) p − → m k holds following [24]. Hence, we obtainr k Numerical results for Kesten processes support this conjecture [25]. Next, we show that MR estimation is unbiased in the subcritical case even if only the subsampled a t is known: Theorem S4. Let A t be a PAR with m < 1 and a stationary limiting distribution A ∞ and let the PAR be started in the stationary distribution, i.e. A 0 ∼ A ∞ . Let a t be a subsampling of A t . Multistep regression (MR) on the subsampled a t is an unbiased estimator of the mean offspring m.
Proof. The existence of a stationary distribution A ∞ was shown by [7]. The least square estimator for the slope of linear regression is also given by [57]r k =ρ at a t+kσ at σ a t+k (S10) with the the estimated standard deviationsσ at andσ a t+k of a t and a t+k respectively. In the subcritical state, σ at = σ a t+k because of stationarity. Thus estimating the linear regression slope is equivalent to estimating the Pearson correlation coefficientρ at a t+k =ρ at (k) (which is identical to the autocorrelation function of a t ). In the following, we calculate the Pearson correlation coefficient for the subsampled time series by evaluating a t a t+k . We use the law of total expectation in order to express a t a t+k not in dependence of a t , but in terms of A t : where the inner expectation value is taken with respect to the joint distribution of a t+k and a t , and the outer with respect to the joint distribution of A t+k and A t . Through conditioning on both A t and A t+k , (a t | A t ) and (a t+k | A t+k ) become independent due to Def. S1. Hence, the joint distribution of (a t , a t+k | A t , A t+k ) factorizes, and the expectation value factorizes as well. By definition, a t | A t = j = α j + β and hence a t a t+k = (αA t+k + β) (αA t + β) A t+k ,At (S12) Without loss of generality, we here show the proof for β = 0 which is easily extended to the general case. We express a t a t+k in terms of Eq. (S5) using the law of total expectation again: where the first expectation was taken with respect to the joint distribution of A t and A t+k . We then used that A 2 t and A t = h/(1 − m) exist, which follows from stationarity of the process. By a similar argument, and combining these results the covariance is Therefore, we find that the estimatorr k converges in probability: Hence, the bias of of the conventional estimatorr 1 is precisely given by the factor However, importantly the relationr k =bm k still holds for the subsampled a t . Given a collection of multiple linear regressionsr 1 , . . . ,r kmax , the least square estimation ofb andm from minimizing the residual (S6) yields an unbiased and consistent estimatorm for the mean offspring m even under subsampling and only requires the knowledge of a t .
This proof also showed that the conventional estimator [23] is biased under subsampling: Nonstationarity, criticality and supercriticality. Even if the subcritical process is not started in the stationary distribution (A 0 A ∞ ), the results by [7] show that it will converge to this stationary distribution as t → ∞. The consistence of the estimator in the fully sampled case is included in our proof of Lemma S1 and follows from the results by [23,24]. Our proof for the subsampled case (Theorem S4), in contrast, strictly requires stationarity (A t ∼ A ∞ for any t) and the existence of the first two moments of A t . Nevertheless, numerical results suggest that the MR estimator is also consistent if the subcritical process is not started in the stationary distribution, A 0 A ∞ , and also for critical and supercritical cases where no stationary distribution exists (Fig. 3d).

Supp. 5 Identifying Common non-stationarities and Poisson activity.
In many types of analyses, non-stationarities in the time series can lead to wrong results, typically an overestimation ofm. We developed tests to exclude data sets with signatures of common non-stationarities. The different non-stationarities, their impact on the r k and the rules for rejection of time series are outlined below. First, transient increases of the drive h t , e.g. in response to a stimulus, lead to a transient increase in A t . These transients induce correlations or anti-correlations, which prevail on long time scales (Fig. S1c,d). The autocorrelation function is therefore better captured by an exponential with offset, r k = b offset · m k offset + c offset . If the residual of this exponential with offset R 2 offset was smaller than the residual of the MR model R 2 exp by a factor of two, H offset = (2 · R 2 offset < R 2 exp ), then the date set was rejected. The factor two punishes for the differences in degree of freedom: The residuals of a model with two free parameters (exponential with offset) instead of one (exponential only) can only be smaller.
Second, ramping of the drive can lead to overestimation of autocorrelation times (Fig. S1e). The comparison of the two fits introduced above serves as a consistency check, which was able to identify ramping: if the data are captured by a BP, both models should infer identicalm. Thus, a difference betweenm exp and m offset hints at the invalidity of MR estimation. Instead ofm, we compared the autocorrelation timesτ offset = −1/ logm offset andτ exp obtained from both models, as the logarithmic scaling increases the sensitivity. If their relative difference was too large, then the data are inconsistent with a BP and MR estimation invalid: Third, when a system changes between different states of activity, e.g. up and down states, the drive rate h t may experience sudden jumps. These can lead to spurious autocorrelation (Fig. S1f ). To identify these trends resulting from non-stationary input h t or from choosing too short data sets, we tested whether the sequence of r k was fit better by a linear regression r k = q 1 k + q 2 on the pairs (k, r k ), than by the exponential relation (S6). If the residuals R 2 lin were smaller than R 2 exp : H lin = (R 2 lin < R 2 exp ), data were rejected. Apart from non-stationarities, even Poisson activity (m = 0, A t = h t ) with stationary rate may lead to a spurious overestimation ofm as well: for subsampled branching processes of finite duration, the Poisson case and processes close to criticality (m = 1) can show very similar autocorrelation results, because the sequence of r k is expected to be absolutely or almost flat, respectively. Moreover, for m = 0 any solution on the manifold with b = 0 minimizes the residuals in Eq. (S6). Hence, the estimator form may yield any value depending on the initial conditions of the minimization scheme. To distinguish between m = 0 and m > 0, we used the fact that for m = 0, all slopes r k are expected to be distributed around zero, r k = 0. In contrast, for processes with m > 0, all slopes are expected to be larger than zero r k = b · m k > 0. Thus to identify stationary Poisson activity, we tested (using a one-sided t-test) if the slopes obtained from the data were significantly larger than zero, yielding the p-value pr ≤0 and the following test (Fig. S1b): Hr ≤0 = (pr ≤0 ≥ 0.1). The choice of the significance level should be guided by the severity of type I or II errors here: if it is set too liberal, Poisson activity may be mistaken for correlated activity, potentially even close-to-critical. On the other hand, if the significance level is too conservative, activity with long autocorrelation times may be spuriously considered Poissonian under strong subsampling (when b is small and all slopes only slightly differ from zero). For this study, we chose a significance level of pr ≤0 < 0.1 in order to not underestimate the risk of large activity cascades. To confirm candidates for Poisson activity identified through positive Hr ≤0 , we assured that the r k did not show a systematic trend, i.e. that linear regression of r k as a function of k (see H lin above) yielded slope zero: H q1=0 = (p q1=0 ≥ 0.05). The according significance level for this two sided test is then given by p q1 =0 < 0.05.
We discriminate the following cases in the order indicated in Tab. S1:m obtained from MR estimation is only valid if none of the tests (except H q1=0 , which is ignored here) is positive. A positive result for any of H offset , H τ , or H lin indicates non-stationarities, the data are not explained by a stationary BP, and MR estimation is invalid. If Hr ≤0 is positive, the data are potentially consistent with Poisson activity (m = 0). This is only the case if H q1=0 is also positive, if otherwise H q1=0 is negative, the Poisson hypothesis is also rejected and MR estimation invalid. This strategy correctly identified the validity of MR estimation for all investigated cases: stationary BPs with m = 0.98 and m = 0.0 were accepted, while nonstationary BPs with transient changes, ramping, or sudden jumps of the drive were excluded (Fig. S1).

Supp. 6 Variance of the estimates.
The distribution ofm is consistent with a normal distribution N (m, σ 2 m ) centered around the true mean offspring m ( Fig. S2a; numerical results). The variance σ 2 m depends on the branching ratio m, the mean activity A t , the length L of the time series, and the sampling fraction α. Each of these factors affects σ 2 m mainly by changing the effective length of the time series, i.e. the number of non-zero entries l = |{A t | A t > 0}|. Thus, regardless of the actual time series length L or the mean activity A t , the variance scales as a power-law in l, Var[m] ∝ l −γ (Fig. S2b). The exponent of this power-law depends on m. The closer to criticality the process is, the larger Poisson activity (m = 0) explains data MR estimation valid the exponent γ, i.e. the larger benefit from longer time series length l. For m = 0.99, we found γ ≈ 3/2. The performance of the estimator is in principle independent of the mean activity, small A t only affect the variance of the MR estimator through a potential decrease of l.
Similarly, the degree of subsampling only affects the variance of the estimator through a decrease of the effective length of a t . While there may be a significant rise in σ 2 m when reducing the sampling fraction α, this increase can be explained by the coincidental decrease in l, as the rescaled variance σ 2 m · l γ remains within one order of magnitude over four decades of the sampling fraction α (Fig. S2c).
How does the variance change close to the critical transition? We found that the answer to this question highly depends on the specific choice of the parameters: if m is varied, one can either keep A t or h constant, not both at the same time. If the mean activity A t is fixed by choosing h = A t (1 − m), then the variance of the process scales as Var[A t ] ∝ 1/(1 − m). As m → 1, the activity will inevitably get into a regime, where bursts of activity (A t > 0) are disrupted by intermittent quiescent periods (A t ), thereby reducing l. In turn, the variance of the estimator increases as detailed before.
If however, the drive h is kept constant, we found that variance scales linearly in the distance to criticality = 1 − m over at least 5 orders of magnitude of : σ 2 m ∝ (Fig. S2d). Thus, the variance decreases when approaching criticality, while the relative variance σ 2 m / is constant. Note, however, that even though the standard deviation also decreases when approaching criticality (σm ∝ √ ), the relative standard deviation increases (σm/ ∝ 1/ √ ). We obtained scaling laws with the same exponents for other quadratic (like the mean squared error MSE) and linear (like the inter-quartile range IQR) measures of variation.
Confidence interval estimation. We used a model based approach to estimate confidence intervals for both simulation and experimental data. For simulations, we simulated B ∈ N independent copies of the investigated model and applied MR estimation to each copy, yielding a collection of B independent estimates {m (b) } B b=1 . For experimental time series a t with length L, mean activity a t , and number of sampled units n, MR estimation yields an estimatem. We then simulated B copies of branching networks {A = a t . This procedure gives B copies of a BN that all match a t in terms of the mean activity, the branching ratio, time series length, and number of sampled units. Applying MR estimation to these BNs yields a collection of B independent estimates {m (b) } B b=1 . For both simulation and experimental data, the distribution ofm and confidence intervals can be constructed from this collection.

Supp. 7 Simulations
Branching process. We simulated BPs according to Eq. (S1) in the following way: Realizations of the random numbers y t,i and h t describing the number of offsprings, and the drive, were each drawn from a Poisson distribution: y t,i ∼ Poi(m) with mean m, and h t ∼ Poi(h) with mean h, respectively. Here, we used Poisson distributions as they allow for non-trivial offspring distributions with easy control of the branching ratio m by only one parameter. For the brain, one might assume that each neuron is connected to k postsynaptic neurons, each of which is excited with probability p, motivating a binomial offspring distribution with mean m = k p. As in cortex k is typically large and p is typically small, the Poisson limit is a reasonable approximation. For the performance of the MR estimator and the limit behavior of the BP, the particular form of the law Y is not important such that the special choice we made here does not restrict the generality of our results.
The mean rate A t depends on m and h. In the simulation we varied m and fixed A t = 100 by adjusting h accordingly if not stated otherwise. For subsampling the BP, each unit is observed independently with probability p ≤ 1 . Then a t is distributed following a binomial distribution Bin(A t , p), and subsampling is implemented by drawing a t from A t at each time step. As a t = p A t , this implementation of subsampling is satisfies the definition of stochastic subsampling with α = p, β = 0.
Branching network. In addition to the classical branching process, we also simulated a branching network model (BN) by mapping a branching process [6,21] onto a fully connected network of N = 10000 neurons. An active neuron activated each of its k postsynaptic neurons with probability p = m/k. Here, k = 4 postsynaptic neurons were drawn randomly without replacement at each step, thereby avoiding that two different active neurons would both activate the same target neuron. Similar to the BP, the BN is critical for m = 1 in the infinite size limit, and subcritical (supercritical) for m < 1 (m > 1). As detailed for the BP, h was adjusted to the choice of m to achieve A t = 100, which corresponds to a rate of 0.01 spikes per neuron and time step. Subsampling [4] was applied to the model by sampling the activity of n neurons only, which were selected randomly before the simulation, and neglecting the activity of all other neurons.
Self-organized critical model. The SOC neural network model we used here is the Bak-Tang-Wiesenfeld (BTW) model [26]. Translated to a neuroscience context, the model consisted of N = 10000 (100 × 100) non-leaky integrate and fire neurons. A neuron i spiked if its membrane voltage V i (t) reached a threshold θ (S16) Note that the choice of θ does not change the activity of the model at all, so we set θ = 0 for convenience. The model neurons were arranged on a 2D lattice, and each neuron was connected locally to its four nearest neighbors with coupling strength α ij = α: where T j denotes the spike times of neuron j, and h t is the Poisson drive with mean rate h as defined for the BP above. Note that the neurons at the edges and corners of the grid had only 3 and 2 neighbors, respectively. This model is equivalent to the well-known Bak-Tang-Wiesenfeld model [26] if h → 0 and α = 1. Subsampling [4] was applied to the model by sampling the activity of n neurons only, which were selected randomly before the simulation, and neglecting the activity of all other neurons.
Parameter choices. If not stated otherwise, simulations were run for L = 10 7 time steps or until A t exceeded 10 9 , i.e. approximately half of the 32 bit integer range. If not stated otherwise, confidence intervals (see Supp. 6) were estimated from B = 100 samples, both for simulation and experiments. In Figs. 1c,d, BNs and the BTW model were simulated with N = 10 4 units and A t = 100. In Fig. 1e, a BP was simulated with m = 0.9 and A t = 100.
In Fig. 3d, subcritical and critical BNs with N = 10 4 and A t = 100 were simulated, and n = 100 units sampled. Because of the non-stationary, exponential growth in the supercritical case, here BPs were simulated with h = 0.1 and units observed with probability α = 0.01.
The in vivo-like BN was defined to match the cat recording, with m = 0.98, n = 50, and a t = 1.58 per bin, from which h/N = 0.032 per bin and neuron follows. Here, we chose N = 10 4 . To test for stationarity, the cat recordings and the in vivo-like BN were split into 59 windows of 1250 time steps each, before estimating m for each window.

Supp. 8 Epidemiological recordings
WHO data on measles worldwide. Time series with yearly case reports for measles in 194 different countries are available online from the World Health Organization (WHO) for the years between 1980 and 2014. MR estimation was applied to these time series. Because they contain very few data points and potential long-term drifts, we applied the consistency checks detailed above for every country (Tab. S1). After these checks, 124 out of the 194 surveyed countries were accepted for MR analysis and included in our analysis. Yearly information on approximate vaccination percentages (measles containing vaccine dose 1, MCV1) for the same countries and time span are also available online from the WHO.
RKI data on norovirus, measles and MRSA in Germany. For Germany, the Robert-Koch-Insitute (RKI) surveys a range of infectious diseases on a weekly basis, including measles, norovirus, and invasive meticillin-resistant Staphylococcus aureus (MRSA). Case reports are available through their SURVSTAT@RKI server [58]. Because of possible changes in report policies in the beginning of surveillance, we omitted the data from the first 6 months of each recording. Moreover, we omitted the incomplete week on the turn of the year, thus evaluating 52 full weeks in each year.
The MRSA recording showed a slow, small variation in the case reports that can be attributed to slow changes in the drive rates. To compensate for these slow drifts, we corrected the time series by subtracting a moving average over 3 years (156 weeks). We then applied MR estimation to the obtained time series. The recordings for measles and norovirus showed strong seasonal fluctuations of the case reports, resulting in a baseline oscillation of the autocorrelation function. We therefore used a modified model with a fixed period of T = 52 weeks, and estimatedm,b,ĉ 1 , andĉ 2 from minimizing the residual of this modified equation.

Supp. 9 Animal experiments
We evaluated spike population dynamics from recordings in rats, cats and monkeys. The rat experimental protocols were approved by the Institutional Animal Care and Use Committee of Rutgers University [51,50].
The cat experiments were performed in accordance with guidelines established by the Canadian Council for Animal Care [49]. The monkey experiments were performed according to the German Law for the Protection of Experimental Animals, and were approved by the Regierungspräsidium Darmstadt. The procedures also conformed to the regulations issued by the NIH and the Society for Neuroscience. The spike recordings from the rats and the cats were obtained from the NSF-founded CRCNS data sharing website [59,49,51,50].
In rats the spikes were recorded in CA1 of the right dorsal hippocampus during an open field task. We used the first two data sets of each recording group (ec013.527, ec013.528, ec014.277, ec014.333, ec015.041, ec015.047, ec016.397, ec016.430). The data-sets provided sorted spikes from 4 shanks (ec013) or 8 shanks (ec014, ec015, ec016), with 31 (ec013), 64 (ec014, ec015) or 55 (ec016) channels. We used both, spikes of single and multi units, because knowledge about the identity and the precise number of neurons is not required for the MR estimator. More details on the experimental procedure and the data-sets proper can be found in [51,50].
For the spikes from the cat, neural data were recorded by Tim Blanche in the laboratory of Nicholas Swindale, University of British Columbia [49]. We used the data set pvc3, i.e. recordings in area 18 which contain 50 sorted single units [59]. We used that part of the experiment in which no stimuli were presented, i.e., the spikes reflected spontaneous activity in the visual cortex of the anesthetized cat. Because of potential non-stationarities at the beginning and end of the recording, we omitted data before 25 s and after 320 s of recording. Details on the experimental procedures and the data proper can be found in [49,59].
The monkey data are the same as in [48,28]. In these experiments, spikes were recorded simultaneously from up to 16 single-ended micro-electrodes (∅ = 80 µm) or tetrodes (∅ = 96 µm) in lateral prefrontal cortex of three trained macaque monkeys (M1: 6 kg ♀; M2: 12 kg ♂; M3: 8 kg ♀). The electrodes had impedances between 0.2 and 1.2 MΩ at 1 kHz, and were arranged in a square grid with inter electrode distances of either 0.5 or 1.0 mm. The monkeys performed a visual short term memory task. The task and the experimental procedure is detailed in [48]. We analyzed spike data from 11 experimental sessions comprising almost 12.000 trials (M1: 4 sessions; M2: 4 sessions; M3: 3 sessions). 6 out of 11 sessions were recorded with tetrodes. Spike sorting on the tetrode data was performed using a Bayesian optimal template matching approach as described in [60] using the Spyke Viewer software [61]. On the single electrode data, spikes were sorted with a multi-dimensional PCA method (Smart Spike Sorter by Nan-Hui Chen).
Analysis. For each recording, we collapsed the spike times of all recorded neurons into one single train of population spike counts a t , where a t denotes how many neurons spiked in the t th time bin ∆t. We used ∆t = 4 ms, reflecting the propagation time of spikes from one neuron to the next.
From these time series, we estimatedm using the MR estimator with k max = 2500 (corresponding to 10 s) for the rat recordings, k max = 150 (600 ms) for the cat recording, and k max = 500 (2000 ms) for the monkey recordings, assuring that k max was always in the order of multiple autocorrelation times. For the monkey experiments, we performed MR estimation on the pooled data of all ∼700 trials of 5 s length each. Experiments were excluded if the tests according to Supp. 5 detected potential nonstationarities .

=0.998
Rejected (H lin , H offset , Hτ ) Figure S1: Excluding nonstationary data. Each left panels shows the time series at of the activity from one single trial (light blue) and averaged activity from 100 trials (dark blue), recorded from n = 50 out of N = 10 4 neurons. Each right panels shows the corresponding MR estimation from one single trial. We investigated the following, generic cases for the temporal evolution of the drive rate ht : a, b. The drive is stationary ( ht identical for all t, red), so are the mean rates at . c, d. The drive exhibits a transient increase centered around half of the simulation time. The mean rate at is therefore also time-dependent and follows the temporal evolution of ht . e. The drive shows a linear increase over the simulation. f. The drive exhibits a step function after half the simulation. Nonstationarities (c -f ) typically lead to an overestimation ofm, which is particularly severe if the underlying dynamics is Poissonian (m = 0). The tests defined in Supp. 5 (see Tab. S1) were able to exclude time series where the investigated nonstationarities were present, while accepting the stationary cases a, b.  Figure S2: Variance of the MR estimates. This figure shows numerical result for the distribution and variability of the estimatem as a function of multiple parameters. a. Distribution of the estimatem, estimated from 5000 independent copies of a branching process (BP) with m = 0.99, At = 100 and length L = 10 5 : normalized histograms of the probability of estimatingm for full sampling (blue) and binomial subsampling with α = 0.001 (red), together with normal distributions N (m,σ 2 m ). Inset: Q-Q-plot for the quantiles of N (m,σ 2 m ) and the quantiles of the estimatedm under both samplings. The estimatedm are found to be distributed normally in both cases (fully sampled: r 2 = 0.9995, subsampled: r 2 = 0.998). b. The variance σ 2 m of the estimatem is estimated from 100 independent copies of a BP. Results for different m, mean activities At and time series lengths L are plotted as a function of the effective time series length l = |{At | At > 0}|, the number of nonzero entries. For any given m, the variance ofm shows algebraic scaling σ 2 ∝ l γ . The exponent of this scaling depends on m, with higher γ the closer m is to unity. Hence, the benefit from longer time series is larger the closer a system is to criticality. Importantly, the variance does not directly depend on the mean activity At , this number only influences the accuracy of MR estimation via the potential change in l. c. The variance of the estimatem is estimated from 100 independent copies of a BP with m = 0.99, At = 100, and L = 10 5 and plotted as a function of the sampling probability α under binomial subsampling. While the variance appears to increase dramatically under stronger subsampling, this increase can be attributed to the according decrease of the effective time series length l. After rescaling by (l/L) 3/2 (cf. panel b), the rescaled variance remains within one order of magnitude over four orders of magnitude in α. Hence, the accuracy of the estimator is not directly influenced by the degree of subsampling. d. The variance σ 2 m is estimated from 100 independent copies of a BP with m = 0.99, h = 1, and L = 10 5 and plotted as function of the distance to criticality = 1 − m. The variance is found numerically to scale as σ 2 m ∝ , hence the standard deviation scales as σm ∝ √ . Similar scaling results were found for other linear (like the interquartile range) and quadratic (like the mean squared error) measures of variation.  Figure S3: MR estimation for individual animals. MR estimation is shown for every individual animal (see Supp. 9). The consistency checks are detailed in the Supp. 5 (see Tab. S1). a. Data from monkey prefrontal cortex during an working memory task. The third panel shows a oscillation of r k with a frequency of 50 Hz, corresponding to measurement corruption due to power supply frequency. b. Data from anesthetized cat primary visual cortex. c. Data from rat hippocampus during a foreaging task. In addition to a slow exponential decay, the slopes r k show the ϑ-oscillations of 6 -10 Hz present in hippocampus.