## Introduction

Collective dynamics are ubiquitous in nature1,2. From neural circuits to animal groups, there are countless instances where the interactions among large numbers of elementary units bestow surprisingly complex patterns of tantalizing beauty for the collective. Deciphering the interactions within a collective from observations of its units is the object of several successful methodological endeavors3,4. Common to the vast majority of the existing approaches is the knowledge of the size of the collective, a hypothesis that is often defeated in experiments. For example, understanding bird migration requires tagging individuals to measure their motion5, but there is no guarantee that these tagged individuals comprise the totality of the flock.

Only a handful of recent studies have addressed the fundamental problem of estimating the size of a collective from measurements of some of its units6,7,8,9. The approaches proposed by Haenhe et al.6, Porfiri7, and Tang et al.8 are based on the computation of the rank of ancillary matrices that encode the transient response of some of the units, in the absence of forcing. In contrast, Tyloo and Delabay9 tackle the inference problem from steady-state response to harmonic forcing. Although they may tolerate some level of added noise and measurement uncertainty, all of these methods are designed on the premise of completely deterministic dynamics.

Here, we propose a different paradigm to solve the problem of estimating the size of a collective, building upon seminal ideas of Einstein10, Smoluchowski11, and Perrin12 that led to the determination of the Avogadro’s number from kinetic theory. Using the words of Feynman regarding the experiments of Perrin13, “one of the earliest determinations of the number of atoms was by the determination of how far a dirty little particle would move if we watched it patiently under a microscope for a certain length of time.” We demonstrate that the mean square dynamics of a single unit is sufficient to infer the size of a collective.

We focus on a system of self-propelled Vicsek particles14 as a universal model for collective dynamics, for which we show that the time rate of growth of the mean square heading of any particle is sufficient to predict the number of particles in the system. The concept is best explained by drawing an analogy between Vicsek models and the classical kinetic theory of gases15, where the mean square displacement of any particle is given by λDt, with t being time and D the diffusion coefficient. The latter can be estimated as $$D\propto \frac{{k}_{{{{{{\rm{B}}}}}}}T}{m}\frac{V}{N{d}^{2}}$$, where kB is Boltzmann’s constant, T the absolute temperature, V the volume of the gas, d the particle diameter, m the particle mass, and N the number of particles. Hence, one can estimate the number of particles in the gas from the rate of growth of the mean square displacement, upon some knowledge of the thermodynamic state of the gas and the properties of its particles.

Provided that the network of interactions underlying the collective dynamics of the Vicsek particles guarantees some average connectivity in time, the motion of one unit will contain a significant footprint of the size of the entire system, similar to existing approaches for the inference of the size of a collective6,7,8,9. We numerically and mathematically demonstrate that the inference of the size of the system from the motion of a single particle only requires knowledge about the variance of the added noise, which, indeed, has been proposed as a thermodynamic analog of the temperature for collective dynamics16,17,18,19. Different from the experiments of Perrin, our focal unit is not a macroscopic particle in a bath of microscopic particles, rather, it is one of the very particles comprising the collective whose trajectory is accessible by the experiment.

## Results

### Numerical results

We consider a system of N self-propelled particles in a square domain of size L × L with periodic boundary conditions. At time $$k\in {{\mathbb{Z}}}_{\ge 0}$$, the planar position of particle i is $${r}_{k}^{i}\in {\mathbb{C}}$$ and its heading is $${\theta }_{k}^{i}\in [-\pi ,\pi ]$$. Particles negotiate their heading with other particles within a unitary interaction radius under the effect of additive noise, moving at a constant speed,

$${\theta }_{k+1}^{i} =\, \arg \left(\mathop{\sum}\limits_{j\in {{{{{{{{\mathcal{N}}}}}}}}}_{k}^{i}}\exp (\imath {\theta }_{k}^{j})\right)+{\psi }_{k}^{i},\\ {r}_{k+1}^{i} =\, {r}_{k}^{i}+s\exp (\imath {\theta }_{k}^{i}).$$
(1)

Here ı is the imaginary unit; $${{{{{{{{\mathcal{N}}}}}}}}}_{k}^{i}=\{j=1,\ldots ,N:\parallel {r}_{k}^{j}-{r}_{k}^{i}\parallel \le 1\}$$ is the set of neighbors of particle i at time k; $${\psi }_{k}^{1},\ldots ,{\psi }_{k}^{N}$$ are independent uniformly distributed random variables in [ − η/2, η/2] with $$\eta \in {{\mathbb{R}}}_{ \,{ > }\,0}$$ being a control parameter that modulates the added noise so that the noise variance is σ2 = η2/12; and $$s\in {{\mathbb{R}}}_{ \,{ > }\,0}$$ is the common speed of all the particles, which corresponds to the number of interaction radii that are traveled in one time-step, so that neighbors are almost fixed for s 1 and they randomly mix for s 1. Parameters N and L control the density of the system, η acts as a temperature, and s regulates temporal correlations among particles.

We report simulation results for two values of s (low and high), and several values of L and N, spanning the simulation range considered by Vicsek et al.14. For each parameter combination, we perform 1000 repetitions with the same initial configuration, randomly selected in a square of unit length at the center of the domain. The initial heading of each particle is randomly selected in [−π, π], and η is equal to 0.1, which corresponds to the “small noise” case originally considered by Vicsek14 to demonstrate ordered motion in the collective (with an average velocity, sometimes called polarization, close to one). In each repetition, we run model (1) for 150 time steps and track the heading of a single (randomly selected) focal particle, see Fig. 1. For each parameter combination and at every time-step k, we compute the sample variance Yk of the heading across the 1000 repetitions (for moderate noise, the sample variance is, in fact, equivalent to the sample circular variance). As one should expect from kinetic theory, Yk grows linearly with time (that is, Yk = Dk + Y0, with Y0 being the initial variance) and the diffusion coefficient (estimated using linear regression on Y1, …, Y150) is inversely proportional to the number of particles N, see Fig. 1.

Irrespective of the particle speed, there is a wide region of the parameter space (L2, N) for which the diffusion coefficient per unit noise variance (D/σ2; calculated by taking the mean of the diffusion coefficients estimated from all the particles) is equal to the inverse of the number of particles, see Fig. 2a, b. As a result, it is possible to exactly infer the size of the collective from knowledge of the heading of a focal particle and of the noise variance that captures the system temperature (for details on the precision of the estimate, see Supplementary Note 1). More specifically, for the low-speed case, the accuracy of the inference is robust with respect to changes in the domain size and the number of particles, with variations of at most 20% registered in the whole parameter space. For the high-speed case, we find a more marked dependence on the accuracy of the inference on L and N. Although accurate inferences (within 20%) are obtained for L2 ≤ 40, further increasing the domain size challenges the inference. Such a detrimental effect is more evident for small values of N. The possibility of accurately performing the inference across widely different particle speeds is surprising, whereby the Vicsek model spans from the diluted XY-model (s 1) to the mean-field behavior of a ferromagnet (s 1) as a function of the speed14. We offer some mathematical backing to this observation through the analytical treatment of these two limit cases, for which we establish closed-form expressions for the diffusion coefficient.

In the vast majority of the simulations in which the inference does not match N (especially for large values of L in the high-speed case), we document an underestimation of the size of the collective (DN/σ2 > 1). This can be explained by the formation of independent particle clusters14, which can be revealed from the study of the overall connectivity of the system. As a result, the diffusion coefficient will only be reflective of the size of the cluster to which the focal particle belongs, thereby leading to the observed underestimation. Numerical evidence supporting this explanation is presented in the bottom panels of Fig. 2c, d, where we plot the normalized Fiedler eigenvalue λ2/N of the Laplacian matrix associated with the average network of interactions during the simulation window, similar to the analysis by Baglietto et al.20.

The Fiedler eigenvalue, measuring the overall connectivity of the system, follows the trend similar to that of DN/σ2 in Fig. 2a, b. Until a threshold value for the normalized eigenvalue $$-\log ({\lambda }_{2}/N)$$, the method provides a nearly perfect inference of the collective size with a weak dependence on λ2/N that translates into a modest decrease in accuracy as $$-\log ({\lambda }_{2}/N)$$ increases. On the contrary, beyond the threshold, the accuracy of the inference significantly degrades, with a clear dependence on λ2/N, as illustrated in Fig. 2e. The origin of this qualitative transition is presently unclear. Perhaps, it is associated with the emergence of ordered bands within the collective21, so that a few subgroups are able to synchronize along common headings in a bath of disordered motion for the rest of the collective. We do not favor this explanation, however, given the short length of the observation window that may challenge the development of such bands. Also, these bands are typically documented for large systems, while the observed threshold phenomenon involves small collectives as well (with N as small as 10).

The analysis of the precision of the inference along with simulations for higher noises (η = 0.3 and η = 0.5), different speeds (spanning from 0.1 to 10 at a fine resolution), fewer repetitions (spanning from 100 to 1000), and time steps (spanning from 15 to 150), non-identical initial conditions across repetitions, non-standard Vicsek models (with forward position update), and non-uniform (Gaussian) noise distributions are presented in Supplementary Note 2. Therein, we also include a numerical analysis of the rate of change in time of the interaction network as a function of the particle speed that illustrates the progression from nearly static to memoryless switching topologies, in terms of the decay of an autocorrelation function for the network of interaction.

### Theory: low particle speed

We begin with the study of low speed, for which the neighbor set of each particle can be approximated to be constant in time. We hypothesize that the noise is sufficiently small for the system to be in its ordered phase, where particles’ headings synchronize. Hence, the collective dynamics of the system is described by the classical noisy consensus problem with static network topology that is extensively studied in control theory22.

Specifically, we consider the following linear dynamics:

$${{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k+1}={{{{{{{\bf{A}}}}}}}}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}+{{{{{{{{\bf{u}}}}}}}}}_{k},$$
(2a)
$${y}_{k}={{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\bf{e}}}}}}}}}_{f},$$
(2b)

where $${{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}={[{\theta }_{k}^{1},\ldots ,{\theta }_{k}^{N}]}^{{{{{{{{\rm{T}}}}}}}}}$$, A is the state matrix, $${{{{{{{{\bf{u}}}}}}}}}_{k}={[{u}_{k}^{1},\ldots ,{u}_{k}^{N}]}^{{{{{{{{\rm{T}}}}}}}}}$$ is a white noise vector with variance σ2 (that is, for all k and τ ≠ 0: E[uk] = 0, $${{{{{{{\rm{E}}}}}}}}[{{{{{{{{\bf{u}}}}}}}}}_{k}{{{{{{{{\bf{u}}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}]={\sigma }^{2}{{{{{{{{\bf{I}}}}}}}}}_{N}$$, and $${{{{{{{\rm{E}}}}}}}}[{{{{{{{{\bf{u}}}}}}}}}_{k}{{{{{{{{\bf{u}}}}}}}}}_{k+\tau }^{{{{{{{{\rm{T}}}}}}}}}]=0$$, with E[  ] denoting expectation), ef is the vector with all zeros but a one in the fth entry, and yk is the heading of the focal particle. The state matrix A corresponds to the adjacency matrix of an undirected, unweighted graph with self-loops for each node, such that A1N = 1N with 1N being the vector of all ones. The reason why the adjacency matrix is row-stochastic is because the fully ordered phase is a solution of the Vicsek model in the absence of additive noise, so that any common heading of the particles shall be a solution for the collective. Additionally, we hypothesize a connected graph, such that A is primitive23.

We prove that the variance of yk diverges linearly with time, with a slope that is inversely proportional to the system size N. We commence from the dynamics of the covariance matrix $${{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k}={{{{{{{\rm{E}}}}}}}}\left[{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}\right]$$ of θk (Θk is the covariance matrix if $${{{{{{{\rm{E}}}}}}}}{[{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{0}]}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\bf{1}}}}}}}}}_{N}=0$$, otherwise, it should be interpreted as the second raw moment),

$${{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k+1}={{{{{{{\rm{E}}}}}}}}\left[{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k+1}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k+1}^{{{{{{{{\rm{T}}}}}}}}}\right]={{{{{{{\bf{A}}}}}}}}{{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k}{{{{{{{{\bf{A}}}}}}}}}^{{{{{{{{\rm{T}}}}}}}}}+{\sigma }^{2}{{{{{{{{\bf{I}}}}}}}}}_{N},$$
(3)

where we used the uncorrelatedness between θk and uk, and we introduced the identity matrix IN. To solve the linear recursion, we use a similarity matrix V that is obtained by juxtaposing column-wise the orthonormal eigenvectors of A. By denoting with Ξk the covariance matrix of the transformed state variable ξk = VTθk and observing that Ξk = VTΘkV, from Eq. (3) we establish for element ij the following:

$${{{\Xi }}}_{k+1}^{ij}={\alpha }_{i}{\alpha }_{j}{{{\Xi }}}_{k}^{ij}+{\sigma }^{2}{\delta }_{ij},$$
(4)

where αi is the ith eigenvalue of A, and δ(, ) is the Kronecker delta. Since A is row-stochastic and primitive, 1 is its simple dominant eigenvalue (denoted as α1) whose eigenvector is $${{{{{{{{\bf{v}}}}}}}}}_{1}={{\mathbb{1}}}_{N}/\sqrt{N}$$23. As k → , the off-diagonal elements of Ξk converge to zero, and the i-th diagonal entry, for i ≠ 1, converges to $${\sigma }^{2}/(1-{\alpha }_{i}^{2})$$. The first diagonal entry, instead, diverges linearly with time, according to the expression $${{{\Xi }}}_{k}^{11}={{{\Xi }}}_{0}^{11}+{\sigma }^{2}k$$.

Given Eq. (2b), the variance of yk is $${Y}_{k}={{{{{{{{\bf{e}}}}}}}}}_{f}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k}{{{{{{{{\bf{e}}}}}}}}}_{f}$$. Working with the transformed state variable ξk, we have $${Y}_{k}={{{{{{{{\bf{e}}}}}}}}}_{f}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{\bf{V}}}}}}}}{{{{{{{{\boldsymbol{\Xi }}}}}}}}}_{k}{{{{{{{{\bf{V}}}}}}}}}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\bf{e}}}}}}}}}_{f}$$, which, for sufficiently large k, yields the sought dependence on the size of the collective

$${Y}_{k}\approx \frac{{\sigma }^{2}}{N}k.$$
(5)

Here, we used the facts that the first column of V is $${{\mathbb{1}}}_{N}/\sqrt{N}$$ and that the first diagonal entry of Ξk diverges in time with a rate of σ2. Similar to the mean square displacement of a particle in a gas15, Yk linearly diverges and the diffusion coefficient is D = σ2/N, thereby offering mathematical backing to the numerical results in Fig. 2a (DN/σ2 ≈ 1). In particular, the values of the diffusion coefficient and variance of the added noise are sufficient to accurately infer the collective size under the approximation of a static topology. This result may be counterintuitive if one considers that for an XY-model interactions are completely local, so that intuition may suggest that the trajectory of a particle only contains the footprint of the dynamics of its neighbors. However, this is not the case since the connectivity of the interaction network ensures that, albeit indirectly, every particle in the collective interacts with any other particle.

A few comments are in order for the derivation of Eq. (5). First, we considered a more general form of additive noise in Eq. (2a) than the Vicsek model, to emphasize that only knowledge of the variance is sufficient for the inference and that other noise distributions can be equivalently considered. This evidence strengthens the connection between the notion of thermodynamic temperature and the variance of the noise16,17,18,19. Second, should the network be disconnected as in the case of high values of L, the same mathematical steps would yield a higher diffusion coefficient of σ2/Nf, where Nf is the number of particles in the connected component the focal particle belongs to. Hence, the proposed inference approach would never overestimate the size of the system, see Fig. 2. Third, the approach is robust with respect to individual variations in the form of heterogeneous variances $${\sigma }_{1}^{2},\ldots ,{\sigma }_{N}^{2}$$ across the particles. In this case, the same derivations would yield an equivalent linear dependence in time for Yk with $$D=\mathop{\sum }\nolimits_{i = 1}^{N}{\sigma }_{i}^{2}/N$$, so that the inference is independent of the choice of the focal particle and the only required information is the average value of the variance. Likewise, the approach is robust with respect to weighted interactions, whereby the results would not change for any undirected primitive row-stochastic matrix A.

Finally, although the derivation of Eq. (2b) assumes the network to be constant in time and given once for all across the repetitions, its applicability is likely to extend to a more general setting that should be formalized in future studies. Specifically, should one consider slowly-varying topologies so that A varies in time much slower than the rate of divergence of the variance of yk, the same mathematical arguments might be applicable. Likewise, should the adjacency matrix differ across repetitions, we would not expect changes in Eq. (2b), provided each of the matrices is connected, since no topological feature of the adjacency matrix enters the estimation of the size of the collective. Simulation results in Fig. 2 and in Supplementary Note 2 offer compelling evidence for these claims, whereby the inference is accurate albeit the network topology changes in time due to additive noise, and initial conditions are varied across repetitions.

### Theory: high particle speed

The analytical result in Eq. (5) is based on the assumption of a static topology. An equivalent claim can be established in the opposite scenario of random mixing, where the speed of the particles is sufficiently large for the memory of past interactions to fade between two consecutive time steps. This limit case can be studied through the vectorial network model (VNM)24, which implements heading update by randomly assigning to any particle K neighbors drawn uniformly from the entire collective at each time step. In this high-speed limit, particles interact on average with π/L2(N − 1) other particles within their unit circles so that L and K can be related by L2 = π(N − 1)/(K − 1). For the case of small noise, the model reduces to25

$${{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k+1} =\, {{{{{{{{\bf{A}}}}}}}}}_{k}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}+{{{{{{{{\bf{u}}}}}}}}}_{k},\\ {y}_{k} =\, {{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}{{\mathbb{e}}}_{f},$$
(6)

where the matrices A0, A1, … are independent identically distributed (i.i.d.) with common random variable A, and the noise uk and all the other variables are defined as in the case of static topology. Specifically, A is such that each of its rows is the sum of K ≥ 2 i.i.d. vectors, each one having all entries 0, except one (selected from a uniform distribution) that is equal to 1/K. Similar to the low-speed case in Eq. (2a) and consistent with the original Vicsek model (1), matrix A is row-stochastic so that A1N = 1N by construction. This temporal patterning yields an all-to-all average network, $${{{{{{{\rm{E}}}}}}}}[{{{{{{{\bf{A}}}}}}}}]=\frac{1}{N}{{{{{{{{\bf{1}}}}}}}}}_{N}{{{{{{{{\bf{1}}}}}}}}}_{N}^{{{{{{{{\rm{T}}}}}}}}}$$. We acknowledge the possibility that the number of distinct neighbors is less than K within the VNM; however, the probability that two vectors will contribute a non-zero entry at the same location is very small.

Similar to Eq. (3), we write

$${{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k+1}={{{{{{{\rm{E}}}}}}}}[{{{{{{{{\bf{A}}}}}}}}}_{k}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}{{{{{{{{\boldsymbol{\theta }}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}{{{{{{{{\bf{A}}}}}}}}}_{k}^{{{{{{{{\rm{T}}}}}}}}}]+{\sigma }^{2}{{{{{{{{\bf{I}}}}}}}}}_{N}$$
(7)

but, in this case, the adjacency matrix depends on time and cannot be moved out of the expectation. To establish a linear recursion for θk, we resort to Kronecker algebra,

$${{{{{{{\rm{vec}}}}}}}}({{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k+1})={{{{{{{\bf{G}}}}}}}}{{{{{{{\rm{vec}}}}}}}}({{{{{{{{\boldsymbol{\Theta }}}}}}}}}_{k})+{\sigma }^{2}{{{{{{{\rm{vec}}}}}}}}({{{{{{{{\bf{I}}}}}}}}}_{N}).$$
(8)

Here, “vec” identifies the process of vectorization and G = E[AA] can be explicitly computed as25

$${{{{{{{\bf{G}}}}}}}}=\frac{1}{KN}{{{{{{{\rm{vec}}}}}}}}({{{{{{{{\bf{I}}}}}}}}}_{N}){{{{{{{\rm{vec}}}}}}}}{({{{{{{{\bf{R}}}}}}}})}^{{{{{{{{\rm{T}}}}}}}}}+\frac{1}{{N}^{2}}{{{{{{{{\bf{1}}}}}}}}}_{N}{{{{{{{{\bf{1}}}}}}}}}_{N}^{{{{{{{{\rm{T}}}}}}}}}\otimes {{{{{{{{\bf{1}}}}}}}}}_{N}{{{{{{{{\bf{1}}}}}}}}}_{N}^{{{{{{{{\rm{T}}}}}}}}},$$
(9)

with “” being the outer product and $${{{{{{{\bf{R}}}}}}}}={{{{{{{{\bf{I}}}}}}}}}_{N}-{{{{{{{{\bf{1}}}}}}}}}_{N}{{{{{{{{\bf{1}}}}}}}}}_{N}^{{{{{{{{\rm{T}}}}}}}}}/N$$ a projection matrix. None of the analytical developments by Porfiri25 can be utilized for the case at hand, whereby the dynamics considered therein was deprived of the mean to study the evolution of the synchronization error. In Supplementary Note 3, we study the spectral properties of G to determine that, for sufficiently large times, element ij of Θk can be written as

$${{{\Theta }}}_{k}^{ij}\approx \frac{{\sigma }^{2}\gamma }{N}k,$$
(10)

where γ = (a − b)/(a − N + 1), with a = N(N − 1 − 2KN + 2K + K2N) and b = N(N − 1 − KN + K). Given that $${Y}_{k}={{{\Theta }}}_{k}^{ff}$$, we find the sought dependence on the size of the collective,

$${Y}_{k}\approx \frac{{\sigma }^{2}\gamma }{N}k.$$
(11)

Different from the case of low speed, the particle density enters the estimation of the diffusion coefficient through γ. As further detailed in Supplementary Note 3, for any choice of 2 ≤ K ≤ N and N > 1, γ is between 1 and 2. The largest values of γ are registered for low densities, that is, small systems in large domains. Increasing the density (that is, increasing K) lowers the value of γ, thereby leading to accurate inferences. This theoretical result is in qualitative agreement with numerical simulations in Fig. 2b, which indicate that in the high-speed case, the accuracy of the inference degrades with L and improves with N. However, care should be placed when drawing conclusions for large values of L that lead to poorly connected average networks (as evidenced by the normalized Fiedler eigenvalue in Fig. 2d), in contrast with the all-to-all average network predicted by the vectorial network model.

In principle, the computation can be extended to weighted, random interactions, in which every node randomly weights the state of its K neighbors at each time-step, such as blinking models26 activity-driven networks27, and averaging models28. However, this would likely lead to a different expression for γ since the spectral properties of G would vary. The presence of heterogeneities in the added noise across the units will not produce a difference in the diffusion coefficient with respect to the static case, as shown in Supplementary Note 4.

## Conclusions

There is a growing effort to build physical intuition and technical rigor in collective dynamics by recognizing analogies with thermodynamics29,30,31,32,33. For example, Sinhuber and Oulette30 have introduced definitions of pressure and chemical potentials for insect swarms to study the coexistence of a condensed phase surrounded by a vapor phase observed in experiments. Likewise, Haeri et al.32 have recently elucidated thermodynamic variables of collectives governed by attractive and repulsive forces, presenting an equation of state similar to the ideal gas law.

Here, we unveil a further, microscopic analogy, upon which we establish a methodology to infer the size of a collective from the dynamics of a single unit. Different from the entire body of knowledge that relies on deterministic dynamics6,7,8,9, we treat stochasticity as a commodity to solve the inference problem. Just as the motion of a Brownian particle in a liquid can be used to determine Avogradro’s number as shown in the classical experiments by Perrin12, so we demonstrate that the stochastic motion of a single unit is sufficient for the determination of the size of the entire collective.

We focus on the ordered (homogeneous) phase of the self-propelled particle model of Vicsek, for which we discover an inverse proportionality between the diffusion coefficient associated with the heading of any of the particles and the size of the collective. Given the sole knowledge of the temperature of the system, in the form of the variance of the added noise, one can successfully infer the size of the collective from the diffusion coefficient. We offer detailed analytical treatment of two limit cases of the Vicsek model, encompassing the XY-model and mean-field behavior of a ferromagnet. The mathematical study of these cases confirms numerical predictions that connectivity of the collective ensures that the diffusion coefficient is, within a first approximation, independent of the particle density—a counterintuitive finding with respect to the case of an ideal gas where the geometry of the system modulates the diffusion coefficient15. While these limited cases may offer partial insight into the study of key nonlinear features of the Vicsek model, like the nature of its phase transition21, they accurately anticipate the observed evolution of the variance of the heading and its dependence on the system parameters at low temperatures.

This study is not free of limitations, which shall be addressed in future work toward the study of real collectives, from insect swarms to bird flocks, fish schools, and human crowds. First, the approach is demonstrated on a rather simplified model of collective behavior, which, albeit popular and general, does not capture all the intricacies and complexities of collective behavior. Future work should seek to apply the approach to more detailed models of collective behavior, including, for example, attraction and repulsion rules2. Second, the implementation of the approach requires knowledge of the noise variance, which may not be feasible all the times, especially when dealing with experimental observations. Should one have access to some video recordings of the collective and knowledge of an adequate mathematical model, it could be possible to infer the value of the noise through dimensionality reduction techniques, without the need for tracking34.

Overall, this study provides a rigorous, mathematically backed method to infer the size of a realistic collective from measurements of some of its units, whose random motion contains the footprints of the entire system. The theoretical underpinnings of the method provide further evidence for the analogies identified by Einstein between interdisciplinary research in the collective behavior of animal groups and modern physics35.