Natural swarms in 3.99 dimensions

The renormalization group is a key set of ideas and quantitative tools of statistical physics that allow for the calculation of universal quantities that encompass the behaviour of different kinds of collective systems. Extension of the predictive power of the renormalization group to collective biological systems would greatly strengthen the effort to put physical biology on a firm basis. Here we present a step in that direction by calculating the dynamical critical exponent z of natural swarms of insects using the renormalization group to order ϵ = 4 − d. We report the emergence of a novel fixed point, where both activity and inertia are relevant. In three dimensions, the critical exponent at the new fixed point is z = 1.35, in agreement with both experiments (1.37 ± 0.11) and numerical simulations (1.35 ± 0.04). Our results probe the power of the renormalization group for the quantitative description of collective behaviour, and suggest that universality may also play a decisive role in strongly correlated biological systems. Tests of the predictions of the renormalization group in biological experiments have not yet been decisive. Now, a study on the collective dynamics of insect swarms provides a long-sought match between experiment and theory.

The dynamical critical exponent z of natural swarms of insects is calculated using the renormalization group to order = 4 − d. A novel fixed point emerges, where both activity and inertia are relevant. In three dimensions the critical exponent at the new fixed point is z = 1.35, in agreement with both experiments (1.37 ± 0.11) and numerical simulations (1.35 ± 0.04).
Collective behaviour is found in a great variety of biological systems, from bacterial clusters and cell colonies, up to insect swarms, bird flocks, and vertebrate groups. A unifying ingredient, which also provides an insightful connection with statistical physics, is the presence of strong correlations: the correlation length, ξ, is often significantly larger than the microscopic scales [1][2][3][4][5][6][7]; in some instances ξ grows with the system's size, giving rise to scale-free correlations [1,6,8]. In the case of natural swarms of insects, a second key hallmark of statistical physics has been verified, namely dynamic scaling [9]; this is noteworthy, as dynamical scaling entangles spatial and temporal relaxation into one law, known as critical slowing down [10]: the collective relaxation time grows as a power of the correlation length, τ ∼ ξ z , thus defining the dynamical critical exponent, z. Strong correlations and scaling laws are the two essential prerequisites of the Renormalization Group (RG) [11,12]: by coarse-graining short-wavelength fluctuations, the parameters of different systems flow towards few fixed points ruling their largescale behaviour; RG fixed points therefore organize into universality classes the macroscopic behaviour of strongly correlated systems, thus providing parameter-free predictions of the critical exponents. The emergence of scale-free correlations and scaling laws calls for an exploration of the RG path also in collective biological systems.
In the broader field of active matter [13], RG is already a key tool. The pioneering hydrodynamic theory of Toner and Tu [14,15] has been studied through the RG both in the polarised [16,17] and near ordering phase [18,19], with applications in systems with nematic or polar order [20][21][22]. RG has also been employed to study motilityinduced phase separation [23,24], active membranes [25], bacterial chemotaxis [26], cellular growth [27]. Direct comparisons with experiments are few, though: the exponent of giant number fluctuations in d = 2 [15,17] was confirmed in experiments on vibrated polar disks [28], while in [29] the exponents of the Vicsek Model in the ordered phase were found to be incompatible with those conjectured by Toner and Tu [14]. Other RG exponents have been checked in numerical simulations [30][31][32][33]. Comparisons with biological experiments are scarcer. Experi-ments studying giant number fluctuations in swimming filamentous bacteria displaying long-range nematic order [34] found an exponent in disagreement with RG predictions of active nematic [21] and polar [17] systems. To the best of our knowledge, there is yet no successful test of an RG prediction against experiments on living active systems.
Here, we apply the RG to the dynamics of insect swarms. Swarms of midges in the field are near-critical, strongly correlated systems [8], with short-range interactions [4], obeying dynamic scaling [9] with an experimental exponent z exp = 1.37 ± 0.11, significantly smaller than the value z ≈ 2 of standard ferromagnets [35]. This large gap indicates that fundamental new physics is required. Although the relation, τ ∼ ξ z , is not merely a dispersion law, smaller z nevertheless suggests that fluctuations are more swiftly transported across the system. Hence, the effort to match theory with experiments in natural swarms must look for more efficient mechanisms of information transfer. Activity is the first obvious candidate, as it allows fluctuations to propagate not only thanks to the inter-individual interaction, but also through the self-propelled motion of the particles [14]. Incompressible, near-critical active matter was first studied in [18], where an RG analysis found that activity lowers z from 2 to 1.73. This was an important step towards bridging the gap between experiments and theory in natural swarms, although the chasm with the experimental exponent remains significant.
The second ingredient known to foster information propagation is inertia. Behavioural inertia in the rotations of the individual velocities was first introduced to explain the propagation of collective turns in bird flocks [36,37], but experiments found clear evidence of underdamped inertial relaxation also in natural swarms of midges [9]. At the general level, inertial dynamics stems from the existence of a reversible coupling between the primary field (playing the role of the generalized coordinate) and the generator of the symmetry (playing the role of the generalized momentum); in the case at hand, the symmetry is the rotation of the primary field, hence we call 'spin' its generator. In absence of explicit dissipation, this reversible coupling leads to global conservation of the spin, a conservation law which is known -at equilibrium -to significantly decrease the dynamical exponent, from z ≈ 2 of standard ferromagnets (Model A in the classification of Halperin and Hohenberg [35]), to z = 1.5 of superfluid helium and quantum antiferromagnets (Models E/F and G of [35]). 1 Overall, this scenario suggests that the combined effect of activity and inertia may account for the experimental exponent of natural swarms. Here, we perform an RG study of such theory, and find z = 1.34 (8) in d = 3, a value in agreement with both experiments on real swarms, z exp = 1.37 ± 0.11, and numerical simulations, z sim = 1.35 ± 0.04. The RG result is a parameter-free prediction, with no input beyond the information that both activity and inertia must be part of the theory.
A hydrodynamic theory of active matter with reversible inertial couplings requires three fields: velocity, spin and density [39,40], making the calculation technically unfeasible. To make progress, following [18], we eliminate the density field by imposing incompressibility, ∇·v (x, t) = 0. Beyond its technical inevitability, this is a reasonable physical assumption. In compressible active systems the transition is first-order [41], a framework that would make RG pointless and would rule out scaling. However, dynamic scaling is observed in natural swarms, suggesting a more complex scenario: in absence of density fluctuations, the transition becomes second-order and recent studies [42][43][44] suggest the existence of a crossover from a finite-size regime, where density fluctuations are weak and second-order physics is observed, to a infinite-size regime, where density fluctuations dominate, rendering the transition first-order. Weak density fluctuations and scaling laws in natural swarms [9], suggest then that the dynamics of these finite-size systems can be studied through an incompressible second-order theory.
The polarization field, ψ, is defined by the relation, v (x, t) = v 0 ψ (x, t), where v 0 is the microscopic speed; having v 0 as an explicit parameter is useful to take the zero-activity limit, v 0 → 0, and compare with equilibrium calculations [45][46][47]. The generator of the rotations of ψ (x, t) is the spin, s (x, t); in d = 3 the spin is a vector; however, incompressibility requires ψ to have the same dimension as space, and because RG entails an expansion in powers of = 4 − d, we need a generic spin in d dimensions, which is a d × d anti-symmetric tensor. Reversible inertial dynamics arises from the Poisson brackets [35], stating that s αβ (x) is the generator of the rotational symmetry, thus leading to the conservation of the total spin, S αβ (t) = d d x s αβ (x, t); the crucial constant g is the reversible coupling regulating this symplectic structure [35]. Finally, I αβγν = (δ αγ δ βν − δ αν δ βγ )/2 is the identity in the space of s αβ .
The dynamical field theory we study combines the irreversible off-equilibrium hydrodynamic approach of Toner and Tu [14,15,18], with the reversible conservative structure used to describe superfluid helium and quantum antiferromagnets (Models E/F and G of [35]). The equations of motion are, D t s αβ = −Λ αβγν δH δs γν + 2g I αβγν ψ γ δH δψ ν + ζ αβ , (3) where the material derivatives are defined as, Activity breaks Galilean invariance [15], so that the couplings γ v and γ s can be different from 1 and from each other. H is the classic Landau-Ginzburg coarse-grained Hamiltonian [14], The ψ-dependent part of H is the standard alignment interaction, while s 2 /2 is the 'kinetic' term [35]. Since natural swarms have scale-free correlations [8], we will perform the calculation on the critical manifold, r = 0. The terms proportional to g in (30)- (31) are the reversible forces, characterizing inertial dynamics: instead of acting directly on the polarization, the alignment force δH/δψ acts on the spin, which in turns rotates ψ [37]. On the other hand, the terms proportional to the kinetic coefficients Γ and Λ represent the irreversible forces, responsible of relaxation. The pressure P enforces incompressibility, which in k-space translates into k α ψ α = 0, or P ⊥ αβ ψ β = ψ α , with the projector P ⊥ αβ = δ αβ − k α k β /k 2 . To respect this constraint, one must project the dynamic equation for ψ, eq. (30), and its noise correlator [18], θ α (k, t) θ β k , t = (2π) d 2Γ P ⊥ αβ δ (d) k + k δ t − t whereΓ = Γ out of equilibrium. Notice that, if we write a general anisotropic form of this kinetic coefficient, Γ αβ = Γ ⊥ P ⊥ αβ + Γ (I − P ⊥ αβ ), only Γ ⊥ survives the projection of eq. (30), so that effectively Γ = Γ ⊥ in our notation (and similarly forΓ). 2 On the other hand, because eq.(31) is not projected, anisotropic corrections to Λ in k-space are generated by the RG [47], so it is convenient to assume from the outset a general anisotropic form, where P ⊥ αβγν is the projector in the anti-symmetric space [47]. The fact that this kinetic coefficient is proportional to k 2 grants that also the irreversible terms conserve the total spin; notably, thanks to the Poisson structure, this k 2 term gets generated by the RG if one tries neglecting it [46]. Finally, the spin noise has correlator, ζ αβ (k, t) ζ γν k , t = (2π) d 4Λ αβγν δ (d) k + k δ t − t whereΛ has the same structure as Λ, although out of equilibrium we may have, λ ⊥, =λ ⊥, . In addition to the terms discussed here, new interactions compatible with symmetries are generated by the RG transformation. In order to be self-consistent and closed, the RG calculation must take into account these new terms (see SI-IC4).
Within the RG analysis it is possible to define a set of effective parameters and couplings that are independent of the field dimensions and upon which physical predictions uniquely depend (see SI-IC5). Because the most important factors are activity and inertia, we focus here on their effective coupling constants: 3 c v is the effective coupling regulating activity, which vanishes for v 0 → 0; on the other hand, f quantifies the effective reversible coupling giving rise to inertial dynamics, and we will therefore refer to it as the inertial coupling constant. The scaling dimension of all effective couplings is proportional to 4 − d, indicating that the upper critical dimension is d c = 4 and that an expansion in powers of = 4 − d is appropriate. A momentum shell RG calculation at one loop [11,12] produces 65 Feynman diagrams (full details are provided in SI-I and SI-II). A rich fixed-point structure emerges (Fig.1a). The simplest fixed point corresponds to zero activity and zero inertial coupling, c * v = f * = 0 (black square in Fig.1a). This equilibrium non-inertial fixed point describes non-active systems, as classical ferromagnets, where the polarization is not coupled to the spin; here z = 2 at one loop (Model A of [35]). Incompressibility is merely a solenoidal constraint on ψ, leading to the universality class of dipolar ferromagnets [48]. If we perturb this fixed point by adding an inertial coupling we reach the equilibrium inertial fixed point (blue diamond), which has still zero activity, but non-zero inertial coupling, c * v = 0, f * = 0. This fixed point describes equilibrium superfluids and antiferromagnets (Models E/F and G of [35]) and it has z = d/2, hence z = 1.5 in d = 3; here too incompressibility is a solenoidal constraint on ψ, which changes the static universality class, but not the dynamical one [47]. This fixed point is unstable against activity, which leads the RG flow towards a novel active inertial fixed point (red circle), where both c * v = 0 and 3 We set the cutoff to 1 to simplify the notation.
f * = 0. The combined effect of activity and inertia lowers significantly the dynamical critical exponent; in d = 3 we find, z = 1.34 (8). This fixed point is stable against perturbations of all the parameters considered up to now. As we shall discuss more thoroughly later on, we believe this to be the fixed point describing natural swarms. Finally, there is a fourth fixed point (green triangle), which has non-zero activity, c * v = 0, but zero inertial coupling, f * = 0, corresponding to z = 1.73 in d = 3; here the inertial reversible terms are absent from the dynamics, hence the polarization is decoupled from the spin [18]. This active non-inertial fixed point is stable against activity fluctuations, but as soon as we perturb it with an inertial coupling, f = 0, the RG flow diverges (shaded area). There is a sound reason for this: the correct way to attain non-inertial dynamics is not to kill the reversible coupling between coordinate and momentum, but to introduce dissipation and let it take over in the overdamped limit. This is the consistency check our calculation must pass next.
So far our theory conserved the total spin, thanks to the Poisson structure generating the reversible terms in the dynamics and to the fact that the irreversible kinetic coefficient Λ is zero at k = 0. Although the Poisson structure has no reasons to change, Λ could: within natural swarms we cannot exclude that some spin dissipation exists, not as a result of a violation of the rotational symmetry, but because individual midges might exchange spin with the environment in a way that is unaccounted for in the equations of motion. Spin dissipation is produced by a k-independent term η in the kinetic coefficient, Λ ∼ (λ + λ ⊥ )k 2 + η, and similarly inΛ. The RG calculation (reported in SI-I.E) shows that the scaling dimension of η is always positive, so that if we perturb the active inertial fixed point with η = 0, the RG flow gets out of the conservative plane and eventually reaches a fixed point at η = ∞, where polarization decouples from the spin and z = 1.73 (see Fig. 1b). This is the correct way to obtain the overdamped limit in which inertia becomes irrelevant, hence we call this the active overdamped fixed point (a hopefully clarifying map of the theory is depicted in Fig.2). Yet: if the overdamped fixed point is the only asymptotically stable one, why should we be interested in the inertial fixed point?
The answer is that finite-size systems can be ruled by a partially-stable RG fixed point, if the physical parameters are close enough to it. Consider a ferromagnet slightly above its critical temperature; the stable fixed point is T * = ∞, and yet, if the temperature is close enough to T c , the critical fixed point governs the physics as long as the system's size is smaller than the correlation length, L ξ. This is a general mechanism: when the physical couplings are close to a mixed-stability fixed point, the RG flow remains for many iterations in the vicinity of it, and because iterating RG corresponds to observing at larger and larger scales, the flow of a finite-size system may never get out of that basin of attraction. This balance is always regulated by a crossover length scale, R, which is in general a more The novel fixed point (red circle), with non-zero off-equilibrium activity and non-zero inertial coupling, is the only stable one, with a dynamical critical exponent z = 1.35 in d = 3. The equilibrium non-inertial fixed point (black square), z = 2.0, corresponds to standard ferromagnets (Model A of [35]); the equilibrium inertial fixed point (blue diamond), z = 1.5, corresponds to superfluids and quantum antiferromagnets (Models E and G of [35]); finally, the active non-inertial fixed point (green triangle), z = 1.73, corresponds to active matter without reversible coupling between velocity and spin [18]; this last fixed point is not connected to the active inertial one onto this plane. b) The flow with spin dissipation: Spin dissipation, η, is a relevant parameter that brings the flow out of the conservative plane; because η grows up to infinity with the RG iterations, it is convenient to use the reduced dissipationη = η/(1 + η) to represent the flow. If we perturb the active inertial fixed point, z = 1.35, with some dissipation, the RG flow leaves theη = 0 plane, until it eventually reaches the active overdamped fixed point forη = 1 (green pyramid), where z = 1.73. Whenη = 0 it is better to represent the flow through the reduced inertial coupling,f = (1 −η)f , instead of f , so that in the overdamped limit,η = 1, we have one less parameter, as the inertial coupling drops out of the calculation. The overdamped fixed point, z = 1.73, is best seen as belonging to the overdampedη = 1 line, rather than to the conservative but non-inertial line, η = 0, f = 0: even though the value off is the same on the two lines, only the first one corresponds to the correct overdamped limit. All flow lines are actual numerical solutions of the RG equations. complicated quantity than ξ; but the upshot is the same: as long as L R κ (where κ is the crossover exponent) the metastable fixed point rules the system [42,46]. How is this relevant for natural swarms? The underdamped shape of the dynamic correlation functions in natural swarms (Fig.8a) is solid experimental evidence that spin dissipation is weak. On the other hand, the rewiring of the interaction network in swarms occurs over the same time scale as velocity relaxation (Fig.8b), i.e. activity is strong. Hence, the RG flow starts close to the conservative plane, η = 0, but far from the equilibrium plane, c v = 0; as a result, RG rapidly leads the system in the vicinity of the active inertial fixed point, z = 1.35, lingering there for many iterations, before flowing to the overdamped fixed point (Fig.8c and 8d). We find R = λ /η and κ = 2/z (SI-IE2), so that for L (λ /η) 1/z a finite-size system is ruled by the active inertial fixed point. Given that λ is finite along the flow, we conclude that as long as η L z 1, the underdamped inertial scenario must hold; because experimental relaxation is underdamped (Fig.8a), we conclude that the dynamical critical exponent in natural swarms is z = 1.35.
Critical slowing down in natural swarms was first experimentally observed in [9]; the spatio-temporal span of the events in that study, though, was somewhat too limited to have an accurate determination of z, as the largest swarm had N = 278 individuals; here we added 8 new swarming events to the experimental dataset, notably including a swarm of 780 insects. The relaxation time τ vs correlation length ξ is reported in Fig.4a. In [9] the exponent was determined through Least Squares (LS) linear regression of log τ vs log ξ; however, LS works under the hypothesis that the independent variable is perfectly determined and that all experimental uncertainty is in the dependent variable; when this hypothesis is violated, LS systematically underestimate the slope [49]. In our experiments errors certainly impact on both τ and ξ, hence LS is not appropriate and this is why z was unfortunately underestimated in [9]. Reduced Major Axis (RMA) regression [49], on the other hand, treats fluctuations over x and y on the same statistical footing (see Methods and SI-IV); applied to our dataset RMA natural swarms  FIG. 2: Map of the incompressible active theory. The coarse-grained dynamical equations may either have or not have reversible terms giving rise to the inertial coupling between the polarization ψ (i.e. the generalized coordinate) and the spin s (i.e. the generalized momentum). In the first case (g = 0) we have an inertial theory, with a Poisson structure expressing the fact that s is the generator of the rotational symmetry, thus leading to the conservation of the total spin. In the second case (g = 0) we recover the non-inertial theory of [18], where polarization is decoupled from the spin and the symmetry does not entail any Poisson structure (the equation for s becomes irrelevant); in this case z = 1.73. On the other hand, in the inertial theory the irreversible kinetic coefficient of the spin may be either conservative or non-conservative. In the conservative case there is no spin-dissipation (η = 0), which produces the inertial-conservative fixed point with z = 1.35. In the non-conservative case, the kinetic coefficient contains a dissipative term (η = 0), although the impact of dissipation depends on how strong that is compared to system's size. In the underdamped regime, ηL z 1, collective fluctuations are still ruled by the inertial-conservative fixed point, so that z = 1.35; this is the regime of natural swarms. Conversely, in the overdamped regime, ηL z 1, the Poisson structure is washed out, the spin drops out of the calculation and collective fluctuations are ruled by the fully non-conservative fixed point, hence giving z = 1.73.
gives z exp = 1.37 ± 0.11 (Fig.4). The substantial error bar should make us cautious about the agreement between experiments and theory, also considering the rather uncontrolled approximations our calculation made, most notably incompressibility and the first-order perturbative expansion in powers of , with = 1. For this reason, we make a final sanity check of our RG calculation through numerical simulations. The field theory we studied is the coarse-grained expression of the Inertial Spin Model (ISM - [37]), in which the particles' velocities are rotated by the spins, while the spins are acted upon by the social alignment forces, with noise correlator, ζ i (t) · ζ j (t ) = 2dT η δ ij δ(t − t ); χ is the generalized turning inertia, J the alignment strength, η the spin dissipation 4 , and T the noise amplitude (or temperature); the adjacency matrix n ij (t) is defined by a metric interaction radius, r c . We want to compare the numerical results with the incompressible RG calculation; hence, even though we do not impose incompressibility in the simulation, we employ a normalized alignment strength, J/n i = J/ k n ik , a prescription known to make alignment-based models less prone to phase separation [50]; moreover, we monitor each simulation to be sure that phase separation does not occur. We run three-dimensional simulations in the near-ordering scale-free regime, where ξ ∼ L (see SI-VB). In the overdamped limit, η → ∞, the ISM converges to the noninertial Vicsek model [37], exactly as our dynamical field theory converges to the non-inertial theory of [18]; but our aim is to check that in the underdamped regime the dynamics of a finite-size ISM simulation is ruled by the inertial fixed point; hence, the dissipation η has been chosen small enough to yield inertial relaxation, as in FIG. 3: RG crossover. a) Weak dissipation: Given the velocity correlation function, C(t), and its relaxation time, τ , we can define the shape function as, h(t/τ ) = − log C(t/τ )/(t/τ ); in the limit t/τ → 0, h(t/τ ) → 1 for overdamped exponential dynamics, while h(t/τ ) → 0 for inertial underdamped dynamics [9]. Experiments on natural swarms (orange line) and numerical simulations of near-critical ISM (purple line), both display underdamped inertial relaxation. The Vicsek model, on the other hand, belongs to the overdamped class (green line -data from [9]). b) Strong activity: Velocity (full line) vs network (dashed line) dynamical correlation functions, for natural swarms (orange) and near-critical ISM simulations (purple). The network correlation function measures the fraction of particles remaining within the nc nearest neighbours after a time t (see SI-III), hence it quantifies how quickly the interaction network reshuffles with time. In both natural swarms and ISM, the network decorrelates on the same time scale as the velocity, hence they are strongly active systems. Here nc = 18, which is the mean number of interacting neighbours in simulations; in the SI-III Fig.15 we show that in natural swarms the two timescales are the same over all spatial scales. c) Crossover of the flow: A close up of the RG flow around the active inertial fixed point shows that when the flow starts at weak dissipation and strong activity, it first rapidly approaches the active inertial fixed point, staying in its neighbourhood for many RG iterations, and then it crosses-over to the overdamped regime (red dots). d) Crossover of the critical exponent: The RG evolution of the dynamical critical exponent z and of the reduced dissipation, η = η/(1 + η), along the crossover flow line depicted in panel (c). The RG crossover from underdamped to overdamped fixed points corresponds to an actual crossover in real space, such that z = 1.35 for L R 2/z and z = 1.73 for L R 2/z . natural swarms (see Fig.8a). On the other hand, the speed v 0 = |v i |, has been chosen large enough to be in the active regime, namely to have a network relaxation time of the same order as the velocity relaxation time (see Fig.8b). Full details of the simulation are reported in the Methods and in the SI-V. Relaxation time vs correlation length is shown in Fig.4b; numerical errors are quite small, hence LS and RMA give the same value of z, and we can therefore calculate δz simply through LS. The result is z sim = 1.35 ± 0.04, in remarkable agreement with the RG theoretical prediction. This consistency also validates the idea that the incompressible theory can indeed be used to describe finite-size compressible systems, as long as density fluctuations are not strong.
A final comment is in order: of the two keystones of the Renormalization Group, rescaling and coarse-grainin, only the latter produces anomalous critical exponents, giving rise to non-trivial collective behaviours. The technical fingerprint of coarse-graining is the presence of Feynman diagrams: this is the case of the present calculation, which therefore probes the core element of RG. The consistency between theory, simulations and experiments attained here strongly supports the idea that the RG -and its most fruitful consequence, universality -may have an incisive impact also in biology.
This work was supported by ERC Advanced Grant RG.BIO (n.785932) to AC. TSG was supported by grants from CONICET, ANPCyT and UNLP (Argentina). We errors are so small that LS and RMA give the same result, zsim = 1.35; the LS error is δzsim = 0.04. c) Experimental resampling: To estimate the error bar on the experimental exponent we use a resampling method: we randomly draw 10 7 subsets with half the number of points and in each subset we determine z using RMA; we report here three such random subsets (orange: selected point -grey: unselected point). Rare experimental fluctuations under resampling can produce an unphysical value of z smaller than 1; this, however, happens in only the 0.002% of data subsets. d) Final comparison: Probability distribution of the experimental critical exponent (orange) from the resampling method of (b); the standard deviation of this distribution gives the error on the experimental exponent, δzexp = 0.11. The vertical band (purple) indicates the position and error of the numerical critical exponent; coloured symbols indicate the various RG fixed points values of z.

Author Contributions
AC, IG and TSG designed the study. AC, LDC, IG, TSG, GP and MS derived the structure of the dynamical field theory. LDC, GP and MS, coordinated by AC and TSG, carried out the RG calculation. LDC designed the code to evaluate the Feynman diagrams, which was further developed with the help of MS. GP performed the numerical simulations. SM and LP performed the 3D tracking of the experimental data and estimated the exponent; SM measured experimental swarm activity vs relaxation. MS, with the help of LDC and GP, wrote the Supplementary Information. AC wrote the paper.
Experiments. Data were collected in the field by acquiring video sequences using a multi-camera system of three synchronized cameras (IDT-M5) shooting at 170 fps. Two cameras (the stereometric pair) were at a distance between 3m and 6m depending on the swarm and on the environmental constraints. A third camera, placed at a distance of 25cm from the first camera was used to solve tracking ambiguities. We used Schneider Xenoplan 50mm f =2.0 lenses. Typical exposure parameters: aperture f =5.6, exposure time 3ms. Recorded events have a time duration between 0.88 and 15.8 seconds (see Table I of the  SI-II). More details can be found in [4]. To reconstruct the 3d positions and velocities of individual midges we used the tracking method described in [51]. Our tracking method is accurate even on large moving groups and produces very low time fragmentation and very few identity switches, therefore allowing for accurate measurements of time-dependent correlations.
Fit of the dynamic critical exponent. Dynamic scaling states that the relaxation time τ k at wavelength k and correlation length ξ are linked by the relation, τ k = ξ z Ω(kξ), where Ω is a scaling function. To infer the value of z from experimental data, we measured the relaxation time τ k of the mode at wavelength k = ξ −1 in different swarming events. Experimental evaluation of τ k and ξ is discussed in SI-IV, and follows [9]. Dynamic scaling in this case reduces to τ ∼ ξ z (where τ ≡ τ k=ξ −1 ); hence, log τ = z log ξ + c. In [9] z was fitted through a standard Least Squares (LS) regression, which gave z exp = 1.12 ± 0.16 on the dataset of [9], and z exp = 1.16 ± 0.12 on the current larger dataset; the problem with LS, though, is that it assumes that experimental uncertainty is only present in the dependent variable y, which is not true for our experimental data, as both τ and ξ are subject to experimental uncertainty; when using it on a dataset where the error affects also x, LS systematically underestimate the slope [49]. Therefore, LS is not a good method in our case. Reduced Major Axis (RMA) regression, on the other hand, is a method that works under the hypothesis that both x and y are affected by uncertainties [52,53]. RMA fits a set Gaussian variables x i and y i , with homogeneous variance σ 2 x and σ 2 y to a regression line, y = f (x) = αx + β, and it determines α and β through the minimization of the sum of the areas of the triangles formed between each point and the regression line with sides parallel to the axis (see insert in Fig.4 panel a). For each point, the area of this triangle is given by, |∆x i ∆y i | /2, where, The function to be minimized is therefore, The minimization equations, ∂ α Σ = ∂ β Σ = 0, give, where E g(x, y) = 1 The sign of α is the same as the sign of the correlation between x and y. A further benefit of RMA compared to other methods, such as Least Squares or Effective Variance (both discussed in SI-IVB), is that the fit is invariant under an interchange of variables, y(x) vs x(y). Moreover, RMA is also invariant under any scale change of the variables, hence it is not sensitive to the values of σ x and σ y , at variance with other methods. RMA is the only method, among those in which the fitted coefficient can be expressed in terms of elementary regression coefficients, that obeys both properties above [53]. In SI-IVB we also describe the Effective Variance (EV) regression method, which requires the experimental errors δτ and δξ as an input; we use EV with two different estimates of the (most problematic) experimental error δτ , and obtain results compatible with RMA (z exp = 1.32 ± 0.18 and z exp = 1.34 ± 0.18). Given the significant difficulties in assigning a univocal experimental error on τ to each swarm (see SI-IVA3), we prefer to quote the RMA result z exp = 1.37 ± 0.11 -which is error-neutral -as our most confident determination of the exponent.
Numerical simulations. Eqs (8) of the microscopic ISM are numerically implemented using the RATTLE algorithm to enforce the constraint |v i (t)| = v 0 . Simulations are performed in d = 3 in cubic boxes with periodic boundary conditions; average density is fixed at ρ 0 = 1 and the sizes explored are N = 256, 384, 512, 729, 1024, 2048, with N the number of particles. The effective inertia is χ = 1 and the alignment strength is J = 18; the metric interaction range is r c = 1.6, corresponding (at this density) to an average number n c ∼ 18 of interacting neighbours; the microscopic spin dissipation is η = 1 and the speed is v 0 = 2; this choice of η and v 0 ensures that the dynamics is clearly underdamped and active (see main text). The temperature (or noise strength) T serves as control parameter of the order-disorder transition; we explored the interval T ∈ [1 : 8]. The time-step of integration is chosen as dt = 10 −4 . For each size N we identify a finite-size 'critical' point, T c (N ): at every value of T we run 5 independent samples, initialised with polarized configurations, of length 6 − 8 × 10 5 steps increasing with the system size; we compute the susceptibility χ and we take as T c (N ) the point where this quantity reaches a maximum. We measure the correlation length with the inverse of the wave-number k = 1/ξ where the static correlation function at this temperature peaks, and we calculate the relaxation time τ of the velocity's C(k, t) in complete analogy with the analysis of experimental data. This procedure ensures that the systems are always in a near-critical scaling regime, with the correlation length scaling linearly with the linear system's size L (see SI-VB).

I. THE RENORMALIZATION GROUP CALCULATION
The calculation we are going to describe here is the final point of a rather long trajectory, so that it may be useful for the reader to keep in mind the sequence of the main works leading to the present result. The hydrodynamic theory originally developed by Toner and Tu for flocks [1], can also be applied to swarms when studied in its critical phase; this was done in the incompressible case in [2], which is our starting point for the active incompressible non-inertial branch of our theory. The crossover between the non-active (equilibrium) fixed point to the active (off-equilibrium) fixed point in this theory was studied in [3], where it was also established that the results of the incompressible RG calculation agree perfectly well with simulations of the finite-size compressible model, provided that density fluctuations are mild (no phase separation). The experimental evidence of an inertial coupling between velocity and spin, and therefore of the need to go beyond Toner-Tu theory, was first found for flocks in [4] and later for swarms in [5]; the microscopic model of active self-propelled particles obeying such inertial mode-coupling dynamics is the Inertial Spin Model (ISM), which is defined in detail in [6]; the Vicsek model [7] is the overdamped limit of the ISM [6]: when spin dissipation is very large compared to inertia, the spin becomes irrelevant and one reduces to a theory for just one degree of freedom, the velocity (or polarization). A discussion of the connections between Vicsek model, Toner and Tu theory and the ISM can be found in [8].
The path from the microscopic inertial active model (the ISM) to the relative coarse-grained dynamical field theory that we study here has required some intermediate steps. First, the equilibrium version (i.e. fixed network, or non-active) of the ISM has a field-theoretical benchmark in Model E (planar case) and Model G (3d case), according to the classic Halperin-Hohenberg classification of dynamical critical phenomena [9]; these equilibrium models are really a fundamental starting point for understanding the inertial mode-coupling dynamics of a primary field coupled to the generator of its rotations. These equilibrium non-active models have perfect conservation of the spin -namely zero dissipation -which is unlikely to be exactly true in biological systems, hence in [10,11] we studied the effect of dissipation at equilibrium on this kind of theories, finding that for finite-size systems the inertial fixed point of the mode-coupling theories is still relevant. Finally, we assessed (again in the non-active case) how to impose the incompressibility constraint (which is a solenoidal constraint on a fixed network) on a theory where there is inertial coupling between primary field and spin in [12]. The present calculation finally analyses the active out-of-equilibrium incompressible case in presence of inertial coupling between velocity and spin.

DESCRIBING ACTIVITY: THE TONER AND TU THEORY IN THE INCOMPRESSIBLE CASE
The starting point of our field-theoretical description is given by the hydrodynamic theory developed by Toner and Tu [1], representing the minimal description of an active system with rotational invariance. This theory represents the hydrodynamic counterpart of the Vicsek model (VM) [7], although it describes in general any model which shares the same symmetries and conservation laws. The VM describes the dynamics of a collection of self-propelled particles with fixed speed v 0 , whose velocities interact through a dynamic alignment as in equilibrium XY or Heisenberg models. The continuous theory of Toner and Tu represents a sort of moving ferromagnet, combining activity with Model A dynamics ∂ t v = −δ v H + θ [9], with H being the Landau-Ginzburg free-energy. This makes the Toner and Tu theory fall into the class of overdamped Langevin equations, in which the social force δ v H acts directly on the time-evolution of the order parameter. In the incompressible limit -see Section I A 5 -the equation of motion for the velocity field in the near-critical Toner and Tu theory is given by [2], where on the l.h.s. we can recognise the material derivative Since the active force breaks Galilean invariance [13], the coefficient γ v needs not to be equal to unity. On the r.h.s. we have the alignment force ∇ 2 v, the active force coming from a Landau potential, the pressure and a gaussian white noise of variance 2Γ. A RG analysis of Eq. (13) near criticality predicts at one loop a dynamic critical exponent of z = 1.73 in d = 3 dimensions [2]. When γ v = 0, namely in absence of advection, equilibrium Model A's critical behaviour is recovered, with the dynamic exponent being z = 2 at one-loop. A crossover thus occurs between the non-active equilibrium and the active off-equilibrium critical behaviour [3]. The microscopic parameter controlling this crossover is the microscopic speed v 0 ; to see this it is convenient to work with the coarse-grained orientation field ψ (or local polarization) rather than with the velocity field v. In models as the VM, in which each individual has the same speed v 0 , the connection between these two fields is simply given by, Written in terms of the field ψ, equation (13) becomes, in which the explicit dependence of the advection term on the microscopic parameter v 0 clarifies the mechanism underlying the crossover between equilibrium Model A and off-equilibrium active theory [3].

RESTORING INERTIA: EQUILIBRIUM MODE-COUPLING DYNAMICS
Experimental evidence about swarms' temporal correlation function shows that an inertial structure -vanishing derivative for small times -absent in the incompressible Toner and Tu theory, is needed to describe natural swarms [5]. This is confirmed by the discrepancies between the experimental value of the dynamic critical exponent z, found to be z 1.37 in natural swarms, and the theoretical prediction of z = 1.73 of the Toner and Tu incompressible theory in d = 3 [2]. As proposed in [4] for the case of flocks, restoring inertia can be done by recognizing that, although Eq. (13) is invariant under rotations, there is no trace of a conservation law associated with this symmetry. According to Noether's theorem, when a theory is invariant under a given symmetry, the generator of this symmetry is conserved. The presence of conservation laws heavily affects the critical properties of a system, leading to completely different dynamic behaviours. Since Toner and Tu theory is built to be invariant under rotational symmetry, our aim is to couple (13) with the conservation law associated with the rotational invariance, in order to restore inertial behaviour.
A possible way to restore inertial behaviour has been proposed in [6], where a model named Inertial Spin Model (ISM) has been introduced to provide a theoretical explanation for information propagation in flocks [4]. The ISM shares many common features with the Vicsek Model, but has one main (and crucial) difference: the aligning force does not act directly on ∂ t v, but is mediated by the generator of rotational symmetry. This new variable, in analogy with quantum mechanics, has been called spin, since it represents the generator of rotations in the internal space of the velocities. It must not be confused with angular momentum, which is the generator of rotations in positions' space. The spin is a measure of how much an individual is rotating around its own axis; more precisely, it is proportional to the curvature of the trajectory, namely to the inverse of its radius of curvature: all individuals sharing the same spin undergo equal radius turns rather than parallel-path turns [4]. Conservation (or slow dissipation) of the total spin has huge impacts on the dynamics of a swarm or a flock. The presence of a spin-velocity coupling makes the spin responsible to carry information, giving rise to second-sound propagation. The presence of second-sound modes even close to the ordering transition (called 'paramagnons' in condensed matter) was also found experimentally in natural swarms [5], supporting the idea that spin-velocity mode-coupling is an essential mechanism also of these system. When shifting our attention to hydrodynamics, inertia can be restored by dynamically coupling the velocity/polarization field with to spin field [10]. The global conservation of the spin allows it to fluctuate on space-time-scales comparable with those of critical fluctuations, thus making spin-velocity couplings relevant in the RG sense. To simplify the discussion, we will first discuss the effects of restoring inertia in absence of activity. At equilibrium, a mode-coupling interaction between an order parameter ψ and its spin s arises from their Poisson-bracket relation [9], which encodes the fact that s is the generator of rotations of ψ (repeated indices imply a summation over them). The parameter g is the reversible coupling regulating the symplectic structure, i.e. the inertial coupling between polarization and spin. In general, when the order parameter is a n-dimensional vector, the generator of its rotations s is a n × n anti-symmetric tensor [14]. The tensor I represents the identity in the space of s αβ , and it is given by, with the factor 1 2 ensuring that I αβγν s γν = s αβ and I αβστ I στ γν = I αβγν . We work here with an order parameter of generic dimension n, although in the physical case we have n = 3. This choice might seem inconvenient at first glance. When n = 3, the spin s can be written as a 3-dimensional vector, lightening the tensorial structure and reducing the number of indices. This comes from the fact that when n = 3, the plane on which the rotation occurs can be uniquely identified by the vector orthogonal to it, while this does not happen when n = 3. However, there is an important reason to work with a tensorial spin, rather than a vectorial one. In the following, we will impose incompressibility, which requires v and therefore ψ, to have the same dimension d as space. Although the physical case is given by n = d = 3, the RG perturbative expansion is performed by expanding d near the upper critical dimension d c = 4, hence one is forced to work with an order parameter of dimension n = d ∼ d c to correctly perform the RG perturbative expansion. The spin associated with an n-dimensional order parameter, in generic dimension n, is represented by a n × n anti-symmetric tensor rather than a vector. Therefore, we will need to work with this more generic form.
The equilibrium dynamics of a near-critical system in which s is conserved, known as Model G in the physical case of a three dimensional order parameter (n = 3) [9] and generalized by the Sasvari-Schwabl-Szepfalusy (SSS) model in n dimensions [14,15], can be constructed following the classic Mori-Zwanzig formalism [16,17] and it is given by, Here the free-energy functional H is chosen to take the usual Landau-Ginzburg form for the critical field ψ while it is gaussian for the spin field (we set to 1 the inertia, which does not get any renormalization), The square gradient enforces local alignment of the order parameter ψ, while rψ 2 + uψ 4 is the modulus' confining potential. At mean-field level, when r < 0 the ground state exhibits symmetry breaking and an ordered phase is observed, while for r > 0 the ground state is given by the disordered state with zero mean polarization. Inertia is restored thanks to the presence of mode-coupling interactions that encode the conservative nature of the dynamics, arising as a consequence of the Poisson-bracket relation (16). The term ∂ t s ∼ gψδ ψ H represents the action of the force on the dynamics of the spin, rather than directly on the order parameter. The indirect action of this force on the dynamics of ψ is guaranteed by the term ∂ t ψ ∼ gψδ s H, which expresses the rotation of ψ induced by the conservation of s. This mode-coupling mechanism restores the inertial structure of the equations of motion, thus allowing to describe the behaviour observed experimentally in swarms in the field [5]. On the other hand, the terms ∂ t ψ = −Γδ ψ H and ∂ t s ∼ −Λδ s H represent dynamic relaxations, giving rise to the diffusion and transport phenomenology typical of stochastic statistical systems. These relaxation terms are thus complemented by the white Gaussian noises θ and ζ, whose variances are given by Einstein relations when the system is at equilibrium. The dissipative constant (or kinetic coefficient) Γ rules the relaxation of the order parameter and it is a crucial player in determining the dynamic exponent z since it fixes the time-scale on which relaxation occurs. Similarly, the kinetic tensor Λ αβγν rules the relaxation of the spin. When the total spin is conserved, the tensor Λ is proportional to ∇ 2 [9], and in the isotropic theory it takes the form, In this theory the total spin is conserved,Ṡ An RG analysis of this field theory shows that the critical dynamic exponent is given by z = d 2 [9,15,18], that is z = 1.5 in d = 3.
Is behavioural inertia the only way in which experimental observations of [5] can be explained? The possibility that other mechanisms take part in explaning those observation cannot be excluded a priory. For example, another candidate which could account for the faster and more efficient information propagation, and hence a lower value of z, could be the presence of long-range interactions. However, midges in the field seem to interact mainly with sound-mediated interactions with an interaction range of only few centimeters [19,20], way smaller than the size of the swarms observed. The short-range nature of the interactions in swarms was also confirmed in [21], where the radius of the effective aligning interaction was extimated to of 2 − 5 cm, in agreement with acoustic interactions. This seems to suggest that effective interactions in swarms are short range, and hence that the symmetry-based arguments of behavioural inertia are the most economic way to describe the large scale behaviour of natural swarms.

THE ROLE OF DISSIPATION
The spin-velocity coupling introduced in the previous section was mainly motivated by symmetry arguments, and associated conservation laws. However, in real biological systems, information is not expected to be propagated forever with zero dissipation, as damping effects may become relevant over longer and longer distances. Thus, a (small) spin dissipation cannot be excluded in real biological systems. Note that the introduction of this friction does not violate the rotational symmetry of the problem, since all hydrodynamic equations are still invariant upon rotations. Moreover, we will show in sec. I E 1 that our theory in the presence of large friction is equivalent to the incompressible Toner and Tu theory, which has been explicitly built up to obey rotational symmetry.
To understand why violation of spin conservation does not come from a weak violation of the symmetry, let us discuss an example in a case with which the reader might be more familiar. In a translational invariant system -say a collection of marbles -, Noether theorem states that the total linear momentum is conserved. If we had complete control of all degrees of freedom, namely position q and momentum p of each marble, we could describe the system through Hamilton equationsṗ i = − ∂H ∂qi ,q i = ∂H ∂pi . In this case, Noether theorem can be explicitly tested: since all the interactions in H must obey translational invariance q i → q i + δq, the time derivative of the total momentum P = i p i vanishes. In which cases can momentum conservation be violated? The first case is when the system is closed in a box. In this case, collisions with the walls of the box violate momentum conservation. This lack of conservation is due to the violation of the symmetry: the presence of a wall in a precise point of space manifestly violates translational invariance.
If instead our marbles were surrounded by some medium, say a fluid, exchange of momentum between the system and the medium is allowed. Hence, when observing the momentum of the collections of marbles only, it might be that violation of momentum conservation are detected. Does this mean that the symmetry is violated? Of course not: Noether theorem ensures conservation of the total momentum of all degrees of freedom, including those describing the medium. What happens to the total momentum of the marbles only depends on the interactions between fluid and marbles. When we want to describe only the degrees of freedom of the system of marbles, non-Hamiltonian effective interactions must be taken into account to describe the effect of those degrees of freedom we decided to coarse-graine (the medium). These interactions between the system and the medium can be effectively described as a dissipation of the momentum, apparently violating the conservation law associated with the symmetry. In fact, this effective dissipation arises as a consequence of not taking into account all the possible degrees of freedom; the symmetry and its associated conservation law are still in place.
Similarly, if the swarm were an isolated system, its total spin would be exactly conserved. Nevertheless, we expect midges in a swarm to interact not only among each other, but also with the surroundings. Due to the global rotational symmetry, in virtue of Noether's theorem we expect the total spin of all degrees of freedom to be conserved. However, our field theory is an effective description of swarms, in which the interactions with the environment have been coarse-grained. Hence, some spin friction may arise as the effect of external forces on the swarms. Whether these interactions between midges and environment allow some spin exchange is out of our current knowledge, and hence we can not exclude them. What we do know is that if spin exchange was possible, it has to be weak, since the presence of inertial effects [5] indicate that the spin dissipation is small.
In order to introduce spin-dissipation, a k-independent term must be added to the kinetic coefficient of the spin, Λ αβγν → Λ αβγν + ηI αβγν where η is the dissipative friction. The new form of Λ αβγν , in isotropic theories, is therefore given by Although from a purely hydrodynamic perspective -i.e. at long wavelengths and long times -the existence of a spin friction η would make the field s a fast mode that can be dropped from the hydrodynamic description [9], it was shown in [10,11] that this is not the case for finite-size systems. When the size of the system is finite, crossover phenomena that are usually ignored in the hydrodynamic limit may become relevant. In the present case, if the dissipation is weak, η λ, the system undergoes an RG crossover between the conservative regime of Model G (η = 0) and the fully overdamped dissipative case of Model A (η = 0). Below a certain crossover length-scale R determined by the extent of dissipation, i.e. for modes with wavevector k R −1 , the critical dynamics has a inertial nature as if η = 0, with z = 1.5 [10,11]. On the other hand, on length-scales larger than R, k R −1 , the dissipation overcomes and the dissipative result of Model A is recovered. This argument will discussed more thoroughly later on in Section I E 2.
Natural swarms definitively have a finite size. Moreover, in natural swarms, the spin dissipation must be small enough to keep the system in its underdamped phase, as otherwise the temporal correlation functions of the theory would not reproduce the experimental ones [5]. This means having a crossover length-scale R larger than the system's size so that experimentally one observes the conservative inertial dynamics at all the accessible scales [10]. For this reason, we will be particularly interested on the neighbourhood of the conservative η = 0 plane, even though η turns out to be a relevant perturbation in the RG sense.

COMBINING ACTIVITY AND INERTIAL DYNAMICS
Inspired by this equilibrium dynamic structure, we build the off-equilibrium theory describing inertial active matter by adding a term gψ × δ s H to the active field theory (15), identifying the rotation of ψ induced by the conservation of s. The dynamics for s is instead constructed from Eq. (19), with advection added through the minimal substitution ∂ t → D t = ∂ t + γ s v · ∇, encoding the fact that also the spin field is transported by the velocity. Here γ s is not necessary equal to 1 since Galilean invariance is violated. Thus, the resulting equations of motion take the following form: Although in equilibrium systems Γ needed to be a coefficient, rather than a matrix, because of the rotational symmetry, out of equilibrium the possibility of having non-symmetric interactions allows in principle for more complex tensorial structures, making Γ αβ a matrix rather than a real coefficient. However, non-symmetric linear couplings between the different components of ψ α are typically known to lead to totally different phase transition phenomenology, very different from that in which we are interested [22]. Hence, for the sake of simplicity, we shall assume that Γ αβ has no anti-symmetric component. Because of the rotational symmetry, the only symmetric form Γ αβ can take is that proportional to the identity: Γ αβ = Γδ αβ . The theory presented in Eq. (24)-(25) will be studied in the incompressible case, therefore we omit all terms incompatible with this constraint, such as -for example -the two non-standard advective terms v (∇ · v) and ∇v 2 arising in the compressible Toner and Tu theory as a consequence of the absence of Galilean invariance [13]. Incompressibility, however, does not forbid the presence of non-standard advection terms in the equation of motion for the spin and indeed we will demonstrate in Section I C 4 that the RG does generate two of these advection (adv) terms, namely, Moreover, we will demonstrate that the RG also generates two of these anomalous mode-coupling (mc) terms in the equation of the spin (see Section I C 4), Crucially, each one of these anomalous terms is the divergence of a current, implying that the RG does not generate non-conserved (k-independent) spin dissipation: the conservation of the total spin,Ṡ αβ (t) = 0, hallmark of the mode-coupling theories [9], is preserved even out of equilibrium. The novel vertices are accompanied by four new dimensionless couplings µ 1 , µ 2 , φ 1 , φ 2 .
Some other terms could be included in the calculation. It is the case of linear couplings between spin and velocity described in [23], which near criticality take the form ∂ t ψ α ∼ ∂ β s αβ and ∂ t s αβ ∼ ∇ 2 ∂ α ψ β − ∂ β ψ α . These terms modify the structure of linearized hydrodynamic equations, allowing the presence of propagators and correlation functions that mix the fields s and ψ. However, if such terms were included, the number of diagrams (which are not few even in the present calculation) would inevitably become enormous and impossible to manage. Moreover, the presence of these new linear terms does not modify the dynamic critical exponent z of the linear theory, while the presence of advection or inertial mode-coupling alone has a great impact on it. Therefore, the presence of these additional linear terms is expected only to perturb the effect of non-linear interactions on the critical exponents. Hence, we decide to focus on the study of spin-velocity couplings due to non-linear interactions only by working on the sub-manifold of the parameter space where such linear terms are not present. Because the RG calculation is in perfect agreement with numerical simulations even in the case in which these linear terms are ignored, we believe that including them from the beginning should not really affect the results we find here.
The resulting equations of motion therefore become, In principle, these equations should be coupled to an additional equation for the density field ρ. However, as we will discuss in Section I A 5, we can get rid of the density field by studying incompressible systems. Moreover, due to anisotropic effects caused by requiring the system to be incompressible, the kinetic tensor Λ in (21) will have two different diffusive coefficients λ ⊥ = λ for the longitudinal and transverse modes of s [12].
Although the system we are dealing with is out of equilibrium, it is possible to identify the truly non-equilibrium dynamic terms from those arising from a free-energy functional H that would survive also in the equilibrium limit v 0 → 0. In this limit, the theory resembles the dynamical structure of Model G [9], therefore (18), (19) can be viewed as given by the merging of this equilibrium model [9] with Navier-Stokes equation [24]. The latter takes into account the active motion of particles, as it happens for the Toner and Tu theory.
Because the system is out of equilibrium, Einstein relations between the kinetic coefficients and the corresponding noise variances are not expected to hold. Therefore, θ and ζ of Eq. (30), (31) are white gaussian noises with zero mean θ = ζ = 0 and variance given by whereΓ = Γ and the amplitudeΛ to take the same form of Λ but with different coefficients (λ = λ andη = η).
All the other terms, which cannot be written as derivatives of a free energy functional, represent genuinely offequilibrium interactions; these are advection and anomalous terms, which all occur as a consequence of the fact that individuals are not fixed on a network. As we already said, the couplings of the advective terms γ v and γ s need not be equal to 1 nor to each other, due to the absence of Galilean invariance [13]. Together with these active terms, we added also a pressure force −∂ α P to (30), as it happens in Navier-Stokes, as well as in Toner-Tu equations.
The equations of motion we just derived in the present section describe inertial active matter. By tuning the different parameters, we expect these equations to be able to describe many different phases of active matter. Since swarms have large, scale free correlations but no net global polarization [21,25], they are expected to be described by the near-critical regime of the present field-theory. The absence of any intrinsic length-scale in the correlations of swarms suggestes that the renormalized mass of the field theory has to vanish. This means that the bare mass r is expected to be small. Does this arise as a consequence of some fine tuning of the parameters of the swarms in the field? Although no answer to this question has been given yet, we believe there are two main possibilities. One is of course that natural swarms do fine-tune their intrinsic parameters to achieve scale-invariance. This would however require midges to be able to change their interactions with neighbours, and tune them accordingly. A second possibility is that each swarm in the field has a given set of parameters, and tunes its size to maximize its susceptibility, namely the collective response. This mechanism, proposed in [25], lies on a simple assumption: midges gather in swarms only when it is convenient, namely when they do maximize their ability to behave collectively. This could happen due to interactions we are not aware of, that make the swarms unstable whenever its size is too large, naturally breaking it into smaller swarms until the collective response is maximal. Note that, whatever is the correct scenario, based on the results in [21,25], swarms can be effectively described at a field-theoretical level as a system near a critical point, namely with a small mass r.

ENFORCING INCOMPRESSIBILITY
In an active system, individuals are not fixed on a lattice but are free to move, allowing fluctuations in the local number density to arise. When the total number of individuals is conserved, these fluctuations occur on large space and time-scales, making the local density ρ (x, t) one of the slow-modes characterizing the hydrodynamic behaviour of the system [9]. The time-evolution of the coarse-grained density field is thus given by the continuity equation, where v 0 ψ = v is the velocity field. When considering systems with effective alignment interactions, the presence of density fluctuations plays a crucial role in determining the phenomenology of the ordering transition. While in equilibrium ferromagnetic systems the transition is known to be continuous, things radically change when activity is added [26][27][28]. The presence of density fluctuations generates instabilities when the transition is approached from the ordered state, thus leading to a discontinuous transition from finite to zero polarization. On the contrary, a continuous transition has been shown to arise if density fluctuations are suppressed [2]. The first-order nature of the transition in compressible active matter is hence induced by the presence of density fluctuations. A recent renormalization group study has highlighted that a crossover between second and first order phenomenology is present in a modification of the Toner and Tu theory [29], with the former belonging to the incompressible universality class. Moreover, also recent numerical simulation of the compressible Vicsek Model show that a continuous, second-order phenomenology is observed when density fluctuations are mild, namely when no phase-separation arises [3], with scaling laws ruled by the incompressible exponents found in [2]. In finite-size systems the transition may thus acquire a continuous phenomenology as a consequence of finite-size effects [27], with density fluctuations not being strong enough to destabilize the second-order transition typical of equilibrium models [7]. Natural swarms have been shown to exhibit static and dynamic scaling laws typical of systems near to a continuous transition [5,21], thus suggesting density fluctuations are not strong in determining the collective state. Following [29], we believe that the incompressible universality class might describe the exponents of natural swarms. Hence, incompressibility will be enforced from the very beginning of our calculation. This is achieved by requiring a homogenous and constant density in eq (34), namely ρ (x, t) = ρ 0 , thus dropping it from the hydrodynamic description. Incompressibility radically decreases the technical intricacy of the RG calculation, to a level that can be managed. Therefore, in the present work, we shall get rid of density fluctuations and study the theory at fixed density, namely assuming the system to be incompressible. In an incompressible system Eq. (34) becomes a constraint on the field v and, consequently, on the polarization ψ: In Fourier-space this constraint translates into the following two equivalent statements, where we have defined an object which is rather central to this calculation, namely the projector onto the subspace orthogonal to k, and where, Here and in the following, we will use the notation, where Λ is the ultraviolet cutoff of the theory, of the order of the inverse of the microscopic inter-particle distance. Summation over repeated indices is always understood.
In order to enforce incompressibility in equations (30) and (31), two steps are necessary. The first one is rather intuitive, and it consists in projecting the equation of motion for ψ (30) onto the subspace transverse to k, which is a standard procedure [2,24]; this is equivalent to say that the pressure term ∂ α P enforces the constraint. The second step is less intuitive and it has been discovered in the equilibrium case v 0 = 0 [12]: the presence of a solenoidal constraint on the order parameter ψ requires to project also the force δ ψ H that appears in the mode-coupling term of equation (31) [12]. This second projection leads to the presence of an additional non-linear interaction in the equation of motion for s, the so-called DYnamic-Static (DYS) vertex, first found in [12], This vertex mixes the effects of the dynamic mode-coupling term and the static ferromagnetic ψ 4 interaction. At equilibrium the coupling constant κ must obey the relation κ = g u, a crucial result that allows the equilibrium theory to be closed under renormalization [12]. However, off-equilibrium effects may lead to a violation of the relation between κ, g and u meaning that, in general, one can have, We shall demonstrate that at the new off-equilibrium inertial fixed point, u and g remain finite, while κ vanishes. For this reason, in the main text, we omitted altogether the DYS interaction in the equations of motion to facilitate reading. However, in the actual calculation described here, this interaction will be kept for two reasons. First, it allows maintaining a connection with the equilibrium theory of [12], in particular, recovering the same result as in equilibrium when v 0 → 0 is an important consistency check in such a complicated calculation. Secondly, the presence of this vertex is crucial for an additional reason: the high dimensionality of the parameter space (16 dimensions) and the intricate form of the β-functions will not allow us to find analytically the RG fixed point. To perform a successful numerical integration of the RG flow equations, the initial condition will be chosen in a region of the parameters space close to the equilibrium theory with solenoidal constraint. For the RG flow to go smoothly from the equilibrium to the off-equilibrium novel fixed point, it is technically crucial to keep this DYS interaction in the calculation, even though it eventually flows to zero at the new RG fixed point. In other words, although the DYS vertex is not relevant at the novel fixed point so that it does not contribute to the new value of the dynamical critical exponent, the DYS vertex is technically relevant to find the new fixed point in the large parameter space.

THE EQUATIONS OF MOTION OF INCOMPRESSIBLE INERTIAL ACTIVE SWARMS
Incompressibility implies that the field ψ is only allowed to fluctuate in the direction perpendicular to k, thus generating an anisotropic behaviour of the field s that acquires two different relaxation rates for its longitudinal and transverse components with respect to k [12]. Therefore the tensor Λ αβγν , and consequently alsoΛ, takes the form, where P ⊥ αβγν is the projection operator in the anti-symmetric space of s [12], which in k-space is given by and we recall that, After symmetrizing terms containing powers of the same field, the incompressible equations of motion in k-space, finally become, where the following tensors were introduced, and the noises have correlations, Finally, to simplify the notation, in (45), (46) the following reduced parameters have been defined, B. SETTING UP THE STAGE FOR THE DIAGRAMMATIC EXPANSION

THE MARTIN-SIGGIA-ROSE-JANSSEN-DE DOMINICIS ACTION
In order to employ RG to study the critical dynamics of our model we follow the method proposed by Martin, Siggia, Rose [30], Janssen [31] and De Dominicis [32]. The Martin-Siggia-Rose-Janssen-De Dominicis (MSRJD) formalism allows to describe the behaviour of fields evolving according to stochastic differential equations in terms of a field theory formulated using path integrals. The dynamic behaviour of the field φ, defined by the following Ito stochastic differential equation, is reproduced by the field-theoretical action S given by, Here F is the deterministic evolution operator, θ a white gaussian noise with variance 2L αβ . The introduction of the hatted fieldφ in the action is the price that has to be paid to exploit the path integral formulation, using the standard rules of static renormalization and writing the perturbative series in terms of Feynman diagrams. The field theoretical description reproduces the stochastic dynamics in the sense that, for a given observable where O is the average value of O over all possible realizations of the noise θ, while Thanks to this equivalence, the critical dynamics can be investigated by studying the action S through RG techniques. The gaussian part of the action S derives from the linear dynamics, namely the linear part of the operator F, while the interactions derive from non-linear terms. Within this formalism, an external source h introduced in the dynamical equation of φ is coupled toψ in the effective action. Therefore, the response function, known also as Green function or propagator, can be written as, For this reason,φ takes the name 'response field'. The MSRDJ action S for the stochastic equations (45) and (46) depends upon four fields: ψ,ψ, s andŝ. The action can be split in the following terms, where S 0,ψ and S 0,s are the gaussian parts of the action, respectively coming from the linear dynamic terms of the equations of motion of ψ and s, while S I is the interacting part. From Eq. (55) we have, We wrote the effective action in k and ω space, where the generic field φ is given by withk = (k, ω) and, Notice that there is no cutoff in the frequency ω.

FREE THEORY: PROPAGATORS AND CORRELATION FUNCTIONS
The starting point to build the perturbative expansion of the equations of motion is the free theory, obtained by setting to zero all the dynamic non-linear couplings, namely g, γ v , γ s , J and κ. From the gaussian part of the action, given by Eqs. (60) and (61), we can derive the expressions for the bare propagators and correlation functions for the effective field theory, which are the same as Model G with solenoidal constraint [12], and are given by, The subscripted zeros on thermal averages indicate that they are computed within the non-interacting theory. The tensors G and C are given by, In Eq. (67), (69), (68) and (70) we have, In the diagrammatic framework, the fields ψ andψ are represented with a solid line, while the fields s andŝ are represented with wavy lines. Bare propagators and correlation functions thus take the following graphical representation where the arrows in the propagators always point in the direction of the response field.

NON-LINEAR TERMS: THE VERTICES
The six terms that compose S I represent the non-linear interactions in the equations of motion. Each interaction involves one response field, identifying the equation of motion in which the corresponding non-linearity appears:ψ, if the vertex comes from a non-linearity in the equation of ψ;ŝ, if it comes from a non-linearity in the equation of s. In the diagrammatic framework, these interactions are graphically represented by vertices, in which different lines merge, each representing one of the fields involved in the interaction. We remind that full lines representψ and ψ fields, while wavy lines representŝ and s fields. Moreover, an entering arrow is used to identify the leg representing the response field. We shall choose vertices to have opposed signs with respect to the interactions; the convenience of this choice is that vertices play a crucial role in building Feynman diagrams, which come from the expansion of exp(−S).
The first vertex involvingψ represents the mode coupling non-linearity proportional to the reversible dynamic coupling g,ψ The second interaction involvingψ is the self-propulsion (or advection) interaction coming from the convective derivative in the equation of motion. This vertex is proportional to v 0 γ v , and it vanishes when the microscopic speed does. Graphically, this interaction is represented bŷ The third vertex involvingψ derives from the ferromagnetic ψ 3 Landau-Ginzburg interaction, proportional to J. It is represented by the term,ψ The other three vertices involve one fieldŝ and derive from the equation for the spin. The first one represents the dynamic mode-coupling interaction proportional to g with the addition of the two mode-coupling anomalous terms, with different tensorial structure, : where This vertex vanishes when h = p, guaranteeing that this interaction does not contribute to the dynamics of the total spin S (t) = s (k = 0, t). The anomalous mode coupling terms are those proportional to Φ 1,2 . From a technical point of view, it is essential to note that this vertex can be rewritten in an alternative form; by using the delta function, together with a symmetric distribution of the momenta, one has, This second form is convenient for two reasons: first, it has a simpler structure, which makes it easier to recognize corrections to the coupling constant g; secondly, in this form, it is more transparent to demonstrate the diagrammatic origin of the anomalous terms, a derivation that we will see later on in Section I C 4. On the other hand, the first form of this same vertex, equation (79), is handier when calculating diagrams in whichŝ αβ appears as an internal leg. The second vertex involvingŝ is the self-propulsion interaction, coming from the fact that s is advected by the velocity v 0 ψ, and it is proportional to v 0 γ s . Graphically, this interaction is represented by, Also this interaction vanishes when q = p, and therefore does not contribute to the dynamics of the total spin either. This vertex takes into account also the anomalous advection of s through the terms proportional to µ 1,2 . The last interaction term is the DYnamic-Static (DYS) vertex [12]. This interaction mixes the ferromagnetic-like interaction and the mode-coupling dynamic term, as a consequence of the presence of incompressibility. It represents the effects of the Landau confining potential on the dynamics of s, mediated by the mode-coupling dynamic interaction. It takes the following form, This interaction does not vanish when k = 0, and thus it contributes to the dynamics of the total spin. However, we shall be careful not to confuse the absence of conservation with the presence of dissipation. If the spin was largely dissipated, it would become a non-hydrodynamic variable whose behaviour does not affect that of the order parameter. As shown in [12], where the fixed network approximation of the incompressible theory developed here is analyzed, even though the spin may not be globally conserved due to the DYS interaction, it represents still an hydrodynamic slow mode. This is because the DYS vertex does not involve the spin, and thus represents the effects of the slow-mode ψ.
Even if this effect remains finite as k → 0, it may only be responsible for slow variations of the total spin s. Hence, although the total spin is not conserved, it is not even dissipated and thus it undergoes a generalized precession caused by the DYS vertex [12].
C. THE DIAGRAMMATIC EXPANSION

DYNAMICAL RG IN MOMENTUM SHELL: GENERAL PROCEDURE
The momentum shell RG scheme [33][34][35][36][37] provides an explicit method to calculate critical exponents. By integrating out the short-wavelength modes iteratively, the RG generates a flow in the parameter space that eventually converges towards a fixed point. The study of the linearized flow equations near an attractive fixed point allows computing explicitly the critical exponents of the theory. Universality, in this RG context, means that one single fixed point rules the long-wavelength behaviour of a large class of theories, each one identified by a different initial point in the parameter space. The RG flow is defined by a set of recursive relations, obtained by iterating a RG transformation of the effective action of the field theory. The RG transformation consists of two steps: i) the probability distribution of the fields is marginalized by integrating short-wavelength modes on the shell b −1 Λ < k < Λ, with b > 1, hence effectively decreasing the cutoff in momentum space; ii) space and time are rescaled, so to formally restore the same cutoff as the original theory. The action obtained after one RG transformation has different parameters and it describes the system when it is observed on a larger scale. However, since the partition function remains the same up to a multiplicative constant, the physical observables are left unchanged. The flow equations are obtained under the form of recursive relations, describing how the parameters at the iteration (l + 1) can be obtained starting from those at step l.
In the gaussian theory, namely when all interaction terms vanish, the shell integration is harmless since modes at different wavelengths are independent [34], and thus the RG flow is trivial: essentially each parameter rescales according to naive dimensional analysis. However, when non-gaussian interactions are present, the shell integration couples long and short wavelength modes, generating nontrivial corrections to the bare action. These corrections can be computed within a perturbative expansion in powers of = d c − d, where d c is the upper critical dimension, namely the dimension above which mean-field theory is exact, while d is the spatial dimension. This expansion method to compute the critical exponents, known as -expansion, was proposed by K.G. Wilson, and first explored in the seminal paper "Critical exponents in 3.99 dimensions" [38], in honour of which we chose the title of the present work.
In order to perform the shell integration, it is convenient to split the fields in their Infra-Red (IR) and Ultra-Violet (UV) modes [34], and write the partition function as, where φ stands for all the fields of the theory, while superscripts < and > indicate whether the field has momenta lower (IR modes) or higher (UV modes) than Λ/b respectively. The action S can then be written in the following form, where S 0 is the gaussian part of the action while V represents all the interactions between UV and IR modes. Integrating out short wavelength details, namely performing the integration over the UV modes φ > with wavevector on the shell, leads to, where, is the gaussian average of e V[φ < ,φ > ] over the UV fields with on-shell momentum, and ∆S[φ < ] represents the corrections to the bare action S due to the shell integration. By assuming the interaction couplings small -assumption that will be verified a posteriori at the stable fixed point-it is possible to expand e V in powers of the couplings. The expansion of ∆S can be graphically represented as an expansion in Feynman diagrams, composed only by connected diagrams [34]. This shell integration is usually performed under a thin-shell approximation, namely taking b ∼ 1: in this limit ∆S is proportional to the shell thickness 1 − b −1 ln b. The final result after the integration over the UV modes φ > is an effective action with a new cutoff Λ/b and modified parameters, namely, where we used the relation b a 1 + a ln b when b is close to 1. The aim of the perturbative theory is to calculate the corrections to the parameters, namely the values of δP.
In order to recover a theory with the same cutoff Λ as the bare theory momenta must be rescaled, which in turns requires to rescale also frequency and fields [39][40][41][42], where z is the dynamical critical exponent and χ φ is the scaling dimension of the field. Once this rescaling is done, the action takes the same form as the bare one, but with new renormalized parameters and couplings, which will be denoted with a subscript b. These new values of the parameters, defined in order to absorb all the powers of b in front of them, can be expressed as functions of the bare parameters through the following relation, where χ P defines the total scaling dimension of P, which takes into account both the naive physical dimension, coming from dimensional analysis, and the anomalous scaling dimension due to the RG coupling of IR and UV modes. We may therefore write χ P as, where d P is the naive physical dimension of P in units of momentum k, while δP is the anomalous scaling dimension defined in Eq (87). Thanks to this transformation, the partition function can be written as, where S b [φ] is the renormalized action, obtained after integration over a shell of thickness ln b and after rescaling.

SELF-ENERGIES AND VERTEX CORRECTIONS
Following the RG scheme introduced in the previous Section we now derive the recursive relations for the parameters and coupling constants. As a consequence of the shell integration, the bare gaussian action S 0 of (60) and (61) acquires some corrections, that it is customary to write in the following way [9], where all momenta are integrated off-shell, k < Λ/b, while frequency integrals still run form −∞ to ∞. The new quantities Σ,Σ, Π andΠ are the self-energies, which contribute to the perturbative corrections of the gaussian parameters of the original action. On the other hand, the interacting part of the action (62) acquires the corrections, where the various vertex-functions V s give perturbative corrections to the coupling constants of the non-linear interactions. We notice that all the corrections are proportional to the volume of the momentum shell, which is given In what follows, we will compute the self-energies and the vertex functions by using perturbation theory. Evaluation of perturbative corrections is done using a Feynman diagram expansion and considering only the first order in . Since we are interested in the dynamic behaviour near criticality, the mass m is set to 0 in all diagrams contributing to self-energies and vertex functions [34], except for those giving corrections to m itself.
As mentioned earlier, the self-energies represent the perturbative corrections to the gaussian part of the effective action. We distinguish four different self energies, one for each combination of fields appearing in S 0 , namelyψψ, ψψ,ŝs andŝŝ. From a diagrammatic point of view, each self-energy is given by the sum of all amputated 1-particle irreducible diagrams with external fieldsψψ,ψψ,ŝs andŝŝ respectively. Graphically, they are represeded by the blobs in the following diagrammatic scheme To identify perturbative corrections of the gaussian parameters, we need to expand the self energies in ω and k and keep only the leading terms. By comparing the corrections ∆S 0 in Eq. (92). with the form of the bare action S 0 in Eq.
(60)-(61), we can define the perturbative corrections to the gaussian parameters through the relations, where we denoted the bare parameters with the subscript 0 and where the ellipses stand for higher orders terms in ω and k, which are irrelevant in determining the critical behaviour at first order in . The ln b factors present in all terms reflect the fact that perturbative corrections are proportional to the volume of the momentum shell. After the shell integration, the gaussian action takes the following form, where we omitted the tensorial structure of the action to facilitate the reading. The self-energies thus give perturbative corrections to the gaussian parameters Γ, m,Γ, λ ⊥/ andλ ⊥/ . Moreover, the self-energy Σ has also a non-vanishing term linear in ω, namely δΩ, which gives a perturbative correction to −iωψψ; because this term is not multiplied by any parameter in the original bare action, the only way to reabsorb this correction will be through a modification of the scaling dimension of the fieldsψ and ψ. On the other hand, analogous perturbative terms do not arise in the self-energy Π, because all one-loop diagrams are at least of order k as a consequence of the particular properties of the vertices (79) and (81), which vanish at k = 0, so that the term −iωŝs acquires only k-dependent perturbative corrections that vanish when k = 0. Another -rather crucial -consequence of the fact that all corrections to Π andΠ are proportional to k, is that there are no perturbative corrections to neither η 0ŝ s norη 0ŝŝ , and hence no δη and δη corrections are present in Eq. (100) and (101). This is no coincidence, but it is a result deeply related to the presence of the underlaying (rotational) symmetry of the problem. Although terms explicitly violating the conservation of the spin s are present, namely the dissipative terms η andη, the symmetry is still at work and the non-linear interactions are not able to generate any perturbative corrections to the dissipative coefficients η andη. The standard way to explicitly perform this shell integration, and to compute the corrections to the bare parameters of the model, is using perturbation theory. The corrections δP are thus computed using a Feynman diagram expansion. At order , the non-vanishing diagrams contributing the self-energies are listed in Section II A.
Once the shell integration is performed, we are left with an effective action with a cutoff of Λ/b, a coefficient different from 1 in front of the −iωψψ term and modified parameters, namely Following Section I C 1, we perform the following rescalings After this rescaling, we end up with an action with the same cutoff Λ but with new renormalized parameters and couplings, which will be denoted with a subscript b.
Let us at first focus on what happens to the −iω terms under rescaling. Once the shell integration is performed, the rescaling procedure defined by Eq. (104)-(106) transforms these terms as follows where we remind that the −iωŝs term has no perturbative corrections because of the rotational symmetry. As previously mentioned, these terms lack of a coupling constant to redefine. Hence, the scaling dimensions of the fields must be properly chosen to restore the same structure of the bare action, namely With this choice, the renormalized theory will still have a coefficient equal to one in front of the time derivative of the equation of motion. This requirement allows to fix the scaling dimension of the response fields χψ and χŝ, Rescaling can be performed on all the other terms of the action in the same way as it was done in Eq. (107)-(108). In these cases however, the powers of b appearing after rescaling can be reabsorbed by defining new renormalized parameters of the theory. These new values of the parameters can be expressed as functions of the bare parameters, through the perturbative corrections, the dynamic exponent z and the scaling dimensions χ of the fields. Scaling dimensions for time and fields are not known a priori, but can be determined by imposing additional conditions. For what concernes the gaussian action, the renormalized parameters are given by By taking advantage of Eq. (110) it is furthermore possible to write all the scaling dimensions of the parameters in terms of the scaling dimensions of the frequency and physical fields only, namely of z, χ ψ and χ s : Let us note here a crucial fact: since the dynamic critical exponent z is always positive, so is the scaling dimension χ η of the dissipation η. Due to this, η grows exponentially when the RG transformation is iterated over and over, eventually diverging. This makes η a relevant parameter, with a role similar to that of the mass m: while m is the relevant parameter that drives the system away from the critical manifold, similarly η drives the system away from the conservative plane (while remaining on the critical manifold). Hence, independentely of all the other parameters, we expect η to diverge under the RG flow whenever its bare value is non-zero. This will have important consequences that we will discuss later on. Now that we have defined the perturbative corrections to the gaussian parameters, we shall switch our attention to the coupling constants of the non-linear interactions. Perturbative corrections to the interactions are known as vertex functions. These vertex functions can be diagrammatically expressed through the sum of all the amputated connected diagrams having as external fields the same fields of the vertex they are correcting. The six vertex functions of our theory, one for each bare vertex, can be graphycally represented through the blobs in the following expressions At one-loop, the non-vanishing diagrams contributing to the vertices are listed in Section II B. Each vertex function contributes to the corrections of couplings and parameters of Eqs. (76)-(82) in the following way, Vψ ψs αβγν (k,q) = g δg ψ P ⊥ αρ (k) I ρβγν (125) Vŝ ψψψψ αβγνστ (k,q,h,p) = Higher orders in ω and k turn out to be irrelevant in determining the critical behaviour at first order in . The perturbative corrections to the anomalous terms have been defined as, (because they represent corrections to the products g Φ 1,2 and γ s µ 1,2 , they were not simply called δΦ 1,2 and δµ 1,2 ).
After rescaling momenta, frequency and fields, we obtain the following RG transformations, By using Eq. (110) we can write all the scaling dimensions of the interaction couplings in terms of the scaling dimensions of the frequency and physical fields, namely of z, χ ψ and χ s : By looking at Eqs. (137) and (138) it may seem that we have two different mode-coupling constants; however, this is not the case. We defined two different renormalized coupling constants g (ψ) and g (s) only because the perturbative corrections δg ψ and δg s arising in Eqs. (125) and (128) are completely independent from each other, but this does not mean that there are two different physical constants. In fact, there must be only one mode-coupling constant, because g arises in the derivation of the equations of motion as the consequence of a symmetry, encoded by the Poisson-bracket relation {s, ψ} ∝ gψ, stating that s is the generator of the rotations of ψ. The mode-coupling terms in both the equations of motion derive from this one Poisson-bracket relation; therefore, the existence of two different couplings would mean losing the connection with the underlying symmetry and Poisson structure. As we mentioned before, the scaling behaviour of the physical fields ψ and s is arbitrary, which allows a certain freedom in determining the scaling dimensions χ ψ and χ s , which enter Eqs. (137) and (138). Hence, it is possible to use this freedom to restore the identity of the two mode-coupling constants, by simply asking that χ s is chosen in such a way that, Essentially, we are requiring the field s to scale in such a way that the coupling regulating the symmetry has one unique scaling behaviour.

A PRIMER OF THE DIAGRAMMATIC EXPANSION
The one-loop Feynman diagrams generated by the action's expansion, and necessary to calculate all the corrections δP to the parameters and coupling constants, are listed in Section II; not only they are many, but they are also quite complicated due to the tensorial structure of the theory. Hence, we will perform in this Section the explicit calculation of one diagram, hoping that this may help the interested reader in picking up the general technique. The diagram we calculate is a contribution to the self-energy Σ defined in (94), To facilitate the reader to follow the calculation and understand it better, we shall set η =η = 0 in the following paragraphs. Generalization of the following calculation in the case with η andη is straightforward. The integral expression corresponding to this diagram can written following the definitions of lines and vertices given in Sections I B 2 and I B 3, While the momentum integral is restricted to the momentum shell, Λ/b < p < Λ, the frequency integral is extended over the whole spectrum (−∞, +∞). Furthermore, we notice that the integrand (151) depends on the frequency ω p only through the correlation functions G 0,ψ and C 0,s defined in Section I B 2; this is in general true for any Feynman diagram present in this work. All the correlation functions (67)-(70) have at most two poles in the complex plane, and it is always possible to compute the frequency integral using the residue theorem. In the case of this diagram the integration over ω p yields, Since we are going to work at first order in = 4 − d (one loop), we can drop the m dependence in the Feynman diagrams, because this would lead to higher order corrections. Hence we set m = 0 in the following. The integral over the modulus of the momentum becomes trivial if we assume that the RG transformation is infinitesimal, namely if b 1; this means that the thickness of the momentum shell is infinitesimal and it is possible to approximate the integral of a generic function f (p) as follows, which corresponds to approximating the integral as the value of the function at |p| = Λ times the volume of the momentum shell Λ d (1 − 1/b) ln b. After this step, the integrand may still depend on the direction of the momentum p = p/|p|, which is a unit versor. Applying this procedure to equation (152) leads to, where the versorp is integrated over the d-dimensional sphere Ω d . The main advantage of having an expression of this kind is that all the dependence on the cut-off Λ is made explicit.
In order to recognize within this diagram the corrections to the various parameters of the original action, one has to expand in small external momenta and frequency, as we did in (98). The order at which it is convenient to perform this expansion depends on the particular Feynman diagram we are considering and the specific parameters or coupling constants that it corrects. For example, the diagram we are studying here gives rise to perturbative corrections to thê ψψ term in the Gaussian action (60); hence we must expand it up to the second order in the external momentum k and up to the first order in the external frequency ω k . In this example we will calculate explicitly only the contribution of order ω k of this diagram, even though the diagram gives also contributions of O(1) and of O(k 2 ). The diagram at order ω k is, 155) Expanding all the tensors in equation (155), using the definitions (37), (43) and (44), we obtain, We can now perform the integral over the d-dimensional sphere dΩ d , which can be done using the following two relations, where the brackets indicate the average over the d-dimensional sphere, While the average of an even number of versorsp is nonzero, the angular average of an odd number of momenta vanishes by symmetry. This leads to the following expression for the diagram (150), Since we are performing an -expansion around the upper critical dimension d c = 4, we can set d = 4 in all the Feynman diagrams when results at first order in are concerned. In the end, by comparing this expression to the expansion of the self-energy in (98), we can finally read the correction to the scaling dimension of the fields coming from this one diagram, namely, where the dots indicates the corrections from all other diagrams contributing to the self-energy Σ, which are listed in Section II.

GENERATION OF THE ANOMALOUS TERMS
The equations of motion proposed initially in (24), (25), in which the anomalous terms proportional to µ 1 , µ 2 , Φ 1 or Φ 2 do not appear, have a (relatively) clear physical interpretation. However, as previously pointed out, in the absence of some additional terms, which we call anomalous, the equations of motion are not RG-invariant; basically, what happens is that the RG generates some terms that were not present in the original action. To see how this happens, we perform here a shell integration starting from a theory with bare coefficients Φ 10 = Φ 20 = µ 10 = µ 20 = 0, and we show that these anomalous terms are spontaneously generated by the renormalization group transformation. We also hope that this further direct analysis of some highly nontrivial diagrams may provide the reader with a more robust technique to perform the entire calculation. Since the generation of these non-linear terms does not rely on the presence of η andη, we shall set them both to 0. Once again, we hope this makes the calculation easier to follow.

*** Anomalous Mode-Coupling terms
Without anomalies, the non Gaussian action (62) has only one term proportional toŝψψ, corresponding to the following vertex,ŝ which is the same mode coupling vertex as (80), but without the anomalous terms. To compute the perturbative corrections to this vertex we must consider all the diagrams with an incomingŝ line and two outcoming ψ lines; these are 12 diagrams, listed in Fig.(14) of Section II, most of which produce anomalous corrections. Here, as an example, we limit ourselves to consider only the first two of all these diagrams namely, where we explicitly write also the momenta of internal field lines. These diagrams can be converted into integral expressions following the usual Feynman rules. It is important to note, though, that this is exactly one of the cases in which it is more convenient to write the spin mode-coupling vertex as in (79) (but of course with Φ 1 = Φ 2 = 0), rather than as in (80). The result is, Their sum is given by, where the constants δg s , A and B are, with w = Γ/λ and x = λ ⊥ /λ . These two Feynman diagrams correct the interacting part of the action relative to the mode-coupling of the spin as follows, Note that, in terms of the corrections defined in Eq. (128), the terms A and B contribute to Φ 1 δg s1 = A + . . . and Φ 2 δg s2 = B + . . . , where the ellipses stand for contributions coming from diagrams other than D mc,1 and D mc,2 .
The core idea of the renormalization group is that the perturbative contributions generated by Feynman diagrams can be reabsorbed into a redefinition of the model's parameters. Comparing equations (161) and (167) it is evident that the first term, proportional to I αβγν k · q, has the same form as the bare vertex, hence it can be reabsorbed in the coupling, g → g(1 + δg s ln b). However, we cannot reabsorb the second and third terms of equation (167) as correction of any pre-existing parameters, because of the different tensorial structure. For this reason two novel terms, proportional to I αβρσ I ρτ γν k σ q τ and I αβρσ I ρτ γν q σ k τ respectively, must be included in the action. These two terms coincide with the two anomalous terms Φ 1 and Φ 2 in equations (62). We remind that most of the diagrams listed in Fig.(14) of Section II generate the same anomalous terms with different coefficients A and B. It is crucial to note that, if the model is at equilibrium, v 0 = 0,Γ = Γ andλ ⊥, = λ ⊥, , all these perturbative contributions vanish: A = B = δg s = 0, so that no anomalous terms are generated in the non-active case.

*** Anomalous Advection terms
The same procedure can be used to prove that the two anomalous advection terms, proportional to µ 1 and µ 2 , are fundamental to guarantee the closure of the theory under the RG transformation. We use the same strategy: we assume that the anomalous terms are zero in the bare theory, µ 1 = µ 2 = 0, then they are spontaneously generated by the renormalization group transformation. If we assume that µ 1 = µ 2 = 0 , there is only one term proportional tô ssψ in the non Gaussian action (62), corresponding to the advective derivative. In the diagrammatic expansion, this corresponds to modifying the vertex, The term proportional toŝsψ in the action is corrected by the Feynman diagrams with an incomingŝ, one outcoming s and ψ line. These are only 3 diagrams, listed in Fig.(16) of Section II, so we calculate here all the three of them, namely, The integral expressions of these Feynman diagrams are: their sum is, with the constants δγ s , C and D, given by, These three Feynman diagrams correct the interacting part of the action relative to the advection of the spin as follows, bare structure + C k τ δ ση + D k σ δ τ η absent in the bare theory ]s γν (k/2 −q)ψ η (k/2 +q) , (175) Comparing equations (175) and (168) it is evident that the term proportional to I αβρσ I ρτ γν k η δ στ = I αβγν k η is the same as in the bare case and it can be reabsorbed as a correction of the advection coupling, γ s . On the other hand, the two terms proportional to I αβρσ I ρτ γν k τ δ ση and I αβρσ I ρτ γν k σ δ τ η have tensorial structure different from the bare action, so that they require the addition of two new anomalous advection terms, which are exactly the ones proportional to µ 1 and µ 2 in equation (62). In terms of the corrections of Eq. (129), the terms C and D thus contribute to 2µ 1 δγ s1 = C and 2µ 2 δγ s2 = D.

*** Summary of the anomalous terms
We have shown explicitly that some vertex correction diagrams generate the following mode-coupling and advective anomalous terms in the equation of motion for the spin, which we have summarized here together with the real-space form that we anticipated in (27) and (29). Notice a complication: while each one of the two advection terms in k space gives rise to one term in x space, both mode-coupling anomalous terms in k space give contribute to both terms in x space. For this same reason the couplings of the MC anomalous terms in momentum space, Φ 1 , Φ 2 are a linear combination of those in real space, φ 1 , φ 2 (see equations (52) and (53)). The crucial result of the diagrammatic calculation is that all diagrams generating anomalous terms (including all diagrams that we have not explicitly analyzed in this Section), generate only terms with one of these four tensorial structures. In this sense, the RG calculation is closed once we include these four vertices in the action. Notice also that the total perturbative amplitudes of these terms are not simply given by the coefficients A, B, C, D calculated here, indeed because anomalous corrections with the same tensorial structure arise in many more diagrams than the five computed in this Section.

EFFECTIVE COUPLINGS
As it is always the case in RG calculations, it is now possible to define a set of effective parameters whose scaling behaviour does not depend on z, χ ψ and χ s , so that all physical quantities turn out to depend on the parameters of the theory only via these effective couplings. The effective couplings can be found by looking at the scaling dimensions of the parameters of the theory, given in Eqs. (118)-(120), (143)-(147). By recalling that the scaling dimension of the product of two couplings is given by the sum of the respective scaling dimensions, namely, it is always possible to find a set of combinations of coupling constants and parameters of the theory that have a scaling behaviour which is independent from z, χ ψ and χ s [18]. For example, let us consider the coupling constant J: its scaling, given by Eq. (147), depends on χ ψ and z. To compensate the dependence on χ ψ we can multiply it by the noise strengthΓ, which also has a χ ψ scaling dependence; thus, the productΓJ does not depend anymore on χ ψ . Similarly, the dependence on z can be compensated by dividing by Γ 2 , making the scaling behaviour of the combinatioñ ΓJ/Γ 2 depending only on d and perturbative corrections, which are computed through Feynman diagrams. A similar procedure can be applied at any other constant; multiplying it by appropriate powers ofλ ⊥/ orΓ to compensate the dependence on χ s or χ ψ respectively, and by appropriate powers of Γ, λ ⊥/ to compensate the dependence on z.
There is some arbitrariness on the choice of whether to useλ ⊥ orλ to compensate χ s and whether to use Γ, λ ⊥ or λ to compensate z. However, as long as these parameters are not singular (and they will not be at the fixed point), this choice is irrelevant. The effective parameters for (30)-(31) are, where θ ⊥, = 1 if the system is out-of-equilibrium. Although the technical definition of θ ⊥/ contains the ratio g (s) /g (ψ) , in the physical case we expect g (s) = g (ψ) , as argued in Section I C 2 . Hence, the physical meaning of θ ⊥/ is that of the ratio between the effective temperatures of the two fields, T ψ =Γ/Γ and T From the dissipative coefficients η andη, two effective parameters can be obtained Here the presence of Λ is needed for dimensional reasons. In terms of these reduced frictions, the conservative dynamics is recovered whenη =η = 0, while the fully dissipative dynamics is recovered whenη =η = 1, namely when η λ /⊥ Λ 2 ,η λ /⊥ Λ 2 . The effective coupling constants regulating activity are, the ferromagnetic coupling J and the mass m have effective couplings given bỹ while the mode-coupling and the DYS effective coupling constants respectively are, where Λ is the cutoff of the theory and K d is the surface of the unitary sphere in d dimensions (note that in the main text we set Λ = 1 and K d = 1 to simplify the notation). The presence of Λ to the power 4 − d in all effective couplings except for r suggests that the upper critical dimension of the theory is d c = 4, which means that all couplings are relevant in d < 4, while mean-field behaviour is recovered for d ≥ 4. The dependance of r on Λ d−2 indicates instead that the mass is always a relevant perturbation above d = 2, driving the system away from the critical manifold. Therefore, this requires the mass to be fine-tuned to be near the critical point. At equilibrium, where v 0 = 0 and Γ = Γ,λ = λ, κ = ug, all the effective couplings become identical to their standard equilibrium counterpart [9], with the equilibrium resultũ =ũ κ [12] being recovered. Finally, we should add to the list of effective parameters the four adimensional parameters regulting the anomalous mode-coupling and self propulsion non-linearities, namely (185)

RG FLOW EQUATIONS AND β-FUNCTIONS
The flow of the effective couplings can be obtained by iterating the RG transformation, thus defining a set of recursive relations. After l iterations, the new parameters will take the form, where χ P is evaluated using the values of the parameters at step l. The values P * to which the flow of P approaches when l → ∞ are called fixed points, and play a crucial role in determining the critical behaviour of the theory [34]. To study the RG flow of our theory, it is convenient to rewrite Eq. (186) in the thin shell limit ln b → 0. In this limit, the flow equations becomeṖ where β P is the derivative of P with respect to ln b, namely β P = ∂P/∂(ln b), known as β-function. The fixed points P * of the RG-flow are given by the zeros of β P , since these are the points at whichṖ = β P | P=P * = 0. Among these fixed points, some are (IR-)stable meaning that the flow is driven towards them, while other may have one or more directions of instability from which the RG flow escapes. Asymptotically IR-stable fixed points are those typically ruling the critical behaviour of systems in the thermodynamic limit. Note that, as previously discussed, the mass coupling r always represents a source of instability since it drives the system away from criticality. Hence, stability is intended as stability in all directions except r.
The β-function of the generic parameter P is given by β P = Pχ P . Hence, for the parameters defined in Eqs. (179)-(184) the beta-functions take the following form, where all the perturbative corrections δP are obtained from the Feynman diagram expansion. The explicit expressions of these beta-functions are given in the Mathematica attached notebook in the ancillary filed (beta functions.nb). Before we proceed, let us draw the reader's attention to the beta functions of the reduced dissipative coefficients,η andη, Eq. (192) and (193). Within our perturbative approach, near 4 dimensions the parametersη andη may take only two possible fixed point values: 0 or 1. This is because the perturbative corrections δλ and δλ are both of O ( ); hence the quantities 2 − δλ and 2 − δλ will always be strictly positive and close to 2; hence, the only way in which βη = 0 is by settingη = 0 orη = 1. Similarly, βη = 0 is achieved only whenη = 0 orη = 1. Near the conservative manifoldη =η = 0, the scaling dimensions of bothη andη are positive, meaning that the flow drives the system away from it. On the other hand, the fully dissipative manifoldη =η = 1 is attractive. Hence, bothη andη flow from the conservative value of 0 towards the dissipative value of 1, regardless of what all the other parameters do. In terms of the dissipations coefficients η andη, the stability of theη =η = 1 manifold reflects into the fact that both parameters grow under the RG flow, eventually diverging: η → ∞,η → ∞. This means that the asymptotically stable behaviour of the theory is given by the overdamped limit.
The power of the RG lies in the fact that critical exponents can be inferred by the study of the RG flow in the neighbourhood of a fixed point. The fixed point at which the critical exponents should be evaluated usually is the stable one, since one expects the RG flow to eventually reach its stable fixed point. However, unstable fixed points often play a crucial role in determining the behaviour of finite-size systems, since crossover phenomena may take place. We will show in the following section that this is indeed the case for the field theory we have introduced. Although the IR-stable fixed point in the presence of spin dissipation is the one of the incompressible Toner and Tu theory [2], an RG crossover takes place for small frictionsη 1, as in [10,11]. The (IR-unstable) fixed point ruling the behaviour of systems with small frictions can be found by focusing on the RG flow of the conservative theory. It is worth noting that the conservative subspace is RG-invariant, since βη = βη = 0 whenη =η = 0. Hence, we shall at first restrict ourself to the sub-spaceη =η = 0, and come back to the dissipative theory later on in Section I E 1.

FLOW AND FIXED POINT
In the simple cases, fixed points can be found by solving analytically the set of equations β P = 0 in the parameters P. Here this was not possible. Instead, we tackle this problem by integrating numerically the set of partial differential equations defining the RG flow, namely (187), and looking at what values of the parameters the flow converges. If the flow does converge, the point to which it will converge is a fixed point of the theory. This procedure does not allow us to find unstable fixed points, but only the IR-stable ones in the subspaceη =η = 0. To perform this numerical integration, we have to take great care in the choice of initial conditions of the parameters and couplings, as we expect a large portion of this 14-dimensional space to be just unphysical. The set of parameters v 0 = 0 (and thus c v = c s = 0), θ ⊥/ = 1, Φ 1 = Φ 2 = 0 andũ =ũ κ = 0 identifies Solenoidal Model G [12], which represents the equilibrium limit of our theory -Eq. (30) (31). We expect a system with small, but non-vanishing activity v 0 to belong to the neighbourhood of the equilibrium limit, and thus to be physical in the RG space. If activity is relevant, and indeed we will see it is, the corresponding coupling constants c v and c s should grow along the RG flow and drive the system towards an out-of-equilibrium active fixed point, if it exists. We remark that we do not start close to equilibrium because we expect activity to be weak in swarms; in fact, quite the opposite: as we show in the main text and in Section III, activity in natural swarms is strong. We start close to the equilibrium fixed point for a mere technical point: we need a safe path through a very dangerous parameter space in which it is far too easy for the RG flow to go bonkers if one deviates too much from a physical initial condition.
The constants µ 1 and µ 2 have undefined values in the equilibrium limit, since they are both multiplied by v 0 γ s in the equations of motion and thus we have no a priori argument to fix their value near equilibrium. However, we find that the β-functions of these two parameters, namely β µ1 and β µ2 , vanish when µ 1 and µ 2 take the following values independently of the values of all other parameters. This means that both these combinations of µ 1,2 remain constant along with the RG flow, whatever the other couplings and parameters are doing. The first solution of µ 1,2 turns out to be unstable for small perturbations of their values, at least near the equilibrium fixed point, while the second solution is stable. Hence, we fix from the beginning µ 1 = −1 and µ 2 = 0 and check the consistency of this choice by checking a posteriori the stability of the RG flow with respect to perturbations in µ 1 and µ 2 , thus reducing our problem to 11 coupled equations instead of 13.
We simulate the RG flow starting from various initial conditions close to equilibrium by using the built-in NDSolve function of the software Mathematica, and we always find the same attractive fixed point, The typical RG flow of a selected number of couplings is represented in Fig. 5. Since c v , c s and f have a finite fixed point value, both activity and mode-coupling are relevant at this fixed point, making it the ideal candidate to describe incompressible inertial active matter. The stability of the novel fixed point is analyzed by looking at the Jacobian matrix of the beta functions The eigenvalues of J β at the new fixed point are given by

THE DYNAMICAL CRITICAL EXPONENT
The spatio-temporal behaviour of a collective system is described by the two-point connected correlation function of the order parameter, C(x, t). In general, the correlation function encodes a very complicated relation between space and time and it depends on the set of parameters P defining the state of the system. In the case of critical systems, the parameters enter the correlation function only through the correlation length ξ (P). This property is known as dynamic scaling [41,42], and it states that the correlation function C, when expressed as a function of wave-vector and frequency, obeys the following scaling form, where ξ = ξ (P) is the correlation length and where the static correlation function C 0 has in turns the scaling form, while the characteristic frequency at scale k is given by, In the relations above, Ω, F 0 and F are well-behaved scaling functions, whose explicit form is not relevant for our purposes; η is the critical exponent for the static correlation function (normally called anomalous dimension of the order parameter [43]), which must not be confused with the dissipation. What dynamic scaling asserts is that the divergence of space correlations and time correlations are not independent near the critical point, but they are connected by the dynamic critical exponent z. To find the critical exponent z, following a standard procedure [9,37] we require that the kinetic coefficient of the velocity field is not singular at the RG stable fixed point, namely, By plugging χ Γ = 0 into Eq. (118), we find, Once we substitute the diagrammatic results for δΓ and δΩ atη =η = 0 in the equation for z, we obtain the following expression for the dynamic critical exponent, The value of z is then simply obtained by plugging into (220) the fixed point values of the parameters, which gives, For d = 3 ( = 1), we finally obtain the RG prediction for the dynamic critical exponents of the inertial active theory, z = 1.34 (8) .
The correction 0.65 with respect to the free value z = 2 might seem large for a first-order term in a perturbative expansion, in particular when we set = 1 in d = 3. However, comparing our result with the equilibrium non-inertial theory of Model A might be misleading. In fact, the new fixed point has been found by adding nonlinear activity to Model G [9], which has a non-perturbative dynamic critical exponent z = d 2 = 2 − 2 [18]. Thus, our result should be considered as a 0.15 departure from Model G's non-perturbative exponent, rather than a 0.65 correction to Model A. To compare our result with previous ones in a similar context, the incompressible Toner and Tu theory has a dynamic critical exponent of z = 2 − 0.27 [2], with a perturbative departure of 0.27 from equilibrium Model A's exponent, which is its natural expansion point. When setting = 1, the incompressible Toner and Tu theory predicts an exponent z = 1.73 in d = 3. Numerical simulations of Vicsek Model in d = 3 found a dynamic exponent of z 1.7 [3], showing that the dynamic exponent found in [2] holds with remarkable accuracy also in d = 3, namely when = 1.

STATIC CRITICAL EXPONENTS
In the present section we will derive some of the static critical exponents relative to the active inertial fixed point found in the previous sections. Before proceeding however, an important caveat is in order. The present RG calculation was performed by imposing a solenoidal constraint on the primary field, ∇ · ψ = 0, in order to enforce incompressibility. In non-active equilibrium systems, the presence of the solenoidal constraint is known to change the static universality class [44,45], while leaving unchanged the dynamic universality class [12]; this means that an equilibrium RG calculation with solenoidal constraint would find the same dynamic critical exponent z as a simulation without solenoidal constraint, but would fail to reproduce the static critical exponents of that same simulation. It is not known how this scenario generalises to the active off-equilibrium case, but some caution is certainly required: although in active systems with mild density fluctuations incompressibility is a reasonable hypothesis to calculate the dynamic critical exponents, one must be careful about the static exponents, as we have no certainty that they must be the same as in system where the solenoidal constraint is not imposed, as in natural swarms or numerical simulations.
The first exponent we shall compute is the ν, which characterizes the divergence of the correlation length, where T is the control parameter (the temperature in the case of equilibrium systems) and T c its value at the critical point. This exponent can be computed from the runaway exponent of the mass, namely from the mass β-function. Let us define, where P * are the values of the parameters at the fixed point. The exponent ν is thus given by The second exponent we compute, γ, characterizes the response of the system to a small external field H coupled to the order parameter. It is crucial here, for the purpose of computing these exponents up to one loop, to notice that no one-loop graphical corrections to such a field term appears in the calculation. Assuming linear response of ψ on H, ψ = χ H H, the exponent γ characterizes the divergence of the susceptibility χ H in the vicinity to the phase transition, where T is the control parameter. Since χ H = ∂ ψ ∂H , the exponent γ will be given by, where χ ψ(r,t) is the scaling dimension of ψ in position and time space, and y H is the scaling dimension of the field H. The absence of graphical correction to H at one loop implies that y H can be determined only by power counting.
To determinate y H , let us recall how H would enter the equation of motion for ψ: ∂ t ψ ∼ H, hence leading to y H = z+χ ψ(r,t) + O 2 . At first order in , the exponent γ is thus given by, which in d = 3 becomes, Let us note that this exponent does not quantify the divergence of the space integral of the connected correlation function, as it would be the case for equilibrium systems, because the fluctuation-dissipation relation linking the susceptibility χ H to the correlation function loses its validity when the system is out-of-equilibrium. In previous works [5,21,25], we called the space integral of the connected correlation function 'generalized susceptibility' and used the symbol χ for it, so what we are saying here is that, although the two quantities will both diverge at the critical point, in general χ = χ H , so they might have different critical exponents in the off-equilibrium active regime. The evaluation of the other static critical exponents requires the knowledge of the scaling dimension of the primary field, χ ψ , which in turns requires some RG fixing prescription. We had a similar case with the evaluation of z in the previous section, which we resolved through (218); however, while the prescription Γ * = O(1) used to compute z is well-established and well-understood, both in and out of equilibrium [3,9,46,47], and it has been validated by countless simulations and experiments, to our knowledge there is no standard, nor validated prescription to fix χ ψ . Hence, we prefer not to speculate here about this, with the risk of giving wrong predictions. The issue clearly deserves more study, both analytical and numerical.

OFF-EQUILIBRIUM ACTIVITY RESTORES THE CONSERVATION OF THE SPIN
One of the most intriguing features of the novel active inertial fixed point is the vanishing of the DYS vertex effective coupling constant, namelyũ * κ = 0. At equilibrium, the presence of the DYS vertex, withũ κ =ũ, had such a crucial role in keeping the dynamics of s consistent with the solenoidal constraint on ψ [12] that it seems surprising to find that this vertex disappears when the system is the off-equilibrium active phase. Moreover, the couplingsũ andũ κ both derive from derivatives of the uψ 4 term in the free energy functional H, and thus one would expect a deep connection between the two. However, the fact that the DYS vertex vanishes at this novel fixed point avoids the odd scenario of having two different ferromagnetic couplings; both at equilibrium and off-equilibrium, the RG suggests that only one ferromagnetic coupling should exist, by requiring in the former case thatũ =ũ κ and in the latterũ κ = 0.
But by far the most surprising consequence of the off-equilibrium vanishing of the DYS vertex is that whenũ κ = 0 (and when the dissipation η = 0) the global spin turns out to be conserved! As mentioned in Section(I B 3) and discussed in [12], the DYS vertex is the only interaction that contributes to the dynamics of the spin at k = 0, thus violating the spin conservation (although it does not violate it strongly, e.g. through dissipation). Sinceũ κ is the effective coupling associated with the DYS interaction, the fact that it vanishes means that this vertex is irrelevant at the novel fixed point and thus the spin becomes globally conserved. The restored spin conservation in the active off-equilibrium case is surprising, and it can hardly be a mere accident of the RG calculation. In order to understand its meaning we have to go back to the underlying symmetry and conservation law in the theory. configuration of the field (ψ0, black) and we let a constant uniform spin rotate the field, we obtain a non-solenoidal configuration (ψ2, orange). This is equivalent to saying that the solenoidal constraint breaks rotational invariance at equilibrium. Right: in the active case the particles are dragged by the field, hence the rotation generated by the spin rotates both the field and the position of the field, giving rise to a new configuration that is equally solenoidal (that is incompressible).
At equilibrium, and in absence of any constraint on the field, the presence of an exact rotational symmetry guarantees that the spin is globally conserved since it is the generator of the symmetry. This is exactly what happens in Model G, described in Section I A 2. Now we add the solenoidal constraint, which is the equilibrium version of incompressibility: as it is clear from Fig.6-left, the solenoidal constraint breaks the rotational invariance: if we start from a solenoidal configuration of the field and rotate each vector by the same constant amount, we obtain a new field configuration which violates the solenoidal constraint. At the field-theoretical level, this manifests itself by the RG generation of the DYS vertex that indeed breaks the symmetry and conservation of the spin [12]. This is what happens at equilibrium, namely in the non-active case.
According to the RG, if we now turn from the inactive solenoidal case to the active incompressible one, the presence of activity restores the full power of the rotation symmetry -at least on long wavelengths -by making the spin conserved once again. This suggests that -at variance with the inactive case -activity preserves incompressibility under local rotations. A qualitative cartoon of this mechanism can be seen in Fig.6-right: in the active case a local rotation generated by the spin has a twofold effect: i) it rotates the field (as in the inactive case); ii) it also rotates the positions, through the self-propelled part of the equations,˙ x = v (at variance with the inactive case). These two rotations balance each other, giving rise to a new field configuration that is still solenoidal, i.e. incompressible (Fig.6-right). We stress that this is far from being a general mathematical proof, as it is restricted to the very simple case of a purely rotational field, while one would need to generalize this argument to a generic solution of the dynamical equations. It may be that the full proof of spin conservation in the case of active incompressible dynamics is the very RG calculation that we carried out here; even though one would hope for a simpler and more direct way to prove this result, we could not find it.
In fact, when we reflect on the whole RG flow, rather than restricting ourselves to the fixed point, the situation becomes even more intriguing. As we wrote above, we found the new active incompressible inertial fixed point by starting close to the inactive (equilibrium) incompressible inertial fixed point. At the starting point,ũ * κ = 0, becauseas we said -at equilibrium the DYS vertex is required to enforce incompressibility; along with the flow which goes from equilibrium to off-equilibrium, the couplingũ * κ decreases weakly (see Fig.5) until it abruptly goes to zero right before arriving at the new fixed point. If we tried to start the flow with all parameters close to their equilibrium values, but withũ * κ = 0 from the outset, we would not reach the new off-equilibrium fixed point, and the RG flow would simply go bonkers. Hence, not only the symmetry-breaking couplingũ * κ is necessary at equilibrium, but it is also necessary to accompany the RG flow to the off-equilibrium fixed point; only thereũ * κ is finally allowed to vanish. Clearly, this phenomenon deserves a deeper study.

DISSIPATIVE FIXED POINT
Up to now, we analyzed the RG flow only in its fully conservative manifold, namely atη =η = 0, since we expect finite size systems with small dissipation to be well described by fixed points in this manifold. In the present section we will test the validity of our expectations. First, we will look to the behaviour of the RG flow in the strong-dissipation limit, recovering the results for incompressible Toner and Tu theory. Secondly, we fill focus our attention to the crossover between underdamped (small dissipation) and overdamped (strong dissipation) dynamics by analysing the RG flow in the proximity of the new active, inertial conservative fixed point found in Section I D 1.
As discussed at the end of Appendix I C 6, wheneverη =η = 0 the RG flow makes them grow, approaching the fully dissipative manufoldη =η = 1. Whenη =η = 1, the β-functions of the parameters and coupling constants of the theory take a simplified form. In particular, the β-functions of c v andũ become independent from all the other parameters, while all the critical exponents can be expressed in terms of c v andũ alone. This is a consequence of the fact that, in the presence of friction, the actual mode-coupling effective constant in the perturbative expansion is no longer f , but rather q =λ λ whose scaling dimension in proportional to 2 − d, suggesting that mode-coupling is relevant only below 2 dimensions, and not 4, when friction is present. The upper critical dimension of the field theory however remains d c = 4, since both c v andũ still become relevant below 4 dimensions. The physical reason behind this change of dimension at which mode-coupling is relevan is the fact that the spin becomes a fast mode. When η = 0, the finite relaxation time for the spin η −1 will become, close to criticality, much smaller than the velocity relaxation time τ = ξ z . Hence, the effect of the spin on the order parameter becomes irrelevant, since its effect can be represented as noise.
Therefore, the presence of dissipation strongly modifies the asymptotic critical behaviour, since near criticality only the modes fluctuating on the same scales of the order parameter may affect universal quantities [9]. Therefore, when dissipation is present, the asymptotic critical dynamics in the thermodynamic limit is unaffected by the presence of the spin-velocity coupling.
In this limit,η =η = 1, our theory should therefore become equivalent to the overdamped incompressible theory of [2]. In all perturbative corrections to the equation of motion of v, the diagrams proportional to f are always also proportional to (1 −η). Hence, whenη = 1 these diagrams vanish, giving no contributions to the critical exponents. The velocity field v thus fully decouples from the spin s in the overdamped limit, making the behaviour of the fast mode s irrelevant in determining the universal quantities, so that forη =η = 1, we obtain, which are the same RG flow equations of [2], giving the dynamic critical exponent, This prediction is confirmed also by numerical integration of the full RG flow, as shown in Fig.7. As required by physical consistency, the overdamped limit of our theory is the same as the non-inertial theory of [2].

THE CROSSOVER FROM UNDERDAMPED TO OVERDAMPED DYNAMICS
The asymptotic stable fixed point for our field theory is indeed given by the overdamped, off-equilibrium one already found in [2]. However, as previously mentioned, unstable fixed points can play a crucial role in determining critical exponents for finite-size systems [10]. It often happens that for small enough sizes, the critical properties of a system are determined by the closest fixed point, rather than the fully stable one. As the size of the system is increased, a crossover between different regime emerges, changing the critical properties of the system. In the present case, we expect a finite-size system with small enough dissipation to initially behave as if it were inertial, and turn to the overdamped limit only when the thermodynamic limit is approached.
At the theoretical level, this crossover between a finite-size physics influenced by inertia, and an asymptotic physics completely overdamped manifests itself as a crossover between different RG fixed points. Hence, following [10,11], we will investigate the crossover between the underdamped dynamic behaviour, ruled by our novel fixed point with z = 1.35, and the dissipative incompressible theory with z = 1.73. The starting point of this analysis is the observation that the ratio between dissipation η and conservative kinetic coefficient λ naturally defines, by simple power counting, a length-scale R given by [10,11], We could define a second length-scale: λ ⊥ /η. However, it turns out this second length-scale has the same scaling behaviour as R, hence it makes no difference what definition we use. The parameter R plays the role of a conservation length-scale, in the sense that fluctuations occurring on length-scales smaller than R obey a conservative dynamics, while beyond R fluctuations are insensitive to inertia and conservation laws. The scaling dimension of R is that of a length, but only at the naive (non-interacting) level; in fact, we know that the RG coupling between the UV and IR degrees of freedom generates non-trivial modifications of naive scaling dimensions of λ and in principle also of η. The RG flow of R can be written -defining its scaling dimension χ R -as (see Eq. (186)), and from equations (235), (119) and (120) we obtain, where −1 is the naive dimension of R, which allowed us to naively identify it as a length-scale, while δλ the correction to the kinetic coefficient λ . We recall that η has no perturbative correction due to the symmetry of the problem. The physical origin of this fact is that all diagrams contributing to the self energy Π (which contains corrections to both η and λ ) vanish at k = 0 as a consequence of the form of the mode-coupling and the self-propulsion vertices for s. This has been verified in our calculation at first order in perturbation theory, where higher order effects of diagrams containing the DYS vertex on Π are not taken into account; however, if the result κ * = 0 is exact to all orders, the DYS vertex is always irrelevant and thus no diagrams could ever generate dissipation. This is what we expect, as κ * = 0 implies a global conservation of the spin (at η = 0), which does not seem a perturbative accident (see Section I D 4). But even in the unlikely case in which the result κ * = 0 were an accident of first order perturbation theory, we prove here that the dissipation of the spin still can never receive any perturbative corrections. In general, the self-energy Π takes contributions only from the following two distinct classes of diagrams, where the blobs represent the renormalized mode-coupling and self-propulsion vertices respectively, in which all the possible diagrammatic corrections (at all orders in k and q) are taken into account. Because the renormalized spin mode-coupling and self-propulsion vertices vanish at zero external momentum k, then these diagrams are zero at k = 0, implying that no dissipation is generated. Therefore, as long as the structure of the equations of motion (30) and (31) is preserved under the RG, no spin dissipation can be generated.
To work out the correction δλ we need a different argument. As we have seen, the ratio w = Γ/λ is finite at the novel fixed point, w * = 3.95 (equation (210)); moreover, the kinetic coefficient of the primary field is also finite at the fixed point, Γ * = O(1) (this is how one works out the dynamic exponent z), implying that also the kinetic coefficient of the spin is finite, λ * = O(1), and thus that χ λ = 0. From (118) we conclude, From equations (237) and (239), we finally obtain the scaling dimension of R near the active conservative fixed point, where z is the dynamic exponent of the conservative inertial fixed point we are interested in. The fact that the scaling dimension of R is negative implies (through (236)) that it decreases along the RG flow, making the length-scale within which critical dynamics is underdamped shorter and shorter; this is another way to see that dissipation eventually takes over in the hydrodynamic limit. However, thanks to (240), we can now quantitatively describe the RG crossover from the conservative active fixed point to the dissipative one, or -more precisely -describe the departure of the RG flow from the conservative active fixed point when we start close to it. Close to criticality, the correlation length-scales as, where ξ 0 is its physical value in the original system under study. The RG flow stops when the system is far from the critical manifold, namely when the correlation length becomes of the same size of the microscopic scale Λ −1 , giving b lstop = ξ 0 Λ [9,37]. Let us consider a system with small bare dissipation, η 0 , and therefore with a large conservation length-scale R 0 (the subscript zero indicates the bare values of the parameter, namely the starting values of the RG flow or the parameters that are those of the equations of motion). The RG flow will rapidly approach the conservative fixed point, remaining in its neighborhood for a large number of RG iterations and eventually flowing towards the dissipative fixed point [11]. Whether the system is ruled by the conservative or dissipative fixed point depends on how large R lstop is when the RG flow leaves the critical region: if when the flow stops R lstop Λ −1 , the critical behaviour is ruled by the conservative fixed point; this is equivalent to the condition, that is, By using equation (240) and by measuring all lengths in units of the microscopic scale Λ −1 (which is equivalent to simply set Λ = 1 in all equations), we obtain the following condition for active underdamped dynamics, where κ = 2/z is the so-called crossover exponent, which determines how slowly the dissipative dynamics becomes relevant. Beyond this regime the overdamped fixed point with dynamic exponent 1.73 takes over. In Fig. 8 we show The figure refers to the d = 3 case. The dark red region is where dissipation is weak enough that conservative dynamics still holds, thus leading to z = 1.35: this is the underdamped regime. In the light green dissipation overcomes and overdamped regime is achived, where the dynamic behaviour is controlled by z = 1.73. The red line represents Lc, the threshold between the two behaviour in the active case. The black dashed line represents instead the threshold between underdamped and overdamped dynamics at equilibrium: since the undedamped region is lower here, we conclude that activity protects underdamped behaviour up to high scales.
the regions corresponding to the two dynamical behaviours. In the case of finite-size systems as natural swarms, the size of the system L is a physical upper bound to the correlation length ξ 0 . Therefore, if the dissipation η 0 is small enough to have L (R 0 ) 2/z , the ruling fixed point is the active inertial one. By recalling the definition of R, equation (235), we get, Because λ 0 is always finite, the condition above can be finally rewritten as, which clearly shows that the regime of the system (underdamped vs overdamped) depends essentially on the balance between system size and spin dissipation. A final remark is in order. From (245) we clearly see that the smaller is z, the larger the system can be before crossing over to the dissipative fixed point. Since the new active inertial critical exponent, z = 1.35, is smaller than its equilibrium inertial counterpart, z = 1.5, this means that in the active off-equilibrium case conservative dynamics rules the collective behaviour of the system up to larger scales compared to the equilibrium case (see Fig. (8)); this means that activity protects the conservative structure against dissipation, rather than thwarting it, which is quite remarkable.

II. FEYNMAN DIAGRAMS
A. SELF ENERGIES 1. SELF ENERGY Σ The self energy Σ αβ corrects the inverse bare propagator of ψ, and is given bŷ

SELF ENERGY Π
The self energy Π αβγν corrects the inverse bare propagator of s, and is given bŷ

SELF ENERGYΣ
The self energyΣ αβ corrects the noise variance of ψ, and is given bŷ with the following non-vanishing diagram contributing to it

SELF ENERGYΠ
The self energyΠ αβγν corrects the noise variance of s, and is given bŷ The vertex function Vψ ψs represents the corrections to the mode-coupling vertex in the equation of ψ and is given by The vertex function Vŝ ψψ represents the corrections to the mode-coupling vertex in the equation of s, and is given by The vertex function Vψ ψψ represents the corrections to the advection vertex in the equation of ψ, and is given by The vertex function Vŝ sψ represents the corrections to the advection vertex in the equation of s, and is given by The vertex function Vψ ψψψ represents the corrections to the ferromagnetic vertex in the equation of ψ, and is given by

III. ASSESSING THE RELEVANCE OF ACTIVITY IN NATURAL SWARMS
In systems of active individuals interacting via local alignment there are two distinct mechanisms allowing propagation of directional information through the group. The first consists in passing the information from neighbour to neighbour via local interaction links: this mechanism would be present even at equilibrium, i.e. for a Heisenberg model on a fixed network. The second is due to the individuals' movement, which carry the information along with their motion causing a rearrangement of the interaction network in time: this second (activity-related) mechanism would be absent in standard ferromagnets and it is an out-of-equilibrium feature. A quantitative way to assess the role of activity in determining the relaxation properties of the system is therefore to compare the typical timescales related to the rearrangement of the interaction network and the relaxation of directional observables. In order to do so, we consider the following two correlation functions.
To characterize the dynamical evolution of the network we define the network overlap function C net (r, t) as, where the matrix n ij (r, t) is equal to 1 if the two individuals i and j are at mutual distance r ij < r at time t and zero otherwise, n i (r, t 0 ) = j n ij (r, t 0 ) represents the number of neighbors in a region of size r around individual i at time t 0 , while N is the total number of individuals and · · · t0 represents a time average over t 0 . The function C net (r, t) measures how much on average a neighborhood of size r of a given individual remains the same in a time interval t. For systems with a metric interaction -like swarms are [21] -if we set r equal to the interaction range r c , then C net (r c , t) measures the reshuffling of the interaction neighborhood (i.e. how many individuals interacting at time t 0 are still interacting at time t 0 + t).
To characterize the directional relaxation we consider the space-time correlation function of velocity fluctuations. In the context of the field theory described in the main text, this function would be given by the connected correlations of the field ψ(x, t). However, if we want to measure such quantity on real data, it is convenient to define it in terms of the individual velocities: where r ij (t 0 , t) = |r i (t 0 ) − r j (t 0 + t)| is the mutual distance.The fluctuations δv i are the dimensionless velocity fluctuations, defined as where δv i = v i − V indicates the individual fluctuation with respect to the collective velocity of the swarm V that takes into account global translation, rotation and dilation modes. Note that in Fig.3b of the main, we plot the normalized correlation functionĈ (r, t) = C(r, t) C(r, t = 0) .
These two correlation functions can be easily computed from the experimental and the numerical data, and compared with each other; their characteristic timescales (or relaxation times) can be computed through the formula introduced by Halperin and Hohenberg in [48] (see equation (269)), which provides a reliable evaluation of τ independently of the functional form of the correlation function. The question then is: over what spatial scale should we compare network vs velocity relaxation? Ideally, one would like to compare these two curves over the scale of the interaction range, to check whether relaxation is affected by activity even locally. Previous experiments [21] indicate that interaction in natural swarms is metric, with an interaction range r c of a few centimetres. However, since that estimate is far from precise, here we compare network and velocity relaxation at several spatial scales, from r = 5cm up to r = 15cm. To have a better intuition of what these scales entail, we note that r = 5cm corresponds to a neighbourhood of approximately 3 individuals, while r = 15cm to one of 41 individuals. In Fig.19 we show the behaviour of τ for the network and the normalized velocity correlations as a function of the spatial scale r: we can see that in the entire considered range the characteristic timescale is the same in the two cases. Given that the correlation functions quantify, respectively, how quickly the network changes on scale r, and how quickly directional correlations decay on the same scale, we conclude that activity is always strong in natural swarms, certainly affecting the relaxation properties of the system. We conclude that natural swarms are highly active systems.
The experimental information that network and velocity relaxation times are the same in natural swarms, was used to fix the level of activity in numerical simulations of the Inertial Spin Model (ISM); in the model, the interaction range r c (and therefore the mean number of interacting neighbours n c ) is a parameter set by us, hence we can calculate the two correlation functions in ISM simulations over the exact scale of the interaction range, and increase the particles speed v 0 until the two timescales are approximately the same, to be sure that numerical simulations -as natural swarms -are in the active regime. In Fig.3b of the main text we display the behaviour of both C net (r, t) andĈ(r, t) in simulation, at r = r c , corresponding to n c = 18 interacting neighbours; for a comparison, in the same figure we report the two correlation functions in natural swarms, at the same value of n c ; however, as we have seen from Fig.19, in natural swarms the two timescales are the same over a very wide range of scales.
To compare our theoretical RG prediction of the dynamic critical exponent z with experiments on swarms in the field we must provide a robust experimental evaluation of z. To understand what is the best way to experimentally infer z, let us go back to the dynamic scaling hypothesis, which links the relaxation time of the mode at wave-vector k, namely τ k , to the correlation length of the system ξ, To fit z from experimental data, we proceed as follows. For each swarm, we compute the relaxation time at k = ξ −1 [5], so that the scaling law (260) becomes, where τ ≡ τ k=ξ −1 . In log-log scale, this becomes a linear relation between log ξ vs log τ . The exponent can be thus obtained as the slope of the regression line fitted on log τ vs log ξ: Hence, to fit z we need a robust estimate of ξ and τ .

A. EVALUATION OF ξ AND τ
In this Section we describe how the correlation function is computed from the trajectories of midges, and then how we can extract the correlation length and the relaxation time from the data. This is the same procedure used in [5]. For each swarming event, the best estimate of ξ, τ and of their experimental errors can be found in Table I.
Let us here remind that we are dealing with experimental data in which time is intrinsically discrete, where the temporal resolution ∆t is fixed by the experimental setup. Data presented in this work are collected with cameras shooting at 170 fps (see Methods), thus ∆t = 170 −1 s. In the following data analysis, time is therefore treated as discrete, namely measured in units of ∆t. Hence, it will typically range from 1 to the length of tha data acquisition t max . The only exception to this convention are plots in Fig.20, where we use time measured in seconds.

CORRELATION FUNCTIONS
As a starting point, we define the dimensionless velocity fluctuations as, where, δv i ≡ v i − V and V is the collective velocity of the swarm which takes into account global translation, rotation and dilation modes, see [25]. The spatio-temporal correlation function is the time generalization of the static space correlation function previously studied in [21,25,49], where r ij (t 0 , t) = |r i (t 0 ) − r j (t 0 + t)| and the positions are calculated with respect to the center of mass of the swarm, that is r i (t 0 ) = R i (t 0 ) − R CM (t 0 ), and N is the total number of individuals. The brackets . . . t0 indicate an average over time, where t max is the total available time in the simulation or in the experiment and ∆t is the time interval between two frames. The purpose of C(r, t) is to measure how much a change of velocity of an individual at time t 0 influences a change of velocity of another individual at distance r at a later time t 0 + t. The (dimensionless) correlation function in Fourier space is given by, C(k, t) = ρ dr e ik·r C(r, t) .
with ρ being the average density. By using the definition of C(r, t) and the approximation N i,j δ[r−r ij (t 0 , t)] ∼ 4πr 2 ρN in the integral, we obtain, which is the correlation function that we compute experimentally in the present work. Notice that, by definition, i δv i = 0; due to this sum rule we obtain C(k = 0, t) = 0. Hence, k = 2π/L is the smallest non-trivial value of the momentum at which we can evaluate the correlation.

CORRELATION LENGTH
To compute the correlation length, ξ, we can directly work in k space. The static correlation function, C 0 (k) ≡ C(k, t = 0), is, where now both i and j are evaluated at equal time, t 0 . For k → ∞, only the self-correlations contribute to C 0 since sin(k r ij )/(k r ij ) → δ ij , and hence C 0 (k) → 1. By decreasing k we are averaging over larger length scales, therefore adding to (267) more correlated pairs, making C 0 (k) increase. When the momentum arrives at k ∼ 1/ξ, we start adding uncorrelated pairs, hence, C 0 (k) must level. If we further decrease k and reach 1/L (where L is the system's size) we start to be affected by the sum rule, C 0 (k = 0) = 0, hence the static correlation C 0 (k) decreases, until eventually it vanishes for k = 0 [50]. In a system where ξ L the static correlation therefore has -in log scale -a broad plateau between k ∼ 1/ξ and k ∼ 1/L. However, natural swarms are scale-free systems, where ξ ∼ L [25]; in this case, C 0 (k) has a well-defined maximum at k max ∼ 1/ξ ∼ 1/L. This is a very practical way to evaluate ξ if one is already working in k space and it is the one we used in [5]. Alternatively, in the scale-free regime, one could define ξ as the point where the static correlation in r space C 0 (r) = C(r, t = 0) reaches zero, C 0 (r = ξ) = 0. These two definitions of ξ are consistent with each other, as shown in [5].
The experimental evaluation of the correlation function is not exempted from experimental errors, which in turn affect our determination of ξ. Our best estimate of ξ is obtained by looking for the value of k, k max , correspondent to the maximum of C 0 (k), which is computed by averaging Eq. (267) over t 0 using the full time interval t max experimentally available, namely by using all the t max frames. To associate an experimental error δk max to this evaluation of k max , we use a resampling method. Among the t max different frames, we randomly sort half of them and compute k max from the sub-sample. We repeat this 1000 times, thus working out the experimental distribution of k max . The standard deviation of this distribution is our estimate of the error δk max . The experimental error associated to log ξ is given by δξ/ξ = δk max /k max , since ξ = k −1 max . The relative error δξ/ξ evaluated in this way is given in Table I. As we shall see in the next section, assigning an experimental error to the relaxation time is far less straightforward.

RELAXATION TIME
To calculate the relaxation time at wavevector k, τ k , we use the formula [48], whereĈ(k, t) = C(k, t)/C(k, t = 0) is the normalized correlation function. Notice that this is nothing else than the time domain version of the classic definition of the characteristic frequency as the half-width of the correlation function in the ω domain [48]. For a purely exponential correlation, τ k coincides with the exponential decay time, while for more complex functional forms, τ k is the most relevant time scale of the system. Relation (268) gives an estimate of τ k that is more robust than simply crossingĈ(k, t) with a constant and more reliable than a fit, as it does not require a priori knowledge of the functional form ofĈ(k, t). Dealing with real data, to estimate τ = τ k=ξ −1 = τ kmax we discretize the integral (268) approximating it with trapezoids and hence numerically solve: where we remind that t max is the time duration of the event of interest while ∆t is the time resolution. How can we be sure that the value of τ so determined is a fair estimate of the actual relaxation time? Whenever we estimate a relaxation time, we must be very careful about the balance between τ and the total length of the time series, t max . Ideally, one would like to have an infinitely long time series, which would give a perfect determination of the correlation function; let us call the relaxation time relative to this ideal correlation function, τ ∞ . In the real cases, t max < ∞, so that in general τ (t max ) = τ ∞ . Although the way in which τ (t max ) converges to τ ∞ for t max → ∞ depends on the specific form of the correlation function, there are some general traits: when t max < τ ∞ the estimated relaxation time τ is strongy biased by the shortness of the interval, while it saturates to τ ∞ when t max τ ∞ ; hence, this saturation can (and must) be used as a check that τ is a fair estimate of τ ∞ . This is what we have done in our data. Given that t max is fixed by the experiment, to do this test we calculated the relaxation time from Eq. (269) with t max replaced by various observation times T ≤ t max , with increment ∆T = 10 frames. Whenever Eq.(269) has no solution, we take the boundary value, τ (T ) = T . For T < t max /2, the relaxation time can be computed in several non-overlapping intervals, hence we use their average as an estimate of τ (T ). In Fig.20 we report τ (T ) for three different swarming events: because we see a rather clear saturation of τ (T ) for larger T , we trust the value of the relaxation time.
Given the procedure to calculate τ , it should be clear that the estimate of its experimental uncertainty, δτ , is way more problematic than for ξ. The main point is that, because we need the whole time series to evaluate τ , we cannot use temporal sampling to estimate δτ , as we instead did for δξ. Moreover, unlike what one would do in a numerical simulation, we do not have different independent samples of the same swarm to make statistics. Admittedly, the procedure to assign an error on τ is rather arbitrary, as it strongly depends on our prior about what is the main source of experimental variability of τ ; unfortunately, while different methods to estimate τ give very similar results, strongly correlated to each other, different procedures to estimate δτ have far greater variability and weaker correlation. We propose in the following two different methods to estimate the experimental error on τ ; the two methods will give results that, although not outlandishly so, are in fact quite different from each other. This will hopefully convince the reader of the difficulties in the determination of the experimental error on the relaxation time, and of the reason why we ultimately decided to use a fitting method for z that does not require the experimental errors as an input. *** Experimental error from the Time Series -δτ TS As we explained above, when the duration of the time series increases, τ eventually reaches a plateau (Fig.20); but of course, there are fluctuations around this saturation value, fluctuations that should become smaller and smaller the larger t max is. We use these fluctuations from the time series around the plateau to estimate the error, δτ TS ; more precisely, we calculate the variance of the fluctuations of τ around the plateau. There is however a significant problem with this definition of the error: to apply the procedure we need to define a time T * at which τ (T ) has reached the plateau, so we can sample the fluctuations of τ (T ) only for T ≥ T * ; but deciding where to locate T * is not straightforward and the reason can be understood from Fig. 20: the relatively large fluctuations on τ and the different shapes of τ (T ) make it not possible to find a safe universal criterion to decide when the plateau is actually reached. But unfortunately, the final value of δτ TS depends rather strongly on T * . This problem is particularly severe becauseeven though we only consider swarms that do reach a plateau -in some cases we do not have a very large interval of time on the plateau. These problems notwithstanding, we have estimated to the best of our judgement δτ TS in all swarms, and the results are reported in Table I.

*** Experimental error from the Correlation Functions -δτ CF
The second method consists in propagating the error in the determination of the correlation function C(t), by using Eq. (269). To do so, let us call the quantity on the l.h.s. of Eq. (269), where C t ≡ C(k max , t). A variation δC and δτ would affect the quantity I as However, from Eq. (269) we know that I (τ ) = π/4, meaning that δI = 0 by definition. The variations δC and δτ cannot be independent from each other, but they must keep δI = 0. We can therefore express δτ as a function of δC at different times. After some algebraic manipulation, we obtain that To avoid an underestimation of the uncertainty on τ , we approximate δτ CF with the upper bound given by Eq. (272). Errors of the correlation function, δC n are evaluated as follows: reminding that C(k, t) is computed as a temporal average -see Eq. (264), (266) -we define the experimental error δC n as the standard error associated to this temporal average. Even though this method to assess δτ may seem quite promising, as it does not require any arbitrary definition (in contrast with the choice of T * in the previous method), it has its own shortcomings. The correlation function C(t) is the more unreliable the closer t is to t max , as the number of temporal pairs of frames used to compute C(k, t) in Eq. (264) is given by t max − t; hence, one would expect the actual uncertainty on the correlation to be larger at larger t. But in fact, this is not necessarily what happens to the standard error: δC t is obtained by sampling the distribution of the correlation at that time, but this sampling is very imprecise if the number of pairs is very limited. Moreover, in the determination of the standard error on C(k, t) one is implicitly assuming that all the temporal pairs over which we are averaging are uncorrelated, which is clearly blatantly false. While this does not impact on the me an, that is on C(k, t) itself, it certainly impacts -and a lot -on the determination of the variance, namely on δC t , and therefore in turns on δτ CF .
In conclusion, even though both definitions of the error on the relaxation time seem reasonable, they both have some critical issues; their values are reported in Table I, where one can see that -despite some correlation -they are in fact quite different from each other. If the value of the experimental errors, δξ and δτ only affected the error on the critical exponent, δz, we would not be too worried; however, with certain fitting methods the experimental errors affect the value of z itself, which is quite another story. Because of this, we ultimately decided to rely on a method (RMA) that does not need the errors as an input, hence giving a value of z that is not affected by our estimate of δξ and δτ . Nevertheless, we describe in the next Sections also a method that does use δξ and δτ to estimate z (EV) and see how it performs with the two different definitions of δτ given above. Fortunately, we will see that all three estimates (RMA, EV with δτ TS and EV with δτ CF ) are all quite compatible with each other.

THE PROBLEM WITH LEAST SQUARES
The Least Squares (LS) regression method is perhaps the most common and used regression method to fit experimental data [51]. This has also been the method with which previous determinations of the dynamic critical exponent z in natural swarms were made [5]. Its wide use is due to the simplicity of its assumptions and to the fact that the equations to solve in order to find the coefficients of the fitting line are linear. LS works under the hypothesis of zero experimental errors on one of the two variables, which then takes the role of independent variable x, and uniform variance σ 2 y for the second variable, which takes the role of dependent variable y. Note that, although with LS the variable y can be affected by experimental uncertainty, the actual errors are not needed as an input, hence results given by LS do not depend on the protocol used to evaluate the errors. The best fit line y = αx + β is obtained by minimizing the average square scatter ∆y 2 i of the points from the line in the direction of y only, where This translates in the minimization of the following quantity which can be done explicitly by solving ∂ α Σ = ∂ β Σ = 0. The system that has to be solved to find the minimum point is given by The linear system in α, β can be easily solved, leading to where When LS is applied to the data of Table I, we obtain where the error is obtained through a resampling method, namely by performing many times the fit over half of the data and taking the square root of the variance of z (see main text). The LS value of z is in line with what was obtained with fewer data in [5], which also used LS. But unfortunately, there is a serious problem. Despite its simplicity, the main limit of LS is the very assumption of the absence of experimental uncertainty on the x variable; in our case this means assuming that the correlation length is perfectly determined, which is definitely not true in the case of our experiments. In fact, errors on both ξ and τ are significant, as it can be seen from Table I. When the hypothesis of exact determination of x is violated, LS is known to underestimate the slope, and to severely do so if the uncertainty in x is significant. A consequence of the LS assumption is that the method is not symmetric under exchange of x and y. In our case, when the dependent and independent variables are swapped in LS, namely when we fit the relation log ξ = z −1 log τ + c , we obtain a completely different result for the dynamic exponent, z LS = 1.62 ± 0.20; in this case LS is assuming that there is no experimental uncertainty in the relaxation time, which is even worse than assuming no uncertainty in ξ, given that errors on log τ are usually even larger than those on log ξ. Summarizing, the fact that experimental errors on log τ and log ξ are present, no matter how difficult they are to determine, indicates that one critical assumption of LS fails in our case. This is the reason why the exponent z was unfortunately significantly underestimated in [5] as well as in [52].

REDUCED MAJOR AXIS
The shortcomings of LS can be easily overcome by treating the two variables x and y in a statistically symmetric way, which is what the Reduced Major Axis (RMA) method does. We will not repeat here the detailed description of RMA, as this is already provided in the main text, Methods Section. We simply recall here that RMA -as LS -does not require the experimental errors in order to work, but -unlike LS -RMA works under the hypothesis that both x and y are affected by experimental uncertainty and therefore gives the same fitted line under exchange of x and y. The fit of z with RMA gives, z RMA = 1.37 ± 0.11 . (281)

EFFECTIVE VARIANCE
The Effective Variance (EV) regression method -unlike LS and RMA -makes explicit use of the errors on the experimental data-points. To derive the EV method, let us start by making some improvement on LS by taking into account the errors experimentally measured for different swarms. To do this, we minimize the average square scatter from the fitting line divided by the experimental error on y, namely where δ 2 i = σ 2 yi is the variance of the measurement of y i . This version of LS does resolve the critical issues raised in the previous section and it only slightly changes the determination of the fitting parameters, which are the same as Eq. (277),(278), but with an average which weights the different points by their inverse variance To fully take into account also the presence of experimental uncertainty also on the variable x we must replace the variance σ 2 yi with an effective variance that takes into account the errors on both variables [53]. Hence, the EV regression method consists in minimizing the quantity defined in Eq. (282) with δ i taken as, No explicit solutions exists for EV method in the general case in which σ xi and σ yi are different from point to point. Minimization must be thus performed numerically with a Monte Carlo algorithm, where as starting value we choose the solution given by LS. We applied the EV method to our data using the experimental errors on log ξ and log τ estimated above. When the error on the relaxation time is estimated from the fluctuations around the plateau, δτ TS , we obtain, z TS = 1.34 ± 0.18 On the other hand, when the error on the relaxation time is estimated from the fluctuations of the correlation function, δτ CF , we obtain, Errors on the determination of z are computed through the usual resampling method, as for RMA and LS. In Fig. 21 we compare the different experimental distributions obtained through all the fitting methods discussed, namely LS, RMA, EV TS and EV CF . The only clear outlier is LS, which is expected because of the shortcomings we have discussed above; note also that LS is the method that gives the largest fraction of exponents smaller than 1, which is unphysical. On the other hand, RMA and EV are quite compatible with each other, although only RMA has virtually zero fraction of exponents smaller than 1 (0.002% of the sub-samples). Moreover, EV quite strongly depends on the determination of the errors, which -as we explained above -has its own criticalities. This is why we ultimately decided to estimate z through a method that does not depend on the determination of the experimental errors (RMA), although we believe that the consistency between RMA and EV is a sanity check of our experimental determination of z.

V. NUMERICAL SIMULATIONS
We perform numerical simulations in three dimensions of the Inertial Spin Model (ISM). Although the ISM was introduced in [6] to study information propagation in birds flocks, recent experimental data on natural swarms of insects [5], and preliminary theoretical investigations [10], strongly suggest that it is also a useful model to describe natural system. The ISM is a model of self-propelled particles moving at constant speed v 0 = |v i (t)| ∀ i, t and interacting with a short-range metric alignment force. The main difference with similar models of polar active particles, for instance the Vicsek model [7], is that the dynamics is inertial in the velocities v i , that is mediated by the generator of internal rotational symmetry, i.e. the spin s i . For completeness we report also here the equations already explained in the main text, where ζ i is a gaussian white noise with variance, We will study the ISM in the near-ordering phase. Among all the parameters entering the equations, two are the most important for our purposes: the first is the microscopic friction η, which represents the spin dissipation and is directly connected to the mesoscopic parameter discussed in section A.3. In [10] we studied the equilibrium (v 0 = 0) version of ISM and we unveiled the role of this parameter, showing that for η = 1.0 an underdamped value of the equilibrium dynamic critical exponent z is found. To be sure we are in the underdamped phase also in the active regime, we decide to use the same value of spin friction in the present simulations. Accordingly, in Fig.3a of the main text we report the velocity correlation functions computed with this value of η that clearly show an inertial decay. The second important parameter is what makes the system active, namely the particle speed v 0 . The study we carried out in [3] shows that in order to observe the active critical exponents, v 0 has to be large enough to yield a substantial reshuffling of the interaction network. We chose v 0 = 2.0 since, as it is shown in Fig.3b of the main text, at this speed the relaxation time of the network becomes comparable to that of the velocity.

A. THERMALIZATION
To perform numerical simulations, eqs (287) are discretized in time, and the constraint of constant speed is imposed using the RATTLE algorithm known from molecular dynamics simulation schemes [50,54]. An order-disorder phase transition is observed when the amplitude of the noise T is varied in the range [1 : 8]. For the set of parameters and sizes chosen, the phase transition is smooth and it does not show a jump usually due to the presence of heterogeneities. This is also due to the fact that the alignment strength is rescaled with J → J/n i where n i = j n ij (t) are the neighbours of particle i at time t. Doing this corresponds to implementing a non-additive interaction rule, which is known to suppress density fluctuations and to prevent aggregation events typical of the critical phase of polar active systems [55]. Without this normalization, the system is very unstable and the heterogeneities affect the behaviour of the order parameter appearing non-stationary, thus quite far from a scaling regime. Since our ultimate purpose is the computation of a critical exponent, we check that every sample used for the analysis has reached a stationary state and it does not show abrupt fluctuations. In panel a of Fig 22 we show the curve of time averaged polarization φ = | i v i |/(N v 0 ) vs T for two of the sizes simulated: the smooth transition is traceable in a well-behaved time series of this quantity (panel b). We also monitor density fluctuations looking at the trend of the average number of interacting neighbours n c , where the average is done on the particles of the system at a given instant of time. This quantity shows a stationary trend and limited fluctuations reflecting the absence of polarized clusters (panel c). Because of these evidences, we conclude that at these sizes and for this set of parameters it is reasonable to apply a second-order paradigm under the assumption of incompressibility in agreement with the analytical calculation. To extrapolate the dynamical critical exponent we use a finite-size scaling analysis simulating systems of sizes N = 256, 384, 512, 729, 1024, 2048 at constant average density ρ = 1. The other parameters of the model are fixed at: χ = 1, J = 18, r c = 1.6, η = 1, v 0 = 2. For each size we simulated 5 statistical independent samples for every value of temperature T explored. We initialize each simulation in a perfectly ordered state in a random spatial direction, and we let it evolve for ∼ 6 × 10 5 simulation steps to reach the termalization. After it, the sample is collected as a simulation of length t max depending on the system's size. To check that the choice of the latter provides a reliable calculation of correlation functions, we compute the auto-correlation of polarization as, and we compute its characteristic time scale τ φ using the definition (268) for different choices of t max . The left panel of Fig 23 shows this quantity averaged on 5 simulations of size N = 256 and T = 5.1. The decorrelation time reaches a plateau around 30 s, where time = N steps dt and dt = 0.0001. The asymptotic τ φ is ∼ 2.1 s, therefore choosing t max ∼ 20τ φ seems already a safe choice to compute correct correlation functions. Moreover, the C φ takes into account the global collective dynamics, thus it is expected to relax on time-scales larger than when involving modes with higher wave-number. To perform an analysis similar to that of the experiments, we are going to compute spatio-temporal correlation functions only at k = 0, therefore this ensures that using t max = 60 s (or greater) always provides reliable calculations of correlation functions, and consequently of the corresponding characteristic time-scales. We chose t max = 60 s for N = 256, up to t max = 80 s at N = 2048.

B. SCALING AND CRITICAL EXPONENT
For every set of simulation at given T and N we compute the static correlation functions in k-space, following definition (267) and using the simplified fluctuations δv i (t) = v i (t) − k v k (t)/N . Because of the sum rule i δv i = 0, this quantity is null at k = 0, it reaches a saturation value depending on v 0 and it shows a maximum at a point k max . The value of correlation at this point C 0 (k max ) represents a measure of the generalized integral susceptibility χ of the system (note that, because the system is out of equilibrium, the integral susceptibility χ that we are using here needs not be equal to the field-susceptibility, χ H , defined as the static response to an external field [8]). Computing χ for all the values of T considered, we obtain a curve which peaks at T c (N ), namely the size-dependent critical point that we identify for every N simulated. At the corresponding temperature we compute the dynamical correlation functions C(k, t) following (266) at k = k max . The latter is the wave-number at which the system is maximally correlated, it can thus be interpreted as the inverse of the correlation length k max = 1/ξ. When a finite-size system is close to criticality it shows a scale-free behaviour where ξ scales linearly with L: in the right panel of Fig 23 is shown that this is exactly what we observe in our simulations. This evidence ensures we are in the correct scaling regime where the dynamic scaling hypothesis can be tested.
For the six sizes simulated we compute the C(k, t) at T c (N ) on the 5 samples, the average curves are shown in the left panel of Fig 24. From them we extrapolate the characteristic time-scales using definition (268); in the main text is reported the linear regression of ln(τ ) vs ln(ξ) that provides a numerical estimate of the dynamical critical exponent, z sim = 1.35 ± 0.04. , which is in perfect agreement with the RG predictions and in good agreement with the experimental data. An additional confirmation of the validity of the dynamical scaling hypothesis with the exponent just found is the collapse of the dynamical correlation functions when time is properly rescaled t → tk z [48]. This theory indeed predicts that the correlation function and the characteristic time scale of the system are homogenous functions of the the product kξ, namelyĈ(k, t) = f (kξ, t/τ c (k)) with τ c (k) = k −z Ω(kξ). Therefore, at fixed kξ = 1, we should observeĈ(k, t) = g(tk z ) with the correct rescaling of time. Using the analytical prediction z = 1.35, the numerical curves almost perfectly collapse one onto each other in a single shape function, asserting the validity of the scaling hypothesis and fully confirming the RG perturbative calculation.