Learning dominant physical processes with data-driven balance models

Throughout the history of science, physics-based modeling has relied on judiciously approximating observed dynamics as a balance between a few dominant processes. However, this traditional approach is mathematically cumbersome and only applies in asymptotic regimes where there is a strict separation of scales in the physics. Here, we automate and generalize this approach to non-asymptotic regimes by introducing the idea of an equation space, in which different local balances appear as distinct subspace clusters. Unsupervised learning can then automatically identify regions where groups of terms may be neglected. We show that our data-driven balance models successfully delineate dominant balance physics in a much richer class of systems. In particular, this approach uncovers key mechanistic models in turbulence, combustion, nonlinear optics, geophysical fluids, and neuroscience.


Introduction
Across the engineering and physical sciences, decades of experimental and theoretical efforts have produced accurate and detailed physics-based models. The success of first principles modeling has resulted in governing equations describing a wide range of physics, including fluids, plasmas, combustion, atmospheric dynamics, electromagnetic waves, and quantum mechanics. However, it is well known that persistent behaviors are often determined by the balance of just a few dominant physical processes. This heuristic, which we refer to in general as dominant balance, has played a pivotal role in our study of systems as diverse as turbulence [1,2], geophysical fluid dynamics [3][4][5], fiber optics [6,7], and the Earth's magnetic field [8]. It is also thought to play a role in the emerging fields of pattern formation [9][10][11][12], wrinkling [13], buckling [14], droplet formation [15,16], electrospinning [17], and biofilm dynamics [18]. These balance relations, or order parameters [9], provide reduced-order mechanistic models to approximate the full complexity of the system with a tractable subset of the physics.
The success of dominant balance models is particularly evident in the field of fluid mechanics. The Navier-Stokes equations describe behavior across a tremendous range of scales, from water droplets to supersonic aircraft and hurricanes. Thus, much of our progress has required simplifying the physics with nondimensional parameters that determine which terms are important for a specific problem. Perhaps the most well known dimensionless quantity, the Reynolds number, describes the balance between inertial and viscous forces in a fluid. Other nondimensional numbers capture the relative importance of inertial and Coriolis forces (Rossby number), inertia and buoyancy (Froude number), and thermal diffusion and convection (Rayleigh number), among dozens of other possible effects. In many situations, the magnitude of these coefficients determines the important mechanisms at work in a flow; further, it determines which mechanisms may be safely neglected. This approach has been especially important in making experimentally observable predictions for the profiles and scaling of wall turbulence [1][19][20][21][22][23][24][25]. Similarly, in geophysical flows, balance arguments bypass the incredible complexity of the ocean and atmosphere to identify driving mechanisms such as geostrophy, the thermal wind, Ekman layers, and western boundary currents [3,4]. Lighthill, one of the most influential fluid dynamicists of the 20th century, often relied on dominant balance arguments as physical motivation for his mathematical analyses [26][27][28]. Beyond fluid mechanics, asymptotic methods have been crucial in characterizing a diverse range of physical behavior.
More recently, modern developments in scientific computing have revolutionized our understanding of complex systems by enabling high-fidelity models that quantify multi-scale spatiotemporal interactions. At the same time, advanced tools from statistics have enabled the analysis of this increasing wealth of data. However, dominant balance models are still typically derived by hand using tedious scaling analysis or asymptotic expansions in limiting regimes. This severe restriction explains why such a powerful technique has not found even wider traction; many systems of practical or basic research interest lie between the extremes where scaling analysis can be unambiguously applied.
There is an exciting opportunity to leverage data-driven methods to identify dominant balance physics in these more challenging applications. Data-driven modeling is already driving changes in how we approach problems from control [29][30][31][32] to turbulence modeling [33] and forecasting [34,35]. Indeed, some studies have addressed the dominant balance problem by using expert knowledge to design application-specific clustering algorithms, for example in a transitional boundary layer [36,37] and stratified turbulence [38], in the latter case confirming the results of prior scaling analyses [39,40]. Although these results are encouraging, to our knowledge the general challenge of identifying local dominant balance regimes from data remains open; our paper aims to address this gap.
In this work, we develop a generalized data-driven method to identify dominant balance regimes in complex physical systems. Beginning from the full evolution equations, we treat each term as a coordinate in an "equation space". Dominant balance relations have a natural geometric interpretation in this space, allowing a combination of unsupervised clustering and sparse approximation to automatically identify regions where groups of terms have negligible contributions to the local dynamics. We explore the proposed method on several systems, including a turbulent boundary layer (shown in Fig. 1), electromagnetic pulse propagation in an optical fiber, geostrophic balance in the Gulf of Mexico, and a biophysical model of a bursting neuron. In each case, we recover the expected balance relations from classical scaling analysis. The apparent ubiquity of the dominant balance phenomenon confirms a long-standing heuristic in physical sciences, while the ability to identify spatiotemporally local balance models via a data-driven approach opens new opportunities in a broad range of applications.
If successful, nonasymptotic data-driven methods could be used to better understand the behavior of more exotic dynamics such as non-Newtonian turbulence [41], hydrodynamic quantum analogues [42], and extreme event triggering [43], or to study important transitional behavior in cases where the asymptotics are already well known [44][45][46][47][48]. In the latter case, a clear understanding of the active mechanisms has proven crucial to successful control strategies [49,50]. We may even be able to identify local dominant balance behavior in spatiotemporal systems without clear governing equations, such as neuroscience [51], epidemiology [52], ecology [53], active fluids [54][55][56], and schooling [57]. Automatic segmentation may also inform efficient numerical methods, in the vein of shock-capturing schemes [58], adaptive mesh refinement [59], or hybrid turbulence modeling [60]. It is our hope that this approach will shed light on more exotic physical processes that have remained elusive to traditional analysis.
Figure 1: Schematic of the dominant balance identification procedure applied to a turbulent boundary layer. High-resolution direct numerical simulation results (a, visualized with a turbulent kinetic energy isosurface) are averaged to compute the Reynolds-averaged Navier-Stokes equations (b). The equation space representation of the field enables clustering and sparse approximation methods to extract the distinct geometrical structures in the six-dimensional space corresponding to dominant balance physics (c). Finally, the entire domain can be segmented according to these interpretable balance models, identifying distinct physical regimes (d). The equations and classical scaling analysis are discussed in Sec. 3.2.

Unsupervised dominant balance identification
In many fields of physics, painstaking analyses have produced models that are capable of describing a wide range of physical phenomena. However, it is well understood that the full complexity of such models is not always necessary to describe the local behavior of a system. We find that in many regimes the dynamics are governed by just a subset of the terms involved in the global description. For example, a general evolution equation for the field $u(x, t)$ on the domain $(x, t) \in \mathcal{D}$ can be written as
$$\mathcal{N}(u) = \sum_{i=1}^{K} f_i(u, u_x, u_{xx}, \ldots, u_t, \ldots) = 0. \tag{1}$$
Classically, this equation would be derived from fundamental physics (e.g. Maxwell's equations or the Navier-Stokes equations), but it could result from a model discovery procedure [61][62][63]. Consider an "equation space" where each coordinate is defined by one of the $K$ terms in Eq. (1). At each point $(x, t)$ in space and time, each of the $K$ terms $f_i$ in the governing equations (1) may be evaluated at $u(x, t)$, resulting in a vector $\mathbf{f} \in \mathbb{R}^K$:
$$\mathbf{f}(x, t) = \begin{bmatrix} f_1(u(x,t), \ldots) & f_2(u(x,t), \ldots) & \cdots & f_K(u(x,t), \ldots) \end{bmatrix}^T.$$
By construction, $\mathbf{1}^T \mathbf{f}(x, t) = \mathcal{N}(u) = 0$ for all $(x, t) \in \mathcal{D}$. Simulated or measured field data is typically discretized, so the domain is approximated by $N$ spacetime points: $\mathcal{D} \approx \{(x, t)_j \mid j = 1, 2, \ldots, N\}$. The field at each of these points corresponds to a point in equation space.
We define a dominant balance regime as a region R ⊂ D where the evolution equation is approximately satisfied by a subset of p < K of the original terms in the equation; the remaining terms may be neglected. In this case f(x, t) will have near-zero entries corresponding to negligible terms when (x, t) ∈ R. Geometrically, the field is approximately restricted to p of the original K dimensions of the equation space, resulting in a subspace that is aligned with the active p terms.
This geometric perspective on dominant balance physics leads naturally to segmentation via unsupervised clustering. For example, the Gaussian mixture model (GMM) framework learns a probabilistic model by assuming the data are generated from a mixture of Gaussian distributions with different means and covariances [64]. The learned covariances for each cluster can then be interpreted in terms of active and inactive terms in the evolution equation. The N spacetime points in D are used to train a mixture model; the algorithm treats points from a dominant balance regime as if they were generated from a distribution with near-zero variance in the directions corresponding to negligible terms. Data beyond the original inputs can efficiently be assigned to a balance model using the trained GMM.
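For reference, the mixture density assumed by a GMM can be written in its standard form as below. The symbols $M$ (number of mixture components), $\pi_k$ (mixture weights), $\boldsymbol{\mu}_k$, and $\boldsymbol{\Sigma}_k$ are notation introduced here for illustration and do not appear in the text above.

```latex
p(\mathbf{f}) = \sum_{k=1}^{M} \pi_k \,
  \mathcal{N}\!\left(\mathbf{f} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \sum_{k=1}^{M} \pi_k = 1, \quad \pi_k \ge 0 .
```

In this notation, a term $f_i$ is a candidate for neglect within cluster $k$ when the variance $(\boldsymbol{\Sigma}_k)_{ii}$ is small compared with the variances of the other terms in that cluster.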
In practice, there is no reason to expect the points will even approximate a mixture of Gaussian distributions. We therefore expect that the number of clusters required to capture all of the relevant physics will exceed the number of distinct balance regimes, resulting in redundant clusters. Furthermore, there is some ambiguity in the interpretation of "near-zero variance". We address both of these issues using sparse principal components analysis (SPCA) [65], which uses $\ell_1$ regularization to extract a sparse approximation to the leading principal component. If a cluster describes a dominant balance regime, it should be well-described by its direction of maximum variance. Moreover, this leading principal component should have many near-zero entries. We apply SPCA to the set of points in each GMM cluster and take the active terms in the cluster to be those which correspond to nonzero entries in the sparse approximation to the leading principal component. The number of models can then be reduced by grouping clusters with the same set of active terms (or equivalently, the same sparsity pattern in the SPCA approximation). Dominant balance identification can be seen as a localized active subspace analysis in equation space [66]. Rather than assuming that there is a global decomposition into approximately active and inactive subspaces, we simultaneously search for subspaces corresponding to different balance relations and the regions of the domain where the dynamics are well-described by each subspace.
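The final grouping step, merging clusters that share a sparsity pattern, can be sketched as follows. This is a minimal illustration only: the component values are made up rather than produced by an actual SPCA fit, and the names `active_terms`, `group_clusters`, and `term_names` are hypothetical.

```python
from collections import defaultdict


def active_terms(component, term_names, tol=1e-8):
    """Names of the terms with nonzero weight in a sparse leading component."""
    return tuple(name for name, w in zip(term_names, component) if abs(w) > tol)


def group_clusters(sparse_components, term_names, tol=1e-8):
    """Merge clusters whose sparse leading components share a sparsity pattern.

    Returns a dict mapping each tuple of active terms to the cluster indices
    that exhibit that pattern.
    """
    models = defaultdict(list)
    for k, component in enumerate(sparse_components):
        models[active_terms(component, term_names, tol)].append(k)
    return dict(models)


# Made-up sparse components for five clusters of a three-term equation
# (illustrative values only, not the output of a real SPCA computation).
term_names = ("u_t", "u*u_x", "nu*u_xx")
sparse_components = [
    [0.71, -0.70, 0.0],   # acceleration and advection active
    [0.0, 0.0, 0.0],      # no active terms
    [0.69, -0.72, 0.0],   # same pattern as cluster 0 (redundant cluster)
    [0.10, 0.65, -0.75],  # all three terms active
    [0.0, 0.0, 0.0],      # same pattern as cluster 1
]

balance_models = group_clusters(sparse_components, term_names)
for terms, clusters in balance_models.items():
    print(terms, "<- clusters", clusters)
```

Here five clusters collapse to three balance models, which is the reduction described above; only the sparsity pattern matters, not the numerical values of the nonzero weights.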
For example, one of the simplest models that demonstrates dominant balance is the viscous Burgers' equation, shown in Fig. 2. Shocks form from the nonlinear advection and are dissipated by the viscous term. Away from the shock front, however, the gradients of the field are relatively weak, so viscosity does not contribute significantly to the dynamics. Figure 2 demonstrates the balance identification procedure applied to a snapshot of the viscous Burgers' equation example. Most of the field is classified into two clusters, corresponding to either no dynamics or an inviscid balance between acceleration and advection. Only a narrow slice along the shock front belongs to a cluster in which viscosity is active.
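To make the equation-space construction concrete, the sketch below evaluates the three terms of Burgers' equation, written as u_t + u u_x - ν u_xx = 0, on an exact travelling-shock solution. This is an assumption-laden toy and not the data behind Fig. 2: the tanh profile, the parameter values, and the function name `burgers_equation_space` are chosen here for illustration.

```python
import math


def burgers_equation_space(xs, t, nu=0.05, a=1.0, c=1.0):
    """Evaluate (u_t, u*u_x, -nu*u_xx) on a travelling viscous shock.

    Uses the exact solution u = c - a*tanh(a*(x - c*t)/(2*nu)), so the
    derivatives are analytic and every row sums to zero up to round-off.
    """
    k = a / (2.0 * nu)
    rows = []
    for x in xs:
        xi = x - c * t
        T = math.tanh(k * xi)
        sech2 = 1.0 - T * T
        u = c - a * T
        u_x = -a * k * sech2
        u_xx = 2.0 * a * k * k * sech2 * T
        u_t = -c * u_x  # travelling wave: u_t = -c * u_x
        rows.append((u_t, u * u_x, -nu * u_xx))
    return rows


xs = [-2.0 + 0.01 * i for i in range(401)]  # grid on [-2, 2]
F = burgers_equation_space(xs, t=0.0)

residual = max(abs(sum(row)) for row in F)  # rows should sum to ~0
far = F[0]     # x = -2.0, far from the shock
near = F[205]  # x ~ 0.05, inside the shock layer

print("max |sum of terms|:", residual)
print("far from shock:    ", far)
print("inside shock layer:", near)
```

Each row of `F` is one point in the three-dimensional equation space. Far from the shock all three entries are negligible, while inside the shock layer the viscous entry is comparable to the other two, which is the geometric signature the clustering step is meant to pick up.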
In simple cases, this two-step GMM-SPCA procedure might be replaced with a hard threshold; if a term exceeds some value it is "on". However, the proposed method offers two main advantages over thresholding. First, the idea of dominant balance has a natural geometric interpretation in equation space, thereby avoiding setting an arbitrary threshold for which diagnostics and interpretation may not be straightforward. Second, our method considers the local, relative importance of terms, whereas thresholding describes global, absolute importance. For example, this distinction is significant in multiscale systems with some background process underlying intermittent bursts of activity. The intermittency is dominated by a balance between terms which may be much larger than the background process, although the dynamics during quiescent periods would be determined primarily by the background process. In this case an absolute thresholding method would either choose the background process to be always on or always off, whereas a relative approach recognizes that the dominant local balance simply changes during the intermittent activity. This is illustrated in Sec. 3.5, where we investigate a Hodgkin-Huxley-type model of a spiking neuron, generalized to introduce multiscale bursting behavior.
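The difference between global, absolute importance and local, relative importance can be seen in a toy two-scale example. The sketch below is not the GMM-SPCA procedure: the relative rule is a crude stand-in that compares each term with the largest local term, and the numbers and function names are invented for illustration.

```python
def active_absolute(terms, threshold):
    """Indices of terms whose magnitude exceeds a fixed global threshold."""
    return [i for i, v in enumerate(terms) if abs(v) > threshold]


def active_relative(terms, fraction=0.1):
    """Indices of terms exceeding a fraction of the largest local term."""
    scale = max(abs(v) for v in terms)
    if scale == 0.0:
        return []
    return [i for i, v in enumerate(terms) if abs(v) > fraction * scale]


# Terms ordered as (background_1, background_2, burst_1, burst_2);
# each tuple sums to zero, as an equation-space point must.
quiescent = (0.020, -0.019, 0.0005, -0.0015)
bursting = (0.020, -0.020, 50.0, -50.0)

# A high absolute threshold never sees the background process ...
print(active_absolute(quiescent, 1.0), active_absolute(bursting, 1.0))
# ... while a low one keeps it switched on even during the burst.
print(active_absolute(quiescent, 0.01), active_absolute(bursting, 0.01))
# A local, relative criterion switches between the two balances.
print(active_relative(quiescent), active_relative(bursting))
```

With either absolute threshold the background terms are always off or always on, whereas the relative rule reports the background balance during quiescence and the burst balance during activity.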

Results
We now apply the dominant balance identification method to a range of physics with varying complexity: unsteady vortex shedding past a cylinder at Reynolds number 100; the mean field of a turbulent boundary layer; optical pulse propagation in supercontinuum generation; geostrophy in the Gulf of Mexico; and a Hodgkin-Huxley-type model of a biological neuron. Figure 3 shows a summary of the results, including slices of the equation space representations, identified balance models, and segmented fields. In each case, the results are consistent with classical scaling analyses and known physical behavior. Descriptions of the models and code used to generate this data are presented in Appendix A and are available online.

Flow past a circular cylinder at Re = 100
Governing equations and analytic scaling. Flow past a cylinder at moderate Reynolds number is a prototypical flow configuration for bluff body wakes. The wake transitions from steady laminar flow to periodic vortex shedding via a Hopf bifurcation at Re ≈ 47. The transition from linear instability to a stable limit cycle is itself a fascinating example of dominant balance in fluid mechanics and dynamical systems. The quadratic nonlinearity, initially inactive in the linear regime, mediates energy transfer between the mean flow and instability modes, deforming both until an energy balance is reached in the periodic limit cycle. This nonlinear stability mechanism was first described by Stuart and Landau [67,68] and later employed for reduced-order modeling [69].
Even in the stable limit cycle, however, the local dynamics of the flow vary widely throughout the domain, highlighting mechanisms that give rise to von Kármán-type vortex streets in a wide variety of flows. This unsteady, incompressible, viscous flow is governed by the two-dimensional Navier-Stokes equations:
$$\mathbf{u}_t + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u}, \qquad \nabla \cdot \mathbf{u} = 0,$$
where $\mathbf{u}$ is the velocity field, $p$ is the pressure, $\rho$ is the density, and $\nu$ is the kinematic viscosity. Of course, these equations themselves involve some degree of approximation, ignoring effects such as compressibility and gravity, making use of the Newtonian form of the stress tensor, and assuming Fickian diffusion, though they have proven highly accurate when applied in the correct regime. Nevertheless, there are distinct regimes in this simple wake flow.
For the wake behind a circular cylinder, the most relevant scales are the cylinder diameter $L$ and free-stream velocity $U$. Dimensional analysis then suggests that
$$\mathbf{u}_t \sim \frac{U^2}{L}, \qquad (\mathbf{u} \cdot \nabla)\mathbf{u} \sim \frac{U^2}{L}, \qquad \frac{1}{\rho}\nabla p \sim \frac{U^2}{L}, \qquad \nu \nabla^2 \mathbf{u} \sim \frac{\nu U}{L^2}.$$
Nondimensionalizing with respect to these scales, we find that the viscous term is smaller than the others by a factor of the Reynolds number, $\mathrm{Re} = UL/\nu$, resulting in the familiar nondimensional form of the Navier-Stokes equations:
$$\mathbf{u}_t + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\nabla p + \frac{1}{\mathrm{Re}} \nabla^2 \mathbf{u}. \tag{4}$$
The variables and operators have been nondimensionalized according to the previous scales. For even moderately large Reynolds numbers, we would expect the flow to behave in an approximately inviscid manner away from the cylinder. Thus, structures formed in the near-wake region will be advected downstream by the mean flow with only weak dissipation, as observed in the vortex street. Near the cylinder, the no-slip boundary conditions due to viscosity change the behavior qualitatively. If we examine the flow at a point a distance $\delta \ll L$ from the wall, then $\delta$ is a more appropriate length scale for the gradients. However, since the near-wall flow varies on a similar timescale to the wake, suppose that $U/L$ is still a good scale for the time derivative. The various terms then scale as
$$\mathbf{u}_t \sim \frac{U^2}{L}, \qquad (\mathbf{u} \cdot \nabla)\mathbf{u} \sim \frac{U^2}{\delta}, \qquad \nu \nabla^2 \mathbf{u} \sim \frac{\nu U}{\delta^2}.$$
We find that the acceleration term is now smaller than the advection term by a factor of $\delta/L$, and expect the viscous term to be balanced by advection and the pressure gradient. The relatively strong gradients near the wall give rise to the vortex structures which characterize the wake.
Figure 4: Vorticity snapshot for the wake behind a cylinder at Re = 100 (a). A Gaussian mixture model (GMM) assigns field points to clusters by looking for groups with distinct mean and covariance (b). For instance, some clusters vary mainly in the acceleration-advection directions, while others vary principally in the viscous-advection directions. We would expect these to represent the far-field and boundary regions, respectively. This is confirmed by the sparse principal components analysis (SPCA) reduction, where clusters with significant nonzero variance in the same directions are grouped together (c). These directions can be interpreted as active terms in the balance relation (d). As anticipated, the region near the cylinder is dominated by a balance between viscosity, advection, and pressure forces, while the far wake is approximately inviscid (e).
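The order-of-magnitude estimates in this scaling argument are easy to evaluate numerically. The sketch below is only a bookkeeping aid under the stated assumptions (gradients scale with 1/L far from the wall and 1/δ near it, while the time derivative keeps the rate U/L). The function name `term_scales` and the choice δ = ν/U, which makes the near-wall advective and viscous estimates equal, are assumptions of this example.

```python
def term_scales(U, L, nu, delta=None):
    """Order-of-magnitude estimates for terms in the momentum equation.

    Far from the wall (delta is None) spatial gradients scale with 1/L;
    near the wall they scale with 1/delta. The time derivative keeps the
    rate U/L in both cases.
    """
    grad_length = L if delta is None else delta
    return {
        "acceleration": U * U / L,
        "advection": U * U / grad_length,
        "viscous": nu * U / (grad_length * grad_length),
    }


U, L, Re = 1.0, 1.0, 100.0
nu = U * L / Re

far = term_scales(U, L, nu)
# Assumed near-wall length: the delta at which the two estimates match.
delta = nu / U
near = term_scales(U, L, nu, delta=delta)

print("far field:", far)
print("near wall:", near)
print("viscous/advection (far): ", far["viscous"] / far["advection"])
print("acceleration/advection (near):", near["acceleration"] / near["advection"])
```

In the far field the viscous estimate is smaller than advection by 1/Re, and near the wall the acceleration estimate is smaller than advection by δ/L, matching the two statements in the text.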
Identified dominant balance. Figure 4 shows an example vorticity field along with views of the 4D equation space corresponding to Eq. (4). Although the method treats space and time equivalently, here we freeze time and explore a single snapshot; since the flow is periodic we expect the results to be representative. The visualization in equation space clearly reveals signatures of balance relations. One set of GMM clusters is nearly restricted to the zero-viscosity plane, while another has reduced variance in the acceleration direction. The sparse approximations to the leading principal components of each cluster confirm this intuition; we use SPCA to construct balance models by grouping the Gaussian models with non-negligible variance in the same directions. As expected, the far wake is approximately inviscid, while the region near the cylinder is dominated by a balance between viscosity, pressure, and advection. This method also identifies other approximate regions, such as a low-pressure-gradient balance between acceleration and advection (blue), slowly varying potential flow (green), and a far-field region with near-zero dynamics (white).

Turbulent boundary layer
One of the major breakthroughs in the study of fluid mechanics in the 20th century was the development of boundary layer theory [1,70]. In many practical applications fluids can be treated as inviscid, but close to solid boundaries strong velocity gradients lead to significant viscous forces. Prandtl showed in 1904 that careful scaling analysis applied to the governing Navier-Stokes equations reveals distinct regimes where the behavior of the fluid is essentially determined by a small subset of the full equations. In turn, these balance relations can be used to derive powerful scaling laws such as the so-called "law of the wall".
Although such analyses can be intractable for general turbulent flows, one of the most important canonical configurations is zero pressure gradient flow over a flat plate parallel to the free stream velocity. The zero pressure gradient ensures that the free-stream velocity is constant in the streamwise direction at large distances from the wall. This flow is statistically two-dimensional; the configuration does not vary in the cross-stream direction so the mean flow only varies in the streamwise and wall-normal directions.
Governing equations and analytic scaling. After performing the Reynolds decomposition of the variables into mean and fluctuating components, e.g. $u = \bar{u} + u'$, the mean flow is determined by the Reynolds-averaged Navier-Stokes (RANS) equations. For the streamwise mean velocity $\bar{u}$, the equation is
$$\bar{u}\bar{u}_x + \bar{v}\bar{u}_y = -\frac{1}{\rho}\bar{p}_x + \nu \nabla^2 \bar{u} - (\overline{u'v'})_y - (\overline{u'^2})_x.$$
The terms on the left represent mean flow advection, while those on the right are the pressure gradient, viscosity, wall-normal Reynolds stress, and streamwise Reynolds stress, respectively. One of the challenges in studying this flow is that there are multiple length scales. Following [74], we may consider a streamwise length scale $L$, a wall-normal length scale $\ell$, and a viscous length scale $\eta = \nu/u_\tau$, where $u_\tau$ is the "friction velocity" associated with the shear stress at the wall.
Beginning with the "outer" region of the boundary layer (where $y \gg \eta$), suppose the mean streamwise velocity $\bar{u}$ scales with the free stream $U_\infty$, while the turbulent fluctuations $u'$, $v'$ scale with $u_\tau$. As with the previous example, assume that the derivatives scale with the corresponding length scale, so that for instance $(\cdot)_y \sim 1/\ell$. The continuity equation $\bar{u}_x + \bar{v}_y = 0$ then implies that $\bar{v} \sim U_\infty (\ell/L)$. By this reasoning we would typically expect the mean velocity gradient $\bar{u}_y$ to scale with $U_\infty/\ell$, but as argued in [74], the gradients in the outer part of the layer are much weaker than near the wall, and empirically a better estimate is $\bar{u}_y \sim u_\tau/\ell$. Then for the streamwise momentum equation we find
$$\bar{u}\bar{u}_x \sim \frac{U_\infty^2}{L}, \quad \bar{v}\bar{u}_y \sim \frac{U_\infty u_\tau}{L}, \quad \nu \bar{u}_{xx} \sim \frac{\nu U_\infty}{L^2}, \quad \nu \bar{u}_{yy} \sim \frac{\nu u_\tau}{\ell^2}, \quad (\overline{u'v'})_y \sim \frac{u_\tau^2}{\ell}, \quad (\overline{u'^2})_x \sim \frac{u_\tau^2}{L},$$
and the pressure gradient is negligible by construction. Since $\ell \ll L$ we neglect the streamwise Reynolds stress compared to the wall-normal term. On the other hand, since $U_\infty \gg u_\tau$, we can assume the mean flow advection is dominated by the streamwise component $\bar{u}\bar{u}_x$. Finally, the viscous terms are smaller than the advection by a factor on the order of the Reynolds number $\mathrm{Re}_L = U_\infty L/\nu \gg 1$. The outer part of the boundary layer is then determined by an inertial balance between streamwise mean flow advection and wall-normal Reynolds stress:
$$\bar{u}\bar{u}_x = -(\overline{u'v'})_y. \tag{6}$$
However, this relation cannot describe the near-wall regime, where viscosity is known to be important. In this region we expect the wall-normal derivatives to scale with $(\cdot)_y \sim 1/\eta = u_\tau/\nu$.
Figure 5: The method recovers expected balance relations for the free-stream (green), the inertial sublayer (blue), and the viscous sublayer (red), along with a laminar region near the inlet (purple) and a transitional region (orange). The inertial sublayer follows the theoretically predicted power law (c). Boundary layer theory predicts that the length scale of the sublayer scales as $\ell \sim x^{4/5}$. As a rough criterion for the scale of the inertial balance model, we use the wall-normal coordinate at which the balance relation changes (solid line, top), once the transitional region (purple) ends. A curve fit shows an approximate scaling of $\ell \sim x^{0.81}$.
As a consequence of the no-slip boundary conditions, in this region the free-stream velocity is not an appropriate scale for the streamwise component and we should instead use the friction velocity $u_\tau$, so that
$$\bar{u}\bar{u}_x \sim \frac{u_\tau^2}{L}, \qquad (\overline{u'v'})_y \sim \frac{u_\tau^2}{\eta}, \qquad \nu \bar{u}_{yy} \sim \frac{\nu u_\tau}{\eta^2} = \frac{u_\tau^2}{\eta}.$$
In this case the wall-normal Reynolds stress is larger than the mean flow advection by a factor of $L/\eta \gg 1$ and must instead be balanced by the viscosity. Therefore, in a thin viscous sublayer near the wall the dominant balance is
$$\nu \bar{u}_{yy} = (\overline{u'v'})_y. \tag{7}$$
The overall picture is then that the Reynolds stress must be balanced by mean flow advection in the inertial sublayer and by viscosity in the near-wall region. Outside of the turbulent boundary layer the Reynolds stresses and mean wall-normal velocity are negligible, so small variations, for instance due to incompletely converged statistics, should be described by the balance $\bar{u}\bar{u}_x = -\rho^{-1}\bar{p}_x$. In a true zero pressure gradient flow both of these would be zero in the free stream.

Identified dominant balance.
We investigate the dominant balance physics of transitional boundary layer data from a direct numerical simulation [36,37,73], openly available from the Johns Hopkins Turbulence Database [71,72]. Figure 5 shows the equation space clusters and associated dominant balance models for the mean fields. As with the cylinder example, some sets of points have significantly reduced variance in certain directions of equation space, a strong signature of the dominant balance phenomenon.
The method identifies regions corresponding to the viscous sublayer (7), inertial sublayer (6), and slightly perturbed free stream. It also identifies a region near the inlet characterized by a lack of Reynolds stresses, suggesting the mean profile here should be consistent with the laminar solution. The boundaries between balance regimes need not be sharp, however, especially in a transitional flow. In this case a cluster containing all of the active terms in the zero-pressure-gradient flat plate turbulent boundary layer equation is identified between the laminar inflow region and fully developed turbulence downstream.
Equations (6) and (7) are a starting point for many of the results of boundary layer theory; from these a range of useful laws can be derived, such as the logarithmic mean velocity profile in the inertial sublayer. Although we ultimately hope that data-driven balance identification will open new avenues of analysis, we can also use established results to examine the validity of the proposed method.
For example, the dominant length scale $\ell$ in the inertial sublayer is expected to depend on the streamwise coordinate $x$ via a power law $\ell \sim x^{4/5}$ [1]. It is not usually obvious how to extract a specific value of $\ell$ for which this scaling can be checked. However, as a rough proxy we may consider the wall-normal coordinate at which the dominant balance changes from that of the inertial sublayer to the free-stream. Figure 5 shows the growth of the inertial sublayer thickness according to this definition along with a power law fit with exponent 0.81, showing close agreement with the expected value of 4/5. Although this evidence is somewhat circumstantial, it is at least suggestive that the balance model identification procedure reflects the underlying physics.
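A power-law exponent of this kind is commonly estimated by linear least squares in log-log coordinates. The sketch below shows that calculation on synthetic data; it is not the fit used for Fig. 5, and the prefactor, perturbation, and the name `fit_power_law` are assumptions made for illustration.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of y = C * x**p in log-log coordinates.

    Returns the pair (C, p). All inputs must be strictly positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    p = sxy / sxx
    C = math.exp(mean_y - p * mean_x)
    return C, p


# Synthetic thickness data following x**(4/5) with a small deterministic
# perturbation standing in for measurement scatter (arbitrary prefactor).
x = [1.0 + 0.5 * i for i in range(20)]
ell = [0.37 * xi ** 0.8 * (1.0 + 0.01 * math.sin(3.0 * xi)) for xi in x]

C, p = fit_power_law(x, ell)
print(f"fitted prefactor: {C:.3f}, fitted exponent: {p:.3f}")
```

Fitting in log space weights relative rather than absolute errors, which is usually the appropriate choice when the quantity spans a wide range of scales.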

Optical pulse propagation
Another important example of dominant balance arises in nonlinear optics, where the interplay of an intensity dependent index of refraction with chromatic dispersion can generate localized optical solitons [75]. The derivation of the governing evolution equations of the electric field envelope from Maxwell's equations shows that for ultra-short pulses of light (e.g. a few femtoseconds), the time response of the polarization field can yield [76] a rich set of nonlinear dynamics. Figure 6 shows an example of a process known as supercontinuum generation, in which nonlinear processes act on a localized pulse of light to generate a severe broadening of the optical spectrum. This is typically accomplished in microstructured optical fibers [77]. Thus an initial 20-30 nanometer bandwidth can be stretched to hundreds of nanometers. The governing equation in this case is derived from Maxwell's wave equation in one dimension through the rotating wave approximation and the slowly varying envelope approximation [76]. The original PDE is linear and second order in a vacuum, but in order to handle complicated polarization responses in fibers the field is expanded about the frequency of the original pulse [6,7]. This "center frequency" expansion leads to a Taylor series expansion of the linear polarization response, and the Raman convolution integral describing a time-delayed nonlinear response.
The resulting PDE, known as a generalized nonlinear Schrödinger equation (GNLSE), describes the evolution of the slowly varying complex envelope u(x, t) of the pulse. When nondimensionalized with soliton scalings [7], the envelope equation takes the form of Eq. (8a), in which the various constants (α_k, a, b, c, d) describe the polarization response and are determined empirically.
Although the spectral domain is often of practical interest for studies of supercontinuum generation, in the time domain the pulse exhibits soliton behavior, as shown in Fig. 6. To leading order, the soliton propagation is typically understood to be maintained by a balance between the second order dispersion and the instantaneous part of the nonlinear response, or intensity-dependent index of refraction. That is, evaluating the delta function component of the Raman kernel leads to the cubic Kerr nonlinearity. If only this cubic nonlinearity and second order dispersion are retained, equation (8a) is reduced to the usual nonlinear Schrödinger equation (NLS). Figure 6 shows the balance models obtained through the unsupervised balance identification procedure applied to regions of the field where the intensity is within 40 dB of the peak. Most of the domain is associated with various linear dispersion relations, corresponding to different propagation speeds. Only a narrow region containing the strongest soliton is identified with the instantaneous nonlinear response, suggesting that a linear description is sufficient for much of the domain. The standard NLS equation is never identified, although the balance relation with cubic nonlinearity and fourth order dispersion (green) is consistent with standard truncation of the linear response at third or fourth order [6]. Interestingly, the full Raman time-delay response is never selected as an important term, although this is understood to be a critical mechanism for the initial scattering. Presumably the Gaussian mixture model approach is not sensitive enough to detect this, possibly due to the clearly invalid underlying assumption of normally distributed data.

Geostrophic balance in the Gulf of Mexico
Geophysical fluid dynamics is a particularly complex field; a full description of ocean dynamics for instance requires not only the Navier-Stokes equations on a rotating Earth with complicated bathymetry, but must also account for the effects of varying salinity, temperature, and pressure via a nonlinear equation of state. The ocean dynamics also couple to atmospheric and geological processes and solar forcing [3]. Scaling analyses have been remarkably successful; despite the complexity of the dynamics, in many cases it can be argued that greatly simplified versions of the governing equations are sufficient to describe the dominant motions.
Perhaps the most important model of this type is geostrophic balance. To a first approximation, the surface currents can be modeled with the 2D incompressible Navier-Stokes equations on a rotating sphere:
$$u_t + u u_x + v u_y - f v = -\frac{1}{\rho} p_x, \qquad v_t + u v_x + v v_y + f u = -\frac{1}{\rho} p_y,$$
where $\rho$ is the density (in general a function of temperature, pressure, and salinity), and $x$ and $y$ are defined in the zonal and meridional directions, respectively. The Coriolis parameter $f$ is given in terms of the Earth's angular velocity $\Omega$ and the latitude $\varphi$ by $f = 2\Omega \sin\varphi$. Note that this equation already includes some approximations. Compressibility, vertical motions, and both molecular and turbulent viscosities are all ignored in this model. Nevertheless, these equations are a standard starting point for many analyses of large scale ocean dynamics. For flows with length scale $L$ and velocity scale $U$, the relative importance of the inertial terms compared to the Coriolis terms is given by the Rossby number, $\mathrm{Ro} = U/fL$. In low Rossby number flows (relatively slow, large scale motions), the inertial terms become negligible and the dominant balance is between the Coriolis forces and pressure gradient forces:
$$-f v = -\frac{1}{\rho} p_x, \qquad f u = -\frac{1}{\rho} p_y.$$
This balance is thought to describe most approximately steady large scale currents [3]. We apply the unsupervised balance identification procedure to the high-resolution 1/25° HYCOM reanalysis data for the Gulf of Mexico [78]. Figure 7 shows the regions corresponding to balance models for this data. The method identifies three regimes: geostrophic balance (orange), a balance between acceleration and Coriolis forces (blue), and the linearized rotating Navier-Stokes equations (white). The nonlinear advective term is not included in any of the models in this case, supporting the common use of linearized equations to study wavelike motions. Geostrophic balance is primarily identified in regions corresponding to slow, large scale motions: the southern end of the Gulf Stream and the relatively stable current between Cuba and the Yucatán Peninsula.
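As a quick numerical illustration of the Rossby number criterion, the sketch below evaluates Ro = U/(fL) for two sets of scales, using the standard Coriolis parameter f = 2Ω sin φ. The velocity and length scales are round numbers chosen for illustration and are not taken from the HYCOM data; the function names are likewise hypothetical.

```python
import math

OMEGA = 7.2921e-5  # Earth's rotation rate in rad/s


def coriolis_parameter(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude), in 1/s."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def rossby_number(U, L, lat_deg):
    """Rossby number Ro = U / (|f| * L) for velocity U (m/s) and length L (m)."""
    return U / (abs(coriolis_parameter(lat_deg)) * L)


# Illustrative scales at 25 degrees north (roughly the latitude of the Gulf).
ro_large_scale = rossby_number(U=0.1, L=100e3, lat_deg=25.0)  # slow, broad current
ro_small_scale = rossby_number(U=1.0, L=5e3, lat_deg=25.0)    # fast, narrow feature

print(f"f at 25N: {coriolis_parameter(25.0):.3e} 1/s")
print(f"Ro (slow, 100 km): {ro_large_scale:.3f}")
print(f"Ro (fast, 5 km):   {ro_small_scale:.2f}")
```

Under these assumed scales the slow, broad motion has a Rossby number well below one, the regime in which geostrophic balance is expected, while the fast, narrow feature does not.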
Clearly, the approximations involved in estimating gradients introduce significant error and variability into the balance identification procedure for this example. However, the identified models are consistent with the expected behavior according to classical arguments. These results indicate some degree of robustness of the procedure and suggest that it may be applied to sufficiently clean experimental or data-assimilated observations.
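As a rough worked example of the Rossby-number scaling used in this section, the following sketch evaluates Ro = U/(fL) for a hypothetical current. The velocity scale, length scale, latitude, and function names are illustrative assumptions rather than values taken from the HYCOM data, and the Coriolis parameter uses the standard definition f = 2Ω sin φ.

```python
import math

OMEGA_EARTH = 7.2921e-5  # Earth's angular velocity in rad/s


def coriolis_parameter(latitude_deg):
    """Standard Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA_EARTH * math.sin(math.radians(latitude_deg))


def rossby_number(velocity_scale, length_scale, latitude_deg):
    """Ro = U / (f L); small values indicate rotation-dominated flow."""
    return velocity_scale / (coriolis_parameter(latitude_deg) * length_scale)


# Hypothetical scales: a 0.1 m/s current, 100 km wide, at 25 degrees north.
ro = rossby_number(velocity_scale=0.1, length_scale=1.0e5, latitude_deg=25.0)
print(f"Ro = {ro:.3f}")
```

For scales of this order the Rossby number is well below one, which is the regime in which the inertial terms are expected to drop out of the balance.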

Generalized Hodgkin-Huxley model of an intrinsically bursting neuron
Networks of biological neurons in an animal's nervous system communicate with each other through the propagation of electrical potentials. These all-or-nothing events, known as action potentials or spikes, are large deviations of the membrane potential, measured between the inside and outside of a neuron, from its value at rest. Importantly, spikes can travel without significant degradation down the length of a neuron's axon, which may be meters long.
The celebrated Hodgkin-Huxley model for spiking neurons reproduces an action potential through a balance of currents from multiple ions, each of which moves through the cell's membrane across specialized channels and pores at different phases of a spike [79]. These nonlinear partial differential equations were the first detailed biophysical model to quantitatively describe the dynamic activity of neurons, and they underpin decades of ongoing attempts to understand more complex properties of neuronal electrical excitability [80].
The propagation of an action potential along an axon is well approximated by the cable equation for a cylinder of radius $a$,
$$C_M \frac{\partial V}{\partial t} = \frac{a}{2 r_L} \frac{\partial^2 V}{\partial x^2} - \sum_j I_j,$$
where $V$ is the membrane potential, $C_M$ is the membrane capacitance, $r_L$ is the resistivity inside the cell, and the $I_j$ are the ionic currents, in current per unit area, due to the flow of ions into and out of the cell. Hodgkin and Huxley originally modeled three ionic currents: sodium $I_{Na}$, potassium $I_K$, and a leak $I_L$. The dynamics of $V$ for a single action potential can then be expressed as a system of four ordinary differential equations; the balance of currents in these equations reflects the biophysical mechanisms.
Adding more ionic currents and modeling the interactive balance of their dynamics produces more complex spiking behavior. In particular, here we consider a generalized Hodgkin-Huxley model with ten currents that simulates the intrinsically bursting pattern of spikes observed in the R15 neuron of the sea slug Aplysia [81], as shown in Fig. 8. The R15 neuron has been used to study the mechanisms underlying intrinsic bursting, in which several action potentials are generated in rapid succession, interspersed with periods of relative quiet, under constant input. Under space-clamp conditions, where an entire axon cable is considered to be spatially uniform, the equation describing the time evolution of the membrane voltage $V$ under applied external input $I_{stim}$ is
$$C_M \frac{dV}{dt} = -\sum_j I_j + I_{stim}.$$
Specifically, the ionic currents $I_j$ in our model are: $I_{Na}$, the fast sodium Na$^+$ current; $I_{Ca}$, the fast calcium Ca$^{2+}$ current; $I_K$, the delayed rectifier potassium current; $I_{SI}$, the slow inward calcium current; $I_{NS}$, the non-specific cation current; $I_R$, the anomalous rectifier current; $I_L$, the leak current; $I_{NaCa}$, the sodium-calcium exchanger current; $I_{NaK}$, the sodium-potassium pump; and $I_{CaP}$, the calcium pump.
Our dominant balance approach identifies several interpretable regimes of physics in the generalized Hodgkin-Huxley model that are largely consistent with known biophysics. The addition of a set of calcium-dependent currents underlies the slower oscillations between quiescence and excitable bursting, as evident in the slower limit cycle. Notably, in these clusters, colored pink and gray in Fig. 8, the balance of ions is dominated by terms with strong calcium dependence ($I_{CaP}$, $I_{SI}$, and $I_{NaCa}$). In contrast, the time course of $V$ during each fast spike is dominated by voltage-gated ionic currents. In Fig. 8, the rising part of each spike is mediated by activation of sodium channels, and the inward $I_{SI}$ and $I_{Na}$ increase $V$ (red and blue). $V$ reaches its peak as the sodium channels inactivate and the delayed rectifier potassium channels ($I_K$) activate (purple). The exit of potassium from the cell decreases $V$ back towards the resting potential.
Three currents are not assigned to any cluster: the fast calcium current, the sodium-potassium pump, and the non-specific cation current. Although these are dynamically important for the model, they are relatively small compared to the other terms (O(0.1 − 1) compared to O(100) for the spiking dynamics), and so they do not appear to participate in any of the local dominant balance relationships identified by this method. This is similar to the situation for the Raman time-delay nonlinearity in the optical pulse propagation example (Sec. 3.3) and the nonlinear advection in the Gulf of Mexico (Sec. 3.4). In all of these cases, the influence of the neglected terms appears to be of a more subtle nature than the dominant balance physics we explore in this work.

Discussion
In one guise or another, dominant balance analysis has played a major role in the development of our understanding of many complex systems. In this paper we have proposed a method of identifying dominant balance regimes in an unsupervised manner directly from data. This approach leverages our understanding of the full physical complexity in the form of governing equations, but by using simple clustering and sparse approximation methods we avoid any a priori assumptions about balance relations. Nevertheless, in contexts ranging from fluid turbulence to nonlinear optics, the method recovers classical dominant balance relationships.
The critical step in this process is the "equation space" perspective described in Sec. 2. By considering each term in the governing equation to describe a direction in this space, the dominant balance relations naturally manifest via restriction to sparse subspaces, i.e. dramatic reductions in variance in directions corresponding to negligible terms. This enables the Gaussian mixture models to identify clusters with variance in different directions, and the sparse principal components analysis to extract sparse subspaces by finding directions with significantly nonzero variance. These machine learning tools are therefore applied in a targeted and clearly motivated context, and the equation space perspective necessarily ties the output to underlying physics.
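The equation space idea can be illustrated with a minimal sketch on synthetic data. It assumes cluster labels are already available (in the method itself these come from a Gaussian mixture model) and replaces sparse principal components analysis with a crude per-term variance threshold; the function name `active_terms`, the tolerance `rel_tol`, and the synthetic Burgers-like data are all hypothetical choices made for illustration.

```python
import numpy as np


def active_terms(features, labels, rel_tol=0.05):
    """Return {cluster: boolean mask of terms with non-negligible variance}.

    features: (n_points, n_terms) array, one column per term of the equation.
    labels:   (n_points,) integer cluster assignments (assumed given here).
    A term is kept if its within-cluster variance exceeds rel_tol times the
    largest variance in that cluster. This is a simple stand-in for sparse
    PCA, not the procedure used in the paper.
    """
    masks = {}
    for k in np.unique(labels):
        var = features[labels == k].var(axis=0)
        masks[int(k)] = var > rel_tol * var.max()
    return masks


# Synthetic equation space for u_t = -u u_x + nu u_xx.
# Columns are ordered as [u_t, u u_x, nu u_xx].
rng = np.random.default_rng(0)
n = 500

adv0 = rng.normal(0.0, 1.0, n)    # advection away from the shock
visc0 = rng.normal(0.0, 0.01, n)  # viscous term is negligible there
inviscid = np.column_stack([-adv0 + visc0, adv0, visc0])

adv1 = rng.normal(0.0, 1.0, n)    # advection near the shock
visc1 = rng.normal(0.0, 1.0, n)   # viscous term is comparable there
viscous = np.column_stack([-adv1 + visc1, adv1, visc1])

features = np.vstack([inviscid, viscous])
labels = np.repeat([0, 1], n)

masks = active_terms(features, labels)
for k, mask in masks.items():
    print(k, mask)
```

In this toy setting the first cluster is confined close to the plane where the viscous term vanishes, so only the first two terms are flagged as active, while all three terms remain active in the second cluster.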
The method as presented here is perhaps the simplest version possible of this type of analysis. As such, there are clear opportunities for further refinement. For example, the Gaussian mixture model analysis is built on the assumption of normally distributed data. There is no reason to think that the equation space representation of physical fields would be normally distributed, which may limit the sensitivity of the method. Other methods such as spectral clustering or a custom, physically motivated algorithm may be more effective at segmenting this type of data.
On the other hand, the method can be sensitive to the computation of the various terms in the equation, especially gradients. When possible, the terms were extracted directly from the numerical solvers, but this is not an option for noisy experimental data. One way to address this could be a reanalysis-type smoothing procedure, as was used by the HYCOM group to generate the Gulf of Mexico data. Similar data-assimilation approaches have been successful at resolving mean profiles of turbulent flows from limited experimental data [82,83].
When properly developed and validated, the ability to automatically extract balance relations from data has exciting potential applications. For instance, identifying regions of flow fields where viscosity is important could be a principled way to inform schemes such as adaptive mesh refinement [59] or hybrid turbulence modeling [60,84]; currently regions are typically chosen using heuristics or expert knowledge. An understanding of balance relations could even potentially be used to develop novel control strategies. By designing or actuating with the goal of manipulating which regimes are active, such an approach might be used to achieve drag reduction or mixing enhancement.
More generally, dominant balance analysis has historically been a critical tool for understanding local physical behavior in complex systems. To date we have only been able to apply these methods to systems for which the governing equations are well-understood and which admit an asymptotic scaling analysis. Generalizing this analytic approach with data-driven dominant balance identification could allow application of this powerful perspective to complex geometries, non-asymptotic regimes, and even systems for which the governing equations are unknown.
However, as with all applications of machine learning and data science methods to physical systems, a critical step in application to any system will be careful validation that the balance identification procedure reproduces the expected results. The dominant balance modeling approach described here is designed to build on, rather than circumvent, physical expertise. The study of dominant balance regimes has been foundational to our understanding of many complex systems; we hope that data-driven methods can integrate with this legacy to enable even wider applicability.
... integrated with third order backwards differentiation, while convective terms are advanced with a third order extrapolation. The results of this simulation have been validated against those of the immersed boundary projection method [86] by comparing aerodynamic coefficients and vortex shedding frequency. We extract the vorticity field and spatial terms in equation (4) directly from the solver for further analysis. Time derivatives for dominant balance identification were estimated with a second order central difference.
Direct numerical simulation of a transitional boundary layer. To study dominant balance physics in the turbulent boundary layer, we use the transitional DNS by Lee and Zaki [36,37,73], openly available from the Johns Hopkins Turbulence Database [71,72]. The full computational domain consists of a long flat plate with an elliptical leading edge. The extent of the domain (in units defined by the plate half-thickness) is (x, y, z) ∈ (1040, 40, 240) with periodic boundary conditions in the spanwise (z) direction, discretized to $(N_x, N_y, N_z)$ = (4097, 257, 2049). Since the configuration of interest is a zero pressure gradient flat plate boundary layer, the DNS results are only saved once the flow passes the elliptical leading edge (x > 30.2185). The inflow consists of small amplitude free-stream turbulence superimposed on a uniform streamwise velocity $U_\infty$ incident on the plate. The interactions of these perturbations with the laminar boundary layer cause a downstream transition to turbulence [73].
Since we are interested here in the mean momentum balance, we only use the 2D mean field (also available from JHTDB), which was computed from 4701 data snapshots once the flow reached a statistically stationary state. Without direct access to the gradients, we compute the constituent terms of the RANS equations with second-order accurate finite differences, as shown in Fig. 1b. Although some of these fields show small fluctuations, the overall smoothness suggests the statistics are approximately converged.
Supercontinuum generation in photonic crystal fiber. The generalized nonlinear Schrödinger equation (GNLSE), nondimensionalized with soliton scaling [7], is given by Eq. (8a). The various constants describe the polarization response and are determined empirically. In this case we use the values described by Dudley et al. for photonic crystal fiber [77]. We also use the split-step spectral method and initial conditions described in these works to simulate the pulse propagation.
Surface currents in the Gulf of Mexico. We study the high-resolution 1/25° HYCOM reanalysis data for the Gulf of Mexico [78]. We use data from only the first field in the data set, corresponding to January 1993. Data-assimilated fields are available for the 2D velocity components, sea surface temperature, salinity, and sea surface height; vorticity is shown in Fig. 7.
We must therefore estimate time derivatives and both velocity and pressure gradients to compute the terms in Eqns. (10a) and (10b). Since this information is not directly accessible from the model (as it is for the numerical examples), we use finite differences to estimate the velocity derivatives. The pressure field itself is also not available; as a rough estimate we use the residuals of the left-hand side of Eqns. (10a) and (10b) in place of pressure gradients. We also assume constant density throughout the field. Finally, since this field is two-dimensional but the terms in each evolution equation represent the same physics, we simply stack the features for each velocity component into a single (2N × 4) matrix with columns corresponding to acceleration, convection, Coriolis forces, and the pressure gradient. Although these are strong assumptions and approximations, we would expect them to only make the dominant balance identification problem more difficult, since they represent attempts to deal with limited information about the system.
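The feature construction described above can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the function name `balance_features`, the array layout (time, y, x), the uniform grid spacings, and the synthetic test fields are all hypothetical. Derivatives are estimated with `numpy.gradient`, which uses second-order central differences at interior points, and the pressure-gradient column is taken as the residual of the other three terms, with the constant density absorbed into it.

```python
import numpy as np


def balance_features(u, v, f, dt, dy, dx):
    """Stack both momentum equations into a (2N, 4) equation-space matrix.

    u, v: velocity arrays of shape (nt, ny, nx).
    f:    Coriolis parameter (scalar or array broadcastable to u).
    Columns: acceleration, advection, Coriolis, pressure gradient, where the
    pressure gradient is estimated as the residual of the other three terms.
    """
    def terms(q, coriolis):
        q_t = np.gradient(q, dt, axis=0)
        q_y = np.gradient(q, dy, axis=1)
        q_x = np.gradient(q, dx, axis=2)
        advection = u * q_x + v * q_y
        pressure = -(q_t + advection + coriolis)  # residual closes the balance
        columns = (q_t, advection, coriolis, pressure)
        return np.column_stack([c.ravel() for c in columns])

    # Zonal equation has Coriolis term -f v, meridional equation has +f u.
    return np.vstack([terms(u, -f * v), terms(v, f * u)])


# Hypothetical smooth test fields on a small (5, 6, 7) grid.
t = np.linspace(0.0, 1.0, 5)[:, None, None]
y = np.linspace(0.0, 1.0, 6)[None, :, None]
x = np.linspace(0.0, 1.0, 7)[None, None, :]
u = np.sin(x + t) * np.cos(y)
v = np.cos(x) * np.sin(y - t)

X = balance_features(u, v, f=1.0e-4, dt=0.25, dy=0.2, dx=1.0 / 6.0)
print(X.shape)
```

Because the pressure column is defined as a residual, every row of the matrix sums to zero by construction, so this estimate cannot be used to check how well the equations are satisfied.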

Generalized Hodgkin-Huxley model of a bursting neuron.
The full set of model equations, including biophysical parameters, follows [81] and is given in the simulation code. Briefly, gating variables following Hodgkin-Huxley form are described by solutions to differential equations of the general form $\dot{z} = (z_\infty - z)/\tau_z$, where $z_\infty$ is the steady-state value and $\tau_z$ is the time constant associated with the gating variable $z$. To produce the data used in our analysis, this system of ordinary differential equations was integrated numerically in MATLAB using ode15.
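To make the gating-variable form concrete, the sketch below integrates a single gate with constant steady-state value and time constant. This is a deliberate simplification: in the full model both quantities depend on the membrane voltage, and the integration here uses an exact exponential update rather than the MATLAB solver mentioned above. The function name `relax_gate` and all numerical values are hypothetical.

```python
import math


def relax_gate(z0, z_inf, tau, dt, n_steps):
    """Integrate dz/dt = (z_inf - z) / tau with fixed z_inf and tau.

    Uses the exact exponential update over each step, which is stable for
    any dt > 0. Returns the list of values including the initial condition.
    """
    decay = math.exp(-dt / tau)
    z = z0
    history = [z]
    for _ in range(n_steps):
        # Exact solution over one step: z relaxes toward z_inf.
        z = z_inf + (z - z_inf) * decay
        history.append(z)
    return history


# Hypothetical values: a closed gate relaxing toward 0.8 with tau = 5 ms.
trace = relax_gate(z0=0.0, z_inf=0.8, tau=5.0, dt=0.1, n_steps=500)
print(f"z after {500 * 0.1:.0f} ms: {trace[-1]:.4f}")
```

After ten time constants the gate is essentially at its steady-state value, which is why the slow calcium-dependent gates, with much longer time constants, set the time scale of the bursting cycle.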
Figure 9: Model selection procedure used to choose a sparse regularization value for the principal components analysis, demonstrated on the turbulent boundary layer example. Although there is some flexibility depending on the desired accuracy and simplicity in the specific application, the residual of neglected terms suggests a range of appropriate values. In this work we chose regularizations that were as sparse as possible but spanned most of the original terms in the equation and had relatively small residuals (middle panel). Often this led to a set of balance relations, each with 2-3 terms, which collectively captured much of the richness of the full system.
... results in an estimate of the probability of misclassification of each point, as shown in Fig. 10. As expected, this measure generally becomes large in transitional regions. However, keeping their approximate nature in mind, the balance models offer a principled and intuitive segmentation of the domain according to the dominant physics.
[Figure labels: fields; transitional boundary layer (direct numerical simulation); clustering in "equation space"; identified dominant balance physics; non-asymptotic local balance models. Axis labels: x, z.]
Figure 2: Example of dominant balance identification on the viscous Burgers' equation (a), with constituent terms shown in (b). The viscous term acts to diffuse sharp gradients and prevent formation of a discontinuous shock, but away from the shock front the dynamics are essentially inviscid. There, the field is approximately restricted to the $\nu u_{xx} = 0$ plane (c). This is reflected in the covariance matrices learned by the Gaussian mixture model (d).

Figure 3: Dominant balance physics identified across a range of systems. For each case, a visualization of the system is shown on the left, followed by 2D views of the feature space colored by the identified balance relation, a key describing the active terms in each model, and the original field colored by the local balance. From top: a bluff body wake at moderate Reynolds number, a boundary layer in transition to turbulence, pulse propagation in an optical fiber, surface currents in the Gulf of Mexico, and a Hodgkin-Huxley model for an intrinsically bursting neuron.

Figure 5: Direct numerical simulation (DNS) of a transitional boundary layer [36,37,71-73], visualized by contours of the turbulent kinetic energy (a). The Reynolds number based on free stream velocity and streamwise extent is $Re_L$ = 192,000. Active terms vary across the domain (b). The method recovers expected balance relations for the free-stream (green), the inertial sublayer (blue), and the viscous sublayer (red), along with a laminar region near the inlet (purple) and a transitional region (orange). The inertial sublayer follows the theoretically predicted power law (c). Boundary layer theory predicts that the length scale of the sublayer scales with $\sim x^{4/5}$. As a rough criterion for the scale of the inertial balance model, we use the wall-normal coordinate at which the balance relation changes (solid line top), once the transitional region (purple) ends. A curve fit shows an approximate scaling of $\sim x^{0.81}$.
[Figure annotations: response function $r(t) = a\,\delta(t) + b\,e^{ct}\sin(dt)\,\Theta(t)$; color scale: log EM intensity.]

Figure 6: Identified balance models for the generalized nonlinear Schrödinger equation. The governing equations are derived from Maxwell's equations in 1D with a nonlinear time-delayed polarization response. Soliton propagation is understood to be maintained primarily by a balance between low-order dispersion and the cubic Kerr nonlinearity (delta-function component of the right-hand side integral) [7]. Although most of the field is identified with various linear dispersion relations, the strongest soliton is associated with cubic nonlinearity and dispersive terms through fourth order.

Figure 7: Surface vorticity in the Gulf of Mexico (left) along with identified balance models for zonal (middle) and meridional (right) dynamics. Orange regions are identified with the geostrophic balance, while the blue regions are time-varying in response to the Coriolis forces and regions in white are associated with the linearized rotating Navier-Stokes equations.

Figure 8: Generalized Hodgkin-Huxley model for an intrinsically bursting neuron. Dynamics in quiescent periods are characterized by currents related to calcium concentration (pink and gray), while the spiking dynamics are dominated by the classic sodium-potassium cycle.

[Figure annotations: governing equation $u_t + (u \cdot \nabla) u = -\nabla p + \frac{1}{Re} \nabla^2 u$; axis label: log p.]

Figure 10: Uncertainty estimation for the dominant balance identification procedure. The Gaussian mixture model clusters points in the domain by assigning a probability of belonging to each Gaussian distribution. Summing the probabilities that each point belongs to a GMM cluster which SPCA reduces to the same balance model gives an overall estimate of the uncertainty associated with the identified dominant balance.
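The summation described in this caption can be sketched in a few lines. The sketch assumes the per-point cluster membership probabilities and the mapping from clusters to balance models are already available; the function names and example numbers are hypothetical, and taking one minus the largest summed probability as the misclassification estimate is one plausible reading of the procedure rather than a confirmed detail of it.

```python
def balance_model_probabilities(responsibilities, cluster_to_model):
    """Sum GMM membership probabilities over clusters sharing a balance model.

    responsibilities: list of per-point lists, one probability per GMM cluster.
    cluster_to_model: list mapping each cluster index to a balance-model id.
    Returns one dict per point mapping model id to total probability.
    """
    totals = []
    for probs in responsibilities:
        per_model = {}
        for cluster, p in enumerate(probs):
            model = cluster_to_model[cluster]
            per_model[model] = per_model.get(model, 0.0) + p
        totals.append(per_model)
    return totals


def misclassification_probability(per_model):
    """One minus the probability of the most likely balance model (assumed)."""
    return 1.0 - max(per_model.values())


# Hypothetical example: three GMM clusters, the first two reduce to model "A".
resp = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]
models = balance_model_probabilities(resp, ["A", "A", "B"])
uncertainty = [misclassification_probability(m) for m in models]
print([round(q, 3) for q in uncertainty])
```

Points that sit between two clusters with the same balance model are therefore not penalized, and the estimate only grows where neighboring clusters reduce to different sets of active terms.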