Introduction

Accurate modeling of non-equilibrium reacting flows is critical in many engineering and science disciplines, e.g., designing hypersonic vehicles for space exploration1,2 or material processing and manufacturing with low-temperature plasmas3,4. The need for describing and understanding these flows has led to the development of increasingly large and sophisticated mathematical models5,6,7,8, describing multiple physical phenomena characterized by a broad spectrum of spatio-temporal scales.

The most physically consistent approach to model non-equilibrium flows relies on the direct numerical solution of the master equation5,6,9,10,11,12,13, whereby all the relevant spatial and temporal scales resulting from chemical and radiative processes are accounted for. Indeed, the availability of quantum state-to-state (StS) chemistry models based on ab initio theories14,15,16,17,18 enables unprecedented levels of physical accuracy5,6,7,8, crucial for modeling flows typified by a significant degree of non-equilibrium. However, the exponentially large number of degrees of freedom (i.e., molecules’ and atoms’ energy levels) and the numerical restrictions (stiffness) associated with the derived system of equations make these models impracticable in large-scale multi-dimensional computational fluid dynamics (CFD) problems. To overcome these difficulties, crude “engineering” non-equilibrium models19,20, referred to as multi-temperature (MT) models, have been developed over the years, often assembled without any rigorous derivation from fundamental kinetic equations nor consideration for physical principles and constraints. Given their interpolative nature, these cannot be used to perform predictions outside their development range.

This work targets the numerical challenges in solving such computationally intense systems of equations by surrogating the thermochemical processes characterizing non-equilibrium phenomena that conventional techniques cannot address. Surrogate and reduced-order models21,22,23,24,25,26 can be designed and constructed by employing various techniques, such as projection-based methods27,28,29,30,31,32,33, data-fit interpolation and regression34, and machine learning (ML)-based models35,36. A recent application of surrogates for hypersonics has been published by Ozbenli et al.37, who trained a feed-forward neural network (FNN) to learn a given set of the master equations’ solution functions for a specific non-equilibrium model38. Their ML framework showed a great computational speed-up compared to numerical integrators, with generalization performances left unclear. Similarly, Campoli et al.39 explored different ML algorithms to regress the source terms of the ODEs system modeling the thermochemical relaxation processes. A coupling between a conventional integrator and the ML regressor was attempted, and speed-up performances were analyzed. They also tried to infer the solution of Euler’s equations for a single one-dimensional reacting shock flow scenario by leveraging a deep neural network (DNN). Scherding and coworkers40 developed a lower-dimensional surrogate to compute the thermochemical properties of the gas mixture to be used in place of any high-dimensional look-up non-equilibrium thermodynamic library. However, despite the considerable speed-up performances and encouraging perspective, they did it only for steady-state solutions, targeting specific flow conditions and considering only chemical and not thermal non-equilibrium. The above-mentioned frameworks lack generalization performances and do not impose physical constraints during the surrogate construction, making them less suitable for CFD simulations. Instead, the present study aims to provide a prototyping tool that can replace the master equations with a surrogate that preserves the original’s essential properties and physical constraints while being orders of magnitude faster and able to cover an extensive range of physical conditions. The present work augments the framework introduced by Zanardi et al.41, and it introduces a new machine learning-based method for solving non-equilibrium flows by combining:

  1. i.

    Coarse-graining, i.e., a reduced order modeling (ROM) technique that extracts meaningful physics from the master equations10,42,43,44,45, in general, by leveraging unsupervised learning adaptation to seek the optimal grouping configuration46. The so-derived reduced system of equations models the dynamics of groups of states, addressing the high-dimensionality problem characterizing the StS models.

  2. ii.

    Neural operators, i.e., a ML-based surrogate that approximates the integral solution operator of a family of partial differential equations (PDEs) to bypass conventional numerical integration47.

Coarse-graining

Constructing a surrogate for high-fidelity quantum-state-specific chemistry models to describe non-equilibrium phenomena is not a simple task as they rely on the solution of an overwhelmingly large number of differential equations (order of 10\(^5\))5. More importantly, the mathematical closure of these equations requires the determination of a sizeable kinetic database that often cannot be computed owing to many processes (order of 10\(^{16}\)) to be considered. Therefore, performing first a physics-preserving dimensionality reduction is of paramount importance. To this end, nonlinear manifold learning techniques such as autoencoders48, diffusion maps49, or kernel PCA50 could be employed. Recently, Oommen et al.51 proposed learning high-dimensional complex dynamics by combining neural operators and autoencoders. Their application first reduced the problem’s dimensionality by training a convolutional autoencoder and then learned the low-dimensional dynamics lying in the latent space using a deep neural operator. However, although powerful in applications requiring dimensionality reduction, autoencoders lack physical interpretability and introduce spurious correlations, not necessarily guaranteeing a discrete separation of temporal scales. To overcome these limitations, our approach relies on a class of physics-based reduced-order coarse-grained (CG) models52,53,54. In chemical kinetics, coarse-grained modeling has extensively been used to describe non-equilibrium phenomena of atomic and molecular species45,46,55,56,57,58. The central idea in the proposed CG model is to combine the solution of the coarse-grained dynamics with the partial equilibration of the underlying microscopic structure. The concept of partial equilibrium suggests applying the maximum entropy principles (MEP) to reconstruct the unresolved scales or physics. This choice is of paramount importance, as it ensures the physical consistency of the model by enforcing the principle of detailed balance and ensuring the positivity and boundness of the distribution function.

Neural operators

The second basis of the proposed methodology aims to address the stiffness associated with thermochemical processes, characterized by a broad spectrum of temporal scales, ranging from the flow time scales to time scales that are orders of magnitude smaller. This work uses DNNs to infer the generalized solution of the governing equations to bypass the conventional numerical integration. In literature, a series of new ML-based paradigms for speeding up the numerical simulation of partial differential equations59,60,61,62,63,64,65 have been proposed over the past few years. In particular, this work leverages the family of neural operators47,66,67,68,69,70, DNN-based surrogates designed to learn or discover solution operators defined by the mapping between inputs of a dynamical system, such as initial or boundary conditions (ICs/BCs), and its state. We employ a parametric-based approach to operator learning, introduced first by Chen et al.71 and then recently extended by Lu et al.72 In their work, Lu and coworkers introduced DeepONet, a novel network architecture that effectively approximates the solution operator of linear and nonlinear parametric PDEs. DeepONets have found applications in various fields of physics73,74, including hypersonics with the work of Mao et al.75, who approximated the fluid flow evolution and concentration profiles downstream of a normal shock with a DeepONet-based surrogate. Although Mao et al.’s work is significant for the scientific community, it relies on a simple physical model that cannot correctly represent the non-equilibrium distribution of internal energy states, which is crucial for the current study. Additionally, their approach lacks physics constraints during the design and training phase of the model, such as physics-informed (PI) machine learning methodologies employed in this work, commonly known as PINNs76,77,78,79,80,81. These techniques impose constraints by penalizing deviations from governing equations, enhancing the model’s generalization performance. This new class of machine learning models, called physics-informed deep neural operator (PI-DeepONet)82,83,84,85, which combines physics-informed techniques with the DeepONet architecture, was initially introduced by Wang et al.82 and successfully applied to construct surrogate solution operators for various partial differential equations (PDEs), demonstrating excellent results.

Figure 1
figure 1

Schematics of the proposed approach. Combining coarse-graining (a) and hierarchical DeepONets (b). (a) Reduced order modeling technique based on clustering the species’ quantum energy states (schematized as black dots and as functions of vibrational, \(\epsilon _v\), and rotational energy, \(\epsilon _J\)) into macroscopic bins. In the figure, three different levels of hierarchical clustering are shown. (b) Tree visualization of the hierarchical deep learning framework, where the leaf nodes correspond to separate DeepONets (one for each macroscopic bin), which take as inputs the initial conditions, ic, and time, t.

Proposed approach

The combined use of coarse-graining and neural operators is of primary importance. On the one hand, the mere application of neural operators does not resolve the high-dimensionality problem, as it is not straightforward to design and train an efficient surrogate for thousands of coupled differential equations. On the other hand, dimensionality reduction does not solve the issues with integration, as small steps are still needed to stably integrate the reduced system of equations. For these reasons, the proposed framework (Fig. 1) is characterized by a novel physics-inspired architecture based on a hierarchy of DeepONets used to learn the solution operator for multiple coarse-grained configurations to resolve different scales of the phenomena considered. The CG surrogate herein proposed, referred to as CG-DeepONet throughout the rest of the paper, is constructed by training each scale sequentially and employing transfer learning between them. In this sense, our framework is in line with recent operator learning techniques for multi-scale systems86,87,88,89,90,91. Among the latest ones, Liu et al.86 proposed a promising hierarchical time-stepper approach for solving the system dynamics. In their approach, they trained multiple neural networks to capture different timescales of the physical phenomenon by varying the integration step. We also recall the work of Migus et al.87, who designed a multi-scale architecture based on multi-pole graph neural operators (MGNO) by embedding multi-resolution iterative methods92. Liu and coworkers88 drew inspiration from hierarchical matrix methods to develop their multi-scale hierarchical transformer. Furthermore, Liu and Cai89 integrated multi-scale deep neural networks (MscaleDNNs)93 within the DeepONet architecture. These innovative approaches open up new possibilities for more accurate and efficient modeling of multi-scale complex systems, and the paradigm proposed in this work builds upon these advancements. Indeed, our framework allows the development of a parsimonious and autonomous tool that can quickly deliver the optimal thermochemical representation of the gas given initial conditions and time instant by adaptively choosing the most efficient and physically accurate grouping resolution. The need for adaptation is a direct consequence of different physical scenarios arising in multidimensional numerical simulations, ranging from equilibrium or near-equilibrium to strong non-equilibrium conditions. A controller-acting surrogate, identified as Neq-DeepONet in the remainder of this paper, is responsible for the model adaption to the local flow conditions. In this sense, our framework can be viewed as a multi-fidelity composition of DeepONets and shares analogies with some recent works on the topic94,95,96. However, the novelty of our approach stems from the definition of such a composition based on the maximum-entropy coarse-grained modeling, which is consistent with the underlying physics.

Physics-informed attributes of the surrogate

In this paragraphs, we highlight the physics-informed features of the proposed approach, which take the form of either soft or hard constraints imposed on the surrogate:

  1. i.

    Dimensionality reduction in the state space. In addition to the dimensionality reduction in the space of the initial conditions automatically carried out by the DeepONet based on the scenarios provided during training97, a physics-based reduction is performed in the state space (i.e., in the space of the discrete energy states) by grouping states that are likely to be found in local equilibrium46,57. Only briefly introduced above, such a coarse-graining approach will be detailed in Section “Physical modeling”.

  2. ii.

    Physics-consistent architecture components. A Boltzmann transformation layer is built into the surrogate to enforce the equilibrium distributions between states in the same group, as explained in Section “Neural operators”.

  3. iii.

    Interpretable prior distributions for the network parameters. As discussed in Section “Neural operators”, the addition of Boltzmann layers allows the imposition of prior distributions for the network parameters that, when propagated to the state populations (e.g., mass fractions), produce equilibrium distributions between distinct groups of states. Therefore, such priors can provide physically consistent solutions even for un-trained surrogates.

  4. iv.

    Physics-informed loss function. The framework employs a physics-informed loss as a soft constraint, which biases the surrogate predictions towards physically consistent solutions. In particular, the employed hybrid strategy, described in Section “Neural operators”, combines data from high-fidelity simulations (or experiments) to anchor the solution to frequent or reproducible real-world scenarios and the residual of the governing laws to ensure generalizability to different unseen physical conditions.

  5. v.

    Hierarchical architecture and transfer learning. The training strategy involves sequential fine-tuning transfer learning between different temporal scales, explained in Section “Neural operators”. On the one hand, this approach allows for partially preserving the learned physics. On the other hand, it enables surrogate adaptation and knowledge transfer from one temporal scale to another, speeding up the training process of the entire network.

  6. vi.

    Physics-driven online pruning at the prediction phase. As detailed in Section “Training strategy”, an additional (controller-acting) surrogate learns the dynamics of a physically-relevant non-equilibrium control variable, determining the minimum resolution level required to accurately describe the system dynamics while avoiding explicitly computing unnecessary fine scales. During the prediction phase, this additional surrogate is responsible for selecting which component of the overall architecture needs to be queried.

The paper is structured as follows. First, in Section “Physical modeling”, the basic framework and derivation of the thermochemical non-equilibrium model are provided, along with the details of the one-dimensional numerical experiment conducted in this work. Next, in Sections “Neural operators” and “Training strategy”, the proposed ML framework and the developed adaptive technique are described, respectively. In the “Results” section, the accuracy and performance of the surrogate with and without adaptive inference are illustrated and discussed in detail for both 0-\(\textrm{D}\) and 1-\(\textrm{D}\) test case scenarios. Finally, in the “Conclusions” section, final remarks are presented along with possibilities for future work. Additional information can be found in the Supplementary Information for interested readers.

Methods

Physical modeling

Modeling of chemically reacting flows relies on the solution of Navier-Stokes equations complemented by additional conservation equations accounting for changes in the chemical composition and non-equilibrium relaxation of the energy modes. This extra set of equations often represents a computational burden that makes reacting non-equilibrium flows hard to solve. An extensive discussion on non-equilibrium modeling can be found in reference45.

The most general way to express the extra set of governing equations is

$$\begin{aligned} \frac{D}{Dt} \left( \rho _i e^m_i\right) + \nabla \cdot {\mathscr {J}}^m_i = \Omega ^m_i\;, \end{aligned}$$
(1)

where \(\rho _i\) and \(e_i\) indicate the mass density and the internal energy of the i-th pseudo-species (i.e., a particular species’ internal degree of freedom treated as a state variable). Additionally, m denotes the moment order (0, 1, 2, etc.), \(\Omega ^m_i\) the reactive source terms, D/Dt the Lagrangian derivative, and \({\mathscr {J}}_i^m\) the dissipative/diffusion terms. Depending on the assumptions made in the definition of the chemical species indicated by i, three different models can be identified:

  1. i.

    If i refers to a particular energy state, \(\epsilon _i\) (i.e., rovibronic \(i=(el,v,J)\)), the approach is called state-to-state (StS) master equations5,6. In this case, m is set to 0.

  2. ii.

    If \(\rho _i\) indicates the density of a group of states, the approach is named coarse-grained (CG) modeling or coarse-grained master equations (CGME)42,45,46,57,58,98,99,100. In this case, the conservation equations for mass, momentum, and energy are complemented by additional equations (i.e., \(m=0\) and/or \(m=1\)) to model chemical composition and internal energy modes.

  3. iii.

    In the case of binning one group per internal energy mode, which is a particular case of (ii), we have the multi-temperature (MT) models101.

Figure 2 compares the levels of physical accuracy and resolution among the three models mentioned above for O\(_2 +\)O kinetics, the only system considered in this work. A substantial loss of physical information can be noticed moving from the internal energy states distribution obtained with the StS model to the one defined by Park’s two-temperature model101, which is a particular case of the MT models, where all the states are collapsed along a straight line. Differently, the CGME approach better captures the StS distribution by modeling the dynamics of multiple clusters of states (27 in Fig. 2, namely the CGME27 model). In this work, only the coarse-grained master equations approach will be employed to construct our surrogate, which is tested in both 0-\(\textrm{D}\) and 1-\(\textrm{D}\) scenarios.

Figure 2
figure 2

Normalized quasi-steady state (QSS) rovibrational states distribution for different models. The level of physical accuracy can vary significantly depending on the choice of the thermochemical model. The orange dots are determined by low-fidelity Park’s two-temperature model102, e.g., a particular MT model, the blue ones by the 27-groups CG grouping strategy, and the grey ones by the high-fidelity StS modeling. Initial conditions used for the 0-\(\textrm{D}\) simulation: \(P_{_0}=3\,000\text { Pa}\), \(X_{\text {O}_0}=0.2\), \(T_{int_0}=1000\text { K}\), \(T=10\,000\text { K}\).

Coarse-grained modeling

The numerical solution of the master equations, whereby the dynamics of each internal energy state is captured via the direct solution of the corresponding mass conservation equations, is often impractically expensive. Moreover, it is usually not required since the internal energy distribution is generally a composition of partial equilibria rather than a complete non-equilibrium state46. The concept of local or partial equilibrium suggests the application of the principle of maximum entropy to reconstruct the unresolved scales of physics10,44,45. The construction of a coarse-grained model is accomplished by adopting a two-step procedure which goes as follows103:

  1. i.

    Group energy states into \({\mathscr {N}_\mathscr {G}}\) macroscopic bins according to a specific strategy;

  2. ii.

    Prescribe a bin-wise distribution function to represent the population within each group together with a series of moment constraints.

This work employs a log-linear form for the bin-wise distribution function, which results in a thermalized local Boltzmann distribution within individual bins, defined as follows

$$\begin{aligned} {\mathscr {F}}_P^i\left( \epsilon _i\right) :\quad \log \left( \frac{g_i}{n_i}\right) =\alpha _P+\beta _P\epsilon _i\;, \end{aligned}$$
(2)

where the bin-specific coefficients \(\alpha _P\) and \(\beta _P\) are expressed in function of the macroscopic group constraints (i.e., number density, energy, etc.). The total population and energies of the different bins are the set of unknowns for the reduced-order system. The governing equations for these macroscopic constraints can be derived by taking successive moments of the StS master equations, using \(\left( \epsilon _i\right) ^m\) for \(m=0,1,\dots\) as weights (see Supplementary Sect. S.1.2 for more details).

While more accurate strategies have been developed during the past few years46,57, the model-reduction approach employed in this work is the rovibrational energy-based grouping technique99,100, which lumps together energy states with similar internal energy regardless of their rotational and vibrational quantum numbers.

Zero-dimensional chemical reactor

We wish to investigate the behavior of oxygen molecules in their electronic ground state undergoing dissociation when subjected to sudden heating in an ideal chemical reactor. We make the following assumptions:

  1. i.

    The 0-\(\textrm{D}\) reactor is plunged into a thermal bath maintained at constant temperature T.

  2. ii.

    The translational energy mode of the atoms and molecules is assumed to follow a Maxwell-Boltzmann distribution at the temperature T of the thermal bath.

  3. iii.

    At the beginning of the numerical experiment, the population of the rovibrational energy levels is assumed to follow a Boltzmann distribution at the internal temperature \(T_{int_0}\).

  4. iv.

    The volume of the chemical reactor is kept constant during the experiment, and the thermodynamic system is closed (no mass exchange with the surrounding environment).

  5. v.

    Only \(\alpha _P\) in Eq. (2) is modeled for each bin P, while \(\beta _P=1/\left( k_BT_P\right)\) is kept constant during the 0-\(\textrm{D}\) simulation, with \(k_B\) being the Boltzmann’s constant and \(T_P=T\).

Therefore, Eq. (1) reduces to

$$\begin{aligned} \begin{aligned} \frac{d\rho _i}{dt}&= \Omega ^0_i \left( \rho _i,T\right) \\ \rho _i\left( t=0\right)&= f_i \end{aligned}\;, \end{aligned}$$
(3)

where \(f_i\) refers to the corresponding Maxwell-Boltzmann equilibrium value of specie i at temperature \(T_{int_0}\).

Since the goal is to learn the integral solution operator of the rovibrational CG master equations to be able to deliver accurate predictions in multidimensional CFD simulations characterized by a wide range of physical scenarios, we aim to generalize over the space of initial conditions (ICs) and time domain. The ICs are generated by defining the initial pressure \(P_{_0}\), the initial molar fraction of atomic oxygen \(X_{\text {O}_0}\), the translational temperature T, and the initial internal temperature \(T_{int_0}\) for which a Boltzmann distribution is prescribed for the O\(_{2}\) bins. In this work, the domain in which the initial conditions have been sampled is defined in table 1 as minimum-maximum pair values. For all the possible sampling scenarios, T is greater than \(T_{int_0}\), which implies that thermal excitation and dissociation processes are the dominant phenomena occurring in the reactor.

Table 1 Space of initial conditions. Minimum-maximum pair values for each IC variable.

Regarding the time domain, we train the model over an interval of [0,10\(^{-2}\)] s, covering most excitation and dissociation processes for the non-equilibrium problem under investigation.

One-dimensional numerical experiment

Following the approach used by Zanardi et al.104, a one-dimensional shock case scenario is employed to test the ML-based framework proposed in this work. The governing equations for the dynamics of inviscid, one-dimensional gas flows are given by the Euler equations:

$$\begin{aligned} \frac{\partial {\textbf{U}}}{\partial t} +\frac{\partial {\textbf{F}}}{\partial x}={\textbf{S}}\;, \end{aligned}$$
(4)

where t represents time and x represents space. It is worth noting that Eq. (1) is the Lagrangian version of Eq. (4), including an additional diffusive term. The vectors \({\textbf{U}}\), \({\textbf{F}}\), and \({\textbf{S}}\) represent the conservative variables, inviscid fluxes, and source terms, respectively. They are defined as follows:

$${\textbf{U}} = [\rho _i\hspace{3mm}\rho u \hspace{3mm}\rho E]^{\textrm{T}} \;, $$
(5)
$${\textbf{F}}= [\rho _iu \hspace{3mm}p +\rho u^2\hspace{3mm}\rho u H]^{\textrm{T}}\;,$$
(6)
$$\begin{aligned} {\textbf{S}}&= [\Omega ^0_i\hspace{3mm}0\hspace{3mm}0]^{\textrm{T}}\;, \end{aligned}$$
(7)

where the total energy and enthalpy per unit-mass are \(E = e + u^2/2\) and \(H = E + p/\rho\), respectively. The thermodynamics of the system is explained in detail in the Supplementary Sect. S.1.1, and the variables e, u, p, and \(\rho\) have their usual meanings in the context of gas dynamics. The source term \(\Omega ^0_i\) represents the mass production term, which is the same one as defined in Eq. (3) and described in detail in the Supplementary Sect. S.1.2.

The flow governing equations (4) are discretized in space using the finite volume method, with inviscid fluxes evaluated using van Leer’s flux vector splitting in conjunction with the second-order upwind-biased MUSCL reconstruction procedure105,106. The time integration method is based on the operator-splitting technique proposed by Strang107. This method integrates the transport operator, \(\varvec{{\mathscr {T}}}\left( {\textbf{U}}\right) =\partial {\textbf{F}}/\partial x\), and the reaction operator, \(\varvec{{\mathscr {R}}}\left( {\textbf{U}}\right) ={\textbf{S}}\), sequentially in a symmetric fashion:

$$\begin{aligned} \partial _t{\textbf{U}}^{(1)}&=\varvec{{\mathscr {T}}}\left( {\textbf{U}}^{(1)}\right)\;,\qquad \qquad {\textbf{U}}^{(1)}\left( t_n\right) ={\textbf{U}}_n \end{aligned}$$
(8)
$$\begin{aligned} \partial _t{\textbf{U}}^{(2)}&=\varvec{{\mathscr {R}}}\left( {\textbf{U}}^{(2)}\right)\;,\qquad \qquad {\textbf{U}}^{(2)}\left( t_n\right) ={\textbf{U}}^{(1)}\left( t_n+\Delta t/2\right) \end{aligned}$$
(9)
$$\begin{aligned} \partial _t{\textbf{U}}^{(3)}&=\varvec{{\mathscr {T}}}\left( {\textbf{U}}^{(3)}\right)\;,\qquad \qquad {\textbf{U}}^{(3)}\left( t_n+\Delta t/ 2\right) ={\textbf{U}}^{(2)}\left( t_n+\Delta t\right) \end{aligned}$$
(10)
$$\begin{aligned} {\textbf{U}}_{n+1}&={\textbf{U}}^{(3)}\left( t_n+\Delta t\right)\;, \end{aligned}$$
(11)

where \(\Delta t\) is the time step. The splitting formulation is second-order accurate, strongly stable, and symplectic for non-linear equations. Its convergence and stability properties have been extensively studied for reacting flow simulations108,109,110,111. The use of an operator-splitting approach facilitates the straightforward insertion of the constructed neural operator into the framework described by Eqs. (8) to (11). Instead of using an implicit scheme to integrate the stiff reaction step described by Eq. (9), a simple evaluation of the trained surrogate is performed to evolve the solution in time. The surrogate takes the solution from the first flux integration step as input and provides the evolved gas state resulting from the reaction operator to the last step of the splitting scheme.

The main configuration details of the one-dimensional shock case scenario used herein are given below.

  • Initial and boundary conditions. Table 2 presents the piece-wise initial conditions. On the left side, freestream conditions corresponding to a hot gas at \(T=3000\) K and \(u=3000\) m/s are imposed. This choice is made because, at this temperature, the equilibrium state of the gas results in a reasonable amount of dissociated oxygen. It is important to note that this condition is not a requirement of the method itself but rather a consequence of only modeling the O\(_2+\)O kinetics without considering the O\(_2+\)O\(_2\) system, where molecular oxygen alone is sufficient to activate the thermochemical processes. On the right side, the initial solution is set equal to the post-shock equilibrium state. A supersonic inflow boundary condition (BC) is imposed on the left side, where all characteristics are incoming, by prescribing all flow variables. A subsonic outflow BC is imposed on the right side with a specified pressure value.

  • Time and space grid. The one-dimensional domain length is set to \(L=0.1\) m, and the spatial discretization uses a space step of \(\Delta x=4\times 10^{-4}\) m, resulting in a total of 250 cells. The integration is performed until the shock profile is fully developed, using a total of 500 iterations with a constant time step of \(\Delta t=1.33\times 10^{-7}\) s determined by the freestream velocity while maintaining a maximum CFL number of 1 to ensure numerical stability.

Table 2 Initial conditions for one-dimensional shock case scenario.

To ensure that the left and right equilibrium conditions are fully guaranteed and avoid any error accumulation due to even minor discrepancies in the surrogate’s predictions, the inference is performed only for those cells experiencing non-local thermodynamic equilibrium (NLTE) effects, meaning for gas thermochemical states different from the ones shown in table 2. However, to fairly compare the numerical integrator’s and the surrogate’s performance, the inference is performed for the whole 1-\(\textrm{D}\) domain, and the predictions for those cells in the same conditions as in table 2 are simply disregarded.

To ensure physical consistency, the surrogate must learn the integral solution of the zero-dimensional formulation of Eq. (4), specifically Eq. (9), which describes an adiabatic thermodynamic system without energy or mass exchange. Consequently, the isothermal assumption made in the 0-\(\textrm{D}\) analysis does not apply to this particular test. To accurately represent the adiabatic case, an additional DeepONet is required on top of the surrogate described in the next section. This additional DeepONet is employed to model the translational temperature T, enabling a more comprehensive and accurate representation of the complex thermochemical dynamics in the 1-\(\textrm{D}\) domain. Therefore, a distinct surrogate is constructed specifically for this simulation, with detailed information on data generation and network construction provided in the Supplementary Sect. S.3.1.

Neural operators

DeepONet

Building upon the original formulation of the DeepONet by Lu et al.72, whereby the solution operator G maps an input function \(\varvec{u}\) and the continuous coordinates \(\varvec{y}\) of \(G(\varvec{u})\) to a real scalar value, this work extends the DeepONet framework to accommodate the high-dimensional nature of the master equations, thus obtaining an output vector \(G(\varvec{u})(\varvec{y})\in {\mathbb {R}}^{D}\), where D is the number of the output variables41,67. As illustrated in Fig. S1 in the Supplementary Information, the DeepONet architecture is characterized by two different deep neural networks: the “branch net” and the “trunk net”. The modified version is characterized by multiple branches, one for each output variable, which takes \(\varvec{u}\) as input and returns a feature embedding \(\varvec{\alpha }\in {\mathbb {R}}^p\) as output. Instead, the trunk net takes the continuous coordinates \(\varvec{y}\) as inputs and outputs another feature embedding \(\varvec{\phi }\in {\mathbb {R}}^p\). This block is shared between different branches67,97, gaining computational efficiency. In the framework of operator learning for ODEs, \(\varvec{u}\) represents the space of initial conditions, whereas \(\varvec{y}\) is the time variable. To obtain a continuous and differentiable representation of the output functions of the DeepONet, the outputs of each branch and the trunk networks are merged via dot product as follows:

$$\begin{aligned} {\widehat{G}}^{(i)}(\varvec{u})(\varvec{y})=\sum _{k=1}^{p} \alpha ^{(i)}_{k}(\varvec{u}) \phi _{k}(\varvec{y}) \qquad \text {for}\hspace{2mm}i=1,\dots ,D\;. \end{aligned}$$
(12)

One can notice that Eq. (12) reminds the proper orthogonal decomposition (POD) formulation112, as highlighted by Lu et al.67, and more generally Eq. (12) can be related to the singular value decomposition (SVD) factorization, as explained by Venturi and Casey97. From this perspective, the trunk net learns the p most important modes of the dynamical system, \(\varvec{\phi }\), while the branch net learns the coefficients \(\varvec{\alpha }\) of the expansion. Under this perspective, the shared-trunk version of the DeepONet works reasonably well only when the dynamics of the modeled variables are similar to each other such that they can share the same basis \(\varvec{\phi }\)97.

Multi-scale hierarchical coarse-grained model

Similar to what is done in adaptive mesh refinement (AMR) techniques used in CFD, the accuracy of the CG model can be improved by increasing the number of groups but at a higher computational cost. The improvement in accuracy is explained by the larger range of scales (or kinetic processes) that can be resolved. Indeed, taking as an example the rovibrational energy-based grouping strategy employed in this work, if we recursively split the energy space of the internal states by following a cascade in the groups, all the micro-groups inside the corresponding macro-group quickly reach the same equilibrium value, showing a fast dynamical behavior. Consistently, we leveraged the multi-scale nature of the physical problem to construct a physics-inspired ML-based surrogate (see Supplementary Sect. S.2.2 for all the details) by sequentially learning the different timescales of the thermochemical phenomena occurring inside a 0-\(\textrm{D}\) reactor.

  • Timescale 1. Chemical dissociation of O\(_2\) molecules (irrespective of their internal excitation) and creation of O atoms are the slowest processes that can be learned. As shown in Fig. 3a, the outputs of the DeepONet employed for this first timescale, denominated as CG-DeepONet\(^{(1,1)}\) (i.e., the surrogate’s component in charge of predicting the group number one in the scale number one), are simply the mass fractions of O and O\(_2\). So, we are assuming that all the internal states can be clustered in one unique group, but we do not solve for the rovibrational-translation energy transfer phenomena. As concerns the physical input of the model, \(\varvec{u}\) represents the initial conditions of the reactor, which is characterized by translational temperature, T, reactor density, \(\rho\), and initial mass fraction of O\(_2\), while the independent variable, \(\varvec{y}\), of the operator \(G\left( \varvec{u}\right)\) is the time, t:

    $$\begin{aligned} \begin{aligned} \varvec{u}&=\left[ T,\,\rho ,\,Y_{{\text {O}_2}_{0}}\right]{} & {} \quad \in {\mathbb {R}}^{3}\\ \varvec{y}&=t{} & {} \quad \in {\mathbb {R}}^1\\ {\widehat{G}}(\varvec{u})(\varvec{y})&=\left[ {\widehat{Y}}_{\text {O}}\vert _{\varvec{u}}(t),\; {\widehat{Y}}_{\text {O}_2}^{(1,1,1)}\vert _{\varvec{u}}(t)\right]{} & {} \quad \in {\mathbb {R}}^2 \end{aligned} \end{aligned}\;.$$
    (13)

    In (13) and Fig. 3a, a series of two or three superscripts have been used, where the first one corresponds to the timescale investigated, the second the DeepONet index, and the last one the O\(_2\) group. They will help to identify the different variables and DeepONets used for each timescale. The Softmax function in Fig. 3a is applied to the dot product outputs after these being linearly transformed. It guarantees the mass fractions to be positive values and the mass to be conserved.

  • Timescale 1-2. In the following timescale, we start modeling the energy exchange processes for O\(_2\). To do so, the internal states are clustered into three groups (CGME3) which is equivalent to uniformly splitting the energy space covered by the unique group from the previous timescale (CGME1) into three parts, as shown in Fig. 3d. To learn the dynamics of this new system, the information learned from the previous timescale is leveraged by adopting transfer learning for the calibrated weights of CG-DeepONet\(^{\left( 1,1\right) }\). The new DeepONet is designed to learn the 3-group normalized distribution. The mass fractions of the three bins are then obtained by multiplying the modeled distribution by the total mass fractions of O\(_2\) predicted by CG-DeepONet\(^{\left( 1,1\right) }\), as shown in Fig. 3b, ensuring the conservation of mass across the two scales. In terms of architecture, two are the difference between Timescale 1 and Timescale 2. The first is related to the inputs, \(\varvec{u}\), of the branch net, which considers the initial mass fractions of all the three groups, \({\varvec{Y}}_{{\text {O}_2}_{0}}\). Since Timescale 1 takes as an input the total mass fraction of O\(_2\) as described in (13), the three values are summed to get the correct input for CG-DeepONet\(^{\left( 1,1\right) }\). The second aspect concerns the replacement of the Softmax layer with the EquilSoftmax one. The latter can be considered as an extension of the former, and it has the following formulation:

    $$\begin{aligned} \frac{{\widehat{Y}}_{\text {O}_2}^{(2,1,i)}}{{\widehat{Y}}_{\text {O}_2}^{(1,1,1)}} =\textit{EquilSoftmax}\left( \varvec{x}\right) _i=\frac{\exp \left( x^{(2,1,i)}\right) Q_i(T)}{\sum \limits _{i}\exp \left( x^{(2,1,i)}\right) Q_i(T)} \qquad \text {for}\hspace{2mm}i=1,2,3\;, \end{aligned}$$
    (14)

    where \(Q_i(T)\) is the internal partition function of group i. Therefore, if \(x^{(2,1,i)}=0\) \(\forall\) i, all the groups are in equilibrium at the translational temperature T. In the case of isothermal reactors, T is provided as one of the inputs \(\varvec{u}\). Conversely, for adiabatic systems like the 1-\(\textrm{D}\) test case scenario considered in this work, T is predicted by a separate DeepONet. This additional transformation layer, referred to as the Boltzmann layer in the introductory section, enforces local equilibrium distributions between states in the same group by construction. Moreover, it positively impacts the regularization of the network by providing a physically consistent prior distribution to anchor the network parameters, specifically a zero-valued distribution, which can be effectively regulated using \(L^2\) regularization. This ensures that the surrogate predictions remain closely aligned with the known reference equilibrium state, preventing excessive divergence and enhancing the robustness of the surrogate. It is worth highlighting that during the joint training process, all the parameters of CG-DeepONet\(^{\left( 1,1\right) }\) are re-trained together with the ones of CG-DeepONet\(^{\left( 2,1\right) }\), rather than being kept frozen. This is performed by employing fine-tuning transfer learning with L\(^1\)-SP and L\(^2\)-SP regularization as described in reference113.

  • Faster Timescales. It is possible to increase the accuracy of the CG model by further splitting the energy space into a higher number of clusters. Therefore, by sequentially repeating the same procedure that has been done for augmenting the model from Timescale 1 to Timescale 2, we can construct a surrogate that can predict the dynamics of high-resolution CG models. In our case, we further split each bin into three more bins, obtaining first a 3-group CG modeling for Timescale 2, then a 9-group CG modeling for Timescale 3, and finally a 27-group CG modeling for Timescale 4. We treat each group’s triplet with a single DeepONet, and we apply the EquilSoftmax layer at the output of each entire timescale block. As explained in the previous paragraph, the predicted mass fraction of each macro-group multiplies the distribution of the corresponding three micro-groups, obtaining a hierarchical surrogate for multi-scale coarse-grained dynamics, as shown in Fig. 3c.

Figure 3
figure 3

Hierarchical surrogate for multi-scale coarse-grained dynamics. (a) Schematics of the network architecture for 1-group CG master equations (CGME1). (b) Schematics of the network architecture for 3-groups CG master equations with the additional EquilSoftmax layer. (c) Schematics of the complete hierarchical network architecture. (d) O\(_2\) rovibrational distribution for 1-group (CGME1), 3-groups (CGME3), 9-groups (CGME9), and 27-groups (CGME27) coarse-grained grouping.

Training strategy

Physics-informed neural networks (PINNs)76 can integrate data and physical governing laws by adding PDE residuals to the loss function of neural networks by relying on automatic differentiation. This capability can also be incorporated into the DeepONet framework (physics-informed DeepONet or PI-DeepONet)82,83. Specifically, the following composite loss function is minimized to train the network parameters, \(\varvec{\theta }\):

$$\begin{aligned} {\mathscr {L}}(\varvec{\theta })=\lambda _{d}{\mathscr {L}}_{d}(\varvec{\theta })+\lambda _{r}{\mathscr {L}}_{r}(\varvec{\theta }) +\lambda _{ic}{\mathscr {L}}_{ic}(\varvec{\theta })+\Lambda (\varvec{\theta })\;, \end{aligned}$$
(15)

where \({\mathscr {L}}_{d}(\varvec{\theta })\) is computed based on the discrepancy between predicted and given data points, \({\mathscr {L}}_{r}(\varvec{\theta })\) is the residual loss, \({\mathscr {L}}_{ic}(\varvec{\theta })\) is the loss over the initial conditions of the 0-D reactor, and \(\Lambda (\varvec{\theta })\) contains the \(L^1\) and \(L^2\) regularization loss. These terms can be expressed as follows:

$$\begin{aligned} {\mathscr {L}}_{d}(\varvec{\theta })&=\frac{1}{N_{d}}\sum _{i=1}^{N_{d}}\ell \left( \varvec{Y}\left( \varvec{u}^i_d,t^i_d\right) ,\varvec{{\widehat{Y}}}\left( \varvec{u}^i_d,t^i_d\right) \right)\;, \end{aligned}$$
(16)
$$\begin{aligned} {\mathscr {L}}_{r}(\varvec{\theta })&=\frac{1}{N_{r}}\sum _{i=1}^{N_{r}}r \left( t^i_r,\rho ^i_r,\varvec{{\widehat{Y}}}\left( \varvec{u}^i_r,t^i_r\right) ,T^i_r\right)\;, \end{aligned}$$
(17)
$$\begin{aligned} {\mathscr {L}}_{ic}(\varvec{\theta })&=\frac{1}{N_{ic}}\sum _{i=1}^{N_{ic}}\ell \left( \varvec{Y}\left( \varvec{u}^i_{ic},0\right) , \varvec{{\widehat{Y}}}\left( \varvec{u}^i_{ic},0\right) \right)\;, \end{aligned}$$
(18)

where \(N_{d}\), \(N_{r}\), and \(N_{ic}\) denote the batch sizes of the training data. \(\varvec{Y}\) are the exact mass fraction values from direct numerical simulation of the CG master equations (CGME), whereas \(\varvec{{\widehat{Y}}}\) are the predicted ones from the surrogate. The parameters \(\lambda _d\), \(\lambda _r\), and \(\lambda _{ic}\) correspond to weight coefficients in the loss function that can effectively assign a different learning rate to each loss term. In this study, the error function \(\ell\) is expressed as follows:

$$\begin{aligned} \ell \left( \varvec{Y},\varvec{{\widehat{Y}}}\right) =\frac{1}{|\varvec{{\widehat{Y}}}|}\left\| \frac{\ln \left( \varvec{Y}\right) -\ln \left( \varvec{{\widehat{Y}}}\right) }{\ln \left( \varvec{Y}\right) }\right\| _1\;, \end{aligned}$$
(19)

while the residual \(r\in {\mathbb {R}}\) is

$$\begin{aligned} r\left( t,\rho ,\varvec{{\widehat{Y}}},T\right) =\frac{1}{|\varvec{{\widehat{Y}}}|}\left\| \rho \frac{d\varvec{{\widehat{Y}}}}{dt} -\varvec{\Omega ^0}\left( \rho ,\varvec{{\widehat{Y}}},T\right) \right\| _1\;, \end{aligned}$$
(20)

with \(\varvec{\Omega ^0}\) being the right hand side of Eq. (3).

Given the hierarchical structure of the proposed surrogate, the parameters of the entire network are trained by adopting a multi-step procedure:

  1. i.

    Fully data-driven optimizations. In this first step, the surrogate is trained sequentially from the slowest to the fastest timescale with only anchor and ICs points \(\left( \lambda _d=1,\,\lambda _r=0,\,\lambda _{ic}=1\right)\) obtained from the numerical solution of the coarse-grained master equations:

    1. a)

      Training only Timescale 1 with data generated by solving CGME1;

    2. b)

      Training jointly Timescales 1-2 with data generated by solving CGME3;

    3. c)

      Training jointly Timescales 1-2-3 with data generated by solving CGME9;

    4. d)

      Training jointly Timescales 1-2-3-4 with data generated by solving CGME27.

    At each training step, the knowledge acquired from the previous iterations is preserved and used as a prior by employing fine-tuning transfer learning with L\(^1\)-SP and L\(^2\)-SP regularization as described in reference113. For instance, in step (b), the calibrated weights for Timescale 1 from step (a) are kept and finely retrained with the newly initialized parameters of Timescale 2.

  2. ii.

    Hybrid physics-informed and data-driven optimization. The governing equations describing the CGME27 model are now enforced in the trained surrogate from step (i.d) using the hybrid loss formulation shown in Eq. (15). The weight coefficients \(\lambda _i\) are automatically tuned using the learning rate annealing technique described in reference114. The tuning procedure involves balancing the gradients of different loss terms during back-propagation using \(\lambda _i\) as a re-scaling factor of the learning rate corresponding to each loss term. This technique ensures that the model’s parameters are updated in a balanced manner, giving equal importance to all the loss terms. The complete training history of the parameter values \(\lambda _i\) can be found in the Supplementary Sect. S.2.2.3.

The decision to incorporate the residual loss only in the final step is intended to accelerate the training of the entire surrogate. Data from numerical simulations serves as anchor points for frequent or commonly seen scenarios, while the residual of the governing laws ensures the model’s ability to generalize to different, unseen physical conditions.

Adaptive pruning technique

Flow simulations are often characterized by regions of strong and weak non-equilibrium conditions of the gas. When the extent of non-equilibrium is large, the highest resolution is needed to resolve all the physical processes accurately. However, there are conditions for which the fine scales (or micro-groups) corresponding to the highest resolution CG model are in equilibrium with other neighboring groups or states. For these cases, adding resolution penalizes the computational efficiency rather than improving the model’s accuracy. In fact, under these conditions, the population distribution can be approximated with a Boltzmann distribution, and the low-fidelity CG model can accurately resolve their dynamics. Figure 4 illustrates the concept described above, where all the reconstructed low-lying energy states from different coarse-grained (CG) models are considered to be in equilibrium. As a result, it is sufficient to predict the values of the first group of the CGME3 model, without needing to resolve all the timescales.

Figure 4
figure 4

Example of the adaptive strategy. (a) O\(_2\) QSS rovibrational distribution for 1-group (CGME1), 3-groups (CGME3), 9-groups (CGME9), and 27-groups (CGME27) coarse-graining. The dashed ovals identify those CG high-resolution groups that can be accurately reconstructed from the low-resolution ones. (b) Exploded view of the groups’ graph. The opaque dots represent the Boltzmann-reconstructed groups that correspond to the ovals in (a) and do not require evaluations of CG-DeepONets’ high-resolution components.

These observations indicate the need to introduce a controller in the algorithm that accurately determines the resolution level needed to describe the dynamics of the system, without explicitly computing unnecessary fine scales. In the following, the design procedure for the additional controller-acting surrogate is firstly outlined, including the definition of the control variable and the network architecture. Subsequently, the adaptive inference technique is described, which involves the dynamic pruning of unnecessary nodes in the CG-DeepONets hierarchical architecture. This online pruning process enhances computational efficiency by selectively skipping the evaluation of specific nodes based on the local thermochemical state of the gas.

  • Physically-relevant non-equilibrium control variable. First, defining a metric that can quantify the physical information lost due to the coarse-graining procedure is crucial. This work employs the Euclidean distance between the Boltzmann reconstructed states of the highest resolution CG model available (i.e., Timescale 4) and the remaining low-fidelity ones. Since only the zeroth-order moment of the master equations is considered, the bin-specific coefficient \(\alpha\) in Eq. (2) is selected to construct our metric, which can be expressed as follows:

    $$\begin{aligned} \delta ^{\left( ts,\cdot ,P\right) }=\frac{1}{{\mathscr {N}}_p} \sum _{p,{\mathscr {I}}^{\left( 4,\cdot ,p\right) }\subset {\mathscr {I}}^{\left( ts,\cdot ,P\right) }} \left( \alpha ^{\left( ts,\cdot ,P\right) }-\alpha ^{\left( 4,\cdot ,p\right) }\right) ^2\;, \end{aligned}$$
    (21)

    where ts and P (or p) refer to the timescale and its specific group, respectively. Equation (21) involves the computation of the difference between the offsets of the log-linear Boltzmann distribution functions described in Eq. (2). The sum in Eq. (21) is performed over all the \({\mathscr {N}}_p\) micro-groups of Timescale 4 that belong to the macro-group P of timescale ts. Figure 5a provides a visual intuition of Eq. (21) for the first CGME3-group, which consists of the sum of the drawn dashed black lines. We briefly mention that other options for constructing the metric could have relied on the Kullback-Leibler divergence computed between population or energy distributions at the different temporal scales.

  • Controller-acting surrogate architecture. Given the defined metric, the design of the non-equilibrium controller-acting surrogate requires a specific architecture. To maintain consistency with the coarse-grained operator network described in Section “Neural operators”, we again leverage the multi-scale connotation of the physical problem by separately modeling the underpredicted non-equilibrium values for each CG low-fidelity model, as illustrated in Fig. 5b. An exponential transformation is applied to the surrogate outputs, and a single DeepONet is used for each triplet of values, following a similar approach as used for the CG-DeepONets. More details can be found in the Supplementary Sect. S.2.3.

  • Physics-driven online pruning. The composition of coarse-grained deep operator networks (CG-DeepONets) and non-equilibrium controller-acting DeepONets (Neq-DeepONets) allows the development of a technique that, given IC and time instant, adaptively predicts the groups’ distribution with the highest accuracy and lowest computational cost possible. This technique can be summarized as a two-step procedure which goes as follows:

    1. i.

      The first step involves querying the Neq-DeepONets to obtain the non-equilibrium control variable \(\delta\) for each CG resolution level. This variable reflects the inaccuracy of the low-fidelity CG models in describing the non-equilibrium state of the gas at the upcoming time instant.

    2. ii.

      The predicted \(\delta\) is then compared with a user-chosen tolerance level, \(\delta _{\text {tol}}\). If the predicted value is lower than the tolerance, the resolution level of the specific low-fidelity CG model is deemed sufficient to accurately represent the reactor dynamics. In such a case, the leaf nodes of the corresponding dependent tree in the CG-DeepONets model are temporarily pruned and not evaluated, as exemplified in Fig. 4b.

    At this point, we highlight the twofold advantage of CG-DeepONets’ hierarchical structure. In fact, other than simplifying the training stage, the presence of the controller boosts the inference phase, as the surrogate relies only on the CG-DeepONets’ components that are truly required to characterize the non-equilibrium distributions. The details of the adaptive algorithm are presented in the Supplementary Sect. S.2.3.

Figure 5
figure 5

Adaptive inference design. (a) Euclidean distance metric, \(\delta\), used to quantify the physical information lost due to the equilibrium assumption imposed in a too-large subspace in the energy phase. \(\alpha\) represents the zeroth-order term, i.e., the offset of the log-linear Boltzmann distribution function defined in Eq. (2). (b) Schematics of the multi-scale network architecture of the controller-acting surrogate responsible for adapting the required coarse-grained model resolution based on the local flow conditions.

Results

The framework discussed in the previous sections is used to construct a surrogate for an ideal chemical reactor. The first part of this section provides the details of the training and testing of the surrogate in isothermal 0-D scenarios, demonstrating its ability to learn the differential operator governing the physics of the reactor. The surrogate’s predictions are then compared against the solutions obtained from the numerical integration of the governing equations. Observables such as time-resolved distributions and its moments, including densities and energies, are employed for evaluation. Furthermore, details regarding the adaptive technique and a preliminary analysis of computational savings are provided. At the end of the section, the results of the one-dimensional numerical experiment are analyzed in terms of surrogate accuracy and performance.

Inference

As explained in Section “Physical modeling”, different initial conditions have been uniformly sampled from Table 1 to train and test the proposed ML framework. Figure 6 shows the broad ranges of the space of ICs for pressure, \(P_{_0}\), molar fraction of atomic oxygen, \(X_{\text {O}_0}\), and internal temperature, \(T_{int_0}\). A fourth dimension should be considered since the translational temperature of the reactor, T, also varies. In Fig. 6, the red dots represent unseen test scenarios, whereas the black crosses represent the training points.

Figure 6
figure 6

Space of initial conditions. The black crosses represent the set of training points, while the red dots identify the testing data set. Note: the figure is missing the last fourth dimension in the space of initial conditions, i.e., the translational temperature of the reactor, T.

Figure 7a compares the exact solution computed by the numerical integrator and the surrogate’s predictions for one unseen scenario taken from the test data set in Fig. 6. The isolated blue line represents the evolution of the atomic oxygen taken from Timescale 1. In contrast, the others describe the dynamics of the 27 rovibrational energy-based groups predicted by Timescale 4. The inference has been performed by querying the CG-DeepONet based on the vector of time instants generated from the numerical integrator and the given initial conditions, defined by \(\left\{ \left[ T,\,\rho ,\,\varvec{Y}_{{\text {O}_2}_{0}}\right] ,\,t_k\right\} ^{M}_{k=1}\), with M the number of evaluation points.

Figure 7
figure 7

Inferred solution. (a) Reference solution (solid line) versus prediction from the trained model (dashed line with markers). The isolated blue line represents the evolution of the atomic oxygen, while the remaining are the 27 groups of O\(_2\) in ascending order of energy content per particle (top-down). Initial conditions used: \(P_{_0}=3\,617.24\text { Pa}\), \(X_{\text {O}_0}=0.4996\), \(T_{int_0}=4\,333.16\text { K}\), \(T=9\,742.13\text { K}\). (b) Predicted O\(_2\) rovibrational states distribution at three different time instants taken from the dynamics shown in (a).

From Fig. 7a, it can be observed that the predicted and exact solutions show excellent agreement. This indicates that the trained model is capable of accurate predictions for different and unseen initial conditions (additional test cases are presented in the Supplementary Sect. S.2.2.2). Negligible discrepancies can be noticed in various regions of the dynamics of the heat bath, which can be improved by further refining the trained model. To the author’s best knowledge, this work provides the first application of PI-DeepONets to a dynamical system containing many such degrees of freedom. The main reason for such good surrogation of the dynamics is that the hierarchical structure of the proposed deep learning framework embodies the multi-scale connotations of the problem, showing higher accuracy and robustness compared to a vanilla DeepONet architecture (details provided in the Supplementary Sect. S.2.1.1). The micro-groups inside each macro-groups equilibrate faster between each other than with other ones outside it. For this reason, they show very similar behavior in their dynamics, which can be captured by the few modes discovered by the shared trunk. This aspect facilitates reaching high levels of accuracy with a relatively small number of network parameters. Indeed, the surrogate correctly predicts the dynamics of almost thirty species spanning a wide range of orders of magnitude (around 12) in mass fractions values. Additionally, to expand the initial conditions’ space even further by keeping such a high accuracy level and relatively small network architecture, one could consider constructing multiple surrogates. Each of these surrogates can be built with the same architecture but specialized for a local sub-domain in the space of the initial conditions.

Accuracy

The relative \(L^2\)-norm has been used as the error metric to evaluate the accuracy of the surrogate, consistently with reference82. In particular, the employed test error corresponds to the mean relative error of the surrogate’s predictions for Timescale 4 over all the examples in the test data set:

$$\begin{aligned} \varepsilon ^{(j)}=\frac{1}{N} \sum _{i=1}^{N} \dfrac{\left\| {\widehat{Y}}_i^{(4,\cdot ,j)}(t) - Y^{(4,\cdot ,j)}_i(t)\right\| _2}{\left\| Y^{(4,\cdot ,j)}_i(t)\right\| _2} \qquad \text{for}\hspace{2mm}j\in \{1,\dots ,{\mathscr {N}_\mathscr {G}}\}\;, \end{aligned}$$
(22)

where \({\mathscr {N}_\mathscr {G}}=27\) represents the number of groups, \(N=100\) denotes the number of testing cases, and t represents a set of log-uniformly spaced points in the time domain. For this analysis, \(1\,000\) points in time have been sampled from each testing scenario. The four highest errors of the inferred solution are presented in table 3. Once again, the reported values confirm the excellent agreement between the numerically integrated master equations and the predicted solutions, with a maximum relative \(L^2\)-norm error of approximately 4.5%.

Table 3 Test error. The four highest mean relative \(L^2\)-norm testing errors (with standard deviations) of the trained model for Timescale 4.

Surrogate predictions vs. numerically-integrated thermochemical models

Figure 8
figure 8

Surrogate predictions compared to numerically-integrated thermochemical models. \(\circ\) Black dashed line: state-to-state exact solution \(\circ\) Blue line with markers: coarse-grained exact solution \(\circ\) Blue dashed line with markers: coarse-grained surrogate predictions \(\circ\) Orange line with markers: Park’s two-temperature model with QSS approach5 \(\circ\) Orange dashed line with markers: Park’s two-temperature model with kinetics from reference102. Initial conditions used: \(P_{_0}=3\,000\text { Pa}\), \(X_{\text {O}_0}=0.2\), \(T_{int_0}=1000\text { K}\), \(T=10\,000\text { K}\). (a) Evolution of total mass fraction of O\(_2\). (b) Weighted rovibrational energy evolution per particle of O\(_2\).

To demonstrate the level of physical accuracy of the coarse-grained surrogate discussed in this study, a comparison is made against the reference CG solution, the high-fidelity state-to-state solution, and the computationally cheaper two-temperature model of Park, which is a specific case of the multi-temperature models described in Section “Physical modeling”. The exact CG, StS, and Park’s solutions have been computed with traditional numerical integrators. In Fig. 8, two different approaches are considered for Park’s model, one employing the less accurate but still widely used kinetics from reference102, derived from empirical methods or experimental data, and the other using the more recent QSS approach5, whose kinetic database is directly computed from state-to-state calculations. Figure 8 shows the evolution of the total mass fraction and internal energy content per particle of O\(_2\) for the different models considered. It is evident from the figure that the coarse-grained grouping strategy employed in this work provides the closest solution to state-to-state modeling. Only Timescale 1 (or CG-DeepONet\(^{\left( 1,1\right) }\)) of the proposed surrogate has been queried to produce the evolution of the total mass fraction of O\(_2\) shown in Fig. 8a, which is in excellent agreement with the numerically-integrated CG solution. This is because CG-DeepONet\(^{\left( 1,1\right) }\) implicitly contains all the information about the energy transfer processes between the 27 groups, as it has been trained with data from the integration of CGME27. However, while using only Timescale 1 is sufficient for accurately predicting the dynamics of the total mass fraction of the reactor species, the same approach may not be accurate for predicting the total internal energy content of the molecule. This is because CG-DeepONet\(^{\left( 1,1\right) }\) is specifically designed to model only the zeroth-order moment of the master equations and may not capture higher-order moments, such as the total internal energy content, with sufficient accuracy. Therefore, this quantity generally requires the evaluation of the overall surrogate, which includes the low-scale components CG-DeepONet\(^{\left( 2:4,:\right) }\). The discrepancy between the CG surrogate’s predictions and the StS numerical solution in Fig. 8 is almost exclusively determined by the physical simplifications made by the CG model. In particular, the energy difference that can be noticed at the initial time instants is caused by the fact that the reconstructed states within each bin follow a Boltzmann distribution at the translational temperature T (for the assumptions made in Section “Physical modeling”). In contrast, the quantum energy levels for the StS solution follow a distribution at temperature \(T_{int_0}\).

The proposed hierarchical architecture could be upgraded to model higher-order moments of the master equations. This improvement could involve replicating the same architecture as the CG-DeepONets to model the internal energy content of every single bin. Consequently, CG-DeepONet\(^{\left( 1,1\right) }\) could correctly predict both zeroth-, i.e., total mass, and first-order moment, i.e., internal energy, of O\(_2\). In such a case, the low-scale components CG-DeepONet\(^{\left( 2:4,:\right) }\) would not be required to predict the solution shown in Fig. 8b, but they might still be necessary for providing the correct distribution function of the quantum energy states when considering other physical phenomena, such as radiation.

Adaptive inference

The advantage of the hierarchical architecture proposed in this work is the ability to tailor the model complexity to the specific localized flow conditions to obtain a computationally efficient yet accurate physical model. Figure S4 in the Supplementary Information shows an example of the dynamics of underpredicted non-equilibrium Euclidean metric computed via Eq. (21) for Timescale 1 and Timescale 3 for the same test case shown in Fig. 9. The values plotted in Fig. S4 can be considered a good reference for the space the proposed metric can span, as the analyzed test case exhibits considerable initial thermal and chemical non-equilibrium. It should be noted that the values of \(\delta ^{\left( 1,1,1\right) }\) reported in Fig. S4a are almost an order of magnitude larger than Fig. S4b due to the more accurate modeling adopted in the latter. Overall, the trend is decreasing by approaching the equilibrium, except for the evident QSS region starting around 10\(^{-6}\) s, where all the quantities remain constant. \(\delta ^{\left( 3,2,6\right) }\) shows an interesting behavior in Fig. S4b, which corresponds to the sixth group of the 9-groups rovibrational energy-based coarse-grained grouping strategy for Timescale 3, the one close to the dissociation energy (5.115 eV). By observing the highly non-equilibrium StS dynamics at QSS of the states in this group (e.g., Fig. 2), it is clear that the highest resolution possible is necessary for that region of the energy space to model the dynamics of those states accurately5.

Figure 9
figure 9

Exact vs. adaptive solution for different \(\delta _{\text {tol}}\). Exact solution (solid line) versus prediction from the trained surrogate (dashed line with markers) using the adaptive technique. The isolated blue line represents the evolution of the atomic oxygen, while the remaining are the 27 groups of O\(_2\) in ascending order of energy content per particle (top-down). Initial conditions used: \(P_{_0}=3\,000\text { Pa}\), \(X_{\text {O}_0}=0.2\), \(T_{int_0}=3\,500\text { K}\), \(T=8\,000\text { K}\). (a) Underpredicted non-equilibrium tolerance used: \(\delta _{\text {tol}}=0.1\). (b) Underpredicted non-equilibrium tolerance used: \(\delta _{\text {tol}}=0.5\).

The solution obtained with the adaptive technique is compared with the exact one in Fig. 9 for two different values of the underpredicted non-equilibrium metric tolerance, \(\delta _{\text {tol}}\). This value acts as a discriminant for assuming equilibrium inside each macro-group for all the timescales modeled. For \(\delta _{\text {tol}}=0.1\), the adaptation starts playing effect just before the QSS region, as can also be deduced from Fig. S4a, whereas for \(\delta _{\text {tol}}=0.5\), it already acts at the beginning of the dynamics. We can assert that for a value of \(\delta _{\text {tol}}=0.1\), the solution looks very similar to the exact one, supporting the effectiveness of the adaptive technique in terms of physical accuracy. The adaptive solutions shown in Fig. 9 have been obtained by solving the number of groups dictated by the respective \(\delta\) reported in Fig. 10a as functions of time. From Fig. 10a, it is evident that the number of the solved groups decreases considerably by increasing the tolerance value, confirming the validity of the proposed adaptive technique. As already demonstrated in the previous section, the prediction of the total mass fraction of O\(_2\) is independent of the tolerance used since our model has been trained such that even the lowest-fidelity coarse-grained model can correctly predict the actual mass of the reactor species. However, in the case of energy, the choice of the proper tolerance can play an essential role in predicting its correct value, as shown in Fig. 10b.

Figure 10
figure 10

Adaptive solution for different \(\delta _{\text {tol}}\). The evolution of the total mass fraction of O\(_2\) is not reported here, as the model accurately predicts the solution regardless of the tolerance value used. (a) Total number of actually modeled O\(_2\) groups. (b) Weighted sum of the grouped-specific O\(_2\) rovibrational energy, with weights given by the groups’ distribution. (c) Computational cost comparison between the adaptive technique and the standalone CG-DeepONet model for single-time instant evaluation. The numbers on top of the bars are the corresponding number of groups, also shown in (a).

Figure 10c presents a preliminary performance analysis of the adaptive technique for the different tolerance values based on a comparison with the standalone CG-DeepONet model. The reported timings are obtained as the mean of 1000 different inference evaluations of the model per each physical time instant, conducted with a single central processing unit (CPU) core. The computations shown in Fig. 10c have been performed in the TensorFlow115 environment, which means that a large part of the network evaluation time involves Python call overhead. The bar plot illustrates that the adaptive technique outperforms the standalone surrogate at later stages of the system’s dynamical evolution, particularly when the composition approaches the asymptotic equilibrium value. The opaque bar chunks in Fig. 10c represent the contribution to the inference cost due to the Neq-DeepONets surrogate. A great advantage of this methodology is also its flexibility, as computational costs and physical accuracy can be easily balanced by tuning the tolerance value, \(\delta _{\text {tol}}\). Moreover, inference with physics-informed DeepONets is trivially parallelizable with graphics processing units (GPUs), which can remarkably boost the inference timings shown in Fig. 10c. Wang et al.82,83 have already demonstrated that PI-DeepONets can outperform and replace conventional numerical solvers even for long-time integration.

One-dimensional shock case scenario

In this section, preliminary results of a one-dimensional numerical experiment are presented, where the constructed surrogate is tested both with and without the adaptive technique.

Figure 11a,b present the final temperature and mass fraction profiles in the shock reference frame for the test case scenario described in Sect. 2.1. In both figures, the exact solution obtained using a thermochemical library is represented with black dashed lines, while the solution obtained using the surrogate without adaptation and employing adaptive inference with tolerance values of \(\delta _{\text {tol}}=0.01\) and \(\delta _{\text {tol}}=0.05\) are represented by blue, orange, and green lines, respectively. The integration using the surrogate produces physically correct solutions, with the largest differences noticed at the tail of the temperature profile, in particular when the tolerance value is high. As already explained in the previous section and demonstrated in Fig. 10b, the reason for these small discrepancies is due to the incorrect predictions of internal energy, which can result in incorrect temperature profiles while the conservation equation for total energy is integrated in time. The reconstructed microscopic distribution is also presented in Fig. 11c, showing a good agreement of the surrogate predictions with and without adaptation compared to the numerically integrated solution.

Figure 11
figure 11

One-dimensional shock solution. Comparison between exact and predicted final solutions in the shock reference frame, \(x_s\). \(\circ\) Black dashed line: exact solution \(\circ\) Blue line: surrogate predictions without adaptation \(\circ\) Orange line: adaptive surrogate predictions with \(\delta _{\text {tol}}=0.01\) \(\circ\) Green line: adaptive surrogate predictions with \(\delta _{\text {tol}}=0.05\). (a) Translational temperature. (b) Total mass fractions of O\(_2\) and O. (c) Reconstructed O\(_2\) rovibrational states distribution at \(x_s=5\times 10^{-4}\) m and corresponding Boltzmann equilibrium distribution function.

Figure 12a,b provide a preliminary performance analysis of surrogate inference with and without adaptation. The timings are computed by evaluating only the integration time for the reactive step in Eq. (9) using a single CPU core within Fortran 2008 environment. The corresponding statistics, i.e., mean and standard deviation, are calculated over 500 iterations and averaged over the number of cells in the 1-\(\textrm{D}\) domain. The speedup statistics are then obtained using the formula proposed by Díaz and Rubio116, which approximates the ratio of two independent normal random variables with a normal distribution. In Fig. 12a, the speedup of the standalone surrogate is presented as a function of time step, \(\Delta t\), which has been varied by changing only the number of cells and keeping everything else fixed. The surrogate inference is at least eight times faster than the serial integration performed with a conventional implicit scheme, in this case, the second-order backward differentiation formula (BDF2). Furthermore, the maximum speedup is reached when the integration time is much longer, which is expected since the integrator may need more steps to reach the final time, unlike the surrogate inference, which is independent of the total integration time. The computed speedup depends on various factors, such as the dimension of the network, the stiffness associated with the system of equations, the scheme and tolerances used for the ODEs integration, and the length of the integrated physical time. All these details for this particular test can be found in the Supplementary Sect. S.3. In Fig. 12b, a comparison is shown between the varying speedup with \(\delta _{\text {tol}}\) obtained with the adaptive inference technique (light blue) and the constant one obtained with the standalone surrogate (light orange) for \(\Delta t=1.33\times 10^{-7}\) s. As expected, increasing the tolerance values leads to higher speedups, which is consistent with the reported timings in Fig. 10c. However, this comes at a cost of reduced accuracy, as shown in Fig. 12c, which presents the increasing mean relative error for temperature and total mass fraction of O\(_2\) with increasing \(\delta _{\text {tol}}\). The reference error values for the surrogate without adaptation are \(\varepsilon _{Y_{\text {O}_2}}=0.93\%\) and \(\varepsilon _T=0.58\%\). It is noteworthy that the computation of the error does not include points in the domain where the gas experiences the left or right equilibrium thermochemical states, as the surrogate predictions are not considered in those regions. The increasing error is again related to the inaccurate prediction of internal energy, as observed in the previous analysis of the temperature profile in Fig. 11a, and it may be exacerbated by the error accumulation issue, also shown by Zanardi et al.104 This highlights the importance of upgrading the surrogate to also model the internal energy content of each individual bin, as it can lead to improved accuracy in terms of the macroscopic quantities of interest. Nevertheless, this approach holds promise when scaled to multi-dimensional CFD simulations with millions of unknowns. For example, in hypersonic simulations, most domain points may lie in the equilibrium or near-equilibrium regions, while only a few points may be in strong non-equilibrium regions (such as shock proximity) where the evaluation of the entire surrogate is needed. In light of these considerations and the performance analysis performed, the adaptive technique has the potential to outperform the standalone model in a multi-dimensional simulation framework.

Figure 12
figure 12

Surrogate performances and accuracy in one-dimensional simulation. (a) Speedup achieved using the surrogate without adaptation with increasing \(\Delta t\), which is equivalent to reducing the number of cells while maintaining a constant CFL. (b) Speedup achieved using adaptive surrogate inference with increasing \(\delta _{\text {tol}}\). (c) Mean relative percentage error for \(Y_{\text {O}_2}\) and T as a function of \(\delta _{\text {tol}}\). The constant reference values for the surrogate without adaptation are \(\varepsilon _{Y_{\text {O}_2}}=0.93\%\) and \(\varepsilon _T=0.58\%\).

Conclusions

We proposed a new machine learning-based paradigm inspired and constrained by physical laws for solving multiscale non-equilibrium flows. The designed model (CG-DeepONet) sequentially learned the integral solution operator for multi-fidelity coarse-grained master equations by employing a physics-inspired hierarchical architecture, where physics-informed DeepONet (PI-DeepONet) represents the core element. Furthermore, we developed a controller-acting surrogate (Neq-DeepONet) to learn the dynamics of the underpredicted degree of non-equilibrium to tailor the model’s accuracy to the local non-equilibrium conditions. Finally, by combining the two, we designed a novel adaptive pruning inference technique for non-equilibrium thermochemical processes, which showed flexibility in balancing accuracy and computational cost.

Overall, the proposed framework incorporates different key elements that enforce the underlying physics into the surrogate: (i) the physics-based dimensionality reduction in the state space; (ii) the additional layers enforcing the Boltzmann distribution functions, which in turn allow the imposition of prior distributions for the network parameters. When propagated to the state populations (e.g., mass fractions), such priors provide physically consistent solutions even when the surrogate is not trained (i.e., equilibrium distributions); (iii) the physics-informed loss; (iv) the hierarchical architecture and the related sequential fine-tuning transfer learning between different time scales, with mass conservation enforced; (v) the online pruning of the surrogate at the prediction phase through a parsimony-based approach that relies on an additional controller-acting surrogate informed by a non-equilibrium variable.

The methodology was applied to the study of chemical kinetics relevant for application to hypersonic flight and was tested on oxygen mixtures. However, the framework is not constrained to the chosen thermochemical configuration, but it can be extended to Air-5 mixtures (i.e., simultaneously with N\(_2\), O\(_2\), NO, N, and O species) or even other fields of physics spanning a wide range of temporal scales, such as electromagnetism, magnetohydrodynamics, and more generally, plasma physics. The proposed framework was tested in 0-\(\textrm{D}\) and 1-\(\textrm{D}\) configurations, and the following results were obtained:

  • In 0-\(\textrm{D}\) scenarios, the CG-DeepONet surrogate alone showed excellent physical accuracy compared to the numerical integration of the master equation, with a maximum relative error of 4.5%. It also exhibited good computational efficiency when the adaptive method was used, gaining more than 3X speedup in the regions of weak non-equilibrium.

  • The 1-\(\textrm{D}\) numerical experiment demonstrated the flexibility of the proposed method in capturing complex dynamics and confirmed the good performances and accuracy of both standalone and adaptive versions of the constructed surrogate. The relative error was in the range of 1–4.5% with a corresponding 8X-13X speedup compared to conventional implicit schemes employed in an operator-splitting integration framework. As expected, the choice of high tolerances for the adaptive schemes and the consequent lack of degrees of freedom in characterizing the rovibrational distribution generated error accumulations in the predictions of the overall O\(_2\) internal energy. In future work, we will treat the group temperatures as state variables together with the species mass fractions. This addition will have two benefits. Firstly, it will allow us to achieve comparable accuracy with fewer groups. Secondly, it will enable the accurate prediction of the O\(_2\) internal energy by relying only on the first scale (i.e., CG-DeepONet\(^{\left( 1,1\right) }\)), similar to what was achieved for the mass fractions (e.g., Fig. 8).

Future work will extend and test the framework to 2-\(\textrm{D}\) and 3-\(\textrm{D}\) simulations, leveraging its ability to be designed and constructed independently of geometric features of the problem. Additionally, alternative neural operator approaches other than DeepONets will be explored to mitigate the issue of error accumulation. Beyond the application and the numerical outcomes, this work serves as an example on how physics and machine learning can enhance each other, aiming for more interpretable and robust ML-based tools for the scientific community.