Abstract
The predictive capabilities of turbulent flow simulations, critical for aerodynamic design and weather prediction, hinge on the choice of turbulence models. The abundance of data from experiments and simulations and the advent of machine learning have provided a boost to turbulence modeling efforts. However, simulations of turbulent flows remain hindered by the inability of heuristics and supervised learning to model the near-wall dynamics. We address this challenge by introducing scientific multi-agent reinforcement learning (SciMARL) for the discovery of wall models for large-eddy simulations (LES). In SciMARL, discretization points also act as cooperating agents that learn to supply the LES closure model. The agents self-learn using limited data and generalize to extreme Reynolds numbers and previously unseen geometries. The present simulations reduce the computational cost by several orders of magnitude compared with fully resolved simulations while reproducing key flow quantities. We believe that SciMARL creates unprecedented capabilities for the simulation of turbulent flows.
Introduction
Simulations of wall-bounded turbulent flows have become a key element in the design cycle of wind farms^{1} and aircraft^{2}, and they are the major factor in the predictive capabilities of simulations of atmospheric flows^{3}. Because of the high Reynolds numbers associated with these flows, direct numerical simulations (DNS), where all scales of motion are resolved, are not attainable with current computing capabilities. LES aims to reduce the grid requirements by resolving only the energy-containing eddies and modeling the smaller-scale motions. However, even this requirement is hard to meet in the near-wall region, as the stress-producing eddies become progressively smaller, scaling linearly in size with the distance to the wall. Several studies^{4,5,6} have estimated that the number of grid points necessary for wall-resolved LES scales as \(\mathcal{O}(Re^{13/7})\), where Re is the characteristic Reynolds number of the flow. This number of computational elements is orders of magnitude smaller than that required for DNS, yet it remains prohibitive. In turn, if the near-wall flow is modeled such that only the large-scale motions in the outer region of the boundary layer are resolved, the grid-point requirement for wall-modeled LES (WMLES) scales at most as \(\mathcal{O}(Re)\). With WMLES, certification by analysis (prediction of the aerodynamic quantities of interest for engineering applications by numerical simulations alone) may soon be a reality. Certification by analysis is expected to reduce the number of wind tunnel experiments, shortening the turnaround time and lowering the cost of the design cycle.
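To make the gap between the two scalings concrete, the short sketch below compares the power laws quoted above. It assumes that the (unknown) prefactors of the wall-resolved and wall-modeled estimates are equal, so only the ratio \(Re^{13/7}/Re = Re^{6/7}\) is meaningful. The function name is illustrative.

```python
# Compare the grid-point scalings quoted in the text:
#   wall-resolved LES ~ O(Re^{13/7}),  wall-modeled LES ~ O(Re).
# Prefactors are unknown and assumed equal, so only the ratio is meaningful.
def wrles_to_wmles_ratio(re: float) -> float:
    """Ratio Re^{13/7} / Re = Re^{6/7}."""
    return re ** (13.0 / 7.0) / re


for re in (1e4, 1e5, 1e6):
    print(f"Re = {re:.0e}: WRLES/WMLES grid ratio ~ {wrles_to_wmles_ratio(re):.1e}")
```

At \(Re = 10^{6}\), the ratio is roughly \(10^{5}\). This is the scale of savings that motivates wall modeling.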
Several strategies for modeling the near-wall region have been explored^{7,8,9,10}. WMLES approaches can be broadly categorized into hybrid LES/RANS methods and wall-flux modeling. Hybrid LES/RANS and its variants^{8} combine the Reynolds-averaged Navier–Stokes (RANS) equations close to the wall with LES in the outer layer. The interface between the RANS and LES domains is enforced implicitly through the change in the turbulence model. In wall-flux modeling, the usual no-slip and thermal wall boundary conditions are replaced with stress and heat-flux boundary conditions provided by the wall model. Well-known approaches compute the wall stress using either the law of the wall^{11,12,13} or the RANS equations^{14,15,16,17,18,19,20}. The RANS-based models either account for nonlinear advection and pressure gradients by solving the unsteady three-dimensional RANS equations^{15,17}, or retain only wall-normal diffusion. The latter reduces the computational requirements to the solution of a system of ordinary differential equations^{19,20}.
The main impediment of the above-mentioned models is that they rely on RANS parametrizations that require a priori empirical coefficients calibrated for a particular flow state (usually fully developed equilibrium turbulence over a flat plate). Such wall models do not function as intended in real-world applications, where various flow states coexist (e.g., separated flows, flow over roughness, transition)^{7}. The reliance on RANS parametrization for wall modeling was challenged by dynamic wall models that are free of a priori specified coefficients and come at a negligible additional cost^{21,22}. These two approaches were formulated by requiring consistency between the filtered velocity field at the wall and a differential filter kernel.
Dynamic wall models provide encouraging results, but they also face significant challenges. They are robust to changes in Reynolds number and grid resolution but sensitive to the numerical methods employed in the flow solver and to the choice of the subgrid-scale (SGS) model. This sensitivity is attributed to the dominance of numerical errors close to the wall, which in turn affect the evaluation of the necessary wall-model constants^{23}. Furthermore, the methodology has so far been applied only to structured, incompressible flow solvers, with limited applicability to compressible flows or complex geometries.
The essential requirements for a successful dynamic wall model are that it (i) accommodates diverse flow solvers and SGS models and (ii) generalizes beyond its calibration flow fields. Recent advances in machine learning and data science aim to address these issues and complement existing turbulence modeling approaches. To date, most efforts have focused on the application of supervised learning to SGS modeling^{24,25,26,27,28,29,30} and wall modeling^{31,32,33}. However, despite their demonstrated promise, these methods have difficulty generalizing beyond the distribution of the training data. In supervised learning, the parameters of the neural network are commonly derived by minimizing the model prediction error. To limit the computational cost, this error is often based on single-step target values. It is therefore necessary to differentiate between a priori and a posteriori testing. The former measures the accuracy of the supervised learning model in predicting the target values on a database of reference simulations, typically obtained via DNS. A posteriori testing is performed after training: the Navier–Stokes equations are integrated in time along with the trained closure, and the resulting statistical quantities are compared to those of DNS or other reference data. Because of the single-step cost function, the resulting neural network model is not trained to compensate for the systematic discrepancies between DNS and LES (or WMLES) or for compounding errors. The ill-conditioning of data-driven SGS models has been exposed by studies that perform a posteriori testing^{27,34,35,36}. Wall models are more sensitive than SGS models^{22}, and we expect the compounding of errors to play an even more detrimental role in WMLES.
Here, we propose SciMARL for the development of wall models in LES. Reinforcement learning (RL) identifies optimal strategies for agents that perform actions contingent on their information about the environment, and it measures their performance via scalar reward functions. In this work, the agents correspond to the computational elements. Their actions compensate both for the closure terms and for the errors associated with the numerics of the flow solver. RL is a semi-supervised learning framework with foundations in dynamic programming^{37}. It has a broad range of applications, including robotics^{38,39}, games^{40,41}, and, more recently, flow control^{39,42,43,44,45}. We note that SciMARL has been used in fluid mechanics only recently, for the development of SGS models in LES of homogeneous turbulence^{46}.
In the case of WMLES, the performance of SciMARL can be measured by comparing statistical properties of the simulation, such as the wall-shear stress, to those of reference data. SciMARL is a semi-supervised learning algorithm. It requires information about the flow formulated in terms of a reward, rather than the detailed spatiotemporal data required in supervised learning. For wall modeling, SciMARL does not rely on a priori knowledge of the log-law coefficients. Instead, it aims to discover active closure policies according to patterns in the flow physics captured by the filtered equations. The resulting wall models are robust to the numerical discretization, as discretization errors are taken into account during training. Furthermore, the model discovery method can be readily extended to complex geometries and different flow configurations, such as flow over rough surfaces and stratified and compressible boundary layers.
Results
Multi-agent reinforcement learning for wall modeling
In RL, the agent interacts with its environment by sampling its states (s), performing actions (a), and receiving rewards (r). At each time step, the agent performs an action, and the system is advanced in time. The agent then observes its new state, receives a scalar reward, and chooses a new action. Through repeated interactions with the environment, the agent infers a policy π(s, a) that maximizes its long-term rewards. The optimal policy π^{*}(s, a) is found by maximizing the expected utility, which is given by the expected cumulative reward. Throughout the paper, x, y, and z denote the streamwise, wall-normal, and spanwise directions, respectively, and u, v, and w are the corresponding velocity components. RL agents are distributed evenly on each channel wall. Each agent located at (x, z) receives local states s_{n}(x, z) and rewards r_{n}(x, z) and provides local actions a_{n}(x, z) at each time step t_{n}. A single policy is maintained and updated by all agents in the domain (Fig. 1).
For the RL model to be universally applicable over a wide range of flow parameters, the states are nondimensionalized using the viscosity ν and the modeled instantaneous friction velocity
\[ u_{\tau}^{m} = \sqrt{\tau_{w}^{m}/\rho}, \tag{1} \]
where \(\tau_{w}^{m}\) is the modeled wall-shear stress and ρ is the density. These quantities depend only on the output of the wall model and can be obtained without any prior knowledge of the flow. This nondimensionalization is denoted by the superscript * and is distinct from the one based on the true friction velocity u_{τ}, denoted by the superscript +, which is used to assess model performance. The goal of the wall model is to predict the correct wall-shear stress τ_{w}, and thus u_{τ}. This enables good predictions of quantities such as the mean velocity profile and turbulence intensities^{47}.
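The star-unit scaling can be sketched in a few lines. This is a minimal illustration and not the authors' code. The helper names are hypothetical. The derivative scaling follows from \(u^{*} = u/u_{\tau}^{m}\) and \(y^{*} = y u_{\tau}^{m}/\nu\), which give \(\partial u^{*}/\partial y^{*} = (\partial u/\partial y)\,\nu/(u_{\tau}^{m})^{2}\).

```python
import math


def modeled_friction_velocity(tau_w_m: float, rho: float) -> float:
    """u_tau^m = sqrt(tau_w^m / rho), built only from the wall-model output."""
    return math.sqrt(tau_w_m / rho)


def to_star_units(u, dudy, y, tau_w_m, rho, nu):
    """Nondimensionalize a velocity sample with nu and the *modeled* u_tau^m.

    Returns (u*, du*/dy*, y*). Hypothetical helper for illustration only.
    """
    u_tau_m = modeled_friction_velocity(tau_w_m, rho)
    u_star = u / u_tau_m
    y_star = y * u_tau_m / nu
    # du*/dy* = (du/dy) * (dy/dy*) / u_tau^m = (du/dy) * nu / u_tau^m**2
    dudy_star = dudy * nu / u_tau_m**2
    return u_star, dudy_star, y_star
```

Because only \(\tau_{w}^{m}\) enters, the same scaling can be applied at run time without knowing the true \(u_{\tau}\).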
Velocity-based wall model
We first train the model to adapt to the variation of the velocity with wall-normal height, which has a universal behavior in the log region. As states s_{n}(x, z) we use the instantaneous velocity u^{*}(x, h^{m}, z, t_{n}), the wall-normal derivative ∂u^{*}/∂y^{*}(x, h^{m}, z, t_{n}), and the wall-normal location \(y^{*} = (h^{m})^{*}\) of the sampling point. Agents adjust the wall-shear stress through a multiplication factor a_{n}(x, z) ∈ [0.9, 1.1] such that \(\tau_{w}^{m}(x,z,t_{n+1}) = a_{n}(x,z)\,\tau_{w}^{m}(x,z,t_{n})\). With this choice, the model is not required to produce the exact wall-shear stress, which depends on the Reynolds number. Instead, it proposes an action that adjusts the wall-shear stress. The reward (see “Methods” for its definition) is also incremental. It is proportional to the improvement in the prediction of the wall-shear stress relative to the previous time step. The agent behavior is stabilized by providing an additional reward when the predicted wall-shear stress is within 1% of the true value.
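The action and reward just described can be sketched per agent with numpy arrays, one entry per agent on the wall. The action clipping to [0.9, 1.1] and the multiplicative update come from the text. The exact reward formula is given in Methods and is not reproduced here. The `incremental_reward` form below (relative error reduction plus a bonus) and the weight `BONUS` are assumptions made only for illustration.

```python
import numpy as np

A_MIN, A_MAX = 0.9, 1.1  # action bounds stated in the text
BONUS = 1.0  # hypothetical weight of the within-1% bonus (not given here)


def apply_action(tau_m, a):
    """tau_w^m(t_{n+1}) = a_n * tau_w^m(t_n), with a_n clipped to [0.9, 1.1]."""
    return np.clip(a, A_MIN, A_MAX) * tau_m


def incremental_reward(tau_prev, tau_new, tau_true):
    """Assumed reward: relative reduction of the wall-stress error from the
    previous step, plus a bonus when the new estimate is within 1% of tau_true."""
    err_prev = np.abs(tau_true - tau_prev) / tau_true
    err_new = np.abs(tau_true - tau_new) / tau_true
    return (err_prev - err_new) + BONUS * (err_new < 0.01)


# Two agents, one starting 20% low and one 20% high (true mean stress = 1)
tau_prev = np.array([0.8, 1.2])
tau_new = apply_action(tau_prev, np.array([1.1, 0.9]))
print(tau_new, incremental_reward(tau_prev, tau_new, 1.0))
```

A bounded multiplicative action keeps each update small and independent of the absolute stress level. This is why the same policy can be reused across Reynolds numbers.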
Log-law-based wall model
The second model is based on the existence of a logarithmic (log) layer in the near-wall region of turbulent flows, which is present in all flows with an inner–outer scale separation^{48}. In the log layer, the velocity profile is expressed as
\[ u^{+} = \frac{1}{\kappa}\ln y^{+} + B, \tag{2} \]
where κ is the von Kármán constant and B is the intercept constant. The exact values of κ and B depend on the flow configuration and wall roughness. In the current study, we use the values attributed to a canonical smooth zero-pressure-gradient boundary layer. The states of the second model are the local instantaneous log-law coefficients κ^{m} and B^{m}. They are computed from the instantaneous velocity, its wall-normal gradient, and the wall-normal location of the sampling point. We emphasize that this model does not take as input the a priori known values of κ and B but rather quantities derived from the instantaneous flow. This is an advantage over the first model. The states do not depend on the value of y^{*}, so the model can learn the log-law behavior for y^{*} outside the range of values it was trained on. The model can therefore be extended more readily to higher Reynolds numbers or coarser grids. The actions and rewards are the same as for the first model.
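The LLWM states can be sketched as follows. This assumes the definitions \(1/\kappa^{m} = y^{*}\,\partial u^{*}/\partial y^{*}\) (used in the testing section of Methods) and \(B^{m} = u^{*} - (1/\kappa^{m})\ln y^{*}\), obtained by fitting the log law through the single sample. The function name is hypothetical. The key property is that a sample lying exactly on a log profile returns that profile's coefficients at any \(y^{*}\). This is why the states are insensitive to the sampling height.

```python
import numpy as np


def loglaw_states(u_star, dudy_star, y_star):
    """Local instantaneous log-law coefficients from one off-wall sample.

    1/kappa^m = y* du*/dy*  (local slope in log coordinates)
    B^m       = u* - (1/kappa^m) ln y*
    Inputs are in star units; works on scalars or numpy arrays.
    """
    inv_kappa_m = y_star * dudy_star
    b_m = u_star - inv_kappa_m * np.log(y_star)
    return inv_kappa_m, b_m


# Illustrative log profile (kappa = 0.41, B = 5.2 chosen only for this check)
kappa, B = 0.41, 5.2
y = np.array([150.0, 600.0, 5000.0])
u = np.log(y) / kappa + B
dudy = 1.0 / (kappa * y)
print(loglaw_states(u, dudy, y))
```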
State-action map
We inquire into the learned models by examining the state-action map conditioned on positive rewards for channel flow at friction Reynolds numbers Re_{τ} = 2000, 4200, and 8000. As seen in Fig. 2a, the velocity-based wall model (VWM) has distinguishable regions of the state space (y^{*}, u^{*}) associated with distinct actions that yield positive rewards. The model shifts the wall-shear stress up or down depending on whether the (y^{*}, u^{*}) pair lies above or below the log-law profile. The model initially has no prior knowledge of the log-law coefficients, yet it learns to adjust the wall-shear stress through the RL process. However, because the model is trained on a limited range of \((h^{m})^{+}\), extrapolating this behavior to much larger values of \((h^{m})^{+}\) may be challenging. This can be alleviated by refining the grid in the wall-normal direction, with N_{y} ∼ Re, where N_{y} is the number of wall-normal grid points^{4,5,6}.
The log-law-based wall model (LLWM) similarly has distinct states with different actions that yield positive rewards (Fig. 2b). The mechanism for controlling the wall-shear stress is similar to that of the VWM. The wall-shear stress is shifted up or down depending on whether the local slope 1/κ^{m} and intercept B^{m} underpredict or overpredict the log law. Depending on the wall-normal location h^{m}, the classification of a point as above or below the log law may vary, especially for points far from the origin. However, the majority of the states are centered around the true values of 1/κ and B, where the mechanism works as expected.
Testing: turbulent channel flow
We examine the model predictions for turbulent channel flow at Reynolds numbers from 5200 to 10^{6} (Fig. 3). For the VWM, we expect the model to perform as intended as long as \((h^{m})^{+}\) is within the range observed during training (\(150 < (h^{m})^{+} < 1200\)). The cases at Re_{τ} = 2 × 10^{4} and 5 × 10^{4} produce large errors because \((h^{m})^{+}\) is outside the trained range. Once \((h^{m})^{+}\) is brought within this range by refining the grid, the errors decrease significantly. In practice, this entails refining the grid at higher Reynolds numbers so that the first grid point off the wall lies within the trained range of \((h^{m})^{+}\). For the LLWM, the prediction error in the friction velocity is less than 4%, and the mean velocity profiles are well aligned with the log law regardless of the value of \((h^{m})^{+}\). The error increases with Reynolds number. This is most likely due to the larger variation of the wall-normal gradient of the streamwise velocity at higher Reynolds numbers, as well as the departure of \((h^{m})^{+}\) from the trained range. Still, up to Re_{τ} ≈ 10^{5}, the results are comparable to those of the widely used equilibrium wall model (EQWM), which uses an empirical coefficient tuned for this particular flow configuration. This range of Reynolds numbers is sufficient for various external aerodynamic and geophysical flows. The predicted mean velocity profiles for both models are shown in Fig. 4.
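The trained range follows from simple arithmetic. Training draws \(h^{m}\) between 0.075δ and 0.15δ at Re_{τ} = 2000 to 8000, and \((h^{m})^{+} = (h^{m}/\delta)\,Re_{\tau}\). The sketch below reproduces the bounds of 150 and 1200. It then checks test cases under the assumption (not stated for the tests) that the same fractions of δ are used without grid refinement. The helper names are hypothetical.

```python
import math

HM_FRACTIONS = (0.075, 0.15)  # training range of h^m / delta, from the text
TRAINED_MIN, TRAINED_MAX = 150.0, 1200.0


def hm_plus(hm_over_delta, re_tau):
    """(h^m)^+ = (h^m / delta) * Re_tau, since Re_tau = u_tau * delta / nu."""
    return hm_over_delta * re_tau


def within_trained_range(re_tau, tol=1e-9):
    """True if the whole sampled band of (h^m)^+ stays in [150, 1200]."""
    low = hm_plus(HM_FRACTIONS[0], re_tau)
    high = hm_plus(HM_FRACTIONS[1], re_tau)
    return low >= TRAINED_MIN - tol and high <= TRAINED_MAX + tol


for re_tau in (5200, 2e4, 5e4):
    print(re_tau, within_trained_range(re_tau))
```

Under this assumption, Re_{τ} = 2 × 10^{4} already gives \((h^{m})^{+} \geq 1500\). This is consistent with the larger VWM errors reported for that case.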
Testing: spatially evolving turbulent boundary layer
The predictive performance of the LLWM is assessed in a zero-pressure-gradient flat-plate turbulent boundary layer. The simulation covers Re_{θ} = 1000–7000, where Re_{θ} is the Reynolds number based on the momentum thickness.
The modeled skin-friction coefficient \(C_{f}^{m} = \tau_{w}^{m}/(\rho U_{\infty}^{2}/2)\) over the full simulation domain is comparable to the C_{f} given by empirical values^{49} (Fig. 5a). This shows that the model adapts to streamwise variations of the wall-shear stress, even though it was trained only on channel-flow simulations.
Distribution of wall-shear stress
A growing body of studies in wall-bounded turbulence has shown that the generation of wall-shear stress fluctuations is directly connected with large-scale motions in the outer layer^{50,51}. This observation supports the idea that the log-layer flow contains the information necessary to predict not only the mean wall-shear stress but also its fluctuations. However, in deterministic wall models such as the EQWM, the wall-shear stress is perfectly correlated with the velocity at the sampling location^{52,53}, as opposed to the correlation coefficient of 0.3 observed in DNS^{50}. This can be observed in Figs. 6a and 7, where the wall-shear stress predicted by the EQWM is perfectly correlated with the velocity fluctuations at the sampling location h_{wm}. In contrast, the LLWM yields a weaker correlation between the off-wall velocity and the wall-shear stress (Figs. 6b and 7), with a maximum of ~0.3, which matches the correlation expected from DNS.
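The correlation coefficient quoted here is the Pearson coefficient between fluctuations. The sketch below uses synthetic signals only (no simulation data) to contrast two cases. In the first, a wall stress is a deterministic linear function of the sampled velocity, as in a linearized equilibrium model. In the second, only part of the stress signal is tied to the off-wall velocity. The 0.3 target and the signal construction are chosen purely for illustration.

```python
import numpy as np


def correlation(a, b):
    """Pearson correlation coefficient of two signals (means removed)."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.mean(a * b) / (np.std(a) * np.std(b)))


rng = np.random.default_rng(0)
n = 100_000
u_fluct = rng.standard_normal(n)  # synthetic off-wall velocity fluctuations

# Deterministic model: stress is a fixed function of the sampled velocity
tau_deterministic = 2.0 * u_fluct + 3.0

# Synthetic stress with correlation ~0.3 to u (remaining part independent)
r = 0.3
tau_partial = r * u_fluct + np.sqrt(1.0 - r**2) * rng.standard_normal(n)

print(correlation(u_fluct, tau_deterministic), correlation(u_fluct, tau_partial))
```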
Potential of SciMARL wall models
We demonstrate that the SciMARL wall models perform as well as the RANS-based EQWM, which has been tuned for this particular flow configuration. The SciMARL wall models achieve these results by training on moderate-Reynolds-number flows with a reward function based only on the mean wall-shear stress. Moreover, RL models are trained in situ with WMLES and do not require any DNS data. This is in contrast to supervised learning methods, where a vast amount of data must be generated using high-fidelity DNS before learning can proceed. For example, for channel flow at moderate Reynolds number (Re_{τ} = 4200), the LLWM can be trained using O(10^{3}) CPU-hours with less than 1 GB of storage. Generating the DNS data for supervised learning would require O(10^{7}) CPU-hours and more than 100 TB of storage. DNS databases may already be available for canonical cases such as channel flow. However, they are more difficult to obtain for cases involving wall roughness or adverse pressure gradients, where wall models would be most useful. The additional overhead of generating data for supervised learning makes it less practical for real-world wall modeling.
The LLWM is easy to extend to complex geometries and to flow simulations that use different numerical methods or SGS models. Its states are built only from the instantaneous streamwise (or wall-parallel) velocity, its wall-normal gradient, and the distance from the wall. These quantities do not depend heavily on numerics or SGS models, unlike the filtered velocities or eddy-viscosity values required by the dynamic model^{22}. Thus, the model can be used in a wide range of simulations, much like the EQWM, but without prescribed tunable parameters. Furthermore, the RL framework can be extended to various flow configurations by adding dimensions to the state vector. All flows with an inner–outer scale separation exhibit a log law^{48} in the overlap region. The current configuration for wall-model development can therefore be extended to flows over rough walls, stratified flows, and compressible flows, among many others. These flows usually have different log-law coefficients κ and B, which are tuned manually for existing wall models. In this work, these values are adjusted automatically by the SciMARL-based model. This gives the LLWM a distinct advantage over existing models. For example, in cases where the pressure gradient varies over the simulation domain, traditional methods must assign different model parameters to each region with a different pressure gradient. In contrast, when quantities such as the pressure and velocity gradients are included as states, the SciMARL model can transition smoothly between different pressure-gradient effects with a single policy trained on various canonical cases. A similar argument applies to simulations with varying levels of stratification or compressibility effects within the domain. In addition, evaluating the LLWM only requires evaluating the trained neural network. This is an order of magnitude faster than the EQWM, which solves an ODE at each time step.
Discussion
We have introduced a potent method for the automated discovery of closures in simulations of wall-bounded turbulent flows. The method uses limited data by fusing scientific computing and multi-agent reinforcement learning (SciMARL). In this method, we solve the filtered Navier–Stokes equations using LES and develop a wall model as a control policy enacted by cooperating agents, using the recovery of the correct mean wall-shear stress as the reward. SciMARL requires limited data, in contrast to supervised learning methods. Training was performed using LES of turbulent channel flow at moderate Reynolds numbers (Re_{τ} = 2000, 4200, and 8000). Remarkably, the method generalizes to LES of a turbulent boundary layer and of turbulent channel flow at extreme Reynolds numbers.
We examine the robustness of the method by studying two models (VWM and LLWM) with different state spaces. In the VWM, the state space comprises the streamwise velocity, its wall-normal derivative, and the wall-normal location of the sampling point. This model adjusts the wall-shear stress based on the deviation of the velocity profile from the log law. The model captures the mean velocity profile over a wide range of Reynolds numbers when the wall-normal location of the sampling point is within the training range. In the LLWM, by contrast, the state space is based on the instantaneous log-law coefficients. This model generalizes to a broader set of grid resolutions and Reynolds numbers than the VWM. Moreover, despite being trained in turbulent channel flows, the LLWM generalizes to a spatially evolving turbulent boundary layer. It recovers the correct skin-friction coefficient at a fraction of the cost of high-fidelity simulations.
We note that the LLWM produces correlations between the predicted wall-shear stress and the off-wall velocity that are similar to those of fully resolved flow. This is in contrast to the correlations obtained with classical RANS-based models. It implies that the LLWM policy replicates the natural mechanisms that set the wall-shear stress, which so far could be captured only by highly resolved simulations. Furthermore, as the model requires instantaneous flow information at only one off-wall location, it could be extended to more complex geometries and different numerical methods without additional modifications.
We anticipate that the model can be readily extended to all wall-bounded flows that exhibit a log law through an inner–outer scale separation^{48}. We envision that, when SciMARL is trained over a wide range of flows, the model will also acquire experience of the key flow patterns that are omnipresent in the fundamental physics of flows in complex configurations. This advance will present a paradigm shift in wall-model development for LES in the prediction and control of industrial aerodynamics and environmental flows.
Methods
Reinforcement learning
Learning is performed with the open-source RL library smarties^{54}. The library uses computing resources efficiently by separating the task of updating the policy parameters from the task of collecting interaction data. The flow simulations are distributed across workers that collect, for each agent, experiences organized into episodes,
\[ \mathcal{E} = \left\{ (s_{0}, r_{0}, \mu_{0}, \sigma_{0}, a_{0}), \ldots, (s_{n}, r_{n}, \mu_{n}, \sigma_{n}, a_{n}), \ldots, (s_{N}, r_{N}) \right\}, \]
where n tracks in-episode RL steps, μ and σ are the statistics of the Gaussian policy used to sample a, and t_{N} is the final time step of each episode. When a simulation concludes, the worker sends one episode per agent to the central learning process (master) and receives updated policy parameters. The master stores the episodes in a replay memory (RM). The RM is sampled to update the policy parameters according to Remember-and-Forget Experience Replay (ReFER)^{54}. ReFER is combined with the off-policy actor-critic algorithm V-RACER, which supports continuous state and action spaces.
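The episode bookkeeping described above can be pictured with a small data structure. This is a hypothetical sketch, not the smarties API. It assumes that each stored step keeps the behavior-policy statistics (μ_{n}, σ_{n}) so that importance weights can be computed later. It also assumes that the RM is a simple fixed-capacity first-in-first-out buffer sampled uniformly with replacement. The actual ReFER memory management may differ.

```python
import random
from collections import deque
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Episode:
    """Hypothetical per-agent trajectory: (s_n, a_n, r_n, mu_n, sigma_n)."""
    states: List[Sequence[float]] = field(default_factory=list)
    actions: List[float] = field(default_factory=list)
    rewards: List[float] = field(default_factory=list)
    mus: List[float] = field(default_factory=list)     # behavior-policy mean
    sigmas: List[float] = field(default_factory=list)  # behavior-policy std

    def append(self, s, a, r, mu, sigma):
        self.states.append(s)
        self.actions.append(a)
        self.rewards.append(r)
        self.mus.append(mu)
        self.sigmas.append(sigma)


class ReplayMemory:
    """Fixed-capacity FIFO buffer of steps (a simplification, assumed here)."""

    def __init__(self, capacity: int):
        self.buffer = deque(maxlen=capacity)  # oldest steps are dropped first

    def add_episode(self, ep: Episode):
        for step in zip(ep.states, ep.actions, ep.rewards, ep.mus, ep.sigmas):
            self.buffer.append(step)

    def sample(self, batch_size: int, rng: random.Random):
        # Uniform Monte Carlo sample of B stored steps, with replacement
        return rng.choices(self.buffer, k=batch_size)
```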
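V-RACER trains a neural network with weights w. Given an input state s, the network outputs the mean μ^{w}(s) and standard deviation σ^{w}(s) of the policy π^{w}, as well as a state-value estimate v^{w}(s). The statistics μ^{w} and σ^{w} are improved through the policy gradient estimator
\[ g(\mathtt{w}) = \mathbb{E}\left[\frac{\pi^{\mathtt{w}}(a_{n}\mid s_{n})}{\mathcal{P}(a_{n}\mid\mu_{n},\sigma_{n})}\left(\hat{q}_{n} - v^{\mathtt{w}}(s_{n})\right)\nabla_{\mathtt{w}}\log\pi^{\mathtt{w}}(a_{n}\mid s_{n})\right], \tag{3} \]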
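where \(\mathcal{P}(a_{n}\mid\mu_{n},\sigma_{n})\) is the probability of sampling a_{n} from \(\mathcal{N}(\mu_{n},\sigma_{n})\), and \(\hat{q}_{n}\) is an estimator of the action value. It is computed recursively with the Retrace algorithm^{55} as
\[ \hat{q}_{n} = r_{n+1} + \gamma\, v^{\mathtt{w}}(s_{n+1}) + \gamma \min\{1, \rho_{n+1}\}\left[\hat{q}_{n+1} - v^{\mathtt{w}}(s_{n+1})\right], \tag{4} \]
where γ = 0.995 is the discount factor for future rewards and \(\rho_{n} = \pi^{\mathtt{w}}(a_{n}\mid s_{n})/\mathcal{P}(a_{n}\mid\mu_{n},\sigma_{n})\) is the importance weight. Retrace is also used to derive the gradient of the state-value estimate
\[ g^{v}(\mathtt{w}) = \mathbb{E}\left[\left(\hat{q}_{n} - v^{\mathtt{w}}(s_{n})\right)\nabla_{\mathtt{w}} v^{\mathtt{w}}(s_{n})\right]. \tag{5} \]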
The expectations in Eqs. (3) and (5) are approximated by Monte Carlo sampling of B observations from the RM.
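The Retrace recursion for \(\hat{q}_{n}\) runs backward over an episode. Below is a minimal numpy sketch with \(\hat{q}_{n} = r_{n+1} + \gamma v(s_{n+1}) + \gamma \min\{1,\rho_{n+1}\}[\hat{q}_{n+1} - v(s_{n+1})]\). It assumes that episodes are truncated in time rather than terminal, so the recursion is bootstrapped with \(\hat{q}_{N} := v(s_{N})\) and the correction term vanishes at the last step. The array layout is an illustrative choice. With all importance weights equal to 1, the recursion reduces to the discounted Monte Carlo return. With all weights equal to 0, it reduces to a one-step temporal-difference target.

```python
import numpy as np


def retrace_targets(rewards, values, rho, gamma=0.995):
    """Backward Retrace recursion for one truncated episode.

    rewards[n] = r_{n+1}  (n = 0..N-1), reward received after action a_n
    values[n]  = v(s_n)   (n = 0..N),   state-value estimates
    rho[n]     = rho_n    (n = 0..N-1), importance weights of the actions
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    rho = np.asarray(rho, dtype=float)
    N = len(rewards)
    assert values.shape == (N + 1,) and rho.shape == (N,)
    q = np.empty(N)
    q_next = values[N]  # bootstrap: q_N := v(s_N), so the last correction is 0
    for n in range(N - 1, -1, -1):
        c = min(1.0, rho[n + 1]) if n + 1 < N else 0.0  # truncated weight
        q[n] = rewards[n] + gamma * values[n + 1] + gamma * c * (q_next - values[n + 1])
        q_next = q[n]
    return q
```

The differences \(\hat{q}_{n} - v(s_{n})\) are the advantage-like factors that multiply the gradients in Eqs. (3) and (5).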
Because of experience replay, V-RACER and similar algorithms become unstable if the policy diverges from the distribution of experiences in the RM. We circumvent this issue by using an importance weight ρ_{t} to classify each experience as “near-policy” or “far-policy”, and we clip to zero the gradients computed from far-policy samples^{54}. In ReFER, the gradients are computed as
\[ \hat{g}(\mathtt{w}) = \begin{cases} \beta\, g(\mathtt{w}) - (1-\beta)\, g^{D}(\mathtt{w}), & 1/C < \rho_{t} < C,\\ -(1-\beta)\, g^{D}(\mathtt{w}), & \text{otherwise}, \end{cases} \tag{6} \]
where \(\rho_{t} = \pi^{\mathtt{w}}(a_{t}\mid s_{t})/\mathcal{P}(a_{t}\mid\mu_{t},\sigma_{t})\). Here, \(g^{D} = \nabla_{\mathtt{w}} D_{KL}\left(\pi^{\mathtt{w}}(\cdot\mid s_{t})\,\|\,\mathcal{P}(\cdot\mid\mu_{t},\sigma_{t})\right)\), where D_{KL}(P∥Q) is the Kullback–Leibler divergence between distributions P and Q. The coefficient β is iteratively updated to keep a constant fraction of the samples in the RM within the trust region:
\[ \beta \leftarrow \begin{cases} (1-\eta)\,\beta, & r_{RM} > D,\\ (1-\eta)\,\beta + \eta, & \text{otherwise}, \end{cases} \tag{7} \]
where r_{RM} is the fraction of the RM with importance weights outside the trust region [1/C, C], D is a parameter, and η is the learning rate.
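The ReFER bookkeeping can be sketched for one-dimensional Gaussian policies. The sketch computes the importance weight ρ, the near-policy mask for the trust region (1/C, C), and the β update driven by the far-policy fraction. The defaults C = 1.5, D = 0.05, and η = 10^{−5} mirror the values reported in this section. The function names are illustrative and are not the smarties implementation.

```python
import numpy as np


def gaussian_pdf(a, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at a."""
    return np.exp(-0.5 * ((a - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def importance_weight(a, mu_now, sigma_now, mu_old, sigma_old):
    """rho = pi_w(a|s) / P(a|mu, sigma): current policy over behavior policy."""
    return gaussian_pdf(a, mu_now, sigma_now) / gaussian_pdf(a, mu_old, sigma_old)


def near_policy(rho, C=1.5):
    """True where rho lies inside the trust region (1/C, C)."""
    return (rho > 1.0 / C) & (rho < C)


def update_beta(beta, far_fraction, eta=1e-5, D=0.05):
    """Lower beta (more weight on the KL penalty) if too many RM samples are
    far-policy; otherwise relax beta toward 1."""
    if far_fraction > D:
        return (1.0 - eta) * beta
    return (1.0 - eta) * beta + eta


# Example: actions stored under N(1, 0.01^2), current policy has drifted slightly
a = np.array([0.99, 1.0, 1.02])
rho = importance_weight(a, 1.005, 0.01, 1.0, 0.01)
far_fraction = 1.0 - near_policy(rho).mean()
beta = update_beta(1.0, far_fraction)
```

Far-policy samples thus contribute only through the KL term. β self-adjusts so that the fraction of such samples stays near D.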
The most notable hyperparameters of our MARL setup are those that set the spatial resolution for the interpolation of the actions onto the grid, \(\Delta_{x}^{m}/\Delta_{x}\) and \(\Delta_{z}^{m}/\Delta_{z}\). The default values \(\Delta_{x}^{m}/\Delta_{x} = \Delta_{z}^{m}/\Delta_{z} = 4\) reduce the number of experiences generated per simulation to O(10^{5}). This is similar to the number of experiences per simulation used for SciMARL-based SGS model development^{46}. Consistent with previous studies, we found that further reducing the number of agents per simulation reduced the model’s adaptability and led to slightly lower performance. Because we use conventional reinforcement learning update rules in a multi-agent setting, single parameter updates are imprecise. We found that ReFER with hyperparameters C = 1.5 and D = 0.05 (Eqs. (6) and (7)) stabilizes training. We performed multiple training runs for each reward function and for each hyperparameter variation, and we observed consistent training progress regardless of the initial random seed.
Further implementation details of the algorithm can be found in Novati et al.^{54}.
Overview of the training setup
The models are trained on turbulent channel flow simulations at Re_{τ} = u_{τ}δ/ν ≈ 2000, 4200, and 8000, where δ is the channel half-height, with grid resolution Δ_{x,y,z} ≃ 0.05δ. Each WMLES is initialized at a Re_{τ} sampled uniformly from {2000, 4200, 8000}. The initial velocity field is obtained by superposing white noise sampled from \(\mathcal{N}(0, 0.5u_{\tau})\) on a previously obtained WMLES flow field at the given Re_{τ}, which is then run for a short period of time to remove numerical artifacts. The initial wall-shear stress is set to overestimate or underestimate the correct wall-shear stress by up to ±20%. At each time step of the WMLES, the location h^{m} is randomly selected between 0.075δ and 0.15δ to train over a smooth range of \((h^{m})^{+}\) within the log layer. The velocity and its wall-normal gradient are then interpolated to the chosen wall-normal location h^{m} to form the state vector. The agents are located with spacings \(\Delta_{x}^{m} = 4\Delta_{x}\) and \(\Delta_{z}^{m} = 4\Delta_{z}\). Each iteration of the learning algorithm runs the simulation for 2δ/u_{τ}, updating the model at every time step.
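The randomized training setup can be summarized as a small sampler. This is a sketch under stated assumptions, and the helper names are hypothetical. It assumes the ±20% wall-stress offset is drawn uniformly, which the text does not specify. It also treats the 0.5u_{τ} in \(\mathcal{N}(0, 0.5u_{\tau})\) as a standard deviation.

```python
import numpy as np

RE_TAU_TRAIN = (2000, 4200, 8000)  # training Reynolds numbers from the text


def sample_episode_setup(rng):
    """Draw Re_tau uniformly from the training set and a factor that mis-sets
    the initial wall-shear stress (assumed uniform in [0.8, 1.2])."""
    re_tau = int(rng.choice(RE_TAU_TRAIN))
    stress_factor = rng.uniform(0.8, 1.2)
    return re_tau, stress_factor


def perturb_initial_field(rng, u_base, u_tau):
    """Superpose white noise on a precomputed WMLES field.
    0.5 * u_tau is used as the standard deviation (assumption)."""
    return u_base + rng.normal(0.0, 0.5 * u_tau, size=u_base.shape)


def sample_hm(rng, delta=1.0):
    """Sampling height h^m, redrawn every time step in [0.075, 0.15) * delta."""
    return rng.uniform(0.075 * delta, 0.15 * delta)


rng = np.random.default_rng(1)
re_tau, factor = sample_episode_setup(rng)
u0 = perturb_initial_field(rng, np.ones((8, 8, 8)), u_tau=1.0)
hm = sample_hm(rng)
```

Redrawing h^{m} at every step, rather than once per episode, exposes the policy to a continuous band of \((h^{m})^{+}\) within each episode.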
The policy is parameterized by a neural network with two hidden layers of 128 units each, with softsign activations and skip connections. The network is initialized with small outer weights and with the output bias shifted such that the initial policy is approximately \(\mathcal{N}(1, 10^{-4})\)^{56}. Gradients are computed from Monte Carlo estimates with sample size B = 512 from an RM of size 10^{6}. The parameters are updated with the Adam algorithm^{57} with learning rate η = 10^{−5}. ReFER hyperparameters C = 1.5 and D = 0.05 are used to stabilize training. Each training run is advanced for 10^{7} policy gradient steps.
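A forward pass of a network with the stated shape can be sketched in numpy. This is illustrative only and is not the smarties implementation. The form of the skip connection (adding the first hidden layer to the second), the softplus used to keep σ positive, and the weight scales are all assumptions. The output bias is set so that μ ≈ 1 and σ ≈ 0.01 at initialization, which reads \(\mathcal{N}(1, 10^{-4})\) as mean and variance.

```python
import numpy as np


def softsign(x):
    return x / (1.0 + np.abs(x))


def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + e^x), numerically stable


class PolicyValueNet:
    """Hypothetical sketch: 2 x 128 softsign layers, a skip connection,
    and three outputs (mu, sigma, v)."""

    def __init__(self, n_in, n_hidden=128, init_mu=1.0, init_sigma=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_hidden))
        self.b2 = np.zeros(n_hidden)
        # Small outer weights: initial outputs are set almost entirely by biases
        self.W_out = rng.normal(0.0, 1e-4, (n_hidden, 3))
        # Bias shift: mu = init_mu, softplus(bias) = init_sigma, v = 0
        self.b_out = np.array([init_mu, np.log(np.expm1(init_sigma)), 0.0])

    def __call__(self, states):
        h1 = softsign(states @ self.W1 + self.b1)
        h2 = softsign(h1 @ self.W2 + self.b2) + h1  # skip connection (assumed form)
        out = h2 @ self.W_out + self.b_out
        return out[..., 0], softplus(out[..., 1]), out[..., 2]


net = PolicyValueNet(n_in=3)  # e.g. the three VWM states
mu, sigma, v = net(np.zeros((5, 3)))
```

Starting with μ ≈ 1 and a small σ means the untrained policy barely changes the wall-shear stress. Exploration then grows only as the learned σ increases.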
For both VWM and LLWM, the action is given by a multiplication factor a_{n}(x, z) ∈ [0.9, 1.1] such that \({\tau }_{w}^{m}(x,z,{t}_{n+1})={a}_{n}(x,z){\tau }_{w}^{m}(x,z,{t}_{n})\). The reward is given by
where \(\mathbb{1}\) is an indicator function and τ_{w} is the true mean wall-shear stress. The resulting reward is proportional to the improvement in the prediction of the wall-shear stress relative to the previous time step, with an additional reward when the predicted wall-shear stress is within 1% of the true value. The states of the VWM are the instantaneous velocity u^{*}(x, h^{m}, z, t_{n}), the wall-normal derivative ∂u^{*}/∂y^{*}(x, h^{m}, z, t_{n}), and the wall-normal location \(y^{*} = (h^{m})^{*}\) of the sampling point. The states of the LLWM are
\[ \frac{1}{\kappa^{m}} = y^{*}\frac{\partial u^{*}}{\partial y^{*}}, \qquad B^{m} = u^{*} - \frac{1}{\kappa^{m}}\ln y^{*}, \]
evaluated at \(y^{*} = (h^{m})^{*}\).
Details of the flow simulation
We solve the filtered incompressible Navier–Stokes equations in a channel using LES with staggered second-order finite differences in space^{58}, a fractional-step method^{59}, and a third-order Runge–Kutta time-advancing scheme^{60}. The SGS model is the anisotropic minimum dissipation (AMD) model^{61}, which is known to perform well on highly anisotropic grids^{62}.
For the channel flow, periodic boundary conditions are imposed in the streamwise and spanwise directions, and no-slip and no-penetration boundary conditions are imposed at the top and bottom walls. The modeled wall stress \(\tau_{w}^{m}\) is applied to the LES domain through the eddy viscosity at the wall^{63},
\[ \tau_{w}^{m} = \rho\left(\nu + \nu_{t}\right)_{w}\left.\frac{\partial u}{\partial y}\right|_{w}, \tag{9} \]
where ν_{t} is the eddy viscosity and the subscript w indicates values evaluated at the wall. Compared with the more widely used Neumann boundary condition, this boundary condition is better at alleviating the so-called log-layer mismatch in WMLES^{63}. For testing, the channel is driven by a constant pressure gradient. For training, it is driven by a constant mass flow rate computed from the mean velocity profile of channel flow. The domain size is L_{x} = 2πδ, L_{y} = 2δ, and L_{z} = πδ, where δ is the channel half-height.
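The viscosity-augmentation boundary condition amounts to choosing the wall eddy viscosity so that the resolved wall flux equals the modeled stress, \(\rho(\nu + \nu_{t})_{w}(\partial u/\partial y)_{w} = \tau_{w}^{m}\). A minimal sketch solving for \(\nu_{t,w}\) is below. The function name is hypothetical. A production solver would also need to guard against a vanishing or reversed wall gradient, which this sketch does not handle.

```python
import numpy as np


def wall_eddy_viscosity(tau_w_m, dudy_wall, rho, nu):
    """nu_t at the wall such that rho * (nu + nu_t)_w * (du/dy)_w = tau_w^m.

    Works on scalars or on numpy arrays of wall points. No safeguard for
    dudy_wall -> 0 or for negative results (left to the solver).
    """
    return tau_w_m / (rho * dudy_wall) - nu


tau_m = np.array([1.0, 1.2])  # modeled stresses at two wall points
dudy_w = np.array([10.0, 8.0])  # resolved wall gradients at the same points
nut_w = wall_eddy_viscosity(tau_m, dudy_w, rho=1.0, nu=1e-3)
```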
For the spatially evolving boundary layer, periodic boundary conditions are imposed in the spanwise direction. No-slip and no-penetration boundary conditions with viscosity augmentation (Eq. (9)) are used at the wall. At the top plane, we impose u = U_{∞} (freestream velocity), w = 0, and v estimated from the known experimental growth of the displacement thickness for the corresponding range of Reynolds numbers^{49}. This controls the average streamwise pressure gradient, whose nominal value is set to zero. The turbulent inflow is generated by the recycling scheme^{64}, in which the velocities from a reference downstream plane, x_{ref}, are used to synthesize the incoming turbulence. The reference plane is located well beyond the end of the inflow region to avoid spurious feedback^{65,66}. A convective boundary condition^{67} with convective velocity U_{∞} is applied at the outlet, with a small correction to enforce global mass conservation^{66}.
The code has been validated in previous studies in turbulent channel flows^{22,68,69,70} and flat-plate boundary layers^{22,71}.
Testing: channel flow
The predictions of VWM and LLWM are tested in turbulent channel flow at Reynolds numbers from 5200 to 10^{6} (see Table 1) over a time span of 300δ/u_{τ}, significantly longer than the training period of 2δ/u_{τ}. Only results with Δ_{x} ≈ Δ_{z} ≈ 0.05δ are reported here; other grid resolutions representative of WMLES produce similar results.
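To put the grid in perspective, the snippet below converts the quoted spacing Δ_{x} ≈ Δ_{z} ≈ 0.05δ into cell counts for the 2πδ × πδ channel and into viscous units, Δ^{+} = (Δ/δ)Re_{τ}. It assumes that the quoted Reynolds numbers (5200 to 10^{6}) are friction Reynolds numbers Re_{τ} = u_{τ}δ/ν; Table 1 is not reproduced here, so treat that as an assumption. The function name is illustrative.

```python
import math


def channel_grid_summary(re_tau, dx_over_delta=0.05, dz_over_delta=0.05):
    """Cell counts and viscous-unit spacings for an L_x = 2*pi*delta, L_z = pi*delta channel.

    re_tau is assumed to be the friction Reynolds number u_tau * delta / nu.
    """
    return {
        "nx": round(2.0 * math.pi / dx_over_delta),
        "nz": round(math.pi / dz_over_delta),
        "dx_plus": dx_over_delta * re_tau,  # Delta_x^+ = (Delta_x / delta) * Re_tau
        "dz_plus": dz_over_delta * re_tau,
    }


# The two ends of the quoted Reynolds-number range.
for re_tau in (5200, 1e6):
    print(re_tau, channel_grid_summary(re_tau))

# Length of the test runs relative to the training period (both in delta/u_tau).
span_ratio = 300.0 / 2.0
```

Rounding to integer cell counts changes the actual spacing slightly (2π/126 ≈ 0.0499δ). At the upper end of the range the horizontal spacing is tens of thousands of viscous units, far too coarse to resolve near-wall streaks, which is why the wall stress must be supplied by a model.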
Note that for LLWM, one of the states, 1/κ^{m} = (∂u^{*}/∂y^{*})y^{*}, depends on where y is placed relative to the discrete points of the simulation. For example, if y is located at the midpoint between two computational grid points, a central finite difference can be used to compute the wall-normal derivative ∂u^{*}/∂y^{*}. If instead y is located on a computational grid point, a one-sided (left or right) finite difference must be used. Here, we chose values of y at the midpoints between two computational grid points. Changing the location of y had minor effects on the results: the wall-shear stress changed by ~5% when y was placed on a computational grid point.
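The dependence on the placement of y can be illustrated with a short sketch. The functions below evaluate y ∂u/∂y either at the midpoint between two grid points with a central difference or on a grid point with a one-sided difference, applied to an idealized log-law profile u = (1/κ) ln y + B. The profile, the constants κ = 0.41 and B = 5.2, the grid, and the function names are hypothetical, and the inputs are assumed to be already expressed in the normalized units (u^{*}, y^{*}) of the state; the normalization used in the paper is not restated here.

```python
import math


def inv_kappa_central(u, y, j):
    """y du/dy at the midpoint between grid points j and j+1 (central difference)."""
    y_mid = 0.5 * (y[j] + y[j + 1])
    return y_mid * (u[j + 1] - u[j]) / (y[j + 1] - y[j])


def inv_kappa_one_sided(u, y, j, forward=False):
    """y du/dy on grid point j using a backward (default) or forward difference."""
    if forward:
        dudy = (u[j + 1] - u[j]) / (y[j + 1] - y[j])
    else:
        dudy = (u[j] - u[j - 1]) / (y[j] - y[j - 1])
    return y[j] * dudy


# Hypothetical log-law profile u = ln(y)/kappa + B on a coarse uniform grid.
KAPPA, B = 0.41, 5.2
y = [100.0, 110.0, 120.0]
u = [math.log(yy) / KAPPA + B for yy in y]

for name, value in [
    ("central at midpoint", inv_kappa_central(u, y, 0)),
    ("backward on grid point", inv_kappa_one_sided(u, y, 1)),
    ("forward on grid point", inv_kappa_one_sided(u, y, 1, forward=True)),
]:
    # For an exact log law, kappa * (y du/dy) would equal 1.
    print(f"{name}: kappa * (1/kappa^m) = {KAPPA * value:.4f}")
```

On this illustrative grid the central estimate recovers 1/κ to within about 0.1%, while the one-sided estimates deviate by roughly 4% to 5%. The size of the deviation depends on the grid spacing relative to y, so these numbers should not be read as the paper's error.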
Testing: spatially evolving turbulent boundary layer
The predictive performance of LLWM is assessed in a zero-pressure-gradient flat-plate turbulent boundary layer with Re_{θ} ranging from 1000 to 7000. This range was chosen so that the results can be compared against available DNS data^{72}. The recycling plane for the inlet boundary condition is located at x_{ref}/θ_{0} = 890, where θ_{0} is the momentum thickness at the inlet. The length, height, and width of the computational box are L_{x} = 3570θ_{0}, L_{y} = 100θ_{0}, and L_{z} = 200θ_{0}. The streamwise and spanwise resolutions are Δ_{x}/δ = 0.06 (\({{{\Delta }}}_{x}^{+}=128\)) and Δ_{z}/δ = 0.05 (\({{{\Delta }}}_{z}^{+}=105\)) at Re_{θ} = 6500. The grid is uniform in the wall-normal direction, with Δ_{y}/δ = 0.03 (\({{{\Delta }}}_{y}^{+}=64\)) at Re_{θ} = 6500. The number of wall-normal grid points per boundary-layer thickness is ~10 at the inlet, in line with the channel flow simulations. The sampling point h^{m} is placed at the third grid point off the wall^{16}, which lies in the log region over most of the domain. All computations were run for 50 washout times after the initial transients.
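The quoted resolutions can be cross-checked with simple arithmetic. Each pair (Δ/δ, Δ^{+}) implies a value δ^{+} = Δ^{+}/(Δ/δ), and the three values should agree at Re_{θ} = 6500. The sketch also locates the third off-wall grid point, assuming (hypothetically) a staggered grid with streamwise velocity stored at y = (j − 1/2)Δ_{y}, and applies a crude log-region test whose bounds, y^{+} ≥ 30 and y/δ ≤ 0.2, are assumed conventional values rather than criteria from the paper. Function names are illustrative.

```python
# Resolutions quoted at Re_theta = 6500 as (Delta/delta, Delta^+) per direction.
RESOLUTION = {"x": (0.06, 128.0), "z": (0.05, 105.0), "y": (0.03, 64.0)}


def implied_delta_plus(resolution):
    """delta^+ = Delta^+ / (Delta/delta) for each direction."""
    return {k: plus / frac for k, (frac, plus) in resolution.items()}


def sampling_height(dy_over_delta, dy_plus, index=3):
    """Height of the index-th grid point off the wall, in outer and viscous units.

    Assumes (hypothetically) that streamwise velocity is stored at
    y = (j - 1/2) * dy, j = 1, 2, ..., as on a staggered grid.
    """
    n = index - 0.5
    return n * dy_over_delta, n * dy_plus


def in_log_region(h_over_delta, h_plus, plus_min=30.0, outer_max=0.2):
    """Crude log-region test; both thresholds are assumed conventional bounds."""
    return h_plus >= plus_min and h_over_delta <= outer_max


print(implied_delta_plus(RESOLUTION))
h_outer, h_plus = sampling_height(0.03, 64.0)
print(h_outer, h_plus, in_log_region(h_outer, h_plus))
```

The three estimates of δ^{+} agree to within about 2% (roughly 2100 to 2130). Under the same assumptions, h^{m} sits near y/δ ≈ 0.075 and y^{+} ≈ 160 at Re_{θ} = 6500. Near the inlet, where ~10 points span δ (Δ_{y}/δ ≈ 0.1), the same point lies near y/δ ≈ 0.25, above the assumed outer bound, which is consistent with h^{m} being in the log region over most, but not all, of the domain.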
Data availability
All the data analyzed in this paper were produced with an in-house flow solver and the open-source reinforcement learning software described in the Code availability statement. Reference data and the scripts used to produce the figures are available through GitHub (https://github.com/hjbae/SciMARL_WMLES).
Code availability
The wall-modeled large-eddy simulations were performed with an in-house flow solver, which is available upon request. The wall models were trained with the reinforcement learning library smarties (https://github.com/cselab/smarties).
References
Sørensen, J. N. Aerodynamic aspects of wind energy conversion. Annu. Rev. Fluid Mech. 43, 427–448 (2011).
Slotnick, J. et al. CFD vision 2030 study: a path to revolutionary computational aerosciences. NASA Contractor Report, NASA/CR-2014-218178 (2013).
Stoll, R., Gibbs, J. A., Salesky, S. T., Anderson, W. & Calaf, M. Large-eddy simulation of the atmospheric boundary layer. Bound.-Layer Meteorol. 177, 541–581 (2020).
Chapman, D. R. Computational aerodynamics development and outlook. AIAA J. 17, 1293–1313 (1979).
Choi, H. & Moin, P. Grid-point requirements for large eddy simulation: Chapman’s estimates revisited. Phys. Fluids 24, 011702 (2012).
Yang, X. I. A. & Griffin, K. P. Grid-point and time-step requirements for direct numerical simulation and large-eddy simulation. Phys. Fluids 33, 015108 (2021).
Piomelli, U. & Balaras, E. Wall-layer models for large-eddy simulations. Annu. Rev. Fluid Mech. 34, 349–374 (2002).
Spalart, P. R. Detached-eddy simulation. Annu. Rev. Fluid Mech. 41, 181–202 (2009).
Larsson, J., Kawai, S., Bodart, J. & Bermejo-Moreno, I. Large eddy simulation with modeled wall-stress: recent progress and future directions. Mech. Eng. Rev. 3, 15-00418 (2016).
Bose, S. T. & Park, G. I. Wall-modeled large-eddy simulation for complex turbulent flows. Annu. Rev. Fluid Mech. 50, 535–561 (2018).
Deardorff, J. W. A numerical study of three-dimensional turbulent channel flow at large Reynolds numbers. J. Fluid Mech. 41, 453–480 (1970).
Schumann, U. Subgrid scale model for finite difference simulations of turbulent flows in plane channels and annuli. J. Comput. Phys. 18, 376–404 (1975).
Piomelli, U., Ferziger, J., Moin, P. & Kim, J. New approximate boundary conditions for large eddy simulations of wall-bounded flows. Phys. Fluids A 1, 1061–1068 (1989).
Balaras, E., Benocci, C. & Piomelli, U. Two-layer approximate boundary conditions for large-eddy simulations. AIAA J. 34, 1111–1119 (1996).
Wang, M. & Moin, P. Dynamic wall modeling for large-eddy simulation of complex turbulent flows. Phys. Fluids 14, 2043–2051 (2002).
Kawai, S. & Larsson, J. Wall-modeling in large eddy simulation: length scales, grid resolution, and accuracy. Phys. Fluids 24, 015105 (2012).
Park, G. I. & Moin, P. An improved dynamic non-equilibrium wall-model for large eddy simulation. Phys. Fluids 26, 37–48 (2014).
Yang, X. I. A., Sadique, J., Mittal, R. & Meneveau, C. Integral wall model for large eddy simulations of wall-bounded turbulent flows. Phys. Fluids 27, 025112 (2015).
Bodart, J. & Larsson, J. Wall-modeled large eddy simulation in complex geometries with application to high-lift devices. Annu. Res. Briefs Center Turbulence Res. 2011, 37–48 (2011).
Bermejo-Moreno, I. et al. Confinement effects in shock wave/turbulent boundary layer interactions through wall-modelled large-eddy simulations. J. Fluid Mech. 758, 5–62 (2014).
Bose, S. T. & Moin, P. A dynamic slip boundary condition for wall-modeled large-eddy simulation. Phys. Fluids 26, 015104 (2014).
Bae, H. J., Lozano-Durán, A., Bose, S. T. & Moin, P. Dynamic slip wall model for large-eddy simulation. J. Fluid Mech. 859, 400–432 (2019).
Meyers, J. & Sagaut, P. Is plane-channel flow a friendly case for the testing of large-eddy simulation subgrid-scale models? Phys. Fluids 19, 048105 (2007).
Sarghini, F., De Felice, G. & Santini, S. Neural networks based subgrid scale modeling in large eddy simulations. Comput. Fluids 32, 97–108 (2003).
Hickel, S., Franz, S., Adams, N. A. & Koumoutsakos, P. Optimization of an implicit subgrid-scale model for LES. In Proceedings of the 21st International Congress of Theoretical and Applied Mechanics (eds Gutkowski, W. & Kowalewski, T. A.) FM24_11256 (Springer, Warsaw, Poland, 2004).
Maulik, R. & San, O. A neural network approach for the blind deconvolution of turbulent flows. J. Fluid Mech. 831, 151–181 (2017).
Gamahara, M. & Hattori, Y. Searching for turbulence models by artificial neural network. Phys. Rev. Fluids 2, 054604 (2017).
Vollant, A., Balarac, G. & Corre, C. Subgrid-scale scalar flux modelling based on optimal estimation theory and machine-learning procedures. J. Turbul. 18, 854–878 (2017).
Xie, C., Wang, J., Li, H., Wan, M. & Chen, S. Artificial neural network mixed model for large eddy simulation of compressible isotropic turbulence. Phys. Fluids 31, 085112 (2019).
Fukami, K., Fukagata, K. & Taira, K. Super-resolution reconstruction of turbulent flows with machine learning. J. Fluid Mech. 870, 106–120 (2019).
Milano, M. & Koumoutsakos, P. Neural network modeling for near wall turbulent flow. J. Comput. Phys. 182, 1–26 (2002).
Yang, X. I. A., Zafar, S., Wang, J.-X. & Xiao, H. Predictive large-eddy-simulation wall modeling via physics-informed neural networks. Phys. Rev. Fluids 4, 034602 (2019).
Lozano-Durán, A. & Bae, H. J. Self-critical machine-learning wall-modeled LES for external aerodynamics. Annu. Res. Briefs Center Turbulence Res. 2020, 197–210 (2020).
Nadiga, B. T. & Livescu, D. Instability of the perfect subgrid model in implicit-filtering large eddy simulation of geostrophic turbulence. Phys. Rev. E 75, 046303 (2007).
Wu, J.-L., Xiao, H. & Paterson, E. Physics-informed machine learning approach for augmenting turbulence models: a comprehensive framework. Phys. Rev. Fluids 3, 074602 (2018).
Beck, A., Flad, D. & Munz, C.-D. Deep neural networks for data-driven LES closure models. J. Comput. Phys. 398, 108910 (2019).
Bertsekas, D. P. Reinforcement Learning and Optimal Control (Athena Scientific, 2019).
Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1334–1373 (2016).
Reddy, G., Celani, A., Sejnowski, T. J. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl. Acad. Sci. USA 113, E4877–E4884 (2016).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484 (2016).
Gazzola, M., Hejazialhosseini, B. & Koumoutsakos, P. Reinforcement learning and wavelet adapted vortex methods for simulations of self-propelled swimmers. SIAM J. Sci. Comput. 36, B622–B639 (2014).
Novati, G. et al. Synchronisation through learning for two self-propelled swimmers. Bioinspir. Biomim. 12, 036001 (2017).
Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl. Acad. Sci. USA 115, 5849–5854 (2018).
Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. Chaos 29, 103138 (2019).
Novati, G., Lascombes de Laroussilhe, H. & Koumoutsakos, P. Automating turbulence modeling by multi-agent reinforcement learning. Nat. Mach. Intell. 3, 87–96 (2020).
Lee, J., Cho, M. & Choi, H. Large eddy simulations of turbulent channel and boundary layer flows at high Reynolds number with mean wall shear stress boundary condition. Phys. Fluids 25, 110808 (2013).
Millikan, C. B. A critical discussion of turbulent flows in channels and circular tubes. In Proceedings of Fifth International Congress of Applied Mechanics, (eds Hartog, J.P.D. & Peters, H.) 386–392 (Wiley, 1939).
Schlichting, H. & Kestin, J. Boundary Layer Theory, Vol. 121 (Springer, 1961).
Mathis, R., Marusic, I., Chernyshenko, S. I. & Hutchins, N. Estimating wall-shear-stress fluctuations given an outer region input. J. Fluid Mech. 715, 163 (2013).
Cheng, C., Li, W., Lozano-Durán, A. & Liu, H. On the structure of streamwise wall-shear stress fluctuations in turbulent channel flows. J. Fluid Mech. 903, A29 (2020).
Park, G. I. & Moin, P. Space-time characteristics of wall-pressure and wall shear-stress fluctuations in wall-modeled large eddy simulation. Phys. Rev. Fluids 1, 024404 (2016).
Yang, X. I. A., Park, G. I. & Moin, P. Log-layer mismatch and modeling of the fluctuating wall stress in wall-modeled large-eddy simulations. Phys. Rev. Fluids 2, 104601 (2017).
Novati, G. & Koumoutsakos, P. Remember and forget for experience replay. In Proceedings of the 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 4851–4860 (Proceedings of Machine Learning Research, 2019).
Munos, R., Stepleton, T., Harutyunyan, A. & Bellemare, M. G. Safe and efficient off-policy reinforcement learning. In Advances in Neural Information Processing Systems Vol. 29, 1054–1062 (eds Lee, D. D., von Luxburg, U., Garnett, R., Sugiyama, M. & Guyon, I.) (Curran Associates, Inc., 2016).
Glorot, X. & Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256 (eds Teh, Y. W. & Titterington, M.) (JMLR Workshop and Conference Proceedings, 2010).
Kingma, D. P. & Ba, J. L. Adam: a method for stochastic optimization. In Proc. 3rd International Conference on Learning Representations (ICLR, 2014).
Orlandi, P. Fluid Flow Phenomena: A Numerical Toolkit (Springer, 2000).
Kim, J. & Moin, P. Application of a fractional-step method to incompressible Navier–Stokes equations. J. Comput. Phys. 59, 308–323 (1985).
Wray, A. A. Minimal-storage time advancement schemes for spectral methods. Tech. Rep. NASA Ames Research Center, California, Report No. MS 202 (1990).
Rozema, W., Bae, H. J., Moin, P. & Verstappen, R. Minimum-dissipation models for large-eddy simulation. Phys. Fluids 27, 085107 (2015).
Haering, S. W., Lee, M. & Moser, R. D. Resolution-induced anisotropy in large-eddy simulations. Phys. Rev. Fluids 4, 114605 (2019).
Bae, H. J. & Lozano-Durán, A. Effect of wall boundary conditions on wall-modeled large-eddy simulation in a finite-difference framework. Fluids 6, 112 (2021).
Lund, T. S., Wu, X. & Squires, K. D. Generation of turbulent inflow data for spatially-developing boundary layer simulations. J. Comput. Phys. 140, 233–258 (1998).
Nikitin, N. Spatial periodicity of spatially evolving turbulent flow caused by inflow boundary condition. Phys. Fluids 19, 091703 (2007).
Simens, M. P., Jiménez, J., Hoyas, S. & Mizuno, Y. A high-resolution code for turbulent boundary layers. J. Comput. Phys. 228, 4218–4231 (2009).
Pauley, L. L., Moin, P. & Reynolds, W. C. The structure of two-dimensional separation. J. Fluid Mech. 220, 397–411 (1990).
Bae, H. J., Lozano-Durán, A., Bose, S. T. & Moin, P. Turbulence intensities in large-eddy simulation of wall-bounded flows. Phys. Rev. Fluids 3, 014610 (2018).
Lozano-Durán, A. & Bae, H. J. Characteristic scales of Townsend’s wall-attached eddies. J. Fluid Mech. 868, 698 (2019).
Lozano-Durán, A. & Bae, H. J. Error scaling of large-eddy simulation in the outer region of wall-bounded turbulence. J. Comput. Phys. 392, 532–555 (2019).
Lozano-Durán, A., Hack, M. J. P. & Moin, P. Modeling boundary-layer transition in direct and large-eddy simulations using parabolized stability equations. Phys. Rev. Fluids 3, 023901 (2018).
Sillero, J. A., Jiménez, J. & Moser, R. D. Two-point statistics for turbulent boundary layers and channels at Reynolds numbers up to δ^{+} ≈ 2000. Phys. Fluids 26, 105109 (2014).
Acknowledgements
The authors acknowledge the support of the Air Force Office of Scientific Research (AFOSR) Multidisciplinary University Research Initiative (MURI) project: Prediction, Statistical Quantification, and Mitigation of Extreme Events Caused by Exogenous Causes or Intrinsic Instabilities, under grant number FA9550-21-1-0058. Computational resources were provided by the Swiss National Supercomputing Centre (CSCS) under Project s929.
Author information
Contributions
H.J.B. jointly conceived the study with P.K., designed and performed experiments, analyzed the data, and wrote the paper; P.K. devised the concept of SciMARL, supervised the project, and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Bae, H.J., Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat Commun 13, 1443 (2022). https://doi.org/10.1038/s41467-022-28957-7