Reduced order modeling for flow and transport problems with Barlow Twins self-supervised learning

We propose a unified data-driven reduced order model (ROM) that bridges the performance gap between linear and nonlinear manifold approaches. Deep learning ROM (DL-ROM) using deep-convolutional autoencoders (DC–AE) has been shown to capture nonlinear solution manifolds but fails to perform adequately when linear subspace approaches such as proper orthogonal decomposition (POD) would be optimal. Besides, most DL-ROM models rely on convolutional layers, which might limit its application to only a structured mesh. The proposed framework in this study relies on the combination of an autoencoder (AE) and Barlow Twins (BT) self-supervised learning, where BT maximizes the information content of the embedding with the latent space through a joint embedding architecture. Through a series of benchmark problems of natural convection in porous media, BT–AE performs better than the previous DL-ROM framework by providing comparable results to POD-based approaches for problems where the solution lies within a linear subspace as well as DL-ROM autoencoder-based techniques where the solution lies on a nonlinear manifold; consequently, bridges the gap between linear and nonlinear reduced manifolds. We illustrate that a proficient construction of the latent space is key to achieving these results, enabling us to map these latent spaces using regression models. The proposed framework achieves a relative error of 2% on average and 12% in the worst-case scenario (i.e., the training data is small, but the parameter space is large.). We also show that our framework provides a speed-up of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$7 \times 10^{6}$$\end{document}7×106 times, in the best case, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$7 \times 10^{3}$$\end{document}7×103 times on average compared to a finite element solver. Furthermore, this BT–AE framework can operate on unstructured meshes, which provides flexibility in its application to standard numerical solvers, on-site measurements, experimental data, or a combination of these sources.

www.nature.com/scientificreports/ It is noted that the final total training samples are 0.9MN t because we allocate 10% of the training samples for the validation set. The total of testing data is M test N t . We want to emphasize that our N t is not constant, but it is a function of µ . To elaborate, the higher Ra value will result in the higher N t to satisfy CFL condition.
The summary of each model, including the subspace dimension and compression method, is presented in Table 2. The detailed description of POD, AE, and DC-AE models is provided in Kadeethum et al. 6 , and our newly developed BT-AE models are described in "Methodology" section. In short, for POD models, we use proper orthogonal decomposition as a compression tool. The AE models use an autoencoder as a compression method. We employ a deep convolutional autoencoder to compress our training snapshots ( MN t ) for DC-AE models. The BT-AE models utilize a combination of an autoencoder and Barlow Twins self-supervised learning in their compression procedure. For the POD models, linear compression, subspace dimension refers to the number of reduced basis or N as well as the number of intermediate reduced basis or N int . We assume N = N int for all models for simplicity. The subspace dimension is the number of latent space ( Q ) for the nonlinear compression, AE, DC-AE, and BT-AE models.
Details of POD, AE, DC-AE models are provided in Kadeethum et al. 6 .
Comparisons of BT-AE with POD, AE, and DC-AE models in simple domains. We first compare the BT-AE model accuracy (for different numbers of Q ) with the models developed by Kadeethum et al. 6,43 (i.e., POD, AE, and DC-AE models) in relatively simple model domains. Example 1 illustrates a case where a linear manifold is optimal, while Example 2 presents a case where a nonlinear manifold is optimal. The results of POD, AE, and DC-AE models presented in Kadeethum et al. 6 demonstrated that the POD-based and DL-ROM approaches are more suitable for the linear and nonlinear manifold problems, respectively, and they are used in this manuscript to evaluate the performance of BT-AE models.
Example 1: Heating from the left boundary. The geometry and boundary conditions are shown in Fig. 1a, and we adopt this example from Zhang et al. and Kadeethum et al. 6,47 . This example represents a case where its fluid flow is driven by buoyancy as the fluid is heated on the left side of the domain. The fluid then flows upwards and rotates to the right side of the domain. We set µ = (Ra) , and its admissible range of variation is [40.0, 80.0], see Table 1. For the training set, we use M = 40 , which lead to, in total, MN t = 16802 training data points. We present the test case results of the BT-AE model (BT-AE 16 Q) as supplimental information (SI-Animation-Example 1). The difference between solutions produced by the FOM and ROM (DIFF) is calculated by where ϕ h is a finite-dimensional approximation of the set of primary variables corresponding to velocity, pressure, and temperature fields. ϕ h is an approximation of ϕ h produced by the ROM. Thus, ϕ h (·; t k , µ represent ϕ h and ϕ h at all space coordinates (i.e., evaluations at each DOF) at time t k with input parameter µ (i) test , respectively. Note that we only present the results of the temperature field. Hence, ϕ h and ϕ h represent T h and T h , respectively. From SI-Animation-Example 1, we observe that BT-AE 16 Q provides a reasonable approximation of the temperature field.
The results of Example 1 is presented in Fig. 2. In Fig. 2a, The performance of the different models (Table 2) is evaluated with the mean square error ( MSE ϕ (:, µ (i) test ) ) of the test cases defined as follows test . The MSE results show that BT-AE models perform better than AE and DC-AE models. Besides, BT-AE 16 Q delivers similar MSE results to those of the POD models. In contrast to the findings presented in Kadeethum et al. 6 where the linear compression (POD) outperforms nonlinear compression (AE and DC-AE), BT-AE models in this study could perform similar to the POD models. To be accurate, BT-AE models still underperform, but errors are comparable.
We then investigate how the performance of BT-AE models compares to DC-AE. First, we examine the data compression loss of the validation set (see Eq. 18) which is presented in Fig. 2b. From this figure, the data compression losses of BT-AE models are slightly better than those of the DC-AE models. Subsequently, we illustrate the mapping using ANN loss of the validation set, see Eq. (19), in Fig. 2c. From Fig. 2c, we observe that the mapping losses of the BT-AE models are six orders of magnitude less than those of the DC-AE models. This behavior shows that the BT-AE's latent spaces are easier to be mapped (i.e., ANN loss of the validation set for the BT-AE mapping is much lower than that of the DC-AE.). This speculation is explained by Fig. 2d,e, using a t-Distributed Stochastic Neighbor Embedding (t-SNE) plot. From Fig. 2d, one could see that all latent variables of DC-AE 16 Q blend (i.e., you cannot differentiate among cases with different Ra values.). The latent variables of the BT-AE 16 Q model, on the other hand, shown in Fig. 2e, behave in a much better structure (i.e., we can differentiate among cases with different Ra values.).
Example 2: Elder problem. The Elder problem 48 is a significantly more complicated and ill-posed problem 48,49 . High Ra numbers considered in this case may cause the flow instability to be fingering behavior. The domain and boundary conditions are presented in Fig. 1b 6,47,50 . In short, the model domain is heated from the half of the bottom boundary (Fig. 1b), and the flow is driven upwards by buoyancy force. We set µ = (Ra) , and its admissible range as [350.0, 400.0] ( Table 1). Compared to Example 1, this higher range of Ra values affects the minimum and maximum N t as its range increases to [790,1010].
The results of Example 2 are presented in Fig. 3. From Fig. 3a, we observe that all the models using nonlinear compression (AE, DC-AE, and BT-AE) perform better than the linear compression (POD). Furthermore, the BT-AE model accuracy is comparable to that of the DC-AE models. However, the BT-AE model results seem to be insensitive to the number of Q , while the DC-AE model results are affected by the number of Q (i.e., the DC-AE 16 Q and DC-AE 256 Q are more accurate than the DC-AE 4 Q). We also present the results of the test cases for the BT-AE 16 Q model in the supplemental animation (SI-Animation-Example 2). From these results, we observe that the BT-AE 16 Q model delivers a reasonable approximation of the solution T h (i.e., T h ).
We present the data compression loss of the validation set (Eq. 18) in Fig. 3b. In contrast to the ones shown in Fig. 2b, the DC-AE models have a slightly lower loss than that of the BT-AE models. We then investigate the ANN mapping loss (see Eq. 19) of the validation set in Fig. 3c. Similar to those presented in Fig. 2c, the BT-AE models have much lower mapping losses compared to those of the DC-AE models. Among the BT-AE models, BT-AE 256 Q has the highest value of ANN mapping loss, which is expected since it has the highest output dimension (i.e., we are mapping t and µ to z Q ). Again, we observe a much better structure of the BT-AE 16 Q latent space than the one from DC-AE 16 Q (see Fig. 3d,e). To elaborate, the latent variables of the DC-AE 16 Q are overlapped to hinder us from differentiating among each case (different Ra values). The latent variables of the BT-AE, on the contrary, are structured in a way that one can clearly observe different parts that represent different Ra values as shown in Fig. 3e.

Model performance of BT-AE models on complex geometries. From Examples 1 and 2, we have
observed that the BT-AE models could provide good results while operating on unstructured data. In this section, more challenging geometries which require an unstructured mesh for the FOM are evaluated with BT-AE models only since other methods are not suitable for unstructured mesh problems.
Example 3: Unit cell of micromodel. Example 3 uses a unit cell of micromodel where a central part of honeycomb shape and four corners are impermeable for flow. Still, the heat can conduct through these five subdomains as presented in Fig. 1c. Over the past decade, the micromodel has been used to study multiple coupled processes, including flow, reactive transport, bioreaction, and flow instability 19,20,[51][52][53][54] . The flow is initiated from an influx at the bottom of the domain. This geometry is more complex than those utilized in Examples 1 and 2 (see Fig. 1a,b). The higher temperature at the bottom surface (shown in red) alters a fluid density at the bottom, and subsequently, a buoyancy force drives the flow upwards from the bottom (shown in red) to the top of the domain. Five subdomains contain very low flow conductivity, but they can conduct heat. Again, we set µ = (Ra) and its range as [350.0, 400.0] ( Table 1). The range of Ra can also cause flow instability. We use M = 40 , M validation N t = 10 % of MN t , and M test = 10 . We have in total MN t = 44354 training data points.
The summary of the Example 3 results is shown in Fig. 4. For all test cases the MSE values over time in Fig. 4a-c are in the range of ≈ 1 × 10 −5 . The MSE values tend to decrease over time until the temperature field becomes a steady state. Besides, BT-AE models with different Q values provide approximately similar results (in line with our findings from Examples 1 and 2). The behavior infers that utilizing only a small number of latent spaces; the model can achieve the same level of accuracy as the one with a large number of latent spaces. This behavior is very beneficial because the mapping between parameter space and latent space becomes more manageable. We also present the results of the test cases for the BT-AE 16 Q model in the supplemental animation (SI-Animation-Example 3). Overall, BT-AE 16 Q delivers a reasonable approximation of the T h (i.e., DIFF results are low, and the relative error lies within 2%).
The data compression loss (Eq. 18) is in the range of ≈ 1 × 10 −5 to 1 × 10 −6 ( Fig. 4d) which is similar to that of Example 1 (Fig. 2b), but slightly lower than that of the Example 2 (Fig. 3b). The data compression loss seems to be invariant to Q values. We also present the Barlow Twins loss (Eq. 14) in Fig. 4e. We observe that the Barlow www.nature.com/scientificreports/  www.nature.com/scientificreports/  www.nature.com/scientificreports/ Twins loss increases with increasing the Q value as in Zbontar et al. 46 . This can be explained that as the Q value grows larger, the cross-correlation matrix C T (t, µ) becomes bigger, resulting in more members in Eqs. (15) and (16). As stated by Zbontar et al. 46 , the absolute value of Eqs. (15) and (16) is not as important as their trend. To elaborate this, in Fig. 4e, all models (different Q values) reach their saturated points around 40 epochs, meaning that the minimization of Eqs. (15) and (16) is completed. The mapping of the latent space using ANN loss (Eq. (19)) is presented in Fig. 4f. Similar to Examples 1 and 2 (Figs. 2c, 3c), the mapping loss is in range of ≈ 1 × 10 −5 to 1 × 10 −7 . The higher Q values, the mapping loss grows larger because there are more outputs to map. We present the latent space structure in Fig. 4g (only for BT-AE 16 Q). Following the results shown in Figs. 2e and 3e, the latent structure of the BT-AE model has a good structure since we can differentiate among different Ra values. This behavior stems from the fact that the BT losses maximize the information content of the embedding with the latent space through a joint embedding architecture. Even though this setting is very challenging, we still observe that the BT-AE 16 Q delivers a reasonable approximation of the T h as seen in the supplemental animation (SI-Animation-Example4). The summary of the Example 4 results is shown in Fig. 5. We present the MSE values as a function of time in Fig. 5a-c. We can observe that the MSE values for all test cases are in the range of ≈ 1 × 10 −1 to 1 × 10 −5 , which are significantly higher than those of Examples 1, 2, and 3. Moreover, the MSE values generally increase as we approach steady-state solutions, unlike the behaviors shown in Example 3. Again, BT-AE models with different Q provide approximately similar results (in line with our finding from Examples 1, 2, and 3).
The data compression loss (Eq. 18) is in the range of ≈ 1 × 10 −2 to 1 × 10 −4 (Fig. 5d), which is significantly higher than that of Examples 1, 2, and 3. This behavior illustrates that this example is the most challenging case for the BT-AE models. The data compression loss is the lowest for Q = 256 and the highest for Q = 4 , but the difference is not critical. As shown in the Barlow Twins loss (Eq. 14) in Fig. 5e, the higher values of Q the larger Barlow Twins loss is (as we discussed in the previous example.).
The mapping of the latent space using ANN loss (Eq. 19) is presented in Fig. 5f. The mapping loss is in the range of ≈ 1 × 10 −4 to 1 × 10 −5 , which is significantly higher than those of Examples 1, 2, and 3 (see Figs. 2c, 3c, 4f). This behavior also contributes to the higher MSE values of the BT-AE models. We present the latent space structure (only for BT-AE 16 Q) in Fig. 5g,h for Ra 1 and Ra 4 , respectively. Since Example 4 has different Ra values in four subdomains, the differentiation of the latent space of individual Ra does not provide good solutions as each latent space of each subdomain might also be interconnected.

Discussion
Recent developments in ML-based data-driven reduced order modeling (DL-ROM or DC-AE in this study) 5,6 have shown promising results in capturing parametrized solutions of systems of nonlinear equations. These models, however, rely on convolutional operators, which hinders the applicability of these models to complex geometries where an unstructured mesh is required for FOMs, as in Examples 3 and 4. Though we could utilize an autoencoder without convolutional layers, the model could not achieve the same level of accuracy as DL-ROM 6 . Kadeethum et al. 6 also illustrate that in a specific setting (simple geometry and boundary conditions), a linear compression approach using POD can outperform the DL-ROM model (Example 1). We have demonstrated that the autoencoder model through Barlow Twins self-supervised learning (BT-AE) could achieve the same accuracy as DL-ROM (Example 2 where POD models perform much worse than DL-ROM) by regularizing the latent space or nonlinear manifolds. Besides, it also yields optimal results in the case where the linear compression model outperforms the DL-ROM (Example 1). It means that the BT-AE model excels in all the test cases (Examples 1 and 2) while it still can operate on an unstructured mesh. This behavior has a significant advantage in scientific computing since most realistic problems require unstructured mesh representations. Besides, the BT-AE's performance is insensitive to the number of latent spaces, suggesting that with only a small number of latent spaces, the model can achieve the same level of accuracy as the one with a large number of latent spaces. This behavior is very beneficial because the mapping between parameter space and latent space becomes more manageable.
The computational time used to develop our ROM can be broken down into three primary parts: (1) generation of training data through FOM (the second step in Fig. 6), (2) training BT-AE (the third step in Fig. 6), and (3) mapping of t and µ to reduced subspace (the fourth step in Fig. 6). Each FOM model (corresponding to each set of µ or Ra in this work) takes, on average, about two hours on AMD Ryzen Threadripper 3970X (4 threads). We note that our FOM utilizes the adaptive time-stepping; hence, each µ (i) may require a substantially different computational time. To elaborate, cases that have higher Ra usually have a smaller time-step ( N t becomes larger), and subsequently, they require more time to complete.
The wall time used to train BT-AE is approximately 0.4 hours using a single Quadro RTX 6000. It is noted that this computational cost is much cheaper than that of the DC-AE model, taking around four to six hours 6  www.nature.com/scientificreports/  www.nature.com/scientificreports/ This is because DC-AE relies on convolutional layers, dropout, and batch normalization, which require much higher computational resources. The BT-AE, on the other hand, utilizes only a plain autoencoder. The BT-AE model is also cheaper than the POD model. However, we note that this may not be a fair comparison as we perform POD and BT-AE using different machines (i.e., our POD framework only works on CPU, but our BT-AE is trained using GPU). Please refer to Kadeethum et al. 6 for detailed wall time comparisons among POD and DC-AE models. The mapping of t and µ to reduced subspace through artificial neural networks (ANN) takes around half an hour to one hour using a single Quadro RTX 6000. As mentioned in "Methodology" section, we do not terminate the training of both BT-AE and mapping of t and µ to reduced subspace through ANN early, but rather use the model with the best validation loss through the final epochs. For example, we train for 50 epochs, but the model that offers the best validation loss might be the model at 20 epochs. However, the training time we report here is for 50 epochs. Thus, our training time provided here is considered conservative. Even though the ROM training time is not trivial, it could provide a fast prediction during the online phase. Using AMD Ryzen Threadripper 3970X, the ROM takes approximately several milliseconds for a query of a pair of t k and µ (i) . We also note that, as discussed previously, our ROM is needed to be trained on GPU for the problems at hand. Still, it could utilize CPU during an online time since we do not have to deal with backpropagation or optimization during the prediction time. On the contrary, one FOM simulation (for each µ (i) for all t ∈ 0 =: t 0 < t 1 < · · · < t N := τ ) takes about two hours. So, assuming that we query all t similar to those of the FOM, ROM takes only a matter of several seconds. In practice, however, we might not need to evaluate all timestamps in 0 =: t 0 < t 1 < · · · < t N := τ because the quantities of interest at the specific time may be more important. Since ROM is not bound by the CFL condition and can predict the quantities of interest at any specific time without intermediate computation, we could simply perform one query-t N and µ (i) , resulting in saving computational time significantly. Our ROM could provide a speed-up of 7 × 10 6 at any specific time step for Example 2, and a speed-up of 7 × 10 3 to 7 × 10 6 for all examples considered in this work.
Our model is developed upon the data-driven paradigm, which is applicable to any FOM. Besides, it could be trained using data produced by FOM, on-site measurements, experimental data, or a combination among them. This characteristic provides flexibility, which intrusive approaches could not provide. The data-driven model, though, is usually hungry for training samples. We have illustrated that as the dimensionality of our parameter space grows, the model requires more training samples, or it will suffer by losing its accuracy significantly as in Example 4 compared to accurate prediction in Example 3. We speculate that an adaptive sampling technique [57][58][59] , incorporating physical information 60,61 , or including multimodal unsupervised training 62 might provide a resolution to this issue in the future work. Another gap in data-driven machine learning ROM is that a posteriori error www.nature.com/scientificreports/ is exceptionally challenging to quantify. An error estimator developed by Xiao 63 for linear manifolds could be adapted and extended to the nonlinear manifold paradigm. Additionally, epistemic uncertainty could also be quantified by adopting the ensemble technique proposed by Jacquier et al. 64 .

Methodology
A graphical summary of our procedure is presented in Fig. 6: the computations are divided into an offline phase for the ROM construction, which we will show through four consecutive main steps and (single-step) online stage for the ROM evaluation. The first step of the offline stage represents an initialization of a training set ( µ ), validation set ( µ validation ), and test set ( µ test ) of parameters used to train, validate, and test the framework, of cardinality M , M validation , M test . For the rest of sections we will discuss only µ . The same analogy goes for µ validation and µ test . Let P ⊂ R P , P ∈ N , be a compact set representing the range of variation of the parameters µ ∈ P . For the sake of notation we denote by µ p , p = 1, . . . , P , the p-th component of µ . To explore the parametric dependence of the phenomena, we define a discrete training set of M parameter instances. Each parameter instance in the training set will be indicated with the notation µ (i) , for i = 1, . . . , M . Thus, the p-th component of the i-th parameter instance in the training set is denoted by µ (i) p in the following. The choice of the value of M , as well as the sampling procedure from the range P , is typically user-and problem-dependent. In this work, we use an equispaced distribution for the training set as done in 6,43 .
In the second step, we query the FOM, based on the finite element solver proposed and made publicly available in Kadeethum et al. 6,32 , for each parameter µ in the training set. In short, we are interested in gravity driven flow in porous media, and here we briefly describe all the equations used in this study: (1) mass balance and (2) heat advection-diffusion equations. Let ⊂ R d ( d ∈ {1, 2, 3} ) denote the computational domain and ∂� denote the boundary. X * are spatial coordinates in (e.g., X * = [x * , y * ] when d = 2 , which we will focus on throughout this study). The time domain is denoted by T = (0, τ ] with τ > 0 (i.e., τ is the final time). Primary variables used in this paper are u * (·, t * ) : � × T → R d , which is a vector-valued Darcy velocity (m/s), p * (·, t * ) : � × T → R d , which is a scalar-valued fluid pressure (Pa), and T * (·, t * ) : � × T → R d , which is a scalar-valued fluid temperature (C). Time is denoted as t * (s).
Following Joseph 65 , the Boussinesq approximation to the mass balance equations results in the density difference only appearing in the buoyancy term. The mass balance equation reads and where κ = k/µ f is the porous medium conductivity, k is the matrix permeability tensor, µ f is the fluid viscosity, y is a unit vector in the direction of gravitational force, g is the constant acceleration due to gravity, ρ and ρ 0 are the fluid density at current and initial states, respectively. We assume that ρ is a linear function of T * 47,66 where α is the thermal expansion coefficient, and T * 0 is the reference fluid temperature. We note that Eq. (5) is the simplest approximation, and one may easily adapt the proposed method when employing a more complex relationship provided in 67 . The heat advection-diffusion equation defined as Here, γ is the ratio between the porous medium heat capacity and the fluid heat capacity, K is the effective thermal conductivity, and f c * is a sink/source. We follow Nield and Bejan 18 and define dimensionless variables as follows where H is the dimensional layer depth, and T * is the temperature difference between two boundary layers. From these dimensionless variables, we could rewrite our Eqs. (3) and (4) as where ∂� p and ∂� q are the pressure and flux boundaries (i.e., Dirichlet and Neumann boundary conditions), respectively. Here, Ra is the Rayleigh number We then write Eq. (6) in dimensionless form as follows www.nature.com/scientificreports/ where ∂� T is temperature boundary (Dirichlet boundary condition), ∂� in and ∂� out denote inflow and outflow boundaries, respectively, which are defined as The detail of discretization could be found in Kadeethum et al. 6,32 , and the FOM source codes are provided in Kadeethum et al. 32 . After the second step, we have M snapshots of FOM results associated with the different parametric configurations in µ . Since the problem formulation is time-dependent, the output of the FOM solver for each parameter instance µ (i) collects the time series representing the time evolution of the primary variables for each time-step t. Thus, each snapshot contains approximations of the primary variables ( u h , p h , and T h ) at each time-step of the partition of the time domain T . Therefore, based on the training set cardinality M and the number N t of time-steps, we have a total of N t M training data to be employed in the subsequent steps. We note that as our finite element solver utilizes an adaptive time-stepping 6,32 , each snapshot may have a different number of time-steps N t , i.e. N t = N t (µ). The third step aims to compress the information provided by the training snapshots provided by the second step. Kadeethum et al. 6 provide detailed derivations and comparisons between linear and nonlinear compression. Especially the convolutional layers, in their classical form, could not deal with an unstructured data structure (unstructured mesh), which is very common in scientific computing or, more specifically, finite element analysis. Hence, our goal is to develop a nonlinear compression that (1) consistently outperforms (or at least delivers similar accuracy) the linear compression and (2) is compatible with an unstructured data structure.
To achieve this goal, we propose a nonlinear compression utilizing feedforward layers in combination with self-supervised learning (SSL) of Barlow Twins (BT) (Fig. 6). The BT for redundancy reduction is proposed by Zbontar et al. 46 . It operates on a joint embedding of noisy images by producing two distorted images from an original one through a series of random cropping, resizing, horizontal flipping, color jittering, converting to grayscale, Gaussian blurring, and solarization. Since we do not operate on structured data (image) but unstructured data produced by finite element solver, we only employ random noise and Gaussian blur operations to produce our noisy data set, see Fig. 6.
Let z u 1 , · · · , z u Q , z p 1 , · · · , z p Q , and z T 1 , · · · , z T Q be the nonlinear manifolds of the u h , p h , and T h , respectively. For the sake of compactness, we will only discuss primary variable T h . The same procedure holds for u h and p h . Our goal is to achieve Q ≪ MN t where MN t is the total training data, which implies that our nonlinear manifolds could represent our training data using much lower dimension. We employ a vanilla AE (using only feedforward layers) that is regularized by Barlow Twins SSL to obtain z T = z T 1 , · · · , z T Q . We do not use any batch normalization or dropout. The summary of the training process is presented in Algorithm 1. We will provide the detailed implementation in https:// github. com/ sandi alabs. www.nature.com/scientificreports/ In short, during the training phase, our BT-AE model is composed of one encoder, one decoder, and one projector. The training entails two sub-tasks; the first is BT (encoder and projector), which takes place in the outer loop. The second sub-task is responsible for the training of AE (encoder and decoder), which takes place inside the inner loop. The main reasons for this procedure are two folds. The first reason is Zbontar et al. 46 states that the BT works better with large batch sizes. The AE, however, generally requires a small batch size 68,69 . Our previous numerical experiments based on DC-AE 6 also align with this statement. Consequently, we set our batch size of the outer loop as B outer = 512 , and the batch size of the inner loop as B inner = 32 Prior to the training, we distort our training set (i.e., creating T h,A (t, µ) and T h,B (t, µ) from T(t, µ) ) through a series of two operations. First, add random noise is added as follows where T h,A (t, µ), T h,B (t, µ) are intermediate distorted input data. The constant ǫ , which is set to 0.1, determines the noise level as it is multiplied with the standard deviation of the input field. G (0, 1) is a random value which is sampled from the standard normal distribution with mean and standard deviation of zero and one, respectively.
Subsequently, we pass T h,A (t, µ), T h,B (t, µ) through Gaussian blur operation, which reads to obtain T h,A (t, µ) and T h,B (t, µ).
We use a number of the epoch of 50, see Algorithm 1. The outer loop works as follows: training BT begins with passing T h,A (t, µ) and T h,B (t, µ) to the encoder (it is noted we have only one encoder) resulting in z T A (t, µ) and z T B (t, µ) . We then use z T A (t, µ) and z T B (t, µ) as input to the projector resulting in cross-correlation matrix www.nature.com/scientificreports/ C T (t, µ) . C T (t, µ) is a square matrix with the dimensionality of the projector's output, and its values range between -1, perfect anti-correlation, and 1, perfect correlation. The Barlow Twins loss L T BT (BT loss) is then calculated using where and where C T ii (t, µ) denotes the i-th diagonal entry of C T (t, µ) , is a positive constant, which is set to 5 × 10 −3 as recommended by Zbontar et al. 46 , and C T ij are off-diagonal entries of C T . In short, we train our BT part by trying to force L T I to 1, but L T RR to 0 resulting in teaching the encoder and projector learn how to get rid off noise from the distorted data, T h,A (t, µ) and T h,B (t, µ) , and construct a representation that conserves as much T(t, µ) information as possible.
Here, we follow the training procedures used by Kadeethum et al. 6,70 . We use the ADAM algorithm 71 to adjust learnable parameters of encoder(W and b ) and projector(W and b ). The learning rate ( η ) is calculated as 72 where η c is a learning rate at step step c , η min is the minimum learning rate, which is set as 1 × 10 −16 , η max is the maximum or initial learning rate, which is selected as 1 × 10 −4 , step c is the current step, and step f is the final step. We note that each step refers to each time we perform back-propagation, including updating both encoder and projector's parameters.
The inner loop is as follows: training AE starts with obtaining z T (t, µ) by passing T h (t, µ) to the encoder. We then use z T (t, µ) to reconstruct T h (t, µ) through the decoder. Subsequently, we calculate our data compression loss or AE loss ( L T AE ) using Similar to the training of BT, we use ADAM to adjust learnable parameters of encoder(W and b ) and decoder(W and b ) according the gradient of Eq. (18). The η c is adjusted by Eq. (17). In contrast to the training of BT, we use η min = 1 × 10 −16 , and η max = 1 × 10 −5 . Following the training of the BT-AE, we now establish the manifold z T (t, µ), ∀t ∈ T and ∀µ ∈ P during the fourth step shown in Fig. 6. The data available for this task are the pairs (t, µ) and z T (t, µ) in the training set. We achieve this through the training of artificial neural networks (ANN). Following Kadeethum et al. 6,43 , our ANN has five hidden layers, and each layer has seven neurons. We use tanh as our activation function. Here, we use a mean squared error ( MSE z T ) as the metric of our network loss function, defined as follows To minimize Eq. (19), we use the ADAM algorithm to adjust each neuron W and b , a batch size of 32, a learning rate of 0.001, a number of epoch of 10,000, and we normalize both our input ( t, µ ) and output ( z T ) to [0, 1]. To prevent our networks from overfitting behavior, we follow early stopping and generalized cross-validation criteria 4,73,74 . Note that instead of literally stopping our training cycle, we only save the set of trained weight and bias to be used in the online phase when the current validation loss is lower than the lowest validation from all the previous training cycle.
During the online phase (the fifth step shown in Fig. 6), we utilize the trained ANN and the trained decoder to approximate T h (·; t, µ) for each inquiry (i.e., a pair of (t, µ) ) through and, subsequently, We note that, for the prediction phase, our ROM could be evaluated using any timestamps, including one that does not exist in the training phase (i.e., any t that lies within [t 0 , τ ]) because our ROM treats the time domain T continuously. Besides, in contrast with the FOM, the ROM is not bound by the CFL condition and can predict T h t k , µ (i) − T h t k , µ (i) 2 .