Decentralized digital twins of complex dynamical systems

In this article, we introduce a decentralized digital twin (DDT) modeling framework and its potential applications in computational science and engineering. The DDT methodology is based on the idea of federated learning, a subfield of machine learning that promotes knowledge exchange without disclosing actual data. Clients can learn an aggregated model cooperatively using this method while maintaining complete client-specific training data. We use a variety of dynamical systems, which are frequently used as prototypes for simulating complex transport processes in spatiotemporal systems, to show the viability of the DDT framework. Our findings suggest that constructing highly accurate decentralized digital twins in complex nonlinear spatiotemporal systems may be made possible by federated machine learning.


Introduction
A recent multifaceted trend in science, engineering and technology, both in academia and industry, is digital transformation (Westerman et al., 2014;Arts et al., 2015).Thanks to the availability of inexpensive sensors and communication technologies, improved interoperability between sensors, the phenomenal success of machine learning and artificial intelligence, in particular, deep learning, new developments in the computational hardware, and cloud and edge computing infrastructure, each and every sector is steadily moving from data-sparse regimes to the data-rich regimes.However, this growth has resulted in several major challenges.First, the data exhibits a high correlation in many systems, which implies more data does not necessarily translate into more information.Second, it is computationally demanding to train an end-to-end data-driven machine learning model that can be trustworthily used in future predictions.Third, there is a growing interest in the legal and regulatory landscape relevant to data sharing, security, privacy, and intellectual property rights (IPRs).
One outcome of rapid digitalization is the emergence of technologies like digital twins.
Broadly speaking, a digital twin can be defined as a computational model that interacts with the physical assets, and evolves over time to persistently represent the structure and behavior of the involved assets and processes (Kapteyn et al., 2021;San et al., 2021;Rasheed et al., 2020).The digital twin concept has garnered a lot of attention with a multibillion market capitalization value for years to come (San, 2021), involving many different stakeholders all the way from the development to decommissioning phases of underlying systems and processes.Numerous disruptive concepts and start-ups have been continuously evolving to bring more value and impact to society.According to a recent report by Research Dive (2022), the global digital twin market is predicted to grow at a compound annual growth rate of 40%, thereby garnering $125 billion by 2030.In particular, the availability of highly modular open-source libraries such as TensorFlow, PyTorch, and Theano, and the flexibility of cross-platforms such as Unity and Unreal Engine are lowering the barriers for many different potential use cases from precision agriculture (Gebbers and Adamchuk, 2010) to precision medicine (Ashley, 2016) and beyond.This open-source ecosystem has allowed scientists and engineers to expose themselves faster to the state of the art digital twin technologies with cross-functionalities.
One of the key enablers of digital twins is modeling.In computational science and engineering applications, most modeling approaches lie in either of the two categories: first principle physics-based modeling (PBM) or data-driven modeling (DDM).More recently Hybrid Analysis and Modeling (HAM) (San et al., 2021) which is a fusion between the PBM and DDM is fast evolving (e.g., see also Karpatne et al. (2017); Childs and Washburn (2019) 2022)).Out of the three modeling approaches, DDM and HAM thrive on the availability of large amount of high quality data.The data requirement in the context of digital twins poses several challenges specially as one moves towards creating digital twins of increasingly complex systems.We enumerate the challenges which are most relevant in the context of the current work: • Any reasonably complex asset of a physical system or a process consists of multiple subcomponents operated and maintained by different players who might have competing interests making data sharing difficult.This creates the data silo issues.
• Even when the IPRs involving data is sufficiently resolved, the shared big data characterized by 5Vs (large volume, velocity, veracity, variety, value) from different components/vendors can become overwhelmingly large that might require an exorbitant amount of compute power in one place to generate knowledge.
• Furthermore, the analysis of a large variety of big data in a centralized approach requires a wide array of expertise in one place which will be difficult to achieve resulting in sub-par analysis.
• Lastly, the limited bandwidth of data transmission in a centralized data analysis approach limits the amount of shared data.
To that end, the concept of federated learning can be utilized.This concept is a branch of machine learning that deals with training a model across multiple decentralized edge devices (or servers) without exchanging local data samples between them (Yang et al., 2019;Zhang et al., 2021;Wahab et al., 2021;Liu et al., 2022).Although such privacy preserving decentralized and collaborative machine learning applications become more of a main stream approach in advertising, financial and many other industries with personalized dominance, it is relatively uncharted in advancing functionalities and providing technical support to forge a cross-domain, cross-data and cross-enterprise digital twin ecosystem.Federated machine learning algorithms have been successfully used for training machine learning models (e.g., see Zhang et al. (2021)).In this study, we aim to extend the prospects of this concept to complex dynamical systems in order to provide building blocks (from a modeling perspective) for next generation decentralized solvers.
With this in mind, our chief motivation is to communicate to a wide variety of audience on how federated machine learning can address many of the issues raised above.Consequently, this is more of a concept paper and we omit providing rigorous derivations of the utilized algorithms.Consistently, several relatively simple (but representative) problems have been chosen to demonstrate the framework.The plain vanilla versions of learning models have been utilized without any particular efforts dedicated to rigorous hyperparameter tuning and model calibration.Rather our focus is laser sharp on highlighting the exploitation of the benefits of federated machine learning in the context of digital twin.This paper is organized as follows.Section 2 provides a brief overview on the federated learning in the context of digital twins.In Section 3 we detail the federated learning process for our demonstration cases.Then we validate our frameworks in Section 4 demonstrating results from representative spatiotemproal dynamical systems.Our test cases include nonlinear advection-diffusion, chaotic travelling wave, and sea surface temperature forecasting processes.Finally, we highlighted our concluding remarks and outline the future research direction in Section 5.

Digital twins
The need for digitalization has achieved even higher importance since the outbreak of COVID-19.The ability to control physical assets remotely under the "work from home" situation was never felt so deeply.As the world has gradually recovered from this unfortunate situation, one can expect to see a rapid expansion of Digital Twins (Rasheed et al., 2020), which can be defined as virtual representations of physical assets that enable, by combining data and simulators, to perform real-time prediction, optimization, monitoring, control, and improved decision making for the physical assets.Recent advances in computational pipelines, multiphysics solvers, artificial intelligence, cybernetics, data processing and management tools are bringing now the promise of digital twins and their impact on society closer to reality.For these reasons, digital twinning is now an important and emerging trend in many sectors, and there can be no doubt it is bound to play an even more transformative role in the post-COVID-19 world than foreseen previously.The concept of digital twins is quite rich and may benefit from many decades of research and developments in computational science and engineering.
Digital twin, defined as a virtual representation of a physical asset enabled through data and simulators for real-time prediction, optimization, monitoring, control, and improved decision making, is a technology penetrating every domain.Figure 1 describes the concept of a digital twin utilizing a decentralized modeling for physical realism.On the top right side of the figure, we have the physical asset we want to represent by a digital twin.The physical asset is often equipped with a diverse class of sensors that gives big data in real-time.This data has a very coarse spatio-temporal resolution and does not describe the future state of the asset.Therefore, to complement the measurement data, models are utilized to bring physical realism into the digital representation of the asset.Provided that the same information can be obtained from the digital twin as from the physical asset, we can utilize the digital twin for informed decision-making and optimal control of the asset.However, one might be interested in risk assessment, what-if?analysis, uncertainty quantification, and process optimization.These can be realized by running the digital twin in an offline setting for scenario analysis.The concept is then known as a digital sibling.Additionally, the digital twin predictions can be archived during the asset's lifetime and can then be used for designing the next generation of assets, in which case the concept is referred to as digital threads.Digital twins, based on their capability levels, can be categorized on a scale from 0 to 5 as standalone, descriptive, diagnostic, predictive, prescriptive, and autonomous.It is evident that modeling plays a vital role in enhancing the capability of digital twins.As pointed out in San et al. (2021), any modeling approach should at least be generalizable, trustworthy, computationally efficient, accurate, and self-adapting.Most of the modeling techniques currently lie in either the physics-based modeling or data-driven modeling categories.
Since a digital twin requires a real time exchange of data between the physical asset and its digital representation, data privacy and security objectives become an increasingly important topic.Due to involvement of multiple stakeholders all the way from the development, through operation to the decommissioning phase of physical assets, the gathered data has to be secure both in terms of validity and privacy.Many digital twin systems and applications might involve different original equipment manufacturers, and they might want to keep their proprietary rights to secure their designs.
The predictive digital twins are now coined with precision sciences, such as precision agriculture (Gebbers and Adamchuk, 2010), precision medicine (Ashley, 2016) as well as precision meteorology (Chilson et al., 2019), in emphasizing their improved and localized forecasting capabilities.For example, we emphasize the penetration of data-driven modeling in precision meteorology, which often refers to accurate microscale weather forecasting that is of great importance in enabling solutions to water, food, energy and climate challenges in the coming century (Tran et al., 2020).Therefore, an increase of parameterization capabilities and characteristics along with enhanced in situ observations within the atmospheric boundary layer becomes a key concept to improve the accuracy of such microscale weather forecasting models (Nolan et al., 2009).
Our chief motivation in this study relies on the fact that a greater than ever penetration of smart devices (e.g., smart weather stations, smartphones, and smartwatches) has remained an uncharted technology in many spatiotemporal extended systems, and crowdsourcing data-driven modeling could be a key enabler toward their emerging digital twin applications.For example, by 2025 there will be more than 7 billion smartphones in the world.This number is significantly huge compared to the paltry (over 10,000) official meteorological stations located around the world (Mildrexler et al., 2011).While the analysis and utilization of data only from a few edge devices might not yield accurate predictions, the processing of data from many smart and connected devices, equipped with sensors, might be a game-changer in the weather monitoring and prediction.In their where P k is the data on the kth client, n k is the cardinality of is the loss of the prediction on example (x i , y i ), and w refers to the trainable model's parameters.The above aggregation protocol can be applied to any machine learning algorithm.A complete pseudo-code for deep learning models in a federated learning setting is provided in Algorithm 1.We highlight that the approach we utilize in our study simply weights edge devices proportionally by the data they own.More advanced approaches can be considered to mitigate such limitations (Li et al., 2020(Li et al., , 2019;;Fallah et al., 2020;Deng et al., 2020;Tan et al., 2022), but that is beyond the scope of this paper.

Demonstration cases
To demonstrate the value of the federated learning we chose the following three cases.

Burgers system
Our first test case for demonstration of the DDT framework is the one-dimensional Burgers equation, which is a prototypical example for nonlinear advection-diffusion problems.The Burgers equation can be written as follows  1 + t+1 t0 exp where t 0 = exp(1/8ν).We parameterize the Burgers equation using the time t and viscosity ν as the parameter space, i.e., µ = (t, ν) T .The first step of the reduced order modeling (ROM) is to generate the data.We generate the data for different values of parameters.Specifically, the parameter t is varied between 0 to 2, and the parameter ν lies between 0.001 to 0.01.Once the data is generated, we form the matrix A ∈ R Nx×N whose columns are the u n corresponding to µ n , and then perform the singular value decomposition (SVD) of the matrix where W is an N x × N matrix with orthonormal columns w k , V is an N × N matrix with orthonormal columns v k , and Σ is an N × N matrix with non-negative diagonal entries, called singular values, arranged such that The vectors w k are the proper orthogonal decomposition (POD) modes that we denote as φ k , and is the set of POD basis functions (Rowley and Dawson, 2017).The representation of the approximated velocity field using the POD modes is as follows where α k (µ) are the modal coefficients corresponding to parameter µ, and R refers to the number of modes retained in our model.The neural network is trained to learn the relationship between the parameters of the Burgers equation to the POD modal coefficients.Once the neural network is trained, we can predict the POD modal coefficients for a new set of parameters, and the velocity field can be reconstructed using Equation 5.
In the case of the federated ROM framework, the low-dimensional data is assumed to be distributed across clients and the training is done using the federated averaging algorithm as described in Algorithm 1.The complete federated ROM framework is illustrated in

Kuramoto-Sivashinsky system
The second test case pertains to federated machine learning applied to an autoencoder which is a powerful approach for obtaining the latent space on a nonlinear manifold.The autoencoder is composed of an encoder and a decoder, where the encoder maps an input to a low-dimensional latent space and the decoder performs the inverse mapping from latent space variables to the original dimension at the output.If we denote the encoder function as η(w) and a decoder function is defined as ξ(w), then we can represent the manifold learning as follows where z represents the low dimensional latent space and R is the dimensionality of the latent space.
We demonstrate the nonlinear dimensionality reduction using an autoencoder for spatiotemporal chaotic systems in which local data samples are held in edge devices.The encoder and decoder both have three hidden layers and each hidden layer has 100 neurons.
We use the elu activation function for hidden layers and the linear activation function is used at the output layer of both encoder and decoder.
Specifically, we consider the Kuramoto-Sivashinsky (KS) system to demonstrate the prospects of the federated learning concept.This KS system is known for its irregular or chaotic behavior (LaQuey et al., 1975;Kuramoto, 1978;Sivashinsky, 1977;González-García et al., 1998;Pathak et al., 2018;Vlachas et al., 2018;Linot and Graham, 2020;Vlachas et al., 2022), and we often formulate the KS equation with L-periodic boundary conditions as follows where the dynamics undergo a hierarchy of bifurcations as the spatial domain size L is increased, building up the chaotic behavior.Here, we perform the underlying numerical experiments with L = 22 to generate our spatiotemporal data set.Equation 9 is solved using the fourth-order method for stiff partial differential equations (Kassam and Trefethen, 2005;Vlachas et al., 2022) with the spatial grid size of N = 64.The random initial condition is assigned at time t = −250 and the solution is evolved with a time step of 2.5 × 10 −3 up to t = 0. Using the solution at time t = 0 as the initial condition, the KS system is evolved till t = 2500.The data is sampled at a time step of 0.25 and these 10,000 samples are used for training and validation.For the testing purpose, the data from t = 2500 to t = 3750 is utilized.

NOAA Optimum Interpolation Sea Surface Temperature Dataset
Our third test case is the decentralized autoencoder for the nonlinear dimensionality reduction of the NOAA OI SST V2 analysis dataset.This dataset is obtained using in situ (ship and buoy) and satellite SSTs plus SSTs simulated by the sea-ice cover.
Before calculating the analysis, the satellite data is adjusted for biases using the method of Reynolds (1988) and Reynolds and Marsico (1993).This dataset consists of the weekly average sea surface temperature on a 1 • latitude × 1 • longitude global grid (180 × 360).Even though the SST dataset exhibits a strong periodic structure due to seasonal fluctuations, complex ocean dynamics lead to rich flow physics in this dataset.
This dataset has been used in number of recent studies on geophysical emulation (Maulik et al., 2020), flow reconstruction (Callaham et al., 2019), and multi-resolution dynamic mode decomposition (Kutz et al., 2016).The autoencoder is utilized for nonlinear dimensionality reduction of the NOAA OI SST V2 dataset.The autoencoder is composed of the encoder function η(w) which maps the high dimensional sea surface temperature field to low dimensional latent space and a decoder function ξ(w) which maps the low dimensional latent space to the same sea surface temperature field.Both encoder and decoder are parameterized by the weights w and these parameters are learned through training.We can represent the reconstruction of the temperature field with the autoencoder as follows where θ is the sea surface temperature field, z represents the low dimensional latent space and R is the dimensionality of the latent space at the bottleneck layer of the autoencoder.
Here, we use the data from October 1981 to July 2010 (1500 snapshots) for training the autoencoder model and the data beyond July 2010 is used for comparing the performance of the trained autoencoder model with the unseen data.

A decentralized reduced order modeling framework: Burgers system
The mean squared error loss for the validation dataset during the training of centralized and federated ROM frameworks is shown in Figure 4.The validation loss for both for the centralized and federated autoencoder with different dimensionality of the latent space is depicted in Figure 6, and we see that both the losses converge to very similar values.This shows that there is no loss in accuracy due to federated learning compared to centralized learning.As shown in Figure 8, the reconstruction error for both centralized and federated autoencoders saturates around R = 8 modes.Our observations are consistent with previous works (Vlachas et al., 2022;Linot and Graham, 2020;Cvitanović et al., 2010;Robinson, 1994), suggesting that the latent space dynamics lies effectively on a manifold with R = 8 dimensions.
The state trajectory of the KS system for the testing period is shown in Figure 8.
The true data (i.e., FOM) is also included in the figure for comparison purpose.Both the centralized and federated autoencoders have a similar level of predictive performance.
The joint probability distribution function (PDF) of the pointwise values of u x and u xx is shown in Fig. 9.We observe that the joint PDF of the prediction from both centralized and federated autoencoder matches very closely with the true joint PDF.three hidden layers with the number of neurons being {800, 400, 200} across the first, the second, and the third hidden layer, respectively.The decoder has the same architecture in reverse order, i.e., three hidden layers with {200, 400, 800} neurons.The number of neurons in the bottleneck layer is varied between R = 5 to R = 25 with an increment of 5.
The encoder and decoder architecture remain the same for all the numerical experiments.effective strategy especially when the amount of the data is small for each client.Despite this, the validation loss for both centralized and federated learning follows a similar trajectory with the loss reduced by almost one order of magnitude from the initial loss.
The reconstruction of the weekly average temperature field for the fourth week of March 2018 is shown in Figure 11 for both centralized and federated autoencoder along with the full order model, i.e., the analysis.

Conclusion
In this work, we explore the potential of federated machine learning in the context of digital twin operated by multiple vendors for modeling complex spatiotemporal dynamical systems.In particular, we investigated federated learning for nonlinear dimensionality  updating a global model without exposing the local data collected from different sources.
In summary, we addressed the four main challenges related to data mentioned in the introduction section as follows: • Since the data generated by individual vendors never leaves the local servers, a decentralized setting guarantees better security, addresses key concerns related to IPRs, and data silo issues.Furthermore, since each vendor can focus on individual sub-component of the asset, the data quality can be significantly enhanced.
• Since each data owner can conduct the training using the limited amount of data that they generate and own, they can work with significantly low compute power in a cost effective way.
• The decentralized federal learning approach means that specialists can be involved more effectively to analyze, interpret, and train models on the data that they have Although in this study we primarily focus on federated learning in the context of spatiotemporal reconstruction of dynamical systems, our approach can be also generalized to large-scale computational settings beyond transport phenomena, for which the research outcomes might improve broader modeling and simulation software capabilities to design fast, cohesive, effective, and secure predictive tools for cross-domain simulations in the various levels of information density.
In summary, the trained models contributed by the different stakeholder can be reversed engineered thereby compromising data security.In our future studies, we plan to leverage decentralized learning approaches in the context of precision meteorology and develop new physics-guided federated learning approaches to forge new surrogate models compatible among heterogeneous computing environments.Also, we would like to investigate which federated learning approach is best to fuse the weights.The three demonstration cases handled here are relatively simple but that was completely intentional as it eases the communication and dissemination of the work to a larger audience.However, the next logical step should be to apply the approach to a more complex problem.information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof.The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Figure 1 :
Figure 1: An overview of a decentralized digital twin concept for wind farm applications.

Figure 2 :
Figure 2: Overview and schematic illustrations of the centralized and federated machine learning approaches.
Federated machine learning refers to the collaborative training among multiple clients without sharing the actual data.We closely follow the seminal work in federated learning(McMahan et al., 2017), which introduces a federated averaging algorithm where clients collaboratively train a shared model.Figure2contrasts the federated learning approach with the centralized method.In the centralized method, the local dataset is transferred from clients to a central server and the model is trained using centrally stored data.In case of the federated learning, the local dataset is never transferred from clients to a server.Instead, each client computes an update to the global model maintained by the server based on the local dataset, and only this update to the model is communicated.The federated averaging algorithm assumes that there is a fixed set of K clients with a fixed local dataset and a synchronous update scheme is applied in rounds of communications.At the beginning of each communication round, the central server sends the global state of the model (i.e., the current model parameters) to each of the clients.Each client computes the update to the global model based on the global state and local dataset and this update is sent to a server.The server then updates the global state of the model based on the local updates received from all clients, and this process continues.The objective function for a federated averaging algorithm can be written as follows by the data they own end for ClientUpdate(k, w): B ← (split P k into batches of size B) for each local epoch i from 1 to E do for batch b ∈ B do w ← w − α∇l(w; b) end for end for return w to a server where u(x) ∈ R Nx is the velocity field.We can write the analytical solution of the Burgers equation as follows u(x, t) = x t+1

Figure 3 Figure 3 :
Figure 3 where clients collaboratively learn an aggregated model while keeping all the training data on their respective client.The machine learning model is a feed-forward neural network with four hidden layers and forty neurons per hidden layer.We use the relu activation function to introduce nonlinearity and the linear activation function is used at the output layer.

Figure 4 :
Figure 4: Validation loss during training of the centralized and federated neural network for the ROM framework of Burgers equation.

Figure 5 :
Figure 5: Reconstruction performance of the centralized and federated learning approaches for Burgers equation for µ = (0.02, 0.00475) T .

Figure 6 :
Figure 6: Validation loss during training of the centralized and federated neural network for the Kuramoto-Sivashinsky system.

Figure 10
Figure 10 depicts the validation loss during both centralized and federated training of the autoencoder.The validation loss for federated learning is slightly higher compared to the centralized learning from the first epoch.This can be attributed to the fact that in federated averaging the weights of the global model are initialized by simply averaging the weights of the client model.The simple averaging of weights might not be the most 14

Figure 7 :
Figure 7: Mean square error representing the reconstruction performance of the federated and centralized autoencoders considering different latent space dimensions in the Kuramoto-Sivashinsky system.
reduction of dynamical systems.Federated learning allows for collaborative training of a model while keeping the training data decentralized.Our numerical experiments show that a federated model can achieve the same level of accuracy as the model trained using the central data collected from all clients.This work opens up the possibility of

Figure 8 :
Figure 8: Space-time solutions of the Kuramoto-Sivashinsky system obtained by the federated and centralized models.

Figure 9 :
Figure 9: Joint PDF of pointwise values of ux and uxx on a logarithmic scale for solving the Kuramoto-Sivashinsky system.

Figure 10 :
Figure 10: Validation loss during training of the centralized and federated neural network for the NOAA Optimum Interpolation Sea Surface Temperature Dataset.

Figure 11 :
Figure 11: Weekly averaged temperature field in degrees Celsius for the fourth week of March 2018 along with reconstructed temperature field from the centralized and federated autoencoders.

Centralized Data Centralized Model Aggregated Model Client Models
(Sivaraman et al., 2018)t al. (2021)highlighted that the Weather Company utilizes data from over 250,000 personal weather stations.Moreover,Chapman et al. (2017)discussed how the crowdsourcing data-driven modeling paradigm can take meteorological science to a new level using smart Netatmo weather stations.As more attention shifts to smart and connected internet of things devices, security and privacy implications of such smart weather stations have been also discussed(Sivaraman et al., 2018).2.2.Federated learning Algorithm 1 Federated averaging algorithm.B is the local minibatch size, E is the number of local epochs, and α is the learning rate, and w refers to the trainable parameters.