A pretraining domain decomposition method using artificial neural networks to solve elliptic PDE boundary value problems

Developing methods of domain decomposition (DDM) has been widely studied in numerical computation to estimate solutions of partial differential equations (PDEs). Several case studies have also reported that the domain decomposition approach is feasible when artificial neural networks (ANNs) are applied to solve PDEs. In this study, we devised a pretraining scheme, called smoothing with a basis reconstruction process, on the structure of ANNs and then implemented the classic concept of DDM. The pretraining process, engaged at the beginning of the training epochs, helps the approximation basis become well-posed on the domain so that the quality of the estimated solution is enhanced. We report that such a well-organized pretraining scheme may benefit any NN-based PDE solver, as it can speed up the approximation and improve the solution's smoothness. Numerical experiments were performed to verify the effectiveness of the proposed DDM method on ANNs for estimating solutions of PDEs. Results revealed that this method could also be used as a tool for tasks in general machine learning.

control problems32, the methods of unsupervised learning33, the immersed boundary problem34, and a constrained backpropagation approach35.
In 2020, devising a domain decomposition methodology (DDM), Jagtap et al.36 proposed conservative physics-informed neural networks to solve Burgers' equations, incompressible Navier-Stokes equations, and compressible Euler equations. Among many variations of ANN models, the authors of Refs. 36,37 implemented PINN-based domain decomposition methodologies and showed the effectiveness of their approximations by hp-refinement. They divided the computational domain into several subdomains, constructed the main loss function by employing weight parameters on the terms for boundary/initial conditions, the residual of the governing equations, and the interfaces, and applied their method to forward and inverse problems of several challenging nonlinear PDEs. Since hp-refinement technologies have been one of the main concerns in research on the numerical analysis of PDEs38-40, their breakthroughs employing ANN-based decomposition of the domain are impressive.
Many studies have developed efficient technologies for solving PDEs with ANN models, as listed above. However, no existing model employs a pretraining technique. As the main contribution of our work, we propose a pretraining scheme that improves the performance of a given ANN model for solving elliptic PDEs.
In this study, as another approach to an ANN-based DDM, we developed a pretraining scheme for DDM. The pretraining process, engaged at the beginning of the training epochs, helps the approximation basis become well-posed on the domain so that the accuracy of the estimated solution is enhanced, i.e., the approximation converges faster than with non-pretrained schemes. We report that such a well-organized pretraining scheme is effective as part of an NN-based PDE solver. In our work, we did not focus on handling disadvantages of existing methods. Instead, we focused on developing an ad-hoc technology, a pretraining scheme, that can be applied to any existing ANN model to enhance its numerical performance.
The proposed pretraining scheme consists of two parts. The first part trains against the governing equations without any engagement of the boundary or the subdomains' interfacial conditions. The second part trains neighboring subdomains to be flattened, i.e., a subdomain's estimated function is compelled to be zero at data points located in neighboring subdomains. In this article, we call the first part the smoothing process and the second part the basis reconstruction (BR) process. With numerical experiments, we show that as the proposed pretraining process continues, the estimated solution produced by the fine-tuning improves. Moreover, we report that the BR process helps the numerical basis of the ANN become well-posed for approximating the high-degree polynomial components of solutions: the BR process makes the basis of a subdomain's ANN gather closely on its region, so that the initial state of the ANN, at the starting point of the fine-tuning, has high smoothness.
Unlike existing NN-based domain decomposition methods that employ hyperparameters within the loss function, our proposed method consists of a pretraining and fine-tuning scheme that can be used in various architectures of deep-learning algorithms. As the main contribution of the proposed method, the pretraining process engaged at the beginning of the training epochs helps the approximation basis become well-posed on the domain so that the quality of the estimated solution is enhanced. Such a well-organized pretraining scheme may benefit any NN-based PDE solver by speeding up the approximation and improving the solution's smoothness.
In "Methods" section, we introduce our domain decomposition method for ANNs. In "Numerical experiments" section, we present numerical results of using our method to solve elliptic boundary value PDEs and the XOR problem as a demonstration of the method's performance for application to the general machine learning task. We then discuss properties of our proposed method. In "Conclusion" section, we present some conclusions drawn from our study.

Methods
The feedforward multi-layered neural networks and the system of PDEs. Let the architecture of the multi-layered ANN (abbreviated as ANN hereafter) be given as follows:

N(x, p) = v · σ(w x̄), (1)

where v ∈ R^H and w ∈ R^{H×(d+1)} denote learnable weights, p represents them as general parameters, x denotes the input variable, x̄ = (x, b)^T is an extension of x with a constant b for the computation of the bias weight in w, and σ: R^H → R^H is a component-wise activation function. In our study, we employed the sigmoidal activation function. The d-dimensional problem structured by N(x, p) is illustrated in Fig. 1. Following Ref. 28, we considered the following system of PDEs:

G(x, Ψ(x), ∇Ψ(x), ∇²Ψ(x)) = 0, x ∈ Ω, with Ψ(x) = A(x) + B(x)N(x, p), (2)

where A(x) satisfies the boundary condition such that Ψ(x) = A(x), ∀x ∈ ∂Ω, and N(x, p) is the single-hidden-layered ANN. The scalar function B(x) is constructed to satisfy B(x)N(x, p) = 0, ∀x ∈ ∂Ω_D, where ∂Ω_D is the portion of ∂Ω on which the essential boundary condition is imposed. For details of techniques on how to impose constraints, please refer to Refs. 28,29.
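As a concrete illustration, the single-hidden-layer map of Eq. (1) can be sketched in a few lines of NumPy; the sizes H and d, the bias constant b, and the random initialization below are illustrative choices, not values from the paper.

```python
import numpy as np

def sigmoid(z):
    # component-wise sigmoidal activation, as used in the paper
    return 1.0 / (1.0 + np.exp(-z))

def ann_forward(x, v, w, b=1.0):
    """Single-hidden-layer network N(x, p) = v . sigma(w @ x_bar),
    with x_bar = (x, b)^T extended by a constant bias input b."""
    x_bar = np.append(np.atleast_1d(x), b)   # shape (d + 1,)
    return float(v @ sigmoid(w @ x_bar))     # scalar network output

# hypothetical sizes: d = 2 inputs, H = 5 hidden units
rng = np.random.default_rng(0)
H, d = 5, 2
v = rng.normal(size=H)                       # output weights, v in R^H
w = rng.normal(size=(H, d + 1))              # input weights, w in R^{H x (d+1)}
y = ann_forward([0.3, 0.7], v, w)
```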
To estimate the trial solution Ψ_t(x, p) solving the system of Eq. (2), we defined a discretization of Ω such that Ω̂ = {ξ_j ∈ Ω; j = 1, . . . , m}. Then Eq. (2) takes the form of the collocation method3 as follows:

G(ξ_j, Ψ_t(ξ_j, p), ∇Ψ_t(ξ_j, p), ∇²Ψ_t(ξ_j, p)) = 0, j = 1, . . . , m, (3)

min_p Σ_{j=1}^{m} G(ξ_j, Ψ_t(ξ_j, p), ∇Ψ_t(ξ_j, p), ∇²Ψ_t(ξ_j, p))². (4)

To train the weight parameters of the minimization problem in Eq. (4), we applied the backpropagation method used in gradient descent27,28,35.
As our example problem of the form of Eq. (2), the Poisson equation is given as follows:

∇²u(x) = F(x), x ∈ Ω, (6)

where F(x) is the source function.
To estimate the solution of Eq. (6), we discretized the minimization problem of Eq. (6) as follows:

min_p Σ_{j=1}^{m} (∇²Ψ_t(ξ_j, p) − F(ξ_j))². (7)

Now, by employing the gradient descent method with a suitable learning rate, we obtained the iterative solution of Eq. (7). For the weight parameters w and v of p, using Eqs. (5) and (7), their n-th epoch's gradient updates Δw^(n) and Δv^(n) are given by the chain rule applied to the derivatives of Eq. (7) with respect to w^(n) and v^(n), respectively, where σ′_k is the derivative of the k-th component of σ.
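To make the collocation training of Eq. (7) concrete, the following is a minimal sketch that fits a single-hidden-layer network to a 1D Poisson problem by plain gradient descent on the collocation loss with a boundary penalty. The manufactured solution u(x) = sin(πx), the penalty weight, the step sizes, and the use of finite differences (in place of the paper's analytic chain-rule derivatives) are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x, p, H):
    # single-hidden-layer network: v . sigmoid(w*x + c) for a batch of x
    v, w, c = p[:H], p[H:2 * H], p[2 * H:]
    return sigmoid(np.outer(x, w) + c) @ v

def laplacian(x, p, H, h=1e-3):
    # central-difference second derivative of the network output
    return (net(x + h, p, H) - 2 * net(x, p, H) + net(x - h, p, H)) / h**2

def loss(p, xs, F, xb, gb, H, lam=10.0):
    # collocation residual of u'' = F plus a boundary penalty (Eq.-(7)-style)
    interior = np.sum((laplacian(xs, p, H) - F(xs)) ** 2)
    boundary = np.sum((net(xb, p, H) - gb) ** 2)
    return interior + lam * boundary

# manufactured problem: u(x) = sin(pi x), so F(x) = -pi^2 sin(pi x)
F = lambda x: -np.pi**2 * np.sin(np.pi * x)
xs = np.linspace(0.1, 0.9, 17)                  # interior collocation nodes
xb, gb = np.array([0.0, 1.0]), np.array([0.0, 0.0])

H = 10
rng = np.random.default_rng(1)
p = 0.5 * rng.normal(size=3 * H)

lr, eps = 1e-4, 1e-6
L0 = loss(p, xs, F, xb, gb, H)
for _ in range(200):
    g = np.zeros_like(p)
    for i in range(p.size):                     # numerical gradient (simple, slow)
        dp = np.zeros_like(p)
        dp[i] = eps
        g[i] = (loss(p + dp, xs, F, xb, gb, H)
                - loss(p - dp, xs, F, xb, gb, H)) / (2 * eps)
    p -= lr * g
L1 = loss(p, xs, F, xb, gb, H)
```

In the paper's setting the gradients Δw^(n) and Δv^(n) are obtained analytically by the chain rule; the numerical gradient here only keeps the sketch short.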
The pretraining domain decomposition method for ANNs. In our study, the given computational domain was divided into several non-overlapping smaller computational regions over which the governing PDEs were inherited. We call these small computational domains the subdomains of our domain decomposition method. The core of our proposed DDM scheme begins with the flux-continuity on the interfaces of the subdomains. After that, the construction of the basis related to the solutions on the subdomains is considered. To illustrate the scheme, we defined a pair of consecutive neighboring subdomains Ω_r, Ω_s ⊂ Ω that were disjoint except for their interface. Trial functions Ψ_{t,r}(·, p_r) and Ψ_{t,s}(·, p_s) were given for Ω_r and Ω_s, respectively.
Flux-continuity on interfaces. The training for the interfacial flux-continuity was divided into two parts: the nodal continuity and the continuity of the normal directional derivatives.
(1) Nodal continuity:

Ψ_{t,r}(ζ, p_r) = Ψ_{t,s}(ζ, p_s), ∀ζ ∈ ∂Ω_r ∩ ∂Ω_s. (9)

(2) Continuity of the outward normal directional derivatives:

∂Ψ_{t,r}(ζ, p_r)/∂n_r + ∂Ψ_{t,s}(ζ, p_s)/∂n_s = 0, ∀ζ ∈ ∂Ω_r ∩ ∂Ω_s. (10)

For the normal directional derivatives in Eq. (10), we applied direct differentiation of Ψ_{t,r}(ζ, p_r) and Ψ_{t,s}(ζ, p_s) with respect to the normal directions on their domains. Neglecting the subscript r or s, the normal derivative is computed as

∂Ψ_t(ζ, p)/∂n = n⃗ · ∇Ψ_t(ζ, p),

where n⃗ is the outward directional normal vector on the interface.
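In 1D the interface conditions of Eqs. (9) and (10) reduce to matching values and first derivatives at the interface point (the opposite outward normals cancel the sign). A hypothetical penalty form, with randomly initialized subdomain networks, might look like:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def net(x, v, w, c):
    # single-hidden-layer subdomain network, scalar input x
    return float(v @ sigmoid(w * x + c))

def d_net(x, v, w, c, h=1e-6):
    # first derivative at x by central differences
    return (net(x + h, v, w, c) - net(x - h, v, w, c)) / (2 * h)

def interface_loss(zeta, p_r, p_s):
    """Penalty form of Eqs. (9)-(10) at a 1D interface point zeta:
    value continuity plus matching derivatives (in 1D the opposite
    outward normals make flux continuity reduce to equal derivatives)."""
    nodal = (net(zeta, *p_r) - net(zeta, *p_s)) ** 2
    flux = (d_net(zeta, *p_r) - d_net(zeta, *p_s)) ** 2
    return nodal + flux

# two hypothetical subdomain networks meeting at zeta = 0.5
rng = np.random.default_rng(3)
H = 6
p_r = (rng.normal(size=H), rng.normal(size=H), rng.normal(size=H))
p_s = (rng.normal(size=H), rng.normal(size=H), rng.normal(size=H))
mismatch = interface_loss(0.5, p_r, p_s)   # positive for differing networks
matched = interface_loss(0.5, p_r, p_r)    # identical networks agree exactly
```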

Construction of the basis on subdomains.
As the key subject of our study, a pre-processing approach was taken for the multi-subdomain model by proposing a pretraining scheme that initializes the training weight parameters well, with the aim of enhancing the convergence power. This pretraining can be divided into two parts.
(1) Pretraining for the loss function on each subdomain (smoothing process). In this step, we strengthened the convergence power in the interior of each subdomain by excluding the flux-continuity terms. This provides an early estimate of the interior smoothness before the interfacial flux-continuity exchanges. The pretraining optimization problem is

min_{p_q} Σ_{ξ∈Ω̂_q} (∇²Ψ_{t,q}(ξ, p_q) − F(ξ))², (13)

where the subscript q denotes an arbitrary subdomain index.
(2) Pretraining to impel the basis into each subdomain (basis reconstruction (BR) process). In this step, we aim to train the network to nullify the basis arising in the neighboring subdomains. We hypothesize that if the function-shaping basis exists outside of the interface, it may weaken the power of the polynomial basis that shapes the interior estimations. Hence, to prevent the basis from being non-zero outside of the interfaces, we performed nullification pretraining such that

min_{p_q} Σ_{ξ∈Ω̂\Ω^ǫ_q} Ψ_{t,q}(ξ, p_q)², (14)

where Ω^ǫ_q is the ǫ-expansion of Ω_q for a given positive real number ǫ > 0, defined by

Ω^ǫ_q = {x ∈ Ω : dist(x, Ω_q) < ǫ}. (15)

The definition of Ω^ǫ_q is illustrated in Fig. 2. Assuming that a two-dimensional (2D) rectangular domain is given, the gray-shaded region depicts Ω_q and the region enclosed by a red curve represents Ω^ǫ_q.
Note that the above two parts of the pretraining occur sequentially in each epoch. The main purpose of the pretraining scheme is to enhance the computational smoothness of the basis functions made of activation functions. As we nullify solutions outside a given subdomain, we assume that the smooth region of each basis function moves near the region of the subdomain, so that a basis well-posed on the domain may prevent leakage of the computational basis' smoothness off the domain.
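A minimal 1D sketch of the BR nullification step described above: the network of a hypothetical subdomain [0, 0.5] is penalized at collocation nodes outside its ǫ-expansion, and for simplicity only the output weights v are trained (the gradient is then exact because the output is linear in v). The subdomain, ǫ = 0.13, and all sizes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical 1D setting: domain [0, 1], subdomain Omega_q = [0, 0.5],
# and its eps-expansion [0, 0.5 + eps] with eps = 0.13
eps = 0.13
xs = np.linspace(0.0, 1.0, 101)
outside = xs > 0.5 + eps               # collocation nodes outside the expansion

rng = np.random.default_rng(2)
H = 8
w = rng.normal(size=H)                 # input weights (held fixed here)
c = rng.normal(size=H)                 # biases (held fixed here)
v = rng.normal(size=H)                 # output weights (trained)

# hidden activations at the outside nodes; the network output there is S @ v
S = sigmoid(np.outer(xs[outside], w) + c)

def br_loss(v):
    # Eq.-(14)-style nullification penalty: squared outputs outside Omega_q^eps
    return float(np.sum((S @ v) ** 2))

L0 = br_loss(v)
lr = 1e-3
for _ in range(500):
    v -= lr * 2.0 * S.T @ (S @ v)      # exact gradient: output is linear in v
L1 = br_loss(v)
```

After these steps the network's magnitude outside the ǫ-expansion shrinks, which is exactly the "impelling the basis into the subdomain" effect the text describes.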
The proposed ANN version of DDM is illustrated in Fig. 3.

Numerical experiments
The first numerical experiment was a preliminary study investigating the pretraining's effect on one-dimensional (1D) problems. We applied the smoothing process as pretraining, without basis reconstruction or nullification, to the Poisson problem with the Dirichlet boundary condition. Note that throughout this work, the benchmark algorithm is a DDM model without a pretraining process, conceptually the same as the algorithm introduced in Ref. 36: one of the recently proposed algorithms that applies the traditional DDM approach, preserving the flux continuity between subdomains, to ANNs as-is (refer to Refs. 36,37).

1D problem: effect of smoothing. The 1D Poisson equation with Dirichlet boundary condition is

u″(x) = F(x), x ∈ Ω, with u = g on ∂Ω, (16)

where the solution u satisfies the equation over the domain Ω = [0, 2] and the boundary of Ω is ∂Ω = Ω̄\(0, 2). As a test solution, we let u = u(x), for x ∈ Ω, be the highly oscillatory function given in Eq. (17). Figure 4 shows plots of the estimated solutions made from the ordinary single-domain method, compared to the exact solution. For comparison, we varied the number of hidden units over 150, 200, and 300. To organize the set of training data, we discretized the domain equi-spatially into 3 × 2⁴ nodes. Among the estimations, only the case of 300 hidden units gave a good approximation to the solution. Note that when updating neural weights in the training, we used the smallest batch model, which updates weights for each input datum, like the stochastic batch mode. Next, to assess the computing-power effectiveness of the DDM model, we conducted training with a three-subdomain method without any pretraining process. Figure 5 shows a plot of these results. Here, the hidden units were set to 20, 30, and 100 on the respective subdomains, so the total number of hidden units was 150. As shown in Fig. 5, these preset hidden units of the DDM failed to estimate the solution. Thus, the ordinary scheme of DDM with ANNs did not improve the results in this case.
Next, to improve the DDM by using smoothing for pretraining, we applied it to the same structure of subdomains with the same sets of hidden units. The pretraining used 10,000 epochs. For fine-tuning, the ratio of the training epochs at the boundary, the subdomains' interfaces, and the governing equation was set to 20:2:1, i.e., the training took place at boundary nodes for 20 epochs, then for 2 epochs at the interfaces, and then for 1 epoch for the governing equation on the interior nodes of each subdomain. The number of training epochs at the boundary was set relatively high because the structure of our ANN was not equipped with an ansatz to hold the boundary condition; we propose that further training at the boundary allows estimations to be stable and maintain their analytic uniqueness. The resulting configuration of the estimated solution and the plots of the progression of the loss versus epoch are shown in Figs. 6 and 7, respectively. The smoothing model approximated the exact solution in Eq. (17) well, whereas the single-domain method and the ordinary DDM approach failed to make an accurate estimate (Figs. 6, 7). This demonstrates that highly oscillatory solutions such as Eq. (17), which the ordinary single-domain method can hardly solve with smaller numbers of hidden units, are solved more effectively by the DDM model through the effective alignment of the subdomains' numerical basis delivered by the DDM's inherent domain-separation property. Note that, for our numerical results, the learning rate was set to 10⁻⁶. In our preliminary tests, the convergence rate deteriorated for learning rates much greater than 10⁻⁶. Even for DDM solutions, in subregions where the oscillation or size change of the solution was large, the convergence rate was significantly lower than in other regions.
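The 20:2:1 fine-tuning cycle described above can be written as a simple phase scheduler; the helper name and the cycle count are illustrative, not from the paper.

```python
def finetune_schedule(n_cycles, ratio=(20, 2, 1)):
    """Phase sequence for one fine-tuning run, following the text's
    20:2:1 ratio: boundary epochs, then interface epochs, then one
    governing-equation epoch on interior nodes, repeated cyclically."""
    phases = []
    for _ in range(n_cycles):
        for name, reps in zip(("boundary", "interface", "interior"), ratio):
            phases.extend([name] * reps)
    return phases

# three full 23-epoch cycles of the 20:2:1 schedule
sched = finetune_schedule(n_cycles=3)
```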
In the 1D problem, we defined the domain to span a relatively long region, [0, 2]. For this domain, we found that an ordinary ANN with a small number of hidden units failed to converge to the right solution. For this reason, as a special case revealing the effectiveness of DDM applied to ANN-structured solutions, an ANN solution with several hundred hidden units was used for comparison. We found that, even for a solution that is hard to learn due to tough conditions on the domain, the DDM could reliably reach the right convergence. The settings of the two-dimensional experiments are listed in Table 1. Note that the initial weight parameters of the ANN were given randomly since we intended to verify prominent and consistent effects of the methods even in random situations. The ratio of the training epochs at the boundary, the subdomains' interfaces, and the governing equation was set to 40:2:2, i.e., the training took place at boundary nodes for 40 epochs, then for 2 epochs at the interfaces, and then for 2 epochs for the governing equation on the interior nodes of each subdomain.
The basis reconstruction scheme on the 2 × 2 square subdomains of our test problem is illustrated in Fig. 8. In Fig. 9, the estimated solutions for the case of 6 hidden units are depicted in the subplots (left-hand side) along with the decay of the loss function (right-hand side). The solution estimated with the single-domain model with 6 hidden units is denoted '1-domain'. The ordinary DDM model without pretraining is 'No-pretraining'. The pretraining model of DDM with only smoothing is 'Smoothing'. DDMs with the BR method and ǫ of 0.05, 0.1, and 0.2 are indicated as 'BR-0.05', 'BR-0.1', and 'BR-0.2', respectively. For the '1-domain' case, the solution estimation was incomplete and the loss remained high, suggesting that shortages in the ANN's numerical basis caused the estimation to fail. For the methods using DDM structures, the loss declined; these methods relaxed the shortages in the approximation basis of the single-domain model. Hence, from a computational-effectiveness perspective, when parallel processors are applied, DDM models might improve the convergence rate: independent processors are distributed to the subdomains, and each processor's training cost is reduced because its allocated portion of the training data shrinks as the domain is divided (i.e., the overall order of the basis is strengthened while the training load on each processor is relieved)39,40.
To demonstrate further, Figs. 10 and 11 show the configurations of the estimated solutions and the loss along epochs for the cases with 8 and 10 hidden units, respectively. The single-domain approximation method still struggled to reduce the loss. The DDM methods were comparatively more computationally powerful, although none of them clearly dominated as the size of the hidden layer increased.
As shown in Figs. 9, 10 and 11, the DDM-based lower-order ANN model produced a more convergent solution than a higher-order single-domain model. When training is performed on given input data, the computational costs of the DDM-based model and the single-domain model are the same if both models have the same number of hidden nodes. In our work, we focused on the computational efficiency of our proposed DDM technique, since single-domain models might have difficulty obtaining the right solutions for some problems, whereas our proposed methods were validated to handle these problems.
Results of comparing the different DDM models are presented in Fig. 12, which shows the values of the loss at the end of each training as the number of hidden units increases. As shown in Fig. 12, results improve as the size of the hidden layer increases. The remaining problem is whether the weight parameters can generate a basis suited to the exact solution,

u(x) = (1/(e^π − e^{−π})) sin(πx₁)(e^{πx₂} − e^{−πx₂}). (18)

In the next subsection, we present evidence that pretraining with the BR method improves the smoothness of the basis functions generated by the hidden layer. Hence, we can interpret that the effect of ǫ causes the BR method to enrich the order of the polynomials which shape the basis.
Comparing polynomial degrees using the H² semi-norm. We examined the use of the H² semi-norm to compare the degrees of the polynomials in the basis, as a measure of smoothness, according to the pretraining. Although we did not investigate the full basis degree, the H² semi-norm providing only an indirect comparison of the strength of the basis, our numerical results gave us confidence that the proposed pretraining method can help the initial basis reform its polynomials to higher degrees. From the structure of the ANN, the solution is a linear combination of the outputs of the activated hidden units. Hence, the H² semi-norm of the functions shaped by the activation of the hidden units, i.e., the measure of the degree of the polynomials of each hidden unit's activation function, can be considered a measure of the degree of the polynomials of the numerical basis of the ANN. Precisely, we measured the H² semi-norm of each hidden unit's output values activated in each subdomain. The values of the hidden units on their subdomains were then summed to form a total measure, whose mean is reported in Table 2. In Figs. 13 and 14, the values in Table 2 are plotted. Figure 13 (where the x-axis is H) clearly shows that the values of the H² semi-norm for the BR method are almost always greater than those for the smoothing method. In Fig. 14, the x-axis denotes the pretraining method. The values tended to increase as ǫ decreased, meaning that the basis of BR with low values of ǫ improved its polynomial degrees as a result of the pretraining. The results in Fig. 12 (where the BR method with ǫ = 0.3 and 0.4 generated a relatively low-quality decay of the loss) can be interpreted to mean that the BR method with a relatively low ǫ benefits the fitting of the solution's estimation in higher orders due to the enrichment of the basis. The computation of the H² semi-norm on the domain is conducted as follows.
Let us define the output of the activation of the hidden units as Φ_k such that

Φ_k(x) = σ(w_k x̄), (19)

where w_k ∈ R^{H×(d+1)} denotes the learnable weights defined in Eq. (1) and the subscript k denotes the index of subdomains. The discrete form of the second-order derivative is the straightforward central difference shown in Eq. (20):

∂²Φ_k(ξ)/∂x(i)² ≈ (Φ_k(ξ_{i,+δ}) − 2Φ_k(ξ) + Φ_k(ξ_{i,−δ}))/δ², (20)

where ξ_{i,±δ} = (ξ(1), ξ(2), · · · , ξ(i) ± δ, . . . , ξ(d)). The measure of the H² semi-norm of Φ_k, denoted |Φ_k|₂, is defined by the following:

|Φ_k|₂² = Σ_{ξ∈Ω̂_k} Σ_{i=1}^{d} ‖∂²Φ_k(ξ)/∂x(i)²‖². (21)

We summed all individual |Φ_k|₂ so that we could obtain Σ_k |Φ_k|₂ for the values listed in Table 2.
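For a single scalar function on a 1D grid, the discrete H² semi-norm of Eqs. (20)-(21) can be sketched as follows; the grid, the step δ, and the quadratic test function are illustrative (for φ(x) = x² the central second difference is exactly 2, so the squared semi-norm over 11 nodes is close to 4 × 11 = 44).

```python
import numpy as np

def h2_seminorm_sq(phi, xs, delta):
    """Discrete H^2 semi-norm (squared) of a 1D function phi on the
    nodes xs: sum of squared central-difference second derivatives,
    following the Eq.-(20)/(21) construction in one dimension."""
    vals = phi(xs)
    d2 = (phi(xs + delta) - 2 * vals + phi(xs - delta)) / delta**2
    return float(np.sum(d2**2))

xs = np.linspace(0.0, 1.0, 11)
# quadratic test: second derivative is exactly 2 at every node
val = h2_seminorm_sq(lambda x: x**2, xs, delta=1e-4)
```

A function with higher curvature (e.g. a rapidly oscillating sigmoid) yields a larger value, which is how the semi-norm serves as a proxy for the polynomial degree of the basis.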

Verification of the learning efficiency for general ANN tasks by simulation. If the BR method
can be effectively applied to the estimation of solutions of PDEs, it stands to reason that it may also be effectively applied to a general machine learning task with ANNs. To demonstrate that the BR method can enhance the performance of ANNs as a tool for machine learning, we tested the BR method on the problem of differentiating the exclusive OR (XOR) logic gate, a well-known low-dimensional machine learning problem for ANNs. To arrange the XOR problem in a way that fits the concept of the BR method, we reorganized the dataset in the computational domain Ω ⊂ R², where Ω = [0, 1]², as follows (see Fig. 15). The XOR dataset consists of the points (0, 0), (1/2, 0), (0, 1/2), and (1/2, 1/2), with labels 1, 0, 0, and 1, respectively. We defined the subdomain Ω_q as Ω_q = [0, 1/2]², which includes the dataset (see Fig. 15), and denoted the dataset as X_q = [(0, 0), (1/2, 0), (0, 1/2), (1/2, 1/2)] with the set of labels Y = [1, 0, 0, 1]. Note that the definition of Ω^ǫ_q is given in Eq. (15). We can now employ the BR scheme as illustrated in Fig. 16. Since BR is a pretraining scheme, we measured the efficiency as a function of the number of epochs in the pretraining process. In our experiments, we set the number of pretraining epochs to 500, 1000, 2000, and 3000. Results are compared with the ordinary training scheme, which does not apply any pretraining method. We discretized each subdomain of Ω except Ω_q into 21 equi-spatial nodes along each axis so that there were about 400 nodes in total.
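The XOR setup above can be sketched as follows: the four data points with their labels, and the equispaced nodes of the unit square on which BR nullification is applied, i.e., those lying outside the ǫ-expansion of Ω_q = [0, 1/2]². The value ǫ = 0.12 and the 21-node-per-axis grid over the whole square are illustrative assumptions, not the paper's exact discretization.

```python
import numpy as np

# XOR data points and labels as in the text
X_q = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
Y = np.array([1.0, 0.0, 0.0, 1.0])

def nullification_nodes(eps, n=21):
    """Equispaced nodes of [0, 1]^2 lying outside the eps-expansion of
    Omega_q = [0, 1/2]^2; BR pretraining drives the network output to
    zero on these nodes."""
    g = np.linspace(0.0, 1.0, n)
    xx, yy = np.meshgrid(g, g)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    # outside the expansion means either coordinate exceeds 0.5 + eps
    outside = (pts[:, 0] > 0.5 + eps) | (pts[:, 1] > 0.5 + eps)
    return pts[outside]

nodes = nullification_nodes(eps=0.12)
```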
In Fig. 17, results of the experiments for BR with varying ǫ are shown for the case with 4 hidden units. The values of ǫ used were 0.05, 0.1, 0.2, 0.3, and 0.4. In the plots, data for 500, 1000, 2000, and 3000 epochs of pretraining are indicated as PT500, PT1000, PT2000, and PT3000, respectively. The ordinary training scheme which had no pretraining process was denoted as PT0. Note that initial values of neural weights were set to be the same for each case since we aimed to verify the progression as we increased epochs of the pretraining of each method. Our results revealed that as the number of epochs of pretraining increased, the convergence of the loss to 0 was accelerated. As the size of ǫ increased, the speed of convergence was further accelerated. However, for the case of PT0, the convergence progress appeared to stop in the middle and the training failed to optimize the loss.
To make a comparison with a larger hidden layer, we repeated the experiments with 200 hidden units and plotted the results for ǫ = 0.05, 0.1, and 0.2, as shown in Fig. 18. Similar to the previous results with 4 hidden units, as the number of pretraining epochs increased, the convergence of the loss accelerated, although the differences in convergence speed were insignificant (Fig. 18). The experimental results above suggest that the BR technique applied in the pretraining helps the ANN model estimate the solution of the XOR problem. Furthermore, the finding that the convergence speed increased with the number of pretraining epochs indicates that the training results were not caused by a random phenomenon with vague factors, and that the BR method strongly contributed to the effective utilization of the optimization power of the hidden units.
Figure 15. The domain of the XOR problem, data points with labels in red letters (left), and the domain structure to apply the BR method with ǫ (right).
On the other hand, to demonstrate that the basis reconstruction method improved the approximation power of ANNs, we examined whether the basis function of each hidden unit developed into a state of higher polynomial degree so that the estimation power of the hidden layer could be improved. The approximated function representing the output of the XOR logic gate, given in Fig. 19, shows a wrong approximation of XOR by PT0 on the left and a higher-quality approximation on the right. The solution of the XOR function has a high order of curvature smoothness; thus, the approximating numerical basis functions of the ANN also need to take the form of high-degree polynomials. With this prerequisite knowledge, we observed the shapes of the hidden units' activations.

Conclusion
The estimation accuracy of an NN depends on the solution's analytic complexity. If the solution consists of high-order polynomials, the NN needs a large number of hidden units, because the number of hidden units determines the polynomial order of the estimation. In this study, as basic research, we applied the proposed method to a single-hidden-layered architecture, based on the universal approximation theory of the single-hidden-layered NN25. In our study of the implementation of an ANN PDE solver, we proposed a pretraining version of ANN-based DDM that uses a basis reconstruction technique in which the weight parameters of the hidden layer are rearranged to enrich the quality of the numerical basis and of the estimated solutions. The BR method was implemented by setting the output of the ANN to zero on the neighboring subdomains. The main idea is that, as the ANN is nullified on other subdomains, the core degrees of the polynomials constituting the hidden units' numerical basis are eventually rearranged to center on their main subdomains. Experiments were conducted with one- and two-dimensional Poisson's equations and with the XOR problem. When the BR method was applied to PDE problems, we observed that it enhanced the computing power, accelerating the rate at which poor single-domain ANN estimations were improved in the DDM model while reducing the computational costs relative to single-domain models. On the other hand, to numerically demonstrate the basis reconstruction property and its benefits for weight-parameter initialization, we compared the after-pretraining status of the hidden units' activations in the XOR problem. The results showed that the degrees of the polynomials of the activations obtained by the BR method, understood by measuring the curvature, were enriched, giving a superior approximation compared with ordinary training procedures.
In this work developing DDM in the structure of ANNs, the basis reconstruction treatment was effective for estimating solutions of PDEs and could also be applied well to general machine learning tasks. Thus, it may be applicable as a tool for subjects such as regularization, weight initialization, and speeding up computations in training.
Recently, much research has been conducted with deeper architectures. For instance, researchers have used ResNet42, which provides more promising performance than basic deep neural network (DNN) architectures because the skip connection makes it possible to find an optimal number of hidden layers without the laborious tests required for normal DNN architectures. In future work, the proposed method should be applied to deeper neural network architectures and tested to determine how effective it is in that setting.