DeepGreen: deep learning of Green’s functions for nonlinear boundary value problems

Boundary value problems (BVPs) play a central role in the mathematical analysis of constrained physical systems subjected to external forces. Consequently, BVPs frequently emerge in nearly every engineering discipline and span problem domains including fluid mechanics, electromagnetics, quantum mechanics, and elasticity. The fundamental solution, or Green's function, is a leading method for solving linear BVPs because it enables facile computation of new solutions to a system under any external forcing. However, fundamental Green's function solutions for nonlinear BVPs are not feasible since linear superposition no longer holds. In this work, we propose a flexible deep learning approach to solve nonlinear BVPs using a dual-autoencoder architecture. The autoencoders discover an invertible coordinate transform that linearizes the nonlinear BVP and identifies both a linear operator L and its Green's function G, which can then be used to solve new nonlinear BVPs. We find that the method succeeds on a variety of nonlinear systems, including nonlinear Helmholtz and Sturm–Liouville problems, nonlinear elasticity, and a 2D nonlinear Poisson equation, and that it can solve nonlinear BVPs orders of magnitude faster than traditional methods without the need for an initial guess. The method merges the universal approximation capabilities of deep learning with the physics knowledge embodied in Green's functions to yield a flexible tool for identifying fundamental solutions to a variety of nonlinear systems.


1D Problems
The data for all of the one-dimensional systems are created using the same method and forcing functions. Each solution is computed on an evenly spaced 128-point grid using MATLAB's bvp5c solver with a relative error tolerance of 10^−8 and an absolute error tolerance of 10^−10. The forcing functions F_k(x) are designed to yield a variety of solutions u_k whose norms ||u_k||_2 are of order one.
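The paper's data are generated with MATLAB's bvp5c; for readers working in Python, the sketch below sets up an analogous solve with scipy.integrate.solve_bvp. It is only an illustration, not the authors' code: the cubic Helmholtz-type equation, its coefficients, the domain, and the Gaussian forcing are assumptions, and solve_bvp exposes a single residual tolerance rather than separate relative and absolute tolerances.

    # Illustrative Python analogue of the bvp5c setup (not the authors' code).
    # Assumed model: u'' + alpha*u + eps*u^3 = F(x) on [0, 2*pi] with u = 0 at both ends.
    import numpy as np
    from scipy.integrate import solve_bvp

    alpha, eps = -1.0, 0.3                        # assumed coefficients
    F = lambda x: np.exp(-(x - np.pi)**2)         # assumed Gaussian forcing

    def rhs(x, y):
        # y[0] = u, y[1] = u'
        return np.vstack([y[1], F(x) - alpha*y[0] - eps*y[0]**3])

    def bc(ya, yb):
        return np.array([ya[0], yb[0]])           # homogeneous Dirichlet boundary conditions

    x = np.linspace(0.0, 2*np.pi, 128)            # evenly spaced 128-point grid, as above
    y_guess = np.zeros((2, x.size))               # trivial initial guess
    sol = solve_bvp(rhs, bc, x, y_guess, tol=1e-8)
    u = sol.sol(x)[0]                             # solution sampled on the grid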
The training data consist of two types of systems: systems with Gaussian forcing functions and systems with cosine forcing functions. There were 97 solutions of the nonlinear biharmonic equation that did not meet the error tolerance; these were discarded. Of the remaining data, 10% are randomly chosen and withheld as test data, and the rest are split into training (80%) and validation (20%) sets. In order to test the ability of the network to generalize, we also construct a separate test set of solutions with cubic polynomial forcing functions: one family parameterized by γ ∈ {0.01, 0.03, 0.05, . . . , 0.29}, and a second family parameterized by γ ∈ {0.01, 0.03, 0.05, . . . , 0.29}, ζ ∈ {0.01, 0.03, 0.05, . . . , 0.49}, and ψ ∈ {−5, −4, −3, . . . , 5}. In total there are 4140 solutions with cubic forcing functions.
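As a consistency check on the size of the cubic-forcing test set, the parameter grids above can be enumerated directly: 15 values of γ in the first family and 15 × 25 × 11 combinations in the second give 15 + 4125 = 4140 solutions. A short Python sketch of the same count:

    # Count the cubic-forcing test solutions from the parameter grids above.
    import numpy as np

    gamma = np.arange(0.01, 0.30, 0.02)   # 0.01, 0.03, ..., 0.29  -> 15 values
    zeta = np.arange(0.01, 0.50, 0.02)    # 0.01, 0.03, ..., 0.49  -> 25 values
    psi = np.arange(-5, 6)                # -5, -4, ..., 5         -> 11 values

    n_total = gamma.size + gamma.size * zeta.size * psi.size
    print(n_total)                        # 4140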

2D Problem
The two-dimensional data satisfy the nonlinear Poisson equation given in the manuscript. The solutions are computed with a finite element method using the DOLFIN library¹. The two-dimensional data are similar to the one-dimensional data in that there are Gaussian and cosine forcing functions along with a separate data set of cubic polynomial forcing functions used to test the ability of the network to generalize. The Gaussian forcing functions are parameterized by γ_x, γ_y ∈ {0.01 + 0.28k/3 | k = 0, 1, 2, 3}.

Supplementary Figure S2. Decoder network architecture for the two-dimensional data. All transposed convolutional layers use 4 × 4 kernels with stride size 2, zero-padding, and ReLU activation functions except for the last layer, which has stride size 1.
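For concreteness, a minimal nonlinear Poisson solve with the legacy DOLFIN (FEniCS) interface is sketched below. It is not the authors' script: the unit-square mesh, the cubic nonlinearity, the Gaussian forcing, and the homogeneous Dirichlet boundary condition are illustrative assumptions standing in for the equation defined in the manuscript.

    # Illustrative nonlinear Poisson solve with legacy DOLFIN/FEniCS (assumed problem setup).
    from dolfin import (Constant, DirichletBC, Expression, Function, FunctionSpace,
                        TestFunction, UnitSquareMesh, dot, dx, grad, solve)

    mesh = UnitSquareMesh(63, 63)                 # assumed 64 x 64 grid of vertices
    V = FunctionSpace(mesh, "P", 1)

    u = Function(V)                               # unknown solution
    v = TestFunction(V)
    f = Expression("exp(-(pow(x[0] - 0.5, 2) + pow(x[1] - 0.5, 2)) / 0.02)", degree=2)

    # Weak form of -div(grad(u)) + u^3 = f; the cubic term is an assumed nonlinearity.
    F = dot(grad(u), grad(v))*dx + u**3*v*dx - f*v*dx
    bc = DirichletBC(V, Constant(0.0), "on_boundary")

    solve(F == 0, u, bc)                          # DOLFIN performs the Newton iteration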

Neural Network Implementation Details
The model training procedure is kept constant for all of the examples in this work. The networks are optimized with the Adam optimizer (β_1 = 0.9, β_2 = 0.999). Every numerical experiment starts by training a set of 20 models for a 'small' number of epochs. Each of the 20 models has a randomly selected learning rate for the Adam optimizer, selected uniformly between 10^−2 and 10^−5. The initial training period consists of two phases: autoencoder-only (75 epochs) and full model (250 epochs). The autoencoder-only phase enforces only the autoencoder losses L_1 and L_2 during backpropagation. A checkpoint algorithm keeps track of the model with the lowest overall loss throughout the training procedure. At the end of the initial period, the best model is selected and the others are eliminated. The best model is then trained for an additional 2500 epochs. There are two network architectures in this work. The architectures depicted in Figs. S1 and S2 are applied to the two-dimensional nonlinear Poisson BVP, and the architecture depicted in Fig. S3 is applied to the one-dimensional problems.

Supplementary Figure S3. Layer-by-layer autoencoder architecture for 1D problems.
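The two-phase training and model-selection procedure described above can be sketched in Keras as follows. This is a simplified, self-contained illustration rather than the released DeepGreen code: a trivial dense network stands in for the full architecture, both phases use the same stand-in reconstruction loss, and whether the learning rate is sampled uniformly in value or in its exponent is an assumption (the exponent is used here).

    # Sketch of the 20-model, two-phase training and selection procedure (not the authors' code).
    import numpy as np
    import tensorflow as tf

    def make_model():
        # Trivial stand-in for the DeepGreen architecture (see Figs. S1-S3).
        return tf.keras.Sequential([
            tf.keras.Input(shape=(128,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(128),
        ])

    u_train = np.random.rand(256, 128).astype("float32")      # stand-in training data

    candidates = []
    for _ in range(20):
        lr = 10.0 ** np.random.uniform(-5, -2)                 # random learning rate in [1e-5, 1e-2]
        model = make_model()
        model.compile(optimizer=tf.keras.optimizers.Adam(lr, beta_1=0.9, beta_2=0.999),
                      loss="mse")
        # Phase 1: autoencoder-only losses (75 epochs); Phase 2: full model (250 epochs).
        model.fit(u_train, u_train, epochs=75, batch_size=64, verbose=0)
        history = model.fit(u_train, u_train, epochs=250, batch_size=64, verbose=0)
        candidates.append((history.history["loss"][-1], model))

    # Keep the checkpointed best model and train it for an additional 2500 epochs.
    _, best_model = min(candidates, key=lambda c: c[0])
    best_model.fit(u_train, u_train, epochs=2500, batch_size=64, verbose=0)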
The two architectures have a few training variables in common. Both models use variance scaling initialization, ℓ2 regularization (λ = 10^−6), and ReLU activation functions for the fully connected (1D architecture) and convolutional (2D architecture) layers. Notably, the two layers immediately before and after the latent space do not have activation functions. A normalized mean squared error loss function is used for all of the loss terms, as described in the manuscript. The models are trained in batches of 64 samples.
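One common way to write a normalized mean squared error is to divide the squared error by the squared norm of the target; the small sketch below uses that formulation, which is an assumption about the exact definition used in the manuscript.

    # One possible normalized MSE: ||pred - true||^2 / ||true||^2, averaged over the batch.
    import tensorflow as tf

    def normalized_mse(y_true, y_pred, eps=1e-12):
        num = tf.reduce_sum(tf.square(y_pred - y_true), axis=-1)
        den = tf.reduce_sum(tf.square(y_true), axis=-1) + eps    # eps guards against division by zero
        return tf.reduce_mean(num / den)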
The 2D architecture utilizes convolutional layers and pooling layers, as shown in Figs. S1 and S2. All convolutional layers use a kernel size of 4 × 4. The convolutional layers in the encoder differ from those in the decoder: the encoder convolutional layers use a stride size of 1 × 1 and an increasing number of filters (8, 16, 32, 64), while the transposed convolutional layers in the decoder use a stride size of 2 × 2 and a decreasing number of filters (32, 16, 8). The pooling layers are similar for both the encoder and decoder, with a stride size of 2 × 2 and a pool size of 2 × 2.
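A Keras sketch of an encoder/decoder pair consistent with this description is given below. It is a reconstruction from the text and the captions of Figs. S1 and S2, not the released code: the input resolution, the latent dimension, the padding choices, and the placement of the pooling layers are assumptions made so that the sketch is self-consistent.

    # Sketch of the 2D convolutional encoder/decoder (shapes and latent size are assumed).
    import tensorflow as tf
    from tensorflow.keras import layers

    latent_dim = 64                                   # assumed latent-space dimension

    encoder = tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 1)),            # assumed input resolution
        layers.Conv2D(8, 4, padding="same", activation="relu"),    # 4x4 kernels, stride 1
        layers.MaxPooling2D(2, 2),                                  # 2x2 pooling, stride 2
        layers.Conv2D(16, 4, padding="same", activation="relu"),
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(32, 4, padding="same", activation="relu"),
        layers.MaxPooling2D(2, 2),
        layers.Conv2D(64, 4, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(latent_dim),          # no activation immediately before the latent space
    ])

    decoder = tf.keras.Sequential([
        tf.keras.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 64),          # no activation immediately after the latent space
        layers.Reshape((8, 8, 64)),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(16, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(8, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(1, 4, strides=1, padding="same"),   # final layer uses stride 1
    ])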

Additional Results
The repeatability of the results and of the models learned by the DeepGreen architecture is interesting to study from the perspective of operator convergence and latent space representations. In both cases, we investigate the convergence of the model parameters to determine whether the learned latent spaces and operators are unique or non-unique.

Operator Initialization
We repeat the training procedure for DeepGreen with three different initialization approaches for the operator L. Again, we train with data from the example nonlinear cubic Helmholtz model. This experiment compares the initial values of the operator L with its values at the end of training to determine whether the DeepGreen approach tends to converge to a specific operator construction. The results in Fig. S4 show the initial and final operator for identity-initialized, randomly initialized, and Toeplitz-initialized operator matrices. Notably, the network tends to learn operators with diagonal dominance for all of the tested initialization strategies. This preference for nearly diagonal operators draws strong parallels to the coordinate diagonalization commonly used in physics.
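The three initialization strategies can be written down explicitly; the NumPy sketch below constructs identity, random (He normal), and Toeplitz-gradient initial matrices for L. The latent dimension and the particular banded stencil used for the "gradient" Toeplitz matrix are assumptions.

    # Three initial conditions for the latent operator matrix L (latent dimension assumed).
    import numpy as np
    from scipy.linalg import toeplitz

    n = 64                                           # assumed latent dimension
    rng = np.random.default_rng(0)

    L_identity = np.eye(n)

    # He-normal initialization: zero-mean Gaussian entries with variance 2 / fan_in.
    L_random = rng.normal(0.0, np.sqrt(2.0 / n), size=(n, n))

    # Toeplitz "gradient" initialization: a banded, finite-difference-like matrix
    # (the +1/-1 first off-diagonals are an assumed interpretation of "gradient").
    col = np.zeros(n); col[1] = -1.0                 # first column: subdiagonal entries
    row = np.zeros(n); row[1] = 1.0                  # first row: superdiagonal entries
    L_toeplitz = toeplitz(col, row)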

Supplementary Figure S4. Initial vs learned operators for an operator matrix L for different initial conditions. The top row shows identity matrix initialization, the middle row shows random initialization (He normal), and the bottom row shows a Toeplitz gradient initialization.

Latent Space Analysis
We repeat the training procedure for the example system, the nonlinear cubic Helmholtz model, a total of one hundred times. A single sample is selected from the training data, and the latent space representations v_i and f_i of the input vectors u_i and F_i are computed for each trained model. Statistics for the latent space representations are presented in Fig. S5. It is evident that the latent space vectors are not identical between runs and that their entries do not follow any particular statistical distribution. This implies that the learned weights and latent space representations vary between training instances and do not converge to a single representation.
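For reference, the per-component statistics reported in Fig. S5 can be assembled as in the short sketch below; the array of latent vectors is a placeholder for the encoder outputs collected from the one hundred runs, and the latent dimension is assumed.

    # Summarize latent vectors collected from repeated training runs (placeholder data).
    import numpy as np

    runs, latent_dim = 100, 64                    # assumed sizes
    v_runs = np.random.rand(runs, latent_dim)     # stand-in for the encoded v_i from each run

    v_mean = v_runs.mean(axis=0)                  # per-component mean across runs
    v_std = v_runs.std(axis=0)                    # per-component spread across runs
    print(v_mean.shape, v_std.shape)              # (64,) (64,)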

Residual network architecture
All of the autoencoders used in this work use a residual network (ResNet) architecture. In order to demonstrate the advantage of the ResNet architecture, we trained six DeepGreen models for each of the four systems: three with the ResNet skip connections and three without.
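For reference, a residual (skip) connection of the kind used in the fully connected 1D autoencoders can be sketched as follows; the layer widths and latent dimension are assumptions, and the actual layer-by-layer architecture is given in Fig. S3.

    # Minimal fully connected residual block (widths and latent size are assumed).
    import tensorflow as tf
    from tensorflow.keras import layers

    def residual_block(x, width=128):
        # Two dense layers whose output is added back to the block input (the skip connection).
        h = layers.Dense(width, activation="relu")(x)
        h = layers.Dense(width)(h)
        return layers.Add()([x, h])

    inputs = tf.keras.Input(shape=(128,))
    x = residual_block(inputs)
    x = residual_block(x)
    encoder_1d = tf.keras.Model(inputs, layers.Dense(20)(x))   # assumed latent dimension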
For the two simplest systems, the nonlinear cubic Helmholtz equation and the nonlinear Sturm-Liouville equation, the difference between the models with and without the ResNet skip connections was negligible. For the nonlinear cubic Helmholtz equation, the mean validation loss for the non-ResNet models was 2.7 × 10^−3 and the median validation loss was 2.4 × 10^−3. Using the ResNet architecture resulted in a mean validation loss of 3.5 × 10^−3 and a median validation loss of 8.8 × 10^−4. The ResNet architecture resulted in a lower median validation loss but a higher mean due to one of the three models performing much more poorly than the other two. The results for the nonlinear Sturm-Liouville system are analogous. With a non-ResNet architecture, the mean validation loss was 4.5 × 10^−3 and the median validation loss was 4.0 × 10^−3. With a ResNet architecture, the mean validation loss was 5.7 × 10^−3 and the median validation loss was 3.1 × 10^−3. Therefore, the use of the ResNet architecture produced results similar to those of a non-ResNet architecture for these two simple systems.
For the two systems that had larger losses, the nonlinear biharmonic equation in 1D and the 2D nonlinear Poisson

Comparison to baseline solvers
The DeepGreen architecture has several advantages over traditional BVP solvers, with speed being one of the most striking. In order to demonstrate this, we performed speed tests of DeepGreen versus the numerical solvers that we used to generate the training and test data: MATLAB's bvp5c for the one-dimensional data and the finite element method of the DOLFIN library for the two-dimensional data. All speed tests were performed on a MacBook Pro with a 2.3 GHz Quad-Core Intel Core i7 processor and 32 GB of RAM. We recorded the time it took to solve the BVPs for each of the four model systems using the cubic polynomial forcing functions described above. The times in seconds are given in Supplementary Table S1. The last column gives the time using the traditional solver divided by the time using DeepGreen (labeled as "Speed up"). In all cases, the DeepGreen method is orders of magnitude faster than the traditional solver, with the difference being most extreme for the one-dimensional systems.

One thing to note is that the times given for DeepGreen are conservative. As it is currently implemented, the predict method of the DeepGreen architecture takes a forcing function as input and outputs a prediction of the solution to the BVP, but it also takes a solution as input and outputs a prediction of the forcing function corresponding to that solution. Additionally, it runs both the forcing function and solution through their respective autoencoders. The times given in Supplementary Table S1 include all four of those evaluations and not just the time needed to produce a solution given a forcing function.
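The timing harness itself is simple; the sketch below shows the pattern of measuring wall-clock time around a single batched evaluation over the cubic-forcing test set. The solve_batch function is a hypothetical stand-in for the trained network's predict call (which, as noted above, also maps solutions back to forcings and runs both autoencoders), so the sketch only illustrates the measurement, not the model.

    # Hypothetical timing pattern (solve_batch is a placeholder, not the DeepGreen API).
    import time
    import numpy as np

    F_test = np.random.rand(4140, 128).astype("float32")   # stand-in cubic-forcing test inputs

    def solve_batch(F):
        # Placeholder for one batched forward pass of the trained network.
        return np.zeros_like(F)

    start = time.perf_counter()
    U_pred = solve_batch(F_test)
    elapsed = time.perf_counter() - start
    print(f"{elapsed:.4f} s for {F_test.shape[0]} forcing functions")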
For several of the cubic polynomial forcing functions, MATLAB's bvp5c did not converge for the nonlinear biharmonic equation due to an initial guess that was outside the radius of convergence. This cannot happen for DeepGreen since no initial guess is required. In order to circumvent the issue of a poor initial guess, the BVP can be solved using a continuation method. However, this would require many nonlinear equation solves and the speed difference would be even more stark.