Data-driven discovery of Green’s functions with human-understandable deep learning

There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human interpretable manner. To do this, we develop a novel data-driven approach for creating a human–machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green’s functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection–diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.

nature (3,4). Recently, scientific computing and machine learning have successfully converged on PDE discovery (5)(6)(7)(8), PDE learning (9)(10)(11)(12)(13)(14), and symbolic regression (15,16) as promising means for applying machine learning to scientific investigations. These methods attempt to discover the coefficients of a PDE model or learn the operator that maps excitations to system responses. The recent deep learning (DL) techniques addressing the latter problem are based on approximating the solution operator associated with a PDE by a neural network (9)(10)(11)(12)(13). While excellent for solving PDEs, we regard them as "black boxes" and focus here on a data-driven strategy that improves human understanding of the governing PDE model.
In contrast, we offer a radically different approach, backed by theory (17), that infuses interpretability into the model by learning well-understood mathematical objects that reveal underlying physical laws. We devise a DL method for learning the Green's functions (18) associated with unknown governing linear PDEs, and we train the neural networks (NNs) on physical system responses to random excitation functions drawn from a Gaussian process. The empirically derived Green's functions relate the system's response (or PDE solution) to a forcing term and can then be used as a fast reduced-order PDE solver. The existing Graph Kernel Network (12) and DeepGreen (11) techniques also aim to learn solution operators of PDEs based on Green's functions. While they show competitive performance in predicting the solution of the PDE for new forcing functions, they fail to capture Green's functions accurately, which makes the extraction of qualitative and quantitative features of the physical system challenging.
Our primary objective is to study the discovered Green's functions for clues regarding the physical properties of the observed systems. Our approach relies on a novel and adaptive neural network architecture called a rational neural network (19), which has higher approximation power than standard networks and carries human-understandable features of the PDE, such as shock and singularity locations.
In this paper, we use techniques from deep learning to discover the Green's function of linear differential equations Lu = f from input-output pairs (f, u), as opposed to directly learning L or model parameters. In this sense, our approach is agnostic to the forward PDE model but nonetheless offers insights into its physical properties. There are several advantages to learning the Green's function. First, once the Green's function is learned by a neural network (NN), it is possible to compute the solution, u, for a new forcing term, f, by evaluating an integral (see Eq. (1)), which is more efficient than training a new NN. Second, the Green's function associated with L contains information about the operator, L, and the type of boundary constraints that are imposed, which helps uncover mechanistic understanding from experimental data. Finally, it is easier to train NNs to approximate Green's functions, which are square-integrable functions under sufficient regularity conditions (18,20,21), than to approximate the action of the linear differential operator, L, which is not bounded (22). Also, any prior mathematical and physical knowledge of the operator, L, can be exploited in the design of the NN architecture, which could enforce a particular structure such as symmetry of the Green's function.
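As a concrete illustration, the following minimal Python sketch shows how Eq. (1) could be evaluated numerically once the two networks are available; the function and variable names are illustrative, the quadrature is a simple trapezoidal rule, and the toy check uses the classical Green's function of -u'' on [0, 1] rather than a trained network.

```python
import numpy as np

def solve_with_greens_function(N_G, N_hom, f, x, y):
    """Evaluate u(x) = int_Omega G(x, y) f(y) dy + u_hom(x) with a trapezoidal rule.

    N_G   : callable (X, Y) -> G(X, Y), the trained Green's function surrogate
    N_hom : callable (x,) -> u_hom(x), the trained homogeneous-solution surrogate
    f     : callable (y,) -> f(y), a new forcing term
    x, y  : 1D arrays of evaluation points and quadrature points on Omega
    """
    w = np.empty_like(y)                       # trapezoidal quadrature weights
    w[1:-1] = 0.5 * (y[2:] - y[:-2])
    w[0], w[-1] = 0.5 * (y[1] - y[0]), 0.5 * (y[-1] - y[-2])
    X, Y = np.meshgrid(x, y, indexing="ij")    # all pairs (x_i, y_k)
    integrand = N_G(X, Y) * f(y)[None, :]      # G(x_i, y_k) f(y_k)
    return integrand @ w + N_hom(x)            # quadrature over y, plus u_hom

# Toy check with the exact Green's function of -u'' = f on [0, 1], u(0) = u(1) = 0
if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 201)
    G_toy = lambda X, Y: np.where(X <= Y, X * (1 - Y), Y * (1 - X))
    u = solve_with_greens_function(G_toy, lambda t: 0.0 * t, lambda t: np.sin(np.pi * t), x, x)
    print(np.max(np.abs(u - np.sin(np.pi * x) / np.pi**2)))   # small discretization error
```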

Results
Deep learning Green's functions.
Our DL approach (see Fig. 1) begins with excitations (or forcing terms), {f_j}_{j=1}^N, sampled from a Gaussian process (GP) having a carefully designed covariance kernel (17), and the corresponding system responses, {u_j}_{j=1}^N. It is postulated that there is an unknown linearized governing PDE so that Lu_j = f_j. The selection of random forcing terms is theoretically justified (17) and enables us to learn the dominant eigenmodes of the solution operator using only a small number, N, of training pairs. The Green's function, G, and the homogeneous solution, u_hom, which encodes the boundary conditions associated with the system, are approximated by two rational neural networks: N_G and N_hom.
A rational NN consists of a NN with trainable rational activation functions whose coefficients are learned simultaneously with the weights and biases of the network. Rational NNs have better approximation properties than standard NNs (19), both in theory and in practice, which makes them ideal for the present application. The parameters of the NNs representing the Green's function and the homogeneous solution are trained by minimizing a regression loss built from the input-output pairs (see Methods).

Human-understandable features.
The trained NNs contain both the desired Green's function and homogeneous solution, which we evaluate and visualize to glean novel insights concerning the underlying, governing PDE (Fig. 2). In this way, we achieve one part of our human interpretation goal: finding a link between the properties of the Green's function and those of the underlying differential operator and solution constraints.
As an example, if the Green's function is symmetric, i.e., G(x, y) = G(y, x) for all x, y ∈ Ω, then the operator L is self-adjoint. Another aspect of human interpretability is that the poles of the trained rational NN tend to cluster in a way that reveals the location and type of singularities in the solutions of the underlying system.

Numerical examples.
As a first example, we consider a second-order differential operator with variable coefficients chosen to model a viscous shock at x = 0 (23). The system's responses are obtained by solving the PDE, with Dirichlet boundary conditions, using a spectral numerical solver for each of the N = 100 random forcing terms, sampled from a GP with a squared-exponential covariance kernel (17). The learned Green's function is displayed in Fig. 3A and satisfies the symmetry relation G(x, y) = G(−x, −y), indicating the presence of a reflective symmetry group within the underlying PDE. Indeed, if u is a solution to Lu = f with homogeneous boundary conditions, then u(−x) is a solution to Lv = f(−x). Next, we reproduce the same viscous shock numerical experiment, except that this time we remove measurements of the system's response from the training dataset in an interval around the shock location (see Supplementary Material).

In Fig. 3H and I, we display the homogeneous solution NN, along with the phase of the rational NN evaluated on the complex plane. The agreement between the exact and learned homogeneous solutions illustrates the ability of the DL method to accurately capture the behavior of a system in a "multiphysics" context. The choice of rational NNs is crucial here to deepen our understanding of the system, as the poles of the homogeneous rational NN characterize the location and type of singularities in the homogeneous solution. In this example, the change in behavior of the differential operator from diffusion to advection is delineated by the location of the poles of the rational NN.

Nonlinear and vector-valued equations.
We can also discover Green's functions from forcing terms and concomitant solutions of nonlinear differential equations whose linear part is dominant. In Fig. 4A to C, we visualize the Green's function NNs of three operators with cubic nonlinearity considered in (11). The nonlinearity does not prevent our method from discovering a Green's function of an approximate linear model, from which one can understand features such as symmetry and boundary conditions. This property is crucial for tackling time-dependent problems, where the present technique may be extended and applied to uncover linear propagators. Finally, we consider a Stokes flow in a two-dimensional lid-driven cavity to emphasize the ability of our method to handle systems of differential equations in two dimensions. In this context, the relation between the system's responses and the forcing terms can be expressed using a Green's matrix, which consists of a two-by-two matrix of Green's functions and whose components reveal features of the underlying system such as symmetry and coupling (Fig. 4D).

Discussion
Contrary to existing works in the literature (10)(11)(12)(24), our primary aim is to uncover mechanistic understanding from input-output data using a human-understandable representation of the underlying, but hidden, differential operator. This representation takes the form of a rational NN (19) for the Green's function. We have described the physical features of the operator that can be extracted and discovered from the learned Green's function and homogeneous solution, such as linear conservation laws, symmetries, shock front and singularity locations, boundary conditions, and dominant modes.
The DL method for learning Green's functions of linear differential operators naturally extends to the case of three spatial dimensions, but these systems are more challenging due to the GPU memory demands of representing the six-dimensional inputs used to train the NN representing the Green's function. However, optimization algorithms other than the one used in this paper and described in Methods, such as mini-batch optimization (25,26), may be employed to alleviate the computational expense of the training procedure.
While our method is demonstrated on linear differential operators, it can be extended to nonlinear, time-dependent problems that can be linearized using an implicit-explicit time-stepping scheme (27,28) or an iterative method (29). This process allows us to learn the Green's functions of linear time propagators and understand physical behavior in time-dependent problems from input-output data such as the time-dependent Schrödinger equation (Supplementary Material, Section 9). The numerical experiments conducted in Fig. 4A to C highlight that our approach can discover Green's functions of linearizations of nonlinear differential operators.
Our deep learning method for learning Green's functions and extracting human-understandable properties of partial differential equations benefits from the adaptivity of rational neural networks, which supports qualitative feature detection and interpretation. We successfully tested our approach with noisy and sparse measurements as training data (Supplementary Material, Section 4). The design of our network architectures and of the covariance kernel used to generate the system forcings is guided by rigorous theoretical statements (17,19) that offer performance guarantees. This shows that our proposed deep learning method may be used to discover new mechanistic understanding with machine learning.

Methods
Green's functions.
We consider linear differential operators, L, defined on a bounded domain Ω ⊂ R^d, where d ∈ {1, 2, 3} denotes the spatial dimension. The aim of our method is to discover properties of the operator, L, using N input-output pairs {(f_j, u_j)}_{j=1}^N, consisting of forcing functions, f_j : Ω → R, and system responses, u_j : Ω → R, which are solutions to the following equation:
Lu_j = f_j,   D(u_j, Ω) = g,   (2)
where D is a linear operator acting on the solutions, u, and the domain, Ω, with g being the constraint data. We assume that the forcing terms have sufficient regularity and that the constraint operator, D, is such that Eq. (2) has a unique solution (18). An example of such a constraint is the imposition of homogeneous Dirichlet boundary conditions. The Green's function of L is defined as the function G : Ω × Ω → R ∪ {±∞} satisfying
LG(·, y) = δ(y − ·),   y ∈ Ω,
where L is acting on the function x ↦ G(x, y) for fixed y ∈ Ω, and δ(·) denotes the Dirac delta function. The Green's function is well-defined and unique under mild conditions on L and suitable solution constraints imposed via an operator, D (see Eq. (2)) (18). Moreover, if (f, u) is an input-output pair satisfying Eq. (2) with g = 0, then
u(x) = ∫_Ω G(x, y) f(y) dy,   x ∈ Ω.
Therefore, the Green's function associated with L can be thought of as the right inverse of L.
Let u_hom be the homogeneous solution to Eq. (2), so that
L u_hom = 0,   D(u_hom, Ω) = g.
Using superposition, we can construct solutions, u_j, to Eq. (2) as u_j = ũ_j + u_hom, where ũ_j satisfies
L ũ_j = f_j,   D(ũ_j, Ω) = 0.
Then, the relation between the system's response, u_j, and the forcing term, f_j, can be expressed via the Green's function as
u_j(x) = ∫_Ω G(x, y) f_j(y) dy + u_hom(x),   x ∈ Ω.   (1)
Therefore, we train two NNs: N_G : Ω × Ω → R ∪ {±∞} and N_hom : Ω → R, to learn the Green's function and the homogeneous solution associated with L and the constraint operator D. Note that this procedure allows us to discover boundary conditions, or constraints, directly from the input-output data without imposing them in the loss function (which often results in training instabilities (33)).

Rational neural networks.
Rational NNs (19) consist of NNs with adaptive rational activation functions x → σ(x) = p(x)/q(x), where p and q are two polynomials, whose coefficients are trained at the same time as the other parameters of the networks, such as the weights and biases. These coefficients are shared between all the neurons in a given layer but generally differ between the network's layers.
This type of network was proven to have better approximation power than standard Rectified Linear Unit (ReLU) networks (34,35), which means that they can approximate smooth functions more accurately with fewer layers and network parameters (19). It is also observed in practice that rational NNs require fewer optimization steps and can therefore be more efficient to train than networks with other activation functions (19).
The NNs, N_G and N_hom, which approximate the Green's function and homogeneous solution associated with Eq. (2), respectively, are chosen to be rational NNs (19) with 4 hidden layers and 50 neurons in each layer. We choose the polynomials, p and q, within the activation functions to be of degree 3 and 2, respectively, and initialize the coefficients of all the rational activation functions so that they form the best (3, 2) rational approximant to a ReLU (see the supplementary material of (19) for details). The motivation is that the flexibility of the rational functions brings extra benefits in training and accuracy over the ReLU activation function. We highlight that the increase in the number of trainable parameters, due to the adaptive rational activation functions, is only linear with respect to the number of layers and negligible compared to the total number of parameters in the network: the number of rational coefficients is 7 × (number of hidden layers) = 28.
The weight matrices of the NNs are initialized using Glorot normal initializer (36), while the biases are initialized to zero.
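The following PyTorch-style sketch illustrates one possible implementation of such a rational network; it is not the authors' code, and the initial activation coefficients shown are placeholders rather than the exact best (3, 2) rational approximant to ReLU used in (19).

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Trainable rational activation sigma(x) = p(x)/q(x) with deg p = 3 and deg q = 2.

    The seven coefficients are shared by all neurons of a layer and are learned
    together with the weights and biases, as in (19).
    """
    def __init__(self):
        super().__init__()
        # Placeholder initialization: in practice, one initializes with the
        # coefficients of the best (3, 2) rational approximant to ReLU (19).
        self.p = nn.Parameter(torch.tensor([0.02, 0.5, 1.6, 1.2]))  # p(x) = p0 + p1 x + p2 x^2 + p3 x^3
        self.q = nn.Parameter(torch.tensor([1.0, 0.0, 2.4]))        # q(x) = q0 + q1 x + q2 x^2

    def forward(self, x):
        num = self.p[0] + x * (self.p[1] + x * (self.p[2] + x * self.p[3]))
        den = self.q[0] + x * (self.q[1] + x * self.q[2])
        return num / den

def rational_network(d_in, d_out, width=50, depth=4):
    """Fully connected rational NN: `depth` hidden layers of `width` neurons."""
    layers, d = [], d_in
    for _ in range(depth):
        lin = nn.Linear(d, width)
        nn.init.xavier_normal_(lin.weight)   # Glorot normal initializer (36)
        nn.init.zeros_(lin.bias)             # biases initialized to zero
        layers += [lin, RationalActivation()]
        d = width
    layers += [nn.Linear(d, d_out)]
    return nn.Sequential(*layers)

# N_G takes the pair (x, y) as input; N_hom takes x only.
N_G, N_hom = rational_network(2, 1), rational_network(1, 1)
```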
Another advantage of rational NNs is the potential presence of poles, i.e., zeros of the polynomial q. While the initialization of the activation functions avoids training issues due to potential spurious poles, the poles can be exploited to learn physical features of the differential operator (Supplementary Material, Section 5.5). Therefore, the architecture of the NNs also supports the aim of a human-understandable approach for learning PDEs. In higher dimensions, such as d = 2 or d = 3, the Green's function is not necessarily bounded along the diagonal, i.e., {(x, x), x ∈ Ω}, thus making the poles of the rational NNs crucial.
Finally, we emphasize that the enhanced approximation properties of rational NNs (19) make them ideal for learning Green's functions and, more generally, approximating functions within regression problems. These networks may also be of benefit to other approaches for solving and learning PDEs with DL techniques, such as PINNs (37), DeepGreen (11), DeepONet (10), Neural operator (12), and Fourier neural operator (24).

Data generation.
We create a training dataset consisting of N input-output function pairs, {(f_j, u_j)}_{j=1}^N, as described in the Supplementary Material, Section 1.

Theoretical justification.
Our approach for learning Green's functions associated with linear differential operators has a theoretically rigorous underpinning. Indeed, the analysis in (17) quantifies how many input-output pairs, with forcing terms sampled from a GP, are needed to recover a Green's function to a prescribed accuracy, and it justifies the choice of covariance kernel used to generate the forcing terms. Similarly, the choice of rational NNs to approximate the Green's function and the homogeneous solution is justified by the higher approximation power of these networks over ReLU networks (19). Other adaptive activation functions have been proposed for learning or solving PDEs with NNs (44), but they are only motivated by empirical observations. Both theory and experiments support the use of rational NNs for regression problems. The number of trainable parameters, consisting of weight matrices, bias vectors, and rational coefficients, needed by a rational NN to approximate smooth functions to within a tolerance 0 < ε < 1 can be completely characterized (19). This motivates our choice of NN architecture for learning Green's functions.

Data Availability
All

Supplementary Material

1 Generating the training data
The training dataset consists of N forcing functions, f_j : Ω → R, and associated system responses, u_j : Ω → R, which are solutions to the following equation:
L u_j = f_j,   D(u_j, Ω) = g,   (3)
where L is a linear differential operator, D is a linear operator acting on the solutions, u_j, and the domain, Ω, with g being the constraint. Unless otherwise stated, the training data comprise N = 100 forcing and solution pairs, where the forcing terms are drawn at random from a Gaussian process, GP(0, K_SE), where K_SE is the squared-exponential covariance kernel (46) defined as
K_SE(x, y) = exp(−|x − y|² / (2ℓ²)),   x, y ∈ Ω.   (4)
The parameter ℓ > 0 in Eq. (4) is called the length-scale parameter and characterizes the correlation between the values of f ∼ GP(0, K_SE) at x and y for x, y ∈ Ω. A small parameter, ℓ, yields highly oscillatory random functions, f, and determines the ability of the GP to generate a diverse set of training functions. This last property is crucial for capturing different modes within the operator, L, and for learning the associated Green's function accurately (17). Other possible choices of covariance kernels include the periodic kernel (46),
K_per(x, y) = exp(−2 sin²(π|x − y| / p) / ℓ²),
where p is the period, which is used to sample periodic random functions for problems with periodic boundary conditions (Fig. S6B). Another possibility is a kernel from the Matérn family (46).
When Ω is an interval [a, b], we introduce a normalized length-scale parameter λ = ℓ/(b − a), so that the method described does not depend on the length of the interval. In addition, we choose λ = 0.03, so that the length-scale, ℓ, is larger than the forcing spatial discretization size, which allows us to adequately resolve the functions sampled from the GP with the discretization.
More precisely, we make sure that the spacing of the discretization grid is smaller than the length-scale ℓ. In Fig. S1, we display the squared-exponential covariance kernel on the domain Ω = [−1, 1], along with ten random functions sampled from GP(0, K_SE). In the presence of real data, with no known homogeneous solution, one could instead normalize the output of the NNs, N_G and N_hom, to facilitate the training procedure.
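A minimal NumPy sketch of this sampling step is given below; the function name and default arguments are illustrative, and a small diagonal jitter is added to the covariance matrix for numerical stability of the Cholesky factorization.

```python
import numpy as np

def sample_forcing_terms(a, b, n_points=200, n_samples=100, lam=0.03, seed=0):
    """Draw random forcing terms f_j ~ GP(0, K_SE) on a uniform grid of [a, b].

    lam is the normalized length-scale, so that ell = lam * (b - a).
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(a, b, n_points)
    ell = lam * (b - a)
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell**2))  # squared-exponential kernel
    L = np.linalg.cholesky(K + 1e-8 * np.eye(n_points))         # jitter for numerical stability
    f = L @ rng.standard_normal((n_points, n_samples))          # each column is one forcing term f_j
    return x, f

x, forcings = sample_forcing_terms(-1.0, 1.0)   # random functions of this type are shown in Fig. S1
```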

Loss function
The NNs, N_G and N_hom, are trained by minimizing a mean squared relative error (in the L²-norm) regression loss, defined as
Loss = (1/N) Σ_{j=1}^{N} ‖u_j − N_hom − ∫_Ω N_G(·, y) f_j(y) dy‖²_{L²(Ω)} / ‖u_j‖²_{L²(Ω)}.   (5)
Unless otherwise stated, the integrals in Eq. (5) are discretized by a trapezoidal rule (49) using training data values that coincide with the forcing discretization grid, {x_i}_{i=1}^{N_u}. In Section 4.4, we compare the results obtained by using the trapezoidal integration described above and a Monte-Carlo integration (50),
∫_Ω g(x) dx ≈ (|Ω| / N_u) Σ_{i=1}^{N_u} g(x_i),
which has a lower convergence rate to the integral with respect to the number of points, N_u.
This integration technique is, however, particularly suited for approximating integrals in high dimensions, or with complex geometries (50).
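The following NumPy sketch illustrates the discretized form of the loss in Eq. (5) with trapezoidal quadrature; in practice the same computation is expressed in an automatic-differentiation framework so that it can be minimized over the network parameters, and the array names are illustrative.

```python
import numpy as np

def trapezoidal_weights(x):
    """Composite trapezoidal quadrature weights on the (possibly non-uniform) grid x."""
    w = np.zeros_like(x)
    w[1:] += 0.5 * np.diff(x)
    w[:-1] += 0.5 * np.diff(x)
    return w

def relative_l2_loss(N_G, N_hom, F, U, x):
    """Discretized loss of Eq. (5).

    F, U : arrays of shape (N, N_u) with the sampled forcings f_j and responses u_j
    x    : the N_u measurement points shared by forcings and responses
    """
    w = trapezoidal_weights(x)
    X, Y = np.meshgrid(x, x, indexing="ij")
    G = N_G(X, Y)                               # G(x_i, y_k) sampled on the grid
    U_pred = F @ (G * w[None, :]).T + N_hom(x)  # u_j(x_i) ~ sum_k w_k G(x_i, y_k) f_j(y_k) + u_hom(x_i)
    num = np.sum(w * (U - U_pred) ** 2, axis=1) # ||u_j - prediction||^2 by the same quadrature
    den = np.sum(w * U ** 2, axis=1)            # ||u_j||^2
    return np.mean(num / den)
```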
It is also possible to enforce prior knowledge about the Green's function through the loss function by adding a penalty term. If the differential operator is self-adjoint, then, depending on the constraint operator D, the associated Green's function is symmetric, i.e., G(x, y) = G(y, x) for all x, y ∈ Ω. In this case, one can train a symmetric NN, N_G, for example by symmetrizing the network output as N_G^sym(x, y) = (N_G(x, y) + N_G(y, x))/2. However, our numerical experiments reveal that the NNs can learn both boundary conditions and symmetry properties directly from the training data, without additional constraints on the loss function or network architectures.
Throughout the following sections, unless otherwise stated, we illustrate the behavior of the method on the high-frequency Helmholtz equation on Ω = [0, 1] with homogeneous Dirichlet boundary conditions,
d²u/dx² + K²u = f,   u(0) = u(1) = 0,   (6)
where K = 15 denotes the Helmholtz frequency. We first remark in Fig. S2 that the rational NN is easier to train than the other NNs, as it minimizes the loss function to 10⁻⁵ in ≈ 15000 epochs, while a ReLU NN requires three times as many optimization steps to reach 10⁻⁴. We also see that the loss functions for the ReLU and rational NNs become more oscillatory (51) and harder to minimize before epoch 1000, while they converge much faster after switching to L-BFGS-B. In theory, one could introduce a variable learning rate that improves the behavior of the Adam optimizer (52,53). However, that introduces an additional parameter, which is not desirable in the context of PDE learning: we aim to design an adaptive and easy-to-use method that does not require extensive hyperparameter tuning. We also observe that the tanh NN has a convergence rate similar to the rational NN, owing to the smoothness of the activation function, but this network exhibits instability during training, as indicated by the high value of the loss function when the optimization terminates. Rational NNs do not suffer from this issue, thanks to their initialization close to a ReLU NN, as can be observed in Fig. S2 when focusing on the value of the loss function during the early optimization steps.

Measuring the results
Once the NNs have been trained, we visualize the Green's functions by sampling the networks on a fine 1000 × 1000 grid of Ω × Ω. In the case where the exact Green's function, G_exact, is known, we measure the accuracy of the trained NN, N_G, using a relative error in the L²-norm:
Error = 100 × ‖N_G − G_exact‖_{L²(Ω×Ω)} / ‖G_exact‖_{L²(Ω×Ω)}.   (7)
Here, we multiply by 100 to express the relative error as a percentage (%). This illustrates an additional advantage of using a Green's function formulation: we can create test problems with known Green's functions and evaluate the method using relative errors. For the Helmholtz problem of Eq. (6), the performance is measured using the relative error in the L²-norm defined in Eq. (7) between the trained network, N_G, and the exact Green's function, G_exact, whose analytic expression is given by
G_exact(x, y) = sin(15x) sin(15(y − 1)) / (15 sin(15)),  if x ≤ y,
G_exact(x, y) = sin(15y) sin(15(x − 1)) / (15 sin(15)),  if x > y.
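A short sketch of this evaluation for the Helmholtz test problem is given below, assuming the trained network N_G is available as a vectorized Python callable; on a uniform grid the Frobenius norm ratio is a good proxy for the relative L²-norm error of Eq. (7).

```python
import numpy as np

K = 15  # Helmholtz frequency in Eq. (6)

def exact_helmholtz_greens_function(X, Y):
    """Closed-form Green's function of u'' + K^2 u = f on [0, 1] with u(0) = u(1) = 0."""
    lo, hi = np.minimum(X, Y), np.maximum(X, Y)
    return np.sin(K * lo) * np.sin(K * (hi - 1)) / (K * np.sin(K))

def relative_error_percent(N_G, n=1000):
    """Relative L2 error of Eq. (7), evaluated on an n x n grid of [0, 1] x [0, 1]."""
    x = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    G_learned, G_exact = N_G(X, Y), exact_helmholtz_greens_function(X, Y)
    return 100 * np.linalg.norm(G_learned - G_exact) / np.linalg.norm(G_exact)
```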

Influence of the activation function on the accuracy
We compare the performance of different activation functions for learning the Green's function of the Helmholtz operator by training the NNs, N_G and N_hom, with rational, ReLU, and tanh activation functions. The numerical experiments are repeated ten times to study the statistical effect of the random initialization of the network weights and the stochastic nature of the Adam optimizer. The rational NN achieves a mean relative error of 1.2% (with a standard deviation of 0.2%), while the ReLU NN reaches an average error of 3.3% (with a standard deviation of 0.2%), which is roughly three times larger. Note that the factor-of-ten difference in the loss function between the ReLU and rational NNs, displayed in Fig. S2, is consistent with the factor of three in the relative error, since the loss is a mean squared error and √10 ≈ 3. This indicates that the rational neural networks are not overfitting the training dataset. One of the numerical experiments with a tanh NN terminated early due to the training instabilities mentioned above, achieving a relative error of 99%. We excluded this problematic run when comparing against the accuracy of the ReLU and rational NNs, limiting ourselves to cases where the training was successful.
The ReLU and rational NNs did not suffer from such issues and were always successful. The averaged relative error of the tanh NN, over the nine remaining experiments, is equal to 3.9% (with a standard deviation of 1.4%), which is slightly worse than the ReLU NN, with higher volatility of the results.

Number of training pairs and spatial measurements
This section describes our method's accuracy as we change the number of training pairs and the size of the spatial discretization. First, we fix the number of spatial measurements to be N_u = 100, and then vary the number of input-output pairs, {(f_j, u_j)}_{j=1}^N, of the training dataset for the Helmholtz operator with Dirichlet boundary conditions (see Eq. (6)). As we increase N from 1 to 100, we report the relative error of the Green's function learned by a rational NN in Fig. S4A. Next, in Fig. S4B, we display the relative error on the learned Green's function as we increase N_u from 3 to 100, with N = 100 input-output pairs. Note that we perform these numerical experiments only once, since we observed low variation of the relative errors in Section 4.1 when the networks, N_G and N_hom, have rational activation functions. We observe similar behavior in Fig. S4A and B, where the relative error first rapidly (exponentially) decreases as we increase the number of functions in our dataset or the number of spatial measurements of the solutions to the Helmholtz equations with random forcing terms. One important thing to notice is our method's ability to learn the Green's function of a high-frequency Helmholtz operator, with only 1% relative error, using very few training pairs. The performance reaches a plateau at N ≈ 20 and N_u ≈ 20, respectively, and ceases to improve. However, the stagnation of the relative error for more numerous training data and spatial measurements is expected and can be explained by our choice of covariance kernel length-scale, which restricts the GP's ability to generate a wide variety of forcing terms. This issue can be resolved by decreasing the length-scale parameter while concomitantly refining the forcing discretization, or by choosing another covariance kernel with a less pronounced eigenvalue decay rate (17).

Noise perturbation
The impact of noise in the training dataset on the accuracy of the learned Green's function is gauged by perturbing the system's response measurements with Gaussian noise as
u_j(x_i) ↦ u_j(x_i)(1 + δ c_{i,j}),   1 ≤ i ≤ N_u,  1 ≤ j ≤ N,
where the coefficients c_{i,j} are independent and identically distributed standard Gaussian random variables and δ denotes the noise level (in percent). We then vary the level of Gaussian noise perturbation from 0% to 50%, train the NNs, N_G and N_hom, for each choice of the noise level, and report the relative error in Fig. S4C. We first observe a low impact of the noise level on the accuracy of the learned Green's function, as perturbing the system's response measurements with 20% noise only increases the relative error by a factor of 2, from 1.5% (no noise) to 2.7%. When the level of noise exceeds 25%, we notice large variations of the relative errors and associated higher volatility in the results, characterized by a large standard deviation in error across repeated numerical experiments. We consider our DL approach relatively robust to noise in the training dataset.

Location of the measurements
As described in the Methods and Section 1, by default we use a uniform grid for the spatial measurements of the training dataset, and thus we discretize the integrals in the loss function (cf. Eq. (5)) using a trapezoidal rule. We conducted additional numerical experiments on the Helmholtz example to study the influence of the measurement locations and of the quadrature rule on the relative error of the learned Green's function. We report the relative errors between the learned and exact Green's functions in Table S1, using a Monte-Carlo or a trapezoidal rule to approximate the integrals and uniform or random spatial measurements. In the latter case, the measurement points {x_i}_{i=1}^{N_u} are independently and identically sampled from a uniform distribution, U(0, 1), where Ω = [0, 1] is the domain. We find that the respective relative errors vary between 0.96% and 1.3%. Therefore, we do not observe statistically significant differences in the relative error computed by rational NNs. These results support the claim that our method is relatively robust to the type of spatial measurements in the training dataset. (Fig. S5A caption: the space between the vertical black lines indicates where spatial measurements are missing.)

Learning features of differential operators from the Green's function
This section highlights that several features of the differential operators can be extracted from the learned Green's function, which supports our aim of uncovering mechanistic understanding from input-output pairs of forcing terms and solutions.

Linear constraints and symmetries
We first remark that boundary constraints, such as those imposed by the constraint operator, D, of Eq. (3), can be recovered from the Green's function, G, of the differential operator, L. Indeed, let f ∈ C_c^∞(Ω) be any infinitely differentiable function with compact support in Ω, and let u be the solution to Eq. (3) with forcing term f, such that
u(x) = ∫_Ω G(x, y) f(y) dy + u_hom(x),   x ∈ Ω.
As an example, we display in Fig. S6A the learned Green's function of a second-order differential operator on Ω = [−1, 1] with an integral constraint on the solution. We observe that G(−1, y) = 0 for all y ∈ [−1, 1], and one can verify that the learned Green's function also satisfies the integral constraint, i.e., ∫_{−1}^{1} G(x, y) dx = 0 for all y ∈ [−1, 1]. One can see in Fig. S6B that the Green's function itself is periodic and that G(0, y) = G(1, y) for all y ∈ [0, 1], as expected. The periodicity of the Green's function in the y-direction, G(x, 0) = G(x, 1) for all x ∈ [0, 1], is due to the fact that the Helmholtz operator is self-adjoint, which implies symmetry of the associated Green's function. Furthermore, any linear constraint C(u) = 0, such as a linear conservation law or symmetry (54), satisfied by all the solutions to Eq. (3) under forcings f ∈ C_c^∞(Ω), is also satisfied by the Green's function, in the sense that C(G(·, y)) = 0 for all y ∈ Ω, and is therefore witnessed by the Green's function.

Eigenvalue decomposition
Let L be a self-adjoint operator and consider the following eigenvalue problem:
L v = λ v,   D(v, Ω) = 0,   (10)
where v is an eigenfunction of the differential operator, L, satisfying the homogeneous constraints, with associated eigenvalue λ > 0. The eigenfunction, v, can be expressed using the Green's function as
v(x) = λ ∫_Ω G(x, y) v(y) dy,   x ∈ Ω,
which implies that v is also an eigenfunction of the integral operator with kernel G, but with eigenvalue 1/λ. Consider now the eigenvalue problem associated with the Green's function itself:
∫_Ω G(x, y) w(y) dy = µ w(x),   x ∈ Ω,
where µ > 0. Then, we find that (w, 1/µ) is a solution to the eigenvalue problem (10). Consequently, the differential operator, L, and the integral operator with kernel G share the same eigenfunctions but possess reciprocal eigenvalues (18). Thus, we can effectively compute the lowest eigenmodes of L from the learned Green's function. We now evaluate our method's ability to accurately recover the eigenfunctions of the Green's function that are associated with the largest eigenvalues, in magnitude, from input-output pairs.
We train a NN to learn the Green's function of the Laplace operator, L, with homogeneous Dirichlet boundary conditions, and numerically compute its eigenvalue decomposition. In Fig. S7A, we display the learned and exact Green's functions, the latter being known in closed form.
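The eigenmode computation can be sketched as follows, assuming the learned Green's function is available as a vectorized callable on a one-dimensional grid; the quadrature-weighted similarity transform turns the discretized integral operator into a symmetric matrix eigenproblem, and the names are illustrative.

```python
import numpy as np

def greens_function_eigenmodes(N_G, x, n_modes=5):
    """Approximate the lowest eigenmodes of L from the learned Green's function.

    Discretizing the eigenproblem of the integral operator with trapezoidal weights
    gives (G diag(w)) v = mu v; the corresponding eigenvalues of L are 1/mu.
    """
    w = np.zeros_like(x)
    w[1:] += 0.5 * np.diff(x)
    w[:-1] += 0.5 * np.diff(x)
    s = np.sqrt(w)
    X, Y = np.meshgrid(x, x, indexing="ij")
    A = s[:, None] * N_G(X, Y) * s[None, :]       # symmetric for a self-adjoint operator
    mu, V = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(mu))[:n_modes]       # largest |mu| <-> smallest eigenvalues of L
    eigenfunctions = V[:, idx] / s[:, None]       # undo the diag(sqrt(w)) similarity transform
    return 1.0 / mu[idx], eigenfunctions          # eigenvalues of L and eigenfunctions on x
```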

Singular value decomposition
When the Green's function of the differential operator, L, is square-integrable, its associated Hilbert-Schmidt integral operator, f ↦ ∫_Ω G(·, y) f(y) dy, is compact and admits a singular value decomposition (SVD) (55). Then, there exist a nonincreasing sequence σ_1 ≥ σ_2 ≥ · · · > 0 and two orthonormal bases, {φ_n} and {ψ_n}, of L²(Ω) such that
u(x) = Σ_{n=1}^{∞} σ_n ⟨ψ_n, f⟩ φ_n(x),   x ∈ Ω,   (11)
where u is the solution to Eq. (3) with forcing term f (and g = 0), and ⟨·, ·⟩ denotes the inner product in L²(Ω). Therefore, the action of the solution operator f ↦ u can be approximated using the SVD of the integral operator. Similarly to Section 5.2 for the eigenvalue decomposition, the dominant terms in the expansion of Eq. (11) are associated with the largest singular values σ_1 ≥ σ_2 ≥ · · · of the integral operator.
The learned Green's function is illustrated in Fig. S7A, next to the exact Green's function, which is known in closed form for x, y ∈ [0, 1]. In Fig. S9, we display the first five left and right singular vectors and the singular values of the exact and learned Green's functions. We observe that the first fifteen singular values of the learned Green's function are accurate. This leads us to conclude that our method enables the construction of a low-rank representation of the solution operator associated with the differential operator, L, and allows us to compute and analyze its dominant modes.
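A corresponding sketch for the singular value decomposition is shown below; as before, the trained network is assumed to be a vectorized callable, and the square-root quadrature weighting makes the matrix SVD approximate the L²(Ω) SVD of the integral operator.

```python
import numpy as np

def greens_function_svd(N_G, x, n_modes=15):
    """Dominant singular values/vectors of the integral operator with kernel G on the grid x."""
    w = np.zeros_like(x)
    w[1:] += 0.5 * np.diff(x)
    w[:-1] += 0.5 * np.diff(x)
    s = np.sqrt(w)
    X, Y = np.meshgrid(x, x, indexing="ij")
    A = s[:, None] * N_G(X, Y) * s[None, :]   # weighting makes the matrix SVD mimic the L^2 SVD
    Phi, Sigma, PsiT = np.linalg.svd(A)
    phi = Phi[:, :n_modes] / s[:, None]       # left singular vectors sampled on x
    psi = PsiT[:n_modes].T / s[:, None]       # right singular vectors sampled on x
    return phi, Sigma[:n_modes], psi
```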

Schrödinger equation with double-well potential
We highlight the ability of our DL method to learn physical features of an underlying system by considering a steady-state one-dimensional Schrödinger operator with a double-well potential on Ω = [−3, 3]. The potential and the Green's function learned by the rational NN from pairs of forcing terms and system responses are illustrated in Fig. S10. First, the shape of the double-well potential can be visualized along the diagonal of the Green's function in Fig. S10B. Next, we compute the first ten eigenstates of the Schrödinger operator in Chebfun (48) and plot them in Fig. S10 using a representation similar to Fig. 6.9 of (56). Similarly to Section 5.2, we compute the eigenvalue decomposition of the Green's function learned by a rational NN and plot the eigenstates (shifted by the corresponding eigenvalues) in Fig. S10. Note that the eigenvalues of the operator and of the Green's function are reciprocal. We observe a perfect agreement between the first ten exact and learned eigenstates. These energy levels capture information about the states of atomic particles modeled by the Schrödinger equation.

Singularity location and type
The input-output function of a rational NN is a high-degree rational function, which means that it has poles (isolated points at which it takes infinite values). In rational function approximation theory, it is known that the poles of a near-optimal rational approximant tend to cluster near a function's singularities (57). The clustering of the poles near a singularity is needed for the rational approximant to achieve excellent global approximation accuracy (58,59). Moreover, the type of clustering (algebraic, exponential, beveled exponential) can reveal the type of singularity (square-root, blow-up, non-differentiable) at that location. This feature of rational approximants is used in other settings (60).
We show that the rational NNs also cluster their poles in a way that identifies the location and type of singularities. In Fig. S11C, we display the complex argument of the trained rational NN for the Green's function of a second-order differential operator with a jump condition, defined on Ω = [0, 1].

Differential operators in two dimensions
We demonstrate the ability of our method to learn Green's functions of two-dimensional operators by repeating the numerical experiment of (12), which consists of learning the Green's function of the Poisson operator on the unit disk Ω = D(0, 1) with homogeneous Dirichlet boundary conditions. This experiment is a good benchmark for PDE learning techniques, as the analytical expression of the Green's function is known in closed form in Cartesian coordinates (32) for (x, y), (x̃, ỹ) ∈ D(0, 1).
The training dataset for this numerical example is created as follows. First, we generate N = 100 random forcing terms using the command randnfundisk of the Chebfun software (48,62,63) with a frequency parameter of λ = 0.2, and then solve the Poisson equation, with the corresponding right-hand sides, using a spectral method. Then, the forcing terms and system responses (i.e. solutions) are sampled at the N_u = N_f = 673 nodes of a disk mesh, generated using the Gmsh software (64). The spatial discretization of the mesh is chosen to approximately match the discretization (N_u = N_f = 625) of the Stokes example in the main text. Moreover, the mesh structure ensures that the distribution of the sample points is approximately uniform in the disk (Fig. S13C) and that the boundary is accurately captured.
The Green's function and homogeneous rational NNs have four hidden layers and a width of 50 neurons, with 4 and 2 input nodes, respectively, since the Green's function is defined on Ω × Ω.
The two-dimensional integrals of the loss function (5) are discretized using uniform quadrature weights: w_i = π/N_f for 1 ≤ i ≤ N_f. In Fig. S13D to G, we visualize four two-dimensional slices of the learned Green's function, together with two slices of the exact Green's function in panels A and B. Because of the symmetry of the Green's function, due to the self-adjointness of L and the boundary constraints, the exact Green's function satisfies G(x, y, 0, 0) = G(0, 0, x, y) for (x, y) ∈ D(0, 1). Therefore, we compare Fig. S13A to Fig. S13D and E, and similarly Fig. S13B to Fig. S13F and G. We observe that the Green's function is accurately learned by the rational NN, which maintains low approximation errors near the singularity at (x, y) = (x̃, ỹ), contrary to the Neural operator technique (12). The visual artifacts present in Fig. S13E to G are likely due to the coarse spatial discretization of the training data.

System of differential equations
The method for discovering Green's functions of scalar differential operators extends naturally to systems of differential equations. Let f = (f_1, …, f_{n_f}) : Ω → R^{n_f} be a vector of n_f forcing terms and u = (u_1, …, u_{n_u}) : Ω → R^{n_u} be a vector of n_u system responses such that
L u = f,   D(u, Ω) = g.   (13)
The solution to Eq. (13) with f = 0 is called the homogeneous solution and is denoted by u_hom. Similarly to the scalar case, we can express the relation between the system's response and the forcing term using Green's functions and an integral formulation,
u_i(x) = Σ_{j=1}^{n_f} ∫_Ω G_{i,j}(x, y) f_j(y) dy + u_hom,i(x),   x ∈ Ω,   (14)
for 1 ≤ i ≤ n_u. Here, G_{i,j} : Ω × Ω → R ∪ {±∞} is a component of the Green's matrix for 1 ≤ i ≤ n_u and 1 ≤ j ≤ n_f, which consists of an n_u × n_f matrix of Green's functions. Following Eq. (14), we remark that the equations for the different response components decouple, and therefore we can learn each row of the Green's matrix independently. That is, for each row 1 ≤ i ≤ n_u, we train n_f NNs to approximate the components G_{i,1}, …, G_{i,n_f}, and one NN to approximate the ith component of the homogeneous solution, u_hom,i.
As an example, we consider a coupled system of two second-order ordinary differential equations (ODEs) on Ω = [−1, 1] (Eq. (15)), with boundary conditions u(−1) = 1, u(1) = −1, v(−1) = v(1) = −2. In Fig. S14, we display the different components of the Green's matrix and the exact solution (computed by a spectral method), along with the learned homogeneous solutions. We find that the Green's matrix provides insight into the coupling between the two system variables, u and v, as shown by the off-diagonal components G_{1,2} and G_{2,1} of the Green's matrix in Fig. S14A. Similarly, the diagonal components G_{1,1} and G_{2,2} are characteristic of the diffusion operators that appear in Eq. (15).
In this case, the Green's matrix can be understood as a 2 × 2 block operator inverse (65).

Viscous shock
We first consider the second-order differential operator of the main text (23), which models a viscous shock on Ω = [−1, 1]. The robustness of our DL method with missing data in the vicinity of a shock front is analyzed in panels D to F of Fig. 3.

Linearized models of nonlinear operators
We emphasize that our DL method can be used to linearize and extract Green's functions from nonlinear boundary value problems of the form
L u + ε N(u) = f,   D(u, Ω) = g,
where L denotes a linear operator, N is a nonlinear operator, and ε < 1 is a small parameter controlling the nonlinearity. We demonstrate this ability on the three nonlinear boundary value problems, dominated by their linear part, used in (11).

Time-dependent equations
In this section, we show that one can use a time-stepping scheme to discretize a time-dependent PDE and learn the Green's function associated with the time-propagator operator τ : u_n ↦ u_{n+1}, where u_n is the solution of the PDE at time t = n∆t for a fixed time step ∆t. As an example, we consider the time-dependent Schrödinger equation with a harmonic trap potential V(x) = x²,
i ∂ψ(x, t)/∂t = −(1/2) ∂²ψ(x, t)/∂x² + V(x) ψ(x, t),   x ∈ [−3, 3],   (16)
with homogeneous Dirichlet boundary conditions. We use a Crank-Nicolson time-stepping scheme with time step ∆t = 2 × 10⁻² to discretize Eq. (16) in time and obtain
i (ψ_{n+1} − ψ_n)/∆t = (1/2)[−(1/2) ∂²ψ_{n+1}/∂x² + V ψ_{n+1}] + (1/2)[−(1/2) ∂²ψ_n/∂x² + V ψ_n].
Our training dataset consists of one hundred random initial functions ψ_n at time t and the associated responses ψ_{n+1} at time t + ∆t. The functions ψ_n have real and imaginary parts sampled from a Gaussian process with a periodic kernel and length-scale parameter λ = 0.5 (see Section 1), multiplied by the Gaussian damping function g(x) = e^{−x⁶/20} to ensure that the functions decay to zero before reaching the domain boundaries. We then train a rational neural network to learn the Green's function G associated with the time-propagator operator, such that
τ(ψ_n)(x) = ∫_{−3}^{3} G(x, y) ψ_n(y) dy,   x ∈ [−3, 3].
Note that since ψ takes complex values, we in fact split Eq. (16) into a system of equations for the real and imaginary parts of ψ and learn the Green's matrix associated with the system (see Section 7). We report the Green's matrix of the time-propagator operator for the Schrödinger equation in Fig. S19A and observe that the four components are dominated by the diagonal, which is expected for a small time step. Additionally, we evaluate the accuracy of the learned Green's functions by generating a testing dataset with one hundred initial functions ψ_n, sampled from the same distribution, and the associated solutions ψ_{n+1} at time t + ∆t. We then compute the average (over the one hundred test cases) relative error in the L² norm between the exact solution ψ_{n+1} and the one predicted using the learned Green's functions,
Error = ‖ψ_{n+1} − ψ_{n+1}^{pred}‖_{L²} / ‖ψ_{n+1}‖_{L²},
where ψ_{n+1}^{pred}(x) = ∫_{−3}^{3} G(x, y) ψ_n(y) dy for x ∈ [−3, 3].
Finally, we obtain an average relative error of 1.3%, with a standard deviation of 0.2% across the 100 test cases, confirming the good accuracy of our method. We display the worst-case prediction of the solution ψ_{n+1} in Fig. S19B.
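For completeness, the sketch below shows how such training pairs could be generated with a Crank-Nicolson step on a uniform grid of [−3, 3]; the random initial states here are a simplified stand-in for the periodic-kernel GP samples with Gaussian damping described above, and all names and defaults are illustrative.

```python
import numpy as np

def schrodinger_training_pairs(n_x=200, n_samples=100, dt=2e-2, seed=0):
    """Generate (psi_n, psi_{n+1}) pairs for Eq. (16) with a Crank-Nicolson step on [-3, 3]."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-3.0, 3.0, n_x)
    h = x[1] - x[0]
    # Finite-difference Laplacian with homogeneous Dirichlet boundary conditions
    D2 = (np.diag(np.ones(n_x - 1), -1) - 2 * np.eye(n_x) + np.diag(np.ones(n_x - 1), 1)) / h**2
    H = -0.5 * D2 + np.diag(x**2)                 # Hamiltonian with harmonic trap V(x) = x^2
    A = np.eye(n_x) + 0.5j * dt * H               # Crank-Nicolson: A psi_{n+1} = B psi_n
    B = np.eye(n_x) - 0.5j * dt * H
    damp = np.exp(-x**6 / 20)                     # decay to zero near the boundaries
    # Simplified random complex states (stand-in for the periodic-kernel GP samples)
    psi_n = (rng.standard_normal((n_x, n_samples))
             + 1j * rng.standard_normal((n_x, n_samples))) * damp[:, None]
    psi_np1 = np.linalg.solve(A, B @ psi_n)
    return x, psi_n, psi_np1
```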