Introduction

We have seen many triumphs of deep learning architectures successfully working on two-dimensional (2D) data sets, e.g.1,2,3. But the extension of deep learning to three-dimensional (3D) data sets is an active research area facing many theoretical and computational challenges4,5. Although advanced machine learning (ML) methods gradually master and replace complex actions that require human-level intelligence and control capabilities6,7, there remains a deep chasm between advanced ML methods and mature areas of science and engineering. Often, successful ML methods are limited to opaque decision-making on simple tasks. Direct adoption of an advanced ML method can hardly guarantee successful learning and prediction of real-world experimental data, which often involve 3D objects and multifaceted physical measurements. Scientific discovery often follows a typical process: measuring complex physical quantities, investigating the observed data, and deriving a rule that best describes the target physical phenomenon. This fundamental process poses several obstacles to advanced ML methods. Physical measurements are often scattered over complex 3D spaces or objects. A few apparent descriptors are insufficient for learning. The volume of physical experimental data is relatively small for direct adoption of advanced ML methods. And scientists seek to find “expressions” of physical rules, not merely a “black-box” prediction.

Recently, a new research paradigm has emerged to address these hurdles: the so-called physics- or theory-guided ML paradigm8. Domain science is used to help ML predict physically sound solutions, e.g., human neurology9, quantum mechanics10, and heterogeneous composite structures11. Governing physical rules (often partial differential equations) are often fed into ML, e.g., geophysics by a hierarchical graph model12 and wave and fluid flow by deep learning13. The present goal of searching for hidden rules shares a similar notion with auto-encoder methods in their pursuit of salient latent terms14. Still, this study differs from prior works in several aspects. First, this framework focuses on obtaining transparent expressions of hidden physical rules. Second, it seeks to leverage basic physics and scientists’ experience rather than relying upon predefined global governing equations. Third, it is built upon two ingredients, the externalized multi-layer convolved information index and flexible link functions, which reinterpret and inherit deep learning’s successful philosophies. Last, the identified expressions can easily evolve by embracing other physics and more experience. In essence, this framework is purely data-driven, requiring no distributional assumptions about priors and posteriors, in contrast to ref. 14. A comparable recent work to our framework would be ref. 15, which pursues both hidden coordinates and a parsimonious form of the governing equation of dynamic systems, although the key methods and procedures are different.

This research proposes a framework for a constantly evolving, “glass-box” (as opposed to black-box) rule learner that can help extract hidden physical rules behind complex real-world phenomena by integrating experienced scientists’ knowledge and the central notions of deep learning. In particular, we propose a framework that can facilely deal with multifaceted measurements of 3D specimens by incorporating basic physics and scientists’ experience. Also, the framework can extract hidden rules in terms of transparent expressions, which evolve with increasing data through a Bayesian update scheme.

For feasibility tests, this research applied the framework to nano-scale contact electrification (CE), a process that is difficult to elucidate due to its complex geometry, intractably small length scale, and interaction of several physical mechanisms such as friction, fracture, demolding, and friction-induced charging (called tribocharging). The feasibility tests show a promising performance of the proposed framework in reproducing complex 3D distributions of CE-induced electric potentials over the 3D point cloud along with transparent expressions.

Results

Quantified basic physics and experiences of scientists

The first question is how to quantify and transform basic physics and scientists’ experience into ML-friendly quantities. Depending upon the underlying physics and domain-specific knowledge, the resultant quantities may differ in form and range.

This study presents an example of how to quantify and infuse scientists’ simple experience. The target experiment adopted for the present feasibility tests concerns CE during the demolding of a hardened polydimethylsiloxane (PDMS) specimen from a polyethylene terephthalate (PET) base mold that holds unique nano-scale patterns. The physical measurements are distributed over a 3D point cloud, and each point lies on a smooth surface (Fig. 1a, b). The demolding occurs in a specific direction, e.g., left-to-right (Fig. 1c, d), and the separation of the PDMS from the patterned PET mold (Fig. 1e) plays an important role in determining the direction of crack propagation and frictional stress development (Fig. 1f, g), which scientists know as prior knowledge. We quantify this basic experience by a directional vector, \({\mathbf{u}} = \{ {u_x,u_y,u_z} \} \in {\mathbb{{R}}}^3\). It is true that well-known analyses such as principal component analysis or Isomap16 applied to raw data may identify some information about the directionality of the demolding test or produce reduced features, but this study regards scientists’ basic experience as the starting point and focuses on how to infuse it into the subsequent rule learning. In particular, u = {0.001, 0, 1} was used to mimic the left-to-right demolding process along the global X-axis, whereas u = {0.001, −0.0014, 1} was used for the tilted demolding process. We denote the unit normal vector to the friction surface as \({\mathbf{n}} = \{ {n_x,n_y,n_z} \} \in {\mathbb{{R}}}^3\); each point has a unique normal vector (Fig. 1b). Assuming the scientist’s steady control of demolding, u is constant over all points of an individual specimen, while n varies from point to point owing to the nano-scale patterns (Fig. 1e). Another important physical principle stemming from the scientists’ prior knowledge is that the friction-induced electric potential is proportional to the surface friction: the more surface friction, the more friction-related fracture energy can develop. Thus, to systematically quantify the degree to which the demolding direction is aligned with the tangential direction of the surface friction, we leverage the projection of the demolding direction onto the friction surface, which can be mathematically described by

$${\mathbf{P}}_u = {\mathbf{u}} - \left( {{\mathbf{u}} \cdot {\mathbf{n}}} \right){\mathbf{n}},$$
(1)

where u is the global demolding direction vector and n is the surface normal vector. Since the current experimental data set consists of electric potentials observed over the 3D point cloud, creating the point-wise n requires a specific algorithm, which is described in Supplementary Note 1. Aiming at a term with a simple range that accommodates all these experiences and physical knowledge, we define a point-wise information index (II). We denote the local II (i.e., the degree of projection of the demolding on the friction surface) as \(II_{{\mathrm{local}}} \in {\mathbb{R}}\left[ { - 1,1} \right]\):

$$II_{{\mathrm{local}}} = {\mathrm{sgn}}\left( {{\mathbf{u}},{\mathbf{n}}} \right)\left| {{\mathbf{P}}_u} \right|^2/\left| {\mathbf{u}} \right|^2,$$
(2)

where sgn(u, n) = +1 if \({\mathrm{cos}}^{ - 1}( {\frac{{{\mathbf{u}}_{XY} \cdot {\mathbf{n}}_{XY}}}{{| {{\mathbf{u}}_{XY}} || {{\mathbf{n}}_{XY}} |}}} ) \,< \, \pi /2\) and −1 otherwise; uXY = {ux, uy}, nXY = {nx, ny}. One simple physical reason behind this sgn(.) definition is that the specimens are assumed to lie on the XY plane and demolding occurs mainly toward the Z direction. Thus, sgn(u, n) gives +1 when the directions of demolding and surface normal are positively aligned, and −1 otherwise. It should be noted that the range of IIlocal, [−1, 1], has several physical merits. It helps confine the convolved II to the simple range [0, 1], which is compatible with the support of the adopted cubic spline-based link functions. The simple monotonic range from −1 to 1 retains similarity to the activation values of typical ML methods. Clear physical interpretations exist: +1 indicates positively aligned tangential friction, whereas −1 means the negatively aligned condition. As long as these characteristics are satisfied, there is ample room for different expressions of the information index depending upon domain-specific knowledge. In computational mechanics, for instance, the spatial proximity between disparate materials can be represented by another form of information index using virtual stress excitation11. It should be noted, however, that the local index by itself may be insufficient to capture the interactions among nearby physical quantities and may suffer from irregularity and local spikes. These drawbacks are resolved by the convolved information index, similar to the multi-layered convolution in deep learning. Example plots of the local information index calculated with practical nano-scale measurements are presented in Supplementary Note 2.
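As a concrete illustration, the following minimal C++ sketch (our illustration, not the authors’ released implementation; the example normal vector in main is hypothetical) evaluates Eqs. (1) and (2) for a single point, given the global demolding direction u and a unit surface normal n:

```cpp
// Minimal sketch of the local information index of Eqs. (1)-(2).
// Assumes n is a unit normal, as stated in the text.
#include <array>
#include <cmath>
#include <cstdio>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

double norm2(const Vec3& a) { return dot(a, a); }

// Eq. (1): projection of the demolding direction u onto the friction
// surface with unit normal n.
Vec3 project(const Vec3& u, const Vec3& n) {
    const double un = dot(u, n);
    return {u[0] - un * n[0], u[1] - un * n[1], u[2] - un * n[2]};
}

// sgn(u, n): +1 if the XY components of u and n form an angle smaller
// than pi/2 (positive alignment), -1 otherwise.
double sgn(const Vec3& u, const Vec3& n) {
    return (u[0] * n[0] + u[1] * n[1] > 0.0) ? +1.0 : -1.0;
}

// Eq. (2): local information index in [-1, 1].
double ii_local(const Vec3& u, const Vec3& n) {
    return sgn(u, n) * norm2(project(u, n)) / norm2(u);
}

int main() {
    const Vec3 u = {0.001, 0.0, 1.0};  // left-to-right demolding (see text)
    const Vec3 n = {0.6, 0.0, 0.8};    // hypothetical unit normal at a point
    std::printf("II_local = %f\n", ii_local(u, n));
}
```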

Fig. 1: Basic physics and scientist’s experience to be quantified.
figure 1

a Example plot of the 3D point cloud of a nanocup array. b Point-wise normal vector n and area A of a friction surface (see algorithms in Supplementary Note 1). c, d Illustration of the alignment of the demolding direction u and normal vector n. e Demolding of nano-patterns in the lab. f, g Definition of tangential fracture energy Gf related to the tribocharging, and its calculated values over the circumference (A–A′) of a nanocup. Inset shows the normalized potential measured by Kelvin probe force microscopy (adapted from refs. 18,19).

Generation of convolved information index

One of the key enablers of deep learning is the convolution process that allows information integration. If convolution is done over a spatial domain, ML can better understand the interaction of spatially distributed information and hidden patterns, whereas, when applied to the temporal domain, the interactions between past and present information can be elucidated5,17. Inheriting the philosophy of deep learning’s convolution, the proposed framework seeks to spatially integrate the local II over the 3D point cloud. The key difference of this study’s convolution is that we “externalize” deep learning’s multi-layered convolutions by conducting multiple convolutions at the input information level, not in the opaque deep network layers. Rather than a uniform integration, we adopt a weighted integration using a Gaussian weight function (denoted ω) to realize proximity-proportionate importance of information. This process creates the “convolved” information index, denoted \(\overline {II}\), which is calculated by

$${\overline {II}} ( {{\mathbf{x}}_{\left( i \right)}} ) = \frac{1}{2}\left( {1 + \int_V {\omega \left( {{\mathbf{x}}_{\left( i \right)},{\boldsymbol{\xi }}} \right)II_{{\mathrm{local}}}\left( {\boldsymbol{\xi }} \right){\rm{d}}{\boldsymbol{\xi }}} } \right),$$
(3)

where the factor 1/2 and the addition of 1 in the parentheses normalize the index to [0, 1]. For the integration over a discrete 3D point cloud, assuming uniformity over a small patch, we have the approximation

$$\overline {II} ( {{\mathbf{x}}_{\left( i \right)}} ) \cong \frac{1}{2} \left( {1 + \mathop {\sum}\limits_{{\boldsymbol{\xi }}_{\left( j \right)} \in V} {\omega \left( {{\mathbf{x}}_{\left( i \right)},{\boldsymbol{\xi }}_{\left( j \right)}} \right)II_{{\mathrm{local}}}\left( {{\boldsymbol{\xi }}_{\left( j \right)}} \right)A_{\left( i \right)}} } \right),$$
(4)

where A(i) is the patch area of the friction surface centered at point i, calculated over the 3D point cloud (see the algorithm in Supplementary Note 1), and ξ(j) is the position vector of point j. As the primary spatial weighting function, we used the Gaussian function, which has been widely used in other fields under different names (other weighting functions may be used):

$$\omega \left( {{\mathbf{x}}_{\left( i \right)},{\boldsymbol{\xi }}} \right) = \left( {L\left( {2\pi } \right)^{1/2}} \right)^{ - N}{\mathrm{exp}}\left( { - \frac{{\left| {{\mathbf{x}}_{\left( i \right)} - {\boldsymbol{\xi }}} \right|^2}}{{2L^2}}} \right) = {\cal{N}}\left( {{\mathbf{x}}_{\left( i \right)},L^2} \right),$$
(5)

where the position vectors x(i), ξ ∈ V; L is the influence range parameter; N is the dimension parameter (herein N = 3 for the 3D point space). Of particular importance is the influence range L. With a larger value of L, the information over a large space can be incorporated, but over-smoothing may occur; with a smaller L, information adjacent to the current point becomes more important, but the near-field information may become peaked, which may lead to over-fitting (visual comparisons of varying L are presented in Supplementary Note 2). Compared to the convolution of image data, a larger L corresponds to a larger image patch used for the convolution. Figure 2 illustrates how the spatial information integration is conducted over the 3D point cloud and compares its similarity to the spatial or temporal convolutions used in typical deep learning methods. In multi-layered deep learning, convolution can take place multiple times over several layers. Analogous to this philosophy, we generate multiple convolved IIs with various values of L and consider their interactions via multiple link functions.
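To make the discrete convolution concrete, the following minimal sketch (our illustration, not the authors’ released OpenMPI implementation) evaluates Eqs. (4) and (5) for one influence range L, assuming the point cloud, the precomputed IIlocal values, and the patch areas A from Supplementary Note 1 are given as flat arrays:

```cpp
// Minimal sketch of the discrete spatial convolution of Eqs. (4)-(5).
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

std::vector<double> convolve_ii(const std::vector<Point>& pts,
                                const std::vector<double>& ii_local,
                                const std::vector<double>& area,
                                double L) {
    const double kPi = 3.14159265358979323846;
    const int N = 3;  // spatial dimension
    const double c = std::pow(L * std::sqrt(2.0 * kPi), -N);  // Eq. (5) prefactor
    std::vector<double> ii_bar(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < pts.size(); ++j) {
            const double dx = pts[i].x - pts[j].x;
            const double dy = pts[i].y - pts[j].y;
            const double dz = pts[i].z - pts[j].z;
            const double w =
                c * std::exp(-(dx * dx + dy * dy + dz * dz) / (2.0 * L * L));
            sum += w * ii_local[j];  // Gaussian-weighted contribution
        }
        // A_(i) multiplies the sum as written in Eq. (4), under the paper's
        // small-patch uniformity assumption; 1/2(1 + ...) maps to [0, 1].
        ii_bar[i] = 0.5 * (1.0 + area[i] * sum);
    }
    return ii_bar;
}
```

Because the Gaussian weight decays rapidly, the inner loop can in practice be truncated to neighbors within a few multiples of L; the full pairwise form is quadratic in the number of points, which motivates the parallel implementation described in Methods.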

Fig. 2: Comparison of the proposed framework’s convolved information index over the 3D point cloud and deep learning’s typical convolutions.
figure 2

a Each point has its own IIlocal, and the weighted integration of Eq. (3), a spatial convolution, generates the “convolved information index” (denoted \(\overline {II}\)) at all points. b Multiplicative interaction among several link functions (LFs) of multiple influence ranges, similar to deep learning’s multiple convolutions over several layers. c Deep learning’s typical 2D convolution of an image and d a temporal convolution for continuous aggregation of past information (adapted from ref. 17).

Feasibility tests with nano-scale experimental data

To confirm the feasibility of the proposed framework, we applied it to real experimental data sets of nano-scale electric potentials. The charge values were measured by Kelvin probe force microscopy over patterned nano-cups represented by the 3D point cloud (Figs. 3–5; raw test data are adapted from refs. 18,19). Raw data in Fig. 3a, b are from a 4000 nm × 4000 nm square area with an interval of 7.8125 nm, and Fig. 3c, d show the reproduced potentials using our framework. Raw data in Fig. 4a, b are measured over a 3000 nm × 3000 nm square area with an interval of 5.86 nm, and Fig. 4c, d show the corresponding predictions. Figure 5a, b has the same setup as Fig. 3, and Fig. 5c, d are the prediction plots. The demolding direction differs across the experiments, including rightward, diagonal, and downward, as marked in the figures. Another complexity arises from the diverse heights of nano-cups: 154 nm for Fig. 3a, 93.5 nm for Fig. 4a, 117 nm for Fig. 5a, and 50.2 nm for Fig. 5c. Nano-scale patterns are also different, i.e., nano-cup arrays and parallel ridges. Since the goal of this study is to learn underlying physical rules, the proposed framework should overcome all the aforementioned apparent diversities and complexities.

Fig. 3: Initial training with raw data and the first prediction of electric potentials.
figure 3

a, b Bird’s-eye and top views of the real experimental data of the 3D point cloud showing a specimen with the rightward demolding direction parallel to the X-axis as shown in the inset (experimental data values are adapted from Fig. 2a, d of ref. 19). c, d Reproduced results using the identified physical rule with the best-so-far three link functions of convolved IIs in Eq. (6).

Fig. 4: Bayesian training with raw data and reproduction of electric potentials.
figure 4

a, b Bird’s-eye and top views of the real experimental data showing a specimen with the diagonal demolding direction as shown in the inset (experimental data values are adapted from Fig. 2b, e of ref. 19). c, d Reproduced results using the identified physical rule with the best-so-far three link functions of convolved IIs in Eq. (6). A Bayesian update is used to inherit the trained physical rule from Fig. 3.

Fig. 5: Prediction tests with substantially different experimental data sets.
figure 5

a, b Rightward demolding direction parallel to the X-axis as shown in the inset. a Experimental data (raw data values are adapted from Fig. S2a, c in ref. 19) and b predicted electric potentials using the best-so-far physical rule with three link functions of convolved IIs in Eq. (8). c, d Downward demolding direction parallel to the Y-axis as shown in the inset. c Experimental data (raw data values are adapted from Fig. 2c, f in ref. 19) and d predicted electric potentials using the best-so-far physical rule with three link functions of convolved IIs in Eq. (6). The experimental data for these prediction tests were not used for the Bayesian training, and the identified physical rules are based solely on training on the raw data of Figs. 3 and 4.

To confirm general learning capability, the Bayesian-evolution training is conducted on two seemingly disparate experimental data sets. In particular, the proposed glass-box learning begins from the raw data of Fig. 3a, and the identified rules are then inherited by the next learning stage on Fig. 4a. After each training finishes, the best-so-far rules are used to reproduce the distribution of electric potentials of the training data sets, as shown in Fig. 3c, d and Fig. 4c, d. Out of many possible combinations of multiple link functions (LFs) with different influence ranges, the best-so-far rule is found to be the combination using three LFs of L = 8, 50, and 100 nm (here, L is the spatial convolution influence range). The mathematical expression attained by the proposed framework is

$${\mathrm{\Delta }}V_{\left( i \right)} = \mathop {\prod}\limits_{l = 1}^3 {{\cal{L}}^{\left( l \right)}\left( {\overline {II} _{\left( i \right)}^{\left( l \right)};{\mathbf{\uptheta }}^{\left( l \right)}} \right)},$$
(6)

where θ(l) = {a(l), x*(l)} are the free parameters of the lth LF \({\cal{L}}^{\left( l \right)}\), and their values are summarized in Supplementary Table 1. Then, the identified physical rules are used to predict substantially different experimental data sets (Fig. 5). Visual illustrations of the identified rules are presented in Supplementary Note 3. The identified rule of Eq. (6) appears able to predict the spatial distributions of potentials, which are substantially irregular and complex. It should be noted that the electric potential is a relative quantity, and thus we focus on the overall shapes of the potential rather than specific values at certain locations. The best-so-far rule appears to reasonably reproduce the peaks and patterns over the 3D space. As anticipated, the identified rule is defined at the material point level, and thus the prediction can be done regardless of substantial differences in demolding directions, nano-patterns, and geometries. Since the inheritance takes place at the level of the underlying rule expressions, the learned physical rules can be used for general cases (Fig. 5) regardless of substantial differences among nano-scale experiments.

Discussion

One of the strengths of the proposed glass-box learner is its transparency and clear interpretability. In view of Eq. (3), each convolved information index may be regarded as a marginal likelihood with the Gaussian conditional probability (dimension N = 3; influence range L):

$${\mathbb{{E}}}_{{\cal{N}}\left( {{\mathbf{x}}_{\left( i \right)},L^2} \right)}\left( {II_{{\mathrm{local}}}\left( {\boldsymbol{\xi }} \right)} \right) := \frac{1}{2}\left( {1 + \int_V {\left( {L\left( {2\pi } \right)^{1/2}} \right)^{ - N}{\mathrm{exp}}\left( { - \frac{{\left| {{\mathbf{x}}_{\left( i \right)} - {\boldsymbol{\xi }}} \right|^2}}{{2L^2}}} \right)II_{{\mathrm{local}}}\left( {\boldsymbol{\xi }} \right){\rm{d}}{\boldsymbol{\xi }}} } \right)$$
(7)

Here, the factor 1/2 and the addition of 1 used for normalization (Eq. 3) do not change the intended meaning of the likelihood. The best-so-far LFs can be regarded as weighted summations of constant, linear, quadratic, and cubic polynomials. Decomposing the best-so-far LF’s cubic spline basis helps elucidate the probable relationships; each polynomial basis informs the dominant relationship (e.g., linear, parabolic, or highly nonlinear) between the target physics and the II (compare Supplementary Figures 5 and 6 in Supplementary Note 4). This clear interpretability may help explicitly reveal a dominant relationship and approximation in domain scientists’ subsequent investigations.

From a statistical angle, all the bases take the marginal likelihood \({\mathbb{{E}}}_{{\cal{N}}\left( {{\mathbf{x}}_{\left( i \right)},L^2} \right)}\left( {II_{{\mathrm{local}}}\left( {\boldsymbol{\xi }} \right);{\mathbf{\uptheta }}} \right)\) as their argument. In particular, the observed physical rule relating the electric potential and our information index is written in an explicit form as

$${\mathrm{\Delta }}V_{\left( i \right)} = \mathop {\prod}\limits_{l = 1}^3 {\left[ {a_1^{\left( l \right)} + a_2^{\left( l \right)}\,{\mathbb{{E}}}_{{\cal{N}}\left( {{\mathbf{x}}_{\left( i \right)},L^{\left( l \right)2}} \right)}\left( {II_{{\mathrm{local}}}} \right) + \mathop {\sum}\limits_{j = 3}^5 {a_j^{\left( l \right)}\,b_j^{\left( l \right)}\left( {{\mathbb{{E}}}_{{\cal{N}}\left( {{\mathbf{x}}_{\left( i \right)},L^{\left( l \right)2}} \right)}\left( {II_{{\mathrm{local}}}} \right)} \right)} } \right]},$$
(8)

where \(b_j^{\left( l \right)}\left( x \right)\) is a basis of the adopted cubic splines for LFs (details about this basis are presented in the section “Flexible and transparent link functions” in Methods); the target ΔV and local index IIlocal are defined at each material point, i.e., each data point of the 3D point cloud; θ(l) = {a(l), x*(l)} are summarized in Supplementary Table 1. By all means, this identified physical rule about ΔV is not a fixed, unique form. Rather, it suggests a best-so-far probable and physically explainable expression regarding the target phenomena. There are myriad ways to use this identified physical rule based on scientists’ knowledge. For instance, ref. 20 suggested a plausible causal pathway between the frictional charge and the contact-surface temperature difference, ΔV ∝ ΔT, at nano-scale surfaces. Since the local information index always preserves physical meaning, e.g., the directional alignment of demolding and tangential friction in Eq. (2), researchers may derive a new physical governing equation by simply linking the ΔT to Eq. (8) via a new II. For instance, atomic-scale investigations21,22,23 into the thermally activated Prandtl–Tomlinson (PT) model suggest an interesting relationship among friction F1, temperature T, and velocity ν: F1(ν, T) = Fc − (βkBT ln(νc/ν))^{2/3}, where νc is the critical velocity, kB is the Boltzmann constant, and Fc is the maximum slip-inducing force at zero temperature. It would be an interesting endeavor to link this PT relation to the II (e.g., F1 = f(IIlocal) with an invertible expression f) and finally to ΔV by plugging f−1 into Eq. (8). Such a new merger across multiple physics will be transparent by virtue of the glass-box learning and prediction of the proposed framework.

This study proposed a framework for a constantly evolving, glass-box physical rule learner by addressing how to infuse scientists’ knowledge and experience into quantifiable, ML-friendly terms. The framework’s two primary ingredients, the convolved II and the LF, are inspired by deep learning’s multi-layered convolutions. The framework’s constantly evolving capability arises from the combination of a Bayesian update and an evolutionary algorithm, which may translate into a maximization of log-likelihood, requiring no distributional assumptions about the priors and posteriors. The framework seeks to identify explicit mathematical expressions relating the target physical phenomenon and the IIs via LFs. Practical feasibility tests with complex nano-scale CE phenomena showed a promising capability of the framework in identifying reasonable expressions for the intractable electric potential distribution across the 3D point cloud measured from nano-patterned specimens with varying geometries and demolding directions. By virtue of the transparency of LFs, the revealed physical rule will serve as a gateway to numerous possible rules, and through such a fertile partnership with ML, new discoveries will return to the hands of scientists in diverse disciplines. In general, this framework can be applied to other disciplines where multi-dimensional, multifaceted physical data sets and limited access to internal states pose challenges, e.g., new meta-material design, geophysics, and complex heterogeneous bodies11. Overall, this study demonstrates how advanced ML methods can inspire domain scientists and how the two can combine to tackle hitherto intractable scientific questions, promoting more cross-disciplinary collaborations. This framework will spark the imagination of scientists to develop their domain-specific IIs and also invigorate the ML community to embrace IIs into their successful platforms.

Still, there is ample room for further sophistication of this initial framework: to name a few directions, more flexible and versatile bases for link functions24 or the use of an extensive library of possible mathematical expressions as done in ref. 15; more advanced ML methods and evolutionary algorithms for efficient searching of latent features and high-dimensional parameters14,25; automated approaches to finding optimal layers of LFs; and integration of diverse multiphysics rules such as nano-scale heat transfer, nano-electrification, and mechanical friction. Toward any such extensions and applications, the ideas in this study will serve as fertile ground for departure.

Methods

Flexible and transparent link functions

In deep learning methods, the hidden layers’ weights embed important relations and interactions of variables and neurons in terms of numerical values. The meaning of the weights gradually becomes opaque as the number of hidden layers or convolution processes increases. To emphasize higher transparency and interpretability, this study suggests using an LF that describes the impact of the convolved information index \(\overline {II}\) on the hidden physical rules in terms of clear mathematical expressions. An LF is denoted \({\cal{L}}\left( {\overline {II} ;{\mathbf{\uptheta }}} \right)\), where θ is a set of free parameters prescribing the LF. Since the true form of a hidden physical rule remains unknown, this study suggests borrowing the power of an evolutionary algorithm to enable the LF to continue to learn, train, and evolve. The framework focuses on evolving the θ of the LF rather than finding a single set of parameters. Although there is no restriction on the selection of a specific LF, this study chose the cubic spline basis, which is highly smooth and flexible. Cubic spline curves consist of a few cubic polynomials connected (at the so-called knots) such that the curves are continuous up to the second derivatives24. For example, when a practical cubic spline basis26 (denoted bi) is adopted, LFs are given as

$${\cal{L}}\left( {\overline {II} ;{\mathbf{a}},{\mathbf{x}}^ \ast } \right) = \mathop {\sum}\limits_i^p {a_ib_i\left( {\overline {II} } \right)}$$
(9)

where b1(x) = 1, b2(x) = x, and

$$b_{i + 2}\left( x \right) = \frac{{\left[ {\left( {x_i^ \ast - \frac{1}{2}} \right)^2 - \frac{1}{{12}}} \right]\left[ {\left( {x - \frac{1}{2}} \right)^2 - \frac{1}{{12}}} \right]}}{4} - \frac{{\left( {\left| {x - x_i^ \ast } \right| - \frac{1}{2}} \right)^4 - \frac{1}{2}\left( {\left| {x - x_i^ \ast } \right| - \frac{1}{2}} \right)^2 + \frac{7}{{240}}}}{{24}},$$
(10)

for i = 1…p − 2. Here, \(x_i^ \ast\) is the ith knot location. Therefore, to completely describe one LF, we need to identify p + (p − 2) unknowns, i.e., a = {a1, …, ap} and \({\mathbf{x}}^ \ast = \{ {{\it{x}}_1^ \ast , \ldots ,{\it{x}}_{\left( {p - 2} \right)}^ \ast } \}\). For brevity, we denote the total unknown parameters as θ = {a, x*}. Regarding the LF’s flexibility, when a physical relationship is monotonic, a smooth-shaped LF may suffice, but when the hidden relations are complex, a more flexible shape may be better. The adopted cubic spline basis can accommodate all of these characteristics. It should be emphasized that the adopted cubic spline basis is not used for direct regression. Rather than aiming at direct regression, this framework seeks to leverage the flexibility and transparency of the cubic spline basis for finding “expressions” of LFs. Comparable to the multiple convolutions over layers of deep learning, we allow an interaction of multiple LFs with different influence ranges. Thus, a target physical response ΔV is, in general, obtained with θ(l) = {a(l), x*(l)} by

$${\mathrm{\Delta }}V_{\left( i \right)} = \mathop {\prod}\limits_l^{n_l} {{\cal{L}}^{\left( l \right)}\left( {\overline {II} _{\left( i \right)}^{\left( l \right)};{\mathbf{\uptheta }}^{\left( l \right)}} \right)},$$
(11)

where ΔV(i) is the predicted electric potential at point (i), \(\overline {II} _{\left( i \right)}^{\left( l \right)}\) is the convolved II associated with the lth influence range L(l) at point (i), and the number of total LFs is denoted by nl, which is also to be determined through learning. After considering various numbers of LFs and different combinations, such as additive or multiplicative, we found that nl = 3 with influence ranges (e.g., L(l) = 8, 50, and 100 nm) and the multiplicative combination in Eq. (11) lead to reasonable learning. As in the multiple convolutions over several layers of deep learning, this framework compares possible combinations of multiple LFs, allowing interaction among different ranges’ information, and finds the best-performing case, as illustrated in Fig. 2. Unlike the hyperparameters of other ML methods, the proposed LFs seek to offer “expressions” that can be inserted into or interwoven with other physical phenomena.
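The following minimal sketch (a simplified illustration, not the released code; it assumes p = 5 basis terms with three knots, consistent with the search ranges stated in “General settings for the evolutionary algorithm”) evaluates one LF from the cubic spline basis of Eqs. (9) and (10) and combines the LFs multiplicatively as in Eq. (11):

```cpp
// Minimal sketch of the link-function evaluation of Eqs. (9)-(11).
#include <cmath>
#include <cstddef>
#include <vector>

// Eq. (10): cubic spline basis term for knot xk, evaluated at x in [0, 1].
double spline_basis(double x, double xk) {
    auto sq = [](double v) { return v * v; };
    const double d = std::fabs(x - xk);
    const double first =
        (sq(xk - 0.5) - 1.0 / 12.0) * (sq(x - 0.5) - 1.0 / 12.0) / 4.0;
    const double second =
        (std::pow(d - 0.5, 4.0) - 0.5 * sq(d - 0.5) + 7.0 / 240.0) / 24.0;
    return first - second;
}

// Eq. (9): one link function L(ii_bar; a, x*) with b1(x) = 1, b2(x) = x.
double link_function(double ii_bar, const std::vector<double>& a,
                     const std::vector<double>& knots) {
    double val = a[0] + a[1] * ii_bar;  // constant + linear terms
    for (std::size_t k = 0; k < knots.size(); ++k)
        val += a[k + 2] * spline_basis(ii_bar, knots[k]);
    return val;
}

// Eq. (11): multiplicative interaction of n_l link functions, each fed by
// the convolved II of its own influence range (e.g., L = 8, 50, 100 nm).
double predict_dV(const std::vector<double>& ii_bars,             // one per range
                  const std::vector<std::vector<double>>& a,      // a^(l)
                  const std::vector<std::vector<double>>& knots) {// x*^(l)
    double dV = 1.0;
    for (std::size_t l = 0; l < ii_bars.size(); ++l)
        dV *= link_function(ii_bars[l], a[l], knots[l]);
    return dV;
}
```

For the best-so-far rule of Eq. (6), the coefficient vectors a(l) and knots x*(l) would be taken from Supplementary Table 1; here they are passed in as placeholders.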

Bayesian update ingrained into an evolutionary algorithm

Aiming at no distributional assumptions about the priors/posteriors as well as pursuing smooth evolution, this study adopts the combination of a Bayesian update and a modified genetic algorithm11. The key evolutionary algorithm involves the preparation of an initial generation, organism-wise evaluation of a fitness score, and fitness-based spawning of the next generation. The prior best physical rules can be smoothly inherited through the Bayesian update-based fitness proportionate probability (FPP) rule. To accelerate the evolution speed of the modified genetic algorithm, an individual variable-wise gene crossover scheme27 has been used, and the changing search range scheme28 is used in an iterative manner for better performance, as successfully done in ref. 11. Since an individual s realizes a candidate of θ = (a, x*) in the current generation S, the raw cost of an individual s, termed \({\cal{J}}\left( s \right)\), is calculated by

$${\cal{J}}\left( s \right) = \frac{1}{n}\mathop {\sum}\limits_i^n {\left[ {{\mathrm{\Delta }}V_{\left( i \right)} - {\mathrm{\Delta }}V_{\left( i \right)}\left( {\mathbf{\uptheta }} \right)} \right]^2},$$
(12)

where ΔV(i) is the true (measured) physical response at point (i) and ΔV(i)(θ) is the predicted response by Eq. (11) with θ. This raw cost is simply the mean squared error between the observed and predicted surface potentials. Then, following typical genetic algorithm procedures11,29,30, the normalized fitness score \({\cal{F}}\) of an individual is calculated by

$${\cal{F}}\left( s \right) = \frac{{\left( {1 + {\cal{J}}\left( s \right)} \right)^{ - 1}}}{{\mathop {\sum}\nolimits_{\forall s \in S} {\left[ {\left( {1 + {\cal{J}}\left( s \right)} \right)^{ - 1}} \right]} }},$$
(13)

where s denotes an individual in the entire generation S. Learning a hidden physical rule is not a one-time task but rather a continuous activity. As diverse new experimental data become available, the physical rule learner must embrace all the previous knowledge while learning the new information. To seamlessly realize this continuous learning, this study infused a Bayesian update scheme into the evolutionary algorithm’s FPP rule. Suppose we have the best-so-far generation, denoted S*, and its associated fitness scores, \({\cal{F}}^ \ast \left( s \right)\), s ∈ S*. According to the FPP rule, the probability of selecting a θ as the next parent is given by \({\mathrm{Prob}}\left( {\mathbf{\uptheta }} \right) \propto {\cal{F}}\left( s \right),s \in S^ \ast\). Thus, \({\cal{F}}^ \ast \left( s \right)\) is regarded as a prior probability density function of the parameters θ = {a, x*}, i.e., πprior(θ) in the typical Bayesian formalism. For the initialization of πprior(θ), there are several choices: fully random initialization, expert knowledge-based initialization31, or initialization based on the principle of maximum entropy32. This study intentionally starts from fully random initialization to investigate positive evolution trends without special initialization assumptions. Thus, this framework is purely data-driven, requiring no distributional assumptions about priors and posteriors, in contrast to some successful auto-encoder methods14. For the posterior distribution, we propose the following two-stage procedure.

Suppose that we have the prior best LFs and their S*, and that new experimental data become available. At the first learning generation with the new data, we can calculate the first fitness scores \({\cal{F}}\left( {s;S^ \ast } \right)\) by applying the prior S* and LFs to the new experiment. After the first generation, we can estimate the Bayesian fitness score (denoted \({\cal{F}}_{\mathrm{B}}\)) as

$${\cal{F}}_{\mathrm{B}}\left( s \right) = \frac{1}{\kappa }\frac{{{\cal{F}}\left( {s;S^ \ast } \right){\cal{F}}^ \ast \left( s \right)}}{{\mathop {\sum}\nolimits_{\forall s \in S^ \ast } {{\cal{F}}^ \ast \left( s \right)} }},$$
(14)

where κ is needed for normalizing the Bayesian fitness to unity, which is simply given by

$$\kappa = \mathop {\sum}\limits_{\forall s \in S^ \ast } {\frac{{{\cal{F}}\left( {s;S^ \ast } \right){\cal{F}}^ \ast \left( s \right)}}{{\mathop {\sum}\nolimits_{\forall s \in S^{\ast}} {{\cal{F}}^ \ast \left( s \right)} }}}.$$
(15)

Then, from the second learning generation of the new experiment, the probability of selecting two parents is proportional to the Bayesian fitness score as

$${\mathrm{Prob}}\left( {\left. {{\mathrm{parent}}_i} \right|s} \right) \propto {\cal{F}}_{\mathrm{B}}\left( s \right),\left( {i = 1,2} \right).$$
(16)

Once again, an individual s realizes a candidate of θ = (a, x*) in the new generation S, and thus the desired posterior distribution is obtained. In this way, the prior knowledge is smoothly inherited by the new experiment within the framework of the evolutionary algorithm, thereby enabling constantly evolving physical rule learning. To allow for evolving with new data, the previous scores are inherited through the Bayesian score of Eq. (14). The adopted evolutionary algorithm remembers the prior generation’s fitness scores, which offer the probability distribution of the LFs’ free parameters. As the Bayesian inheritance continues with new experimental data, the probability distribution of the LFs will naturally evolve. Thus, the proposed framework can achieve evolving capability with increasing data. In the future, more dedicated investigations should focus on validating the constantly evolving capability of LFs with sufficient, diverse test data. To some extent, the aforementioned combination of a Bayesian update and evolutionary algorithm can be viewed as a log-likelihood maximization, as explained in Supplementary Note 4.
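As a compact illustration of this procedure (our simplification; it assumes the fitness vectors are index-aligned with the individuals of S*), the following sketch computes the raw cost of Eq. (12), the normalized fitness of Eq. (13), and the Bayesian fitness of Eqs. (14) and (15) that drives the parent selection of Eq. (16):

```cpp
// Minimal sketch of the fitness scores and Bayesian update, Eqs. (12)-(16).
#include <cstddef>
#include <vector>

// Eq. (12): mean squared error between measured and predicted potentials.
double raw_cost(const std::vector<double>& dV_true,
                const std::vector<double>& dV_pred) {
    double sum = 0.0;
    for (std::size_t i = 0; i < dV_true.size(); ++i) {
        const double e = dV_true[i] - dV_pred[i];
        sum += e * e;
    }
    return sum / static_cast<double>(dV_true.size());
}

// Eq. (13): fitness normalized across the generation.
std::vector<double> normalized_fitness(const std::vector<double>& costs) {
    std::vector<double> f(costs.size());
    double total = 0.0;
    for (std::size_t s = 0; s < costs.size(); ++s) {
        f[s] = 1.0 / (1.0 + costs[s]);
        total += f[s];
    }
    for (double& v : f) v /= total;
    return f;
}

// Eqs. (14)-(15): Bayesian fitness blending the prior generation's scores
// F*(s) with the first scores F(s; S*) obtained on the new data; kappa
// renormalizes the product to unity, and the result is used as the parent
// selection probability of Eq. (16).
std::vector<double> bayesian_fitness(const std::vector<double>& f_new,
                                     const std::vector<double>& f_prior) {
    double prior_total = 0.0;
    for (double v : f_prior) prior_total += v;
    std::vector<double> fb(f_new.size());
    double kappa = 0.0;
    for (std::size_t s = 0; s < f_new.size(); ++s) {
        fb[s] = f_new[s] * f_prior[s] / prior_total;
        kappa += fb[s];
    }
    for (double& v : fb) v /= kappa;
    return fb;
}
```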

Overall flow of the evolving physical rule learner

Based on the aforementioned building blocks, the proposed framework has the overall architecture shown in Fig. 6. For a target physical phenomenon, learning begins with quantifying the scientist’s knowledge and experience into a simple, local information index (step a in Fig. 6). Spatial information integration is conducted with multiple influence ranges (i.e., L(l), l = 1,…, nl) and Gaussian weights \({\cal{N}}\), as marked between steps a and b in Fig. 6. The multiple convolved IIs and their interaction may be regarded as counterparts to deep learning’s multi-layered convolutions. As mentioned earlier, the central novelty is that we “externalize” deep learning’s multi-layered convolutions by conducting multiple convolutions at the input information level (b in Fig. 6), independent of the learning engine. Then, an ML or optimization method (here, an evolutionary algorithm) is used for learning and evolving the internal parameters θ of the LFs, not for direct prediction of target responses (step c in Fig. 6). Thus, the ML methods’ powerful strength in training and solution searching can focus on identifying mathematical expressions between the IIs and the target physical responses. The best-so-far “expressions” of the hidden rule become the prior best generation for the next generation’s LFs via the Bayesian update scheme (step d in Fig. 6) for future data sets.

Fig. 6: Flowchart of the proposed framework.
figure 6

a Basic physics and experience are translated into a local II (IIlocal) at every spatial point x(i). b With various influence ranges L(l) and Gaussian weights \({\cal{N}}\), multiple convolved IIs (\(\overline {II}\)) are generated. c Using \(\overline {II}\) and prediction errors of the target electric potentials (ΔV), the evolutionary algorithm reveals the LFs. d A Bayesian update is incorporated for cross-specimen evolution.

Raw experimental data sets of electric potentials

The raw experimental data sets adapted for the feasibility tests are in a text-based matrix form of {x, y, z, ΔV}(i), i = 1,…, n: (1) the horizontal demolding case (Fig. 3) has a 65,536 × 4 matrix, (2) the diagonal demolding case (Fig. 4) has a 115,600 × 4 matrix, and (3) the horizontal demolding case of parallel strip patterns (Fig. 5) has a 65,534 × 4 matrix. Here, the unit of the coordinates {x, y, z} is [nm], while the electric potential ΔV is in volts [V]. All of these raw training data sets are made publicly available (see Data Availability).
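For illustration, a minimal loader for this text-matrix format might look as follows (the file name is hypothetical):

```cpp
// Minimal sketch: loading one raw data set in the {x, y, z, dV} text-matrix
// form described above. Coordinates are in nm and potentials in V.
#include <fstream>
#include <vector>

struct Sample { double x, y, z, dV; };

std::vector<Sample> load_raw(const char* path) {
    std::vector<Sample> rows;
    std::ifstream in(path);
    Sample s;
    while (in >> s.x >> s.y >> s.z >> s.dV) rows.push_back(s);
    return rows;
}
// e.g., load_raw("horizontal_demolding.txt") would yield one row per point
// (65,536 rows for the case of Fig. 3, if the file matches that data set).
```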

General settings for the evolutionary algorithm

The initial search ranges for all link functions’ parameters ai ∈ a are set as ai ∈ [−3000, 3000], while \(x_1^ \ast \in \left[ {0,1/3} \right]\), \(x_2^ \ast \in \left[ {1/3,2/3} \right]\), and \(x_3^ \ast \in \left[ {2/3,1} \right]\) for the three knots of the cubic spline basis. As typical settings for the genetic algorithm, 4 alleles are used per gene, the mutation rate is 0.01–0.02, and the total number of organisms per generation is 10,000–20,000. The maximum generation number is 1000, which gave reasonably converged results.
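A minimal sketch of the corresponding random initialization (assuming p = 5 coefficients and three knots per LF, as implied by the ranges above) could draw one organism θ = {a, x*} as follows:

```cpp
// Minimal sketch: random initialization of one organism within the stated
// initial search ranges (a_i in [-3000, 3000]; knot-wise ranges in [0, 1]).
#include <random>
#include <vector>

struct Theta { std::vector<double> a, knots; };

Theta random_organism(std::mt19937& rng) {
    std::uniform_real_distribution<double> coeff(-3000.0, 3000.0);
    Theta t;
    for (int i = 0; i < 5; ++i) t.a.push_back(coeff(rng));  // p = 5 coefficients
    // Knot-wise ranges keep the three knots ordered within [0, 1].
    const double lo[] = {0.0, 1.0 / 3.0, 2.0 / 3.0};
    const double hi[] = {1.0 / 3.0, 2.0 / 3.0, 1.0};
    for (int k = 0; k < 3; ++k) {
        std::uniform_real_distribution<double> knot(lo[k], hi[k]);
        t.knots.push_back(knot(rng));
    }
    return t;
}
```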

Computational implementation of proposed algorithms

The spatial convolution over the 3D point cloud to generate the convolved information index may be computationally expensive depending upon the number of data points. For this framework, we developed a scalable parallel program with C++ and OpenMPI. All other components, including the learning, the evolutionary algorithm, and the Bayesian update scheme, are implemented in the parallel program. The developed program will be publicly shared for academic purposes upon request to the corresponding author. Iowa State University’s high-performance computing facility, the Condo cluster, was used for this study.

Fabrication of tribocharged nano-cup array

The tribocharged PDMS nanocup array is fabricated by the following sequence: (1) prepare a PET mold with a 750 nm-pitch triangular array of nanocones (Micro-continuum Inc.); (2) pour the liquid-phase PDMS (Sylgard 184, Dow Corning) mixed with the curing agent; (3) solidify the specimen; and (4) peel off the PDMS from the PET mold (for more details, see ref. 19).

Surface characterization with Kelvin probe force microscopy

Atomic force microscopy (AFM) (Multimode, Veeco) in the tapping mode is used to obtain the topography. AFM in the Kelvin probe force microscopy mode is used to measure the surface topography and potential. The Pt/Ir-coated tips (SCM-PIT-v2, purchased from Bruker) have a spring constant of 2.8 N m−1 and a resonance frequency of 75 kHz. The lift height and the typical scanning rate are 45 nm and 0.5 Hz, respectively (for more details, see ref. 19).