Abstract
While machine learning (ML) in experimental research has demonstrated impressive predictive capabilities, extracting fungible knowledge representations from experimental data remains an elusive task. In this manuscript, we use ML to infer the underlying differential equation (DE) from experimental data of degrading organicinorganic methylammonium lead iodide (MAPI) perovskite thin films under environmental stressors (elevated temperature, humidity, and light). Using a sparse regression algorithm, we find that the underlying DE governing MAPI degradation across a broad temperature range of 35 to 85 °C is described minimally by a secondorder polynomial. This DE corresponds to the Verhulst logistic function, which describes reaction kinetics analogous to selfpropagating reactions. We examine the robustness of our conclusions to experimental variance and Gaussian noise and describe the experimental limits within which this methodology can be applied. Our study highlights the promise and challenges associated with MLaided scientific discovery by demonstrating its application in experimental chemical and materials systems.
Introduction
In the traditional scientific discovery process, prior knowledge from first principles and empirical laws are combined with experimental data and intuition to yield governing equations. Newton’s law of gravitation^{1}, Einstein’s massenergy equivalence equation^{2}, Kepler’s laws of planetary motion^{3}, and other physical principles were uncovered through careful interpretation of experimental data and inductive reasoning^{4}. The approach of fitting experimental data through regression is difficult with systems that are yet to be understood fully—the set of feasible equations capturing the physics is enormous^{5,6,7}.
One such area where underlying physics is often poorly understood is the study of materials under environmental stress. For example, alloys^{8,9}, polymers^{10}, doped silicon^{11}, and hybrid materials^{12} experience changes at elevated temperatures. The degradation pathways can be complex and not directly obvious when examining the experimental data. Machine learning (ML) has been used to predict degradation^{13,14,15,16,17} as well as to optimize process conditions to reduce material decomposition^{17,18}. However, traditional datascience methods yield little insight into the underlying mechanisms. We posit that hidden in the blackbox ML models is valuable scientific information on the dynamics of the system. If uncovered, the knowledge of the governing dynamics can serve as foundation for physical interpretation of phenomena and scientific discovery.
Herein, we use Scientific ML, which combines regressionbased ML with sparsity generating techniques in order to automatically identify governing equations directly from data, especially when the systems being studied are too complicated to yield to traditional theoretical analysis. Not only does Scientific ML help us understand the underlying scientific phenomena better, it also has the potential to help to make simulations faster and extrapolate beyond the dataset at hand.
Recently, many approaches aiming for this target have been presented in literature. A method that we apply in this contribution is PDEFIND by Rudy et al.^{19}. This method is used for the discovery of physical laws describing dynamical systems. First, a library of potential candidate functions is built. Differentials are calculated by finite difference or polynomial interpolation. Once a large matrix with all candidate functions is composed, different sparse regression methods may be used to extract the partial differential equation (PDE) describing the system. The sparse methods implemented are sequential threshold ridge regression, lasso regression, elastic net regression, and greedy algorithm. Another sparse technique is Sparse Identification of nonlinear Dynamics (SINDy)^{20}. It uses a custom deep autoencoder to find a coordinate system in which the dynamics of the system are sparse, and then uses sparse regression to find the governing equations in the associated coordinate system. Atkinson et al.^{21} present a generalized method for the discovery of differential equations using genetic programming. Physics Informed Neural Networks (PINN)^{22} and PDENET^{23,24} are deep learning methodologies to extract governing partial differential equations using dynamical data. These methods have shown great promise in several applications^{25,26,27,28}. The automatic discovery of scientific laws and principles is at the frontier of machine learning that awaits application to materials science^{29} and other domains^{30,31,32}.
Halide perovskite materials, which have potential to provide high performing and costeffective solar energy, degrade at elevated temperature^{33,34,35,36,37,38}, humidity^{39,40,41}, and illumination^{42,43,44}. This is a major issue hindering the commercialization of perovskite photovoltaic technology. However, the degradation mechanisms affecting halide perovskites are not well understood. Discovering the underlying equations directly from perovskite degradation data could accelerate the development of stable perovskite solar cells. Herein, we apply Scientific ML to study the environmental degradation of methylammonium lead iodide (MAPI).
From prior knowledge in the literature, MAPI has multiple documented reaction pathways, including decomposition to PbI_{2} via reaction^{35}:
Smecca et al.^{36} demonstrate that the rate of MAPI degradation obeys an Arrheniustype law. Their data suggest that the degradation of MAPI follows zeroorder kinetics in the presence of moisture and firstorder kinetics in vacuum at temperatures ranging from 90 to 135 °C. Bastos et al.^{45} hypothesize that the thermal degradation of MAPI is defined by the Avrami equation^{46,47} of nucleation and growth. The Avrami equation has also been used to describe degradation kinetics in humid air^{48}. Recently, studies have shown that halide perovskite degradation follows autocatalytic reaction kinetics^{49} with the hypothesis that the degradation is propagated by iodine vapors^{50}. The derivation of exact kinetics through first principles as well as Arrheniustype dependence is difficult because of the complexity of MAPI decomposition, despite the availability of wellresolved dynamical data, inviting the application of Scientific ML.
In this study, we focus on the application of PDEFIND to perovskite degradation data. We choose PDEFIND as it is an interpretable method that provides a parsimonious description of the dynamics with the flexibility to apply domain expertise for library selection. Successfully identifying governing differential equations directly from the experimental aging test data would deepen the understanding of thermal degradation and provide tools for reliable lifetime prediction of perovskite solar cells as well as the determination of acceleration factors for longterm aging tests. These developments could spur the advancement of the perovskite photovoltaic technology and have been called for by the community^{51,52,53}. This study provides a generalizable pathway to identify degradation modes in other materials research domains as well.
The objectives of this study are twofold, as illustrated by the workflows shown in Fig. 1: Uncover the underlying differential equation corresponding to perovskite degradation using sparse regression methodology PDEFIND (Workflow (1)) and quantify the effect of noise on the accuracy of extraction of differential equations by PDEFIND by comparing noiseless and noisy simulated data (Workflow (2)).
Further details can be found in the “Methods” section; a summary is provided here. To generate the experimental data, we subjected 206 thinfilm samples of methylammonium lead iodide (MAPI) to 0.15 ± 0.01 Sun illumination, 20 ± 5% relative humidity, and temperatures varying from 35 to 85 °C in our inhouse environmental chamber described in detail in ref. ^{18} (Fig. 2b). A camera is used to monitor color changes versus time; the red color timeseries is chosen for further analysis because two studies^{18,54} have shown a clear correlation between film color and device performance, in the limit of MAPI composition, given one of the principal degradation products is yellow PbI_{2}. One hundred and eight samples were grown under lowvariance conditions (labeled ‘lowvariance experimental’, quantified in the “Results” section); 98 samples were grown under highvariance conditions (labeled ‘highvariance experimental’) (Supplementary Fig. 2), referring to the amount of variance (color change versus time) between samples synthesized and degraded under ostensibly identical conditions. Unless specified otherwise, we assume ‘experimental’ data in this paper refers to the lowvariance sample set.
Results
Results on experimental data
Our aim is to obtain the equation that most accurately describes the environmental degradation of methylammonium lead iodide (MAPI) as a function of time and temperature. There are two main challenges for Scientific ML in this application that are also common with many other experimental applications, especially in materials science: The function space that could in principle capture the degradation processes is enormous, complicating identification of unique equations. Furthermore, experimental data has measurement noise as well as sampletosample variance, making the identification of quantitative analytic descriptions even more challenging. These conditions can be optimized to some extent, but not excluded.
Our experimental setup represents a typical materials science experiment: The noise in our experimental data is of the order of 0.35% for both highvariance and lowvariance experimental datasets. The low value indicates that the camera measurement of degradation is optimized. The sampletosample variance for the ‘low variance experimental’ dataset is estimated to be 20% in relative standard deviation and the maximum mean absolute deviation is 12 units (red color values vary from 0 to 255). For the ‘high variance experimental’ dataset, variance is estimated to be 23% in relative standard deviation and the maximum mean absolute deviation is 31 units. These values are typical for spincoated perovskite film samples that tend to have rather high variations, especially when aged.
First, we attempt to uncover the differential equation governing perovskite degradation directly from experimental data (Workflow (1)). A simple way to analyze reaction rate orders is to fit the data to pure 0th, 1st, and 2nd order dynamics (Supplementary Fig. 3). These equations do not fit the data, showing that the environmental degradation of MAPI does not follow a simple nth order kinetics. This motivates the use of PDEFIND. We apply sparse regression to the whole experimental dataset with a broad function library consisting of polynomials of U up to order 5, sine and cosine of U, polynomials of t up to order 3, the square root of t, U multiplied with polynomials of t, temperature T and adjusted negative exponent of \(\frac{1}{T}\left( {{{{\mathrm{exp}}}}\left( {  \frac{{100}}{T}} \right)} \right)\). While we choose a broad set of candidate functions, the choice of function library is critical and determines the outcome of PDEFIND (candidate function libraries considered are in Supplementary Table 2). We find that sine and cosine terms are not selected by PDEFIND—indicating as a sanity check that the algorithm correctly identifies that periodicity is not a feature of the dynamics. Polynomials of t and U times the polynomials of t, which correspond to the Avrami equation, are not included in the chosen library or assigned very small weights. To understand how well the obtained DE represents our data, we compare the derivative estimated by our DE to the numerical derivative obtained from the experimental data. While certain trends in the derivative are captured, errors exist because of the variance in our experimental data (Supplementary Fig. 4). Refinements to the approach are thus needed.
We proceed to narrow the application of PDEFIND, by applying PDEFIND to the averaged data at each temperature individually to extract the governing ODE. Using the averaged data helps us deal with sampletosample variance. Since all environmental conditions were almost identical for samples degraded at a particular temperature but aging tests of each temperature were conducted one after another (introducing differences, e.g., in sample storage times and exact equipment atmosphere), we aim to reduce the influence of varianceinducing conditions by applying PDEFIND at each temperature separately. First, we apply PDEFIND with a large library as described in the previous paragraph. Here too, we see that sine and cosine of U, polynomials of t and U times the polynomials of t are either removed from the library or have small coefficient values. We exclude these terms in further analysis. Then, we apply PDEFIND with 1st to 5th order polynomial libraries. We find that with the 1st order polynomial library, PDEFIND is unable to find an equation that fits the derivative of our data (Fig. 3a). All other libraries from 2nd order polynomial to 5th order polynomial appear to fit the derivative of our data with significant accuracy (Fig. 3a, b). When these differential equations are integrated, they have the same Sshape as our experimental data (Fig. 3c). The 2nd order polynomial library is the most minimal library that fits our data without high error. The functional form of this ODE is:
We also notice a trend in the values of the fitting coefficients with respect to temperature—especially in the case of the 2nd order polynomial library (Fig. 3d, Supplementary Fig. 5). The slope of the curve changes between 55 °C and 65 °C, the temperature at which a wellknown MAPI phase transition^{55,56} occurs. This may indicate that the phase transition affects the degradation mechanism, but is not experimentally confirmed in this work.
Next, we evaluate the effect of variance on PDE extraction by comparing the above results (obtained on the lowvariance experimental dataset) with the same workflow applied to the highvariance data (Supplementary Fig. 6). After averaging multiple curves (U(t)) for each temperature, the results are qualitatively similar for a constrained function library of polynomials of 2nd order—the obtained coefficients have the same sign and order of magnitude (Supplementary Table 3). This indicates that PDEFIND can fit even highvariance experimental data when appropriately averaging over multiple samples. To quantify the effect of sampletosample variance, we apply PDEFIND to each curve individually. As expected, PDEFIND extracts a large variance in coefficient values. The values of coefficients vary as much as 60% with the low variance dataset and up to 90% with the high variance datasets for T = 55 °C.
Results on simulated data
Now, we evaluate the effect of noise on PDE extraction using simulated data. We use the nonlinear leastsquares method to fit our experimental data to the Verhulst logistic equation^{57} and the Arrhenius equation, as shown in the Methods section. We produce both noisefree simulated data and simulated data with Gaussian noise (Workflow (2)) with this model.
We apply sparse regression to the simulated dataset at each temperature individually to discover the governing ODEs with libraries ranging from 2nd to 5th order polynomials. With the noisefree data, PDEFIND’s identified DEs fit the derivative as well as the data on integration of the DE with significant accuracy for libraries from 2nd order to 5th order. In the case of the 2nd order polynomial library, both the underlying differential equation and the fitting parameters are identified with significant accuracy, as shown in Fig. 4. We know that the underlying governing equation for this dataset (which we defined as \(\frac{{dU}}{{dt}} = a_0 + a_1U + a_2U^2\)) does not have any terms higher than order two, thus the higherorder coefficients (e.g., \(a_3,a_4,a_5, \ldots\) of functional terms \(U^3,U^4,U^5 \ldots\)) are equal to zero. PDEFIND assigns small nonzero values to these functional forms, although they are not set to zero. In the case where sine and cosine are added to the library, the algorithm correctly identifies that these terms do not represent the dynamics and are set to zero exactly. The MAE between the exact numerical derivative and one estimated from the differential equation identified by PDEFIND is of order 10^{−7} (when derivative varies from 0 to 1). This indicates that PDEFIND works well for simulated curves with zero noise. Thus, with the candidate function library constrained to polynomials of U, PDEFIND is able to identify the same ODE that fits the data at each temperature.
We then add varying amounts of Gaussian noise to this simulated equation at different temperatures. First, we consider the effect of varying amounts of noise at a fixed temperature of 55 °C, as indicated by the black box in Fig. 4a. We add up to 5% noise, which is typical in many experimental settings. The equation identified by PDEFIND yields an Sshaped curve similar to the noisefree simulated curve upon integration (Fig. 4d) for up to 5% noise, after which the DE identified by PDEFIND does not seem to model the dynamics. We compare the error of estimating the parameter values in the differential equation describing the simulated data. At 5% Gaussian noise the error of the fitting parameters increases to almost 80% (Fig. 4b). The resulting integrated curve has MAE is 6 (on a color scale of 0–255) relative to the ‘ground truth’ noisefree simulated curve (Fig. 4c, d). In addition, PDEFIND is no longer able to threshold sin and cosine terms to 0, as it even fits the noise with sinusoidal pattern.
We then consider different temperatures at the same noise level. The Verhulst logistic equation model becomes increasingly steep and shifts to the left with higher temperature. PDEFIND successfully identifies this trend. It appears that the MAE is higher for equation extraction at highertemperature data. This could be because of noise obscuring PDEFIND’s ability to fit steeper peaks accurately.
Discussion
There remain many complex systems that have eluded quantitative analytic descriptions or even characterization of a suitable choice of variables in many disciplines such as biology, finance and materials science. With today’s stateofthe art equipment, acquiring large quantities of data has never been easier. As put by Rackauckas et al.^{58}, ‘the wellknown adage ‘a picture is worth a thousand words’ might well be ‘a model is worth a thousand datasets.’.
Scientific ML enables unique insights into MAPI degradation in this work. It is a promising method that can be used to uncover governing equations through data, especially when the derivation of physical laws using first principles is challenging. In our study, we demonstrate that PDEFIND identifies an underlying rate equation for the degradation of perovskite solar cells. MAPI degradation does not follow a simple singleorder reaction rate law, defined as:
where, n is the order of the reaction and U is the concentration of the species. In our system, this equation does not yield a good fit for n = 0, 1 or 2. The Sshaped dynamics we see in our study have been reported in other studies involving MAPI degradation as well^{45,48,49,50}. Some articles report that the degradation results from nucleation and growth of PbI_{2} crystals^{45,48}, supporting the hypothesis that the kinetics follows the Johnson–Mehl–Avrami–Kolmorgorov or simply, the Avrami equation^{46,47}:
where,
And a_{0}, a_{1}, n, and k are fitting constants.
The Avrami equation represents dynamics where degradation starts at nucleation spots on the film and these spots of degraded material grow radially, diffusionlimited. Some recent studies have presented an alternate hypothesis of selfpropagating or autocatalytic kinetics^{49,50}, which is described by another differential equation, the logistic function (discussed in the “Methods” section Eqs. (7), (8)). In this study, we build a large library of candidate terms for the DE– polynomials of U, that make up the logistic function, and polynomials of t and U multiplied with polynomials of t, which feature in the Avrami equation. PDEFIND determines that the simplest ODE that fits our experimental dataset best is of the form (Fig. 3),
the Verhulst logistic function. This equation indicates that the reaction is first, propelled forward by the presence of the reactant as well as the product, leading to a rapid growth in the product that eventually saturates when it exhausts its reactants—a selfpropagating reaction. While the Avrami equation (nucleation and growth) model is limited by the rate at which the reactant diffuses to the reaction site, the Verhulst logistic function (selfpropelling kinetics) model is not limited by this because the reactant is already present at the reaction site. This is why we chose the logistic function model for the simulated dataset over the Avrami equations that has been used to model nucleationgrowth reactions. The algorithm picks terms that describe selfpropelling kinetics (2nd order polynomial library) as opposed to diffusionlimited nucleation and growth (Avrami equation). When visually inspecting the videos of degrading films, without the benefit of PDEFIND, it can be hard to identify the underlying mechanism. In the example shown in Supplementary Fig. 7 [and Supplementary Video], one can see light areas of degraded material in the middle of the film degradation.
Equation (6) also offers insights that could help engineer more stable MAPI films. Once the degradation has begun, the autocatalytic nature suggests that degradation will continue, as the reaction products catalyze further MAPI degradation. Therefore, suppressing degradation means delaying the creation of the first reaction products for as long as possible. To engineer more stable MAPI films, this equation suggests that reducing MAPI degradation may be possible by reducing the density of nucleation points inside the material, including, e.g., by ensuring that all PbI_{2} precursors are fully converted during film formation, and possibly by using highly purified (i.e., devoid of contaminant particles) reagents in the film and adjacent layers that could nucleate PbI_{2}.
These insights bear consequence for researchers attempting to identify the underlying root cause(s) of perovskite degradation, as well as those modeling or predicting the (accelerated) degradation of these materials. If indeed this is a nucleation and growth phenomenon, little can be done to halt the growth of degraded regions once the initial nucleation event occurs. Therefore, to improve phase stability of perovskite films, an emphasis can be placed on identifying the nucleation points of these phase transformations, and inhibiting them, perhaps through improved precursor purification to remove impurities, improved control of the nucleation process, improved processing to remove growth catalysts, and improved packaging to prevent ingress of exogenous gasses. Changes to the film composition may increase the nucleation energy barrier; therefore, further investigation of stoichiometry optimization may be warranted in combination with the above.
We demonstrate the application of a Scientific ML tool, PDEFIND on MAPI degradation data. When applied to experimental data, PDEFIND identifies a differential equation that fits the data, when appropriate constraints are applied. In spite of the noise and variance in the dataset, only functions corresponding to the dynamics of the system are picked and the DEs show good agreement with the numerical derivatives. Our ‘robustness analysis’ with simulated data shows that PDEFIND with a 2nd order polynomial library succeeds at identifying the differential equation describing the simulated data when up to 5% Gaussian noise is added. However, the error of the fitting parameters increases with noise, to almost 80%. With 5% noise, the resulting integrated curve has a 6 MAE relative to the underlying noisefree simulated curve but the coefficients differ by as much as 80%. With the addition of noise, PDEFIND is unable to eliminate terms not in the DE (sine and cosine) and even fits the noise with these terms. Applying a denoising filter may allow for higher levels of noise in the data, assuming that an appropriate filter is chosen.
Scientific ML methods can be immensely useful at uncovering governing equations of dynamical systems, if the data obtained has low noise or can be denoised by noisereduction techniques. Data obtained through experiments is not devoid of measurement noise and denoising the data adequately can be challenging. In addition, certain operating conditions cannot be fully controlled, leading to sampletosample variance making it hard to get rid of. Our contribution motivates the development of Scientific ML techniques that are more robust to noise as well as variance in data. Scientific ML, in its current state, is wellsuited to be applied to domains where obtaining large quantities of lownoise data is possible, and will find more applications with methods that are robust to noise.
We show that Scientific ML has the potential to accelerate the understanding of materials degradation and the reliability optimization of perovskite materials. Extracting physical laws may facilitate the definition of acceleration factors for aging tests and also help in the prediction of perovskite solar cell degradation under varying environmental conditions. Not only does scientific machine learning aide us with understanding the underlying scientific phenomena better, it may also enable faster simulations and better extrapolations beyond our experimental datasets. The conclusions of any given materials study may well be rendered more generalizable by identifying underlying equations governing the observations.
Methods
Data collection
For the experimental portion of our study (Workflow (1) in Fig. 1), the input is the experimental data obtained from degrading MAPI films. MAPI film synthesis conditions follow those of the Materials subsection of the Experimental procedures section of ref. ^{18} (More information in Supplementary Methods). Our experimental data is shown in Fig. 2.
We monitored the degradation of MAPI based on the color change of the material. As MAPI films decompose, they change their color from initial black (majority MAPI) to degraded yellow (minority MAPI). We acquired images of the degrading films with 0.5min temporal resolution and processed them to obtain the average red, blue and green color components of the films as a function of time (Fig. 2a, Supplementary Fig. 1). There are limits to the use of film color as a proxy; for example, in mixed perovskites, degradation may proceed via phase demixing into pure phases that may be dark in color, and camerabased imaging technique should be modified or complemented with other metrology, e.g., Xray diffraction^{18}.
For the study of noise robustness (Workflow (2) in Fig. 1), we generate simulated degradation data to analyze how noise obfuscates the identification of underlying DEs. We apply a nonlinear leastsquares method to fit the experimental data (e.g., those shown in Fig. 2d) to the Verhulst logistic equation^{57} to model the Sshaped curve. This is a reasonable assumption because the logistic function is used to describe the thermal decomposition dynamics of several materials^{49,50,59}. We obtain,
where U_{o} is the initial concentration, k is growth rate, K is the carrying capacity and M is a fitting constant. In the context of MAPI degradation, M, U_{o}, and K can be considered as fitting parameters. The growth rate k varies with temperature according to the Arrhenius equation:
here, E_{a} is the activation energy, T is the temperature in Kelvin, A is the preexponential factor and R is the universal gas constant. We use this model to produce noisefree simulated data (labeled ‘simulated’) and simulated data with Gaussian noise (labeled ‘simulated with Gaussian noise’).
Data analysis
First, we apply the sparse regression methodology PDEFIND^{19} to experimental data (Workflow (1)). We use the timeseries from all the temperatures to infer the partial differential equation (PDE) defining the relationship between MAPI degradation, temperature, and time. Then, we apply PDEFIND to the timedependent degradation data at each temperature, to infer the ordinary differential equation (ODE) that describes MAPI decomposition at a particular temperature. To study the effect of noise, we apply PDEFIND to simulated data with and without Gaussian noise (Workflow (2)).
The library of potential candidate functions consists of polynomials of U, polynomials of time t, sine and cosine of U, temperature T, and other nonlinear functions of U, t, and T (Supplementary Table 1). Differentials are calculated by finite difference with convolutional smoothing using a 1D Gaussian kernel. Once a large tall matrix (Θ(U)) with all candidate functions is composed, we use sequential threshold ridge regression to identify which terms contribute to the dynamics described by the data as well as those terms’ weights. The goal of this method is to find a sparse coefficient vector β that only consists of the active features that best represent the time derivative \(U_t\). The rest of the features are hardthresholded to zero. The loss functions are follows (λ_{2} and λ_{0} are the L2 and L0 regularization penalties, respectively, more details can be found in the supplementary information of ref. ^{19}):
for a given \(\widehat {{{{\mathrm{tol}}}}}\), where \(\widehat {{{{\mathrm{tol}}}}}\) is:
Data availability
The experimental dataset analyzed during the current study, the simulated data and labeled data supporting the findings of this study, and the data comprising the figures in this paper are all available in the following GitHub repository: https://github.com/PVLab/PDEExtraction.
Code availability
The code used for data analysis in this study is available in the following GitHub repository: https://github.com/PVLab/PDEExtraction.
References
Newton, I. Philosophiae Naturalis Principia Mathematica (A. et JM Duncan, 1833).
Einstein, A. Does the inertia of a body depend upon its energycontent. Ann. Phys. 18, 639–641 (1905).
Russell, J. L. Kepler’s laws of planetary motion: 1609–1666. Br. J. Hist. Sci. 2, 1–24 (1964).
Heit, E. Properties of inductive reasoning. Psychonomic Bull. Rev. 7, 569–592 (2000).
Pitt, M. A. & Myung, I. J. When a good fit can be bad. Trends Cogn. Sci. 6, 421–425 (2002).
Christopoulos, A. & Lew, M. J. Beyond eyeballing: fitting models to experimental data. Crit. Rev. Biochem. Mol. Biol. 35, 359–391 (2000).
Roberts, S. & Pashler, H. How persuasive is a good fit? A comment on theory testing. Psychol. Rev. 107, 358–367 (2000).
Otto, F. et al. Decomposition of the singlephase highentropy alloy CrMnFeCoNi after prolonged anneals at intermediate temperatures. Acta Materialia 112, 40–52 (2016).
Starke Jr, E. A. et al. Accelerated Aging of Materials and Structures. Accelerated Aging of Materials and Structures (National Academies Press, 1996).
McKeen, L. W. The Effect of Long Term Thermal Exposure on Plastics and Elastomers. The Effect of Long Term Thermal Exposure on Plastics and Elastomers (Elsevier Inc., 2013).
Simmons, C. B. et al. Deactivation of metastable singlecrystal silicon hyperdoped with sulfur. J. Appl. Phys. 114, 243514 (2013).
Macan, J., Brnardić, I., Orlić, S., Ivanković, H. & Ivanković, M. Thermal degradation of epoxy  Silica organic  Inorganic hybrid materials. Polym. Degrad. Stab. 91, 122–127 (2006).
Choi, W., Huh, H., Tama, B., Park, G. & Lee, S. A neural network model for material degradation detection and diagnosis using microscopic images. IEEE Access 7, 92151–92160 (2019).
Severson, K., Attia, P., Jin, N., Perkins, N. & Jiang, B. Datadriven prediction of battery cycle life before capacity degradation. Nat. Energy 4, 383–391 (2019).
Nash, Will, Drummond, T. & Birbilis, N. A review of deep learning in the study of materials degradation. npj Mater. Degrad. 2, 1–12 (2018).
Entekhabi, E., Haghbin Nazarpak, M., Sedighi, M. & Kazemzadeh, A. Predicting degradation rate of genipin crosslinked gelatin scaffolds with machine learning. Mater. Sci. Eng. C. 107, 110362 (2020).
Hartono, N. T. P. et al. How machine learning can help select capping layers to suppress perovskite degradation. Nat. Commun. 11, 1–9 (2020).
Sun, S. et al. A data fusion approach to optimize compositional stability of halide perovskites. Matter 4, 1305–1322 (2021).
Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Datadriven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).
Champion, K., Lusch, B., Nathan Kutz, J. & Brunton, S. L. Datadriven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. USA 116, 22445–22451 (2019).
Atkinson, S. et al. Datadriven discovery of freeform governing differential equations. Preprint at http://arxiv.org/abs/1910.05117 (2019).
Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physicsinformed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Computat. Phys. 378, 686–707 (2019).
Long, Z., Lu, Y. & Dong, B. PDENet 2.0: learning PDEs from data with a numericsymbolic hybrid deep network. J. Comput. Phys. 399, 108925 (2019).
Long, Z., Lu, Y., Ma, X. & Dong, B. PDENet: learning PDEs from data. in 35th International Conference on Machine Learning, ICML 2018 Vol. 7, 5067–5078 (2017).
Raissi, M., Yazdani, A. & Karniadakis, G. E. Hidden fluid mechanics: learning velocity and pressure fields from flow visualizations. Science 367, 1026–1030 (2020).
Yin, M., Zheng, X., Humphrey, J. D. & Karniadakis, G. E. Noninvasive inference of thrombus material properties with physicsinformed neural networks. Computer Methods Appl. Mech. Eng. 375, 113603 (2021).
Zanna, L. & Bolton, T. Data‐driven equation discovery of ocean mesoscale closures. Geophys. Res. Lett. 47, e2020GL088376 (2020).
Schmelzer, M., Dwight, R. P. & Cinnella, P. Discovery of algebraic Reynoldsstress models using sparse symbolic regression. Flow., Turbulence Combust. 104, 579–603 (2020).
Butler, K. T., Davies, D. W., Cartwright, H., Isayev, O. & Walsh, A. Machine learning for molecular and materials science. Nature 559, 547–555 (2018).
Cichos, F., Gustavsson, K., Mehlig, B. & Volpe, G. Machine learning for active matter. Nat. Mach. Intell. 2, 94–103 (2020).
Brunton, S. L., Noack, B. R. & Koumoutsakos, P. Machine learning for fluid mechanics. Annu. Rev. Fluid Mech. 2020 52, 477–508 (2019).
Roscher, R., Bohn, B., Duarte, M. F. & Garcke, J. Explainable machine learning for scientific insights and discoveries. IEEE Access 8, 42200–42216 (2020).
JuarezPerez, E. J. et al. Photodecomposition and thermal decomposition in methylammonium halide lead perovskites and inferred design principles to increase photovoltaic device stability. J. Mater. Chem. A 6, 9604–9612 (2018).
Divitini, G. et al. In situ observation of heatinduced degradation of perovskite solar cells. Nat. Energy 1, 1–6 (2016).
Fan, Z. et al. Layerbylayer degradation of methylammonium lead triiodide perovskite microplates. Joule 1, 548–562 (2017).
Smecca, E. et al. Stability of solutionprocessed MAPbI_{3} and FAPbI_{3} layers. Phys. Chem. Chem. Phys. 18, 13413–13422 (2016).
Conings, B. et al. Intrinsic thermal instability of methylammonium lead trihalide perovskite. Adv. Energy Mater. 5, 1500477 (2015).
Schwenzer, J. A. et al. Thermal stability and cation composition of hybrid organic–inorganic perovskites. ACS Appl. Mater. Interfaces 13, 15292–15304 (2021).
Yang, J., Siempelkamp, B. D., Liu, D. & Kelly, T. L. Investigation of CH_{3}NH_{3}PbI_{3} degradation rates and mechanisms in controlled humidity environments using in situ techniques. ACS Nano 9, 1955–1963 (2015).
Kim, N. K. et al. Investigation of thermally induced degradation in CH_{3}NH_{3}PbI_{3} perovskite solar cells using insitu synchrotron radiation analysis. Sci. Rep. 7, 1–9 (2017).
Han, Y. et al. Degradation observations of encapsulated planar CH_{3}NH_{3}PbI_{3} perovskite solar cells at high temperatures and humidity. J. Mater. Chem. A 3, 8139–8147 (2015).
Nie, W. et al. Lightactivated photocurrent degradation and selfhealing in perovskite solar cells. Nat. Commun. 7, 1–9 (2016).
Lee, S. W. et al. UV degradation and recovery of perovskite solar cells. Sci. Rep. 6, 1–10 (2016).
Abdelmageed, G. et al. Effect of temperature on light induced degradation in methylammonium lead iodide perovskite thin films and solar cells. Sol. Energy Mater. Sol. Cells 174, 566–571 (2018).
Bastos, J. P. et al. Model for the prediction of the lifetime and energy yield of methyl ammonium lead iodide perovskite solar cells at elevated temperatures. ACS Appl. Mater. Interfaces 11, 16517–16526 (2019).
Avrami, M. Kinetics of phase change. I: General theory. J. Chem. Phys. 7, 1103–1112 (1939).
Fanfoni, M. & Tomellini, M. The JohnsonMehlAvramiKolmogorov model: a brief review. Nuovo Cim. Della Soc. Ital. di Fis. D.  Condens. Matter, At., Mol. Chem. Phys., Biophys. 20, 1171–1182 (1998).
Tran, C. D. T., Liu, Y., Thibau, E. S., Llanos, A. & Lu, Z. H. Stability of organometal perovskites with organic overlayers. AIP Adv. 5, 087185 (2015).
Ellis, C. L. C., Javaid, H., Smith, E. C. & Venkataraman, D. Hybrid perovskites with larger organic cations reveal autocatalytic degradation kinetics and increased stability under light. Inorg. Chem. 59, 12176–12186 (2020).
Fu, F. et al. I2 vaporinduced degradation of formamidinium lead iodide based perovskite solar cells under heatlight soaking conditions. Energy Environ. Sci. 12, 3074–3088 (2019).
Asghar, M. I., Zhang, J., Wang, H. & Lund, P. D. Device stability of perovskite solar cells—a review. Renew. Sustain. Energy Rev. 77, 131–146 (2017).
Boyd, C. C., Cheacharoen, R., Leijtens, T. & McGehee, M. D. Understanding degradation mechanisms and improving stability of perovskite photovoltaics. Chem. Rev. 119, 3418–3451 (2019).
Khenkin, M. V. et al. Consensus statement for stability assessment and reporting for perovskite photovoltaics based on ISOS procedures. Nat. Energy 5, 35–49 (2020).
Hashmi, S. G. et al. Long term stability of air processed inkjet infiltrated carbonbased printed perovskite solar cells under intense ultraviolet light soaking. J. Mater. Chem. A 5, 4797–4802 (2017).
Whitfield, P. S. et al. Structures, phase transitions and tricritical behavior of the hybrid perovskite methyl ammonium lead iodide. Sci. Rep. 6, 1–16 (2016).
Rajendra Kumar, G. et al. Phase transition kinetics and surface binding states of methylammonium lead iodide perovskite. Phys. Chem. Chem. Phys. 18, 7284–7292 (2016).
Tsoularis, A. & Wallace, J. Analysis of logistic growth models. Math. Biosci. 179, 21–55 (2002).
Rackauckas, C. et al. Universal differential equations for scientific machine learning. Preprint at http://arxiv.org/abs/2001.04385 (2020).
Burnham, A. K. Use and misuse of logistic equations for modeling chemical kinetics. J. Therm. Anal. Calorim. 127, 1107–1116 (2017).
Eames, C. et al. Ionic transport in hybrid lead iodide perovskite solar cells. Nat. Commun. 6, 7497 (2015).
Acknowledgements
The authors thank Kathleen Champion, Samuel Rudy, Zichao Long, and Steven Atkinson for helpful discussions regarding Scientific ML. This work was supported by Defense Advanced Research Projects Agency (DARPA) under contract no. HR001118C0036 (R.N., J.T., A.T.), TotalEnergies SE research grant funded through MITeI Sustng Mbr 9/08 (A.T., S.S., Z.L.), and the U.S. Department of Energy (DOE) under Photovoltaic Research and Development (PVRD) program under Award no. DEEE0007535 (Z.L.). This work was partially supported by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Advanced Manufacturing Office (AMO) Award Number DEEE0009096 (R.N.). A.T. acknowledges the Alfred Kordelin Foundation.
Author information
Authors and Affiliations
Contributions
R.N., A.T., S.S., and T.B. conceived of and designed the study. C.B. and J.T. fabricated the samples. R.N. executed different aspects of the study such as the experiments and ML modeling. R.N., A.T., and T.B. wrote the paper while all coauthors contributed to reviewing the manuscript.
Corresponding authors
Ethics declarations
Competing interests
Although our laboratory has IP filed covering photovoltaic technologies and materials informatics broadly, we do not envision a direct COI with this study, the content of which is opensourced. Two of the authors (Z.L., T.B.) own equity in a startup company, Xinterra Pte Ltd, which applies machine learning to materials.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Naik, R.R., Tiihonen, A., Thapa, J. et al. Discovering equations that govern experimental materials stability under environmental stress using scientific machine learning. npj Comput Mater 8, 72 (2022). https://doi.org/10.1038/s41524022007515
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41524022007515
This article is cited by

Knowledgeintegrated machine learning for materials: lessons from gameplaying and robotics
Nature Reviews Materials (2023)