Learning heterogeneous reaction kinetics from X-ray videos pixel by pixel

Reaction rates at spatially heterogeneous, unstable interfaces are notoriously difficult to quantify, yet are essential in engineering many chemical systems, such as batteries1 and electrocatalysts2. Experimental characterizations of such materials by operando microscopy produce rich image datasets3–6, but data-driven methods to learn physics from these images are still lacking because of the complex coupling of reaction kinetics, surface chemistry and phase separation7. Here we show that heterogeneous reaction kinetics can be learned from in situ scanning transmission X-ray microscopy (STXM) images of carbon-coated lithium iron phosphate (LFP) nanoparticles. Combining a large dataset of STXM images with a thermodynamically consistent electrochemical phase-field model, partial differential equation (PDE)-constrained optimization and uncertainty quantification, we extract the free-energy landscape and reaction kinetics and verify their consistency with theoretical models. We also simultaneously learn the spatial heterogeneity of the reaction rate, which closely matches the carbon-coating thickness profiles obtained through Auger electron microscopy (AEM). Across 180,000 image pixels, the mean discrepancy with the learned model is remarkably small (<7%) and comparable with experimental noise. Our results open the possibility of learning nonequilibrium material properties beyond the reach of traditional experimental methods and offer a new non-destructive technique for characterizing and optimizing heterogeneous reactive surfaces.


Materials and methods
We briefly describe the materials synthesis, electrode assembly, and characterization; more details can be found in [1,26]. Synthesis. LiFePO₄ platelets were synthesized using a solvothermal method. All precursors were purchased from Sigma-Aldrich. 6 mL of 1 M H₃PO₄ was mixed with 24 mL of polyethylene glycol 400. Then, 18 mL of 1 M LiOH(aq) was added to precipitate Li₃PO₄. The mixture was bubbled continuously with dry N₂ at a flow rate of 50 mL/min for 16 h for deoxygenation. Next, 12 mL of deoxygenated H₂O was added via a Schlenk line to FeSO₄·7H₂O powder, which had been pre-dried under vacuum. The FeSO₄ solution was then injected into the Li₃PO₄ suspension without oxygen exposure, and the entire mixture was transferred to a 100-mL Teflon-lined autoclave. The autoclave was heated to 140°C for 1 h and then to 210°C for 17 h. The sample was annealed under Ar gas at 720°C for 5 h to eliminate anti-site defects [55,56].
In this study, we employed the platelet particle morphology for three reasons: (1) the particle size matches the spatial resolution of the microscope, (2) the particles can be aligned crystallographically with the X-ray beam, and (3) the solid-state transport distance is short. These three considerations are essential for image inversion. The platelet morphology is also used extensively for fundamental studies in the literature [57,58,41,59,60,26].
Carbon coating. The white LiFePO₄ particles were mixed with sucrose at a ratio of 5:1 (LiFePO₄:sucrose). This sample was then heated to 600°C for 5 h in a tube furnace under flowing Ar to yield the carbon-coated LiFePO₄.
Structure and electrochemical characterization. The carbon-coated LFP particles were first electrochemically tested in standard CR2032 lithium half cells, yielding a specific capacity of 150 mAh/g at a rate of C/10. For structural and morphological characterization, scanning electron microscopy (SEM), X-ray diffraction (XRD), and atomic force microscopy (AFM) analyses of the pristine LFP samples were performed (see Refs. [1,26]).
STXM experiment. The STXM experiments were conducted at beamlines 11.0.2.2 and 5.3.2.1 of the Advanced Light Source (ALS). The operando STXM at 11.0.2.2 utilizes a 45-nm zone plate. High-resolution images were raster-scanned in 50-nm steps, with a dwell time of 1 ms per pixel. The additional operando STXM at 5.3.2.1 utilizes a 60-nm zone plate. For imaging, a single layer of 10,000 to 50,000 particles was dispersed on the working electrode of the liquid STXM imaging platform. During the experiment, the Fe(II) and Fe(III) L₃ edges were repeatedly scanned to quantify the local oxidation state of Fe and ultimately the Li concentration. STXM scans the LFP particles in 50-nm steps, generating about 200 to 1400 pixels per particle, which is sufficient resolution to resolve spatial variation within single particles. The large, micron-sized LFP particles are separated from one another, so individual particles can be clearly segmented out of the images. In addition, these LFP particles have a thin platelet morphology with a thickness of about 150 nm [1]. The particle morphology and the resolution of STXM imaging enable us to directly compare images taken in the a-c plane, which reveal the depth-averaged Li concentration, with our depth-averaged 2D single-particle model [19,7,14,24], perform the image inversion, and make inferences about the spatial heterogeneity.
AEM imaging. After the in-situ STXM experiment, the chip with the platelet LFP was transferred in Ar-sealed bags and directly imaged under a PHI 700 Auger Scanning Probe (ULVAC-PHI Inc.) with a depth sensitivity of up to 5 nm. Principles of AEM can be found in Refs. [61,62]. First, we performed SEM to verify that the particles to be analyzed were identical to the ones imaged during the in-situ STXM experiment. Next, we obtained the Auger electron spectra of a selected area on the particle to identify the peak energy positions for C (270 eV), Fe (703 eV), and O (510 eV). Finally, elemental maps of the entire particle were obtained by monitoring the peaks of the specific elemental energy transitions.

Experimental uncertainty
The depth-averaged Li fraction map is obtained from the X-ray absorption map via STXM imaging. As established previously [1,63], the optical density OD ≡ −ln(I/I₀) (where I and I₀ are the intensities of the transmitted X-ray with and without the particle) at a given energy level can be expressed as a linear combination of the optical densities of pure LFP (OD_LFP) and FP (OD_FP) based on the local composition: OD = a·OD_LFP + b·OD_FP, where a/(a + b) is the Li fraction. At least two energy levels are needed to uniquely determine a and b. To assess the uncertainty of the Li fraction, we measure the X-ray absorption at 5 selected energy levels (700 eV, 706.2 eV, 706.8 eV, 711.2 eV, and 713.4 eV, covering 1 pre-edge, 2 rising edges, and 2 falling edges), which yields an overdetermined system of equations in a and b. Since physically we require a > 0 and b > 0, we use a non-negative least-squares algorithm to solve for a and b and hence the Li fraction. We perform bootstrapping, that is, randomly resampling the energy levels and using the corresponding optical densities to solve for the Li fraction with the same algorithm. From the ensemble of bootstrap samples, we obtain an estimate of the standard deviation of the Li fraction for each pixel, σ_pixel. The result is plotted in Fig. S1, where we show the maps of σ_pixel for each particle used for the uncertainty analysis, which have different states of charge, as well as the histogram of σ_pixel over all pixels and particles. The average of σ_pixel is 0.072.
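As an illustrative sketch of this procedure (with placeholder reference spectra rather than measured values, and hypothetical function names), the per-pixel Li fraction can be obtained by non-negative least squares with scipy.optimize.nnls, and its uncertainty by bootstrapping over the energy levels:

```python
import numpy as np
from scipy.optimize import nnls

# Placeholder reference optical densities of pure LFP and FP at the five
# measured energies (700, 706.2, 706.8, 711.2, 713.4 eV) -- not measured data.
od_lfp = np.array([0.02, 1.10, 0.95, 0.30, 0.55])
od_fp = np.array([0.02, 0.40, 0.60, 1.05, 0.25])

def li_fraction(od_pixel, idx):
    """Solve OD = a*OD_LFP + b*OD_FP with a, b >= 0 on a subset of energies."""
    A = np.column_stack([od_lfp[idx], od_fp[idx]])
    (a, b), _ = nnls(A, od_pixel[idx])
    return a / (a + b) if a + b > 0 else np.nan

def bootstrap_sigma_pixel(od_pixel, n_boot=200, seed=0):
    """Resample energy levels with replacement to estimate sigma_pixel."""
    rng = np.random.default_rng(seed)
    n_e = len(od_pixel)
    samples = [li_fraction(od_pixel, rng.integers(0, n_e, size=n_e))
               for _ in range(n_boot)]
    return np.nanstd(samples)
```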
Previously, in Ref. [1], an estimate of the experimental error on the Li fraction was obtained by calculating the standard deviation of the Li fraction values of all pixels in the reference images, in which the average Li fractions of the particles are 0.97 and 0.03 and the particles have a uniform Li concentration field. The standard deviation is 0.06. Based on the bootstrapping result and the standard deviation of the Li fraction in the reference images, we estimate that the standard deviation of the experimental error of the Li fraction is about σ ≈ 0.07. Table S1 describes the statistics of the in-situ STXM images. Some particles are imaged over multiple charge and/or discharge cycles at different rates. In the Supplementary Information, each sequence of images during charge or discharge is called an episode. Fig. S2 shows the histograms of the number of frames per episode, the number of pixels of each particle, the particle size, and the average rate of each episode, which is defined as the change in average Li fraction (from the first to the last frame) divided by the duration of the episode. The median number of pixels and major axis length are 458.5 and 1.60 μm, while the corresponding mean number of pixels is 458.75.

Figure S2: Statistics of the datasets. From left to right: the histogram of the number of frames per episode, the number of pixels of each particle, the major axis length of each particle, and the average rate of each episode.

Governing equations
The free energy of lithium iron phosphate in the bulk can be described variationally as consisting of chemical free energy and mechanical energy (assumed to be linear elasticity),

G = ∫_Ω ρ_s [g_h(c) + (κ/2)|∇c|²] + (1/2) C_ijkl ε^el_ij ε^el_kl dΩ,    (S1)

where Ω is the particle domain, ρ_s is the maximum concentration of lithium, or the lattice site density, c is the Li fraction (the fraction of lattice sites occupied by Li), 0 ≤ c ≤ 1, g_h(c) is the homogeneous free energy, κ is the gradient energy coefficient, C is the stiffness tensor, and the elastic strain is

ε^el_ij = ε_ij − ε⁰_ij c,    ε_ij = (u_{i,j} + u_{j,i})/2,
where u is the displacement field, u_{i,j} denotes the derivative of the ith component of the displacement in the jth direction, and the second term is the chemical strain due to the change in lattice parameters (measured in Ref. [26]), which is approximately proportional to the local lithium concentration. ε⁰ is known as the stress-free strain, or misfit strain. Since the relaxation time for mechanical deformation is fast in solids, we assume mechanical equilibrium at all times, that is, the variational derivative δG/δu = 0, or equivalently

∂σ_ij/∂x_j = 0,    (S3)

where the stress is (using Einstein notation)

σ_ij = C_ijkl (ε_kl − ε⁰_kl c) = C_ijkl ε_kl − σ⁰_ij c,

where we define σ⁰_ij = C_ijkl ε⁰_kl. The chemical potential is defined by the variational derivative

μ = (1/ρ_s) δG/δc.

Li inserts and de-inserts from the a-c plane. When LFP nanoparticles are thin in the b direction (the depth direction), we may assume that phase separation does not occur in the depth direction and, owing to fast diffusion in the b direction, that the Li concentration is uniform in the depth direction [7,19,24]. This effectively reduces the model to 2D, where we use a depth-averaged concentration in the a-c plane, and locally the rate of change of c is governed by the (de)lithiation reaction rate. Diffusion in the a-c plane is typically slow and can occur via defects or the surface layer [64,60]. Neglecting lateral diffusion, the local rate of change of the Li fraction results from the surface reaction, ∂c/∂t = j, where we model the reaction rate j by Butler-Volmer kinetics for electrochemical reactions, for the purpose of inverting the unknown j₀(c) and later comparing it with coupled ion-electron transfer (CIET) theory. The Butler-Volmer rate formula for j is [65]

j = j₀(c) (e^(−αη̃) − e^((1−α)η̃)),

where η̃ = eη/(k_B T) is the nondimensionalized overpotential, η is the overpotential, and η̃ = μ̃ − μ̃_res, where μ_res = μ_Li⁺ − eΔϕ is known as the reservoir chemical potential, and μ_Li⁺ and Δϕ are the chemical potential of Li⁺ in the electrolyte and the interfacial voltage, respectively.
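The reaction model above can be summarized in a few lines. The sketch below, with function names of our own choosing, evaluates the nondimensional Butler-Volmer rate; the symmetric exchange current used in the example is a common literature choice, not the inferred j₀(c):

```python
import numpy as np

def butler_volmer(c, mu, mu_res, j0, alpha=0.5, k_het=1.0):
    """Nondimensional Butler-Volmer rate j = k(x) j0(c) (exp(-a*eta) - exp((1-a)*eta)),
    with overpotential eta = mu - mu_res, all scaled by the thermal energy kB*T."""
    eta = mu - mu_res
    return k_het * j0(c) * (np.exp(-alpha * eta) - np.exp((1.0 - alpha) * eta))

# Example with a symmetric exchange current (a common literature choice):
j0_sym = lambda c: np.sqrt(c * (1.0 - c))
rate = butler_volmer(c=0.3, mu=-0.5, mu_res=0.2, j0=j0_sym)  # eta < 0: insertion
```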
The natural boundary condition for c arises from minimizing the sum of the bulk energy Eq. S1 and the interfacial energy at the particle boundary, ∫_∂Ω γ_s(c) dS. Without knowledge of the interfacial energy γ_s(c), we instead impose a Dirichlet boundary condition for c based on the image data. We use the no-traction boundary condition for Eq. S3,

n · σ = 0.    (S10)

Chemomechanics
In this article, the mechanical parameters for LFP are taken from literature values. The misfit strain in the a-c plane is approximated by a linear dependence on the Li fraction, and a 2D model in the a-c (010) plane has been shown to capture the particle behavior well. Following Ref. [20], we use the plane strain condition. Under mechanical equilibrium, Eq. S3 can be written as

C_ijkl u_{k,lj} = σ⁰_ij c_{,j},

where σ⁰_ij = C_ijkl ε⁰_kl. This equation can be solved analytically in Fourier space in an infinite domain, giving

û_i(k) = −i G_ij(k) σ⁰_jl k_l Δĉ(k),

where û is the Fourier transform of the heterogeneous displacement field (the total displacement minus the average), c̄ is the average concentration, Δĉ is the Fourier transform of c − c̄, and

G⁻¹_ik(k) = C_ijkl k_j k_l,    (S13)

where k is the wavevector in Fourier space. Substituting the solution for the displacement into the total free energy results in a mechanical energy

G_mech = (1/2) ∫ B(n) |Δĉ(k)|² dᵈk/(2π)ᵈ,

where d is the dimensionality, n = k/|k| is the unit vector, and

B(n) = ε⁰_ij σ⁰_ij − n_i σ⁰_ij G_jk(n) σ⁰_kl n_l,    where G⁻¹_jk(n) = C_ijkl n_i n_l.    (S16)

Therefore, the minimum mechanical energy is obtained when the concentration field has zero components in Fourier space other than along the optimal direction n* = argmin_n B(n). It has been shown that the minimum energy for LFP corresponds to the LFP-FP interface being approximately along [101] [20]. Therefore, in an infinitely large, coherent, relaxed, and phase-separated (FP and LFP in coexistence) particle, the total free energy can also be written as

G = ∫_Ω ρ_s [g_h(c) + (1/2) B_min (c − c̄)²] dΩ.    (S17)

This equilibrium state minimizes the chemical free energy and satisfies the following criteria, also known as the Maxwell or common tangent construction,

μ_h(c₁) = μ_h(c₂) = [g_h(c₂) − g_h(c₁)]/(c₂ − c₁).

When the two coexisting phases are in the same particle and remain mechanically coherent, the compositions are called the coherent miscibility gap, which minimizes the free energy Eq. S17. If we define g_h,c(c) = g_h(c) + (1/2) B_min (c − c̄)² and the chemical potential μ_h,c(c) = μ_h(c) + B_min (c − c̄), the equilibrium compositions c_1,c and c_2,c satisfy

μ_h,c(c_1,c) = μ_h,c(c_2,c) = [g_h,c(c_2,c) − g_h,c(c_1,c)]/(c_2,c − c_1,c).

2D simulations are performed in the image coordinates. Since the particles are randomly oriented, we need to perform a coordinate transformation of the aforementioned mechanical parameters from the crystallographic coordinate system to the image coordinate system. The rotation matrix is

Q = [[cos θ, −sin θ], [sin θ, cos θ]],

where θ is the counterclockwise angle from the crystallographic a-c coordinates to the image x-y coordinates. For ptychography images [26] with electron diffraction measurements, the a-c directions are known from the diffraction measurement. For STXM images, where only the concentration profile is available, the a-c directions need to be estimated. Wulff reconstruction shows that a perfect and energetically stable LFP crystal is plate-like with the major and minor axes of the plane being [001] and [100], as confirmed in experiments [66,67,41]. Therefore, for particles imaged by STXM, we assume that the a and c axes coincide with the minor and major axes of the particle, defined by the principal axes of the particle based on the central second moment of the particle shape (this assumption is revisited in Section 3). The misfit strain in the image coordinates is

ε⁰′ = Q ε⁰ Qᵀ,

and the stiffness tensor is

C′_ijkl = Q_ip Q_jq Q_kr Q_ls C_pqrs.    (S24)

Parameterization of free energy and exchange current
In the literature, the regular solution model is typically used to model the homogeneous chemical free energy,

g_h(c) = k_B T [c ln c + (1 − c) ln(1 − c)] + Ω c(1 − c).

In Ref. [20], Ω = 4.47 k_B T, which corresponds to a mosaic miscibility gap of c₁ = 0.0126 (c₂ = 1 − c₁ due to symmetry) and a coherent miscibility gap of c_1,c = 0.0927 (c_2,c = 1 − c_1,c). Other parameters that we use from the literature are [20]: κ = 5.02×10⁻¹⁰ J/m and maximum Li concentration ρ_s = 2.29×10⁴ mol/m³. At T = 298 K, κ/(ρ_s k_B T) = 8.847×10⁻¹⁸ m². When normalized by the STXM pixel size (l = 50 nm [1]), κ̃ = κ/(ρ_s k_B T l²) = 3.54×10⁻³. For the inversion, we consider two choices for representing the chemical potential. One option has an ideal entropic term and a polynomial function that corresponds to the excess part, which follows the physical constraint that the concentration is within [0, 1]. In the following text, the chemical potential μ_h(c) is nondimensionalized by k_B T, that is, μ̃_h(c) = g′_h(c)/(k_B T). For simplicity, we drop the tilde and write

μ_h(c) = ln(c/(1 − c)) + Σᵢ aᵢ Pᵢ(c),    (S26)

where Pᵢ(c) are Legendre polynomials defined on [0, 1]. The second option consists only of the polynomials,

μ_h(c) = Σᵢ aᵢ Pᵢ(c).    (S27)

We constrain the chemical potential such that the mosaic miscibility gap is [c₁, c₂] (using the values in [20] mentioned above); the constraint corresponds to linear constraints on the parameters aᵢ. Because we do not have local voltage data Δϕ in our study, the local voltage losses are unknown, so a constant shift in μ_h(c) is equivalent. Hence we may fix the chemical potential at the mosaic miscibility gap to be 0,

μ_h(c₁) = μ_h(c₂) = 0,    together with the equal-area (common tangent) condition    ∫_{c₁}^{c₂} μ_h(c) dc = 0.

To illustrate the constrained representation of μ_h(c), Fig. S3 shows the family of μ_h(c) curves when we allow one degree of freedom (1-parameter representation). Using the lowest-order polynomials possible, because there are three linear constraints, the highest order is a third-order polynomial. Since the derivative of the Ginzburg-Landau free energy contains up to a third-order polynomial, the former choice (Eq. S26) corresponds to a regular solution model plus a rescaled Ginzburg-Landau chemical potential (multiplied by a coefficient),

μ_h(c) = ln(c/(1 − c)) + Ω(1 − 2c) + a_GL (c − c₁)(c − c₂)(c − 1/2),    (S30)

and the latter choice (Eq. S27) corresponds to the rescaled Ginzburg-Landau chemical potential alone (multiplied by a coefficient),

μ_h(c) = a_GL (c − c₁)(c − c₂)(c − 1/2).    (S31)

In Fig. S3, we also show μ_h,c(c) = μ_h(c) + B_min(c − 0.5). At the same coherent spinodal barrier height, the Ginzburg-Landau free energy (Eq. S31) gives a smaller miscibility gap than the regular solution representation (Eq. S30); it also has a smaller unstable (spinodal) region where ∂μ_h/∂c < 0. The coherent spinodal points are defined by ∂μ_h,c/∂c = 0. Next, we also constrain the coherent miscibility gap, that is,

μ_h,c(c_1,c) = μ_h,c(c_2,c) = 0,

which adds two additional linear constraints. Fig. S4 shows the family of μ_h(c) and μ_h,c(c) curves when we allow one degree of freedom with the entropic term. In this case, the highest order is a fifth-order polynomial.
Using the first approach that includes the ideal entropy (Eq. S26), we assume the prior for the nonideal (excess) part of the chemical potential follows a Gaussian process, μ_ex(c) ~ N(0, σ²_global δ(c − c′)); alternatively, when using the second approach (Eq. S27), we assume μ_h(c) ~ N(0, σ²_global δ(c − c′)). Due to the orthogonality of the Legendre polynomials, the priors for the coefficients aᵢ are independent normal distributions. For the exchange current j₀(c), we consider two parameterizations. To ensure its positivity,

j₀(c) = exp(Σᵢ bᵢ Pᵢ(c)).

Because electron transfer requires the availability of a vacancy in the lattice and, based on transition state theory and coupled ion-electron transfer theory [29], j₀ is expected to approach 0 as c approaches 0 and 1, we also consider the representation

j₀(c) = c(1 − c) exp(Σᵢ bᵢ Pᵢ(c)).    (S36)

Similar to the chemical potential, we suppose the prior for ln j₀(c) is Gaussian, which induces independent normal priors on the coefficients bᵢ. See Section 5.2 for the choice of the number of parameters used in the inversion.
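The sketch below illustrates one way to implement these parameterizations, assuming the polynomials "defined on [0, 1]" are the shifted Legendre polynomials P_i(2c − 1) and that Eq. S36 takes the c(1 − c)-weighted exponential form implied by the stated asymptotics; numpy.polynomial.legendre.legval evaluates the series:

```python
import numpy as np
from numpy.polynomial.legendre import legval

def shifted_legendre(c, coeffs):
    """Evaluate sum_i a_i P_i(2c - 1), i.e. Legendre polynomials shifted to [0, 1]."""
    return legval(2.0 * np.asarray(c) - 1.0, coeffs)

def mu_h(c, a, with_entropy=True):
    """Chemical potential in units of kB*T: Eq. S26 (entropy + polynomials)
    or Eq. S27 (polynomials only)."""
    out = shifted_legendre(c, a)
    return out + np.log(c / (1.0 - c)) if with_entropy else out

def j0(c, b):
    """Exchange current that is positive and vanishes as c -> 0 and c -> 1
    (our reading of the Eq. S36 form)."""
    return c * (1.0 - c) * np.exp(shifted_legendre(c, b))
```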

Relaxation toward equilibrium
To validate the chemomechanical model using images of relaxed particles, we simulate the relaxation toward equilibrium under the condition that the average composition of the particle remains constant in time. The variation of the total free energy (including the surface energy) is

dG/dt = ∫_Ω ρ_s μ (∂c/∂t) dΩ + ∫_∂Ω (κ n·∇c + γ′_s(c)) (∂c/∂t) dS;

if the natural boundary condition κ n·∇c + γ′_s(c) = 0 is used, then

dG/dt = ∫_Ω ρ_s μ (∂c/∂t) dΩ.    (S39)

For a large particle, the surface energy has a very small contribution; hence we report the free energy without the surface energy term. When the system is governed by reaction with the external reservoir, ∂c/∂t = j, with the constraint that ∫_Ω (∂c/∂t) dΩ = 0, and μ_res changes in time to satisfy the constraint; therefore

dG/dt = ∫_Ω ρ_s (μ − μ_res) j dΩ.

Butler-Volmer kinetics satisfies the condition that when μ_res − μ > 0 (known as the chemical affinity), j > 0; hence dG/dt < 0. Since the free energy decreases monotonically in time, we compute the free energy to monitor the relaxation process and verify whether the system has reached equilibrium, that is, a local minimum of the free energy. Because relaxation mostly occurs outside the electrolyte and happens on a much longer time scale than reaction, we also consider modeling relaxation via lateral diffusion,

∂c/∂t = −∇·F,    F = −M∇μ.

In this case, the free energy of the system follows

dG/dt = −∫_Ω ρ_s M |∇μ|² dΩ,

where the flux F is zero on the boundary. Since the mobility M is positive definite, the free energy again decreases monotonically, dG/dt < 0. We consider three scenarios for achieving equilibrium: starting from the experimentally observed concentration field as the initial condition, 1) relax via diffusion or 2) relax via reaction; and 3) starting from a uniform concentration field with the same composition as the average of the experimentally observed concentration field, relax via reaction. We do not consider relaxation from a uniform concentration field via diffusion because of the formation of spinodal patterns, whose wavelength is much smaller than the particle size and would hence require a long physical relaxation time and computational time (note that with diffusion the domain size coarsens as t^(1/3), while with reaction the coarsening follows t^(1/2)). An essential boundary condition for the concentration field is imposed in all scenarios; that is, the boundary values of c(x) are set equal to the experimentally observed values throughout the entire simulation.
The relaxation time in the following figures is nondimensionalized. When driven by reaction, the dimensionless time is t j₀*, where j₀* is the scale of the exchange current density. When driven by diffusion, the dimensionless time is t D₀/L², where L = 300 nm is chosen such that reaction- and diffusion-driven relaxation can be compared on the same dimensionless time axis. When reporting the free energy, we subtract the following reference free energy, which corresponds to a phase-separated, infinitely large, coherent, and relaxed particle without the interfacial energy,

G_ref = ρ_s [ϕ₁ g_h,c(c_1,c) + (1 − ϕ₁) g_h,c(c_2,c)] |Ω|,

where we have previously defined g_h,c(c) = g_h(c) + (1/2)B_min(c − c̄)² as the sum of chemical free energy and bulk elastic energy, c_1,c and c_2,c are the coherent miscibility gap, and ϕ₁ is the fraction of phase 1; therefore, ϕ₁ c_1,c + (1 − ϕ₁) c_2,c = c̄. At the later stages of coarsening, the reported free energy G − G_ref is the sum of the interfacial and edge elastic energy [25]. The model parameters are described in Section 2. Fig. S5 shows the relaxation process of one particle via the three scenarios up to a dimensionless time (t j₀* or t D₀/L²) of 5000. Four selected frames from each simulation are shown next to the experimental image, and the selected frames are highlighted as black dots in the plots of free energy versus dimensionless time. The interface in the final frame is juxtaposed with the experimental image, as is the free energy in time. The experimental images are from Ref. [26]. Based on electron diffraction measurements, the a-c crystallographic axes of these particles are known. The maximum mesh size is 5 nm, the same as the pixel size, which is sufficiently fine, as explained in Section 6.5. Fig. S6 compares the experimental image with the interface at equilibrium from the simulation (white lines) and summarizes the evolution of the free energy in time for each method and particle.
We find that relaxation starting from the experimental images either via diffusion or reaction results in similar final patterns (except for lateral displacement and mirror symmetry) and agrees well with the experimental observation. The final free energy reflects the interfacial area; for example, for particle 2, there are two LFP domains for scenarios 1 and 2 whereas, for particle 3, LFP has merged into one domain and has a lower free energy than the former.
In the simulations above, essential boundary conditions are used for all scenarios. To illustrate the effect of the boundary condition, we perform simulations using the non-wetting boundary condition (n·∇c = 0), let the system relax via reaction from the experimental concentration field, and compare the result with scenario 2. Fig. S7 shows that, for most particles, the boundary condition does not have a significant influence on the final geometry of the bulk LFP or FP domains, except for particle 2, in which two LFP domains are still undergoing coarsening to merge into one. The difference between the two boundary conditions is usually confined to within a very short distance (the interfacial thickness) of the interface; for example, the zoom-in views of particles 1 to 4 show a thin FP layer between the bulk LFP domain and the particle boundary, and the zoom-in view of particle 5 shows a particle boundary wetted by LFP. Therefore, the influence of the boundary condition is typically confined within a distance of the interfacial thickness from the boundary, a much smaller length scale than the particle size.
In the image data above, the crystallographic orientation is known; as a result, the relaxed LFP-FP interface predicted by the model (based on the misfit strain measured in Ref. [26]) matches the experiment nearly exactly. With in-situ STXM images, the crystallographic orientation is unknown. In the relaxation simulations below, we assume that the c axis aligns with the major axis of the particle. We have shown above that the difference in the equilibrium shape between essential and natural boundary conditions is minor; hence, in the simulations below, we show only the results using the essential boundary condition. Here the maximum mesh size is also set to 5 nm, or 1/7 of the pixel size, fine enough to resolve the interface. We find in Figs. S8 and S9 that, regardless of the mode of relaxation (reaction from the experimental or uniform initial condition, or diffusion from the experimental initial condition), the final pattern of phase separation is similar to the experimental image. Deviations of 7.9° and 10.0° between the simulated and experimental LFP-FP interfaces are observed for particles 3 and 4, which quantifies the error due to the unknown crystallographic orientation.
Lastly, we comment on the interfacial thickness. The value of the interfacial thickness reported in the literature varies [68,69,70,71]: Ref. [70] reports a thickness of 12 nm based on STEM/EELS, while Ref. [71] reports a thickness of about 75 nm for the (210) interface and about 25 nm for the (100) interface. In our datasets, the interfacial thickness estimated from the STXM images is about 70 nm, and that estimated from the ptychography images is about 123 nm [26]. The variation can come from an interface that is not fully relaxed, or from an orientation that is not perpendicular to the imaged a-c plane. The parameters we chose following Ref. [20] give a simulated interfacial thickness of 12 nm. Although the simulated interfacial thickness differs from the experiments, the phase separation morphology agrees well with the experiments. In the following sections, we are most interested in the regime where the Li concentration gradient is smaller, because particles actively undergo Li insertion and de-insertion and do not fully relax; hence the value of κ is expected to have an even smaller effect, and we use the same value as described in Sec. 2.3.

Figure S5: The experimental image of the Li concentration field in a relaxed LFP particle, and the simulated concentration field over time. The LFP-FP interface from the simulation at the last time step is shown on the experimental image as the white curve for comparison. The free energy evolution is plotted for all relaxation scenarios. Legends 1, 2, and 3 refer to relaxation via diffusion, via reaction (starting from the experimental image), and via reaction (starting from a uniform concentration field), and correspond to the 1st, 2nd, and 3rd rows. Black dots in the free energy curves correspond to the simulation snapshots shown. The scale bar is 500 nm. The image horizontal and vertical axes correspond to the crystallographic a and c axes.

Figure S6: Similar to Fig. S5. The LFP-FP interface from the simulation at the last time step is shown on the experimental image as the white curve for comparison. From top to bottom, simulations correspond to relaxation via diffusion, via reaction (starting from the experimental image), and via reaction (starting from a uniform concentration field). Legend names 1 to 5 in the free energy curves correspond to particles from left to right. The scale bar is 500 nm.

Figure S7: Comparison of the non-wetting boundary condition (left) and the essential boundary condition (right, based on experimental images) for the concentration. Relaxation occurs via reaction, and the results at a dimensionless time of 5000 are shown. The inset is a zoom-in view of the highlighted region in the main image. All highlighted rectangles are 100 nm × 100 nm in size and are identical regions in space for the same particle, except for particle 2, where they are shifted to highlight the LFP-FP-boundary junction.

Spatial heterogeneity

Model
As mentioned in the main text, we use a multiplicative prefactor k(x) (x corresponds to (x, y) in the main text) to model the spatial heterogeneity in the reaction kinetics, that is,

j = k(x) j₀(c) (e^(−αη̃) − e^((1−α)η̃)).    (S45)

The prior that we choose for ln k(x) is a Gaussian random field plus a constant that independently follows a normal distribution,

ln k(x) = h₀ + h(x),    h₀ ~ N(0, σ₀²).

The random field h(x) has the properties

E[h(x)] = 0,    E[h(x₁) h(x₂)] = K(x₁, x₂) = σ² ρ(|x₁ − x₂|/ℓ),    (S46)

where ρ is the correlation function and ℓ is the correlation length. We define the mean h̄ = ∫_Ω h(x) dΩ / ∫_Ω dΩ. From the properties above, E[h̄] = 0. We also define the overall variance, which is the expectation of the normalized and squared deviation of ln k(x) from the ensemble mean, E[(h₀ + h(x))²] = σ₀² + σ², and the inter-particle variance, which is the variance of the mean,

E[(h₀ + h̄)²] = σ₀² + (1/A²) ∫_Ω ∫_Ω K(x₁, x₂) dΩ₁ dΩ₂.

Given the correlation function, when ℓ is much smaller than the size of the domain, in 2D we have

(1/A²) ∫_Ω ∫_Ω K(x₁, x₂) dΩ₁ dΩ₂ ≈ 2πℓ²σ²/A,

where A is the area of the domain. In our study, the correlation length is chosen to be the size of the pixel, which is much smaller than the particle size (ℓ² ≪ A); hence the inter-particle variance is approximately σ₀². Finally, we define the intra-particle variance, which is the expectation of the normalized and squared deviation of ln k(x) from the particle mean; given the approximation for the inter-particle variance, the intra-particle variance is approximately σ². We often need to quantify the variance of a sample of particles. Here we define the sample overall, intra-particle, and inter-particle variances by area (all pixels have the same weight),

s₁ = Σᵢ ∫_Ωᵢ (gᵢ(x) − ḡ)² dΩ / Σᵢ Aᵢ,    s₂ = Σᵢ ∫_Ωᵢ (gᵢ(x) − ḡᵢ)² dΩ / Σᵢ Aᵢ,    s₃ = Σᵢ Aᵢ (ḡᵢ − ḡ)² / Σᵢ Aᵢ,

where gᵢ(x) is ln k(x) of the ith particle, ḡᵢ is the mean of the ith particle, Aᵢ is the area of the ith particle, i = 1, . . . , n, and ḡ is the overall mean of all particles by area, i.e., ḡ = Σᵢ Aᵢ ḡᵢ / Σᵢ Aᵢ. We can also define the variances by particle (all particles are given the same weight), s₁′, s₂′, and s₃′, with the overall mean by particle ḡ′ = Σᵢ ḡᵢ / n; in either case the overall variance decomposes as s₁ = s₂ + s₃ and s₁′ = s₂′ + s₃′. Because k is a multiplicative factor and both k(x) and j₀(c) are inferred from the images, it is necessary to impose a normalization constraint, which can be either ḡ = 0 or ḡ′ = 0. We use the former in this study. Suppose the samples gᵢ(x) follow the distribution of ln k(x) i.i.d. for i = 1, 2, . . .. In summary, when the correlation length is much smaller than the particle size, the sample intra-particle variances s₂ and s₂′ are unbiased estimators of the intra-particle variance σ². When the constraint ḡ = 0 (ḡ′ = 0) is imposed, the sample inter-particle variance s₃ (s₃′) is an unbiased estimator of the inter-particle variance σ₀². Regardless of the particle size, when the constraint ḡ = 0 (ḡ′ = 0) is imposed, the overall sample variance s₁ (s₁′) is an unbiased estimator of the overall variance σ² + σ₀². We use this property later to estimate σ and σ₀.

Figure S8: The experimental image of the Li concentration field in relaxed LFP particles, and the simulated concentration field over time. The LFP-FP interface from the simulation at the last time step is shown on the experimental image as the white curve for comparison. The free energy evolution is plotted for all relaxation scenarios. Legends 1, 2, and 3 refer to relaxation via diffusion, via reaction (starting from the experimental image), and via reaction (starting from a uniform concentration field), and correspond to the 1st, 2nd, and 3rd rows. Black dots in the free energy curves correspond to the simulation snapshots shown. The scale bar is 500 nm. The image's horizontal and vertical axes correspond to the crystallographic a and c axes.
Given the prior structure, we expand ln k(x) − h₀ in orthogonal basis functions; that is, we use the Karhunen-Loève expansion to represent h(x) using eigenfunctions of the covariance operator,

∫_Ω K(x₁, x₂) ϕ_k(x₂) dΩ₂ = λ_k ϕ_k(x₁).

We use finite elements to numerically solve the eigenvalue problem by taking its inner product with the finite element basis {N_i(x)}. This procedure converts the integral equation into matrix form. Expanding each eigenfunction in the finite element basis yields a generalized eigenvalue problem of the form Aϕ = λBϕ, where the integrals are computed using quadrature. After obtaining the eigenfunctions, h(x) can be written as

h(x) = Σ_k √λ_k z_k ϕ_k(x),

where z_k ~ N(0, 1) (k = 0, . . . , N_KL) i.i.d. Oftentimes we need to numerically compute the variance of h(x) given Z; because the eigenfunctions ϕ_k(x) are orthonormal, the spatial variance reduces to a sum over the squared coefficients √λ_k z_k.
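A minimal sketch of the Karhunen-Loève construction follows, assuming a squared-exponential covariance and a uniform point grid where the mass matrix reduces to a scaled identity, so the generalized eigenvalue problem becomes an ordinary symmetric one (the paper's finite element version is more general); the dense covariance limits this sketch to small grids:

```python
import numpy as np
from scipy.linalg import eigh

def kl_modes(points, sigma, ell, n_kl=50, area_per_point=1.0):
    """Karhunen-Loeve modes of a squared-exponential covariance on a point set.
    On a uniform grid the mass matrix is area_per_point * I, so the generalized
    eigenproblem A*phi = lambda*B*phi reduces to a symmetric one."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    K = sigma**2 * np.exp(-0.5 * d2 / ell**2)
    lam, phi = eigh(K * area_per_point)
    lam, phi = lam[::-1][:n_kl], phi[:, ::-1][:, :n_kl]  # largest modes first
    return np.maximum(lam, 0.0), phi

def sample_field(lam, phi, seed=0):
    """h(x) = sum_k sqrt(lambda_k) z_k phi_k(x) with z_k ~ N(0, 1) i.i.d."""
    z = np.random.default_rng(seed).standard_normal(len(lam))
    return phi @ (np.sqrt(lam) * z)
```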

Estimates of intra- and inter-particle variance
When the concentration field c(x) is uniform and an insertion or de-insertion reaction occurs, from Eq. S45 and with α = 0.5,

∂c/∂t = k(x) R₀,

where R₀ is constant in space. Therefore we can estimate g(x) = ln k(x) using

g(x) − ḡ = ln(∂c/∂t) − ⟨ln(∂c/∂t)⟩,    (S70)

where the overhead bar and ⟨·⟩ denote the spatial average over the particle. This quantity is important in finding the intra-particle variance. When k(x)/k̄ is close to 1, the expression above is approximately

g(x) − ḡ ≈ (∂c/∂t)/⟨∂c/∂t⟩ − 1.    (S71)

Proof: for brevity, let r(x) = ∂c/∂t = k(x)R₀. Since R₀ is constant in space, ln r − ⟨ln r⟩ = ln k − ⟨ln k⟩ = g − ḡ, which gives Eq. S70. When k is close to its mean k̄, ln(k/k̄) ≈ k/k̄ − 1 ≈ r/⟨r⟩ − 1, which gives Eq. S71. From Section 4.1, we use the sample variance s₂ to estimate the intra-particle variance σ². The time derivative used to estimate g(x) is computed from the finite difference of two consecutive frames. The frames chosen must satisfy the following conditions: the concentration field must be close to uniform with no phase separation; during Li insertion, the frames chosen must not contain pixels whose Li fraction is too close to 1, because those regions are prevented from reacting further by the thermodynamic limit (the overpotential is then nonuniform and R₀ is not constant in space), do not reflect the kinetic properties, and would be misidentified as kinetically slow regions; similarly, during Li de-insertion, the Li fraction must not be too close to 0. The difference in the average compositions of the two frames, Δc̄, used for finite differencing is a subtle choice. It should not be too large, because of the error of finite differencing relative to the time derivative. It should also not be too small, because the pixel noise then increases the estimated intra-particle variance by about 2σ²/Δc̄².
In practice, we use the following set of criteria to determine pairs eligible for finite differencing: • The smaller concentration variance of the two frames, ∫_Ω (c(x) − c̄)² dΩ / ∫_Ω dΩ, is smaller than 0.01.
• The larger concentration variance of the two frames is smaller than 0.02.
• The difference in the average Li fraction between the two frames is greater than 0.05 and smaller than 0.3. • For Li insertion episodes, the percentage of pixels whose Li fraction is above 0.85 in the selected frame is smaller than 1%. For Li extraction episodes, the percentage of pixels whose Li fraction is below 0.15 in the selected frame is smaller than 1%.
For particles with multiple pairs of consecutive frames that satisfy the criteria above, we average the estimated g(x) − ḡ over these pairs. The finite-differencing estimates based on pairs from the same episode are averaged to obtain an estimate for the episode; the g(x) − ḡ estimates from all available episodes of the same particle are averaged to give an estimate for the particle. The overall intra-particle variance is computed by taking the area-weighted average over all available particles (s₂). Both the logarithm method (Eq. S70) and the ratio method (Eq. S71) are used. When the logarithm method is used, pixels at which the rate ratio is negative are omitted. The estimates for s₂ using the logarithm and ratio methods are 0.68 and 0.76, respectively. For the estimation of the inter-particle variance, we select particles in the same batch of experiments that have similar average compositions at the same time and calculate the average reaction rate of each particle, j̄ᵢ. Assuming that all particles have the same composition and overpotential, the variance of ln j̄ᵢ is an estimate of s₃. However, because the local environments of the particles differ, the overpotentials can differ, so this approach may overestimate s₃. Fig. S11 shows the trajectories of the average compositions of all particles in the same batch of experiments (with the same global charge/discharge rate and belonging to the same electrochemical cell). The regions of data selected to calculate s₃ are highlighted by rectangles. The values of s₃ for the different batches shown in Fig. S11 are 1.51, 0.92, 0.25, 1.11, 0.33, and 0.76, respectively.

Figure S10: Maps of g(x) − ḡ estimated by finite differencing for each particle and episode. Pixels whose values are smaller than −1 are rare and have a local reaction rate whose sign is opposite to that of the average reaction rate. The g(x) − ḡ maps share the same color map.
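A sketch of the two estimators (Eqs. S70 and S71) and the frame-pair criteria is given below; the thresholds are those listed above, the function names are ours, and the inputs are assumed to be 1D arrays of valid in-particle pixel values:

```python
import numpy as np

def rate_map(c0, c1, dt, method="log"):
    """Estimate g(x) - mean(g) = ln k(x) - mean(ln k) from two nearly uniform
    frames; c0, c1 are 1D arrays of valid in-particle pixel values."""
    r = (c1 - c0) / dt                 # finite-difference estimate of dc/dt
    ratio = r / r.mean()
    if method == "log":                # Eq. S70; omit rare opposite-sign pixels
        g = np.full_like(ratio, np.nan)
        g[ratio > 0] = np.log(ratio[ratio > 0])
        return g - np.nanmean(g)
    return ratio - 1.0                 # Eq. S71, valid when k(x) is near its mean

def eligible(c0, c1, insertion=True):
    """Frame-pair criteria from the list above."""
    v_lo, v_hi = sorted([np.var(c0), np.var(c1)])
    dc = abs(c1.mean() - c0.mean())
    sat = (c1 > 0.85).mean() if insertion else (c1 < 0.15).mean()
    return v_lo < 0.01 and v_hi < 0.02 and 0.05 < dc < 0.3 and sat < 0.01
```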

Inference based on different objective functions
In this section, we consider different objective functions that can be used to infer the unknown quantities in the model, μ_h(c), j₀(c), and k(x), comparing metrics that summarize the uniformity of the concentration field with the full-image pixel-to-pixel error.

Inference based on uniformity coefficient
Ref. [1] observed that the uniformity of the Li concentration field c(x) depends on the C-rate (reaction rate) and the direction of reaction (charge or discharge). As mentioned in the main text, it is possible to infer the reaction kinetics and thermodynamics from the uniformity coefficient. Here we discuss the model and procedure in detail.
The heterogeneity of the concentration field is defined by its variance, Var(c)(t) = ∫_Ω (c(x, t) − c̄(t))² dΩ / ∫_Ω dΩ. The maximum variance (c̄(1 − c̄)), given the mean c̄, is attained when the concentration values are either 0 or 1. The closer Var(c) is to this maximum value, the more nonuniform the concentration field. Following Lim et al. [1], we define the uniformity coefficient UC from the least-squares fit of the standard deviation √Var(c) against the maximum standard deviation; that is, given Var(c) at each time point of the episode,

√Var(c) ≈ (1 − UC) √(c̄(1 − c̄)).    (S76)

The model-predicted UC is computed using c(x, t) from simulations at a fine time resolution (100 time points from beginning to end). Using Bayesian inference, the likelihood is

p(UC_data | p) = Πᵢ (2πσ²_UC,i)^(−1/2) exp(−(UC(Iᵢ; p) − UC_data,i)²/(2σ²_UC,i)),

where UC and UC_data refer to the model-predicted and experimental uniformity coefficients, respectively, Iᵢ is the global C-rate of the ith data point, σ²_UC,i is the variance of the ith data point measured experimentally, and p are the model parameters. The model-predicted UC is solved at constant C-rate Iᵢ. The data can be found in Fig. 3E of Ref. [1]. The data point that corresponds to I = 0 is not used in the fitting, since the model prediction for UC at equilibrium does not depend on the parameters related to reaction kinetics.

Figure S11: The average composition of particles based on images taken versus time since the beginning of discharge (first row) or charge (second row). The title of each plot is the global C-rate. The discrete data points are interpolated using pchip. The reaction rates in the highlighted boxes are used to compute the inter-particle variation.
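A sketch of the UC fit is shown below, under the assumption (our reconstruction of Eq. S76) that the standard deviation is regressed against the maximum standard deviation with a zero intercept, with UC taken as one minus the fitted slope:

```python
import numpy as np

def uniformity_coefficient(frames):
    """Zero-intercept least-squares fit of sqrt(Var(c)) against the maximum
    standard deviation sqrt(cbar*(1 - cbar)) over an episode; UC is one minus
    the fitted slope (our reconstruction of Eq. S76)."""
    std = np.array([f.std() for f in frames])
    max_std = np.array([np.sqrt(f.mean() * (1.0 - f.mean())) for f in frames])
    slope = (max_std @ std) / (max_std @ max_std)
    return 1.0 - slope
```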
Since the model-predicted UC is for a generic particle geometry, the model is solved on a square domain with periodic boundary conditions. The surface heterogeneity is assumed to be a realization of a Gaussian random field whose covariance is given in Eq. S46. With periodic boundary conditions, the eigenfunctions of the convolution operator are Fourier basis functions; in other words,

ĥ(k) = Λ̂(k)^(−1/2) ŵ(k),

where Λ(x₁ − x₂) is the inverse covariance and the hat denotes the Fourier transform, Λ̂ = K̂⁻¹. Therefore, the random field h(x) can be generated by first generating a white-noise random field w(x) (no spatial correlation) and then computing ĥ = Λ̂^(−1/2) ŵ. The variance of ln k(x) (σ²) is an unknown parameter, and we assume its prior is ln σ ~ N(ln 0.1, 2).
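A sketch of this spectral sampling procedure on a periodic grid follows; the squared-exponential spectrum shape is an assumption consistent with Eq. S46, and the field is rescaled at the end so that its standard deviation is exactly σ:

```python
import numpy as np

def gaussian_random_field(n, ell_pix, sigma, seed=0):
    """Periodic Gaussian random field on an n x n grid: filter white noise in
    Fourier space, hat(h) = sqrt(hat(K)) * hat(w), then rescale so std = sigma."""
    rng = np.random.default_rng(seed)
    k = 2.0 * np.pi * np.fft.fftfreq(n)
    kx, ky = np.meshgrid(k, k, indexing="ij")
    spectrum = np.exp(-0.5 * ell_pix**2 * (kx**2 + ky**2))  # squared-exponential
    w_hat = np.fft.fft2(rng.standard_normal((n, n)))
    h = np.fft.ifft2(np.sqrt(spectrum) * w_hat).real
    return sigma * (h - h.mean()) / h.std()
```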
Due to the limited number of UC data points (6), we use a one-parameter representation of μ_h(c) constrained by the mosaic miscibility gap (Eq. S30). In order for c = 0.5 to be within the spinodal region, the coefficient a_GL must exceed a critical value, so we choose a reasonable prior a_GL ~ N(0, |a_GL,crit|), where a_GL,crit = (4 − 2Ω)/μ′_GL(c = 0.5). Alternatively, we also consider representing μ_h(c) using Eq. S31 and require a_GL > 0 for stability. Based on the literature value for the spinodal barrier, we choose a reasonable prior ln a_GL ~ N(0, 0.5).
It is known that, at higher overpotentials, the dependence of the reaction rate on the overpotential can deviate from the exponential dependence predicted by Butler-Volmer kinetics. This deviation can be due to Ohmic resistance or to the electron transfer rate at high potential (Marcus-Hush-Chidsey model [19,29]), both of which cause d ln j/dη to decrease at high overpotential. Therefore, in this section we examine the importance of the overpotential dependence given the experimentally measured uniformity coefficients. For the reaction kinetics, we consider the effect of a film resistance in series with a surface reaction that follows Butler-Volmer kinetics,

j = j₀(c) (e^(−αη̃_s) − e^((1−α)η̃_s)),    η̃_s = η̃ − R̃_Ω j,

where R̃_Ω is the film resistance scaled by the thermal voltage. The film resistance is an unknown parameter, and we assume its prior follows ln R̃_Ω ~ N(ln 0.1, 2). The surface heterogeneity is multiplicative, j = k(x) j_BV.
In the literature, a symmetric exchange current j₀ ∝ √(c(1 − c)) is frequently used [72]. From transition state theory [7], we expect the exchange current to approach zero as c approaches 0 and 1. Hence, in this section we use a representation of j₀(c) parameterized by a shape parameter b, the maximum value γ = max_c j₀(c), and its location c* = argmax_c j₀(c), at which ∂ ln j₀(c)/∂c = 0. We assume the prior for c* follows a uniform distribution on (0, 1). We use the transformation ζ = ln(c*/(1 − c*)) ∈ ℝ; the probability density of the prior for ζ is therefore p(ζ) = e^(−ζ)/(1 + e^(−ζ))². γ and b are positive, and we assume their priors are given by ln γ ~ N(ln 0.5, 2) (the unit of γ is hr⁻¹) and ln b ~ N(ln 1.5, 2). In total, we have 6 parameters, p = [γ, b, ζ, a_GL, σ, R̃_Ω], whose priors are independent.
The initial condition c(x) is generated as a random field whose mean is 0.05 for lithiation and 0.95 for delithiation, with a variance as suggested by the experimental error. To illustrate the importance of the reaction kinetics, we first omit the mechanical effects in the model. In a large domain, the effective gradient penalty scaled by the domain size, κ̃ = κ/(ρ_s k_B T L²), tends to 0, and when the correlation length of the surface heterogeneity field k(x) satisfies √κ̃ L ≪ ℓ ≪ L, the continuum model reduces to a discrete 0D model in which the grid points (pixels) have no spatial interaction. We find the maximum a posteriori (MAP) estimate of p using the discrete model (Eq. S31 is used to represent the chemical potential), and then solve both the discrete and continuum models (√κ̃ = 0.01, ℓ = 0.2L) using the MAP p. The simulations are repeated 10 times with different random noise (for the initial condition and surface heterogeneity) to obtain the mean and variance of the model-predicted UC. Fig. S12a shows that both models are sufficiently close to the experimental values. The insets show examples of the concentration fields at c̄ = 0.5 at the same current in opposite directions, illustrating the increasing uniformity with increasing C-rate and the asymmetry between lithiation and delithiation.
When mechanics is included in the model, we must use the full continuum model to find the MAP. In Fig. S12b, the uncertainty of the MAP model prediction due to stochasticity is well within the uncertainty of the UC data. The influence of the C-rate and reaction direction on the uniformity coefficient and the concentration field is qualitatively similar to that of the model without mechanics. We see a strong correlation between γ and μ′_h(c = 0.5). The two parameters for the shape of j₀(c) are also highly correlated. We attempt to understand these correlations analytically.
Without the voltage and spatial information, the evolution of the concentration variance is approximated to first order by the autocatalytic rate defined below [53],

∂j/∂c = (∂ ln j₀/∂c) j + h(η̃) (∂μ̃/∂c) j.    (S85)

Normalizing by the reaction rate,

(1/j) ∂j/∂c = ∂ ln j₀/∂c + h(η̃) ∂μ̃/∂c.    (S86)

In the case of Butler-Volmer kinetics, h(η̃) = −(α e^(−αη̃) + (1 − α) e^((1−α)η̃))/(e^(−αη̃) − e^((1−α)η̃)), and at high rate, h(η̃) → −α as η̃ → −∞ (lithiation) and h(η̃) → 1 − α as η̃ → +∞ (delithiation). At lower C-rates, the compositional heterogeneity is determined more by the thermodynamics (the magnitude of the second term in Eq. S86 is large compared to the first) and less by the reaction kinetics. Therefore, given UC in the low-rate limit, the posterior falls on a manifold of constant γ ∂μ̃/∂c, or 1/γ ∝ ∂μ̃/∂c (note that the proportionality constant is negative because γ > 0 and ∂μ̃/∂c < 0 to ensure thermodynamic phase separation). Since ∂μ̃/∂c differs from μ′_h(c) due to the mechanical effect, we perform a linear regression between 1/γ and μ′_h(c = 0.5) (a regression between 1/j₀(c = 0.5) and μ′_h(c = 0.5) gives similar results). The fitted correlation between γ and μ′_h(c = 0.5) is shown by the red curves in the scatter plots of Fig. S13. Since the experimental UC is observed at multiple C-rates, we expect the first and second terms of the autocatalytic rate Eq. S85 to be identifiable separately, since the C-rate affects only the second term and not the first. Therefore, we hypothesize that, in the posterior distribution, b and ζ approximately lie on a manifold of constant (ln j₀)′. We perform a linear regression between ζ and 1/b (shown by the green curves in the scatter plots of Fig. S13), which translates into (ln j₀)′(c = 0.57) ≈ −1.57. Having collapsed b and ζ into a single parameter, (ln j₀)′(c = 0.57), related to the first term in Eq. S85, and γ and μ′_h(c = 0.5) into a single parameter related to the second term in Eq. S85, these two pairs can be identified separately. This is confirmed by the lack of any significant correlation between the pairs (the shape parameters of j₀(c) are not strongly correlated with the magnitude of j₀(c) or with the thermodynamic parameter a_GL). In fact, we have shown previously [31] that datasets from both lithiation and delithiation allow us to decorrelate (ln j₀)′(c) and μ′_h(c). The analysis above shows that the magnitude of j₀ cannot be determined from this set of uniformity coefficient data, due to its strong correlation with a_GL. However, its shape can be determined separately using effectively one parameter. We show the uncertainty of j₀(c)/γ in red in the right panel of Fig. S13, defined by the 90% percentile band at each c.
Additionally, note that there is some negative correlation between the strength of the surface heterogeneity and the chemical potential barrier, and some positive correlation between σ and γ, both of which suggest that when the system has a greater tendency to phase separate, the surface heterogeneity must be smaller to reproduce the same observed values of the uniformity coefficient. We also find that the film resistance R̃_Ω is not sensitive to the observed UC and does not have any significant correlation with the other parameters. Therefore, the following discussion omits the film resistance R̃_Ω.
For comparison, we perform HMC using the discrete model without mechanical effects and with Eq. S31 (10,000 samples; the acceptance ratio is 0.85). We find a strong correlation between ln γ and ln a_GL. A linear regression also finds a similar correlation between ζ and 1/b, or (ln j₀)′(c = 0.54) ≈ −1.26. Note that this result is very close to that obtained with the continuum model. In fact, in the right panel of Fig. S13 we plot the uncertainty of j₀(c)/γ using the discrete model in gray beneath that of the continuum model and find that they are very close. Therefore, the posterior of the normalized j₀(c)/j₀,max is insensitive to the addition of mechanical effects given the uniformity coefficient data.
Finally, we show the posterior distributions of j₀(c) and μ_h(c) in Fig. S14 when using the full representation Eq. S26 with fixed mosaic and coherent miscibility gaps, the same as will be used in the full-image inversion, with the numbers of parameters being N_μ = 1 and N_j = 3, fewer than in the full-image inversion, where N_μ = 3 and N_j = 5.

Comparison of different objective functions and the number of parameters
In this section, we compare inference based on 1) all the pixels in the image, 2) the uniformity coefficient, and 3) the variance of the concentration field, by evaluating their linear sensitivities with respect to the parameters and through Hamiltonian Monte Carlo sampling of the posterior distributions. We also discuss the number of parameters needed to parameterize μ_h(c) and j₀(c). Suppose the likelihood of the pixel values follows

c_data(x) − c(x; p) ~ N(0, σ²),    (S101)

independently for each of the N pixels. Based on the definition of the variance of the concentration field, the experimental variance is

V_data = (1/N) Σ_m (c_data,m − c̄_data)²,

where c̄ refers to the average composition. Similarly, the model-predicted variance is

V(p) = (1/N) Σ_m (c_m(p) − c̄(p))².

Based on the likelihood of the pixel values, the conditional mean and variance of the variance are

E[V_data | p] ≈ V(p) + σ²,    Var[V_data | p] ≈ (2σ⁴ + 4σ²V(p))/N.

As an order of magnitude, the value σ² = 0.07² is typically smaller than the variance V, so we estimate E[V_data | p] ≈ V(p). Next, given the definition of the uniformity coefficient (Eq. S76), we evaluate the mean and variance of √V_data. In the limit of a large number of pixels N, Var[V_data | p] is small and V_data | p approaches a normal distribution. To first-order approximation,

E[√V_data] ≈ √(E[V_data]),    Var[√V_data] ≈ Var[V_data]/(4 E[V_data]).

Proof (all symbols used in this proof are local): given X ~ N(m, s²) with s ≪ m, expanding √X around m gives √X ≈ √m + (X − m)/(2√m), so E[√X] ≈ √m and Var[√X] ≈ s²/(4m). Similar to the variance, we define the experimental and model-predicted uniformity coefficients UC_data and UC, respectively. Given that different snapshots in time are independent and the properties above,

UC_data | p ~ N(UC(p), σ²_UC),    (S103)

where σ²_UC scales as σ²/V_max and V_max is defined as the sum of the maximum variances over all snapshots. When the number of pixels is large, the observed UC approaches a normal distribution. As a result, we compare the sensitivities of the following objective functions (including multiple episodes at different C-rates I) to determine the identifiability of the parameters:

J_pixel(p) = (1/(2σ²)) Σ_episodes ||c_data − c(p)||²,
J_var(p) = Σ_episodes Σ_t (V_data,t − V_t(p))² / (2 Var[V_data,t | p]),
J_UC(p) = Σ_episodes (UC_data − UC(p))² / (2σ²_UC).    (S106)

We study the sensitivities by computing the Hessian when the model prediction is equal to the data. For example, the Hessian of the first objective function is

H_jk = (1/σ²) Σ_m (∂c_m/∂p_j)(∂c_m/∂p_k).    (S107)

We numerically evaluate the Hessian using the model in Section 5.1, with parameters from the optimal result (MAP) of fitting the uniformity coefficients. In this section, we focus on the identifiability of μ_h(c) and j₀(c); for simplicity, we omit the spatial heterogeneity (σ = 0) and set the series resistance R̃_Ω = 0. This is equivalent to the best-case scenario with the highest identifiability, where the spatial heterogeneity is known. When the pixel-based objective function J_pixel is used and the spatial heterogeneity is unknown, the uncertainty in μ_h(c) and j₀(c) increases; hence, the analysis below provides an upper bound on the sensitivities. The geometry, simulation grid, and parameters are kept the same as in Section 5.1. We use 5 snapshots for each dataset, which are simulated at C-rates of 0.3, 0.6, and 2 for charge, and 0.2, 0.6, and 2 for discharge. The average concentration ranges from 0.05 to 0.95. Thus, with V_max = 0.74, omitting this constant in the objective function (Eq. S106) is acceptable for an order-of-magnitude estimate. To be consistent across all objective functions, the model-predicted UC is also defined by the least-squares fit (Eq. S76) given the variance V(t) at the same 5 snapshots, instead of the 100 time points used in Section 5.1. The initial condition is generated using the same approach as in Section 5.1. Since the initial condition is random, we compute the Hessian given the same initial condition for all C-rates and take the average over 20 realizations.
Using the full representations of μ_h(c) and j₀(c) (with the total number of parameters N_μ + N_j + 2), Fig. S15 shows the sorted eigenvalues of the Hessians of the three objective functions using in total 12, 8, and 4 parameters. We find that the eigenvalues decline increasingly rapidly in the order of the pixel-, variance-, and UC-based objective functions. Therefore, more information is lost and parameters become increasingly unidentifiable as the spatial information is discarded and the experimental data are further compressed into statistical descriptors. We estimate the number of identifiable principal components from the relative magnitude of each eigenvalue compared with the largest. Choosing 10⁻³ as the threshold value, we find that with J_UC only two principal components can be identified, with the other parameters highly correlated. This is consistent with the conclusion in Section 5.1. With the same threshold, we use a total of 8 parameters (N_μ = 3, N_j = 3; that is, μ_h(c) and j₀(c) are fifth-order polynomials) to parameterize μ_h(c) and j₀(c) when the full concentration field is used as training data. Fig. S16 compares the posterior distributions of the parameters using the two likelihoods based on UC (Eq. S103) and pixels (Eq. S101), obtained using Hamiltonian Monte Carlo (σ²/N = 10⁻⁴). We find that using the full pixel information significantly reduces the inference uncertainty, and the posterior distribution is clustered around the optimum, indicating that using full image data allows us to obtain a more accurate inference of the unknown constitutive laws.
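A sketch of this identifiability analysis is given below: given a sensitivity (Jacobian) matrix J of model pixels with respect to parameters, obtained for example by adjoint sensitivity analysis or finite differences, the Gauss-Newton Hessian of the pixel-based objective (Eq. S107) is JᵀJ/σ², and the number of identifiable principal components is counted against the 10⁻³ threshold used above:

```python
import numpy as np

def identifiable_components(J, sigma2, threshold=1e-3):
    """Eigen-spectrum of the Gauss-Newton Hessian H = J^T J / sigma2 (Eq. S107)
    and the number of principal components above threshold * largest."""
    H = J.T @ J / sigma2
    lam = np.linalg.eigvalsh(H)[::-1]  # sorted in descending order
    return lam, int(np.sum(lam > threshold * lam[0]))
```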

Optimization and numerical methods
This section discusses the optimization procedure and the numerical methods when full images are used and spatial heterogeneity is included in the inference. The numerical solver for the PDE is the core component of the inversion framework; the requirements on solver accuracy, efficiency, and robustness are also considered in this section.

Objective function
We define the squared error

SE = Σᵢ Σₑ ||cᵢ,ₑ − c_data,i,e||²,    (S108)

where cᵢ,ₑ and c_data,i,e are the model-predicted and experimental concentration fields of the eth episode of particle i, and the squared l₂ norm of their difference is defined as the sum of the squared errors over all pixels of the episode; the mean squared error

MSE = SE / (Σᵢ N_p,i Σₑ (N_f,i,e − 1)),    (S109)

where N_p,i is the number of pixels of particle i and N_f,i,e is the number of frames of the eth episode of particle i (the initial frame is excluded from the normalization because it is used as the initial condition); and the root mean squared error

RMSE = √MSE.    (S110)

We also define the squared error of a single particle i,

SEᵢ = Σₑ ||cᵢ,ₑ − c_data,i,e||².    (S111)

Figure S16: The scatter plots and histograms for each pair and each parameter in p generated by Hamiltonian Monte Carlo using the UC-based (left, Eq. S103) and pixel-based (right, Eq. S101) likelihoods (5000 samples), respectively. Red and green lines are fitted curves showing the correlations between γ and a_GL, and between ζ and b, respectively.

Data preprocessing
Next, we describe in detail the preprocessing of the experimental data that yields the experimental concentration fields c_data,i,e in Eq. S108, and then how the squared error in Eq. S108 is computed. Given the Li fraction and optical density at all pixels, the image is binarized to produce the boundary of the particle. Triangulation is then performed on the particle geometry to generate the mesh for the finite element simulation. The first frame of the images is interpolated onto the mesh and used as the initial condition. After solving the model, the solution is interpolated onto a fine rectangular mesh (10 times finer than the image pixels), convolved by averaging over neighboring grid points that are within [−0.5, 0.5] pixel of each point in both the x and y directions, and then downsampled onto the image pixel positions. We explain below why the convolution is performed. Invalid pixels and pixels outside of the particle boundary are not included in the squared error.
The convolution is performed here because, in the experiment, the X-ray absorption is measured at 706 eV and 713 eV at all pixels,

(OD₇₀₆, OD₇₁₃)ᵀ = M (a, b)ᵀ,

where the matrix M contains the optical densities of the LFP and FP references at 706 eV and 713 eV. Pixels where a < 0 or b < 0 are considered invalid. Recall that the mosaic miscibility gap [c₁, c₂] is close to [0, 1] (c₁ = 0.013), and c₁ is much smaller than the experimental error σ = 0.07 [26,1]. Hence, concentrations within [0, c₁] and [c₂, 1] cannot be reliably inferred from the experiment. As a result, we make the reasonable assumption that the LFP and FP references correspond to c = c₂ and c = c₁, respectively. By constraining the observed values to within [c₁, c₂], we allow the data to inform the physical properties only within this range; in the narrow regions [0, c₁] and [c₂, 1], μ_h(c) and j₀(c) are informed by the prior. The representations of these two functions (Eqs. S26 and S36) and their priors determine their asymptotic behavior: μ_h → ln c and ln j₀ → ln c as c → 0, and μ_h → −ln(1 − c) and ln j₀ → ln(1 − c) as c → 1. Because the first frame is used as the initial condition, this choice of references precludes values of 0 and 1 in the initial condition and hence avoids numerical issues due to the logarithmic terms in the chemical potential. In summary,

c_data,i,e = (c₂ − c₁) a/(a + b) + c₁.
Following Ref.
[1], we define the relative thickness of the particle to be ℎ(x) = ( + )/ + , or the normalized optical density.
Suppose the measured optical density at the two energy levels and at each pixel is the convolution of true optical density and the same point spread function (PSF) (in the limit of small spatial variation), then the observed a and b are also the convolution of the true values of a and b (because the spatial convolution and the matrix above commute). We ignore the variation in + within the extent of PSF, which is about the size of one pixel in our case. This assumption is reasonable given that the overall relative standard deviation of + is 17% to 27% and that the variation within the extent of one pixel is unknown. Therefore, the observed Li fraction ′′ is related to the true Li fraction ′ by where ⊗ denotes convolution, and I is the shape function for the particle (1 inside the particle and 0 outside).
Because the X-ray beam is focused using 45 nm to 60 nm zone plates and the images are raster-scanned in 50 nm steps [1], we assume that the PSF is a nonzero constant within a box of one pixel in width and height. Correspondingly, we average the values on the finer simulation grid within this box to obtain the model prediction at the location of pixel $j$, $c_j(\mathbf{p}_{\text{global}}, \mathbf{Z}_i)$. The difference from the pixel value is the error.
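A minimal sketch of the box-PSF averaging and downsampling, assuming the fine simulation grid tiles each 50 nm pixel exactly with a 10 × 10 block; the function name is illustrative.

```python
import numpy as np

def box_psf_downsample(fine, refine=10):
    """Average a field sampled on a grid `refine` times finer than the STXM
    pixels over a one-pixel-wide box centered on each pixel (the assumed
    constant-box PSF), then sample at the pixel positions."""
    ny, nx = fine.shape
    assert ny % refine == 0 and nx % refine == 0
    blocks = fine.reshape(ny // refine, refine, nx // refine, refine)
    return blocks.mean(axis=(1, 3))
```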

Numerical solver
We use the finite element method to solve the model. The regular solution model for the thermodynamics forbids $c = 0$ and $c = 1$, so $c \in (0, 1)$. Numerically, if the Li fraction comes too close to 0 or 1 (when a large overpotential is imposed), the solver may push $c$ beyond this range, producing complex and hence nonphysical values due to the $\ln\frac{c}{1-c}$ term in $\mu_h(c)$. Therefore, to improve the robustness of the solver during optimization, we perform the nonlinear transformation $\phi = \ln\frac{c}{1-c}$, or equivalently $c = 1/(1+e^{-\phi})$ (Eq. S114). We use linear basis functions $\{\varphi_i, i = 1, \ldots\}$ on the triangular mesh to represent the variables. Using the basis functions also as test functions, the weak form of the model described in Sec. 2.1 follows, where the bracket ⟨·, ·⟩ denotes the inner product of two functions, that is, the integral over the domain.
Here $u_{i,j}$ denotes the derivative of the $i$th component of the displacement field in the $j$th direction, $\varphi_{,j}$ denotes the partial derivative of the test function in the $j$th direction, and $\bar R(t)$ is the interpolated average reaction rate. When an essential boundary condition is imposed, the chemical potential equations at the boundary are replaced by the boundary condition. The time derivative in Eq. S117 can be written in a form that is explicit in $\partial c/\partial t$, and Eq. S120 can similarly be rewritten (Eq. S122). The discretized equations are a set of differential algebraic equations (DAEs), where $c$ is the dynamic variable (except at nodes where an essential boundary condition is imposed) and the other variables $\mu$, $\mathbf{u}$, and $\Delta\phi$ are algebraic. The DAE is of the form $\mathbf{M}\,\dot{\mathbf{y}} = \mathbf{f}(t, \mathbf{y})$ with a singular mass matrix $\mathbf{M}$ (Eq. S123). We use an implicit solver with adaptive time stepping, adaptive order, and error control to solve the DAEs [73].
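To make these two numerical ingredients concrete, here is a minimal sketch (not the production solver, which is adaptive in step size and order [73]) of the transformation of Eq. S114 together with a fixed-step implicit Euler iteration for a DAE with a singular diagonal mass matrix; all names are illustrative.

```python
import numpy as np

def c_to_phi(c):
    """Transform phi = ln(c / (1 - c)); maps (0, 1) onto the real line so a
    solver step can never leave the physical range of the Li fraction."""
    return np.log(c / (1.0 - c))

def phi_to_c(phi):
    """Inverse transform c = 1 / (1 + exp(-phi)), always inside (0, 1)."""
    return 1.0 / (1.0 + np.exp(-phi))

def implicit_euler_dae(f, jac, y0, t_grid, mass_diag, newton_tol=1e-10):
    """Fixed-step implicit Euler for M dy/dt = f(t, y), where mass_diag is 1
    for dynamic entries (c) and 0 for algebraic ones (mu, u, voltage)."""
    y = np.asarray(y0, dtype=float).copy()
    M = np.diag(mass_diag)
    out = [y.copy()]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        dt = t1 - t0
        y_new = y.copy()
        for _ in range(20):                      # damped Newton iterations
            r = M @ (y_new - y) / dt - f(t1, y_new)
            if np.linalg.norm(r) < newton_tol:
                break
            J = M / dt - jac(t1, y_new)          # residual Jacobian
            y_new = y_new - np.linalg.solve(J, r)
        y = y_new
        out.append(y.copy())
    return np.array(out)
```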

Temporal accuracy
Because an implicit adaptive DAE solver is used, the objective function may be discontinuous or nondifferentiable in certain regions; it becomes smoother as the solver tolerance decreases (higher temporal accuracy). Here we study the effect of temporal accuracy by computing the objective function around a nominal set of parameters. We perform a preliminary optimization on all of the datasets, take the result at the 4th iteration as the nominal parameters, and compute the objective function along a given direction in parameter space by varying one global parameter while keeping all others fixed. Fig. S17 shows the half squared error over all particles, SE/2, and over the second particle, $\mathrm{SE}_2/2$, as functions of $\theta_2$, the second parameter of $\mu_h(c)$, together with the gradient computed from adjoint sensitivity analysis (ASA); the tangent line with the slope of the computed gradient is shown in red. From left to right: in the first case, the solver solves in $c$ directly with relative (RelTol) and absolute (AbsTol) DAE tolerances of $10^{-3}$ and $10^{-4}$, respectively, and the objective function is highly noisy. The second case additionally uses the transformation in Eq. S114. The third case reduces the relative tolerance to $5\times10^{-4}$, which reduces the roughness to some extent. The fourth case is the most stringent: compared with the second case, it reinitializes the solver with a small time step at each frame, which significantly reduces the roughness of the objective function, and the slope agrees well with the adjoint sensitivity analysis.
Note that adjoint and forward sensitivity analysis can return a sufficiently accurate gradient even at low accuracy, as in the first or second case. Since the computational time increases with accuracy, the DAE solver tolerance should balance accuracy against time. For optimization, RelTol = $10^{-3}$ and AbsTol = $10^{-4}$ are often sufficient, since the gradient from forward sensitivity analysis and the approximated Hessian can reach the minimum despite the rough landscape. Fig. S18 shows the objective function as a function of the parameters of $\mu_h(c)$ around the optimum found by the optimizer at RelTol = $10^{-3}$ and AbsTol = $10^{-4}$; the objective function itself is computed using RelTol = $5\times10^{-4}$ and AbsTol = $10^{-4}$ with reinitialization. We find that the lower accuracy used during optimization is sufficient to find the optimum.
However, when performing gradient-based Monte Carlo sampling such as Hamiltonian Monte Carlo, we need to ensure that the roughness is not much larger than $\sigma^2$; otherwise, the acceptance ratio becomes low and the sampling is inefficient.

Table S2: The roughness of the objective function. SE is the squared error over all particles, and $\mathrm{SE}_i$ is the squared error of an individual particle. Three parameters of $\mu_h(c)$, $\theta_1$, $\theta_2$, and $\theta_3$, are varied. For the objective functions of individual particles, the 25%, 50%, and 75% percentile roughness over all particles for all three parameters is tabulated. "reinit." stands for reinitializing the solver at each frame; "in $c$ (no $\phi$)" stands for solving in the variable $c$, with all other cases solved using the transformation.

Here, we define the roughness of the objective function in the direction of parameter $\theta$ by an integral of the deviation of the objective function from its local linearization; the integral is approximated using the trapezoidal rule with the points shown in Fig. S17 and $\Delta\theta = 0.01$, with $\partial f/\partial\theta$ computed from ASA. Table S2 lists the roughness of the objective function SE/2 over all particles in the direction of the three parameters of $\mu_h(c)$ under different solver accuracies, as well as the 25%, 50%, and 75% percentiles of the roughness of $\mathrm{SE}_i/2$ for the same three parameters. We find that the transformation in Eq. S114 decreases the roughness, and that decreasing the solver tolerance and reinitializing decrease it further. The roughness for individual particles is on the order of $10^{-4}$, much smaller than $\sigma^2 = 4.9\times10^{-3}$; hence, HMC performed on a few particles only requires a low-accuracy time integrator. However, the objective function including all particles has a much higher roughness; in fact, when reinitialization is not applied, the roughness is much larger than $\sigma^2$, resulting in inefficient sampling. Hence, when HMC is performed on all particles, a much tighter DAE tolerance is suggested. If the solver error is independent across particles, the roughness scales with the square root of the number of particles, $\sqrt{N}$.

Spatial accuracy
The finite element solver uses a fixed mesh. The mesh size must be fine enough to resolve the gradients in the concentration field $c(\mathbf{x})$ when phase separation occurs or in the presence of strong spatial heterogeneity. At the same time, since the computation time is dominated by model evaluation, the scale of the problem (particle size and mesh size) determines the computational cost of the inversion and uncertainty quantification. In this section, we discuss the tradeoff between fidelity and computational cost.
The important length scales are the characteristic length of the particle and the thickness of the interface between the two phases in a phase-separated particle. The interfacial thickness is about 12 nm ≈ $4\sqrt{\kappa/\Omega}$ [20] in a coherently phase-separated particle. In comparison, the pixel size of the STXM images is 50 nm, which cannot resolve the interface. We perform simulations on two particles whose major-axis lengths are 0.9 and 1.7 μm (areas of 300 and 1000 pixels in STXM image units) and study the error and computational time as functions of the maximum mesh size.
The particles are subjected to two different constant total reaction rates, chosen to correspond to the experimentally observed rates that lead to the largest and smallest uniformity coefficients: −0.3 hr⁻¹ (de-insertion) and 2 hr⁻¹ (insertion). Since the uniformity of $c(\mathbf{x})$ is very different in these two cases, this study tests the numerical accuracy at different levels of concentration gradient; a finer mesh is expected to be needed to resolve features with large spatial gradients, while the gradient decreases with increasing uniformity. The simulations are performed using the MAP result of Section 5.1. The same random fields are generated and scaled to the sizes of the different particles.
The $L_2$ norm of the error is defined by $\|e\|_2 = \big(\sum_k \int (c(t_k) - \tilde c(t_k))^2\,\mathrm{d}A\big)^{1/2}$, where $t_k$ is the $k$th time point, $c$ is the concentration field at a given mesh resolution, and $\tilde c$ is the solution on a sufficiently fine mesh (maximum mesh size $0.3\sqrt{\kappa/\Omega} = 0.89$ nm), considered to be the truth. The integral is evaluated by quadrature after projecting the coarse solution onto the fine mesh. Fig. S19 shows the $L_2$ error and the computational time as functions of the maximum mesh size $h_{\max}$, along with the convergence of the largest magnitude of the concentration gradient. The error and the gradient $\max\|\nabla c\|$ are reported in units of the STXM pixel size (50 nm) and its inverse, respectively, on the left axes; $h_{\max}$ is normalized by $\sqrt{\kappa/\Omega}$ on the bottom axes, with the corresponding values in nm on the upper and right axes. The error scales as $h_{\max}^{1.4}$. The computational time scales as $h_{\max}^{-2.4}$ (or about $N_{\text{node}}^{1.2}$, where $N_{\text{node}}$ is the number of nodes in the triangular mesh), which suggests that the costs of model evaluation for the experimentally observed particles differ by at least a factor of 16 (for a 10-fold difference in area). Since the time integrator uses adaptive time steps, the computational time also depends on the reaction rate: the slower, negative (de-insertion) rate enhances the nonuniformity and increases the computational time. Note that the errors for the two particle sizes are close, and the error in the largest gradient magnitude converges very slowly, suggesting that the $L_2$ error mostly comes from the interface. In fact, when the mesh is very coarse, $\max\|\nabla c\|$ approaches its upper bound $h_{\max}^{-1}$ (gray curve in Fig. S19c), indicating a sharp transition across a single triangular element. The computational times for the two particle sizes collapse onto the same curve as a function of the number of nodes.
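The reported convergence orders can be recovered from log-log fits of the error and wall time against the maximum mesh size; the sketch below uses placeholder numbers chosen to reproduce the observed slopes, not the measured data.

```python
import numpy as np

h_max = np.array([20.0, 10.0, 5.0, 2.5])      # nm (placeholder values)
err   = np.array([16.0, 6.1, 2.3, 0.87])      # L2 error (placeholder values)
wall  = np.array([3.0, 16.0, 84.0, 440.0])    # seconds (placeholder values)

# Slopes of the log-log fits give the observed convergence orders.
p_err, _ = np.polyfit(np.log(h_max), np.log(err), 1)
p_time, _ = np.polyfit(np.log(h_max), np.log(wall), 1)
print(f"error ~ h_max^{p_err:.1f}, time ~ h_max^{p_time:.1f}")  # ~1.4 and ~-2.4
```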
To put the magnitude of the numerical error into perspective, for an $L_2$ error of 10, the average error per pixel for particles of area 300 and 1000 pixels is 0.03 and 0.01, respectively; in comparison, the standard deviation of the pixel error is 0.07 [1]. Fig. S20 visualizes the numerical error for two meshes with maximum mesh sizes of 20 nm and 5 nm (0.4 and 0.1 in STXM pixel units, or $6.7\sqrt{\kappa/\Omega}$ and $1.7\sqrt{\kappa/\Omega}$; highlighted as vertical dashed lines in Fig. S19). The corresponding errors are about 16 and 2.2 (error per pixel of 0.05 and 0.016 for a particle of area 300). We visualize the solution on the coarse mesh and its error relative to the fine-mesh result. For the Li de-insertion case with a large spatial gradient, most of the error comes from the region near the interface and regions of large concentration variation, while the position of the interface is well captured despite the low spatial resolution. For the Li insertion case with a high uniformity coefficient, the error is small and more evenly distributed; with the 5 nm mesh it is dominated by the DAE solver error ($10^{-4}$). In summary, the error of the 5 nm mesh is significantly smaller than the experimental error. However, the computational cost of optimization, cross-validation, and uncertainty quantification (Hamiltonian Monte Carlo or bootstrapping) can be high, since simulations must be performed for a large number of particles and for many repeated model evaluations.
Next, we compare the inversion results using 20 nm and 5 nm mesh sizes in Fig. S21. The maximum of $j_0(c)$ differs between the two meshes, owing to the insensitivity to the magnitude of $j_0(c)$, but the scaled $j_0(c)$ and $\mu_h(c)$ from the two meshes are sufficiently close. We also show examples of the inverted $\ln k(\mathbf{x})$ in Fig. S22, interpolated onto image pixels for comparison. We find that $\ln k(\mathbf{x})$, after subtracting the mean of each particle, differs only slightly between the meshes, much less than its uncertainty (see the uncertainty quantification of $\ln k(\mathbf{x})$ in Section 9.1). Therefore, for computational efficiency, we use a maximum mesh size of 20 nm for cross-validation and uncertainty quantification.

Figure S21: The MAP estimates of $\mu_h(c)$ and $j_0(c)$ using a coarse mesh (maximum mesh size of 40% pixel size, or 20 nm) and a fine mesh (maximum mesh size of 10% pixel size, or 5 nm).

Optimization algorithm
To minimize the objective function, each iteration of the optimization evaluates the model and the sensitivities for all episodes in parallel. In this section, we compare the quasi-Newton (BFGS) and trust-region algorithms. We set the initial guess of $\mu_h(c)$ to be the same as the model in Ref. [20], with $j_0(c) = 4c(1-c)$ and $\ln k(\mathbf{x}) = 0$. For quasi-Newton, we use adjoint sensitivity analysis (ASA) to compute the gradient of the objective function. In Fig. S23, we show $\mu_h(c)$, $j_0(c)$, and the variance of $\ln k(\mathbf{x})$ at each iteration. Because $\mu_h(c)$ and $j_0(c)$ are shared by all simulations, the gradients with respect to their parameters are much greater than those for $\ln k(\mathbf{x})$: at the initial guess, the $L_2$ norm of the gradient over all parameters of $\mu_h(c)$ and $j_0(c)$ is 676 (root mean square 104), while that for $\ln k(\mathbf{x})$ (the $\mathbf{Z}_i$) is 104 (root mean square 3.3). Therefore, during the first few iterations of the optimization (gradient descent), it is mostly $\mu_h(c)$ and $j_0(c)$ that are updated while $\ln k(\mathbf{x})$ stays close to 0. Eventually, this leads to an exchange current that is very small, and the optimizer converges to a large objective function value with a large residual gradient.
For the trust-region algorithm, we use forward sensitivity analysis (FSA) to compute the gradient of the objective function and a Gauss-Newton approximation of the Hessian [31,32] (Eq. S107). Fig. S24 shows that with FSA the optimizer converges to a local minimum in about 20 iterations, with a final objective function smaller than with ASA, and that $\mu_h(c)$, $j_0(c)$, and $k(\mathbf{x})$ are updated simultaneously. Therefore, throughout the manuscript, we use the trust-region algorithm for optimization.
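The following toy problem (not the phase-field model; all names and values are placeholders) illustrates the behavior described above: a shared "global" parameter dominates the gradient, and a residual-based trust-region solver with a Gauss-Newton Hessian updates all parameters simultaneously, while a purely gradient-driven quasi-Newton run is pulled mostly along the dominant direction.

```python
import numpy as np
from scipy.optimize import least_squares, minimize

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
data = 2.0 * np.exp(0.3 * t) + 0.05 * (-0.2) + rng.normal(0, 0.01, t.size)

def residuals(p):
    g, z0, z1 = p                    # g: strongly sensed; z1: weakly sensed
    return g * np.exp(z0 * t) + 0.05 * z1 - data

p0 = np.array([1.0, 0.0, 0.0])
fit_tr = least_squares(residuals, p0, method="trf")      # Gauss-Newton Hessian
fit_bfgs = minimize(lambda p: 0.5 * np.sum(residuals(p) ** 2), p0, method="BFGS")
print(fit_tr.x, fit_tr.cost, fit_bfgs.fun)
```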

Interpolation of average composition
As explained in the methods (PDE-constrained optimization), the average Li fraction of each frame is interpolated in time to give the average reaction rate. This section discusses the choice of interpolation method. We consider two methods: spline and shape-preserving piecewise cubic Hermite interpolation (pchip). Both are piecewise cubic polynomials with continuous first derivatives. The former has a continuous second derivative; the latter preserves the monotonicity of the interpolated data, but its second derivative may be discontinuous.
To quantify the interpolation error, we simulate with the nominal parameters used in Section 5.1 in a 1.5 μm × 1.5 μm periodic domain on a uniform 512 × 512 grid. The correlation length is set to 75 nm and $\sigma_k = 0.2$. The reaction rate is constant in time, and the average Li fraction goes from 0.05 to 0.95 or the reverse. The domain is divided into 3 × 3 subdomains of 500 nm × 500 nm, and the average Li fraction $\bar c$ of each subdomain is recorded. Five equally spaced time points from beginning to end are selected, and the average fractions at these points are used as the known data. Time is normalized in all cases so that $\bar t \in [0, 1]$. We compute the interpolation error $\|\Delta\bar c\|_\infty = \max_{\bar t} |\bar c_{\text{interp}}(\bar t) - \bar c_{\text{truth}}(\bar t)|$, where $\bar c_{\text{interp}}$ is the interpolated average fraction based on the five data points and $\bar c_{\text{truth}}$ is the true average fraction from the simulations. We also compare the error in the first derivative $\mathrm{d}\bar c/\mathrm{d}\bar t$, which is the average reaction rate.
We run the simulations using 5 different realizations of the random field. Fig. S25 shows the median and the 25% and 75% quantiles of the error over all subdomains and realizations. The difference in error between spline and pchip is negligibly small. The error increases with decreasing reaction rate, owing to the enhanced tendency to phase separate and hence an increasingly nonlinear $\bar c(t)$ trajectory. Fig. S11 shows the average composition of all particles and episodes (dots) and the interpolated trajectories $\bar c(t)$ (solid lines) using pchip. The data in the same plot correspond to particles in the same batch of experiments and the same charge or discharge episode. Time is synchronized and shifted such that the time of the first frame is 0. The title of each plot is the global rate of charge or discharge. In contrast, Fig. S26 shows the interpolation using splines. Because spline interpolation does not preserve monotonicity, the average composition can overshoot to values beyond [0, 1]; therefore, the spline is performed on $y = \ln\frac{\bar c}{1-\bar c}$ and then transformed back to $\bar c(y(t))$. Significant oscillations are observed. In view of this and the error analysis above, we choose pchip interpolation.
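The two interpolants and the logit-transformed spline are available in SciPy; the sketch below uses placeholder values for the five known frame averages.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator, CubicSpline

t = np.linspace(0.0, 1.0, 5)
c_bar = np.array([0.05, 0.15, 0.55, 0.90, 0.95])   # placeholder frame averages

pchip = PchipInterpolator(t, c_bar)                # monotone, C1
spline = CubicSpline(t, c_bar)                     # C2, but can over/undershoot

tt = np.linspace(0.0, 1.0, 201)
rate = pchip.derivative()(tt)                      # interpolated average reaction rate
print(spline(tt).min(), spline(tt).max())          # spline may leave the data range

# Overshoot beyond [0, 1] can instead be avoided by splining the logit
# y = ln(cbar / (1 - cbar)) and mapping back, at the cost of oscillations
# (Fig. S26).
y = np.log(c_bar / (1.0 - c_bar))
c_logit = 1.0 / (1.0 + np.exp(-CubicSpline(t, y)(tt)))
```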

Function representation
Recall that we consider two representations for $j_0(c)$ (Eqs. S35 and S36). We test the optimization using both, starting from an initial guess of $\beta_i = 0$ ($i = 0, 1, \ldots$) with $n = 5$ for both representations. Regularization on $\ln k(\mathbf{x})$ is imposed ($\lambda_2 = 0.01$; see Section 8.5 and the main text), and the natural boundary condition $\mathbf{n}\cdot\nabla c = 0$ is used. Fig. S27 shows that the MAP estimates of $\mu_h(c)$ and $j_0(c)$ found using the two representations are close, and $j_0(c)$ approaches zero as $c$ approaches 0 or 1 in both cases. Fig. S27 also shows the uncertainty estimated from the Hessian at the MAP using the Gauss-Newton approximation ($\sigma = 0.07$); the shaded region is the 99% confidence interval. The uncertainty of $j_0(c)$ is mostly in its magnitude rather than its functional form, and the MAP of each representation lies within the confidence interval of the other.
Numerically, the second representation is faster than the first because it forces the exchange current to approach 0 near $c = 0$ and $c = 1$, reducing the strong dependence of the reaction rate on $c$ caused by the large slope of $\mu_h(c)$ near $c = 0$ and 1. The squared errors found using the first and second representations are 594.9 and 587.8, and the objective functions $f(\mathbf{p})$ are 301.5 and 297.9; the former is 1.2% higher than the latter.
In summary, the two representations give similar $j_0(c)$, and the second finds a smaller objective function and squared error while being computationally more efficient; the second representation is therefore used in the main text.

Boundary condition
We consider two different boundary conditions: natural and essential. Following the relaxation simulation in Section 3, we assume no wetting when using the natural boundary condition $\mathbf{n}\cdot\nabla c = 0$. When the essential boundary condition is used, we interpolate the data at the particle boundaries in time using pchip, for the same reason as for the interpolation of the average concentration, and impose the time-dependent concentration at the boundaries in the model. Fig. S28 shows the MAP estimates of $\mu_h(c)$ and $j_0(c)$ using either boundary condition (with the second representation for $j_0(c)$, Eq. S36). Regularization on $\ln k(\mathbf{x})$ is imposed ($\lambda_2 = 0.01$; see Section 8.5). The resulting normalized $j_0(c)$ and $\mu_h(c)$ are very close for the two boundary conditions. The difference in the magnitude of $j_0(c)$ is due to the insensitivity of the objective function to $j_0$: a change in the boundary condition or other parameters causes a large shift in $j_0$ (see Section 8.3). Fig. S29 shows two examples of $\ln k(\mathbf{x})$ (mean subtracted); the difference between the boundary conditions is small (much smaller than the uncertainty in $\ln k(\mathbf{x})$, see Section 9.1) and occurs mostly along the boundaries.

Figure S27: The MAP estimates of $\mu_h(c)$ and $j_0(c)$ and the 99% confidence intervals estimated using the Hessian at the MAP. The natural boundary condition is used for the model.

Figure S28: The MAP estimates of $\mu_h(c)$ and $j_0(c)$ using the natural and essential boundary conditions.
Using the same procedure as in Section 6.8, we compare the MAP and uncertainty (estimated using the linear approximation) for the two representations of $j_0(c)$ under the essential boundary condition in Fig. S30. The total SE of the first and second representations is 569.9 and 563.8, and the objective functions $f(\mathbf{p})$ are 288.8 and 285.9; the former is 1.0% higher. The training error using the essential boundary condition is smaller than with the natural boundary condition in Section 6.8. Therefore, in the main text and the rest of the supplemental information, we represent $j_0(c)$ using Eq. S36 and use the essential boundary condition.

Identifiability
This section discusses the identifiability of the model based on simulated data. The simulations are performed on three randomly shaped convex polygons with the nominal parameters used in Section 5.1. The particles have on average 459 pixels and 5 frames. The first particle is cycled at 0.3C discharge and 2C charge, the second at 2C discharge and 0.2C charge, and the third at 0.6C discharge and charge.
Using Hamiltonian Monte Carlo (HMC), Fig. S31 shows that at the current level of pixel error, $\sigma = 0.07$, the posterior distributions of $\mu_h(c)$, $j_0(c)$, and $k(\mathbf{x})$ are distributed around the truth. To improve the chain mixing of HMC, we perform a parameter-space transformation based on the Cholesky decomposition of the Hessian at the truth, $\nabla^2 f = \mathbf{R}^\top\mathbf{R}$; the transformed parameters are $\mathbf{p}' = \mathbf{R}\mathbf{p}$. HMC is performed in the space of $\mathbf{p}'$, which is significantly more efficient than sampling $\mathbf{p}$. The resulting mean step size is 0.0051 with an acceptance ratio of 0.677. Based on the first and last thirds of the chain, the maximum Gelman-Rubin diagnostic is 1.047.

Figure S30: The MAP estimates of $\mu_h(c)$ and $j_0(c)$ and the 99% confidence intervals estimated using the Hessian at the MAP. The essential boundary condition is used for the model.

Next, we perform optimizations starting from different initial guesses. Regularization is not included. The simulated data are created using exactly the same model and solver that the optimizer uses and contain no noise, so ideally the optimizer should recover the exact parameters that generated the data. However, because the DAE solver is adaptive and fully implicit, numerical error and the roughness of the objective-function landscape cause the converged parameters to deviate from the truth. This deviation must be controlled within a tight error comparable to the numerical error of the solver and the tolerance of the optimizer; otherwise, multiple optimal solutions exist and the parameters are nonidentifiable. Fig. S32a shows the sorted final objective function ($f(\mathbf{p}) = \mathrm{RMSE}^2 \sum_i A_i N_{s,i}/2$, where $A_i$ is the particle area and $N_{s,i}$ is the number of frames of particle $i$ excluding the initial frame; note that in Fig. S32 particle sizes are scaled such that the average long-axis length is nondimensionalized to 1), as well as the corresponding RMSE, for 500 randomly drawn initial guesses, with a DAE relative tolerance of $10^{-3}$ and absolute tolerance of $10^{-4}$, optimizer tolerances of $10^{-6}$ and $10^{-9}$, and with $\kappa$ excluded from or included in the optimization. The optimizer terminates when the function tolerance (function difference between consecutive iterations less than Tol·(1+FunVal), where FunVal is the function value of the previous step), the absolute step tolerance, or the absolute optimality tolerance (infinity norm of the gradient at the current step) meets the threshold. When $\kappa$ is set to the truth and not optimized, the final RMSE drops below the dashed line, the absolute tolerance of the DAE solver; the corresponding error in the objective function due to numerical error is estimated to be $6.8\times10^{-8}$ (the square of the absolute error multiplied by the total area and the number of frames, excluding the initial frame). Decreasing the optimizer tolerance decreases the final objective function further, but at an optimizer tolerance of $10^{-9}$ a large proportion of the final objective functions still stays close to the dashed line, suggesting that the optimizer may be limited by the error of the forward-sensitivity solver that computes the gradient and Hessian.
When $\kappa$ is unknown and included as a free parameter, even with an optimizer tolerance of $10^{-9}$, more than 80% of the solutions lie above the DAE absolute error, and the curve is tilted without a plateau, indicating nonidentifiability when $\kappa$ is unknown. Therefore, in this study, we use the literature value for $\kappa$.
Figs. S32b-d show the RMSE of $\mu_h(c)$, $\ln j_0(c)$, and $\ln k(\mathbf{x})$ corresponding to the sorted objective functions in Fig. S32a. The RMSE of an inferred function $g(c)$ is defined by $\big(\int (g(c) - g_{\text{truth}}(c))^2\,\mathrm{d}c\big)^{1/2}$, where $g_{\text{truth}}(c)$ is the known truth. When the RMSE of the concentration field $c(\mathbf{x})$ is $10^{-4}$, the error of the inferred functions is around $10^{-3}$; this is the level set by the DAE solver tolerance. When $\kappa$ is unknown, the error of the inferred functions can greatly exceed this level, confirming their nonidentifiability. Fig. S32e compares the sorted final objective function using high and low DAE solver tolerances: the high tolerance is the same as above, while the low tolerance corresponds to a relative tolerance of $10^{-6}$ and an absolute tolerance of $10^{-9}$, with the dashed line marking the optimizer tolerance. Switching to the lower DAE tolerance (higher accuracy) reduces the objective function, eliminating the region limited by solver accuracy. Fig. S32f shows the corresponding RMSE of the inferred functions with the lower DAE tolerance.
In comparison with the spatial-accuracy results in Section 6.5, a particle of 300 STXM pixels with the discretization used above has a maximum MSE of $3.5\times10^{-3}$. Therefore, a large percentage of the optimizer runs can be considered to have found the truth.
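The Cholesky-based preconditioning used for HMC above can be sketched as follows, assuming a symmetric positive-definite Hessian $H$ at the truth (or MAP); the helper names are illustrative.

```python
import numpy as np

def whiten(H):
    """Cholesky-based preconditioning: with H = R^T R (R upper triangular),
    sampling is performed in p' = R p, where the local posterior covariance
    is approximately the identity, so HMC mixes efficiently."""
    R = np.linalg.cholesky(H).T                     # upper-triangular factor
    to_p_prime = lambda p: R @ p
    to_p = lambda q: np.linalg.solve(R, q)
    # Gradients transform as grad' = R^{-T} grad.
    grad_p_prime = lambda g: np.linalg.solve(R.T, g)
    return to_p_prime, to_p, grad_p_prime
```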

Cross-validation
As mentioned in the main text, regularization (especially of the spatial heterogeneity) is essential to prevent overfitting. This section discusses in detail the effect of regularization, the sensitivities of the parameters, and the cross-validation used to determine the optimal regularization coefficient.

Regularization
In this section, we use simulated data to study the effect of regularization on the global parameters $\mathbf{p}_{\text{global}}$ (controlled by $\lambda_1$) and the heterogeneity parameters $\mathbf{Z}_i$ (controlled by $\lambda_2$). We use the same simulated datasets as in Section 7, with added noise corresponding to a pixel-wise error $\sigma = 0.07$, consistent with the experimental images. We perform the regularization test on the three particles above, each having two episodes at 0.2C (charge and discharge), 0.3C, and 0.6C, and use the same particles at 2C (charge or discharge) as the validation dataset. Fig. S33 shows the validation error (RMSE) as a function of $\lambda_1$ and $\lambda_2$. The validation error shows a minimum near $\lambda_1 = \lambda_2 = 10^{-4}$; however, the minimum is not sensitive to $\lambda_1$. This is also illustrated in Fig. S33b, which shows a negligible change in the validation error as a function of $\lambda_1$ when $\lambda_2 = 0$; by contrast, Fig. S33c shows a significant reduction of the validation error at its minimum when $\lambda_2$ is varied at $\lambda_1 = 0$. Correspondingly, in Fig. S33d, the $L_2$ norm of the error of $\ln k(\mathbf{x})$ also shows a minimum around the $\lambda_2$ that minimizes the validation error. In contrast, as shown in Fig. S33e, the $L_2$ norms of the errors of $\mu_h(c)$ and $\ln j_0(c)$ show negligible or no decrease as $\lambda_2$ varies; instead, they increase sharply when $\lambda_2 > 10^{-4}$.

Variance and bias of MAP estimate
Understanding the variance and bias of the MAP estimate under regularization is important for cross-validation. In this section, we use simulated data generated as in Section 7 to study the scaling of the variance and bias of the MAP estimates of the global parameters $\mathbf{p}_{\text{global}}$ for $\mu_h(c)$ and $\ln j_0(c)$ with respect to the number of particles. The variance and bias are computed using the linear approximation of the model, explained further in Section 9.1. For the variance of the MAP $\mu_h(c)$ given the dataset (Eq. S134), we plot its $L_2$ norm (std) as a function of the number of particles used to obtain the MAP at $\lambda_2 = 10^{-4}$ in Fig. S34, and similarly for $\ln j_0(c)$. Note that the normalization condition on $\mathbf{Z}_i$ is imposed. The square roots of the variances of $\mu_h(c)$ and $\ln j_0(c)$ scale as $N^{-1/2}$, where $N$ is the number of particles. The bias of $\mu_h(c)$ is defined as the $L_2$ norm of the deviation of the expectation of the MAP estimate, $\mathbb{E}[\hat\mu_h(c)]$, from the truth; again, the expectation is estimated using the linear approximation of the model, and the bias of $\ln j_0(c)$ is defined analogously. In Fig. S35a we plot the bias as a function of the number of particles at $\lambda_2 = 10^{-4}$. The bias decays toward a finite, nonzero value as the number of particles increases. Fig. S35b shows that the bias increases with increasing regularization coefficient $\lambda_2$ when the number of particles is 50. In summary, the bias of the global parameters increases with increasing regularization of the spatial heterogeneity ($\lambda_2$), regardless of the number of particles. Note that this is a consequence of the heterogeneity being a multiplicative prefactor: in a linear mixed model, where the random effect is additive, an unbiased estimator of the fixed effect (which corresponds to the global parameters in our nonlinear model) can be obtained regardless of the regularization on the random effect [74].
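A minimal sketch of the linear-approximation covariance used here, assuming a stacked residual Jacobian $J$ (from forward sensitivity analysis) and Tikhonov-type regularization with weight lam2; because stacking $N$ particles grows $J^\top J$ roughly linearly in $N$, the implied standard deviation scales as $N^{-1/2}$.

```python
import numpy as np

def map_covariance(J, sigma=0.07, lam2=1e-4):
    """Sandwich covariance of a regularized least-squares MAP estimate:
    Cov ≈ sigma^2 (J^T J + lam2 I)^{-1} (J^T J) (J^T J + lam2 I)^{-1}."""
    JtJ = J.T @ J
    A = np.linalg.inv(JtJ + lam2 * np.eye(J.shape[1]))
    return sigma ** 2 * A @ JtJ @ A
```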

Sensitivity analysis of spatial heterogeneity
In choosing the prior for the spatial heterogeneity $k(\mathbf{x})$, we assume that the same prior distribution for the random field $k(\mathbf{x})$ is shared by the entire ensemble of particles. This is because, for some particles, especially when the concentration field is nearly uniform, the low-order components of $k(\mathbf{x})$ cannot be identified.
First, we illustrate this claim by analyzing synthetic datasets generated using the parameters in Section 5.1 with additive noise. We compute the squared error SE for different datasets as a function of $Z_0$, with all other parameters fixed at the truth. Fig. S36 shows that for $Z_{0,\text{truth}} = 0$ at an average reaction rate of 0.3C, there is a distinct local minimum of SE around $Z_0 = Z_{0,\text{truth}}$ (noise is added to the synthetic dataset in this case, so SE ≠ 0 at $Z_0 = Z_{0,\text{truth}}$). However, for a noiseless dataset with $Z_{0,\text{truth}} = -1$ at an average reaction rate of 2C, the local minimum of SE at $Z_0 = Z_{0,\text{truth}}$ is very shallow, and when noise is added, the local minimum no longer exists; the optimal value of $Z_0$ can therefore be unbounded. Next, we show that this phenomenon also occurs in the experimental datasets. Fig. S37 plots the squared error $\mathrm{SE}_i$ and $\mathrm{SE}_i/2 + \lambda_2 Z_{i,0}^2/2$ as functions of $Z_{i,0}$ for each particle around the MAP, with all other parameters fixed. For some particles there exists a minimum squared error, while for others the squared error is monotonic within the bounds. Adding the regularization ensures that a local minimum exists and prevents $Z_0$ from becoming too large or too small. The same insensitivity may occur for other low-order parameters. Therefore, it is important to impose a prior on the spatial heterogeneity shared by all particles.

Regularization of 0 and higher order parameters
Given the insensitivity with respect to $Z_0$, should we impose a different regularization coefficient on $Z_0$ than on $Z_{\neq 0}$? From a Bayesian perspective, the prior variances of $Z_0$ and $Z_{\neq 0}$, $\sigma_0^2$ and $\sigma^2$ respectively, may in general differ; correspondingly, the regularization coefficients for $Z_0$ ($\lambda_{2,Z_0}$) and $Z_{\neq 0}$ ($\lambda_{2,Z}$) can differ.
To answer this question, we vary $\lambda_{2,Z_0}$ and $\lambda_{2,Z}$ independently. $j_0(c)$ and $\mu_h(c)$ are fixed at the MAP obtained with $\lambda_{2,Z_0} = \lambda_{2,Z} = 0.01$, so the optimization of $\mathbf{Z}_i$ for each particle can be done independently; the objective function is minimized over $Z_{i,j}$, the $j$th coefficient of particle $i$. Particles with three episodes are used for the validation study, with the first episode (largest absolute C-rate) chosen for validation. Fig. S38 shows that while $\lambda_{2,Z}$ has the larger influence on the validation error, the validation error also shows a local minimum as a function of $\lambda_{2,Z_0}$ at a given $\lambda_{2,Z}$; the minimum validation error is found at $\lambda_{2,Z} = 5$ and $\lambda_{2,Z_0} = 3$. In addition, increasing $\lambda_{2,Z}$ mostly decreases the intraparticle variance of $k(\mathbf{x})$, $V_2$, while increasing $\lambda_{2,Z_0}$ mostly decreases the interparticle variation $V_3$. At small $\lambda_{2,Z}$, however, this reduction of $V_3$ is limited: as $\lambda_{2,Z_0}$ increases, $Z_{\neq 0}$ can adjust to compensate for the decreasing $|Z_0|$ such that the particle mean of $\ln k$ barely changes, so $V_3$ decreases only slightly.

Figure S38: The validation error, intraparticle variation $V_2^{1/2}$, and interparticle variation $V_3^{1/2}$ of the validation dataset as $\lambda_{2,Z_0}$ and $\lambda_{2,Z}$, the regularization coefficients on $Z_0$ and $Z_{\neq 0}$, respectively, are varied independently.

Therefore, regularization of both $Z_0$ and $Z_{\neq 0}$ is important; for simplicity, in the following we consider only the case where the two coefficients are equal ($\lambda_{2,Z_0} = \lambda_{2,Z}$).

Method of regularization
Based on the results in Section 8.1, we set $\lambda_1 = 0$ in this section to reduce the bias of the inferred $\mu_h(c)$ and $j_0(c)$. In Section 8.1, using synthetic datasets, we showed that the reduction in the error of $\mu_h(c)$ and $j_0(c)$ from regularization ($\lambda_2$) is much smaller than that of $k(\mathbf{x})$, and it is expected to be even smaller with a larger number of particles. Notice that in Fig. S33(c-e), all parameters are updated as $\lambda_2$ varies. With the same synthetic datasets, we now fix $\mu_h(c)$ and $j_0(c)$ at the MAP solution for $\lambda_2 = 0$, vary $\lambda_2$, optimize with respect to $\mathbf{Z}_i$ only, and compute the validation error. Comparison with Fig. S33(c,d) shows that this alternative approach yields a similar optimal $\lambda_2$, a similar reduction of the validation error, and a similar reduction of the $L_2$ norm of the error of $\ln k(\mathbf{x})$ (Fig. S39). Therefore, this section compares these two methods of validation on the experimental datasets. In the first approach, $\mu_h(c)$ and $j_0(c)$ are free to change as $\lambda_2$ changes; the optimization is over the full parameter vector $\mathbf{p} = [\mathbf{p}_{\text{global}}, \mathbf{Z}_1, \ldots, \mathbf{Z}_N]$, with the normalization condition imposed on the spatial heterogeneity. All particles are included in the optimization and must be evaluated in parallel.
In this section, we split the entire dataset into training and validation sets: for particles with three episodes, the first episode with the largest absolute C-rate is chosen for validation, while all other episodes, and particles with fewer than three episodes, are used for training.
The second approach fixes $\mu_h(c)$ and $j_0(c)$ at the MAP obtained with a small regularization coefficient, $\lambda_2 = 0.01$, to reduce their bias; that is, $\mathbf{p}_{\text{global}}$ is the global-parameter part of $\arg\min_{\mathbf{p}} f(\mathbf{p}; \lambda_2 = 0.01)$, where the objective function includes all datasets. As $\lambda_2$ varies, since the global parameters are fixed, the particle squared error $\mathrm{SE}_i$ depends only on $\mathbf{Z}_i$, so the optimization is solved independently for each particle. The same training and validation datasets are used as in the first approach. Clearly, the normalization condition cannot be imposed on individual particles, so the constraint is removed and the normalization is re-applied after the optimization, once $\ln k(\mathbf{x})$ is obtained for all particles. Fig. S40(a,b) reports the training and validation RMSE. The training error increases with increasing $\lambda_2$, while the validation error shows a local minimum for both approaches (the training and validation errors do not include the regularization term). The training error is lower when $\mu_h(c)$ and $j_0(c)$ are free, since the optimization has more degrees of freedom; in contrast, the validation error is higher when they are free, suggesting a larger error in $\mathbf{p}$ when the global parameters are free during the regularization-coefficient sweep. Fig. S40c shows that the two approaches produce similar variances of the spatial heterogeneity, which decrease with increasing $\lambda_2$; the decrease in the interparticle variation is more significant than in the intraparticle variation, consistent with the insensitivity of the objective function to $Z_0$. Fig. S40 also compares $\mu_h(c)$ and $j_0(c)$ at the optimal $\lambda_2$, where the validation error is minimized.
Since fixing $\mu_h(c)$ and $j_0(c)$ during the sweep of the regularization coefficient yields a smaller validation error, we use this method in the subsequent discussion and in the main text.

k-fold cross-validation
In the previous validation, we chose one particular partition into training and validation datasets; in this section, we explore different combinations and perform k-fold cross-validation. Since $\mu_h(c)$ and $j_0(c)$ are fixed during the sweep of the regularization coefficient, each particle can be simulated, optimized with respect to its spatial heterogeneity, and evaluated on the validation set independently.
Within the particles that have three episodes, let $\mathrm{SE}_{i,j}$ denote the squared error of the $j$th episode when it is used as the validation set of particle $i$ while the other episodes are used for training. A 3-fold cross-validation can thus be performed, yielding a validation error for $j = 1, 2, 3$. Suppose episode $V(i)$ is chosen for validation for particle $i$; we define the validation MSE as the sum of $\mathrm{SE}_{i,V(i)}$ over particles with three episodes, normalized by the total number of validation pixels, where $V(i) = 1, 2,$ or 3, $A_i$ is the number of pixels in particle $i$, and $N_{i,j}$ is the number of images in the $j$th episode of particle $i$. We compute the MSE for all possible assignments $V$ (3 raised to the power of the number of particles with three episodes, or $3^6$ possibilities in our study) and define the sample mean $\mathrm{MSE}_V$ and sample variance $\sigma^2(\mathrm{MSE}_V)$ over all combinations. Fig. S41 plots $\mathrm{MSE}_V \pm \sigma(\mathrm{MSE}_V)$ as a function of the regularization coefficient $\lambda_2$. Since the validation error is estimated with uncertainty, instead of choosing the $\lambda_2$ with the smallest mean validation error, we can use the one-standard-error rule, choosing the smallest $\lambda_2$ whose mean validation error is within one standard deviation of the minimum [54], which gives $\lambda_2 = 0.88$. We can evaluate the performance of the model by analyzing the results of the 3-fold cross-validation. Fig. S42 shows the validation error curve for each particle, including the validation RMSE of each of the three folds, $\mathrm{RMSE}_{i,j}$ (with the $j$th episode as the validation dataset), and the average validation RMSE of each particle, defined by the square root of the mean of $\mathrm{SE}_{i,j}$ over the three folds (normalized by pixels and frames).

Figure S40: A comparison of (a) training RMSE, (b) validation RMSE, and (c) $\ln k$ statistics as functions of the regularization coefficient $\lambda_2$ when $\mu_h(c)$ and $j_0(c)$ are free to change and when they are fixed during the $\lambda_2$ sweep. (d) A comparison of $\mu_h(c)$ and $j_0(c)$ fixed at the MAP with $\lambda_2 = 0.01$ and all data included ($j_0(c)$ rescaled so that the normalization condition is satisfied at the optimal $\lambda_2$), and $\mu_h(c)$ and $j_0(c)$ corresponding to the minimum validation error when they are free to change.
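A sketch of the fold-combination enumeration and the selection rule, with hypothetical containers: se[i][j, k] holds the validation SE of particle i, episode j, at lam_grid[k], and npix_frames[i][j] the corresponding pixel-frame count; lam_grid is assumed sorted in ascending order.

```python
import numpy as np
from itertools import product

def choose_lambda(lam_grid, se, npix_frames):
    """Enumerate all 3^P validation assignments V, compute MSE_V(lambda) for
    each, then pick the smallest lambda whose mean validation error is within
    one standard error of the minimum [54]."""
    P = len(se)
    combos = list(product(range(3), repeat=P))          # 3^P assignments
    mse = np.empty((len(combos), len(lam_grid)))
    for m, V in enumerate(combos):
        num = sum(se[i][V[i]] for i in range(P))        # per-lambda summed SE
        den = sum(npix_frames[i][V[i]] for i in range(P))
        mse[m] = num / den
    mean = mse.mean(axis=0)
    std = mse.std(axis=0, ddof=1)
    k_min = int(mean.argmin())
    ok = mean <= mean[k_min] + std[k_min]
    return lam_grid[np.flatnonzero(ok)[0]]              # smallest admissible lambda
```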
The average validation errors of particles 1, 2, 3, 5, and 6 show clearly defined local minima, with corresponding optimal $\lambda_2$ of 15, 0.7, 2, 6, and 0.7. The validation error curves of particle 4, and $\mathrm{RMSE}_{1,1}$ of particle 1, have no well-defined local minimum; they decrease with increasing $\lambda_2$ and approach the asymptotic limit as $\lambda_2 \to \infty$ (indicated by the dashed lines), which corresponds to no spatial heterogeneity, $\ln k(\mathbf{x}) = 0$. This suggests that the model does not generalize to all episodes of particle 4 and to episode 1 of particle 1; in other words, the dataset contains particles and episodes whose dynamics the current model does not fully account for.
By examining the centered $\ln k(\mathbf{x})$ (mean subtracted) from each of the three folds (Fig. S43) at the same regularization coefficient ($\lambda_2 = 0.88$), we find that particles 2, 3, 5, and 6 show strong pointwise correlation among all three $\ln k(\mathbf{x})$ estimates. This consistency indicates that the physical model can explain the data under different charge and discharge rates; accordingly, all three validation errors increase at large enough $\lambda_2$.
The other way of evaluating the validation error is to take the average MSE over all particles and k-fold validation tests, i.e., a weighted squared error (WSE). We exclude validation error curves that decrease monotonically within the range of $\lambda_2$ studied, give all curves the same weight, and compute the WSE. In addition to the 3-fold validation performed earlier on particles with three episodes, we also perform 2-fold validation on particles with two episodes (one episode for training and the other for validation, and vice versa). In Fig. S44, the minimum 3-fold and 2-fold validation WSE is found at $\lambda_2 = 2$. Comparing this optimum with the previously determined $\lambda_2 = 0.88$, we choose the lower value, since $\lambda_2 = 0.88$ produces a validation error in Fig. S44 only slightly larger than that at $\lambda_2 = 2$.

Figure S42: Top: the RMSE of each of the three folds, $\mathrm{RMSE}_{i,j}$, for particles $i = 1, \ldots, 6$. Bottom: the average validation RMSE of each particle. The dashed lines are the asymptotic limit at $\lambda_2 = \infty$, i.e., no spatial heterogeneity ($\ln k(\mathbf{x}) = 0$).

Figure S43: Pointwise correlation plots of the centered $\ln k(\mathbf{x})$ from the k-fold validation for particles 2, 3, 5, and 6 at $\lambda_2 = 0.7$. The horizontal and vertical axis labels indicate which dataset is used for validation; the other two are used for training.

For uncertainty quantification by Hamiltonian Monte Carlo, we rank particles and episodes by their model evaluation time and choose 21 particles (30 episodes in total) that are faster to evaluate, ensuring efficient use of all workers by reducing their idle time so that the Monte Carlo sampling can be completed in a reasonable amount of time. In total, we obtain a chain of 53,000 samples. The normalization constraint is imposed, and the parameter space is transformed based on the Cholesky decomposition of the Hessian at the MAP; this preconditioning leads to significantly more efficient chain mixing. The MAP solution at $\lambda_2 = 0.01$ corresponds to the previously estimated prior variance of $k(\mathbf{x})$ at $\sigma_k = 0.7$. Fig. S45a shows the 99% confidence intervals of $\mu_h(c)$, $j_0(c)$, and $j_0(c)/j_{0,\max}$ from the HMC chain of $5.3\times10^4$ samples on the subset of particles. We also show the standard deviation of the centered spatial heterogeneity ($k(\mathbf{x}) - \bar k$) for each particle (Fig. S45c). The HMC results are then compared with the uncertainty and posterior mean estimated using the linear approximation on the same subset of particles (Fig. S45b,d). The magnitude of the uncertainty of the intraparticle variation is well captured by the linear approximation, which, however, underestimates the uncertainty of $j_0(c)$ compared with HMC.
The average standard deviation of $\ln k(\mathbf{x})$ over all particles is 0.24, much larger than the variation due to the difference in boundary condition or mesh size, justifying the choices of boundary condition and mesh size made in Sections 6.9 and 6.5.
There is a small difference between the posterior means of $\mu_h(c)$ and $j_0(c)$ from HMC on a subset of particles and the MAP based on the entire dataset shown in Fig. S40, which is due to the presence of particles and episodes that are not completely accounted for by the model. Hence, the heterogeneity of the dataset may also contribute to the uncertainty of the inferred models. For this reason, we next use bootstrapping to estimate the uncertainty of the MAP estimators of $\mu_h(c)$ and $j_0(c)$.
Specifically, we sample $N$ times from all particles ($N = 470$) with replacement. Images from different particles can be assumed independent, so the noise of each particle satisfies the i.i.d. condition for bootstrapping. For each bootstrap sample, optimization with regularization ($\lambda_2 = 0.01$) is performed to find the MAP. We do not use the optimal $\lambda_2$ here because we are interested in the uncertainty of $\mu_h(c)$ and $j_0(c)$, whose MAPs were obtained at $\lambda_2 = 0.01$ in the previous discussions. Some particles may be drawn $m$ times, in which case their squared errors are multiplied by $m$ in the objective function. The ensemble of MAPs serves as an estimate of the uncertainty of $\mu_h(c)$ and $j_0(c)$ [33]. Fig. S46a shows the MAPs of all samples, and Fig. S46b shows the 99% confidence interval from this ensemble. There is a significant spread in the magnitude of $j_0(c)$, due to the insensitivity to its magnitude explained above, but the normalized $j_0(c)/j_{0,\max}$ is consistent among samples. Fig. S46c shows the histogram of the training RMSE.
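A sketch of the resampling loop, assuming a hypothetical helper fit_map(weights) that minimizes the weighted objective $\sum_i m_i \mathrm{SE}_i/2$ plus the regularization term.

```python
import numpy as np

def bootstrap_maps(n_particles, n_boot, fit_map, rng=None):
    """Resample particle indices with replacement, weight each particle's
    squared error by its multiplicity m_i, and refit the MAP; the spread of
    the refitted MAPs estimates the uncertainty of mu_h(c) and j_0(c) [33]."""
    rng = rng or np.random.default_rng(0)
    maps = []
    for _ in range(n_boot):
        draw = rng.integers(0, n_particles, size=n_particles)
        weights = np.bincount(draw, minlength=n_particles)   # multiplicity m_i
        maps.append(fit_map(weights))
    return maps
```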
We remark that the uncertainties of $\mu_h(c)$ and $j_0(c)$ are reduced significantly compared with those inferred from the uniformity coefficient (Fig. S14).

Comparison of spatial heterogeneity estimated using finite difference and inversion
We relax the criteria used in Section 4.2 for finding pairs of images eligible for a finite-difference (FD) estimate of $k(\mathbf{x}) - \bar k$, in order to have a particle for which all three episodes are eligible. The second criterion is relaxed to: the larger variance of the two frames must be smaller than 0.04. In Fig. S47a, we highlight the pairs chosen for the FD estimates by coloring their frame edges with the same color; the FD estimates of $k(\mathbf{x}) - \bar k$ are plotted next to the images, and the scatter plots for all three pairs are shown in Fig. S47b. The correlation between the estimates based on the three episodes is poor, and the range of $\ln k(\mathbf{x})$ from the second episode is much larger than from the other two.
$k(\mathbf{x}) - \bar k$ from the full inversion using the optimal regularization coefficient $\lambda_2 = 0.88$ is also shown in Fig. S47a next to the FD estimates. The label indicates which dataset is used for training; for example, (1,2) means trained on episodes 1 and 2. Comparing the FD estimate and the inversion result, the pattern and magnitude of spatial heterogeneity revealed by the FD estimate of episode 1 are mostly consistent with the inversion, except in a small region near the left boundary. In contrast to the poor correlation of the FD estimates, we find a remarkably good correlation and match between each pair of $k(\mathbf{x}) - \bar k$ maps from the full inversion using different training datasets (Fig. S47c,d).
These results suggest that PDE-constrained optimization with regularization may be needed to obtain a spatial heterogeneity that is consistent across episodes. Fig. S48 shows an example of a particle with two episodes. The correlation between the $k(\mathbf{x}) - \bar k$ maps trained on the two episodes improves significantly with the full inversion relative to FD. The optimization reveals details of $\ln k(\mathbf{x})$ that rely on the entire image sequence, which the FD estimate cannot capture owing to nonlinearity and the constraints on the frames that can be chosen: the full inversion based on either episode reveals the two small kinetically fast regions in the left half of the particle, which FD misses, and FD also fails to identify the right half of the particle as a kinetically fast region, finding only the isolated island in the middle.

Comparison with Auger electron microscopy (AEM)
The model assumes that the rate of change of the local composition is $k(\mathbf{x})$ multiplied by the local surface reaction rate given by Butler-Volmer kinetics. Because the local Li fraction is depth-averaged, the spatial heterogeneity arises both from the nonuniform thickness of the particle and from heterogeneity in the surface reaction rate. Based on the depth-averaged model and the reaction occurring on the top and bottom surfaces of the LFP nanoparticles, the spatial heterogeneity is $k(\mathbf{x}) = k_s(\mathbf{x})/h(\mathbf{x})$, where $k_s(\mathbf{x})$ is the kinetic prefactor of the surface reaction; a factor of 2 is absorbed into $h(\mathbf{x})$ because we can only measure its relative magnitude. The particle thickness is proportional to the optical density $\mathrm{OD}(\mathbf{x})$ from STXM, and we average the optical density over all snapshots to define the nondimensional thickness $h(\mathbf{x}) = \mathrm{OD}(\mathbf{x})/\overline{\mathrm{OD}}$, where $\overline{\mathrm{OD}}$ is the spatially averaged optical density of the particle. As explained in Sec. 1.1, the LFP particles are coated with carbon, which may be nonuniform owing to particle-to-particle contact and local gas flow during the coating process. We confirm this with AEM, which measures the signal from surface carbon and reveals that the carbon distribution on the surface is indeed heterogeneous. Next, we compare the inverted $k(\mathbf{x})$ with the AEM carbon signal in order to understand the origin of the spatial heterogeneity.
It is known in the literature that while pristine LFP without a carbon coating has poor rate capability owing to its low electronic conductivity, a coating that is too thick also reduces the reversible capacity, owing to hindered electrolyte penetration and increased resistance to Li⁺ transport between the electrolyte and LFP, as revealed by the charge-transfer resistance [43,44,45,46,75]. There therefore exists an optimal carbon-coating thickness for electrochemical performance, which, depending on the coating method and precursor, is around 1-2 nm [43], 5 nm [44], or 4.2 nm [45]; in these studies, the rate capability decreased at thicknesses of 2-3 nm, 8 nm, and 14 nm, respectively. Other studies also found that a small amount of carbon can increase the rate capability significantly [46,75]. These results suggest that the kinetics of Li insertion and de-insertion is very sensitive to the carbon-coating thickness. We therefore compare $k(\mathbf{x})$ with the AEM carbon images, as shown in Fig. 3b of the main text. The "islands" of kinetically fast regions correspond well to regions with thin carbon coating. Fig. 3c plots the pixel-wise comparison of $k(\mathbf{x})$ and the AEM carbon image intensity after centering (shifting by their respective means) and scaling by their respective standard deviations (the z-score). The correlation coefficient is −0.4. The results show that the rate heterogeneity is well correlated with carbon thickness and that a thicker carbon coating slows down the reaction, suggesting that the observed carbon-coating thickness is likely greater than the optimum for rate capability.
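The z-score comparison amounts to a few lines; the function name and arguments below are illustrative.

```python
import numpy as np

def zscore_correlation(k_field, aem_intensity, mask):
    """Pixel-wise comparison of the inverted k(x) and the AEM carbon signal:
    center and scale each field inside the particle mask (z-score), then
    report the Pearson correlation coefficient (about -0.4 in Fig. 3c)."""
    a = k_field[mask]
    b = aem_intensity[mask]
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return np.corrcoef(za, zb)[0, 1]
```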
Here, we note that Ref. [76] found that in large LFP particles (1-5 μm or larger), the carbon-coating process with various carbon precursors leads to the formation of a secondary lithium-deficient phase, Fe₂P₂O₇, which is electrochemically inactive. A nonuniform distribution of such a secondary phase on the LFP particle surface (which is observed in Ref. [76]) may also lead to heterogeneity in the reaction kinetics. However, the authors have shown that

Figure S48: (a) All frames and the estimated spatial heterogeneity of a particle with two episodes. Frames chosen for the FD estimate of $\ln k(\mathbf{x})$ are highlighted, with the same pair highlighted in the same color. On the right are the FD estimate for each episode and the inversion result at $\lambda_2 = 0.88$; the first and second rows are trained on episodes 1 and 2, respectively. (b) Scatter plots of each pair of FD and full-inversion estimates. Diagonal dashed lines are $y = x$.

Table S3: List of optimal RMSE and RMSEV using different models and objective functions. $j_0(c)$, $\mu_h(c)$, and $k(\mathbf{x})$ may either be specified or optimized depending on the chosen objective function. "lit." refers to the model in the literature (Section 2.3).

Comparison of different models
We define the error in the variance of the concentration field as follows: let $V_{i,j}$ and $V_{\text{data},i,j}$ denote the model-predicted and experimental variances of the concentration field of the $j$th episode of particle $i$, where the squared $L_2$ norm sums the squared error over all frames of the episode; analogously to the RMSE, we define the root mean squared error of the variance, RMSEV (Eq. S141). The largest reduction in the RMSE (compare (a) with (f) or (g) in Table S3) comes from introducing spatial heterogeneity, i.e., optimizing $\mu_h(c)$, $j_0(c)$, and $k(\mathbf{x})$ simultaneously. When there is no spatial heterogeneity, using the optimal $\mu_h(c)$ and $j_0(c)$ from the full optimization captures the trend in uniformity better than the symmetric model and yields a lower error in the variance of the concentration field (RMSEV), even though the RMSE does not differ greatly (compare (a) and (b)). We demonstrate this visually with a few examples in Fig. S50, comparing cases (a), (b), and (g). The baseline (a) produces the most uniform concentration fields, so both RMSE and RMSEV are large; with the optimal $\mu_h(c)$ and $j_0(c)$ (b), the variance of the concentration is much closer to the data, but the details of the simulated patterns may differ; including spatial heterogeneity (g) decreases both RMSE and RMSEV further and achieves closer pixel-to-pixel agreement between model and data.
Without spatial heterogeneity, optimizing the SE can decrease the RMSE only slightly (compare (a) and (c)). This case is equivalent to setting the regularization coefficient for $\ln k(\mathbf{x})$, $\lambda_2$, to infinity.
Alternatively, when we set the objective function to RMSEV, the error in the variance of the concentration field, which measures the heterogeneity of the concentration field and is known to be largely influenced by the forms of $j_0(c)$ and $\mu_h(c)$, the resulting RMSEV is significantly reduced compared with using the pixel-wise error in the objective function (compare (c) and (d)).
Comparing the results when spatial heterogeneity is included in the optimization, we find that using the optimal $\mu_h(c)$ and $j_0(c)$ (all three functions optimized simultaneously) yields a smaller RMSE than fixing $\mu_h(c)$ and $j_0(c)$ at the baseline and optimizing $k(\mathbf{x})$ alone (compare (e) with (f) or (g)). More notably, the former approach leads to a larger percentage decrease in RMSEV than in RMSE, again reflecting the importance of $\mu_h(c)$ and $j_0(c)$ in determining the uniformity of the concentration field.

Figure S49: Comparison of the RMSE and RMSEV of selected methods in Table S3.

Reaction kinetics
In this section, we briefly describe the coupled-ion-electron transfer (CIET) theory [29,30] (Eq. S142), where $\lambda$ is the reorganization energy of the solid host, set to $\lambda = 8.3\,k_\mathrm{B}T$ based on a previous estimate [29], and $\eta_f$ is the formal overpotential (Eq. S145), which involves the Li⁺ surface activity $\tilde a_+$ and a contribution from the ionic resistance of the carbon coating.

Figure S50: Comparison of experimental (data) and simulated Li concentration fields for three methods (a, b, g) in Table S3 for key frames of three selected episodes. The methods are (using the numbering in Table S3): (a) the baseline model without spatial heterogeneity, (b) $j_0(c)$ and $\mu_h(c)$ from the MAP without spatial heterogeneity, and (g) the inverted model with spatial heterogeneity included. The standard deviation of the concentration field (std = $V^{1/2}$) of all frames (from data and models) is shown below each episode. The RMSE and RMSEV of these three models over all episodes (including those not shown) are shown in the bar graph.
Here, the resistance term in Eq. S145 is the ionic resistance of the carbon coating on the surface, which can be related to the surface heterogeneity $k(\mathbf{x})$; we focus only on the non-spatially-dependent reaction kinetics and hence set it to 0. Assuming surface adsorption of Li⁺ is fast compared with CIET, the surface coverage is related to the bulk electrolyte via the Langmuir isotherm for non-interacting surface sites, $\tilde a_+ = \dfrac{a_+ e^{-E_+/k_\mathrm{B}T}}{1 + a_+ e^{-E_+/k_\mathrm{B}T}}$ (S146), where $a_+$ is the activity of Li⁺ in the electrolyte and $E_+$ is the Li⁺ surface adsorption energy. Ref. [35] derived a simple formula that accurately approximates the full expression in Eq. S142; for simplicity of notation, we define the normalized reorganization energy $\tilde\lambda = \lambda/(k_\mathrm{B}T)$ and the normalized formal overpotential $\tilde\eta_f = e\eta_f/(k_\mathrm{B}T)$. From Eq. S147, we obtain the exchange current predicted by CIET theory at low overpotential ($\tilde\eta_f \to 0$), shown in Eq. 3 of the main text. At low overpotential, $i \to -i_{0,\mathrm{CIET}}(c)\,\tilde\eta_f$, which can be compared directly with Butler-Volmer (BV) kinetics at low overpotential, since Eq. S8 gives $i \to -i_0(c)\,\tilde\eta$. Fig. 3a in the main text compares $i_{0,\mathrm{CIET}}(c)$ with the $i_0(c)$ from the inversion.
Next, we compare CIET with the inferred reaction kinetics based on BV at higher overpotential. Because the dependence of the CIET reaction rate on $c$ and $\eta$ is non-separable, in order to compare with Butler-Volmer (BV) kinetics we first need to quantify the distribution of $\eta$. Fig. S51 shows the distribution of the overpotential over all frames and pixel locations when trained using all datasets at the optimal regularization coefficient $\lambda_2 = 0.88$, as well as the distribution for each episode. The standard deviation is $2.7\,k_BT/e$, which is consistent with the estimate of the overpotential in Ref. [1] ($2.5\,k_BT/e$) that was used to compare an experimental estimate of the reaction rate with the CIET prediction in Ref. [29].

Figure S51: Histogram of the overpotential (a) over all frames and pixel locations and (b) for each episode, when trained using all datasets at the optimal regularization coefficient $\lambda_2 = 0.88$. The label is the mean overpotential of the episode.
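The quoted spread can be reproduced from the per-pixel overpotential fields as follows (a sketch; the per-episode container is a hypothetical structure):

```python
import numpy as np

def overpotential_stats(eta_by_episode):
    """eta_by_episode: list of arrays, one per episode, each holding the
    overpotential (in kBT/e) at all frames and pixel locations (Fig. S51)."""
    pooled = np.concatenate([e.ravel() for e in eta_by_episode])
    episode_means = [float(e.mean()) for e in eta_by_episode]  # labels in Fig. S51b
    return pooled.std(), episode_means  # global std is ~2.7 kBT/e in the text
```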
We first compare the CIET and BV reaction rates at various overpotentials by assuming a surface coverage $\tilde{c}_+ = 1$ (which corresponds to a strong affinity of Li$^+$ to the surface). Fig. S52 compares $j(c,\eta)$ between CIET and BV using the inverted $j_0(c)$, as well as $|j(c,\eta)|/\max_c|j(c,\eta)|$, because we are more interested in the normalized reaction rate: the magnitude of $j_0(c)$ cannot be accurately inferred. For BV, $|j(c,\eta)|/\max_c|j(c,\eta)| = j_0(c)/j_{0,\max}$. We find that the normalized reaction rate predicted by CIET for $\eta$ in the range $[-5, 5]\,k_BT/e$ lies mostly within the 99% confidence interval of the normalized reaction rate inferred from the images.
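For the BV side of this comparison, the rate and its normalization can be written as follows (a sketch; `j0` stands for the inferred MAP exchange-current curve, and the sign follows the low-overpotential convention $j \to -j_0(c)\,\tilde\eta$ used above):

```python
import numpy as np

def j_bv(c, eta, j0):
    """Butler-Volmer rate with alpha = 1/2 and the inferred exchange current j0(c);
    eta is in units of kBT/e."""
    return -2.0 * j0(c) * np.sinh(eta / 2.0)

def normalized_rate(rate, c_grid, eta):
    """|j(c, eta)| / max_c |j(c, eta)| at fixed overpotential, as plotted in Fig. S52.
    For BV this reduces to j0(c) / j0_max, independent of eta."""
    vals = np.abs(rate(c_grid, eta))
    return vals / vals.max()
```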
Next, we study the effect of $\tilde{c}_+$. The electrolyte used in this study is 1 M LiClO$_4$ in tetraethylene glycol dimethyl ether, for which $\tilde{c}_+$ cannot be directly measured. As a reference, we use the activity of 0.358 mol/kg LiClO$_4$ in dimethoxyethane, $a_+ = 0.015$ [77], and vary $E_+/k_BT$ from $-10$ to $10$. Fig. S53 shows the normalized rate $|j(c,\eta)|/\max_c|j(c,\eta)|$ at $\eta = -5, 0, 5\,k_BT/e$ and $a_+ = 0.015$ for various $E_+$. As $E_+$ decreases (increasing affinity), the peak of the normalized rate shifts to higher values of $c$, asymptotically approaching the limit $\tilde{c}_+ = 1$.
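The sweep behind Fig. S53 amounts to evaluating the Langmuir coverage on a grid of adsorption energies (a sketch reusing Eq. S146; the grid resolution is arbitrary):

```python
import numpy as np

a_plus = 0.015  # activity of 0.358 mol/kg LiClO4 in dimethoxyethane [77]
for E_kT in np.linspace(-10.0, 10.0, 9):       # E+/kBT swept as in Fig. S53
    x = a_plus * np.exp(-E_kT)
    c_plus = x / (1.0 + x)                     # Langmuir coverage, Eq. S146
    print(f"E+/kBT = {E_kT:+5.1f}  ->  c~+ = {c_plus:.3g}")
```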
Finally, we compare the inverted reaction rate with an extension of the CIET theory to the case where the rate is limited by ion transfer (ion-coupled electron transfer, ICET), which predicts the same overpotential dependence as BV up to large overpotentials with a new functional form for the exchange current, $j_{0,\mathrm{ICET}}(c) = k_0\,c^{\alpha}(1-c)$ [34]. Fig. S54 shows the comparison of the inverted $j_0(c)$ and its best fit to $j_{0,\mathrm{ICET}}(c)$ with $\alpha = 0.66$. The inverted concentration dependence is quantitatively consistent with the unified quantum theory of coupled ion-electron transfer for intercalation, whether the rate is limited by ion transfer (ICET) or by electron transfer (electron-coupled ion transfer, ECIT) [34]. In contrast, the prevailing empirical BV model, $j_0(c) = k_0\sqrt{c(1-c)}$ [39], lies well outside the error bounds and cannot fit the experimental data.
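The best fit reported above can be reproduced by ordinary least squares on the inferred MAP curve (a sketch; `c_grid` and `j0_map` are hypothetical arrays holding the inferred exchange current):

```python
import numpy as np
from scipy.optimize import curve_fit

def j0_icet(c, k0, alpha):
    # Ion-transfer-limited exchange current, with the functional form quoted above.
    return k0 * c**alpha * (1.0 - c)

def fit_icet(c_grid, j0_map):
    """Fit (k0, alpha) to the inferred MAP exchange current; the text reports
    alpha = 0.66 for the dataset used here."""
    popt, _ = curve_fit(j0_icet, c_grid, j0_map, p0=(1.0, 0.5))
    return popt
```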

Variation in concentration field
The uniformity coefficient UC of each episode is obtained by fitting the standard deviation (std) of the concentration field to $\mathrm{std} = (1-\mathrm{UC})\sqrt{\bar{c}(1-\bar{c})}$, where the fit is taken over all frames of each episode, $\bar{c}(t)$ is the spatially averaged composition at time $t$, and the variance of the concentration field is

$$V = \frac{\int_\Omega \left(c(\mathbf{x}) - \bar{c}\right)^2 \mathrm{d}\mathbf{x}}{\int_\Omega \mathrm{d}\mathbf{x}}, \tag{S149}$$

where the integral is performed by summing over all pixels that belong to the particle. When UC is 1, the concentration field is uniform; when UC is 0, the std is maximal, i.e., the field consists of 0 and 1. $\bar{R}$ is defined as the average reaction rate, $\bar{R} = (\bar{c}(t_n) - \bar{c}(t_1))/(t_n - t_1)$, where $t_1$ and $t_n$ are the times of the first and last frames, respectively, and $k_0 = \int_\Omega k(\mathbf{x})\,\mathrm{d}\mathbf{x} / \int_\Omega \mathrm{d}\mathbf{x}$. We find that, while there is significant spread due to the different initial conditions, geometries, spatial heterogeneity, and the non-constant total reaction rate $R(t)$, the general trend is that the concentration field $c(\mathbf{x},t)$ becomes more uniform with increasing reaction rate, and that $c(\mathbf{x},t)$ is more uniform during Li insertion than during extraction.
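The uniformity coefficient for one episode can be computed with a least-squares fit through the origin (a sketch; array names are illustrative):

```python
import numpy as np

def uniformity_coefficient(c_frames, mask):
    """Fit std(t) = (1 - UC) * sqrt(cbar(t) * (1 - cbar(t))) over all frames.
    c_frames: (n_frames, ny, nx); mask: boolean pixels belonging to the particle."""
    vals = c_frames[:, mask]                # (n_frames, n_pixels)
    cbar = vals.mean(axis=1)                # spatially averaged composition
    std = vals.std(axis=1)                  # V^{1/2}, with V from Eq. S149
    ref = np.sqrt(cbar * (1.0 - cbar))
    one_minus_uc = std @ ref / (ref @ ref)  # least-squares slope through origin
    return 1.0 - one_minus_uc
```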
10 Inversion result

Fig. S56 compares the standard deviation ($V^{1/2}$) of $c(\mathbf{x},t)$ of all frames between data and model; we find good agreement between the two. Fig. S57 displays all the experimental images and the inversion result using the optimal regularization parameter $\lambda_2 = 0.88$ determined in Section 8.6, trained on the entire dataset. The inversion results include $k(\mathbf{x})$ of each particle and the concentration field $c(\mathbf{x},t)$ predicted by the inferred model.

Figure S52: Comparison of $j(c,\eta)$ and $|j(c,\eta)|/\max_c|j(c,\eta)|$ between CIET ($\tilde{c}_+ = 1$) at different overpotentials and the inferred reaction kinetics based on Butler-Volmer kinetics. For the latter, $j(c,\eta) = 2j_0(c)\sinh(\tilde\eta/2)$, where $j_0(c)$ is from the MAP result, and $|j(c,\eta)|/\max_c|j(c,\eta)| = j_0(c)/j_{0,\max}$ is shown with the shaded 99% confidence interval as well as the MAP result in the black curve (same as Fig. 3a in the main text).

Figure S54: Comparison between the inferred exchange current $j_0(c)$ (MAP, solid curve; 99% confidence region, shaded) and the best fit $j_{0,\mathrm{ICET}}(c) = k_0\,c^{\alpha}(1-c)$, based on an extension of the CIET theory to ion-transfer limitation.

Figure S56: Standard deviation (std $= V^{1/2}$) of $c(\mathbf{x},t)$ of all frames, from data and from the learned model.

Figure S57: All experimental images and the inversion result using the optimal regularization parameter $\lambda_2 = 0.88$. The inferred $k(\mathbf{x})$ and the corresponding colormap of each particle are displayed to the left of the concentration fields. For each particle, the first row is the experimental data, the second row is the model, and the third row is the difference between data and model. The colorbars and the scale bar are shared and shown on the right. The time duration from the first frame and the global C-rate of each episode are shown below the images.