PRISMS-PF: A general framework for phase-field modeling with a matrix-free finite element method

A new phase-field modeling framework with an emphasis on performance, flexibility, and ease of use is presented. Foremost among the strategies employed to fulfill these objectives are the use of a matrix-free finite element method and a modular, application-centric code structure. This approach is implemented in the new open-source PRISMS-PF framework. Its performance is enabled by the combination of a matrix-free variant of the finite element method with adaptive mesh refinement, explicit time integration, and multilevel parallelism. Benchmark testing with a particle growth problem shows PRISMS-PF with adaptive mesh refinement and higher-order elements to be up to 12 times faster than a finite difference code employing a second-order-accurate spatial discretization and first-order-accurate explicit time integration. Furthermore, for a two-dimensional solidification benchmark problem, the performance of PRISMS-PF meets or exceeds that of phase-field frameworks that focus on implicit/semi-implicit time stepping, even though the benchmark problem’s small computational size reduces the scalability advantage of explicit time-integration schemes. PRISMS-PF supports an arbitrary number of coupled governing equations. The code structure simplifies the modification of these governing equations by separating their definition from the implementation of the numerical methods used to solve them. As part of its modular design, the framework includes functionality for nucleation and polycrystalline systems available in any application to further broaden the phenomena that can be used to study. The versatility of this approach is demonstrated with examples from several common types of phase-field simulations, including coarsening subsequent to spinodal decomposition, solidification, precipitation, grain growth, and corrosion.


INTRODUCTION
Phase-field models are one of the foundational tools of computational materials science and are used to study microstructure evolution during a variety of processes, including solidification, grain growth, and solid-state phase transformations. A detailed review of phase-field models and their applications can be found in refs [1][2][3][4][5][6][7][8] . Phase-field models are almost exclusively solved numerically, yet developing software to perform phase-field simulations can be challenging for two reasons. First, phase-field simulations at scientifically relevant length and time scales are computationally intensive, often requiring millions of computing core hours on parallel computing platforms 9,10 . Therefore, computational performance and parallel scalability are leading concerns when choosing a numerical approach. Second, a simulation code written for one application is often not transferable to another application without extensive modifications. Different applications require different numbers of phasefield variables, different forms of the free energy functional, and may require the solution of additional coupled equations (e.g., mechanical equilibrium). Even within a single application, such as grain growth in a polycrystalline metal, multiple approaches with very different governing equations are common 7 .
As a response to these challenges, a standard approach is to develop a different code for each application 7 . A single-purpose code has hard-coded governing equations, which reduces computational overhead and permits numerical approaches tailored for the problem at hand. However, this approach has its limitations. Creating and maintaining a number of separate codes, each with its own tests and documentation, can be difficult. Often, only the numerical methods that are easiest to implement are used, namely finite difference methods and Fourier-spectral methods 1,3 . While substantial efforts over the past twenty years have been focused on techniques that greatly improve performance such as adaptive mesh refinement and multilevel parallelism [11][12][13][14][15][16][17] , these techniques are often neglected in userdeveloped single-purpose codes, as they are time-consuming to implement.
An alternative paradigm based on developing and utilizing open-source community frameworks is spreading through the phase-field community 7 . This type of framework contains building blocks for a variety of phase-field models. Therefore, developers' time can be spent extending the capability of the framework rather than implementing basic features in a new single-purpose code. Examples of such community frameworks are FiPy 18 , MOOSE [19][20][21] , OpenPhase 22 , AMPE 23 , and MMSP 24 .
In this article, we introduce PRISMS-PF, a new open-source community framework for phase-field modeling, which is a key component of the open-source multiscale materials modeling framework developed by the PRISMS Center 25 . PRISMS-PF was created upon four principles: 4. The framework should be open source in order to enable widespread adoption, modification, and development by the members of the phase-field community.
Embodying these four principles, PRISMS-PF enables scientists and engineers in the phase-field community to rapidly develop and employ phase-field models to explore the frontiers of the field. The computational performance of PRISMS-PF is enabled through the use of a matrix-free variant of the finite element method, as opposed to the matrix-based finite element methods traditionally applied for phase-field modeling (e.g., in MOOSE). In combination with Gauss-Lobatto elements, this matrix-free approach permits efficient explicit time integration. PRISMS-PF also leverages adaptive mesh refinement and multilevel parallelism for further increases in performance. Furthermore, PRISMS-PF contains functionality for nucleation and for efficiently handling large polycrystalline systems, two common phenomena in physical systems studied by phase-field modeling, to broaden its applicability. Finally, PRISMS-PF is integrated with the Materials Commons 26 information repository and collaboration platform to collect, store, and share a detailed record of each simulation. A more detailed discussion of the methods mentioned above and their implementation in a code structure that delivers performance, ease of use, and flexibility and adaptability to a wide range of applications is given in the "Methods" section of this article.

RESULTS AND DISCUSSION
To demonstrate the performance and flexibility of PRISMS-PF, we compare its performance to that of typical approaches and present examples of its use to investigate several physical phenomena. The comparisons are to a finite difference code and to other open-source frameworks for phase-field modeling. Examples of the use of PRISMS-PF to study coarsening subsequent to spinodal decomposition, precipitate growth, grain boundary nucleation, different formulations of interfacial energy anisotropy, grain growth, and corrosion are presented to demonstrate its flexibility.
Parallel scaling: PRISMS-PF vs. finite difference code To evaluate the parallel scaling efficiency of PRISMS-PF, a set of strong scaling tests were performed for PRISMS-PF and a customdeveloped, optimized finite difference (FD) code written by the authors. The FD code is written in the Fortran language and employs MPI parallelization. The spatial discretization utilizes second-order, centered finite differences on a regular grid. Like PRISMS-PF, the FD code employs a forward Euler time discretization. Although basic, this code is representative of a type of finite difference code commonly employed for phase-field modeling 1,8 . The scaling tests were performed for a coupled Cahn-Hilliard/Allen-Cahn system of equations describing the growth of two particles in three dimensions. The initial conditions and final solution are shown in Fig. 1. This system of equations is a simplified version of the models commonly used for solid-state transformations and solidification. Full descriptions of this test problem, computing environment, and FD code are found in the Supplementary Information. The PRISMS-PF calculations were performed with linear elements, so that the theoretical order of accuracy and degrees of freedom (DOF) equal those for the FD code. The calculations on the regular and adaptive meshes havẽ 3.4 × 10 7 DOF and 3.0 × 10 6 DOF, respectively. The results of the strong scaling tests on 16-512 computing cores are given in Fig. 2. From Fig. 2a, the PRISMS-PF calculations on a regular mesh have the longest run times, followed by the FD calculations and then by the PRISMS-PF calculations with adaptive mesh refinement (except for 512 cores, where the finite difference calculation is slightly faster than the PRISMS-PF calculation with adaptive meshing). A detailed analysis of the computational cost of calculations using these two codes is presented in the next section. Figure 2b shows the parallel efficiency (the ratio of the time assuming ideal scaling to the actual time) for the strong scaling tests. For PRISMS-PF with a regular mesh, the parallel efficiency is above 90% for 5.3 × 10 5 DOF/core or more (64 cores or less), and decreases to 68% at 6.6 × 10 4 DOF/core (512 cores). For the FD calculation, the parallel efficiency is above 90% for 1.1 × 10 6 DOF/ core or more (32 cores or less), and decreases to 43% at 6.6 × 10 4 DOF/core (512 cores). While the PRISMS-PF calculations exhibit improved parallel efficiency, the improvement is not driven by a reduction in the absolute deviation from the ideal time, which is actually larger for PRISMS-PF than for FD (see Fig. 2c) due to the more complex data structures involved. Instead, the improved parallel efficiency is driven by the longer baseline wall time of the 16-core calculation for PRISMS-PF. The longer baseline time leads to longer ideal times for PRISMS-PF, meaning that it can have a larger absolute deviation from ideality but still have a lower relative deviation from ideality, as measured by the parallel efficiency.
For the adaptive mesh calculations with PRISMS-PF, Fig. 2a shows that additional cores decrease the wall time up to 256 cores (1.2 × 10 4 DOF/core), after which, the wall time starts to increase. Unlike the calculations on regular meshes, the parallel efficiency in Fig. 2b for the adaptive mesh calculations does not asymptotically approach unity as the DOF per core increases. This finding suggests that the adaptive meshing calculation is out of the ideal scaling regime even for 16 cores (1.9 × 10 5 DOF/core). At the same DOF per core as the 16-core adaptive meshing calculation, the parallel efficiency for a PRISMS-PF calculation on a regular mesh is 81% (i.e., it is out of the ideal scaling regime). To correct for the deviation from ideality in the 16-core baseline adaptive meshing calculation, Fig. 2b also shows the parallel efficiency results with the adaptive meshing calculations shifted downward be equal to the regular mesh results at 1.9 × 10 5 DOF/core. The corrected curve for the adaptive meshing calculations overlaps with the regular mesh curve and then continues on to lower DOF/core values. This behavior indicates that the adaptive meshing calculations exhibit similar scaling performance to the calculations on regular meshes.
In summary, PRISMS-PF maintains near-ideal strong scaling up to 5.3 × 10 5 DOF/core on regular meshes. It shows improved parallel efficiency over a finite difference code, although the wall time is about one order of magnitude larger. The PRISMS-PF calculations on adaptive meshes exhibited similar scaling performance as the calculations on regular meshes when corrected for being outside the ideal scaling regime due to their fewer DOF.
Computational cost at fixed error: PRISMS-PF vs. finite difference code When performing a phase-field calculation, one must balance the objectives of reducing the error and the required computational resources. With this in mind, we performed a second comparison between PRISMS-PF and the custom FD code, in which we examined the error and the wall time for simulations using the same test problem as the previous section. A detailed description of the test conditions is found in the Supplementary Information. The error for each simulation is defined as the L 2 norm of the difference between its solution and the solution from a simulation that is highly resolved in time and space. Figure 3 shows the relationship between the time required for PRISMS-PF and FD simulations and their error. Table 1 uses these results to determine the speedup factor for PRISMS-PF compared with the finite difference code for levels of error resulting from three, five, and seven points across the particle/matrix interface in the FD simulations. On a regular mesh with linear elements (which have the same spatial order of accuracy as the FD discretization), PRISMS-PF requires an order of magnitude more run time than a FD calculation with the same error. This finding confirms the common wisdom that FD codes are more computationally efficient than finite element codes in terms of raw throughput.
Despite this disadvantage, Fig. 3 and Table 1 show that PRISMS-PF can leverage higher-order elements and adaptive meshing to become substantially faster than the FD code for this test case. From Fig. 3a, increasing the element degree reduces the run time for the PRISMS-PF calculations across all error levels examined. With a regular mesh, PRISMS-PF with quadratic or cubic elements is slower than the FD code when the allowed error is set to that of the FD code with three points in the interface. As the allowed error  is reduced, the PRISMS-PF calculations with quadratic or cubic elements become faster than the FD calculations due to their increased spatial order of accuracy. When the error corresponds to a typical level of resolution in the FD simulation (five points in the interface), PRISMS-PF with cubic elements is 1.9 times faster than the FD code. At that error level, the run times for the PRISMS-PF with quadratic elements and the FD code are approximately the same. However, in general this speedup factor will depend on the number of grid points required to resolve the interface, which will vary depending on the particular phase-field model and coupled physics that are utilized as well as the degree of accuracy intended for the simulation. Unfortunately, to the authors' knowledge there is no systematic review of the minimum number of grid points required for different types/applications of phase-field models, and the number has to be determined on a case-by-case basis with convergence analysis. We can expect the use of higher-order elements to be more advantageous for problems that require higher resolution at the interface. As can be seen in Fig. 3b, the speed of a PRISMS-PF calculation can be further increased with adaptive meshing, with a negligible increase in error. With the typical five-point interface resolution, PRISMS-PF with adaptive meshing is 12 times faster than the FD code for this test case. For more accurate calculations corresponding to seven points in the interface for the FD code, the PRISMS-PF calculations are up to 41.3 times faster.
In summary, PRISMS-PF with cubic elements outperforms a representative FD code for a two-particle test problem at an error level corresponding to a typical choice of five points across the interface for the FD calculation. Without adaptive meshing, the PRISMS-PF calculation is nearly twice as fast and, with adaptivity enabled, it is 12 times faster. At higher error levels, the PRISMS-PF calculation with a regular mesh is slower than the FD calculation, although the PRISMS-PF calculation with adaptive mesh refinement remains faster. While the governing equations and initial conditions for the test problem are similar to those used in precipitation and solidification simulations, note that the full diversity of phase-field models cannot be represented by a single test case. Changes to the governing equations could increase or decrease the advantages of higher-order elements in PRISMS-PF. The benefit of adaptive meshing is also strongly problemdependent. For problems that are not very amenable to adaptive meshing (e.g., the initial stages of spinodal decomposition) the performance, relative to the FD code, would be similar to that of simulations on regular meshes. (See applications below for an example of a large-scale simulation of spinodal decomposition and subsequent coarsening that leads to similar run time using PRISMS-PF and FD.) For problems especially amenable to adaptive meshing (e.g., the initial stages of particle nucleation), even larger speedups due to adaptivity may be observed than those presented here. It should be noted that FD codes can also utilize adaptive meshing, albeit with a large increase in their complexity 13 . Even with these caveats, the tests presented here demonstrate that PRISMS-PF does not sacrifice performance for generality, and instead can yield improved performance over a code employing a basic, yet common finite difference approach.
Computational cost: PRISMS-PF vs. other open-source frameworks As discussed in the introduction, open-source frameworks for phase-field modeling are increasing in popularity. To compare the performance of PRISMS-PF to other such frameworks, we reference results that have been uploaded to the PFHub phasefield benchmarking website for 2D dendritic solidification in a pure material [27][28][29] . Note that this benchmark problem can be solved to reasonable accuracy using less than 20,000 DOF, a small enough problem size that the advantage of the explicit time stepping scheme in PRISMS-PF over implicit/semi-implicit  Fig. 4 Solidification benchmark problem. a The order parameter (left, and reflected with respect to the simulation domain) and the temperature field (right) at t = 1500 for a PRISMS-PF simulation of the solidification of a pure material. Due to fourfold symmetry, the simulation considers only one quarter of the dendrite. b The dimensionless velocity of the dendrite tip for the PRISMS-PF simulation and the analytical sharp-interface steady-state solution from ref. 70 . The simulation parameters are the same as those for PRISMS-PF (#2) in Table 2 (additional details can be found in the Supplementary Information).
schemes for large problems is minimal. Figure 4 shows the PRISMS-PF solution to this benchmark problem. The tip velocity is steadily approaching the sharp-interface solution, but does not fully converge before nearing the boundary of the computational domain. An extension of the domain size in the existing benchmark that allowed the dendrite tips to reach steady-state velocity would permit a direct comparison with the sharpinterface solution. Unfortunately, given the setup of the benchmark problem, the analytical solution does not provide the accurate solution for the problem. Therefore, a highly resolved simulation is employed to benchmark the accuracy. Convergence tests in time and space indicate that the nondimensional tip velocity is~8.8 × 10 −4 at the stopping time designated in the problem definition (t = 1500). Table 2 shows results for this benchmark problem for PRISMS-PF and selected results uploaded to the PFHub website using three other open-source frameworks that focus on implicit or semiimplicit time stepping (MOOSE 19 , AMPE 23 , and FiPy 18 ). The PRISMS-PF calculations required three orders of magnitude fewer normalized core hours to complete than the calculations using AMPE and FiPy, while having similar or lower error. The fastest calculations using PRISMS-PF and MOOSE have similar computational cost and tip velocity error.
Each calculation represented in Table 2 was performed by a primary developer of the listed open-source framework. Uploads by users who are not primary developers of the framework used were not included in this analysis as they are more likely to make substantially suboptimal decisions while performing the calculation. The two different PRISMS-PF results represent different parameter sets used to obtain different balances of computational cost and numerical error. The two MOOSE results are from the same developer, but on different computers and with a different remeshing frequency. Details regarding the PRISMS-PF calculations and the analysis of the data from the PFHub website can be found in the Supplementary Information. The computational cost given in Table 2 for each result is normalized for reported processor clock speed in an attempt to make fair comparisons between calculations performed on different hardware systems. While other hardware characteristics likely impact the computational cost of these simulations, the clock speed is the only hardware information gathered on the PFHub website.
In summary, for tests using a dendritic solidification benchmark problem, PRISMS-PF achieves more accurate results with similar or (often much) lower computational cost than other leading phasefield frameworks. To reiterate the caveat from the previous section, no single test problem can contain the full diversity exhibited by phase-field models. The relative performance of codes using explicit versus implicit/semi-implicit approaches depends on many factors (e.g., computational problem size, deviation from equilibrium, the presence of short time-scale phenomena such as nucleation, and the order of the partial differential equations). However, the dendritic solidification benchmark problem used here provides a neutral, communitydetermined reference point that is directly relevant to an important class of phase-field simulations.

Applications
The flexibility of PRISMS-PF enables it to be utilized across a range of applications. PRISMS-PF contains 25 example applications that simulate a variety of physical phenomena. In this section, we highlight seven areas in which PRISMS-PF has been applied to showcase various aspects of the framework. Simulation results from these applications are shown in Fig. 5. Table 3 includes standardized comparisons of the simulations for these applications and one of the two-particle simulations from the FD performance comparison above. The table shows the wide variation of calculation sizes in the simulations, ranging from under 400,000 DOF to~1,000,000,000 DOF. The table also includes a normalized performance metric, the number of core-seconds per DOF required for each time step of the simulation. The normalized performance metric varies by approximately an order of magnitude. This variation originates from several factors including the number of arithmetic operations in the governing equations, required iterative solutions to linear/nonlinear equations per time step, additional overhead due to nucleation or grain remapping, efficiency losses due to non-ideal parallel scaling, and differences in computing architecture.
Spinodal decomposition followed by coarsening of the microstructure is a classic example of a phenomenon studied with phase-field modeling. To demonstrate the capabilities of PRISMS-PF for large simulations, we show in Fig. 5a the results of a simulation of this phenomenon using the cahnHilliard application with 9.6 × 10 8 DOF. This example is based on a previous calculation performed using a finite difference code to study coarsening behavior subsequent to spinodal decomposition 30 . The calculations were performed at the National Energy Research Scientific Computing Center (NERSC) on the Intel Xeon Phi Knights Landing architecture. With its multilevel parallelism approach, PRISMS-PF can leverage the strengths of this modern architecture, especially its high level of vectorization. Quadratic elements were used in this calculation because the octree structure of the mesh did not permit an efficient mesh with cubic elements (see the Supplementary Information for a more detailed discussion regarding this choice). The high density of interfaces eliminates the benefits of adaptive mesh refinement, and thus the simulation was performed on a regular mesh. The simulation was performed on 1024 computing cores (16 nodes) for 5.6 days. A corresponding calculation using the finite difference code from ref. 30 requires 3.5 days on 1024 cores, assuming that the relationship between the element size and error from the two-particle benchmark tests holds. (The finite difference estimate was determined by doubling the wall time of a finite difference simulation with half the simulated time of the PRISMS-PF simulation). This example demonstrates the application of PRISMS-PF to a system with A summary of the benchmark results uploaded to the PFHub phase-field benchmarking website for the dendritic solidification problem. a Due to differences in the processor clock speeds, the computational cost is scaled to approximate a calculation on a processor with a 3.0 GHz clock speed.
S. DeWitt et al.
nearly one billion degrees of freedom with performance of only slightly worse than a custom FD code under some unfavorable conditions (no adaptivity, quadratic rather than cubic elements). The performance tests presented in Figs 2 and 3 involved the simulated growth of two particles. However, matters of scientific or engineering interest often involve many interacting particles. Thus, Fig. 5b shows a PRISMS-PF simulation using the same governing equations as the two-particle simulations, but scaled up to 128 particles, each with a random initial location and size. This calculation required 5.5 days on 512 computing cores to simulate 400 time units and shows the transition from particle growth to coarsening. From Table 3, this simulation requires~2 × 10 −7 cores/DOF-time step. This normalized performance is approximately two times worse than the two-particle simulation with an adaptive The evolution of the particles in the scaled-up version of the two-particle growth simulation. c A cross section of a 3D simulation of two precipitates in an Mg-Nd alloy (left) and a corresponding transmission electron microscopy image (right). (Reproduced with permission from ref. 31 . Copyright 2017 Acta Materialia Inc.) d The initial (left) and final (right) grain structure for a 2D grain growth simulation. The color represents the grain identification number that remains associated with a grain, as it is transferred between order parameters. e A 2D simulation of pitting corrosion. Snapshots are at times 0, 2 s, and 4 s. For each of these times, the left frame (A) represents the metal surface in half of the system, and the right frame (B) represents the metal cation concentration in the other half. f A 2D simulation of grain boundary nucleation showing the composition (upper) and the average of the order parameter parallel to the grain boundary (lower). The grain boundary region with a decreased nucleation barrier is denoted by the dashed lines. g Initially spherical particles evolved using two different functions to describe strong interfacial anisotropy: a conventional form (left), which enables simulation of shapes with sharp corners and edges 41 , and a more sophisticated form (right), which enables flatter facets and noncentrosymmetric anisotropies 42 .
mesh on 16 computing cores (9.5 × 10 −8 core-s/DOF-time step). This discrepancy indicates non-ideal weak scaling, potentially due to imperfect load balancing. A comparable FD calculation is expected to take 17.2 days on 512 cores, more than twice as long, assuming the same relationship between element size and error from the two-particle tests holds (and determined by scaling the wall time of a simulation with 3% of the total simulated time in the PRISMS-PF simulation).
PRISMS-PF has already been applied toward advancing the understanding of lightweight structural components, the primary testbed for the PRISMS integrated software framework 25 , to simulate the evolution of precipitates in magnesium-rare earth alloys 31 . Simulations of isolated precipitates were shown to be largely consistent with experimental observations. A twoprecipitate simulation (see Fig. 5c) demonstrated a precipitate interaction mechanism to explain outliers. These simulations used the Kim-Kim-Suzuki model 32 and linear elasticity to account for misfit strain between the precipitate and the matrix. PRISMS-PF contains an application named MgRE_precipitate_single_Bppp for performing these and similar simulations.
Another common application of phase-field modeling is grain growth in polycrystals. The grainGrowth_dream3D application uses the grain-remapping algorithm (largely based on ref. 33 ), described in the "Methods" section, to simulate grain growth in an isotropic system using the Fan and Chen model 34 . The initial microstructure containing 739 grains is imported from Dream3D 35 . Figure 5f shows this initial microstructure as well as an evolved structure that contains 186 grains. The grain remapping led to a 62-fold reduction in the number of order parameters to just 12 shared order parameters to track the grains; no artificial coalescence was observed during the simulation. Less than 9% of the simulation time was spent on the grain-remapping algorithm, a small cost for the nearly two orders of magnitude reduction in the number of order parameters and thus equations to be solved.
The use of phase-field models to describe microstructure evolution in electrochemical systems has been an area of increasing interest. In this area, an application for corrosion is under development using a phase-field model developed by Chadwick et al. 36 . The electric potential and the transport of ions in the electrolyte are calculated using the smoothed boundary method 37 to apply boundary conditions along the metal surface. A Cahn-Hilliard equation with a source term proportional to the reaction rate tracks the dissolution of the metal. A continuity equation for the electric potential is solved using the nonlinear solver described in the "Methods" section 38 . Figure 5e shows the evolution of the metal surface and the dissolving metal cation concentration in the electrolyte as a pit grows.
The nucleation capabilities of PRISMS-PF have also been used to study precipitate nucleation in grain boundaries and the associated creation of precipitate-freezones 39 . Figure 5f shows the results from the nucleationModel_preferential application. This application uses a coupled Cahn-Hilliard/Allen-Cahn model similar to the one used in the finite difference performance comparisons and leverages the flexibility of PRISMS-PF in handling nucleation. Nuclei are added explicitly via a nucleation probability that depends on the local nucleation rate, which is calculated dynamically for each element. This method is described with more detail in the "Methods" section below and in the "Nucleation Algorithm Description" section of the Supplementary Information. The nucleation rate function in the nucleation application file includes a spatial dependence for the nucleation barrier (in addition to a dependence on the supersaturation) to mimic the effect of preferential nucleation at a grain boundary. The use of adaptive meshing is particularly effective here because the composition and order parameter fields are nearly uniform across the majority of the domain at the beginning of the simulation.
The user interface for PRISMS-PF permits straightforward modification of governing equations to add more physical phenomena by those who are familiar with C++ programming. PRISMS-PF contains three similar applications with different approaches for handling interfacial energy anisotropy. Each of these is an extension of a single-particle version of the Cahn-Hilliard/Allen-Cahn application used in the finite difference performance comparisons. The CHAC_anisotropyRegularized and CHAC_anisotropy applications use a standard form of the anisotropy 40 , with and without a fourth-order regularization term that prevents a lack of convergence of the numerical solution with increasing resolution, which can occur with strong anisotropies (i.e., leading to edges or corners) 41 . The facetAnisotropy application uses the same regularization term, but has a more sophisticated and versatile anisotropy function 42 . Figure 5g shows examples of simulations using these applications. Table 3. A summary of the performance (measured in core-seconds per DOF for a single time step) of selected simulations presented in this article including the two-particle simulation (cubic elements, adaptive mesh with Δx = 100/64) and simulations from seven applications highlighted in this section.  43 with a graphical user interface (GUI) developed using the Rappture Toolkit 44 . The incorporation of implicit time stepping into PRISMS-PF is being investigated for systems that are near equilibrium, in which explicit time stepping is inefficient. The existing nonlinear solver can be applied to compute the implicit update, but improved parallel preconditioning options are required. An extension to the grain-remapping algorithm that can more accurately handle irregularly shaped grains is also planned. Another focus of near-term development is improved integration with other computational tools. Tight coupling with the PRISMS-Plasticity 45 framework for crystal and continuum plasticity will enable the simulation of dynamic recrystallization. Finally, integration is planned with ThermoCalc, a CALPHAD software package, and with CASM 46 , a first-principles statistical mechanics software package, to provide thermodynamic and kinetic information directly to PRISMS-PF. This integration provides an alternative to the current approach of manually inserting thermodynamic and kinetic parameters for a given material system into the input parameters file.

Conclusions
This article describes a new phase-field modeling framework, PRISMS-PF, with four guiding principles: high performance, flexibility, ease of use, and open access. One characteristic aspect of PRISMS-PF is its use of a matrix-free variant of the finite element method, which enables efficient explicit time stepping by eliminating the need to diagonalize the mass matrix as in traditional finite element methods. This method, combined with advanced adaptive meshing and parallelization strategies, is key to the framework's competitive performance. A benchmark test of two precipitates, in which mesh adaptivity provides a significant speedup, demonstrated that PRISMS-PF was up to 12 times faster than a basic, custom-developed finite difference code at the same level of error. Even for a test case involving spinodal decomposition and subsequent coarsening with no mesh adaptivity and quadratic, rather than cubic elements, the performance of PRISMS-PF is close to that of the finite difference code. In addition, comparison of the results for a dendritic solidification benchmark problem demonstrate that PRISMS-PF yields a similar or higher accuracy with similar or lower computational cost than three other open-source frameworks for phase-field modeling. A second characteristic aspect of PRISMS-PF is its modular, applicationcentric structure. It is structured such that users primarily interact with applications that contain governing equations, initial/ boundary conditions, and parameters. In an application, an arbitrary number of coupled governing equations are input into simple C++ functions, giving users substantial flexibility in their construction. The core library contains shared functionality for the applications and shields users from most of the numerical complexity, allowing them to focus on their system of interest. The core library also includes functionalities for nucleation and for polycrystalline systems that can be activated in any application, thereby broadening the scope of phenomena that can be investigated. The versatility of this approach is demonstrated by its use in a range of applications, including precipitate nucleation, dendritic solidification, grain growth, and corrosion. In sum, this new phase-field modeling framework provides the performance, flexibility, ease of use, and open availability to drive breakthroughs across the field of materials science.

METHODS
PRISMS-PF utilizes advanced numerical methods to enable computational performance at the frontier of the field while also supporting common applications of phase-field modeling. While variants of all of the methods described below have been previously applied in the phase-field community or the wider numerical partial differential equation community, to the authors' knowledge PRISMS-PF is currently the only open-source framework to combine all of these methods within the context of phasefield modeling. PRISMS-PF is structured so that users can leverage these advanced methods without detailed knowledge of their implementation. This structure allows users to focus on the unique governing equations and parameters for their particular application. This article describes version 2.1 of PRISMS-PF 47 .
Code structure PRISMS-PF is written in the C++ programming language and built upon the deal.II finite element library 48 . The structure of PRISMS-PF reflects the principles set forth in the "Introduction", particularly that it should accommodate a wide variety of governing equations and that those governing equations should be straightforward to modify and be as separated as possible from the numerical methods used to solve them. Therefore, PRISMS-PF is broken into two main components: the core library and the PRISMS-PF applications. The core library contains shared functionality for use in any PRISMS-PF calculation, including the methods described later in this section. An application describes a particular type of simulation, defining an arbitrary number of coupled governing equations and the associated boundary/initial conditions and parameters. The structure of PRISMS-PF allows three levels of engagement: (1) using a preset application and changing the parameters, (2) creating a custom application, and (3) modifying the core library. Thus, PRISMS-PF accommodates a range of users, from those with no programming expertise to expert programmers who want full control over their calculations.
The core library performs the actual finite element calculations, parses input, generates and adapts the mesh, initializes variables, outputs results, handles nucleation, and performs grain remapping. At the heart of the core library is the solver loop, which is structured to maximize flexibility and performance. A flow chart of the solver loop is given in Fig. 6. One cycle of this loop is performed in each time step in the solution of the governing equations (or once for a purely time-independent calculation). To control what operations are necessary for a particular governing equation in the solver loop, governing equations are classified as "explicit time-dependent", "time-independent", or "auxiliary." Explicit   Fig. 6 Flowchart of the solver loop. A flowchart showing the structure of the solver loop in PRISMS-PF. The solver loop consists of four stages. The first stage is for updating the mesh, the list of nuclei, and the list of grains and their assignment to order parameters. In the second stage, all of the governing equations that are to be updated explicitly are updated. The third stage is the loop implementing the nonlinear solver discussed below. The final stage of the solver loop is to output variable values to file (including postprocessed variables) and to save checkpoints at user-specified time increments. time-dependent equations solve an initial boundary value problem (e.g., the Allen-Cahn equation) via explicit time stepping. Time-independent governing equations solve boundary value problems for sets of linear or nonlinear equations that are time-independent (e.g., Poisson's equation). Auxiliary equations are relational expressions that can be directly calculated from known values at each time step, but do not contain any derivatives with respect to time (e.g., the chemical potential in the split form of the Cahn-Hilliard equation).
A PRISMS-PF application defines a set of governing equations, initial conditions, boundary conditions, and parameters. Each application is derived from the MatrixFreePDE class in the core library. This class structure allows any application to selectively override behavior in the core library for increased customization. An application requires a set of input files, each for the purpose of (1) setting constant parameters, (2) setting the governing equations, (3) setting the initial and boundary conditions, (4) defining the custom class for the application, and (5) containing the main C++ function. Additional files to define postprocessing expressions and the nucleation rate are optional.
The contents of the five mandatory app files are briefly discussed below. For a more detailed description of the application files, please refer to the PRISMS-PF user manual 49 . The parameters file is a parsed text file that contains numerical parameters (e.g., mesh parameters) and model parameters (e.g., diffusivity). The model parameters differ between applications, and are allowed to have various data types (e.g., floating-point number, tensor of floating-point numbers, boolean). The type of boundary condition for each governing equation is also set in the parameters file. The equations source file contains the attributes of the variables in the governing equations as well as expressions for the terms in the governing equations themselves. Another mandatory source file contains expressions for the initial conditions for each variable and expressions for Dirichlet boundary conditions that vary in time or space (simpler boundary conditions can be fully specified in the parameters file). Users have flexibility in customizing the governing equations, initial conditions, and boundary conditions because they are entered into C++ functions that can use loops and conditional statements alongside standard arithmetic operations. The class definition file is a C++ header file that defines all of the members of the class, including member variables representing each of the model parameters in the parameters file. The final mandatory application file is a source file containing the C+ + main function. This file can be modified to change the overall flow of a simulation, but is identical for the majority of applications.
Beyond the core library and the applications, the third primary component of PRISMS-PF is a test suite containing a battery of unit tests and regression tests. Following best practices, continuous integration testing is used (i.e., the test suite is run after every change to the code in the public PRISMS-PF repository 47 ).

Matrix-free finite element calculations
For improved computational performance over typical finite element codes, PRISMS-PF takes advantage of the matrix-free finite element capabilities in the deal.II finite element library 48 . For a detailed discussion of this approach, please refer to ref. 50 ; a brief summary is provided below. A central ingredient of any finite element code is the evaluation of a discrete differential operator that approximates a term in the governing equation(s). The discrete operator can typically be evaluated as a matrix-vector product (or a set of matrix-vector products). The standard approach is to assemble a global sparse matrix representing the discrete differential operator for the entire mesh and then multiply it with a vector. However, storing even a sparse matrix is memory intensive for problems with many degrees of freedom, and calculations are often limited by memory bandwidth rather than the speed of the floating-point operations 50 .
To circumvent the performance issues related to storing sparse matrices, PRISMS-PF uses a matrix-free approach (also referred to as a cell-based approach) as implemented in deal.II library 48 . The contributions to the global matrix-vector product are calculated element-by-element, and then summed together. Furthermore, for improved performance, the underlying deal.II implementation enables sum factorization for the matrix-free approach. Sum factorization refers to a restructuring of the calculation of function values and derivatives on the quadrature points of the unit cell into a product of sums along each direction using 1D basis functions. This strategy for computing and assembling operators significantly reduces the number of arithmetic operations and, more importantly, lends itself for efficient implementation using multilevel parallelism. Furthermore, PRISMS-PF uses Gauss-Lobatto finite elements instead of traditional Lagrangian finite elements. The quadrature points for Gauss-Lobatto elements coincide with the element nodes, and this feature yields improved conditioning for higher-order elements, and efficient explicit time stepping due to a trivially invertible diagonal "mass matrix" (a matrix where each element is given by M ij = ϕ i ϕ j , where ϕ n is the nth shape function). This combination of the matrix-free finite element approach with sum-factorization and Gauss-Lobatto elements has been previously shown to significantly outperform matrix-based calculations for wave and fluid dynamics calculations 50 .

Explicit time stepping
The default scheme of time discretization in PRISMS-PF is the forward Euler method. The advantages of this explicit scheme are that it is simple to implement 1,5,51-53 , the calculation for each time step is fast 1,5,51-53 , and it scales well for parallel calculations [54][55][56] . The disadvantages are that it is only first-order accurate, and the time step is limited by the Courant-Friedrichs-Lewy (CFL) condition. Implicit and semi-implicit schemes are alternatives that permit stable calculations with larger time steps, but require the solution of a linear or nonlinear system of equations at each time step. This process is time intensive, and parallel scaling often suffers when the number of degrees of freedom (DOF) exceeds a few million due to the lack of effective parallel preconditioners, except for a small class of problems for which physics-based or geometric multigrid preconditioners are available [57][58][59] . Efficient implementations of implicit/semi-implicit schemes also require significant user effort to select/implement the solution strategy (monolithic/ staggered), appropriate preconditioner, time-step size, and convergence tolerances 59 . As a result of the advantages of explicit time integration, a recent review of phase-field modeling noted that a majority of the papers it highlighted as making high-impact contributions to materials design employed this approach 8 .
In the context of PRISMS-PF, explicit time stepping is attractive for four reasons. First, the time step for phase-field simulations is often limited to values near the CFL condition by the physics of interest (e.g., sharp gradients in the interface, topological changes), decreasing the advantage of taking larger time steps while using implicit/semi-implicit methods. Second, the improved parallel scaling for explicit methods enables the types of simulations for which PRISMS-PF is designed, with hundreds of millions (or more) of DOF. Third, the diagonal mass matrix provided by the Gauss-Lobatto elements permits efficient explicit time stepping without any ad hoc mass lumping to diagonalize the mass matrix as is required in traditional finite element approaches with Lagrange elements 50 . Fourth, the simplicity of explicit methods reduces the possible ways novice users can make ill-informed choices that substantially reduce code performance.

Adaptive mesh refinement
Adaptive mesh refinement can greatly improve the speed of a simulation with a negligible decrease in accuracy 14,60 . Phase-field calculations are particularly well suited for adaptive meshing, since order parameters are nearly uniform outside the interfacial regions 14 . Despite these benefits, the use of a regular mesh for computationally intensive phase-field simulations is still a common practice due to the complexity of implementing adaptive meshing (e.g., refs 9,61 ).
PRISMS-PF utilizes the adaptive mesh refinement capabilities from the deal.II 48 and p4est 62 libraries. The mesh is a set of connected quadtrees/ octrees (a "forest of quadtrees/octrees" 62 ), a structured approach that offers improved efficiency over unstructured approaches, while maintaining the flexibility to represent arbitrary geometries 62 . In PRISMS-PF, the user selects an upper and lower bound for one or more variables to define the interfacial regions. The mesh is maximally refined in this region of interest, and is allowed to gradually coarsen to a user-defined minimum level outside of it. Over the course of a simulation, the mesh is periodically regenerated using updated values of the model variables. Thus, as the solution evolves, so does the mesh.

Multilevel parallelism
Parallel computation is crucial for phase-field codes to reduce the run times of large simulations. Using features of the deal.II 48 , p4est 62 , and Threading Building Blocks 63 libraries, PRISMS-PF employs three levels of parallelism: distributed memory parallelization, task-based threading, and vectorization. On the distributed memory level, the domain is decomposed such that each core stores only the refined mesh and corresponding degrees of freedom for its subdomain. This allows for calculations across multiple computing nodes, and permits calculations that require more memory than is available on a single compute node 64 . Each core independently performs calculations to update its portion of the mesh, and communicates the variable values along the subdomain boundary to cores storing adjacent subdomains employing the Message Passing Interface (MPI) protocol. Task-based parallelism is utilized to divide the computational load within a node/core across all available threads. Finally, vectorization up to the AVX-512 standard is permitted, corresponding to operations on eight double-precision floating-point numbers per clock cycle. The sum-factorization approach used in PRISMS-PF is particularly well suited for efficient vectorization 50 . From the perspective of a PRISMS-PF user, this multilevel parallelism scheme is nearly invisible; the user sets the number of MPI processes at run time, and parallelization is handled in the background.
Nonlinear Newton/Picard solver Many of the partial differential equations involved in phase-field modeling are nonlinear. The nonlinear terms for evolution equations are straightforwardly handled with explicit time-integration methods. However, a nonlinear solver is necessary for nonlinear time-independent partial differential equations, such as the continuity equation for the electric potential in the corrosion model described in the "Results and Discussion" section. In PRISMS-PF, a hybrid Newton/Picard approach is taken to leverage the performance of Newton's method and the simple implementation of Picard's method 38 . Each governing equation is solved using Newton's method, and if there are multiple nonlinear equations, they are coupled using a Picard (fixed-point) iteration until convergence is reached 38 . For the Newton iterations, backtracking line search is used to determine the step size to ensure stability without compromising the quadratic convergence rate 51 . Linear equations with no coupling to other non-explicit-time-dependent equations are only solved during the first iteration of the nonlinear solver to avoid unnecessary calculation overhead.

Nucleation
Nucleation is an important phenomenon in a number of systems of interest for phase-field modeling, such as solid-state transformations, solidification, and recrystallization 1 . To aid in the treatment of these systems, PRISMS-PF contains functionality to assist in the placement of nuclei during a simulation. PRISMS-PF uses the explicit nucleation approach 65 , where supercritical nuclei are stochastically introduced at a time and spatial location determined by a nucleation probability function. The probability function is typically determined from classical nucleation theory 65,66 , although any probability model can be utilized. In PRISMS-PF, users specify the desired probability function, which can depend on the value of any model variable, spatial location, and time. Different phases can have different nucleation probabilities and different nucleus properties (e.g., size and shape). The initial nuclei are ellipsoids with an arbitrary rotation with respect to simulation axes. The details of the algorithm used to place the nuclei on a distributed mesh (where each processor stores only a portion of the mesh, as in PRISMS-PF) are described in the Supplementary Information.

Grain remapping
For simulations of polycrystalline systems, the naive approach of assigning one-order parameter per grain is intractable even for small systems with hundreds of grains. To handle polycrystalline systems, PRISMS-PF uses the grain remapping approach 67 . Each order parameter stores multiple grains, and grains are transferred between order parameters as needed to prevent direct contact that would lead to artificial coalescence between neighboring grains with the same order parameter. The advantage of this approach is that the evolution equations and data structures are unchanged from those of a typical phase-field simulation. The process used to transfer grains is largely based on the approach from ref. 33 . A recursive flood-fill algorithm is used to identify the elements in each grain. A simplified representation of each grain is created, and these simplified representations are used to identify grains to be transferred. Finally, these grains are transferred to new order parameters. The details of this process, which includes communication over a distributed mesh, are discussed in the Supplementary Information. A similar process is employed when importing a polycrystalline microstructure (e.g., from electron backscatter diffraction or Dream3D 35 ) as the initial condition for a simulation.
Automatic capture of simulation data and metadata PRISMS-PF is integrated with the Materials Commons 26 information repository and collaboration platform that is specifically designed for materials scientists. Using the Materials Commons command line tool 68,69 , information from the PRISMS-PF input files are automatically parsed and uploaded to Materials Commons. The same tool can be used to upload the results of the simulation, creating a comprehensive record that can be published for broader dissemination. This capability was utilized for the simulations presented in the "Results and Discussion" section of this article, which can be viewed at the hyperlink given in the "Data Availability" section below.

PRISMS-PF is an open-source computer code under the GNU Lesser General Public
License version 2.1. The source code for PRISMS-PF is available at the following hyperlink: https://github.com/prisms-center/phaseField. The finite difference codes used in the performance comparisons are available upon request.