Temperature-dependent kinetic pathways of heterogeneous ice nucleation competing between classical and non-classical nucleation

Ice nucleation on the surface plays a vital role in diverse areas, ranging from physics and cryobiology to atmospheric science. Compared to ice nucleation in the bulk, the water-surface interactions present in heterogeneous ice nucleation complicate the nucleation process, making heterogeneous ice nucleation less comprehended, especially the relationship between the kinetics and the structures of the critical ice nucleus. Here we combine Markov State Models and transition path theory to elucidate the ensemble pathways of heterogeneous ice nucleation. Our Markov State Models reveal that the classical one-step and non-classical two-step nucleation pathways can surprisingly co-exist with comparable fluxes at T = 230 K. Interestingly, we find that the disordered mixing of rhombic and hexagonal ice leads to a favorable configurational entropy that stabilizes the critical nucleus, facilitating the non-classical pathway. In contrast, the favorable energetics promotes the formation of hexagonal ice, resulting in the classical pathway. Furthermore, we discover that, at elevated temperatures, the nucleation process prefers to proceed via the classical pathway, as opposed to the non-classical pathway, since the potential energy contributions override the configurational entropy compensation. This study provides insights into the mechanisms of heterogeneous ice nucleation and sheds light on the rational designs to control crystallization processes.


Supplementary Figures
Supplementary Figure 1. (a) Molecular dynamics simulation system. A typical simulation system consists of a slab of water (transparent cyan) and a surface (black). (b) Typical hexagonal ice (pink) and rhombic ice (cyan) close to the surface from different angles. Only one layer of rhombic ice can form due to the lack of dangling H-bonds, thus water (purple) above the rhombic ice remains in the liquid state. The H-bonds are represented by the magenta dashed lines.
Supplementary Figure 2. Illustration of the five selected representative collective variables (CVs). The five CVs represent the numbers of ice molecules in the corresponding configurations as colored in purple: the largest ice nucleus, rhombic ice in the largest ice nucleus, and hexagonal ice in the 2nd, 3rd, and upper layers of the largest ice nucleus, respectively. Supplementary Figure 6. Typical molecular dynamics trajectories of the non-classical two-step nucleation pathway at T =230 K. (a). Schematic of the free energy profile for the non-classical two-step ice nucleation pathway. (b)-(c). Evolutions of the number of hexagonal ice molecules in the largest ice nucleus, exhibiting two activation processes. The gray regions are extended simulations showing that ice grows continuously after entering the growth stage. (d)-(e). Typical configurations of the largest ice nucleus for the trajectories in (b) and (c), respectively. The purple and green spheres denote rhombic and hexagonal ice, respectively. The gray surfaces represent the substrate. Figure 7. Percentages of complete flux for heterogeneous ice nucleation at T=230 K. The numbers on the left give the total number of ice molecules. The pathways following the purple and yellow arrows depict the classical one-step and nonclassical two-step nucleation pathways, respectively. The numbers in brackets next to each arrow represent the flux as a percentage (shown in blue) for each transition between each pair of states. The mean first passage time of some transitions are listed in black in the top-left corner. After macro-states IV or V, the system enters the stage of ice growth. Supplementary Figure 13. Contributions of the CVs to the timescales. To evaluate the contributions, we took the three leading eigenvectors that correspond to the three slowest timescales, and then normalized them based on the squared sum of their components. The normalized square of each component in each eigenvector, V 2 , is plotted as the relative contribution. A higher V 2 value indicates a stronger contribution to the slowest timescales. The leading eigenvectors were obtained from the time-lagged correlation matrix from the Spectral-oASIS analysis with the five chosen CVs as shown in Fig. 1c in the main text.

Supplementary Method 1. System setup and all-atom MD simulations
The all-atom NVT molecular dynamics (MD) simulations 1 were performed using LAMMPS packages 2 with a time step of 2 fs. The system consisted of a slab of water (4940 water molecules) sitting on top of a wurtzite-structured surface in a periodic box with dimensions: x=5.43 nm, y= 5.89 nm, and z= 7.0 nm; and periodic boundary conditions (PBC) were employed in the xyz directions, respectively. To minimize the undesired water-surface interactions caused by the PBC, a void space (around 3nm) was maintained above the water in the z-direction. Additionally, the surface atoms were fixed in the wurtzite structure 3 with unit cell parameters of a=4.519 Å and c=7.357 Å. To simplify the modelled surface, the charge effect of the surface atoms was not considered. Furthermore, the TIP4P/Ice 4 water model was applied to simulate the water molecules, whose bonds and angles were maintained by the SHAKE algorithm 5,6 . The water-surface interactions were modelled by the 12-6 Lennard-Jones (LJ) potential 7 , which was cut off at 1.0 nm with interaction strength =4.98829 kJ mol -1 and σ=2.9034 Å. The water-surface interaction strength is close to, but a bit less than, that of gold-water interactions 8 , i.e., =5.2362576 kJ mol -1 . The Columbic interactions among the water molecules were cut off at 0.85 nm, whose longrange interactions were considered by the Particle-Particle Particle-Mesh (PPPM) 9 .
The system was initially heated up to 300 K for 2ns before being cooled down to a target temperature, i.e., 230 K and 240 K. The temperature was controlled by the Nosé-Hoover thermostat [10][11][12][13] , and the data was collected for production after relaxation. The hexagonal ice nucleates and then grows mainly on its first prism plane. To account for the possibility of both the one-step and two-step pathways, unbiased MD simulations were run in parallel with different initial velocities at T=230 K and 240 K. The simulations were terminated once the total number of ice molecules ( ) reached about 1400, or the simulation time exceeded 1000 ns. In total, the simulation length was about 15 μs for each temperature.
At T=250 K, two rounds of seeded unbiased MD simulations were performed. In the first round, 200 initial seeds were randomly chosen as the starting structures from the cluster centers of the most populated microstates at T =240 K. Unbiased MD simulations were then carried out with different initial velocities while T was maintained at 250 K. After 5 ns of relaxation, data was collected for 140 ns (unless reached 1400), after which a second-round of 500 30-ns parallel seeded unbiased MD simulations were performed. In this round, we randomly selected the seeds used as the initial structures from the first round of simulations.

Supplementary Method 2. Ice detection
We used the average bond order parameter 14,15 to characterize the local structures of the water molecules. Specifically, the average bond order parameter calculates the local order of an atom with its neighbors as follows: hexagonal ice and rhombic ice 3 , respectively. Note that the quadrilateral structure with high fourfold symmetry is termed as "square ice" in the reference 3 , whereas here we use the term "rhombic ice" since the quadrilaterals are buckled and not always in a perfectly square shape, as shown in

Selections of representative collective variables
In practice, reducing the feature dimensions by selecting representative CVs that can accurately describe the kinetics of HIN is crucial before constructing the MSMs. Spectral-oASIS 16,17 provides a powerful approach to fulfil this task as it can automatically choose representative CVs that can reconstruct the slowest dynamics relevant to ice nucleation. Starting from a random initial CV, we used Spectral-oASIS to add the CVs one-by-one from the pool of 18 CVs until the time-lagged correlation matrices generated by the selected subset of CVs could well approximate the leading eigenvalues and eigenvectors of the full matrix of 18 CVs (see Supplementary Table 1). The lag-time was pre-determined as 5 ns, which will be further verified in subsequent Sections. As shown in Fig. 1c in the main text, with the five optimal CVs (i.e., the number of ice molecules in the largest ice nucleus, rhombic ice in the largest ice nucleus, and hexagonal ice in the 2 nd , 3 rd , and upper layers of the largest ice nucleus), the slowest timescales respectively. Notably, the timescale for the formation of hexagonal ice in the third layer (middle panel, Supplementary Fig. 13) is found to be slower than that in the second layer (bottom panel, Supplementary Fig. 13). This observation underlines the second step in our two-step non-classical nucleation pathway, where the rhombic ice needs to be converted into hexagonal ice. This conversion process mainly involves the first two layers of ice, as the transformation of the rhombic ice molecule in the first layer will free one of its -OH bonds to quickly form a hydrogen bond with a hexagonal ice molecule in the 2 nd layer. As a result, the growth of the 3 rd layer of hexagonal ice is contingent upon the completion of this conversion process, rendering it to be the second-slowest timescale.

Determination of hyperparameters and clustering
We first determined an appropriate number of microstates by conducting variational crossvalidation with GMRQ 19 . In the cross-validation, the MD trajectories were randomly divided into five folds, in which four folds were used as a training set, and one fold was used as a testing set.
The optimal number of microstates was determined to be 1000 since it had the highest test score before entering the overfitting regime, as indicated by the progressively decreasing score shown in Supplementary Fig. 14. Then, the MD configurations were grouped into 1000 microstates based on the k-centers clustering algorithm 18 . The GMRQ cross-validation and clustering were performed by using in-house python codes in MSMbuilder Packages 22 .

Validations of MSMs
We constructed our microstate MSMs using a lag-time of 5 ns, as the implied timescale plateaued at this lag-time (see Supplementary Fig. 3b Supplementary Fig. 4.

Supplementary Method 4. Construction and validation of MSMs at higher temperatures
We also constructed MSMs at T=240 K and 250 K. Before the MSM construction, we have also conducted spectral-oASIS analysis at T=240 K and 250 K. As shown in Supplementary Fig.   15, we also need around five CVs to reproduce the slowest timescales of the completed sets of CVs. Furthermore, we found that the selected CVs is largely overlapped with those selected at 230K. For instance, in both T=240 K and 250 K, the five chosen CVs by Spectral-oASIS are: the number of ice molecules in the largest ice nucleus, number of hexagonal ice in the 2 nd , 3 rd , and upper layers of the largest ice nucleus, and the number of ice molecules in the largest hexagonal ice nucleus. The most notable difference is that the CV that represents the number of rhombic ice is no longer selected at 240K and 250K. This observation agrees well with one of our key conclusions that the non-classical two-step pathway (where the rate-limiting step involves the formation of rhombic ice) becomes less significant as temperature increases. In order to compare the flux of the classical and non-classical pathways at different temperatures, we adopted the same CVs and state-definition obtained at T=230K across all three temperatures. In addition, we have also performed a control analysis to show that these five CVs chosen at 230K (largely overlapped with CVs chosen by Spectral-oASIS at 240K and 250K) can still reasonably reproduce the slowest timescales at these two higher temperatures (see right panels in Supplementary Fig. 15a and b for 240K and 250K, respectively). Note that extra care should be taken since some of the microstates are missing at higher temperatures. The microstate MSMs were also validated by implied timescale analysis and the Chapman-Kolmogorov test. Supplementary Fig. 9b and 10b show that the implied timescales plateau for both cases at t=4 ns and 1.5 ns, respectively, implying that the MSMs reach Markovian. In addition, the residence probabilities for the most populated microstates in Supplementary Fig. 9c and 10c further validate the MSMs for T=240 K and 250 K, respectively. is the MFPT for the system from macro-state I to macro-states IV or V for flux i. Note that the heterogeneous nucleation rate should normally be evaluated using the surface area (for nucleation on a surface). Here, a volume-based nucleation rate is used to allow for a direct comparison with homogeneous nucleation 23 .
Supplementary Method 6. Identification of the critical ice nucleus via committor probability analysis The flux of the nucleation pathways was analysed based on a mathematical framework of TPT [25][26][27][28] . We computed the flux of ensembled nucleation pathways along with their relative probabilities at the microstate level. For visualization purposes, the net flux of the transitions was assembled into coarse-grained macrostates 29 . To identify the critical ice nuclei along the classical and non-classical pathways, the saddle points along the two pathways were obtained using two independent TPT analyses based on the committor analysis at the microstate level. As the ratelimiting step of the nucleation flux is proximal to where the flux bifurcates, the committor analyses were located around the bifurcation. For TS I and TS II, the transition microstates can be identified with a corresponding committor probability of 0.5, meaning they have a 50% probability to return to state I, while the other 50% corresponds to the probability to proceed to the microstates in state IV and III, respectively (see Fig. 3a in the main text).