Efficient Gaussian Process Regression for prediction of molecular crystals harmonic free energies

We present a method to accurately predict the Helmholtz harmonic free energies of molecular crystals in high-throughput settings. This is achieved by devising a computationally efficient framework that employs a Gaussian Process Regression model based on local atomic environments. The cost to train the model with ab initio potentials is reduced by starting the optimisation of the framework parameters, as well as the training and validation sets, with an empirical potential. This is then transferred to train the model based on density-functional theory potentials, including dispersion-corrections. We benchmarked our framework on a set of 444 hydrocarbon crystal structures, comprising 38 polymorphs, and 406 crystal structures either measured in different conditions or derived from them. Superior performance and high prediction accuracy, with mean absolute deviation below 0.04 kJ/mol/atom at 300 K is achieved by training on as little as 60 crystal structures. Furthermore, we demonstrate the predictive efficiency and accuracy of the developed framework by successfully calculating the thermal lattice expansion of aromatic hydrocarbon crystals within the quasi-harmonic approximation, and predict how lattice expansion affects the polymorph stability ranking.


I. INTRODUCTION
Polymorphism and the prediction of the energetic stability of a crystal polymorph are a fundamental problem of condensed matter physics, especially for the research and applications of molecular crystals. Polymorphism is the capability of solid materials to form more than one distinct crystal structure [1,2]. It is particularly pronounced when multiple atomic or molecular packing arrangements are characterised by a similar free energy. The physicochemical properties of these systems, such as mechanical and optical characteristics, melting point, chemical reactivity, solubility or stability are tied strongly to the crystal morphology, therefore increasing the relevance of a comprehensive structure screening and the prediction of the relative stability of polymorphs for a broad range of industries [3].
High-throughput computational screening of crystal structures based on free energies is rarely performed due to its high complexity as well as large computational effort, in particular if a first-principles potential energy surface is required [4]. It is more common to evaluate the relative stability of crystal polymorphs by calculating the lattice energy taking into account only potential energy contributions [5][6][7][8][9], effectively disregarding enthalpic and entropic contributions at finite temperature [2,10]. Finite pressure contributions when comparing different phases at different pressures is typically of a lower * marcin.krynski@pw.edu.pl † Present address: Warsaw University of Technology, Faculty of Physics, Koszykowa 75, 00-662 Warsaw, Poland magnitude, reaching only about 1 kJ/mol/molecule for pressure difference of several gigapascals. It was shown [11] that even if the vibrational free energy difference between two given polymporphs lies typically around 2 kJ/mol/molecule, it is sufficient to cause a rearrangement of the polymporph relative stability ranking. Furthermore, even when the vibrational contribution to the relative stability is taken into account in a number of cases, the effect of the thermal expansion of the crystal unitcell on the free energy is most frequently omitted. This is due to the, typically low impact of the thermal expansion on the free energy (around 1 − 2 kJ/mol/molecule [12]), which is, nevertheless, also sufficient to affect the polymporph stability ranking.
The vibrational part of the free energy can be accessed by, among others, two straightforward types of calculation: within the harmonic approximation given by lattice dynamics calculations [13,14] and with statistical sampling methods that accounts for all anharmonic contributions, for example via thermodynamic integration (TI) [15][16][17][18][19]. Even though methods like TI are more accurate, they are also extremely computationally demanding, requiring a large amount of statistical sampling in order to achieve the necessary accuracy. This renders this technique often impossible to carry out within a highthroughput setting. Approximations to the contribution of anharmonic terms to the free energy can be accessed by a number of other methods that are less computationally demanding. However, such approximations have been shown not to present a significant improvement over the much less computationally demanding harmonic approximation for the investigation of polymporph relative stability [20]. Still, harmonic lattice dynamics are not a viable solution for high-throughput screening if force evaluations are a bottleneck, since the calculation typically involves hundreds of force evaluations for a single structure (or costly perturbation theory techniques), considering the full unit cell.
Within the last decades, the rapid increase of computer power, allied to the rise of machine learning (ML) and big-data algorithms in the realm of material science, allowed for large-scale screening of materials properties, including those related to polymorphism [10,[21][22][23][24][25][26][27][28]. There are only a handful of examples where vibrational free energies [29], or other quantities related to the vibrational density of states [30][31][32][33], were successfully predicted with the assistance of machine learning (ML) methods. Those methods, however, do not focus on high transferability, or, if they do, rarely achieve the necessary accuracy to differentiate between polymorphs. Clearly, if one could train a very accurate ML interatomic potential for a large class of systems, it would represent the best solution for the evaluation of lattice energies and free energies at the same time. However, despite the exceptional performance of many such potentials, typical root-meansquare errors on the forces lie around 20 meV/Å/atom [34][35][36][37][38][39]. With such errors, the expected prediction accuracy of phonon modes is ±0.15 THz for the best performing potentials [34]. If the resulting phonon accuracy, as in [34], is assumed to be constant along the entire frequency range, the harmonic free energy calculation error amounts to 0.38 kJ/mol/atom.
In this study, we target high accuracy and low computational cost for harmonic free energy predictions. We build a model for the prediction of Helmholtz harmonic free energies of molecular crystals based on Gaussian Process Regression (GPR) and Smooth Overlap of Atomic Positions (SOAP) [40] descriptors for representing the local atomic environments. We optimize the training and validation set selection with a computationally cheap empirical potential, confirm its transferability to a firstprinciples potential, and proceed to achieve a model with first-principles accuracy with a very low cost of training. For a set of hydrocarbon crystals, we are able to achieve a mean absolute error on the free energies of 0.04 kJ/mol/atom. We analyzed the stability ranking for a few families of hydrocarbon crystal polymorphs up to 300 K, highlighting the power and accuracy of the model. Furthermore, this method can predict the anisotropic lattice expansion of these crystals, allowing a cheap evaluation of volume expansion and free energies in the quasiharmonic approximation.

II. RESULTS AND DISCUSSION
Because it was shown [20] that the harmonic approximation to the free energy can be a suitable estimate for the computation of the relative stability between different structures of molecular crystals, this project focuses on predicting the harmonic Helmholtz free energies F . Con-tributions from pressure that would be described instead by the Gibbs free energy are not considered because the structures regarded in this study are typically observed much below 1GPa of pressure, making this contribution to the free energy negligible. Throughout this paper, for the sake of simplicity, F is evaluated at the Γ point of the Brillouin zone of a given unit cell. We consider unit cells larger than the primitive cell where needed (see Methods). The harmonic free energies are thus calculated as where ω i is the frequency of a given phonon mode at the Γ point. When taking lattice expansion into account, the vibrational frequencies depend indirectly on the temperature such that ω i = ω i (V (T )).

A. Definition of the GPR model
The key assumption of the free energy prediction approach explored in this project is that even if free energies are defined only for the entire collection of atoms of the crystal structure, they can be decomposed into local contributions of atomic environments. The approach of casting a global property on local environments was explored previously [41,42] for the generation of an interatomic potential from quantum mechanical data. The problem of the harmonic Helmholtz free energy prediction is approached by connecting the atomic-wise free energy to the full free energy by where F is the vector with all measured free energies for a given crystal set of dimension N s (number of crystal structures in the training set), M is an incidence matrix of dimension N s × N ae (number of atom environments in the given set) and f is the vector of all, unobserved, atom-wise free energies in the chosen ensemble. Then, the prediction of f in the training set is modeled as where C is the matrix containing the similarities between pairs of atomic environments (dimension N ae × N ae ), defined as where σ is a scaling prefactor, and q i is a vector of length D describing local atomic environments. C ij corresponds to the Gaussian kernel. In Eq. 3, α is a vector of N ae weights for each atomic environment, such that Opening up this equation element-wise, the full free energy of one sample i in the training set is given by Optimizing the weights α k is equivalent to minimizing the loss function where σ 2 is a regularization parameter related to the variance of the noise of the data. Finally, substituting Eq. 6 into 7, the minimization is straightforward and leads to where I is the identity matrix of dimensions N s × N s . In this way, one can obtain the optimized weights with no need to define or observe atom-wise free energies. Finally the prediction of the free energy of a new structure that is not contained in the training set is achieved by calculating where C * is the similarity matrix between the atomic environments q * of the new structure to the ones in the training set, with elements All hyper-parameters for the GPR model and the representations were selected by minimising, using the steepest descent method, the negative log marginal likelihood function [43] − ln P (F |(l, σ , θ)) = 1 where θ is a vector containing the hyperparameters of the representations entering q. The application of the steepest descent method is only guaranteed to find a local minimum. A wide space of hyper-parameters was considered in order to increase the probability of finding a global minimum. In all supervised machine learning based models, the quality of the model strongly depends on the quality of the training set. Typically, selecting the training set can be done by either a random selection of samples, given that the considered ensemble is fairly homogeneous, or by implementing methods that aim at covering the sampled domain by maximising the resulting prediction accuracy, such as the "correlation" clustering method [44], genetic optimization [45] or k-fold cross-validation [46]. Unfortunately, most of the methods from the latter group require a large pool of data for which the target property, like free energy in this case, is available. In this study, because one of the objectives is to minimize the computational cost of obtaining a good training set, the applied procedure focuses on selecting an optimal training subset based exclusively on the geometrical parameters of the crystal structure.
For this purpose, the farthest point sampling (FPS) [47] method is applied, that searches for a subset of the entire investigated crystal structure ensemble that covers evenly all structural motifs of the sampled domain with minimal information overlap. First, a similarity measures between molecular crystal structure R a→b is defined according to the best-match structural kernel [48] method, as it is needed for the application of the FPS where C(q a j , q b i ) is the kernel matrix element defined in equation 4, q a j and q b i are i-th and j-th atomic environment representations of structure a and b respectively, and similarly n a and n b are the number of atoms in those structures. R a→b defines how well atoms of structure a can represent geometrical motifs of structure b and R a→b = R b→a . In other words, it is possible that atomic environments of structure a represent well those of structure b, while structure b contains geometric features not present in a. This method of defining the relationship between crystal structures is very similar to others typically chosen for such tasks [40,[48][49][50][51], with the difference that R a→b is not invariant with respect to the crystal structure index (R a→b = R b→a ) so it is not a similarity metric in a strict sense. Next, according to the FPS algorithm, the training set is created by iteratively picking structures that are least represented by those already present in the training set. Since any crystal structure can be used as the starting point for the FPS algorithm, the applied method selects N Ω potential training sets, where N Ω is the number of crystal structures in the considered ensemble. In order to choose out of N Ω potential training sets, we have investigated the scaled cumulative sum I(N m ) of the R a→b , (13) where N m is the total number of the molecular crystal configurations in the training set. This quantity reveals how fast a given training set candidate converges to unity, which we consider to represent a full coverage of the sampled feature space. In another sense, the I(N m ) quantity can be seen as the description of the information acquisition during consecutive steps of the FPS algorithm. Finally, training set with the highest recorded value of I(N m ) after all N m = 60 steps of the FPS algorithm is chosen. The training set size of N m = 60 was chosen because above this number, the improvement of the prediction accuracy was too small to justify a larger training set and the associated increase in computational effort.
FIG. 1: a) Learning curves obtained for 100 randomly chosen learning sets out of total 444 in grey, and the chosen learning set with the highest recorded value of I(N m ) in red. Data was obtained using SOAP representation and the classical force field model with the free energy calculated at 300K. b) Correlation between predicted and calculated free energies at 300 K (classical force field, SOAP representations). Different crystal families are represented by different colors.
In the same spirit of maximizing accuracy and minimizing cost, with the objective of performing free energy predictions with ab initio data, the aim was to select an efficient and reliable validation set, without using the entire ensemble. Here the goal is to create such a subset that would represent well the entire set, so as to include, for example, a proportional number of outlier structures as found in the entire set. A random selection of validation set would not fulfill this criterion due to the limited size of validation set used in this project. Additionally, this task largely differs from selecting the training set, because it typically contains a greater relative number of outliers compared to the entire set. In order to optimally select the validation set, while preserving the density of outliers, a stratified approach was used. Here each crystal structure a is assigned a similarity index S a , that compares a given crystal to entire set The relatively high values of S a indicate a "typical" crystal and low values indicate "outliers". Next, the entire set is sorted with respect to S a and the validation set is chosen by selecting every n-th element of the sorted set, with n = round(N Γ /N Θ ) where N Γ and N Θ are the target numbers of structures in the validation and training sets. All sets sorted with respect to S a are presented in supplementary information Figure SI2.
Within the discussed framework and common to many ML models, the choice of method encoding the atomic environments to numerical representations has an impact on the resulting performance of the model. In this project, three well-established general-use atomic environment representations [52] were selected and tested, namely: Smooth Overlap of Atomic Positions (SOAP) [40] that uses spherical harmonics to locally expand atomic densities, Many Body Tensor Representation (MBTR) [53] that uses distributions of different structural motifs (like radial or angular distribution functions) and Atom-centered Symmetry Functions (ACSFs) [54] that use two-and three-body functions detecting specific features. The Python implementations of the mentioned representations found in the DScribe package [55] was used.

B. Model implementation and validation
For the purpose of this work we have chosen crystals composed of seventeen different hydrocarbons: pyrene, methylcyclopentane, styrene, naphthalene, benzene, tetracene, mesitylene, pentane, pentacene, hexane, ethylbenzene, propane, heptane, phenanthrene, butane, hexacene and anthracene. We have included most available polymporphs that could be obtained from the Cambridge Crystallographic Data Centre [56] (CCDC), leading to an ensemble of 74 structures. We noted that polymporphs of very similar lattice constant in CCDC tend to be almost identical, with close to negligible differences in atomic positions, for example, the case of ANTCEN20 and ANTCEN22. Finally, the sample domain was further expanded by introducing structures with perturbations of roughly 5% in the lattice parameters, as this can lead to up to 16% increase in unit cell volume -a typical volume expansion percentage for molecular crystals [57]. The addition of crystal structures with strongly perturbed lattice parameters was found to be crucial for the later prediction of lattice expansion coefficients. Finally, N Ω = 444 crystal structures were considered in this project.
The building and testing of the framework was initially performed using a classical force-field potential (AIREBO, as detailed in Methods). In the first steps of the model verification, the training and validation set selection criterion, based on the FPS method and maximization of I(N m ), was evaluated. For this purpose, based on the classical force field data with prediction performed at 300 K and with SOAP atomic-environment representations, free energy mean absolute error F MAE was calculated where N is the number of structures for which the prediction is performed. The results are presented in Figure 1a in the form of learning curves, with increasing size of the training set N and with the validation set. It is visible that the learning curve obtained for the chosen training set, so with the highest I(N m ), shows one of the lowest F MAE at the target training set size among all potential sets obtained using FPS method. Next, the linear and monotonic correlations between benchmark F and predicted F values was assessed by calculating the Pearson and Spearman correlation coefficients. For predictions performed at 300 K with the SOAP representation they were found to be 0.9996 and 0.9894, respectively. A value so close to 1 for these coefficients indicate a good performance of the developed framework. Furthermore, due to the low cost of the lattice dynamics calculations performed using classical force field, the F MAE was inspected for the entire set (300 K with the SOAP representation) and it was found to be 0.042 kJ/mol/atom. Additionally, the F MAE = 0.218 kJ/mol/atom was obtained for 10% of the crystals with the poorest prediction and 0.023 kJ/mol/atom for the remaining 90% of samples. Figure 1b shows the predicted free energy values F compared with the benchmark data F for the different crystal families. The analysis gives an indication of the system-sensitive performance of the framework, revealing that crystals of pentane, pentacene, tetracene and hexane are characterised by the poorest averaged prediction accuracy, with the F MAE around 2 kJ/mol/molecule, reaching a possible free energy difference between different polymporphs [11,12]. Additionally, the predictions performed for crystal structures with strongly perturbed lattice parameters were noticeably poorer, even if the training set contained parental crystal structures. Nevertheless, the prediction accuracy overall is very high, especially considering the diversity of hydrocarbons represented. Figure 2a shows the learning curves by monitoring F MAE at 300 K with increasing training set size and a constant validation set. Additionally, the impact of the atomic environment representation on the efficiency of the method was investigated. The learning is wellbehaved for all representations, as expected for properly parameterized machine learning models. The results obtained with the SOAP representation, with a 6Å cutoff and 1Å for the standard deviation of the Gaussian functions used to expand the atomic density, are characterised by the lowest F MAE , showing that it is the best representation within the investigated set. Finally, the accuracy of the predictions are noticeably affected by the temperature at which the free energies are required, going from 0.019 kJ/mol/atom at 300 K and 0.015 kJ/mol/atom at 200 K to as low as 0.002 kJ/mol at 0 K. ferability of the model when using DFT data was investigated. For that, the PBE exchange correlation functional with pairwise van der Waals interactions was employed, as detailed in Methods. As a test, the similarity between relaxed structures obtained with the empirical potential and DFT was assessed by analysing the root mean square deviation (RMSD) of the atomic positions averaged over entire set. RMSD for carbon and hydrogen were 0.16 A and 0.20Å, respectively. Importantly, differences in the SOAP representation were also investigated by calculating the root mean square error normalized by the standard deviation X , defined as where D is the number of features of the representation and N is the number of atoms for which the X was calculated. Obtained values for both carbon and hydrogen are C =1.09 and H =1.15 respectively. Those results show that the overall structural features are in good agreement in these two potential energy surfaces. As a consequence of this structural similarity between two data sets, the training and validation sets obtained with the empirical potential, as explained in the previous section, can be automatically used in DFT. As a cross-check, the same training and validation set optimization procedure were independently applied on the optimized DFT structures, indeed obtaining the same results. This proved that the experience gathered from the first phase of the project, where only classical data was used, is fully transferable to the current stage, where we employ more accurate ab initio data. As a result, the more expensive ab initio lattice dynamics calculations were only performed for crystal structures included in the training and validation sets, greatly reducing the computational cost of the model generation.
Finally, the hyperparameters of the GPR model were re-optimised and were used to calculate learning curves for training and validation sets shown in Figure 2b, with the SOAP, MBTR and ACSF representations. All representations presented a good performance, with MBTR and SOAP yielding very similar learning curves. The obtained F MAE for the SOAP representation at full training set was found to be 0.038 kJ/mol/atom. Interestingly, a fairly good prediction performance can be obtained with as little as 20 crystal structures, resulting in F MAE =0.07 kJ/mol/atom. Such small training sets typically do not contain all different molecular components of the crystals that are present in the entire set, but can still describe it well. The remainder of this manuscript will focus on results obtained based on the DFT data with the SOAP representation, exclusively.
The proposed framework is summarised in the flowchart in Figure 3. In addition, as it is shown in the SI, the possibility of this model trained only on hydrocarbons to extrapolate to systems containing carbon, hydrogen and nitrogen atoms was investigated. Although the prediction accuracy decreases as the concentration of nitrogen atoms in the samples increases, the model is not completely invalid. It shows that with a small addition of structures to the training set, or building representations for new atoms that combine characteristics of the atoms that were previously trained [58,59] this framework could be easily extended to other systems.

D. Relative free energies of molecular crystals: stability ranking
The GPR model was employed to create a stability ranking of several families of hydrocarbon molecular crystals. Sixteen crystal families were considered, encompassing 38 polymorphs and 36 variants corresponding to different thermodynamical conditions with lattice parameters as they are given by the CCDC [56]. Additionally, 370 crystal structures with randomly distorted lattice parameters derived from the initial 74 were included. Figure 4 shows the lattice energy and the free energy obtained at various temperatures, presented as relative values to the crystal structure characterised by the lowest free energy at 300 K (full data is found in Table S2, in the SI). The identifiers of all crystal structures follow the convention used in CCDC [56]. For many crystal families, the structure with the lowest lattice energy is not the same as the one with the lowest free energy especially at the room temperature. A clear example is the pyrene crystal and its three polymorphs: Form I is represented FIG. 4: Lattice energy E lat. and predicted free energy differences between variants of four molecular crystal families: anthracene, naphthalene, benzene and pyrene.
Full data, also for a larger number of crystals and polymporphs, is available in the Table S2 in the SI. Relative energies are calculated separately at each temperature, always with respect to the lowest free energy structure at 300K. The numbers on the labels represent the identifiers of the crystals following the convention used in CCDC [56].
by PYRENE02 and PYRENE03 (structures measured at 423 K and 113 K, respectively, and ambient pressure); Form II is represented by PYRENE07 and PYRENE10 (at 93 K and 90 K, ambient pressure); and Form III is represented by PYRENE08 and PYRENE09 (measured at at 0.3 GPa and 298 K, and at 0.5 Gpa and 298 K, respectively). Form I is measured to be more stable than form III at all temperatures up to and beyond 430 K, at ambient pressure. Here, it is shown that the energy ranking formed based on lattice energy exclusively would place the high-pressure form III PYRENE09 (form III) structure very close to PYRENE02 (form I). An inclusion of zero-point-energy and vibrational contributions already at low temperatures irrevocably destabilizes form III. A similar example is the benzene crystal. Here, structures of the ambient-pressure form I, represented by, for example, BENZEN15, BENZEN19 or BENZEN26, are characterised by overall lower free energy comparing to the high-pressure form II structures, like BENZEN16 and BENZEN17. Interestingly, for this crystal family, the lattice energy can provides a satisfactory relative stability ranking. However, the need for including the vibra-tional contributions becomes visible once a high and ambient pressure variants of one polymorph are compared, e.g. BENZEN13 and BENZEN26. It is visible in Figure  4 that if considering only lattice energies, BENZEN13 shows the lowest energy compared to other crystal variants, with lattice energy lower than that of BENZEN26 by 2.58 (kJ/mol/molecule). However, the free energy prediction shows that at 300 K, the BENZEN26 structure becomes the most stable out of all those investigated, and its relative free energy with respect to BENZEN13 is now lower by 2.92 (kJ/mol/molecule), effectively swapping places in the relative ranking stability with BEN-ZEN26. For this case, and to test the predictions of the model in practice, the free energies for both BENZEN13 and BENZEN26 structures were additionally calculated with DFT. These calculations showed that BENZEN13 is characterised by a free energy that is 3.43 (kJ/mol) higher than that of BENZEN26 at 300 K, confirming the results obtained with the GPR model.
The rearrangement of the relative stability ranking when room temperature free energy is taken into account is a very common trend among the investigated samples, and there are a number of cases, where even at 0 K the zero point energy contribution is high enough to affect the relative stability ranking. These observations are in good agreement with previous studies, where more direct methods were used [11]. In some cases, the prediction accuracy of this model is not sufficient to determine the relative stability of some structures. Nevertheless, the model is accurate enough to point towards those few that are characterised by the lowest free energies. Here, even only narrowing the pool of considered strucrures can effectively decrease the computational effort of phonon calculations required, if more accuracy is needed.

E. Predicting lattice expansion.
Because one of the challenges in high throughput computational screening of crystal structures is accounting for thermal lattice expansion, the application of the trained free energy model was explored in this context. To illustrate the procedure, a simple case where only one of the lattice parameters is being perturbed was considered. For this purpose the BENZEN11 crystal was chosen, with the lattice parameter a being sampled within 6.52Å and 7.32Å. Next, within the quasi-harmonic approximation, the free energy was calculated and predicted as a function of a. Figure 5a shows the comparison between the GPR model and DFT calculations for the free energy at 200 K. The optimal lattice parameter a is determined by a Birch-Murnaghan [60] fit. While there are small differences between the DFT and the GPR curves, mostly consisting of a shift in energy, the resulting optimal lattice parameter a is very similar in both cases, and equal to 6.92Å and 6.95Å respectively. This simple and fairly artificial example illustrates that the prediction accuracy of this framework is sufficient to be employed in the context of the lattice expansion/contraction prediction.
A more challenging task is the prediction of anisotropic lattice changes. Direct calculations of anisotropic lattice expansion requires lattice dynamic evaluations for, typically, hundreds of structures of the same crystal polymorph, making it a very costly calculation for a high-throughput setting. Although harmonic and quasiharmonic models [61] as well as an approach based on the assumption of the linear relation between free energy and volume have been proposed to overcome this cost [62], with this framework these lattice changes can be estimated without relying on any ansatz for the dependence of the free energy on the lattice parameters. It is worth noticing that even if the free energy predictions at various temperatures requires training the ML model multiple times, it happens with minimal computational overhead once appropriate lattice vibrations have been computed. Four molecular crystals were picked, namely P 2 1 /a anthracene (ANTCEN), P bca benzene (BENZEN), P 1 pentacene (PENCEN01) and P bcn styrene (ZZZTKA01) and hundreds of ionic relaxations with the a, b and c lattice parameters perturbed by around 5% were performed. Next, for each of those perturbed structures free energy prediction at a number of temperatures from 0 K to 300 K range was performed. Figure 5b shows a 3D visualisation of free energy predictions for over 300 different combinations of lattice parameters a, b and c of the ANTCEN crystal. Even with such a high number of sampled lattice parameter combinations, the position of the free energy local minima might not overlap with the gathered data. In this case, in order to find the minimum in this high-dimensional space, an active learning based on the GPR algorithm is employed. Here, the GPR is used as a multi-dimensional, non-linear regressor, as implemented in the scikit-learn [63] package. In detail, the following bootstrap procedure is used: 1. Identifying the position of the data point with the lowest free energy value according to the GPR 3D interpolation.
2. For the chosen set of (a, b, c) lattice parameters perform an ionic relaxation and predict the free energy with the trained model.
3. If the predicted free energy of the (a, b, c) sample varies from the free energy obtained by the 3D GPR regression, a new 3D GPR regression is performed, now explicitly including sample (a, b, c), then go back to step 1.
4. If the predicted free energy of the (a, b, c) sample is sufficiently close to the one of the 3D GPR regression (within ±0.1%), then the scheme is stopped and the optimal lattice parameters are considered to be found.
We found that typically only around 3 additional relaxations and free energy predictions (per temperature) are necessary to achieve sufficient convergence of the lattice parameters. By employing this procedure to predict the anisotropic lattice changes the lattice-parameter change is calculated, as well as the full volume change of the selected crystals, as shown in Figure S5.
The results obtained can be compared to experimental values where data is available. For anthracene the experimentally measured volume change is V exp.
120K /V exp. 83K = 1.017 [67,68] and V M L 120K /V M L 83K = 1.009. The predictions are quite close to experimental data and overall a high degree of anisotropy is observed. Moreover, a deviation from a linear behavior of the free energy change with respect to volume is observed, as shown in Figure S6.
This framework can thus be used to create the relative stability ranking including the thermal expansion effect on the free energy. Here, one example of how this can impact the relative stability and crystal form of these systems is presented. For this purpose, BEN-ZEN13 and BENZENB26 (high and low pressure variants of the P 2 1 /b2 1 /c2 1 /a benzene I polymorph [69]) are selected, as well as BENZEN16 (a high pressure P 2 1 /c benzene II polymorph [70]). The initial lattice constants were taken from the CCDC. As shown in Figure 5c, by simply searching for the free energy minimum at 0 K using the procedure described above, BENZEN13 and BENZEN26 were found to end up being characterised by almost identical (predicted) free energies and lattice constants. Further inspection indicated that indeed the BENZEN13 and BENZEN26 structures converged to the same structure, and the same behavior was found at all investigated temperatures. Even if somewhat expected, given that they are high and low pressure phases within the same crystal group and in the absence of any applied pressure it is natural that they both adopt the lowpressure structure, the fact that this result came from the model alone, and that the free energy predictions were able to capture this transition, shows that the method is robust. The BENZEN16 structure is stabilized by 1 kJ/mol/molecule upon increasing the temperature from 0 K to 300 K, as shown in Figure 5c. This stabilization is accompanied by an appreciable lattice expansion with a volume increase of around 6% from 0 to 300 K.

III. CONCLUSIONS
In summary, we proposed a framework provides a machine learning model with first principles accuracy for the harmonic Helmholtz free energies of molecular crystals, that is suitable for high-throughput studies. In addition, it was shown that the training and validation set of the model can be optimised using a cheaper empirical potential, and then transferred to first-principles calculations, thus substantially decreasing the cost of training, without sacrificing accuracy.
The model was tested to predict the relative energetic stability ranking of several diverse hydrocarbon polymorphs and distorted crystal structures derived from them and the changes on this ranking with increasing temperature was studied. We observed that in most cases, omitting thermal effects and instead using only the lattice energy, leads to misleading results. Furthermore, it was shown that the model can be efficiently employed to calculate the anisotropic lattice expansion -a task rarely approached due to its complexity and high computational demand when performed at the ab initio level. Unsurprisingly, taking the anisotropic lattice expansion into account leads to further changes in the stability ranking. Naturally, the same framework could be used to predict other quantities derived from vibrational properties, like the vibrational heat capacity.
The strengths of this framework are its low computational cost, reliability and accuracy. However, because the model is trained to directly predict free energies, one still has to deal with the computational cost of obtaining optimized structures, which here we obtained from firstprinciples geometry optimizations. Nonetheless, fitting a machine-learned interatomic potential is becoming more streamlined [71], even though these potentials rarely target the accurate description of vibrational properties due to the added complexity of including them in the learning procedure. The presented framework, on the other hand, can be easily combined with any potential that can predict structures in a reasonable manner and has the potential to be more accurate.
Extending this framework beyond hydrocarbon-based crystals could be straightforward, albeit perhaps requiring different training data. We have already observed that the framework is capable of predicting DFT free energies from FF-relaxed structures with promising accuracy (see Supplementary Information). Finally, targeting fully anharmonic free energies with ab initio accuracy is still a daunting task that can, nevertheless, profit from the knowledge gained in this study.

IV. METHODS
Geometry optimisation calculations with empirical potentials were performed using LAMMPS [72] together with AIREBO [73] interatomic potentials. The conjugate gradient minimization algorithm was used with dummy parameters to ensure full convergence, namely 10 −25 (1) and 4 × 10 −25 kJ/mol/Å for energy and forces respectively and with 5 · 10 4 maximum iterations of the minimizer. Phonon calculations with the empirical potentials were performed using the i-PI [74] code, considering 2 × 2 × 2 repetitions of the primitive cell. The phonons were calculated by finite differences with a 0.005Å displacement in all Cartesian directions.
All ab initio simulations were performed using the FHI-aims package [75]. For this purpose, we employed light settings for all atomic species, together with the Perdew-Burke-Ernzerhof exchange-correlation functional [76] and many-body dispersion corrections [77]. We have used 5 × 5 × 5 k-point sampling of the Brillouin zone. A self-consistency convergence criterion of 10 −5 eV/Å was imposed on the forces, which ensured that energies were converged to 10 −7 eV or below. The relaxation was performed using the trust radius version of the Broyden-Fletcher-Goldfarb-Shanno [78,79] optimization algorithm with the maximum residual force component threshold equal to 10 −4 eV/Å. Lattice dynamics calculations were performed through finite differences using Phonopy [80]. The atomic displacements were of 0.002 A in all Cartesian directions. The size of the supercell was individually chosen for the different molecular crystals, with the requirement that at least twice the distance between molecular centers of mass of adjacent molecules was comprised by the vector lengths in each direction.
The framework for the GPR model was developed using Python3 and Fortran95 languages. A preliminary version of the core functionalities is available in https://github.com/sabia-group/fep.git. The SOAP, MBTR and ACSF representations were calculated using the DScribe [55] package.

DATA AVAILABILITY
All data necessary to replicate and interpret the free energy prediction framework discussed in this article can be accessed in the NOMAD repository with identifier https://dx.doi.org/10.17172/NOMAD/2020.09.16-1. Figure S1: Convergence of the I Θ (N m ) calculated for all potential training set candidates. Colours from yellow to brown marks sets of low to high I Θ (N m ) convergence. The red colour is marking the selected training set. S1  Figure S2: Sorted vector S a calculated for each of discussed sets: validation, training and the entire set. Each dot represent one molecular crystal. Presented in the Figure S2 sorted vector S a for the training set differs greatly from this of the entire set. This is caused by the necessity for the training set to cover proportionally more outliers than those present in the entire set.

GPR Model: Parameters and Tests
S2 Figure S3: where X is the subset of N X crystal structures for which the prediction is performed) calculated for the validation and the training set for both, ab initio and classical models at 300K.  Figure S4: Temperature evolution of the relative lattice parameters calculated for four investigated molecular crystals. Figure S5: Temperature evolution of the relative volume calculated for four investigated molecular crystals.

Predicting DFT free energies from force field structures
We have analyzed whether the force-field (FF) structures could be used to predict DFT free energies. In order to obtain an upper limit for the errors that such study would yield, we performed additional calculations of DFT free energies using geometries obtained by relaxations with the AIREBO force-field. We have used the same training (all 60 structures), validation sets and SOAP descriptors as in the manuscript. We performed a new optimization of the hyperparameters of the GPR model with the same approach as presented in the manuscript and obtained values: σ=0.079, l=18 and σ =0.04. The predictions were performed at 300K. Figure S8 shows a correlation between predicted and calculated free energies. We have obtained a mean absolute deviation of 0.07 kJ/mol/atomalmost a factor 2 higher when compared to the 0.04 kJ/mol/atom when DFT structures were used. It is expected that once more accurate potential is used, resulting in structures closer to those obtained in DFT, a higher prediction accuracy could be achieved. This method thus shows some promise that the DFT free energies could be directly from FF structures and potentially free energies. Figure S8: Correlation between predicted F and calculated F free energies at 300 K, with a model trained on force-field structures, but with DFT free-energy predictions. Different crystal families are represented by different colors.