Materials property prediction for limited datasets enabled by feature selection and joint learning with MODNet

In order to make accurate predictions of material properties, current machine-learning approaches generally require large amounts of data, which are often not available in practice. In this work, MODNet, an all-round framework, is presented, which relies on a feedforward neural network, the selection of physically meaningful features and, when applicable, joint learning. Besides being faster in terms of training time, this approach is shown to outperform current graph-network models on small datasets. In particular, the vibrational entropy of crystals at 305 K is predicted with a mean absolute test error of 0.009 meV/K/atom (four times lower than previous studies). Furthermore, joint learning reduces the test error compared to single-target learning and enables the prediction of multiple properties at once, such as temperature functions. Finally, the selection algorithm highlights the most important features and thus helps in understanding the underlying physics.


I. INTRODUCTION
Designing new high-performance materials is a key factor for the success of many technological applications [1]. In this respect, Machine Learning (ML) has recently emerged as a particularly useful technique in materials science (for a review, see e.g. Refs. [2][3][4]).
Complex properties can indeed be predicted by surrogate models in a fraction of the time, with almost the same accuracy as conventional quantum methods, allowing for a much faster screening of materials.
Many studies have been published lately, differing by the feature-generation approaches or the underlying ML models. Concerning crystalline solids, the majority of methods presented up to now can mainly be divided into three categories. The first one, called 'ad hoc' models here, relies on a case-by-case study, targeted on a specific group of materials and a specific property. Typically, hand-crafted descriptors are tailored in order to suit the physics of the underlying property and are the major point of attention, while common, simple-to-use ML models are chosen. Some examples include the identification of Heusler compounds of type AB2C [5], force-field fitting using many-body symmetry functions [6], the prediction of magnetic moments for lanthanide-transition metal alloys [7], or of formation energies by the sine Coulomb matrix [8]. This type of method is popular because it is simpler to construct case-by-case descriptors, motivated by intuition, than general all-round features. Furthermore, by focusing on a specific problem, good accuracy is often achieved. For instance, performance is increased when learning on a particular structure, which is therefore inherently built into the model.
The second category, which appeared more recently, gathers more general models that are applicable to various materials and properties based on graph networks. They transform the raw crystal input into a graph and process it through a series of convolutional layers, inspired by deep learning as used in the image-recognition field [9]. Examples of such graph models are the Crystal Graph Convolutional Neural Network (CGCNN) [10] and the MatErials Graph Network (MEGNet) [11].
Graph models are very convenient as they can be used for any material property. However, their accuracy crucially depends on the quantity of available data. Since the problems that would benefit the most from machine learning are the ones that are computationally demanding with conventional quantum methods, they are precisely those for which less data is available. For instance, the band gap has been computed within GW for 80 crystals [12], the lattice thermal conductivity for 101 compounds [13], and the vibrational properties for 1245 materials [14]. It is therefore important to develop techniques that can deal efficiently with limited datasets. This has resulted in a third category of models trying to bridge the gap between the two former ones and combine their advantages. Examples are the sure independence screening and sparsifying operator (SISSO) [15], Automatminer [16], CrabNet [17] and AtomSet [18].
The present article introduces a model that falls in this third category. It is based on three key aspects for achieving good performance on small datasets: physically meaningful features, feature selection, and joint learning. We show that this framework is very effective in predicting various properties of solids from small datasets and why feature selection is important in this regime. Finally, the selection algorithm also allows one to identify the most important features and thus helps in understanding the underlying physics.

II. RESULTS

A. The MODNet model
The model proposed here consists in building a feedforward neural network with an optimal set of descriptors. This reduces the optimization space without relying on a massive amount of data. Prior physical knowledge and constraints are taken into account by adopting physically meaningful features selected by a relevance-redundancy algorithm. Moreover, we propose an architecture that, if desired, learns on multiple properties with good accuracy. This makes it easy to predict more complex objects such as temperature-, pressure-, or energy-dependent functions (such as the density of states). The model, illustrated in Figure 1, is thus referred to as the Material Optimal Descriptor Network (MODNet). Both ideas, feature selection and the joint-learning architecture, are detailed further below.

The raw structure is first transformed into a machine-understandable representation. The latter should fulfill a number of constraints such as rotational, translational and permutational invariance and should also be unique. In this study, the structure is represented by a list of descriptors based on physical, chemical, and geometrical properties. In contrast with more flexible graph representations, these features contain pre-processed knowledge driven by physical and chemical intuition. Their unknown connection to the target can thus be found more directly by the machine, which is key when dealing with limited datasets. In comparison, general graph-based frameworks could certainly learn these physical and chemical representations automatically, but this would require much larger amounts of data, which are often not available. In other words, part of the learning is already done before training the neural network. To do so, we rely on a large number of features previously published in the literature, which were centralized into the matminer project [19]. These features cover a large spectrum of physical, chemical, and geometrical properties, such as elemental (e.g. atomic mass or electronegativity), structural (e.g. space group) and site-related (i.e. local environments) features. We believe that they are diverse and descriptive enough to predict any property with excellent accuracy. Importantly, a subset of relevant features is then selected, in order to reduce redundancy and therefore limit the curse of dimensionality [20], a phenomenon that inhibits generalization accuracy. In particular, previous works showed the benefit of feature selection when learning on material properties [15,21].
We propose a feature selection process based on the Normalized Mutual Information (NMI), defined as

NMI(X, Y) = MI(X, Y) / [(H(X) + H(Y)) / 2],

with MI the mutual information, computed as described in Ref. [22], and H the information entropy (H(X) = MI(X, X)). The NMI, which is bounded between 0 and 1, provides a measure of any relation between two random variables X and Y. It goes beyond the Pearson correlation, which is parametric (it makes the hypothesis of a linear model) and very sensitive to outliers.
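As a minimal sketch, the NMI between two variables can be estimated with scikit-learn's nearest-neighbour mutual-information estimator, taking H(X) = MI(X, X) and assuming a normalization by the mean of the two self-informations; the toy data below is purely illustrative:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def nmi(x, y, random_state=0):
    """Normalized mutual information between two 1-D arrays (sketch).

    MI is estimated with scikit-learn's nearest-neighbour estimator;
    the normalization by the mean of the two self-informations is an
    assumption made for this illustration.
    """
    mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=random_state)[0]
    h_x = mutual_info_regression(x.reshape(-1, 1), x, random_state=random_state)[0]
    h_y = mutual_info_regression(y.reshape(-1, 1), y, random_state=random_state)[0]
    return mi / (0.5 * (h_x + h_y))

x = np.linspace(0.0, 1.0, 200)
nmi_dependent = nmi(x, x**2)  # strong (nonlinear) dependence, high NMI
nmi_shuffled = nmi(x, np.random.default_rng(1).permutation(x))  # ~independent, low NMI
```

Note that, unlike the Pearson correlation, the nonlinear relation y = x² is still detected as a strong dependence.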
Given a set of features F, the selection process for extracting the subset F_S goes as follows. When the latter is empty, the first chosen feature is the one having the highest NMI with the target variable y. Once F_S is non-empty, the next chosen feature f is selected as the one having the highest relevance-redundancy (RR) score,

RR(f) = NMI(f, y) / [(max_{f_s ∈ F_S} NMI(f, f_s))^p + c],

where (p, c) are two hyperparameters determining the balance between relevance and redundancy. In practice, varying these two parameters dynamically seems to work better, as redundancy is a bigger issue when few features have been selected. Practically, after some empirical testing, we decided to set p = max(0.1, 4.5 − n^0.4) and c = 10^−6 n^3 when F_S includes n features, but other functions might work even better. The selection proceeds until the number of features reaches a threshold, which can be fixed arbitrarily or, better, optimized such that the model error is minimized. When dealing with multiple properties, the union of relevant features over all targets is taken. Our selection process is in principle very similar to the mRMR algorithm [23], but it goes beyond it by combining redundancy and relevance in a more flexible way through the parameters p and c. Furthermore, it is less computationally expensive than the Correlation-based Feature Selection (CFS) [24] and provides a global ranking.
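The greedy loop just described can be sketched as follows, assuming the NMI values have been precomputed; the toy NMI matrices below are made up for illustration (features 0 and 1 are highly redundant, feature 2 is weakly relevant but complementary):

```python
import numpy as np

def mod_select(nmi_target, nmi_cross, n_select):
    """Greedy relevance-redundancy (RR) feature selection (sketch).

    nmi_target: (F,) array, NMI of each feature with the target y.
    nmi_cross:  (F, F) symmetric array, NMI between pairs of features.
    """
    selected = [int(np.argmax(nmi_target))]  # most relevant feature first
    remaining = [f for f in range(len(nmi_target)) if f != selected[0]]
    while remaining and len(selected) < n_select:
        n = len(selected)
        p = max(0.1, 4.5 - n**0.4)  # dynamic hyperparameters, as in the text
        c = 1e-6 * n**3
        # RR(f) = NMI(f, y) / [ (max_{fs in FS} NMI(f, fs))^p + c ]
        rr = {f: nmi_target[f] / (max(nmi_cross[f, s] for s in selected)**p + c)
              for f in remaining}
        best = max(rr, key=rr.get)
        selected.append(best)
        remaining.remove(best)
    return selected

nmi_target = np.array([0.90, 0.85, 0.20])
nmi_cross = np.array([[1.00, 0.95, 0.05],
                      [0.95, 1.00, 0.05],
                      [0.05, 0.05, 1.00]])
order = mod_select(nmi_target, nmi_cross, 3)
```

Despite its higher relevance, feature 1 is selected after feature 2 because it is almost fully redundant with feature 0, which is exactly the behavior the RR score is designed to produce.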
In contrast with what is usually done, we take advantage of learning on multiple properties simultaneously, as recently proposed for SISSO [25]. This could be used, for instance, to predict temperature curves for a particular property.
In order to do so, we use the architecture presented in Figure 1. Here, the neural network consists of successive blocks (each composed of a succession of fully connected and batch normalization layers) that split on the different properties depending on their similarity, in a tree-like architecture. The successive layers decode and encode the representation from general (genome encoder) to very specific (individual properties). Layers closer to the input are shared by more properties and are thus optimized on a larger set of samples, imitating a virtually larger dataset. These first layers gather knowledge from multiple properties, a mechanism known as joint-transfer learning [26]. This limits overfitting and slightly improves accuracy compared to single-target prediction.
Taking vibrational properties as an example, the first-level block converts the features into a condensed all-round vector representing the material. Then, a second-level block transforms this representation into a more specific thermodynamic representation that is shared by many third-level predictor blocks, predicting different thermodynamic properties (specific heat, entropy, enthalpy, energy at various temperatures). A fourth-level block splits the different predictors based on the actual property, with the various temperature predictors of a given property sharing the preceding layers. Optionally, another second-level block could be built, shared by mechanical third-level predictors.
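The tree-like sharing of layers can be illustrated with a plain-NumPy forward pass (the actual MODNet is a trainable Keras network with batch normalization; all layer sizes, the number of input features, and the two heads below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(n_in, n_out):
    # random untrained weights, for shape illustration only
    return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)

n_features = 100                 # number of selected input features (illustrative)
W1, b1 = dense(n_features, 128)  # first-level block: all-round material vector
W2, b2 = dense(128, 64)          # second-level block: thermodynamic representation
Ws, bs = dense(64, 40)           # head predicting the entropy at 40 temperatures
Wc, bc = dense(64, 40)           # head predicting the specific heat at 40 temperatures

def forward(x):
    h = relu(x @ W1 + b1)        # shared by all properties (genome encoder)
    t = relu(h @ W2 + b2)        # shared by the thermodynamic heads only
    return t @ Ws + bs, t @ Wc + bc  # property-specific outputs

x = rng.normal(size=(5, n_features))  # 5 materials
entropy, heat_capacity = forward(x)   # each of shape (5, 40)
```

During training, gradients from both heads update W1 and W2, so the shared layers are effectively optimized on a larger set of samples, as described above.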

B. Performance assessment
To investigate the predictive performance of MODNet, two case studies are considered for properties originating from the Materials Project (MP) [14,[27][28][29][30]]. First, we focus on single-property learning. We benchmark MODNet against MEGNet, a deep-graph model, and SISSO, a compressed-sensing method, for the prediction of the formation energy, the band gap, and the refractive index. Second, we also consider multi-property learning with MODNet for the vibrational energy, enthalpy, entropy, and specific heat at 40 different temperatures, as well as the formation energy, as the latter was found to be beneficial to the overall performance. Since some models only predict one property at a time, we compare their accuracy with that of MODNet on the vibrational entropy at 305 K. Details about the datasets, training, validation, and testing procedures are provided in the Methodology section.
Table I summarizes the results for single-property learning on a left-out test set for the formation energy, the band gap, and the refractive index. The complete datasets for the formation energy and the band gap include 60 000 training samples. For the band gaps, a training set restricted to the 36 720 materials with a non-zero band gap (labeled by a superscript nz in the Table) is also considered, as was done in the original MEGNet paper [11]. For the refractive index, the complete dataset is much more limited, containing 3 240 compounds. In addition to these complete datasets, subsets of 550 random samples are also considered in order to simulate small datasets. The results are systematically compared with those obtained from MEGNet and SISSO regression. Two variants of MEGNet are used: (i) with all weights randomly initialized and (ii) with the first layers fixed to the ones learned from the formation energy (i.e. using transfer learning, as recommended by the authors when training on small datasets). MODNet systematically outperforms MEGNet and SISSO when the number of training samples is small, typically below ∼4 000 samples, even when transfer learning is used. In contrast, for the large datasets containing the formation energy and the band gap, MEGNet (even without transfer learning) leads to the lowest prediction error. SISSO was found to systematically result in higher errors, and does not show significant improvement when increasing the training size.
Depending on the amount of available data, a clear distinction should thus be made between feature- and graph-based models. The former should be preferred for small to medium datasets, while the latter should be left for large datasets, as will be confirmed for the vibrational properties.
For the second case study, i.e. multi-target learning, the dataset only includes 1 245 materials for which the vibrational properties have been computed [14].
Figure 2 shows the absolute error distribution on the vibrational entropy at 305 K (S_305K) at three training sizes (200, 500, and 1100 samples) for different strategies, with the same test set of 145 samples throughout. Furthermore, Supplementary Figure 7 reports the test MAEs as a function of the training size for the same strategies. MODNet is compared with a Random Forest (RF) learned on the composition alone (i.e. a vector representing the elemental stoichiometry), similar to a previous work relying on 300 vibrational data points [31]. This strategy is referred to as c-RF in order to distinguish it from another strategy, labeled RF, which consists in a RF learned on all computed features (covering compositional and structural features). Note that, for both c-RF and RF, performing feature selection on the input space has no effect on the results, as a RF intrinsically selects optimal features while learning. This strategy can be seen as the baseline performance. The state-of-the-art methods MEGNet with transfer learning (i.e. using the embedding trained from the formation energy) and SISSO are also used in the comparison. Another strategy, labelled AllNet, is considered, which consists of a single-output feedforward neural network taking all computed features into account. Finally, the results obtained with m-MODNet and m-SISSO, taking all thermodynamic data and formation energies, are also reported.
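The c-RF baseline can be sketched with scikit-learn; the element basis, compositions and target values below are toy assumptions, purely to show the stoichiometry-vector featurization:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

ELEMENTS = ["Li", "O", "Si", "Fe"]  # toy element basis (assumption)

def stoich_vector(composition):
    """Normalized elemental stoichiometry, e.g. {'Li': 2, 'O': 1} -> [2/3, 1/3, 0, 0]."""
    v = np.array([float(composition.get(el, 0.0)) for el in ELEMENTS])
    return v / v.sum()

compositions = [{"Li": 2, "O": 1}, {"Si": 1, "O": 2}, {"Fe": 2, "O": 3}]
X = np.array([stoich_vector(c) for c in compositions])
y = np.array([0.120, 0.095, 0.110])  # made-up entropy targets, for illustration only
c_rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
predictions = c_rf.predict(X)
```

The RF strategy differs only in its input: the full compositional and structural feature matrix replaces the stoichiometry vectors, with no explicit feature selection step.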
The lowest mean absolute error and variance are systematically found for the MODNet models, with a significant (∼8%) gain in accuracy for our joint-learning approach, more noticeable at lower training sizes. The RF approaches perform worst in our tests, with a large spread and maximum error, especially when considering only the composition. Among the remaining models, AllNet, which is also based on physical descriptors, provides a baseline to measure the gain in performance achieved thanks to feature selection. In Figure 2, and even more clearly in Supplementary Figure 7 of the Supplementary Information, it can be seen that the usefulness of feature selection decreases with the training size. While, for 200 training samples, the gain is ∼12%, it reduces to ∼5% for 1000 training samples.
It is worth noting that, at the lower end of the training-set size (see 200 samples), SISSO has an error comparable with the other methods while offering a simpler analytic formula, which can be valuable. However, when increasing the training-set size, its error distribution does not seem to improve significantly, in contrast with the other methods. Furthermore, contrary to m-MODNet, m-SISSO does not seem to provide any noticeable improvement with respect to SISSO in this example.
The m-MODNet was trained on four vibrational properties from 5 to 800 K: entropy, enthalpy, specific heat and Helmholtz free energy. Although the vibrational entropy at 305 K was systematically used to compare against other models, excellent performance was also found on the other properties. Table II contains the MAE for these four properties at 25, 305 and 705 K. Typical values of the corresponding properties found in the dataset are also given to compare against the error. As an example, we illustrate the prediction on Li2O in Figure 3, which is a good representation of the typical observed error. We want to emphasize that the gain in accuracy provided by joint learning is strongly influenced by the architecture choice. The similarity between target properties is used to decide where the tree splits, i.e., the layer up to which properties share an internal representation. In all generality, one can count the number of neurons and layers that separate two properties. This determines to which degree those two properties are related.
Increasing this distance (i.e. more layers and neurons between them) gives more freedom to the weights and improves learning of dissimilar properties. However, increasing it too much will tend to make the predictions independent, and no common hidden representation can be used to improve generalization. A good balance thus needs to be found between freedom and generalization. Note that increasing the architectural distance between two properties will always decrease the training error (up to convergence), but the validation error will have a minimum. Unfortunately, finding this minimum based on a quantitative analysis of the dataset is rarely feasible, just as finding the right architecture a priori for a single-target model. It is therefore treated as a hyperparameter, as is commonly done in the ML field. In practice, we suggest to first gather the properties in groups and subgroups based on their similarity. This will define the splits in the tree-like architecture. Then, various sizes for the layers and numbers of neurons (which will define the intra-property distance in architectural space) should be included in the regular hyperparameter optimization of the model. An in-depth example of the architectural choice for the vibrational properties can be found in the Supplementary Information, section C.

C. Feature selection
Feature selection is a valuable asset of MODNet and has two main advantages. First, it was shown in Figure 2 that an average 12% reduction in error can be obtained by removing irrelevant features. This is far from negligible. This increase in performance is achieved by reducing the noise-to-signal ratio caused by the curse of dimensionality. This is especially the case for small datasets. Supplementary Figure 7 shows that the gain in performance from feature selection reduces as the training size increases. We therefore expect that feature selection will be less important for larger datasets. Second, feature selection (compared to feature extraction) has the advantage of keeping the input space understandable. As the features are chosen according to their relation with the target (i.e. mutual information), important factors contributing to the target property can be identified. Figure 4 shows a bivariate visualization for the vibrational entropy, formation energy, band gap and refractive index as a function of the first two selected features. Thanks to the redundancy criterion, both features are complementary in predicting the target. A detailed description of these features can be found in Sec. B of the Supplementary Information. Concerning the vibrational entropy, a strong correlation is seen with the first feature, namely the AGNIFingerprint, which gives a measure of the inverse bond length. In other words, increasing the average bond length increases the vibrational entropy. Similarly, having a larger range of p-valence electrons (which is linked to ionicity) increases the vibrational entropy. Concerning the refractive index, two important factors are identified: the band gap and the density of the material. The band gap, although not explicitly given but instead approximated by the band gaps of the constituent elements, is known to be an important variable. Typically, there is an inverse relation between the band gap energy and the refractive index, see Ref. [30].
Finding materials combining a high value for both properties remains a tedious task, and could therefore certainly benefit from machine learning. Overall, it is seen how common intuitive patterns for the physicist are indeed retrieved by the machine. Therefore, this strategy can be used to analyze and find underlying factors for all types of properties and datasets.
The feature selection algorithm presented in this work is based on relevance and redundancy and will be called MOD-selection. Other popular choices exist. Here, MOD-selection is compared to five other algorithms: (i) corr-selection, in which features having the highest Pearson correlation with the target are selected first; (ii) NMI-selection, in which features having the highest NMI with the target are selected first; (iii) RF-selection, where the data is first fitted with a Random Forest (300 trees) and features are ranked according to their impurity-based importance; (iv) SISSO-selection, in which the data is first fitted by the SISSO model without applying any operator on the feature set, i.e. only primary features are used (rung set to 0) and each n-th dimension of the final model corresponds to the n-th descriptor; and (v) OMP-selection, in which an orthogonal matching pursuit is applied by using the SISSO strategy with a SIS-space restricted to one.
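Strategy (v) can be reproduced directly with scikit-learn's orthogonal matching pursuit, which greedily selects the feature best explaining the current residual at each step; the synthetic data below, in which the target depends on only two of the candidate features, is an assumption for illustration:

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))     # 200 samples, 10 candidate features
y = 3.0 * X[:, 2] - 2.0 * X[:, 7]  # target depends on features 2 and 7 only

# Greedily pick the 2 features that best explain the residual at each step.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=2).fit(X, y)
support = np.flatnonzero(omp.coef_)  # indices of the selected features
```

Because each greedy step only requires correlations with the current residual, OMP scales linearly with the number of selected features, which is what makes it usable beyond the ∼10-feature limit of SISSO discussed below.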
It is worth noting that, although SISSO is a powerful dimensionality-reduction technique, it cannot be used as such for feature selection with the same generality as the other techniques. Indeed, SISSO provides a general framework for selecting the best few descriptors from an immense set of candidates, but the selection is computationally limited to ∼10 features. This is not an issue for the original aim of SISSO (which consists in a low-dimensional model), but it surely is when used together with a neural network, where the optimal amount is typically a few hundred features. Accounting for redundancy is critical when using only a few features and, in this case, SISSO was found to be best. Unfortunately, it becomes computationally unaffordable above 10 features. Therefore, when going beyond the 10th feature, we simplified SISSO to OMP, which scales linearly with the number of features.

III. DISCUSSION
Previous results show that although state-of-the-art methods such as graph networks are very powerful on big datasets, they do not scale well to the smaller datasets typically encountered in physics. Our framework achieves excellent accuracy on limited datasets by using prior knowledge, such as preprocessed meaningful features or multiple properties for the same material. Beyond increasing accuracy, the m-MODNet is also convenient for constructing a single model for multiple properties, hence speeding up training and prediction time.
We showed that feature selection is very useful for small datasets. An improvement of 12% was found on the vibrational thermodynamics when learning on 200 samples. Moreover, an additional improvement of 8% on S_305K can be attributed to the joint-learning mechanism of MODNet. Importantly, our model provides the most accurate ML model at present for vibrational entropies, with a MAE (resp. RMSE) of 8.9 (resp. 12.0) µeV/K/atom on S_305K on a holdout test set of 145 materials. This is four times lower than reported by Legrain et al. [31] (trained on 300 compounds) and 25 times lower than reported by Tawfik et al. [32] (trained on the exact same dataset as this work).
Another important advantage of MODNet is that its feature selection algorithm provides some understanding of the underlying physics. Indeed, it pinpoints the most important and complementary variables related to the investigated property. For instance, the vibrational entropy is found to strongly depend on the inter-atomic bond length and the valence range of the constituent elements (which relates to the ionicity of the bond), while the refractive index is related to an estimation of the band gap and to the density. Although all property predictions in this work were made from structural primitives, MODNet is certainly not limited to structures. For instance, it can easily be extended to composition-only tasks (see GitHub repository [33]).
In summary, we have identified a frontier between physical-feature-based methods and graph-based models. Although the latter are often referred to as state-of-the-art for many material predictions, the former are more powerful when learning on small datasets (below ∼4 000 samples). We have proposed a novel model based on optimal physical features.
Descriptors are selected by computing the mutual information between them and with the target property in order to maximize relevance and minimize redundancy. This, combined with a feedforward neural network, forms the MODNet model. Moreover, a multi-property strategy was also presented. By modifying the network into a tree-like architecture, multiple properties can be predicted, which is useful for temperature functions, with an increase in generalization performance thanks to joint-transfer learning. In particular, this strategy was applied to the vibrational properties of solids, providing remarkably reliable predictions, orders of magnitude faster than conventional methods. Finally, we illustrated how the selection algorithm, which determines the most important features, can provide some understanding of the underlying physics.

IV. METHODOLOGY

A. Datasets
Four datasets were used throughout this work: formation energies, band gaps, refractive indices and vibrational thermodynamics.
The crystal dataset for the band gaps and formation energies is based on DFT computations of 69,640 crystals from the Materials Project, obtained via the Python Materials Genomics (pymatgen) interface to the Materials Application Programming Interface (API) on June 1, 2018 [28,29]. Those crystals correspond to the ones used for MEGNet (i.e. the MP-crystals-2018.6.1 dataset), which facilitates benchmarking as the Materials Project is constantly being updated. A subset of 45,901 crystals with a finite band gap was used for the non-zero band gap regression (superscript nz in Table I).
The vibrational properties for 1,245 inorganic compounds were computed by Petretto et al. [14] in the harmonic approximation based on Density Functional Perturbation Theory (DFPT). This dataset contains the following thermodynamic properties: vibrational entropy, Helmholtz free energy, internal energy and heat capacity from 5 to 800 K in steps of 5 K. Supplementary Figure 1 graphically represents these four properties from 5 to 800 K for all materials contained in the dataset, in meV/atom or meV/K/atom. A wide variety of values is covered.

MEGNet also has an additional hyperparameter consisting in the number of MEGNet blocks. Finally, when using the Random Forest, the number of trees is taken as the only hyperparameter. In the Supplementary Information, section C, an in-depth example is given of how the hyperparameters were chosen for MODNet when trained on multiple vibrational thermodynamic quantities. The final model has min-max preprocessing, a learning rate set to 0.01, an MSE loss (with scaling of targets, see Supplementary Information), an architecture of two layers per block, and 256, 128, 64 and 8 neurons in these successive blocks. Adding (or removing) a layer, as well as doubling or halving the number of neurons, does not improve accuracy, as can be seen in Supplementary Figure 5. The batch size was fixed to 256.
A rectified linear unit (ReLU) is used as the activation function for each layer. Learning is performed using an Adam optimizer (β1 = 0.9, β2 = 0.999, decay = 0) over 600 epochs. The final architecture is depicted in Figure 6.

V. DATA AVAILABILITY
The generated features, the NMI and the MP-2018.6 datasets are available at https://figshare.com/account/home#/projects/82607, as are the vibrational thermodynamics and refractive index datasets.

Figure 1. The first green block of the neural network encodes a material into an appropriate all-round vector, while subsequent blocks decode and re-encode this representation in a more target-specific manner.


Figure 2. Comparison of the test error distributions on the vibrational entropy for different models. Absolute error distribution on the vibrational entropy at 305 K (S_305K in µeV/K/atom) at three training sizes and for various strategies (see text for a detailed description). The density is obtained from a kernel density estimation with a Gaussian kernel. The mean µ (equal to the MAE) and variance σ of each distribution are also reported in µeV/K/atom.

Figure 3. Example prediction of MODNet on vibrational thermodynamics. MODNet predictions (dashed lines) and DFPT values (solid lines) for the thermodynamic quantities of Li2O (MPID: mp-1960) as a function of the temperature. Observed errors on this particular sample are close to the overall MAE of the test set.

Figure 4. Visualization of selected features. Bivariate representation of the two most important features for four different properties: (a) vibrational entropy at 305 K, (b) refractive index, (c) formation energy and (d) band gap energy. Both features are complementary in narrowing down the target output, although certainly not sufficient for an accurate estimation.

Figure 5(a) shows the test error on the vibrational entropy at 305 K for the different models (MODNet substituted with different selection algorithms) for the first 10 selected features. The training size is fixed to 1100 samples. There is a clear distinction between redundancy-based techniques (MOD and SISSO) and non-redundancy-based techniques (corr, NMI and RF). Accounting for redundancy is clearly important when using only a few features. In this particular scenario, SISSO outperforms MOD-selection.

Figure 5. Performance comparison of different feature selection methods. (a) Test error on the vibrational entropy at 305 K for different feature selection algorithms, as a function of the first few features, for 1100 training samples. (b) Test error on the vibrational entropy at 305 K for different feature selection algorithms as a function of the training size, with the other parameters optimized over a fixed grid. Models are constructed by replacing the selection algorithm in MODNet by a Pearson correlation (corr), Normalized Mutual Information (NMI), Random Forest (RF), SISSO, and orthogonal matching pursuit (OMP). (c) Jaccard similarity between the 300 first features selected on a sampled training set of size n and on the total dataset (1245 samples), as a function of n, for different feature selection algorithms.

Figure 6. MODNet architecture for the vibrational properties. Architecture of the MODNet (composed of 4 blocks) when learning on the vibrational properties. The formation energy is added by attaching a second-order block to the first block.


Table I. Test accuracies with small and large training sizes for different machine learning algorithms. Comparison of the mean absolute error (MAE) on a test set for the formation energy (E_f in eV/atom), the band gap (E_g in eV; the superscript nz refers to datasets restricted to non-zero band gaps) and the refractive index (n) for MODNet, two variants of MEGNet and SISSO, as a function of the training-set size (N_train). The MEGNet variant including transfer learning is indicated by a star.

Table II. MODNet errors on various vibrational properties. MAE and MaxMAE for the vibrational entropy, Helmholtz energy, specific heat and internal energy at different temperatures, as predicted with MODNet. The MaxMAE is defined as the mean over the worst 5% of predicted samples.