Fig. 5 | npj Computational Materials

From: Chemically intuited, large-scale screening of MOFs by machine learning techniques

Schematic representation of the analysis pipeline employed by JAD. Based on the type and size of the data, the tool selects a set of hyper-parameter value combinations to try, called configurations. Hyper-parameters are depicted as tuning sliders. The data are partitioned into K folds, and for each fold and configuration a predictive model is trained on the remaining folds. Each model is evaluated on its held-out fold, and the average performance of each configuration across folds is estimated. A final model is then trained on all data using the best configuration found. The performance estimate of the best configuration is optimistic (see ref. 44 for an explanation); this optimism is removed with a bootstrap procedure, similar to that of ref. 44, before the estimate is returned.
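The pipeline in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not JAD's implementation: a toy 1-D k-nearest-neighbour regressor stands in for the learner, its single hyper-parameter `k` plays the role of a configuration, and mean squared error is the metric (lower is better). The bootstrap step follows the general idea of bootstrap bias correction of pooled out-of-fold predictions: on each bootstrap sample the best configuration is chosen in-bag and scored out-of-bag. All function names are hypothetical.

```python
import random
from statistics import mean


def knn_predict(train_x, train_y, x, k):
    # Hypothetical toy learner: 1-D k-nearest-neighbour regression.
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def make_folds(n, n_folds, rng):
    # Shuffle sample indices and deal them into K roughly equal folds.
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def mse(pred, ys, idx):
    # Mean squared error over the given indices (duplicates count repeatedly).
    return mean((pred[i] - ys[i]) ** 2 for i in idx)


def cross_validate(xs, ys, configs, n_folds, rng):
    # For every fold and configuration, train on the other folds and
    # store the prediction for each held-out sample (out-of-fold predictions).
    n = len(xs)
    folds = make_folds(n, n_folds, rng)
    oof = {c: [0.0] * n for c in configs}
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        for c in configs:
            for i in fold:
                oof[c][i] = knn_predict(tx, ty, xs[i], c)
    return oof, folds


def bootstrap_corrected(oof, ys, configs, n_boot, rng):
    # Select the best configuration on each bootstrap (in-bag) sample and
    # score it on the out-of-bag samples, removing the selection optimism.
    n = len(ys)
    scores = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        in_set = set(in_bag)
        out_bag = [i for i in range(n) if i not in in_set]
        if not out_bag:
            continue
        best = min(configs, key=lambda c: mse(oof[c], ys, in_bag))
        scores.append(mse(oof[best], ys, out_bag))
    return mean(scores)


def run_pipeline(xs, ys, configs, n_folds=5, n_boot=200, seed=0):
    rng = random.Random(seed)
    oof, folds = cross_validate(xs, ys, configs, n_folds, rng)
    # Average performance of each configuration across the K folds.
    cv_scores = {c: mean(mse(oof[c], ys, f) for f in folds) for c in configs}
    best = min(cv_scores, key=cv_scores.get)

    def final_model(x):
        # Final model uses all data with the best configuration
        # (k-NN is lazy, so "training" just keeps every sample).
        return knn_predict(xs, ys, x, best)

    corrected = bootstrap_corrected(oof, ys, configs, n_boot, rng)
    return best, cv_scores[best], corrected, final_model
```

For example, with noisy samples of `y = x**2` one might call `run_pipeline(xs, ys, [1, 3, 5, 9, 15])` and compare the apparent cross-validated score of the winning configuration with the bootstrap-corrected estimate, which is typically somewhat larger because it no longer benefits from having picked the winner on the same predictions.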