From bulk effective mass to accurate prediction of 2D carrier mobility via adversarial transfer learning

Data scarcity is one of the critical bottlenecks to utilizing machine learning in materials discovery. Transfer learning can use existing big data to assist property prediction on small data sets, but its premise is a strong correlation between the large and small data sets. To extend its applicability to scenarios with different properties and materials, here we develop a hybrid framework combining adversarial transfer learning and expert knowledge, which enables the direct prediction of the carrier mobility of two-dimensional (2D) materials using knowledge learned from bulk effective masses. Specifically, adversarial training ensures that only knowledge common to bulk and 2D materials is extracted, while expert knowledge is incorporated to further improve the prediction accuracy and generalizability. 2D carrier mobilities are successfully predicted with an accuracy of over 90% from the crystal structure alone, and 21 2D semiconductors with carrier mobilities far exceeding that of silicon and suitable bandgaps are screened out. This work enables transfer learning in simultaneous cross-property and cross-material scenarios, providing an effective tool for predicting intricate material properties with limited data.


Data distribution.
The completeness of the dataset is crucial for ensuring the reliability and generalizability of machine learning (ML) models. To train an accurate and robust model, it is important to have a balanced dataset that covers a large element and property space. In this study, the training and testing datasets were collected from published papers based on theoretical calculations using the deformation potential approximation method, resulting in a total of 178 samples. The prediction set comprises two open-source 2D material databases, C2DB 1 and 2Dmatpedia 2.
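Data cleaning of this kind (duplicate removal, then outlier filtering before training) can be sketched as follows. This is a minimal illustration with synthetic stand-in values and an assumed 1.5×IQR rule on log-mobility; the paper's exact cleaning criteria are not specified in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in for collected mobility values (the real dataset has 178
# samples from deformation-potential calculations; these are synthetic).
mobility = rng.lognormal(mean=5.0, sigma=1.0, size=200)

# Drop exact duplicates, then flag outliers with a 1.5*IQR rule applied
# to log10(mobility), since mobilities span several orders of magnitude.
vals = np.unique(mobility)
logv = np.log10(vals)
q1, q3 = np.percentile(logv, [25, 75])
iqr = q3 - q1
keep = (logv >= q1 - 1.5 * iqr) & (logv <= q3 + 1.5 * iqr)
clean = vals[keep]
```

Working in log space is the natural choice here because a symmetric outlier rule on raw mobilities would discard most of the high-mobility tail that the screening study is actually interested in.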

Model optimization and comparison.
The multi-layer perceptron (MLP) model, a fully connected neural network with multiple layers, is used to build the adversarial transfer-learning network structure. We further optimize the number of network layers and the number of neurons in each layer by random search. The joint loss is the sum of the regressor loss (MAELoss) and the reversed classifier loss (BCELoss), as shown in Supplementary Fig. 3(a-b).
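The joint loss described above can be written down directly. A minimal numpy sketch, assuming a trade-off weight `lam` (a hypothetical hyperparameter, not stated in the text): minimizing the reversed (negated) BCE term drives the feature extractor to confuse the domain classifier, so only domain-invariant features survive.

```python
import numpy as np

def joint_loss(y_true, y_pred, domain_true, domain_prob, lam=1.0):
    """Sum of the regressor MAE and the *reversed* domain-classifier BCE.

    y_true / y_pred   : regression targets and predictions
    domain_true       : domain labels (0 = source, 1 = target)
    domain_prob       : classifier's predicted probability of "target"
    lam               : assumed trade-off weight (illustrative)
    """
    mae = np.mean(np.abs(y_true - y_pred))                 # MAELoss
    p = np.clip(domain_prob, 1e-7, 1 - 1e-7)               # numerical safety
    bce = -np.mean(domain_true * np.log(p)
                   + (1 - domain_true) * np.log(1 - p))    # BCELoss
    return mae - lam * bce                                 # reversed sign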
An intriguing observation is that deeper hidden layers result in better feature extraction. This trend can be attributed to two factors. First, the regularization provided by adversarial training improves the effectiveness of the network in the target domain. Second, there is a close relationship between effective mass and carrier mobility, which may contribute to this trend. Furthermore, the lower dimensionality of the deeper layers may provide a bonus when working with small target datasets. It is worth noting that training an MLP-based classifier can be challenging with low-dimensional data; in such cases, simpler models such as support vector machines and logistic regression may help to improve training efficiency.
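The point about simpler classifiers on low-dimensional latent features can be illustrated with a plain logistic-regression domain classifier trained by gradient descent. The data below are synthetic stand-ins for low-dimensional extracted features, not the paper's actual latent vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2D "latent features": source domain centred at -1,
# target domain centred at +1 (illustrative stand-ins only).
Xs = rng.normal(-1.0, 0.5, size=(100, 2))
Xt = rng.normal(+1.0, 0.5, size=(100, 2))
X = np.vstack([Xs, Xt])
y = np.r_[np.zeros(100), np.ones(100)]   # 0 = source, 1 = target

# Logistic regression by gradient descent: a lighter-weight domain
# classifier than an MLP when the feature dimensionality is low.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
```

On such cleanly separated toy domains the classifier reaches near-perfect accuracy, which mirrors the situation described below: before adversarial training, the data source is easy to identify.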
The optimized network structures are shown in Supplementary Fig. 3(c-d). They are pyramid-like MLPs, which have been proved effective for material property prediction in ElemNet 3 and are easy to build. To further improve model performance, one could introduce multi-task learning and graph-based methods. 4,5

Supplementary Figure 3. Model optimization for adversarial transfer learning. Joint loss of MLP models with (a) different numbers of layers and (b) the optimal number of layers with different numbers of neurons per layer. The red dashed box marks the optimal network configuration. The optimal (c) feature extractor and (d) data-source classifier; each hidden layer contains a linear layer and a batch normalization (BN) layer with the rectified linear unit (ReLU) as the activation function, and the classifier output uses a sigmoid activation function. The input features are generated by the Materials Agnostic Platform for Informatics and Exploration (MAGPIE). Source data are provided as a Source Data file.

The adversarial training workflow is illustrated in Supplementary Fig. 4. We begin by removing duplicated samples and outliers from the input source and target data. Next, we select a range of learning rates and use the Adam optimizer to initialize the feature extractor multiple times. If the loss drops by over 30% within 50 epochs, the MLP initialization is deemed successful. We then train the initialized MLP model for 500 epochs to capture knowledge in the source domain. The trained feature extractor produces two sets of features for the source and target data, respectively, which are used to train the classifier; the classifier is initialized using the same procedure.

We then fix the classifier and continue training the feature extractor, with the loss function augmented by an additional term from the reversed classifier loss. Given the vast differences between the source and target domains, the initial classifier can easily identify the data source. When the feature extractor becomes good enough to fool the classifier, we fix the feature extractor and train the classifier again. This process of training the feature extractor and classifier alternately is repeated several times until the classifier can no longer distinguish the origin of the data.

Supplementary Figure 4. Adversarial transfer-learning workflow. The arrows represent the order of data processing and model training.

The extracted features are used as input to train a gradient boosting tree model, which is more effective for small datasets. We also compared other machine learning models, as shown in Supplementary Fig. 5(a-b); the gradient boosting tree model under the XGBoost framework gives the highest R² score and the lowest MAE. The hyperparameters of the LASSO and KRR models are optimized by grid search, and for the XGB models we use a global search algorithm based on simulated annealing 6 to find the optimal parameters, which are listed in Supplementary Table 1. To further improve prediction accuracy and make full use of the calculated results, we use active learning to sample the new materials with the highest predicted carrier mobilities and add them back to update the models; 44 samples are added back over 11 iterations, and the performance increment is shown in Supplementary Fig. 5(c).

Supplementary Table 1. Hyperparameters' search range and the optimal values for different models.

As shown in Supplementary Figs. 1 and 2, the training and testing datasets cover a large element space and the average carrier mobilities follow a normal distribution, which facilitates effective model training.
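The reversed classifier loss used in the alternating training above amounts to a gradient-reversal step: once the domain classifier is fixed, the feature extractor is updated along the *ascending* direction of the classifier's loss. A minimal numerical sketch with toy numbers (the feature vector, classifier weights, and step size are illustrative, not the paper's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_source(x, w, b):
    """BCE of a linear domain classifier for one source sample (label 0)."""
    p = sigmoid(x @ w + b)
    return -np.log(1.0 - p)

# Toy latent feature and a fixed, already-trained domain classifier.
x = np.array([0.8, -0.3])
w, b = np.array([1.0, 2.0]), 0.1

# Gradient of the BCE w.r.t. the *features* for label y = 0: (p - y) * w.
p = sigmoid(x @ w + b)
grad_x = (p - 0.0) * w

# Gradient reversal: step *up* the classifier loss, so the sample
# becomes harder to identify as coming from the source domain.
x_adv = x + 0.01 * grad_x

loss_before = bce_source(x, w, b)
loss_after = bce_source(x_adv, w, b)   # larger: the classifier is fooled more
```

In the full workflow this update is applied to the extractor weights rather than to the features directly, and the two players alternate until the classifier's accuracy falls to chance.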

Supplementary Figure 5. Selection of mobility prediction models and active-learning sampling. Model comparison for predicting (a) average electron mobility and (b) average hole mobility. Model performance is evaluated by the mean absolute error (MAE) and the coefficient of determination (R²). Three popular models, least absolute shrinkage and selection operator (LASSO), kernel ridge regression (KRR), and extreme gradient boosting (XGB), are chosen for comparison. All models are trained five times with different random seeds to split the data; the error bars represent the standard deviation. (c) Performance increment during active learning. The solid dots and transparent diamonds represent holes and electrons, respectively. Source data are provided as a Source Data file.
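The active-learning sampling shown in panel (c) (pick the candidates with the highest predicted mobilities, compute them, add them back, retrain) can be sketched as follows. The synthetic data, the linear least-squares stand-in model, and the batch size of 4 per iteration (44 samples spread evenly over 11 iterations) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in: "mobility" is a linear function of 5 descriptors
# plus noise; the oracle labels below play the role of DFT calculations.
coef = rng.normal(size=5)
X_pool = rng.normal(size=(500, 5))
y_pool = X_pool @ coef + 0.1 * rng.normal(size=500)

labeled = list(range(20))                       # small initial training set
pool = [i for i in range(500) if i not in labeled]

for step in range(11):                          # 11 active-learning rounds
    Xl, yl = X_pool[labeled], y_pool[labeled]
    w, *_ = np.linalg.lstsq(Xl, yl, rcond=None) # retrain surrogate model
    preds = X_pool[pool] @ w
    top = np.argsort(preds)[-4:]                # highest predicted mobilities
    new = [pool[i] for i in top]
    labeled += new                              # "calculate" and add back
    pool = [i for i in pool if i not in new]
```

Sampling the top-predicted candidates (rather than the most uncertain ones) biases the model toward the high-mobility region of interest, which is exactly where accurate screening predictions matter most.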

R: number of in-plane rotation symmetry operations
MI: number of in-plane mirror symmetry operations
MO: number of out-of-plane mirror symmetry operations
Average fraction of valence electrons in the s, p, and d shells of the constituent elements

Supplementary Figure 7. Visualization of the latent feature space by t-distributed stochastic neighbor embedding (t-SNE). The scatter plot uses color to indicate the hole mobility of 2D semiconductors. Source data are provided as a Source Data file.
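Inspecting the latent feature space amounts to projecting the extracted high-dimensional features down to two components and coloring the points by mobility. The figure uses t-SNE; the dependency-free sketch below uses a PCA projection via SVD as a stand-in for the same inspection step, on synthetic stand-in features.

```python
import numpy as np

rng = np.random.default_rng(3)
F = rng.normal(size=(150, 16))       # stand-in latent features (150 x 16)

# Centre the features and project onto the top-2 right singular vectors
# (PCA). The paper visualizes this space with t-SNE; PCA is a simpler
# linear substitute that serves the same inspection purpose.
Fc = F - F.mean(axis=0)
U, S, Vt = np.linalg.svd(Fc, full_matrices=False)
embedding = Fc @ Vt[:2].T            # shape (150, 2), one point per material
```

With real features, clusters in this 2D embedding that share similar colors (mobilities) indicate that the learned representation organizes materials by the target property.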

Supplementary Table 2.
Feature list of the four feature groups. Feature labels and their corresponding descriptions. All features are divided into four sets according to the way they are extracted or their physical meaning; the letters in brackets are their abbreviations.

Supplementary Table 3.
Comparison of running time and prediction accuracy. The running times are the estimated times for predicting 1000 2D materials; the prediction accuracy is evaluated by the R² score against DFT-based calculations.

Supplementary Table 4.
Calculated carrier mobility, effective mass, and deformation potential for the top ten materials with the highest carrier mobility. All materials are converted to orthogonal lattices, with the directions labeled x and y. Since the deformation potential approximation and the effective mass approximation require first-order and second-order fitting, respectively, results with a fitting quality less than 0.9 may be unreliable and are thus represented by '/'.
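For reference, the quantities listed in this table (elastic modulus, deformation potential, and effective masses along x and y) enter the commonly quoted 2D deformation-potential expression for the carrier mobility; the exact formula used by the authors is not reproduced in this excerpt, and prefactor conventions vary between implementations.

```latex
\mu_{2\mathrm{D}} = \frac{e\,\hbar^{3}\,C_{2\mathrm{D}}}{k_{\mathrm{B}}T\,
m^{*}\,m_{\mathrm{d}}\,E_{1}^{2}},
\qquad m_{\mathrm{d}} = \sqrt{m_{x}^{*}\,m_{y}^{*}}
```

Here $C_{2\mathrm{D}}$ is the 2D elastic modulus, $E_{1}$ the deformation potential, $m^{*}$ the effective mass along the transport direction, and $m_{\mathrm{d}}$ the averaged density-of-states mass; the first-order fit mentioned in the caption yields $E_{1}$ and the second-order fit yields the effective masses.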