Predicting adsorption ability of adsorbents at arbitrary sites for pollutants using deep transfer learning

Accurately evaluating the adsorption ability of adsorbents for heavy metal ions (HMIs) and organic pollutants in water is critical for the design and preparation of emerging highly efficient adsorbents. However, predicting adsorption capabilities of adsorbents at arbitrary sites is challenging, with currently unavailable measuring technology for active sites and the corresponding activities. Here, we present an efficient artificial intelligence (AI) approach to predict the adsorption ability of adsorbents at arbitrary sites, as a case study of three HMIs (Pb(II), Hg(II), and Cd(II)) adsorbed on the surface of a representative two-dimensional graphitic-C3N4. We apply the deep neural network and transfer learning to predict the adsorption capabilities of three HMIs at arbitrary sites, with the predicted results of Cd(II) > Hg(II) > Pb(II) and the root-mean-squared errors less than 0.1 eV. The proposed AI method has the same prediction accuracy as the ab initio DFT calculation, but is millions of times faster than the DFT to predict adsorption abilities at arbitrary sites and only requires one-tenth of datasets compared to training from scratch. We further verify the adsorption capacity of g-C3N4 towards HMIs experimentally and obtain results consistent with the AI prediction. It indicates that the presented approach is capable of evaluating the adsorption ability of adsorbents efficiently, and can be further extended to other interdisciplines and industries for the adsorption of harmful elements in aqueous solution.


INTRODUCTION
Recent studies have shown that when artificial intelligence (AI) meets material design and discovery, it means reducing the time and cost going from lab to practical applications by greatly improving the research efficiency [1][2][3] . Heavy metal ions (HMIs) and organic pollutants are major sources for water pollution 4,5 , causing persistent harm through the accumulation in food chain, threatening ecological conditions and human health 6,7 . Pioneers have designed and synthesized several adsorbents that exhibit high adsorption ability for removing HMIs and organic pollutants from water [8][9][10][11][12][13] . Adsorption ability of an adsorbent relies on the active sites and the corresponding activity intensities, which is currently hardly detectable 14,15 . Theoretical prediction provides an alternative approach for understanding the mechanism of the adsorption process and for exploring highly efficient adsorbents. Researchers spend a lot of time to allocate, model, and wait for first-principle calculations [16][17][18][19] , which can determine the adsorption capacity of materials at special sites in advance. However, the configurational space offered by the wide variety of materials and the complex relationships between active sites and activity intensities of adsorbents indicates that a conventional approach for structural optimization, based on inherently time-consuming ab initio methods, is particularly challenging.
Recently, mechanism-based approaches have been partly displaced by machine learning (ML), an AI method comprising three elements (models, strategies, and algorithms), which speeds up computation and gives access to complex physical and chemical properties that are not accessible with conventional approaches [20][21][22] . While significant progress has been made over many years by improving material descriptors, the application of ML to material prediction is still plagued by several significant challenges [23][24][25] . For example, to achieve high prediction accuracy on some tasks, an ML method requires a sufficient amount of effective data to capture the correlations between physical properties and features, or repeated iterations to train different models, which inevitably consumes time and reduces the efficiency of ML [26][27][28] . To address these issues, in this study we present a widely used ML model to investigate HMI trapping and to quantitatively determine the adsorption ability of an adsorbent towards HMIs at arbitrary sites. The transfer learning (TL) method is adopted in the model [29][30][31][32] ; it has rarely been applied to adsorption energy prediction. Two-dimensional materials commonly possess abundant adsorption active sites at several positions (such as defects and boundaries) together with abundant surface functional groups, and ultra-thin two-dimensional materials in particular can have a large surface area because they maintain the maximum plane size at atomic thickness [33][34][35] ; they have therefore been considered promising adsorbents in many fields, including water purification. Herein, we choose a typical two-dimensional (2D) graphitic-C3N4 (g-C3N4) adsorbent as a case study to evaluate the adsorption characteristics towards three representative HMIs: Pb(II), Hg(II), and Cd(II).
Unlike most ML approaches, which train and test separate models on enough data to ensure accuracy and avoid overfitting, the TL method can transfer knowledge from one dataset to another in a different but related domain with high reliability, making full use of the feature similarity between models. For the prediction of similar material properties, TL alleviates the problems of long training time and data scarcity by switching from multi-model training to single-model training, reducing a large amount of training data to a small amount of effective data. TL is able to exploit the chemical and physical properties and the similarities between the structure descriptors learned by the Pb(II)/g-C3N4 model and those of the Hg(II)/g-C3N4 and Cd(II)/g-C3N4 models. Based on the TL method, the adsorption ability towards Hg(II) and Cd(II) at arbitrary sites can be predicted accurately and quickly from a small amount of training data, by training on the adsorption ability of Pb(II) on the surface of g-C3N4 in advance.
1 National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, China.
In our study, 7000 adsorption energies calculated by the ab initio density functional theory (DFT) were used to predict the adsorption sites and adsorption capacity of Pb(II)/g-C 3 N 4 through the deep neural network (DNN), which served as the initial model to be learned. Based on the TL method, the adsorption capacity of the remaining HMIs on the same adsorbent can be predicted by a small amount of DFT data. Here, only 700 adsorption energies were calculated to quickly predict the adsorption capacity of Hg(II) and Cd(II) on the surface of g-C 3 N 4 through TL. DNN prediction indicated that compared with the edges of adsorbent material, HMIs were more likely to be adsorbed at the center of g-C 3 N 4 adsorbent, with predicted RMSEs all less than 0.1 eV. A RMSE of 0.1 eV by a prediction model with only a few hundreds of DFT calculations is treated as a remarkable feat, which provides a powerful guarantee for predicting the adsorption capacity of adsorbent towards HMIs accurately 27,36 . The presented AI method has the same accuracy as the ab initio DFT calculation, but is ten times faster than the training from scratch in the training stage (only requires one-tenth of datasets than training from scratch) and millions of times faster than the DFT in prediction stage. In addition to the adsorption ability prediction of g-C 3 N 4 for Pb(II), Hg(II), and Cd(II), the proposed method can be easily extended to predict the adsorption ability of other adsorbents for different HMIs, organic contaminants, etc., which is significant for the environmental treatment of removing harmful pollutants from water.

Adsorption ability prediction of g-C 3 N 4 for Pb(II)
We started with the determination of the adsorption ability of Pb (II) adsorbed at the arbitrary site of the surface of g-C 3 N 4 . The corresponding adsorption model is presented as Pb(II)/g-C 3 N 4 . To ensure the unbiased statistical results, a total of 7000 single-point adsorption energies with different potential active sites were calculated by DFT. The Deep Potential-Smooth Edition (DeepPot-SE), an end-to-end deep neural network-based (DNN) potential energy surface (PES) model, was performed to evaluate the adsorption ability to Pb(II) adsorbed on the surface of g-C 3 N 4 at arbitrary site in the feature space. Figure 1 shows the schematic model of the adsorption process of HMIs on the surface of g-C 3 N 4 , where the active sites and activity intensities of Pb(II) adsorbed on the surface of g-C 3 N 4 were calculated by DFT and trained by the DNN model, while the corresponding adsorption ability of Hg(II) and Cd(II) can be predicted by a small amount of data via TL method. Supplementary Fig. 1 in Supplementary Materials (SM) shows the calculated structures of Pb(II)/g-C 3 N 4 , Hg(II)/g-C 3 N 4 , and Cd(II)/g-C 3 N 4 .
The dataset of Pb(II)/g-C3N4 contains 7000 DFT-based single-point adsorption energies (ΔE). The parallelogram-shaped single-layer g-C3N4 was fully scanned with respect to the Pb(II) position, as depicted in Fig. 2a. The energy landscape of Pb(II) on the surface of g-C3N4 shows that the 7000 calculated ΔE values are widely distributed between −0.07 and −4.144 eV, with an absolute maximum of 4.144 eV (the black points in Fig. 2a). The randomly placed Pb(II) and g-C3N4 show different degrees of adsorption interaction (ΔE < 0), indicating that the structural sampling in real space is reasonable. Different colors represent different adsorption energies, with the strongest located at the center of a dashed triangle (see discussion in Supplementary Note 1). To reach an accuracy of 0.1 eV 27,36 , accurate DNN predictions of ΔE were needed and an appropriate descriptor had to be selected. To preserve all natural symmetries of the system, a local environment matrix (LEM) was used as the structural descriptor 37,38 , which is extensive, continuously differentiable, and scales linearly with the size of the system. Compared with traditional kernels and hand-crafted features, LEM performs well in many systems, such as organic molecules and metallic materials 28 , and thus serves as the feature space for the DNN input in this study. In Fig. 2a, most of the yellow and green points (with absolute energies below 3 eV) are adsorbed at the edges of the parallelogram-shaped single-layer g-C3N4, while the red and black points with strong adsorption energies are located in the center from the top view. The position scan in Fig. 2a thus shows that Pb(II) is more favorably adsorbed at the center of the g-C3N4 than at the edges. Figure 2b shows the correlation plot of ΔE between DFT and DNN calculations for Pb(II)/g-C3N4, including 6000 training points and 1000 testing points.
The points are uniformly distributed on both sides of the dashed line y = x over the range from −0.07 to −4.144 eV. The coefficient of determination (R²) of the Pb(II)/g-C3N4 model obtained from these points is 0.99 (as shown in Table 1), indicating that the DNN prediction of the energy distribution is in good agreement with DFT; the maximum deviation between DFT and DNN is 0.133 eV. Notably, on a single CPU it takes only a few milliseconds for the DNN to predict an adsorption energy, which is millions of times faster than the DFT calculations. Therefore, our method can predict the adsorption ability of g-C3N4 towards Pb(II) at an arbitrary site while maintaining DFT-level accuracy. In addition, in the training stage this AI method is ten times faster than training from scratch (it requires only one-tenth of the dataset), and in the prediction stage it is millions of times faster than DFT.
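The agreement measures quoted for the DFT versus DNN comparison (RMSE, R², and maximum deviation) can be computed as in the following minimal sketch. The function name `parity_metrics` and the three-point example arrays are illustrative assumptions and are not data or code from this study.

```python
import numpy as np

def parity_metrics(e_dft, e_dnn):
    """Return RMSE, R^2 and maximum absolute deviation between two energy arrays (eV)."""
    e_dft = np.asarray(e_dft, dtype=float)
    e_dnn = np.asarray(e_dnn, dtype=float)
    residual = e_dnn - e_dft
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    ss_res = float(np.sum(residual ** 2))                 # residual sum of squares
    ss_tot = float(np.sum((e_dft - e_dft.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    max_dev = float(np.max(np.abs(residual)))
    return rmse, r2, max_dev

# Illustrative numbers only, not adsorption energies from the paper.
rmse, r2, max_dev = parity_metrics([-1.0, -2.0, -3.0], [-1.1, -2.0, -2.9])
```

Here R² is taken as the coefficient of determination with DFT as the reference, which is one common convention; the paper does not state which definition was used.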

Adsorption abilities of g-C 3 N 4 for Hg(II) and Cd(II)
To evaluate the adsorption ability of g-C3N4 towards Pb(II), 7000 adsorption energies were calculated by the DFT method, a time-consuming but worthwhile process since such sufficient data ensured the accuracy of the initial prediction. To maintain the same prediction accuracy as for Pb(II)/g-C3N4 while shortening the calculation time, we used the TL method to evaluate the adsorption abilities towards Hg(II) and Cd(II). TL enables the transfer of feature representations learned for a specific predictive modeling task from a large source dataset to small target datasets in a similar domain (Fig. 3) [29][30][31][32] ; thus it could transfer the DNN prediction of Pb(II)/g-C3N4 to similar systems with less data and higher reliability. Compared with the Pb(II)/g-C3N4 prediction with 7000 DFT adsorption energies, the adsorption ability predictions for Hg(II)/g-C3N4 and Cd(II)/g-C3N4 were each achieved with 700 adsorption energies. The energy landscapes of Hg(II) and Cd(II) scanned at arbitrary sites on the surface of the parallelogram-shaped g-C3N4 are plotted in Supplementary Figs. 2, 3, where only 700 adsorption energies (one-tenth of the data of Pb(II)/g-C3N4) were calculated by the DFT method. Table 1 shows the predicted root-mean-squared errors (RMSEs) for Pb(II)/g-C3N4, Hg(II)/g-C3N4, and Cd(II)/g-C3N4. An RMSE below 0.1 eV obtained from a prediction model with only a few hundred DFT calculations is a remarkable achievement, which provides a strong guarantee for the statistical prediction of the adsorption capacity of materials towards HMIs. As expected, based on the LEM structural descriptor, we can rapidly predict the adsorption energy of Pb(II) at an arbitrary site with an accuracy of 0.051 eV for the 1000 testing data, while the testing RMSEs for Hg(II) and Cd(II) are 0.012 eV and 0.043 eV for 100 testing data, respectively.
Table 1. The Pb(II)/g-C3N4 structure was predicted with 7000 DFT adsorption energies, while the Hg(II)/g-C3N4 and Cd(II)/g-C3N4 structures were each predicted with 700 energies.
Z. Wang et al.
To clarify the rationality and accuracy of the TL method in processing small datasets, Fig. 4 compares the performance of models trained from scratch (FS) and by transfer learning (TL) at each iteration for Hg(II)/g-C3N4 and Cd(II)/g-C3N4, based on 700 single-point adsorption energies from DFT calculations. In the FS method, the model parameters were initialized randomly from a uniform distribution and all feature attributes were learned from the input training data. The 700 adsorption energies were randomly divided into 600 training data and 100 testing data. The red and orange curves in Fig. 4a are the training and testing RMSEs of Hg(II)/g-C3N4 and Cd(II)/g-C3N4 based on FS, where the RMSE decreases as the number of iterations increases. However, during iterations 200-400 and 1000-2200, the training RMSE based on FS decreases abnormally while the testing RMSE increases with the number of iterations. This is a typical overfitting effect induced by FS, resulting in large prediction errors for training and test data and poor generalization ability of the model. The same overfitting effect can also be found for the Cd(II)/g-C3N4 structure in Fig. 4b. The green and purple curves in Fig. 4a, b show the training and testing RMSEs based on the TL method. Unlike the FS method with randomly initialized parameters, the model parameters for Hg(II)/g-C3N4 and Cd(II)/g-C3N4 in TL were initialized from the well-trained model of Pb(II)/g-C3N4 and fine-tuned in the subsequent training. As listed in Table S1, for Hg(II) the maximum deviations between DFT and DNN are 1.266 eV with FS and 0.056 eV with TL, and those for Cd(II) are 1.515 eV with FS and 0.054 eV with TL.
For training from scratch, the maximum error exceeds 1.0 eV (a relative error of more than 50%), which is likely to result in very inaccurate model predictions. Table 1 displays the reliability and effectiveness of the TL method, where the prediction errors of Hg(II)/g-C3N4 and Cd(II)/g-C3N4 based on the TL method and 700 DFT energies are 0.012 eV and 0.043 eV, respectively. Although the dataset of 700 points is small, the prediction errors for Hg(II)/g-C3N4 and Cd(II)/g-C3N4 are lower than that of Pb(II)/g-C3N4 based on FS and 7000 DFT energies (0.051 eV). Therefore, the proposed TL method can work effectively even if the target domain has only a small amount of data (like the Hg(II)/g-C3N4 and Cd(II)/g-C3N4 calculations, with only about 600 samples for fine-tuning), as long as an accurate model is established on the source domain (see Supplementary Note 2). Supplementary Fig. 4 compares the FS predictions for Pb(II)/g-C3N4, Hg(II)/g-C3N4, and Cd(II)/g-C3N4, with 7000, 700, and 700 adsorption energies, respectively; FS performs well for the 7000 energies of Pb(II)/g-C3N4 but poorly for the 700 energies of Hg(II)/g-C3N4 or Cd(II)/g-C3N4. Therefore, the size of the training dataset in ML has a significant impact on model performance 27,39 : FS fails to predict a system with a small dataset, whereas TL succeeds. More statistical information on the adsorption energies is provided in Supplementary Figs

Adsorption ability comparison and verification
The above adsorption ability predictions for Pb(II)/g-C3N4, Hg(II)/g-C3N4, and Cd(II)/g-C3N4 were based on 7000, 700, and 700 single-point adsorption energies from DFT calculations, respectively. To compare the adsorption abilities of the three HMIs, we filled the datasets of Hg(II)/g-C3N4 and Cd(II)/g-C3N4 with 6300 additional adsorption energy points obtained from the TL prediction rather than from DFT calculation, making them the same size as the dataset of Pb(II)/g-C3N4. With the proposed heavy metal ion-transfer learning (HMI-TL) model, such large datasets enable us to obtain unbiased and reliable statistical results at negligible additional computational cost. Figure 5 shows the frequency histograms of the adsorption energies of g-C3N4 towards the three HMIs, where the blue, pink, and yellow curves represent the energy distributions of Pb(II), Hg(II), and Cd(II) with 7000 adsorption energies each. Table 1 shows the energy distributions of three HMIs adsorbed on the surface of g-C3N4 at arbitrary sites. To predict the adsorption ability of a material towards HMIs, the traditional method is to optimize the composite structure to obtain the adsorption energy of HMIs at a fixed position of the material. Evaluating the adsorption ability of a material towards HMIs at a fixed position is one-sided, since it corresponds to only one point in Fig. 2a. The present study instead considers the prediction of the adsorption energy of HMIs at arbitrary sites of an adsorbent material. In Fig. 5, the predicted and calculated 7000 adsorption energies for each HMI are distributed over different energy ranges, indicating the different adsorption abilities of the g-C3N4 adsorbent at different positions. In Table 2, the standard deviation characterizes the energy distribution of the three HMIs: Pb(II), with the largest standard deviation, has the widest energy distribution, followed by Hg(II) and Cd(II). These distributions are consistent with the curve trends in Fig. 5.
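The comparison of the three energy distributions rests on summary statistics (mean and standard deviation) and on ranking the ions by mean adsorption energy. The sketch below illustrates this, assuming that a more negative mean ΔE is read as stronger average adsorption, as in this paper. The helper names and the use of the sample standard deviation (`ddof=1`) are assumptions; the three mean values are those reported in Table 2.

```python
import numpy as np

def summarize(energies):
    """Mean, sample standard deviation, minimum and maximum of adsorption energies (eV)."""
    e = np.asarray(energies, dtype=float)
    return {
        "mean": float(e.mean()),
        "std": float(e.std(ddof=1)),   # assumption: sample standard deviation
        "min": float(e.min()),
        "max": float(e.max()),
    }

def rank_by_mean(mean_energy):
    """Order ions from most negative to least negative mean adsorption energy."""
    return sorted(mean_energy, key=mean_energy.get)

# Mean adsorption energies (eV) reported in Table 2 of this paper.
reported_means = {"Pb(II)": -1.664, "Hg(II)": -1.695, "Cd(II)": -1.707}
ranking = rank_by_mean(reported_means)
```

With the reported means, `ranking` lists Cd(II) first and Pb(II) last, matching the order Cd(II) > Hg(II) > Pb(II) stated in the text.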
The wide distribution of adsorption energies makes it more difficult to evaluate the collective adsorption capacity of a material towards a given HMI. To evaluate the adsorption abilities of different HMIs on the surface of g-C3N4, we calculated the mean of the 7000 adsorption energies of each HMI, obtaining −1.664 eV for Pb(II), −1.695 eV for Hg(II), and −1.707 eV for Cd(II), as shown in Table 2. Therefore, based on the HMI-TL model, the adsorption abilities for the three ions at any site of the g-C3N4 surface are ranked as Cd(II) > Hg(II) > Pb(II). Furthermore, to validate the prediction of the presented HMI-TL model for the relative adsorption ability at arbitrary sites of the adsorbent's surface, we experimentally measured the adsorption capacity of the g-C3N4 adsorbent towards the three HMIs. The g-C3N4 was synthesized by calcining urea at 550 °C. As shown in Fig. 6a, a porous interconnected structure was observed, and the high-magnification scanning electron microscopy (SEM) image (Fig. 6b) indicated that the layered g-C3N4 had a wrinkled structure, which is beneficial for adsorbing HMIs. Figure 6c shows the X-ray diffraction (XRD) pattern of g-C3N4. Two strong peaks at 2θ = 13.2° and 27.6° are observed, which are indexed to the (100) and (002) planes of the graphitic structure, respectively 40,41 .
In our measurements, two initial concentrations (100 and 200 mg L⁻¹) were used to measure the adsorption ability towards each of the three HMIs. The adsorption capability of the g-C3N4 was obtained by measuring the concentrations of the solutions before and after adsorption. As plotted in Fig. 6d, the adsorption amounts under the same conditions follow the sequence Cd(II) > Hg(II) > Pb(II) in both the 100 and 200 mg L⁻¹ solutions, which is consistent with our theoretical prediction and supports the feasibility and effectiveness of our method. The large change of values in Fig. 6d may be attributed to the influence of steric hindrance and the interaction between HMIs. Owing to the relatively strong adsorption of g-C3N4 for Cd(II), the values change more markedly between the two initial concentrations. In this work, we aimed to explore an ML method for rapidly identifying adsorbents with relatively strong adsorption. Further study of the practical application of the presented ML method would be an important research direction.

DISCUSSION
In this study, we proposed an AI approach to evaluate the adsorption ability of adsorbent toward HMIs at arbitrary sites accurately and quickly, based on the deep neural network and transfer learning. As a case study, we chose a typical g-C 3 N 4 as the adsorbent to investigate the adsorption abilities toward three representative HMIs (Pb(II), Hg(II), Cd(II)). The Pb(II) on g-C 3 N 4 was evaluated by 7000 single-point adsorption energies with DFT calculations, while the Hg(II)/g-C 3 N 4 and Cd(II)/g-C 3 N 4 were calculated by only 700 DFT energies with the TL method. The predicted adsorption abilities for the three HMIs were Cd(II) > Hg (II) > Pb(II), corresponding to the predicted RMSE of 0.043, 0.012, and 0.051 eV, respectively. Such RMSEs, all less than 0.1 eV with only a few hundred DFT calculations, ensured the prediction accuracy and were considered as a remarkable feat. Furthermore, the predicted results are also confirmed by experimentally measuring the adsorption efficiency of g-C 3 N 4 adsorbent towards Cd(II), Hg(II), and Pb(II). While significant research progress has been achieved by finding and designing adsorbents to deal with the water pollution, the prediction of adsorption ability of adsorbents to HMIs is still a challenge. First, the experimental prediction of adsorption capacity of adsorbents on HMIs is a complex process, involving multiple steps such as material design, synthesis, and measurement. It is really difficult to determine the adsorption capacity of adsorbents at arbitrary sites. Second, the first-principles calculations can quantitatively determine the adsorption sites and ability of different adsorbents to HMIs, but it is time-consuming. 
In our study, to obtain the adsorption ability at any site, the presented AI approach reaches the same prediction accuracy as the first-principles calculation but requires only one-tenth of the dataset needed for training from scratch, which means it is ten times faster than training from scratch in the training stage and millions of times faster than DFT in the prediction stage. The present study shows that the HMI-TL model can accurately and rapidly evaluate the adsorption ability of the adsorbent towards HMIs and determine the adsorption position at arbitrary sites without involving the experimental process. The HMI-TL model provides convincing and powerful pre-experimental guidance for the removal of certain HMIs, which is of great significance for the design of adsorption materials.
To sum up, this work has demonstrated the feasibility of transfer learning in evaluating the adsorption capacity of adsorbent materials for HMIs, based on a small amount of data. When the source field used for transferring learning is similar to the target field, we believe that the proposed HMI-TL model can effectively transfer knowledge from the source dataset to the target dataset. In addition, the AI approach proposed in this work can help solve HMIs and organic contamination in aqueous solutions, which can be used to screen more robust materials when designing and discovering adsorbents. Considering that the prediction of adsorption processes can be widely used in many fields such as catalysis and batteries 42,43 , the proposed model provides an opportunity to solve adsorption problems by combining AI, materials, and environmental science.

First-principles calculations
The DFT calculations were conducted using the Vienna ab initio Simulation Package (VASP) 44 . The projector augmented wave (PAW) method 45,46 was applied to describe ion-electron interactions, along with the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional within the generalized gradient approximation (GGA). The Hg, Cd, Pb, and single-layer g-C3N4 structures were optimized in advance. During the adsorption calculations, a cutoff energy of 500 eV was used with a 3 × 3 × 1 Monkhorst-Pack k-point grid, and the convergence criteria were set to 1 × 10⁻⁶ eV atom⁻¹ for energy and 0.01 eV Å⁻¹ for force. A vacuum distance of 15 Å was added to the g-C3N4 slab to avoid periodic interactions. To accurately describe the interaction between the HMIs and the g-C3N4 substrate, the DFT-D3 method 47 , which accounts for the van der Waals interaction, was employed. The adsorption energy can be described as
$$\Delta E = E_{\mathrm{sub+met}} - E_{\mathrm{sub}} - E_{\mathrm{met}} \qquad (1)$$
where $E_{\mathrm{sub+met}}$, $E_{\mathrm{sub}}$, and $E_{\mathrm{met}}$ are the energy of the HMI adsorbed on the surface of the substrate, the energy of the g-C3N4 substrate, and the energy of the HMI (Cd, Hg, Pb), respectively.
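As a worked illustration of the adsorption energy, the sketch below assumes the usual definition in which ΔE is the total energy of the combined system minus the energies of the isolated substrate and the isolated ion, so that ΔE < 0 indicates binding, in line with the sign convention used in this paper. The function name and the example energies are hypothetical and are not values from the study.

```python
def adsorption_energy(e_sub_met, e_sub, e_met):
    """Adsorption energy (eV): E(substrate + ion) - E(substrate) - E(ion).

    A negative result means the adsorbed configuration is lower in energy
    than the separated substrate and ion.
    """
    return e_sub_met - e_sub - e_met

# Hypothetical total energies in eV, chosen only to illustrate the arithmetic.
delta_e = adsorption_energy(e_sub_met=-105.0, e_sub=-100.0, e_met=-3.0)
```

With these placeholder numbers the combined system is 2 eV lower in energy than its separated parts, so `delta_e` is negative.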

Datasets
The three datasets for Pb(II)/g-C3N4, Hg(II)/g-C3N4, and Cd(II)/g-C3N4 contained 7000, 700, and 700 single-point adsorption energies calculated by DFT, respectively. The HMIs were placed at random positions over the parallelogram-shaped g-C3N4. To explore the adsorption active sites and keep the adsorption energy negative, Pb(II) was randomly placed at a distance of 100-300 pm from the surface; the seed of the rand function was set from the system time, and the resulting random number sequences were generated by a C++ program. The datasets of Hg(II)/g-C3N4 and Cd(II)/g-C3N4 were produced in a similar fashion, with Hg(II) and Cd(II) randomly placed at a distance of 200-400 pm from the surface.
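The random placement of ions over the parallelogram-shaped cell can be sketched as follows. This is a minimal illustration and not the authors' C++ program: it uses NumPy's random generator instead of a time-seeded `rand`, and the in-plane lattice vectors, surface height, and function name are placeholder assumptions. Only the height windows (100-300 pm for Pb(II), 200-400 pm for Hg(II) and Cd(II)) come from the text.

```python
import numpy as np

def sample_ion_positions(a_vec, b_vec, z_surface, n, h_min_pm, h_max_pm, seed=None):
    """Draw n ion positions (Å) uniformly over the cell spanned by a_vec and b_vec.

    a_vec, b_vec : in-plane lattice vectors of the parallelogram cell (Å)
    z_surface    : z coordinate of the adsorbent layer (Å)
    h_min_pm, h_max_pm : allowed height window above the surface (pm)
    """
    rng = np.random.default_rng(seed)
    frac = rng.random((n, 2))                              # fractional coordinates in [0, 1)
    xy = frac[:, :1] * a_vec + frac[:, 1:] * b_vec         # map onto the parallelogram
    height = rng.uniform(h_min_pm, h_max_pm, n) / 100.0    # pm -> Å
    return np.column_stack([xy, z_surface + height])

# Placeholder lattice vectors (Å), not the cell used in the paper.
a_vec = np.array([7.0, 0.0])
b_vec = np.array([-3.5, 6.06])
pb_positions = sample_ion_positions(a_vec, b_vec, 0.0, 7000, 100, 300, seed=0)
```

Each sampled position would then be combined with the fixed g-C3N4 coordinates to form one single-point DFT input.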

Structural descriptors
Structural descriptors are the input vectors of the NN and satisfy translational, rotational, and permutational invariance. In this study, we used an LEM as the structural descriptor 37 , which is an extensible approach. For an n-atom system, the Cartesian coordinates are {R 1 , We calculated the full radial and angular features of atom i and neighbor atom j based on the equation: where x′ ij ,V ij can be obtained from x ij , R ij through the rotation matrix ℜ, which can be expressed as The rotation matrix ℜ was defined by atom i and its two closest atoms (atoms ia and ib), independently of their chemical elements: where e(x) = x/∥x∥. The rotation matrix ℜ can also be called the local frame of atom i. Therefore, different atoms have different rotation matrices. We set 10.0 Å as the cutoff radius for neighbor searching and 8.8 Å as the radius at which the smoothing started.
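The local-frame idea described above can be illustrated with the following sketch. It assumes one commonly used construction, in which the first axis points to the closest neighbor, the second axis is the Gram-Schmidt-orthogonalized direction to the second-closest neighbor, and the third axis is their cross product. This particular form and all function names are assumptions and may differ from the equations used in this work.

```python
import numpy as np

def unit(v):
    """e(x) = x / ||x||."""
    return v / np.linalg.norm(v)

def local_frame(r_i, neighbors):
    """Rows are the three axes of the local frame of atom i.

    The frame is built from the two closest neighbors (ia, ib), regardless of
    element type. The two neighbors must not be collinear with atom i.
    """
    rel = neighbors - r_i
    order = np.argsort(np.linalg.norm(rel, axis=1))
    r_a, r_b = rel[order[0]], rel[order[1]]
    e1 = unit(r_a)
    e2 = unit(r_b - np.dot(r_b, e1) * e1)   # remove the component along e1
    e3 = np.cross(e1, e2)
    return np.vstack([e1, e2, e3])

def local_coordinates(r_i, neighbors):
    """Relative neighbor positions expressed in the local frame of atom i."""
    return (neighbors - r_i) @ local_frame(r_i, neighbors).T

# Illustrative geometry (Å), not taken from the g-C3N4 structures.
r_i = np.zeros(3)
neighbors = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
                      [0.5, 0.5, 3.0], [-2.0, 1.0, 1.0]])
coords = local_coordinates(r_i, neighbors)
```

Because the frame moves and rotates with the atoms, `coords` is unchanged when the whole system is translated or rigidly rotated, which is the invariance the descriptor requires.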

Training model
The Deep Potential-Smooth Edition (DeepPot-SE) model, implemented in Python/C++ with the TensorFlow framework 48 , was used in this study. DeepPot-SE is an end-to-end DNN-based PES model that is able to efficiently represent the PES of a wide variety of systems with the accuracy of ab initio quantum mechanics. It is extensive and continuously differentiable, scales linearly with system size, and preserves all the natural symmetries of the system. The DNN that maps the structural descriptors to adsorption energies had three fully connected hidden layers with 20 nodes each. A batch size of 64 with the Adam optimizer was used to improve the training speed while strengthening the optimization 49 . During training, the error of the model was tested and displayed every 100 iterations. We used an initial learning rate of 0.002 for the model trained from scratch and 0.001 for the model based on TL, given that the hyperparameters were fine-tuned during the TL training.
In this work, we used the parameter-based TL. The source domain and the target domain share model parameters, that is, the model trained by a large amount of data in the source domain is applied to the target domain for prediction. The parameter-based TL method is more straightforward and has the advantage of making full use of the similarity between models. In this work, Pb(II)/g-C 3 N 4 is a well-trained model based on large datasets. Before training, the parameters of the Pb(II)/g-C 3 N 4 model are randomly initialized. In the training of Hg(II)/g-C 3 N 4 and Cd(II)/g-C 3 N 4 , the parameters in Pb(II)/g-C 3 N 4 model are taken as the starting points for Hg(II)/g-C 3 N 4 and Cd(II)/g-C 3 N 4 , instead of randomly initializing parameters, and then the parameters are further fine-tuned for training.
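The parameter-based TL procedure described above (train on the large source dataset, copy the trained parameters, then fine-tune on the small target dataset) can be sketched with a small NumPy network. This is a simplified illustration under stated assumptions and is not the DeepPot-SE implementation: it uses plain full-batch gradient descent instead of Adam, a one-dimensional synthetic regression task in place of the DFT datasets, and hypothetical learning rates and step counts. Only the layer layout (three hidden layers of 20 nodes) follows the text.

```python
import numpy as np

def init_params(sizes, rng):
    """From-scratch initialisation: uniform random weights, zero biases."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        limit = 1.0 / np.sqrt(n_in)
        params.append((rng.uniform(-limit, limit, (n_in, n_out)), np.zeros(n_out)))
    return params

def forward(params, x):
    """Activations of every layer: tanh on hidden layers, linear output."""
    acts = [x]
    for k, (w, b) in enumerate(params):
        z = acts[-1] @ w + b
        acts.append(z if k == len(params) - 1 else np.tanh(z))
    return acts

def mse(params, x, y):
    return float(np.mean((forward(params, x)[-1] - y) ** 2))

def train(params, x, y, lr, steps):
    """Full-batch gradient descent on the MSE; the input parameters are copied."""
    params = [(w.copy(), b.copy()) for w, b in params]
    for _ in range(steps):
        acts = forward(params, x)
        grad = 2.0 * (acts[-1] - y) / len(x)               # dLoss/d(output)
        for k in reversed(range(len(params))):
            w, b = params[k]
            grad_w, grad_b = acts[k].T @ grad, grad.sum(axis=0)
            if k > 0:                                      # back-propagate through tanh
                grad = (grad @ w.T) * (1.0 - acts[k] ** 2)
            params[k] = (w - lr * grad_w, b - lr * grad_b)
    return params

rng = np.random.default_rng(0)
sizes = [1, 20, 20, 20, 1]                                 # three hidden layers of 20 nodes

# Synthetic stand-ins for the large source set and the small, related target set.
x_src = rng.uniform(-1.0, 1.0, (600, 1))
y_src = np.sin(3.0 * x_src)
x_tgt = rng.uniform(-1.0, 1.0, (60, 1))
y_tgt = 1.1 * np.sin(3.0 * x_tgt) + 0.1

source_params = train(init_params(sizes, rng), x_src, y_src, lr=0.01, steps=2000)
loss_before = mse(source_params, x_tgt, y_tgt)

# TL: start from the source parameters. FS: start from a new random initialisation.
tl_params = train(source_params, x_tgt, y_tgt, lr=0.005, steps=500)
fs_params = train(init_params(sizes, rng), x_tgt, y_tgt, lr=0.005, steps=500)

print("target MSE, source model as-is:", loss_before)
print("target MSE, transfer learning :", mse(tl_params, x_tgt, y_tgt))
print("target MSE, from scratch      :", mse(fs_params, x_tgt, y_tgt))
```

The only difference between the two target-domain runs is the starting point of the parameters, which is the essence of the parameter-based approach; the printed errors depend on the synthetic task and say nothing about the adsorption models themselves.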

Experimental validation
The g-C3N4 adsorbent was prepared by heating 10 g of urea in a crucible for 3 h at a heating rate of 5 °C min⁻¹. The HMI solutions with initial concentrations of 100 and 200 mg L⁻¹ were prepared using Cd(NO3)2, Hg(NO3)2, and Pb(NO3)2 as sources, respectively. To reach adsorption equilibrium, the mixtures were shaken for 24 h and the suspensions were then centrifuged. The residual concentrations of HMIs were measured with an inductively coupled plasma (ICP) atomic emission spectrometer (Optima 7300 DV, USA). The adsorption amount of HMIs on g-C3N4 (mmol g⁻¹) was obtained from
$$q_e = \frac{(C_0 - C_e)\,V}{M\,m}$$
where $q_e$ is the adsorption amount, $C_0$ is the initial concentration of HMIs, $C_e$ is the residual concentration of HMIs, $M$ is the relative atomic mass of Cd, Hg, or Pb, $V$ is the volume of the adsorption solution, and $m$ is the mass of the g-C3N4 adsorbent.
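The adsorption amount can be evaluated as in the sketch below, which assumes the mass-balance form q = (C0 − Ce)·V / (M·m) implied by the listed symbols and the mmol g⁻¹ unit, with concentrations in mg L⁻¹, volume in L, and adsorbent mass in g. The function name, the solution volume, the adsorbent mass, and the residual concentration in the example are hypothetical; only the 100 mg L⁻¹ initial concentration comes from the text.

```python
# Standard relative atomic masses (g/mol).
ATOMIC_MASS = {"Cd": 112.41, "Hg": 200.59, "Pb": 207.2}

def adsorption_amount(c0_mg_per_l, ce_mg_per_l, volume_l, molar_mass, adsorbent_mass_g):
    """Adsorbed amount in mmol per gram of adsorbent.

    (mg/L * L) gives mg removed; dividing by g/mol gives mmol; dividing by g
    of adsorbent gives mmol/g.
    """
    removed_mg = (c0_mg_per_l - ce_mg_per_l) * volume_l
    return removed_mg / (molar_mass * adsorbent_mass_g)

# Hypothetical example: 50 mL of a 100 mg/L Cd(II) solution, 20 mg of adsorbent,
# and a residual concentration of 50 mg/L after equilibrium.
q_cd = adsorption_amount(100.0, 50.0, 0.050, ATOMIC_MASS["Cd"], 0.020)
```

Because the result is expressed in mmol g⁻¹, ions with very different atomic masses are compared on the same molar basis.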

DATA AVAILABILITY
All datasets generated in the current study are available from the corresponding author upon reasonable request.

CODE AVAILABILITY
The code for generating the structures in this study is available from the corresponding author upon reasonable request. The DeepPot-SE model code used in this study is available at https://github.com/deepmodeling/deepmd-kit.