Deep Reinforcement Learning Optimizes Graphene Nanopores for Efficient Desalination

Two-dimensional nanomaterials, such as graphene, have been extensively studied because of their outstanding physical properties. Structure and geometry optimization of nanopores on such materials is beneficial for their performances in real-world engineering applications, like water desalination. However, the optimization process often involves very large number of experiments or simulations which are expensive and time-consuming. In this work, we propose a graphene nanopore optimization framework via the combination of deep reinforcement learning (DRL) and convolutional neural network (CNN) for efficient water desalination. The DRL agent controls the growth of nanopore by determining the atom to be removed at each timestep, while the CNN predicts the performance of nanoporus graphene for water desalination: the water flux and ion rejection at a certain external pressure. With the synchronous feedback from CNN-accelerated desalination performance prediction, our DRL agent can optimize the nanoporous graphene efficiently in an online manner. Molecular dynamics (MD) simulations on promising DRL-designed graphene nanopores show that they have higher water flux while maintaining rival ion rejection rate compared to the normal circular nanopores. Semi-oval shape with rough edges geometry of DRL-designed pores is found to be the key factor for their high water desalination performance. Ultimately, this study shows that DRL can be a powerful tool for material design.


Introduction
Single-layer graphene, as an iconic two-dimensional (2D) material, has drawn much scientific attention in recent decades. Because of its ultrathin thickness and outstanding mechanical properties, graphene with artificial pores has been demonstrated to have great potentials in many engineering applications such as effective hydrogen gas separator, [1][2][3] next-generation energy storage or supercapacitor building, 4,5 and high-resolution DNA sequencing. [6][7][8] Given the potential imminent global water scarcity crisis, another important application for nanoporus graphene is energy-efficient water desalination. 9,10 Equipped with nanoporous 2D material membranes like graphene, reverse osmosis (RO) water desalination process can expect 2-3 orders improvement in water flux compared with traditional polymeric membranes. [9][10][11][12][13] In RO, the geometry of nanopores in 2D materials plays a determinant role in water desalination performance. 9,11 In general, a large pore that allows high water flux is likely to perform poorly in rejecting ions; a small pore that rejects 100% undesired ions, on the other hand, usually have limited water flux. Thus, an optimal nanopore for water desalination is expected to allow as high water flux as possible while maintaining a high ion rejection rate. However, finding the optimal nanopore geometry on graphene can be challenging due to high computational and experimental cost associated with extensive experiments, i.e., there are millions of possible shapes for a pore on a 4nm × 4nm graphene membrane, but evaluating the water flux and ion rejection of a single pore using 10ns MD simulation takes roughly 36 hours on a 56-core CPU cluster. Given this time benchmark, evaluating the water desalination performance of 1000 graphene nanopores can take more than 4 years. Therefore, to design the optimal nanopore geometry for water desalination, an efficient nanopore optimization method and a fast nanopore water desalination performance predictor (performance predictor in short) are needed. Inspired by the recent success of deep learning 14 and reinforcement learning, 15 we combined the state-of-art deep reinforcement learning (DRL) algorithm with convolutional neural network (CNN) to solve this challenge.
The main idea of reinforcement learning (RL) 16 is to train an agent to find an optimal 2 policy which maximizes the expected return in the future through actively interacting with the environment to achieve a goal. Recently, DRL, 15,17 which models the RL agent with artificial neural networks, has proven to be an efficient tool in material-related engineering fields, such as material design [18][19][20] and molecule optimization. 21 In this work, we designed and implemented an artificial intelligence framework consisting of DRL, which is capable of designing the nanopore on a single-layer graphene membrane to reach optimal water desalination performance. By a series of decisions on whether or not to remove carbon atoms and which atom to be removed, the DRL agent can eventually create a pore that allows the highest water flux while maintaining ion rejection rate above an acceptable threshold.
Such precisely controlled atom-by-atom removal nanopore synthesis can be conducted by electrochemical reaction (ECR). 22,23 During training, the DRL agent updates the neural network weights through back-propagation from the reward given by the water desalination performance. In our case, the evaluation of desalination performance is the water flux and ion rejection rate of the nanoporus graphene generated by the DRL agent at each timestep. However, conventional methods to calculate desalination performance, like MD simulation, are too time-consuming to be implemented in our DRL model. To evaluate DRL-designed nanopores fast and accurately, we implemented a CNN-based 24-27 model that uses the geometry of porous graphene membrane to directly predict the water flux and ion rejection rate under certain external pressure. To this end, a ResNet 27 model is trained on the dataset we collected through MD simulation of water desalination using various nanoporus graphenes. With the CNN-accelerated desalination performance prediction, the DRL model can rapidly optimize graphene nanopore for water desalination. MD simulations on promising DRL-designed nanoporus graphenes prove that they have higher water flux while maintaining similar ion rejection rate comparing to the circular nanopores. Further investigation of molecular trajectories illustrates why DRL-designed nanopores outperform the conventional nano-structures and provides insights for optimal nanopore design for water desalination.
3 Figure 1: Overview of CNN accelerated DRL nanopore design model. At each timestep, the nanoporous graphene structure is transformed into geometrical features, which is fed into a performance prediction network to predict water flux and ion rejection rate. The reward is then calculated based on the predicted water flux and ion rejection. Also, the geometrical features extracted from the performance predictor are concatenated with the fingerprint and atom coordinates for the current state. Given the current graphene structure, candidate atoms are picked from those locate at the edge of the nanopore. The DRL agent constructed upon Deep Q-network takes the reward, candidate atoms, and state as input to determine the next atom to remove from the graphene.
The graphene nanopore optimization framework for water desalination consists of a DRL agent incorporated with a CNN-based water desalination performance predictor ( Fig. 1).
At each timestep, the DRL agent generates a new nanopore by removing one atom from the graphene sheet, and the CNN model poredicts the water flux/ion rejection rate of the nanopore, such that the DRL agent can get instantaneous feedback on its action. Given the featurized information of the nanoporous graphene sheet (Morgan fingerprint, Cartesian coordinates of each atom, and geometrical features of graphene membrane from the CNN 4 model) and predicted water flux and ion rejection, the DRL agent was trained to create a pore on graphene sheet with the goal to maximize its performance in the water desalination process. The dataset used to train CNN performance predictor is generated by MD simulations of various nanoporus graphenes for water desalination.

Molecular Dynamic Simulation
MD simulations were conducted using LAMMPS package, 28 where porous graphene membranes simulated were either created using Visual Molecular Dynamics 29 or automatically generated by deep reinforcement agent (samples from early stage of training). A regular simulation system consists of four different sections: a graphene piston that applied constant external pressure; a saline water section containing potassium chloride as solute; a singlelayer graphene membrane with the pore of different geometries; and a freshwater section which functioned as a reservoir of filtered water (Fig. 2a). The molarity of the saline water in this work is ∼2.28 M, which is higher than normal seawater for the sake of computational efficiency. The dimension of the simulation box is approximately 4 nm × 4 nm × 13 nm in x, y, and z direction, respectively. A periodic boundary condition was applied to all three dimensions.
All water molecules in this work were simulated using SPC/E model, 30 with SHAKE 31 algorithm to constrain the bond length and angles. Lennard-Jones (LJ) potentials (Supplementary Information Table S1) along with long-range Coulombic electrostatics potentials were adopted as interatomic potentials in the MD simulation. The cutoff for the interatomic potentials was set to be 12 A. Lorentz-Berthelot rules were employed for the calculation of LJ potentials between different kinds of atoms. Particle-particle particle-Mesh (PPPM) Ewald sovler 32 with 0.005 root-mean-squared error was used for long-range Coulombic potential correction. The porous graphene membrane and piston were each regarded as an entity during the simulation (internal interatomic potentials were not calculated) in order to reduce the computational cost.

5
In the first stage of each individual simulation, the internal energy of the system was minimized for 1000 iterations. The system then ran for 5 ps under the N P T (isothermal-isobaric) ensemble at 300 K after the velocities of molecules were initialized based on Gaussian distribution. After the equilibrating, the system under N P T ensemble, the system was switched to N V T (canonical) ensemble to run for another 10 ns. The temperature was maintained at 300 K by Nosé-Hoover thermostat 33,34 with a time constant of 0.5 ps. At this stage, a z-direction constant external pressure of 100 MPa was applied on saline water by the piston to mimic the reverse osmosis process in water desalination. Since the relationship between water flux and external pressure in reverse osmosis process was generally linear, 9,11-13 the performance of pores under 100 MPa could be extrapolated to lower pressures. Therefore, we chose to run simulations under 100 MPa external pressure to rapidly collect meaningful data. Molecular trajectories of each simulation were collected every 5 ps for data processing.

Dataset and Data Augmentation
The two major performance indicators of a membrane in water desalination: water flux, and ion rejection rate, were calculated by post-processing the MD simulation trajectories.
The slope of the fitted least square regression line on filtered water with respect to the simulation time curve was calculated to be the water flux of each membrane (Fig. 2b). The ion rejection rate of each membrane was calculated by dividing the number of ions in the freshwater section by the total number of ions.
The total number of different simulated porous graphene is 185. Since the reward of DRL agent in our model was calculated based on the water flux/ion rejection prediction of performance predictor (Eq.1 and 2, Supplementary Information Fig.S2), highly accurate prediction must be achieved to ensure the quality of DRL training. A much larger training dataset was necessary for the optimization of CNN model. The method employed in our study to substantially increase the size of the dataset was data augmentation. 35,36 Given that the water desalination performance of a graphene pore depended on its size and geometry, we could assume that a flipped or translated pore on the same graphene membrane would demonstrate identical water flux/ion rejection rate of the original pore (Proven by MD simulations in Supplementary Information Fig.S3). Therefore, copies of original pores were created by being flipped along x or y axis and/or translating in -4 A to 4 A in x and y directions ( Fig. 2c). The water desalination performance of pore copies are random variable with normal distribution (µ = original pore performance, σ = 1% of original pore performance). In order to improve CNN's prediction accuracy on the performance of pores created by the DRL agent, we augmented DRL-generated pores 32 times. Among the other pores, the ones with zero water flux (too small to allow water transport) were augmented 6 times, and the rest of the pores were augmented 24 times. Data augmentation was conducted using the Atomic Simulation Environment (ASE) package. 37 The final dataset used for CNN training contains 3937 samples (Fig. 2d). A reverse sigmoid function was fitted to the distribution of samples to show the general relationship between the water flux and ion rejection rates.

Water Desalination Performance Predictor
To facilitate the reward estimation of DRL, a CNN model was trained to make an instantaneous prediction of water flux and ion rejection rates of a specific nanoporous graphene membrane. CNN is widely known as a universal feature extractor. Given that the water desalination performance of a graphene nanopore depends on its geometrical features, CNN can be the most suitable model to recognize geometrical features and make predictions based on them. There were 2 steps in the CNN modeling, including extracting features from the geometry of graphene nanoporous membrane and making predictions through a fully-connected multi-layer perceptron (MLP) regression model (Fig.1)

DRL Nanopore Optimization Agent
Our goal was to design the optimal geometry of graphene nanopore for energy-efficient water desalination, which simultaneously demanded high flux and high ion rejection under certain external pressure. In order to optimize the nanopore, an agent was expected to remove atoms sequentially until a desired pore geometry was developed. To this end, the agent was set to interact with nanoporous graphene in a sequence of actions a t , states s t , and rewards r t within an episode of length T . The goal of the agent was to select the action such that it could maximize the future discounted return R t = T t=1 γ t−1 r t in the finite Markov decision process (MDP) setting. In our case, we set the discount factor γ to be 1.
At timestep t, given the nanoporous graphene G t , the agent observed the state s t , which was composed of Morgan fingerprint, 40 coordinates of all the atoms, along with CNNextracted graphene geometrical features. The graphene geometry g t was fed into the flux and ion rejection predictor, respectively. The geometrical features were the concatenation of last layer before output of the performance predictors. Once an atom was removed, its coordinate was set to the origin since MLP required a homogeneous input dimension. The predicted flux f t and ion rejection i t were leveraged to compute the reward signal r t for the agent, as given in Eq. 1 and Eq. 2: where σ(·) is the generalized logistic function 43 and α is the coefficient for flux term. In our setting, α was set to be 0.01, and A = −15, K = 0, B = 13, Q = 100, ν = 0.01, C = 1 for the logistic function. A linear term of flux reward encouraged the agent to expand nanopores, which would allow higher water flux. Since low ion rejection rate was not favored in water desalination, a generalized logistic function σ(·) was leveraged to penalize ion rejection term. When i t was high, σ(i t ) was close to zero, allowing the growth of the nanopores.
However, when i t was low, σ(i t ) fiercely penalized the agent by outputting a large negative value( Supplementary Information Fig.S2). Besides, an extra 0.05 reward was given to the agent when it chose to remove an atom at timestep t to encourage pore growth at an early stage. Given state s t and reward r t , the agent intended to choose the action a t for next step. However, due to the high dimensionality of possible action space (all the atoms in the graphene fragment), it was computationally expensive for the agent to efficiently and thoroughly explore the possible actions and to learn an optimal design. Therefore, only a subset of M atoms was selected as candidates c t . Atoms on the edge of pore were picked based on the rank of their proximity to the pore center, if the number exceeds M , only the first M atoms closest to the center of pore were selected. However, when the number of edge atoms was less than M , non-edge atoms closest to the center of pore were selected as possible candidates to maintain the size of c t . Given the state s t , reward r t , and candidate c t , the agent learned to pick the action aiming to maximize future rewards.
To train the agent, deep Q-learning 15 with experience replay was implemented. Our task only considered deterministic environment, namely given (s, c), the pair (s , c ) at the next time-step was unique. Based on Bellman equation, 16 the optimal action-value function Q * (s, c) in the deterministic environment was defined as: To model the Q function, the Q-network parameterized by θ and target network parameter- In our setting, we use Adam optimizer 42 with learning rate 0.001. The replay buffer is of capacity 10000 and batch size is set to 128.

Results
The mean squared error (MSE) and coefficient of determination (R 2 ) are used as metrics to evaluate the performance predictions of models. The water flux and ion rejection labels are standardized before fed into the property prediction models. Thus the metrics tabulated are based on standardized water flux or ion rejection rate(  We trained the DRL agent with 10 random seeds to generate various graphene nanopores. In the DRL agent training processes with different random seeds (Fig.3), the red curves indicate mean values and the blue shadows represent standard deviations. The accumulated reward for each episode increases during training the DRL agent (Fig. 3a). Initially the policy is noisy and the accumulated rewards are low, because the DRL agent has not yet learned to stop expanding the pore before receiving enormous penalty for low ion rejection rate.
During the training, the DRL agent gradually learns a stable policy through maximizing the rewards (balancing the trade-off between water flux and ion rejection rate).The performance of DRL agent after 2000 episodes of training is demonstrated in Fig. 3(b)-(e). The DRL Step: 0 Step:10 Step: 20 Step: 60 Step: 70 Step: 80 Step: 30 Step: 40 Step: 50 (f) agent generates nanopore which brings positive reward at each timestep, and the agent also automatically learns to stop enlarging the nanopore to avoid low ion rejection rate ( Fig. 3b and 3c). For example, the evolution of a DRL generated pore (Fig. 3f, animated in Supplementary Movie) shows that DRL stops removing atom from the edge of graphene nanopore after 50th timestep, because it determines that higher water flux reward brought by further removing atoms is not worth the penalty for low ion rejection rate. Based on the prediction of the performance predictor, the DRL generated nanoporus graphenes have averaged ∼40 #/ns water flux and ∼96% ion rejection rate ( Fig. 3d and 3e).  Fig. 4a and 4b). T-SNE is a dimensionality reduction tool that is capable of mapping highdimensional data to lower-dimension form while preserving the similarities between data points. In another words, membranes that are more similar to each other will have a higher tendency of being clustered. In this work, using CNN extracted features from each graphene membrane, T-SNE successfully clustered samples with similar water flux or ion rejection.
This result indicates that features extracted from CNN models have a strong correlation with the water flux and ion rejection rate.
The water desalination performances of all nanopores (DRL generated and training dataset) are compared in Fig. 5a. It is worth noting that the process of generating 7999 nanopores using DRL and predicting their water flux/ion rejection rate takes less than a single week; however, evaluating the performance of the same amount of nanopores using MD simulation will take ∼33 years (average 36 hours on each sample, using one 56-core CPU node). Among the nanopores with the same level of ion rejection rate, some nanopores discovered by DRL allow much higher water flux. One common feature shared by those high-performance nanopores is the semi-oval geometry with rough edges. We set 90% ion rejection rate as the threshold to determine if a nanopore can effectively reject ions. The water flux histogram (Fig. 5b) shows that given the baseline ion rejection rate as 90%, DRL can extrapolate from the training dataset and discover graphene nanopores that generally allow higher water flux.
Further MD simulations are conducted with DRL generated membranes that show high predicted performances to evaluate how the DRL helps in optimizing graphene nanopore for water desalination (Simulation process recorded in Supplementary Movie). Although DRL generated pores generally have lower water flux compared with circular pores with the same area, they have much higher ion rejection rate (Fig. 5c, 90% threshold of ion rejection rate is marked by a red dashed line). For example, when the pore area is 113 A 2 , DRL generated nanopore maintained over 90% ion rejection rate while the circular pore rejects only approximately 65% of ions. A pore with high water flux but a very low ion rejection rate is not desirable in water desalination application. Moreover, the comparison between 113 A 2 DRL generated nanopore with 88 A 2 circular pore shows that DRL generated pore can reject more ions when achieving same water flux: they both have approximately 125 #/ns water flux while 113 A 2 DRL generated pore can reject approximately 7% more ions. The (e) Schematic showing how ions are blocked by DRL generated pore due to the steric effect. Ions with their hydration shell (Blue dashed circle) are too large to pass through the corner area (Red dashed circle) in the DRL generated pore, thus rendering that area an ion-free zone 16 comparison between simulation results proves that DRL tends to prioritize the ion rejection rate over water flux, which makes it capable of maximizing the water flux of nanopores while maintaining a valid ion rejection rate.
To gain deeper understanding about the reason behind the high ion rejection rate of DRL generated pores, distribution of water molecules and ions inside of 113 A 2 DRL generated pore and 88 A 2 circular pore have been visualized (Fig. 5d). From the ion distribution (marked by red dots), we can observe that ions can traverse the circular pore evenly through the entire central area of the pore. The distributions of water molecules (marked by aqua blue color) and ions in the circular pore are in a homogeneous pattern. However, the corners inside of DRL generated nanopore are small enough to block the passage of ions while being large enough to accommodate the transport of water molecules. With the knowledge that ions are covered by hydration shell during the transport through nanopore, it can be seen that ionfree zones (corners) inside of DRL generated nanopore obstruct the traversing of ions with hydration shell by steric effect (Fig. 5e). This is the reason why high-performance nanopores (zoom-in Fig. 5a, more high-performance DRL-generated pores shown in Supplementary   Information Fig.S5) all have rough edges. Discovers and utilizes this special geometry, DRL designs nanopores that can reject most ions while allowing high water transport.

Conclusion
In this work, we propose a graphene nanopore optimization framework based on the DRL agent accelerated by the water desalination performance predictor. In particular, we focus on the optimal design of nanoporous graphene for water desalination. With a well-trained machine learning property predictor, the DRL can automatically learn to design the optimal material structure effectively and efficiently.

Lennard-Jones potentials for MD simulations
The 12-6 Lennard-Jones (LJ) potentials used in MD simulations for generating dataset are tabulated as below. Force field between different types of atoms are calculated using Lorentz-Berthelot rule.

Details of XGBoost Regression Model
For XGBoost, 39 100 iterations of random search with 5-fold cross validation was applied on the hyperparameter grid tabulated as below. The hyperparameters of best XGBoost models for both water flux and ion rejection rate prediction are also recorded.

Details of DRL agent
Deep Q network 15,17 is utilized to model the DRL agent, which takes as input the graphene state representation and candidate atoms (Fig. S1) 16 Namely, with probability , the agent will randomly pick an action from the candidates. In our setting, is initially set to 0.9 and decayed to 0.05 linearly after 1000 episodes.
Also, the generalized logistic function 43 utilized in the reward function for penalizing ion rejection is illustrated in Fig. S2. When the ion rejection is greater than 95%, the function returns a near-zero value. Therefore, given that DRL agent receives positive reward for removing atoms and generating water flux, it tends to aggressively remove atoms at the beginning of each episode. While the ion rejection drops below 90%, the function gives a harsh penalty to the agent. Such reward function strongly discourage the agent from creating a nanoporus graphene with low ion rejection rate, which is inefficient in real-world water desalination. Figure S1: Overview of deep Q network architecture. The numbers below each box illustrates the dimensions. Each data point is calculated by averaging water flux and ion rejection rate of 4 simulations.
Error bar represent the value of one standard deviation. The water flux difference between the original pore and augmented pore is 3.17 #/ns (approximately 2%) and the ion rejection rate difference is 0.6%. All differences are within the error bar of the result of 4 simulations.
From this result, we can conclude that the augmentation method implemented in this work is valid. Figure S3: Comparison between the water desalination performance of original humandesigned graphene nanopore (left, blue membrane) and its augmented copy version (right, black membrane). The performance difference between them is minimum and within the error bar.

Pore area calculation
The area of graphene nanopores mentioned in this work is calculated using computer vision method (OpenCV package 50 ). For each nanoporous graphene membrane, all of its atoms which locate in the area of 0 A ≤ x ≤ 40 A and 0 A ≤ y ≤ 40 A are plotted with radius as σ (3.39 A used in this paper) of carbon (Fig. S4a). The graphene membrane image dimension in pixel is 360 × 360. The OpenCV package is then used to detect the pore from the image (Fig. S4b), and the area of the pore in terms of pixel is calculated. Further, The area of pore in terms of A 2 can be calculated using the ratio: 1 pixel = (40/360) 2 A 2 .
(a) (b) Figure S4: (a) Plot nanoporous graphene membrane (b) Detecting pore area and calculate its size using computer vision method Compilation of DRL generated pores with high predicted performance The compilation of 64 DRL generated pores with predicted water flux > 120 #/ns and ion rejection rate > 90% are demonstrated in (Fig. S5). One common trait of those grpahene nanopores is that they tend to have narrow cavities. Those narrow cavities in pore allow the passage of water molecules but are small enough to restrict the transport of ions along with their hydration shell. DRL gradually learned the importance of narrow cavity and applied it during the optimization of graphene nanopores. Figure S5: DRL generated pores with predicted water flux > 120 #/ns and ion rejection rate > 90%. 30