a Summation of reward in each timestep vs. episode, where the red line is the running average of the reward with window size 21 and the blue shadow represents the standard deviation. b Summation of reward in each timestep vs. timestep. c Number of removed atoms vs. timestep. d Predicted water flux vs. timestep. e Predicted ion rejection vs. timestep. b–e show the results of DRL agents after trained for 2000 episodes, where the red line indicates the mean and the blue shadow is the standard deviation. f Evolution of a graphene nanopore designed by DRL agent.