PySAGES: flexible, advanced sampling methods accelerated with GPUs

Molecular simulations are an important tool for research in physics, chemistry, and biology. The capabilities of simulations can be greatly expanded by providing access to advanced sampling methods and techniques that permit calculation of the relevant underlying free energy landscapes. In this sense, software that can be seamlessly adapted to a broad range of complex systems is essential. Building on past efforts to provide open-source community supported software for advanced sampling, we introduce PySAGES, a Python implementation of the Software Suite for Advanced General Ensemble Simulations (SSAGES) that provides full GPU support for massively parallel applications of enhanced sampling methods such as adaptive biasing forces, harmonic bias, or forward flux sampling in the context of molecular dynamics simulations. By providing an intuitive interface that facilitates the management of a system's configuration, the inclusion of new collective variables, and the implementation of sophisticated free energy-based sampling methods, the PySAGES library serves as a general platform for the development and implementation of emerging simulation techniques. The capabilities, core features, and computational performance of this new tool are demonstrated with clear and concise examples pertaining to different classes of molecular systems. We anticipate that PySAGES will provide the scientific community with a robust and easily accessible platform to accelerate simulations, improve sampling, and enable facile estimation of free energies for a wide range of materials and processes.


Introduction
Molecular simulations are extensively used in a wide range of science and engineering disciplines [ ]. As their use has grown for the discovery of new phenomena and the interpretation of sophisticated experimental measurements, so has the complexity of the systems that are considered. Classical atomistic molecular dynamics (MD) simulations are generally limited to microsecond time scales and length scales of tens of nanometers. For systems that are characterized by rugged free energy landscapes, such time scales can be inadequate to ensure sufficient sampling of the relevant phase space, and advanced methods must therefore be adopted to overcome free energy barriers. In that regard, it is useful and increasingly common to identify properly chosen collective variables (CVs), which are generally differentiable functions of the atomic coordinates of the system; then, biases can be applied to explore the space defined by such CVs, thereby overcoming barriers and enhancing sampling of the thermally accessible phase space.
The rapid growth of hardware accelerators such as GPUs or TPUs, or specialized hardware designed for fast computations [ , ], has provided researchers with increased opportunities to perform longer simulations of larger systems. GPUs, in particular, provide a widely accessible option for fast simulations, and several software packages, such as HOOMD-blue [ ], OpenMM [ ], [ , ], LAMMPS [ ], and GROMACS [ ], are now available for MD simulations on such devices. As mentioned above, enhanced sampling methods seek to surmount the high energy barriers that separate multiple metastable states in a system, while facilitating the calculation of relevant thermodynamic quantities, such as free energy surfaces (FES), as functions of different CVs. Several libraries, such as PLUMED [ ], Colvars [ ], and our own SSAGES package [ ], provide out-of-the-box solutions for performing enhanced sampling simulations. Among the various enhanced sampling methods available in the literature, some of the most recently devised schemes rely on machine learning (ML) strategies to approximate free energy surfaces and their gradients (generalized forces) [ , , , ]. Similarly, algorithms for identifying meaningful CVs that correlate with high-variance or slow degrees of freedom are based on deep neural networks [ , , , , , ]. These advances serve to highlight the need for seamless integration of ML frameworks with existing MD software libraries. To date, there are no solutions that combine enhanced sampling techniques, hardware acceleration, and ML frameworks to facilitate enhanced-sampling simulations on GPUs. While some MD libraries that support GPUs provide access to a limited set of enhanced sampling methods [ , , , , ], there are currently no packages that enable users to take advantage of all of these features within the same platform and in the same backend-agnostic fashion that tools such as PLUMED and SSAGES have provided for CPU-based simulations. Here we present PySAGES, a Python Suite for Advanced General Ensemble Simulations. It is a free, open-source software package
written in Python and based on JAX that follows the design ideas of SSAGES and enables users to easily perform enhanced-sampling simulations on CPUs, GPUs, and TPUs. PySAGES can currently be coupled with HOOMD-blue, OpenMM, and ASE, and by extension from the latter to CP2K, Quantum ESPRESSO, and Gaussian, among others. At this time, PySAGES offers the following enhanced sampling methods: Umbrella Sampling, Metadynamics, Well-tempered Metadynamics, Forward Flux Sampling, String Method, Adaptive Biasing Force, Artificial Neural Network sampling, Adaptive Biasing Force using neural networks, Combined Force Frequency, and Spectral Adaptive Biasing Force. PySAGES also includes some of the most commonly used CVs and, importantly, defining new ones is relatively simple, as long as they can be expressed in terms of the NumPy [ ] interface provided by JAX. All CVs can be automatically differentiated through functional transforms. PySAGES is highly modular, thereby allowing for the easy implementation of new methods as they emerge, even as part of a user-facing script.
In the following sections, we provide a general overview of the design and implementation of PySAGES, and present a series of examples to showcase its flexibility for addressing research problems in different application areas. We also discuss its performance on GPUs and present a few perspectives on how to grow and improve the package to cover more research use cases through future development, as well as community involvement and contributions.

Implementation
We begin by briefly outlining the core components of PySAGES, how they function together, and how communication with each backend allows PySAGES to bias a simulation during runtime. A summary of the execution workflow of PySAGES, along with a mapping of the user interface to the main stages of the simulation and the interaction with the backends, is illustrated in Figure 1.
To provide a uniform user interface while minimizing disruption to preexisting workflows, PySAGES only requires the user to wrap their traditional backend scripting code into simulation generator functions. This approach accommodates the heterogeneity of Python interfaces across the different simulation backends supported by PySAGES. An example of a simulation generator function, and how a traditional OpenMM script can be modified to perform an enhanced-sampling simulation, is depicted in Figure 2. At the start of a simulation, the simulation generator function is called to instantiate as many replicas of the simulation as needed. Then, for each replica, PySAGES queries the particle information and the device that the backend will be using. During this initial stage, PySAGES also performs automatic differentiation of the collective variables, via JAX's grad transform, required to estimate the biasing forces, and generates specialized initialization and updating routines for the user-declared sampling method.
Like SSAGES, PySAGES wraps the simulation information into an object called a Snapshot. This object exposes the most important simulation information, such as particle positions, velocities, and forces, in a backend- and device-agnostic format. To achieve this, PySAGES uses DLPack [ ] for C++-based libraries to directly access the contents of the backend-allocated buffers for the different particle properties without creating data copies whenever possible.
Once the setup of both the simulation and the sampling method is completed, PySAGES hands control back to the backend, which will run for a given number of time steps or until some other stopping criterion is reached. To exchange information back and forth, PySAGES adds a force-like object or function to the backend, which gets called as part of the time integration routine. Here, the sampling method state gets updated and the computed biasing forces are added to the backend's net forces.
Finally, the information collected by the sampling method is returned and can be used for calculating the free energy as a function of the selected CVs. Unlike SSAGES, PySAGES offers a user-friendly analyze interface that simplifies the process of performing post-simulation analysis, including the automatic calculation of free energies based on the chosen sampling method. This feature can greatly reduce the time and effort required to gain valuable insights from simulations.

Figure 2: It is easy to extend existing scripts with PySAGES to perform enhanced-sampling MD, with minimal changes to the code. In general, the only requirement is for the user to wrap the code that defines the simulation system into a simulation generator function:

    cvs = [DihedralAngle([4, 6, 8, 14]), DihedralAngle([6, 8, 14, 16])]
    grid = Grid(lower=(-pi, -pi), upper=(pi, pi), shape=(32, 32), periodic=True)
    method = ABF(cvs, grid)
    raw_result = pysages.run(method, generate_simulation, int(1e6))
    result = pysages.analyze(raw_result)

PySAGES offers an easy way to leverage different parallelism frameworks, including MPI, with the same uniform frontend available to run enhanced sampling simulations. This is achieved via Python's concurrent.futures interface. In particular, for MPI parallelism, the user only needs to pass an additional MPIPoolExecutor (from mpi4py) to PySAGES' run method. If the user selects a method such as UmbrellaSampling, the workload for each image will be distributed across the available nodes. On the other hand, for most of the sampling methods, the parallelization interface allows the user to run multiple replicas of the same system to enable, for instance, analysis of the uncertainties associated with computing the free energy of a given system.
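The executor-based design can be illustrated with a small, self-contained sketch. This is a toy stand-in, not the PySAGES API: `run_replicas` and `simulate_replica` are hypothetical names, but the pattern of accepting any `concurrent.futures`-compatible executor (such as mpi4py's MPIPoolExecutor) is the one described above.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_replica(seed):
    # Hypothetical stand-in for running one simulation replica;
    # here it just derives a deterministic "result" from the seed.
    return seed * seed

def run_replicas(seeds, executor=None):
    # Any concurrent.futures-compatible executor can be swapped in
    # (e.g. mpi4py's MPIPoolExecutor); with none, replicas run serially.
    if executor is None:
        return [simulate_replica(s) for s in seeds]
    return list(executor.map(simulate_replica, seeds))

serial = run_replicas([1, 2, 3])
with ThreadPoolExecutor(max_workers=2) as ex:
    parallel = run_replicas([1, 2, 3], executor=ex)
assert serial == parallel == [1, 4, 9]
```

Because the executor is an argument rather than a hard-coded dependency, the same user script can run serially on a laptop or distributed across nodes without modification.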
To ensure the reproducibility and correctness of our implementation, and to follow software engineering best practices, we have implemented a comprehensive unit test suite and leverage GitHub's continuous integration services. In addition, we use trunk.io [ ] to adhere to code quality standards as well as to ease collaboration among developers.

Enhanced Sampling Methods
While we assume the reader has some basic understanding of enhanced sampling methods, here we provide an overview of these techniques. We direct readers interested in learning more about the fundamentals of enhanced sampling to a number of excellent recent review articles [ , , , , , , , , , ]. In addition, we discuss the general structure of how enhanced sampling methods are implemented within PySAGES, and present a summary of the various methods already available in the library.
Enhanced sampling methods are a class of simulation techniques that manipulate regular MD simulations in order to more effectively sample the configuration space. A collective variable, ξ, is typically a function of the positions of all particles, ξ({r_i}).
For a given statistical ensemble (such as the canonical ensemble, NVT), the corresponding free energy can be written as A = -k_B T ln(Q), where A is the Helmholtz free energy and Q is the canonical partition function. To make explicit the dependency of the free energy on ξ, let us write down the partition function restricted to a given value of the CV:

    Q(ξ) = ∫ dr δ(ξ({r_i}) - ξ) e^{-U({r_i})/k_B T}.

Normalizing this partition function gives us the probability of occurrence, P(ξ) = Q(ξ)/(∫ dξ Q(ξ)), for configurations in the CV subspace. Substituting this probability into the expression for the free energy, we get

    A(ξ) = -k_B T ln P(ξ) + C,

where C is a constant.
If we take the derivative of the free energy with respect to ξ, we get

    dA(ξ)/dξ = ⟨∂U/∂ξ⟩_ξ,

where ⟨·⟩_ξ denotes the conditional average at fixed ξ. The goal of CV-based enhanced sampling methods is to accurately determine either P(ξ) or dA(ξ)/dξ, from which A(ξ) can be recovered in a computationally tractable manner.
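The last step, recovering A(ξ) from its derivative, amounts to a numerical integration. The following self-contained sketch uses toy data (it is not PySAGES code): for a free energy A(ξ) = ξ², the exact mean force 2ξ is integrated with the trapezoidal rule and the profile is recovered up to the additive constant C.

```python
# Toy free energy A(xi) = xi**2 and its exact mean force dA/dxi = 2*xi.
xis = [i * 0.1 for i in range(-20, 21)]
mean_forces = [2.0 * x for x in xis]   # what a mean-force method estimates

# Trapezoidal integration recovers A(xi) up to an additive constant C.
A = [0.0]
for i in range(1, len(xis)):
    dx = xis[i] - xis[i - 1]
    A.append(A[-1] + 0.5 * (mean_forces[i] + mean_forces[i - 1]) * dx)

# Fix the constant by shifting so that min(A) = 0, then compare.
shift = min(A)
A = [a - shift for a in A]
assert all(abs(a - x * x) < 1e-9 for a, x in zip(A, xis))
```

The trapezoidal rule is exact here because the mean force is linear in ξ; for sampled, noisy forces the same procedure applies but the result carries statistical error.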
In PySAGES, the implementation of sampling methods follows the functional-style programming model. New methods are implemented as subclasses of the SamplingMethod class, and are required to define a build method. This method returns two functions, initialize and update, used as part of the process of biasing the simulation. For readers familiar with JAX MD, these could be thought of as analogues of the higher-level functions returned by JAX MD's simulate integration methods. The initialize function allocates all the necessary helper objects and stores them in a State data structure, while the update function uses the information from the simulation at any given time to update the State.
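A stripped-down, pure-Python sketch of this build/initialize/update pattern might look as follows. The names `build`, `initialize`, `update`, and `State` mirror the terminology above, but the body is a hypothetical illustration (the real PySAGES implementations operate on JAX arrays and full simulation snapshots).

```python
from collections import namedtuple

State = namedtuple("State", ["bias_force", "xi0"])

def build(k, xi0):
    # `build` returns the pair of closures described above: `initialize`
    # allocates the method's state, `update` advances it each step.
    def initialize():
        return State(bias_force=0.0, xi0=xi0)

    def update(state, xi):
        # Toy harmonic-bias update: a force pulling xi back toward xi0.
        return State(bias_force=-k * (xi - state.xi0), xi0=state.xi0)

    return initialize, update

initialize, update = build(k=10.0, xi0=1.0)
state = initialize()
state = update(state, xi=1.5)
assert abs(state.bias_force + 5.0) < 1e-12   # -k * (xi - xi0) = -10 * 0.5
```

Returning closures over immutable state is what makes the pattern compose well with JAX's functional transforms, since the update step is a pure function of the previous state.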
While PySAGES allows new methods to be written seamlessly as part of the Python scripts used to set up molecular dynamics simulations, it also provides out-of-the-box implementations of several of the most important known sampling methods. We list and briefly detail them next.

Harmonic Biasing
One simple way to sample a specific region of phase space is to bias the simulation around a point ξ₀ with a harmonic bias. This adds a quadratic potential energy term to the Hamiltonian that increases the potential energy as the system moves away from the target point: H_b = H + (k/2)(ξ - ξ₀)², where k > 0 is the spring constant. The unbiased probability distribution P(ξ) can be recovered by dividing the biased distribution by the known weight of the bias: P(ξ) = P_b(ξ)/exp(-(k/2)(ξ - ξ₀)²/k_B T).
The disadvantage of this approach is that it can only be used to explore the free energy landscape near a well-known point in phase space. This may not be sufficient for many systems, where the free energy landscape is complex.
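The reweighting relation above can be made concrete with a small sketch. The helper `unbias` is hypothetical (not a PySAGES function); it divides a biased histogram by the Boltzmann weight of the harmonic bias, which is equivalent to multiplying by its inverse exponential, and renormalizes.

```python
import math

def unbias(hist, centers, k, xi0, kT=1.0):
    # Recover the unbiased distribution from a harmonically biased one:
    # P(xi) ∝ P_b(xi) * exp(+(k/2) (xi - xi0)^2 / kT), then renormalize.
    w = [h * math.exp(0.5 * k * (x - xi0) ** 2 / kT)
         for h, x in zip(hist, centers)]
    z = sum(w)
    return [wi / z for wi in w]

# Toy check: harmonically biasing a flat landscape yields a Gaussian
# histogram; unbiasing it should give back a flat distribution.
k, xi0 = 4.0, 0.0
centers = [i * 0.25 for i in range(-8, 9)]
biased = [math.exp(-0.5 * k * (x - xi0) ** 2) for x in centers]
p = unbias(biased, centers, k, xi0)
assert all(abs(pi - p[0]) < 1e-12 for pi in p)
```

In practice the exponential reweighting amplifies noise in the histogram tails, which is one reason the harmonic bias is only reliable near ξ₀.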

Umbrella Sampling
Umbrella sampling is a technique that traditionally builds on harmonic biasing by combining multiple harmonically biased simulations. It is a well-known method for exploring a known path in phase space to obtain a free energy profile along that path [ , ]. Typically, a path between two points of interest is described by N points in phase space, ξ_i. At each of these points, a harmonically biased simulation is performed, and the resulting occurrence histograms are combined to obtain a single free energy profile.
In PySAGES, we implement umbrella integration for multi-dimensional CVs. This method approximates the forces acting on the biasing points and integrates these forces to find the free energy profile A(ξ), which allows one to explore complex, high-dimensional free energy landscapes.

Improved String Method
When only the endpoints are known, but not the path itself, the improved (spline-based) string method can be used to find the minimum free energy pathway (MFEP) between these two endpoints [ ]. The spline-based string method improves upon the original string method by interpolating the string using cubic splines. In this method, the intermediate points of the path are moved according to the recorded mean forces acting on them, but only in the direction perpendicular to the contour of the path. This ensures that the distances between the points along the path remain constant. This method has been widely used and has been shown to be an effective way to find the MFEP between two points in phase space [ ].

Adaptive Biasing Force sampling
The adaptive biasing force (ABF) sampling method is a technique used to map complex free-energy landscapes. It can be applied without prior knowledge of the potential energy of the system, as it generates on-the-fly estimates of the derivative of the free energy at each point along the integration pathway. ABF works by introducing an additional force to the system that biases the motion of the atoms, with the strength and direction of the bias continuously updated during the simulation. In the long-time limit, this yields a Hamiltonian with no average force acting along the transition coordinate of interest, resulting in a flat free-energy surface and allowing the system to display accelerated dynamics, thus providing reliable free-energy estimates [ , ]. Similarly to SSAGES, the PySAGES implementation of ABF is based on the algorithm described in [ ].
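The core ABF bookkeeping can be sketched in a few lines. This toy, serial version (the class name and structure are hypothetical; the PySAGES implementation runs on JAX arrays and includes refinements such as damping the bias at low sample counts) keeps a running mean of the instantaneous generalized force in each CV bin and applies its negative as the bias.

```python
import random

class ABFSketch:
    def __init__(self, nbins):
        self.counts = [0] * nbins
        self.force_sum = [0.0] * nbins

    def update(self, bin_index, instantaneous_force):
        # Accumulate the instantaneous force, then return the bias:
        # the negative running mean, which flattens the landscape.
        self.counts[bin_index] += 1
        self.force_sum[bin_index] += instantaneous_force
        return -self.force_sum[bin_index] / self.counts[bin_index]

abf = ABFSketch(nbins=1)
random.seed(0)
# Noisy samples of a true mean force of 3.0; the bias converges to -3.0.
for _ in range(20000):
    abf.update(0, 3.0 + random.gauss(0.0, 1.0))
bias = -abf.force_sum[0] / abf.counts[0]
assert abs(bias + 3.0) < 0.05
```

The running mean converges as 1/sqrt(samples), which is why early-time estimates are noisy and why practical implementations ramp the bias up gradually.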

Metadynamics
Metadynamics is another popular approach for enhancing the sampling of complex systems. In metadynamics [ ], a bias potential is applied along one or more CVs in the form of Gaussian functions. The height and width (σ) of these Gaussians are controlled by the user. The Gaussian bias potentials are cumulatively deposited at user-defined intervals during the simulation. In standard metadynamics, the height of the Gaussian bias potentials is fixed.
In contrast, for well-tempered metadynamics (WTM) [ ] simulations, the height of the Gaussian bias potentials is adjusted at each deposition step using a preset, temperature-based bias factor. This scaling of Gaussian heights in WTM leads to faster convergence compared to standard metadynamics, as it restricts the range of free energy explored to a range defined by the bias factor.
In PySAGES, we have implemented both standard metadynamics and WTM. The well-tempered variant is activated when a user sets a value for the bias factor. To improve computational performance, we have added optional support for storing the bias potentials and their gradients on a predefined grid. This allows users to trade off accuracy for faster simulations, depending on their needs.
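The deposition rule for both variants can be sketched as follows. The helpers `deposit` and `bias_potential` are hypothetical (PySAGES' gridded implementation differs in detail); the key point is that in the well-tempered case the hill height is scaled down by the bias already accumulated at the deposition point.

```python
import math

def bias_potential(hills, xi, sigma):
    # Total bias: sum of the deposited Gaussian hills evaluated at xi.
    return sum(h * math.exp(-0.5 * ((xi - c) / sigma) ** 2)
               for c, h in hills)

def deposit(hills, xi, w0, sigma, bias_factor=None, kT=1.0):
    # Standard metadynamics: fixed height w0. Well-tempered: height
    # decays as exp(-V(xi)/dT) with dT = (bias_factor - 1) * kT.
    h = w0
    if bias_factor is not None:
        dT = (bias_factor - 1.0) * kT
        h = w0 * math.exp(-bias_potential(hills, xi, sigma) / dT)
    hills.append((xi, h))

hills = []
for _ in range(50):
    deposit(hills, 0.0, w0=1.0, sigma=0.5, bias_factor=5.0)
# Heights decay as bias accumulates, so the total bias stays well below
# the 50.0 that fixed-height (standard) metadynamics would reach here.
assert bias_potential(hills, 0.0, 0.5) < 50.0
assert hills[0][1] > hills[-1][1]
```

This decay of hill heights is exactly what limits the explored free energy range to a window set by the bias factor, as described above.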
Forward Flux Sampling

Forward flux sampling (FFS) belongs to a different family of enhanced sampling methods than the ones described above. In the previously described methods, the free energy change from a region in phase space (A) to a region of interest (B) is calculated by applying a bias to the system. In FFS, no bias is added; instead, an efficient selection of trajectories that cross the phase space from A to B is performed. Since no bias is used, the intrinsic dynamics of the system is conserved, and therefore kinetic and microscopic information about the transition path can be studied [ ]. In PySAGES we have implemented the direct version of FFS [ , ].
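The trajectory-selection idea behind direct FFS can be illustrated with a one-dimensional random walk. This is a toy model, not the PySAGES implementation: interfaces λ_i are placed between basin A (x ≤ 0) and basin B (x ≥ 3), and the probability of reaching B is estimated as a product of unbiased interface-crossing probabilities.

```python
import random

random.seed(1)

def propagate(x, lam_next):
    # Run unbiased dynamics until the walker either reaches the next
    # interface (success) or falls back into basin A at x <= 0 (failure).
    while 0.0 < x < lam_next:
        x += random.choice((1.0, -1.0))
    return x >= lam_next, x

interfaces = [1.0, 2.0, 3.0]
configs = [1.0] * 200              # configurations collected at lambda_0
prob = 1.0
for lam in interfaces[1:]:
    results = [propagate(x, lam) for x in configs]
    successes = [x for ok, x in results if ok]
    prob *= len(successes) / len(results)
    configs = successes or configs  # toy safeguard if all trials fail

assert 0.0 < prob < 1.0
```

Because every segment is propagated with the unbiased dynamics, the product of crossing probabilities retains the true kinetics of the A-to-B transition, which is the feature highlighted above.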

Artificial neural networks sampling
Artificial neural networks sampling (ANN) [ ] employs regularized neural networks to directly approximate the free energy from the histogram of visits to each region of CV space, and generates a biasing force that avoids the ringing and boundary artifacts [ ] commonly observed in methods such as metadynamics or basis function sampling [ ]. This approach is effective at quickly adapting to diverse free energy landscapes by interpolating undersampled regions and extrapolating the bias into new, unexplored areas.
The implementation in PySAGES offers more flexible approaches to network regularization than SSAGES, which uses Bayesian regularization.
Force-biasing using neural networks

Force-biasing using neural networks (FUNN) [ ] is based on the same idea as ANN, that is, relying on artificial neural networks to provide continuous functions with which to bias a simulation; but instead of using the histogram of visits to CV space, it updates its network parameters by training on estimates of the mean forces as the simulation advances. This method shares all of the features of ANN, but the smooth approximation of the generalized mean force it produces enables much faster convergence to the free energy of a system compared to ANN.

Combined Force Frequency sampling
The combined force frequency sampling (CFF) method [ ] combines the speed of generalized-force-based techniques such as ABF or FUNN with the advantages of frequency-based methods like metadynamics or ANN. Notable improvements over earlier force-based methods include eliminating the need for hyperparameters to dampen early-time estimates, automating the integration of forces to generate the free energy, and providing an explicit expression for the free energy at all times, enabling the use of replica exchange or reweighting.
In principle, by using sparse storage for histograms, it should be possible to scale the method to higher dimensions without encountering memory limitations; such an optimization, however, is not yet implemented in PySAGES.
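Such sparse storage could be sketched with a dictionary keyed by integer bin indices. This is a hypothetical illustration (the optimization is, as noted, not yet part of PySAGES): only bins that are actually visited consume memory, regardless of the dimensionality of the CV grid.

```python
class SparseHistogram:
    def __init__(self, lower, bin_width):
        self.lower = lower
        self.w = bin_width
        self.counts = {}        # maps tuple of bin indices -> visit count

    def bin_of(self, xi):
        return tuple(int((x - lo) // self.w)
                     for x, lo in zip(xi, self.lower))

    def add(self, xi):
        b = self.bin_of(xi)
        self.counts[b] = self.counts.get(b, 0) + 1

h = SparseHistogram(lower=(0.0, 0.0, 0.0, 0.0), bin_width=0.1)
h.add((0.05, 0.05, 0.05, 0.05))
h.add((0.05, 0.06, 0.05, 0.05))
h.add((0.95, 0.05, 0.05, 0.05))
# A dense 4D grid with 10 bins per axis would allocate 10**4 entries;
# here only the two visited bins are stored.
assert h.counts[(0, 0, 0, 0)] == 2
assert len(h.counts) == 2
```

Since sampled trajectories visit only a thin manifold of a high-dimensional CV space, the number of occupied bins grows far more slowly than the dense grid size.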
Spectral Adaptive Biasing Force

Spectral ABF [ ] follows the same principle as neural-network-based sampling methods, in that it builds a continuous approximation to the free energy. However, in contrast to methods like ANN, it does so by fitting exponentially convergent basis-function expansions, and it can be thought of as a generalization of the basis function sampling method. In contrast to the latter, and similar to CFF, it allows for the recovery of an explicit expression for the free energy of a system. It is an extremely fast method in terms of both runtime and convergence.

Collective variables
As previously mentioned, enhanced sampling calculations commonly involve the selection of a CV. An appropriate CV for a given system could simply be the distance between the centers of mass of two groups of atoms, but it could also be a complex, specialized quantity.
Below, we list the set of CVs predefined in PySAGES, sorted by the number of groups of atom coordinates necessary for their use:

1. TwoPointCV. This subclass is for CVs that need two groups for their definition. This includes Distance and Displacement (vector).
2. ThreePointCV. Subclass for CVs with three groups of atoms, such as Angle.
3. FourPointCV. Subclass for CVs with four groups of atoms, such as DihedralAngle.
4. AxisCV. Subclass for CVs that are projected onto a given axis. This includes Component and PrincipalMoment.
5. CollectiveVariable. General base class for all CVs. In PySAGES, CVs that derive directly from this class, and do not belong to the previous groups, include: RingPhaseAngle, RingAmplitude, RadiusofGyration, Asphericity, Acylindricity, ShapeAnisotropy, and RingPuckeringCoordinates [ ] (vector).
In PySAGES we provide users with a simple framework for defining CVs, which are automatically differentiated with JAX. To illustrate this, we compare how to write the calculation of a CV that measures the projection of the vector between two groups of atoms onto the axis that passes through two other groups, in both SSAGES and PySAGES (see Figure 3). In PySAGES the gradient calculation is done automatically, whereas in SSAGES it has to be coded explicitly.
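For the projection CV just described, the gradient that JAX would compute automatically can also be written by hand and checked numerically. The following self-contained sketch uses plain Python instead of JAX and single-atom groups; the function names are hypothetical, and the finite-difference check plays the role of verifying what automatic differentiation delivers for free.

```python
import math

def projection_cv(r1, r2, r3, r4):
    # Projection of the vector r2 - r1 onto the unit axis along r4 - r3.
    d = [b - a for a, b in zip(r1, r2)]
    axis = [b - a for a, b in zip(r3, r4)]
    n = math.sqrt(sum(c * c for c in axis))
    return sum(dc * ac for dc, ac in zip(d, axis)) / n

def grad_r2(r1, r2, r3, r4):
    # Hand-coded gradient with respect to r2: the unit axis vector.
    axis = [b - a for a, b in zip(r3, r4)]
    n = math.sqrt(sum(c * c for c in axis))
    return [c / n for c in axis]

r1, r2 = (0.0, 0.0, 0.0), (1.0, 2.0, 0.5)
r3, r4 = (0.0, 0.0, 0.0), (0.0, 0.0, 2.0)
g = grad_r2(r1, r2, r3, r4)
eps = 1e-6
for k in range(3):
    rp = list(r2); rp[k] += eps
    rm = list(r2); rm[k] -= eps
    fd = (projection_cv(r1, tuple(rp), r3, r4)
          - projection_cv(r1, tuple(rm), r3, r4)) / (2 * eps)
    assert abs(fd - g[k]) < 1e-6   # finite difference matches analytic
```

For this simple CV the hand-coded gradient is one line; for CVs involving eigenvalues, ring-puckering coordinates, or neural networks, deriving it manually becomes error-prone, which is the burden automatic differentiation removes.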
Data-driven and differentiable CVs discovered using artificial neural networks (e.g., autoencoders) [ , , , , ] with arbitrary featurizations of atoms can, in principle, be implemented in PySAGES based on the general abstract classes of CVs above. The following second example shows the power of differentiable programming for CV declaration in PySAGES.

Case study: A collective variable for interfaces
When two immiscible liquids are in contact with each other, the density of each liquid changes gradually across the region where they meet. This transition region is the liquid-liquid interface, and its position is of high importance in many studies (see section 3.1.3). However, locating such an interface is not a trivial task, since it generally fluctuates as the simulation progresses. As a representative for the interface, we can use the position of the point where the gradient of the density is maximized. More formally, let ρ(z) denote the density of the liquid of interest at a coordinate z on the axis perpendicular to the interface. We would like to find the location of the interface:

    z* = arg max_z |dρ(z)/dz|.

However, the density function ρ(z) is not directly measurable in a molecular simulation, as the coordinates of atoms are discrete. To obtain an approximation of ρ(z), we divide the coordinates into multiple bins, each with a width of w, and create a histogram h(z) that records the number of atoms falling into the bin around position z. To obtain a differentiable density estimate p(z), the histogram is smoothed with a Gaussian kernel. In other words,

    p(z) = Σ_b h(z_b) exp(-(z - z_b)²/(2σ²)),

in which σ is a hyperparameter that determines the width of the Gaussian kernel.

Figure 3: Apart from the usual overhead involved in writing C++ code in comparison to Python, the gradients of a CV need to be manually implemented in SSAGES, whereas in PySAGES these are automatically computed with JAX.
Then, the gradient of the density can be approximated by differentiating this kernel-smoothed histogram, and we calculate the location of the interface as z* = arg max_z |p′(z)|. The arg max operator, however, is non-differentiable. As a result, we replace it with a softmax function that transforms the raw input into a probability. Denoting the bins as b = 1, ..., M, we finally calculate the location of the interface as the corresponding probability-weighted average of the bin positions z_b. As demonstrated in the code snippet for this CV, provided in Appendix A, PySAGES allows for the concise and straightforward implementation of complex CVs such as this one.
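The full pipeline (kernel-smoothed density, density gradient, and softmax in place of arg max) can be sketched in plain Python. This is a toy stand-in for the JAX version in Appendix A: the function name and all parameter values are illustrative, and a central finite difference replaces the analytic derivative of the kernel.

```python
import math

def interface_position(z_atoms, z_min, z_max, nbins, sigma, beta=50.0):
    w = (z_max - z_min) / nbins
    centers = [z_min + (b + 0.5) * w for b in range(nbins)]
    # Gaussian-kernel density estimate in place of a hard histogram.
    p = [sum(math.exp(-0.5 * ((c - z) / sigma) ** 2) for z in z_atoms)
         for c in centers]
    # Central finite difference for |p'(z)| at the interior bins.
    grad = [abs(p[b + 1] - p[b - 1]) / (2 * w) for b in range(1, nbins - 1)]
    # Softmax replaces the non-differentiable arg max; beta sets sharpness.
    m = max(grad)
    weights = [math.exp(beta * (g - m)) for g in grad]
    return (sum(c * wt for c, wt in zip(centers[1:-1], weights))
            / sum(weights))

# Toy slab: uniform liquid in [0, 5), so the density drops near z = 5.
atoms = [i * 0.05 for i in range(100)]
z = interface_position(atoms, 3.0, 7.0, nbins=40, sigma=0.25)
assert 4.5 < z < 5.5
```

Every operation here (sums, exponentials, the softmax) is differentiable, so the JAX version of the same pipeline yields the biasing forces on each atom automatically via `grad`.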

Results and Discussion
To evaluate a software package like PySAGES, we must consider at least two factors: physical correctness and computational performance.
First, to assess the correctness of the enhanced sampling methods implemented in PySAGES, we present in Appendix B.1 the free-energy landscape for the dihedral angles φ and ψ of alanine dipeptide. This example is commonly used to benchmark new enhanced sampling algorithms. Similarly, we also show in Appendix B.2 the free energy as a function of the dihedral angle of butane. Our results show that PySAGES reproduces the expected free-energy landscapes using different methods and backends. In section 3.1, we further investigate the applicability and correctness of PySAGES beyond these simple model systems. Second, we demonstrate the performance of PySAGES on GPUs with two different backends in section 3.2. In particular, we compare the performance of enhanced sampling simulations to the performance of pure MD simulations, as well as to other enhanced sampling implementations.

Example applications of enhanced sampling with PySAGES
To demonstrate the versatility and effectiveness of PySAGES in different contexts, we present several examples of how enhanced sampling methods can be used to gain valuable insights in various fields, including biology, drug design, materials engineering, polymer physics, and ab initio simulations.
These examples showcase how PySAGES can be used in diverse research areas, as well as the utility of different enhanced sampling methods and backends.
Overall, these examples confirm that the enhanced sampling methods implemented in PySAGES work as intended and provide results consistent with the existing literature.

Structural Stability of Protein-Ligand Complexes for Drug Discovery
High-throughput docking is a widely used computational technique in drug lead discovery. However, such techniques are limited by the lack of information about protein conformations and the stability of ligands in the docked region [ ]. To address this issue, the Dynamical Undocking (DUck) method was developed to evaluate the stability of ligand binding by calculating the work required to break the most important native contact (hydrogen-bond interactions) in the protein-ligand complex [ ]. This method has been shown to be complementary and orthogonal to classical docking, making both techniques work in parallel in drug discovery [ , ]. However, DUck can be slow to converge when combined with traditional enhanced sampling techniques [ ], making it unsuitable for high-throughput drug discovery protocols.
Here, we demonstrate how PySAGES with OpenMM can be used efficiently in drug discovery applications, where the user-friendly interface, native parallel capabilities, and new enhanced sampling methods with fast convergence are synergistically combined to accelerate the virtual screening of ligand databases. In this example, we study the main protease (Mpro) of the SARS-CoV-2 virus (PDB: [ ]), where the ligands were removed and monomer A was selected as the docking receptor. A ligand with SMILES string CCCCOCC(=O)c1ccc(C)cc1N[C@H]1N[C@@H](c2cccnc2)CS1 was docked using rDock [ ]. The best-scoring pose was used to initialize the system, which was simulated using the ff [ ], [ ], and [ ] force fields. A 10 ns equilibration procedure was carried out to find the most stable hydrogen bond between the ligand and the protein. The last frame of this equilibration was then used to initialize the enhanced sampling calculations in PySAGES with ABF, metadynamics, ANN, FUNN, and Spectral ABF. These methods were compared against the same system simulated using Amber20 [ ] with steered molecular dynamics (see Figure 4b). Our results suggest that we can reduce the simulation time by an order of magnitude using new enhanced sampling methods like Spectral ABF or FUNN. This can greatly accelerate the drug discovery process and help identify potential drug leads more quickly.
Figure 4: The Dynamical Undocking (DUck) method in detail. a) For a proposed binding mode obtained from classical docking, a short run using MD simulations is carried out, and the most stable receptor-ligand native contact is selected from that run; in this case, it is the hydrogen bond between the red and blue atoms highlighted in panel a). b) Comparison between different methods in PySAGES for DUck calculations, averaged over 5 different replicas for each method. The reference, a steered MD simulation of 2 ns, is shown in red. In comparison, the different methods in PySAGES use a simulation period 10 times shorter: only [ ] provides inferior performance against the reference; Spectral ABF [ ] and [ ] give the best performance.

Fission of a Diblock Copolymer Spherical Domain
We now investigate the fission of a single spherical domain of a diblock copolymer using a soft, coarse-grained model. A standard potential is used to enforce incompressibility, with a repulsion parameter of a_AA = 5 between like particles. A higher repulsion of a_AB = a_AA + Δa, with Δa ∈ [0.1, 0.4], is applied between unlike particles to create the repulsion that leads to microphase separation. A Flory-Huggins parameter χ ∝ Δa > 0 can characterize this phase separation.
The interaction range of this non-bonded potential is 1, as is the range of the thermostat that keeps the temperature at k_B T = 1.
In addition, a harmonic spring force with zero rest length and a spring constant of k = 16/3 is used to connect the beads into polymer chains, resulting in an average bond length of b₀ = 0.75. The equilibrium phase for this polymer melt is a body-centered cubic (BCC) phase of spherical A droplets inside a B melt [ ]. However, we confine the polymer to a tight cubic simulation box of length L₀ = 10, which results in a single spherical A domain in the B matrix. We integrate the simulation with a time step of Δt = 10⁻³, and each simulation is equilibrated for t = 1000, followed by a production run of t = 1000 as well. A discussion of the performance of this system with and without PySAGES can be found in section 3.2.1.

Figure 5: Free energy landscape of the fission of a spherical diblock-copolymer domain. The chain ends forming the spherical domain are split into two groups, blue and red; the other chain ends are not shown for clarity, except for a single chain (grey). Initially, a single spherical domain is formed, but as the distance between the centers of mass of the blue and red groups is constrained further apart, the domain first elongates and then separates completely. During this separation, the free energy continuously increases, and the increase is steeper for high repulsion between unlike types, Δa. As soon as the domain is separated, the free energy plateaus.
After defining the diblock copolymer system, the next step is to define a CV within the system. In this case, we are interested in the fission of the single spherical A domain into two equally sized smaller A domains. To achieve this, we divide the polymer chains into two groups: the first n = 100 chains form the first small domain (blue in Figure 5), and the remaining n = 100 chains form the second spherical domain (red in Figure 5).
To define and enforce the separation of the two groups, we define our CV as the distance, d, between the center of mass of the blue A-tails and the center of mass of the red A-tails. Initially, without biasing, the two groups form a single spherical domain, and the blue and red polymer tails are well mixed, as shown at small d < 1 in Figure 5.
To study the separation of the spherical domain, we use harmonic biasing (see section 2.1.1) to enforce a separation distance d₀ between the two groups. The high density in the system, ρ₀ ≈ 344, leads to low fluctuations and the suppression of unfavorable conformations. Therefore, we use a high spring force constant of k = 1500 to facilitate the separation. We investigate a separation of d ∈ [0, 6] with 14 replicas and use umbrella integration (see section 2.1.2) to determine the free energy profile, as shown in Figure 5. As we increase the enforced separation distance d₀, we observe how the single domain splits into two. At a low separation distance, d < 2, the single domain is mostly undeformed, but the two groups separate inside the single spherical domain. Increasing the separation distance further goes beyond the dimensions of the spherical domain, leading to the deformation of the domain into an elongated, rod-like shape. The two groups still maintain a connection to minimize the AB interface. At a separation between 4 and 5, the deformation becomes so strong that the penalty of forming another AB interface between the two groups, and hence forming two spherical domains, is lower than the entropic penalty of the domain deformation and the elongated AB interface of the droplet. After the separation, the free energy landscape remains indifferent to the separation, since there is no interaction left between the two domains.
The free energy profile of the separation is controlled by the repulsion between unlike types, which scales with the interaction parameter Δ. The stronger the repulsion, the more energy is necessary to enlarge the AB surface area for the fission. For the strongest interaction, Δ = 0.4, the total free energy barrier reaches about 800 kBT, while for the weakest, Δ = 0.1, it remains below 400 kBT. Both barriers are orders of magnitude larger than thermal fluctuations (kBT = 1), so a spontaneous separation is not expected, and the fission can only be studied via enhanced sampling.
It is interesting to note that the lowest separation distance, d0 = 0, is not the lowest free energy state. Enforcing perfect mixing is unfavorable, as the two groups naturally separate slightly, optimizing the entropy of the chain end-tails.

Liquid Crystal Anchoring at Aqueous Interfaces
Liquid crystals (LCs), materials that flow like liquids but have anisotropic properties like crystals, have lately been used as prototypes for molecular sensors at interfaces, given the high sensitivity of their anchoring behavior to small concentrations of molecules at aqueous interfaces [ ]. The presence of such molecules at the interface drastically changes the free energy surface of LC molecules as a function of their orientation and distance to the interface. In this example, we revisit some canonical LC interfaces: 4-cyano-4'-pentylbiphenyl (5CB) at interfaces of pure water and of sodium lauryl sulfate (SDS). For 5CB and water, previous work has focused on obtaining the free energy surface of a 5CB molecule at the water interface [ ]. In our case, hybrid anchoring conditions have been imposed on a 16 nm slab of 1000 5CB molecules in the nematic phase (300 K) interacting with a 3 nm slab of water, with 62 SDS molecules at one of the interfaces. The force fields used are: a united-atom model for 5CB [ ], the water model of Ref. [ ], and Lipid17 for SDS [ ]. The CVs chosen to study this system are the distance of the center of mass of one 5CB molecule to each of the interfaces (see Appendix A), and the tilt orientation of the same molecule with respect to the z axis of the box. The free energy surfaces for the pure-water interface and for the interface with SDS are both displayed in Figure 6. We observe that the free energy surface for pure water shows a minimum corresponding to an orientation parallel to the surface, with a shape similar to that calculated in [ ]. On the contrary, the presence of SDS transforms this minimum into a maximum at the same relative position and orientation to the interface (Figure 6, top left), moving the minima to an orientation of 5CB perpendicular to the interface, in agreement with the experimentally observed change from planar to homeotropic anchoring in the presence of SDS in water.
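The tilt CV described above is simply the angle between a molecular axis and the z axis of the box. A minimal NumPy illustration (the axis vectors below are invented stand-ins for the actual 5CB molecular director):

```python
import numpy as np

def tilt_angle(axis_vector, normal=np.array([0.0, 0.0, 1.0])):
    """CV: angle (degrees) between a molecular axis and the interface normal."""
    u = axis_vector / np.linalg.norm(axis_vector)
    cos_t = np.clip(np.dot(u, normal), -1.0, 1.0)
    return np.degrees(np.arccos(cos_t))

# a molecule lying flat in the interface plane (planar anchoring)
planar = tilt_angle(np.array([1.0, 0.0, 0.0]))       # 90 degrees
# a molecule standing perpendicular to the interface (homeotropic anchoring)
homeotropic = tilt_angle(np.array([0.0, 0.0, 2.5]))  # 0 degrees
```

The two limiting values correspond to the planar and homeotropic anchoring regimes discussed in the text.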

Ab Initio Enhanced Sampling Simulations
In the field of ab initio simulations of heterogeneous catalysis, capturing dynamic and entropic effects is crucial for an accurate description of the phenomena [ ]. Classical force fields are inadequate for capturing the essential bond-breaking events involved in catalysis, so simulations based on first-principles calculations are necessary. Given that reactive events are often limited by large free energy barriers, enhanced sampling techniques are a crucial part of these simulations. Coupling PySAGES to ASE provides access to a wide range of first-principles calculators. As an example, we have used one such calculator for a simple ab initio enhanced sampling simulation. The CV is the separation distance between a sodium atom and a chlorine atom, using the functional of Ref. [ ], with Spectral ABF as the enhanced sampling method (see section 2.1.10). The results are shown in Figure 7, where the minimum in the free energy profile along the Na-Cl distance corresponds to the equilibrium distance between Na and Cl atoms in vacuum.

Enhanced Sampling with Machine Learning Force Fields
Deep neural network (NN) force fields can retain the accuracy of ab initio MD while allowing for computational costs similar to those of classical MD. Additionally, PySAGES makes it possible to leverage such general potentials in enhanced sampling calculations. Coupling PySAGES with these force fields can also be used for active learning of force fields, by efficiently sampling rare events with any of the enhanced sampling methods PySAGES provides, as described in Ref. [ ], where parallel-tempering metadynamics was used to generate an accurate force field for urea decomposition in water. To test the capabilities of PySAGES to handle different force fields, we have selected three different systems trained with the methods mentioned above. For DeepMD, we use a pre-trained model for water, where the enhanced sampling system is a single water molecule in vacuum, the collective variable is the internal angle of the molecule, and the sampling method is the one described in section 2.1.4. The results in Figure 8 show that the minimum of this free energy profile is around 105 degrees, which is within the range of the experimental value.
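The internal-angle CV used for the water example can be expressed as a differentiable function of the three atomic positions; in PySAGES such functions are written so that automatic differentiation provides the biasing forces. An illustrative NumPy version (the coordinates below are an invented, roughly equilibrium geometry, not simulation output):

```python
import numpy as np

def internal_angle(o, h1, h2):
    """CV: H-O-H angle in degrees from the three atomic positions."""
    v1, v2 = h1 - o, h2 - o
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

# idealized water geometry: O-H bonds of 0.96 A at ~104.5 degrees
theta = np.radians(104.5)
o = np.zeros(3)
h1 = 0.96 * np.array([1.0, 0.0, 0.0])
h2 = 0.96 * np.array([np.cos(theta), np.sin(theta), 0.0])
angle = internal_angle(o, h1, h2)
```

For this constructed geometry the CV recovers the input angle, close to the free energy minimum reported in Figure 8.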
Next, in Figure 8b, an NN potential was used for amorphous Si-H mixtures [ ]. In this case, a system of 244 atoms was used, and the collective variable is the bond angle of a Si-Si-H triad in the mixture. The global minimum in free energy agrees with the histogram obtained from the unbiased simulations reported in [ ].
Lastly, we studied a graph neural network (GNN) model of a Si crystal [ ] with PySAGES. In this case, a crystalline Si system of 64 atoms was used, and the CV was the Si-Si distance in the crystal. The results in Figure 8c show that for this model, the minimum in the free energy corresponds almost exactly to the experimental value for the Si-Si nearest-neighbor distance of 2.35 Å.

Performance
Our analysis revealed that PySAGES is at least ∼14-15 times faster than SSAGES on an Nvidia GPU machine. To obtain this estimate, we ran enhanced sampling using umbrella sampling along the center-of-mass distance between two spherical polymer domains to measure the free energy landscape of the fission of a spherical diblock-copolymer domain (Figure 5), as described in section 3.1.2. For support and compatibility across libraries and engine versions, we estimated the performance of SSAGES and PySAGES using their respective compatible HOOMD-blue versions.

PySAGES is designed to execute every compute-intensive step of a simulation on the GPU, with zero-copy data exchange between device and host memory for its explicit backends HOOMD-blue [ ] and OpenMM [ ], while still providing Python code for the user through JAX [ ]. In this section, we investigate the computational efficiency of PySAGES by examining two example systems, one for each backend.
For HOOMD-blue, we investigate a system of highly coarse-grained diblock copolymers, as discussed in section 3.1.2. The simulation box contains a total of N = 51 200 particles at a density of 51.2 particles per unit volume, which we use for benchmarking purposes with an Nvidia GPU hosted on an Intel Xeon Gold CPU @ 3.00 GHz. Running with HOOMD-blue alone, we achieve an average of 754 time steps per second (TPS), which is the expected high performance of HOOMD-blue on GPUs. Figure 9 shows a detailed profiled timeline of the execution of a single time step. During the 1.8 ms shown, HOOMD-blue spends the most computational effort on the calculation of pairwise forces. It can be noted that HOOMD-blue is designed to have almost no GPU idle time during a time step. As soon as PySAGES is added to the computation, an additional part is added to calculate the CV and apply the biasing forces to every particle. This causes a small period of GPU idle time, since the execution also requires action of the Python runtime interfacing with JAX. In the future, we plan to launch the calculation of the CV asynchronously with the regular force calculation, which would hide this small CPU-intensive idle time. However, we measure that the total delay due to the extra computation is only about 247 μs. We regard this as an acceptable overhead for the user-friendly definition of CVs. To connect multiple points in CV space, we can use enhanced sampling methods such as umbrella sampling (see section 2.1.2) or the improved string method (see section 2.1.3) to calculate the free energy landscape. Common to these advanced sampling methods is that multiple replicas of the system are simulated. With PySAGES, we can easily parallelize their execution using the Python module mpi4py and its MPIPoolExecutor. This enables us to execute replicas of the simulation on multiple GPUs, even when they span different host machines. In our example, we used 14 replicas for umbrella integration with 7 Nvidia GPUs. Using a single GPU, executing the simulations with 5 × 10^5 time steps for all replicas takes 2 hours and 59 minutes. Ideal scaling with 7 GPUs would reduce the time to solution to about 26 minutes. With our MPI-parallel implementation, we achieve a time to solution of 28 minutes. Synchronization overhead and non-parallel aspects such as the final analysis sum up to 2 minutes, or about 9% overhead. This multi-GPU implementation via MPI automatically enables efficient enhanced sampling in high-performance computing (HPC) environments. For enhanced sampling methods that are designed for single-replica simulations, we offer an implementation that allows multiple replicas to run in parallel, known as embarrassingly parallel computing. In this situation, the built-in analysis averages the results from the multiple replicas and estimates uncertainties.
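The throughput and scaling figures quoted above can be sanity-checked with a little arithmetic (the biased TPS below is an estimate derived from the quoted 754 TPS and 247 μs overhead, not a measured value):

```python
# Effect of the ~247 us/step biasing overhead on HOOMD-blue throughput
base_tps = 754.0               # unbiased time steps per second (quoted above)
overhead_s = 247e-6            # extra seconds per step from PySAGES biasing
biased_tps = 1.0 / (1.0 / base_tps + overhead_s)  # estimated biased throughput

# Multi-GPU umbrella-integration scaling: 14 replicas on 7 GPUs
single_gpu_min = 2 * 60 + 59   # 2 h 59 min on one GPU
ideal_min = single_gpu_min / 7 # perfect 7-GPU scaling, ~26 min
actual_min = 28.0              # measured time to solution
overhead_pct = 100.0 * (actual_min - ideal_min) / actual_min  # ~9%
```

The estimated biased throughput of roughly 636 TPS (a ~16% slowdown) and the ~9% parallel overhead are consistent with the numbers reported in the text.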
In the previous section, we demonstrated the fast interoperability between PySAGES and HOOMD-blue. However, the concept of PySAGES is to develop enhanced sampling methods independently of the simulation backend, so here we demonstrate that similar performance can be achieved with OpenMM. Since OpenMM focuses on all-atom simulations, we simulate an all-atom model of a polymer (specified in BigSMILES notation) with an all-atom force field [ , ], including long-range Coulomb forces via particle-mesh Ewald (PME). We simulate a bulk system of 40-mers with 31 macromolecules present, adding up to 40 981 atoms. As a proof of concept, we calculated the center of mass of every polymer chain and biased it harmonically via PySAGES. As a performance metric, we evaluate the nanoseconds per day (ns/day) executed on the same hardware configuration as the HOOMD-blue example above. For the unbiased, pure OpenMM simulation, we achieve a performance of ≈ 136 ns/day. For the PySAGES-biased simulation, we achieve ≈ 75 ns/day, equating to a biasing overhead of approximately 50%. Figure 10 shows a similar timeline analysis as for HOOMD-blue. It is notable that OpenMM's execution model makes more use of parallel execution of independent kernels, which also changes the order of execution compared to HOOMD-blue. As a result, the same synchronization alters the execution more drastically than in HOOMD-blue. Additionally, a single time step for this system executes faster than in the HOOMD-blue example, making the synchronization overhead more noticeable. In this case, asynchronous execution of PySAGES and OpenMM is projected to have a bigger performance advantage. Furthermore, we notice that the calculation of the center of mass and the biasing of all 31 polymer chains is more costly than the single CV in the previous example. The combination of these factors explains the higher PySAGES overhead for this OpenMM simulation, but overall performance is good and significantly better than alternative implementations that require calculations on the CPU.

Conclusion
We have introduced PySAGES, a library for enhanced sampling in molecular dynamics simulations, which allows users to utilize a variety of enhanced sampling methods and collective variables, as well as to implement new ones via a simple Python- and JAX-based interface. We showed how PySAGES can be used through a number of example applications in different fields, such as drug design, materials engineering, polymer physics, and ab initio MD simulations. We hope these convey to the reader the flexibility and potential of the library for addressing a diverse set of problems in a high-performance manner.
As our analysis showcased, for large problems PySAGES can perform biased simulations well over an order of magnitude faster than a library such as SSAGES, even when the backend already performs its computations on a GPU. Nevertheless, as with any newly developed software, PySAGES is still under development, and we are continually working to improve it. In the near term, we plan to add the ability for users to perform restarts, which will provide greater flexibility for running long simulations. Moreover, we plan to optimize PySAGES-side computations to run fully asynchronously with the backend's force computations, which will further enhance its performance. We also invite the community to contribute to the development of PySAGES, whether by suggesting new features, reporting bugs, or contributing code.
Overall, we believe that PySAGES provides a useful tool for researchers interested in performing molecular and ab initio simulations in multiple fields, due to its user-friendly framework for defining and using sampling methods and collective variables, as well as its high performance on GPU devices.
Looking further ahead, we are excited about the potential for PySAGES to enable fully end-to-end differentiable free energy calculations. This would open new possibilities for force-field and materials design, driving significant advances in these areas.

Figure 2:
Figure 2: Example of how to use the Python interface of PySAGES. It is easy to extend existing scripts with PySAGES to perform enhanced-sampling MD with minimal changes to the code. In general, the only requirement is for the user to wrap the code that defines the simulation system into a simulation-generator function.

Figure 3:
Figure 3: Example of how to write a CV in PySAGES. On the left is a CV written for SSAGES, and on the right the PySAGES version. In general, the only requirement is for the user to write the CV as a differentiable function in JAX.
We use a dissipative particle dynamics (DPD) model published in previous studies [ , , ]. The model consists of 200 chains with 256 beads each, representing a liquid polymer melt. The first 16 beads in each chain are type A, while the remaining 240 are type B.

Figure 6:
Figure 6: Free energy surface of 5CB in a hybrid anchoring slab with SDS and water. Right: snapshot of the system, with water molecules in red, 5CB in purple, SDS in green, and sodium ions in yellow. Top left: free energy surface of a 5CB molecule near the water-SDS interface. Bottom left: free energy surface of 5CB near a pure-water interface. Both were obtained with PySAGES and OpenMM.

Figure 7:
Figure 7: Free energy (T = 300 K) and potential energy calculations along the Na-Cl distance, obtained with an ab initio calculator through ASE using Spectral ABF in PySAGES.

Figure 8:
Figure 8: Free energy calculations of: a) the water internal angle from a pre-trained DeepMD model, b) the Si-Si-H angle from an NN potential, and c) the Si-Si distance from a GNN model.

Figure 9:
Figure 9: A 1.8 ms section of a profiled timeline recorded with Nvidia Nsight Systems on an Nvidia GPU. The top row shows a vanilla HOOMD-blue simulation step, while the bottom row shows a PySAGES/HOOMD-blue simulation with harmonic biasing of a center-of-mass CV. Light blue represents CPU activity, while dark blue represents individual GPU compute kernels. The maroon letters mark the same compute steps in both simulations: a) first half-step of the integration, b) computation of bond forces, c) pair forces, d) calculation of the CV, e) addition of the harmonic biasing force to the HOOMD-blue simulation, and f) the second integration half-step. Sections d) and e) are PySAGES-only and are executed on the GPU. We observe idle time during the PySAGES Python coordination with HOOMD-blue (green bar), but note that there are no memory copies, even within GPU memory. The additional time for biasing per time step is 247 μs (teal bar).
Figure 10: A 1.6 ms profiled timeline of an OpenMM simulation of 40 981 particles as polymers, with particle-mesh Ewald (PME) summation for long-range Coulomb forces. The colors and labels are identical to Figure 9. OpenMM uses asynchronous kernel execution, which leads to a less linearly ordered timeline compared with HOOMD-blue, but we can still identify the CV calculation d), the force biasing e), and the synchronization idle time of the GPU (green). Overall, the performance degradation is more pronounced with OpenMM than with HOOMD-blue.

Figure B.12:
Figure B.12: Free energy profile along the dihedral angle of a butane molecule (using the force field of Ref. [ ]) obtained via different enhanced sampling methods with PySAGES and HOOMD-blue, including Spectral ABF. The legend also indicates the length of each simulation; the long simulations represent the ground truth.
(5), in which r_i denotes the coordinate of atom i. As written above, the CV ξ(r) is non-differentiable. Therefore, as in other works [ ], we utilize the kernel-density trick with a Gaussian kernel to modify ξ(r).
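Since the exact CV is not reproduced in this excerpt, a generic example illustrates the idea: a hard count of atoms inside a slab is non-differentiable, but convolving each step indicator with a Gaussian kernel yields a smooth, differentiable surrogate. The slab bounds, coordinates, and kernel width below are invented for illustration.

```python
import math
import numpy as np

def hard_count(x, lo, hi):
    """Non-differentiable CV: number of coordinates inside the slab [lo, hi]."""
    return float(np.sum((x >= lo) & (x <= hi)))

def soft_count(x, lo, hi, sigma=0.05):
    """Differentiable surrogate: each step indicator is convolved with a
    Gaussian kernel of width sigma, giving a smooth error-function window.
    As sigma -> 0, this recovers the hard count."""
    s = sigma * math.sqrt(2.0)
    return sum(0.5 * (math.erf((hi - xi) / s) - math.erf((lo - xi) / s))
               for xi in x)

x = np.array([0.1, 0.45, 0.52, 0.9, 1.4])
hard = hard_count(x, 0.0, 1.0)  # exactly 4 atoms inside, but a step function of x
soft = soft_count(x, 0.0, 1.0)  # close to 4, yet smooth in every coordinate
```

Because the soft count varies smoothly with each coordinate, its gradient is well defined everywhere and can be used to compute biasing forces, which is not possible for the hard count.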