Background & Summary

The passive permeation of small organic, drug-like molecules across phospholipid membranes has garnered much interest, not only to practically optimize pharmaceutical properties, but also as a more fundamental physical-chemistry problem1. The latter acts as a testbed to understand the molecular driving forces at play during a permeation process across a soft interface. A more robust understanding of the structure-property relationship can be obtained by screening across chemistries and systematically measuring the permeability coefficient from in vitro experiments2,3. Given the small size and apparent bias of databases of experimental compounds4, interest in harnessing computational methods at high throughput has been on the rise5,6,7,8,9.

Permeation is described using the inhomogeneous solubility-diffusion model to yield a diffusion process in terms of a one-dimensional Smoluchowski equation along z—the normal to the membrane midplane. The resulting permeability coefficient, P, takes the form

$${P}^{-1}=\int dz\,\frac{\exp[\beta G(z)]}{D(z)}, \tag{1}$$

where \(\beta =1/({k}_{B}T)\) is the inverse temperature, G(z) is the potential of mean force (PMF), and D(z) is the local diffusivity. As such, knowledge of G(z) and D(z) enables an in silico estimation of P, which may be obtained from molecular dynamics (MD) simulations. While the direct estimation of these quantities from brute-force MD typically fails, enhanced-sampling methods have provided a robust strategy to estimate both G(z) and D(z). Equation 1 depends exponentially on G(z), but only linearly on D(z), making the latter quantity less critical; it was also found to depend rather weakly on the chemistry of the drug10. A large number of enhanced-sampling studies have demonstrated the capability to not only converge the PMF, but also to provide permeability coefficients that exhibit high correlation with experimental measurements10,11,12,13,14,15,16. An illustrative example of the PMF is shown in Fig. 1a, together with a cartoon of a phospholipid membrane in the background.
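As a minimal sketch of how Eq. 1 may be evaluated in practice, the following Python function integrates the resistance \(\exp[\beta G(z)]/D(z)\) with a trapezoidal rule. The function name, the choice of units (kcal/mol for G, nm²/ps for D), and the toy Gaussian barrier are illustrative assumptions and are not taken from the dataset.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def permeability(z, pmf, diffusivity, temperature=300.0):
    """Evaluate Eq. 1 numerically with the trapezoidal rule.

    z           : depths along the membrane normal (nm), increasing
    pmf         : G(z) in kcal/mol, referenced to bulk water (G = 0)
    diffusivity : D(z) in nm^2/ps
    Returns the permeability coefficient P in nm/ps.
    """
    z = np.asarray(z, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    diffusivity = np.asarray(diffusivity, dtype=float)
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    integrand = np.exp(beta * pmf) / diffusivity
    # Trapezoidal rule written out explicitly (resistance = 1/P)
    resistance = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z))
    return 1.0 / resistance


# Toy input (assumption): Gaussian barrier of 3 kcal/mol at the midplane,
# constant diffusivity across the membrane.
z = np.linspace(-2.0, 2.0, 401)
toy_pmf = 3.0 * np.exp(-z**2 / 0.5)
toy_diffusivity = np.ones_like(z)
print(permeability(z, toy_pmf, toy_diffusivity))
```

Because the PMF enters through an exponential, a modest error in the barrier height changes P far more than a comparable relative error in D(z), which is the argument made above for focusing on G(z).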

Fig. 1

Drug-membrane computer simulation setup; screening over both phospholipids and solute molecules. (a) Background: Simulation setup of a solute (yellow) partitioning between water (not shown) and the lipid membrane. Foreground: Potential of mean force along the normal of the bilayer, G(z). (b) Lipid membrane: Cartoon representations of the six phospholipids, differing in the number of unsaturated groups. (c) Solute molecule: Combinatorics of all 105 CG Martini dimers. (d) The present dataset contains the trajectory of each MD simulation.

The in silico route is predictive and generalizable in that it does not rely on adjustable parameters: the main input of an MD simulation is the force field, often parametrized on properties unrelated to interactions with a phospholipid membrane17,18. Critically, this limits the danger of overfitting observed in statistical models19. The main downside of using MD simulations is the computational investment: atomistic simulations with explicit solvent typically require 10⁵ CPU-hours for a small molecule in a single-component lipid membrane10,13,14,20, hindering the prospects of running them at high throughput.

We have recently proposed the use of coarse-grained (CG) models to tackle this problem. Coarse-graining can enable a more efficient sampling of the conformational space by lumping together atoms into super-particles or beads21,22. In particular, we relied on the CG Martini model23,24,25, which is specifically tailored to reproduce the partitioning behavior of compounds in different environments, making it particularly well suited for permeability calculations. The modularity of Martini means that it constructs molecules based on a small set of bead types, each one encoding different chemical properties: mainly hydrophobicity, hydrogen-bonding, and charge (see Table 1 for details). We reported the systematic calculation of PMFs using umbrella sampling for all CG compounds made of one and two neutral beads (hereafter denoted unimers and dimers)26. This amounted to 14 unimers and 14 × 15/2 = 105 dimers (Fig. 1c). The thermodynamic parametrization of Martini yields accurate PMFs, as compared to reference curves from atomistic simulations, as well as remarkably accurate permeability coefficients, as compared to reference simulations and experiments27. Because of the transferable nature of Martini, the CG model significantly reduces the size of chemical space, such that these 119 computer simulations offer estimates for more than 500,000 small molecules26,27. We more recently extended our approach to linear trimers and tetramers, demonstrating that the screening range can be significantly increased28. As such, Martini offers a robust methodology to run high-throughput computer simulations of drug-membrane permeability.
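The counting of unimers and dimers can be checked with a short enumeration. The sketch below assumes that the 14 neutral bead types carry the standard Martini labels C1 to C5, N0, Na, Nd, Nda, and P1 to P5; this list is an assumption here and should be compared against Table 1. A dimer is treated as an unordered pair of bead types, with identical beads allowed.

```python
from itertools import combinations_with_replacement

# Assumed labels of the 14 neutral Martini bead types (compare with Table 1)
NEUTRAL_BEADS = [
    "C1", "C2", "C3", "C4", "C5",
    "N0", "Na", "Nd", "Nda",
    "P1", "P2", "P3", "P4", "P5",
]

unimers = list(NEUTRAL_BEADS)
# Unordered pairs with repetition: 14 * 15 / 2 combinations
dimers = list(combinations_with_replacement(NEUTRAL_BEADS, 2))

print(len(unimers), len(dimers), len(unimers) + len(dimers))  # 14 105 119
```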

Table 1 Characteristics of non-charged Martini bead types.

The present database reports the full umbrella-sampling MD trajectories necessary to run PMF calculations for all Martini dimers in six different single-component phospholipid membranes. The 105 dimers inserted in six membranes amount to 630 drug-membrane combinations (Fig. 1). Given that each PMF calculation relied on 24 umbrella sampling simulations, the present database contains 15,120 MD trajectories. The diversity of compound and lipid chemistries can offer unprecedented insight into the underlying thermodynamics26,29. Below we present an example use of the present database by displaying the tilt angle of each compound across membrane-insertion depth and compound chemistry. We believe that the raw MD trajectories provided for this breadth of chemistries will provide further insight into the structure-property relationships governing drug-membrane permeability.

Methods

We follow previously established simulation protocols that are described in detail elsewhere26. In brief, we built symmetric, single-lipid bilayer membranes that contain 64 lipids per leaflet using the Insane script30. Table 2 lists the composition of the various membrane systems, which differ in the number of water beads. As is common practice, we replaced at least 10% of the non-polarizable Martini waters by anti-freeze beads. Simulations were performed in Gromacs 4.6.631 using the Martini force field with standard input parameters32. We ran simulations in the NPT ensemble at 300 K and 1 bar controlled by means of a stochastic velocity-rescaling thermostat33 and a Parrinello-Rahman barostat34, respectively. We performed umbrella sampling along the bilayer normal (z-axis) in a range from 0.0 to 4.1 nm at a step size of 0.1 nm by generating 24 windows in which the solute is centered via a harmonic biasing potential (k = 240 kcal/mol/nm²). For computational efficiency, each simulation box contained two solute compounds placed in different membrane leaflets. Each window included a sequence of minimization, heat-up, and equilibration runs prior to the production run, the latter being simulated for 1.2 · 10⁵ τ using a time step of δt = 0.02 τ, where τ (1 ps) refers to the model’s natural unit of time. PMF profiles were then reconstructed by means of the weighted histogram analysis method (WHAM)35,36, with error bars estimated from 100 bootstraps.
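The umbrella-sampling parameters quoted above translate into a few derived quantities that are convenient when reusing the data. The following sketch is illustrative only: it assumes the harmonic bias takes the form ½ k (z − z_ref)², which is the usual convention but is not spelled out in the text, and all variable names are hypothetical.

```python
import math

KCAL_TO_KJ = 4.184                 # thermochemical calorie
KB_KCAL_PER_MOL_K = 0.0019872041   # Boltzmann constant in kcal/(mol K)

K_BIAS = 240.0            # force constant, kcal/mol/nm^2 (from the text)
TEMPERATURE = 300.0       # K
PRODUCTION_LENGTH = 1.2e5 # in units of tau
TIME_STEP = 0.02          # in units of tau


def bias_energy(z, z_ref, k=K_BIAS):
    """Harmonic bias in kcal/mol, assuming the 1/2 k (z - z_ref)^2 form."""
    return 0.5 * k * (z - z_ref) ** 2


# Force constant in the kJ/mol/nm^2 units used by Gromacs input files
k_bias_kj = K_BIAS * KCAL_TO_KJ

# Number of integration steps per production run
n_steps = round(PRODUCTION_LENGTH / TIME_STEP)

# Thermal width of a window on a locally flat PMF: sigma = sqrt(kB T / k)
sigma = math.sqrt(KB_KCAL_PER_MOL_K * TEMPERATURE / K_BIAS)

print(k_bias_kj, n_steps, sigma, bias_energy(0.1, 0.0))
```

Under these assumptions the thermal width of a window is roughly 0.05 nm, i.e., about half the 0.1 nm spacing between neighboring windows, which is consistent with adjacent histograms overlapping for the WHAM reconstruction.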

Table 2 Composition of single-lipid bilayer membranes.

Data Records

We provide datasets for MD trajectories of solute-membrane systems at a CG resolution for 105 solutes inserted in six different phospholipid bilayers37. Each dataset is denoted by the abbreviated name of the lipid and deposited as a single archive file, e.g., DPPC.tar.bz2. Within a dataset, there are 105 folders containing the trajectories and PMF profile of a particular solute, following the naming convention DIM_bead1-bead2, where bead1 and bead2 denote the relevant bead types following the standard Martini notation (see Table 1). For improved sampling we have systematically placed two solutes in each simulation box, always separated by a normal distance (i.e., only along z) of 4.1 nm. The trajectories obtained from umbrella sampling (US) are stored in sub-folders denoted us-x, where x takes values 0.0, 0.1, …, 2.4, corresponding to the reference depth in the bilayer of one of the two solutes. For instance, the folder us-2.4 contains two solutes restrained around z1 = 2.4 nm and z2 = −1.7 nm. The US sub-folders contain all necessary input files to repeat the production runs as well as the respective output files, including trajectories and observables. The sub-folder pmf contains the input files to perform WHAM reweighting in Gromacs and the output PMF profiles. Table 3 lists all files included in the sub-folders us-x and pmf together with a brief description of their purpose. In Fig. 2 we report typical PMF profiles obtained for hydrophobic (Fig. 2a), amphiphilic (Fig. 2b), and polar solute compounds (Fig. 2c) across all considered lipid environments.
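The naming convention lends itself to programmatic traversal. The sketch below builds the relative path of an umbrella window and the nominal depth of the second solute from the 4.1 nm separation stated above. It assumes that each archive unpacks into a top-level folder named after the lipid and that the bead order in the folder name matches the order passed in; neither assumption is confirmed by the text, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath

SOLUTE_SEPARATION_NM = 4.1  # normal distance between the two solutes


def window_dirs(lipid, bead1, bead2, depths):
    """Yield (relative path, z1, z2) for each umbrella window.

    lipid  : abbreviated lipid name, e.g. "DPPC" (assumed top-level folder)
    bead1, bead2 : Martini bead types of the dimer
    depths : reference depths z1 (nm) of the first solute
    """
    root = PurePosixPath(lipid) / f"DIM_{bead1}-{bead2}"
    for z1 in depths:
        z2 = round(z1 - SOLUTE_SEPARATION_NM, 1)
        yield root / f"us-{z1:.1f}", z1, z2


for path, z1, z2 in window_dirs("DPPC", "C1", "P3", [0.0, 0.1, 2.4]):
    print(path, z1, z2)
```

For the window us-2.4 this reproduces the pair z1 = 2.4 nm and z2 = −1.7 nm given in the text.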

Table 3 Supplied files and their purpose.
Fig. 2

Examples of PMF profiles in different lipid environments. Three representative solutes are shown: (a) hydrophobic (C1-C1); (b) amphiphilic (C1-P3); and (c) polar (P1-P1). See labels for the different lipid types.

Our records indicate a computational investment to run heat-up, equilibration, and production simulations of roughly 0.5, 1, and 8 CPU-hours per umbrella, respectively. Summing up over all 15,120 MD simulations supplied, this amounts to roughly 150,000 CPU-hours to generate the present dataset.
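The quoted total can be reproduced from the per-window costs. The short calculation below uses only numbers stated in the text; it assumes that the cost of the minimization step is negligible, since no figure is given for it.

```python
n_solutes = 105
n_lipids = 6
n_windows = 24

# CPU-hours per umbrella window: heat-up + equilibration + production
cpu_hours_per_window = 0.5 + 1.0 + 8.0

n_trajectories = n_solutes * n_lipids * n_windows
total_cpu_hours = n_trajectories * cpu_hours_per_window

print(n_trajectories, total_cpu_hours)  # 15120 143640.0
```

The result of about 1.4 × 10⁵ CPU-hours is in line with the rounded figure of 150,000 CPU-hours.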

Technical Validation

A number of studies using the same simulation protocol have demonstrated the thermodynamic validity and accuracy of CG Martini simulations. Bereau and Kremer found a mean absolute error between experimental and Martini transfer free energies between water and octanol of 0.79 kcal/mol across 653 neutral small organic molecules—an excellent result given the minimalism of the model38. By further invoking relations between bulk transfer free energies, used as proxy for various environments of the membrane, we deduced a mean absolute error on features of our CG PMFs of approximately 1.4 kcal/mol26. This remarkable agreement had been earlier probed specifically for amino acids39. On a structural level, we showed that backmapping CG snapshots and running short atomistic MD simulations offered a significant speedup in convergence of the atomistic PMF calculations, suggesting that the conformational ensemble of the CG model adequately matches its atomistic counterpart40. Moreover, as the permeability coefficient depends exponentially on the PMF, reliable estimates of the latter prove necessary13. We showed that the accuracy of the PMFs obtained through Martini translated into excellent predictability for the permeability coefficient—roughly 1 log unit27,29.

Usage Notes

The 15,120 MD trajectories in this dataset provide a rich amount of information. As an illustration, we focus on the orientation of the solute with respect to the normal of the membrane bilayer. We define a tilt angle θ between the bond vector of the solute and the normal vector of the membrane, oriented so as to point from the bilayer midplane to the membrane surface. Figure 3 displays the average tilt angle as a function of the depth z in a DPPC bilayer across all 105 solute compounds. The depicted angles are normalized by sin θ to account for the Jacobian of the transformation to spherical coordinates.
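A tilt-angle analysis of this kind can be sketched in a few lines of NumPy. The code below is not the analysis script used for Fig. 3; it is a minimal illustration that assumes bead coordinates have already been extracted from a trajectory as arrays of shape (n_frames, 3), that the bond does not cross a periodic boundary, and that the bond vector points from the first to the second bead. The function names and the binning are hypothetical.

```python
import numpy as np


def tilt_angles(bead1_xyz, bead2_xyz, z_solute, midplane_z=0.0):
    """Tilt angle (degrees) between the bond vector and the outward normal.

    The membrane normal is taken along z and oriented from the bilayer
    midplane towards the membrane surface, so its sign depends on the
    leaflet in which the solute sits.
    """
    bond = np.asarray(bead2_xyz, dtype=float) - np.asarray(bead1_xyz, dtype=float)
    outward = np.where(np.asarray(z_solute, dtype=float) >= midplane_z, 1.0, -1.0)
    cos_theta = outward * bond[:, 2] / np.linalg.norm(bond, axis=1)
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def orientation_density(theta_deg, bins=18):
    """Histogram of tilt angles divided by sin(theta), then normalized.

    Dividing by sin(theta) removes the Jacobian of spherical coordinates,
    so that an isotropic distribution appears flat.
    """
    counts, edges = np.histogram(theta_deg, bins=bins, range=(0.0, 180.0))
    centers = 0.5 * (edges[1:] + edges[:-1])
    density = counts / np.sin(np.radians(centers))
    return centers, density / density.sum()


# Toy frames (assumption): bond along +z, along x, and along +z below the midplane
b1 = np.zeros((3, 3))
b2 = np.array([[0.0, 0.0, 0.4], [0.4, 0.0, 0.0], [0.0, 0.0, 0.4]])
print(tilt_angles(b1, b2, z_solute=[1.5, 1.5, -1.5]))
```

Averaging the resulting angles per umbrella window, with the window's reference depth as z, would give one column of a map such as the one described above.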

Fig. 3

Average tilt angle in a DPPC bilayer as a function of the z-distance across all 105 solute compounds.

We find that solutes composed of two beads of identical or similar polarity display no preferred orientation, such that θ ≈ 90°. On the other hand, features appear for compounds that show a difference in polarity between the two beads of the solute. These features are markedly present in the range of depths 0.9 < z < 2.2 nm, which spans the lipid tail region. These amphiphilic solutes, such as C1-P5, show a strong preference for small tilt angles θ < 45°, where the more hydrophobic bead is facing the membrane core. The lack of features below z ≈ 0.9 nm is likely due to the force field’s interaction cut-off. In addition, strongly amphiphilic solutes also show orientational order at the membrane/water interface (2.4 < z < 2.7 nm), but with a flipped bond vector (θ ≈ 130°), i.e., the polar site now faces the membrane.