LipIDens: simulation assisted interpretation of lipid densities in cryo-EM structures of membrane proteins

Cryo-electron microscopy (cryo-EM) enables the determination of membrane protein structures in native-like environments. Characterising how membrane proteins interact with the surrounding membrane lipid environment is assisted by resolution of lipid-like densities visible in cryo-EM maps. Nevertheless, establishing the molecular identity of putative lipid and/or detergent densities remains challenging. Here we present LipIDens, a pipeline for molecular dynamics (MD) simulation-assisted interpretation of lipid and lipid-like densities in cryo-EM structures. The pipeline integrates the implementation and analysis of multi-scale MD simulations for identification, ranking and refinement of lipid binding poses which superpose onto cryo-EM map densities. Thus, LipIDens enables direct integration of experimental and computational structural approaches to facilitate the interpretation of lipid-like cryo-EM densities and to reveal the molecular identities of protein-lipid interactions within a bilayer environment. We demonstrate this by application of our open-source LipIDens code to ten diverse membrane protein structures which exhibit lipid-like densities.


Supplementary Information:
Best practices for users and pipeline limitations:
• When running the LipIDens pipeline using the master Python file ('lipidens_master_run.py'), default variables and parameters are provided. To accept these parameters, press the ENTER/RETURN key. In addition, all steps of the LipIDens pipeline are described in detail in the accompanying LipIDens protocol.
• For smooth running, we recommend that users are familiar with setting up and performing MD simulations using GROMACS.
• We recommend using the lipidens_master_run.py file for extended running of the pipeline. However, the Jupyter notebook is useful for tutorial purposes.
• LipIDens will provide the most likely identity of lipids bound to a site under a given lipid composition. The pipeline cannot be used to definitively identify lipid-like densities due to the plethora (>1000s) of lipids present within cellular membranes. Hence, users should select bilayer conditions which provide a suitable minimal mimetic of the native membrane or experimental conditions.
• Cellular membranes contain thousands of lipid species, only a subset of which have accompanying CG parameter files. Hence, users should check the availability of specific lipid parameters for a particular forcefield. In particular, it may be necessary to approximate tail lengths and saturation to the nearest available lipid.
• The LipIDens pipeline will print warnings when appropriate and terminate the protocol when GROMACS errors occur during simulation setup. The output of GROMACS commands is provided within the 'output_files' directory within each simulation replicate. If errors occur, please review these files and fix the underlying problem (e.g. missing atoms in the .pdb file) before rerunning the code.
• We recommend a CG simulation time of at least 10 μs and running multiple repeat simulations (at least 8). If convergence of lipid interactions is not reached during the simulation timeframe, then simulations should be extended. The 'Screening PyLipID section' of the accompanying protocol details best practices for assessing convergence of kinetic parameters.
• It is good practice to test PyLipID cut-offs for at least one phospholipid and any other lipids/sterols with significantly different molecular structures, e.g. cholesterol or cardiolipin.
• When running the analysis stages of the pipeline, the 'stride' variable can be used to skip X number of frames. This is useful for dealing with slow run times or computational memory errors during PyLipID analysis.
• LipIDens will provide warnings when binding sites are assigned multiple times to the site comparison dictionary (BindingSite_ID_dict). Users should check these sites carefully and remove any poorly defined sites and/or additional site occurrences.
• While LipIDens can facilitate modelling into structures with lipid-like densities, it cannot predict the functional consequence of bound lipids. Hence, the biological relevance of bound lipids should be assessed by accompanying biochemical and/or spectral analyses.
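The effect of the 'stride' variable can be illustrated with a minimal sketch, assuming that a stride of X keeps every X-th frame of a trajectory. The function name and the frame count below are hypothetical and are not part of the LipIDens or PyLipID code.

```python
def strided_frames(n_frames, stride):
    """Return the indices of the frames kept when reading every `stride`-th frame.

    Hypothetical helper for illustration only; it is not a LipIDens function.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    return list(range(0, n_frames, stride))


# Hypothetical trajectory of 15,000 frames analysed with a stride of 10.
kept = strided_frames(15000, 10)
print(len(kept))  # number of frames that would be loaded for analysis
```

A larger stride reduces memory use and run time in proportion to the number of frames dropped, at the cost of coarser time resolution for the calculated interaction durations.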

Supplementary Figures:
Supplementary Figure 1:
Tuning PyLipID cut-off values: interactions of HHAT with PIP2. Plotted outputs from PyLipID cut-off testing. a-d) Minimum distances between HHAT residues a) K91, b) R131, c) P3 and d) T408 and a PIP2 molecule across one 15 μs CG simulation. The minimum distance was calculated between any bead of the residue and any bead of the lipid. For clarity, only those interactions which came within 0.65 nm (distance_threshold) for at least 30 frames (contact_frames) of the simulation are plotted. e-g) Exhaustive testing of a range of lower and upper cut-off combinations for HHAT-PIP2 interactions (n=10 x 15 μs independent CG simulations). Plots show the effect of the selected cut-offs on e) interaction duration times, f) the number of calculated binding sites and g) the number of interacting residues.
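The role of the lower and upper cut-offs tested in this figure can be sketched as follows. The sketch assumes a dual cut-off scheme in which a contact begins when the residue-lipid minimum distance falls below the lower cut-off and ends only when it rises above the upper cut-off. It also assumes that the plotting filter counts the total number of frames within distance_threshold, not consecutive frames. The function names, the distance trace and the cut-off values are hypothetical and do not reproduce the PyLipID implementation.

```python
def contact_durations(distances, lower, upper, dt=1.0):
    """Durations of contacts in a distance time series under a dual cut-off.

    A contact starts when the distance drops below `lower` and ends when the
    distance exceeds `upper`. `dt` is the time between frames.
    """
    durations = []
    in_contact = False
    start = 0
    for i, d in enumerate(distances):
        if not in_contact and d < lower:
            in_contact = True
            start = i
        elif in_contact and d > upper:
            durations.append((i - start) * dt)
            in_contact = False
    if in_contact:
        # Contact still present at the end of the trajectory.
        durations.append((len(distances) - start) * dt)
    return durations


def passes_filter(distances, distance_threshold=0.65, contact_frames=30):
    """Assumed reading of the plotting filter: at least `contact_frames`
    frames (in total) with a distance below `distance_threshold` (nm)."""
    return sum(d < distance_threshold for d in distances) >= contact_frames


# Hypothetical minimum distances (nm), one value per frame.
trace = [1.0, 0.45, 0.6, 0.45, 0.9, 0.45, 0.5, 1.2]
dual = contact_durations(trace, lower=0.475, upper=0.7)
single = contact_durations(trace, lower=0.475, upper=0.475)
print(dual, single)
```

With the dual cut-off, brief excursions between the two thresholds do not terminate a contact. With a single cut-off, the same trace is split into several shorter contacts, which is one reason why the choice of cut-offs affects the calculated interaction durations.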

Supplementary Figure 2:
Comparison of lipid fluctuation with the cryo-EM density. The per-atom root mean square fluctuation (RMSF) of a POPE lipid bound to HHAT (boxed) across n=5 x 200 ns independent atomistic simulations. POPE atom spheres are scaled by RMSF value and coloured from low (white) to high (red). The per-atom Q score¹ was used to assess how well the simulation-derived lipid pose matched the cryo-EM density.

Supplementary Figure 3:
Cardiolipin binding to ELIC. a) Cardiolipin (CDL) binding sites (BS) ranked from worst to best Δkoff (Δkoff = koff from curve fitting − bootstrapped koff) or lowest to highest residence time (n=10 x 15 μs independent CG simulations). The CDL binding site with the longest residence time, BS1, is arrowed. b) Top-ranked CG binding pose for CDL at BS1. c) Snapshots of the CDL binding pose at the end of n=3 x 200 ns independent atomistic simulations initiated using the CG CDL binding pose in b.
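The two rankings described for the cardiolipin binding sites can be sketched as below. The sketch makes two assumptions that are not stated in the caption: that the residence time is the reciprocal of the fitted koff, and that "worst" Δkoff means the largest absolute difference between the fitted and bootstrapped values. The site labels and koff values are hypothetical and are not taken from the simulations.

```python
def rank_sites(sites):
    """Rank binding sites by delta-koff and by residence time.

    `sites` maps a site label to (koff_fit, koff_boot), both in 1/us.
    Returns two lists of labels:
      - worst to best agreement between fit and bootstrap (largest to
        smallest absolute delta-koff; assumed meaning of "worst to best"),
      - lowest to highest residence time (assumed to be 1 / koff_fit).
    """
    delta = {s: kf - kb for s, (kf, kb) in sites.items()}
    residence = {s: 1.0 / kf for s, (kf, _) in sites.items()}
    by_delta = sorted(sites, key=lambda s: abs(delta[s]), reverse=True)
    by_residence = sorted(sites, key=lambda s: residence[s])
    return by_delta, by_residence


# Hypothetical sites with (fitted koff, bootstrapped koff) in 1/us.
example = {"A": (2.0, 2.5), "B": (0.5, 0.5), "C": (4.0, 1.0)}
by_delta, by_residence = rank_sites(example)
print(by_delta, by_residence)
```

A site with a long residence time and a small Δkoff is a stronger candidate than one where the fitted and bootstrapped koff values disagree, since a large Δkoff suggests that the kinetics at that site are poorly sampled.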

Supplementary Figure 7:
Relative contribution of distinct lipids to binding sites on ChRmine. a) Position of four acyl-tails (grey sticks, numbered i-iv) modelled into lipid-like densities (grey surface) within the extracellular (EC) leaflet of the ChRmine structure (PDBid: 7SFK)³. b) Overlay of lipid binding poses for DOPE, POPE and POPS (sticks) to BS1 (blue) and BS2 (red) with partitioned site densities (mesh). c) Comparison of top-ranked lipid binding poses at BS1 with density-i showing co-localisation of a single lipid tail while the second tail faces the surrounding membrane. Poses correspond to those directly backmapped from n=10 x 15 μs independent CG simulations (without refinement using atomistic simulations). d) Relative residence times for lipids bound to BS1, where the koff was derived from bi-exponential curve fitting of the interaction survival function. Asymmetric error bars indicate a second koff value obtained via bootstrapping of the same data. e) Overlay of DOPE and POPS poses with densities-ii to -iv at BS2. f) Relative residence time plot for BS2 (plotted as in d). g) Suggested lipid modelling based upon analysis of top-ranked lipid binding poses, relative residence time plots and density connectivity. We suggest that the most likely identities of density-i and density-iv are a single tail of POPS and DOPE, respectively, while densities-ii/iii correspond to a single POPS lipid.
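One way to construct relative residence times with asymmetric error bars of the kind described in panels d and f is sketched below. The normalisation is an assumption: each residence time (taken as 1/koff) is divided by the longest fitted residence time among the lipids at the site. The error bar is assumed to extend from the fitted value to the bootstrapped value, so one side is always zero. Lipid labels and koff values are hypothetical.

```python
def relative_residence(koffs):
    """Relative residence times with one-sided (asymmetric) error bars.

    `koffs` maps a lipid label to (koff_fit, koff_boot) in 1/us.
    Returns label -> (relative_fit, lower_err, upper_err), where the error
    bar reaches from the fitted value to the bootstrapped value.
    """
    res = {lip: (1.0 / kf, 1.0 / kb) for lip, (kf, kb) in koffs.items()}
    ref = max(fit for fit, _ in res.values())  # longest fitted residence time
    out = {}
    for lip, (fit, boot) in res.items():
        rel_fit, rel_boot = fit / ref, boot / ref
        lower = max(rel_fit - rel_boot, 0.0)  # bootstrap below the fit
        upper = max(rel_boot - rel_fit, 0.0)  # bootstrap above the fit
        out[lip] = (rel_fit, lower, upper)
    return out


# Hypothetical lipids with (fitted koff, bootstrapped koff) in 1/us.
example = {"lipid_A": (0.5, 0.25), "lipid_B": (1.0, 2.0), "lipid_C": (2.0, 2.0)}
for lipid, values in relative_residence(example).items():
    print(lipid, values)
```

The lower and upper values can be passed as a two-row array to a plotting routine that accepts separate lower and upper errors, such as the `yerr` argument of matplotlib's `errorbar`.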