Analyzing protein topology based on Laguerre tessellation of a pore-traversing water network

Given the tight relation between protein structure and function, we present a set of methods to analyze protein topology, implemented in the VLDP program, relying on Laguerre space partitions built from series of molecular dynamics snapshots. The Laguerre partition specifies inter-atomic contacts, which are formalized as graphs. The properties deduced are the existence and count of water aggregates, possible passageways and constrictions, and the structure, connectivity, stability and depth of the water network. As a test case, the membrane protein FepA is investigated in its full environment, yielding a more precise description of the protein surface. Inside FepA, the solvent splits into isolated clusters and an intricate network connecting both sides of the lipid bilayer. The network is dynamic: connections switch on and off, occasionally relocating traversing paths substantially. Subtle differences are detected between two forms of FepA, ligand-free and complexed with its natural iron carrier, enterobactin. The complexed form has more constricted and more centered openings in the upper part whereas, in the lower part, constriction is released: two main channels between the plug and barrel lead directly to the periplasm. The main strengths of the method are its reliability, its precision and the variety of topological features it provides.

Influence of the enterobactin-iron status on water inclusions. (A) The number N_inc of isolated water clusters inside FepA complexed with the enterobactin-iron ligand is plotted as a function of MD snapshot time. The enterobactin-iron complex is treated either as protein-like (blue dots) or as solvent (red dots). The blue dots represent the data, N_inc complexed, of Table S1. The interpolating lines are visual guides only. (B) This 3D view, extracted from the 80 ns snapshot, demonstrates the change in N_inc. The enterobactin-iron complex is shown as van der Waals spheres whereas the protein is in lightteal cartoon. Water molecules are displayed as Laguerre polyhedra. Each connected component has a different colour, randomly chosen, except blue, which represents the main connected component (as in Figure 2 of the main text). When the ligand is considered as being part of the solvent, it merges the green and yellow water clusters (encircled) with the main component, in blue, thereby decreasing N_inc by 2.

Figure S3. Path disconnection. Example of a string of water polyhedra that is disconnected with the standard weight w_ref (left) and connects into a path when water has a Laguerre weight 75% larger (right). The images are taken from the ligand-free snapshot at 60 ns.
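The dependence of the inclusion count on the ligand status can be reproduced on a toy contact graph. The following is a minimal sketch, not the VLDP implementation: the node names, the `count_inclusions` helper and the rule that an inclusion is any solvent component containing no bulk node are assumptions made for illustration only.

```python
from collections import defaultdict


def count_inclusions(contacts, solvent, bulk):
    """Count connected components of the solvent contact graph
    that contain no bulk node (assumed definition of an inclusion)."""
    # Keep only contacts between two solvent nodes.
    adj = defaultdict(set)
    for a, b in contacts:
        if a in solvent and b in solvent:
            adj[a].add(b)
            adj[b].add(a)

    seen = set()
    n_inc = 0
    for start in sorted(solvent):
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if not (component & bulk):
            n_inc += 1
    return n_inc


# Hypothetical contacts: "lig" is the ligand, "p1" a protein atom,
# "bulk" a water belonging to the main (bulk-connected) component.
contacts = [
    ("w1", "w2"), ("w2", "bulk"),
    ("w3", "lig"), ("w4", "lig"), ("lig", "w1"),
    ("w5", "p1"),
]
waters = {"w1", "w2", "w3", "w4", "w5", "bulk"}
bulk = {"bulk"}

n_protein_like = count_inclusions(contacts, waters, bulk)           # ligand excluded
n_solvent_like = count_inclusions(contacts, waters | {"lig"}, bulk)  # ligand as solvent
```

In this toy graph the two waters that touch only the ligand are separate inclusions when the ligand is protein-like, and join the main component once the ligand is counted as solvent, so the count drops by 2.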

Figure S4. Influence of the enterobactin-iron status on the protein surface genus. (A) The global genus g is plotted as a function of MD snapshot time. The enterobactin-iron complex is treated either as solvent (blue bars) or as protein-like (cyan bars). The blue values are the same as the corresponding ones in Figure 4, complexed case, of the main text. (B) The enterobactin-iron is displayed by its orange Laguerre polyhedral surface whereas the FepA protein, coloured lightteal, is represented in a mixed way: cartoon, except for the residues in contact with the ligand, which are shown as atomic Laguerre polyhedra. On the left (at 0 ns), a single opening in the protein alone transforms into two narrower passages in the united enterobactin-iron-protein complex. On the right (at 50 ns), a hole in the protein is shut by the presence of the enterobactin-iron ligand (orange).

Figure S5. Two ways of defining optimal water paths. Method 1: Voronoi algorithm without solvent (inspired by MOLE); paths follow edges of a Voronoi tessellation built on the isolated protein, discarding solvent, lipids and ligand. Each path is a sequence of maximal empty spheres (in grey). Method 2: Laguerre tessellation with solvent; each path is a string of contiguous water Laguerre polyhedra (coloured). In both cases, the optimal paths minimise a potential function penalising short distances to the protein. The comparison is carried out on two paths, P1 and P2, in the complexed form of FepA at 86 ns.
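The idea of an optimal path that avoids close contact with the protein can be sketched as a shortest-path search on a graph of contiguous cells. This is an illustration under stated assumptions, not the potential used in the study: the per-node cost 1/clearance, the `optimal_path` helper and the toy graph are all hypothetical.

```python
import heapq


def optimal_path(neighbours, clearance, source, target):
    """Dijkstra search where visiting a node costs 1/clearance,
    so nodes close to the protein (small clearance) are penalised."""
    cost = {n: 1.0 / clearance[n] for n in clearance}
    best = {source: cost[source]}
    prev = {}
    heap = [(cost[source], source)]
    while heap:
        total, node = heapq.heappop(heap)
        if node == target:
            break
        if total > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt in neighbours[node]:
            cand = total + cost[nxt]
            if cand < best.get(nxt, float("inf")):
                best[nxt] = cand
                prev[nxt] = node
                heapq.heappush(heap, (cand, nxt))
    if target not in best:
        return None, float("inf")
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


# Toy graph: two routes from "top" to "bottom"; route via "a" is shorter
# in steps but squeezes through a narrow spot (clearance 0.5, in Å).
neighbours = {
    "top": ["a", "b"],
    "a": ["top", "bottom"],
    "b": ["top", "c"],
    "c": ["b", "bottom"],
    "bottom": ["a", "c"],
}
clearance = {"top": 4.0, "a": 0.5, "b": 2.0, "c": 2.0, "bottom": 4.0}

path, total_cost = optimal_path(neighbours, clearance, "top", "bottom")
```

Here the search prefers the longer route through "b" and "c" because the single narrow node "a" costs more than the two wider nodes combined. Any other potential that decreases with clearance could be substituted for 1/clearance without changing the search itself.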

Figure S6. Channel radius as a function of z. The z axis is the protein axis, perpendicular to the membrane. The radius is plotted following a water path (method 2, in blue) or a path in the Laguerre graph of the protein only (method 1, in black), for both paths P1 and P2 shown in Figure S5.

In order of increasing eigenvalues, the corresponding principal directions are approximately x̂, ŷ, ẑ for the plug and x̂, ŷ − ẑ, ŷ + ẑ for the barrel, without any significant change of orientation between the ligand-free (blue lines) and complexed (red lines) forms of FepA.

Figure S9. Cost function f as a function of water radius r. The cost function is plotted for the free (red) and the complexed (blue) forms of FepA. The water weight w chosen for this study is the square of the minimising radius, r_w = 1.22 Å.
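The role of the water weight can be made concrete with the standard power distance of a Laguerre tessellation, |x − c|² − w, where a point belongs to the site with the smallest power distance. The sketch below is illustrative only: the neighbouring atom, its radius of 1.7 Å, the 3 Å separation and the helper names are assumptions, while r_w = 1.22 Å and the 75% larger weight come from the captions.

```python
R_WATER = 1.22            # minimising water radius from the caption, in Å
W_REF = R_WATER ** 2      # reference water weight, in Å^2
W_LARGE = 1.75 * W_REF    # weight 75% larger, as in the path-disconnection example


def power_distance(point, centre, weight):
    """Laguerre (power) distance: squared Euclidean distance minus weight."""
    return sum((p - c) ** 2 for p, c in zip(point, centre)) - weight


def owner(point, sites):
    """Name of the site whose Laguerre cell contains the point."""
    return min(sites, key=lambda name: power_distance(point, *sites[name]))


def bisector_offset(separation, w_a, w_b):
    """Distance from site A to the A/B cell boundary along the A-B axis."""
    return (separation ** 2 + w_a - w_b) / (2.0 * separation)


# Hypothetical neighbour: an atom of radius 1.7 Å located 3 Å from the water.
atom = ((3.0, 0.0, 0.0), 1.7 ** 2)
probe = (1.4, 0.0, 0.0)

owner_ref = owner(probe, {"water": ((0.0, 0.0, 0.0), W_REF), "atom": atom})
owner_large = owner(probe, {"water": ((0.0, 0.0, 0.0), W_LARGE), "atom": atom})

boundary_ref = bisector_offset(3.0, W_REF, atom[1])
boundary_large = bisector_offset(3.0, W_LARGE, atom[1])
```

With the reference weight the probe point falls in the atom's cell; with the larger weight the boundary moves away from the water and the same point falls in the water's cell. This is the kind of shift that can bring two water polyhedra into contact when the weight is increased.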
Supporting Tables

Table S1. Water inclusions in FepA. For each snapshot (ligand-free: 0 → 70 ns, complexed: 0 → 85 ns), N_inc is the number of inclusions trapped in the protein environment. The population, counting water molecules in each inclusion, is reported as the average pop and standard deviation σ over the N_inc inclusions. The maximal population recorded is 16 molecules for the free form and 15 for the complexed form. The bottom line indicates time averages and rms taken over the 11 snapshots closely sampled in the last 10 ns (see Figure S3 for details).

Table S3. Residues bordering the main water channels in apo and holo forms during the last 10 ns. The table lists the residues found bordering the channels in both the ligand-free and complexed forms. The colour code is the same as in Figure 8 in the Main Text. Residue labels follow the original PDB file. The conserved residues (Chakraborty, 2003; Lopez, 2007) are surrounded by a square box.