In silico method for selecting residue pairs for single-molecule microscopy and spectroscopy

Obtaining (dynamic) structure-related information on proteins is key to understanding their function. Methods such as single-molecule Förster Resonance Energy Transfer (smFRET) and Electron Paramagnetic Resonance (EPR), which measure distances between labeled residues to obtain dynamic information, rely on the selection of suitable residue pairs for chemical modification. Selecting pairs of amino acids that show sufficient distance changes upon activity of the protein can be a tedious process. Here we present an in silico approach that uses two or more structures (or structure models) to filter suitable residue pairs for FRET or EPR from all possible pairs within the protein. We apply the method to study the conformational dynamics of the substrate-binding domain of the osmoregulatory ATP-Binding Cassette transporter OpuA. This method speeds up the process of designing mutants and, because of its systematic nature, reduces the chance of missing promising candidates.


Scientific Reports (2021) 11:5756, https://doi.org/10.1038/s41598-021-85003-0

combination with a carefully designed homology model of the other conformation should also work. Our method drastically reduces the number of possibilities and allows one to focus on the biological restraints rather than the technical ones to obtain the best possible residue pairs. We tested the method by designing pairs for labelling in the substrate-binding domain (OpuAC) of the ABC transporter OpuA and performed single-molecule FRET measurements and functionality assays of the full-length protein complex.

Results
The ABC-transporter OpuA. The protein that we use to showcase the in silico approach is the osmoregulatory ABC transporter OpuA. Its substrate-binding protein (OpuAC) undergoes a conformational change upon binding of glycine betaine. Manually selected labelling positions are already available for this protein and have been used in previous smFRET studies 10. Even though these labelling sites (V360C/N423C) report large differences in distance upon glycine betaine binding and do not affect the binding process, they affect the transfer of substrate from the substrate-binding domain (SBD) to the membrane domain of OpuA. In fact, V360 and N423 are located in the lobes of OpuAC that interact with the transmembrane domain (TMD) of OpuA 19. Therefore, we aimed to find new residue pairs for smFRET that do not interfere with the docking of OpuAC. We present a general procedure for selecting labelling sites based on a minimum of two protein structures. This method screens all possible residue pairs and allows smart filtering prior to performing the actual experiments. The results can be inspected manually, for instance by using knowledge of the activity and structure of the entire protein complex. In our case we manually filtered out regions that would affect the interaction of the substrate-binding domain (here OpuAC) with the TMD of the OpuA complex 19.
In silico distance mapping. Crystal structures of OpuAC in the open (PDB: 3L6G) and ligand-bound closed (PDB: 3L6H) conformation 20 were used as a starting point for the in silico distance mapping. In short, a distance map plots the distance between each possible pair of residues, in this case between the centers of mass of the side chains of the two amino acids (Cα in the case of glycine). The center of mass of the side chain was chosen because it is closer to the site of labelling than Cα and it takes the direction of the side chain into account; in the script (https://github.com/MembraneEnzymology/ResiduePairs) the center of mass is easily changed to Cα, if preferred. This yields one distance map per conformation (d_conformationA and d_conformationB), each a symmetrical (d_1,2 = d_2,1) area plot (Fig. 1a,b). Next, a distance change map is generated by subtracting the distance map of the second conformation (here the closed state of OpuAC) from that of the first one (open state) (Eq. 1). This difference map shows the distance shift for each residue pair upon the conformational change that is elicited by the binding of glycine betaine (Fig. 1c).
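The mapping step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library and made-up coordinates, not the published script: the function names `distance_map` and `difference_map` and the three-residue coordinate lists are hypothetical, and in practice the side-chain centers would be computed from the two PDB structures.

```python
import math

def distance_map(centers):
    """Return the symmetric matrix of pairwise distances between side-chain centers."""
    n = len(centers)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(centers[i], centers[j])
            dmap[i][j] = dmap[j][i] = d  # d_1,2 = d_2,1
    return dmap

def difference_map(map_a, map_b):
    """Subtract the map of the second conformation from that of the first, element-wise."""
    return [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]

# Hypothetical side-chain centers (in Å) for three residues in two conformations.
open_centers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
closed_centers = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 6.0)]

d_open = distance_map(open_centers)
d_closed = distance_map(closed_centers)
d_shift = difference_map(d_open, d_closed)  # open minus closed
```

With this sign convention, a positive entry in `d_shift` means the two residues are further apart in the first (open) conformation than in the second (closed) one.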
Filtering of the results. The three obtained maps (Fig. 1a-c) contain all possible pairs of residues, to which restraints can now be applied mathematically. First, we select for distances within a predefined range (e.g. as set by the R_0 value of the FRET pair) by discarding pairs with distances that are larger or smaller than two threshold values (Eqs. 2 and 3). The resulting pairs are shown in Fig. 1d.
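Why the useful distance range depends on R_0 can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R_0)^6). The sketch below is an illustration only: the value `R0_ASSUMED = 51.0` Å is an assumed, illustrative Förster radius and is not taken from this work; the actual value for the dye pair in use should be substituted.

```python
def fret_efficiency(distance, r0):
    """Förster relation: transfer efficiency for a donor-acceptor distance (same units as r0)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)

# Assumed, illustrative Förster radius in Å (not a value from the paper).
R0_ASSUMED = 51.0

# Efficiency at the lower and upper distance thresholds used as an example range.
for d in (40.0, 51.0, 80.0):
    print(f"d = {d:5.1f} Å  ->  E = {fret_efficiency(d, R0_ASSUMED):.2f}")
```

The efficiency is most sensitive to distance changes around r = R_0, where E = 0.5, and flattens out at much shorter or much longer distances, which is why pairs far outside this window report conformational changes poorly.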
We then establish a minimum threshold for the distance shift required for the smFRET measurements (Eq. 4). The resulting pairs are shown in Fig. 1e.
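The two distance-based filters can be sketched as follows. This is a hedged illustration, not the published script: the function name `filter_pairs` and the toy matrices are hypothetical, and the sketch assumes that the distance must lie within the range in both conformations and that the shift threshold is applied to the absolute value of the shift (so that pairs moving closer together and pairs moving apart are both retained).

```python
def filter_pairs(d_a, d_b, d_min, d_max, shift_threshold):
    """Keep residue pairs (i < j) whose distances lie in [d_min, d_max] in both
    conformations and whose absolute distance shift reaches the threshold."""
    kept = []
    n = len(d_a)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the maps are symmetric
            in_range = (d_min <= d_a[i][j] <= d_max) and (d_min <= d_b[i][j] <= d_max)
            shift = d_a[i][j] - d_b[i][j]
            if in_range and abs(shift) >= shift_threshold:
                kept.append((i, j, shift))
    return kept

# Toy distance maps (Å) for three residues in conformations A and B.
d_conf_a = [[0.0, 50.0, 70.0], [50.0, 0.0, 30.0], [70.0, 30.0, 0.0]]
d_conf_b = [[0.0, 45.0, 58.0], [45.0, 0.0, 30.0], [58.0, 30.0, 0.0]]

kept = filter_pairs(d_conf_a, d_conf_b, d_min=40.0, d_max=80.0, shift_threshold=8.0)
```

In this toy example only the pair (0, 2) survives: pair (0, 1) shifts by just 5 Å, and pair (1, 2) is closer than the lower distance threshold.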
Finally, the absolute accessible area is calculated using the DSSP program, which uses the structure of the protein to calculate properties such as secondary structure, bond and torsion angles and water-exposed surface area 21. Using solvent-exposed residues is important to ensure that the probe can access and react with the residue, but also to allow free rotation of the label. The total accessible area from the DSSP program is then divided by the theoretical total surface area for that residue (the values used are the calculated surface areas for amino acid X in a Gly-X-Gly tripeptide from 22), giving the relative surface accessibility (RSA). The amino acid pairs with a sufficient RSA in both conformations are kept (Fig. 1f); all others are discarded. All the defined thresholds can be adjusted to suit specific needs or to reduce the number of remaining pairs. Similarly, one could easily extend the filtering method to include secondary structure, as labelling of loop regions is typically favoured over structured areas. Secondary structure is also calculated by the DSSP program. A customizable script is available at https://doi.org/10.5281/zenodo.4446814 or on GitHub: https://github.com/MembraneEnzymology/ResiduePairs.
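The RSA calculation and the accessibility filter can be sketched as below. This is an illustration under stated assumptions: the `MAX_ASA` entries are placeholder numbers of roughly the right magnitude and are not the reference values of ref. 22, the function names are hypothetical, and the absolute accessibilities would normally be read from the DSSP output rather than typed in.

```python
# Placeholder maximum accessible surface areas (Å^2) for X in Gly-X-Gly.
# Illustrative values only; substitute the reference table actually used.
MAX_ASA = {"LYS": 211.0, "THR": 146.0, "ASP": 151.0}

def relative_accessibility(resname, absolute_asa):
    """RSA = absolute accessible area / theoretical maximum for that residue type."""
    return absolute_asa / MAX_ASA[resname]

def exposed_in_both(resname, asa_conf_a, asa_conf_b, rsa_threshold=0.60):
    """True if the residue reaches the RSA threshold in both conformations."""
    rsa_a = relative_accessibility(resname, asa_conf_a)
    rsa_b = relative_accessibility(resname, asa_conf_b)
    return min(rsa_a, rsa_b) >= rsa_threshold

# Hypothetical absolute accessibilities (Å^2) in the two conformations.
lysine_ok = exposed_in_both("LYS", 150.0, 140.0)    # well exposed in both states
threonine_ok = exposed_in_both("THR", 14.6, 16.0)   # largely buried in both states
```

A residue pair would be kept only if both of its residues pass this test, which is why taking the minimum over the two conformations is used here.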
In the case of OpuAC we used the following thresholds: d_min = 40 Å, d_max = 80 Å, d_shift-threshold = 8 Å and RSA = 60%, which yielded 9 pairs (Fig. 2a,b). These pairs were exported to PyMOL for manual inspection, where we aimed for pairs located on the sides of OpuAC that do not interfere with the docking of the substrate-binding domain in the full-transporter complex (Fig. 2c). We selected two pairs, one with a positive FRET change upon glycine betaine binding (D320C/K453C) and one with a negative FRET change upon binding (N414C/K566C). We also included a pair (T504C/K521C) with a positive FRET signal and a low relative surface accessibility (RSA = 8-11% for Thr504). The parameters of all three mutants plus the original mutant (V360C/N423C) are shown in Table 1.

The mutants (Table 1) were first constructed in the SBD of OpuA, expressed as water-soluble proteins (named OpuAC) and purified to homogeneity. Glycine betaine titrations were performed to verify normal function of the mutant and fluorophore-labelled proteins. OpuAC variants with double cysteines were labelled with the fluorescence donor (Alexa555) and acceptor (Alexa647), using maleimide derivatives of the dyes. Glycine betaine titration of these labelled OpuAC mutants (Fig. 3a) was monitored by solution-based alternating laser excitation (ALEX) single-molecule FRET (Fig. 3b). Indeed, we see the FRET signal decreasing in OpuAC (N414C/K566C) and increasing in the other two mutants. The mutant (T504C/K521C) that was predicted to be least surface accessible (8-11%) also showed a glycine betaine-dependent conformational change. However, its apparent dissociation constant (K_D) of 38 μM is an order of magnitude higher than reported for the wildtype protein and for the (D320C/K453C) and (N414C/K566C) mutants 10,20. Moreover, a low surface accessibility may influence the rotational freedom of the labels.
Although we cannot say with certainty that the increased K_D of OpuA (T504C/K521C) is due to the labelling of the buried Thr-504, we believe that the RSA is a valuable parameter to restrain in the initial selection of labelling sites. We propose to relax the restraints when the number of pairs is too low, but the labelled protein should always be tested for functionality. The (D320C/K453C) and (N414C/K566C) mutants show K_D values in the same range (1-4 μM) as reported for the wildtype protein and were used for further studies.
Figure 1 (caption, partial). Panel (d), the same as panel c but now the pairs with an absolute distance larger than 80 Å or smaller than 40 Å are filtered out. Panel (e), the same as panel d but now pairs with a distance change smaller than 8 Å are filtered out. Panel (f), the same as panel e but now all residues that are less than 60% surface-exposed are filtered out; circles are drawn around pairs/clusters to increase visibility.

Mutations in the full-length transporter OpuA. Next, we verified that the labelling positions do not interfere with the activity of the full-length transporter. OpuA has two SBDs covalently linked to the transmembrane domain, and therefore four cysteines per complex. The three mutant pairs were constructed in the full-length transporter and the proteins were purified and reconstituted in MSP1D1 nanodiscs. After reconstitution, half of the nanodiscs were labelled with 4-acetamido-4′-maleimidylstilbene-2,2′-disulfonic acid (AMdiS) and the other half of the sample was used as control. Like the fluorophores used for smFRET, AMdiS is a relatively bulky water-soluble maleimide, but unlike the dyes it is affordable for large-scale protein labelling. We used SDS-PAGE to show that the proteins are quantitatively labelled with AMdiS, which is apparent from a significant shift in the migration of the OpuABC subunit of the OpuA complex (Fig. 4a). All fractions were then analysed for ATPase activity using a coupled enzyme assay (Fig. 4b). We do not want to interpret the

Discussion
We describe a straightforward approach to select sites for labelling of proteins for smFRET or EPR measurements. The approach can be applied to proteins similar to the one used to showcase it, for instance the receptor or substrate-binding domains associated with ABC transporters, tripartite tricarboxylate transporters (TTTs), tripartite ATP-independent periplasmic transporters (TRAP), some ligand-gated ion channels (LGI), metabotropic receptors (GPCRs) or two-component regulatory systems 23. In these classes of proteins alone, over 500 structures were already available in 2016 (last structural classification 23). In principle, however, the method is not limited to these proteins but can be used for any system, provided at least two structures or homology models in different conformations are available. Like OpuA, many of the above-mentioned proteins are homodimeric with more than one SBD per functional complex; hence multiple pairs of cysteine residues are present per complex, which complicates the smFRET analysis (Fig. 5a). By introducing a single cysteine per domain (hence two cysteine residues per complex in the case of a homodimeric complex) and labelling stochastically, it becomes possible to observe interdomain movements (Fig. 5b), as has been shown for the ABC transporter BtuCD by labelling the transmembrane domains 24, for the ABC transporter MRP1 by labelling the NBDs 25, and also for the ABC transporters MsbA 6 and McjD 7. A similar approach has been used in smFRET studies on BetP, a homotrimeric protein with three fluorophores per complex 8. Alternatively, one could label the protein with a fluorescence donor and introduce a fluorescence quencher in the ligand or membrane to probe conformational dynamics. In another study on the ABC transporter BtuCD, the cobalt ion in the ligand (vitamin B12) was used for quenching to determine transfer of the substrate from the SBD through the TMD 9.
By inserting a quencher in for instance the vesicle or nanodisc membrane, and a fluorescence donor in the SBD, one could determine the conditions under which the SBD gets closer to or further away from the membrane (Fig. 5c).
To facilitate smFRET measurements in homodimeric proteins such as OpuA with multiple identical subunits, it should be possible to create apparent heterodimeric complexes with, e.g., one protomer containing the double cysteine mutation and one protomer being cys-less. One can then probe the opening and closing of the SBD in the context of the full-length transporter and, e.g., determine whether the two SBDs of OpuA deliver substrates stochastically or whether a receptor domain, once bound, can deliver multiple substrates. We aim to take this approach in future studies, building on the work described in this paper. In short, we describe a systematic method to find candidates for FRET, EPR or other double mutation-based distance-reporting methods, which can be used to pre-select suitable pairs using relevant distance and solvent accessibility constraints.

Material and methods
Residue selection protocol. For the residue selection protocol, we recommend following the instructions of the script (https://doi.org/10.5281/zenodo.4446814). In short: two protein structures (different conformations) are read using the ProDy Python library 26. Distance maps, containing the distance between the centers of mass of the side chains (Cα for glycine) of each pair of residues, are calculated for each protein structure. The difference map of these two distance maps is generated by subtraction. By selecting distances within a specified range, the amino acid pairs are filtered and only the pairs with a suitable distance between the residues (e.g. depending on the probes used for FRET or EPR) are kept. The DSSP software (version 3.0.0-2) 21,27 is used to assess the secondary structure and surface accessibility of each residue. Filtering on surface accessibility further reduces the number of possible pairs. The script returns a list of suitable residue pairs, as well as a script that can be imported into the PyMOL Molecular Graphics System and used for direct visualization of the obtained amino acid pairs.
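The final export step can be illustrated with a short sketch that turns a list of residue pairs into PyMOL commands. This is not the output format of the published script: the function name `pymol_commands`, the object name `opuac` and the choice to measure between Cα atoms are assumptions made for the illustration. Only the standard PyMOL commands `show` and `distance` are used.

```python
def pymol_commands(pairs, obj="opuac"):
    """Build PyMOL commands that highlight each residue pair and draw the
    Cα-Cα distance between its two residues. `obj` is a hypothetical object name."""
    lines = []
    for res_i, res_j in pairs:
        lines.append(f"show sticks, {obj} and resi {res_i}+{res_j}")
        lines.append(
            f"distance pair_{res_i}_{res_j}, "
            f"{obj} and resi {res_i} and name CA, "
            f"{obj} and resi {res_j} and name CA"
        )
    return lines

# Residue numbers of two of the pairs selected in this work.
selected_pairs = [(320, 453), (414, 566)]
commands = pymol_commands(selected_pairs)
script_text = "\n".join(commands)  # could be saved as a .pml file and loaded in PyMOL
```

Saving `script_text` to a file with the `.pml` extension and running it in a PyMOL session that already contains the loaded structure under the chosen object name would display the candidate pairs for manual inspection.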

Construction of expression strains. The cysteines were introduced sequentially using QuikChange mutagenesis and the Escherichia coli pREOpuAHis vector. Using restriction cloning (AlwNI and BamHI), the OpuAC region of the gene, where the mutations were introduced, was transferred into the Lactococcus lactis

Reconstitution of OpuA in MSP1D1 nanodiscs. The reconstitution procedure was similar to that described previously 29. In short: 1.4 μM of the purified OpuA was mixed with 14 μM purified MSP1D1 scaffold protein and 1.4 mM lipids (lipid composition: 50% DOPE, 12% DOPC, 38% DOPG) in 50 mM potassium phosphate pH 7.0, 4% (v/v) glycerol, 10 mM DDM plus 1 mM DTT to a total volume of 2 mL, and the mixture was nutated for 1 h at 4 °C. Then 2 g of SM2-Biobeads (Bio-Rad) were added to adsorb the detergent and the mixture was incubated overnight. In the morning the supernatant was separated from the Biobeads with a syringe.
Labeling of OpuA for ATPase assay. Two 200 μL column volumes of Ni2+-Sepharose resin were each equilibrated with 1-2 column volumes of water and 1-2 column volumes of 50 mM potassium phosphate pH 8.0, 200 mM KCl and 1 mM DTT. The reconstitution mixture was split in two samples of 2 mL. Each sample was allowed to bind to the column (1 h at 4 °C), which was then washed with 10-20 column volumes of buffer without DTT (50 mM potassium phosphate pH 8.0 plus 200 mM KCl). Then, 1 mL of the same buffer supplemented with 1 mM AMdiS (4-acetamido-4′-maleimidylstilbene-2,2′-disulfonic acid) was added and allowed to react for 1 h at