Open and Lys–His Hexacoordinated Closed Structures of a Globin with Swapped Proximal and Distal Sites

Globins are haem-binding proteins with a conserved fold made up of α-helices and can possess diverse properties. A putative globin-coupled sensor from Methylacidiphilum infernorum, HGbRL, contains an N-terminal globin domain whose open and closed structures reveal an untypical dimeric architecture. Helices E and F fuse into an elongated helix, resulting in a novel site-swapped globin fold made up of helices A–E, hence the distal site, from one subunit and helices F–H, the proximal site, from another. The open structure possesses a large cavity binding an imidazole molecule, while the closed structure forms a unique Lys–His hexacoordinated species, with the first turn of helix E unravelling to allow Lys52(E10) to bind to the haem. Ligand binding induces reorganization of loop CE, which is stabilized in the closed form, and helix E, triggering a large conformational movement in the open form. These provide a mechanical insight into how a signal may be relayed between the globin domain and the C-terminal domain of HGbRL, a Roadblock/LC7 domain. Comparison with HGbI, a closely related globin, further underlines the high degree of structural versatility that the globin fold is capable of, enabling it to perform a diversity of functions.

The bacterium Methylacidiphilum infernorum, isolated from the Hell's Gate (Tikitere) geothermal area in New Zealand, is an aerobic methanotroph growing optimally at 60 °C and pH 2.0 1 . Its genome encodes five globins (one double-domain HGbRL and four single-domain HGbI-IV) 2 , and thus far only the single-domain HGbI 3 and HGbIV 4 have been studied. HGbI possesses a very high affinity for O2 and an unusually low autoxidation rate 3 . Its structure consists of a typical but compact globin fold, with the highly conserved Tyr29(B10) and Gln50(E7) in the distal site flexible enough to bind a molecule of either O2 or acetate 3 . HGbIV, on the other hand, is a truncated haemoglobin with two unique extensions, one at each terminus: a Pre-A loop at the N terminus and a helix I at the C terminus 4 . Its distal site, intriguingly, is comparatively large and polar, containing a specifically conserved His70(B9)-His71(B10) motif that in a crystal structure was well positioned to bind a phosphate ion 4 .
The globin fold, one of the most versatile protein architectures, can bind a variety of ligands in the distal site. Besides small ligands such as O2, CO and NO, ligands as large as econazole or even a phospholipid have been reported in the structures of yeast flavohaemoglobin 5 (Yhb) and Ralstonia eutropha flavohaemoglobin (FHP) [6][7][8] . In some globins, His(E7) binds directly to the haem in the absence of an exogenous ligand, forming a bis-His hexacoordinated species first observed in the structure of a nonsymbiotic plant haemoglobin 9 . Neuroglobin, meanwhile, employs a novel haem-sliding mechanism for its bis-His hexacoordination 10 . In the unligated structure of Escherichia coli flavohaemoglobin (HMP), interestingly, the sixth coordination site is physically occupied by Leu57(E11) 11 .
Globins have also been found fused to the N termini of a variety of proteins, forming multi-domain proteins which are widely distributed in prokaryotes and eukaryotes alike. In flavohaemoglobins, which function in cellular responses to nitrosative stress, the globin domain is fused to a ferredoxin reductase-like module consisting of an FAD- and an NADH-binding domain 12 . In HemAT, a globin-coupled sensor (GCS) involved in Bacillus subtilis aerotaxis, the C-terminal domain is a methyl-accepting chemotaxis protein 13 . RsbR from B. subtilis, on the other hand, comprises a C-terminal STAS domain and forms a gigantic stressosome with RsbS and RsbT, which plays a role in the bacterium's general stress response 14 . The N-terminal globin domain of RsbR, however, not only lacks helices C and D but also has the strictly conserved haem-binding His(F8) in the proximal site replaced by Ala74, so that it no longer binds any haem 15 . While B. subtilis RsbR represents an extreme example of a globin that has evolved to lose its haem, the RsbR homologues from Saprospira grandis still retain their haem and are able to bind O2 16 . HGbRL from M. infernorum is a putative globin-coupled sensor of 274 residues, consisting of an N-terminal globin domain and a C-terminal Roadblock/LC7 domain. It is unique to the genus Methylacidiphilum, and presently is found only in two other closely related bacteria, M. kamchatkense 17 and M. fumariolicum 18 . The globin domain, N-HGbRL, is 38% identical to HGbI, with both belonging to the same group of bacterial globins that lack helix D and conserve both Tyr(B10) and Gln(E7) (Fig. 1). The structures of several bacterial globins from this group, such as HMP 11 , FHP [6][7][8] and Vitreoscilla stercoraria haemoglobin (VsHb) 19 , have displayed a high degree of versatility in their distal sites involving mainly loop CE and helix E.
HMP, for instance, moves its helix E inwards in the closed form to occupy the distal site 11 , while FHP moves its helix E outwards in the open form, expanding the distal site to bind a ligand as large as a phospholipid 7 . Here we further demonstrate the architectural plasticity of the globin fold by reporting the structure of an unprecedented site-swapped dimeric globin, N-HGbRL, which has been solved in both the open and closed forms with several novel features.

Figure 1. Structural alignments of globins.
HGbRL is unique to the genus Methylacidiphilum (Group 1), presently found only in two other bacteria, M. kamchatkense (Mka) and M. fumariolicum (Mfu) (aligned by protein sequences). Its globin domain is closely related to bacterial globins (Group 2) with conserved Tyr(B10) and Gln(E7), as well as lacking helix D that is present in eukaryotic globins (Group 3). Thus far N-HGbRL is the only globin containing fused helices E and F (EF) among all the known structures, with the corresponding loop EF of HGbI (green) straightened into part of the elongated helix. Haem hexacoordination in eukaryotic globins invariably involves His(E7) (yellow), as seen in nonsymbiotic plant haemoglobin (nsHb), neuroglobin (Ngb) and cytoglobin (Cgb). In bacterial globins, on the other hand, hexacoordination involves residues at either position E10 or E11, such as Lys52(E10) in N-HGbRL, His66(E11) in GsGCS, and Leu57(E11) in both VsHb and HMP for haem shielding. Following the conventional numbering for the distal Gln/His(E7) and proximal His(F8), the fused helices E and F are numbered E6-E31 followed by F1-F11.
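The numbering convention for the fused helix can be illustrated with a minimal Python sketch. The two offsets are not given explicitly in the text; they are inferred here from residues that the text labels, such as Lys48(E6), Lys52(E10), Asp68(E26) and Arg75(F2), and assume no insertions or deletions within the fused helix. The function name `fused_helix_label` is hypothetical.

```python
# Offsets inferred from labelled residues in the text (an assumption, not a
# published table): Lys52 is E10, so E position = residue number - 42;
# Arg75 is F2, so F position = residue number - 73.
E_OFFSET = 42
F_OFFSET = 73
E_RANGE = (6, 31)   # fused helix is numbered E6-E31 ...
F_RANGE = (1, 11)   # ... followed by F1-F11


def fused_helix_label(residue_number):
    """Return the topological label (e.g. 'E10') for a residue of the fused
    E-F helix, or None if the residue lies outside E6-E31 / F1-F11."""
    e_pos = residue_number - E_OFFSET
    if E_RANGE[0] <= e_pos <= E_RANGE[1]:
        return f"E{e_pos}"
    f_pos = residue_number - F_OFFSET
    if F_RANGE[0] <= f_pos <= F_RANGE[1]:
        return f"F{f_pos}"
    return None


# Residues named in the text, paired with the labels the text gives them.
for number, expected in [(48, "E6"), (51, "E9"), (52, "E10"),
                         (59, "E17"), (68, "E26"), (75, "F2")]:
    print(number, fused_helix_label(number), expected)
```

Under these inferred offsets the E segment ends at residue 73 and the F segment starts at residue 74, which is consistent with the continuous E6-E31, F1-F11 numbering stated in the caption.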

Results and Discussion
Hexacoordination of N-HGbRL. The purified dimeric N-HGbRL was in the ferric state, and its spectrum revealed a predominantly hexacoordinated form with a Soret band at 412 nm and a visible peak at 534 nm (Fig. 2). In the presence of an excess of sodium dithionite, which also removed dissolved oxygen, N-HGbRL was reduced but remained hexacoordinated, with the Soret band shifting to 424 nm and two distinct peaks emerging at 529 and 557 nm (Fig. 2). Both HGbI 3 and HGbIV 4 have been reported to form a hexacoordinated species as well. The internal sixth ligand, however, did not seem to form a strong interaction with the haem, at least in the ferrous state, as under the experimental conditions the deoxy N-HGbRL slowly became oxygenated within 20 minutes, with the Soret band and the two peaks shifting to 413, 544 and 576 nm (Fig. 2).

Site-swapped N-HGbRL with fused helices E and F. N-HGbRL consists of seven α-helices, helices A-H, lacks helix D and forms a dimer. It displays a typical bacterial globin fold except for one highly atypical feature: its helices E and F unexpectedly fuse into an elongated helix (Fig. 3a). Loop EF is straightened into part of the α-helical conformation, and the elongated helices from both subunits form an antiparallel pair. Instead of being sandwiched between helices E and F from a single globin subunit, a haem group in N-HGbRL is sandwiched by two subunits. A single N-HGbRL globin fold, denoted subunit A*, is therefore made up of helices A-E from one subunit and helices F-H from the other. In other words, the proximal and distal sites of a single globin fold are contributed mainly by two separate subunits, representing the first example of site swapping in the globin family. The two C-terminal Roadblock/LC7 domains, meanwhile, would be positioned at the opposite ends of the dimer's helices H, but may still lie on the same side and so be able to interact with one another.
The structure of N-HGbRL was first solved in the open form in space group C2, with two subunits in the asymmetric unit. In the distal site, an imidazole molecule from the Ni2+-affinity chromatography binds to the haem and to the highly conserved Tyr28(B10), which also coordinates a water molecule, W1 (Fig. 3b). Several imidazole-bound globin structures have been reported, including those of VsHb 20 and sperm whale myoglobin 21 . Even larger ligands with an imidazole moiety bound to the haem in a similar pattern have also been observed, such as the econazole molecules in the structures of FHP 8 and Yhb 5 . The distal site of N-HGbRL is lined entirely by hydrophobic residues (Phe27, Leu31, Phe42, Leu53, Val56, Val94 and Leu98) except for Tyr28(B10) and Lys52(E10). The A-ring propionate of the haem also hydrogen-bonds to the side chains of Arg80 and Tyr84, and the D-ring propionate to Lys43 N and the Tyr28-bound water W1 (Fig. 3b).
Further optimization subsequently yielded crystals of the closed form, also in space group C2 but with only one subunit in the asymmetric unit. Intriguingly, the first turn of helix E in the distal site has unravelled, allowing Lys52(E10) to close in and bind to the haem at the sixth coordination site and to a water molecule, W2 (Fig. 3c). Tyr28(B10), meanwhile, has been driven away to hydrogen-bond to both the N of Arg51(E9) and Lys52(E10), as well as to W1. Proposed as a mechanism to regulate ligand affinity, haem hexacoordination has been observed in several globin structures, including nonsymbiotic plant haemoglobin 9 , fruit fly haemoglobin 22 , Caenorhabditis elegans neural globin GLB-6 23 , cytoglobin 24 and neuroglobin 25,26 , where it normally involves a distal His(E7) residue, or, in truncated haemoglobin 27 , a His(E10) residue. Haem coordination by lysine, however, has thus far been reported only in the structures of cytochrome c nitrite reductase (NrfA) 28 and tetrathionate reductase 29 , in both cases in the proximal site. Presently the only structures of haem hexacoordination by a proximal histidine and a distal lysine are found in a complex of NrfA and NrfH, a membrane-bound cytochrome c quinol dehydrogenase 30 , as well as in the M100K mutant of cytochrome c-550 from Paracoccus versutus, whose haem-coordinating Met100 was replaced by lysine 31 . The closed form of N-HGbRL is hence the first globin structure with Lys-His hexacoordination.
Distal site reorganization and a large conformational shift upon ligand binding. Both subunits A* of the open and closed structures superpose well at helices A-E (rmsd 1.22 Å) and considerably better at helices F-H (rmsd 0.46 Å), with the distal site undergoing some conformational changes. A bending of helix E by about 25° results in a large shift of subunits B* between the two forms (Fig. 4a), creating a huge pocket of about 250 Å³ in the distal site of the open form that connects to several tunnels (Fig. 4b). In the closed form, meanwhile, Lys48(E6)-Arg51(E9) are untangled and no longer form the first turn of helix E, which tilts towards the haem and binds to it (Fig. 4c). Loop CE is also pulled towards helices B and C, stabilized by new hydrogen bonds between Arg39(C5) from helix C and Phe42-Ile44, between Gly50(E8) from the unravelled turn of helix E and Ile46, and between Arg29(B11) from helix B and Glu47 (Fig. 4d). The side chain of Trp59(E17) has two alternate conformations in the closed form, one with an occupancy of 0.6 facing the solvent and another of 0.4 flipping 180° inwards, but in the open form it flips inwards, appearing to act as a switch between the two forms (Fig. 4e). In the open form, interaction between subunits B* from two adjacent asymmetric units, Arg39(C5) from one asymmetric unit and Glu47 from the other, causes loop CE to shift relative to that of subunit A* (Supplementary Fig. S1).
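The two geometric quantities quoted above, an rmsd between superposed helices and a bending angle of helix E, can be made concrete with a short Python sketch. This is only an illustration of the definitions, not the procedure used to obtain the reported values: it assumes the two coordinate sets are already superposed and paired atom by atom, it represents each helix axis by a single direction vector, and the toy coordinates are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z)
    points. Assumes the sets are already superposed; no fitting is done."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def angle_between(u, v):
    """Angle in degrees between two 3D vectors, e.g. two helix-axis directions."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical toy coordinates in Å (not taken from the deposited structures):
# every atom of the second set is displaced by exactly 1 Å along y.
open_form = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
closed_form = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(open_form, closed_form))

# Two hypothetical axis directions that differ by a 25 degree rotation about z.
axis_open = (1.0, 0.0, 0.0)
axis_closed = (math.cos(math.radians(25)), math.sin(math.radians(25)), 0.0)
print(round(angle_between(axis_open, axis_closed), 1))
```

In practice the superposition step (for example a least-squares fit over Cα atoms) would precede the rmsd calculation, and a helix axis would be fitted to the backbone atoms rather than supplied by hand.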
These changes provide a mechanical insight into how N-HGbRL may sense a signal. When a signalling ligand binds the haem in the closed form, it dislodges Lys52(E10) and pushes helix E outwards (Fig. 4c). This may expose Trp59(E17) wholly to the solvent and prompt it to flip inwards, further facilitating the movement of helices E and F away from helix A (Fig. 4a). The side chain of Asp68(E26) breaks its bonding with Arg3 and in turn binds to Arg75(F2), promoting the bending of the fused helices E and F (Fig. 4e). The presence of cooperativity between the two subunits can also be deduced from these structures: the conformational changes induced by ligand binding in subunit A* drive helix F in subunit B* towards helix E, forcing the latter's fused helices to bend and hence to trigger the reorganization that opens up the closed distal site of subunit B* (Fig. 4a). Once the ligand dissociates, loop CE may tend to assume the closed conformation as it is stabilized by the interactions, absent in the open form, with Arg29(B11), Arg39(C5) and Gly50(E8) that pull Lys52(E10) towards the haem (Fig. 4d).  6,8 . However, FHP's reportedly 'closed' state is not totally closed, as its two closed structures still bind some large azole derivatives, and it is also unclear if FHP's helix E can move further inwards to bind to the haem when an exogenous ligand is actually absent. On the other hand, HMP has been solved only in a completely closed structure, with its haem totally shielded by Leu57(E11) 11 . The single-domain VsHb also shields its haem with Leu57(E11) in the closed form, but only slight movements of its helix E and a sideways rotation of Leu57(E11) are sufficient to open up its distal site to bind a molecule of azide, thiocyanate or imidazole 19,20 . Nevertheless, as HMP's Leu57(E11) is closer to the haem than VsHb's Leu57(E11) is, HMP may require a larger outward movement of helix E for ligand binding.
N-HGbRL and these multi-domain globins probably constitute a subgroup of novel bacterial globins that share distinct features such as an open and a closed form, a huge distal pocket, haem hexacoordination or shielding, and, upon ligand binding, reorganization of loop CE and helix E which may possibly lead to a large movement of a neighbouring domain. A signature of this subgroup may be haem hexacoordination or shielding with a residue at site E10 or E11, instead of E7, such as Lys52(E10) in N-HGbRL and Leu57(E11) in HMP. Haem hexacoordination normally involves the invariant His(E7), and only recently has an E11 residue been reported, i.e. His66(E11) in Geobacter sulfurreducens globin-coupled sensor (GsGCS) 32 . These globins notably do not possess helix D, whereas globins with haem-coordinating His(E7) usually do. The absence of helix D may not only increase the flexibility in this region, but a shorter sequence may also be more energetically favoured for a concerted conformational change involving loop CE and helix E. Still, at present as only N-HGbRL has been solved in both forms, more structures of globins in either form are needed for further studies.
HGbI also has a smaller distal pocket which is connected to the solvent through several tunnels 3 , but the distal pocket of the open form of N-HGbRL is readily accessible from the solvent through an opening (Fig. 4b). In HGbI's pocket, Tyr29(B10) is positioned closer to the haem and able to bind an O2 molecule, which also binds Gln50(E7). Although the spectra show that ferrous N-HGbRL also binds O2 (Fig. 2), it is unclear if its corresponding Tyr28(B10) could directly coordinate O2, since this would require a larger outward movement of helix E to make room for Tyr28(B10) to move in. On the other hand, the hexacoordinated spectra of HGbI 3 are perplexing, as no histidine is present in the vicinity of the distal site and the only lysine nearby, Lys53(E10), is positioned outside the pocket (Fig. 5a). It would require considerable conformational changes of HGbI's helices B and E for Tyr29(B10) to make way for Lys53(E10) to move in to bind the haem, and this would also be expected to disrupt the dimerization of HGbI. A structure of the hexacoordinated HGbI may therefore reveal yet another mechanism for haem hexacoordination.
Evolution of HGbRL from HGbI and HRL. As observed in the reorientation of FHP's NADH domain 8 , the large conformational movement of N-HGbRL between its open and closed forms is likewise expected to trigger changes in HGbRL's C-terminal Roadblock/LC7 domain. In eukaryotes, Roadblock/LC7 proteins form one of the light chains of dynein, a large multisubunit complex that transports cellular cargo along microtubules. In bacteria, they constitute an ancient protein superfamily which may function in NTPase regulation 33 . The Roadblock/LC7 protein from Myxococcus xanthus, MglB, is the cognate GTPase-activating protein (GAP) of the GTPase MglA, and they are involved in regulating the dynamic switching of cell polarity in bacterial motility 34,35 . The M. infernorum genome also encodes a pair of genes similar to the MglB-MglA operon, whose functions are still unknown: a single-domain Roadblock/LC7 protein (GenBank: ACD84470) 34% identical to the C-terminal domain of HGbRL, and an MglA-like GTPase (GenBank: ACD84469) encoded 183 bp upstream.
HGbRL is unique to the genus Methylacidiphilum, and could have evolved from the duplication and subsequent fusion of the genes of HGbI and the Roadblock/LC7 protein. With both domains having their own carbon copies in the genome -hence able to relieve themselves of the same functions -HGbRL was therefore free to have developed a new role of its own. Its globin domain, upon binding a yet to be identified ligand, is likely to mechanistically relay the signal to the Roadblock/LC7 domain through the former's conformational changes. The downstream pathway might involve cell motility as mediated by the Roadblock/LC7 domain, but M. infernorum has been reported as a non-motile rod 1 . Alternatively, since HGbRL is also encoded in the same operon as an S-adenosyl methionine (SAM)-dependent methyltransferase (GenBank: ACD83143), signalling from HGbRL may perhaps lead to the methylation of a target molecule by the methyltransferase. Further characterization of this protein, including identification of the native ligand for the globin domain, will definitely help relate the unusual structural features of HGbRL to its function.

Methods
Cloning, protein expression and purification. The gene encoding N-HGbRL (residues 1-133; GenBank: ACD83144) was amplified by PCR using primers containing the NdeI and BamHI restriction sites, with a His-tag sequence inserted before the stop codon. The PCR product was cloned into the pET-3a expression vector (Novagen), and the resulting vector was transformed into Rosetta 2(DE3)pLysS (Novagen). The cells were grown in LB medium containing 100 μg/ml ampicillin and 34 μg/ml chloramphenicol. Expression was induced with 50 μM isopropyl-β-D-thiogalactopyranoside, supplemented with 100 mg/l FeSO4•7H2O and 17 mg/l δ-aminolevulinic acid. After overnight incubation at 37 °C, the cells were harvested by centrifugation and resuspended in 50 mM Tris-HCl, 200 mM NaCl, pH 8.0, sonicated and centrifuged to remove cell debris. The supernatant was then applied to a Ni2+-charged HiTrap IMAC HP column (GE Healthcare), and N-HGbRL was eluted with a gradient of up to 1.0 M imidazole. Fractions containing the protein were pooled and further purified by size-exclusion chromatography on a Superdex 200 column (GE Healthcare).
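For readers who prefer molar units, the mass concentrations of the two haem-related supplements can be converted with a few lines of Python. This is a convenience sketch rather than part of the protocol. The atomic masses are rounded standard values, and the text does not say whether δ-aminolevulinic acid was added as the free acid or as a salt; the free acid (C5H9NO3) is assumed here, so the molarity would be lower if the hydrochloride was used.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007,
               "O": 15.999, "S": 32.06, "Fe": 55.845}


def molar_mass(composition):
    """Molar mass in g/mol from a dict of element -> atom count."""
    return sum(ATOMIC_MASS[element] * count
               for element, count in composition.items())


def mg_per_l_to_mM(mg_per_l, mass_g_per_mol):
    """Convert mg/l to mmol/l: (mg/l) / (g/mol) = mmol/l."""
    return mg_per_l / mass_g_per_mol


# FeSO4.7H2O: Fe 1, S 1, O 4 + 7 = 11, H 14.
FESO4_7H2O = {"Fe": 1, "S": 1, "O": 11, "H": 14}
# Assumption: delta-aminolevulinic acid as the free acid, C5H9NO3.
ALA_FREE_ACID = {"C": 5, "H": 9, "N": 1, "O": 3}

iron_mM = mg_per_l_to_mM(100.0, molar_mass(FESO4_7H2O))
ala_mM = mg_per_l_to_mM(17.0, molar_mass(ALA_FREE_ACID))
print(f"FeSO4.7H2O: {iron_mM:.2f} mM")
print(f"ALA (free acid assumed): {ala_mM:.2f} mM")
```

Under these assumptions both supplements are in the sub-millimolar range, roughly 0.36 mM iron salt and 0.13 mM δ-aminolevulinic acid.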
Absorption spectra. Extensive buffer exchange was carried out to remove traces of imidazole. The absorption spectrum for the met form of N-HGbRL was measured in 50 mM Tris-HCl, pH 8.0 at room temperature. The deoxy form was obtained by adding freshly prepared sodium dithionite to a final concentration of 100 mg/ml, and the absorption spectra were measured every ten minutes for an hour.
Crystallization and data collection. N-HGbRL was concentrated to 10-20 mg/ml prior to crystallization. Using the sitting-drop vapour diffusion technique, the imidazole-bound crystals were formed at 20 °C in 3.5 M 1,6-hexanediol, 0.1 M sodium citrate pH 5.6, and cryoprotected with 12% glycerol. Crystals of the hexacoordinated closed form, meanwhile, were formed at 4 °C on a big salt crystal in 50% 2-methyl-2,4-pentanediol, 0.1 M Tris-HCl pH 8.0, 0.2 M (NH4)2HPO4. Diffraction data for both types of crystals, flash-cooled at 100 K, were collected on a Rigaku MicroMax-007 HF X-ray generator (λ = 1.5418 Å) equipped with an R-AXIS IV++ area detector, and processed with XDS 36 in the same space group C2 but with different cell parameters.
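The stated wavelength fixes the relationship between lattice spacing and scattering angle through Bragg's law, λ = 2d sin θ (first order). The Python sketch below only illustrates that relationship for λ = 1.5418 Å; the function names are hypothetical, and the example d-spacing of 2.0 Å is arbitrary and does not refer to the resolution of the data sets reported here.

```python
import math

WAVELENGTH = 1.5418  # Å, the wavelength stated in the text


def two_theta_deg(d, wavelength=WAVELENGTH):
    """Scattering angle 2θ in degrees for a lattice spacing d (Å),
    from first-order Bragg's law: wavelength = 2 * d * sin(θ)."""
    sin_theta = wavelength / (2.0 * d)
    if sin_theta > 1.0:
        raise ValueError("spacing is below wavelength / 2 and cannot diffract")
    return 2.0 * math.degrees(math.asin(sin_theta))


def d_from_two_theta(two_theta, wavelength=WAVELENGTH):
    """Lattice spacing d (Å) that diffracts at scattering angle 2θ (degrees)."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta / 2.0)))


# Arbitrary example spacing, chosen only to exercise the functions.
angle = two_theta_deg(2.0)
print(f"2θ for d = 2.0 Å: {angle:.1f} degrees")

# Theoretical limit at 2θ = 180 degrees: d = wavelength / 2.
print(f"smallest accessible d: {d_from_two_theta(180.0):.4f} Å")
```

The smallest spacing that can satisfy Bragg's law at this wavelength is λ/2; the spacing actually recorded depends on the detector geometry, which the text does not specify.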
Structure determination. The HGbI structure 3 (PDB: 3S1I) was used as an initial model to solve the closed form by molecular replacement with Molrep 37 , which was then used to solve the imidazole-bound form. These structures were subsequently built and refined using Phenix 38 and Coot 39 , with TLS refinement introduced into the closed form. Data collection and refinement statistics are summarized in Table 1. Structural alignments were performed with the Dali server 40 , manually edited and presented with ESPript 41 . Tunnels and the distal pocket were calculated with CAVER 42 , and figures were generated using PyMOL (http://www.pymol.org). Values for the outer shell are given in parentheses.