Ice-nucleating proteins are activated by low temperatures to control the structure of interfacial water

Ice-nucleation active (INA) bacteria can promote the growth of ice more effectively than any other known material. Using specialized ice-nucleating proteins (INPs), they obtain nutrients from plants by inducing frost damage and, when airborne in the atmosphere, they drive ice nucleation within clouds, which may affect global precipitation patterns. Despite their evident environmental importance, the molecular mechanisms behind INP-induced freezing have remained largely elusive. We investigate the structural basis for the interactions between water and the ice-nucleating protein InaZ from the INA bacterium Pseudomonas syringae. Using vibrational sum-frequency generation (SFG) and two-dimensional infrared spectroscopy, we demonstrate that the ice-active repeats of InaZ adopt a β-helical structure in solution and at water surfaces. In this configuration, interaction between INPs and water molecules imposes structural ordering on the adjacent water network. The observed order of water increases as the interface is cooled to temperatures close to the melting point of water. Experimental SFG data combined with molecular-dynamics simulations and spectral calculations show that InaZ reorients at lower temperatures. This reorientation can enhance water interactions, and thereby the effectiveness of ice nucleation.


XPS experiments were performed on a Kratos AXIS Ultra DLD instrument equipped with a monochromatic Al Kα X-ray source (hν = 1486.6 eV). All spectra were collected in hybrid mode at a take-off angle of 0° (angle between the sample surface plane and the axis of the analyzer lens). The spectra were collected at fresh spots on the sample (n = 3) and were charge corrected to the C1s aliphatic carbon binding energy at 285.0 eV, after which a linear background was subtracted for all peak area quantifications. An analyzer pass energy of 160 eV was used for compositional survey scans. High-resolution scans of the C1s and N1s regions were collected at an analyzer pass energy of 20 eV. Compositions and fits of the high-resolution scans were produced in CasaXPS. XPS samples were prepared by Langmuir-Schaefer (LS) deposition of the INP9R protein from the air/water interface onto silicon substrates. The substrates were cleaned immediately before use by 15 min sonication in dichloromethane (DCM), acetone, and ethanol and dried with nitrogen. The protein was injected at 7 µM in the trough used for SFG experiments and allowed to equilibrate for 30 min at each given temperature before deposition. After deposition, the samples were left to dry overnight in a dark glovebox under a slow stream of nitrogen.
Supplementary Results 1
XPS is a surface analytical technique that provides precise atomic-level compositions of approximately the first 10 nm of a surface. 17,18,19 In short, the irradiated sample emits photoelectrons when energy is transferred from the incident photon to core-level electrons. The energy of the emitted photoelectron is directly related to the atomic and molecular environment of the electron, and the number of photoelectrons of a given type relates to the concentration of the corresponding element. The depth of XPS analysis is governed by the inelastic mean free path (IMFP) of an emitted photoelectron, that is, the average distance that a photoelectron travels between successive inelastic collisions. Any photoelectrons that escape the surface with energy loss contribute to the background. All elements except hydrogen and helium that are present in quantities greater than 0.1 atomic percent can be identified with XPS. Previously, XPS has been used to determine elemental compositions on flat surfaces as well as monolayer coverage of proteins. 17,18,20,21,22,23,24,25,26,27 In this study, we used XPS to follow the surface coverage of a protein (InaZ9R) at the air-water interface. The protein was removed from the air-water interface by LS deposition onto silicon substrates. The samples were then analyzed by XPS for the amount of protein, which directly relates to the amount of protein at the air-water interface. The amount of protein at the air-water interface was measured at 5 °C, 10 °C, and 20 °C, corresponding to the SFG spectra recorded at these temperatures in the main article. Supplementary Figure 1 shows the survey and high-resolution spectra of the LS deposition onto substrates, and Supplementary Table 1 shows the survey spectrum atomic compositions for all expected elements in each sample. Figure 7B shows the C1s high-resolution spectra, and it is clear from the peak at 288.0 eV that there is protein on the substrate surface.
Next, to determine the amount of protein on the surface, we can directly use the atomic percent N1s in Supplementary Table 1: since the protein is the only source of nitrogen, this value is a direct indication of the relative amount of protein on the substrate surface. Because the substrate below the protein is not well measured, likely due to the thickness of the protein and the salt overlayer, the amount of protein relative to the salt buffer can be used to indicate the amount of protein on the substrate surface, which is in turn directly related to the air/water surface of our trough experiments. We observe 8.9 ± 0.3, 9.2 ± 0.5, and 9.0 ± 0.7 atomic percent nitrogen for 5 °C, 10 °C, and 20 °C, respectively. Within the experimental error, the nitrogen percentage is the same at all three temperatures. Furthermore, the counts per second (CPS) measured by XPS in the N1s high-resolution scans (Supplementary Figure 1C) are also very similar. Together, these two pieces of information from XPS show that the amount of INP9R protein is constant for all temperatures measured in this study.
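The statement that the nitrogen percentages agree within experimental error can be checked with a few lines of arithmetic. The sketch below uses only the three values quoted above; treating two values as consistent when their error intervals overlap is an assumption made for illustration, not a criterion stated in the study.

```python
# Nitrogen atomic percent (mean, uncertainty) per temperature in °C,
# as quoted in the text above.
n1s = {5: (8.9, 0.3), 10: (9.2, 0.5), 20: (9.0, 0.7)}


def agree_within_error(a, b):
    """Return True if two (mean, error) intervals overlap (assumed criterion)."""
    (mean_a, err_a), (mean_b, err_b) = a, b
    return abs(mean_a - mean_b) <= err_a + err_b


temps = sorted(n1s)
pairs = [(t1, t2) for i, t1 in enumerate(temps) for t2 in temps[i + 1:]]

# Every pair of temperatures is compared once.
all_agree = all(agree_within_error(n1s[t1], n1s[t2]) for t1, t2 in pairs)

# Largest difference between any two mean values, in atomic percent.
means = [mean for mean, _ in n1s.values()]
max_spread = max(means) - min(means)

print(all_agree, round(max_spread, 2))
```

The largest difference between means is smaller than the smallest combined uncertainty, which is consistent with the conclusion that the surface coverage does not change measurably with temperature.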

2D-IR Spectra
The spectra were calculated based on an amide-I Hamiltonian model. 5 Briefly, we construct a one- and two-exciton Hamiltonian for the amide-I mode of the backbone amide groups in the protein, with couplings that are estimated differently for nearest- and non-nearest-neighbor amide groups. The nearest-neighbor interactions, which are dominated by through-bond effects, are modeled using a parameterized map of an ab initio calculation with the 6-31G+(d) basis set and the B3LYP functional, which provides the coupling as a function of the dihedral angle. 6,7 The non-nearest-neighbor interactions, which are dominated by through-space effects, are estimated with the transition-dipole coupling model. 8 The local-mode IR frequencies are shifted with the same model employed by Lu et al. 9 The two parameters of this model (a reference frequency and a frequency shift per unit C=O bond length), as well as the number of frames over which to average the C=O length, were calibrated to LKα14 data and simulations using the same force field as in the current study, and found to be 1660 cm⁻¹, 500 cm⁻¹/Å and 1, respectively. The equilibrium C=O bond distance was set to 1.229 Å for amide groups with secondary amines, which corresponds to the equilibrium value of a C=O bond in the AMBER99SB-ILDN force field, and to 1.232 Å for amide groups with tertiary amines (prolines), for which the local-mode frequency is also redshifted by 26.3 cm⁻¹ to account for the larger mass of the carbon atom bound to the amide N atom, as compared to the hydrogen atom bound to it in other amino acids.
The Hamiltonian is then diagonalized to obtain the amide-I eigenvalues and eigenvectors, from which the spectroscopic response is calculated.
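The construction and diagonalization of the one-exciton block can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the code used in the study: all pairs are coupled through transition-dipole coupling (the dihedral-angle map for nearest neighbors is not reproduced), the two-exciton block is omitted, and the frequencies, dipoles and positions are hypothetical toy values. The function names are likewise illustrative.

```python
import numpy as np

# Conversion factor: 1 D^2/Å^3 expressed in cm^-1.
D2_PER_A3_TO_CM = 5034.0


def tdc_coupling(mu_i, mu_j, r_i, r_j):
    """Transition-dipole coupling in cm^-1 (dipoles in D, positions in Å)."""
    r = r_j - r_i
    d = np.linalg.norm(r)
    return D2_PER_A3_TO_CM * (
        mu_i @ mu_j / d**3 - 3.0 * (mu_i @ r) * (mu_j @ r) / d**5
    )


def one_exciton_hamiltonian(freqs, dipoles, positions):
    """Local-mode frequencies on the diagonal, couplings off the diagonal."""
    n = len(freqs)
    h = np.diag(np.asarray(freqs, dtype=float))
    for i in range(n):
        for j in range(i + 1, n):
            h[i, j] = h[j, i] = tdc_coupling(
                dipoles[i], dipoles[j], positions[i], positions[j]
            )
    return h


def stick_spectrum(h, dipoles):
    """Eigenfrequencies and IR intensities |sum_i c_i mu_i|^2 of each mode."""
    energies, vecs = np.linalg.eigh(h)   # columns of vecs are eigenvectors
    mode_dipoles = vecs.T @ dipoles      # shape (n_modes, 3)
    return energies, np.sum(mode_dipoles**2, axis=1)


# Hypothetical toy system: four oscillators on a line, one of them redshifted.
freqs = [1660.0, 1660.0, 1633.7, 1660.0]
positions = np.array([[0.0, 0.0, 4.8 * k] for k in range(4)])
dipoles = np.tile([0.35, 0.0, 0.0], (4, 1))

h = one_exciton_hamiltonian(freqs, dipoles, positions)
energies, intensities = stick_spectrum(h, dipoles)
```

Because the eigenvectors are orthonormal, the summed intensity of all modes equals the summed squared local dipoles, which is a convenient sanity check on any implementation of this step.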

SFG Spectra
To account for the azimuthal isotropy of the proteins at the interface, we average over the azimuthal Euler angle from 0 to 2π. For the SFG spectral calculations, the total Lorentzian width was set to 20 cm⁻¹, in accordance with the experimentally determined visible bandwidth of 15 cm⁻¹ plus an inhomogeneous broadening of 5 cm⁻¹. Furthermore, the interfacial refractive indices were all set to 1.18 in accordance with ref. 11.
The orientational SFG investigation employed in this study requires a single (frame of a) hypothetical structure with the z-axis of the molecular frame aligned along a symmetry axis (the long axis of the β-helix in this case) so that the azimuthal averaging can be performed well in the SFG calculations. We did not include the hydrogen-bond-induced shifts applied in the IR calculations (on the MD trajectories), because for a single frame this often leads to spectral distortions, as not all hydrogen-bonded states are sampled well in a single frame. For a good match between the calculated and experimental spectra, the gas-phase frequency was set to 1645 cm⁻¹, 5 cm⁻¹ redshifted with respect to the values used in previous studies in which similar calculations were performed. 12,13,14,15 This redshift is probably necessary to account for the stronger hydrogen bonding within the β-helix as compared to the globular proteins investigated in the other studies. For both θ and Ψ we assumed a Gaussian broadening of the orientation distribution with σ = 10°, to account for the fact that the protein is not expected to have a very well-defined orientation distribution. To keep the spectral calculations as simple as possible, we chose not to include a lipid C=O contribution, but to focus on optimizing the match between the model and the experimental protein signal between 1600 and 1670 cm⁻¹, which is expected to be unaffected by the tail of the lipid peak (the peak centered at 1730 cm⁻¹).
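The two kinds of orientational averaging described above can be illustrated on a much simpler object than the full SFG response. The sketch below averages a unit vector along the molecular long axis over the azimuthal angle, and averages cos θ over a Gaussian distribution with a 10° width. It is only an illustration: the actual calculation averages the hyperpolarizability tensor, the 30° tilt is a hypothetical value, and the plain Gaussian weighting without a sin θ factor is an assumption.

```python
import numpy as np


def gaussian_weights(angles_deg, center_deg, sigma_deg=10.0):
    """Normalized Gaussian weights on a grid of angles (degrees)."""
    w = np.exp(-0.5 * ((angles_deg - center_deg) / sigma_deg) ** 2)
    return w / w.sum()


def azimuthal_average(vec, n_phi=360):
    """Average a lab-frame vector over rotations about z from 0 to 2*pi."""
    phis = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    c, s = np.cos(phis), np.sin(phis)
    x = c * vec[0] - s * vec[1]
    y = s * vec[0] + c * vec[1]
    z = np.full_like(phis, vec[2])
    return np.array([x.mean(), y.mean(), z.mean()])


theta0 = 30.0  # hypothetical tilt angle in degrees
sigma = 10.0   # width of the orientation distribution in degrees

# Gaussian-broadened average of cos(theta) around theta0 (grid spans +/- 4 sigma).
thetas = np.linspace(theta0 - 40.0, theta0 + 40.0, 801)
weights = gaussian_weights(thetas, theta0, sigma)
mean_cos = np.sum(weights * np.cos(np.radians(thetas)))

# Azimuthal average of the tilted long axis: in-plane components cancel.
axis = np.array([np.sin(np.radians(theta0)), 0.0, np.cos(np.radians(theta0))])
avg_axis = azimuthal_average(axis)
```

The in-plane components vanish after azimuthal averaging, which is why only quantities defined relative to the surface normal survive, and the Gaussian broadening slightly reduces the averaged cos θ relative to its value at the central angle.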
The (θ,Ψ) values reported in the main text were determined by starting Levenberg-Marquardt fits with the minima of the 2D-RSS plots taken as the initial guesses. The errors are defined as the values at which the RSS doubles, in a procedure in which one angle was fixed to values away from its optimum and fits were performed with the other fit parameters left free. Under the azimuthal symmetry assumption and for small tilt angles (θ), the uncertainty in the twist angle Ψ is relatively large, because for such angles Ψ becomes increasingly similar to the in-plane rotation angle.
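The RSS-doubling criterion can be sketched as a search along a one-dimensional RSS profile. In the procedure described above, each point of the profile comes from a separate fit with the angle fixed and the other parameters free. Here the profile is a synthetic parabola supplied as input, and the function name is hypothetical.

```python
import numpy as np


def rss_doubling_interval(angles, rss):
    """Return (best angle, lower bound, upper bound) where RSS < 2 * min(RSS)."""
    angles = np.asarray(angles, dtype=float)
    rss = np.asarray(rss, dtype=float)
    i_min = int(np.argmin(rss))
    threshold = 2.0 * rss[i_min]

    # Walk outward from the minimum until the RSS reaches the threshold.
    lo = i_min
    while lo > 0 and rss[lo - 1] < threshold:
        lo -= 1
    hi = i_min
    while hi < len(rss) - 1 and rss[hi + 1] < threshold:
        hi += 1
    return angles[i_min], angles[lo], angles[hi]


# Synthetic profile: minimum RSS of 1 at 40 degrees, doubling 15 degrees away.
angles = np.arange(0.0, 91.0, 1.0)
rss = 1.0 + ((angles - 40.0) / 15.0) ** 2

best, lower, upper = rss_doubling_interval(angles, rss)
print(best, lower, upper)
```

A shallow RSS profile, as expected for Ψ at small tilt angles, gives a wide interval under this criterion, which matches the larger uncertainty noted above.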

Supplementary Results 2
The Effect of Inhomogeneous Broadening on Calculated 2D-IR Spectra
In Supplementary Figure 2, the effect of adding inhomogeneous broadening to the calculated 2D-IR spectra is shown alongside the experimental spectra. One can see that while the peak frequency is already predicted well by applying the spectral calculations to the β-helical model, the match with the spectral shape is better with inhomogeneous broadening applied to the local-mode frequencies. To do this, the spectral calculation was run 50 times on the first frame of the trajectory, each time applying a Gaussian-distributed random (gasdev 10) inhomogeneous broadening of 17.5 cm⁻¹. The IR calculations were performed with a Lorentzian half-width-at-half-max (hwhm) of 7.5 cm⁻¹ and a Gaussian pump hwhm of 2.5 cm⁻¹. The spectra that were not inhomogeneously broadened were calculated by averaging the IR response of 250 equispaced frames of the 10 ns trajectories (thus spaced by 40 ps).
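The averaging over random realizations can be sketched for a linear spectrum as follows. This is an illustration only: it applies independent Gaussian shifts to a set of hypothetical mode frequencies instead of re-diagonalizing a Hamiltonian, it uses NumPy's random generator in place of gasdev, and it interprets the 17.5 cm⁻¹ as the standard deviation of the shifts, which is an assumption.

```python
import numpy as np


def lorentzian(omega, center, hwhm):
    """Area-normalized Lorentzian line shape."""
    return (hwhm / np.pi) / ((omega - center) ** 2 + hwhm**2)


def broadened_spectrum(omega, centers, n_real=50, sigma=17.5, hwhm=7.5, seed=0):
    """Average a stick spectrum over n_real Gaussian-shifted realizations."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(omega)
    for _ in range(n_real):
        shifted = centers + rng.normal(0.0, sigma, size=centers.shape)
        for c in shifted:
            total += lorentzian(omega, c, hwhm)
    return total / n_real


omega = np.linspace(1500.0, 1800.0, 601)        # frequency axis in cm^-1
centers = np.array([1630.0, 1645.0, 1660.0])    # hypothetical mode frequencies

broad = broadened_spectrum(omega, centers)

# Single-mode case, used to compare peak heights with and without broadening.
single = np.array([1645.0])
broad_single = broadened_spectrum(omega, single)
narrow_peak = lorentzian(1645.0, 1645.0, 7.5)

# Frame spacing for 250 equispaced frames of a 10 ns trajectory, in ps.
frame_spacing_ps = 10_000 / 250
```

Averaging over shifted realizations lowers and widens each band without changing its integrated area, which is the qualitative effect seen when the broadened and unbroadened calculations are compared.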

Supplementary Figure 2. Comparing experimental FT-IR and 2D-IR spectra with several methods of spectral calculations for (in the 2D-IR case) a parallel and perpendicular pump-probe polarization. The spectra indicate that the β-helix model leads to the closest match with the experimental spectra, both with and without a random-coil contribution added to the calculations in the form of an inhomogeneous broadening of the local-mode frequencies.

Normal-mode Analysis
To gain more insight into the molecular origin of the observed IR and SFG response, we performed a normal-mode analysis for the low- and high-frequency normal modes that have the strongest IR response, the result of which can be found in Supplementary Figure 3.

Supplementary Note 3: Sample Preparation and Characterization
Supplementary Methods 3

Biochemistry
The InaZ9R construct was prepared essentially as described previously for a 16-repeat InaZ construct. 1 Briefly, a PCR product encoding the N-terminal domain, repeats 1-4 and 63-67 and the C-terminal domain (see Supplementary Table 2 for the primer sequences) was prepared by splicing-by-overlap PCR (SOE-PCR) using a codon-optimized (Genscript) version of the 67-repeat inaZ gene from the P. syringae strain R10.79 as template. 1 The PCR product was cloned into the pET30 Ek/LIC vector according to the manufacturer's instructions (Novagen, Merck Biosciences). This generated a recombinant construct in which the truncated InaZ sequence was fused with a tag sequence containing a His-tag and an S-tag encoded by the vector and a TEV cleavage site (introduced via the cloning primers) at the N-terminus of the construct. This construct was termed InaZ9R (see the full InaZ9R sequence in main text Figure 1B

InaZ Lyophilization of the IR and SFG Samples Assessed with UV-CD
InaZ9R was lyophilized overnight in a ScanVac Coolsafe (Labogene) before all experiments, so that the SFG and IR amide-I spectra would not be affected by the H2O bending mode (1643 cm⁻¹) that overlaps with the amide-I region (1600-1700 cm⁻¹). Normalized UV circular dichroism (UV-CD) spectra before and after lyophilization indicate that the secondary structure of the protein is not affected by this procedure (see Supplementary Figure 5). The UV-CD spectra exhibit a random-coil contribution around 205 nm, as well as a broad peak around 217 nm that probably indicates β-helical content. 2 These UV-CD spectra were recorded with a Chirascan-plus CD spectropolarimeter (Applied Photophysics, Leatherhead, UK), using a 1 mm pathlength quartz cuvette. The InaZ9R was dissolved in PBS buffer at a concentration of 0.05 mg/mL. The spectra were obtained at room temperature over the spectral range of 200 nm to 250 nm, with 0.5 s time-per-point, a step size of 0.2 nm, and a 2 nm bandwidth.

FT-IR and 2D-IR Sample Preparation
The lyophilized InaZ9R powder was redissolved at 0.8 mg/mL in phosphate-buffered saline (PBS; 0.01 M phosphate buffer, 0.0027 M KCl, and 0.137 M NaCl, pD 7.4, Sigma-Aldrich) prepared in D2O (99.9% D, Eurisotop), and 7 μL of the solution was placed between two 1 mm CaF2 windows and sealed with a Krytox vacuum-greased (Duniway) 50 μm spacer in a custom-made IR cell.

SFG Sample Preparation
The samples for SFG measurements were prepared from lyophilized InaZ9R powder in D2O-PBS buffer to avoid interference from H2O bending modes. The samples were prepared in a stainless-steel trough with a quartz window at the bottom, containing approximately 3 mL of solution. Throughout all SFG experiments, the water level was held constant by a syringe pump (New Era Pump Systems Inc.) with the cannula submerged to the bottom of the trough. A solution of D2O-PBS (pD = 7.4) with InaZ9R was added to the trough such that the final protein concentration was 10 µM. Throughout the experiment, a water chiller (Neslab RTE-101) was used with a water-cooled breadboard to adjust the temperature of the protein solution to 20 °C, 10 °C, and 5 °C, as monitored with a submerged digital thermometer (Omega).

Mass Spectral Analysis of Recombinant InaZ
The recombinant InaZ9R produced was assayed by mass spectrometry to confirm protein sequence and (denatured) holo-mass of the produced protein in comparison to the nucleotide sequence of the construct. This can reveal unexpected post-translational and chemical modifications as well as amino-acid substitutions. In addition, we assayed the protein by native-spray mass spectrometry to determine possible complex formation and different folding states of the protein.
To this end, the purified protein was digested with trypsin with chymotryptic activity (owing to the limited number of fully tryptic sites in the protein) and analyzed by data-dependent tandem mass spectrometry on a timsTOF Pro mass spectrometer (Bruker Daltonik, Germany). Data were analyzed by fragment ion searches performed by an in-house MASCOT server (Matrix Science, UK) against the sequence of InaZ alone and in the background of the Escherichia coli proteome database, the latter to assay the preparation for background proteins stemming from the recombinant production of the protein (see Supplementary Figure 6). Searches against the InaZ sequence were conducted with cysteine carbamidomethylation and methionine oxidation set as fixed and variable modifications, respectively. This showed a maximum of 56% sequence coverage of the proposed sequence, which, owing to the limited number of fully tryptic cleavage sites, is good coverage for this protein. In addition, the data were searched against the same InaZ sequence in the background of the E. coli proteome to assay the purity of the preparation. InaZ was the major protein found in the sample (47% coverage), with a number of background proteins from E. coli also identified as minor components. To assay for possible unexpected modifications of InaZ, we also analyzed the data using PEAKS Studio X+ (BSI Bioinformatics) with its de novo search algorithm and found no indication of unexpected post-translational modifications to the protein (data not shown).

Supplementary Figure 6. Sequence coverage of InaZ by tryptic and chymotryptic digest. Purified InaZ subjected to digestion by trypsin and chymotrypsin was analyzed by mass spectrometry. Peptide sequences identified (p<0.05) are shown in bold red in the protein sequence.
In addition to bottom-up analysis of tryptic digests, we also assayed the intact protein by denatured and native spray mass spectrometry on a Synapt G2 mass spectrometer (Waters, UK). In short, InaZ was diluted in 50% acetonitrile, 49.9% water, 0.1% formic acid to an approximate concentration of 1 µM, and sprayed into the mass spectrometer by loading the solution into a glass-tip emitter (New Objective, USA). The mass spectrometer was externally calibrated for a range of 500-5000 m/z by sodium cesium iodide clusters, and source settings were as follows: capillary voltage 1200 V, cone voltage 45 V, source gas 0.3 bar. The raw spectrum is shown in Supplementary Figure 7 and, although it is noisy, we are able to extract numerous components with a mass around 45 kDa (expected mass 43.834 kDa), which is close to the mass of ~45 kDa observed on SDS-PAGE. To ascertain whether InaZ might form different folding structures with differing charge states or form complexes with itself in solution, we also attempted to measure InaZ under native conditions, using the same source settings. In this case, InaZ was diluted to approximately 1 µM in 1 mM ammonium acetate at pH 5.0. This, however, resulted in a poorly defined mass spectrum (data not shown) that could not be interpreted. This could be due to the heterogeneity already apparent in the poorly defined denatured spectrum being exacerbated by multiple folding or aggregation states of the protein under native conditions.

1D-IR (FT-IR) Spectroscopy
The FT-IR spectra were recorded with 32 scans on a Bruker Vertex v70 spectrometer. The sample compartment was purged with N2 gas to avoid absorption lines due to water vapor.

2D-IR Spectroscopy
For the 2D-IR experiment, 794 nm pulses were generated with a Mantis (Coherent) oscillator at an 80 MHz rate, of which 1 kHz were amplified with a Legend (Coherent) optical parametric amplifier to a 3 mJ beam. This light is converted to ~15 μJ of mid-IR (~6100 nm) light, with a full-width-at-half-max (fwhm) of ~150 cm⁻¹, after which it is split into a probe, reference and pump beam at a 5/5/95% ratio. The pump beam is spectrally narrowed to a fwhm of 10 cm⁻¹ by a Fabry-Pérot interferometer and overlapped in space with the probe beam in the sample at a 1.5 ps delay, while the reference beam passes through the sample cell a few mm beside the focus of the probe and pump. The difference absorbance spectrum is then dispersed by an Oriel MS260i spectrograph onto a 32-pixel MCT array with a resolution of 3.9 cm⁻¹ (see supplementary reference 3 for more details).
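The quoted mid-IR wavelength can be related to the amide-I region with a short unit conversion. The sketch below is plain arithmetic on the numbers given above. Reading the 3.9 cm⁻¹ figure as a per-pixel spacing, so that the array spans 32 times that value, is an assumption made for illustration.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a vacuum wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


# Center of the mid-IR pulse (~6100 nm) expressed in cm^-1.
mid_ir_center = nm_to_wavenumber(6100.0)

# Approximate edges of the pulse spectrum for a fwhm of ~150 cm^-1.
pulse_low = mid_ir_center - 75.0
pulse_high = mid_ir_center + 75.0

# Spectral window covered by the detector if 3.9 cm^-1 is the pixel spacing
# (assumption for illustration).
array_span = 32 * 3.9

print(round(mid_ir_center, 1), round(array_span, 1))
```

The pulse center of roughly 1639 cm⁻¹ falls inside the 1600-1700 cm⁻¹ amide-I region, and the pulse bandwidth covers that region completely.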

Supplementary Figure 8. Temperature-dependent 1D-IR (FT-IR, top) and 2D-IR (bottom, with the color bars indicating the differential absorption in ΔOD) spectra for an InaZ9R concentration of 4.3 mg/mL in PBS prepared with D2O, with the pump polarization parallel to the probe polarization.
In Supplementary Figure 8 one can see that the secondary structure does not change for a similar temperature change as employed in the SFG experiments (from +20 °C to 5 °C, on top). Even when going down to -10 °C (bottom) the amide-I spectrum does not change significantly. The minor spectral changes are probably a result of scattering induced by vapor formation due to the low temperature, on the outer side of the CaF2 windows of the IR cell. The sample does not yet freeze at this temperature, probably due to the very small sample volume (7 µL) and the presence of the salt in the PBS buffer.

SFG Spectroscopy
The SFG setup is based on a 7 W, 35 femtosecond laser system (Astrella, Coherent) with pulses centered at 800 nm and a repetition rate of 1 kHz. One part of the output was used to pump an optical parametric amplifier (OPA) with a non-collinear difference frequency generation (NDFG) extension (TOPAS Prime, Light Conversion) to generate broadband (FWHM ~ 300 cm⁻¹) IR pulses tunable over the range 3.3-6.1 µm. A narrowband (FWHM ~ 15 cm⁻¹) visible beam was generated by guiding 1 mJ of the fundamental through a Fabry-Pérot etalon. The visible and IR beams were spatially and temporally overlapped on the surface. The SFG signal was focused into a spectrograph (Shamrock 303i, Andor) and detected by an EMCCD camera (Newton 971, Andor). In the present study, SFG measurements were recorded in the amide-I region (1600-1700 cm⁻¹), the O-D stretching region (2200-2800 cm⁻¹), and the C-H stretching region (2800-3100 cm⁻¹). Spectra in the amide-I region were collected in ssp (s-SFG, s-visible, p-IR) and ppp polarization combinations. Spectra in the O-D region were recorded in the ssp polarization combination, and spectra in the C-H region were recorded in ssp and sps polarization combinations. The sample stage and IR beam path were flushed with nitrogen to avoid artifacts due to IR light absorption by water vapor. All spectra were background subtracted and normalized using a reference spectrum obtained from gold.

SFG Fitting Methods
The SFG data in the OD region were fit using the following equation:
$$\chi^{(2)}_{\mathrm{eff}}(\omega) = \chi^{(2)}_{\mathrm{NR}} + \sum_q \frac{A_q}{\omega - \omega_q + i\Gamma_q}$$
where $\Gamma_q$, $A_q$, and $\omega_q$ are the full width half max (FWHM), amplitude, and resonant frequency of the $q$th vibrational mode, respectively, and $\chi^{(2)}_{\mathrm{NR}}$ and $\chi^{(2)}_{\mathrm{eff}}$ are the nonresonant background and the effective second-order nonlinear susceptibility tensor, respectively. To determine the error in the measurements, the amplitude was allowed to change while the other fitting components were held the same (this was done to directly compare the amplitude change). We determined the error for the amplitudes by allowing the amplitudes to change until the fit became unreasonable. This error was determined to be 10 percent.
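A Lorentzian line-shape model of the kind commonly used for such fits can be sketched as follows. This is a minimal illustration, not the fitting code used in the study: the sign convention of the damping term, the function names and all numerical values are assumptions. The last lines show how a 10 percent change in amplitude propagates to the intensity of an isolated peak.

```python
import numpy as np


def chi2_eff(omega, chi_nr, amplitudes, centers, widths):
    """Nonresonant term plus a sum of Lorentzian resonances (complex-valued)."""
    omega = np.asarray(omega, dtype=float)
    chi = np.full(omega.shape, chi_nr, dtype=complex)
    for a_q, w_q, g_q in zip(amplitudes, centers, widths):
        chi += a_q / (omega - w_q + 1j * g_q)
    return chi


def sfg_intensity(omega, chi_nr, amplitudes, centers, widths):
    """SFG intensity taken as |chi2_eff|^2 (prefactors omitted)."""
    return np.abs(chi2_eff(omega, chi_nr, amplitudes, centers, widths)) ** 2


# Hypothetical single resonance in the O-D stretch region, no nonresonant term.
omega = np.array([2350.0, 2400.0, 2450.0])
base = sfg_intensity(omega, 0.0, [1.0], [2400.0], [50.0])

# Same resonance with the amplitude increased by 10 percent.
scaled = sfg_intensity(omega, 0.0, [1.1], [2400.0], [50.0])
ratio = scaled[1] / base[1]
```

For an isolated resonance without a nonresonant background, the intensity scales with the square of the amplitude, so a 10 percent amplitude change corresponds to roughly a 21 percent change in peak intensity. With a nonresonant term or overlapping modes, interference makes this relation less direct.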

Supplementary Results 5
SFG Spectra in the C-H Region
Supplementary Figure 9 depicts the C-H stretch region SFG spectra of InaZ9R at the air-water interface at temperatures of 20 °C (left) and 5 °C (right). The observed C-H spectra are indicative of protein side chains, and resonances are observed near 2850, 2870, 2910, 2935, and 2950 cm⁻¹. These are assigned to the CH2 symmetric, CH3 symmetric, CH2 asymmetric, CH3 Fermi, and CH3 asymmetric resonances, respectively. The observed increase in overall intensity upon decreasing the temperature from 20 °C to 5 °C can be attributed to InaZ9R ordering or changing orientation at the interface as the temperature decreases.

Peak-Fitting Parameters of the OD-Stretch Spectra
The parameters obtained from fitting the SFG data from Figure 2 of the main text are shown below in Supplementary Table 3 (for the spectra with InaZ9R) and Supplementary Table 4 (for the spectra without InaZ9R). Note: The peak positions near 1636 and 1668 cm⁻¹ can be assigned to the B2 mode and β-turns/unordered structures of a β-structure protein 16 , and the 1720 cm⁻¹ peak can be assigned to the lipid ester mode 30 . These assignments are also reflected in the normal-mode analysis derived for these peaks (see Supplementary Figure 3).