## Introduction

Bacterial lipids are the key constituent segregating cellular components from the external environment. Bacterial lipids are highly diverse yet there is currently little understanding of the benefits that this diversity provides [1]. Glycerophospholipids are by far the best studied lipids in bacteria and a key branch point for glycerophospholipid biosynthesis is phosphatidic acid (PA), from which a variety of lipids, including phosphatidylglycerol (PG), phosphatidylethanolamine (PE), phosphatidylcholine (PC), diacylglycerol (DAG), and triacylglycerol (TAG), can be made through either the cytidine diphosphate (CDP)-diacylglycerol (DAG) pathway or the Kennedy pathway [2]. PA biosynthesis in bacteria is carried out by a membrane-attached acyltransferase PlsC, the founding member of the large lysophosphatidic acid acyltransferase (LPAAT) family [3, 4].

Aside from phospholipids, the study of bacterial lipid diversity is currently hampered by a lack of knowledge of both the chemical structures of many of these lipids and the identity of genes involved in their synthesis. These gaps have severely hindered our understanding of lipid diversity and the physiological functions of these lipids in bacteria. Once the chemical structure of a lipid is known, analytical strategies can then be devised to detect the lipid in both the natural environment and cell cultures [5]. This can also help to direct studies into the biosynthesis of the lipid, knowledge of which can provide a clearer idea of the likely distribution of the lipid amongst various bacterial classes. A group of poorly studied bacterial lipids are the aminolipids, of which only ornithine lipids have been detected in diverse cultured bacteria since the 1960s [6]. However, it was not until the genes involved in their biosynthesis were elucidated that it became clear how widespread the capacity to produce ornithine lipid really was [7, 8]. Similarly, Sebastian et al. [9] found several uncharacterized aminolipids in marine heterotrophic bacteria, one of which was recently determined to be a glutamine-containing aminolipid, often found in the marine roseobacter group [10]. Both ornithine and glutamine lipids play a key role in the adaptation of cosmopolitan marine bacteria (e.g., the marine SAR11 clade and the roseobacter group) to oligotrophic environments [9,10,11].

In this study, we report the characterization and chemical structure of a novel sulfur-containing aminolipid from the marine roseobacter group, determined using high-resolution accurate-mass spectrometry. This newly identified lipid represents a novel class of sulfur-containing lipids with an aminosulfonate head group. Furthermore, we describe a novel acyltransferase enzyme (SalA), part of the LPAAT family, that is responsible for the biosynthesis of this sulfonolipid. This sulfonolipid appears to be widespread within the roseobacter group, whose members are key players in marine biogeochemical cycles, and to be important for biofilm formation. Furthermore, the salA gene is abundant and actively transcribed in marine surface microbial assemblages.

## Materials and methods

### Bacterial strains and cultivation

All marine bacteria used in this study were cultivated using either marine broth medium (BD Difco™ 2216), ½YTSS medium containing yeast extract (2 g/L), tryptone (1.25 g/L), and sea salts (20 g/L, Sigma-Aldrich) or a defined marine ammonium mineral salts (MAMS) medium [10]. The MAMS medium contained 30 g/L NaCl, 10 mM glucose, 1 mM K2HPO4, 0.75–7.5 mM NH4Cl, 10 mM HEPES buffer (pH 7.6), 1.36 mM CaCl2, 0.98 mM MgSO4, 7.2 µM FeCl2, 84 µM Na2MoO4, 370 nM ZnCl2, 510 nM MnCl2, 97 nM H3BO3, 1.1 µM CoCl2, 12 nM CuCl2, 100 nM NiCl2, 30 nM thiamine, 160 nM nicotinic acid, 97 nM pyridoxine, 73 nM aminobenzoic acid, 53 nM riboflavin, 84 nM pantothenate, 4.1 nM biotin, 1.5 nM cyanocobalamin, and 11 nM folic acid. All cultures were grown at 30 °C aerobically in a shaker (150 r.p.m) unless stated otherwise.

### Intact polar lipid analysis

Lipid extraction from bacterial cultures was carried out using the modified Folch extraction protocol as described previously [10, 12]. Briefly, 1 mL of culture at an OD540 of ~1.0 was collected by centrifugation. Total lipids were then extracted using methanol-chloroform, dried under nitrogen gas and the pellet re-suspended in 1 mL solvent (95% (v/v) liquid chromatography-mass spectrometry (LC-MS) grade acetonitrile and 5% 10 mM ammonium acetate pH 9.2 in water). These lipids were then analysed by LC-MS using a Dionex 3400RS HPLC with a HILIC BEH amide XP column (2.5 µm, 3.0 × 150 mm, Waters) coupled with an amaZon SL ion trap MS (Bruker) via electrospray ionization (ESI) in both positive (+ve) and negative (−ve) ionization mode. Samples were run on a 15 min gradient from 95% (v/v) acetonitrile/5% (w/v) ammonium acetate (in water, 10 mM, pH 9.2) to 70% (v/v) acetonitrile/30% (w/v) ammonium acetate (in water, 10 mM, pH 9.2), followed by a 5 min isocratic run at 70% acetonitrile/30% ammonium acetate, with 10 min equilibration between samples. The flow rate was maintained at 150 μL min–1 and the column temperature at 30 °C. The injection volume was 5 μL for each run. Drying conditions were the same for both ionization modes (8 L min–1 drying gas at 300 °C and nebulizing gas pressure of 15 psi). The end cap voltage was 4500 V in positive mode and 3500 V in negative mode, both with 500 V offset. Data analysis was carried out using the Bruker Compass software package. Unless stated otherwise, base peak chromatograms are presented with an m/z range from 400 to 1000.

High resolution MS identification and fragmentation were carried out using either a quadrupole-time-of-flight MS (Q-TOF, Waters Synapt G2-Si) or an Orbitrap Fusion (Thermo Fisher Scientific) by direct infusion and collision induced dissociation (CID). For the Orbitrap Fusion, the resolution was set at 120 K with CID for MSn. A TriVersa Nanomate nanospray source (Advion, NY) was used and the flow rate was 300 nL min−1. The voltage was set at 1.4 kV and the gas pressure was 0.3 psi. Sheath and sweep gas were set to zero, the cone voltage was 2100 V and the mass range was from 50 to 1000 Da. MS data were analyzed using Xcalibur (Thermo Fisher Scientific). For Q-TOF, samples were injected through a Universal NanoFlow Sprayer (Waters) by direct infusion at 200–300 nL min−1 and the cone voltage was 30 V in negative mode ESI. Mass range was set from 50 to 1000 Da and data analyses were carried out in MassLynx (Waters). The most abundant peak in the negative ion spectrum corresponding to the SAL lipid (m/z 656.6) was selected for MSn fragmentation. Spectra were obtained in profile mode and smoothed using a moving mean. Background correction using a linear baseline was applied with a 40% noise cut-off. For accurate mass determination, the centroid of each peak was used. The peak corresponding to C17H33COO− (m/z 281.2480, an 18:1 fatty acid carboxylate anion) was used as a lock mass. Calculation of candidate elemental formulae from the accurate mass considered formulae containing C0–100, H0–100, N0–100, S0–4, and P0–1. A conservative mass error of 100 ppm was assumed.

### Marker-exchange mutagenesis

Marker-exchange mutagenesis was carried out as described previously using a suicide vector pK18mobsacB [10]. Briefly, DNA fragments corresponding to an upstream element and a downstream element that flank the target gene were amplified by PCR using high-fidelity Phusion DNA polymerase. A Gm-resistance cassette was amplified from plasmid p34S-Gm [10, 13]. These fragments, together with the linearized pK18mobsacB vector were then assembled through Gibson cloning and transformed into competent Escherichia coli DH5α cells. The engineered suicide vector was then extracted from E. coli DH5α and transformed into the conjugation donor strain E. coli S17.1 λpir before conjugating into Ruegeria pomeroyi DSS-3 as described previously [10]. Transconjugants were then selected on defined MAMS medium containing gentamycin (Gm, 10 μg mL−1). All mutants were confirmed by PCR using the confirmation primers (Supplementary Table 1) and subsequent Sanger sequencing.

### Transposon library of Phaeobacter inhibens DSM 17395

A library of 5500 transposon mutants of Phaeobacter inhibens DSM 17395, which was established at the DSMZ, served as a basis to identify genes involved in the biosynthesis of the novel SAL lipid. Transposon mutagenesis was performed with the EZ-Tn5<R6Kγori/Kan-2> Tnp Transposome kit (Epicentre, Illumina, CA, USA) and the insertion site of all mutants was determined via arbitrary PCR [14]. Transposon mutants were streaked out three times to eliminate attached wild type cells. The absence of wild type cells and the presence of the 65 kb plasmid were validated as described previously [14,15,16]. The transposon integration site of each mutant was also confirmed via sequencing of the amplification PCR product, and stable maintenance of all three extrachromosomal elements was validated via diagnostic PCR [17].

The transposon mutant #1036 of P. inhibens DSM 17395 (PGA1_c01210), which is unable to produce the SAL lipid, was complemented using the salA homologs of Ruegeria pomeroyi DSS-3 (locus tag SPO0716) and P. inhibens DSM 17395 (locus tag PGA1_c01210). Complementation was carried out by PCR amplification of the salA homologs together with a constitutive promoter (~250 bp upstream of the aacC1 gene from plasmid p34S-Gm [13]), which was then cloned into the broad host range vector pBBR1MCS and transformed into the salA mutant of P. inhibens DSM 17395 by conjugation as described previously [9, 10]. The complemented mutants were cultivated using marine broth medium and cells were harvested for lipidomics analysis as described above.

### Biofilm assays

To grow biofilms of Phaeobacter inhibens DSM 17395 and the salA mutant, bacterial cells from the post-exponential growth phase were washed and diluted in fresh marine broth medium and inoculated at an OD590 nm of 0.2 into 24-well plates (Corning Incorporated Costar®, New York, NY, USA) containing a sterilized glass coverslip in each well. At each time point (3, 24, and 48 h), biofilms were washed to remove non-adherent bacteria and fixed using 3.5% (v/v) formalin for 20 min. Bacteria were stained using DAPI (5 μg mL−1, Sigma-Aldrich, Darmstadt, Germany) and coverslips were mounted with a drop of Mowiol antifade before observation using confocal laser scanning microscopy (CLSM) (Zeiss LSM 880, Göttingen, Germany). The biovolume and the average thickness of the biofilms were determined using COMSTAT software developed in MATLAB R2017a (MathWorks, Natick, MA, United States) as described previously [18, 19]. To test for statistically significant differences between the wild-type strain and the salA mutant, a t-test was performed using SPSS 13.0 (IBM, Armonk, NY, USA).

A crystal violet biofilm assay was also performed which was adapted from Guillonneau et al. [18]. Bacterial biofilms were developed in 96-well microtiter plates (Greiner Bio-One, Kremsmünster, Austria) with bacteria in the post-exponential growth phase using marine broth medium. Cells were diluted to a final OD590 nm = 0.1 into each well (n = 4 for both the wild type and the salA mutant) and grown in static conditions at 30 °C. At each time point (3, 24, 48, and 72 h) samples were washed three times with fresh marine broth medium and dried for 30 min at 50 °C. Biofilms were then stained for 15 min with 200 μL crystal violet 0.01% (w/v), rinsed three times with phosphate-buffered saline and dried for 10 min. Biofilm quantification was performed by releasing the stain from the biofilm using absolute ethanol for 10 min at 30 °C with gentle shaking. The absorbance of the crystal violet in solution was measured at 595 nm. The final absorbance of each sample was calculated by subtracting the blank (i.e., marine broth medium only treated with crystal violet, n = 4).

### Bioinformatics analysis

Phylogenetic analysis of 16S rRNA genes from Rhodobacteraceae was carried out using the full length 16S rRNA gene retrieved from the Integrated Microbial Genomes (IMG) database (https://img.jgi.doe.gov/). Sequence alignments of 16S rRNA genes and LPAAT genes (also retrieved from IMG) were performed using Muscle and phylogenetic analyses were performed with MEGA7.0 [20] with 500 bootstrap replicates. Sequence alignments were visualized using JalView [21].

To search for SalA homologs in the Tara metagenome/metatranscriptomics datasets, we used the Ocean Gene Atlas (OGA) database OM_RGCv2_metaG (metagenomics) and OM-RGCv2_metaT (metatranscriptomics) with e-value cut-off of e−40 [22]. Abundance was normalized as a percentage of the median mapped read abundance of genes/transcripts of ten prokaryotic single-copy marker genes [23]. Taxonomic distribution of homologs was displayed using Krona in the OGA interface.
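The normalization step described above can be illustrated with a minimal sketch. The function name and every abundance value below are hypothetical and serve only to show the arithmetic; they are not taken from the Tara Oceans datasets or from the OGA interface.

```python
from statistics import median


def percent_of_marker_median(gene_abundance, marker_abundances):
    """Express a gene's abundance as a percentage of the median
    abundance of the single-copy marker genes in the same sample."""
    if not marker_abundances:
        raise ValueError("at least one marker gene abundance is required")
    return 100.0 * gene_abundance / median(marker_abundances)


# Hypothetical mapped-read abundances for ten single-copy marker genes.
markers = [950, 1010, 990, 1005, 1000, 980, 1020, 1000, 995, 1015]
# Hypothetical summed abundance of salA homologs in the same sample.
sala_abundance = 30

print(percent_of_marker_median(sala_abundance, markers))
```

Because each marker gene is expected to occur once per genome, the resulting percentage can be read as a rough estimate of the fraction of cells carrying the gene of interest.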

The genomes of the marine roseobacters used in this study were downloaded from the NCBI database. These comprised nine strains that were found to produce SAL and two strains (Stappia stellulata DSM 5886 and Dinoroseobacter shibae DFL12) that did not. In order to identify genes potentially involved in SAL synthesis, each gene from the 11 genomes was assigned to an orthologous group using the eggNOG mapper [24]. This program conducts a BLAST search of each sequence against the eggNOG database [25] of orthologous genes, with the query sequence being annotated with the same orthologous group as the best BLAST hit. Orthologous groups that were present in the genomes of all SAL-producing strains but absent from the genomes of S. stellulata and D. shibae were considered to be potentially involved in SAL synthesis.
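The presence/absence filter described above amounts to a set intersection followed by a set difference. The sketch below assumes that each genome has already been reduced to a set of orthologous group identifiers; the strain names and OG labels are hypothetical placeholders, not eggNOG accessions.

```python
def candidate_ogs(producers, non_producers):
    """Return OGs present in every producer genome and absent from
    every non-producer genome.

    Both arguments map a strain name to the set of OG identifiers
    assigned to that strain's genes.
    """
    if not producers:
        raise ValueError("need at least one producer genome")
    # OGs shared by all producers (the "core" set).
    core = set.intersection(*producers.values())
    # Any OG seen in at least one non-producer is excluded.
    seen_in_non_producers = set().union(*non_producers.values())
    return core - seen_in_non_producers


# Hypothetical toy input.
producers = {
    "strain_A": {"OG1", "OG2", "OG3"},
    "strain_B": {"OG1", "OG2", "OG4"},
}
non_producers = {"strain_X": {"OG1", "OG5"}}

print(sorted(candidate_ogs(producers, non_producers)))
```

An OG that survives this filter is only a candidate: the filter says nothing about function, which is why annotation and mutagenesis are still needed.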

Abundance data of SalA homologs from four depths derived from the Tara metagenomics/metatranscriptomics datasets were tested for normal distribution using a Shapiro–Wilk test. Significant differences between depths were tested for using a Kruskal–Wallis test followed by a post-hoc Dunn’s test using Holm’s correction for multiple comparisons. All statistical analyses were performed in RStudio (version 1.3) using R (version 4.02).
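The analysis itself was run in R; the Python sketch below is not that code and covers only the Holm step-down adjustment applied to the pairwise p-values (four depths give six pairwise comparisons). The function name and the p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment of a list of raw p-values.

    Returns the adjusted p-values in the same order as the input.
    """
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the six pairwise depth comparisons.
raw = [0.001, 0.020, 0.300, 0.004, 0.045, 0.700]
print(holm_adjust(raw))
```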

### In silico homology modeling and docking studies for SalA

A SalA homology model was generated using the Phyre2 protein folding prediction server [26], and the lyso-SAL lipid was drawn in MarvinSketch (v19.10.0, 2019, ChemAxon for Mac) and exported as a Mol SDF format file. The homology model was built using the structure of the lysophosphatidic acid acyltransferase PlsC (PDB code 5KYM [4]). The SalA protein model was then imported into Flare (v3.0, Cresset) for docking the lyso-SAL substrate and energy minimized with 2000 iterations and a cut-off of 0.200 kcal/mol/Å. The lyso lipid was imported as a ligand and energy minimized in Flare before being docked into the active site, and the best scoring pose was selected.

## Results

### A new sulfur-containing aminolipid is found in Ruegeria pomeroyi DSS-3

During LC-MS analysis of lipid extracts from Ruegeria pomeroyi DSS-3 grown on ½ YTSS medium, two prominent peaks eluting around 3.5 min were found in both negative and positive ionization mode (Fig. 1). The most prominent ions in the two peaks had m/z values of 656.6 and 672.7 in the negative ionization mode, respectively. Other major lipids identified in this bacterium include two phospholipids, PG and PE and two aminolipids, ornithine lipid (OL) and glutamine lipid (QL) [10].

To elucidate the structure of the new lipids eluted at 3.5 min, the most intense species, at m/z 656.4882, was selected for high resolution MS/MS analysis on a quadrupole-time of flight (Q-TOF) mass spectrometer (Fig. 2). At low collision energy (40 eV) the major species formed corresponded to a neutral loss of 282 mass units. This is consistent with the neutral loss of an 18:1 fatty acid. A second peak at m/z 281.2480 is likely the carboxylate anion of an 18:1 fatty acid. Further fragmentation, at higher collision energies (up to 90 eV), yielded a major ion at m/z 237.2159. This ion likely corresponds to a 16:0 fatty acid present as a ketene, which would be consistent with the fragmentation scheme proposed for ornithine lipids and glutamine lipids [27]. These results therefore suggest a lipid class with a similar fatty acyl backbone structure to the aminolipids, such as ornithine and glutamine lipid [10]. The glutamine lipid (QL, [M+H]+ m/z 719.7) and ornithine lipid (OL, [M+H]+ m/z 705.7) were eluted at 9.5 and 12.5 min, respectively (Fig. 1). The formation of these novel lipids at ~3.5–4 min was not affected in the olsA or glsB mutants of R. pomeroyi DSS-3 (Supplementary Fig. S1). The olsA and glsB genes in R. pomeroyi DSS-3 are essential for the production of the nitrogen-containing ornithine/glutamine lipids [10].

Prominent peaks at 80 and 81 m/z, respectively, were apparent in the fragmentation spectrum obtained at 90 eV collision energy (Fig. 2c). The accurate masses of these ions were 79.9568 and 80.9643. Of the candidate formulae within 100 ppm of the measured mass, $\mathrm{SO}_3^-$ and $\mathrm{HSO}_3^-$ appear most plausible, with mass errors of 0.182 ppm and 4.194 ppm, respectively. A smaller peak doublet at m/z 63.9611 and 64.9692 was also present in the 90 eV spectrum. These masses are unambiguously assigned to $\mathrm{SO}_2^-$ (mass error 12.506 ppm) and $\mathrm{HSO}_2^-$ (mass error 8.08 ppm). Taken together, these results demonstrate the presence of a sulfonate group in the lipid. An ion at 136.0045 m/z corresponded to the deprotonated head group. The mass determined here is larger than that of deprotonated taurine (m/z 124). Since the head group includes a sulfonate ($\mathrm{SO}_3^-$) group, the plausible formula most closely corresponding to the accurate mass is C3H6NSO3 (Table 1). This is consistent with the structure being aminopropane sulfonic acid, although the position of the amino group cannot be unequivocally determined by mass spectrometry (Fig. 2). The proposed fragmentation scheme is presented in Fig. 3.
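The mass errors quoted above follow from the usual definition, error (ppm) = (measured − theoretical) / theoretical × 10^6. The sketch below recomputes them from standard monoisotopic atomic masses. It sums neutral-atom masses and applies no electron-mass correction for the negative charge; that is an assumption made here, and with it the results agree closely with the reported values. The function and dictionary names are illustrative only.

```python
# Monoisotopic atomic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {
    "H": 1.00782503,
    "O": 15.99491462,
    "S": 31.97207117,
}


def formula_mass(composition):
    """Sum of monoisotopic atomic masses for an element -> count mapping.
    Assumption: no electron-mass correction is applied for the charge."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Measured m/z values are those given in the text.
ions = {
    "SO3-": ({"S": 1, "O": 3}, 79.9568),
    "HSO3-": ({"H": 1, "S": 1, "O": 3}, 80.9643),
    "SO2-": ({"S": 1, "O": 2}, 63.9611),
    "HSO2-": ({"H": 1, "S": 1, "O": 2}, 64.9692),
}

for name, (composition, measured) in ions.items():
    theoretical = formula_mass(composition)
    error = abs(ppm_error(measured, theoretical))
    print(f"{name}: theoretical {theoretical:.4f}, error {error:.1f} ppm")
```

All four errors fall well inside the 100 ppm window used for the candidate formula search.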

To further confirm the presence of an amino-group in the hydrophilic head of this SAL, we cultivated Ruegeria pomeroyi DSS-3 in a chemically defined marine ammonium mineral salts (MAMS) medium using 15N-ammonium as the sole nitrogen source. Indeed, the 15N-labeled SAL was readily observed in the lipid extract resulting in a shift of m/z from 656.4951 to 657.4903 (Supplementary Fig. S2a), whereas the non-nitrogen containing lipids, such as PG were not labeled by 15N as expected (Supplementary Fig. S2b). The incorporation of the 15N isotope into the head group of SAL was confirmed by MSn (Supplementary Fig. S2c, d). We also performed the same MSn analysis on the m/z 672.4875 species as well as the 15N-labeled m/z 673.4852 species. Loss of 282 at MS2 (672.4875→390.2317; 673.4852→391.2285) suggests the R2 fatty acid was C18:1. Therefore, the data suggest that the lipid species eluted immediately after the m/z 656.6 species is likely a hydroxylated SAL, and the proposed fragmentation scheme is presented in Supplementary Fig. S3.
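The arithmetic behind the labeling result can be checked with a short sketch. Replacing one 14N atom with 15N adds about 0.997 Da, so the observed m/z shift divided by that difference estimates the number of nitrogen atoms in the ion. The same script compares the observed neutral loss with the monoisotopic mass of an 18:1 fatty acid (C18H34O2). The function names are illustrative; the m/z values are those given in the text.

```python
# Monoisotopic atomic masses (Da).
MASS_H = 1.00782503
MASS_C = 12.0
MASS_O = 15.99491462
MASS_14N = 14.00307401
MASS_15N = 15.00010890

# Mass added per nitrogen atom on 15N labeling.
DELTA_15N = MASS_15N - MASS_14N


def nitrogen_count(unlabeled_mz, labeled_mz):
    """Estimate the number of N atoms from the 15N-induced m/z shift
    of a singly charged ion."""
    return round((labeled_mz - unlabeled_mz) / DELTA_15N)


def neutral_loss(precursor_mz, fragment_mz):
    """Mass lost between a precursor ion and its fragment."""
    return precursor_mz - fragment_mz


# Monoisotopic mass of an 18:1 fatty acid, C18H34O2.
FATTY_ACID_18_1 = 18 * MASS_C + 34 * MASS_H + 2 * MASS_O

print(nitrogen_count(656.4951, 657.4903))   # SAL
print(nitrogen_count(672.4875, 673.4852))   # putative hydroxylated SAL
print(neutral_loss(672.4875, 390.2317), FATTY_ACID_18_1)
print(neutral_loss(673.4852, 391.2285), FATTY_ACID_18_1)
```

A shift corresponding to a single nitrogen atom is what a head group with one amino group would be expected to produce.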

### The sulfur-containing aminolipid is found in a range of marine roseobacters

To investigate the presence of SAL amongst roseobacters we selected 16 strains, in addition to R. pomeroyi DSS-3, to obtain a wide coverage of the roseobacter group including the model roseobacter bacterium Phaeobacter inhibens DSM 17395 (Fig. 4). The selected strains included Stappia stellulata, which served as an outgroup; recent phylogenetic studies indicate that it is not a member of the Rhodobacteraceae [28]. These strains were each grown in marine broth overnight, before cells were harvested for lipid analysis. SAL was detected in all the strains tested apart from S. stellulata and Dinoroseobacter shibae (Fig. 4a). The separation of these two strains from the remaining roseobacter sequences is in line with previous results showing D. shibae branching deeply within the Rhodobacteraceae phylogeny [29].

### Comparative genomics to determine genes involved in SAL biosynthesis

We then conducted a comparative genomics investigation into the roseobacter strains whose lipid profiles had been analysed. We reasoned that synthesis of the SAL would require an N-acyltransferase activity to acylate aminopropane sulfonic acid, analogous to that mediated by OlsB and GlsB in the synthesis of ornithine and glutamine lipid [8, 10]. We investigated predicted N-acyltransferases that were present in all the strains that produced SAL in marine broth (the “producers”), while being absent from the strains that did not produce SAL (the “non-producers”). We assigned all the genomic sequences from the nine genome-sequenced producer strains and two non-producer strains to orthologous groups (OGs) using the eggNOG-mapper software [24], which provides a consistent pipeline for sequence annotation and OG assignment by comparison to the eggNOG database [24]. We identified a group of 1417 “core” genes which were present in the genomes of all SAL producer strains, of which 1060 were also present in the two non-producers (Fig. 4b). Thirty-seven candidate genes were present in all SAL producer strains but not in the genomes of the non-producers (Fig. 4b), two of which (OG accession numbers 08UX5 and 05CDD) were annotated as being potential acyltransferases (Table 2). We therefore generated mutants in these two genes in the two model bacteria, R. pomeroyi DSS-3 and P. inhibens DSM 17395, and screened for the loss of SAL production. The 08UX5 mutant (locus SPO2471) of R. pomeroyi DSS-3 still produced SAL to the same level as the wild type (data not shown), suggesting that this gene is unlikely to be involved in SAL formation. However, in the 05CDD mutant of P. inhibens DSM 17395 (locus tag PGA1_c01210), SAL formation was completely abolished, suggesting that this gene is indeed responsible for SAL biosynthesis (Fig. 4c). This gene is named salA hereafter. Indeed, when the mutant was complemented with either salA from R. pomeroyi DSS-3 (SPO0716) or P. inhibens DSM 17395 (PGA1_c01210), SAL production was restored (Fig. 4d).

SalA is a putative O-acetyltransferase-like protein with a recognized LPAAT (lysophosphatidic acid acyltransferase) domain. Amongst bacterial LPAAT-domain containing proteins, the best characterized examples are PlsC and OlsA, the enzymes responsible for the final step in the biosynthesis of the anionic phospholipid phosphatidic acid (PA) and the ornithine/glutamine-containing aminolipids, respectively [3, 10, 30]. The structure of PlsC has recently been solved, showing an in silico docked LPA lipid together with the fatty acid in an acyl carrier protein (ACP, [4]). Multiple sequence alignments of SalA, PlsC, and OlsA show the presence of two conserved sequence motifs (Fig. 5), representing the catalytic center (HX4/5D) and the substrate co-ordination center (FP[E/S]G[T/V]), respectively. Notably, both PlsC and OlsA have the conserved HX4D motif whereas SalA has the HX5D motif. Interestingly, the reported key Lys105 in PlsC, thought to be responsible for electrostatic interactions via its amide nitrogen backbone to the negatively-charged oxygen of the ACP-fatty acid intermediate, is replaced with Arg135 in SalA. The LPA phosphate head group is thought to be coordinated by Arg159 in PlsC. However, the sequence alignment shows a Val189 at the corresponding position in SalA. In order to further investigate the implications of the sequence alignment, we obtained a homology model of SalA. The model shows the catalytic HX5D motif to be structurally comparable to that of PlsC despite the additional residue, with His109 and Asp115 adjacent to each other, analogous to the arrangement in PlsC (Supplementary Fig. S4). In silico docking of a lyso-SAL lipid molecule into the model demonstrated a possible pose for the lyso-lipid hydroxy group adjacent to His109 (Supplementary Fig. S4), with Arg135 suggested to coordinate the sulfonate head group. The conformationally flexible alkyl chain group was able to adopt many configurations, but the polar head group was docked consistently in the same region. Overall, the data suggest a diversification in function of LPAAT family enzymes during evolution, with SalA representing a novel member of this group. The presence of this unique HX5D motif in SalA allowed us to determine the distribution of SAL biosynthesis in environmental metagenomes and metatranscriptomes (see below).
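The difference between the two catalytic motifs is simply the number of residues between the histidine and the aspartate, which can be expressed as a pair of regular expressions. The sketch below is illustrative only: the sequences are made-up fragments, not SalA, PlsC, or OlsA sequences, and a plain pattern search over a full-length protein would also return chance matches, so in practice the motif should be checked at the aligned catalytic position.

```python
import re

# H, then exactly four (or five) residues of any kind, then D.
HX4D = re.compile(r"H.{4}D")
HX5D = re.compile(r"H.{5}D")


def catalytic_motif_spacing(sequence):
    """Return the motif spacings ("HX4D", "HX5D") found in a protein
    sequence given in one-letter code."""
    found = []
    if HX4D.search(sequence):
        found.append("HX4D")
    if HX5D.search(sequence):
        found.append("HX5D")
    return found


# Hypothetical fragments used only to exercise the patterns.
print(catalytic_motif_spacing("AAHGGGGDAA"))
print(catalytic_motif_spacing("AAHGGGGGDAA"))
```

The HX5D spacing matches the residue numbering given for SalA, in which His109 and Asp115 are separated by five residues.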

### SAL production in Phaeobacter inhibens DSM 17395 is involved in biofilm formation

We next investigated the role of SAL lipids in the physiology of roseobacters. The loss of SAL lipids had no clear effect on the growth of the bacterium. Both the wild type and the salA mutant of Phaeobacter inhibens DSM 17395 had comparable growth rates and reached similar final cell density in marine broth medium (Supplementary Fig. S5a). An important change of lifestyle for roseobacters is the switch from planktonic growth to biofilm formation, which triggers a particle-associated life strategy that is ecologically relevant for their survival in the natural environment [31]. It has been shown previously that many roseobacters including Phaeobacter inhibens DSM 17395 are able to form a biofilm, and a 65 kb plasmid in this bacterium was important for biofilm formation [15, 16]. Interestingly, we observed that the salA mutant has a significantly reduced ability to form biofilms when in contact with solid surfaces, such as glass (Fig. 6) and plastics (Supplementary Fig. S5b). Both the biovolume on the glass surface and the thickness of the biofilm are significantly reduced in the salA mutant strain in the early phase of biofilm formation (3 h) and the later stages (24 and 48 h) of biofilm maturation (Fig. 6). The 65 kb biofilm plasmid was confirmed to be present in the salA mutant (Supplementary Fig. S5c). Thus, the significantly reduced ability of the salA mutant to form biofilms suggests that this lipid may play a key role in roseobacters in their natural environment.

### Distribution of the new acetyltransferase SalA in the Tara Ocean metagenomes and metatranscriptomes

To better understand the distribution of SAL in environmental microbial assemblages, we searched the Tara Oceans metagenomics and metatranscriptomics datasets using SalA (locus tag SPO0716 of R. pomeroyi DSS-3) as the query. We experimentally determined an e-value cut-off of e–40, at which the search selectively retrieves LPAAT homologs belonging to SalA but not OlsA or PlsC. The environmental SalA homologs obtained from the Tara Oceans metagenomics and metatranscriptomics datasets were aligned, and the key sequence motifs were manually examined. In particular, the HX5D motif is strictly conserved in all SalA sequences retrieved from the Tara Oceans datasets, providing strong support that these environmental sequences belong to the SalA clade and not to the PlsC or OlsA clades. On average, 2–4% of microbial cells are estimated to have the potential for SAL biosynthesis; this is comparable to that of the olsA gene but somewhat lower than the plcP gene in the same dataset, suggesting SAL biosynthesis is less prevalent than the PlcP-mediated lipid remodeling pathway [9, 10]. This is likely due to the fact that SALs are primarily found in marine roseobacters but not in other dominant marine Alphaproteobacteria, such as the abundant bacterium Pelagibacter ubique of the SAR11 clade, which is capable of PlcP-mediated lipid remodeling [9, 11]. Indeed, the majority (>85%) of the SalA sequences were classified as members of the Rhodobacteraceae in both Tara Oceans metagenomes and metatranscriptomes (Fig. 7), and a thorough search of 120 genome-sequenced Rhodobacteraceae confirmed the wide occurrence of salA in all ten clades of the roseobacters (Supplementary Fig. S6 [32]).

## Discussion

Here, we identify a novel aminolipid containing an aminopropane sulfonic acid head group that is widespread amongst marine roseobacters. The presence of a sulfonate group means this SAL lipid also falls into the broad category of sulfonolipids. The most abundant, and arguably one of the best studied lipids of this type, is sulfoquinovosyl diacylglycerol (SQDG), which is present in the membranes of most oxygenic phototrophs [33] as well as some heterotrophic bacteria [34]. SQDG likely plays a structural role in photosynthetic membranes, since crystal structures of photosystem proteins show specific binding of this lipid [35].

Other sulfolipids appear to elicit potent responses when certain organisms are exposed to them. Thus, a sulfolipid produced by zooplankton from a number of copepod species was found to induce toxin production in the dinoflagellate Alexandrium minutum [36], likely as a defense against predation. Conversely, a sulfonolipid produced by the Bacteroidetes bacterium Algoriphagus machipongonensis induced the development of multicellularity in a choanoflagellate [37]. Both examples suggest that sulfolipids are used by the sensing organism as a marker for the presence of another organism with which it interacts (either as a predator or as a symbiont). The fact that sulfolipids appear to be relatively rare across the tree of life likely makes them well suited to mediate such chemical interactions, where a high degree of specificity is required. Lipids similar to those produced by A. machipongonensis have been described in a number of Bacteroidetes, particularly amongst Cytophaga [38,39,40]. They tend to be localized to the outer membrane, and seem to play a role in the gliding motility of these organisms [41, 42]. The sulfonolipids from Bacteroidetes differ from those that we describe here in roseobacters in that they are composed of a base, termed capnine, similar to the sphingoid bases of sphingolipids, which may be N-acylated to form the full sulfonolipid [38]. In this way they are similar structurally to sphingolipids, whereas the SALs of the roseobacter group are more similar to aminolipids such as ornithine lipid and glutamine lipid (Fig. 5). Whether the SAL lipid plays a role in interspecies interactions requires further work. However, we already observed that this lipid is involved in biofilm formation in Phaeobacter inhibens DSM 17395 (Fig. 6), suggesting that formation of this SAL lipid may play an important role in the adaptation of marine roseobacters to a biofilm lifestyle.

A survey of the distribution of SAL among isolates from the roseobacters indicated that the ability to produce this lipid is widely distributed within the group. One strain, D. shibae, taxonomically the most basal of the strains examined, lacked any SAL under the conditions assessed, as did the outgroup strain Stappia stellulata. The absence of SAL in these strains suggests they lack the capacity to produce this lipid as the other roseobacters examined seem to produce SAL constitutively. However, it is possible that these strains have the capacity to produce SAL, but only do so under certain conditions. This pattern is observed for ornithine lipid, which is produced constitutively in some bacteria, such as R. pomeroyi DSS-3 [10], but in others is only produced as a response to P-depletion [11, 43]. Indeed, a close salA homolog was found in the genome of D. shibae (Dshi_0206), but it is absent in S. stellulata.

Although we have identified the LPAAT enzyme, SalA, involved in the last step of synthesis of this new sulfur-containing aminolipid, the key steps and genes involved in the synthesis of the lyso-SAL lipid remain to be determined. It is likely that SAL synthesis occurs in a manner analogous to that of ornithine and glutamine lipids. As such, 3-hydroxy fatty acids would be required as a substrate for the first step in SAL synthesis [44]. Such a hypothesis suggests that the aminopropane sulfonic acid moiety is also directly produced by the marine roseobacters since no exogenous supply was provided. The presence of 3-aminopropane sulfonic acid (a.k.a. homotaurine) has been documented in some red algae [45, 46] and unicellular green algae (prasinophytes such as Ostreococcus and Micromonas, [47]) but, to the best of our knowledge, never previously in bacteria. However, a hydroxylated form of 2-aminopropane sulfonic acid, cysteinolic acid, has been found in a variety of marine phytoplankton and heterotrophic bacteria, including Ruegeria pomeroyi DSS-3, although its biosynthetic pathway remains to be established [47]. Nevertheless, it is tempting to speculate that 2-aminopropane sulfonic acid is the hydrophilic head of the new SAL observed in these marine roseobacters, and this certainly warrants further investigation.

To sum up, this study describes a new class of lipids, which are an important component of the membranes of a number of marine Rhodobacteraceae. Comparative genomics of SAL-producing strains has identified a novel acyltransferase (SalA), which is involved in the production of this lipid. salA is widely distributed in marine microbial assemblages in the oceans and actively expressed in Tara Oceans metatranscriptomes, and its functional role beyond biofilm formation in these marine bacteria certainly warrants further investigation.