Sulfonolipids as novel metabolite markers of Alistipes and Odoribacter affected by high-fat diets

The gut microbiota generates a huge pool of unknown metabolites, and their identification and characterization is a key challenge in metabolomics. However, the chemical structures of many metabolites produced by the gut microbiota remain poorly studied. In this investigation, an unusual class of bacterial sulfonolipids (SLs), originally found in environmental microbes, was detected in mouse cecum. We performed a detailed molecular-level characterization of this class of lipids by combining high-resolution mass spectrometry and liquid chromatography analysis. Eighteen SLs that differ in their capnoid and fatty acid chain compositions were identified. The SL called “sulfobacin B” was isolated and characterized, and it was significantly increased in mice fed high-fat diets. To reveal the bacterial producers of SLs, metagenome analysis was performed, and only two bacterial genera, Alistipes and Odoribacter, were found to be responsible for their production. This knowledge helps explain part of the molecular complexity introduced by microbes to the mammalian gastrointestinal tract and can be used as chemotaxonomic evidence in gut microbiota studies.


High pressure liquid chromatography-based separation and fractionation
Fractionation experiments were performed on an Agilent 1290 Infinity LC system using an Acquity Xbridge™ column (5 µm, 4.6 × 250 mm, Waters, Germany). A water/acetonitrile gradient (A, 5 mM ammonium acetate/0.1% acetic acid in water; B, acetonitrile) was used for the fractionation experiments. The gradient was held at 65% (B) for 8.40 min, increased to 99% (B) within 30 min and then held for 2.40 min. Reconditioning was done for 5 min with a pre-runtime of 8 min to 65% (B). The flow rate, column temperature and injection volume were 1 mL/min, 40 °C and 100 µL, respectively. The sample manager was cooled to +4 °C. The fractions were collected every minute with addition of trimethylsilyl-tetradeuteropropionic acid (TSP) as a reference standard.
One-dimensional proton (¹H) NMR spectra were acquired on a Bruker 800 MHz spectrometer (Bruker Biospin, Rheinstetten, Germany) operating at 800.35 MHz with a quadruple inverse cryoprobe at 300 K. A standard 1D pulse sequence [recycle delay (RD)-90°-t1-90°-tm-90°-acquire free induction decay (FID)] was used, with water suppression irradiation during an RD of 2 s, a mixing time (tm) of 100 ms and a 90° pulse of 10.13 μs, collecting 800 scans into 64,000 data points with a spectral width of 12 ppm. In addition, a 2D total correlation spectroscopy (TOCSY) analysis was performed using a ¹H-¹H phase-sensitive, sensitivity-improved 2D pulse sequence with water suppression by gradient-tailored excitation (3-9-19) and DIPSI-2. A total of 19,228 × 1,024 data points were collected using 32 scans per increment, an acquisition time of 1 s and 16 dummy scans. Spectral widths were set to 12 ppm in the F2 and F1 dimensions. Spectra were processed using TopSpin 3.2 (Bruker BioSpin).
FIDs were multiplied by an exponential decay function corresponding to a line broadening of 0.3 Hz (F1) and 2.5 Hz (F2) before Fourier transformation. Manual phasing, baseline correction and calibration to TSP (δ 0.00) were also performed in TopSpin. Chemical shifts, multiplicities and J-coupling constants were compared to Kamiyama et al. 1 and to spectra predicted using ACD/NMR prediction software (ACD/Labs, Toronto, ON, Canada).
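The exponential multiplication described above can be illustrated with a minimal sketch. It assumes the common convention in which a line broadening LB (in Hz) corresponds to the window exp(-π·LB·t). The function names and the dwell time in the usage note are hypothetical and are not taken from the TopSpin processing used in this study.

```python
import math

def exponential_window(n_points, dwell_time_s, line_broadening_hz):
    """Weights exp(-pi * LB * t) sampled at t = k * dwell_time (assumed convention)."""
    return [
        math.exp(-math.pi * line_broadening_hz * k * dwell_time_s)
        for k in range(n_points)
    ]

def apply_line_broadening(fid, dwell_time_s, line_broadening_hz):
    """Multiply each FID point (real or complex) by the exponential window."""
    window = exponential_window(len(fid), dwell_time_s, line_broadening_hz)
    return [point * weight for point, weight in zip(fid, window)]
```

For example, `apply_line_broadening(fid, 1.0e-4, 0.3)` would apply a 0.3 Hz line broadening to a FID sampled with a hypothetical dwell time of 0.1 ms; the Fourier transformation would follow as a separate step.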

Statistical analysis
SIMCA-P version 9.0 (Umetrics, Umeå, Sweden) was used for the principal component analysis.

Metagenomics
Metagenomic analyses were performed only on the C57BL/6NTac mouse group fed a safflower-enriched high-fat diet. In total, 10 metagenomes were prepared: cecal samples from 6 mice were used (6 biological replicates), and 4 of them could be prepared in duplicate (4 technical replicates). Genomic DNA was extracted from cecal luminal content (30 mg) using the NucleoSpin96 for Soil extraction kit according to the protocol. DNA was quantified using the Quant-iT™ PicoGreen® dsDNA Kit. Sequencing was done by applying a whole-genome sequencing approach on the GS-FLX+ Titanium™ sequencing platform from Roche (Roche Diagnostics GmbH, Mannheim, Germany). DNA libraries were prepared for each metagenome from 1 µg of sample DNA following the manufacturer's instructions. After nebulisation, DNA fragments were processed by end repair, adapter ligation and size selection. Products were purified and quantified. Quality assessment of the libraries on an Agilent Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies, Santa Clara, USA) showed fragment lengths of around 1400 bp, and these libraries were taken for further sequencing. Titration yielded a ratio of mainly 6-12 DNA copies per bead. After emulsion PCR and subsequent bead recovery, enrichments of 790 000 DNA beads were pooled per sample and loaded onto each quarter of a PicoTiter-Plate. Long fragments were sequenced by selecting a 200-cycle sequencing run. Metagenome sequence data are available on the Sequence Read Archive (SRA) under BioProject ID PRJNA299870. For quality control, prinseq-lite was used: three bases were trimmed from the 5' end, bases with a quality score <20 in a window of 3 bases were trimmed from the 3' end, and sequences with a mean quality <20 were discarded 2. The minimum length of all sequences was restricted to 150 bp and the maximum length to 500 bp.
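The quality-control criteria above can be sketched as a small filter. This is an illustration under stated assumptions and not the prinseq-lite implementation: the 3' trimming is assumed to remove one base at a time while the mean quality of the last 3 bases is below 20, the order of the steps is assumed, and the function name `trim_and_filter` is hypothetical.

```python
def trim_and_filter(seq, quals, trim_left=3, window=3, min_q=20,
                    min_len=150, max_len=500):
    """Return (seq, quals) after trimming, or None if the read is discarded."""
    # Trim a fixed number of bases from the 5' end.
    seq, quals = seq[trim_left:], quals[trim_left:]
    # Assumed 3' rule: drop the last base while the mean quality of the
    # final `window` bases is below the threshold.
    while len(quals) >= window and sum(quals[-window:]) / window < min_q:
        seq, quals = seq[:-1], quals[:-1]
    # Discard reads outside the allowed length range.
    if not (min_len <= len(seq) <= max_len):
        return None
    # Discard reads whose mean quality is below the threshold.
    if sum(quals) / len(quals) < min_q:
        return None
    return seq, quals
```

A read is passed as a sequence string plus a list of integer quality scores of the same length; reads failing the length or mean-quality criteria return `None`.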
Contaminating mouse DNA sequences were detected by a sequence similarity search using BLASTN (NCBI BLAST 2.2.26+, max. e-value 0.1, DUST filter off) of all sequences against the mouse reference genome on NCBI (build GRCm38.p1) 3,4. An alignment length of ≥80% of the query sequence and e-values ≤10⁻⁴ were used as cutoff criteria; sequences matching these criteria were considered presumptive mouse sequences and removed. To determine the taxonomic origin of sequences and associated gene functions, a sequence similarity search was performed using BLASTX (NCBI BLAST 2.2.26 with the -w 15 parameter set, allowing for frameshifts in alignments, max. e-value 10) against the NCBI non-redundant (NR) database (downloaded 07/19/2013) 3. Output was imported into MEGAN 5 (version 5.7.1), using the parameters min. bitscore 50 and max. e-value ≤10⁻². Functional gene annotation was performed in MEGAN using KEGG classification of reads 5. Based on RefSeq-IDs mapping to KEGG orthology (KO) groups, each read was mapped to a gene with KO identification.
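The cutoff criteria for presumptive mouse sequences can be expressed as a short sketch. It assumes that BLASTN hits have already been parsed into (read ID, alignment length, e-value) tuples; the function names and the data layout are hypothetical and were not part of the original pipeline.

```python
def is_presumptive_mouse(query_length, alignment_length, evalue,
                         min_fraction=0.8, max_evalue=1e-4):
    """True if a hit covers >=80% of the query and has an e-value <=1e-4."""
    return (alignment_length >= min_fraction * query_length
            and evalue <= max_evalue)

def remove_mouse_reads(read_lengths, hits):
    """Return read IDs that have no hit meeting the mouse cutoff criteria.

    read_lengths: dict mapping read ID to read length in bp.
    hits: iterable of (read_id, alignment_length, evalue) tuples.
    """
    flagged = {
        read_id
        for read_id, alignment_length, evalue in hits
        if is_presumptive_mouse(read_lengths[read_id], alignment_length, evalue)
    }
    return [read_id for read_id in read_lengths if read_id not in flagged]
```

A read is removed as soon as any one of its hits satisfies both criteria; reads without hits are kept.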
Taxonomical assignment of genes involved in sphingolipid metabolism was determined by MEGAN 6.

[Figure caption truncated in source: "... also Table S1)."]
A: Typical chromatogram of SL2 (fraction 10) and SL3 (fraction 11) as isolated from OSP pellet, using UHPLC coupled to an ion trap mass spectrometer. Insert structures show SL2 (B) and SL3 (C) as characterised by NMR spectroscopy (see Figs. S7-9 and Tables S5-6).

Tables
Table S1. Overview of putative SLs detected in cecal samples by means of FT-ICR-MS analysis, including experimental and theoretical mass signal values, molecular formulas, mean intensities, and database (ChemSpider) annotation. Column headers recovered from source: Class; Nr.
Table S2. Summary of all eighteen SLs with their measured retention time (RT), theoretical mass signal values and molecular formulas of parent and fragment ions, and applied collision energies in eV. MS/MS experiments were performed in negative electrospray ionization mode. This table also lists the major parent-fragment ions that were used for all MS/MS experiments and that are highlighted in Figure 3. Column headers recovered from source: SL; RT in min.
[Table caption truncated in source: "... Figure S7)."]
¹H NMR data for SL3 (fraction 11) in DMSO-d6 at 800 MHz (see also Figure S8). Column headers recovered from source: Position; δH in ppm (multiplicities, coupling constants in Hz), measured; δH in ppm (multiplicities), predicted in ACD/Labs.
Table S7. Arithmetic mean for analyzed SL1-SL9 (peak areas normalized to the weight of wet cecal content) in GF, SPF and Alistipes mice.