Lipid analysis of CO2-rich subsurface aquifers suggests an autotrophy-based deep biosphere with lysolipids enriched in CPR bacteria.

Sediment-hosted CO2-rich aquifers deep below the Colorado Plateau (USA) contain a remarkable diversity of uncultivated microorganisms, including Candidate Phyla Radiation (CPR) bacteria that are putative symbionts unable to synthesize membrane lipids. The origin of organic carbon in these ecosystems is unknown and the source of CPR membrane lipids remains elusive. We collected cells from deep groundwater brought to the surface by eruptions of Crystal Geyser, sequenced the community, and analyzed the whole community lipidome over time. Characteristic stable carbon isotopic compositions of microbial lipids suggest that bacterial and archaeal CO2 fixation ongoing in the deep subsurface provides organic carbon for the complex communities that reside there. Coupled lipidomic-metagenomic analysis indicates that CPR bacteria lack complete lipid biosynthesis pathways but still possess regular lipid membranes. These lipids may therefore originate from other community members, which also adapt to high in situ pressure by increasing fatty acid unsaturation. An unusually high abundance of lysolipids attributed to CPR bacteria may represent an adaptation to membrane curvature stress induced by their small cell sizes. Our findings provide new insights into the carbon cycle in the deep subsurface and suggest the redistribution of lipids into putative symbionts within this community.

heterotrophy parameter was calculated from the isotopic mass balance of expected and observed archaeal εDIC-lipid and the weighted average δ13C of bacterial lipids (thereby assuming that heterotrophic archaea feed on bacterial biomass, whereas archaeal feeding on archaeal biomass would be isotopically indistinguishable from autotrophy). As lipids and bulk biomass may differ in δ13C values, a 5‰ uncertainty in ε was assumed for visualization in Fig. 2.
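The exact equation behind the heterotrophy parameter is not given in this section. As a minimal sketch, it can be read as a two-endmember mixing model in which the observed archaeal lipid δ13C lies between a purely autotrophic endmember (DIC minus the expected fractionation) and the weighted average bacterial lipid δ13C. The function names, the sign convention ε ≈ δ13C(DIC) − δ13C(lipid), and all numerical inputs below are illustrative assumptions, not values from this study.

```python
def weighted_mean_delta(deltas, abundances):
    """Abundance-weighted mean d13C (permil) of a set of lipids."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return sum(d * a for d, a in zip(deltas, abundances)) / total


def heterotrophy_fraction(delta_dic, eps_expected, delta_archaeal, delta_bacterial):
    """Fraction of archaeal lipid carbon derived from bacterial biomass.

    Assumed mixing model (not taken from the source):
        delta_archaeal = (1 - f) * delta_auto + f * delta_bacterial
    with delta_auto ~ delta_dic - eps_expected (small-epsilon approximation).
    """
    delta_auto = delta_dic - eps_expected
    denom = delta_bacterial - delta_auto
    if denom == 0:
        raise ValueError("endmembers are isotopically identical")
    return (delta_archaeal - delta_auto) / denom


def heterotrophy_range(delta_dic, eps_expected, delta_archaeal, delta_bacterial,
                       eps_uncertainty=5.0):
    """Bounds on f when the expected epsilon is varied by +/- eps_uncertainty."""
    values = [
        heterotrophy_fraction(delta_dic, eps, delta_archaeal, delta_bacterial)
        for eps in (eps_expected - eps_uncertainty, eps_expected + eps_uncertainty)
    ]
    return min(values), max(values)


# Hypothetical inputs in permil, chosen only to make the arithmetic easy to follow.
bacterial = weighted_mean_delta([-28.0, -32.0], [1.0, 1.0])
f = heterotrophy_fraction(0.0, 20.0, -25.0, bacterial)
f_low, f_high = heterotrophy_range(0.0, 20.0, -25.0, bacterial)
```

With these made-up inputs the archaeal lipid sits halfway between the two endmembers, and the ±5‰ band on ε spreads the estimate considerably, which is why the uncertainty is drawn explicitly rather than ignored. Values of f outside 0 to 1 would indicate that the assumed endmembers do not bracket the observation.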
Fourier-transform infrared (FTIR) spectromicroscopy, data processing and analysis
Recovered cells that passed through a 0.2-µm filter but were retained by a 0.1-µm filter were deposited on a double-side-polished silicon slide and dried with a gentle nitrogen gas stream in a biological safety cabinet. All measurements were made on the silicon slide with a Hyperion 3000 Infrared-Visible microscope coupled to a Vertex70V interferometer (Bruker Optics, Billerica, MA) from the Berkeley Synchrotron Infrared Structural BioImaging (BSISB) program. The microscope was equipped with a 128×128 pixel focal plane array detector. With a 15× magnification objective, each individual image field (or tile) covers an area of ~300×300 µm. Nine tiles were measured, yielding a total of ~150,000 spectra. Each spectrum represents an average of 1024 scans acquired in the mid-infrared range between 900 and 3700 cm⁻¹ at a spectral resolution of 4 cm⁻¹.
All FTIR spectra were corrected for CO2 and water vapor using a built-in function of OPUS 7.5 (Bruker Optics, Billerica, MA), exported in ENVI format to RStudio, and analyzed using three R packages: hyperSpec [18], signal [19] and baseline [20]. Data were first baseline corrected with the "fillPeaks" method of the baseline package. We then removed spectra from pixels that did not contain cells by applying an intensity filter to the 3200 cm⁻¹ signal, discarding all pixels with an intensity lower than 0.02 a.u. (absorbance units). This yielded a final dataset of 89,000 spectra. A principal component analysis (PCA) was first performed on the 2800-3050 cm⁻¹ spectral range to focus on differences in lipid spectral features. Loading vectors of this analysis were compared with spectra of dry films of two reference lipids: 1-oleoyl-2-stearoyl-sn-glycero-3-phosphocholine (C18:1-18:0-PC) and 1-oleoyl-2-hydroxy-sn-glycero-3-phosphocholine (lyso C18:1-PC).
Both C18:1-18:0-PC and lyso C18:1-PC were purchased from Avanti Polar Lipids, Inc. (Alabaster, Alabama, USA). To prepare lipid films, 1.0 mg of C18:1-18:0-PC or lyso C18:1-PC was dissolved in ~200 µl of chloroform; ~5 µl of the solution was then deposited onto a double-side-polished silicon slide and dried under a gentle N2 stream for 1 hour before FTIR measurement. A second PCA was then performed over the whole 900-3700 cm⁻¹ spectral range to determine whether the variance of the spectral patterns highlights the main chemical composition of the samples.
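The pixel filter and range-restricted PCA described above were run in R with the hyperSpec, signal and baseline packages. The following NumPy sketch is not that code; it only illustrates the two steps under stated assumptions. The function names are hypothetical, baseline correction is omitted, the PCA is a plain mean-centered SVD, and the spectra are synthetic stand-ins rather than measured data.

```python
import numpy as np

# Nine tiles from a 128 x 128 focal plane array give the quoted spectrum count.
N_SPECTRA = 9 * 128 * 128  # 147456, i.e. ~150,000


def filter_cell_pixels(spectra, wavenumbers, probe_cm=3200.0, min_intensity=0.02):
    """Keep spectra whose intensity at the probe wavenumber is >= min_intensity."""
    idx = int(np.argmin(np.abs(wavenumbers - probe_cm)))  # nearest grid point
    keep = spectra[:, idx] >= min_intensity
    return spectra[keep], keep


def pca_on_range(spectra, wavenumbers, lo, hi, n_components=5):
    """Mean-centered PCA (via SVD) restricted to lo <= wavenumber <= hi."""
    in_range = (wavenumbers >= lo) & (wavenumbers <= hi)
    x = spectra[:, in_range]
    x_centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(x_centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    scores = u[:, :k] * s[:k]             # pixel coordinates on each component
    loadings = vt[:k]                     # one loading vector per component
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return scores, loadings, explained[:k]


# Synthetic stand-in data: 100 "cell" pixels (offset 0.05) and 100 empty pixels.
rng = np.random.default_rng(0)
wavenumbers = np.arange(900.0, 3701.0, 4.0)
noise = rng.normal(scale=0.001, size=(200, wavenumbers.size))
spectra = np.repeat([0.05, 0.0], 100)[:, None] + noise

kept, keep_mask = filter_cell_pixels(spectra, wavenumbers)
scores, loadings, explained = pca_on_range(kept, wavenumbers, 2800.0, 3050.0,
                                           n_components=3)
```

Restricting the PCA to 2800-3050 cm⁻¹ before decomposition means the loading vectors describe only the C-H stretching region, which is what makes them directly comparable to the reference lipid film spectra.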

Additional FTIR data analysis
An additional principal component analysis on the 900-3700 cm⁻¹ spectral range was performed to gather more information on the chemical environment that harbored CPR cells. The first five components explain >95% of the variance for both samples (Fig. S9).
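For readers who want to reproduce this kind of statement on their own data, the number of components needed to reach a variance threshold can be read off the cumulative explained variance. The sketch below assumes the singular values of a mean-centered data matrix are available; the function name and the example singular values are hypothetical and unrelated to Fig. S9.

```python
import numpy as np


def components_for_variance(singular_values, threshold=0.95):
    """Return (k, cumulative) where k is the smallest number of leading
    components whose cumulative explained variance is >= threshold."""
    variances = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # searchsorted gives the first index at which cumulative >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, cumulative.size), cumulative


# Hypothetical singular values, for illustration only.
k, cumulative = components_for_variance([3.0, 2.0, 1.0, 0.5, 0.1])
```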
The sample presents strong signals in the lower wavenumber region, between 1200 and 900 cm⁻¹. Loading vector 1 is characterized by a strong signal at 1125 cm⁻¹ that is characteristic of sulfates, in particular iron and manganese sulfates. Moreover, loading vector 1 presents signals at 1680 and 1540 cm⁻¹, assigned to the amide I and amide II bands of proteins, respectively. The signal at 1680 cm⁻¹ is quite broad, with a shoulder at 1633 cm⁻¹ attributed to the β-sheet conformation of proteins and another shoulder at 1725 cm⁻¹ assigned to the carbonyl group of lipids. Loading vector 2 shows strong absorption at 1415 cm⁻¹ due to the presence of calcium carbonate and two sharp signals at 2920 cm⁻¹ and 2850 cm⁻¹ originating from CH2 of lipid aliphatic chains. Loading vector 3 exhibits a strong peak at 1125 cm⁻¹ but no other relevant features, whereas loading vector 4 shows two sharp signals from CH2 of the lipid aliphatic chains at 2920 cm⁻¹ and 2850 cm⁻¹ and a signal at 2950 cm⁻¹ from CH3, suggesting the presence of a different type of lipid containing more branching, possibly archaeal isoprenoid lipids. In summary, the chemical environment is rich in sulfates, carbonates, and silicates. The cellular material exhibits signals of proteins folded in β-sheet structures and a mixture of signals from lysolipids with different levels of branching and unsaturation, belonging to bacterial and archaeal cells.

Analysis of dissolved organic carbon
Water samples for dissolved organic carbon (DOC) analysis were collected in July 2019 during the minor eruption phase of Crystal Geyser. Samples were filtered through 0.1 µm syringe filters, acidified with hydrochloric acid in the field, and transported and stored at 4 °C. The samples were sparged with helium in the lab prior to being loaded onto a model 1088 autosampler for analysis. Quantification and carbon isotopic analysis of DOC were performed following the protocol of Lalonde et al. [21], using an OI Analytical Aurora 1030W TOC Analyzer interfaced to a Finnigan MAT DeltaPlusXP IRMS. The 2σ analytical precision was 2% (of the ppm value) for quantification and ±0.2‰ for carbon isotopic composition.

Diversity and phylogeny of 3-oxoacyl-[acyl-carrier protein] reductase beta subunit
To investigate the diversity of dehydrogenases involved in fatty acid reduction, we picked the 3-oxoacyl-[acyl-carrier protein] reductase beta subunit as a representative example. Genes were predicted for the 67 currently existing assemblies from Crystal Geyser [22][23][24] using Prodigal [25], followed by an HMM search [26] using the respective model from KEGG (K11539) [27] with an e-value cutoff of 10⁻¹⁰. Results were annotated against UniRef100 (e-value cutoff 10⁻⁵), and only those hits were retained whose annotation contained one of the strings "3-oxoacyl-ACP_reductase", "3-oxoacyl-_acyl-carrier-protein__reductase", "Beta-ketoacyl-ACP_reductase", "Enoyl-_acyl-carrier-protein__reductase__NADH", "Enoyl-_acyl-carrier-protein__reductase", "3-oxoacyl-_Acyl-carrier_protein__reductase", "3-oxoacyl-_acyl-carrier_protein__reductase", or "3-ketoacyl-ACP_reductase". The resulting sequences were clustered using CD-HIT [28] at 90% amino acid similarity and aligned using MUSCLE [29]. After manual end-trimming of the alignment, a tree was computed using FastTreeMP [30] with default parameters and visualized using iTOL [31].
* Sample underwent whole genome amplification for sequencing.
** Volume of CG26A and CG26B is identical, as these were sequential filters.
Table S2 | Samples for lipidomic analyses and the respective volume of groundwater that was filtered.
Fig. S1 | Sampling scheme of the nearly 5-day eruption cycle as previously displayed and explained in [22]. Please note the alignment of the metagenomic samples (displayed as numbers) and the lipidomic samples that were taken in parallel. For most of the samples, two metagenome samples are representative of one lipidomic sample. Bulk = collection of fluids directly onto a 0.1 µm filter. Sample 26 was a sequential filtration, first onto a 0.2 µm filter, then onto a 0.1 µm filter. One additional sample, collected outside of this field campaign, was also included for metagenomic and infrared analysis. This particular sample was collected in the middle of the recovery phase, when the abundance of CPR was high and little to no Sulfurimonas, which passes through a 0.2 µm filter, was present.
Intact polar lipids consist of a polar headgroup connected to an apolar core lipid. Note that not all possible combinations were detected in Crystal Geyser. The type of hexoses and pentoses could not be distinguished by the employed analytical techniques and the exact structure of the 1G-1pentose headgroup is unknown (i.e., whether it is the hexose or the pentose that is bound to the glycerol moiety of the core lipid).

Fig. S5 | Percent relative abundance patterns of different rpS3-carrying scaffolds across the 27 metagenomic samples from Crystal Geyser (rpS3 sequences were clustered at 99% amino acid similarity, followed by stringent read-mapping for coverage calculation). Each sample is one line on the x-axis. Each color represents the relative abundance of one microbial rpS3 gene that was tracked over time. Coloring according to Fig. 1: pink = recovery phase, green = minor eruptions, brown = major eruptions. Individual relative abundances are provided in Table S5. The water pressure variations, which show the sourcing of fluids from the conduit (mixed), the deep aquifer, and the shallow aquifer, are from ref. [22].
Table S4 | "Percent relative abundance and stable carbon isotopic composition of lipids across the cycle" is provided in a separate file.

Table S5 | "Percent relative abundance of rpS3 genes across the cycle" is provided in a separate file.