Introduction

Planktonic archaea were first detected in the sea almost 30 years ago [1, 2]. Since then, the diversity, distribution, and potential function of the third domain of life in the ocean have been extensively studied, but many unknowns remain. Among marine Archaea, the Marine Group II (MGII) is usually, but not always, more prevalent in surface waters, and has a global distribution from the poles to tropical seas [3]. Its widespread occurrence suggested that it had an important role in the ocean, but because of the lack of any cultivated representatives, its physiology was largely unknown. The development of metagenomic approaches has, however, allowed a glimpse into the metabolic potential and lifestyle of the MGII [4,5,6,7,8,9,10]. Some MGII harbor a proteorhodopsin (PR) gene and could thus be able to use sunlight as an energy source [11, 12]. They can potentially degrade large molecules such as lipids, proteins, and polysaccharides [8, 9, 12], and some have genes for motility [6, 7, 12]. The rare experimental tests of the metabolism of MGII showed the uptake of phytoplankton-derived proteins using stable-isotope probing [13], and stimulated growth in the presence of whole cells of the picoeukaryote Micromonas [9]. The potential metabolisms detected in MGII suggest that the group has an important role in the carbon cycle of the ocean [14].

Among the genes present in MGII, the archaeal PR has some interesting features. Analyses of key amino acid residues show that both green- and blue-light tuned PR types are present within the group [6, 7], with distinct geographic and vertical distributions [10]. The archaeal PR phylogeny is very different from the inferred organismal phylogeny [6], which indicates extensive and ongoing horizontal gene transfers [11, 15]. Although the rhodopsins found in this pelagic archaeal group are predicted to act as counter-gradient proton pumps, like canonical PRs, the fact that some genes have very unusual amino acid sequences [4], and that some MGII can grow in the dark [9], calls for additional examination of the genomic diversity and the ecology of the archaeal PRs.

MGII was recently proposed as an order-level lineage, Candidatus Poseidoniales, divided into two families: MGIIa (Candidatus Poseidonaceae) and MGIIb (Candidatus Thalassarchaeaceae) [6]. The two families have different spatial and temporal distributions, with MGIIb generally found in deeper waters [4, 10, 16,17,18], and in surface waters during winter in temperate seas, while MGIIa is predominant in summer [19,20,21]. These different distributions suggest that the two groups have different ecological niches corresponding to distinct lifestyles [3]. The specific metabolic traits distinguishing MGIIa and MGIIb were recently described in more detail [4, 6, 7, 21, 22]. For example, members of the MGIIa have a larger genome size [7, 21], are all motile, and have a catalase gene possibly protecting the cell from oxidative damage [22], while MGIIb have the potential for assimilatory sulfate reduction [4, 21]. The fact that MGIIa and MGIIb represent different taxonomic and ecological entities is now clear, but the question remains whether the large phylogenetic diversity seen within each family, and especially within MGIIb, reflects the presence of additional ecotypes.

The main goal of this study was to test for the presence of different ecotypes within the diverse MGIIb family. In addition, we wanted to examine whether different archaeal PR groups reflect different ecological strategies within MGII archaea. To do so, we used a metagenomic time series gathered over 3 years at the SOLA station in the Bay of Banyuls sur Mer in the northwestern Mediterranean Sea [23]. We reconstructed metagenome-assembled genomes (MAGs) and monitored their seasonal dynamics monthly. We also used high-frequency (twice a week) 16S rRNA amplicon sampling to track the fine succession of ecotypes.

Materials and methods

Sampling and metagenome sequencing

For metagenome data, surface seawater (3 m) was collected monthly from January 2012 to February 2015 (40 samples) at the SOLA station (42°31′N, 03°11′E) in the Bay of Banyuls sur Mer (France) in the northwestern Mediterranean as described previously [23]. Briefly, a volume of 5 L was prefiltered through 3-μm pore-size polycarbonate filters (Millipore, Billerica, MA, USA), and the microbial biomass was collected on 0.22-μm pore-size GV Sterivex cartridges (Millipore). The physicochemical parameters were provided by the Service d’Observation en Milieu Littoral (SOMLIT). Metagenomes were sequenced on a HiSeq 2500 “High-Output” paired-end run (2 × 100 bp) (Illumina). Sequencing produced a total of 2,984,444,036 reads that were archived in the EBI repository under accession number PRJEB26919.

Gene catalog

A gene catalog was built as described earlier [23]. Briefly, for each metagenome, high-quality reads were individually assembled with IDBA-UD [24], and gene prediction was done on contigs ≥1 kb with MetaGeneAnnotator [25], which generated a total of 6.4 million gene-coding sequences. A catalog of genes was then built by clustering the predicted gene-coding sequences at 95% identity using CD-HIT (v.4.6) [26]. The resulting catalog contained 1,568,213 nonredundant predicted genes for the SOLA site. An abundance matrix of gene-coding sequences was then built by mapping the metagenomic reads against the predicted SOLA gene catalog using SOAPaligner [27]. The abundance matrix was normalized to gene length and per million reads. The 16S rRNA gene contigs were annotated against the SILVA database (v.128).
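The normalization of the abundance matrix can be illustrated with a short sketch. It assumes that raw mapped-read counts are divided by gene length in kilobases and by the sequencing depth of each metagenome in millions of reads; this interpretation, the function name, and the toy numbers are illustrative assumptions, and the exact formula used in [23] may differ.

```python
import numpy as np

def normalize_abundance(counts, gene_lengths_bp, total_reads):
    """Scale mapped-read counts by gene length (kb) and depth (millions of reads).

    counts          : (n_genes, n_samples) raw mapped-read counts
    gene_lengths_bp : (n_genes,) gene lengths in base pairs
    total_reads     : (n_samples,) total reads per metagenome
    """
    counts = np.asarray(counts, dtype=float)
    length_kb = np.asarray(gene_lengths_bp, dtype=float)[:, None] / 1_000.0
    depth_millions = np.asarray(total_reads, dtype=float)[None, :] / 1_000_000.0
    return counts / length_kb / depth_millions

# Hypothetical toy data: 2 genes, 2 metagenomes
toy_counts = np.array([[100, 0], [50, 200]])
toy_lengths = np.array([1000, 500])
toy_depth = np.array([2_000_000, 4_000_000])
normalized = normalize_abundance(toy_counts, toy_lengths, toy_depth)
```

With this scaling, a short gene and a long gene recruiting reads at the same per-base rate receive the same value, and samples sequenced at different depths become comparable.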

Metagenomic co-assemblies and binning

The 40 metagenomes were first compared to each other with the Commet software [28] to assess their pairwise similarity. The method allows an all-against-all comparison of the non-assembled reads based on shared k-mers. The clustering of metagenomes according to their k-mer similarity separated samples into four groups that corresponded to the four seasons: winter, spring, summer, and autumn (Supplementary Fig. 1). We then co-assembled all reads that belonged to the same seasonal group using MEGAHIT (v.1.1.1) with the ‘meta-sensitive’ preset parameter [29]. Contig abundance profiles were produced by mapping back short reads to the co-assemblies using the mem algorithm from BWA (v.0.7.12) [30]. We then clustered contigs >2.5 kbp with the automatic binning algorithm CONCOCT (v.1.0.0) with a maximal number of clusters set to 100 to minimize the ‘fragmentation error’ [31]. Finally, we manually binned each CONCOCT cluster using the anvi’o interactive interface to produce MAGs [32].

Identification and selection of MAGs

We used CheckM to determine the completeness and degree of contamination of the bins, and all bins with >50% completeness and <10% contamination were defined as MAGs. The taxonomic classification of the MAGs was done with GTDB-Tk (v0.3.2) and GTDB release 82 [33]. 16S rRNA genes were detected using nhmmer (HMMER v.3.2.1) [34] and taxonomically classified against the SILVA database (v.128) using BLAST+ (v.2.2.31) [35]. Only MAGs affiliated with MGII were kept for this study.
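The selection rule above amounts to a simple threshold filter. The sketch below is only an illustration: the record type, the function name, and the bin values are hypothetical, and the completeness and contamination estimates are assumed to have been read beforehand from the CheckM output.

```python
from dataclasses import dataclass

@dataclass
class BinQuality:
    name: str
    completeness: float   # percent, as estimated by CheckM
    contamination: float  # percent, as estimated by CheckM

def select_mags(bins, min_completeness=50.0, max_contamination=10.0):
    """Keep bins with >50% completeness and <10% contamination."""
    return [
        b for b in bins
        if b.completeness > min_completeness and b.contamination < max_contamination
    ]

# Hypothetical bins, not values from this study
toy_bins = [
    BinQuality("bin_01", 79.0, 1.4),
    BinQuality("bin_02", 45.0, 0.5),   # too incomplete
    BinQuality("bin_03", 92.0, 12.0),  # too contaminated
]
mags = select_mags(toy_bins)
mag_names = [b.name for b in mags]
```

Keeping the thresholds as parameters makes it easy to apply a stricter contamination cutoff to the same table.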

Gene prediction was done with prodigal (v.2.6.3) [36], and all the predicted proteins of the MAGs were annotated against the KEGG database using GhostKOALA (‘genus_prokaryotes + family_eukaryotes’, v.2.2) [37].

A pangenomic investigation was conducted on the MGIIb subclades O2, O3, O5, and WHARN using the anvi’o pangenomic workflow.

Phylogenetic assessment

We inferred a maximum likelihood tree of MGII archaea based on multiple sequence alignment of 122 single-copy marker proteins as detailed by the Genome Taxonomy Database (GTDB; http://gtdb.ecogenomic.org/). Each MAG was searched for the 122 marker genes using HMMER (v.3.2.1). The gene sequences were aligned with MUSCLE [38], the alignments were concatenated, and a phylogenomic tree was constructed using FastTreeDbl (v.2.1.10) [39]. The tree included MAGs from this study as well as reference MAGs from earlier studies that had ≥60 single-copy markers [7, 21, 40]. The phylogenetic tree was visualized using the Interactive Tree Of Life webtool (v.5.5.1).

Extraction of euryarchaeal PR sequences

PR homologous sequences were retrieved from the SOLA gene catalog, from the collection of MAGs reconstructed from SOLA metagenomes, and from public databases using a hidden Markov model (HMM) approach [41]. Briefly, an HMM profile was generated from the 7900 manually curated sequences of type-1 rhodopsins stored in the MicRhoDE database [42], and searched against the SOLA gene catalog using the HMMER software (v.3.1b2, [34]). Open reading frames from nucleic acid sequences corresponding to HMM hits were generated using prodigal (v.2.6.3, [36]). The corresponding protein sequences were dereplicated and then clustered into PR sequence clusters (PR clusters) at a conservative value of 82% amino acid sequence similarity, as used earlier [43]. Dereplication and clustering were done using CD-HIT (v.4.7, [26]) with parameters set as “-c 1 -n 5 -p 1 -T 6 -g 1 -d 0” and “-c 0.82 -n 5 -p 1 -T 6 -g 1 -d 0”, respectively.

Putative PRs were aligned to the MicRhoDE reference alignment using MAFFT (‘--addfragments’, v.7.055b, [44]) and were added to a custom version of the MicRhoDE database with ARB software (v.6.0.4, [45]), and protein sequences were placed into the MicRhoDE phylogenetic tree using the add-by-parsimony module of ARB to assess their affiliation. SOLA PR sequences belonging to phylogenetic clusters containing previously identified euryarchaeal PR-types or clustering with a MAG phylogenetically identified as belonging to MGII were extracted. The abundance of each PR sequence was retrieved from the SOLA gene catalog and summed by PR cluster to generate an abundance table.

All aligned sequences were screened for the presence of amino acid residues at positions 97, 101, 105, and 108. Partial sequences lacking these sites were not included in the analysis of the proton-pumping motif and spectral tuning.
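The screening step can be sketched as follows. This is a minimal illustration, assuming that positions 97, 101, 105, and 108 refer to the residue numbering of an ungapped reference sequence present in the alignment; the function names and the toy sequences are hypothetical. The color assignments follow the ones reported in this paper (Q for blue; M or L for green).

```python
KEY_POSITIONS = (97, 101, 105, 108)
# Spectral tuning by the residue at position 105, as reported in the text
TUNING = {"Q": "blue", "M": "green", "L": "green"}

def reference_columns(aligned_reference, positions=KEY_POSITIONS):
    """Map 1-based reference residue numbers to 0-based alignment columns."""
    columns = {}
    residue_number = 0
    for column, char in enumerate(aligned_reference):
        if char != "-":
            residue_number += 1
            if residue_number in positions:
                columns[residue_number] = column
    return columns

def screen_sequence(aligned_query, columns):
    """Return (pumping motif, tuning residue, color), or None for partial sequences."""
    if len(columns) < len(KEY_POSITIONS):
        return None  # reference does not span all key positions
    residues = {}
    for position, column in columns.items():
        if column >= len(aligned_query) or aligned_query[column] in "-.":
            return None  # key site missing: sequence is excluded
        residues[position] = aligned_query[column]
    motif = residues[97] + residues[101] + residues[108]
    tuning = residues[105]
    return motif, tuning, TUNING.get(tuning, "unknown")

def make_toy_sequence(residues_by_column, length, filler="G"):
    """Build a hypothetical aligned sequence with chosen residues at given columns."""
    sequence = [filler] * length
    for column, residue in residues_by_column.items():
        sequence[column] = residue
    return "".join(sequence)

# Toy reference: 108 residues with one alignment gap after residue 50
toy_reference = "A" * 50 + "-" + "A" * 58
toy_columns = reference_columns(toy_reference)
complete = make_toy_sequence({97: "D", 101: "T", 105: "M", 108: "K"}, 109)
partial = make_toy_sequence({97: "D", 101: "T", 105: "M", 108: "-"}, 109)
complete_result = screen_sequence(complete, toy_columns)
partial_result = screen_sequence(partial, toy_columns)
```

Resolving the alignment columns once from the reference keeps the residue numbering consistent across all query sequences, including those with insertions.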

Genes flanking the PR-encoding gene in MAGs were predicted and identified using PROKKA (v.1.14.0, [46]). The synteny analysis was performed by BLASTn homology search and displayed using Easyfig (v.2.2.2, [47]).

16S and 18S rRNA sequencing

A high-frequency amplicon analysis was conducted to target both prokaryotes (16S rRNA) and eukaryotes (18S rRNA). The goal was to infer the fine seasonal dynamics of MGII Archaea and correlate their occurrences with bacterial and eukaryotic taxa. SOLA surface water (3 m) was collected from January 2015 to March 2017: twice a week during the periods of January–March 2015, January–April 2016 and December 2016–March 2017, and roughly once a week otherwise. A total of 5 L of seawater was filtered sequentially through 3-μm filters and 0.22-μm Sterivex cartridges. DNA was extracted, and 16S rRNA and 18S rRNA genes were sequenced with the prokaryotic primers 515F-Y (5′-GTGYCAGCMGCCGCGGTAA) and 926R (5′-CCGYCAATTYMTTTRAGTTT), and the eukaryotic primers TAReuk_F1 (5′-CCAGCASCYGCGGTAATTCC) and TAReuk_R (5′-ACTTTCGTTCTTGATYRATGA), as described earlier [48]. All sequences were deposited in NCBI under accession number PRJNA579489.

The standard pipeline of the DADA2 package (v1.6) [49] in “R” was used for the analysis of the raw sequences to build an Amplicon Sequence Variant (ASV) table. The ASVs were compared to the 16S rRNA of the MAGs by BLASTn. ASV abundances were normalized with the median-ratio method implemented in the “DESeq2” package [50].
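For readers unfamiliar with the median-ratio method, the sketch below illustrates the underlying calculation. It is a simplified Python re-implementation for illustration only, not the DESeq2 code used in this study: each sample receives a size factor equal to the median, across ASVs, of the ratio between its count and the geometric mean of that ASV over all samples. Only ASVs with non-zero counts in every sample are used here, which is an assumption of this sketch, and the count table is hypothetical.

```python
import numpy as np

def median_ratio_size_factors(counts):
    """Size factors by the median-of-ratios approach.

    counts : (n_asvs, n_samples) raw count table
    """
    counts = np.asarray(counts, dtype=float)
    # Restrict to ASVs observed in all samples (geometric mean is defined)
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("no ASV has non-zero counts in every sample")
    log_counts = np.log(counts[usable])
    log_geometric_means = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geometric_means, axis=0))

# Hypothetical table: sample 2 is sequenced twice as deeply as sample 1
toy_counts = np.array([[10, 20], [100, 200], [5, 10], [0, 7]])
size_factors = median_ratio_size_factors(toy_counts)
normalized_counts = toy_counts / size_factors
```

Because the median is taken over many ASVs, a few strongly blooming taxa have little influence on the size factors, which is the main appeal of this approach over total-sum scaling.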

For the long-term 16S rRNA archaeal amplicon analysis, data were extracted from a 7-year time-series study from the SOLA site [51]. SOLA surface water (3 m) was collected roughly twice a month from October 2007 to January 2015, and archaeal 16S rRNA genes were sequenced and analyzed as described in the original publication [51].

Statistics

Spearman correlations were computed between log-transformed environmental data and normalized ASV abundances, and between normalized prokaryotic and eukaryotic ASV abundances. Bray–Curtis similarities were computed on Hellinger-transformed MAG and PR abundance data with the vegan package in R [52]. The relative abundance of the archaeal 16S rRNA genes was used to compute the Shannon diversity index with the vegan package in R.
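The transformations named above have simple closed forms, sketched here in Python for illustration; the analyses in this study were run with vegan in R. The sketch assumes the standard definitions: the Hellinger transformation takes the square root of row-wise relative abundances, Bray–Curtis similarity is one minus the sum of absolute differences divided by the sum of abundances, and the Shannon index uses the natural logarithm. The function names and toy profiles are hypothetical.

```python
import numpy as np

def hellinger(matrix):
    """Square root of row-wise relative abundances (one row per profile)."""
    matrix = np.asarray(matrix, dtype=float)
    row_sums = matrix.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # all-zero rows remain zero
    return np.sqrt(matrix / row_sums)

def bray_curtis_similarity(matrix):
    """Pairwise Bray-Curtis similarity (1 - dissimilarity) between rows."""
    matrix = np.asarray(matrix, dtype=float)
    differences = np.abs(matrix[:, None, :] - matrix[None, :, :]).sum(axis=2)
    totals = (matrix[:, None, :] + matrix[None, :, :]).sum(axis=2)
    with np.errstate(invalid="ignore", divide="ignore"):
        dissimilarity = np.where(totals > 0, differences / totals, 0.0)
    return 1.0 - dissimilarity

def shannon(abundances):
    """Shannon diversity index, H = -sum(p * ln p), over non-zero entries."""
    abundances = np.asarray(abundances, dtype=float)
    proportions = abundances[abundances > 0] / abundances.sum()
    return float(-(proportions * np.log(proportions)).sum())

# Hypothetical abundance profiles (rows) across two sampling dates (columns)
toy_profiles = np.array([[4.0, 0.0], [4.0, 0.0], [0.0, 9.0]])
similarity = bray_curtis_similarity(hellinger(toy_profiles))
evenness_example = shannon([1, 1])
```

In the toy example, the first two profiles peak on the same date and are fully similar, whereas the third peaks on the other date and shares no similarity with them.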

Results

Temporal dynamics of planktonic archaea

Archaeal 16S rRNA contigs obtained from the gene catalog were more abundant in the metagenomes from November to February at the SOLA station (Fig. 1a). They were almost absent in June, July, and August. In contrast, bacterial 16S rRNA contigs were present all year long (Fig. 1a). The diversity of the archaeal communities was highest during the winter months (from October to March), with the exception of a peak of diversity in April 2012 (Fig. 1b). Among the planktonic archaeal sequences, MGIIa were overall the most abundant (46% of the archaeal sequences), followed by MGIIb (30%), MGI (22%), and MGIII (2%) (Fig. 1c). MGIIb and MGI were present mostly for 3 months from December to February. MGIIa was the most abundant planktonic archaea during the rest of the year and dominated spring and autumn samples (Fig. 1c). MGIII was only present in winter.

Fig. 1: Seasonal dynamics of archaeal abundance and diversity at the SOLA station over 3 consecutive years.

a Abundance of 16S rRNA contigs in the metagenomes normalized per million reads. b Shannon diversity index for archaeal communities. c Polar plot showing the abundance of 16S rRNA contigs belonging to the main groups of planktonic archaea summed by month. MGI: Thaumarchaeota Marine Group I, Euryarchaeota MGIIa (Ca. Poseidonaceae), Euryarchaeota MGIIb (Ca. Thalassarchaeaceae), and Euryarchaeota MGIII. Values are normalized to 1 million reads and averaged per month.

MAGs characteristics

MAGs were reconstructed independently from each of the four seasonal groups. A total of 40 MGII MAGs satisfying quality standards (>50% complete and <5% contamination) were obtained, of which 12 had a 16S rRNA sequence (Supplementary Table 1). Their mean genome completeness was 79%, and their mean contamination was 1.4%. Most were obtained from winter samples (21), followed by spring and autumn (9 each), and summer (only 1).

The SOLA MAGs covered 15 out of the 17 or 21 clades described earlier ([7] and [6], respectively). We did not detect the clades I (1) and O4 (16), and J1-3 and Q (Supplementary Fig. 2). Thirteen MAGs belonged to MGIIb, of which 11 were nonredundant and spread across 9 different clades (Fig. 2). The most abundant MAG belonged to clade O3. A total of 27 MAGs belonged to MGIIa. They separated into 16 nonredundant MAGs that were distributed across 7 different clades (Supplementary Fig. 2). MGIIa MAGs had larger genome sizes than MGIIb (Supplementary Table 1).

Fig. 2: Phylogenomic tree for the MGIIb MAGs from SOLA (bold) and reference MAGs from the literature.

The tree was constructed using concatenated single-copy marker protein sequences. Subclade names are indicated following the nomenclature of Galand et al. [20] and Rinke et al. [6], with that of Tully [7] in parentheses. The predicted proteorhodopsin spectral tuning color and amino acid motif are indicated by colors when the gene is present.

MAGs seasonal dynamics

Individual MAGs showed strong seasonal patterns (Fig. 3). Among MGIIa, one MAG showed peaks of abundance from October to November (group 1, Supplementary Fig. 3). It was the only MAG that had a positive correlation with ammonium concentrations (Supplementary Table 2). One group of MGIIa was relatively abundant from December to February (group 2, Fig. 3), with an additional peak of abundance in April 2012, and was positively correlated with nitrite and nitrate. A second group of MGIIa was more abundant in February (group 4), positively correlated with nutrients and chlorophyll a, but negatively correlated with seawater temperature (Supplementary Table 2). The last group of MGIIa was more abundant during spring, in April and May, and had the strongest correlations with chlorophyll a (group 5, Fig. 3).

Fig. 3: Abundance of the MAGs constructed from metagenomes from the SOLA station sampled from January 2012 to February 2015.

MAGs that represented >1% of the total archaeal MAG abundance are shown. Bars are colored according to seawater temperature from coolest (blue) to warmest (yellow). MAG names, MGII groups (a or b) and clade affiliation as numbers [7] and as letters [6, 20] are indicated at the right. Numbers in bold indicate the groups delineated by Bray–Curtis similarity, and MAG ordering follows the clustering order (Supplementary Fig. 3).

Among MGIIb, some peaked in late autumn (group 2), others in the middle of the winter (group 3) and one in late winter (group 4) (Fig. 3). All MAGs from group 3 correlated positively with chlorophyll a, and with nutrients, nitrite, nitrate and phosphate, and one MAG with silicate (Supplementary Table 2).

Archaeal PR phylogeny and amino acid patterns

PR gene homologs were extracted from both the SOLA gene catalog and reconstructed MAGs, and grouped into PR clusters. The phylogenetic placement of the PR clusters was inferred against the MicRhoDE reference database (Fig. 4). Archaea-related PRs from SOLA were distributed mainly into two clades. A total of 9 PR clusters from the gene catalog and 14 sequences from the MAGs were affiliated with rhodopsin-types previously identified as belonging to the euryarchaeota HF70_39H11/HF70_59C08-like subcluster (red branches, Fig. 4). The amino acid residues at positions 97, 101, and 108, involved in the ion pumping mechanism, were aspartate (D), threonine (T), and glutamate (E). This DTE motif is classically associated with proton-pumping PR. The glutamine (Q) residue found at position 105 is characteristic of the blue-light tuned variant of the PR.

Fig. 4: Phylogenetic tree of microbial rhodopsins affiliated with Euryarchaea.

Rhodopsins from SOLA: genes extracted directly from the metagenomes are in bold grey, and from MAGs in bold black. Branches are colored by taxonomy as given in MicRhoDE: Supercluster IV/Alphaproteobacterial-like cluster 8/B subcluster in yellow, Supercluster IV/Proteobacterial-like cluster 3/HF10_19P19-like subcluster in pink, euryarchaeal-like cluster/HF70_39H11/HF70_59C08-like subcluster in red, and other microbial rhodopsins/group 2 in orange. Spectral tuning and ion pumping motifs are displayed as internal rings. Uninformative clades have been collapsed.

A second group of 8 PR clusters from the gene catalog and 29 sequences from MAGs were closely related to the Alphaproteobacterial-like cluster 8/B subcluster (yellow branches, Fig. 4), but constituted a separate sister clade within the supercluster IV, which comprises most of the canonical PRs (Supplementary Fig. 4). However, all archaeal sequences from this clade of rhodopsins presented a unique amino acid motif, with a lysine (K) at position 108, in addition to the usual aspartate (D) and threonine (T) at positions 97 and 101 (DTK motif) (Supplementary Fig. 5). The methionine (M) at position 105 suggested a green tuning of this variant.

Both the DTE-Q type from the euryarchaeal clade, and the less common DTK-M variants, presented a conserved putative tertiary structure with seven transmembrane helices, and harbored a lysine in the seventh transmembrane helix, the key residue covalently binding the retinal, the light-absorbing molecule (Supplementary Fig. 5).

Finally, some rhodopsins grouped in clades with no representatives among the SOLA MAGs. One group was close to the proteobacterial-like cluster 3/HF10_19P19-like subcluster, and had a DTE-Q motif (pink branches, Fig. 4). Another rhodopsin group was affiliated with an uncharacterized cluster of microbial rhodopsins rooting the proteorhodopsin family, with a DTE-L motif (orange branches, Fig. 4).

The DTE-Q and DTK-M types were not homogeneously distributed among MGIIa and MGIIb subclades, which indicates that the archaeal and the rhodopsin phylogenies were not congruent (Supplementary Fig. 6). DTE-Q variants were found among 13 subclades, while DTK-M were found in 11 subclades, and were the only types found in subclades L3, O2, and WHARN.

Archaeal PR seasonal dynamics in the gene catalog

The archaeal PR genes detected in the gene catalog showed seasonal dynamics (Fig. 5). Overall, they were absent from the samples from June to September and most abundant during winter. Four main seasonal dynamics were observed. One group of PR clusters was more abundant in autumn (group 1, Supplementary Fig. 7), another, which contained the largest number of PR clusters, was more abundant in early winter (group 2), and a third was typical of the winter (group 3). The last group showed variable patterns with a peak in April 2014 (group 4). The succession of different archaeal PR clusters corresponded to a succession in the color of the spectral tuning (Fig. 5). Green-tuned types with the DTK-M motif dominated the autumn, blue-tuned types with the DTE-Q motif dominated early winter, and green-tuned types dominated again in late winter. There were a few exceptions to the color succession, in particular the presence of green-tuned PR with the less common DTE-L amino acid pattern during early winter.

Fig. 5: Abundance of the proteorhodopsin clusters extracted from the gene catalog from the SOLA station sampled from January 2012 to February 2015.

Bars are colored according to seawater temperature from coolest (blue) to warmest (yellow). Cluster names, color tuning, and amino acid motifs are indicated at the right. Numbers in bold indicate the groups delineated by Bray–Curtis similarity (Supplementary Fig. 7).

High-frequency temporal dynamics and long-term occurrence of selected MGII ecotypes based on 16S rRNA amplicon

To monitor the fine temporal dynamics of possible MGII ecotypes at SOLA, we focused on MAGs that contained a 16S rRNA sequence, that had distinct seasonal dynamics, and that harbored a PR showing distinct seasonality. We selected one MGIIa that was abundant in autumn (SOLA-AUT-MGII-B2), one MGIIb that was typical of early winter (SOLA-AUT-MGII-B5), and one MGIIb that was more abundant in late winter (SOLA-SPR-MGII-B1). We verified the dynamics of each MAG against the dynamics of the corresponding 16S rRNA (miTAG) in the metagenomes and the corresponding PR clusters for the three metagenome sampling years. The three different markers showed very similar temporal dynamics for each selected MAG (Supplementary Fig. 8).

We used high-frequency sampling data from the SOLA station and tracked the 16S rRNA of the three selected MAGs within the 16S rRNA amplicon dataset (Fig. 6). The dataset sampled twice a week shows the time during which the specific archaea are present, and their temporal dynamics at a fine level of resolution. For MGIIa SOLA_AUT_MGII_B2, the length of the continuous occurrence varied and lasted from 1.5 to 3.5 months. It also showed episodic single re-occurrences after the main period (Fig. 6). For MGIIb SOLA_AUT_MGII_B5, the period of occurrence was longer and more regular lasting for 4 months, and correlated with salinity. Finally, the MGIIb SOLA_SPR_MGII_B1 occurrence varied in intensity from year to year, lasted at least 3 months, and was correlated with chlorophyll a concentration (Supplementary Table 3). For the three archaea, the occurrence was characterized by large variations in relative abundance. In some cases, the occurrence ended abruptly, but not always.

Fig. 6: High-frequency temporal dynamics of the 16S rRNA amplicons corresponding to three selected MGII MAGs.

Sampling frequency is twice a week from January to March and once a week otherwise. Bars are colored according to seawater temperature from coolest (blue) to warmest (yellow). Dates indicate the beginning and the end of the MAGs occurrences.

A correlation analysis, based on the high-frequency sampling, between the three archaeal groups and other microorganisms, showed that MGIIa SOLA_AUT_MGII_B2 cooccurred with typical planktonic bacteria such as members of the SAR86 and Rhodobacteraceae (Supplementary Table 4), and picoeukaryotes from the Picozoa and Stramenopiles lineages. MGIIb SOLA_AUT_MGII_B5 (O3) cooccurred with prokaryotes usually found in deeper water such as the SAR324 clade or the archaeal Marine Group III, and eukaryotes from the Prymnesiophyceae and Dinophyceae lineages. MGIIb SOLA_SPR_MGII_B1 (WHARN) was associated with a clade of Roseobacter and the bloom of the picoeukaryote Bathycoccus, as well as some Spirotrichea and Cryptophyceae (Supplementary Table 5).

The long-term amplicon data confirmed the succession in time of the three selected archaea over 7 years (Fig. 7). It showed that the MGIIa MAG (SOLA-AUT-MGII-B2) had the shortest blooms. The peak of these blooms varied from August to October. For the MGIIb SOLA_AUT_MGII_B5 (O3), blooms lasted for longer periods and peaked in November and once in December (Fig. 7). The other MGIIb, SOLA_SPR_MGII_B1 (WHARN), also had longer blooms, and showed peaks of abundance from January to March depending on the year (Fig. 7).

Fig. 7: Temporal dynamics over 7 years of the 16S rRNA corresponding to three selected MGII MAGs.

Sampling frequency is roughly twice a month. Bars are colored according to seawater temperature from coolest (blue) to warmest (yellow).

Metabolic capabilities of MGIIb O3 and WHARN

We observed a seasonal succession within the abundant MGIIb archaea at the SOLA station and specifically between the groups O3 and WHARN. In order to identify potential genotypic differences between these two groups, we focused a pangenomic analysis on the branch of the MGIIb tree grouping the O2 to O5 clades and the WHARN clade (Fig. 2). Clade O4, which contains deep-sea representatives, was not included because no representatives were reconstructed from SOLA. The MAGs reconstructed from the SOLA metagenomes were compared with other MGIIb MAGs available in the literature (Supplementary Fig. 9). The analysis shows that the MAGs separated into four main groups according to their gene content. These groups corresponded to the clades delineated by the phylogenetic analysis and showed a clear separation between O2 and O3. The figure also shows that all clades shared a core genome grouping the majority of genes, as illustrated by the black area on the right side of the phylogram (Supplementary Fig. 9). There were also a number of gene clusters that were unique to each clade (left side of the phylogram). Most of these genes were unassigned, but among the ones that were annotated we could pinpoint some that were found in O3 and not WHARN: archaeal flagellum (COG3354, COG2874, COG0630), beta-glucosidase (COG2723) for the utilization of polysaccharides, galactokinase (COG0153) for monosaccharides, pyridoxine 5′-phosphate oxidase (COG0259) involved in the vitamin B6 salvage pathway, superoxide dismutase (COG0605) against the reactive oxygen species superoxide, bacterioferritin (COG2193) for iron storage, cysteine synthase (COG0031) for the incorporation of inorganic sulfur into an organic compound (cysteine), and threonine synthase (COG0498). For the WHARN clade, we could not identify a specific cluster of annotated genes. Among vitamin-related genes, all O3 and WHARN MAGs had thiamine pyrophosphokinase (COG1564), used in the salvage pathway of vitamin B1 (salvage of thiamine to form ThDP), the two genes needed for de novo B6 synthesis through the DXP-independent pathway (COG0311, COG0214), and the catalase peroxiredoxin (COG1225) for the conversion of the reactive oxygen species hydrogen peroxide.

A synteny analysis of the genes surrounding the PR from the WHARN and O3 clades showed that none had genes involved in retinal biosynthesis from fatty acids. There were, however, clear differences between the genomic regions of the two clades (Supplementary Fig. 10). For WHARN, which carries the DTK-M type, the region was very well conserved between MAGs. The PR was juxtaposed with a hypothetical protein close to a phospholipase C (plcA), a fatty acid oxidation complex (fadB), and a GTP cyclohydrolase II (ribA) catalyzing the first step (APy synthesis) of riboflavin (vitamin B2) biosynthesis. In addition, a high number of constitutive genes involved in ribosome constitution and transcription, as well as numerous transfer RNA coding fragments, were present (Supplementary Fig. 10). For the O3 clade, the regions containing the DTE-Q type were poorly conserved. Some contigs had genes coding for the pyridoxal 5′-phosphate synthase subunit (PdxS) for vitamin B6 synthesis, and others had peroxiredoxins involved in protection against superoxide.

Discussion

The metagenomic time series combined with high-frequency amplicon data demonstrated the presence of distinct ecotypes within the MGIIb group (Ca. Thalassarchaeaceae). Earlier work on MGII Euryarchaeota has shown seasonal differences at a broad phylogenetic scale (family) between MGIIa and MGIIb [6, 19,20,21], but the existence of niche differentiation between clades of MGIIb had never been reported. Here we show a close succession in time between members of the MGIIb clades O3 and WHARN. The clade O3, which bloomed first during winter, had genotypic characteristics that draw a picture of microorganisms able to swim, degrade polysaccharides, obtain B6 vitamins through salvage pathways using pyridoxamine and pyridoxal, and protect themselves against both of the reactive oxygen species superoxide and hydrogen peroxide. In addition, members of O3 had a blue-tuned PR with a DTE-Q amino acid motif. They appear best fitted to occupy the niche defined by the darkest time of the year, and may have a lifestyle similar to that of SAR86, i.e. photoheterotrophs that live on phytoplankton-derived organic matter, with which they cooccurred at SOLA. They may thus be competitors but could also coexist by having different substrate preferences. Members of the O3 clade could move toward sources of food and protect themselves against the reactive oxygen species possibly produced by photosynthetic organisms. Alternatively, peroxiredoxins could also protect the archaea against peroxide produced during vitamin B6 biosynthesis. Interestingly, the genes for vitamin B6 (PdxS), peroxiredoxin, and PR were located in synteny, which suggests that they could be coexpressed, and that their activation could potentially be enhanced or driven by sunlight.

Many of the genomic features found in the O3 clade were not seen in other MGIIb, but are common to some members of the MGIIa group, including motility and the ability to degrade algal polysaccharides [6, 7, 21, 22]. However, members of the MGIIb O3 and MGIIa are distantly related in the phylogenetic tree, which raises the question of whether these common traits were acquired through horizontal gene transfer. Alternatively, the traits may have been common to all MGII before being lost in some clades.

Members of the WHARN clade, in turn, bloomed later, when the water temperature was at its yearly low and day length increased. WHARN members only have the catalase peroxiredoxin gene, so that they can remove hydrogen peroxide but not superoxide. They can only salvage B6 from pyridoxal, but can, together with the O3 clade, obtain vitamin B1 through a salvage pathway involving thiamine pyrophosphokinase. WHARN archaea were highly correlated with the abundance of the picoeukaryote Bathycoccus. Earlier, the picoeukaryote Micromonas was shown to enhance the growth of some MGII, which may be able to take up certain phytoplankton-derived labile organic compounds [9]. It remains to be shown whether the WHARN archaea take up organic material from Bathycoccus, whether they need each other, or whether they simply thrive under similar environmental conditions. WHARN have a green-tuned PR with the unusual DTK-M motif. The conserved genomic region around the PR was rich in tRNAs and genes involved in transcription and regulation processes, which suggests that this PR could be constitutively expressed, as previously reported for bacteria [53], and thus important for the metabolism of the group. Our observation of a very conserved genetic environment suggests a strong environmental selection for the maintenance of this structure.

Our data demonstrated that the uncommon DTK PR is widespread among archaea at SOLA. DTE is the most common motif, and it defines the capability of establishing a proton motive force, i.e. of functioning as a PR [54]. The DTK motif is very unusual among marine microbial rhodopsins and has only been noted in members of the MGII group [4, 6]. Among bacteria, it is only seen in nonmarine microorganisms from the Exiguobacterium subcluster (supercluster I/Firmicutes cluster, ESR), originally detected in Exiguobacterium sibiricum, isolated from permafrost soil [55], and in an evolutionary relative from a lake (Exiguobacterium sp. JL-3 [56]). This newly named ESR retinal protein (after Exiguobacterium sibiricum rhodopsin) functions as a light-driven proton pump with a unique structural feature and pumping mechanism [57, 58]. The marine archaeal PR variant that we describe here differs from the terrestrial ESR in two respects: it is phylogenetically distant, and it has a methionine (M) instead of a leucine (L) at position 105. Both methionine and leucine tune the spectral absorbance to the green. The euryarchaeal DTK-M variant otherwise has characteristics that are similar to those of the bacterial ESR, with a lysine instead of the typical carboxyl residue at the position that corresponds to the proton donor [57]. In addition, both the bacterial ESR and the euryarchaeal DTK-M variant contain a histidine residue interacting with the proton acceptor aspartic acid, which provides the ability to function over a wider pH range than canonical PR [57]. The observation of the DTK motif in both the marine Euryarchaeota and terrestrial microorganisms is intriguing. The role of this rhodopsin and its origin in the sea require further investigation.
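The motif labels used above (e.g. DTE-Q, DTK-M) can be read mechanically: the first three letters give the residues at the proton acceptor, an intermediate position, and the proton donor position, and the letter after the hyphen gives the residue at the spectral tuning position 105. The short Python sketch below only illustrates that reading, using the residue-to-tuning assignments stated in the text (Q for blue, L and M for green). The function name and the returned fields are hypothetical and are not part of the analysis pipeline used in this study.

```python
# Illustrative sketch only: residue-to-tuning rules as stated in the text.
# Q at position 105 -> blue-tuned; L or M at position 105 -> green-tuned.
TUNING_BY_RESIDUE_105 = {"Q": "blue", "L": "green", "M": "green"}


def describe_pr_motif(motif: str) -> dict:
    """Interpret a motif label such as 'DTE-Q' or 'DTK-M' (hypothetical helper)."""
    triad, sep, tuning_residue = motif.strip().upper().partition("-")
    if len(triad) != 3 or sep != "-" or len(tuning_residue) != 1:
        raise ValueError(f"expected a label like 'DTE-Q', got {motif!r}")
    return {
        "triad": triad,
        # Third residue of the triad sits at the proton donor position;
        # canonical PRs carry a glutamate (E) there, the DTK variant a lysine (K).
        "donor_position_residue": triad[2],
        "canonical_donor": triad[2] == "E",
        # Any residue not covered by the rules above is reported as unknown.
        "tuning": TUNING_BY_RESIDUE_105.get(tuning_residue, "unknown"),
    }


if __name__ == "__main__":
    for label in ("DTE-Q", "DTK-M"):
        print(label, describe_pr_motif(label))
```

Under these assumptions, the O3 motif DTE-Q is read as a blue-tuned PR with the canonical donor residue, and the WHARN motif DTK-M as a green-tuned PR with a lysine at the donor position.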

Genes involved in the retinal biosynthesis from fatty acids, notably β-carotene 15,15′-dioxygenase (Blh), were found neither in the SOLA O3 and WHARN MAGs nor in most other MGIIb clades [7]. MGIIb thus probably obtain retinal, a vitamin A aldehyde, through environmental scavenging of compounds produced by other prokaryotes or microalgae. We cannot, however, exclude the possibility that the genes are located in a region that is not well covered in certain MAGs. It should be noted that we detected Blh in numerous MGIIa MAGs, including the L3 subclade, which only harbors DTK-M variants. This could point to different PR-based ecological strategies between MGIIa and MGIIb.

The MGII PRs showed clear seasonal patterns, with a succession of green-tuned rhodopsins in autumn followed by blue-tuned ones in the middle of the winter, and green-tuned ones again in the spring. The spectral tuning of the prokaryotic PR is thought to be adapted to the dominant light field in deeper (blue) and shallower (green) oceanic waters [41, 59], and to have biogeographic patterns that follow a coastal to open ocean gradient [60]. An earlier winter vs summer comparison in the eastern Mediterranean also suggested seasonality in PR [61]. Our 3-year survey adds to the current knowledge by showing that seasonality is a major driver for the succession of different PR spectral tuning types. The fact that the blue-tuned proteins were present during the middle of the winter could suggest that they give an advantage to archaea in surface waters when days are shorter and light intensity is lower. When day length and the level of photosynthetically active solar radiation increase, the archaea equipped with green-tuned PR take over.

The use of microbial metagenomic time-series demonstrated that planktonic archaea can at times disappear from surface marine waters or become very rare. In the NW Mediterranean, they were almost absent in surface waters during the summer months. This strongly contrasts with the pattern observed for the domain Bacteria, which is present all year round. It suggests that surface archaea are adapted to the times during which the water column is mixed and primary production is highest. The oligotrophic conditions found during summer stratification do not seem to suit planktonic archaea. The absence of archaea in summer surface waters may not be a specific feature of the Mediterranean Sea. Although limited in time scale, earlier studies based on hybridization have reported that archaea were absent during summer in Southern Ocean waters [62]. Conversely, in the North Sea, FISH counts showed that archaea, and especially Euryarchaeota [64], were more abundant in summer [21, 63], after the phytoplankton bloom, and almost absent in winter. They were also present in some summer samples of the Atlantic Ocean [65]. Our results and earlier ones thus suggest that MGII archaea thrive under nutrient-rich conditions when primary productivity is high. Conversely, they disappear from surface waters under oligotrophic conditions when phytoplankton is absent.

Our metagenomic data also showed that the seasonality of the two MGII families is very different from the classical summer versus winter rhythm described in the literature [19,20,21]. In the NW Mediterranean Sea, MGIIa and MGIIb were both more abundant in winter. The difference between the families was that MGIIa was present over a longer period than MGIIb: MGIIb had a narrow peak of yearly abundance from December to February, while MGIIa had a longer presence that ranged from November to April. This result contradicts earlier seasonal studies from the Mediterranean Sea that showed a dominance of MGIIa in summer and MGIIb in winter [19, 20], and differs from the absence of a seasonal pattern between MGII groups reported for the San Pedro Channel [18]. The present metagenomic approach, by removing the PCR bias, probably gives a picture that is closer to reality. Our results, however, also contradict metagenome-based results from the North Sea depicting summer MGIIa versus winter MGIIb [21]. The temporal range of that survey was, however, shorter, which could have hidden the full seasonality of the MGII clades.

Finally, our data confirm the phylogeny by Tully [7] separating the O/WHARN branch into six subclades. Rinke et al. [6] delineated only five subclades within that branch. These two earlier studies also had different ways of naming the clades: clade WHARN in Tully [7] was named O2 in Rinke et al. [6], and the clade O3 in Rinke et al. [6] grouped together the clades O2 and O3 from Tully [7]. To avoid future naming confusion, we propose to adopt the six clades defined in Tully [7] and named O1, O2, O3, O4, O5, and WHARN. The letter O refers to the first delineation based on 16S rRNA [20], and we find it important to keep the name WHARN, which historically refers to the first published MGII planktonic archaeal sequence (WHAR-N from Woods Hole, [2]).

In conclusion, the combination of time-series and reconstructed genome data has proven to be a powerful tool for elucidating the ecology of marine microorganisms. We explored the diversity of the archaeal MGIIb family (Candidatus Thalassarchaeaceae) and discovered the presence of different ecotypes that closely succeeded one another over a short period of time during the winter season. The different subclades had distinct metabolic capabilities and PR variants that were consistent with their ecological niches.