Metabolic traits of an uncultured archaeal lineage -MSBL1- from brine pools of the Red Sea

The candidate Division MSBL1 (Mediterranean Sea Brine Lakes 1) comprises a monophyletic group of uncultured archaea found in different hypersaline environments. Previous studies propose methanogenesis as the main metabolism. Here, we describe a metabolic reconstruction of MSBL1 based on 32 single-cell amplified genomes from Brine Pools of the Red Sea (Atlantis II, Discovery, Nereus, Erba and Kebrit). Phylogeny based on rRNA genes as well as conserved single copy genes delineates the group as a putative novel lineage of archaea. Our analysis shows that MSBL1 may ferment glucose via the Embden–Meyerhof–Parnas pathway. However, in the absence of organic carbon, carbon dioxide may be fixed via the ribulose bisphosphate carboxylase, Wood-Ljungdahl pathway or reductive TCA cycle. Therefore, based on the occurrence of genes for glycolysis, absence of the core genes found in genomes of all sequenced methanogens and the phylogenetic position, we hypothesize that the MSBL1 are not methanogens, but probably sugar-fermenting organisms capable of autotrophic growth. Such a mixotrophic lifestyle would confer survival advantage (or possibly provide a unique narrow niche) when glucose and other fermentable sugars are not available.

More than half of the 60 major lines of descent within the bacterial and archaeal domains that have been described based on SSU rRNA phylogeny 1 remain uncultured and make up the so-called "microbial dark matter" 2 , since their metabolic capabilities and ecological role remain obscure. Members of the candidate division MSBL1 (Mediterranean Sea Brine Lakes 1) encompass an uncultured archaeal lineage that is abundant and widespread in deep hyper-saline anoxic basins (DHABs) of the Mediterranean Sea, the Red Sea, and the Gulf of Mexico [3][4][5] . 16S rRNA signature sequences of this group were also reported from the anoxic hypolimnion of a shallow hyper-saline Solar Lake in Egypt 6 , sediments of hyper-saline Lake Chaka in China 7 , from a crystallizer in a multi-pond solar saltern in the south of Mallorca Island 8,9 and recently in metagenomic libraries from a hyper-saline lake in Kenya (Mwirichia et al. unpublished data). MSBL1 have been postulated to be methanogenic based on their phylogenetic position and circumstantially because they are the numerically dominant archaeal group in DHABs, where incidentally also high methane concentrations of presumably biogenic origin are present 3,10,4,11 . However, with the exception of few sequences distantly related to Methanohalophilus 4,12 no other homologs of the key methanogenic enzyme, methyl coenzyme-M reductase (mcrA) have been recovered from the brine pools studied so far. Therefore, the exact metabolism of this group remains enigmatic in the absence of cultured representatives or larger contigs of genomic sequences. In this study, we applied single-cell genomics using cells of the MSBL1 clade from the Red Sea brine pools to reconstruct their metabolic potential. Our study provides the first evidence of their non-methanogenic metabolic capabilities that enable them to thrive in the anoxic DHABs.
Carbohydrate transport and metabolism. Genes encoding transcriptional regulators involved in carbohydrate transport and metabolism were identified in 18 of the SAGs (Supplementary Table S2), the most important being the archaeal sugar-specific transcriptional regulator TrmB. In Thermococcus litoralis, the respective protein is involved in the maltose-specific regulation of a gene cluster (malE, malF, malG, malK) that encodes a trehalose/maltose-binding protein-dependent ABC transporter for trehalose and maltose 18. The genes malE and malG were identified in the genomes of AAA259A05 and AAA259E19, respectively. MalE is a maltose-binding protein whereas MalG is a maltose transport system permease. Sugar transporters include a putative catabolism phosphotransferase system, a putative sugar ABC transport system and a glucose import ATP-binding protein TsgD13 (Supplementary Table S2). Potential substrates include glucose, galactose, arabinose, maltodextrin, maltose, xylose and ribose (Supplementary Table S2). Trehalose could play a significant role both as a carbon source and as a compatible solute involved in osmoprotection. In this group, trehalose is synthesized from maltose, starch or UDP-glucose (Supplementary Table S2; Supplementary Figure S3). The ability to utilize trehalose as an osmolyte would explain their rather normal pI as compared to that of other extreme halophiles. In the genome of AAA259B11, α-D-glucanotransferase may be involved in conversion of starch to trehalose. Supplementary Figure S3 summarizes the initial sugar metabolism to either α-D-glucose or trehalose.
Glycolysis/Gluconeogenesis. Diversity in sugar metabolism pathways in archaea as well as the variability in enzymes involved has been reviewed recently 19. The MSBL1 group uses a fermentative sugar metabolism that combines the classical and recently discovered (archaeal) enzymes of the Embden-Meyerhof (EM) pathway (Fig. 4; Supplementary Table S2). The absence of cytochromes, cytochrome oxidases and quinones in all the SAGs reinforces our hypothesis that these Archaea are likely to ferment and that they probably do not contain an electron transport chain. In addition, the presence of oxygen-sensitive enzymes (pyruvate-ferredoxin oxidoreductase) and the absence of catalase indicate a strictly anaerobic lifestyle, as expected within the anoxic brine environment.
During sugar metabolism via the EM pathway, glucose is converted to two molecules of pyruvate and yields two ATPs, reducing equivalents and intermediates that are precursors for cellular building blocks.

Table 1. Summary statistics of the clean contigs for SAGs described in this study. Contigs that were less than 2 kb were flagged as suspicious and omitted from the analysis. Genome completeness was computed using 104 and 191 conserved marker genes, respectively. * None of the AMPHORA marker genes were detected in this genome.
Scientific Reports | 6:19181 | DOI: 10.1038/srep19181

The key enzymes of the Entner-Doudoroff pathway, gluconate dehydratase and KDG aldolase, are missing in all the SAGs, which could be related to the fact that this pathway has a net yield of one ATP less than the EM pathway, which yields two ATP molecules. As illustrated in Fig. 3, the transported glucose molecules or those emanating from the hydrolysis of cellulose are probably converted to α-D-glucose 1-phosphate and α-D-glucose-6-phosphate, eventually entering the Embden-Meyerhof pathway. The genes involved in glycolysis are summarized in the Supplementary material.
Pentose metabolism. The oxidative pentose phosphate pathway is lacking in all the SAGs, consistent with findings in other archaea. Instead, pentoses are metabolized non-oxidatively by conversion of fructose 6-phosphate (C6) to ribulose 5-phosphate (C5). The four enzymes required in this archaeal pathway (fructose 1,6-bisphosphatase, fructose 1,6-bisphosphatase, 6-phospho-3-hexuloisomerase and 3-hexulose-6-phosphate synthase) were identified in ten of the genomes (Supplementary Table S2). Another source of ribulose 5-phosphate could be ribose sugars via the nucleotide salvage pathway. Ribulose bisphosphate carboxylase (identified in 14 of the SAGs) is an enzyme known to convert ribulose 1,5-bisphosphate to the highly unstable six-carbon intermediate 3-keto-2-carboxyarabinitol 1,5-bisphosphate, which spontaneously decays to two molecules of glycerate 3-phosphate. This end product is fed into the central metabolic pathway. The ribulose bisphosphate carboxylase proteins identified in nine of the SAGs are phylogenetically closely related to the archaeal form III cluster of RuBisCo proteins, which are able to fix CO2 to ribulose bisphosphate 21. These form III RuBisCo proteins have also been shown to participate in the AMP salvage pathway 22.
In the genome of AAA259A05, glyceraldehyde-3P is synthesized from deoxyribose sugars catalysed by ribokinase/phosphopentomutase and

The figure summarizes glycolysis/gluconeogenesis, autotrophic carbon fixation, one-carbon metabolism via the tetrahydrofolate/tetrahydromethanopterin pathways, sulfur, nitrogen, amino acid degradation and aldehyde metabolism. Membrane-associated proteins and proteins involved in solute or ion transport are anchored in the membrane and the arrows indicate the flow direction (import, export or symport). Encircled numbers represent the various enzymes, whereas the color of the tiny balls on the periphery indicates in how many of the SAGs the enzyme was identified: grey, 1-5 SAGs; blue, 6-10 SAGs; yellow, 11-16 SAGs. * denotes not detected. The enzymes are: (1) phosphoglucomutase; (2) PTS system cellobiose-specific IIA component protein; (3) glucose-6-phosphate isomerase; (4) 6-phosphofructokinase/pyrophosphate-fructose 6-phosphate 1-phosphotransferase protein; (5) fructose 1,6-bisphosphate aldolase; (6) fructose 1,6-bisphosphate aldolase-phosphatase protein; (7) glyceraldehyde-3-phosphate dehydrogenase; (8) tungsten-containing aldehyde ferredoxin oxidoreductase (GAPOR)/aldehyde oxidoreductase protein; (9) phosphoglycerate kinase protein; (10)

Carbon fixation. Though the genomes of the individual SAGs are largely incomplete, the complete TCA cycle was recovered in AAA259A05 and AAA259I09 and the genes involved are listed in Supplementary Table S2. MSBL1 possesses genes that are typically involved in autotrophic and anaplerotic CO2 fixation. The reductive TCA cycle leads to the fixation of two molecules of CO2 and the production of one molecule of acetyl-CoA, catalysed by the key enzymes 2-oxoglutarate-ferredoxin oxidoreductase and isocitrate dehydrogenase. These two genes were identified in seven SAGs, indicating that MSBL1 may have a functional reductive citric acid cycle.
However, ATP-citrate lyase, which catalyses an ATP-dependent cleavage of citrate to oxaloacetate and acetyl-CoA, was not detected in any of the SAGs. Instead, two homologs were identified in the genomes AAA261O19 (gene.AAA261O19_00625C) and AAA261C02 (gene.AAA261C02_00763C), albeit with a low similarity of 32% to the known enzyme. In the anaplerotic reaction, acetyl-CoA is (reversibly) reductively carboxylated to pyruvate by pyruvate:ferredoxin oxidoreductase (porA, porB and porC were identified in nine genomes), from which all other central metabolites can be formed or which can be used for gluconeogenesis via a reversal of the EMP pathway. Alternatively, the enzyme phosphoenolpyruvate carboxylase (present in SAGs AAA382A03_00089C and AAA382N08) is able to fix CO2 by using phosphoenolpyruvate 23. Neither pyruvate decarboxylase, which catalyses the decarboxylation of pyruvic acid to acetaldehyde and carbon dioxide, nor lactate dehydrogenase was detected in any of the SAGs. A glyoxylate bypass is also probably missing, as the two key genes (isocitrate lyase and malate synthase) were not detected. However, the organisms may import and degrade a variety of organic acids since beta-oxidation enzymes such as ferredoxin-dependent oxidoreductases are present (Supplementary Table S2). Acquisition of amino acids and proteins from the surrounding environment is evidenced by the occurrence of binding/transport proteins for branched-chain amino acids as well as oligopeptides (Supplementary Table S2). Beta-oxidation of the branched-chain amino acids uses enzymes that are also involved in the citric acid cycle (Fig. 4). The MSBL1 SAGs lack the enzyme acetate kinase, which catalyses the transfer of phosphate from ATP to short-chain aliphatic acids. However, genes for acetyl-CoA synthetase (which converts acetate to acetyl-CoA) were found in 16 of the SAGs.
On the other hand, CO dehydrogenase/acetyl-CoA synthase (Table S2), which participates in the Wood-Ljungdahl pathway, through which CO2 is fixed under anaerobic conditions 24, is present. The oxidative and reductive branches 25 of this pathway are present in the MSBL1, indicating that both one-carbon metabolism and carbon dioxide/carbon monoxide fixation might be possible. The occurrence of the various carbon fixation pathways is summarized in Fig. 4 and Supplementary Table S4. Among the autotrophic CO2 fixation pathways, the reductive acetyl-CoA pathway has the lowest energetic costs, requiring probably less than one ATP to produce pyruvate 26. We cannot exclude the possibility that this pathway is used in the oxidative direction to oxidize acetate as an energy substrate. This pathway requires metals, cofactors, a strictly anaerobic environment and substrates with low reducing potential such as H2 or CO, which restricts the pathway to anoxic niches such as the deep-sea brine pools. Facultative autotrophic archaea often down-regulate the enzymes that are specifically required for CO2 fixation when organic substrates (such as acetate) are available 26. Pyruvate formate lyase (present in AAA259B11, AAA259E22 and AAA259I09) catalyses the reversible conversion of pyruvate and coenzyme A into formate and acetyl-CoA. Formate dehydrogenase detected in the SAGs may be involved in the oxidation of formate to CO2, donating the electrons to NAD+ (since no cytochromes were detected). However, formate might also be reversibly incorporated into tetrahydrofolate by formyltetrahydrofolate ligase (although this is missing in the SAGs) and then go through a series of rearrangements resulting in the formation of 5-methyl-tetrahydrofolate (Fig. 4). The transfer of the methyl group of methyl-THF to carbon monoxide is mediated by a multi-enzyme complex and catalysed by CO-methylating acetyl-CoA synthase, yielding acetyl-CoA 26.
Therefore, the acetyl-CoA decarboxylase/synthase complex (ACDS) bidirectionally links the tetrahydromethanopterin and tetrahydrofolate pathways with CO2 as the initial substrate (Fig. 4). Notably, the pterin-containing tetrahydromethanopterin and tetrahydrofolate serve as carriers of C1 fragments between formyl and methyl oxidation levels in both anabolic and catabolic reactions 27. Tetrahydromethanopterin may be involved in autotrophy (Wood-Ljungdahl pathway in some archaea) as well as purine biosynthesis, whereas H4-folate could be used in the biosynthesis of methionine, serine and acetyl-CoA 28. None of the core genes usually found in methanogenic archaea were detected in the MSBL1 SAGs (Supplementary Table S4).
Branched-chain amino acid transporters and permeases, transporters and genes for the fatty acid beta-oxidation pathway were identified (Fig. 4, Supplementary Table S2). In the absence of fermentable sugars, these long-chain fatty acids (LCFA) could serve as an alternative carbon source for the MSBL1 group. The end product of LCFA biodegradation is acetyl-CoA, which can then be converted to pyruvate.
Sulphur metabolism. Sulphate, thiosulfate and sulfonates can be transported into the cell from the surrounding environment via ABC transporters and/or molybdate-tungstate transport system permeases or cysA/cysA2 proteins (Fig. 4, Supplementary Table S2). Assimilatory sulphate reduction proceeds via ATP sulfurylase to adenylylsulfate (APS), which is either reduced directly to sulphite through the activity of adenylylsulfate reductase (1.8.99.2) or converted to 3-phosphoadenylyl sulfate (PAPS) through the activity of sulfate adenylyl transferase. Finally, PAPS is reduced to sulphite by PAPS reductase (Fig. 4). Therefore, in MSBL1, sulphate reduction is putatively assimilatory, leading to synthesis of cysteine and homocysteine catalysed by cysteine synthase and cystathionine gamma-synthase, respectively (Supplementary Table S2). In six of the genomes, we detected a thiosulfate sulfurtransferase GlpE protein, which contains a rhodanese domain. Theoretically, the role of this protein is to transfer sulphur from thiosulfate to cyanide, yielding sulphite and thiocyanate. The sulfoxide reductase catalytic subunit YedY protein, which was identified in AAA259E22 and AAA259I07, is an inner-membrane bound protein that catalyses the reduction of sulfoxide to sulphite (Fig. 4). The sulphate reduction mechanism in MSBL1 probably proceeds in the same manner as in A. fulgidus and sulphate-reducing bacteria, whereby the CoB-CoM heterodisulfide reductase iron-sulphur subunit A protein transfers electrons via the adenosine 5′-phosphosulfate reductase (AprAB 1.8.99.2 adenylylsulfate reductase subunit) from the reduced menaquinone pool in the membrane to activated sulphate (APS, adenosine-5′-phosphosulfate), forming sulphite. Localization prediction on the TMHMM server (http://www.cbs.dtu.dk/services/TMHMM-2.0/) shows that these reductases are probably located outside the membrane. The membrane-associated dsrMKJOP complex essential for sulphur oxidation as well as dissimilatory sulphite reductase are absent in MSBL1. Finally, five of the SAGs encode a gene identified as ferredoxin-nitrite reductase, which is a homolog of the F420-dependent sulphite reductase 29,30. It has been hypothesized that this enzyme may be involved in assimilatory nitrite/sulphite reduction 31.
Nitrogen metabolism. The SAG AAA259D14 encodes two genes, nrtA and nrtD, that are essential for nitrate uptake from the environment. Neither assimilatory nitrate reductases (Nas) nor respiratory nitrate reductase (Nar) were identified in the genomes. However, genes encoding a periplasmic nitrate reductase (napA) were identified in four genomes (Supplementary Table S2). It has been proposed that periplasmic nitrate reductase can participate indirectly in respiration as part of the electron transport chain when coupled to a proton-translocating enzyme, such as NADH dehydrogenase I (NuoA-N enzyme), reviewed in references 32,33. In SAG AAA259E22, the napA gene is located on the same contig as heterodisulfide reductase (hdrABC), tetrathionate reductase subunit B, coenzyme F420-reducing hydrogenase and succinate dehydrogenase (iron-sulfur and flavoprotein subunits). In genome AAA261O19, the gene is located on the same contig as NADH-quinone oxidoreductase (subunits ACDHIK). The link between nitrate reduction and electron transport is also supported by the occurrence of genes such as those for ferredoxin-nitrite reductase protein, cytochrome c-type protein NrfB, 4Fe-4S ferredoxin iron-sulfur binding domain protein, electron transfer flavoprotein subunit alpha, electron transfer flavoprotein and periplasmic Fe-hydrogenase large subunit proteins (Supplementary Table S2). Other sources of nitrogen could be nitriles, which are converted to ammonia by a nitrilase (EC 3.5.5.1), or nitroalkenes (also called nitro olefins), which are oxidized to nitrite by nitronate monooxygenase (EC 1.13.12.16), whose orthologues were found in 15 genomes. Nitrilases act solely on carbon-nitrogen bonds to produce a carboxylate and ammonia. Eight SAGs encode an anaerobic nitric oxide reductase flavorubredoxin that can be used to detoxify nitric oxide using NADH 34.

Energy metabolism. Oxidative phosphorylation in MSBL1 consists of oxidoreductases, membrane-bound hydrogenases and dehydrogenases, NADH-quinone/ubiquinone oxidoreductases, fumarate reductase and an ATPase complex (Supplementary Table S2). Based on the subunit composition of the NADH-quinone/ubiquinone oxidoreductase, the potential electron donor is NADH, oxidized by NADH dehydrogenase (found in 9 of the SAGs as shown in Table S2). The NADH dehydrogenase is a flavoprotein that contains iron-sulfur centres. Iron-sulphur binding proteins, oxidoreductases and fumarate reductase possibly contribute to energizing the cell membrane as well as to the general intracellular flow of electrons. Several genes encoding coenzyme F420 hydrogenase and a putative hydrogenase maturation protease (EC 3.4.23.-) were identified in the SAGs (Supplementary Table S2). The CoB-CoM heterodisulfide reductase iron-sulfur protein (1.12.98.1) is similar to that of Methanothermobacter fervidus and is involved in sulphate reduction as discussed above. The subunit FrhB of F420-reducing hydrogenase carries the binding site for the prosthetic groups F420, FAD and a [4Fe-4S] cluster 35. Putative K+-stimulated pyrophosphate-energized sodium pumps are probably involved in oxidative phosphorylation in the MSBL1.
Transport. Transporters identified in the SAGs include ABC transport systems for branched-chain amino acids, arginine, ornithine, dipeptides, spermidine/putrescine, sugars and acids as well as for uptake of metal ions (Supplementary Table S2). These compounds provide the necessary substrates for numerous biosynthetic and degradation pathways. Additionally, ion transporters facilitate the flux of the different ions into and out of the cells (Fig. 4). For example, iron ions are essential for the synthesis of iron-sulphur clusters in the [NiFe] hydrogenases, formylmethanofuran dehydrogenases, heterodisulfide reductase, ferredoxins, and [Fe] hydrogenase. Phosphate is probably taken up by a PstABCS and PhoU system as described by Aguena and Spira 36. This is supported by the occurrence of the respective genes/proteins involved in regulation and uptake of phosphorus from the environment. For example, in the SAG AAA259B11, genes for phosphate uptake regulation protein (PhoU), phosphate-binding protein (PstS), ABC transporter permease protein (YqgH), phosphate import ATP-binding protein (PstB) and phosphate transport system permease protein PstA are all located on the same contig. Neither anion permeases nor sodium-dependent phosphate transporters were identified in any of the SAGs. In microorganisms, molybdate ions are required for the synthesis of the molybdenum-dependent formylmethanofuran dehydrogenase, formate dehydrogenase and nitrogenases 37,38. On the other hand, tungstate ions are required for the synthesis of the tungsten-dependent formylmethanofuran dehydrogenase 39 and their uptake from the environment is mediated by a tungsten transport protein (WtpA).

Stress response. Carbon starvation genes (carbon starvation-induced protein A) were detected in the SAGs AAA261C02, AAA261O19, AAA833F18 and AAA833K04. This is a predicted membrane protein probably involved in peptide utilization when carbon becomes limiting 40,41. Stress response genes include the small heat shock protein C4 and a universal stress protein YxiE (14 of the genomes). The universal stress protein UspA identified in four of the genomes is a small cytoplasmic bacterial protein whose expression is enhanced when the cell is exposed to stress agents 42. Oxidative stress genes in MSBL1 include a putative oxidative stress-related rubrerythrin protein, putative superoxide reductases, glutaredoxins and thioredoxins. Glutathione and glutaredoxins are involved in disulphide reductions in the presence of NADPH and glutathione reductase 43. Genes for resistance to heavy metals as well as antibiotics are listed in Supplementary Table S2. These include genes for resistance to cadmium and arsenate as well as to antibiotics such as daunorubicin, methicillin, quinolones and tetracycline. The complete archaeal gene cluster for motility is missing in all the genomes, though twitching motility protein PilT occurs in 21 SAGs. On the other hand, genes for pilus assembly are more widespread in the group (Supplementary Table S1) and could be responsible for secretion and cell-to-cell signalling.

Discussion
MSBL1 have been presumed to be methanogens on the basis of phylogenetic placement or the presence of large amounts of methane in the environments where they have been detected 3,9,10,44,45. Phylogeny based on the Silva aligner places the MSBL1 within the class Thermoplasmata 16 with identity scores between 83 and 86.9%. Previous phylogenetic placement of this group was summarized by Antunes et al. 5. When shorter clone sequences are included in the analysis, the MSBL1 lie in the radiation of the uncultured Euryarchaeota group SAGMEG 3,44,46 or other uncultured groups 9. However, the low bootstrap values in the phylogenetic trees do not allow for a clear placement at this point in time. In our analysis, we chose only full-length 16S sequences from the SAGs and comparative genomes from the NCBI database in order to have consistency between the phylogenetic trees based on 16S rRNA genes and on core proteins. The MSBL1 group has been exclusively reported from hypersaline environments. The brine environment is one of the most extreme environments, and microorganisms specifically adapted to it have probably evolved mechanisms that enable them to thrive under these conditions. The common adaptation mechanisms have been previously described 11,47. The ability of the MSBL1 archaea to import or synthesize osmolytes enables them to maintain intracellular osmotic balance and hence cope with salt stress in the hypersaline environments where they have been reported. This is evidenced by the presence of transporters for glycine-betaine (Fig. 4) and also genes for biosynthesis of trehalose (Supplementary Figure S3). Furthermore, the slightly acidic proteome signature is associated with organisms employing the "salt out" strategy, in contrast to the extreme halophiles that have a highly acidic proteome (Supplementary Figure S2) and use the "salt in" strategy 47,48.
MSBL1's ability to operate between heterotrophy (sugar fermentation) and potentially autotrophic CO2 fixation (Fig. 2) highlights the possibility of a flexible mixotrophic lifestyle that might explain why MSBL1 is the major group reported, for example, in Lake Medee brine 4 as well as in the metagenomic samples collected from the Atlantis II and Discovery brine pools 12. Methanogenesis as we know it cannot occur in the absence of methyl-coenzyme M reductase and the associated cofactor (F430). We were not able to detect mcrA genes in the genomes, nor were we able to amplify mcrA genes from the MDA-DNA that was used to generate the genome sequences. In addition, none of the 15 core genes found in methanogenic archaea 49 were detected in the MSBL1 SAGs (Supplementary Table S4). Moreover, at high salinity, methanogenesis from H2 + CO2 or from acetate, dissimilatory sulphate reduction coupled to the oxidation of acetate, and autotrophic nitrification have been mentioned as some of the energy-producing reactions that are bioenergetically unfavourable 47,50. Therefore, methane encountered in the brines could originate from other biochemical processes or be produced by MSBL1 through a novel pathway independent of the canonical mcrA-associated pathway. For example, the low amounts of methane observed in Archaeoglobus 51 and in sulphate-reducing bacteria 52 result from transfers of methyl groups by CO dehydrogenase. It is reported that the methyl group of N5-methyltetrahydromethanopterin can be reduced to methane and tetrahydromethanopterin by carbon monoxide (CO) dehydrogenase 53,54. Assimilatory sulphate/nitrite reduction in the MSBL1 is catalysed by a ferredoxin-nitrite reductase 31. Dissimilatory nitrate reduction serves to oxidize excess reducing equivalents 32 and is potentially catalysed by the periplasmic nitrate reductase, with formate dehydrogenase providing the electrons 55.
Stress response involves a repertoire of genes in the different SAGs (Supplementary Table S2). These include the universal stress protein UspA 42 , rubrerythrin (Rr) also found in anaerobic sulphate-reducing bacteria 56 and rubredoxin 57 . Glutaredoxins and thioredoxins are proteins that act as antioxidants by facilitating the reduction of other proteins by cysteine thiol-disulphide exchange and therefore play a role in alleviating oxidative stress in MSBL1. The data presented here provide a first insight into the metabolism of this enigmatic uncultured archaeal lineage encountered in hypersaline environments. We are convinced that the metabolic reconstruction and genome sequences here will guide future isolation efforts.

Materials and Methods
Sampling sites and sample preparation. Samples used in this study were collected from the Atlantis II, Nereus, Erba and Discovery brine pools in the Red Sea (Table 2; Fig. 1) between the 16th and 29th of November 2011 during the 3rd KAUST Red Sea Expedition-Leg 2 on the vessel R/V Aegaeo.
Samples for the single-cell sorting were processed as follows: Small volumes of sample (ca. 30 ml) were collected and divided into two parts. The first aliquot of the sample was stored unfixed whereas the second part was fixed by adding glycerol (final concentration 10%) and immediately placed at -20 °C. Ten ml of the unfixed sample were transferred into a serum bottle and sent for cell sorting. Large volumes (ca. 480 L) of sample were collected into 20 L carboys and bubbled with nitrogen gas. Concentration was done using a Tangential Flow Filtration (TFF) system equipped with a 0.1-μm cassette filter and coupled with a 5.0-μm pre-filter. A 10-ml portion of the concentrates was transferred to a serum bottle and sent for cell sorting. Cell sorting, lysis, whole genome amplification and SSU rRNA PCR were performed as described 58 at the Bigelow Laboratory Single Cell Genomics Centre (http://www.bigelow.org). Amplification of the mcrA gene from the MDA reactions was done as published previously 59.
Sequencing, assembly, annotation. The SAG DNA was cleaned in preparation for sequencing using the ethanol/sodium acetate precipitation method and re-suspended in 25 μL of MilliQ water. Quantification of the DNA was performed using the Quant-iT dsDNA HS assay kit and a Qubit fluorometer (Invitrogen GmbH, Karlsruhe, Germany) as recommended by the manufacturer. Sequencing was done at the Bioscience core facility, King Abdullah University of Science and Technology, on an Illumina HiSeq 2000 platform. Assemblies of the single-cell amplified genomes (SAGs) were generated using a pipeline that employs assemblers designed for single-cell sequencing data, including VelvetSC 60, SPAdes 61 and IDBA-UD 62, along with several pre- and post-assembly data quality checks using Trimmomatic 63. In our benchmarking tests, IDBA-UD showed better contig-level assemblies and the assembled contigs were used in further analysis.
After quality control (described below), genome annotation for each of the SAGs was carried out as described in Alam et al. 64. Briefly, given a set of DNA sequences from a particular SAG, the Automatic Annotation of Microbial Genomes (AAMG) pipeline first detects rRNA and tRNA. To avoid prediction of Open Reading Frames (ORFs) in regions where RNA was detected, all such DNA regions are masked, followed by ORF prediction using Prodigal 65 and MetaGeneAnnotator 66. After ORF prediction is complete, a series of similarity searches is performed to select optimal gene annotations using UniProt, NCBI's NR, NCBI's Conserved Domain Database (CDD), the KEGG database and finally InterProScan. All annotations, including DNA and ORF sequences, are then stored in an integrative data-warehouse of microbial genomes (INDIGO, see methods in reference 64) for easy look-up. As none of the SAGs represent complete genomes, the metabolic reconstruction in Fig. 4 represents our current working hypothesis for the metabolism of MSBL1.
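The masking step described above can be illustrated with a minimal Python sketch. It is not the AAMG implementation: the function name `mask_regions`, the 0-based half-open coordinates and the use of `N` as the masking character are assumptions made for illustration only.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Return `sequence` with every [start, end) region replaced by mask_char.

    Coordinates are assumed to be 0-based and half-open; regions that
    extend past the sequence ends are clipped.
    """
    masked = list(sequence)
    for start, end in regions:
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = mask_char
    return "".join(masked)


# Hypothetical contig and hypothetical rRNA/tRNA hit coordinates.
contig = "ATGCGTACGTTAGCCGATAA"
rna_hits = [(3, 8), (15, 18)]
print(mask_regions(contig, rna_hits))  # ATGNNNNNGTTAGCCNNNAA
```

Keeping the masked sequence the same length as the input means that ORF coordinates predicted afterwards still refer to positions on the original contig.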
Quality control. Our approach to decontaminating the draft SAG assemblies was simple and exploited several genomic features of the contigs: GC content, size, gene content and tetranucleotide frequency (TNF) 67. The filters were kept independent, and a contig had to pass all of them to be placed in the clean bin. A contig whose GC content lay outside a ±10% range around the average was marked as potentially contaminated 68. However, the average GC content calculated for a draft assembly can be spurious and misleading if the assembly contains many contaminating sequences. We observed some genuine contigs (passing the size, gene content and all other filters) ending up in the contamination bin merely because their GC content fell slightly outside the range. This problem was overcome in two different ways. First, if the analysis was done on a single SAG, the average GC content was calculated on the set of large contigs, i.e., the contigs constituting more than fifty per cent of the draft assembly (the N50 contigs). The second solution was applied when the analysis involved a group of SAGs belonging to the same taxon that needed to be cleaned in concert. In this case, single-copy conserved genes were identified in each SAG using the Bio-Hal pipeline (PMID:21327165), and the corresponding contigs were pooled to constitute a set (named "seed contigs") on which the GC content was calculated. This single GC value calculated on the "seed contigs" could be applied for cleaning all SAGs of the group. Alternatively, the seed contigs could be kept separate for each SAG and the GC content calculated on each set. The size filter was relatively simple, and its cut-off could be fixed using the contig statistics of the draft assembly. For this cleaning the size threshold was 2000 bp, i.e., any contig below 2 kb was discarded 68. Nonetheless, manual inspection of the contigs discarded solely by the size filter is always advised.
Smaller contigs (500 bp < x < 2000 bp) might also contain important genes. If the majority (50% or more) of the genes in a scaffold/contig hit a non-target phylum, that contig was discarded 68. Binning of contigs was done at the domain level, either bacteria or archaea. Once binning based on the above three filters was complete, both the clean bin and the bin of all contigs were subjected to Canonical Correspondence Analysis (CCA) using the TNF of the sequences, and the contigs were visualized on the resulting plots 67. Canonical Correspondence Analysis was done in R using the Vegan package (https://cran.r-project.org/web/packages/vegan/index.html), while plotting was done in R using custom scripts developed in our group. The plot showing all contigs of the assembled genome gave a clear idea of the level of contamination in terms of the phylogeny and GC content profiles of the contigs. The subsequent plot using only the clean contigs was much clearer and helped to identify the few false positives, which were widely dispersed but had passed all three filters to reach the clean bin. Such contigs were inspected manually to decide whether to keep or discard them. The number of single-copy conserved genes present in multiple copies is an important indicator of contamination, or may reflect a spurious assembly. To check the distribution of conserved clusters of orthologous groups (COGs) in the genome, and thus obtain an estimate of "genome completeness", we used different COG sets for bacteria and archaea (adopted from the Human Microbiome Project); the R package Vegan was used (https://cran.r-project.org/web/packages/vegan/index.html). In our single-cell genome data, we observed that multiple copies of a conserved gene could be located on different contigs. In most cases, the contig with the largest size and the highest gene content was retained as part of the genome. Altogether, the QC pipeline addresses the contamination present in the draft single-cell genomes using various genomic features, both sequence-dependent and sequence-independent.
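The three independent filters (GC content, size and gene content) can be sketched as follows. This is an illustrative reading of the description above, not the pipeline's actual code. All function and parameter names are hypothetical. The sketch also assumes that the ±10% window is measured in percentage points, that the reference GC content is computed on the pooled sequence of the largest contigs covering at least half of the assembly, and that a contig without annotated genes passes the gene-content filter.

```python
def gc_percent(seq):
    """GC content in percent, ignoring any character other than A, C, G, T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def reference_gc(contigs):
    """GC% of the largest contigs that together cover >= 50% of the assembly.

    Assumption: the pooled sequence is used, i.e. a length-weighted average.
    """
    ordered = sorted(contigs, key=len, reverse=True)
    half = sum(len(s) for s in ordered) / 2.0
    covered, chosen = 0, []
    for seq in ordered:
        chosen.append(seq)
        covered += len(seq)
        if covered >= half:
            break
    return gc_percent("".join(chosen))


def passes_filters(seq, gene_taxa, ref_gc, target,
                   min_len=2000, gc_window=10.0):
    """Return True only if the contig passes all three filters.

    gene_taxa: one taxon label per gene on the contig (e.g. best-hit domain).
    """
    # Size filter: discard contigs shorter than 2 kb.
    if len(seq) < min_len:
        return False
    # GC filter: discard contigs outside the window around the reference.
    if abs(gc_percent(seq) - ref_gc) > gc_window:
        return False
    # Gene-content filter: discard if 50% or more of genes are off-target.
    if gene_taxa:
        off_target = sum(1 for taxon in gene_taxa if taxon != target)
        if 2 * off_target >= len(gene_taxa):
            return False
    return True


# Hypothetical usage with synthetic sequences.
assembly = ["GCAT" * 1000, "GCAT" * 250]
ref = reference_gc(assembly)
clean = [s for s in assembly if passes_filters(s, ["Archaea"], ref, "Archaea")]
print(len(clean))
```

Keeping each filter as a separate early return mirrors the requirement that the filters stay independent and that a contig must pass all of them. It also makes it easy to record which filter rejected a contig when the discarded contigs are inspected manually.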
Evolutionary relationships. The SAGs and the representative genomes were scanned for common marker genes (CMG) using the phylogenomic inference tool AMPHORA2 14 along with its set of predefined marker genes. The identified marker genes were concatenated in the same order across all samples and saved in multi-FASTA format, with the sample names as headers. The concatenated sequences were then aligned using Muscle 69 with default settings. The Gblocks tool 70, with default settings, was used to remove ambiguously aligned positions from the Muscle alignment. Phylogenetic trees were inferred from trimmed alignments of either nucleotide sequences (16S tree) or amino acid sequences (protein tree) of ten concatenated proteins present in all the genomes included in the analysis, using the pthreads-parallelized RAxML 71 version 7.2.8. The ten concatenated proteins are: translation initiation factor IF-2 (infB); the 50S ribosomal proteins rpl18p, rpl19e, rpl32e, rpl5p, rpl6p and rpl7ae; and the 30S ribosomal proteins rps28e, rps6e and rps8p. Fast bootstrapping was applied with a subsequent search for the best tree 72, the autoMRE bootstopping criterion 73 and the LG model of amino acid evolution 74 (with which these data yielded the highest log likelihood among all empirical protein models implemented in RAxML), in conjunction with gamma-distributed substitution rates 75 and empirical amino acid frequencies.
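The concatenation of marker genes into a multi-FASTA file can be sketched as below. This is a minimal illustration rather than the script used in the study. The function name, the input layout (a dictionary of sample name to marker sequences) and the alphabetical ordering of samples are assumptions. Because the analysis used only proteins present in all genomes, the sketch raises an error when a marker is missing instead of padding with gaps.

```python
def concatenate_markers(marker_seqs, marker_order):
    """Build a multi-FASTA string with one concatenated record per sample.

    marker_seqs:  {sample_name: {marker_name: sequence}}
    marker_order: list of marker names; the same order is used for every
                  sample so that homologous regions line up after alignment.
    """
    records = []
    for sample in sorted(marker_seqs):
        genes = marker_seqs[sample]
        missing = [m for m in marker_order if m not in genes]
        if missing:
            raise ValueError(f"{sample} lacks markers: {', '.join(missing)}")
        joined = "".join(genes[m] for m in marker_order)
        records.append(f">{sample}\n{joined}")
    return "\n".join(records) + "\n"


# Hypothetical sample names and toy sequences, for illustration only.
toy = {
    "SAG_B": {"infB": "MKL", "rps8p": "AVG"},
    "SAG_A": {"rps8p": "TT", "infB": "MK"},
}
print(concatenate_markers(toy, ["infB", "rps8p"]), end="")
```

Fixing the marker order in one explicit list, rather than relying on the order in which markers were found in each genome, is what guarantees that every record is concatenated identically.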
Tree searches under the maximum parsimony (MP) criterion were conducted with PAUP* version 4b10 76 using 100 rounds of random sequence addition. MP bootstrap support was calculated with PAUP* using 1000 replicates with ten rounds of heuristic search per replicate. The 16S rRNA gene datasets were analyzed in the same way, but using GTR as the substitution model.
pI estimation. In addition to the predicted protein-coding genes of our SAGs, we extracted the protein-coding genes from NCBI GenBank files of exemplary extreme halophiles (Halobacterium sp. NRC1 and Salinibacter ruber), moderate halophiles (Chromohalobacter salexigens, Idiomarina loihiensis L2TR and Nitrosococcus halophilus Nc4), as well as those of typical marine bacterioplankton (Pelagibacter ubique HTCC1062, Pelagibacter sp. HTCC7211, Nitrosopumilus maritimus SCM1 and Nitrococcus mobilis). The isoelectric points (pIs) of these proteomes were calculated using the "iep" program in the EMBOSS software package (v6.5.1; http://emboss.sourceforge.net/what/) with the following settings: -amino 1 -termini YES -step 0.2.
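For reproducibility, the iep settings listed above can be assembled into a command line as in the following sketch. The wrapper functions and file names are hypothetical and were not part of the study. The input and output qualifiers (-sequence, -outfile) are standard EMBOSS iep options that are assumed here, since the text gives only the three analysis settings.

```python
import shutil
import subprocess


def iep_command(proteome_fasta, out_path):
    """Return the argument list for one EMBOSS iep run.

    The -amino, -termini and -step values are those stated in the text;
    the file arguments are placeholders supplied by the caller.
    """
    return [
        "iep",
        "-sequence", proteome_fasta,
        "-outfile", out_path,
        "-amino", "1",
        "-termini", "YES",
        "-step", "0.2",
    ]


def run_iep(proteome_fasta, out_path):
    """Run iep if it is installed; raise a clear error otherwise."""
    if shutil.which("iep") is None:
        raise RuntimeError("EMBOSS 'iep' was not found on PATH")
    subprocess.run(iep_command(proteome_fasta, out_path), check=True)


# Hypothetical file names; only the command is printed, nothing is executed.
print(" ".join(iep_command("MSBL1_SAG.faa", "MSBL1_SAG.iep")))
```

Building the command as a list and passing it to `subprocess.run` without a shell avoids quoting problems when file names contain spaces.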