Microbial diversity in the hypersaline Lake Meyghan, Iran

Lake Meyghan is one of the largest and commercially most important salt lakes in Iran. Despite its inland location and high altitude, Lake Meyghan has a thalassohaline salt composition suggesting a marine origin. Inputs of fresh water by rivers and rainfall formed various basins characterized by different salinities. We analyzed the microbial community composition of three basins by isolation and culturing of microorganisms and by analysis of the metagenome. The basins that were investigated comprised a green ~50 g kg−1 salinity brine, a red ~180 g kg−1 salinity brine and a white ~300 g kg−1 salinity brine. Using different growth media, 57 strains of Bacteria and 48 strains of Archaea were isolated. Two bacterial isolates represent potential novel species with less than 96% 16S rRNA gene sequence identity to known species. Abundant isolates were also well represented in the metagenome. Bacteria dominated the low salinity brine, with Alteromonadales (Gammaproteobacteria) as a particularly important taxon, whereas the high salinity brines were dominated by haloarchaea. Although the brines of Lake Meyghan differ in geochemical composition, their ecosystem function appears largely conserved amongst each other while being driven by different microbial communities.

a tectonic depression that drains an extensive catchment area 15 . High evaporation and a lack of rainfall in recent years in this region has led to an increased salinity 15 . The lake itself is located at an altitude of 1,660 m in an area with an arid to semi-arid continental climate and the temperature ranges from −30 °C in winter to 40 °C in summer. The lake receives average annual precipitation of ~320 mm and the maximum lake depth during the wet season is ~1.5 m, while the average annual evaporation is ~2,070 mm. Ephemeral rivers along with smaller rivulets deliver sediment-loaded rainwaters from the catchment area into the lake basin. Lake Meyghan is commercially one of the most important hypersaline lakes in Iran because of its mineable sodium sulfate deposits, which is the largest in the Middle East 15 . Despite its origin and location, which could have resulted in an athalassohaline composition like the divalent cation-rich Dead Sea 16 , Lake Meyghan is sodium chloride-dominated with a high sulfate concentration of 48-62 g L −1 15 , more than three times the concentration of Great Salt Lake (10-20 g L −1 ) 17, 18 and a slightly higher pH (7.7-8.8) than other salt lakes. Besides the presence of birds that feed on the abundantly present brine shrimp -Artemia salina -nothing is known about the biodiversity in the hypersaline parts of the lake. Here we describe the first metagenomic analysis of the microbial community composition in an Iranian salt lake. In addition, we performed a cultivation dependent analysis and reveal that the microbiota of Lake Meyghan share commonalities with both thalassohaline and athalassohaline hypersaline lakes and that each of the stations exhibit their own microbial signature.

Results
Site description. The shallow brines of Lake Meyghan were sampled at three different sites that were named according to the dominant brine color, i.e. G (green), R (red) and W (white) (Fig. 1). The physicochemical properties of the brine samples are presented in Table 1. Na + and Cl − were identified as the major ions in the three samples, followed by SO 4 2− and Mg 2+ . With a pH of 8.8 (G), 7.9 (R), and 7.7 (W), the brines were moderately alkaline with salinities of 50 g kg −1 , 180 g kg −1 , and 300 g kg −1 , respectively. The bacterial and archaeal abundance was estimated by qPCR using domain specific 16S rRNA primers ( Table 2). The sum of the archaeal and bacterial 16S rRNA gene copies was used to estimate the total prokaryote abundance. Total 16S rRNA gene copy number decreased by ~50% with increasing salinity from 8.1 × 10 6 (G) to 3.6 × 10 6 (W) copies per ml. The relative contributions of Bacteria and Archaea also changed with increasing salinity. Bacteria were the dominant group (79%) at low salinity (G) and Archaea were dominant (84%) at the highest salinity (W).
Bacterial and archaeal isolates. A total of 361 strains was isolated using different media. From each site, the 16S rRNA gene of 35 randomly chosen isolates (105 in total) was sequenced (Supplementary Table S1, sheet 9). The low salinity pond (G) yielded mainly Bacteria (31 out of 35). Proteobacteria were the dominant group with 20 Gammaproteobacteria (e.g. Idiomarina sp. and Halomonas sp.) and 3 Alphaproteobacteria. The other bacterial isolates belong to the phyla of Actinobacteria (3), Bacteroidetes (4), and Firmicutes (1). From the medium-salinity red pond (R), 16 Bacteria were isolated of which 9 belong to the Firmicutes (e.g. Bacillus sp. and Thalassobacillus), 5 Gammaproteobacteria, and 2 Actinobacteria. Bacteroidetes and Alphaproteobacteria were not amongst the isolates from the red pond. From the highest salinity, 10 Bacteria were retrieved, 6 of which belong to the Gammaproteobacteria, 3 Firmicutes, and 1 Alphaproteobacteria. Of the sequenced isolates 4, 19, and 25 were of archaeal origin from the Green (G), Red (R), and White (W) sample sites, respectively (Supplementary Table S1, sheet 9). All archaeal isolates belong to three families within the class of Halobacteria: Halobacteriaceae, Haloferacaceae and Natrialbaceae with respectively Haloarcula, Halorubrum, and Natrinema as the dominant genera. We did not obtain Haloferacaceae in the collection of the lower salinity brine isolates while they dominate the collection of isolates at the highest salinity (site W, 10 isolates).
Metagenome community analysis. The metagenome of the Meyghan lake communities yielded per site ~35 million paired sequence reads of ~100 nucleotides each (Table 3). MG-RAST identified 6.8 million (G), 4.1 million (R), and 3.0 million (W) reads, corresponding with alignment-identified protein features in green, red, and white samples, respectively. Due to the insecurity in correctly assigning the relatively small sequencing reads at the species level we further analyzed the taxonomic annotation derived from rRNA fraction in MG-RAST at the genus level (Supplementary Table S1 sheet 10). Rarefaction curves obtained from the genus assigned rRNA fraction of the metagenomes for the bacterial and archaeal populations are nearing an asymptote showing that the sequencing depth is sufficient to identify the majority of abundant genera but may miss a number of rare genera (Fig. 2). The green site is dominated by bacteria (78%) (Fig. 3A) with an estimated richness of 374 genera while the hypersaline lakes have an estimated richness of 90 (R) and 86 (W) bacterial genera. Archaea dominate the hypersaline lakes (64%, red and 66% white) (Fig. 3A), but with an estimated richness of 29 (G) and 33 (W + R), the archaeal richness is much lower than that of Bacteria. Bacterial genus diversity using the Shannon diversity index was estimated at 5.1 (G), 2.8 (R) and 3.3 (W). The archaeal diversity was estimated at 2.5 (G), 1.8 (R) and 2.4 (W).
In all brines we found approximately 15% of the reads belonging to Eukarya. The dominant bacterial phyla were, in order of abundance, Proteobacteria, Firmicutes, Bacteroidetes, and Actinobacteria, which made up more than 94% of the total bacterial diversity in the three brines (Fig. 3B). In addition, a large number of bacterial reads   was found that could not be assigned to any known phylum. In the hypersaline brines (W and R), Bacteroidetes were more abundant than Firmicutes. Notably, two phyla, Planctomycetes and Cyanobacteria, were present at the lowest as well as at the highest salinity (albeit at low abundance) but were absent from the intermediate salinity brine (R). The dominant proteobacterial class in all samples were Gammaproteobacteria ( Fig. 3C) Table S1, sheet 6). Overall, at the genus level, the top three dominant genera amongst the bacteria that could be assigned at the genus level were Pseudoalteromonas, Shewanella, and Clostridium (Table 4). Furthermore, an unassigned bacterial genus and an unassigned proteobacterial genus were abundantly present. More than 97% of the archaeal reads belong to the class of Halobacteria, while only a few methanobacterial reads were found. In fact, at the two highest salinities, Halobacteria made up approximately 63% of all rRNA reads in contrast to a mere 6% at the lowest salinity. The halobacterial reads were distributed over three families, the Haloferacaceae at 50-70% relative abundance with Halorubrum and Haloferax as dominant genera, Halobacteriaceae at 23-30% with Natronomonas as dominant genus at high salinity and Haloarcula at the lowest salinity, and Natrialbaceae at 6-19% relative abundance with Natronorubrum and Natronolimnobius as dominant genera at the highest salinities and Natronococus at the lowest salinity (Supplementary Table S1, sheet 7). Halorubrum, Haloferax, and Natronomonas are also overall the three most abundant prokaryote genera in the two hypersaline brines ( Table 4). Most of the micro-eukaryote 18S rRNA sequences in the metagenomes belong to the Chlorophyta family of Dunaliellaceae with 1,057, 1,368, and 1,602 reads for sites G, R, and W, respectively. Most of these reads (70-94%) were assigned to Dunaliella salina (Fig. 3A).
Diversity indexes from the metagenome dataset revealed the highest number (4332) of observed species (Chao-1) at the lowest salinity site (G), which also had the highest evenness (0.04) and Shannon diversity index (4.8). For the two high salinity sites identical Shannon diversity (3.1) and evenness (0.01) were calculated while the observed species richness indexes were also similar (2682 and 2686 for R and W, respectively).
From the metagenome we extracted data of functions as defined by the gene onthology set (GO-slim) 19 . The distribution of functions was highly similar between the red and white brine and differed only slightly from the green brines. Comparison of the low salinity brine with the average of the high salinity brines revealed only small differences in functional diversity (Fig. 4). Cofactors-vitamins-prosthetic groups-pigments, DNA metabolism, and RNA metabolism are functions that occurred in higher numbers in the hypersaline sample, whereas processes related to nucleosides and nucleotides, cell wall and capsule, stress response, fatty acids-lipids-isoprenoids, and virulence-disease-defense were more pronounced in the low salinity brine. A list of most abundant genes was generated from the MG-RAST subsystems dataset (Table 5). In total, 345 protein-coding genes were identified at 55% amino acid identity (data not shown). The top ten most abundant genes made up 43.7% of the reads from the green brine, 69.7% of the red brine and 81.5% of the white brine. Table 5 revealed no overlap in dominant functions between the three brines.
The co-assembled reads database of the metagenome contained 25,406 contigs with a minimum contig length of 2,500 bp, which corresponded to 27% of all contigs and 59% of all nucleotides found in the database. The contigs were organized in 60 bins with 2 to 2,609 contigs per bin. The contigs in the bins were split to ensure that no read exceeded 20,000 bp. The total number of splits (corresponding to tree tips of the dendrogram, representing the shared and unique bins) was 25.749, which are plotted on the X-axis (Fig. 5). The GC content varied from 34.1 to 70.5%. Applying a cut-off of 1% recruitment (of the total number of reads from a sample site), only 1 bin (number 20) contained (split) contigs that aligned with reads from all three sample sites. However, 11 (bin numbers 1, 2, 3, 7, 11, 13, 14, 20, 25, 26, and 28) were shared between the red and white sample sites at a recruitment of >1%. When considering a mean coverage higher than 1% (the number of reads of a sample that aligns with a contig in a particular bin), only three bins (number 20, 34, and 48) contained contigs that aligned with reads from all three sites. Figure 5 shows that the green site was very different from the two higher salinity sites and that there was hardly overlap. The red and white sites also form distinct clusters but there was more overlap between them. Moreover, it is remarkable that the low salinity green site was characterized by a lower GC content (average 54%) compared to the high salinity sites (average 61%).
For the comparison of genera obtained by isolation with genera identified by metagenomics analysis (MG-RAST and read assembly), samples from the different stations were combined (Tables 6 and S1 sheet 11). The bacterial genera Halomonas and Pseudomonas are abundant amongst all datasets (top 10). Idiomarina are abundant amongst the isolates and contigs but rank lower in the MG-RAST dataset. Several isolates, e.g. Bacillus,   Marinobacter and Streptomyces, are present in the MG-RAST dataset but are not found the annotated contigs. On the other hand, isolates belonging to the genus Algoriphagus, Kocuria and Spiribacter are present in the assembled contigs but absent in the MG-RAST annotation. Interestingly, nine of isolated genera were not found in the two metagenome datasets. Salinibacter, Clostridum and Shewanella are among the major genera present in the metagenome datasets but absent from the isolates. For the Archaea, only seven genera were isolated (Tables 6 and S1 sheet 11) and only three genera were identified by their rRNA in the metagenome assemblies while thirty-four were annotated by MG-RAST (Supplementary Table S1 sheet 11). Haloarcula was the only genus present in all three datasets and identified at high numbers. Halorubrum was the dominant genus among the isolation and MG-RAST annotation but is absent from the assembly. Abundant genera like Haloferax, Natronomonas, Natronorubrum and Halobacterium are only found in the MG-RAST annotated dataset (Supplementary Table S1 sheet 11).

Discussion
Based on its inland location, unconnected to the sea and the input of mainly riverine freshwater enriched with salts leaching from its immediate environment, an athalassohaline composition was expected. Lake Meyghan has a thalassohaline composition, which is not uncommon for inland hypersaline lakes 11,12,16,20,21 . However, molecular and cultivation analysis revealed organisms that are commonly found in thalassohaline and athalassohaline lakes (see below). The total number of 16S rRNA gene copies in the genomes of each community member is unknown and reports suggest copy numbers in fully sequenced bacterial genomes ranging from 1 to 15 genes per genome with an average of 4 22 . Total cell counts in the hypersaline basins estimated from gene copy numbers are in the range of 10 6 cells mL −1 , similar as reported for other hypersaline environments worldwide [23][24][25][26] , but lower than the microbial cell densities of 10 8 or more that were reported by others 27,28 . The number of identified sequence reads is relatively low as compared to the initial 30 million reads per sample that passed quality control but is a consequence of still incomplete reference databases, especially when less common ecosystems like hypersaline lakes are analyzed. The green site had salinities similar to -for example -the Salton Sea, USA, which has a salinity of about 56 g kg −1 29, 30 . The metagenome of the green -low salinity -site contained several chloroplast sequences, sequences from the eukaryal phyla Bacillariophyta (diatoms, mainly of the genus Phaeodactylum) and Chlorophyta (mainly of the genus Dunaliella), as well as a low number of Cyanobacteria-related sequences. Both a diverse phytoplankton community and low Cyanobacterial abundance were also found in the Salton Sea 31 , arguing that the composition of the green site is typical for basins with a slightly elevated salinity relative to seawater. The abundantly present chlorophyll-containing microbes may explain the color of the green site. These organisms represent the primary producers in this ecosystem, fixing carbon and producing organic matter in the form of sugars. The majority of the green site's microbial diversity consists of Bacteria, typical for low salinity ecosystems and mostly consisting of heterotrophic species that decompose the primary produced carbon sources. Despite the large distance from the nearest sea, the bacterial diversity was typical for marine ecosystems or for systems with an elevated salinity 29,30,32 . For example, the dominantly occurring bacterial genera Pseudoalteromonas, Shewanella, and Vibrio are commonly found in ocean waters [33][34][35] . Not common to normal marine waters, however, is the occurrence of haloarchaea such as Haloarcula, Haloferax, Natronococcus, Halogeometricum, and even the extreme halophilic Haloquadratum. Two halophilic genera were even more abundant in the green basin than in the hypersaline basins: Haloarcula and Natronococcus. Halophilic species such as Haloquadratum should not survive at a salinity of 50 g kg −1 . Moreover, presence of haloarchaea has not been documented in any of the Salton Sea studies 29,30,32 . This raises the question whether members of these species are capable of (temporarily) resisting these low salinities or whether their sequences were derived from an extracellular pool of DNA that might have been co-extracted from the samples. Given the absolute requirement for high salt concentrations of species such as Haloquadratum, the extracellular pool of DNA seems the best explanation for the detection of sequences belonging to these organisms in the metagenome. However, the isolation of at least 4 halophilic Archaea from the green basin using hypersaline isolation media suggests that some of these species resisted the low salt concentrations. Indeed, species such as Natronococcus are capable of growing at low salinities 36 . An experiment where samples from marine wetlands (32 g kg −1 salinity) were inoculated on high salinity media (100-200 g kg −1 ) yielded several haloarchaeal isolates related to Halorubrum, Haloferax, and members of the family of Natrialbaceae 37 . While this demonstrates that these isolates resist low salinities, they require higher salinities for optimal growth, in accordance with the 4 archaeal isolates presented here, which were isolated on 230 g kg −1 salinity media. A potential source of these sequences and strains could be the occasional discharge of hypersaline water into the lower salinity areas of Lake Meyghan resulting from the industrial sulfate mining. As expected, the microbial composition in the green -low salinity -basin differed greatly from those of the two hypersaline basins, which were dominated by haloarchaea. The red basin might derive its color from alga like Dunaliella salina and from halophilic archaea. Although Dunaliella belongs to the family of green algae, they often have a red appearance at high salinity due to the high content of carotenoids 38 . Haloarchaea are also rich in carotenoid derivatives such as bacterioruberin that protects the organism against oxidative DNA damaging agents such as UV irradiation 39 . Although the same organisms were present in the white basin, their characteristic red color was probably masked by the high precipitation of salts. Especially at high salinities, Dunaliella salina becomes the last surviving primary producer that feeds the heterotrophic community mainly through the production and release of the osmoprotectant glycerol 1, 10 .
The slightly alkaline nature of Lake Meyghan (pH 7.7-8.8) raised the question whether the brines are home to haloalkaliphiles, species that require both high salinity and a high pH for optimal growth 40 . However, the highest pH of 8.8 was restricted to the low salinity green basin while the hypersaline brines with pH < 8 were below the threshold of >pH 9 for alkaline waters 40 . The halophilic members found in the red, medium salinity basin consisted mainly of genera commonly found in thalassohaline brines such as the haloarchaea Halorubrum, Haloferax,  Table 6. Comparison of major genera in the combined samples identified by cultivation with genera found in the MG-RAST annotation and annotation of rRNA containing assembled contigs. 1 To allow comparison with the metagenomic assembly dataset that was obtained by pooling sequence reads from all samples, the sum of isolates and MG-RAST annotations from the three samples are presented. 2 Between brackets is their relative position according to decreasing abundance. Note that several genera are found at identical numbers and are positioned according to rank and alphabet. 3 The minus sign indicates that the genus is not identified in the dataset.
Scientific RepoRts | 7: 11522 | DOI:10.1038/s41598-017-11585-3 Halobacterium, and Haloarcula and halophilic bacteria such as Halomonas and Salinibacter. However, genera such as Halorubrum, are also commonly found in haloalkaline environments 41 . In the extreme hypersaline, white basin, however, several genera that were more abundant are common to athalassohaline high pH brines e.g. Natronorubrum, Natronomonas, and Natronolimnobius. Also among the isolates are species typically associated with alkaline hypersaline waters such as Natrinema sp. 41 , of which 9 strains were isolated from the red brine and 4 strains from the white brine. However, the same brine also contained members of the genus Haloquadratum, genera typically found at neutral to slightly basic pH in saturated crystallizer ponds 25 . Moreover, strains of the haloalkaliphile Natronomonas were also found at lower pH while other characteristic genera for haloalkaline ecosystems such as Natrialba and Thioalkalivibrio 40 were present at low numbers. Despite the large overlap in community composition between the red and white brines, there were some noticeable differences mostly at the lower taxonomic levels. Notably, the SSU reads for the bacterial phyla Planctomycetes and Cyanobacteria were present in the green and white basins but were not found in the medium salinity red basin. Also, SSU annotated reads from the alphaproteobacterial order Rhodobacterales were not found in the red brine while in contrast the Rhizobiales were nearly absent from the white brine. Amongst the gammaproteobacterial orders, Oceanospirillales and Xanthomonadales were more abundant in the red brine while Enterobacteriales and Legionellales were more abundant in the white brine. Two gammaproteobacterial strains that were isolated from the white brine are potential novel species having a sequence identity less than 96% with known species. One isolate has 95.2% identity with Arhodomonas recens RS91, a halophilic alkane-utilizing hydrogen-oxidizing bacterium 42 and the other has 95.7% identity with Spiribacter salinus M19-40, an abundant halophilic bacterium isolated from 190 g kg −1 ponds of salterns on Isla Christina, Spain 43 . Among the haloarchaea, genus Halalkalicoccus, Halobiforma had a higher relative abundance in the red brine while genera like Haloquadratum were more abundant in the high salinity brine. The preferred prevalence of strains at either highest or medium salinity most likely reflected their competitive advantages at these salinities. For example, it is well established that Haloquadratum dominates at the highest salinities by efficiently competing for the extant resources and by being able to resist the imposed desiccation stress 27 .
Binning of the assembled reads confirmed the overlap in diversity between the two higher salinity sites and their difference with the green site. Only a small overlap in sequences was found, potentially related to the previously mentioned haloarchaea in the green basin. Also, the observed difference in GC content was typical for hypersaline ecosystems where the majority of haloarchaeal species had a high GC content whereas at lower salinities the lower GC Bacteria dominated 44 . An exception is Haloquadratum walsbyi, which led to a significantly different GC signature in crystallizer ponds of the Spanish, Santa Pola solar salterns, in agreement with the abundance of this organism in the crystallizers. In Lake Meyghan, however, the contribution of Haloquadratum barely exceeded 2% and therefore did not influence the overall GC content. Despite significant differences in the community composition, the functional properties of the different brines as deduced from the COG distribution did not greatly differ. Most likely, this reflected the conserved ecosystem functioning of saline lakes and brines that largely consist of photosynthetic primary producers and heterotrophic decomposers while higher organisms and predators are absent or limited to the smaller brine shrimps, Artemia. Similar conservation in ecosystem function was found in microbial mats ranging from hypersaline, hot spring, to coastal microbial mats 45 . Only at the gene-level, variations between the brines were found, including a decrease in number of assigned functions in both hypersaline brines. This probably reflected the lower diversity of the two hypersaline brines with less species being able to thrive under these conditions. In the most saline, white brine, flagella synthesis appeared important, which may aid the cells in chemo-and phototaxis. The abundantly present glycerol kinase genes is in agreement with glycerol as major carbon and energy source derived from algae like Dunaliella sp 38 . However, metagenome analysis cannot predict gene expression and protein synthesis and for a thorough analysis of functional complexity in microbial ecosystems, a metatranscriptomic or proteomic approach is essential.
In general, when isolating microorganisms, the results often differ from the actual distribution of microbial taxa in that environment due to the inability to culture a large proportion of microbial species while opportunistic rare species dominate the isolates 46 . The most abundant bacterial genera isolated, Halomonas and Idiomarina, and to a lesser extent Pseudomonas and Streptomyces, are also abundant in the metagenome datasets suggesting that the isolates give a reasonable representation of the dominant diversity. A similar result was obtained for the halobacterial genera of Halorubrum and Haloarcula. However, many more genera are identified by metagenomics and several of these may not be isolated in the near future due to specific yet unknown growth requirements. Isolation yielded several species that probably belong to the rare biodiversity that are not sufficiently present to be detected by metagenomics but may be opportunists that easily grow under the applied cultivation conditions. A better comparison between cultivation and metagenomics would require a multitude in isolates grown under variable conditions.

Methods and Materials
Sampling sites and sample collection. Lake Meyghan (34°11′-27.91″N, 49°50′-26.70″E) was sampled in November 2013 at the peak of the dry season. The shallow brines were sampled at three different sites that were named according to the brine color. These were G (green, 34°11′-21.59″N, 49°50′-45.73″E), R (red, 34°11′-20.78″N, 49°50′-21.87″E), and W (white, 34°11′-35.31″N, 49°50′-18.25″E) (Fig. 1). Thirty liter samples were taken aseptically and transferred to sterile plastic containers and were brought to the laboratory within a few hours. Total DNA was extracted from ~20 liters of sample that was prefiltered over 20 μm (ALBET, Germany) filters and subsequently filtered over multiple 0.2 μm (Sartorius, Germany) nitrocellulose filters until they clogged. The remainder of the samples was stored at 4 °C for physicochemical analyses and culturing. Aliquots of the samples were sent to a commercial water chemistry laboratory (Khak Behin Azma Co., Iran) for analysis of the chemical composition.  25 . The plates were incubated aerobically for 8 weeks at two different temperatures: 30 °C and 40 °C. Isolated colonies of microorganisms were selected according to their size, shape, and color and streaked on new agar plates with the same growth medium. This procedure was repeated at least three times until pure cultures were obtained.
Nucleic acid extraction, amplification and sequence analysis. Genomic DNA was extracted from colonies grown on the agar medium from which the strain was isolated, using the Genomic-DNA extraction kit (Roche, Diagnostic, Mannheim, Germany) and according to the manufacturer's recommendations. The DNA concentration and purity were assayed using the Nanodrop 1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA) and were confirmed by visualization on a 1% agarose gel. The 16S rRNA genes from the isolates were amplified using either the Bacteria specific primers 27F and 1492R or the Archaea specific primers 20F and 1530R ( Table 7). The PCR conditions were as follows for Bacteria: 94 °C for 3 min, followed by 25 cycles of 95 °C for 45S, 55 °C for 45S and 72 °C for 90S and a final 10 min extension at 72 °C and for Archaea: 94 °C for 3 min, followed by 30 cycles of 94 °C for 15S, 52 °C for 30S and 72 °C for 50S and a final 7 min extension at 72 °C. Sanger sequencing was performed on an ABI 3730XL DNA sequencer at Macrogen (Seoul, South Korea) generating on average 900 bp sequences using bacterial 27F or the archaeal 20F oligonucleotides as sequencing primers. Neighboring taxa were identified using the BLASTN program and analysed by pairwise sequence alignment to calculate sequence similarity using the EzBioCloud server (www.ezbiocloud.net) 50 . The sequences were considered to belong to an operational taxonomic unit (OTU) when sharing ≥97% sequence identity.
For metagenomic analysis, multiple filters were pooled for DNA extraction using a phenol-chloroform based protocol 51 . Metagenome sequencing was performed using a paired-end protocol with an average read size of 100 bp (PE100) on Illumina HiSeq 4000 (BGI, Hong Kong) on randomly sheared 170-500 bp DNA fragments.
Bioinformatic data analysis. Taxonomic and functional profiling of raw sequencing reads was performed on the MG-RAST pipeline that contains its own quality control algorhythm 52 . Taxonomic profiles of the microbial community were extracted from the MG-RAST annotation pipeline focusing on sequences annotated as small subunit ribosomal (SSU) RNA using the Silva SSU database as a reference. For further in house analysis, low quality reads were removed using Trimmomatic 53 . The filtering cutoffs were: minimum length 100 bp, average quality score 20, sliding window 4:20, maximum info 100:0.8.
Metagenomic contigs were annotated using a collection of in house bioinformatics scripts (supplementary material S2). The pipeline included: the identification of ORFs with Prodigal 54 , tRNAs with tRNAscan-SE 55 and functional annotation using BLAST 56 implemented in USEARCH 57 and HMMER-3 58 . The databases used for taxonomical and functional annotation were: Silva SSU release 123 59 , COG 60 , TIGRFAM 61 and NCBI non-redundant RefSeq release 75 62 . Comparative metagenomic analysis of the three samples was applied in order to overcome the shallow sequencing of the metagenome and to identify sequences that appear in all samples and those that are sample specific. Quality filtered reads were co-assembled using MEGAHIT 63 with a minimum contig length of 1,000 bp. Reads from each metagenome were mapped back to the contigs using bowtie-2 64 . Anvi' o pipeline 65 metagenomics workflow was applied to build a contigs database and to analyze the distribution of functional annotation of open reading frames and GC% across the individual samples. Briefly, reads of each sample were mapped to the co-assembled contigs and bins of clustered contigs were generated and stored in a phylogenetic tree applying the default settings of CONCOCT 66 . Hierarchical clustering was done by combining the sequence alignments with the differential coverage of relative abundance of individual sample reads. The individual sample reads were aligned to the sequences in the contig bins using Euclidean distance with a maximum contig length of 20,000 bp.
Quantitative real time PCR (qPCR). The abundance of 16S rRNA genes (bacterial and archaeal) in each sample was measured using qPCR. The standard curves and qPCR were performed as described by Nathani et al. 67 . Briefly, plasmid DNA possessing a full-length copy of the 16S rRNA gene belonging either to the Halorubrum chaoviator (DSM 19316) or Escherichia coli (ATCC 25922) were used as DNA standards in qPCR. The target DNA for standard curves was amplified using the domain-specific primer sets 27F and 1492R for Bacteria and 20F and 1530R for Archaea. Copy number per µl of extracted DNA was calculated using formula: Copy number per µl = (concentration of plasmids (gm/µl) × 6.023 × 10 23 )/(length of recombinant plasmid (bp) × 660), where 660 is the molecular weight of one base pair in double-stranded DNA and 6.023 × 10 23 = Avogadro's number 68 . qPCR was performed with Rotor-Gene TM 6000 (Corbett Research Biosciences, Sydney, Australia) using the domain-specific primer sets Eub338 and Eub518 to detect Bacteria and Parch519F and ARC915R to detect Archaea (Table 7). Amplifications were performed in a total volume of 15 μl, containing 30 ng of template DNA, 7.5 μl of 2× Maxima SYBR Green qPCR Master Mix (Fermentas, France), 1.5 μl of each primer (10 pmol/μl) and sterile H 2 O to a total volume 15 µl. The amplification program involved an initial denaturation step at 95 °C for 15 min, followed by 40 cycles of 94 °C for 30S, 60 °C for 30S and 72 °C for 20S for both Bacteria and Archaea. For all standard curves, the coefficients of determination (R 2 value) were better than 99.0%. The number of target genes per ml of sample was calculated using formula: Number of gene copies per mL samples = (gene copies per reaction mixture × volume of DNA (µl))/(3 µl DNA per reaction mix × volume of sample (ml)) 62 .
Nucleotide sequence accession numbers. The 16S rRNA gene sequences determined for the isolates in this study have been deposited in GenBank with accession numbers KY411714-KY411818. The metagenomic data is publically available at MG-RAST under the accession numbers: 4683415.3, 4683416.3 and 4683417.3 and at the EBI metagenomics server under accession numbers ERS1455389, ERS1455390 and ERS1455391.