Distribution of Bathyarchaeota Communities Across Different Terrestrial Settings and Their Potential Ecological Functions

High abundance and widespread distribution of the archaeal phylum Bathyarchaeota in marine environment have been recognized recently, but knowledge about Bathyarchaeota in terrestrial settings and their correlation with environmental parameters is fairly limited. Here we reported the abundance of Bathyarchaeota members across different ecosystems and their correlation with environmental factors by constructing 16S rRNA clone libraries of peat from the Dajiuhu Peatland, coupling with bioinformatics analysis of 16S rRNA data available to date in NCBI database. In total, 1456 Bathyarchaeota sequences from 28 sites were subjected to UniFrac analysis based on phylogenetic distance and multivariate regression tree analysis of taxonomy. Both phylogenetic and taxon-based approaches showed that salinity, total organic carbon and temperature significantly influenced the distribution of Bathyarchaeota across different terrestrial habitats. By applying the ecological concept of ‘indicator species’, we identify 9 indicator groups among the 6 habitats with the most in the estuary sediments. Network analysis showed that members of Bathyarchaeota formed the “backbone” of archaeal community and often co-occurred with Methanomicrobia. These results suggest that Bathyarchaeota may play an important ecological role within archaeal communities via a potential symbiotic association with Methanomicrobia. Our results shed light on understanding of the biogeography, potential functions of Bathyarchaeota and environment conditions that influence Bathyarchaea distribution in terrestrial settings.

Phylogenetically, MCG is rather divergent. Totally 21 subgroups were identified and the similarity between the most distant sequences is 75% 20 . Indicator value analysis indicates that some MCG subgroups appear to adapt well to specific environments, such as subgroups 1, 3 and 8 to saline sediments, subgroups 5a, 5b, 7, 9 and 11 to freshwater sediments 20 . Moreover, salinity is demonstrated to be the main factor controlling the adaptive evolution of subgroups as indicated by ancestor state reconstruction analysis 20 .
Despite these advances, the geographic distribution pattern of MCG subgroups among terrestrial habitats remains largely unknown. One reason is that, systematic and comprehensive evaluation about abundance and diversity of MCG subgroups in terrestrial environments is still sparse. On the other hand, multiple factors may affect the structures of MCG communities which make it difficult to clearly picture the general distribution pattern of MCG on large scale. For example, abundant and diverse MCG lineage is detected in acidic and neutral coastal sediments 21 , contrarily only a small amount of MCG is observed in alkaline lagoon sediments 7 . Temperature also impacted the abundance of MCG as indicated by mesocosm experiments with high arctic peat soils which showed an increase of temperature will decrease the relative abundance of MCG members 22 .
With the accumulation of DNA sequencing data from various terrestrial environments, it is highly possible to further understand the distribution pattern of MCG on large scale via bioinformatics analysis. Thus to fully address the distribution pattern of MCG on large ecological scale, archaeal clone libraries were firstly constructed with peat sediments from the acidic Dajiuhu Peatland, central China. Subsequently phylogenetic analysis was conducted by pooling sequences from other terrestrial locations that were available on July 2015. Our results provide new insights into overview on the distribution pattern of MCG across terrestrial ecosystems, their correlation with environmental conditions and their ecological functions.

Results
Widespread distribution of MCG lineages. A total of 890 archaeal OTUs (with 97% cutoff) from 51 sites (Supplementary Table S1) around the world were classified into 19 lineages ( Supplementary Fig. S1), whose mean relative abundance was plotted against their occurrence (number of sites in which given lineage was found) (Fig. 1). A significant positive correlation (R 2 = 0.82) between mean relative abundance and occurrence ( Fig. 1) suggested that broadly-occurred archaeal lineages are more abundant than those with restricted occurrence. MCG was the most frequently observed and abundant archaeal group in terrestrial environments. It was detected in 28 sites and accounted for 27.8% of the total archaeal sequences ( Fig. 1) with a relative abundance of 0.7-82.1% among those sites.
Diversity of MCG lineage. The numbers of clone libraries and MCG 16S rRNA sequences varied from one site to another (Table 1). Specifically the number of clone libraries in lagoon was the highest, whilst the number of MCG 16S rRNA sequences was highest in peat sediments. Both the numbers of clone libraries and sequences were the lowest in samples from hot springs (Table 1). However, little difference was observed in Shannon-Wiener indices of MCG among peat sediments, mangrove soils and estuary sediments. The lowest PD index was observed within soils and the highest was in mangrove soils (Table 1).
Different from the biodiversity indexes of Shannon-Wiener and PD, PSV quantifies the phylogenetic relatedness of all species in a community. Observed PSV (Table 1) was remarkably different among habitats. The highest value of PSV in soils indicated a distant phylogenetic relatedness which means a phylogenetic overdispersion. Low PSV values in hot springs, estuary sediments and mangroves suggested a close phylogenetic relatedness which means a phylogenetic clustering. Overall, the mean observed PSV value (0.56) was significantly lower than those given by two null hypothesis models Null 1 (0.64, p < 0.05) and Null 2 (0.61, p < 0.05). These results indicated that coexisting taxa within habitats were more closely related (phylogenetic clustering) than expected by chance.  Beta diversity of these clone libraries were analyzed with weighted PCoA according to phylogenetic community similarity. Principle coordination 1 (P1) and 2 (P2) explained 21.2% and 9.7% of the total variance, respectively (Fig. 2). MCG communities showed a high specificity to habitats, samples from a specific habitat clustered together well except those from lagoons with a scattering distribution. The high specificity was further strongly supported by permutation manova test (R 2 = 0.53, P < 0.01). It is interesting to see that phylogenetic community composition of estuary sediments and mangrove soils, peat sediments and hot springs were close related to each other, respectively (Fig. 2).

Phylogenetic analysis and community compositions of MCG lineages. Totally 1456 MCG
sequences (353 OTUs, at the threshold of 97% similarity) were used to construct MCG phylogenetic tree (Fig. 3). The MCG phylogenetic tree was classified into 18 subgroups with intergroup identity ranging from 82% to 96%. Notably, new cluster peat MCG (pMCG) displayed high stability with a bootstrap value of 100%. However, about 1.2% of MCG sequences (17 sequences) still remained unclassified (uMCG).
As indicated by phylogenetic analysis (Fig. 3), MCG 6 and MCG 5b dominated the MCG lineages with a relative abundance of 38.3% and 20.0%, respectively, followed by MCG 8, MCG 17 and MCG 15. The remaining 13 subgroups contributed < 3% of the total sequences (Supplementary Table S2). Overall, MCG communities showed high similarities both in group number and the dominant groups in a specific habitat (Fig. 4). For example, subgroup number ranged from 4 to 6 and the dominant subgroup was MCG 5b in all peat samples (Fig. 4). However, both the number of MCG subgroups and community composition varied significantly in estuaries. Totally 16 subgroups have been detected and MCG 8 and MCG 17 dominated estuarine MCG communities with a relative abundance of 25.6% and 23.1%, respectively. Fourteen MCG subgroups were observed in mangrove soils. Among the MCG subgroups, subgroup 8 and 15 were detected from various environments such as lagoons, estuary sediments and mangroves. In addition, MCG subgroup 6 was the sole subgroup detected in soil samples.
Indicator value (IndVal) function analysis identified 9 indicator groups from all habitats investigated (Fig. 4). Estuary sediments had 4 indicator groups, followed by peat sediments (2), mangrove soils (1), hot springs (1) and soils (1). No indicator group was observed in lagoon samples (Fig. 4). Furthermore, MCG subgroup 3, 5a, 5b, 13 and peat MCG with high IndVal values (0.83-1.0) suggested a high specificity to their habitats. The other indicator MCG subgroups showed a moderate specificity to their habitats with IndVal values of 0.43-0.67. IndVal value of MCG 6 was rather low (0.43) due to its extensive distribution in various environments. In contrast, peat MCG lineage was exclusively detected in peat habitat with a relative abundance of 2.5% with the highest IndVal value of 1.0 (Fig. 4).
Relationship between MCG community structure and environmental factors. We tested the relationship between MCG phylogenetic communities and environmental factors including pH, salinity, temperature, TOC and TN using permutational analysis of variation. Results indicated that salinity was the strongest environmental force (R 2 = 0.39, P < 0.01), which explains 39% of variation of MCG assemblages. Temperature  (R 2 = 0.10, P = 0.02) and TOC (R 2 = 0.08, P = 0.02) also had significant influence on MCG community structure, accounting for 10% and 8% of the variation, respectively. The remaining environmental factors did not show significant impact on MCG groups and contributed to only 6% total variance of UniFrac matrix. The effects of the environment factors on the community structure of MCG subgroups were determined by multivariate regression tree (MRT) analysis. The variance was shown in a 5-leaf tree mainly based on TOC content, followed by salinity and temperature, which can explain 77% of the variance in total (Fig. 5). Samples collected from peat sediments were grouped together by high TOC contents in the sediments, and estuary sediment samples were clustered together by their similar salinity. However, samples from lagoons were split into two groups as large variation of salinity among these sites ( Fig. 5 and Supplementary Table S1). In addition, samples collected from soils and hot springs were clustered with samples from low salinity lagoon and mangrove, respectively, which were affected by temperature (Fig. 5).
The pie charts in MRT showed the relative abundance of MCG subgroups in each leaf. Each leaf was specified by Indicator groups identified by IndVal index, which were mainly responsible for separation of the topology of the MRT. For example, MCG subgroup 5b and pMCG lead to the separation of MCG in peat sediments and in  Interrelation between MCG and other archaeal lineages. A total of 99 archaeal OTUs from 51 sites with a cutoff of 90% were subjected to the co-occurrence network interface analysis to explore potential commensalism that MCG may involve in. The network of terrestrial archaea consisted of 88 nodes and 269 edges among which 40% (35 out of 88) nodes and 68% (184 out of 269) edges were composed by MCG (Fig. 6a). The network topology of terrestrial archaea was characterized by a diameter of 5, average path length of 1.62 and 437 shortest paths. The individual node with top 10 network indices were affiliated with MCG, Halobacteria, 1.1b, peat Crenarchaeota and Methanomicrobia, respectively (Table 2). Furthermore, these network indices of MCG nodes were much higher than those of others archaeal groups, particular for centrality and degree. These results suggested that MCG lineages possess pivotal ecological function among archaeal groups.
The network showed modular structure with a modular index 0.67. The network was classified into 11 modules (Fig. 6b). Modules could be considered as ecological niches (Fisher's test, P < 0.001) in which OTUs were closely connected to each other and connectivity was significantly higher than those between modules. For example, modules 2, 5 and 8 represented sub-networks of peat sediments, hot springs and saline lakes, respectively. MCG was observed in 5 out of the 11 modules (Fig. 6b). Co-occurrence association was observed between MCG and other archaeal groups such as DSAG, MBGD, peat Crenarchaeota, Thermoprotei, and VAL3 (Fig. 6c). It should be noted that, the co-occurrence of MCG and Methanomicrobia were detected in 3 modules out of 5 containing MCG modules indicating a rather close ecological association (Fig. 6c). Members of Methanomicrobia which co-occurred with MCG were affiliated with Methanosaeta, Methanoregula, Methanocella, Methanoperedens and uncultured Methanomicrobiaceae.

Discussion
MCG is a ubiquitous archaeal group in terrestrial environments. It occurred in 28 of 51 sites investigated and accounted for 27.8% of total archaeal sequences analyzed. MCG widely distributed across various terrestrial habitats including hot springs, soils and lagoons with the relative abundance of MCG ranging from 0.7% to 80.0%. Furthermore MCG dominated the archaeal communities in peat sediments, mangroves and estuary sediments. Compared with previous studies, which found that MCG accounted for 35-80% of archaeal communities in peat sediments 8 , 10-100% in high mountain lake 23 , and 43% in southern red soil 11 , our result provided an overall view of terrestrial distribution of MCG group on a large scale. Wide distribution and relatively high abundance strongly suggest that MCG is an important archaeal group in terrestrial ecological process.
Despite of ubiquitous distribution of MCG in terrestrial environment, some MCG subgroups show well-adapted and high specificity to certain habitats, indicating a close relationship between MCG and environmental conditions. This made them suitable to serve as ecological indicator for specific habitats. In this study, nine indicator subgroups were identified based on indicator value function analysis, and at least one indicator subgroup was observed in most habitats investigated except lagoons. Previously, salinity has been shown to be a driving force that shapes archaeal communities both within a specific estuary habitat 24 and on large-scale aquatic habitats 25 . The phylogenetic communities of MCG also showed a close relationship with salinity which significantly architects the composition and abundance of MCG as indicated by MRT. In addition to that, salinity was also found to be an evolutionary barrier that leads to the diversification of MCG between marine and freshwater sediments 20 . This is highly consistent with our observation that salinity was the most significant factor affecting MCG phylogenetic structure.
It is demonstrated that TOC greatly constrains the distribution and diversity of heterotrophs due to their high dependence on TOC for carbon source 26,27 . It should be noted that in our research TOC also significantly shaped the distribution of MCG on a large scale. Even through the knowledge about MCG life style is still limited, several lines of evidence support the heterotrophic lifestyle of MCG recently 13,15,18 . Our observation of the significant impact of TOC on MCG distribution may mount another line of heterotrophic lifestyle of MCG 28 .
Temperature is also an important factor controlling microbial communities. For example, it has been found to be a dominant factor that drives microbial diversity and composition in geothermal springs 29 . Importantly, increase of temperature will decrease the relative abundance of MCG as indicated by both laboratory experiment 22 and field investigation 7 . Our study also demonstrated the impact of temperature on MCG communities to certain extent.
Compared with restricted distribution of Thermoprotei in hyperthermal environments, the wide distribution of MCG across terrestrial environments indicates that MCG have adapted to various environments and potentially have versatile metabolic pathways for survival. Co-occurrence network analysis indicates the dominance of MCG in nodes, edges and modules (Fig. 6b). Moreover, individual nodes with top 10 highest values of network indices are mainly affiliated with MCG. All of these suggest that MCG is the most abundant and widespread archaeal group and plays an important role in construction of archaeal network. Likely, MCG serve as "keystone species" which functions to maintain the stability and adaptability of archaeal community 30 .
Furthermore, the co-occurrence pattern also reveals some of interesting ecological patterns among archaeal divisions which have not been well revealed previously. For example, MCG co-occurred with the members of 1.1a, 1.1b, DSAG, MBGD, peat Crenarchaeota, Thermoprotei, and VAL3. These co-occurrences suggest a preference of similar environmental conditions of these archaeal groups or a commensalism relationship between them. However, the currently available data do not permit us to distinguish these two possibilities at this point.
Notably, members of MCG co-occur more often with the Methanomicrobia (including Methanosaeta, Methanoregula, Methanocella, Methanoperedens and Methanomicrobiaceae) than with other archaeal groups, which indicates a high possibility of a syntrophic association between these two groups. We proposed that acetate may be response for this close association between those two groups. Acetate has been demonstrated to be important organic substrate for growth and methane production of acetoclastic Methanosaeta 31 . Meanwhile, acetate is also a necessary growth factor for hydrogenotrophic Methanoregula 32 , Methanocella 33 and Methanomicrobiaceae 34 . Importantly, some MCG members were demonstrated to possess a gene encoding acetate kinase 18 , which is involved in acetate production. This may explain the mechanism of symbiosis between MCG and Methanomicrobia. In fact the close physical association between the members of MCG and Methanomicrobia (Methanosaeta) has been demonstrated by fluorescence in situ hybridization (FISH) in active sludge 35 .
Due to the high specificity to a given environment, indicator MCG subgroups may play important roles in biogeochemical processes under specific conditions. For example, MCG subgroup 15 (the indicator group of mangrove soils) showed a high abundance in coast and marine environments and their genomic data indicated that they are highly capable of extracellular protein degradation, acetate-producing and dissimilatory nitrite reduction 16 . MCG subgroup 6 (indicator group for soils) may also capable of degrading plant-derived carbohydrate under the anaerobic condition besides the functions proposed for MCG subgroup 15 16 . As indicator groups of estuary sediments, MCG subgroup 3 is proposed to be methanogens and contributes to global methane cycling 17 . Subgroup 16 (indicator group for estuary) could utilize a variety of organic substrate via fermentation to produce acetate 18 which could be subsequently utilized by various microbial groups 36 .
Lack of available physiologic and genomic information renders it difficult to elucidate the ecological function of the remaining 5 indicator MCG subgroups (MCG 4, 13, 5a, 5b and peat MCG). Based on their phylogenetic distance with the groups whose genomic data are available, the functions of some of these subgroups are  speculated. For MCG subgroup 13, we hypothesize that they play a significant role in degrading various organic compounds such as protein, starch, and persistent aromatic compounds according to their close phylogenetic relationship with MCG 16 (Fig. 3). Because they are phylogenetically related to MCG subgroup 6 and 7 (Fig. 3), MCG subgroup 5a and 5b might degrade carbohydrate, protein debris and reduce nitrite to ammonia. As they show no close relationship with other MCG subgroups, the ecological functions of MCG subgroup 4 and peat MCG cannot be speculated.

Materials and Methods
Site description and sampling. The Dajiuhu Peatland (31°28′ 50″ N, 110°00′ 90″ E) locates in northwest Hubei province and is characterized by typically sub alpine monsoon climate. The annual temperature and precipitation were 7.2 °C and 1,560 mm, respectively. The organic carbon was abundant due to the low temperature condition and high ground water level 37 . Four peat sediment samples were collected with 50 ml sterilized centrifuge tubes along a profile in Sep, 2012 and transported to the geomicrobiology lab in China University of Geosciences (Wuhan) on dry ice immediately. Samples for molecular analysis were stored at − 80 °C. Samples for physicochemical analysis were freezing dried and grinded into powder.
Physicochemical analysis. Homogeneous suspension was obtained by mixing the grinded peat sediments with ultra-pure water with a ratio of 1:2.5 (w/v) via stirring with a glass rod for 10 min. After sitting quietly for 30 min, the pH of supernatant was measured three times for each sample and the mean value of three measurements was reported. The total organic carbon (TOC) and total nitrogen (TN) in sediment samples were analyzed with Elemental Analyzer (Elementar Vario EL III) with a precision of ± 0.4% 38  PCR product was visualized with 1% agarose gel. Target bands were excised and recovered with QIAquick gel extraction kit (Qiagen, Germany) according to the manufacturer's instructions. Purified PCR product fragments were ligated into pMD18-T plasmid vector (TaKaRa, Dalian, China) and transformed into the Escherihcia coli JM109 cell (TaKaRa, Dalian, China). Transformants were identified based on blue-white screening on LB media containing 100 μ g/ml of ampicillin. The presence of the insert was screened directly with M13F and M13R vector primers and positive clones were sequenced with A21F primer by Genscript Company (Nanjing, China). Possible vector contamination was checked using NCBI Vecscreen (http://www.ncbi.nlm.nih.gov/tools/vecscreen/). Totally 216 valid sequences were acquired from Dajiuhu peat sediments which are deposited in GenBank under accession numbers of KM514090-KM514305. All the sequences were pooled together as one clone library to increase the representativeness.
Selection criteria for archaeal 16S rRNA sequence through NCBI nr database. To discern the distribution pattern of MCG communities across terrestrial ecosystems and the association between MCG communities and environmental parameters, we collected archaeal 16S rRNA sequences available in Genbank database. Only the sequences of 1) that were amplified with the universal primer set (A21F and A958R); 2) that contained no nucleotide ambiguities and whose lengths were > 700 bp; 3) whose sampling site were well defined were used.
Construction of phylogenetic tree. ClustalW program 40 was used to align the 5226 archaeal 16S rRNA sequences from 51 investigated sites and Mothur software was used to calculate distance matrix and cluster sequences 41 . Those sequences were clustered into Operation Taxonomy Unit (OTU) at the identity level of 97% (890 OTUs) for diversity analysis and 90% (237 OTUs) for network analysis. The selected 890 OTUs (97% similarity) were used to construct archaeal taxonomy tree based on Maximum likelihood method by GRT + G + T mode with Mega 6.0 software 42 . Among 890 archaeal OTUs, 353 MCG OTUs were selected to construct MCG phylogenetic tree. The best phylogenetic tree was constructed with GTR + G2 mode with the bootstrap 1000 and was drawn by iTOL 43 . Statistical analysis. Phylogenetic diversity (PD) 44 and Shannon-Wiener index 45 were used to characterize microbial diversity. PD is the sum of phylogenetic branch length within a habitat 44 . Mean PD value of 1000 random subsamples were used to correct unequal sequence numbers between habitats. The Shannon-Wiener index indicates the number and abundance of species within a habitat which positively correlates with richness and evenness, and is sensitive to the abundance variation of rare species 46 .
UniFrac was conducted to analyze beta diversity which quantifies community similarity based on phylogenetic relativeness 47  Species abundance distribution (SAD) pattern and ecological importance of MCG lineage were estimated according to the relative abundance of archaeal lineages. A multivariate regression tree (MRT) was carried out by R package mvpart (version 1.6-2) 50 to determine the relationship between MCG subgroups and environmental factors.
Phylogenetic structure was estimated with phylogenetic species variability (PSV) 51 which based on species presence/absence and theirs phylogenetic relativeness. This metrics is between 0 and 1, and asymptotically approach 0 as relatedness increases. Comparison between observed PSV and null values from different randomized models were conducted by R package picante (version 1.6-2) 52 to statistically test whether MCG community composition of different habitats is more closely related or distantly related than expected by chance.
Correlation-based method 53 as well as probabilistic graphic model method 54 is a common approach to explore underlying mechanism of microbial ecological process. Despite of the deficiency in the information of control and dependency, correlation-based method shows direct relatedness between (environmental and biological) variables and is widely used to study potential interactions between microorganisms 20,55 . In this study we analyzed correlation between OTUs affiliated with MCG and other archaeal lineages using correlation-based method based on Spearman coefficient 56 . We calculated all possible Spearman rank correlation between those OTUs (at the threshold of 90% identity) with more than 5 sequences (99 OTUs). We removed those poor representative OTUs and reduced network complexity, facilitating the determination of core archaeal community. Valid co-occurrence event with rank correlation coefficients above 0.6 and statistical significant below 0.01 57 were considered to be a robust correlation. The network was characterized by topology indices (diameter, average path length, modularity, number of the shortest path) and indices of an individual node (degree, closeness centrality, between centrality). Specially, modularity was calculated using Louvain algorithm 58 , which include two phases of modularity optimization and community aggravation. Aggravation is repeated until modularity no longer can be increased. All these analysis were performed by R package psych (version 1.6.9) 59 and Gephi (version 0.8.2 beta) software 60 . Network images were visualized by Gephi software. The correlation between modules and habitats were statistically tested by the Fisher's test under software R.