Main

Thaumarchaeota are abundant in both marine and terrestrial environments and have a significant role in the global nitrogen cycle (Leininger et al., 2006; Wuchter et al., 2006). Oceanic Thaumarchaeota are distributed throughout the water column, with their relative abundance increasing with depth to up to 50% of mesopelagic prokaryotic cells (Karner et al., 2001). Cultivation of marine Thaumarchaeota has proven difficult. Only one strain, Nitrosopumilus maritimus SCM1 (Könneke et al., 2005), has been successfully brought into pure culture, while several others have been cultivated as enrichment cultures (Blainey et al., 2011; Mosier et al., 2012a, 2012b; Park et al., 2012a, 2012b). Therefore, our understanding of their diversity and ecological function has been gained largely through culture-independent approaches. Using a few marker genes (16S ribosomal RNA, amoA, accA), many studies have shown that marine Thaumarchaeota fall into shallow- and deep-water phylogenetic clades (Francis et al., 2005; Hallam et al., 2006; Nicol et al., 2011). In addition to depth, light, temperature, latitude and dissolved oxygen have been identified as important correlates of Thaumarchaeota diversity and distributions in the ocean (Prosser and Nicol, 2008; Biller et al., 2012). These single-gene-based analyses have outlined the phylogenetic distribution and diversity of marine Thaumarchaeota, but have not provided insights into the adaptive mechanisms giving rise to specific ecotypes.

We turned to single amplified genomes (SAGs) in order to link additional metabolic capabilities to taxonomy and to reconstruct phylogeny using characters sampled across the genome. Forty-six single cells related to Thaumarchaeota populations in a variety of marine environments (Supplementary Figure S1) were obtained from epi- and mesopelagic waters of the Southern Ocean, temperate north Atlantic, subtropical north Pacific and south Atlantic (Supplementary Table S1). Genomes of these cells were recovered with variable success, with a mean of 32% (±12%) relative to the Nitrosopumilus maritimus SCM1 genome (1.65 Mbp). A maximum likelihood phylogenomic tree constructed using a concatenated amino-acid sequence of 97 single-copy orthologous genes (Supplementary Table S2) with 27 041 sites strongly supported the separation of the SAGs into two phylogenetically coherent groups corresponding to epi- and mesopelagic clades (Figure 1). The within-clade average nucleotide identity (ANI) is 89.0% (±8.9%) and 86.8% (±5.4%) for epi- and mesopelagic clades, respectively, whereas the between-clade ANI is 75.4% (±4.1%). When the composite genomes from enrichment cultures or from metagenomic assembly were included in the phylogenomic analysis, the surface water SAGs appeared to be evolutionarily separated from all the cultured marine Thaumarchaeota (Supplementary Figure S2). Among the eight Antarctic SAGs sampled, three were obtained from 80 m, a depth associated with the Winter Water (WW) water mass (Church et al., 2003), whereas the remaining five were sampled from 400 m in the Circumpolar Deep Water (CDW) water mass. Although separated by only about 300 m of depth, the SAGs from these two water masses were confidently assigned to the epi- and mesopelagic clades, respectively (Figure 1). Conversely, among the mesopelagic cells collected thousands of kilometers apart from CDW and subtropical waters, SAGs were intermingled within the mesopelagic clade (Figure 1).
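As a rough illustration of what the reported recovery figure implies, the sketch below assumes that recovery is simply assembled length divided by the length of the N. maritimus SCM1 reference (1.65 Mbp). The text does not state how recovery was actually estimated, so this definition and the function names are assumptions made for illustration only.

```python
# Minimal sketch: genome recovery relative to a reference genome.
# ASSUMPTION: recovery = assembled bases / reference bases. The paper does not
# confirm this definition; the function names are hypothetical.

REFERENCE_GENOME_BP = 1_650_000  # N. maritimus SCM1 genome size (1.65 Mbp), from the text


def recovery_fraction(assembly_bp: int, reference_bp: int = REFERENCE_GENOME_BP) -> float:
    """Return the fraction of the reference length covered by an assembly."""
    if assembly_bp < 0 or reference_bp <= 0:
        raise ValueError("lengths must be non-negative and the reference must be > 0")
    return assembly_bp / reference_bp


def implied_assembly_bp(recovery: float, reference_bp: int = REFERENCE_GENOME_BP) -> float:
    """Return the assembled length implied by a given recovery fraction."""
    if not 0.0 <= recovery <= 1.0:
        raise ValueError("recovery must lie between 0 and 1")
    return recovery * reference_bp


# The reported mean recovery of 32% corresponds to roughly half a megabase per SAG.
mean_assembly_bp = implied_assembly_bp(0.32)
print(f"Implied mean assembly size: {mean_assembly_bp / 1e6:.2f} Mbp")
```

Under this assumption, a typical SAG in the data set would carry on the order of 0.5 Mbp of sequence, which is why the absence of any single gene from an individual SAG has to be interpreted with caution.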

Figure 1

Maximum likelihood phylogenomic analysis of 47 genomes of the marine Thaumarchaeota. The tree was constructed using the RAxML v7.3.0 software (Stamatakis, 2006) using a concatenated amino-acid sequence of 97 genes with 27 041 sites, with a data partition model determined by the PartitionFinder software (Lanfear et al., 2012). Values at the nodes show the number of times the clade defined by that node appeared in the 100 bootstrapped data sets. Two Crenarchaeota outgroup species are not shown. Details of tree construction can be found in Supplementary Material. The epi- and mesopelagic clades are indicated by shading. Single-cell genomes from different water masses/locations/depths are marked with different colors as identified in the legend inset.

Previous studies have shown that marine Thaumarchaeota are inhibited by light (Merbt et al., 2012) and suggested that sensitivity to photoinhibition might be a key factor determining their depth distribution (Mincer et al., 2007; Church et al., 2010; Hu et al., 2011; Merbt et al., 2012). Light was also implicated as the factor determining the seasonal dynamics of Thaumarchaeota in the Southern Ocean, where they are abundant during winter but nearly absent in summer (Kalanetra et al., 2009). Yet no direct evidence from field populations has been reported to support these hypotheses. Our analyses of genomes from uncultivated Thaumarchaeota showed that a homolog to deoxyribodipyrimidine photolyase, a key gene in the pathway to repair ultraviolet-induced DNA damage (Goosen and Moolenaar, 2008), was present in a SAG sampled from Gulf of Maine surface water (1-m depth), but absent from all of the 42 mesopelagic SAGs (Supplementary Table S3). The occurrence of putative DNA photolyases in surface water Thaumarchaeota was reinforced by conducting a rigorous search for DNA photolyase genes associated with Thaumarchaeota reads in the Global Ocean Survey (GOS) surface water metagenomes. Our identification of seven putative Thaumarchaeota photolyase genes in the GOS data (Figure 2a) is consistent with the hypothesis that light is an important factor structuring marine Thaumarchaeota populations by depth, and suggests that surface water members have evolved effective mechanisms to cope with ultraviolet-induced DNA damage. It remains unknown, however, whether the process of ammonia oxidation is indeed subject to photoinhibition, as has been suggested previously (Mincer et al., 2007). DNA photolyase was not found in the other three epipelagic clade SAGs, which were associated with the Antarctic WW water mass and collected from 80 m, but this absence is inconclusive because of the low recovery of genome sequence from these cells (coverage <35% relative to N. maritimus SCM1).
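To see why a missing gene in low-coverage SAGs carries little weight, consider the following back-of-the-envelope sketch. It assumes that the gene is present as a single copy in each cell, that the recovered portion of each genome is a random sample of it, and that recovery is independent between cells. None of these assumptions comes from the text, and amplification bias in single-cell genomics means recovery is not truly uniform, so the numbers are indicative only.

```python
# Minimal sketch: chance that a gene present in every cell is missed in all SAGs.
# ASSUMPTIONS (not from the paper): single-copy gene, recovered regions are a
# random sample of the genome, and recovery is independent between cells.


def prob_gene_missed_in_all(recoveries: list[float]) -> float:
    """Probability that a gene is absent from every partial genome.

    Each entry in `recoveries` is the fraction of one genome that was recovered,
    so (1 - recovery) is the chance that the gene falls in the unrecovered part.
    """
    probability = 1.0
    for recovery in recoveries:
        if not 0.0 <= recovery <= 1.0:
            raise ValueError("each recovery must lie between 0 and 1")
        probability *= 1.0 - recovery
    return probability


# Three Winter Water SAGs, each with coverage below 35% (upper bound from the text).
p_missed = prob_gene_missed_in_all([0.35, 0.35, 0.35])
print(f"Chance of missing the gene in all three SAGs: at least {p_missed:.2f}")
```

Because 35% is an upper bound on coverage, the computed value is a lower bound: under these assumptions there is at least a one-in-four chance that photolyase would go undetected in all three cells even if each of them encoded it.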

Figure 2

Bayesian phylogenetic tree of (a) photolyase and (b) catalase amino-acid sequences. The trees were constructed using the MrBayes v3.1.2 software (Ronquist and Huelsenbeck, 2003) using the WAG +Γ4 model. Values at the nodes are posterior probabilities of the internal branches. Details of tree construction can be found in Supplementary Material. The distinct phylogenetic groups (Thaumarchaeota, Euryarchaeota, Crenarchaeota, Bacteria) are indicated by shading. The trees consist of sequences from single cells (filled star), reference taxa with sequence id (NCBI gi/accession/locus tag) given in parentheses, and homologs in GOS metagenomes with sequence id in the format of ‘JCVI_READ_XXXX’.

Members of the epi- and mesopelagic clades also appear to differ in their capabilities for reducing oxidative stress. Genes encoding superoxide dismutase, which catalyzes the dismutation of superoxide into oxygen and hydrogen peroxide, are equally abundant in epi- and mesopelagic clades (Supplementary Table S3; χ2 test; P>0.05). Hydrogen peroxide is subsequently converted to water by the enzymes peroxiredoxin (also known as alkyl hydroperoxide reductase) and catalase. Although peroxiredoxin gene families occur with comparable frequency in both clades (Supplementary Table S3; χ2 test; P>0.05), a gene with high homology to catalase was found exclusively in two WW SAGs of the epipelagic clade (Supplementary Table S3). A key difference between these two types of antioxidant enzymes is that peroxiredoxin is 100- to 1000-fold less efficient than catalase; the latter becomes crucial once the former is saturated with hydrogen peroxide (Parsonage et al., 2008). Further, there is evidence that catalase is critical in minimizing ultraviolet-induced oxidative damage in bacteria (Costa et al., 2010). Phylogenetic analysis suggested that this gene was acquired through horizontal gene transfer (Figure 2b). This interpretation is further substantiated by a homology search, which found the gene in none of the genomes of the seven cultured marine Thaumarchaeota sequenced to date, all of which are members of the epipelagic ecotype (Supplementary Figure S2). These results are consistent with microbes in epipelagic waters experiencing a stronger oxidative stress because of photochemical and photosynthetic production of reactive oxygen species compared with those in deep water where biological activity is the single source of superoxide (Diaz et al., 2013).
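The kind of χ2 comparison cited above can be sketched for the simplest case, a 2×2 table of clade against gene presence. The layout of the table, the function name and the counts in the usage line are hypothetical and are not taken from Supplementary Table S3; the sketch uses the standard Pearson statistic without continuity correction, and the exact test configuration used in the study is not specified in the text.

```python
# Minimal sketch: Pearson chi-square test on a 2x2 contingency table.
# ASSUMPTION: rows are clades, columns are "gene found" / "gene not found".
# The counts used below are hypothetical and are NOT data from the paper.
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return (chi-square statistic, P value) for the table [[a, b], [c, d]].

    No continuity correction is applied. With one degree of freedom the
    P value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: clade 1 has 20 with / 10 without, clade 2 has 10 with / 20 without.
chi2, p_value = chi_square_2x2(20, 10, 10, 20)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

When expected counts in any cell are small, as they can be when one clade is represented by only a handful of genomes, an exact test such as Fisher's may be a more cautious choice than the χ2 approximation.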

When gene functions were assigned more broadly using either COG (Tatusov et al., 1997) or arCOG (Wolf et al., 2012) categories, we found that the genome content of the epipelagic clade was significantly different from the mesopelagic clade (χ2 test; P<0.001), with the signal transduction functional category significantly enriched in the epipelagic clade (Rodriguez-Brito et al., 2006). The ability of generalist marine bacteria to respond to a changing environment has been attributed to differences in the sophistication of regulatory machinery (Lauro et al., 2009; Luo et al., 2013), and this reasoning may apply to selection pressures operating on epipelagic Thaumarchaeota compared with those inhabiting the more stable mesopelagic waters. By contrast, we found a higher abundance of urease genes in SAGs from the mesopelagic compared with the epipelagic clade (Supplementary Table S3), consistent with a recent report of the depth distribution of Thaumarchaeota urease genes in polar oceans (Alonso-Sáez et al., 2012).

In conclusion, differentiation of epi- and mesopelagic Thaumarchaeota populations first detected by analysis of single genes (Francis et al., 2005; Hallam et al., 2006) is supported by phylogenomic analysis of partial genomes retrieved from uncultivated cells. The exclusive presence of putative DNA photolyase and catalase in SAGs from the epipelagic is strong evidence that light or light-driven photochemistry is a major factor structuring marine Thaumarchaeota by depth.