Main

Following the first report of the wide distribution of archaeal lineages in the marine environment (DeLong, 1992), a commentary by Gary Olsen stated enthusiastically: ‘…overlooking the Archaea has been equivalent to surveying one square kilometre of the African savanna and missing over 300 elephants’ (Olsen, 1994). Today, this analogy has been largely verified, and the initial expectations have even been exceeded. A burst in the availability of the first genomic data from a large number of uncultured archaeal lineages has been witnessed in the past few years. According to the NCBI genome database, 1062 archaeal genomes have been made available as of December 2016 (Figure 1), of which 186 are from metagenomes and 111 are single cell genomes. Twice as many are sequenced but not yet released according to the GOLD database. Given that the symbolic number of 100 complete genomes was reached only six years ago (Brochier-Armanet et al., 2011), this provides a measure of how rapidly the field of archaeal genomics is moving. As a comparison, the number of isolates and newly described species has remained stationary (Figure 1), and mainly concerns members of well-characterized lineages, stressing the need for a stronger isolation effort. To that end, metabolic predictions derived from genomic data of uncultured archaeal lineages can also provide unprecedented information to guide culture strategies.
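As a quick check on the figures quoted above, the following minimal sketch computes the share of the 1062 released genomes that come from metagenomes and from single cells. The counts are taken from the text; the `other` category is simply the remainder, and reading it as genomes from cultured organisms is an assumption, not a statement from the NCBI database.

```python
# Counts quoted in the text (NCBI genome database, December 2016).
total = 1062
from_metagenomes = 186
from_single_cells = 111

# Remainder; interpreting this as "cultured organisms" is an assumption.
other = total - from_metagenomes - from_single_cells


def share(count):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


metagenome_share = share(from_metagenomes)
single_cell_share = share(from_single_cells)
culture_independent_share = share(from_metagenomes + from_single_cells)
other_share = share(other)

print(metagenome_share, single_cell_share, culture_independent_share, other_share)
```

On these numbers, a little over a quarter of the released archaeal genomes were obtained by culture-independent approaches.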

Figure 1

Number of archaeal genome sequences and validly described archaeal species over the last 20 years. The orange line and histogram indicate respectively the annual and cumulative number of novel archaeal genome sequences (that is, complete genomes, chromosomes, contigs and scaffolds) released in public databases (NCBI, latest update December 2016). The blue line and histogram indicate respectively the annual and cumulative number of validly described archaeal species (Source: List of Prokaryotic Names with Standing in Nomenclature, with names published until July 2016; http://www.bacterio.net/).

The analysis of the first genomic data from these newly sequenced lineages has had a strong impact on archaeal systematics, leading to the proposal of a multitude of new clades at various taxonomic levels (orders, classes, phyla, superclasses, superphyla), with a wealth of new assigned names that have replaced the original acronyms from environmental 16S rRNA studies (Table 1). It is important to remember here that there is no established criterion to propose a new taxonomic status above the Class level, a gap that is an important priority to address in modern microbial systematics (Gribaldo and Brochier-Armanet, 2012). Moreover, the phylogenetic coherence of already established high-rank systematics in both Bacteria and Archaea based on 16S rDNA divergence is far from uniform (Yarza et al., 2014). The current and future deluge of genomic sequences from an ever-larger fraction of uncultured microbial diversity calls for the urgent establishment of common criteria based on genomic data, particularly in the frame of nomenclature and classification consistency of major reference databases.

Table 1 Newly named archaeal lineages with their original acronyms and corresponding etymology (when applicable)

Under such a deluge of genomic data, the establishment of a robust phylogenetic frame for the Archaea, and, in particular, the placement of all the new uncultured lineages, becomes of paramount importance. This is essential to infer the nature of the last ancestor of Archaea and the very origin of this domain of life, as well as its relationship with eukaryotes. It also helps us to understand the evolutionary processes that led to present-day archaeal diversity and drove the emergence of specific metabolic capacities and adaptations to different environments, well beyond extreme niches.

The majority of the new genomes originate from uncultured lineages representing a sizeable proportion of microbial life in sediments and water columns, and may significantly increase the already well-recognized importance of Archaea as major players in global biogeochemical cycles (Offre et al., 2013). Thus, access to genomic data and the associated metabolic potential of the first representatives of these lineages is an important step toward understanding their role in the environment, and provides a new outlook on the metabolic diversity of the Archaea (Table 2).

Table 2 Environmental distribution and potential implication in elemental cycles of recently described archaeal lineages

Hereafter, we will present an overview of some of the most significant recent findings, which are discussed based on an updated robust phylogeny of the Archaea obtained from a large taxonomic sampling including all the new uncultured lineages (Figure 2). The fast-evolving nanosized lineages constituting the proposed DPANN superphylum (Rinke et al., 2013) have been treated separately, because their monophyly and phylogenetic placement are unclear, and will be discussed in a dedicated section.

Figure 2

Phylogeny of the Archaea. Bayesian phylogeny (PhyloBayes, CAT+GTR+Γ4) based on a 41-gene supermatrix (8710 amino acid positions). Scale bar represents the average number of substitutions per site. Node supports refer to posterior probabilities, and ultrafast bootstrap values based on a thousand replicates calculated by maximum likelihood (IQTree, LG+C60). The 41 genes consist of 36 genes from the Phylosift marker genes list (Darling et al., 2014), plus RNA polymerase subunits A and B, and three universal ribosomal proteins (L7-L12, L30, S4) from Liu et al. (2012). The tree is rooted according to Raymann et al. (2015), but alternative roots are indicated with numbered red dots (see main text for discussion). Grey font indicates the clades for which no isolates are available. Currently proposed taxonomic status: C=Class; P=Phylum; SC=Super Class; SP=Super Phylum.
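To illustrate what a supermatrix is, the minimal sketch below concatenates per-gene alignments into one row per taxon, padding with gaps where a taxon lacks a marker. It is not the pipeline used for Figure 2: the function name `concatenate`, the taxon labels and the toy sequences are all hypothetical, and only the marker tally (36 + 2 + 3 = 41) comes from the caption.

```python
# Marker tally stated in the caption: 36 Phylosift markers,
# 2 RNA polymerase subunits (A and B), 3 ribosomal proteins.
N_MARKERS = 36 + 2 + 3


def concatenate(alignments, taxa, gap="-"):
    """Join per-gene alignments into one supermatrix row per taxon.

    alignments: list of dicts mapping taxon name -> aligned sequence.
    A taxon missing from a gene is padded with gaps of that gene's width.
    """
    rows = {taxon: [] for taxon in taxa}
    for gene in alignments:
        lengths = {len(seq) for seq in gene.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one gene alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(gene.get(taxon, gap * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}


# Toy data (hypothetical taxa and sequences, for illustration only).
toy_genes = [
    {"taxonA": "MKV", "taxonB": "MRV", "taxonC": "MKI"},
    {"taxonA": "GA-T", "taxonC": "GAST"},  # taxonB lacks this marker
]
supermatrix = concatenate(toy_genes, ["taxonA", "taxonB", "taxonC"])
for taxon, row in supermatrix.items():
    print(taxon, row)
```

Every row of the result has the same length, which is what allows the concatenated alignment to be analysed as a single data set.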

The expanding TACK superphylum

The TACK superphylum was proposed in 2011 based on phylogenetic proximity and signatures shared with eukaryotes (Guy and Ettema, 2011). At that time, it included the Thaumarchaeota, the Aigarchaeota, the Crenarchaeota and the Korarchaeota (Guy and Ettema, 2011; Table 1). An additional TACK phylum named Geoarchaeota was suggested (Table 1; Kozubal et al., 2013), but was subsequently indicated to represent a deep-branching lineage of the Crenarchaeota (Guy et al., 2014; Table 1), consistent with our analysis (Figure 2). Based on a large-scale phylogenomic analysis it has been recently proposed that the TACK represents a kingdom-level clade named Proteoarchaeota (Petitjean et al., 2014). In recent years, the genomic coverage for members of the TACK has substantially increased, providing a better view on its diversity and evolution.

Thaumarchaeota and the origin of archaeal nitrification

For a long time, the phylum Thaumarchaeota (former Group I Crenarchaeota, Table 1) has been identified with the ecologically important aerobic ammonia oxidizing archaea inhabiting marine (Group I.1a/Nitrosopumilales, Nitrosopumilus, Nitrosoarchaeum, Cenarchaeum) and soil environments (Group I.1b/Nitrososphaerales, Nitrososphaera) (Pester et al., 2011). Increasing availability of genomic data from three new thaumarchaeal lineages (Figure 2) has drastically changed this picture and provided substantial insights into the still largely unexplored metabolic versatility of the Thaumarchaeota. These genomes correspond to Fn1 (Group I.1c) obtained from deep anoxic peat layers (Lin et al., 2015), and to Beowulf (Group I.1d) and Dragon (Group I.1d) obtained from acidic (pH ~3), thermophilic (65–72 °C), iron oxide and sulfur sediments of Yellowstone National Park (Beam et al., 2014; Table 2). Although Fn1 is predicted to obtain energy and carbon from β-oxidation of volatile fatty acids, either by using fumarate as terminal electron acceptor or in syntrophy with methanogens (Lin et al., 2015), both Beowulf and Dragon Thaumarchaeota appear to be versatile chemoorganotrophs, potentially growing on diverse carbohydrates, peptides and amino acids (Beam et al., 2014). Surprisingly, while mostly complete, none of the genomes from these three Thaumarchaeota lineages contains the amoABC genes for ammonia oxidation (Beam et al., 2014). This indicates that the ability to oxidize ammonia is not a general characteristic of the Thaumarchaeota. In this respect, further genomic data and exploration of the metabolic potential and phylogenetic placement of the three lineages, in particular Fn1, which appears to be the closest relative of aerobic ammonia oxidizing archaea (Figure 2), will provide key information on the emergence of ammonia oxidation in the Thaumarchaeota.

Fn1, Beowulf and Dragon might also provide significant information on the adaptation of ammonia oxidizing Thaumarchaeota to aerobic conditions. Fn1 members were in fact isolated from anaerobic environments (Lin et al., 2015), Dragon members were obtained from hypoxic conditions, and have genes indicating the ability for elemental sulfur reduction (Beam et al., 2014), while Beowulf members were isolated from oxic conditions where they might use oxygen as terminal electron acceptor (as suggested by the presence of a Heme Copper Oxidase, HCO), but might also be capable of growing anaerobically by reducing nitrate to nitrite thanks to the presence of a narGHJI gene cluster (Beam et al., 2014).

Finally, the deep branching of Beowulf and Dragon lineages (Figure 2) may support the hypothesis of a thermophilic ancestor for all Thaumarchaeota and a subsequent adaptation to mesophilic environments (Barns et al., 1996; Eme et al., 2013), a trend becoming more and more evident for many archaeal phyla. Additional genomes and isolation of the first members of these lineages, combined with specific phylogenetic analyses will clarify the overall phylogeny of the Thaumarchaeota and allow further assumptions on the diversity and emergence of various metabolic capacities in this important phylum.

Aigarchaeota and adaptation to oxygen

Genomic coverage has also substantially expanded for the Aigarchaeota (former Hot Water Crenarchaeotic Group, HWCG I, Table 1), a diverse lineage widespread in moderate to extremely hot terrestrial, marine, and subsurface environments (Hedlund et al., 2015), which robustly branches as the sister clade of Thaumarchaeota (Brochier-Armanet et al., 2011 and Figure 2). Obtained from a subsurface geothermal water stream, the metagenome of ‘Candidatus Caldiarchaeum subterraneum’ was the first to be published (Nunoura et al., 2011), and was followed by several SAGs from various hydrothermal environments (Rinke et al., 2013). Another candidate species named ‘Ca. Calditenuis aerorheumensis’, from an oxic, hot spring streamer microbial community, has been the target of a metatranscriptomic analysis, providing the first insights into the metabolic potential of Aigarchaeota in situ (Beam et al., 2016). They appear as filamentous microorganisms that are capable of chemoorganoheterotrophy by using several organic carbon substrates (Table 2). Seemingly, Aigarchaeota are auxotrophs for vitamins and cofactors, as well as heme, which they might obtain from other community members (Beam et al., 2016).

The phylogenetic placement of Aigarchaeota makes them a key lineage to investigate the emergence of Thaumarchaeota and their specific metabolic adaptations. For example, the presence of a Heme Copper Oxidase, HCO, in the majority of available Aigarchaeota and Thaumarchaeota genomes indicates that the capacity to grow aerobically is a widespread trait of these lineages (Beam et al., 2016). This raises the question of whether adaptation to aerobic environments preceded the divergence of Aigarchaeota and Thaumarchaeota or instead occurred independently in the two phyla.

Bathyarchaeota: key players in the global carbon cycle

The TACK superphylum has recently acquired a new member lineage, the Bathyarchaeota (former Miscellaneous Crenarchaeotal Group, MCG, Table 1), an emerging clade of great ecological interest. Bathyarchaeota are robustly indicated as the sister lineage to the Aigarchaeota/Thaumarchaeota (Figure 2). This phylogenetic affiliation is also supported by the fact that many genomes of Bathyarchaeota contain homologues of the eukaryotic-like Topoisomerase IB (Meng et al., 2014), a character so far defining the Thaumarchaeota/Aigarchaeota (Brochier-Armanet et al., 2011), pushing the origin of this enzyme further back in archaeal diversification than previously thought. The Bathyarchaeota are ubiquitous in both terrestrial and marine anoxic sediments (surface and subsurface) where they can represent a major fraction of the archaeal community (Kubo et al., 2012; Lloyd et al., 2013). The extensive diversity of this lineage, divided into as many as 17 subgroups (mostly at the family level), suggests a wide variety of metabolisms and environmental adaptations (Kubo et al., 2012). The genomic data now available for six subgroups have revealed a common capacity to degrade peptides to obtain carbon and energy, and a more variable ability to use carbohydrates, fatty acids or aromatic compounds (Lloyd et al., 2013; Meng et al., 2014; Evans et al., 2015; He et al., 2016; Lazar et al., 2016; Table 2). The utilization of a diverse range of organic compounds for heterotrophic growth is also supported by incorporation of 13C-labelled molecules (Seyler et al., 2014). In addition, several members of the phylum possess a complete H4MPT-type Wood-Ljungdahl (WL) pathway and genes for acetate formation, suggesting the possibility of growing autotrophically by acetogenesis from H2+CO2, a capacity previously thought to be limited to Bacteria (He et al., 2016; Lazar et al., 2016). Moreover, some members possess markers of methanogenesis, suggesting a possible role in the methane cycle (Evans et al., 2015; see below). The potential metabolic flexibility between autotrophic and heterotrophic growth on a wide range of compounds represents an ecological advantage for the Bathyarchaeota and underlines the importance of this abundant benthic group in the global carbon cycle.

Lokiarchaeum, Asgard and the origin of eukaryotes

Among the major accomplishments of the exploration of uncultured archaeal diversity is the discovery of new lineages proposed to be the closest relatives of eukaryotes. The phylum Lokiarchaeota was defined following the sequencing of the first metagenomic data from the uncultured DSAG lineage (Spang et al., 2015, Table 1). De novo assembly and binning were applied to DNA extracted from deep marine sediment samples (3283 m below sea level) at the Arctic Mid-Ocean-Ridge, in the vicinity of the hydrothermal Loki’s Castle site (Jorgensen et al., 2012). This resulted in the reconstruction of one nearly complete (Lokiarchaeum) and two partial (Loki2 and Loki3) genomes related to this lineage. These data revealed a surprisingly large number of eukaryotic signature proteins previously thought to be absent in Archaea, in particular genes coding for components related to membrane remodelling and cytoskeletal functions in eukaryotes (for example, actin, small Ras GTPases, extended ESCRT complex; Spang et al., 2015). Consistent with their genomic content, the inclusion of the three Lokiarchaeota in a universal tree of life indicated them as a sister clade to eukaryotes, suggesting that Lokiarchaeota may represent a ‘missing link’ between the two domains of life (Spang et al., 2015). Additional studies have been consistent with this hypothesis by analysing the Lokiarchaeum genome for homologues of a few eukaryotic-like processes, such as the membrane-trafficking system (Klinger et al., 2016) and the selenocysteine-encoding system (Mariotti et al., 2016).

Very recently, genomic sequences have been obtained from three additional uncultured phyla closely related to Lokiarchaeota (Table 1 and Figure 2): the Thorarchaeota, the Heimdallarchaeota and the Odinarchaeota (Seitz et al., 2016; Zaremba-Niedzwiedzka et al., 2017). The Thorarchaeota (former MBG-B) were described based on partial to near-complete genomes obtained from sediments of the White Oak River estuary, in the sulfate–methane transition zone (Seitz et al., 2016). The Odinarchaeota and Heimdallarchaeota genomes were obtained from high-temperature habitats and marine sediments, respectively (Zaremba-Niedzwiedzka et al., 2017). Along with additional metagenomic bins of Lokiarchaeota and Thorarchaeota, they were shown to possess further eukaryotic signature proteins, such as eukaryotic-like tubulins, homologues of the ɛ DNA polymerase and membrane-trafficking components (TRAPP complex, Sec23/24 family proteins), and were proposed to form a new superphylum, which was named Asgard (Zaremba-Niedzwiedzka et al., 2017, Figure 2). Further evolutionary analysis of Asgard lineages might provide important information on the processes that led to the emergence of the first eukaryotic cell. To confirm and extend these results, isolation of the first representatives of the Asgard lineages is a paramount priority.

Asgard lineages are common inhabitants of anaerobic marine, estuarine and lake sediments, and they might have an important role in the global carbon cycle (Table 2; Teske and Sørensen, 2008). Metabolic prediction suggests that Thorarchaeota are able to degrade organic matter, contributing to the carbon cycle, but also may have a role in intermediate sulfur cycling (Seitz et al., 2016). Based on the presence of an almost-complete H4MPT-type WL pathway and of some genes coding for electron-bifurcating hydrogenases in its genome, it was proposed that Lokiarchaeum might be anaerobic, autotrophic and hydrogen-dependent (Sousa et al., 2016). More in-depth genomic analyses and the isolation of representative members will be necessary to clarify further the metabolic potential of the Asgard superphylum.

Methanogens, methanogens everywhere!

Methanogenesis is an important and ancient metabolism that is specific to the Archaea (Thauer et al., 2008). For a long time, the known diversity of methanogens fell into two large clades, which were called Class I methanogens (Methanococci, Methanopyri, Methanobacteria) and Class II methanogens (Methanomicrobia: Methanosarcinales and Methanomicrobiales) (Bapteste et al., 2005). These two clusters have been confirmed and enriched by new genomic data. In particular, the monophyly of Class I methanogens was supported by a large-scale phylogenomic analysis of the archaeal domain leading to the proposal of the superclass Methanomada (Table 1 and Figure 2; Petitjean et al., 2015). This additionally stabilized the oft-unclear placement of Methanopyri in the archaeal phylogeny as robustly branching with Methanobacteria, a relationship also supported by a shared derived character, the presence of pseudomurein in their cell walls (Albers and Meyer, 2011).

Concerning Class II methanogens/Methanomicrobia, they now firmly include two novel divisions: Methanocellales (former Rice Cluster I) and Methanoflorentaceae (former Rice Cluster II; Table 1), as well as the non-methanogenic Halobacteria (Figure 2). More specific analyses are, however, necessary to fully resolve the internal relationships of this clade, in particular, to clarify which lineage represents the closest outgroup to the Halobacteria, whose specific amino acid composition might be at the origin of incongruent placements in different published studies. This will be essential to understand the process of adaptation to a halophilic, aerobic and heterotrophic lifestyle from a methanogenic ancestor (Nelson-Sathi et al., 2015; Groussin et al., 2016). Given a possible common origin from a methanogenic ancestor, we propose uniting former Methanogens Class II with their closely related non-methanogenic lineages (Halobacteria, ANaerobic MEthanotrophic (ANME-1), Syntropharchaeales, Archaeoglobi) into a new superclass called Methanotecta (Figure 2 and Table 1).

Recently, important progress has been made on the characterization of methanogenesis cofactors (Zheng et al., 2016; Moore et al., 2017) and enzymes (Wagner, 2016). Also, a novel pathway for utilization of methoxylated compounds (methoxydotrophic methanogenesis) has been discovered in a member of Methanosarcinales, with important implications for deep subsurface methanogenesis (Mayumi, 2016). In addition, the diversity of archaea capable of methanogenesis appears much larger than previously thought, and among the most exciting discoveries in the archaeal field is the identification of a large number of new lineages of methanogens.

Methanomassiliicoccales: from deep sediments to the human gut

The Methanomassiliicoccales (former Rumen Cluster C/Rice Cluster III, Table 1) are a novel order of methanogens present in various environments such as marine and lake sediments, sewers, soils and also animal digestive systems (insects, ruminants, humans; Dridi et al., 2012; Paul et al., 2012; Borrel et al., 2013; Söllinger et al., 2016; Raymann et al., 2017; Table 2). Importantly, they represent the second lineage of methanogens, other than the Methanobacteriales, to include members consistently adapted to the human gastrointestinal tract (Gaci et al., 2014). The analysis of the first genomes of Methanomassiliicoccales isolated/enriched from the human gastrointestinal tract showed that they are unrelated to any previously known Class I and Class II methanogens, but are rather affiliated to a large clade of non-methanogenic lineages (Borrel et al., 2013, 2014b).

In agreement with their placement, the Methanomassiliicoccales display unique characteristics, such as complete lack of genes coding for methanogenesis from H2+CO2 and the MTR complex, making them reliant on methyl-dependent hydrogenotrophic methanogenesis in an energy-conservation process that is not completely resolved (Borrel et al., 2014b; Lang et al., 2015). Moreover, the Methanomassiliicoccales use specific methyltransferases that contain the rare 22nd proteinogenic amino acid pyrrolysine (Pyl), which is incorporated during translation by a sophisticated process involving a specific amber non-sense codon suppressor tRNA (Borrel et al., 2014a). This genetic code expansion is potentially handled by distinct mechanisms compared with the few other Pyl-containing bacteria and archaea, and even among different Methanomassiliicoccales (Borrel et al., 2014a, b).

The discovery of Methanomassiliicoccales underlines our still poor understanding of the diversity and role of archaeal methanogens in human health and disease, an important area of future research (for a recent review, see Gaci et al., 2014; Bang and Schmitz, 2015). Indeed, trimethylamine (TMA), which can be converted into methane by Methanomassiliicoccales, is generated by the gut microbiota from nutrients and is further converted in the liver into the pro-atherogenic compound trimethylamine N-oxide (Brugère et al., 2014). Analyses of human-associated Methanomassiliicoccales have supported their role in trimethylamine utilization in the gut but also revealed that members of the two main clades of Methanomassiliicoccales have contrasting associations with subject health status and microbiota (Borrel et al., 2017). Many aspects of the biology of Methanomassiliicoccales remain largely unknown. For instance, there are currently no genomic data and no isolate/enrichment culture from the large diversity of environmental members. Obtaining them will provide important information on the role of Methanomassiliicoccales in the environment and on the paths that led to their adaptation to the human gastrointestinal tract.

Methyl-dependent methanogenesis: more widespread than previously thought

The type of methanogenesis present in Methanomassiliicoccales is not an isolated case and has been recently inferred in a number of uncultured lineages. The genome sequences from an uncultured novel methanogenic lineage, WSA2/Arc1 (Table 1), were acquired from a wastewater treatment bioreactor (Nobu et al., 2016; Table 2). Because WSA2/Arc1 did not appear to group with any of the previously known methanogens, it was proposed that they represent a new class, tentatively called ‘Ca. Methanofastidiosa’ (Nobu et al., 2016). This is consistent with our phylogenetic analysis (Figure 2), where Methanofastidiosa are robustly placed within a potential new superclass, the Acherontia (see below, Table 1). Interestingly, the metabolism inferred from these genomic data indicates absence of CO2-reducing or aceticlastic methanogenesis, similarly to Methanomassiliicoccales, and a potential specialization on methylated-thiol reduction with H2 (Nobu et al., 2016; Table 2).

Moreover, potential new lineages of methanogens have been reported for the first time within the TACK superphylum. Metagenomic analysis has highlighted the presence of methanogenesis markers (for example, McrA) in two members of the Bathyarchaeota, and it has been proposed that they may proceed through reduction of methyl compounds by H2, like the Methanomassiliicoccales (Evans et al., 2015). Genomic data from a second lineage of putative methanogens with a similar metabolism of reduction of methyl-compounds (methylamines, methanol and methylthiols) was obtained from various anaerobic environments (Table 2) and proposed to represent a new phylum, the Verstraetearchaeota (Vanwonterghem et al., 2016; Table 1), which robustly cluster with the Crenarchaeota (Figure 2). Interestingly, both Verstraetearchaeota and the potentially methanogenic Bathyarchaeota are predicted to be able to gain energy through metabolisms other than methanogenesis (for example, fermentation of peptides), an observation never reported for any previously known methanogens.

Beyond methanogenesis: variations on a theme

Experimental characterization of members of these novel putative methanogenic lineages is needed to clarify their role in methane cycling and more generally in carbon cycling. Indeed, enzymes traditionally considered as markers of methanogenesis (for example, MCR) can also be used for anaerobic methane oxidation in several ANME lineages (Timmers et al., 2017) and have even been shown to catalyse reactions that do not involve methane in two recently characterized strains of a new genus called ‘Ca. Syntrophoarchaeum’ (Laso-Pérez et al., 2016; Table 2). The two ‘Ca. Syntrophoarchaeum’ strains were enriched from gas-rich hydrothermal sediments. Their genomes contain MCR-like complexes that are likely used to activate butane toward butyl-CoM, and this intermediate is further metabolized into acetyl-CoA by β-oxidation and finally to CO2 through the methyl branch of the WL pathway (Laso-Pérez et al., 2016). To perform this novel anaerobic alkane-degradation pathway, the two ‘Ca. Syntrophoarchaeum’ strains are dependent on a syntrophic partner, the sulfate-reducing bacterium ‘Ca. Desulfofervidus auxilii’ (Laso-Pérez et al., 2016). Direct cell-to-cell electron transfer through nanowires and cytochromes may occur between ‘Ca. Desulfofervidus auxilii’ and ‘Ca. Syntrophoarchaeum’, similarly to what has been observed between ANME and sulfate-reducing bacteria (Wegener et al., 2015; McGlynn, 2017). The two ‘Ca. Syntrophoarchaeum’ are robustly placed as sister group to ANME-1 (proposed Methanophagales, Figure 2, Table 1), and we therefore propose that they represent a new order, the Syntropharchaeales (Table 1). This placement is important because it makes it possible to break the branch leading to Methanophagales, so far represented by a single genome, and might clarify the evolutionary processes that led to loss and tinkering of methanogenesis (Borrel et al., 2016). Based on their phylogenetic proximity with Syntropharchaeales MCR homologues, it has been suggested that Bathyarchaeota MCR may be involved in a similar metabolism (Laso-Pérez et al., 2016), which will require experimental demonstration.

These new data provide a novel view on the diversity and evolution of methanogenesis and associated metabolisms (for a recent discussion see Borrel et al., 2016). In particular, they highlight the widespread distribution and the likely underestimated environmental importance of methyl-dependent hydrogenotrophic methanogenesis, and question its potential antiquity (Borrel et al., 2016). In addition, they are consistent with the hypothesis of a methanogenic ancestor for the Archaea, and a scenario whereby multiple independent losses/tinkering of this metabolism occurred during archaeal diversification (Raymann et al., 2015), the details of which remain to be fully understood.

New emerging clades in the archaeal tree

The availability of genomic data from previously uncharacterized lineages has allowed identification of several interesting new clades.

The Diaforarchaea: a model clade to study adaptive processes in the Archaea

Following the availability of genomic data from previously uncharacterized lineages, the new superclass Diaforarchaea was recently proposed (Petitjean et al., 2015; Table 1). All Diaforarchaea members sequenced so far share two common characters: the lack of eukaryotic-like histones otherwise largely present in archaea and the fact that their 16S and 23S rRNA genes are not clustered in the genome (Brochier-Armanet et al., 2011; Borrel et al., 2014b).

The Diaforarchaea currently contain at least six well-defined lineages (Figure 2). Other than the already-mentioned Methanomassiliicoccales, they include: the Thermoplasmatales, which include the only known examples of wall-less archaea and inhabit extreme acidic, hot, solfataric environments, and are important contributors to the release of toxic acid mine drainage into the environment (Baker and Banfield, 2003); the Deep sea Hydrothermal Vent Euryarchaeota group 2 (DHVE-2, Aciduliprofundum), making up 15% of the Archaea at hydrothermal vents, where they contribute to sulfur and iron cycling (Takai and Horikoshi, 1999; Reysenbach et al., 2006); the Marine Benthic Group D (MBG-D), a class-level lineage abundant in anoxic deep sediments (for which we suggest the name Izemarchaea, Table 1) that has a key role in global carbon cycling through degradation of organic matter (Lloyd et al., 2013; Table 2); the Terrestrial Miscellaneous Euryarchaeotic Group (TMEG, Table 1), whose members are found in gold mines, freshwater, marine sediment and peat soils, where they degrade long-chain fatty acids and reduce organosulfate or sulfite, important for carbon re-mineralization (Teske, 2006; Teske and Sørensen, 2008; Lin et al., 2015; Table 2); and two class-level lineages corresponding to Marine Group II (MG-II, Thalassoarchaea, Martin-Cuadrado et al., 2015) and Marine Group III (MG-III, Li et al., 2015), for which we propose the name Pontarchaea (Table 1). Thalassoarchaea and Pontarchaea are abundant in oxygenated surface and deep marine waters, where they contribute to the oceanic carbon cycle by degrading extracellular proteins, carbohydrates and straight-chain lipids (Iverson et al., 2012; Li et al., 2015). Some Thalassoarchaea occupying the shallow photic zone are also able to obtain energy from light by using proteorhodopsins, which were acquired via lateral gene transfer from marine Proteobacteria (Frigaard et al., 2006; Iverson et al., 2012). Formerly thought to be restricted to deeper waters, some Pontarchaea are now known to also be epipelagic photoheterotrophs (Haro-Moreno et al., 2017).

The wide range of lifestyles of the Diaforarchaea lineages and their specific genomic and cellular characteristics make them a great model to study the processes underlying archaeal evolution and adaptation to contrasting environments. These may include, for example, the transition between anoxic (methanogenic lineages) and oxic (marine lineages) environments, or the consequences of the loss of the cell wall (Thermoplasmatales).

Thermococcales gain new friends

The Thermococcales are one of the main model organisms in the Archaea (Leigh et al., 2011). Although they had no close relatives in the archaeal tree for a long time, they now appear evolutionarily related to four uncultured lineages. These form two large clades, possibly at superclass-level, for which we propose the names Acherontia and Stygia (Table 1 and Figure 2).

The Acherontia include Thermococcales, the aforementioned Methanofastidiosa (WSA2/Arc1), and the recently sequenced Theionarchaea (former Z7ME43, Lazar et al., 2017, Table 1). Theionarchaea appear to be widespread in sediments. They possess the WL pathway of carbon fixation into acetyl-CoA, which could enter the Krebs cycle, and are probably capable of degrading detrital proteins as well as fixing nitrogen to ammonium (Table 2). They also harbour a sulfhydrogenase through which they might use sulfur or polysulfides as electron acceptors (Lazar et al., 2017).

The Stygia include the Hadesarchaea (former South African Gold Mine Euryarchaeotic Group, SAGMEG) and the MSBL-1 (Mediterranean Seafloor Brine Lake Group 1, for which we suggest the name Persephonarchaea, Table 1). Hadesarchaea metagenomes were recovered from hot spring sediments at the White Oak River estuary and Yellowstone National Park (Baker et al., 2016). Hadesarchaea are predicted to be heterotrophic and possess genes for sulfidogenesis and sulfide oxidation, as well as for CO oxidation coupled to nitrite reduction (Baker et al., 2016). Persephonarchaea genomes were obtained from deep anaerobic ocean brine lakes in the Red Sea (Mwirichia et al., 2016). They can import and ferment sugars via the Embden-Meyerhof pathway, and possess gluconeogenesis and a Krebs cycle, but no oxidative pentose phosphate pathway. In addition, Persephonarchaea are able to assimilate sulfur and possibly to import and reduce nitrate (Mwirichia et al., 2016). Interestingly, both lineages of the Stygia seem to be able to fix carbon in unusual ways, by combining partial versions of carbon fixation pathways (reverse TCA, Calvin, WL cycles). So far, all these new members of the Stygia and Acherontia have been retrieved from anaerobic and mostly moderate temperature environments, and possess unusual metabolic capabilities (Table 2). Sequencing of new members as well as exploration of their lifestyle will shed light on their metabolic diversity and evolution, including the emergence of the hyperthermophilic and heterotrophic Thermococcales.

Although they are paraphyletic clades in our Bayesian tree (Figure 2), the Stygia and Acherontia appear monophyletic in ML trees (not shown), consistent with a recent universal phylogeny (Hug et al., 2016). The phylogenetic relationships between Stygia and Acherontia will need to be clarified by more specific analyses when new genomic data become available. In any case, the Stygia and Acherontia appear as the closest relatives to the TACK and Asgard superphyla (Figure 2), hence they might hold clues on the emergence of these clades and their special link to eukaryotes. Finally, according to alternative possible rootings of the archaeal phylogeny (see later), the Stygia and Acherontia might be among the deepest branches in the Archaea, and therefore able to provide key information on the origin and deep evolution of this domain.

The enigmatic Altiarchaeales

The Altiarchaeales (former SM1 Euryarchaeon, Table 1) are an uncultured lineage whose members predominate in subsurface anaerobic cold groundwater environments, and which was proposed to represent a novel archaeal order (Probst et al., 2014). Altiarchaeales are one of the rare cases of archaea with an outer cell membrane. They display unique grappling-hook appendages (‘hami’) and form nearly pure biofilms in these environments (for a recent review, see Probst and Moissl-Eichinger, 2015). Other than their unique cell envelopes and related structures, the Altiarchaeales are also interesting from a metabolic point of view. The first metagenomic data point in fact to the presence of a complete WL pathway, indicating that they are capable of autotrophic metabolism and may represent an important carbon dioxide sink in the subsurface (Probst et al., 2014; Table 2). Clarifying the placement of Altiarchaeales and their evolutionary relationship with the other archaeal lineages is therefore relevant to all inferences of the metabolic capabilities of the last archaeal ancestor, as well as the origin and evolution of the WL pathway in Archaea.

In early analyses, Altiarchaeales were associated with Methanococcales (Probst et al., 2014). According to the recently proposed new root of the Archaea (Raymann et al., 2015) and our updated analysis, they could represent one of the deepest archaeal branches, possibly at the class or even phylum level (Figure 2). However, due to their fast evolutionary rates, the placement of Altiarchaeales in the archaeal phylogeny should be treated with caution. Other analyses have also suggested that they may cluster with the fast-evolving DPANN lineages (Bird et al., 2016; Hug et al., 2016), although this may be caused by a tree reconstruction artefact and needs to be confirmed by specific analysis (see below).

Key open questions in the phylogeny of Archaea

The DPANN superphylum: reality or artefact?

One of the most intriguing outcomes of the exploration of archaeal diversity has been the discovery of a large number of archaeal lineages that display extremely reduced cell sizes and genomes. Their 16S rRNA sequences are so divergent that for a long time they have escaped detection by PCR-based environmental surveys. Since the identification of the first nanosized archaeon, Nanoarchaeum equitans, which lives attached to the surface of its crenarchaeotal host (Ignicoccus hospitalis) in hyperthermophilic environments (Huber et al., 2002; Forterre et al., 2009), other similar small archaea have been found in a variety of different environments (Table 3 and references therein). They all display very fast evolution, a phenomenon commonly linked to extreme genome reduction, and lack the coding capacity for most amino acid biosynthetic pathways, indicating dependence on other microorganisms for survival.

Table 3 Characteristics of nanosized archaeal lineages

It has been suggested that nanosized archaeal lineages form a monophyletic deep-branching clade representing a new superphylum named DPANN (for Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota; Rinke et al., 2013). Since then, four additional DPANN groups have been defined through environmental genomic sequencing: Micrarchaeota, DHVEG-6, Pacearchaeota and Woesearchaeota (Table 3 and references therein). These lineages are found in many different environments (including oxic and anoxic ones; Table 3). Moreover, Pacearchaeota and Woesearchaeota, initially sequenced from anoxic environments, are also abundant components of surface waters of oligotrophic alpine lakes (Ortiz-Alvarez and Casamayor, 2016), indicating a wider range of environmental adaptations.

The current picture is that of a very large diversity of DPANN lineages, with at least seven well-supported clades (Figure 3). However, a key question is whether the DPANN constitute a monophyletic clade and where they branch in the archaeal phylogeny. The clustering and deep branching of fast-evolving lineages is in fact a well-known artefact of tree reconstruction called long-branch attraction (Philippe, 2000). Indeed, the monophyly of DPANN was not recovered by in-depth phylogenomic analyses (Brochier-Armanet et al., 2011; Petitjean et al., 2014; Williams et al., 2015). These indicated, for instance, Nanoarchaeota as a sister lineage of Thermococcales, in agreement with previous results (Brochier et al., 2005), and Nanohaloarchaeota grouping with Halobacteria, also consistent with previous reports (Narasingarao et al., 2012; Petitjean et al., 2014). The latter placement, however, may also be the result of a bias in amino acid composition driven by similar adaptation of these two lineages to high salt environments. Parvarchaeota and Micrarchaeota have been shown to branch with the Diaforarchaea (Petitjean et al., 2014). Nonetheless, the placement of Parvarchaeota and Micrarchaeota is unstable and they also clustered with other fast-evolving lineages at different places in the archaeal phylogeny (Brochier-Armanet et al., 2011; Raymann et al., 2014).

Figure 3

Diversity of the DPANN. Unrooted maximum likelihood phylogeny (IQTree, LG+C60) based on concatenation of the same 41 genes as in Figure 2 (9305 amino acid positions). Scale bar represents the average number of substitutions per site. Node supports refer to ultrafast bootstrap values based on a thousand replicates. The tree is only meant to describe the diversity of the DPANN, with at least seven well-supported clades. The question mark represents uncertainty on the relationships among these clades, as well as on the monophyly of the DPANN.

The distribution of archaeal characters in the DPANN might provide additional information to complement phylogenetic analysis (Figure 4). Evolutionary analysis of archaeal DNA replication components has highlighted the existence of a potential character shared between Nanohaloarchaeota, Nanoarchaeota and Parvarchaeota (but not Micrarchaeota): the presence of a peculiar DNA primase where the small and large subunits (PriS and PriL) are combined in a short version of the protein, distantly related to the classical archaeal enzyme (Brochier-Armanet et al., 2011; Raymann et al., 2014). By searching all currently available DPANN genomes, we found that a fused primase appears to be a common feature of the large majority of DPANN lineages apart from Micrarchaeota and Diapherotrites, which possess a classical PriS+PriL primase (Figure 4). This might indicate that at least Micrarchaeota and Diapherotrites could be evolutionarily distinct from other DPANN lineages. However, the presence of a fused primase may also be due to evolutionary convergence for reduced genome sizes, or horizontal gene transfer among DPANN clades.

Figure 4

Distribution of marker genes in Archaea. Homologues were searched by Blast and HMM searches against a local database of 646 representative archaeal genomes (one per species). Alignment, phylogenetic analysis and examination of genomic synteny were performed to confirm homology when necessary. To account for the presence/absence of characters in very recent genomes added to public databases, we also checked by Blast on the NCBI. Full circles represent presence in most or all members of the taxon, empty circles absence and partial circles presence in a few members only. It should be noted, however, that absence of genes in uncultured taxa may be due to genome incompleteness. For the ubiquitin system, we considered presence when at least two out of the three main components were found. For RNA polymerase beta and alpha genes, a single square means a fused gene and two squares a split gene. For primase, a single square means a unitary ‘fused’ primase, and two squares means the classical archaeal two-subunit primase (PriS+PriL). Asterisks in the Asgard ESCRT system indicate that they are more similar to eukaryotic than archaeal homologues.
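The scoring conventions stated in this caption can be written down as a short sketch. This is only an illustration and not the authors' actual procedure: the function names are hypothetical, and the 0.5 cutoff separating 'most or all members' from 'a few members' is an assumption, since the caption gives no numeric threshold.

```python
# Illustrative sketch of the Figure 4 symbol conventions.
# ASSUMPTION: the caption gives no numeric cutoff between "most or all"
# and "a few" members; 0.5 is used here purely for illustration.
MOST_MEMBERS_CUTOFF = 0.5


def circle_symbol(genomes_with_marker: int, genomes_total: int) -> str:
    """Return 'full', 'partial' or 'empty' for one marker in one taxon."""
    if genomes_total <= 0:
        raise ValueError("genomes_total must be positive")
    if not 0 <= genomes_with_marker <= genomes_total:
        raise ValueError("genomes_with_marker must lie in [0, genomes_total]")
    if genomes_with_marker == 0:
        return "empty"
    fraction = genomes_with_marker / genomes_total
    return "full" if fraction >= MOST_MEMBERS_CUTOFF else "partial"


def has_ubiquitin_system(main_components_found: int) -> bool:
    """Caption rule: present if at least two of the three main components are found."""
    if not 0 <= main_components_found <= 3:
        raise ValueError("main_components_found must be between 0 and 3")
    return main_components_found >= 2
```

Because genomes of uncultured taxa may be incomplete, an 'empty' result from such a rule should be read as 'not detected' rather than as confirmed absence.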

Clearly, the placement of fast-evolving nanosized lineages in the archaeal tree remains an important open issue and a future methodological challenge. Combined with the evidence of a similar phenomenon in Bacteria (Candidate Phyla Radiation; Brown et al., 2015; Hug et al., 2016), exciting avenues of research arise to understand the mechanisms (and potential convergences) that led to such extreme reduction of cell and genome sizes in nanosized lineages, to analyse the impact on fundamental cellular processes, and to obtain information on their largely unknown biology.

The root of the Archaea, the nature of the last archaeal common ancestor and the puzzling distribution of archaeal characters

Rooting the tree of a domain of life is a key issue, as it makes it possible to polarize characters, to establish deep evolutionary relationships among the major phyla, to infer the nature of the ancestor and to propose high-rank systematics categories. The root of the Archaea has traditionally been placed between Euryarchaeota and Crenarchaeota (Woese et al., 1990) and by extension the TACK (Figure 2, node #1). This root has also been supported by large-scale phylogenomic analyses, leading to a proposal for restructuring high-rank systematics of the Archaea with two major kingdoms, the Proteoarchaeota (TACK) and the Euryarchaeota (Petitjean et al., 2014). Nevertheless, by applying a sophisticated approach to uncover the ancient phylogenetic signal in proteins shared between Archaea and Bacteria, we have recently challenged this root. Support was found for a new root of the archaeal tree that falls within the Euryarchaeota, de facto breaking apart this phylum (Raymann et al., 2015; Figure 2, node #2). We suggested that the traditional root of the Archaea might be the result of an artefact linked to the presence of noisy phylogenetic signal in sequence data, an important factor affecting deep phylogenies (Raymann et al., 2015). According to the new root, the first divergence in the Archaea would have separated two large clusters (Figure 2): Cluster I including Proteoarchaeota/TACK, Methanobacteriales, Methanococcales and Thermococcales, and Cluster II including all remaining ‘Euryarchaeota’ (Raymann et al., 2015). The inclusion of the new archaeal lineages described here greatly extends these two clusters (Figure 2).

The possibility of a new root lying within the Euryarchaeota opens up a new perspective on the origin and early evolution of the Archaea. For example, past inferences on the nature of the last archaeal common ancestor will have to be reconsidered. What used to be regarded as ‘euryarchaeal’ characters might be ancestral while the TACK ones would become derived. Furthermore, as both Cluster I and Cluster II include methanogenic lineages (Figure 2), the possibility arises that the last archaeal common ancestor itself was capable of methanogenesis, and that this metabolism was lost multiple times independently during archaeal evolution (Raymann et al., 2015). The large number of additional potential methanogenic lineages highlighted recently, including members of the TACK (Evans et al., 2015; Vanwonterghem et al., 2016), further supports this scenario (Borrel et al., 2016).

The new root is not at odds with current genomic data. The split between Euryarchaeota and Crenarchaeota, long supported by the distribution of specific characters, has now become less sharp following their identification in other phyla (Brochier-Armanet et al., 2011; Eme and Doolittle, 2015; Figure 4). For instance, homologues of the cell division protein FtsZ, of eukaryotic-like histones H3/H4, and of DNA polymerase PolD, all characters previously considered as hallmarks of Euryarchaeota, are now evident in several lineages of the TACK superphylum (Figure 4), and can be inferred in the last archaeal common ancestor, regardless of where the root lies (Figure 2). Moreover, the distribution of these markers in the new divisions provides further interesting information on the diversity and evolution of fundamental cellular processes in the archaea: PolD appears to have been specifically lost in the ancestor of Crenarchaeota/Geoarchaeota, and FtsZ in the common ancestor of Verstraetearchaeota and Crenarchaeota/Geoarchaeota, possibly compensated for by the presence of crenactin or archaeal-like ESCRT systems for cell division (Figure 4).

In general, a search for the distribution of archaeal characters in all novel lineages reveals a much more complex pattern than previously thought. For example, complete eukaryotic-like ubiquitin systems, initially identified in one Aigarchaeota, appear to be present in all available Aigarchaeota, as well as most Asgard and Theionarchaea, and a few Methanomassiliicoccales and Izemarchaea (Figure 4). As already mentioned, homologues of TopoIB, long regarded as a distinctive marker of Thaumarchaeota, are more widely distributed and scattered among various lineages. Homologues of bacterial type DNA gyrases (GyrA and B), in the past suggested as a possible marker of Cluster II archaea (Raymann et al., 2014), also appear to be more widely present in archaea. Finally, the pattern of split/fusion events in the genes coding for RNA polymerase A and B subunits appears much more complex than previously thought (Figure 4), and might be prone to evolutionary convergence (Brochier et al., 2004). It is expected that additional genomic data from a larger taxonomic sampling will complicate the picture even further. A more thorough analysis is needed to understand whether such a puzzling distribution of characters that have key cellular roles reflects an ancient origin and multiple independent losses during archaeal diversification, or rather ancient horizontal gene transfers, or both, and whether these events had an impact on the adaptation to different environments and lifestyles.

An important challenge ahead is to test the new root by including the new genomic data that were made available since our analysis. In particular, it will be essential to include the Altiarchaea, the Asgard and the Stygia, which occupy a pivotal position by lying in between the traditional and the new root, and whose inclusion may lead to an alternative root (Figure 2, unnumbered red dots). A robust analysis of the placement of the DPANN is also essential to the root issue. A number of recently published rooted archaeal phylogenies have shown the DPANN as the first emerging branch but with nonsignificant support (Rinke et al., 2013; Williams and Embley, 2014; Castelle et al., 2015; Hug et al., 2016). Still, as discussed above, the very fast evolutionary rates of the DPANN, which might lead to an artefactual monophyly, as well as their attraction to the base of the archaeal tree by the long branch of the bacterial outgroup, call for caution. For these reasons, previous analyses of the root of Archaea did not include DPANN lineages (Petitjean et al., 2014; Raymann et al., 2015). Promising alternative approaches are the use of sophisticated evolutionary models and new methodological improvements, such as those that enable phylogenies to be rooted without the use of a bacterial outgroup, thus limiting the risk of tree reconstruction artefacts (Yang and Roberts, 1995; Szollosi et al., 2012, 2013; Groussin et al., 2013; Williams et al., 2015).

Perspectives

These are exciting times for archaeal research. Direct metagenomics approaches are likely to highlight an even higher diversity of archaeal lineages in the environment than currently known based on 16S rRNA surveys, as many commonly used primers may miss entire lineages (Eloe-Fadrosh et al., 2016). These data will give a better picture of the vast reservoir of metabolic capacities in the archaea, the way these emerged during their diversification, and their major ecological roles at the global scale. Yet, to confirm and extend ecological predictions based on sequences, it will be essential to invest greater effort in the isolation of representatives of uncultured lineages. This will also make it possible to refine genome completeness and annotations, and to test a number of important evolutionary predictions currently based on genomes derived from metagenomes, notably the involvement of archaea in eukaryotic origins and the role of eukaryotic-like characters in an archaeal cellular context. Finally, it will open up the possibility of developing new experimental models in addition to the currently available ones, covering a larger representation of archaeal diversity. Combined with detailed evolutionary analyses, these data will continue to provide exciting insights into the fascinating biology of archaea.

Note added in proof

While this manuscript was in press, an analysis was published to determine the root of Archaea and the position of the DPANN (Williams et al., 2017). The authors employed a strategy which does not require the use of an outgroup, but is instead based on a probabilistic gene tree-species tree reconciliation approach taking into account gene family gain, duplication, transfer, and loss. By using a sampling of 62 archaeal genomes, they found that the root of the Archaea lies between a monophyletic DPANN clade and a group comprised of monophyletic Euryarchaeota and the TACK/Lokiarchaeum. However, the position of the different DPANN lineages was unstable when analyzed individually, suggesting that their grouping when analyzed together might be the result of an artefact. As discussed in this review, the monophyly of the DPANN needs to be tested further, and the root of Archaea to be analyzed by using a larger taxonomic sampling including key lineages (Altiarchaea, Stygia, and the new representatives of Acherontia and Asgard) that were not considered in the Williams et al. study.