The origin and cellular complexity of eukaryotes represent a major enigma in biology. Current data support scenarios in which an archaeal host cell and an alphaproteobacterial (mitochondrial) endosymbiont merged together, resulting in the first eukaryotic cell. The host cell is related to Lokiarchaeota, an archaeal phylum with many eukaryotic features. The emergence of the structural complexity that characterizes eukaryotic cells remains unclear. Here we describe the ‘Asgard’ superphylum, a group of uncultivated archaea that, as well as Lokiarchaeota, includes Thor-, Odin- and Heimdallarchaeota. Asgard archaea affiliate with eukaryotes in phylogenomic analyses, and their genomes are enriched for proteins formerly considered specific to eukaryotes. Notably, thorarchaeal genomes encode several homologues of eukaryotic membrane-trafficking machinery components, including Sec23/24 and TRAPP domains. Furthermore, we identify thorarchaeal proteins with similar features to eukaryotic coat proteins involved in vesicle biogenesis. Our results expand the known repertoire of ‘eukaryote-specific’ proteins in Archaea, indicating that the archaeal host cell already contained many key components that govern eukaryotic cellular complexity.
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Sample origin, metagenomics workflow and global distribution of Asgard archaea.
a, World map showing the sampling locations of the current study. Abbreviations of the sites mentioned are as follows: LC, Loki’s Castle; CR, Colorado River aquifer (USA); LCB, Lower Culex Basin (Yellowstone National Park, USA); WOR, White Oak River (USA); AB, Aarhus Bay (Denmark); RP, Radiata Pool (New Zealand); and TIV, Taketomi Island Vent (Japan). The world map was drawn using the Matplotlib Basemap Toolkit (http://matplotlib.org/basemap/). b, Simplified schematic overview of the metagenomics approach that was used to obtain Asgard genomes. Software used during the assembly and binning processes is shown in grey. c, Normalized distribution of major Asgard archaeal groups across various environments based on 16S rRNA gene survey datasets. Numbers on the right side of the bar graph represent the total number of identified sequences.
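The normalization described for panel c can be illustrated with a minimal sketch. It assumes (this is not stated in the caption) that each bar corresponds to one Asgard group and that raw 16S rRNA sequence counts per environment are divided by that group's total. The function name, environment labels and counts are hypothetical toy values, not data from the study.

```python
def normalize_counts(counts):
    """Convert raw per-environment sequence counts into fractions of the total.

    counts: hypothetical mapping of environment label -> number of 16S rRNA sequences.
    Returns (fractions, total), where total corresponds to the number shown
    beside a bar in the graph.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for a group with no detected sequences.
        return {env: 0.0 for env in counts}, 0
    return {env: n / total for env, n in counts.items()}, total


# Toy counts for one hypothetical group (illustrative only).
toy_counts = {"marine sediment": 6, "hot spring": 3, "aquifer": 1}
fractions, total = normalize_counts(toy_counts)
print(fractions, total)
```

Normalizing within each group makes bars comparable even when the groups differ greatly in the number of sequences recovered, which is why the raw total is reported alongside.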
- Extended Data Figure 2: Bayesian phylogenetic inference of 48 concatenated marker genes.
The tree was inferred using the CAT + GTR model and rooted with Bacteria, showing high support for the phylogenetic affiliation between Asgard archaea and eukaryotes (support value in red). Numbers at branches represent posterior probabilities and the scale bar indicates the number of substitutions per site.
- Extended Data Figure 3: Asgard genomes encode an expanded GTPase repertoire.
Graph showing small Ras- and Arf-type GTPases (containing any of the following domains: IPR006762, IPR024156, IPR006689, IPR006687, IPR001806, IPR003579, IPR020849, IPR003578, IPR021181, IPR031260, IPR002041, IPR019009) per Asgard genomic bin, normalized to the total number of proteins predicted per genome and compared with selected eukaryotic, archaeal and bacterial taxa. Numbers refer to the total number of GTPases per genome.
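As a rough illustration of the counting and normalization described in this caption, the sketch below counts proteins that carry at least one of the listed InterPro domains and divides by the number of predicted proteins in the bin. The input format (a mapping from protein ID to a set of InterPro accessions), the function name and the toy data are assumptions made for the example; this is not the authors' pipeline.

```python
# InterPro accessions listed in the caption for small Ras- and Arf-type GTPases.
GTPASE_DOMAINS = frozenset({
    "IPR006762", "IPR024156", "IPR006689", "IPR006687",
    "IPR001806", "IPR003579", "IPR020849", "IPR003578",
    "IPR021181", "IPR031260", "IPR002041", "IPR019009",
})


def gtpase_fraction(protein_domains):
    """Return (count, fraction) of proteins with at least one GTPase domain.

    protein_domains: hypothetical mapping of protein ID -> iterable of
    InterPro accessions annotated on that protein.
    """
    total = len(protein_domains)
    # A protein is counted once, however many of the listed domains it carries.
    count = sum(
        1 for domains in protein_domains.values()
        if GTPASE_DOMAINS & set(domains)
    )
    fraction = count / total if total else 0.0
    return count, fraction


# Toy genomic bin (illustrative accessions, not real annotations from the study).
toy_bin = {
    "prot_1": {"IPR001806"},
    "prot_2": {"IPR000001"},
    "prot_3": {"IPR006689", "IPR024156"},
    "prot_4": set(),
}
count, fraction = gtpase_fraction(toy_bin)
print(count, fraction)
```

Reporting both values mirrors the figure: the fraction allows comparison between genomes of different sizes, while the raw count corresponds to the numbers printed per genome.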
- Extended Data Figure 4: Phylogenetic analysis of oligosaccharyl-transferase-complex-related proteins.
a, Bayesian inference of STT3-domain proteins (598 aligned amino acid positions) present in all three domains of life. This phylogenetic tree was rooted with bacterial sequences. Numbers at branches refer to Bayesian and non-parametric RAxML bootstrap values, respectively. b, Unrooted maximum likelihood phylogenetic analysis of ribophorin domain proteins (357 aligned amino acid positions) including all prokaryotic homologues identified so far. Numbers at branches show slow, non-parametric maximum-likelihood bootstrap support values. Scale bars indicate the number of substitutions per site.
- Extended Data Figure 5: Genomic conservation links ESCRT and ubiquitin modifier systems.
Schematic overview of ubiquitin and ESCRT gene clusters identified in Asgard genomes. Contiguous contigs from Heimdallarchaeote AB_125 are represented with a double line at the end of the contig. E1-like and putative deubiquitinating proteins not belonging to any ubiquitin cluster are not shown.
- Extended Data Figure 6: Phylogenetic analyses of selected ESPs.
a, Tubulin protein family maximum-likelihood tree, highlighting Odinarchaeota homologues branching basal to major eukaryotic tubulin families (red clades). The green clade reflects bacterial tubulin genes probably acquired horizontally from eukaryotes. The tree was rooted with thaumarchaeal artubulins. b, Unrooted maximum-likelihood phylogenetic tree of the replicative polymerase B family depicting a Heimdallarchaeote LC_3 sequence and its corresponding protein model (red), branching basal to the eukaryotic Pol-ε (protein model in grey: PDB ID 4M8O of S. cerevisiae). Bootstrap support values of ≥99, ≥90 and ≥50 for major clades are indicated by black, grey and white circles, respectively. Eukaryotic, bacterial and archaeal clades are shaded red, green and purple, respectively. c, PFAM domain topology analysis of family B polymerases, indicating that the heimdallarchaeal homologue lacks the C-terminal DUF1744 domain characteristic of eukaryotic Pol-ε. d, Unrooted maximum-likelihood tree of RPL28e homologues, including eukaryotic RPL28e and MAK16, an RPL28e-like sequence identified in the Heimdallarchaeote LC_3 genome and a metagenomic homologue. Eukaryotic MAK16 proteins (implicated in rRNA maturation) contain an additional C-terminal domain absent in the heimdallarchaeal protein. a, b, d, Scale bars indicate the number of substitutions per site and numbers at branches show slow, non-parametric maximum-likelihood bootstrap support values.
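The bootstrap colour scheme used in panel b can be written down as a small lookup. This is a sketch assuming the thresholds apply as nested bins (a value of 99 or more is black, 90 to below 99 is grey, 50 to below 90 is white) and that clades below 50 carry no circle; the function name is hypothetical.

```python
def support_marker(bootstrap):
    """Map a bootstrap support value (0-100) to the circle colour from panel b.

    Returns None for values below 50, which are assumed to be left unmarked.
    """
    if not 0 <= bootstrap <= 100:
        raise ValueError("bootstrap support must lie between 0 and 100")
    if bootstrap >= 99:
        return "black"
    if bootstrap >= 90:
        return "grey"
    if bootstrap >= 50:
        return "white"
    return None


for value in (100, 95, 62, 31):
    print(value, support_marker(value))
```

Checking the highest threshold first keeps the bins mutually exclusive, so each clade receives exactly one marker.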
- Extended Data Figure 7: Asgard ESPs are enriched for intracellular trafficking and secretion functions.
Overview of functional classification (arCOGs and EggNOG categories) of Asgard proteins assigned to major taxonomic levels. Taxonomic levels are shown in different colours. Note that, in some cases, one protein can be assigned to more than one functional category.
- Extended Data Figure 8: Eukaryotic signatures in Asgard archaea.
Schematic representation of a eukaryotic cell in which ESPs that have been identified in Asgard archaea are highlighted, including their phylogenetic distribution pattern. The overall picture indicates that the archaeal ancestor of eukaryotes already contained many key components underlying the emergence of cellular complexity that is characteristic of eukaryotes. DUB, deubiquitinating enzyme; MVB, multi-vesicular body; ER, endoplasmic reticulum.
Extended Data Tables
- Supplementary Information
This file contains Supplementary Methods, Supplementary Discussions 1-4, Supplementary References, Supplementary Tables 1-14 and Supplementary Figures 1-5, which provide more detail on the annotations, applied methods and phylogenetic analyses.