Asgard archaea illuminate the origin of eukaryotic cellular complexity

Journal name: Nature
Volume: 541
Pages: 353–358
DOI: 10.1038/nature21031

Abstract

The origin and cellular complexity of eukaryotes represent a major enigma in biology. Current data support scenarios in which an archaeal host cell and an alphaproteobacterial (mitochondrial) endosymbiont merged together, resulting in the first eukaryotic cell. The host cell is related to Lokiarchaeota, an archaeal phylum with many eukaryotic features. The emergence of the structural complexity that characterizes eukaryotic cells remains unclear. Here we describe the ‘Asgard’ superphylum, a group of uncultivated archaea that, as well as Lokiarchaeota, includes Thor-, Odin- and Heimdallarchaeota. Asgard archaea affiliate with eukaryotes in phylogenomic analyses, and their genomes are enriched for proteins formerly considered specific to eukaryotes. Notably, thorarchaeal genomes encode several homologues of eukaryotic membrane-trafficking machinery components, including Sec23/24 and TRAPP domains. Furthermore, we identify thorarchaeal proteins with similar features to eukaryotic coat proteins involved in vesicle biogenesis. Our results expand the known repertoire of ‘eukaryote-specific’ proteins in Archaea, indicating that the archaeal host cell already contained many key components that govern eukaryotic cellular complexity.

Figures

  1. Figure 1: Identification and phylogenomics of Asgard archaea.

    a, Maximum-likelihood tree, inferred with RAxML and PROTCATLG model, based on metagenomic contigs containing conserved ribosomal proteins (see Methods) revealing the Asgard superphylum. Slow, non-parametric maximum-likelihood bootstrap support values above 50 and 90 are indicated with empty and filled circles, respectively. Abbreviations of the sites mentioned are as follows: LC, Loki’s Castle; CR, Colorado River aquifer (USA); LCB, Lower Culex Basin (Yellowstone National Park, USA); WOR, White Oak River (USA); AB, Aarhus Bay (Denmark); RP, Radiata Pool (New Zealand); m.b.s.f., metres below sea floor. b, c, Bayesian inference of 55 concatenated archaeo-eukaryotic ribosomal proteins inferred with PhyloBayes and CAT-GTR model (b) and maximum-likelihood analysis of concatenated small and large subunit rRNA gene sequences inferred with RAxML and GTRGAMMA model (c) showing high support for the phylogenetic affiliation between Asgard archaea and eukaryotes (support values in red). a–c, Scale bars indicate number of substitutions per site. Numbers at branches refer to Bayesian posterior probabilities (b) and slow non-parametric maximum-likelihood bootstrap values (c). Trees were rooted with Euryarchaeota + DPANN (a, b) and with Bacteria (c). Branch length value corresponding to cut branch in c is 0.6769. d, Schematic tree of Asgard lineages and corresponding overview of identified ESPs. Black circles, ESP predicted based on presence of arCOG (TopoIB, rpb8, RNA polymerase) or IPR domain (all others); grey circles, putative ESP homologues present; empty circles, no ESP homologue identified (Extended Data Table 2). *Most Asgard genomes encode distantly related FtsZ homologues (Supplementary Discussion 3).

  2. Figure 2: Vesicular trafficking components in Asgard archaea.

    a, Conserved thorarchaeal gene clusters comprising archaeal (ar) TRAPP- and V4R-domain-encoding genes and corresponding predicted protein models of Thorarchaeote AB_25 homologues. b, Bayesian inference of thorarchaeal Bet3 homologues and subunits of the eukaryotic TRAPP complex. The tree was rooted with crenarchaeal and Asgard V4R-domain proteins. c, Domain topology of archaeal and eukaryotic Sec23/24 proteins, and prokaryotic von Willebrand factor proteins. d, Bayesian inference of thorarchaeal and metagenomic Sec23/24 homologues that branch basal to eukaryotic Sec23/Sec24 sequences. The tree was rooted with bacterial von Willebrand factor proteins. e, Thorarchaeal gene clusters encoding a protein with a predicted β-propeller fold (WD40-repeat protein), an adjacent protein with a predicted α-solenoid fold (ARM-repeat protein), and one or more small GTPases. f, Thorarchaeal gene clusters that encode a TPR-domain protein located next to a WD40-repeat protein. g, Schematic depiction of a putative archaeal proto-coatomer complex. a, c, Scale bar indicates the number of substitutions per site. Numbers at branches refer to Bayesian (a) and slow non-parametric maximum-likelihood bootstrap values (c). b, e, f, Protein models for Thorarchaeota AB_25 are shown above the respective genes, with proteins for structures shown in e and f being artificially fused before modelling (Supplementary Tables 6 and 7).

  3. Extended Data Fig. 1: Sample origin, metagenomics workflow and global distribution of Asgard archaea.

    a, World map showing the sampling locations of the current study. Abbreviations of the sites mentioned are as follows: LC, Loki’s Castle; CR, Colorado River aquifer (USA); LCB, Lower Culex Basin (Yellowstone National Park, USA); WOR, White Oak River (USA); AB, Aarhus Bay (Denmark); RP, Radiata Pool (New Zealand); and TIV, Taketomi Island Vent (Japan). The world map was drawn using the Matplotlib Basemap Toolkit (http://matplotlib.org/basemap/). b, Simplified schematic overview of the metagenomics approach that was used to obtain Asgard genomes. Software used during the assembly and binning processes is shown in grey. c, Normalized distribution of major Asgard archaeal groups across various environments based on 16S rRNA gene survey datasets. Numbers on the right side of the bar graph represent the total number of identified sequences.

  4. Extended Data Fig. 2: Bayesian phylogenetic inference of 48 concatenated marker genes.

    The tree was inferred using CAT + GTR model and rooted with Bacteria, showing high support for the phylogenetic affiliation between Asgard archaea and eukaryotes (support value in red). Numbers at branches represent posterior probabilities and scale bar indicates the number of substitutions per site.

  5. Extended Data Fig. 3: Asgard genomes encode an expanded GTPase repertoire.

    Graph showing small Ras and Arf-type GTPases (containing any of the following domains: IPR006762, IPR024156, IPR006689, IPR006687, IPR001806, IPR003579, IPR020849, IPR003578, IPR021181, IPR031260, IPR002041, IPR019009) per Asgard genomic bin normalized to the total number of proteins predicted per genome and compared with selected eukaryotic, archaeal and bacterial taxa. Numbers refer to the total number of GTPases per genome.

  6. Extended Data Fig. 4: Phylogenetic analysis of oligosaccharyl-transferase-complex-related proteins.

    a, Bayesian inference of STT3-domain proteins (598 aligned amino acid positions) present in all three domains of life. This phylogenetic tree was rooted with bacterial sequences. Numbers at branches refer to Bayesian and non-parametric RAxML bootstrap values, respectively. b, Unrooted maximum likelihood phylogenetic analysis of ribophorin domain proteins (357 aligned amino acid positions) including all prokaryotic homologues identified so far. Numbers at branches show slow, non-parametric maximum-likelihood bootstrap support values. Scale bars indicate the number of substitutions per site.

  7. Extended Data Fig. 5: Genomic conservation links ESCRT and ubiquitin modifier systems.

    Schematic overview of ubiquitin and ESCRT gene clusters identified in Asgard genomes. Contiguous contigs from Heimdallarchaeote AB_125 are represented with a double line at the end of the contig. E1-like and putative deubiquitinating proteins not belonging to any ubiquitin cluster are not shown.

  8. Extended Data Fig. 6: Phylogenetic analyses of selected ESPs.

    a, Tubulin protein family maximum-likelihood tree, highlighting Odinarchaeota homologues branching basal to major eukaryotic tubulin families (red clades). Green clade reflects bacterial tubulin genes probably acquired horizontally from eukaryotes. The tree was rooted with thaumarchaeal artubulins. b, Unrooted maximum-likelihood phylogenetic tree of the replicative polymerase B family depicting a Heimdallarchaeote LC_3 sequence and its corresponding protein model (red), branching basal to the eukaryotic Pol-ε (protein model in grey: PDB ID 4M8O of S. cerevisiae). Bootstrap support values of ≥99, ≥90 and ≥50 for major clades are indicated by black, grey and white circles, respectively. Eukaryotic, bacterial and archaeal clades are shaded red, green and purple, respectively. c, PFAM domain topology analysis of family B polymerases, indicating that the heimdallarchaeal homologue lacks the C-terminal DUF1744 domain characteristic of eukaryotic Pol-ε. d, Unrooted maximum-likelihood tree of RPL28e homologues, including eukaryotic RPL28e and MAK16, a RPL28e-like sequence identified in the Heimdallarchaeote LC_3 genome and a metagenomic homologue. Eukaryotic MAK16 proteins (implicated in rRNA maturation) contain an additional C-terminal domain absent in the heimdallarchaeal protein. a, b, d, Scale bars indicate the number of substitutions per site and numbers at branches show slow, non-parametric maximum-likelihood bootstrap support values.

  9. Extended Data Fig. 7: Asgard ESPs are enriched for intracellular trafficking and secretion functions.

    Overview of functional classification (arCOGs and EggNOG categories) of Asgard proteins assigned to major taxonomic levels. Taxonomic levels are shown in different colours. Note that, in some cases, one protein can be assigned to more than one functional category.

  10. Extended Data Fig. 8: Eukaryotic signatures in Asgard archaea.

    Schematic representation of a eukaryotic cell in which ESPs that have been identified in Asgard archaea are highlighted, including their phylogenetic distribution pattern. The overall picture indicates that the archaeal ancestor of eukaryotes already contained many key components underlying the emergence of cellular complexity that is characteristic of eukaryotes. DUB, deubiquitinating enzyme; MVB, multi-vesicular body; ER, endoplasmic reticulum.
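
The normalization described in the legend of Extended Data Fig. 3 (small GTPases per genomic bin, divided by the total number of predicted proteins) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input structure (a mapping from protein identifier to its set of InterPro accessions) and the function names are assumptions made for illustration only. The domain list is copied from the legend.

```python
from typing import AbstractSet, Mapping

# InterPro accessions listed in the legend of Extended Data Fig. 3.
GTPASE_DOMAINS = frozenset({
    "IPR006762", "IPR024156", "IPR006689", "IPR006687",
    "IPR001806", "IPR003579", "IPR020849", "IPR003578",
    "IPR021181", "IPR031260", "IPR002041", "IPR019009",
})


def count_gtpases(annotations: Mapping[str, AbstractSet[str]]) -> int:
    """Count proteins carrying at least one of the listed GTPase domains.

    `annotations` is a hypothetical structure: protein ID -> set of
    InterPro accessions assigned to that protein.
    """
    return sum(1 for domains in annotations.values()
               if domains & GTPASE_DOMAINS)


def normalized_gtpase_count(annotations: Mapping[str, AbstractSet[str]]) -> float:
    """GTPase count divided by the total number of predicted proteins.

    Every predicted protein must appear as a key, including proteins
    without any domain hit (empty set), or the denominator is too small.
    """
    if not annotations:
        raise ValueError("genome bin has no predicted proteins")
    return count_gtpases(annotations) / len(annotations)


if __name__ == "__main__":
    # Toy genome bin with made-up protein IDs, for illustration only.
    toy_bin = {
        "prot_1": {"IPR001806"},               # listed GTPase domain
        "prot_2": {"IPR000001"},               # unrelated domain
        "prot_3": set(),                       # no domain hit
        "prot_4": {"IPR006689", "IPR000001"},  # listed domain plus another
    }
    print(count_gtpases(toy_bin), normalized_gtpase_count(toy_bin))
```

Dividing by the number of predicted proteins makes bins of different size and completeness comparable, which is presumably why the legend reports both the normalized value and the raw count per genome.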

Tables

  1. Extended Data Table 1: Assembly statistics and quality metrics of reconstructed Asgard genome bins
  2. Extended Data Table 2: Overview of presence/absence pattern of Asgard ESPs

Author information

  1. Present address: Department of Plant Systems Biology, VIB and Department of Plant Biotechnology and Bioinformatics, Ghent University, Technologiepark 927, B-9052 Ghent, Belgium.

    • Emmelien Vancaester
  2. These authors contributed equally to this work.

    • Katarzyna Zaremba-Niedzwiedzka,
    • Eva F. Caceres &
    • Jimmy H. Saw

Affiliations

  1. Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123 Uppsala, Sweden

    • Katarzyna Zaremba-Niedzwiedzka,
    • Eva F. Caceres,
    • Jimmy H. Saw,
    • Disa Bäckström,
    • Lina Juzokaite,
    • Emmelien Vancaester,
    • Anja Spang &
    • Thijs J. G. Ettema
  2. Department of Marine Science, University of Texas-Austin, Marine Science Institute, Port Aransas, Texas 78373, USA

    • Kiley W. Seitz &
    • Brett J. Baker
  3. Department of Earth and Planetary Sciences, and Department of Environmental Science, Policy, and Management, University of California, Berkeley, California, USA

    • Karthik Anantharaman &
    • Jillian F. Banfield
  4. Section for Microbiology and Center for Geomicrobiology, Department of Bioscience, Aarhus University, DK-8000 Aarhus, Denmark

    • Piotr Starnawski,
    • Kasper U. Kjeldsen &
    • Andreas Schramm
  5. GNS Science, Extremophile Research Group, Private Bag 2000, Taupō 3352, New Zealand

    • Matthew B. Stott
  6. Research and Development Center for Marine Biosciences, Japan Agency for Marine-Earth Science and Technology, Yokosuka 237-0061, Japan

    • Takuro Nunoura

Contributions

T.J.G.E. conceived the study. A.Sc., P.S., K.U.K., M.B.S. and T.N. took/provided environmental samples. L.J. purified environmental DNA and prepared sequencing libraries. K.Z.-N., E.F.C., J.H.S., K.A., J.F.B., K.W.S., B.J.B. and E.V. performed metagenomic sequence assemblies and metagenomic binning analyses. K.Z.-N., E.F.C., J.H.S., A.Sp. and T.J.G.E. analysed genomic data and performed phylogenetic analyses. A.Sp., D.B., E.F.C. and T.J.G.E. analysed genomic signatures. K.Z.-N., E.F.C., J.H.S., A.Sp. and T.J.G.E. wrote, and all authors edited and approved, the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

Reviewer Information: Nature thanks J. Gilbert, E. Koonin, A. Roger and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1. Supplementary Information (5 MB)

    This file contains Supplementary Methods, Supplementary Discussions 1-4, Supplementary References, Supplementary Tables 1-14 and Supplementary Figures 1-5, which provide more detail on the annotations, the methods applied and the phylogenetic analyses.
