  • Letter

Origins of major archaeal clades correspond to gene acquisitions from bacteria


The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms: ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions (refs 1–4). To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologues from 1,847 reference bacterial genomes. Archaeal-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than fivefold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.

Figure 1: Distribution of genes in archaeal-specific families.
Figure 2: Bacterial gene acquisitions in archaeal genomes.
Figure 3: Archaeal gene acquisition network.

  1. Doolittle, W. F. & Papke, R. T. Genomics and the bacterial species problem. Genome Biol. 7, 116 (2006)

  2. Retchless, A. C. & Lawrence, J. G. Temporal fragmentation of speciation in bacteria. Science 317, 1093–1096 (2007)

  3. Achtman, M. & Wagner, M. Microbial diversity and the genetic nature of microbial species. Nature Rev. Microbiol. 6, 431–440 (2008)

  4. Fraser, C., Alm, E. J., Polz, M. F., Spratt, B. G. & Hanage, W. P. The bacterial species challenge: making sense of genetic and ecological diversity. Science 323, 741–746 (2009)

  5. Puigbò, P., Wolf, Y. I. & Koonin, E. V. The tree and net components of prokaryote genome evolution. Genome Biol. Evol. 2, 745–756 (2010)

  6. Dagan, T. Phylogenomic networks. Trends Microbiol. 19, 483–491 (2011)

  7. Hess, W. R. Genome analysis of marine photosynthetic microbes and their global role. Curr. Opin. Biotechnol. 15, 191–198 (2004)

  8. Kloesges, T. et al. Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths. Mol. Biol. Evol. 28, 1057–1074 (2011)

  9. Williams, D., Gogarten, J. P. & Papke, R. T. Quantifying homologous replacement of loci between haloarchaeal species. Genome Biol. Evol. 4, 1223–1244 (2012)

  10. Woese, C. R. Bacterial evolution. Microbiol. Rev. 51, 221–271 (1987)

  11. Rivera, M. C., Jain, R., Moore, J. E. & Lake, J. A. Genomic evidence for two functionally distinct gene classes. Proc. Natl Acad. Sci. USA 95, 6239–6244 (1998)

  12. Puigbò, P., Wolf, Y. I. & Koonin, E. V. Search for a tree of life in the thicket of the phylogenetic forest. J. Biol. 8, 59 (2009)

  13. Brochier-Armanet, C., Forterre, P. & Gribaldo, S. Phylogeny and evolution of the Archaea: one hundred genomes later. Curr. Opin. Microbiol. 14, 274–281 (2011)

  14. Lake, J. A. & Rivera, M. C. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol. Biol. Evol. 21, 681–690 (2004)

  15. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)

  16. Wolf, Y. I., Makarova, K. S., Yutin, N. & Koonin, E. V. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol. Direct 7, 46 (2012)

  17. Nelson-Sathi, S. et al. Acquisition of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl Acad. Sci. USA 109, 20537–20542 (2012)

  18. Bräsen, C., Esser, D., Rauch, B. & Siebers, B. Carbohydrate metabolism in Archaea: current insights into unusual enzymes and pathways and their regulation. Microbiol. Mol. Biol. Rev. 78, 89–175 (2014)

  19. Siebers, B. & Schönheit, P. Unusual pathways and enzymes of central carbohydrate metabolism in Archaea. Curr. Opin. Microbiol. 8, 695–705 (2005)

  20. Doolittle, W. F. & Bapteste, E. Pattern pluralism and the tree of life hypothesis. Proc. Natl Acad. Sci. USA 104, 2043–2049 (2007)

  21. Creevey, C. J. et al. Does a tree-like phylogeny only exist at the tips in the tree of prokaryotes? Proc. R. Soc. Lond. B 271, 2551–2558 (2004)

  22. Deppenmeier, U. et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 4, 453–461 (2002)

  23. Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236 (2013)

  24. McInerney, J. O., O’Connell, M. J. & Pisani, D. The hybrid nature of eukaryota and a consilient view of life on Earth. Nature Rev. Microbiol. 12, 449–455 (2014)

  25. Wolf, Y. I. & Koonin, E. V. Genome reduction as the dominant mode of evolution. Bioessays 35, 829–837 (2013)

  26. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  27. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997)

  28. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276–277 (2000)

  29. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003)

  30. Stamatakis, A., Ludwig, T. & Meier, H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463 (2005)


We gratefully acknowledge funding from the European Research Council (ERC 232975 to W.F.M.), the graduate school E-Norm of the Heinrich-Heine University (W.F.M.), the DFG (Scho 316/11-1 to P.S.; SI 642/10-1 to B.S.), and BMBF (0316188A, B.S.). G.L. is supported by an ERC grant (281357 to Tal Dagan); D.B. thanks the Alexander von Humboldt Foundation for a Fellowship. Computational support of the Zentrum für Informations- und Medientechnologie (ZIM) at the Heinrich-Heine University is gratefully acknowledged.

Author information

S.N.-S., F.L.S., M.R., N.L.-C. and T.T. performed bioinformatic analyses; A.J., D.B. and G.L. performed statistical analyses; P.S., B.S., J.O.M. and W.F.M. interpreted results; S.N.-S., F.L.S., G.L., J.O.M. and W.F.M. wrote the paper; S.N.-S., G.L. and W.F.M. designed the study. All authors discussed the results and commented on the manuscript.

Corresponding author

Correspondence to William F. Martin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Inter-domain gene sharing network.

Each cell in the matrix indicates the number of genes (E-value ≤ 10^−10 and ≥ 25% global identity) shared between 134 archaeal and 1,847 bacterial genomes in each pairwise inter-domain comparison (scale bar at lower right). Archaeal genomes are listed as in Fig. 1. Bacterial genomes are presented in 23 groups corresponding to phylum or class in the GenBank nomenclature: a = Clostridia; b = Erysipelotrichi, Negativicutes; c = Bacilli; d = Firmicutes; e = Chlamydia; f = Verrucomicrobia, Planctomycetes; g = Spirochaetes; h = Gemmatimonadetes, Synergistetes, Elusimicrobia, Dictyoglomi, Nitrospirae; i = Actinobacteria; j = Fibrobacter, Chlorobi; k = Bacteroidetes; l = Fusobacteria; Thermotogae, Aquificae, Chloroflexi; m = Deinococcus-Thermus; n = Cyanobacteria; o = Acidobacteria; δ, ε, α, β, γ = Delta, Epsilon, Alpha, Beta and Gamma proteobacteria; P = Thermodesulfobacteria, Caldiserica, Chrysiogenetes, Ignavibacteria. Bacterial genome size in number of proteins is indicated at the top.
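
The thresholds in this caption can be illustrated with a minimal sketch. It assumes that pairwise hits are already available as records; the `Hit` layout and the function name `shared_gene_matrix` are hypothetical, and counting each archaeal gene at most once per genome pair is an assumption, not a detail stated in the caption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One archaeal-bacterial protein pair (hypothetical record layout)."""
    archaeal_genome: str
    bacterial_genome: str
    archaeal_gene: str
    evalue: float
    global_identity: float  # percent, 0-100


def shared_gene_matrix(hits: Iterable[Hit],
                       max_evalue: float = 1e-10,
                       min_identity: float = 25.0) -> Counter:
    """Count distinct archaeal genes shared per (archaeal, bacterial) genome pair."""
    seen = set()
    for h in hits:
        # Keep only hits that satisfy both thresholds from the caption.
        if h.evalue <= max_evalue and h.global_identity >= min_identity:
            seen.add((h.archaeal_genome, h.bacterial_genome, h.archaeal_gene))
    # Each surviving (genome pair, gene) triple contributes one count to its cell.
    return Counter((a, b) for a, b, _ in seen)


# Toy input, invented for illustration only.
hits = [
    Hit("A1", "B1", "g1", 1e-30, 40.0),
    Hit("A1", "B1", "g1", 1e-12, 30.0),  # same gene, second homologue: counted once
    Hit("A1", "B1", "g2", 1e-5, 60.0),   # fails the E-value threshold
    Hit("A1", "B2", "g3", 1e-10, 25.0),  # passes exactly at both thresholds
    Hit("A2", "B1", "g4", 1e-50, 24.9),  # fails the identity threshold
]
matrix = shared_gene_matrix(hits)
print(dict(matrix))
```

Each key of `matrix` corresponds to one cell of the archaeal-by-bacterial matrix; genome pairs with no qualifying hit are simply absent.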

Extended Data Figure 2 Presence–absence patterns of archaeal genes with sparse distribution among bacteria sampled.

Archaeal export families are sorted according to the reference tree on the left. The figure shows the 391 cases of archaea-to-bacteria export (≥ 2 archaea and ≥ 2 bacteria from one phylum only) and 662 cases of bacterial singleton trees (≥ 3 archaea, one bacterium). The 25,762 clusters were classified into the following categories (Supplementary Table 2): 16,983 archaeal specific, 3,315 imports, 391 exports, 662 cases of bacterial singletons with ≥ 3 archaea in the tree, 308 cases with three sequences (a bacterial singleton and 2 archaea) in the cluster, 4,074 trees in which archaea were non-monophyletic, and 29 ambiguous cases among trees showing archaeal monophyly. The bacterial taxonomic distribution is shown in the lower panel. Gene identifiers and trees are given in Supplementary Table 3.
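
As a quick arithmetic check, the category counts quoted in this caption can be summed and compared with the stated total of 25,762 clusters. The dictionary keys below are shorthand labels chosen for this sketch. The import-to-export ratio computed at the end uses raw cluster counts only; whether the "more than fivefold" asymmetry in the abstract is derived in the same way is not stated here.

```python
# Category counts as quoted in the caption (labels are shorthand for this sketch).
categories = {
    "archaeal specific": 16_983,
    "imports": 3_315,
    "exports": 391,
    "bacterial singleton, >= 3 archaea": 662,
    "bacterial singleton, 2 archaea": 308,
    "archaea non-monophyletic": 4_074,
    "ambiguous (archaeal monophyly)": 29,
}

total = sum(categories.values())
print("total clusters:", total)

# Raw ratio of import clusters to export clusters.
import_export_ratio = categories["imports"] / categories["exports"]
print("imports per export:", round(import_export_ratio, 2))
```

The seven categories add up to the stated total, and on these raw counts imports outnumber exports by more than eight to one.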

Extended Data Figure 3 Comparison of sets of trees for single-copy genes in 11 archaeal groups.

Cumulative distribution functions for scores of tree compatibility with the recipient data set. Values are P values of the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test in the comparison of the recipient (blue) data sets against the imports (green) data set and three synthetic data sets, one-LGT (red), two-LGT (pink) and random (cyan). a, Thermoproteales. b, Desulfurococcales. c, Sulfolobales. d, Thermococcales. e, Methanobacteriales. f, Methanococcales. g, Thermoplasmatales. h, Archaeoglobales. i, Methanococcales. j, Methanosarcinales. k, Haloarchaea.
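
The comparison described here, a two-sided two-sample KS test between score distributions, can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function name `ks_two_sample` and the toy scores are hypothetical, and the P value uses the standard asymptotic Kolmogorov series, which is only approximate for small samples.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Two-sided two-sample KS statistic D and its asymptotic P value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D is the largest gap between the two empirical CDFs; it is attained
    # at one of the observed values, so checking those is sufficient.
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / n
        fy = bisect_right(ys, v) / m
        d = max(d, abs(fx - fy))
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.3:
        # The series converges slowly here and the tail probability is effectively 1.
        return d, 1.0
    # Asymptotic Kolmogorov tail: 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 lam^2)
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Toy compatibility scores, invented for illustration only.
recipient_scores = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93, 0.89, 0.92]
import_scores = [0.90, 0.86, 0.94, 0.91, 0.85, 0.92, 0.88, 0.93]
d_stat, p_value = ks_two_sample(recipient_scores, import_scores)
print(d_stat, p_value)
```

A large P value would indicate that the two cumulative distributions are not distinguishable by this test, which is how a comparison of recipient and import tree sets would be read.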

Extended Data Figure 4 Presence–absence patterns of all archaeal non-monophyletic genes.

Archaeal families that did not generate monophyly for archaeal sequences in ML trees are plotted according to the reference tree on the left; the distribution across bacterial genome groups is shown in the lower panel. These trees include 693 cases in which archaea showed non-monophyly by the misplacement of a single archaeal branch. Gene identifiers and trees are given in Supplementary Tables 4 and 5.

Extended Data Figure 5 Sorting by bacterial presence–absence patterns for archaeal imports, exports and archaeal non-monophyletic families.

Archaeal families and their homologue distribution in 1,847 bacterial genomes are sorted by archaeal (top) and bacterial (bottom) gene distributions for direct comparison. a–f, Distributions of archaeal imports sorted by archaeal groups (a) and by bacterial groups (b); distributions of archaeal exports sorted by archaeal groups (c) and by bacterial groups (d); distributions of archaeal non-monophyletic gene families sorted by archaeal groups (e) and by bacterial groups (f).

Extended Data Figure 6 Testing for evidence of higher order archaeal relationships using a permutation tail probability (PTP) test.

Comparison of pairwise Euclidean distance distributions between archaeal real and conditional random gene family patterns using the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test. a, Archaeal-specific families: distribution of 2,471 archaeal-specific families present in at least 2 and fewer than 11 groups (top); comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately; comparison between real data and conditional random patterns generated by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Crenarchaeota (mean P = 0.0071, middle) or into Euryarchaeota (mean P = 0.02591, bottom). b, Archaeal import families: distribution of 989 archaeal import families present in at least 2 and fewer than 11 groups (top). Comparison between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Crenarchaeota (mean P = 0.0795, middle); comparison between real data and random patterns generated by including others (Nanoarchaea, Thaumarchaea and Korarchaeota) into Euryarchaeota (mean P = 0.0098, bottom).
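
One possible reading of the conditional randomization can be sketched as follows: each family's presence–absence vector is permuted separately within the crenarchaeal columns and within the euryarchaeal columns, so per-block counts are preserved, and pairwise Euclidean distances are then computed for real and shuffled patterns. The function names, the column blocks and the toy patterns are all hypothetical; the exact procedure used for the figure is described in the Supplementary Methods, not here.

```python
import math
import random
from itertools import combinations


def shuffle_within_blocks(pattern, blocks, rng):
    """Permute a presence/absence vector separately inside each block of column indices."""
    shuffled = list(pattern)
    for block in blocks:
        values = [pattern[i] for i in block]
        rng.shuffle(values)
        for i, v in zip(block, values):
            shuffled[i] = v
    return shuffled


def pairwise_euclidean(patterns):
    """Euclidean distance for every unordered pair of patterns."""
    return [math.dist(p, q) for p, q in combinations(patterns, 2)]


# Toy layout: columns 0-2 stand for crenarchaeal groups, 3-6 for euryarchaeal groups.
blocks = [[0, 1, 2], [3, 4, 5, 6]]
real_patterns = [
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0, 1],
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
random_patterns = [shuffle_within_blocks(p, blocks, rng) for p in real_patterns]

real_distances = pairwise_euclidean(real_patterns)
random_distances = pairwise_euclidean(random_patterns)
print(real_distances)
print(random_distances)
```

The two distance lists are the kind of input that a two-sample KS test would compare; repeating the shuffle 100 times would give 100 P values whose mean could be reported.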

Extended Data Figure 7 Archaeal specific and import gene counts on a reference tree.

The numbers of archaeal-specific and import families corresponding to each node in the reference tree are shown in the order ‘specific/imports’. Numbers at internal nodes indicate the number of archaeal-specific families and families with bacterial homologues that correspond to the reference tree topology. Values at the far left indicate the number of archaeal-specific families and families with bacterial homologues that are present in all archaeal groups.

Extended Data Figure 8 Non tree-like structure of archaeal protein families.

Proportion of archaeal families whose distributions are congruent with the reference tree and with all possible trees. Filled circles indicate the proportion of archaeal families that are congruent to the reference tree allowing no losses (with a single origin) and different increments of losses allowed. Red, blue, green, magenta and black circles represent the proportion of families that can be explained using a single origin (849, 11.5%), single origin plus 1 loss (22.4%), single origin plus 2 losses (15%), single origin plus 3 losses (13%) and single origin plus ≥ 4 losses (38%) respectively. Lines indicate the proportion of families that can be explained by each of the 6,081,075 possible trees that preserve euryarchaeote and crenarchaeote monophyly. Note that on average, any given tree can explain 569 (8%) of the archaeal families using a single origin event in the tree, and the best tree can explain only 1,180 families (16%). In the present data, 208,019 trees explain the gene distributions better than the archaeal reference tree without loss events, underscoring the discordance between core gene phylogeny and gene distributions in the remainder of the genome.
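
The notion of a distribution being explained by "a single origin plus k losses" can be made concrete with a small parsimony-style sketch. It assumes a rooted tree written as nested tuples of group names, places the single origin at the last common ancestor of all groups that carry the family, and counts one loss for every maximal subtree below that ancestor in which the family is absent. The toy tree and function names are hypothetical, and this counting rule is an assumption about how losses might be scored, not the authors' stated method.

```python
def leaves(node):
    """Set of leaf names under a node (a leaf is a string, an internal node a tuple)."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def count_losses(node, present):
    """Number of maximal subtrees under `node` that contain no present leaf."""
    if not (leaves(node) & present):
        return 1  # the whole subtree is absent: one loss event
    if isinstance(node, str):
        return 0  # a present leaf needs no loss
    return sum(count_losses(child, present) for child in node)


def single_origin_losses(tree, present):
    """Losses needed if the family arose once, at the last common ancestor of `present`."""
    present = set(present)
    if not (leaves(tree) & present):
        raise ValueError("family is not present in any leaf of the tree")
    node = tree
    # Walk down while exactly one child lineage carries the family.
    while not isinstance(node, str):
        holders = [c for c in node if leaves(c) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    return count_losses(node, present)


# Toy five-group tree, invented for illustration (not the paper's reference tree).
toy_tree = ((("A", "B"), "C"), ("D", "E"))
for family in ({"A", "B"}, {"A", "C"}, {"A", "D"}):
    print(sorted(family), single_origin_losses(toy_tree, family))
```

A family scoring zero losses corresponds to the "single origin" class in the caption; tallying the scores over all families for each candidate tree gives proportions of the kind plotted in the figure.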

Extended Data Table 1 Comparison of sets of trees for single-copy genes in 11 archaeal groups
Extended Data Table 2 Functional annotations for archaeal genes according to gene family distribution and phylogeny

Supplementary information

Supplementary Information

This file contains Supplementary Methods and Supplementary References. (PDF 728 kb)

Supplementary Data

This file contains Supplementary Tables 1-8 and a Supplementary Table Guide. (ZIP 32480 kb)

About this article

Cite this article

Nelson-Sathi, S., Sousa, F., Roettger, M. et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80 (2015).
