Origins of major archaeal clades correspond to gene acquisitions from bacteria

Abstract

The mechanisms that underlie the origin of major prokaryotic groups are poorly understood. In principle, the origin of both species and higher taxa among prokaryotes should entail similar mechanisms: ecological interactions with the environment paired with natural genetic variation involving lineage-specific gene innovations and lineage-specific gene acquisitions (refs 1–4). To investigate the origin of higher taxa in archaea, we have determined gene distributions and gene phylogenies for the 267,568 protein-coding genes of 134 sequenced archaeal genomes in the context of their homologues from 1,847 reference bacterial genomes. Archaeal-specific gene families define 13 traditionally recognized archaeal higher taxa in our sample. Here we report that the origins of these 13 groups unexpectedly correspond to 2,264 group-specific gene acquisitions from bacteria. Interdomain gene transfer is highly asymmetric: transfers from bacteria to archaea are more than fivefold more frequent than vice versa. Gene transfers identified at major evolutionary transitions among prokaryotes specifically implicate gene acquisitions for metabolic functions from bacteria as key innovations in the origin of higher archaeal taxa.

Figure 1: Distribution of genes in archaeal-specific families.
Figure 2: Bacterial gene acquisitions in archaeal genomes.
Figure 3: Archaeal gene acquisition network.

References

  1. Doolittle, W. F. & Papke, R. T. Genomics and the bacterial species problem. Genome Biol. 7, 116 (2006).
  2. Retchless, A. C. & Lawrence, J. G. Temporal fragmentation of speciation in bacteria. Science 317, 1093–1096 (2007).
  3. Achtman, M. & Wagner, M. Microbial diversity and the genetic nature of microbial species. Nature Rev. Microbiol. 6, 431–440 (2008).
  4. Fraser, C., Alm, E. J., Polz, M. F., Spratt, B. G. & Hanage, W. P. The bacterial species challenge: making sense of genetic and ecological diversity. Science 323, 741–746 (2009).
  5. Puigbò, P., Wolf, Y. I. & Koonin, E. V. The tree and net components of prokaryote genome evolution. Genome Biol. Evol. 2, 745–756 (2010).
  6. Dagan, T. Phylogenomic networks. Trends Microbiol. 19, 483–491 (2011).
  7. Hess, W. R. Genome analysis of marine photosynthetic microbes and their global role. Curr. Opin. Biotechnol. 15, 191–198 (2004).
  8. Kloesges, T. et al. Networks of gene sharing among 329 proteobacterial genomes reveal differences in lateral gene transfer frequency at different phylogenetic depths. Mol. Biol. Evol. 28, 1057–1074 (2011).
  9. Williams, D., Gogarten, J. P. & Papke, R. T. Quantifying homologous replacement of loci between haloarchaeal species. Genome Biol. Evol. 4, 1223–1244 (2012).
  10. Woese, C. R. Bacterial evolution. Microbiol. Rev. 51, 221–271 (1987).
  11. Rivera, M. C., Jain, R., Moore, J. E. & Lake, J. A. Genomic evidence for two functionally distinct gene classes. Proc. Natl Acad. Sci. USA 95, 6239–6244 (1998).
  12. Puigbò, P., Wolf, Y. I. & Koonin, E. V. Search for a tree of life in the thicket of the phylogenetic forest. J. Biol. 8, 59 (2009).
  13. Brochier-Armanet, C., Forterre, P. & Gribaldo, S. Phylogeny and evolution of the Archaea: one hundred genomes later. Curr. Opin. Microbiol. 14, 274–281 (2011).
  14. Lake, J. A. & Rivera, M. C. Deriving the genomic tree of life in the presence of horizontal gene transfer: conditioned reconstruction. Mol. Biol. Evol. 21, 681–690 (2004).
  15. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002).
  16. Wolf, Y. I., Makarova, K. S., Yutin, N. & Koonin, E. V. Updated clusters of orthologous genes for Archaea: a complex ancestor of the Archaea and the byways of horizontal gene transfer. Biol. Direct 7, 46 (2012).
  17. Nelson-Sathi, S. et al. Acquisitions of 1,000 eubacterial genes physiologically transformed a methanogen at the origin of Haloarchaea. Proc. Natl Acad. Sci. USA 109, 20537–20542 (2012).
  18. Bräsen, C., Esser, D., Rauch, B. & Siebers, B. Carbohydrate metabolism in Archaea: current insights into unusual enzymes and pathways and their regulation. Microbiol. Mol. Biol. Rev. 78, 89–175 (2014).
  19. Siebers, B. & Schönheit, P. Unusual pathways and enzymes of central carbohydrate metabolism in Archaea. Curr. Opin. Microbiol. 8, 695–705 (2005).
  20. Doolittle, W. F. & Bapteste, E. Pattern pluralism and the tree of life hypothesis. Proc. Natl Acad. Sci. USA 104, 2043–2049 (2007).
  21. Creevey, C. J. et al. Does a tree-like phylogeny only exist at the tips in the tree of prokaryotes? Proc. R. Soc. Lond. B 271, 2551–2558 (2004).
  22. Deppenmeier, U. et al. The genome of Methanosarcina mazei: evidence for lateral gene transfer between bacteria and archaea. J. Mol. Microbiol. Biotechnol. 4, 453–461 (2002).
  23. Williams, T. A., Foster, G. F., Cox, C. Y. & Embley, T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236 (2013).
  24. McInerney, J. O., O’Connell, M. J. & Pisani, D. The hybrid nature of eukaryota and a consilient view of life on Earth. Nature Rev. Microbiol. 12, 449–455 (2014).
  25. Wolf, Y. I. & Koonin, E. V. Genome reduction as the dominant mode of evolution. Bioessays 35, 829–837 (2013).
  26. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
  27. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997).
  28. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European molecular biology open software suite. Trends Genet. 16, 276–277 (2000).
  29. Guindon, S. & Gascuel, O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704 (2003).
  30. Stamatakis, A., Ludwig, T. & Meier, H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21, 456–463 (2005).

Acknowledgements

We gratefully acknowledge funding from the European Research Council (ERC 232975 to W.F.M.), the graduate school E-Norm of the Heinrich-Heine University (W.F.M.), the DFG (Scho 316/11-1 to P.S.; SI 642/10-1 to B.S.) and the BMBF (0316188A, B.S.). G.L. is supported by an ERC grant (281357 to Tal Dagan); D.B. thanks the Alexander von Humboldt Foundation for a fellowship. Computational support of the Zentrum für Informations- und Medientechnologie (ZIM) at the Heinrich-Heine University is gratefully acknowledged.

Author information

S.N.-S., F.L.S., M.R., N.L.-C. and T.T. performed bioinformatic analyses; A.J., D.B. and G.L. performed statistical analyses; P.S., B.S., J.O.M. and W.F.M. interpreted results; S.N.-S., F.L.S., G.L., J.O.M. and W.F.M. wrote the paper; S.N.-S., G.L. and W.F.M. designed the study. All authors discussed the results and commented on the manuscript.

Correspondence to William F. Martin.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Inter-domain gene sharing network.

Each cell in the matrix indicates the number of genes (e-value ≤ 10⁻¹⁰ and ≥ 25% global identity) shared between the 134 archaeal and 1,847 bacterial genomes in each pairwise inter-domain comparison (scale bar at lower right). Archaeal genomes are listed as in Fig. 1. Bacterial genomes are presented in 23 groups corresponding to phylum or class in the GenBank nomenclature: a = Clostridia; b = Erysipelotrichi, Negativicutes; c = Bacilli; d = Firmicutes; e = Chlamydiae; f = Verrucomicrobia, Planctomycetes; g = Spirochaetes; h = Gemmatimonadetes, Synergistetes, Elusimicrobia, Dictyoglomi, Nitrospirae; i = Actinobacteria; j = Fibrobacteres, Chlorobi; k = Bacteroidetes; l = Fusobacteria; Thermotogae, Aquificae, Chloroflexi; m = Deinococcus-Thermus; n = Cyanobacteria; o = Acidobacteria; δ, ε, α, β, γ = Delta-, Epsilon-, Alpha-, Beta- and Gammaproteobacteria; P = Thermodesulfobacteria, Caldiserica, Chrysiogenetes, Ignavibacteria. Bacterial genome size in number of proteins is indicated at the top.
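A minimal sketch of how the counts in such a matrix could be tallied, assuming the archaeal-versus-bacterial similarity hits have already been computed (for example with BLAST) and annotated with e-value and global identity. The `Hit` record, its field names and the rule that each archaeal gene counts once per genome pair are illustrative assumptions, not the authors' pipeline; only the two thresholds come from the caption.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One archaeal-versus-bacterial protein hit (hypothetical record layout)."""
    archaeal_genome: str
    archaeal_gene: str
    bacterial_genome: str
    evalue: float
    global_identity: float  # fraction of identical positions, 0..1


def sharing_matrix(
    hits: Iterable[Hit],
    max_evalue: float = 1e-10,
    min_identity: float = 0.25,
) -> Dict[Tuple[str, str], int]:
    """Number of archaeal genes shared with each bacterial genome.

    Hits are kept only if e-value <= max_evalue and global identity
    >= min_identity; each archaeal gene is counted once per genome pair,
    however many qualifying hits it has in that bacterial genome.
    """
    shared: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for h in hits:
        if h.evalue <= max_evalue and h.global_identity >= min_identity:
            shared[(h.archaeal_genome, h.bacterial_genome)].add(h.archaeal_gene)
    return {pair: len(genes) for pair, genes in shared.items()}


hits = [
    Hit("A1", "g1", "B1", 1e-30, 0.40),
    Hit("A1", "g1", "B1", 1e-12, 0.30),  # same gene and pair: counted once
    Hit("A1", "g2", "B1", 1e-5, 0.50),   # e-value too weak, dropped
    Hit("A1", "g3", "B2", 1e-50, 0.20),  # identity below 25%, dropped
    Hit("A2", "g9", "B2", 1e-10, 0.25),  # exactly at both thresholds, kept
]
print(sharing_matrix(hits))  # {('A1', 'B1'): 1, ('A2', 'B2'): 1}
```

Storing gene sets rather than raw hit counts keeps multi-copy bacterial homologues from inflating a cell.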

Extended Data Figure 2 Presence–absence patterns of archaeal genes with sparse distribution among the bacteria sampled.

Archaeal export families are sorted according to the reference tree on the left. The figure shows the 391 cases of archaea-to-bacteria export (≥ 2 archaea and ≥ 2 bacteria, all bacteria from a single phylum) and the 662 cases of bacterial singleton trees (≥ 3 archaea, one bacterium). The 25,762 clusters were classified into the following categories (Supplementary Table 2): 16,983 archaeal-specific, 3,315 imports, 391 exports, 662 bacterial singletons with ≥ 3 archaea in the tree, 308 clusters with three sequences (a bacterial singleton and 2 archaea), 4,074 trees in which archaea were non-monophyletic, and 29 ambiguous cases among trees showing archaeal monophyly. The bacterial taxonomic distribution is shown in the lower panel. Gene identifiers and trees are given in Supplementary Table 3.
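The category rules quoted in this caption can be written as a small decision function. This is a sketch under stated assumptions: the `Cluster` fields are hypothetical, monophyly is assumed to have been read off the ML tree beforehand, every monophyletic non-export cluster is treated as an import, and the criterion behind the 29 ambiguous cases is not described here, so it is not modelled. The `PUBLISHED` check only confirms that the reported counts sum to 25,762.

```python
from collections import Counter
from typing import NamedTuple, Tuple


class Cluster(NamedTuple):
    """Summary of one protein cluster (hypothetical fields)."""
    n_archaea: int
    bacterial_phyla: Tuple[str, ...]  # phylum of each bacterial member
    archaea_monophyletic: bool        # read off the ML tree, if one was built


def classify(c: Cluster) -> str:
    """Assign a cluster to one category using the rules in the caption.

    Assumption: every archaeal-monophyletic cluster that is not an export
    is called an import; the 'ambiguous' category is not modelled.
    """
    n_bac = len(c.bacterial_phyla)
    if n_bac == 0:
        return "archaeal_specific"
    if n_bac == 1:
        if c.n_archaea >= 3:
            return "bacterial_singleton"
        if c.n_archaea == 2:
            return "three_sequences"
        return "unclassified"      # fewer than 3 sequences: no tree
    if c.n_archaea < 2:
        return "unclassified"      # outside the rules stated in the caption
    if not c.archaea_monophyletic:
        return "non_monophyletic"
    if len(set(c.bacterial_phyla)) == 1:
        return "export"            # >= 2 archaea, >= 2 bacteria, one phylum
    return "import"


# The published category counts add up to the 25,762 clusters.
PUBLISHED = {
    "archaeal_specific": 16983, "import": 3315, "export": 391,
    "bacterial_singleton": 662, "three_sequences": 308,
    "non_monophyletic": 4074, "ambiguous": 29,
}
assert sum(PUBLISHED.values()) == 25762

examples = [
    Cluster(4, (), True),
    Cluster(5, ("Firmicutes",), True),
    Cluster(3, ("Firmicutes", "Firmicutes"), True),
    Cluster(6, ("Firmicutes", "Cyanobacteria"), True),
    Cluster(6, ("Firmicutes", "Cyanobacteria"), False),
]
print(Counter(classify(c) for c in examples))
```

The singleton and three-sequence checks come first because a tree with one bacterium always leaves the archaea on one side of an edge, so monophyly is uninformative there.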

Extended Data Figure 3 Comparison of sets of trees for single-copy genes in 11 archaeal groups.

Cumulative distribution functions for scores of tree compatibility with the recipient data set. Values are P values of two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit tests comparing the recipient (blue) data set with the imports (green) data set and with three synthetic data sets: one-LGT (red), two-LGT (pink) and random (cyan). a, Thermoproteales. b, Desulfurococcales. c, Sulfolobales. d, Thermococcales. e, Methanobacteriales. f, Methanococcales. g, Thermoplasmatales. h, Archaeoglobales. i, Methanococcales. j, Methanosarcinales. k, Haloarchaea.
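The P values in this caption come from two-sample KS tests. To show what the test computes, here is a dependency-free sketch of the statistic and its standard asymptotic two-sided P value (the Kolmogorov series with the small-sample correction given in Numerical Recipes). It is an approximation, not necessarily what the authors ran; in practice `scipy.stats.ks_2samp` provides the same test.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample KS statistic D = max_t |F_x(t) - F_y(t)|.

    Both empirical CDFs are step functions that only change at observed
    values, so it suffices to evaluate them at the pooled sample points.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        fx = bisect_right(xs, t) / len(xs)
        fy = bisect_right(ys, t) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def ks_pvalue(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided P value from the Kolmogorov distribution,
    Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2), with the
    effective-sample-size correction used in Numerical Recipes."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # Q(lam) is within 1e-10 of 1 here; the series converges slowly
    q = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, q))


recipient = [0.1, 0.4, 0.5, 0.7, 0.9]
imports = [0.2, 0.3, 0.6, 0.8, 1.0]
d = ks_statistic(recipient, imports)
print(d, ks_pvalue(d, len(recipient), len(imports)))
```

The asymptotic formula is least reliable for very small samples, where an exact or permutation P value would be preferable.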

Extended Data Figure 4 Presence–absence patterns of all archaeal non-monophyletic genes.

Archaeal families that did not yield monophyly for archaeal sequences in ML trees are plotted according to the reference tree on the left; the distribution across bacterial genome groups is shown in the lower panel. These trees include 693 cases in which archaeal non-monophyly was caused by the misplacement of a single archaeal branch. Gene identifiers and trees are given in Supplementary Tables 4 and 5.
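Whether the archaeal sequences in a tree are monophyletic, and whether a single misplaced archaeal branch is responsible, can be checked from the leaf sets of the tree's nodes. A minimal sketch, assuming trees are given as nested tuples of leaf labels and treated as unrooted; the labels and the rule of skipping groups with fewer than three archaea are hypothetical choices for illustration.

```python
from typing import FrozenSet, Iterable, List, Tuple, Union

Tree = Union[str, tuple]  # a leaf label or a tuple of subtrees


def leaf_sets(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Leaves of `tree` and the leaf set below every node (leaves included)."""
    if isinstance(tree, str):
        s = frozenset([tree])
        return s, [s]
    leaves: FrozenSet[str] = frozenset()
    sets: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_sets = leaf_sets(child)
        leaves |= child_leaves
        sets.extend(child_sets)
    sets.append(leaves)
    return leaves, sets


def is_split(group, all_leaves, sets) -> bool:
    """True if some edge of the unrooted tree separates `group` from the rest.

    Each node's leaf set defines the split across the edge above it, so
    `group` is a split if it, or its complement, is one of those sets.
    """
    complement = all_leaves - group
    return any(s == group or s == complement for s in sets)


def archaea_monophyletic(tree: Tree, archaea: Iterable[str]) -> bool:
    leaves, sets = leaf_sets(tree)
    return is_split(frozenset(archaea) & leaves, leaves, sets)


def single_misplaced_branch(tree: Tree, archaea: Iterable[str]) -> bool:
    """Archaea are non-monophyletic, but pruning one archaeal leaf fixes it.

    Pruning leaf `a` keeps every other clade intact minus `a`, so the
    pruned tree's leaf sets are simply {s - {a}}. Groups of fewer than
    three archaea are skipped because the test would then be trivial.
    """
    leaves, sets = leaf_sets(tree)
    group = frozenset(archaea) & leaves
    if len(group) < 3 or is_split(group, leaves, sets):
        return False
    for a in group:
        pruned = [s - {a} for s in sets]
        if is_split(group - {a}, leaves - {a}, pruned):
            return True
    return False


ARCHAEA = {"A1", "A2", "A3"}
t_mono = (("A1", ("A2", "A3")), ("B1", "B2"))
t_one_off = ((("A1", "A2"), "B1"), ("A3", ("B2", "B3")))
t_mixed = (("A1", "B1"), ("A2", "B2"), ("A3", "B3"))
```

Here `t_one_off` becomes monophyletic once A3 is pruned, whereas `t_mixed` has archaea scattered across three clades and stays non-monophyletic after any single pruning.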

Extended Data Figure 5 Sorting by bacterial presence–absence patterns for archaeal imports, exports and archaeal non-monophyletic families.

Archaeal families and their homologue distributions in 1,847 bacterial genomes are sorted by archaeal (top) and bacterial (bottom) gene distributions for direct comparison. a–f, Distributions of archaeal imports sorted by archaeal groups (a) and by bacterial groups (b); distributions of archaeal exports sorted by archaeal groups (c) and by bacterial groups (d); distributions of archaeal non-monophyletic gene families sorted by archaeal groups (e) and by bacterial groups (f).

Extended Data Figure 6 Testing for evidence of higher order archaeal relationships using a permutation tail probability (PTP) test.

Comparison of pairwise Euclidean distance distributions between real and conditional random archaeal gene family patterns using the two-sided Kolmogorov–Smirnov (KS) two-sample goodness-of-fit test. a, Archaeal-specific families: distribution of 2,471 archaeal-specific families present in at least 2 and fewer than 11 groups (top); comparisons between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately, with the others (Nanoarchaea, Thaumarchaea and Korarchaeota) included in Crenarchaeota (mean P = 0.0071, middle) or in Euryarchaeota (mean P = 0.02591, bottom). b, Archaeal import families: distribution of 989 archaeal import families present in at least 2 and fewer than 11 groups (top); comparisons between real data and 100 conditional random patterns generated by shuffling the entries within Crenarchaeota and Euryarchaeota separately, with the others (Nanoarchaea, Thaumarchaea and Korarchaeota) included in Crenarchaeota (mean P = 0.0795, middle) or in Euryarchaeota (mean P = 0.0098, bottom).
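The conditional randomization described here can be sketched as follows, assuming a 0/1 matrix with gene families as rows and archaeal groups as columns, and assuming the distances are computed between family patterns (rows). Column blocks stand for Crenarchaeota and Euryarchaeota, with the "others" placed in one block or the other. The toy matrix and block boundaries are hypothetical, and only the KS statistic D is computed; a P value would come from a KS test as in Extended Data Fig. 3.

```python
import itertools
import math
import random
from bisect import bisect_right
from typing import List, Sequence

Matrix = List[List[int]]  # rows = gene families, columns = archaeal groups


def informative(matrix: Matrix, lo: int = 2, hi: int = 11) -> Matrix:
    """Keep families present in at least `lo` and fewer than `hi` groups."""
    return [row for row in matrix if lo <= sum(row) < hi]


def conditional_shuffle(matrix: Matrix, blocks: Sequence[Sequence[int]],
                        rng: random.Random) -> Matrix:
    """Permute each family's entries within each column block separately,
    so a family present in k groups of a block stays present in k groups
    of that block (hypothetical reading of 'conditional random')."""
    out = []
    for row in matrix:
        new = list(row)
        for cols in blocks:
            vals = [row[c] for c in cols]
            rng.shuffle(vals)
            for c, v in zip(cols, vals):
                new[c] = v
        out.append(new)
    return out


def pairwise_distances(matrix: Matrix) -> List[float]:
    """Euclidean distance between every pair of family patterns."""
    return [math.dist(a, b) for a, b in itertools.combinations(matrix, 2)]


def ks_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic."""
    xs, ys = sorted(x), sorted(y)
    return max(abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
               for t in xs + ys)


# Toy data: columns 0-2 stand in for crenarchaeal groups, 3-7 for
# euryarchaeal groups (hypothetical sizes).
raw = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 1],  # present in one group only: filtered out
]
real = informative(raw)
blocks = [range(0, 3), range(3, 8)]
rng = random.Random(42)
real_d = pairwise_distances(real)
stats = [ks_d(real_d, pairwise_distances(conditional_shuffle(real, blocks, rng)))
         for _ in range(100)]
print(sum(stats) / len(stats))  # mean D over 100 conditional replicates
```

Shuffling within blocks rather than across the whole row keeps the crenarchaeal/euryarchaeal composition of every family fixed, so any difference from the real distances reflects higher-order structure within each lineage.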

Extended Data Figure 7 Archaeal specific and import gene counts on a reference tree.

The numbers of archaeal-specific and import families corresponding to each node in the reference tree are shown in the order ‘specific/imports’. Numbers at internal nodes indicate the number of archaeal-specific families and families with bacterial homologues whose distributions correspond to the reference tree topology. Values at the far left indicate the number of archaeal-specific families and families with bacterial homologues that are present in all archaeal groups.
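A family can be said to correspond to a node of the reference tree when the set of groups that carry it is exactly that node's clade. Under that reading (an assumption; the caption does not spell out the matching rule), the ‘specific/imports’ labels can be tallied as below. The four-group tree and the family lists are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Union

Tree = Union[str, tuple]  # a group label or a tuple of subtrees


def clade_sets(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf set of every node; the last entry is the whole tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    sets: List[FrozenSet[str]] = []
    leaves: set = set()
    for child in tree:
        child_sets = clade_sets(child)
        sets.extend(child_sets)
        leaves |= child_sets[-1]  # the child's own leaf set is its last entry
    sets.append(frozenset(leaves))
    return sets


def count_by_node(tree: Tree, families: Iterable[Iterable[str]]) -> Counter:
    """Count families whose group distribution equals exactly one clade."""
    nodes = set(clade_sets(tree))
    return Counter(fs for fs in map(frozenset, families) if fs in nodes)


def label(node: FrozenSet[str], specific: Counter, imports: Counter) -> str:
    """Format a node annotation in the 'specific/imports' order."""
    return f"{specific[node]}/{imports[node]}"


tree = (("G1", "G2"), ("G3", "G4"))  # hypothetical reference tree
specific = count_by_node(tree, [{"G1", "G2"}, {"G1", "G2"}, {"G1", "G3"},
                                {"G1", "G2", "G3", "G4"}])
imports = count_by_node(tree, [{"G3", "G4"}, {"G1", "G2"}])
root = frozenset({"G1", "G2", "G3", "G4"})
print(label(frozenset({"G1", "G2"}), specific, imports))  # 2/1
print(label(root, specific, imports))                     # 1/0
```

The family present in G1 and G3 matches no clade of this tree and is therefore not counted at any node.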

Extended Data Figure 8 Non-tree-like structure of archaeal protein families.

Proportion of archaeal families whose distributions are congruent with the reference tree and with all possible trees. Filled circles indicate the proportion of archaeal families that are congruent with the reference tree when no losses are allowed (a single origin) and when increasing numbers of losses are allowed. Red, blue, green, magenta and black circles represent the proportions of families that can be explained by a single origin (849, 11.5%), a single origin plus 1 loss (22.4%), a single origin plus 2 losses (15%), a single origin plus 3 losses (13%) and a single origin plus ≥ 4 losses (38%), respectively. Lines indicate the proportion of families that can be explained by each of the 6,081,075 possible trees that preserve euryarchaeote and crenarchaeote monophyly. Note that, on average, any given tree can explain 569 (8%) of the archaeal families by a single origin event, and the best tree can explain only 1,180 families (16%). In the present data, 208,019 trees explain the gene distributions better than the archaeal reference tree when no loss events are allowed, underscoring the discordance between core gene phylogeny and gene distributions in the remainder of the genome.
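The "single origin plus k losses" classes can be computed with Dollo-style counting: place the single origin at the last common ancestor of the groups carrying the family, then count one loss for each maximal clade below it that lacks the family. This sketch assumes a rooted reference tree given as nested tuples; the toy tree and families are hypothetical, and counts of 4 or more are pooled as in the caption. Repeating `min_losses` over alternative trees and counting families with zero losses is one way to score each tree for the kind of comparison plotted as lines, though the authors' exact procedure is not given here.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Union

Tree = Union[str, tuple]  # a group label or a tuple of subtrees


def leaves(tree: Tree) -> FrozenSet[str]:
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(c) for c in tree))


def lca(tree: Tree, present: FrozenSet[str]) -> Tree:
    """Smallest subtree that contains every group carrying the family."""
    if isinstance(tree, str):
        return tree
    for child in tree:
        if present <= leaves(child):
            return lca(child, present)
    return tree


def losses_below(tree: Tree, present: FrozenSet[str]) -> int:
    """Number of maximal clades lacking the family (one loss each)."""
    if not leaves(tree) & present:
        return 1
    if isinstance(tree, str):
        return 0
    return sum(losses_below(c, present) for c in tree)


def min_losses(tree: Tree, present: Iterable[str]) -> int:
    """Fewest losses needed to explain `present` with a single origin.

    Placing the origin at the LCA of the present groups is optimal: any
    higher origin can only add absent clades, i.e. more losses.
    """
    p = frozenset(present)
    if not p:
        raise ValueError("family absent from every group")
    return losses_below(lca(tree, p), p)


def loss_histogram(tree: Tree, families: Iterable[Iterable[str]]) -> Counter:
    """Families per loss class; key 4 stands for '4 or more losses'."""
    return Counter(min(min_losses(tree, f), 4) for f in families)


tree = ((("G1", "G2"), "G3"), ("G4", "G5"))  # hypothetical rooted tree
families = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G4"}, {"G3"},
            {"G1", "G2", "G3", "G4", "G5"}]
print(loss_histogram(tree, families))
```

In this toy tree the family in G1 and G4 needs three losses (G2, G3 and G5), while patterns matching a clade need none.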

Extended Data Table 1 Comparison of sets of trees for single-copy genes in 11 archaeal groups
Extended Data Table 2 Functional annotations for archaeal genes according to gene family distribution and phylogeny

Supplementary information

Supplementary Information

This file contains Supplementary Methods and Supplementary References. (PDF 728 kb)

Supplementary Data

This file contains Supplementary Tables 1–8 and a Supplementary Table Guide. (ZIP 32480 kb)


About this article

Cite this article

Nelson-Sathi, S., Sousa, F., Roettger, M. et al. Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80 (2015). https://doi.org/10.1038/nature13805
