Eukaryogenesis is one of the most enigmatic evolutionary transitions, during which simple prokaryotic cells gave rise to complex eukaryotic cells. While evolutionary intermediates are lacking, gene duplications provide information on the order of events by which eukaryotes originated. Here we use a phylogenomics approach to reconstruct successive steps during eukaryogenesis. We find that gene duplications roughly doubled the proto-eukaryotic gene repertoire, with families inherited from the Asgard archaea-related host being duplicated most. By timing events relative to one another using phylogenetic distances, we infer that duplications in cytoskeletal and membrane-trafficking families were among the earliest events, whereas most other families expanded predominantly after mitochondrial endosymbiosis. Altogether, we conclude that the host that engulfed the proto-mitochondrion had some eukaryote-like complexity, which drastically increased upon mitochondrial acquisition. This scenario bridges the signs of complexity observed in Asgard archaeal genomes to the proposed role of mitochondria in triggering eukaryogenesis.
The code used to annotate the phylogenetic trees is available on GitHub (https://github.com/JulianVosseberg/feca2leca).
Dacks, J. B. et al. The changing view of eukaryogenesis—fossils, cells, lineages and how they all come together. J. Cell Sci. 129, 3695–3703 (2016).
Shiratori, T., Suzuki, S., Kakizawa, Y. & Ishida, K. Phagocytosis-like cell engulfment by a planctomycete bacterium. Nat. Commun. 10, 5529 (2019).
Koumandou, V. L. et al. Molecular paleontology and complexity in the last eukaryotic common ancestor. Crit. Rev. Biochem. Mol. Biol. 48, 373–396 (2013).
Szathmáry, E. Toward major evolutionary transitions theory 2.0. Proc. Natl Acad. Sci. USA 112, 10104–10111 (2015).
Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015).
Zaremba-Niedzwiedzka, K. et al. Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 541, 353–358 (2017).
Roger, A. J., Muñoz-Gómez, S. A. & Kamikawa, R. The origin and diversification of mitochondria. Curr. Biol. 27, R1177–R1192 (2017).
Martijn, J., Vosseberg, J., Guy, L., Offre, P. & Ettema, T. J. G. Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature 557, 101–105 (2018).
Poole, A. M. & Gribaldo, S. Eukaryotic origins: how and when was the mitochondrion acquired? Cold Spring Harb. Perspect. Biol. 6, a015990 (2014).
Pittis, A. A. & Gabaldón, T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature 531, 101–104 (2016).
Makarova, K. S., Wolf, Y. I., Mekhedov, S. L., Mirkin, B. G. & Koonin, E. V. Ancestral paralogs and pseudoparalogs and their role in the emergence of the eukaryotic cell. Nucleic Acids Res. 33, 4626–4638 (2005).
Jékely, G. Small GTPases and the evolution of the eukaryotic cell. Bioessays 25, 1129–1138 (2003).
Wickstead, B., Gull, K. & Richards, T. A. Patterns of kinesin evolution reveal a complex ancestral eukaryote with a multifunctional cytoskeleton. BMC Evol. Biol. 10, 110 (2010).
Elias, M., Brighouse, A., Gabernet-Castello, C., Field, M. C. & Dacks, J. B. Sculpting the endomembrane system in deep time: high resolution phylogenetics of Rab GTPases. J. Cell Sci. 125, 2500–2508 (2012).
Dacks, J. B. & Field, M. C. Evolutionary origins and specialisation of membrane transport. Curr. Opin. Cell Biol. 53, 70–76 (2018).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Fritz-Laylin, L. K. et al. The genome of Naegleria gruberi illuminates early eukaryotic versatility. Cell 140, 631–642 (2010).
Derelle, R. et al. Bacterial proteins pinpoint a single eukaryotic root. Proc. Natl Acad. Sci. USA 112, E693–E699 (2015).
Burki, F., Roger, A. J., Brown, M. W. & Simpson, A. G. B. The new tree of eukaryotes. Trends Ecol. Evol. 35, 43–55 (2020).
Tria, F. D. K. et al. Gene duplications trace mitochondria to the onset of eukaryote complexity. Preprint at bioRxiv https://doi.org/10.1101/781211 (2019).
Esser, C. et al. A genome phylogeny for mitochondria among α-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol. Biol. Evol. 21, 1643–1660 (2004).
Pisani, D., Cotton, J. A. & McInerney, J. O. Supertrees disentangle the chimerical origin of eukaryotic genomes. Mol. Biol. Evol. 24, 1752–1760 (2007).
Narrowe, A. B. et al. Complex evolutionary history of translation elongation factor 2 and diphthamide biosynthesis in archaea and parabasalids. Genome Biol. Evol. 10, 2380–2393 (2018).
Williams, T. A., Cox, C. J., Foster, P. G., Szöllősi, G. J. & Embley, T. M. Phylogenomics provides robust support for a two-domains tree of life. Nat. Ecol. Evol. 4, 138–147 (2020).
Martin, W. F. et al. Late mitochondrial origin is an artifact. Genome Biol. Evol. 9, 373–379 (2017).
Lane, N. Serial endosymbiosis or singular event at the origin of eukaryotes? J. Theor. Biol. 434, 58–67 (2017).
Pittis, A. A. & Gabaldón, T. On phylogenetic branch lengths distribution and the late acquistion of mitochondria. Preprint at bioRxiv https://doi.org/10.1101/064873 (2016).
Lane, N. & Martin, W. The energetics of genome complexity. Nature 467, 929–934 (2010).
Lane, N. Bioenergetic constraints on the evolution of complex life. Cold Spring Harb. Perspect. Biol. 6, a015982 (2014).
Klinger, C. M., Spang, A., Dacks, J. B. & Ettema, T. J. G. Tracing the archaeal origins of eukaryotic membrane-trafficking system building blocks. Mol. Biol. Evol. 33, 1528–1541 (2016).
Martijn, J. & Ettema, T. J. G. From archaeon to eukaryote: the evolutionary dark ages of the eukaryotic cell. Biochem. Soc. Trans. 41, 451–457 (2013).
Akıl, C. & Robinson, R. C. Genomes of Asgard archaea encode profilins that regulate actin. Nature 562, 439–443 (2018).
Imachi, H. et al. Isolation of an archaeon at the prokaryote–eukaryote interface. Nature 577, 519–525 (2020).
Deutekom, E. S., Vosseberg, J., van Dam, T. J. P. & Snel, B. Measuring the impact of gene prediction on gene loss estimates in Eukaryotes by quantifying falsely inferred absences. PLoS Comput. Biol. 15, e1007301 (2019).
Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286–D293 (2016).
Eddy, S. R. Accelerated Profile HMM Searches. PLoS Comput. Biol. 7, e1002195 (2011).
Hauser, M., Mayer, C. E. & Söding, J. kClust: fast and sensitive clustering of large protein sequence databases. BMC Bioinform. 14, 248 (2013).
van Wijk, L. M. & Snel, B. The first eukaryotic kinome tree illuminates the dynamic history of present-day kinases. Preprint at bioRxiv https://doi.org/10.1101/2020.01.27.920793 (2020).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Shah, N., Nute, M. G., Warnow, T. & Pop, M. Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows. Bioinformatics 35, 1613–1614 (2019).
González-Pech, R. A., Stephens, T. G. & Chan, C. X. Commonly misunderstood parameters of NCBI BLAST and important considerations for users. Bioinformatics 35, 2697–2698 (2019).
Adl, S. M. et al. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 66, 4–119 (2019).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Le, S. Q., Dang, C. C. & Gascuel, O. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol. 29, 2921–2936 (2012).
Minh, B. Q., Nguyen, M. A. T. & von Haeseler, A. Ultrafast approximation for phylogenetic bootstrap. Mol. Biol. Evol. 30, 1188–1195 (2013).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Huerta-Cepas, J., Serra, F. & Bork, P. ETE 3: reconstruction, analysis, and visualization of phylogenomic data. Mol. Biol. Evol. 33, 1635–1638 (2016).
Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinform. 20, 473 (2019).
Huerta-Cepas, J. et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol. Biol. Evol. 34, 2115–2122 (2017).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Vosseberg, J. et al. Data for: timing the origin of eukaryotic cellular complexity with ancient duplications. figshare https://doi.org/10.6084/m9.figshare.10069985.v3 (2020).
We thank K. S. Makarova and E. V. Koonin for sharing their KOG-to-COG protein clusters with us. We are grateful to T. J. P. van Dam, E. S. Deutekom and G. J. P. L. Kops for useful advice and discussions. This work is part of the research programme VICI with project number 016.160.638, which is (partly) financed by the Netherlands Organisation for Scientific Research (NWO). T.G. acknowledges support from the Spanish Ministry of Science and Innovation for grant PGC2018-099921-B-I00 and from the European Union’s Horizon 2020 research and innovation programme under grant agreement ERC-2016-724173.
The authors declare no competing interests.
Peer review information: Peer reviewer reports are available.
Extended Data Fig. 1 Estimating the number of LECA genes from the number of Pfam domains with linear regression.
Scatter plot showing the number of Pfam domains and protein-coding genes in present-day eukaryotes, with each dot representing one genome. The regression line (black) and its 95% confidence (filled grey) and prediction intervals (dashed grey) are depicted. The vertical line corresponds to the obtained number of LECA Pfam domains.
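The regression in this figure can be illustrated with a short ordinary least-squares sketch. It fits a line of gene counts on Pfam domain counts and evaluates it at one x value, returning both the 95% confidence interval (uncertainty of the fitted line) and the wider prediction interval (scatter of an individual genome). The function name and the choice to pass the t critical value in as a parameter are assumptions made for illustration. This is not the authors' code.

```python
import math


def fit_with_intervals(xs, ys, x0, t_crit):
    """Least-squares line through (xs, ys), evaluated at x0.

    t_crit is the two-sided Student's t critical value for n - 2 degrees of
    freedom; it is passed in because the standard library has no t quantile.
    Returns (predicted y, confidence interval, prediction interval).
    """
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard error, estimated with n - 2 degrees of freedom
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(rss / (n - 2))

    y0 = intercept + slope * x0
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    # Confidence interval: uncertainty in the fitted mean at x0
    half_ci = t_crit * s * math.sqrt(leverage)
    # Prediction interval: also includes the scatter of a single new genome
    half_pi = t_crit * s * math.sqrt(1 + leverage)
    return y0, (y0 - half_ci, y0 + half_ci), (y0 - half_pi, y0 + half_pi)
```

In the setting of the figure, `xs` would hold Pfam domain counts per genome, `ys` the protein-coding gene counts, and `x0` the inferred number of LECA Pfam domains. If SciPy is available, `t_crit` can be obtained with `scipy.stats.t.ppf(0.975, n - 2)`.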
a, Number of inferred LECA families considering different root positions. These numbers are based on phylogenetic trees from Pfams that are only present in eukaryotes. Besides the Opimoda and Diphoda groups, two other group definitions were used to identify bidirectional best hits (BBHs) and select sequences for tree inference. Names of root positions indicate either the lineage at one side of the root or the position of the split (ADis-DiaM: Amorphea+Discoba – Diaphoretickes+Metamonada; AM-DiaDis: Amorphea+Metamonada – Diaphoretickes+Discoba). Excavate sequences, especially from Metamonada species, are rarely involved in BBHs unless they are specifically searched for (Excavata in the five-group BBHs; Discoba and Metamonada in the four-group BBHs). b, Distribution of duplication lengths obtained using different root positions for eukaryote-only trees based on the four-group BBHs. The difference between distributions is not statistically significant according to the Kruskal-Wallis test.
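The Kruskal-Wallis test in panel b compares the duplication-length distributions from several root positions using ranks alone. The sketch below computes the tie-corrected H statistic from the pooled ranks. Under the null hypothesis, H approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. Turning H into a P value therefore needs a chi-square survival function, which the standard library lacks; `scipy.stats.kruskal` returns both H and P. The helper names are hypothetical.

```python
from collections import Counter


def mid_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = mid_ranks(pooled)

    # Sum over groups of (rank sum)^2 / group size, using the pooled ranks
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Divide by the tie correction 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))
```

Each argument would be the list of duplication lengths obtained under one root position. The function is undefined when every value is tied, because the correction term becomes zero.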
a, Contribution of inventions to LECA families performing different functions. 82% of pairwise comparisons were significantly different (Supplementary Fig. 3). b, Fraction of LECA families resulting from either an invention or duplication – a eukaryotic innovation – according to functional category. 84% of pairwise comparisons were significantly different (Supplementary Fig. 5). c, Contribution of inventions to LECA families performing their function in different cellular components. 51% of pairwise comparisons were significantly different (Supplementary Fig. 4). d, Fraction of LECA families resulting from an innovation according to cellular localisation. 74% of pairwise comparisons were significantly different (Supplementary Fig. 6). a–d, Dashed lines indicate the overall invented or innovated fraction.
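The fractions plotted in these panels follow from the definition in panel b: an innovation is a LECA family that arose either from an invention or from a duplication. A minimal sketch, assuming each LECA family is labelled with a category and one of the hypothetical origin labels `"invention"`, `"duplication"` or `"acquisition"`, computes the per-category fractions and the overall fractions shown as dashed lines:

```python
from collections import Counter, defaultdict


def origin_fractions(families):
    """families: iterable of (category, origin) pairs, with origin one of the
    hypothetical labels 'invention', 'duplication' or 'acquisition'.

    Returns ({category: (invented, innovated)}, (overall invented, overall
    innovated)), where innovated counts inventions plus duplications.
    """
    per_category = defaultdict(Counter)
    for category, origin in families:
        per_category[category][origin] += 1

    def fractions(counts):
        total = sum(counts.values())
        invented = counts["invention"] / total  # missing keys count as 0
        innovated = (counts["invention"] + counts["duplication"]) / total
        return invented, innovated

    overall = Counter()
    for counts in per_category.values():
        overall.update(counts)
    return {c: fractions(n) for c, n in per_category.items()}, fractions(overall)
```

The categories could be functional categories (panels a and b) or cellular components (panels c and d). The pairwise significance tests reported in the Supplementary Figures are not reproduced here.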
a, b, Phylogeny of the prokaryotes (a) and Asgard archaea (b) present in our dataset based on the NCBI taxonomy. The branch widths and numbers indicate the number of acquisitions from a group. c, Number of acquisitions from different alphaproteobacterial orders or a combination of multiple orders (‘Alphaproteobacteria’).
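Panel c assigns each acquisition either to one alphaproteobacterial order or, when the sister group spans several orders, to the combined "Alphaproteobacteria" category. The tally below sketches that assignment. It assumes that the order names in each acquisition's prokaryotic sister group are already known, and the function name is hypothetical.

```python
from collections import Counter


def count_acquisitions(sister_orders, pooled_label="Alphaproteobacteria"):
    """Count acquisitions per alphaproteobacterial order.

    sister_orders holds, for each acquisition, the order names found in its
    prokaryotic sister group. Sister groups spanning several orders are pooled
    under pooled_label, like the combined category of panel c.
    """
    counts = Counter()
    for orders in sister_orders:
        unique = set(orders)
        if not unique:
            raise ValueError("every acquisition needs at least one order")
        label = unique.pop() if len(unique) == 1 else pooled_label
        counts[label] += 1
    return counts
```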
a, b, Distribution of alphaproteobacterial (a) and Asgard archaeal (b) stem lengths (sl’s) for acquisitions without and with duplications. Two alphaproteobacterial sl’s from acquisitions with Magnetococcales as sister group were removed based on the previously inferred phylogenetic position of mitochondria (ref. 8). c, d, Distribution of Asgard archaeal sl’s for information storage and processing (c) and cellular processes and signalling families (d), comparing those without and with duplications. Upon removal of the outliers, the difference in cellular processes and signalling families no longer reached statistical significance. e, Distribution of Asgard archaeal sl’s for duplicated acquisitions, in which homomer-to-heteromer transitions had occurred compared to the other duplicated acquisitions. f, Distribution of vertebrate sl’s for families without and with duplications. g, Distribution of duplication lengths (dl’s) grouped according to the lineage in which the duplication occurred. All pairwise comparisons were significantly different (Mann-Whitney U tests). h, Distribution of differences in log-transformed dl values for all pairwise comparisons between chordate duplications according to age and functional annotation. All groups were significantly different (Mann-Whitney U tests). a–f, P values of Mann-Whitney U tests are shown. c–e, The minimal sl via each duplication node is plotted.
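The Mann-Whitney U tests used throughout these panels ask whether values from one group, for example sl’s of acquisitions with duplications, tend to be larger than values from another. The sketch below implements the two-sided test with the large-sample normal approximation, including tie and continuity corrections. Which variant the authors used is not stated here. `scipy.stats.mannwhitneyu(x, y, alternative="two-sided")` switches to an exact computation for small samples without ties, so its P values can differ from this approximation on tiny inputs.

```python
import math
from collections import Counter
from statistics import NormalDist


def mid_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    first, last = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)
        last[v] = pos
    return [(first[v] + last[v]) / 2 for v in values]


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation with tie and
    continuity corrections). Returns (U statistic for x, P value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = mid_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2

    # Variance of U under the null hypothesis, reduced for tied values
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # every value tied: no evidence of a shift
        return u1, 1.0

    z = max(abs(u1 - mu) - 0.5, 0.0) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(z))
    return u1, min(p, 1.0)
```

For example, `mann_whitney_u(sl_without_dup, sl_with_dup)` would compare two lists of stem lengths. Those variable names are hypothetical.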
a, Ridgeline plot showing the distribution of uncorrected stem lengths (rsl’s) or duplication lengths (rdl’s). Numbers indicate the number of acquisitions or duplications for which the branch lengths were included. The low peaks at very short branch lengths are an artefact from near-zero branch lengths. Groups are ordered based on the median value of rsl’s and rdl’s. b, Ridgeline plot showing the distribution of sl’s for non-duplicated acquisitions that share the same functional annotation of the prokaryotic sister group and are therefore expected to have undergone little functional divergence during eukaryogenesis. a, b, Branch lengths are depicted as the additive inverse of the log-transformed values. Pairwise comparisons that did not give a significant P value (Mann-Whitney U tests) are shown.
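The plotting transform described in these ridgeline panels can be sketched as follows. Each branch length becomes the negative of its logarithm, and the groups are ordered by median length. The caption does not state the log base, so base 10 is assumed here. The clamping floor is also an illustrative assumption: it keeps zero-length branches from raising an error and makes them pile up at one value, much like the artefactual low peaks mentioned above.

```python
import math
from statistics import median


def ridgeline_series(lengths_by_group, floor=1e-6):
    """Order groups by median branch length (shortest first) and transform
    each length to -log10(length) for plotting.

    Lengths below `floor` (an assumed value) are clamped before the log so
    that zero-length branches do not raise; they pile up at -log10(floor).
    """
    ordered = sorted(lengths_by_group, key=lambda g: median(lengths_by_group[g]))
    return [
        (g, [-math.log10(max(bl, floor)) for bl in lengths_by_group[g]])
        for g in ordered
    ]
```

With this transform, shorter branches, which correspond to later events in the relative timing, plot further to the right.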
Cite this article
Vosseberg, J., van Hooff, J.J.E., Marcet-Houben, M. et al. Timing the origin of eukaryotic cellular complexity with ancient duplications. Nat Ecol Evol 5, 92–100 (2021). https://doi.org/10.1038/s41559-020-01320-z