  • Letter

Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry


The origin of eukaryotes stands as a major conundrum in biology [1]. Current evidence indicates that the last eukaryotic common ancestor already possessed many eukaryotic hallmarks, including a complex subcellular organization [1,2,3]. In addition, the lack of evolutionary intermediates challenges the elucidation of the relative order of emergence of eukaryotic traits. Mitochondria are ubiquitous organelles derived from an alphaproteobacterial endosymbiont [4]. Different hypotheses disagree on whether mitochondria were acquired early or late during eukaryogenesis [5]. Similarly, the nature and complexity of the receiving host are debated, with models ranging from a simple prokaryotic host to an already complex proto-eukaryote [1,3,6,7]. Most competing scenarios can be roughly grouped into either mito-early, which consider the driving force of eukaryogenesis to be mitochondrial endosymbiosis into a simple host, or mito-late, which postulate that a significant complexity predated mitochondrial endosymbiosis [3]. Here we provide evidence for late mitochondrial endosymbiosis. We use phylogenomics to directly test whether proto-mitochondrial proteins were acquired earlier or later than other proteins of the last eukaryotic common ancestor. We find that last eukaryotic common ancestor protein families of alphaproteobacterial ancestry and of mitochondrial localization show the shortest phylogenetic distances to their closest prokaryotic relatives, compared with proteins of different prokaryotic origin or cellular localization. Altogether, our results shed new light on a long-standing question and provide compelling support for the late acquisition of mitochondria into a host that already had a proteome of chimaeric phylogenetic origin. We argue that mitochondrial endosymbiosis was one of the ultimate steps in eukaryogenesis and that it provided the definitive selective advantage to mitochondria-bearing eukaryotes over less complex forms.

Figure 1: Stem length analysis.
Figure 2: Phylogenetic distance profiles.
Figure 3: Correspondence of different LECA components with different cellular localizations and functions.

References

  1. Koonin, E. V. The origin and early evolution of eukaryotes in the light of phylogenomics. Genome Biol. 11, 209 (2010)
  2. Embley, T. M. & Martin, W. Eukaryotic evolution, changes and challenges. Nature 440, 623–630 (2006)
  3. Koumandou, V. L. et al. Molecular paleontology and complexity in the last eukaryotic common ancestor. Crit. Rev. Biochem. Mol. Biol. 48, 373–396 (2013)
  4. Gray, M. W., Burger, G. & Lang, B. F. Mitochondrial evolution. Science 283, 1476–1481 (1999)
  5. Poole, A. M. & Gribaldo, S. Eukaryotic origins: how and when was the mitochondrion acquired? Cold Spring Harb. Perspect. Biol. 6, a015990 (2014)
  6. Martijn, J. & Ettema, T. J. G. From archaeon to eukaryote: the evolutionary dark ages of the eukaryotic cell. Biochem. Soc. Trans. 41, 451–457 (2013)
  7. Lester, L., Meade, A. & Pagel, M. The slow road to the eukaryotic genome. BioEssays 28, 57–64 (2006)
  8. Rochette, N. C., Brochier-Armanet, C. & Gouy, M. Phylogenomic test of the hypotheses for the evolutionary origin of eukaryotes. Mol. Biol. Evol. 31, 832–845 (2014)
  9. Thiergart, T., Landan, G., Schenk, M., Dagan, T. & Martin, W. F. An evolutionary network of genes present in the eukaryote common ancestor polls genomes on eukaryotic and mitochondrial origin. Genome Biol. Evol. 4, 466–485 (2012)
  10. Ku, C. et al. Endosymbiotic gene transfer from prokaryotic pangenomes: inherited chimerism in eukaryotes. Proc. Natl Acad. Sci. USA 112, 10139–10146 (2015)
  11. Spang, A. et al. Complex archaea that bridge the gap between prokaryotes and eukaryotes. Nature 521, 173–179 (2015)
  12. Do, C. B. & Batzoglou, S. What is the expectation maximization algorithm? Nature Biotechnol. 26, 897–899 (2008)
  13. Esser, C. et al. A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol. Biol. Evol. 21, 1643–1660 (2004)
  14. Gabaldón, T. & Huynen, M. A. Shaping the mitochondrial proteome. Biochim. Biophys. Acta 1659, 212–220 (2004)
  15. Koonin, E. V. & Yutin, N. The dispersed archaeal eukaryome and the complex archaeal ancestor of eukaryotes. Cold Spring Harb. Perspect. Biol. 6, a016188 (2014)
  16. Powell, S. et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 42, D231–D239 (2014)
  17. Katoh, K. & Toh, H. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9, 286–298 (2008)
  18. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009)
  19. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010)
  20. Gabaldón, T. & Koonin, E. V. Functional and evolutionary implications of gene orthology. Nature Rev. Genet. 14, 360–366 (2013)
  21. Huerta-Cepas, J. et al. PhylomeDB v3.0: an expanding repository of genome-wide collections of trees, alignments and phylogeny-based orthology and paralogy predictions. Nucleic Acids Res. 39, D556–D560 (2011)
  22. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)
  23. Subramanian, A. R., Kaufmann, M. & Morgenstern, B. DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms Mol. Biol. 3, 6 (2008)
  24. Wallace, I. M., O’Sullivan, O., Higgins, D. G. & Notredame, C. M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 34, 1692–1699 (2006)
  25. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165 (2011)
  26. Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)
  27. Huerta-Cepas, J., Dopazo, J. & Gabaldón, T. ETE: a python Environment for Tree Exploration. BMC Bioinformatics 11, 24 (2010)
  28. Keeling, P. J. The number, speed, and impact of plastid endosymbioses in eukaryotic evolution. Annu. Rev. Plant Biol. 64, 583–607 (2013)
  29. Fraley, C., Raftery, A. E., Murphy, T. B. & Scrucca, L. mclust Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. Technical Report No. 597 (Department of Statistics, Univ. Washington, 2012)
  30. Greenacre, M. Correspondence Analysis in Practice (Chapman & Hall, 2007)


T.G. group research is funded in part by a grant from the Spanish Ministry of Economy and Competitiveness (BIO2012-37161), a grant from the European Union FP7 FP7-PEOPLE-2013-ITN-606786 and a grant from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013)/ERC (Grant Agreement number ERC-2012-StG-310325).

Author information


A.A.P. and T.G. conceived the study. A.A.P. performed the computational analyses. A.A.P. and T.G. analysed and interpreted the data. A.A.P. and T.G. wrote the manuscript.

Corresponding author

Correspondence to Toni Gabaldón.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Sister group distribution and extended phylogenetic distance profiles.

a, Ring plot showing the distribution of inferred prokaryotic origins. Inner layers represent hierarchically lower (broader) taxonomic levels. The number of LECA families assigned to each group is indicated in parentheses next to the corresponding level in the ring plot or in the boxes below. b, Box plot showing the distributions of branch lengths in the different bacterial components. Measured stem lengths (sl), raw stem lengths (rsl), and the medians of the lengths from LECA to branch tips inside the eukaryotic families (ebl), as defined in Fig. 1a, are shown. Permutation tests were performed to evaluate the statistical significance of the differences between the distributions. A total of 10⁶ permutations were performed, with the values being randomly shuffled in each permutation (see also Methods). The arrows and symbols above the boxes refer to the statistical significance of the differences observed compared with randomly shuffled distributions (lower values, downward red arrow; higher values, upward green arrow). The correspondence between the symbols and the P values is as follows: ~P ≤ 1 × 10⁻¹; *P ≤ 5 × 10⁻²; **P ≤ 1 × 10⁻²; ***P ≤ 1 × 10⁻³; ******P < 1 × 10⁻⁶. The lower and upper box limits correspond to the first and third quartiles (the 25th and 75th percentiles). c, d, Stem length profiles of the various functional categories (c) and GO slim cellular components (d) are shown. As in Fig. 2c, the stem lengths are also evaluated by looking only at the bacterial component to exclude the possibility that the observed differences are due solely to archaeal–bacterial differences. The significance was assessed with permutation tests (10⁶ permutations) and is indicated with arrows as in b.
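The permutation tests described in this legend can be illustrated with a minimal Python sketch. It assumes a one-sided test on the group mean and a flat list of stem lengths with one origin label per family. The function names, the choice of test statistic, the add-one P value correction and the toy data are illustrative assumptions, not details taken from the paper's Methods.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, group, n_perm=10_000, seed=0):
    """One-sided permutation test for a group with a low mean.

    Hypothetical helper: returns the observed mean of `group` and the
    fraction of shuffles whose mean at the same positions is at least
    as low (with an add-one correction, so P is never exactly 0).
    """
    rng = random.Random(seed)
    positions = [i for i, label in enumerate(labels) if label == group]
    observed = mean(values[i] for i in positions)
    shuffled = list(values)
    as_low = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between values and labels
        if mean(shuffled[i] for i in positions) <= observed:
            as_low += 1
    return observed, (as_low + 1) / (n_perm + 1)


def significance_symbol(p):
    """Map a P value to the symbols listed in the legend above."""
    if p < 1e-6:
        return "******"
    for threshold, symbol in ((1e-3, "***"), (1e-2, "**"), (5e-2, "*"), (1e-1, "~")):
        if p <= threshold:
            return symbol
    return ""


# Toy data (invented for illustration): four short and six long stems.
stem_lengths = [0.10, 0.20, 0.15, 0.12, 1.0, 1.2, 0.9, 1.1, 1.3, 0.95]
origins = ["alpha"] * 4 + ["other"] * 6
observed_mean, p = permutation_p_value(stem_lengths, origins, "alpha", n_perm=2000)
```

The smallest P value such a test can report is bounded by the number of permutations, which is why a symbol for P < 1 × 10⁻⁶ only becomes meaningful with on the order of 10⁶ shuffles.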

Extended Data Figure 2 Families of archaeal origin have significantly longer stems than families of bacterial origin across different functional categories, similar selective pressures, and connectivities/expression levels.

a, The stem lengths, raw stem lengths, and eukaryotic branch lengths, between families of archaeal and bacterial inferred origin, are compared across the three major functional categories. While the eukaryotic branch lengths among the groups do not show significant differences, differences are detected in their respective stems (raw stem lengths and stem lengths). b, Archaeal and bacterial LECA families of similar selective pressures (as measured by dN/dS values across family members) differ significantly in terms of their raw stem lengths. Sets of families from both groups were matched with respect to their dN/dS values in the indicated reference species. The dN/dS data were downloaded from Ensembl for family members corresponding to Homo sapiens (Metazoa), Aspergillus nidulans (fungi) and Zea mays (plants) (see Supplementary Information section 1). The comparison of the raw stem lengths of the two sets shows that archaeal families have significantly longer stems, both in general (upper plots) and for functions within the ‘information storage and processing’ category (lower plots), irrespective of their selective pressures. c, Archaeal and bacterial LECA families of similar connectivity/expression levels show significantly different raw stem lengths (see Supplementary Information section 1). In a–c, differences between the archaeal and bacterial component were evaluated with a two-tailed Mann–Whitney U-test and the P value is indicated in each case (*P ≤ 5 × 10⁻²; ~ P ≤ 1 × 10⁻¹; #P > 1).
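For readers unfamiliar with the two-tailed Mann–Whitney U-test used in these comparisons, the following is a minimal standard-library sketch. It uses the normal approximation without tie or continuity corrections, which is a simplification chosen here for brevity; in practice a statistics library routine such as `scipy.stats.mannwhitneyu` would normally be used. The function name and the toy values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U-test using the normal approximation.

    Simplifications (assumptions of this sketch): no tie correction of the
    variance and no continuity correction. Returns (U for x, P value).
    """
    pooled = sorted(list(x) + list(y))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal tail area
    return u1, p


# Toy raw stem lengths (invented): bacterial stems shorter than archaeal ones.
bacterial = [1, 2, 3, 4, 5]
archaeal = [6, 7, 8, 9, 10]
u_bacterial, p_value = mann_whitney_u(bacterial, archaeal)
```

Because the test is rank-based, it compares the two distributions without assuming that stem lengths are normally distributed.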

Extended Data Figure 3 Analysis of the cyanobacterial signal in primary plastid-bearing eukaryotes.

a, Ring plot showing the distribution of inferred prokaryotic origins in widespread plant protein families, as in Extended Data Fig. 1a. The profile of inferred origins of eukaryotes that acquired a plastid through primary endosymbiosis carries a strong signal from the cyanobacterial endosymbiont. b, c, Families of inferred cyanobacterial origin have significantly shorter stem lengths and raw stem lengths than alphaproteobacterial families (b) and than the random distribution of stem lengths from the bacteria-inferred component (c), pointing to a more recent acquisition of plastids (post-LECA). d, Overall, as with mitochondrial localized proteins, those proteins localized to plastids have shorter stems than the nuclear and endomembrane system proteins. e, Schematic representation of the expected difference in stems, given that cyanobacterial endosymbiosis occurred after the diversification of the major eukaryotic lineages. As confirmed, the raw stem lengths measured from plant protein families to their common ancestor with cyanobacteria are shorter than those whose origin can be traced back to Alphaproteobacteria or other bacterial groups. Two-tailed Mann–Whitney U-test P value symbols in b and d are as in Extended Data Fig. 1; additionally ****P ≤ 1 × 10⁻⁴; *****P ≤ 1 × 10⁻⁵.

Extended Data Figure 4 Effect of alternative LECA definitions.

a, The four eukaryotic groups including all 37 selected eukaryotic species used in the analysis are shown next to the NCBI taxonomic structure, with the higher groupings modified according to the Tree of Life Project. b, Stricter LECA definitions have a much larger effect on the bacterial component than on the archaeal component, which is more widespread among eukaryotic groups. c, The effect of different LECA definitions in terms of taxonomic assignments and differences in stem lengths between proteins of alphaproteobacterial origins and those derived from other bacteria. Numbers in parentheses indicate the total number of LECA families that passed the threshold. The kernel density plots, as in Fig. 2b, show the observed stem length means for Alphaproteobacteria compared with 10⁶ random samplings among values in protein families of bacterial origin. The observed means (μobs) are shown with a dashed red line, reflecting the P value of each test, and indicated next to the plot. See also Supplementary Information section 3.1.

Extended Data Figure 5 Alphaproteobacterial-derived proteins have consistently shorter branches, irrespective of the methods, data sets, and support thresholds.

Kernel density plots of the random mean distributions of the stem lengths are shown for the different methods, data sets and support thresholds used (see also Supplementary Information sections 3.2 and 3.3). The observed alphaproteobacterial means (μobs) are as in Fig. 2b. a, Results after using either the phylogenetic trees provided by the authors in ref. 8 (upper left), our standard phylogenetic pipeline applied to their sampling of sequences (upper right) or alternative phylogenetic pipelines or samplings from eggNOG (lower). b, The main result is robust against progressively stricter support thresholds until the sample size becomes too small (support threshold > 0.9). Numbers in parentheses indicate the number of bacteria-inferred LECA families for each threshold.

Extended Data Figure 6 Evaluation of alternative HGT scenarios and other potential biases.

a, The sampling effect was simulated by artificially removing part or all of the alphaproteobacterial sequences in the final data sets. To simulate the potential bias caused by an enriched sampling of Alphaproteobacteria, an artificial reduction of alphaproteobacterial sequences to 50% was applied to the data set (‘HALF alpha sampling’). The reduction of alphaproteobacterial sequences by 50% does not significantly change the inferred stem length within families of alphaproteobacterial origin. #Cases where the difference was not significant. b, Different scenarios of HGT to the proto-mitochondrion are unable to explain the observed signal in families mapped to non-alpha Bacteria. The transfer of a gene from Alphaproteobacteria to another bacterial lineage after mitochondrial endosymbiosis and its parallel loss from the lineage of the mitochondrial ancestor (‘post-mito HGT from alpha’) would result in unchanged stem lengths. Loss of a gene from the alphaproteobacterial sister clade would result in an increase of the inferred stem lengths (‘vertical transmission/pre-mito HGT from alpha’). The transfer of a gene from the protoeukaryotic lineage to other bacterial clades would result in shorter stem lengths compared with the alphaproteobacterial mappings (‘post-mito HGT from protoeukaryote’). c, Upon total exclusion of alphaproteobacterial sequences (‘NO alpha sampling’), eukaryotic families map to other bacterial groups but with stem lengths longer than those typically observed. The same is observed when comparing the stem lengths of the families mapping to proteobacterial groups in the absence of Alphaproteobacteria with those typically mapping to proteobacterial groups other than Alphaproteobacteria. d, Box plots showing that there are no significant differences in the stem lengths between alphaproteobacterial families with mitochondrial localization compared with those with other subcellular localizations (left), or between families involved in energy-related functions compared with those involved in other functional categories (right). e, Box plot showing no significant difference between the distribution of stem lengths of families of Rickettsiales-inferred origin and other Alphaproteobacteria. f, Alphaproteobacterial families in different functional categories show no difference in stem lengths. In all cases the distributions were compared using a two-sided Mann–Whitney U-test. See also Supplementary Information sections 4 and 5.
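The sampling simulations in panels a and c amount to removing a fraction of the sequences from one taxon before repeating the analysis. A minimal sketch of that down-sampling step is given below. The record layout of (sequence identifier, taxon label) pairs, the function name and the toy data are hypothetical, and the subsequent tree reconstruction and stem length measurement are outside the scope of the sketch.

```python
import random


def downsample_taxon(sequences, taxon, keep_fraction, seed=0):
    """Keep only `keep_fraction` of the sequences labelled `taxon`.

    `sequences` is a list of (sequence_id, taxon_label) pairs; this record
    layout is hypothetical. Other taxa and the original order are preserved.
    """
    rng = random.Random(seed)
    target = [i for i, (_, label) in enumerate(sequences) if label == taxon]
    n_keep = round(len(target) * keep_fraction)
    kept = set(rng.sample(target, n_keep))
    return [rec for i, rec in enumerate(sequences)
            if rec[1] != taxon or i in kept]


# Toy data set (invented): ten alphaproteobacterial and five other sequences.
sequences = [(f"alpha_{i}", "Alphaproteobacteria") for i in range(10)]
sequences += [(f"gamma_{i}", "Gammaproteobacteria") for i in range(5)]

half_alpha = downsample_taxon(sequences, "Alphaproteobacteria", 0.5)  # 'HALF alpha sampling'
no_alpha = downsample_taxon(sequences, "Alphaproteobacteria", 0.0)    # 'NO alpha sampling'
```

Fixing the random seed makes a given down-sampled data set reproducible, which helps when the same reduced sampling has to be passed through several downstream steps.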

Extended Data Figure 7 LECA inference and Lokiarchaeota.

Results after the inclusion of Lokiarchaeota in our analysis. a, The distribution of the sister group inference among prokaryotic taxonomy is shown in a ring plot together with the number of families in each group in parentheses (as in Extended Data Fig. 1). b, Box plot showing the stem length profiles of the various prokaryotic groups. Lokiarchaeota show the lowest values among all archaeal groups but higher values than any bacterial group. The symbols correspond to the same P values explained in Extended Data Fig. 1 after applying a permutation test (10⁶ permutations) for the archaeal and bacterial components, independently. c, Box plot with the comparison between the non-Loki archaeal, the Lokiarchaeota and the bacterial stem length profiles. The P value symbols are as before (two-sided Mann–Whitney U-test, correction for false discovery rate). d, Schematic representation of the effect of the absence of Lokiarchaeum sequences on the stem lengths. The inferred origin of 30 eukaryotic families that were previously mapped to other, mainly archaeal, groups within the eggNOG version 4 database, is Lokiarchaeota, when homologous sequences from this metagenome are included. A reduction in the observed stem lengths of the families of Lokiarchaeota-inferred origin is expected in the scenario of Lokiarchaeota being the closest known archaeal relative of Eukaryotes. See also Supplementary Information section 6.

Extended Data Figure 8 Correspondence of different LECA components with different cellular localizations and functions (extended version of Fig. 3).

a–d, Different LECA components have different GO cellular components (a, c) and functional (b, d) profiles. Genes of different origin tend to have different functions and subcellular localizations. a, b, The same correspondence analysis symmetrical biplots as in Fig. 3 in higher resolution, with the names of the taxonomic group, the function and the GO slim terms indicated next to the coordinates. The percentage of variance explained by each principal component is indicated next to each axis in parentheses. c, d, The contingency tables also used in correspondence analysis are shown in the form of a heatmap. The asterisks in the different cells reflect the significance of the association between a given origin and a localization (c) or function (d), as computed using permutation tests (10⁶ permutations), where the annotations among each eukaryotic family were reshuffled (see Methods). The correspondence between the symbols and the P values is as in Extended Data Figs 1 and 3. e, The COG functional categories, as organized in the three major groups ‘information storage and processing’, ‘cellular processes and signalling’ and ‘metabolism’.
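The contingency tables behind panels c and d can be sketched in a few lines of Python. The example below builds an origin-by-localization table from per-family annotations and computes Pearson residuals, (observed − expected) / √expected, under independence of rows and columns. Correspondence analysis decomposes the matrix of these residuals (scaled by 1/√n) by singular value decomposition; that step is omitted here. The function names and the toy annotations are hypothetical and only illustrate the bookkeeping.

```python
import math
from collections import Counter


def contingency_table(pairs):
    """Count (origin, annotation) pairs into a table with sorted row/column labels."""
    counts = Counter(pairs)
    rows = sorted({origin for origin, _ in pairs})
    cols = sorted({annotation for _, annotation in pairs})
    return rows, cols, [[counts[(r, c)] for c in cols] for r in rows]


def pearson_residuals(table):
    """(observed - expected) / sqrt(expected), with expected = row total * column total / n."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


# Toy annotations (invented): one (inferred origin, localization) pair per family.
families = (
    [("Archaea", "nucleus")] * 3
    + [("Alphaproteobacteria", "mitochondrion")] * 3
    + [("Archaea", "mitochondrion")]
    + [("Alphaproteobacteria", "nucleus")]
)
rows, cols, table = contingency_table(families)
residuals = pearson_residuals(table)
```

Positive residuals mark origin and localization combinations that occur more often than expected by chance, which is the kind of association the asterisks in the heatmaps are testing.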

Supplementary information

Supplementary Information

This file contains Supplementary Notes, which comprise alternative methods and tests, and additional references. (PDF 427 kb)

Supplementary Table 1

The selected 37 eukaryotic species and the 692 prokaryotic taxonomic levels used for sub-sampling the eggNOG v4 orthologous groups. (XLS 101 kb)

Supplementary Table 2

The file contains information on the protein families assigned to LECA, on which the subsequent analysis was performed. This includes information based on the phylogenetic inference (sister group, component assignment, branch length estimations) and the corresponding annotations, as provided by eggNOG v4 or as defined by the family's members. (XLS 304 kb)


Cite this article

Pittis, A., Gabaldón, T. Late acquisition of mitochondria by a host with chimaeric prokaryotic ancestry. Nature 531, 101–104 (2016).
