The plastid ancestor originated among one of the major cyanobacterial lineages

Ochoa de Alda, Jesús A. G.; Esteban, Rocío; Diago, María Luz; Houmard, Jean

doi:10.1038/ncomms5937

Article
Published: 15 September 2014

Nature Communications volume 5, Article number: 4937 (2014)

Abstract

The primary endosymbiotic origin of chloroplasts is now well established but the identification of the present cyanobacteria most closely related to the plastid ancestor remains debated. We analyse the evolutionary trajectory of a subset of highly conserved cyanobacterial proteins (core) along the plastid lineage, those which were not lost after the endosymbiosis. We concatenate the sequences of 33 cyanobacterial core proteins that share a congruent evolutionary history, with their eukaryotic counterparts to reconstruct their phylogeny using sophisticated evolutionary models. We perform an independent reconstruction using concatenated 16S and 23S rRNA sequences. These complementary approaches converge to a plastid origin occurring during the divergence of one of the major cyanobacterial lineages that include N₂-fixing filamentous cyanobacteria and species able to differentiate heterocysts.

Introduction

The appearance of free oxygen in the atmosphere results from an evolutionary biological breakthrough, and probably represents the most important biogeological event in Earth history. The innovation of oxygen-evolving photosynthesis occurred in precursors of cyanobacteria—a monophyletic group of microalgae recognized among prokaryotes by their ability to evolve oxygen. Cyanobacteria are also responsible for the spread of phototrophy among eukaryotic lineages. Many lines of evidence support that the (oxy)photosynthetic lifestyle of Archaeplastida (an evolutionary lineage grouping Glaucophyta, red and green algae, and green plants) derived from a common cyanobacterial ancestor that established a permanent endosymbiotic relationship with a mitochondriate ancestor. Some descendants of this primary endosymbiont underwent subsequent independent events (secondary and tertiary eukaryotic endosymbiosis), leading to the spread of oxygenic photosynthesis across an extremely diverse array of protists^{1,2,3,4,5,6,7}.

Cyanobacterial diversification was accompanied by one of the most outstanding increases in physiological and morphological complexity of the prokaryotic world⁸. Cyanobacteria were first subdivided into five taxonomic sections on the basis of morphological complexity and reproduction mode⁸. Although this complexity has been the driving force of classical cyanobacterial taxonomy, the recognition that most of these characters (multicellularity, nitrogen fixation and baeocyte formation) are polyphyletic made it necessary to define phylogenetic groups. Shih et al.⁹ generated a cyanobacterial species tree from a concatenation of 31 conserved proteins from 126 genomes, which defines seven clades, A to G. In Fig. 1 of their paper, they show that the subclades or groups do not map one-to-one onto the five previously defined morphological subsections, and that no specific or unique genetic determinants underlying these major phenotypes could be retrieved. The candidate phylum of Melainabacteria appears to be the closest non-photosynthetic sibling to cyanobacteria¹⁰. Gloeobacter violaceus PCC 7421 and a reduced number of Synechococcus strains (Group G) are descendants of early and most probably extinct divergent lineages^5,11,12. These were followed by divergence of groups F (which includes Pseudanabaena strains) and D (which includes Acaryochloris and Thermosynechococcus strains). Most extant cyanobacteria diversified from two major cyanobacterial lineages: (i) Group C, which includes Prochlorothrix sp., Prochlorococcus/Synechococcus subclades and Leptolyngbya sp., and (ii) Groups A and B, which include a great diversity of unicellular and multicellular strains, among which some are able to differentiate specific cells (heterocysts, hormogonia, akinetes and baeocytes)⁹.

**Figure 1: Phylogenetic position of endosymbiotic events inferred from rRNA sequences.**

Molecular phylogenies using single or concatenated sequences converge to a monophyletic origin for plastids^4,9,13,14, meaning that a single ancestral cyanobacterium underwent the successful primary event. However, the identification of the nearest current cyanobacterial species remains controversial (refs 1, 9, 13, 15 and references therein for a recent analysis), hindering the inference for the morphological, biochemical and physiological characteristics of the ancestor. Most phylogenetic analyses based on 16S ribosomal RNA or single protein sequences showed that all the plastids group in a single radiation, and position the progenitor very close to the root (group G) of the cyanobacterial tree, before the divergence of the major lineages^4,5. This ancient origin of plastids among the cyanobacterial radiation received support from phylogenetic reconstructions using concatenated protein and gene sequences of plants and cyanobacteria^9,13,15,16. However, these single-gene phylogenetic and phylogenomic approaches are prone to important biases, as recently reviewed by Williams et al.¹⁷

One approach to overcome pitfalls during reconstruction of ancient evolutionary events is to use refined models accounting for the phylogenetic landmarks that are diluted or buried (homoplasy) among a long and complex evolutionary history¹⁸. This must be accompanied by a strict selection of reliable phylomarkers among protein or DNA sequences that are resistant to horizontal gene transfer (HGT) and possess both strong evolutionary signals and a common phylogeny, as previously described^19,20. Analysing the genetic makeup for 13 cyanobacterial genomes, Shi and Falkowski²⁰ identified 682 single-copy genes ubiquitous to all genomes and reported a subset of 323 sequences (the core) that possessed strong phylogenetic information and showed similar evolutionary trajectories as opposed to the other 359 sequences (the shell) that exhibited divergent phylogenies (that is, independent evolution and frequent transfers). Concatenation of core sequences allowed them to obtain a highly resolved and supported cyanobacterial tree. Given that these core genes had a similar evolutionary trajectory, our rationale was that if some homologous sequences are still retained in the descendants of the primary endosymbiont, the cyanobacterial core could be used for tracing the evolution of the plastid lineage among cyanobacteria. This approach should reduce the phylogenetic noise due to conflicting signals arising from the cyanobacterial sequences affected by site saturation, hidden paralogy and/or HGT events before endosymbiosis. Such conflicting signals may accumulate when the markers are identified by choosing homologous plastid sequences as seeds, as achieved in previous phylogenomic reconstructions^9,13,15,16.

Here we report on the evolutionary trajectory of cyanobacterial core genes once the last common ancestor of current cyanobacteria and plastids became an endosymbiont into a mitochondriate host. We identify and concatenate core sequences still present in cyanobacteria and photosynthetic eukaryotes for an accurate phylogenetic reconstruction using complex evolutionary models. The resulting phylogeny is congruent with an independent reconstruction using concatenated small and large rRNA sequences from the same species and previous physiological clues for the plastid origin. Our analysis places plastid origin among members of one of the major cyanobacterial lineages that includes filamentous N₂-fixing cyanobacteria.

Results

The debate on plastid ancestor

Single-locus phylogenetic reconstructions return an extremely large confidence set of trees²¹, supporting both a deep²² and a recent^4,12 origin for plastids (Supplementary Fig. 1). On the other hand, phylogenomic results may be undermined by systematic errors if the phylogenetic reconstruction methods do not account for the complexity of the sequences (difference in evolutionary rates of sites and/or lineages) or if the concatenated data provide more phylogenetic noise (for example, hidden paralogy and HGT) than congruent phylogenetic information^17,19,20,23. As a result, in such studies concatenated plastid sequences could group with ancient cyanobacteria (groups F and G) either as a consequence of long-branch attraction¹⁶ or of the heterogeneity of the evolutionary history of the concatenated sequences¹⁸. In contrast, a more recent origin (plastids diverging with Groups A and B) has been suggested based on phylogenetic analyses of concatenated rRNA sequences¹², physiological data on starch storage²⁴ or protein similarity^1,25. However, these analyses may also be biased: ribosomal sequences are susceptible to stochastic error²⁶ and evolutionary model misspecification (Supplementary Fig. 1); common physiological traits can be acquired by convergence or retained by chance in different lineages; and protein similarity can be enhanced by reduced evolutionary rates after divergence. Thus, further work is needed to accurately determine the origin of the plastid lineage.

Phylogeny of concatenated 16S–23S rRNAs

A thorough phylogenetic reconstruction using a concatenation of large and small rRNA sequences (Supplementary Data 1) shows that the plastid lineage clusters with cyanobacterial groups A and B (posterior probability=0.99), as a sister group with group A and subgroup B2 (posterior probability=0.96) (Fig. 1). In this analysis and in contrast to previous works^12,22, we used an evolutionary model that accounts for heterogeneity among sites (CAT), allowing a good description of saturation and biochemical diversity of sequence alignments (Table 1). Discrepancies with previous works could result from previous misspecification of the evolutionary model (Supplementary Fig. 1). To further check the accuracy of the phylogenetic reconstruction, we increased the stringency for the selection of less-saturated characters in the multiple alignments (Supplementary Data 2). As described for simulated data^27,28, character trimming reduces confidence values for branches but increases the accuracy of phylogenetic reconstructions, that is, reduces the difference between the ‘true’ and the reconstructed trees. As expected from these previous works, confidence values for cluster support ≥0.95 (0.99 posterior probability on average) are reduced to an average of 0.74 after trimming. In spite of the increase in stringency, phylogenetic reconstruction recovered the monophyly of plastids as well as its clustering with groups A and B, but not as a sister of groups A and B2. This suggests that plastids arose during the diversification of the main groups. However, it does not end the current controversy on plastid origin, as the resulting topology differs from that obtained through previous phylogenomic approaches^{9,13,15,16,25}.

Table 1 Relevance of accounting for site heterogeneity during phylogenetic reconstructions.

Phylogenomics of the core genes in photosynthetic eukaryotes

We mined the complete sequences of cyanobacterial genomes and photosynthetic eukaryotes for the 323 cyanobacterial core sequences (as of May 2010, Supplementary Table 1). The number of sequences kept varies across photosynthetic eukaryotes, with only 38 common to all photosynthetic eukaryotes (Supplementary Data 3). Thus, only a few cyanobacterial core genes appear essential for intracellular lifestyle.

To further test our first results, we added 16 genomes to the 13 analysed by Shi and Falkowski²⁰, chosen on the basis of their belonging to distant groups, genome size and evolutionary rate. To reconstruct the cyanobacterial/plastid evolutionary history, we started with only 68 (out of 323) cyanobacterial core genes (PCD data set, Supplementary Data 4), none being duplicated in the available cyanobacterial sequences (as of May 2011) and all being present simultaneously in a diatom (Phaeodactylum tricornutum), a red alga (Cyanidioschyzon merolae) and a green plant (Physcomitrella patens). This data set was further reduced to 48 sequences (CyPlas data set, Supplementary Data 4), those for which protein trees were congruent (P-value>0.05, Supplementary Data 4) with at least one of six topologies for the species tree (Supplementary Fig. 2 and Supplementary Data 5–7); these topologies are likely to approach the evolutionary history of cyanobacteria.

We further analysed the congruence of the CyPlas data set with five evolutionary scenarios: (i) the 16S–23S rRNA tree reconstructed using Phylobayes; (ii) two trees reconstructed from the concatenated CyPlas data set using both PhyML and Phylobayes; (iii) a consensus tree obtained with the 48 single-gene trees of the CyPlas data set; and (iv) a tailored tree in which plastids diverged together with heterocystous cyanobacteria as recently suggested²⁵ (Fig. 2a–e and Supplementary Data 8–10). Phylogenies based on protein sequences (Consensus, PhyML and Phylobayes) are the best guide trees for the common evolutionary history of individual gene trees, being in the confidence set (P-value≥0.05) of 33 sequences (Table 2). In fact, 28 of these genes were congruent simultaneously with topologies supporting an ancient origin of plastids (proposed by the PhyML and consensus trees) over a recent origin of plastids (proposed by Phylobayes tree), highlighting their limits to solve cyanobacteria–plastid phylogeny (Fig. 2f).

**Figure 2: Selection of phylomarkers for phylogeny of cyanobacteria and plastids.**

Table 2 Set of 33 cyanobacterial core genes selected.

The set of 33 sequences of plastids and cyanobacteria having a congruent evolutionary history (Table 2) were concatenated for phylogenetic reconstructions (Supplementary Data 11). In agreement with previously published analyses, maximum likelihood and Bayesian inference using LG+discrete gamma rate substitutions (Γ) evolutionary model supported with maximal statistical values (approximate Likelihood-Ratio Test (aLRT) and posterior probability=1) the basal emergence of plastids among the cyanobacterial tree (Supplementary Fig. 3A). However, this high statistical support does not necessarily ensure an accurate phylogenetic reconstruction if it is not supported by model assessment^18,29. A posterior predictive analysis confirms that the PhyML topology that points to an ancient origin for plastids was the result of a model misspecification and that the LG+ Dirichlet (d)+CAT model, which accounts for heterogeneity across sites (CAT), is a good prediction of evolutionary history (Supplementary Fig. 3C). This model was further improved by accounting for heterogeneity over time (General-Time-Reversible model (GTR)+d+CAT model) without any change in the topology (Fig. 3). The clustering of plastid lineage with groups A and B (posterior probability=0.99) is congruent with our previous reconstruction using ribosomal sequences (Fig. 1). The distance from the plastid grafting point to the tips of heterocystous cyanobacteria appears as the shortest among the tree, in agreement with the remarkable similarity of the cyanobacterial proteins inherited by plants with those from heterocystous (Group B1) organisms^1,25. The inclusion of Porphyra purpurea sequences in the data set reduces the number of available genes from 33 to 30 (Supplementary Data 12). This does not alter the tree topology but increases to 0.99 the posterior probability for the monophyly of plastids (Supplementary Fig. 4A). 
In contrast, the additional inclusion of Cyanophora paradoxa and four cyanobacteria (Gloeocapsa sp. PCC 7428, Rivularia sp. PCC 7116, Oscillatoria sp. PCC 6506 and Crinalium epipsammum PCC 9333) reduces the number of congruent genes to 18 (Supplementary Data 13), which results in a reduction of branch support, whereas it maintains the Group A, B and plastid cluster (Supplementary Fig. 4B). These results thus point to the diversification of plastids within the major cyanobacterial lineages.

**Figure 3: Core phylogenomics converges on a recent origin for plastids.**

Plastid origin versus cyanobacterial diversification

The recent availability of genome sequences covering the wide cyanobacterial diversity⁹, as well as those of several photosynthetic eukaryotes, makes it possible to improve the phylogeny by increasing the number and diversity of sampled taxa. Given the paucity of phylogenetically congruent proteins, we carried out a phylogenetic reconstruction using only concatenated rRNA sequences from 120 cyanobacteria, Paulinella chromatophora and 14 plastids (Supplementary Fig. 5 and Supplementary Data 14). As the root of cyanobacteria has been recently questioned³⁰, we included three diverse Melainabacteria (the closest related outgroup)¹⁰ in the data set to root the phylogenetic tree (Supplementary Data 15 and 16). Reduction of data set complexity (number of sequences, redundancy, saturation and compositional heterogeneity) converges towards the clustering of the plastid lineage with group A (Fig. 4, Supplementary Table 2, Supplementary Figs 6 and 7, and Supplementary Data 17–20). A recent phylogenetic reconstruction using concatenated protein-coding genes and refined methods ascribes this branching point to a compositional bias¹⁵. We observed, however, that the phylogenetic reconstruction after mitigation of compositional bias (from 13 to 2 s.d.) maintains the plastid lineage as a sister of group A (Supplementary Fig. 6). Notably, after mitigation of compositional bias, the posterior probability of plastids as a sister of non-heterocystous filamentous N₂-fixing cyanobacteria (members of family Oscillatoriaceae) reaches 0.9 (0.76 + 0.14), as plastids cluster with group A with a bipartition frequency of 0.76 and with Microcoleus strains with a bipartition frequency of 0.14 (Table 3). This is consistent with the hypothesis of heterocystous cyanobacteria as the more recent common ancestor of plastids¹, as according to our phylogenetic analysis heterocystous cyanobacteria evolved from non-heterocystous filamentous N₂-fixing cyanobacteria of Group A or from Microcoleus-related strains (Figs 2, 3, 4).

**Figure 4: Increasing the phylogenetic diversity of the rRNA data set places the plastid lineage as a sister of group A.**

Table 3 Mitigating compositional bias.

The resulting rRNA tree supports the origin of plastids among already evolved cyanobacteria and fits the topology of the cyanobacterial groups of our phylogenomic tree: (i) it positions Gloeobacter at the root of the tree; (ii) Groups G, E and C diverge following the order described before; and (iii) it supports the divergence of plastids among already evolved cyanobacteria.

Discussion

Overall, our phylogenetic reconstructions using ribosomal and protein sequences were congruent. One important exception was the branching position of Microcoleus chthonoplastes PCC 7420, recently renamed Coleofasciculus chthonoplastes³¹. It clustered with subgroup B2 in protein phylogeny (in agreement with other phylogenomic reconstructions^13,25) but with group A in ribosomal phylogeny (in agreement with morphological and physiological data³¹, and exceptional domain acquisition of Val-tRNA synthetases³²). Lodders et al. provided evidence that genetic recombination frequently occurs in natural populations of the cyanobacterium M. chthonoplastes³³ and that the nitrogenase cluster has been horizontally acquired³⁴. This highlights the complex evolutionary history of this strain, in which massive gene acquisitions have recently been reported²⁵.

Our results suggest that plastids arose during the diversification of groups A and B1 (Fig. 4), which encompass a majority of N₂-fixing filamentous cyanobacteria; they are more closely related to group A, as they cluster with a relatively high support compared with well-described nodes. Thus, in contrast to the current dominant opinion, the plastid lineage probably has close relatives among extant cyanobacteria and it is not the sole survivor of an extinct lineage of cyanobacteria that diverged among groups G^13,15 and F⁹ more than 2.5 Bya^3,5.

Current estimates date the group A and B1 diversification to some 1.75–2 Bya, and group A diversification to 1.5–1.75 Bya^5,12, which is close to the date estimated for the primary endosymbiosis and for the last common ancestor of extant Archaeplastida (1.428–1.67 Bya)^3,35,36,37 and far from the Great Oxygenation Event (2.45–2.32 Bya)⁵.

Our work accounts for previous discrepancies in the proposed phylogenies and gives support to a rather recent origin for the plastid lineage. It positions the last common ancestor of extant cyanobacteria and plastids after the diversification of clades A–B (Figs 1, 2, 3, 4), more probably as a sister of group A (Fig. 4). This diversification could have occurred 1.5–1.75 Bya, that is, after the Great Oxygenation Event^5,12. Eukaryotes would thus not have been major factors in the early stages of the atmosphere oxygenation. Furthermore, the rise in atmospheric oxygen could have been the driving force that promoted some N₂-fixing cyanobacteria to invade the microaerobic environment found in the cytosol of a mitochondriate phagotroph so as to protect their nitrogenase against O₂ inhibition. As feedback, the hosting cell may have benefitted from carbon and nitrogen-rich exudates from the endosymbiont.

Although cyanobacterial endosymbioses are common in nature (for example, P. chromatophora or the diatom Rhopalodia gibba²), none of these more recent endosymbioses has had the ecological success of the Archaeplastida primary plastid lineage or its secondary and tertiary plastid descendants. In addition, this work points to a set of core genes, and to a cluster of N₂-fixing filamentous cyanobacteria (groups A and B1), on which future synthetic endosymbionts could be based.

Methods

Experimental design

Our phylogenomic experimental design involved: (i) a diversity-driven selection of cyanobacteria; (ii) the reconstruction of guide trees tracing the vertical evolution of this phylum; (iii) the identification of orthologous phylogenetic markers congruent to these trees; (iv) the addition to these markers of eukaryotic homologues of cyanobacterial origin; and (v) the phylogenetic reconstruction of cyanobacterial and plastid evolution using concatenated markers and refined evolutionary models.

Taxonomic sampling

Cyanobacteria were initially selected among 57 genomes available in 2010 on the basis of their position in a phylogenetic tree deduced from small subunit rRNA sequences; indeed this gene is a good diversity predictor of the universal gene core present in bacterial genomes³⁸. As a rule, we identified the most divergent lineages from the root to the branch tips of the tree, and among these, the slowest evolving strains with the largest genomes (Supplementary Table 1). We excluded closely related strains, as they add low genetic diversity while increasing the probability of incongruence by hidden/undetected HGT and biasing the heterogeneity of amino acids towards a given composition; this would have occurred if we had included all the marine Synechococcus and Prochlorococcus genomes^39,40,41. The cyanobacterial data set was completed with photosynthetic eukaryotes for which the complete genome was available (May 2010). However, due to scarcity of orthologues for the reconstruction with concatenated sequences, this data set was reduced to three eukaryotes showing the highest diversity, slowest evolutionary rate and the largest number of cyanobacterial core genes in common: a diatom (P. tricornutum), a red alga (C. merolae) and a green plant (P. patens). The inclusion of a single green plant reduced the potential impact on the incongruence test of duplications and hidden paralogy, which are frequent in this lineage. Finally, as the position of the root of cyanobacteria was questioned during the work³⁰, and the number of available genomes increased following a diversity-driven effort⁹, we expanded the taxon sampling to three diverse Melainabacteria¹⁰ so as to root the phylogenetic tree, and to 120 cyanobacteria, P. chromatophora and 14 plastids for which a full set of small (Supplementary Data 15) and large (Supplementary Data 16) rRNA gene sequences was available in June 2013 in the JGI-DOE⁴² and SILVA⁴³ databases.

Data set selection, retrieval, concatenation and assessment

Small and large ribosomal sequences were retrieved from JGI-DOE⁴² and SILVA Databases, and aligned using SILVA tools⁴³ (bases remaining unaligned at the end were removed). BMGE²⁷ was used to remove gaps and constant positions from rRNA alignments and for selection of phylogenetic informative characters (-w 1 -h 1E-5:1 setting) under default (PAM100 matrix, -m DNAPAM100:2 -w 1 -g 0.0 -b 1 setting) or very stringent conditions (PAM1 matrix, -m DNAPAM1:2 -w 1 -g 0.0 -b 1 setting). A comparison of phylogenetic reconstructions using default and stringent conditions allowed us to estimate tree accuracy (more accurate under stringent conditions) and confidence values for branches (more reliable under default conditions)^27,28. Constant sites were removed before phylogenetic reconstructions because it allows a better fit of models to data and reduces computing time.
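The simplest part of this trimming step, discarding gap-containing and constant alignment columns, can be illustrated with a minimal Python sketch. It is not BMGE and does not reproduce its entropy-based selection of informative characters; the function name `drop_gapped_and_constant_columns` and the toy sequences are assumptions made for illustration only.

```python
from typing import Dict


def drop_gapped_and_constant_columns(
    alignment: Dict[str, str], gap_chars: str = "-."
) -> Dict[str, str]:
    """Return the alignment without gap-containing or constant columns.

    Illustrative stand-in for the column filtering described in the text;
    it does not implement BMGE's entropy-based character selection.
    """
    names = list(alignment)
    seqs = [alignment[name].upper() for name in names]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for i in range(length):
        column = [seq[i] for seq in seqs]
        if any(char in gap_chars for char in column):
            continue  # column contains at least one gap
        if len(set(column)) < 2:
            continue  # column is constant across all taxa
        kept.append(i)
    return {name: "".join(seq[i] for i in kept) for name, seq in zip(names, seqs)}


# Toy alignment (hypothetical sequences, not data from the study).
toy = {
    "taxonA": "ACGT-ACGA",
    "taxonB": "ACGTTACGC",
    "taxonC": "ACCTTACGG",
}
trimmed = drop_gapped_and_constant_columns(toy)
print(trimmed)
```

Only the third and last columns of the toy alignment survive, because every other column is either constant or contains a gap.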

Eukaryotic proteins of cyanobacterial origin were identified after BLASTp searches⁴⁴ using the amino acid sequences from G. violaceus PCC 7421 (Supplementary Data 2) as seed data set against the Refseq-NCBI database⁴⁵ (Summer 2010), allowing 1,000–5,000 maximum target sequences. A eukaryotic top hit within the BLOSUM62 score range of cyanobacteria was the first evidence of a common origin. Blast results allowed us to ascertain the number of gene copies per cyanobacterium (using the Blast taxonomy report), the presence of eukaryotic counterparts and their evolutionary relationship with cyanobacteria (using Tree-blast phylogenetic reconstruction), either as a sister group or as originating from other bacteria. A second Blastp was performed to detect the absence/presence in photosynthetic eukaryotes by filtering for cyanobacteria and the selected eukaryotes. Selected protein sequences were retrieved and aligned (MAFFT⁴⁶) and the translation start point reassigned (if required) using tBlastn⁴⁷. Selection of reliable positions (removing gaps and fastest evolving sites) was carried out using Gblocks under default settings⁴⁸.

Guide trees

To identify sequences orthologous to cyanobacterial genes, we used several guide trees that probably approximate the ‘real’ species tree. For the reconstruction of guide trees, we used two phylogenetic reconstruction approaches, PhyML 3.0 (ref. 49) and Phylobayes 3.3e⁵⁰, and three different alignments: (i) small subunit rRNA sequences (Supplementary Data 5), (ii) a concatenation of the large and small rRNA sequences (Supplementary Data 6) and (iii) a concatenation of protein phylogenetic markers exhibiting a congruent evolutionary history¹¹ (Supplementary Data 7). The latter was done in two steps⁴⁷: we first concatenated Ciccarelli’s sequences¹¹ to carry out a phylogenetic reconstruction using Phylobayes (GTR+4Γ+CAT). The approximately unbiased (AU) test^51,52 was then used to select a subset of sequences congruent with the resulting topology. These 13 sequences were in turn concatenated (Supplementary Data 7) and used for the reconstruction of the guide trees shown in Supplementary Fig. 2.
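Concatenating marker alignments into a single supermatrix can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `concatenate_alignments`, the taxon labels and the toy sequences are hypothetical, and padding a missing taxon with gaps is an assumption (the data sets described here appear to retain only markers present in all selected taxa).

```python
from typing import Dict, List


def concatenate_alignments(
    markers: List[Dict[str, str]], missing: str = "-"
) -> Dict[str, str]:
    """Join per-marker alignments end to end, one row per taxon.

    A taxon absent from a marker is padded with `missing` characters so
    that every row of the supermatrix has the same length (an assumption
    made for this sketch).
    """
    taxa = sorted({name for marker in markers for name in marker})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for marker in markers:
        lengths = {len(seq) for seq in marker.values()}
        if len(lengths) != 1:
            raise ValueError("each marker must be a non-empty, equal-length alignment")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(marker.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Two toy protein markers (hypothetical sequences, not data from the study).
marker_1 = {"Gloeobacter": "MKV", "Nostoc": "MRV"}
marker_2 = {"Gloeobacter": "LLA", "Nostoc": "LIA", "plastid": "LVA"}
supermatrix = concatenate_alignments([marker_1, marker_2])
print(supermatrix)
```

In the toy example the plastid row starts with three gap characters because the first marker has no plastid sequence.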

Evolutionary model selection and phylogenetic reconstruction

We used the Akaike Information Criterion implemented in jModelTest 0.1 (ref. 53) and Prottest 2.4 (ref. 54) to select the best evolutionary models for the PhyML⁴⁹ reconstruction of DNA and protein sequence alignments, respectively. Model selection progressed in two steps. We first delimited the number of evolutionary models by selecting the best two models among 88 (jModelTest) or 14 (ProtTest) candidate models, and then we improved the model by adjusting Γ discontinuous rates from 4 to 16. However, for the PhyML reconstruction of multiple alignments containing more than 90 sequences, we used the Bayesian Information Criterion and Model Averaged Phylogeny implemented in jModelTest 2.1.4 (ref. 55) to select the best evolutionary models among 1,624 available. Models were finally refined using Phylobayes 3.3e to account for compositional heterogeneity across sites (CAT, 20 profiles)²⁹ and over time (GTR)⁵⁰ as well as rates across sites, following either a Dirichlet (d) process or discrete Γ distributions from 4 to 16 categories. To select the best evolutionary model among Bayesian reconstructions, we carried out a posterior predictive analysis of saturation (number of substitutions and level of homoplasy) and of the mean number of different amino acids per column²⁹ using the ppred programme implemented in Phylobayes. A consensus tree was obtained from trees sampled from the chain showing the best posterior predictions. Convergence of two chains was achieved using a parallelized version of phylobayes (MPI phylobayes⁵⁶) and was checked with the bpcomp programme, whereby convergence was reached if the maxdiff value of the four chains was ≤0.1. All Bayesian analyses were run at the University of Oslo’s Bioportal (www.bioportal.uio.no), Calendula (FCSCL, León, Spain) and Cipres Gateway⁵⁷ High Performance Computing Clusters.
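The convergence criterion can be made concrete with a short Python sketch of a maxdiff-style statistic: the largest difference, over all bipartitions, between the frequencies observed in independent chains. This is not the bpcomp programme; the function names, the encoding of a bipartition as the set of taxa on one side, and the toy chains are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, FrozenSet, Sequence

# A bipartition is encoded by the taxa on one side of the split; the caller
# is assumed to use the same side consistently for every tree.
Bipartition = FrozenSet[str]
Tree = Sequence[Bipartition]


def bipartition_frequencies(trees: Sequence[Tree]) -> Dict[Bipartition, float]:
    """Fraction of the sampled trees of one chain containing each bipartition."""
    if not trees:
        raise ValueError("a chain must contain at least one sampled tree")
    counts: Counter = Counter()
    for splits in trees:
        counts.update(set(splits))
    return {bp: count / len(trees) for bp, count in counts.items()}


def maxdiff(chains: Sequence[Sequence[Tree]]) -> float:
    """Largest across-chain discrepancy in bipartition frequency."""
    freqs = [bipartition_frequencies(chain) for chain in chains]
    all_bipartitions = set()
    for freq in freqs:
        all_bipartitions.update(freq)
    worst = 0.0
    for bp in all_bipartitions:
        values = [freq.get(bp, 0.0) for freq in freqs]
        worst = max(worst, max(values) - min(values))
    return worst


# Toy chains over five taxa A-E (hypothetical samples, not data from the study).
ab, de, cd = frozenset({"A", "B"}), frozenset({"D", "E"}), frozenset({"C", "D"})
chain_1 = [[ab, de], [ab, de], [ab, de], [ab, cd]]
chain_2 = [[ab, de], [ab, de], [ab, cd], [ab, cd]]
value = maxdiff([chain_1, chain_2])
converged = value <= 0.1  # threshold quoted in the text
print(value, converged)
```

In this toy case the two chains disagree by 0.25 on two bipartitions, so the ≤0.1 threshold is not met and the chains would need to run longer.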

Finally, we evaluated the stability of the topology to variations in compositional heterogeneity (progressively suppressing sequences showing more than 3 or 2 s.d. of the mean) and taxon sampling (Supplementary Data 20). The ppred programme implemented in Phylobayes was used to select the sequences to remove in order to mitigate compositional bias.
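The idea of suppressing compositionally deviant sequences can be sketched in Python as below. This is a deliberately simplified stand-in, not the posterior predictive test performed by ppred: here each sequence's deviation is the Euclidean distance between its character frequencies and the mean frequencies of the data set, and a sequence is flagged when that deviation lies more than `n_sd` standard deviations above the mean deviation. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def composition(seq: str, alphabet: str) -> List[float]:
    """Relative frequency of each alphabet character in one sequence."""
    counts = Counter(char for char in seq.upper() if char in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no characters from the alphabet")
    return [counts[char] / total for char in alphabet]


def compositional_outliers(
    alignment: Dict[str, str], alphabet: str = "ACGT", n_sd: float = 2.0
) -> List[str]:
    """Names of sequences whose compositional deviation exceeds n_sd s.d.

    Simplified illustration only; ppred in Phylobayes instead compares
    observed deviations with a posterior predictive distribution.
    """
    names = list(alignment)
    comps = [composition(alignment[name], alphabet) for name in names]
    mean_comp = [sum(column) / len(comps) for column in zip(*comps)]
    deviations = [
        math.sqrt(sum((x - m) ** 2 for x, m in zip(comp, mean_comp)))
        for comp in comps
    ]
    mu = sum(deviations) / len(deviations)
    sd = math.sqrt(sum((d - mu) ** 2 for d in deviations) / len(deviations))
    if sd == 0:
        return []  # all sequences deviate equally; nothing to flag
    return [name for name, d in zip(names, deviations) if (d - mu) / sd > n_sd]


# Five balanced toy sequences and one GC-rich one (hypothetical data).
toy = {f"taxon{i}": "ACGTACGTACGT" for i in range(1, 6)}
toy["gc_rich"] = "GGGGCCGGGCCG"
flagged_2sd = compositional_outliers(toy, n_sd=2.0)
flagged_3sd = compositional_outliers(toy, n_sd=3.0)
print(flagged_2sd, flagged_3sd)
```

With six sequences a single outlier can reach a z-score of at most about 2.24, so the GC-rich toy sequence is flagged at 2 s.d. but not at 3 s.d., which mirrors why a stricter or looser threshold changes how many sequences are suppressed.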

Topology testing

We used the Weighted Shimodaira–Hasegawa test implemented in CONSEL⁵¹ to estimate the P-values of a set of topologies for a given alignment of sequences and its corresponding optimal evolutionary models (Supplementary Data 3). Each of these models was used to calculate the likelihood per site of candidate trees (no more than 50 trees per run) using PhyML. Parameters and branch length (but not topology) were optimized and the branch support was not calculated.

According to Shimodaira⁵², Weighted Shimodaira–Hasegawa test (WSH-test) is more adequate than AU test when several best trees (our six guide trees for cyanobacterial vertical evolution) are included in the set of candidate trees together with the optimal PhyML tree. To reduce sampling error, we increased ten times the number of replicates. We considered genes as orthologues if they had at least one guide tree topology in their confidence set of trees (P-value>0.05).
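The selection rule stated above (a gene is retained if at least one guide tree topology has P-value>0.05 for its alignment) can be written as a minimal Python sketch. The function name `congruent_genes`, the gene and tree labels, and the P-values are invented for illustration; the P-values in the study come from the WSH-test in CONSEL.

```python
from typing import Dict, List


def congruent_genes(
    p_values: Dict[str, Dict[str, float]],
    guide_trees: List[str],
    alpha: float = 0.05,
) -> List[str]:
    """Genes with at least one guide tree in their confidence set.

    `p_values[gene][tree]` is the topology-test P-value of `tree` for the
    alignment of `gene`; a tree is in the confidence set if P > alpha.
    """
    selected = []
    for gene, per_tree in p_values.items():
        # A missing entry is treated as a rejected topology (assumption).
        if any(per_tree.get(tree, 0.0) > alpha for tree in guide_trees):
            selected.append(gene)
    return sorted(selected)


# Invented P-values for three hypothetical genes and two guide trees.
example = {
    "geneA": {"rRNA_tree": 0.40, "protein_tree": 0.01},
    "geneB": {"rRNA_tree": 0.02, "protein_tree": 0.03},
    "geneC": {"rRNA_tree": 0.05, "protein_tree": 0.20},
}
kept = congruent_genes(example, ["rRNA_tree", "protein_tree"])
print(kept)
```

Here geneB is rejected because both guide trees fall outside its confidence set, while geneC is kept on the strength of the protein tree alone (its rRNA-tree P-value of exactly 0.05 does not satisfy the strict inequality).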

Additional information

How to cite this article: Ochoa de Alda, J.A.G. et al. The plastid ancestor originated among one of the major cyanobacterial lineages. Nat. Commun. 5:4937 doi: 10.1038/ncomms5937 (2014).

References

Deusch, O. et al. Genes of cyanobacterial origin in plant nuclear genomes point to a heterocyst-forming plastid ancestor. Mol. Biol. Evol. 25, 748–761 (2008).
Gould, S. B., Waller, R. F. & McFadden, G. I. Plastid evolution. Annu. Rev. Plant Biol. 59, 491–517 (2008).
Parfrey, L. W., Lahr, D. J., Knoll, A. H. & Katz, L. A. Estimating the timing of early eukaryotic diversification with multigene molecular clocks. Proc. Natl Acad. Sci. USA 108, 13624–13629 (2011).
Sato, N. Origin and evolution of plastids: genomic view on the unification and diversity of plastids. in The Structure and Function of Plastids. Advances in Photosynthesis and Respiration Vol. 23 (eds Wise, R. R. & Hoober, J. K.) 75–102 (2006).
Schirrmeister, B. E., de Vos, J. M., Antonelli, A. & Bagheri, H. C. Evolution of multicellularity coincided with increased diversification of cyanobacteria and the Great Oxidation Event. Proc. Natl Acad. Sci. USA 110, 1791–1796 (2013).
Adl, S. M. et al. The new higher level classification of eukaryotes with emphasis on the taxonomy of protists. J. Eukaryot. Microbiol. 52, 399–451 (2005).
Keeling, P. J. The number, speed, and impact of plastid endosymbioses in eukaryotic evolution. Annu. Rev. Plant Biol. 64, 583–607 (2013).
Rippka, R., Deruelles, J., Waterbury, J. B., Herdman, M. & Stanier, R. Y. Generic assignments, strain histories and properties of pure cultures of cyanobacteria. J. Gen. Microbiol. 111, 1–61 (1979).
Shih, P. M. et al. Improving the coverage of the cyanobacterial phylum using diversity-driven genome sequencing. Proc. Natl Acad. Sci. USA 110, 1053–1058 (2013).
Di Rienzi, S. C. et al. The human gut and groundwater harbor non-photosynthetic bacteria belonging to a new candidate phylum sibling to cyanobacteria. eLife 2, e01102 (2013).
Ciccarelli, F. D. et al. Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287 (2006).
Falcon, L. I., Magallon, S. & Castillo, A. Dating the cyanobacterial ancestor of the chloroplast. ISME J. 4, 777–783 (2010).
Criscuolo, A. & Gribaldo, S. Large-scale phylogenomic analyses indicate a deep origin of primary plastids within cyanobacteria. Mol. Biol. Evol. 28, 3019–3032 (2011).
Gray, M. & Archibald, J. in Genomics of Chloroplasts and Mitochondria. Advances in Photosynthesis and Respiration Vol. 35 (eds Bock, R. & Knoop, V.) Ch. 1, 1–30 (Springer Netherlands, 2012).
Li, B., Lopes, J. S., Foster, P. G., Embley, T. M. & Cox, C. J. Compositional biases among synonymous substitutions cause conflict between gene and protein trees for plastid origins. Mol. Biol. Evol. 31, 1697–1709 (2014).
Rodriguez-Ezpeleta, N. et al. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 15, 1325–1330 (2005).
Williams, T. A., Foster, P. G., Cox, C. J. & Embley, T. M. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236 (2013).
Philippe, H. et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9, e1000602 (2011).
Martin, W. et al. Gene transfer to the nucleus and the evolution of chloroplasts. Nature 393, 162–165 (1998).
Shi, T. & Falkowski, P. G. Genome evolution in cyanobacteria: the stable core and the variable shell. Proc. Natl Acad. Sci. USA 105, 2510–2515 (2008).
Castresana, J. Topological variation in single-gene phylogenetic trees. Genome Biol. 8, 216 (2007).
Schirrmeister, B. E., Antonelli, A. & Bagheri, H. C. The origin of multicellularity in cyanobacteria. BMC Evol. Biol. 11, 45 (2011).
Philippe, H. & Roure, B. Difficult phylogenetic questions: more data, maybe; better methods, certainly. BMC Biol. 9, 91 (2011).
Deschamps, P. et al. Metabolic symbiosis and the birth of the plant kingdom. Mol. Biol. Evol. 25, 536–548 (2008).
Dagan, T. et al. Genomes of stigonematalean cyanobacteria (Subsection V) and the evolution of oxygenic photosynthesis from prokaryotes to plastids. Genome Biol. Evol. 5, 13 (2013).
Jeffroy, O., Brinkmann, H., Delsuc, F. & Philippe, H. Phylogenomics: the beginning of incongruence? Trends Genet. 22, 225–231 (2006).
Criscuolo, A. & Gribaldo, S. BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol. 10, 210 (2010).
Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56, 564–577 (2007).
Lartillot, N., Brinkmann, H. & Philippe, H. Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol. Biol. 7(Suppl 1), S4 (2007).
Szöllősi, G. J., Boussau, B., Abby, S. S., Tannier, E. & Daubin, V. Phylogenetic modeling of lateral gene transfer reconstructs the pattern and relative timing of speciations. Proc. Natl Acad. Sci. USA 109, 17513–17518 (2012).
Siegesmund, M. A., Johansen, J. R., Karsten, U. & Friedl, T. Coleofasciculus gen. nov. (Cyanobacteria): morphological and molecular criteria for revision of the genus Microcoleus Gomont. J. Phycol. 44, 1572–1585 (2008).
Olmedo-Verd, E., Santamaria-Gomez, J., Ochoa de Alda, J. A. G., Ribas de Pouplana, L. & Luque, I. Membrane anchoring of aminoacyl-tRNA synthetases by convergent acquisition of a novel protein domain. J. Biol. Chem. 286, 41057–41068 (2011).
Lodders, N., Stackebrandt, E. & Nubel, U. Frequent genetic recombination in natural populations of the marine cyanobacterium Microcoleus chthonoplastes. Environ. Microbiol. 7, 434–442 (2005).
Bolhuis, H., Severin, I., Confurius-Guns, V., Wollenzien, U. I. & Stal, L. J. Horizontal transfer of the nitrogen fixation gene cluster in the cyanobacterium Microcoleus chthonoplastes. ISME J. 4, 121–130 (2010).
Yoon, H. S., Hackett, J. D., Ciniglia, C., Pinto, G. & Bhattacharya, D. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol. 21, 809–818 (2004).
Douzery, E. J., Snell, E. A., Bapteste, E., Delsuc, F. & Philippe, H. The timing of eukaryotic evolution: does a relaxed molecular clock reconcile proteins and fossils? Proc. Natl Acad. Sci. USA 101, 15386–15391 (2004).
Shih, P. M. & Matzke, N. J. Primary endosymbiosis events date to the later Proterozoic with cross-calibrated phylogenetic dating of duplicated ATPase proteins. Proc. Natl Acad. Sci. USA 110, 12355–12360 (2013).
Wu, D. et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462, 1056–1060 (2009).
Dufresne, A., Garczarek, L. & Partensky, F. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 6, R14 (2005).
Zhaxybayeva, O., Doolittle, W. F., Papke, R. T. & Gogarten, J. P. Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol. Evol. 1, 325–339 (2009).
Paul, S., Dutta, A., Bag, S. K., Das, S. & Dutta, C. Distinct, ecotype-specific genome and proteome signatures in the marine cyanobacteria Prochlorococcus. BMC Genomics 11, 103 (2010).
Markowitz, V. M. et al. IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res. 40, D115–D122 (2012).
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Pruitt, K. D., Tatusova, T., Brown, G. R. & Maglott, D. R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135 (2012).
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Luque, I., Riera-Alberola, M. L., Andujar, A. & Ochoa de Alda, J. A. G. Intraphylum diversity and complex evolution of cyanobacterial aminoacyl-tRNA synthetases. Mol. Biol. Evol. 25, 2369–2389 (2008).
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
Shimodaira, H. & Hasegawa, M. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247 (2001).
Shimodaira, H. An approximately unbiased test of phylogenetic tree selection. Syst. Biol. 51, 492–508 (2002).
Posada, D. jModelTest: phylogenetic model averaging. Mol. Biol. Evol. 25, 1253–1256 (2008).
Abascal, F., Zardoya, R. & Posada, D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105 (2005).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
Lartillot, N., Rodrigue, N., Stubbs, D. & Richer, J. PhyloBayes MPI. Phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol. 62, 611–615 (2013).
Miller, M. A., Pfeiffer, W. & Schwartz, T. in Proceedings of the Gateway Computing Environments Workshop (GCE) 1–8 (2010).
Huelsenbeck, J. P., Ronquist, F., Nielsen, R. & Bollback, J. P. Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294, 2310–2314 (2001).

Acknowledgements

This work was supported by Junta de Castilla y León (grant number IEU002A10-2), Ministerio de Economía y Competitividad (grant number BFU2010-19544), and Junta de Extremadura and the European Social Fund (grant DE12007 to J.A.G.O.d.A). We are grateful to Bioportal Oslo and CIPRES Gateway for their support; to Mark Miller for continuous support and availability; to David Sánchez and Diego Lorenzana for initial bioinformatic assistance; to the Erasmus long life-training programme and IE University (Segovia) for initial support; and to Gérard Guglielmi, Manolo Gouy, Céline Brochier-Armanet and Ignacio Luque for critical reading of the manuscript and insightful comments.

Author information

Jesús A. G. Ochoa de Alda
Present address: Didáctica de las Ciencias y las Matemáticas, Fac. Formación de Profesorado, Universidad de Extremadura, Avenida de la Universidad s/n, 10003 Cáceres, Spain
Rocío Esteban
Present address: Grupo Deprofe, Didáctica de las Ciencias y las Matemáticas, Fac. de Educación, Universidad de Extremadura, Avenida de Elvas s/n, 06071 Badajoz, Spain
María Luz Diago
Present address: Universidad Internacional de La Rioja, Rey Juan Carlos I, 41, 26002 Logroño, Spain

Authors and Affiliations

Grupo Hortofruenol, INTAEX-CICYTEX, Avenida Adolfo Suárez, s/n, Badajoz, 06071, Spain
Jesús A. G. Ochoa de Alda
School of Biology, IE University, Cardenal Zúñiga 12, Segovia, 40003, Spain
Jesús A. G. Ochoa de Alda, Rocío Esteban & María Luz Diago
Ecole Normale Supérieure, Institut de Biologie (IBENS), CNRS UMR 8197, Inserm U 1024, 46, rue d'Ulm, Paris, F-75005, France
Jesús A. G. Ochoa de Alda & Jean Houmard

Contributions

J.A.G.O.de.A. and J.H. designed the work; R.E. and J.A.G.O.de.A. performed data retrieval, alignments and selection of reliable positions; M.L.D. and J.A.G.O.de.A. carried out phylogenetic model selection. J.A.G.O.de.A. performed concatenations, phylogenetic reconstructions, comparisons and hypothesis testing, and examined all the data. J.A.G.O.de.A. and J.H. wrote the paper.

Corresponding author

Correspondence to Jesús A. G. Ochoa de Alda.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures, Tables and References

Supplementary Figures 1-7, Supplementary Tables 1-2 and Supplementary References (PDF 4972 kb)

Supplementary Data 1

Alignment used for the reconstruction of phylogenetic trees (Fig. 1, Supplementary Fig. 1). Concatenated sequences for small and large ribosomal subunits trimmed for reliable characters using Gblocks ¹³ under default (less stringent) conditions (Parameters used: Minimum number of sequences for a conserved position: 20; Minimum number of sequences for a flanking position: 32; Maximum number of contiguous nonconserved positions: 8; Minimum length of a block: 5; Allowed gap positions: With half). (TXT 195 kb)

Supplementary Data 2

Stringent trimming. Concatenated sequences for small and large ribosomal subunits trimmed for reliable characters using BMGE ⁷ under stringent conditions (Command line: -t DNA -m DNAPAM1:2 -w 1 -g 0.0 -b 1) and further removal of constant sites (Command line: -t DNA -w 1 -h 1E-5:1). (TXT 47 kb)

Supplementary Data 3

Identification of potential cyanobacterial orthologs in eukaryotes. Cyanobacterial core genes sorted out by COG categories with their protein variability as originally described 14 (Supplementary Table 2): informational (purple), ribosomal (pink) and photosynthetic genes (green) are color shaded. Gloeobacter genes were used as seeds to query refseq NCBI database for homologous genes in photosynthetic eukaryotes, using BLAST (5000 maximum target sequences). BLAST score ranges were extracted from the Taxonomy report. The distance tree of results and the taxonomy report were used to score each core gene: absent (n/p, white cells), clustering with cyanobacteria as an inner group (a, green cells) or as a sister group (b, yellow cells), or clustering with other bacteria (c, blue cells). A "d" indicates duplication, "dc" duplicated paralogs or "da" if at least one copy clusters with cyanobacteria. The Central Core (CC) of 38 genes corresponds to the set maintained in all photosynthetic eukaryotes. (XLS 157 kb)

Supplementary Data 4

Assessment of orthology. Cyanobacterial core genes sorted out by COG categories as originally described (Supplementary Table 2¹⁴): informational (purple), ribosomal (pink) and photosynthetic genes (green) are color shaded. Gloeobacter GI numbers are used to identify the genes. Dataset columns show: the 38 core genes retained in all plastids (CC), the 68 core genes present simultaneously in Physcomitrella patens, Phaeodactylum tricornutum and Cyanidioschyzon merolae (PCD), the 48 genes subset of PCD (CyPlas) congruent with at least one cyanobacterial species tree (Supplementary Fig. 2) and the 33 genes subset of CyPlas (PBTopo) congruent with at least two reference cyanobacterial-plastid trees (Fig. 2), those used for the final reconstruction of the cyanobacterial-plastid evolutionary history (Fig. 3). To measure congruence we selected the best available ProtTest model [14 substitution matrices ± Invariable sites ± (4-16) Γ rates ± Amino acid Frequency] for each core protein. Then, the sequence alignment of each protein and its optimal evolutionary model were assessed for their congruence (WSH-test, P>0.05) with tree topologies using CONSEL. (XLS 106 kb)

Supplementary Data 5

Small rRNA sequences from cyanobacteria. Alignment used for the reconstruction of a guide tree (Supplementary Fig. 2) for the evolution of cyanobacteria. (TXT 53 kb)

Supplementary Data 6

Concatenated large and small rRNA sequences from cyanobacteria. Alignment used for the reconstruction of a guide tree (Supplementary Fig. 2) for the evolution of cyanobacteria. (TXT 161 kb)

Supplementary Data 7

Concatenated protein phylogenetic markers from cyanobacteria. Subset of 13 sequences derived from Ciccarelli's sequences ¹⁵ used for the reconstruction of a guide tree (Supplementary Fig. 2) for the evolution of cyanobacteria. (TXT 156 kb)

Supplementary Data 8

Concatenated large and small rRNA sequences from plastid and cyanobacteria. Alignment used for the reconstruction of a guide tree (Fig. 2A) for the evolution of plastid and cyanobacteria. (TXT 169 kb)

Supplementary Data 9

Concatenated CyPlas-dataset. Subset of 48 cyanobacterial core sequences aligned with the corresponding homologous proteins from three photosynthetic eukaryotes, and trimmed and concatenated for the reconstruction of guide trees (Fig. 2 B-C) for the evolution of plastid and cyanobacteria. (TXT 418 kb)

Supplementary Data 10

Individual CyPlas-dataset phylogenies. Phylogenetic trees used for the reconstruction of consensus tree as guide for the evolution of plastid and cyanobacteria (Fig. 2 D). Gloeobacter GI numbers of core genes are used to identify the trees. (TXT 62 kb)

Supplementary Data 11

Concatenated PBTopo-dataset used for the reconstruction of the phylogenetic tree (Fig. 3). Subset of 33 cyanobacterial core sequences aligned with the corresponding homologous proteins from three photosynthetic eukaryotes, and trimmed and concatenated for the reconstruction of a guide tree for the evolution of plastid and cyanobacteria. (TXT 289 kb)

Supplementary Data 12

Alignment used for the reconstruction of the phylogenetic tree (Supplementary Fig. 4A). (TXT 275 kb)

Supplementary Data 13

Alignment used for the reconstruction of the phylogenetic tree (Supplementary Fig. 4B). (TXT 142 kb)

Supplementary Data 14

Alignment used for the reconstruction of the phylogenetic tree (Supplementary Fig. 5). Concatenated sequences for small and large ribosomal subunits trimmed for reliable characters using BMGE ⁷ under stringent conditions (Command line: -t DNA -m DNAPAM1:2 -w 1 -g 0.0 -b 1) and further removal of constant sites (Command line: -t DNA -w 1 -h 1E-5:1). (TXT 137 kb)

Supplementary Data 15

SILVA ¹⁶ alignment of small (16S) rRNA sequences. Dataset includes 3 Melainabacteria, 120 cyanobacteria, Paulinella chromatophora and 14 plastids. (TXT 239 kb)

Supplementary Data 16

SILVA ¹⁶ alignment of large (23S) rRNA sequences. Dataset includes 3 Melainabacteria, 120 cyanobacteria, Paulinella chromatophora and 14 plastids. (TXT 506 kb)

Supplementary Data 17

Alignment used for the reconstruction of the phylogenetic tree (Fig. 4). Supplementary Data 15 and 16 were concatenated and their redundancy was reduced to 97% using Jalview ¹⁷. The resulting alignment was trimmed using BMGE ⁷ under stringent conditions (Command line: -t DNA -m DNAPAM1:2 -w 1 -g 0.0 -b 1) and further removal of constant sites (Command line: -t DNA -w 1 -h 1E-5:1). (TXT 100 kb)

Supplementary Data 18

Alignment used for the reconstruction of the phylogenetic tree (Supplementary Fig. 6B). This alignment was derived from Supplementary Data 17 after mitigation of compositional bias to two standard deviations using Phylobayes 18. (TXT 70 kb)

Supplementary Data 19

Alignment used for the reconstruction of the phylogenetic tree (Supplementary Fig. 7). Dataset was obtained after removing long branches (Cyanidioschyzon merolae, Ostreococcus tauri and Volvox carteri), short sequences (Cyanoptyche gloeocystis and Lyngbya), redundancy (97% threshold) and positions that were only variable in plastid and not in cyanobacteria (17% of sites variable) from Supplementary Data 14. (TXT 88 kb)

Supplementary Data 20

Reduction of dataset complexity by character trimming and by suppression of sources of redundancy and compositional heterogeneity places the plastid branching point close to group A. Datasets originate from a multiple alignment of concatenated sequences of small and large ribosomal subunits (Supplementary Data 15-16). This multiple alignment of 138 sequences was trimmed using BMGE software under default (-m DNAPAM100:2) or stringent (-m DNAPAM1:2) conditions to obtain datasets 1 and 4, respectively. The original multiple alignment was filtered using Jalview to reduce redundancy to 97% (when sequences exhibited more than 97% identity, only one of them was used further). This resulted in a multiple alignment of 99 sequences that was further trimmed using BMGE under default (dataset 2) or stringent conditions (dataset 8). Suppression of seven fast-evolving eukaryotes from dataset 2 or dataset 8 resulted in datasets 3 and 9, respectively. Dataset 5 was obtained from dataset 1 after removing outgroups (MEL A1, B1 and B2) and a stringent trimming of the resulting multiple alignment of 135 sequences (Supplementary Data 14). Suppression of five fast-evolving eukaryotes from this multiple alignment and further stringent trimming resulted in dataset 6. Phylobayes was used to reduce compositional heterogeneity (2z) of dataset 5 to obtain dataset 7 and of dataset 8 to obtain dataset 10. Posterior probability of the branching point was estimated using model GTR+4Γ+CAT. Branching points correspond to the letters close to the nodes of the Supplementary Fig. 5 tree. (XLSX 20 kb)
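The 97% redundancy rule described in this entry (keep only one of any group of sequences that are more than 97% identical) can be sketched as a greedy filter. This is an illustrative approximation and not Jalview's own procedure. The identity measure (identical characters over columns where neither sequence has a gap), the order-dependent greedy choice and the toy sequences are all assumptions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical characters over columns ungapped in both sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def reduce_redundancy(alignment, threshold=0.97):
    """Greedily keep a sequence only if it is not more than `threshold`
    identical to any sequence already kept (input order decides ties)."""
    kept = {}
    for name, seq in alignment.items():
        if all(identity(seq, other) <= threshold for other in kept.values()):
            kept[name] = seq
    return kept


# Toy aligned sequences of equal length (hypothetical).
toy_alignment = {
    "a": "A" * 100,
    "b": "A" * 99 + "C",        # 99% identical to "a": dropped
    "c": "A" * 90 + "C" * 10,   # 90% identical to "a": kept
}
print(list(reduce_redundancy(toy_alignment)))
```

Because the filter is greedy, which member of a redundant group survives depends on input order. A real pipeline might instead prefer the longest or best-annotated sequence.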

About this article

Cite this article

Ochoa de Alda, J., Esteban, R., Diago, M. et al. The plastid ancestor originated among one of the major cyanobacterial lineages. Nat Commun 5, 4937 (2014). https://doi.org/10.1038/ncomms5937

Received: 29 January 2014
Accepted: 08 August 2014
Published: 15 September 2014
DOI: https://doi.org/10.1038/ncomms5937
