The redox-sensitive proteome (RSP) consists of protein thiols that undergo redox reactions, playing an important role in coordinating cellular processes. Here, we applied a large-scale phylogenomic reconstruction approach in the model diatom Phaeodactylum tricornutum to map the evolutionary origins of the eukaryotic RSP. The majority of P. tricornutum redox-sensitive cysteines (76%) is specific to eukaryotes, yet these are encoded in genes that are mostly of a prokaryotic origin (57%). Furthermore, we find a threefold enrichment in redox-sensitive cysteines in genes that were gained by endosymbiotic gene transfer during the primary plastid acquisition. The secondary endosymbiosis event coincides with frequent introduction of reactive cysteines into existing proteins. While the plastid acquisition imposed an increase in the production of reactive oxygen species, our results suggest that it was accompanied by significant expansion of the RSP, providing redox regulatory networks the ability to cope with fluctuating environmental conditions.
The origin of eukaryotes represents a major evolutionary transition in life history. According to endosymbiotic theory, photosynthetic eukaryotes arose via two symbiotic associations of prokaryotic lineages1. The first endosymbiont evolved into the mitochondrion, a process that was accompanied by a massive lateral gene transfer (LGT) from the endosymbiont genome into the nucleus (termed endosymbiotic gene transfer; EGT)2. Photosynthetic eukaryotes evolved from a eukaryotic ancestor that already harboured the hallmarks of eukaryotic cells including the nucleus and mitochondrion3. The acquisition of primary plastids involved an endosymbiosis of a cyanobacterial symbiont within a eukaryotic host and led to the evolution of Archaeplastida3. Secondary plastids of algal ancestry are found in multiple eukaryotic lineages, yet the number of independent plastid acquisition events in these lineages is still highly debated4,5.
The chimeric ancestry of eukaryotes is reflected in their genomes. A substantial portion of eukaryotic genes trace back to an archaebacterial or eubacterial ancestry6,7. The ancestry of eukaryotic genes is commonly correlated with their cellular function6,8. Genes of archaebacterial origin typically encode for proteins that function in information processing pathways (for example, replication, transcription and translation). Genes of proteobacterial and cyanobacterial origin, in contrast, encode for proteins that generally function in operational processes within the cell (for example, energy metabolism, synthesis of biomolecules, cell envelope and regulatory functions). While organelle acquisition was fundamental for the evolution of eukaryotic complexity9, it was most probably accompanied by increased reactive oxygen species (ROS) production resulting from their oxygen-based metabolic processes10,11. Thus, it is reasonable to hypothesize that the organelles’ evolution was accompanied by evolution of the mechanisms required for ROS detoxification and their integration into signalling pathways.
Oxidative stress is a unique physiological state which is characterized by modulation of gene expression patterns12,
Diatoms are a heterogeneous clade of phytoplankton that are responsible for roughly 20% of global primary productivity17. They belong to the Stramenopiles within the SAR supergroup (Stramenopiles, Alveolata and Rhizaria18), which includes species harbouring secondary plastids. Diatom genomes encode for a mosaic of bacterial, plant and animal traits19,20. Here, we examined the phylogeny of redox-sensitive proteins in the diatom Phaeodactylum tricornutum. We reconstructed the origin of redox-sensitive Cys residues in the P. tricornutum proteome (RSCys) and classified them according to their ancestry. Our analysis reveals two major expansions of the eukaryotic redox signalling network, which coincide with the primary and secondary plastid acquisition during eukaryote evolution.
Cysteine residue gain dynamics
Data of RSCys in the P. tricornutum proteome was obtained from Rosenwasser et al.21. All Cys residues that show no redox sensitivity or lack of redox state information were catalogued as unclassified Cys (UNCys). To trace the origin of redox-sensitive Cys in the P. tricornutum redox-sensitive proteome (RSP), we used an ancestral sequence reconstruction approach22. Homologues to the P. tricornutum proteome were identified by comparing all P. tricornutum protein sequences to 132 proteomes from organisms representing different phyla across the tree of life, including archaebacteria, eubacteria, protists, plants and animals (Supplementary Table 1). Each P. tricornutum protein sequence was aligned with its homologues and a phylogenetic tree was reconstructed using a maximum likelihood approach. The trees were rooted according to their taxonomic composition and the largest eukaryote-specific clade that includes P. tricornutum was extracted for further analysis. This resulted in 7,118 eukaryotic gene trees. Ancestral sequences were reconstructed by PAML23 and used to document Cys residue gains along the lineage leading to P. tricornutum. We distinguished between two possible scenarios for the evolution of Cys residues. Cys gains reconstructed into an existing gene were classified as amino-acid (AA) replacement gains, whereas Cys residues already present in the earliest occurrence of a gene were classified as gene origin gains.
An example of the Cys gain analysis is provided by 3-oxoacyl-(acyl-carrier-protein) synthase, which is involved in fatty acid metabolism and found to be redox sensitive in P. tricornutum as well as in tomato during infection response24. The P. tricornutum 3-oxoacyl-(acyl-carrier-protein) synthase (XP_002184832.1) includes one RSCys and ten UNCys residues and it has homologues in three other diatoms, Symbiodinium sp. clade B1, Emiliania huxleyi and Guillardia theta. An inference of Cys gain and loss dynamics reveals that five of the UNCys (Cys22,176,182,200,293) were reconstructed to the common ancestor of all species represented in the tree. These Cys residues are classified as gains by gene origin. One RSCys (Cys79) and two UNCys (Cys39,47) residues were gained in the terminal branch leading to P. tricornutum (Fig. 1a). All three residues were gained in a gene already present in the genome of the diatom ancestor of P. tricornutum. Accordingly, they are classified as AA replacement in P. tricornutum. Because all homologues were found in organisms having a secondary plastid, the gene origin event of 3-oxoacyl-(acyl-carrier-protein) synthase is inferred to coincide with the secondary plastid acquisition. Another UNCys (Cys216) was gained on an early branch that excludes E. huxleyi. Two additional UNCys (Cys155,418) are specific to diatoms, and are classified as AA replacements at the diatom ancestor. One of these UNCys (Cys155) was lost again in Thalassiosira pseudonana.
Another example is the protein cluster of a protochlorophyllide reductase (XP_002179689.1), an enzyme involved in chlorophyll synthesis whose transcription level was shown to be sensitive to redox alterations25. Ancestral sequence reconstruction (Fig. 1b) shows that one RSCys (Cys356) and another UNCys (Cys224) are specifically gained in diatoms and E. huxleyi by AA replacement. The remaining UNCys residues are not universal to all species in the tree, but were inferred to be present in the root and were thus classified as gene origin gains.
Similar to the examples above, we analysed 7,118 phylogenetic trees and inferred the evolution of the Cys residues. In addition, we inferred the evolutionary origin of Cys residues in 841 protein sequences having a single eukaryotic homologue and 2,343 protein sequences with no homologues in other eukaryotic genomes. Our dataset comprises 54,873 Cys residues. All Cys gain events were mapped onto six main ancestral lineages along the path leading from the last eukaryotic common ancestor (LECA) to P. tricornutum through the primary and secondary endosymbiosis and the evolution of Stramenopiles and diatoms (Fig. 2a). We note that eukaryotes harbouring a secondary plastid may not be monophyletic4,18; here we grouped those taxa into a single ancestral lineage for practical reasons.
The mapping of Cys gains to ancestral lineages was determined according to the taxonomic depth of the earliest node on the protein phylogeny possessing the Cys residue. Thus, a P. tricornutum Cys residue that was reconstructed as present in a node ancestral to at least one member of the three non-photosynthetic eukaryotic groups (Opisthokonta, Amoebozoa and Excavata) was inferred to have been gained at the LECA (for example, Supplementary Fig. 1a). Residues that were reconstructed as present in a node ancestral to members of Archaeplastida were inferred to have been gained at the ancestral lineage of primary plastid-bearing eukaryotes, hence these gain events coincide with the primary plastid acquisition (for example, Supplementary Fig. 1b). Nodes ancestral to any taxon whose evolution involves secondary plastid acquisition, including Haptophyta, Cryptophyceae, Rhizaria and Alveolata, were inferred as gains in the secondary plastid ancestral lineage (for example, Supplementary Fig. 1c). Cys residues that were reconstructed as present in a node ancestral to a non-diatom Stramenopiles species were inferred as gains in the Stramenopiles lineage (for example, Supplementary Fig. 1d). Residues reconstructed as present in the ancestor of any diatom other than P. tricornutum were assigned to the diatom lineage (for example, Supplementary Fig. 1e). Lastly, P. tricornutum-specific Cys were reconstructed as gains at the P. tricornutum lineage (for example, Fig. 1a). All RSCys and their ancestral reconstruction classification are detailed in Supplementary Table 2.
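The mapping rules above can be read as an ordered decision list, checked from the deepest lineage to the shallowest. The following is a minimal sketch of that reading, not the authors' implementation. It assumes that the taxonomic groups of all leaves descending from the earliest Cys-bearing node have already been collected; the group labels (for example `"non-diatom Stramenopiles"` and `"other diatom"`) and the function name `assign_lineage` are hypothetical.

```python
# Hypothetical group labels; rules are ordered from the deepest lineage to the
# shallowest, and the first rule with a matching descendant group wins.
LINEAGE_RULES = [
    ("LECA", {"Opisthokonta", "Amoebozoa", "Excavata"}),
    ("primary plastid", {"Archaeplastida"}),
    ("secondary plastid", {"Haptophyta", "Cryptophyceae", "Rhizaria", "Alveolata"}),
    ("Stramenopiles", {"non-diatom Stramenopiles"}),
    ("diatoms", {"other diatom"}),
]


def assign_lineage(descendant_groups):
    """Return the ancestral lineage for the earliest node possessing the residue.

    descendant_groups: taxonomic group labels of the leaves under that node.
    """
    groups = set(descendant_groups)
    for lineage, marker_groups in LINEAGE_RULES:
        if groups & marker_groups:
            return lineage
    # No other taxon shares the residue: the gain is specific to P. tricornutum.
    return "P. tricornutum"
```

Ordering the rules by taxonomic depth means a node that spans both Archaeplastida and diatoms is assigned to the primary plastid lineage, matching the "earliest node" criterion described in the text.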
The frequency of RSCys gain events in all eukaryotic protein families, for each of the six ancestral lineages, is summarized in Fig. 2b–d according to the two types of gains. The expected baseline evolutionary signal is provided by an identical analysis applied to UNCys and aspartic acid residues in the P. tricornutum proteome. Asp was chosen as an additional control because in the present dataset its conservation level is comparable to that of cysteine. Our analysis reveals that residue gain in the earlier lineages (for example, LECA and primary plastid) is primarily via gene origin, whereas in later nodes (for example, Stramenopiles and diatoms) the proportion of residue gain via AA replacement increases. The relative frequency of the two types of gain is similar among the three tested residues in most ancestral lineages (Fig. 2b).
We next tested for enrichment of RSCys gain in each ancestral lineage by comparing the proportion of RSCys residues gained in the node to that of the two baseline residues (α = 0.05, using Fisher's exact test and false discovery rate (FDR)). A significant enrichment of RSCys gains via gene origin was observed in the LECA and the primary and secondary plastid endosymbiosis ancestral lineages (Fig. 2c). RSCys gains by gene origin at those three lineages are significantly more frequent than expected according to the baseline residues UNCys and Asp. Among these lineages, the highest enrichment (threefold) was detected in the primary plastid endosymbiosis. A significant enrichment of RSCys gains via AA replacement is observed in the secondary plastid ancestral lineages (Fig. 2d). This points to three major expansions of the RSP that coincided with the evolution of LECA (that is, mitochondrion acquisition) and the primary and secondary plastid acquisitions. Earlier expansions at the LECA and primary plastid ancestral lineages were driven by gene origin, whereas the expansion at the secondary plastid acquisition was driven by gene origin as well as AA replacements in existing genes. The proportion of RSCys gains via gene origin in the P. tricornutum-specific lineage is significantly depleted in comparison to the baseline AAs (Fig. 2c). A similar trend is observed for RSCys gain via AA replacement at the diatom ancestral lineage, but this observation has a weak statistical support in comparison to UNCys residues (Fig. 2d).
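The enrichment test described above can be illustrated with a small, dependency-free sketch. It assumes one 2x2 table per ancestral lineage (RSCys versus a baseline residue, gained in the lineage versus gained elsewhere) and a Benjamini-Hochberg correction for the FDR step; both the table layout and the choice of Benjamini-Hochberg are assumptions, since the text does not specify them. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not taken from the study):
    # (RSCys in lineage, RSCys elsewhere, baseline in lineage, baseline elsewhere)
    tables = {"lineage A": (30, 170, 400, 7600), "lineage B": (12, 188, 500, 7500)}
    raw = [fisher_exact_two_sided(*t) for t in tables.values()]
    for name, p_adj in zip(tables, benjamini_hochberg(raw)):
        print(name, "significant" if p_adj < 0.05 else "not significant")
```

A lineage would be reported as enriched when its adjusted P value falls below α = 0.05 and the RSCys proportion exceeds the baseline proportion.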
The observation that RSCys gain is correlated with plastid acquisition is further supported by protein functional annotations. A test for enrichment of gene ontology (GO) terms26 of protein sequences that contain RSCys revealed that RSCys gain events in the primary plastid ancestral lineage are enriched in protein sequences annotated with plastid- or ROS-related terms (for example, plastid, peroxidase and chlorophyll biosynthesis). Significantly enriched GO terms associated with RSCys gains reconstructed to the secondary plastid acquisition are related to cofactor binding and pigment biosynthesis (Supplementary Table 3).
Origins of the RSP
The majority (76%) of P. tricornutum RSCys residues are observed only in eukaryotic homologues, hence they are eukaryotic specific. This indicates that the present-day RSP is mainly attributed to eukaryotic innovation. Nonetheless, the 24% RSCys with homologous residues in prokaryotes constitutes a significantly higher proportion than that observed for other Cys residues (8%; P value < 0.001 using Fisher test). Thus, a substantial proportion of RSCys was already present in the prokaryotic ancestors. We have already observed that during the early stages of eukaryotic evolution the majority of RSP expansion occurred via gene origin and coincided with the organelle acquisitions. Given these two observations, we hypothesized that RSCys gained by gene origin at the three earliest ancestral lineages correspond to gene acquisition by EGT.
We therefore classified P. tricornutum protein-coding genes into several ancestry classes: cyanobacteria (plastid ancestor), proteobacteria (mitochondrion ancestor), bacteria, archaebacteria and eukaryotic specific, based on the branching pattern in phylogenetic trees (see Methods). Most redox-sensitive proteins (proteins with at least one RSCys residue) are observed to be of prokaryotic ancestry (57%) rather than eukaryote specific, and prokaryotic ancestry is significantly more frequent in genes encoding RSPs in comparison to other protein-coding genes of P. tricornutum (32%; P value < 0.001 using Fisher test). Considering individual RSCys residues in comparison to the baseline residue frequencies in RSP genes, we find that genes of cyanobacterial ancestry harbour a significantly higher frequency of RSCys, nearly twofold more frequent than the baseline residues (Table 1, α = 0.05, using Fisher test and FDR).
The distribution of protein ancestry within the ancestral lineages (Fig. 3) reveals that cyanobacterial ancestry accounts for 35% of the RSCys gains by gene origin in the primary plastid ancestral lineage and 47% of the gains in the secondary plastid ancestral lineage. Of the RSCys gains via AA replacement at the secondary plastid ancestral lineage, 50% occur in genes of cyanobacterial ancestry (Supplementary Fig. 2). The high frequency of genes of cyanobacterial ancestry in the plastid ancestral lineages (primary and secondary) further supports the observation that RSCys reconstructed to those nodes were gained concomitantly with the plastid acquisition. This suggests a significant eukaryotic-specific expansion of the RSP by AA replacement in genes of cyanobacterial ancestry (Fig. 3).
In proteins of cyanobacterial or proteobacterial ancestry, most RSCys gained by gene origin were also present in the prokaryotic lineages (23 out of 26 and 11 out of 15, respectively). This conservation is not restricted to the immediate ancestral group, as half of these RSCys are conserved also in other eubacterial or archaebacterial species. Thus, several of the Cys residues that are redox sensitive in P. tricornutum can be traced back to the very root of the tree of life. One example of an RSCys residue with homologues in both eubacteria and archaebacteria is the peroxiredoxin (EC45666.1; Supplementary Fig. 1f) with a catalytic Cys (Cys98) that is conserved in all organisms, except Aureococcus anophagefferens that contains a Selenocysteine instead.
A recent publication of thiol oxidation data for the cyanobacterium Synechocystis sp. PCC 6803 during light/dark modulation and in response to photosynthesis inhibition27 enables the comparison of P. tricornutum RSCys to that of a representative cyanobacterial species. A total of 962 protein-coding genes are homologous between P. tricornutum and Synechocystis sp. PCC 6803 (Supplementary Table 4). These homologues include 1,105 Cys residues, where oxidation information in both species is available for 100 residues. A comparison of the oxidation state of these homologous Cys revealed that Cys having the same redox sensitivity state in the two species are observed significantly more frequently than expected by chance (Supplementary Table 5a; P value = 0.0118, using Fisher test). The set of common RSCys consists of nine residues that are oxidized in both P. tricornutum and Synechocystis sp. PCC 6803 (Supplementary Table 5b). We note that these residues have been classified as RSCys under different physiological conditions27. Nonetheless, the comparison shows that they are redox sensitive in both organisms. These observations, though based on a small sample size, bear witness to the contribution of plastid acquisition to the RSP, even though reflecting the divergence and independent evolution of the cyanobacterial and eukaryotic lineages following the origin of plastids.
The contribution of LGT to large-scale eukaryote genome evolution outside the framework of EGT is intensely debated28,29. Yet, LGT is often invoked to explain sporadic occurrence of prokaryotic homologues in eukaryotic genomes (for example, see refs 30, 31). Our dataset includes 39 RSCys with unresolved eubacterial ancestry that cannot be clearly associated with EGT from the plastid or mitochondrion ancestors. Hypothetically, these may have been obtained via LGT from free-living prokaryotes (that is, independently of EGT). If so, it would suggest a substantial role of LGT in the RSP expansion. However, the lack of significant enrichment for RSCys in genes of general bacterial ancestry does not support this scenario (Table 1). It was recently proposed that the P. tricornutum genome20 includes 587 protein-coding genes acquired from prokaryotic donors at the diatom ancestor or P. tricornutum. Of these, 28 proteins (comprising 34 RSCys) are included in the RSP of P. tricornutum. We note, however, that the original genome analysis did not make a distinction between LGT and EGT. We re-examined the phylogeny of protein sequences included in our dataset and found that 15 of the 28 putative laterally transferred genes are of either proteobacterial or cyanobacterial ancestry. Hence, it is probable that these genes were acquired via EGT rather than LGT.
An example of a gene that was considered an LGT candidate20, but may instead be classified as an EGT, is a nitrite reductase protein (Nir, Supplementary Fig. 1d) that in P. tricornutum contains two RSCys and 19 UNCys residues, making it a potentially important redox-sensitive protein. For the remaining 13 genes, we could not find a clear EGT signal, hence with the current data these may be considered as LGT events.
Here, we followed the evolution of the P. tricornutum redox-sensitive protein network during major events in the diatom evolution by examining phylogenetic relations between homologous proteins and reconstruction of ancestral sequences. Our results demonstrate that the RSP is highly dynamic, with a constant flux of new genes and Cys residues expanding the repertoire of the redox-responsive network. The vast majority of RSP innovation is purely eukaryotic, and a substantial portion of it is recent and specific to the diatom lineage. Yet, underlying the massive eukaryotic innovations are clear footprints of two major expansions of the RSP, acquisition of redox-sensitive proteins derived from the endosymbiotic origin of the chloroplast and introduction of RSCys residues into existing proteins during secondary endosymbiosis. Our results are supported by the observation of protein domains from plastid origin in proteins within eukaryotic redox regulation pathways32 and the cyanobacterial origin of thioredoxins in plants33. These expansions correspond to major transitions in the eukaryotic lineage that were most probably accompanied by increased ROS pressures imposed by acquiring the mitochondrion and plastids that harbour oxygen-based metabolic processes. Our study suggests that organelle acquisitions not only generated a major ROS challenge, but also provided, via EGT, many of the proteins that are employed in the ROS signalling and response pathways.
The contribution of plastid ancestors to the evolution of redox pathways was gradual and different functions were incorporated at different stages (Supplementary Table 2). The RSP expansion during the primary plastid acquisition includes the gain of redox transmitters such as thioredoxin via EGT from cyanobacteria (Supplementary Table 2). Expanding cellular capabilities to sense redox signals, by increasing the number of redox signal transmitters such as thioredoxins and their protein targets, can provide cells with a highly modular sensing network of environmental cues. This plasticity can allow cells to integrate various signals in order to rapidly adjust cellular processes in a reversible manner, via reactive Cys, and maintain homeostasis under fluctuating physiological conditions34. Our analysis uncovers a significant expansion of redox signalling also during the secondary endosymbiosis event, indicating that a developed redox regulation could have been beneficial during the inhabitation of the new organelle. As the basic machinery for transmitting redox signals was already acquired during the primary endosymbiosis, it is reasonable that this expansion was mainly characterized by Cys residue gains in pre-existing proteins (Fig. 2). An example is the protochlorophyllide reductase, which is involved in chlorophyll metabolism and gained a novel RSCys via AA replacement (Fig. 1b).
According to our analysis, the contribution of mitochondrion acquisition to the RSP is not significantly larger than other contributors. On the other hand, enrichment of RSCys by gene origin at the LECA node and enrichment of prokaryotic inheritance was observed and may still suggest a contribution of the mitochondrial ancestor (Table 1; Fig. 2). It is possible that redox regulation was only one process among many other operational functions acquired from proteobacteria6, and the RSP system was not over-represented among the genes acquired by EGT from the mitochondrion.
Our results show that RSCys are mainly eukaryotic innovations (76%) and therefore indicate a massive expansion of redox regulation during eukaryotic diversification. However, reconstruction of protein ancestry shows that the contribution of prokaryotic ancestors to the RSP expansion was significant. Different stages of eukaryotic evolution were accompanied by RSCys gain and these involved many eukaryotic-specific genes, but also genes of different prokaryotic origins. This is exemplified by the 40S ribosomal protein S3 phylogeny (Supplementary Fig. 1g) that shows a clear archaebacterial ancestry and an RSCys residue gain that coincides with the LECA. In a few cases, we even observed RSCys residues that are conserved in both bacteria and archaebacteria (for example, peroxiredoxin, Supplementary Fig. 1f). Our ability to infer whether these Cys had a signalling role at the very early origin of life is limited. Nonetheless, these Cys are as ancient as LUCA (last universal common ancestor). Indeed, the finding of antioxidant enzymes in many strict anaerobes35 suggests the emergence of ROS production and regulation mechanisms prior to the rise of oxygen in the Earth's atmosphere36. Thus, the very ancient Cys residues we uncovered here could, potentially, already have been part of redox signalling prior to the evolution of oxygenic photosynthesis. The finding of thioredoxins and their protein targets in a methane-producing archaeon37 as well as the recent finding of redox regulation of sulfide-dependent anoxygenic photosynthetic electron transfer in Rhodobacter capsulatus38 further support this hypothesis.
Oxidation and reduction of reactive thiol proteins are key regulatory post-translational modifications that enable integration of redox signals into cellular pathways16. However, over-oxidation of redox-sensitive Cys, under high ROS pressure, subverts this regulatory role and leads to loss of protein function and its subsequent degradation39. Therefore, accumulation of redox-sensitive Cys during evolution presents a trade-off between extended ability to accurately sense redox signals on the one hand and its sensitivity to oxidative stress on the other. Following the significant expansion of the RSP during primary and secondary endosymbiosis, the subsequent evolution of P. tricornutum is characterized by a paucity of gains as compared to the baseline expectation provided by UNCys and Asp (Fig. 2c, node 6). This reduction in gaining reactive Cys residues may indicate a cost associated with higher sensitivity to toxic ROS levels, and that only Cys with key regulatory and metabolic values were fixed after the initial expansion.
To conclude, we provide evidence for the major contribution of endosymbiosis events to the redox-sensitive protein network evolution. Our results point to a major role of redox regulation in coordinating organelle endosymbiosis at the eukaryotic origins.
P. tricornutum data, including gene protein sequences, oxidation data, GO and targeting were obtained from Rosenwasser et al.21. Eleven genes annotated as ‘predicted’ or ‘hypothetical’ (out of 186) were manually annotated according to the best BLASTP hits (E-value < 10^-5) in closely related taxa (see also Supplementary Table 2). GO term enrichment was assessed using Ontologizer40 with the ‘Topology-Weighted’ algorithm. Only the top ten GO terms by P value were considered. LGT candidates in P. tricornutum were extracted from Bowler et al.20 by protein identifiers and their presence in the RSCys data was inferred from 100% identity BLASTP hits.
Taxonomic classification of 74 eukaryotic and 58 prokaryotic species was determined according to NCBI taxonomy41 and Adl et al.18, except for Hacrobia that represent an assemblage of Haptophyta and Cryptophyceae42. All 1,764,232 predicted protein sequences of the selected organisms were downloaded from RefSeq (Ver. Oct–Nov 2014)43, GenBank44 and additional online databases (Supplementary Table 1). Identical sequences of the same organism were clustered using CD-HIT45.
Oxidation data for Synechocystis sp. PCC 6803 was obtained from Guo et al.27. Peptides were mapped to the P. tricornutum protein sequences using BLAT (Ver. 34x12 (ref. 46)). Only the first Cys residue per mapped peptide was considered. In cases of multiple oxidation data for one Cys residue that point to different types of redox sensitivity (RSCys or UNCys), only the redox-sensitive one was considered. Otherwise, the Cys residue was counted only once.
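The rule for resolving multiple oxidation records per Cys residue can be sketched as a simple merge in which a redox-sensitive call takes precedence. This is an illustrative reading of the rule, not the authors' code; the tuple layout and the function name `collapse_observations` are hypothetical.

```python
def collapse_observations(observations):
    """Merge per-peptide calls into one call per Cys residue.

    observations: iterable of (protein_id, cys_position, label), where label is
    "RSCys" or "UNCys". A residue with conflicting calls is kept as "RSCys";
    a residue with repeated identical calls is counted once.
    """
    merged = {}
    for protein_id, position, label in observations:
        if label not in ("RSCys", "UNCys"):
            raise ValueError(f"unexpected label: {label!r}")
        key = (protein_id, position)
        # A redox-sensitive call always overrides; otherwise keep the first call.
        if label == "RSCys" or key not in merged:
            merged[key] = label
    return merged
```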
Protein families and phylogenetic trees
A search for eukaryotic homologues to P. tricornutum protein sequences was performed with BLAST Ver. 2.2.29+ (ref. 47) using an E-value < 10^-5 threshold. Bidirectional best BLAST hits (BBH)48 were extracted and globally aligned with needle (EMBOSS 6.6 (ref. 49)). Homologues having ≥30% AA similarity with the P. tricornutum sequence were clustered into 8,017 protein families, with 2,343 sequences remaining as singletons. All eukaryotic clusters were aligned using MAFFT Ver. 7 (‘linsi’ option50), before the addition of prokaryotic homologues. The search for prokaryotic homologues was performed with PSIBLAST (ref. 51) Ver. 2.2.29+ using the eukaryotic alignments and the unclustered singleton sequences, applying an E-value < 10^-5 threshold. The BBH in each prokaryotic species was pairwise globally aligned with all cluster members and retained if at least one eukaryotic cluster member had ≥30% identical AAs. Prokaryotic homologues were merged into the clusters and aligned using MAFFT Ver. 7 (‘linsi’ option). Maximum likelihood (ML) trees were reconstructed using PhyML (Ver. 3.0 (ref. 52)) with the ‘BEST’ search strategy and the LG+G+I+F substitution model, which was the model most frequently selected by PROTTEST (Ver. 3.2 (ref. 53)) according to the corrected Akaike information criterion.
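The bidirectional best hit step can be illustrated for a single pair of proteomes as follows. This is a minimal sketch under stated assumptions: hits are given as already-parsed tuples, the best hit is the one with the highest bit score (the text does not state the ranking criterion), and the function names are hypothetical. The subsequent global alignment and ≥30% similarity filter are not shown.

```python
def best_hits(hits, max_evalue=1e-5):
    """Return {query: subject} keeping the highest-scoring hit per query.

    hits: iterable of (query, subject, evalue, bitscore) tuples.
    Hits with an E-value at or above max_evalue are discarded.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(forward_hits, reverse_hits, max_evalue=1e-5):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(forward_hits, max_evalue)
    reverse = best_hits(reverse_hits, max_evalue)
    return sorted(
        (query, subject)
        for query, subject in forward.items()
        if reverse.get(subject) == query
    )
```

Here `forward_hits` would hold P. tricornutum queries against one other proteome and `reverse_hits` the opposite search direction.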
Tree rooting, gene origin and ancestral sequence reconstruction
Trees with prokaryotic homologues were rooted on the branch that splits the largest eukaryotic-only group including P. tricornutum. Eukaryotic clades of ≥3 operational taxonomic units (OTUs) were extracted for further analysis. Remaining trees without prokaryotic homologues were rooted by the midpoint approach54. Gene ancestry was determined according to nearest neighbour of the root split (see Supplementary Fig. 3 for details).
Ancestral sequence reconstruction was performed using PAML Ver. 4.7 (ref. 23) with the LG substitution model applying the marginal reconstruction approach. The input constituted the rooted trees with the corresponding OTU sequences from the multiple sequence alignments. Gapped positions and ambiguous characters were not considered. Resulting support values for ancestral states exceeded 0.9 for 79% of the RSCys residues and 61% of the UNCys residues. AA replacements were inferred as AA substitutions resulting in the residue found in the genome of P. tricornutum. If no AA replacements were determined for a specific AA residue, it was defined to be acquired at the root of the tree and classified as a gain by gene origin, as were AAs in singleton proteins (that is, those without eukaryotic homologues). In protein clusters with two members only, AA gains were classified as gene origin in their common ancestor if they are conserved, or as AA replacement if the AA residue is present only in P. tricornutum.
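The distinction between gene origin and AA replacement gains can be sketched for a single alignment position, given the reconstructed states along the path from the tree root to the P. tricornutum leaf. The sketch below is an interpretation, not the authors' code: it assumes that the relevant gain is the most recent substitution into the residue observed in P. tricornutum (the text does not say how a loss followed by a regain is handled), and the function name and input layout are hypothetical.

```python
def classify_gain(path_states):
    """Classify how the P. tricornutum residue at one position was gained.

    path_states: list of (node_name, amino_acid) ordered from the tree root to
    the P. tricornutum leaf; a singleton protein has a single entry.
    Returns (gain_type, node_name) for the earliest node of the uninterrupted
    run of nodes that carry the P. tricornutum residue.
    """
    if not path_states:
        raise ValueError("path_states must contain at least the leaf")
    residue = path_states[-1][1]
    gain_index = len(path_states) - 1
    # Walk back towards the root while the ancestral state matches the leaf.
    while gain_index > 0 and path_states[gain_index - 1][1] == residue:
        gain_index -= 1
    # No substitution on the path means the residue dates to the gene origin.
    gain_type = "gene origin" if gain_index == 0 else "AA replacement"
    return gain_type, path_states[gain_index][0]
```

For a two-member cluster the path has only a root and a leaf, so a conserved residue is reported as a gene origin gain and a P. tricornutum-specific residue as an AA replacement, in line with the rule stated above.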
The authors declare that all information on accessing data analysed in this study is included in the paper (and its Supplementary Information files). The datasets generated during the current study are available from the corresponding authors upon request.
How to cite this article: Woehle, C., Dagan, T., Landan, G., Vardi, A. & Rosenwasser, S. Expansion of the redox-sensitive proteome coincides with the plastid endosymbiosis. Nat. Plants 3, 17066 (2017).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors thank A. Kupzok, J. Ilhan, J. Weissenbach, C. Walda, A. Mrnjavac and T. Wein for critical comments on the manuscript. This project was supported by the European Research Council (Grant No. 281357 awarded to T.D. and 280991 awarded to A.V.), the Israeli Science Foundation (Grant No. 712233 awarded to A.V.) and the cluster of excellence, The Future Ocean (funded within the framework of the Excellence Initiative by the Deutsche Forschungsgemeinschaft (DFG) on behalf of the German federal and state governments).
Supplementary Tables 1, 2 and 4.