## Main

The trace element composition of the Precambrian ocean is of prime interest because trace element availability is sensitive to oxygen, and it is frequently inferred from the chemistry of indirect seawater recorders such as iron formations, black shale, pyrite records and carbonates1. Such an approach has rarely been applied to the Phanerozoic eon2,3, despite recent evidence of a transformation in the oxygen content of the upper ocean at the Palaeozoic/Mesozoic boundary4. Coincident with this change in oxygen, a Mesozoic algal revolution took place. Steranes in fossil records reveal that the Palaeozoic5 dominance of green algal phototrophs (Chlorophyta) ceased in the Mesozoic era (252–66 Myr ago; Ma) when the open ocean was taken over by the chlorophyll a+c-containing eukaryotes of the haptophyte and heterokont lineages, whose plastids originate from red algae (Rhodophyta) via secondary endosymbiosis6,7,8 (Supplementary Fig. 1). These changing algal community structures and the development of mineralization strategies subsequently led to major changes in ocean ecology and chemical buffering9.

Recent molecular clock analyses indicate that the events underlying the evolutionary origin of algae, including the primary endosymbiosis and the subsequent spread of red plastids across the eukaryote tree, all pre-dated the ecological expansion of these groups by at least ~0.5–1 Gyr (ref. 5). This suggests that the selective pressures that underpinned the establishment of phytoplankton groups are distinct from those that drove their expansion5. Algal superfamilies are suspected to harbour distinct intracellular trace element stoichiometries10,11,12, highlighting potential differences in their physiological metal requirements for growth. To fulfil their needs for trace metals, phytoplankton have developed a complex network for metal uptake, chelation, transportation and storage, in which transmembrane metal transporters are crucial components13,14. Metal transport systems are key to dealing with metal limitation and toxicity in the environment, which underpins the environmental selection for different phytoplankton groups in the ocean. Episodes in the geological record in which the dominant phytoplankton in the ocean changed may, therefore, provide evidence for changes in seawater chemistry.

Here we employed a comparative genomic approach to analyse metal transporters and metal-binding proteins across a variety of phytoplankton species, abundant in the modern ocean, to understand whether patterns of trace metal usage and uptake are characteristic of different algal lineages (Figs. 1 and 2). The geological history of species succession in the ocean could then be used as a broad indicator of changing ocean trace metal chemistry, even though there may be some examples of decoupling between evolving marine chemistry and metal usage by phototrophs15.

## Contrasting metal requirements of phytoplankton lineages

The biological requirement for elements varies among species16. Although previous studies convincingly show that different lineages of phytoplankton differ systematically in trace metal content10,11,12, the discovery of various metal storage proteins and the plasticity in elemental stoichiometry under different environmental conditions17,18 call into question whether intracellular trace metal quotas reflect true differences in usage. Here we supplement the evidence for differences in metal requirements with a genomically grounded estimate of metal use. We determined that the percentage of metalloproteins encoded in phytoplankton genomes indeed reflects the physiological requirements for trace metals (Figs. 1 and 2 and Supplementary Discussion). This approach could harbour major uncertainties associated with the expression levels of different metalloproteins, but a good correlation exists between the half-saturation coefficient Ku of various phytoplankton species, a measure of their metal requirements, and the proportion of the genome dedicated to metal-binding proteins (Fig. 2e). Ku is the concentration that supports a growth rate of one-half the maximum growth rate of phytoplankton. The percentages of Fe-, Mn- and Zn-binding proteins in phytoplankton positively correlate with their Ku of Fe, Mn and Zn, respectively (Fig. 2e). Given that the percentage of encoded metalloproteome seems to be a useful indicator of physiological metal requirements, we extended our analysis to a larger number of phytoplankton genomes to gain insight into contrasting trace metal requirements (Fig. 1). The significantly lower percentage of metalloproteins in the secondary endosymbiotic lineages indicates that they have lower requirements for the trace metals Fe, Zn and Cu than the primary endosymbionts (P < 0.05; Fig. 1).

Despite the gap between the bioinformatically predicted function of genes and the phenotypic metal quota, our analysis here can illustrate the proportional resource investment in phytoplankton trace metal acquisition and use (Figs. 1–3 and Supplementary Fig. 2). This yields additional dimensions that cannot be obtained via metal quota analysis and provides further insights into the co-evolution of life and environment10,19.

## Strategies for metal management in different phytoplankton

Phytoplankton use metal transport systems for metal acquisition and distribution. The amount of energy dedicated to the transport of trace metals may hint at chemistry changes in seawater. Principal component analysis (PCA) of metal-transporting protein families in aggregate (numbers of proteins in each metal-transporter family in phytoplankton species) confirmed that phytoplankton from the same phylum cluster together, with the first two components clearly separating cyanobacteria, red lineage and green lineage phytoplankton (Fig. 2h). The separation of phytoplankton into distinct groups by PCA of metal transporters agrees with the separation via elemental stoichiometry11.
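
As an illustration of the kind of ordination described above, the following is a minimal sketch of extracting the first principal component from a species-by-family count matrix. It is not the software used in this study (see Methods); the function name, the power-iteration approach and the toy matrix are assumptions made purely for illustration.

```python
def first_principal_component(rows, iterations=200):
    """Return (loading vector, per-species scores) for the first PC.

    rows: one list per species, holding the number of proteins in each
    metal-transporter family (hypothetical input layout).
    """
    n, p = len(rows), len(rows[0])
    # Centre each column (family) on its mean across species.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between families.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    # Assumes the data are not constant and the start vector is not
    # orthogonal to the leading eigenvector.
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    return v, scores


# Toy matrix: three species, two transporter families.
loadings, scores = first_principal_component([[1, 2], [2, 4], [3, 6]])
```

Species with similar transporter repertoires receive similar scores, which is what allows lineages to separate along the leading components.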

To compare the relative genetic investment of phytoplankton in transporting different metals, we calculated the percentages of different metal transporters relative to the proteome sizes across the groups (Fig. 2). Cyanobacteria encode a significantly greater proportion (~2.8%) of metal transporters in their proteomes compared with eukaryotes (0.5–1%, P < 0.05; Fig. 2c). Adenosine triphosphate (ATP)-binding cassette transporters (ABC transporters), one of the largest and oldest gene families, responsible for the ATP-powered translocation of many substrates across membranes20, are the most abundant metal-transporter family in all phytoplankton (Fig. 2f). In cyanobacteria in particular, they comprise more than 90% of all metal-transporter families. These transporters typically transport more than one type of metal, as in the Ni/Co uptake transporter family and the Mn/Zn/Fe chelate uptake transporter family20. The low diversity in transporter type (Supplementary Fig. 3) in cyanobacteria compared with eukaryotes reflects intense genome streamlining21. Moreover, some cyanobacteria lack phytochelatin synthase (Supplementary Table 2), known to mediate the synthesis of metal-binding phytochelatins22, a strategy for managing metal toxicity by binding and storing excess metals23. Cyanobacteria have a low percentage of P-type ATPases, transporters that confer protection from extreme environmental stress conditions such as high concentrations of metals24,25. These features of cyanobacterial metal transport systems (a high investment, that is, a high percentage of transporter proteins in the proteome, together with a streamlined and indiscriminate metal uptake and efflux system) indicate that they employ a strategy centred around nutrient acquisition and efflux, making cyanobacteria ubiquitously well adapted.

Among eukaryotic phytoplankton, the chlorophyll a+c-containing (red) lineages have a lower diversity in their transporter systems compared with the chlorophyll b-containing (green) lineages. Many of the red lineage phytoplankton lack specific Cu transporters (for example, the COPT group) and high-affinity Ni transporters, but encode a greater number of generalist transporters (for example, ABC transporters; Fig. 2f and Supplementary Fig. 3)20. The ABC transporters comprise a significantly greater proportion (P < 0.05) of metal transporters in the red lineages (60%) compared with green lineages (50%). In contrast to the red lineage phytoplankton, the green lineages have significantly higher percentages of P-type ATPase dedicated to metal efflux (Fig. 2f, P < 0.001)24,25. The green lineages have more complex metal transport systems, likely to eliminate an excess of specific metals from cells, and therefore may be better adapted for environments with metals at higher potentially toxic concentrations26,27 such as modern nutrient-rich coastal or upwelling environments28,29. Such an adaptation may contribute to the green algae providing the successful ancestors for the modern land plants, as living in the terrestrial realm requires the ability to select for essential metals from the metal-rich rock and soil milieu. In contrast, the red lineages have fewer metal transport systems comprising more general metal transporters, meaning that they are better adapted to an oligotrophic environment where the risk of uptake of excess metals is lower21.

The affinity of metal transporters, indicated by the proportions of sulfur- or thiol-containing amino acids that have high affinity for transition metals30, is greater in the secondary endosymbionts than in the primary endosymbionts in various metal-transporting proteins (Fig. 2g, Supplementary Fig. 4 and Supplementary Table 3), especially those related to Fe- and Zn-transporting processes, such as Zn transporters (for example, ZIPs) and ferric reductases, which reduce Fe(III) to Fe(II) for Fe transmembrane uptake. No systematic variations in the average thiol amino acid compositions were found in different algal lineages, suggesting that the differences in the thiol content found in the transporters do not result from genome-wide bias (Supplementary Fig. 4). The histidine contents are also significantly higher in the high-affinity Ni-transporting family, multicopper oxidases and the Fe(III) transporters (FTR family) in the secondary endosymbionts (Supplementary Table 3). This suggests that the secondary endosymbionts may have evolved a more selective metal transport system with improved discrimination against abundant metal competitors for the binding sites (for example, Mg and Ca), whose seawater concentrations are orders of magnitude higher than those of essential trace elements such as Fe and Zn, supported by a higher sulfate availability in the marine environment31. Having a higher-affinity transport system for the essential trace metals Fe and Zn could be an advantageous feature for the secondary endosymbionts living in the oligotrophic open ocean.

## How differences in chemistry and strategies arise

We conducted a protein location analysis to test whether the evolutionary history of the plastid is the source of the differences in metal requirements and transport strategy of the cell. All orthogroups of metalloproteins and metal transporters, annotated to be plastid- or mitochondria-targeted, were extracted across different phytoplankton species for each trace element (Fig. 3). The variation in proportions of plastid-targeted metalloproteins and transporters of different phytoplankton groups partially explains the evolutionary inheritance of elemental requirements found in different eukaryotic marine phytoplankton10. Elements that were more bioavailable billions of years ago, such as Fe and Co, are integral to the metalloproteome of cyanobacteria. A higher proportion of these elements can be attributed to the plastids derived from cyanobacteria. In contrast, higher proportions of elements such as Mo, which were less bioavailable under anoxic ocean conditions, are found in proteins originating from eukaryotes (that is, these protein groups are absent from the cyanobacteria; Fig. 3). Although no significant difference in plastid core protein-coding genes was found at the superfamily level previously10,32, a reduction in proportions of plastid metalloprotein/transporter of various trace elements was found in the secondary-endosymbiont-bearing lineages, suggesting a metalloproteome reduction in the plastid via endosymbiosis and a potential increased contribution from the eukaryotic host in regulating their trace element requirement and transport ability, especially for elements such as Fe, Cu and Co.

## Metal requirements may have coevolved with metal availability

Contrasting metal use and uptake strategies for Fe and Zn, in particular, reveal how evolving ocean chemistry has shaped parallel changes in the metalloproteome and metal acquisition strategies of different lineages of phytoplankton (Fig. 1). These two metals comprise the largest proportion of the metalloproteome in various organisms and are involved in a more diverse range of biological pathways and activities than other essential trace metals33,34. Fe is an abundant metallocentre in photosynthesis and electron transfer (Supplementary Fig. 2), being found in almost all electron transfer complexes (photosystem II, photosystem I, cytochrome b6f complex and ferredoxins), and abundantly bound to plastid or mitochondrial proteins in the eukaryotic phytoplankton (Fig. 3). Our analysis reveals an inverse relation between the proportion of specific Fe transporters among trace metal transporters and the percentage of Fe-binding proteins in phytoplankton proteomes (Fig. 4a and Supplementary Fig. 5a). The emergence of cyanobacteria in an anoxic ocean, with abundant soluble Fe(II) and continued selection to inhabit an environment with relatively greater Fe bioavailability35, is consistent with both the greater Fe-binding proportion of cyanobacterial proteomes and the lower proportion of high-affinity Fe-specific transporters, reflecting the absence of a selective pressure to develop a specific Fe uptake system. The red lineage secondary endosymbionts, such as diatoms, emerged later in an environment where the bioavailable Fe concentration was orders of magnitude lower, and have been selected for streamlined Fe use in their proteomes. Meanwhile, they developed more Fe transporters and pathways for Fe acquisition to fulfil this reduced Fe need in response to the scarcity of this key element.

A positive correlation exists between the proportion of specific Zn transporters and Zn-binding proteins in different phytoplankton proteomes (Fig. 4b and Supplementary Fig. 5b). Larger proportions of Zn in eukaryotic than prokaryotic proteomes are attributed to small Zn-binding domains such as Zn fingers and RING domains, involved in protein–DNA/RNA interactions and protein–protein interactions36. A positive linear relationship between the number of Zn-containing proteins and the total number of proteins (r² = 0.91) of different phytoplankton (both cyanobacteria and eukaryotes; Supplementary Fig. 6) indicates an increase of Zn requirements with complexity of life, consistent with the previous finding that Zn in higher organisms is essential for many enzymatic activities and enables tight control of gene expression33,36. The higher percentage of Zn in eukaryote proteomes compared with cyanobacteria and a higher number of Zn transporters to manage Zn homeostasis in their cells may reflect a possible increase of bioavailable Zn in the surface ocean based on thermodynamic considerations37. Such an increase in Zn is not well supported by current geological evidence1,38,39 (Supplementary Discussion), so either there are secondary imprints on the geological record or the increased use of Zn in eukaryotes may just reflect the uniqueness of its chemistry to life, leading to more extensive use and incorporation of Zn into more functions, enabling more complex life to evolve40.

## Linking metal availability with phytoplankton shifts

Given the differences in metalloproteome and metal transport strategy among phytoplankton groups, evolving ocean chemistry contributed selective pressure that promoted the proliferation of different groups through major transitions in Earth’s history8,41,42 (Fig. 5 and Supplementary Fig. 1). The surge of both major and minor nutrients in the surface ocean supplied by the Sturtian deglaciation41,42 may have promoted the rise of green algae (Chlorophyta, primary endosymbionts), well adapted to trace-metal-rich conditions. The contemporaneous increased atmospheric oxygenation43 may have contributed to reduced bioavailable Fe but increased bioavailable Zn in the surface ocean and intensified the selection for the lower-Fe and higher-Zn requirements of the green algae compared with cyanobacteria (Fig. 1). Vertical distributions of metals may have been altered even if the Zn content of the whole ocean remained broadly uniform throughout time1,38,39. It was not until the Mesozoic that the secondary endosymbionts were able to radiate across oceanic shelves into the open ocean (Fig. 5 and Supplementary Fig. 1). They harbour a more metal-lean strategy, with low requirements of Fe, Cu and Zn in their proteomes, compared with the primary endosymbionts (Fig. 1). The more general transport strategies of the secondary endosymbionts compared with the selective transport systems of the primary endosymbionts also make the secondary endosymbionts better adapted to a trace-metal-poor, oligotrophic marine environment. The rise of the secondary endosymbionts to dominance in the Mesozoic therefore points to a distinct shift towards a paucity of trace metals in the open surface ocean ecological niche at this time, a target for validation by future geochemical records (Fig. 5 and Supplementary Fig. 7).

The modern ocean allows a validation of this environmental selection by trace metals between niches of different phytoplankton groups, evidenced by their proteomes and geological succession. In the modern ocean, higher concentrations of trace metals, including Fe and Zn, are found in the coastal and upwelling zones, while the open ocean is metal scarce44. We extracted phytoplankton abundance data from the Ocean Biodiversity Information System and visualized them on the world ocean map (Fig. 6 and Supplementary Fig. 8). Although there are variations and uncertainties due to the differences in ecotypes within each group, different phytoplankton groups generally display contrasting ecologies consistent with their strategies for trace metal tolerance and acquisition. The secondary endosymbionts are dominant in the modern open ocean45 and, in most cases, the primary endosymbionts Chlorophyta and Rhodophyta are more abundant in the coastal regions32,46,47. Cyanobacteria are abundant in both coastal and open-ocean regions, consistent with our prediction from metalloproteome and metal-transporter analysis. Although data regarding cyanobacterial transporters show that they employ a strategy centred around nutrient acquisition and efflux, allowing them to survive and be detectable in most modern open-ocean, nutrient-poor environments48, we also show that a greater proportion of their proteome binds Fe, reflecting their high Fe requirement. This is supported by physiological experiments49,50 and by metagenomic analysis indicating Fe-limitation stress in cyanobacteria across large areas of the open ocean35.

Trace metal availability, and thus its role in phytoplankton growth, continues to evolve in the modern ocean. The contemporary ocean ecosystem is under threat from stressors associated with climate change, including warming, acidification, deoxygenation and pollution, which have implications for coastal eutrophication and the modulation of coastal versus open-ocean habitats. The change in trace metal geochemistry associated with these stressors may also substantially alter the phytoplankton community structure and thus carbon export activity in the future ocean51.

## Methods

To ensure the robustness and validity of our conclusions, we have employed two different approaches of analysis of the trace metal complement of different phytoplankton in this study. The majority of the data are collected and analysed by a comparative genomic approach, using complete genome-inferred proteomes of different phytoplankton species. To augment the data, we conducted domain analysis on both the complete proteomes of the phytoplankton species and all sequences from the Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP)57. In this analysis, we used the Structural Classification of Proteins (SCOP) database58 and MetalPDB database59 as our library and references for protein domain and metal domain annotations. The uncertainties associated with these two approaches are discussed in the Supplementary Information.

### Proteome sources and raw data generation for the comparative genomics approach

In the comparative genomic analysis, to evaluate the proportional importance of each trace metal for the different lineages (Fig. 1), we calculated the percentage of metalloproteins binding with specific trace metals in the genome-predicted proteomes of 26 species of phytoplankton (Supplementary Table 1). Although our list of phytoplankton in this study is not exhaustive of the current phytoplankton proteomic database, the species included are from a diverse range of phyla from two superfamilies: the chlorophyll a+b-containing eukaryotes (green lineage) including Chlorophyceae, Trebouxiophyceae, Klebsormidiophyceae and Prasinophyceae; and the chlorophyll a+c-containing eukaryotes (red lineage) including Bangiophyceae, Florideophyceae, Cyanidiophyceae, Cryptophyta, Haptophyta and Stramenopiles. In the red lineage group, Cryptophyta, Haptophyta and Stramenopiles are secondary endosymbionts, while the others are primary endosymbionts. Although the focus of this study is the differences between the dominant eukaryotic phytoplankton groups, we have included two species of cyanobacteria, which are widely observed in the ocean and are major contributors to primary production35, for comparison.

Groups of orthologues shared among phytoplankton species were inferred using OrthoFinder60,61 with DIAMOND (double index alignment of next-generation sequencing data) for protein alignment62. In detail, we conducted an all-versus-all DIAMOND BLAST search throughout our list of phytoplankton sequences to measure pairwise sequence similarity between all sequences in the species. To reduce the impact of gene length on clustering accuracy, the method employs a protocol that determines the gene length dependency of a given pairwise species comparison from an analysis of the bit scores from an all-versus-all BLAST search60. Briefly, for each species pair, the all-versus-all BLAST hits were divided into equal-sized bins of increasing sequence length according to the product of the query and hit sequence lengths. The best 5% of hits were used to represent good hits for sequences of that length bin, and a linear model in log–log space was fitted to the scores by the least-squares method. By transforming all the BLAST bit scores from each species-pair all-versus-all search using this model, the best hits between sequences in the species pair have equivalent scores that are independent of sequence length. Such score transformation enabled a normalization for both gene length and phylogenetic distance between species. These scores were used as the measure of sequence similarity on which all subsequent analysis and clustering were performed. After obtaining the orthogroups, the gene trees for all orthogroups were inferred, and from these gene trees the rooted species tree was identified. The method then provided both gene-tree- and species-tree-level analysis of all gene duplication events. Based on all this phylogenetic information, we could then identify the complete set of orthologues between all species of phytoplankton that we investigated.
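
The length normalization described above can be sketched as follows. This is a simplified illustration and not OrthoFinder's actual code: the function names, the number of bins and the input layout are assumptions, and only the binning by length product, the selection of the best 5% of hits per bin and the log–log least-squares fit follow the description in the text.

```python
import math


def fit_length_model(hits, n_bins=4, top_frac=0.05):
    """Fit log10(score) = a * log10(query_len * hit_len) + b.

    hits: list of (query_len, hit_len, bit_score) for one species pair.
    n_bins is a hypothetical choice; the text only says "equal sized bins".
    """
    ranked = sorted(hits, key=lambda h: h[0] * h[1])
    size = max(1, len(ranked) // n_bins)
    xs, ys = [], []
    for start in range(0, len(ranked), size):
        bin_hits = ranked[start:start + size]
        # Keep the top-scoring fraction of each bin (at least one hit).
        keep = max(1, int(len(bin_hits) * top_frac))
        best = sorted(bin_hits, key=lambda h: h[2], reverse=True)[:keep]
        for q_len, h_len, score in best:
            xs.append(math.log10(q_len * h_len))
            ys.append(math.log10(score))
    # Ordinary least squares in log-log space.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def normalise(score, q_len, h_len, a, b):
    """Divide a bit score by the score expected for a good hit of this length."""
    expected = 10 ** (a * math.log10(q_len * h_len) + b)
    return score / expected
```

After this transformation, a good hit between two short sequences and a good hit between two long sequences both score close to 1, so clustering is not biased towards long genes.
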
Here, the orthogroup graph was clustered using Markov Cluster Algorithm (MCL) with its default inflation parameter of 1.5 (https://micans.org/mcl/). The species tree for the phytoplankton investigated in this study was also constructed using OrthoFinder based on the Species Tree from All Genes (STAG) algorithm, which infers the species tree using the most closely related genes within single-copy or multi-copy orthogroups. The output comparative genomics data and statistics were then subjected to further analysis for investigating metal transporters and metalloproteins. We searched for and identified the metal-binding proteins according to their functions, that is, metal transporters or metalloproteins that have other functions. We aim to understand the usage of trace metals by phytoplankton as well as their strategies for transporting different metals. Together, these two processes can determine the trace metal requirements of phytoplankton as reflected in their responses to trace metal limitation and toxicity. For each metalloprotein, we take into consideration all the different metals that can act as cofactors. For instance, Cu/Zn superoxide dismutase would be considered both a Cu metalloprotein and a Zn metalloprotein, to reflect its potential use of these two different trace metals.
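
A minimal sketch of the multi-cofactor counting rule is given below. The input layout (a mapping from protein identifier to a set of metal cofactors) and the function name are hypothetical; the only element taken from the text is that a protein with several possible cofactors is counted once for each metal.

```python
from collections import Counter


def metal_percentages(proteome):
    """Percentage of proteins in a proteome that bind each metal.

    proteome: dict mapping protein ID -> set of metal cofactors
    (empty set for proteins that bind no metal).
    """
    counts = Counter()
    for metals in proteome.values():
        # A protein such as Cu/Zn superoxide dismutase adds one to Cu AND Zn.
        for metal in set(metals):
            counts[metal] += 1
    total = len(proteome)
    return {metal: 100.0 * n / total for metal, n in counts.items()}


# Toy proteome with four proteins (identifiers are made up).
example = {
    "sod_cuzn": {"Cu", "Zn"},
    "ferredoxin": {"Fe"},
    "zn_finger": {"Zn"},
    "actin": set(),
}
percentages = metal_percentages(example)
```

Because of this rule, the percentages for individual metals can sum to more than the percentage of proteins that bind any metal.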

We use annotations from previous studies on model organisms as our library and references (listed in Supplementary Table 1). Most sequences used in this analysis are reference proteomes from the Universal Protein Resource (UniProt). They were rigorously assembled and annotated by experts in the field and were selected as landmarks in proteome space. Moreover, because we have conducted comparative genomics analysis here, the annotations of the orthogroups from all phytoplankton species included in this analysis are compared and cross-checked against those from all other species; that is, even if we have proteins that derive from relatively worse-annotated species, they are further checked and compared with those better-annotated species to enable a robust analysis in our study.

### Protein (metal) domain analysis

In the domain-level analysis, in addition to all the phytoplankton species from UniProt shown in Supplementary Fig. 1, we included whole proteome sequences of four phytoplankton species that have not been analysed in the comparative genomic approach: the stramenopile Phaeodactylum tricornutum and dinoflagellates Polarella glacialis, Symbiodinium microadriaticum and Symbiodinium pilosum. These sequences are also all from UniProt. We have also analysed all available sequences from the MMETSP. To do that, we applied a DIAMOND BLAST search of all the sequences against the SCOP database to identify all the domains for each species, using an Expect value (E) of 10⁻⁴ as the cut-off. Then we searched the Protein Data Bank ID for each domain against the MetalPDB database to identify all the metal-containing domains. The relative abundances of trace metals for each species were then calculated by using the total number of metal domains normalized to total number of protein domains identified. We then conducted statistical analysis for different groups of phytoplankton to understand their relative requirement of trace metals. As the quality of the sequences from the MMETSP varies from strain to strain, we included only those (a total of 608 sequences) that had more than 200 protein domains identified for further metal domain and statistical analysis.
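
The filtering and normalization steps above can be sketched as follows. The input layout, the function name and the lookup table standing in for the MetalPDB annotation are hypothetical, and treating the E-value cut-off as inclusive is an assumption; the thresholds themselves (E of 10⁻⁴, more than 200 domains) are those stated in the text.

```python
def metal_domain_fractions(hits, domain_metals, e_cutoff=1e-4, min_domains=200):
    """Fraction of identified protein domains that contain each metal.

    hits: iterable of (domain_id, e_value) from the domain search.
    domain_metals: dict domain_id -> set of metals (hypothetical stand-in
    for the MetalPDB lookup).
    Returns None when too few domains were identified for the sample.
    """
    # Keep only hits that pass the E-value cut-off (assumed inclusive).
    kept = [domain for domain, e_value in hits if e_value <= e_cutoff]
    # Samples need MORE than min_domains identified domains to be analysed.
    if len(kept) <= min_domains:
        return None
    counts = {}
    for domain in kept:
        for metal in domain_metals.get(domain, ()):
            counts[metal] = counts.get(metal, 0) + 1
    return {metal: n / len(kept) for metal, n in counts.items()}
```

Normalizing by the number of identified domains, rather than by the number of sequences, keeps transcriptomes of different completeness comparable.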

### Phytoplankton half-saturation coefficient data

Ku (the half-saturation coefficient for growth) values of different phytoplankton can be used as a measure of their metal requirements. Ku is the value of [S] at which μ = μmax/2, as described by the response of phytoplankton physiology (growth) to external trace metal concentrations using the steady-state equation for substrate limitation (the Monod equation, equation (1)):

$$\mu = \frac{\mu_{\mathrm{max}}[S]}{K_u + [S]} \tag{1}$$

wherein [S] is the concentration of the limiting nutrient; μ is the specific growth rate and μmax is the maximum specific growth rate. The raw data for calculating the half-saturation constants are from ref. 63. Only data from the species that overlapped with our metalloproteome analysis are included in Fig. 2, including Synechococcus, Emiliania huxleyi, Thalassiosira pseudonana and Thalassiosira oceanica. We used data from a single study in which all conditions were consistent to avoid potential differences in Ku values caused by variations in experimental conditions.
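
To make the meaning of Ku concrete, the sketch below evaluates equation (1) and recovers μmax and Ku from growth data. The fitting route shown, the Hanes–Woolf linearization [S]/μ = [S]/μmax + Ku/μmax, is an assumption chosen for simplicity and is not necessarily how the values in ref. 63 were derived; the function names and the synthetic data are illustrative only.

```python
def monod(s, mu_max, k_u):
    """Specific growth rate from equation (1)."""
    return mu_max * s / (k_u + s)


def fit_monod(s_values, mu_values):
    """Estimate (mu_max, K_u) by least squares on the Hanes-Woolf form.

    Regressing [S]/mu on [S] gives slope = 1/mu_max and
    intercept = K_u/mu_max.
    """
    ys = [s / mu for s, mu in zip(s_values, mu_values)]
    n = len(s_values)
    mean_x, mean_y = sum(s_values) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in s_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(s_values, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    mu_max = 1.0 / slope
    return mu_max, intercept * mu_max


# Synthetic, noise-free data generated with mu_max = 1.2 and K_u = 0.5.
concentrations = [0.1, 0.5, 1.0, 2.0, 5.0]
rates = [monod(s, 1.2, 0.5) for s in concentrations]
mu_max_est, k_u_est = fit_monod(concentrations, rates)
```

Setting [S] = Ku in equation (1) gives μ = μmax/2, so a species with a lower Ku reaches half its maximum growth rate at a lower metal concentration.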

### Data statistical analysis

#### Metal-transporting families and metalloproteins analysis

For each metal of interest, we searched across the data generated by OrthoFinder and extracted all data annotated as metal-containing or metal-transporting families in the proteome. We constructed matrices with each column representing a distinct species and each row representing either the abundances of metal-related proteins or the detailed protein accessions in the group. To determine whether the encoded metalloprotein abundances were different between phyla and subgroups of phytoplankton, we performed a series of t-tests and analyses of variance (ANOVA) on each dataset. Patterns in metal-transporter systems were also examined using PCA. Multivariate clustering on different matrices was performed using Past (https://uhm.uio.no/english/research/resources/past/).
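
For readers unfamiliar with the test, the following is a minimal sketch of the one-way ANOVA F statistic for comparing, for example, metalloprotein percentages between phytoplankton groups. It is not the implementation in Past; the function name and the toy numbers are assumptions, and converting F to a P value (which requires the F distribution) is deliberately left out.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy example with three groups of three values each.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F indicates that differences between group means are large relative to the scatter within groups.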

#### Amino acid composition of transporter families

We extracted protein sequences for each transporter family from all phytoplankton included in this study and determined their amino acid compositions by counting the elements forming each amino acid in the transporter protein sequences. For each type of transporter, the data were separated into phytoplankton subgroups, including the cyanobacteria, the Rhodophyta, the Chlorophyta and the secondary endosymbionts. One-way ANOVA was employed to determine the statistical differences among groups.
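
One simple way to summarize composition for a transporter family is the fraction of residues of a given type, sketched below. This is an illustrative simplification rather than the exact procedure used here: the function name is hypothetical, and representing the sulfur-containing residues by cysteine (C) and methionine (M) in one-letter code is an assumption.

```python
def residue_fraction(sequences, residues):
    """Fraction of all residues in `sequences` that belong to `residues`.

    sequences: protein sequences in one-letter code for one transporter
    family within one phytoplankton subgroup.
    residues: string of one-letter codes, e.g. "CM" for Cys and Met.
    """
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    matched = sum(seq.count(r) for seq in sequences for r in residues)
    return matched / total


# Toy sequences (made up): sulfur-containing and histidine fractions.
family = ["MCHA", "ACCA"]
sulfur_fraction = residue_fraction(family, "CM")
histidine_fraction = residue_fraction(family, "H")
```

The per-family fractions for each subgroup can then be compared with the one-way ANOVA described above.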

### Distribution of phytoplankton

The abundance data of different phytoplankton in Fig. 6 were taken from the Ocean Biodiversity Information System (OBIS), which provides the world’s largest scientific knowledge base on the diversity, distribution and abundance of all marine organisms. The data for diatoms (Supplementary Fig. 6) are from ref. 64. We extracted data according to their phylum names and included only data with individual counts available in the analysis. Individual counts from each data point were then plotted on a 3 × 3° global grid base map using the Generic Mapping Tools65.
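
The gridding step can be sketched as below, ahead of plotting with a mapping tool. The record layout and function name are hypothetical, and summing the individual counts within each cell is an assumption, as the text does not state how multiple records in one cell were combined.

```python
from collections import defaultdict


def grid_counts(records, cell=3.0):
    """Sum individual counts into cell x cell degree boxes.

    records: iterable of (latitude, longitude, individual_count), with
    latitude in [-90, 90] and longitude in [-180, 180].
    Returns a dict keyed by (row, col), where row 0 starts at 90 S and
    col 0 starts at 180 W.
    """
    n_rows = int(180 / cell)
    grid = defaultdict(int)
    for lat, lon, count in records:
        # Clamp so that the North Pole falls in the last row.
        row = min(int((lat + 90) // cell), n_rows - 1)
        # Wrap so that 180 E shares a column with 180 W.
        col = int(((lon + 180) % 360) // cell)
        grid[(row, col)] += count
    return dict(grid)


# Toy records: two fall in the same 3 x 3 degree cell.
cells = grid_counts([(0.5, 0.5, 10), (2.9, 1.0, 5), (3.1, 0.5, 2)])
```

Each key can be converted back to the south-west corner of its cell as (row × 3 − 90, col × 3 − 180) for plotting.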