N-linked glycosylation enzymes in the diatom Thalassiosira oceanica exhibit a diel cycle in transcript abundance and favor for NXT-type sites

N-linked glycosylation is a posttranslational modification affecting protein folding and function. The N-linked glycosylation pathway in algae is poorly characterized, and further knowledge is needed to understand the cell biology of algae and the evolution of N-linked glycosylation. This study investigated the N-linked glycosylation pathway in Thalassiosira oceanica, an open ocean diatom adapted to survive at growth-limiting iron concentrations. Here we identified and annotated the genes coding for the essential enzymes involved in the N-linked glycosylation pathway of T. oceanica. Transcript levels for genes coding for calreticulin, oligosaccharyltransferase (OST), N-acetylglucosaminyltransferase (GnT1), and UDP-glucose glucosyltransferase (UGGT) under high- and low-iron growth conditions revealed diel transcription patterns with a significant decrease of calreticulin and OST transcripts under iron limitation. Solid-phase extraction of N-linked glycosylated peptides (SPEG) revealed 118 N-linked glycosylated peptides from cells grown in high- and low-iron growth conditions. The identified peptides had 81% NXT-type motifs, with X being any amino acid except proline. The presence of N-linked glycosylation sites in the iron starvation-induced protein 1a (ISIP1a) confirmed its predicted topology, contributing to the biochemical characterization of ISIP1 proteins. Analysis of extensive oceanic gene databases showed a global distribution of calreticulin, OST, and UGGT, reinforcing the importance of glycosylation in microalgae.

Posttranslational modifications of proteins are essential for proper protein function in all living organisms, with N-linked glycosylation prevalent in numerous proteins. N-linked glycosylation is primarily used for protein folding control in the endoplasmic reticulum, with additional functions based on the glycan structures matured in the Golgi apparatus. Whereas the functions of N-linked glycosylation in algae are unknown, those in plants are slightly better understood; for example, a complete blockage of N-linked glycosylation leads to embryonic death, demonstrating the importance of this process in plants 1,2. The maturation of N-linked glycan structures differs based on specific enzymes in the Golgi apparatus, and while vertebrates show a high diversity of glycan structures, homologues for many of the enzymes present in vertebrates have not been identified in plants and microalgae 3,4. The diversity of glycan structures in vertebrates is reflected in functions ranging from cell-cell communication and involvement in autoimmune diseases to inflammatory reactions 5. It is therefore not surprising that N-linked glycosylation is often essential for the functionality of biopharmaceuticals. Microalgae are seen as an attractive alternative for recombinant protein production because they are eukaryotic, photoautotrophic, and fast-growing 6. The first production of a functional monoclonal antibody in Phaeodactylum tricornutum was a significant step towards a recombinant protein expression system in diatoms for pharmaceutical usage 7-10. P. tricornutum is capable of N-linked glycosylation resulting mainly in high-mannose glycans 4. The N-linked glycosylation pathway in eukaryotes is a multi-step process that starts in the endoplasmic reticulum (ER) and continues throughout the Golgi apparatus.

Results
Identification of enzymes involved in N-linked glycosylation. As a first step in our study, we scanned the T. oceanica genome to identify the genes coding for proteins of the N-linked glycosylation pathway. Figure 1 and Table 1 describe the enzymes known to be involved in the N-linked glycosylation pathway and their corresponding genes. The in-silico search in T. oceanica yielded 10 of the 11 different ALG enzymes that are involved in the formation of the lipid-linked oligosaccharide (LLO) 11. The missing enzyme is ALG10, which transfers the outermost glucose onto the LLO (Fig. 1, highlighted in red). Our in-silico search revealed a putative flippase, potentially involved in turning the glycan structure from the outside of the endoplasmic reticulum to the inside 27. The analysis also showed that T. oceanica possesses an OST to attach the glycan structure to the NXT/S motif (Fig. 2). We identified calreticulin in the T. oceanica proteome as a putative chaperone, mediating the protein folding control in the ER 28 (Table 1). Correctly folded proteins enter the Golgi apparatus, and the glycan structure is modified by a mannosidase (α-ManI) 29, with further modifications being either GnT1-dependent or GnT1-independent 4. Based on the presence of GnT1 in T. oceanica, both pathways are possible. In addition to GnT1, we identified a fucosyltransferase ((1,3)-FucT) and a galactosyltransferase (β1,4-GalT) that can alter the glycan structure in the Golgi apparatus (Table 1).
Table 1. Comparison of proteins involved in the N-linked glycosylation pathway between T. oceanica (this study) and P. tricornutum 4. The table gives a comparison of previously published proteins that are involved in N-linked glycosylation in P. tricornutum 4 (Pt) and the corresponding proteins we discovered in T. oceanica (To). The table provides the length of the proteins (number of amino acids), the number of transmembrane domains (TM), the corresponding protein domains (PFAM), and the protein numbers (THAOC, Protein Nr.). The following abbreviations are used: N-acetylglucosaminyltransferase (GlcNAcT), mannosyltransferase (ManT), glucosyltransferase (GlcT), dolichol-phosphate glucosyltransferase (P-Dol GlcT), dolichol-phosphate mannosyltransferase (P-Dol ManT), glucosidase II (Glc II), mannosidase I (Man I), N-acetylglucosaminyltransferase I (GnTI), mannosidase II (Man II), UDP-glucose glucosyltransferase (UGGT).
ManT ALG11  THAOC_09775  673  433  2  4  PF15924, PF00534  PF00534  54621
(1,2)-ManT ALG12  THAOC_28372  361  581  6  6  PF03901, PF00153  PF03901  44425
(1,2)-ManT ALG9  THAOC_13160  637  556  7  10  PF03901  PF03901  44574
(1,3)-FucT  THAOC_00736  1273  481  0  1  PF00852  PF00852  54599
(1,3)-FucT  THAOC_14764  281  770  0  1  PF00852  PF00852
Figure 3. (B) We extracted the sequences of 90 previously published proteins from C. reinhardtii 15 that were N-linked glycosylated and performed the same analysis as for T. oceanica (C. reinhardtii, 90 proteins: Sec 45, Chl 15, Mito 12, undef 18). The same servers were used to analyze the subcellular localization. (C) The 120 motifs identified in T. oceanica are shown in terms of the motif type (NXT or NXS) with one-letter code for the amino acid and X being any amino acid except proline. Small circles within the grey circles show the number of motifs that could be putatively bacterial (orange) or that have a valine (red). The numbers of motifs found in high-iron treatment (yellow), low-iron treatment (brown), or both treatments (overlapping area) are shown. (D) Comparison with peptides found in B. braunii 16 and C. reinhardtii 15. We analyzed the peptides from both studies with respect to the type of motif (NXT; NXS) and compared these findings to our data from T. oceanica. Shown are the total motifs found in each study and the percent of NXT and NXS motifs.

Solid-phase extraction of N-linked glycopeptides.
The solid-phase extraction of N-linked glycosylated peptides (SPEG) pre-selects these peptides by binding them to hydrazide beads. This approach resulted in the identification of 118 peptides with 120 motifs originating from 115 proteins (Fig. 3c), among which a nitrate reductase, iron starvation-induced protein 1a (ISIP1a), and a ferrichrome-binding protein (FBP) were identified. All other proteins were unknown, and putative functions were identified through BLAST and KEGG database searches. The analysis of the proteins through the KEGG database resulted in putative functions for 23 proteins. These functions were widely distributed throughout various cellular pathways, ranging from glycan biosynthesis and genetic information processing to amino acid metabolism (Supplementary File S1). Most of the motifs (101) were of the NXT type, and only 19 were of the NXS type. We analyzed previously published data to compare the number of N-linked glycosylation site types in C. reinhardtii 15 and B. braunii 16 to our results in T. oceanica. In all three species, the NXT type is the dominant type and accounts for more than 60% of the identified glycosylation sites (Fig. 3d). Within the 120 motifs in T. oceanica, we found 16 putative bacterial glycosylation motifs of the D/E-X1-N-X2-S/T type, and, interestingly, 27 had a valine located either within the motif or directly preceding or following it (Fig. 3c). The gene of one of the identified peptides with a putative bacterial glycosylation motif was previously shown to originate from cyanobacteria via lateral gene transfer 17. Overall, 66 of the identified proteins had a predicted transmembrane domain (TM) or a signal peptide (SP), and 32 proteins were predicted to enter the secretory pathway.
While 18 proteins were targeted to the chloroplast (Chl) and 23 to the mitochondrion (Mito), 42 proteins were undefined (undef) in terms of their location and would be considered cytosolic proteins in T. oceanica (Fig. 3a). In comparison, within the 90 N-linked glycosylated proteins previously identified in C. reinhardtii 15, 45 proteins were predicted to enter the secretory pathway, 15 proteins were targeted to the chloroplast, 12 proteins to the mitochondrion, and 18 proteins were undefined (Fig. 3b). The high-iron cultures of our study on T. oceanica were grown in the presence of 15N-labelled nitrate as the sole N-source while the low-iron cultures were grown with unlabelled nitrate, allowing the combined proteomic analysis of iron-limited and iron-replete samples. This approach was used to distinguish the origin of the identified peptides and led to the detection of 46 motifs in both iron conditions, 30 in high-iron and 25 in low-iron samples (Fig. 3c). These results are based on the empirical observation of peptide-to-spectrum (MS/MS) matches (PSMs).
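The motif definitions used above (NXT/NXS with X being any amino acid except proline, and the putative bacterial D/E-X1-N-X2-S/T type) can be illustrated with a short Python sketch. This is not the pipeline used in this study; the function names and the test sequence are hypothetical, and the exclusion of proline at X1 and X2 of the bacterial-type motif is an assumption based on the bacterial consensus rather than on the text above.

```python
import re

# Eukaryotic sequon N-X-S/T, with X being any residue except proline.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

# Putative bacterial-type motif D/E-X1-N-X2-S/T.
# Assumption: proline is excluded at both X1 and X2.
BACTERIAL = re.compile(r"(?=([DE][^P]N[^P][ST]))")


def find_sequons(seq):
    """Return (position, motif, type) per sequon; position is the 1-based index of N."""
    hits = []
    for m in SEQUON.finditer(seq.upper()):
        motif = m.group(1)
        kind = "NXT" if motif[2] == "T" else "NXS"
        hits.append((m.start() + 1, motif, kind))
    return hits


def is_bacterial_type(seq, n_pos):
    """True if the sequon whose N sits at 1-based n_pos is preceded by D/E-X1."""
    start = n_pos - 3  # 0-based index of the candidate D/E residue
    if start < 0:
        return False
    return BACTERIAL.match(seq.upper(), start) is not None


if __name__ == "__main__":
    peptide = "MKDANVTLLNPSAANGSQ"  # hypothetical sequence, for illustration only
    for pos, motif, kind in find_sequons(peptide):
        print(pos, motif, kind, is_bacterial_type(peptide, pos))
```

In the hypothetical sequence, the N-P-S stretch is skipped because proline occupies the X position, and only the first sequon is flagged as bacterial-type because it is preceded by D-A.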
Targeted transcriptome analysis. The transcript levels of OST, calreticulin, UGGT, and GnT1 as key enzymes in the N-linked glycosylation pathway were analyzed in a targeted transcriptome experiment with cells grown under iron-replete and iron-deprived conditions. Here, calreticulin and OST showed a significant difference between high- and low-iron conditions. This transient difference resulted from higher transcript counts in high-iron samples during the first 6 h (Fig. 4a). The diurnal pattern for OST, calreticulin, and UGGT showed maximum expression towards the end of the light phase and a steady downregulation during the night. The analysis of transcript levels included the use of actinomycin D, a transcription inhibitor. The inhibition of gene transcription resulted in a rapid and similar degradation of all four transcripts, and none of them showed prolonged retention (Fig. 4c,d).

Discussion
Our study combines comparative genomics, targeted transcriptomics, and proteomics approaches to elucidate the N-linked glycosylation pathway in the open ocean diatom T. oceanica in the context of high- and low-iron growth conditions. The genes encoding enzymes involved in the N-linked glycosylation pathway were identified (Table 1), and the transcript levels of four key enzymes were tracked through a diel cycle in cultures grown under iron-replete and iron-deficient growth conditions (Fig. 4a). Further, we characterized 120 N-linked glycosylated motifs in 115 proteins, including N-linked glycosylation sites in iron-regulated proteins (Fig. 3 and Supplementary File S1).
Overall, this work contributes to a greater understanding of N-linked glycosylation in microalgae. Based on their low cost and easy growth as well as their eukaryotic character, microalgae are a promising alternative to bacteria, mammalian cell lines, or yeast for the production of recombinant proteins 30,31. Posttranslational modifications, such as N-linked glycosylation, are often essential in the production of recombinant proteins because they are frequently required for the proper function of the produced proteins 5. The understanding of N-linked glycosylation in microalgae is still in its infancy, and research has been done on a few microalgae, including C. reinhardtii 26,32, P. tricornutum 4, Porphyridium sp. 33,34, B. braunii 16 and Chlorella vulgaris 35, demonstrating species-specific glycan structures. We analyzed T. oceanica in this context because it is an open ocean diatom of ecological importance. Our results revealed high similarities between P. tricornutum and T. oceanica with respect to the proteins involved in the N-linked glycosylation pathway (Table 1).
Two enzymes, ALG10 and GC-I, that are involved in the maturation of the N-linked glycan precursor were not identified in T. oceanica. Our results align with previous findings demonstrating the absence of ALG10 in diatoms in general and the absence of GC-I in P. tricornutum 4,33 . Interestingly, ALG10 is responsible for the attachment of the outer glucose residue, and GC-I removes this glucose residue (Fig. 1). The absence of both enzymes in T. oceanica and P. tricornutum suggests that this might be a unique feature in the N-linked glycosylation pathway in diatoms.
The identification of N-linked glycosylated peptides included the pre-selection of glycosylated peptides bound to hydrazide beads using the SPEG method 36. This pre-selection is important as the identification of glycosylated peptides in the proteomic data processing is based on the deamidation of the asparagine. Asparagine deamidation occurs when PNGase F cleaves the glycan. The deamidation reaction can also occur spontaneously 37, resulting in false-positive glycosylated peptides, but a pre-selection of glycosylated peptides greatly enhances the identification of true glycosylated peptides 38. We also verified our glycosylated peptides in-silico through the NetNGlyco server, revealing only a low number of motifs with low specificity (12 out of 120), and we identified almost 50% of the glycosylated peptides in both high- and low-iron samples, greatly reducing the chance that they result from spontaneous deamidation. The majority of motifs identified in the SPEG analysis were of the NXT type (101), and only 19 motifs belonged to the NXS type. We found similar proportions when we analyzed previously published data from B. braunii 16 and C. reinhardtii 15 (Fig. 3d). T. oceanica possessed motifs with a putatively bacterial N-linked glycosylation motif, as well as motifs with a valine either within the motif itself or following the motif. Valine is part of N-linked glycosylation sites in insects, where an NXV motif has been reported 39. In our study we used PNGase F, which is often preferred because the bulky PNGase A is considered less effective for large peptides and proteins 40. However, the use of PNGase F has limited our results to peptides that are not core α(1,3)-fucosylated. P. tricornutum exhibited a weak presence of core α(1,3)-fucosylated glycan structures in comparison to the green onion 4, and future studies on this topic in diatoms should also consider the additional use of PNGase A as it cuts core α(1,3)-fucosylated glycan structures.
www.nature.com/scientificreports/
The analysis of protein lysates from iron-deprived conditions resulted in identifying N-linked glycosylation sites in proteins that are important for the adaptation of T. oceanica to low-iron conditions. These included two N-linked glycosylation sites in ISIP1a, a protein that is involved in endocytosis-mediated iron uptake via siderophores in P. tricornutum 41. The predicted conformation of ISIP1 proteins shows an extracellular N-terminal domain 42, which we confirmed through the identification of N-linked glycosylated peptides in this domain. We also identified N-linked glycosylated peptides of two putative ferrichrome-binding proteins (FBP) (THAOC_08758 and THAOC_28875). These proteins might play an important role in the low-iron adaptation as discussed in P. tricornutum 43, and the identification of their peptides under iron-deprived conditions demonstrates the presence of FBP proteins in T. oceanica under iron limitation. Previous transcriptomic studies support their sole expression under iron-limited conditions in T. oceanica 17.
The subcellular localization analysis revealed that only 27% of the proteins identified in our N-linked glycosylated peptide study were targeted to the secretory pathway, and 35% were targeted towards the chloroplast or the mitochondrion (Fig. 3a). We expected a higher percentage targeted to the secretory pathway, but N-linked glycosylated proteins have been previously reported for chloroplasts and for mitochondria 44-47. Passage through the ER has been described as an alternative route for some mitochondrial proteins 44, and N-linked glycosylations of chloroplast-targeted proteins such as the carbonic anhydrase 45 and a plastidial pyrophosphatase 46 have been demonstrated previously. N-linked glycosylation also occurs in some chloroplast-targeted proteins during their transport through the first membrane as part of the protein transport into complex plastids 47. For comparison, we analyzed 90 previously published N-glycosylated proteins in C. reinhardtii 15. The analysis showed that 15 of the 90 proteins are targeted to the chloroplast, and 12 proteins are targeted to the mitochondrion. In C. reinhardtii, 50% of the proteins were predicted to go into the secretory pathway (Fig. 3b). Both datasets show that N-linked glycosylations might play an important role in protein function and trafficking, demonstrating a complex function of N-linked glycosylation in microalgae.
The targeted transcriptome approach aimed primarily to verify that genes involved in the N-linked glycosylation pathway, identified in silico, were actively transcribed, demonstrating that the pathway is active during a diel cycle (Fig. 4a). In addition, we detected differences in expression patterns between high- and low-iron cultures (Fig. 4b). Based on the restructuring of the surface proteome in T. oceanica under low-iron conditions 17, the key enzymes in our targeted transcriptomic approach included the catalytic domain staurosporine and temperature-sensitive 3 (STT3) as a subdomain of OST 48, calreticulin, and UGGT as central enzymes in the ER. OST and calreticulin revealed a transient downregulation for the first six hours under low-iron conditions. Despite this downregulation, the transcript levels of OST, UGGT, and calreticulin exhibited a diel cycle with an increase throughout the light period, possibly linked to an increased expression of genes for proteins related to the secretory pathway (Fig. 4a). In P. tricornutum, genes encoding proteins of the cytosolic glycolysis and the citric cycle showed a similar expression pattern 49. The upregulation of these two pathways could increase the number of uptake proteins needed to supply macro- and micronutrients, which could result in an increased flow of proteins through the ER. Overall, the similarity in their transcript levels supports a functional connection (Fig. 4a). The fourth transcript that we analyzed was GnT1 as a critical enzyme for complex glycan maturation in the Golgi apparatus. GnT1-dependent glycan structures have been shown for B. braunii 16, but the activity in diatoms is under debate 4,15. The analysis of conserved sites in the T. oceanica GnT1 showed the presence of most of the conserved sites previously identified in P. tricornutum 4 (Supplementary Fig. S1). The activity of GnT1 was verified by a qPCR assay in P. tricornutum 4.
In our study, GnT1 transcripts were detectable throughout the diel growth cycle in T. oceanica, although in lower abundance when compared to the transcript levels for the three other genes we analyzed (Fig. 4a). Whether GnT1 is active in T. oceanica and P. tricornutum still needs to be shown because the GnT1-dependent glycan structures that were identified in P. tricornutum 4 were also found in the non-GnT1 harbouring C. reinhardtii 15. In-silico analysis showed the presence of GnT1 in numerous algae 50, but GnT1-dependent glycan structures were so far only identified in the green alga Botryococcus braunii 16. The knockout of GnT1 in mice was embryonic lethal 51, and the importance of GnT1 in plants was shown under stress conditions 52. Whether or not GnT1 activity is important for microalgae remains unclear. However, we searched through the gene repository database from the Tara Oceans initiative 53 to assess the global oceanic distribution of gene abundance for OST, calreticulin, UGGT, and GnT1 (Fig. 5). The Tara Oceans database, assembled from several globally distributed research cruises, is to date the most complete collection of marine metagenomes and metatranscriptomes 54,55, and is widely used to verify the presence of genes and assess their relative abundance in the ocean globally 41,53,56,57. Our search in the Tara Oceans dataset revealed a lower abundance of GnT1 genes compared to OST, UGGT, and calreticulin (Fig. 5), indicating a restricted presence of GnT1 in microalgae, which coincides with the limited presence of GnT1-based glycan structures in microalgae 16.
It is still very early to speculate, but in addition to the importance of N-linked glycosylation for the folding control of proteins, N-linked glycosylation in microalgae could also play a role in intra/inter-species signalling and communication for algae-algae or algae-bacteria relationships 58 . The importance of N-linked glycosylation for cell-cell communication in multicellular eukaryotes and the species-specific glycan structures found in microalgae point towards a role of N-linked glycosylation that goes beyond the folding control.
In conclusion, our study provides insight into N-linked glycosylation in the open ocean diatom T. oceanica. We identified enzymes that are essential for N-linked glycosylation, resulting in a fully functional N-linked glycosylation pathway in T. oceanica. Our targeted transcriptome study revealed diel expression patterns of four key enzymes under high- and low-iron conditions with a transient downregulation of OST and calreticulin under low-iron conditions. The proteomic study not only revealed the characteristics of N-linked glycosylation motifs, but these motifs were also identified in peptides of two important iron-regulated proteins, ISIP1a and FBP. The N-terminal position of these peptides in ISIP1a verified the predicted topology of ISIP1a. Future work focusing on additional genome sequences of algal species, analysis of glycan structures, and, most importantly, functional characterization of specific proteins involved in N-linked glycosylation is needed to gain further insight into the evolution and function of N-linked glycosylation in the ocean.

Methods
Artificial seawater and culturing. For the targeted transcriptomic experiment, axenic Thalassiosira oceanica (CCMP1005) was grown in ASW f/2 following a 14/10 h light/dark cycle at 22 °C in trace metal clean polycarbonate (PC) bottles. ASW was prepared after Goldman et al. 59. The Aquil metal mix 60 was used (100 µM EDTA, no addition of Ni2+). Trace metal clean techniques were used at all times, and ASW was cleaned using Chelex100 (BioRad, Hercules, CA, USA). An acidified (pH = 2) 10 mM solution of FeCl3 without EDTA was used to add FeCl3 to a final concentration of 10 µM. Cell culture for the solid-phase extraction of N-linked glycopeptides (SPEG) analysis was done separately with ASW f/2 following the recipe from Goldman et al. 59, using high purity salts (BioUltra). Batch cultures were regularly assessed for iron limitation by measurements of normalized variable fluorescence (Fv/Fm). T. oceanica cultures in high-iron media were grown in the presence of 10 µM Fe and 10 µM EDTA, with 98% 15NO3 (Sigma-Aldrich, St. Louis, MO, USA). This labelling was used to differentiate between peptides identified under high- and low-iron conditions.
Solid-phase extraction of N-linked glycopeptides. Sample volumes of 850 ml for low-iron and 500 ml for high-iron cultures were filtered onto 2 µm polycarbonate filters, rinsed off, and subsequently pelleted. The protocol by Tian et al. (2007) was followed for capturing the N-linked glycosylated peptides, pre-selecting glycosylated peptides to avoid false-positive N-linked glycosylated peptide identification. We conducted the protocol with minor modifications. After protein lysis, 500 µg of each sample was combined and reduced for 60 min at 60 °C using 10 mM DTT. An alkylation step followed with 12 mM iodoacetamide for 30 min at room temperature (RT). A 100 mM (pH 8) potassium phosphate buffer was used to dilute the urea, and 20 µg trypsin was added for overnight digestion. The sample was acidified (pH < 3) with formic acid and washed in HLB cartridges. Columns were conditioned with 1 × 1 ml 50% acetonitrile (ACN) with 0.1% trifluoroacetic acid (TFA) and 2 × 1 ml 0.1% TFA. The sample was loaded, and the column was washed 5 × with 0.1% TFA. Two elution steps followed with 600 µl 50% ACN, 0.1% TFA. The peptides were oxidized with a final concentration of 10 mM sodium periodate for 1 h at 4 °C in the dark, followed by dilution to an ACN content below 5% and acidification with formic acid (pH < 3). A final column wash, as described above, followed. Hydrazide beads (75 µl) were used to bind the oxidized peptides in an overnight reaction. Three µl PNGase F was used to release the peptides for 3 h at 37 °C. Peptides were dried in a SpeedVac and stored at −20 °C for further processing.
Peptide identification. The methods were based on previously published techniques 61 with some modifications. Briefly, the pre-selected and cleaved peptides were dried to a pellet in a vacuum centrifuge and subsequently resuspended in 20 µl of a 3% ACN, 0.5% formic acid solution. The samples were transferred to a 300 µl HPLC vial and subjected to analysis by LC-MS/MS on a VelosPRO orbitrap mass spectrometer (ThermoFisher Scientific, Waltham, Massachusetts, USA) equipped with an UltiMate 3000 Nano-LC system (ThermoFisher Scientific, Waltham, Massachusetts, USA). Chromatographic separation of the digests was performed on a PicoFRIT C18 self-packed 75 µm × 60 cm capillary column (New Objective, Woburn, Massachusetts, USA) at a flow rate of 300 nl/min. MS and MS/MS data were acquired using a data-dependent acquisition method in which a full scan was obtained at a resolution of 30,000, followed by ten consecutive MS/MS spectra in both higher-energy collisional dissociation (HCD) and collision-induced dissociation (CID) mode (normalized collision energy 36%). Internal calibration was performed using the ion signal of polysiloxane at m/z 445.120025 as a lock mass. Raw MS data were analyzed using Proteome Discoverer 2.2 (ThermoFisher Scientific, Waltham, Massachusetts, USA). Peak lists were searched against the T. oceanica protein (txid159749) database as well as the cRAP database of common contaminants (Global Proteome Machine Organization). Two separate database searches were performed on each LC-MS/MS data file to identify both light and heavy 15N-labelled proteins. For both searches, cysteine carbamidomethylation was set as a fixed modification, while asparagine to aspartic acid deamidation (N to D, to account for PNGase F hydrolysis), methionine (Met) oxidation, N-terminal Met loss, and phosphorylation on serine, threonine, and tyrosine were included as variable modifications.
Additionally, for the 15N-labelled search, all nitrogen atoms were set as 15N fixed modifications. A custom asparagine to aspartic acid deamidation variable modification was created to account for the loss of a 15N isotope instead of the standard 14N. A mass accuracy tolerance of 5 ppm was used for precursor ions, while 0.02 Da for HCD fragmentation or 0.6 Da for CID fragmentation was used for product ions. Percolator was used to determine confident peptide identifications using a 0.1% false discovery rate (FDR). For semi-quantitative purposes, peptides and proteins were classified as + Fe and/or No Fe based on the evidence of a peptide to spectrum (MS/MS) match identification (PSM) for each experimental group.
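The reason a custom deamidation modification is needed for the 15N search can be made explicit with a short worked calculation. The sketch below is an illustration under stated assumptions, not the search-engine configuration itself: it uses rounded standard monoisotopic masses, assumes that the nitrogen released from the asparagine side chain of a labelled peptide is 15N, and uses hypothetical function names.

```python
# Monoisotopic masses in Da (standard reference values, rounded to 6 decimals).
H = 1.007825
O = 15.994915
N14 = 14.003074
N15 = 15.000109


def deamidation_shift(nitrogen_mass):
    """Asn -> Asp replaces the side-chain NH2 by OH: net change is +O -N -H."""
    return O - nitrogen_mass - H


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=5.0):
    """True if the precursor mass error is within the stated 5 ppm tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    print(f"14N deamidation shift: {deamidation_shift(N14):+.4f} Da")  # +0.9840 Da
    print(f"15N deamidation shift: {deamidation_shift(N15):+.4f} Da")  # -0.0130 Da
```

Under these assumptions, the two shifts differ by about 1 Da, far more than a 5 ppm precursor tolerance allows for typical peptide masses, so the standard deamidation mass would not match labelled peptides.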
In-silico analysis of N-linked glycosylated proteins. N [...] to analyze the respective full-length proteins in terms of their subcellular localization. Here, we used the same servers as for T. oceanica.
Experimental design of the targeted transcriptomic experiment. T. oceanica growth was kept in the exponential phase, and sampling was done in low- to mid-exponential phase. A 22 h experiment (long-term (LT)) and a 6 h experiment (short-term (ST)) were performed. Both experiments included high-iron, low-iron, and iron-recovery treatments. High-iron cultures were grown with an iron concentration of 10 µM FeCl3, whereas no iron was added to the low-iron cultures. Iron-recovery samples received 10 µM FeCl3 after the initial measurement (T = 0). Additionally, treatments in the ST experiment included actinomycin D and iron (actD-Fe), only actinomycin D (actD), DMSO with iron (DMSO-Fe), and only DMSO (DMSO). The initial measurement was taken one hour after the start of the light period, and iron addition followed immediately after the initial sampling was completed. The subsequent timepoints are given relative to the addition of iron. In all experiments, the 1 h timepoint describes the timepoint at 1 h following the addition of FeCl3 (Fig. 6). The LT experiment included triplicate experiments of each treatment (high, low, and recovery). The ST experiment included triplicates for each treatment (high, low, recovery, DMSO, DMSO-recovery) but only duplicate measurements for actD and actD-Fe. DMSO treatments served as a control for the actD treatment as actD was dissolved in DMSO. DMSO did not result in any transcript decrease as observed in the actD treatments (data not shown). We only analyzed data from high- and low-iron cultures for the study presented here.
[...] liquid nitrogen. ST experiment samples were filtered on 2 µm PC filters and immediately flash-frozen in liquid nitrogen. On-column DNA digestion using RNase-free DNase (Qiagen, Inc., Valencia, CA, USA) was done to ensure the complete removal of DNA. The RNA was quantified with a Nanodrop (Thermo Fisher Scientific, Waltham, Massachusetts, USA) and stored at −80 °C.
Transcript analysis using the NanoString platform. The probes for the NanoString (NanoString Technologies, Seattle, Washington, USA) analysis were designed based on the available genome of T. oceanica 17 (Accession nr. AGNL01000000) in collaboration with NanoString Technologies (Supplementary Table S2). Samples were analyzed in two 96-well plates (96-well run) (Supplementary Table S3 and Supplementary Table S4), and the "nCounter PlexSet Reagents for Gene Expression User Manual" was followed. Each column on the 96-well plate was pooled, after a 20 h hybridization incubation, to create one sample that is processed and analyzed. A titration run was performed to calculate loading amounts of the different treatments, resulting in 90 ng of RNA for high-iron samples (including iron-recovery samples sampled later than 30 min after the addition of iron), 70 ng for low-iron samples, and 80 ng for iron-recovery samples.
Basic calculation of transcript counts. The results from the nCounter were first processed in the nSolver software. A reference lane with the same samples was used for in-plate probe calibration. The geometric mean of the top 3 positive controls provided by NanoString was used for the normalization. The following steps were done in Excel. One housekeeping gene, the nuclear-import-exporter (THAOC_05312), was used to correct for loading differences. The housekeeping gene normalization for the actD samples was done using the arithmetic mean of the housekeeping gene counts of each timepoint.
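The two normalization steps described above can be sketched in Python as follows. This is a minimal illustration, not the nSolver or Excel procedure itself: scaling each lane to the across-lane mean is an assumption, and all function names and counts are hypothetical.

```python
from statistics import geometric_mean, mean


def positive_control_factors(pos_counts_by_lane, top_n=3):
    """Per-lane factor from the geometric mean of the top_n positive controls.

    Assumption: each lane is scaled to the arithmetic mean of the lane geomeans.
    """
    geos = [geometric_mean(sorted(c, reverse=True)[:top_n]) for c in pos_counts_by_lane]
    ref = mean(geos)
    return [ref / g for g in geos]


def housekeeping_factors(hk_counts):
    """Per-lane factor that brings the housekeeping gene to its across-lane mean."""
    ref = mean(hk_counts)
    return [ref / h for h in hk_counts]


def normalize(raw, pos_factors, hk_raw):
    """Apply positive-control scaling first, then housekeeping-gene scaling."""
    hk_scaled = [h * f for h, f in zip(hk_raw, pos_factors)]
    hk_f = housekeeping_factors(hk_scaled)
    return [r * pf * hf for r, pf, hf in zip(raw, pos_factors, hk_f)]


if __name__ == "__main__":
    # Hypothetical counts for two lanes; lane 2 simply reads twice as high.
    pos = [[100, 200, 400, 10], [200, 400, 800, 20]]
    pf = positive_control_factors(pos)
    print(normalize([10, 20], pf, hk_raw=[50, 100]))
```

In this hypothetical case the two lanes differ only by a technical factor of two, so both normalized counts come out equal, which is the behaviour the correction is meant to achieve.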
Tara Oceans database analysis. The Tara Oceans database 53 is an annotated gene catalogue of globally distributed samples and was used to screen for the global abundance of OST, calreticulin, UGGT, and GnT1. The protein sequences were used in a blastp search against the Metagenome database with an e-value threshold of 10^-50. Surface water samples and deep chlorophyll max samples were analyzed. The data for the size fractions of 0.8-5 µm and 5-20 µm were plotted, and graphs were downloaded from http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.
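The effect of the e-value threshold can be illustrated with a small filter over tabular BLAST output. The search described above was run through the Ocean Gene Atlas web service, so this sketch is an assumption-laden stand-in: it presumes hits exported in the standard 12-column tabular format (e-value in the 11th column), and the function name and identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-50):
    """Keep (query, subject, e-value) for hits at or below the e-value threshold.

    Assumes standard 12-column tabular BLAST output, tab-separated,
    with the e-value in column 11.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


if __name__ == "__main__":
    # Hypothetical hits, for illustration only.
    example = [
        "query_OST\tgene_A\t62.1\t480\t170\t5\t1\t478\t3\t480\t3e-72\t250",
        "query_OST\tgene_B\t35.0\t200\t120\t6\t10\t205\t8\t204\t1e-12\t70",
    ]
    for hit in filter_hits(example):
        print(hit)
```

Only the first hypothetical hit passes, since 3e-72 is below the threshold while 1e-12 is not.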