The macronuclear genome of the Antarctic psychrophilic marine ciliate Euplotes focardii reveals new insights on molecular cold adaptation

The macronuclear (MAC) genomes of ciliates belonging to the genus Euplotes species are comprised of numerous small DNA molecules, nanochromosomes, each typically encoding a single gene. These genomes are responsible for all gene expression during vegetative cell growth. Here, we report the analysis of the MAC genome from the Antarctic psychrophile Euplotes focardii. Nanochromosomes containing bacterial sequences were not found, suggesting that phenomena of horizontal gene transfer did not occur recently, even though this ciliate species has a substantial associated bacterial consortium. As in other euplotid species, E. focardii MAC genes are characterized by a high frequency of translational frameshifting. Furthermore, in order to characterize differences that may be consequent to cold adaptation and defense to oxidative stress, the main constraints of the Antarctic marine microorganisms, we compared E. focardii MAC genome with those available from mesophilic Euplotes species. We focussed mainly on the comparison of tubulin, antioxidant enzymes and heat shock protein (HSP) 70 families, molecules which possess peculiar characteristic correlated with cold adaptation in E. focardii. We found that α-tubulin genes and those encoding SODs and CATs antioxidant enzymes are more numerous than in the mesophilic Euplotes species. Furthermore, the phylogenetic trees showed that these molecules are divergent in the Antarctic species. In contrast, there are fewer hsp70 genes in E. focardii compared to mesophilic Euplotes and these genes do not respond to thermal stress but only to oxidative stress. Our results suggest that molecular adaptation to cold and oxidative stress in the Antarctic environment may not only be due to particular amino acid substitutions but also due to duplication and divergence of paralogous genes.

www.nature.com/scientificreports/ (Fig. S1). The final GC distribution shows a well-defined peak (with respect to the distribution before cleaning) centered on the 31.51% (Table 1, Fig. S2), a value consistent with those reported for other ciliates with a nanochromosomal MAC genome architecture (i.e., Stylonychia lemnae, Oxytricha trifallax, and Euplotes crassus, 31.5%, 31.2% and 36.9%, respectively) 29 . This homogeneous normal GC distribution is consistent with negligible bacterial contamination. Although we cannot rule out the presence of sequences belonging to yet unreported endosymbionts with low GC content, it is unlikely their sequence base composition would perfectly match that of E. focardii and produce a unimodal distribution. Nanochromosomes containing bacterial sequences were not found in the final assembly suggesting that no phenomena of horizontal gene transfer recently occurred in this ciliate even though it has a substantial associated bacterial consortium 30 .
The genome size is in line with expectations based on the previous studies on other ciliates 5 , as well as for the ciliates reported in Table 2. This number of nanochromosomes was chosen as the main parameter for the selection of the best assembly among those produced by the different versions and configurations/parameters of SPAdes algorithm described in "Materials and methods". This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession MJUV00000000. The version described in this paper is version MJUV02000000.
This assembly does not show any alternative fragmentation on the basis of the clustering analysis used in analyses of the Oxytricha MAC genome and consistent with previous observations in E. crassus 7,31 . Moreover, from the pairwise sequence identity analysis of the assembly (described in the "Paralog prediction" section of "Materials and methods"), E. focardii, and E. octocarinatus, do not show a peak at high (> 90%) sequence identity typical of heterozygous alleles, as instead E. crassus and E. vannus show, but only one peak of around 40% identity, likely due to the presence of a substantial number of paralogous genes (Fig. S3). The MAC genome assemblies of E. crassus, E. octocarinatus and E. vannus all have considerably more complete nanochromosomes than E. focardii (Table 1; E. crassus MAC genomes are unpublished but have in excess of 30,000 nanochromosomes). However, all three species also show an order of magnitude more paralogs around the 40% peak than E. focardii. The smaller number of complete nanochromosomes in E. focardii thus likely reflects a combination of three factors: (i) lower assembly contiguity; (ii) high genome homozygosity; (iii) fewer paralogs. In future further improvement of this genome would likely best be achieved by using long-read sequencing, such as that provided by Pacific Biosciences.
The results of the gene prediction performed using the AUGUSTUS software are summarized in Table 2. The number of genes (predicted in the nanochromosomes and in the contigs with no telomeres that blast with a database containing all ciliates available proteomes), the average of the CDSs' length, of the introns' length and of the number of introns per gene are consistent with those previously reported for Stylonychia and Oxytricha 7,32 , and for E. crassus 28,33 . Nanochromosomes encoding a single CDS represent 74.58% of the total, and nanochromosomes with more than one CDS are 8.51% of the total; these percentages are comparable to those of Stylonychia and Oxytricha (75% encoding a single CDS and higher than 7% with more than one CDS 7,32,34 ).
To assess genome completeness, we compared the predicted proteins of the E. focardii macronuclear genome assembly with the CEG database: 93% of this database's sequences (231) have homologs in the E. focardii assembly. This value may suggest a small amount of incompleteness of the genome assembly. However, 11 out of 17 of the sequences without matches are also absent from the Oxytricha and Stylonychia genomes (Table 3), consistently with greater evolutionary distances of ciliates from the eukaryotes included in CEG, as also proposed for Stylonychia 7 . Genome annotation revealed a total of 90 ribosomal proteins, of which 36 belong to the standard eukaryotic small 40S subunit and 54 to the large 60S subunit. These values strongly support the present genome analysis, considering that the total standard eukaryotic ribosomal protein set that contains 32 proteins in the 40S subunit and 48 in the 60S subunit. Rfam analysis of structural RNAs in E. focardii, in addition to confirm the annotated tRNAs described in the next section, revealed the presence in the assembly of a nanochromosome (NODE_589) encoding both 18S and 28S rRNA genes, as well as 10 additional snRNA genes, including 3 for 5S rRNA and one for 5.8S rRNA. These results agree with those previously reported for Oxytricha trifallax 7 and Stylonychia lemnae 32 . Taken together these results support the completeness of the E. focardii MAC genome assembly.
Properties of E. focardii translation. Ciliates, including Euplotes, frequently use the standard stop codons in atypical ways compared to the usual eukaryote assignments 35,36 . As previously reported 28,37,38 , we www.nature.com/scientificreports/ detected abundant frameshifting motifs (AAA, AAT, ATA, AAC etc. codons followed by a stop codon, either TAA or TAG) in the E. focardii MAC genome (see "Materials and methods" section) consistent with pervasive + 1 programmed translational frameshifting in Euplotes ciliates. The frameshifting analysis on the predicted genes revealed that at least 4.2% of total genes could be affected by this phenomenon. This value could be underestimated since it excludes genes that had no BLAST hits (about 47%) to our sequence database and does not consider other possible frameshifting motifs beside those already identified in E. crassus 28 . Even though the comparison of the frameshifting sites between E. focardii and E. crassus revealed that these are not conserved in the same genes 28 , the occurrence of predicted frameshifting (just under 10%) would be in line with the observations in E. crassus by Klobutcher et al. 37 and in E. octocarinatus by Wang et al. 38 . Considering the transcriptlevel quantity of these specific genes, as described in the "Materials and methods" section, the + 1 ribosomal frameshifting activity does not significantly affect the transcription in E. focardii on the basis of the statistical test used (p-value = 0.386). 61 tRNAs were predicted in the E. focardii MAC genome assembly. 55 unique tRNAs were encoded on nanochromosomes and appear to be sufficient for the translation of all codons. These tRNAs include one selenocysteine tRNA (encoded on NODE_51680), with the typical long variable arm characteristic of such tRNAs, and a putative paralogous pair of cysteine tRNA genes with TCA anticodons (tRNA-Cys(TCA)-1: NODE_55662; tRNA-Cys(TCA)-2: NODE_55665; 90% identical to each other), which resemble those previously reported for E. crassus 39 . The tRNA-Cys(TCA) paralogs are in turn paralogs of tRNA-Cys(GCA), as for E. crassus 39 .
Previously a potential stop-suppressor tRNA of UAA, suggested to play a role in the ribosomal frameshifting was reported in Euplotes octocarinatus 38,40 . Though we found evidence of potential translational readthrough of stop codons (see subsequent analysis), we could not detect a similar tRNA in Euplotes focardii using BLASTN searches with the putative E. octocarinatus frameshifting tRNA as a query. Furthermore, neither tRNAscan-SE nor Aragorn predicted a similar tRNA to the putative frameshifting E. octocarinatus tRNA in E. focardii. The tRNA secondary structure prediction software did however predict an unrelated tRNA with a potential stop cognate anticodon on contig NODE_32101 (Fig. 1A). NODE_32101 encodes an additional tRNA-Glu(TTC), which is 66.7% identical to the putative suppressor tRNA(CTA) (Fig. 1). A different tRNA-Glu(TTC) encoded on a separate nanochromosome (NODE_54875) is 77% identical to NODE_32101 tRNA-Glu(TTC). BLASTN Table 3. CEGs missing from alignment results using default CEGMA criteria. # Found into a contig without telomeres set (with %GC > 45). *Found into the assembly set but not into the genes predicted set. We observed expression of mature tRNAs for both predicted tRNAs on NODE_32101 in the YAMAT-seq reads (Supplemental Table 1), at low levels, but well within the limits of other predicted E. focardii tRNAs. It can be seen that the secondary structure predicted by RNAfold (Fig. 1A) does not place the anticodon 5'-CTA-3' symmetrically in the anticodon loop, as is typically the case. However, this may be incorrect structure prediction, similar to the incorrect predictions by RNAfold we observed while attempting to predict the E. focardii tRNA-Cys(TCA) structures using this software. In the alignment of E. focardii and E. octocarinatus tRNAs, it can be seen www.nature.com/scientificreports/ that TCA aligns to the predicted TTC anticodons of the Glu-tRNAs (Fig. 1B). In future, to ascertain whether the candidate tRNA(CTA) on NODE_32101 is functional it would be necessary to search additional genomes from other Euplotes species, and observe both whether there are similar putative tRNA genes with TCA anticodons and also whether there are co-varying substitutions that support structural conservation and anticodon position. Other than frameshifting "stop" codons, there are a few reports of potential translation of in-frame "stops". In E. focardii, based on multiple sequence alignments, a TAG codon in a beta-tubulin gene was hypothesized to be translated as tryptophan 41 . Ricci et al. described the use of an in-frame TAG codon in two other Euplotes genes 42 . Recently, Wang et al. reported the use of TAG to encode an amino acid, likely glutamine, in the cathepsin B gene of the closely related freshwater Euplotes octocarinatus 43 . Given these observations and considerable plasticity in termination codon usage in ciliates 44,45 , we wondered if the E. focardii stop codons may occasionally be translated in other genes.
In standard genetic code organisms, which do not possess tRNAs directly cognate to "stop" codons, translational readthrough, may use near-cognate pairing of tRNAs (i.e., possessing two of the three complementary anticodon-codon pairings between bases). TAA/TAG codons are most frequently translated as glutamine and TGA as tryptophan in eukaryotic translational readthrough 46 . To investigate what amino acids TAA/TAG codons in E. focardii might encode, we examined the frequency of amino acids aligned to these codons in conserved alignments extracted from translated BLAST matches (Fig. 1C). We focused on matches with substantial sequence conservation up-and downstream of stops to exclude potential sites of translational frameshifting. As a baseline for comparison, we also examined the frequencies of amino acids aligned to TGA codons, which predominantly encode cysteine, and CAA/CAG codons, which encode glutamine. For both codon kinds, the most frequent aligned amino acids are the expected ones (Fig. 1C). TAA/TAG codons are most often frequently aligned to glutamine.
Whether TAA/TAG codons are translated by translational readthrough or conventionally by a tRNA such as the candidate tRNA(CTA) in E. focardii, remains to be determined. In other eukaryotes translational readthrough typically occurs at low levels, typically a small percentage of non-translational readthrough, and leads to short extensions of proteins 47 . Consequently, in future it will be of interest to determine the translational efficiency of in-frame stops in Euplotes, particularly in genes like the beta-tubulin paralog with an in-frame TAG occurring close to the N-terminus 41 , especially if it is a relatively highly translated protein like other beta-tubulins. Furthermore, it would also be of interest to determine what amino acid the candidate tRNA(CTA) may be charged with, and if there is transamidation of glutamate to glutamine, as observed in Bacillus subtilis and many other organisms 48,49 .
The tubulin super families in E. focardii genome. The protein annotation procedure allowed the identification of 15,357 proteins ( Table 2), 3306 of which are enzymes. Using CD-HIT, 5222 were grouped in clusters with at least 40% of identity. We focused on the analysis of the members of tubulins, antioxidant enzymes, and Hsp70 gene families to examine whether the number of paralogs and their evolution in each superfamily may be related to the E. focardii cold-adaptation.
In vitro polymerization studies performed with E. focardii purified tubulin heterodimers demonstrated their ability to form microtubules at temperatures close to the freezing point of the Antarctic marine habitat 14 , as also reported for tubulin heterodimers purified from Antarctic fishes 50,51 . In addition, the study of the tubulin superfamily in an Antarctic psychrophilic ciliate is even more interesting with respect to other psychrophilic organisms because it contributes not only to the understanding of the molecular basis of microtubule cold adaptation but also of microtubular structure complexity. Indeed, differently from other eukaryotic microorganisms, ciliates assemble 17 different types of microtubules throughout their life cycle 52 , even though all microtubule functions are carried out in a single cell.
The macronuclear genome sequencing from several ciliates 7,32,53,54 allowed the characterization of up to five alpha-and beta-tubulin isotypes. Therefore, a multigenic tubulin family is a common characteristic in ciliates and these tubulin isotypes may be responsible of the formation of functionally different microtubules 55 and with different dynamics properties 14,16,56 .
In previous papers, we reported the characterization of a single α-tubulin 3 , four β-tubulin 1,14 and two γ-tubulin isotypes 15 from E. focardii. Previously, the presence of four β-tubulin isotypes induced us to hypothesize that in E. focardii microtubules cold adaptation was based mainly on molecular modification of the β-tubulin subunit of the heterodimer rather than on the α-tubulin subunit. The comparison of these sequences with the homologs from non-cold adapted Euplotes species revealed the presence of unique amino acid substitutions in the E. focardii tubulin isotypes that may be correlated with cold adaptation 14 . Therefore, we further investigated this relevant class of proteins in the E. focardii genome.
All tubulin sequences that were detected in the final assembly were checked starting from those previously characterized 1,3,14,15,56 . All the sequences already collected into the UniProt Knowledgebase (TBβ1-UniProt: Q9N2N6; TBβ2-UniProt: C0L7F0; TBβ3-UniProt: C0L7F1; TBα1-UniProt: Q8WRT6; TBγ1-UniProt: A3F2R1; TBγ2-UniProt: A3F2R2) were identified and confirmed as reported in Table 4, with the exception of TBβ4 (the sequence collected as C0L7F2 in UniProt is included in the phylogenetic tree of Fig. 2B). However, new isotypes were discovered after protein annotation. Specifically, a new β-tubulin isotype (named TBβ5) and six additional α-tubulin isotypes. The relationship among these isotypes and those for other Euplotes species are shown in the phylogenetic trees in Fig. 2. In the trees, we introduced homologs from Tetrahymena, Oxytricha and Stylonychia to better evidence the degree of E. focardii gene amplification and divergency. E. focardii α-tubulins branches (highlighted as bold in Fig. 2A) are scattered throughout the phylogeny of ciliate homologs: ATU1 and ATU2 isotypes cluster with the homologs from E. crassus in the same group containing the "canonical" Tetrahymena α-tubulin, whereas ATU3 to ATU7 isotypes form a separate clade with E. crassus ATU4 and ATU5, and An high number of isotypes may be considered as an additional strategy for tubulin cold adaptation beside the presence of unique residues substitutions in the primary structure of tubulin heterodimers 1 . By contrast, E. focardii β-tubulin family appears less amplified and divergent than alpha tubulin. The branches (highlighted as bold in Fig. 2B) are less scattered and cluster with the corresponding E. crassus and E. octocarinatus homologs, even though BTU5 forms a well separated clade. BTU1 and BTU2, and BTU3 and BTU4, may be originated by recent gene duplications in E.focardii.
With the high number of different α-tubulin isotypes in the Antarctic ciliate E. focardii, we reconsidered the importance of the α-tubulin subunit in microtubule cold adaptation.
Molecular flexibility is regarded as a hallmark of cold adapted molecules, in particular enzymes, to cope with the reduction of dynamics and activity at low temperature 57,58 . We applied Molecular Dynamics (MD) simulation on each α-tubulin isotype of the ciliates under study. We found that only three E. focardii isotypes (2, 4 and 5; Fig. 3B) show a higher flexibility at 4 °C with respect to the E. crassus α-tubulin isotypes. The only exception in E. crassus is the isotype 2. This result can be considered as a further evidence of the E. focardii tubulin cold adaptation and suggests that not all isotypes must be flexible to function at low temperatures, as we reported also for the β-tubulin isotypes 14 . The similarity of E. focardii α-tubulin isotype 2 with that from E. crassus suggests a common origin of this α-isotype.
We also identified seven α-like and two β-like tubulins (e.g., more divergent isotypes; Table 4) and two δ and one ε isotypes. By contrast, the assembler was not able to produce two distinct complete contigs/nanochromosomes for γ-tubulin. In other words, γ-tubulin type 2 previously obtained from macronuclear DNA purified from E. focardii cells is present in the contig only in a short fragment form (see Table 4 legend). This is probably due to the high sequence similarity between the two γ-tubulin isotypes since raw reads specific for this second isotype www.nature.com/scientificreports/ were obtained after sequencing but difficult to align with this assembler. In the phylogenetic tree, E. focardii GTU1 and GTU2 isotypes (highlighted as bold in Fig. 2C) appear to be more related to the single γ-tubulin isotype from E. octocarinatus (that it is encoded by two similar genes) than the two distinct isotypes from E. crassus. In E. focardii, the presence of distinct γ-tubulin isotypes is associated to different roles in the nucleation of cellular microtubules (as previously reported 15 ) more than to cold-adaptation.   www.nature.com/scientificreports/ In a previous paper 73 , two E. focardii Cu,ZnSODs and one MnSOD were biochemically characterized. All three SODs are active at 4 °C but at the same time they retain high activity upon 20 min incubation up to 55/60 °C. This feature is unusual in cold active enzymes that are often heat sensitive and undergo inactivation and unfolding even at mild temperature. Nevertheless, thermo-tolerance or even thermostability of cold adapted enzymes was previously reported 4,74 suggesting that cold activity and thermo-tolerance may coexist in a molecule.
The sequences of the enzymes previously studied and present in the UniProt Knowledgebase (SOD1a-Uni-Prot: W0FZ77; SOD1b-UniProt: W0FUJ3; SOD2a-UniProt: MG575644) were confirmed by the analysis of the genes obtained in the E. focardii genome here described (Table 5): the SOD classification was done on the base of similarity with cytoplasmic SODs (type 1) and with mitochondrial SODs (type 2) and was confirmed by the presence of Cu/Zn and Fe/Mn pattern signature, respectively. In the same genome analysis, we could predict four additional E. focardii isoforms, the SOD1d and SOD1e and two SOD3 isoforms (Fig. 4A). According to this tree, the SOD3 isoforms appear to derive from gene duplication events probably happened before the www.nature.com/scientificreports/ divergence of E. focardii from other Euplotes species and some isotypes were then lost in E. crassus. As general result, the E. focardii SOD encoding gene family appeared composed by a higher number of genes with respect to the mesophilic E. crassus, probably due to a repeated event of gene duplication. A similar situation is evident for the catalase (CAT) genes (Fig. 4B). CAT (EC 1.11.1.6), that inhibits the DNA damage by decomposing the H 2 O 2 into oxygen and water induced by nitrofurazone, was previously considered a good biomarker for detecting oxidative stress and, consequently, ecotoxicity in aquatic ecosystems 75 . The number of E. focardii CATs genes is higher compared to the mesophilic E. crassus, probably due to a repeated event of gene duplication. As shown in Fig. 4B, CAT branches are scattered throughout the phylogeny of ciliate homologs. The low bootstrap values at these branches suggest that in E. focardii the CAT gene family underwent several events of recent gene duplications with potential adaptive outcomes that imply high divergence and consequently less supported phylogeny.
In general, the antioxidant enzymes system appears amplified in E. focardii (Table 6) suggesting that gene amplification may have contributed to combating the effects of increased oxygen concentration in the Antarctic seawaters. On the other hand, E. focardii possesses few gene encoding Thioredoxin NADPH Reductase (TrxR) www.nature.com/scientificreports/ and Glutathione Reductase (GR) isoforms. With the exception of E. octocarinatus, the genes for these enzymes are present in small number in the ciliate genomes known so far (see Table 6): Tetrahymena, as an exception in ciliates, has no TRXRs or GRs genes but 6 isotypes of thioredoxin-glutathione reductase (TGR) genes, that are composed by a fusion of the sequences of TRXR and glutaredoxin domains and are capable of transporting electrons from NADPH to both Trx and GSH systems 76 . E. octocarinatus has only 3 isotypes of TGR. In conclusion, the E. focardii antioxidant system appears to be based mainly on numerous SODs and CATs enzymes.  25 , the hsp70 gene does respond to several oxidative stressors, such as hydrogen peroxide in E. focardii 78 . In order to understand whether the Euplotes Hsp70 isoforms newly described here respond to thermal and/or oxidative stressors, we tested the inducibility of these genes in cultures subjected to heat shock (18 °C) or oxidative stress (H 2 O 2 ). Figure 6 shows that thermal stress did not induce transcription of the any of the seven Hsp70 genes relative to the control temperature (4 °C). In contrast, six of the seven genes (excluding isoform named Protein_05367) were induced by oxidative stress, strikingly so in the case of Protein_07400 isoform. We suggest that E. focardii, like several other Antarctic organisms, maintained a constitutive synthesis of Hsp70 isoforms to preserve protein function in the cold environment and evolved an oxidative stress response involving inducible Hsp70 synthesis. Moreover, this Antarctic ciliate has clearly lost the capability to induce the classical heat shock response when confronted with elevated temperatures.

Conclusions
The Antarctic ciliate E. focardii represents an optimal model for studying genome adaptation to cold environments. In this paper, we reported the analysis of the E. focardii MAC genome after a significant improvement of the assembly that were also cleaned-up of algal or bacterial contaminants. In particular, we obtained 17,798 sequences containing telomeres at both ends vs 12,922 previously reported 28 . Even though some assembly parameters remain lower in comparison to the other ciliates reported, complete nanochromosomes are now closer, as a percentage of the total assembly, to those from other Euplotes species 38,79 . This new report on the E. focardii MAC genome will provide additional information to investigate translation mechanisms in organisms with alternative genetic codes associated with the evolution of novel tRNA variants, including a putative suppressor tRNA, and to investigate how cold adaptation may have evolved.
The analysis of this improved E. focardii genome assembly allowed a better characterization of gene families, in particular that of the tubulins, that were previously only partially identified by single gene cloning approaches 1,3 . We identified a new β-tubulin isotype (TBβ5) and six additional divergent α-tubulin isotypes. In combination with the β-tubulin diversity, the role of the high number of different α-tubulins in microtubules cold adaptation should be reconsidered. Furthermore, we found that SODs and CATs families are composed by a higher number of genes with respect to the mesophilic Euplotes. The opposite trend was observed for the hsp70 genes: in this case isoform's diversification appears reduced with respect to homologs from other Euplotes species. Furthermore, expression of these genes was not induced by heat stress (18 °C for 30 min vs. a physiological temperature of 4 °C). On the other hand, hsp70 expression was raised during oxidative stress. We can conclude from these results that as for other Antarctic organisms, it is more important for E. focardii to cope with cold denaturation of proteins and oxidative stress than to respond to thermal stress. Consequently, the Hsp70 gene family did not expand like SODs and CATs families, that are involved in the antioxidant responses. Table 6. Number of antioxidant enzymes predicted in ciliate organisms phylogenetically close to E. focardii. These values were obtained consulting the annotated proteins files. 7  6  5  2  2  5  3  69  0  99   Euplotes crassus  6  4  6  3  2  7  3  60  0  91   Euplotes vannus  6  5  4  2  2  3  0  31  0  53   Euplotes octocarinatus  6  3  4  9  0  4  1  63  3  93   Oxytricha trifallax  11  4  0  1  0  10  2  58  0  86   Stylonychia lemnae  7  4  1  1  1  11  1  23  0  www.nature.com/scientificreports/ All these results suggest potential roles for paralogy in environmental adaptation, warranting future experimental investigation. Genomic expansions of specific protein gene families contributing to physiological fitness in freezing polar conditions have previously been reported for Antarctic notothenioids 80 . Gene diversification has been proved to produce a differential gene expression in specific adaptive conditions, as reported for the cold acclimation of the tea plant Camellia sinensis 81 and also for the E. focardii βT3-tubulin during cilia regeneration 14 and the E. focardii SOD 1b during cold stress 73 . In future, RNA-seq analyses of E. focardii transcriptome in different environmental conditions coupled to detailed molecular analyses will provide deeper insights into the role of duplicated genes.

Euplotes focardii
In conclusion, we propose that the molecular basis of cold adaptation that enabled E. focardii to thrive in the Antarctic Ocean may not be solely due to particular amino acid substitutions that enable these molecules to function at low temperatures but may have also arisen via gene duplications that increased protein functional diversity.

Materials and methods
SPAdes assembly. The Illumina HiSeq 2000 PE (paired-end, 100 bp, with BioProject ID SRX1959352) reads obtained after sequencing of Euplotes focardii macronuclear genome, previously trimmed using the Trimmomatic software (version 0.36) 82 and checked using the FastQC software (version 0.11.5) 83 , were assembled using the SPAdes algorithm (version 3.10.1) 84 with the "careful" option and the BayesHammer error correction algorithm 85 . Other parameters were set to default values. Additionally, SPAdes version 3.9, version 3.11.1 and a different set of k-mer lengths were also used to check and identify the best version and configuration of the assembler for these reads.
Possible redundant "chaff " contigs were removed from the assembly, as previously reported 32 , by mapping contigs shorter than 500 bp that had matches to the other contigs with greater than 80% coverage and 90% sequence identity. www.nature.com/scientificreports/

Assembly clean-up and properties.
To perform a quality assessment of the obtained assembly from SPAdes, avoiding bacterial contamination, the assembly was further analyzed checking the GC content by the QUAST software (version 4.5) 86 . The contigs having GC content higher than 45%, and a coverage lower than 10, were removed from the assembly using a custom algorithm written in Perl. This GC content percentage threshold was also set on the base of the minimum GC content of the most abundant bacteria in the consortium associated to E. focardii (data unpublished). Further bacterial contaminations were analyzed by using BLAST of the assembly versus the Genbank nr database setting bacteria as taxonomy filter, with 80% of hit coverage and 95% sequence identity of the matches. To check the goodness of the assembly, the SPAdes procedure was further repeated after this decontamination step. Currently, Genbank nr database also includes data of the last reported Euplotes endosymbiotic bacteria 87,88 . Moreover, to remove possible contaminant algal sequences, the Dunaliella salina genome (used to feed E. focardii) was compared to the E. focardii macronuclear genome assembly, using blastn and cd-hit-2d software with a threshold of 0.95 (see "Paralog prediction" section for details).
After all the steps of assembling and cleaning, the assembly was evaluated using another custom Perl script providing information about the size, the number of contigs, the number of telomeres, the number of 2, 1, and 0 telomere contigs and the mean length of contigs.
Gene prediction and frameshifting analysis. The gene prediction procedure on the E. focardii macronuclear genome assembly was performed using the AUGUSTUS software (version 3.3.3) 89 previously trained and tested on a manually curated data set, with no cases of translation frameshifting, from Euplotes crassus (data unpublished). The software was run using the following parameters: "-species = euplotes_crassus -UTR = on -alternatives-from-evidence = true -genemodel = complete -codingseq = on". The results of this prediction were assessed/processed with another Perl script.

Figure 6.
Hsp70 transcript abundance in stressed E. focardii cells normalized to the abundance in control cells. mRNA levels were determined by qPCR of the Hsp70 isoforms (reported as protein assembly code) using the comparative threshold method. Mean ± SD (n = 4). Total RNA was purified from cells grown at their standard temperature of 4° C (i.e., not shocked), from cells which were exposed to 18 °C (orange) via a heat ramp (see "Materials and methods" section), or from cells exposed to 100 µM of hydrogen peroxide (blue); control and experimental treatments were for 30 min. Data are reported as the mean of three experiments. Assessment of genome completeness. The assessment of genome completeness was firstly conducted analyzing the percentage of conserved core eukaryotic genes (CEGs) 93 searching the number of protein sequences contained in the CEG database (composed by 248 proteins) that were likely homologs with those of the E. focardii macronuclear genome assembly (i.e., with blastp E-values lower than 1e-10 and a match coverage higher than 70% of the length of the CEG proteins). The tRNAs were initially predicted with the Aragorn algorithm (version 1.2.41) 94 . Ribosomal proteins were counted after the protein annotation, as previously described, and structural RNAs were identified by BLAST searches of the assembly against the Rfam database. Secondary and tertiary structures of a potential stop-suppressor tRNA were determined using the RNAfold web server 95 and the RNAComposer automated RNA structure modeling server 96 , respectively. Paralog prediction. The first step of this analysis was to cluster the E. focardii macronuclear proteome using the cd-hit software (version 4.7) 97,98 with a sequence identity threshold of 0.95 to merge alleles (26,680 clusters). Therefore, the clustering was performed using a threshold of 0.4 to identify the largest and most represented protein families in the proteome (21,850 clusters of which 2312 with at least two elements). In this work, we have focused our interest on the Hsp70, tubulins and antioxidant enzymes family. Moreover, CD-HIT (and CD-HIT-EST) software was also used with a threshold of 0.95 to identify possible alternative fragmentation in the whole genome.
Pairwise sequence identity searches were performed on the E. focardii MAC genome assembly, in comparison with the E. crassus, E. octocarinatus and E. vannus assemblies, to estimate the distribution of alleles and paralogous sequences. By a custom Perl script, an alignment of all contigs against each other was performed into the assembly invoking the blastn algorithm and extracting the best non-self BLAST hits; then, MAFFT algorithm (-clustalout -maxiterate 1000 -globalpair / -localpair) was invoked to align the two sequences of each pair obtained; finally, its sequence identity was calculated. tRNA sequencing, mapping and quantification. Total RNA was extracted using TRI Reagent (Sigma) according to the manufacturer's protocol, deacylated at 37 °C for 40 min in 20 mM Tris-HCl (pH 9.0) and precipitated with 5 M ammonium acetate in 75% ethanol. This RNA was used to produce a library of mature tRNAs by the YAMAT-seq method 99 . The resulting cDNA library was multiplexed and sequenced on an Illumina MiSeq sequencer in paired-end mode (75 bp reads).
Paired-end YAMAT-seq reads were merged with BBMerge (default parameters), from the BBTools package 100 . Forward and reverse adapters were trimmed off the merged reads using cutadapt 3.2 (default parameters; -m 20) 101 . This procedure yielded 623,000 reads, which were mapped to E. focardii tRNAs predicted by tRNAscan-SE 2.0 as a part of the tRAX pipeline 102 (default parameters) which was also used to obtain read counts (available as Supplementary Table 1).
Hsp70, tubulins and antioxidant enzymes system classification. The classes of Hsp70, tubulins and antioxidant enzymes in E. focardii previously detected and analyzed 1,69 were confirmed and extended after the protein annotation and ""Paralog prediction stages. Clustal Omega algorithm 103 was used to produce each multi-alignment, and MEGA version X 104 was used to infer each phylogeny using the Maximum Likelihood statistical method, the JTT substitution model with a single substitution rate category (for SODs, it was used with four substitution rate categories), and generate 100 bootstrap pseudo-replicates. MEGA version X 104 was also used to plot each phylogenetic tree.
Homology modeling and molecular dynamics of alpha tubulins. Three dimensional structures of α-tubulins of Euplotes species, not already available by X-ray crystallographic or NMR analysis, were obtained by homology modeling using the Modeller software (version 9.20) 105 and the Swiss-Model server 106 . Three dimensional structures (reported as pdb ID) detected as best template to model the α-tubulins of Euplotes, on the base of the Global Model Quality Estimation (GMQE) score, the coverage and the sequence identity in the Swiss-Model server, were: 6U0H 10 112 and the all-atom OPLS force field. After the preliminary topology generation, macromolecules were soaked in a SPC water molecules cubic box in the presence of Na + /Cl − ions. The entire system was minimized, until the variation of potential energy was smaller than 100 kJ mol −1 nm −1 , using the steepest descent algorithm and equilibrated under a constant Number of particles, Volume, and Temperature (NVT) and a constant Number of particles, Pressure and Temperature (NPT) ensemble. Then, the MD simulation was run for 50 ns. Temperatures in the equilibration and MD simulation steps were set either at 4 °C (277 K) or at 27 °C (300 K). The backbone root mean square deviation (RMSD) www.nature.com/scientificreports/ and the protein root mean square fluctuation (RMSF) were determined using the GROMACS tools "gmx rms" and "gmx rmsf ", respectively, while the maximum RMSD value (RMSD max ) for each dynamic was calculated fitting the trajectory with the equation RMSD = t * RMSD max / (t n + const) (Fig. 3A).

Hsp70 gene transcription by E. focardii under stress.
We evaluated the inducibility of E. focardii Hsp70 genes in cultures subjected to thermal or oxidative stress. Cells entering stationary phase after ~ 1 week of vegetative proliferation in the presence of food (the green alga Dunaliella tertiolecta) were pelleted by low-speed centrifugation (500×g, 3 min), and pellets were resuspended in seawater to a density of ~ 5 × 10 3 cells/mL. Heat-shock was performed by warming cells from 4 to 18 °C over 30 min. Control cells were incubated at 4 °C for 30 min. Oxidative stress was produced by incubating cells at 4 °C in the presence of 100 µM H 2 O 2 for 30 min. Total RNA from control or stressed cells was extracted with Trizol reagent (GIBCO BRL), and cDNA was synthesized from each template using the StrataScript Reverse Transcriptase (Stratagene).
Transcript levels corresponding to the seven E. focardii Hsp70 genes were measured in control and stressed DNA samples by comparative-threshold qPCR using the SYBR green DNA-binding method 113 and the primer pairs given in Supplemental Tables 2 and 3; the Euplotes SSrDNA gene (GenBank ID: EF094961) was used for normalization. To 100 ng of E. focardii cDNA were added 12.5 μl of 2 × SYBR Green JumpStart Taq ReadyMix (Sigma-Aldrich, Milan), 5 pg each of gene-specific forward and reverse primers, and water to a final volume of 25 μl. Amplification reactions were performed in triplicate in a Multicolor qPCR MX3000P thermocycler (Stratagene, Milan, Italy), with an initial denaturation step (95 °C for 2 min) to activate the polymerase followed by 45 cycles of denaturation at 95 °C for 30 s, and annealing and extension at 60 °C for 15 s. During annealing/ extension, the increase in fluorescence at 495 nm was monitored, and the threshold value was set at 30 units. To verify that the primer pairs gave specific PCR products without non-specific amplification, the DNA samples were subjected to melting curve analysis by ramping the thermocycler temperature from 50 to 95 °C at 0.05 °C/sec.
The relative expression of the Hsp70 genes was calculated by the method of Pfaffl 114 : where Ct is the PCR cycle number at which the fluorescent signal is above the set threshold, ∆Ct is the Ct difference (control minus sample) of the target or reference gene, and E is the real-time PCR efficiency of the target or reference gene (E = 10 -1/slope , calculated from plots of Ct vs. cDNA input). The relative expression ratios of transcripts under investigation were tested for statistical significance by a pairwise, fixed reallocation randomization test implemented in REST MCS version 2 software 115 . www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.