Genome-wide signatures of adaptation to extreme environments in red algae

The high temperature, acidity, and heavy metal-rich environments associated with hot springs have a major impact on biological processes in resident cells. One group of photosynthetic eukaryotes, the Cyanidiophyceae (Rhodophyta), has successfully thrived in hot springs and associated sites worldwide for more than 1 billion years. Here, we analyze chromosome-level assemblies from three representative Cyanidiophyceae species to study environmental adaptation at the genomic level. We find that subtelomeric gene duplication of functional genes and loss of canonical eukaryotic traits played a major role in environmental adaptation, in addition to horizontal gene transfer events. Shared responses to environmental stress exist in Cyanidiales and Galdieriales; however, most of the adaptive genes (e.g., for arsenic detoxification) evolved independently in these lineages. Our results underline the power of local selection to shape eukaryotic genomes that may face vastly different stresses in adjacent, extreme microhabitats.

(5) Deposition: It was not immediately clear whether the genome assemblies and structural annotations, including predicted transcript and protein sequences, will be made publicly available through Dryad. These data (which are the bulk of the work and a result of the authors' expertise and diligence) should be made publicly (and easily) accessible.
(6) Polycistronic: The authors refer to Table S11, and I was expecting to find a list of identified proteins encoded by polycistronic transcripts in each of the three algal genomes, but the table is instead a list of primer sequences. This should be clarified, or the reference to Table S11 should be moved to the Supplementary Note. If the authors are not providing a list of identified proteins encoded by polycistronic transcripts, can interested readers (who are not necessarily experts with Illumina and PacBio data) easily identify these from the deposited data?
(7) Line 133: How many orthologs are conserved between species, and how many paralogs are predicted?
Reviewer #2: Remarks to the Author: This submitted paper describes three new T2T genomes of cyanidiophycean red algae, whose habitats are polyextreme environments such as hot springs. The authors analyzed the genomic features and found subtelomeric duplications resulting in duplication of the genes residing there. Some of the duplicated subtelomeric genes are involved in environmental adaptation, such as heavy metal tolerance. They also performed comparative genomics of gene content in Archaeplastida and detected drastic gene losses in the last common ancestor of Cyanidiophyceae, followed by differential gene losses and gene gains in the two lineages of Cyanidiophyceae. Thus, despite sharing the same habitat, the two cyanidiophycean lineages possess different gene sets and exhibit different tolerances to environmental stresses, the latter of which was revealed by cultivation experiments. Consequently, this paper illuminates the "power of local selection" in eukaryotic genome evolution. The above findings are novel and of general interest to readers. This research will significantly influence future genome research on eukaryotic algae and protists, which account for most of the diversity in the eukaryotic tree of life. I do not have any strong objection to most of the analyses and implications in this paper. However, I would suggest toning down certain points or conducting some additional experiments or analyses.
- Frequency of gene duplications
Although HGT and duplication of genes for heavy metal tolerance have been reported in several algae (e.g., Hirooka et al. 2017 PNAS E8304-E8313), subtelomeric gene duplication contributing to the evolution of environmental stress tolerance is one of the novel findings of this paper. Thus, subtelomeric duplications presumably contributed to the evolutionary adaptation of these extremophilic algae to hot spring conditions. However, I would like to see more data on how important subtelomeric duplications are, relative to other duplications, for the genome evolution of these algae. Duplications of any genes could also be counted in non-subtelomeric regions. I expect there to be few duplications outside the telomeric and subtelomeric regions of the red algal chromosomes, as their genomes are highly reduced. This would help quantify the importance of gene duplication in the subtelomeric regions, adding novel findings not only on the functional importance of subtelomeric gene duplications but also on the frequency of gene duplication events in the genome evolution of these red algae.
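The comparison suggested here (duplications counted in subtelomeric versus other chromosomal regions) can be framed as a 2x2 contingency table tested with Fisher's exact test. Below is a minimal standard-library sketch. The function names and all counts in the example are hypothetical and are not data from the manuscript. The two-sided p-value sums the probabilities of every table with the same margins that is no more probable than the observed one.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, n: int) -> float:
    """Probability that the top-left cell equals a, given fixed 2x2 margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        # Small relative tolerance so floating-point ties with p_obs are counted.
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(total, 1.0)


# Hypothetical counts, for illustration only:
# rows = recently duplicated / not duplicated genes,
# columns = subtelomeric / elsewhere on the chromosome.
p_value = fisher_exact_two_sided(30, 70, 200, 4700)
print(f"two-sided p = {p_value:.3g}")
```

With this table layout, the frequency of duplication can be compared directly between regions. SciPy's `scipy.stats.fisher_exact` is an equivalent, well-tested implementation of the same test.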
- Selection on duplicated genes
Purifying selection during speciation but positive or relaxed selection after duplication of merA genes is interesting. However, I would like to know whether this is the only case in which the authors could detect selection using Ka/Ks ratios, or whether it is one of several genes under selection detectable in the analysis. If the latter is the case, more examples should be shown. For instance, it would be interesting to know whether the archaeal-derived ATPase genes in Galdieria are under positive or purifying selection, by analyzing Ka/Ks among the subtelomeric duplicated ATPase genes. If the former is the case, it would be better to clearly mention so.
Relevant to this, merA should be spelled out in line 234, as this is the first occurrence of the gene.
- Adaptive evolution of proteins to thermophilic conditions
The authors detected different types of heat shock protein genes in the red algal genomes. They found that the number of chaperone genes is almost the same among the genomes. However, the expression level of the chaperone genes would be more directly involved in thermophilic lifestyles; the chaperone genes might be more highly expressed (or transcribed) in Cyanidiophyceae than in mesophilic red algae. This is the case in some extremophilic green algae (e.g., Hirooka et al. 2017 PNAS E8304-E8313). As the authors already have RNA-seq data for the thermophilic red algae, and RNA-seq data for some mesophilic red algae are publicly available, such a comparison would provide more insight into whether chaperones contribute to thermophilic lifestyles.
Relevant to this, I have a concern about the in silico analysis of protein aggregation. Aggregation and folding of proteins can be affected by pH and ionic strength in addition to temperature. Indeed, I found that TANGO v.2.3.1 has options to set pH and ionic strength. There is no mention of these settings or of how the authors determined the intracellular pH and ionic strength in cells of Cyanidiophyceae. These factors might differ between extremophiles and mesophilic species.
In addition, it would be better to show genomic evidence of specific adaptation in proteostasis machineries in the cyanidiophycean species as discussed in the supplementary note.
Accordingly, the current analyses and interpretation of the results seem insufficient to conclude anything about extremophilic adaptation of proteins. I would suggest deleting this paragraph from the manuscript, as it is not directly relevant to the main topic of the research, which is introduced in the well-organized Abstract.
- Prokaryotic features
I agree with the authors that the thermophilic red algae have acquired reduced genomes, HGTs contributing to adaptive evolution, small numbers of introns and spliceosomal components, a lack of miRNA processing, and polycistronic expression of some genes. Some of these features could allow the algae to thrive in hot springs. But it remains unclear whether all of them indeed contribute to adaptive evolution to hot spring environments. These traits are also seen in other eukaryotes that do not live in hot springs, so the acquisition of one or more prokaryotic traits might be unrelated to thriving in hot springs. Some parasitic protists such as Giardia possess reduced genomes with few introns and reduced sets of spliceosome components (Morrison et al. 2007 Science 317:1921-1926). HGTs for adaptation to certain environments, lack of miRNA processing, and polycistronic expression have been reported for other eukaryotes in previously published papers, as cited in this submitted paper. I agree that the above traits might have been acquired under certain environmental pressures in extreme conditions, but they are not limited or specific to hot springs. I would suggest toning down this paragraph or deleting it from the manuscript, as it is not directly relevant to the main topic of the research, which is introduced in the well-organized Abstract.
In addition, I have a comment on the term "prokaryote-like features." As each of these traits is present in other eukaryotes, it is difficult to agree that all the above features should be categorized as "prokaryote-like features."
- Metabolic pathway maps in Figs. 4 and 5
These figures seem to be models of prokaryotic cells. I would like to see models of mercury and arsenic detoxification in the eukaryotic algae. The possible localization of the proteins in these pathways could be predicted by analyzing their N-terminal peptides and internal transmembrane regions. I understand that it might be difficult to predict the proper localization of eukaryotic proteins, as eukaryotes possess multiple organelles. However, model pathways and localizations could be proposed from such analyses, as the metabolic flow in this case is for detoxification.
- Minor points
Line 141: The intergenic regions might have increased in size in Cyanidioschyzon if the average size of intergenic regions in Cyanidium is as small as that in Cyanidiococcus. It would be more persuasive to describe the corresponding feature of Cyanidium in the main text.
Line 193: I cannot follow what the "lack of conservation of subtelomeric duplicated genes in the last common ancestor of Cyanidiales" exactly means. Does it mean that subtelomeric duplicated genes in some chromosomes were distinct from those in other chromosomes in the last common ancestor of Cyanidiales, followed by differential inheritance of subtelomeric duplicated genes in the two Cyanidiales lineages?
Line 392: I cannot follow why certain evolutionary constraints result in gene loss. Constraints would act for gene retention or against gene loss. Would "evolutionary pressure" be the proper term in this case?
- Typos/Wording
Line 52: Delete the period from "a. variety of."
Line 219: "Linker protein" should be "linker peptide."
Lines 330 and 354: What does "host" mean in this context?
Line 381: Spell out "pri-miRNA," as this is the first occurrence of this term.

In "Genome-wide signatures of adaptation to extreme environments", the authors present the sequencing, assembly, and annotation of three high-quality red algal nuclear genomes and gene model sets. The availability … but it is not clear how the authors arrived at the gene set that comprises the toolkit or which genes are in the 'toolkit'. The data may be in one or more supplemental tables, but there needs to be quantification supporting this … could be made more explicit in the abstract. The authors should also quantify how many of those genes are conserved between the 3 species and how many are unique. This information doesn't have to go in the abstract, but there should be several sentences describing this gene set and a section in the Methods describing how this gene set is defined. An alternative would be to add small Venn diagrams with numbers to each part of Fig. 6, showing how many genes were acquired by HGT from prokaryotes and their conservation between species, and how many genes resulted from STGD and their conservation between species. How the authors define conservation (e.g., OGfs) should be included.

We meant the term 'extremophile toolkit' to refer to a collection of adaptation strategies for extreme environments, rather than a collection of specific genes. We realized that the term 'extremophile toolkit' can be misconstrued because of its wording ('toolkit') and some of our previous statements, including the sentence mentioned by the reviewer. To minimize misunderstanding, we replaced it with "environment adaptation strategies".

We provided most of the details (e.g., orthogroups and genes) for subtelomeric gene duplication (STGD)

However, as previously stated, we found that determining the precise number of HGTs was difficult due to methodological issues, so we chose to focus on genes that have been functionally identified and have a clear phylogenetic signal of foreign origin. We revised the sentence as follows,

• Page 2, Line 18-19: "These extremophilic adaptation strategies are shared by the two major orders, information.
… observed differences between the genomes compare to non-extremophilic red algae and to other lineages? Are duplications / HGT events unique to these red algae/extremophiles, or would a similar result … with respect to average of values from 25 different combination parameters used for DIAMOND.

Since STGD accounted for ca. 30% of recent gene duplications, we infer that gene duplication in Cyanidiophyceae species has been strongly influenced by STGD, which is supported by Fisher's exact test (p-value < 0.05). We added and revised the sentence as follows,

We agree with the reviewer that a list of identified proteins encoded by polycistronic transcripts might be of

If the former is the case, it would be better to clearly mention so.

Relevant to this, merA should be spelled out in line 234, as this is the first occurrence of the gene.

We listed 21 pairs of candidate genes in Supplementary Table 7; however, the STGD gene pairs did not show evidence of purifying selection (not supported by Fisher's exact test), whereas non-STGD pairs showed a significant signal of purifying selection. Because of this difference, we were unsure how to interpret these results. … and Galdieria ATPase genes) have been discussed in a previous paper (Rossoni et al., eLife, 2018); therefore, we have only mentioned their subtelomeric gene duplication. We also reconstructed phylogenies of archaeal-derived ATPase genes for our analysis, but we were unable to provide a clear answer due to low bootstrap values, phylogenetic inconsistency, and unclear taxon sampling (no significant archaeal hits were retrieved).

Therefore, we chose only merA as a candidate for this analysis because i) we focused on genes whose functions were already known, and ii) the phylogeny of the gene was clear.
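The selection arguments in this exchange rest on the usual reading of the Ka/Ks (dN/dS) ratio ω = Ka/Ks. A ratio of ω < 1 suggests purifying selection, ω ≈ 1 suggests neutral evolution or relaxed constraint, and ω > 1 suggests positive selection. Below is a minimal sketch of that reading. It assumes Ka and Ks have already been estimated for each gene pair; the source does not say which tool was used. `GenePair`, `classify_selection`, the tolerance band, and the Ks saturation cutoff are all hypothetical. Fixed thresholds like these are only a rough screen. Formal claims of selection rely on statistical tests, as in the Fisher's exact test mentioned above.

```python
from dataclasses import dataclass


@dataclass
class GenePair:
    """A pair of homologous genes with precomputed substitution rates."""
    name: str
    ka: float  # nonsynonymous substitutions per nonsynonymous site
    ks: float  # synonymous substitutions per synonymous site


def classify_selection(pair: GenePair, band: float = 0.1, ks_max: float = 2.0) -> str:
    """Label a gene pair by its Ka/Ks ratio (omega).

    band: tolerance around omega = 1 treated as neutral/relaxed (hypothetical value).
    ks_max: Ks above this is treated as saturated (hypothetical value).
    """
    if pair.ks <= 0 or pair.ks > ks_max:
        return "undetermined"  # omega undefined (Ks = 0) or unreliable (saturated Ks)
    omega = pair.ka / pair.ks
    if omega < 1 - band:
        return "purifying"
    if omega > 1 + band:
        return "positive"
    return "neutral/relaxed"


# Hypothetical pairs, for illustration only.
pairs = [
    GenePair("orthologs_between_species", ka=0.02, ks=0.40),
    GenePair("subtelomeric_paralogs", ka=0.30, ks=0.25),
]
for p in pairs:
    print(p.name, round(p.ka / p.ks, 2), classify_selection(p))
```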

We agree that chaperones play a role in adaptation to harsh environments, however, our genome data a

Relevant to this, I have a concern about the in silico analysis of protein aggregation. Aggregation and folding

As this reviewer notes, pH and ionic strength may be important parameters when using TANGO. We used three different values for the temperature parameter (290 K, 300 K, and 320 K), but we used the default values for the pH (7) and ionic strength (0.1) parameters. We used the default pH because the internal pH of the extremophile Cyanidium caldarium is already known (pH 6.64 ± 0.09) and, compared with that of other mesophilic algae (pH 5.0-7.4), it is not significantly lower (Beardall & Entwisle, Phycologia, 1984). We were not able to obtain ionic strength information, except from a study of Cyanidium caldarium proteins that used an ionic strength of 1.0 by adding NaCl; however, that study did not concern the internal conditions of cells (Enami, Plant Cell Physiol, 1978). Because all parameters were applied uniformly to all Archaeplastida proteomes in the in silico survey of aggregation propensity, we respectfully did not consider this critical for our comparative analysis. In addition, the paper from which we drew most of the ideas for this analysis (Draceni & Pechmann, PNAS, 2019) used default parameters.

The intracellular pH of Cyanidium caldarium (pH 6.6) was used as the pH parameter, and the other parameters were set to the default.

The new results also revealed consistent differences between mesophilic and extremophilic red algae (t-test: df = 36,536, p-value < 0.05) (see the figure above, provided only for this reviewer's reference). Given all of the information presented above, we consider our analysis applicable. We updated and revised the sentence as follows,

Because there was no ionic strength measurement, we used the default pH and ionic strength parameters."
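The mesophile-versus-extremophile comparison reported above is a two-sample t-test on per-protein aggregation scores. The reported df = 36,536 is consistent with a pooled-variance Student's t-test (df = n1 + n2 - 2), although the source does not state which variant was used. Below is a minimal standard-library sketch under that assumption. `student_t_test` and the example scores are hypothetical. The p-value uses a normal approximation to the t distribution, which is adequate only for very large df such as the one reported here.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def student_t_test(x: list[float], y: list[float]) -> tuple[float, int, float]:
    """Two-sample Student's t-test with pooled variance.

    Returns (t, df, approximate two-sided p). The p-value uses a normal
    approximation, which is reasonable only when df is large.
    """
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / n1 + 1 / n2))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical aggregation scores, for illustration only
# (df is small here, so the normal-approximation p is only indicative).
mesophile = [12.1, 9.8, 15.3, 11.0, 13.7]
extremophile = [8.4, 7.9, 10.2, 9.1, 6.8]
t, df, p = student_t_test(mesophile, extremophile)
print(f"t = {t:.2f}, df = {df}, approx. p = {p:.3g}")
```

For small samples, an exact t-distribution CDF (for example, `scipy.stats.ttest_ind`) should replace the normal approximation.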

In addition, it would be better to show genomic evidence of specific adaptation in proteostasis machineries in the cyanidiophycean species as discussed in the supplementary note. Accordingly, the current analyses and interpretation of the results seem insufficient to conclude anything about extremophilic adaptation of proteins.

I would suggest deleting this paragraph from the manuscript, as it is not directly relevant to the main topic of the research, which is introduced in the well-organized Abstract.

We agree with the reviewer's comment. Based on the comment, we deleted this paragraph.

We agree with the reviewer's comment ("prokaryote-like features"). Based on this comment, we removed this sentence.

Introns and telomeric regions were not considered in this analysis because we defined an intergenic region as the sequence between two genes. All statistical tests supported that Cyanidioschyzon is significantly different from the others (p-value < 0.05), and our subsequent repeat comparison supports this idea. We revised the sentence as follows,

• Page 7, Lines 137-140: "Using a statistical approach (Student's t-test: p-value < 0.05), we discovered that … show overlap between all Cyanidiales species. This also applies to the ancestor of Cyanidiophyceae, because, with the exception of a single orthogroup (GTP binding protein) that has been amplified independently in each lineage (shown in Supplementary Fig. S9), there is no overlap in STGD events between Cyanidiales and

Some may be concerned about the few STGD overlaps found between Cyanidiales lineages (particularly CCYA and CZME in Fig. 2a), but these can be explained by the genetic distance between species.

CCYA and CZME are more closely related to each other than to other Cyanidiales or Galdieriales species (i.e., CDCA), and genomic synteny has been found to be more conserved in CCYA & CZME than that of

As a result of their evolutionary history of recent divergence, CZME and CCYA have more STGD

We thank the reviewer for the comment about the interpretation of gene losses as "evolutionary pressure". There are various approaches to interpreting genome evolution (e.g., size, gene content): i) the mutational hazard hypothesis (genetic drift), ii) the nucleotypic and nucleoskeletal hypotheses, and iii) the genome streamlining … effective population than other eukaryotic species (e.g., macroalgae), resulting in more recombination and being more affected by selection efficacy. On the other hand, it has been proposed that under stress (e.g., nutrient limitation), there is a tradeoff in the energy or material costs allocated to DNA versus other components (Hessen et al., Trends Ecol Evol 2010). According to a genomic survey of the SAR11 clade, nutrient-limited environments can promote evolutionary pressure, and they were able to reduce the material

We corrected the typo as follows,

We corrected the sentence as follows,

What does "host" mean in this context?

We revised the sentence as follows,

Spell out "pri-miRNA," as this is the first occurrence of this term.

We updated the sentence as follows,

• Page 17, Line 398: "… produce primary miRNAs (pri-miRNAs), which …"

Reviewers' Comments:
Reviewer #1: Remarks to the Author: My concerns/comments were addressed. In order for the full potential of this study to be reached, it will be essential that predicted transcript and protein sequences be deposited for the public to include in their own studies, but the authors have ensured that this will be done before publication.
Reviewer #2: Remarks to the Author: This is the revised manuscript, which no longer contains anecdotal claims. Concerns raised by the reviewers are well addressed. I have no further comments to improve this manuscript.