Genomic analysis of an ultrasmall freshwater green alga, Medakamo hakoo

Kato, Shoichi; Misumi, Osami; Maruyama, Shinichiro; Nozaki, Hisayoshi; Tsujimoto-Inui, Yayoi; Takusagawa, Mari; Suzuki, Shigekatsu; Kuwata, Keiko; Noda, Saki; Ito, Nanami; Okabe, Yoji; Sakamoto, Takuya; Yagisawa, Fumi; Matsunaga, Tomoko M.; Matsubayashi, Yoshikatsu; Yamaguchi, Haruyo; Kawachi, Masanobu; Kuroiwa, Haruko; Kuroiwa, Tsuneyoshi; Matsunaga, Sachihiro

doi:10.1038/s42003-022-04367-9

Article
Open access
Published: 23 January 2023

Communications Biology volume 6, Article number: 89 (2023)

Abstract

Ultrasmall algae have attracted the attention of biologists investigating the basic mechanisms underlying living systems. Their potential as effective organisms for producing useful substances is also of interest in bioindustry. Although genomic information is indispensable for elucidating metabolism and promoting molecular breeding, many ultrasmall algae remain genetically uncharacterized. Here, we present the nuclear genome sequence of an ultrasmall green alga of freshwater habitats, Medakamo hakoo. Evolutionary analyses suggest that this species belongs to a new genus within the class Trebouxiophyceae. Sequencing analyses revealed that its genome, comprising 15.8 Mbp and 7629 genes, is among the smallest known genomes in the Viridiplantae. Its genome has relatively few genes associated with genetic information processing, basal transcription factors, and RNA transport. Comparative analyses revealed that 1263 orthogroups were shared among 15 ultrasmall algae from distinct phylogenetic lineages. The shared gene sets will enable identification of genes essential for algal metabolism and cellular functions.

Introduction

Microalgae are microscopic unicellular phytoplankton found in freshwater, seawater, and sediment, and are invisible to the naked eye¹. Microalgae form the basis of the food chain in aquatic ecosystems, and play important roles in carbon dioxide capture and sequestration through photosynthesis². Despite their ecological importance in providing energy to support all higher trophic levels, more than 70% of the species are estimated to remain unidentified³. Microalgae have been used to produce highly functional foods, biofuels, and materials used in cosmetics¹. To improve the production efficiency and profitability of current algal culture systems, demand is increasing for especially small microalgae that can be cultured at high densities.

We focused on Medakamo hakoo (Chlorophyta), an ultrasmall algal species found in freshwater that may provide notable insights into the genome biology of algae. Medakamo hakoo was first identified and reported in 2015⁴. A previous study involving DNA staining revealed that M. hakoo likely has the smallest known nucleus among Archaeplastida species⁵. Although some microalgae inhabiting seawater and hot springs have extremely simple genomes^6,7,8,9,10, relatively few freshwater algae with extremely small genomes have been reported. Genomic analysis of M. hakoo is expected to produce useful information for future investigations on effective culture methods for optimal production of useful substances. Genomic information for M. hakoo will also contribute to understanding how eukaryotic phototrophs thrive in diverse environments. In addition, comparison of the genomes of M. hakoo and other ultrasmall algal species is an effective strategy for identifying the gene set common to algal species and genes common to green algal lineages.

In this study, we first characterized the morphological features and synchronization of the cell cycle of M. hakoo. Next, the M. hakoo genome sequence was assembled from long reads generated using the PacBio RSII system in conjunction with RNA-seq analysis of the transcriptome. Finally, comparison of the genomes of M. hakoo and 14 other microalgal species revealed that M. hakoo has one of the smallest genomes among freshwater algae, and 1263 gene families conserved among microalgae were identified.

Results

Investigation of M. hakoo cellular characteristics

To characterize M. hakoo morphology, we first used SYBR Green I stain to label M. hakoo, Cyanidioschyzon merolae, and Saccharomyces cerevisiae nuclei, and observed that the fluorescence intensity of the M. hakoo nuclei was similar to that of C. merolae and S. cerevisiae nuclei (Fig. 1a, b, Supplementary Fig. 1). Cyanidioschyzon merolae is an ultrasmall unicellular red alga with the smallest genome in Rhodophyta^6,7. Fluorescence and transmission electron microscopic examination indicated that M. hakoo cells were approximately 1 µm in diameter and contained relatively few organelles, with only a single mitochondrion and chloroplast (Fig. 1c–g, Supplementary Fig. 2). Notably, a specific electron-dense structure in the nuclear peripheral region and thick cell walls were typical characteristics of M. hakoo cells (Fig. 1f, Supplementary Fig. 2d, e). Another structural feature observed in M. hakoo cells was the accumulation of starch aggregates in the chloroplast (Supplementary Fig. 2d). In C. merolae, starch aggregated in the cytoplasm (Supplementary Fig. 2a, c). Additionally, phycobilisomes were undetectable in M. hakoo chloroplasts (Fig. 1f, Supplementary Fig. 2). To examine the M. hakoo cell division pattern, we cultured cells under a light–dark cycle to obtain highly synchronized cells, and detected the following cell-cycle stages: a single-cell stage (I), a two-cells-combined stage (II), a tetrad stage (III), and a dissection stage (IV) (Fig. 1h, i, Supplementary Fig. 3). In addition, M. hakoo cells cultured in nitrogen-depleted medium typically formed lipid droplets, similar to the response of the oil-rich alga Botryococcus braunii (Supplementary Fig. 4)^11,12,13.

**Fig. 1: Morphological characteristics of *Medakamo hakoo*.**

Sequencing and evolutionary analysis of the M. hakoo genome

From the long-read sequencing analysis, we obtained 18 contigs via a de novo sequence assembly (Table 1, Supplementary Table 1), of which two contigs were annotated as organellar genomes because they were circular sequences. To perform a phylogenetic analysis, we used the M. hakoo organellar genome, which we previously described¹⁴. A phylogenetic tree was constructed on the basis of plastid genome sequences from 62 chlorophyte taxa using the maximum-likelihood (ML) method (Fig. 2a). The resulting tree suggested that M. hakoo is classifiable in the class Trebouxiophyceae. Additionally, M. hakoo is likely evolutionarily related to B. braunii, which shows potential for algal fuel production^15,16 (Supplementary Fig. 4).

Table 1 Basic data for the Medakamo hakoo genome.

**Fig. 2: Evolutionary analyses of the *Medakamo hakoo* genome.**

In our phylogenetic analysis, Medakamo and Choricystis formed a small clade sister to Botryococcus (Fig. 2a). Many algal strains originating from various freshwater habitats were recently identified as Choricystis species, and their rbcL sequences are available in the NCBI database (e.g., Novis et al.¹⁷). In addition, three Choricystis species were identified mainly on the basis of phylogenetic analyses by Pröschold and Darienko¹⁸. To more precisely resolve the phylogenetic relationships between Medakamo and Choricystis, 54 Choricystis rbcL sequences in the NCBI database, two new rbcL sequences from strains studied by Pröschold and Darienko¹⁸, and the Medakamo rbcL sequence were included in a phylogenetic analysis, with Botryococcus sequences as the outgroup (Fig. 2b). The phylogenetic tree robustly resolved two sister clades (with bootstrap values of 83% or higher) that corresponded to Choricystis and the new genus Medakamo. Medakamo comprises M. hakoo sp. nov. and M. limnetica comb. nov. (= Choricystis limnetica) and can be clearly distinguished from Choricystis and other freshwater green algae on the basis of their phylogenetic position and differences in cell morphology (Supplementary Note 1). Thus, Medakamo is recognized as a new genus within the class Trebouxiophyceae.

Nuclear genome analysis

The 16 non-organellar contigs were chromosomal sequences flanked by telomere sequences (5′-TTAGGG-3′). The total contig size was 15.8 Mb (Tables 1, 2), which was consistent with the genome size estimated by fluorescence microscopy⁵. Thus, the contigs represented almost the complete genome sequence.
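As an illustration only, the check that a contig is flanked by telomere repeats can be sketched as follows. This is not the procedure used in this study: the window size, the minimum repeat number, and the function names are hypothetical choices, and the sketch assumes that the 5′ end of the reported strand carries the reverse complement of the repeat unit (CCCTAA) while the 3′ end carries TTAGGG.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # repeat unit reported in the text


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_run(window: str, unit: str) -> int:
    """Length, in repeat units, of the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{unit})+", window.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def telomere_flanked(contig: str, window: int = 60, min_repeats: int = 3) -> bool:
    """True if both contig ends carry a run of telomere repeats.

    `window` and `min_repeats` are illustrative thresholds, not values
    taken from the study.
    """
    head, tail = contig[:window], contig[-window:]
    return (
        longest_run(head, revcomp(TELOMERE_UNIT)) >= min_repeats
        and longest_run(tail, TELOMERE_UNIT) >= min_repeats
    )


# Toy contig (hypothetical sequence): repeats at both ends.
toy_contig = "CCCTAA" * 5 + "ACGTACGTAC" * 10 + "TTAGGG" * 5
print(telomere_flanked(toy_contig))
```

A contig that passes this kind of check at both ends is a candidate for a complete chromosome, although a real analysis would also need to tolerate degenerate repeats.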

Table 2 Comparison of genome data for five microalgal species.

Next, we validated the genome assembly by confirming several benchmarks. The sequence coverage of the assembly was 246.8×. To further evaluate the assembled genome, we conducted a Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis^19,20. After removing organellar contigs, 89.5% of the assembly comprised complete BUSCOs (Supplementary Fig. 5). We also mapped RNA-seq reads to the contigs, with 98.6% of the reads successfully aligned. Moreover, tRNAs corresponding to all 20 amino acids were identified (Supplementary Table 2).

Using the BRAKER2 software²¹, we identified 7629 candidate protein-coding sequence (CDS) regions in the nucleus, and annotated these sequences using eggNOG-mapper²² (Supplementary Data 1) and GhostKOALA²³ (Supplementary Data 2). Amino acid sequences were obtained for 91.0% of the complete BUSCOs (Supplementary Fig. 5).

Effect of a high G + C content on the amino acid content in the M. hakoo proteome and gene expression

A high G + C content (73 mol%) is one characteristic of the M. hakoo genome (Table 2). The G + C content was high in protein-coding regions as well (Supplementary Table 3). A high G + C content in the protein-coding regions may increase the number of certain types of amino acids in the resultant proteins (Fig. 3a). We analyzed the amino acid composition of the predicted protein sequences in M. hakoo and compared them with the amino acid composition in other microalgae. Our analysis demonstrated that M. hakoo proteins contained many alanine, glycine, and proline residues. More specifically, alanine was more abundant in the M. hakoo proteome than in the C. merolae proteome (Fig. 3b). To investigate the relationship between gene expression and the G + C content in the CDSs, we plotted the G + C content and the transcripts per million (TPM) value for each gene (Fig. 3c), which revealed a negative correlation between these two factors. A negative correlation was also detected between the alanine content and the TPM value for each gene (Fig. 3d). These results suggested that highly expressed genes (e.g., housekeeping genes) were relatively unaffected by the bias toward a high G + C content in the M. hakoo genome.
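The comparison between CDS G + C content and expression level can be sketched as below. The text does not state which correlation coefficient was used, so the Spearman rank correlation here is an assumption, and the gene identifiers, sequences, and TPM values are hypothetical toy data.

```python
from math import sqrt


def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of a sequence."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: CDS sequences and TPM values keyed by gene ID.
cds = {"g1": "ATGGCCGCGGGCTAA", "g2": "ATGAAATTTGCATAA", "g3": "ATGGCGAAAGGCTAA"}
tpm = {"g1": 3.0, "g2": 250.0, "g3": 40.0}
genes = sorted(cds)
rho = spearman([gc_fraction(cds[g]) for g in genes], [tpm[g] for g in genes])
print(rho)
```

A negative coefficient corresponds to the pattern described above, in which genes with a higher G + C content tend to have lower TPM values. The same functions can be applied to the per-gene alanine content in place of `gc_fraction`.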

**Fig. 3: Effect of a high genomic G + C content.**

We further analyzed M. hakoo expression patterns using proteomics, which detected more than 3000 unique peptides across samples (Supplementary Data 3). All codons were represented in the proteins detected (Table 3, Supplementary Fig. 6), suggesting that M. hakoo cells use a standard codon table, with the caveat that mass spectrometry cannot distinguish between leucine and isoleucine residues.
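A codon tally of this kind can be sketched as follows, assuming that the genomic regions encoding the detected peptides are available as in-frame coding sequences. The function names are hypothetical, and the standard stop codons are assumed when defining the set of sense codons.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
SENSE_CODONS = {"".join(c) for c in product("ACGT", repeat=3)} - STOP_CODONS


def count_codons(cds: str) -> Counter:
    """Count in-frame codons; a trailing partial codon is ignored."""
    s = cds.upper()
    usable = len(s) - len(s) % 3
    return Counter(s[i:i + 3] for i in range(0, usable, 3))


def missing_sense_codons(cds_regions) -> set:
    """Sense codons that never occur in any of the supplied regions."""
    total = Counter()
    for cds in cds_regions:
        total.update(count_codons(cds))
    return SENSE_CODONS - set(total)


# Hypothetical toy regions; real input would be the peptide-encoding regions.
toy_regions = ["ATGGCCGCCGGC", "CCGGCGAAG"]
print(len(missing_sense_codons(toy_regions)))
```

An empty result from `missing_sense_codons` means that every sense codon is covered by at least one detected peptide.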

Table 3 Codon counts in the Medakamo hakoo genome regions for the peptides detected during the proteomic analysis.

Relationship between the extremely high G + C content of the M. hakoo genome and the notable accumulation of the guanine quadruplex consensus sequence

To determine the reason for the high G + C content in the M. hakoo genome, we examined the G + C content-related characteristics of the genome sequence. We focused on the guanine quadruplex (G4) structure, which is a non-B DNA structure that forms in guanine-abundant DNA regions^24,25,26. The guanine nucleotides form a tetrad structure that is involved in various biological processes, including DNA replication, transcription, meiotic double-strand break, and telomere maintenance^24,25,26. To predict the abundance of G4 in the genome of M. hakoo and other organisms, we used pqsfinder software, which can detect the G4 consensus sequence in DNA sequences²⁷ (Fig. 3e). The frequency of the predicted G4 regions was highest in M. hakoo among the analyzed organisms. Notably, the frequency was higher than that in Streptomyces coelicolor, which has a genome with one of the highest G + C contents among prokaryotes.
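For orientation, a scan for the G4 consensus sequence can be approximated with a regular expression. This sketch is a simplified stand-in and does not reproduce the scoring of pqsfinder, which also tolerates imperfect G runs. The motif used here (four runs of at least three guanines separated by loops of 1 to 7 nucleotides) is the commonly cited textbook consensus and is an assumption, as are the function names.

```python
import re

# Textbook consensus: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ (assumed, not pqsfinder).
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}?G{3,}){3}")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def count_g4(seq: str) -> int:
    """Non-overlapping consensus matches on both strands."""
    s = seq.upper()
    return len(G4_PATTERN.findall(s)) + len(G4_PATTERN.findall(revcomp(s)))


def g4_per_kb(seq: str) -> float:
    """Consensus matches per kilobase, for comparison between genomes."""
    return 1000 * count_g4(seq) / len(seq) if seq else 0.0


# Hypothetical toy sequences.
print(count_g4("GGGAGGGAGGGAGGG"))
print(count_g4("ATATATATATATATAT"))
```

Normalizing the count by sequence length, as in `g4_per_kb`, is one simple way to compare genomes of different sizes.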

Comparison between the M. hakoo genome and the genomes of other species

We compared the M. hakoo genome with previously sequenced genomes in terms of their size and gene number (Fig. 4a; Table 2). The genome of M. hakoo was one of the smallest among Viridiplantae species and was smaller than that of Auxenochlorella protothecoides, which has the smallest genome previously known among freshwater green algae (Fig. 4a). We classified the M. hakoo nuclear genes according to the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology²⁸ and determined the number of genes belonging to each functional group. First, we calculated the number of genes in the most enriched KEGG Orthology categories and compared the results with those for other green algae (Fig. 4b). Overall, M. hakoo had fewer genes than the other examined species and, most notably, fewer genes belonging to the ‘Genetic information processing’ category. Next, we determined the number of genes annotated with specific pathways categorized as ‘Genetic information processing’. For many of these pathways, M. hakoo tended to have fewer genes compared with other green algal species (Fig. 4c). On the basis of the gene sets, the pathways associated with ‘Basal transcription factors’ and ‘Nucleocytoplasmic transport’ were considerably simpler in M. hakoo than in Ostreococcus lucimarinus, a marine green alga with a smaller genome than M. hakoo (Fig. 4c).

**Fig. 4: Gene number and composition of the *Medakamo hakoo* genome.**

Conservation of important biological pathways

To more precisely determine the characteristics of the M. hakoo genome, we analyzed the conservation of genes associated with major biological pathways. Although almost all fundamental photosynthesis-related genes were conserved in M. hakoo, genes encoding stress-related light-harvesting complex (LHC)-like proteins (LHCSR) were not detected in the M. hakoo genome (Supplementary Figs. 7, 8). Regarding the photoreceptors, M. hakoo had cryptochrome and phototropin orthologs, but lacked a phytochrome homolog. Although several green algae have phytochromes²⁹, their absence in M. hakoo is consistent with the lack of phytochromes in the green alga Chlamydomonas reinhardtii³⁰ and the red alga Cyanidioschyzon merolae⁶.

Our analysis also confirmed that M. hakoo had a set of conserved canonical histones, including H2A, H2B, H3, and H4. However, we did not detect an ortholog of the linker histone H1 encoded in the M. hakoo genome. To confirm the conservation of other chromosomal proteins, we analyzed structural maintenance of chromosomes (SMC) protein complexes (cohesin, condensin I/II, and SMC5/6), which are key regulators of chromosomal organization, dynamics, and stability³¹. Our analysis indicated that canonical components of all SMC complexes were conserved in M. hakoo (Supplementary Table 4).

Notably, relatively few nuclear envelope (NE)-related genes were identified in the M. hakoo genome. The NE is composed of the outer nuclear membrane, the inner nuclear membrane, and the nuclear pore complex, and is mechanically supported by the nuclear lamina beneath the inner nuclear membrane^32,33. When we examined the conservation of NE-related genes in M. hakoo, the nuclear pore complex components (nucleoporins) were highly conserved, in contrast to other NE-related factors, which were minimally conserved. Only one inner nuclear membrane protein was conserved, an ortholog of the mid-Sad1/UNC84 domain-containing protein (mid-SUN)³⁴, and none of the outer nuclear membrane and nuclear lamina proteins were conserved.

RNA interference (RNAi) is the molecular mechanism that regulates gene expression via small RNA and related proteins. Although RNAi-related proteins, such as Dicer and Argonaute, are widely conserved among eukaryotic organisms^35,36, genes encoding these proteins were not detected in the M. hakoo genome. The genomes of some other microalgae, including C. merolae and O. tauri, also appear to have lost RNAi-related genes (Supplementary Table 5).

Autophagy involves the transport of a cytoplasmic cargo to the lysosome for degradation³⁷. This mechanism is crucial for cell homeostasis because it allows the cell to recycle itself via the degradation of its components. However, no orthologs of autophagy-related genes are known in red algae³⁸. By screening the M. hakoo genome, we identified conserved autophagy-related (ATG) genes (Supplementary Table 6), including some that encode core ATG proteins required for autophagosome formation³⁹. This indicates that the autophagy system is conserved in chlorophytes, but not in rhodophytes.

Analysis of orthogroup composition in M. hakoo and other algae

For further analysis of the gene composition in M. hakoo and other algae, we analyzed orthogroups of M. hakoo and 14 other algal species comprising 11 green algae (A. protothecoides, Bathycoccus prasinos, Chlamydomonas reinhardtii, Chlorella variabilis, Coccomyxa subellipsoidea, Micromonas commoda, Micromonas pusilla, Monoraphidium neglectum, O. lucimarinus, O. tauri, and Volvox carteri) and three red algae (Chondrus crispus, Cyanidioschyzon merolae, and Galdieria sulphuraria) with OrthoFinder^40,41. The composition of each algal orthogroup is summarized in Supplementary Data 4 and the data were analyzed using principal component analysis (Fig. 5a). The microalgal genomes were classifiable into several groups according to their orthogroup composition. Medakamo hakoo belonged to a group composed of trebouxiophycean algae (A. protothecoides, C. subellipsoidea, and C. variabilis) and had the smallest genome in this group.

**Fig. 5: Analysis of orthogroups among *Medakamo hakoo* and other microalgae.**

Analysis of shared genes among microalgae

Given that M. hakoo had the smallest genome known among trebouxiophycean microalgae, we compared the orthogroups of M. hakoo and 14 microalgal species from other lineages to identify the gene set common to these microalgae (C15 gene set). We added S. cerevisiae gene sets to the OrthoFinder analysis to divide the C15 gene set into the following two classes: a gene set typically conserved in eukaryotes (CE gene set) and an algae-specific (AS) gene set. The AS gene set was defined with a purposefully lenient criterion to maximally capture the potential diversity of microalgal orthologs.

The 1263 orthogroups were classified into the C15 gene set, of which 984 and 279 orthogroups comprised the CE and AS gene sets, respectively (Fig. 5b, Supplementary Data 5). The CE and AS gene sets in the M. hakoo genome tended to be more highly expressed than the genes not included in these sets (Fig. 5c, Supplementary Table 7), indicating that the gene sets include many housekeeping genes. Next, we compared each gene set with the M. hakoo genome in terms of the proportion of genes assigned to particular KEGG Orthology categories, which revealed that the proportion of genes associated with metabolism was substantially higher in the AS gene set (Fig. 5d).
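The partition of shared orthogroups into the C15, CE, and AS gene sets can be sketched with set operations on a per-species gene count table. The exact criterion used for the AS gene set is not given in this section, so the rule below (a C15 orthogroup with no member in the outgroup species) is an assumption, and the species labels and counts are hypothetical toy data.

```python
def classify_orthogroups(counts, algae, outgroup):
    """Split orthogroups into shared (C15-like), CE, and AS sets.

    counts   : {orthogroup_id: {species: gene_count}}
    algae    : species that must all be represented in a shared orthogroup
    outgroup : non-algal reference species (S. cerevisiae in the text)
    """
    shared = {
        og for og, row in counts.items()
        if all(row.get(sp, 0) > 0 for sp in algae)
    }
    # Assumed rule: CE = shared and present in the outgroup; AS = the rest.
    ce = {og for og in shared if counts[og].get(outgroup, 0) > 0}
    as_set = shared - ce
    return shared, ce, as_set


# Hypothetical toy table with three algae instead of fifteen.
toy_counts = {
    "OG1": {"algaA": 1, "algaB": 2, "algaC": 1, "yeast": 1},
    "OG2": {"algaA": 1, "algaB": 1, "algaC": 3},
    "OG3": {"algaA": 2, "algaC": 1, "yeast": 1},
}
shared, ce, as_set = classify_orthogroups(
    toy_counts, ["algaA", "algaB", "algaC"], "yeast"
)
print(sorted(shared), sorted(ce), sorted(as_set))
```

By construction the CE and AS sets are disjoint and together make up the shared set, which matches the way the 1263 orthogroups divide into 984 and 279.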

We also analyzed the enriched pathways in these gene sets. To evaluate the extent of the enrichment, we mapped the C15 and AS gene sets to KEGG pathways and calculated the ratio of the mapped genes in each pathway. We considered that a high ratio of AS genes to C15 genes indicated that the pathway strongly contributed to the characteristics of the AS gene set. The highly enriched pathways were associated with metabolism and photosynthesis (Fig. 5e, Table 4). Notably, the AS genes were mainly associated with pathways involved in secondary metabolism (Fig. 5e, Table 4).
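The ratio described above can be computed per pathway as the number of AS genes mapped to the pathway divided by the number of C15 genes mapped to it. The sketch below assumes that pathway membership is available as sets of gene or orthogroup identifiers; the pathway names, identifiers, and function name are hypothetical.

```python
def as_to_c15_ratio(pathway_members, c15, as_set):
    """Ratio of AS genes to C15 genes for each pathway, highest first.

    pathway_members : {pathway_name: set of gene or orthogroup IDs}
    c15, as_set     : ID sets, with as_set expected to be a subset of c15
    Pathways with no C15 member are skipped to avoid division by zero.
    """
    ratios = {}
    for pathway, members in pathway_members.items():
        n_c15 = len(members & c15)
        if n_c15:
            ratios[pathway] = len(members & as_set) / n_c15
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical toy input.
toy_c15 = {"OG1", "OG2", "OG3", "OG4"}
toy_as = {"OG2", "OG3"}
toy_pathways = {
    "pathway_A": {"OG2", "OG3"},
    "pathway_B": {"OG1", "OG2", "OG4"},
    "pathway_C": {"OG9"},
}
print(as_to_c15_ratio(toy_pathways, toy_c15, toy_as))
```

A ratio close to 1 means that nearly all shared genes in the pathway are algae-specific under the chosen definition.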

Table 4 Enriched KEGG Orthology categories in the algae-specific gene set.

Discussion

In this study, we revealed that the freshwater green alga M. hakoo is an ultrasmall microalga with cells 1 µm in diameter that can accumulate useful substances, including starch and lipids, and thus may be of utility for bioproduction. The most distinctive characteristic of M. hakoo is strong synchronization of the cell cycle under a light–dark cycle. Recently, it was shown that the key to maintaining dense algal cultures is to avoid clogging of the photosynthetic electron flow by appropriate regulation of the timing of light–dark cycles⁴². Thus, this attribute confers the potential to contribute to effective and stable bioproduction of useful substances with uniform quality¹. In addition to these characteristics, M. hakoo has an extremely small genome. On the basis of the available information on Viridiplantae species in the KEGG Organisms database⁴³ and the JGI genome portal^44,45, M. hakoo likely possesses one of the smallest genomes among freshwater Viridiplantae species. This finding suggests that maintaining a small genome and small cells is advantageous for survival, not only in seawater and extreme environments, but also in common freshwater environments. According to the package effect^11,12,13, small cells are advantageous for photosynthesis. Although the mechanisms underlying the maintenance of ultrasmall cells remain unknown in microalgae, general factors, such as photosynthetic efficiency, may have induced a decrease in genome size or suppressed genomic expansion.

In green plants, LHCSR and PSBS are conserved LHC-like proteins with functions associated with non-photochemical quenching⁴⁶. Although there is some uncertainty regarding the underlying molecular mechanism, LHCSR plays a major role in the dissipation of excess light energy in C. reinhardtii. Regarding land plants, both LHCSR and PSBS help to dissipate excess energy in the moss Physcomitrium, but Arabidopsis thaliana lacks LHCSR and uses PSBS as a central component of its light energy dissipation machinery⁴⁶. Additionally, PSBS is universally conserved in Viridiplantae, with the exception of bryopsidalean algae (Ostreobium and Caulerpa), which may have adapted to diverse light conditions⁴⁷. In M. hakoo, the absence of LHCSR and the presence of the plastid genome-encoded cemA, which is required for the tolerance of Chlamydomonas to high-light stress⁴⁸, suggest that this alga may have a unique high-light acclimation mechanism that may have convergently evolved in flowering plants and in an early-branching green alga (Chloropicon) (Supplementary Figs. 7, 8).

The M. hakoo genome does not contain a CDS for histone H1. Histone H1 includes the H15 domain, which is a globular domain comprising a winged-helix motif⁴⁹. Green algae, including Chlorella sorokiniana⁵⁰ and Haematococcus lacustris⁵¹, and red algae, including C. merolae⁶, have histone H1 orthologs or H15 domain-containing proteins. Similar to M. hakoo, the genome of O. tauri lacks a gene encoding a protein with the H15 domain⁸. Because the genes encoding SMC proteins, including those in condensin and cohesin, are conserved in these genomes, chromatin folding and chromosome condensation in these green algae presumably occur without the linker histone H1.

The M. hakoo genome has a very high G + C content and abundant G4. The G4 structure, which is commonly located in telomere sequences but is also present within chromosomes, has diverse functions, including transcriptional and translational regulation. We speculate that a high frequency of the G4 consensus sequence is a characteristic of the M. hakoo genome, and G4-related biological processes may have contributed to the elevated genomic G + C content in this species during its evolution.

Principal component analysis of the orthogroup composition of M. hakoo and 14 other microalgal species resolved multiple groups of species. Interestingly, M. hakoo and Ostreococcus, which have extremely small genomes, were placed in separate groups. This result indicates that ortholog compositions in microalgae are not dependent on the genome size but rather may reflect lineage-specific gene gains/losses. The AS gene set reflects well the genomic, metabolomic, and physiological characteristics of microalgae. For example, the AS gene set included those genes associated with terpenoid-related secondary metabolites. Carotenoids, one subgroup of tetraterpenoids, play a role as an antenna pigment for harvesting light and provide protection against oxidative stress^52,53, which is beneficial for human health. Large-scale production of carotenoids for health-related industries using microalgae is flourishing⁵⁴. The AS gene set determined in the current study provides information relevant for bioengineering of microalgae. In addition to the M. hakoo genome described in this study, the availability of genome sequences for a broad range of other ultrasmall algae would provide a foundation to identify the minimal conserved gene set of plants (algae and land plants), and to understand how photosynthetic eukaryotes thrive in diverse environments.

Methods

Materials

Medakamo hakoo 311 was obtained from the personal aquarium of Prof. Kuroiwa (Kagurazaka, Tokyo, Japan)⁴. The M. hakoo strain was cultured in 0.05% HYPONeX (HYPONeX Japan Corp., Ltd., Osaka, Japan) liquid medium and on 0.05% HYPONeX gellan gum-based solid medium in plates. Cyanidioschyzon merolae 10D (Toda et al. 1995) was cultured in Misumi–Kuroiwa medium at pH 2.2 and 42 °C⁵⁵. The Misumi–Kuroiwa medium was prepared by diluting 1 mL of a commercial nutrient solution (Hyponex, N: P: K 10: 8: 8; Hyponex Japan, Osaka, Japan) to 1 L with distilled or tap water. The pH in the medium was adjusted to pH 2.2 with 1 mL concentrated HCl. Diploid Saccharomyces cerevisiae BY4743 strains were cultured at 30 °C in YPD medium that contained 1% yeast extract (Oriental Yeast Co., Ltd., Tokyo, Japan), 2% peptone (Kyokuto Co., Ltd., Tokyo, Japan), and 2% glucose⁵⁶. The B. braunii Kützing (NIES-2199) line was obtained from the Microbial Culture Collection at the National Institute for Environmental Studies (Japan) and cultured in AF-6 medium⁵⁷ at 22 °C.

Synchronization culture

The light–dark cycle for the cell-cycle synchronization culture was as follows: 12-h light:12-h dark. In the mitotic phase, M. hakoo cells were sampled every hour and examined using a microscope. Each cell type (one-cell, two-cells, and four-cells) was counted. More than five fields of view (1 × 10³ µm²) were selected for each sample. This experiment was performed several times and representative results are presented.

Fluorescence microscopy

The M. hakoo, C. merolae, and S. cerevisiae cells were stained with 4′,6-diamidino-2-phenylindole (DAPI) and SYBR Green I (Molecular Probes, Eugene, OR, USA)⁵. SYBR Green I stain, which has been used to examine cell nuclei in various algae because it is unaffected by the genomic G + C content⁵⁸, was used to confirm the presence of the cell nuclei and organelle nucleoids revealed by conventional DAPI staining⁵⁹. Cultures were centrifuged and the resulting pellet was resuspended, after which a 3 µL aliquot of the solution was placed on a glass slide to form a droplet. Next, 3 µL 1% (v/v) glutaraldehyde in NS buffer (0.25 M sucrose, 1 mM EDTA, 7 mM 2-mercaptoethanol, 0.8 mM PMSF, 1 mM magnesium chloride, 0.1 mM calcium chloride, 0.1 mM zinc sulfate, and 20 mM Tris-HCl, pH 7.6) was added to the droplet, followed by the addition of 3 µL DAPI (15 µg mL⁻¹) or 3 µL SYBR Green I (1 µg mL⁻¹). A coverslip was placed on the droplet and then gently pressed. The stained samples were observed using an Olympus BH-2 BHS epifluorescence microscope.

Transmission electron microscopy

Electron microscopy analyses were performed as previously described⁴. Briefly, M. hakoo was fixed for 4 h in 1% (v/v) glutaraldehyde in a sodium cacodylate buffer. After post-fixation and dehydration steps, the samples were embedded in Spurr’s resin. Ultrathin sections of the samples were stained with 5% (w/v) uranyl acetate and lead citrate. The JEM 1200 EXS electron microscope (JEOL Ltd., Tokyo, Japan) was used to examine the prepared samples.

De novo whole-genome assembly

Medakamo hakoo cells were frozen and then ground using a mortar and pestle. We extracted genomic DNA in two phenol extraction cycles, which were followed by an ethanol precipitation step. The DNA solution was purified by cesium chloride density-gradient centrifugation. The genomic DNA was further purified using the AMPure XP kit (Beckman Coulter, CA, USA). Purified samples were fragmented using g-TUBE (Covaris, IL, USA). The fragmented DNA was blunted and ligated to SMRTbell adapters using the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, CA, USA). We evaluated the size distribution of the adapter-ligated DNA by pulsed-field electrophoresis and performed a size selection step (15.0 kb cut-off) using the BluePippin system (Sage Science, MA, USA). The sequencing library was quantified using the Agilent 2200 TapeStation (Agilent Technologies, CA, USA). The sequencing primer was annealed and DNA polymerase was bound to the sequencing library using the DNA/Polymerase Binding Kit P6 (version 2) (Pacific Biosciences, CA, USA). The sequencing templates were bound to magnetic beads using MagBead OneCellPerWell (Pacific Biosciences, CA, USA) and added to a SMRT Cell for the subsequent sequencing on the PacBio RS II system (Pacific Biosciences, CA, USA). For the de novo sequence assembly, the obtained reads were analyzed with the RS_HGAP_Assembly.3 program of the SMRT analysis software (version 2.3.0).

RNA-seq analysis

Medakamo hakoo cells were ground to a powder in liquid nitrogen with a pestle and mortar. The cell powder was resuspended in 5 mL warmed (55 °C) nucleic acid extraction buffer (300 mM NaCl, 50 mM Tris-HCl [pH 7.6], 100 mM EDTA [pH 8.0], 2% Sarkosyl, and 4% SDS) and the resulting solution was stirred. The RNA extraction using a phenol/chloroform/isoamyl alcohol mixture (25:24:1, v/v/v) was repeated twice. The extract was purified by ethanol precipitation, and the total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, CA, USA). Next, 1 µL (5 units) Recombinant DNase I (RNase-free) (Takara Bio, Shiga, Japan) in 10 µL 10× buffer and 1 µL (40 units) Recombinant RNase Inhibitor (Takara Bio) were added to 100 µL crude RNA solution, which was then incubated at 37 °C for 30 min. After the addition of 5 µL 0.5 M EDTA, the solution was incubated at 80 °C for 5 min. The RNA in the solution was precipitated in ethanol and then dissolved in pure water. After the quality was checked using a NanoDrop spectrophotometer (Thermo Fisher Scientific, MA, USA) and the Agilent 2200 TapeStation, a sequencing library was produced using the TruSeq Stranded mRNA Sample Prep Kit (Illumina, CA, USA). The quality of the sequencing library was evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies). Both cBot and the HiSeq PE Cluster Kit (version 4) (Illumina) were used for the cluster formation step. The library was sequenced (100-bp paired-end reads) using the HiSeq 2500 system and the HiSeq SBS Kit (version 4) (Illumina).

Annotation of the M. hakoo genome

Sixteen contigs were annotated after removing two organellar contigs. The completeness of the contigs was evaluated using BUSCO (version 5.2.2)¹⁹ and the chlorophyta_odb10 dataset. Repeat sequences were identified and masked using RepeatModeler (version 2.0.2)⁶⁰ and RepeatMasker (version 4.1.1) (https://www.repeatmasker.org/). Gene models were predicted according to RNA-seq reads, which were trimmed using the default options of fastp (version 0.20.0)⁶¹. The trimmed RNA-seq reads were mapped to the contigs using the default options of HISAT2 (version 2.2.1)⁶², and the initial gene models were predicted using BRAKER2 (version 2.1.5)²¹ with the option “stopCodonExcludedFromCDS = False”. Gene models were also predicted using PASA (version 2.4.1)⁶³, GeneMark-ET (version 4.33)⁶⁴, and SNAP (version 2006-07-28)⁶⁵ in the funannotate pipeline (version 1.8.9) (https://github.com/nextgenusfs/funannotate). These gene models were combined with those obtained from BRAKER2 (with weight = 1) using EvidenceModeler (version 1.1.1)⁶⁶. The quality of gene prediction was evaluated using BUSCO^19,20 with the protein mode and the chlorophyta dataset. Functional annotation was conducted using eggNOG-mapper 2.1.9²² and GhostKOALA²³.

Assessment of the quality of the M. hakoo genome assembly and annotation

To evaluate the quality of the M. hakoo genome assembly and annotation, we performed BUSCO analyses¹⁹ using the chlorophyta dataset. Both the contig sequences (excluding organelle-derived contigs) and the predicted protein-coding sequences were used as input.

Phylogenomic analyses of amino acid datasets

The chloroplast genomes of M. hakoo and 63 green algae reported by Lemieux et al.⁶⁷ were included in the phylogenomic analyses. A total of 79 protein-coding genes were used to construct the phylogenetic datasets, as previously described: accD, atpA, B, E, F, H, I, ccsA, cemA, chlB, I, L, N, clpP, cysA, T, ftsH, infA, minD, petA, B, D, G, L, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 12, 14, 16, 19, 20, 23, 32, 36, rpoA, B, C1, C2, rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, tufA, ycf1, 3, 4, 12, 20, 47, and 62. Amino acid datasets were prepared as follows. The deduced amino acid sequences of the 79 selected genes were aligned using MUSCLE 3.8⁶⁸. The ambiguously aligned regions in each alignment were removed using trimAl 1.4⁶⁹ with the following settings: block = 6, gt = 0.7, st = 0.005, and sw = 3. The protein alignments were concatenated using Phyutility 2.2.6⁷⁰. Maximum-likelihood (ML) analyses were conducted using the edge-linked partition model of IQ-TREE 1.6.1⁷¹,⁷². The datasets were partitioned by gene, and the optimal amino acid substitution model for each partition was selected according to the Bayesian information criterion using the ModelFinder function of IQ-TREE⁷³. Branch support for the resulting ML trees was calculated via a non-parametric bootstrap analysis and the SH-like approximate likelihood-ratio test⁷⁴. Bayesian analyses were performed using the site-heterogeneous CATGTR + Γ4 model and PhyloBayes 4.1⁷⁵. Five independent chains were run for 5000 cycles, and consensus topologies were calculated from the saved trees using the BPCOMP program of PhyloBayes after a burn-in of 1250 cycles. The largest discrepancy value across all bipartitions in the consensus topologies (maxdiff) under these conditions was less than 0.13, suggesting that the chains were substantially converged.
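
The concatenation and gene-wise partitioning step can be illustrated with a short Python sketch. This is not the Phyutility implementation used in the study; the function names (concatenate_alignments, nexus_charsets), the gap-padding of taxa that lack a gene, and the toy sequences are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned amino acid sequence


def concatenate_alignments(
    genes: Dict[str, Alignment], missing: str = "-"
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into one supermatrix.

    Returns the concatenated alignment and a list of (gene, start, end)
    partitions with 1-based inclusive column coordinates.
    """
    # Collect every taxon seen in any gene so that absent genes can be padded.
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    position = 0

    for gene, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is filled with gap characters.
            pieces[taxon].append(aln.get(taxon, missing * width))
        partitions.append((gene, position + 1, position + width))
        position += width

    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def nexus_charsets(partitions: List[Tuple[str, int, int]]) -> str:
    """Format the partitions as a NEXUS sets block, one charset per gene."""
    lines = ["#nexus", "begin sets;"]
    for gene, start, end in partitions:
        lines.append(f"    charset {gene} = {start}-{end};")
    lines.append("end;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data only; these are not sequences from the study.
    toy = {
        "rbcL": {"A": "MSPQ", "B": "MSPK"},
        "psbA": {"A": "MTA", "C": "MTS"},
    }
    matrix, parts = concatenate_alignments(toy)
    print(matrix)
    print(nexus_charsets(parts))
```

Recording the column range of each gene at concatenation time is what allows a separate substitution model to be fitted to each partition afterwards.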

The sequences of the chloroplast Rubisco large subunit gene (rbcL) of the algal strains identified as Choricystis and Botryococcus species were obtained as aligned sequences following a BLASTN search of the NCBI database (https://www.ncbi.nlm.nih.gov/) using the M. hakoo rbcL sequence (1431 bp) (accession no. LC709230) as the query. Sequences were aligned using ClustalX⁷⁶. Additionally, the rbcL sequences of SAG 251-1 (NIES-1436) and SAG 251-2¹⁸ were determined by Sanger sequencing of the PCR products (accession nos. LC709231 and LC709232) and then added to the alignment. The ML analyses of the aligned rbcL sequences were performed using MEGAX⁷⁷, with the best-fit model (GTR + G + I) selected by MEGAX and topological support assessed with 1000 bootstrap replicates⁷⁸. Three Botryococcus sequences were treated as the outgroup on the basis of the present chloroplast multigene phylogeny (Fig. 2b).

Lipid formation culture

We cultured M. hakoo in the liquid media described in Supplementary Table 8. For B. braunii, AF-6 was used as the normal medium. After a 13-day culture, cells were collected by centrifugation and then stained with 20 µg mL⁻¹ Nile Red diluted with phosphate buffer.

Gene expression analysis

We used the Genedata Profiler Genome software (version 10.1.14a; Genedata, Basel, Switzerland) to analyze the assembled genomic sequence and annotation data. TopHat (version 2.0.14)⁷⁹,⁸⁰ was used for mapping. The total read count was 68,289,325 and 95.9% of the reads were mapped onto the M. hakoo genome.

Genome size and gene number comparison among plants

Genomic data (nuclear and organellar genomes) for the following species were obtained from the RefSeq database⁸¹: A. thaliana, Glycine max, Oryza sativa, Selaginella moellendorffii, Physcomitrella patens, Amborella trichopoda, C. reinhardtii, V. carteri f. nagariensis, Monoraphidium neglectum, O. lucimarinus CCE9901, O. tauri, B. prasinos, Micromonas commoda, Micromonas pusilla CCMP1545, C. subellipsoidea C-169, C. variabilis, A. protothecoides, C. merolae strain 10D, G. sulphuraria, and C. crispus. The Chloropicon primus genome was previously analyzed by Lemieux et al.⁸². The genome size and gene number analyses did not include organellar genome data.

Pathway analysis

A BLASTP analysis was performed by screening the KEGG database using the predicted CDSs in the M. hakoo genome as queries. The K numbers of each gene were obtained. Next, the pathway count data for each organism (C. reinhardtii, V. carteri f. nagariensis, O. lucimarinus CCE9901, M. commoda, C. subellipsoidea C-169, C. variabilis, and A. protothecoides) were acquired from the KEGG database (https://www.genome.jp/kegg/kegg_ja.html) and compared with the pathway information for the M. hakoo genome.

Enrichment analysis

A KEGG pathway enrichment analysis of the common gene sets was performed using enrichKEGG in the clusterProfiler package⁸³. The K numbers of the C15 and PS gene sets were used for this analysis. The C15 gene set served as the background to assess the KEGG pathway enrichment of the AS gene set. Additionally, “ko” was selected as the parameter for “organisms”. Details of the statistics for the enrichment analysis are described in the Statistics and Reproducibility section.

SDS-PAGE and in-gel digestion

Protein samples were dissolved in the sample buffer and partially separated (approximately 1 cm) using a NuPAGE Bis-Tris gel (Thermo Fisher Scientific, CA, USA). Electrophoresis was performed according to the manufacturer's instructions. Each lane was excised from the unstained gel. In-gel digestion was performed using 0.01 µg/µL LysC and trypsin⁸⁴.

Mass spectroscopic and chromatographic methods, instrumentation, and database searches

The resulting peptides were analyzed using the Q Exactive hybrid mass spectrometer (Thermo Fisher Scientific, CA, USA)⁸⁵. The MS/MS spectra were interpreted and then peak lists were generated using Proteome Discoverer 2.2.0.388 (Thermo Fisher Scientific, CA, USA). The SEQUEST program was used to search the in-house M. hakoo protein database with the following settings: enzyme selected with up to two missed cleavage sites; peptide mass tolerance, 10 ppm; MS/MS tolerance, 0.02 Da; fixed modification, carbamidomethylation (C); and variable modification, oxidation (M). Peptides were identified according to significant Xcorr values (high confidence filter). The peptide identification and modification information obtained from SEQUEST was manually examined and filtered to obtain confirmed peptide identification and modification lists for the HCD MS/MS analysis. The precursor ion intensity (normalized against the total peptide amount) was used for the label-free quantification.
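
To make the two tolerance settings concrete, the following Python sketch shows how a relative tolerance in ppm and an absolute tolerance in Da are typically evaluated. It is a generic illustration and not SEQUEST code; the function names and the example masses are hypothetical.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million relative to the theoretical mass."""
    return (observed - theoretical) / theoretical * 1e6


def within_ppm(observed: float, theoretical: float, tolerance_ppm: float = 10.0) -> bool:
    """True if a precursor mass lies within a relative tolerance (default 10 ppm)."""
    return abs(ppm_error(observed, theoretical)) <= tolerance_ppm


def within_da(observed: float, theoretical: float, tolerance_da: float = 0.02) -> bool:
    """True if a fragment mass lies within an absolute tolerance (default 0.02 Da)."""
    return abs(observed - theoretical) <= tolerance_da


if __name__ == "__main__":
    # Hypothetical masses: a 0.005 Da deviation on a 1000 Da peptide is about 5 ppm.
    print(ppm_error(1000.005, 1000.0))
    print(within_ppm(1000.005, 1000.0))
    print(within_da(500.03, 500.0))
```

A relative tolerance scales with mass, so 10 ppm corresponds to 0.01 Da at 1000 Da and 0.02 Da at 2000 Da, whereas the fragment tolerance stays fixed at 0.02 Da.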

Codon count

The detected peptides encoded by specific regions in the M. hakoo genome were used for the codon count. All complete CDSs mapped on the basis of these peptides were used. We counted the codons in the genome sequence encoding the peptides using the R package coRdon (https://github.com/BioinfoHR/coRdon).
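
The counting step itself can be sketched in a few lines of Python. This is an illustration of codon counting in general, not the coRdon implementation used in the study; the function names and the toy sequences are assumptions.

```python
from collections import Counter
from typing import Iterable


def count_codons(cds: str) -> Counter:
    """Count in-frame codons in one coding sequence, reading from position 0."""
    seq = cds.upper().replace("U", "T")  # accept RNA input as well as DNA
    if len(seq) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def codon_usage(cds_list: Iterable[str]) -> Counter:
    """Sum the codon counts over a collection of complete CDSs."""
    total: Counter = Counter()
    for cds in cds_list:
        total.update(count_codons(cds))
    return total


if __name__ == "__main__":
    # Toy sequences only; these are not M. hakoo genes.
    usage = codon_usage(["ATGGCCTAA", "ATGGCCGCCTGA"])
    for codon, n in sorted(usage.items()):
        print(codon, n)
```

Restricting the input to complete CDSs, as described above, ensures that every sequence is read in frame from its start codon.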

G4 analysis

The complete M. hakoo draft genome sequence and the Escherichia coli, C. reinhardtii, C. merolae, and S. coelicolor genome sequences were analyzed using the default settings of the pqsfinder software to identify potential G4-forming sequences²⁷.
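
For readers unfamiliar with what a potential G4-forming sequence looks like, the Python sketch below scans for the textbook canonical motif (four runs of at least three guanines separated by loops of one to seven nucleotides). This is a deliberately simpler technique than pqsfinder, which also scores imperfect quadruplexes, so its hits will not reproduce the pqsfinder results; the pattern, function name, and test sequence are assumptions for illustration.

```python
import re
from typing import List, Tuple

# Canonical motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the forward strand.
G_RICH = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# The same motif on the reverse strand appears as C-runs on the forward strand.
C_RICH = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def find_canonical_g4(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for non-overlapping canonical G4 motifs.

    Coordinates are 0-based and half-open on the forward strand.
    """
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", G_RICH), ("-", C_RICH)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence built from a telomere-like repeat, not from any genome above.
    print(find_canonical_g4("AT" + "GGGTTAGGGTTAGGGTTAGGG" + "AT"))
```

Because finditer reports only non-overlapping matches, nested or overlapping candidates are collapsed, which is one of several reasons a dedicated tool is preferable for genome-wide counts.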

Orthogroup analysis

The C. merolae⁶,⁷, O. tauri⁹,¹⁰, S. cerevisiae strain S288C⁸⁶, and M. hakoo amino acid sequence datasets were analyzed. For the orthologous group analyses, gene families were identified using OrthoFinder⁴⁰. Protein sequences were compared in all-vs-all BLASTP searches using the NCBI BLAST+ toolkit⁸⁷, as suggested in the OrthoFinder manual.

Principal component analysis

Orthogroup composition data (Supplementary Data 4) for microalgae were analyzed with the prcomp function of the stats package (version 4.0.2) in R.

Nomenclatural Acts

This published work and the nomenclatural acts (Supplementary Note 1) it contains have been registered in PhycoBank, the proposed online registration system for the International Code of Nomenclature for algae, fungi and plants (ICN). The PhycoBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix “http://phycobank.org/”. The LSIDs for this publication are: 103506; 103507; 103508.

Statistics and reproducibility

All of the culture experiments presented in this paper were conducted multiple times to confirm reproducibility. To analyze the table data and draw the figures, we used the tidyverse package (version 1.3.1) in R and pandas (version 1.0.5) in Python. The Brunner–Munzel test was performed with the lawstat package (version 3.5) in R. The stats package (version 4.0.2) in R was used for Bonferroni correction of p-values. In the gene enrichment analysis, the p-values were calculated using a hypergeometric distribution, and the p-values of each pathway were adjusted according to the Benjamini–Hochberg method⁸⁸.
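
The enrichment statistics described here can be written out explicitly. The Python sketch below computes an upper-tail hypergeometric p-value and Benjamini–Hochberg adjusted p-values using only the standard library. It is a worked illustration of the two formulas and not the clusterProfiler code; the function names and the toy numbers are assumptions.

```python
from math import comb
from typing import List, Sequence


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    N: genes in the background, K: background genes in the pathway,
    n: genes in the tested set, k: tested genes in the pathway.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers: 4 of 5 tested genes fall in a pathway that holds
    # 5 of the 10 background genes.
    print(hypergeom_pvalue(k=4, n=5, K=5, N=10))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The choice of background (N and K) directly changes the p-value, which is why the enrichment analysis specifies the gene set used as background.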

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The source data underlying Figs. 1h, 3b, e, 4b, c, 5b–d are provided as Supplementary Data 6. The genome sequence read data were deposited in the Sequence Read Archive (accession numbers: SRR16480670–SRR16480673). The assembled chromosomal DNA sequences were deposited in GenBank (accession numbers: CP089450–CP089465). The transcriptome sequencing data were deposited in the Sequence Read Archive (accession number: SRR19165385), whereas the proteome data were deposited in the jPOST repository (accession number: JPST001585). All other data are available from the corresponding author.

References

Handbook of Microalgae-based Processes and Products (Elsevier, 2020). https://doi.org/10.1016/c2018-0-04111-0.
Onyeaka, H. et al. Minimizing carbon footprint via microalgae as a biological capture. Carbon Capture Sci. Technol. 1, 100007 (2021).
Guiry, M. D. How many species of algae are there? J. Phycol. 48, 1057–1063 (2012).
Kuroiwa, T. et al. Cytological evidence of cell-nuclear genome size of a new ultra-small unicellular freshwater green alga, “Medakamo hakoo” strain M-hakoo 311 I. Comparison with Cyanidioschyzon merolae and Ostreococcus tauri. Cytologia 80, 143–150 (2015).
Kuroiwa, T. et al. Genome size of the ultrasmall unicellular freshwater green alga, Medakamo hakoo 311, as determined by staining with 4′,6-diamidino-2-phenylindole after microwave oven treatments: II. Comparison with Cyanidioschyzon merolae, Saccharomyces cerevisiae (n, 2n), and Chlorella variabilis. Cytologia 81, 69–76 (2016).
Matsuzaki, M. et al. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428, 653–657 (2004).
Nozaki, H. et al. A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol. 5, 28 (2007).
Derelle, E. et al. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl Acad. Sci. 103, 11647–11652 (2006).
Palenik, B. et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc. Natl Acad. Sci. USA 104, 7705–7710 (2007).
Blanc-Mathieu, R. et al. An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC Genomics 15, 1103 (2014).
Kirk, J. T. O. A theoretical analysis of the contribution of algal cells to the attenuation of light within natural waters I. general treatment of suspensions of pigmented cells. N. Phytol. 75, 11–20 (1975).
Raven, J. A. A cost-benefit analysis of photon absorption by photosynthetic unicells. N. Phytol. 98, 593–625 (1984).
Raven, J. & Beardall, J. In Microalgal Production for Biomass and High-Value Products (eds Slocombe, S. P. & Benemann, J. R.) 1–19 (CRC Press, 2016).
Takusagawa, M. et al. Complete mitochondrial and plastid DNA sequences of the freshwater green microalga Medakamo hakoo. bioRxiv https://doi.org/10.1101/2021.07.27.453968 (2021).
Metzger, P. & Largeau, C. Botryococcus braunii: a rich source for hydrocarbons and related ether lipids. Appl. Microbiol. Biotechnol. 66, 486–496 (2005).
Banerjee, A., Sharma, R., Chisti, Y. & Banerjee, U. C. Botryococcus braunii: a renewable source of hydrocarbons and other chemicals. Crit. Rev. Biotechnol. 22, 245–279 (2002).
Novis, P. M., Lorenz, M., Broady, P. A. & Flint, E. A. Parallela Flint: its phylogenetic position in the Chlorophyceae and the polyphyly of Radiofilum Schmidle. Phycologia 49, 373–383 (2010).
Pröschold, T. & Darienko, T. Choricystis and Lewiniosphaera gen. nov. (Trebouxiophyceae Chlorophyta), two different green algal endosymbionts in freshwater sponges. Symbiosis 82, 175–188 (2020).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, lqaa108 (2021).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. EggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
Maizels, N. Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol. 13, 1055–1059 (2006).
Maizels, N. & Gray, L. T. The G4 genome. PLoS Genet. 9, e1003468 (2013).
Hon, J., Martínek, T., Zendulka, J. & Lexa, M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics 33, 3373–3379 (2017).
Mao, X., Cai, T., Olyarchuk, J. G. & Wei, L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics 21, 3787–3793 (2005).
Rockwell, N. C. et al. Eukaryotic algal phytochromes span the visible spectrum. Proc. Natl Acad. Sci. USA 111, 3871–3876 (2014).
Duanmu, D. et al. Retrograde bilin signaling enables Chlamydomonas greening and phototrophic survival. Proc. Natl Acad. Sci. USA 110, 3621–3626 (2013).
Diaz, M. & Pecinka, A. Scaffolding for repair: Understanding molecular functions of the SMC5/6 complex. Genes (Basel) 9, 36 (2018).
Prunuske, A. J. & Ullman, K. S. The nuclear envelope: form and reformation. Curr. Opin. Cell Biol. 18, 108–116 (2006).
Schirmer, E. C., Guan, T. & Gerace, L. Involvement of the lamin rod domain in heterotypic lamin interactions important for nuclear organization. J. Cell Biol. 153, 479–489 (2001).
Poulet, A., Probst, A. V., Graumann, K., Tatout, C. & Evans, D. Exploring the evolution of the proteins of the plant nuclear envelope. Nucleus 8, 46–59 (2017).
Cerutti, H. & Casas-Mollano, J. A. On the origin and functions of RNA-mediated silencing: from protists to man. Curr. Genet. 50, 81–99 (2006).
Shabalina, S. A. & Koonin, E. V. Origins and evolution of eukaryotic RNA interference. Trends Ecol. Evol. 23, 578–587 (2008).
Levine, B. & Kroemer, G. Biological functions of autophagy genes: a disease perspective. Cell 176, 11–42 (2019).
Shemi, A., Ben-Dor, S. & Vardi, A. Elucidating the composition and conservation of the autophagy pathway in photosynthetic eukaryotes. Autophagy 11, 701–715 (2015).
Mizushima, N. & Komatsu, M. Autophagy: renovation of cells and tissues. Cell 147, 728–741 (2011).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, (2015).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Zarmi, Y. et al. Enhanced algal photosynthetic photon efficiency by pulsed light. iScience 23, 101115 (2020).
Kanehisa, M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Grigoriev, I. V. et al. The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res. 40, D26–D32 (2012).
Nordberg, H. et al. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 42, D26–D31 (2014).
Pinnola, A. The rise and fall of Light-Harvesting Complex Stress-Related proteins as photoprotection agents during evolution. J. Exp. Bot. 70, 5527–5535 (2019).
Iha, C. et al. Genomic adaptations to an endolithic lifestyle in the coral-associated alga Ostreobium. Curr. Biol. 31, 1393–1402.e5 (2021).
Rolland, N. et al. Disruption of the plastid ycf10 open reading frame affects uptake of inorganic carbon in the chloroplast of Chlamydomonas. EMBO J. 16, 6713–6726 (1997).
Kasinsky, H. E., Lewis, J. D., Dacks, J. B. & Ausió, J. Origin of H1 linker histones. FASEB J. 15, 34–42 (2001).
Arriola, M. B. et al. Genome sequences of Chlorella sorokiniana UTEX 1602 and Micractinium conductrix SAG 241.80: implications to maltose excretion by a green alga. Plant J. 93, 566–586 (2018).
Morimoto, D., Yoshida, T. & Sawayama, S. Draft genome sequence of the astaxanthin-producing microalga Haematococcus lacustris strain NIES-144. Microbiol. Resour. Announc. 9, e00128-20 (2020).
Young, A. J. The photoprotective role of carotenoids in higher plants. Physiol. Plant. 83, 702–708 (1991).
Frank, H. A. & Cogdell, R. J. Carotenoids in photosynthesis. Photochem. Photobiol. 63, 257–264 (1996).
Ren, Y., Sun, H., Deng, J., Huang, J. & Chen, F. Carotenoid production from microalgae: biosynthesis, salinity responses and novel biotechnologies. Mar. Drugs 19, 713 (2021).
Kuroiwa, T. et al. Mitotic karyotype of the primitive red alga Cyanidioschyzon merolae 10D. Cytologia (Tokyo) 85, 107–113 (2020).
Miyakawa, I., Fujimura, R. & Kadowaki, Y. Use of the nuc1 null mutant for analysis of yeast mitochondrial nucleoids. J. Gen. Appl. Microbiol. 54, 317–325 (2008).
Provasoli, L. Artificial media for fresh-water algae: problems and suggestions. Ecol. Algae Spec. Pub 2, 84–96 (1960).
Nishimura, Y., Higashiyama, T., Suzuki, L., Misumi, O. & Kuroiwa, T. The biparental transmission of the mitochondrial genome in Chlamydomonas reinhardtii visualized in living cells. Eur. J. Cell Biol. 77, 124–133 (1998).
Kuroiwa, T. & Suzuki, T. An improved method for the demonstration of the in situ chloroplast nuclei in higher plants. Cell Struct. Funct. 5, 195–197 (1980).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42, e119 (2014).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Lemieux, C., Otis, C. & Turmel, M. Chloroplast phylogenomic analysis resolves deep-level relationships within the green algal class Trebouxiophyceae. BMC Evol. Biol. 14, 211 (2014).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Smith, S. A. & Dunn, C. W. Phyutility: a phyloinformatics tool for trees, alignments and molecular data. Bioinformatics 24, 715–716 (2008).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Chernomor, O., von Haeseler, A. & Minh, B. Q. Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65, 997–1008 (2016).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882 (1997).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
Felsenstein, J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783 (1985).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Lemieux, C., Turmel, M., Otis, C. & Pombert, J.-F. A streamlined and predominantly diploid genome in the tiny marine green alga Chloropicon primus. Nat. Commun. 10, 4061 (2019).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Fujimoto, S., Sugano, S. S., Kuwata, K., Osakabe, K. & Matsunaga, S. Visualization of specific repetitive genomic sequences with fluorescent TALEs in Arabidopsis thaliana. J. Exp. Bot. 67, 6101–6110 (2016).
Shimada, T. L. et al. HIGH STEROL ESTER 1 is a key factor in plant sterol homeostasis. Nat. Plants 5, 1154–1166 (2019).
Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 57, 289–300 (1995).
Brunner, E. & Munzel, U. The nonparametric Behrens-Fisher problem: asymptotic theory and a small-sample approximation. Biom. J. 42, 17–25 (2000).

Acknowledgements

This research was supported by MEXT/JSPS KAKENHI funding to T.K. (19H03260 and 22H02657) and S. Matsunaga (20H05911). It was also supported by JST-CREST (JPMJCR20S6) and JST-OPERA (JPMJOP1832) grants to S. Matsunaga. We thank Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.

Author information

Shinichiro Maruyama
Present address: Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562, Japan

Authors and Affiliations

Department of Applied Biological Science, Faculty of Science and Technology, Tokyo University of Science, Noda, Chiba, 278-8510, Japan
Shoichi Kato, Takuya Sakamoto & Sachihiro Matsunaga
Department of Biological Science and Chemistry, Faculty of Science, Graduate School of Medicine, Yamaguchi University, Yoshida, Yamaguchi, 753-8512, Japan
Osami Misumi
Department of Ecological Developmental Adaptability Life Sciences, Graduate School of Life Sciences, Tohoku University, Aobaku, Sendai, 980-8578, Japan
Shinichiro Maruyama
Graduate School of Humanities and Sciences, Ochanomizu University, Tokyo, 112-8610, Japan
Shinichiro Maruyama
Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Hongo, Tokyo, 113-0033, Japan
Hisayoshi Nozaki
Biodiversity Division, National Institute for Environmental Studies, Onogawa, Tsukuba, Ibaraki, 305-8506, Japan
Hisayoshi Nozaki, Shigekatsu Suzuki, Haruyo Yamaguchi & Masanobu Kawachi
Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba, 277-8562, Japan
Yayoi Tsujimoto-Inui, Nanami Ito, Yoji Okabe, Tomoko M. Matsunaga & Sachihiro Matsunaga
Department of Botany, Graduate School of Science, Kyoto University, Kyoto, 606-8502, Japan
Mari Takusagawa
Institute of Transformative Bio-Molecules (WPI-ITbM), Nagoya University, Chikusa, Nagoya, 464-8602, Japan
Keiko Kuwata
Division of Biological Science, Graduate School of Science, Nagoya University, Nagoya, Japan
Saki Noda & Yoshikatsu Matsubayashi
Center for Research Advancement and Collaboration, University of the Ryukyus, Okinawa, 903-0213, Japan
Fumi Yagisawa
Graduate School of Engineering and Science, University of the Ryukyus, Okinawa, 903-0213, Japan
Fumi Yagisawa
Department of Chemical and Biological Science, Faculty of Science, Japan Women’s University, Tokyo, 112-8681, Japan
Haruko Kuroiwa & Tsuneyoshi Kuroiwa

Contributions

S.K., O.M., S. Matsunaga, and T.K. designed the project. H.K. and T.K. performed the morphological analyses of M. hakoo, C. merolae, and S. cerevisiae. S.K., S.S., H.Y., and M.K. analyzed the nuclear genome data. M.T. analyzed the organellar genome data. H.N. and S. Maruyama performed the phylogenetic analysis. O.M., Y.T.I., and M.T. extracted the genomic DNA. Y.T.I., K.K., S.N., and Y.M. performed the proteome analysis. T.K. and H.K. performed the optical and electron microscopy analyses. Y.T.I. and T.M.M. performed the lipid accumulation assay. S.K., S. Maruyama, T.S., N.I., Y.O., F.Y., and S. Matsunaga analyzed the conserved genes in M. hakoo. S.K., S. Matsunaga, S. Maruyama, H.N., M.T., T.S., N.I., and Y.O. wrote the manuscript, which was edited by the other authors.

Corresponding authors

Correspondence to Tsuneyoshi Kuroiwa or Sachihiro Matsunaga.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Shahid Mukhtar, Caitlin Karniski and George Inglis. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Peer Review File

Supplementary Information

Description of Additional Supplementary Data

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kato, S., Misumi, O., Maruyama, S. et al. Genomic analysis of an ultrasmall freshwater green alga, Medakamo hakoo. Commun Biol 6, 89 (2023). https://doi.org/10.1038/s42003-022-04367-9

Received: 17 November 2021
Accepted: 12 December 2022
Published: 23 January 2023
DOI: https://doi.org/10.1038/s42003-022-04367-9

