Ultrasmall algae have attracted the attention of biologists investigating the basic mechanisms underlying living systems. Their potential as effective organisms for producing useful substances is also of interest in bioindustry. Although genomic information is indispensable for elucidating metabolism and promoting molecular breeding, many ultrasmall algae remain genetically uncharacterized. Here, we present the nuclear genome sequence of an ultrasmall green alga of freshwater habitats, Medakamo hakoo. Evolutionary analyses suggest that this species belongs to a new genus within the class Trebouxiophyceae. Sequencing analyses revealed that its genome, comprising 15.8 Mbp and 7629 genes, is among the smallest known genomes in the Viridiplantae. Its genome has relatively few genes associated with genetic information processing, basal transcription factors, and RNA transport. Comparative analyses revealed that 1263 orthogroups were shared among 15 ultrasmall algae from distinct phylogenetic lineages. The shared gene sets will enable identification of genes essential for algal metabolism and cellular functions.
Microalgae are microscopic unicellular phytoplankton found in freshwater, seawater, and sediment, and are invisible to the naked eye1. Microalgae form the basis of the food chain in aquatic ecosystems and play important roles in carbon dioxide capture and sequestration through photosynthesis2. Despite their ecological importance in providing energy to support all higher trophic levels, more than 70% of microalgal species are estimated to remain unidentified3. Microalgae have been used to produce highly functional foods, biofuels, and materials used in cosmetics1. To improve the production efficiency and profitability of current algal culture systems, there is increasing demand for particularly small microalgae that can be cultured at high densities.
We focused on Medakamo hakoo (Chlorophyta), an ultrasmall freshwater algal species that may provide notable insights into the genome biology of algae. Medakamo hakoo was first identified and reported in 20154. A previous study involving DNA staining revealed that M. hakoo likely has the smallest known nucleus among Archaeplastida species5. Although some microalgae inhabiting seawater and hot springs have extremely simple genomes6,7,8,9,10, relatively few freshwater algae with extremely small genomes have been reported. Genomic analysis of M. hakoo is expected to provide useful information for developing effective culture methods for the optimal production of useful substances. Genomic information for M. hakoo will also contribute to understanding how eukaryotic phototrophs thrive in diverse environments. In addition, comparing the genomes of M. hakoo and other ultrasmall algal species is an effective strategy for identifying the gene set common to algal species and the genes common to green algal lineages.
In this study, we first characterized the morphological features of M. hakoo and the synchronization of its cell cycle. Next, we assembled the M. hakoo genome sequence from long reads generated using the PacBio RS II system and annotated it with the aid of transcriptome (RNA-seq) data. Finally, a comparison of the genomes of M. hakoo and 14 other microalgal species revealed that M. hakoo has one of the smallest genomes among freshwater algae and identified 1263 gene families conserved among these microalgae.
Investigation of M. hakoo cellular characteristics
To characterize M. hakoo morphology, we first used SYBR Green I stain to label M. hakoo, Cyanidioschyzon merolae, and Saccharomyces cerevisiae nuclei, and observed that the fluorescence intensity of the M. hakoo nuclei was similar to that of C. merolae and S. cerevisiae nuclei (Fig. 1a, b, Supplementary Fig. 1). Cyanidioschyzon merolae is an ultrasmall unicellular red alga with the smallest genome in Rhodophyta6,7. Fluorescence and transmission electron microscopic examination indicated that M. hakoo cells were approximately 1 µm in diameter and contained relatively few organelles, with only a single mitochondrion and a single chloroplast (Fig. 1c–g, Supplementary Fig. 2). Notably, a distinctive electron-dense structure in the nuclear peripheral region and thick cell walls were typical characteristics of M. hakoo cells (Fig. 1f, Supplementary Fig. 2d, e). Another structural feature of M. hakoo cells was the accumulation of starch aggregates in the chloroplast (Supplementary Fig. 2d), whereas in C. merolae, starch aggregated in the cytoplasm (Supplementary Fig. 2a, c). Additionally, phycobilisomes were undetectable in M. hakoo chloroplasts (Fig. 1f, Supplementary Fig. 2). To examine the M. hakoo cell division pattern, we cultured cells under a light–dark cycle to obtain highly synchronized cells and detected the following cell-cycle stages: a single-cell stage (I), a two-cells-combined stage (II), a tetrad stage (III), and a dissection stage (IV) (Fig. 1h, i, Supplementary Fig. 3). In addition, M. hakoo cells cultured in nitrogen-depleted medium typically formed lipid droplets, similar to the response of the oil-rich alga Botryococcus braunii (Supplementary Fig. 4)11,12,13.
Sequencing and evolutionary analysis of the M. hakoo genome
From the long-read sequencing analysis, we obtained 18 contigs via de novo sequence assembly (Table 1, Supplementary Table 1), two of which were annotated as organellar genomes because they were circular. For the phylogenetic analysis, we used the M. hakoo plastid genome, which we previously described14. A phylogenetic tree was constructed from the plastid genome sequences of 62 chlorophyte taxa using the maximum-likelihood (ML) method (Fig. 2a). The resulting tree suggested that M. hakoo belongs to the class Trebouxiophyceae. Additionally, M. hakoo is likely evolutionarily related to B. braunii, which shows potential for algal fuel production15,16 (Supplementary Fig. 4).
In our phylogenetic analysis, Medakamo and Choricystis formed a small clade sister to Botryococcus (Fig. 2a). Many algal strains originating from various freshwater habitats were recently identified as Choricystis species, and their rbcL sequences are available in the NCBI database (e.g., Novis et al.17). In addition, three Choricystis species were identified mainly on the basis of phylogenetic analyses by Pröschold and Darienko18. To more precisely resolve the phylogenetic relationships between Medakamo and Choricystis, 54 Choricystis rbcL sequences in the NCBI database, two new rbcL sequences from strains studied by Pröschold and Darienko18, and the Medakamo rbcL sequence were included in a phylogenetic analysis, with Botryococcus sequences as the outgroup (Fig. 2b). The phylogenetic tree robustly resolved two sister clades (with bootstrap values of 83% or higher) that corresponded to Choricystis and the new genus Medakamo. Medakamo comprises M. hakoo sp. nov. and M. limnetica comb. nov. (= Choricystis limnetica) and can be clearly distinguished from Choricystis and other freshwater green algae on the basis of their phylogenetic position and differences in cell morphology (Supplementary Note 1). Thus, Medakamo is recognized as a new genus within the class Trebouxiophyceae.
Nuclear genome analysis
The 16 non-organellar contigs were chromosomal sequences flanked by telomere sequences (5′-TTAGGG-3′). The total contig size was 15.8 Mb (Tables 1, 2), which was larger than the genome size estimated by fluorescence microscopy5. Together, these observations indicate that the contigs represent almost the complete nuclear genome sequence.
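A minimal Python sketch of how telomere flanking could be checked for each contig is shown below. It assumes that a chromosome-end contig carries the C-rich repeat (CCCTAA)n at its 5′ end and (TTAGGG)n at its 3′ end, accepts either orientation at each end, and uses a hypothetical 200-bp window and a threshold of 10 repeat copies; these parameters are illustrative and are not the criteria used in this study.

```python
# Minimal sketch: does a contig carry telomeric repeats at both ends?
# Window size and repeat threshold are hypothetical, illustrative values.

TELOMERE = "TTAGGG"  # G-rich strand, expected at the 3' end of a chromosome


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def end_has_repeat(segment: str, unit: str, min_copies: int) -> bool:
    """True if the segment holds >= min_copies non-overlapping copies of the
    repeat unit on either strand."""
    segment = segment.upper()
    rc_unit = reverse_complement(unit)
    return max(segment.count(unit), segment.count(rc_unit)) >= min_copies


def has_telomere_ends(contig: str, window: int = 200, min_copies: int = 10) -> bool:
    """Check both contig ends for (TTAGGG)n / (CCCTAA)n repeats."""
    left = contig[:window]
    right = contig[-window:]
    return (end_has_repeat(left, TELOMERE, min_copies)
            and end_has_repeat(right, TELOMERE, min_copies))


# Toy contig: C-rich repeats at the 5' end, G-rich repeats at the 3' end.
toy_contig = "CCCTAA" * 20 + "ACGT" * 100 + "TTAGGG" * 20
print(has_telomere_ends(toy_contig))      # True
print(has_telomere_ends("ACGT" * 200))    # False
```

Applied to each nuclear contig, a check of this kind would flag any contig lacking a telomeric end, which could indicate an assembly break rather than a complete chromosome.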
Next, we validated the genome assembly against several benchmarks. The sequence coverage of the assembly was 246.8×. To further evaluate the assembled genome, we conducted a Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis19,20. After removal of the organellar contigs, the assembly contained 89.5% complete BUSCOs (Supplementary Fig. 5). We also mapped RNA-seq reads to the contigs, and 98.6% of the reads aligned successfully. Moreover, tRNAs corresponding to all 20 amino acids were identified (Supplementary Table 2).
Using the BRAKER2 software21, we identified 7629 candidate protein-coding sequence (CDS) regions in the nuclear genome and annotated them using eggNOG-mapper22 (Supplementary Data 1) and GhostKOALA23 (Supplementary Data 2). A BUSCO analysis of the predicted amino acid sequences (protein mode) identified 91.0% complete BUSCOs (Supplementary Fig. 5).
Effect of a high G + C content on the amino acid content in the M. hakoo proteome and gene expression
A high G + C content (73 mol%) is one characteristic of the M. hakoo genome (Table 2). The G + C content was also high in protein-coding regions (Supplementary Table 3). A high G + C content in protein-coding regions may increase the frequency of amino acids encoded by G + C-rich codons, such as alanine (GCN), glycine (GGN), and proline (CCN) (Fig. 3a). We analyzed the amino acid composition of the predicted M. hakoo proteins and compared it with that of other microalgae. Our analysis showed that M. hakoo proteins contained many alanine, glycine, and proline residues; in particular, alanine was more abundant in the M. hakoo proteome than in the C. merolae proteome (Fig. 3b). To investigate the relationship between gene expression and the G + C content of the CDSs, we plotted the G + C content against the transcripts per million (TPM) value for each gene (Fig. 3c), which revealed a negative correlation between the two. A negative correlation was also detected between the alanine content and the TPM value for each gene (Fig. 3d). These results suggest that highly expressed genes (e.g., housekeeping genes) are relatively unaffected by the bias toward a high G + C content in the M. hakoo genome.
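The per-gene quantities behind Fig. 3c, d can be sketched in Python as follows. The correlation statistic used in the study is not stated here, so this sketch assumes a Spearman rank correlation (implemented directly to avoid external dependencies); the CDS sequences and TPM values in the usage example are toy data.

```python
from statistics import mean


def gc_content(seq: str) -> float:
    """Fraction of G + C in a nucleotide sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def alanine_fraction(protein: str) -> float:
    """Fraction of alanine (A) residues in a protein sequence."""
    p = protein.upper()
    return p.count("A") / len(p) if p else 0.0


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy usage: three CDSs and their TPM values (invented numbers).
cds = ["ATGGCCGCGGGC", "ATGAAATTTGCA", "ATGGCTAAATAA"]
tpm = [5.0, 80.0, 40.0]
gc_values = [gc_content(s) for s in cds]
rho = spearman(gc_values, tpm)
print(gc_values, rho)
```

The same `spearman` call applied to `alanine_fraction` values of the predicted proteins would give the analogue of the alanine–TPM comparison.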
We further analyzed M. hakoo expression patterns using proteomics, which detected more than 3000 unique peptides across samples (Supplementary Data 3). All codons were represented in the proteins detected (Table 3, Supplementary Fig. 6), suggesting that M. hakoo cells use a standard codon table, with the caveat that mass spectrometry cannot distinguish between leucine and isoleucine residues.
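How detected peptides can be traced back to the codons that encode them can be sketched as below, assuming each peptide is matched to a predicted CDS translated with the standard genetic code (NCBI translation table 1). Because mass spectrometry does not distinguish leucine from isoleucine, the sketch treats I and L as equivalent when matching. The function names are hypothetical, and this is not the study's actual pipeline.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon; unknown codons become 'X', stops '*'."""
    s = cds.upper()
    return "".join(CODON_TABLE.get(s[i:i + 3], "X")
                   for i in range(0, len(s) - len(s) % 3, 3))


def codons_covered(cds: str, peptide: str) -> set:
    """Codons of `cds` encoding any occurrence of `peptide`.
    I and L are treated as identical because MS cannot separate them."""
    protein = translate(cds).replace("I", "L")
    target = peptide.upper().replace("I", "L")
    covered = set()
    start = protein.find(target)
    while start != -1 and target:
        for pos in range(start, start + len(target)):
            covered.add(cds[3 * pos:3 * pos + 3].upper())
        start = protein.find(target, start + 1)
    return covered


cds = "ATGGCCATTAAAGGGTAA"             # toy CDS: M A I K G *
print(translate(cds))                   # MAIKG*
print(codons_covered(cds, "ALK"))       # GCC, ATT and AAA (set order may vary)
```

Pooling `codons_covered` over all peptide–CDS matches would show which of the 61 sense codons are supported by at least one detected peptide.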
Relationship between the extremely high G + C content of the M. hakoo genome and the notable accumulation of the guanine quadruplex consensus sequence
To determine the reason for the high G + C content in the M. hakoo genome, we examined G + C content-related characteristics of the genome sequence. We focused on the guanine quadruplex (G4) structure, a non-B DNA structure that forms in guanine-rich DNA regions24,25,26. In G4 structures, guanines form stacked tetrads, and these structures are involved in various biological processes, including DNA replication, transcription, meiotic double-strand break formation, and telomere maintenance24,25,26. To predict the abundance of G4 in the genomes of M. hakoo and other organisms, we used the pqsfinder software, which detects G4 consensus sequences in DNA27 (Fig. 3e). The frequency of predicted G4 regions was highest in M. hakoo among the analyzed organisms. Notably, it was higher than that in Streptomyces coelicolor, which has one of the highest genomic G + C contents among prokaryotes.
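pqsfinder uses its own scoring model that tolerates bulges and mismatches, so the following Python sketch is only a simplified stand-in: it counts matches to the classic regular-expression motif of four runs of at least three guanines separated by loops of 1–7 nt, on both strands (C-runs on the given strand represent G-runs on the complementary strand), and normalizes the count per megabase. Counts obtained this way are not directly comparable with pqsfinder output.

```python
import re

# Classic G4 motif: >= 4 G-runs of >= 3 nt separated by loops of 1-7 nt.
# This is a simplified regex, NOT pqsfinder's scoring algorithm.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
# C-runs on the given strand correspond to G-runs on the complementary strand.
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def count_g4_motifs(seq: str) -> int:
    """Non-overlapping G4 motif matches on both strands."""
    s = seq.upper()
    return len(G4_PLUS.findall(s)) + len(G4_MINUS.findall(s))


def g4_per_mb(seq: str) -> float:
    """Motif frequency normalized to sequence length (hits per Mb)."""
    return count_g4_motifs(seq) / (len(seq) / 1_000_000)


print(count_g4_motifs("GGGTTGGGTTGGGTTGGG"))   # 1 (G-rich strand)
print(count_g4_motifs("CCCAACCCAACCCAACCC"))   # 1 (complementary strand)
print(count_g4_motifs("ACGTACGTACGT"))         # 0
```

Normalizing by length in this way is what makes genomes of very different sizes, such as M. hakoo and S. coelicolor, comparable on a single frequency scale.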
Comparison between the M. hakoo genome and the genomes of other species
We compared the M. hakoo genome with previously sequenced genomes in terms of genome size and gene number (Fig. 4a; Table 2). The M. hakoo genome was one of the smallest among Viridiplantae species and was smaller than that of Auxenochlorella protothecoides, which has the smallest genome previously known among freshwater green algae (Fig. 4a). We classified the M. hakoo nuclear genes according to Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology28 and determined the number of genes belonging to each functional group. First, we calculated the number of genes in the most enriched KEGG Orthology categories and compared the results with those for other green algae (Fig. 4b). Overall, M. hakoo had fewer genes than the other examined species and, most notably, fewer genes in the ‘Genetic information processing’ category. Next, we determined the number of genes annotated with specific pathways in the ‘Genetic information processing’ category. For many of these pathways, M. hakoo tended to have fewer genes than other green algal species (Fig. 4c). On the basis of the gene sets, the ‘Basal transcription factors’ and ‘Nucleocytoplasmic transport’ pathways were considerably simpler in M. hakoo than in Ostreococcus lucimarinus, a marine green alga with a smaller genome than M. hakoo (Fig. 4c).
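Counting genes per KEGG category from a gene-to-KO assignment table can be sketched as follows. The sketch assumes a two-column, tab-separated format (gene ID and KO identifier, with the KO column empty for unassigned genes) and a KO-to-category lookup supplied by the user; the two-entry lookup in the example is a toy excerpt. In KEGG a KO can belong to more than one category, which this simplified version ignores.

```python
from collections import Counter


def parse_ko_table(lines):
    """Parse 'gene_id<TAB>KO' lines (assumed format); unassigned genes are skipped."""
    assignments = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1].strip():
            assignments[fields[0]] = fields[1].strip()
    return assignments


def genes_per_category(assignments, ko_to_category):
    """Count genes per category; KOs missing from the lookup are ignored."""
    counts = Counter()
    for ko in assignments.values():
        category = ko_to_category.get(ko)
        if category is not None:
            counts[category] += 1
    return counts


# Toy excerpt of a KO -> top-level category lookup (illustrative only).
ko_to_category = {
    "K03006": "Genetic information processing",  # RNA polymerase II largest subunit
    "K00134": "Metabolism",                      # glyceraldehyde-3-phosphate dehydrogenase
}
lines = ["g1\tK03006", "g2\tK00134", "g3", "g4\tK00134"]
counts = genes_per_category(parse_ko_table(lines), ko_to_category)
print(counts)
```

Running the same count for each species with a full lookup table would yield per-category gene numbers comparable across genomes.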
Conservation of important biological pathways
To more precisely determine the characteristics of the M. hakoo genome, we analyzed the conservation of genes associated with major biological pathways. Although almost all fundamental photosynthesis-related genes were conserved in M. hakoo, genes encoding the stress-related light-harvesting complex (LHC)-like proteins (LHCSR) were not detected in the M. hakoo genome (Supplementary Figs. 7, 8). Regarding photoreceptors, M. hakoo had cryptochrome and phototropin orthologs but lacked a phytochrome homolog. Although several green algae have phytochromes29, the absence of phytochrome in M. hakoo is consistent with its absence in the green alga Chlamydomonas reinhardtii30 and the red alga C. merolae6.
Our analysis also confirmed that M. hakoo had a set of conserved canonical histones, including H2A, H2B, H3, and H4. However, we did not detect an ortholog of the linker histone H1 in the M. hakoo genome. To assess the conservation of other chromosomal proteins, we analyzed the structural maintenance of chromosomes (SMC) protein complexes (cohesin, condensin I/II, and SMC5/6), which are key regulators of chromosomal organization, dynamics, and stability31. Canonical components of all SMC complexes were conserved in M. hakoo (Supplementary Table 4).
We next investigated whether nuclear envelope (NE)-related genes were conserved in M. hakoo. The NE is composed of the outer nuclear membrane, the inner nuclear membrane, and the nuclear pore complex, and is mechanically supported by the nuclear lamina beneath the inner nuclear membrane32,33. Notably, relatively few NE-related genes were identified in the M. hakoo genome. Nuclear pore complex components (nucleoporins) were highly conserved, whereas other NE-related factors were minimally conserved. The only conserved inner nuclear membrane protein was a single ortholog of the mid-Sad1/UNC84 domain-containing protein (mid-SUN)34, and none of the outer nuclear membrane or nuclear lamina proteins were conserved.
RNA interference (RNAi) is a molecular mechanism that regulates gene expression via small RNAs and associated proteins. Although RNAi-related proteins, such as Dicer and Argonaute, are widely conserved among eukaryotes35,36, genes encoding these proteins were not detected in the M. hakoo genome. The genomes of other microalgae, including C. merolae and O. tauri, also appear to have lost RNAi-related genes (Supplementary Table 5).
Autophagy involves the transport of cytoplasmic cargo to the lysosome for degradation37. This mechanism is crucial for cellular homeostasis because it allows cells to recycle their own components. However, no orthologs of autophagy-related genes are known in red algae38. By screening the M. hakoo genome, we identified conserved autophagy-related (ATG) genes (Supplementary Table 6), including some that encode the core ATG proteins required for autophagosome formation39. This suggests that the autophagy system is conserved in chlorophytes but not in rhodophytes.
Analysis of orthogroup composition in M. hakoo and other algae
For further analysis of gene composition, we used OrthoFinder40,41 to identify orthogroups among M. hakoo and 14 other algal species: 11 green algae (A. protothecoides, Bathycoccus prasinos, Chlamydomonas reinhardtii, Chlorella variabilis, Coccomyxa subellipsoidea, Micromonas commoda, Micromonas pusilla, Monoraphidium neglectum, O. lucimarinus, O. tauri, and Volvox carteri) and three red algae (Chondrus crispus, Cyanidioschyzon merolae, and Galdieria sulphuraria). The composition of each algal orthogroup is summarized in Supplementary Data 4, and the data were analyzed using principal component analysis (Fig. 5a). The microalgal genomes were classifiable into several groups according to their orthogroup composition. Medakamo hakoo belonged to a group composed of trebouxiophycean algae (A. protothecoides, C. subellipsoidea, and C. variabilis) and had the smallest genome in this group.
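A principal component analysis of an orthogroup table can be sketched with NumPy as below. This assumes the input is a species × orthogroup matrix of gene counts (for example, a per-species count table derived from OrthoFinder output) that is mean-centred but not scaled; whether the study used counts or presence/absence, and whether columns were scaled, is not stated, so these choices are assumptions.

```python
import numpy as np


def pca_scores(counts, n_components=2):
    """PCA of a species x orthogroup matrix via SVD of the centred data.

    Returns (scores, explained_variance_ratio) for the first n_components.
    """
    X = np.asarray(counts, dtype=float)
    X = X - X.mean(axis=0)                     # centre each orthogroup column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Toy matrix: 4 species x 3 orthogroups (invented gene counts).
species = ["sp1", "sp2", "sp3", "sp4"]
counts = [[1, 1, 0],
          [1, 1, 0],
          [0, 0, 3],
          [0, 0, 3]]
scores, explained = pca_scores(counts)
for name, (pc1, pc2) in zip(species, scores):
    print(f"{name}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("explained variance ratio:", explained)
```

Species with similar orthogroup profiles receive similar scores, which is how lineage-level groupings emerge in a plot of the first two components. The sign of each component is arbitrary.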
Analysis of shared genes among microalgae
Given that M. hakoo had the smallest genome known among trebouxiophycean microalgae, we compared the orthogroups of M. hakoo and 14 other microalgal species from diverse lineages to identify the gene set common to all 15 species (C15 gene set). We added the S. cerevisiae gene set to the OrthoFinder analysis to divide the C15 gene set into two classes: genes typically conserved in eukaryotes (CE gene set) and algae-specific genes (AS gene set). The AS gene set was defined with a purposefully lenient criterion to capture as much of the potential diversity of microalgal orthologs as possible.
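The partitioning into C15, CE, and AS sets can be expressed with set operations, as in the Python sketch below. It assumes that an orthogroup belongs to C15 when every one of the algae contributes at least one gene, to CE when S. cerevisiae also contributes a gene, and to AS otherwise. The study describes its AS criterion only as purposefully lenient, so the exact rule may differ; the species labels in the example are placeholders.

```python
def split_shared_orthogroups(orthogroup_species, algae, outgroup):
    """Split orthogroups shared by all algae into CE and AS sets.

    orthogroup_species: dict of orthogroup ID -> set of species with >= 1 gene.
    Assumed rules: C15 = present in every alga; CE = C15 and also present in
    the outgroup (S. cerevisiae); AS = C15 minus CE.
    """
    algae = set(algae)
    c15 = {og for og, present in orthogroup_species.items() if algae <= present}
    ce = {og for og in c15 if outgroup in orthogroup_species[og]}
    as_set = c15 - ce  # 'as' is a Python keyword, hence as_set
    return c15, ce, as_set


# Toy example with three placeholder algae instead of fifteen.
algae = ["alga1", "alga2", "alga3"]
orthogroups = {
    "OG0001": {"alga1", "alga2", "alga3", "yeast"},
    "OG0002": {"alga1", "alga2", "alga3"},
    "OG0003": {"alga1", "alga2", "yeast"},
}
c15, ce, as_set = split_shared_orthogroups(orthogroups, algae, outgroup="yeast")
print(sorted(c15), sorted(ce), sorted(as_set))
# ['OG0001', 'OG0002'] ['OG0001'] ['OG0002']
```

Under these rules CE and AS are disjoint and together make up C15, matching the 984 + 279 = 1263 split reported for the real data.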
In total, 1263 orthogroups constituted the C15 gene set, of which 984 and 279 belonged to the CE and AS gene sets, respectively (Fig. 5b, Supplementary Data 5). In the M. hakoo genome, genes in the CE and AS gene sets tended to be more highly expressed than genes not included in these sets (Fig. 5c, Supplementary Table 7), suggesting that the gene sets include many housekeeping genes. Next, we compared the proportion of genes assigned to each KEGG Orthology category in each gene set with that in the whole M. hakoo genome, which revealed that the proportion of genes associated with metabolism was substantially higher in the AS gene set (Fig. 5d).
We also analyzed the pathways enriched in these gene sets. To evaluate the extent of enrichment, we mapped the C15 and AS gene sets to KEGG pathways and, for each pathway, calculated the ratio of mapped AS genes to mapped C15 genes. We considered a high ratio to indicate that the pathway contributed strongly to the characteristics of the AS gene set. The highly enriched pathways were associated with metabolism and photosynthesis (Fig. 5e, Table 4). Notably, the AS genes were mainly associated with pathways involved in secondary metabolism (Fig. 5e, Table 4).
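The per-pathway ratio of AS genes to C15 genes can be sketched as follows, assuming per-pathway counts of mapped genes are already available for each set. Because the AS set is a subset of C15, each ratio lies between 0 and 1. The pathway counts in the example are invented for illustration.

```python
def as_to_c15_ratios(c15_counts, as_counts):
    """Per-pathway ratio of mapped AS genes to mapped C15 genes.

    Both arguments map pathway name -> number of mapped genes. Because AS is a
    subset of C15, each ratio lies in [0, 1]; pathways without C15 genes are skipped.
    """
    return {p: as_counts.get(p, 0) / n for p, n in c15_counts.items() if n > 0}


# Invented counts for illustration only.
c15_counts = {"Ribosome": 40, "Carotenoid biosynthesis": 8,
              "Glycolysis / Gluconeogenesis": 20}
as_counts = {"Carotenoid biosynthesis": 6, "Glycolysis / Gluconeogenesis": 2}
ratios = as_to_c15_ratios(c15_counts, as_counts)
for pathway, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pathway}: {r:.2f}")
```

Sorting by ratio puts pathways dominated by algae-specific genes at the top, while pathways populated mainly by broadly conserved eukaryotic genes fall to the bottom.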
In this study, we revealed that the freshwater green alga M. hakoo is an ultrasmall microalga, with cells approximately 1 µm in diameter, that can accumulate useful substances, including starch and lipids, and thus may be useful for bioproduction. The most distinctive characteristic of M. hakoo is the strong synchronization of its cell cycle under a light–dark cycle. Recently, it was shown that the key to maintaining dense algal cultures is to avoid clogging of the photosynthetic electron flow by appropriately regulating the timing of light–dark cycles42. Thus, this synchronization may contribute to the effective and stable bioproduction of useful substances of uniform quality1. In addition, M. hakoo has an extremely small genome. On the basis of the available information on Viridiplantae species in the KEGG Organisms database43 and the JGI genome portal44,45, M. hakoo likely possesses one of the smallest genomes among freshwater Viridiplantae species. This finding suggests that maintaining a small genome and small cells is advantageous for survival not only in seawater and extreme environments but also in common freshwater environments. According to the package effect11,12,13, small cells are advantageous for photosynthesis. Although the mechanisms underlying the maintenance of ultrasmall cells in microalgae remain unknown, general factors, such as photosynthetic efficiency, may have driven a decrease in genome size or suppressed genomic expansion.
In green plants, LHCSR and PSBS are conserved LHC-like proteins with functions associated with non-photochemical quenching46. Although the underlying molecular mechanism remains somewhat uncertain, LHCSR plays a major role in the dissipation of excess light energy in C. reinhardtii. In land plants, both LHCSR and PSBS help to dissipate excess energy in the moss Physcomitrium, whereas Arabidopsis thaliana lacks LHCSR and uses PSBS as a central component of its light energy dissipation machinery46. Additionally, PSBS is universally conserved in Viridiplantae, except in bryopsidalean algae (Ostreobium and Caulerpa), which may have adapted to diverse light conditions47. In M. hakoo, the absence of LHCSR and the presence of the plastid genome-encoded cemA, which is required for high-light stress tolerance in Chlamydomonas48, suggest that this alga has a distinct high-light acclimation mechanism that may have evolved convergently in flowering plants and in an early-branching green alga (Chloropicon) (Supplementary Figs. 7, 8).
The M. hakoo genome does not contain a CDS for histone H1. Histone H1 includes the H15 domain, a globular domain comprising a winged-helix motif49. Green algae, including Chlorella sorokiniana50 and Haematococcus lacustris51, and red algae, including C. merolae6, have histone H1 orthologs or H15 domain-containing proteins. Similar to M. hakoo, the genome of O. tauri lacks a gene encoding a protein with the H15 domain8. Because the genes encoding SMC proteins, including those in condensin and cohesin, are conserved in these genomes, chromatin folding and chromosome condensation in these green algae presumably occur without the linker histone H1.
The M. hakoo genome has a very high G + C content and abundant G4 consensus sequences. The G4 structure, which is commonly located in telomeres but is also present within chromosomes, has diverse functions, including transcriptional and translational regulation. A high frequency of the G4 consensus sequence thus appears to be a characteristic of the M. hakoo genome, and we speculate that G4-related biological processes may have contributed to the elevated genomic G + C content of this species during its evolution.
Principal component analysis of the orthogroup composition of M. hakoo and 14 other microalgal species resolved multiple groups of species. Interestingly, M. hakoo and Ostreococcus, both of which have extremely small genomes, were placed in separate groups. This result suggests that ortholog composition in microalgae does not depend on genome size but rather may reflect lineage-specific gene gains and losses. The AS gene set closely reflects the genomic, metabolic, and physiological characteristics of microalgae. For example, the AS gene set included genes associated with terpenoid-related secondary metabolites. Carotenoids, a subgroup of tetraterpenoids, serve as antenna pigments for light harvesting and provide protection against oxidative stress52,53; they are also beneficial for human health. Large-scale production of carotenoids for health-related industries using microalgae is flourishing54. The AS gene set determined in the current study provides information relevant to the bioengineering of microalgae. In addition to the M. hakoo genome described in this study, genome sequences for a broad range of other ultrasmall algae would provide a foundation for identifying the minimal conserved gene set of plants (algae and land plants) and for understanding how photosynthetic eukaryotes thrive in diverse environments.
Medakamo hakoo 311 was obtained from the personal aquarium of Prof. Kuroiwa (Kagurazaka, Tokyo, Japan)4. The M. hakoo strain was cultured in 0.05% HYPONeX (HYPONeX Japan Corp., Ltd., Osaka, Japan) liquid medium and on plates of 0.05% HYPONeX gellan gum-based solid medium. Cyanidioschyzon merolae 10D (Toda et al. 1995) was cultured in Misumi–Kuroiwa medium at pH 2.2 and 42 °C55. The Misumi–Kuroiwa medium was prepared by diluting 1 mL of a commercial nutrient solution (Hyponex, N:P:K 10:8:8; Hyponex Japan, Osaka, Japan) to 1 L with distilled or tap water, and its pH was adjusted to 2.2 with 1 mL concentrated HCl. The diploid Saccharomyces cerevisiae strain BY4743 was cultured at 30 °C in YPD medium containing 1% yeast extract (Oriental Yeast Co., Ltd., Tokyo, Japan), 2% peptone (Kyokuto Co., Ltd., Tokyo, Japan), and 2% glucose56. Botryococcus braunii Kützing (NIES-2199) was obtained from the Microbial Culture Collection at the National Institute for Environmental Studies (Japan) and cultured in AF-6 medium57 at 22 °C.
For cell-cycle synchronization, cultures were grown under a 12-h light/12-h dark cycle. During the mitotic phase, M. hakoo cells were sampled every hour and examined under a microscope, and cells at each stage (one-cell, two-cell, and four-cell) were counted. More than five fields of view (1 × 10³ µm²) were examined per sample. This experiment was performed several times, and representative results are presented.
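Summarizing the hourly counts as stage proportions can be sketched as below, assuming that counts from all fields of view in one sample are pooled before dividing by the total. The stage labels follow the text; the counts are toy values, and the study's exact tabulation is not described.

```python
from collections import Counter


def stage_fractions(fields):
    """Pool per-field stage counts for one sample and return stage proportions."""
    total = Counter()
    for field in fields:
        total.update(field)               # Counter.update adds the counts
    n = sum(total.values())
    return {stage: count / n for stage, count in total.items()} if n else {}


# Toy counts from two fields of view at one hourly time point.
fields = [{"one-cell": 6, "two-cell": 2},
          {"one-cell": 2, "four-cell": 10}]
print(stage_fractions(fields))
# {'one-cell': 0.4, 'two-cell': 0.1, 'four-cell': 0.5}
```

Repeating this for each hourly sample gives a time course in which synchronized division appears as a transient rise in the two-cell and four-cell fractions.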
The M. hakoo, C. merolae, and S. cerevisiae cells were stained with 4′,6-diamidino-2-phenylindole (DAPI) and SYBR Green I (Molecular Probes, Eugene, OR, USA)5. SYBR Green I stain, which has been used to examine cell nuclei in various algae because it is unaffected by the genomic G + C content58, was used to confirm the presence of the cell nuclei and organelle nucleoids revealed by conventional DAPI staining59. Cultures were centrifuged and the resulting pellet was resuspended, after which a 3 µL aliquot of the suspension was placed on a glass slide to form a droplet. Next, 3 µL 1% (v/v) glutaraldehyde in NS buffer (0.25 M sucrose, 1 mM EDTA, 7 mM 2-mercaptoethanol, 0.8 mM PMSF, 1 mM magnesium chloride, 0.1 mM calcium chloride, 0.1 mM zinc sulfate, and 20 mM Tris-HCl, pH 7.6) was added to the droplet, followed by 3 µL DAPI (15 µg mL−1) or 3 µL SYBR Green I (1 µg mL−1). A coverslip was placed on the droplet and gently pressed. The stained samples were observed using an Olympus BH-2 BHS epifluorescence microscope.
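Because equal 3 µL volumes of cell suspension, glutaraldehyde solution, and stain are combined, each reagent is diluted threefold on the slide. Assuming complete mixing and additive volumes, C₁V₁ = C₂V₂ gives the approximate final concentrations computed below (about 0.33% glutaraldehyde and 5 µg mL−1 DAPI).

```python
def final_concentration(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """C1 * V1 = C2 * V2, solved for C2 (volumes in the same unit)."""
    return stock_conc * stock_volume / total_volume


total_ul = 3 + 3 + 3  # cell suspension + glutaraldehyde in NS buffer + stain (µL)
glutaraldehyde_pct = final_concentration(1.0, 3, total_ul)   # % (v/v)
dapi_ug_per_ml = final_concentration(15.0, 3, total_ul)      # µg/mL
sybr_ug_per_ml = final_concentration(1.0, 3, total_ul)       # µg/mL
print(f"glutaraldehyde {glutaraldehyde_pct:.2f}%, DAPI {dapi_ug_per_ml:.1f} µg/mL, "
      f"SYBR Green I {sybr_ug_per_ml:.2f} µg/mL")
# glutaraldehyde 0.33%, DAPI 5.0 µg/mL, SYBR Green I 0.33 µg/mL
```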
Transmission electron microscopy
Electron microscopy analyses were performed as previously described4. Briefly, M. hakoo was fixed for 4 h in 1% (v/v) glutaraldehyde in a sodium cacodylate buffer. After post-fixation and dehydration steps, the samples were embedded in Spurr’s resin. Ultrathin sections of the samples were stained with 5% (w/v) uranyl acetate and lead citrate. The JEM 1200 EXS electron microscope (JEOL Ltd., Tokyo, Japan) was used to examine the prepared samples.
De novo whole-genome assembly
Medakamo hakoo cells were frozen and then ground using a mortar and pestle. Genomic DNA was extracted with two rounds of phenol extraction followed by ethanol precipitation. The DNA solution was purified by cesium chloride density-gradient centrifugation, and the genomic DNA was further purified using the AMPure XP kit (Beckman Coulter, CA, USA). Purified samples were fragmented using g-TUBE (Covaris, IL, USA). The fragmented DNA was end-repaired and ligated to SMRTbell adapters using the SMRTbell Template Prep Kit 1.0 (Pacific Biosciences, CA, USA). We evaluated the size distribution of the adapter-ligated DNA by pulsed-field electrophoresis and performed size selection (15.0 kb cut-off) using the BluePippin system (Sage Science, MA, USA). The sequencing library was quantified using the Agilent 2200 TapeStation (Agilent Technologies, CA, USA). The sequencing primer was annealed and DNA polymerase was bound to the sequencing library using the DNA/Polymerase Binding Kit P6 (version 2) (Pacific Biosciences, CA, USA). The sequencing templates were bound to magnetic beads using MagBead OneCellPerWell (Pacific Biosciences, CA, USA) and loaded onto a SMRT Cell for sequencing on the PacBio RS II system (Pacific Biosciences, CA, USA). For de novo sequence assembly, the obtained reads were analyzed with the RS_HGAP_Assembly.3 protocol of the SMRT Analysis software (version 2.3.0).
Medakamo hakoo cells were ground to a powder in liquid nitrogen with a mortar and pestle. The cell powder was resuspended in 5 mL warmed (55 °C) nucleic acid extraction buffer (300 mM NaCl, 50 mM Tris-HCl [pH 7.6], 100 mM EDTA [pH 8.0], 2% Sarkosyl, and 4% SDS), and the solution was stirred. Extraction with a phenol/chloroform/isoamyl alcohol mixture (25:24:1, v/v/v) was performed twice. The extract was purified by ethanol precipitation, and total RNA was isolated using the RNeasy Plant Mini Kit (Qiagen, CA, USA). Next, 1 µL (5 units) Recombinant DNase I (RNase-free) (Takara Bio, Shiga, Japan) in 10 µL 10× buffer and 1 µL (40 units) Recombinant RNase Inhibitor (Takara Bio) were added to 100 µL crude RNA solution, which was then incubated at 37 °C for 30 min. After the addition of 5 µL 0.5 M EDTA, the solution was incubated at 80 °C for 5 min. The RNA was then precipitated in ethanol and dissolved in pure water. After RNA quality was checked using a NanoDrop spectrophotometer (Thermo Fisher Scientific, MA, USA) and the Agilent 2200 TapeStation, a sequencing library was prepared using the TruSeq Stranded mRNA Sample Prep Kit (Illumina, CA, USA). The quality of the sequencing library was evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies). Clusters were generated on a cBot using the HiSeq PE Cluster Kit (version 4) (Illumina), and the library was sequenced (100-bp paired-end reads) on the HiSeq 2500 system using the HiSeq SBS Kit (version 4) (Illumina).
Annotation of the M. hakoo genome
Sixteen contigs were annotated after removal of the two organellar contigs. The completeness of the contigs was evaluated using BUSCO (version 5.2.2)19 with the chlorophyta_odb10 dataset. Repeat sequences were identified and masked using RepeatModeler (version 2.0.2)60 and RepeatMasker (version 4.1.1) (https://www.repeatmasker.org/). Gene models were predicted on the basis of RNA-seq reads, which were trimmed using fastp (version 0.20.0)61 with default options. The trimmed RNA-seq reads were mapped to the contigs using HISAT2 (version 2.2.1)62 with default options, and initial gene models were predicted using BRAKER2 (version 2.1.5)21 with the “stopCodonExcludedFromCDS = False” setting. Gene models were also predicted using PASA (version 2.4.1)63, GeneMark-ET (version 4.33)64, and SNAP (version 2006-07-28)65 in the funannotate pipeline (version 1.8.9) (https://github.com/nextgenusfs/funannotate). These gene models were combined with those obtained from BRAKER2 (weight = 1) using EvidenceModeler (version 1.1.1)66. The quality of gene prediction was evaluated using BUSCO19,20 in protein mode with the chlorophyta dataset. Functional annotation was conducted using eggNOG-mapper (version 2.1.9)22 and GhostKOALA23.
Assessment of the quality of the M. hakoo genome assembly and annotation
To evaluate the quality of the M. hakoo genome assembly, we performed a BUSCO analysis (Simão et al. 2015) using the chlorophyta dataset. Both the contig sequences (excluding organelle-derived contigs) and the predicted protein-coding sequences were used for the BUSCO analysis.
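BUSCO reports completeness as a one-line summary of the form C:…%[S:…%,D:…%],F:…%,M:…%,n:…, where C is complete (single-copy S plus duplicated D), F fragmented, M missing, and n the number of BUSCO groups searched. The Python sketch below parses such a line and checks that the categories are internally consistent; the example line uses invented values, not the results of this study.

```python
import re

BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Parse a BUSCO one-line summary into percentages and the group count n."""
    m = BUSCO_SUMMARY.search(line)
    if m is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    result = {key: float(value) for key, value in m.groupdict().items() if key != "n"}
    result["n"] = int(m.group("n"))
    return result


def is_consistent(summary: dict, tol: float = 0.5) -> bool:
    """C should equal S + D, and C + F + M should be ~100% (allowing for rounding)."""
    return (abs(summary["C"] - (summary["S"] + summary["D"])) <= tol
            and abs(summary["C"] + summary["F"] + summary["M"] - 100.0) <= tol)


# Illustrative line with invented values.
summary = parse_busco_summary("C:90.0%[S:89.0%,D:1.0%],F:3.0%,M:7.0%,n:1000")
print(summary, is_consistent(summary))
```

Parsing the genome-mode and protein-mode summaries in the same way allows the two completeness values to be compared directly.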
Phylogenomic analyses of amino acid datasets
The chloroplast genomes of M. hakoo and 63 green algae reported by Lemieux et al.67 were included in the phylogenomic analyses. A total of 79 protein-coding genes were used to construct the phylogenetic datasets as previously described67: accD, atpA, B, E, F, H, I, ccsA, cemA, chlB, I, L, N, clpP, cysA, T, ftsH, infA, minD, petA, B, D, G, L, psaA, B, C, I, J, M, psbA, B, C, D, E, F, H, I, J, K, L, M, N, T, Z, rbcL, rpl2, 5, 12, 14, 16, 19, 20, 23, 32, 36, rpoA, B, C1, C2, rps2, 3, 4, 7, 8, 9, 11, 12, 14, 18, 19, tufA, ycf1, 3, 4, 12, 20, 47, and 62. Amino acid datasets were prepared as follows. The deduced amino acid sequences of the 79 selected genes were aligned using MUSCLE (version 3.8)68. Ambiguously aligned regions in each alignment were removed using trimAl (version 1.4)69 with the following settings: block = 6, gt = 0.7, st = 0.005, and sw = 3. The protein alignments were concatenated using Phyutility (version 2.2.6)70. Maximum-likelihood analyses were conducted using the edge-linked partition model of IQ-TREE (version 1.6.1)71,72. The datasets were partitioned by gene, and the optimal amino acid substitution model for each partition was selected according to the Bayesian information criterion using the ModelFinder function of IQ-TREE73. Branch support for the resulting ML trees was calculated via non-parametric bootstrap analysis and the SH-like approximate likelihood-ratio test74. Bayesian analyses were performed using the site-heterogeneous CATGTR + Γ4 model in PhyloBayes (version 4.1)75. Five independent chains were run for 5000 cycles, and consensus topologies were calculated from the saved trees using the BPCOMP program of PhyloBayes after a burn-in of 1250 cycles. The largest discrepancy across all bipartitions in the consensus topologies (maxdiff) under these conditions was less than 0.13, suggesting that the chains had largely converged.
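The concatenation step and the by-gene partitioning used for the IQ-TREE analysis can be sketched in Python as below. This is not Phyutility itself; it assumes each per-gene alignment is a dict of taxon to aligned sequence, fills taxa missing from a gene with gaps, and writes partition boundaries in the NEXUS charset format that IQ-TREE accepts as a partition file. The gene and taxon names in the example are placeholders.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments into a supermatrix.

    alignments: dict gene -> dict taxon -> aligned sequence (equal length per gene).
    Taxa absent from a gene are filled with gaps. Returns (supermatrix, partitions),
    where partitions holds (gene, start, end) in 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


def nexus_partition_block(partitions):
    """Partition definitions as a NEXUS 'sets' block (one charset per gene)."""
    lines = ["#nexus", "begin sets;"]
    lines += [f"    charset {gene} = {start}-{end};" for gene, start, end in partitions]
    lines.append("end;")
    return "\n".join(lines)


# Placeholder alignments for two genes and two taxa.
alignments = {
    "psbA": {"taxonA": "MKA", "taxonB": "MKS"},
    "rbcL": {"taxonA": "MSP"},
}
matrix, parts = concatenate_alignments(alignments)
print(matrix)                        # {'taxonA': 'MKAMSP', 'taxonB': 'MKS---'}
print(nexus_partition_block(parts))
```

Keeping one charset per gene is what allows a separate substitution model to be selected for each of the 79 partitions.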
Chloroplast rbcL (Rubisco large subunit gene) sequences of algal strains identified as Choricystis and Botryococcus species were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/) by a BLASTN search using the M. hakoo rbcL sequence (1431 bp; accession no. LC709230) as the query. Sequences were aligned using ClustalX76. In addition, the rbcL sequences of SAG 251-1 (NIES-1436) and SAG 251-218 were determined by Sanger sequencing of PCR products (accession nos. LC709231 and LC709232) and added to the alignment. ML analysis of the aligned rbcL sequences was performed using MEGA X77 with the best-fit model (GTR + G + I) selected by MEGA X, and topological support was assessed with 1000 bootstrap replicates78. Three Botryococcus sequences were used as the outgroup on the basis of the present chloroplast multigene phylogeny (Fig. 2b).
Lipid formation culture
We cultured M. hakoo in the liquid media described in Supplementary Table 8; for B. braunii, AF-6 was used as the standard medium. After 13 days of culture, cells were collected by centrifugation and stained with 20 µg mL−1 Nile Red diluted in phosphate buffer.
Gene expression analysis
We used Genedata Profiler Genome software (version 10.1.14a; Genedata, Basel, Switzerland) to analyze the assembled genome sequence and annotation data. RNA-seq reads were mapped to the genome using TopHat (version 2.0.14)79,80. Of the 68,289,325 total reads, 95.9% mapped to the M. hakoo genome.
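The mapping rate is the number of mapped reads divided by the total number of reads. The sketch below uses this relationship to back-calculate the approximate number of mapped reads from the reported total and the rounded 95.9% rate. The exact mapped count is not stated in the text, so the result is only an estimate.

```python
TOTAL_READS = 68_289_325   # total read count given in the text
REPORTED_RATE = 0.959      # 95.9% as given in the text (rounded to 0.1%)


def mapping_rate(mapped_reads: int, total_reads: int) -> float:
    """Fraction of reads aligned to the reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads


# Approximate mapped-read count implied by the rounded rate
approx_mapped = round(TOTAL_READS * REPORTED_RATE)
# Rounding to 0.1% leaves an uncertainty of +/-0.05 percentage points
uncertainty = round(TOTAL_READS * 0.0005)
print(approx_mapped, uncertainty)  # 65489463 34145
```

Because of that rounding, the true mapped count could differ from the estimate by up to roughly 34,000 reads in either direction.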
Genome size and gene number comparison among plants
Genomic data (nuclear and organellar genomes) for the following species were obtained from the RefSeq database81: A. thaliana, Glycine max, Oryza sativa, Selaginella moellendorffii, Physcomitrella patens, Amborella trichopoda, C. reinhardtii, V. carteri f. nagariensis, Monoraphidium neglectum, O. lucimarinus CCE9901, O. tauri, B. prasinos, Micromonas commoda, Micromonas pusilla CCMP1545, C. subellipsoidea C-169, C. variabilis, A. protothecoides, C. merolae strain 10D, G. sulphuraria, and C. crispus. The Chloropicon primus genome was previously analyzed by Lemieux et al.82. The genome size and gene number analyses did not include organellar genome data.
A BLASTP search of the KEGG database was performed using the protein sequences predicted from the M. hakoo genome as queries, and the K number of each gene was obtained. Next, the pathway count data for each organism (C. reinhardtii, V. carteri f. nagariensis, O. lucimarinus CCE9901, M. commoda, C. subellipsoidea C-169, C. variabilis, and A. protothecoides) were acquired from the KEGG database (https://www.genome.jp/kegg/kegg_ja.html) and compared with the pathway information for the M. hakoo genome.
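Once each gene carries a K number, per-pathway gene counts can be tallied and compared between organisms. The sketch below assumes that a KO-to-pathway lookup table (for example, one exported from KEGG) is already available locally. All identifiers in the example are placeholders, not real KEGG entries.

```python
from collections import Counter


def pathway_counts(gene_to_ko, ko_to_pathways):
    """Count genes assigned to each pathway.

    gene_to_ko: gene ID -> K number (unannotated genes are simply absent)
    ko_to_pathways: K number -> iterable of pathway IDs
    """
    counts = Counter()
    for ko in gene_to_ko.values():
        for pathway in ko_to_pathways.get(ko, ()):
            counts[pathway] += 1
    return counts


def count_differences(query, reference):
    """Per-pathway gene-count difference (query minus reference)."""
    pathways = set(query) | set(reference)
    return {p: query.get(p, 0) - reference.get(p, 0) for p in sorted(pathways)}


# Placeholder identifiers for illustration only
genes = {"g1": "K_a", "g2": "K_a", "g3": "K_b", "g4": "K_c"}
ko_map = {"K_a": ["path_1"], "K_b": ["path_1", "path_2"]}
query_counts = pathway_counts(genes, ko_map)
print(count_differences(query_counts, {"path_1": 5, "path_3": 2}))
```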
A KEGG pathway enrichment analysis of the common gene sets was performed using the enrichKEGG function of the clusterProfiler package83, with the K numbers of the C15 and PS gene sets as input. The C15 gene set served as the background for assessing KEGG pathway enrichment in the AS gene set, and the organism parameter was set to “ko” so that K numbers were used as identifiers. Details of the statistics for the enrichment analysis are described in the Statistics and Reproducibility section.
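An over-representation test of this kind asks how likely it is to see at least k pathway members among n tested genes, given a background of N genes of which K belong to the pathway. A minimal pure-Python sketch of that one-sided hypergeometric p-value follows. It illustrates the statistic, not clusterProfiler's code, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    k: genes in the tested set annotated to the pathway
    n: genes in the tested set
    K: background genes annotated to the pathway
    N: background genes
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    total = comb(N, n)
    # math.comb returns 0 when its second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(n, K) + 1))
    return hits / total


# Hypothetical counts: 2 of 3 tested genes hit a pathway holding 4 of 10 background genes
print(hypergeom_upper_tail(2, 3, 4, 10))
```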
SDS-PAGE and in-gel digestion
Protein samples were dissolved in sample buffer and partially separated (over approximately 1 cm) on a NuPAGE Bis-Tris gel (Thermo Fisher Scientific, CA, USA). Electrophoresis was performed according to the manufacturer’s instructions. Each lane was excised from the unstained gel, and in-gel digestion was performed using 0.01 µg/µL LysC and trypsin84.
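The protease specificity determines which peptides exist and how many missed cleavages a database search must tolerate. The sketch below performs an in silico digestion with trypsin-like specificity: cleavage after K or R, but not before P. We assume this rule as a simplification for the combined LysC/trypsin digest. Real search engines apply their own enzyme rules and peptide-length limits.

```python
def digest(protein: str, max_missed: int = 2) -> list:
    """In silico digestion with trypsin-like specificity.

    Cleaves C-terminal to K or R unless the next residue is P, and
    returns peptides containing 0..max_missed missed cleavage sites.
    """
    if not protein:
        return []
    # Positions after which the chain is cut (the C-terminus is added below)
    sites = [
        i + 1
        for i, residue in enumerate(protein[:-1])
        if residue in "KR" and protein[i + 1] != "P"
    ]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + missed + 1
            if end >= len(bounds):
                break
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


print(digest("AKRPGKLR"))  # ['AK', 'AKRPGK', 'AKRPGKLR', 'RPGK', 'RPGKLR', 'LR']
```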
Mass spectrometric and chromatographic methods, instrumentation, and database searches
The resulting peptides were analyzed using a Q Exactive hybrid mass spectrometer (Thermo Fisher Scientific, CA, USA)85. Peak lists were generated from the MS/MS spectra using Proteome Discoverer (Thermo Fisher Scientific, CA, USA). The SEQUEST program was used to search an in-house M. hakoo protein database with the following settings: enzyme specificity as selected, allowing up to two missed cleavage sites; peptide mass tolerance, 10 ppm; MS/MS tolerance, 0.02 Da; fixed modification, carbamidomethylation (C); and variable modification, oxidation (M). Peptides were identified on the basis of significant Xcorr values (high-confidence filter). The peptide identifications and modifications reported by SEQUEST were manually examined and filtered to obtain confirmed peptide identification and modification lists for the HCD MS/MS analysis. The precursor ion intensity (normalized against the total peptide amount) was used for label-free quantification.
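The precursor tolerance above is relative (10 ppm), whereas the fragment tolerance is absolute (0.02 Da). The sketch below computes a peptide's monoisotopic mass with the listed fixed (carbamidomethyl C) and variable (oxidized M) modifications, and tests whether an observed mass falls within 10 ppm. It uses standard monoisotopic residue masses. It illustrates the tolerance check and is not SEQUEST's scoring.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.007276
CARBAMIDOMETHYL = 57.02146  # fixed modification on every C
OXIDATION = 15.99491        # variable modification on M


def peptide_mass(seq: str, oxidized_m: int = 0) -> float:
    """Neutral monoisotopic mass with all C carbamidomethylated and n M oxidized."""
    if oxidized_m > seq.count("M"):
        raise ValueError("more oxidized M than M residues")
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    mass += CARBAMIDOMETHYL * seq.count("C")
    mass += OXIDATION * oxidized_m
    return mass


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


def ppm_error(observed: float, theoretical: float) -> float:
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 10.0) -> bool:
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


print(round(peptide_mass("PEPTIDE"), 4))  # 799.3599
```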
For codon counting, we used the genomic regions encoding the detected peptides: all complete CDSs to which these peptides were mapped were included, and the codons in the genome sequences encoding the peptides were counted using the R package coRdon (https://github.com/BioinfoHR/coRdon).
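Codon counting itself amounts to reading each in-frame CDS three nucleotides at a time and tallying the triplets. The plain-Python sketch below illustrates that step only. It does not reproduce coRdon, which also computes codon usage indices. The sequences are hypothetical.

```python
from collections import Counter


def count_codons(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def codon_usage(cdss) -> Counter:
    """Pool codon counts over several CDSs."""
    total = Counter()
    for cds in cdss:
        total.update(count_codons(cds))
    return total


# Hypothetical CDSs for illustration only
print(codon_usage(["ATGGCTGCTTAA", "ATGAAATGA"]))
```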
The complete M. hakoo draft genome sequence and the Escherichia coli, C. reinhardtii, C. merolae, and S. coelicolor genome sequences were analyzed using the default settings of the pqsfinder software to identify potential G4-forming sequences27.
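pqsfinder scores candidate G-quadruplex sequences while tolerating bulges and mismatches. For intuition only, the sketch below uses a much stricter, commonly used canonical motif: four or more runs of at least three G separated by 1 to 7 nt loops, or the equivalent C runs that indicate a motif on the opposite strand. This is a different and simpler method than pqsfinder. It will miss imperfect G4s and reports only non-overlapping matches.

```python
import re

# Canonical motif: >=4 runs of >=3 G separated by 1-7-nt loops (G-rich strand);
# C runs on the given strand correspond to a G-rich motif on the other strand.
G4_PLUS = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")
G4_MINUS = re.compile(r"(?:C{3,}[ACGTN]{1,7}){3,}C{3,}")


def find_canonical_g4(sequence: str) -> list:
    """Return (start, end, strand) tuples for non-overlapping canonical motifs."""
    seq = sequence.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


print(find_canonical_g4("TTGGGTTAGGGTTAGGGTTAGGGTT"))  # [(2, 23, '+')]
```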
The C. merolae6,7, O. tauri9,10, S. cerevisiae strain S288C86, and M. hakoo amino acid sequence datasets were analyzed. For the orthologous group analyses, gene families were identified using OrthoFinder40. Protein sequences were compared in all-vs-all BLASTP searches using the NCBI blast+ toolkit87, as suggested in the OrthoFinder manual.
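OrthoFinder can summarize its output as a table of gene counts per orthogroup and species. Identifying orthogroups shared by all analyzed species then reduces to a filter over that table. The sketch below assumes a tab-separated layout with an 'Orthogroup' column followed by one count column per species. The column names and counts are hypothetical.

```python
import csv
import io


def shared_orthogroups(gene_count_tsv: str, species: list) -> list:
    """Orthogroups with at least one gene in every listed species.

    Assumes an 'Orthogroup' column plus one integer gene-count column
    per species (hypothetical column names in the example).
    """
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    return [
        row["Orthogroup"]
        for row in reader
        if all(int(row[name]) > 0 for name in species)
    ]


table = (
    "Orthogroup\tCmer\tOtau\tScer\tMhak\tTotal\n"
    "OG0000000\t2\t1\t3\t1\t7\n"
    "OG0000001\t0\t4\t1\t2\t7\n"
    "OG0000002\t1\t1\t1\t1\t4\n"
)
print(shared_orthogroups(table, ["Cmer", "Otau", "Scer", "Mhak"]))
```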
Principal component analysis
Orthogroup composition data (Supplementary Data 4) for microalgae were analyzed using the prcomp function of the stats package (version 4.0.2) in R.
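By default, prcomp centers each variable and does not rescale it unless `scale. = TRUE` is set. The text does not state whether scaling was used. The sketch below reproduces the default, centered-only PCA with NumPy via a singular value decomposition. It assumes that rows are species and columns are orthogroup counts. The signs of the principal components may differ from R's output, which is inherent to PCA.

```python
import numpy as np


def pca(matrix):
    """Centered (unscaled) PCA of a samples x variables matrix.

    Returns (scores, explained_variance_ratio); scores correspond to
    prcomp()$x up to the sign of each component.
    """
    x = np.asarray(matrix, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need a 2-D matrix with at least two rows")
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variance = s**2 / (x.shape[0] - 1)  # equals prcomp()$sdev squared
    total = variance.sum()
    if total == 0:
        raise ValueError("all rows are identical; no variance to decompose")
    return u * s, variance / total


# Hypothetical gene counts: 4 species x 3 orthogroups
counts = [[1, 0, 2], [2, 1, 2], [0, 1, 1], [3, 2, 0]]
scores, ratio = pca(counts)
print(scores.shape, ratio.round(3))
```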
This published work and the nomenclatural acts (Supplementary Note 1) it contains have been registered in PhycoBank, the proposed online registration system for the International Code of Nomenclature for algae, fungi and plants (ICN). The PhycoBank LSIDs (Life Science Identifiers) can be resolved and the associated information viewed through any standard web browser by appending the LSID to the prefix “http://phycobank.org/”. The LSIDs for this publication are: 103506; 103507; 103508.
Statistics and reproducibility
All culture experiments presented in this paper were performed multiple times to confirm reproducibility. Tabular data were analyzed and figures were drawn using the tidyverse package (version 1.3.1) in R and pandas (version 1.0.5) in Python. The Brunner–Munzel test was performed with the lawstat package (version 3.5) in R, and the stats package (version 4.0.2) in R was used for Bonferroni correction of p-values. In the gene enrichment analysis, p-values were calculated using a hypergeometric distribution, and the p-values of each pathway were adjusted using the Benjamini–Hochberg method88.
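Both multiple-testing corrections named above have short closed forms. The sketch below implements them in plain Python, following the definitions used by R's p.adjust (methods "bonferroni" and "BH"). The p-values in the example are made up.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, monotone, capped at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration only
print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.20])])
# [0.04, 0.0533, 0.0533, 0.2]
```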
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
The source data underlying Figs. 1h, 3b, e, 4b, c, 5b–d are provided as Supplementary Data 6. The genome sequence read data were deposited in the Sequence Read Archive (accession numbers: SRR16480670–SRR16480673). The assembled chromosomal DNA sequences were deposited in GenBank (accession numbers: CP089450–CP089465). The transcriptome sequencing data were deposited in the Sequence Read Archive (accession number: SRR19165385), whereas the proteome data were deposited in the jPOST repository (accession number: JPST001585). All other data are available from the corresponding author.
Handbook of Microalgae-based Processes and Products (Elsevier, 2020). https://doi.org/10.1016/c2018-0-04111-0.
Onyeaka, H. et al. Minimizing carbon footprint via microalgae as a biological capture. Carbon Capture Sci. Technol. 1, 100007 (2021).
Guiry, M. D. How many species of algae are there? J. Phycol. 48, 1057–1063 (2012).
Kuroiwa, T. et al. Cytological evidence of cell-nuclear genome size of a new ultra-small unicellular freshwater green alga, “Medakamo hakoo” strain M-hakoo 311 I. Comparison with Cyanidioschyzon merolae and Ostreococcus tauri. Cytologia 80, 143–150 (2015).
Kuroiwa, T. et al. Genome size of the ultrasmall unicellular freshwater green alga, Medakamo hakoo 311, as determined by staining with 4′,6-diamidino-2-phenylindole after microwave oven treatments: II. Comparison with Cyanidioschyzon merolae, Saccharomyces cerevisiae (n, 2n), and Chlorella variabilis. Cytologia 81, 69–76 (2016).
Matsuzaki, M. et al. Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D. Nature 428, 653–657 (2004).
Nozaki, H. et al. A 100%-complete sequence reveals unusually simple genomic features in the hot-spring red alga Cyanidioschyzon merolae. BMC Biol. 5, 28 (2007).
Derelle, E. et al. Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc. Natl Acad. Sci. USA 103, 11647–11652 (2006).
Palenik, B. et al. The tiny eukaryote Ostreococcus provides genomic insights into the paradox of plankton speciation. Proc. Natl Acad. Sci. USA 104, 7705–7710 (2007).
Blanc-Mathieu, R. et al. An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina de novo assemblies. BMC Genomics 15, 1103 (2014).
Kirk, J. T. O. A theoretical analysis of the contribution of algal cells to the attenuation of light within natural waters I. general treatment of suspensions of pigmented cells. N. Phytol. 75, 11–20 (1975).
Raven, J. A. A cost-benefit analysis of photon absorption by photosynthetic unicells. N. Phytol. 98, 593–625 (1984).
Raven, J. & Beardall, J. In Microalgal Production for Biomass and High-Value Products (eds Slocombe, S. P. & Benemann, J. R.) 1–19 (CRC Press, 2016).
Takusagawa, M. et al. Complete mitochondrial and plastid DNA sequences of the freshwater green microalga Medakamo hakoo. bioRxiv https://doi.org/10.1101/2021.07.27.453968 (2021).
Metzger, P. & Largeau, C. Botryococcus braunii: a rich source for hydrocarbons and related ether lipids. Appl. Microbiol. Biotechnol. 66, 486–496 (2005).
Banerjee, A., Sharma, R., Chisti, Y. & Banerjee, U. C. Botryococcus braunii: a renewable source of hydrocarbons and other chemicals. Crit. Rev. Biotechnol. 22, 245–279 (2002).
Novis, P. M., Lorenz, M., Broady, P. A. & Flint, E. A. Parallela Flint: its phylogenetic position in the Chlorophyceae and the polyphyly of Radiofilum Schmidle. Phycologia 49, 373–383 (2010).
Pröschold, T. & Darienko, T. Choricystis and Lewiniosphaera gen. nov. (Trebouxiophyceae Chlorophyta), two different green algal endosymbionts in freshwater sponges. Symbiosis 82, 175–188 (2020).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom. Bioinform. 3, lqaa108 (2021).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. EggNOG-mapper v2: Functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
Bochman, M. L., Paeschke, K. & Zakian, V. A. DNA secondary structures: stability and function of G-quadruplex structures. Nat. Rev. Genet. 13, 770–780 (2012).
Maizels, N. Dynamic roles for G4 DNA in the biology of eukaryotic cells. Nat. Struct. Mol. Biol. 13, 1055–1059 (2006).
Maizels, N. & Gray, L. T. The G4 genome. PLoS Genet. 9, e1003468 (2013).
Hon, J., Martínek, T., Zendulka, J. & Lexa, M. pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics 33, 3373–3379 (2017).
Mao, X., Cai, T., Olyarchuk, J. G. & Wei, L. Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics 21, 3787–3793 (2005).
Rockwell, N. C. et al. Eukaryotic algal phytochromes span the visible spectrum. Proc. Natl Acad. Sci. USA 111, 3871–3876 (2014).
Duanmu, D. et al. Retrograde bilin signaling enables Chlamydomonas greening and phototrophic survival. Proc. Natl Acad. Sci. USA 110, 3621–3626 (2013).
Diaz, M. & Pecinka, A. Scaffolding for repair: Understanding molecular functions of the SMC5/6 complex. Genes (Basel) 9, 36 (2018).
Prunuske, A. J. & Ullman, K. S. The nuclear envelope: form and reformation. Curr. Opin. Cell Biol. 18, 108–116 (2006).
Schirmer, E. C., Guan, T. & Gerace, L. Involvement of the lamin rod domain in heterotypic lamin interactions important for nuclear organization. J. Cell Biol. 153, 479–489 (2001).
Poulet, A., Probst, A. V., Graumann, K., Tatout, C. & Evans, D. Exploring the evolution of the proteins of the plant nuclear envelope. Nucleus 8, 46–59 (2017).
Cerutti, H. & Casas-Mollano, J. A. On the origin and functions of RNA-mediated silencing: from protists to man. Curr. Genet. 50, 81–99 (2006).
Shabalina, S. A. & Koonin, E. V. Origins and evolution of eukaryotic RNA interference. Trends Ecol. Evol. 23, 578–587 (2008).
Levine, B. & Kroemer, G. Biological functions of autophagy genes: a disease perspective. Cell 176, 11–42 (2019).
Shemi, A., Ben-Dor, S. & Vardi, A. Elucidating the composition and conservation of the autophagy pathway in photosynthetic eukaryotes. Autophagy 11, 701–715 (2015).
Mizushima, N. & Komatsu, M. Autophagy: renovation of cells and tissues. Cell 147, 728–741 (2011).
Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, (2015).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Zarmi, Y. et al. Enhanced algal photosynthetic photon efficiency by pulsed light. iScience 23, 101115 (2020).
Kanehisa, M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Grigoriev, I. V. et al. The genome portal of the Department of Energy Joint Genome Institute. Nucleic Acids Res. 40, D26–D32 (2012).
Nordberg, H. et al. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 42, D26–D31 (2014).
Pinnola, A. The rise and fall of Light-Harvesting Complex Stress-Related proteins as photoprotection agents during evolution. J. Exp. Bot. 70, 5527–5535 (2019).
Iha, C. et al. Genomic adaptations to an endolithic lifestyle in the coral-associated alga Ostreobium. Curr. Biol. 31, 1393–1402.e5 (2021).
Rolland, N. et al. Disruption of the plastid ycf10 open reading frame affects uptake of inorganic carbon in the chloroplast of Chlamydomonas. EMBO J. 16, 6713–6726 (1997).
Kasinsky, H. E., Lewis, J. D., Dacks, J. B. & Ausió, J. Origin of H1 linker histones. FASEB J. 15, 34–42 (2001).
Arriola, M. B. et al. Genome sequences of Chlorella sorokiniana UTEX 1602 and Micractinium conductrix SAG 241.80: implications to maltose excretion by a green alga. Plant J. 93, 566–586 (2018).
Morimoto, D., Yoshida, T. & Sawayama, S. Draft genome sequence of the astaxanthin-producing microalga Haematococcus lacustris strain NIES-144. Microbiol. Resour. Announc. 9, e00128-20 (2020).
Young, A. J. The photoprotective role of carotenoids in higher plants. Physiol. Plant. 83, 702–708 (1991).
Frank, H. A. & Cogdell, R. J. Carotenoids in photosynthesis. Photochem. Photobiol. 63, 257–264 (1996).
Ren, Y., Sun, H., Deng, J., Huang, J. & Chen, F. Carotenoid production from microalgae: biosynthesis, salinity responses and novel biotechnologies. Mar. Drugs 19, 713 (2021).
Kuroiwa, T. et al. Mitotic karyotype of the primitive red alga Cyanidioschyzon merolae 10D. Cytologia (Tokyo) 85, 107–113 (2020).
Miyakawa, I., Fujimura, R. & Kadowaki, Y. Use of the nuc1 null mutant for analysis of yeast mitochondrial nucleoids. J. Gen. Appl. Microbiol. 54, 317–325 (2008).
Provasoli, L. Artificial media for fresh-water algae: problems and suggestions. Ecol. Algae Spec. Pub 2, 84–96 (1960).
Nishimura, Y., Higashiyama, T., Suzuki, L., Misumi, O. & Kuroiwa, T. The biparental transmission of the mitochondrial genome in Chlamydomonas reinhardtii visualized in living cells. Eur. J. Cell Biol. 77, 124–133 (1998).
Kuroiwa, T. & Suzuki, T. An improved method for the demonstration of the in situ chloroplast nuclei in higher plants. Cell Struct. Funct. 5, 195–197 (1980).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl Acad. Sci. USA 117, 9451–9457 (2020).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. Nucleic Acids Res. 42, e119 (2014).
Korf, I. Gene finding in novel genomes. BMC Bioinform. 5, 59 (2004).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Lemieux, C., Otis, C. & Turmel, M. Chloroplast phylogenomic analysis resolves deep-level relationships within the green algal class Trebouxiophyceae. BMC Evol. Biol. 14, 211 (2014).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Smith, S. A. & Dunn, C. W. Phyutility: a phyloinformatics tool for trees, alignments and molecular data. Bioinformatics 24, 715–716 (2008).
Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Chernomor, O., von Haeseler, A. & Minh, B. Q. Terrace aware data structure for phylogenomic inference from supermatrices. Syst. Biol. 65, 997–1008 (2016).
Kalyaanamoorthy, S., Minh, B. Q., Wong, T. K. F., von Haeseler, A. & Jermiin, L. S. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods 14, 587–589 (2017).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Lartillot, N., Lepage, T. & Blanquart, S. PhyloBayes 3: a Bayesian software package for phylogenetic reconstruction and molecular dating. Bioinformatics 25, 2286–2288 (2009).
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25, 4876–4882 (1997).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
Felsenstein, J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783 (1985).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Lemieux, C., Turmel, M., Otis, C. & Pombert, J.-F. A streamlined and predominantly diploid genome in the tiny marine green alga Chloropicon primus. Nat. Commun. 10, 4061 (2019).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Fujimoto, S., Sugano, S. S., Kuwata, K., Osakabe, K. & Matsunaga, S. Visualization of specific repetitive genomic sequences with fluorescent TALEs in Arabidopsis thaliana. J. Exp. Bot. 67, 6101–6110 (2016).
Shimada, T. L. et al. HIGH STEROL ESTER 1 is a key factor in plant sterol homeostasis. Nat. Plants 5, 1154–1166 (2019).
Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Brunner, E. & Munzel, U. The nonparametric Behrens-Fisher problem: asymptotic theory and a small-sample approximation. Biom. J. 42, 17–25 (2000).
This research was supported by MEXT/JSPS KAKENHI funding to T.K. (19H03260 and 22H02657) and S. Matsunaga (20H05911). It was also supported by JST-CREST (JPMJCR20S6) and JST-OPERA (JPMJOP1832) grants to S. Matsunaga. We thank Edanz (https://jp.edanz.com/ac) for editing a draft of this manuscript.
The authors declare no competing interests.
Peer review information
Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Shahid Mukhtar, Caitlin Karniski and George Inglis. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kato, S., Misumi, O., Maruyama, S. et al. Genomic analysis of an ultrasmall freshwater green alga, Medakamo hakoo. Commun Biol 6, 89 (2023). https://doi.org/10.1038/s42003-022-04367-9