Raphidocelis subcapitata (=Pseudokirchneriella subcapitata) provides an insight into genome evolution and environmental adaptations in the Sphaeropleales

The Sphaeropleales are a dominant group of green algae, which contain species important to freshwater ecosystems and those that have potential applied usages. In particular, Raphidocelis subcapitata is widely used worldwide for bioassays in toxicological risk assessments. However, there are few comparative genome analyses of the Sphaeropleales. To reveal genome evolution in the Sphaeropleales based on well-resolved phylogenetic relationships, nuclear, mitochondrial, and plastid genomes were sequenced in this study. The plastid genome provides insights into the phylogenetic relationships of R. subcapitata, which is located in the most basal lineage of the four species in the family Selenastraceae. The mitochondrial genome shows dynamic evolutionary histories with intron expansion in the Selenastraceae. The 51.2 Mbp nuclear genome of R. subcapitata, encoding 13,383 protein-coding genes, is more compact than the genome of its closely related oil-rich species, Monoraphidium neglectum (Selenastraceae), Tetradesmus obliquus (Scenedesmaceae), and Chromochloris zofingiensis (Chromochloridaceae); however, the four species share most of their genes. The Sphaeropleales possess a large number of genes for glycerolipid metabolism and sugar assimilation, which suggests that this order is capable of both heterotrophic and mixotrophic lifestyles in nature. Comparison of transporter genes suggests that the Sphaeropleales can adapt to different natural environmental conditions, such as salinity and low metal concentrations.


Results and Discussion
Phylogenetic analyses. To infer the phylogenetic position of R. subcapitata within the Sphaeropleales, we performed phylogenetic analyses using two datasets: 55 plastid-encoded or 13 mitochondrion-encoded proteins ( Fig. 1; Supplementary Fig. S1). In both trees, R. subcapitata formed a monophyletic group with the other Selenastraceae species, M. neglectum, Ourococcus multisporus, and Kirchneriella aperta. There was robust support (BP = 100 and BPP = 1.00) for inclusion of R. subcapitata in the Sphaeropleales; however, the topologies of the trees were different. The tree using plastid-encoded proteins resolves the phylogenetic relationships better than the mitochondrial tree because it had higher supporting values (BPs at all nodes ≥70) and was based on more amino acids than that based on mitochondrion-encoded proteins. In the plastid-based tree, R. subcapitata was the most basal lineage in the Selenastraceae (BP = 94 and BPP = 1.00) (Fig. 1). M. neglectum and O. multisporus were sister species showing robust support (BP = 99 and BPP = 1.00). The Selenastraceae was a sister group to T. obliquus, Neochloris aquatica, and Chlorotetraedron incus (BP = 99, BPP = 1.00). Chromochloris zofingiensis was monophyletic with the Selenastraceae, T. obliquus, N. aquatica, and C. incus, with moderate support (BP = 71, BPP = 0.9993). Therefore, C. zofingiensis was probably the most basal among the four species with available nuclear genomes of the Sphaeropleales.
Evolution of plastid and mitochondrial genomes. In the Sphaeropleales, 17 mitochondrial and 11 plastid genomes have been submitted so far to comparative analyses 10,15,19,20,[23][24][25][26] . We sequenced the complete mitochondrial and plastid genomes of R. subcapitata and compared these sequences to those of other Sphaeropleales to reveal organellar genome evolution of the order, and particularly of the family Selenastraceae. The mitochondrial genome of R. subcapitata was circular and 44,268 bp in size ( Supplementary Fig. S2a); this genome contained a large tandem repeat region with a 10-mer unit repeated at least 11 times at position 21,665. Protein-coding genes could be translated following the genetic code of the mitochondria of T. obliquus 14 . There were 13 conserved protein-coding genes (3 cytochrome oxygenases, 1 cytochrome b, 7 NADH dehydrogenase subunits, and 2 ATP synthase subunits), 6 fragmented rRNAs, and 28 tRNAs; as observed for other Sphaeropleales species, 16S rRNA and 23S rRNA were separated into two and four fragments, respectively (Supplementary Table S1). Cox2 was split and its N-terminus (Cox2a) was encoded in the mitochondrial genome, similar to other mitochondrial genomes. Sphaeropleales species encode the C-terminus of Cox2 (Cox2b) in the nuclear genome 17 , and this was indeed the case in R. subcapitata (see below). R. subcapitata had 11 introns in cox1, cob, and rrl4, and 11 intronic open reading frames (ORFs), possibly encoding LAGLIDADG or GIY-YIG endonucleases. Except for an intron in rrl4, the introns of R. subcapitata were inserted at the same positions as that of at least one other Sphaeropleales species (Supplementary Fig. S3a-c), suggesting that they had the same origin. Selenastraceae species contained larger amounts of intronic sequences in the mitochondrial genomes than other Sphaeropleales species (Fig. 2a, Supplementary Fig. S3a-c), suggesting that there was a gain in introns in a common ancestor of the Selenastraceae. ProgressiveMauve analysis 27 was performed to reveal the genome arrangements in the Selenastraceae. The gene order in the mitochondrial genome of R. subcapitata was identical to those of K. aperta, O. multisporus, and M. neglectum; however, there was a divergent region in an intron of cob ( Supplementary  Fig. S4). Overall, R. subcapitata, K. aperta, and O. multisporus had genome structures with high similarities, although K. aperta possessed longer introns in rrl2 and rrl4. In contrast, M. neglectum had larger intergenic regions and introns ( Fig. 2a; Supplementary Fig. S4), which may have been caused by a secondary genome size expansion in this species.
The plastid genome of R. subcapitata was circularly mapped and 128,080 bp in size, including two 18,858 bp inverted repeats (IRs) (Supplementary Fig. S2b). Sixty-nine conserved protein-coding genes, 6 rRNAs, 30 tRNAs, and 7 intronic ORFs were found in the plastid genome (Supplementary Table S2). The IR had 1 psbA, 3 rRNAs, and 3 tRNAs. Two introns were inserted in psaA and 4 introns in 2 copies of rrl, all of which possessed an intronic  ORF encoding LAGLIDADG endonuclease. Exceptionally, there was an ORF encoding maturase in an intergenic region between psbE and psaA-1; psaA was split into three fragments, which were dispersed in the genome, suggesting that they could be transcribed with trans-splicing, as in T. obliquus 26 . Among the Sphaeropleales species studied, M. homosphaera contained the smallest plastid genome (102.7 kbp), and R. subcapitata the second smallest, mainly because of small intergenic regions (Fig. 2b). Because the phylogenetic relationships between R. subcapitata and M. homosphaera are unclear, we cannot determine whether the genome reduction is derived. Of the three Selenastraceae species, R. subcapitata and K. aperta showed a completely conserved gene set (Supplementary Table S2), with K. aperta possessing a larger number of introns and intergenic regions (Fig. 2b). M. neglectum had an additional atpF in the IRs, one of which was fragmented; thus, it may be a pseudogene. Based on phylogenetic relationships, IRs increased in length in M. neglectum after branching in the Selenastraceae. The gene orders of M. neglectum and K. aperta were identical, but R. subcapitata contained an inversion of psbK-psbC ( Supplementary Fig. S5), suggesting that this inversion occurred independently.
General features of the nuclear genome of R. subcapitata. We sequenced 4.1 Gbp reads using Illumina MiSeq. Based on assembly and scaffolding, we obtained 417 scaffolds (≥1 kbp). Major scaffolds (≥5 kb) were 51,162,697 bp in total (300 scaffolds) ( Table 1). The assembly size was similar to the estimated genome size (46,790,751 bp), based on a k-mer analysis ( Supplementary Fig. S6). To confirm the completeness of the genome, we performed BUSCO analysis, which assesses genome completeness to find the Benchmarking Universal Single-Copy Orthologs (BUSCOs) in a target genome 28 . The analysis was based on the eukaryote dataset, and showed 91.7% complete BUSCOs, which was higher than those of C. zofingiensis, M. neglectum, and T. obliquus (84.5%, 58.5%, and 79.9%, respectively) (Supplementary Table S3). Additionally, 95.8% and 95.6% of the RNA-seq reads were mapped into ≥1 kb and ≥5 kb scaffolds, respectively. Therefore, we obtained a nearly complete genome for R. subcapitata. The 300 major scaffolds were annotated using the funannotate pipeline based on ab initio and transcripts-based gene prediction. Repeat sequences (12.45% of the major scaffolds) were masked, and most (10.5% of the major scaffolds) were simple repeats. At least 5 rRNAs, 46 tRNAs, and 13,383 protein-coding genes were predicted. Additionally, 11,902 protein-coding genes (88.9% of the prediction) were detected by RNA-seq. The remaining 11.1% of gene models were not mapped with the RNA-seq reads; therefore, they may be rarely expressed genes and some may be pseudogenes. The protein-coding genes contained 76,178 introns at a density of 5.7 introns per gene.
The nuclear genome of R. subcapitata was smallest in the Sphaeropleales (Table 1). Thus far, the nuclear genomes of M. neglectum, T. obliquus, and C. zofingiensis have been sequenced [10][11][12] . The genomes differed greatly from that of R. subcapitata in size, gene number, coding capacity, and GC-content. T. obliquus has a 107.7 Mbp genome, which is nearly twice as large as that of R. subcapitata, but the two species possess a similar number of genes. The coding percentages of R. subcapitata and T. obliquus were 44.0% and 15.7%, respectively. These results suggest that the larger genome of T. obliquus is caused by a large amount of non-coding regions (i.e. intergenic regions and introns). The genomes of M. neglectum (68 Mbp) and C. zofingiensis (58 Mbp) were also smaller but with higher coding percentages (25.7% and 39% 12 , respectively) than that of T. obliquus. However, M. neglectum and C. zofingiensis possessed 16,761, and 15,274 genes, respectively, at least ~2,000 genes more than R. subcapitata and T. obliquus. We compared gene families, which were defined by TreeFam 29 , among R. subcapitata, M. neglectum, T. obliquus, and C. zofingiensis. Most of the families (58.2-69.8% of the total) were shared by them (Fig. 3a). C. zofingiensis had 4,063 gene families, ~300 more gene families than the others (Fig. 3b). However, only 296 unique gene families were found in C. zofingiensis, which was similar to that of the others (160-218 gene families) (Fig. 3a). Functional analyses using KEGG categories 30 showed that most of the functions were shared in the Sphaeropleales (Fig. 3c). M. neglectum possessed the largest number of genes among the Sphaeropleales families considered in the study (Fig. 3b). Therefore, gene expansion in M. neglectum probably originated in the past as a complete or partial duplication of the genome. Bogen et al. 10 reported that M. neglectum had a diploid character because their investigation of a 'contig length vs. read count' plot showed homozygous and heterozygous contigs. In this plot, contigs of homozygous loci have more than twice the read coverage than those of heterozygous loci 31 . However, such characters were not found in R. subcapitata, T. obliquus ( Supplementary Fig. S7a,b), and C. zofingiensis 12 . Therefore, we consider that vegetative cells of the Sphaeropleales are basically haploid. The haploid character, and the small size and simplicity of the R. subcapitata genome are of great advantage for transformation and genome editing. Interestingly, GC-content of R. subcapitata was 71.6%, which was higher than that of M. neglectum, T. obliquus, and C. zofingiensis (Table 1). Roth et al. 12 indicated that the high GC-content of several algae was associated with more fragmentary assemblies. However, the R. subcapitata genome had higher GC-content than M. neglectum and T. obliquus despite the less fragmentary assembly of R. subcapitata (Table 1; Supplementary Table S3). Therefore, the genome of R. subcapitata might have high GC-content in nature. Codon usage of R. subcapitata was highly biased; we found that although all codons were used, there was a greater proportion of GC at synonymous codons, e.g. cysteine was coded for by 20% UGU and 80% UGC codons (Supplementary Table S4). Additionally, the GC-content of transcripts (coding regions) in R. subcapitata was 74.6%, which was higher than that of spacer regions (66.2%). These results suggest that the GC bias is related to the high gene density and coding length of R. subcapitata. In mammalian species, it is known that high GC-content is positively correlated to gene density 32 and coding length 33 .
Restricted endosymbiotic gene transfer of mitochondrial genes in the Sphaeropleales. Species in the Sphaeropleales have split cox2 (cox2a and cox2b) genes, with cox2a located in the mitochondrial genome and cox2b located in the nuclear genome, based on the difference in codon usage between the mitochondria and nucleus and the presence of spliceosomal introns. Because the Chlamydomonadales have both split cox2 genes in their nuclear genome 16   To gain further insights into EGT from the mitochondrial genome in the Sphaeropleales, we searched for cox2b and nuclear mitochondrial DNA segments (NUMTs), which are nuclear sequences that are similar to the mitochondrial genome sequences. We found a cox2b gene in R. subcapitata, M. neglectum, and C. zofingiensis. T. obliquus possessed two copies of cox2b with N terminal sequences that differed from those reported previously 17 . All of the cox2b possessed a ~50 aa N-terminal extension, which was very similar to that of the Chlamydomonadales ( Supplementary Fig. S8), strongly supporting the conclusion that cox2b migrated to the nucleus in a common ancestor of the Chlamydomonadales and Sphaeropleales 17 .
We also performed homology searching of the mitochondrial sequences against the nuclear sequences. We did not find mitochondrial-encoded genes in nuclear protein-coding genes; however, we found some NUMTs in intergenic regions or introns of the nuclear genomes.  (Fig. 4). These results suggest that the Sphaeropleales and the Chlamydomonadales have similar amounts of ongoing mitochondrial DNA transfer to the nucleus. However, in the Sphaeropleales, cox2a has not been transferred to the nuclear genome; this might be caused by unusual codon usage in this order 17 . If the gene is transferred, it cannot be translated correctly to protein, and the DNA fragment may be eliminated immediately. Exceptionally, T. obliquus possessed many more NUMTs than the other Sphaeropleales, 339 NUMTs with 32-2,108 bp with 78-100% similarity (Supplementary Table S5), which were significantly larger in size than those of C. reinhardtii (P = 0.002, t-test) (Fig. 4). This may be explained by the lower gene density of T. obliquus. It had large spacer regions (3,647 bp on average) and introns (462 bp on average), therefore, it is likely that mitochondrial DNA fragments are transferred easily into the nucleus without disturbing genes.
Conserved gene repertories for lipid metabolism in the Sphaeropleales. Some Sphaeropleales species are used for the production of biomass, particularly biofuels. T. obliquus and M. neglectum accumulate palmitate (C16:0) and oleate (C18:1) as major lipid constituents, which may be suitable feedstock for biodiesel production 10,34 . A congener of M. neglectum, M. contortum, can also be useful for biofuel production because of its robust growth, with efficient neutral lipid accumulation, and a favourable fatty acid profile under nitrogen starvation conditions 13 . M. contortum and R. subcapitata mainly accumulate palmitate and oleate 13,35 . We searched for genes related to lipid metabolism in the Sphaeropleales, based on metabolic pathways 10 , and compared them to those of C. reinhardtii. They shared all the genes except for enoyl-acyl-carrier-protein reductase (EAR) (Fig. 5a). The gene for EAR in M. neglectum could not be detected by our homology search; however, Bogen et al. 10 reported the presence of EAR. The Sphaeropleales possessed more genes for acyl-CoA:diacylglycerol acyltransferase (DGAT) than C. reinhardtii (Fig. 5a). DGAT catalyses the last step in triacylglycerol (TAG) biosynthesis in the acyl-CoA dependent pathway, and is composed of two families with similar structure, DGAT type1 and DGAT type2 (also known as DGTT) 36 . Roth et al. 12 discussed the difficulties for annotation of these genes because of their high sequence diversities, and 11 reported putative genes for DGAT type1 or 2. In our analyses, although the exact number of genes for DGAT with diacylglycerol acyltransferase activity could not be confidently determined, R. subcapitata and T. obliquus tentatively possessed 11 and 12 genes for DGAT, respectively, and most of them were homologs of DGAT type2 by our phylogenetic analysis. The DGAT type2 genes of the Sphaeropleales were present in four clades (Fig. 5b), and each clade included other green algae, suggesting that the increase in genes originated in gene duplication. An increase in DGAT genes is found in other oleaginous organisms, such as Nannochloropsis species 37,38 . In these species, the increase in DGAT type2 genes originated in their endosymbiont via EGT 37 , whereas the gene number increase in the Sphaeropleales was probably caused by gene duplication in their ancestor. For the other genes related to lipid biosynthesis, the Sphaeropleales possessed a large number of  genes for ACC, KAR, and KAS1/2 (Fig. 5a) as mentioned in Bogen et al. 10 . These findings imply that the large number of DGAT genes may be related to high lipid productivity in the Sphaeropleales.

Variation in nutrient transporter genes and environmental adaptation in the Sphaeropleales.
The Sphaeropleales are known generally as a dominant group in freshwater and adapted to different environmental conditions e.g. 39,40 . Moreover, Sphaeropleales species have high sensitivity to exogenous substances 41 . To characterize the adaptive potential of the Sphaeropleales to environmental variability, we compared transporter genes among the Sphaeropleales and C. reinhardtii. Most of the transporter gene repertoires for nutrients and metals were similar among the Sphaeropleales and C. reinhardtii (Supplementary Table S6 Table 2).
The H + /hexose cotransporter is functionally related to intake of glucose from the outside of cells 42,43 . A trebouxiophyte, Parachlorella kessleri, also has H + /hexose cotransporter genes 44 . They are categorized in three classes in green algae, based on the phylogenetic relationships 45 . In the classes, genes in the HUP-like class are increased uniquely in mixotrophic green algae (e.g. Chlorella and Auxenochlorella), and thus the HUP-like is possibly related to the heterotrophic lifestyle 45 . Expression of this gene is induced during heterotrophic growth in P. kessleri. The mRNA was not detected in photosynthetically-grown cells, but immediately induced by glucose under darkness 46 . Interestingly, our phylogenetic analysis showed that R. subcapitata, M. neglectum, T. obliquus, and C. zofingiensis had 6, 3, 4, and 7 HUP-like genes, respectively, and they were monophyletic with heterotrophic or mixotrophic trebouxiophytes (Fig. 6a). This result suggests that the HUP-like genes of the trebouxiophytes and the Sphaeropleales had the same origin, and the genes were acquired in the common ancestor of the trebouxiophytes and the Sphaeropleales but were lacking in the Chlamydomonadales. However, it is also possible, although unlikely, that the gene has been acquired independently in the trebouxiophytes and the Sphaeropleales via horizontal gene transfer and increase in each group. Genome information of the OCC clade is needed to clearly reveal their evolutionary history. Heterotrophic growth using hexose or other simple sugars (e.g. glucose) was observed in Monoraphidium sp. 47 , T. obliquus 48,49 , and C. zofingiensis 50,51 . To investigate the heterotrophic and mixotrophic growth ability of R. subcapitata, we cultured it with and without 0.5% glucose under light or continuous dark conditions for 9 days (Fig. 6b; Supplementary  Fig. S9). Final cell concentration was the highest in the culture treatment with glucose under light; cell concentration was ~20-fold higher in this treatment than in the treatment without glucose under light. The treatment with glucose under continuous darkness had a ~6-fold higher cell concentration than the treatment without glucose under light; cells could not grow without glucose under continuous darkness. These results suggest that R. subcapitata is mixotrophic and can grow under heterotrophic conditions without photosynthesis. The cells cultured with glucose contained many lipid bodies (Supplementary Figs S9, S10), suggesting that R. subcapitata stores excess glucose as TAG in these lipid bodies. In the case of nitrogen resources, we identified a large number of amino acid/peptide transporter genes in the Sphaeropleales, although the number of nitrate/nitrite transporter genes was not different among the Sphaeropleales and C. reinhardtii. In Chlorella vulgaris, glucose or glucose analogues induce the intake of amino acids through amino acid permeases 52 , even if inorganic nitrogen (i.e. ammonium or nitrate) is present in high amounts 53 . Based on these findings, it appears reasonable that the Sphaeropleales possess a large number of H + / hexose cotransporters and amino acid permeases. These transporters may contribute to their rapid growth under different nutrient conditions. M. neglectum and T. obliquus can grow under highly salinity conditions (up to 1% NaCl) 10,54 . The adaptation to high salt stress may be conferred by the large number of genes for aquaporin (Table 2). Aquaporin can passively transport small polar molecules, such as water, across cell membranes in different species of algae 55 , and is likely to be used for controlling intracellular osmotic pressure. Multiple metal-nicotianamine transporters may be related to high sensitivity of the Sphaeropleales to exogenous metals. Metal-nicotianamine transporters transport various metals, such as Fe(III), Fe(II), Ni(II), Zn(II), Cu(II), Mn(II), and Cd(II) as metal-PS (phytosiderophores) or metal-NA (nicotianamine) chelates in Zea mays 56,57 . In Scenedesmus, organic chelators are released in inorganic medium 58 , and uptake of iron is regulated by siderophore secretion 59 . These findings suggest that the large number of genes for metal-nicotianamine transporters in R. subcapitata may induce high sensitivity to different metals. M. neglectum, T. obliquus, and C. zofingiensis also possessed multiple copies of genes for metal-nicotianamine transporters, suggesting that the Sphaeropleales might have high sensitivity to metals. In our homology search, there were few genes for metal-nicotianamine transporters among green algae, except for the Sphaeropleales; however, their origins are unknown because of their sequence diversity. The increase in transporters in the Sphaeropleales is likely to be related to their lifestyle. In the Sphaeropleales cell cycle, the stage with motility (e.g. flagellates) is not dominant or not known 60,61 . Immobility may force cells to adapt to different environmental conditions aided by their numerous transporters.

Conclusions
The Sphaeropleales are a dominant group of green algae and contain species important in ecosystems and those with potential for applied usage. In this study, we sequenced the nuclear, plastid, and mitochondrial genomes of R. subcapitata, and performed comparative analyses of Sphaeropleales species. The plastid and mitochondrial genomes provided insights into the phylogenetic relationships of R. subcapitata and the complex evolutionary histories in the Sphaeropleales. The nuclear genome of R. subcapitata was the more compact genome (i.e. assembly size) than those of M. neglectum, T. obliquus, and C. zofingiensis. The gene repertoire was conserved in Sphaeropleales species. Comparison of transporter genes indicated that the Sphaeropleales have the potential to adapt to different natural environmental conditions. These findings have implications for future ecological research and applications such as biomarkers and screening of highly oleaginous algae.

Methods
Culture. R. subcapitata NIES-35 (=ATCC22662) is a strain widely used in environmental bioassays e.g. 60 . It is an axenic strain that is available from the Microbial Culture Collection at the National Institute for Environmental Studies, Japan (http://mcc.nies.go.jp). The strain was cultivated at 20 °C in C medium 62 under white LED (~20 µmol photons/m 2 /s) with 12 h:12 h light:dark cycles in 300 mL glass flasks. For tests of cultivation under mixotrophic or heterotrophic conditions, the cells were cultivated in 6 mL of C medium or C medium with 0.5% glucose in glass test tubes under the above light conditions or continuous darkness. Three separate R. subcapitata cultures were grown in each type of medium and light conditions. The cells were shaken using a TAITEC small size shaker NR-3 (TAITEC, Tokyo, Japan) at ~150 rotations per minute. The initial cell concentration was approximately 1.7 × 10 5 cells/mL. Cell concentrations were counted three times using a haemocytometer.
DNA and RNA extraction. For DNA extraction, R. subcapitata was cultured for 2 weeks in 500 mL of C medium. Cells were collected by gentle centrifugation and ground in a pre-cooled mortar with liquid nitrogen and 50 mg of 0.1 mm glass beads (Bertin, Rockville, MD, USA). The cells were incubated with 600 µL of CTAB extraction buffer 63 at 65 °C for 1 h. DNA was separated by mixing with 500 µL of chloroform and centrifuging at 20,000 × g for 1 min; it was concentrated by standard EtOH precipitation. For RNA sequencing, cells were cultured for a week in 100 mL C medium. To acquire whole expressed genes, RNA was extracted twice just before light-and dark-phase. Cells were collected and ground as described above, and RNA was extracted using the RNeasy Mini Kit (Qiagen, Hilden, Germany). DNA contamination was removed using the TURBO DNA-free Kit (Thermo Fisher Scientific, Waltham, MA, USA). Equal amounts of light-and dark-RNA samples were mixed and sequenced at the same time.
Sequencing. For DNA libraries, we prepared paired-end (~550 bp insert) and mate-pair (~3-4 kbp insert) libraries. The paired-end library was prepared using the TruSeq Nano DNA Library Prep Kit for NeoPrep (Illumina, San Diego, CA, USA) with the NeoPrep system (Illumina) following the manufacturer's protocol. The mate-pair library was prepared using the Nextera Mate Pair Sample Preparation Kit (Illumina) following the manufacturer's protocol. We also prepared a paired-end RNA library (~550 bp insert) using the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England BioLabs, Inc., Ipswich, MA, USA). mRNA was purified using the NEBNext Poly(A) mRNA Magnetic Isolation Module (New England BioLabs). All libraries were sequenced on the MiSeq sequencing system (Illumina) using the MiSeq Reagent Kit v3, 600 cycles (300 bp × 2), and the MiSeq Reagent Kit v2, 150 cycles (75 bp × 2), for the paired-end and mate-pair libraries, respectively.

Identification of gene families, NUMTs, and genes for lipid biosynthesis and transporters.
Gene families were searched using TreeFam 9 29 with an e-value cut-off of <1E −5 . NUMTs were searched by blastn with the mitochondrial genomes as a query, with an e-value cut-off of <1E −3 . Genes for lipid biosynthesis were based on Bogen et al. 10 , and searched using blastp with the proteins of Arabidopsis thaliana as a query, with an e-value cut-off of <1E −10 . Transporters were identified using blastp with the Transporter Classification Database (TCDB) 85 as a query, with an e-value cut-off of <1E −5 .
Data availability. The strain used in this study (R. subcapitata, NIES-35) is available from the Microbial Culture Collection at the National Institute for Environmental Studies (NIES) (http://mcc.nies.go.jp), Japan. The DNA paired-end, mate-pair, and RNA paired-end reads were deposited in DDBJ/Genbank/ENA with accession numbers DRR090198, DRR090199, and DRR090200, respectively. The nuclear, plastid, and mitochondrial genomes were deposited in DDBJ/Genbank/ENA with accession numbers BDRX01000001-BDRX01000300, AP018038, and AP018037, respectively.