Genome sequencing of the multicellular alga Astrephomene provides insights into convergent evolution of germ-soma differentiation

Germ-soma differentiation evolved independently in many eukaryotic lineages and contributed to complex multicellular organizations. However, the molecular genetic bases of such convergent evolution remain unresolved. Two multicellular volvocine green algae, Volvox and Astrephomene, exhibit convergent evolution of germ-soma differentiation. The complete genome sequence is now available for Volvox, while genome information is scarce for Astrephomene. Here, we generated the de novo whole genome sequence of Astrephomene gubernaculifera and conducted RNA-seq analysis of isolated somatic and reproductive cells. In Volvox, tandem duplication and neofunctionalization of the ancestral transcription factor gene (RLS1/rlsD) might have led to the evolution of regA, the master regulator for Volvox germ-soma differentiation. However, our genome data demonstrated that Astrephomene has not undergone tandem duplication of the RLS1/rlsD homolog or acquisition of a regA-like gene. Our RNA-seq analysis revealed the downregulation of photosynthetic and anabolic gene expression in Astrephomene somatic cells, as in Volvox. Among genes with high expression in somatic cells of Astrephomene, we identified three genes encoding putative transcription factors, which may regulate somatic cell differentiation. Thus, the convergent evolution of germ-soma differentiation in the volvocine algae may have occurred by the acquisition of different regulatory circuits that generate a similar division of labor.


Results and discussion
De novo genome sequence of Astrephomene gubernaculifera. The de novo whole nuclear genome of A. gubernaculifera strain NIES-4017 was constructed by assembling a combination of long and short sequencing reads into contigs (see "Methods"). The genome assembly with 103,829,650 bp from 206 gap-free contigs and N50 of 1,506,991 bp was obtained. The quality of assembly was verified using BUSCO 37 , which showed the presence of 97.1% genes in the reference dataset (see "Methods"). Moreover, gene prediction based on RNA-seq data identified 13,713 protein-coding gene models of genome assembly in A. gubernaculifera. The genome size and number of genes in A. gubernaculifera were slightly smaller than those of other volvocine algae (Supplementary Table 1). GC content of A. gubernaculifera was 59.9%, which is intermediate between C. reinhardtii (64.1%) and V. carteri (56.1%). Overall, the characteristics of the nuclear genome in A. gubernaculifera did not differ greatly from those in other volvocine algae. The plastid genome (414,859 bp) and mitochondrial genome (33,967 bp) were also sequenced and annotated ( Supplementary Fig. 1).
The analysis of orthology of the genes in four volvocine genomes (C. reinhardtii, Gonium pectorale, V. carteri and A. gubernaculifera) showed that more than a half of orthogroups (putative sets of orthologs or gene families) were shared within the four volvocine species (Supplementary Fig. 2). On the other hand, A. gubernaculifera genome lacked many orthogroups shared in three other volvocine species ( Supplementary Fig. 2a), which might have resulted in the relatively small number of genes in A. gubernaculifera genome (Supplementary Table 1). The synonymous substitution ratios (dS) of conserved orthologs are totally higher in V. carteri than those in A. gubernaculifera and G. pectorale, in contrast to the similarity in non-synonymous substitution ratio (dN) ( Supplementary Fig. 3 Fig. 4) and a previous report 38 . The genomes of all genera except for Astrephomene have been sequenced in previous studies [7][8][9][10][11] Fig. 4), which is consistent with the recent phylotranscriptomic analysis of volvocine species 38 .

RLS1/rlsD homolog and other VARL genes.
The differentiation of somatic cells in V. carteri is controlled by regA, which encodes a putative transcription factor with a DNA-binding SAND-like domain (called the VARL domain), and is expressed only in somatic cells 26 . regA is arranged tandemly with closely related paralogs with VARL domains (rlsA, rlsB, and rlsC) in the V. carteri genome, which is called the reg cluster 27 . The reg cluster is derived from tandem duplication of another VARL domain-containing gene (RLS1/rlsD) at the common ancestor of Volvocaceae, and one of the duplicated genes, regA, has become the master regulator of somatic cells in V. carteri 9,28,29 . C. reinhardtii also has RLS1 27 , whose function may be response to environmental stresses 39,40 . Homologs of regA and other genes with the VARL domain were searched in the genome of A. gubernaculifera. Six VARL genes were found in A. gubernaculifera. Phylogenetic analysis of VARL genes within volvocine algae assigned them to the RLS1/rlsD clade; RLS5, 6, 9/rlsH clade; RLS10/rlsL clade; RLS2/rlsI clade; and rlsF clade (Fig. 2a). Moreover, gene synteny including the A. gubernaculifera RLS1/rlsD gene was similar to that in other volvocine algae (Fig. 2b). No A. gubernaculifera homolog was included in the reg cluster clade (Fig. 2a), and no tandem duplication of an RLS1/rlsD ortholog was found in the A. gubernaculifera genome (Fig. 2b), unlike species in Volvocaceae 28,29 . These results suggest that Astrephomene may have evolved another molecular basis to differentiate somatic cells.
We also examined ECM gene families and several other genes whose homologs are involved in asymmetric cell divisions and spheroidal colony formation during embryogenesis in V. carteri (Supplementary Notes 1, 2).  www.nature.com/scientificreports/ enrichment analysis against these two groups identified many enriched terms in A. gubernaculifera, including "signal transduction, " "cytoskeleton-dependent intracellular transport", and "cilium" as enriched GO terms in somatic genes, and "generation of precursor metabolites and energy" and "photosynthesis" as enriched GO terms in reproductive genes (Fig. 3b, Supplementary Table 3). These terms were common with somatic-and gonidial-enriched terms in V. carteri 32 , suggesting that germ-soma differentiation in Astrephomene and Volvox have similar features in gene expression. We further investigated the expression patterns of A. gubernaculifera genes involved in respective biological functions. Among the genes involved in motility, many of the genes encoding flagella components belonged to somatic genes. In particular, most intraflagellar transport genes and Bardet-Biedel Syndrome genes were somatic genes ( Supplementary Fig. 10, Supplementary Data 4). Although both somatic and reproductive cells in Astrephomene have flagella 33 , unlike Volvox, this result implies that somatic cells in Astrephomene are also more specialized for motility compared to reproductive cells. The most obvious difference in gene expression in A. gubernaculifera between somatic and reproductive cells was observed in photosynthetic genes. Almost all of the genes encoding core subunits of photosynthetic complexes on the thylakoid membranes exhibited higher expression in reproductive cells (Fig. 3c, Supplementary Fig. 11, Supplementary Data 5). On the other hand, gene expression of assembly factors for photosynthetic complexes and enzymes for chlorophyll biosynthesis did not show clear differences between somatic and reproductive cells (Fig. 3c, Supplementary Data 5). In V. carteri, almost all of the genes encoding assembly factors and enzymes for chlorophyll biosynthesis are highly expressed in gonidia 32 . Overall, both Astrephomene and Volvox are thought to have downregulation of photosynthesis in somatic cells.
Among carbon metabolism genes (Fig. 3d, Supplementary Fig. 12, Supplementary Data 6), almost all genes in the Calvin-Benson-Bassham cycle were reproductive genes in A. gubernaculifera. Moreover, more than half of fatty acid biosynthesis genes and half of amino acid biosynthesis genes were reproductive genes. These anabolic pathways were also upregulated in gonidia of V. carteri 32 . In addition, genes involved in acetate metabolism were reproductive in A. gubernaculifera. Acetate is mainly metabolized during the TCA cycle and glyoxylate cycle in heterotrophic algae 41 . In A. gubernaculifera, genes involved in both metabolic pathways, including enzymes for acetyl-CoA synthesis from acetate (ACK2, PAT1, and ACS1 for TCA cycle, ACS2 and ACS3 for glyoxylate cycle) 42,43 and downstream TCA and glyoxylate cycles, were mainly upregulated in reproductive cells (Supplementary Fig. 12, Supplementary Data 6). By contrast, glyoxylate cycle genes, along with upstream fatty acid degradation, are upregulated in the somatic cells of V. carteri 32 . Differences in expression patterns of the glyoxylate cycle between A. gubernaculifera and V. carteri may reflect different trophic states: unlike Volvox, Astrephomene exhibits strong heterotrophy and requires acetate or some carboxylic acid as a carbon source for growth 36 . Thus, growth in somatic cells of A. gubernaculifera is thought to be downregulated via both phototrophic and heterotrophic pathways, whereas that of V. carteri somatic cells may be downregulated via only photosynthetic pathways.
An overview of gene expression patterns of somatic and reproductive cells in A. gubernaculifera documented in this study and comparison with those in V. carteri are summarized in Fig. 4c. In V. carteri, gonidia are larger than somatic cells at the end of embryogenesis due to the asymmetric cell divisions 44 , and the difference in cell size between cell types become much larger as gonidia grow faster than somatic cells 45,46 . This difference in cell growth between reproductive and somatic cells results from upregulation of photosynthesis in gonidia and upregulation of ECM biosynthesis in somatic cells 32 . On the other hand, the size of somatic and reproductive cells in A. gubernaculifera is almost the same at the end of embryogenesis 35 , and the difference in cell size between somatic and reproductive cells in mature colonies (Fig. 1c) may result from the different growth rates between the two cell types. The present RNA-seq analysis of A. gubernaculifera showed the upregulation of photosynthesis and acetate metabolism in reproductive cells (Fig. 3c,d, Supplementary Figs. 11, 12, Supplementary Data 5, 6), which may result in larger growth in reproductive cells than somatic cells in A. gubernaculifera. The source-sink hypothesis in Volvocales 47 , a hypothesis that somatic cells act as a source for nutrients and reproductive cells act as a sink, may not be applied to germ-soma differentiation in Astrephomene, since there are only two or four somatic cells in a 32-or 64-celled colony, and reproductive cells in Astrephomene may grow by themselves.
Candidate genes for master regulators of germ-soma differentiation. In V. carteri, regA has extremely high somatic/gonidial (reproductive) expression ratios (150-1000-fold) 30,32 . To find candidate master regulator genes of germ-soma differentiation in A. gubernaculifera, we examined the transcripts of 147 somaticspecific genes that exhibited significant values of somatic/reproductive expression ratios higher than five by a statistical test (Supplementary Data 9, see also Supplementary Methods for detail). Of these, we found three genes encoding putative transcription factors: Agub_g2945, Agub_g5284, and Agub_g8265 (Fig. 4). Agub_g2945 contained two MYB repeats in a DNA binding domain (Fig. 4a) and was an R2R3-MYB transcription factor, a group that is diverged in Viridiplantae (green plants and algae) 48 (Supplementary Fig. 13a). Agub_g5284 had one MYB repeat and was thought to be a 1R-MYB transcription factor (Fig. 4a), which belongs to a diverged group within volvocine green algae (Supplementary Fig. 13b). Though 1R-MYB genes in this group have expanded in the volvocine species, there is no tandem repeats of these genes in the volvocine genomes. On the other hand, Agub_g8265 encoded an RWP-RK transcription factor (Fig. 4a), which is included in the RKD(B) subfamily diverged in volvocine algae ( Supplementary Fig. 14) 49 and was separated from MID (sex determination) 50 and NIT2 (nitrate signaling) 51 .
These somatic-specific transcription factors in A. gubernaculifera have putative orthologs in V. carteri and other volvocine algae (Fig. 4b, Supplementary Figs. 13, 14). Because the homolog of Agub_g2945, Vocar.0014s0065, exhibits low expression in both somatic cells and gonidia in V. carteri 32 , the somatic-specific expression of Agub_g2945 may only have evolved in the ancestor of Astrephomene. On the other hand, because both Agub_g5284 and its homolog Vocar.0011s0039 exhibit somatic-specific expression 32 32 . Thus, the rlsH homologs in A. gubernaculifera and V. carteri may have similar function in somatic cells, which is currently unknown. As A. gubernaculifera does not have a regA-like gene (Fig. 2), the rlsH homolog also might be involved in the differentiation of somatic cells in A. gubernaculifera. www.nature.com/scientificreports/ Although the functions of the three somatic-specific transcription factors in A. gubernaculifera found in this study and their closely related homologs in other volvocine algae (Fig. 4) are currently unknown, these genes warrant further investigation as candidates of master regulators of germ-soma differentiation in A. gubernaculifera. In addition to these transcription factors and the rlsH homolog, it is possible that the post-transcriptional regulation of other transcription factors may be involved in the cell differentiation, which would be discovered by further studies on Astrephomene. The difference in such regulator genes between A. gubernaculifera and V. carteri may also affect differences in the downregulation of metabolic pathways (especially in acetate metabolism) to repress the growth of somatic cells (Fig. 4c).

Conclusion
Convergent evolution, the independent evolution of similar traits in two or more different groups, is a crucial topic in evolutionary biology. Recent studies with genetic and genomic tools have revealed that convergent evolution has occurred at various hierarchical levels even for the similar phenotype: in mutations, genes, gene functions, networks or pathways 52,53 . Germ-soma differentiation is a pivotal trait in convergent evolution, as it has occurred in many multicellular lineages and contributed to the evolution of multicellular complexity 3,22 . However, it has been unsolved at what level the convergence of germ-soma occurred. Here, we documented a hierarchy in the convergent evolution of germ-soma differentiation by focusing on Volvox and Astrephomene (Fig. 1). Through de novo genome sequencing and RNA-seq analyses of A. gubernaculifera, we revealed differences in the regulation of germ-soma differentiation at the genetic level. While the master regulator gene for germ-soma differentiation in V. carteri is regA, we demonstrated that there is no regA or no reg cluster gene in the Astrephomene genome (Fig. 2). However, we discovered three somatic-specific transcription factors that are candidate genes for regulating germ-soma differentiation in Astrephomene. Although regA belongs to the VARL gene family with the SAND domain, these three Astrephomene candidate genes belong to either the MYB or RWP-RK gene family (Fig. 4). Nevertheless, the present and previous cell-type RNA-seq analyses of Astrephomene and Volvox have demonstrated that these two multicellular organisms share a common overall pattern of two cell types in gene expression at the metabolic pathway level: reproductive cells are specialized for cell growth and reproduction, while somatic cells are specialized for motility and ECM biosynthesis instead of cell growth (Fig. 3). Thus, different transcription factors in the ancestors of Astrephomene and Volvox may have been independently co-opted for a similar function in the regulation of gene expression in differentiated cells, which would underlie the convergent evolution of germ-soma differentiation within the volvocine green algae. This contrasts with the recent studies which have reported that the repeated evolution of the same transcription factor underlies phenotypic convergence, such as parallel duplication of Zn-cluster TFs associated with multiple emergences of unicellular yeast-like fungi 54 .
The present study suggested that the convergent evolution of germ-soma differentiation in Volvox and Astrephomene is based on evolution of different transcription factors that control similar gene expression patterns in the different linages (Fig. 4c). This evolution may involve various changes at many levels: expression and/or regulation of the transcription factors themselves, cis-regulatory elements they bind, target genes and/or gene circuits having the cis-regulatory elements, etc. Thus, the convergent evolution of germ-soma differentiation in Volvox and Astrephomene may be much more complicated than previous, well-known examples of convergent evolution in animals, such as convergence of pigmentation by the mutation in the same pathway (but different genes or different gene function) in mice 55 and lizards 56 . Further study on the convergent evolution of germsoma differentiation in Astrephomene and Volvox would require the new approach to finding the evolutionary changes, such as comparative ChIP-seq analysis between volvocine species, after the identification of the master regulator for somatic cell differentiation in Astrephomene. The convergent evolution of germ-soma differentiation in volvocine algae would be a simple, suitable model for understanding how different transcription factors have been co-opted for the evolution of a similar phenotype.

Methods
Strain and culture conditions. Astrephomene gubernaculifera strain NIES-4017 (from the Microbial Culture Collection at the National Institute for Environmental Studies [NIES]; http:// mcc. nies. go. jp/ 35,57 ) was used throughout this study. The culture was maintained in 10 mL of VTAC medium 57,58 in screw-capped tubes (18 × 150 mm) at 25 °C on a 12-h light/12-h dark schedule under cool-white fluorescent lamps at an intensity of 50-90 μmol·m −2 ·s −1 . For experimental analyses, synchronous cultures were established as described previously 35 (Fig. 1f, Supplementary Methods).
De novo genome sequencing and assembly. For extraction of genomic DNA, young colonies just after embryogenesis in synchronous cultures of A. gubernaculifera (ZT 15 in Fig. 1f) were used. Genomic DNA from cells was prepared based on methods described previously 59  www.nature.com/scientificreports/ For A. gubernaculifera nuclear genome assembly, PacBio subreads were first assembled de novo using Falcon v0.7, Falcon-unzip v0.4 60 , and SMRT Link v6.0.0.47841 (Pacific Biosciences). The Illumina data were then mapped against the PacBio assembly sequence, and assemblies were corrected using Pilon v1.22 61 . Contigs with < 45% GC (putative organelle genome) were excluded from a set of the nuclear genome sequence. The quality of genome assembly was verified using BUSCO ver.5.0.0 37 with the chlorophyta_odb10 dataset (1519 BUSCOs), which indicated that 97.1% complete genes in the reference dataset were present in the current genome assembly (C: 97.1% S: 96.6% D: 0.5% F: 0.9% M: 2.0%). Plastid and mitochondrial genomes were also assembled and annotated, respectively (Supplementary Methods).
Prediction of gene models of Astrephomene gubernaculifera using RNA-seq data. For extraction of total RNA for annotation of the nuclear genome of A. gubernaculifera, young colonies and embryos in synchronous culture (ZT 23 and ZT 11 in Fig. 1f) were used. RNA was extracted using TRIzol reagent (Thermo Fisher Scientific, MA, USA) according to the manufacturer's instructions. Illumina RNA-seq libraries were constructed using a TruSeq Stranded mRNA Library Prep Kit (Illumina). These libraries were sequenced on an Illumina HiSeq 2500 sequencer (Illumina). 22.6 M and 24.2 M paired-end reads (100 bp × 2) were generated for young colonies and embryos, respectively. The sequenced reads from two samples were merged and aligned to the A. gubernaculifera genome assembly described above using HISAT2 version 2.1.0 62 . 89.0% of read pairs were uniquely mapped to the genome. The mapped reads were assembled using StringTie v2.0.4 63 and the coding regions in assemblies generated by StringTie were identified using TransDecoder v5.5.0 (https:// github. com/ Trans Decod er/ Trans Decod er) for predicted gene models.  Cell-type RNA-seq analysis. The separation of somatic and reproductive cells of A. gubernaculifera, RNAseq analysis using both cell samples and differential expression analysis were performed as shown in Fig. 3a Fig. 15).

Identification and analysis of VARL genes and other genes related to multicellular traits. The
Identification and analysis of somatic-and reproductive-specific genes. The characteristics of proteins encoded in all somatic-and reproductive-specific genes described above were deduced based on a BLASTP search against the NCBI nonredundant database (E-value < 1e−10) using DIAMOND v2.0.2.140 64 and a domain search against all Pfam-A models 65 using "hmmscan" (E-value < 1e−5) in HMMER. The putative hydroxyproline-rich domains, which contain prolines at a high ratio, were difficult to detect by these searches and were instead found by checking sequences manually. Among somatic-specific genes, putative MYB transcription factors, Agub_g2945 and Agub_g5284, and a putative RWP-RK transcription factor, Agub_g8265, were subjected to phylogenetic analysis (Supplementary Methods).

Statistics and reproducibility.
For cell-type RNA-seq analysis, we isolated three somatic and reproductive cell samples from different flasks containing A. gubernaculifera culture, respectively, and treated as biological replicates (Supplementary Table 2). The differential expression analysis was based on Wald test between the three somatic and three reproductive replicates using DESeq2 66 . GO enrichment analysis was based on twosided Fisher's exact test for the detected differential expressed genes (somatic and reproductive genes) using Blast2GO 67 . For both tests, statistical significance was defined as a false discovery rate (FDR) < 0.05. See Supplementary Methods for detail.

Data availability
The nuclear genome assembly with annotated gene models of A. gubernaculifera strain NIES-4017 has been deposited to DDBJ/EMBL/GenBank (BioProject accession number: PRJDB10253; BioSample accession number: SAMD00236769; WGS accession numbers: BMAR01000001-BMAR01000206). The plastid and mitochondrial genome assemblies with annotated gene models of A. gubernaculifera strain NIES-4017 have been deposited to DDBJ/EMBL/GenBank (accession number: LC523292 [plastid], AP024562 [mitochondria]). The raw read sequence data from cell-type RNA-seq analysis also has been deposited to DDBJ/EMBL/GenBank (BioProject accession number: PRJDB10862; BioSample accession numbers: SAMD00260684-SAMD00260689; DRA accession number: DRA011195). The alignment file and dataset partitioning and best-fitted substitution models used for phylogenomic analysis of volvocine species is Supplementary Data 10 and 11 for this article, respectively. All the other sequence alignments used for phylogenetic analyses have been deposited in TreeBASE (https:// treeb