Phylotranscriptomic insights into a Mesoproterozoic–Neoproterozoic origin and early radiation of green seaweeds (Ulvophyceae)

The Ulvophyceae, a major group of green algae, is of particular evolutionary interest because of its remarkable morphological and ecological diversity. Its phylogenetic relationships and diversification timeline, however, are still not fully resolved. In this study, using an extensive nuclear gene dataset, we apply coalescent- and concatenation-based approaches to reconstruct the phylogeny of the Ulvophyceae and to explore the sources of conflict in previous phylogenomic studies. The Ulvophyceae is recovered as a paraphyletic group, with the Bryopsidales being a sister group to the Chlorophyceae, and the remaining taxa forming a clade (Ulvophyceae sensu stricto). Molecular clock analyses with different calibration strategies emphasize the large impact of fossil calibrations, and indicate a Meso-Neoproterozoic origin of the Ulvophyceae (sensu stricto), earlier than previous estimates. The results imply that ulvophyceans may have had a profound influence on oceanic redox structures and global biogeochemical cycles at the Mesoproterozoic-Neoproterozoic transition.

T he Chlorophyta is a large and ancient clade of green plants (Viridiplantae), including morphologically diverse green algae living in a wide range of habitats. The Ulvophyceae, one of the main classes of the Chlorophyta, displays an extraordinary diversity, ranging from unicellular or multicellular organisms to giant-celled siphonous or siphonocladous thalli 1,2 . Most species are marine macroalgae (green seaweeds), which are important primary producers in coastal ecosystems, and have important ecological functions as ecosystem engineers 3 . A smaller diversity is found in freshwater and terrestrial habitats 3,4 . Rapid growth of some green seaweeds (e.g., Ulva, Cladophora) under high temperature and nutrient conditions can lead to green algal blooms in coastal waters, also known as "green tides", with negative economic and ecological impacts. Ulvophyceans are also economically important. For example, species in the genera Ulva, Caulerpa, and Cladophora produce sustainable biomass material for the food, aquaculture and biofuel industries, as well as medicine and pharmacology 3,5 . A reliable phylogenetic framework and an evolutionary timeline of the Ulvophyceae are urgently needed to improve our understanding of the evolutionary innovations of these green seaweeds, such as their unique cytomorphological features 6 , and carbon concentrating mechanisms which contribute to the high growth rates of some ulvophyceans 7 .
The phylogenetic relationships among the main lineages of the Ulvophyceae have been difficult to resolve (Fig. 1). Nuclear ribosomal and chloroplast multigene data produced inconsistent and contradictory results 2 . Furthermore, the monophyly of the Ulvophyceae has been questioned based on the lack of synapomorphic characters (including ultrastructural features of the flagellate reproductive cells and modes of nuclear division) distinguishing them from the related Chlorophyceae and Trebouxiophyceae 8,9 (Supplementary Note 1 and Supplementary  Table 1). Although these ultrastructural features were believed to accurately reflect phylogenetic relationships because of their involvement in fundamental processes of cell replication and cell motility, analyses of these data have not been able to resolve the phylogenetic relationships among the Ulvophyceae, Chlorophyceae, and Trebouxiophyceae, or among the orders of the Ulvophyceae 10 . The advent of molecular methods provided new opportunities for reconstructing the phylogeny of the green algae 2 . Early phylogenies based on ribosomal DNA have recovered the Ulvophyceae as a monophyletic group but with weak support 11,12 . A 10-gene phylogeny (eight nuclear and two plastid genes) recovered the Ulvophyceae as a well-supported clade for the first time, and elucidated the relationships of the main lineages 6 . More recently, however, chloroplast phylogenomic analyses have rejected the monophyly of the Ulvophyceae, and instead indicated that several ulvophycean lineages were related to other core chlorophytan clades [13][14][15][16] . Phylotranscriptomic analyses partly confirmed these chloroplast phylogenomic studies, supporting a non-monophyletic Ulvophyceae with the Bryopsidales being sister to the Chlorophyceae, or indicating a hard polytomy among Chlorophyceae, Bryopsidales, and the remaining Ulvophyceae [17][18][19][20] , in which the relative branching order of (1) (2) (3) Fig. 1 Current knowledge of the phylogenetic relationships among the main lineages of the core Chlorophyta based on nuclear phylogenomic studies 17,18,20 . Uncertain phylogenetic relationships are indicated by polytomies or dashed lines. The three different interpretations of the Proterocladus antiquus fossil, corresponding to the three strategies used in our dating analyses, are indicated with arrows: (1) Proterocladus as a total-group cladophoralean, (2) Proterocladus as a total-group ulvophycean, and (3) uncertain phylogenetic position (within or outside the green algae). ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-29282-9 these taxa cannot be resolved with confidence. However, these genome-scale analyses have suffered from relatively low taxon sampling (e.g., in the orders Dasycladales and Cladophorales), and factors such as long-branch attraction (LBA) artifact and ancient rapid radiation likely confounded phylogenetic relationships 12,13,21 . Therefore, richer taxon sampling, genomewide nuclear datasets, and application of more plausible evolutionary models are expected to further enhance our knowledge of phylogenetic relationships within the Ulvophyceae.
Estimating a timeframe for diversification of the green algae has been an equally difficult task given the absence of biomineralization in most groups of green algae, resulting in poor fossil records. In addition, the phylogenetic interpretation of alleged green algal fossils, particularly unicellular fossils with simple morphologies, is a significant challenge because of known morphological convergence in the evolution of protists and algae 22,23 . Thus, the taxonomic interpretation of Precambrian green algal fossils has been heavily debated, limiting their value as fossil calibration points in molecular clock analyses. Only a few orders of Ulvophyceae that include species with calcified thalli (e.g., Bryopsidales and Dasycladales) have a relatively rich fossil record, but these fossils are generally younger, occurring from the Paleozoic onward [24][25][26] . A notable Precambrian fossil genus is Proterocladus, characterized by branched filamentous thalli, and first described from Neoproterozoic deposits of ca. 750 mya (refs. 24,27 ). Recently, a new species, P. antiquus, has been described from one-billion-year-old deposits 28 , prompting further interest in the phylogenetic and molecular clock analyses of the Ulvophyceae. Based on the apparent coenocytic thallus architecture and its resemblance with extant Cladophora and Cladophoropsis species, Proterocladus has been attributed to the Cladophorales, although its taxonomic placement remains contentious 18,29 . Thus, it is appropriate to more thoroughly evaluate Proterocladus in a phylogenetic context to better constrain the evolution of ulvophyceans. A more accurate phylogeny and evolutionary timeline of chlorophytes and ulvophyceans would in turn improve our understanding of the potential impact of green algae on the Mesoproterozoic-Neoproterozoic transition that bridges the "boring billion" (an apparently quiescent period in the global carbon cycle between 1600 and 1000 Ma) and the "freezing millions" (three glaciation events at about 717-662, 654-635, and 580 Ma) 30 .
In this study, we carried out an intensive nuclear gene sampling including 11 new ulvophycean transcriptomes, and constructed a multigene dataset containing 884 genes and 69 species (including 43 ulvophyceans). We applied coalescent-and concatenationbased phylogenetic approaches, and took the heterogeneity of genes, sites and lineages into account in the model selection. We resolved the backbone relationships of the Ulvophyceae, and evaluated the diversification time estimates through three fossil assignment strategies to account for the alternative interpretations of P. antiquus. Notably, our results suggest important connections between the early evolution of green seaweeds and significant changes in the Earth system during and immediately after the Mesoproterozoic-Neoproterozoic transition.

Results
Phylogenomic analyses of the core Chlorophyta. Our dataset included 884 nuclear genes from 69 species of Ulvophyceae, Chlorophyceae and Trebouxiophyceae, and four Chlorodendrophyceae species as outgroup. We applied coalescent-and concatenation-based approaches to reconstruct phylogenetic relationships. All the phylogenetic trees were largely congruent and most relationships among major lineages of core Chlorophyta were well-supported. Trebouxiophyceae and Chlorophyceae were inferred as monophyletic groups with full support, and the former was sister to a clade containing the Ulvophyceae and Chlorophyceae ( Fig. 2A, Supplementary  Figs. 1-3). The Ulvophyceae, however, was recovered as paraphyletic. Bryopsidales and Chlorophyceae were recovered as a clade in all analyses, with full support in the concatenation-based analyses, and weak support in the coalescent-based analysis (0.55/ 33, PP/MLBS, Fig. 2A). The remaining orders of Ulvophyceae (hereafter Ulvophyceae sensu stricto or s.s.) were grouped in three subclades: (1) the UUOI clade, which successively placed Ignatiales, Oltmannsiellopsidales, and Ulotrichales as sister to Ulvales, (2) the DS clade consisting of Dasycladales and Scotinosphaerales, and (3) the TC clade including the Trentepohliales and Cladophorales plus Blastophysa. Within each of these subclades, relationships were concordant among the different analyses ( Fig. 2A, Supplementary Figs. 1-3). Incongruence was found in the position of the DS clade (Fig. 2B). The DS and UUOI clades were sister in the coalescent-based analysis, as well as in the concatenation-based analyses using partition strategy and posterior mean site frequency (PMSF) model ( Fig. 2A, Supplementary Figs. 1, 2), while the concatenation-based analysis using a heterotachy model supported a sister relationship between the DS and TC clades ( Supplementary Fig. 3). The phylogenetic conflict in the concatenation-based analyses implied that the relationships among these deep lineages of the Ulvophyceae (i.e., DS and TC clades) were sensitive to model selection.
Gene tree discordance and coalescent simulations. In the coalescent-based analysis, the UUOI and DS clades as well as the Bryopsidales-Chlorophyceae relationship were separated by extremely short and weakly supported branches ( Fig. 2A). In addition, conflicts between gene trees and the coalescent species tree were particularly situated in these short branches ( Supplementary Fig. 4). The proportions of conflicting bipartitions for the UUOI-DS node, and the Bryopsidales-Chlorophyceae nodes were 92.76% and 87.10% respectively, and the concordant proportions were only 2.15% and 4.75%, respectively (pie charts in Fig. 2A). We also detected gene tree discordance using quartet supports among three topologies (q1-q3) of the focal internal branch in the coalescentbased analysis. Of the 884 gene trees, 36% supported the main topology ((DS, UUOI), TC), 31% supported a first alternative topology ((UUOI, TC), DS), and the remaining 33% supported a second alternative topology ((DS, TC), UUOI) (Fig. 3A). Similarly, 36% of 884 gene trees supported Bryopsidales as sister to Chlorophyceae, 28% supported Bryopsidales as sister to the Ulvophyceae s.s., and the other 36% supported Chlorophyceae as sister to the Ulvophyceae s.s. (Fig. 3A). The short branches and similar topological frequencies (q1-q3) of gene trees are suggestive of incomplete lineage sorting (ILS) in the deep lineages of Ulvophyceae.
To further test whether the short branches and gene tree discordance were caused by ILS, we assembled two small datasets [7-taxon(I) and 7-taxon(II)] and performed coalescent simulations. We separately simulated 100,000 gene trees from 7-taxon(I) and 7-taxon(II) species trees under the multispecies coalescent model (Fig. 3B), and compared the topological frequencies of simulated gene trees with observed gene trees. There was a significant correlation between the simulated and observed topologies (Pearson's correlation coefficient = 0.994, p < 0.01 for 7-taxon(I) dataset; Pearson's correlation coefficient = 0.997, p < 0.01 for 7-taxon(II) dataset, Fig. 3C), indicating that ILS could well explain the topological conflicts among the main lineages of Ulvophyceae. The multispecies coalescent model can accommodate gene tree heterogeneity and has been proven to be able to produce more accurate species trees in the presence of ILS (refs. 31,32 ). Therefore, we used the phylogenetic relationship of the Ulvophyceae reconstructed by the multispecies coalescent model as a framework for subsequent analyses.
Time-calibrated phylogeny of the Ulvophyceae. Our phylogenomic results provided a phylogenetic framework for estimating divergence times in ulvophycean evolutionary history. Timecalibrated phylogenetic analyses were performed using a Bayesian relaxed clock model in MCMCTree, based on the coalescent species tree and 200 most clock-like genes. We compared the posterior distributions to the effective prior distributions for the different calibrated nodes ( Supplementary Fig. 5  priors were generated without molecular data, and posteriors were obtained using data. Most fossil-calibrated nodes demonstrated shifts between posterior and effective prior, confirming that posterior time estimates are not simply dependent on priors, but also on the contribution of molecular data. Our results of three calibration strategies indicated that divergence times of Ulvophyceae s.s. was sensitive to the placement of the one-billionyear-old fossil Proterocladus antiquus (Fig. 4

Discussion
Phylogenetic relationships and morphological evolution of Ulvophyceae. The recent explosive growth of omics data and application of more realistic evolutionary models are rewriting our understanding of the relationships and temporal patterns of diversification of many taxa. Ulvophyceae is not an exception to this. In agreement with recent studies [17][18][19][20] , our phylogenomic analyses indicate that the Ulvophyceae is non-monophyletic, with the Bryopsidales and the Chlorophyceae forming a clade, which is sister to the Ulvophyceae s.s. (Fig. 2, Supplementary Figs. 1-3). Increased taxon sampling largely confirmed the relationships within the Ulvophyceae s.s., supporting three main clades (UUOI, DS, and TC), and enabled resolving the phylogenetic relationships within these clades with higher confidence. The relationships within the UUOI clade have been contentious 11,12,18,33 . Some studies placed the Oltmannsiellopsidales as a sister clade to the clade containing Ignatiales, Ulotrichales, and Ulvales, but these relationships were either based on a limited number of genes (e.g., small subunit rDNA and/or rbcL sequences, refs. 11,12 ), or received moderate to poor support 17,18 . Our analyses are consistent with chloroplast phylogenomic analyses, which also showed a relatively robust support for the relationships among these four orders, with the Ignatiales branching off first 33 . Conversely, some nuclear phylogenomic analyses resolved the Oltmannsiellopsidales as a sister clade to the DS clade 20 . This study confirms the sister relationships of the Dasycladales and Scotinosphaerales, as well as the Trentepohliales and Cladophorales-Blastophysa of previous nuclear phylogenomic studies 18,20 . Phylogenetic analyses based on one or a few genes, as well as chloroplast genomes have largely failed to resolve the relationships among these orders 6,11,12,34,35 . The contentious relationships of these orders were likely caused by insufficient taxon sampling, for instance the notable absence of Scotinosphaerales 11 or Cladophorales 15 in chloroplast-based analyses. In addition, LBA appearing in several phylogenetic analyses of the Ulvophyceae also likely biased the topology 12,13,21,34 . In the present study, relationships within the DS and TC clades were well resolved, and we greatly alleviated long branches in the Trentepohliales and Cladophorales by denser taxon sampling, and by employing site-heterogeneous and heterotachy models. In agreement with previous phylogenomic studies 17,18,20 , most of our analyses did not support a sister relationship between TC and DS clades.
The inferred phylogenetic relationships provide a solid framework to further our understanding of cytomorphological evolution in the green seaweeds. Ancestral state reconstruction confirmed that multicellularity, as well as the siphonous thallus architecture evolved independently and repeatedly in various  Supplementary Figs. 9, 10). Our phylogeny also provides a framework for re-evaluating earlier hypotheses of the evolution of cytological features that were believed to be phylogenetic informative at higher taxonomic level, such as the ultrastructure of the flagellar root system and the processes of mitosis and cell division (Supplementary Note 1 and Supplementary Table 1). Some ultrastructural features, such as the processes of mitosis and cytokinesis, may be less conserved than previously thought, as for example illustrated by the fundamentally different processes in the sister clades Cladophorales and Trentepohliales. However, information on ultrastructural features is largely lacking for the smaller ulvophycean lineages, hampering a complete picture of ultrastructural evolution in the group.
Given the conflicting relationships among the UUOI, DS, and TC clades, combined with the short branches separating these lineages (Fig. 2, Supplementary Figs. 1-3), we tested the hypothesis of ILS in the early evolutionary radiation of Ulvophyceae.
Ancient incomplete lineage sorting explains topological conflict. Phylogenetic incongruence can result from multiple processes and factors that often act together, such as hybridization,  Fig. 4 Comparison of divergence time estimates from three fossil assignment strategies. The differences between the three calibration strategies refer to the placement of Proterocladus antiquus (strategy 1: P. antiquus to stem-group Cladophorales; strategy 2: P. antiquus to stem-group Ulvophyceae s.s.; strategy 3: exclusion of P. antiquus). Node ages are plotted at the posterior means, with horizontal bars representing 95% credibility intervals. ILS, gene duplication/loss, and model misspecification [36][37][38][39][40] . It is widely known that model misspecification can have a substantial impact on phylogenetic estimation, especially for ancient lineages like the Chlorophyta [41][42][43] . It has been suggested that conflicting phylogenetic relationships and the ubiquitous long branches within the Ulvophyceae resulted from inadequate modeling 3 . To address these problems and to test the performance of different models, we used three strategies in the concatenation-based analyses (i.e., partition strategy, PMSF model, and GHOST model). Substitution rate variation across genes, sites and lineages were all taken into account 44,45 . LBA was alleviated in our analyses by increasing taxon sampling and applying heterogeneous models, and most of the inferred topologies using different strategies were congruent (Fig. 2, Supplementary Figs. 1-3), which conferred confidence in the phylogenetic results. However, the phylogenetic position of the DS clade remained uncertain. Specifically, phylogenetic analysis based on a gene-wise partition and site-heterogeneous model supports congruent topology in which the TC clade diverged before the UUOI and DS clade ( Supplementary Figs. S1, S2). In contrast, in the phylogenetic tree estimated by the heterotachy model, the UUOI clade was resolved as a sister to the two other clades (Supplementary Fig. 3). The different topologies imply that these models were insufficient to explain the phylogenetic conflict, and/or that our dataset was sensitive to the rate variation across lineages at deep divergences in the Ulvophyceae. It is difficult to directly measure the fit between these complex models and datasets in phylogenetic analyses. Some approaches for assessing model-to-data fitness have been commonly used for model validation, that is, simulation data are generated using the model and parameter values, and observed data are compared with the simulated data to determine if they can represent the distribution defined from the model 46,47 . Puttick et al. 48 used posterior predictive tests and showed that the site-heterogeneous CATGTR model fitted better than the site-homogeneous GTR model for inferring land plant phylogeny. So far, however, methods of simulating the PMSF or General Heterogeneous evolution On a Single Topology (GHOST) model are not available, and future work is needed for evaluating model fitness.
ILS has been intensively studied in recently diverging lineages 49 , but there is no reason to rule out ILS in ancient rapid radiations, confounding the phylogenetic history of deep nodes, as has been demonstrated in early land plants relationships 50 , early angiosperm evolution 51,52 , and rapid adaptive radiations, including those of neoavian birds 53 , legumes 54 , and the amaranth family 37 . Under these circumstances, the multispecies coalescent method is expected to perform better than the concatenation method, since the former is able to deal with the gene tree heterogeneity caused by ILS, and is likely to result in a more accurate estimation of the species tree 31,32 . In our study, although the branching order of the UUOI, DS and TC clades was not well resolved in concatenation-based analyses, we recovered the same phylogeny in the concatenation-based analyses using a partition strategy and the PMSF model, as in the coalescent-based analysis (Fig. 2B). Moreover, several deep branches were very short, especially at the internodes of the UUOI-DS clade, and Bryopsidales-Chlorophyceae clade ( Fig. 2A). Our analyses revealed strong gene tree conflicts among the focal branches, corresponding to the gene tree quartet probability observed in Fig. 3A. The proportions of genes supporting alternative topologies were roughly equal, similar to the observations reported by previous transcriptomic analyses 18 . These phenomena hint at the possibility of ILS, in conjunction with an ancient rapid radiation 42,51,55-57 . Our simulation results confirmed that deep divergences in the Ulvophyceae were affected by pervasive ILS. Consistent with several previous studies that found that the multispecies coalescent model is superior to concatenation in diverse phylogenomic datasets across the tree of life 40,51 , the multispecies coalescent model is likely to be suitable for our dataset in the presence of ILS, emphasizing that the degree of fit between model and datasets should be considered in phylogenetic analyses.
Insights in ulvophycean evolution at the Mesoproterozoic-Neoproterozoic transition. Fossil calibration is by far the most critical factor in divergence time estimation 58 . Although the Phanerozoic fossils used in this study provide reasonably robust calibration points, they are relatively few in number owing to the scarce fossil records of green algae. This means that the Precambrian fossils used as calibration points potentially exert a strong influence on time estimation. In this context, it is critical to carefully assess the phylogenetic interpretations of the selected Precambrian fossils 59,60 , particularly given their relatively simple morphologies 61 . Del Cortona et al. 18 have evaluated the impact of different phylogenetic assignments of the middle Neoproterozoic filamentous fossil Proterocladus major (ca. 750 Mya, ref. 24 ). Their study supported an early diversification of the Ulvophyceae in the Tonian Period, with further taxonomic, morphological, and ecological expansions in the Ediacaran and Ordovician Periods, consistent with the Paleozoic macroalgal fossil record 61,62 . The recent discovery of the one-billion-year-old fossil Proterocladus antiquus provided an opportunity to further evaluate how these ancient fossils impact our understanding of the timescale of green algal evolution 28 . Based on morphological similarities with contemporary species of Cladophorales, P. antiquus has been interpreted as a total-group cladophoralean 28 . This interpretation is credible given the apparent coenocytic organization of Proterocladus, the fact that most of modern coenocytic green algae belong to the Cladophorales, and other morphological features of Proterocladus (e.g., a rather unusual branching style, holdfast) that are consistent with a cladophoralean interpretation. However, it is impossible to exclude the possibility of convergent evolution. Coenocytic forms are not restricted to the Cladophorales, and also occur in the Ulotrichales (e.g., Acrosiphonia), for example. It is thus reasonable to assume that during the long evolutionary history of the green algae there have been otherperhaps extinct-lineages of Ulvophyceae with similar coenocytic architectures. A more conservative strategy would thus be to treat Proterocladus as a total-group ulvophycean. Finally, there is a remote possibility that Proterocladus belongs to an extinct coenocytic lineage outside the Ulvophyceae or even outside the green algae, considering that coenocytic cells are present in diverse and unrelated groups of algae, including Xanthophyceae (e.g., Vaucheria), red algae (e.g., Griffithsia), even though their morphology differs significantly from Proterocladus or Cladophorales green algae. These different phylogenetic interpretations of P. antiquus led us to evaluate the impact of three different taxonomic assignments of P. antiquus on divergence time estimates. Note that Sforna et al. 63 recently reported a~1 Gyr-old multicellular photosynthetic eukaryote, Arctacellularia tetragonala, with a siphonocladous organization, but that the alga was not associated to either green or red algae.
The time-calibrated trees inferred using the first (assignment of P. antiquus as a stem-group Cladophorales) and second (assignment of P. antiquus as a stem-group Ulvophyceae s.s.) calibration strategies indicated an origin and early diversification of the core Chlorophyta in the Mesoproterozoic, and early diversification of the Ulvophyceae in the late Mesoproterozoic and early Neoproterozoic. Although these estimates are older than what has been reported in previous studies 18,64 , they are largely congruent with current understanding of eukaryotic divergence times [65][66][67] , and some previous hypotheses on green algal evolution 68 . In addition, besides P. antiquus, a number of other paleontological studies supported an earlier diversification of ulvophyte lineages, including the discovery of unbranched filamentous macrofossils of possibly cladophoralean affinity from the early Tonian Period 69 , and the tentative interpretation of the Tonian macrofossil Tawuia as a coenocytic green macroalga 70 . Together, these data indicate that green macroalgae may have already colonized marine benthic environments by the early Neoproterozoic, and possibly earlier in the Mesoproterozoic 61 , although they may have been ecologically restricted due to severe nutrient limitation 71 . The third calibration strategy (excluding P. antiquus) produced time estimates close to those in Del Cortona et al. 18 supporting a Stenian-Tonian origin of Ulvophyceae s.s. and diversification of its lineages in the late Tonian/Cryogenian periods.
By regarding the different time estimates as plausible evolutionary scenarios, our time-calibrated phylogenies provide a framework for understanding green algal evolution and their impact on the past climate and paleogeology. Since ulvophyceans mainly consist of marine macroalgae, their diversification may have paved the road for significant changes in the global carbon cycle and oceanic redox structure at the Mesoproterozoic-Neoproterozoic transition and onwards [72][73][74] . In light of the current results, it is intriguing to note that, although there is evidence for oxygenated bottom waters in Mesoproterozoic oceans [75][76][77][78] , geochemical data indicate a fundamental restructuring of oceanic redox architectures in the Neoproterozoic 72,79 . Biomarker fossils also allude to an ecological rise of green algae in the Cryogenian Period 73 , necessitating the divergence of the chlorophytes prior to the Cryogenian. The evolutionary timeline of chlorophytes and ulvophyceans presented here is consistent with these geochemical and biomarker data, although a more nuanced correlation between molecular clocks and sedimentary rocks will require more refined phylogenetic, paleontological, and geochemical data to be acquired in the future.
In conclusion, this study furthers our understanding on phylogenetic relationships and the timescale of ulvophycean diversification based on a large-scale nuclear dataset. Our study is based on an extensive nuclear gene sampling (including 11 newly sequenced ulvophycean transcriptomes) and provides a robust phylogenetic framework for studying the evolution of ulvophyceans. In concordance with previous studies, our analysis did not support monophyly of the Ulvophyceae. The branching order within the Ulvophyceae s.s. was found to be sensitive to the rate variation across lineages. Gene tree discordance together with coalescent simulations and model fitting indicates that ILS was a possible explanation of topological conflicts. The recent discovery of the one-billion-year-old fossil Proterocladus antiquus offers an opportunity to test its impact on the estimation of ulvophycean divergence times. Taking into account different phylogenetic assignments of the new fossil, our molecular clock analyses indicate that the Ulvophyceae s.s. may have diverged in the late Mesoproterozoic to early Neoproterozoic. The early divergence and diversification of green seaweeds hint at the possibility of geobiological roles of chlorophytes and ulvophyceans in environmental changes in the Mesoproterozoic and Neoproterozoic Era.
Transcriptome sequencing, assembly, and dataset retrieval. According to the cultivation conditions and growth period of the strains, we cultivated the strains for 1 to 2 months during its logarithmic growth period. In order to obtain as many expressed genes as possible, the logarithmic phase strains were enriched at three time points (9:00, 15:00, 21:00) by centrifugation. The collected strains were immediately flash frozen in liquid nitrogen and stored in a −80°C refrigerator until RNA extraction. Total RNA was extracted using TRIzol reagent (Ambion) according to the manufacturer's instructions. The RNA libraries were generated through 150 bp paired-end reads (PE150) sequencing using the Illumina Novaseq platform, at Novogene Bioinformatics Technology Co., Ltd. (Beijing, China). Clean data were obtained by removing reads containing adapter or ploy-N, and lowquality reads from the raw data. The contigs were de novo assembled with Trinity (ref. 80 ) using a k-mer = 25, and clustered using Corset 81 with default parameters. The longest contig in each cluster was chosen as a unigene for further analyses. The full dataset was derived from eight published genomes and 50 published transcriptomes, in addition to the 11 newly sequenced transcriptomes (Supplementary Data 1).
Amino acid sequences of each OG were aligned by MAFFT v7.310 (ref. 84 ) using the L-INS-I algorithm. Ambiguously aligned regions were excluded using Gblocks v0.91b (ref. 85 ) with half gaps allowed. The short sequences were removed using trimAl v1.2 (ref. 86 ) with parameters: resoverlap 0.5 and seqoverlap 50. We further filtered the sequences with taxon occupancy below 80% and length <100 bp, resulting in a dataset of 884 OGs for downstream phylogenetic analysis.
Phylogenetic inferences. The dataset of 884 OGs was analyzed using coalescent and concatenation approaches. For the coalescent approach, ML gene trees were built using IQ-TREE v1.6.12 (ref. 87 ) with automatic selection of the best-fit substitution model for each gene. Branch support was evaluated with 1,000 replicates of the ultrafast bootstrap 88 . To minimize potential impacts of gene tree estimation error for species tree reconstruction, we collapsed low support branches (bootstrap support <20%) from gene trees with Newick Utilities v1.6 (ref. 89 ). The species tree was inferred using ASTRAL v5.7.3 (ref. 90 ) with support values estimated by local posterior probabilities and multi-locus bootstrapping (gene and site resampling). The concatenation analyses were performed using IQ-TREE v1.6.12 with three settings: (1) the gene-wise partitioned analysis using the best-fit model estimated to each partition, (2) the phylogenetic analysis performing the PMSF model with LG + C20 + F + Γ4 and a guide tree specified 45 , and (3) the heterotachy analysis using the GHOST model 44 . Branch support in each analysis was estimated with 1000 ultrafast bootstrap and SH-aLRT branch test replicates 91 .
Ancestral state reconstruction. In order to reconstruct the ancestral state of cytomorphological traits of the Ulvophyceae, we recoded the cell types of 69 species: unicellular (0), colonial (1), siphonous (2), multicellular (3), siphonocladous (4), and all cytomorphological traits were obtained from the literature and AlgaeBase 92 . The coalescent-based phylogeny was used to guide the ancestral state reconstruction. The ancestral state of the cell type was reconstructed using the Mk1 model (Markov k-state 1 parameter model), implemented under a maximumlikelihood framework in the Mesquite v3.61 (http://www.mesquiteproject.org).
Simulation of incomplete lineage sorting (ILS). Gene tree concordance and conflicts were quantified using PhyParts (ref. 36 ), mapping against the coalescent species tree, with 50% bootstrap support threshold of individual gene trees (-s parameter). The pie charts of each node were summarized and generated by PhyPartsPieCharts and ETE3 (https://github.com/mossmatters/ MJPythonNotebooks, ref. 93 ). Normalized quartet supports of the internal branches in the species tree were performed in ASTRAL v5.7.3 with the parameter "-t 8".
To explore whether gene tree conflicts around two short internal branches can be explained by ILS, we carried out ILS simulations and correlation test following the approach of refs. 37,52,94 . The gene trees are simulated under the multispecies coalescent model from the ASTRAL species tree, and then the correlation between the observed frequencies and the simulated frequencies of gene trees is calculated. If there is considerable agreement between simulated and observed gene trees (the correlation coefficient is close to: (1), ILS is likely to occur and may well account for gene tree conflicts. First, our OGs dataset of 69 taxa was filtered to two small datasets of 7 taxa, referred herein as 7-taxon(I) and 7-taxon(II). The 7-taxon(I) dataset was selected around the internal branch between the UUOI (i.e., Ulotrichales, Ulvales, Oltmannsiellopsidales, and Ignatiales) and DS (i.e., Dasycladales and Scotinosphaerales) clades ( Fig. 2A), and the 7-taxon(II) dataset was selected around Bryopsidales and Chlorophyceae. Amino acid sequences of OGs of each 7-taxon dataset were aligned using MAFFT v7.310 (ref. 84 ) with the L-INS-I algorithm, and alignments were trimmed using Gblocks v0.91b (ref. 85 ) with 'Allowed Gap Positions set to half'. OGs with taxon occupancy rate of <80% and length of <100 bp were removed, resulting in 891 genes for the 7-taxon(I) and 1280 genes for the 7-taxon(II) dataset. ML gene trees for each 7-taxon dataset were reconstructed using IQ-TREE v1.6.12 with the best-fit model estimated for each gene. Both species trees were estimated using ASTRAL v5.7.3, with internal branch lengths assigned in coalescent units and terminal branch lengths aligned. We simulated 100,000 gene trees for both datasets under the multispecies coalescent model using the function sim.coaltree.sp in the R package Phybase v1.5 (ref. 95 ). To explore the correlation between the gene trees as estimated by the multispecies coalescent model and the observed gene trees, we calculated the topological frequencies of the observed gene trees and the simulated gene trees for each 7-taxon dataset. The correlation between the observed frequencies and the simulated frequencies of gene trees was performed using the cor.test function in R.

Divergence time estimation
Dataset assembly for molecular dating. Divergence times were estimated for the dataset of 884 OGs using the program MCMCTree in the PAML package v4.9j (ref. 96 ). We identified the clock-likeliness of each gene using SortaDate 97 by the three criteria sequentially: 3 = least topological conflict against the coalescent species tree, 1 = minimal root-to-tip variance and 2 = tree length (discernible information content). The 200 most clock-like genes were selected for the molecular clock analyses. The approximate likelihood calculations in MCMCTree were implemented using CODEML under the LG + Γ 4 + F model 98 . The MCMC process was run for 10 million generations sampled every 1000 generations after a burnin of 1 million iterations. We ran two independent chains to check for convergence and confirmed that the effective sample sizes (ESS) of all parameters were above 200 using Tracer v1.7.1 (ref. 99 ).
Rate priors. We used a time unit of 100 My (million years) and an independent rate (IR) model in our divergence time estimation. For rate priors, we calculated the amino acid pairwise distance between Ulva mutabilis and Chlorocladiella pisiformis (0.3581 substitutions per site) using the LG + Γ 4 + F model in CODEML (ref. 96 ). The divergence time between the two species was represented with the fossil-based age~1000 Ma (ref. 28 ), which indicated that the mean rate for branches (μ) was assigned a gamma hyperprior G (2, 55.85) with mean 2/55.85 = 0.03581 substitutions per site per time unit (100 My). The variance of branch rates (σ 2 ) was assigned G (1, 10) with mean 0.1.
Time priors. The time prior for all nodes was generated by combining fossil calibrations and the birth-death process. Because of the relatively small number of fossil occurrences that can be reliably used as calibration points, we needed more informative parameters for the birth-death model in our calibration-poor phylogeny. The birth rate λ and death rate μ were estimated by using the data-driven birth-death (ddBD) tree prior in R 100 . The sampling fraction ρ was the proportion of our ingroup sample size (65 taxa) to the number of extant species in Ulvophyceae, Chlorophyceae and Trebouxiophyceae (~6478) (https://www.algaebase.org). To ensure our priors on divergence times were appropriate, we ran the MCMC analyses without molecular data to obtain the effective priors.
Fossil constraints. We applied node age constraints to eight nodes using fossil information and secondary calibrations ( Table 1). Node 1 was set with a uniform distribution with a hard minimum age (p L = 1e−300) and a soft maximum age (p U = 0.025). Nodes 5 and 8 were set with uniform distributions with a soft minimum age (p L = 0.025) and a soft maximum age (p U = 0.025) as they were secondary calibrations. The other nodes were set as truncated Cauchy distributions with a hard minimum bound (p L = 1e−300). Three strategies were applied to test the impact of the one-billion-year-old fossil Proterocladus antiquus in our dating analyses. From strategy 1-3, we respectively assigned Proterocladus antiquus to the stem Cladophorales (node 6 − 1), stem Ulvophyceae s.s. (node 6 − 2), and excluded this fossil (node 6 − 3). Calibration densities of the effective prior and posterior distributions were plotted by MCMCTreeR v1.1 in ref. 101 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The raw Illumina data generated for this study are available through the Sequence Read Archive (SRA accession PRJNA726747). The alignment data and phylogenetic trees are available from Figshare: https://figshare.com/s/40126faad551cdfccf69. Source data are provided with this paper.