Elevated mutation rates underlie the evolution of the aquatic plant family Podostemaceae

Molecular evolutionary rates vary among lineages and influence the evolutionary process. Here, we report elevated genome-wide mutation rates in Podostemaceae, a family of aquatic plants with a unique body plan that allows members to live on submerged rocks in fast-flowing rivers. Molecular evolutionary analyses using 1640 orthologous gene groups revealed two historical increases in evolutionary rates: the first at the emergence of the family and the second at the emergence of Podostemoideae, which is the most diversified subfamily. In both branches, synonymous substitution rates were elevated, indicating higher mutation rates. On early branches, mutations were biased in favour of AT content, which is consistent with a role for ultraviolet light-induced mutation and habitat shift. In ancestors of Podostemoideae, DNA-repair genes were enriched in genes under positive selection, which may have responded to the meristem architectural changes.

volution is affected by various factors that are unique to each taxon or species, such as ecology (environmental factors, distribution patterns, life history), development (developmental constraints), and genetic basis (mutation rate, genetic variation). In recent years, the concept of mutation as a driving force of adaptive evolution 1 has received increasing attention, although the notion remains controversial 2 . Large-scale studies have suggested that molecular evolutionary rates correlate with species diversity and morphological evolution in both plants and animals [3][4][5] , while some plants show high phenotypic diversity with low genetic diversity 6 . In plants, elevated molecular evolutionary rates have been reported in carnivores and parasites 7,8 . In parasitic plants, both nonsynonymous and synonymous substitution rates increased, suggesting an elevated mutation rate 6,7 . While such plants are thought to have evolved novel morphologies under special selective pressures, an increase in genetic variation arising from elevated mutation rates is also likely to be responsible for their novel evolutionary history. It is widely accepted that the generation time is important in molecular evolution; the shorter generation time, the more mutation accumulates across generations per absolute time 9 . However, the reason behind the elevation of substitution rates in various plants is not well understood. Although rarely mentioned, environmental mutagenic pressure may also vary; for example, ultraviolet (UV) light is known as a mutagen in vitro 10 , considered to affect domestication of plants 11 , and the intensities vary among the habitat. Here, we report a higher molecular evolutionary rate in the odd aquatic plant family Podostemaceae than in a terrestrial sister group and discuss the role of elevated mutation rates in their unique evolution.
Podostemaceae, commonly known as river weeds, are exclusively aquatic eudicots mainly distributed in the tropics and subtropics. They grow on rocks in rapid streams and waterfalls (Fig. 1a, b). All members have a similar lifestyle, in which plants grow on submerged rocks during the rainy season. When the dry season starts and the rocks are exposed to the air, the plants enter a reproductive phase to dry out and die (Fig. 1c, d). They show unique morphologies that presumably evolved as adaptations to aquatic habitats by developmental reductionism, leading to characteristics such as loss of indeterminate shoot apical meristem (SAM) and loss of primary roots [12][13][14][15][16][17] .
Typical flowering plants grow upright with underground roots and terrestrial shoots, facilitated by indeterminate activities of the root apical meristems and SAMs. In contrast, Podostemaceae plants lack a typical root-shoot system, instead forming adventitious roots from the hypocotyls, which are green and grow on rocks. Shoots form repetitively on the roots, establishing a prostrate body plan 12,14,17 . Unusual morphological variations appear within Podostemaceae as a result of diversification in their unique habitats.
The Podostemaceae appeared relatively recently among flowering plants, after diverging from Hypericaceae~80 Ma 18 . Podostemaceae diversified into 313 species of 53 genera, most of which are small genera of 10 or fewer species, and 42% of which (22 genera) are monotypic, reflecting rapid morphological evolution [19][20][21][22] . Podostemaceae are classified into three subfamilies, among which Trisichoideae are the most basally diverged, being sister to the monophyletic Weddellinoideae and Podostemoideae 19,23,24 . While Tristichoideae and Weddellinoideae are relatively small groups that retain a shoot system with a dome-shaped SAM, Podostemoideae evolved a unique shoot with a determinate SAM that differentiates into leaves 25 . This subfamily eventually diversified into 47 genera and 294 species.
The drastic evolution of the body plan and the high species diversity of Podostemoideae raises the possibility that the subfamily may have experienced a high molecular evolutionary rate. Longer branches of the Podostemaceae lineage compared to other taxa of Malpighiales were found in previous phylogenetic studies 24,25 . However, because these studies were conducted with limited taxon sampling in Podostemaceae (not all three subfamilies) 26,27 , or using a limited region (only plastid genome) 22,25 , historical trends of the evolutionary rate of nuclear genes among the three subfamilies have not been clarified. Therefore, analyses based on the sufficient taxon sampling and sequence information of various genes are needed for understanding the trend of evolutionary rates in this enigmatic family. In this study, we examined samples of all three subfamilies and performed high-throughput cDNA sequencing (RNA-seq) analyses to investigate when and how the evolutionary rate changed, and whether a specific set of genes experienced positive selection.

Results
RNA-seq and de novo assembly. We extracted total RNA from the vegetative tissues of seven Podostemad species-two from Tristichoideae, one from Weddellinoideae, and four from Podostemoideae-and St. John's wort (Hypericum perforatum) from the presumed sister clade, and performed RNA-seq analyses (Fig. 2 and Supplementary Tables 1, 2). At least 25 million paired reads were collected and assembled using Trinity 28 . The contigs were filtered to retain at least 150 amino acids and minimise redundancy (see "Methods" section). We finally obtained 28,421-78,650 contigs in each species for subsequent analyses (Supplementary Table 2).
Acceleration of molecular evolution in Podostemaceae. We inferred 1640 one-to-one orthologous gene groups (orthogroups) from contigs obtained from the RNA-seq data of seven Podostemad species, St. John's wort, and transcript sequences obtained from the Cladopus chinensis 29 and the flax (Linum usitatissimum) genomes 30 . Flax is the closest species for which a genome with high-quality annotation is publicly available. Using the 1640 orthogroups, we performed a phylogenetic analysis of the concatenated sequences using the maximum-likelihood method under the GTR + Γ + I model and obtained a well-supported topology (bootstrap values = 100 for all branches). ASTRAL analysis also strongly supported the same topology (posterior probabilities = 1.0 for all but one branch with posterior probability = 0.98; Supplementary Fig. 1). In the maximum-likelihood tree, longer branches were observed for Podostemaceae (Fig. 2). The tree suggests that the substitution rates increased twice within the family: first at the emergence of Podostemaceae and again at the divergence of the subfamily Podostemoideae. Discordance analysis showed that monophyly of each Podostemoideae, Tristichoideae, and the clade uniting Podostemoidea and Weddelinoideae was supported with more than 1580 genes out of 1640 and that less than 10 genes strongly rejected the monophyly of Podostemoideae, indicating a quite low discordance among the gene trees (Supplementary Table 3). We tested whether the elevated substitution rates were the result of fluctuations within finite observation or a reflection of different evolutionary rates. The following three hypotheses were compared (Table 1) indicating that the preferred model is H2; that is, an elevation of substitution rates occurred at each stem branch in the family Podostemaceae and the subfamily Podostemoideae. Next, to clarify whether the elevated evolutionary rates were common among genes or specific to a given subset of genes, we performed relative-rate tests on the 1640 genes independently under the model H2. In the basal branches of the subfamilies Tristichoideae and Weddellinoideae, we found that 98.1% (1609 genes) had higher evolutionary rates than those in the branches of the outgroups (Fig. 3a), and 98.5% (1616 genes) had higher evolutionary rates in the Podostemoideae branches compared with the basal subfamilies (Fig. 3b). In both cases, the distribution of relative rates was approximately lognormal, and the peaks clearly deviated from 1 ( Fig. 3; t-test for mean of log ratio = 0: t = 92.50, d.f. = 1639, P < 2.2e−16 and t = 77.03, d.f. = 1639, P < 2.2e−16), and the means of relative rates were 2.43 (basal subfamilies/outgroups) and 2.28 (Podostemoideae/ basal subfamilies). The elevated evolutionary substitution rates, therefore, did not occur in specific genes, but widely in the genome, and the increase was more than two-fold, on average. The relative substitution rates became elevated to similar levels in both nucleotides and amino acids (mean values: 2.43 vs. 2.16 in basal subfamilies, 2.28 vs. 2.36 in Podostemoideae).
Next, we investigated whether the mutation rates were elevated because the mutation is a major driver of molecular evolution.   32 . In the d S tree using concatenated sequences, there were long branches in the family Podostemaceae. On average, the substitution rates in the basal branches of Podostemaceae were 2.40 times greater than those of the outgroups, and the rates in the Podostemoideae branches were 1.70 times those of the basal subfamilies (Fig. 2).
These results suggest that spontaneous mutation rates in this family increased, resulting in accelerated molecular evolution.
Investigating the causal factors for the acceleration of mutation rates. A given mutagen or specific mutational process tends to produce specific types of mutation, giving rise to a mutation spectrum. For example, UV light preferentially induces C-to-T mutations (G-to-A in the complementary DNA strand) 10,11 . If the composition of a mutagen changed during evolution, the substitution process would be non-stationary and the base frequency would also change. To infer the history of the mutational  spectrum trend, we used a model in which the equilibrium frequencies differed among branches 33 . The observed frequencies of C and G were lower than 0.25 in Tristichoideae (T. brevis and D. ubonensis; Fig. 4), and the estimated equilibrium frequency of C and G leading to those branches and the ancestral branch of Podostemaceae was low (0.14-0.22), particularly in the branch uniting the Tristichoideae (C: 0.18, G: 0.14; Fig. 4). This observation is consistent with the hypothesis that UV-induced mutations increased at the emergence of Podostemaceae.
To further investigate evidence as to the causes of the increased evolutionary rate in this family, we attempted to detect positive selection at the stem branch of Podostemaceae. We used a branch-site model to detect positive selection affecting specific sites along particular lineages [34][35][36] and investigated the functions of the positively-selected genes using gene ontology (GO) enrichment analysis. Since many sites were tested, some of the sites found in this analysis may be false positives. However, significant GO enrichment is unlikely to occur because of random false positives, and is therefore likely to reflect evolutionary forces. To increase the size of the gene set to be analysed, incomplete orthogroups lacking one Asian Podostemoideae species were also added to the analyses. Of a total of 1964 orthogroups, 459 genes had at least one site under positive selection with a posterior probability above 95%, according to a Bayes Empirical Bayes (BEB) method 36 (Supplementary Data 1). We used GO annotations of the flax gene to represent an orthogroup for the enrichment analysis. Among the 1964 orthogroups, GO annotations were assigned to 1150 orthogroups (Supplementary Data 2). GO enrichment analysis revealed no significant enrichment in the Biological Processes, but response to light stimulus popped at a high rank with complete retainment (Supplementary Data 3). Two genes were retained; one is PHYTOCHROME C (PHYC) and the other is CPSRP43 encoding chloroplast signal recognition particle 43 (cpSRP43). Although neither protein is involved in UV resistance, PHYC functions as a light receptor for photoperiodic flowering 37,38 and cpSRP43 helps plants to adjust their chlorophyll synthesis rate in response to environmental change 39 .
There were no visible spectrum changes in GC and AT frequencies at the Weddellinoideae and Podostemoideae branches. This finding implies that factors other than UV promoted the elevation of molecular evolutionary rates at the Podostemoideae branches. To characterise the evolutionary process during the second elevation of mutation rates for the subfamily Podostemoideae, we attempted GO enrichment analysis for the positively-selected genes at the stem branch of Podostemoideae as well. In the stem branch for Podostemoideae, 511 genes had experienced positive selection on at least one site (Supplementary Data 4). Of the 511 genes, 238 GO annotations were assigned and GO enrichment analysis revealed enrichment of GO:0006281 "DNA repair" (20/829 to 10/238, P = 0.0342) (Supplementary Data 5). Seven other genes involved in DNA damage response were also found in the list of the 511 positivelyselected genes (Supplementary Data 4).
Of the 10 genes associated with "DNA repair," we found a photolyase that repairs UV-induced DNA damage 40 and five genes involved in major DNA-repair mechanisms: MutS homologue 2 (MSH2) and MSH6 act in the DNA-mismatch repair system, DNA-(apurinic or apyrimidinic site) lyase and Nglycosylase/DNA lyase (OGG1) in the base-excision repair, and general transcription factor IIH subunit 2 in the nucleotide excision repair 40 . The base-excision repair and nucleotide excision repair pathways suppress UV-induced mutagenesis by  41 . Mismatch repair also plays a key role in genome integrity by correcting the mismatched nucleotides that occur during DNA synthesis in UV-induced DNA damage 42 . In these six genes, the positively-selected sites were located in conserved domains (Supplementary Data 6-11). Positivelyselected sites were detected in both MSH6 and MSH2, which form a heterodimer and act in mismatch repair (Fig. 5a). Their function is conserved in bacteria (MutS) and eukaryotes 43,44 . We found two positively-selected sites in the connector domain of MSH6 (Fig. 5a, b, full alignment in Supplementary Data 6). One was a highly conserved site in budding yeast, humans, Arabidopsis, Tristichoideae, and Weddellinoideae, but it is changed in the Podostemoideae species. The other site was also well conserved from yeast to the basal subfamilies of Podostemaceae, and a previous complementation study in yeast showed that a mutant allele msh6-573 (K573D K574D), with substitutions next to that site (G572) failed to fully complement the mutator phenotype of a Δmsh3 Δmsh6 strain 45 . A surface loop 650-675 in human MSH6 is one of three loops proposed to mediate protein-protein interactions 44 . Yeast MSH6p interacts with NHP6Ap, an HMGB1 homologue, and modulates their loading to DNA substrate 46 . An HMGB1-interacting sequence corresponding to amino acids 631-637 in human MSH6 has been recognised, but not experimentally identified 43 . These amino acid changes in the connector domain could influence the stability of the genome and the efficacy of DNA repair. In MSH2, two positively-selected sites were found, both of which were located in the lever domain, which functions in opening the cavity to which DNA binds, by conformational change with the clamp domain 47 . Missense mutations at the positively-selected sites detected in this study have been reported to be "functionally neutral," indicating that missense mutations are deleterious in MSH2 knockout human cells 48 .
Modification of the meristem maintenance system occurred in Podostemoideae. Although GO enrichment analysis did not detect terms related to shoot development, we identified nine genes that may be involved in the regulation of shoot meristem activity and development (Supplementary Data 4). The ephemeral fate of the shoot meristems leads to the accumulation of somatic mutations throughout the lives of the plants. DNA repair may have increased the tolerance to DNA damage, allowing cells with damaged DNA to proliferate (Fig. 6). Cells with a deficiency in mismatch-repair capacity are reportedly more tolerant to specific types of DNA damages 49 .
Relaxed selection in the Podostemoideae crown groups. The increase in the ω ratio is a signal of relaxed functional constraints 50,51 . We examined the ω of the branches using a concatenated alignment under a free-ratio model 52,53 . Higher ω values were obtained from the Podostemoideae branches (0.090-0.238) than from the branches of the basal subfamilies (0.051-0.107) and the Podostemoideae stem branch (0.062) (Fig. 7). This observation suggests that the constraints relaxed in Podostemoideae, which experienced drastic morphological changes. The colonisation of new habitats usually occurs with a reduction in effective population size 54,55 , which in turn leads to lower selection pressure. Population fragmentation can also reduce the effective population size. Podostemaceae species usually consists of small populations separated by a number of rapids and waterfalls 22 , making migration between locations difficult. Relaxation of constraints through radiation and fragmented population structure may have contributed to the accelerated accumulation of nonsynonymous changes in the Podostemoideae crown group.

Discussion
Our results demonstrate that the rate of molecular evolution in the aquatic plant family Podostemaceae accelerated twice. In both cases, the acceleration can be attributed primarily to elevated mutation rates, accompanied by the relaxation of selection pressures in the second acceleration.
The plants of the family Podostemaceae colonised rocks in rivers upon which the sunlight falls directly. We found a trend toward lower GC frequencies in the basal branches of Podostemaceae, which indicates a UV-induced mutational spectrum. GO enrichment analysis indicated probable positive selection on PHYC and CPSRP43, which act in response to light stimuli, supporting the connection that the light environment of the   6 Hypothetical scenario leading to higher mutation rates in Podostemoideae. In Podostemoideae, the meristem architecture was changed to one without the organising centre (OC). The change led to more DNA replication in the generation and more DNA damage, which could have caused higher mutation rates. Further, more DNA damage would result in more selection pressure on DNA repair genes, such that more DNA damages are tolerated. The tolerance to DNA damages again could lead to higher mutation rates through survival of the cells with DNA damage leading to mutations rather than cell death.
habitat might have changed at the time of emergence of the family. The light energy from various types of light, including UV light and ionising radiation, causes DNA damage both directly and through the generation of reactive oxygen species 56,57 . Excess visible light can also lead to the production of reactive oxygen species during the process of photosynthesis 58 . These results are consistent with a scenario in which light-induced mutations increased along with habitat changes at an early evolutionary stage. Because members of Podostemaceae have an annual lifestyle adapted to seasonal water level changes, they are particularly exposed to sunlight in the open areas of rivers at the beginning and the end of their life cycles. The generation time of the annual lifestyle is also one of the factors leading to higher mutation rates, coupled with habitat changes at the beginning of the evolutions of the family 27 .
At the time of the emergence of the subfamily Podostemoideae, the evolutionary rates accelerated further, and positive selection on DNA-repair genes was detected. We also identified several genes related to shoot and root development that might be under positive selection. Although GO enrichment analysis did not show significant enrichment of these genes, they could have been involved in the evolution of shoot architecture. One likely explanation is that the acceleration of mutation rates in Podostemoideae is a consequence of the modification of the meristem structure, which is unique to Podostemoideae. In the centre of the SAM of most flowering plants, there is a small group of cells, known as the organising centre (OC), which divide at a slower rate than the surrounding cells and maintain the cell fate [59][60][61] . Because cell division involves DNA replication, it can be a mutation source 62,63 . Somatic mutations in the SAM that are transmitted to the next generation could be restricted to low levels by minimising the number of cell divisions that occur in the OC 64,65 . In contrast, in Podostemoideae, the loss of the ordinary maintenance system of the shoot meristem via the OC during architectural change 25 could increase the number of somatic mutations transmitted to the next generation (Fig. 6). Meristematic cells that are not under the control of the OC experience more DNA replication and cell division cycles, increasing the number of mutations occurring, with the same replication accuracy. Because DNA might be more susceptible to damage during replication, selection pressure to tolerate DNA damage could have been strengthened, leaving traces in DNA-repair genes. Some forms of mismatch-repair deficiency reportedly increase tolerance to specific types of DNA damage 49 , implying the existence of a trade-off between tolerance to DNA damage and mutation rate. DNA-repair mechanism may, therefore, allow cells with damaged DNA to proliferate at the cost of a higher mutation rate (Fig. 6).
Our results were slightly different from those of previous studies 24,27 . Phylogenetic analyses using the matK sequences of 132 species by Koi et al. 24 and the plastid genomes of five species investigated by Bedoya et al. 27 showed longer branches in Tristichoideae than in Podostemoideae. Both Koi et al. 24 and Bedoya et al. 27 analysed plastid genes, while our analyses were of nuclear genes. Plastids and nuclear DNA reside in different cellular compartments. The DNA-repair genes that we found as having positively-selected mutations in Podostemoideae are probably localised in the nucleus, but not in plastids. Thus, plastid and nuclear DNA evolution may have involved different forces in different lineages.
While the heterogeneity of molecular evolutionary rates is widely known, the way in which the rate of molecular evolution affects the evolutionary process remains an open question because the interactions are complex and can vary among taxa. Despite the limited number of species included in the present study, we detected general trends in the elevation of molecular evolutionary rates at the subfamily level in Podostemaceae. The causes of accelerated molecular evolution were investigated using transcriptome-based molecular evolutionary analyses involving a large number of one-to-one orthologous groups. We predicted that changes in the light environment at the time of emergence of the family, and the modification of the stem cell maintenance and DNA-repair systems during the divergence of Podostemoideae, are probable causes of the accelerated substitution rates. Experimental measurements of mutation rates in organisms from environments with a range of natural UV level variations are currently unavailable, and should be considered as future work. The details of the effects of the mutations that are likely to have been affected by the positive selection should also be the subject of future studies. As high-throughput transcriptome sequencing of multiple closely related taxa is now feasible, similar analyses should reveal evolutionary aspects in other taxa with accelerated evolutionary rates that have experienced drastic developmental and physiological evolution, such as parasites and carnivores. The establishment of a general rule for changes in the rate of molecular evolution requires a better understanding of the mechanisms behind various cases.

Methods
Plant materials and culture conditions. Seeds of Podostemaceae species (T. brevis, D. ubonensis, W. squamulosa, R. cf. penicillata., Z. tailichenoides, P. stylosum) and plant bodies of H. japonicum were collected from wild populations (Supplementary Table 1 Table 2). The raw read data were deposited in the DDBJ Sequence Read Archive (DRA) under the accession numbers DRA008126 and DRA010548-DRA010554.
De novo assembly and construction of one-to-one orthogroups. We constructed one-to-one orthogroups following the methods described in Amemiya 3 . Reads from each species were assembled using Trinity 28 . The transcripts were translated, and only the longest open reading frame, of at least 150 amino acids, was retained for each transcript. Redundancy was minimised using the dereplication mode of USEARCH v6.0.307. Groups of orthologous proteins were determined using SonicParanoid, leading to the identification of 6723 orthogroups, each of which included at least one sequence for each of the 10 species. In the second step, the 6723 orthogroups were aligned with MAFFT version 7 66 using the E-INS-i strategy. One representative sequence per species was selected for each orthogroup using SCaFoS 67 . Only orthogroups without missing species were retained, yielding a data set of 1796 orthogroups. To exclude potentially non-homologous sequence segments, we used HmmCleaner 68 . Ambiguously aligned positions were removed using Gblocks 69 with default parameter settings. After these filtration steps, sequences with at least 100 amino acids were retained. The matrices were converted again to nucleotides, with the final data set consisting of 1,739,691 nucleotide positions in 1640 orthogroups.
We also made orthogroups that allow one missing species from Asian Podostemoideae clades consisting of Z. tailichenoides, P. stylosum, H. japonicum and C. chinensis, to increase the power of detection of positive selection. In SCaFoS, we obtained 76 one-to-one orthogroups without Z. tailichenoides, 53 without P. stylosum, 164 without H. japonicum and 70 without C. chinensis, resulting in a total of 363 orthogroups. After HmmCleaner, Gblocks and size filtering were applied to for these orthogroups, 324 orthogroups were retained.
Phylogenetic analysis of multi-gene sequences. RAxML 8.0.5 was used to construct a maximum-likelihood tree under the GTR + Γ + I model using all 1,739,691 positions of the 1640 orthogroups. Bootstrap values were calculated for 10,000 replicates. Further, we inferred a coalescent-based species tree from 1640 gene trees using ASTRAL-III 70 and measured branch supports as local posterior probabilities 71 . Discordance analysis on gene trees was performed using DiscoVista 72 and branches with bootstrap support values above (below) 75% were considered to be highly (weakly) supported. A clade that is not in the original tree but is compatible if low support branches (below 75%) were contracted is considered as a weakly rejected clade. A clade that is not compatible even after contracting low support branches is regarded as strongly rejected.
Analyses of molecular evolutionary rates and positive selection. Molecular evolutionary analyses were performed using the PAML4 programme package 73 . To perform likelihood ratio tests among the hypotheses under different evolutionary models (Table 1), we obtained the log-likelihood values of H0 (the global clock model), H1 (the local clock model), and H2 (the local clock model) using the codeml programme and performed chi-squared tests. In relative-rate tests of each of the 1640 orthogroups, we obtained P-values from the chisquared tests of the log-likelihood under H0 vs. H1, and H1 vs. H2 and adjusted false discovery rate q-values using the fdrtool package in R v3.5.1 74 . Relative rates were obtained under model H2.
To estimate the change in the number of synonymous substitutions, we constructed a d S tree of 1640 orthogroups using a free-ratio branch model implemented in the codeml programme. We calculated the average branch lengths of each of the following groups: outgroups, basal subfamilies, and Podostemoideae. In the concatenated sequences, estimates of nucleotide frequency parameters for the branches and the base frequencies were obtained using the baseml programme under the HKY + dG+N2 model 33 . To detect genes that experienced positive selection at the stem branch of the family Podostemaceae and the subfamily Podostemoideae, we applied a branch-site model to the sequences of each of the 1964 orthogroups including those with no missing species and those with one missing species. Positively-selected sites (ω > 1) were identified using BEB methods for calculating the posterior probabilities for the site classes (class 0: negative, class 1: neutral, class 2a and 2b: positive). Estimates of the ω ratio on the branches were also obtained using concatenated alignments of 1640 orthogroups under a free-ratio branch model to infer the relaxed constraints in each branch.
Gene ontology enrichment analysis. GO enrichment analysis was carried out with a Cytoscape plugin BiNGO 75 using annotation data of flax (Linum usitatissimum v1.0 in Phytozome).
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Source data underlying figures are presented in Supplementary Data 12-14. The raw read data of RNA-seq are deposited in DRA (DDBJ Read Archive) and available through DRA/SRA/ENA with the accession numbers (DRR238797-DRR238827, DRR258784-DRR258790; Supplementary Table 2). The assembled contig sequences are deposited in the TSA division of DDBJ (INSDC accession ICRX01000001-ICSE01072389,  Supplementary Table 2) with notes on the orthogroup assigned. The input data and outputs of molecular evolutionary analyses are deposited in Dryad (https://doi.org/ 10.5061/dryad.z34tmpgft) 78 . Received: 3 November 2020; Accepted: 23 December 2021; Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/.