Deciphering evolutionary dynamics of SWEET genes in diverse plant lineages

SWEET/MtN3/saliva genes are prevalent in cellular organisms and play diverse roles in plants. These genes are widely considered as evolutionarily conserved genes, which is inconsistent with their extensive expansion and functional diversity. In this study, SWEET genes were identified from 31 representative plant species, and exhibited remarkable expansion and diversification ranging from aquatic to land plants. Duplication detection indicated that the sharp increase in the number of SWEET genes in higher plants was largely due to tandem and segmental duplication, under purifying selection. In addition, phylogeny reconstruction of SWEET genes was performed using the maximum-likelihood (ML) method; the genes were grouped into four clades, and further classified into 10 monocot and 11 dicot subfamilies. Furthermore, selection pressure of SWEET genes in different subfamilies was investigated via different strategies (classical and Bayesian maximum likelihood (Datamonkey/PAML)). The average dN/dS for each group were lower than one, indicating purifying selection. Individual positive selection sites were detected within 4 of the 21 sub-families by both two methods, including two monocot subfamilies in Clade III, harboring five rice SWEET homologs characterized to confer resistance to rice bacterial blight disease. Finally, we traced evolutionary fate of SWEET genes in clade III for functional characterization in future.


Results
Genome-wide identification of SWEET genes in 30 representative plant species. In this study, SWEET genes were systematically surveyed in 31 plant genomes, ranging from aquatic algae to angiosperms. A total of 636 SWEET homologs were identified among our sampled genomes ( Fig. 1 and Supplementary Table S1). Interestingly, SWEET genes were detected in unicellular aquatic algae, which was indicative of its ancient origin and functional conservation. In addition, the numbers of SWEET members in land plants indicated varying degrees of expansion compared to aquatic algae. Firstly, only one, one, four and four homologs were identified in four aquatic algae O. lucimarinus, M. pusilla, C. reinhardtii and V. carteri, respectively, all of which were remarkably fewer than those found in land plants. Secondly, in the lower land plant P. patens, which is believed to be one of the earliest land lineages that diverged from aquatic plants 22 , and A. trichopoda, which is the single living representative of the sister lineage of all living angiosperms 23 , seven and nine SWEET genes were identified, respectively. Whereas, in the non-seed lycophyte S. moellendorffii, 15 SWEET homologs were found. In G. biloba, a gymnosperm species that is described as a living fossil, 17 SWEET homologs were characterized. In the seven monocot species, eight to 26 SWEET genes were identified. In eudicots, 16 to 53 SWEET genes were observed, suggesting extensive gene expansion and duplication events. Most of the SWEET genes were observed in the legume plant, G. max and the rosids plant E. grandis, which harbored 53 and 52 SWEET homologs, respectively. Although copy number variations among species were apparently complex, our data suggested that the number of SWEET homologs in each species was positively correlated with genome-wide gene numbers (r = 0.7168, P-value = 3.79e-05) ( Supplementary Fig. S1). In addition, the distribution of SWEET homologs was not evenly distributed within one species or among plant lineages. For example, no SWEET genes were observed in four of the 12 rice chromosomes, whereas, roughly 57.1% of the rice homologs were detected on chromosomes 1 and chromosome 9. In the two legume genomes G. max and M. truncatula, both the copy number and distribution of SWEET homologs were distinct ( Supplementary Fig. S2).
Furthermore, the characterized SWEET proteins from various species generally fell into two types. Most of these proteins harbor two MtN3_slv domains, whereas a few consist of one MtN3_slv domain 7,8 . Herein, a  Table S1), and 90% of the predicted SWEET proteins contained two MtN3_slv domains, including all homologs from three unicellular plants. SWEET proteins that only harbored one MtN3_slv domain were observed in P. patens, as well as in most multicellular plants except for S. bicolor, A. thaliana, P. vulgaris and C. grandis. Interestingly, one SWEET homologs, which were characterized in E. grandis, consisted of three MtN3_slv domains.
Expansion models of SWEET genes among plant genomes. Gene expansion or duplication, which frequently occur in plant taxa, is often followed by divergence, thereby resulting in subfunctionalization, novel evolutionary materials and adaptive advantages 24,25 . Diverse duplication models such as whole-genome duplication (WGD) or segmental duplications (SD), local duplication (including tandem and proximal duplications) and dispersed duplication), have been hypothesized for gene duplication [24][25][26][27] . Each of these models is biased in regard to gene retention by either contributing to genetic redundancy or evolutionary novelty 26 . Hence, estimation of the duplication model of SWEET genes was performed for the surveyed genomes via MCscanX software, including two multicellular algae, the basal land species P. patens, S. moellendorffii and all angiosperms (those species were excluded due to either having a of sing-copy SWEET genes or poorly assembled genomes) ( Fig. 1) 28 . The results revealed that the proportions of SWEET genes retained from different gene duplication models differed within or among species. Interestingly, dispersed duplication was the only duplication mode detected within all of the surveyed species. Furthermore, dispersed duplication was also the only duplication mode in SWEET genes from two algaes and P. patens. WGD/segmental duplication events involving SWEET genes were observed in each higher plant species, but not in mosses and algae, which may be related to the phenomenon that all vascular plants undergo one or more whole-genome duplication events. At least three types of duplication events in SWEET genes were detected in every surveyed angiosperm except for the aquatic moncot S. polyrhiza. In particular, SWEET genes retained from dispersed, proximal, tandem, and WGD/segmental duplication accounted for 37.2%, 4.6%, 19.4%, and 38.7% of the duplication events, respectively. The sharp increase in the number of SWEET genes in higher plants was largely due to segmental and tandem duplication compared with basal land plants. The proportion of these two types of duplication models in each species was not equal, and a species-specific duplication model preponderance was detected. For example, in monocots, WGD/segmental duplication was preferentially enriched in M. acuminata and Z. mays to a greater degree than in all of the other surveyed monocot plants. Conversely, tandem duplication mainly contributed to the expansion of SWEET genes in the two Solanaceae plants. For the only two species harboring more than 50 SWEET genes, 69.8% of genes in G. max were derived from WGD/segmental duplication events ( Supplementary Fig. S3), while 52.0% of genes in E. grandis were derived from tandem duplication, which were much higher than those in the other species.
Evolutionary rate estimation of duplicated SWEET paralog genes. Considering the important role of WGD/segmental duplication and segmental duplication in SWEET gene expansion, an estimation of the evolutionary dynamics of SWEET duplicated pairs would help to understand their evolutionary process in all surveyed angiosperms including dicot and monocot lineages. The dN/dS ratio is an important parameter for estimating molecular evolutionary rates and reflects the dynamics that drive evolution. Generally, a dN/ds ratio larger than 1 indicates positive selection and a dN/dS ratio less than 1 suggests purifying selection. In the present study, the dN/dS values of most duplicated paralogous genes were lower than 1 except for three gene pairs, which strongly indicated that most of these duplicated pairs experienced purifying selection. The three gene pairs, Eucgr. F02750/Eucgr. F02751, Gorai. 001G055600/Gorai. 001G055700, and Glyma. 05G036500/Glyma. 17G090800, exhibited dN/dS values larger than 1, suggesting that they underwent positive selection pressure during their evolutionary history.
Furthermore, these results show the different evolutionary rates of WGD and TD duplicated pairs in angiosperms (Fig. 2). Comparing all of the WGD and TD duplicated pairs in angiosperms, the average dN/dS value of WGD (0.25) was less than that of the TD duplicated pairs (0.32). Comparing these two types of duplicated pairs in only monocot or dicot lineages, the average dN/dS value of WGD was less than that of the TD duplicated pairs. Smaller dN/dS values indicated WGD gene pairs evolved more slowly. Finally, both WGD and TD pairs in dicots had a higher average dN/dS value than that in monocots, reflecting the difference between the evolutionary rates of monocot and dicot duplicated SWEET pairs.

Phylogenetic analysis of SWEET genes in 30 plant species.
To better explore the evolutionary history of SWEET genes in plants, complete protein sequences of SWEET genes were used to build ML trees (Figs 3 and S4). Our phylogenetic tree exhibited exactly the same topological structure described by Chen et al. 7 was observed ( Fig. 3 and Table 1). Thus, SWEET genes of angiosperm plants in the phylogenetic trees were also divided into four clades, and SWEET members in algae and basal land species, including three bryophyta plants, S. moellendorffi and A. trichopoda, were used as outgroups. Moreover, SWEET genes from A. thaliana were distributed among the four clades of the two phylogenetic trees, which was also consistent with the findings of the previous study 7 . We followed the nomenclature of Chen et al. 7 and named these clades as I, II, III, and IV, in which 146, 120, 205, and 55 genes were characterized, respectively. Few large recently-duplicated subclades (gene number >5) were observed in the phylogenetic tree, except for three sub-clades in E. grandis (6, 6 and 13 genes, respectively) and one subclade in M. domestica (7 genes). These results indicated that a few extensive gene expansion events involving SWEET genes occurred in a species-specific manner; conversely, most expansion events took place before the taxonomic families or more ancient species diverged. Trees were built with the reliability of internal nodes and evaluated using the Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) values in PhyML 3.1 and were further edited by MEGA 5.0. The phylogenetic tree had exactly the same topological structure described by Chen Li et al. 7 and could be divided into four clades, the major nodes of which were supported with high confidence (≥0.80). We followed the nomenclature of Chen et al. according to the distributing of the SWEET members in A. thaliana, and they are named clades I, II, III, and IV. Dicot and monocot SWEET clades were compressed to triangle.
Interestingly, all the algal SWEET members clustered in one cluster and was apparently an outgroup, exhibiting co-orthologous relationship of all other plant SWEET genes (Figs 3 and S4). Whereas, SWEET genes in Clade II have relatively close relationship with the algal SWEET clade. Besides, each clade has nearby nested outgroups, constituted by SWEET members from all the surveyed basal land taxonomy (bryophyta plants and S. moellendorffi), indicating these four clades split as early as land plant speciation. The SWEET genes of the gymnospermous plant were also detected within all four clades. Additionally, all angiosperm plants could be found in every clade, except the aquatic moncot, S. polyrhiza. SWEET members in S. polyrhiza were absent in Clade III and IV. Finally, compared with the other three clades, clade III has the highest number of genes (205). Five rice SWEET genes in clade III have been reported to confer susceptibility to Xoo 18 , and may cause bacterial blight disease in rice. In the clade IV, the lowest number of genes (55) was observed.

Molecular evolutionary analysis of SWEET genes.
To better estimate the evolutionary rates of the expanded SWEET family in angiosperms, especially in dicots and monocot lineages, four clades in the phylogenetic tree were classified into distinct gene families. First, the monocot-specific (M) and dicot-specific (D) gene families were defined based on the following criteria: (1) According to the species tree ( Fig. 1) and the distribution of homologs in A. thaliana, the M or D gene families should consist of homologs from most monocot or dicots species (not less than half of the dicots or monocots), (2) the clades in which the M or D gene families resided should have support values for basal nodes ≥0.70 ( Fig. 4 and Table 1). These SWEET gene families were preserved throughout the evolutionary history of angiosperms and are regarded as a reliable core set of SWEET genes in angiosperms. Finally, 11 D gene families and 10 M gene families were explored, and these families accounted for the majority of all SWEET homologs. In the four clades we defined above, different numbers of M and D gene family members were characterized in each clade. Three M and four D in clade I, two M and two D in clade II, three M and three D in clade III, and two M and two D family were identified, respectively.

Discussion
SWEET genes are ubiquitous in cellular organisms, from monocellular prokaryotes to higher eukaryotes 1-7 . The dramatic expansion of SWEET genes in plant taxa indicates their functional importance in plants 7,9,13,[19][20][21]30 . However, to date, only a few plant species have been investigated 7,8,[19][20][21] , and the SWEET family has been considered to be an evolutionarily conserved family 7,8 . The accessibility of more high quality genome sequences provides us with an unprecedented chance to analyze this multi-copy gene family in-depth. As sequencing gaps or errors occurred in almost all sequenced genomes, the prediction of a multi-copy gene family may be underestimated.
In the present study, 31 well-annotated or well-assembled genome sequences were carefully selected to minimize the impact of these errors. In addition, considering assembling and sequencing errors, the incomplete of genome sequences or errors in phylogeny reconstruction, we allowed for the gene families in our analysis to be missing in up to half of the dicot or monocot genomes (see the Results). SWEET homologs were systemically surveyed in 31 representative species, ranging from unicellular aquatic algae to terrestrial higher plants, thereby demonstrating its functional importance and ancient origin. Only one to four SWEET homologs were detected in four aquatic To confirm our findings, another gene family, the HUS1 gene family, which is required for homologous recombination repair during meiosis, was also identified in 31 species. This gene family displayed a copy number conservation, evidently different than that of SWEET genes ( Supplementary Fig. S5). Family expansion is generally generated by gene duplication, which frequently occurs in plant taxa and has been considered to be a source of neo-functionalization and genetic redundancy [24][25][26][27]31 . Estimation of the different  Table 2. Estimation of the evolutionary parameters in CDS of SWEET genes in monocot-specific (M) and dicot-specific (D) families. n represent sequence numbers within these families;ts/tv means transition/ transversion rate; dN/ds means non-synonymous/synonymous rate.
SCiENtifiC RepoRTS | (2018) 8:13440 | DOI:10.1038/s41598-018-31589-x duplication models that led to the expansion of SWEET genes in vascular plants was also conducted, and included WGD/SD, Tandem, proximal and dispersed duplication 25,31 . Each duplicated model is biased for gene retention. Duplicated genes retained after different duplicated mechanisms often show opposite extremes of the spectrum, particularly in terms of their fates and divergence in expression 26,27 . For example, retained WGD duplicates may play a primary role as a buffer of crucial functions, thereby providing evolutionary stability. Dispersed duplications largely contribute to genetic novelty and adaptation to new environments 26,27 . The distinct duplication patterns observed in this study imply various functional differentiations among different species or taxa. Based on our findings, we inferred that ancestral core SWEET genes may be predominantly dispersed duplications. Subsequently, WGD/SD and tandem duplications mainly contributed to the expansion of SWEET genes in angiosperms. Further molecular evolutionary rate estimations implied that these WGD/SD and tandem duplicated correlated SWEET gene pairs underwent purifying selection.
Gene duplication and expansion are always followed by functional diversification, and functional diversification may play an important role in providing novel genes for adaptation to new environments 24,25,31 . Here, the expansion of SWEET genes, as well as their diverse roles in multiple processes, clearly indicates their functional diversification and evolutionary history. Together, these sugar transporters exhibited evolutionary conservation, as indicated by remarkable similarities in the phylogenetic relationships within the species tree among SWEET members in 31 species. However, these SWEET genes were diversified into four clades. Among these four clades, only clade II exhibited old, ancient member that were evolutionarily related to algae. To better trace the evolutionary history of SWEET genes, these four clades were further divided into 11 D and 10 M subfamilies. Ten of the 21 subfamilies had positive selection sites, indicating that they had important functions under positive selection. For example, M7 had two positive selection sites, and OsSWEET11 and OsSWEET15 have been shown to contribute to seed filling and size, and are important in breeding and are involved in domestication 32 .
Several SWEET genes acting as both transporters and R-genes, have attracted the attention of researchers 5,8,17,33,34 . According to our results, clade III harbored three monocot subfamilies, two of which had positive selection sites, indicating positive selection. In clade III, all five rice members were determined to have been targeted by the Xoo TAL effectors, thereby inducing pathogenic virulence 18 . Among these, loss-of-function alleles of 3 susceptibility loci (xa25, xa13, xa41) clustered within M7 and M8 have been identified as well-known R-genes that are utilized to combat bacterial blight disease 5,17,33 . We can therefore infer that families M7 and M8, or even clade III, may compose a gene pool that can be used for the identification of resistance genes from transporters in various species. Furthermore, arecently evolved hexose transporter gene in wheat (Triticum aestivum), Lr67, was found to confer partial resistance to three wheat rust pathogen species and powdery mildew; it is a member of the sugar transport proteins (STP) family 34 . Its ortholog in A. thaliana STP13 has also been shown to confer basal resistance to Botrytis cinerea 35 . Therefore, the transporters from which pathogens prey on nutrients from the host have been considered to be a genetic reservoir for R-genes. Clarifying the evolutionary fate of SWEET genes in clade III would be in favor of in-depth function and molecular mechanism analysis of SWEET genes. The Subfamilies defined in our study are believed to have been preserved throughout the evolutionary history of angiosperms and are regarded as a reliable core set of SWEET genes in angiosperms. No matter monocot or dicot species, two 'ancestral genes' were deduced (Fig. 6). One of these genes was duplicated into two core angiosperm gene pairs (D7, D8 and M6, M7) and the other was retained (D9 and M8). Taking the duplication modes that SWEET genes are involved in, we aimed to trace the evolutionary fate of SWEET genes in clade III, using SWEET genes in rice and maize as examples. Clear orthologous relationships were detected between these two species. Interestingly, the five rice homologs were all dispersed duplication correlated genes, while maize has more SWEET genes that originated from recently duplication, including WD and TD, and may result in functional redundancy. Different evolutionary fates may result in functional diversity or redundancy. Our results may provide a theoretical basis for further analyses of functional and molecular mechanisms of these SWEET genes. Together with our analysis, the engineering of candidate SWEET mutants with CRISPR/Cas9 system 36 , could be easily performed during genomic editing of TAL effector target sites, which could be a promising for the exploitation and production of multiple R-genes.

Methods
Data sources. 31  depending on their copy number and genomic distribution. We employed MCScanX to perform synteny analysis and estimate the duplication models in fine-assembled plant genomes (fine-assembled plant genomes means corresponding plant genome sequences had been assembled into pseudomolecule scales).
Phylogenetic analysis. The ML method was used to build phylogenetic trees using the amino acid sequences of the entire CDS sequences by PhyML 3.0. All the sequences were first aligned using MAFFT with the auto strategy 40 . As there were too many gaps in the alignments of the entire protein sequences, trimAl v1.2 was used to delete gaps with parameter of -automated1 41 (Additional file 3). Then aligned sequences were further tested to select the best-fit amino acid substitution model for constructing the ML phylogenetic tree by using ProtTest 3.4 42 . The most appropriate model estimated with ProtTest 3.4 was JTT + G + F (−lnL = 44530.08). Finally, trees were constructed with the reliability of internal nodes and evaluated by using Shimodaira-Hasegawa approximate likelihood ratio test (SH-aLRT) values 43 . Other criteria were set according to the results of ProtTest (gamma shape = 1.257; amino acid frequencies = observed). Obtained trees were edited with MEGA 5.0.
To decipher molecular evolutionary genetic basis of SWEET genes, their nucleotides of CDS were selected from gene model sequences of all surveyed species by a perl script. Then nucleotides of each CDS were submitted to GUIDANCE2 44 website and firstly translated to amino acid sequences and aligned by MAFFT. This aligned amino acid sequences were re-transferred to nucleotide sequences. Finally, unreliable alignments were masked by N with a cutoff (0.90). All the following analysis were conducted with these masked alignments. The HyPhy package with the Genetic Algorithm for Recombination Detection (GARD) method as implemented on the Data Monkey webserver (http://www.datamonkey.org/) 45,46 was used to detect break point sites, which indicated points of unequal crossover.
The codon-based maximum likelihood (CodeML) method in the PAML4.0 package and MEGA 5.0 were firstly used to estimate the average dn/ds ratio of genes within each sub-families 47 . A branch evolutionary analysis for positive selection was conducted using CodeML for average dn/ds of the genes in the M and D sub-families with one-ration model. All masked aligned CDS in each sub-families were used to reconstruct consensus trees for molecular genetic analysis by Seqboot, Dnadist, neighbor and consense program in Phylip package 48 .
To identify the probabilities of sites under positive selection in each sub-families, site models (M7 vs. M8) were implemented in which ω could vary among sites 49 . We used estimated transition/tranversion rates and the F3×4 codon frequencies algorithm as the codon substitution models in the PAML program. Additionally, all of the positively selected sites in the site and branch-site models were identified by using Bayes Empirical Bayes (BEB) analysis with posterior probabilities ≥0.80 47 . Furthermore, positively selected sites were also deduced in the Datamonkey web server by the random effect likelihood (REL) method 45 . Candidate sites under positive selection were defined as those with Bayes factor >50 for REL 45 .

Availability of Data and Materials
All data employed in the present study were downloaded from public databases, which we depicted in methods and materials part of our manuscript. Genomes used for identifying SWEET genes were listed in the Supplementary Table S1. Sequence alignments used for phylogenetic tree were provided as Supplementary Dataset 2. Figure 6. Evolutionary fate of rice and maize SWEET genes in Clade III. According to the phylogenetic tree, no matter monocot or dicot species, two 'ancestral genes' were deduced in Clade III. One of these genes was duplicated into two core angiosperm gene pairs (D7, D8 and M6, M7) and the other was retained (D9, M8).