Functional divergence and intron variability during evolution of angiosperm TERMINAL FLOWER1 (TFL1) genes

The protein encoded by the TERMINAL FLOWER1 (TFL1) gene maintains indeterminacy in inflorescence meristem to repress flowering, and has undergone multiple duplications. However, basal angiosperms have one copy of a TFL1-like gene, which clusters with eudicot TFL1/CEN paralogs. Functional conservation has been reported in the paralogs CENTRORADIALIS (CEN) in eudicots, and ROOTS CURL IN NPA (RCNs) genes in monocots. In this study, long-term functional conservation and selective constraints were found between angiosperms, while the relaxation of selective constraints led to subfunctionalisation between paralogs. Long intron lengths of magnoliid TFL1-like gene contain more conserved motifs that potentially regulate TFL1/CEN/RCNs expression. These might be relevant to the functional flexibility of the non-duplicate TFL1-like gene in the basal angiosperms in comparison with the short, lower frequency intron lengths in eudicot and monocot TFL1/CEN/RCNs paralogs. The functionally conserved duplicates of eudicots and monocots evolved according to the duplication-degeneration-complementation model, avoiding redundancy by relaxation of selective constraints on exon 1 and exon 4. These data suggest that strong purifying selection has maintained the relevant functions of TFL1/CEN/RCNs paralogs on flowering regulation throughout the evolution of angiosperms, and the shorter introns with radical amino acid changes are important for the retention of paralogous duplicates.

CEN/RCNs prevents redundancy or silencing by functional divergence 14,15 , which occurs by positive selection or through the relaxation of environmental constraints 15 .
Different expression patterns of duplicated TFL1/CEN/RCNs genes in Arabidopsis 16 , apple 17 , tomato 18,19 , and tobacco 20 tissues have been reported. Such differential expression was suggested as complementary functions (subfunctionalisation) 21,22 . Following functional divergence, genes normally experience a phase free from selective constraints 23 . Because of the conserved properties of TFL1/CEN/RCNs paralogs, poor resolution of nucleotide phylogeny 24 cannot explain their divergence. Nevertheless, a single reciprocally switched amino acid could cause functional interconversion between FT (flowering activator) and TFL (flowering repressor) 25,26 . Therefore, a few changes in the amino acid sequence can alter protein function to escape the redundancy of duplicates 27 . Therefore, determining radical amino acid changes between TFL1/CEN/RCNs paralogs (the type-II functional divergence of Gu 28,29 ) could be useful for predicting their functional divergence after duplication.
The functional conservation and divergence of paralogous genes is not only reflected in coding sequences, but also in exon-intron structure. Structural divergence is prevalent in duplicated genes and leads to functionally divergent paralogs 30 . Variable intron lengths could be relevant to functional compensation in coexisting paralogs 30 and provide heterogeneous regulatory functions in duplicate [31][32][33] . Highly expressed genes have longer introns than genes expressed at low levels 33 . Exon length was also suggested to be associated with molecular functions in flowering development cf. 34 . The Amborella trichopoda genome (http://www.amborella.org/) 13 enables the comparison of gene structure and sequence variation in TFL1-like gene between basal angiosperms, and monocot and eudicot angiosperms. Comparisons of gene structure and intron lengths may enhance our understanding of evolution and its relevance among paralogs.
Genetic diversity among TFL1/CEN homologs played a key role in the diversification of flowering plants 7,23 , which was probably driven by heterogeneous selective pressures on different gene regions. For example, strong selective sweeps in coding regions, and balancing selection of promoters were detected in Arabidopsis 35 . Furthermore, epistatic selection was identified through a QTL closely linked to the Arabidopsis TFL1 36 . In addition, latitudinal gradients adaptation was also inferred by nonsynonymous polymorphisms of TFL1 37 . However, there have been limited studies focussed on the effects of selective pressures on TFL1/CEN/RCNs paralog duplication, as well as the TFL1-like gene in basal angiosperms. These functionally conserved paralogous gene duplicates may be subject to strong purifying selection pressures that constrain redundant functions, such as the floral-regulatory paralogs SEPALLATA 1 (SEP1) and SEPALLATA 2 (SEP2), and SHATTERPROOF 1 (SHP1) and SHATTERPROOF 2 (SHP2) 38 . Selective constraints may be important in functionally redundant paralogous genes for buffering an organism's phenotype against deleterious mutations 39 .
In this paper, a broad range of representative organisms from basal angiosperms, eudicots, and monocots were sampled to determine whether flowering plants exhibit divergent functions of TFL1/CEN/RCNs duplicates and how the selective pressure drove their evolution. General patterns of structural divergence in duplicated genes were analysed to represent the divergence/conservation of these paralogous genes. The aims of this research were to investigate (1) the evolution of intron variability in angiosperm TFL1/CEN/RCNs genes; (2) the effect of positive selection on angiosperm TFL1/CEN/RCNs coding sequences; and (3) the functional divergence between paralogs of angiosperm TFL1/CEN/RCNs, and thus infer the ancestral/derived type of TFL1/CEN/ RCNs paralogs.

Results
Sequence length variation. All sequences were confirmed as TFL1-like by the presence of histidine at the 92 nd amino acid position (corresponding position at the 88 th site of Arabidopsis) 25 . Only one copy for each Magnoliid species was obtained after amplification, and this result is consistent with only one TFL1/CEN/ RCNs member in EST-library of basal angiosperm database (accession number: gnl|Liriodendron|b4_c119764, Ancestral Angiosperm Genome Project, http://ancangio.uga.edu/index.php). The sequences amplified from Magnoliid shown the best hit to TFL1/CEN/RCNs family (Nelumbo nucifera CEN-like protein 2, E-value: 5e −99 -2e −92 ). Exon lengths of eudicot TFL1 and CEN, monocot RCN1, RCN2, and RCN3, and basal angiosperm (magnoliids + Amborella) TFL1-like gene range 519-609 bps, 447-531 bps, 522 bps, 522 bps, 522 bps and 516-522 bps, respectively. The length of introns from TFL1, CEN, RCN1, RCN2, RCN3, and basal angiosperm TFL1-like genes are 496-2048 bps, 320-3273 bps, 312-384 bps, 509-1007 bps, 510-643 bps, and 1444-3539 bps, respectively. Exon lengths were found to be constant and there were no significant differences between paralogs, with the exception of exon 4 between eudicots and monocots (Fig. 2). In contrast, the intron lengths were highly variable, and the monocot RCN1 was found to have relatively short but constant intron lengths compared with other paralogs. Furthermore, monocot RCNs had a higher number of intron length polymorphisms than eudicot TFL1/CEN (Fig. 2). Although only two TFL1-like full sequences were obtained from magnoliids, the synapomorphy of the long intron lengths of TFL1-like genes in Lauraceae and Magnoliaceae were confirmed by PCR (Additional file 1: Fig. S1). Analysis revealed that the monocot Sorghum bicolor has lost intron 1, and that exon 2 has merged with exon 1, and this sequence is removed when estimating the exon/intron length variation. The exon/intron structures and lengths are shown in Fig. 3 and Additional file 1: Table S2.
Correlation between intron lengths and conserved motifs. Twelve conserved motifs, which are identical to the motifs of putative cis-acting elements, were identified in noncoding regions (Additional file 1: Table S3), and the four-base motifs CAAT box and WRKY were abundant and both presence frequencies (0.0054 and 0.0064, respectively) are higher than those predicted by random occurrence (>1/256, p = 0.0245 and 0.0001, respectively) (Additional file 1: Table S3). The TFL1-like gene from magnoliids was found to have longer introns and more abundant cis-element to motifs. A strong and significant positive correlation between the number of cis-element motifs and intron length were found (R 2 = 0.711, p < 0.0001, Fig. 4), suggesting that noncoding regions in TFL1/CEN/RCNs paralogs are relevant to intron length.
Phylogenetic tree of angiosperm TFL1/CEN/RCNs paralogs. The phylogenetic tree of angiosperm TFL1/CEN/RCNs paralogs was reconstructed using amino acid sequences (Fig. 3) and was inconsistent with the hypothetical tree ( Fig. 1 and Additional file 1: Fig. S2). The magnoliid TFL1-like gene was misgrouped with monocot RCNs (Fig. 3). The misgrouping for monocot and magnoliid paralogs was also revealed in the bush-like tree topology for basal lineages by Bayesian inference (Additional file 1: Fig. S3). The misgrouping of Magnoliid with monocot or eudicot is common in phylogenetic analysis using certain genes, which is probably due to combination of the relatively old age of these taxa and long branches attraction [40][41][42] . In contrast to the unresolved topology of basal lineages of eudicot and magnoliid paralogs, the monocot RCN paralogs were well grouped, with relatively high bootstrap supports in the ML and Bayesian trees ( Fig. 3 and Additional file 1: Fig. S3). Furthermore, RCN2 and RCN3 are grouped together in both ML and Bayesian analyses (Fig. 3), implying that the duplication sequence in monocots is RCN1 and RCN2/3 followed by RCN2 and RCN3.

Positive selection analyses.
To examine the effect of selective pressures on the angiosperm TFL1/CEN/ RCNs paralogs, the ratio (ω) of missense (Ka) to silent mutation rates (Ks), an indicator of natural selection, was estimated. Likelihood ratio analysis revealed that the free ratio model was a better fit than the constant ratio model (2∆L = 97.8117, df = 61, P = 0.0004), suggesting a strong and pervasive purifying selection on angiosperm TFL1/  (Table 1), the two and three ratio models were performed, and showed relaxation of selective constraints (ω 0 < ω 1 , ω 2 < 1) for eudicot TFL1, CEN, monocot RCN2 and RCN3, and magnoliid TFL1-like gene, but strong purifying selection (ω 0 > ω 1 ) for the monocot RCN1 ( Table 2). The two ratio model was a better fit for the evolution of monocot, eudicot and magnoliids TFL1/TFL1-like paralogs than three ratio models. This suggests that the grouping of eudicot and magnoliid TFL1/TFL1-like paralogs is a consequence of functional constraint and that both paralogs suffered different selective pressures for magnoliids TFL1/TFL1-like paralogs ( Table 2).

Evolutionary divergence between angiosperm TFL1/CEN/RCNs paralogs. The pairwise Ka/Ks
ratio was calculated and plotted against Ks to reveal patterns of selection through time. No pairwise Ka/Ks > 1 were obtained suggesting that no positive divergent selection occurred between paralogs. Eudicot TFL1 and CEN were mostly distributed in the quadrant Ka/Ks < 1 and Ks > 1, indicating long-term purifying selection. The monocot RCNs and magnoliids TFL1-like genes were distributed in the quadrant Ka/Ks < 1 and Ks < 1. We divided this quadrant into two classes: (1) Ka/Ks > 0.097 (average Ka/Ks), suggesting the relaxation of selective constraints. This quadrant comprises the magnoliid TFL1-like and the monocot RCN2 and RCN3; and (2) Ka/ Ks < 0.097, suggesting strong selective constraints or recent purifying selection, which comprised the monocot RCN1 (Fig. 5). This inference is consistent with the results of tests for selection hypotheses ( Table 2).
The sliding window analysis showed pairwise Ka/Ks < 1 in all regions among paralogs, with the greatest evolutionary divergence in exon 1 and exon 4 ( Fig. 6). Relatively conserved regions in exon 2 and exon 3 indicate that these regions were subject to strong selective constraints. Relatively high Ka/Ks at exon 4 between the recently divergent monocot RCN2 and RCN3, indicate that this was subject to low selective pressures of constraining amino acid changes between RCN2 and RCN3 (Fig. 6D). Small Ka/Ks ratios between eudicot and monocot paralogs indicate functional conservatism divergence ( Fig. 6B and C). Magnoliid TFL1-like gene was found to have a highly divergent exon 1 and exon 4 compared with the other paralogs (about 100 th bp in TFL1, 180 th and 450 th bp in CEN, 340 th bp in RCN2). This suggests that this gene was subject to different selective pressures than the eudicot and monocot paralogs, while the conservation of exon 2 and exon 3 suggests long-term and pervasive functional constraints on these genetic regions (Fig. 6E).

Radical functional divergence between angiosperm TFL1/CEN/RCNs paralogs.
An in silico analysis of radical amino acid changes between duplicated genes was conducted for testing the functional divergence of TFL1/CEN/RCNs paralogs. Nonsignificant radical functional divergence, as estimated by the type-II functional divergence index (θ II ) 43 , was found between angiosperm TFL1/CEN/RCNs paralogs ( Table 3). The proportion of fixed radical change between paralogs (F 00,R ) was zero between eudicot TFL1/CEN and paralogs of monocots and magnoliids, but more or less between paralogs between monocot RCNs and magnoliid TFL1-like genes (Table 3). This indicates that there is functional conservation of paralogs between eudicots and monocots, and functional specialization between paralogs within the eudicot and monocot species.  Table S2. Types and locations of the putative cis-acting elements are listed in Additional file 1: Table S3.

Hypotheses Scenarios
Purifying selection on magnoliid TFL1-like Table 1. Hypotheses and the corresponding scenarios for the grouping of eudicot TFL1 and magnoliid TFL1like genes ω 1 , ω 2 , and ω 0 are the Ka/Ks ratio of the branches of eudicot TFL1, magnoliid TFL1-like, and the other lineages (backgrounds), respectively. The ω 1 was also set for the eudicot CEN and monocot RCN1~3 for testing the same hypotheses.

Discussion
Exon length conservation and intron length variability. Exon length conservation implies constraints of gene functions among organisms 34,44 . Eudicot and monocot TFL1/CEN/RCNs are functionally conserved and the inflorescence architecture was determined by comparison with the model organisms Arabidopsis and rice 45 . Highly variable intron lengths and sequences of angiosperm CEN/RCNs/TFL1-like genes suggest absence of constraining reproductive function from noncoding regions. It is not known whether intron fragments have been gained or lost through evolution, due to poor or failed alignment in introns. However, we suspect that there was a gradual deletion throughout intron evolution because, generally, there are longer introns in basal angiosperms than in both eudicots and monocots ( Fig. 3 and Additional file 1: Fig. S1). A deletion of this type in introns could be the result of recombination 46 and may have contributed to the divergence and functional differentiation in this family of genes 47 . Intron lengths are positively correlated with the number of conserved motifs, which are identical to the putative transcription factor binding sites (R 2 = 0.711, P < 0.0001, Fig. 4). Furthermore, certain motifs in intron may stimulate gene expression 48 . Long introns with more conserved motifs could have a complicated folding structure, as well as alternative splicing sites that affect transcription, particularly for the basal angiosperms (such as Lauraceae, Magnoliaceae, and Amborella). Alternative splicing in TFL1/CEN paralogs was reported to influence terminal flowering and flowering time 49 . Formation of gene loops is also relevant to the activation or maintenance of Arabidopsis TFL1 expression 45 . Therefore, gene lengths are hypothesised to be a key factor affecting the expression efficiency of TFL1 orthologs. TFL1 is targeted by several MADS-box genes, which have different functions during floral transition, and they coordinate the timing of flowering and floral development with TFL1 45 , indicating that the TFL1 orthologs could have several protein binding sites. In addition, both AP1 and LFY can bind to the TFL1 locus and directly suppress TFL1 expression [50][51][52] . Suppression of TFL1 in inflorescence branching regulation by MADS-box genes also affects LFY and AP1 expression 45 . No AP1 binding sites (CArG box) or LFY binding sites were found in either eudicot TFL1 and monocot RCN1, but were present in basal angiosperms (Additional file 1: Table S3). This might suggest the functional relevance of long introns in the TFL1-like gene in basal angiosperms. In contrast, the short introns of eudicot and monocot paralogs could reflect their low expression 33 , which may facilitate the retention of duplicated genes and the conservation of their ancestral functions 53 .
Pervasive purifying selection and relaxation of selective constraints on eudicot and monocot TFL1/CEN/RCNs paralogs. It was suggested that the functions of angiosperm flowering development genes, have been conserved under selective constraint in eudicots and monocots 34,45,54 . Strong purifying selection of the TFL1 paralog with an average ω = 0.097 was inferred based on site-model analysis 34 , which is similar to the average pairwise Ka/Ks = 0.0791 estimated in our analysis (the horizontal dotted line in Fig. 5), and lower than the average ω of other floral-regulatory paralogs (SEP1 vs. SEP2 and SHP1 vs. SHP2, both ω = 0. 16 38 ). Nonsignificant radical functional divergence (θ II ) between paralogs supports the functional constraint hypothesis for angiosperm TFL1/CEN/RCNs paralogs (Table 3). However, in the phylogenetic analysis ( Fig. 3 and Additional file 1:   Table 2. Summary of the ω estimation and likelihood ratio test (2ΔL) between two-ratio (ω 0 ≠ ω 1 = ω 2 ) and three-ratio (ω 0 ≠ ω 1 ≠ ω 2 ) models. ω 1 , ω 2 , and ω 0 are the Ka/Ks ratio of the branches of the eudicot TFL1 (or eudicot CEN, monocot RCNs), magnoliid TFL1-like, and background lineages, respectively. np: number of parameters p: p-value obtained from fitted model using χ 2 test. The pairwise Ka/Ks ratio and the sliding window analysis suggest that there were long-term selective constraints on eudicot TFL1 and CEN (Fig. 5), particularly on exon 2 and exon 3 of all angiosperm TFL1/CEN/ RCNs paralogs (Fig. 6). Exon 2 and exon 3 are activator regions (ligand-binding site) of TFL1 25,55 , and are highly conserved with no amino acid changes (Fig. 7). However, relatively higher pairwise Ka/Ks within monocot and magnoliid paralogs suggests that constraints were relaxed, particularly in exon 1 and exon 4 (Figs. 5 and 6), which is also supported by the analysis of site-specific radical functional change between paralogs of both monocots and magnoliids (Fig. 7). Residues 133-151 in exon 4 form an external loop, and this conformation determines the functional specificities of floral regulators 55 . Loss of the hydrogen bond between the external loop (exon 4) and the activator regions (exon 3) may be responsible for the functional conversion of activators of FT to floral repressors of TFL1 55 . One radical change in RCN1/RCN2, RCN2/RCN3, RCN2/magnoliid TFL1-like, and three radical changes in RCN1/RCN3 within the external loop were estimated (Fig. 7), suggesting that the paralogous divergence occurred by relaxation of selective constraints, particularly between the monocot RCNs.
Limited fixed radical differences (F 00,R ) could suggest the maintenance of ancestral function between paralogs of different taxa and imply that the eudicot TFL1/CEN and monocot RCNs do not fit the neo-functionalisation hypothesis of duplicate genes. Instead, the duplication could be a case of sub-functionalisation due to the relaxation of functional constraints, because one-third to a half frequency of radical change (G R ) was detected ( Table 3). The duplication-degeneration-complementation (DDC) 56 and escape from adaptive conflict (EAC) models 57 are commonly used to explain subfunctional divergence of duplicates and retention of duplicates 58 . The main difference between the DDC and EAC models is that the EAC puts more emphasis on positive selection for gene specialisation during or after duplication, while positive selection is not required for DDC 59 . In the case of eudicot TFL1/CEN and monocot RCNs, all duplicates retained plesiomorphic functionality with slight differences, by relaxation of selective constraints. However, no specific paralogs suffered positive selective pressures, suggesting a more likely evolutionary fit to DDC. The EAC hypothesis therefore, was rejected.

Relaxation of selective constraints and phylogenetic convergence of magnoliid TFL1-like genes.
No duplication of TFL1-like genes was found in basal angiosperms. The constructed phylogenetic tree showed that the magnoliid TFL1-like genes are grouped with eudicot TFL1/CEN paralogs (Fig. 3 and Additional file 1: Fig. S3). This may suggest: (1) constraining ancestral functions of the eudicot TFL1 with the basal-angiosperm TFL1-like gene (functional constraints hypothesis), (2) identical selection pressures acted on both eudicot TFL1 and magnoliid TFL1-like genes synchronously (synchronous selection hypothesis), or (3) eudicot TFL1 and magnoliid TFL1-like genes evolved in parallel independently, resulting in phylogenetic convergence (phylogenetic convergence hypothesis). The LRTs for the two ratio and three ratio models showed the functional constraints between magnoliid TFL1-like and other paralogs (ω 1 = ω 2 ≤ 1) except the monocot RCN1 (ω 0 ≠ ω 1 ≠ ω 2 ) ( Table 2). This indicates that (1) eudicot and monocot TFL1/CEN/RCNs could share ancestral polymorphisms and functions with TFL1-like gene of basal angiosperms, and (2) magnoliid TFL1-like and monocot RCN1 could functionally converge under heterogeneous evolutionary rates. The basic functions of TFL1/CEN/RCNs paralogs in magnoliids, eudicots, and monocots do not alter, but there is division of labour by small fractions of neutral or nearly neutral amino acid replacements, which is consistent with the functional divergence analysis (Table 3).    Table 3. Summary of type-II functional divergence analysis for angiosperm TFL1/CEN/RCNs/TFL1-like paralogs. θ II , coefficient of type-II functional divergence (SE: standard error); p-value, significance test based on Z-score test to test the hypothesis of deviation of θ II from zero; a R /π R : the ratio of radical change under functional divergence versus nonfunctional divergence; G R and G C , proportion of radical change and conserved change, respectively; F 00,N , F 00,R , and F 00,C , proportion of none change, radical change, and conserved change of amino acids between clusters but no change within clusters, respectively.
The long-term constrained evolution of floral development genes across divergent species was inferred by comparative analyses of 18 angiosperm species 34 . However, the evolutionary pattern of these genes in basal angiosperms, such as Amborella, Lauraceae, and Magnoliaceae, has not yet been investigated. Although the functional constraint hypothesis was supported between magnoliid TFL1-like genes and most other paralogs, the ω of foreground branches are larger than background lineages (Table 2), supporting the hypothesis of relaxation of constraints for flexing the non-duplicated magnoliid TFL1-like genes in shaping floral diversity inferred by both pairwise Ka/Ks (Fig. 5) and functional divergence analysis ( Table 3). The relaxation of selective constraints was common for duplicated genes at the phase of early duplication that accelerated evolution of duplicated genes to escape from redundancy, while most gene duplicates were stochastically silenced with few survivors subsequently experiencing strong (10-fold efficiency) purifying selection 14 . Here, we provide at least two novel discoveries regarding the evolution of TFL1-like genes in basal angiosperms: (1) Lauraceae and Magnoliaceae TFL1-like genes are divergent from those of Amborella and are phylogenetically similar to the eudicot TFL1/CEN; (2) purifying selection prevailed over the magnoliid TFL1-like genes as well as the eudicot and monocot paralogs, but the unfixed paralogous radical replacement enabled their differentiation through the relaxation of selective constraints.

Conclusions
In this work, we inferred evolution and functional divergence of TFL1/CEN/RCN among 18 angiosperm species, including basal angiosperm species to elucidate the duplication history of TFL1/CEN/RCN genes. We found long-term retention of functionally redundant duplicates TFL1/CEN/RCNs in the angiosperm genomes. Based on the results of purifying selection on exon, radical amino acid changes and various intron lengths with cis-acting element analysis, the maintenance and conservation of their ancestral function could be explained by duplication-degeneration-complementation model. The ancestral function of TFL1/CEN/RCNs might be preserved and divided into each duplicates. Therefore, the strong selection pressure against removing any duplicates may cause the permanent establishment of duplicates during evolution of flowering plants. Consequently, these two duplicates together maintain the conservative mechanism in inflorescence architectures, and expansion of the PEBP gene members may be important factor for driving morphological divergence among angiosperms.
Intron length of TFL1 paralogs was various. TFL1 introns of basal angioserpm tend to have longer intron and more predicted cis-acting than monocot and eudicot. On the other hands, exon length was conserved with low amino acid substitution rate. These data suggest that strong purifying selection has maintained the relevant functions of TFL1/CEN/RCNs paralogs on flowering regulation throughout the evolution of angiosperms, and the shorter introns with radical amino acid changes are important for the retention of paralogous duplicates.

Methods
Data collection and phylogenetic tree reconstruction. The full lengths of angiosperm TFL1/CEN/ RCNs genes were obtained from NCBI GenBank. Organisms without complete paralogs (e.g. only TFL1 of eudicot and RCN1 of monocot organisms) were excluded. Due to high similarity among PEBP gene family, many sequences named with TFL1 or CEN are belonged to FT/BFT/MFT. For preventing miss-inferring of phylogenetics of TFL1/CEN, we only included sequences which were previously identified as TFL1/CEN in our subsequent analysis e.g. ref. 7  , and the TFL1like gene from Amborella trichopoda (Amborellaceae) were obtained from GenBank. We also amplified complete TFL1-like sequences from two basal angiosperm species Lindear megaphylla (Lauraceae) and Liriodendron sp. (Magnoliaceae) using primers (MaLaTFL1-F1: 5′-ATGGCAAGAATGTTAGAGC-3′; MaLaTFL1-R1: 5′-CAACGTCTCCTNGCAGCTG-3′). Intron positions were rechecked based on the GT-AG rule. Exon-intron structures were drawn by Exon-Intron Graphic Maker (http://wormweb.org/exonintron). Exons of Litsea cubeba, Neolitsea phanerophlebia, Persea sp., and Michelia compressa (GenBank accession number: KY933631-KY933636) were also sequenced for coding region analyses. The identification of the exon sequences were conducted using BLAST. Sequences without best hit to TFL1/CEN/RCNs were discarded (eg. FT/BFT/MFT). The phylogenetic tree of TFL1/CEN/RCNs was reconstructed by exons using the Maximum likelihood method with the GTR+G model, gamma distribution (α = 0.46) for substitution rate among sites using PhyML 3.0 60 . The tree bisection and reconnection (TBR) was adopted for tree rearrangement and fast bootstrap method aLRT was adopted for branch supports.

Conserved motifs in introns.
Conserved motifs in introns were found by searching the database of plant cis-acting regulatory DNA elements, NEW PLACE 61 . From 212 types of predicted motifs like cis-acting elements, 12 putative functional cis-acting elements that have been reported to regulate the expression of TFL1/CEN/ RCNs paralogs were identified (Additional file 1: Table S1) 49,[62][63][64][65] and the number of these putative cis-acting elements were calculated. Correlation between the number of cis-acting elements and the total intron length (i.e. intron1 + intron2 + intron3) was estimated.
Detection of positive selection on angiosperm TFL1/CEN/RCNs paralogs. To examine the effect of selective pressures, the ω ratio, which can be used for testing the gene neutrality hypothesis Ka/Ks (ω) = 1, was estimated by maximum likelihood approaches implemented in PAML 4.7 66 . First, the ω under the free-ratio model was estimated, which allows varied ω on every branch. The likelihood ratio test (LRT) that calculates the 2× differences of log likelihood between constant-rate model and other evolutionary hypotheses (2ΔL) were used for evaluating the better fitted selective hypothesis by χ 2 test. Because TFL1-like genes in magnoliids were grouped with eudicot TFL1/CEN clades (Fig. 1), we hypothesized that (1) functional constraints between magnoliid TFL1-like and eudicot TFL1/CEN and (2) phylogenetic convergence was tested. To test these hypotheses, we estimated the ω and evaluated the goodness-of-fit of two-ratio model (ω 0 ≠ ω 1 = ω 2 ) and three-ratio model (ω 0 ≠ ω 1 ≠ ω 2 ) by LRT. ω 0 are the Ka/Ks ratio of background branches; ω 1 and ω 2 are Ka/Ks of foreground branches that allowed ω > 1, where ω 1 are the ω of eudicot TFL1/CEN or monocot RCNs paralogs and ω 2 are the ω of magnoliid TFL1-like genes. Detailed hypotheses and selection scenarios are listed in Table 1.
In addition, pairwise Ka/Ks comparisons between angiosperm TFL1/CEN/RCNs paralogs were calculated to examine the evolutionary divergence and represented by (1) the Ka/Ks against Ks plot and (2) sliding window analysis by DnaSP 5.0 67 . The Ka/Ks against Ks plot helps to determine the degrees and relative times of paralogous divergence and the sliding windows provide details for clarifying the divergent regions from the regions under selective constraints.
Functional divergence between angiosperm TFL1/CEN/RCNs paralogs. Functional divergence between paralogs caused by radical amino acid changes was assessed by the type-II divergence function implemented in DIVERGE 3.0 43 with 500 bootstrap replications. Substitutions between amino acids with different radical biochemical properties (charge positive/negative, hydrophilic/hydrophobic) are classified as a radical change, and all others are classified as a conserved change. The Z-test was conducted to test the deviation of coefficients of type II functional divergence (θ II ) from zero, and value of θ II greater than zero implied radical shifts in amino acid physiochemical properties after duplication. The fold of radical change related to non-functional change was calculated using the ratio of radical change under functional divergence versus nonfunctional divergence (aR/ πR). The proportion of such as fixed radical change, conserved change and none change sites were also calculated. Site-specific estimation of posterior probability of radical changes was performed to assess the probable regions and shifts of biochemical properties between paralogous groups.