Introduction

Whole genome duplication (WGD) is an important driving force in the evolution of flowering plants1,2. Analyses of genomic data suggest that the extant angiosperm crown arose from a common paleohexaploid progenitor3,4. In addition to this paleohexaploidization, one or more rounds of WGD recurred in the genomes of many angiosperms5, each simultaneously creating thousands of paralogous gene pairs in the affected lineages. Population genetics predicts that an entirely redundant duplicate copy cannot be maintained in the genome for a long time6. Following the ancient WGDs, some paralogs are silenced and eventually eliminated, while many of the retained paralogs may undergo changes in DNA sequence or gene expression, leading to sub- or neofunctionalization7,8,9. Wholesale gene loss after WGDs can drastically shrink genome size and gene content, and has long been viewed as a critical driving force in the evolution of higher plants10,11,12.

Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. The two lineages are important fiber resources with rapid growth rates, ease of vegetative propagation, predisposition to hybridization, and high productivity of wood13. These characteristics, together with their relatively small genomes and rapidly growing research resources, have led to Salicaceae species emerging as a model system for genetic studies of woody plants. In the past decade, the whole genomes of poplar and willow have been sequenced and made publicly available14,15. Genomic analysis revealed that, in addition to the paleohexaploidization shared among angiosperms, a lineage-specific WGD (known as salicoid duplication) occurred in the genome of the progenitor of these two lineages ca. 58–65 million years ago14,15. Genome comparison revealed that poplars and willows originated from a common paleotetraploid ancestor16. However, cytogenetic studies show that extant poplars and willows mainly exist in diploid form17, suggesting that genome diploidization occurred after salicoid duplication. This process may have been accompanied by substantial genome reshuffling and gene loss14,18,19,20. Genome-wide comparison of sequences among different chromosomes suggested that, after salicoid duplication, a series of reciprocal tandem terminal fusions of the duplicated chromosomes gave rise to the diploid progenitor of the modern taxa of these two lineages14. Approximately six million years later, two major interchromosomal rearrangements and several minor intrachromosomal rearrangements occurred16,21, which distinguished the karyotypes of willow and poplar. Previous studies have also revealed that primitive sex chromosomes have evolved in poplar and willow, and that they are associated with different autosomes in the two lineages21,22,23.
In addition to the divergences in genome structure and sex chromosome evolution, multiple lines of evidence, such as 2C DNA contents24,25, k-mer estimation and the published genomes14,15,26,27,28, indicate that willows have smaller genomes and gene contents than poplars. Considering that these two lineages share a common ancestor, willows must have lost DNA and genes at a faster rate than poplars after their divergence. However, the genetic mechanism underlying this difference remains largely unknown.

The nonsynonymous (Ka) to synonymous (Ks) substitution rate ratio (ω = Ka/Ks) can be used as an estimator of selective pressure on DNA sequence evolution29. Using this analytical tool, Clark et al. detected, among a total of 7,645 human-chimp-mouse orthologous genes, an informative set of genes whose substitution patterns in humans differed significantly from those in chimpanzee and mouse30. Applied to paralogous genes, this tool is useful for tracing the evolutionary trajectory from an initial state of complete redundancy. By comparing the selection pressure acting on paralogous genes in 26 bacterial, six archaeal, and seven eukaryotic genomes, Kondrashov et al. showed that paralogs produced through duplication were subject to purifying selection, which would lead to losses of redundant genes31. In this paper, we assess the selection pressure on paralogous pairs resulting from salicoid duplication throughout the genomes of Populus trichocarpa and Salix suchowensis. We aim to determine whether uneven selection pressure has accelerated the divergent evolution of these two sister genera.

Results

PGRS in poplar and willow

A total of 39,514 and 24,931 coding sequences contained in 19 chromosomal reconstructions were extracted from the genomic databases of P. trichocarpa and S. suchowensis, respectively, and these genes were used to detect the PGRS in poplar and willow. Plotting the average 4DTV (four-fold degenerate site transversion) values for the paralogous genes contained on each syntenic segment revealed two peaks in both poplar and willow (Fig. 1a, b), and the highest peak was attributed to salicoid duplication according to Tuskan et al.’s study14. The highest peaks covered 4DTVs in the range of 0.0–0.2 in both lineages, which was consistent with previous reports14,32. With a rejection significance of P ≤ 0.01, the confidence interval [μ − 2.58δ, μ + 2.58δ] (μ, mean; δ, standard deviation) contains 99% of the values covered by the peak associated with salicoid duplication. The confidence interval was [0.050, 0.150] and [0.103, 0.172] for P. trichocarpa and S. suchowensis, respectively. Syntenic segments with average 4DTV values outside these ranges were excluded from the following analyses because they may not have arisen from salicoid duplication. For the retained syntenic segments, we further calculated the Ks values for each paralogous pair. Based on the plotting of the derived Ks values (Fig. 1c, d), with 99% coverage, the paralogous pairs on syntenic segments with Ks values in the range of [0.000, 0.636] and those with Ks values in the range of [0.032, 0.744] (Supplementary Table S2) were recognized as PGRS in P. trichocarpa and S. suchowensis, respectively. According to Cui et al.’s study, paralogous pairs with Ks values < 0.005 should be discarded to avoid fitting a component to infinity3; thus, we modified the range of Ks values for identifying PGRS in poplar to [0.005, 0.636] (Supplementary Table S1). In total, 8,991 and 5,161 PGRS were detected on the reconstructed chromosomes in P. trichocarpa and S. suchowensis, respectively (Table 1). The synteny of the detected PGRS among the poplar and willow chromosomes is shown in Supplementary Fig. S1 and Supplementary Fig. S2, respectively.
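The interval-based filtering described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used in this study: the function names `coverage_interval` and `filter_segments` are hypothetical, the input is assumed to be the values already assigned to the salicoid peak, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def coverage_interval(values, z=2.58):
    """Return (mu - z*sd, mu + z*sd) for the values under one peak.

    z = 2.58 corresponds to about 99% coverage of a normal distribution.
    Using the population standard deviation is an assumption of this sketch.
    """
    mu = mean(values)
    sd = pstdev(values)
    return mu - z * sd, mu + z * sd


def filter_segments(segment_means, low, high):
    """Keep syntenic segments whose average 4DTV lies inside [low, high]."""
    return {seg: v for seg, v in segment_means.items() if low <= v <= high}
```

For example, calling `filter_segments(segment_means, 0.050, 0.150)` would apply the poplar 4DTV range reported above to a dictionary that maps hypothetical segment identifiers to average 4DTV values.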

Fig. 1: Plotting the average 4DTV and Ks values for paralogous genes on the syntenic segments (PGSS) in the genome of P. trichocarpa and S. suchowensis.

a Plotting the average 4DTV values for PGSS in the P. trichocarpa genome. b Plotting the average 4DTV values for PGSS in the S. suchowensis genome. c Plotting the Ks values for PGSS with 4DTV in the range of 0.050 to 0.150 in the P. trichocarpa genome. d Plotting the Ks values for PGSS with 4DTV in the range of 0.103 to 0.172 in the S. suchowensis genome.

Table 1 Gene density, No. of PGRS, ω ratios range, mean ω ratio, and median ω ratio in poplar and willow genome

Gene coverages for the 19 chromosomal reconstructions were 98.0% and 93.7% of the total poplar and willow genomes, respectively14,16. Although most genes were assembled on chromosomes of poplar and willow, the reconstructed chromosomes are incomplete, and the integrity may vary among different chromosomes. Thus, the absolute numbers of genes are not comparable among different chromosomes. In contrast, gene density is a comparable parameter under such circumstances. As shown in Table 1, gene density ranged from 84.1/Mb (chromosome XIX) to 131.8/Mb (chromosome IX), and PGRS density ranged from 24.8/Mb (chromosome XIX) to 68.9/Mb (chromosome IX) among chromosomes in the poplar genome. In the willow genome, gene density varied from 89.4/Mb (chromosome XIX) to 122.4/Mb (chromosome IX), and PGRS density varied from 27.1/Mb (chromosome XIX) to 64.4/Mb (chromosome VIII). It is noteworthy that chromosome XIX, the sex chromosome in poplar, is characterized by the lowest gene and PGRS densities among all the chromosome members.

Correlation analysis shows that gene densities and PGRS densities on the corresponding chromosomes are highly correlated between poplar and willow, with correlation coefficients of 0.79 (P < 0.001) and 0.89 (P < 0.001), respectively. The high correlation coefficients imply that the loss of PGRS scales similarly across different chromosomes in the two lineages. On the reconstructed chromosomes, poplar retains more PGRS than willow (8,991 vs. 5,161), indicating that willow has lost PGRS at a faster rate than poplar since their divergence.

Comparison of the ω ratios for PGRS within and between the two lineages

The Ka, Ks, and ω values were calculated for each PGRS on the 19 chromosomes in poplar and willow separately. At the genome-wide level, the average ω ratio for P. trichocarpa and S. suchowensis was 0.309 and 0.274, respectively (Table 1). The former was significantly higher than the latter (P < 0.001) (Table 2), indicating that PGRS in willow were subject to stronger purifying selection than those in poplar, which would result in a faster loss of redundant genes in willow than in poplar. At the chromosome level, the average ω ratios for PGRS on different chromosomes ranged from 0.301 to 0.318 in poplar and from 0.261 to 0.282 in willow (Table 1). The ω ratios were all substantially smaller than 1, indicating that PGRS were generally under strong purifying selection in both lineages. The statistical test for ω ratios of PGRS between willow and poplar indicated that this parameter differed significantly on 18 of the 19 corresponding chromosomes, the exception being chromosome XVIII (Table 2), suggesting that PGRS on most of the chromosomes in willow were under significantly stronger purifying selection than those on the corresponding chromosomes in poplar. In contrast, no significant difference in ω ratios was observed for most of the pairwise comparisons within the genome of poplar (except for XIX vs. IV, XIX vs. VI, and XIX vs. IX) and willow (except for IV vs. V) (Table 3; Table 4), indicating that PGRS on most chromosomes within each lineage were under similar purifying selection pressure.

Table 2 Statistical test for ω ratios of PGRS on the corresponding chromosomes between P. trichocarpa and S. suchowensis
Table 3 Statistical test for ω ratios of PGRS among chromosomes within the genome of P. trichocarpa
Table 4 Statistical test for ω ratios of PGRS among chromosomes within the genome of S. suchowensis

Examination of the selection pressure on genes in the sex-determining region (SDR) will provide unique insight into the evolution of sex chromosomes. Previous studies have shown that the gender locus in Populus maps to the peritelomeric region upstream of SSR marker O_206 on chromosome XIX (refs. 22, 33). The gender locus in Salix lies between SSR markers SSR151 and SSR893 on chromosome XV (ref. 23). In this study, ω ratios were calculated for the PGRS in the SDRs of each lineage. The median ω ratios were 0.233 and 0.223 in the SDRs of the P. trichocarpa and S. suchowensis genomes, respectively, each being the lowest value in the corresponding column (Table 1). Thus, stronger purifying selection was observed to act on the PGRS in the SDRs of both lineages, and PGRS in these regions are expected to be lost faster. Consistent with this expectation, much lower PGRS density was observed in the SDRs of both the P. trichocarpa (11.9/Mb) and S. suchowensis (14.9/Mb) genomes (Table 1). It is well known that gene losses accompany the evolution of sex chromosomes. The dramatic decrease in PGRS density in the SDRs indicates faster divergence of SDRs in the two lineages.

Sliding window analysis

To demonstrate the variation of selection pressure along each chromosome, we conducted sliding window analysis for poplar (Fig. 2a) and willow (Fig. 2b) separately. The figures show that purifying selection dominated throughout each chromosome. Genomic regions under significantly relaxed purifying selection were observed on many of the chromosomes (peak positions). Examination of the sliding windows detected significantly elevated ω ratios in 13 regions on 12 of the chromosomes in poplar and in six regions on six of the chromosomes in willow. A detailed examination of ω ratios in the P. trichocarpa genome revealed 25 PGRS under extremely strong purifying selection (ω = 0) and 52 PGRS under positive selection (ω > 1) (Supplementary Table S3), accounting for 0.28% and 0.58% of the total, respectively. In the willow genome, 8 PGRS were under extremely strong purifying selection (ω = 0) and 3 were under positive selection (ω > 1) (Supplementary Table S3), accounting for 0.16% and 0.06% of the total, respectively. It is noteworthy that, in both lineages, PGRS under extremely strong purifying selection are mainly housekeeping genes encoding histone, ubiquitin and ribosomal proteins. In contrast, GO enrichment showed that PGRS under positive selection in poplar were significantly enriched in genes annotated with “metabolic process” terms associated with a diverse spectrum of biological functions (Supplementary Fig. S3), especially genes involved in tolerance to biotic or abiotic stress, such as the bifunctional inhibitor, the BTB/POZ domain-containing protein, and the AWPM-19-like family protein. In the willow genome, the three PGRS under positive selection were annotated as genes of unknown function; thus, it remains unclear which biological processes they might be involved in.

Fig. 2: Sliding window analysis of ω ratios varying along each chromosome in P. trichocarpa and S. suchowensis.

a Variation of ω ratios along chromosomes in P. trichocarpa. b Variation of ω ratios along chromosomes in S. suchowensis. Note: blue stars indicate the regions where ω ratios varied significantly in the corresponding regions between P. trichocarpa and S. suchowensis; red lines represent the genomic regions where fission and fusion occurred on chromosome I and chromosome XVI; yellow regions represent the SDRs.

Discussion

Poplar and willow are dioecious woody plants that generally occur as diploids with a basic haploid chromosome number of n = 19. It has been confirmed that Populus and Salix share the lineage-specific salicoid duplication, and nearly every chromosomal segment has a paralogous segment elsewhere in their genomes14,15,26. Collinearity analysis of genetic maps and genome sequences for multiple Salicaceae species showed that poplar and willow share the same large-scale genomic history16,21,34,35. Analysis of orthologous groups suggested that the divergence of these two lineages occurred approximately 6 million years after salicoid duplication15. However, it remains unknown whether, after salicoid duplication, fission and fusion of the ancestral chromosomes first gave rise to the crown of Populus or that of Salix. Previous studies revealed that chromosome I of poplar or willow is a conjunction of chromosome XVI and the distal end of chromosome I of the alternate lineage, and that the proximal end of chromosome I in poplar or willow corresponds to chromosome XVI of the alternate lineage. These major interchromosomal rearrangements distinguish the karyotypes of poplars and willows16,21. The changes accumulated during speciation may be of special relevance in understanding the basis of their differences. In this study, significantly elevated ω ratios were observed in different regions on many of the chromosomes. An elevated ω ratio indicates relaxed purifying selection; thus, the corresponding genomic regions are expected to diverge faster. In the poplar genome, significantly elevated ω ratios were observed on chromosomes I and XVI, where chromosomal fission and fusion occurred. In contrast, no peaks appeared in the corresponding regions in the willow genome. Whether this coincidence is related to the additional round of chromosome rearrangements is an interesting question. However, with the current data, we cannot determine whether the observed genomic regions with significantly elevated ω ratios are stable signatures, and this needs to be explored in more species.

In this study, we also found that the PGRS in willow were subject to stronger purifying selection than those in poplar, which would result in a faster loss of PGRS in willow. Deleterious mutations are much more likely to occur than beneficial ones. Thus, the paralogous copies of a gene may often accumulate degenerative mutations at an accelerated rate following a duplication event, and purifying selection can result in stabilizing selection through the purging of the deleterious variants that arise36. We speculate that the mechanism underlying the faster loss of PGRS in willow might relate to the additional round of genome reorganization after salicoid duplication, since chromosomal rearrangements might exert epistatic effects on other chromosome regions and affect genome stability. During genome stabilization, the genome of the nascent lineage would reshuffle more extensively. During this process, advantageous genes might be lost through the hijacking effect, which would create a driving force for more active gene duplication by other means in willow. Indeed, significantly more active gene duplications associated with transposons and tandem duplication were detected in willow than in poplar15. The joint driving force would cause willow to evolve faster, leading willows to be more diverse. According to the taxonomy of the Salicaceae family, the genus Populus comprises approximately 29 species37, while the genus Salix comprises 300–500 species38,39. Moreover, Salix shows considerable variation in size, growth form and crown architecture, ranging from large trees to sub-trees to dwarf shrubs, while Populus species generally appear as large trees. Taking these findings together, we propose that Populus is evolutionarily more primitive than Salix, which supports the empirical presumption of previous studies13,39,40.

It was found that different autosomes evolved into sex chromosomes in the sister genera Populus and Salix21,23. In Populus, the gender locus was consistently mapped to chromosome XIX (refs. 22, 33, 41, 42, 43), and multiple lines of evidence suggest that chromosome XIX has been evolving into an incipient sex chromosome44,45, while chromosome XIX is an autosome in willow. In Salix, the primitive sex chromosome is chromosome XV (refs. 21, 23), while the corresponding chromosome is an autosome in poplar. Examination of the SDRs revealed stronger purifying selection and faster loss of PGRS in both poplar and willow. At the chromosome level, chromosome XIX is characterized by the lowest gene and PGRS densities in both lineages, but this characteristic is not observed on chromosome XV. It is a common scenario that sex chromosomes contain fewer genes than autosomes in dioecious organisms46. We propose that Populus might have inherited the ancestral sex chromosome from the progenitor of the Salicaceae family, and that the sex chromosome in willow is evolutionarily younger. Evidence in this study suggests that the sex chromosome in willow is still at a very early evolutionary stage, because dramatic loss of PGRS was observed only in its SDR and not at the chromosomal level on chromosome XV.

It has been reported that gene retention is biased after WGD47,48. In Brassica species, asymmetrical gene retention was proposed to contribute to extreme morphological diversity49,50. It is commonly accepted that the rate of molecular evolution differs greatly from gene to gene depending on the degree of constraint on gene products51. In this study, we detected some genes under extremely strong purifying selection or under positive selection in both lineages. As expected under the neutral theory of molecular evolution51, these genes account for only a very small portion of the total. Genes under extremely strong purifying selection (ω = 0) are mainly housekeeping genes that are subject to very stringent selective constraints, and every nonsynonymous mutation in them is presumed to be deleterious. By contrast, PGRS under positive selection (ω > 1) are assumed to aid adaptation and fitness, and they are found to be mainly involved in transcriptional regulation and resistance to biotic or abiotic stresses. These genes tend to diverge faster, improving the fitness of the population. Thus, from a biological perspective, the genes under unusual selection pressure detected in this study merit further functional exploration through experiments.

Materials and methods

Genome sequence data

Whole-genome CDS sequences, protein sequences and gene positions along each chromosome in the genome of P. trichocarpa were downloaded from the website of the Joint Genome Institute (JGI), United States Department of Energy (https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=Ptrichocarpa). The corresponding information for S. suchowensis was retrieved from the willow genome database (http://115.29.234.170/node/5). If a gene had more than one transcript, only the first transcript in the annotation was extracted.

Detection of PGRS in P. trichocarpa and S. suchowensis

The whole-genome protein sequences from P. trichocarpa and S. suchowensis were compared against themselves using BLASTP to search for their paralogs52. For each protein sequence, the best five non-self hits in each genome were reported with an E-value threshold of 10⁻¹⁰. Whole-genome duplication, tandem gene duplication, and segmental duplication all generate paralogous genes3. To detect the paralogous genes specifically generated by the salicoid duplication event, we first identified the syntenic segments containing at least five homologous genes that are collinear in a row, following the description in Tang et al.’s paper53. In detail, whole-genome BLASTP results were sorted according to gene positions in the poplar and willow genomes. Then, the sorted paralogs were used to compute collinear blocks for all chromosomes to detect the WGD paralogous genes using MCScanX54. For each pair of paralogous duplicates, the protein sequences were aligned using MUSCLE55, and the protein alignment was converted to a DNA alignment using PAL2NAL according to the CDS sequences56. Subsequently, we identified the paralogous genes associated with salicoid duplication by calculating the average 4DTV for each syntenic segment across all the paralogous genes it contained; 4DTV values for the paralogous genes having ≥10 four-fold degenerate sites were calculated using in-house Perl scripts following Rodgers-Melnick et al.’s study32. The 4DTV range associated with salicoid duplication was determined by plotting the 4DTV values. Ka and Ks values for the paralogous genes were calculated using the Nei-Gojobori algorithm implemented in KaKs_Calculator 2.0 (ref. 57). PGRS were finally determined based on the plotting of the Ks values in each lineage. The 4DTV and Ks ranges were set to cover 99% of the values for the peak associated with salicoid duplication.
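As an illustration of the 4DTV calculation, the following Python sketch counts transversions at four-fold degenerate third codon positions in a codon alignment of two paralogs. It is not the in-house Perl script used in this study: the function name `four_dtv` is hypothetical, the standard genetic code is assumed, only codons sharing the same first two bases are counted, and the value returned is the uncorrected proportion (whether a multiple-substitution correction was applied in the original analysis is not stated here).

```python
# First two bases of the eight four-fold degenerate codon families
# in the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def four_dtv(cds_a, cds_b, min_sites=10):
    """Uncorrected 4DTV for two codon-aligned CDS strings of equal length.

    Returns None when fewer than `min_sites` four-fold degenerate sites
    are shared, mirroring the >=10 site requirement described in the text.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("aligned sequences must have equal length")
    sites = 0
    transversions = 0
    for i in range(0, len(cds_a) - 2, 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        # Both codons must belong to the same four-fold degenerate family.
        if codon_a[:2] != codon_b[:2] or codon_a[:2] not in FOURFOLD_PREFIXES:
            continue
        x, y = codon_a[2], codon_b[2]
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (x in PURINES) != (y in PURINES):
            transversions += 1
    if sites < min_sites:
        return None
    return transversions / sites
```

Averaging the non-None values over all paralogous pairs on a syntenic segment would then give the per-segment 4DTV used for plotting.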

Significance test for ω ratios and sliding window analysis

A significance test for ω ratios was performed with the Mann-Whitney statistic to detect whether significantly different selection pressure acts on specific genomic regions within or between poplar and willow, with a significance level of P ≤ 0.05. The Mann-Whitney test is particularly useful for determining whether two groups of samples with unequal sizes differ significantly, based on a series of ranking scores58. This test was conducted with Minitab software (Minitab Inc., PA, USA), and ω ratios were transformed into ranking scores by the software prior to the test. To compare the detailed patterns of selection pressure acting on different chromosomes between the two lineages, we applied a sliding window along each chromosome. Because willow has a smaller gene content than poplar, the sliding window was designed to contain 30 and 25 genes in poplar and willow, respectively. The default sliding step was 15 genes.
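The windowing and rank-based comparison described above can be sketched as follows. This is a minimal illustration rather than the Minitab procedure used in this study. The function names are hypothetical, summarizing each window by its mean ω is an assumption, and the P value uses the normal approximation with a tie correction and no continuity correction, which may differ slightly from the software output.

```python
from math import erfc, sqrt
from statistics import mean


def sliding_windows(omegas, window=30, step=15):
    """Yield (start_index, mean omega) for gene-count windows on a chromosome.

    `omegas` is the list of omega ratios ordered by gene position.
    The last partial stretch shorter than `window` is not reported,
    except when the chromosome itself has fewer genes than one window.
    """
    if not omegas:
        return
    for start in range(0, max(len(omegas) - window + 1, 1), step):
        yield start, mean(omegas[start:start + window])


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney test; returns (U for sample x, approximate P)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        midrank = (i + 1 + j) / 2      # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / sqrt(var)
    return u1, erfc(abs(z) / sqrt(2))  # two-sided normal-approximation P
```

With these helpers, the ω ratios of the genes in a poplar window (30 genes) and in the corresponding willow window (25 genes) could be passed to `mann_whitney_u` and the result compared against the P ≤ 0.05 threshold.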

For the ω indicator, ω > 1 indicates positive selection, ω close to 1 indicates neutral evolution, and ω < 1 indicates purifying (negative) selection29. We detected segmental duplicates under unusual selection pressure in poplar and willow, with ω > 1 or ω = 0, and annotated these genes by referring to the annotation files in the JGI and willow genome databases, respectively. GO-based functional enrichment analysis was performed for the genes under positive selection using Blast2GO (https://www.blast2go.com/).
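The classification rule above can be written as a small helper. This is a sketch only: the function name `classify_omega` and the label strings are hypothetical, and treating pairs with Ks = 0 as undefined is an assumption of the sketch.

```python
def classify_omega(ka, ks):
    """Return (omega, label) for one paralogous pair from its Ka and Ks."""
    if ks <= 0:
        return None, "undefined"       # omega cannot be computed when Ks is 0
    omega = ka / ks
    if omega == 0:
        label = "extreme purifying"    # no nonsynonymous substitutions
    elif omega > 1:
        label = "positive"
    elif omega < 1:
        label = "purifying"
    else:
        label = "neutral"
    return omega, label
```

Counting the labels over all pairs in a genome gives the proportions of pairs under positive and extremely strong purifying selection.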