Uneven selection pressure accelerating divergence of Populus and Salix

Populus (poplars) and Salix (willows) are sister genera in the Salicaceae family that arise from a common tetraploid ancestor. The karyotypes of these two lineages are distinguished by two major interchromosomal and some minor intrachromosomal rearrangements, but which one is evolutionarily more primitive remains debatable. In this study, we compare the selection pressure acting on the paralogous genes resulting from salicoid duplication (PGRS) within and between the genomes of the two lineages. Purifying selection was determined to act more strongly on the PGRS in willow than on those in poplar, which would cause a faster loss of paralogous duplicates in willow. Therefore, Salix species are supposed to evolve faster than Populus species, which is consistent with the observation that the former are taxonomically and morphologically more diverse than the latter. In these two lineages, different autosomes were found to have been evolving into sex chromosomes. Examining the ω ratio and the PGRS in the sex determination regions in willow and poplar revealed higher convergent selection pressure and a faster loss of PGRS in the sex determination regions of both lineages. At the chromosome level, the sex chromosome in poplar is characterized by the lowest gene density among all chromosome members, while this feature is not observed on the sex chromosome in willow, suggesting that Populus species may inherit the more incipient sex chromosome from their progenitor. Taken together, Salix is supposed to be the nascent lineage arising from the additional round of genome reorganization that distinguishes the karyotypes of the two sister genera. In this study, assessment of ω ratios also detected a list of paralogous genes under unusual selection pressure, which could have special consequences for the adaptive evolution of Salicaceae species. In conclusion, the results of this study provide unique information for better understanding the genetic mechanism accelerating the divergence of these two closely related lineages.


Introduction
Whole genome duplication (WGD) is an important driving force in the evolutionary process of flowering plants 1,2 . Analyses of genomic data suggest that the extant angiosperm crown arises from a common paleohexaploid progenitor 3,4 . In addition to paleohexapolyploidization, one or more rounds of WGD recurred in the genomes of many angiosperms 5 , which simultaneously created thousands of paralogous gene pairs in the affected lineages. Population genetics predict that an entirely redundant duplicate copy cannot be maintained in the genome for a long time 6 . Following the ancient WGDs, some paralogs will be silenced and eventually eliminated, while many of the retained paralogs may be subject to changes in DNA sequence or gene expression, leading to sub or neofunctionalization [7][8][9] . Wholesale gene loss after WGDs can drastically shrink genome size and gene content, which has long been viewed as a critical driving force in the evolution of higher plants [10][11][12] .
Populus (poplar) and Salix (willow) are sister genera in the Salicaceae family. The two lineages are important fiber resources with rapid growth rates, ease of vegetative propagation, predisposition to hybridization, and high productivity of wood 13 . These characteristics, together with their relatively small genomes and the rapidly growing research resources, have led to Salicaceae species emerging as the model system for different aspects of genetic studies of woody plants. In the past decade, the whole genomes of poplar and willow have been sequenced and become publicly available 14,15 . Genomic analysis revealed that in addition to the paleohexaploidization shared in angiosperms, a lineage-specific WGD (known as salicoid duplication) recurred in the genome of the progenitor of these two lineages ca. 58~65 million years ago 14,15 . Genome comparison revealed that poplars and willows originate from a common paleotetraploid ancestor 16 . However, cytogenetic studies show that the extant poplars and willows mainly exist in diploid form 17 , suggesting that genome diploidization recurred after salicoid duplication. This process may accompany substantial genome reshuffling and gene losses 14,[18][19][20] . Genome-wide comparison of sequences among different chromosome members suggested that after salicoid duplication, a series of reciprocal tandem terminal fusions of the duplicated chromosomes gave rise to the diploid progenitor of the modern taxa of these two lineages 14 . Approximately six million years later, two major interchromosomal rearrangements and several minor intrachromosomal rearrangements occurred subsequently 16,21 , which distinguished the karyotypes of willow and poplar. Previous studies have also revealed that primitive sex chromosomes evolve in poplar and willow, which are associated with different autosomes in the two lineages [21][22][23] . In addition to the divergences in genome structure and sex chromosome evolution, multiple lines of evidence, such as the 2 C DNA contents 24,25 , the k-mer estimation and the published genomes 14,15,[26][27][28] , indicate that willows have smaller genomes and gene contents than poplars. Considering that these two lineages share a common ancestor, willows should lose DNA and genes at a faster rate than poplar after their divergence. However, the genetic mechanism triggering this scenario remains largely unknown.
The nonsynonymous (Ka) to synonymous (Ks) substitution rate ratio (ω = Ka/Ks) can be used as an estimator for selective pressure on DNA sequence evolution 29 . Using this analytical tool, Clark et al. detected an informative set of genes with significantly different patterns of substitution in humans different than that in chimpanzees and mouse among a total of 7,645 human-chimp-mouse orthologous genes 30 . With paralogous genes, this analytical tool is useful to navigate an evolutionary trajectory from an initial state of complete redundancy. By comparison of the selection pressure acting on paralogous genes with data from 26 bacterial, six archaeal, and seven eukaryotic genomes, Kondrashov et al. indicated that paralogs produced through duplication were subject to purifying selection, which would lead to losses of redundant genes 31 . In this paper, we assess the selection pressure on paralogous pairs resulting from salicoid duplication throughout the genomes of Populus trichocarpa and Salix suchowensis. We aim to detect whether there is uneven selection pressure accelerating the divergent evolution of these two sister genera.

PGRS in poplar and willow
A total of 39,514 and 24,931 coding sequences contained in 19 chromosomal reconstructions were extracted from the genomic database of P. trichocarpa and S. suchowensis, respectively, and these genes were used to detect the PGRS in poplar and willow. Plotting the average 4DTV (four-fold degenerate site transversion) values for the paralogous genes contained on each syntenic segment revealed two peaks both in poplar and willow (Fig. 1a, b), and the highest peak was recognized to result from salicoid duplication according to Tuskan et al.'s study 14 . The highest peaks covered 4DTVs in the range of 0.0-0.2 in both lineages, which was consistent with the results in previous reports 14,32 Table S1). In total, 8,991 and 5,161 PGRS were detected on the reconstructed chromosomes in P. trichocarpa and S. suchowensis, respectively ( Table 1). The synteny for the detected PGRS among the poplar and willow chromosomes was separately shown in Supplementary Fig. S1 and Supplementary Fig. S2, respectively.
Gene coverages for the 19 chromosomal reconstructions were 98.0% and 93.7% of the total poplar and willow genomes, respectively 14,16 . Although most genes were assembled on chromosomes of poplar and willow, the reconstructed chromosomes are incomplete, and the integrity may vary among different chromosomes. Thus, the absolute numbers of genes are not comparable among different chromosomes. In contrast, gene density is a comparable parameter under such circumstances. As shown in Table 1 Correlation analysis shows that gene and PGRS densities on the corresponding chromosomes are highly correlated between poplar and willow, with a correlation coefficient equal to 0.79 (P = 0.000) and 0.89 (P = 0.000), respectively. The high correlation coefficients imply that loss of PGRS scales similarly across different chromosomes in the two lineages. On the reconstructed chromosomes, poplar is found to retain more PGRS than willow (8991 vs. 5161), indicating that willow has lost PGRS at a faster rate than poplar after their divergence.
Compare the ω ratios for PGRS within and between the two lineages The Ka, Ks, and ω ratios were calculated for each PGRS on the 19 chromosomes in poplar and willow separately. At the genome-wide level, the average ω ratio for P. trichocarpa and S. suchowensis was 0.309 and 0.274, respectively ( Table 1). The former was significantly higher than that for the latter (P < 0.001) ( Table 2), indicating PGRS in willow were subject to stronger purifying selection than those in poplar, which would result in a faster loss of redundant genes in willow than in poplar. At chromosome level, the average ω ratios for PGRS on different chromosomes ranged from 0.301 to 0.318 in poplar (Table 1), and it ranged from 0.261 to 0.282 in willow ( Table 1). The ω ratios were all substantially smaller than 1, indicating PGRS were generally under strong purifying selection in both lineages. The statistical test for ω ratios of PGRS between willow and poplar indicated that this parameter varied significantly among 18 of the corresponding chromosomes except for chromosome XVIII ( Table 2), suggesting that PGRS on most of the chromosomes in willow were under significantly stronger purifying selection than those on the corresponding chromosomes in poplar. In contrast, no significant difference was observed with ω ratios for most of the pairwise comparisons within the genome of poplar (except for XIX vs. IV, XIX vs. VI, and XIX vs. IX) and willow (except for IV vs. V) ( Table 3; Table 4), indicating that Examination of the selection pressure on genes in the sexdetermining region will provide unique insight into the evolution of sex chromosomes. Previous studies have shown that the gender locus in Populus was mapped to the peritelomeric region upstream to the position of SSR marker O_206 in chromosome XIX 22,33 . The gender locus in Salix was between SSR markers SSR151 and SSR893 on chromosome XV 23 . In this study, ω ratios were calculated for the PGRS in SDRs for each lineage. The median ω ratios were 0.233 and 0.223 in the SDR of the P. trichocarpa and S. suchowensis genomes, respectively, which is the lowest value in the corresponding column (Table 1). Thus, higher convergent selection pressure has been observed to act on the PGRS in SDRs, and PGRS in the corresponding regions are supposed to be lost faster. Interestingly, much lower PGRS density was observed in the SDRs in both the P. trichocarpa (11.9/Mb) and S. suchowensis (14.9/Mb) genomes (Table 1). It is well known that gene losses occur with the evolution of sex chromosomes. A dramatic decrease in PGRS density in SDR regions indicates the faster divergence of SDRs in the two lineages.

Sliding window analysis
To demonstrate the variation of selection pressure along each chromosome, we conducted sliding window analysis for poplar (Fig. 2a) and willow (Fig. 2b) separately. The figures show that extensive purifying selection dominated throughout each chromosome. Genome regions under significant relaxed purifying selection were observed on many of the chromosomes (peak positions). Examination of the sliding windows detected significantly elevated ω ratios in 13 regions on 12 of the chromosomes in poplar and in six regions on six of the chromosomes in willow. A detailed examination of ω ratios revealed 25 PGRS that were subjected to extremely strong purifying selection (ω = 0) and 52 PGRS were under positive selection (ω > 1) (Supplementary Table S3) in the P. trichocarpa genome, accounting for 0.28% and 0.58% of the total, respectively. In the willow genome, the PGRS under unusually strong purifying (ω = 0) and positive selection (ω > 1) were 8 and 3, (Supplementary Table S3), accounting for 0.16% and 0.06% of the total, respectively. It is noteworthy that in both lineages, PGRS under extremely strong purifying selection are mainly housekeeping genes coding histone, ubiquitin and ribosomal proteins. In contrast, GO enrichment showed that PGRS under positive selection were significantly enriched in genes found with the "metabolic process" terms associated with a diverse spectrum of biological functions ( Supplementary Fig. S3), especially genes involving tolerance to biotic or abiotic stress, such as the bifunctional inhibitor, BTB/POZ domain-containing protein, and the AWPM-19-like family protein, etc. In the willow genome, the three PGRS under positive selection were annotated as genes with unknown function; thus, it remained unclear which biological processes they might be involved with.

Discussion
Poplar and willow are dioecious woody plants that generally appear as diploids with a basic haploid chromosome number of n = 19. It has been confirmed that Populus and Salix share lineage-specific salicoid duplication, and nearly every chromosomal segment is found to have a paralogous segment elsewhere in their genomes 14,15,26 . Collinearity analysis of genetic maps and genome sequences for multiple Salicaceae species showed that poplar and willow shared the same large-scale genomic history 16,21,34,35 . Analyzing the orthologous groups suggested that the divergence of these two lineages occurred approximately 6 million years later after salicoid duplication 15 . However, it remains unknown whether,   after salicoid duplication, fission and fusion of the ancestral chromosomes first gave rise to the crown of Populus or that of Salix. Previous studies revealed that chromosome I of poplar or willow was a conjunction of chromosome XVI and the distal end of chromosome I of the alternate lineage, and the proximal end of chromosome I in poplar or willow corresponded to chromosome XVI of the alternate lineage. These major interchromosomal rearrangements distinguish the karyotypes of poplars and willows 16,21 . The changes accumulated during speciation may be of special relevance in understanding the basis of their differences. In this study, significantly elevated ω ratios were observed in different regions on many of the chromosomes. The elevated ω ratio means relaxed purifying selection; thus, the corresponding genomic regions are supposed to diverge faster. In the poplar genome, significantly elevated ω ratios were observed on chromosome I and XVI, where chromosomal fission and fusion occurred. In contrast, no peaks appeared in the corresponding regions in the willow genome. Whether the observed coincidences are relevant to the additional round of chromosome rearrangements is an interesting question. However, with the current data, we cannot determine whether the observed genome regions with significantly elevated ω ratios are stable signatures and this needs to be explored in more species.
In this study, we also found that the PGRS in willow were subject to stronger purifying selection than those in poplar, which would result in a faster loss of PGRS in willow. Deleterious mutations are much more likely to occur than beneficial ones. Thus, the paralogous copies of a gene may often accumulate degenerative mutations at an accelerated rate following a duplication event, and purifying selection can result in stabilizing selection through the purging of deleterious variations that arise 36 . We speculate that the mechanism underlying the faster loss of PGRS in willow might relate to the additional round of genome reorganization after salicoid duplication, since chromosomal rearrangements might bring up epistatic effects on other chromosome regions and affect genome stability. During genome stabilization, the genome of the nascent lineage would reshuffle more extensively. During this process, good genes might be dragged off due to the hijacking effect, which would cause a driven force for more active gene duplication through other manners in willow. Indeed, significantly more active gene duplications associated with transposon and tandem duplication were detected in willow than in poplar 15 . The joint driving force would cause willow to evolve faster, leading willows to be more diverse. According to the taxonomy of the Salicaceae family, the genus Populus comprises 29 species or so 37 , while the genus Salix represents over 300-500 species 38,39 . Moreover, Salix shows considerable variation in size, growth form and crown architecture, ranging from large trees to sub-trees to dwarf shrubs, while Populus generally appear as large trees. Taking these findings together, we propose that Populus should be evolutionarily more primitive than Salix, which supports the empirical presumption in previous studies 13,39,40 . It was found that different autosomes evolved into sex chromosomes in the sister genera of Populus and Salix 21,23 . In Populus, the gender locus was consistently mapped on chromosome XIX 22,33,[41][42][43] , and multiple lines of evidence suggest that chromosome XIX has been evolving into an incipient sex chromosome 44,45 , while chromosome XIX is an autosome in willow. In Salix, the primitive sex chromosome is chromosome XV 21,23 , while the corresponding chromosome is an autosome in poplar. Examination of the SDRs revealed stronger purification selection and faster loss of PGRS both in poplar and willow. At the chromosome level, chromosome XIX is characterized by the lowest gene and PGRS densities in both lineages, but this characteristic is not observed on chromosome XV. It is a common scenario that sex chromosomes contain less genes than autosomes in dioecious organisms 46 . We proposed that Populus might inherit the ancestral sex chromosome from the progenitor of the Salicaceae family, and the sex chromosome in willow should be evolutionarily younger. Evidence in this study showed that the sex chromosome in willow was still at the very early evolutionary stage because dramatic loss of PGRS was observed only in its SDR but not at the chromosomal level on chromosome XV.
It has been reported that retaining genes should be biased after WGD 47,48 . In Brassica species, asymmetrical gene retention was proposed to contribute to extreme morphological diversity 49,50 . It is commonly accepted that the rate of molecular evolution differs greatly from gene to gene depending on the degree of constraint of gene products 51 . In this study, we detected some genes under extremely strong purifying selection or under positive selection in both lineages. As expected by the neutral theory of molecular evolution 51 , these genes should account for only a very small portion of the total. Genes under extremely strong purifying selection (ω = 0) are mainly housekeeping genes, are subject to very stringently selective constraints, and every nonsynonymous mutation in them is supposed to be deleterious. By contrast, PGRS under positive selection (ω > 1) are assumed to aid adaption and fitness, and they are found to be mainly involved in transcriptional regulation and resistance to biotic or abiotic stresses. These genes tend to diverge faster to gain better fitness for the population. Thus, from a biological perspective, genes under unusual selection pressure detected in this study are worthy of further functional exploration through experiments.

Genome sequence data
Whole-genome CDS sequences, protein sequences and gene positions along each chromosome in the genome of P. trichocarpa were extracted from the Joint Genome Institute, United States Department of Energy website (JGI) (https://genome.jgi.doe.gov/portal/pages/ dynamicOrganismDownload.jsf?organism = Ptrichocarpa). The corresponding information for S. suchowensis was retrieved from willow genome databases (http:// 115.29.234.170/node/5). If a gene had more than one transcript, only the first transcript in the annotation was extracted.

Detection of PGRS in P. trichocarpa and S. suchowensis
The whole-genome protein sequences from P. trichocarpa and S. suchowensis were compared against themselves using BLASTP to search for their paralogs 52 . For a protein sequence, the best five non-self hits in each genome were reported with an E-value threshold of 10 −10 . Whole-genome duplication, tandem gene duplication, and segmental duplication all generate paralogous genes 3 . To detect the paralogous genes specifically generated by the salicoid duplication event, we first identified the syntenic segments containing at least five homologous genes that are collinear in a row, following the description in Tang et al.'s paper 53 . In detail, whole-genome BLASTP results were sorted according to gene positions in poplar and willow genomes. Then, the sorted paralogs were used to compute collinear blocks for all chromosomes to detect the WGD paralogous genes using MCScanX 54 . For each pair of paralogous duplicates, their protein sequences were aligned using MUSCLE 55 , and the protein alignment was converted to DNA alignment using PAL2NAL according to their CDS sequences 56 . Subsequently, we confined the paralogous genes associated with salicoid duplication by calculating the average 4DTV for each syntenic segment with all the contained paralogous genes, and 4DTV values for the paralogous genes having ≥10 four-fold degenerate sites were calculated using in-house Perl scripts following Rodgers-Melnick et al.'s study 32 . The 4DTV range associated with salicoid duplication was determined by plotting the 4DTV values. Ka, Ks values for the paralogous genes were calculated using the Nei-Gojobori algorithm implemented in the KaKs_Calculator 2.0 57 . PGRS were finally determined based on the plotting of the Ks values in each lineage. The 4DTV and Ks ranges were set to cover 99% of the variables for the peak associated with salicoid duplication.
Significance test for ω ratios and sliding window analysis A significance test for ω ratios was performed with Mann-Whitney statistics to detect whether there is significantly different selection pressure acting on specific genomic regions within or between poplar and willow, with a significance level of P ≤ 0.05. The Mann-Whitney test is particularly useful for determining whether there is a significant difference for two groups of samples with unequal sizes based upon a series of ranking scores 58 . This test was conducted with Minitab software (Minitab Inc., PA, USA), and ω ratios were transformed into ranking scores by the software prior to the test. To compare the detailed patterns of selection pressure acting on different chromosomes between the two lineages, we open a sliding window along each chromosome. Because willow had a smaller gene content than poplar, a sliding window was designed to contain 30 and 25 genes in poplar and willow, respectively. The default sliding size was 15-gene lengths.
For the ω indicator, ω > 1 indicates positive selection, ω close to 1 indicates neutral mutation, and ω < 1 indicates purifying (negative) selection 29 . We detected segmental duplicates under unusual selection pressure in poplar and willow, with ω > 1 and ω = 0, and annotated these genes by referring annotation files in the JGI and willow genome databases, respectively. GO-based functional enrichment analysis was performed for the genes under positive selection using Blast2GO (https://www.blast2go.com/).