Introduction

Hybridisation and gene flow between closely related species are common, and the evolution of reproductive barriers is the crucial step in the speciation process. In animals, sex chromosomes are known to play a disproportionately large role in isolation between incipient species (e.g.,1,2,3). Interspecific hybridisation often leads to asymmetric outcomes, with hybrid inviability and sterility usually occurring in the heterogametic sex, an observation known as Haldane’s rule (HR), which indicates that sex chromosomes play a major role in speciation4,5. The X-chromosome was also proposed to have a disproportionately large role in the dysfunction of hybrids compared with the autosomes, known as the large X effect (LXE)6,7,8. The LXE and HR are often referred to as the “two rules of speciation”.

The LXE and HR are thought to be caused by recessive species incompatibilities exposed in the phenotype due to the hemizygosity of X-linked genes in the heterogametic sex7,9. Thus, the reports of HR and the LXE in species with recently evolved non- or partially degenerate Y-chromosomes10, such as Silene latifolia and its relatives discussed below11,12, were surprising and cast doubt on hemizygous X-linked genes being the universal major cause of HR and the LXE. Other possible causes of HR and the LXE include meiotic drive on sex chromosomes13,14,15, misregulation of the X-chromosome in hybrids1,3, faster evolution of X-linked genes (faster-X theory; e.g.,16,17,18), a higher density of male sterility loci on X chromosomes than on autosomes1, and faster evolution of spermatogenesis-related genes together with stronger sexual selection on males than on females (faster males theory;19).

Here we analyse another possible cause of the LXE: the presence of a massive block of rarely- or non-recombining DNA on the X-chromosome, as recently reported for S. latifolia20. Extensive pericentromeric recombination suppression (PRS) on the very large (~ 400 Mb) S. latifolia X-chromosome appears to be an extreme case of a general tendency for long chromosomes to have large central regions with rare recombination. The reasons for this tendency are not clear, although possible explanations are discussed in the literature21,22. Regions of low recombination often show high genetic differentiation between species because stronger and wider linkage disequilibrium (LD) in such regions links loci involved in interspecific incompatibility (barrier loci) to a larger portion of the genome, which suppresses introgression in such regions (e.g.,23,24,25,26,27,28,29). Non-recombining regions could contribute to the maintenance of species integrity despite on-going interspecific hybridisation, as noted in many theoretical and empirical studies (e.g.,23,30,31,32,33,34). Some authors have suggested that suppression of introgression in low-recombining regions is key to maintaining divergence between hybridising species (e.g.,35,36).

The focal species of this study, S. latifolia and S. dioica, are commonly found across Europe. Habitat differentiation plays a crucial role in reproductive isolation between the two species37: S. latifolia inhabits open fields and road margins, while S. dioica is more common in shady and moist habitats. They also differ in a number of phenotypic traits, including but not limited to flower colour, size and shape of sepals and seed capsules, and leaf shape38. Although the two species form viable and fertile hybrids where they co-occur39,40, some fitness reduction in hybrids (such as low pollen viability) has been detected11. S. latifolia and S. dioica have indistinguishable karyotypes, with the Y being the largest and the X the second largest chromosome in the genome41,42. The separate sexes and sex chromosomes are of relatively recent origin in this lineage: they evolved ~ 11 million years ago in the ancestor of S. latifolia and S. dioica, as estimated from synonymous divergence between the X- and Y-linked gametologs based on the mutation rate that was measured directly in S. latifolia43. Although some degeneration was reported on the Y chromosome43,44,45,46,47, most sex-linked genes are not hemizygous in males. This raises the question of how the ‘two rules of speciation’, reported for these species11,12, apply to species with such recently evolved sex chromosomes. One possibility is that rapid species-specific degeneration of Y-linked genes and associated adjustment of expression of X-linked gametologs (dosage compensation) may lead to rapid evolution of sex-linked species incompatibilities44. This model is particularly suitable for species with large, recently evolved sex chromosomes, such as those of S. latifolia and S. dioica, because the rate of Y-degeneration is proportional to the number of genes linked together in a non-recombining region48, so it should be fast for young sex chromosomes and slow down once only a few functional Y-linked genes are left, as inferred for mammalian Y chromosomes (Fig. 4 in49).

Recent sequencing of the S. latifolia genome and its integration with a high-density genetic map20 revealed substantial pericentromeric recombination suppression (PRS) on all chromosomes. PRS is particularly extensive on the X chromosome, where the rarely-recombining pericentromeric region (Xpr) comprises at least 330 Mb, which is ~ 90% of the X chromosome length and over 13% of the total genome length. Recombination rates are similar in male and female meiosis in S. latifolia50 and extensive PRS is unrelated to heterochiasmy, but PRS may have contributed to the evolution of recombination suppression between the nascent X- and Y-chromosomes in this species51. As explained above, rarely- or non-recombining regions represent a significant obstacle to interspecific gene flow. If most of the X chromosome in S. latifolia (and likely in S. dioica) is represented by a massive rarely-recombining block of chromatin impenetrable to interspecific gene flow, this may be the main reason for the LXE reported for these species12. Here we test this hypothesis to evaluate whether the presence of the massive rarely-recombining region in the S. latifolia X chromosome is sufficient to explain the LXE. Specifically, we compared patterns of polymorphism and gene expression divergence between rarely-recombining X-linked genes and other X-linked and autosomal genes. We also employed demographic modelling to characterise the extent of gene flow in different parts of the Silene genome.

Materials and methods

Transcriptome dataset

The analyses in this study are based on sequence data from 12 S. latifolia and 12 S. dioica females (Table 1) grown in the glasshouse (20 °C, 15-h lighting) from seeds collected in the wild. Actively growing shoots with flower buds were used for total RNA extraction with a Qiagen RNeasy Plant Mini Kit with on-column DNase digestion. Isolation of mRNA, cDNA synthesis and high-throughput sequencing were conducted according to the standard Illumina RNA-Seq procedure at the WTCHG genomics facility (Oxford, UK). The resulting sequence reads were mapped to a female reference transcriptome46 that was also used in the genetic mapping50,52. Reads were mapped with BWA mem 0.7.1753 and the alignments sorted with Samtools 1.754. SNPs were then called with Samtools mpileup (options: -d 1000 -q 20 -Q 20) and sites filtered with bcftools filter 1.7. The resulting multisample vcf file was converted to fasta alignments using ProSeq software55, available from https://sourceforge.net/projects/proseq/. ProSeq was also used for further processing and analysis of the resulting datasets. Gene expression was quantified as per-gene FPKM (fragments per kilobase per million reads), calculated with RSEM56.
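The FPKM unit mentioned above normalises a gene's fragment count by transcript length and by library size. A minimal sketch of that arithmetic is given below; RSEM derives its values from its own abundance estimates, so the function name and the toy numbers are illustrative assumptions and not the pipeline used in this study.

```python
def fpkm(fragment_count: float, transcript_length_bp: float,
         total_mapped_fragments: float) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragment_count / (kilobases * millions)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 20_000_000)
print(example_fpkm)
```

Dividing by both length and library size is what makes values comparable between genes of different length and between samples sequenced to different depth.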

Table 1 Silene samples used in this study. Newly submitted samples are part of the BioProject PRJNA1012686.

The following groups of genes were used in the analyses: rarely-recombining autosomal genes (rareA); rarely-recombining X-linked genes in the Xpr region (rareX); frequently-recombining autosomal genes (freqA); frequently-recombining X-linked genes in the qXdr region (freqX). These groups were defined according to the location of a gene in the rarely recombining central chromosome region or actively recombining ends of the chromosomes, based on the S. latifolia female genome sequence20. Pericentromeric recombination suppression is extensive on all S. latifolia chromosomes20,50 and genetic analysis detected no recombination in the central regions of the chromosomes, while recombination at the ends of the chromosomes was frequent50. As the transition between the frequently recombining ends of the chromosomes and rarely-recombining central regions is quite sharp20, we reasoned the split of the genes in the freqA, rareA, freqX and rareX categories is well justified.

Genomewide polymorphism statistics and comparisons between gene categories

Five polymorphism indices, namely nucleotide diversity (π)57, Tajima’s D58, FST57, Dxy57 and ZnS59, were measured using ProSeq55 for all sites, fourfold degenerate sites and the first two codon positions. The fourfold degenerate sites are considered the most neutral type of sites in the genome (e.g. Fig. 2 in reference60), while the first two codon positions are likely least neutral. All the above statistics were firstly plotted against genomic positions (using the R package ggplot2;61) to obtain a genome wide overview (fourfold degenerate π and Tajima’s D; all sites for FST, Dxy and ZnS). Then, their values were compared based on the following categories using the Kruskal–Wallis test and Wilcoxon rank-sum test: (1) between frequently-recombining and rarely-recombining groups of genes analysed separately within each species (for Tajima’s D and π; using fourfold degenerate sites, or first two codon positions; and for ZnS using all sites) or between the two species (for Dxy and FST; using all sites); (2) among frequently-recombining and rarely-recombining genes in autosomes and X chromosome (from the frequently-recombining qXdr region and rarely-recombining Xpr region), respectively. These statistics were also estimated for each chromosome using all sites, fourfold degenerate sites, and first two codon positions, respectively. To correct for ploidy difference in comparisons between the X-linked and the autosomal genes the estimates of π in autosomes were adjusted to 75% of the original values. Both adjusted and original values are reported here and whenever this correction is used, it is explicitly stated in the text.
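To make the diversity statistics and the ploidy correction concrete, the sketch below computes π (mean pairwise differences per site within a sample), Dxy (mean pairwise differences per site between samples) and the 0.75 rescaling of autosomal π. The factor 0.75 reflects that, with an equal sex ratio, a population carries three X chromosomes for every four copies of an autosome. The function names and toy alignments are hypothetical; the study itself used ProSeq, whose exact handling of gaps, missing data and diploid genotypes is not reproduced here.

```python
from itertools import combinations


def pairwise_diff(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs: list[str]) -> float:
    """pi: average per-site difference over all pairs within one sample."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def dxy(seqs1: list[str], seqs2: list[str]) -> float:
    """Dxy: average per-site difference over all between-sample pairs."""
    total = sum(pairwise_diff(a, b) for a in seqs1 for b in seqs2)
    return total / (len(seqs1) * len(seqs2))


def adjust_autosomal_pi(pi: float, factor: float = 0.75) -> float:
    """Rescale autosomal pi so it is comparable with X-linked pi."""
    return pi * factor


# Toy alignments (hypothetical, 10 sites each).
species1 = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
species2 = ["ACGTTCGTAC", "ACGTTCGTAC", "ACGTTCGAAC"]

print(nucleotide_diversity(species1))
print(dxy(species1, species2))
print(adjust_autosomal_pi(0.0324))
```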

Demographic modelling

To quantify and compare the extent of gene flow in rarely- and frequently-recombining genes in the two Silene species, we used five demographic models (from62) that utilise the Poisson random field-based demographic inference framework implemented in the dadi package63. These models are (Fig. 1): split_mig, a population split with bi-directional migration and constant population size; IM, a population split (isolation) with bi-directional migration equal in the two directions and population size change; IM_2M, IM with bi-directional heterogeneous migration that is allowed to differ between two classes of sites across the genome; IM2, IM with migration allowed to differ in the two directions; and IM2_2M, IM2 with heterogeneous migration for two classes of sites. The models with heterogeneous migration (IM_2M and IM2_2M) include two categories of genomic sites with different migration parameters. These models were chosen to test whether the two species had experienced significant population size change since divergence, whether gene flow differed between the two directions, and whether gene flow was heterogeneous (which would potentially indicate significant differences in gene flow between the autosomes and the X chromosome in each recombination category). Heterogeneous gene flow was tested using two sets of nested models: IM versus IM_2M, and IM2 versus IM2_2M. The fit of these models to the data was compared with likelihood ratio tests (LRT). All these models were run for frequently-recombining and rarely-recombining genes separately. Additionally, models IM2 and IM2_2M were run for the following groups of genes: rarely-recombining autosomal genes (rareA); rarely-recombining X-linked genes in the Xpr region (rareX); frequently-recombining autosomal genes (freqA); frequently-recombining X-linked genes in the qXdr region (freqX).
Ten initial runs were performed for each model with a wide parameter range (0–5 for time parameters, 0–10 for migration parameters, 0–100 for population size parameters). Based on estimated parameter values in these initial runs, parameter ranges were adjusted for a further 30 runs. The best-fitting model (the run with the highest estimated likelihood) was selected based on Akaike Information Criterion (AIC). Robustness of parameter estimates of the best-fitting models was evaluated with 100 bootstrap runs, with the confidence intervals calculated as M ± 1.96X (where M is the likelihood parameter estimate and X is the standard deviation of parameter estimates from the bootstrap runs).
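The model-comparison quantities named above can be sketched in a few lines: AIC from a log-likelihood and a parameter count, the likelihood ratio statistic for a pair of nested models, and the M ± 1.96X interval from bootstrap estimates. All log-likelihoods, parameter counts and bootstrap values below are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which form was used.

```python
import statistics


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood


def lrt_statistic(ll_simple: float, ll_complex: float) -> float:
    """Likelihood ratio statistic for a simple model nested in a complex one."""
    return 2 * (ll_complex - ll_simple)


def bootstrap_ci(estimate: float, bootstrap_estimates: list[float],
                 z: float = 1.96) -> tuple[float, float]:
    """Interval M +/- z*X, with X the SD of the bootstrap estimates."""
    sd = statistics.stdev(bootstrap_estimates)  # sample SD (assumption)
    return estimate - z * sd, estimate + z * sd


# Hypothetical nested pair: simple model (6 parameters) vs complex (9).
ll_simple, ll_complex = -1250.0, -1230.0
print(aic(ll_simple, 6), aic(ll_complex, 9))
print(lrt_statistic(ll_simple, ll_complex))

# Hypothetical migration estimate with five bootstrap replicates.
low, high = bootstrap_ci(2.0, [1.8, 2.0, 2.2, 1.9, 2.1])
print(low, high)
```

The likelihood ratio statistic is then compared against a chi-square distribution whose degrees of freedom equal the number of extra parameters in the more complex model.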

Figure 1

Schematic representation of the five demographic models used in this study. In each diagram, the width of the tree branches at the top shows the current population sizes (N1 and N2), and moving down (backward in time) the inferred demographic history since the species split. The model that assumes constant population size (split_mig) is represented by straight lines. Models that allow for exponential population size changes since the split (IM, IM_2M, IM2, IM2_2M) have curved lines and include the parameter s, which is the relative size of the population 1 at the split (relative size of population 2 is 1-s). NA is the ancestral population size before the split, and is not a free parameter63. All population sizes (N1 and N2) are expressed in units of NA. The time parameter, T, is given in units of 2*NA generations. All migration parameters (M, M1, M2, MA, MB, MA1, MA2, MB1, MB2) are represented by horizontal arrows and expressed in units of 2*NA*m, where m is the proportion of the receiving population consisting of immigrants in each generation. The “A” and “B” indexes for migration parameters reflect migration rate at two classes of sites in the genome in the IM_2M and IM2_2M models.
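The scaled units in this caption can be unpacked with a short sketch. It assumes that exponential size change means a population moves from its relative size at the split (s or 1 − s) to its current relative size along a geometric curve over time T, and it converts T and M into generations and per-generation migrant fractions for a given ancestral size. The ancestral size of 100,000 and all other numbers are hypothetical, chosen only to illustrate the conversions.

```python
def size_at(t: float, T: float, start: float, end: float) -> float:
    """Relative size t time units after the split, growing (or shrinking)
    exponentially from `start` at the split to `end` at the present (t = T)."""
    return start * (end / start) ** (t / T)


def to_generations(T: float, ancestral_size: float) -> float:
    """Convert T (units of 2*NA generations) into generations."""
    return T * 2 * ancestral_size


def to_migrant_fraction(M: float, ancestral_size: float) -> float:
    """Convert M (units of 2*NA*m) into m, the per-generation fraction
    of the receiving population made up of immigrants."""
    return M / (2 * ancestral_size)


# Hypothetical values: s = 0.4, current N1 = 1.2 (in units of NA), T = 2.0.
s, n1, T = 0.4, 1.2, 2.0
print(size_at(0.0, T, s, n1))      # size of population 1 at the split
print(size_at(T / 2, T, s, n1))    # size halfway to the present
print(size_at(T, T, s, n1))        # current size

NA = 100_000                       # hypothetical ancestral population size
print(to_generations(T, NA))
print(to_migrant_fraction(1.0, NA))
```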

Results

Significant differences in polymorphism statistics between frequently and rarely-recombining genes

The distribution of genetic diversity was similar in the S. latifolia and S. dioica genomes (Spearman’s correlation for π (fourfold degenerate sites): R = 0.99, p-value < 2.2 × 10⁻¹⁶; Spearman’s correlation for π (first two codon positions): R = 0.99, p-value < 2.2 × 10⁻¹⁶; Spearman’s correlation for Tajima’s D (fourfold degenerate sites): R = 0.30, p-value < 2.2 × 10⁻¹⁶; Spearman’s correlation for Tajima’s D (first two codon positions): R = 0.22, p-value < 2.2 × 10⁻¹⁶). Genetic diversity varied considerably across both genomes, with the highest diversity observed at the ends of the chromosomes and much lower diversity in the central regions (Fig. 2). This corresponds to the distribution of recombination rate reported for the S. latifolia genome, with extensive pericentromeric recombination suppression present on all chromosomes and frequent recombination occurring only near the ends of all chromosomes20,50. Consistent with this, the extent of linkage disequilibrium, quantified with the ZnS statistic59, was higher in the central regions of the chromosomes than in the actively recombining ends of the chromosomes (Fig. 2).
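ZnS summarises linkage disequilibrium as the average squared correlation (r²) between all pairs of segregating sites in a sample. The sketch below shows that calculation for phased haplotypes coded as 0/1; the function names and toy data are hypothetical, and the way ProSeq treats unphased genotypes or missing data is not reproduced here.

```python
from itertools import combinations


def r_squared(site_a: list[int], site_b: list[int]) -> float:
    """Squared correlation between two biallelic sites (alleles coded 0/1)."""
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def zns(haplotypes: list[list[int]]) -> float:
    """Average r^2 over all pairs of segregating sites.
    Rows are haplotypes, columns are sites."""
    sites = [list(column) for column in zip(*haplotypes)]
    segregating = [s for s in sites if 0 < sum(s) < len(s)]
    pairs = list(combinations(segregating, 2))
    if not pairs:
        raise ValueError("need at least two segregating sites")
    return sum(r_squared(a, b) for a, b in pairs) / len(pairs)


# Two sites in complete association versus two independent sites.
linked = [[0, 0], [0, 0], [1, 1], [1, 1]]
unlinked = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(zns(linked), zns(unlinked))
```

Values close to 1 indicate that alleles at different sites are almost always inherited together, as expected where recombination is rare.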

Figure 2

Genomewide polymorphism statistics in S. latifolia (slat) and S. dioica (sdio). From top to bottom panel: nucleotide diversity (π) of each species (fourfold degenerate sites), Tajima’s D for each species (fourfold degenerate sites), Dxy and FST between the two species (all sites) and ZnS for each species (all sites).

Below we analyse and compare the patterns of DNA polymorphism separately for frequently and rarely-recombining regions. The genes lying in the central chromosomal regions, where no recombination was detected in genetic cross data20,50, are designated as “rarely” or “low”-recombining, while the genes located in the actively recombining ends of the chromosomes are designated as “frequently” or “high”-recombining, with similar numbers of genes analysed in these categories (2161 and 2261, respectively). At fourfold degenerate sites, median π was 0.0324 and 0.0341 in the frequently recombining autosomal genes (freqA), and 0.0107 and 0.0118 in the rarely recombining autosomal genes (rareA), in S. latifolia and S. dioica, respectively (Supp. Table 2). Median π at fourfold degenerate sites on the X chromosome was 0.0227 and 0.0209 in the frequently recombining genes (freqX), and 0.0029 and 0.0048 in the rarely recombining genes (rareX), in S. latifolia and S. dioica, respectively (Supp. Table 2). In the first two codon positions, median π was 0.0053 in the frequently recombining genes for both species, and 0.0023 and 0.0026 in the rarely recombining genes in S. latifolia and S. dioica, respectively (Supp. Table 3). Median π in the first two codon positions was 0.0043 and 0.0039 in freqX genes, and 0.0011 and 0.0016 in rareX genes, of the two species (Supp. Table 3). Median FST (all sites) was the highest in both freqX and rareX genes (Supp. Table 1). Median Dxy (all sites) was the second lowest and lowest in freqX and rareX genes, respectively (Supp. Table 1). Median ZnS of freqX genes was highest for both species, but that of rareX genes was highest only for S. dioica (Supp. Table 1).

The frequently and rarely-recombining regions differed in the level and patterns of DNA polymorphism, with π (fourfold degenerate sites and first two codon positions), Tajima’s D (fourfold degenerate sites and first two codon positions), Dxy (all sites), FST (all sites) and ZnS (all sites) all showing significant differences between these regions (Fig. 3). π, Dxy and FST were also significantly different for all pairwise comparisons in autosomal and X-linked genes in the two recombination categories (Fig. 3, Supp. Fig. 2). However, after π of autosomal genes had been adjusted for the ploidy difference with the X (by multiplying each value by 0.75), the same pattern remained significant only for fourfold degenerate sites in S. dioica (Fig. 3a, Supp. Fig. 1). RareX genes had significantly different adjusted π from all other groups at fourfold degenerate sites in both species, and at the first two codon positions in S. latifolia (Fig. 3a, Supp. Fig. 1). At the first two codon positions in S. dioica, rareX genes had significantly different adjusted π from freqA and freqX, but not rareA genes (Fig. 3a, Supp. Fig. 1). For Tajima's D, rareX genes did not differ significantly from rareA and freqX genes in S. latifolia, whereas rareX genes in S. dioica differed significantly from all three other groups (freqA, rareA and freqX) (Fig. 3b). In S. latifolia, ZnS differed significantly between freq (A or X) and rare (A or X) genes but not within recombination categories (between A and X). In S. dioica, the pattern for ZnS was similar to that for Tajima’s D, in that rareX genes differed significantly from all other groups (Fig. 3b).

Figure 3

Comparisons of polymorphism statistics between different groups of autosomal and X-linked genes in the frequently- and rarely-recombining genomic regions of the two Silene species. Letters (a–d) at the top of each box plot represent groupings based on the Wilcoxon rank-sum test. In the Kruskal–Wallis test, * indicates p-value < 0.05, ** indicates p-value < 0.005, *** indicates p-value < 0.0005. The tops of the plots for π were cut off for better resolution of differences among groups. Full π plots and plots with adjusted π (0.75 of estimated values for autosomal genes) are presented in Supp. Figs. 1 and 2, respectively.

Demographic modelling

In order to exclude the effect of pericentromeric recombination suppression on gene flow, we conducted separate analyses for rarely-recombining and frequently-recombining regions. For each of these datasets we fitted two pairs of nested models, IM and IM_2M, and IM2 and IM2_2M (Fig. 1), that differed in the number of parameters accounting for interspecific gene flow. All these models included population size change after the species split, which turns out to be an essential feature of the models, given that the model without population size change (split_mig, Fig. 1) showed a much poorer fit to the data than any of the models allowing for population size changes (Table 2). The parameter estimates for population size change showed 2.23 to 2.99-fold population size growth in both species (Table 2), which is consistent with northward post-glacial expansion of these species from refugia in southern Europe or Anatolia64.

Table 2 Best parameter estimates of demographic models analysed with dadi, Akaike information criterion (AIC), and results of the likelihood ratio tests (LRT) for nested models (IM versus IM_2M and IM2 versus IM2_2M). The parameters are as in Fig. 1. Confidence levels for parameters of the best-fitting model for each gene category are shown (± 1.96X, where X = standard deviation in 100 bootstrap estimates).

IM and IM_2M models assumed that gene flow is the same in both directions, while IM2 and IM2_2M allowed for different migration rates in two directions. Fitting of these models to data revealed that gene flow differs significantly in two directions (Table 2), with S. latifolia to S. dioica gene flow (M1) being stronger than in the opposite direction (M2), which is consistent with asymmetric reproductive barrier between these species65.

The IM and IM2 models assumed that all sites in the genome had the same gene flow, while the more complex *_2M models allowed for two different classes of sites (“A” and “B”) with different migration rates. The better fit of the *_2M models to the data (Table 2) demonstrates the presence of significant heterogeneity in interspecific gene flow across the genome. The “A” sites (MA, MA1 and MA2 in Table 2) show much lower migration rate(s) than the “B” sites (MB, MB1 and MB2 in Table 2), with a ~ 7 to ~ 70-fold difference between the A and B sites (Table 2). A larger proportion of the analysed sites belonged to the low-migration A-category in rarely-recombining regions (56%) than in frequently recombining regions (26%). This is consistent with rarely-recombining regions representing a significant barrier to interspecific gene flow. Analyses using separate autosomal and X-linked genes from rarely- and frequently recombining regions (“rareA”, “freqA”, “rareX”, “freqX” gene categories) revealed a similar pattern (Table 2): higher proportions of analysed sites fell into the low-migration A-category in the rarely-recombining regions (55% and 50% in “rareA” and “rareX” gene categories, respectively) than in frequently-recombining regions (31% and 28% in “freqA” and “freqX” gene categories, respectively).

The comparison of estimated gene flow for X-linked and autosomal genes reveals that on average (across MA1, MA2, MB1 and MB2 in Table 2) for frequently recombining regions, it is about twofold lower on the X compared to autosomes (MAut/MX = 2.1), which is consistent with significantly higher FST for freqX compared to freqA (Fig. 3e) as well as with the large-X effect. On the other hand, the rarely-recombining regions show little difference in migration rates between the X-linked and autosomal genes (average MAut/MX = 1.1), indicating that lack of recombination limits gene flow to a similar extent on the X-chromosome and the autosomes. The estimated time since species divergence (measured in generations times twice the ancestral population size) was similar for all categories except the frequently recombining autosomal genes, where it was much lower (TrareX = 3.85; TfreqX = 3.77; TrareA = 4.84; TfreqA = 0.55, Table 2).

Gene expression divergence in frequently and rarely-recombining regions

To compare the rate of gene expression divergence on the X chromosome and autosomes, we measured expression in transcriptome sequence data from 12 S. latifolia and 12 S. dioica females (Table 1). As expected for closely related species, gene expression in the two species was strongly positively correlated (Table 3). The correlation was the strongest for the frequently recombining X-linked genes (r2 = 0.870) and the weakest for the rarely-recombining X-linked genes (r2 = 0.781), suggesting that gene expression divergence is slightly faster in rarely- compared to frequently recombining X-linked genes (Table 3). However, the proportion of genes that evolved significantly different expression (t-test, P < 0.0001) was the same (10%) in these categories. This proportion was the lowest (7.22%) in the frequently recombining autosomal genes, while in all other gene categories it was close to 10% (Table 4). Only the difference between frequently and rarely-recombining autosomal genes was marginally significant (χ² = 3.595; P = 0.0580) for the number of genes that evolved significantly different expression in the two species. All other pairwise comparisons were non-significant. Taken together, these results indicate that gene expression divergence between S. latifolia and S. dioica is slowest in the frequently recombining autosomal genes (freqA), possibly due to more active interspecific gene flow homogenising the gene pools of these species. Unlike the autosomal frequently recombining genes, expression of the freqX genes is diverging at a similar rate to that of rarely recombining X-linked genes, which is consistent with X-linkage acting as a partial barrier to interspecific gene flow.
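The comparison of proportions between two gene categories can be illustrated with a Pearson chi-square test on a 2 × 2 table (diverged versus not diverged, by category). The sketch below uses the closed-form statistic without continuity correction and the one-degree-of-freedom tail probability; whether the authors applied a correction is not stated, and the gene counts are hypothetical placeholders, not the values in Table 4.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]]; returns (statistic, p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, p_value_df1(stat)


# Hypothetical counts: 72 of 1,000 genes diverged in one category,
# 100 of 1,000 in the other.
stat, p = chi_square_2x2(72, 928, 100, 900)
print(stat, p)

# Tail probability for the statistic reported in the text.
print(p_value_df1(3.595))
```

With one degree of freedom, a statistic of 3.595 corresponds to a P value of about 0.058, in line with the value reported above.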

Table 3 Correlation (r2) of gene expression (FPKM) between S. latifolia and S. dioica.
Table 4 The numbers and proportions of genes that evolved significantly (t-test, P < 0.0001) different expression between S. latifolia and S. dioica.

Discussion

This study analysed the level and patterns of genetic diversity across S. latifolia and S. dioica genomes to assess the contribution of the extensive pericentromeric recombination suppression to limiting gene flow between these species. Lack of recombination leads to linkage disequilibrium of a barrier locus with a wider genomic region, which leads to suppressed introgression in such regions even for loci that are not causing any hybrid inviability or reduced fertility. Thus, rarely-recombining regions may be major contributors to the maintenance of species integrity despite on-going interspecific hybridisation (e.g.,23,30,31,32,33,34). As S. latifolia and S. dioica regularly hybridise and introgress in overlapping ranges across Europe, rarely-recombining regions, especially that on the X-chromosome, could be key to maintaining their distinct species identities.

We conducted analyses of genetic diversity, interspecific divergence and gene flow separately for regions with 'high' and 'low' recombination rates. While this division of the genome into two classes may appear crude, it does reflect strong differences in recombination rate at the ends and central regions of all chromosomes. Pericentromeric recombination suppression is quite extensive on all S. latifolia chromosomes, with the central rarely-recombining region comprising most of the length of all chromosomes20,50. This division into a very large (~ 330 Mb) rarely-recombining central region and small frequently recombining regions at the ends is particularly pronounced on the X-chromosome, which is the largest in the female genome20. The transition between the frequently recombining ends of the chromosomes and rarely-recombining central regions is quite sharp20, and a few genes falling in the transition zones with intermediate recombination rate were excluded from our analysis. Given this distribution of recombination across the S. latifolia genome, the artificial division into 'high' and 'low' (or 'freq' and 'rare') recombination classes reflects biological reality well.

Genetic diversity was observed to be substantially lower in the rarely-recombining central regions of all chromosomes than in the actively recombining chromosomal ends. Reduced diversity in rarely-recombining regions is a general phenomenon likely caused by linked selection (selective sweeps66 and background selection67), which affects wider genomic regions where recombination is rare because of stronger linkage disequilibrium. Selective sweeps are expected to drive the allele frequency spectrum towards an excess of low-frequency polymorphisms, which is detectable as negative Tajima's D values68. This statistic is indeed more negative in rarely-recombining than in frequently recombining regions (Fig. 3b; Supp. Tables 1, 2, 3). In particular, rareX genes had significantly more negative Tajima’s D than both freqA and freqX genes in S. dioica (both fourfold degenerate sites and first two codon positions), and than freqA genes in S. latifolia (fourfold degenerate sites only) (Fig. 3b, Supp. Table 2, 3). Genetic diversity in the X-linked genes is lower than in the autosomes, as expected from their ploidy difference. The lower ploidy of X-linked genes accounts for their lower diversity in frequently recombining X-linked regions (π for freqX and freqA genes was similar after adjustment for the difference in ploidy, except for fourfold degenerate sites in S. dioica), but it is not sufficient to explain the reduced diversity in the massive rarely-recombining Xpr region on the X chromosome, compared to rarely-recombining autosomal regions. Even after adjusting π for autosomal genes, rareX genes still had significantly lower π than rareA genes in both species and types of analysed sites, except for the first two codon positions in S. dioica (Fig. 3a, Supp. Tables 2, 3). This may be due to the particularly large size of the Xpr (~ 330 Mb), which includes over a thousand genes20 and should make linked selection that reduces genetic diversity particularly strong. Indeed, linkage disequilibrium (measured with ZnS) is strongest in rareX genes (Supp. Table 1).

Genetic differentiation between S. latifolia and S. dioica, measured with FST, is higher in the rarely-recombining central regions of the chromosomes than in the actively recombining terminal regions, with rareX genes having significantly higher FST than all other groups (Fig. 3e; Supp. Table 1). This is likely caused by reduced gene flow, but the reduced intraspecific genetic diversity in rarely-recombining regions (Figs. 2, 3, Supp. Tables 1, 2, 3) could also have contributed to higher FST by increasing the relative proportion of overall genetic diversity that is due to species divergence. Lower Dxy in the central compared to peripheral regions of the chromosomes (Figs. 2, 3, Supp. Table 1) also indicates that high FST in the central rarely-recombining regions is, at least partly, caused by reduced intraspecific genetic diversity. However, the demographic modelling reveals consistently lower estimates of interspecific gene flow in the rarely-recombining compared to frequently recombining regions (Table 2). Time since species divergence estimated for rarely- and frequently recombining X-linked genes is very similar (TrareX = 3.85; TfreqX = 3.77; Table 2), which indicates similar coalescent times for the two groups of X-linked genes and suggests that rare recombination and X-linkage both act as considerable interspecific barriers. This is also consistent with the much lower T for frequently (TfreqA = 0.55) compared to rarely (TrareA = 4.84) recombining autosomal regions: higher interspecific gene flow in the former homogenises the gene pools of the two species and reduces T.
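The point that reduced within-species diversity can raise FST even when absolute divergence is not elevated can be seen with a small worked example. The sketch assumes a simple ratio-type estimator, FST = 1 − π(within) / Dxy, where π(within) is the mean of the two species' π; the estimator implemented in ProSeq may differ in detail, and the input values are hypothetical round numbers, not estimates from this study.

```python
def fst_from_diversity(pi_1: float, pi_2: float, dxy: float) -> float:
    """FST as the share of between-species diversity not explained by
    within-species diversity (assumed estimator: 1 - pi_within / Dxy)."""
    pi_within = (pi_1 + pi_2) / 2
    return 1 - pi_within / dxy


# Hypothetical frequently recombining region: high diversity, higher Dxy.
fst_freq = fst_from_diversity(0.03, 0.03, 0.04)

# Hypothetical rarely-recombining region: low diversity, lower Dxy.
fst_rare = fst_from_diversity(0.01, 0.01, 0.03)

print(fst_freq, fst_rare)
```

In this toy case FST is higher in the low-diversity region although Dxy is lower there, which is why FST alone cannot separate reduced gene flow from reduced diversity.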

While rarely-recombining regions appear to represent significant barriers to interspecific gene flow, X-linkage may also contribute significantly to species differentiation as indicated by higher FST values in the X-linked compared to autosomal genes for regions with similar recombination rates (i.e. rareX versus rareA, and freqX versus freqA; Fig. 3e). However, Dxy is lower in the X-linked compared to autosomal genes both for frequently- and rarely-recombining regions (Fig. 3d), indicating that higher FST for X-linked genes is at least partly caused by lower intraspecific genetic diversity on the X-chromosome. Furthermore, the fitting of demographic models to data did not show significantly lower gene flow for X-linked compared to autosomal genes for regions with similar recombination rate (Table 2). The proportion of sites (P) falling into low gene flow category was similar for X-linked and autosomal genes within the same recombination category (PrareX = 0.50 vs PrareA = 0.55; PfreqX = 0.28 vs PfreqA = 0.31; Table 2). Thus, the effect of X-linkage (if any) on gene flow appears to be much less pronounced compared to reduced recombination rate in pericentromeric regions.

Conclusion

In this study, we tested whether the pericentromeric recombination suppression in the massive Xpr region20 on the S. latifolia X chromosome can account for the LXE previously reported for this species12. While LXE in animals has been shown with direct experiments69, the evidence for LXE in S. latifolia12 and our analyses presented above are indirect—based on evolutionary genetic analyses of genetic diversity and gene flow between the species. We report that population differentiation (FST; Fig. 3e) and the proportion of sites with low interspecific gene flow (P in Table 2) are significantly higher in the rarely-recombining compared to the actively recombining regions on the X-chromosome and the autosomes. This reveals an important role of the rarely-recombining regions in limiting gene flow between the two species. As the rarely-recombining region comprises a larger proportion of the X-chromosome (~ 90%) compared to the autosomes (~ 80%)20, this likely disproportionately reduces overall interspecific gene flow on the X, contributing to the 'large-X' effect. We found little evidence that X-linkage by itself contributes significantly to the LXE in S. latifolia and S. dioica. The frequently recombining part of the X-chromosome does have a significantly higher FST compared to the frequently recombining regions on the autosomes (Fig. 3e), but this appears to be caused by lower genetic diversity in the X-linked genes. We conclude that the lack of recombination in pericentromeric regions creates a significant barrier for interspecific gene flow, which is a cause for the LXE in S. latifolia and S. dioica due to a disproportionately large pericentromeric region on the X-chromosome.