Introduction

Cessation of recombination between diverging sex chromosomes makes the non-recombining sex chromosome vulnerable to a number of degenerative forces. Inevitable accumulations of deleterious mutations through the process of Muller’s ratchet, selective sweeps and reduced effective population size are examples of such forces that act to increase the mutation load of the sex-limited chromosome, that is, the Y chromosome in male heterogametic organisms and the W chromosome in female heterogametic organisms1,2,3. Gene sequence and activity for most loci will therefore deteriorate unless they are under strong selection in the heterogametic sex4,5. This process will continue as recombination cessation spreads along the sex chromosomes in the heterogametic sex6.

However, the need for chromosome pairing and an obligate crossing-over event to ensure proper chromosomal segregation at meiosis is thought, in most cases, to select strongly against complete loss of sex chromosome recombination7,8,9 (but see ref. 10). For this to be possible, the differentiated sex chromosomes must maintain sequence homology in at least one common region shared between X and Y, or Z and W, where homologous recombination can take place11,12. This is referred to as the pseudoautosomal region (PAR)9, and cytology has long established that meiotic pairing and crossing-over between differentiated sex chromosomes of diverse organism groups are concentrated in the PAR13.

At the onset of sex chromosome differentiation, which may be triggered by selection against recombination around a sex-determining locus5, the proto-sex chromosomes correspond to an ordinary pair of autosomes. However, as soon as recombination restriction is established in at least a small region of this pair, the distinction between the non-recombining region and the pseudoautosomal part of the sex chromosomes becomes apparent. In newly evolved (young) sex chromosome systems14, the PAR may constitute a major part of the X (Z) chromosome. However, the PAR shrinks as recombination suppression spreads. PAR is thus a dynamic entity and additions, losses or transpositions of chromosomal segments to sex chromosomes may add to the dynamics of the evolution of PARs15.

It is fair to say that the PAR represents one of the least well-characterized parts of the genome. Genomic data on the character and structure of the PAR in old and differentiated sex chromosomes mainly come from humans and other mammals16,17. Birds have come to constitute the most well-studied group of organisms in terms of sex chromosome evolution under female heterogamety18,19. The chicken (Gallus gallus) is a main avian model, and a draft genome sequence was presented 10 years ago20. However, despite a number of studies investigating how the chicken Z and W chromosomes became differentiated21,22,23,24, the PAR still remains to be molecularly identified, and this is also the case for all other birds with well-differentiated sex chromosomes25 (in the most basal lineage of contemporary birds, Paleognathae (ostrich and allies, representing <1% of avian species), sex chromosomes have remained largely undifferentiated26,27; Supplementary Note 1). This has led to the idea that the avian PAR might in most cases be very small or show some peculiar molecular features that hinder its identification.

Here we describe the identification of an avian PAR using a combination of high-density genetic linkage analysis and whole-genome re-sequencing in the collared flycatcher Ficedula albicollis. We find that the PAR is a 630-kb region at one end of the Z chromosome and, by performing population genomic and molecular evolutionary analyses, we test theoretical predictions9,28,29,30 for the genetics and evolution of pseudoautosomal sequences.

Results

Identification of the PAR based on linkage analysis

By selecting markers from essentially all scaffolds in a draft assembly of the collared flycatcher genome31, we recently developed a custom single-nucleotide polymorphism (SNP) array and obtained a genome-wide high-density genetic linkage map (3,249 cM) by genotyping large multi-generational pedigrees from a natural population32,33. This included a 161-cM male map of the Z chromosome, which showed no female recombination and testifies to advanced sex chromosome differentiation in this species. To identify the PAR, we focused on SNPs heterozygous in both males and females in a pool of unmapped markers that were not linked to any of the 33 autosomal linkage groups33. Seven of these previously unlinked markers from three small scaffolds (N00298, three markers in 436.0 kb; N00378, three markers in 182.2 kb; N02597, one marker in 2.3 kb) showed highly significant two-point linkage in both male and female meioses to several markers located close to one end of the Z chromosome linkage map and to each other (Supplementary Table 1; Supplementary Fig. 1).

We built a new map of the Z chromosome that, with strong support, placed markers from N00298, N00378 and N02597 distal to all other markers on the chromosome (Table 1). This extended the Z chromosome linkage map as measured in male meiosis by 7.3 cM and the Z chromosome assembly by 630 kb (Fig. 1b; Supplementary Note 2). There was a dramatic sex difference in the amount of recombination in this region, with a female map length of 64.3 cM (Fig. 1b), corresponding to a female recombination rate of 102.1 cM/Mb. With a genetic distance >50 cM, the data are compatible with an obligate crossing-over in female meiosis, consistent with expectations for a PAR. Moreover, female recombination was not uniformly distributed across the 630 kb but was concentrated in an ~150-kb hotspot region (with an extreme rate of 747 cM/Mb in the 67-kb interval between markers N00378:115359 and N02597:626) distal to the boundary with the rest of the Z chromosome.
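The rates quoted above follow directly from dividing map length by physical length. The short sketch below reproduces that arithmetic from the figures given in the text; the function names are illustrative only and are not part of the original analysis.

```python
def recombination_rate_cm_per_mb(map_length_cm: float, physical_length_kb: float) -> float:
    """Recombination rate in cM/Mb from a map length (cM) and a physical length (kb)."""
    return map_length_cm / (physical_length_kb / 1000.0)


def map_length_cm(rate_cm_per_mb: float, physical_length_kb: float) -> float:
    """Genetic length (cM) covered by an interval with a given rate (cM/Mb)."""
    return rate_cm_per_mb * (physical_length_kb / 1000.0)


# Values taken from the text: 630-kb PAR, 64.3 cM (female) and 7.3 cM (male).
female_rate = recombination_rate_cm_per_mb(64.3, 630.0)
male_rate = recombination_rate_cm_per_mb(7.3, 630.0)

# The 67-kb hotspot interval at 747 cM/Mb accounts for about 50 cM by itself.
hotspot_cm = map_length_cm(747.0, 67.0)

print(round(female_rate, 1))  # 102.1
print(round(male_rate, 1))    # 11.6
print(round(hotspot_cm, 1))   # 50.0
```

On these numbers the female rate is roughly ninefold the male rate over the same 630 kb, and the hotspot interval alone contributes most of the female map length.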

Table 1 Genetic map of the PAR and neighbouring region of the Z chromosome.
Figure 1: Characteristics of the flycatcher PAR.
(a) Estimates of population genetic and genomic parameters in 20 kb windows across the PAR in comparison with the mean for the rest of the Z chromosome (dashed line in each panel). From top to bottom: between-species differentiation FST, density of fixed differences between species df, nucleotide diversity (π), LD (r2), GC content (%) and repeat density (%). (b) Physical and genetic description of the PAR showing the three scaffolds assigned to the PAR and genetic maps for females and males, respectively, in this region.

Identification of the PAR based on read depth

An independent way to identify a PAR is to contrast depth of coverage in re-sequencing of males and females26,34,35. Specifically, while autosomes and PARs should show similar coverage in male and female sequencing, the region of the Z chromosome that does not recombine with the W chromosome in female meiosis should show twofold higher coverage in males. We therefore performed whole-genome re-sequencing of population samples of males and females and mapped reads to the assembly. This clearly demonstrated a twofold higher male coverage across the Z chromosome, with the exception of scaffolds N00298, N00378 and N02597, where males and females had equal coverage (Fig. 2; Supplementary Fig. 2). On the basis of the described evidence, we now define these three scaffolds as the collared flycatcher PAR, the first detected PAR in a pair of well-differentiated avian sex chromosomes (Supplementary Note 1). The physical length of the PAR corresponds to ≤1% of the total length of the Z chromosome and implies that cessation of female recombination has spread over ≥99% of the sex chromosomes. Given previous failure to identify the PAR in neognath birds (all birds but ratites and tinamous), a small PAR may be a common feature of avian sex chromosomes (Supplementary Note 3). Moreover, this resembles the situation for a recently identified small PAR in a female heterogametic flatfish36. We acknowledge that since there are gaps between scaffolds within the PAR, as well as between PAR and the rest of the Z chromosome, the complete PAR sequence and the precise PAR boundary remain to be determined. The same applies for any telomeric sequence.

Figure 2: Sex-specific genome coverage.

Male-to-female (M/F) coverage ratio for 200 kb windows along the Z chromosome. Coverage was normalized by the average M/F ratio of autosomal scaffolds.

Genomic characteristics of the PAR and PAR genes

The flycatcher PAR contains 16 known and 6 de novo-predicted protein-coding genes (Supplementary Table 2). This implies a higher gene density in the PAR than in the Z chromosome overall, both expressed as the number of genes per Mb (34.9 versus 10.0) and the amount of coding sequence per base pair (bp; 0.048 versus 0.016, non-parametric bootstrap re-sampling, P<10−5). Consistent with a tight organization, repeat content was lower in the PAR (0.064 per bp) than in the rest of the Z chromosome (0.116, P=0.081; Fig. 1a; Table 2). The high rate of recombination in the PAR may have generated an excess of deletion mutations37 and may also have increased the efficiency of selection against deleterious insertions of repetitive elements. A high rate of recombination might also be expected to have left a footprint on the base composition of the PAR38 via GC-biased gene conversion39,40,41. Indeed, the mean GC content (49.2%) was significantly higher than in the rest of the Z chromosome (mean=39.9%, range=36.4–48.3%, P=0.00056; Fig. 1a; Table 2).

Table 2 Genomic parameters of the flycatcher PAR.

Recombination rate may affect the rate of sequence evolution in different ways. We made three-species alignments of coding sequences of flycatcher, zebra finch (Taeniopygia guttata) and chicken to estimate branch-specific substitution rates in the flycatcher lineage (Table 2). The mean non-synonymous-to-synonymous substitution rate ratio (dN/dS) of PAR genes was lower than that of other Z-linked genes (0.095 versus 0.171, P=0.030, Wilcoxon test), consistent with more efficient removal of slightly deleterious mutations in the PAR due to reduced Hill–Robertson interference. More surprisingly, dS of PAR genes (0.130) was significantly higher than that of other Z-linked genes (0.078, P=0.0014). This cannot be explained by constraints at synonymous sites42 or male-biased mutation43 because both would act in the opposite direction, with a higher substitution rate on the rest of the Z chromosome. Timing and mechanisms of recombination and the formation of double-strand breaks in the female germ line have been shown to differ between PAR and autosomes in chicken44, and Z–W pairing is error prone45. This might translate into a situation where the extraordinarily high rate of recombination implies an increased rate of mutation in the PAR.

PAR and sexual antagonism

Because PAR sequences may be polymorphic in both sexes but yet show an association with sex, increasingly so closer to the boundary with the sex-determining region29, the stage is potentially set for a strong role of sexual antagonism46 on the character and evolution of genetic diversity in PAR9. Recently, several evolutionary genetic predictions pertinent to pseudoautosomal sequences have been developed9,28,29,30,47. For example, since sexual antagonism can favour the maintenance of polymorphisms by selection for alternate alleles in males and females, genetic diversity in PARs should be high30. Moreover, the rather unusual scenario of allele frequency differences between males and females may apply48, due to the formation of linkage disequilibria between sexually antagonistic alleles and the Z chromosome or the W chromosome28. To test this, we used whole-genome re-sequencing of 10 males and 10 females to assess levels of noncoding nucleotide diversity (π). We found that diversity in the PAR (mean π=0.0034) was not significantly different from the rest of the Z chromosome (mean π=0.0032; non-parametric bootstrap re-sampling, P=1; Fig. 1a; Table 2). Moreover, there was no detectable differentiation between males and females in the PAR (FST=0.007±0.011 s.d.) or in the rest of the Z chromosome (0.012±0.023 s.d.), as would have been the case with sex differences in allele frequencies. Females were heterozygous throughout the PAR at a rate identical to that in males.

In none of these cases were there any deviating signals close to the boundary with the sex-determining region. Levels of linkage disequilibrium (LD) in the PAR were lower (mean r2=0.00087) than in the rest of the Z chromosome (mean r2=0.00157, Wilcoxon test, P=3.1e−10; Fig. 1a), with a mean distance of LD decaying to r2=0.1 of 45 bp in the PAR and of 1,558 bp in the rest of the Z chromosome (Supplementary Fig. 3). As a side note, genomic differentiation in comparison with the closely related pied flycatcher (F. hypoleuca) was much lower in the PAR than in the rest of the Z chromosome (FST: 0.372 versus 0.555, P=0.00051; df: 0.0001 versus 0.0011, P=0.00051). This provides support for an increased rate of sex-linked lineage sorting. Enhanced differentiation of sex chromosomes observed in this31 and other speciation models49 can thereby be explained by the lower effective population size of sex chromosomes compared with autosomes and PARs.

If sexual antagonism is prevalent, theory predicts an over-representation of genes with sex-specific functions on the sex chromosomes50. However, none of the annotated PAR genes (Supplementary Table 2) had a known function in male or female reproduction. Another prediction is that sex-specific expression, or sex-biased gene expression as a means to resolve sexual conflict50,51, should be evident. We analysed expression profiles using RNA-sequencing from seven non-reproductive tissues, plus testis and ovary, for five males and five females. Twenty PAR genes were expressed in at least one of the tissues analysed, and expression breadth did not deviate from that of other Z-linked genes (mean τ of 0.601 and 0.657, respectively, P=0.239). One PAR gene (ENSFALG00000011567, predicted transcript) showed testis-specific expression while none showed ovary-specific expression, which is at a level expected by chance given the overall frequency of testis- and ovary-specific genes in the genome (probability of 0.135). The tissue-averaged male-to-female expression ratio for PAR genes varied between 0.76 and 1.20, with a mean of 0.95 (similar to the autosomal average, 1.02). This contrasts markedly with the situation for other genes on the flycatcher Z chromosome, which had a mean male-to-female ratio of 1.40 (P<1e−10). There is ample evidence for pervasive male-biased gene expression (incomplete dosage compensation) in the Z chromosome in this52 and other avian species53,54. In summary, annotation and expression of genes in the PAR provide no strong indication of sexual antagonism.
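Expression breadth is summarized above by τ. The text does not spell out the formula, so the sketch below assumes the widely used tissue-specificity index, in which each tissue's expression is scaled by the maximum across tissues; whether expression values were transformed beforehand is not stated and is left to the caller. The function name and the toy expression values are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index (assumed definition).

    tau = sum(1 - x_i / x_max) / (n - 1) over n tissues.
    0 means equal expression in all tissues, 1 means expression in one tissue only.
    """
    n = len(expression)
    x_max = max(expression) if expression else 0.0
    if n < 2 or x_max <= 0:
        raise ValueError("need at least two tissues and non-zero expression")
    return sum(1.0 - x / x_max for x in expression) / (n - 1)


# Toy expression levels across nine tissues (hypothetical values).
broad = [12.0] * 9                   # same level everywhere
testis_only = [0.0] * 8 + [30.0]     # expressed in a single tissue

print(tau(broad))        # 0.0
print(tau(testis_only))  # 1.0
```

Under this definition, the reported means of 0.601 and 0.657 would both indicate moderately tissue-restricted expression.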

One possible explanation for the failure to verify theoretical expectations based on sexual antagonism in the evolution of flycatcher PAR sequences is frequent turnover of the PAR by interchromosomal rearrangements. However, this explanation is highly unlikely because of a high degree of conservation of this region in birds. Genomic alignment of flycatcher and chicken revealed that the flycatcher PAR corresponds to one of the terminal regions also of the chicken Z chromosome (Fig. 3), with completely conserved gene content. Two inversions distinguish gene order between the flycatcher PAR and the homologous region of the chicken Z chromosome (Fig. 3). Using Anolis lizard as an outgroup suggests that the most distal inversion arose in the lineage leading to chicken, subsequent to the split of the chicken and flycatcher lineages 80 myr ago55. The other discrepancy in gene order between chicken and flycatcher coincides exactly with scaffold N00378. This scaffold was oriented with a logarithm of the odds (LOD) score support >3 in the flycatcher linkage map and orientation was also supported by mate-pair data (Supplementary Note 1). Our data therefore show that, despite some internal inversions, the sequence content of the flycatcher PAR has remained stable during avian evolution. In general, the avian karyotype is extremely conserved with very few interchromosomal rearrangements56; in fact, flycatcher and zebra finch chromosomes are completely syntenic without fusions, fissions or translocations detectable with the resolution given by standard methodology33.

Figure 3: Comparative genome organization of flycatcher PAR.
Homologous sequences of chicken, flycatcher and Anolis lizard including the flycatcher PAR region and the distal 600 kb of the non-recombining part of the flycatcher Z chromosome. Each line joins homologous regions identified as anchors by the program LASTZ. Blue lines represent identical orientation of homologous sequences, red lines represent inverted orientation. Scaffolds N02597, N00781 and N00497 contain no genes, meaning that establishment of homology is difficult. Note the difference in scale between chicken and flycatcher (upper scale) and lizard (lower scale).

Discussion

Identification of the collared flycatcher PAR was achieved by indisputable support for genetic linkage of markers from three previously unassigned scaffolds to the Z chromosome, equal depth of coverage in male and female genomic re-sequencing, evidence for an obligate crossing-over in female meiosis and presence of heterozygous sites across this region in females. To our knowledge, this represents the first identification and extensive sequencing and genetic analysis of a PAR in a pair of highly differentiated avian sex chromosomes (Supplementary Note 1). It includes estimation of the size, boundary, sequence and gene content of the PAR, and analyses of gene expression and several population genetic and molecular evolutionary parameters. We find that the PAR is intermediate to autosomal and sex-linked sequences in several evolutionary and genomic respects. It is interesting to note that the recent identification of the first PAR in a female heterogametic fish revealed a size, number of genes, repeat content and male:female expression ratio very similar to those of the PAR in flycatcher36.

There has been considerable recent interest in the evolutionary expectations for pseudoautosomal sequences, based on sexual antagonism9,28,29,30,47. Much of this theoretical work remains to be empirically tested and our data provide one of the first opportunities to do so with a population genomic approach. This is particularly the case when it comes to female heterogametic sex chromosomes. However, we found no evidence for a role of sexual antagonism on sequence content or evolution. It is possible that theoretical predictions for the evolution of PAR sequences are not applicable to a situation of highly differentiated sex chromosomes, as observed in flycatchers. First, with most of the observed recombination concentrated close to the PAR boundary, distal PAR sequences will be effectively autosomal. However, as recombination hotspots may be ephemeral57, this pattern may have changed over time. Second, there might be constraints to sexual antagonism in a small PAR that is defined by the particular set of a limited number of genes that happen to reside in the terminal part of the Z chromosome. This situation may have been different at earlier stages of sex chromosome evolution. A widely accepted model of sex chromosome evolution implies gradual or sequential expansion of recombination restriction between the Z (or X) and W (or Y) chromosomes, and the concomitant contraction of the PAR, driven by selection for linkage between sexually antagonistic alleles and the sex-determining region58. After recombination restriction, such loci will subsume into the non-recombining region to become truly sex limited, thereby reducing signals of antagonism in the contracted PAR. An extension of this hypothesis is a negative feedback loop in which the impetus for further expansion of the non-recombining region of sex chromosomes is increasingly reduced with a decreasing number of potential targets for sexual antagonism in the remaining PAR.

Methods

Identification of the PAR based on linkage analysis

We used a natural population of collared flycatchers breeding on the Baltic Sea island Öland (sampling conducted according to permissions and rules of the Swedish ethics committee for wild animals) and a custom 50K SNP array32 to obtain genotypes of 655 individuals from four-generation pedigrees for linkage analysis33. Genotyping was performed with an Illumina iScan instrument. The array had purposely been designed to include highly variable SNPs from essentially all scaffolds >25 kb in a preliminary assembly version of the flycatcher genome32. After filtering for deviations from Hardy–Weinberg equilibrium and Mendelian inheritance, linkage analysis was performed using CRI-MAP 2.503 (ref. 59) developed by Ian Evans and Jill Maddox. Genotype data were initially used to construct a high-density linkage map, comprising 33 autosomal linkage groups and chromosome Z with a total of 33,627 markers assigned to one of these linkage groups33. To identify markers in the PAR, pairwise linkage scores were calculated between 89 markers in the best-order Z chromosome linkage map and 2,904 markers that were not linked to any of the 33 autosomal linkage groups, using the TWOPOINT option in CRI-MAP. These 2,904 markers had both heterozygous and homozygous genotypes in males as well as females without deviating from Hardy–Weinberg equilibrium. Because they were heterozygous in females, they were not included in the initial Z-linkage analysis33. Markers that had a pairwise LOD score >3.0 with at least one of the 89 Z-linked markers were used for subsequent BUILD analysis to determine their marker order along with the existing Z-chromosome linkage map. Genotypes have been deposited in the Dryad database ( doi:10.5061/dryad.h68jd).
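To illustrate what a two-point LOD threshold of 3.0 means, the sketch below computes the LOD score for the simplest textbook case of fully informative, phase-known meioses. This is a deliberate simplification and not the pedigree likelihood that CRI-MAP evaluates; the function names and the counts in the example are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    k, n = recombinants, meioses

    def log_term(count: int, prob: float) -> float:
        # count * log10(prob), with the convention 0 * log10(0) = 0
        if count == 0:
            return 0.0
        return count * math.log10(prob) if prob > 0 else float("-inf")

    return log_term(k, theta) + log_term(n - k, 1.0 - theta) - n * math.log10(0.5)


def max_lod(recombinants: int, meioses: int):
    """Maximum-likelihood recombination fraction (capped at 0.5) and its LOD."""
    theta_hat = min(recombinants / meioses, 0.5)
    return theta_hat, lod_score(recombinants, meioses, theta_hat)


# Hypothetical example: no recombinants observed in 10 informative meioses.
theta_hat, lod = max_lod(0, 10)
print(theta_hat, round(lod, 2))  # 0.0 3.01
```

In this simplified setting, ten informative meioses without a single recombinant are just enough to pass a LOD threshold of 3.0, that is, odds of about 1,000:1 in favour of linkage.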

Identification of the PAR based on sequence coverage

Raw whole-genome re-sequencing reads, obtained by Illumina HiSeq sequencing as described in ref. 31, from 10 female and 10 male collared flycatchers from the above population were mapped to all scaffolds in the FicAlb1.5 assembly version of the collared flycatcher genome (AGTO00000000.2) with Burrows-Wheeler Aligner (BWA) 0.6.2 (ref. 60) using default settings with a soft-clipping base-quality threshold of 5 to avoid low-quality bases in alignments. Alignment quality was enhanced by local realignment with GATK 2.4.3 (ref. 61). Duplicates were marked at the library level using Picard ( http://picard.sourceforge.net).

Base coverage for all Z-linked scaffolds including the three PAR scaffolds (NW_004775940.1 (scaffold N00298), NW_004775959.1 (N00378) and NW_004778032.1 (N02597)) was extracted with SAMtools mpileup 0.1.19 (ref. 62) pooling all individuals from each sex. The scaffolds were divided into 200 kb windows, and the mean and median coverage per window as well as the male-to-female coverage ratio were calculated with in-house scripts. To account for differences in total sequenced reads per sex, we normalized ratios by dividing them with the average M/F ratio of autosomal scaffolds.
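The in-house scripts are not shown, so the following is only a minimal sketch of the windowed calculation described above: mean depth per window for each sex, the male-to-female ratio per window, and normalization by the average autosomal M/F ratio. Function names and the toy depth values are assumptions made for illustration.

```python
from statistics import mean


def window_means(depths, window_size):
    """Mean depth in consecutive, non-overlapping windows of window_size positions."""
    return [mean(depths[i:i + window_size]) for i in range(0, len(depths), window_size)]


def normalized_mf_ratio(male_depths, female_depths, window_size, autosomal_mf_ratio):
    """Per-window M/F coverage ratio divided by the average autosomal M/F ratio."""
    male = window_means(male_depths, window_size)
    female = window_means(female_depths, window_size)
    return [
        (m / f) / autosomal_mf_ratio if f > 0 else float("nan")
        for m, f in zip(male, female)
    ]


# Toy per-base depths; a window of 2 positions stands in for 200 kb.
male_depths = [20, 20, 40, 40]
female_depths = [20, 20, 20, 20]
ratios = normalized_mf_ratio(male_depths, female_depths, window_size=2, autosomal_mf_ratio=1.0)
print(ratios)  # [1.0, 2.0]
```

A normalized ratio near 1 is what is expected for autosomes and a PAR, whereas a ratio near 2 is expected for the part of the Z chromosome that is present in a single copy in females.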

Characterization of the PAR

Gene information was obtained from Ensembl annotation of the FicAlb_1.4 version of the flycatcher genome assembly. The three PAR scaffolds identified in this study correspond to scaffolds JH603441.1 (N00298), JH603380.1 (N00378) and AGTO1003702.1 (N02597) in FicAlb_1.4 available at http://www.ensembl.org. The genome was repeat masked with RepeatMasker (version open-3.2.9) and repeat content and GC content were calculated in 20 kb (for Fig. 1a) or 630 kb windows (=the size of PAR, with 5 kb added to each of the two gaps between the scaffolds, for statistical analysis). Gene expression data for PAR genes was taken from ref. 52 and included expression levels measured by RNA-sequencing in five birds of each sex in brain, kidney, liver, lung, muscle, ovary, skin, testis and embryo (ERX144565-577, ERX144581–585, ERX144589–598, ERX144609–618, ERX144637–650, ERX144661–674, ERX144685–696, ERX144721, ERX144725, ERX144729 and ERX144731). Transcriptome reads were mapped onto the assembly version FicAlb1.5 using TopHat (version 2.0.10) and Cufflinks (version 2.1.1)63,64.
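As an illustration of the windowed summaries mentioned above, the sketch below computes GC content in fixed-size windows along a sequence. How ambiguous bases were treated is not stated in the text; here it is assumed that only A, C, G and T count towards the denominator, and soft-masked (lowercase) bases are counted like any other. The function name is hypothetical.

```python
def gc_content_windows(sequence: str, window_size: int):
    """GC fraction per non-overlapping window; None where no A/C/G/T is called."""
    results = []
    for start in range(0, len(sequence), window_size):
        window = sequence[start:start + window_size].upper()
        gc = window.count("G") + window.count("C")
        at = window.count("A") + window.count("T")
        called = gc + at  # assumption: N and other ambiguity codes are ignored
        results.append(gc / called if called else None)
    return results


# Toy sequence; a window of 4 bases stands in for a 20-kb window.
print(gc_content_windows("GGCCAATTgcNN", 4))  # [1.0, 0.0, 1.0]
```

Repeat content per window could be tallied in the same way by counting masked bases instead of G and C.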

PAR scaffolds were aligned pairwise to the genomes of chicken (Galgal4) and Anolis lizard (AnoCar2.0) using LASTZ65. Homologous regions were identified, extracted and ordered to minimize the number of inversions between species. All anchors between each species pair falling in the extracted regions were plotted with R.

Molecular evolutionary and population genomic analysis

We identified and downloaded putatively orthologous genes from collared flycatcher, zebra finch and chicken through the Biomart ( http://www.biomart.org) retrieval tool in Ensembl release 73 ( http://www.ensembl.org). Codon-based alignments were made using PRANK (v.130410)66 and analysed with a free-ratio model in the codeml program in the Phylogenetic Analysis by Maximum Likelihood (PAML4.7) package67 to estimate flycatcher lineage-specific dS and dN/dS for each gene.

Differentiation (FST) between species (using whole-genome re-sequencing data from 10 males and 10 females of the closely related species pied flycatcher) or sexes was estimated using the hierfstat package in R68. The proportion of fixed differences between species (df) and genetic diversity within species (π) were estimated using custom R scripts. Genotypes were assumed to be diploid for the PAR, and haploid for the remainder of the Z chromosome. These parameters were estimated for 20 or 630 kb windows. To investigate the pattern of LD, we first reconstructed haplotypes with Beagle 4 (ref. 69) with 40 iterations for estimating genotype phase, 10 iterations for imputing missing genotypes and 20 haplotype samples during each iteration. Pairwise LD (r2) was then calculated for all pairs of SNPs within 20 kb using VCFTools 0.1.12 (ref. 70), and the level of LD within 20 kb windows was estimated by E(r2)=1/(1+αd), where α is an LD decay parameter over distance d between markers.
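The decay model E(r2)=1/(1+αd) can be inverted to give the distance at which LD falls to a chosen level, d=(1/r2−1)/α, which is 9/α for r2=0.1. The sketch below applies this to the decay distances reported in the Results (45 bp and 1,558 bp) to back-calculate the implied α. It does not reproduce the fitting procedure, which is not described in detail, and the function names are illustrative.

```python
def expected_r2(alpha: float, distance_bp: float) -> float:
    """Expected r2 at a given distance under E(r2) = 1 / (1 + alpha * d)."""
    return 1.0 / (1.0 + alpha * distance_bp)


def distance_at_r2(alpha: float, target_r2: float) -> float:
    """Distance (bp) at which the expected r2 has decayed to target_r2."""
    return (1.0 / target_r2 - 1.0) / alpha


def alpha_from_distance(distance_bp: float, target_r2: float) -> float:
    """Decay parameter implied by a known distance to a given r2 level."""
    return (1.0 / target_r2 - 1.0) / distance_bp


# Implied decay parameters (per bp) from the reported distances to r2 = 0.1.
alpha_par = alpha_from_distance(45.0, 0.1)
alpha_rest_of_z = alpha_from_distance(1558.0, 0.1)

print(round(alpha_par, 3))         # 0.2
print(round(alpha_rest_of_z, 4))   # 0.0058
print(round(alpha_par / alpha_rest_of_z, 1))  # 34.6
```

On these figures, LD decays roughly 35 times faster with distance in the PAR than in the rest of the Z chromosome.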

Additional information

Accession Codes. Re-sequencing data have been deposited in the European Nucleotide Archive at EMBL-EBI under the accession codes ERR637360 to ERR637378 and ERR637485 to ERR63752; project code PRJEB7359.

How to cite this article: Smeds, L. et al. Genomic identification and characterization of the pseudoautosomal region in highly differentiated avian sex chromosomes. Nat. Commun. 5:5448 doi: 10.1038/ncomms6448 (2014).