Transcriptional regulation of dosage compensation in Carica papaya

Sex chromosome evolution results in the disparity in gene content between heterogametic sex chromosomes and creates the need for dosage compensation to counteract the effects of gene dose imbalance of sex chromosomes in males and females. It is not known at which stage of sex chromosome evolution dosage compensation would evolve. We used global gene expression profiling in male and female papayas to assess gene expression patterns of sex-linked genes on the papaya sex chromosomes. By analyzing expression ratios of sex-linked genes to autosomal genes and sex-linked genes in males relative to females, our results showed that dosage compensation was regulated on a gene-by-gene level rather than whole sex-linked region in papaya. Seven genes on the papaya X chromosome exhibited dosage compensation. We further compared gene expression ratios in the two evolutionary strata. Y alleles in the older evolutionary stratum showed reduced expression compared to X alleles, while Y alleles in the younger evolutionary stratum showed elevated expression compared to X alleles. Reduced expression of Y alleles in the older evolutionary stratum might be caused by accumulation of deleterious mutations in regulatory regions or transposable element-mediated methylation spreading. Most X-hemizygous genes exhibited either no or very low expression, suggesting that gene silencing might play a role in maintaining transcriptional balance between females and males.

www.nature.com/scientificreports/ Similar distribution of mX/fXX was observed among various levels of X/Y divergence, suggesting that dosage compensation and X/Y divergence evolved independently 14,18,19 . Partial dosage compensation was also observed in Rumex hastatulus that has cytologically distinguishable heteromorphic sex chromosomes 15 . The expression ratio of Y alleles relative to X alleles in male was significantly lower in the old evolutionary stratum than the one in the young evolutionary stratum, whereas the total expression of Y and X alleles in these two strata showed no significant reduction relative to the total expression of two alleles in female, suggesting that dosage compensation probably coevolved with X/Y divergence during sex chromosome evolution in R. hastatulus 15 . These studies in plants revealed that dosage compensation operates only on some genes instead of the entire non-recombining region. Genes in non-recombining regions degenerate gradually instead of becoming nonfunctionalized instantly during the evolution of sex chromosomes. Therefore, a gene-by-gene analysis is necessary to fully understand the evolutionary process of dosage compensation, especially for the young sex chromosomes in plant species. Sex chromosomes of papaya evolved 7 MYA, younger than the ones in R. hastatulus and S. latifolia 15,20 . With the complete genome sequence and annotation of non-recombining regions of sex chromosomes available, papaya offers an unprecedented opportunity to study effects of dosage compensation at early stages of sex chromosome evolution 20,21 . Papaya (Carica papaya) belongs to the family Caricaceae that contains six genera and 35 species. Among the 35 species, Vasconcella monoica is the only monoecious species that does not have sex chromosomes, while all the other species are either trioecious or dioecious 20,22 . Papaya is trioecious with three sex types (female, male, and hermaphrodite) controlled by a XY system. The two Y chromosomes controlling the development of hermaphrodites (Y h ) and males (Y) are slightly different 21,23,24 . The non-recombining regions of papaya sex chromosomes have been fully sequenced using a BAC (Bacterial artificial chromosome)-by-BAC strategy 21,23,24 . Two evolutionary strata, corresponding to two inversions on the Y chromosome, were identified in the papaya non-recombining region. The non-recombining region of the papaya Y chromosome is 8.1 Mb whereas its counterpart of the X chromosome is 3.5 Mb 21 . All the genes in the non-recombining regions have been annotated, making it possible to evaluate dosage compensation on a gene-by-gene level and to study the relationship between dosage compensation and evolutionary strata.

Results
Expression of genes in non-recombining regions of papaya sex chromosomes. In the nonrecombining regions of papaya X and Y chromosomes, 50 X/Y paired genes, 34 X-specific genes, and 22 Y-specific genes were annotated 21 . A straightforward test for dosage compensation is to compare the average expression of X-specific genes to the average expression of autosomal genes in males (mX/mAA). In general, dosage compensation is considered to be evolved if the ratio of mX/mAA is close to 1. However, among the 34 X-linked hemizygous genes without Y alleles (X-specific genes), only two of them were expressed in male flower tissues and one expressed in male leaf tissues, making it impossible to test dosage compensation in papaya using this method. We therefore chose X/Y paired genes (including pseudogenes) in the non-recombining region to test dosage compensation in papaya.
Expression ratios of X-linked genes to autosomal genes in male papaya tissues. We used RNA-Seq data of papaya male and female leaf and flower tissues to study whether dosage compensation has evolved in papaya. The global gene expression patterns were highly correlated between male and female in both leaf (Fig. 1a) and flower (Fig. 1b) tissues. However, expression levels of sex-linked genes were significantly lower than that of autosome genes (Wilcoxon rank sum test, p < 0.05) in both leaf and flower tissues ( Supplementary  Fig. S1). We then chose V. monoica, a close relative species of papaya without sex chromosomes, to calibrate the expression of genes on sex chromosomes and autosomes.
First, we compared the global gene expression between papaya and V. monoica. Overall, the gene expression in both male and female papaya leaves was correlated with their orthologs in V. monoica (Spearman's rank correlation, male rho = 0.66, N = 10,010 genes, p < 0.001; female rho = 0.67, N = 10,010 genes, p < 0.001) (Fig. 1c,d). The result demonstrated that most of the genes in papaya and V. monoica had conserved expression patterns. Similar correlation was also observed in both male and female flowers between papaya and V. monoica (Fig. 1e,f).
After calibration of gene expression using V. monoica, we compared median expressions of genes in sexspecific regions and genes on autosomes. No significant difference of median expression was detected between sex-specific region and autosome genes in either male leaf (Wilcoxon rank sum test, W = 253,910, p value = 0.352) or male flower (Wilcoxon rank sum test, W = 291,850, p value = 0.3981) tissues ( Table 1). The calibrated median sex-specific region/Autosome expression ratios were 1.064 and 1.125 in male leaf and male flower tissues, and 1.141 and 1.000 in female leaf and female flower tissues, respectively (Table 1). However, the calibrated mean sexspecific region/Autosome expression ratios were 0.502 and 0.842 in male leaf and male flower tissues, and 0.661 and 0.985 in female leaf and female flower tissues, respectively. Our result showed dosage imbalance between sex-specific region and autosome genes, and between female and male papayas. In general, genes in sex-specific region showed reduced expression in male tissues compared to their counterpart female tissues. However, the reduction was less than 50%, 24% in leaf tissue and 14.5% in flower tissue, indicating that partial compensation or compensation in some genes may exist. Different degrees of reduction in flower and leaf tissues may suggest that dosage compensation in papaya varies among tissues.
Expression ratio of sex-linked genes in male relative to female. One of the major aims of dosage compensation is to mitigate the effects of imbalanced gene dose in males and females. If no dosage compensation evolved, X and Y alleles are expected to show equalized expression in male and their average expression in male should be half of the expression of the two X alleles in female (mX = mY = 0.5fXX). If partial dosage www.nature.com/scientificreports/ compensation exists, mX > mY and mX > 0.5fXX. The average expression ratios of genes in sex-specific region between male and female (mXY/fXX) were initially calculated at 1.00 and 1.16 in leaf and flower tissues, respectively. Since X and Y alleles were highly identical, misalignment could happen during mapping reads to the reference genome sequence for male samples. For this reason, we recalculated the expression ratios using singlenucleotide polymorphisms (SNPs) to distinguish X and Y alleles. We also normalized read counts by total reads of each library, making it comparable across male and female libraries. The median expression ratios of mY/mX in leaf and flower tissues were 0.81 and 0.92, respectively, indicating decreased expression of Y alleles compared to their corresponding X alleles. A double-peaked distribution was observed for expression ratios of X alleles in male relative to female (Fig. 2a). A high peak was observed at 0.5, indicating a large number of X alleles are dosage uncompensated. The secondary peak was observed at 0.7 with a long tail reaching above 1, indicating www.nature.com/scientificreports/ partial dosage compensation has evolved for a subset of genes. In contrast, the expression ratio of Y alleles in male relative to their corresponding X alleles in female was centered at 0.4, lower than 0.5, indicating reduced gene expression of Y alleles (Fig. 2b). We further analyzed expression ratios of genes in the two evolutionary strata (corresponding to inversion 1 and inversion 2) and the collinear region. In the collinear region, the median log 2 transformed expression ratio of X alleles in male relative to female (mX/fXX) was around − 1 and the log 2 (mXY/fXX) was around 0, indicating that the expression of X alleles in male was about half of the expression of two X alleles in female and X and Y alleles shared similar levels of expression (Fig. 3a,b). In the inversion 1, the log 2 (mX/fXX) was around − 1 and the log 2 (mXY/fXX) was below 0, indicating that the expression of X allele in male was about half of the expression of two X alleles in female but Y alleles had reduced expression compared to their corresponding X alleles (Fig. 3a,b). In the inversion 2, the log 2 (mX/fXX) was slightly above − 1, but the log 2 (mXY/fXX) was above 0, significantly higher than the ones in the collinear region and the inversion 1, suggesting elevated expression of Y alleles relative to their corresponding X alleles in the inversion 2 ( Fig. 3a,b).
Although dosage compensation of whole sex-linked region was not observed in papaya, local compensation may exist. We analyzed the mX/fXX ratio of all expressed genes one by one in eight different tissues, including three leaf and five flower tissues ( Supplementary Fig. S2). One gene in the inversion 1, PYhCpXYh7 (its Y-allele is a pseudogene) showed dosage regulation, with a median expression ratio of mX/fXX at 0.986. Four genes in the inversion 2, CpXYh18, CpXYh23, CpXYh29 and CpXYh32, and two genes in the collinear region, CpXYh35 and CpXYh38, had a mX/fXX ratio close to 1 or higher than 0.75, suggesting that these genes were dosage compensated or partially dosage compensated. We also calculated the expression ratio of total X and Y alleles in male relative to the two X alleles in female. The log 2 (mXY/fXX) of most of the genes mentioned above were higher than zero ( Supplementary Fig. S3).
Gradual reduction of Y allele expression with sex chromosome evolution. We calculated expression ratios of Y alleles relative to their corresponding X alleles in male leaf and flower tissues. In the inversion 1, an older evolutionary stratum, Y alleles showed a relatively lower level of expression than X alleles (Fig. 4,  Supplementary Fig. S4). In the younger evolutionary stratum inversion 2 and the collinear region, X and Y alleles showed a similar level of expression. In general, the expression ratio of Y alleles relative to X alleles was related with evolutionary strata of the non-recombining region in papaya with few exceptions (Fig. 4). CpXYh2 in inversion 1 and CpXYh29 in inversion 2 showed Y-biased expression. Consistent with the above findings, the Table 1. The median and mean expression ratios of genes in sex-specific region to autosome genes in leaf and flower tissues of papaya male and female plants.    . The distribution of log 2 transformed expression ratio of Y alleles relative to X alleles in male samples (including three leaf replicates and five flower replicates) across the non-recombining region. Forty-five pairs of expressed genes were used in the analysis, including 18 genes in inversion 1, 16 genes in inversion 2, and 11 genes in the collinear region. The X-axis represents the position of genes in the non-recombining region of the X chromosome. Cyan and orange dots represent the log 2 value of average and median ratios between Y and X alleles, respectively.

Discussion
Genetic degeneration is a hallmark of sex chromosome evolution 14,15,25 . The decay of genes and gene activities on the Y or W chromosome may cause expression imbalance between sexes, and could negatively affect the fitness of heterogametic sex. Dosage compensation is a process to equalize genes expression between sexes and between genes on sex chromosomes and autosomes 10,11,14,26 . To investigate whether dosage compensation operates in papaya, we conducted comparative analysis on gene expression between different sex types using RNA-Seq data generated from papaya leaf and flower tissues. Dosage compensation has been well studied in human and mammalian systems. Compared to plant sex chromosomes, mammalian sex chromosomes are much older, evolved about 166-180 Mya 27,28 . The mammalian Y chromosome has degenerated and lost most of its original genes over evolutionary time, resulting in only a few functional genes retained 29 . Therefore, most X-linked genes are single copy in the mammalian heterogametic sex. Most dosage compensation studies in mammalian systems are based on assessment of relative expression of X-specific genes versus autosomal genes and relative expression of X-specific genes between homogametic and heterogametic sexes. However, this method cannot be easily applied into plant systems. Plant sex chromosomes are normally 'young' and their non-recombining regions are small 2 . In papaya, the non-recombining region of Y chromosome is 8.1 Mb and its X counterpart is 3.5 Mb 21 . A total of 50 paired X/Y genes, 34 X-specific genes and 22 Y-specific genes were annotated in the non-recombining regions of papaya sex chromosomes. Among the 34 X-specific genes, only three of them showed expression in leaf and/or flower tissues. Therefore, it is impossible to study dosage compensation by expression analysis of hemizygous X-specific genes in papaya. For this reason, we used X/Y paired genes to investigate dosage compensation in papaya. We used SNPs to distinguish X and Y alleles and used V. monoica, a close relative species of papaya without sex chromosomes, to normalize gene expression between sex chromosomes and autosomes.
Dosage compensation coevolved with sex chromosome degeneration. Plants with newly evolved sex chromosomes provide good opportunities to determine whether dosage compensation can evolve rapidly. Dosage compensation can be whole sex-linked region (acting on most genes in non-recombining regions) or local (acting on individual genes). In papaya, a bimodal distribution of the expression ratio of mX/fXX was observed with the main peak at 0.5 and the 2nd peak at 0.7, suggesting that dosage compensation was not established in the whole sex-linked region but might be acting on individual genes in papaya.
We further analyzed the expression of genes on different evolutionary strata. Although no dosage compensation of whole sex-linked region has been established in papaya, a gene-by-gene analysis revealed that at least seven genes were dosage regulated, including one gene (5%) in the older evolutionary stratum, four genes (21%) in the younger evolutionary stratum and two genes (13%) in the collinear region. Fewer dosage regulated genes were detected on the old evolutionary stratum, but the rate could be biased as the number of genes is too small. The Y allele of the only dosage regulated gene in inversion 1 is a pseudogene, while the Y alleles of the other five dosage regulated genes are functional genes. These genes might be dosage-sensitive genes and play critical functions. Dosage changes in these genes might cause severe deleterious effects or loss of fitness. In our study, the number of dosage regulated genes might be underestimated since dosage regulation can be achieved not only by adjustment of mRNA abundance but also by mRNA splicing, stability, or translation.
In the older evolutionary stratum, the expression level of X alleles in male was about half of that in female. And Y alleles showed reduced expression compared to their corresponding X alleles. In the younger evolutionary stratum, the expression level of X alleles in male is about half of that in female as observed in the older evolutionary stratum, but Y alleles showed relatively higher expression than their corresponding X alleles. The Y alleles diverged from X alleles as the sex chromosome evolved. They might have lost their function due to accumulated mutations but still be transcribed 21 . However, expression of a large number of malfunctional or nonfunctional Y alleles can be deleterious or costly because malfunctional transcripts may compete with the functional ones, which can lead to disturbance of normal functions. Even there is no harmful affect introduced, it is still not cost-efficient for organisms to maintain the nonfunctional transcripts. Although no significant correlation was observed between expression and Ks values of Y alleles, the overall expression of Y alleles in the older stratum showed a significantly higher level of reduction than the ones in the younger stratum in papaya. Due to the recombination suppression between the X and Y chromosomes, more deleterious mutations might have accumulated in Y alleles in the older evolutionary stratum than the ones in the younger evolutionary stratum 21 . Reduced expression of Y alleles in the older evolutionary stratum might be caused by accumulation of deleterious mutations in regulatory regions. Lenormand et al. 30 proposed a "degeneration by regulatory evolution" (DRE) theory to explain Y chromosome degeneration. According to this theory, coding sequences carrying more deleterious mutations were selected to become associated with weak Y cis-elements in order to mask deleterious mutations. Reduced expression of Y alleles in the older evolutionary stratum could also be caused by transposable element (TEs)-mediated methylation spreading. TE-mediated methylation can spread beyond the TE sequence and influence the expression of nearby host genes 31 . In fact, there are more TEs accumulated in Inversion 1 than Inversion 2 21 . Immunofluorescence assay also showed a slightly higher level of methylation signal in Inversion 1 than Inversion 2 32 Gene silencing can be another strategy to maintain transcriptional balance between sexes for the genes that could be substituted by other genes. In papaya sex-specific region, about 94.1% (32/34) of the hemizygous genes (X-specific genes) were transcriptionally silenced in leaf and/or flower tissues. The X-specific genes annotated in the papaya sex-specific region were located nearby a large highly methylated heterochromatin knob, which may change the DNA structure and suppress the expression of those genes 32 . The high percentage of silenced genes can also be the genes that have a very narrow expression span, which could escape detection due to their narrow spatial or temporal expression.
Several genes on the sex-specific region showed hyper transcription of Y alleles in papaya flower or leaf tissues. Elevated expression of Y alleles could be resulted from their specialization in male function. It can also be www.nature.com/scientificreports/ caused by haploid selection. Pollen experience extensive haploid selection, because they compete to access ovules to deliver the male gametes for fertilization 33 . Genes involved or expressed in haploid male gametophytes are less likely to be degenerated or lower their expression. Two genes in the inversion regions, CpXY h 2 and CpXY h 29, showed extremely higher expression of Y alleles. CpXY h 2 encodes the somatic embryogenesis receptor-like kinase (SERK), and CpXY h 29 encodes a phosphoacetylglucosamine mutase involved in DNA repair from UV damage. AtSERK was expressed in developing ovules and embryos in Arabidopsis, and it could enhance embryo competition in tissue culture 34 . Our results will lead to further investigation of haploid selection during evolution of sex chromosomes in papaya.

Materials and methods
Plant materials, total RNA extraction, and RNA-Seq library construction and sequencing. Papaya  Gene expression analysis. Papaya male and female RNA-Seq reads were mapped on papaya male and female reference genomes 20,21,24 using HISAT2 with default parameters 36 . The average identity of genes between papaya and V. monoica is about 93.1%. We therefore relaxed the stringency of mapping parameters when mapping the V. monoica reads on papaya genomes using HISTA2 (-p 10 --score-min L, 0.0, -0.7 --rdg 2,1 --rfg 2,1 --mp 3,1). TPM values were calculated for each library using StringTie 36 . Genes with TPM greater than 1 in at least one tissue were selected for dosage compensation analysis. TPM values of papaya genes were divided by TPM values of V. monoica genes individually for calibration.
Gene expression analysis of X-specific and Y-specific alleles in the non-recombining regions of papaya sex chromosomes. To distinguish the transcript reads between X and Y alleles in male samples, we first aligned the X and Y alleles of each gene using MUSCLE 37 , and then identified SNPs between them manually. We used the reads covering the SNP sites to estimate the expression levels of X-specific and Y-specific alleles. We mapped the RNA-Seq reads of papaya male samples onto the papaya female reference genome using HISAT2 36 . HaplotypeCaller of the GATK package was used to identify SNPs at the genome-wide level following the GATK best practice recommendations 38 . For SNPs in the non-recombining regions, the read counts of reference base and alternative base were used to estimate the expression of X-specific and Y-specific alleles, respectively. SNP loci identified in the first step were further validated whether they were homozygous in female and heterozygous in male plants. Reads matching these criteria were counted for allele-specific expression analysis. Read numbers of X and Y alleles at SNP sites within the open reading frame were summed up to calculate expression ratio between X and Y alleles. To make it comparable between libraries, the total read counts for each gene were normalized by the number of SNPs and library size using the calcNormFactors function in edgeR 39 .

Data availability
The RNA-Seq data used in this study have been deposited and publicly available in NCBI under Accession No. PRJNA555541.