Integrative genomics reveals paths to sex dimorphism in Salix purpurea L

Sex dimorphism and gene expression were studied in developing catkins in 159 F2 individuals from the bioenergy crop Salix purpurea, and potential mechanisms and pathways for regulating sex development were explored. Differential expression, eQTL, bisulfite sequencing, and network analysis were used to characterize sex dimorphism, detect candidate master regulator genes, and identify pathways through which the sex determination region (SDR) may mediate sex dimorphism. Eleven genes are presented as candidates for master regulators of sex, supported by gene expression and network analyses. These include genes putatively involved in hormone signaling, epigenetic modification, and regulation of transcription. eQTL analysis revealed a suite of transcription factors and genes involved in secondary metabolism and floral development that were predicted to be under direct control of the sex determination region. Furthermore, data from bisulfite sequencing and small RNA sequencing revealed strong differences in expression between males and females that would implicate both of these processes in sex dimorphism pathways. These data indicate that the mechanism of sex determination in Salix purpurea is likely different from that observed in the related genus Populus. This further demonstrates the dynamic nature of SDRs in plants, which involves a multitude of mechanisms of sex determination and a high rate of turnover.


Introduction
Major progress has been made in recent years in identifying the master regulators of sex determination in plants, but less is known about the transcriptional networks that account for sex dimorphism. Transcription factors 1 (Asparagus), small RNAs 2 (Diospyros), and cytokinin-response regulators (Actinidia and Populus) 3,4 have all been identified as mechanisms of sex determination in angiosperms. Development of separate sexes, however, requires that the genes controlling sex determination regulate the transcription of many genes and metabolic pathways in order to coordinate development of sex-specific characteristics, such as gametes and floral morphology. With the notable exception of Diospyros 5 , little is known about the metabolic pathways that are regulated by these sex-determination genes and the resulting transcriptome-wide expression differences.
Evidence from multiple angiosperm taxa, as well as leading sex determination models, suggests that in plants, sex is controlled by one or two regulatory factors, termed "sex determination" or "master regulator" genes 1,2,4,6,7 . These factors in turn lead to primary sex dimorphisms (andrecium and gynecium development) as well as secondary sex dimorphisms, such as floral volatile profiles, pigmentation, floral phenology, and organ morphology, and often involve both sex-linked and autosomal genes 8 . While primary sex dimorphisms are under direct control of the sex determination gene(s), secondary sex dimorphisms may be either under the control of sex-linked genes, or under direct control of the sex-determining genes themselves 7,8 .
Because their expression in the opposite sex may be deleterious, linkage between sex determination genes and genes controlling secondary sexual dimorphisms may be favored by natural selection. Such genes are termed "sexually antagonistic" 9 . As a result, it can be challenging to discriminate among sex determination genes and other sexlinked genes that influence sex dimorphisms.
In angiosperms, the maintenance of separate sexes typically involves factors controlling sex determination residing alongside linked sexually antagonistic genes controlling sex dimorphisms on heterogametic sex chromosomes, where typically one sex is heterogametic and the other homogametic 6 . Two heterogametic systems, XY and ZW, have been observed in angiosperms. The XY system, where the male is the heterogametic sex, tends to be more prevalent, and is found in 84.7% of diecious angiosperm species, including Asparagus officinalis, Carica papaya, Diospyros, and Phoenix dactylifera. In the less common ZW system, females are heterogametic, as in Fragaria and Silene otitis 10,11 . The Salicaceae family is of particular interest for use as a model in understanding the evolution of sex chromosomes and sex determination in plants, because the family is almost exclusively diecious, yet the sex determination region (SDR) has been remarkably dynamic 4,12,13 . The Salicaceae family exhibits both ZW and XY heterogametic systems and SDRs that are localized in different chromosomes across species. The SDR in S. purpurea has been localized to a 6.73-Mb pericentromeric region on chr15W, which includes approximately 2 Mb of sequence that is not present in the corresponding region of chr15Z 14 . Other Salix spp. in the section Vetrix also have a chr15 ZW SDR, including S. viminalis and S. suchowensis 15,16 , whereas S. nigra in the section Salix has a chr07 XY system 12 . In contrast, most Populus species have chr19 SDRs, including XY in P. trichocarpa 17 and ZW in P. alba 18 but, exceptionally, P. euphratica exhibits a chr14 XY SDR 19 . This indicates that sex determination has a complex evolutionary history in the Salicaceae, and that the SDR has shifted chromosomes as well as heterogametic systems, possibly through translocation, the rise of an entirely new locus becoming sexually antagonistic, or independent origins of diecy. Differing sex determination systems (as a result of sex chromosome turnover) in closely related taxa have been observed across both plants and animals. For instance, both XY and ZW systems are represented in Silene 10,11 . In cichilid fish, multiple sex determination loci and systems have been identified in specific populations 20 . Further, examples of a sex determination region evolving independently in related taxa include: sticklebacks, medaka fishes, and true frogs (Ranidae) 21 . Thus, even closely related species with similar genomic regions involved in sex determination, as in the Salicaceae, may have evolved independently and can involve unique mechanisms.
The Salicaceae family contains many species of economic importance in the genera Populus and Salix. Shrub willows (Salix section Vetrix) in particular, are grown throughout North America and Eurasia for bioenergy and bioproducts 22 . Despite its commonality across the Salicaceae family, diecy presents a challenge for breeding efforts and the cultivation of shrub willow, with sex showing linkage to biomass-related traits, such as leaf area 23 and catkins showing distinct phenology and secondary metabolite profiles between sexes, affecting pollinator and pest attraction 24,25 . There is a strong interest in understanding the genetic mechanisms controlling sex determination in Salix, along with the gene pathways involved in sex dimorphism, in order to advance current breeding efforts and genetic studies to improve Salix as a bioenergy crop. Nevertheless, there is still relatively little data characterizing sex-determination genes and sex dimorphism pathways in Salix, despite substantial research and identification of putative master regulators of sex in the related genus Populus 4,18,26,27 . Moreover, willows are typically insect pollinated, whereas members of the genus Populus are wind pollinated and show little evidence of sex dimorphism in vegetative traits 28 that may point to unique pathways and genes involved in sex dimorphism and underscore the need for sexdetermination research that is specific to Salix.
A previous study of transcriptomic data in Salix identified differentially expressed genes associated with sex in shoot tips containing floral primordia 29 . It is hypothesized that sexually dimorphic genes are regulated through complex pathways that are ultimately controlled by one or more elements in the SDR, most likely via transcription factors, plant hormones, and/or small RNAs. This would be consistent with characterized sex determination systems in plants [1][2][3][4] . Elements in the SDR controlling sex determination are termed "master regulator genes" and likely control several top-level regulatory genes that may or may not be located in the SDR, which in turn regulate both primary and secondary sex characteristics through intermediate metabolic pathways, as described by Feng et al., 2020 11 . Unfortunately, identification of master regulator genes in Salix purpurea is complicated by an SDR that comprises nearly 40% of chr15W and contains 488 linked genes, including repetitive regions, and tandem duplications 14 , requiring the use of transcriptomic data and coexpression analyses to characterize candidate genes.
This study captures the transcriptome-wide primary and secondary sex dimorphism profile in emerging inflorescences, which contain hundreds of achlamydeous flowers (lacking both sepals and petals) across a range of developmental time points, in addition to exhibiting distinct terpenoid and phenylpropanoid profiles leading to pigmentation and volatile organic compound emission dimorpism 24 . Using eQTL analysis, we associated the expression levels of differentially expressed genes in catkins with genomic loci. We identified multiple genes associated in trans with the SDR that could serve as toplevel regulators of primary and secondary floral sex dimorphisms under direct control by master regulator genes, as well as enriched pathways predicted to serve as intermediate pathways involved in sex dimorphism. Furthermore, based on these multi-omics results, six gene families are presented as candidates for the master regulators of sex: homologs of Arabidopsis GATA15, ARR17, AGO4, and DRB1, three genes coding hypothetical proteins, and a CCHC zinc finger. Characterizing the master regulator genes and the mechanisms of sex determination in willow provides insight into the complex evolution of diecy in the Salicaceae. These data represent one of the most comprehensive studies of sex dimorphism expression in a diecious crop plant, incorporating RNA expression, genotyping-by-sequencing (GBS), differential methylation, and small RNA expression, and provide a valuable addition to the nascent body of knowledge on sex-determination mechanisms in crop plants.

Differential expression analysis
Principal component analysis showed a clear separation of male and female genotypes based on transcriptome-wide expression (Fig. S1). A total of 36,518 gene models, including alternative transcripts, accounting for 63.6% of all genomewide protein-coding transcripts, were expressed in floral tissue across the 159 samples, with 24,074 (66%) showing significant differential expression between males (M) and females (F) (FDR ≤ 0.05) (Fig. 1, Fig. S2). There were 4676 genes with log 2 M:F ≥ 1 (male-upregulated), while only 3247 genes with log 2 M:F ≤ − 1 (female-upregulated).
Gene Ontology enrichment was performed on the male and female differentially expressed (DE) genes showing at least twofold expression differences between sexes (Tables S1-S4). Gene Ontology enrichment for the male-upregulated genes showed a significant overrepresentation of 76 terms, including pollen and anther development, male gamete development, the terpenoid biosynthesis pathway, and cytokinin metabolism (Table S1). The male-upregulated gene set showed an underrepresentation for 71 terms, including terms relating to transcription and RNA regulation, splicing, and modification (Table S2). Among the female-upregulated genes, there was an overrepresentation of RNA transcription and metabolism and shoot and organ development (Table  S3), and an underrepresentation of 47 terms, including cell metabolism and biosynthesis (Table S4).

Small RNA identification and analysis
A total of 266,272 small RNA (smRNA) loci were identified, 146,807 of which contained smRNAs in the range of 20-25 nucleotides, the canonical length of siR-NAs and miRNAs. Forty-five miRNA precursor loci representing 34 unique final miRNAs were predicted based on putative stem-loop structures consistent with miRNA biosynthesis (Table 1, Fig. 1, Table S5, Fig. S3). Forty of the 45 loci had identity with known and annotated plant smRNAs in the PmiREN database (www. pmiren.com), including 27 with matches to miRNAs in Populus trichocarpa. psRNATarget (plantgrn.noble.org/ psRNATarget) was used to identify putative target genes for all miRNAs. Differential expression analysis on the miRNA loci revealed that 25 were significantly differentially expressed (FDR ≤ 0.05), including 18 with greater expression in males and seven with greater expression in females. Among the miRNA loci were two maleupregulated copies of miR172, which targets the MADSbox gene APETALA2 in Arabidopsis 30 and four maleupregulated copies of miR156, one of which is located in the SDR.

Bisulfite sequencing and analysis
A total of 604,150 methylated regions were identified using DMRfinder after combining nearby methylated sites. Of these sites, 2018 show differential methylation, including 1465 with greater methylation in males and 553 with greater methylation in females (Fig. 1, Table S6). In total, 170 genes were identified with differentially methylated sites in their putative promoter regions (between 1 and 1000 bp ahead of transcription start site) (Table S7).

WGCNA network analysis and GO enrichment
WGCNA network analysis was performed to explore pathways that may be involved in floral sex dimorphism and development. Twenty-four modules were generated based on gene expression similarities (Fig. 2, Table 2). In total, 17,953 genes were assigned to modules, accounting for 49.8% of all expressed genes and alternative transcripts. The remaining unassigned genes were placed in the "gray" module. Three modules accounted for the majority of assigned genes: the "purple", "cyan", and "brown" modules. The "purple" module was strongly correlated to the female sex (r 2 = 0.97) and captured 30.2% of all female-expressed genes ( Table 2), this module likely represents the genes involved in primary and secondary female sex dimorphism. In the "purple" module, 17 GO terms were enriched, including RNA and nucleic acid metabolism and regulation, photosynthesis, and phenylpropanoid metabolism (Table S8). This is consistent with the female differential expression of the majority of genes in this module (Table S3) and supports the contention that this module may be responsible for female sex-dimorphic traits. The cyan module was the most male-correlated (r 2 = 0.92) and included 45.5% of all male-upregulated genes likely representing the genes in primary and secondary male sex dimorphism pathways. The "cyan" module contained 230 significantly enriched GO terms, notably multiple terms related to pollen development (Table S9). The "brown" module consisted mostly of genes not showing differential expression, and probably represents gene pathways involved in basic cell and biological processes.

eQTL analysis
A total of 1,381,813 cis-eQTL and 811,499 trans-eQTL (FDR ≤ 0.05) were identified after accounting for  covariates. Notably, there appears to be an eQTL "hotspot" on chr07, to which the expression levels of 550 genes are associated (Fig. 1) The exceptional number of genome-wide eQTL at this locus suggests that it may have a major role in regulating sex dimorphic gene expression in catkin tissue. Expression levels of 2127 genes were found to be associated with polymorphisms in the SDR, of which 1686 were trans (Fig. 1). To identify top-level regulatory and intermediate pathway genes, 2127 genes with eQTL in the SDR were a subset for genes with involvement in secondary metabolism (22), hormone signaling (16), RNA splicing and regulation (4), or transcription (96) (Table S10). Of the 96 genome-wide transcription factors found to have SDR eQTL, 70 were differentially expressed in either males or females. Because these transcription factors include genes relating to floral development, phenylpropanoid production, and cytokinin signaling, they are candidates for top-level regulatory genes that may regulate further downstream expression. However, confirmation of such roles will require further investigation using methods such as ChIP-Seq and DAP-Seq. Fourteen MADS-box and floral development genes, 15 phenylpropanoid pathway genes, and five terpenoid pathway genes were also found to have eQTL in the SDR (Table S10), representing candidates for intermediate pathway genes directly responsible for dimorphisms in floral morphology, pigmentation, and volatile and secondary metabolite profiles.

Candidate master regulator gene identification
We identified eleven genes that are strong candidates as master regulators of sex using the following criteria: (1) presence on chr15W and absence from chr15Z, (2) a significant log 2 M:F < − 1, (3) presence in the femalecorrelated "purple" WGCNA module, and (4) gene annotation either consistent with a possible floral sex dimorphism pathway or of unknown function. Genes meeting all four of these criteria are expected to be present only in females and have expression levels and module membership that would implicate them in sex dimorphism. Four copies of ARR17, a truncated AGO4 gene, DRB1, GATA15, a CCHC zinc finger nuclease, and three genes coding hypothetical proteins met these criteria and were identified as candidate master regulator genes (Table 3).

Discussion
We have used a combination of differential expression, coexpression, and eQTL analyses to identify genes that are candidate master and top-level regulators of sex expression in S. purpurea floral tissue.

RNA-Seq and differential expression
Total RNA-Seq and small RNA-Seq captured the unique sex-specific transcriptomic profiles during catkin development, after floral meristem differentiation and prior to maturation of any stamens or pistils. Within a single maturing catkin, there are hundreds of individual flowers across a range of developmental stages, resulting in pooled expression data from across floral development time points as well as tissue types (i.e., anthers/pistils, floral bracts, and peduncles). In addition to the primary sex dimorphism genes responsible for anther and carpel development, this enables the identification of secondary sex dimorphisms, such as genes involved in pigmentation, volatile emission, and differences in catkin phenology, which can also inform differences in vegetative emergence and secondary metabolites. By using network analysis and incorporating genomic data through eQTL, we can hypothesize how the SDR may regulate differential gene expression in catkins. Nearly two-thirds of all expressed genes in the floral tissue exhibited differential expression Table 2 All WGCNA modules with the total number of genes belonging to each, along with the number of differentially expressed genes; the gray module contains all genes that were unable to be assigned to a specific module between males and females. This number is due in part to the large sample size of 159 individuals, whereby there is enough statistical power to detect even slight differences in expression. Nevertheless, over 21% of the expressed genes showed at least twofold expression differences between sexes, providing evidence of global expression differences, which would require robust transcriptional regulation, ultimately leading back to the sex-determinant genes in the SDR. These genes provide important clues about the regulation of sex determination in this species and the molecular mechanism responsible for diecy and floral sex dimorphism, as described in more detail below.

Type A response regulator (ARR17)
Four copies of ARR17, a type A cytokinin-response regulator, in the SDR, show high levels of expression in female S. purpurea: Sapur.15WG073500, Sapur.15WG073900, Sapur.15WG074000, and Sapur.15WG075200. Two additional copies of ARR17 are present on chr19 but are not differentially expressed. The cytokinin signaling pathway has been proposed as a common pathway for sex determination in angiosperms 10 . Cytokinin-response regulators serve as feminizing factors in Actinidia, where they are master regulators 31 , and Diospyros, where they act as top-level regulators downstream of the SDR 2,5 . There is recent evidence implicating ARR17 as the master regulator of sex in the closely related genus Populus, where it may function as a feminizing factor whose expression is suppressed in males by small RNAs 4 .
The presence of two complete copies of ARR17 on S. purpurea chr19, expressed in both males and females, suggests that the dosage of ARR17 (eight copies in females vs. four in males) may play a role in sex determination in willow. Interestingly, these findings suggest a different mechanism for ARR17 than the leading model for sex determination in Populus proposed by Müller et al.
(2020) 4 . They proposed that functional ARR17 in P. alba (ZW) is a feminizing factor, and in XY species, ARR17 is silenced by inverted repeats on the Y chromosome through the RNA-directed DNA methylation (RDDM) pathway, leading to a male phenotype. To confirm this, they silenced the ARR17 gene in an early-flowering female line and observed male flowers in tissue culture 4 . We found no evidence of an ARR17 RNA interference mechanism in S. purpurea catkins. Salix purpurea has a similar truncated inverted repeat of ARR17 on chr15Z, but we did not observe small RNAs mapping to the ARR17 genes and their proximal regions, nor to the ARR17 homologs on Salix chr19. There were also no differential methylated regions in the putative promoter regions of any of the ARR17 genes, and S. purpurea males show expression of the ARR17 copies on chr19, whereas in P. trichocarpa males, there is no ARR17 expression. Furthermore, Carlson et al. (2017) did not find that ARR17 was differentially expressed in shoot tips containing floral primordia, indicating that this mechanism is not present at an earlier floral development stage either 29 . Taken together, these results suggest that the RNA- interference mechanism of ARR17 may be absent in S. purpurea.
The observation of ARR17 expression in both male (chr19) and female (chr19 and chr15W) S. purpurea, combined with a lack of small RNA loci in these same regions, demonstrates that the Salix sex-determination mechanism is likely different from the model proposed by Müller et al. (2020) 4 . Instead, our data suggest that if ARR17 is a master regulator in S. purpurea, it likely involves a unique mechanism, possibly through gene dosage, such that a threshold of ARR17 expression must be reached to activate a switch from male-to-female development. Alternatively, there may be another feature in the S. purpurea SDR that is suppressing this silencing mechanism, one such candidate is the adjacent AGO4 homolog described below.

ARGONAUTE 4 (AGO4)
A single copy of an Arabidopsis AGO4 homolog, Sapur.15WG074400, is present within the ARR17-inverted repeat region of the chr15W SDR 14 that exhibits a log 2 M:F expression of −7.94, and has a cis-eQTL in the SDR. AGO4 is a component of the RNA-induced silencing complex (RISC) in the RNA-dependent DNA methylation (RDDM) pathway, where it binds small RNAs and silences mRNA 32 . In the bisulfite sequencing data, nearly three times as many regions showed increased methylation in males (1465) compared with females (553), supporting that methylation activity is downregulated in females and may have a role in mediating sex dimorphisms (Fig. 1). The SDR AGO4 gene appears to be truncated to only 79 amino acids in length compared with five other catkinexpressed AGO4 homologs in S. purpurea, which are 893-922 amino acids, and has multiple indels and substitutions when aligned (Fig. S4A). The most similar AGO4 paralog to Sapur.15WG074400 by MUSCLE multiple-sequence alignment is Sapur.008G00580 which has a nearly sevenfold greater expression in males (Fig.  S4B, Table S11). We speculate it is possible that the truncated version of AGO4 is interfering with expression of the full-length Sapur.008G00580 in males by a long noncoding RNA. This could have wide-ranging effects on sexually dimorphic gene expression and could explain the decreased genome-wide methylation observed in females. The findings from the bisulfite-sequencing data indicate that methylation is globally reduced in females. We hypothesize that the Sapur.15WG074400 could be competing for binding of siRNAs with a full-length AGO4 and sequestering male-specific RDDM in females. These global methylation differences could be responsible for sex determination, such as in Melandrium album where demethylation of male plants results in monoecy, with no effect on female plants 33 . Such a mechanism could also explain ARR17 expression levels, and why no small RNAs were observed mapping to ARR17 in Salix, despite evidence for this mechanism in Populus.

Double-stranded RNA-binding protein 1 (DRB1)
A copy of a DRB1 homolog, Sapur.15WG074300, is also located in the ARR17 inverted repeat region of the chr15W SDR and is adjacent to AGO4. In Arabidopsis, DRB1 is involved in RNA-mediated post-transcriptional silencing, working directly with DCL1 in miRNA processing 34 . RNA modification, RNA metabolic process, and regulation of transcription were all among the enriched GO terms in the female differentially expressed genes and could be regulated directly or indirectly by DRB1.

Transcription factor GATA15
A female-expressed homolog of GATA15, Sapur.15WG 062800, is located in the W-specific region of the SDR and shows a cis-eQTL association with polymorphisms on Chr15. GATA15 is a transcriptional regulator that binds GAT or GATA motifs in gene promoters and is involved in cell differentiation, morphogenesis, and development 35 . This is consistent with the GO enrichment analysis of female-expressed genes, which contains many significant terms related to morphogenesis and development. Furthermore, chr15 GATA15 was found by Carlson et al. (2017) to be differentially expressed in F 1 S. purpurea shoot tips containing floral primordia 29 , suggesting that this may be the earliest cue for floral sex differentiation, which would implicate it as a master regulator gene. While functional genomics data are required to elucidate its precise function, its expression in females both during floral differentiation and catkin emergence suggests that it could be directly involved in gynecium development.

Genes of unknown function
Four genes were identified that fit the criteria for candidate master regulator genes, but whose functions are not known or whose annotations are insufficient for further analysis. These included Sapur.15WG068800, a CCHC-type zinc finger, and three hypothetical proteins: Sapur.15WG075300, Sapur.15WG074900, and Sapur.15 WG075700.
While there is mounting evidence pointing toward ARR17 as the master regulator in Populus spp 4,26,27 , the evidence for different expression profiles of ARR17 in Salix, as well as the presence of additional candidate genes, suggests that the mechanism may be more complicated or altogether different in Salix. Nevertheless, expression data from the ARR17 homologs in S. purpurea do support a possible role in sex determination, either as a single gene master regulator or part of a two-gene system in conjunction with another master regulator, and would provide further evidence to support cytokinin response as a common mechanism for diecy in angiosperms, as suggested by Montalvão et al 10 . Further functional genomics studies will be necessary to elucidate the precise functions of candidate master regulators and their role in sex determination.

SDR regulation of floral sex dimorphism
Among the floral development genes with eQTL in the SDR were homologs of AGL11 and AGL32 (ovule development and seed formation) 36 , AGL29 and AGL30 (pollen development) 37 , and AGL6 (floral meristem differentiation and gamete development) 38 , as well as TOC1, WOX1, RGA, and CONSTANS 39 (Table S10). While differential expression of MADS-box genes is expected in floral tissues, their association with the SDR through eQTL, even after accounting for sex as a covariate, suggests that the SDR may have a direct role in controlling expression of these genes and subsequent primary sex dimorphisms. TOC1 and CO regulate circadian rhythm and day-length responses, while RGA represses floral growth in the absence of gibberellic acid 39 . WUSCHEL (WOX1) genes have a well-characterized function in meristem organization and have been shown to interact with AGAMOUS in floral meristem development 39 . AGL11 is important in ovule and seed development, and has been shown to interact with cytokinin to control fruit size 36,39 . These floral development genes may have specific roles as "entry points" for the SDR into the floral development pathway to regulate the development of a particular sex (Fig. 3). Cronk and Müller (2020) proposed that ARR17 may act as a feminizing master regulator in Populus through the suppression of PISTILLATA (PI) or APETALLA3 (AP3) MADS-box genes 27 . Importantly, there was no association of either PI or AP3 expression with genomic variation in the SDR, despite the fact that PI shows very high levels of expression in males (log 2 M:F 13.21) (Table S11), further suggesting that the mechanism of sex determination in Salix may be different from Populus. eQTL analysis revealed several loci with an exceptionally high number of trans-eQTL (Fig. 1). Intriguingly, the hotspot with the greatest number of trans-eQTL is located in the region on chr07 homologous to the S. nigra SDR 12 , which could implicate its role as an ancestral SDR in Salix, or explain a fitness advantage of a chr07 SDR in S. nigra when linked with sex-dimorphism genes in this region. Approximately 250 kb from this locus on chr07 is Sapur.007G068100, a homolog of AGL32, a MADS-box gene involved in ovule development 36 . The expression of this gene is associated in trans with the chromosome-15 SDR region, further supporting its role as a top-level regulatory gene under direct regulation by the SDR.
Compounds produced from the phenylpropanoid and terpenoid pathways are well-characterized in Salicaceae, and there is evidence that floral volatile, terpenoid, and phenolic glycoside profiles differ substantially between males and females, which affects both pollinator attraction and herbivory, traits that are likely to be evolutionary drivers of diecy and affect cultivar yield 24,25 . In support of this, we identified five terpenoid pathway genes and 15 phenylpropanoid pathway genes with eQTL in the SDR. These include both genes involved in the core phenylpropanoid pathway, and biosynthesis of specific compounds, including naringenin, flavenol glucosides, and sesquiterpenoids (Table S10). This provides evidence supporting a direct link between the SDR and synthesis of these compounds, by an as-yet unknown mechanism.

Small RNA regulation of sex dimorphism
Of the 45 identified miRNA loci, 18 were male differentially expressed and seven were female differentially expressed. Among the putative targets of these miRNAs were dicer-like genes, squamosa promoter-like genes, and  Table 1) providing evidence that miRNAs are likely to be a component of floral sex-dimorphism regulation. Notably, five miRNAs were identified that had no match with any small RNAs in the PmiREN database and could represent genus-or species-specific micro-RNAs. Furthermore, 13 miRNA loci with matches in the pmiREN database were not matched to a known P. trichocarpa miRNA, which is the closest species for which extensive small RNA data are available (Table 1). This suggests that S. purpurea may utilize different sets of small RNAs in floral development relative to Poplar, which likely has implications on sex dimorphism and determination. Expression results showed that four miR156 and two miR172 homologs have greater expression in male floral tissue. In Arabidopsis, miR156 and miR172 interact to form a gradient that regulates vegetative-to-floral meristem transition through the targeting of squamosa promoter-like genes (SPL) 30 . While all copies of both miR172 and miR156 show malebiased expression in catkins, the overall expression of miR156 is greater than miR172 in males. The majority of SPL genes that are targeted are female-upregulated in S. purpurea, including an SPL4 homolog on chr07 (Sapur.007G123900) that also shows increased methylation in males in its promoter region. These data support that the miR156/172 pathway is upregulated in male catkins and may be responsible for sex dimorphisms (Table S11). This pathway may play a role in male floral tissue development or differentiation.
Importantly, one copy of miR156 is located in the SDR region unique to chr15Z. Alignment of the chr15Z miR156 precursor sequence to chr15W reveals that a single indel is responsible for this chr15Z-specific mapping (Fig. S5), which may prevent transcription or processing of this small RNA from chr15W. This could indicate a dosage dependent response in males, which have four copies of this mature miR156 homolog, compared with only three in females. Female-upregulated miRNAs included miR403, which targets AGO2, and miR162, which targets a dicer-like gene. All of these are involved in small RNA signaling and DNA methylation, which is consistent with the enrichment of transcription regulation terms in the female-upregulated genes. This may point toward a role of genome-wide DNA methylation or RNA silencing in regulating sex dimorphism, possibly mediated by AGO4 or DRB1.

Conclusion
Taken together, RNA-Seq, small RNA-Seq, bisulfite sequencing, and eQTL mapping results suggest that the sex-determination mechanism in willow is different from the small RNA-mediated, single-gene-sex determination pathway proposed in Populus, and reveals that many miRNAs, transcription factors, floral development genes, and secondary metabolism genes are involved in primary and secondary sex-dimorphism pathways in Salix purpurea catkins. A major factor inhibiting functional genomics research in Salix is the lack of a functional transformation system, which precludes direct assessment of gene function in transgenic plants. We show here that using a combination of genome-wide differential expression, coexpression, eQTL, small RNA, and genomic variation can provide strong evidence supporting the involvement of these proposed candidate master regulator genes in sex determination.

Plant material and growing conditions
The F 1 family (Family 82) was generated from a cross between female S. purpurea clone 94006 and male clone 94001, both collected from different naturalized S. purpurea populations in Upstate NY (Fig. S6) 42 . Details on phenological stage of catkin collection can be found in the Supplementary Methods.

Library preparation and sequencing
Catkins of 90 males and 90 females from the 317 F 2 family were collected for RNA-Seq, which, after removing individuals with poor mapping quality, was reduced to 77 females and 82 males. Reads were mapped using STAR 43 (Table S12), and counts assigned using featureCounts 44 in R. DESeq2 were used to identify differentially expressed genes 45 . A total of 22 male and 21 female F 2 progeny, along with the male (94001) and female (94006) grandparents, were collected for small RNA sequencing. Mapping and identification of putative miRNAs was done using ShortStacks 46 and differential expression determined using DESeq2 45 . Six male and six female F 2 progeny, along with twelve male and twelve female unrelated S. purpurea from a diversity panel, were collected for bisulfite sequencing. Reads were mapped using Bismark 47 and differentially methylated regions (DMR) determined using DMRfinder 48 . GBS data were collected on all 317 family F 2 progeny and used to call variants in TASSEL 5.0 49 . The RNA-Seq mapping and downstream analysis pipelines are summarized in Fig. S7. All read mapping for RNA-Seq, small RNA-Seq, GBS, and bisulfite reads was conducted using the version 5.0 Salix purpurea reference genome available from Phytozome 13. Full details on library prep, sequencing, and alignment to the reference genome can be found in the Supplementary Methods.

eQTL analysis
The R package, MatrixEQTL 50 , was used to map eQTL. Covariates were included for sex, along with the four largest principal components in order to control for the underlying data structure and minimize false-positive eQTL mapping to the SDR. Each unique SNP/gene association is considered separately as an eQTL, independent of any additional genes that may be associated with that SNP. Cis-associations were classified as those SNPs within 1 Mb of their associated gene based on the recommended settings of MatrixEQTL, while transassociations were classified as greater than 1 Mb or on different chromosomes. SNPs within the SDR, spanning from 2.34 Mb to 9.07 Mb on chr15W and 2.34 Mb to 6.70 Mb on chr15Z 14 , were also considered to be cis to all other genes within the SDR, regardless of distance, due to low recombination in this region.

WGCNA network analysis
Library-normalized FPKM expression values were calculated from the raw count data using the edgeR package 51 . Weighted gene coexpression network analysis (WGCNA) 52 was used to construct modules of genes with similar expression values. A clustering of FPKM expression values was generated using hclust, which showed that two male samples (10X-317-161 and 10X-317-020) displayed extreme differences in overall expression patterns from all of the other samples (Fig. S8), and so were removed from downstream analyses. A topological overview matrix was created to perform module identification. In order to correlate sex with module expression, sex was coded numerically with values of −1 for females and 1 for males; thus, a module r 2 near one indicates strong correlation with males, and a module r 2 near-negative one indicates strong correlation with females. The minimum module-size threshold was set at 30 genes per module. Identified modules were merged based on similar expression levels. A total of 32 unmerged modules were merged into 24 modules based on similar expression patterns.

Gene ontology enrichment and pathway analysis
Gene ontology (GO) enrichment at the lowest levels of the GO hierarchy was calculated in the BiNGO app 53 in Cytoscape using the Arabidopsis homolog gene models. Arabidopsis gene models were used as there are limited data available on gene function in Salix, and gene functional annotations of nonmodel species are often ultimately based on experimental evidence from Arabidopsis. Furthermore, using Arabidopsis gene models provided compatibility with existing databases, such as TAIR. The background reference was created from the list of Arabidopsis homologs for every transcript model in the S. purpurea 94006 v5.1 genome found in the gene annotation information file on Phytozome (https://phytozome-next. jgi.doe.gov/info/Spurpurea_v5_1). The Arabidopsis homologs of the genes present in each module or male: female differential expression cutoff (inclusive of all genes above or below the cutoff) were used as the query to determine overrepresented and underrepresented parent and child terms for each gene set. Descriptions of each over-and underrepresented Gene Ontology ID were generated by BinGO based on TAIR descriptions 39 .