Comparative transcriptome analysis of Indian domestic duck reveals candidate genes associated with egg production

Egg production is an important economic trait and a key indicator of reproductive performance in ducks. Egg production is regulated by several factors including genes. However the genes involved in egg production in duck remain unclear. In this study, we compared the ovarian transcriptome of high egg laying (HEL) and low egg laying (LEL) ducks using RNA-Seq to identify the genes involved in egg production. The HEL ducks laid on average 433 eggs while the LEL ducks laid 221 eggs over 93 weeks. A total of 489 genes were found to be significantly differentially expressed out of which 310 and 179 genes were up and downregulated, respectively, in the HEL group. Thirty-eight differentially expressed genes (DEGs), including LHX9, GRIA1, DBH, SYCP2L, HSD17B2, PAR6, CAPRIN2, STC2, and RAB27B were found to be potentially related to egg production and folliculogenesis. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis suggested that DEGs were enriched for functions related to glutamate receptor activity, serine-type endopeptidase activity, immune function, progesterone mediated oocyte maturation and MAPK signaling. Protein–protein interaction network analysis (PPI) showed strong interaction between 32 DEGs in two distinct clusters. Together, these findings suggest a mix of genetic and immunological factors affect egg production, and highlights candidate genes and pathways, that provides an understanding of the molecular mechanisms regulating egg production in ducks and in birds more broadly.

www.nature.com/scientificreports/ and ovulation, requires many genes to act synergistically. Tao et al. 5 conducted a comparative study on high and low egg laying Jinding ducks using tissue from the medulla of the ovary, which revealed multiple genes that are involved in, circadian rhythm and estrogen receptor binding. Sun et al. 6 performed a comparative transcriptome study on Longyan Shan-ma ducks and reported several DEGs and pathways related to ovarian development and egg production. However the potential genes involved in egg production in duck remain unclear and these studies 5,6 have analysed only a few samples from high and low egg producing ducks. Therefore, it is necessary to investigate more samples and additional breeds to provide a comprehensive view of the molecular mechanism underlying the egg production in ducks.
In the present study we analysed the ovarian transcriptome of high and low egg laying Chara ducks to identify the genes associated with egg production for the improvement of egg production. The results of this study will further the understanding of the gene regulatory mechanisms influencing egg production and laying performance. The genes identified in this study represent candidate genes for marker-assisted selection studies for enhancing egg production in ducks.

Results
Phenotypic features. The body weight and egg weight were not significantly different between HEL and LEL ducks (Fig. 1b,c). However, the difference in the total number of eggs laid by HEL and LEL ducks was significant (Fig. 1d). On average HEL ducks laid 212 eggs more than the LEL ducks over 93 weeks.

Characteristics of duck ovarian transcriptome.
To analyse the transcriptome changes of the ovary of HEL and LEL ducks, and to identify the genes associated with egg production, 11 ovarian transcriptomes were sequenced. In total, 806,967,717 million raw reads were generated with an average of 73,360,701 million reads per sample. The number of clean reads ranged from 50,718,335 to 118,504,555 after filtering out ribosomal RNA, adapters, and low quality reads (Table 1). Mapping reports from STAR revealed that 71.4-84.3% of the reads were mapped to the reference duck genome. The mapping rate was comparable to previous transcriptome studies 12,13 . On average 80.5% of reads mapped to annotated genes in the genome. Principal component analysis (PCA) was performed using normalized expression data to understand the grouping and variability between the   Table 2). Of these 310 and 179 genes were upregulated and downregulated in HEL ducks respectively (Fig. 3a). The top upregulated genes in HEL included FAM19A4, LHX9, GRIK3, LRFN2, GUCA1B, SLC26A9 and the genes upregulated in LEL (downregulated in HEL) included P2RX6, TNNT2, ISG15 and LY96. Among the top upregulated genes, three were uncharacterized genes (ENSAPLG00000001875, ENSAPLG00000002685, and ENSAPLG00000002064) and among the top downregulated genes, the top four genes were uncharacterized (ENSAPLG00000004628, ENSAPLG00000006301, ENSAPLG00000000489, and ENSAPLG00000004923). Hierarchical clustering of the DEG's showed that, the samples clustered by the group (HEL and LEL) and the DEGs clustered into two major clusters (Fig. 3b). Further, the putative functions of the DEGs were analysed using blast, against the GenBank and UniProt databases, and 38 genes were found to be associated with egg development, folliculogenesis and reproductive functions (  None of the GO categories in the biological process domain was statistically enriched after correcting for multiple hypothesis testing (Supplementary Table 3). These mildly enriched biological process categories, however, highlight genes associated with immunity (e.g., viral entry into host cells (p = 9.5E − 5, FDR = 0.338), gonad morphogenesis (p = 7.05E − 4 and FDR = 0.476), glutamate metabolic process (p = 2.15E − 04, FDR = 3.04E − 01), negative regulation of viral genome replication (p = 3.69E − 04, FDR = 4.76E − 01)) and regulation of hormone levels (p = 4.67E − 4 and FDR = 4.14E1). Only a single gene, LIM homeobox 9 (LHX9) was associated with gonad morphogenesis which was among the most strongly differentially expressed genes (log2FC = 2.07 FDR = 3.01E − 05). MX1 gene was one of nine genes associated with viral entry into host cells and is also differentially expressed (log2FC = − 1.08, FDR = 0.008). Genes associated with viral entry into host cells and other immune-related patterns tend to be upregulated in LEL ducks (5/9 genes enriched under viral entry into host cell GO term). Moreover, functional classification of DEGs according to protein family domains based on InterPro database revealed that 17 DEGs were enriched under Immunoglobulin subtype. In addition to these, various other genes including, FAM19A, dopamine receptor D2 (DRD2), Ras and Rab interactor 2 (RIN2), LHX9 that are associated with reproductive developmental process like egg maturation, folliculogenesis and gonad morphogenesis were also differentially expressed (  Table 4) including, Rap1 signaling pathway (hsa04015, adjusted p = 0.034, 11 genes), Glutamatergic synapse (hsa04724, adjusted p = 0.0016, 10 genes), cAMP signaling pathway (hsa04024, adjusted p = 0.093, 10 genes), MAPK signaling pathway (hsa04010, adjusted p = 0.037, 9 genes), progesterone mediated oocyte development (hsa04914, adjusted p = 0.044, 7 genes), and Circadian entrainment (hsa04713, adjusted p = 0.010, 6 genes).

Protein-protein interaction network.
Protein-protein interaction network analysis of DEGs showed 32 genes to have strong interaction (minimum interaction score of 0.9). Genes functioning in MAPK signaling pathway, circadian entrainment, progesterone mediated oocyte maturation, glutamatergic synapse and oxytocin www.nature.com/scientificreports/ signaling pathways formed one cluster and genes that were part of peroxisome pathway formed a separate cluster (Fig. 6).

Discussion
Genetic progress through selection for improved egg number or rate of lay have increased egg production in domestic birds particularly in chicken. The rapid development of various "omics" technologies has had a profound effect on animal breeding and have replaced classical breeding methods with molecular breeding, as these methods help in enhancing desired traits in animals at shorter generational intervals. RNA sequencing can help in deciphering the complex expression dynamics of genes under various physiological states. In poultry, as in all vertebrates, the reproductive endocrine systems and periodic ovulation are regulated by the hypothalamic-pituitary-gonadal axis. These regulatory mechanisms mediate physical interaction of follicles and directly contribute to egg-laying. Ovary development and follicle formation is a dynamic process and has a direct effect on egg-laying performance 14 , therefore identifying candidate genes that are differentially expressed in the ovaries of HEL and LEL ducks might increase the understanding of egg production rate in Chara ducks. The HEL ducks had a significantly higher laying rate than the LEL ducks. On average the HEL ducks laid 433 eggs, while the LEL ducks laid only 221 eggs. Given this difference in egg laying, the transcriptome levels of the HEL ovary were compared against the LEL ovary, to identify genes and pathways that were differentially  www.nature.com/scientificreports/ enriched. Overall, 489 genes were found to be differentially expressed out of which 310 were upregulated in the HEL ducks. This included 10 glutamatergic synapses related genes. Glutamate receptors mediate excitatory neurotransmission and increase LH and FSH secretion 15 . Several glutamatergic synapse genes were upregulated in the HEL ducks, including GRIN2B, GRIK3, GRIA1 GRIK4 and GRIK2. The GRIN2B, (glutamate ionotropic receptor NMDA type subunit 2B), which acts as an agonist binding site for glutamate, and is reported to be a functional component for embryonic development 16 . GRIK3 (Glutamate receptor ionotropic kainate 3), is a glutamate ionotropic receptor encoding gene associated with egg production in chickens 17 , and fertility in mouse 18 . www.nature.com/scientificreports/ GRIA1 (Glutamate ionotropic receptor AMPA type subunit 1), which was found to influence ovulation rate and number of ova and embryos collected in super ovulated donors in cattle 19 . Previously, Chen et al. 14 have also demonstrated the enrichment of glutamatergic synapses in the follicles of high laying chickens. This shows the vital role of glutamatergic synapses genes in embryonic development, egg production, fertility and reproduction. Seven genes that are part of the progesterone mediated oocyte maturation were enriched among the DEGs and a few of these genes were also part of the MAPK (mitogen activated protein kinase) pathway. Both these pathways have been found to be associated with egg production in ducks 20,21 . MAPK signaling regulates diverse cellular mechanisms by relaying extracellular signals and activating intracellular response 22 which is activated due to extracellular growth factor-receptor interactions. However, in female germ cells, it is activated by an upstream activator gene MOS (MOS proto-oncogene, serine/threonine kinase). MOS is a critical regulator of meiotic divisions that produces eggs 23 . MOS expression was found to be elevated in the ovary of the HEL ducks. Similarly, the expression of BRAF (B-Raf proto-oncogene, serine/threonine kinase) which is involved in progesterone mediated oocyte maturation and MAPK pathways, was also found to be elevated in the ovary of HEL ducks. BRAF has been previously reported to be associated with fertility and egg production in chickens 24 . The enrichment of these genes suggests increased meiotic division in the ovary of HEL ducks.
Several serine hydrolase, serine-type endopeptidase and serine-type peptidase genes were differentially enriched. Serine proteases have been reported to affect egg production, oocyte maturation and fertilization 12 . They have also been reported to be enriched in the ovary of high egg producing ducks 5 . Ovochymases are a serine active site protease and were found to be released during egg activation in Xenopus laevis 25 . Though their function is not clearly known in mammals or birds, Bourin et al. 26 suggested that ovochymase-2 (OVCH2) might be secreted to be incorporated in the egg yolk or vitelline membrane. OVCH2, was found to be differentially expressed in the pituitary of high laying hen 12 whereas OVCH1 (Ovochymase-1) expression has not been previously reported in poultry. In our study, both OVCH1 and OVCH2 were upregulated in HEL ducks. Another group of genes highly enriched in HEL ducks were the cathepsin (CTS) family of lysosomal proteases, which were reported to be related to oviduct development in chickens 27 .
Several other genes related to egg production, hormone metabolism, and folliculogenesis were upregulated in the HEL ducks. This included PAR6 (Par-6-family cell polarity regulator alpha), which is a major regulator of follicle number and follicle development 28 . KPNA7 (Karyopherin subunit alpha 7) which functions as a receptor for oocyte-specific protein that are important for oocyte and embryonic development 29 . HSD17B2 (Hydroxysteroid 17-beta dehydrogenase 2) a key regulator of steroid hormone metabolism 30 . LHX9 (LIM homeobox 9), which plays a key role in female reproductive tract development 31 .
Notably several immune related genes were enriched in the LEL ducks. This included Mx1 (Interferon-induced GTP binding protein), which plays an important role in antiviral activities by blocking viral transcription 32 . Integrins β family genes (ITGB5 and ITGB6) mediate cell-to-extracellular matrix and cell-to-cell adhesions 33 . ITGB5 plays a key role in triggering phagocytosis of apoptotic cells 34 , while ITGB6 modulates innate immune response 33 . MRC1 (Mannose receptor C-type 1) mediates clathrin dependent endocytosis and promotes antigen presentation and is expressed in response to infectious agents 35 . Interferon-stimulated gene 15 (ISG15) expression has been reported to inhibit the budding of retroviruses from cells 36 . The upregulation of these genes in the LEL ducks (downregulated in HEL ducks), possibly suggests that the immune system of the LEL ducks were under stress.

Conclusions
Variation in egg production is likely caused by a combination of both genetic and environmental factors. This study identified several important pathways and candidate genes contributing to egg production in ducks. The difference in egg laying number between the HEL and LEL ducks could be a combination of increased expression of egg production and folliculogenesis related genes in HEL ducks and the increased expression of immune response related genes in the LEL ducks. These findings will have important implications for not only poultry industry but also for our understanding of adaptive variation in egg production in nature.

Materials and methods
Sample collection. The ovary samples used in the study were obtained from another study which was reviewed and approved by the Institutional Animal Ethics Committee of College of Veterinary and Animal Sciences, Kerala Veterinary and Animal Sciences University, Mannuthy, Thrissur, Kerala. This study is in accordance with ARRIVE guidelines. All methods were performed in accordance with relevant guidelines and regulations. Briefly, 300 Kuttanad ducks, the popular indigenous ducks of Kerala which includes Chara and Chemballi breeds were reared at the Poultry and Duck Farm, Kerala Veterinary and Animal Sciences University, Mannuthy, Thrissur, Kerala. At 20 weeks of age, 200 ducks were randomly selected and housed in individual cages with same feeding and management condition to vividly take into record the number of eggs being laid by each duck. Layer duck feed was provided ad libitum twice a day and water was supplied ad libitum through water trough. Egg number of individual ducks was recorded from 21 to 93 weeks of age. The ducks were sorted based on the total number of eggs laid at 93 weeks of age from high to low. The ducks with greater egg production (i.e. higher than 400 eggs in 93 weeks) were considered as high egg layers (HEL) and the ducks with lesser egg production (i.e. lower than 235 eggs in 93 weeks) were considered as low egg layers (LEL). We selected six and five Chara ducks from the HEL and LEL group respectively, for transcriptome sequencing. The average egg production rate was 433 ± 17 and 221 ± 9 for HEL and LEL ducks, respectively. Ducks were sacrificed during their laying physiological state through exsanguination. The ovary tissues including the follicles of different hierarchical grades were collected in RNA later solution after removing egg yolk and stored at − 80 °C. www.nature.com/scientificreports/ RNA isolation and sequencing. The total RNA extraction was performed using TRIzol method 37 . The ovarian tissue samples were washed briefly with a chilled 1X PBS buffer (10X PBS Buffer, Invitrogen, USA) before the extraction to avoid the antagonistic effect of RNA later in RNA yield. The quality and quantity of the RNA samples were assessed using NanoDrop (NanoDrop, Thermo Scientific) and Bioanalyzer (Agilent Technologies, USA). Approximately 4 µg of total RNA (RNA integrity number > 7) was used to prepare the cDNA library using TruSeq RNA Sample Prep Kits (Illumina, USA). Poly-A containing mRNA molecules were purified using poly-T oligo-attached magnetic beads and fragmented using divalent cations. The cleaved RNA fragments were used to synthesise first strand cDNA using reverse transcriptase and random hexamer primers followed by second strand cDNA synthesis. Subsequently, the synthesised cDNA fragments were then processed for end repairing followed by addition of single ' A' base and adapter ligation. The products were purified and enriched with PCR to create the sequencing cDNA library. The paired end libraries were sequenced on Illumina HiSeq 2500 platform (Illumina, USA).
Quantitative real time RT-PCR. To  Transcriptome data analysis. The raw transcriptome data comprising reads with adaptor sequences, sequencing primers and low-quality reads were processed using cutadapt 38 to obtain high quality clean reads. Subsequently, the clean reads were aligned and mapped to the reference duck genome (CAU_duck1.0) using the STAR v2.7.2b aligner 39 . The genome annotation was downloaded from Ensembl (http:// ftp. ensem bl. org/ pub/ relea se-104/ gtf/ anas_ platy rhync hos_ platy rhync hos/ Anas_ platy rhync hos_ platy rhync hos. CAU_ duck1.0. 104. gtf. gz)). Mapped reads were counted using HTSeq v0.9.1 40 . The raw counts were processed and analysed using the DESeq2 v1.28.1 package 41 in R version 4.0.2 (https:// www.R-proje ct. org) following the methods previously described 42 . The normalized count data were used for all the downstream analysis. Principal Component Analysis (PCA) was performed by calculating the distance between samples using the vst function in DESeq2. The p values were adjusted using the Benjamini-Hochberg method 43 and genes with an absolute fold change > 1.5 (absolute log2FC > 0.5), p < 0.005 and false discovery rate (FDR) < 0.1 were considered as significantly differentially expressed (DEG). Duck Ensembl gene IDs were converted to human orthologs, and the matched official gene symbols were used for functional clustering and pathway analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) 44 . Annotation was performed under Biological Process (BP), Molecular Function (MF), Cellular Component (CC) in the InterPro protein domain and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway databases 45 . Significant terms (FDR < 0.1) were plotted using the ggplot2 package in R version 4.0.2. Protein-protein interaction network analysis of the DEGs was performed using search tool for the retrieval of interacting genes/proteins (STRING) 46 and the resulting interaction network was visualized using Cytoscape v.3.9.0 47 .

Data availability
The transcriptome data raw files have been deposited at Sequence Read Archive (GenBank) under the BioProject ID PRJNA825149.