Transcriptome sequencing reveals key potential long non-coding RNAs related to duration of fertility trait in the uterovaginal junction of egg-laying hens

Duration of fertility, (DF) is an important functional trait in poultry production and lncRNAs have emerged as important regulators of various process including fertility. In this study we applied a genome-guided strategy to reconstruct the uterovaginal junction (UVJ) transcriptome of 14 egg-laying birds with long- and short-DF (n = 7); and sought to uncover key lncRNAs related to duration of fertility traits by RNA-sequencing technology. Examination of RNA-seq data revealed a total of 9977 lncRNAs including 2576 novel lncRNAs. Differential expression (DE) analysis of lncRNA identified 223 lncRNAs differentially expressed between the two groups. DE-lncRNA target genes prediction uncovered over 200 lncRNA target genes and functional enrichment tests predict a potential function of DE-lncRNAs. Gene ontology classification and pathway analysis revealed 8 DE-lncRNAs, with the majority of their target genes enriched in biological functions such as reproductive structure development, developmental process involved in reproduction, response to cytokine, carbohydrate binding, chromatin organization, and immune pathways. Differential expression of lncRNAs and target genes were confirmed by qPCR. Together, these results significantly expand the utility of the UVJ transcriptome and our analysis identification of key lncRNAs and their target genes regulating DF will form the baseline for understanding the molecular functions of lncRNAs regulating DF.

. Duration of fertility trait in hens. (a,b) Histogram illustrates the distribution of DF trait within the population of egg-laying hens (n = 304). DN: the days post-insemination until the last fertile and FN: the numbers of fertile eggs laid after artificial insemination (AI). Bell-shaped indicate the normal distribution and the density curve was determined by standard normal distribution parameters N (µ, σ) where µ is the mean and σ is the standard deviation.
ScIEntIFIc REpORTS | (2018) 8:13185 | DOI: 10.1038/s41598-018-31301-z General characteristics of the egg-laying hen's transcriptome. RNA-sequencing was used to analyze the lncRNA expression from 14 cDNA libraries constructed from the UVJ of long-and short-DF hens (no pooled samples, n = 7). On the average, approximately 45.7 and 42.3 million raw reads, were generated from the long-DF and the short-DF groups respectively (Fig. 3). After data pre-processing and quality control, clean reads from the long-DF group ranges from (~80-90%) and short-DF group (~79-92%) were successfully aligned to chicken genome:Galgal4.0, map ratio of reads in most samples was about 73% in both groups; read coverage and mapped unique reads in most samples was more than 70%, see Supplementary Table S1 for the details of data pre-processing and read mapping. The reads alignment to chicken genome indicated that the sequencing reads were of good quality and the sequencing depth was sufficient for further analysis of lncRNA between the two groups of hens.
Identification of lncRNAs in long-and short-DF egg-laying hens. In this study, we adopted a set of filtering criteria as described in the methods to identify putative lncRNAs from the alignments. Consequently, a total of 9977 lncRNAs were discovered in the assembled transcripts for which an average of 7038 and 6909 was expressed lncRNA transcripts in long-and short-DF groups respectively (Supplementary Table S2). Among the total lncRNA transcripts identified, 7401 and 2576 were known and unknown lncRNAs (Supplementary  Table S2). Nevertheless, we observed that majority of lncRNAs were from the intergenic region (Supplementary  Table S3). Next, the transcript lengths, exon number, and expression level between lncRNAs and mRNAs which Phenotypes DN * (days, Mean ± SD) FN † (eggs, Mean ± SD) Fertility rate ‡ (%)  Fig. 4a-c. All the lncR-NAs and all the mRNAs were regarded as two groups to compare their basic characteristics. The lncRNA transcript lengths were significantly shorter than those of mRNAs (Fig. 4a). A majority of lncRNAs had two or three exons, whereas mRNAs contained a broad range of exon numbers from two to greater than twenty (Fig. 4b). Additionally, the average expression level detected for lncRNAs (0.74 average FPKM) is significantly lower than that of mRNA (2.2 average FPKM) (Fig. 4c). Together, these results indicated that the lncRNAs in the UVJ of long-and short-DF exhibited relatively short length, low exon numbers, and low expression level.  further analyzed based on the criteria of differential expression analysis. In total, the expression levels of 223 lncRNAs were significantly different between the long-and short-DF groups (q value < 0.05). Among these 223 lncRNAs, 81 were up-regulated and 142 were down-regulated in the long-DF group compared with the short-DF group (Fig. 5, Supplementary Table S4). The top 10 up-regulated and down-regulated lncRNAs in the long-DF hens were listed in Table 2. Among the downregulated ones, 85 lncRNAs were located in the intergenic region, 16 lncRNAs in bidirectional, 4 lncRNAs in introns (2 intronic-antisense and 2 intronic-sense) and 37 lncRNAs in exons (34 exonic-antisense and 3 exonic-sense). For the upregulated ones, 55 lncRNAs were mapped in intergenic position, 6 lncRNAs in bidirectional, 15 lncRNAs in exons (12 exonic-antisense and 3 exonic-sense), and 5 lncRNAs in introns (intronic-antisense) (Supplementary Table S4).    Table S7). Together these results showed that these 8 key DE-lncRNAs may regulate DF trait, UVJ formation, and reproductive processes through their potential targets ( Table 3).  (Fig. 8a,b). Similarly, upregulated MSTRG.8138.1 and MSTRG.2982.4 also appeared to be expressed at the same level when compared with the RNA-Seq data (Fig. 8c,d). Together, these data indicated that there was a strong agreement between the qPCR and RNA sequenced data. We found that target genes (SOX2, PLCB1, and FAM20C) were highly expressed in long-DF hens compared with the short-DF hens ( Fig. 8e-g). High expression of BRAF and PRKAA2 were also detected in the short-DF hens compared with the long-DF hens (Fig. 8h,i). According to these findings, long-and short-DF hens significantly differed at P < 0.01 and P < 0.05 levels based on assuming unequal variances Student's t-test.

Discussion
Duration of fertility (DF) trait is a vital factor in poultry production. In the current study, the estimated DF-trait for the two composite characteristics (DN and FN) was approximately (18.71 ± 0.49; 7.14 ± 1.57) and (17.43 ± 0.53; 7.43 ± 1.40) in the long-and short-DF hens respectively (Table 1). During the reproductive phase, there was a significant difference in the DF traits between the two groups (Supplementary Table S8). In addition, a different morphological change was observed in their respective UVJ samples. Numerous SSTs were embedded in the long-DF groups compared with the short-DF, indicating that the mechanism regulating prolonged DF could be mostly due to the observed SST as revealed in the previous work conducted in broiler and turkey hens 30 . Several studies associated with period of fertility of egg-laying hens have been reported 31,32 . However, the molecular mechanisms underlying prolonged-DF remain obscure. During the reproductive season, we compared the lncRNA expression profiles in the long-DF hens with those of short-DF hens and identified a set of lncRNA genes associated with DF trait (Supplementary Table S2). In agreement with previous reports 28 , lncRNAs shared similar features, such as shorter in length, lower in exon number, and lower expression level than protein-coding transcripts. A total of 223 lncRNAs were differentially expressed (DE), of these, 81 lncRNAs were upregulated and 142 lncRNAs were downregulated in the long-DF groups, indicating that the majority of the DE-lncRNAs were downregulated throughout the reproductive phase. We anticipate that some of these DE-lncRNAs might play key roles in prolonging the DF in hens as shown in other studies 19,33 .
The DE-lncRNA-target prediction and functional interaction network revealed eight key DE-lncRNAs related to long-DF hens, including MSTRG.20950. 12 Table S5). Many of the long-DF-enriched target genes were categorized into identical protein binding, response to growth factor, reproductive system development, positive regulation of cell differentiation, cellular response to cytokine stimulus, developmental process involved in reproduction, regulation of osteoblast differentiation, response to acid chemical, osteoblast differentiation, carbohydrate binding, response to cytokine, positive regulation of MAPK cascade, peptidyl-lysine modification, chromatin, chromatin organization, chromatin modification, histone modification, regulation of MAPK cascade, MAPK cascade, positive regulation of protein modification process, regulation of cell cycle, and ossification.
A total of nine overlapped targets were potentially associated with long-DF hens (Table 3). These included CLDN1, SOX2, FAM20C, PLCB1, CTR9, DDIAS, 17.5, PRKAA2, and BRAF. For instance, SOX2 was expressed at high levels in the long-DF hens compared with the short-DF hens. SOX2 is a member of the sex-determining region (SRY-related) having analogous HMG domains for DNA binding and is highly involved in regulation of cell differentiation 34,35 . SOX2 expression patterns in early chicken embryos, consistently mark neural primordial cells at various stages of development 36 . Mutations in SOX2 have previously been associated with male genital tract abnormalities 37 . Another interesting gene, named family with sequence similarity 20, member C (FAM20C), was also highly expressed in the long-DF hens. FAM20C is a physiological Golgi casein kinase that phosphorylates multiple secreted proteins 38 . The molecular activity of FAM20C has been implicated in mammalian reproduction 39 . In this study, the observation of significant high expression of FAM20C implied that it may  40 . Likewise in our data, the high expression level of phospholipase C beta 1 (PLCB1) was observed in the long-DF hens which suggest that PLCB1 has a potential role in DF regulation 41 .
One important signaling pathway associated with DF trait is mTOR signaling pathway and the target genes involved were over-represented in KEGG database, including mTOR signaling pathway, Vascular smooth muscle contraction, insulin signaling pathway, FoxO signaling pathway, Regulation of actin cytoskeleton, and Focal adhesion pathway (Supplementary Table S7). The elevated expression of BRAF and PRKAA2 genes via mTOR signaling pathway was seen in the short-DF hens. The mTOR signaling pathways are critical regulators of ovarian function including quiescence, activation, and survival of primordial follicles (PFs), granulosa cell (GC), proliferation and differentiation, and meiotic maturation of oocytes 42 . mTOR signaling pathway has been implicated in fertile egg production and female fertility in mice 43 , suggesting that BRAF and PRKAA2 genes via mTOR signaling pathways may be involved in regulating the DF trait.

Conclusions
In summary, based on the RNA-Seq and bioinformatics analysis results, a list of DE-lncRNAs related to the cellular response to cytokine, reproductive structure development, regulation of protein modification, carbohydrate binding, co-factor binding, chromatin organization and modification, response to growth factors, cytokine stimulus and immune pathways were identified in the UVJ of long and short-DF hens. Eight key DE-lncRNAs were identified to have the most probable role in prolonging the DF in egg-laying hens, thus, they are novel lncRNAs that are related to DF in hens. These results provided the starting point for studies aimed at understanding the molecular mechanism of the DF trait in egg-type chickens.

Material and Methods
Egg-laying hens. A total of 304 egg-type chickens obtained from the poultry farm of Huadu Yukou Poultry Industry Co. Ltd, Beijing, China were raised in individual cages, kept in identical light/dark cycles and fed standard diets ad libitum from 25 weeks until the end of the experiment aiming at study their duration of fertility. All hens were artificially inseminated once with 2 × 10 8 pooled sperms ejaculates collected from viable rooster flocks. Eggs were collected and marked daily from day 2-20 after artificial insemination, all experiment was executed in three replicates and lasted 60 days. The number of egg per hen over the period was recorded and the fertilized eggs were examined by candling on day 10 of incubation (dead embryos were considered as fertile). DF trait was expressed in terms of DN (the number of days post-insemination until last fertile egg) and FN (the number of fertile eggs after a single AI) and hens with DN or FN traits ≥ 18 were selected, coded as long and DN or FN < 10 as short-DF groups [52][53][54][55] . Seven long-DF and seven short-DF hens were selected for sampling UVJ. All selected hens were anesthetized with sodium pentobarbital (20 mg/kg) administered intraperitoneally, and the narrow band known as UVJ located at the cranial anterior end of the vagina were used as a sample for RNA collection 29 .
In other to characterize the UVJ structure of long-and short-DF hens. The sample size for tissues analysis was four UVJs selected from 4 different individuals of long-and short-DF groups respectively. The UVJ samples were formalin-fixed for 48 h and paraffin-embedded. The 100 μ m of the tissues sectioned were stained with hematoxylin and counterstained with eosin (hematoxylin for 1 min and 1% eosin for 10 sec) for observation under a light microscope (OLYMPUS; TH4-200; Tokyo, Japan).  Transcriptome analysis for lncRNA data. Mapping of all clean reads was carried on the chicken reference genome (assembly Gallus gallus_4.0) using TopHat (version:2.0.9) with parameters set to default values 56 . Unmapped reads were trimmed and remapped, additionally, Gene Transfer Format (GTF) of the Ensembl gene annotation was included in the read mapping as previously described by 57 . Next, RNA-Seq alignments for all sample was assembled using String Tie (version: 1.3.0) and Gffcompare (version: 0.9.8) was used to evaluate the accuracy of the assembled transcripts (i.e., putative transcripts containing both coding and noncoding transcripts) and compare the assembly with known transcripts (putative transcripts for each sample assembly against a set of combined gene annotations), to extract assemblies that fully match known and unknown annotations 57,58 .
Prior to that, low-quality assemblies were removed based on the Fragments Per Kilobase of exon per Million fragments mapped (FPKM) threshold, and multi-exon transcripts were retained for downstream processing. The FPKM threshold for classifying complete and partial transcripts in our experiment was established according to 57 . Furthermore, transcripts with short read length <200 bp, <2 exons including those with maximum putative ORFs < 300 bp were filtered out. Next, protein-coding transcripts were identified and removed with the software programme Coding-Non-Coding Index (CNCI > 0) 57 and Coding Potential Calculator (CPC > 0) 59 . Similarly, Pfam (http://pfam.xfam.org) was used to remove transcripts with significant protein domain. From the remaining transcripts, a putative lncRNA is defined as novel lncRNAs if its Pfam hits returned insignificantly. The analysis protocols are described step by step in Supplementary Fig. S1. Afterward, the novel lncRNAs were classified according to 60 and characterized based on transcription length, exon number and expression level between lncRNA and mRNA transcripts (RefSeq transcripts, NCBI). Furthermore, differentially expressed lncRNAs (DE-lncRNAs) in the comparison groups of long-DF versus short-DF were identified using the EdgeR package (freely available from the Bioconductor web site (http://bioconductor.org) 61 . Only the lncRNAs with the criteria of log2FC (fold change) ≥2 and p-value ≤ 0.01 and FDR ≤ 0.05 were identified as DE-lncRNAs.

Prediction and functional annotation of DE-lncRNAs target genes.
To further predict the DE-lncRNA target genes, the Pearson correlation coefficient (PCC) was calculated to evaluate the co-expression relationships between DE-lncRNAs and target genes. The co-expression pairs with PCC > 0.9 and p-value < 0.05 were selected for the construction of the regulatory network, which was visualized by Quantitative real-time PCR (qPCR) validation of DE-lncRNAs and target genes. The RNA sequencing data were validated by qPCR, using a standard EasyScript ™ one-step gDNA Removal, cDNA Synthesis SuperMix, for cDNA production and SYBR Green real-time PCR (Toyobo co., Ltd, Osaka, Japan) following manufacturer's instructions. Two up-regulated and two down-regulated DE-lncRNAs and five target genes were randomly selected to quantify the expression levels. Quantitative PCR (qPCR) was then performed in Bio-Rad thermal cycler, CFX-384, real-time system. Gene expression in each sample was normalized to beta-actin expression. The qPCR amplifications were conducted using an independent set of seven biological replicates and three technical replicates per sample. Relative quantitation of lncRNAs and target genes expression was evaluated by the 2(−ΔΔCt) methods 64,65 . Primers sequences of lncRNAs and target genes for qPCR are listed in Supplementary Table S9.
Ethics approval and consent to participate.

Data Availability
All replicates sequencing reads used in this study have been submitted to Gene Expression Omnibus (GEO) under the accession number GSE101163.