Characterization of the Transcriptional Complexity of the Receptive and Pre-receptive Endometria of Dairy Goats

Endometrium receptivity is essential for successful embryo implantation in mammals. However, the lack of genetic information remains an obstacle to understanding the mechanisms underlying the development of a receptive endometrium from the pre-receptive phase in dairy goats. In this study, more than 4 billion high-quality reads were generated and de novo assembled into 102,441 unigenes; these unigenes were annotated using published databases. A total of 3,255 unigenes that were differentially expressed (DEGs) between the PE and RE were discovered in this study (P-values < 0.05). In addition, 76,729–77,102 putative SNPs and 12,837 SSRs were discovered in this study. Bioinformatics analysis of the DEGs revealed a number of biological processes and pathways that are potentially involved in the establishment of the RE, notably including the GO terms proteolysis, apoptosis, and cell adhesion and the KEGG pathways Cell cycle and extracellular matrix (ECM)-receptor interaction. We speculated that ADCY8, VCAN, SPOCK1, THBS1, and THBS2 may play important roles in the development of endometrial receptivity. The de novo assembly provided a good starting point and will serve as a valuable resource for further investigations into endometrium receptivity in dairy goats and future studies on the genomes of goats and other related mammals.

. Because one unigene might be annotated to more than one public database, a total of 43,127 coding genes were found after annotation.
To study the sequence conservation of the endometrium in other animal species, we used BLAST 28 to align the unigenes to the NCBI non-redundant database (nr) using an E-value of e −10 as the threshold. A total of 15,220 unique sequences (14.86%) had BLAST hits in the nucleotide sequence database in nr. The majority of the annotated sequences corresponded to known nucleotide sequences of animal species, with 30.3%, 17.4%, 9.1%, 3.6%, and 2.5% matching with Ovis aries, Bos taurus, Bos grunniens, Homo sapiens, and Orcinus sequences, respectively (Fig. 2). KOG (clusters of orthologous groups for eukaryotic complete genomes) is a classification system based on orthologous genes 29 . In this study, 31,778 unigenes were annotated to 26 groups by the KOG database (Fig. 3). General functional prediction alone (R) annotated 6,009 unigenes at most, and no unigenes were annotated to the Unnamed protein (X). Cell cycle control, cell division, chromosome partitioning (D) was annotated with 854 unigenes, Extracellular structures (W) was annotated with 113 unigenes.
Detecting SNP and SSR. Next-generation sequencing provides a range of new potential applications for evolutionary and ecological-genetic studies in non-model species 30 . The discovery of putative SNPs in the RE and PE datasets was summarized. According to the results presented in Table 4, a total of 76,729 putative SNPs were identified for the PE dataset (Table S2), of which 55,044 (71.74%) were transitions and 21,685 (28.26%) were transversions. Similarly, 77,102 putative SNPs were identified from the RE dataset (Table S3), of which 72.13% were transitions and 27.87% were transversions.

Differential gene expression and functional characterization. Analysis of Unigene Expression.
The RNA-Seq technique allowed the analysis of differential expression profiles via transcript abundance with a high sensitivity for transcripts expressed at low levels 34,35 . We generated 90 million paired-end reads 100 bp in length, yielding approximately 9 GB of sequence. Thus, the sequencing depth in this study was sufficient to detect transcripts expressed at low levels. To better categorize the unigenes that presented differential expression levels, unigenes expression values RPKM (reads per kilobase of exon    model per million mapped reads) were categorized into three groups: high (> 500 RPKM), medium (10 to 500 RPKM), and low (< 10RPKM) ( Table 5). Unigenes that are highly expressed in a specific tissue may be responsible for the basic metabolism and functions of that tissue 12 . A total of 192 and 187 genes were found to be highly expressed in the PE and RE libraries, respectively. A total of 3,255 unigenes were found to differ significantly in terms of expressional levels (P < 0.05) between the PE and RE libraries; the full list of DEGs is provided in Table S5. There were 208 differentially expressed unigenes that were down-regulated in the RE compared to the PE in goats, while 618 unigenes were up-regulated with fold-changes greater than or equal to 2 (Fig. 5). Additionally, there were 5 unigenes specifically expressed in the receptive endometrium with expression values at the medium level. The top 10 unigenes that were up-regulated in RE compared to PE are shown in Table 6. comp34258_c0_seq1 (MMP1) was the most up-regulated DEG (13.61-fold increase in the RE compared to the PE), followed by comp41692_c0_seq2 (MMP12, 10.89-fold) and comp19823_c0_seq3 (Fxyd4, 8.48-fold). The top 10 down-regulated unigenes are shown in Table 7. The most down-regulated DEG was comp24892_c0_seq2 (NNAT, -5.32-fold increase in the RE compared with the PE), followed by comp43544_c1_seq3 (MYH11, -4.41-fold). A heat map of the Pearson's correlation and a dendrogram of the correlation between transcript tags are provided in Fig. 6. The up-regulated DEG with the highest level of expression (RPKM = 1525.24) was comp9210_c0_seq2 (S100G) with a 5.34-fold increase in the RE, and the down-regulated DEG with the highest level of expression (RPKM = 1985.32) was comp39532_c0_seq7 (Tes) with a -4.09-fold increase in the RE (Table 8).
Gene Ontology Analysis of the DEGs. The DEGs were analysed by running queries for each DEG against the GO database, which provides information related to three ontologies: molecular function, cellular component, and biological process. In this study, GO enrichments of the DEGs were categorized into 426 functional groups that met the criteria of P-values < 0.001. Out of the 133 terms that were   Red dots indicate up-regulated genes, blue was not DEG, and the green was down-regulated genes in in the receptive endometrium compared to pre-receptive phase.
significantly enriched in molecular functions (Table S6), the most significantly enriched GO terms were protein binding (GO: 0005515) with 154 genes annotated, followed by ATP binding (GO: 0005524), calcium ion binding (GO: 0005509), metal ion binding (GO: 0046872), and sequence-specific DNA binding transcription factor activity (GO: 0003700). In the cellular compartment GO category, 88 terms were significantly enriched (Table S7). The most significantly enriched GO terms were integral to membrane (GO: 0016021) with 338 genes annotated, followed by nucleus (GO: 0005634), cytoplasm (GO: 0005737), and extracellular region (GO: 0005576). In the biological processes, 205 GO terms were significantly enriched and were related to various processes (Table S8) Fig. 7. The inclusion of the annotation for insulin-like growth factor binding (GO: 0005520) and clathrin sculpted gamma-aminobutyric acid transport vesicle membrane (GO: 0061202) interested us, because previous studies have reported that IGF-1 (insulin-like growth factor) and GABA (gamma-aminobutyric acid) may play important roles in the development of the receptive endometrium in mice and humans [36][37][38][39] .  Fig. 8. These results indicate that diversifying metabolic processes are active in the development of a receptive endometrium from the pre-receptive phase, and a variety of metabolites are synthesized in the receptive endometrium.

KEGG
Genes Possibly Involved in the Development of Receptive Endometria. Our analysis identified many DEGs that were enriched in Calcium (GO: 0005509, GO: 0005509, GO: 0008294, and ko04020) and Cell adhesion (GO: 0045785, GO: 0007155, and ko04514) were significant according to the analysis results of both the GO terms and KEGG pathways (P < 0.001). Based on these results, we analysed the mRNA expression levels of some coding genes related to calcium and cell adhesion, and found that ADCY8, VCAN, NA, SPOCK1, CGREF1, THBS1, THBS2, S100G, S100A1, S100A2, S100A4, S100A10, S100A13, MMP1, MMP3, MMP11, MMP12, and MMP19 exhibited significantly different expression levels among the two endometrial phases (Fig. 9). Therefore, the results of this study suggested that these genes may play roles  in the differential regulation of goat endometrial development during the receptive and pre-receptive phases; however, the validation of this hypothesis needs further study.

Discussion
Investigating the transcriptome profile of the receptive and pre-receptive endometrium will contribute to our understanding of the biochemical and physiological development of endometrial receptivity during the "window of implantation". RNA-Seq offers an unprecedented level of sensitivity and high throughput deep sequencing and has been widely used to detect gene expression patterns 10,12 . In the present study, large-scale transcriptome data were obtained using Illumina RNA-Seq as the first step of our endeavour to provide clear insight into the molecular mechanism of endometrial receptivity in dairy goats.         Sequencing Results. Summary and validation of RNA-Seq. The RNA-Seq method has previously been widely used in many tissues and at developmental different stages 41,42 . Mamo's research team summarized a large dataset to characterize the transcriptome of the endometrium and thought that most studies had tended to focus on either the conceptus or the maternal endometrium rather than the crosstalk between the two in cattle 43 . Therefore, they examined the global transcriptome profiles of the day 16 bovine conceptus and pregnant endometrial tissues using RNA-Seq, and a total of 133 conceptus ligands that interact with corresponding receptors on the endometrium and 121 endometrium ligands that interact with corresponding receptors on the conceptus were identified 44 . Additionally, Forde and her colleagues determined how low progesterone (P4) affected the endometrial transcriptome in cattle, and identified a panel of genes that were truly regulated in the endometrium by circulating concentrations of P4 in vivo 45 .
To characterize the complex transcriptome changes in the endometrium in the course of initial conceptus attachment, the porcine endometrium between Day 14 pregnancy (Attachment Phase) and corresponding cyclic endometrium was performed using Illumina RNA-Seq 46 . Furthermore, the endometrium transcriptome of day 12 pregnant (Preattachment Phase) was also analysed by Samborski 47 . Therefore, an integrated analysis of gene expression changes during these two distinct phases of early pregnancy in the pig was performed and revealed a number of biological processes and pathways that are potentially involved in the establishment of pregnancy in the pig.
Researchers would get some genes that were unidentified previously using RNA-Seq, what provided a useful approach to gain insights in the physiological events. However, current knowledge on the development of the receptive endometrium has been limited. Herein, for the first time we present a complete dataset detailing the transcriptome of the pre-receptive endometrium (PE) and receptive endometrium Scientific RepoRts | 5:14244 | DOi: 10.1038/srep14244 (RE) in dairy goats using the Illumina RNA-Seq approach to generate a total of 46,514,662 and 44,185,646 clean reads, respectively.
De novo assembly of sequencing data. Previous analyses of transcriptomic data always used the Velvet, ABySS, MIRA, Newbler v2.3, Newbler v2.5p, CLC, or TGICL software programmes for assembly 11,48 . However, all of these methods rely on aligning reads to a reference genome, and thus are unsuitable for samples with a partial or missing reference genome. Choosing suitable assemblers and parameters is critical to obtain a better assembly performance, and is even more important in transcriptomic studies in non-model organisms 30 . For these reasons, Grabherr presented the Trinity method for the de novo assembly of full-length transcripts and evaluated it on samples from fission yeast, mice and the whitefly, whose reference genomes are not yet available 26 . Compared with other transcriptome assemblers, Trinity recovers more full-length transcripts across a broad range of expression levels with a similar sensitivity to methods that rely on genome alignments 26 . Thus, the Trinity approach provides a unified solution for transcriptome reconstruction in any sample, especially in the absence of a reference genome. Thus we choose Trinity (http://trinityrnaseq.sourceforge.net/) to assembly and got 102,441 unigenes.
Unigene Annotation. All of the unigenes were BLASTed (Basic Local Alignment Search Tool) to public database banks. A total of 43,127 coding genes were detected after the unigenes were annotated to SWISS-PROT, nr, KEGG, KOG, and Pfam. Additionally, 15,220 genes had BLAST hits in the nucleotide sequence database in nr, of which 4,612 (30.3%) matched with Ovis aries and 2,648 (17.4%) with Bos taurus. Orthologous genes have the same function and common ancestors 29 , and KOG is a classification system based on orthologous genes. In this study, most of the annotated genes were in the General functional prediction only (R) category, but no genes were annotated to the Unnamed protein (X) category. The dataset we report here is the largest dataset of different genes representing a substantial portion of the endometrial transcriptome, and most likely includes the majority of the genes involved in the sophisticated networks that regulate endometrial receptivity during embryo implantation.
Detecting SNP and SSR Variants by RNA-Seq. RNA-seq has been shown to be an efficient method to detect SNP variants on a large scale at the mRNA level 12 . A total of 100,734 SNPs were discovered in the milk from Holstein cattle using RNA-seq and were validated by Sanger sequencing technologies. The results confirmed that RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions 49 . In the present manuscript we supplied 76,729 putative SNPs in RE and 77,102 in PE, which will contribute to furthering our understanding of its development and functions.
SSRs have proven to be more reliable than other markers, and the utility of SSRs in genetic studies is well established 30 . The polymorphisms revealed by SSRs result from variations in repeat number, which primarily result from slipped-strand mispairing during DNA replication. Thus, SSRs reveal much higher levels of polymorphism than most other marker systems 50,51 . We screened 12,837 SSR loci in the endometrium transcriptome, of which the monomer repeats (47.12%) were the most frequent type.

Differential gene expression and functional characterization. Analysis of Gene Expression.
Successful implantation and trophoblast invasion are facilitated by the degradation of the extracellular matrix (ECM) of the endometria/decidua 52,53 by various proteinases, including the matrix metalloproteinases (MMPs) that are capable of degrading the ECM 54 . MMP-2 and MMP-9 represent the best characterized proteases in the context of trophoblast invasion; additionally, MMP-12 was recently described to execute elastolysis during uterine spiral artery remodelling 55 . In our study, 3,255 unigenes meeting the criteria of P < 0.05 were found to differ significantly in expression levels between the RE and PE. The most up-regulated DEG was MMP-1 (comp34258_c0_seq1). MMP-1 is a zinc-containing endopeptidase that plays a key role in physiological and pathological tissue remodelling [56][57][58] . It was found to be secreted by trophoblast cells in vitro 59 , and Berthold found weak MMP-1 immuno-reactivities throughout pregnancy, preferably in the invasive phenotype of the extravillous trophoblast and its ECM 60 .
The up-regulated DEG with the highest expression level was S100G (S100 calcium binding protein G, comp9210_c0_seq2), which has been identified in the uteri of several species [61][62][63][64] . S100G expression was observed in the uterine inter-implantation sites in mice 65 and endometrial epithelial cells in rats 62 during early pregnancy. Further study showed that S100G was up-regulated by oestrogen 66 . However, studies have reported that the levels of S100G were higher in the luteal phase than in the follicular phase in the endometrium of pigs 67 . And this study demonstrated higher expression of S100G in the receptive endometrium of dairy goats for the first time. These findings suggested that endometrial S100G expression is regulated in a stage-specific manner during early pregnancy, and that S100G may have a critical role in regulating the development of endometrial receptivity.
In this study, CA1 (Carbonic anhydrase 1, comp43561_c1_seq2) was specifically expressed in the receptive endometrium of dairy goats. It is a member of the CA family 68 and catalyses the reversible hydration/dehydration of carbon dioxide 69 . CA1 is considered to be a differentiation marker of the colonic mucosa, and the loss of CA1 expression is associated with the disappearance of differentiated epithelial cells 70 . Currently, little is known concerning the relationship between CA1 and endometrial receptivity during the "window of implantation" in dairy goats. Therefore, we propose that CA1 may be Scientific RepoRts | 5:14244 | DOi: 10.1038/srep14244 negatively involved in the development of the receptive endometrium from the pre-receptive phase in dairy goats; however, this hypothesis needs further exploration.
Additionally, MYH11 (myosin heavy chain 11, comp43544_c1_seq6) was decreased in the RE compared with the PE. MYH11 has been associated with cell migration and adhesion, intracellular transport, and signal transduction 71 . It encodes myosin-11 72 , a major contractile protein that converts chemical energy into mechanical energy through the hydrolysis of ATP 71 . And several studies have shown that myosin plays important roles in cancer 73-76 due to its down-regulation 77,78 .
Based on these findings, we propose that S100G, CA1, and MYH11 may play important role in the development of the receptive endometrium from the pre-receptive phase. However, the molecular mechanism of the above genes in the regulation of endometrial receptivity requires further study.
Gene Ontology Analysis of the DEGs. In total, 426 GO terms were assigned to the DEGs meeting the criteria of P < 0.001, of which 133 GO terms were categorized into molecular function, 88 GO terms were categorized into cellular component and 205 GO terms were categorized into biological process. Further analyses showed that proteolysis, apoptosis, cell adhesion, protein transport, protein folding, multicellular organismal development, and cell differentiation were enriched significantly in biological processes. Previous studies have reported that increased proteolysis were found in the culture supernatant of endometrial tissues from women with endometriosis 79 . Controlled extracellular proteolysis is a requirement for angiogenesis (new vessel formation) that depended on controlled interactions between cells and the ECM 80 . Apoptosis, referred to as programmed cell death, is a process by which multiple cell types are eliminated during embryogenesis and in fully developed adult multicellular organisms 46 . Apoptosis is an important biological process involved in the cyclic remodelling of the endometrium 27,81 . Kokawa's in situ analysis revealed that cells undergoing apoptosis were scattered in the functional layer of the early proliferative endometrium 82 . Accumulating evidence has suggested that apoptosis helps to maintain cellular homeostasis during the menstrual cycle through the elimination of senescent cells from the functional layer of the uterine endometrium during the late secretory and menstrual phases of the cycle 79,82 . Spontaneous apoptosis of eutopic and ectopic endometrial tissues is impaired in women with endometriosis, and this reduced susceptibility to apoptosis might permit the growth, survival and invasion of endometriotic tissue 83,84 . The unique cell adhesion of the trophoblast to the endometrial epithelium and the subsequent trophoblastic invasion of the maternal tissue is an essential element of embryo implantation 85 . Genes related to cell adhesion processes are particularly important for implantation and placenta formation, and Samborski reported that related categories were found as overrepresented for genes with higher expression in Day 14 pregnant endometrium as well as for genes with higher expression in Day 14 cyclic endometrium 46 . These represent important and necessary processes for the development of the receptive endometrium from the pre-receptive phase, the folded endometrial epithelial bilayer and the ability of the endometrium to acquire the adhesive properties that allow embryo adhesion and its subsequent invasion.
Additionally, 9 genes were significantly annotated to insulin-like growth factor binding (GO: 0005520). Insulin-like growth factor (IGF-1) mediates the growth-promoting activity of the hormone, induces endothelial cell migration and is involved in the regulation of angiogenesis 86 . Oestrogen has been shown to stimulate IGF-1 gene expression in the endometrium 36,87 , and a growing body of evidence supports interactions between oestrogen and the IGF signalling pathways 88 . Giudice et al. provided evidence of IGF-I involvement in endometrial growth regulation 89 , thereby promoting endothelial cell proliferation and differentiation 90 .
KEGG Pathway Analysis of the DEGs. These KEGG pathways with DEGs enriched provide a valuable resource for investigating specific processes, functions, and pathways, facilitated the identification of novel genes involved in the development of the receptive endometrium in goats during the 'window of implantation' .
Previous studies have reported that cell cycle pathway can be rapidly elicited by oestradiol (E2) in the human and mouse uterus [91][92][93] . Progesterone has been found to inhibit endometrial cancer by inhibiting the cell cycle 94 , and Dai found that the number of endometrial cells in G1 is significantly increased following treatment with progesterone 81 . Moreover, a few cell cycle regulatory genes were discovered to be expressed within post-hatching and/or preimplantation blastocysts 27 . In this study, the KEGG pathway of Cell cycle (ko04110) with 22 DEGs enriched meet the criteria of P-values < 0.001. Samborski identified some genes that were more highly expressed at the mRNA level in the endometria of pigs on day 12 of pregnancy, and these genes were related to KEGG functional terms of the cell cycle 47 .
In addition, Mitko reported that a number of mRNAs that encode ECM proteins and components of the cytoskeleton are enriched in the bovine endometrium during the oestrous cycle, indicating a remodelling of the ECM in the endometrium and changes in the cytoskeleton of endometrial cells 84 . The possibility that trophoblasts are scavengers of endometrial ECM breakdown products is compelling, and the analyses of ECM receptors and associated molecules that may be present on the trophoblast cell membrane adjacent to these structures will shed light on their precise nature and function 82 . The levels of a number of molecules have been shown to peak during the "window of implantation", particularly the ECM receptor integrin α vβ 3 95 . We found 24 DEGs enriched into the pathway of ECM-receptor interaction (ko04512), met the criteria of P-values < 0.001. Furthermore, temporally and spatially regulated Scientific RepoRts | 5:14244 | DOi: 10.1038/srep14244 changes in the expression of ECM molecules at the maternal-foetal interface during embryo implantation are crucial for blastocyst attachment, migration and invasion 96 .
Potential genes involved in the development of the receptive endometrium. In this study, we analyzed the enriched GO terms and KEGG pathways and found Calcium and Cell adhesion what aroused our interest based on the researches of predecessors. So we filtered the GO-terms and KEGG pathways related to Calcium (GO: 0005509, GO: 0005509, GO: 0008294, and ko04020) and Cell adhesion (GO: 0045785, GO: 0007155, and ko04514), what were significantly enriched for the DEGs from the receptive and pre-receptive endometria.
Large-conductance calcium-activated potassium channels (Bκ Ca channels) were expressed in endometrial cells, affected embryo implantation by mediating endometrial receptive factors, and altered the activity of NF-κ B and homeostasis of Ca 2+ in the human endometrial cells 97 . S100A11 was a Ca 2+ -binding protein that interpreted the calcium fluctuations and elicited various cellular responses. Endometrial S100A11 was a crucial intermediator in EGF-stimulated embryo adhesion, endometrium receptivity, and immunotolerance via affecting Ca 2+ uptake and released from intracellular Ca 2+ stores, and down-regulation of S100A11 might cause reproductive failure 98 . In addition, Membrive reported that Calcium could potentiate the effect of estradiol on PGF2α production in the bovine endometrium 99 . Cell-cell adhesion between the conceptus trophectoderm and endometrial luminal epithelial cells is the final step for placentation, and various cell adhesion molecules are involved in this process 100 . The expression and function of cell adhesion molecules are very important for embryo implantation and the establishment of pregnancy 101 . What's more, LAY reported that interleukin 11 regulated endometrial cancer cell adhesion and migration via STAT3 102 , and L1 cell adhesion molecule is a strong predictor for distant recurrence and overall survival in early stage endometrial cancer 103 .
Therefore, GO terms and/or KEGG pathways that were closely related to calcium and cell adhesion in our study provided much valuable information for future study of the molecular mechanism underlying the development of the receptive endometrium in goats. And then we analysed the mRNA expression levels of some putative coding genes and found that ADCY8, VCAN, SPOCK1, THBS1 and THBS2 were involved in the GO terms and KEGG pathways related to Calcium and Cell adhesion.
ADCY8 (adenylate cyclase 8) is a membrane-bound enzyme that catalyses the formation of cAMP from ATP in higher vertebrates 104 after it is activated by calmodulin 105,106 . Previous findings showed that ADCY8 is required for axonal pathfinding before axons reach their targets, connecting mouse avoidance behaviour obtained in automated home cage environments to human bipolar affective disorder 107 . The findings in the present study point to novel mechanisms underlying the role of the regulatory factor in the formation of the receptive endometrium. VCAN is a chondroitin sulphate proteoglycan that was first identified in fibroblastic extracts 108 and is abundantly expressed within the ECM of the developing and mature cardiovascular system 109 . Since the initial identification and characterization of VCAN, the functional importance of this chondroitin sulphate proteoglycan in multiple adult biological systems has been increasingly demonstrated 110,111 . Additionally, because it interacts directly with integrin, attachment-dependent signalling may be altered and affect cell migration and the epithelial mesenchymal transition in the cushion primordia of the septa and valves 112,113 . SPOCK1 (SPARC/osteonectin, cwcv and kazal-like domains proteoglycan 1) encodes testican-1, which is a proteoglycan belonging to the BM-40/ SPARC/osteonectin family of extracellular calcium-binding proteins; SPOCK1 is widely expressed in the heart, blood and cartilage [114][115][116][117] . Miao identified SPOCK1 as a novel transforming growth factor-b1 (TGF-b) target gene that regulates the lung cancer cell epithelial-to-mesenchymal transition (EMT), which plays a key role in the early process of metastasis of cancer cells 118 . Moreover, a number of studies have demonstrated that SPOCK1 plays a critical role in prostate cancer recurrence and glioblastoma invasion 119,120 . Thus, we hypothesize that SPOCK1 plays an important role in the development of the receptive endometrium, although this hypothesis needs further study. THBS1 (thrombospondin 1) is a multifunctional glycoprotein that is involved in numerous biological processes, such as cellular adhesion, angiogenesis, metastasis, inflammation, atherosclerosis, homeostasis 121,122 . THBS1 is generally considered to be a tumour suppressor and mediates cell-to-cell and cell-to-matrix interactions that are important for platelet aggregation and angiogenesis 123,124 . THBS2 (Thrombospondin 2) is an ECM disulphide-linked homotrimeric glycoprotein 95 that plays an important role in cell adhesion, migration and proliferation, cell-to-cell and cell-to-matrix interactions, and angiogenesis 125 . It has also been implicated in atherosclerosis and thrombosis due to its function in the regulation of matrix metallopeptidase 2 (MMP2) 126,127 . Thus, THBS1 and THBS2 may play a major role in ECM homeostasis. However, only very limited data is available concerning the status of THBS1 and THBS2 in the development of the receptive endometrium, what needs further study.
In conclusion, we sequenced the transcriptome of dairy goats with the Illumina 2500 sequencing platform. Trinity software was used for de novo assembly of our valid reads. A total of 3,255 DEGs were discovered between the pre-receptive endometrium (PE) and the receptive endometrium (RE), with a threshold of P < 0.05. Based on the results of GO and KEGG enrichment, some genes potentially play an important role in the development of endometrial receptivity. Additionally, 76,729-77,102 putative SNPs and 12,837 SSRs were discovered in this study. Our results not only reveal new information regarding the development of dairy goat endometrial receptivity but also provide a broad and novel vision for future research at the molecular level in dairy goats.

Methods
Ethics statement. All animals in this study were maintained according to the No. 5 proclamation of the Ministry of Agriculture, P. R. China. And animal protocols were approved by the Review Committee for the Use of Animal Subjects of Northwest A&F University.

Study Design and Sample Collection.
A total of 20 healthy, 24-month-old multiparous dairy goats (Xinong Saanen) were induced to oestrous synchronization for this study. The first day of mating was considered to be day 0 of pregnancy. Gestational days 5 and 15 are important time points for embryo implantation in goats 128 . The experimental goats were observed 3 times daily to ascertain oestrous signs and mated naturally twice during oestrus. Then, the goats were euthanized following intravenous injection of a barbiturate (30 mg/kg) at gestational day 5 (pre-receptive endometrium) and gestational day 15 (receptive endometrium). Endometrium samples from 10 goats at gestational day 5 and 10 goats at gestational day 15 were obtained from the anterior wall of the uterine cavity. All tissue samples were washed briefly with PBS (Phosphate Buffered Saline) and then immediately frozen in liquid nitrogen.
Library Construction and Sequencing. Total RNA was extracted from every sample using the TRIzol reagent (Invitrogen, CA, USA) and DNA contamination was evaluated using DNase (TaKaRa, Dalian, China). The total RNA quantity and purity were analysis of Bioanalyzer 2100 (Agilent, CA, USA) and RNA 6000 Nano LabChip Kit (Agilent, CA, USA) with RIN number > 7.0. The total RNA with lowest quality was not used for further study from the pre-receptive endometrium samples and receptive endometrium samples, respectively. Approximately 5 ug of total RNA was subjected to isolate Poly (A) mRNA with poly-T oligo attached magnetic beads (Invitrogen). Following purification, the mRNA was fragmented into small pieces using divalent cations under elevated temperature. Then the cleaved RNA fragments were reverse-transcribed to create the final cDNA library in accordance with the protocol for the mRNA-Seq sample preparation kit (Illumina, San Diego, USA). Each PE and RE library was constructed by pooling 9 homogenized total RNAs from the endometrium samples. Then, paired-end sequencing of the PE and RE libraries was performed on the Illumina Hiseq2500 (LC Sciences, USA) following the vendor's recommended protocol. The length of the reads was 100 bp, and the average insert size for the paired-end libraries was 179 bp (the length of the adapter was 121 bp).

Primary Analysis and Sequence
Assembly. Prior to assembly, raw reads (raw data) in the fastq format were processed using previously described scripts 29 . In this step, clean reads (valid data) were obtained by removing reads containing adapter sequences, ploy-N and the sequencing primer as well as low quality reads (1, reads containing sequencing adaptors; 2, reads containing sequencing primer; 3, nucleotide with q quality score lower than 20) from the raw data. At the same time, Q20, Q30, GC-content and sequence duplication level of the clean data were calculated. All of the downstream analyses were based on high quality clean data. On the basis of previousresearch work 26,36 , the publicly available program Trinity (http://trinityrnaseq.sourceforge.net/) was selected for assembly in this study and used to calculate the N50 number, average length, max length and contig number during different length intervals. Sequence Functional Annotation. Unigene annotations provide functional annotations for all unigenes in addition to their expression levels. Functional annotations of unigenes were analysed using protein sequence similarity by LC-Bio (Hangzhou, China). Protein function information could be predicted from annotations of the most similar proteins in the following published databases: SWISS-PROT (A manually annotated and reviewed protein sequence database, ftp://ftp.uniprot.org/pub/databases/uniprot/currentrelease/knowledgebase/complete/uniprot sprot.fasta.gz), NR (NCBI non-redundant protein sequences, ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), KEGG (Kyoto Encyclopedia of Genes and Genomes, http://www.kegg.jp/kegg/download/), KOG (Karyotic Ortholog Groups, http://www.ncbi.nlm. nih.gov/COG/grace/shokog.cgi), and Pfam (A widely used protein family and structure domain database, ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam27.0/Pfam-A.fasta.gz). All searches were conducted with BLASTx (http://blast.ncbi.nlm.nih.gov/) using a minimum E-value of < 1e-10 as the threshold 29,129 . Analysis of Gene Expression. To investigate the expression level of each unigene in different samples, all unigenes for each sample were aligned back to the final assembly using Perl scripts in Trinity under the default parameters option. The alignment produced digital expression levels for each contig that were normalized with a RESM-based algorithm using Perl scripts in the Trinity package to obtain RPKM values. Gene expression levels were normalized by considering the RPKM value (reads per kilobase of the exon model per million mapped reads) and the effect of the sequencing depth and gene length on the read counts; this approach represents the most commonly used method for estimating gene expression levels 25 . The fold changes (log2 (RE RPKM /PE RPKM )) and the corresponding significance threshold of the P-value were estimated according to the normalized gene expression level. Based on the expression levels, the significant DEGs between PE and RE were identified with "P < 0.05 and |log 2 fold change|> 1" used as the threshold to judge the DEGs in this study.
Gene ontology and pathway enrichment analysis of DEGs. Gene ontology (GO) is an international standard gene functional classification system 130 . The hypergeometric test was applied to map all differentially expressed genes to terms in the GO database 131 . The corrected P-value < 0.001 was used as the threshold to find significantly enriched GO terms in the input list of DEGs compared to their genomic background. The formula of P-value was as follow ( The N represented the number of GO annotated genes in genome, n represented the number of differentially expressed genes in N. M represented the number of particular GO annotated genes in genome, m represented the number of particular GO annotated genes expressed differentially in M 132 . And then P was less than 0.001 was used as the threshold to judge significant enrichment GO term in this study 10,12 . KEGG is the major public pathway-related database that helps to further understanding the biological functions of genes with high level functions and the utilities of the biological system from large-scale molecular datasets (http://www.genome.jp/kegg/) 29 . Pathway enrichment analysis identifies significantly enriched metabolic pathways or signal transduction pathways 10 using the corrected P-value < 0.001 as a threshold to find significantly enriched KEGG terms in the input list of DEGs compared to their genomic background 133 . The calculation formula is the same as that used for the GO analysis.

Detection of Putative SNP and SSR.
High-throughput SNP data has demonstrated its potential for making inferences about demographic and adaptive processes in natural populations [134][135][136][137] . The SNPs of PE and RE at the transcriptome level were analysed based on the massively parallel Illumina technology. The Bowtie (http://bowtie-bio.sourceforge.net/) and Samtools (http://samtools.sourceforge.net/) software programmes were used to identify SNPs; all reads were included in this process 129 . The sample data were mapped to the transcript with the Bowtie software after pretreatment based on the transcription library. Further SNP analysis was performed based on the results of mapping, and variable sites with higher possibilities were further filtered using the software Samtools.
The MISA (Microsatellite) (http://pgrc.ipkgatersleben.de/misa) was used to identify SSRs. The Batch Primer3 program was used to design primer pairs for amplification of the SSR motifs 138 . The default settings were used with the exception of the annealing temperature, which was set for an optimum of 60 °C 129 . Monomers, dimers, trimers, quadramers, pentamers and hexamers were all considered as the search criteria for SSRs in the MISA script.