Bioinformatic analysis reveals the importance of epithelial-mesenchymal transition in the development of endometriosis

Background: Endometriosis is a frequently occurring disease in women, which seriously affects their quality of life. However, its etiology and pathogenesis are still unclear. Methods: To identify key genes/pathways involved in the pathogenesis of endometriosis, we recruited 3 raw microarray datasets (GSE11691, GSE7305, and GSE12768) from Gene Expression Omnibus database (GEO), which contain endometriosis tissues and normal endometrial tissues. We then performed in-depth bioinformatic analysis to determine differentially expressed genes (DEGs), followed by gene ontology (GO), Hallmark pathway enrichment and protein-protein interaction (PPI) network analysis. The findings were further validated by immunohistochemistry (IHC) staining in endometrial tissues from endometriosis or control patients. Results: We identified 186 DEGs, of which 118 were up-regulated and 68 were down-regulated. The most enriched DEGs in GO functional analysis were mainly associated with cell adhesion, inflammatory response, and extracellular exosome. We found that epithelial-mesenchymal transition (EMT) ranked first in the Hallmark pathway enrichment. EMT may potentially be induced by inflammatory cytokines such as CXCL12. IHC confirmed the down-regulation of E-cadherin (CDH1) and up-regulation of CXCL12 in endometriosis tissues. Conclusions: Utilizing bioinformatics and patient samples, we provide evidence of EMT in endometriosis. Elucidating the role of EMT will improve the understanding of the molecular mechanisms involved in the development of endometriosis.


Plain English summary
Endometriosis is a condition where tissue similar to the lining of the womb starts to grow in other places, such as the ovaries and fallopian tubes. About 10-15% of women in their reproductive years are affected worldwide. Although the cause of endometriosis remains unclear; genetic and environmental factors are considered to be risk factors for manifestation and progression of endometriosis. This study aims to identify key players in the development of endometriosis using bioinformatics. This provides the computational capacity required to handle big datasets for genome analysis and complex algorithms for molecular simulation. We found an important role for the cellular process of epithelial-mesenchymal transition (EMT) in endometriosis, which may allow cells to migrate away from the primary site. This study suggests that targeting EMT might have therapeutic potential in endometriosis, with therapies currently undergoing development in the context of malignancy. 4

Background
Endometriosis is a frequently occurring gynaecological disease characterised by chronic pelvic pain, dysmenorrhea and infertility 1 . Its prevalence is estimated to be 10-15% of reproductive age females 2 and around to 20-48% in infertile women 3 .
Despite a number of theories being suggested to describe the molecular mechanisms underlying the development of endometriosis such as: Sampson's theory of retrograde menstruation 4 , ectopic implantation, epigenetic factors 5 , immune and inflammatory factors 6,7 , eutopic endometrial determinism 8 , and stem cell factors 9 ; disease pathogenesis is still not fully understood.
At present, there have been several studies on the gene expression profiles of endometriosis 10-13 , which have identified various differentially expressed genes (DEGs) involved in the development of endometriosis. However, due to heterogeneity between each independent experiment as a result of variations in tissue or specimens and/or different data processing methods, the identification of these DEGs is inconsistent. In this study, we integrated different studies using a non-biased approach, which may resolve these problems and enable the discovery of effective and reliable molecular markers.
We downloaded 3 microarray datasets GSE11691 11 , GSE7305 12 , GSE12768 13 , from Gene Expression Omnibus database (GEO), which contain gene expression data from endometriosis tissues and normal endometrial tissues. We then performed deep bioinformatic analysis, including identifying common DEGs, gene ontology (GO), Hallmark pathway enrichment and protein-protein interaction (PPI) network analysis.
The findings were further validated by immunohistochemistry (IHC) staining in endometrial tissues from endometriosis or control patients. The aim of this study was to identify common DEGs and important pathways, and to explore potential candidate biomarkers for the diagnosis and therapeutic targets in endometriosis.

Original data collection
We used "endometriosis" as a keyword on the Gene Expression Omnibus (GEO) database, and 3 datasets (GSE11691, GSE7305 and GSE12768) were collected. GSE11691 was in GPL96 platform, [HG-U133A] Affymetrix Human Genome U133A Array, which included 9 endometriosis and 9 normal endometrial samples (Control samples). GSE7305 was in GPL570 platform,  Affymetrix Human Genome U133 Plus 2.0 Array, which included 10 endometriosis and 10 normal endometrial samples (Control samples). GSE12768 was in GPL7304 platform, institute Cochin HG18 60mer expression array 47Kl, which included 2 endometriosis and 2 normal endometrial samples (Control samples). The platform and series matrix files were downloaded.

Analysis for Differentially Expressed Genes (DEGs)
RStudio software (version 3.6) was used to process and standardise the files. The CEL files of three datasets were downloaded from GEO. Raw data of the Affymetrix platform were normalised by Robust Multi-array Average (RMA) function in the affy package (version 1.64.0). Multiple probes relating to the same gene were deleted and summarised as the median value for further analysis. These 3 datasets were analyzed using the limma package (version 3.40.6) in the RStudio 14 , and genes with P value < 0.05 and Log[FoldChange] (Log[FC]) > 1 were considered as DEGs. Overlapping DEGs from three databases were screened for subsequent GO, Hallmark pathway enrichment and PPI analysis, and were displayed with Venn diagrams.

Analysis for GO and pathway enrichment
GO Biological Processes of DEGs were analyzed through online DAVID software 15 (version 6.8), P value < 0.05 as the cutoff criterion was considered statistically significant. The Hallmark pathway enrichment analysis was performed in Metascape 16 . P value < 0.05 as the cutoff criterion was considered statistically significant.

Protein-protein interaction (PPI) network analysis
The PPI of DEGs-encoded proteins was demonstrated by STRING (version 11.0) 17 , with search limited to "Homo sapiens" and a score > 0.700 corresponding to high confidence interaction as significant. Network construction and analyses were performed by Cytoscape (version 3.7.1). In addition, the function and pathway enrichment analysis were performed for DEGs in the modules by ClueGo (version 2.5.4), P value < 0.05 was considered to be significant.

Clinical sample collection
From June to October 2019, laparoscopic surgeries were performed in Jiangxi Maternal and Child Health Hospital (Nanchang, China), and 6 cases were pathologically diagnosed as ovarian endometriosis. On the staging criteria of endometriosis as stipulated by American Fertility Society revised (AFS-r), all patients with endometriosis were stage IV. Eutopic endometrial tissues were collected. The average age of the patients was (32.71 ± 1.12) years. Meanwhile, 6 cases of endometrial tissue were selected from patients with benign ovarian teratoma as the control group. The average age of patients was (32.18 ± 1.22) years.
All the collected endometrial tissues were diagnosed as proliferative endometrium after pathological histological diagnosis. There was no significant difference in the age of patients in each group (P value > 0.05). All menstrual cycles were normal, non-pregnant or non-lactation, and no hormonal medication was taken 6 months before the operation, and no obvious medical and surgical diseases and complications were found.

Immunohistochemistry (IHC) and image analysis
Fresh tissue specimens were taken during the operation, rinsed with physiological saline to remove blood and other impurities, fixed with 10% formaldehyde, dehydrated with conventional gradient ethanol and embedded in paraffin, continuously sliced with a paraffin microtome, and baked at 65°C for 1 h to dewax, and removed the glass. Tablets, soak in xylene for 40  Counterstained by haematoxylin, the slides were dehydrated in graded alcohol and mounted.
Image-pro Plus software was used to convert the image format and the grayscale units into optical density (IOD) units. Then area, density and IOD were selected for measure according to the manufactor's protocol.

Statistical analysis
Student's t-test was used for statistical analysis between two different groups when variables were normally distributed, which was confirmed by Q-Q plots and the Shapiro-Wilk test (SPSS 18.0, Armonk, NY, USA). P value < 0.05 was considered statistically significant.

Identification of Differentially Expressed Genes (DEGs) using integrated bioinformatics.
All datasets (GSE7305, GSE11691 and GSE12768) were first normalised by Robust Multi-array Average (RMA) (Supplementary Figures 1-3). Differential expression analysis was performed on these datasets in limma, and those genes with P value <  Table 2). In the cellular component, DEGs were mainly involved in extracellular exosome, extracellular space and extracellular region ( Figure 3a; Table 3). In the biological process, DEGs were mainly involved in cell adhesion, epithelial cell differentiation, inflammatory response and extracellular exosome ( Figure 3a; Table 4).

Signaling pathway enrichment in DEGs.
Signaling pathway enrichment of DEGs in endometriosis was performed using Metascape. The most significantly enriched pathways were submitted to Hallmark genes hit analysis. Hallmark pathway enrichment analysis identified epithelial mesenchymal transition (EMT), estrogen response late and estrogen response early as top pathways (Figure 3b; Table 5).

Protein-protein interaction (PPI) network analysis in DEGs.
PPI analysis was performed using the online STRING database and Cytoscape software. After removing the isolated nodes and the partially connected nodes, a grid network was constructed using the Cytoscape software ( Figure 4). Pathway enrichment analysis revealed that the genes were mainly involved in vascular smooth muscle contraction, cell adhesion molecules, NF-κB pathway, complement and coagulation cascade.  (Table 5).
In PPI network analysis, CXCL12 was found to be connected to a hub gene C3,

Discussion
Endometriosis occurs in about 10-15% of reproductive age females and the etiology is unknown 1, 2 . At present there is no cure and the treatment options available are limited.
The disease has a high recurrence rate, which adds to its large socio-economic impact 18 . Endometriosis is the growth of cells derived from the endometrium outside the uterus, such as the ovaries, peritoneum, intestines and vagina 19 . In a small number In this unbiased study, we found EMT in endometriosis could be potentially induced by inflammatory cytokines such as C-X-C motif chemokine ligand 12 (CXCL12), also known as stromal cell-derived factor 1 (SDF1). CXCL12 is highly expressed in endometriosis in our analysis, which is consistent with a previous report 43 . CXCL12 interacts with its specific receptor, C-X-C motif chemokine receptor 4 (CXCR4), which is not consistently over-expressed in these three datasets though. The CXCL12-CXCR4 axis promotes proliferation, migration, and invasion of endometriotic cells 44,45 . In human papillary thyroid carcinoma, the CXCL12-CXCR4 axis promotes EMT processes by activating the NF-κB signaling pathway 46 . In a murine model of endometriosis both C-X-C motif chemokine receptor 7 (CXCR7) and CXCL12 expression increased with grafting time 47 . Expression of CXCR7 is enhanced during pathological inflammation and tumor development, and CXCR7 mediates TGFβ1-induced EMT 48 . However, there were no probes for CXCR7 in the microarrays analysed in our studies. In endometriosis, it is still unclear whether CXCL12 promotes EMT through the CXCL12-CXCR4 axis or the CXCL12-CXCR7 axis. PPI analysis showed that CXCL12 interacts directly with complement C3 and C-C motif chemokine ligand 21 (CCL21), and a previous study showede CCL21 is up-regulated in endometriosis, which acts through inflammatory responses 49 . In TGF-β-induced EMT, the expression of C-C motif chemokine receptor 7 (CCR7), the CCL21 receptor, is increased and this facilitates breast cancer cell migration 50 .
Through IHC, we confirmed that CXCL12 is significantly increased in endometriosis, accompanied by a decrease in the expression E-cadherin (CDH1), which is consistent with bioinformatics analysis. These findings, together, suggest that CXCL12 may lead to endometriosis through EMT, although further research is required.
EMT in endometriosis has been suggested to be associated with smooth muscle metaplasia and fibrogenesis 51,52 . We found various markers for smooth muscle cells in the PPI network analysis. ACTA2 (α-SMA), is considered to be a marker of fibrosis and is up-regulated in endometriosis 53 , which is consistent with our findings. Previous studies 54,55 have shown that platelet-derived TGF-β1 can activate the TGF-β1/Smad3 signaling pathway, subsequently promoting EMT and fibroblast-to-myofibroblast trans-differentiation (FMT) in endometriotic lesions in turn, promoting smooth muscle metaplasia and ultimately leading to fibrosis.

Availability of data and materials
Data and materials from this study are available upon a written request.

This study was approved by the Ethics Committee of Jiangxi Provincial Maternal and
Child Health Hospital, China (No. EC-KT-201904). All patients have signed the informed consent for the study protocol and reserve the right to withdraw at any time.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.             Graphs showing the data before (left) and after (right) normalisation.

Supplementary Figure 2. Standardisation of gene expression in GSE11691
dataset. Graphs showing the data before (left) and after (right) normalisation.

Supplementary Figure 3. Standardisation of gene expression in GSE12768
dataset. Graphs showing the data before (left) and after (right) normalisation.