MicroRNome analysis generates a blood-based signature for endometriosis

Endometriosis, characterized by endometrial-like tissue outside the uterus, is thought to affect 2–10% of women of reproductive age, representing about 190 million women worldwide. Numerous studies have evaluated the diagnostic value of blood biomarkers, but with disappointing results. Thus, the gold standard for diagnosing endometriosis remains laparoscopy. We performed a prospective trial, the ENDO-miRNA study, using both Artificial Intelligence (AI) and Machine Learning (ML), to analyze the current human miRNome to differentiate between patients with and without endometriosis, and to develop a blood-based microRNA (miRNA) diagnostic signature for endometriosis. Here, we present the first blood-based diagnostic signature obtained from a combination of two robust and disruptive technologies, merging the intrinsic quality of miRNAs to condense the endometriosis phenotype (and its heterogeneity) with the modeling power of AI. The most accurate signature provides a sensitivity, specificity, and Area Under the Curve (AUC) of 96.8%, 100%, and 98.4%, respectively, and is sufficiently robust and reproducible to replace the gold standard of diagnostic surgery. Such a diagnostic approach for this debilitating disorder could impact recommendations from national and international learned societies.

Using genome-wide miRNA expression profiling by small RNA sequencing from plasma available in a biobank, Vanhie et al. identified a set of 42 miRNAs with discriminative power to differentiate between patients with and without endometriosis. Expression of 41 of these miRNAs was confirmed by RT-qPCR, and three diagnostic models were built to discriminate between controls and all stages of endometriosis, minimal-mild endometriosis, and moderate to severe endometriosis. Only the model for minimal-mild endometriosis (miR-125b-5p, miR-28-5p and miR-29a-3p) exhibited an AUC of 60%, and while its sensitivity was acceptable at 78%, the specificity was only 37% [14]. Selecting some miRNAs altered in endometriosis from a large screen, Moustafa et al. reported increased expression of four serum miRNAs (miR-125b-5p, miR-150-5p, miR-342-3p, miR-451a) and decreased expression of two (miR-3613-5p, let-7b). The authors concluded that their 6-miRNA signature was able to differentiate patients with endometriosis from those with other gynecologic disorders with an accuracy > 0.9 [15]. However, overall, the studies in this field are based on small sample sizes, which limits the validation of the signatures. Furthermore, discrepancies in methodology (study design, collection, storage, sequencing techniques, and statistical approach) have a particularly strong influence on the results of small studies [4,16,17,20,26]. In addition, miRNA selection based on the highest AUC is of low accuracy since the extreme variability of the endometriosis phenotypes has a major impact on the AUC. This may explain why signatures composed of a small selection of miRNAs are of low validity, stability, and reproducibility [4,16,17,20,26]. Thus, despite the findings of these studies, no new blood-based biomarkers are currently used in clinical practice for the diagnosis of endometriosis.
Therefore, the aim of the prospective ENDO-miRNA study, using both Artificial Intelligence (AI) and Machine Learning (ML), was to analyze the current human miRNome to differentiate between patients with and without endometriosis, and to develop a blood-based miRNA diagnostic signature for endometriosis with internal cross-validation.

Materials and methods
Ethics statement. Data and plasma collection were from the prospective ENDO-miRNA study (ClinicalTrials.gov Identifier: NCT04728152). The Research Protocol (n° ID RCB: 2020-A03297-32) was approved by the ethics committee "Comité de Protection des Personnes (C.P.P.) Sud-Ouest et Outre-Mer 1" (CPP 1-20-095 ID 10476). All participants included in the study gave their written informed consent for the use of their data. All the procedures were performed in accordance with the relevant guidelines and regulations.
The study and data analysis followed the STAndards for the Reporting of Diagnostic accuracy studies (STARD) guidelines [27] (Annex 1). The study consisted of two parts: (i) biomarker discovery based on genome-wide miRNA expression profiling by small RNA sequencing using next generation sequencing (NGS), and (ii) development of a miRNA diagnostic signature according to expression and accuracy profiling using an ML algorithm [28-38].
Study population. The prospective ENDO-miRNA study included 200 plasma samples obtained from women with chronic pelvic pain suggestive of endometriosis. All the plasma samples were collected from the participants between January and June 2021. All the patients underwent a laparoscopic procedure (operative or diagnostic) and/or MRI [9][10][11][12]. The laparoscopic procedures were systematically videoed and then analyzed by two operators (CT, YD), who were blinded to the symptoms and imaging findings, to confirm the presence or absence of endometriosis. For the patients who underwent laparoscopy, diagnosis was confirmed by histology. Patients diagnosed with endometriosis without laparoscopic evaluation all had MRI findings with features of deep endometriosis with colorectal involvement and/or endometrioma, confirmed by a multidisciplinary endometriosis committee. Following exploration by laparoscopy or MRI, the women were classified into two groups: an endometriosis group, and a control group of women with various benign pathologies other than endometriosis or with symptoms suggestive of endometriosis but without clinical or MRI features and no endometriosis lesions found during laparoscopic inspection (complex patients). The study flow chart is reported in Fig. 1. The patients with endometriosis were stratified according to the revised American Society for Reproductive Medicine (rASRM) classification [39].
Plasma sample collection. The blood samples (4 mL) were collected in EDTA tubes (BD, Franklin Lakes, NJ, USA) before the surgery. The plasma was isolated from whole blood within 2 h after blood sampling by two successive centrifugations at 4 °C (first at 1900g (3000 rpm) for 10 min, followed by 13,000-14,000g for 10 min to remove all cell debris), then aliquoted, labeled and stored at −80 °C until analysis as previously described [40][41][42]. The miRNAs were automatically extracted with a Promega Maxwell® Instrument to avoid cross-contamination. Extractions and quality control (QC) were conducted in an accredited biobank (NFS96-900) to guarantee good processes. The samples were anonymized. NGS library preparation was performed individually under ISO-9001-2015 certification. QC was performed before pooling the indexed samples. After sequencing, demultiplexing was done with Illumina bcl2fastq. To avoid mixing, exchanging or cross-contamination, each sample or preparation was followed with its own Laboratory Information Management System (LIMS).
RNA sample extraction, preparation and quality control. RNA
Differential expression analysis of miRNA. Expression level quantification of the miRNAs was first determined by miRDeep2 [47]. Differential expression tests were then conducted in DESeq2 only for the miRNAs with read counts in ≥ 1 of the samples. DESeq2 integrates methodological advances with several novel features to facilitate a more quantitative analysis of comparative RNA-seq data using shrinkage estimators for dispersion and fold change [48,49]. miRNAs were considered differentially expressed if the absolute value of log2-fold change was > 1.5 (upregulated) or < 0.5 (downregulated), with a P value adjusted for multiple testing of < 0.05 [48].
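The adjusted P value cut-off described above can be illustrated with a short sketch. DESeq2 adjusts P values with the Benjamini-Hochberg procedure by default; the function below re-implements that procedure in plain Python purely for illustration and is not the code used in the study. The miRNA labels and raw P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four placeholder miRNAs (not study data).
raw = {"miR-A": 0.001, "miR-B": 0.02, "miR-C": 0.03, "miR-D": 0.8}
names = list(raw)
padj = dict(zip(names, benjamini_hochberg([raw[n] for n in names])))

# Keep only miRNAs passing the adjusted P value cut-off of 0.05.
significant = [n for n in names if padj[n] < 0.05]
```

In a real analysis this filter would be combined with the fold-change criterion on the DESeq2 output; the sketch isolates only the multiple-testing step.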

Results
Description of the ENDO-miRNA cohort. The ENDO-miRNA study included 200 patients, of whom 76.5% (n = 153) were diagnosed with endometriosis and 23.5% (n = 47) were not (controls). Among patients with endometriosis, 52% (n = 80) were staged rASRM I-II and 48% (n = 73) were staged rASRM III-IV. The majority of the control group (51%, n = 24) were women with no abnormality at diagnostic laparoscopy. The clinical and demographic characteristics of the patients are summarized in Table 1. There were no significant differences in terms of age and body mass index (BMI) between the groups. Compared to the control group, the endometriosis group had higher rates of sciatica pain (p = 0.021), dyspareunia (p < 0.001), lower back pain outside menstruation (p = 0.049), and urinary pain during menstruation (p < 0.001).
Global overview of the miRNA transcriptome. The sequencing of the 200 plasma samples for small RNA-seq provided ~ 4228 M raw sequencing reads (from ~ 11.7 M to ~ 34.98 M reads/sample). After filtering steps, we retained 39% (~ 1639 M) of initial raw reads. Among those, the majority were 20-23 nt in length, which corresponds to mature miRNA sequences. The identification of known miRNAs provided ~ 2588 M sequences, which were mapped to 2633 known miRNAs from miRBase (v22). The expressed miRNAs ranged from 666 to 1274 per blood sample. The overall composition of processed reads is shown in Annex 2.
miRNA blood-based diagnostic signature for endometriosis. The overall performance of the ML models against the 10 datasets is reported in Table 3. Across the 10 randomly generated datasets, the sensitivity, specificity, and AUC ranged from 80.6 to 96.8%, 77.8 to 100%, and 76.2 to 98.4%, respectively. The most accurate signature (n°3) after internal cross-validation provides a sensitivity, specificity, and AUC of 96.8%, 100%, and 98.4%, respectively (Table 3).
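As a reminder of how the metrics above are defined, the following minimal Python sketch computes sensitivity and specificity from confusion-matrix counts. The counts are hypothetical, chosen only because they round to the percentages reported for the most accurate signature; the test-set composition is not given in this section. The helper `single_point_auc` is also an assumption: it gives the area under a ROC curve built from a single operating point, which equals the mean of sensitivity and specificity, and the study may have computed its AUC differently.

```python
def sensitivity(tp, fn):
    """True positive rate: diseased patients correctly identified."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: controls correctly identified."""
    return tn / (tn + fp)


def single_point_auc(sens, spec):
    """Area under a ROC curve defined by one operating point
    (trapezoid through (0,0), (1-spec, sens), (1,1))."""
    return (sens + spec) / 2


# Hypothetical counts for illustration only (not reported in the study).
tp, fn = 30, 1   # endometriosis patients: detected / missed
tn, fp = 9, 0    # controls: correctly negative / falsely positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
auc = single_point_auc(sens, spec)

# Express as percentages rounded to one decimal place.
report = {
    "sensitivity_pct": round(100 * sens, 1),
    "specificity_pct": round(100 * spec, 1),
    "auc_pct": round(100 * auc, 1),
}
```

With a classifier that outputs continuous scores, the AUC would instead be computed over all thresholds, so the single-point value should be read as a lower-resolution summary.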

Relation between pathophysiology of endometriosis and miRNA expression. Among the 86 miRNAs composing the diagnostic signature, 40.7% (35/86) have not previously been described in humans. The remaining miRNAs have been described in both benign and malignant conditions (Table 4). Almost 30% of the 86 miRNAs are downregulated, and many of them are related to the PI3K/Akt and MAPK pathways. Figure 3 illustrates the network, pathways, and functions for the relevant miRNAs associated with these pathways [55,56]. Only miR-124-3p has previously been reported in patients with endometriosis. Details concerning the exhaustive signaling pathways and targeted regulators are summarized in Annex 4.

Discussion
We present here a blood-based diagnostic signature combining a selected panel of 86 miRNAs extracted from patients with chronic pelvic pain suggestive of endometriosis participating in the prospective ENDO-miRNA study.
To the best of our knowledge, this is the first blood-based diagnostic signature obtained from a combination of two robust and disruptive technologies merging the intrinsic quality of miRNAs to condense the endometriosis phenotype (and its heterogeneity) with the modeling power of AI. The most accurate signature provides a sensitivity, specificity, and AUC of 96.8%, 100%, and 98.4%, respectively, and is sufficiently robust and reproducible to replace the gold standard of diagnostic surgery. We hypothesize that this signature could have large implications for clinical practice in improving endometriosis care pathways by significantly reducing time to diagnosis and therapeutic wandering.
www.nature.com/scientificreports/
In the specific setting of endometriosis, multiple biomarkers [13,18,64], genomic analyses [32,57], questionnaires [5,58,59], symptom-based algorithms [5], and imaging techniques [12] have been advocated as screening and triage tests for endometriosis. However, to date, none have demonstrated sufficient clinical accuracy, i.e., a sensitivity of 0.94 and specificity of 0.79 [12,13,18]. The present signature composed of 86 miRNAs exceeds the required sensitivity and specificity metrics, suggesting high clinical value. In addition, as stated by Agrawal et al. [4], a relevant biomarker for clinical use should be (i) specific to the disorder, (ii) associated with the early stage of the disease, (iii) accessible and acceptable through a non-invasive procedure, (iv) biologically stable and clinically reproducible, and (v) associated with known or potential pathophysiological mechanisms. Therefore, to meet Agrawal et al.'s criteria and improve endometriosis diagnosis, the prospective ENDO-miRNA study was designed to analyze the entire human miRNome, especially for (i) complex women (women with chronic pelvic pain suggestive of endometriosis and both negative clinical examination and imaging findings), (ii) women with various phenotypes based on early and advanced stages (I-II vs III-IV rASRM), and (iii) women with other gynecologic disorders sharing the symptoms of endometriosis. The exhaustive analysis of all miRNAs (n = 2633) from 200 blood samples of patients with and without endometriosis allows the complexity of the disease to be captured and, ultimately, its heterogeneity to be illustrated. This analysis resulted in the combination of a large set of 86 miRNAs robustly selected by 10 reproducible statistical methods (and not only based on the AUC criterion, as in previous reports). miRNA selection based purely on the highest AUC is of low accuracy because the extreme variability of endometriosis has a major impact on AUC. This point may explain the low validity, stability and reproducibility of using a few miRNAs to design a signature.
To date, only studies evaluating a limited number of miRNAs [14,17,20,21,26] using classic logistic regression have been published. These studies show that some miRNAs are deregulated in patients with endometriosis. For example, in a retrospective study using blood samples from a biobank, Vanhie et al. [14] failed to build a signature based on 42 miRNAs divided into three models of three miRNAs each, mainly because the authors focused on the accuracy of each miRNA to design a signature. In agreement with Lopez-Rincon et al. [36][37][38], it would appear illusory that endometriosis, a highly heterogeneous multifactorial disorder with various phenotypes and characterized by incomplete knowledge of the various pathologic pathways, could be reflected by a few miRNAs. Therefore, we decided (i) to select specific miRNAs based on 10 statistical methods (resulting in a selection of 86 miRNAs), and (ii) to use several highly accurate ML models which support the value of AI technology as a disruptive approach. Such an approach has been previously validated in cancer, showing that a 100-miRNA signature was sufficiently stable to provide almost the same classification accuracy across different types of cancers and platforms [36,37].
Numerous studies have evaluated blood or plasma miRNA expression as potential biomarkers for endometriosis but with discordant results, probably because of study design issues but also because of limitations inherent to the biological techniques used [17]. For example, Yang et al. [60] found 61 miRNAs (36 downregulated and 25 upregulated) differentially expressed in the serum of patients with endometriosis by array analysis, but only five were validated by qRT-PCR. These data underline the importance of NGS platforms for miRNA profiling. Although considerable computational support is needed, these platforms offer high sensitivity and resolution and excellent reproducibility, allowing the analysis of millions of RNA fragments. As described by 't Hoen et al. [61], bioinformatics allows the exhaustive analysis of all RNA fragments that can be aligned and mapped, and their expression levels quantified, thus eliminating the need for sequence-specific hybridization probes or qRT-PCR, which are required in a microarray [62].
From a pathophysiologic point of view, a systematic review revealed that 45% of the 86 miRNAs composing our endometriosis signature have not previously been reported in humans. Only miR-124-3p has previously been reported in patients with endometriosis, and is involved in ectopic endometrial cell proliferation and invasion in both benign and malignant disorders [63]. In addition, miR-124-3p has been found to be involved in various signaling pathways such as mTOR STAT3, PI3K/Akt, NF-κB, ERK, PLGF-ROS, FGF2-FGFR, MAPK, GSK3B/β-catenin [64,65]. The remaining miRNAs of the signature have previously been identified as being involved in both benign and malignant disorders, with the main signaling pathways being JAK/STAT, NF-κB, YAP/TAZ, PI3K/Akt, Wnt/β-catenin, FOXO, MAPK, p53, mTOR and TGF-β. All these data open new avenues to better understand the pathophysiology of endometriosis and to develop new therapeutic options already used in other pathologies.
Some limitations of the present study deserve to be discussed. First, some of our patients, in both the endometriosis and control groups, had a prior hormonal treatment that may have affected miRNA expression. However, Vanhie et al. reported that no miRNAs changed significantly with the menstrual cycle [14]. Moreover, Moustafa et al. found that miRNAs remained unchanged both throughout the menstrual cycle and in response to sex steroid hormone treatment [15]. Second, among the 10 miRNAs with the most important diagnostic value, only miR-124-3p has been previously reported in the setting of endometriosis, which suggests that external validation is required. Third, our signature was based on patients aged between 18 and 43 years, excluding adolescents with pelvic pain. Therefore, an additional study should be performed for adolescent patients. Fourth, although no difference was observed in miRNA expression between patients with dysmenorrhea under or over VAS 7, no attempt was made to correlate symptoms with the various locations of endometriosis. Finally, some patients with deep endometriosis and/or endometrioma were included in the endometriosis group without having undergone laparoscopy, and this represents a potential bias. However, the meta-analysis by Nisenblat et al. demonstrated that MRI fulfills the criteria for a replacement and SnNout triage test for endometrioma, colorectal and pouch of Douglas obliteration related to endometriosis [12].