## Introduction

Endometriosis is a chronic gynecologic condition1 estimated to affect 190 million women worldwide2. This benign but often debilitating condition is thought to affect ~10% of women, based on extrapolations of pelvic pain and subfertility in the general population3; among symptomatic women, the prevalence is thought to be 30% to 50%4. True prevalence rates are difficult to estimate because this condition is often underreported, undiagnosed or misdiagnosed1. In Canada, the national societal burden of endometriosis is estimated at CAD \$1.8 billion annually based on treatment costs, caregiver costs, quality of life and work absenteeism5. Endometriosis poses a large economic and disease burden on society, and the precise scope of the problem remains unknown.

Endometriosis is characterized by extrauterine growth of endometrial-like tissue in areas of the pelvis (i.e., ovaries), bowels, bladder, and peritoneum6. These growths are rarely found in the thoracic region, and other organ systems7,8. Endometriosis has three predominant phenotypes: superficial endometriosis, endometriomas and deep endometriosis (DE)8,9. There are many staging systems for endometriosis, including the American Society for Reproductive Medicine classification system: stage I (minimal), stage II (mild), stage III (moderate), and stage IV (severe)10,11. However, given the complexity of this disease, it is difficult to universally stage and characterize under the present systems. Significant research has been done in recent years in attempts to elucidate the pathogenesis of this disease and many etiological factors are currently being explored including immune-mediated, inflammatory, genetic and environmental components12,13.

The signs and symptoms of this disease are non-specific and can vary in severity, creating clinical heterogeneity, which adds to the diagnostic difficulty associated with this disease8. Patients can present with a range of symptomatology depending on the type of endometriosis, location of implants, stage, and severity including but not limited to dysmenorrhea, dyspareunia, abdominal pain, chronic pelvic pain, menorrhagia, bowel symptoms, urinary symptoms, and subfertility or infertility8. Due to the combination of non-specific symptoms, a long differential list, lack of provider awareness, unnecessary investigations, and a lack of non-invasive diagnostic tools, many patients experience significant delays in receiving an endometriosis diagnosis1,14,15,16. The current literature has documented diagnostic delays of up to 6 to 12 years globally before patients receive a definitive diagnosis and adequate management1,17,18. Currently, the gold standard diagnostic procedure for endometriosis remains laparoscopic visualization of lesions followed by histologic confirmation of ectopic endometriotic implants8, a costly and invasive process that requires a skilled clinician. Transvaginal ultrasonography is a commonly used clinical technique in endometriosis screening and diagnosis, given its non-invasive nature and widespread accessibility8.

Over the past 5 years, artificial intelligence (AI) has spread rapidly into healthcare; it has demonstrated marked potential in disease diagnostics, treatments, and higher-level analysis of large biomedical datasets19,20. With the increase in digitization in healthcare, AI presents novel opportunities to decrease the amount of time required for diagnosis and to streamline care in many settings19. Machine learning (ML) is a subset of AI and includes common methods such as logistic regression (with the use of training and test sets) and support vector machines (SVMs)19. Currently, AI has been used to analyze multi-omics, clinical, behavioral/wellness, environmental and research and developmental data19, and it has been applied to decision-making, patient self-management, triage, understanding disease mechanisms, and drug discovery21,22. However, AI methods require an expert’s oversight to help inform the model’s development since clinical problems are often complex and multifaceted19. Additionally, the privacy and the security of patient data remain a consideration when introducing new technology into healthcare; thus, researchers should be aware of any risks associated with AI models19.

From fetal heart monitoring to reproductive medicine, AI technologies have been used in the field of obstetrics and gynecology and have demonstrated the potential to significantly aid in prediction of outcomes22,23,24,25. Given the diversity of its use in the clinical context, there is great potential to apply AI to the complex challenges presented by endometriosis and improve non-invasive diagnostics to reduce the delays and human error associated with diagnosis22. However, clinicians face significant challenges in the field of AI applications including a widespread lack of understanding about different AI methods and the competencies and limitations of AI technologies21. This review examines the different ways AI methods have been applied to solve pressing issues in endometriosis diagnostics, prediction, and research as shown in Fig. 1. By providing a thorough understanding of the different models and their application to clinical problems, and by analyzing their strengths and limitations, recommendations will be provided to help future researchers adequately develop AI models to advance the field of endometriosis.

## Results

### Study selection

A total of 1309 titles were identified by searching the PubMed, Medline-OVID, EMBASE, and CINAHL databases, and 115 full-texts were eligible for screening after studies were excluded during the title and abstract-screening stages. Of these, 79 papers were excluded based on our exclusion criteria and 36 studies were included in the final review (Fig. 2). A summary of the eligible studies and extracted study characteristics is shown in Table 1. Most studies used retrospective designs (n = 20) drawing on data from large clinical databases and registries, and the remainder used prospective designs (n = 16); no randomized studies were included. Sample sizes ranged from 26 patients with endometriosis26 to 1396 symptomatic patients27, with an average sample size of 245 individuals for studies exploring diagnosis and prediction in endometriosis.

### Study characteristics

In the field of endometriosis, AI utilization spanned three overarching categories: predicting outcomes in endometriosis populations, building diagnostic models, and improving research efficacy. Most interventions were developed to assist with prediction of endometriosis in patients. However, the type, stage and specific characteristics of endometriosis that these interventions predicted differed among the studies, depending on the research question posed by the authors. Approximately 44.4% (n = 16) of the studies analyzed the predictive capabilities of AI approaches in patients with endometriosis, while 47.2% (n = 17) explored diagnostic capabilities. The predictive aims differed between studies and included predicting fertility therapy success in endometriosis patients, the likelihood of endometriosis versus other pelvic pain pathologies, the presence of DE, and many more, as seen in Table 1. Only 8.3% (n = 3) of the studies used AI technologies to advance the understanding of disease pathophysiology28,29,30. The AI methods that were used included: logistic regression, K-nearest neighbor, Naïve Bayes, random forest, decision tree, SVMs, neural networks, classification tree analysis, genetic algorithm, least squares support vector machines (LSSVMs), partial least squares discriminant analysis (PLSDA), margin tree classification, quick classifier algorithm, quadratic discriminant analysis (QDA), natural language processing (NLP), principal component analysis (PCA), adaptive boosting, eXtreme gradient boosting, voting classifier (hard/soft), deep learning and new ensemble ML classifiers. However, logistic regression (n = 15) was the AI intervention most frequently used to build predictive and diagnostic models.

The types of inputs used in different AI models varied among the studies. Four studies used biomarkers as the specific inputs for their final predictive model, but the types of biomarkers differed and included angiogenic factors, cytokines, serum microRNA signatures, and other metabolite biomarkers. Some studies also used metabolite spectra as inputs for their AI models (n = 10); however, there was significant diversity in the type of spectrometry or spectroscopy method (e.g., Raman spectroscopy versus hydrogen nuclear magnetic resonance [1H-NMR] Carr-Purcell-Meiboom-Gill [CPMG] spectroscopy) and in the specific mass-to-charge ratio (m/z, mass divided by charge number) peak ranges that were used among the studies. Other studies also used genetic variables such as large transcriptomics datasets (n = 5) and clinical factors (n = 6) as inputs for their final models. The clinical factors that were used in different models demonstrated some similarity, with age, history of pelvic surgery, dysmenorrhea, and pelvic pain being commonly used variables. However, many studies used different combinations, thresholds and classifiers for these variables in their models. For instance, various combinations of severe dysmenorrhea, primary dysmenorrhea, and secondary dysmenorrhea were used in different ML models.

Although the AI approaches were heterogeneous, most models achieved sensitivity and specificity above 85%; the values for each model are reported in Table 1. All of the studies (n = 33) used a validation process to train and validate AI models, either with various methods of cross-validation (e.g., the bootstrapping method or leave-one-out cross-validation) or by implementing a validation/test cohort not used in the initial training set.
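To make the reported evaluation metrics concrete, the following minimal Python sketch computes sensitivity, specificity and accuracy from confusion-matrix counts. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they are not taken from any included study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: diseased patients classified correctly/incorrectly.
    tn/fp: non-diseased patients classified correctly/incorrectly.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    sensitivity = tp / (tp + fn)            # true positive rate
    specificity = tn / (tn + fp)            # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
se, sp, acc = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
print(f"SE={se:.1%} SP={sp:.1%} accuracy={acc:.1%}")
```

Because accuracy depends on the ratio of diseased to non-diseased patients in the sample, a model can show high accuracy alongside low sensitivity, which is one reason the three values are best read together.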

Given the heterogeneity in the purpose of the AI intervention, type and stage of endometriosis being examined, type of AI methodology used, and evaluation metrics, the included studies were grouped into six categories based on the inputs used to create the AI models. These categories are discussed in detail below.

### Diagnostic or predictive models for endometriosis using biomarkers

Four different studies31,32,33,34,35 examined the use of biomarkers as inputs to create diagnostic or predictive AI models in endometriosis populations. As seen in Table 2, the type of biomarkers used differed among the studies. Knific et al.31 was the only study that used protein ratios while others used metabolites33, miRNAs35 and other biomarkers34. Knific et al.31 and Bendifallah et al.35 were the only studies in this category to use the random-forest method to develop a diagnostic model for endometriosis. The accuracy of Knific et al.’s31 model was reported to be 59%31, the lowest of all the models in this category, while the clinical accuracy of Bendifallah et al.’s35 model was considerably higher, with a sensitivity and specificity of 96.8% and 100%, respectively. One study used LSSVMs34 and the accuracy of this method was deemed to be 79% with a sensitivity and specificity of 82% and 75%, respectively. One study also used SVMs to develop a diagnostic model for endometriosis using lipidomic profiling of endometrial fluid in patients with ovarian endometriosis33. The accuracy of this method was reported to be 85.7% with a sensitivity and specificity of 58.3% and 100%, respectively. It should be noted that among the four studies that were examined, there were no commonalities in the specific biomarker inputs used; thus, it is difficult to compare the accuracy of each AI model. The pooled sensitivity (SE) and specificity (SP) for each study’s most accurate model were 85.6% and 85%, respectively33,34,35.

### Diagnostic or predictive models for endometriosis using protein spectra

Ten studies26,36,37,38,39,40,41,42,43,44 used various metabolite spectra as their primary inputs to develop diagnostic and predictive models in endometriosis populations. In this specific problem formulation, it is important to note the methodology that is used. The most popular method to determine metabolite spectra for model development was surface-enhanced laser desorption/ionization time-of-flight mass spectrometry, which was used by four studies26,41,43,44. The pooled SE for the models with highest accuracy in each study was 91.7%, while the pooled SP was 81.1%26,37,38,39,40,41,42,43,44. Table 3 presents the other methods of spectrometry and spectroscopy that were used to determine the metabolite spectra of interest for the model inputs.

Among the studies in this category, artificial neural networks (ANNs) were the most popular method and were used in three of the models26,38,44. However, although these three studies used the same type of AI intervention, the inputs varied greatly between them. Two studies used PLSDA to compute their final models36,42, albeit using different methodologies (mass spectroscopy36 and 1H-NMR spectrophotometer42). While the inputs also varied between the two models, they had similar correct classification rates of 84%36 and 86.67%42. Further studies using similar inputs are needed to determine if PLSDA is an appropriate AI intervention to compute diagnostic and predictive models in endometriosis populations.

### Diagnostic or predictive models for endometriosis using clinical variables and symptoms

The six studies45,46,47,48,49,50 grouped in this category predominantly used logistic regression; two studies50,51 used decision tree methods to build a model and one study50 also used random forest, eXtreme gradient boosting and voting classifier (soft/hard) ML algorithms, as shown in Table 4. Interestingly, many studies in this category examined predictive and diagnostic model capabilities in patients with some form of deep endometriosis (n = 5). The pooled SE for the models with highest accuracy in each study was 81.7% while the pooled SP was 91.6%47,48,49,50. Specific inputs into each model varied, as seen in previous categories, with Bendifallah et al.50 using the largest number of clinical features for their models. However, there were some commonalities in the types of inputs that were used in each model. Patient age was the most frequently used input (n = 5) in diagnostic and predictive models using clinical variables. Given that endometriosis most commonly presents in reproductive-aged women, it is not surprising that age is the most frequent input in a diagnostic/predictive AI model. Other significant inputs included the presence or severity of dysmenorrhea, presence or severity of dyspareunia, visual analog scale for dyspareunia, infertility, and previous surgery for endometriosis or pelvic surgery. Among the studies that did report SE and SP metrics, the SE values ranged from 51% to 95% and SP values ranged from 77.1% to 95.7%47,48,49,50.

### Diagnostic or predictive models for endometriosis using genetic variables

Models that were built using genetic variables as their primary inputs used a significantly larger number of inputs than models in any of the other input categories referenced in this review. Only five studies52,53,54,55,56 used genetic variables to build their predictive and diagnostic models; however, the type of input varied between individual gene candidates52,56, large protein-coding gene datasets from transcriptomics and methylomics data53,55, and 16S rRNA gene amplicon data54. The AI methods used in this category included: deep ML algorithm, decision tree, GenomeForest (a new ensemble ML classifier), random-forest-based ML classification analysis, PLSDA, SVM, random forest, and margin tree classification. The pooled SE for the models with highest accuracy in each study was 96.7%, while the pooled SP was 70.7%52,53,55.

Two studies built different AI models from large transcriptomics and methylomics datasets and compared them with each other53,55. As seen in Table 5, regardless of which AI method was used, the models built using the transcriptomics dataset outperformed the models built with the methylomics dataset, albeit marginally. Akter53 used GenomeForest, a novel ensemble technique based on chromosomal partitioning, to classify endometriosis and control samples using both transcriptomics and methylomics datasets. The authors concluded that this new classifier could help identify candidate biomarkers for endometriosis; they further demonstrated that three different ML models (GenomeForest, decision tree, and Biosigner) independently identified NOTCH3 as a candidate gene with differential expression in the endometriosis samples53,55. ML methods may be of particular use when analyzing very large genomic datasets to help identify candidate genes that have altered expression in endometriosis patients versus control samples.

### Diagnostic or predictive models for endometriosis using mixed variables

Three studies27,57,58 used mixed variable types to create predictive or diagnostic models for endometriosis as shown in Table 6. All three studies used logistic regression as the methodology to construct models and the sample sizes ranged from 119 patients57 to 1396 patients27. Inputs included clinical variables collected from patient medical history, physical exam findings, ultrasonography evidence, and MRI visualization. It should be noted that Chattot et al.57 had the smallest sample size. The study with the largest sample size27 reported a SE and SP of 82.6% and 75.8%, respectively. The accuracy for studies in this category was relatively consistent compared to other categories with similar SE and SP.

### Diagnostic or predictive models for endometriosis using imaging

Only three studies59,60,61 explored the use of imaging variables as their primary inputs for their AI models as seen in Table 7. Guerriero59 built models specifically for rectosigmoid endometriosis and compared the accuracy of the different AI methods using the same inputs for each model. This specific study allows one to draw conclusions about the accuracy of different methodologies in developing predictive models to increase suspicion for rectosigmoid endometriosis. The Naïve Bayes and SVM approaches produced the models with the highest accuracy (75%) in this study and K-nearest neighbor produced the lowest accuracy (69%). SVM also produced the highest SE at 84% while Naïve Bayes and decision tree showed the highest SP (77%). The pooled SE for the models with highest accuracy in each study was 88% while the pooled SP was 89.7%59,60,61.

Reid et al.60 also produced two logistic regression models using different imaging variables; the accuracy of both models was higher than that of the logistic regression model produced by Guerriero et al.59, indicating that perhaps the inputs for Reid’s model60 played a role in the higher accuracy, SE and SP. All three studies in this category explored the “sliding sign” on transvaginal ultrasound as an important feature in their models.

Maicus et al.61 was the only study to use a deep learning model, called ResNet(2+1)D, to classify the state of the pouch of Douglas with regard to adhesions indicative of endometriosis in patients. Their model was trained, internally validated, and externally tested on a dataset to evaluate the sliding sign on ultrasound, demonstrating an accuracy of 88.8%.

## Discussion

In the field of endometriosis, AI interventions have proven to be heterogeneous in terms of their purpose, methodology, input selection and accuracy. Given the wide range of problems that exist in the field of endometriosis diagnosis, prediction and research, it is not surprising that models were built to tackle many different problem formulations. This study performed a thorough scoping review of the literature intersecting endometriosis and AI, and it provides a timely understanding of AI technology in the field of endometriosis. A meta-analysis of the data was not possible due to the diverse nature of studies included in this scoping review. Our study identified six major categories of model inputs that were used to build AI interventions, in addition to three studies that used AI methods to improve research techniques28,29,30 and one study that only used lesion characteristics to build a predictive model62. Of the six major input categories, biomarkers, clinical variables, genetic variables and metabolite spectra were the most frequently used input types for building diagnostic and predictive AI models.

AI interventions that were built using biomarker inputs included diagnostic and predictive models for ultrasound-negative endometriosis34, and ovarian endometriomas33. Biomarker inputs for these models included plasma biomarkers collected in all phases of the menstrual cycle34, lipidomic profiling of endometrial fluid33, and serum miRNA markers35. AI interventions built using metabolite spectra as their primary input included detecting endometriosis in serum samples43,44, screening for biomarkers in eutopic endometrium26, diagnosing ultrasound-negative endometriosis40, diagnosing endometriosis using messenger RNA expression in endometrium biopsies41, identifying predictive serum biomarkers42, diagnosing and staging endometriosis using peptide profiling39, determining classifier metabolites for early prediction risk38, and diagnosing stage 3 and stage 4 endometriosis in infertile patients36. Studies that used genetic variables to build AI interventions included classifying endometriosis using RNA-seq and enrichment-based DNA-methylation datasets53, diagnosing endometriosis using gut and/or vaginal microbiome profiles54, using transcriptomics or methylomics to classify endometriosis55, and staging pelvic endometriosis using genomic data56. Some studies also used clinical signs and symptoms collected when obtaining a patient’s medical history as well as other clinical variables to build models. These AI interventions included predicting the presence of posterior deep endometriosis in patients with chronic pelvic pain symptoms49, predicting pregnancy rates in patients with endometriosis48, predicting medical care decision rules for patients with recurrent endometriomas51, diagnosing DE pre-operatively for patients with endometriomas47 and differentiating between patients with and without endometriosis50.

Our scoping review evaluated the current literature and mapped out the field of study to demonstrate that AI applications in endometriosis look promising for improving diagnostics, research efficacy and outcome prediction in this patient population. Pooled SE ranged between 81.7% and 96.7% and pooled SP ranged between 70.7% and 91.6%. Our review included a range of heterogeneous study designs, large retrospective analyses, various ML interventions and diverse research questions in the field of endometriosis. This is a timely review providing clinicians and computer scientists with an extensive understanding of AI applications in endometriosis. Clinical decision-making by humans is often prone to errors, biases and heuristics63. However, this review shows strong promise for AI’s ability to mitigate these human errors and provide superior outcome prediction with high SE and SP. Although many of the studies included in this review relied on a human component for data analysis/collection and determining feature extraction, AI technologies (especially when using standardized and validated models) may present the potential to reduce diagnostic error that can result from individual practicing biases and clinical heuristics. Future studies with human comparators are required to determine this. This review also demonstrated how AI can be used to improve research efficacy, particularly through the use of natural language processing28 and identification of potential biomarkers30 and diseases29 associated with endometriosis pathophysiology. Lastly, this scoping review adds to future recommendations for research in this field and supports the need for standardized guidelines for ML applications in medicine.

Approximately 44.4% (n = 16) of AI interventions were predictive models meant to predict various outcomes in patients with endometriosis or undifferentiated symptomatic patients. Models were built to predict the presence of posterior DE in patients with chronic pelvic pain49, the clinical pregnancy rate in patients with endometriosis48, and many other outcomes in this patient population. However, many of these studies were conducted retrospectively and did not adequately assess whether the AI model outperformed existing decision tools and clinical diagnostics. Additionally, none of the studies involved a human comparator (since many models were trained and validated on retrospectively diagnosed patient datasets), which makes it difficult to comment on AI’s superiority as a tool clinicians can use for predictive modeling.

The type and stage of endometriosis varied among the included studies; thus, the AI approaches to prediction and diagnosis also differed. This makes it difficult to compare AI models used in the studies. Many studies lacked detailed information on the reference standard used to verify that patients had endometriosis, while others cited gold standard laparoscopic visualization with subsequent histopathologic confirmation as the modality of diagnosis. Additionally, the heterogeneity of the study designs, input data used, and AI interventions made it difficult to compare the accuracy and efficacy of the different models. Many studies lacked transparent descriptions of their modeling, making it difficult to critique the methodology and determine if the right AI model was being used to predict the outcome in question.

Applying AI to assess endometriosis is relatively new, and most AI methods used are still relatively simple. Various data types continue to be explored; however, to date each data type has largely been used on its own rather than in combination with others. As can be seen from the tables, the use of protein spectra continues to be perhaps the most common approach, but generally only with small sample sizes. In the future, the increasing adoption of AI in assessing endometriosis will also likely play an essential role in women’s healthcare.

Our recommendations, based on this review and challenges of employing AI, are as follows:

1. The types and stages of endometriosis included in the study sample need to be clearly defined, and models should specify what type/stage of endometriosis they are built to predict, classify or diagnose.
2. The gold standard (the reference against which the AI model is compared) has to be defined and justified to assess reliability.
3. The evaluation metrics (e.g., sensitivity and specificity) need to be tested and reported clearly.
4. Transparent descriptions of the AI model used are needed for reproducibility.
5. Multiple AI models should be applied to determine the most accurate one for specific outcomes and diagnostic goals.
6. A large sample size spanning a diverse range of ages is required to achieve generalizability.
7. Training and testing phases need to be clearly explained, specifically stating whether cross-validation or holdout is implemented.
8. Logistic regression models incorporating a training and test/validation cohort would be more effective in establishing external validation of the model.
9. Studies using retrospective analyses of large clinical datasets to build models should attempt to validate their models in prospective controlled clinical trials. Controlled clinical trials are required to determine whether AI can outperform human decision-making and remove any potential biases. Although internal validation samples are essential to test a model’s performance, these models should also be tested through prospective controlled trials to ensure that they are generalizable in a clinical context and that their performance is not limited to an artificial set of parameters.
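
The distinction between holdout and cross-validation drawn in the recommendations can be illustrated with a minimal Python sketch that only generates the sample indices for each scheme. The helper names `holdout_split` and `k_fold_splits`, the 30% test fraction, and the sample size of 20 are assumptions made for illustration and do not describe the procedure of any included study.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices once and reserve a fixed fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train, test) index lists so each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical cohort of 20 patients.
train_idx, test_idx = holdout_split(20)
splits = list(k_fold_splits(20, k=5))
print(len(train_idx), len(test_idx), len(splits))
```

Setting `k` equal to the number of samples gives leave-one-out cross-validation. In either scheme, the test indices should never be used to fit or tune the model.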

Of the 36 studies included in this review, 50% were published in the last 5 years, indicating that there is recent and rapidly growing interest in AI applications to improve diagnostic, predictive and research capabilities for a complex disease such as endometriosis. Further research should be conducted using human comparators and should include comparisons with existing scoring systems and diagnostic tools to determine AI’s superiority for predictive and diagnostic modeling in endometriosis. These AI algorithms should also be externally validated or tested through prospective controlled trials to ensure that they contribute to advancing real-world clinical practice and diagnostics. This review was able to identify this interest in AI and highlight the benefits and shortcomings of AI interventions to improve future models for endometriosis.

## Methods

### Study guidelines

Given the heterogeneity and breadth of research in this field, a scoping review was performed to summarize the use of AI applications in endometriosis research, diagnostics, and prediction to help identify gaps in knowledge and address broad research questions64. The guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for Scoping Review (PRISMA-ScR)65 and Arksey and O’Malley’s recommendations for scoping review methodology66 were followed. A prior review protocol was drafted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols67 for internal use amongst the research team but it was not externally published or registered prospectively.

### Search strategy and study eligibility

The PubMed, Medline-OVID, EMBASE, and CINAHL databases were searched sequentially from January 2000 to March 2022 for all English-language papers using the following search strategy (adapted for each database): [(Endometriosis) OR (Endometrioma)] AND [(AI) OR (ML) OR (Prediction Model) OR (Classification)]. Gray literature was not included in this scoping review in an attempt to include only peer-reviewed studies. This timeframe was chosen to reflect advances in AI technologies and applications in medicine. The scope of the search was not restricted to a particular type or stage of endometriosis. The search for this scoping review was completed in March 2022.

### Inclusion and exclusion criteria

The following inclusion criteria were used to determine study eligibility for this review: (1) the study involved assessing an AI approach or model to advance prediction, diagnosis, management or disease understanding in the field of endometriosis; (2) the study reported a quantitative metric on the accuracy/performance of the AI method; (3) the study was conducted using humans; (4) the article was accessible in English; and (5) the study used a validation method to test its model. Studies were excluded if they: (1) were not conducted using humans; (2) did not assess or evaluate an AI approach or model; (3) did not pertain to the field of endometriosis; or (4) developed a logistic regression model without the use of a training and test/validation set. One reviewer (B.S.) conducted the literature search and two reviewers (B.S. and M.E.) screened the titles, abstracts and full-texts independently for potentially eligible studies. Reference lists of eligible studies were also hand-searched but no additional studies were included on this basis.

### Study selection and data extraction

One author (B.S.) conducted the literature search, and two authors (B.S. and M.E.) independently screened the titles and abstracts for potentially eligible studies. Each potential study for inclusion underwent full-text screening and was assessed to extract study-specific information and data; Table 1 presents a summary of the title, lead author, publication year, study design, AI intervention, purpose/aim, sample size, type of inputs used in the AI method, specific inputs in the final model, evaluation metrics used and AI accuracy. Two reviewers (B.S. and M.E.) independently conducted a full-text screening and extracted information from potentially eligible studies. They then cross-checked the identified studies to determine eligibility through discussion and used consensus to resolve discrepancies. The information collated in the initial evidence table was used to aggregate data and determine the main themes of use for AI in endometriosis in the currently published literature. Where studies explored more than one AI model, the model with the highest accuracy was assessed and included in the review.

### Pooled evaluation metric

Pooled sensitivities and specificities were calculated for studies within the same input category. The following formula68 was used to combine means across different studies where SE or SP is the pooled mean for sensitivity or specificity, as follows:

$$\mathrm{SE}\ \text{or}\ \mathrm{SP} = \frac{N_1 X_1 + N_2 X_2 + \cdots}{N_1 + N_2 + \cdots} \qquad (1)$$

where, for example, $N_1$ is the number of participants in study 1 and $X_1$ is the value of the reported sensitivity or specificity in study 1.
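
As a worked illustration of Eq. (1), the following minimal Python sketch computes a sample-size-weighted pooled mean. The function name `pooled_mean`, the sample sizes and the sensitivities are hypothetical values chosen for illustration; they are not the data extracted in this review.

```python
def pooled_mean(sample_sizes, values):
    """Sample-size-weighted mean: sum(N_i * X_i) / sum(N_i), as in Eq. (1)."""
    if not sample_sizes or len(sample_sizes) != len(values):
        raise ValueError("need one value per study and at least one study")
    total = sum(sample_sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    return sum(n * x for n, x in zip(sample_sizes, values)) / total


# Hypothetical studies: N_1 = 100, N_2 = 50, N_3 = 50 participants.
sizes = [100, 50, 50]
sensitivities = [0.90, 0.80, 0.70]

pooled_se = pooled_mean(sizes, sensitivities)
print(f"pooled SE = {pooled_se:.1%}")  # (90 + 40 + 35) / 200 = 82.5%
```

In this example the unweighted mean of the three sensitivities would be 80%, whereas the pooled value is 82.5% because the largest study contributes half of the total weight.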