Introduction

Endoscopic Retrograde Cholangiopancreatography (ERCP) is the most used therapeutic procedure for pancreaticobiliary disorders. However, how to best achieve safe and effective bile duct cannulation is still debated. Despite notable developments in the past decades, the failure rate is still 5–20% in experienced hands1. Moreover, the incidence of the procedure’s adverse events is high; post-ERCP pancreatitis (PEP) has an incidence rate of 9.7%, with a mortality rate of 0.7%2.

Endoscopists performing ERCP recognize the differences in the macroscopic appearance of the major papilla. This has led to a conception that certain appearances of the papilla are more challenging to cannulate and, therefore, more prone to adverse events. Despite the essential role of bile duct cannulation in procedural safety and success, research on this topic is still limited.

A Scandinavian research group published the first inter- and intraobserver-validated classification of the major papilla’s endoscopic appearance in 20173. In the same year, they also published a multicentric prospective cohort study, indicating that the anatomy of the major papilla affects both the difficulty of the bile duct cannulation and the procedural adverse events4. Further, their results suggest that the morphology of the papilla should be considered in the training of fellow endoscopists4. Other identified studies support their results5,6.

Recently, several articles have been published assessing the influence of papilla morphology on ERCP outcomes, with contradicting results. Therefore, we aimed to systematically review and quantify the magnitude of its effect and investigate its importance and relevance in the endoscopic practice.

Methods

A systematic review and meta-analysis were conducted following the Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) Statement (see Supplementary Table 12) and the recommendations of the Cochrane Handbook7,8. The review protocol was registered in advance on PROSPERO with the registration number CRD42022360894.

Systematic search

Three databases: MEDLINE (via PubMed), Embase, and Cochrane Central Register of Controlled Trials (CENTRAL), were systematically searched from inception until the 29th of September 2022. We did not apply any filters or restrictions to our search. The main parts of the search query included terms in connection with ERCP and papilla morphology. For the detailed search strategy, see Table S1. Additionally, we systematically searched for relevant articles by reviewing the included articles’ bibliographic references and citation lists.

Eligibility criteria

The condition-context-population (CoCoPop) framework was used to identify eligible studies9. The conditions were (Co): difficult cannulation, cannulation attempts, cannulation time, cannulation failure, post-ERCP pancreatitis, and other post-ERCP adverse events (bleeding, perforation, infection) in the context of the different papilla morphologies (Co). Studies with adult patients (> 18) undergoing ERCP with a native papilla (Pop) were selected.

Randomized controlled trials, case–control, cross-sectional, and cohort studies were eligible for inclusion. Both full-text articles and conference abstracts with sufficient data were considered eligible. Regarding the definition of difficult cannulation, cannulation failure, and post-ERCP adverse events, the definitions provided in the included studies were used.

Morphology of the papilla

Primarily, for the classification of the morphology of the papilla, as the first validated intra- and interobserver classification, the Haraldsson system was used4. They classified the papilla into four types: regular (type 1), small (type 2), protruding or pendulous (type 3), and creased or ridged (type 4)3.

Secondarily, a comparison between the Haraldsson and the other identified classification systems was attempted with the following method: two endoscopists (PJH, EB) assessed the description of the morphology and the imagery of the studies. They chose the identical papilla types to Haraldsson’s. In case of any disagreement, a third reviewer was included in the decision process (ET). After the comparison, additional analyses were conducted.

Study selection and data extraction

After the systematic search, the yielded articles were imported into a reference management program (EndNote X7.4, Clarivate Analytics, Philadelphia, PA, USA) to remove the duplicates automatically and manually. After removing duplicates, two independent authors (ET, EBG) screened the remaining publications first by title and abstract and then by full text. We used Rayyan for the selection process10. Cohen’s kappa coefficient (κ) was calculated on both levels of selection to measure inter-reviewer reliability11.

Two investigators extracted data independently (ET, EBG) and manually populated it into a purpose-designed Excel 2016 sheet (Office 365, Microsoft, Redmond, WA, USA). Data were collected on the first author, year of publication, digital object identifier, period of data collection, study location, number of centers, study design, the mean or median age of the patients (with standard deviation or interquartile range), the total number of patients, the number of women, the number of patients with each papilla morphology, and data regarding the primary and secondary outcomes in the context of the different papilla types. For statistical analysis, raw data were extracted into two-by-four tables (condition yes/no; papilla morphologies).

Statistical analysis

The statistical analysis was performed by a biostatistician (DSV) with R (R Core Team 2022, v4.2.2)12. Forest plots were used to display the results of the meta-analytical calculations. The minimum study number to perform the meta-analytical calculation were three. Event rates with a 95% confidence interval (CI) were used for the effect size measure. As we anticipated considerable between-study heterogeneity, a random-effects model was used to pool effect sizes. For assessing the small study publication bias, funnel plots were used with a visual inspection. Additional sensitivity analyses were conducted using the leave-one-out method, with a minimum study number of four (see additional details in the supplementary material).

See supplementary material for additional details on the statistical analyses.

Risk of bias assessment

Two investigators (ET, EBG) independently assessed the risk of bias for each outcome using the Joanna Briggs Institute Critical Appraisal tool for studies reporting prevalence13.

Quality of evidence

Certainty of evidence was assessed following the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) recommendation14. Two independent investigators (ET, EBG) evaluated all criteria for all outcomes. Disagreements were resolved by the senior review author (BE).

Results

Search and selection

The details of the study selection process are summarised in the PRISMA flow chart shown in Fig. 1.

Figure 1
figure 1

PRISMA 2020 flowchart representing the study selection process.

A total of 6,952 studies were identified through database searching. Finally, our narrative synthesis comprised 17 studies4,5,6,15,16,17,18,19,20,21,22,23,24,25,26,27,28. Of those, 14 could be included in the quantitative synthesis4,5,6,15,16,17,19,21,22,23,24,25,26,27.

Basic characteristics of included studies

The main characteristics of the included studies are summarised in Table 1. Eligible studies were reported between 2016 and 2022. Of the 17 studies, 15 were cohort studies, eight had prospective (5, 6, 19–22, 26, 27), and seven had retrospective designs. There was also one case–control24 and one cross-sectional study19. 13 of the studies were full-text articles4,5,6,15,16,17,19,21,22,24,26,27,28, and four of them were conference abstracts18,21,23,25. Seven studies used the Haraldsson classification4,5,6,19,22,24,25, with seven additional ones using comparable classifications15,16,17,21,23,26,27. Three studies used classification systems that were not comparable to the Haralddson classification. The number of study participants ranged from 72 to 11,090.

Table 1 Basic characteristics of included studies.

Quantitative synthesis

Difficult cannulation

Nine studies were identified regarding the event rate of difficult cannulation4,15,19,20,21,22,24,25,26, of which eight were included in the quantitative synthesis4,15,19,21,22,24,25,26. In the case of studies using the classification proposed by Haraldsson, in type I papilla, the rate of difficult cannulation was lower (26%; CI 18–37) compared to the other papilla types (type III: 35%; CI 25–48; type II: 39%; CI 28–52; type IV: 41%; CI 28–55). The difference was statistically no significant; however, the p-value referred for a higher tendency for difficult cannulation in certain papilla types (p: 0.075). The heterogeneity was high (total I2: 89%; CI 48–98). Sensitivity analyses did not reveal outlier studies or relevant changes in the estimate (see Figs. 2 and S1).

Figure 2
figure 2

Forest plot representing the pooled event rate of difficult cannulation in the different papilla types in studies using the Haraldsson classification, showing a lower tendency for difficult cannulation in type I papilla compared to the other papilla types.

A similar but statistically significant result with no outlier study was observed, including all the studies with different classifications (p: 0.019; total I2: 87%; CI 5596) (see Figures S2-3).

Cannulation failure

Eight studies detailed the event rate of cannulation failure, all using Haraldsson’s or classifications comparable to it5,6,16,21,23,25,26,28. In the analysis, including studies only using the Haraldsson classification, no statistically significant difference was observed in the rate of failed cannulation between the different papilla types (p: 0.262, total I2: 61%; CI 097) (see Fig. 3).

Figure 3
figure 3

Forest plot representing the pooled event rate of cannulation failure in the different papilla types in studies using the Haraldsson classification, showing no statistically significant difference in the event rates between the papilla types.

In the case of including all eight studies, the difference was statistically significant (p: 0.047, I2: 64%; CI 091). The rate of cannulation failure was the highest in the case of type II papilla (8%, CI 4–14) and the lowest in type I (3%; CI 2–6) (see Figure S4). Sensitivity analyses did not reveal outlier studies or relevant changes in the estimate (see Figure S5).

Post-ERCP pancreatitis

Nine of the identified studies reported the event rate of PEP in the different papilla types4,5,6,15,16,17,22,25,28, of which eight articles were included in the quantitative synthesis4,5,6,15,16,17,22,25. In the case of studies using the Haraldsson classification, in type II papilla, the rate of post-ERCP pancreatitis was higher (11%; CI 815) compared to the other papilla types (type IV: 7%; CI 412; type I: 6%; CI 58; type III: 6%; CI 48). The result was statistically significant (p: 0.0441). Total homogeneity was observed (total I2: 0.044) (see Fig. 4).

Figure 4
figure 4

Forest plot representing the pooled event rate of post-ERCP pancreatitis in the different papilla types in studies using the Haraldsson classification, showing a statistically significantly higher rate of post-ERCP pancreatitis in type II papilla, compared to the other papilla types.

A similar tendency was observed in the case of including all eight studies; however, the difference between the papilla types was not statistically significant (p: 0.103) (see Figure S6). Sensitivity analyses did not reveal outlier studies or relevant changes in the estimate (see Figures S7-8).

Post-ERCP bleeding

Six eligible studies reported information about a bleeding episode after an ERCP procedure, all using the Haraldsson classification or classifications comparable to it5,6,15,16,22,25. In the analyses with only studies using the Haraldsson classification and with all classification systems, no statistically significant difference was observed in the event rate of the post-ERCP bleeding between the papilla types (p: 0.8585 and p: 0.8078, respectively) (see Figs. 5 and S9). Sensitivity analyses did not reveal outlier studies or relevant changes in the estimate (see Figures S10-11).

Figure 5
figure 5

Forest plot representing the pooled event rate of post-ERCP bleeding in the different papilla types in studies using the Haraldsson classification, showing no statistically significant difference in the event rates between the papilla types.

Qualitative synthesis

Cannulation time

Eight studies investigated cannulation time in the context of papilla morphology4,5,6,15,16,18,22,27, and four used the Haraldsson classification4,5,6,22. The time for cannulation was the lowest in type I papilla, without exception. Two-two studies reported the highest cannulation time in type II4,5 and type IV papilla6,22.

Cannulation attempts

Four studies investigated the number of cannulation attempts in the context of papilla morphology6,15,22,26, from which two used the Haraldsson classification6,22. In both cases, the cannulation attempts were the highest in type IV and the lowest in type I and III papillae.

Post-ERCP perforation

Three studies investigated the perforation rate after an ERCP procedure, all using the Haraldsson classification5,16,22. The meta-analytical calculation was impossible due to the number of zero events.

Post-ERCP infection

Four studies reported the proportion of patients with an infection after ERCP5,6,15,25; of those, three studies used the Haraldsson classification5,6,25. Chen et al. reported the highest event rate of cholangitis in type I (2.5%) and no event in type II and III papillae5. Mohammed et al. found the highest event rate of cholangitis and/or sepsis in type II (3.2%) and no event in type III and IV papillae, meanwhile in the study by Thongsuwan et al., the event rate of infection was the highest in type III (10.5%) and the lowest in type I papilla (6%)6,25.

Risk of bias and publication bias assessment

Most of the included studies carried a low risk of bias. Among the eight studies detailing difficult cannulation, two (25%) had high, and six (75%) had low risk of bias. The results of the risk of bias assessments are shown in Figures S12-19. Publication bias could not be observed in the conducted analyses. The results of the assessments are shown in Figures S20-27.

Quality of evidence

Since we included only cohort studies, the certainty of evidence ranged between very low and low for each outcome. Detailed results of the GRADE assessment can be found in Tables S4-11.

Discussion

Our systematic review and meta-analysis assessed the impact of papilla morphology on ERCP and its outcomes. We found that in studies using the Haraldsson classification, compared to the other papilla types, the event rate of difficult cannulation was lower in type I papilla. Type II papilla was associated with a twofold increase in the event rate of PEP compared to the other papilla types. There was no difference in the cannulation failure and post-ERCP bleeding event rates between the different papilla types.

Since its introduction, there have been debates regarding ERCP’s safety and success rate. Several factors seem to influence cannulation difficulties, such as age and age-related factors, including duodenal distortion; procedure-related aspects, such as duodenal positioning or certain etiologies, for example, malignant biliary obstruction. The morphology of the papilla is also assumed to be related to multiple perspectives of the procedure29.

First, papilla morphology should be considered in the training of fellow endoscopists. In the studies selected for inclusion, there are contradicting data regarding how the endoscopist’s expertise influences cannulation difficulty. Mohamed et al. found no relationship between the rate of difficult cannulation and the endoscopist’s expertise (7). In contrast, in the study by Haraldsson et al., the rate of difficult cannulation was the highest in type II papilla, where the number of trainees starting the cannulation process was the highest (5). Other studies also suggest that the operator’s experience may decrease the rate of difficult cannulation and cannulation failure (34, 35). Further data in the literature suggest that the rate of PEP and other adverse events also decreases with the endoscopist’s experience (36).

Secondly, papilla morphology also influences the rate of PEP, the procedure’s most common adverse event2. We found the highest rate of PEP in type II papilla, which is consistent with the result of the individual studies. However, the definite explanation for this pattern is still uncertain. According to Chen et al. hypothesis, it could be due to the fact that endoscopic papilla balloon dilatation (EPBD) was used more often in this papilla type in their cohort5. The same trend could be observed in the study by Mohamed et al.6. Further data in the literature suggest that EPBD with small-caliber balloons (diameter: 8–10 mm) increases the rate of PEP30.

Lastly, all the included studies observed differences in rescue techniques’ use in different papilla morphologies. It could be one of the explanations for the non-significant difference in cannulation failure between the different papilla types. We hypothesize that the morphology of the papilla should be considered when choosing a rescue cannulation technique since it decreases the difference in the tendency for cannulation failure or difficult cannulation between the papilla types. Studies suggest that a pre-cut sphincterotomy or needle-knife fistulotomy (NKF) may be used in normal papillae. Trans-pancreatic sphincterotomy could be the recommended rescue technique in small papillae. In protruding/pendulous or creased/ridged papillae, also NKF could be the preferred method31,32.

Several classification systems were identified; the Haraldsson was the most widely used and well-recognized one. Despite being the first validated classification system developed by expert endoscopists and, therefore, the basis of our analysis, it has one major limitation: it ignores the presence of a periampullary diverticulum. A modified version of the classification was proposed by Mohamed et al. in 2021, introducing an additional papilla type (type D) for papillae involved with a periampullary diverticulum6. In addition, a meta-analysis by Mui et al. found that the presence of PAD may increase the risk of cannulation failure and may also be associated with a higher risk for post-ERCP adverse events33. These results suggest that this modified version of the classification should be used.

Strengths

Despite the topic’s importance, to our knowledge, this is the first meta-analysis focusing on papilla morphology and its relation to the most relevant endpoints of the ERCP cannulation process and the rate of adverse events. A rigorous methodology was applied, with a comprehensive search key. No publication bias or outlier study was detected in any conducted analyses, and most studies carried a low risk of bias. Moreover, the number of included patients was above 20,000.

Limitations

Regardless of all the strengths, this study also had some limitations: (1) In certain analyses, considerable statistical heterogeneity was observed. Its explanation could be the clinical heterogeneity across studies, such as the difference in the applied definitions in connection with the endoscopic procedure. Most studies used the definition of the European Society of Gastrointestinal Endoscopy for difficult cannulation; however, Thongsuwan et al. used its simplified version. (2) Some of the included cohort studies were retrospective analyses. (3) The certainty of the evidence was low or very low. (4) Abstracts were also eligible for inclusion; however, all were high-quality, containing all the necessary data.

Implication for practice

Based on our results, during training of fellow endoscopists, papilla morphology should be determined, and trainees should start their learning with type I (“regular”) papillae. Using a unified classification system for papilla morphology is recommended to promote transparency in clinical practice.

Implication for research

Large sample cohorts are needed to validate the Mohammed version of the classification and assess the presence of a periampullary diverticulum. Besides the event rate, future research should also focus on the severity of PEP in the different papilla types. Furthermore, developing a recommendation system for advanced cannulation techniques in the context of papilla morphologies should be considered.

Conclusion

In conclusion, other types are associated with a higher rate of difficult cannulation compared to the regular papilla type. The small papilla is associated with a higher rate of post-ERCP pancreatitis.