Introduction

Acute appendicitis is classified clinically as uncomplicated or complicated, and 70–80% of all patients with acute appendicitis are diagnosed with uncomplicated appendicitis1. Recent clinical trials have demonstrated that antibiotic therapy for uncomplicated appendicitis may be a feasible alternative to immediate appendectomy2,3,4. Although some studies reported the usefulness of antibiotic therapy for acute appendicitis with abscess because nonsurgical treatment reduced the incidence of extensive surgical procedures and postoperative complications5,6, immediate surgical treatment remains the main approach for complicated appendicitis. Even though abdominal ultrasonography, computed tomography (CT), and magnetic resonance imaging have contributed to the diagnosis of acute appendicitis7,8,9, diagnostic discrimination between uncomplicated and complicated appendicitis using these imaging methods remains challenging. Therefore, a scoring system with reliable discrimination between uncomplicated and complicated appendicitis is required to determine the optimal treatment for patients with acute appendicitis.

Several studies have reported various scoring systems to predict complicated appendicitis10,11,12; however, few have evaluated the clinical utility of the scoring systems using external validation13,14. In this study, we developed a scoring system to discriminate between uncomplicated and complicated appendicitis, and assessed its clinical utility using external validation.

Results

Patient characteristics in the model development group

A total of 199 patients with acute appendicitis who underwent immediate surgical treatment in Secomedic Hospital comprised the model development group. Among the 199 patients, 105 patients (52.8%) were diagnosed with complicated appendicitis. The characteristics of the patients in the model development group are summarized in Table 1.

Table 1 Characteristics of 199 patients in the model development group.

Relationships between complicated appendicitis and the continuous variables

The relationships between complicated appendicitis and the continuous variables were investigated using a restricted cubic spline analysis. The analysis revealed that age and body temperature (BT) had linear relationships with complicated appendicitis, and each of these two variables was dichotomized at the point on its receiver operating characteristic (ROC) curve closest to the (0, 1) point (Supplementary Fig. S1). In contrast, body mass index (BMI), platelet count (PLT), serum C-reactive protein concentration (CRP), and maximum diameter of the appendix or abscess (MD) had non-linear relationships with complicated appendicitis, and white blood cell count (WBC) had neither a linear nor a non-linear relationship with complicated appendicitis. On the basis of the restricted cubic spline analysis, BMI, PLT, WBC, CRP, and MD were each converted into three categories as follows: BMI, ≤ 20.5 kg/m², 20.6–23.5 kg/m², and > 23.5 kg/m²; PLT, ≤ 210 × 10⁹/L, 211–260 × 10⁹/L, and > 260 × 10⁹/L; WBC, ≤ 11.5 × 10⁹/L, 11.6–14.5 × 10⁹/L, and > 14.5 × 10⁹/L; CRP, ≤ 15.0 mg/L, 15.1–70.0 mg/L, and > 70.0 mg/L; MD, ≤ 9.0 mm, 9.1–11.0 mm, and > 11.0 mm (Supplementary Fig. S2).

Determination of a final multivariate logistic regression model to discriminate between uncomplicated and complicated appendicitis

Multivariate logistic regression analysis with stepwise backward elimination was performed, and a final multivariate logistic regression model was created using six independent predictors of complicated appendicitis. The independent predictors were as follows: age > 47 years (odds ratio (OR), 2.825; 95% confidence interval (CI), 1.278–6.242; P = 0.010), BT > 37.2°C (OR, 2.230; 95% CI 1.037–4.792; P = 0.040), CRP > 70.0 mg/L (OR, 7.182; 95% CI 2.587–19.937; P < 0.001), MD > 11.0 mm (OR, 3.273; 95% CI 1.253–8.546; P = 0.015), presence of an appendicolith (OR, 3.064; 95% CI 1.387–6.770; P = 0.006), periappendiceal fat stranding (FS) grade 1 (OR, 4.073; 95% CI 1.473–11.266; P = 0.007), FS grade 2 (OR, 5.000; 95% CI 1.723–14.508; P = 0.003), and FS grade 3 (OR, 6.521; 95% CI 1.465–29.020; P = 0.014) (Table 2).

Table 2 Final multivariate logistic regression model for complicated appendicitis.

Internal validation of the final multivariate logistic regression model to discriminate between uncomplicated and complicated appendicitis

The final multivariate logistic regression model was internally validated using tenfold cross-validation. First, the 199 patients were randomly split into a training data set comprising 70% of the patients and a test data set comprising the remaining 30%. Next, the training data set was randomly split into 10 equal-sized subsamples. Nine subsamples were used to train the model, and the remaining subsample was retained to test it. This process was repeated 10 times, with each of the 10 subsamples used exactly once for testing, and the 10 results were averaged to produce a single estimate. In the training data set, the model accuracy was 0.799 and Cohen’s kappa was 0.595. When the model was then applied to the test data set, the accuracy was 0.763 and Cohen’s kappa was 0.521. Therefore, the tenfold cross-validation indicated that the model discriminated between uncomplicated and complicated appendicitis with moderate accuracy.
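The resampling scheme described above can be sketched in a few lines of Python. This is an illustration only: the study's analyses were run in R and SPSS, and the function names, the fixed seeds, and the truncation used to size the 30% test set are assumptions made here, not details reported by the authors. Model fitting is omitted; at each fold a logistic regression would be fitted on the `fit` indices and evaluated on the held-out indices.

```python
import random


def train_test_split_indices(n, test_fraction=0.30, seed=0):
    """Randomly split indices 0..n-1 into (train, test) lists."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_fraction)  # assumption: truncate, do not round
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10, seed=0):
    """Yield (fit, held_out) pairs; every index is held out exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, folds[i]


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary (0/1) labels: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    # Agreement expected by chance from the two marginal distributions.
    p_e = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (p_o - p_e) / (1 - p_e)


train, test = train_test_split_indices(199)
folds = list(k_fold_indices(train, k=10))
```

Under these assumptions, 199 patients give 140 training and 59 test indices, and each of the 10 folds holds out 14 training indices. Kappa is reported alongside accuracy because it corrects the observed agreement for the agreement expected by chance.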

Conversion of the final multivariate logistic regression model to a scoring system for complicated appendicitis

The final multivariate logistic regression model was converted to a scoring system. Each predictor was assigned a points value equal to its OR rounded to the nearest integer. The points values for the six independent predictors were as follows: age > 47 years, 3 points; BT > 37.2°C, 2 points; CRP 15.1–70.0 mg/L, 3 points; CRP > 70.0 mg/L, 7 points; MD 9.1–11.0 mm, 1 point; MD > 11.0 mm, 3 points; presence of an appendicolith, 3 points; FS grade 1, 4 points; FS grade 2, 5 points; and FS grade 3, 7 points. Consequently, the scores for the scoring system ranged from 0 to 25 points (Table 3).
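As an illustration, the points values listed above can be written as a small Python function. The function and parameter names are hypothetical, and the classification helper assumes that a patient is flagged when the score exceeds 12 points, following the "> 12 points" wording used in the Results; this is a sketch, not the authors' implementation.

```python
def appendicitis_score(age_years, bt_celsius, crp_mg_l, md_mm,
                       appendicolith, fs_grade):
    """Total points (0 to 25) from the six predictors listed in the text."""
    if fs_grade not in (0, 1, 2, 3):
        raise ValueError("fs_grade must be 0, 1, 2, or 3")
    points = 0
    if age_years > 47:
        points += 3
    if bt_celsius > 37.2:
        points += 2
    if crp_mg_l > 70.0:
        points += 7
    elif crp_mg_l > 15.0:  # CRP 15.1-70.0 mg/L
        points += 3
    if md_mm > 11.0:
        points += 3
    elif md_mm > 9.0:  # MD 9.1-11.0 mm
        points += 1
    if appendicolith:
        points += 3
    points += {0: 0, 1: 4, 2: 5, 3: 7}[fs_grade]
    return points


def predicted_complicated(score, cutoff=12):
    """Assumption: scores strictly above the cutoff are flagged as complicated."""
    return score > cutoff


# Hypothetical patient: 50 years, BT 37.0, CRP 20.0 mg/L, MD 10.0 mm,
# no appendicolith, FS grade 2 -> 3 + 0 + 3 + 1 + 0 + 5 = 12 points.
example_score = appendicitis_score(50, 37.0, 20.0, 10.0, False, 2)
```

The maximum of 25 points is reached when every predictor falls in its highest category (3 + 2 + 7 + 3 + 3 + 7).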

Table 3 Scoring system for discriminating between uncomplicated and complicated appendicitis.

Discrimination and calibration ability of the scoring system for complicated appendicitis

The discrimination ability of the scoring system was evaluated using a ROC curve analysis. The area under the ROC curve (AUC) of the scoring system was 0.882 (95% CI 0.835–0.929), and the Hosmer–Lemeshow test showed no significant difference in goodness of fit for the scoring system (P = 0.478). According to the ROC analysis, the cutoff point of the scoring system was set as 12 points, with a sensitivity and specificity of 82.9% and 86.2%, respectively. The positive likelihood ratio, negative likelihood ratio, and diagnostic OR were 6.01, 0.20, and 30.3, respectively (Fig. 1).
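The likelihood ratios and the diagnostic OR follow directly from the sensitivity and specificity. The short Python sketch below uses the standard definitions (positive likelihood ratio = sensitivity / (1 − specificity), negative likelihood ratio = (1 − sensitivity) / specificity, diagnostic OR = their ratio) with the rounded values reported above; the function name is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR, diagnostic OR) from proportions."""
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    diagnostic_or = positive_lr / negative_lr
    return positive_lr, negative_lr, diagnostic_or


# Sensitivity 82.9% and specificity 86.2% at the 12-point cutoff.
plr, nlr, dor = likelihood_ratios(0.829, 0.862)
print(round(plr, 2), round(nlr, 2), round(dor, 1))  # 6.01 0.2 30.3
```

Because the inputs are already rounded to one decimal place, recomputing these ratios from raw patient counts may give slightly different figures.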

Fig. 1 Assessment of the ability of the model to distinguish complicated appendicitis using a receiver operating characteristic curve. AUC, Area under the receiver operating characteristic curve; CI, Confidence interval.

Diagnostic performance difference between the radiological assessment and the scoring system for complicated appendicitis

Based on the scoring system, 100 patients with > 12 points were classified as having complicated appendicitis. In the radiological assessment, 152 patients with any one of six CT findings (presence of an appendicolith, ileus, abscess, extraluminal air, ascites, or FS ≥ grade 2) were classified as having complicated appendicitis, with a sensitivity and specificity of 91.4% and 40.4%, respectively. The radiological assessment and the scoring system were concordant in 139 patients (69.8%) (Supplementary Table S1), and their diagnostic performance differed significantly (P < 0.001).
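For readers who want to see how a paired comparison of this kind works, the sketch below applies McNemar's chi-squared test to the two discordant cell counts. Supplementary Table S1 is not reproduced here, so the counts are derived from the totals reported above (152 radiological positives, 100 scoring-system positives, 139 concordant calls among 199 patients) under the assumption that "concordance" means the two methods made the same positive or negative call. Whether the authors applied the test to exactly this table, and whether they used a continuity correction (the default of R's `mcnemar.test`), is not stated, so this should be read as an illustration and not as a reproduction of their analysis.

```python
import math


def mcnemar_test(b, c, continuity_correction=False):
    """McNemar's chi-squared test (1 df) from the two discordant counts."""
    diff = abs(b - c)
    if continuity_correction:
        diff = max(diff - 1, 0)
    chi2 = diff ** 2 / (b + c)
    # Survival function of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


n_total = 199
radiology_positive = 152
score_positive = 100
concordant = 139

discordant = n_total - concordant                                # b + c
b = (discordant + (radiology_positive - score_positive)) // 2    # radiology +, score -
c = discordant - b                                               # radiology -, score +

chi2, p_value = mcnemar_test(b, c)
```

Under these assumptions, b = 56 and c = 4, which gives a chi-squared statistic of about 45.1 and a P value far below 0.001, in line with the significance level reported above.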

Assessment of the discrimination and calibration ability of the scoring system using external validation

The scoring system was evaluated using an external validation group comprising 100 patients with acute appendicitis who underwent immediate surgical treatment in Teikyo University Chiba Medical Center. Our scoring system was applied to the external validation group data, and 51 of the 59 patients with complicated appendicitis scored > 12 points (Table 4). In the external validation group, the AUC of the scoring system was 0.868 (95% CI 0.794–0.942), and the Hosmer–Lemeshow test showed no significant difference in goodness of fit for the scoring system (P = 0.125). At 12 points, the sensitivity and specificity were 86.4% and 78.0%, respectively. The positive likelihood ratio, negative likelihood ratio, and diagnostic OR were 3.93, 0.17, and 23.1, respectively. There was no significant difference in the AUC between the groups (P = 0.750) (Fig. 2). Therefore, we considered that our scoring system had equivalent discrimination ability in the model development and external validation groups.

Table 4 Characteristics of 100 patients in the external validation group.

Fig. 2 Comparison of the AUCs between the model development and external validation groups. AUC, Area under the receiver operating characteristic curve; CI, Confidence interval.

Discussion

We demonstrated that our scoring system provided good discrimination between uncomplicated and complicated appendicitis and may contribute to choosing the appropriate treatment for acute appendicitis.

Recent evidence suggests that immediate surgical intervention may not always be required for uncomplicated appendicitis2,3,4. In contrast, other studies have suggested that nonoperative management with antibiotics is not always effective in acute appendicitis with high CRP, presence of an appendicolith, large appendiceal diameter, or complicated appendicitis15,16. It is important to discriminate between uncomplicated and complicated appendicitis accurately when the optimal treatment for patients with acute appendicitis is determined. CT is regarded as a useful modality in differentiating complicated from uncomplicated appendicitis. However, Kim et al. reported that the diagnostic accuracy of CT for complicated appendicitis in experimental trials and observational studies was considered inadequate17. Although their systematic review identified 10 CT findings that indicated complicated appendicitis, nine CT features tended to have a high specificity but low sensitivity when a single feature was used to discriminate between uncomplicated and complicated appendicitis. In our study, the radiological assessment using any one of six CT findings including the presence of an appendicolith, ileus, abscess, extraluminal air, ascites, and FS ≥ grade 2 showed high sensitivity but low specificity, and there was a significant difference in diagnostic performance between the radiological assessment and the scoring system. Therefore, a scoring system including a combination of various CT features and clinical continuous variables may improve the diagnostic accuracy for complicated appendicitis.

Interestingly, there is no universal definition that clearly categorizes acute appendicitis into uncomplicated and complicated appendicitis even though various guidelines have recently been proposed18,19,20,21. Generally, perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess are classified as complicated appendicitis18,19. However, others define perforated appendicitis with periappendiceal phlegmon, abscess, and generalized peritonitis as complicated appendicitis20,21. In fact, several studies have reported various scoring systems to discriminate between uncomplicated and complicated appendicitis10,11,12, and some studies excluded acute appendicitis with abscess from their analysis. For example, Atema et al. developed a scoring system and identified age, BT, WBC, CRP, duration of symptoms, presence of extraluminal free air, periappendiceal fluid, and the presence of an appendicolith on CT as independent predictors of complicated appendicitis, excluding acute appendicitis with abscess; the AUC of the scoring system was 0.88 (95% CI 0.85–0.92)10. In comparison, Lin et al., who defined perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess as complicated appendicitis, developed two scoring models that included CRP, presence of ascites, and FS grade on CT. The AUCs of the two scoring systems were estimated at 0.878 (95% CI 0.829–0.928) and 0.879 (95% CI 0.830–0.927), respectively11. In the present study, we defined perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess as complicated appendicitis, and developed a scoring system comprising six independent predictors, namely age, BT, CRP, presence of an appendicolith, MD, and FS grade on CT. The AUC of our scoring system was 0.882 (95% CI 0.835–0.929) in the model development group. Our scoring system was considered compatible with previous systems because it included common predictors, such as BT, CRP, and FS grade, and had discrimination and calibration equivalent to those in previous studies. Moreover, the AUC of our scoring system was 0.868 (95% CI 0.794–0.942) in the external validation group, and there was no significant difference compared with the AUC of the model development group. To our knowledge, few studies have demonstrated the utility of a scoring system using an external validation group with the definition of complicated appendicitis comprising perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess.

Our study has some limitations. First, this was a retrospective study performed in two institutions and had a small sample size. Second, our external validation was not performed in a separate prospective study by different researchers. Moreover, there were differences in the patients’ characteristics between the model development and external validation groups, which may have led to bias in the model development and assessment of complicated appendicitis. Although our findings should be interpreted with caution, we believe that the usefulness of our scoring system will be confirmed by further investigation.

Conclusion

This study demonstrated that our newly developed scoring system provided good discrimination between uncomplicated and complicated appendicitis on the basis of model development and external validation groups comprising patients with acute appendicitis. Our scoring system may be useful for the prompt diagnosis of complicated appendicitis and contribute to determining the optimal treatment for acute appendicitis.

Methods

Patients

We retrospectively reviewed the electronic medical records of 299 patients with acute appendicitis who underwent immediate surgical treatment between January 2009 and September 2023. Among the 299 patients, 199 patients with acute appendicitis who underwent immediate surgical treatment in Secomedic Hospital comprised the model development group. The remaining 100 patients with acute appendicitis who underwent immediate surgical treatment in Teikyo University Chiba Medical Center comprised the external validation group. The patient inclusion criteria were age ≥ 15 years, emergency operation for acute appendicitis, and a histologically confirmed diagnosis of acute appendicitis. We excluded patients who were pregnant, did not have histologically confirmed acute appendicitis, or underwent interval appendectomy after antibiotic therapy. In this study, gangrenous appendicitis, perforated appendicitis, and acute appendicitis with abscess were defined as complicated appendicitis. Catarrhal and phlegmonous appendicitis were defined as uncomplicated appendicitis.

CT imaging assessment in acute appendicitis patients

In all patients, the presence of an appendicolith, ileus, abscess, and ascites, and MD were assessed using CT before surgical treatment. FS was graded in accordance with the method of Kim et al.22. Grades 0, 1, 2, and 3 indicated no sign of FS, mild FS of the adjacent fat (thickness < 2 mm), moderate FS of the adjacent fat confined to the mesoappendix, and severe FS extending outside the mesoappendix that was disproportionately greater than the degree of wall thickening, respectively. All patients with any one of six CT findings including the presence of an appendicolith, ileus, abscess, extraluminal air, ascites, and FS ≥ grade 2 were diagnosed as complicated appendicitis in the radiological assessment.
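The radiological rule described above is a simple "any one of six findings" test, which can be expressed as follows. The function and parameter names are illustrative and not taken from the study.

```python
def radiological_complicated(appendicolith, ileus, abscess,
                             extraluminal_air, ascites, fs_grade):
    """Return True if any one of the six CT findings named in the text is present."""
    if fs_grade not in (0, 1, 2, 3):
        raise ValueError("fs_grade must be 0, 1, 2, or 3")
    binary_findings = (appendicolith, ileus, abscess, extraluminal_air, ascites)
    # FS counts as a finding only when it is grade 2 or higher.
    return any(binary_findings) or fs_grade >= 2
```

Because a single finding is sufficient, a rule of this form tends to favor sensitivity over specificity, which is consistent with the 91.4% sensitivity and 40.4% specificity reported in the Results.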

Statistical analysis

The relationships between complicated appendicitis and the continuous variables were investigated using a restricted cubic spline analysis. Four knots were placed at the 5th, 35th, 65th, and 95th percentiles of each continuous variable23. Inflection points were identified and used to categorize each variable. Independent predictors were identified by multivariate logistic regression analysis using stepwise backward elimination with a threshold of P = 0.10. The final multivariate logistic regression model was assessed by tenfold cross-validation. The discrimination and calibration ability of the scoring system based on the final multivariate logistic regression model were evaluated using the AUC and the Hosmer–Lemeshow test, respectively. The cutoff points of the dichotomized continuous variables and the scoring system were set at the points on the ROC curve closest to the (0, 1) point24. McNemar’s chi-squared test was used to compare the diagnostic performance of two methods with paired nominal data. The DeLong test was used to compare two AUCs25. P < 0.05 indicated statistical significance. All statistical analyses were performed using R software (version 4.2.1; www.r-project.org) and SPSS (version 26.0; IBM Corp., Armonk, NY, USA) for Windows (Microsoft Corp., Redmond, WA, USA).
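The closest-to-(0, 1) rule for choosing a cutoff can be illustrated with a short Python sketch. For each candidate cutoff, the rule "value > cutoff" is scored by its sensitivity and specificity, and the cutoff whose ROC point (1 − specificity, sensitivity) has the smallest Euclidean distance to (0, 1) is kept. The function name and the toy data are made up for illustration and are not study data.

```python
import math


def closest_to_top_left_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) for the rule 'value > cutoff'
    whose ROC point lies closest to (0, 1). Labels are 1 (positive) or 0."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v > cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v <= cutoff)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        distance = math.hypot(1.0 - specificity, 1.0 - sensitivity)
        if best is None or distance < best[0]:
            best = (distance, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Toy example (hypothetical ages and outcomes, not patient data).
toy_ages = [30, 35, 40, 45, 50, 55, 60, 65]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff, sens, spec = closest_to_top_left_cutoff(toy_ages, toy_labels)
```

In this toy example the selected rule is "age > 40", with a sensitivity of 1.0 and a specificity of 0.75. If two cutoffs are equally close to (0, 1), this sketch keeps the lower one; the tie-breaking rule used in the study is not stated.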

Ethics approval and consent to participate

This study was performed in accordance with the principles of the Declaration of Helsinki. The Ethics Review Boards of Secomedic Hospital and Teikyo University Chiba Medical Center approved the retrospective data collection and analysis (reference numbers: 2021-001 and 18-171, respectively). All patients provided written informed consent for the collection of their medical data for scientific purposes.