Abstract
A scoring system to discriminate between uncomplicated and complicated appendicitis is beneficial to determine the optimal treatment for acute appendicitis. We developed a scoring system to discriminate between uncomplicated and complicated appendicitis and assessed the clinical usefulness of the scoring system using external validation. A total of 299 patients with acute appendicitis were retrospectively reviewed. One hundred and ninety-nine patients were assigned to the model development group, while the other 100 patients were assigned to an external validation group. A scoring system for complicated appendicitis was created using a final multivariate logistic regression model with six independent predictors. The area under the receiver operating characteristic curve of the scoring system was 0.882 (95% confidence interval: 0.835–0.929). The cutoff point of the scoring system was 12, and the sensitivity and specificity were 82.9% and 86.2%, respectively. In the external validation group, the area under the receiver operating characteristic curve of the scoring system was 0.868 (95% confidence interval 0.794–0.942), and there was no significant difference between the groups in the area under the receiver operating characteristic curve (P = 0.750). Our newly developed scoring system may contribute to prompt determination of the optimal treatment for acute appendicitis.
Introduction
Acute appendicitis is classified clinically as uncomplicated or complicated, and 70–80% of all patients with acute appendicitis are diagnosed with uncomplicated appendicitis1. Recent clinical trials have demonstrated that antibiotic therapy for uncomplicated appendicitis may be a feasible alternative to immediate appendectomy2,3,4. Although some studies reported the usefulness of antibiotic therapy for acute appendicitis with abscess because nonsurgical treatment reduced the incidence of extensive surgical procedures and postoperative complications5,6, immediate surgical treatment remains the main approach for complicated appendicitis. Even though abdominal ultrasonography, computed tomography (CT), and magnetic resonance imaging have contributed to the diagnosis of acute appendicitis7,8,9, diagnostic discrimination between uncomplicated and complicated appendicitis using these imaging methods remains challenging. Therefore, a scoring system with reliable discrimination between uncomplicated and complicated appendicitis is required to determine the optimal treatment for patients with acute appendicitis.
Several studies have reported various scoring systems to predict complicated appendicitis10,11,12; however, few have evaluated the clinical utility of the scoring systems using external validation13,14. In this study, we developed a scoring system to discriminate between uncomplicated and complicated appendicitis, and assessed its clinical utility using external validation.
Results
Patient characteristics in the model development group
A total of 199 patients with acute appendicitis who underwent immediate surgical treatment in Secomedic Hospital comprised the model development group. Among the 199 patients, 105 patients (52.8%) were diagnosed with complicated appendicitis. The characteristics of the patients in the model development group are summarized in Table 1.
Relationships between complicated appendicitis and the continuous variables
The relationships between complicated appendicitis and the continuous variables were investigated using a restricted cubic spline analysis. The analysis revealed that age and body temperature (BT) had linear relationships with complicated appendicitis, and these two variables were dichotomized at each point on the receiver operating characteristic (ROC) curves closest to the (0, 1) point (Supplementary Fig. S1). In contrast, body mass index (BMI), platelet count (PLT), serum C-reactive protein concentration (CRP), and maximum diameter of appendix or abscess (MD) had non-linear relationships with complicated appendicitis, and white blood cell count (WBC) had neither a linear nor non-linear relationship with complicated appendicitis. On the basis of the restricted cubic spline analysis, BMI, PLT, WBC, CRP, and MD were converted into three categories. The BMI, PLT, WBC, CRP, and MD were categorized as follows: BMI, ≤ 20.5 kg/m², 20.6–23.5 kg/m², and > 23.5 kg/m²; PLT, ≤ 210 × 10⁹/L, 211–260 × 10⁹/L, and > 260 × 10⁹/L; WBC, ≤ 11.5 × 10⁹/L, 11.6–14.5 × 10⁹/L, and > 14.5 × 10⁹/L; CRP, ≤ 15.0 mg/L, 15.1–70.0 mg/L, and > 70.0 mg/L; MD, ≤ 9.0 mm, 9.1–11.0 mm, and > 11.0 mm (Supplementary Fig. S2).
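The three-level categorization described above can be illustrated with a minimal Python sketch. The cut points are those reported in the text. The names `CUTS` and `categorize` are hypothetical, and the sketch assumes that a value falling between two reported bounds (for example, a CRP of 15.05 mg/L) belongs to the higher category.

```python
from bisect import bisect_left

# Upper bounds of the low and middle categories, as reported in the text.
CUTS = {
    "BMI": (20.5, 23.5),  # kg/m^2
    "PLT": (210, 260),    # x 10^9/L
    "WBC": (11.5, 14.5),  # x 10^9/L
    "CRP": (15.0, 70.0),  # mg/L
    "MD": (9.0, 11.0),    # mm
}


def categorize(variable: str, value: float) -> int:
    """Return 0 (low), 1 (middle), or 2 (high) for a continuous variable.

    A value equal to a cut point falls in the lower category, matching
    the "<=" bounds in the text.
    """
    return bisect_left(CUTS[variable], value)


print(categorize("CRP", 70.0))  # middle category
print(categorize("MD", 11.5))   # high category
```

Using `bisect_left` keeps the boundary handling in one place: a value exactly on a cut point is assigned to the category below it.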
Determination of a final multivariate logistic regression model to discriminate between uncomplicated and complicated appendicitis
Multivariate logistic regression analysis with stepwise backward elimination was performed, and a final multivariate logistic regression model was created using six independent predictors of complicated appendicitis. The independent predictors were as follows: age > 47 years (odds ratio (OR), 2.825; 95% confidence interval (CI), 1.278–6.242; P = 0.010), BT > 37.2°C (OR, 2.230; 95% CI 1.037–4.792; P = 0.040), CRP > 70.0 mg/L (OR, 7.182; 95% CI 2.587–19.937; P < 0.001), MD > 11.0 mm (OR, 3.273; 95% CI 1.253–8.546; P = 0.015), presence of an appendicolith (OR, 3.064; 95% CI 1.387–6.770; P = 0.006), periappendiceal fat stranding (FS) grade 1 (OR, 4.073; 95% CI 1.473–11.266; P = 0.007), FS grade 2 (OR, 5.000; 95% CI 1.723–14.508; P = 0.003), and FS grade 3 (OR, 6.521; 95% CI 1.465–29.020; P = 0.014) (Table 2).
Internal validation of the final multivariate logistic regression model to discriminate between uncomplicated and complicated appendicitis
An internal validation of the final multivariate logistic regression model was performed using tenfold cross-validation. First, 199 patients were randomly split into a training data set comprising 70% of the 199 patients and a test data set comprising the remaining 30% of the 199 patients. Next, the training data set was randomly split into 10 equal-sized subsamples. Of the 10 subsamples, 9 subsamples were used as the data for training the model, and the remaining subsample was retained as the data for testing the model. Furthermore, the model estimation process was repeated 10 times, with each of the 10 subsamples used exactly once as the data for testing the model. Finally, the 10 results were averaged to produce a single estimation. The model accuracy of the training data set was 0.799, and Cohen’s kappa was 0.595. Additionally, the model accuracy was predicted using the test data set. The model accuracy using the test data set was 0.763, and Cohen’s kappa was 0.521. Therefore, the tenfold cross-validation indicated that the model discriminated between uncomplicated and complicated appendicitis with moderate accuracy.
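The splitting procedure and the agreement statistic described above can be sketched as follows. This is an illustration only, not the authors' R or SPSS code: the function names, the random seed, the rounding of the 70% split, and the example confusion matrix counts are all assumptions.

```python
import random


def split_indices(n, train_fraction=0.7, k=10, seed=0):
    """Split n patient indices into a training set, a test set, and k folds.

    The folds partition the training set into near-equal subsamples.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_fraction)  # rounding rule is an assumption
    train, test = indices[:n_train], indices[n_train:]
    folds = [train[i::k] for i in range(k)]
    return train, test, folds


def accuracy_and_kappa(tp, fp, fn, tn):
    """Return accuracy and Cohen's kappa for a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return observed, (observed - expected) / (1 - expected)


train, test, folds = split_indices(199)
print(len(train), len(test), [len(f) for f in folds])

# Hypothetical counts, chosen only to show the calculation.
print(accuracy_and_kappa(tp=40, fp=10, fn=10, tn=40))
```

In each cross-validation round, one fold serves as the held-out data and the other nine are pooled for fitting; the per-fold accuracy and kappa are then averaged.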
Conversion of the final multivariate logistic regression model to a scoring system for complicated appendicitis
The final multivariate logistic regression model was converted to a scoring system. Each predictor was assigned a points value equal to its OR rounded to the nearest integer. The points values for the six independent predictors were as follows: age > 47 years, 3 points; BT > 37.2°C, 2 points; CRP 15.1–70.0 mg/L, 3 points; CRP > 70.0 mg/L, 7 points; MD 9.1–11.0 mm, 1 point; MD > 11.0 mm, 3 points; presence of an appendicolith, 3 points; FS grade 1, 4 points; FS grade 2, 5 points; and FS grade 3, 7 points. Consequently, the scores for the scoring system ranged from 0 to 25 points (Table 3).
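As an illustration, the points values listed above can be summed with a short Python function. The function name and argument names are hypothetical; the thresholds and points are those given in the text. The sketch does not apply the cutoff, because how a score of exactly 12 is classified should be checked against Table 3.

```python
# Points for periappendiceal fat stranding (FS) grades 0 to 3.
FS_POINTS = {0: 0, 1: 4, 2: 5, 3: 7}


def appendicitis_score(age, bt, crp, md, appendicolith, fs_grade):
    """Sum the points for the six predictors (range 0 to 25).

    age in years, bt in degrees Celsius, crp in mg/L, md in mm,
    appendicolith as a bool, fs_grade as an integer from 0 to 3.
    """
    score = 0
    if age > 47:
        score += 3
    if bt > 37.2:
        score += 2
    if crp > 70.0:
        score += 7
    elif crp > 15.0:
        score += 3
    if md > 11.0:
        score += 3
    elif md > 9.0:
        score += 1
    if appendicolith:
        score += 3
    return score + FS_POINTS[fs_grade]


# Hypothetical patient: 50 years, 37.5 C, CRP 80 mg/L, MD 10 mm,
# no appendicolith, FS grade 1 -> 3 + 2 + 7 + 1 + 0 + 4 = 17 points.
print(appendicitis_score(50, 37.5, 80.0, 10.0, False, 1))
```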
Discrimination and calibration ability of the scoring system for complicated appendicitis
The discrimination ability of the scoring system was evaluated using a ROC curve analysis. The area under the ROC curve (AUC) of the scoring system was 0.882 (95% CI 0.835–0.929), and the Hosmer–Lemeshow test showed no significant difference in goodness of fit for the scoring system (P = 0.478). According to the ROC analysis, the cutoff point of the scoring system was set as 12 points, with a sensitivity and specificity of 82.9% and 86.2%, respectively. The positive likelihood ratio, negative likelihood ratio, and diagnostic OR were 6.01, 0.20, and 30.3, respectively (Fig. 1).
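The likelihood ratios and diagnostic OR reported above follow directly from the sensitivity and specificity. A minimal sketch using the standard definitions (the function name is hypothetical):

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic OR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


# Model development group: sensitivity 82.9%, specificity 86.2%.
lr_pos, lr_neg, dor = likelihood_ratios(0.829, 0.862)
print(round(lr_pos, 2), round(lr_neg, 2), round(dor, 1))
```

With these inputs the rounded values agree with the reported 6.01, 0.20, and 30.3.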
Diagnostic performance difference between the radiological assessment and the scoring system for complicated appendicitis
Based on the scoring system, 100 patients with > 12 points were diagnosed as complicated appendicitis. On the other hand, 152 patients with any one of six CT findings including the presence of an appendicolith, ileus, abscess, extraluminal air, ascites, and FS ≥ grade 2 were diagnosed as complicated appendicitis using the radiological assessment with a sensitivity and specificity of 91.4% and 40.4%, respectively. There were 139 patients (69.8%) with diagnostic concordance between the radiological assessment and the scoring system (Supplementary Table S1). The diagnostic performance difference between the radiological assessment and the scoring system was evaluated, and there was a significant difference in diagnostic performance between the radiological assessment and the scoring system (P < 0.001).
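The paired comparison can be retraced from the totals reported above. The sketch below derives the discordant pairs arithmetically from the 152 radiological positives, the 100 scoring-system positives, and the 139 concordant patients, then applies McNemar's test. It assumes the test was applied to the overall positive or negative classifications and uses a continuity correction; neither detail is stated in the text, and the authors' own table is Supplementary Table S1. The function names are hypothetical.

```python
import math


def discordant_counts(pos_a, pos_b, concordant, n):
    """Derive discordant pair counts from marginal totals.

    b = positive by method A only, c = positive by method B only.
    Uses b + c = n - concordant and b - c = pos_a - pos_b.
    """
    discordant = n - concordant
    b = (discordant + (pos_a - pos_b)) // 2
    return b, discordant - b


def mcnemar(b, c, continuity=True):
    """McNemar's chi-squared statistic (1 degree of freedom) and P value."""
    diff = abs(b - c) - (1 if continuity else 0)
    stat = diff**2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


b, c = discordant_counts(pos_a=152, pos_b=100, concordant=139, n=199)
stat, p = mcnemar(b, c)
print(b, c, round(stat, 2), p < 0.001)
```

Under these assumptions, 56 patients were positive on the radiological assessment only and 4 on the scoring system only, which is consistent with the reported P < 0.001.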
Assessment of the discrimination and calibration ability of the scoring system using external validation
The scoring system was evaluated using an external validation group comprising 100 patients with acute appendicitis who underwent immediate surgical treatment in Teikyo University Chiba Medical Center. Our scoring system was applied to the external validation group data, and 51 of the 59 patients with complicated appendicitis scored > 12 points (Table 4). In the external validation group, the AUC of the scoring system was 0.868 (95% CI 0.794–0.942), and the Hosmer–Lemeshow test showed no significant difference in goodness of fit for the scoring system (P = 0.125). At 12 points, the sensitivity and specificity were 86.4% and 78.0%, respectively. The positive likelihood ratio, negative likelihood ratio, and diagnostic OR were 3.93, 0.17, and 23.1, respectively. There was no significant difference in the AUC between the groups (P = 0.750) (Fig. 2). Therefore, we considered that our scoring system had equivalent discrimination ability between the model development group and external validation group.
Discussion
We demonstrated that our scoring system provided good discrimination between uncomplicated and complicated appendicitis and may contribute to choosing the appropriate treatment for acute appendicitis.
Recent evidence suggests that immediate surgical intervention may not always be required for uncomplicated appendicitis2,3,4. In contrast, other studies have suggested that nonoperative management with antibiotics is not always effective in acute appendicitis with high CRP, presence of an appendicolith, large appendiceal diameter, or complicated appendicitis15,16. It is important to discriminate between uncomplicated and complicated appendicitis accurately when the optimal treatment for patients with acute appendicitis is determined. CT is regarded as a useful modality in differentiating complicated from uncomplicated appendicitis. However, Kim et al. reported that the diagnostic accuracy of CT for complicated appendicitis in experimental trials and observational studies was considered inadequate17. Although their systematic review identified 10 CT findings that indicated complicated appendicitis, nine CT features tended to have a high specificity but low sensitivity when a single feature was used to discriminate between uncomplicated and complicated appendicitis. In our study, the radiological assessment using any one of six CT findings including the presence of an appendicolith, ileus, abscess, extraluminal air, ascites, and FS ≥ grade 2 showed high sensitivity but low specificity, and there was a significant difference in diagnostic performance between the radiological assessment and the scoring system. Therefore, a scoring system including a combination of various CT features and clinical continuous variables may improve the diagnostic accuracy for complicated appendicitis.
Interestingly, there is no universal definition that clearly categorizes acute appendicitis into uncomplicated and complicated appendicitis even though various guidelines have recently been proposed18,19,20,21. Generally, perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess are classified as complicated appendicitis18,19. However, others define perforated appendicitis with periappendiceal phlegmon, abscess, and generalized peritonitis as complicated appendicitis20,21. In fact, several studies have reported various scoring systems to discriminate between uncomplicated and complicated appendicitis10,11,12, and some studies excluded acute appendicitis with abscess from their analysis. For example, Atema et al. developed a scoring system and identified age, BT, WBC, CRP, duration of symptoms, presence of extraluminal free air, periappendiceal fluid, and the presence of an appendicolith on CT as independent predictors of complicated appendicitis, excluding acute appendicitis with abscess; the AUC of the scoring system was 0.88 (95% CI 0.85–0.92)10. In comparison, Lin et al. developed two scoring models that included CRP, presence of ascites, and FS grade on CT to define perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess as complicated appendicitis. The AUCs of the two scoring systems were estimated at 0.878 (95% CI 0.829–0.928) and 0.879 (95% CI 0.830–0.927), respectively11. In the present study, we defined perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess as complicated appendicitis, and developed a scoring system comprising six independent predictors, namely age, BT, CRP, presence of an appendicolith, MD, and FS grade on CT. The AUC of our scoring system was 0.882 (95% CI 0.835–0.929) in the model development group. 
Our scoring system was considered compatible with previous systems because our system included common predictors, such as BT, CRP, and FS grade, and the scoring system had equivalent discrimination and calibration to those in previous studies. Moreover, the AUC of our scoring system was 0.868 (95% CI 0.794–0.942) in the external validation group, and there was no significant difference compared with the AUC of the model development group. To our knowledge, few studies have demonstrated the utility of a scoring system using an external validation group with the definition of complicated appendicitis comprising perforated appendicitis, gangrenous appendicitis, and appendicitis with abscess.
Our study has some limitations. First, this was a retrospective study performed in two institutions and had a small sample size. Second, our external validation was not performed in a separate prospective study by different researchers. Moreover, there were differences in the patients’ characteristics between the model development and external validation groups, which may have led to bias in the model development and assessment of complicated appendicitis. Although our findings should be interpreted with caution, we believe that the usefulness of our scoring system will be confirmed by further investigation.
Conclusion
This study demonstrated that our newly developed scoring system provided good discrimination between uncomplicated and complicated appendicitis on the basis of model development and external validation groups comprising patients with acute appendicitis. Our scoring system may be useful for the prompt diagnosis of complicated appendicitis and contribute to determining the optimal treatment for acute appendicitis.
Methods
Patients
We retrospectively reviewed the electronic medical records of 299 patients with acute appendicitis who underwent immediate surgical treatment between January 2009 and September 2023. Among the 299 patients, 199 patients with acute appendicitis who underwent immediate surgical treatment in Secomedic Hospital comprised the model development group. The remaining 100 patients with acute appendicitis who underwent immediate surgical treatment in Teikyo University Chiba Medical Center comprised the external validation group. The patient inclusion criteria were age ≥ 15 years, emergency operation for acute appendicitis, and a histologically confirmed diagnosis of acute appendicitis. We excluded patients who were pregnant, did not have histologically confirmed acute appendicitis, or underwent interval appendectomy after antibiotic therapy. In this study, gangrenous appendicitis, perforated appendicitis, and acute appendicitis with abscess were defined as complicated appendicitis. Catarrhal and phlegmonous appendicitis were defined as uncomplicated appendicitis.
CT imaging assessment in acute appendicitis patients
In all patients, the presence of an appendicolith, ileus, abscess, and ascites, and MD were assessed using CT before surgical treatment. FS was graded in accordance with the method of Kim et al.22. Grades 0, 1, 2, and 3 indicated no sign of FS, mild FS of the adjacent fat (thickness < 2 mm), moderate FS of the adjacent fat confined to the mesoappendix, and severe FS extending outside the mesoappendix that was disproportionately greater than the degree of wall thickening, respectively. All patients with any one of six CT findings including the presence of an appendicolith, ileus, abscess, extraluminal air, ascites, and FS ≥ grade 2 were diagnosed as complicated appendicitis in the radiological assessment.
Statistical analysis
The relationships between complicated appendicitis and the continuous variables were investigated using a restricted cubic spline analysis. The four knots were placed at the 5th, 35th, 65th, and 95th percentiles of each continuous variable23. Any inflection point was identified so that it could be used to categorize the variable. Independent predictors were identified by multivariate logistic regression analysis using stepwise backward elimination, with a significance threshold of P = 0.10. The final multivariate logistic regression model was assessed by tenfold cross-validation. The discrimination and calibration ability of the scoring system based on the final multivariate logistic regression model were evaluated using the AUC and Hosmer–Lemeshow test, respectively. The cutoff points of the dichotomized continuous variables and scoring system were set at the points on the ROC curve closest to the (0, 1) point24. McNemar’s Chi-squared test was used to compare the diagnostic performance difference between two groups with paired nominal data. The DeLong test was used to compare two AUCs25. P < 0.05 indicated statistical significance. All statistical analyses were performed using R software (version 4.2.1; www.r-project.org) and SPSS (version 26.0; IBM Corp., Armonk, NY, USA) for Windows (Microsoft Corp., Redmond, WA, USA).
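The cutoff rule described above, choosing the point on the ROC curve closest to (0, 1), can be sketched in Python. This is an illustration rather than the authors' R or SPSS procedure: the function name and the toy data are hypothetical, and the sketch assumes a patient is classified as positive when the score is greater than or equal to the candidate cutoff.

```python
import math


def closest_to_top_left(scores, labels):
    """Return (cutoff, sensitivity, specificity) for the ROC point
    nearest to (0, 1). labels are 1 (complicated) or 0 (uncomplicated)."""
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cut and y == 0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        # The ROC point is (1 - spec, sens); measure its distance to (0, 1).
        dist = math.hypot(1 - spec, 1 - sens)
        if best is None or dist < best[0]:
            best = (dist, cut, sens, spec)
    return best[1:]


# Toy data, invented only to exercise the function.
scores = [1, 3, 5, 7, 9, 10, 12, 14, 16, 18]
labels = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
print(closest_to_top_left(scores, labels))
```

On ties the sketch keeps the lowest cutoff; a different tie-breaking rule would be an equally reasonable choice.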
Ethics approval and consent to participate
This study was performed in accordance with the principles of the Declaration of Helsinki. Secomedic Hospital and Teikyo University Chiba Medical Center Ethics Review Boards approved the retrospective data collection and analysis (reference numbers: 2021-001 and 18-171, respectively). All patients provided written informed consent for the collection of their medical data for scientific purposes.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.
References
Mällinen, J. et al. Appendicolith appendicitis is clinically complicated acute appendicitis–is it histopathologically different from uncomplicated acute appendicitis. Int. J. Colorectal Dis. 34, 1393–1400 (2019).
Vons, C. et al. Amoxicillin plus clavulanic acid versus appendicectomy for treatment of acute uncomplicated appendicitis: an open-label, non-inferiority, randomized controlled trial. Lancet 377, 1573–1579 (2011).
Di Saverio, S. et al. The NOTA study (Non Operative Treatment for Acute Appendicitis): Prospective study on the efficacy and safety of antibiotics (amoxicillin and clavulanic acid) for treating patients with right lower quadrant abdominal pain and long-term follow-up of conservatively treated suspected appendicitis. Ann. Surg. 260, 109–117 (2014).
Salminen, P. et al. Antibiotic therapy versus appendectomy for treatment of uncomplicated acute appendicitis: The APPAC randomized clinical trial. JAMA 313, 2340–2348 (2015).
Shekarriz, S. et al. Comparison of conservative versus surgical therapy for acute appendicitis with abscess in five German hospitals. Int. J. Colorectal Dis. 34, 649–655 (2019).
Mima, K. et al. Interval laparoscopic appendectomy after antibiotic therapy for appendiceal abscess in elderly patients. Asian J. Endosc. Surg. 13, 311–318 (2020).
Giljaca, V., Nadarevic, T., Poropat, G., Nadarevic, V. S. & Stimac, D. Diagnostic accuracy of abdominal ultrasound for diagnosis of acute appendicitis: Systematic review and meta-analysis. World J. Surg. 41, 693–700 (2017).
Sippola, S. et al. The accuracy of low-dose computed tomography protocol in patients with suspected acute appendicitis: The OPTICAP study. Ann. Surg. 271, 332–338 (2020).
Duke, E. et al. A systematic review and meta-analysis of diagnostic performance of MRI for evaluation of acute appendicitis. AJR Am. J. Roentgenol. 206, 508–517 (2016).
Atema, J. J., van Rossem, C. C., Leeuwenburgh, M. M., Stoker, J. & Boermeester, M. A. Scoring system to distinguish uncomplicated from complicated acute appendicitis. Br. J. Surg. 102, 979–990 (2015).
Lin, H. A., Tsai, H. W., Chao, C. C. & Lin, S. F. Periappendiceal fat-stranding models for discriminating between complicated and uncomplicated acute appendicitis: A diagnostic and validation study. World J. Emerg. Surg. 16, 52. https://doi.org/10.1186/s13017-021-00398-5 (2021).
Kobayashi, T. et al. Development of a scoring model based on objective factors to predict gangrenous/perforated appendicitis. BMC Gastroenterol. 23, 198. https://doi.org/10.1186/s12876-023-02767-7 (2023).
Imaoka, Y. et al. Validity of predictive factors of acute complicated appendicitis. World J. Emerg. Surg. 11, 48. https://doi.org/10.1186/s13017-016-0107-0 (2016).
Geerdink, T. H. et al. Validation of a scoring system to distinguish uncomplicated from complicated appendicitis. J. Surg. Res. 258, 231–238 (2021).
Loftus, T. J. et al. Successful nonoperative management of uncomplicated appendicitis: Predictors and outcomes. J. Surg. Res. 222, 212–218 (2018).
Kobayashi, T. et al. Prediction model for failure of nonoperative management of uncomplicated appendicitis in adults. World J. Surg. 45, 3041–3047 (2021).
Kim, H. Y. et al. Systematic review and meta-analysis of CT features for differentiating complicated and uncomplicated appendicitis. Radiology 287, 104–115 (2018).
Di Saverio, S. et al. Diagnosis and treatment of acute appendicitis: 2020 update of the WSES Jerusalem guidelines. World J. Emerg. Surg. 15, 27. https://doi.org/10.1186/s13017-020-00306-3 (2020).
Bhangu, A., Søreide, K., Di Saverio, S., Assarsson, J. H. & Drake, F. T. Acute appendicitis: Modern understanding of pathogenesis, diagnosis, and management. Lancet 386, 1278–1287 (2015).
Vasileiou, G. et al. Validation of the American Association for the Surgery of Trauma emergency general surgery score for acute appendicitis–an EAST multicenter study. J. Trauma Acute Care Surg. 87, 134–139 (2019).
Moris, D., Paulson, E. K. & Pappas, T. N. Diagnosis and management of acute appendicitis in adults: A review. JAMA 326, 2299–2311 (2021).
Kim, H. Y. et al. CT in differentiating complicated from uncomplicated appendicitis: Presence of any of 10 CT features versus radiologists’ gestalt assessment. AJR Am. J. Roentgenol. 213, W218–W227 (2019).
Harrell, F. E. Jr. Regression models for continuous Y and case study in ordinal regression. In Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis 359–387 (Springer, 2015).
Akobeng, A. K. Understanding diagnostic tests 3: Receiver operating characteristic curves. Acta Paediatr. 96, 644–647 (2007).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 44, 837–847 (1988).
Acknowledgements
We thank Jane Charbonneau, DVM, from Edanz Group (https://jp.edanz.com/ac) for editing a draft of this manuscript.
Author information
Contributions
Study conception and design: M.M., K.S., C.K., T.F. Acquisition of data: M.M., K.N., A.H., A.U., H.N., M.H. Analysis and interpretation of data: M.M., K.S., C.K., T.F. Drafting of manuscript: M.M., K.S., K.N., A.H. Critical revision of manuscript: T.S., M.Y., K.Y., H.S., K.K. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Mori, M., Shuto, K., Kosugi, C. et al. Development and validation of a new scoring system to discriminate between uncomplicated and complicated appendicitis. Sci Rep 14, 19825 (2024). https://doi.org/10.1038/s41598-024-70904-7
DOI: https://doi.org/10.1038/s41598-024-70904-7