Accuracy and precision of the spinal instability neoplastic score (SINS) for predicting vertebral compression fractures after radiotherapy in spinal metastases: a meta-analysis

Radiotherapy plays an important role in the treatment of spinal metastases. One of its major complications is vertebral compression fracture (VCF). Although the spinal instability neoplastic score (SINS) was developed for evaluating spinal instability in patients with spinal metastases, it is also commonly used to predict VCF after radiotherapy in these patients. However, its accuracy and precision for predicting radiotherapy-induced VCF remain controversial. The aim of this study was to clarify the diagnostic value of the SINS for predicting radiotherapy-induced VCF and to make recommendations for improving its diagnostic power. We searched core databases and identified 246 studies. Fourteen studies were analyzed, including 7 studies (with 1269 segments) for accuracy and 7 studies (with 280 patients) for precision. For accuracy, the area under the summary receiver operating characteristic curve was 0.776. When a SINS cut-off value of 7 was used, as was done in the included studies, the pooled sensitivity was 0.790 and the pooled specificity was 0.546. For precision, the summary estimate of interobserver agreement was highest when the score was divided into 2 categories at a cut-off value of 7, with a value of 0.788. The body collapse domain showed a moderate relationship with VCF and moderate precision. In the bone lesion domain, lytic tumors showed high accuracy but only fair reliability, whereas the location domain had excellent reliability but low accuracy. The SINS system can be used to predict the occurrence of VCF after radiotherapy in spinal metastases with moderate accuracy and substantial reliability. Increasing the cut-off value and revising the domains may improve the diagnostic performance of the SINS for predicting VCF.

The Spine Oncology Study Group developed the spinal instability neoplastic score (SINS) in 2010 to assess the degree of spinal (in)stability caused by metastatic diseases, as presented in Table 1 [7][8][9] . The score consists of the sum of 5 radiographic parameters and 1 clinical parameter, which results in a summed score between 0 and 18 points 9 . The total score is then divided into 2 categories (stable, 0-6 points and unstable, 7-18 points) or 3 categories of spinal stability (stable, 0-6 points; impending/potentially unstable, 7-12 points; and unstable, 13-18 points) 9 . Because the SINS provides a common language to discuss spinal instability, its use can improve the uniform reporting of spinal instability in the published literature and communication among oncologists, radiologists, and spine surgeons 8,[10][11][12] . In recent years, the SINS has emerged as the most widely accepted instrument for classifying the stability of metastatic vertebral segments 13 .
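The 2-category and 3-category groupings described above can be written as a small lookup. The following is a minimal sketch in Python; the function name `sins_category` and the label strings are illustrative assumptions and not part of the SINS publication, whereas the cut points (0-6, 7-12, 13-18) are those stated in the text.

```python
def sins_category(total: int, n_categories: int = 3) -> str:
    """Map a total SINS (0-18) to a stability category.

    Hypothetical helper: only the cut points come from the text.
    """
    if not 0 <= total <= 18:
        raise ValueError("total SINS must lie between 0 and 18")
    if n_categories == 2:
        # Binary scheme: stable (0-6) vs. unstable (7-18)
        return "stable" if total <= 6 else "unstable"
    if n_categories == 3:
        # Three-level scheme: 0-6, 7-12, 13-18
        if total <= 6:
            return "stable"
        if total <= 12:
            return "potentially unstable"
        return "unstable"
    raise ValueError("n_categories must be 2 or 3")


if __name__ == "__main__":
    for score in (6, 7, 12, 13):
        print(score, sins_category(score, 2), sins_category(score, 3))
```

A score of 7 is therefore "unstable" under the binary scheme but only "potentially unstable" under the 3-category scheme, which is why the choice of scheme matters when agreement or accuracy is reported.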
Although the SINS was developed for evaluating spinal instability in patients with spinal metastases, it is also used in other scenarios, such as VCF after radiotherapy and instability associated with a primary bone tumor 14,15 . Several previous studies have reported that the SINS may be a useful tool for predicting VCF in patients who have undergone radiotherapy, and that it had substantial to excellent interobserver and intraobserver reliability [16][17][18][19] .
Other studies have reported that the SINS score was not predictive of new VCFs after radiotherapy 3,6,20 . Another study reported that the statistical power of the accuracy of SINS to predict radiotherapy-induced VCF was significant only in the univariate analysis, but not in the multivariate analysis 21 . Regarding precision, some researchers questioned the results of those studies, as some of the studies were authored by a co-developer of the SINS 11,18,22 . Therefore, the accuracy and precision of the SINS require objective evaluation by independent researchers.
The primary purpose of this study was to evaluate the accuracy of the SINS for predicting radiotherapy-induced VCF and to evaluate the scores assigned for the 6 domains of the SINS by performing a meta-analysis of diagnostic test accuracy. The secondary aim was to evaluate the precision of the SINS overall and for each domain through a meta-analysis of summary estimates.

Materials and methods
Search strategy and study selection criteria. We performed a comprehensive literature search to identify studies that applied the SINS in cases of spinal metastases, according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The searched databases included PubMed, Embase, Web of Science, and the Cochrane Database from inception to January 2020. The search terms used were "spinal instability neoplastic score" AND "spine" (or "spinal") AND "metastasis" (or "metastases"). We also examined the references of all included papers to find other relevant articles. There were no language restrictions on study eligibility, and only the largest study was included in the case of overlapping study populations.

We excluded duplicated studies, narrative reviews, letters, editorials, comments, and case reports. Studies were also excluded if they included primary tumors (e.g., lymphoma), used the SINS to predict other outcomes (e.g., survival), or did not report target outcomes. The PRISMA checklist has been submitted to the journal as an attachment to this article (see Supplementary Table S1).

Table 1 (excerpt). Pain: yes, 3 points; occasional pain but not mechanical, 1 point; pain-free lesion, 0 points.

Study eligibility criteria. Two independent reviewers (Y.R.K. and C.H.L.) assessed the eligibility of all the studies retrieved from the databases and performed quality assessments. Any disagreement between the reviewers was resolved through discussion. We used 2 methods of meta-analysis. First, a meta-analysis of diagnostic test accuracy was performed to evaluate the accuracy of radiotherapy-induced VCF prediction. We assessed the quality of the studies using the outlined component approach for diagnostic accuracy studies with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool 23 . We systematically reviewed published studies on the basis of the following criteria: (1) studies that used the SINS to predict VCFs in patients with spinal metastases; (2) studies that reported the numbers of patients for 2 or 3 SINS categories and the number of VCFs; and (3) studies that used data with sufficient information to assess true-positive (TP; fracture in the unstable group), true-negative (TN; no fracture in the stable group), false-positive (FP; no fracture in the unstable group), and false-negative (FN; fracture in the stable group) cases. Second, a meta-analysis of summary estimates was performed to evaluate the interobserver reliability of the overall score, categories, and each domain of SINS (pain, location, bone lesion, alignment, collapse, and posterolateral involvement). For evaluating interobserver reliability, we assessed the bias risk using a modified form of the Newcastle-Ottawa Scale for non-randomized studies 24,25 . Studies were included if they contained a point estimate of the Cohen or Fleiss kappa (κ) value and 95% confidence intervals (CIs).
Data synthesis and analysis. Accuracy for predicting radiotherapy-induced VCF. The retrieved data included the following items: name of the first author; year published; patient demographics; numbers of TP, TN, FP, and FN cases; and the numbers of patients with and without fractures and their scores for the 6 individual SINS domains. Test accuracy was calculated using a summary receiver operating characteristic (SROC) model, the area under the curve (AUC), and the index Q value. With respect to the AUC, a value of 0.5 was considered non-informative; a value of > 0.5 but ≤ 0.7 was considered less accurate; a value of > 0.7 but ≤ 0.9 was considered moderate; a value of > 0.9 but < 1 was considered very accurate; and a value of 1 was considered perfect 26,27 . To perform a meta-analysis of diagnostic test accuracy using all the available studies that reported more than one threshold value, we created 2-by-2 tables for each value from the included studies. Statistical analyses were performed using R version 3.6.3 (R Foundation for Statistical Computing, Vienna, Austria). For the subgroup analysis, each domain (categorical variables) of SINS was analyzed using Spearman rank-order correlation analysis to evaluate the relationship between the incidence of VCF and each domain of SINS. As a post-hoc test, we compared two adjacent classes (ordinal variables) of the 6 domains of SINS using odds ratios and the chi-square test. Review Manager version 5.3 (Cochrane Collaboration, Oxford, UK) was used for the QUADAS-2 tool, coupled forest plot, and SROC plots.
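As an illustration of the quantities named above, the following minimal Python sketch derives sensitivity and specificity from one 2-by-2 table and applies the AUC interpretation bands quoted in the text. The class `TwoByTwo`, the function `interpret_auc`, and the example counts are hypothetical; the study itself used R and Review Manager, and this sketch does not reproduce the SROC model.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts for one study at one SINS cut-off (illustrative container)."""
    tp: int  # unstable by SINS, fracture occurred
    fp: int  # unstable by SINS, no fracture
    fn: int  # stable by SINS, fracture occurred
    tn: int  # stable by SINS, no fracture

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def interpret_auc(auc: float) -> str:
    """Apply the AUC bands quoted in the text (values below 0.5 are not covered)."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("the quoted bands cover 0.5 to 1 only")
    if auc == 0.5:
        return "non-informative"
    if auc <= 0.7:
        return "less accurate"
    if auc <= 0.9:
        return "moderate"
    if auc < 1.0:
        return "very accurate"
    return "perfect"


if __name__ == "__main__":
    # Hypothetical counts, not taken from any included study
    table = TwoByTwo(tp=30, fp=40, fn=10, tn=60)
    print(table.sensitivity(), table.specificity())
    print(interpret_auc(0.776))
```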
Precision of the SINS. The collected data included the κ-values and 95% CIs of the overall SINS; the SINS used as a categorical measure with 2 categories (< 7 vs. ≥ 7) or 3 categories (0-6 vs. 7-12 vs. 13-18); and the 6 individual domains of SINS. The summary estimate was calculated as the weighted mean of the reported number of participants and pooled variance. Pooled estimates were categorized as follows according to the method of Landis and Koch: near perfect (0.81-1.00), substantial (0.61-0.80), moderate (0.41-0.60), fair (0.21-0.40), and slight (0.00-0.20) 28 . Heterogeneity was considered significant when the I² value was > 50% 29 . A random-effects model was used depending on the study design and heterogeneity of the studies included. We assessed publication bias by visually inspecting the funnel plots and calculating the p-value (one-sided) for Egger's intercept 30 . Data were analyzed using the Comprehensive Meta-Analysis software version 3.3 (Biostat, Inc., Englewood, NJ, USA).
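The pooling and categorization steps can be sketched as follows. This is a simplified illustration only: it assumes a fixed-effect inverse-variance weighting with standard errors back-calculated from the 95% CIs, which is not necessarily the weighting applied by the Comprehensive Meta-Analysis software, and it omits the between-study variance term that a random-effects model would add. The function names and the example κ values are hypothetical.

```python
def landis_koch(kappa: float) -> str:
    """Label a kappa value using the Landis and Koch bands quoted in the text."""
    if not 0.0 <= kappa <= 1.0:
        raise ValueError("the quoted bands cover 0.00 to 1.00 only")
    if kappa > 0.80:
        return "near perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    return "slight"


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back-calculate a standard error from a symmetric 95% CI (assumption)."""
    return (upper - lower) / (2 * z)


def pool_fixed_effect(studies):
    """Inverse-variance pooled kappa and I^2 (%) from (kappa, lower, upper) tuples."""
    kappas = [k for k, _, _ in studies]
    weights = [1.0 / se_from_ci(lo, hi) ** 2 for _, lo, hi in studies]
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    # Cochran's Q and the I^2 statistic derived from it
    q = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, i2


if __name__ == "__main__":
    # Hypothetical studies: (kappa, CI lower, CI upper)
    example = [(0.70, 0.60, 0.80), (0.80, 0.70, 0.90)]
    pooled, i2 = pool_fixed_effect(example)
    print(round(pooled, 3), round(i2, 1), landis_koch(pooled))
```

Under the rule stated in the text, an I² above 50% from such a calculation would be read as significant heterogeneity.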

Results
Search results for relevant studies. The initial search identified 246 articles, from which 143 duplicated articles were excluded. Among the 103 remaining articles, 23 were excluded because they were case reports, review articles, letters, or technical notes. Some papers dealt with other tumors (n = 17) such as myeloma, granuloma, or lymphoma, and 12 were research on metastatic epidural spinal cord compression. The remaining 51 studies were subjected to full-text review and another 37 were excluded. The reasons for exclusion of these studies were no data on accuracy or precision (n = 28) or no prediction of VCF (n = 9). Finally, we identified a total of 14 observational studies. The detailed results of the selection process are shown in Supplementary Fig. S1.
Seven of the included studies 3,6,21,31-34 evaluated accuracy and included 798 patients and 1269 spinal segments, and the other 7 studies 11,16-18,22,35,36 evaluated precision and included 280 patients (Table 2). Four studies 3,6,21,31 dealt with patients who underwent SBRT, whereas the other studies 32-34 evaluated patients who underwent CRT. The primary regions of cancer were the kidney, breast, lung, and colorectum for the studies evaluating accuracy, and the kidney, breast, lung, and prostate for the studies analyzing precision.
Accuracy of the SINS in predicting VCFs. Using data from the included studies, a 2 × 2 table was created based on VCF using a SINS cut-off of 7. We used the same classification criteria as the included studies and defined TP as a SINS of ≥ 7 with a VCF, FP as a SINS of ≥ 7 without a VCF, FN as a SINS of < 7 with a VCF, and TN as a SINS of < 7 without a VCF. Coupled forest plots showed the sensitivities and specificities of each study (Fig. 1). An SROC plot was created using data from 7 studies (Fig. 2).
A subgroup analysis was performed to assess the correlation between the scores for the 6 SINS domains and the incidence of VCF (Fig. 3, Table 3). The body collapse domain showed a moderate relationship (correlation coefficient, 0.333; p < 0.001) and was closely correlated with VCF for scores of 0-2. However, the incidence of VCF with a > 50% collapse (score 3) was lower than that with a < 50% collapse (score 2), despite the higher score. In the bone lesion domain, the incidence of VCF decreased as the score decreased, with a weak correlation (correlation coefficient, 0.218; p < 0.001) between VCF and the bone lesion score. A lytic tumor (score 2), the highest score in this domain, was associated with a high risk of VCF after radiotherapy. There was a slight difference between mixed (score 1) and blastic (score 0) tumors. The other domains (location, pain, alignment, and posterolateral involvement) showed negligible relationships with VCF in the correlation analysis. Spinal alignment showed a difference between kyphosis/scoliosis (score 2) and normal (score 0). There were only 2 cases with a spinal alignment score of 4 (subluxation/translation), so a statistically meaningful comparison between scores 2 and 4 was not possible. A correlation between the location of metastases and the incidence of VCF was only found for mobile (score 2) and semi-rigid (score 1) metastases. Patients with occasional pain (score 1) showed fewer VCFs than those with persistent pain (score 3) and a similar incidence of VCF as the pain-free group (score 0). Unilateral posterolateral involvement (score 1) was associated with a higher incidence of VCF than bilateral involvement (score 3) or no involvement (score 0).
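The post-hoc comparison of two adjacent classes within a domain (odds ratio plus chi-square test, as described in the methods) can be illustrated with a short Python sketch. The counts below are hypothetical and chosen only for illustration; the helper names are assumptions, and the chi-square statistic is the uncorrected Pearson form with 1 degree of freedom, which may differ from the exact variant used in the original analysis.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table: rows = classes, columns = VCF / no VCF."""
    return (a * d) / (b * c)


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) and p-value for 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: higher class 22 VCF / 78 none, lower class 7 VCF / 93 none
    a, b, c, d = 22, 78, 7, 93
    chi2, p = chi_square_2x2(a, b, c, d)
    print(f"OR = {odds_ratio(a, b, c, d):.2f}, chi2 = {chi2:.2f}, p = {p:.4f}")
```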
Precision of the SINS. The interobserver agreement (κ-value) of the SINS was calculated in 7 studies.
The summary estimates for the overall score of the SINS, 2 categories, and 3 categories were as follows: 0.709 (95% CI 0.390-1.028), 0.788 (95% CI 0.675-0.900), and 0.524 (95% CI 0.424-0.624), respectively (Fig. 4). The use of SINS with 2 categories (stable vs. unstable) showed substantial agreement and the highest interobserver reliability. A subgroup analysis of interobserver reliability for each domain was performed, and the results are presented in Table 4.

Publication bias.
The results of the QUADAS-2 analysis are summarized in terms of risk of bias and concerns regarding applicability in Supplementary Fig. S2 23 . Most of the included studies predicted VCF after radiotherapy in patients with spinal metastases. One study predicted skeletal-related events, which included pathologic fractures, the need for surgery, bone irradiation, spinal compression, and hypercalcemia 34 . For evaluating interobserver reliability, the results of the quality assessment were acceptable. One study revealed that evaluations using the SINS were performed by residents and fellows, while others reported the evaluations were performed by a specialist 35 .

Discussion
When predicting the occurrence of VCF after radiotherapy based on a SINS score of 7 points in patients with spinal metastases, the accuracy of the tool was moderate (AUC, 0.776), and the interobserver reliability showed substantial agreement (κ = 0.788). Among the 6 SINS domains, the body collapse domain was moderately correlated with the occurrence of VCF and the bone lesion domain was weakly correlated; however, the other domains showed negligible relationships with the incidence of VCF. In terms of precision, the bone lesion domain showed fair interobserver reliability (κ = 0.28), while location displayed near-perfect reliability (κ = 0.83). Although the overall accuracy and precision were acceptable, some domains showed high accuracy and low precision, or high precision and low accuracy.
Figure 3. Incidence of vertebral compression fracture (VCF) after radiotherapy according to the score of each domain of the spinal instability neoplastic score (SINS). The red line shows the mean incidence rate. The lines with other colors represent each of the included studies. In the bone lesion domain, VCF was more frequent with lytic lesions (score 2) than with the other lesions. In the body collapse domain, the incidence decreased significantly as the score decreased, displaying high accuracy. Some domains and levels showed an increasing incidence of VCF despite decreasing scores. This figure was drawn using Microsoft Excel version 2016.
This meta-analysis revealed that the diagnostic power of the SINS in predicting radiotherapy-induced VCF was moderate. All of the included studies used a SINS score of 7 to distinguish between low and high risk for VCF, and we used the same cut-off value and integrated summary estimates. When the cut-off value was 7, the pooled sensitivity and specificity were 0.790 and 0.546, respectively, indicating low specificity. SROC analyses provide important information about diagnostic test performance; the closer the apex of the curve is to the upper left corner, the greater the discriminatory ability of the test 38 . Considering the shallow slope of the curve, the best cut-off value (the closest point to the upper left corner) was slightly greater than 7 in the SROC curve. At an ideal threshold, the sensitivity and specificity would be approximately 0.7 and 0.8, respectively. The summary estimate of the interobserver reliability of the 2 SINS categories showed substantial agreement. Although only 2 studies (written by the same corresponding author) evaluated the precision of the SINS using 2 categories 18,22 , similar results for precision were observed for the overall SINS score in studies published by 5 different authors.
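The "closest point to the upper left corner" criterion can be made concrete with a short calculation. The Python sketch below compares the pooled operating point at a cut-off of 7 (sensitivity 0.790, specificity 0.546) with the approximate ideal point mentioned in the text (0.7, 0.8). The function names are illustrative assumptions, and Youden's J is shown only as a commonly used companion index; it is not reported in this study.

```python
import math


def distance_to_corner(sensitivity: float, specificity: float) -> float:
    """Euclidean distance in ROC space from (1 - spec, sens) to the corner (0, 1)."""
    return math.hypot(1.0 - specificity, 1.0 - sensitivity)


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


if __name__ == "__main__":
    points = {
        "cut-off 7 (pooled)": (0.790, 0.546),
        "approximate ideal threshold": (0.7, 0.8),
    }
    for label, (sens, spec) in points.items():
        print(f"{label}: distance = {distance_to_corner(sens, spec):.3f}, "
              f"J = {youden_j(sens, spec):.3f}")
```

A smaller distance indicates a better trade-off, so under this criterion the approximate ideal point lies closer to the corner than the pooled point at a cut-off of 7, consistent with the suggestion to raise the cut-off.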
Subgroup analysis of 6 SINS domains. When the 6 domains of SINS were analyzed, the body collapse domain also showed a significant relationship with VCF. The incidence of VCF at > 50% collapse (score 3) was 27%, which was substantially lower than that at < 50% collapse (score 2; 39%). The incidence of VCF at > 50% collapse was similar to the incidence in cases of no collapse with > 50% involvement (score 1; 20%). If the groups with scores of 3 and 1 in the body collapse domain were combined and the number of levels decreased from 4 (3, 2, 1, and 0) to 3 (2, 1, and 0), the accuracy and reliability would increase. This finding may mean that highly (> 50%) collapsed lesions can be more stable than slightly collapsed lesions. The reason for this may be that lesions that already show extensive collapse are unlikely to develop additional fractures, because highly compressed lesions have no more space to collapse due to the compressed bone marrow. The score for the bone lesion domain revealed a weak relationship with the incidence of radiotherapy-induced VCF. The incidence of VCF was 22% in the lytic group (score 2), 7% in the mixed group (score 1), and 3% in the blastic group (score 0). The mixed lesion group showed a similar VCF incidence rate to that of the blastic group, so it may be better to combine these 2 groups. The interobserver reliability of the bone lesion domain was only fair (κ = 0.28), possibly because clinicians have difficulty distinguishing mixed (lytic/blastic) lesions from purely lytic or purely blastic lesions.
Alignment was an accurate predictor in a comparison between de novo deformity (kyphosis/scoliosis) (score 2) and normal alignment (score 0), whereas the VCF incidence was not significantly different between subluxation/translation (score 4) and kyphosis/scoliosis (score 2). This was likely due to the presence of only 2 of 1222 patients with a score of 4. Therefore, eliminating the subluxation/translation level because of the scarcity of cases may improve the accuracy and reliability of the SINS test. The incidence of VCF in cases with unilateral posterolateral involvement (score 1) was 21%, significantly higher than the 11% in the bilateral involvement group (score 3) and 13% in the cases without any involvement (score 0). Although all 5 studies showed the same trend, none proposed an explanation for this trend. A further investigation into the relationship between VCF and posterolateral involvement is needed to clarify this result.
The location domain showed near-perfect (κ = 0.83) agreement in terms of interobserver reliability, and using this domain may be one way of increasing the overall reliability of the SINS. However, the incidence rates of VCF at the different levels were 17%, 19%, and 13% in junctional, mobile, and semirigid sections, respectively, demonstrating that different locations were not associated with a significantly different risk of VCF.
Pain, the only clinical SINS domain, showed a substantial agreement in interobserver reliability; however, the reliability might have been overestimated because the evaluators did not examine the patients themselves, but rather reviewed their medical records and evaluated the scores. The VCF incidence rates were 24%, 12%, and 13% in patients with persistent pain (score 3), occasional pain (score 1), and pain-free lesions (score 0), and no statistically significant difference was found in VCF incidence between the scores of 1 and 0. Although the VCF incidence showed a substantial difference between scores 3 and 1, the incidence in the group with a score of 3 was quite variable, ranging from 12 to 50% among the studies. Owing to the difficulty of fully distinguishing the pain caused by spinal metastases from other kinds of pain, disability indexes may be more accurate parameters.
Limitations. This study has several limitations that need to be addressed. First, the SINS was developed not to predict radiotherapy-induced VCF, but to evaluate spinal instability. Therefore, the accuracy of the SINS presented in this study is limited to the prediction of VCF in the specific scenario of radiotherapy-induced VCF, and the findings of this study have no implications for the accuracy of the SINS when used for its original purpose. Second, there was substantial heterogeneity in radiotherapy, patients, and evaluators. The pooled data were very heterogeneous, since different radiation protocols were reported in each study (CRT or SBRT). The radiation doses and fractionation schemes are very important, and may affect the VCF. For the analysis of accuracy, osteolytic metastases were included. Precision may be affected by the professional expertise and level of experience of the evaluators 39 . The evaluations using the SINS were performed in this meta-analysis by specialists in the fields of spine surgery, radiology, and radiation oncology, as well as trainees such as residents and fellows, which may have reduced the precision. Nonetheless, the SINS was developed to facilitate communication between non-surgeons and surgeons, and evaluations conducted by various health-care professionals are therefore representative of the real-world use of this instrument. Third, the SINS was introduced as a tool for clinicians to recommend surgery before the onset of severe disability caused by VCF. However, this scoring system can miss patients with VCF due to a low disability level or lack of obvious VCF because of a high disability level, such as being in a bedridden state. The clinical implications of this tool should be investigated to better understand and use the SINS with caution, as the disability index at baseline was not considered. Fourth, 6 of the 7 included studies for accuracy enrolled patients who underwent SBRT or CRT 3,6,21,31-33 . 
The reported median dose was 24 Gy for SBRT and 30 Gy for CRT. A high radiation dose may affect the incidence of VCF. Although this was an uncontrolled confounding factor that may have created bias, the target radiation dose was usually constant in the included studies, and follow-up without local treatment for spinal lesions after the diagnosis was almost impossible. Finally, pain as a SINS domain might not be consistently evaluated. Evaluators need to distinguish mechanical pain that improves with recumbency or pain with movement or spinal loading from pain due to degenerative disease or trauma. We conducted this meta-analysis under the assumption that the studies distinguished these types of pain correctly. If the pain data of the included studies are uncertain, the results of a meta-analysis using these data need to be interpreted with caution.

Conclusion
A binary categorization based on the SINS can be used to predict the occurrence of VCF after radiotherapy in spinal metastases with moderate accuracy and substantial reliability. The body collapse domain showed moderate accuracy and precision for VCF. A lytic bone lesion was a risk factor for VCF, but the bone lesion domain had only fair reliability, whereas location was nearly perfectly reliable but inaccurate. Raising the cut-off value above 7 and revising some domains may help improve diagnostic accuracy.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.