Introduction

Spine fusions have become a frequent treatment choice for various spinal pathologies over the past four decades. Rajaee et al.1 estimated a spinal fusion rate increase from 64.5 cases per 100,000 adults in 1998 to 135.5 cases per 100,000 in 2008. Primary cervical and lumbar fusions increased from 73,717 and 77,682 cases in 1998 to 157,966 and 210,407 cases in 2008, respectively1. One analysis2 found that, in the United States, surgical interventions for lumbar degenerative disc disease increased from 21,223 in 2000 to 55,467 in 2009. Similarly, one estimate3 put the market for spinal implants and devices at $7 billion in sales between 2013 and 2014, reflecting an increase in material availability. However, the current literature does not sufficiently confirm the superiority of any one intervention or graft4,5.

Like any other surgical intervention, spine fusions can lead to unexpected outcomes, such as pseudarthrosis or other adverse events. Pseudarthrosis can be defined as a failure to achieve solid fusion, whether symptomatic or asymptomatic, that can increase the risk of neurologic symptoms, material failure, and deformity6,7. To make appropriate decisions, surgeons must weigh the effectiveness of each graft type against its costs.

Autologous iliac crest (AIC) graft has been considered the gold standard treatment for spinal fusion because of its histocompatible and non-immunogenic properties, presenting higher amounts of cancellous bone, growth factors, and pluripotent cells related to osteoinduction, osteogenesis, and osteoconduction8,9,10. Unfortunately, spinal fusions with AIC have been associated with several morbidities, such as a higher incidence of infection, donor site pain, hematoma development, increased operative time, and blood loss11,12,13,14,15,16.

As a consequence of AIC drawbacks, alternative grafts have been developed, and these alternatives are increasingly diverse and available. Such materials can be classified as extender, enhancer, or substitute grafts17,18. An extender decreases the need for large amounts of autologous bone grafting (ABG) while offering the same bone formation properties as AIC17,18. An enhancer is a material combined with ABG to increase successful fusion rates compared to ABG alone17,18. A substitute replaces an ABG and presents the same or higher healing success rates compared to ABG alone17,18.

These materials are often assembled in various proportions to achieve spinal fusion17. However, allograft (ALG) and alloplastic (ALP) grafts are foreign bodies that carry some inherent risks. Considering their pros and cons, AIC use is favorable since AIC need not be associated with other grafts to achieve reliable results19. One frequently used alternative is local bone (LB). However, to the best of our knowledge, previous studies have only compared autologous bone graft with ALP and ALG, without subdividing autologous bone graft into LB and AIC. This distinction is crucial: if the use of AIC, or other nonlocal bone, can be avoided, so can the associated postoperative morbidity, especially residual pain and an extra wound or scar.

Methods

This study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement20. A comprehensive web-based literature search was conducted through January 2021, using three databases (Lilacs, PubMed, and Cochrane), by two independent authors (SAF and ASMS), without publication-language restrictions. For all databases, controlled vocabulary and text word searches were performed, using a combination of the keywords: “spinal fusion AND autograft AND spinous process”, “spinal fusion AND autograft AND spinal lamina”, “spinal fusion AND autograft AND iliac crest”, “spinal fusion AND heterograft”, “spinal fusion AND allograft AND spinous process”, “spinal fusion AND allograft AND spinal lamina”, “spinal fusion AND allograft AND iliac crest”. Our search was directed toward adult patients who underwent spinal fusion in which ALG, ALP, LB, or AIC was applied. Given the increasing availability of medical devices, we compared these groups with one another to evaluate differences or superiority in outcomes such as fusion rate, hospital stay, follow-up extension (6, 12, 24, and 48 months), pseudarthrosis rate, and adverse events.

Titles, abstracts, and full-text studies were reviewed according to pre-established criteria, and then the relevant data were extracted. Discrepancies were resolved by consensus with the remainder of the research team. This study’s inclusion and exclusion criteria are presented in Table 1. Retrospective studies, prospective analyses, randomized clinical trials, and case series were included in this review. The cut-off date for the review was January 31, 2021.

Table 1 Inclusion and exclusion criteria.

Data extraction

The following data were abstracted from all included studies: study design, year, patient demographics, preoperative assessment, intraoperative information, postoperative assessment, hospital stay, follow-up extension, fusion rate, pseudarthrosis rate (comprising reported data for nonunion and pseudarthrosis), and adverse events (graft-related, infections, and neurological). Data were partially (one graft-type group of interest) or fully (all graft-type groups in the study) extracted from comparative studies, in accordance with our inclusion criteria. Two investigators (SAF and ASMS) independently performed a systematic review of all identified citations. No attempts were made to contact the authors of the reviewed studies to obtain missing or unreported data. Our main outcome of interest was fusion rates, and secondary outcomes included pseudarthrosis and adverse event rates.

Risk of bias assessments and evaluations of validity

The quality of eligible studies and their risk of bias (RoB) were examined by two reviewers (SAF and ASMS) using the methodological index for non-randomized studies (MINORS)21 and the Cochrane Collaboration's tool for assessing RoB22 in randomized controlled trials. Non-randomized studies were considered at high RoB when the MINORS score was ≤ 8 (no control group) or ≤ 12 (control group present). For randomized controlled trials, each domain was classified as unclear RoB, low RoB, or high RoB.
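The cut-offs above can be expressed as a small rule. The following is only an illustrative sketch: the function name is hypothetical, and the maximum totals (16 for non-comparative and 24 for comparative studies) follow the standard MINORS scoring of 8 or 12 items at 0 to 2 points each, which is assumed rather than stated in this section.

```python
def minors_high_risk(score: int, has_control_group: bool) -> bool:
    """Return True when a MINORS total falls at or below the high-RoB cut-off.

    Cut-offs are those stated in the text: <= 8 without a control group,
    <= 12 with one. The maximum totals (16 and 24) are an assumption based
    on the usual MINORS scoring and are used only for input validation.
    """
    max_score = 24 if has_control_group else 16
    if not 0 <= score <= max_score:
        raise ValueError(f"MINORS score must be between 0 and {max_score}")
    threshold = 12 if has_control_group else 8
    return score <= threshold


# Example: a non-comparative study scoring 8 is flagged as high RoB,
# whereas a comparative study scoring 13 is not.
flag_a = minors_high_risk(8, has_control_group=False)
flag_b = minors_high_risk(13, has_control_group=True)
```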

Heterogeneity assessments

Heterogeneity between studies was examined using the I² statistic and the P-value for heterogeneity23. Substantial heterogeneity was defined24 as I² ≥ 50%.
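For readers unfamiliar with the statistic, I² is commonly derived from Cochran's Q as the share of total variation that exceeds what chance alone would produce. The sketch below assumes inverse-variance weights and illustrative inputs; it is not the software routine used in this review, and the function name is hypothetical.

```python
def i_squared(effects, variances):
    """Return I^2 (in percent) from study effects and their variances.

    Uses Cochran's Q with inverse-variance weights:
    I^2 = max(0, (Q - df) / Q) * 100, where df = number of studies - 1.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        # No more variation than expected by chance (also covers Q == 0).
        return 0.0
    return (q - df) / q * 100.0


# Illustrative values only (not data from the reviewed studies).
example = i_squared([0.0, 1.0], [0.01, 0.01])
substantial = example >= 50.0  # threshold used in this review
```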

Data analysis

Meta-analysis of proportions, using MedCalc 16.2.0, was performed to estimate an overall weighted proportion and its 95% confidence interval (CI) for each outcome of interest. MedCalc uses a Freeman–Tukey transformation to calculate summary proportions, weighted according to the number of patients described in each study. We determined the pooled proportion using a random-effects model. Data were summarized in tables and further stratified based on bone graft types (AIC, ALG, ALP [comprising hydroxyapatite, rhBMP-2, rhBMP-7, and titanium cages], and LB). The Kruskal–Wallis test was used to compare variables among the four groups, and post hoc analyses using Mann–Whitney U tests were performed to compare two groups. When multiple follow-up periods were available for a study, data from the last assessment were used for the combined analyses. Subsequently, the fusion rates stratified by bone graft substitutes (bone graft alone or combined with metallic implants) and follow-up periods (6, 12, 24, and 48 months) were further analyzed (subgroup analysis). Studies that did not report the timing of fusion rate assessments were excluded from this subgroup analysis. Further analysis (meta-regression) to identify factors related to fusion rates (surgical approach, pseudarthrosis, and adverse events) was unsuccessful because the methods used to report the data were inconsistent across studies.
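The pooling approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of MedCalc's routine: it assumes the Freeman–Tukey double arcsine transform with variance 1/(n + 0.5), a DerSimonian–Laird estimate of between-study variance, and a simplified back-transformation (sin² of half the pooled value), so its results may differ slightly from the published figures. All function names are hypothetical.

```python
import math


def freeman_tukey(events, n):
    """Double arcsine transform of a proportion and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)


def back_transform(t):
    """Simplified inverse of the transform (an approximation)."""
    t = min(max(t, 0.0), math.pi)  # keep within the valid range
    return math.sin(t / 2.0) ** 2


def pool_random_effects(studies):
    """Pool (events, n) pairs; return (proportion, lower, upper) for a 95% CI."""
    transformed = [freeman_tukey(x, n) for x, n in studies]
    ts = [t for t, _ in transformed]
    vs = [v for _, v in transformed]
    w = [1.0 / v for v in vs]
    fixed = sum(wi * ti for wi, ti in zip(w, ts)) / sum(w)
    q = sum(wi * (ti - fixed) ** 2 for wi, ti in zip(w, ts))
    df = len(studies) - 1
    # DerSimonian-Laird between-study variance (assumed estimator).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    pooled = sum(wi * ti for wi, ti in zip(w_re, ts)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (back_transform(pooled),
            back_transform(pooled - 1.96 * se),
            back_transform(pooled + 1.96 * se))


# Illustrative counts only (not taken from the reviewed studies).
p, lo, hi = pool_random_effects([(90, 100), (45, 50)])
```

The transform stabilizes the variance of proportions near 0% or 100%, which matters here because many fusion rates sit close to the upper bound.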

Results

Study demographics

As designated by the PRISMA guidelines20, Supplementary Fig. 1 is a flowchart describing our database search, which identified 1535 studies. After passing a screening phase, 184 studies were fully reviewed, leading to 120 exclusions that are detailed in Supplementary Table 1. A total of 64 studies (4177 subjects) were selected for this systematic review. Analyses were performed by reorganizing studies according to graft material (with or without metallic implants) samples, resulting in 91 analyses (51 analyses regarding AIC, 9 analyses regarding ALG, 20 analyses regarding ALP, and 10 analyses regarding LB). The PRISMA checklist is available in Supplementary Table 2. According to MINORS21 and the Cochrane Collaboration's tool for assessing RoB22, the majority of studies presented high RoB scores, as shown in Supplementary Table 3 and Supplementary Figs. 2 and 3.

Participant demographics

Patients’ and procedures’ characteristics are summarized in Table 2. Overall, patients’ main diagnoses for surgical intervention were degenerative diseases (78.8%). A thorough analysis of follow-up, procedure duration, blood loss, and hospital length of stay (LOS) was impaired by a lack of systematic reporting. Data were inconsistent across studies; for example, none of the ALG articles specified hospital LOS. Similarly, some aspects, such as procedure time and blood loss in the ALG and ALP groups, were reported by a single author.

Table 2 Patient’s characteristics and clinical outcomes.

Pre- and post-operative assessments

Patient assessments were not reported systematically, making this study’s analysis difficult. Apart from distinct assessments during patients’ clinical courses, such as weight and height (in preoperative assessments) and Odom’s criteria (in postoperative assessments), pre- and post-operative assessments included matching analysis only for Japanese Orthopedic Association Score (JOA) and Nurick Grade reports in the AIC group. The same pattern was observed in the LB group (Frankel scale report) and ALP group (Frankel and JOA reports), as Table 3 shows.

Table 3 Patient’s pre and postoperative assessments.

Meta-analysis of primary outcomes

LB presented a significantly higher pooled proportion of fusion (346 fusions out of 366; 95.3%, 95% CI 89.7–98.7; Fig. 1) compared to the AIC (2038 fusions out of 2336; 88.6%, 95% CI 84.8–91.9; Fig. 2), ALG (381 fusions out of 494; 87.8%, 95% CI 80.8–93.4; Fig. 3), and ALP (613 fusions out of 744; 85.8%, 95% CI 75.7–93.5; Fig. 4) study groups. Moderate to high, statistically significant inconsistency (I² > 50%, P < 0.001) was found in all proportion analyses (86.4%, 74.9%, 73.8%, and 91.6% for AIC, LB, ALG, and ALP, respectively).
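For orientation, the crude proportions implied by the counts above can be computed directly. They are not expected to match the pooled estimates, because the latter come from a random-effects model applied to transformed study-level proportions rather than from summed counts. The sketch below is plain arithmetic on the reported counts; the variable and function names are illustrative.

```python
# Fusions and patients per graft group, as reported in the paragraph above.
counts = {
    "LB": (346, 366),
    "AIC": (2038, 2336),
    "ALG": (381, 494),
    "ALP": (613, 744),
}


def crude_percent(events, total):
    """Unweighted proportion in percent, rounded to one decimal place."""
    return round(100.0 * events / total, 1)


crude = {group: crude_percent(*pair) for group, pair in counts.items()}
# crude -> {"LB": 94.5, "AIC": 87.2, "ALG": 77.1, "ALP": 82.4}
```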

Figure 1 Local bone pooled proportional rate for spinal fusion.

Figure 2 Autologous iliac crest pooled proportional rate for spinal fusion.

Figure 3 Allograft pooled proportional rate for spinal fusion.

Figure 4 Alloplastic pooled proportional rate for spinal fusion.

Subgroup analysis

To determine what proportion of the summary results were driven by studies that had used grafts with metallic implants, we conducted subgroup analysis by dividing the studies into groups with grafts alone and grafts with metallic implants for all study groups. For AIC alone, the pooled proportion of fusion rates stood at 85.6% (CI 79.8–90.5, I² = 87.6%), whereas AIC with metallic implants showed fusion rates of 92.3% (CI 87.6–96.0, I² = 81.6%). The pooled proportion rates for LB alone and combined with metallic implants were 99.1% (CI 94.8–99.8) and 93.9% (CI 86.6–98.5, I² = 78.4%), respectively. The ALG and ALP studies’ pooled proportion rates for grafts alone were 89.6% (CI 80.1–96.3, I² = 70.6%) and 81.7% (CI 62.2–95.2, I² = 92.6%), respectively, versus 82.1% (CI 77.6–86.2, I² = 0.0%) and 91.4% (CI 76.4–99.2, I² = 92.0%), respectively, for grafts combined with metallic implants.

Studies that stratified spinal fusion by follow-up period are presented in Supplementary Fig. 4. Detailed information about rates and confidence intervals is presented in Supplementary Table 4.

Meta-analysis of secondary outcomes

Only 32 studies described rates of pseudarthrosis: 17 in the AIC group, six in the ALP group, four in the ALG group, and five in the LB group. Pseudarthrosis presented a pooled proportion of 14.2% (95% CI 8.9–20.5%, I² = 74.2%, P < 0.0001) for the lumbar spine region (88 of 625 patients) versus 4.1% (95% CI 1.6–7.7%, I² = 76.6%, P < 0.0001) for the cervical spine (29 of 776 patients). According to applied grafts, pseudarthrosis achieved a significantly lower pooled proportion in ALG studies (four events among 243 patients; 4.8%, 95% CI 0.1–15.7, I² = 77.3%) compared to AIC studies (81 events among 851 patients; 8.6%, 95% CI 4.2–14.2, I² = 84.0%), ALP studies (14 events among 153 patients; 7.1%, 95% CI 0.9–18.2, I² = 78.3%), and LB studies (18 events among 154 patients; 10.3%, 95% CI 1.8–24.5, I² = 81.7%).

Adverse events analysis was performed using three main categories: pain, infection, and graft-related events (graft collapse, fragmentation, protrusion, or breach). We also added donor site morbidity for the AIC sample. ALP and AIC studies described significantly more cases (80 events among 404 patients and 860 events among 2001 patients, respectively) than LB studies (20 events among 311 patients) and ALG studies (73 events among 459 patients). For our proportion analysis, we considered only events per patient. Table 4 displays our proportions analysis calculations based on the available data.

Table 4 Adverse events proportions analysis.

Discussion

In our primary outcome analysis, LB showed a higher proportion of fusion (95.3%) compared to AIC (88.6%), ALG (87.8%), and ALP (85.8%). This finding was not expected since LB has less trabecular bone, which would theoretically result in less bone marrow and lower availability of pluripotent cells and growth factors25. Also, LB’s limited harvestable volume narrows its surgical indications, and it is commonly applied to the cervical spine (which involves a smaller area to cover and less body load to sustain compared to the lumbar spine).

Our sample mainly comprised AIC (2529) patients, followed by ALP (766), ALG (516), and LB (366) patients. This size discrepancy could have exaggerated the apparent LB fusion effect among our pooled samples. Moreover, most studies did not present participants’ baseline assessments, and since the fusion quality of distinct grafts can diverge by age, metabolic activity, or graft-bed preparation26,27, confirming the superiority of LB graft fusion over the other studied options is challenging. Similarly, most of the reviewed studies did not follow the FDA’s guidance for spinal fusion evaluations28, increasing their assessment bias.

Additionally, the literature has often identified conflicting opinions regarding the optimal association between surgical techniques and patients’ underlying predictive factors for spinal fusions and spinal grafts. Other meta-analyses that have considered assorted graft materials or surgical approaches have demonstrated higher fusion rates when rhBMP was used27,29,30 or when grafts are associated with the anterior lumbar interbody fusion technique31. Moreover, minimally invasive procedures did not demonstrate fusion rate differences compared to open surgical techniques32.

Considering the data inconsistencies in our primary analysis, which precluded further associations (e.g., fusion rate × graft type × surgical technique), we performed a subgroup analysis of fusion rates with or without metallic implants. In this subgroup, LB presented lower fusion rates when associated with metallic implants, and this finding could be explained by LB limitations in graft volume availability33 and/or a small patient sample.

Pseudarthrosis rates and adverse events were studied as secondary outcomes. Our pseudarthrosis analysis revealed a higher proportional rate of pseudarthrosis in the lumbar spine (14.2%) than in the cervical spine (4.1%), consistent with previous analyses6, which has been explained by the increased difficulty of stabilizing areas that support higher loads34,35. Furthermore, our analysis of bone graft types revealed that LB presented the highest pooled proportional pseudarthrosis rate (10.5%). However, some considerations are worth mentioning. Pseudarthrosis rates were not systematically assessed across the reviewed studies (AIC 17 of 51 analyses; ALG 4 of 9 analyses; ALP 6 of 20 analyses; and LB 5 of 10 analyses), which could have exacerbated the discrepancy between patient quantity and analyzed effects. Similarly, the authors’ descriptions of their results did not indicate that pseudarthrosis can be presumed directly from the cases not counted as fused in the fusion rate analyses. Moreover, the literature did not present a conclusive role governing bone grafts’ influence on pseudarthrosis rates6.

Greater pseudarthrosis rates have already been associated with advanced age (because of delayed bridging maturation and increased bone resorption)36, degenerative disease, and construct length6. Longer fusions can enable loading distribution, minimizing excess motion and helping to decrease pseudarthrosis34,37. However, they can also increase points of load failure for each adjacent segment34, demand more grafts, and increase patients’ exposure to complications (due to an extensive surgical intervention). Nevertheless, our literature review examined a limited sample for this subgroup analysis, and it included many studies with moderate to high heterogeneity, reflecting pseudarthrosis evaluations’ diversity. For example, Choudhri et al.38 recommend CT imaging with fine-cut axial and multiplanar reconstruction to evaluate spinal fusions. Nonetheless, no radiographic gold standard is available with which to evaluate pseudarthrosis38 compared to open surgical exploration. Therefore, as in the literature, our review did not reveal a conclusive role governing bone grafts’ influence on pseudarthrosis rates6.

Moreover, many available studies presented substantial methodological flaws regarding adverse events, limiting analyses. AIC pain corresponded to a 23.4% pooled proportional rate and a significant proportion of donor site morbidity (23.2%), corroborating the previously mentioned graft drawbacks already described in the literature11,12,13,14,15,16. Unsurprisingly, and as we have mentioned, foreign bodies can carry some inherent risks, which could explain ALP’s higher pooled proportional rates of infection (10.2%) and graft-related events (35.1%).

Our study faced other limitations. Heterogeneity was found in different aspects of the reviewed studies’ populations. This heterogeneity arose from clinical diversity in both treatment groups, compounded by insufficient analyses, a small pool of subjects, differences in assessing patients’ baseline and outcomes, and the absence of systematic reports (e.g., unreported use of tobacco or nonsteroidal anti-inflammatory drugs could have led to a misinterpretation of fusion rates). Moreover, a standard tool for data collection could improve data availability for fusion rate analysis and pseudarthrosis assessment. Furthermore, we did not include all available ALP grafts because of their high variability, which could weaken the proportional analysis. An example is platelet-rich plasma, which is gaining recognition as an important adjunct in the spinal graft market39. Finally, an overall higher RoB, which could influence appraisals of interventions’ effects, indicated a lack of structured randomized trials. Moreover, successful treatments should be interpreted in light of patients’ diminished exposure to nosocomial events, acceptable survival rates, and function after treatment.

Comparing the inputs of more than three decades of medical evolution is challenging, given technical improvements, instrumental variations, and a greater range of materials. The competition for better outcomes versus materials will continue, as will the difficulty of keeping up with medical updates and discerning industry interests. Structured clinical trials are highly encouraged to promote the availability of optimal, cost–benefit treatments for patients.

The findings of our analysis demonstrate the substantial variety of spinal grafts and the need for more rigorous studies to better assist surgeons in choosing the best graft options. Standardized methods to evaluate spinal fusion and pseudarthrosis are encouraged.