Introduction

Cancer is a leading cause of death worldwide, and the total number of global cancer deaths is projected to increase by 45%, from 7.9 million in 2007 to 11.5 million in 2030 [1]. For patients with cancer, healthcare interventions aim to cure the disease or considerably prolong life, and to ensure the best possible quality of life for cancer survivors [1]. Treatment decisions should be based on evidence about the most effective existing treatment given the available resources. High-quality systematic reviews and meta-analyses of randomized controlled trials (RCTs) can provide the most valid evidence [2]. However, conventional meta-analysis becomes inadequate when there are no head-to-head trials comparing alternative interventions, or when more than two interventions need to be compared simultaneously [3]. For example, although there are trials directly comparing each of the newer antineoplastic agents with the current standard treatment (or placebo) for patients with neoplasms, there are no trials that directly compare the newer antineoplastic agents with one another. Another example is the lack of direct comparisons among the 19 different chemotherapy regimens that are currently available for the treatment of advanced pancreatic cancer [4].

Network meta-analysis (NMA), a generalization of pairwise meta-analysis, is becoming increasingly popular [5,6,7,8]. When head-to-head comparisons of the competing interventions of interest are absent or insufficient, NMAs using indirect treatment comparisons can provide useful evidence to inform health-care decision making. Even when evidence from direct comparisons is available, combining it with indirect estimates in a mixed treatment comparison may yield more refined estimates [8,9]. Formally, an NMA can be defined as a statistical combination of all available evidence for an outcome from several studies across multiple treatments, which generates estimates of the pairwise comparison of each intervention with every other intervention within a network [10]. NMA has been regarded as the next-generation evidence synthesis toolkit which, when properly applied, could serve decision-making better than conventional pairwise meta-analysis [11]. However, NMAs are subject to similar methodological risks as standard pairwise systematic reviews, and because of their methodological complexity they may be more vulnerable to such risks [12]. Therefore, it is important to assess the quality of published NMAs before their results are implemented in clinical or public health practice.

Previous studies have examined methodological problems in published indirect comparisons and NMAs, especially regarding the reporting quality of statistical analysis [12,13,14,15]. It was concluded that the key methodological components of the NMA process were often inadequately reported in published NMAs [12]. Currently, there are 30 tools available to assess the methodological quality of systematic reviews or meta-analyses [16]. To the best of our knowledge, however, no standard tool has yet been developed to assess the methodological quality of NMAs. AMSTAR (a measurement tool to assess the methodological quality of systematic reviews) is probably the most commonly used quality assessment tool for systematic reviews, and it has been shown to have good reliability, validity, and responsiveness [17,18,19].

The objective of this study is to conduct a methodological review of published NMAs in the field of cancer and to summarise their characteristics, methodological quality, and reporting of key statistical analysis processes. We also aim to compare methodological quality and reporting of statistical analysis across selected general characteristics.

Results

Search results

The initial literature search retrieved 6,408 citations. Of these, 3,754 were duplicates, leaving 2,654 citations for further screening. Based on titles and abstracts, 1,741 citations were excluded. A further 637 articles were excluded after reading the full texts, for reasons including: traditional pairwise meta-analysis (n = 64), methodological studies (n = 67), NMAs not related to cancer (n = 478), abstracts/letters/editorials/correspondence (n = 26), and cost-effectiveness reviews (n = 6). Finally, 102 NMAs in the field of cancer were included (Fig. 1), of which 92 were published in English and 10 in Chinese. A list of the included NMAs can be found in Appendix 1.

Figure 1. The details of literature selection.

General characteristics of included NMAs

The first NMA in the field of cancer was published in 2006 [20]. The number of published NMAs increased slowly until 2010 and rapidly thereafter; 43.14% (44/102) of the included NMAs were published since 2014 (Fig. 2). Ninety-eight NMAs covered 24 kinds of cancer, and 4 NMAs did not specify the type of cancer. Non-small cell lung cancer (19/102, 18.63%) and breast cancer (12/102, 11.76%) were the most common and second most common cancer types studied in the included NMAs (Fig. 3). NMAs were most often performed by researchers based in China (29/102, 28.43%), the UK (24/102, 23.53%), and the USA (11/102, 10.78%) (Fig. 4). Ninety-nine NMAs were published in 60 different journals and 3 NMAs were doctoral dissertations. Overall, 85.30% (87/102) of NMAs were indexed by the Science Citation Index (SCI) and 31.37% (32/102) were published in journals with high impact factors. Among the 56 NMAs (54.90%) that reported the dates of manuscript receipt and acceptance, the median publishing period was 101 days (IQR: 47–187 days). Among the 60 NMAs (58.82%) with information on funding source, 46 NMAs (45.10%) received funding support. The median number of interventions assessed per NMA was five (IQR: 3–9). The median number of trials included per NMA was 12 (IQR: 7–23), and the median number of patients included per NMA was 3,605 (IQR: 1,950–7,564). The main characteristics of the included NMAs are shown in Table 1. More detailed characteristics and the reporting of the statistical analysis process can be found in Appendix 2.

Table 1 Characteristics of the included NMAs.

Figure 2. The trend of year of publications.

Figure 3. Categories of disease of included NMAs.

Figure 4. Countries of included NMAs.

Reporting of literature search

Thirteen NMAs did not report any information on the literature search, and one NMA was conducted on the basis of previous meta-analyses without additional searching. 98.90% (88/89) of NMAs searched only English databases. The median number of Chinese databases searched was 5 (IQR: 3–6), and the median number of English databases searched was 3 (IQR: 3–4). The search strategy was reported in 22.50% (20/89) of NMAs, and the median number of search strategies reported was 2 (IQR: 1–3). As a supplemental literature search, 27.00% (24/89) of NMAs searched previously published meta-analyses. Other supplemental search methods included reference list checking, clinical trial registration platforms, conference abstracts or web sites, and the Google search engine (Table 2). PubMed/MEDLINE was the most commonly searched single database, and it was often combined with a search of the Cochrane Library. The details of the databases searched are shown in Table 3.

Table 2 Reporting information of literature search.
Table 3 Information of databases searched (n = 89).

Reporting of statistical analysis processes

Sixty-one NMAs (59.80%) were conducted using a Bayesian framework, and 43 reviews were adjusted indirect comparisons. Two reviews were adjusted indirect comparisons performed within a Bayesian framework and are therefore counted in both groups.

For NMAs using a Bayesian framework, more than half also included traditional meta-analyses (42/61, 68.85%). The majority of NMAs reported summary measures (57/61, 93.44%), and 75.41% (46/61) reported the model used. Of the 24 NMAs (24/61, 39.34%) that tested model fit, the most common method was the deviance information criterion (15/24, 62.50%). The majority of NMAs (40/61, 65.57%) did not make their code available to journal readers. Sixteen NMAs (26.23%) cited the source of the model, but the details of the model used remained unclear. Most NMAs (56/61, 91.80%) did not report whether there was an adjustment for multiple arms. Half of the NMAs (31/61, 50.82%) specified the prior distributions used, and the most common prior was a non-informative prior (18/31, 58.06%). Only 4 NMAs (6.56%) performed pre-specified sensitivity analyses. More than half of the NMAs did not report an assessment of convergence (37/61, 60.66%) or whether sensitivity analyses were performed (37/61, 60.66%). Inconsistency was assessed in 27.87% of NMAs. Assessment of heterogeneity was more common in traditional meta-analyses (26/61, 42.62%) than in NMAs (4/61, 6.56%). Most of the included NMAs did not report an assessment of similarity (53/61, 86.89%), an assessment of publication or reporting bias (60/61, 98.36%), whether subgroup analyses or meta-regression were performed (49/61, 80.33%), or whether the GRADE tool was used to assess the quality of evidence (58/61, 95.08%). NMAs published in journals with higher impact factors more often provided model code (57.69% versus 23.08%, p = 0.012), and the proportion assessing the heterogeneity of NMAs also differed by impact factor (0% versus 15.38%, p = 0.039). Based on the median division of the number of included NMAs, we chose December 31st 2013 as the cut-off point. NMAs published prior to December 31st 2013 more often reported the model used (89.66% versus 62.50%, p = 0.015), the model code used (48.28% versus 21.88%, p = 0.032), and an assessment of heterogeneity of NMAs (13.79% versus 0%, p = 0.031). Other results did not differ by journal impact factor or year of publication. Further details of statistical reporting in Bayesian NMAs are shown in Table 4.

Table 4 Statistical Reporting in Bayesian analysis [n/N(%)].

For adjusted indirect comparisons, the majority of NMAs (42/43, 97.67%) also conducted traditional meta-analyses, and 53.49% (23/43) of adjusted indirect comparisons were performed using the method described by Bucher [21]. Heterogeneity of direct comparisons was assessed in 58.14% (25/43), but none of the NMAs assessed the heterogeneity of indirect comparisons. Only two NMAs (4.65%) described how multigroup trials were handled, and three (6.98%) described the methods of similarity assessment. Most NMAs did not report whether sensitivity analyses were performed (38/43, 88.37%) or whether subgroup analyses or meta-regression were performed (34/43, 79.07%). These results did not differ by journal impact factor or year of publication. The details of statistical reporting for adjusted indirect comparisons are shown in Table 5.
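
For readers unfamiliar with the adjusted indirect comparison commonly attributed to Bucher, the calculation can be sketched as follows. Given pooled log-scale effect estimates for A versus C and for B versus C from separate pairwise meta-analyses, the indirect estimate for A versus B is their difference, and its variance is the sum of the two variances. The function name and all numbers in the usage lines are hypothetical illustrations, not data from the included NMAs.

```python
import math


def bucher_indirect(log_ac, se_ac, log_bc, se_bc, z=1.959964):
    """Adjusted indirect comparison of A vs B through a common comparator C.

    Inputs are log-scale effect estimates (e.g., log odds ratios) and their
    standard errors. Returns the ratio-scale point estimate, the lower and
    upper bounds of the 95% confidence interval, and the log-scale SE.
    """
    log_ab = log_ac - log_bc                      # difference of log effects
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)    # variances add
    lower = math.exp(log_ab - z * se_ab)
    upper = math.exp(log_ab + z * se_ab)
    return math.exp(log_ab), lower, upper, se_ab


# Hypothetical inputs: OR(A vs C) = 0.70 (SE 0.10), OR(B vs C) = 0.85 (SE 0.12)
or_ab, ci_low, ci_high, se_ab = bucher_indirect(
    math.log(0.70), 0.10, math.log(0.85), 0.12
)
```

Because the variances add, the indirect estimate is always less precise than either of the direct estimates it is built from, which is one reason the assessment of heterogeneity and similarity matters for these analyses.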

Table 5 Statistical Reporting in adjusted indirect comparisons [n/N(%)].

Methodological quality assessment

The results of the methodological quality assessment based on the modified AMSTAR checklist are presented in Fig. 5. The median total score was 8.00 (IQR: 6.00–8.25). Approximately half of the included NMAs did not perform a comprehensive literature search (Item 3, 42.31%). More than half of the NMAs (69.61%) did not consider the scientific quality of the included studies in formulating conclusions, and 84.31% of NMAs did not assess the likelihood of publication bias.

Figure 5. The results of methodological quality assessment.

Table 6 presents the results of the stratified analyses of the methodological quality assessment. NMAs published in journals with higher impact factors more often performed a comprehensive literature search (78.13% versus 45.45%, p = 0.002), reported appropriate methods to combine the findings of studies (81.25% versus 58.18%, p = 0.019), and assessed the likelihood of publication bias (25.00% versus 5.45%, p = 0.017). NMAs published after December 31st 2013 more often assessed the scientific quality of the included studies (86.36% versus 55.17%, p = 0.001) and considered the scientific quality in formulating conclusions (43.18% versus 20.69%, p = 0.015). Most of these items did not differ between NMAs with and without funding support. NMAs published in China more often reported two independent reviewers for study selection and data extraction (89.66% versus 65.75%, p = 0.015), assessed the scientific quality of the included studies (86.21% versus 61.64%, p = 0.016), and considered the scientific quality in formulating conclusions (68.97% versus 15.07%, p < 0.001). Moreover, Bayesian NMAs more often reported two independent reviewers for study selection and data extraction (81.97% versus 60.47%, p = 0.015), performed a comprehensive literature search (70.49% versus 39.53%, p = 0.002), considered the status of publication (i.e., grey literature) as an inclusion criterion (86.89% versus 67.44%, p = 0.017), and assessed the scientific quality of the included studies (78.69% versus 55.81%, p = 0.013).

Table 6 Subgroup analyses of methodological quality assessment (n/%).

Table 7 presents the association between the total AMSTAR score and selected general characteristics. Although the AMSTAR score of NMAs published in China was higher than that of NMAs published elsewhere (p = 0.023), the AMSTAR score did not differ significantly across individual countries (p = 0.465). The AMSTAR score was not significantly associated with the other selected general characteristics.

Table 7 The association of total AMSTAR-score and selected general characteristics [median (IQR)].

Discussion

We identified 102 NMAs involving 24 kinds of cancer. Methodological quality and statistical reporting were assessed based on the PRISMA extension statement and a modified AMSTAR checklist. We also assessed the conduct of the literature search in the included NMAs. Some key methodological components, including the literature search and statistical analysis, were missing or inadequate in most of the included NMAs; for example, only 22.50% of NMAs reported the search strategy and 6.56% assessed heterogeneity in the NMA. Methodological quality and reporting of statistical analysis did not substantially differ by the selected general characteristics of NMAs.

NMAs can provide useful evidence on the relative effectiveness of different interventions for decision-making when direct comparison trials are absent or insufficient [11]. The methodological quality of NMAs is therefore crucial for health care decision-makers and researchers. We assessed the methodological quality of NMAs in the field of cancer based on a modified AMSTAR checklist. Some methodological flaws were identified, especially regarding the literature search (Item 3), the assessment of scientific quality (Item 7), the appropriate use of scientific quality in formulating conclusions (Item 8), the methods used to combine the findings of studies (Item 9), and the assessment of publication bias (Item 10).

NMAs aim to rank the benefits (or harms) of interventions based on all available RCTs; thus, the identification of all relevant data is critical [7]. Most of the included NMAs (80.39%, 82/102) did not report the database search strategy. Among those that reported a search strategy, 26.96% searched only previously published meta-analyses. It is important to search, track, and include previous systematic reviews and meta-analyses when conducting NMAs [22]. PubMed/MEDLINE was the most commonly used database, and the most common combination of databases was PubMed/MEDLINE and EMBASE. The majority of NMAs did not search Chinese databases. The study by Cohen et al. showed that searching Chinese databases might identify a large amount of additional clinical evidence, and suggested that Chinese biomedical databases should be searched when performing systematic reviews [23].

The assessment of the scientific quality of individual studies could affect the findings of NMAs [24]. However, 31.37% of the included NMAs did not report methods for assessing the risk of bias of individual studies in their methods sections, and 69.61% did not consider the scientific quality of the included studies in formulating conclusions. Although reporting bias could have a substantial effect on the conclusions of an NMA [12], most of the included NMAs (84.31%) did not report a method to assess publication bias.

The complex nature of NMA is mainly reflected in the diversity of interventions and the complexity of the statistical analysis process. Homogeneity and consistency assumptions underlie NMA [25]. Although assessment of heterogeneity in traditional meta-analyses was common, only 4 NMAs (3.92%) assessed heterogeneity in the entire network using the heterogeneity variance parameter (τ²). Eleven NMAs (10.78%) explicitly reported the methods used to assess similarity. Among those using a Bayesian framework, 17 (27.87%) assessed the inconsistency between direct and indirect comparisons. The GRADE tool for assessing the quality of evidence from NMAs was proposed in 2014 [26]; however, it was still rarely used to assess the quality of evidence in NMAs related to cancer.

To the best of our knowledge, this is the first review to comprehensively assess methodological quality using a modified AMSTAR checklist while simultaneously assessing the quality of reporting of the literature search and statistical analysis methods. Two recent reviews that also focused on the methodological problems of published network meta-analyses [12,15] covered a wide range of medical areas, and some details of the reporting of literature search and statistical analysis were missing. Bafeta et al. [12] included 121 NMAs to examine the methodological reporting of NMAs; their results showed that 73% did not report the electronic search strategy for each database, compared with 77.5% in our study. Few NMAs assessed the quality of evidence using the GRADE tool (3% vs. 4.92%). Their results on methodological reporting were thus similar to ours. Chambers et al. [15] also found similar methodological quality problems in their included NMAs. However, the AMSTAR checklist had not previously been used to systematically assess the methodological quality of NMAs. Furthermore, we explored potential factors influencing methodological quality and statistical reporting according to the general characteristics of the included NMAs. There were no substantial differences by the selected general characteristics of NMAs.

Our study also has some limitations. There was no standard tool to assess the methodological quality of NMAs, so we slightly modified three of the 11 AMSTAR items (Item 1, Item 5, and Item 9) for this purpose. However, some problems and uncertainties remain, such as the difficulty of defining the types of interventions and comparisons to include in NMAs, how to draw the geometry of the network, how to handle multigroup trials, how to decide whether the assessment of similarity and consistency was appropriate, and whether the statistical analysis methods were appropriate for NMAs. The complexity of the statistical analysis of NMAs makes it necessary to develop a guideline on the reporting of statistical analysis in NMAs. As with other methodological studies, assessing methodological quality and reporting quality from published reports alone could be misleading. The study authors may have used adequate methods but omitted important details from the published reports [12]; conversely, a published report may satisfy the relevant reporting guidelines even though the review itself was not conducted rigorously. For example, although we recorded whether study selection and data extraction were reported as being performed by at least two independent reviewers, we could not verify whether the processes were actually performed that way. Finally, we did not identify any eligible NMAs related to diagnostic test accuracy or animal studies. We also did not include reviews based on individual patient data (IPD) because the methods and statistical analysis processes differ between IPD and aggregated data.

Overall, the methodological quality of NMAs in the field of cancer was generally acceptable. However, some methodological flaws were identified in published NMAs, especially regarding the literature search, the assessment of scientific quality and its appropriate use in formulating conclusions, the methods used to combine the findings of studies, and the assessment of publication bias. Methodological quality and statistical reporting did not substantially differ by general characteristics.

Methods

Search strategy

PubMed, EMBASE, Web of Science, Science Citation Index Expanded, Social Sciences Citation Index, The Cochrane Library, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, Health Technology Assessment Database, NHS Economic Evaluation Database, Chinese Biomedical Literature Database (CBM), and China National Knowledge Infrastructure (CNKI) were searched from inception to February 26th, 2014. The search strategy was recently reported in a published paper [7]. All searches were updated on 9th July, 2015.

Eligibility criteria

We included any NMAs in the field of cancer published in English or Chinese, regardless of the interventions. NMAs were defined as meta-analyses that used network meta-analytic methods to analyze three or more different interventions simultaneously [7]; adjusted indirect comparisons were also included. If the same NMA had duplicate publications, the latest was included. We excluded methodological articles, conference abstracts, letters, editorials, correspondence, cost-effectiveness reviews, and reviews based on individual patient data.

Study selection

Literature search records were imported into EndNote X6 literature management software. Two independent reviewers (LG, LL) examined the titles and abstracts of the retrieved studies to identify potentially relevant studies according to the eligibility criteria. Then, full-text versions of all potentially relevant studies were obtained. Excluded studies and the reasons for their exclusion were listed, and conflicts were resolved by a third reviewer (J-HT or K-HY).

Data extraction and management

A standard data abstraction form was created using Microsoft Excel 2013 (Microsoft Corp, Redmond, WA, www.microsoft.com) to collect the data of interest. A pilot test was performed for literature selection and data extraction, and a “cheat sheet” with detailed definitions and examples was developed to ensure high inter-rater reliability among the reviewers.

General characteristics

The following general characteristics were collected by one reviewer (LG): first author, year of publication, country of corresponding author, journal name, publishing period (time from receipt to acceptance), funding source (industry-supported, non-industry-supported, unfunded, or not reported), number of authors, language of publication (English or Chinese), number and type of included original studies, sample size of included original studies, number of study arms, type of outcome (dichotomous, continuous, or survival time), categories of disease, and number of interventions included in the network. We categorised journals as Science Citation Index (SCI) or non-SCI; we also identified journals with high impact factors (IF ≥ 5.000, as reported in Journal Citation Reports 2014) [27] or low impact factors (IF < 5.000). We also categorised NMAs as older or recent studies based on a median split of the included NMAs.
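
The two stratification rules described here (impact factor threshold and median split by publication date) can be sketched as below. This is only an illustration under stated assumptions: the record type and field names are hypothetical, the cut-off date is the one given in the Results (December 31st 2013), treating the cut-off date itself as "older" is an assumption, and the handling of journals without an impact factor is not described in the text and is not modelled.

```python
from dataclasses import dataclass
from datetime import date

HIGH_IF_THRESHOLD = 5.0        # IF >= 5.000 is classed as high impact
CUTOFF = date(2013, 12, 31)    # median-split date reported in the Results


@dataclass
class NmaRecord:
    journal_if: float    # hypothetical field: impact factor (JCR 2014)
    published: date      # hypothetical field: publication date


def impact_group(rec: NmaRecord) -> str:
    """Classify a record as 'high' or 'low' impact factor."""
    return "high" if rec.journal_if >= HIGH_IF_THRESHOLD else "low"


def period_group(rec: NmaRecord) -> str:
    """Classify a record as 'older' or 'recent' relative to the cut-off.

    Assumption: a record dated exactly on the cut-off counts as 'older'.
    """
    return "older" if rec.published <= CUTOFF else "recent"


# Hypothetical record, not taken from the included NMAs
example = NmaRecord(journal_if=5.0, published=date(2014, 3, 1))
groups = (impact_group(example), period_group(example))
```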

Reporting of literature search

One reviewer (XQ) extracted the following information regarding the reporting of the literature search: number of databases searched (Chinese, English, or both), names of the databases searched, whether the search strategy was provided, whether previous systematic reviews/meta-analyses were searched, and the names and number of other sources searched (e.g., reference list checking, clinical trial registration platforms, conference abstracts or web sites, the Google search engine).

Reporting of statistical analysis processes

We assessed the reporting and quality of the statistical analysis processes in the methods sections of each NMA report according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) extension statement for NMAs [28]. The following questions were designed according to the statistical analysis section of the PRISMA extension statement; data were extracted by two independent reviewers (LG, JZ), and conflicts were resolved by a third reviewer (J-HT or K-HY):

  • Was traditional meta-analysis conducted?

  • Were summary measures reported?

    State the principal summary measures (e.g., risk ratio, odds ratio, mean difference, hazard ratio). Also describe the use of additional summary measures assessed, such as treatment rankings (e.g., probability of being best or surface under the cumulative ranking curve (SUCRA) values) and shape and scale parameters for survival data [29].

  • Planned methods of analysis. This should include:

    Was a Bayesian or a frequentist framework used?

    What was the model used? (random-effects model, fixed-effect model, others, or not reported). What was the method used to undertake the indirect comparisons [30]?

    Was the model code presented or its source cited in Bayesian analyses? (not provided, manuscript, online supplement, external web site, reference, or others).

    Was the model fit assessed? (e.g., residual deviance [31], deviance information criterion [31], other, not reported).

    Was the handling of multigroup trials reported?

    Was the selection of prior distributions in Bayesian analyses described?

    Was convergence in Bayesian analyses assessed [32]?

    Was heterogeneity in traditional meta-analysis assessed, and how was it handled [33]?

    Was heterogeneity in the entire network of the NMA assessed, and how was it handled [34]?

    Was transitivity/similarity in the NMA assessed [35]?

  • Was inconsistency assessed, and how was it handled [36]?

  • Was publication or reporting bias assessed [37]?

  • Additional analyses included:

    Was a sensitivity analysis performed? (e.g., excluding studies, alternative prior distributions for Bayesian analyses, alternative formulations of the treatment network).

    Was subgroup analysis or meta-regression performed?

  • Was the Grading of Recommendations Assessment, Development and Evaluation (GRADE) tool used to assess the quality of evidence [26]?
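
As background to the SUCRA values mentioned in the checklist above, the following minimal sketch shows how a SUCRA value is conventionally obtained from one treatment's rank probabilities: it is the average of the cumulative rank probabilities over the first a − 1 ranks, where a is the number of treatments. The function name and the example probabilities are hypothetical and are not taken from any included NMA.

```python
def sucra(rank_probs):
    """SUCRA for one treatment.

    rank_probs[k] is the probability that the treatment has rank k + 1,
    where rank 1 is best. Returns a value between 0 (always worst) and
    1 (always best).
    """
    a = len(rank_probs)
    if a < 2:
        raise ValueError("at least two treatments are required")
    if abs(sum(rank_probs) - 1.0) > 1e-9:
        raise ValueError("rank probabilities must sum to 1")
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:      # the last cumulative value is always 1
        cumulative += p
        total += cumulative
    return total / (a - 1)


# Hypothetical rank probabilities for one treatment in a 3-treatment network
value = sucra([0.5, 0.3, 0.2])
```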

Methodological quality assessment

There was no consensus on how to assess the methodological quality of NMAs. We assessed the methodological quality of the included NMAs using a modified AMSTAR checklist. This checklist included 11 items, with possible responses of “Yes” (item/question fully addressed), “No” (item/question not addressed), “Cannot answer” (not enough information to answer the question), and “Not applicable”. Two reviewers (XQ, G-QP) independently extracted data, and conflicts were resolved by a third reviewer (LG or J-HT). The total AMSTAR score was obtained by assigning one point for each “Yes” and no points for any other response (“No”, “Cannot answer”, or “Not applicable”), giving a range from 0 to 11. In our study, three of the 11 items were slightly modified as follows (Appendix 3):

  • “Was an ‘a priori’ design provided?” was amended to “Was the research question (i.e., research purpose, inclusion and exclusion criteria) clarified?”

    The reason for this modification was that only a small minority of published non-Cochrane reviews reported a protocol [38]. Where a protocol providing this information was available, the answer to this question was “Yes”. Where no protocol was available but detailed information about the research purpose and the inclusion and exclusion criteria (patients, interventions, comparators, outcome, and study design) was supplied, we also answered this question “Yes”.

  • “Was a list of studies (included and excluded) provided?” was amended to “Were a list of included studies and flow diagram provided?”

    The reason for this modification was that most published systematic reviews did not provide a list of excluded studies. Where a list of included studies and a flow diagram of literature selection were provided (as references, an electronic link, or a supplement), we answered this question “Yes”.

  • Were the methods used to combine the findings of studies appropriate?

    For pairwise meta-analysis, we scored “Yes” if the authors mentioned or described heterogeneity and reported how it was handled. For NMA, in addition to heterogeneity, the following factors (among others) were taken into consideration: summary measures, model used, model fit, prior distributions (Bayesian analysis), convergence (Bayesian analysis), and inconsistency.
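
The scoring rule described in this section (one point for each “Yes”, no points for any other response, over 11 items) can be sketched as follows. The function name and the example responses are hypothetical.

```python
VALID_RESPONSES = {"Yes", "No", "Cannot answer", "Not applicable"}


def amstar_total(responses):
    """Total AMSTAR score: one point per "Yes" across the 11 items."""
    if len(responses) != 11:
        raise ValueError("expected responses for 11 items")
    for response in responses:
        if response not in VALID_RESPONSES:
            raise ValueError(f"unknown response: {response!r}")
    return sum(1 for response in responses if response == "Yes")


# Hypothetical review: 8 items answered "Yes", 3 answered otherwise
example = ["Yes"] * 8 + ["No", "Cannot answer", "Not applicable"]
score = amstar_total(example)
```

Note that under this rule “Not applicable” lowers the attainable total in the same way as “No”, so scores are only comparable between reviews to which the same items apply.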

Statistical analysis

Quantitative data were summarised by medians and interquartile ranges (IQR), and categorical data were summarised by numbers and percentages. The association between methodological quality and the following characteristics was explored using the Mann-Whitney U test and the Kruskal-Wallis test: journal impact factor, year of publication, funding source, country of corresponding author, type of NMA, and categories of disease. Moreover, subgroup analyses for statistical reporting were performed according to journal impact factor (high vs. low impact factors) and year of publication (older vs. recent studies). Proportions were compared by the Chi-square test using Stata version 12.0 [39]. All tests were two-sided, and P ≤ 0.05 was considered statistically significant.
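
To illustrate the comparison of proportions, the sketch below implements a Pearson Chi-square test for a 2 × 2 table using only the Python standard library. It is not the authors' Stata code. Whether a continuity correction was applied is not stated in the text, so none is used here. The example counts (43 of 61 Bayesian NMAs and 17 of 43 adjusted indirect comparisons with a comprehensive literature search) are an assumption, back-calculated from the reported percentages of 70.49% and 39.53%.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test (no continuity correction) for a 2x2 table.

    Rows are the two groups; columns are item met / item not met:
        group 1: a met, b not met
        group 2: c met, d not met
    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Survival function of the Chi-square distribution with df = 1
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Assumed counts back-calculated from the reported percentages
stat, p = chi_square_2x2(43, 18, 17, 26)
```

With these assumed counts the p-value rounds to 0.002, which agrees with the value reported for this comparison in the Results.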

Additional Information

How to cite this article: Ge, L. et al. Epidemiology Characteristics, Methodological Assessment and Reporting of Statistical Analysis of Network Meta-Analyses in the Field of Cancer. Sci. Rep. 6, 37208; doi: 10.1038/srep37208 (2016).