Introduction

The median cost to develop a new drug is now more than half a billion dollars1. A prominent part of this bill relates to preclinical research, in which animal model-based and bench studies are conducted to test drug efficacy and safety. These studies are a mandatory step in the drug development process because exposing patients to a compound without a proof-of-concept of its potential interest would be unethical. However, even when promising results are obtained in preclinical research, most of the tested molecules fail to demonstrate efficacy in clinical trials2. In addition, concerns have been raised by the research community about the poor reproducibility of results, which has led researchers to question the methodological quality and the reporting of preclinical research3. Indeed, most preclinical studies are underpowered, with no sample size calculation prior to conducting experiments, which increases the risk of false conclusions4. Likewise, randomization and blinding, which are key methodological elements for causal inference, are usually not performed, which increases the risk of selection, performance and measurement biases5. Attrition bias may be another important issue: removing outliers from the final analysis or failing to report animals that died before the outcome could be measured seems to be common practice and may completely change the direction and magnitude of the effect size6. Finally, reporting bias may be problematic because small studies with a striking effect are more likely to be published than those with negative results7.

In the context of small-sample studies, systematic reviews and meta-analyses are attractive tools to synthesize results and overcome the lack of power8. Previous studies suggested that systematic reviews of preclinical studies may have poor methodology, which led the Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies (CAMARADES) and the Systematic Review Centre for Laboratory animal Experimentation (SYRCLE) to provide guidance, in the early 2010s, for conducting systematic reviews in the specific context of preclinical research9,10,11,12,13,14. Since the publication of these recommendations, the methodological quality of these reviews has not been evaluated, nor has the impact of the methodological features of the included studies on the effect size.

In this study, we aimed to evaluate (1) the methodology of recently published systematic reviews with meta-analyses of preclinical research; (2) the methodological quality of preclinical studies included in such reviews; and (3) the association between methodological characteristics of the included studies and effect size, by using a meta-epidemiological approach.

Results

Selection and general characteristics of systematic reviews with meta-analyses of preclinical studies

Our search retrieved 1061 unique reviews. After screening against the eligibility criteria, 212 systematic reviews with meta-analyses were included in this methodological review (Fig. 1, and Supplementary File S2 for the complete list of included reviews).

Figure 1

Selection process of preclinical systematic reviews with meta-analyses. Adapted from the PRISMA flow diagram18.

General characteristics of these reviews are presented in Table 1. The medical conditions studied were most commonly related to neurology or neurosurgery (27.8% of reviews). A statistician or an epidemiologist was involved in 22.2% of the articles. Overall, 75.9% of the reviews reported adherence to guidelines for conducting the systematic review: the PRISMA Statement was reported in 86.3% of these, and guidelines from the SYRCLE and CAMARADES consortia were followed in 20.5% and 3.1%, respectively. Few reviews were registered (25.0%), mainly on the PROSPERO registry or on the SYRCLE or CAMARADES websites. Regarding the definition of the objective, the population and intervention were well reported in most systematic reviews (97.2% and 97.6%, respectively), but the control and outcome (with timepoint) were clearly defined in only 59% and 34%, respectively (Supplementary Table S1).

Table 1 General characteristics of the included systematic reviews with meta-analyses.

Methodological characteristics of systematic reviews with meta-analyses of preclinical studies

All reviews conducted an electronic search, and almost all (99.1%) reported at least one electronic database. The search equation was reported in 66.5% of reviews, and an animal filter was applied in 10.8%. A search of other sources was reported for 62.7% of reviews, predominantly the exploration of the reference lists of selected studies (75.9%). The grey literature was explored in 25 reviews (11.8%), including the OpenGrey website in 6 (4.5%). The selection and data extraction processes were performed in duplicate in 67.9% and 46.7% of reviews, respectively. The methodological quality of included studies was assessed in 83.5% of reviews, and in duplicate in 59.3%. Reviews of pathophysiologic studies assessed methodological quality less often (70.2% vs 87.3% of reviews of therapeutic interventions). The most common tools used were the SYRCLE risk of bias tool (50.8%) and the CAMARADES quality checklist (18.6%) (Table 2).

Table 2 Methodological characteristics of included systematic reviews. *Chinese National Knowledge Infrastructure, WanFang, VIP, Chinese Biomedical Literature. **As published by Hooijmans et al., and de Vries et al.47,50. ***Google search engine queries, OpenGrey website, experts, unpublished data, others websites.

The median number of included studies per systematic review was 22 (interquartile range [Q1, Q3] 13, 42), with a trend toward a higher number in systematic reviews exploring the pathophysiology of diseases (27 [15, 62]). The animal model (species and strain) was detailed in 59.0% of reviews.

The methodological characteristics of the meta-analyses are summarized in Table 3. Outcomes were mostly quantitative (92.5%) and were summarized with an SMD in 61.7% of meta-analyses. The random-effects model was the most frequently used (84.5%), and 29.0% of authors justified this choice by the observed substantial statistical heterogeneity. Only 9 reviews used both fixed- and random-effects models to synthesize data. Almost two thirds of meta-analyses (61.8%) included several experimental arms for the same preclinical study; 21.2% used a method to account for this dependency (splitting the control group according to the number of experimental arms sharing the control10, robust variance-based or multilevel models15,16). Statistical heterogeneity was measured in 93.4% of meta-analyses by the Cochran Q test and/or the I2 statistic and was explored in 63.7%. The impact of methodological quality on the meta-analysis results was assessed in 11.3% of meta-analyses. Small-study effects were explored in 64.5% of meta-analyses including 10 experimental arms or more. In total, 33 systematic reviews (15.6%) included both animal and human studies. More than half (54.5%; n = 18) performed separate meta-analyses for human and animal data, and 27.3% (n = 9) meta-analyzed only animal data (Supplementary Table S2).

Table 3 Methodological characteristics of the meta-analyses (for the primary outcome or first reported outcome). MA meta-analysis. *Splitting the control group according to the number of experimental arms, use of robust variance in meta-analysis model, multi-level random-effects models. **Egger’s or Begg’s tests, other tests.

The median number of included studies in the meta-analyses was 9 [Q1, Q3 5, 20.0] and the median number of experimental arms was 13 [7, 33.5] (Supplementary Table S3). Heterogeneity was generally high across studies, with a median I2 of 77% [55.5, 90.7]. Small-study effects were reported in 35.5% of the meta-analyses that included ≥ 10 experimental arms and reported this evaluation. Among the 121 reviews that used an SMD, the combined estimate was in favor of the treatment (or of the expected direction for reviews of pathophysiology studies) with a p-value below 5% for 104 (86.6%), showed no association with the treatment for 6, and was against the treatment with a p-value below 5% for 1; in 10 meta-analyses, the expected direction of the effect size was unclear.

Methodological features of included preclinical studies

We evaluated 763 unique studies from 63 meta-analyses reporting an evaluation of methodological quality for each study and an SMD as the effect size (see Supplementary Fig. S2 for details on the selection process). The median year of publication was 2013 [Q1, Q3: 2009, 2015]; 44.0% of studies originated from a Chinese laboratory, and 78.9% were published in English. Most experimental procedures (92.7%) involved rodents (Table 4).

Table 4 General characteristics of the included studies in meta-analyses.

Among the items from the different scales used to appraise risk of bias (Table 4), randomization was rated for all studies (Table 5). However, blinding of the animal model was assessed for only 25.4% of included studies and was mainly rated at high risk (92.3%). The following characteristics were mostly rated at unclear risk of bias: allocation concealment (60.7%), random housing of animals (91%), blinding of caregivers (77.8%), random outcome assessment (86.3%) and blinding of assessors (39.8%). Low risk was predominantly reported for randomization (53%), similarity of group characteristics at baseline (51.9%), attrition bias (60.3%), and selective outcome reporting bias (68.3%).

Table 5 Risk of bias of included studies in the meta-analyses.

We also assessed items regarding the quality of reporting and the methods of experimental procedures. The number of studies evaluated for a given feature by review authors is reported in Table 6. “Yes” was assigned in half or more of the studies assessed for the following features: publication in a peer-reviewed journal, control of temperature during the experimental procedure, use of an anesthetic without protective effect on the outcome, description of the animal model, compliance with animal welfare regulations, and statement of conflicts of interest. However, sample size calculation was reported in only 1 of the 351 studies assessed for this feature.

Table 6 Quality of reporting and experimental procedures of included studies in meta-analyses.

Meta-epidemiological analysis

Among the 63 selected meta-analyses, the number contributing to the analysis for each key methodological feature ranged from 11 (218 experimental arms) to 37 (849 experimental arms) (Fig. 2). None of the risk of bias features that could be evaluated (randomization, group characteristics similar at baseline, blinding of assessors, attrition, and selective outcome reporting) was associated with effect size. The I2 ranged from 54.1% (randomization) to 73.3% (selective outcome reporting). The sensitivity analysis using a robust variance estimator gave similar results (Supplementary Fig. S3).

Figure 2

Difference in standardized mean difference (SMD) for risk of bias estimated by a meta-epidemiological analysis. A positive difference in SMD reveals a larger effect size in studies at high or unclear risk of bias. A negative difference in SMD indicates a smaller effect size in studies with threats to methodological quality. Het heterogeneity, MA meta-analyses.

We did not perform the meta-epidemiological analysis for allocation concealment, random housing of animals, blinding of caregivers and animal models, and random outcome assessment because fewer than ten meta-analyses were eligible for these characteristics (Supplementary Table S4).

Discussion

This review provides an overview of the methodological quality of recently published systematic reviews with meta-analyses of preclinical research. Regarding the systematic review process, an electronic database was almost always reported, and most systematic reviews assessed the methodological quality of included studies. In addition, almost all reviews reported the meta-analysis model and an evaluation of statistical heterogeneity, which was explored in two thirds of reviews. Still, there is room for improvement: a clear definition of the control group was often missing, both in the methods and in the description of included studies in the results, and selection, data extraction and risk of bias assessment were performed in duplicate in only 67.9%, 46.7% and 59.3% of reviews, respectively. In addition, the methodological quality of included studies was poor: their risk of bias was mostly rated as unclear, and reporting was often incomplete. Our meta-epidemiological analysis did not find any association between the methodological features of included studies and the effect size.

As compared with previously published methodological reviews, our evaluation suggests a substantial improvement in the methodology of systematic reviews and meta-analyses of preclinical studies. Across all evaluated reviews, the PRISMA statement was followed in 65.5% of reports, which contrasts with previously reported rates of 3–14%9,10,13,17,18,19. Our study also revealed a much more frequent evaluation of the methodological quality of the included studies (83.5% vs 18–47.3% in previous reviews11,12,13,20,21). This improvement might reflect the positive impact of the 2014 publication of the SYRCLE risk of bias tool, whose development in line with the Cochrane Risk of Bias tool may have facilitated its quick adoption in systematic reviews of preclinical research22,23,24. The methodological features related to meta-analysis also showed improvement: heterogeneity was assessed in 93.4% of meta-analyses and small-study effects were explored in two thirds of reviews, as compared with 19% and 22%, respectively, in previously published methodological reviews11,12,13,21.

Some of our results are consistent with an evaluation of systematic reviews and meta-analyses of clinical studies25. As in meta-analyses of clinical data, we found a low rate of searching for grey literature; an assessment of risk of bias in most reviews that was seldom taken into account in the meta-analysis; and, in about one third of meta-analyses, use of the heterogeneity statistic to guide the choice of model, a practice not recommended by Cochrane26. The two main differences from meta-analyses of clinical studies were the type of primary outcome, mostly continuous in our sample of preclinical reviews and dichotomous in clinical research reviews, and the proportion of combined estimates that reached statistical significance. Indeed, Page et al. found that 60% of combined estimates differed from a null effect at a type-one error rate below 5%, as compared with 86.6% of the meta-analyses of SMD in our sample25.

Regarding preclinical studies, our results suggest an improvement in some domains but not in others, which is consistent with previously published methodological reviews. Van Luijk et al. reported the quality of studies included in 91 systematic reviews (published between 2005 and 2012) of interventions in animals and found similar rates of blinding of caretakers and assessors and of reporting of drop-outs in primary studies20. Conversely, we found better reporting of randomization (53.2%), which is consistent with a recently published methodological review27, and of conflicts of interest (50.0%), which confirms the time-trend observed for these methodological features in the review of preclinical studies by Macleod et al.4. However, the mere reporting of randomization should be weighed against its evaluation with the more stringent SYRCLE risk of bias item “appropriate sequence generation”: the rate of low-risk studies then decreases to 33.5% (130/388). In the same way, even if features such as selective outcome reporting and attrition were considered predominantly at low risk by review authors, registration of preclinical study protocols and presentation of a flow chart are still uncommon, which limits the evaluation of these biases. Of note, random housing of animals and random outcome assessment were almost always graded as unclear, which might reflect the absence of these practices in laboratories. Sample size calculation before starting the experimental procedure was almost never reported, which is consistent with previous reports pointing to persistently poor use of statistics in preclinical research4,27. Some studies have evaluated the impact of reporting guidelines on the quality of studies submitted to journals28,29. Unfortunately, contrary to dedicated guidelines for clinical studies such as the CONSORT statement30, the 2010 implementation of the ARRIVE checklist31,32 in the submission process did not seem to affect the quality of reporting29. This finding underlines the importance of a multi-dimensional response to this issue: endorsement of guidelines by institutions and publishers, together with verification that they are actually followed, could help improve the quality of preclinical research33,34. A better awareness of the importance of methodology and transparency in research, at all stages of training, may also help change practices.

Our meta-epidemiological analysis did not show any association between the methodological characteristics of included studies and the effect size. However, our analysis may have been underpowered because numerous systematic reviews assigned the same rating to all studies for a given methodological feature, which precludes a meta-epidemiological analysis. In particular, the analysis was precluded for risk of bias items mostly rated as unclear, a reflection of the poor reporting of preclinical studies. The analysis may also have been hampered by substantial heterogeneity between meta-analyses, with I2 > 50% for all characteristics evaluated. This statistical heterogeneity may reveal variability in the rating of methodological quality across reviews. In addition, these results may be explained by a differential impact of methodological features on the effect size between clinical and preclinical research. For example, meta-epidemiological studies of clinical trials have consistently reported an association between inadequate sequence generation and larger treatment effects35,36,37,38. Our meta-epidemiological analysis of preclinical data might have failed to demonstrate this association because preclinical experiments are often conducted in inbred strains, which limits between-subject variability and accordingly the impact of a biased randomization.

Few meta-epidemiological analyses have examined preclinical studies, but their results are consistent with our own. Crossley et al. used a similar two-step meta-epidemiological approach to study the impact of randomization, blinding of the animal model and blinding of assessors on 13 meta-analyses of experimental stroke studies; except for blinding of the model, none of these features was associated with the effect size, but only 7 meta-analyses were included in the analysis of blinding39,40,41,42,43,44,45. Another study evaluated the impact of blinding of outcome assessment on the effect size in experimental spinal cord injury by estimating the proportion of variance explained by this methodological feature and found that it explained very little46.

Our study has several strengths. For the first time, we assessed the methodological characteristics of systematic reviews with meta-analyses of preclinical research in light of dedicated recommendations, based on a large sample of recently published reviews, and concomitantly assessed the methodological quality of the studies they included. We also performed a meta-epidemiological analysis of a wide range of methodological features to evaluate how these characteristics could modify the reported effect sizes. Notably, we did not limit our assessment to studies of neurological diseases, as in previously published meta-epidemiological analyses. Nevertheless, our study has several limitations. Our search strategy might have induced a selection bias because we searched only the MEDLINE database and included only systematic reviews with meta-analyses, which may feature greater methodological quality than the overall literature11,12,13. Our meta-epidemiological analysis also has two weaknesses: the lack of power discussed above, and the heterogeneity of results, which may be explained by differences in rating between review authors because we relied only on data extracted from the systematic reviews.

To conclude, even if the overall methodological profile of systematic reviews with meta-analyses of preclinical research shows substantial improvement, the reporting and methodological quality of the studies they include may still jeopardize their internal validity and limit their interpretation to exploratory findings. Our meta-epidemiological analysis did not reveal any association between the methodological characteristics of included studies and the effect size, but it may have been underpowered. Also, meta-epidemiological studies are observational by nature and may be affected by the quality of reporting in included studies. Therefore, these results should not be used as an excuse for not complying with the well-established guidelines for conducting experimental studies. Institutions and publishers have endeavored to improve the overall methodological quality of preclinical data, but there is still a long way to go before animal studies reach the expected methodological standards.

Methods

This is a methodological review of systematic reviews with meta-analyses of preclinical studies with a meta-epidemiological analysis of study characteristics associated with effect size. A protocol (available upon request) was written before starting the review process.

Search strategy

We searched MEDLINE via PubMed for systematic reviews with meta-analyses of preclinical studies published between January 1, 2018 and March 1, 2020. We focused on MEDLINE because it is the most widely used database and because our aim was not to be exhaustive but to retrieve a large sample of recent studies. The search algorithm combined terms for meta-analyses and preclinical, in vivo or animal studies and their synonyms47 (Supplementary File S1).

Eligibility criteria

We selected all systematic reviews with meta-analyses reporting the evaluation of a pharmacological (i.e., any administered product with biological activity) or non-pharmacological intervention to treat one or several medical conditions or to explore the pathophysiology of a disease. Only meta-analyses published in English were included. All vertebrate animal models were eligible, regardless of age and sex. If the review aimed to summarize preclinical and clinical evidence, we selected only articles in which the meta-analysis section included preclinical studies.

The following were excluded: systematic reviews without meta-analysis; systematic reviews and meta-analyses without an evaluation of an intervention (e.g., evaluation of the performance of a diagnostic test); systematic reviews of in vitro experimentation only; reviews about ecology, veterinary medicine, the agri-food industry, microorganism epidemiology, microbiology, or human-only research (including genetics or medical biochemistry studies); reviews of invertebrate models; narrative reviews; methodological studies; case reports; editorials; protocols; replication studies and comments; network and Bayesian meta-analyses; and meta-analyses of transcriptomic, genomic or epigenetic studies48.

Selection process

References retrieved by the search were imported into a reference manager (Zotero 5.0). Titles and abstracts were first screened by one reviewer (NST). Full texts were assessed for eligibility by two reviewers (NST and ALG). Disagreements were resolved by consensus.

Data extraction

Two reviewers (NST and DR) extracted data for all selected studies according to two standardized data extraction forms (one for systematic reviews and meta-analyses and one for included preclinical studies). Disagreements were resolved by consensus.

Characteristics of systematic reviews and meta-analyses

From available recommendations for systematic reviews and meta-analyses of preclinical studies9,10,14, we extracted the following data for each systematic review.

General characteristics extracted were publication date, journal name and impact factor; medical condition of interest; reporting of protocol availability (i.e., registration or publication); any involvement of epidemiologists or statisticians as defined elsewhere49; funding source and declared conflicts of interest by the review authors; and main objective of the study, expressed as population, intervention or exposure, control group and outcome (PICO18), explicitly provided by the authors or extracted from the text of the article. We also evaluated whether a primary outcome was defined.

Data extracted for the systematic review process were first those related to the search strategy: electronic databases searched; reporting of the search algorithm for at least one database; use of an animal filter47,50; whether restrictions on language or publication date were applied; search of the reference lists of articles or relevant reviews; manual search of conference abstracts, books or specific journals; and investigation of the “grey” literature (Google search engine queries, dedicated websites, and unpublished data obtained by contacting experts in the field). We then assessed the selection and data extraction process: whether eligibility criteria were defined and whether these steps were performed in duplicate; the evaluation of the methodological quality of included studies and which tool was used22,51,52; the number of screened, included and excluded studies and the reasons for study exclusion reported in a flow chart; and the number of animal and human studies.

Meta-analysis data extracted were the effect size measure; the statistical model used to pool data (fixed-effects and/or random-effects model) and whether it was chosen according to a heterogeneity statistic; the evaluation of heterogeneity; the exploration of heterogeneity and, if so, by which means (subgroup analysis, meta-regression model); the assessment of the potential impact of the methodological quality of included studies on the meta-analysis result (subgroup analysis according to methodological quality, meta-regression, sensitivity analyses excluding low-quality studies); and the investigation of small-study effects by a funnel plot, tests for funnel plot asymmetry, or the trim-and-fill method53 if the number of studies was sufficient. For characteristics that may vary across meta-analyses (e.g., effect size measure), we focused on the primary outcome or the first reported outcome if no primary outcome was defined.
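As an illustration of these small-study effect assessments, the following is a minimal sketch in R with the metafor package (one common choice for such diagnostics, not necessarily the software used by the reviews); the SMDs and variances are invented for the example.

```r
# Minimal sketch: small-study effect diagnostics with the metafor package.
# The SMDs (yi) and sampling variances (vi) are invented for the example.
library(metafor)

dat <- data.frame(
  yi = c(0.9, 1.4, 0.3, 2.1, 0.8, 1.7, 0.5, 1.2, 1.9, 0.6),   # per-study SMDs
  vi = c(0.12, 0.25, 0.08, 0.40, 0.10, 0.30, 0.07, 0.15, 0.35, 0.09)
)

res <- rma(yi, vi, data = dat, method = "REML")  # random-effects model

funnel(res)    # funnel plot: effect size against standard error
regtest(res)   # Egger's regression test for funnel plot asymmetry
ranktest(res)  # Begg's rank correlation test
trimfill(res)  # trim-and-fill estimate adjusting for putatively missing studies
```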

Characteristics of included preclinical studies

We focused on systematic reviews and meta-analyses reporting an evaluation of the methodological quality for each included study and a standardized mean difference (SMD) as the effect size for the primary or first reported outcome (this sample was also used for the subsequent meta-epidemiological analysis). For each included study, we collected the language of publication, the country of the laboratory (based on the affiliation of the last author) and the species used for experimentation. We extracted features related to the risk of bias, adapted from the SYRCLE risk of bias tool for animal studies22: reporting of randomization, and if available, appropriate sequence generation; similar group characteristics at baseline; appropriate allocation concealment; random housing of animals; blinding of caregivers; blinding of the animal model; random assessment of outcomes; blinding of assessors; and adequate management of incomplete outcome data and presence of selective outcome reporting. The quality of reporting and methods of experimental procedures were rated according to the CAMARADES quality items51: publication in a peer-reviewed journal, reporting of control of temperature during experimental procedure, use of an anesthetic without protective effect on outcome, description of the animal model, sample size calculation, compliance with animal welfare regulations and statement of conflicts of interest.
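For reference, the SMD standardizes the difference in group means by the pooled standard deviation; a common formulation (Cohen's d, with the multiplicative correction yielding Hedges' g, often used for the small samples typical of preclinical studies) is:

```latex
\mathrm{SMD} = \frac{\bar{x}_{T} - \bar{x}_{C}}{s_{\mathrm{pooled}}},
\qquad
s_{\mathrm{pooled}} = \sqrt{\frac{(n_{T}-1)\,s_{T}^{2} + (n_{C}-1)\,s_{C}^{2}}{n_{T}+n_{C}-2}},
\qquad
g = \mathrm{SMD}\left(1 - \frac{3}{4(n_{T}+n_{C}) - 9}\right)
```

where \(\bar{x}\), \(s\) and \(n\) denote the mean, standard deviation and sample size of the treated (T) and control (C) groups. Which variant each review computed was not always reported.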

The SMD of each individual study and its confidence interval for the primary or first reported outcome were also collected for the meta-epidemiological analysis.

All data were extracted directly from the reviews, except language of publication and country of laboratory, which were retrieved from the original publication.

Statistical analysis

Descriptive analysis

Categorical variables were expressed as number (percentage) and quantitative variables as median (Q1, Q3). Characteristics of meta-analyses evaluating a therapeutic intervention and those exploring the pathophysiology of a disease were described separately.

Meta-epidemiological analysis

The association between the effect size and each key methodological characteristic related to risk of bias, as defined above for included studies, was explored by a two-step meta-epidemiological approach54 (for an overview of the statistical methodology, see Supplementary Fig. S1). First, for each meta-analysis, we used a random-effects meta-regression to estimate the difference between the reported effect size (i.e., SMD) of studies at high or unclear risk for the methodological characteristic and the SMD of studies at low risk (difference in standardized mean difference [DSMD]). To obtain comparable effect sizes, all SMDs were transformed before meta-regression to be positive in case of a beneficial treatment effect (or of the expected direction of the relation for studies exploring the pathophysiology of a disease). Second, the DSMDs obtained per meta-analysis were combined by a random-effects meta-analysis55. This combined estimate provides an estimation of the average association between the methodological characteristic and the effect size across meta-analyses. A positive combined DSMD suggests a larger effect size for studies at high or unclear risk of bias for the characteristic. In contrast, a negative combined DSMD indicates a smaller effect size in studies with poor methodological features.
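A minimal sketch of this two-step procedure in R with the metafor package follows; the data frame and column names (arms, smd, se_smd, high_unclear, ma_id) are hypothetical stand-ins for illustration, not our analysis code.

```r
# Sketch of the two-step meta-epidemiological approach (hypothetical data).
# 'arms' holds one row per experimental arm: its SMD (oriented so that a
# positive value corresponds to the expected beneficial direction), the
# standard error of the SMD, the risk-of-bias rating for the characteristic
# (1 = high/unclear, 0 = low) and a meta-analysis identifier.
library(metafor)

# Step 1: within each meta-analysis, a random-effects meta-regression on the
# risk-of-bias indicator; its coefficient is the DSMD for that meta-analysis.
step1 <- do.call(rbind, lapply(split(arms, arms$ma_id), function(d) {
  fit <- rma(yi = smd, sei = se_smd, mods = ~ high_unclear,
             data = d, method = "REML")
  i <- which(names(coef(fit)) == "high_unclear")
  data.frame(dsmd = coef(fit)[i], se_dsmd = fit$se[i])
}))

# Step 2: combine the per-meta-analysis DSMDs by a random-effects
# meta-analysis; a positive pooled DSMD suggests larger effects in studies
# at high or unclear risk of bias.
pooled <- rma(yi = dsmd, sei = se_dsmd, data = step1, method = "REML")
summary(pooled)
```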

We performed the meta-epidemiological analysis only if there were at least 10 eligible meta-analyses for a given characteristic. An eligible meta-analysis was one including ≥ 3 studies, with at least one study at high or unclear risk of bias and one study at low risk of bias for the studied characteristic.
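Continuing the hypothetical data layout above (with an added study_id column, since a study can contribute several arms), this eligibility filter could be sketched as:

```r
# Keep meta-analyses with >= 3 distinct studies and both risk-of-bias
# levels represented; run the analysis only if >= 10 remain eligible.
eligible <- Filter(function(d) {
  length(unique(d$study_id)) >= 3 &&
    any(d$high_unclear == 1) && any(d$high_unclear == 0)
}, split(arms, arms$ma_id))

run_analysis <- length(eligible) >= 10
```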

Because numerous meta-analyses included the same study several times as multiple experimental arms, we performed a sensitivity analysis estimating the DSMD by meta-regression with a robust variance estimator, which takes into account the dependency between experimental arms of the same study included in a meta-analysis15,56.
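Under the same hypothetical layout, one way to sketch this sensitivity analysis is to refit the step-1 meta-regression and take cluster-robust standard errors at the study level with metafor's robust() before pooling the DSMDs as before.

```r
# Sketch: robust variance estimation clustered by study, so that several
# experimental arms from the same study are not treated as independent.
step1_robust <- do.call(rbind, lapply(split(arms, arms$ma_id), function(d) {
  fit  <- rma(yi = smd, sei = se_smd, mods = ~ high_unclear,
              data = d, method = "REML")
  rfit <- robust(fit, cluster = d$study_id)  # cluster-robust (sandwich) SEs
  i <- which(names(coef(rfit)) == "high_unclear")
  data.frame(dsmd = coef(rfit)[i], se_dsmd = rfit$se[i])
}))
```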

Statistical analysis was performed with R 3.6 (R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/).