That only a minute fraction of published preclinical studies of medicines and therapies have been tested in case studies or clinical trials is inescapable evidence that clinical translation is complex and costly. Yet another factor is also critical: difficulties in the reproducibility (by a different team using a different set-up), and even the replicability (by a different team using the same set-up), of protocols, procedures and results. The few reproducibility efforts that have been carried out in biomedicine (mostly in cancer biology) have reported that only 10–25% of the small sets of preclinical studies considered could be successfully reproduced.

[Figure: Improper design, implementation and reporting of preclinical studies of medicines and therapies can undermine their reproducibility and translatability. Credit: Chris Ryan/Nature, Springer Nature Ltd. Reproduced from Nature 529, 456–458 (2016).]

Preclinical findings can be difficult to replicate and reproduce, owing to hurdles associated with biological heterogeneity and complexity, or with the use of non-standard methods or of costly materials or technology. But many published studies unfortunately also suffer from poor or biased study design, insufficient statistical power, or a lack of adherence to reporting standards; these flaws, which can result in wasted time and resources, reduced funding and investment, and occasionally the halting of promising research, are entirely avoidable.

How can preclinical biomedical studies be designed to maximize the chances of clinical translation? A Perspective by John Ioannidis, Betty Kim and Alan Trounson, published in this issue, provides specific advice for studies in nanomedicine and in cell therapy. For instance, to minimize the effects of biological heterogeneity when designing a study to test a new nanomedicine or cell product, investigators should adopt or develop standardized specimens, materials and protocols, and use multiple disease-relevant in vitro and animal models; immunocompetent mouse models and models using patient-derived cells, for example, typically recapitulate critical aspects of human disease better than immunocompromised animals and cell-line-derived xenografts. Still, all models have limitations; to avoid hype, these and any other limitations, assumptions and caveats of the study should be reported. And to reduce biological variability and experimentation (and experimenter) biases, replication by different investigators (ideally in a multi-site collaboration) across independent biological specimens or systems is often necessary.

Also, when designing new medicines or therapies, early attempts to reduce complexities in product synthesis or manufacturing and in material costs should pay off; too often, promising nanomedicines and cell products are not translatable because their synthesis cannot be scaled up (or because producing them with consistent homogeneity or purity would be unfeasible or too expensive), or because their safety profiles do not offer satisfactory assurances for testing in humans.

Tools that protect investigators from biases and statistically weak results can act as effective safeguards against reproducibility failures in preclinical studies testing medicines or therapies. In particular, randomization and blinding, when feasible, reduce bias, and large, adequately powered studies boost trust in the claims. Yet studies that lack negative controls when claiming a positive result, that lack positive controls when claiming a negative finding, that claim statistical significance on the basis of P values barely below the arbitrary threshold of 0.05, or that selectively report the outcomes of exhaustive post-hoc analyses carried out to find statistically significant patterns in the data, collectively pollute the biomedical literature and harm biomedical research. Biases, which are rarely intentional, usually result from inadequate training or from skewed incentives and rewards, such as the pressure to publish significant results. Hence, study design, including well-informed statistical analyses, should ideally be in place before the study is implemented (except for work of an exploratory nature), and be registered in a publicly accessible database (such as preclinicaltrials.eu).
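To make the notion of adequate power concrete, the minimal sketch below (not drawn from the Perspective; it assumes Python with the statsmodels library, and a hypothetical standardized effect size of 0.8, Cohen's d) estimates how many animals per group a simple two-group comparison would need to reach 80% power at a two-sided significance level of 0.05.

```python
# A priori power analysis for a two-group preclinical comparison.
# Assumptions (hypothetical, for illustration only): effect size of 0.8
# (Cohen's d), two-sided alpha of 0.05, and a target power of 0.8.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.8,          # assumed standardized mean difference (Cohen's d)
    alpha=0.05,               # two-sided significance level
    power=0.8,                # desired probability of detecting the effect
    alternative="two-sided",
)
print(f"Minimum sample size per group: {n_per_group:.1f}")  # about 26 per group
```

Because the required sample size scales roughly with the inverse square of the effect size, halving the assumed effect to 0.4 approximately quadruples the number of animals needed, which is one reason why small studies chasing modest effects are so prone to false or inflated findings.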

Even when preclinical studies have been properly designed, the lack of sufficiently thorough and clear reporting can hamper their reproducibility. In this respect, guidelines such as ARRIVE (Animal Research: Reporting of In Vivo Experiments) and reporting checklists (such as the Nature Research Reporting Summary, which is included in every research article published by Nature Biomedical Engineering) prompt authors to provide details about study design, methods, protocols, materials, animal models, data acquisition and processing, and software used.

Still, proper study design and clear and thorough reporting, although necessary for reproducibility and translatability, are not sufficient. Without an adequate understanding of the biological mechanisms underlying the action and performance of a particular medicine or therapy, attempts at reproduction and translation can be hindered. Naturally, gaining sufficient mechanistic understanding requires exploratory research, which is sometimes painstakingly difficult and may require new animal models or in vitro microphysiological models that better recapitulate the biology of human disease.

Good practices in study design, the standardization of methods, protocols, models and reporting, and increased mechanistic understanding can best facilitate the reproducibility of promising medicines and therapies when the data, protocols, and unique materials and disease models are readily shared among researchers via community-recognized databases and repositories. Only when all of these measures are pursued collectively, alongside calls for the funding of reproducibility studies, will the translational pipeline be widened and translational efforts accelerated.