A recent article identified five key technical determinants that make substantial contributions to the outcome of drug R&D projects (Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat. Rev. Drug Discov. 13, 419–431 (2014))1. Careful consideration of such determinants might be particularly valuable in the fields of neurology and psychiatry, in which successful drug development has declined precipitously over the past decade. This decline has largely been fuelled by a high failure rate in the translation of preclinical efficacy findings, caused by multiple factors (see Supplementary information S1 (table)), including limited training and poor protocol design, inadequate animal models, insufficiently validated therapeutic targets and problems with data handling and reporting.

Here, we focus on three factors that can be addressed immediately in order to re-evaluate the therapeutic potential of older drugs and targets and to increase the probability of success for future preclinical-to-clinical translation: data robustness, data generalizability and target engagement data, a factor that was also highlighted in the recent article1. We argue that the many failed clinical trials in neuropsychiatry do not necessarily invalidate the potential of a drug target or an animal model. Rather, these failures indicate a need for improved experimental designs and a robust translational strategy to better inform compound and dose selection for clinical trials. We conclude that many of the drugs and targets in neuropsychiatry that have been discarded because of negative clinical trial outcomes may deserve re-evaluation using contemporary knowledge, methodology and tools.

Robustness

The problem of robustness in preclinical data is best illustrated by an example from research in amyotrophic lateral sclerosis (ALS), a severe progressive neurodegenerative disease. There is currently one approved medication for ALS, riluzole, which has only modest effects on survival. Efficacy has been reported for numerous other drug candidates in a superoxide dismutase 1 (SOD1) mouse model of ALS (one of the common animal models for this disease), but none of these candidates produced an efficacy signal in clinical trials. The ALS Therapy Development Institute later rigorously retested more than 100 of those molecules in the SOD1 model (using adequate statistical power, treatment groups matched for litter and sex, blinding, uniform end point criteria, tracking of non-ALS deaths and quantitative analysis of transgene copy number prior to assigning mice to a study), and they were unable to replicate any of the previously reported preclinical efficacy findings2. In this context, the lack of clinical efficacy is not surprising.

Similar problems related to deficiencies in experimental design (such as inadequate blinding and randomization) had previously been observed in studies with animal models of stroke and multiple sclerosis3. The potential impact of such experimental design problems can be assessed retrospectively for drugs that have already been approved or abandoned, and steps can be taken to improve the robustness of experiments for drugs in development (see Supplementary information S2 (table)).

A hallmark of the scientific method is the replication of findings both within and between laboratories. However, such replication is limited by cost, human resources, time and bioethical considerations. Additionally, “there is an almost irresistible pressure to stop when the result is about what one expects it to be,” according to Terry Quinn4. Yet the more novel the findings of an experiment appear, the less likely they are to be true5, especially in the context of poorly designed and underpowered studies. This problem reflects the pressure to publish novel findings in high-impact journals before a rival laboratory publishes first or funding runs out. The conventional value attached to such publications for career advancement and future funding often outweighs the effort required to rigorously challenge the novel findings; thus, verification in an independent laboratory is unlikely.

As the costs of clinical studies are so much higher than those of preclinical development, one might assume that pharmaceutical companies would conduct robust replications of key findings. This is indeed the case during lead optimization, candidate selection, the testing of different administration routes and the use of primary preclinical disease models. However, it is far less common for studies in the more complex disease models that are used in late stages of development and are potentially more relevant for predicting clinical efficacy.

In general, the rigour with which preclinical data are generated — and the resulting robustness of those data — is low: few studies report randomization, blinding, sample size calculations or attrition.
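To make the cost of omitting sample size calculations concrete, the number of animals needed to detect a given effect can be estimated before a study begins. A minimal sketch using the standard normal-approximation formula for a two-sided, two-group comparison (the effect size d = 0.5 is hypothetical, chosen only for illustration):

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Animals needed per group to detect a standardized effect d
    in a two-sided, two-sample comparison (normal approximation)."""
    z_a = norm.ppf(1 - alpha / 2)  # critical value for the alpha level
    z_b = norm.ppf(power)          # quantile corresponding to the power
    return 2 * (z_a + z_b) ** 2 / d ** 2

# A hypothetical medium effect (Cohen's d = 0.5) already requires
# roughly 63 animals per group for 80% power at alpha = 0.05.
print(round(n_per_group(0.5)))
```

Studies run with far fewer animals per group than such a calculation suggests are underpowered for anything but large effects, which is one reason apparently positive findings fail to replicate.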

Generalizability

Every laboratory has a unique combination of protocols, suppliers of tools and reagents, sources of animals, and animal husbandry characteristics. As drugs are used in highly heterogeneous patient populations, efficacy observed in a single laboratory is more likely to be successfully translated when similar findings can also be obtained under different conditions in other laboratories. There is empirical evidence to support this assumption: the broader the range of circumstances and laboratory environments in which preclinical efficacy can be demonstrated, the higher the likelihood of detecting efficacy signals in clinical studies6. A recent European Union-funded initiative (the Multicentre Preclinical Animal Research Team (MultiPART)) established web-based platforms for multicentre animal studies, and the National Institute on Aging supports an Interventions Testing Program that seeks to validate the efficacy of treatments for ageing across several test sites with adequately powered, rigorous experiments using genetically heterogeneous mice of both sexes.
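When the same intervention has been tested in several laboratories, the per-site effect estimates can be combined while allowing for between-laboratory heterogeneity. One standard approach (not specified in the initiatives above; chosen here only as a sketch, with hypothetical effect sizes and variances) is DerSimonian–Laird random-effects pooling:

```python
import numpy as np

def pooled_effect(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-laboratory
    effect estimates; returns the pooled effect size."""
    effects = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                    # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)         # heterogeneity statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)  # between-lab variance
    w_star = 1.0 / (v + tau2)                      # random-effects weights
    return np.sum(w_star * effects) / np.sum(w_star)

# Hypothetical effect sizes (and variances) from three laboratories;
# the pooled estimate tempers the influence of any single site.
print(round(pooled_effect([0.6, 0.4, 0.1], [0.04, 0.05, 0.06]), 3))
```

A pooled estimate that survives substantial between-site variance is precisely the kind of generalizable signal the multicentre designs above are meant to produce.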

Generalizability of preclinical data is not only a matter of laboratory conditions and of animal strain, age and sex. For example, there is a remarkable paucity of studies employing chronic or subchronic drug administration. Given that even a second dose of most drugs can alter the biological milieu (for example, through tolerance, sensitization or receptor regulation), chronic dosing studies are important for generating the best predictions of effects in patients. It is largely unknown how often a compound's efficacy has been confirmed in properly designed studies with chronic administration in animals before initiation of a clinical trial7. This information is particularly important for indications in which preclinical models do not require repeated administration to detect drug efficacy, whereas the treatment duration in clinical trials can range from weeks to several months, depending on the indication.

Target engagement

In the context of hypothesis-driven drug research, any observation of clinical efficacy is serendipitous if the molecule does not engage its biological targets at the dose tested. Various modelling tools can be used to assess target engagement, and direct target occupancy assays including positron emission tomography (PET) are increasingly available. The aim is to demonstrate that, at relevant doses, the drug is present in the same compartment as the target and in appropriate free concentrations to bind to it. PET studies demonstrate receptor occupancy but not targeted downstream effects; such approaches may not be appropriate for novel drugs working through allosteric or non-competitive molecular mechanisms, and for some targets PET tracers are not yet available.
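The back-of-envelope reasoning behind such target engagement checks is the textbook one-site binding model, in which occupancy depends on the free drug concentration at the target relative to the dissociation constant Kd (the concentrations below are hypothetical, for illustration only):

```python
def fractional_occupancy(c_free_nm, kd_nm):
    """One-site (law of mass action) binding model:
    occupancy = C_free / (C_free + Kd).
    Both arguments must be in the same units (nM here)."""
    return c_free_nm / (c_free_nm + kd_nm)

# Hypothetical: 10 nM free drug at the target against a 5 nM Kd
# gives ~67% occupancy; the same exposure against a 50 nM Kd
# gives only ~17%, likely too low to test the mechanism.
print(fractional_occupancy(10.0, 5.0))
print(fractional_occupancy(10.0, 50.0))
```

If the free concentration in the target compartment cannot plausibly reach a meaningful multiple of Kd at the doses tested, a negative trial says little about the target itself.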

Scientists at Pfizer assessed evidence of exposure at the site of action, target binding and expression of functional pharmacological activity for 44 Phase II programmes across several therapeutic areas8. In 43% of cases, the target mechanism had not been adequately tested owing to a lack of evidence of target engagement. Similarly, AstraZeneca reported that 40% of efficacy failures in Phase II projects could be attributed to a lack of clear target linkage to a disease or validated animal model, and 29% could be attributed to a lack of data establishing tissue exposure1.

Even the use of cerebrospinal fluid concentrations to guide dosing may be misleading for intracellular targets or for compounds that are actively transported across the blood–brain barrier. Similar considerations may apply to antibody therapeutics. Key parameters required for brain penetration have been discussed9, and the failure of AZD8529 — a positive allosteric modulator of metabotropic glutamate receptor 2 (mGluR2) — in a Phase II study in schizophrenia has been attributed, in part, to unreliable target engagement1.

We have used the Thomson Reuters Cortellis database to identify drug development projects in schizophrenia between 1994 and 2014; we could not find evidence of biomarker-driven dose selection for 80% of 72 novel drugs subjected to Phase II clinical proof-of-concept studies (Fig. 1).

Figure 1: Analysis of the use of biomarkers in the development of novel treatments for schizophrenia.

The Thomson Reuters Cortellis database (searched on January 21, 2014) was used to identify drug development projects in schizophrenia between 1994 and 2014. We could not find evidence of biomarker-driven (for example, using positron emission tomography (PET), MRI or electroencephalography (EEG)) dose selection for 80% of 72 novel drugs that were evaluated in Phase II clinical proof-of-concept studies in this time period. 5-HT2A, 5-hydroxytryptamine receptor 2A; 5-HT6, 5-hydroxytryptamine receptor 6; α7, α7 nicotinic receptor; σ, σ receptor; DA D1, dopamine receptor 1; GABA, γ-aminobutyric acid; GlyT1, sodium- and chloride-dependent glycine transporter 1; H3, histamine receptor 3; M1, muscarinic acetylcholine receptor 1; mGlu, metabotropic glutamate receptor; NK3, neurokinin receptor 3; NMDAR, N-methyl-D-aspartate receptor; PDE10, phosphodiesterase 10.


Path forward

Efforts to improve the robustness of preclinical data will lead to better study designs. Strengthened publication policies and open access to data (including negative data) are additional key ways to improve data reliability and transparency. The Preclinical Data Forum was established with support from the European College of Neuropsychopharmacology (ECNP)10 and is developing an online platform that enables scientists to exchange unpublished data in a pre-competitive manner and to share knowledge on the use of tool compounds. This platform should facilitate disclosure of large amounts of pre-competitive information and should be paralleled by the development of consensus approaches to demonstrating data robustness and generalizability.

There may be compounds that have failed in clinical proof-of-concept studies owing to poor target engagement or that might be more appropriate for other diseases sharing similar mechanisms. The ECNP medicines chest provides a list of pharmacological tools no longer under development and can be used to obtain further clinical information on a particular target. Such target revalidation efforts should ideally occur in a pre-competitive space and may involve the development of new business models.

Finally, improved training in appropriate study design should help ensure that preclinical research is conducted to the highest standards, with protocol design informed by statistical and power analyses similar to those now standard for clinical trials.

Summary and conclusions

Herein, we challenge the widely held view that the high failure rate in neuropsychiatry trials invalidates both the drug targets chosen and the preclinical models used. We argue instead that the scientific community should attach greater importance to issues of data robustness, data generalizability and target engagement when designing preclinical studies. We do not seek to divert attention from the fundamental need to strengthen our scientific understanding of disease mechanisms, improve clinical testing strategies, and develop better disease models. Our premise is that an increase in robustness, generalizability and evidence of target engagement will increase the probability of successful translation of preclinical findings into Phase II efficacy. We are optimistic that the adoption of these approaches will enhance our ability to bring improved and much needed medicines to patients.