A woman works in a lab at the Cancer Research Center of Marseille. Funders and publishers need to take replication studies much more seriously than they do at present. Credit: Anne-Christine Poujoulat/AFP/Getty

Replicability — the ability to obtain the same result when an experiment is repeated — is foundational to science. But in many research fields it has proved difficult to achieve. An important and much-anticipated brace of research papers now shows just how complicated, time-consuming and difficult it can be to conduct and interpret replication studies in cancer biology1,2.

Nearly a decade ago, research teams organized by the non-profit Center for Open Science in Charlottesville, Virginia, and Science Exchange, a research-services company based in Palo Alto, California, set out to systematically test whether selected experiments in highly cited papers published in prestigious scientific journals could be replicated. The effort was part of the high-profile Reproducibility Project: Cancer Biology (RPCB) initiative. The researchers assessed experimental outcomes, or ‘effects’, using seven metrics, five of which applied to numerical results. Overall, 46% of the replications were successful by three or more of these metrics, such as whether results fell within the confidence interval predicted by the original experiment or retained statistical significance.

The project was launched in the wake of reports from drug companies that they could not replicate the findings of many cancer-biology papers. But those reports did not identify the papers concerned, nor the criteria used to judge replication. The RPCB was conceived to bring research rigour to such retrospective replication studies.

Initial findings

One of the clearest findings was that the effects of an experimental treatment — such as killing cancer cells or shrinking tumours — were drastically smaller in the replications than had been reported originally: 85% smaller, overall. It’s hard to know why. There could have been a statistical fluke, for example; bias in the original study or in the replication; or a lack of know-how on the replicators’ part that caused the repeated study to miss some essential quality of the original.

The project also took more than five years longer than expected and, even with the extra time, the teams were able to assess only one-quarter of the experiments they had originally planned to cover. This underscores how much more time and effort such assessments demand than is usually anticipated.

The RPCB studies were budgeted to cost US$1.3 million over three years. That was increased to $1.5 million, not including the costs of personnel or project administration.

None of the 53 papers selected contained enough detail for the researchers to repeat the experiments. So the replicators had to contact authors for information, such as how many cells were injected, by what route, or the exact reagent used. Often, these were details that even the authors could not provide because the information had not been recorded or laboratory members had moved on. And one-third of authors either refused requests for more information or did not respond. For 136 of the 193 experimental effects assessed, replicators also had to request a key reagent from the original authors (such as a cell line, plasmid or model organism) because they could not buy it or get it from a repository. Some 69% of the authors were willing to share their reagents.

Openness and precision

Since the reproducibility project began, several efforts have encouraged authors to share more-precise methodological details of their studies. Nature, along with other journals, introduced a reproducibility checklist in 2013. It requires that authors report key experimental data, such as the strain, age and sex of animals used. Authors are also encouraged to deposit their experimental protocols in repositories, so that other researchers can access them.

Furthermore, the ‘Landis 4’ criteria were published in 2012 to promote rigorous animal research. They include requirements for blinding, randomization and statistically justified sample sizes. Registered Reports, an article format in which researchers publish the design of their studies before doing their experiments, is another key development. It means that ‘null effects’ are more likely to be published than buried in a file drawer. The project team found that null effects were more likely to be replicated: 80% of such effects passed by at least three metrics, compared with only 40% of ‘positive’ effects.

Harder to resolve is the fact that what works in one lab might not work in another, possibly because of inherent variation or unrecognized methodological differences. Take the following example: one study tracked whether a certain type of cell contributes to blood supply in tumours3. Tracking these cells required that they express a ‘reporter’ molecule (in this case, green fluorescent protein). But, despite many attempts and tweaks, the replicating team couldn’t make the reporter sufficiently active in the cells to be tracked4, so the replication attempt was stopped.

The RPCB teams vetted replication protocols with the original authors, and also had them peer reviewed. But detailed advance agreement on experimental designs will not necessarily, on its own, account for setbacks encountered when studies are repeated — in some cases, many years after the originals. That is why another approach to replication is used by the US Defense Advanced Research Projects Agency (DARPA). In one DARPA programme, research teams are assigned independent verification teams. The research teams must help to troubleshoot and provide support for the verification teams so that key results can be obtained in another lab even before work is published. This approach is built into programme requirements: 3–8% of funds allocated for research programmes go towards such verification efforts5.

Such studies also show that researchers, research funders and publishers must take replication studies much more seriously. Researchers need to carry out this kind of work, funders must ramp up their investment in it, and publishers, too, must play their part, so that researchers can be confident that the effort is valued. It is laudable that the press conference announcing the project’s results included remarks and praise from the leaders of the US National Academies of Sciences, Engineering, and Medicine and the National Institutes of Health. But the project was funded by a philanthropic investment fund, Arnold Ventures in Houston, Texas.

The entire scientific community must recognize that replication is not for replication’s sake, but to gain an assurance central to the progress of science: that an observation or result is sturdy enough to spur future work. The next wave of replication efforts should be aimed at making this everyday essential easier to achieve.