Vague experimental protocols were one barrier to replication that researchers encountered. Credit: Patrick Hertzog/AFP/Getty

A US$2-million, 8-year attempt to replicate influential preclinical cancer research papers has released its final — and disquieting — results. Fewer than half of the experiments assessed stood up to scrutiny, reports the Reproducibility Project: Cancer Biology (RPCB) team in eLife1,2. The project — one of the most robust reproducibility studies performed so far — documented how hurdles including vague research protocols and uncooperative authors delayed the initiative by five years and halved its scope.

“These results aren’t surprising. And, simultaneously, they’re shocking,” says Brian Nosek, an RPCB investigator and executive director of the Center for Open Science in Charlottesville, Virginia. Although initially planning to repeat 193 experiments from 53 papers, the team ran just 50 experiments from 23 papers.

The low replication rate is “frankly, outrageous”, says Glenn Begley, an oncologist and co-founder of Parthenon Therapeutics in Cambridge, Massachusetts, who was not involved in the study. But it isn’t unexpected, he agrees. In 2012, while at the biotech firm Amgen in Thousand Oaks, California, Begley’s team helped to draw attention to growing evidence of a ‘reproducibility crisis’, the concern that many research findings cannot be replicated. Over the previous decade, his haematology and oncology team had been able to confirm the results of only 6 of the 53 (11%) landmark papers it assessed, despite working alongside the papers’ original authors.

Other analyses have reported low replication rates in drug discovery, neuroscience and psychology.

Double take

The RPCB — a partnership between the Center for Open Science and Science Exchange, a marketplace for research services in Palo Alto, California — launched in 2013. Funded by the philanthropic investment fund Arnold Ventures, headquartered in Houston, Texas, the collaborators set out to systematically reproduce experiments in 53 high-profile papers published during 2010–12 in journals including Nature, Science and Cell.

The project focused on preclinical cancer research because early hints at low reproducibility rates came from this space — animal studies, in particular, seemed difficult to reproduce. By selecting high-impact papers, the team focused on the research that most shapes the field.

The RPCB started publishing its findings in 2017, and these hinted at the messy results to come. The researchers now summarize their overall findings in two papers published on 7 December.

The first of these papers1 catalogues the hurdles the researchers encountered. For every experiment they set their sights on, for example, they needed to contact the authors for advice on experimental design because the original papers lacked data and details. They deemed 26% of authors “extremely helpful”; some of these authors spent months tracking down answers and sharing reagents. But 32% were “not at all helpful”, often ignoring queries altogether.

“Everyone always talks about this problem. But here, we’ve actually got data on how prevalent it is,” says Manoj Lalu, a clinician–researcher who studies data reproducibility at the Ottawa Hospital Research Institute in Canada.

This lack of cooperation, alongside the need to modify or overhaul protocols once experiments were under way, took a toll. On average, the team needed 197 weeks to replicate a study. And as costs added up to $53,000 per experiment — about twice what the team had initially allocated — the project’s budget couldn’t cover its original ambition.

The second study2 delves into the overall results of these experiments in detail. By one analysis, only 46% of the attempted replications confirmed the original findings. And, on average, the researchers observed effect sizes that were 85% smaller than originally reported.

The experiments with the biggest effect sizes were those most likely to be replicated. Animal experiments fared worst, mainly because in vivo experiments tend to yield smaller effect sizes than do in vitro experiments.

Counterclaims

Not everyone is convinced that the study has merit. Pushback came especially from researchers whose findings were not successfully replicated.

“I’m not sure there is much value in these one-shot experiments,” says Erkki Ruoslahti, a cancer biologist at the Sanford Burnham Prebys in La Jolla, California. In 2017, the RPCB team reported that it could not confirm a finding made by Ruoslahti’s team, but Ruoslahti counters that external laboratories have replicated the disputed result at least 20 times. A drug candidate resulting from this work is now in phase II trials. “It’s hard for me to believe that half of all papers out there would not be valid,” he says.

Dean Tang, a cancer biologist at the Roswell Park Comprehensive Cancer Center in Buffalo, New York, is also circumspect. The RPCB reported3 in 2019 that it could not replicate some work from his lab. But, he argues, the replicators deviated from their experimental plan, relied on fewer and different cell lines from those used in the original study, and didn’t double-check their own work. “We believe all published work deserves at least a minimum of 3 attempted replicates before being discounted,” Tang and a colleague wrote in 2019 in response to the project’s findings.

But replication is extremely hard, says Olavo Amaral, a coordinator of the Brazilian Reproducibility Initiative and a neuroscientist at the Federal University of Rio de Janeiro, Brazil. “You can never do it exactly the same,” he says. Does it matter if you shake a tube up and down instead of side to side? How do you account for different baseline readings? Figuring out when and how to stay true to an experimental protocol is part of the emerging science of replication, he says.

Failure to replicate alone is not necessarily cause for concern, says Nosek. Some preliminary findings are distractions, but contradictory follow-up results can lead to deeper scientific insights. The RPCB was not set up to call out or invalidate specific studies, adds Nosek. Replication, like science, is about the total body of evidence. Rather, he says, the goal was to capture a snapshot of the drivers and the magnitude of the reproducibility crisis, with an eye towards system-level solutions.

The real problem is the time, money and effort that are wasted in finding the signals amid the noise, says Tim Errington, the RPCB’s project leader and director of research at the Center for Open Science. “How well are we using our resources? And how are we learning new knowledge? This is the place to keep pushing, across disciplines.”

Culture shift

There is no shortage of proposed fixes: for example, in vitro and animal studies can benefit from blinding, bigger sample sizes, greater statistical rigour and preregistration of study plans. Papers should make fewer claims and provide more proof, researchers suggest. Data sharing and reporting requirements need to be baked into scientific processes.

But stakeholders also need to address the incentives and research cultures that stand in the way of replication, says Nosek. Researchers who have published high-profile papers have little to gain from participating in confirmatory analyses, he points out, and much to lose. Replication attempts are often seen as threats rather than as compliments or opportunities for progress, he says. “That kind of culture does not help this ethos of self-correction. We are really about changing the entire research culture,” says Nosek.

There is also currently little support for the researchers who show that something doesn’t work, or who focus on the causes of variability between labs, says Lalu. “Hopefully, this will provide some people sober second thought about how we’re going to approach this moving forward.”

Begley sees evidence that attitudes are already changing. “When I first presented my findings, people got very hostile. Now, people accept that there’s a problem, and ask about what needs to happen to change this.”