The two of us have spent years coordinating replications of published studies. The most consistent outcomes are confusion and disagreement, particularly when outcomes seem to contradict original findings.
We saw this in the Reproducibility Project: Cancer Biology, in which we managed attempts to replicate experiments from high-impact papers [1]. Among the 50 replication experiments completed (from 23 papers), one required transplanting leukaemia cells into immunocompromised mice and letting the cells grow before administering a potential treatment. Neither our team, the reviewers nor the original authors thought that the immunosuppression technique we proposed, which differed from that in the original study, would affect the key question of whether the treatment lengthened survival [2]. Until we did the replication and observed no difference in survival [3]. Then, the reviewers said that this difference was crucial.
Similar scenarios play out in other disciplines. When one of us (B.A.N.) worked as a guest editor for the journal Social Psychology, a reviewer objected to the setting of a replication study selected to test the hypothesis that Asian women perform better on a mathematics test when reminded of their Asian identity. A second replication study was then commissioned in a setting that met the reviewer’s criteria. The replication failed in the second, commissioned setting but mostly succeeded in the original setting that had been deemed inappropriate by the reviewer [4,5]. After learning the results, the reviewer stated that the second study was obviously flawed.
Failure to replicate often brings intellectual gridlock. Some researchers insist that a replication refutes the original paper’s ideas; others find flaws in the reproduced work. Both replicators and original authors defend their conclusions — or at least their competence — rather than getting on with the difficult, intellectual work of using new evidence to revise ideas. Human nature and the academic incentive system make it hard to do otherwise.
How can researchers avoid such stalemates? We need to spend more time early on resolving what is to be tested, the crucial features for doing so and the insight we expect. We need a process that appeals to our better natures, or at least requires that we reveal our lesser selves. The approach should favour seeking an accurate answer over defending previous results.
We call it precommitment. After a paper is made public, but before it is replicated, the original authors and independent replicators collaborate to design a replication experiment that both agree will be meaningful, whatever the results. This process will be documented using preregistration or, ideally, a Registered Report (see ‘Routes to replication’).
Over the past decade or so, philanthropists, government funders and journal editors have started to devote policies, programmes and money to replication research. Methodologists are working out how replication can sift promising paths from dead ends.
But we still lack the tools to make the most of replications. Too often, they are seen as a hostile act rather than an ordinary and desirable part of the scientific process. Precommitment will make replications more informative. It will favour collaboration over confrontation and promote scientific humility. It will focus energy on improving the quality of replications, on maximizing what can be learned from them, and on reducing the tendency to assess replications depending on whether they fit previous beliefs.
Replication depends on creating the conditions necessary to repeat a finding. Even if two studies use the same protocol, they will still differ in innumerable ways: time of year, climate, samples used and identities of the experimenters, to name a few. Conducting a replication demands a theoretical commitment to the features that matter [6].
It is difficult to make that commitment after results are known. Ask whether temperature (or age or light or language or any other variable) matters before a replication is done, and many researchers will acknowledge that they don’t know or that they didn’t even realize that condition had been held constant in the original experiments. Ask the same question after a replication has failed, and hindsight bias supplies a different answer. “Of course, temperature [or age or light or language or any other variable] matters.” A replication study cannot test current understanding if any outcome can be accommodated by adding assumptions after the fact.
Worth a gamble
What, then, constitutes a theoretical commitment? Here’s an idea from economists: a theoretical commitment is something you’re willing to bet on.
If researchers are willing to bet on a replication with wide variation in experimental details, that indicates their confidence that a phenomenon is generalizable and robust. Those who will only precommit after narrowing the conditions shrink the phenomenon to match their confidence.
Let’s say a study reports that regular exercise improves memory. Proponents might insist on reusing methods from the original study, such as limiting the definition of ‘regular exercise’ to running, not cycling. They might also insist on narrower conditions than were originally specified — perhaps that the memory test be administered only at night, among adults under 35 and in the United Kingdom. This insistence suggests that the proponents lack full confidence in the general claim that regular exercise improves memory, and actually believe a much narrower claim. If they cannot suggest any design that they would bet on, perhaps they don’t even believe that the original finding is replicable.
A good-faith replication study matches its design to the original claims. If claims specify location, the replication must take location into account. If the original claims ignore or dismiss age, then the replication need not consider it. Designing a replication study with input from proponents, sceptics and neutral bystanders can clarify the boundaries of claims, especially ones that went unspecified.
We have witnessed too many debates that stalled because proponents and sceptics misunderstood or talked past one another. To fix this, we need an efficient process to manage replication designs a priori. It must produce claims that are sufficiently well articulated to be tested, and needs to cope with the distrust that arises between people who disagree.
The most informative replication will occur when proponents and sceptics each endorse the research design but predict different results. Examples of adversarial collaboration illustrate both the difficulty and the potential of this approach (see ‘Collaborative adversaries’). For instance, leading neuroscientists with different perspectives on consciousness came together in a project initiated by the Templeton World Charity Foundation, based in Nassau in the Bahamas, to design experiments for which their theories demanded different outcomes. People got worked up. There was shouting. But, after two days, experiments were proposed. The results, which should be available later this year, won’t settle the debates about consciousness, but they should advance understanding (see go.nature.com/3gqou5u).
Make it happen
Seven years ago, we conducted an exercise that convinced us that the principle of precommitment could work. Now, the basic infrastructure to implement precommitment is widely available: Registered Reports. In this system, authors, reviewers and editors evaluate a study before it is performed. Assuming the research question is important and the methodology is of high quality, that work is accepted for publication before results are known [7].
Our proof-of-concept exercise encompassed 15 replication papers published as a special issue of Social Psychology [8]. Teams proposed replication studies of important findings in the field. Original authors and other expert reviewers critiqued the proposed methodology. Although exchanges were occasionally tense and confrontational, these ‘adversarial teams’ and journal editors worked on a shared goal: designing a methodology that would make the replication results meaningful. That still didn’t eliminate all controversy — far from it. Indeed, after publication, one paper spurred what came to be called ‘repligate’ involving name-calling, competing reanalyses and reflections on civility (see go.nature.com/3ftemmf).
The other papers better illustrate the desired outcome: proponents and sceptics observed the findings, debated their meaning and offered alternative explanations. Because the methodology and analyses had already been agreed, alternative explanations for the replication results were framed appropriately as areas for follow-up research, rather than as a necessary component for a valid experiment.
For example, one paper that failed to replicate findings that superstitions can improve performance discussed the possibility that the original finding was a false positive or that the particular type of task or belief could account for the differences [9]. Crucially, these potential moderating influences were described as hypotheses for future study, not as explanations for unexpected outcomes. In many ways, the process felt like the ideal of how we imagined science to operate when we entered the field.
The Registered Reports format is now offered by more than 250 journals. Nature Communications became one of the latest of these earlier this month. Funders such as the US-based Flu Lab and the Children’s Tumor Foundation in New York City have each partnered with the scientific publisher PLOS to fund Registered Reports of important findings in their fields. These journals (and many others) archive accepted Registered Reports on a platform (http://osf.io/rr) that is maintained by the non-profit Center for Open Science, where we work. Our centre can support journals that implement a precommitment process combined with Registered Reports.
Researchers see findings as personal possessions, so a replication puts them at risk of losing something and amplifies their desire to be right. Precommitment offers an opportunity for everyone, proponent or sceptic, to shift away from that unproductive framework and pursue a shared goal of getting it right. Original authors can be rewarded for generating clear and specific testable claims, and for transparently reporting how to test them.
We think that the visibility of precommitments will be sufficient to shift incentives. After all, which scientist would you admire more — one who never agrees with independent tests of their findings, or one who willingly precommits and revises beliefs when new results suggest they were wrong? We have evidence that researchers and the public prefer the latter [10]. Critics will counter that some experiments are inherently messy or require arcane techniques, or that replicators’ energies would be better spent on original ideas. We argue that testing existing claims to improve understanding is essential for progress.
Eventually, precommitment should become an expectation. Whether or not the results come out as proponents or sceptics expected, knowledge comes out ahead.
Nature 583, 518-520 (2020)