Earlier this month, a large-scale replication project underscored just how hard it is to repeat results in published papers. Over my three decades running a basic-biology laboratory, I have experienced the same. One of my biggest frustrations as a scientist is that it is so hard to know which exciting results are sturdy enough to build on. As I mentor early-career scientists, I try to sharpen their skill at spotting work that is unlikely to be replicable: published papers with oddly cropped images, say, or protocols that mention no replicates. Still, my students have cumulatively wasted decades pursuing results that were impossible to confirm. The virtuous cycle of progress is broken.

There have been countless moves to ameliorate this problem: better reporting, better career incentives, separating exploratory from confirmatory work and developing infrastructure for large, collaborative confirmatory experiments (O. B. Amaral and K. Neves Nature 597, 329–331; 2021).

As the year comes to a close, it’s natural to consider how to improve in future. One step would be to explicitly restructure scientific publications to fulfil their functions as building blocks of knowledge. Past suggestions include requiring authors to include statements of generalizability or a numerical confidence level. Here I propose two new strategies.

First, every published study should articulate specific testable conclusions.

In my field — cancer biology — an overall conclusion might be that enzyme Y regulates cell migration in cancer. This could be built from a series of experimental results, each laid out in a quantitative way, with the relevant metrics. It’s easy to imagine a series like this: (a) compound X inhibits Y in vitro, with a Ki (a biochemical measure of inhibition) of 300 nM; (b) compound X inhibits Y in cells with an IC50 (concentration giving 50% inhibition) of 1 µM; (c) compound X inhibits cell migration with an IC50 of 1 µM; (d) deletion of the gene encoding Y inhibits cell migration by >50%.

Each statement constitutes a ‘testable unit’. These units can be assembled at the end of the article into a ‘compendium of discrete authenticable observations’, or CODA-O. Authors could also append a section listing testable units from other groups’ work that they validated in the course of their own.
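To make the idea concrete, here is one minimal sketch, in Python, of how a testable unit and a CODA-O might be represented as structured data. The schema and field names are my own illustration, not part of the proposal; the example values are taken from the enzyme-Y series above.

```python
from dataclasses import dataclass, field

@dataclass
class TestableUnit:
    """One discrete, authenticable observation (hypothetical schema)."""
    claim: str                 # plain-language statement of the result
    metric: str                # quantitative measure, e.g. "Ki" or "IC50"
    value: float
    units: str
    conditions: dict = field(default_factory=dict)  # e.g. cell line, assay

# A CODA-O is simply the list of such units appended to a paper.
coda_o = [
    TestableUnit("Compound X inhibits enzyme Y in vitro",
                 metric="Ki", value=300, units="nM"),
    TestableUnit("Compound X inhibits enzyme Y in cells",
                 metric="IC50", value=1, units="µM",
                 conditions={"cell_line": "one cancer cell line"}),
]

# A registry built from such records could be filtered by metric or
# condition when selecting claims for a replication attempt.
ic50_units = [u for u in coda_o if u.metric == "IC50"]
```

Recording the conditions alongside each claim is what would let a registry match a replication attempt to the exact experimental context the authors vouched for.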

The main goal is for authors to articulate and ‘own’ explicit testable statements by expressing extremely high confidence in them. That should, in turn, prompt them to state clearly the experimental conditions required, for instance whether results were obtained in one cell line or in several. It would also clarify how the work might be extended, say by testing a result in other cell types.

Second, I propose that scientists extract these discrete observations from the literature and compile them in a registry that can seed experiments to be conducted in undergraduate or beginning graduate lab courses.

My work focuses on biologically active lipids, their roles in cell signalling pathways and how those pathways go awry in cancer. Besides working with many postdocs and graduate students, I’ve trained and mentored about 150 undergraduate and high-school students. Assigning a team of these trainees a claim from the registry to reproduce would be meaningful to them. Such a training programme would produce researchers skilled in troubleshooting and results that contribute to science. An approach with similar goals has been implemented in the field of psychology (K. Button Nature 561, 287; 2018): graduate students craft a basic protocol and supervise groups of undergraduates running that protocol.

A registry of claims for replication would bring multiple benefits. First, requiring authors to specify which components of an experiment should be replicable would reduce the temptation to overstate and overgeneralize results. It might also head off some of the bickering that occurs when one group says it cannot reproduce another’s work. Second, researchers might describe their experiments more completely if others are likely to attempt formal replications. Third, trainees would learn to appreciate the experimental nature of the biological sciences (too often, undergraduates are taught that all that matters is the existing body of knowledge, not how it is built). Fourth, the registry could let others check whether claims in a paper have been reproduced, and perhaps uncover subtle requirements for making an experiment work. Graduate students and trainees updating the registry would gain practical insight into what makes work reproducible.

When too many data in the system are irreproducible, the noise becomes overwhelming — ‘garbage in, garbage out’. However, with reliable building blocks of information, synthesis and bioinformatics techniques will be much more productive and predictive.

This idea has limitations. Some techniques are too specialized, and some experiments too expensive or resource-intensive, to be replicated. Even so, a registry would still be useful: it would articulate the specific units of the work, and might prompt a focused replication effort if the results are deemed crucial to the community. It would also establish precedence. Simply requiring claims to be registered in a testable form would prompt researchers to report which results should be replicable, and under which conditions. That alone would spare my lab and others much fruitless work.