Main

After more than a decade of meta-research and debate, the life and social sciences are well in the midst of a credibility revolution1,2,3. Faced with evidence of publication bias4,5,6,7, hindsight bias and selective reporting8,9,10,11,12,13,14,15, insufficient sample sizes16, inadequate data sharing17 and suboptimal rates of both attempted18,19 and successful replication20,21,22, researchers from across a broad range of fields are unifying around a core mission to improve reproducibility and transparency. In doing so, they serve the deeper aim of the openness agenda: to stimulate cultural reform, aligning what is beneficial for individual scientists with what is beneficial for science23.

As scientists and policymakers grapple with the causes of irreproducibility, it has become clear that one of its main drivers is the so-called ‘results paradox’. On the one hand, scientists are taught from their earliest years that the one part of the research process that they must keep at arm’s length is the results of their research. The objective investigator—the detective—follows the data with discipline and restraint, never pressuring it to bend to their will, lest they fall prey to Richard Feynman’s famous warning that “the first principle is that you must not fool yourself and you are the easiest person to fool” (p. 12)24. On the other hand, the very same researcher is sent a message from prestigious journals, funding agencies and evaluation committees that if you want to succeed in science, be sure to publish a lot of clear, novel, positive findings. Researchers are therefore presented with conflicting goals: be a good detective who never indulges in data massaging or cherry picking, but also be a good lawyer who wins arguments and produces a continual supply of beautiful results25.

Many observed problems with reproducibility stem from researchers attempting to resolve this paradox while protecting their careers. When publishing in prestigious journals requires confirming one’s hypotheses but the results defy expectations, researchers can resolve this conflict by following the advice of Bem26,27 and rewriting their hypotheses to ‘predict’ those results—a form of hindsight bias known as “hypothesizing after results are known” (HARKing)8. When academic leaders tell authors to “go with strongest studies” because “weak data dilute strong data” (p. 79)28 and that “what you don’t have to do is tell the whole truth ... you can select the results you present”29, the responsive researcher answers by reporting the analyses that tell the best story, diverting negative or inconvenient results to the file drawer or converting them into publishable (probably false) positives. And when journal editors tell researchers that, all else being equal, some results are simply more deserving of publication than others, the strategic researcher responds by conducting a large number of small studies and reporting only the most persuasive findings (even if unreliable) rather than gambling on the outcome of larger, more definitive projects that may yield inconclusive data30.

Registered Reports (RRs) were proposed in 2012 as a way to free researchers from the pressure to engage in these counterproductive practices, thereby breaking the cycle that perpetuates bias and irreproducibility (Fig. 1). The RR model originates from the simple philosophy that to defeat the distorting effects of outcome bias on science, we must focus on the process and blind the evaluation of science to research outcomes31. This blinding is achieved by splitting peer review into two stages. In the first stage, authors submit their research question(s), theory, hypotheses, detailed methods, analysis plans and any preliminary data as needed. Following detailed review and revision—usually according to specific criteria—proposals that are favourably assessed receive in principle acceptance (IPA), which commits the journal to publishing the final paper regardless of whether the hypotheses are supported, provided that the authors adhere to their approved protocol and interpret the results in line with the evidence. Following IPA, authors then typically register their approved protocol in a repository, either publicly or under a temporary embargo. Then, after completing the research, they submit a stage 2 manuscript that includes the approved protocol plus the results and discussion, which may include clearly labelled post hoc analyses in addition to the preregistered outcomes (that is, findings from both confirmatory and exploratory analyses). The reviewers from stage 1 and/or newly invited reviewers then assess the completed stage 2 manuscript, focusing on compliance with the protocol and whether the conclusions are justified by the evidence. Crucially, reviewers do not relitigate the theory, hypotheses or methods, thereby preventing knowledge of the results from influencing recommendations. RR guidelines specify that editors similarly cannot reject a manuscript on the basis of any new concerns about the methodology or rationale or on the basis of the results themselves.

Fig. 1: The basics of RRs.

a, The typical RR workflow involves pre-study review of the study rationale, design and proposed analyses, including preliminary data as needed to provide proof of concept, effect size estimation for a sampling plan or hypothesis generation. Following IPA, authors conduct the research before submitting a complete manuscript with the results. The original reviewers then return at stage 2 to assess compliance with the protocol and to ensure that the conclusions are appropriately evidence-based. b, The distinction between RRs, Registered Replication Reports and study preregistration is a common source of confusion. Aside from the minority of RR formats that do not require public preregistration (see the section “Limitations and drawbacks”), RRs are mostly a subset of the wider family of preregistration methods but with the additional features of pre-study review and IPA regardless of results. Registered Replication Reports, offered by one psychology journal, are a subset of RRs. Panel a reproduced with permission from https://cos.io/rr/.

The aim of this modified review process is to reduce as much as possible the potential for biased research practices such as HARKing and selective reporting, while also eliminating the incentive for researchers to employ such practices in the first place. RRs are also designed to mitigate publication bias by journals and outcome bias by reviewers, since the decision to accept or reject is made before results are known. Finally, the format is designed to clearly distinguish the outcomes of pre-planned confirmatory research from exploratory data analysis.

In this Review, we take stock of the RR initiative, consider its recent history and historical underpinnings, emerging variants, impacts and limitations, and the probable future of the format into the 2020s and beyond. We also offer guidance to authors, reviewers and editors who are becoming familiar with RRs.

Past

RRs as they exist today were first proposed in 2012 independently and simultaneously at two journals: Cortex and Perspectives on Psychological Science32,33. The format was then formally offered at these journals and at Social Psychology in 2013 (refs. 34,35,36). These first steps precipitated a gradual rise in adoptions, with over 300 journals across a range of disciplines now offering RRs as a new article type (Fig. 2). Early launches triggered the rise of RRs into mainstream publishing, but the origins of the format, and of preregistration in general, are much older. As early as 1878, chemist and logician Charles Peirce laid the foundations for the preregistration of protocols, writing that “[t]he hypothesis should be distinctly put as a question, before making the observations which are to test its truth” (p. 476)37. In the mid-twentieth century, psychologist Adriaan de Groot further argued that distinguishing exploratory from confirmatory research was vital for scientific progress, and that “it is a serious offense against the social ethics of science to pass off an exploration as a genuine testing procedure” (see Wagenmakers et al.38 for a detailed historical overview). Embedded in the arguments of Peirce, de Groot and many others is the maxim that prespecifying predictions and analyses is an important tool for preventing confirmation bias in hypothesis testing.

Fig. 2: Key milestones in the evolution of RRs.

RRs in their current form were first introduced in 2012 and Cortex was the first journal to officially offer RRs as an article type, in 2013. The first RRs were published in the journals Social Psychology (Soc. Psychol.) and Perspectives on Psychological Science (Perspect. Psychol. Sci.). The first journal to be exclusively dedicated to RRs, Comprehensive Results in Social Psychology (CRSP), was also launched in 2014. In 2015, Cortex published its first RR110, and Royal Society Open Science (RSOS) became the first multidisciplinary journal covering all STEM fields to offer RRs111. Also in 2015, nine political science journals launched a joint RR project for the 2016 American National Election Studies survey (https://electionstudies.org/data-center/2016-time-series-study/), marking the first application of RRs in political science. The number of RR-adopting journals further increased in 2017, which was a key year on several fronts. As part of the Reproducibility Project: Cancer Biology, eLife published the first of many RRs112, and the first RR format for clinical trials was launched by BMC Medicine. The first RR in the field of computer science was also published in RSOS113, and the format was introduced for the first time in a specialist ecology journal (BMC Ecology). In the same year, Nature Human Behaviour launched RRs114, and F1000Research and Meta-Psychology paved the way for the post-publication peer-review model for RRs. The first RR funder/journal partnership was also announced in 2017 (ref. 90). By the end of 2018, the number of adopting journals had risen to 150, and the 100th stage 2 RR was published across all journals. This increase in the number of adopters paralleled a major disciplinary expansion, with the format being applied to preclinical science (BMJ Open Science), economics (Journal of Development Economics), empirical accounting (Journal of Accounting Research), animal neurophysiology (European Journal of Neuroscience), cancer research (Cancer Medicine), immunology, endocrinology, gastroenterology, herpetology and agricultural/soil sciences. In 2018, the British Psychological Society became the first society to launch RRs concurrently across all of its journals105. In 2019, PLoS Biology became the 200th adopter of RRs, Nature Human Behaviour published its first two RRs and the format was launched for the first time in the field of veterinary science (Equine Veterinary Journal). In 2020, RSOS and 11 journals launched the COVID-19 RR rapid review network82. As part of this ongoing initiative, participating journals strive to review stage 1 RRs related to COVID-19 within 7 days and to commit to open access publication with no article processing charges. As a result of this initiative, the past year also marked the first published RR in viral bioinformatics85.

Perhaps the earliest proposal for an RR-type review process, in which journal editors reach editorial decisions based on pre-study or results-blind review, was advanced by psychologist Robert Rosenthal who wrote in 1966: “What we may need is a system for evaluating research based only on the procedures employed. If the procedures are judged appropriate, sensible, and sufficiently rigorous to permit conclusions from the results, the research cannot then be judged inconclusive on the basis of the results and rejected by the referees or editors” (p. 36)39. Similar ideas were proposed throughout the 1970s and 1980s40,41,42,43,44 but were not widely implemented. Yet, remarkably and unknown to mainstream science, by 1976, the first RR format had already been launched, albeit in the fringe discipline of parapsychology. For 17 years, the European Journal of Parapsychology quietly published RRs alongside regular articles before discontinuing them in 1992 (ref. 45).

While non-clinical researchers were debating the potential merits of results-blind review, medical researchers were busy weighing up the costs and benefits of public preregistration to address publication bias, particularly in the context of clinical trials. With the US Food and Drug Administration Modernization Act of 1997 came the first law requiring trial (pre)registration, which in turn led to the launch of the ClinicalTrials.gov registry in 2000. By 2005, the International Committee of Medical Journal Editors was requiring trial registration as a condition of journal publication. An increasing number of journals began offering protocol article types (and in some cases, entire journals were dedicated to protocols), with some performing pre-study review of the protocols (for example, Trials, and The Lancet’s since-abandoned protocol review format46).

Crucially, none of these initiatives, article types or journals provided IPA regardless of the results. Thus, despite decades of debate about preregistration, pre-study review and results-blind acceptance within isolated channels, it would take until the launch of RRs at Cortex and Perspectives on Psychological Science in 2013 for the combined model to take hold. From there, the next 2 years witnessed a gradual increase in the number of psychology journals adopting RRs, followed by the first general science, technology, engineering and mathematics (STEM) journal in 2016 (Royal Society Open Science; Fig. 2). After a series of major launches (for example, Nature Human Behaviour, BMC Medicine and PLoS Biology) and broader disciplinary expansion throughout 2017–2018, the RR format permanently entered the mainstream.

Present

At the time of writing, RRs are offered by over 300 journals, with 591 stage 2 articles so far published by 94 adopting outlets. As the format becomes more widely available and the published corpus grows, the first signs of impact are emerging. In this section, we review some of the early signs of its effectiveness. We also introduce recent variants of the model, summarize the key ingredients that make a high-quality RR and address some major misconceptions (Table 1) and genuine limitations that have emerged, before offering specific recommendations to authors (Box 1), reviewers (Box 2) and editors (Box 3).

Table 1 Misconceptions and realities of RRs

Field spread and author demographics

Since their initial launch within psychology and neuroscience, RRs have spread to specialist journals covering a range of disciplines, primarily in the life and social sciences (Supplementary Fig. 1). As this reach has grown, we can begin to explore the demographics of submitting authors to gauge the accessibility of the format. The prospect of being pre-accepted at a respectable journal is likely to be appealing to many researchers—and perhaps early-career researchers (ECRs) in particular—seeking to eliminate the risk that the results of their research will determine publication and, consequently, their career prospects. However, the often substantial sample sizes needed for RRs to achieve minimum levels of statistical power required by many journals, combined with the time taken for stage 1 review (see the section “Limitations and drawbacks“), could act as a deterrent, especially for researchers with major resource constraints. To provide a preliminary insight into accessibility, we analysed the author demographics of 141 stage 1 RRs submitted to Cortex, the European Journal of Neuroscience, NeuroImage and Royal Society Open Science. We found that 77% of submitted stage 1 manuscripts were first-authored by PhD students or postdoctoral researchers (Supplementary Fig. 2a). At the journal Cortex, where a direct comparison between different article types was possible, we found that 78% of submitted RRs were led by ECRs compared with 67% in a comparison sample of regular articles (Supplementary Fig. 2b,c). It would be premature to conclude that RRs present no barriers for researchers47, but these results at least provide no reason to fear that RRs are beyond the reach of ECRs.

Early impacts

Are RRs working as intended to reduce bias and improve reliability? Although the initiative is too young to answer this question with confidence, metascientific investigations are beginning to reveal early evidence concerning bias control, study quality, computational reproducibility and citation influence.

Bias control

Since reporting and publication biases typically favour positive results, RRs, if successful, should yield a greater proportion of negative results compared with the conventional literature. So far, this prediction appears to hold: a recent analysis of 296 hypotheses published across 127 RRs in different fields found that 60% of RRs report null results, which is approximately five times greater than the rate in regular articles48. In psychology, this difference is even more striking, with a new study49 finding that just 4% of regular articles failed to confirm the first hypothesis compared with 56% for RRs4 (see also Wiseman et al.45). It would be tempting to conclude that this increase is caused by the elimination of selective reporting, HARKing and publication bias resulting from pre-study review and IPA, which are the key ingredients of the RR process. However, it is possible that authors, knowing that their study will be published regardless of whether their hypotheses are supported, might employ the RR format to test riskier hypotheses. Moreover, RRs themselves might select for authors who are diligent in controlling their own reporting bias, regardless of the article type. To address such confounds, future observational studies could compare the plausibility of hypotheses in RRs with that in non-RRs, as well as indicators of biased reporting in RRs and non-RRs within the same sets of authors (see the section “Future”).

Computational reproducibility

There are several reasons why RRs might be more computationally reproducible than conventional articles. At many journals, the RR review policy has more stringent expectations concerning open data and code, which is associated with greater accuracy in statistical reporting50. In addition, IPA eliminates the incentive for authors to conceal messy or inconvenient elements of their data, and early adopters of the format may also be predisposed to performing research to a higher level of transparency. A recent study51 indeed suggests that the results of RRs can be more readily reproduced from the acquired data compared with regular articles. Of the 35 RRs published in psychology that made data and code openly available, 57% were computationally reproducible compared with 31% in a previous analysis of regular articles52. Although RRs appear to perform better than the status quo, these results clearly show room for improvement and will require substantially more data to be confirmed.

Citation profile

Clinical trials reporting negative results receive between two and ten times fewer citations than trials reporting positive results53,54. Given the increased rate of negative results in RRs, authors may therefore be concerned that submitting an RR could be disadvantageous to their careers. Similarly, one of the immediate reactions to RRs from many journal editors is that the format could risk reducing their outlet’s impact factor, a powerful albeit spurious measure of research influence55,56,57,58. In fact, such concerns may be unwarranted; a recent analysis59 of 70 RRs, reported in a non-peer-reviewed preprint, found that RRs are cited at the same rate as, or slightly more often than, comparable regular articles.

Study quality

How do expert assessments of RRs compare with non-RRs? In a recent study60, Soderberg et al. reported an experiment in which 353 scientists rated a sample of published, partially blinded RRs and non-RRs on 19 study characteristics, including importance, novelty, creativity, innovation and rigour. RRs numerically outperformed non-RRs on every criterion, showing statistically robust and large improvements in attributes such as methodological rigour and overall article quality, while being statistically indistinguishable from comparison papers in terms of features such as novelty and creativity. These results held even among reviewers who admitted being sceptical or neutral about RRs.

Emerging variants

As the reach of RRs has grown, several modified versions of the format have arisen to accommodate specific needs. Five major strands have emerged: results-blind review, accountable replication policies, RRs involving post-publication peer review, publisher-level RRs and publisher-independent RRs. These variants are briefly summarized below and discussed in more detail in the Supplementary Note.

Results-blind review

In this modified workflow, stage 1 peer review is undertaken after results are known to the authors but before they are known to the reviewers and editors61. Following IPA, the authors then submit the full manuscript containing the data and conclusions. Because authors need not wait until IPA to conduct their research, this format prevents the stage 1 review time from delaying data collection and analysis. However, reviewers are unable to improve the study design, and the format does not prevent reporting bias (for example, p-hacking or HARKing) by authors. To date, at least 13 journals, primarily in psychology and management, have adopted results-blind review as an optional article track. Two journals, Cortex and Infant and Child Development, have also launched Verification Reports, a results-blind format dedicated to assessing the computational reproducibility and robustness of previous findings based on a re-analysis of the original study data.

Accountable replications

Conceived by psychologist Sanjay Srivastava62, this variant emerged from the principle that when a journal publishes a research finding, it should commit to publishing all methodologically sound replications of that finding regardless of how the results turn out and regardless of subjective importance or methodological flaws in the original study. Using a modified set of the RR assessment criteria, the journal reaches a stage 1 IPA decision on the basis of technical validity and the methodological proximity between the replication and target study (Supplementary Fig. 3). To date, Royal Society Open Science is the only journal that implements a complete and fully specified version of this concept, following partial implementations at Clinical Psychological Science, the Journal of Research in Personality and Psychological Science63,64,65,66,67.

Post-publication peer review RRs

RRs usually rely on conventional pre-publication review in which reviewers serve as gatekeepers to IPA and stage 2 acceptance. In contrast, formats that combine post-publication peer review with RRs publish the stage 1 manuscript almost immediately following initial receipt and then review it openly68,69. If the reviews are positive (with authors having the usual opportunity to revise the protocol), then the article is awarded IPA and, once it passes stage 2 review, the final manuscript with results is badged as an RR. To date, this model has been adopted across ten journals, including F1000Research and Wellcome Open Research.

Publisher-level RRs

The review process for RRs is typically managed by a single journal, but recently, some publishers have begun implementing a distributed model in which stage 1 and stage 2 manuscripts can be reviewed and published in different journals under the same publisher. In one working model, the completed stage 2 RR is then cross-linked to the accepted stage 1 protocol using an international RR identifier49.

Publisher-independent RRs

Can RRs exist beyond journals? As part of the recently created Peer Community in Registered Reports (PCI RR), stage 1 and stage 2 preprints are reviewed independently of journals (https://rr.peercommunityin.org). Where the reviews are positive, PCI RR issues a positive recommendation and authors can then choose to publish their recommended preprint in any ‘PCI RR-friendly’ journal without further peer review.

Seven virtues of high-quality RRs

What makes a good RR? In this section, we describe seven desirable characteristics that authors should aim to capture in their stage 1 and stage 2 manuscripts. Further guidance may be found in Box 1, in the RR policies for specific journals (see the list at https://cos.io/rr/) and in a recent practical primer by Kiyonaga and Scimeca70.

The first and foremost ingredient is that the proposal tackles a scientifically valid question, ideally one that other scientists agree is important to answer. The introduction section of the stage 1 manuscript should make clear the underlying theory or application from which the question arises, leaving the reader in no doubt as to why the study is being proposed. Second, where the study proposes hypotheses, they should be stated as precisely as possible in terms of specific variables to ensure falsifiability. In quantitative hypothesis-driven sciences, we recommend that researchers consider the open-theory pathway proposed by Guest and Martin71 to help ensure that hypotheses are formulated as a natural specification of computational theory rather than emerging loosely—and often with questionable rationale—from a vague conceptual framework. Some of the most effective RRs achieve this by identifying pressure points in competing theories and then devising hypotheses to adjudicate between them.

With the theory, rationale and hypotheses in place, the third key ingredient is a study procedure and analysis plan that is as rigorous, transparent and comprehensive as possible. Data acquisition protocols and analysis plans should be prespecified with sufficient detail to be reproduced by experts in the field, ideally with accompanying code, and with rigorous experimental controls where appropriate, including both negative and positive controls. Where the conclusions will depend on inferential statistics, the procedure should include a detailed sampling plan, such as a statistical power analysis, Bayes factor design analysis72 or an appropriate alternative, which, crucially, should also make clear the specific hypothesis it interrogates and the rationale for deciding the sensitivity of each statistical test (such as justification of the target effect size or Bayesian prior). When planning the analyses, it is vital to choose the right tools for the job, including assumption checks, detailed consideration of data preprocessing and filtering, and planned contingencies for any data-driven analysis decisions. Where these contingencies would be too numerous (or even impossible) to specify in advance, the inclusion of pilot data at stage 1 can be used to verify assumptions and narrow the range of possibilities. Alternatively, authors can embrace uncertainty and use blinded analysis methods to control risk of bias.
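
To make the sampling plan element concrete, the following is a minimal Python sketch (using the statsmodels library) of the kind of a priori power analysis that might accompany a stage 1 protocol for a hypothetical two-group design. All numerical values are illustrative assumptions rather than universal journal requirements.

```python
# Minimal sketch of an a priori power analysis for a hypothetical stage 1
# sampling plan. All numbers are illustrative; a real RR would justify the
# smallest effect size of interest in the protocol itself.
from math import ceil
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.4,          # smallest effect size of interest (Cohen's d)
    alpha=0.05,               # prespecified significance threshold
    power=0.9,                # many RR policies expect at least 90% power
    alternative='two-sided',  # matches the prespecified two-tailed test
)
print(f"Required sample size per group: {ceil(n_per_group)}")
```

A Bayes factor design analysis would follow the same logic, replacing the fixed power target with a desired probability of obtaining compelling evidence for or against the hypothesis.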

These three features are the essential building blocks for the fourth key ingredient: a seamless link between the research question, theory and its specification, the hypotheses, sampling plan and contingent interpretation given different outcomes (Box 1). A stage 1 RR can be thought of as a preparatory chain of inference, leading from the why to the what and how, which, as with any chain, is only as strong as its weakest link. Many RRs now include study design tables to elucidate these links as precisely as possible (for example, see https://osf.io/sbmx9/).

Fifth, once the study is completed and results are known, it is essential that the outcomes of the prospective (confirmatory) analyses are clearly distinguished from the outcomes of any post hoc (exploratory) analyses that deviated from the preregistered plans. While RRs are not intended to restrict valid deviation or post hoc exploration, it is vital that at this final hurdle the outcomes that were decided after observing the data, and are therefore potentially prone to bias, are not conflated with those that were protected from bias by prespecification. Clear differentiation of exploratory and confirmatory outcomes in turn furnishes the sixth key ingredient: ensuring that the conclusions of the RR are based firmly on the evidence presented and appropriately weighted in favour of the confirmatory outcomes. Finally, in line with level 2 of the Transparency and Openness Promotion Guidelines73, the seventh key ingredient of a high-quality RR is that study data, code and digital materials are made publicly available to the maximum extent permitted by relevant ethical or legal constraints.

Limitations and drawbacks

Despite their advantages, RRs are neither a panacea nor a one-size-fits-all solution for irreproducibility. As a tool for improving the quality of confirmatory research, they are particularly well suited to hypothesis-driven studies and are not designed to improve the robustness or transparency of purely exploratory science (for which better-suited article types are available74,75,76). As the format has evolved, various shortcomings have also been revealed in the workflow and implementation.

Lack of protocol transparency

In 2018, Hardwicke et al.77 reported that of the 70 journals that had adopted RRs permanently, only 50% required that the accepted stage 1 protocols were publicly registered and available alongside the completed stage 2 articles. Protocol transparency is an important element of RRs because it enables readers to check whether authors followed the approved protocol rather than relying on the (typically) closed review process to ensure compliance78. One reason for the lack of protocol transparency is that, in 2013, the key progenitor of RRs, Cortex, did not require accepted stage 1 protocols to be made public, and this policy omission was then duplicated among subsequent adopters. However, since the analysis by Hardwicke et al., the recommended author guidelines for RRs at Cortex and at the Center for Open Science have been updated to include protocol transparency79. To date, of the 213 permanent adopters with published RR policies, 87% now either publish the stage 1 protocol as a separate article or require stage 1 protocols to be registered and made public no later than the point of stage 2 acceptance (see Supplementary Note for details, and ref. 80 for a dedicated RR registry supported by the Center for Open Science). It remains an ongoing task to persuade all RR-adopting journals to require protocol transparency, and a key aim of future metascience will be to confirm that journals are enforcing their policies appropriately (Box 4).

Lack of standardization

Previous analyses by Hardwicke et al.77 and Scheel et al.49 show that RRs are registered and reported inconsistently and, in many cases, even lack sufficient information to determine the specific hypotheses. This lack of specificity probably arises from the incompatibility between the traditional seventeenth-century manuscript format—involving discursive and often vague prose—and the demand for precision within RRs. An RR should articulate falsifiable predictions that are linked to specific sampling plans, inferential analyses and contingent interpretations given different outcomes.

Stage 1 delay and bureaucratic tennis

Although RRs might, in aggregate, lead to more efficient knowledge generation than regular articles (Table 1), the stage 1 review time typically adds a period of several months between submitting a stage 1 manuscript and the commencement of the research. This downtime can present a substantial barrier for researchers on short-term contracts or those holding grants that demand immediate data acquisition. Furthermore, in fields that require very specific ethical approval, such as the clinical sciences, authors can find themselves locked into a time-consuming tennis match between the journal and their ethics committee, both of which can insist on approving a precisely specified protocol (Fig. 3).

Fig. 3: RRs and ethics approval.

One frequently asked question is when authors should obtain ethics committee (EC) or institutional review board (IRB) approval for their stage 1 RR. The answer depends on the tolerance of the EC/IRB for methodological flexibility and the requirements of the journal’s policy. Where the EC/IRB permits flexibility (left track), it is usually most efficient to obtain a generic EC/IRB approval before manuscript submission. Where the EC/IRB will instead only approve a precise protocol, and any deviations to the protocol must be submitted for reapproval, the most efficient course of action depends on the specific journal requirements (right track). Most RRs proceed via the left track, but most RR policies also leave the door open for authors to discuss barriers arising from EC/IRB rigidity. For example, Cortex requires that “all necessary ... approvals (e.g. ethics) are in place for the proposed research. Note that manuscripts will be generally considered only for studies that are able to commence immediately; however, authors with alternative plans are encouraged to contact the journal office for advice.” (emphasis added).

Future

What does the next decade and beyond hold for RRs? The gradual rise of the format has unlocked a range of possibilities for expansion and innovation while also posing challenges for implementation and quality control. Here, we consider some of the major possible developments as RRs scale up. We also reflect on the key outstanding questions for metascience (Box 4) and consider how RRs may influence systems for evaluating research and researchers.

Improving efficiency

Perhaps the greatest limitation of the RR format is the time taken for submissions to be reviewed at stage 1 and receive IPA, thus delaying the commencement of research (see the section “Limitations and drawbacks”). While it can be argued that elimination of publication bias offsets this cost at a community level, and that the stage 1 delay could improve quality by increasing start-up costs81, this downtime nevertheless reduces the accessibility of the format for individual researchers and can make it prohibitive for short-term projects. Here, we consider four innovations that could substantially accelerate stage 1 review without reducing quality.

Rapid review

One way to accelerate RR review is to create a network in which reviewers agree to evaluate submissions within a short time frame. In 2020, Royal Society Open Science became the first journal to launch such a network for RRs related to the COVID-19 pandemic82. As part of this special initiative, the journal calls for submissions that are relevant to any aspect of COVID-19 in any field, including biological, medical, economic and psychological research, while also seeking specialist reviewers who are willing and able to evaluate stage 1 RRs within 24–48 h of accepting a review request. To date, nearly 900 scientists across a range of disciplines have joined the reviewer network, which is also accessible to other journals. To gain access, a journal must commit to rapid peer review of COVID-19 RRs—striving for 7 days for the initial stage 1 review round—and waive all article processing charges. Since then, 11 additional journals have joined the network, including Nature Human Behaviour, Nature Communications and PLoS Biology. To date, Royal Society Open Science has published six stage 2 RRs arising from the initiative, and additional submission statistics are available in the Supplementary Note83,84,85,86,87,88.

Scheduled review

To date, RRs at all journals are subjected to the same serial review process as regular articles. At stage 1, the manuscript is received and undergoes editorial triage; if it meets minimal requirements, editors then seek and obtain specialist reviews, ideally leading to IPA following revision and, in many cases, re-review (Fig. 4a). Despite prompt engagement by editors and reviewers, this process can take several months to achieve resolution. An alternative approach is to perform key elements of the initial stage 1 assessment in parallel (Fig. 4b). Under this model, authors initially submit a short, structured protocol for consideration before writing the stage 1 manuscript. If this passes editorial triage, then reviewers are invited to assess a complete stage 1 manuscript at a fixed date in the future (for example, 6 weeks ahead). During this time, the authors write and submit their complete stage 1 manuscript, which is then reviewed on the scheduled date or during a short range of dates. With sufficient contingencies in place, this modified review process could reduce the initial stage 1 review time (but not re-review time) from weeks/months down to a matter of days.

Fig. 4: Proposed scheduled review workflow for RRs.

Restructuring the RR submission workflow could considerably reduce the duration of stage 1 peer review (dashed red arrows). a, In the typical RR chronology, the total time taken for editors to triage submissions, acquire reviewers, obtain reviews and reach an editorial decision accumulates serially and only after authors have prepared and submitted a full manuscript. b, Scheduled review could accelerate this process by performing key tasks in parallel. Rather than submitting a full manuscript, authors would initially submit a one-page, template-based RR ‘snapshot’ that undergoes editorial triage. If deemed suitable, the editor would then organize the review process for a fixed future date (or a short range of dates) while the authors prepare the full manuscript. Although this process could only feasibly expedite the first round of stage 1 review (and not the re-review of a revised stage 1 submission), the overall time saving could be substantial since the first round of assessment is usually the most onerous. Although no journals currently offer scheduled review, the workflow has recently been introduced as part of the PCI RR initiative (https://rr.peercommunityin.org/).

Observer–evaluator review and ‘rolling IPA’

A more radical alternative to accelerated review and scheduled review would be to abolish the current peer review system altogether, replacing the serial assessment of documents—arguably a throwback to the seventeenth-century exchange of letters—with a more dynamic observer–evaluator mechanism. Authors could use an existing infrastructure such as the Open Science Framework (and associated add-ons) to create a virtual laboratory space containing the project rationale, study protocol(s), code and data as applicable. Reviewers could then be parachuted in as virtual observers, monitoring, commenting on and approving specific components as the research unfolds in real time, with the editor providing guidance, monitoring and oversight. This system could make RRs more compatible with the rapid sequential workflow that is common in fields such as chemical biology, virology and psychophysics, where the results of one experiment often lead within days to the design and implementation of the next experiment. As each update to the protocol is approved, IPA could be rolled over and extended.

RR funding models

One of the major barriers to RRs in clinical research is the additional bureaucracy imposed by stage 1 review. Many researchers already face multiple pre-study hurdles, including grant review, ethics review and, in some cases, regulatory review, and all before even contemplating a stage 1 RR submission. One promising solution to this problem is for journals and funders to perform concurrent or near concurrent reviews of RR proposals. Under this partnership model, which was first trialled in 2017 (refs. 89,90), authors submit a stage 1 proposal to a journal and funder, either simultaneously or in succession. Following assessment by both parties (either separately or as part of a joint process), if expert reviews are favourable, then IPA and funding are awarded in synchrony, collapsing two pre-study review phases into one. This mechanism could be further enhanced by incorporating ethics and regulatory review, which would be particularly useful for clinical trials for which these phases of review often require the assessment of a detailed and precise protocol.

Improving quality, accountability and rewards

Alongside improvements in efficiency, the next decade is likely to see a range of measures to optimize the openness and reliability of both RRs and the assessment of RRs. In this final section, we consider six key innovations, including computationally generated RRs, mandatory RRs for clinical trials, the development of an RR training and accreditation process for journal editors, tools for monitoring the speed and quality of journal assessment, ways to improve the recognition of reviewer contributions and emerging steps to incorporate RRs into formal research evaluation.

Computationally generated RRs

Quality control of accepted protocols is limited by the subjectivity of the review process and variable implementation of RR criteria across journals and fields. As previously noted (see the section “Lack of standardization”), the specificity of hypotheses, statistical tests and interpretations may not always be achievable with the current format. It may therefore be unrealistic to expect authors to achieve the required level of precision without tools that can guide them through this process, such as RR study design templates or software (for example, the ScienceVerse project developed by L. DeBruine and D. Lakens, available at https://scienceverse.github.io/scienceverse/). For RRs that test specific confirmatory hypotheses, all preregistered hypotheses and statistical predictions could also become machine-readable91. A machine-readable output would allow key metadata to be extracted from the RR, indicating whether the results confirm or disconfirm the prespecified hypotheses. Evaluating a stage 2 manuscript on the basis of adherence to preregistered statistical predictions would then become a more efficient and standardized process for the reviewers and editors.
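
As an illustration only, the sketch below shows one way a machine-readable prediction might be encoded; the schema, field names and the check_prediction helper are hypothetical inventions for this example, not drawn from ScienceVerse or any journal’s actual tooling.

```python
# A minimal sketch, assuming a hypothetical JSON-style schema, of how a
# preregistered statistical prediction might be encoded so that a stage 2
# result can be checked against it automatically. Nothing here reflects an
# existing standard; it simply illustrates the idea of machine readability.
import json

prediction = {
    "hypothesis_id": "H1",
    "statement": "Group A scores higher than group B on the primary outcome",
    "test": "independent_t_test",
    "tail": "greater",
    "alpha": 0.02,           # prespecified significance threshold
    "min_effect_size": 0.4,  # smallest effect size of interest (Cohen's d)
}

def check_prediction(pred, observed_p, observed_d):
    """Return True if the observed result satisfies the preregistered criteria."""
    return observed_p < pred["alpha"] and observed_d >= pred["min_effect_size"]

# Hypothetical stage 2 outcome checked against the stage 1 prediction.
print(json.dumps(prediction, indent=2))
print("Prediction confirmed:", check_prediction(prediction, observed_p=0.008, observed_d=0.55))
```

In principle, a registry or journal platform could run such checks across all preregistered hypotheses in a stage 2 submission, giving reviewers a standardized summary of protocol adherence.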

Several preregistration templates are available for assisting researchers in communicating their study plans in a concise and structured format92, and some journals also implement their own protocol-coding checklists with strict criteria for IPA93 (for example, https://osf.io/6bv27/). These templates and checklists can be time-consuming to complete on top of manuscript preparation; therefore, an essential innovation for RRs will be the creation of a user-friendly web-based RR generation tool that guides the authors through the implementation of stringent criteria, including precise specification of hypotheses and their linkage with sampling plans, analysis plans, inference criteria and contingent interpretations given different outcomes. The tool would ideally accommodate a wide range of disciplines, similar to the experimental design assistant offered by NC3Rs (https://www.nc3rs.org.uk/experimental-design-assistant-eda), thereby producing a standardized, submission-ready stage 1 protocol.

Mandatory RRs for clinical trials

In basic (non-clinical) research, RRs are usually proposed as an additional option for authors rather than a requirement, which makes sense given the vital importance of exploratory science. However, we believe a strong case can be made for all clinical trials to be conducted and reported exclusively as RRs. Even though trial registration is now the norm, registration does not guarantee that trials are preregistered rather than ‘post-registered’94,95, that trial results will be reported free from bias96,97 or that the results will be published at all98,99. With trials being vulnerable to all the same publication and reporting biases that afflict basic research, and with the first RR model for clinical trials now available at BMC Medicine100, the next decade will hopefully see mounting pressure on clinical trial funders and major medical journals to embrace the format, ideally via RR funding models to maximize efficiency.

Training and accreditation for editors

As a general rule, the standard with which a journal reviews and administers RRs can never exceed its standard of RR editing. It is therefore crucial that editors have the required skills and training. Guidelines to assist editors in evaluating stage 1 RR submissions are available from most RR-adopting journals or at the Center for Open Science (https://cos.io/rr/), and the PCI RR initiative requires new editors (called “recommenders”) to pass a 2 h entrance test. The expansion of RRs into new disciplines and less familiar terrain is bound to introduce variability in the standard of editing. An important future step for increasing and standardizing the quality of RR editing will therefore be to provide editors with training materials that are tailored to the busy schedules of academics, possibly delivered as a massive open online course. In this way, editors, and perhaps entire journals, could receive accreditation for their knowledge and understanding of the RR process, including criteria for manuscript acceptance/rejection at stage 1 (IPA) and stage 2. This system would equip authors with data to inform their choice of journal and give readers confidence in the standard of RR editing across journals.

Community monitoring and feedback

Related to the issue of editorial training is the issue of journal accountability. At present, there is little information available for authors to judge the quality of editing and review at an RR-adopting journal, apart from word of mouth and conventional (probably uninformative) indicators such as journal prestige55. We believe journals should regularly publish all data on the number of RR submissions received, rejection rates at the different stages and time spent under review. In addition, open review policies and a Yelp-style website in which authors and reviewers could leave anonymous feedback ratings on the quality of the editorial process would provide an incentive for journals to maintain high standards.

Reviewer recognition

Reviewers often make major contributions to RRs that are not transparently recognized. During stage 1, reviewers can recommend major changes in the study design, hypotheses, methods and analyses, contributions that would readily justify authorship if made outside the review process. One way to ensure that reviewers are properly credited for their contribution to RRs is for a journal to adopt an open, signed review policy. Although most journals employ closed, anonymous peer review, a minority of RR-adopting journals, such as Royal Society Open Science and Meta-Psychology, publish the accepted article alongside the reviews, which reviewers can sign in order to increase transparency, credit and accountability. Reviewers can further list their contributions on online platforms such as Publons (https://publons.com/). We believe it is important for reviewers, and especially ECR reviewers, to have publicly available evidence demonstrating the quality of their reviewing. To further recognize this contribution, one possibility would be to create a ‘reviewer contributor’ role that formally acknowledges the intellectual input of reviewers to the final RR, without being named as an author. This role could also be recognized through use of the CRediT taxonomy (https://casrai.org/credit/).

Research evaluation

To become normative in the long term, RRs will need to be recognized within formal systems for evaluating research quality. In the United Kingdom, there are already promising moves in this direction. In the 2021 Research Excellence Framework—a regular national exercise for assessing research quality and apportioning public funds—RRs are specifically noted as an indicator of research rigour101, which in turn means that authors who publish RRs could attract increased funding for their institutions. Following the recent formation of the UK Reproducibility Network (http://www.ukrn.org), institutions are also signalling their support for RRs102. University College London “strongly encourages” researchers to use RRs where appropriate103, an approach echoed by learned societies including the British Neuroscience Association104 and the British Psychological Society105. The Norwegian funder Stiftelsen Dam also recommends that grantees consider publishing their research in the form of RRs106, while the Templeton World Charity Foundation goes so far as to mandate RRs for certain funding schemes107. The next 5 years will hopefully see international expansion in the recognition of RRs at all stages of evaluation, from research outputs and grant applications to the criteria for employment and promotions. It is crucial that such judgments are applied cautiously with continual reference to ongoing metascience that will establish evidence of the costs and benefits of the format (Box 4).

Conclusion

In this Review, we reflected on the history, preliminary impacts and future potential of the RR initiative. Over the past 8 years, the life and social sciences have embarked on a journey into the unknown—one that had been mooted for decades but has only now reached open waters. Early suggestions of impact are promising, with RRs more likely to disconfirm a priori hypotheses and to be computationally reproducible, while also receiving higher quality ratings and equal or greater citation attention. The prospects of the initiative now hinge on more detailed metascience and on addressing limitations and maintaining quality control as the format scales up and expands into new disciplines. As we look into the next decade, we believe RRs are showing all the signs of becoming a powerful antidote to reporting and publication bias, realigning incentives to ensure that the practices that are best for science—transparent, reproducible, accurate reporting—also serve the interests of individual scientists.