
Irreproducible research poses an enormous burden: it delays treatments, wastes patients' and scientists' time, and squanders billions of research dollars. It is also widespread. An unpublished 2015 survey by the American Society for Cell Biology found that more than two-thirds of respondents had on at least one occasion been unable to reproduce published results. Biomedical researchers from drug companies have reported that one-quarter or fewer of high-profile papers are reproducible1,2.

Many parties are addressing the problem. Funding bodies such as the US National Institutes of Health (NIH) have announced training initiatives3 and explicitly instructed grant reviewers to consider whether experimental plans ensure rigour. New methods of data analysis and peer review have been proposed to deflate bias.

Several journals, including Nature and Science, have updated their guidelines and introduced checklists. These ask scientists whether they followed practices such as randomizing, blinding and calculating appropriate sample size. Science has also added statisticians to its panel of reviewing editors. Philanthropic and non-profit organizations have sponsored projects to improve robustness.

Funders' policies, journal guidelines and widespread soul-searching are necessary. But they are not sufficient.

Conspicuous by their absence from these efforts are the places in which science is done: universities, hospitals, government-supported labs and independent research institutes. This has to change. Institutions must support and reward researchers who do solid — not just flashy — science and hold to account those whose methods are questionable.

Spot the shirkers

Although researchers want to produce work of long-term value, multiple pressures and prejudices discourage good scientific practices. In many laboratories, the incentives to be first can be stronger than the incentives to be right.

Discussions of conflicts of interest typically centre on relationships with industry, but academic scientists face more pernicious, even existential temptations. Monetary rewards are often less important than the 'currency' with which scientists advance their careers: high-level publications lead to funding opportunities, promotions, awards and other forms of recognition. These markers of scientific achievement become proxies for assessment of the work itself, and further encourage spectacular, but less than substantiated, research.

Amplifying these pressures is a human prejudice in favour of our own ideas. There is a very real temptation to ignore a result that does not conform to our preconceptions, or to recast it so that it does. Data-dredging is used to find statistically significant results that justify a publication. Sound practices such as blinding, multiple repeats, validated reagents and appropriate controls4 are dismissed as luxuries or nuisances.
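The mechanics of data-dredging are easy to demonstrate. The following is a minimal illustrative sketch (not from the article; it assumes Python with NumPy and SciPy): when 100 variables with no real relationship to an outcome are each tested at p < 0.05 without correction for multiple comparisons, roughly five will appear 'significant' by chance alone, and any one of them could be dressed up as a finding.

```python
# Illustrative simulation of data-dredging (hypothetical example).
# None of the 100 candidate 'biomarkers' is related to the outcome,
# yet uncorrected testing still produces several "significant" hits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_samples, n_biomarkers, alpha = 50, 100, 0.05

outcome = rng.normal(size=n_samples)                     # random outcome
biomarkers = rng.normal(size=(n_biomarkers, n_samples))  # random, unrelated predictors

# Test each biomarker against the outcome, with no multiplicity correction
p_values = np.array([stats.pearsonr(b, outcome)[1] for b in biomarkers])

print(f"'Significant' at p < {alpha}: {(p_values < alpha).sum()} of {n_biomarkers}")
# Expect about five false positives -- each one a publishable-looking result
```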

Research institutions contribute to and benefit from these perverse incentives. They bathe in the reflected glory of their faculty; they trumpet breakthroughs published in top-tier journals, lauding achievements to the media and donors. Some even pay investigators for publications. Many require that investigators generate their salary from research grants.

An anonymous survey of around 140 trainees at the MD Anderson Cancer Center in Houston, Texas, found that nearly one-third had felt pressure to prove a mentor's hypothesis even though their experimental results did not support it, and nearly one-fifth had themselves published results they considered less than robust5. Nearly half knew of mentors who required lab members to publish a high-impact paper to complete training in their labs (see 'Pressured findings').


Although important, the checklists introduced by journals do nothing to shift the focus from results to the legitimacy of the process by which the results are produced. Researchers encounter these lists after they have drawn conclusions and are ready to announce them — not when planning their research. There is no mechanism to verify that listed practices were actually employed.

The core instinct of scientists — scepticism — is punished by the current system. Institutions have a duty to reform it. They must shoulder their responsibility for training graduate students and postdoctoral fellows, for supporting the scientific behaviour of their faculty members and for the knowledge that emanates from their endeavours.

Good institutional practice

Although there are some protections against outright fraud, few institutions have strong, transparent processes in place to discourage poor-quality science or to foster objectivity. We propose that research institutions that receive public funding should apply the same kind of oversight and support to ensure research integrity as is routinely applied for animal husbandry, biosafety and clinical work.


To conduct animal research, investigators must hold licences and undergo continuous education. Institutions appoint delegates to monitor compliance, and those delegates are held to account by regulators. Similar oversight is used for work with radioactivity and human embryonic stem cells.

These functions could be broadened to encompass established guidelines for research conduct, such as the ARRIVE (Animal Research: Reporting of In Vivo Experiments) and MIAME (Minimum Information About a Microarray Experiment) guidelines, and data sharing as required by the NIH and the National Science Foundation.

Standards already exist that define good laboratory practice for testing chemicals for toxicity, good manufacturing practice and good clinical practice. These systems were introduced to ensure a degree of consistency, quality and integrity. Procedures are in place to ensure compliance.

The scientific community should come up with a similar system for research, which we term good institutional practice (GIP). If funding depended on a certified record of compliance with GIP, robust research would get due recognition.

At a minimum, GIP should consist of the following tenets.

Routine discussion of research methods. Many labs already comb through data and methods as a group before submitting a paper. Such discussions should be broadened and formalized across an institution. Regular department and cross-department meetings should be established to dissect manuscripts in preparation. Methods and processes (rather than conclusions) would be debated just as a competitor's paper might be critiqued in a journal club. Primary research material would be available. This practice is roughly analogous to the 'Morbidity and Mortality' conferences that are routine in hospitals, settings in which working hours are also intense.

Regular critique sessions help scientists to learn to defend their science without feeling defensive. Investigators publicly hold each other to account, and trainees learn what to demand of their own research. Anxieties can be raised informally, highlighting institutional weaknesses and systematic errors. The practice also puts a short-term focus on what has traditionally been a long-term reward: a reputation for careful science.


Reporting systems. Also well established in clinical medicine is a system to anonymously flag occurrences that did or could have jeopardized a patient's care. Such systems are often the only way workers dare to raise concerns and admit mistakes. Similarly, colleagues, graduate students and postdocs should be able to discuss concerns about sloppy science without jeopardizing their careers. Designated co-mentors, a departmental ombudsman or existing university offices of research integrity could be charged with providing a forum for informal, confidential discussions. Any formal reports should be investigated in a balanced and impartial way.

Training and standards. Some sloppiness stems from ignorance. Many investigators determine whether trainees are ready to move on by gauging the number and impact factors of their publications; instead, supervisors should base such decisions on whether their lab members understand research methods and process. Compulsory institutional training should ensure a common understanding of rigorous experimental design, research standards and objective evaluation of data. Faculty members and trainees should demonstrate their ability to spot problems such as 'cherry picking' data to make the best story. Compliance with research standards, including data-sharing, should be supported, audited and acknowledged.

Records and quality management. Laboratory notebooks and records must be available for independent review. Electronic laboratory notebooks facilitate collaboration, supervision and record keeping, and can link records to the original data. One of our institutions (U.D.'s) is now adopting these system-wide. Random audits should be conducted to guarantee that experimental data are duly recorded and that elements of good research practice are routine. Such spot-checks are commonplace in industry.

Appropriate incentive and evaluation systems. Institutions should find ways to deter non-compliance with guidelines, poor mentoring and scientific sloppiness. Faculty members with poor records should face loss of laboratory space and trainees, decreased funding and potential demotion. Conversely, faculty members who excel as mentors and careful experimentalists should be rewarded. Appropriate metrics should be developed so that promotions are based on robustness and high-quality mentoring, rather than simply on high-profile publications6. Surveys such as that conducted at MD Anderson exemplify one way in which administrators can gain the insight necessary to improve the research environment. Institution-level metrics could help to monitor overall performance and remind all researchers and administrators of their responsibility to the scientific community.

Enforcement. Institutions should investigate egregious lapses and record them in a routine, transparent way. Departments of research integrity or other centres of excellence should be funded, staffed and given enough authority to prevent, detect, investigate and penalize poor-quality research. They should also be charged with promoting an institutional culture that nurtures robustness.

Getting to GIP

The systems needed to promote reproducible research must come from institutions — scientists, funders and journals cannot build them on their own. These kinds of changes will require additional money, infrastructure, personnel and paperwork. The load on institutions and investigators will be real, but so is the burden of irreproducible research. Even if it is accompanied by an apparent decrease in productivity, the resulting increase in research quality will be well worth the costs.

Still, most institutions will not make the necessary moves unless forced. Funding bodies should make GIP a prerequisite for receiving a grant. The concept has gained some traction: last year, Science Foundation Ireland announced plans to conduct external audits on some of the labs that it supports.

There will not be one ideal solution. Faculty members, trainees and administrators will need to come together for honest, difficult discussions to restructure institutions. Neither scientists nor institutions should engage in mere box checking; new practices must restrain sloppiness while interfering only minimally with the many scientists who are behaving well.

Large-scale change is possible. In the 1970s, clinical research had little rigour or oversight. Now clinical trials routinely include concurrent control groups, double-blinding, pre-specified endpoints, power calculations to determine patient numbers and analysis plans that thwart bias. In addition, primary data are available for independent statistical analysis by regulatory authorities. At the time, these changes were controversial; many physicians believed them to be unnecessary and regressive.

Nothing an institution can do will prevent misconduct altogether. This is not the goal. Rather, it is to support the work of well-meaning scientists, to reduce the waste from biased results, and to relieve some of the pressures that encourage sloppy science.