In biomedical science, at least one thing is apparently reproducible: a steady stream of studies that show the irreproducibility of many important experiments.

In a 2011 internal survey, pharmaceutical firm Bayer HealthCare of Leverkusen, Germany, was unable to validate the relevant preclinical research for almost two-thirds of 67 in-house projects. Then, in 2012, scientists at Amgen, a drug company based in Thousand Oaks, California, reported their failure to replicate 89% of the findings from 53 landmark cancer papers. And in a study published in May, more than half of the respondents to a survey at the MD Anderson Cancer Center in Houston, Texas, reported failing at least once in attempts at reproducing published data (see 'Make believe').

The growing problem is threatening the reputation of the US National Institutes of Health (NIH) based in Bethesda, Maryland, which funds many of the studies in question. Senior NIH officials are now considering adding requirements to grant applications to make experimental validations routine for certain types of science, such as the foundational work that leads to costly clinical trials. As the NIH pursues such top-down changes, one company is taking a bottom-up approach, targeting scientists directly to see if they are willing to verify their experiments. 

There is the looming question, however, of who will pay for it all. Independently validating the results of a major paper that has in vitro and animal experiments can cost US$25,000, says Elizabeth Iorns, chief executive of Science Exchange, a company in Palo Alto, California, that matches scientists with verification service providers.

Last year, the NIH convened two workshops that examined the issue of reproducibility, and last October, the agency’s leaders and others published a call for higher standards in the reporting of animal studies in grant applications and journal publications. At a minimum, they wrote, studies should report on whether and how animals were randomized, whether investigators were blind to the treatment, how sample sizes were estimated and how data were handled.

The NIH is just beginning to take active measures, says Lawrence Tabak, the agency’s principal deputy director. “There is certainly sufficient information now that the NIH feels it’s appropriate to look at this at a central-agency level,” he says. This summer, he and other senior NIH officials, including Story Landis, director at the neurology institute, and Harold Varmus, director at the cancer institute, are assessing input gathered from the directors of the agency’s 27 institutes and centres. They will then confer with NIH director Francis Collins, who will decide what steps to take.

Proposals under consideration include modifying peer review to bring greater scrutiny to the work a grant application is based on — perhaps just for applications that are likely to lead to clinical trials. In a June meeting of Collins’s advisory committee, Tabak imagined implementing such a scenario. “If the premise isn’t validatable, then we’re done; it doesn’t matter how well you wrote the grant,” he said. Agency officials are also considering a requirement that independent labs validate the results of important preclinical studies as a condition of receiving grant funding.

The very idea of a validation requirement makes some scientists queasy. “It’s a disaster,” says Peter Sorger, a systems biologist at Harvard Medical School in Boston, Massachusetts. He says that frontier science often relies on ideas, tools and protocols that do not exist in run-of-the-mill labs, let alone in companies that have been contracted to perform verification. “It is unbelievably difficult to reproduce cutting-edge science,” he says.

But others say that independent validation is a must to counteract the pressure to publish positive results and the lack of incentives to publish negative ones. Iorns doubts that tougher reporting requirements will have any real impact, and thinks that it would be better to validate results regularly, either through random audits or by selecting the highest-profile papers.

Science Exchange would clearly benefit from a new flow of business should the NIH impose such a mandate on even some studies. The company’s Reproducibility Initiative, launched last year, arranges for the independent replication of study results for authors who request the service — and the first batch of academic validations began in May. In January, Iorns asked more than 22,000 corresponding authors of original biomedical papers published in 2012 if they would allow the experiments in their reports to be independently verified, should funding be made available. Of those who responded, 1,892 scientists said yes and 416 declined.

Iorns says that there are plenty of important papers among those published by the 1,892 — they had a readership that was, on average, an order of magnitude higher than that of the papers whose authors declined, as measured by downloads to a free online reference manager, Mendeley. The Laura and John Arnold Foundation, based in Houston, Texas, says that it is actively considering funding Science Exchange to validate cancer-cell biology papers within Iorns’ cohort.

Some at the NIH are coming round to the idea that validation is best contracted out. Shai Silberberg, who is responsible for reproducibility issues at the agency’s neurology institute, has almost finished a pilot study in which several academic labs tried to reproduce findings from studies aiming to move drugs to a stage at which they are ready to be tested in humans. He points out that it has already taken two and a half years. “It’s too slow,” he says. He now favours speedier contract-research organizations.

Iorns, for her part, is not waiting for the NIH to take action. On 30 July, Science Exchange launched a programme with a reagent supplier based in Aachen, Germany, to independently validate research antibodies. These are used, for example, to probe gene function in biomedical experiments, but their effects are notoriously variable. “Having a third party validate every batch would be a fabulous thing,” says Peter Park, a computational biologist at Harvard Medical School. He notes that the consortium behind ENCODE, a project aimed at identifying all the functional elements in the human genome, tested more than 200 antibodies targeting modifications to proteins called histones and found that more than 25% failed to target the advertised modification.

With antibodies, the companies that make them have an incentive to prove the quality of their products, and Iorns hopes that they will pay the thousands of dollars that such validation costs. Antibodies that pass muster will receive an ‘independently validated’ green tick in the catalogue.

But with budgets stretched thin — and with Congress well aware of the reproducibility issues — the NIH also has an incentive to make sure that its $29-billion budget is spent on verifiable science. “We are obligated to consider how we want to address this,” says Tabak.