“Totally unreliable.” “A buggy mess.” Over the past month, software engineers have sharply criticized the code underpinning an influential coronavirus simulation by scientists at Imperial College London, one of several modelling exercises that helped sway UK politicians into declaring a lockdown. Some media articles even suggested that the simulation couldn’t be repeated by others — casting further doubt on the study. Now, a computational neuroscientist has reported that he has independently rerun the simulation and reproduced its results. And other scientists have told Nature that they had already privately verified that the code is reproducible.
The successful code testing isn’t a review of the scientific accuracy of the simulation, produced by a team led by mathematical epidemiologist Neil Ferguson. But it dispels some misapprehensions about the code, and shows that others can repeat the original findings.
The new test is “the best possible verification of Ferguson’s simulations given the state of the art in computational science,” says Konrad Hinsen, a computational biophysicist at the French national research agency CNRS in Paris, who was not involved in the work. In May, he wrote in a blogpost that the Imperial study’s code looked “horrible”, but that such shortcomings are to be expected in code written by scientists who usually aren’t specialists in software development.
Released in mid-March, the original study suggested there could be half a million UK deaths if nothing were done to stop the virus, and modelled how various policy interventions might help. But Imperial scientists did not immediately make the code available for public scrutiny.
When a cleaned-up version was released at the end of April, software engineers disparaged its quality and said the simulation needed to be repeated by others. In May, David Davis, a member of Parliament with the United Kingdom’s governing Conservative Party, tweeted about some of the online criticism, saying that if true, it was “scandalous”.
Media articles cast further doubt on the Imperial work by reporting online comments suggesting that other scientists had experienced problems rerunning the code. Nature has now ascertained that these were taken out of context: they related to work done with the Imperial group to ensure that the publicly released code ran correctly in every possible computing environment.
Ferguson — who didn’t comment on the criticisms at the time — agrees that the simulation didn’t use current best-practice coding methods, because it had to be adapted from a model created more than a decade ago to simulate an influenza pandemic. There was no time to generate new simulations of the same complexity from scratch, he says, but the team has used more modern coding approaches in its other work. However, none of the criticisms of the code affects the mathematics or science of the simulation, he says.
The politicized debate around the Imperial code demonstrates some of the reasons that scientists might still hesitate to openly release the code underlying their work, researchers say: academic programs often have shortcomings that software engineers can pick at. Even so, scientists ought to release their code and document how it works, says Stephen Eglen, the computational neuroscientist at the University of Cambridge, UK, who reran the Imperial code and reported his results on 1 June.
This year, Eglen co-founded an organization called Codecheck to help evaluate the computer programs behind scientific studies. The Codecheck process tests whether an independent scientist can reproduce the results of a computational analysis, given its data inputs and code. He didn’t review the epidemiology that went into the Imperial simulation — such as estimates of the fatality rate associated with the new coronavirus, or of how often individuals typically mix in societies. British science advisers, however, asked multiple teams to model the emerging pandemic, and they produced results similar to Imperial’s.
Researchers working with London’s Royal Society as part of an effort called Rapid Assistance in Modelling the Pandemic (RAMP) have told Nature that they also privately ran exercises to verify the code in March. After the original Imperial study was posted online, RAMP researchers worked with Ferguson’s team and software firms Microsoft and GitHub to improve the clarity of documentation and clean up the software for public release on the GitHub website, a central repository where developers (including scientists) share code. As part of this effort, they checked that the public and original code reliably produced the same findings from the same inputs.
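Checks of the kind the RAMP researchers describe — confirming that two versions of a program produce the same findings from the same inputs — typically come down to running both with fixed random seeds and comparing their output files. A minimal sketch of such a comparison, assuming hypothetical CSV result files (the actual CovidSim harness and output formats differ):

```python
import csv
import math


def outputs_match(path_a, path_b, rel_tol=1e-9):
    """Compare two CSV result files cell by cell.

    Numeric cells must agree within rel_tol; non-numeric cells
    (headers, labels) must be identical. Returns True only if the
    two files represent the same results.
    """
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a, rows_b = list(csv.reader(fa)), list(csv.reader(fb))
    if len(rows_a) != len(rows_b):
        return False
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            return False
        for cell_a, cell_b in zip(row_a, row_b):
            try:
                # Numeric comparison with a relative tolerance, so that
                # harmless floating-point noise doesn't flag a mismatch.
                if not math.isclose(float(cell_a), float(cell_b),
                                    rel_tol=rel_tol):
                    return False
            except ValueError:
                # Fall back to exact string comparison for text cells.
                if cell_a != cell_b:
                    return False
    return True
```

Byte-identical outputs are the strongest result, but a tolerance-based comparison like this also accommodates legitimate differences between compilers or platforms.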
The RAMP group’s work included a separate effort to test the robustness of the simulation by trying to break it under various operating conditions, says Graeme Ackland, a physicist at the University of Edinburgh, UK. The team involved, including software specialists at Edinburgh and at Europe’s particle-physics laboratory CERN, near Geneva, Switzerland, posted comments on GitHub as they went. It was these comments that newspaper articles erroneously quoted as casting doubt on whether the code could be reproduced.
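A “try to break it” robustness pass of the sort Ackland describes might sweep a simulation over many input combinations and assert basic sanity invariants on each run. The sketch below is hypothetical — the RAMP team’s actual test harness is not described in this form — and uses made-up invariants (non-negative daily counts, non-decreasing cumulative counts) purely for illustration:

```python
import itertools


def check_invariants(results):
    """Basic sanity checks on one simulated epidemic curve:
    daily counts are non-negative and cumulative deaths never decrease."""
    daily = results["daily_deaths"]
    cumulative = results["cumulative_deaths"]
    if any(d < 0 for d in daily):
        return False
    return all(a <= b for a, b in zip(cumulative, cumulative[1:]))


def sweep(model, r0_values, seed_values):
    """Run a model callable over a grid of inputs, collecting any
    parameter combinations that produce nonsensical output."""
    failures = []
    for r0, seed in itertools.product(r0_values, seed_values):
        results = model(r0=r0, seed=seed)
        if not check_invariants(results):
            failures.append((r0, seed))
    return failures
```

An empty failure list after a broad sweep gives some confidence that the code behaves sensibly across operating conditions, which is distinct from (and complementary to) verifying that it reproduces a specific published result.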
Calls to share code
Although some journals now ask peer reviewers to rerun and verify code, sharing it publicly is still far from an academic norm. The amount of time researchers have to spend either helping people use their software or refuting claims stemming from its misuse is a “big worry” among many academics, says Neil Chue Hong, founding director of the Software Sustainability Institute in Edinburgh. “There are ways you can run the code that mean you won’t get sensible results, but the researchers who use the code know what those ways are,” he says. “It’s like you or me being given a Formula One racing car, and being surprised when it crashes when it goes around the first corner.”
Despite this, code is the substance of any computational study, and ought always to be released, says Eglen. Other scientists have also called for more transparency around the code underlying COVID-19 models in general. Through Codecheck, Eglen has verified the results of two other COVID-19 models, by researchers at the London School of Hygiene & Tropical Medicine, who posted their code with their studies.
Asked whether he’d learnt any lessons from the furore over his team’s code, Ferguson emphasized to Nature how fast the work had to be done. On 27 February, he presented basic estimates of the impact of the pandemic at a private meeting of the United Kingdom’s main scientific advisory group for emergencies; according to minutes released at the end of May, his figures already gave estimates of half a million deaths. His team then worked long days to quickly produce the more complex simulations estimating how some policy actions might change the result. Cleaning up and releasing the code was not a top priority at the time, he says.
Nature 582, 323-324 (2020)