The case for open computer programs

Nature 482, 485–488
doi:10.1038/nature10836

Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.

The rise of computational science has led to unprecedented opportunities for scientific advance. Ever more powerful computers enable the investigation of theories that were thought almost intractable a decade ago, robust hardware technologies allow data collection in the most inhospitable environments, more data are collected, and an increasingly rich set of software tools is now available with which to analyse computer-generated data.

However, there is the difficulty of reproducibility, by which we mean the reproduction of a scientific paper’s central finding, rather than exact replication of each specific numerical result down to several decimal places. We examine the problem of reproducibility (for an early attempt at solving it, see ref. 1) in the context of openly available computer programs, or code. Our view is that we have reached the point that, with some exceptions, anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility.

At present, debate rages on the need to release computer programs associated with scientific experiments2, 3, 4, with policies still ranging from mandatory total release to the release only of natural language descriptions, that is, written descriptions of computer program algorithms. Some journals have already changed their policies on computer program openness; Science, for example, now includes code in the list of items that should be supplied by an author5. Other journals promoting code availability include Geoscientific Model Development, which is devoted, at least in part, to model description and code publication, and Biostatistics, which has appointed an editor to assess the reproducibility of the software and data associated with an article6.

In contrast, less stringent policies are exemplified by statements such as7: “Nature does not require authors to make code available, but we do expect a description detailed enough to allow others to write their own code to do similar analysis.” Although Nature’s broader policy states that “...authors are required to make materials, data and associated protocols promptly available to readers...”, and editors and referees are fully empowered to demand and evaluate any specific code, we believe that its stated policy on code availability actively hinders reproducibility.

Much of the debate about code transparency involves the philosophy of science, error validation and research ethics8, 9, but our contention is more practical: that the cause of reproducibility is best furthered by focusing on the dissection and understanding of code, a sentiment already appreciated by the growing open-source movement10. Dissection and understanding of open code would improve the chances of both direct and indirect reproducibility. Direct reproducibility refers to the recompilation and rerunning of the code on, say, a different combination of hardware and systems software, to detect the sort of numerical computation11, 12 and interpretation13 problems found in programming languages, which we discuss later. Without code, direct reproducibility is impossible. Indirect reproducibility refers to independent efforts to validate something other than the entire code package, for example a subset of equations or a particular code module. Here, before time-consuming reprogramming of an entire model, researchers may simply want to check that incorrect coding of previously published equations has not invalidated a paper’s result, to extract and check detailed assumptions, or to run their own code against the original to check for statistical validity and explain any discrepancies.
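
To make the idea of indirect reproducibility concrete, the sketch below shows how a single published equation might be re-coded independently and compared with the authors’ released module over a range of inputs. It is written in Python and uses hypothetical names (eq1_reimplemented, original_module); it is not drawn from any particular study.

    # Sketch of an indirect-reproducibility check: code "equation (1)" of a
    # paper independently, then compare it with the authors' released module
    # over a range of inputs. All names here are hypothetical placeholders.

    def eq1_reimplemented(x, a=2.5, b=0.1):
        # Independent coding of the equation as described in the paper.
        return a * x / (1.0 + b * x)

    def check_against_original(original_eq1, tol=1e-9):
        for x in [0.0, 0.5, 1.0, 10.0, 100.0]:
            mine = eq1_reimplemented(x)
            theirs = original_eq1(x)  # call into the released code
            rel_err = abs(mine - theirs) / max(abs(theirs), 1e-30)
            if rel_err > tol:
                print(f"x={x}: re-implementation {mine} vs original {theirs}")

    # Usage (hypothetical): check_against_original(original_module.eq1)

Even this modest level of cross-checking is impossible if neither the module nor its interface is released.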

Any debate over the difficulties of reproducibility (which, as we will show, are non-trivial) must of course be tempered by recognizing the undeniable benefits afforded by the explosion of internet facilities and the rapid increase in raw computational speed and data-handling capability that has occurred as a result of major advances in computer technology14. Such advances have presented science with a great opportunity to address problems that would have been intractable in even the recent past. It is our view, however, that the debate over code release should be resolved as soon as possible to benefit fully from our novel technical capabilities. On their own, finer computational grids, longer and more complex computations and larger data sets—although highly attractive to scientific researchers—do not resolve underlying computational uncertainties of proven intransigence and may even exacerbate them.

Although our arguments are focused on the implications of Nature’s code statement, it is symptomatic of a wider problem: the scientific community places more faith in computation than is justified. As we outline below and in two case studies (Boxes 1 and 2), ambiguity in its many forms and numerical errors render natural language descriptions insufficient and, in many cases, unintentionally misleading.

The failure of code descriptions

The curse of ambiguity

Ambiguity in program descriptions leads to the possibility, if not the certainty, that a given natural language description can be converted into computer code in various ways, each of which may lead to different numerical outcomes. The potential issues are innumerable, but they include mistaken order of operations, reference to different model versions, and unclear calculations of uncertainties. The problem of ambiguity has haunted software development from its earliest days.
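
A deliberately artificial example of our own (not taken from any published description) shows how little ambiguity it takes. The sentence “remove outliers and normalize the values to the range 0–1” says nothing about the order of the two steps, and the two readings give different numbers:

    # Two defensible readings of "remove outliers and normalize the values
    # to the range 0-1"; the wording does not fix the order of operations.

    def normalize(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) for x in xs]

    def remove_outliers(xs, cutoff=10.0):
        return [x for x in xs if abs(x) < cutoff]

    data = [1.0, 2.0, 3.0, 4.0, 100.0]

    reading_a = normalize(remove_outliers(data))  # drop the outlier, then scale
    reading_b = remove_outliers(normalize(data))  # scale first: the outlier is
                                                  # rescaled below the cutoff
                                                  # and survives as 1.0
    print(reading_a)  # four values: [0.0, 0.33..., 0.67..., 1.0]
    print(reading_b)  # five values, on a quite different scale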

Ambiguity can occur at the lexical, syntactic or semantic level15 and is not necessarily the result of incompetence or bad practice. It is a natural consequence of using natural language16 and is unavoidable. The problem is regarded as so axiomatic that its avoidance or minimization is routinely taught at the undergraduate level in computing degrees. Nor is the study of ambiguity confined to the classroom. Active research continues on the use of tools for the detection of ambiguity17, the avoidance of ambiguity in major projects18, and the clarification of the intended functions of computer programs15.

One proposed solution to the problem of ambiguity is to devote a large amount of attention to the description of a computer program, perhaps expressing it mathematically or in natural language augmented by mathematics. But this expectation would require researchers to acquire skills that are only peripheral to their work (set theory, predicate calculus and proof methods). Perhaps worse, investment of effort or resources alone cannot guarantee the absence of defect19. A recent study20 of a tightly specified, short, simply expressed algorithm whose semi-mathematical specification was supplemented by example outputs showed that major problems still arose with large numbers of programs individually implemented to this specification. In short, natural language descriptions cannot hope to avoid ambiguous program implementations, with unpredictable effects on results.

Errors exist within ‘perfect’ descriptions

Let us assume for a moment that a researcher, perhaps trained—as are computer scientists—to think of computer algorithms as mathematical objects, and fully versed in the formal semantics of software description, has managed to describe a computer program perfectly in some notation. Unfortunately, even such a description would not ensure direct or indirect reproducibility, because other forms of error or ambiguity (unrelated to natural language) are likely to creep in, leading to potentially serious uncertainties (Box 2).

First, there are programming errors. Over the years, researchers have quantified the occurrence rate of such defects to be approximately one to ten errors per thousand lines of source code21.

Second, there are errors associated with the numerical properties of scientific software. The execution of a program that manipulates the floating point numbers used by scientists is dependent on many factors outside the consideration of a program as a mathematical object22. Rounding errors can occur when numerous computations are repeatedly executed, as in weather forecasting23. Although there is considerable research in this area, for example in arithmetic and floating point calculations24, 25, 26, 27, algorithms28, verification29 and fundamental practice30, much of it is published in specialist outlets that scientists do not routinely read, such as Computers & Mathematics with Applications, Mathematics in Computer Science and the SIAM Journal on Scientific Computing.
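
A minimal Python sketch of our own (not taken from the cited studies) illustrates the effect: repeatedly adding a value that has no exact binary representation makes a naive running sum drift away from the correctly rounded result.

    import math

    # 0.1 has no exact binary floating-point representation, so each addition
    # introduces a small rounding error that accumulates over many repetitions.
    values = [0.1] * 10_000_000

    naive = 0.0
    for v in values:
        naive += v

    accurate = math.fsum(values)   # correctly rounded sum of the inputs
    print(naive == accurate)       # False on typical IEEE-754 platforms
    print(abs(naive - accurate))   # size of the accumulated rounding error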

Third, there are well-known ambiguities in some of the internationally standardized versions of commonly used programming languages in scientific computation13. Monniaux22 describes an alarming example relating to implementation of software features:

“More subtly, on some platforms, the exact same expression, with the same values in the same variables, and the same compiler, can be evaluated to different results, depending on seemingly irrelevant statements (printing debugging information or other constructs that do not openly change the values of variables).”

This is known as an order-of-evaluation problem and many programming languages are subject to its wilful ways. Ironically, such execution ambiguity is quite deliberate and is present to allow a programming language compiler more flexibility in its optimization strategy. And even when programs are simple, or developed by the largest software companies, such errors remain surprisingly common: numerical ambiguity led Microsoft to declare in 2010, and reaffirm in September 2011, that the treatment of floating point numbers in its popular Excel spreadsheet “...may affect the results of some numbers or formulas due to rounding and/or data truncation” (http://support.microsoft.com/kb/78113).
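
The arithmetic root of the problem is easy to exhibit even in a toy example (Python is used here; the compiler-level behaviour Monniaux describes concerns languages such as C, but the cause is the same). Floating-point addition is not associative, so mathematically equivalent groupings of one expression need not agree:

    # Regrouping the same sum changes the result because the small term
    # is absorbed when it is added to the much larger one.
    a, b, c = 1e16, -1e16, 1.0

    left_to_right = (a + b) + c   # (1e16 - 1e16) + 1.0  ->  1.0
    right_to_left = a + (b + c)   # 1.0 is absorbed into -1e16  ->  0.0
    print(left_to_right, right_to_left)    # 1.0 0.0
    print(left_to_right == right_to_left)  # False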

Perfection is no guarantee of reproducibility

Finally, even if a computer program could be unambiguously described and implemented without error, other problems can arise in machine deployment whereby the results from identical code often diverge when hardware and software configurations are changed22. So even perfection in one’s own software environment does not guarantee reproducibility. As a result, to maximize the chances of reproducibility and consistency, not only would we urge code release, but also a description of the hardware and software environment in which the program was executed and developed.

Challenges are no excuse for closed code

Nature’s policy on code release implies that algorithmic descriptions using mathematical specifications, equations, formal algorithmic descriptions or pseudocode (a simplified version of complete code) may be required. But there is no guarantee that such tools can avoid ambiguity20, and even if they could, we have shown above that implementation and numerical errors—possibly compounded by differences in machine architecture—will still arise. So, even if complete code is made available, exact replication or even reproduction of central results may fail. A reasonable observer might therefore ask why code should be made available at all. Our response is that the alternative is far worse. Keeping code closed ensures that potential uncertainties or errors in a paper’s conclusions cannot be traced to ambiguity, numerical implementation or machine-architecture issues, and it prevents testing of indirect reproducibility. Although it is true that independent efforts to reproduce computational results without recourse to the original source code constitute an important approach, the all-too-common treatment of code as a black box unnecessarily slows and impedes valid efforts to evaluate model results. We therefore regard the non-availability of code as a serious impediment to reproducibility.

Potential barriers and proposed solutions

There are a number of barriers to the release of code. These include a shortage of tools that package up code and data in research articles; a shortage of central scientific repositories or indexes for program code; an understandable lack of awareness of the computational problems with scientific code, leading to the faulty assumption that program descriptions are adequate (something we address in this article); and finally the perception that the development of program code is a subsidiary activity in the scientific effort.

A modest proposal

An effective step forward would be for journals to adopt a standard for declaring the degree of source code accessibility associated with a scientific paper. A number of simple categories illustrate the idea:

• Full source code: full release of all source code used to produce the published results along with self-tests to build confidence in the quality of the delivered code, as is the case with Perl modules in the CPAN archive, for example (http://cpan.org).

• Partial source code: full release of source code written by the researcher accompanied by associated documentation of ancillary packages used, for example commercial scientific subroutine libraries.

• Marginal source code: release of executable code and an application programming interface to allow other researchers to write test cases.

• No source code: no code at all provided.

This hierarchy of disclosure would alert both the readers and authors of a journal article to the fact that the issue is important and would highlight the degree to which results might be reproduced independently. There remain, however, some potential stumbling blocks, a number of which can easily be resolved using existing facilities.

Intellectual property rights

Clearly, if there is evidence of commercial potential or use, such as a patent or some copyright, then there is a problem. It is difficult to see how a journal might deal with this without substantial financial commitment to independent testing under a non-disclosure agreement or possibly even the purchase of commercial rights. Perhaps the simplest solution is for a journal to flag the software as ‘No source code’ (ideally giving the reasons) until such time as the source code can be included, either because the code goes into the public domain or is released under some free licence. Such a designation simply says that, for the moment, the results are not reproducible with the authors’ own source code, and that testing of the main results must proceed with independent approaches.

Limited access

Researchers may not have access to some of the software packages used for development. We suggest that this would not be a problem for most researchers: their institutions would normally provide such software. If it were to be a problem, then a journal could mark a publication as ‘Partial source code’. The release of the code, even without the software environment required for compilation and execution, would still be valuable in that it would address issues such as dissection and indirect reproducibility (see above) and would enable rewriting in other programming languages.

Procedure

Adopting a simple disclosure of source-code availability will make it clear to a journal’s readership that this is an important issue, while also giving them an idea of the degree of code release. However, we would further suggest that journals adopt a standard specifying that supplementary material supporting a research article must describe each of the released modular components of any software used. Nature editors and referees are already empowered to include an appraisal of code in their judgement about the publication potential of an article, and this practice should be more widely advertised and supported. A good example of this approach is the way that the journal Geoscientific Model Development asks authors to describe their program code.

Logistics

Over the past two decades, the open-source community has solved the logistics of releasing and storing code while maintaining a cooperative development environment. SourceForge (http://www.sourceforge.net/) is an excellent example. Founded in 1999, it is a web-based source-code repository that acts as a free centralized location for developers working on open-source projects. It currently hosts around 300,000 projects and has over two million registered users. Not only does it store source code, but it also provides version control information, project wikis (websites that are easily modifiable by their users) and database access. We urge funding agencies to investigate and adopt similar solutions.

Packaging

There are a number of tools that enable code, data and the text of the article that depends on them to be packaged up. Two examples are Sweave, associated with the programming language R and the text-processing systems LaTeX and LyX, and GenePattern-Word RRS, a system specific to genomic research31. Sweave allows text documents, figures, experimental data and computer programs to be combined in such a way that, for example, a change in a data file will result in the regeneration of all the research outputs. GenePattern-Word RRS is similar in that it enables an author to link text, tables and figures to the analysis and data that yielded the results reported in a word-processed document; it also allows further experimentation (for example, additional analyses can be carried out). It is still early days, however, and localized solutions are emerging at the grassroots level. Donoho and co-workers, for example, have developed software packages that allow anyone with access to the Matlab programming language and development environment to reproduce figures from their harmonic analysis articles, inspect source code, change parameters and access data sets32.
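
The underlying idea is simply a dependency rule: if the data or the analysis script is newer than a published output, regenerate the output. The sketch below expresses that rule in Python with hypothetical file names; it is not Sweave or GenePattern-Word RRS, only an illustration of the principle such tools automate.

    import os
    import subprocess

    def needs_rebuild(target, *sources):
        # True if the target is missing or older than any of its sources.
        if not os.path.exists(target):
            return True
        target_time = os.path.getmtime(target)
        return any(os.path.getmtime(src) > target_time for src in sources)

    # Hypothetical file names: regenerate the figure whenever the data file
    # or the analysis script changes, so published outputs never go stale.
    if needs_rebuild("figure1.png", "observations.csv", "make_figure1.py"):
        subprocess.run(["python", "make_figure1.py"], check=True)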

Steps to implementation

Our thesis is that journal and funding body strictures relating to code implementations of scientific ideas are now largely obsolete. We have suggested one modest path to code availability in this article. There are a number of further steps that journals, academies and educational organizations might consider taking:

• Research funding bodies should commission research and development on tools that enable code to be integrated with other elements of scientific research such as data, graphical displays and the text of an article.

• Research funding bodies should provide metadata repositories that describe both programs and data produced by researchers. The Australian National Data Service (http://www.ands.org.au/), which acts as an index to data held by Australian research organizations, is one example of this approach.

• Journals should expect researchers to provide some modular description of the components of the software that support a research result; referees should take advantage of their right to appraise software as part of their reviewing task. An example of a modular description can be seen in a recent article published in Geoscientific Model Development33.

• Science departments should expand their educational activities into reproducibility. Clearly such teaching should be relevant to the science at hand; however, courses on statistics, programming and experimental method could be easily expanded and combined to include the concept of reproducibility.

References

  1. Schwab, M., Karrenbach, M. & Claerbout, J. Making scientific computations reproducible. Comput. Sci. Eng. 2, 61–67 (2000)
  2. Barnes, N. Publish your computer code: it is good enough. Nature 467, 753 (2010)
  3. McCafferty, D. Should code be released? Commun. ACM 53, 16–17 (2010)
  4. Merali, Z. Error. Nature 467, 775–777 (2010)
  5. Hanson, B., Sugden, A. & Alberts, B. Making data maximally available. Science 331, 649 (2011)
  6. Peng, R. D. Reproducible research and biostatistics. Biostatistics 10, 405–408 (2009)
    This work provides a succinct and convincing argument for reproducibility—Biostatistics is at the forefront of ensuring that code and data are provided for other researchers.
  7. Devil in the details. Nature 470, 305–306 (2011)
  8. Pedersen, T. Empiricism is not a matter of faith. Comput. Linguist. 34, 465–470 (2008)
  9. Donoho, D. L. An invitation to reproducible computational research. Biostatistics 11, 385–388 (2010)
  10. Raymond, E. S. The Cathedral and the Bazaar (O’Reilly, 2001)
  11. He, Y. & Ding, C. H. Q. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomput. 18, 259–277 (2001)
  12. Hatton, L. et al. The seismic kernel system—a large scale exercise in Fortran 77 portability. Softw. Pract. Exper. 18, 301–329 (1988)
  13. Hatton, L. The T experiments: errors in scientific software. IEEE Comput. Sci. Eng. 4, 27–38 (1997)
  14. Hey, T., Tansley, S. & Tolle, K. (eds) The Fourth Paradigm: Data-Intensive Scientific Discovery (Microsoft Press, 2009)
  15. Gervasi, V. & Zowghi, D. On the role of ambiguity in RE. In Requirements Engineering: Foundation for Software Quality (eds Wieringa, R. & Persson, A.) 248–254 (Springer, 2010)
  16. van Deemter, K. Not Exactly: In Praise of Vagueness (Oxford University Press, 2010)
  17. Yang, H., Willis, A., De Roeck, A. & Nuseibeh, B. Automatic detection of nocuous coordination ambiguities in natural language requirements. In Proc. IEEE/ACM Intl Conf. on Automated Software Engineering 53–62 (ACM, 2010)
  18. de Bruijn, F. & Dekkers, H. L. Ambiguity in natural language software requirements: a case study. In Requirements Engineering: Foundation for Software Quality (eds Wieringa, R. & Persson, A.) 233–247 (Springer, 2010)
  19. Pfleeger, S. L. & Hatton, L. Investigating the influence of formal methods. IEEE Computer 30, 33–43 (1997)
  20. van der Meulen, M. J. P. & Revilla, M. A. The effectiveness of software diversity in a large population of programs. IEEE Trans. Softw. Eng. 34, 753–764 (2008)
  21. Boehm, B., Rombach, H. D. & Zelkowitz, M. V. (eds) Foundations of Empirical Software Engineering: The Legacy of Victor R. Basili (Springer, 2005)
  22. Monniaux, D. The pitfalls of verifying floating-point computations. ACM Trans. Programming Languages Systems 30 (3), 1–41 (2008)
  23. Thomas, S. J., Hacker, J. P., Desagne, M. & Stull, R. B. An ensemble analysis of forecast errors related to floating point performance. Weather Forecast. 17, 898–906 (2002)
  24. Revol, N. Standardized interval arithmetic and interval arithmetic used in libraries. In Proc. 3rd Intl Congr. Mathematical Software 337–341 (2010)
  25. Rump, S. M., Ogita, T. & Oishi, S. Accurate floating point summation. Part 1: Faithful rounding. SIAM J. Sci. Comput. 31, 189–224 (2009)
  26. Rump, S. M. Accurate and reliable computing in floating-point arithmetic. In Proc. 3rd Intl Congr. Mathematical Software 105–108 (2010)
  27. Badin, M., Bic, L., Dillencourt, M. & Nicolau, A. Improving accuracy for matrix multiplications on GPUs. Sci. Prog. 19, 3–11 (2011)
  28. Pan, V. Y., Murphy, B., Qian, G. & Rosholt, R. E. A new error-free floating-point summation algorithm. Comput. Math. Appl. 57, 560–564 (2009)
  29. Boldo, S. & Muller, J.-M. Exact and approximated error of the FMA. IEEE Trans. Comput. 60, 157–164 (2011)
  30. Kahan, W. Desperately needed remedies for the undebuggability of large floating-point computations in science and engineering. In IFIP/SIAM/NIST Working Conf. on Uncertainty Quantification in Scientific Computing (Springer, in the press)
  31. Mesirov, J. P. Accessible reproducible research. Science 327, 415–416 (2010)
  32. Donoho, D. L., Maleki, A., Rahman, I. U., Shahram, M. & Stodden, V. Reproducible research in computational harmonic analysis. Comput. Sci. Eng. 11, 8–18 (2009)
    Donoho and fellow researchers have been at the forefront of reproducibility for many years; this article reviews their work, including facilities for code presentation.
  33. Yool, A., Popova, E. E. & Anderson, T. R. Medusa-1.0: a new intermediate complexity plankton ecosystem model for the global domain. Geosci. Model Develop. 4, 381–417 (2011)
    An example of an article from a journal that asks for code release for model description papers and encourages code release for other categories of paper.
  34. Easterbrook, S. M. & Johns, T. C. Engineering the software for understanding climate change. Comput. Sci. Eng. 11, 65–74 (2009)
  35. Brohan, P., Kennedy, J. J., Harris, I., Tett, S. F. B. & Jones, P. D. Uncertainty estimates in regional and global observed temperature changes: a new dataset from 1850. J. Geophys. Res. 111, D12106 (2006)
  36. Adams, E. N. Optimizing preventive service of software products. IBM J. Res. Develop. 28, 2–14 (1984)
  37. Hatton, L. & Roberts, A. How accurate is scientific software? IEEE Trans. Softw. Eng. 20, 785–797 (1994)

Acknowledgements

We thank D. Hales of the Department of Design at the Open University and P. Piwak of the Department of Computing for pointing out some reproducibility work outside computing. J.G.-C. is grateful to I. Goz for pointing out the calculation errors in the CRUTEM data from the Meteorological Office.

Author information

Affiliations

  1. Department of Computing, Open University, Walton Hall, Milton Keynes MK7 6AA, UK

    • Darrel C. Ince
  2. School of Computing and Information Systems, Kingston University, Kingston KT1 2EE, UK

    • Leslie Hatton
  3. 83 Victoria Street, London SW1H 0HW, UK

    • John Graham-Cumming

Contributions

D.C.I., L.H. and J.G.-C. contributed to all aspects of this article.

Competing financial interests

The authors declare no competing financial interests.

Comments

  1. Darrel Ince said:

    We found the word limit that Nature has for Perspective articles prevented us from making some points. This coda contains them.

    OUR INTENT
    Our intent was not to criticise; indeed we have admiration for scientists who have to cope with the difficult problems we describe. One thesis of the article is that errors occur within many systems and do not arise from incompetence or lack of care. Developing a computer system is a complex process and problems will, almost invariably, occur. By making it easy for code to be perused, improvement will happen. This is the result detailed in both of the boxes in the article: the Met Office data are more accurate, admittedly by a small amount, and, because of feedback to developers, the geophysical software was considerably improved.

    OTHER REASONS FOR CODE TRANSPARENCY
    Our message is simple: because of ambiguities in natural language, hardware, and floating point representations, together with programming errors, the code that supports some published scientific result must be released. The code represents an exact description of some computational process and is far superior to a written description, even one expressed in some formal notation.

    Also, given the size of datasets and the fact that the only way to deal with them is via code, it makes no sense to speak of the data and code separately. If you release one, you must release the other.
    There are, however, a number of other reasons for code release which, because of pressure of space, we were unable to explore.

    The first is that it enables researchers to build upon the research of others. A module that has applicability to a specific research area can be used by other researchers. Here we have the software analogue of researchers building upon others’ research results.

    The second is that released code provides excellent teaching material, at both the postgraduate and undergraduate levels. One of us (DCI) has taught programming for many years. He, along with many others, has discovered that after the initial teaching of the elements of a programming language, many students benefit hugely from realistic source code examples.

    The third is connected with citizen science. There are more and more projects that encourage non-institutional researchers to get involved in research, for example the Galaxy Zoo project. Often these projects involve data collection or classification. We believe that there is scope for some really ambitious projects in which citizen scientists, for example statisticians and software developers, could get involved (JG-C was one of them). For these to happen, both code and data transparency are needed.

    The fourth is to do with improvement. JG-C’s work has meant that the error bounds for the data in a temperature database have been narrowed, albeit in a small way, leading to greater accuracy. This is common within the open-source environment, where many such systems, for example the web server Apache, are highly reliable.

    EXCEPTIONS
    We may get some criticisms for including the caveat that there are some exceptions. If a researcher has persuaded his institution to apply for some patent or copyright protection (a long and quite expensive process) and there is evidence of sales then this caveat clearly has to stand. However, if the researcher claims, without evidence, that the software has commercial potential we feel that this should not hinder publication of the code.

    It is also worth mentioning some other reasons for not releasing code. The first is where damage can be done. For example, if a biological researcher finds a way of developing some virus that could wreak havoc with the world health system and a software component forms part of the research, then that component should not be released.

    The second is where the possession of both software and data could enable someone to carry out some criminal or anti-social activity. DCI recently attended a presentation that described research involving a system capable of identifying a certain criminal activity. The data could never be released; neither could the code, since possession of the analysis within the code would have led to the crime being much less easily detected.

    CLARIFICATION
    The quantity 5000 execution years is used in the article. This represents the time accumulated over what would have been thousands of concurrent executions. The Merali reference can be found with the tag ‘Computational Science’ fronting it in the online edition of Nature. This is an internal web-based tag that is not part of the title of the paper and we have omitted it.

    CODE DISCLOSURE
    DCI has carried out four research projects over the last ten years that involve software. One has been published and its software is available; a second is in the process of being refereed, but the software is available; the third involved commercial data and software used in an automotive industry project and is subject to commercial agreement; the fourth is in the early stages of research and software will be published during the latter stages of the publication process. LH makes the code and data for all recent scientific work available as category 1 or 2 on his own website, and for commercial code as category 3. JG-C makes all his non-commercial code freely available under an open-source licence.

    A FIRST STEP
    What we have described in this article is just a first step. Much more needs to be done. We would have liked to devote more time to the Australian National Data Service in our article, but word limits prevailed. We regard this as a terrific model for national or subject-specific archiving, and our description of it as a metadata repository and index is just the tip of the iceberg.

    THANKS
    We would like to thank Nature for accepting this paper, particularly since its content was driven by criticism of a code availability statement made in an editorial in the journal.

  2. Wayne Williamson said:

    Coming from a computing background, this makes perfect sense. In order to reproduce a set of results it is much easier to use existing code (as the article states). It would still need someone to validate that the code is doing what it is supposed to.

  3. Charles Warner said:

    Some very strong arguments for code availability, and a critical issue for evaluating the validity of research results.

    You may want to consider adding to your list of example repositories:

    DLMF: NIST Digital Library (http://dlmf.nist.gov/), a digital version of the Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, edited by Milton Abramowitz and Irene A. Stegun, which includes a link to their "Guide to Available Software", http://gams.nist.gov/.

    And, of course, there is netlib (http://netlib.org/).

    Both of these sites offer some very practical models for archiving source code, and researchers who draw on the codes included in these archives derive the benefit that this code has been vetted by the larger community.

    Regarding packaging, there is a very interesting open-source platform called "KNIME" (http://www.knime.org/) that, although designed as a "data mining" platform, offers a way to package documents, software, procedures, etc. in a single "project", and it is infinitely extendable through R or user-developed external programs. I have been using it to organize my own research projects lately; it is handy for keeping references, notes, and data all in one place. Possibly a good model for what you call "packaging"...
    Charles Warner

  4. Thomas Dent said:

    This is a rather strange article; it makes a fetish of 'reproducibility' as a computing-science requirement apparently detached from scientific content. Surely we should be more interested in the reproducibility of the science, regardless of coding implementation, than in any narrower definition requiring agreement to so many decimal places between code outputs.

    Scientific codes are designed to answer scientific questions – defined without reference to any details of computing – to some specified accuracy. If the question is stated precisely and unambiguously enough, the answer ought to be independent of the code implementation to within the required accuracy. This is a different criterion from requiring computer-science 'reproducibility', meaning if I run the same algorithms on the same data I should get the same result – which can only detect errors in carrying out the algorithm. But scientific questions cannot be mapped one-to-one to computing algorithms.

    As scientists we would like to be confident that research results are not merely artefacts of our chosen analysis methods. The most thorough way to verify this is to analyze the same data with an independent code designed to answer the same scientific question. We cannot expect exactly the same answer, but we can demand that any deviations between results are small enough to attribute to insignificant calculational details.

    Publication of codes may or may not help towards this. It is very difficult to eliminate errors in any code longer than a thousand lines or so, doubly so when inspecting it from 'outside'. But comparing two independently written codes can be extremely informative, revealing pitfalls and hidden assumptions. Having the original code published will inevitably reduce the independence of any verification, as people will be tempted to copy or adapt aspects of it. In the worst case we might have two codes that gave similar answers, satisfying a computing science definition of reproducibility, but both containing the same scientific error, for example an invalid approximation.

    At the back of this is the amount of effort the community is expected to put in to checking analysis results. The best way to do this is by having two or more teams doing the same science but with independent codes. If this is too much time and effort, opening code to some level of public scrutiny is at least a decent second best.

  5. Thomas Dent said:

    This article completely misses the main point of reproducibility in science: a result should not depend on technical details of implementation of algorithms. If the analysis were re-implemented by different people in a different computing language the results should be comparable within uncertainties. The considerations of computing-science reproducibility discussed in the article are at best tangential to this.

    It's one thing to check that a code gives the claimed results when run as documented. Even if it passes all such computational tests it may still contain scientific mistakes (eg invalid approximations or assumptions) -- these would be more likely to be revealed by having an independent team write an independent analysis without even looking at the original code.

    What would be more useful is not releasing the source code, but instead giving a set of unit tests that any analysis ought to satisfy in order to answer the desired scientific questions. The article, despite its extraordinary length and verbosity, doesn't even mention 'scientific' unit tests of this kind.
