Editorial

Credit for code

Nature Genetics volume 46, page 1 (2014)

Moving toward fully transparent research publications, we suggest several approaches to share research that is instantiated in software written for computers and other laboratory machines. Review, replication, reuse and recognition are all incentives to provide code.

Software for biomedical research ranges from single scripts used to format data to complex suites of analytical tools. The problem that editors, referees and readers encounter most often is knowing where in the work any code has been employed. Thus, when submitting a manuscript to a journal for peer review, it is good practice to deposit annotated open source code in a recognized community repository that tests software under a set of standard conditions and to provide a uniform resource locator (URL) for it. Because software is revised rapidly and runs under diverse settings and with customization, it is also useful to offer the code actually used, nonexclusively, to the journal as a supplementary text or archived file. This latter solution is simple enough to deter malware while still providing some guidance for determined readers.
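As one illustration of offering "the code actually used" as an archived file, the sketch below bundles a directory of scripts into a zip archive and records a SHA-256 checksum for each file, so that a reader can confirm the archive matches what was run. It is only a sketch under stated assumptions: the function names, the `MANIFEST.txt` file name and the choice of zip and SHA-256 are hypothetical and are not requirements of the journal.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def archive_code(source_dir: Path, archive_path: Path) -> dict[str, str]:
    """Zip every file under source_dir and record a checksum for each.

    Returns a mapping of relative path -> SHA-256 digest. The same mapping
    is written into the archive as MANIFEST.txt (a hypothetical file name).
    Keep archive_path outside source_dir so that an archive left over from
    an earlier run is not bundled into the new one.
    """
    # Collect the file list before the archive is created.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    manifest = {p.relative_to(source_dir).as_posix(): sha256_of(p) for p in files}
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, p.relative_to(source_dir).as_posix())
        lines = [f"{digest}  {name}" for name, digest in manifest.items()]
        zf.writestr("MANIFEST.txt", "\n".join(lines) + "\n")
    return manifest
```

Calling `archive_code(Path("analysis_scripts"), Path("supplementary_code.zip"))` (both paths are placeholders) would produce a single file suitable for upload alongside a manuscript, with the manifest stating exactly which file versions it contains.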

Community repositories that carry out testing are ideal for commonly used programs (for example, those used in statistical analysis), and a fair proportion of the genetics community is fortunately familiar with the Comprehensive R Archive Network (http://cran.r-project.org/) and the principles of stewardship of modular software embodied in the Bioconductor suite (http://www.bioconductor.org/). The journal has sufficient experience with these resources to endorse their use by authors. We do not yet provide any endorsement for the suitability or usefulness of other solutions but will work with our authors and readers, as well as with other journals, to arrive at a set of principles and recommendations.

Two needs stand out. First, code should have permanent identifiers such as the Digital Object Identifiers used by publishers, and authors of code should receive attribution for their programs as well as for their publications. Second, data sets and the code to handle data should be stored together, as metadata can cover both and the repositories then become attractors for communities, sometimes even evolving into environments where stored code can be run by third parties on stored data (for example, Nat. Genet. 45, 1121–1126, 2013, doi:10.1038/ng.2761). We are pleased to see that the DataCite organization is indeed listing code storage as a feature in its index of data repositories (http://www.datacite.org/repolist). We think that this organization is a good one to engage with in coordinating efforts to attain the citation of data sets and code.

If these best practices are not possible, there are still ways to avoid making the current situation worse. For example, it is a good idea to provide a uniform resource locator (URL) for the version of the code at a corporate or institutional site. Much open source software is currently archived at the commercial site GitHub (https://github.com/). If none of these solutions is feasible, please do declare when there is code involved in the work, even if it is proprietary or unavailable, and provide equations or algorithms that enable a reader to understand and replicate analytical decisions made with the research software.

DOI: https://doi.org/10.1038/ng.2869
