To the Editor — Animal behavior is increasingly being recorded in systematic imaging studies that generate large datasets. To maximize the usefulness of these data, there is a need for improved resources for analyzing and sharing behavioral data that will encourage reanalysis and methodological developments1. However, for behavioral data, unlike genomic or protein structural data, there are no widely used standards. It is therefore desirable to make data available in a relatively raw form to enable flexibility in data analysis. For computational ethology to approach the level of maturity of other areas of bioinformatics, at least three challenges must be addressed: storing and accessing video files; defining flexible data formats to facilitate data sharing; and developing software to read, write, browse, and analyze the data. We have generated an open resource to begin addressing these challenges for Caenorhabditis elegans behavioral data.
First, to store video files and the associated features and metadata, we use a Zenodo.org community (an open-access repository for data) that provides durable storage and citability and supports contributions from other groups. We have also developed a web interface that enables filtering of the video files on the basis of feature histograms, which can return, for example, fast and curved worms, in addition to more standard searches for particular strains or genotypes (Fig. 1 and http://movement.openworm.org/). The database currently consists of 14,874 single-worm tracking experiments representing 386 genotypes (building on 9,203 experiments and 305 genotypes in a previous publication2) and includes data from several larval stages as well as data from aging experiments consisting of more than 2,700 videos of animals tracked daily from the L4 stage to death (Nature Research Reporting Summary). Full-resolution videos are available in HDF5 containers that include gzip-compressed video frames, time stamps, worm outlines and midlines, feature data, and experimental metadata. HDF5 files are compatible with multiple languages including MATLAB, R, Python, and C. We have also developed an HDF5 video reader that allows video playback with adjustable speed and zoom (an important feature for reviewing high-resolution multiworm tracking data), as well as toggling of worm segmentation over the original video to verify segmentation accuracy during playback.
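To illustrate the kind of feature-based query described above, the following is a minimal sketch in Python. It is not the web interface's implementation: the record layout, the feature names (`speed`, `curvature`), the thresholds, and the `select_experiments` helper are all hypothetical, the values are made up, and a simple per-experiment mean stands in for the feature histograms used by the actual interface.

```python
from statistics import mean

# Hypothetical per-experiment feature samples (made-up values, arbitrary units).
# The real database stores features in HDF5 files; this layout is an assumption.
experiments = [
    {"id": "exp-a", "strain": "N2",
     "speed": [0.20, 0.25, 0.30], "curvature": [0.8, 0.9, 1.0]},
    {"id": "exp-b", "strain": "N2",
     "speed": [0.05, 0.06, 0.04], "curvature": [0.9, 1.1, 1.0]},
    {"id": "exp-c", "strain": "mutant-x",
     "speed": [0.22, 0.28, 0.26], "curvature": [0.2, 0.3, 0.1]},
]


def select_experiments(records, min_mean_speed, min_mean_curvature):
    """Return ids of experiments whose mean speed AND mean curvature
    both reach the given thresholds (a stand-in for a histogram filter)."""
    return [
        rec["id"]
        for rec in records
        if mean(rec["speed"]) >= min_mean_speed
        and mean(rec["curvature"]) >= min_mean_curvature
    ]


# "Fast and curved" worms under the hypothetical thresholds below.
fast_and_curved = select_experiments(experiments, 0.2, 0.5)
print(fast_and_curved)
```

In this sketch only the experiment that passes both thresholds is returned; a strain or genotype search would simply add a further condition on the `strain` field.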
Second, we have defined an interchange format named Worm tracker Commons Object Notation (WCON) to facilitate data sharing and software reuse among groups working on worm behavior. WCON uses the widely supported JSON format to store tracking data as text that is readable by both humans and machines. It is compatible with single-worm and multiworm3 tracking data at any resolution, from a single point representing worm position over time4 to many points representing the high-resolution skeleton of a moving worm2. It also supports custom feature additions so that individual laboratories can store their own specific datasets alongside the existing set of basic worm data. WCON readers are available for Python, MATLAB, Scala, and C. Detailed documentation for the file formats and software is available on the project page (https://github.com/openworm/tracker-commons/).
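Because WCON is plain JSON, a file can be inspected with any standard JSON library. The sketch below uses only Python's built-in `json` module rather than the project's own readers. It assumes a layout with a top-level `units` object and a `data` array whose records carry `id`, `t`, `x`, and `y`, with one list of coordinates per time point; the sample values are made up, and the exact field names and permitted shorthand forms should be checked against the tracker-commons documentation. The `load_tracks` helper is hypothetical.

```python
import json

# Hypothetical WCON-style document (made-up values). "worm1" has a
# three-point skeleton at two time points; "worm2" is a single point.
wcon_text = """
{
  "units": {"t": "s", "x": "mm", "y": "mm"},
  "data": [
    {"id": "worm1",
     "t": [0.0, 0.5],
     "x": [[1.0, 1.1, 1.2], [1.05, 1.15, 1.25]],
     "y": [[2.0, 2.1, 2.2], [2.0, 2.1, 2.2]]},
    {"id": "worm2", "t": [0.0], "x": [[5.0]], "y": [[7.0]]}
  ]
}
"""


def load_tracks(text):
    """Parse WCON-style JSON into {worm id: [(time, [(x, y), ...]), ...]}.

    Handles only the fully arrayed form assumed above, not any
    shorthand that the specification may allow.
    """
    doc = json.loads(text)
    tracks = {}
    for record in doc["data"]:
        frames = tracks.setdefault(record["id"], [])
        for t, xs, ys in zip(record["t"], record["x"], record["y"]):
            frames.append((t, list(zip(xs, ys))))
    return doc["units"], tracks


units, tracks = load_tracks(wcon_text)
for worm_id, frames in tracks.items():
    print(worm_id, "has", len(frames), "time point(s); time unit:", units["t"])
```

The same structure accommodates both ends of the resolution range mentioned above: a single point per time step is simply a one-element coordinate list, and a high-resolution skeleton is a longer one.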
Finally, we have complemented the database and file formats with open-source software written in Python for single and multiworm tracking, feature extraction, review, and analysis (Supplementary Discussion; code and documentation in Supplementary Software or at https://doi.org/10.5281/zenodo.1323782, where compiled versions are also available).
The suite of tools reported here makes quantitative behavioral analysis and reanalysis accessible for both experimentalists and computational scientists. It may also serve as a template for similar efforts in other model-organism communities.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Videos, skeleton (WCON) files, and feature files are available under a Creative Commons attribution (CC BY) license through the database page http://movement.openworm.org/ and Zenodo community page https://zenodo.org/communities/open-worm-movement-database/.
This work was supported by the MRC through grant MC-A658-5TY30 to A.E.X.B. Q.C. is supported by an ERC Starting Grant (NeuroAge 242666), a Research Councils UK Fellowship, and the University of London Central Research Fund. Some strains were provided by the CGC, which is funded by the NIH Office of Research Infrastructure Programs (P40 OD010440).