Technology feature

Open framework tackles backwards science

Good news for grackles 

Jeffrey M. Perkel

Credit: Mopic / Alamy Stock Photo

6 September 2018

When Brian Nosek established his lab at the University of Virginia in Charlottesville, he focused on openness, making code and software scripts publicly available and releasing manuscript preprints. "One of the themes that my lab had since it started in 2002 was, how can we make our daily practice in science closer to the ideals that we aspire to," he says.

But for many researchers, the minutiae of following suit were difficult. At the time, there was no universal infrastructure that researchers could use to establish long-distance collaborations, share files, post preprints, and pre-register studies. So Nosek’s team, led by PhD student Jeff Spies, decided to build one.

Brian Nosek

The result was the Center for Open Science, a nonprofit based in Charlottesville, for which Nosek is Executive Director (and Spies its former chief technology officer). Its product is the Open Science Framework. Publicly launched in 2012, the OSF is a free, open-source platform that blends the capabilities of popular cloud-based services with research-dedicated features that those other tools lack.

Like Dropbox, Google Drive, and GitHub, for instance, the OSF provides capacious file storage; like Zenodo and Figshare, it allows users to archive code and data and to obtain a persistent digital object identifier (DOI) for them; and like arXiv and bioRxiv, it provides a home for researchers to deposit manuscripts before publication. But the OSF extends that feature set with project home pages, fine-grained control over user access, study pre-registration, version control, a wiki, and more.

End to end support

"The OSF is a collaboration tool," Nosek explains, "that supports the entire research life cycle, from the onset of a project to the final publication and archiving of the research."

Researchers can use the OSF as a home for their project data and collaborations, and can keep those pages private, or limited to specific users. When they are ready to publish their work, they can take a snapshot of the project, assign it a DOI, make it public, and associate it with a preprint.

"The OSF is the closest thing I've seen - and it's not there yet, but it is the closest thing I've seen to something that's well integrated into a scholarly publishing workflow," says Titus Brown, a bioinformatician at the University of California, Davis.

The service now boasts more than 105,000 users and 143,000 public projects, Nosek says.

One of those users is Corina Logan, a behavioral biologist at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany.

Logan studies behavioral flexibility in grackles, urban birds that have massively expanded their range over the past 150 years. She created a public project called The Grackle Project on the OSF, but most of its content - including project files and a wiki that documents lab standard operating procedures - is accessible only to her collaborators, a decision she made to ensure that outside parties cannot edit the data files. "That would just be a huge mess," she says.

The killer app

For Logan, OSF's killer app is pre-registration - a feature that addresses the problem of "doing science backwards".

In a typical behavioral study, she explains, researchers ask a question, collect data, and crunch the numbers to determine whether the data support the hypothesis. Researchers may then slice and dice the data to see whether they support other hypotheses post hoc - a sleight of hand sometimes called 'p-hacking' or 'HARKing' ('hypothesising after results are known'), which can elevate statistical noise to apparent significance. "This is the bane of a lot of scientific papers," says Dorothy Bishop, a neuropsychologist at the University of Oxford, UK.
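The arithmetic behind 'statistical noise to apparent significance' is easy to sketch. The Python below is a minimal illustration, assuming k independent tests run on data that contain no real effect, each judged at the conventional 5% threshold. The function names are hypothetical, and real post hoc analyses of a single dataset are usually correlated, so the actual inflation is typically smaller, though still real. The chance of at least one spurious 'hit' is 1 - (1 - alpha)^k, and a quick simulation agrees.

```python
import random


def false_positive_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent tests on pure
    noise comes out 'significant' at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def simulate(n_tests: int, alpha: float = 0.05,
             n_studies: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check of the formula above.

    Assumption: under the null hypothesis a valid test's p-value is uniform
    on [0, 1], so each test is a false positive with probability alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # One 'study' = n_tests looks at the same noise; count it if any hit.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


for k in (1, 5, 10, 20):
    print(k, round(false_positive_rate(k), 3), round(simulate(k), 3))
```

With 20 tries, for example, the analytic rate is about 64%, against the nominal 5% for a single test specified in advance; committing to one analysis before seeing the data is what keeps the rate near 5%.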

"It's doing science backwards," Logan says. But using pre-registration, she says, she can do "science forwards", by declaring in advance what she intends to study, and how she will do it.

To pre-register a study, researchers lay out their hypothesis and proposed methods for data collection and analysis, before collecting any data. They then post that document on OSF, which time-stamps and archives it. The researcher can choose to make the document public immediately, or keep it private for up to four years. Some journals will peer-review such documents, effectively agreeing to publish the resulting study regardless of its findings. (Logan has migrated her pre-registrations to GitHub to take advantage of that service's granular version control mechanism.)
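To make the idea of a time-stamped, archived plan concrete, here is a toy Python sketch. It is not how OSF implements registrations; it simply assumes a plan is frozen by recording a timestamp and a SHA-256 digest of its text, with an optional embargo capped at roughly four years, so that anyone can later check that a published plan matches what was registered. All names here are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical cap mirroring the four-year embargo described above
# (365-day years, ignoring leap days, for simplicity).
MAX_EMBARGO = timedelta(days=4 * 365)


def sha256_hex(text: str) -> str:
    """Fingerprint a document; any later edit changes the digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Registration:
    digest: str
    registered_at: datetime
    public_from: datetime


def register(plan: str, embargo: timedelta = timedelta(0),
             now: Optional[datetime] = None) -> Registration:
    """Freeze a pre-registration plan with a timestamp and optional embargo."""
    if not timedelta(0) <= embargo <= MAX_EMBARGO:
        raise ValueError("embargo must be between zero and about four years")
    now = now or datetime.now(timezone.utc)
    return Registration(sha256_hex(plan), now, now + embargo)


def matches(reg: Registration, plan: str) -> bool:
    """True if `plan` is exactly the text that was registered."""
    return sha256_hex(plan) == reg.digest


def is_public(reg: Registration, when: datetime) -> bool:
    """True once the embargo has expired."""
    return when >= reg.public_from
```

The design point is that a trusted third party holds the record: a digest computed on a researcher's own laptop proves nothing about when the plan was written, whereas one archived by a registry such as the OSF does.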

In performing such studies, researchers adhere to the pre-registered protocol as closely as possible. Should hiccups arise - for instance, if the data deviate from the statistical model the researchers anticipated - they can change tack, provided they document that deviation in their write-ups. "Science is a verb," Logan says. "It's always changing."

Thinking ahead

If that sounds like it adds to a researcher's already busy schedule, it doesn't, says Sara Weston, a postdoctoral fellow at Northwestern University in Evanston, Illinois. Pre-registration can actually save time, she explains, by forcing researchers to plan ahead. "I stop myself from saying, well, what if I did it this way, or what if I did it with this covariate, or what if I try this interaction? I feel I'm more sure that I've thought through those things ahead of time."

Other key features of the OSF are its project home pages and their integration with other online storage systems. Although researchers can upload files directly to the OSF, those who use Google Drive, Dropbox, or GitHub, for instance, can link those accounts to an OSF project instead. The OSF then pulls in files from those services automatically, providing a single home for a project's many pieces, which a user can make public, keep private, or share with collaborators. Should a user upload a revised version of a given file, the service retains both copies, providing a history of the changes.
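The file-history behavior described above can be modelled with a toy version store. This Python sketch assumes only what the paragraph states, that each re-upload of a file is kept alongside earlier ones; the class and method names are hypothetical and say nothing about how the OSF stores revisions internally.

```python
class VersionedStore:
    """Toy store that keeps every uploaded revision of each file."""

    def __init__(self) -> None:
        # Maps a file name to its list of revisions, oldest first.
        self._versions: dict[str, list[bytes]] = {}

    def upload(self, name: str, content: bytes) -> int:
        """Add a new revision without discarding older ones.

        Returns the 1-based version number of the new revision.
        """
        revisions = self._versions.setdefault(name, [])
        revisions.append(content)
        return len(revisions)

    def latest(self, name: str) -> bytes:
        """Most recent revision; raises KeyError for unknown files."""
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> bytes:
        """Fetch a specific 1-based revision."""
        if version < 1:
            raise ValueError("versions are numbered from 1")
        return self._versions[name][version - 1]

    def history(self, name: str) -> list[bytes]:
        """All revisions of a file, oldest first (a copy, not the internal list)."""
        return list(self._versions[name])
```

Appending rather than overwriting is the whole trick: because nothing is ever replaced, any earlier state of the file can be recovered and compared with the current one.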

Ben Marwick, an archaeologist at the University of Washington in Seattle, says this ability to integrate multiple online systems is one of the "strong attractions" of the OSF. Another, he says, is its integrated preprint server. OSF preprints can be associated with projects, providing an easy way to move back and forth between a project's data and the resulting manuscripts. Integration with the Hypothesis annotation service also lets users comment on preprints.

Upload, download

OSF includes two other elements: OSF Meetings, for sharing conference materials, and OSF Institutions, which provides a “central hub for research projects on a branded, dedicated OSF page,” as well as institutional log-in.

For Brown, the OSF serves mostly as a home for very large data files: those over about 50 MB, which are too large for GitHub to handle efficiently. (Or at all, in some cases: GitHub has a 100 MB file limit.) To work efficiently with those files, Brown's lab developed a tool called osfclient for uploading and downloading OSF files, either from the Unix command line or from within Python programs. Scripting file transfers in this way codifies research workflows and thus aids computational reproducibility. An R-language analog, called osfr, is being developed separately.
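A lab following Brown's approach first needs to decide which files belong in Git and which should go to the OSF. The Python sketch below assumes the rough thresholds quoted above (about 50 MB as the practical ceiling for GitHub, 100 MB as its hard limit) and simply sorts a directory's files by size. The function name and `threshold` parameter are hypothetical; the actual transfer would be done with a tool such as osfclient.

```python
from pathlib import Path

MB = 1024 * 1024
SOFT_LIMIT = 50 * MB  # roughly where GitHub stops handling files efficiently


def route_files(root, threshold=SOFT_LIMIT):
    """Split files under `root` into (keep_in_git, send_to_osf) by size.

    Anything larger than `threshold` bytes is routed to the OSF. Files over
    100 MB could not be pushed to GitHub at all, so they always land in the
    second list when the default threshold is used.
    """
    keep_in_git, send_to_osf = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # skip directories
        if path.stat().st_size > threshold:
            send_to_osf.append(path)
        else:
            keep_in_git.append(path)
    return keep_in_git, send_to_osf
```

The files in the second list could then be pushed with osfclient, whose documentation describes an `upload` command of the form `osf -p <project-id> upload <local-path> <remote-path>`; check the current osfclient documentation for the exact syntax and authentication setup.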

Users can also upload files directly to OSF via its web interface, or indirectly via the Google Drive or Dropbox desktop applications. At present, no OSF file-synchronization desktop application is in the works, Nosek says. But according to Richard Ball, an economist at Haverford College in Haverford, Pennsylvania, it would be welcome: "That would take OSF to the next level of appeal and usefulness," he says.

Also missing, says Marwick, are the social features that make GitHub so popular - for instance, the ability to 'follow' a researcher and stay up to date with their latest activities.

Those concerns aside, what the OSF does do is advance research reproducibility by making it easier for researchers to document their own work, and to understand that of their colleagues. "That's the heart, to me, of what the OSF is," says Weston: "What are all the ways that I can be transparent with other researchers about what I did and why I did it? And what are the ways that I can be transparent with myself about what I did and why I did it?"

Jeffrey M. Perkel is technology editor at Nature.