Making Biomedical Research Software FAIR: Actionable Step-by-step Guidelines with a User-support Tool

Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles tailored for research software have been proposed by the FAIR for Research Software (FAIR4RS) Working Group. They provide a foundation for optimizing the reuse of research software. The FAIR4RS principles are, however, aspirational and do not provide practical instructions for researchers. To fill this gap, we propose the first actionable step-by-step guidelines for biomedical researchers to make their research software compliant with the FAIR4RS principles. We designate them as the FAIR Biomedical Research Software (FAIR-BioRS) guidelines. Our process for developing these guidelines, presented here, is based on an in-depth study of the FAIR4RS principles and a thorough review of current practices in the field. To support researchers, we have also developed a workflow that streamlines the process of implementing these guidelines. This workflow is incorporated in FAIRshare, a free and open-source software application aimed at simplifying the curation and sharing of FAIR biomedical data and software through user-friendly interfaces and automation. Details about this tool are also presented.

What are relevant coding best practices for developing biomedical research software?
Category 2: Include metadata
Include metadata that follows community standards and uses controlled vocabulary (F2). The metadata needs to include a plurality of attributes, i.e., use multiple terms for the same, similar, or overlapping concepts (R1). The metadata is required to include several elements: the relation between different versions of the software (F1.2), the identifier of the software and a description of how it can be obtained (F3), information about citing the software (F4), the standards followed by the data interacting with the software (I1), qualified references to other objects required to run the software (I2), detailed provenance of the software (i.e., why and how the software came to be, as well as who contributed what, when, and where) (R1.2), and qualified references to other software required to run the software (R2). Metadata needs to be included in both machine-readable and human-readable (e.g., software documentation) formats (F4, R1). The documentation needs to meet domain-relevant community standards (R3). Version control system platforms such as GitHub can be used to record details of the software development history (R1).
What relevant standards exist for metadata format and structure (outside of repositories and registries) that allow the mandatory metadata to be documented in human-readable and machine-readable formats?
How can biomedical research software be documented following relevant standards (documentation format, documentation content)?
Category 3: Provide a license
Provide a clear license that is, preferably, widely used and as unrestrictive as possible (R1.1). The license must be provided such that it is readable by both humans and machines (R1.1), is compatible with the dependencies of the software (R1.1), and meets relevant standards (R3).
How to provide a license in human-readable and machine-readable formats that are standard? What are widely used licenses that are suggested?
Category 4: Share software in a repository
Share software on a suitable repository that issues a unique and long-lasting identifier (F1), helps with including rich metadata that follows community standards and uses controlled vocabulary (F2), includes the identifier of the software and describes how it can be obtained (F3), and is FAIR, searchable, and indexable (F4). A suitable repository can also help with making the software accessible via its identifier through a standardized protocol (A1) that is open and free (A1.1) and allows for authentication and authorization when necessary (A1.2). Share such that different components (software, commits, files, etc.) of the software (F1.1) and different versions of the software (F1.2) are assigned distinct identifiers as deemed suitable by the developers.
What repositories can be used for archiving biomedical research software? In what format should the research software be archived?

Category 5: Register in a registry
Register the software on a suitable registry to make the software metadata accessible even when the software is no longer available (A2). A suitable registry can also act as an alternative or complement to a repository for obtaining a unique and long-lasting identifier (F1), including rich metadata that follows community standards and uses controlled vocabulary (F2), recording the identifier of the software and how it can be obtained (F3), and making the metadata FAIR, searchable, and indexable (F4).
What registries can be used for registering biomedical research software?

FAIR4RS Principles
Compliance through the FAIR-BioRS guidelines
F1. Software is assigned a globally unique and persistent identifier.
Archiving the software on Zenodo/Figshare (step 5.2) will assign a Digital Object Identifier (DOI) which is a unique and persistent identifier. Archiving the software on Software Heritage (step 5.3) will assign a SoftWare Heritage persistent IDentifier (SWHID) which is also a unique and persistent identifier. Bio.tools/RRID Portal will issue a unique and persistent identifier as well (bio.tools ID and RRID, respectively) when the software is registered (step 6).
F1.1. Components of the software representing levels of granularity are assigned distinct identifiers.
Bio.tools/RRID Portal (step 6) will assign a unique identifier for the entire software.
Archiving each version of the software on Zenodo/Figshare (step 5.2) will assign a distinct identifier (DOI) for each version. Archiving the software on Software Heritage (step 5.3) will assign a distinct identifier (SWHID) to any level of granularity of the software (software, releases, files, commits, code fragments, etc.).
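The granularity levels that a SWHID can address may be easier to see in code. The following is a minimal sketch, not part of the FAIR-BioRS guidelines, that parses a core SWHID (without qualifiers) and computes the identifier of a single file's content, assuming the content identifier uses the same hash as a Git blob. The function names `parse_swhid` and `content_swhid` are hypothetical and chosen for illustration.

```python
import hashlib
import re
from typing import Tuple

# Object types of the SWHID scheme, from coarse to fine granularity.
SWHID_TYPES = {
    "snp": "snapshot",   # state of a whole repository
    "rel": "release",    # a tagged release
    "rev": "revision",   # a commit
    "dir": "directory",  # a source tree
    "cnt": "content",    # a single file
}

# Core identifier only; qualifiers such as ";origin=..." are not handled here.
SWHID_PATTERN = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})$")


def parse_swhid(swhid: str) -> Tuple[str, str]:
    """Return (object type name, hex hash) for a core SWHID."""
    match = SWHID_PATTERN.match(swhid)
    if match is None:
        raise ValueError(f"not a core SWHID: {swhid!r}")
    return SWHID_TYPES[match.group(1)], match.group(2)


def content_swhid(data: bytes) -> str:
    """Compute the SWHID of a file's content (Git blob hash of the bytes)."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()
```

Because the content identifier is derived from the bytes themselves, anyone holding a copy of the file can recompute it and check it against the archived one.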
F1.2. Different versions of the software are assigned distinct identifiers.
Archiving each version of the software on Zenodo/Figshare (step 5.2) will assign a distinct identifier (DOI) for each version. Archiving on Software Heritage (step 5.3) will assign a distinct identifier for each version release of the software as well. Changes between versions will be documented in the CHANGELOG file (step 3.2).
F2. Software is described with rich metadata.
Rich metadata covering a variety of aspects will be provided through the code-level documentation (step 2.1), the dependencies recording (step 2.2), the instructed documentation (step 3), the prescribed metadata files (step 4), the repository-specific metadata on Zenodo/Figshare (step 5.2), and the registry-specific metadata on bio.tools/RRID Portal (step 6).
F3. Metadata clearly and explicitly include the identifier of the software they describe.
The README file will include the DOI from Zenodo/Figshare in a "How to cite" or similar section (step 3.1). The codemeta.json and CITATION.cff files (step 4) will include the DOI from Zenodo/Figshare in their "identifier" and "identifiers" fields, respectively. The DOI from Zenodo/Figshare is always included in that repository's metadata (step 5.2). The DOI will also be included in the bio.tools/RRID Portal metadata, which also includes their respective IDs (step 6).
F4. Metadata are FAIR, searchable and indexable.
[...] 1 and 2.2), documentation of the software (step 3), and prescribed metadata files (step 4) will contain additional metadata that also follow community standards, use controlled vocabularies, and are typically searchable through the suggested version control system platforms (step 1.1).
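The placement of the DOI in the two metadata files described under F3 can be sketched as follows. This is an illustrative example under stated assumptions, not the FAIRshare implementation: the helper names, the software name, the author, and the DOI `10.5281/zenodo.0000000` are all placeholders, and only a handful of fields are shown. The CITATION.cff text is assembled by hand to keep the sketch free of third-party YAML dependencies.

```python
import json


def build_codemeta(name: str, doi: str, version: str) -> dict:
    """Minimal codemeta.json content with the DOI in the "identifier" field."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "version": version,
        "identifier": f"https://doi.org/{doi}",
    }


def build_citation_cff(title: str, doi: str, version: str, authors: list) -> str:
    """Minimal CITATION.cff text with the DOI under the "identifiers" field."""
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        "authors:",
    ]
    for given, family in authors:
        lines.append(f'  - given-names: "{given}"')
        lines.append(f'    family-names: "{family}"')
    lines += ["identifiers:", "  - type: doi", f'    value: "{doi}"']
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
PLACEHOLDER_DOI = "10.5281/zenodo.0000000"
codemeta_text = json.dumps(build_codemeta("my-tool", PLACEHOLDER_DOI, "1.0.0"), indent=2)
cff_text = build_citation_cff("my-tool", PLACEHOLDER_DOI, "1.0.0", [("Ada", "Example")])
```

Writing `codemeta_text` to `codemeta.json` and `cff_text` to `CITATION.cff` at the root of the source tree would give both files the same identifier.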
A1. Software is retrievable by its identifier using a standardised communications protocol.
The software archive can be retrieved by the DOI generated by Zenodo/Figshare (step 5.2) using HTTP, which is a standardized protocol. The software will be retrievable through the version control system platform (step 1.1), the deployment repository if applicable (step 5.1), and Software Heritage (step 5.3), also using HTTP. Once archived on Zenodo or Figshare (step 5.2) and on Software Heritage (step 5.3), both the software and metadata will remain available and accessible for the lifetime of these repositories. Moreover, Zenodo and Figshare send metadata from the software to DataCite for generating a DOI, and that metadata will always remain accessible through DataCite's registry. Additionally, Zenodo keeps metadata stored in high-availability database servers separate from the software files. Bio.tools/RRID Portal (step 6) will also keep the metadata accessible even if the software is no longer available, e.g., on the version control system platform or any of the archiving repositories.
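Retrieval by identifier over HTTP can be illustrated with a short sketch. It assumes the public resolver at `https://doi.org/` and the DataCite REST endpoint `https://api.datacite.org/dois/`; the function names are hypothetical, and the function that performs a network request is defined but not called here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def normalize_doi(doi: str) -> str:
    """Strip common prefixes so that only the bare DOI (10.xxxx/...) remains."""
    doi = doi.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return doi


def doi_to_url(doi: str) -> str:
    """HTTP URL that resolves to the landing page of the archived software."""
    return "https://doi.org/" + quote(normalize_doi(doi), safe="/")


def datacite_metadata_url(doi: str) -> str:
    """HTTP URL of the DataCite record (assumed endpoint) for the DOI's metadata."""
    return "https://api.datacite.org/dois/" + quote(normalize_doi(doi), safe="/")


def resolve_doi(doi: str, timeout: float = 10.0) -> str:
    """Follow the resolver's redirects and return the final URL (needs network)."""
    with urlopen(doi_to_url(doi), timeout=timeout) as response:
        return response.geturl()
```

Keeping the metadata URL separate from the landing-page URL mirrors the point made above: the metadata record can stay reachable even if the software files themselves are not.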
I1. Software reads, writes and exchanges data in a way that meets domain-relevant community standards.
Step 2.4 will ensure that the inputs/outputs of the software follow any applicable community standards. Those standards will be documented in the README file under a "Standards followed" or similar section (step 3.1). They can also be documented in the bio.tools metadata using the EDAM ontology to specify the nature and format of the input and output data.
I2. Software includes qualified references to other objects.
The README file/documentation will contain qualified references to other objects associated with the software under a "Parameters and data required to run the software" or similar section (step 3.1). The fields "isPartOf", "hasPart", and "relatedLink" of the codemeta.json file (step 4.1) will also provide qualified references to other objects. The Zenodo metadata (step 5.2) include a "Related identifiers" field that can be used to provide qualified references to other objects.
R1. Software is described with a plurality of accurate and relevant attributes.
The software will be described with a plurality of accurate and relevant attributes through the development history captured by the version control system platform (step 1.1), the prescribed documentation (step 3), the prescribed metadata files (step 4), the repository-specific metadata (step 5), and the registry-specific metadata (step 6), which all have several overlapping elements.
R1.1. Software is given a clear and accessible license.
The software will be given a clear and accessible license through step 1.2 which instructs selecting a license and including a LICENSE file with usage terms. The metadata of the software repository in the version control system platform (step 1.1), the metadata files (step 4), the repository-specific metadata (step 5), and the registry-specific metadata (step 6) will all include the name of the license.
R1.2. Software is associated with detailed provenance.
Detailed provenance (why and how the software came to be, as well as who contributed what, when, and where, etc.) will be provided in several ways: in the development history maintained by the version control system platform (step 1.1), which will also get archived in Software Heritage (step 5.3); in the README through "Overall description of the software" and "How to cite" or similar sections (step 3.1); in the codemeta.json file through several fields such as Software description/abstract ("description") and Authors ("givenName", "familyName") with their Organization name ("affiliation") (step 4.1); in the CITATION.cff file through several fields such as Authors ("given-names", "family-names") with their Organization name ("affiliation") (step 4.2); in the repository-specific metadata (step 5); and in the registry-specific metadata (step 6).
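Since the author fields of codemeta.json and CITATION.cff overlap, one file can be derived from the other to keep provenance consistent. The sketch below shows one possible mapping for a single author entry. It assumes the codemeta affiliation is either a plain string or an Organization object with a "name" key, and the function name `codemeta_author_to_cff` is hypothetical.

```python
def codemeta_author_to_cff(author: dict) -> dict:
    """Map one codemeta.json author entry to a CITATION.cff author entry."""
    cff_author = {
        "given-names": author["givenName"],
        "family-names": author["familyName"],
    }
    affiliation = author.get("affiliation")
    # codemeta commonly nests the organization name; CITATION.cff expects a string.
    if isinstance(affiliation, dict):
        affiliation = affiliation.get("name")
    if affiliation:
        cff_author["affiliation"] = affiliation
    return cff_author


# Placeholder author for illustration only.
example_author = {
    "@type": "Person",
    "givenName": "Ada",
    "familyName": "Example",
    "affiliation": {"@type": "Organization", "name": "Example University"},
}
cff_author = codemeta_author_to_cff(example_author)
```

Deriving both files from a single source of author information reduces the chance that the two drift apart between releases.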
R2. Software includes qualified references to other software.
The software dependencies file (step 2.2) will contain qualified references to other software required to run the source code. Following language-specific best practices (step 2.3) will also allow including dependencies in the code (e.g., imports in Python code). The README file (step 3.1) will contain qualified references to other software under a "High-level dependencies of the software" or similar section. The fields "isPartOf", "hasPart", and "relatedLink" of the codemeta.json file (step 4.1) will provide qualified references to other software. The Zenodo metadata (step 5.2) also include a "Related identifiers" field that can be used to provide qualified references to other software. The bio.tools metadata (step 6) include a "Relations" class that can be used to provide qualified references to other software registered on bio.tools.
R3. Software meets domain-relevant community standards.
Steps 1, 2, and 3 will ensure that the software, including its documentation and license, meets domain-relevant community standards and best practices. Sharing software on a deployment repository (if applicable) will also help meet domain-relevant community standards and best practices (step 5.1).