Journal Production Guidance for Software and Data Citations

Software and data citation are emerging best practices in scholarly communication. This article provides structured guidance to the academic publishing community on how to implement software and data citation in publishing workflows. These best practices support the verifiability and reproducibility of academic and scientific results, the sharing and reuse of valuable data and software tools, and attribution to the creators of the software and data. While data citation is increasingly well established, software citation is rapidly maturing. Software is now recognized as a key research result and resource, requiring the same level of transparency, accessibility, and disclosure as data. Software and data that support academic or scientific results should be preserved and shared in scientific repositories that support these digital object types for discovery, transparency, and use by other researchers. These goals can be supported by citing these products in the Reference Section of articles and effectively associating them with the software and data preserved in scientific repositories. Publishers need to mark up these references in a specific way to enable downstream processes.


Introduction
Software and data citation are emerging best practices in academic and scientific communication that provide excellent support for results validation, reproducibility, credit, and the sharing and reuse of valuable tools and resources. Data citation has been possible with some rigor since the establishment of DataCite (a Registration Agency of the International DOI Foundation, which issues robust persistent identifiers for datasets and maintains a registry of associated rich metadata) in 2009 1 and was recommended by a comprehensive report of a CODATA-ICSTI task group in 2012 (available at: http://www.nap.edu/catalog.php?record_id=13564). CODATA-ICSTI is the Task Group on Data Citation Standards and Practices (https://codata.org/initiatives/task-groups/previous-tgs/data-citation-standards-and-practices/), jointly formed by the international and interdisciplinary Committee on Data for Science and Technology (CODATA) and the International Council for Scientific and Technical Information (ICSTI). Data citation has become increasingly adopted since the introduction of the "Joint Declaration of Data Citation Principles".

Results
Checklist of Best Practices for Journal Publishers:
1. Instructions to Authors: Provide author instructions that define which software and datasets should be cited in their article, and reinforce these instructions throughout the process, including editorial actions. Additionally, include how to structure these citations, provide information on selecting the best possible scientific repositories to use for software and data, and explain what information to put in an Availability Statement.
2. Publication Policies: Update publication policies to include the importance of citing the software and datasets that support the transparency, integrity, and reproducibility of the research.
3. Education and Outreach: Educate editors, reviewers, and staff on the policy, requirements, areas of flexibility, and unique aspects of software and dataset citations.
4. Technical Updates: Put into place the necessary technical updates, as defined in this document, to ensure that the machine-actionable representation of the software and dataset citations is sustained throughout the publication workflow and properly formatted when registered with Crossref.
5. Production Teams: Define for publication production team members the unique aspects of software and dataset citations; work together to implement necessary changes.
6. Metrics: Establish quality metrics and measures to ensure consistent, accurate results for software and dataset citations.

Discussion
This document is the product of working sessions of the FORCE11 Software Citation Implementation Working Group's Journals Task Force, conducted over a period of 2 years. It reflects lessons learned from a project supported by an NSF Grant (2025364) to the American Geophysical Union to ensure that software and data citations were transferred reliably from publications to NSF's Public Access Repository. It provides best practices to journals such that the machine-actionable representation of the software and dataset citations is sustained throughout the publication workflow and properly formatted when deposited to Crossref. This optimizes their ability to be properly linked as research objects in support of services provided by DataCite, Scholarly Link eXchange (Scholix), and others. The FORCE11 Journals Task Force includes both society and commercial publishers of all sizes, research infrastructure providers, and others interested in implementation challenges for software and data citations.
The guidance here is primarily intended for those who support the journal production process, including those engaged in copy editing, markup, and preparing the article for publication on various types of digital and print platforms. To implement these recommendations, coordination with journal submission systems, author guidelines, editors, reviewers, authors, and others involved with the front-end process of publications will also be needed. Since journal production depends on a number of activities that take place both prior and subsequent to production in the publication workflow, we include brief descriptions of what is expected in those activities.
In this guide we describe a set of necessary journal-related activities, organized by role, along with what is needed for the datasets and software that support the article to be identified and cited. We provide use cases for journals to consider as they ensure that their production processes are handling data and software citations optimally.
Problem statement. Data and software citations have unique use cases and workflows that distinguish them from article or book citations. In particular, software and data are often registered by Digital Object Identifier (DOI) registration agencies other than Crossref.
We have discovered that at many publishers, when scripts in the journal production process attempt to validate these citations, they do not do so correctly; as a result, the integrity of the machine-actionable link to the software or dataset is not maintained and is not provided to Crossref, and important downstream services are not supported.
The global research community is now advancing the importance of citing software and datasets in scholarly literature as a means to be more transparent and to support steps towards better reproducibility and reusability of software and datasets. Data and software are research objects separate from the article, with the potential for reuse and citation in later works. Tracking these citations accurately so that authors/creators of these objects receive recognition is important and ensures the scholarly record is more complete.
Journal practices are influenced by other participants in the research ecosystem and the processes and services they provide. For this set of recommendations, these include Crossref (or other DOI registration agencies), Event Data (a joint service by Crossref and DataCite), DataCite Commons, and Scholix. These work together to associate software and datasets with publications for many other uses.
Crossref provides DOI registration services for publishers of English-language peer-reviewed articles, their journals, and other publications. Using the Reference Section of each registered article, Event Data (a joint service by Crossref and DataCite that captures online mentions of Crossref records, https://www.crossref.org/blog/event-data-now-with-added-references/) records the relationship between the software and datasets cited in the article and the article itself in a database that other services can use. Event Data can also be populated if a relationship is asserted (with an identifier) to software or datasets in the journal metadata. Crossref has encountered challenges in providing consistent support to Event Data. In 2021, Crossref completed updates that fixed and improved these processes; however, incorrect publisher markup of software and dataset citations still prevents full implementation.
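Production teams that want to confirm an article's software and dataset links actually reached Event Data can consult its public query API. The sketch below only constructs the query URL, it does not send the request; the endpoint and the `obj-id`/`mailto` parameter names follow the Crossref Event Data user guide as we understand it, and should be verified before production use. The DOI and contact address are hypothetical.

```python
from urllib.parse import urlencode

# Public Event Data query API endpoint, per the Crossref Event Data user
# guide; verify against current documentation before relying on it.
EVENT_DATA_API = "https://api.eventdata.crossref.org/v1/events"

def events_query_url(dataset_doi: str, mailto: str) -> str:
    """Build (but do not send) a query URL for events whose object is the given DOI."""
    params = {"obj-id": f"https://doi.org/{dataset_doi}", "mailto": mailto}
    return f"{EVENT_DATA_API}?{urlencode(params)}"

# Hypothetical DOI and contact address, for illustration only.
url = events_query_url("10.5281/zenodo.1234567", "production@example.org")
print(url)
```

The returned events, if any, indicate that the article-to-dataset relationship deposited with Crossref was successfully recorded.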
DataCite, like Crossref, provides DOI registration services. DataCite specializes in DOI registration for scientific datasets, some preprint servers (e.g., arXiv.org), physical samples (e.g., International Generic Sample Numbers (IGSNs)), and software (e.g., deposited in Zenodo). DataCite is the preferred DOI registration agency for datasets and software because of its robust identifier system, metadata, and support services.
DataCite Commons uses Event Data as well as the metadata from both Crossref's and DataCite's APIs (Application Programming Interfaces), ORCIDs (Open Researcher and Contributor ID, https://orcid.org/), and RORs (Research Organization Registry, https://ror.org/) to visualize the connections between works (e.g., articles, data, software), people, organizations, and repositories. For example, you can see the works that cite a particular dataset.
Scholix is an interoperability framework that collects and exchanges links between research data and literature 12,13. Scholix is designed to support multiple types of persistent identifiers, although at the moment only DOIs are included. Both Crossref and DataCite feed DOIs to Scholix. If a data repository includes information about related publications that use a dataset, through the metadata included in the DOI registration process (as the metadata property 'relatedIdentifier'), DataCite will provide that information (linked to the newly created dataset DOI) to Scholix. Similarly, if an article includes a dataset in the Reference Section, Crossref will report that to Scholix. Scholix manages multiple entries when the same article-to-dataset link is provided by two different sources. Software is not supported at this time.

Workflow description.
The following activities describe a typical publishing workflow, grouped by role, used to capture software and dataset citations properly in both human-readable and machine-actionable formats to support linking and automated attribution and credit.The order of activities may be slightly different for each publisher's production process.
Author Preparation
1. Use scientific, trustworthy repositories that register data and software with persistent, globally unique, and resolvable identifiers. Trustworthy repositories are key to helping authors make their data and software as well-documented, accessible, and reusable as possible. These qualities are embodied in the FAIR Guiding Principles. Journals should encourage datasets and software that are as FAIR and openly licensed as possible, taking into consideration necessary protections and national laws. Trusted repositories provide long-term curation and preservation services for data and software, guided by the TRUST (Transparency, Responsibility, User focus, Sustainability and Technology) principles 14. CoreTrustSeal, an international, community-based, non-governmental, non-profit organization that promotes sustainable and trustworthy data infrastructures and offers any interested data repository a core-level certification based on the Core Trustworthy Data Repositories Requirements, provides a list of certified repositories that follow those principles 15. Specific dataset curation services may require time to review and prepare the data prior to publication. This is most common with repositories that specialize in a particular scientific domain or data type. The time needed for domain-specific data curation, the value-added process of making the dataset interoperable and more likely to be reused, may be up to several weeks or more. Generalist repositories such as Harvard Dataverse Repository, Dryad, and Zenodo are typically far quicker as they do not undertake domain-specific curation. However, domain-specific repositories may be the norm for certain disciplines. Journals should encourage researchers to be aware of this time constraint where applicable, and to plan accordingly. Well-curated data provides transparency and integrity to research. The repository landscape is growing to accommodate the needs of researchers. Tools like re3data.org, DataCite Commons (using both DataCite and re3data.org content), FAIRsharing.org, or the CoreTrustSeal website are well-managed and support authors in finding a trustworthy discipline-specific repository.
Software is best preserved in repositories that support versioning. Researchers can use re3data.org to locate software preservation repositories. Researchers who use GitHub as a development platform can use the integrated bridge to Zenodo.org to preserve a specific version for publication. When using the Zenodo bridge, remind authors to double-check author names, add author ORCIDs, and make necessary corrections to citation information and metadata.
2. Include in the submitted article both an Availability Statement for software and datasets and citations in the Reference Section paired with in-text citations in the article. Publishers should provide support with author guidance and journal templates. All software and datasets can be described in the Availability Statement, but frequently the information needed for a complete citation is not fully available.
a. An Availability Statement is designed for the reader and states where the supporting datasets and software are located, along with any information about accessibility that is needed. Include an in-text citation that has a corresponding entry in the Reference Section. This is a statement on availability so that those looking to analyze or reuse datasets or software can easily find these objects. The information provided should lead the reader to the exact location where the software or dataset can be found. See Availability Statement Recommended Elements Section for more information.
b. Paired Citations and References
i. In-text citations in the Methods and/or Data Section that refer to corresponding entries (citations) in the Reference Section for the dataset or software. These in-text citations provide discrete linkage from research claims to the specific dataset or software supporting them and/or the methods used in analysis, similar to citations of research articles or other sources.
ii. Citations (in the Reference Section) for the datasets and software used in the research. These should be listed in the Reference Section. This allows for automated attribution and credit through Crossref's Event Data. See the section Techniques for Identifying a Data or Software Citation in your References Section Metadata for a suggestion on using a "bracketed description" in the citation format to make identification of data and software citations easier. Also, see the Software and Data Citation Resources Section.
3. Be aware that the reference tools used by many authors may not yet have an item type for datasets or software. To follow the best practices in this guide, further editing of the citation may be required. For example, Zotero does not support datasets but does support software; Zotero provides a workaround to support datasets.

Journal staff review
1. Ensure the presence of a sufficiently descriptive Availability Statement with actionable DOIs/URLs for both the software and the datasets supporting the research.
2. Ensure that for all software and datasets identified in the Availability Statement there is a supporting citation and the statement is complete. Journal review guidance is helpful (e.g., American Geophysical Union's (AGU) Data and Software Availability and Citation Checklist). The journal should promote the importance of a citation in the Reference Section as much as is feasible to encourage automated attribution and credit. Provide examples relevant to your journal. Citations should point both to newly created software and datasets and to software and datasets already deposited in repositories by other researchers and reused or referred to in the research by the authors. See Availability Statement Recommended Elements Section for more information.
3. Provide guidance to authors recommending that software and datasets in the Availability Statement be preserved for the long term in a trustworthy repository complying with the TRUST principles 14, including their inherent long-term commitment to sustainable operations. DataCite has enhanced their Commons tool to include repositories; in addition, CoreTrustSeal provides a list of their certified repositories. Use keywords to discover potential repositories and review relevant characteristics like domain specialties, technical properties, governance, upload policies, and possible certifications 16.
4. Ensure citations follow journal style guides. Journals use many different citation styles. Some styles are flexible enough to allow the author to assist with identifying the citation type by using a "bracketed description" to indicate the citation as a dataset, software, computational notebook, etc. This is important to inform automated and manual reference section review as well as accurate production markup. See Software and Data Citation Resources Section for more information.
Reviewer/editor evaluation
Follow JATS for Reuse (JATS4R) guidance for data citations 19 and guidance for software citations from the National Information Standards Organization (NISO) 20. Avoid removing citation information, especially the persistent identifier.

Copy editor review (re: language and style editing)
Check that the software and datasets mentioned in the Methods and Data Section have corresponding entries in the Availability Statement and citations in the Reference Section.
Note: not all data or software can be supported with a citation that includes a persistent identifier.

Production Markup Review (supports machine-actionable formats)
1. Methods/Data Section: Check that in-text citations and/or text in the Availability Statement link correctly to citations.
2. Availability Statement: This text should include availability statements for all software and datasets that support the research. Currently most journals do not mark up this text; however, JATS4R now has a specific recommendation to do so.
3. Citations: Journals should review and update the markup requirements for dataset citations 19 and software citations 20. The persistent identifier or URL is an active link in the article to the reference. Some publishers provide visualization tools for the software or dataset.

Display the human-readable citation correctly. Consult the Software and Data Citation Resources Section of this article for information on formatting. Dataset and software research outputs should use the same journal citation style and treatment of the persistent identifier, with slight adjustments to include software version information and possibly bracketed descriptions.
4. Provide the machine-actionable citation correctly to downstream services (e.g., Crossref). Note that the guidance provided by Crossref in their blog (available at https://www.crossref.org/blog/how-do-you-deposit-data-citations/) has since been amended. If a journal implemented this guidance previously, there is a high probability that a correction is needed, using the information in this article. The Software and Dataset Citation Use Cases tables below (Tables 1-4), specifically the Crossref information, include those corrections. In short, if there is a DOI, include the DOI XML tag.

Crossref activities
1. Receive the article XML from the Content Hosting Provider/Publisher and create the necessary Event Data entries for software and dataset citations included in the Reference Section.
We include below a list of specific software and dataset use cases with recommended JATS XML and Crossref submission record elements. JATS XML refers to the Journal Article Tag Suite (JATS) Library (https://www.niso.org/standards-committees/jats), a standard publishers use to mark up journal article content in XML format. Following the use cases are recommendations for techniques for identifying a software or dataset citation in the Reference Section.
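To make the JATS target concrete, the sketch below embeds a JATS-style software reference as a string and checks the two fields a downstream parser relies on most: the publication type and the DOI. The bibliographic content (author, title, version, DOI) is entirely hypothetical, and the exact element set (publication-type="software", <data-title>, <version>, <pub-id>) follows our reading of the JATS 1.3 and JATS4R recommendations cited in this article; verify against the current recommendations before adopting.

```python
import xml.etree.ElementTree as ET

# Hypothetical JATS <element-citation> for a software reference, in the
# style discussed in this article (JATS 1.3 / JATS4R). Not authoritative.
JATS_REF = """
<ref id="ref42">
  <element-citation publication-type="software">
    <person-group person-group-type="author">
      <name><surname>Smith</surname><given-names>J.</given-names></name>
    </person-group>
    <data-title>Example Analysis Toolkit</data-title>
    <version>1.2.0</version>
    <source>Zenodo</source>
    <year>2023</year>
    <pub-id pub-id-type="doi">10.5281/zenodo.1234567</pub-id>
  </element-citation>
</ref>
"""

ref = ET.fromstring(JATS_REF)
citation = ref.find("element-citation")
doi = citation.findtext("pub-id[@pub-id-type='doi']")
print(citation.get("publication-type"), doi)
# -> software 10.5281/zenodo.1234567
```

A markup-validation script of this shape can flag references where the publication type is missing (or "other") or where the DOI element is absent even though the unstructured text contains one.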

Software and dataset citation use cases.
Tables 1-4 list the common use cases for software (Tables 1, 2) and dataset (Tables 3, 4) citations, the corresponding JATS XML elements, and the Crossref metadata deposit schema elements, to assist publishers with making the necessary adjustments to their production systems that result in proper machine-readable and actionable citations. The first use case for software citation (use case A in Table 1) and dataset citation (use case 1 in Table 3) represents the most desirable condition to support automated attribution and credit. Use cases not included are listed in the text below. In each table, we show the desired outcome for JATS XML version 1.3, which is available at https://www.niso.org/publications/z3996-2021-jats. The example used is <element-citation> but <mixed-citation> is equally acceptable. We also show the desired outcome for Crossref metadata deposit schema 5.3.1, which is available at https://www.crossref.org/documentation/schema-library/metadata-deposit-schema-5-3-1/.
Use cases not included in this guidance:
1. Citation to a physical location such as a museum.
2. Mixed content citation that includes, for example, software, data, and other digital objects where individual attribution to a specific item is difficult or not possible.
3. Physical samples using IGSN, RRID, or other persistent identifiers.
4. Executable environments, for example, MyBinder, Executable Research Article (ERA) 21, and Code Ocean.
5. Datasets associated with different persistent identifiers, for example journal supplemental files for a journal article where the journal article DOI is used for citation.
6. Datasets and software, together with an online software implementation associated with a publication, that are reviewed, published, and preserved long-term, where the data and software used are cited and the provenance documented: IPCC Atlas Chapter with WGI Interactive Atlas and WGI GitHub Repository.

Software and Data Citation - Crossref Metadata Schema examples.
Software Citation Example.
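A sketch of the kind of Crossref deposit fragment this section refers to: a `<citation>` entry in the article's `<citation_list>` carrying the software DOI in a `<doi>` tag alongside the human-readable reference string, per the "if there is a DOI, include the DOI XML tag" guidance above. All bibliographic details below are hypothetical; consult the Crossref metadata deposit schema 5.3.1 documentation for the authoritative element set.

```python
import xml.etree.ElementTree as ET

# Hypothetical Crossref <citation_list> fragment for a software reference.
# The <doi> tag is what lets Event Data link the article to the software.
CROSSREF_CITATION = """
<citation_list>
  <citation key="ref42">
    <doi>10.5281/zenodo.1234567</doi>
    <unstructured_citation>Smith, J. (2023). Example Analysis Toolkit
      (Version 1.2.0) [Software]. Zenodo.
      https://doi.org/10.5281/zenodo.1234567</unstructured_citation>
  </citation>
</citation_list>
"""

citation_list = ET.fromstring(CROSSREF_CITATION)
doi = citation_list.findtext("citation/doi")
print(doi)  # -> 10.5281/zenodo.1234567
```

The common failure mode described in this article is depositing only the `<unstructured_citation>` text: the human-readable reference survives, but without the `<doi>` tag the machine-actionable link is lost downstream.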

Use Content Negotiation on machine-actionable metadata from the repository landing page.
Benefit: Crossref and DataCite provide a tool that validates Digital Object Identifiers (https://citation.crosscite.org/docs.html) and returns information that includes the "type" of content registered, such as "data set" or "software." Content Negotiation often also works with references to URLs, especially when requesting Schema.org ("application/ld+json") metadata. Many repositories also have machine-actionable metadata in Schema.org format on their landing page, which makes it a universal way to request metadata and type (data/software/creative work).
Challenge: Often, content negotiation only works for DOIs registered with Crossref or DataCite. Additionally, for generalist repositories, the "type" information might not be accurate. Researchers depositing their data without support from a Data Manager will usually select the default type, misidentifying the files. The metadata formats of Crossref and DataCite differ, so an implementation for both formats is required. It may also be advisable to request Schema.org metadata.
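A minimal sketch of this technique, assuming the doi.org resolver honors an Accept header for Schema.org JSON-LD as described at citation.crosscite.org: the request is constructed but not sent here, and the type-classification helper maps a few common Schema.org types only; a production implementation would also need to handle the DataCite and Crossref metadata formats mentioned above.

```python
import json
import urllib.request

def content_negotiation_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a content-negotiated request for DOI metadata."""
    return urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/ld+json"},  # request Schema.org metadata
    )

def classify(jsonld: dict) -> str:
    """Map a Schema.org @type to the categories used in this article."""
    kind = jsonld.get("@type", "")
    if kind == "Dataset":
        return "data"
    if kind in ("SoftwareSourceCode", "SoftwareApplication"):
        return "software"
    return "creative work"

# Illustrative metadata of the shape a repository landing page might return.
sample = json.loads('{"@type": "SoftwareSourceCode", "name": "Example Toolkit"}')
print(classify(sample))  # -> software
```

Because generalist-repository "type" metadata is often defaulted (the Challenge above), the classification result is best treated as a hint to be confirmed during review, not as ground truth.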

Conclusion
Publishers today have a responsibility to the research community to establish and provide digital connections between the published article and the software and datasets that support that article. This is critical for research transparency, integrity, and reproducibility in today's data- and software-intensive science enterprise. Scientific communications that do not provide these connections can no longer be considered fully robust.
In implementing the changes required to support these responsibilities, many journals do not yet have policies and practices strong enough to ensure the necessary dataset and software citations are included in the references. Further, when such citations are included, journal guidance has often been murky on how the machine-actionable citations should be digitally formatted for downstream services to provide automated attribution and credit.
This article provides the much-needed guidance to help journals review their production processes and work with their authors, editors, reviewers, staff, and production teams to ensure high-quality dataset and software citations that support automated attribution.
The authors of this article are members of the FORCE11 Journals Task Force, representing many society, institutional, and commercial publishers worldwide. They encourage their peers to adopt these practices and help enable proper machine-actionable dataset and software citations.
Journals should use the Checklist of Best Practices for Journal Publishers, located in the Results Section, to start their journey of improving machine-actionable dataset and software citations.
We strongly encourage our colleagues in academic publishing to adopt these practices. Colleagues with questions, or wishing to share success stories, should contact the lead author.

Ensure that the entire production process team, such as copy editors, markup vendors, and staff, is aware of the unique aspects of software and dataset citations. This includes providing guidance, education, and support for questions. See the Software and Data Citation Resources Section for more information.
1. Review the submitted article to determine whether the software and datasets referenced in the Methods and Data or Methods and Materials Section appear to fully support the research, such that all have been accounted for.
2. Review the Availability Statement to ensure all the software and datasets referenced are included. Use the information provided to locate and inspect the software and datasets. Ensure that the links resolve and the content is as described. Request clarification as needed. Ensure the Availability Statement is complete. See Availability Statement Recommended Elements Section for more information.
3. Ensure the publication policies are current, using best practices for including software and dataset citations such as the Transparency and Openness Promotion (TOP) Guidelines 17 and the Research Data Policy Framework for All Journals and Publishers 18, and those that extend this work to include software.
4. Implement quality assurance processes for measuring software and data citation accuracy.
5. Establish quality controls, measures, and metrics to track consistency, and establish thresholds for taking action when the measures indicate a degradation in quality.
Automated verification activity - reference section (Publisher or third-party vendor responsibility)
1. Check software and dataset citations in the Reference Section. When checking citations, note that software and dataset citation formats can include repository names and version numbers. Guidance from FORCE11 and the Research Data Alliance (RDA), a community-driven initiative by the European Commission, the United States Government's National Science Foundation and National Institute of Standards and Technology, and the Australian Government's Department of Innovation, with the goal of building the social and technical infrastructure to enable open sharing and re-use of data (https://www.rd-alliance.org/about-rda), provides recommended formats. See the Software and Data Citation Resources Section for more information. The format for persistent identifiers for software and datasets can vary. DOIs are commonly registered through DataCite and potentially other DOI Registration Agencies. Consider using content negotiation to validate. Content Negotiation allows a user to request a particular representation of a web resource. DOI resolvers use content negotiation to provide different representations of metadata associated with DOIs. A content-negotiated request to a DOI resolver is much like a standard HTTP request, except that server-driven negotiation will take place based on the list of acceptable content types a client provides (https://citation.crosscite.org/docs.html).
2. Ensure that software and dataset citations are tagged accurately; avoid tagging them as "other". Refer to JATS for Reuse (JATS4R) guidance for data citations.

1. Register the article with Crossref and ensure metadata is sent to Crossref, including the full reference list (and the Availability Statement when added to the Crossref schema). Ensure all citations are included in the file going to Crossref and are not removed inadvertently. Use the Crossref Participation Report to check the publisher metadata overview of what is provided to Crossref. Consult the Initiative for Open Citations (I4OC, https://i4oc.org/) for community recommendations on open citations.
2. Use the Software and Dataset Citation Use Cases provided below as a guide to ensure coverage of most variations.

Table 1 .
Software Citation Use Cases - Software Citation with registered DOI & Software Citation with URL (not persistent).

Table 2 .
Software Citation Use Cases - Software Citation with Software Heritage Persistent Identifier (PID) & Software Citation with other PID.

Table 3 .
Data Citation Use Cases - Data Citation with registered DOI & Data Citation with unregistered DOI (DOI does not resolve).

Use the Availability Statement.
Benefit: A complete Availability Statement that has been validated through journal staff and/or peer review can be used to determine which citations in the Reference Section are data or software.
Challenge: Staff, reviewers, and authors need clear guidance on what should be included in the Availability Statement and examples of data and software citations. The publication process should include steps to review and provide guidance to authors on completing or improving the Availability Statement. See the Availability Statement Recommendations Section for a list of elements to include.

Table 4 .
Data Citation Use Cases - Data Citation with URL (not persistent) & Data Citation with non-DOI PID.