News & Comment

AI and the democratization of knowledge

The solution of the longstanding “protein folding problem” in 2021 showcased the transformative capabilities of AI in advancing the biomedical sciences. AI was characterized as successfully learning from protein structure data, which then spurred a more general call for AI-ready datasets to drive forward medical research. Here, we argue that it is the broad availability of knowledge, not just data, that is required to fuel further advances in AI in the scientific domain. This represents a quantum leap in a trend toward knowledge democratization that had already been developing in the biomedical sciences: knowledge is no longer primarily applied by specialists in a sub-field of biomedicine, but rather multidisciplinary teams, diverse biomedical research programs, and now machine learning. The development and application of explicit knowledge representations underpinning democratization is becoming a core scientific activity, and more investment in this activity is required if we are to achieve the promise of AI.

Christophe Dessimoz
Paul D. Thomas
CommentOpen Access05 Mar 2024
A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments

As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms. A well accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies to cloud-based computing environments that we call a Secure and Authorized FAIR Environment (SAFE). SAFE environments require data and platform governance structures and are designed to support the interoperability of sensitive or controlled access data, such as biomedical data. A SAFE environment is a cloud platform that has been approved through a defined data and platform governance process as authorized to hold data from another cloud platform and exposes appropriate APIs for the two platforms to interoperate.

Robert L. Grossman
Rebecca R. Boyles
Stan Ahalt
CommentOpen Access26 Feb 2024
Who owns (or controls) health data?

The ongoing debate on secondary use of health data for research has been renewed by the passage of comprehensive data privacy laws that shift control from institutions back to the individuals on whom the data was collected. Rights-based data privacy laws, while lauded by individuals, are viewed as problematic for the researcher due to the distributed nature of data control. Efforts such as the European Health Data Space initiative seek to build a new mechanism for secondary use that erodes individual control in favor of broader secondary use for beneficial health research. Health information sharing platforms do exist that embrace rights-based data privacy while simultaneously providing a rich research environment for secondary data use. The benefits of embracing rights-based data privacy to promote transparency of data use along with control of one’s participation builds the trust necessary for more inclusive/diverse/representative clinical research.

Scott D. Kahn
Sharon F. Terry
CommentOpen Access01 Feb 2024
A General Primer for Data Harmonization

Data harmonization is an important method for combining or transforming data. To date however, articles about data harmonization are field-specific and highly technical, making it difficult for researchers to derive general principles for how to engage in and contextualize data harmonization efforts. This commentary provides a primer on the tradeoffs inherent in data harmonization for researchers who are considering undertaking such efforts or seek to evaluate the quality of existing ones. We derive this guidance from the extant literature and our own experience in harmonizing data for the emergent and important new field of COVID-19 public health and safety measures (PHSM).

Cindy Cheng
Luca Messerschmidt
Joan Barceló
CommentOpen Access31 Jan 2024
Enhancing radiomics and Deep Learning systems through the standardization of medical imaging workflows

Recent advances in computer-aided diagnosis, treatment response and prognosis in radiomics and deep learning challenge radiology with requirements for world-wide methodological standards for labeling, preprocessing and image acquisition protocols. The adoption of these standards in the clinical workflows is a necessary step towards generalization and interoperability of radiomics and artificial intelligence algorithms in medical imaging.

Miriam Cobo
Pablo Menéndez Fernández-Miranda
Lara Lloret Iglesias
CommentOpen Access21 Oct 2023
Journal Production Guidance for Software and Data Citations

Software and data citation are emerging best practices in scholarly communication. This article provides structured guidance to the academic publishing community on how to implement software and data citation in publishing workflows. These best practices support the verifiability and reproducibility of academic and scientific results, sharing and reuse of valuable data and software tools, and attribution to the creators of the software and data. While data citation is increasingly well-established, software citation is rapidly maturing. Software is now recognized as a key research result and resource, requiring the same level of transparency, accessibility, and disclosure as data. Software and data that support academic or scientific results should be preserved and shared in scientific repositories that support these digital object types for discovery, transparency, and use by other researchers. These goals can be supported by citing these products in the Reference Section of articles and effectively associating them to the software and data preserved in scientific repositories. Publishers need to markup these references in a specific way to enable downstream processes.

Shelley Stall
Geoffrey Bilder
Timothy Clark
CommentOpen Access26 Sept 2023
Shared metadata for data-centric materials science

The expansive production of data in materials science, their widespread sharing and repurposing requires educated support and stewardship. In order to ensure that this need helps rather than hinders scientific work, the implementation of the FAIR-data principles (Findable, Accessible, Interoperable, and Reusable) must not be too narrow. Besides, the wider materials-science community ought to agree on the strategies to tackle the challenges that are specific to its data, both from computations and experiments. In this paper, we present the result of the discussions held at the workshop on “Shared Metadata and Data Formats for Big-Data Driven Materials Science”. We start from an operative definition of metadata, and the features that a FAIR-compliant metadata schema should have. We will mainly focus on computational materials-science data and propose a constructive approach for the FAIRification of the (meta)data related to ground-state and excited-states calculations, potential-energy sampling, and generalized workflows. Finally, challenges with the FAIRification of experimental (meta)data and materials-science ontologies are presented together with an outlook of how to meet them.

Luca M. Ghiringhelli
Carsten Baldauf
Matthias Scheffler
CommentOpen Access14 Sept 2023
Incentivising open ecological data using blockchain technology

Robert John Lewis
Kjell-Erik Marstein
John-Arvid Grytnes
CommentOpen Access07 Sept 2023
FAIR for AI: An interdisciplinary and international community building perspective

A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

E. A. Huerta
Ben Blaiszik
Ruike Zhu
CommentOpen Access26 Jul 2023
FAIR High Content Screening in Bioimaging

The Minimum Information for High Content Screening Microscopy Experiments (MIHCSME) is a metadata model and reusable tabular template for sharing and integrating high content imaging data. It has been developed by combining the ISA (Investigations, Studies, Assays) metadata standard with a semantically enriched instantiation of REMBI (Recommended Metadata for Biological Images). The tabular template provides an easy-to-use practical implementation of REMBI, specifically for High Content Screening (HCS) data. In addition, ISA compliance enables broader integration with other types of experimental data, paving the way for visual omics and multi-Omics integration. We show the utility of MIHCSME for HCS data using multiple examples from the Leiden FAIR Cell Observatory, a Euro-Bioimaging flagship node for high content screening and the pilot node for implementing Findable, Accessible, Interoperable and Reusable (FAIR) bioimaging data throughout the Netherlands Bioimaging network.

Rohola Hosseini
Matthijs Vlasveld
Katherine J. Wolstencroft
CommentOpen Access17 Jul 2023
What prevents us from reusing medical real-world data in research

Medical real-world data stored in clinical systems represents a valuable knowledge source for medical research, but its usage is still challenged by various technical and cultural aspects. Analyzing these challenges and suggesting measures for future improvement are crucial to improve the situation. This comment paper represents such an analysis from the perspective of research.

Julia Gehrmann
Edit Herczog
Oya Beyan
CommentOpen Access13 Jul 2023
Ten lessons for data sharing with a data commons

A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past decade, a number of data commons have been developed and we discuss some of the lessons learned from this effort.

Robert L. Grossman
CommentOpen Access06 Mar 2023
Addressing barriers in FAIR data practices for biomedical data

Laura D. Hughes
Ginger Tsueng
Lars Pache
CommentOpen Access23 Feb 2023
An agenda for addressing bias in conflict data

With increased availability of disaggregated conflict event data for analysis, there are new and old concerns about bias. All data have biases, which we define as an inclination, prejudice, or directionality to information. In conflict data, there are often perceptions of damaging bias, and skepticism can emanate from several areas, including confidence in whether data collection procedures create systematic omissions, inflations, or misrepresentations. As curators and analysts of large, popular data projects, we are uniquely aware of biases that are present when collecting and using event data. We contend that it is necessary to advance an open and honest discussion about the responsibilities of all stakeholders in the data ecosystem – collectors, researchers, and those interpreting and applying findings – to thoughtfully and transparently reflect on those biases; use data in good faith; and acknowledge limitations. We therefore posit an agenda for data responsibility considering its collection and critical interpretation.

Erin Miller
Roudabeh Kishi
Caitriona Dowd
CommentOpen Access30 Sept 2022
Data from the MOSAiC Arctic Ocean drift experiment

The Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) is a multinational interdisciplinary endeavor of a large earth system sciences community.

Stephan Frickenhaus
Daniela Ransby
Marcel Nicolaus
CommentOpen Access15 Sept 2022
From biomedical cloud platforms to microservices: next steps in FAIR data and analysis

The biomedical research community is investing heavily in biomedical cloud platforms. Cloud computing holds great promise for addressing challenges with big data and ensuring reproducibility in biology. However, despite their advantages, cloud platforms in and of themselves do not automatically support FAIRness. The global push to develop biomedical cloud platforms has led to new challenges, including platform lock-in, difficulty integrating across platforms, and duplicated effort for both users and developers. Here, we argue that these difficulties are systemic and emerge from incentives that encourage development effort on self-sufficient platforms and data repositories instead of interoperable microservices. We argue that many of these issues would be alleviated by prioritizing microservices and access to modular data in smaller chunks or summarized form. We propose that emphasizing modularity and interoperability would lead to a more powerful Unix-like ecosystem of web services for biomedical analysis and data retrieval. We challenge funders, developers, and researchers to support a vision to improve interoperability through microservices as the next generation of cloud-based bioinformatics.

Nathan C. Sheffield
Vivien R. Bonazzi
Andrew D. Yates
CommentOpen Access08 Sept 2022
Capturing the COVID-19 Crisis through Public Health and Social Measures Data Science

In response to COVID-19, governments worldwide are implementing public health and social measures (PHSM) that substantially impact many areas beyond public health. The new field of PHSM data science collects, structures, and disseminates data on PHSM; here, we report the main achievements, challenges, and focus areas of this novel field of research.

Cindy Cheng
Amélie Desvars-Larrive
Sophia Alison Zweig
CommentOpen Access26 Aug 2022
Recommendations for repositories and scientific gateways from a neuroscience perspective

Digital services such as repositories and science gateways have become key resources for the neuroscience community, but users often have a hard time orienting themselves in the service landscape to find the best fit for their particular needs. INCF has developed a set of recommendations and associated criteria for choosing or setting up and running a repository or scientific gateway, intended for the neuroscience community, with a FAIR neuroscience perspective.

Malin Sandström
Mathew Abrams
Wojtek J. Goscinski
CommentOpen Access16 May 2022
PET-BIDS, an extension to the brain imaging data structure for positron emission tomography

The Brain Imaging Data Structure (BIDS) is a standard for organizing and describing neuroimaging datasets, serving not only to facilitate the process of data sharing and aggregation, but also to simplify the application and development of new methods and software for working with neuroimaging data. Here, we present an extension of BIDS to include positron emission tomography (PET) data, also known as PET-BIDS, and share several open-access datasets curated following PET-BIDS along with tools for conversion, validation and analysis of PET-BIDS datasets.

Martin Norgaard
Granville J. Matheson
Melanie Ganz
CommentOpen Access02 Mar 2022
Monitoring non-pharmaceutical public health interventions during the COVID-19 pandemic

Measuring and monitoring non-pharmaceutical interventions is important yet challenging due to the need to clearly define and encode non-pharmaceutical interventions, to collect geographically and socially representative data, and to accurately document the timing at which interventions are initiated and changed. These challenges highlight the importance of integrating and triangulating across multiple databases and the need to expand and fund the mandate for public health organizations to track interventions systematically.

Yannan Shen
Guido Powell
David L. Buckeridge
CommentOpen Access24 Aug 2021