Imaging is now used globally as a method of quantitative measurement of biological and biomedical structure, composition and dynamics in the life and biomedical sciences. Imaging technology is rapidly evolving, with new modalities and applications appearing that enable new insights and discoveries (refs. 1,2). These innovations present challenges at several different but interdependent levels. Sourcing and retaining expert research technology professionals (‘imaging scientists’), providing initial and ongoing training in advanced technologies, rapidly disseminating and offering easy access to innovative methods and applications, publishing reproducible experiments, and managing and analyzing data are all global issues experienced by academic and industrial research labs and institutions. Global BioImaging (https://globalbioimaging.org) was founded to meet these challenges and, wherever possible, to use the spirit of cooperation across international boundaries to disseminate best practices, develop common imaging and data standards that promote data sharing, and develop world-class training programs and tools for imaging scientists.

Global BioImaging has held annual ‘Exchange of Experience’ meetings (https://www.globalbioimaging.org/exchange-of-experience) since 2016. These meetings are open to all and seek to ensure the widest possible engagement with the worldwide imaging scientist community. So far, in-person meetings have been hosted by imaging communities in Europe, India, Australia and Singapore, and in 2020 an online meeting was hosted by Japan’s bioimaging community (Table 1). The meeting agendas, international working groups and informal discussions have repeatedly emphasized the need for standards for image data formats and public data resources. With rapid innovations in light-sheet microscopy, multiplex tissue imaging, spatial profiling of single-cell transcriptomes, mass spectrometry–based imaging, correlative imaging techniques, molecular imaging, advanced forms of microscopy-based spectroscopy (fluorescence correlation, Raman, hyperspectral) and several others, data complexity and dimensionality are increasing, which makes the need for open, common methods for recording imaging metadata even greater. Moreover, with the establishment and growth of public image data repositories, proposals for common metadata standards are now emerging. It is essential that we define the specifications and usability requirements for data standards and repositories so that the global community of individual labs, core facilities, large multicenter projects and public data resources have the solutions they need to enable interrogation, analysis, sharing and publication of this new generation of datasets. Global BioImaging’s partners (Table 1) have observed these challenges across all boundaries of geography and scientific domain, and therefore have come together to try to identify universally relevant solutions.

Table 1 Global BioImaging partners and participating national and international initiatives

Target audiences for Global BioImaging recommendations

The recommendations constructed and disseminated by the Global BioImaging community target a broad range of constituencies and community members. We aim to support and ideally influence imaging scientists: central facility staff and managers in both academic and industrial research laboratories who deliver technical know-how and best practices to their experimental science colleagues, or who implement novel approaches and develop methods building on such best practices. However, Global BioImaging also seeks to influence journal editors and research funders, who have an important role in defining policy, practice and implementation. Journals have repeatedly contributed to the use of open data standards by requiring that papers submitted for publication adopt domain-specific data deposition standards (for example, deposition of sequence data in ArrayExpress or the European Nucleotide Archive and deposition of structural data in the Protein Data Bank). Funders contribute by conditioning funding awards on the use and adoption of data standards and, where appropriate, the deposition of datasets in open repositories. Finally, Global BioImaging seeks to engage with the commercial imaging community, which builds and delivers most of the equipment used by imaging scientists. It is essential that any recommendations can ultimately be adopted by commercial technology developers so they can be distributed as widely as possible. Global BioImaging’s recommendations are constructed so that they can be easily appreciated and incorporated by a wide cross-section of the scientific community.

Serving a diverse collection of national communities

Global BioImaging uses its international training and staff exchange programs and Exchange of Experience meetings to share expertise and know-how between imaging communities and domains while also developing an understanding and appreciation of the disparate levels of funding, installed technology and scientific priorities in different countries and international regions. A key theme in the Exchange of Experience meetings is the establishment of recommendations for defining and adopting standards (such as for image data, quality management, impact assessment, training curricula for facility staff), which may be especially important in new and developing bioimaging communities. After several discussions that have included all Global BioImaging partners and represented bioimaging communities at many different levels of development, we are convinced that international guidelines and standards will encourage the design of high-quality experiments that are robust and reproducible. This will also drive the creation of substantial educational resources that shape and contribute to undergraduate and postgraduate curricula and training. In addition, there is a significant responsibility on the part of universities to ensure that the teaching and training around technologies and tools are up to date, reproducible and accurate. Global BioImaging is committed to supporting and favoring the development and adoption of data standards and resources in regions at different stages of development. For instance, in South America the situation is diverse. While countries like Chile, Argentina and Brazil have implemented national bioimaging networks and even a few examples of biomedical data resources, countries such as Uruguay, Colombia or Peru are in earlier stages. The most important challenge for this region is the limited funding to implement and develop national and regional data resources.
A successful example of a regional joint effort is the Centro de Biología Estructural del Mercosur (CeBEM) and its European partner Instruct-ERIC, which have constructed a data repository in South America as part of a global effort (Structural Biology Data Grid (ref. 3), http://data.sbgrid.org). This repository is an example of resource development that is targeted toward the needs of a specific region, but that is informed by and adopts globally recognized standards. Global BioImaging aims to use a similar process to help grow public bioimaging data resources that meet the needs of national imaging communities in Latin America and elsewhere.

Moreover, Global BioImaging recommendations will support the growing international network of research infrastructure providers, who are increasingly responsible for offering guidance on best practices, developing and implementing processes, or making decisions on behalf of, and in partnership with, their user communities. This includes both facilities based on physical instrumentation for capturing scientific experiments, and e-research or cyberinfrastructure facilities that are responsible for the data management and analysis environments. By cooperating with communities with expertise in other fields—for example, in collaboration with the RI-VIS project (https://ri-vis.eu/network/rivis/Home)—Global BioImaging can disseminate its experience to a broader audience, increase visibility of research infrastructures and promote interdisciplinarity. Ultimately our goal is to leverage the richness and complexity of image data for new directions in research and training—for example, for the application of new artificial intelligence (AI) methods for object recognition, tracking, or correlation of multiscale datasets.

For these reasons, Global BioImaging partners are constructing international recommendations alongside their participating bioimaging communities, which represent scientific communities from around the globe and include imaging scientists and staff, universities and research institutions, health care providers, commercial entities, national funders and science-policy makers.

Building on existing experience

The range of imaging modalities and applications reflects the spread and dominance of imaging as a critical technology in the physical, biological, biomedical and life sciences. This diversity demonstrates the power of imaging but also creates several challenges. In particular, the huge number of data formats that are used across many different modalities inhibits access to and exchange of datasets among scientists for reproducible research and collaborative projects, between different imaging applications and across research domains.

It is impractical to suppose or recommend that a single data format can satisfy the wide range of imaging applications used by the global community of imaging scientists. Thus, we have developed a series of specifications and recommendations for potential standards that Global BioImaging members, imaging scientists, journals, technology manufacturers, funders and institutions may adopt and use in the future. These recommendations are built on the successful use of standards in imaging communities: DICOM (Digital Imaging and Communications in Medicine, https://www.dicomstandard.org), OME-TIFF (ref. 4), imzML (ref. 5), NIfTI (https://nifti.nimh.nih.gov) and many others. These community standards have had different levels of success depending on the quality of implementation and maintenance of the format. We propose to use routes to wide adoption that have been successful in other data-intensive life sciences fields (for example, genomics, transcriptomics, neuroimaging and structural biology). Specifically, by linking the recommendations for standards to the requirements of data repositories, we aim to build a powerful framework that defines formats for data acquisition and analysis, but also for data deposition in public resources at time of publication. Critically, Global BioImaging’s recommendations must be adoptable by the worldwide bioimaging community and respect the different levels of development of different imaging modalities, communities and countries. In the sections below we detail our current level of experience and recommendations for implementing and adopting standards for imaging data.

Recommendations for data format standards

Below we outline the characteristics of useful, usable data standards. These guidelines can be used by scientists, infrastructure providers, commercial suppliers, funders and journal editors to assess the utility of data standards proposed by scientific groups, national programs or transnational collaborations. These recommendations reflect the requirements that are increasingly being adopted by other communities (ref. 6).

  1.

    Openness. Any proposed data format must be openly available, supported by accessible, versioned and editable specification(s), implementations and documentation. Specifications and other related documents must be easily accessible from a URL or other publicly available online resource, following the FAIR principles (Findable, Accessible, Interoperable and Reusable) formulated by the Force11 group (https://www.force11.org/group/fairgroup/fairprinciples). It is insufficient for documents and specifications to be supplied only on demand.

  2.

    Implementation. Any proposed format should be supported by open source, publicly available, up-to-date software, with well-defined specifications, that provides read and write functions for the format, preferably in multiple, community-adopted programming environments (for example, Java, Python and C++). These implementations should include an application programming interface (API) and an open-source reference implementation, so they can be easily adopted and included in third-party software. It is useful for the read functions to be incorporated into a validator—an application that can be used to read a file and assess how well it adheres to the standard. Software libraries that meet these requirements will serve as ‘reference implementations’ for these formats: that is, public tools that implement community-agreed guidelines and specifications and can be adopted and used by the broad target audience defined by Global BioImaging.

  3.

    Examples. Usage and adoption of a proposed data format standard will be catalyzed by openly available examples—real data stored in the format. These are useful references for anyone wishing to adopt and use the format and also can serve as tools for testing and validating software that reads and/or writes the format. For each version of the format specification, up-to-date examples should be provided.

  4.

    Licensing. All data-standard resources, including documentation, specifications, implementations and example datasets, should be licensed using an appropriate community-agreed license (one set of examples is the Creative Commons licenses, for example CC0 or CC-BY). Licenses that forbid commercial use often inhibit adoption by industrial research labs and commercial technology providers and should be avoided. Software for reading and writing data formats should be licensed under a permissive software license (for example, BSD, MIT or similar) to promote adoption by users from across the bioimaging community.

  5.

    Data types. There are many different data types covering a multitude of different applications, domains, imaging modalities, and spatial and temporal scales. Any proposed standard will likely only cover one or at most a few applications or domains. The expected types of data supported by the standard should be stated clearly in any documentation. In addition, the types of data supported—for example, metadata related to experimental or case manipulations, image data acquisition, data processing and analytic outputs—should be clear, easy to understand for any user, documented, and usable for search and data management applications.

  6.

    Governance or change management. For a scientific standard to stay relevant while ensuring transparency, it needs a mechanism or structure for decision-making and change management. Due to the varying types of standards, their reaches, and differences across their adoptive communities, a governance or change management policy and process could take many forms. The most critical attributes are transparency and strong community engagement.

  7.

    Adoption. For a standard to be considered suitable, it should be adopted beyond an individual research laboratory, institution or geographic locale. As imaging is rapidly evolving, it is likely new candidates for standards will emerge. This is necessary and even healthy in a field with rapid innovation, but viable candidates should follow the recommendations listed above.
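The validator idea in recommendation 2 can be illustrated with a deliberately small sketch. It assumes a TIFF-based format and checks only the fixed file header (byte-order mark, magic number and, for classic TIFF, the offset of the first image file directory). The function name `check_tiff_header` is hypothetical, and a real validator for a format such as OME-TIFF would also need to check tags, embedded metadata and pixel data against the full specification.

```python
import struct


def check_tiff_header(data: bytes) -> list[str]:
    """Return a list of problems found in the first 8 bytes; empty means none found."""
    if len(data) < 8:
        return ["file is shorter than the 8-byte TIFF header"]
    byte_order = data[:2]
    if byte_order == b"II":
        prefix = "<"  # little-endian
    elif byte_order == b"MM":
        prefix = ">"  # big-endian
    else:
        return [f"unknown byte-order mark {byte_order!r}"]
    problems = []
    (magic,) = struct.unpack(prefix + "H", data[2:4])
    if magic == 42:
        # Classic TIFF: bytes 4-7 hold the offset of the first image file directory.
        (first_ifd,) = struct.unpack(prefix + "I", data[4:8])
        if first_ifd < 8:
            problems.append(f"first IFD offset {first_ifd} points inside the header")
    elif magic != 43:
        # 43 marks BigTIFF, which this sketch accepts without further checks.
        problems.append(f"unexpected magic number {magic}")
    return problems


# A hand-built little-endian classic TIFF header passes the check.
header = b"II" + struct.pack("<HI", 42, 8)
print(check_tiff_header(header))  # prints []
```

Returning a list of problems rather than a single pass/fail flag lets the same routine report how well a file adheres to the specification, which is the role the recommendation gives to a validator.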

Data repositories

Commonly shared open datasets have repeatedly proven to be essential for the development of analytic and processing tools for data across the sciences. Open science initiatives are becoming more widely accepted by the scientific community, and open access to research data is often required by private, national and transnational funding agencies (ref. 7). In the life and biomedical sciences, the commitment of the genomics community to rapidly publish genomic sequence data (ref. 8) was the basis for the development and growth of the modern field of bioinformatics. Global BioImaging aims to catalyze a similar development of bioimage informatics and data analytics by encouraging and supporting the construction, sustainability and continuous availability of repositories for imaging data.

Imaging datasets are rich, heterogeneous and often large. Until recently, most image data repositories published datasets from single projects, making large strategic datasets available for query and download. However, in the last 10 years, several repositories have appeared that integrate datasets from independent peer-reviewed studies, enabling datasets from electron microscopy, high-content screening, bright-field and multidimensional fluorescence microscopy, histology, magnetic resonance imaging, positron emission tomography and ultrasound to be published and accessed online, usually through a web-browser-based interface and sometimes through appropriate APIs.

Recommendations for open access image data repositories

Table 2 lists several public data resources used by Global BioImaging’s scientists. The appearance and growth of these and other resources demonstrate that many of the barriers to managing and publishing large collections of images have been overcome. We have therefore defined key, specific recommendations that should be implemented to ensure this momentum continues and preferably grows.

  1.

    Metadata specifications for submission. The value of published imaging datasets can only be realized if they are accompanied by metadata that describe type and state of sample, experimental manipulations, imaging technology, conditions and probes, and any analytic results derived from the data. The value of capturing metadata as completely as possible must be weighed against the practicality of capturing experimental and analytic outputs from research laboratories. As noted above, there are several established metadata-rich formats (for example, DICOM, OME-TIFF, NIfTI), but the complexity of case, tissue, disease, sample and imaging modality metadata has defied full standardization, especially in the research setting, where innovative experiments and technologies often challenge previously used definitions and concepts (ref. 9). New web-based metadata technologies like JSON-LD, which is now a formal specification from the World Wide Web Consortium, may provide a way to implement a flexible metadata specification in a common language. Nonetheless, in our experience, the easiest, most commonly used data format for research metadata is the spreadsheet, so public data resources will need to take a flexible, practical approach to capturing the broad range of metadata required to document and reproduce innovative experiments. Moreover, the increasing number of image data repositories may result in an equivalent number of metadata submission templates, causing confusion for data submitters and future data users.

    The developing image data resources should engage with the bioimaging community to define, as far as possible, a common metadata specification that is shared across repositories; updated on a regular, predictable basis; and relatively easy for data submitters to use, fill out and submit. The bioimaging community should collaborate to define consistent ontologies for metadata. To minimize any extra workload on the part of the researcher, as far as possible the metadata should be harvested from the instrument, preferably at the time of acquisition. Here, Global BioImaging can serve to consolidate communication of requirements between bioimaging scientists and commercial technology developers.

  2.

    Components of the bioimage data ecosystem. The collection, annotation, storage, integration and publication of biological datasets are well established, with many resources having reached maturity and stability. These existing resources serve as models from which the imaging community can learn useful and successful design and construction patterns (ref. 10).

    An approach that has proven successful in several other fields is to construct two separate data resources. The first, an archive or repository, holds and serves all data associated with publications and stores data files and a limited amount of metadata. Data can be browsed, found using search indices and downloaded, but higher level annotation, integration and processing is not attempted, so that the archive can keep pace with the rate of data submissions. Archives hold datasets that are as close to the primary data produced by the instrument as possible, and they should be immutable. A second type of resource, an added-value database (AVDB), incorporates datasets from the archive, performs curation and integration, and seeks to enrich data and enable discovery with the datasets it holds (‘reference datasets’; see ref. 11). The separation between the construction and operation of archives and AVDBs facilitates an efficient data intake workflow and allows curation at a sufficient level to enable data reuse and discovery.

    Significant steps toward the establishment of a mature, usable bioimage data ecosystem have been achieved (Table 2). Image databases that collect and curate multidimensional bioimaging data in electron microscopy and cell and tissue light microscopy, several organ-specific resources, and biomedical image data repositories are now funded, available, and accepting and publishing terabyte-scale datasets. The launch of the BioImage Archive (https://www.ebi.ac.uk/bioimage-archive/) by the European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) in July 2019 provided a central resource for the biological community and a common cross-domain foundation for existing and future AVDBs, such as those listed in Table 2. Data pipelines from the BioImage Archive are being developed to connect to and feed AVDBs such as the Electron Microscopy Public Image Archive (EMPIAR) and the Cell and Tissue Image Data Resources (IDRs). Medical imaging communities are actively exploiting a dedicated image archive platform (XNAT, http://www.xnat.org) and developing tools for easy integration with the BioImage Archive and other databases. In the future, capabilities will be available to connect other AVDBs that will enhance the scientific value of the archived images through curation, integrative analysis and the development of new analytical methods for cross-interrogation and information retrieval among multidomain AVDBs. Global BioImaging strongly endorses these steps and looks forward to contributing to and deriving value from these public resources.

  3.

    Requirements for AVDBs for AI applications. As AVDBs grow and mature, the well-annotated datasets they hold may be valuable training datasets for advanced AI applications, including tools that use deep learning. However, in discussions with members of Global BioImaging who run AVDBs, there is a shared sense that we lack clear, definitive requirements for how training datasets should be constructed, how annotations (‘labels’) should be formatted, or which datasets should be prioritized for formatting for AI uses. We recommend that custodians of AVDBs work with AI experts to define these and other requirements in order to rapidly expand the usage of bioimaging datasets for AI applications. This should include standards for linking the imaging data to other relevant data from the same subject or sample, such as genetic data and biochemical, clinical or behavioral results.

    Moreover, there are clearly strong opportunities for applying AI techniques to microscopy and imaging problems (refs. 12,13). A lack of community consensus across these attributes will impede the translation of AI image analysis techniques from laboratory to application, particularly given the growing demands for greater transparency of AI operative heuristics and legislation for the right to an explanation of algorithmic decision making.

  4.

    Authentication for submissions and data access. As archives and AVDBs grow, the number of submissions they receive will increase, and the number of authors submitting datasets will also increase. Authentication of author identity, affiliation and other critical information will therefore inevitably become an essential part of the data submission workflow. Several public identifier and authentication projects, including ORCID (https://orcid.org/), Elixir Authentication and Authorization Infrastructure (AAI, https://www.elixir-europe.org/services/compute/aai), Life Science Authentication and Authorization Infrastructure (LS AAI, https://tnc18.geant.org/getfile/4229) and Australian Access Federation (AAF, https://aaf.edu.au/), are building identification policies and resolution systems to ensure all members of the scientific community are associated with a unique identifier and to provide services to resources like the imaging archives and AVDBs for user identification and authorization.

    LS AAI is an extensive collaborative project whereby several life science research infrastructures have together defined requirements for a common AAI, developed under the overarching blueprint of the Authentication and Authorisation for Research and Collaboration (AARC) initiative (https://aarc-project.eu/). LS AAI is being implemented within the European Open Science Cloud (EOSC)-Life project (http://www.eosc-life.eu/) and is expected to be widely used by the life science community. In another example, the AAF provides a federated web-login service that allows researchers to access a broad variety of Australian research-focused web services through their university credentials.

    We recommend that those involved in data services develop a task force to research current and ongoing work, standardize authentication practice and initiate proof-of-concept projects to assess the usage and usability of the various authentication systems that are coming online. In the long term, a truly global identification and authentication system could be extended to identify instruments and the datasets they collect.

  5.

    Trustworthy research data resources. The complexity of acquisition techniques, experiments and the resulting research data is increasing, and this challenges data archives and AVDBs, and ultimately the ability to reproduce experiments or reuse data. In response, there are developing initiatives to assess and declare the quality of public data resources, using criteria of openness, sustainability and the adoption of community standards. These efforts are international and extend across a broad range of scientific domains. Examples include http://FAIRsharing.org, which provides a catalog and characteristics of databases, data standards and other public resources (ref. 14), and CoreTrustSeal’s Core Trustworthy Data Repositories Requirements (https://www.coretrustseal.org/), which provides a list of requirements that are deemed mandatory for a trustworthy data repository. In Australia the National Imaging Facility is building a trusted data resource to serve the needs of its national community (ref. 15). In the European Union, EOSC-Life is constructing a trusted, sustainable open data resource infrastructure for the life sciences (https://www.eosc-life.eu/). These efforts aim to increase reproducibility and repeatability of experiments, enhance researchers’ understanding of the data, make processing pipelines more accessible and easily comprehensible, and strengthen data provenance.

  6.

    Human identifiable data. A key issue for data resources is the methods for and policies toward treatment of personally identifiable data, and/or datasets derived from individuals or their biological materials (ref. 16). There are several active efforts to define guidelines for both ethics and best practices in the sharing and publication of these data. For example, guidelines published by the Global Alliance for Genomics and Health (https://www.ga4gh.org/genomic-data-toolkit/regulatory-ethics-toolkit/) provide a useful, established framework for the developing bioimage data ecosystem. As bioimage data resources will undoubtedly link to and/or integrate genomics and other datasets, their adoption of these guidelines is likely to be the most sensible and efficient way to handle these valuable datasets.
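Recommendation 1 above notes that spreadsheets are the most commonly used format for research metadata while JSON-LD may offer a flexible common language. As a minimal sketch of how the two could be bridged, the following converts rows of a submission spreadsheet (exported as CSV) into JSON-LD documents. The column names, the mapping table and the sample row are hypothetical and made up for illustration, and the choice of the schema.org `Dataset` vocabulary is an assumption, not a specification endorsed by any repository.

```python
import csv
import io
import json

# Hypothetical spreadsheet columns mapped to schema.org Dataset properties.
COLUMN_TO_TERM = {
    "title": "name",
    "summary": "description",
    "imaging_method": "measurementTechnique",
    "license_url": "license",
}


def rows_to_jsonld(csv_text: str) -> list[dict]:
    """Turn each spreadsheet row into one JSON-LD document, skipping empty cells."""
    documents = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {"@context": "https://schema.org/", "@type": "Dataset"}
        for column, term in COLUMN_TO_TERM.items():
            value = (row.get(column) or "").strip()
            if value:
                doc[term] = value
        documents.append(doc)
    return documents


# Made-up example row, for illustration only.
sample = (
    "title,summary,imaging_method,license_url\n"
    "Example time-lapse,Dividing cells imaged over 12 h,light-sheet microscopy,"
    "https://creativecommons.org/licenses/by/4.0/\n"
)
print(json.dumps(rows_to_jsonld(sample), indent=2))
```

Keeping the column-to-term mapping in one table means that a shared template could change its column names without changing the published metadata vocabulary, which is one way to limit the proliferation of incompatible submission templates.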

Table 2 Examples of public image data archives and AVDBs

Future directions

Looking forward, we see several challenges and opportunities for imaging data standards and image publication resources. Most formats will not perform well in cloud-based storage technologies (‘object storage’) that treat files as single monolithic entities and do not support accessing parts or ‘chunks’ of files. Therefore a new generation of binary and metadata storage technologies will be required. Whole-tissue or whole-body profiling projects, for example the Human Biomolecular Atlas Program (ref. 17), are creating datasets that far exceed the capabilities of the current generation of file formats and resources. Support for new types of metadata that integrate experimental protocols, organism metadata, common coordinate frameworks, analytic results and derived models is urgently required. In addition, the increasing need to integrate information derived from biomedical and medical images in combination with clinical data into innovative healthcare workflows will be a key challenge in the future and will require modern, open, developer-friendly interoperability standards, such as the Fast Healthcare Interoperability Resources (FHIR, https://www.hl7.org/fhir). International communities are now working to establish standards for quality management, including protocols and recommendations for biological imaging, that will integrate with data standards and, ideally, will be included in deposition requirements for data repositories (ref. 18). Finally, the application of machine-learning-based models for object recognition and segmentation will require wholly new capabilities in data resources so that well-annotated models can be published and shared. We recommend that academic and commercial technology developers, funding agencies and experimental and computational users of these resources specify and begin to construct the data technologies required for the next generation of imaging experiments.
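The point about object storage and ‘chunks’ can be made concrete with a small sketch. It assumes an image that has been split into fixed-size chunks, each stored as its own object, and works out which chunks a reader would need to fetch for a requested region. The function names and the key layout (`prefix/row.column`) are hypothetical and do not describe any particular format.

```python
from itertools import product


def chunks_for_region(start, stop, chunk_shape):
    """List the chunk indices that overlap the half-open region [start, stop)."""
    per_axis = []
    for lo, hi, size in zip(start, stop, chunk_shape):
        if hi <= lo:
            return []  # empty region along this axis
        per_axis.append(range(lo // size, (hi - 1) // size + 1))
    return list(product(*per_axis))


def chunk_key(prefix, index):
    """Build a hypothetical object-store key for one chunk."""
    return prefix + "/" + ".".join(str(i) for i in index)


# Rows 100-299 and columns 0-49 of an image stored in 256 x 256 chunks.
needed = chunks_for_region((100, 0), (300, 50), (256, 256))
print([chunk_key("image", idx) for idx in needed])  # prints ['image/0.0', 'image/1.0']
```

In this example only two objects need to be retrieved, however large the full image is, whereas a monolithic file in the same store would have to be downloaded in its entirety before the region could be read.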

Conclusion

Standardized data formats and public data resources are a critical next step for the fields of biological and biomedical imaging. The appearance of several open data formats and data repositories has demonstrated that the technology and know-how exist to build these resources. The members of Global BioImaging agree that the next step is to drive adoption by all members of the scientific community, but in particular funders and journals, who can mandate the use of open formats and data deposition as a condition of funding or acceptance of scientific publications. We have outlined the characteristics of standards that can be used by these critical stakeholders to assess the quality of proposed open formats and data repositories. These recommendations can help catalyze the development and adoption of resources for open, accessible, reusable bioimaging data.