Spectacular advances in light and electron microscopy1,2 are rapidly transforming the life sciences. For instance, scientists are now able to image molecular complexes at atomic resolution3,4,5, follow the fates of individual molecules in a living cell, and image the development of an organism starting from a single fertilized cell6,7. These imaging technologies are generating large amounts of complex data, the interpretation of which often requires sophisticated analyses, as in other ‘omics’ technologies. Moreover, most advanced imaging technologies are expensive, while the biological samples used in the experiments may be unique. To maximize the use of the generated data and to realize the full potential of the advances in biological imaging, these datasets need to be made available to other researchers in a timely manner, consistent with the FAIR principles—findable, accessible, interoperable and reusable8—and thus amenable to reuse.

Around the world, there are efforts to develop informatics systems for making different types of microscopy data available to the community. Sharing cryo-electron microscopy (cryo-EM) data is already quite advanced (Box 1), while sharing light microscopy data is still at an early stage. In Europe, a research infrastructure for biological and biomedical imaging called Euro-BioImaging has recently been established and is developing imaging data management and publishing solutions such as Cell-IDR and Tissue-IDR9. In Japan, RIKEN launched the Systems Science of Biological Dynamics database (SSBD) in 2013, with the goal of sharing quantitative biological dynamics data including time-lapse microscopy images10. In 2016, the database expanded its remit to all bioimage data from the Japanese community. In the United States, the National Institutes of Health (NIH) has funded the establishment of the CELL Image Library11, while NIH’s BRAIN initiative is establishing specifications and resources for imaging of brain tissue (https://doryworkspace.org/, https://www.brainimagelibrary.org). In collaboration with Bioimaging North America, NIH’s 4D Nucleome project has released specifications for image-acquisition metadata12. There are also efforts that have wider geographic coverage. Global BioImaging (https://globalbioimaging.org/) has published recommendations for data formats and data repositories13, and the QUAREP-LiMi14,15 global consortium is working to establish community-driven specifications for quality assurance and testing in quantitative light microscopy.

Experience from other omics domains has taught us that making data reusable requires some standardization, particularly in reporting the metadata: the information describing the experiments and the samples, such as which instrument was used to generate the images and how the samples were prepared. To achieve this, ‘appropriate minimal’ or recommended information guidelines or standards have been adopted by various life-science communities. One of the first such initiatives was MIAME (Minimum Information About a Microarray Experiment), which was published16 in 2001 and has had a major impact on how functional genomics data are collected and reported via public repositories, and on the reusability of these data17,18. As the biological imaging field is maturing, the bioimaging community is now recognizing that it faces similar challenges. In fact, the metadata challenge in the bioimaging domain has been discussed in the European Light Microscopy Initiative (ELMI) community (https://elmi.embl.org/) since 2001, and an attempt to address it was undertaken by the OME Consortium19. In the domain of medical imaging, the challenge is partially addressed by the Digital Imaging and Communications in Medicine (DICOM) standard20. Nevertheless, it was reported recently that metadata on imaging methods are vastly under-reported in biomedical research21. One might argue that microscopy experiments are too complex and heterogeneous to be amenable to a standardized description. Twenty years ago, the same was often said about microarray data, but the best practices for collecting and representing metadata for various biomedical domains have evolved considerably since then. Arguably, the biological imaging field is ready for some initial data standardization and would benefit from it.

A workshop held in Hinxton, UK, in 2017 unanimously supported the establishment of a public bioimage archive to store data associated with peer-reviewed publications or systematic imaging projects22. The workshop recommended the adoption of initially flexible data standards, which could be gradually tightened as different imaging communities reach consensus. In July 2019 the BioImage Archive (https://www.ebi.ac.uk/bioimage-archive) was established at the European Bioinformatics Institute, part of the European Molecular Biology Laboratory (EMBL-EBI), and it provides the community with the means to share different types of imaging data. The BioImage Archive is a deposition database for all microscopy images associated with peer-reviewed scientific publications for which a more specialized resource is not available. It is part of a larger and developing bioimaging ‘ecosystem’ that also includes more specialized and structured image resources, such as EMPIAR for electron and X-ray microscopy images23, Cell-IDR9 for curated images of cells and Tissue-IDR for curated images of biological tissues. The BioImage Archive is built on a high-performance, high-volume data-storage system that can be used as a platform by other existing or emerging biological imaging resources.

A follow-up workshop to discuss minimum metadata recommendations in several biological imaging fields was held in Hinxton in October 2019. Representatives from the light, electron and X-ray microscopy communities exchanged their experiences and ideas and began the process of developing the Recommended Metadata for Biological Images (REMBI) guidelines, presented here, to address the needs of these communities. A common theme in community efforts such as this is that standardized dataset annotation and deposition become more complex and time-consuming with every extra metadata element. Thus, attempts to impose requirements that are not yet sufficiently widely adopted by a given community or supported by relevant data-annotation tools may be counterproductive. However, this challenge is not dissimilar to the one the microarray community faced at the beginning of this century, and the arguments presented for and against a greater or lower level of detail in the minimum standard are similar in the two domains. In addition, the amount of information required for reuse may differ depending on the imaging technology, the scientific application and the needs of different user groups (Fig. 1). We are thus convinced that there is a need to strike the right balance between minimizing the barriers to data submission and maximizing opportunities for data reuse.

Fig. 1: There are at least three different categories of users of archived images, each with different needs with respect to metadata.

(1) Biologists and life scientists who are interested in repeating experiments, (re-)analyzing or comparing bioimage data and understanding results. For this, they need detailed information on the experimental context, such as the composition of biological samples, molecular entities, experimental interventions (for example, control vs. treatment) and how these relate to the image data. (2) Imaging scientists (microscopists and technology developers) who are interested in developing new imaging technologies. For this, they need detailed information on the image-acquisition process, such as physical properties of the image-acquisition set-up, and may benefit from some high-level information on the biological problem at hand. (3) Computer-vision researchers who develop algorithms (not limited to biological applications). Depending on the objective, they may need any of the information listed above. For example, to train a machine-learning algorithm, they would need ‘ground truth’ information such as adequately labeled images with categories (for example, control vs. treatments/phenotypes) or object outlines (segmentations).

Guidelines must take into account that microscopy technology development is highly dynamic, that there are many existing file formats (with new formats appearing regularly), and that datasets are becoming larger and more complex. Recognizing the enormous heterogeneity of biological imaging methods and the wide range of scales (from subnanometer to centimeter scale), the workshop established three working groups to address metadata recommendations for different subdomains: (1) the Electron Cryo-Microscopy and Cryo-Tomography working group; (2) the Volume EM and Correlative Imaging working group; and (3) the Light Microscopy working group, which covered cell-, tissue- and organism-level imaging. While these types of imaging each require specific types of metadata, they are all applied to study biological systems, and therefore commonalities are to be expected. The working groups converged on a common high-level structure of the recommended metadata guidelines (Fig. 2 and Supplementary Information).

Fig. 2: Different categories of metadata that are covered by REMBI.

The “study” module describes the top-level metadata elements, in alignment with existing generic standards such as Dublin Core, DataCite Metadata, and schema.org. For example, in a correlative study comprising serial block-face scanning electron microscopy (SBF-SEM) and confocal images, one of the study components would contain all information on the EM image stack, the other study component would correspond to the confocal stack, and a transformation description would allow an overlay of the two types of image. Data that retain spatial fidelity to underlying images (for example, label maps, volume renderings) are described in the “image data” module, whereas “analyzed data” (for example, volumetric analyses, image segment features, counts) contains image-derived measurements, typically presented in tabular form. For more details, see the Supplementary Information.
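
To make the nesting described in this caption concrete, the following is a minimal Python sketch of how a correlative study might be represented. It is an illustration only, not the REMBI data model: the class names, field names and example values are all assumptions made for this sketch, and the actual recommended elements are those listed in the Supplementary Information.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageData:
    """Data that retain spatial fidelity to the images (stacks, label maps)."""
    name: str
    kind: str  # free text in this sketch, e.g. "raw stack" or "label map"


@dataclass
class AnalyzedData:
    """Image-derived measurements, typically tabular."""
    name: str
    columns: List[str]


@dataclass
class StudyComponent:
    """One part of a study, here one imaging modality (hypothetical layout)."""
    modality: str
    image_data: List[ImageData] = field(default_factory=list)
    analyzed_data: List[AnalyzedData] = field(default_factory=list)


@dataclass
class Study:
    """Top-level record holding the study components."""
    title: str
    components: List[StudyComponent] = field(default_factory=list)
    # Free-text description of how the components can be overlaid, if any.
    transformation: Optional[str] = None

    def modalities(self) -> List[str]:
        return [component.modality for component in self.components]


# Hypothetical correlative study with an EM stack and a confocal stack.
study = Study(
    title="Hypothetical correlative study",
    components=[
        StudyComponent(
            modality="SBF-SEM",
            image_data=[ImageData("em_stack", "raw stack")],
        ),
        StudyComponent(
            modality="confocal",
            image_data=[ImageData("confocal_stack", "raw stack")],
            analyzed_data=[AnalyzedData("cell_counts", ["region", "count"])],
        ),
    ],
    transformation="affine transform mapping confocal to EM coordinates",
)

print(study.modalities())
```

In this sketch the transformation is kept at the study level because it relates two components to each other; a real data model might well place it elsewhere.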

The purpose of the proposed guidelines is to provide a framework for discussing different aspects of useful sharing of imaging data, with the goal of reaching community-wide consensus on the optimal level of detail. The workshop participants agreed that it is important to distinguish recommended metadata requirements from particular data models: the former concern the semantic requirements of what annotation is needed to understand and reuse image data, whereas the latter concern the syntactic representation of these metadata elements by computer software. There is also a third layer, specifying the implementation of a data model in a deposition system for a particular archive, along with the user interface of such a system.

In the field of cryo-EM, a tradition of detailed data annotation and deposition to a public repository is well established (Box 1). Standardizing metadata for light microscopy is challenging, as it covers a wide range of imaging modalities spanning several temporal and spatial scales, including single-molecule localization microscopy, wide-field or confocal microscopy, optical projection tomography, and light-sheet microscopy. The plethora of experimental set-ups (for example, high-content screening, light-sheet microscopy and digital pathology), file formats and compression methods, and the increasing complexity of datasets, are all complicating factors. Acknowledging that this subdomain produces datasets to address an extremely wide range of research questions, the working group concluded that it is currently difficult to expand the recommended metadata required for archival deposition beyond the basic information needed to open a dataset and access the pixel data such that visualization or reanalysis is possible. While such an approach does not immediately ensure full experimental reproducibility or provide a biological understanding of the sample, imaging conditions or other contextual information, it can serve as a starting point. The standard will of course evolve and be subject to refinement by the community as standardization progresses in the field. We hope that this publication will accelerate this process by facilitating discussions in the community, eventually producing a consensus view on metadata that allows experimental reproducibility and is fully consistent with the FAIR principles.

As agreement on recommended metadata is emerging, data-deposition tools that facilitate collection of these metadata (including submission tools for the BioImage Archive) will be developed, testing these standards in practice. For instance, the SSBD repository currently uses its own metadata template (comprising 11 required input fields), but the templates will be revised as an accepted standard emerges. Intelligent software strategies, such as autofilling common fields and automatic ‘data harvesting’ of information from log files, should be used to lower the barriers for data upload and to increase the quality of the captured information. Development and adoption of metrics to assess the completeness and correctness of uploads may encourage better deposition practices, resulting in wider use of and greater trust in the shared data in the community. Implementing recommended criteria in a way that encourages submission of additional structured metadata in the archive submission systems will facilitate dataset annotation beyond the required minimum, as better documented datasets will benefit from enhanced reusability and gain broader visibility. On the basis of the experience gained and the community feedback regarding the practicalities of data submissions and reuse, the standards will need to be kept up to date and to evolve with the science, technology and practices of bioimaging. However, the ultimate test of this effort will be the extent to which biological imaging data deposited in relevant archives will be reused (Fig. 3). The lessons from microarray data show that the earliest mode of data reuse may be related to testing new data-analysis tools, rather than providing biological insights, which happened later18.
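
As a rough illustration of the 'data harvesting' and completeness-metric ideas mentioned above, the following Python sketch reads key-value pairs from an instrument log and scores a submission against a list of required fields. Everything specific here is an assumption made for the example: the log format, the field names in `REQUIRED_FIELDS` and the simple fraction-filled score are hypothetical and are not taken from REMBI, the BioImage Archive or SSBD.

```python
import re
from typing import Dict, Mapping, Sequence

# Hypothetical required fields; a real archive would define its own list.
REQUIRED_FIELDS = ("title", "imaging_method", "organism", "pixel_size_nm")

# Matches lines such as "Imaging method: confocal" or "Pixel size nm = 65".
_KEY_VALUE = re.compile(r"^\s*([A-Za-z_][\w ]*?)\s*[:=]\s*(.+?)\s*$")


def harvest_from_log(log_text: str) -> Dict[str, str]:
    """Collect 'key: value' or 'key = value' lines from a log file."""
    harvested: Dict[str, str] = {}
    for line in log_text.splitlines():
        match = _KEY_VALUE.match(line)
        if match:
            # Normalize "Imaging method" to "imaging_method".
            key = match.group(1).strip().lower().replace(" ", "_")
            harvested[key] = match.group(2)
    return harvested


def completeness(record: Mapping[str, object],
                 required: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(1 for name in required if str(record.get(name, "")).strip())
    return filled / len(required)


# Hypothetical acquisition log written by the instrument software.
log = """
Imaging method: confocal
Pixel size nm = 65
"""

record = {"title": "Hypothetical dataset"}  # entered by the submitter
record.update(harvest_from_log(log))        # autofilled from the log

print(completeness(record))  # three of the four required fields are filled
```

A score like this measures only whether fields are filled, not whether the values are correct; checking correctness would need controlled vocabularies or validation against the image files themselves.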

Fig. 3: Imaging data are already being reused.

An example of a widely reused dataset is EMPIAR entry EMPIAR-10061 (https://empiar.org/10061), which contains the raw cryo-EM data (12.4 TB in size) underpinning what was a breakthrough structure and at the time the highest resolution cryo-EM structure available, the 2.2 Å resolution structure of β-galactosidase24. Several groups have reprocessed the data to even higher resolution and published and deposited the resulting EM maps. The dataset has been used by several developers of cryo-EM processing software to improve and test their algorithms, and it was used in the development of two deep-learning methods for automated particle picking, and to demonstrate cloud-based data processing. Details and literature references can be found at https://empiar.org/reuse.

The recommended imaging metadata standard described here will be adopted by the BioImage Archive, EMPIAR, Cell-IDR and Tissue-IDR. We hope that other existing and future archives will also adopt REMBI and engage with us to help shape the future development of the standard, in the spirit of the worldwide drive toward FAIR data sharing. To facilitate this, we encourage interested parties to contact us at rembi@ebi.ac.uk. We encourage scientific journals to support the deposition of bioimaging data in such FAIR resources, and funders to make data deposition a condition of grant funding. We also hope that instrument manufacturers and software developers, as well as large facilities and centers, will increasingly support recording of the recommended metadata automatically (in agreed formats), thereby minimizing the burden on the data submitters and minimizing data entry errors. Finally, we call on all scientists who use imaging methods in their published work to consider depositing their data and the associated rich metadata in the appropriate archives.

The current version of REMBI, including examples from the fields covered by the three working groups, is available as Supplementary Information, as well as from http://bit.ly/rembi_v1.