Highly multiplexed tissue imaging using any of a variety of optical and mass-spectrometry-based methods (Supplementary Table 1) combines deep molecular insight into the biology of single cells with spatial information traditionally acquired using histological methods, such as hematoxylin and eosin (H&E) staining and immunohistochemistry (IHC)1. As currently practiced, multiplexed tissue imaging of proteins involves 20–60 channels of two-dimensional (2D) data, with each channel corresponding to a different antibody or colorimetric stain (Fig. 1). Multiple inter-institutional and international projects, such as the Human Tumor Atlas Network (HTAN)2, the Human BioMolecular Atlas Program (HuBMAP)3 and the LifeTime Initiative4, aim to combine such highly multiplexed tissue images with single-cell sequencing and other types of omics data to create publicly accessible atlases of normal and diseased tissues. Easy public access to primary and derived data is an explicit goal of these atlases and is expected to encompass native-resolution images, segmented single-cell data, anonymized clinical metadata and treatment history (for human specimens), genetic information (particularly for animal models) and specification of the protocols used to acquire and process the data. Given the imminent release of the first atlases, an urgent need exists for data and metadata standards consistent with emerging Findable, Accessible, Interoperable, and Reusable (FAIR) standards5. In this commentary, we establish the Minimum Information about Highly Multiplexed Tissue Imaging (MITI) standard and associated data-level definitions; we also discuss the relationship of MITI to existing standards, practical implementations and future developments.

Fig. 1: The steps in a canonical multiplexed tissue imaging experiment and the associated metadata.

In a typical workflow, samples collected from patient biopsies and resections or from animal models are formaldehyde fixed and paraffin embedded or frozen, and are then sectioned and mounted onto either a standard glass microscope slide (for CyCIF, mIHC, IMC, MELC or mxIF), fluidic chamber (for CODEX) or specialized carriers (for MIBI). Clinical and biospecimen metadata (extracted from clinical records, for example) are linked to all other levels of metadata via a unique ID (Biospecimen ID). Data are acquired using cyclical or noncyclical staining and imaging methods, and both reagent and experimental metadata (consisting of antibody, reagent and instrument metadata) are collected. In both cyclic and noncyclic methods, sections undergo preprocessing, antigen retrieval and antibody incubation, and images are acquired. In cyclic imaging methods, fluorophores or chromogens are inactivated or removed and additional antibodies and/or visualization reagents are applied and data acquisition repeated. Channel and instrument metadata capture these essential details.

Scope and target audiences

MITI covers biospecimen, reagent, data acquisition and data analysis metadata, as well as data levels for imaging with antibodies, aptamers, peptides, dyes and similar detection reagents (Supplementary Table 1). The standard is also compatible with images based on H&E staining, low-plex immunofluorescence (IF) and IHC. A working group is currently extending MITI to cover subcellular-resolution imaging of nucleic acids using methods such as MERFISH6. Although conceived with today’s 2D images in mind (typically involving 5–10-μm-thick sections of fixed or frozen specimens), MITI accommodates three-dimensional (3D) datasets acquired using confocal, deconvolution and light-sheet microscopes7. MITI has been established as its own organization with its own GitHub repository, governing structure and procedures for proposing and incorporating revisions. The definition of MITI is available in the machine-readable YAML format (https://github.com/miti-consortium/MITI) along with other relevant information. MITI has also been implemented in practice (https://github.com/ncihtan/data-models) and used to structure metadata available via the HTAN data portal (https://htan-portal-nextjs.vercel.app). However, MITI is independent of HTAN or any single research consortium.

Highly multiplexed imaging is derived from methods such as IHC and IF that are in widespread use in preclinical research using cultured cells and model organisms, and in clinical practice with human tissue specimens. Many standards and best practices have been established for these types of data (Supplementary Table 2), but high-plex imaging presents unique challenges: images are expensive to collect and can be very large (up to 1 TB in size), specimens are often difficult to acquire and may have data use restrictions, and accurate clinical and genomic annotation is a necessity. Recent interest in highly multiplexed tissue imaging has been driven by applications in oncology, largely due to the importance of the tumor microenvironment in immuno-editing and responsiveness to immunotherapy, but the approach is broadly applicable to studying normal development, infectious disease, immunology and other topics. HuBMAP3, for example, is using high-plex imaging to study a range of normal human tissues. MITI is also relevant to studies with model organisms, and data tables have already been created to store data from genetically engineered mouse models (GEMMs) in a standardized manner.

Multiplexed imaging also promises to influence the pathological diagnosis of diseases, which is rapidly switching to digital approaches8. For over a century, histological analysis of anatomic specimens (from biopsies and surgical resection) has been the primary method to diagnose diseases such as cancer9, and this remains true today, despite the impact of gene sequencing. Multiplexed tissue imaging promises to augment such conventional pathological diagnosis with the detailed molecular information needed to specify the use of contemporary precision therapies. This is therefore an opportune time to seek alignment of research and diagnostic approaches by establishing public standards able to take full advantage of the detailed molecular information revealed by emerging imaging methods.

Existing standards and approaches

The Human Genome Project, The Cancer Genome Atlas (TCGA)10 and similar large-scale genomic programs have developed several approaches to data management that are of immediate relevance to tissue atlases. The first is the concept of “minimum information” metadata, which has been employed in microarrays (the MIAME standard)11, genome sequences (MIGS)12 and biological investigation in general (MIBBI)13. The second is the idea of “data levels” (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels), which specify the extent of data processing (raw, normalized, aggregated or region of interest, corresponding to data levels 1–4) and access control. Access control is required because even anonymized DNA sequencing data pose a risk of re-identification14. As a result, the database of Genotypes and Phenotypes (dbGaP), the US National Cancer Institute (NCI) Genomic Data Commons (GDC)15 and the US Federal Register (79 FR 51345) control access to primary sequencing data (so-called level 1 and 2 sequencing data) based on policies set by a data access committee. Higher-level genomic data, which are generally more consolidated, involve information aggregated from many patients and pose little or no re-identification risk, can be freely shared16 (Fig. 2). When datasets are combined, they acquire the most stringent restriction applied to any constituent element. Although we are not aware of any policies addressing the anonymity of histological images, consultation with our Institutional Review Boards (IRBs, or ethics committees) has led us to conclude that public release of tissue images does not constitute a risk to patient privacy. MITI data levels are nonetheless consistent with the existing GDC and dbGaP practice whereby data intended for unrestricted distribution are classified as level 3 and up. In the case of images adhering to the MITI standard, level 3 data have been subjected to quality control and some degree of human annotation, making them more useful in a shared environment than raw images. We anticipate that IRBs and government agencies will in the future provide further guidance on the sharing of datasets that combine clinical history, sequence information and tissue images; MITI will be adapted to accommodate such guidance.
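The two access rules described above for sequencing data (levels 1 and 2 are controlled, level 3 and up are freely shared, and a combined dataset inherits its most stringent constituent restriction) can be illustrated with a minimal Python sketch. The names `Access`, `sequencing_access` and `combined_access` are hypothetical and are not part of MITI, GDC or dbGaP; real access decisions are made by data access committees, not by code.

```python
from enum import IntEnum


class Access(IntEnum):
    """Hypothetical access tiers, ordered from least to most stringent."""
    OPEN = 0
    CONTROLLED = 1


def sequencing_access(data_level: int) -> Access:
    """Rule from the text: level 1 and 2 sequencing data are access
    controlled; level 3 and up can be freely shared."""
    if data_level < 1:
        raise ValueError("data levels start at 1")
    return Access.CONTROLLED if data_level <= 2 else Access.OPEN


def combined_access(constituents) -> Access:
    """A combined dataset acquires the most stringent restriction
    applied to any constituent element."""
    return max(constituents, default=Access.OPEN)


# A level 4 feature table bundled with level 1 reads becomes controlled.
bundle = [sequencing_access(4), sequencing_access(1)]
print(combined_access(bundle).name)
```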

Fig. 2: MITI data levels and formats.

Data levels specify the extent of data processing and, in the case of sequencing data, whether access requires the approval of a data access committee. In common practice, data at levels 3 and up are freely shared. Primary data arising from microscopes and data acquisition instruments correspond to level 1 data. Because the raw image data acquired from one slide usually consist of separate image fields, possibly from proprietary formats, they are processed to correct for uneven illumination and other instrumentation artifacts and assembled into a single multichannel image in the OME-TIFF format (level 2 data). OME-TIFF image mosaics undergo quality control (including artifact removal, channel rejection and evaluation of staining quality) to generate full-resolution, assembled and curated level 3 image data; segmentation algorithms generate one or more label masks that also comprise level 3 data. The great majority of users will want to access these level 3 images. Each label mask (for example, nuclei, cytoplasmic regions, whole cells, organelles, etc.) is used to compute quantitative features, such as the mean signal intensity, spatial coordinates of individual cells and morphological features, which are stored as level 4 spatial feature tables (where rows represent single cells and columns the extracted cellular features); these data are suitable for analysis using the dimensionality reduction and visualization tools used for other types of single-cell data (for example, UMAP plots). Spatial models computed from images and spatial feature tables or by direct application of machine learning to images, as well as images annotated by humans, comprise level 5 data.

The MITI standard also draws extensively on image formats developed for cultured cells and model organisms and on a wide variety of open-source software tools (Supplementary Table 3). Noteworthy among these are the Open Microscopy Environment (OME) TIFF standard17 and the Bio-Formats18 approach to standardization of microscopy data. MITI field definitions are harmonized with the QUality Assessment and REProducibility for Instruments and Images in Light Microscopy (QUAREP-LiMi)19 effort, the Resource Identification Initiative20 and antibody standardization efforts by the Human Protein Atlas21 and are also compliant with the recently developed Recommended Metadata for Biological Images initiative22. Metadata on model organisms (particularly GEMMs and patient-derived xenografts (PDXs)) are aligned with existing standards, many developed for genomic information (see Supplementary Table 2 for a full list of antecedent resources). Well-curated clinical information is essential for the interpretation of data from human specimens, but standardizing such information has proven to be a major challenge in the past, for example in TCGA23,34. Thus, HTAN and other current NCI projects focused on human specimens are emphasizing standardization of clinical metadata, and the MITI standard is designed to closely align with the GDC Data Model24 in this regard (Supplementary Tables 5 and 6).

All imaging methods generate data that comprise a sequence of intensity values on a raster; multispectral imaging simply adds new dimensions to the raster. The cameras that collect H&E and IHC images from bright-field microscopes or high-plex images from fluorescence microscopes generate rasters; ablation-based mass-spectrometry imaging (for example, multiplexed ion beam imaging (MIBI) and imaging mass cytometry (IMC)) is also raster based. As currently defined, MITI specifies that raster images should be stored in the OME-TIFF 6 standard, but OME formats are currently being migrated to a set of next-generation file formats (collectively called OME-NGFF)25 to improve their scalability and performance in the cloud. MITI will be updated to align with these new formats as they come into general use. Another area of translational and clinical research in which imaging is commonly encountered is radiology, which is almost entirely digital, and which uses data interchange standards governed by the Digital Imaging and Communications in Medicine (DICOM) standard (https://www.dicomstandard.org/). DICOM has recently been extended to accommodate both radiology data and OME-TIFF standards26. The NCI’s ongoing program to create an Imaging Data Commons27 is expected to be based on this dual standard, or on a successor using OME-NGFF. MITI is, or will be, compatible with these foundational data standards.

In highly multiplexed tissue imaging, antibodies either are conjugated to fluorophores directly or via oligonucleotides or are bound to secondary antibodies (Fig. 1, Supplementary Table 4). Images are then acquired serially, one to six channels at a time, to assemble data from 20–60 antibodies. In ablation-based methods, antibodies are labeled with metals and vaporized with lasers or ion beams, after which they are detected by atomic mass spectrometry (Supplementary Table 4). In all cases, the raw output of data acquisition instruments comprises level 1 MITI data (Fig. 2), analogous to the level 1 FASTQ files in genomics.

Whole-slide imaging is required for clinical applications28 and also necessary to ensure adequate power in preclinical studies29. However, resolution and field of view have a reciprocal relationship—with respect both to optical physics and to the practical process of mapping image fields onto the fixed raster of a camera (or ablating beam). Whole-slide images of histological specimens8 must therefore be acquired by dividing a large specimen into contiguous tiles. This usually involves acquisition of ~100–1,000 tiles by moving the microscope stage in both the x and y dimensions, with each tile being a multidimensional, subcellular-resolution TIFF image. Tiles are combined at subpixel accuracy into a mosaic image, in a process known as stitching. When high-plex images are assembled from multiple rounds of lower-plex imaging, it is also necessary to register channels to each other across imaging cycles and to correct for any unevenness in illumination (so-called flat-fielding)30. Stitched and registered mosaics can be as large as 50,000 × 50,000 pixels × 100 channels and require ~500 GB of disk space. They correspond to level 2 MITI data and represent full-resolution primary images that have undergone automated stitching, registration, illumination correction, background subtraction and intensity normalization and have been stored in a standardized OME format. The level of processing is analogous to that of BAM files, a common type of level 2 data in genomics.
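The ~500 GB figure quoted above follows from simple arithmetic if each pixel in each channel is stored as an uncompressed 16-bit value. That bit depth is an assumption made here for illustration and is not stated in the text; the function name `mosaic_size_bytes` is likewise hypothetical.

```python
def mosaic_size_bytes(width_px: int, height_px: int, channels: int,
                      bytes_per_sample: int = 2) -> int:
    """Uncompressed size of a multichannel mosaic.

    bytes_per_sample=2 assumes 16-bit pixels (an assumption, see lead-in).
    """
    return width_px * height_px * channels * bytes_per_sample


size = mosaic_size_bytes(50_000, 50_000, 100)
print(f"{size / 1e9:.0f} GB")  # prints "500 GB" (decimal gigabytes)
```

Compression and image pyramids change the on-disk size in practice, so this is an order-of-magnitude estimate only.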

Level 3 data represent images that have been processed with some interpretive intent, which may include (i) full-resolution images following quality control or artifact removal, (ii) segmentation masks computed from such images, (iii) machine-generated spatial models and (iv) images with human- or machine-generated annotations. Level 3 MITI data are roughly analogous to level 3 mRNA expression data in genomics. However, whereas many users of genomic data only require access to processed level 3 and 4 data, which are usually quite compact, quantitative analysis of tissue images adds a requirement for full-resolution primary images so that images and computed features can be examined in parallel31. Level 3 MITI data are intended to be the primary type of image data distributed by tissue atlases and similar projects.

Assembled level 3 images are typically segmented to identify single cells31, which are quantified to produce a ‘spatial feature table’ that describes marker intensities, cell coordinates and other single-cell features. The level 4 data in spatial feature tables are a natural complement to count tables in single-cell sequencing data (for example, from scRNA-seq, scATAC-seq or scDNA-seq) and can be analyzed using many of the same dimensionality reduction methods (for example, PCA, t-SNE and UMAP)32 and online browsers such as cellxgene (Supplementary Table 3)33. These types of tabular data are all examples of ‘feature observation matrixes’, which are themselves being standardized across domains of biology to improve their utility and intercompatibility. Level 5 MITI data comprise results computed from spatial feature tables or primary images. Because access to terabyte-size full-resolution image data is impractically burdensome when reading a manuscript or browsing a large dataset, a specialized type of level 5 image data has been developed to enable panning and zooming across images using a standard web browser35. In the case of level 5 images viewed with MINERVA software, the aim is to exploit functionality and concepts similar to those in Google Maps or electronic museum guides. The inclusion of digital docents with images makes it possible to combine pan and zoom with guided narratives that greatly facilitate comprehension of complex datasets and promote new hypothesis generation35.
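The step from a level 3 label mask to a level 4 spatial feature table can be sketched in a few lines of Python. This is a toy illustration, not the pipeline used by any atlas: the function name, the column names and the channel names (`DNA`, `CD45`) are hypothetical, images are plain nested lists, and only mean intensity, centroid and area are computed.

```python
from collections import defaultdict


def spatial_feature_table(label_mask, channels):
    """Build one row per cell from a label mask and matching channel images.

    label_mask: 2D list of ints (0 = background, k > 0 = cell ID).
    channels:   dict mapping channel name -> 2D list of intensities with
                the same shape as label_mask.
    """
    area = defaultdict(int)
    sum_x = defaultdict(float)
    sum_y = defaultdict(float)
    sums = defaultdict(lambda: defaultdict(float))

    for y, row in enumerate(label_mask):
        for x, cell_id in enumerate(row):
            if cell_id == 0:  # skip background pixels
                continue
            area[cell_id] += 1
            sum_x[cell_id] += x
            sum_y[cell_id] += y
            for name, image in channels.items():
                sums[cell_id][name] += image[y][x]

    table = []
    for cell_id in sorted(area):
        n = area[cell_id]
        record = {
            "cell_id": cell_id,
            "area_px": n,
            "centroid_x": sum_x[cell_id] / n,
            "centroid_y": sum_y[cell_id] / n,
        }
        for name in channels:
            record[f"mean_{name}"] = sums[cell_id][name] / n
        table.append(record)
    return table


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
images = {
    "DNA":  [[10, 10, 0, 0], [10, 10, 0, 4], [0, 0, 0, 8]],
    "CD45": [[0, 2, 0, 0], [2, 4, 0, 20], [0, 0, 0, 20]],
}
table = spatial_feature_table(mask, images)
for record in table:
    print(record)
```

Each row represents one cell and each column one extracted feature, which is the layout that dimensionality reduction tools for single-cell data expect.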

For any metadata standard to be used, a balance must be struck between ease of data entry, which minimizes noncompliance by data generators, and level of detail, which must be sufficient for data retrieval, analysis and publication in a reproducible manner. Moreover, specification of a metadata standard is separate from the essential task of developing a practical and reliable means for capturing information needed to ensure adherence to the standard. Two approaches have proven most effective in addressing this requirement. One, exemplified by OMeta36, involves a relational database and web interface that data generators use to input necessary information in a controlled manner. Another approach, exemplified by MAGE-TAB37, involves a standardized format for collecting metadata via a series of structured documents, which are then used to populate web pages and databases38. As a practical test of MITI, we have implemented the latter approach in a JSON schema (https://github.com/ncihtan/data-models) that also conforms to the design principles of Schema.org. These principles focus on the creation, maintenance and promotion of schemas for structured data that are supported by major web search engines, thereby enhancing discoverability. In this TAB-like approach, the MITI standard is exposed to data collectors as Google Sheets with dropdowns representing controlled vocabularies and highlighting required or optional elements; many fields are automatically validated upon entry. These documents are ingested using SCHEMATIC (Schema Engine for Manifest Ingress and Curation; https://github.com/Sage-Bionetworks/schematic), automatically linked to primary imaging data and stored as cloud assets. These implementations continue to evolve, and entirely different approaches are possible: nothing in a MITI-type standard constrains how data are collected.
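The kind of entry-time validation described above (required versus optional fields, and dropdowns backed by controlled vocabularies) can be illustrated with a minimal Python sketch. It does not use SCHEMATIC or the actual HTAN data model: the field names and the `fixative_type` vocabulary are hypothetical, and only the assay names are taken from the methods listed in Fig. 1.

```python
# Hypothetical mini-schema; NOT the real MITI attribute names.
SCHEMA = {
    "biospecimen_id": {"required": True},
    "imaging_assay_type": {
        "required": True,
        "allowed": {"CyCIF", "CODEX", "IMC", "MIBI", "mIHC", "MELC", "mxIF"},
    },
    "fixative_type": {"required": False, "allowed": {"formalin", "frozen"}},
}


def validate_record(record, schema=SCHEMA):
    """Return a list of human-readable problems (empty list = valid)."""
    errors = []
    for field, rule in schema.items():
        value = record.get(field)
        if value is None or value == "":
            if rule.get("required"):
                errors.append(f"{field}: required field is missing")
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            errors.append(f"{field}: '{value}' is not in the controlled vocabulary")
    # Flag columns that the schema does not define.
    for field in record:
        if field not in schema:
            errors.append(f"{field}: unknown field")
    return errors


good = {"biospecimen_id": "S001", "imaging_assay_type": "CyCIF"}
bad = {"imaging_assay_type": "cycif", "operator": "jd"}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of problems, rather than raising on the first one, mirrors the spreadsheet workflow in which a data generator sees every invalid cell at once.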

Whereas many research agencies and countries have made a major investment in curating, storing and distributing genomic data, fewer repositories exist for primary image data. The Image Data Resource39 maintained by the European Bioinformatics Institute (EBI) is an exception, but as the volume of image data grows, other means of data distribution will almost certainly be required. In the US, in the absence of a major public investment in data storage, ‘requester pays’40 access to datasets is a promising development. The primary cost associated with the creation and maintenance of a dataset on a commercial cloud service involves data download, not data ingress and storage. In a ‘requester pays’ model, a user seeking access to a dataset pays the cost of data egress directly to the cloud provider, making access both secure and anonymous (moreover, the cost of egress into another account on the same commercial cloud is low). Although this approach might appear to create an impediment to research, the actual cost of egress is quite low (currently, about US $100/TB) compared to any form of data acquisition, and a key goal is to avoid a tragedy of the commons in which frequent, duplicate downloads overwhelm the system. A combination of a MITI implementation on a cloud service (as described above) with ‘requester pays’ cloud access will also make it possible for individuals to distribute very large FAIR image datasets at relatively low cost. Such an approach does not obviate the need for public investments, such as those being made by the EBI, but it does represent a practical way forward for democratizing the release of standardized data, some of which can then be incorporated into publicly supported resources. Regardless, the MITI standard described here is available for immediate use, without being affected by how access to the primary data is provisioned.
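To put the quoted egress price in context, the cost borne by a requester can be estimated directly from the dataset size. The sketch below uses the approximate US $100/TB figure from the text as a default; actual cloud pricing varies by provider, region and volume, and the function name is hypothetical.

```python
def egress_cost_usd(size_tb: float, usd_per_tb: float = 100.0) -> float:
    """Estimated download cost for a requester, at a flat per-terabyte rate."""
    if size_tb < 0:
        raise ValueError("size_tb must be non-negative")
    return size_tb * usd_per_tb


# A single 500 GB (0.5 TB) mosaic at the quoted rate.
print(f"US ${egress_cost_usd(0.5):.2f}")
```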

Public data and metadata standards have been essential for the success of genomics and other fields of biomedicine, but the creation of a new standard is no guarantee of successful adoption. An outpouring of effort 10–20 years ago led to the development of widely adopted and well-maintained standards such as MIAME11, MIGS12 and MIBBI13, and these have been consolidated and further documented by the Digital Curation Centre (https://www.dcc.ac.uk/), FAIRsharing.org and similar projects. However, many other minimum information projects have been left unattended41, and it remains unclear whether existing metadata adequately conform to user needs42. The development of MITI and of the initial HTAN implementation enjoys NCI support and is expected to become part of the NCI Cancer Research Data Commons27, helping ensure its viability. However, individuals and organizations are invited to join in the further development of MITI and should make contact via the image.sc forum or submit pull requests (that is, requests for inclusion in the MITI ‘code base’ at https://github.com/miti-consortium/MITI). Because high-plex tissue imaging is in its infancy and MITI has attracted the great majority of developers of existing high-plex tissue image acquisition methods, it represents a solid beginning for what will need to be an evolving standard. By having its own repository and governance structure, independent of any particular research program or constituency, MITI also conforms with other requirements of successful open standards43.