Building the HCA will require extensive lab protocols, SOPs, benchmarks and quality-control metrics. Each technology and tissue requires careful benchmarking of protocols or validation of datasets. Benchmarking can be conducted (1) across many sites using the same technology; (2) across many sites using complementary technologies9; and (3) at the same site using complementary technologies10 (Fig. 2). For example, in the case of RNA applications, we should compare profiles between single-cell, single-nucleus, bulk and spatial transcriptomics methods to comprehensively identify the different cell types in tissues. Considering the diversity of tissues and questions, we argue that benchmarking experiments should aim to produce decision trees that guide researchers in choosing the protocol best suited to their samples and questions (Fig. 2). Such systematic testing has been performed for specific tissues8,11 and protocols9,10, highlighting important differences in the resulting datasets, but continuous benchmarking efforts are required to define broadly applicable guidelines.
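As a minimal sketch of what a protocol decision tree could look like in code, the Python example below encodes a few branching rules. The class `Sample`, its fields, the function `suggest_protocols` and the rules themselves are hypothetical placeholders chosen for illustration; real branches would have to come from benchmarking results, not from this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    """Hypothetical description of a specimen; field names are assumptions."""
    preservation: str            # "fresh" or "frozen" (illustrative labels)
    hard_to_dissociate: bool     # for example, tissue that resists dissociation
    needs_spatial_context: bool  # does the question depend on tissue position?


def suggest_protocols(sample: Sample) -> List[str]:
    """Walk a toy decision tree and return candidate profiling approaches.

    The rules are placeholders, not benchmarked recommendations.
    """
    suggestions: List[str] = []
    # Branch 1: frozen or hard-to-dissociate tissue points to nuclei.
    if sample.preservation == "frozen" or sample.hard_to_dissociate:
        suggestions.append("snRNA-seq")
    else:
        suggestions.append("scRNA-seq")
    # Branch 2: add a spatial method when position in the tissue matters.
    if sample.needs_spatial_context:
        suggestions.append("spatial profiling")
    return suggestions
```

Calling `suggest_protocols(Sample("fresh", False, True))` in this sketch returns both a dissociation-based and a spatial suggestion; a benchmark-derived tree would add further branches (tissue type, budget, required sensitivity) as evidence accumulates.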

Fig. 2: Enabling high-quality, cost-effective data generation through systematic benchmarking and decision trees. Standards and best practices are defined for sampling procedures and profiling technologies through centralized or decentralized benchmarking activities, simulating different data-generation scenarios. Protocols and technologies are evaluated on the basis of their accuracy and sensitivity to detect cell types or states (stage 1) and their suitability to be integrated with other cell atlas efforts (stage 2). Eventually, new protocols are disseminated as SOPs with related guidelines and quality control measures, combined with hands-on training activities by the community.

The HCA’s STWG and Analysis Working Group will facilitate this process by developing broadly agreed-upon experimental and computational metrics and guidelines for these comparisons. The STWG will also receive feedback from the HCA Biological Networks on the application of these guidelines to specific organs, tissues and systems. Below, we outline the considerations for the construction of robust and useful SOPs and benchmarking datasets for each stage of the process, including sample collection, sample processing, sample profiling and data analysis.

Sample collection

HCA labs obtain and process human specimens from healthy living donors, clinical biopsies and surgical resections on living patients, deceased transplant organ donors and rapid autopsies. It is important to maximize biospecimen quality early in the sample collection process by rapid sample processing or preservation in clinical settings12,13 and by minimizing the postmortem interval (PMI) for deceased donor samples. We emphasize three key preanalysis quality metrics: first, pathology review with careful recording of the precise anatomical location of each specimen (ideally following a common coordinate framework that allows mapping and comparison of the sample to a reference template14); second, review and collection of associated donor metadata, including health and disease states, together with recording of sample metadata (for example, PMI measurements, freezing and/or fixation times); and third, scoring of biomolecule quality and integrity, if possible, and recording of quality control (QC) data for downstream assays (for example, viability, Bioanalyzer for scRNA-seq).
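One way to make these three preanalysis metrics checkable is to store them in a structured record and flag what has not been captured. The Python sketch below assumes a hypothetical `SpecimenRecord` with illustrative field names and units; it is not an HCA metadata schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SpecimenRecord:
    """Hypothetical preanalysis record; all field names are assumptions."""
    specimen_id: str
    # Metric 1: pathology review and anatomical location
    anatomical_location: Optional[str] = None
    # Metric 2: donor and sample metadata
    donor_health_state: Optional[str] = None
    pmi_hours: Optional[float] = None  # only meaningful for deceased donors
    time_to_preservation_min: Optional[float] = None
    # Metric 3: biomolecule quality and QC for downstream assays
    viability_fraction: Optional[float] = None
    rna_integrity_number: Optional[float] = None


def missing_fields(record: SpecimenRecord) -> List[str]:
    """Return the names of fields that have not been recorded yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

For example, `missing_fields(SpecimenRecord("S1", anatomical_location="kidney cortex"))` lists every field still to be filled in, which could be run before a specimen is released for profiling.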

Molecular profiling of dissociated cells or nuclei

Although sc/snRNA-seq is already one of the main profiling methods in the HCA, two key challenges remain. First, each tissue type typically requires at least some optimization for successful cell dissociation or nuclei extraction. Cell dissociation depends on the cell type and extracellular matrix composition of each tissue, and the dissociation procedure directly affects the atlas’s quality through transcriptional responses and/or RNA degradation during extended incubation8, as well as through biases in cell viability and recovery15. snRNA-seq instead isolates nuclei from snap-frozen or lightly fixed tissue, which makes archived (frozen)11 and hard-to-dissociate tissues (for example, brain)16 accessible. However, different buffers, detergents and physical forces can affect the recovery of nuclei from tissues, fewer genes and transcripts are detected by snRNA-seq17, and cell type enrichment is challenging. Both approaches recover cells with similar profiles, but sometimes at different proportions, with immune cells often more prevalent in scRNA-seq and many parenchymal cells more prevalent in snRNA-seq8,11. To assess such biases, we can use computational QC to determine cell composition11 or the presence of ambient RNA18, as well as auxiliary experimental data to determine the ground truth of cellular composition, including bulk RNA-seq (also providing a tissue-specific reference transcriptome) and spatial profiling. For example, in a lung dataset, bulk RNA-seq identified the depletion of fibroblast and endothelial cells and the enrichment of immune cell types in scRNA-seq datasets as a result of dissociation15. Efforts of the STWG involve the comparison of different sc/snRNA-seq modalities (3′, 5′, full-length and total RNA), multi-omics protocols, scATAC-seq, and spatial RNA and protein measures from donor-matched kidney samples. This framework can be readily extended to other tissue types in health and disease.
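A simple computational check for such composition biases is to compare cell type proportions between two protocols applied to the same tissue. The Python sketch below is one possible way to do this, assuming each cell already carries a cell type label; the function names, the pseudocount and the toy labels are illustrative assumptions, not measured data or an established HCA metric.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def proportions(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of cells assigned to each cell type label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells provided")
    return {cell_type: n / total for cell_type, n in counts.items()}


def log2_ratio(a: Dict[str, float], b: Dict[str, float],
               pseudo: float = 1e-3) -> Dict[str, float]:
    """log2(a / b) per cell type; positive values mean enrichment in a.

    The pseudocount avoids division by zero when a type is absent
    from one protocol; its value here is an arbitrary assumption.
    """
    cell_types = sorted(set(a) | set(b))
    return {
        t: math.log2((a.get(t, 0.0) + pseudo) / (b.get(t, 0.0) + pseudo))
        for t in cell_types
    }


# Invented toy labels, for illustration only.
sc_labels = ["immune"] * 6 + ["epithelial"] * 4
sn_labels = ["immune"] * 2 + ["epithelial"] * 8
bias = log2_ratio(proportions(sc_labels), proportions(sn_labels))
```

In this toy case `bias["immune"]` is positive and `bias["epithelial"]` is negative, mirroring the direction of the bias described above. In practice the same comparison could be made against proportions estimated from bulk RNA-seq or spatial data used as a reference.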

In situ and spatial profiling

To build an atlas, it is essential to characterize cells in their spatial context in tissues and whole organs. Benchmarking these methods, many of which are not yet broadly adopted, poses several challenges: testing and sharing reagents, in particular for spatial methods relying on RNA probes (for example, MERFISH or SeqFISH) and antibodies (for example, MIBI, CODEX or t-CyCIF); testing protocol-specific optimizations for specific tissues; testing equipment, particularly for methods that rely on highly specialized equipment that is not yet broadly available to other labs and that poses a cost barrier; and comparing to complementary methods such as single-molecule FISH and immunohistochemistry of individual RNAs and proteins, respectively. One key strategy is comparing different technologies on the same tissue. Given the highly specialized nature of many of these techniques, this often involves a collaborative effort whereby different expert labs apply different technologies to the same sample (Fig. 2; for example, using consecutive sections; see SpaceTx project below). In addition, applying both spatial profiling and molecular profiling of dissociated cells to the same tissue, as has been done for the atlas of the developing human heart19, can help assess the congruence of the two methods. As spatial technologies mature, they will require systematic evaluation to ensure high-quality datasets for the HCA, and we believe that their robustness and reproducibility will continue to improve in the near future.

Aside from benchmarking, there are also key opportunities for further development through concerted and collaborative efforts. Among these are improvements in the signal-to-background ratio and the resolution for approaches based on an imaging readout (through preparation approaches like tissue clearing20 or expansion microscopy21), as well as enhancing the resolution or deconvolution22 of approaches that are not single-cell or imaging based (for example, spatial transcriptomics, Slide-Seq and HDST). There are also efforts to improve the throughput for imaging-based strategies (for example, MIBI, FISSEQ, MERFISH, SeqFISH and STARmap), which require substantial imaging or processing time.