Optimizing multiplexed imaging experimental design through tissue spatial segregation estimation

Recent advances in multiplexed imaging methods allow simultaneous detection of dozens of proteins and hundreds of RNAs, enabling deep spatial characterization of both healthy and diseased tissues. Parameters for the design of optimal multiplex imaging studies, especially those estimating how much area has to be imaged to capture all cell phenotype clusters, are lacking. Here, using a spatial transcriptomic atlas of healthy and tumor human tissues, we developed a statistical framework that determines the number and area of fields of view necessary to accurately identify all cell phenotypes that are part of a tissue. Using this strategy on imaging mass cytometry data, we identified a measurement of tissue spatial segregation that enables optimal experimental design. This strategy will enable an improved design of multiplexed imaging studies.

Practical consideration for multiplexed imaging experimental design
This document contains practical guidelines for the experimental design of multiplexed imaging experiments, for instance Imaging Mass Cytometry experiments. We assume that the experiment involves a homogeneous group of samples, all derived from the same type of tissue/organ. In practice, imaging the whole area of every sample takes too long, so one should determine the minimal area to image per sample such that the global composition of each sample can still be accurately described. Here we describe three practical cases where our model can be used together with prior experimental data to optimize experimental design. All scripts and functions used here are available in a GitHub repository (https://github.com/PierreBSC/MI_Sampling_study).

A) Derive sampling parameters from a large multiplexed panorama image
The experimenter can image a large region, typically several mm², of a representative sample with the same technology as the one that will be used for all samples in the experiment (and, in the case of a targeted approach, the same marker panel). The resulting data can then be analyzed using our method (Equations (1) and (2) in Bost et al.; implemented in our script by the R function Perform_sampling_analysis()). It will determine the τ value for various FoV sizes and the relation between the two, i.e., the α parameter. Once the FoV size has been selected, we recommend imaging 2τ FoVs of the selected FoV size in order to reach a reasonable saturation (>86% of the cell phenotypes recovered on average). While this is the most quantitative and rigorous approach, it requires a preliminary experiment and therefore additional reagent cost and imaging time.
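The >86% figure can be checked with a few lines of code. The sketch below is not part of the repository and does not call Perform_sampling_analysis(). It assumes that the saturation curve has the exponential form N(r)/N0 = 1 - exp(-r/τ), where r is the number of FoVs imaged and τ is the characteristic number of FoVs for a given FoV size. The equations are not reproduced in this document, so this form is an assumption, chosen because it yields 1 - exp(-2) ≈ 86.5% at r = 2τ. The function names fraction_recovered and fovs_for_target are hypothetical.

```python
import math


def fraction_recovered(n_fovs: float, tau: float) -> float:
    """Expected fraction of cell phenotypes recovered after imaging n_fovs FoVs.

    Assumes exponential saturation, N(r)/N0 = 1 - exp(-r / tau).
    This functional form is an assumption of this sketch.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    if n_fovs < 0:
        raise ValueError("n_fovs must be non-negative")
    return 1.0 - math.exp(-n_fovs / tau)


def fovs_for_target(target: float, tau: float) -> int:
    """Smallest whole number of FoVs whose expected recovery reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Invert 1 - exp(-r / tau) >= target for r, then round up to a whole FoV.
    return math.ceil(-tau * math.log(1.0 - target))


# Illustrative value only: tau = 5 is made up, not taken from any dataset.
tau_example = 5.0
recovery_at_two_tau = fraction_recovered(2 * tau_example, tau_example)
n_needed = fovs_for_target(0.86, tau_example)
```

Under this assumed model the recovery at 2τ FoVs does not depend on the value of τ itself, which is why a single rule (image 2τ FoVs) can be stated for any FoV size once τ has been measured for that size.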

B) Infer sampling parameters from previous multiplex imaging data
In some cases, the experimenter will have access to previously generated multiplexed imaging data from the same tissue/organ. The technology used to generate this dataset should be the same as the one planned for the new data, including a similar marker panel. In most cases those data will consist of a set of small FoVs, which can be used to infer the value of the α parameter through a shrinkage analysis, as described in the manuscript (Figure S2g). This is done using the Global_alpha_estimation() function and will provide a rough estimate of α. While this approach does not provide a precise description of the optimal sampling strategy, it can still identify tissues with a low α value (i.e., with low spatial segregation), where imaging numerous small FoVs should be highly favored over imaging a small number of large FoVs. In practice, we recommend imaging FoVs with a width lower than 200 µm when α is equal to or below one, such as in the highly structured breast cancer samples.
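To illustrate what the α parameter measures, the following sketch fits a slope on a log-log scale. It is not the shrinkage analysis performed by Global_alpha_estimation(), whose internals are not described here. It assumes a power-law relation τ(w) = C · w^(-α) between the FoV width w and τ, the characteristic number of FoVs of the saturation model; this form, the function names estimate_alpha and favors_small_fovs, and the synthetic numbers are all assumptions made for illustration. Only the threshold (α equal to or below one) comes from the recommendation above.

```python
import math


def estimate_alpha(widths_um, taus):
    """Return (alpha, C) from a least-squares line through log(tau) vs log(width).

    Assumes tau(w) = C * w ** (-alpha); this power law is an assumption
    of this sketch, so alpha is minus the fitted slope.
    """
    if len(widths_um) != len(taus) or len(widths_um) < 2:
        raise ValueError("need at least two (width, tau) pairs")
    xs = [math.log(w) for w in widths_um]
    ys = [math.log(t) for t in taus]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("widths must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return -slope, math.exp(intercept)


def favors_small_fovs(alpha, threshold=1.0):
    """Rule of thumb from the text: alpha <= 1 suggests FoVs narrower than 200 um."""
    return alpha <= threshold


# Synthetic, made-up values generated from the assumed power law itself.
widths = [100.0, 200.0, 400.0, 800.0]
taus = [50.0 * w ** -0.8 for w in widths]
alpha_hat, c_hat = estimate_alpha(widths, taus)
small_fovs_recommended = favors_small_fovs(alpha_hat)
```

On real data the τ estimates are noisy, so the fitted α should be read as a rough indicator of spatial segregation rather than a precise value, in line with the caveat above.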

C) Infer sampling parameters from publicly available spatial transcriptomic (ST) data
If ST data (e.g., Visium data) from the tissue/organ of interest are publicly available, the datasets can first be processed using the R script Visium_data_processing.R and then analyzed using the Perform_sampling_analysis() function in order to estimate the value of the α parameter. The computed α value cannot be used directly because of the difference in technology, as illustrated in the manuscript (Figure 2c). As for strategy B above, this approach should be used to identify tissues with a low α value (i.e., with low spatial segregation), where imaging numerous small FoVs should be highly favored over imaging a small number of large FoVs.
Note that our paper (Bost et al.) contains a list of computed α values (i.e., the degree of cell phenotype spatial segregation within the tissue) for various healthy and tumor tissues (Supplementary Table 3), obtained from publicly available ST Visium datasets.