MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging

Highly multiplexed tissue imaging makes detailed molecular analysis of single cells possible in a preserved spatial context. However, reproducible analysis of large multichannel images poses a substantial computational challenge. Here, we describe a modular and open-source computational pipeline, MCMICRO, for performing the sequential steps needed to transform whole-slide images into single-cell data. We demonstrate the use of MCMICRO on tissue and tumor images acquired using multiple imaging platforms, thereby providing a solid foundation for the continued development of tissue imaging software.
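The last step in turning whole-slide images into single-cell data is quantification: once segmentation has labeled each cell, every cell's marker intensities are summarized. The sketch below assumes the segmentation mask is a 2-D integer array (0 = background, k = cell k) and that per-cell mean intensity is the summary statistic. The helper `quantify` and its dictionary output are hypothetical and illustrate the idea only. This is not MCMICRO's quantification module.

```python
import numpy as np


def quantify(mask: np.ndarray, image: np.ndarray, markers: list[str]) -> dict[int, dict[str, float]]:
    """Mean intensity of every channel inside every labeled cell (hypothetical helper).

    mask: 2-D integer array; 0 is background, k > 0 marks the pixels of cell k
    image: 3-D array ordered (channels, height, width)
    markers: one marker name per channel
    """
    if image.shape[1:] != mask.shape:
        raise ValueError("mask and image must share height and width")
    if len(markers) != image.shape[0]:
        raise ValueError("expected one marker name per channel")
    labels = mask.ravel()
    n_labels = int(labels.max()) + 1
    # Pixel count per label, i.e. the area of each cell.
    areas = np.bincount(labels, minlength=n_labels)
    # Summed intensity per label, one vector per channel.
    sums = [np.bincount(labels, weights=ch.ravel().astype(float), minlength=n_labels) for ch in image]
    table = {}
    for cell in range(1, n_labels):  # skip label 0 (background)
        if areas[cell] == 0:
            continue  # label ids need not be contiguous
        table[cell] = {m: float(s[cell] / areas[cell]) for m, s in zip(markers, sums)}
    return table


mask = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 2, 0]])
image = np.stack([np.array([[9, 2, 4], [9, 1, 3], [9, 8, 9]]),  # "DNA" channel
                  np.full((3, 3), 10)])                            # "CD45" channel
print(quantify(mask, image, ["DNA", "CD45"]))
# {1: {'DNA': 3.0, 'CD45': 10.0}, 2: {'DNA': 4.0, 'CD45': 10.0}}
```

Each row of the resulting table is one cell. This is the form of single-cell data that downstream phenotyping and the PCA in Figure S2 operate on.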


Figure S1
The EMIT dataset, spanning 123 tissue cores across 34 cancer, non-neoplastic disease, and normal tissue types. A. CyCIF whole-slide image of EMIT visualizing Hoechst 33342-stained nuclear DNA (white), keratin (orange), MART1 (cyan), CD45 (green), and SMA (purple). B. Zoomed-in views of a metastatic melanoma core (left, red box) and a lung adenocarcinoma core (right, blue box). The regions shown at the highest zoom level are outlined with white boxes in the corresponding low-magnification images. This experiment was performed once.

Figure S2
Principal component analysis (PCA) of Spatial Feature Tables derived from EMIT images. A. Normal tissues. B. Cancer tissues. Independent cores cluster to a substantial degree by tissue or cancer type. Some variation is expected because the tumors differ in grade and derive from different individuals. Data for the following antibodies were used in this analysis: CD73, MART1, KI67, pan-cytokeratin, CD45, ECAD, α-SMA, CD32, CDKN1A, CCNA2, CDKN1C, CDKN1B, CCND1, cPARP, CCNB1, PCNA, and CDK2.
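PCA of this kind can be sketched as follows. Each core is one row of a feature table and each marker is one column. The columns are centered, the table is decomposed with a singular value decomposition, and each core is projected onto the leading axes. The caption does not state how the tables were preprocessed. The z-scoring used here, the function name `pca`, and the toy table are assumptions for illustration.

```python
import numpy as np


def pca(features: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA of a (cores x markers) feature table via SVD.

    Returns (scores, explained_variance_ratio). scores has one row per core and
    one column per retained component.
    """
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)                  # center every marker column
    std = X.std(axis=0)
    X = X / np.where(std > 0, std, 1.0)     # z-score (assumed); constant columns stay at 0
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # projections onto the principal axes
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()


# Hypothetical per-core mean intensities for two markers. The first two cores
# resemble each other, and so do the last two.
table = np.array([[1.0, 1.0],
                  [1.1, 0.9],
                  [5.0, 5.0],
                  [5.1, 4.9]])
scores, ratio = pca(table)
```

Similar cores land near each other on the first component, which is the sense in which cores "cluster by tissue or cancer type." The sign of each component is arbitrary, so only relative positions are meaningful.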

Figure S3
Nextflow enables reproducible data processing using the provenance module. A. The Nextflow report documents in detail the resources, directories, and repositories (including commit hashes) used, together with the corresponding execution times. The report is browser-based and interactive. B-D. Provenance reconstruction is enabled by recording each executed command (.sh) and its output (.log). Representative examples of a command and its output are shown in C and D, respectively.
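The provenance idea in panels B-D records every executed command as a .sh file and its output as a .log file. A minimal sketch of it follows. The function `run_with_provenance` and its file layout are hypothetical. This sketch shows the principle only and does not reproduce how Nextflow itself writes its task files.

```python
import shlex
import subprocess
from pathlib import Path


def run_with_provenance(cmd: list[str], workdir: Path, name: str) -> int:
    """Save `cmd` as <name>.sh, run it, and capture stdout+stderr in <name>.log.

    Returns the command's exit code. Both files stay in `workdir`, so a later
    reader can see exactly what ran and what it printed.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    script = workdir / f"{name}.sh"
    log = workdir / f"{name}.log"
    # Record the command before running it, so it is preserved even if execution fails.
    script.write_text("#!/bin/sh\n" + shlex.join(cmd) + "\n")
    with log.open("w") as fh:
        result = subprocess.run(cmd, stdout=fh, stderr=subprocess.STDOUT, cwd=workdir)
    return result.returncode
```

For example, `run_with_provenance([sys.executable, "-c", "print('hello')"], Path("work/task1"), "task1")` leaves `task1.sh` (the quoted command line) and `task1.log` (containing `hello`) in `work/task1`. The command is written before it runs so that failed tasks can still be reconstructed.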

Figure S4
Detailed insight into the computational resources required by each module, generated by Nextflow.
The data are presented as an interactive browser-based report. A. Physical memory usage is recorded as RAM only (visualized), RAM + disk swap, or % of allocated RAM. B. Job duration is recorded as execution time (visualized) or % of allocated time. C-D compare the physical memory usage, and E-F the task execution time in minutes, between downstream processing of the whole-slide TMA image (left, C and E) and of the Coreograph-segmented TMA cores (right, D and F). Panels A, B, C, and E were derived from processing a single whole-slide image (n=1), while panels D and F correspond to processing the EMIT TMA (n=123 images). In all panels, the box shows the interquartile range (q1 to q3) with a horizontal line denoting the median, and the whiskers extend to the minimum and maximum values.
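The box plots in this figure use min/max whiskers rather than the 1.5 × IQR whiskers common in other tools. The sketch below computes the statistics such a plot draws. The function name and the sample values are hypothetical. Quartile conventions differ between tools, and this sketch uses NumPy's default linear interpolation.

```python
import numpy as np


def box_summary(values) -> dict[str, float]:
    """Statistics drawn by a min/max-whisker box plot.

    Box: q1 to q3 (the interquartile range); line: median; whiskers: min and max.
    """
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])  # NumPy's default linear interpolation
    return {
        "min": float(v.min()),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "max": float(v.max()),
        "iqr": float(q3 - q1),
    }


# Hypothetical peak-memory values (GB) for five tasks of one module.
print(box_summary([1.0, 2.0, 3.0, 4.0, 5.0]))
# {'min': 1.0, 'q1': 2.0, 'median': 3.0, 'q3': 4.0, 'max': 5.0, 'iqr': 2.0}
```

Because the whiskers reach the extremes, a single slow or memory-hungry task stretches the whisker instead of appearing as a separate outlier point.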

Figure S5
The web-based Galaxy platform for Multiplexed Tissue Imaging analysis. A. A detailed representation of the Galaxy interface implemented for MCMICRO. Processing of a TMA is shown, including image registration, de-arraying, segmentation, quantification, and cell state phenotyping. A single line denotes one dataset flowing from the output of one tool to the input of the next. Multiple lines denote a group of datasets flowing from one tool to the next. B. Galaxy tool user interface for the ASHLAR module. With this interface, a user can select tool inputs, set parameter values, and execute the tool entirely in a web browser. Default parameters are provided and can be changed at run time. C. Multichannel images can be viewed interactively in a web browser from Galaxy using the Avivator visualization tool. Shown is a single TMA core with channels corresponding to Hoechst 33342-stained nuclear DNA (blue), α-smooth muscle actin (α-SMA; green), cytokeratin (red), the B cell marker CD20 (cyan), and the T cell marker CD3 (magenta). With Avivator, users can browse and zoom around images, show and hide channels, and set channel colors and intensity ranges.

Table S1: Data collected in the course of this study. Data acquired using methods in bold were successfully processed by MCMICRO. Results are shown in Figure 2, and detailed descriptions of intermediate processing results are provided with the documentation available at mcmicro.org. Publicly available data for IMC and MIBI were also processed.

Table S2: List of open-source tools available for highly multiplexed image processing.
Tool      Ref.  Description
…         16    Visualization tool
OMERO     17    Visualization tool
Minerva   18    Visualization tool
Seurat    19    R toolkit for single-cell genomics
Scanpy    20    Python toolkit for single-cell genomics
Cumulus   21    Cloud-based framework for single-cell genomics
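Figure S5C describes viewing a multichannel core by showing or hiding channels and setting each channel's color and intensity range. The sketch below shows one common way to do this. Each channel is rescaled to its intensity window, tinted with its color, and blended additively. Additive blending, the function name `composite`, and its parameters are assumptions. This is not Avivator's actual rendering code.

```python
import numpy as np


def composite(image, colors, ranges, visible=None) -> np.ndarray:
    """Blend a (channels, height, width) image into an RGB array (height, width, 3) in [0, 1].

    colors: one (r, g, b) tuple per channel, components in [0, 1]
    ranges: one (low, high) intensity window per channel
    visible: optional booleans; a False entry hides that channel
    """
    img = np.asarray(image, dtype=float)
    if visible is None:
        visible = [True] * img.shape[0]
    rgb = np.zeros(img.shape[1:] + (3,))
    for ch, color, (lo, hi), show in zip(img, colors, ranges, visible):
        if not show:
            continue
        if hi <= lo:
            raise ValueError("each intensity range needs high > low")
        # Map the window [lo, hi] to [0, 1]; values outside it saturate.
        scaled = np.clip((ch - lo) / (hi - lo), 0.0, 1.0)
        # Tint by the channel color and add to the running composite.
        rgb += scaled[..., None] * np.asarray(color, dtype=float)
    return np.clip(rgb, 0.0, 1.0)  # overlapping bright channels saturate at white
```

For instance, a DNA channel mapped to blue and an α-SMA channel mapped to green would be passed as `colors=[(0, 0, 1), (0, 1, 0)]`. Narrowing a channel's `(low, high)` window increases its contrast, and setting its `visible` entry to `False` hides it.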