QuantISH: RNA in situ hybridization image analysis framework for quantifying cell type-specific target RNA expression and variability

RNA in situ hybridization (RNA-ISH) is a powerful spatial transcriptomics technology to characterize target RNA abundance and localization in individual cells. This allows analysis of tumor heterogeneity and expression localization, which are not readily obtainable through transcriptomic data analysis. RNA-ISH experiments produce large amounts of data and there is a need for automated analysis methods. Here we present QuantISH, a comprehensive open-source RNA-ISH image analysis pipeline that quantifies marker expressions in individual carcinoma, immune, and stromal cells on chromogenic or fluorescent in situ hybridization images. QuantISH is designed to be modular and can be adapted to various image and sample types and staining protocols. We show that in chromogenic RNA in situ hybridization images of high-grade serous carcinoma (HGSC) QuantISH cancer cell classification has high precision, and signal expression quantification is in line with visual assessment. We further demonstrate the power of QuantISH by showing that CCNE1 average expression and DDIT3 expression variability, as captured by the variability factor developed herein, act as candidate biomarkers in HGSC. Altogether, our results demonstrate that QuantISH can quantify RNA expression levels and their variability in carcinoma cells, and thus paves the way to utilize RNA-ISH technology.


RNA in situ hybridization and imaging
Chromogenic RNA-ISH: We used freshly cut 3 μm sections of the TMAs and RNAscope 2.5 HD detection kit-BROWN (#322310, ACDBio, Milano, IT) for target mRNA detection. TMA sections were baked for 1 h at 56 °C, deparaffinized and treated with hydrogen peroxide for 10 min at room temperature. Target retrieval was performed for 15 min at 98 °C, followed by protease plus treatment for 15 min at 40 °C. Target probes Hs-DDIT3-C1 (#311131), Hs-CCNE1 (#470021), as well as the positive and negative control probes Hs-PPIB (#313901) and DapB (#310043), respectively, were hybridized for 2 h at 40 °C followed by signal amplification steps. The samples were incubated for 60 min with AMP 5 reagent. The sections were next treated with DAB for 10 min at room temperature followed by counterstaining with 50% hematoxylin. The sections were dipped in ammonium water and dehydrated before mounting. All TMAs were digitalized using 3DHISTECH Pannoramic 250 FLASH II digital slide scanner.
Fluorescent RNA-ISH: RNA in situ hybridization was performed on freshly cut 3 μm formalin-fixed paraffin embedded (FFPE) tissue sections using RNAScope Multiplex Fluorescent Reagent Kit Version 2 for target detection (#323100, Advanced Cell Diagnostics) according to the manual. In short, tissue sections were baked for 1 h at 60 °C, then deparaffinized and treated with hydrogen peroxide for 10 min at room temperature (RT). Target retrieval was performed for 15 min at 98 °C, followed by protease plus treatment for 15 min at 40 °C. All RNAScope probes (Table S1) were hybridized for 2 h at 40 °C followed by signal amplification, and developing of HRP channels was performed according to the manual. TSA Plus fluorophores fluorescein (1:750 dilution), Cyanine 3 (1:1500 dilution), and Cyanine 5 (1:3000 dilution) (NEL744001KT, Perkin Elmer) were used for signal detection. The sections were counterstained with DAPI, and mounted with ProLong Gold Antifade Mountant (P36930, Invitrogen). Images were generated using 3DHISTECH Pannoramic of Medicine, University of Helsinki, and Biocenter Finland. All samples were scanned using 40x magnification with extended focus and 7 focus levels.  Figure 2).
Tiling the DAPI images: As the full whole slide images are too resource demanding for CellProfiler segmentation, the DAPI channel images were cropped into four tiles, each of which was segmented separately and assembled back into a whole slide segmentation for downstream analysis.
Cell segmentation: We used CellProfiler software (version 3.1.8) for segmenting the DAPI staining.
The non-default parameters that were determined experimentally were as follows: typical diameter 18 to 56 pixels, thresholding using adaptive Otsu's method (i.e. local not global), clumped object detection and splitting using shape (i.e. not intensity), and low-resolution speedups were disabled.
An example is shown in Supplementary Figure 5A.
Cell type classification: The segmented objects were classified into cancer, immune, and stromal cells using the DAPI staining and its segmentation. For this, we extracted the area, the mean nucleus stain intensity, and the eccentricity of each segmented object (Supplementary Figure 5B).
These features were selected based on the same logic as for the chromogenic images. Subsequently, we trained a supervised quadratic classifier using 402 cells with the properties mentioned above and the desired cell types (Supplementary Figure 5C). Afterwards, the cell types were predicted in untrained images (Supplementary Figure 5D). In order to improve the classification accuracy, the classification involved one artifact class in addition to the cancer, immune, and stromal classes, which was trained to represent the small segmentation artifacts in classification. The classifier was implemented in MATLAB and was trained with uniform class priors. The classification was filtered using spatial information from the neighboring cells, as described for the chromogenic images in the main manuscript.
RNA signal quantification: Cross-channel fluorescence bleed of Cy5, FITC, and TRITC staining was reduced by finding a suitable basis for the intensity data of all pixels near the principal axes (channels) using power iteration. Note that this is done at the pixel level, not at the cellular level.
Since some RNA signals are localized in the cytoplasm of the cells, we also expanded the segmentation corresponding to the cancer, immune, and stromal nuclei to include the cellular cytoplasm, as described for the chromogenic images in the main manuscript. Next, the fluorescence intensity signal was quantified using the negative response of a Laplacian of Gaussian filter with standard deviation of unity. The value was tuned manually, and the kernel width roughly corresponds to the diameter of an observed RNA spot in our images. This procedure filters out background variations and cellular autofluorescence, leaving intensity blobs of the specified size.

Variability factor formulation
The weighted mean expression and its weighted variance for each TMA (whole slide) spot/patient are computed as follows: log v ∼ α 1 log m+α 0 and obtained the estimated parameters α 1 , α 0 of the linear regression. Following this, the expected log-variance due to the mean is: which is then regressed out from the variability measurements: log v factor :=log v −log v that is: v factor ∼v / mα 1 which is a generalization of the Fano factor (α 1 =1 ), and also of the coefficient of variation (α 1 =2 ), and the variance (α 1 =0 ), with arbitrary power-law relationship between the variance and the mean.
Once the relationship has been configured, the variability factor can be computed for individual samples, just like the mean and the variance.  Figure 2C) or the DDIT3 variability (cf. Figure 4C), both of which correlate with the platinum-free interval. Figure 7. CCNE1 variability factor does not significantly change over three