Image-based computational quantification and visualization of genetic alterations and tumour heterogeneity

Recent large-scale genome analyses of human tissue samples have uncovered a high degree of genetic alterations and tumour heterogeneity in most tumour entities, independent of morphological phenotypes and histopathological characteristics. Assessment of genetic copy-number variation (CNV) and tumour heterogeneity by fluorescence in situ hybridization (ISH) provides additional tissue morphology at single-cell resolution, but it is labour intensive with limited throughput and high inter-observer variability. We present an integrative method combining bright-field dual-colour chromogenic and silver ISH assays with an image-based computational workflow (ISHProfiler), for accurate detection of molecular signals, high-throughput evaluation of CNV, expressive visualization of multi-level heterogeneity (cellular, inter- and intra-tumour heterogeneity), and objective quantification of heterogeneous genetic deletions (PTEN) and amplifications (19q12, HER2) in diverse human tumours (prostate, endometrial, ovarian and gastric), using various tissue sizes and different scanners, with unprecedented throughput and reproducibility.


Supplementary Methods
For the circular Hough transform, the signal radius range was defined empirically as 1 to 7 pixels according to domain knowledge, and the edge gradient threshold was left at the Matlab default (Otsu's method). The detection sensitivity was left at the Matlab default (0.85) for tissues scanned by the Zeiss scanner and was set to 0.95 for tissues digitized by the Hamamatsu scanner, because the Zeiss scanner has a higher scanning resolution and a more advanced image sensor.
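The voting principle behind a circular Hough transform can be illustrated with a minimal pure-Python sketch. This is not the Matlab `imfindcircles` routine used above: the function names `hough_circle_votes` and `synthetic_circle`, the angular sampling, and the synthetic edge points are assumptions made for illustration, and the sensitivity and edge-threshold parameters are not modelled.

```python
import math
from collections import Counter


def hough_circle_votes(edge_points, radius_range, n_angles=72):
    """Accumulate circular Hough votes for candidate (cx, cy, r) triples.

    edge_points  -- iterable of (x, y) integer edge-pixel coordinates
    radius_range -- (r_min, r_max), inclusive, in pixels
    """
    r_min, r_max = radius_range
    votes = Counter()
    for x, y in edge_points:
        for r in range(r_min, r_max + 1):
            seen = set()  # an edge pixel votes at most once per (centre, r)
            for i in range(n_angles):
                theta = 2.0 * math.pi * i / n_angles
                cx = round(x - r * math.cos(theta))
                cy = round(y - r * math.sin(theta))
                if (cx, cy) not in seen:
                    seen.add((cx, cy))
                    votes[(cx, cy, r)] += 1
    return votes


def synthetic_circle(cx, cy, r, n_points=72):
    """Edge pixels of a synthetic circle (hypothetical test data)."""
    return {
        (round(cx + r * math.cos(2.0 * math.pi * i / n_points)),
         round(cy + r * math.sin(2.0 * math.pi * i / n_points)))
        for i in range(n_points)
    }


# A roundish signal of radius 5 pixels, searched over the 1-7 pixel range.
edges = synthetic_circle(20, 20, 5)
votes = hough_circle_votes(edges, radius_range=(1, 7))
best, best_count = votes.most_common(1)[0]
```

In this sketch, the candidate with the most votes recovers the centre and radius of the synthetic signal. A practical detector would additionally threshold the accumulator, which is the role a sensitivity setting plays.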
The SVM model was trained and validated (5-fold cross-validation and a grid search that iterates over all pairs of C and γ) on an independent image set from a single tissue spot with two sets of expert annotations (one for HER2 and the other for the remaining genes), consisting of 1000 image patches of size 13 × 13 pixels with PTEN, CEP10, PTEN+CEP10, white (background noise) and blue (cell stains) signals in the center of the patch (Supplementary Fig. S6). The feature vector was constructed by concatenating (13 × 13 = 169) RGB values. To reduce the number of misclassified signals, only gene and corresponding CEP signals were used for subsequent calculations; signals classified as white or blue were discarded. The maximum of the global ratio was set to three to circumvent false positive gene signals due to unspecific staining (any roundish black signals) in cases with gene deletion.
For prostate and ovarian cancers, each whole slide image was tiled into sub-images, in which we used the same parameter settings as the TMA for the circular Hough transform and the SVM model to detect and classify gene and CEP signals. A signal colormap was then drawn for each sub-image.
By merging signal colormaps of all sub-images, the complete signal colormap of the whole slide was generated. A three-dimensional bar graph was plotted for visualizing intra-tumor heterogeneity, where each bar represents a sub-image. The same workflow of the ISHProfiler was also applied to a whole slide of a gastric cancer tissue stained with DISH probes for HER2/CEP17. To re-calibrate the molecular signal intensities of HER2/CEP17, we used a different expert annotation of the same training data for SVM training and cross-validation.
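The tiling step can be sketched as a grid of per-sub-image ratios, one value per bar of the three-dimensional bar graph. This is a minimal sketch under stated assumptions: the function name `tile_ratio_grid`, the `(x, y, kind)` signal format and the `"gene"`/`"cep"`/`"both"` labels are hypothetical, the 2000-pixel tile size is taken from the sub-image dimension given in a supplementary figure caption, and reuse of the cap of three per sub-image is assumed.

```python
def tile_ratio_grid(signals, width, height, tile=2000, max_ratio=3.0):
    """Per-sub-image gene-to-CEP ratios for a whole slide image.

    signals -- list of (x, y, kind), kind in {"gene", "cep", "both"},
               with pixel coordinates inside the width x height image
    Returns a row-major grid; a cell is None if the tile has no CEP signal.
    """
    n_cols = (width + tile - 1) // tile   # ceiling division
    n_rows = (height + tile - 1) // tile
    gene = [[0] * n_cols for _ in range(n_rows)]
    cep = [[0] * n_cols for _ in range(n_rows)]
    for x, y, kind in signals:
        row, col = int(y) // tile, int(x) // tile
        if kind in ("gene", "both"):
            gene[row][col] += 1
        if kind in ("cep", "both"):
            cep[row][col] += 1
    return [
        [min(gene[r][c] / cep[r][c], max_ratio) if cep[r][c] else None
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical detections on a 6000 x 2000 pixel slide (three sub-images).
signals = [
    (100, 100, "gene"), (200, 150, "cep"), (300, 300, "cep"),
    (2500, 100, "both"), (2600, 400, "gene"),
]
grid = tile_ratio_grid(signals, width=6000, height=2000)
```

Each grid cell corresponds to one sub-image, so plotting the grid as bar heights gives a view of how the ratio varies across the slide.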
Algorithm: An image-based computational workflow: ISHProfiler
input: A digital image $I$ (tissue core or whole slide); radius range: radiusRange; detection sensitivity: sensitivity; neighborhood distance: radius; $K$ random points: $P := \{p_k\}_{k=1}^{K}$; a trained and validated SVM.
output: The global ratio, randomized local ratio (RLR), randomized local density (RLD), and a signal colormap.
1 Detect GENE and CEP signals by circular Hough transform: imfindcircles(I, radiusRange, sensitivity). Get positions of the detected signals.
9 Calculate the ratio of GENE to CEP in the neighborhood: $r_k = \frac{|G_k| + |Z_k|}{|C_k| + |Z_k|}$, where $|\cdot|$ denotes the cardinality of a set.
10 Calculate the total number of GENE and CEP in the neighborhood: $d_k = |G_k| + |C_k| + |Z_k|$.
11 end
12 Save all $r_k$, $d_k$, $k = 1, 2, \ldots, K$.
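The per-neighborhood quantities $r_k$ and $d_k$ can be sketched in Python as follows. This is a hedged illustration rather than the ISHProfiler code: the function name `randomized_local_stats` is hypothetical, $Z_k$ is assumed to be the set of co-localized gene+CEP signals, and each neighborhood is centred on the CEP point closest to a random point, as described in the caption of Supplementary Figure S8.

```python
import math
import random


def randomized_local_stats(gene_pts, cep_pts, coloc_pts, random_pts, radius):
    """Return the lists (r_k) and (d_k) for the given random points.

    gene_pts, cep_pts, coloc_pts -- lists of (x, y) signal positions
    random_pts                   -- the K random points p_k
    radius                       -- neighborhood distance in pixels
    """
    ratios, densities = [], []
    if not cep_pts:
        return ratios, densities  # no CEP point to centre a neighborhood on

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    for p in random_pts:
        # Neighborhood centre: the CEP point nearest to the random point.
        centre = min(cep_pts, key=lambda c: dist(c, p))
        g = sum(1 for q in gene_pts if dist(q, centre) <= radius)   # |G_k|
        c = sum(1 for q in cep_pts if dist(q, centre) <= radius)    # |C_k| >= 1
        z = sum(1 for q in coloc_pts if dist(q, centre) <= radius)  # |Z_k|
        ratios.append((g + z) / (c + z))
        densities.append(g + c + z)
    return ratios, densities


# Hypothetical signal positions and K = 100 reproducible random points.
gene_pts = [(1, 0), (0, 1), (50, 50)]
cep_pts = [(0, 0), (100, 100)]
coloc_pts = [(0, 2)]
rng = random.Random(0)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
r, d = randomized_local_stats(gene_pts, cep_pts, coloc_pts, points, radius=5)
```

Because every neighborhood is centred on a CEP point, the denominator of $r_k$ is at least one, so the ratio is always defined in this sketch.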

Global model
Supplementary Figure S3. Estimation of overall survival hazard ratios by Cox regression. The dashed vertical line is drawn at the no-effect point (hazard ratio of 1.0). Horizontal lines represent 95% confidence intervals. The mid-point of each box represents the mean effect estimate and the area of the box represents the weight of each subgroup. P values < 0.05 are marked in bold. The limit for the stepwise reverse selection procedure was P = 0.1.

Supplementary Figure S4. Sample tissue cores with IHC and ALU. (a-d) Examples of representative tissue cores (diameter 0.6 mm) with negative (score 0) and strongly positive (score 3+) immunoreactivity for antibodies against PTEN and ERG. (e,f) Examples of representative tissue cores with weak and strongly positive ALU SISH as marker for DNA viability. If ALU staining is weak or negative, the DNA is not viable (no target for staining). A total of 13 cores (out of 84) were excluded from further analyses because of unviable DNA, lack of target tissue, or weak CEPs.
Table S3). AUC stands for area under the curve.
Supplementary Figure S8. Randomized neighborhood. The right images are zoomed versions of the left images (diameter 0.6 mm), superimposed with detected points (drawn as squares, color-coded as in Fig. 1g) and random neighborhoods. A neighborhood is represented as a circle with a predefined radius. The center of such a circle is the CEP10 point that lies closest to a random point (in blue).
(Supplementary Fig. S11b). The height and the color of the 3D bar graph encode the global ratio of 19q12 to CEP19 in the respective sub-images. Each sub-image has a dimension of 2000 × 2000 pixels.
Supplementary Figure S13. Post hoc power analysis estimating power versus N for different hazard ratios. For instance, a two-sided log-rank test with an overall sample size of 100 subjects achieves 60.1% power at a 0.05 significance level to detect a hazard ratio of 1.70 (red dots) when the control group has a hazard ratio of 1.0. All subjects begin the study together (no accrual periods). The proportion dropping out of both groups is 0.05.

Supplementary Tables
Clinicopathological, immunohistochemical and molecular features of prostate cancer patients with RPE