Introduction

Cancer is the second leading cause of death worldwide after cardiovascular diseases. A major problem in cancer therapy is the development of resistance to current treatments. The combination of in-vitro Drug Sensitivity and Resistance Test (DSRT) and multi-omics data from individual tumors helps determine the appropriate therapy for patients. In modern biotechnology, miniaturized and parallelized systems have become indispensable tools for high-throughput assays13,14. Such systems offer a higher degree of spatio-temporal handling and reduce the workload by up to a thousand-fold while significantly increasing the number of samples per run, thereby improving assay performance. Miniaturized high-throughput technology is typically built with grid structures consisting of hundreds of elements, such as the Microplate1, the Microwell-mesh2, the Agarose Microwells3, the Microwell Array Chips4, or the Droplet Microarray (DMA)5 (cf. Fig. 1). Such systems are used, for instance, to test cells for their sensitivity to anti-cancer drugs, which could ultimately be utilized for personalized oncology predictions15.

Figure 1

Grid Types. Crops of the grid types (a) Microplate1, (b) Microwell-mesh2, (c) Agarose Microwells3, (d) Microwell Array Chips4, and (e,f) DMA5 are shown.

Imaging techniques such as fluorescence microscopy or scanners are often used to visualize and interpret the results of experiments. However, despite the advantages mentioned above, these miniaturized high-throughput systems also have limitations. The generated data is substantial and needs to be interpreted. In the project “Screening Platform for Personalized Oncology” (SPPO), automated image processing is therefore essential, as manual analysis is time-consuming and impairs reproducibility.

Imaging data and corresponding objectives in the analysis of high-throughput experiments are complex and heterogeneous, as shown in Fig. 2. These characteristics lead to various challenges that complicate fully automated image data processing: (1) Traditional image processing algorithms implemented in common software packages16,17 cannot be used for accurate image analysis by default. (2) Domain experts must become familiar with the development/implementation of custom software solutions. (3) In particular, approaches that focus on grid-like structures are not yet available. (4) High-resolution images combined with many experiments require substantial computational resources.

Figure 2

Heterogeneous Image Processing Data and Tasks of High-throughput Experiments. Exemplary tasks, such as (a) the detection of cell nuclei in fluorescence images, (b) the intensity analysis of spots in fluorescence6, (c) the detection of spheroids in bright-field or fluorescence images7, (d) the analysis of colorimetric scans, (e) drug screening via touch response analysis of zebrafish larvae8, (f) the detection of microglia, (g) the detection of cells in light microscopy9, (h) invertebrate biodiversity analysis10, (i) the inspection of skin lesions11, or (j) histopathology image analysis12, are shown to illustrate the heterogeneity and complexity of high-throughput experiments. To show the generalization of our concept, the presented data is not limited to the SPPO project.

Deep Learning (DL) is well suited to solving such heterogeneous problems18,19. Nevertheless, several open problems complicate the application of Deep Neural Networks (DNNs). There are many individual DL algorithms, but a comprehensive and extensible concept has yet to be developed. In addition, approaches must be tailored to high-throughput tasks, as given in the SPPO project, which deals with the processing of grid-like structures. An automated grid estimation (macro level) and flexible spot-wise processing (micro level) are required. For instance, the analysis of cell nuclei requires more than a single instance segmentation approach. The calculation of cell viability necessitates a fusion algorithm or a description of cell features, such as the area or eccentricity.

The scarcity of publicly available annotated datasets for individual high-throughput experiments poses a challenge for the supervised training of DNNs18. Currently, there are no comprehensive methods to increase the efficiency and improve the quality of annotations20. In addition, an open challenge in the SPPO project is to improve supervised learning processes to reduce the amount of annotated samples required. High-throughput experiments typically demand expertise from multiple disciplines. A workflow is needed that combines the knowledge of the experimenters with the expertise of data scientists. Experimenters often have neither access to GPU hardware nor programming skills. Usability problems, software requirements, or the need for specialized hardware21 prevent experimenters from applying automated algorithms.

For interdisciplinary projects, we contribute a comprehensive concept for automated image processing in the context of high-throughput grid data. Methods for assisted image annotation, algorithms for high-throughput image processing, alternative learning processes, and the deployment of pipelines are considered. We also present details on the implementation, computation, and integration of High-Performance Computing (HPC) used to promote the application of our image and data analysis methods. In addition, we show excerpts of results from SPPO and other projects, using heterogeneous image data to demonstrate the generalization ability of our pipeline.

In Sect. “Screening platform for personalized oncology”, we give a brief overview of SPPO. Further, the workflow and methods for automated image processing are described in Sect. “Methods”. Then, an account of computation and implementation is given in Sect. “Computation and implementation”. Moreover, Sect. “Results” shows the results obtained, followed by a discussion (cf. Sect. “Discussion”). Finally, we conclude our work in Sect. “Conclusion”.

Screening platform for personalized oncology

DMA is a miniaturized biocompatible platform based on superhydrophilic-superhydrophobic micropatterns on a thin coating layer. A standard glass microscope slide forms the basic structure. The difference in wettability between hydrophilic and superhydrophobic regions allows the formation of nanodroplets from aqueous solutions that remain stable on hydrophilic spots without physical barriers. This technology has proven its versatility in various research areas. For example, it has been used as a platform for combinatorial organic synthesis22 and hydrogel synthesis23. Additionally, it has been utilized for advancements in various fields of biology, such as artificial multicellular systems24, drug screening25,26, embryoid body screening27, and DNA delivery28. The results of these studies suggest that this arrangement of open nanodroplets offers several advantages over standard methods, including defined nanoliter-sized compartments, easy and fast access to spots, and low sample/reagent consumption. As a result, the DMA is considered a viable system for personalized medicine. Personalized medicine is one of the main medical directions of our time. However, due to the intrinsic heterogeneity of malignant tumors, the “one size fits all” therapeutic approach is not always efficient; the genetic characteristics of each patient should be taken into account. So-called Patient-Derived Organoids (PDOs) can be established from the primary tumor tissue of a patient and reflect the tumor heterogeneity to a large extent in an ex-vivo setting29. Studies have shown that therapeutic reagents with a positive effect on PDOs can also be used effectively for the respective patients29. Unfortunately, this type of personalized medicine is very time-consuming and expensive. Here, the DMA platform offers a unique opportunity to combine classical cell biology methods, including cell culture and treatment, with molecular biology techniques, such as proteome and genome extraction protocols, on a single chip (cf. Fig. 1). Another advantage is that the DMA platform can be used for cell-based screening5. Testing the susceptibility of patient-derived cancer cells to anti-cancer drug treatment in vitro in so-called DSRT is a major clinical development. Using DMA as an SPPO is considered an efficient screening method that allows a dose response to be evaluated for each anti-cancer compound per droplet. In addition, this method demonstrates the potential of a highly miniaturized DSRT for personalized medicine, using only a tiny amount of primary patient-derived cells15.

Methods

Overall concept

The proposed concept for automated image processing within SPPO is illustrated in Fig. 3. Raw image data and expert knowledge form the starting point of the image processing problem. First, the data is divided into an application dataset and a not-yet-annotated training dataset, the latter being relevant for further machine learning investigations. Then, the assisted annotation, which focuses on annotator variability and increased efficiency, forms the basis for generating a partially annotated dataset from expert knowledge. A detailed presentation of this can be found in Sect. “Assisted annotation”.

Figure 3

Overall Concept for Information and Data Processing. Raw image data is split into an application dataset and a not-yet-annotated training dataset. Considering efficiency and annotator variability, a partially annotated training dataset is then created using expert knowledge. The partially annotated dataset is the starting point for parameterizing the developed algorithms for high-throughput image processing, considering both the macro and the micro level. The final image processing pipeline emerges after the learning process is complete. Then, in the application deployment step, the pipeline can be rolled out to generate interpreted image data for the remaining application dataset or emerging data. In general, solutions related to implementation and computation are required in all concept elements.

Furthermore, tailor-made image processing algorithms for the high-throughput application form another concept component, which is explained in Sect. “Automated high-throughput image processing”. Both the macro and micro levels are addressed in the context of image processing. The macro level deals with the analysis of the grid, while the micro level considers the individual elements of the grid.

Another part of the concept investigates advanced learning methods as alternatives to state-of-the-art supervised learning approaches. The focus is on parameterizing robust image processing pipelines with as little annotated data as possible.

Finally, the concept deals with deploying algorithms to make them useful for researchers in practice. Here, we help to bridge the gap between the development and usage of image processing pipelines. This element relates to the implementation and computation component (cf. Sect. “Computation and implementation”), which focuses on implementing all methods in real-world applications.

Assisted annotation

Figure 4 presents an overview w.r.t. the proposed method for assisted annotation, which is based on the publications of Schilling et al.20,30,31.

Figure 4

Assisted Annotation. The workflow of the proposed assisted annotation approach is visualized20. Via “Selection”, the most informative samples are obtained. “Pre-Processing” aims to make annotation easier for users. “Pre-Annotation” provides an initial annotation that a user adapts during “Image Annotation”. The remaining errors are managed by “Post-Annotation-Processing” (integrating prior knowledge to optimize annotation quality automatically) and “Annotation Inspection” (triggering warnings in case of a violated quality criterion).

A raw image dataset obtained during data acquisition is supplied with meta-information that is essential for scientific data management. Inspired by deep active learning, the component “Selection” allows users to influence the order of images presented to them. The goal is to focus on the most promising images. When selecting samples, for instance, methods consider the heterogeneity of samples or the uncertainty of an already available DNN. There are various scenarios where the annotation of raw images is a challenge. The pre-processing element addresses this issue. It enables inherent image pre-processing, e.g., normalizing or cropping images, to simplify annotation for users. The concept of pre-annotation incorporates available algorithms (i.e., pre-trained DNNs or traditional image processing algorithms such as Otsu thresholding) that can serve as heuristics and provide an initial estimate. Therefore, users only need to correct the predicted annotations. Adapting the input interface to the image processing tasks guarantees task independence. The post-processing of annotations directly avoids recurring errors by incorporating prior knowledge, e.g., that there are no holes in segments.
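To make the pre-annotation and post-annotation-processing steps concrete, the following minimal sketch shows how a heuristic Otsu pre-annotation and a hole-filling prior could be realized. The use of scikit-image/SciPy and the function names are illustrative assumptions, not the actual KaIDA implementation.

```python
# Minimal sketch of Otsu-based pre-annotation and hole-filling post-processing.
# Library choices and function names are illustrative, not the KaIDA implementation.
import numpy as np
from skimage.filters import threshold_otsu
from scipy.ndimage import binary_fill_holes


def pre_annotate_otsu(image: np.ndarray) -> np.ndarray:
    """Heuristic pre-annotation: foreground/background split via Otsu's threshold."""
    threshold = threshold_otsu(image)
    return image > threshold


def post_process_annotation(mask: np.ndarray) -> np.ndarray:
    """Incorporate prior knowledge, e.g., that segments contain no holes."""
    return binary_fill_holes(mask)
```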

Besides, the annotator variability problem is considered (cf. annotation inspection). Quality criteria, like a comparison of the annotation and prediction of an inspection DNN, can be evaluated to trigger warnings on suspected errors. Since the form of support is project-dependent, extensions or customizations can be integrated due to the generic design. In addition, we integrate dataset version control capability to track the dataset history and simplify the transfer to data servers. Finally, plugins consider neighboring image annotation activities and integrate elements, such as DNN training, the application of trained DL pipelines, or the post-processing of predictions by DNNs.
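As an illustration of such a quality criterion, the following hedged sketch compares a user annotation with the prediction of an inspection DNN via the Dice coefficient and triggers a warning below a threshold. The threshold value and function names are assumptions chosen for demonstration, not the actual inspection logic.

```python
# Sketch of an annotation-inspection criterion based on the Dice coefficient.
# The warning threshold is an illustrative assumption.
import numpy as np


def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity between two binary masks."""
    intersection = np.logical_and(a, b).sum()
    return 2.0 * intersection / (a.sum() + b.sum() + 1e-8)


def inspect_annotation(user_mask: np.ndarray, dnn_mask: np.ndarray,
                       warn_below: float = 0.5) -> bool:
    """Warn if the user annotation deviates strongly from the inspection DNN
    prediction (a possible, but not certain, annotation error)."""
    score = dice_coefficient(user_mask > 0, dnn_mask > 0)
    if score < warn_below:
        print(f"Warning: annotation/prediction agreement is low (Dice = {score:.2f}).")
        return True
    return False
```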

Automated high-throughput image processing

Figure 5 shows the developed approach for automated high-throughput image processing based on the work of Schilling et al.32. The method is composed of two components: (i) automated grid parameter estimation (macro level, cf. Sect. “Automated high‑throughput image processing”) and (ii) spot-wise processing (micro level, cf. Sect. “Automated high‑throughput image processing”), which are discussed below.

Figure 5

Automated High-throughput Image Processing. The grid parameters are determined at the macro level by pre-processing an input image to enhance the subsequent element detection based on DL, using the post-processed predicted segments to estimate/correct the grid rotation (cf. Sect. “Automated high‑throughput image processing”). Besides, semi-automated grid estimation (dashed) is possible. The macro level is followed by spot-wise processing (cf. Sect. “Automated high‑throughput image processing”) at the micro level32.

Grid detection

First, the input image is pre-processed: conversion to grayscale allows the processing of different image types, and normalization improves the performance of the subsequent DNN for spot detection. Next, a DNN is used for segmentation to distinguish between pixels belonging to elements and the background. Various post-processing operations, such as smoothing with morphological operators or filtering based on the area of an element, further improve the robustness of the detected elements. A sophisticated algorithm that considers the arrangement of neighboring elements is used to estimate and correct the rotation of the grid. Subsequently, the grid parameters can be obtained using robust parameter estimation approaches. In addition, a semi-automated grid estimation enables the estimation of grid parameters in cases where the automated approach fails, e.g., under challenging imaging conditions. However, this method requires a subset of the grid parameters as user input.
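To give an impression of how a neighborhood-based rotation estimate can work in principle, the following sketch folds the nearest-neighbor directions of the detected spot centroids into a 90-degree interval and takes their median. This is a simplified illustration assuming a nearly regular grid with a small rotation; it is not the Grid Screener algorithm.

```python
# Sketch of a robust grid-rotation estimate from detected spot centroids.
# Assumes a nearly regular, only slightly rotated grid; illustrative only.
import numpy as np
from scipy.spatial import cKDTree


def estimate_grid_rotation(centroids: np.ndarray) -> float:
    """Estimate the grid rotation in degrees from nearest-neighbour directions.

    centroids: (N, 2) array of (y, x) spot centres. Adjacent spots lie roughly
    along the grid axes, so the nearest-neighbour angles, folded into
    [-45, 45) degrees, cluster around the rotation angle.
    """
    tree = cKDTree(centroids)
    _, idx = tree.query(centroids, k=2)       # k=1 is the point itself
    vectors = centroids[idx[:, 1]] - centroids
    angles = np.degrees(np.arctan2(vectors[:, 0], vectors[:, 1]))
    folded = (angles + 45.0) % 90.0 - 45.0    # remove the 90-degree ambiguity
    return float(np.median(folded))           # median as robust estimator
```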

Selected methods for spot-wise processing

Automated object detection Object detection is a common problem in biomedical applications. For example, counting the number of cells or describing the properties of organoids are of particular interest.

Figure 6 presents our proposed pipeline for automated object detection inspired by Scherr et al.33. First, the input image is pre-processed by normalization to obtain a reasonable range of values for the subsequent DNN. In the case of instance segmentation, the DNN predicts Euclidean distance maps that are post-processed using a seed-based watershed algorithm to obtain objects. In semantic segmentation, the prediction of the DNN is post-processed by selecting the class with the highest output value. A feature extraction step follows to export various object properties, such as the area, the position of the centroid, or the length of the major/minor axis.

Figure 6

Automated Object Detection Pipeline. The input image is pre-processed before being fed into a DNN. Then, the DNN prediction is post-processed to obtain objects described by different properties using feature extraction.
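The following minimal sketch illustrates the seed-based watershed post-processing on a predicted distance map described above. The threshold values and the scikit-image calls are our assumptions and only approximate the pipeline.

```python
# Sketch of seed-based watershed post-processing on a DNN-predicted distance map.
# Threshold values are illustrative assumptions.
import numpy as np
from skimage.measure import label
from skimage.segmentation import watershed


def instances_from_distance_map(dist_map: np.ndarray,
                                seed_th: float = 0.7,
                                mask_th: float = 0.1) -> np.ndarray:
    """Turn a normalized cell-distance prediction into labelled instances."""
    seeds = label(dist_map > seed_th)     # one marker per predicted object centre
    foreground = dist_map > mask_th       # region to be partitioned
    return watershed(-dist_map, markers=seeds, mask=foreground)
```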

Automated cellular viability analysis Ex-vivo drug screening assays test the cellular viability after drug treatment at appropriate dilutions. The proposed novel image processing pipeline for cell viability analysis is shown in Fig. 7. The multidimensional input image is processed channel by channel using the instance segmentation approach presented above (cf. Sect. “Automated high‑throughput image processing”). Hoechst stains the nuclei of all cells, dead or alive, giving the total number of cells. Calcein indicates only living cells, and PI stains the nuclei of cells with a ruptured cell membrane (dead cells). However, some cells appear to stain faintly for Calcein while already being positive for PI.

Figure 7

Automated Cellular Viability Analysis Pipeline. A multidimensional input image is processed with the presented approach of instance segmentation. A fusion step is required to obtain cellular viability.

In a fusion step, the information is merged. Using the k-nearest-neighbor algorithm, matching instances between the different staining channels are obtained. Then, cellular viability can be calculated by comparing the total number of cells with the instances that meet the specified criteria (Hoechst/Calcein positive, but PI negative).
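A simplified version of this fusion step could look as follows: matching is done on instance centroids with a nearest-neighbor search, and viability is the fraction of Hoechst-positive cells with a Calcein match but no PI match. The distance threshold and the SciPy-based implementation are illustrative assumptions.

```python
# Sketch of the fusion step: nearest-neighbour matching of instance centroids
# across staining channels. The distance threshold is an illustrative assumption.
import numpy as np
from scipy.spatial import cKDTree


def cellular_viability(hoechst_centroids: np.ndarray,
                       calcein_centroids: np.ndarray,
                       pi_centroids: np.ndarray,
                       max_dist: float = 5.0) -> float:
    """Fraction of Hoechst-positive cells that are Calcein positive and PI negative."""

    def has_match(query: np.ndarray, reference: np.ndarray) -> np.ndarray:
        if len(reference) == 0:
            return np.zeros(len(query), dtype=bool)
        dist, _ = cKDTree(reference).query(query, k=1)
        return dist <= max_dist

    calcein_pos = has_match(hoechst_centroids, calcein_centroids)
    pi_pos = has_match(hoechst_centroids, pi_centroids)
    alive = np.logical_and(calcein_pos, ~pi_pos)
    return alive.sum() / max(len(hoechst_centroids), 1)
```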

Colorimetric analysis In many high-throughput experiments, colorimetric analysis of scanner images can be used to draw conclusions about experimental results. The objective of image processing is to automatically calculate a quantitative metric per spot element that correlates with the output variable of interest; via a calibration measurement, the metric can then be converted into this variable. An abstract transformation function is used to convert the RGB values of an element into a scalar, which is utilized to quantify the color of the element. The transformation function depends strongly on the particular experiment and the given color spectrum. Examples are color space transformations or individual linear combinations of the RGB values.
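As an illustration, the following sketch maps the mean RGB value of a spot crop to a scalar metric via a linear combination and fits a linear calibration against known reference values (e.g., pH). The channel weights and the linear calibration model are assumptions chosen for demonstration; the actual transformation is experiment-specific, as stated above.

```python
# Sketch of a colorimetric quantification with a linear calibration.
# Channel weights and the linear model are illustrative assumptions.
import numpy as np


def spot_metric(spot_rgb: np.ndarray, weights=(1.0, -1.0, 0.0)) -> float:
    """Scalar colour metric of one spot crop (H x W x 3, RGB)."""
    mean_rgb = spot_rgb.reshape(-1, 3).mean(axis=0)
    return float(np.dot(weights, mean_rgb))


def calibrate(metrics: np.ndarray, reference_values: np.ndarray):
    """Fit a linear calibration metric -> reference value (e.g., known pH)."""
    slope, intercept = np.polyfit(metrics, reference_values, deg=1)
    return lambda m: slope * m + intercept
```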

Deployment

There are special requirements within the interdisciplinary SPPO project. The automated image processing pipeline should be usable by experimenters to avoid cumbersome workflows and provide direct image analysis without waiting time. Therefore, a user-friendly framework is needed that does not require special hardware, works out of the box, and does not require complicated software installation procedures or dependencies on commercial solutions.

Our deployment framework proposes to provide cross-platform, open-source software packages, including Graphical User Interfaces (GUIs) and user manuals. The deployment of the software packages consists of four parts: 1) A local installation on a user device is possible. 2) We propose the establishment of an image processing server equipped with the required software/hardware that can be accessed via a remote desktop connection, independent of the local device. This solution is designed for basic computations such as assisted image annotation. 3) We provide a RestAPI solution that connects the image processing server to a single Graphics Processing Unit (GPU) for medium-scale computations like inference steps of newly collected experimental data.

4) An integrated GUI-based submission system that interfaces with the HPC workload manager SLURM allows experimenters to use HPC resources without coding. Having access to an HPC cluster brings several advantages. For instance, training a DNN on a single GPU instead of a CPU, or on multiple GPUs using data-parallel training approaches, are HPC use cases that shorten the training duration34. In addition, the cluster enables high-throughput computing because multiple computational nodes are equipped with several GPUs. The benefit from the perspective of high-throughput computing emerges when considering parallel independent hyperparameter optimizations or DNN inference. The computational benefit scales approximately linearly with the number of devices available34.
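To illustrate how such a GUI-triggered submission can be realized in principle, the following sketch fills a SLURM batch-script template and hands it to sbatch. The partition name, resource values, and the training script are placeholders, not the settings used on HoreKa or the KaIDA implementation.

```python
# Sketch of a programmatic SLURM submission. Partition, resources, and the
# training script ("train.py") are hypothetical placeholders.
import subprocess
import tempfile

SBATCH_TEMPLATE = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --gres=gpu:{gpus}
#SBATCH --time={walltime}
python train.py --config {config}
"""


def submit_training_job(config: str, partition: str = "accelerated",
                        gpus: int = 1, walltime: str = "12:00:00") -> str:
    """Write a batch script and submit it; returns the sbatch confirmation line."""
    script = SBATCH_TEMPLATE.format(partition=partition, gpus=gpus,
                                    walltime=walltime, config=config)
    with tempfile.NamedTemporaryFile("w", suffix=".sbatch", delete=False) as f:
        f.write(script)
        path = f.name
    result = subprocess.run(["sbatch", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()  # e.g., "Submitted batch job 123456"
```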

In addition, our solutions consider a global data server for seamless data exchange between all computing systems. For further details, refer to the publications20,32,34.

Computation and implementation

We have developed an extensive open-source software portfolio with GUIs available as cross-platform Python pip packages. The tool Grid Screener32, available at https://git.scc.kit.edu/sc1357/grid-screener, covers the implementation of the proposed methods for automated high-throughput screening. The Karlsruhe Image Annotation Tool (KaIDA)20 software package implements our proposal for assisted image annotation. The repository is located at https://git.scc.kit.edu/sc1357/kaida. The GUIs can be found in Fig. 8.

Figure 8

Overview of the GUIs. The GUI of Grid Screener32 (a) is shown. Besides, the usage of KaIDA20, integrated into a Lenovo x12 Detachable mobile touchscreen device, is visualized in (b).

We consider the Large Scale Data Facility (LSDF)35 as a global data server system. In addition, we set up a prototype server described in Table 1. Moreover, we provide researchers with a Lenovo x12 Detachable device that can be connected to the image processing server. This hardware offers a touchscreen for image annotation (cf. Fig. 8). The RestAPI solution is connected to a local GPU server (GPU: Nvidia Tesla V100, CPU: Intel Xeon 5118 CPU).

Table 1 Prototype Server. Details w.r.t. the established prototype server20 are shown.

Furthermore, the SLURM submission system integrated into KaIDA is linked to HoreKa, which allows elaborate DNN training or inference on HPC resources. A computational node of HoreKa is equipped with two Intel Xeon Platinum 8368 CPUs (38 cores per socket, 76 cores in total) and four NVIDIA A100 Tensor Core GPUs. This configuration enables high-performance and high-throughput computing. In total, 167 such nodes are available.

Results

We give an overview of our investigations on automated high-throughput image processing within the SPPO project and other projects to demonstrate the generalization ability of our concept. However, presenting the results of all modules given in our concept would go beyond the scope of this paper. Therefore, only excerpts from the results are shown below. Please refer to20,30,31,32,34,36 for further details.

Assisted annotation

In the following, the results of the modules “Selection” and “Pre-Annotation” are presented in detail.

Selection

The results of the selection of samples based on dataset heterogeneity are shown in Fig. 937. This experiment combines an ImageNet pre-trained ResNet1838 with a principal component analysis. The purpose of this combination is to generate a ten-dimensional feature space that can be used to automatically select heterogeneous images of a dataset that should be annotated. A visual inspection of the selection order reveals that the selected images (vertical direction) are dissimilar. Furthermore, the nearest neighbors in the feature space have substantial visual similarities (horizontal direction). Hence, the “Selection” module can provide annotators with diverse samples to reduce the annotation overhead.

Figure 9

Automated Selection Based on Dataset Heterogeneity. Using ResNet18 in combination with PCA on ten features, the selection order (left column) for different datasets (MoNuSeg12, ISIC-2017-Melanoma11, DMA-Spot, DMA-Spheroid-BF) is visualized. In addition, the images with the highest similarity (\(\approx\)) are compared row by row.
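For illustration, the following sketch shows one way to realize such a heterogeneity-based selection: ImageNet-pretrained ResNet18 features are reduced to ten dimensions with PCA, and images are then picked greedily so that each new image is maximally dissimilar to those already selected. The farthest-point strategy, the simplified preprocessing, and the torchvision/scikit-learn calls (torchvision ≥ 0.13) are assumptions for demonstration, not the exact method used in the experiment.

```python
# Sketch of heterogeneity-based sample selection: ResNet18 features + PCA +
# greedy farthest-point ordering. Preprocessing details are simplified.
import numpy as np
import torch
import torchvision
from sklearn.decomposition import PCA


def image_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) tensor, already normalized for ImageNet models."""
    backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    backbone.fc = torch.nn.Identity()          # use the 512-d embedding
    backbone.eval()
    with torch.no_grad():
        feats = backbone(images).numpy()
    return PCA(n_components=10).fit_transform(feats)


def selection_order(features: np.ndarray) -> list:
    """Greedy farthest-point ordering: each new sample maximizes the distance
    to the already selected samples in the ten-dimensional feature space."""
    order = [0]                                 # start with an arbitrary image
    dists = np.linalg.norm(features - features[0], axis=1)
    while len(order) < len(features):
        order.append(int(dists.argmax()))       # most dissimilar remaining image
        dists = np.minimum(dists, np.linalg.norm(features - features[order[-1]], axis=1))
    return order
```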

Pre-annotation

Furthermore, the results of different pre-annotation methods (Otsu, percentile threshold, Cellpose39, U-Net40), evaluated on the data introduced in Fig. 2, are discussed. The results are displayed in Fig. 10, where two examples per dataset, including the original annotation, are shown in comparison to the different pre-annotation methods. In addition, the performance of the methods per image is shown qualitatively using overlays/contours and quantitatively using metrics (Dice Coefficient \(Q_{\text {DSC}}\)/advanced Aggregated Jaccard Index \(Q_{\text {AJI+}}\)). The core message of the evaluation is that the suitability of a particular method for pre-annotation depends on the problem/dataset. Therefore, as suggested in the concept, it is helpful to be able to select different methods based on the problem. In this context, a DNN (here: U-Net40) trained on a small dataset is a generic method. However, it has the disadvantage of requiring an already annotated dataset for training.

Moreover, the trend shows that the prediction accuracy improves when a more extensive set of annotations is available for the initial training of the DNN. However, the results do not show a monotonic progression of the learning curve, i.e., the performance as a function of the annotation rate, for all datasets. To sum up, the correctly pre-annotated regions typically outweigh the remaining errors, making pre-annotation advantageous.

Figure 10

Pre-Annotation. The results of different pre-annotation methods (Otsu, percentile threshold, Cellpose39, U-Net40) are shown. Example images (first column) of the data introduced in Fig. 2 are illustrated compared to the annotation (second column). Only the U-Net has been trained on an already annotated sub-dataset. Here, 4% to 32% of all images in the dataset were annotated in advance. The other methods do not require an annotated sub-dataset. In addition to qualitative results in the form of contours or overlays, quantitative metrics (Dice Coefficient \(Q_{\text {DSC}}\) or advanced Aggregated Jaccard Index \(Q_{\text {AJI+}}\)) are presented.

Automated high-throughput image processing

Grid detection

The result of automated grid estimation (DMA slide, wellplate) using Grid Screener is given in Fig. 11. Taking the quantitative results reported in the work of Schilling et al.32 into account, the relative errors of the estimated geometric grid properties are in the range of 2–3%. The figure demonstrates that all grid elements can be detected correctly. Further, various grid formats are manageable, as no difference in accuracy can be observed when comparing different grid structures32. Automated processing is more than 200 times faster than manual grid analysis32.

Figure 11

Grid Detection. The automated grid estimation using Grid Screener is shown for wellplate and DMA input images. The grid elements are marked via crosses (centroid) and green overlays. A red overlay denotes the background.

Spot-wise processing

Automated object detection Figure 10 has already demonstrated the superiority of DL-based approaches (U-Net, Cellpose) compared to traditional algorithms (Otsu, percentile threshold). In addition, Table 2 presents an exemplary feature list for an element. The features depend on the respective application. Here, the area, eccentricity, mean gray value, and the lengths of the major/minor axes per instance are shown, which are generated and stored fully automatically. Moreover, the Supplementary material contains further results.

Table 2 List of features. The exemplary features (area, eccentricity, mean gray value, major axis/minor axis lengths) extracted per instance are shown.
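A minimal sketch of such a fully automated feature export is given below, assuming scikit-image and pandas; the column names mirror Table 2, whereas the CSV export format is an assumption for illustration.

```python
# Sketch of a fully automated per-instance feature export (cf. Table 2).
# Library choices and the CSV format are illustrative assumptions.
import pandas as pd
from skimage.measure import regionprops_table


def export_features(instance_mask, intensity_image, csv_path: str) -> pd.DataFrame:
    """instance_mask: integer label image; intensity_image: matching gray image."""
    table = regionprops_table(
        instance_mask, intensity_image=intensity_image,
        properties=("label", "area", "eccentricity", "mean_intensity",
                    "major_axis_length", "minor_axis_length"),
    )
    df = pd.DataFrame(table)
    df.to_csv(csv_path, index=False)
    return df
```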

Automated cellular viability analysis An exemplary result of the automated cellular viability analysis using stained fluorescence images of patient-derived CLL cells is given in Fig. 12. The detected instances in the different stainings (Hoechst, Calcein, PI) are visualized by marking the cell boundaries. Different colors encode different instances. At the end of the fusion step, truly alive cells (Hoechst positive, Calcein positive, PI negative; cf. green crosses) are obtained to calculate cellular viability. This procedure is necessary because some cells in the dying stage are difficult to detect. After overlaying the three channels (Hoechst, Calcein AM, and PI), one cell can be Calcein and PI positive at the same time: the cell membrane is already ruptured, but metabolic activity is still present, as indicated by the positive Calcein signal. Another observation is that some cells are only Hoechst positive, i.e., neither Calcein nor PI positive. In this case, the cell membrane is still intact (PI negative), but there is no metabolic activity inside the cell (no Calcein signal). Therefore, it is necessary to detect the valid number of alive cells and to avoid false positive detections. Our pipeline accomplishes this objective by overlaying all three channel images and calculating the number of alive cells. For reasons of visualization and clarity, the images shown represent only a crop of an entire DMA spot. Thanks to the developed automatic image processing pipeline, cellular viability can be calculated in high-throughput experiments. No manual user interaction is required. Instead, with the help of a DNN, accurate results are obtained according to the objectives defined during assisted image annotation.

Figure 12

Automated Cellular Viability Analysis. The detected instances in the different stainings (Hoechst, Calcein, PI) are visualized by marking the boundaries. Thereby, different instances are encoded by changing colors. At the end of the fusion step, live cells (Hoechst positive, Calcein positive, PI negative) needed to obtain cellular viability are marked with a green cross. For reasons of visualization and clarity, the images shown represent only a crop of an entire DMA spot.

Colorimetric analysis Figure 13 displays an exemplary scanner image (a) and the resulting quantification (b) in the context of colorimetric analysis. Since the true values (pH values) are known from a calibration measurement, the correlation between the metric \(\hat{y}_{i,j}\) and the true value can be calculated to evaluate the results. A correlation coefficient of \(\rho =0.894\) is consistent with the visual impression obtained by comparing Fig. 13a,b. Hence, we demonstrate that our concept enables automated colorimetric analysis. In addition, we present more results in the supplementary material.

Figure 13

Colorimetric Analysis. An exemplary scanner image (a) and the corresponding quantification \(\hat{y}_{i,j}\) (b) are depicted.

Deployment

We want to highlight the advantages of our deployment concept for a successful application. First, although the participants in our project use three different operating systems and various devices, the image processing server avoids problems regarding the installation and use of the software. None of the experts has a GPU available locally, but with our concept, everyone can access one remotely.

The benefits of our deployment concept are illustrated by the data characteristics of a DMA experiment (672 spots) to determine cell viability. A single experiment consists of four images (a bright-field channel to capture the grid elements and three fluorescence channels). The data comprises 4 \(\times\) 12800 px \(\times\) 37888 px at 16-bit depth, resulting in a size of roughly 3.7 gigabytes per experiment. Therefore, processing the data with a consumer CPU is impractical, and consumer GPUs also reach their memory limits. In addition, the necessity of a data management system becomes clear since transferring the data by e-mail or similar means is impossible. This example illustrates the demand for the concept we have developed for the SPPO project.
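For reference, a back-of-the-envelope calculation of the raw pixel data roughly reproduces this volume; the stored file size additionally depends on the image format.

```python
# Rough size estimate for a single DMA viability experiment (values from the text):
# four 16-bit channels of 12800 x 37888 pixels.
channels, height, width, bytes_per_pixel = 4, 12_800, 37_888, 2
size_gb = channels * height * width * bytes_per_pixel / 1e9
print(f"{size_gb:.1f} GB per experiment")  # ~3.9 GB raw, on the order of the ~3.7 GB stated above
```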

Discussion

We can detect various grids with the Grid Screener and reduce the effort for experimenters when analyzing high-throughput assays. Further, the examples presented illustrate the heterogeneity of biological and chemical grid-like high-throughput experiments, such as DMA. The results demonstrate the potential of DL tailored to high-throughput screening: its generic applicability and accuracy. A typical experiment may consist of tens of thousands of images, highlighting the advantage of a fully automated processing approach. Therefore, integrating DNNs as methods for automated image analysis is essential for the success of the SPPO project. Manual activities during the evaluation of experiments can be drastically reduced. The remaining bottleneck of image annotation when using DL is addressed by the proposed assisted annotation. Our concept considers the whole procedure, from raw data to the application of a developed image processing pipeline (deployment, implementation, and computation). Furthermore, this approach simplifies workflows and provides experimenters with the benefits of DL. The computational benefits of accessing HPC clusters are demonstrated, i.e., using powerful computing resources without requiring local hardware.

However, a limitation of our concept is that some computational resource (HPC cluster, local GPU, or GPU server) is required. Further, if no processing server is available, our software must be installed by users. A browser-based web interface or executable is currently not available.

Conclusion

With our proposed concept, we contribute to the research community by developing a workflow for automated image processing of grid-structured high-throughput experiments in interdisciplinary projects. We provide methods for assisted image annotation and present our tailored algorithms for high-throughput image processing. Finally, we show a strategy for deploying the developed processing pipelines. In particular, we consider implementation and computation in our concept to ensure applicability. Several heterogeneous exemplary results have been used to illustrate the advantages of our proposal. In addition, the proposal could be a starting point for other interdisciplinary data analysis projects, as the basic ideas extend beyond the SPPO project. We also highlight the benefits of integrating HPC clusters into our project, i.e., the reduction of DNN training duration or faster experiment analysis.

Current research investigates annotation-efficient learning processes to reduce the amount of annotated images needed. Furthermore, due to the data heterogeneity in the SPPO project, developing additional spot-wise image processing pipelines tailored to new experiment types is an ongoing work package. Besides, the creation of a web interface/executable of our software tools is being explored. Moreover, an exemplary end-to-end study that shows all elements of our concept in detail is in preparation.