AI models for automated segmentation of engineered polycystic kidney tubules

Autosomal dominant polycystic kidney disease (ADPKD) is a rare monogenic disease characterized by the formation of multiple cysts that grow out of the renal tubules. Despite intensive attempts to develop new drugs or repurpose existing ones, there is currently no definitive cure for ADPKD. This is primarily due to the complex and variable pathogenesis of the disease and the lack of models that can faithfully reproduce the human phenotype. Therefore, the development of models that allow automated detection of cyst growth directly on human kidney tissue is a crucial step in the search for efficient therapeutic solutions. Artificial Intelligence methods, and deep learning algorithms in particular, can provide powerful and effective solutions to such tasks, and indeed various architectures have been proposed in the literature in recent years. Here, we comparatively review state-of-the-art deep learning segmentation models, using as a testbed a set of sequential RGB immunofluorescence images from 4 in vitro experiments with 32 engineered polycystic kidney tubules. To gain a deeper understanding of the detection process, we implemented both pixel-wise and cyst-wise performance metrics to evaluate the algorithms. Overall, two models stand out as the best performing, namely UNet++ and UACANet: the latter uses a self-attention mechanism that introduces some explainability aspects that can be further exploited in future developments, making it the most promising algorithm to build upon towards a more refined cyst-detection platform. The UACANet model achieves a cyst-wise Intersection over Union of 0.83, a Recall of 0.91, and a Precision of 0.92 when applied to detect large-size cysts. On cysts of all sizes, UACANet averages a pixel-wise Intersection over Union of 0.624. The code to reproduce all results is freely available in a public GitHub repository.

The recent onset of a continuous symbiosis between biotechnology and artificial intelligence (AI) has proven to be particularly effective in areas of life sciences where classical approaches are not an option. Drug discovery is one such activity, especially for diseases whose treatment options are dramatically limited, due to the intrinsic nature of the pathology as well as the lack of models that can faithfully reproduce the human phenotype. This is indeed the case here, where a novel drug testing strategy is introduced, combining an in vitro experimental setup with deep learning modeling into a pipeline aimed at accelerating the testing of novel compounds. The target condition is Autosomal Dominant Polycystic Kidney Disease (ADPKD), the most common inherited monogenic kidney disorder, which affects between 1/500 and 1/2500 individuals worldwide 1,2. The distinctive phenotype of ADPKD is the formation and progressive growth of multiple cysts that gradually replace the kidney parenchyma, thus leading to an impairment of kidney structure and function and, eventually, to end-stage kidney disease 3. In particular, cystogenesis involves a large number of diverse signaling cascades and pathways, such as PC1/2 (polycystin-1 and 2) signaling, cilia-related cascades, and growth factor-related signaling 4.
Apart from the conventional anti-hypertensive strategies 5, two drugs have recently been repurposed and are sometimes used to reduce the growth rate of cysts in ADPKD: Tolvaptan and Octreotide-LAR 6. However, these drugs are only available to patients at high risk of end-stage kidney disease (ESKD), and a substantial number of ADPKD patients progress to ESKD despite the treatments. There is currently no definitive cure for ADPKD. This is mainly due to the high phenotypical and genotypical variability between patients, the complex pathobiology of the disease, and the lack of models that can faithfully mimic the human phenotype.
As such, there is an urgent need to develop patient-specific models that can replicate key aspects of polycystic kidneys and be used for drug testing studies. Using 3D printing technologies and patients' cells, we have

Dataset and material
The dataset, provided by Istituto di ricerche farmacologiche Mario Negri (Bergamo, Italy), consists of RGB immunofluorescence images of three-dimensional human tubules engineered from epithelial cyst-lining cells that were isolated from a single donor patient with a mutation in PKD1. The tubules are the result of 4 individual experiments conducted from July 2019 to December 2020 and are classified according to the treatment received.

Immunofluorescence and image acquisition
Engineered tubules were fixed with 4% paraformaldehyde (PFA) (Cat#157-8, Electron Microscopy Sciences) and permeabilized in 100% cold methanol for 10 minutes. After washing, tubules were incubated with mouse anti-E-cadherin (Cat#610182, BD Biosciences, 1:50) overnight at 4 °C and then with the specific secondary antibody (Jackson ImmunoResearch Labs, 1:50) overnight at 4 °C. DAPI (Sigma-Aldrich) was applied for 10 minutes to stain the nuclei of the cells, and then the samples were mounted with Dako Fluorescence Mounting Medium (DAKO Corporation). Digital z-stack images of the whole tubular surface were acquired for both sides of the tubules using an inverted confocal laser microscope (Leica Biosystems). For each acquired image, cysts were manually annotated through polygonal segmentation using Labelme 38. Images composing the dataset and related information are the intellectual property of Istituto Mario Negri.

Data characterization
The dataset consists of 1076 microscope images with a fixed scale and a fixed size of 1024 × 1024 pixels. The total number of annotated cysts is 5042. Table 1 shows the cardinality separated by period.
Each of the four experiments comprises the study of various treatments (not always the same ones) and a control group. For each treatment or control, up to two tubules have been produced. For each tubule, various images have been acquired at different depths and viewpoints over the tubule. Each experiment used a different distance between the z-stacks of the acquisitions. Figure 1 shows the relationship between experiments, treatments, tubules, and images. The zones of the smallest cysts are around 20-30 µm² wide, whereas the zone range increases significantly for larger cysts. A similar behavior is also reflected in the number of cysts per image, reported on the right side of Fig. 2. Generally, the number of cysts in an image is close to the distribution median of four. However, some images contain a very high number of cysts (e.g., more than 10) and others contain none at all. In the following sections, we mitigate this imbalance by aggregating evaluation metrics over all images within a tubule.
A closer look at these features shows that such a large variance is to be attributed to the difference between treatments. Figure 3 collects the previously discussed statistics separated by experiment and treatment. We can observe from the top row that a significant share of the large-sized cysts comes from TREAT_2 in experiment 1. This experiment also has a diverging distribution in the number of cysts per image. Such a large variance in the dataset is potentially challenging for the deep learning models; hence, the model evaluation has to address this issue.

Method
Deep learning algorithms can provide powerful and effective solutions for automated segmentation of polycystic kidney tubules. Different architectures have indeed been proposed in the literature in the last few years. Here we comparatively review some of the state-of-the-art segmentation approaches, using as a testbed the suite of RGB sequential immunofluorescence images from 4 in vitro experiments consisting of 32 engineered polycystic kidney tubules. The experimental pipeline is presented in Fig. 4. It consists of three main steps: (i) an initial pre-processing phase that is aimed at data cleaning and augmentation, (ii) the application of a cyst segmentation model aimed at identifying the cysts, and (iii) a post-processing phase to support the analysis by a domain expert for the final evaluation. Each of these steps is described in further detail in the following subsections.

Preprocessing
The preprocessing steps are applied to the input images to prepare them for the training of the cyst segmentation model. This data preparation is aimed at removing noise that is potentially contained in the images and at augmenting them: we want the final model to be invariant to some properties, such as image rotation, orientation, and brightness.

Green channel removal
The images composing the dataset are generated after applying fluorophores to the tissues to highlight the most relevant components against the background. Human annotators generally recognize cysts as void globular holes surrounded by nuclei in the cell tissue. To help in the identification task, a red fluorophore is applied to highlight the tissues and, as a consequence, the overall shape of the tubule. A blue fluorophore is then applied to stain the nuclei. Other markers are applied to some tubules to emphasize different aspects of the acquisition, but they are not relevant to the proposed analysis. Because of the colors of the fluorophores applied, we assume that the most relevant information is found in the red and blue channels of the RGB (Red Green Blue) images, with the green channel mostly containing noise or information that is unnecessary for the segmentation. The first preprocessing step is thus the removal of the green channel from the input images. This can be accomplished in two ways: either by preserving the green channel and setting all of its values to 0, or by removing the channel altogether (i.e., producing RB images). Although the two approaches conceptually reach the same result, we decided to preserve the "muted" green channel for compatibility with the segmentation models used later, which have been pretrained on (and thus expect as input) 3-channel images. Figure 5 shows two examples of unprocessed images, their green channel, and the output images after removing the green channel. In the following, we refer to this strategy as no-G preprocessing.
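The two ways of removing the green channel can be sketched with NumPy as follows. This is a minimal illustration rather than the code of the published pipeline: the function names `mute_green` and `drop_green` are hypothetical, and the images are assumed to be stored as H × W × 3 arrays in R, G, B channel order.

```python
import numpy as np


def mute_green(image_rgb: np.ndarray) -> np.ndarray:
    """Return a copy of an H x W x 3 image with the green channel set to 0."""
    if image_rgb.ndim != 3 or image_rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    muted = image_rgb.copy()
    muted[..., 1] = 0  # assumption: channel order is R, G, B
    return muted


def drop_green(image_rgb: np.ndarray) -> np.ndarray:
    """Alternative: remove the channel altogether, producing an H x W x 2 (RB) image."""
    return image_rgb[..., [0, 2]]


# Synthetic stand-in for one acquisition (real images are 1024 x 1024).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
no_g = mute_green(image)  # still 3 channels, usable with pretrained 3-channel models
rb = drop_green(image)    # 2 channels only
```

The muted variant keeps the input shape unchanged, which is the property that makes it compatible with networks pretrained on 3-channel images.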

Random transformations
Since deep learning solutions should learn the relevant features of the images while neglecting non-fundamental ones, such as the direction of the tubules and the acquisition luminance, we applied data augmentation techniques to improve generalizability. This is a commonly adopted technique in computer vision to increase the number of available samples and train models to acquire some desired invariance properties 39. Images used for training are thus subject to random transformations with different probability rates, as summarized in Table 2. A final step is applied to all images to normalize them so that they end up having the same RGB distribution as images from ImageNet 40. Although this normalization does not affect the overall performance obtained, it led to faster model convergence. This happens because, after normalization, the inputs follow the same distribution as the data (ImageNet) already used for pre-training the models. The last column of Fig. 5 shows two examples of augmented images.
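As an illustration of this step, the sketch below applies two random geometric transformations jointly to an image and its mask and then normalizes the image with the per-channel ImageNet mean and standard deviation that are commonly used with pretrained models. The specific transformations and the probabilities `p_flip` and `p_rot` are placeholders chosen for the example; the transformations and rates actually used are the ones listed in Table 2.

```python
import numpy as np

# Per-channel statistics commonly used for ImageNet-pretrained models (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def random_augment(image, mask, rng, p_flip=0.5, p_rot=0.5):
    """Apply the same random flip / 90-degree rotation to image and mask.

    p_flip and p_rot are illustrative values, not the rates of Table 2.
    """
    if rng.random() < p_flip:
        image, mask = image[:, ::-1], mask[:, ::-1]  # horizontal flip
    if rng.random() < p_rot:
        k = int(rng.integers(1, 4))  # rotate by 90, 180 or 270 degrees
        image, mask = np.rot90(image, k), np.rot90(mask, k)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


def normalize_like_imagenet(image_uint8):
    """Scale an H x W x 3 uint8 image to [0, 1], then standardize each channel."""
    scaled = image_uint8.astype(np.float32) / 255.0
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
mask = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)
aug_image, aug_mask = random_augment(image, mask, rng)
model_input = normalize_like_imagenet(aug_image)
```

Geometric transformations must be applied identically to the image and to its ground-truth mask, whereas photometric ones (e.g., brightness changes) affect the image only.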

Cyst segmentation model
The augmented images are used to train a deep-learning model that performs a segmentation task. The target for this segmentation task is a binary mask that labels each pixel as either belonging to a cyst or not. The ground truth was obtained through manual labeling. We compared 5 state-of-the-art image segmentation models to identify the best-performing deep-learning model, as described in the following.
• UNet 10 , a widely used model in medical image segmentation thanks to its effectiveness in combining low- and high-level features, and thus balancing the trade-off between architecture complexity and segmentation performance 41,42 .This model consists of two macro-blocks, the encoder and the decoder, connected through a series of extra convolutional blocks which act as skip connections between the encoder and the decoder.
As in many other similar architectures, UNet uses an initial contracting path (used to capture meaningful information) followed by an expanding one (to build an output with a similar shape as the input). Skip connections are used as stabilizers, reducing the loss of information that may occur in the encoding-decoding procedure and providing bypasses for the backpropagation of gradients.
The output of each segmentation model is a 1-channel image with the same shape as the input, representing the predicted probability for each pixel to be a cyst. To obtain a binary mask, we discretized the output with a threshold of 0.5. This value is a good compromise between avoiding spurious cyst predictions and recognizing actual cysts. Figure 6 showcases some examples of segmentations performed by the various models adopted. For each image, the original tubule and the manually annotated ground truth are reported, as well as the binary masks produced by each of the models. It can be observed how, despite minor behavioral differences, the models are largely consistent with one another in their predictions. The experimental section will present quantitative results regarding the quality of the segmentation outputs.
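The discretization of the predicted probability map can be written in one line. The snippet below is only a sketch: the helper name `binarize` is hypothetical, and treating a probability exactly equal to the threshold as "cyst" is an assumption, since the text does not specify how ties are handled.

```python
import numpy as np


def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel cyst probability map into a boolean mask.

    Assumption: pixels exactly at the threshold are labeled as cyst.
    """
    return np.asarray(prob_map, dtype=float) >= threshold


probs = np.array([[0.10, 0.49],
                  [0.50, 0.93]])
binary_mask = binarize(probs)
```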

Training and tuning
Each model was trained using a binary cross-entropy loss between the predicted mask and the expected binary output. We used an Adam optimizer and cosine annealing with warm restarts to periodically decrease the learning rate and restart from its initial value across training epochs. This strategy was shown to improve the training speed 46. Hyperparameters (e.g., learning rate) were set separately for each network configuration. Following a Bayesian search strategy, we found that the models with a learning rate of 10⁻⁴ and a stack size of 8 images achieved optimal performance within the available computational resources. An early stopping strategy (with a maximum of 100 epochs) was applied to stop training when no further improvement in Intersection-over-Union (IoU) was observed on the validation set. A summary of the main hyperparameters used for all models is shown in Table 3.
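The shape of a cosine annealing schedule with warm restarts can be illustrated with a few lines of plain Python. The initial learning rate of 10⁻⁴ and the maximum of 100 epochs come from the text; the restart period of 10 epochs, the minimum learning rate of 0, and the use of a fixed (non-growing) period are assumptions made only for this sketch.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr=1e-4, min_lr=0.0, period=10):
    """Learning rate at a given epoch for cosine annealing with warm restarts.

    Within each period the rate decays from base_lr towards min_lr along a
    half cosine, then jumps back to base_lr at the start of the next period.
    `period` and `min_lr` are illustrative values, not taken from the paper.
    """
    t_cur = epoch % period  # epochs elapsed since the last restart
    cosine = math.cos(math.pi * t_cur / period)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


schedule = [cosine_warm_restart_lr(epoch) for epoch in range(100)]
```

In PyTorch, a scheduler of this kind is available as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`; the text does not state which implementation was used.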

Model evaluation
Results for the best models are validated using a cross-validation strategy designed to both preserve the various levels of stratification of the data and avoid data leakage. This second aspect stems from the observation that images from the same tubule are significantly correlated with one another. For this reason, we enforce placing all the same-tubule images in the same fold. The policy of separation was to keep together each image from the same experiment with the same treatment. Images belonging to the same experiment but different treatments (or vice versa) may be assigned to different folds: this is a necessary measure to guarantee same-size folds. This results in 32 folds, each containing images of one single tubule. We arrange them to form the train-validation-test split following a LOTO (Leave One Tubule Out) separation. This means that in each training pipeline, a model is trained and validated on 31 tubules (split into 80% training and 20% validation sets) and tested on the remaining one. In this process, the validation set is used to identify the early stopping point over the IoU curve across epochs, and the final model weights are assigned based on this metric. Given the limited size of the dataset available, the LOTO approach (a specialization of leave-one-out) can be used with only limited computing power and provides the best possible estimate of the quality of the model on unseen data (i.e., how well the model can generalize to tubules other than the ones already seen).
We are interested in predicting both the number of cysts and their size accurately. Segmentation models are generally evaluated in terms of pixel-wise metrics, such as Intersection over Union (IoU), Recall (Re), and Precision (Pr). These metrics can be computed based on the ground truth value of each pixel (i.e., whether it is a cyst or not) and the value predicted by the model for the pixel (i.e., whether the model predicted that the pixel was a cyst or not). Table 4 shows the definitions for pixel-wise IoU, precision, and recall. In this context, true positives (TP) are cyst pixels that have been correctly labeled as cyst pixels, false positives (FP) are non-cyst pixels that have been erroneously labeled as cysts, and false negatives (FN) are cyst pixels that have not been detected as being cysts (i.e., they have been labeled as non-cysts).
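For reference, the pixel-wise metrics can be computed from two binary masks as sketched below, using the standard definitions IoU = TP / (TP + FP + FN), Pr = TP / (TP + FP), and Re = TP / (TP + FN). The function name and the choice of returning NaN when a denominator is zero are assumptions of this example, not details taken from the paper.

```python
import numpy as np


def pixel_wise_metrics(pred_mask, gt_mask):
    """Pixel-wise IoU, precision and recall between two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = int(np.sum(pred & gt))    # cyst pixels correctly labeled as cyst
    fp = int(np.sum(pred & ~gt))   # non-cyst pixels labeled as cyst
    fn = int(np.sum(~pred & gt))   # cyst pixels labeled as non-cyst

    def ratio(num, den):
        # Assumption: undefined ratios (empty masks) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
    }


gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0],
               [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 0]])
metrics = pixel_wise_metrics(pred, gt)
```

In this toy case TP = FP = FN = 2, so the IoU is 2/6 while precision and recall are both 2/4.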
Pixel-wise metrics provide useful information about the overall quality of the model. However, we note that these metrics tend to give more weight to cysts with larger areas: since these larger cysts consist of a larger number of pixels, their correct detection has a greater impact on the metrics than that of smaller cysts.
For this reason, we additionally propose the use of cyst-wise metrics, obtained by extending the pixel-wise metrics to a cyst-based granularity level. In this way, we can weight all cysts equally regardless of their area. We have also introduced and discussed these metrics in our previous work 47.
We first define a notion of overlap: a predicted cyst and a ground truth cyst are considered to be overlapping if they have some pixels in common (e.g., as a threshold value on their pixel-wise IoU). We will study the impact of this choice of threshold in the experimental section. Based on this notion, we can identify a ground truth cyst as being detected (DT) if it overlaps with at least one predicted cyst. If no predicted cyst overlaps with it, the ground truth cyst is instead referred to as missed (MS). If a predicted cyst does not overlap with any of the ground truth cysts, it is labeled as wrong (WR). These values are the counterparts of true positives, false negatives, and false positives, respectively. We note that a cyst may be detected multiple times (i.e., it may overlap with multiple predicted cysts). To avoid inconsistencies, we consider this situation as one detected (DT) cyst and the rest as wrong (WR). Similarly, a predicted cyst that overlaps N ground truth ones is considered as one detected (DT) cyst and N-1 missed (MS) ones.
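One possible way to turn these rules into code is sketched below. It is an interpretation, not the authors' implementation: cysts are taken to be 4-connected components of the binary masks, overlap is measured as the number of shared pixels, and the "one DT, the rest WR or MS" rules are realized through a greedy one-to-one matching that favors the largest overlaps. All function names are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected components of a binary mask as sets of (row, col)."""
    rows = len(mask)
    cols = len(mask[0]) if mask else 0
    seen, components = set(), []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            component, queue = set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def cyst_wise_counts(gt_mask, pred_mask, min_overlap=1):
    """Count detected (DT), missed (MS) and wrong (WR) cysts."""
    gt_cysts = label_components(gt_mask)
    pred_cysts = label_components(pred_mask)
    # Candidate (overlap, ground-truth index, prediction index) triples.
    pairs = []
    for gi, gt_cyst in enumerate(gt_cysts):
        for pi, pred_cyst in enumerate(pred_cysts):
            overlap = len(gt_cyst & pred_cyst)
            if overlap >= min_overlap:
                pairs.append((overlap, gi, pi))
    # Greedy one-to-one matching: each ground-truth cyst and each predicted
    # cyst contributes to at most one detection.
    matched_gt, matched_pred = set(), set()
    for _, gi, pi in sorted(pairs, reverse=True):
        if gi not in matched_gt and pi not in matched_pred:
            matched_gt.add(gi)
            matched_pred.add(pi)
    detected = len(matched_gt)
    return {
        "DT": detected,
        "MS": len(gt_cysts) - detected,    # unmatched ground-truth cysts
        "WR": len(pred_cysts) - detected,  # unmatched predicted cysts
    }


# Two ground-truth cysts; the first is predicted as two fragments, the second
# is not predicted at all, and one prediction falls on the background.
gt_mask = [[1, 1, 1, 0, 1, 1, 0, 0]]
pred_mask = [[1, 0, 1, 0, 0, 0, 0, 1]]
counts = cyst_wise_counts(gt_mask, pred_mask)
```

With DT, MS, and WR playing the roles of TP, FN, and FP, the cyst-wise precision, recall, and IoU follow from the same formulas used for the pixel-wise metrics.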
Figure 7 exemplifies the various types of situations that can occur between predicted and ground truth cysts. Figure 8 additionally illustrates some examples of cyst predictions and ground truths, with examples of detected,

Table 3. Values of the main hyperparameters. All models have been fine-tuned using the same configuration.

Postprocessing
After obtaining a binary mask from the segmentation model, we apply two postprocessing steps to make the segmentation output more useful for the domain expert who will assess and evaluate the result. The first postprocessing step consists in filling holes that may occur within some of the segmented images. We reasonably assume that cysts do not typically contain holes within them. Based on this, all "non-cyst" pixels of the predicted mask that are surrounded by "cyst" pixels are automatically switched to "cyst" pixels. We accomplish this through a simple flood-fill algorithm.
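A flood-fill-based hole filling can be sketched as follows: the background is flooded starting from the image border, and every pixel that the flood cannot reach is considered part of a cyst. The function name, the use of 4-connectivity for the background, and the treatment of the image border as always being "outside" are assumptions of this example.

```python
from collections import deque

import numpy as np


def fill_holes(mask):
    """Switch to True every background pixel not connected to the image border."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    outside = np.zeros_like(mask)  # background pixels reachable from the border
    queue = deque()
    # Seed the flood fill with all background pixels lying on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not mask[r, c]:
                outside[r, c] = True
                queue.append((r, c))
    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < rows and 0 <= nx < cols
            if inside and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                queue.append((ny, nx))
    # Everything that is not outside is either a cyst pixel or an enclosed hole.
    return ~outside


mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
mask[2, 2] = False  # a one-pixel hole inside the predicted cyst
filled = fill_holes(mask)
```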
The second (optional) postprocessing step consists in applying either an opening or a closing morphological operation, which act as follows:
• Closing consists in applying a dilation operation followed by an erosion. The two operations are applied pixel-wise and consist in replacing each pixel value with either the maximum (dilation) or the minimum (erosion) found within a neighborhood of the pixel itself. The closing operation results in an image where neighboring clusters of pixels (cysts) are merged together.
• Opening applies an erosion followed by a dilation. This order of operations makes small clusters of pixels disappear due to the erosion. This operation can be useful to remove noisy segmentations (i.e., predicted cysts that are only a few pixels in size).
Both operations rely on the notion of a neighborhood. This neighborhood is defined through a structuring element, whose size and shape determine the specific properties of the openings and closings. More specifically, we leave the size k of the structuring element as a parameter that the domain expert can control to decide the extent of the desired behavior. Instead, we define the shape of the structuring element as circular, since we expect cysts to have a roughly circular shape. Figure 9 shows the application of an opening or a closing on a sample image, using different k values. We observe how applying a closing can mitigate the overcounting problem, which occurs when the model segments one of the real cysts as multiple ones. Closing can be used to merge the various predicted cysts into a single one.
The opening operation can be used instead to remove small (noisy) segmented cysts. For example, in the zoomed mask, a small cyst can be seen in the upper right corner of the image. While this cyst cannot be removed by closing, it quickly disappears when opening with a small value for k. Using this example, we can also see some problems that can occur when opening and closing. Excessive closing can cause cysts that were previously correctly separated to be incorrectly merged (as with the lower cyst in the example image). Excessive opening, on the other hand, could result in small (but correctly identified) cysts no longer being recognized as such. Based on these considerations, we believe that both opening and closing can be a helpful support if used carefully.
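The two morphological operations can be sketched with NumPy alone, as shown below. This is an illustrative implementation under stated assumptions: k is interpreted as the radius of the circular structuring element, pixels outside the image are treated as background during dilation and as foreground during erosion (so that closing never removes and opening never adds pixels at the border), and all function names are hypothetical.

```python
import numpy as np


def disk(k):
    """Boolean circular structuring element of radius k, size (2k+1) x (2k+1)."""
    yy, xx = np.mgrid[-k:k + 1, -k:k + 1]
    return yy ** 2 + xx ** 2 <= k ** 2


def _shifted_views(mask, selem, pad_value):
    """Yield one shifted copy of the mask per active element of the disk."""
    k = selem.shape[0] // 2
    padded = np.pad(mask, k, mode="constant", constant_values=pad_value)
    rows, cols = mask.shape
    for dy, dx in zip(*np.nonzero(selem)):
        yield padded[dy:dy + rows, dx:dx + cols]


def dilate(mask, selem):
    """Each pixel takes the maximum value found in its neighborhood."""
    out = np.zeros_like(mask)
    for view in _shifted_views(mask, selem, False):
        out |= view
    return out


def erode(mask, selem):
    """Each pixel takes the minimum value found in its neighborhood."""
    out = np.ones_like(mask)
    for view in _shifted_views(mask, selem, True):
        out &= view
    return out


def closing(mask, k):
    """Dilation followed by erosion: merges neighboring clusters of pixels."""
    mask = np.asarray(mask, dtype=bool)
    selem = disk(k)
    return erode(dilate(mask, selem), selem)


def opening(mask, k):
    """Erosion followed by dilation: removes clusters smaller than the disk."""
    mask = np.asarray(mask, dtype=bool)
    selem = disk(k)
    return dilate(erode(mask, selem), selem)


# Two 3 x 3 blobs separated by a one-pixel gap, plus an isolated noisy pixel.
mask = np.zeros((7, 12), dtype=bool)
mask[2:5, 1:4] = True
mask[2:5, 5:8] = True
mask[1, 10] = True
closed = closing(mask, k=1)  # bridges the gap between the two blobs at (3, 4)
opened = opening(mask, k=1)  # removes the isolated pixel at (1, 10)
```

The example also shows the side effect discussed above: with k = 1 the opening already shrinks each 3 × 3 blob to a 5-pixel cross, which hints at how larger values of k can erase small but correct cysts.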

In this paper, we leave the option of applying these steps (and, if so, the extent to which they are applied) to the expert analyzing the results. We additionally present some results of applying either approach in the experimental section. However, we are aware that this post-processing could be applied differently in different sections of the image (e.g., different operations, different k values): one of the future directions of this work will be to automate this post-processing step.

Figure 7. Cases of overlap between real (blue) and predicted (red) cysts. Examples of overlaps between ground truth cysts (in blue) and predicted cysts (in red). Each situation is identified with the type of label that is assigned to it.

Experimental results
In this section, we present the results of the LOTO cross-validation applied to each segmentation model. The main results show the performance of the various segmentation models without postprocessing. We additionally study the behavior of the best-performing model as a function of the size of the cysts. We show that the proposed method has a consistent behavior across treatments, which implies that the proposed methodology is robust to changes in the distribution of the number and size of cysts. Finally, we observe how applying the proposed opening and closing postprocessing techniques impacts the performance.

Deep learning model comparison
Table 5 reports the model performance in terms of IoU, precision, and recall for both the pixel- and cyst-wise metrics. We can observe that all the models reach comparable results in terms of pixel- and cyst-wise metrics. UACANet is the model that generally performs best regarding IoU and recall. In terms of precision, we instead observe the best results with UNet++. Looking at the cyst-wise measures, the interpretation is that UACANet tends to over-estimate the presence of cysts (thus higher recall and lower precision), whereas UNet++ makes more conservative estimates, only predicting cysts when there is high confidence (resulting in lower recall, but higher precision). Most confidence intervals of the results overlap with one another. Concerning IoU-like metrics, UNet, UNet++, and PraNet get values close to UACANet. For precision, all models except PraNet approach the best-performing one, i.e., UNet++. PraNet is close to the best model in the recall metrics. Even though all the models achieved comparable results, UACANet is found to be, on average, the best-performing architecture overall. For this reason, in the following, we elect UACANet to be our reference model.

Overlaps between predictions and ground truth
When defining the cyst-wise metrics, we made the assumption that a cyst is detected (DT) if the overlap between a predicted cyst and a real one (as quantified by the IoU between them) is above some threshold value.
To identify a meaningful IoU threshold for the definition of detected cases (and, as a consequence, the other metrics), we study the distribution of IoU values between predicted and ground truth cysts. Figure 10 shows the distribution (as modeled with a kernel density estimation) of values for all detected cysts (assuming a minimum overlap of 1 pixel). We observe that most cysts are detected with a large overlap (95% of cysts are detected with an IoU greater than 0.2, and 80% of the cysts are detected with an IoU greater than 0.6). This implies that, if the model detects the presence of a cyst, its segmentation of the cyst itself will be particularly accurate. We additionally note that 65% of the cysts predicted with an IoU smaller than 0.2 are the smallest ones (zone sizes 1 to 3). As we will discuss, these cysts are the most problematic to detect. Larger cysts (i.e., those that are more relevant in terms of cyst size) are generally well segmented (i.e., they have a large IoU). Because of this, and with the overall goal of achieving good performance in terms of recall, we decide to consider a cyst detected if the predicted and ground truth cysts overlap by at least 1 pixel.

Cyst size
Given the heterogeneous size range of the labeled cysts, we aggregated the results of the models by cyst size to determine whether this affected the network's learning ability. Focusing on the UACANet architecture, we evaluated the cyst-wise performance for the cysts in each of the 6 size zones (as described in Fig. 2) and presented the results in Fig. 11. For partitioning, the actual size is considered for DT and MS cysts, whereas the predicted size is considered for WR cysts (no information on the actual size is available because the predicted cyst has no ground-truth counterpart). This means that a predicted cyst is considered for evaluation within the size zone of its ground-truth counterpart, if available, or within its own size zone otherwise.
We observe an increase in predictive power with larger cyst sizes. Precision and recall scores achieve similar performance throughout zones, except for the first zone, which contains the smallest cysts. In this case, the precision is significantly lower compared to the results for the other size categories. The recall for the smallest zone is also lower than for the other zones, but the difference is less significant. This can be explained by the fact that the model makes most false-positive (WR) predictions for small cysts. In other words, the cysts that the model incorrectly predicts are usually very small (mostly zone 1). We also note that some of the incorrectly predicted zone-1 cysts were later accepted as valid cysts when reevaluated by human annotators, confirming that detection of small cysts is a difficult task even for humans.

Treatment invariance
This work aims to investigate the feasibility of an affordable cyst detection platform. Therefore, we need a model capable of producing excellent and stable results regardless of tubule treatment. If the prediction error is the same for all treatments, its effect on the evaluation of treatments is reduced. This property was evaluated by separating the results by treatment and comparing them using the proposed metrics in Fig. 12. We find that the highest difference in performance between treatments is at most 0.2 points for each metric. Furthermore, there is no meaningful correlation between these results and the statistics of the treatments presented in Fig. 3. In particular, the anomalous distribution of images from TREAT_2 does not affect the predictive capabilities of the network.

Preprocessing impact
In the Green channel removal section, we presented the concept of no-G preprocessing, which incorporates expert-guided supplementary information indicating that the content of the green channel is irrelevant to the task. To demonstrate the advantage of integrating this knowledge, Table 6 reports the results of UACANet, identified as the top-performing model for this task, across all the experiments.
We note a general improvement in performance in all the experiments, but the most substantial improvement is seen in Experiment 3. For this particular experiment, the experts involved in the acquisition of the tubules reported that the green channel shows additional fluorophores unrelated to the scope of this study. It should be reiterated that this technique is specifically tailored to this particular task. Nonetheless, these results highlight the advantages of using a custom pipeline over general segmentation algorithms that are readily available.

Postprocessing
The results presented so far were obtained without applying the opening or closing postprocessing operations. As discussed earlier, these operations can be used to find a better compromise between precision and recall. Although we leave the decision on whether to use these techniques to domain experts (as well as the extent to which they should be applied), we nevertheless report the results that can be obtained by applying either technique with different kernel sizes (k). This is to provide a general overview of the effect that the opening and closing techniques can have. In particular, Fig. 13 shows the effect on precision and recall when using different k values, for both opening and closing. We observe two very different behaviors for opening and closing. When using closing, the effect is very small and not statistically significant across confidence intervals. However, we observe a slight increase in both IoU and precision, with a negligible effect on recall.
This behavior occurs because some of the wrong (WR) cysts are the result of an "overcount" in which a single ground truth cyst is segmented as multiple smaller cysts (one such case is shown in Fig. 9). As closing is applied, the smaller cysts are merged into fewer larger ones (third row in Fig. 9). As expected, this reduces the total number of false cysts found.
The opening postprocessing has instead the effect of removing the smallest cysts and separating cysts that are close to each other and were predicted as a single one. We observe a steady increase in precision and a steady decrease in recall. Since many of the incorrectly predicted cysts are small, as noted earlier, the observed increase in precision is expected. The reduction in recall can be explained by the fact that the opening removes some of the small cysts that were correctly identified.
In summary, we can see that the effect of opening and closing can be beneficial in some cases, but detrimental in others. For this reason, we leave it to the expert to perform these operations as needed.

nnUNet comparison
Our solution addresses the challenges specifically posed by the cyst segmentation task. Here, we extend our experimental comparison to include a more generic alternative, nnUNet 48.

Figure 2. Dataset statistics. Statistics for cyst size (left side) and number of cysts per image (right side) on the whole dataset.

Figure 3. Statistics for treatment and experiment. Columns of the plot collect the data over the different experiments; each bar is an administered treatment. The upper row reports the cyst area, and the lower row reports the number of cysts.
https://doi.org/10.1038/s41598-024-52677-1 www.nature.com/scientificreports/

Figure 4. Experimental pipeline. Data flow of the application of Artificial Intelligence techniques for automated segmentation of engineered polycystic kidney tubules.

Figure 5. Tubule acquisitions and preprocessing. Rows represent 2 sample tubule acquisitions. The first column shows the raw images, and the second and the third are the associated green channel and the output image after muting the green channel. The last column shows the final image that is provided as input to the segmentation model after applying additional image augmentation techniques.

Figure 6. Segmentation results generated by the different models. The original tubule and ground truth mask are given for each of the 4 example images (one per row). The models' predictions mostly agree with each other and with the ground truth, especially for larger cysts. Smaller cysts are sometimes missed, or are predicted where none are present.

Figure 8. Predictions from four different input images. For each image, contours of manually annotated (DT and MS) and predicted cysts (DT, WR) are highlighted. Yellow contours identify the ground truth cysts: they are DT if there is a predicted (green) contour over them, MS otherwise. Incorrect predictions are marked as WR.
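The DT, MS and WR categories in the caption above can be turned into cyst-wise precision and recall. The following is a minimal sketch, assuming that a predicted connected component matches a ground truth cyst when their IoU reaches a threshold; the matching rule, the value 0.5, and the function names are assumptions made for illustration and may differ from the rule used to produce the reported results.

```python
import numpy as np
from scipy import ndimage


def cyst_wise_counts(gt_mask, pred_mask, iou_threshold=0.5):
    """Count detected (DT), missed (MS) and wrong (WR) cysts.

    Assumption: a predicted component matches a ground truth cyst when
    their IoU is at least iou_threshold (hypothetical matching rule).
    """
    gt_labels, n_gt = ndimage.label(gt_mask)
    pred_labels, n_pred = ndimage.label(pred_mask)
    detected = set()
    wrong = 0
    for p in range(1, n_pred + 1):
        pred_cyst = pred_labels == p
        best_iou, best_gt = 0.0, None
        # Only ground truth cysts touching this prediction can match it.
        for g in np.unique(gt_labels[pred_cyst]):
            if g == 0:
                continue
            gt_cyst = gt_labels == g
            iou = (pred_cyst & gt_cyst).sum() / (pred_cyst | gt_cyst).sum()
            if iou > best_iou:
                best_iou, best_gt = iou, int(g)
        if best_gt is not None and best_iou >= iou_threshold:
            detected.add(best_gt)
        else:
            wrong += 1
    dt = len(detected)
    return {"DT": dt, "MS": n_gt - dt, "WR": wrong}


def precision_recall(counts):
    """Cyst-wise precision = DT / (DT + WR), recall = DT / (DT + MS)."""
    precision = counts["DT"] / (counts["DT"] + counts["WR"])
    recall = counts["DT"] / (counts["DT"] + counts["MS"])
    return precision, recall


# Toy example: one cyst found, one cyst missed, one spurious prediction.
gt = np.zeros((20, 20), dtype=bool)
gt[2:6, 2:6] = True        # cyst A
gt[10:13, 10:13] = True    # cyst B
pred = np.zeros((20, 20), dtype=bool)
pred[2:6, 2:5] = True      # overlaps cyst A with IoU 12/16
pred[15:17, 2:4] = True    # no ground truth underneath

counts = cyst_wise_counts(gt, pred)
print(counts, precision_recall(counts))
```

Under this rule, a ground truth cyst that is split into several small predicted fragments yields fragments with low IoU, which are counted as WR; this is consistent with the "overcount" case described in the postprocessing discussion.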

Figure 9. Example of postprocessing on a sample image. The first row includes the original image, the ground truth, and the mask predicted with UACANet. The second and third rows show the effect of applying the closing technique, whereas the fourth and fifth rows show the effect of applying opening. The results are shown both for the entire image (second and fourth rows) and for a close-up of an interesting case (third and fifth rows). A red square is used in the top row to identify the portion of the image used for the close-up. For all images, the ground truth is shown in grey and the prediction in white.

Figure 11. Cyst-wise performance by size. Performance is reported on the test set for UACANet, aggregated by cyst size.

Figure 12. Performance measures separated by treatment. The upper and lower rows show the pixel- and cyst-wise metrics, respectively.

Table 1. Summary of the experiments. Some treatments are duplicated across experiments; the total value counts the unique treatments.
The cyst dimensions span a broad range of values over different images. The upper part of Fig. 2 shows the distribution over the whole dataset: the smallest cysts are around 30 µm², while the largest reach 1900 µm². The median of the distribution is around 78 µm², which means that the distribution is strongly peaked toward the lower end of the range. To address this peculiarity, we defined six adjacent zones with the same cardinality, i.e., approximately 840 cysts each, following the cyst area distribution: cysts with a size smaller than 34.7 µm² fall into zone 1, and the other zones start at 34.7 µm², 53.3 µm², 78.7 µm², 120.9 µm² and 207.5 µm², respectively. The last zone includes cysts with a size up to 2000 µm².
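The equal-cardinality zoning described above corresponds to quantile binning of the cyst areas. The following is a minimal sketch, assuming NumPy; the function names and the synthetic log-normal sample are illustrative, and only the five zone boundaries are taken from the text.

```python
import numpy as np


def equal_cardinality_edges(areas, n_zones=6):
    """Return the n_zones - 1 interior edges that split areas into equal-count zones."""
    quantiles = np.linspace(0, 1, n_zones + 1)[1:-1]
    return np.quantile(areas, quantiles)


def assign_zone(areas, edges):
    """Map each area to a zone index in 1..len(edges) + 1 (a zone starts at its lower edge)."""
    return np.digitize(areas, edges) + 1


# Zone boundaries reported in the text (µm²).
paper_edges = np.array([34.7, 53.3, 78.7, 120.9, 207.5])
zones = assign_zone(np.array([30.0, 34.7, 78.0, 500.0, 1900.0]), paper_edges)
print(zones)

# Synthetic, skewed areas (NOT the paper's data) to show that quantile
# edges yield zones with the same number of cysts.
rng = np.random.default_rng(0)
synthetic_areas = rng.lognormal(mean=4.4, sigma=0.8, size=6000)
synthetic_edges = equal_cardinality_edges(synthetic_areas)
zone_counts = np.bincount(assign_zone(synthetic_areas, synthetic_edges), minlength=7)[1:]
print(zone_counts)
```

Quantile edges make the zones narrow where cysts are dense and wide where they are sparse, which is why the reported boundaries are closely spaced at small areas and the last zone spans from 207.5 µm² to 2000 µm².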
Figure 1. Experimental layout across different treatments. The treatments vary across experiments, but a treatment may be repeated multiple times. For each treatment or control, a maximum of 2 tubules are gathered. Each tubule is exclusively linked to a single experiment. Therefore, for example, tubule 1 in experiment 1, treatment 1, is distinct from tubule 1 in experiment 1, treatment 2, as well as from tubule 1 in experiment 3, treatment 1. Multiple images are collected for each tubule.

• UNet++ 14, an improvement of the original U-Net model. It exploits the benefits of the skip-connection strategy by adding extra paths connecting the encoder and the decoder, thus further reducing the gap between the two blocks. Both UNet and UNet++ architectures are implemented with a ResNet50 encoder, pretrained on ImageNet 40.
• HarDNet-MSEG 43, a HarDNet-based 44 segmentation model, which uses Receptive Field Blocks for the decoding phase. HarDNet-MSEG has been used for medical image segmentation in the identification of colorectal adenomatous polyps.
• PraNet 45 (Parallel Reverse Attention Network) is a model that follows a different paradigm than that used in previous networks. Input features are aggregated by a parallel partial decoder that generates a global map that is passed to a set of recurrent reverse attention modules. These modules are beneficial for extracting relationships between boundaries and areas. Similar to HarDNet-MSEG, PraNet was also tested on a polyp segmentation task.
• UACANet 15 is a model based on the PraNet architecture. It differs from PraNet mainly by using Uncertainty Augmented Context Attention (UACA) modules instead of Reverse Attention modules. UACA modules introduce a self-attention mechanism that incorporates uncertain regions to extract rich semantic features without introducing additional boundary guidance. UACANet has been shown to perform better than PraNet on a number of polyp segmentation tasks.

Table 2. Random data augmentation techniques applied to dataset images to improve generalizability during the deep-learning model training.

Table 5. Performance of the cyst segmentation models in terms of Intersection over Union, Precision, and Recall, both pixel- and cyst-wise. Best values for each metric are in bold.

Table 6. Performance of the cyst segmentation models in terms of Intersection over Union, Precision, and Recall, both pixel- and cyst-wise. Best results for each experiment are in bold.

Figure 13. Effect of the opening and closing postprocessing at different values of k. The baseline result of the UACANet model without postprocessing is reported by the dashed red line.