OrgaQuant: Human Intestinal Organoid Localization and Quantification Using Deep Convolutional Neural Networks

Organoid cultures are proving to be powerful in vitro models that closely mimic the cellular constituents of their native tissue. Organoids are typically expanded and cultured in a 3D environment using either naturally derived or synthetic extracellular matrices. Assessing the morphology and growth characteristics of these cultures has been difficult due to the many imaging artifacts present in the corresponding images. Unlike for single-cell cultures, there are no reliable automated segmentation techniques that allow for the localization and quantification of organoids in their 3D culture environment. Here we describe OrgaQuant, a deep convolutional neural network implementation that can locate human intestinal organoids in brightfield images and quantify their size distribution. OrgaQuant is an end-to-end trained neural network that requires no parameter tweaking; thus, it can be fully automated to analyze thousands of images with no user intervention. To develop OrgaQuant, we created a unique dataset of human intestinal organoid images manually annotated with bounding boxes and trained an object detection pipeline using TensorFlow. We have made the dataset, trained model and inference scripts publicly available along with detailed usage instructions.

(Fig. 1c-h). Manually measuring and counting these organoids is highly inefficient, as there are typically hundreds of images to quantify, each with tens to hundreds of organoids. As a result, most studies either score a limited number of images by hand or present images as representative, unquantified examples.
Recently, Borten et al. released an elegant open-source software package, OrganoSeg 17 , that addresses some of these challenges, but it still relies on conventional image processing techniques and requires tweaking multiple parameters for each set of images acquired under similar optical conditions. Instance-based detection using deep convolutional neural networks, however, offers a promising approach to this and similar problems. Building on TensorFlow 18 , Google has recently released an object detection API 19 that makes configuring, training, testing, and running various object detection neural architectures substantially more accessible to scientists than before. Using this object detection API, we present a practical open-source implementation, OrgaQuant, which allows any user to automatically detect and localize human intestinal organoids within a typical bright-field image. Based on the idea of transfer learning 20 , we take a pre-trained neural network and further train it on organoid images so that it draws a high-precision bounding box around each organoid. Once a bounding box is determined, downstream processing allows further quantification, including size and shape measurements. The algorithm requires no parameter tuning and runs autonomously on all images in a given folder and its sub-folders, while remaining robust against the various imaging artifacts shown in Fig. 1c-h.

Figure 1. (a) Organoids are typically cultured in a 3D hydrogel droplet formed from either naturally derived or synthetic extracellular matrices. The droplet sits on a transparent substrate such as the polystyrene bottom of a multi-well culture plate. Each droplet can contain anywhere from zero to several hundred organoids. (b) Organoid droplets are imaged in bright-field mode using low-magnification objectives to efficiently capture a wide field of view. (c-h) As a result of these culture and imaging methods, a variety of imaging artifacts render conventional segmentation and image processing techniques unreliable. These artifacts include occlusion and overlap (c), out-of-focus organoids (d), heterogeneous size distribution (e), sub-optimal lighting conditions (f), and very dense (g) or very sparse (h) cultures.
A fast, accurate, parameterless and fully automated algorithm for human intestinal organoid localization and quantification.
OrgaQuant provides a quantification Jupyter Notebook that can be run to quantify all images within a folder and its sub-folders. The output for each image is a CSV file containing the bounding box coordinates of each organoid, its projected 2D area, and the lengths of its major and minor axes (the organoid is assumed to be an ellipse). The inference script processes an input image with a sliding window whose size and overlap can be set by the user if needed (Fig. 3a). The sliding window circumvents the GPU memory limitations that would arise if the entire high-resolution image were given as input. Organoids at the edge of each sliding-window patch are ignored; an overlap between windows should therefore be used so that an organoid cut by one patch's edge is captured whole in a neighboring patch. The output is a single image with all the aggregated labels. Both the labeled image and the CSV labels file are saved in the same folder as the original input image. OrgaQuant labeling quality is not significantly different from that of humans (p = 0.35) for a given image set (Fig. 3b), with a mean average precision (mAP) of 80%, but OrgaQuant is substantially faster and more consistent, requiring only 30 sec/patch (on an NVIDIA Quadro P5000 GPU) vs. anywhere from 25 to 284 seconds for humans (Fig. 3c).
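The sliding-window inference and the ellipse-based size measurements described above can be illustrated with a short, dependency-free Python sketch. It is not the OrgaQuant inference script: the function names, the 1-pixel edge margin, the IoU threshold used to drop duplicate detections from overlapping windows, and the assumption that the ellipse's major and minor axes equal the longer and shorter sides of the bounding box are all illustrative choices. Running the trained network on each window is left out; `aggregate` only receives the per-window boxes.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def window_origins(length: int, window: int, overlap: int) -> List[int]:
    """Start offsets of sliding windows along one image axis."""
    if window >= length:
        return [0]  # a single window covers the whole axis
    stride = window - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the window size")
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # final window flush with the image border
    return starts


def touches_edge(box: Box, win_w: int, win_h: int, margin: float = 1.0) -> bool:
    """True if a box (in patch coordinates) touches the patch border."""
    xmin, ymin, xmax, ymax = box
    return xmin <= margin or ymin <= margin or xmax >= win_w - margin or ymax >= win_h - margin


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def aggregate(patches: Sequence[Tuple[Tuple[int, int], List[Box]]],
              win_w: int, win_h: int, dedup_iou: float = 0.5) -> List[Box]:
    """Merge per-window detections into one list of image-level boxes."""
    kept: List[Box] = []
    for (x0, y0), boxes in patches:
        for b in boxes:
            if touches_edge(b, win_w, win_h):
                continue  # organoid may be cut off by this window; rely on the overlap
            g = (b[0] + x0, b[1] + y0, b[2] + x0, b[3] + y0)  # shift to image coordinates
            if all(iou(g, k) < dedup_iou for k in kept):
                kept.append(g)  # skip duplicates found again in an overlapping window
    return kept


def ellipse_metrics(box: Box) -> Tuple[float, float, float]:
    """Major axis, minor axis and projected area of the ellipse inscribed in a box."""
    w, h = box[2] - box[0], box[3] - box[1]
    major, minor = max(w, h), min(w, h)
    return major, minor, math.pi * (major / 2) * (minor / 2)
```

For a 1500 × 1125 image with 450-pixel windows and a 50-pixel overlap, `window_origins` gives x offsets [0, 400, 800, 1050] and y offsets [0, 400, 675]. This sketch keeps the first box it sees when two windows report the same organoid; a production pipeline would more commonly keep the higher-confidence box (non-maximum suppression). All measurements are in pixels and would need the image scale bar to convert to physical units.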

Discussion
Object detection and localization is a complex problem in computer vision, and it is especially tricky when fast detection is required. Several detection algorithms have been developed that balance speed and accuracy. Two prevalent approaches are the Single Shot MultiBox Detector (SSD) 21 and You Only Look Once (YOLO) 22 . While these are ideal for real-time detection, they achieve their speed by sacrificing some accuracy. For OrgaQuant, we instead implemented a Region-based Convolutional Neural Network (R-CNN), specifically the variant known as Faster R-CNN. Faster R-CNN can use a detection model based on several different architectures, including ResNet 101 23 and Inception v2 24 . Here we chose Inception-ResNet-v2 25 , an architecture that combines Inception modules with ResNet-style residual connections and for which an implementation is provided with the TensorFlow object detection API. The model had been pre-trained on the box-annotated COCO dataset 26 , and we fine-tuned it by training it on our organoid dataset. Because Inception-ResNet models have many parameters, a large training dataset is important; to enlarge ours, we augmented the dataset as described in the Methods section.

Figure 2. (a) A single image is captured using a 4x objective and then divided into 300 × 300 (and 450 × 450) pixel patches, both to make annotation less overwhelming for our crowdsourcing community and to fit within the GPU memory of our neural network training hardware. (b) The patches were then distributed on a crowdsourcing platform called CrowdFlower, now known as Figure Eight, along with detailed instructions on what to annotate. Several redundancy techniques were used to assure quality, as described in the Methods. The resulting dataset includes box coordinates (xmin, ymin, xmax, ymax) for each organoid in an image.
The resulting implementation of OrgaQuant can automatically localize an organoid within a brightfield image and label it with a bounding box. The cropped organoid image can, in turn, be used in any number of downstream image processing and analysis pipelines. Given the nature of our training set, the model currently provided with this manuscript can accurately localize only spherical organoids (i.e., organoids without crypt-like structures). Here, we demonstrated the ability to measure human intestinal organoid size, but an important byproduct of convolutional neural networks is that they extract features that can be used with various other machine learning algorithms. These features could be used, for example, to cluster organoids by visual similarity or even to detect subtle changes in organoid morphology in response to stimuli that may not be detectable by eye.
Because each organoid is localized in 2D space, we can also track the growth kinetics of individual organoids in a droplet over time. While we do not explicitly use OrgaQuant for this here, the detection step only requires placing a time-lapse set of images in a folder and analyzing it; the boxes from successive time points then need to be matched to follow each organoid. We believe OrgaQuant is a basis for many exciting and intelligent organoid quantification techniques, and we look forward to working with the organoid community to develop this open-source implementation further.
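Following individual organoids over time additionally requires pairing the boxes detected at successive time points. A minimal sketch, assuming the images of a droplet are taken from the same field of view and that organoids stay roughly in place in the gel, is a greedy one-to-one matching by IoU. The function names and the 0.3 IoU cut-off are hypothetical and are not part of OrgaQuant.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(prev: List[Box], curr: List[Box], min_iou: float = 0.3) -> Dict[int, int]:
    """Greedily pair boxes from two time points, highest IoU first (one-to-one)."""
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used_curr = set()
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same organoid
        if i in matches or j in used_curr:
            continue  # each organoid may be matched only once
        matches[i] = j
        used_curr.add(j)
    return matches


def area_fold_change(prev: List[Box], curr: List[Box], matches: Dict[int, int]) -> Dict[int, float]:
    """Ratio of box area at the later time point to the earlier one, per tracked organoid."""
    def area(b: Box) -> float:
        return (b[2] - b[0]) * (b[3] - b[1])
    return {i: area(curr[j]) / area(prev[i]) for i, j in matches.items()}
```

Greedy matching is the simplest choice; an optimal assignment (e.g. the Hungarian algorithm) would be preferable in very dense droplets where neighboring boxes overlap.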
Methods
Intestinal organoid culture. De-identified tissue biopsies were collected from unaffected duodenal areas of pediatric and adult patients undergoing endoscopy for gastrointestinal complaints. All experimental methods and protocols were approved by, and carried out in accordance with, the Institutional Review Board of Boston Children's Hospital (IRB-P00000529). Informed consent was obtained at Boston Children's Hospital from adult patients and from the legal guardians of minor donors, with assent from the minor patients. Tissue was digested in 2 mg/ml of collagenase I for 40 min at 37 °C, followed by mechanical dissociation. Isolated crypts were resuspended in growth factor-reduced (GFR) Matrigel (Becton Dickinson) and polymerized at 37 °C. Organoids were grown in organoid expansion medium (OEM) consisting of Advanced DMEM/F12 supplemented with L-WRN conditioned medium (50% vol/vol, ATCC, cat. no. CRL-3276) 8 , glutamax, HEPES, murine epidermal growth factor (EGF, 50 ng/ml), N2 supplement (1×), B27 supplement (1×), human [Leu15]gastrin I (10 nM), N-acetyl cysteine (1 mM), nicotinamide (10 mM), SB202190 (10 μM), A83-01 (500 nM), and Y-27632 (10 µM) as described 27,28 . The medium was changed every two days, and organoids were passaged every 4 days by incubation in Cell Recovery Solution for 40 min at 4 °C, followed by trypsin digestion for 5 min at 37 °C to obtain single cells. Single cells were seeded at a density of 25,000 cells in 25 µL of GFR Matrigel. For experiments involving the synthetic hydrogels, single cells were seeded at a density of 500 cells/µL. Three µL of cell suspension (in Matrigel or synthetic hydrogel) were loaded into a 96-well plate and allowed to polymerize for 15-20 min at 37 °C. Then 100 µL of OEM was added to each well, and the medium was changed every two days.
www.nature.com/scientificreports
Image acquisition. Images of organoids suspended in gel droplets were acquired in normal bright-field mode using a Thermo EVOS FL microscope with a 4x objective on days 4 and 6 of culture. Images were saved as 8-bit TIFFs with a scale bar. A single image was taken per droplet. Because the organoids are suspended in the gel, the focal plane was chosen to bring the most organoids into focus, as judged subjectively by the user. The resulting images were 1500 × 1125 pixels and approximately 4.5 MB in size.
Training dataset creation. There are no publicly available datasets of labeled organoid images, so we created our own (Fig. 2). Each image (around 1500 × 1125 pixels) was divided into 300 × 300 pixel and 450 × 450 pixel patches. Patches were necessary because the original image was (1) too large to fit into GPU memory and (2) too difficult to label, as it contained hundreds of organoids. The patches were then labeled using a crowdsourcing platform (Crowdflower.com, now known as Figure Eight), where workers drew a bounding box around each organoid considered to be in focus (i.e., not having very blurry edges). What counts as 'in focus' is highly subjective, and there was no easy way to standardize it during manual labeling. Each image was labeled by two different workers, and if there was less than 80% agreement (as defined by Intersection over Union (IoU) calculations carried out by CrowdFlower), the image was presented to a third worker for further annotation. The final bounding boxes for each image were an aggregate in which a box was retained only if there was 70% agreement among all workers. Detailed instructions and examples were provided to the workers, who could only complete the task after undergoing a quality test. Additionally, each individual labeling task included a separate test image to assure data integrity. The resulting dataset was composed of 1,750 image patches and a total of 14,242 aggregated bounding boxes. The dataset was randomly divided into training and test sets: the training set had 13,004 boxes and the test set had 1,135. A total of 1,745 unique images had at least one bounding box. The bounding box data was stored in a '.csv' file where each row contained:
1. filename: the name of the image in which the bounding box is located
2. width, height: the dimensions of the image patch (in our case there were two patch sizes, 300 × 300 and 450 × 450)
3. class: the label for the bounding box; 'organoid' was the only label we used
4. xmin, ymin, xmax, ymax: the coordinates of the bounding box, with the origin (0,0) located in the top left corner of the image
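The patch generation and the annotation format above can be sketched as follows. The tiling is assumed to be non-overlapping with incomplete edge patches discarded (the text does not say how image borders were handled), and the file names are hypothetical; only the column layout follows the description above.

```python
import csv
import io
from typing import Iterable, List, Tuple

CSV_COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def patch_grid(img_w: int, img_h: int, size: int) -> List[Tuple[int, int]]:
    """Top-left corners of non-overlapping size x size patches, row by row.

    Assumption: partial patches at the right/bottom borders are dropped.
    """
    return [
        (x, y)
        for y in range(0, img_h - size + 1, size)
        for x in range(0, img_w - size + 1, size)
    ]


def write_annotations(rows: Iterable[Tuple[str, int, int, float, float, float, float]], out) -> None:
    """Write one CSV row per bounding box; 'organoid' is the only class."""
    writer = csv.writer(out)
    writer.writerow(CSV_COLUMNS)
    for filename, width, height, xmin, ymin, xmax, ymax in rows:
        # coordinates are in patch pixels, origin (0, 0) at the top-left corner
        writer.writerow([filename, width, height, "organoid", xmin, ymin, xmax, ymax])


if __name__ == "__main__":
    buf = io.StringIO()
    write_annotations([("patch_001.png", 300, 300, 10, 20, 60, 80)], buf)  # hypothetical row
    print(buf.getvalue())
```

With this tiling, a 1500 × 1125 image yields 15 patches of 300 × 300 pixels and 6 patches of 450 × 450 pixels.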
Hyperparameter selection and neural network training. Implementing a Faster R-CNN from scratch is no trivial task, but the TensorFlow object detection API made it considerably easier. Rather than reiterating the steps we took, which are well documented on the TensorFlow API's GitHub page, we briefly describe the entire implementation and refer the reader to our code for more details.
1. The dataset was created by breaking apart large microscope images of organoids into 300 × 300 and 450 × 450 pixel patches.
2. The patches were then uploaded to a Google Storage Bucket to make them accessible to our crowdsourced annotators.
3. A detailed instruction manual was written for the crowdsourcing platform CrowdFlower.com, and a new job was set up on the platform to annotate the images with bounding boxes according to those instructions.
4. The resulting '.csv' file included the xmin, ymin, width and height of each bounding box. A small Python script was written to convert these to xmin, ymin, xmax and ymax, the format preferred by the helper scripts used below.
5. The '.csv' file was split into a training set and a test set.
6. A helper script provided by the API was then used to convert the data from '.csv' format into TFRecords (a TensorFlow data format used by the API).
7. A configuration file was then created specifying the number of classes (in this case only one), the augmentation strategy, the data location, etc. We also had the option of specifying parameters of the Faster R-CNN architecture, but we kept the defaults because they worked well in initial tests. The hyperparameters we adjusted were:
   a. Batch size: one, as anything larger did not fit into the memory of a single GPU.
   b. Total training steps: 200k, with no early-stopping criterion.
   c. Optimizer: SGD with 0.9 momentum, with a learning rate (LR) that decreased with the number of steps as follows:
      i. LR = 0.001 from step 0 to 50k
      ii. LR = 0.0001 from step 50k to 80k
      iii. LR = 0.00001 above 80k
8. Training was carried out on a cloud-based Windows Server 2016 instance on Paperspace.com and took around three days on a Quadro P5000 GPU with 16 GB of GPU RAM. Paperspace was chosen because it was cheaper than both AWS and Google Cloud (for GPU instances) at the time we trained.
9. TensorFlow comes with TensorBoard, which allowed us to monitor the training loss during training and to calculate the mean average precision (mAP) of the model using the code base provided by the API.
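Steps 4, 5 and 7c can be illustrated with a few small Python helpers. These are hypothetical stand-ins, not the authors' script or the TensorFlow object detection API's own code (in practice the schedule is set in the API's configuration file rather than in Python). Splitting by image filename, so that all boxes of a patch land in the same set, is an assumption, as is assigning step 50k to the second learning-rate stage and step 80k to the third.

```python
import random
from typing import Dict, List, Tuple


def to_corners(xmin: float, ymin: float, width: float, height: float) -> Tuple[float, float, float, float]:
    """Step 4: convert (xmin, ymin, width, height) to (xmin, ymin, xmax, ymax)."""
    return xmin, ymin, xmin + width, ymin + height


def split_by_image(rows: List[Dict], test_fraction: float = 0.1,
                   seed: int = 0) -> Tuple[List[Dict], List[Dict]]:
    """Step 5: randomly assign whole images (with all their boxes) to train or test."""
    names = sorted({r["filename"] for r in rows})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(names)
    test_names = set(names[:round(len(names) * test_fraction)])
    train = [r for r in rows if r["filename"] not in test_names]
    test = [r for r in rows if r["filename"] in test_names]
    return train, test


def learning_rate(step: int) -> float:
    """Step 7c: piecewise-constant schedule used with SGD (momentum 0.9)."""
    if step < 50_000:
        return 1e-3
    if step < 80_000:
        return 1e-4
    return 1e-5  # held until the end of training at 200k steps
```

Splitting by filename rather than by row avoids having boxes from the same patch in both sets, which would otherwise leak test images into training.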
The main metric we used to evaluate the algorithm's accuracy was the mean average precision (mAP), the standard metric for assessing object detection algorithms. The mAP was determined using a 10% held-out test set that the training algorithm had not seen. In simplified terms, the metric rewards finding the ground truth (manually annotated) bounding boxes. For example, if an image has two organoids (hence two ground truth boxes) and the algorithm detects only one of them, it has found 0.5 or 50% of them; if it detects both, it has found 100%. Strictly speaking, this fraction is the recall; average precision (AP) summarizes the whole precision-recall curve obtained by ranking detections by their confidence scores, so it also penalizes false positives (boxes that do not match any organoid). In standard usage, the mAP is the mean of the per-class APs; with a single 'organoid' class it equals that class's AP. Hence, the closer the mAP is to 100%, the better the algorithm. To decide whether a predicted box matched a ground truth box, a match was declared when their intersection over union (IoU) was at least 0.7 (i.e., 70% overlap). While a metric of computational efficiency can be useful in some settings, it was not a major concern here because real-time detection was not required.