Automated fluorescence intensity and gradient analysis enables detection of rare fluorescent mutant cells deep within the tissue of RaDR mice

Homologous recombination (HR) events are key drivers of cancer-promoting mutations, and the ability to visualize these events in situ provides important information regarding mutant cell type, location, and clonal expansion. We have previously created the Rosa26 Direct Repeat (RaDR) mouse model wherein HR at an integrated substrate gives rise to a fluorescent cell. To fully leverage this in situ approach, we need better ways to quantify rare fluorescent cells deep within tissues. Here, we present a robust, automated event quantification algorithm that uses image intensity and gradient features to detect fluorescent cells in deep tissue specimens. To analyze the performance of our algorithm, we simulate fluorescence behavior in tissue using Monte Carlo methods. Importantly, this approach reduces the potential for bias in manual counting and enables quantification of samples with highly dense HR events. Using this approach, we measured the relative frequency of HR within a chromosome and between chromosomes and found that HR within a chromosome is more frequent, which is consistent with the close proximity of sister chromatids. Our approach is both objective and highly rapid, providing a powerful tool, not only to researchers interested in HR, but also to many other researchers who are similarly using fluorescence as a marker for understanding mammalian biology in tissues.


Supporting section S1: The heuristic algorithm for near-surface foci segmentation
We use a heuristic algorithm to adaptively set the h-value that result a realistic foci segmentation in the image intensity pipeline. The resulting foci count from this algorithm is later used to set h-value of candidate foci detection in both image intensity and image gradient branches.
The parameter, h-value, often dictates the resulting segmentation from the extended maxima transform ( Figure S4A). Small h-values causes the faulty recognition of local maxima due to noise and higher h-values result in failure to detect some candidate foci. Thus, we take three image statistics to represent the above-mentioned phenomenon, namely, the average intensity value of the segmented foci pixels (higher the value better representation of foci), the total area of the segmented foci (lesser the area better representation of foci without region merging or noise detection) and the number of foci (parameters, normalized to the range [0 1], plotted against a range of h-values are shown in the Figure S4B). The goal is to take a fuzzy decision to determine an h-value, which will segment highest number of foci with relatively a high average intensity and a low total area. In order to model the above requirements, we define a scoring function shown below. (Eq. S1) A representative plot for the above scoring function is shown in the Figure S4C. There is an initial sharp decrease in the number of foci with increasing h-value and then the rate of decrease levels off. Therefore, the scoring function plummets initially and starts to increase. Hence we select the h-value that gives the highest local maximum of H_score and perform extended maxima transform on the intensity image.

Supporting section S2: Support Vector Machines
Consider a classification problem of finding a classifier that separates the training set, { , 5 | ∈ ℛ I , 5 ∈ −1, +1 , = 1,2, … , } where 5 are the dimensional input vectors and 5 are the corresponding labels of the (two) classes. SVMs are a class of decision functions, that separates this training data (and similar new data) in to the two classes. Theoretically, support vector machines, involves projecting the input data in to a higher dimensional space, using a mapping, (. ), and finding the optimum hyper-plane in the higher dimensional space, ⋅ + = 0 (Eq. S2) that separates the data in to the two classes. Then the decision function is essentially, = ( ⋅ + ) (Eq. S3) This involves solving the quadratic optimization problem, Here 5 s are the slack variables introduced to tolerate the misclassification and ≥ 0 is penalty parameter on the training error chosen by the user. Solving the dual problem of the above optimization (which is computationally easier than solving its primal problem) gives, = 5 5 k 5lg (Eq. S6) where 5 s are the Lagrange multipliers. Non-zero 5 s corresponding to the training data that determines the decision hyperplane and are called the support vectors. Karush-Kuhn-Tucker conditions for the primal problem can be used to determine the value for [S1]. Thus the final decision function is given by, = 5 * 5 * ⋅ + * (Eq. S7) here are the support vectors and 5 * and 5 * are their corresponding Lagrange multipliers and labels respectively. * is the values for found by the optimization.
In the dual problem and the decision function the data appear only as dot products of each other and hence we do not need to define the mapping (. ) [S1]. All we need to define is a kernel function in the form of, , = ( ) ⋅ ( ) (Eq. S8) This give the user the flexibility of using infinite dimensional mappings such as the one we use, i.e. radial basis function (RBF) kernel.

,
= exp (− − h ) (Eq. S9) Extensive reference for SVM can be found in for the interested readers.

Supporting section S3: Microscope simulation
Fluorescence imaging was performed using a simulated microscope with a low numerical aperture objective (NA = 0.13) at the emission wavelength, 530nm, similar to the experimental set-up. All simulations were performed using the MATLAB (MATHWorks, Natick, MA) and simulation process is mathematically described below.
Consider Ω , , to be the fluorophores distribution of the locus at the 3D position , , . The fluorescence excitation was performed using the laser beam with wavelength €• and wave number €• = 2 €• . The 3D distribution of the electric field , corresponding to projected illumination at the sample space was calculated using the Fresnel's formula as, Here … is the amplitude of electric field of the projected excitation, z is the axial distance and ) , ( h x is the focal plane of the locus. The 3D intensity distribution can be written as follows: , , = ( , , ) h = … h ( , , ) h ≡ … ‰ ( , , ) (Eq. S11) Here … is the intensity of the illumination at the specimen. The fluorescence emission intensities for specimen can be calculated as, Ω Š , , = Ω , , × , , × , , (Eq. S12) Here , , is the scattering function. Assume that fluorescence emission has wavelength €OE . The 3D point spread function (PSF), Ž• ( , , ), is simulated for the detection microscope objective (MO). The intensity function of the detected fluorescence photons at the imaging plane, Š , Š , can be written as, ′ Š , Š = Ω , , ⨂ Ž• ( , , ) h | z=0 (Eq. S13) This image is recorded by the CCD camera, where each pixel accumulates electrons in proportion to the number of incident photons and reads out the count. This is a Poisson process and hence Poisson noise was added to the recorded image.

Supporting section S4: Monte Carlo simulations to generate scattering point spread function (sPSF)
a. Monte Carlo simulation process The light transport process inside a scattering medium, such as tissue, is mainly dominated by two phenomena, light absorption and scattering (see Figure S7A). The absorption coefficient, 0 , of a scattering medium is defined in such a way that the following expression holds for any path that a photon may follow inside the medium. ℎ = •--˜ (Eq. S14) Here, ℎ is the probability of the photon's survival (without absorption) and L is the total length of the path the photon travelled inside the medium. A scattering event can happen at any random time during photon's propagation through the medium. The length the photon travels without a scattering event, , is given by the following expression.
= −ln ( )/ 9 (Eq. S15) Here, (where, 0 < ≤ 1) is a random number and 9 is the scattering coefficient of the medium. A scattering event changes the photon's direction of travel by an angle, . The probability distribution function of is given by, = g žŸ × g•1 (g 1 ¡ h1 ¢£¤ (¥)) ¦/¡ (Eq. S16) Here, is the anisotropy of the medium. Thus, the random walk of a photon ( Figure S7B) through a scattering medium can be probabilistically simulated according to the above equations when the medium parameters, 0 , 9 and are known. In this work we used the Mote Carlo(MC) simulation implemented in the reference S2.
Briefly, the escaping flux density of fluorescence at a point ( ) on the tissue surface can be described by the following equation [S2].
(Eq. S17) Here, ′ is a vector that specifies the position of the incremental volume ( ′) within the tissue, ( ′) is the incremental volume at ′ of the volume integration, … is the incident power of the excitation beam, ª ′ is the transport of source power at the excitation wavelength such that … ª ′ yields the local fluence rate at ′, is the extinction coefficient of the fluorophore (using base ), ′ is the concentration of fluorophore at ′, is the power yield, ( , ′ is the transport of fluorescence power from ′ to yield the escaping energy density at the tissue surface at position , ( is the flux density of escaping fluorescence at the surface at position . Mote Carlo simulations calculates statistical averages for ª ′ and ( , ′ based on a large number of photons' random walks. ª ′ and ( , ′ in (4) also takes refraction (and hence total internal reflection) at the air tissue boundary in to consideration. Implementation of the MC simulation, sometimes uses speed of light inside the medium. Therefore, in addition to, 0 , 9 and , refractive index of tissue, , should also be known. Interested readers can refer to Jacques [S2] for a detailed description of the simulation process.

b. Approximation to tissue parameters
The tissue parameters 0 , 9 , and were selected from the literature to mimic mouse pancreatic tissue as closely as possible. A simplified version of the model suggested in the reference S3, was used to calculate the above parameters. 0 can be calculated using the expression below. 0 = 0.3ª: + 1 − 0.I.3ª: + 0.e08./ + 0.(08 + 0.=.@07393=. (Eq. S18) Here, 0.3ª: , 0.I.3ª: , 0.e08./ , 0.(08 and 0.=.@07393=. are respectively the scattering coefficients of oxygenated blood, deoxygenated blood, water, fat and melanosome. , , and are volume fractions of blood, water, fat and melanoma. is HGb oxygen saturation of mixed arterio-venous vasculature. Approximations for , , , and for pancreatic tissue were selected form the table 3 in the reference S3. In cases where values for pancreatic tissue were not available, the averages over similar tissue types were used. 0.3ª: , 0.I.3ª: , 0.e08./ , 0.(08 and 0.=.@07393=. at excitation and emission wavelengths (512nm and 529nm for enhanced yellow fluorescent protein -EYFP) were extracted from the graphs reported in the reference S3. Thus absorption coefficients at excitation and emission wavelengths ( 0.• and 0.€ ) were calculated. 9 and were reported hard to measure [S3]. Hence, the average value of 0.9 was used for as suggested in the reference S3. Then 9 was derived using the reduce scattering coefficient 9 ′ and according the definition of 9 ′. 9 Š = 9 (1 − ) (Eq. S19) 9 Š at excitation and emission wavelengths were calculated using the following equation. Approximate values of and for pancreatic tissue were chosen from the table 2 in the reference S3.
Finally, an approximation for for pancreatic tissue was selected by taking the average of the refractive indices of water and dry tissue mass reported in the reference S3.

c. Monte Carlo simulation results
The Monte Carlo simulation was run using the tissue parameters described above and figure S8 shows the resulting point spread functions that were used in the main text. Mean boundary intensity Mean boundary Focus-flow 1 Circularity: the ratio between the perimeter and the area, 2 Convex area: the area inside the convex hull for the segmentation of each focus and 3 Relative convex area: the ratio between the convex area and the area of the focus segmentation.

Original
Focus flow Figure S3. A screenshot of the graphical user interface.    Photons interact with tissue either by absorption (quantified using absorption coefficient, µ a ) or scattering (quantified using scattering coefficient, µ s and anisotropy, g). On average, a scattering event occurs after a mean-free-path and some excitation photons reach the fluorophore and causes isotropic emission photons generation (shown in green). Emission photons, similarly subjected to absorption and scattering, reaches air-tissue boundary. (B) A random walk of an emission photon. scattering events occurs before the photon dies.