μMAPPS: a novel phasor approach to second harmonic analysis for in vitro-in vivo investigation of collagen microstructure

Second Harmonic Generation (SHG) is a label-free imaging method used to monitor collagen organization in tissues. Due to its sensitivity to the incident polarization, it provides microstructural information otherwise unreachable by other intensity based imaging methods. We develop and test a Microscopic Multiparametric Analysis by Phasor projection of Polarization-dependent SHG (μMAPPS) that maps the features of the collagen architecture in tissues at the micrometer scale. μMAPPS retrieves pixel-by-pixel the collagen fibrils anisotropy and orientation by operating directly on two coupled phasor spaces, avoiding direct fitting of the polarization dependent SHG signal. We apply μMAPPS to fixed tissue sections and to the study of the collagen microscopic organization in tumors ex-vivo and in-vivo. We develop a clustering algorithm to automatically group pixels with similar microstructural features. μMAPPS can perform fast analyses of tissues and opens to future applications for in-situ diagnosis of pathologies and diseases that could assist histo-pathological evaluation.


Table of contents.
• Supplementary Note 1: -Second Harmonic Generation Theory • Supplementary Note 2: -Phasor analysis • Supplementary Note 3: -Fitting method -y0 effect • Effect of data sampling on the phasor analysis.
• Effect of noise on the phasor analysis.
• Effect of the micro-structure in-homogeneity. Second Harmonic generation (SHG) is a nonlinear coherent optical process where two incident photons of frequency  are converted into a single photon of exactly twice the frequency 2. SHG offers several advantages with respect to other imaging techniques: it avoids the need of exogenous labels and it is generated by means of infrared laser, thereby increasing the penetration depth into tissue and reducing sample damage. Since it is a second-order nonlinear process, SHG provides intrinsic optical sectioning related to the dependence of SHG intensity to the square of the excitation intensity. SHG does not suffer of photobleaching and phototoxicity since it a nonlinear scattering process, not relying with absorption phenomena. Moreover, SHG is restricted to molecules with non-centrosymmetric organization, resulting in a high sensitivity to polarizationdependent measurement. This property can be exploited to provide quantitative information on the local molecular structure (and/or organization) of the sample and to directly probe the assembly of collagen into tissues. Indeed, numerous studies have demonstrated that collagen, myosin and microtubules in cells and tissues can be non-invasive imaged by exploiting SHG microscopy.
Polarization-dependent method detects the SHG signal as a function of the relative angle between the laser polarization and the unknown direction of the SHG emitters, allowing to probe their molecular distribution and therefore to reconstruct the structure of the investigated sample. In order to extract the molecular organization, we must consider proper theoretical model relating the collagen structure and the SHG optical principles.
The total polarization for a medium interacting with an incident electric field E at frequency ω is 61,62 : (S1) where i, j and k denote vector or tensor component indices, P (0) is the static intrinsic polarization, and ) 1 ( ij  are the components of the linear electric susceptibility of the medium. SHG, a degenerate case of three-wave mixing where 1=2=, is usually described by using second-order nonlinear optical susceptibility tensor ) 2 (  , which represents the macroscopic (0.1-1 m) nonlinear response of the medium composed of elementary scatter elements at a microscopic molecular level. In general, the second-order nonlinear optical susceptibility tensor ) 2 (  is a third-rank tensor with (3 × 3 × 3) elements and the maximum number of independent components of ) 2 (  is 18 due to the following symmetry: The second order polarization can be expressed as: Several theoretical models of the local second-order nonlinear susceptibility tensor  (2) have been proposed 12,21,27,33,40,43,62-64 for interpreting the polarization-dependent second harmonic generation (P-SHG) signal. In all the samples capable to produce SHG signal (muscle, microtubules, collagen in animals and starch and cellulose in plants), the SHG active macromolecular assemblies are usually described by hexagonal (C6) or cylindrical (C∞) symmetry, which are equivalent within the ...
Kleinman conditions (that implies that no net energy absorption is involved) 65 . Under these assumptions it is possible to further reduce the number of the  (2) components from 18 to 2.
By considering the coordinate system in which the sample lies in the xz-plane and the propagation direction of the laser beam is along the y-direction (see Supplementary Fig. 1), if the hexagonal symmetry is applied, the non-zero  (2) tensor elements follows the conditions: If Kleinman symmetry is assumed, the entries of the tensor identified by a permutation of indices (i,j,k) have the same value 65 .This results in seven nonzero components and two independent tensor components: If now an electric field propagating along the z-axis and linearly polarized at an angle (L-F) with respect to the fibril axis (axis of cylindrical symmetry) is considered: the second-order polarization can be written as: (S7) Therefore, the total SHG intensity, proportional to 2 ) 2 ( P  , is given by: where L  and F  are, respectively, the laser excitation polarization angle and the fibril orientation angle with respect to a fixed direction in the laboratory frame (axis x in Supplementary Fig. 1). These relations are valid since the collagen fibrils lie parallel to the surface of the tissue (the collagen fibrils and the incident electric field are both within the focal plane). The scale factor k includes the absolute intensity of the SHG signal and the setup parameters (e.g. laser power, optical transmission ratios for both excitation and signal light paths, detector sensitivity).   In particular, two phasor plots, the  and  p-plots, were created to extract the F and the  values without fitting procedures.
In the p-plot, the coordinates (g, s) of the point are the cos/sin first Discrete Fourier Transform In this case the first DFT is computed in the FLF+/2range with F retrieved from the p-plot. Three different ISHG profiles as a function of L, obtained for a fixed =1.5 and different F parameter (F=30°, F=60°, F=120°), are shown in Supplementary Fig. 3. In the p-plot therefore they all lie on the same circle but at different angular position, while in the p-plot the points are superimposed due to their shared  value. It is clear that the phasor approach applied to the acquired polarized-dependent SHG curves can be exploited to access information on the structural organization and distribution of SHG emitters within the sample.  (5) (Main text). The distances between the considered experimental point and points of the  reference curve are computed, as shown by the white dashed arrows, searching for the minimum represented here by the green arrow. It must be noted that the  reference curve is obtained with the F value retrieved from the first phasor plot and with the experimental sampling angle . We refer to Supplementary Fig. 5b for a description of the algorithm used to derive the  parameter when the microscopic model must take into account a non negligible background level (see Equation (S11)).

Fitting method
In order to compare the data obtained by means of the phasor approach with those retrieved with a fitting procedure, we use the following function: (S11) Where L  and F  are the laser excitation polarization angle and the fibril orientation angle, respectively, with respect to a fixed direction in the laboratory frame (axis x in Supplementary Fig.  1). This relation is valid since the collagen fibrils lie parallel to the surface of the tissue (the collagen fibrils and the incident electric field are both within the focal plane). The scale factor k includes the absolute intensity of the SHG signal and setup parameters (e.g. laser power, and optical transmission ratios for both excitation and signal light paths and for the detector sensitivity).
The parameters k, , F  and y0 were set free in the fitting algorithm, based on a nonlinear leastsquares fitting routine (Origin 8.5, OriginLab Corporation) using over five hundred iterations per pixel. The parameter y0, added in other analysis based on fit procedure 40,60 , includes experimental errors and any eventual deviation from the theoretical model, so it is not to be considered a simple background contribution 40,60 . y0 suffers also of pixel to pixel slight difference.
The fitting function was used to interpolate the SHG signal as a function of the laser polarization angle L  , as benchmark for MAPPS. The half-waveplate was rotated from 0° to 180° in steps of 5° in order to obtain a rotation of the laser polarization from 0° to 360° in steps of 10°. From the fitting procedure the parameters k, , F  and y0 are retrieved. Figure 5. "Background y0" effect on  and p-plots. The term y0, exploited in the fittingbased procedure, includes both experimental errors and deviation from the selected theoretical model. In tumor samples, collagen fibrils can no longer be considered to lie parallel to the surface of the tissue. Therefore, an addition term y0, accounting for an angular distribution of collagen fibrils within each pixels, has to be examined. Simulations have been performed in order to evaluate the position of the points derived from the DFT of the P-SHG curves in the  and p-plots. Data were simulated by substituting in equation (S11) (which is equal to equation (S8) with the additional term y0) the parameters k=1000, F=60°, =1. When y0 increases, the  and  spectra move towards the coordinate (0,0) in the  and p-plots along a straight line, as shown in a) and b) respectively. As shown in a), following this effect, a point initially lying on the reference circle characterized by F =60° and =1.5 will progressively move on a circle nearest to the center of the p-plot, while maintaining the same F value. b) shows the movement of the points towards the center of the  p-plot as y0 increases. Since, due to the y0 effect, the  values can no longer be retrieved by means of equation (5) (Main text), a modified procedure has been devised. We computed the angles between the following vectors: 1) the normalized vector pointing from the center of the p-plot to the coordinates of the considered point; 2) the vector pointing from each point of the reference curve to the coordinates of the considered point. The value of the considered pixel is assumed equal to the value of the point on the reference curve for which this angle is minimum. Points that, following this procedure, are associated with  values lying outside the range [0,10] are discarded from further analysis and not reported in the  and  maps.
Effect of data sampling on the phasor analysis. Figure 6. Effect of data sampling on the -spectrum in the p-plot. a) The reference curve (represented in red-yellow color code) and therefore the position of the points in the p-plot are dependent on both θF and  parameters. The image shows the effect of the data sampling when different values of θF =90°, 91°, 92°, 93°, 94°, 95°, 96°, 97°, 98°, 99° with the same =1.5 are considered. When =1°, points corresponding to different values of θF lie in the same position on the shown reference curve (obtained considering =1°, yellow dot in panel a). =5° means that the reference curve is simulated starting from P-SHG spectra sampled every 5°. The projections of data simulated for different θF values lie in this case on five different reference curves obtained by substituting θF =90°,91°,92°,93° or 94° in equation (S10) (each passing through one of the five green dots in panel a). Two points characterized by two different values of θF, whose absolute difference is an integer multiple (n) of , lie on the same reference curve. Let us consider F1-F2=95°-90°=nin this case the point with θF =95° will be superimposed to the point with θF =90° on the same reference curve. In a similar way, also the couples of points 96°-91°, 97°-92°, 98°-93°,99°-94° will be superimposed, each of them lying on their respective reference curve. For =10° a similar approach can be followed: =10° implies that the considered points can lie on ten different reference curves, which in this case are obtained by substituting θF =90°,91°,92°,93°,94°,95°,96°,97°,98° or 99° in equation (S10) (see blue dots in panel a). Five of these reference curves are shown in panel b). Panel c) shows an intuitive explanation of the data sampling effect for two curves characterized by F1=91° (magenta) and F2=94°

Supplementary
(green), with =1.5 and =10°. Since F1-F2≠nand 1=2), the Discrete Fourier Transform will be calculated starting from the nearest experimentally sampled angle, which is F=90°and different portion of the P-SHG curves will be therefore analyzed (shown by the increased thickness of the curves) to obtain the (g,s) coordinates by equation (S10). This results in different -reference curves in dependence on both and F1 values. Instead, if F1-F2=n the same portion of the P-SHG curve is DFT analyzed since in this case F1 and F2 angles are characterized by the same distance from the nearest experimentally sampled F angle.
Effect of noise on the phasor analysis. The orange dots in the p-plots and the blue dots in the p-plots represent the distribution of points without the application of the respective Gaussian noise and/or F and Gaussian distributions. 2500 points were sampled for each simulation.

Supplementary
Effect of the micro-structure in-homogeneity.

Denoising effect
The deleterious noise effects on the P-SHG signals can be reduced by processing the matrices representing the image in the phasor space. These matrices have the same dimension of the original images acquired at different θL. The entries are the real or imaginary components of the first harmonic Discrete Fourier Transform (DFT) of the normalized signal from individual pixels in the stack of images. It is then clear that a couple of matrices, containing respectively real and imaginary values of the DFT, is associated to the p-plot and another couple to the p-plot.
Denoising is realized by applying a median filter to each matrix separately, which yields to a reduction of the scatter of points around the proper position in the phasor plot while preserving the original image resolution.
The matrices associated to the two phasor plot are not denoised at the same time. Since the first harmonic DFT of the normalized P-SHG signal between [0,] is employed in the calculation of F, fundamental for performing the second Discrete Fourier Transform, the matrices for the first phasor plot are processed before estimating F, then the last two matrices are calculated, processed and used for obtaining the value of .
In the experiments reported here a 3x3 pixels median filter has been exploited. Moreover, the filtering procedure has been repeated five times (since it has been proved that repeated application lead to a plateau of improvement in denoising 55 ).

Coherence Density Peak Clustering
We exploited a custom clustering algorithm in order to highlight regions in the sample related to different micro-structural parameters. This algorithm, based on the maximum density approach, works in the (θF,) space where each pixel is projected. It starts by calculating, for each pixel, the density ρi: where the sum is performed over any other pixels. is the Heaviside step function and C andC are arbitrary chosen cut-off. ij is the absolute difference between the values of for pixels i and j, while ij is the angle between the two vectors representing the same pixels in the p-plot. We order the pixels according to their density and apply the following iterative procedure to obtain the cluster center. i) The first unprocessed pixel with highest density is hypothesized as the cluster center and marked as processed. ii) All the unprocessed pixels within C and C from the putative center and with lower density are marked as processed and excluded from the list of the putative cluster centers. iii) The procedure is iterated until every pixel is marked as processed.
The pixels who are not a center are assigned to the j-th cluster for which the pixel-center distance (equation (S13)) is the lowest. Provided that ij and ij are within the respective cutoff values, the following definition of distance has been employed: (S13) Moreover, if the number of pixels grouped in a cluster at the end of the procedure is lower than a user selected threshold, the cluster is discarded and its pixels are again marked as unprocessed. At this stage, each unprocessed pixel is re-assigned to the nearest cluster according to equation (S13), provided that ij <C andij <C. At the end of this procedure each remaining unprocessed pixel is removed from the dataset. The clustered pixels are merged together in a single image in which the different clusters are encoded with different colors. Moreover, for each cluster the pixel color scale intensity represents the integral of the P-SHG spectrum. The cutoff values for  and F used for performing the clustering analysis of the experimental data are reported in the caption of the corresponding figures.  The image exhibits the obtained clusters merged together. In this case, three clusters were retrieved and encoded with different colors, as shown in the legend. The color scale intensity (reported in the legend) for each pixel of a single cluster represents the sum of its values calculated from the images acquired at different L. The clusters have been obtained with the following cutoff parameters: C=5° and C=0.5. Middle: the image illustrates the color-coded values (as shown in the legend) of the anisotropy parameter in each pixel, retrieved from the p-plot. Right: the image represents the colorcoded values (as shown in the legend) of the angle F in each pixel, retrieved from the p-plot. We remark that the method is able to extract the correct F and  parameters also if the points lie within the indetermination zone (as point of the dataset 1 and 2).

Mouse-tail tendon
To further validate our method, MAPPS has been exploited to analyze polarization-dependent SHG image stack of a mouse-tail tendon, digested by means of collagenase (5mg/ml for 1h at 37 °C). Images of both normal and denatured mouse-tail tendon have been acquired as a function of the laser polarization L  within the range [0,2] with the polarization rotation step    10  . Supplementary Fig. 12 a and e show the maximum intensity projection of the mouse-tail tendon images acquired as a function of L  . The image stack corresponding to the collagenase-digested mouse-tail tendon has been acquired with double the power with respect to the normal control sample due to a reduced emitted SHG signal. By comparing Supplementary Figs. 12a and 12e, a decrease of SHG intensity can be noted throughout the sample together with alternated bright and almost dark areas. Also for thermal denaturation, the formation of tiger-tail like band pattern perpendicular to the collagen fibers, with horizontally extended areas characterized by a reduced SHG signal, has been reported. In Supplementary Figs. 12b-c and 12f-g the color-coded pixel by pixel 1   , encoded with different colors, whose scale intensity for each pixel represent the integral of the P-SHG spectrum. The analysis of the normal tendon recognizes 5 different clusters while in the denatured case the number of clusters was 4. In this case, only areas of the sample not characterized by crimped features has been selected in order to better highlight eventual differences in the anisotropy values distribution. As shown in the histogram of Supplementary Fig. 12i, mouse-tail tendon digested by collagenase did not exhibit a mean anisotropy  parameters significantly different from the normal control sample, in agreement with the literature ( In conclusion, during the denaturation process the intensity of the SHG signal in the mouse-tail tendon is reduced, while the value of the anisotropy value is almost constant, in agreement with the literature 57 . This is probably due to the fact that areas with a compromised microscopic collagen structure are not anymore able to emit an SHG signal and therefore their polarization dependence can no more be tested.