MiCASA is a new method for quantifying cellular organization

While many tools exist for identifying and quantifying individual cell types, few methods are available to assess the relationships between cell types in organs and tissues and how these relationships change during aging or disease states. We present a quantitative method for evaluating cellular organization, using the mouse thymus as a test organ. The thymus is the primary lymphoid organ responsible for generating T cells in vertebrates, and its proper structure and organization is essential for optimal function. Our method, Multitaper Circularly Averaged Spectral Analysis (MiCASA), identifies differences in the tissue-level organization with high sensitivity, including defining a novel type of phenotype by measuring variability as a specific parameter. MiCASA provides a novel and easily implemented quantitative tool for assessing cellular organization.

In order to describe structure at various length scales in the thymus, we use the circularly averaged spectrum and circularly averaged cross-coherence. The circularly averaged spectrum is used to describe important features of the distribution of a given cell population at various length scales and the circularly averaged cross-coherence is used to describe similarities between cell populations at various length scales. Since this paper is aimed toward an audience of experimental geneticists and cellular and developmental biologists, we will first describe in some detail the concepts upon which the quantitative analysis is based, then describe the particular techniques that we use to get stable statistical estimates of the quantities of interest.

Convolutions, Frequency Spectra and Coherences
One Dimension. The quantities that we will use to measure structure are based on spatial correlation functions. In one spatial dimension, the autocorrelation function, C(x), is defined by the expression Here, is a function that describes the quantity of interest, say cell density, along a line. The autocorrelation function measures how similar is to itself when shifted by a distance , i.e.
. An integral of the form is called a convolution. Thus, is the convolution of a function with itself, .
The Fourier transform is important when considering autocorrelation functions (and convolutions in general) for two reasons: 1) the Fast Fourier Transform (FFT) algorithm can be used to rapidly compute convolutions and 2) although the autocorrelation function at point is typically statistically correlated with the autocorrelation function at nearby points, it can be decomposed into statistically uncorrelated components via the Fourier transform.

The Fourier transform of a function is defined
It is a decomposition of into sinusoidal functions (recall that ). This transform has the property (Parseval's theorem) that the Fourier transform of a convolution is equal to the product of the Fourier transforms of the individual functions being convolved, Because Fourier transforms may be calculated with the FFT algorithm, which is computationally very efficient, convolutions are more quickly calculated by applying Parseval's theorem. The Fourier transform of the autocorrelation function, is called the frequency spectrum where denotes a spatial frequency in our case. Additionally, if is a stationary random variable, then is independent of when the spatial frequency . This means that it is easier to work analytically with the frequency spectrum than the autocorrelation function because one need not worry about statistical dependencies between nearby frequencies when estimating the spectrum, whereas one does when estimating the autocorrelation. Although stationarity (translational invariance) does not strictly apply to the thymus, where obvious regional boundaries occur, it can still be a good approximation on intermediate length scales on which a cellular pattern can repeat many times.
We use a quantity called the cross-coherence, or simply the coherence, to measure similarities between two functions, and . The coherence is defined as a Fourier transform of the convolution of two functions normalized by the power in each function taken separately The magnitude of this function ranges between zero and one. If two functions have similar structure, i.e. if they contain sinusoids with similar phases at a given frequency, the magnitude of the coherence, , will be closer to one. Otherwise, if the phases are dissimilar at a given frequency, this function will be closer to zero.

Two Dimensions. In two spatial dimensions, the Fourier transform is defined
Here, the function is decomposed into sinusoidal waves in a two-dimensional plane. Often, this Fourier transform is rewritten in polar coordinates, where represents the spatial frequency of the wave and the direction perpendicular to the wavefront. In the two-dimensional case, the frequency spectrum is also the Fourier transform of the two-dimensional autocorrelation function, The coherence in two-dimensions is similarly defined, Both of these functions are also often written in polar coordinates, and . That is, they may be thought of as describing waves of a given spatial frequency, , with wavefront perpendicular to a given direction, .
A useful one-dimensional summary of may be calculated by averaging across all angles, . This is called the circularly averaged spectrum, In this paper, we use and its companion, the two-dimensional, circularly averaged coherence, , to quantify cellular distributions. Broadly speaking, provides information concerning the spatial frequencies (inverse length scales) that make up a cellular distribution and provides information concerning similarities between cellular distri-butions at different spatial frequencies. Because they are one-dimensional quantities, they are easy to plot and represent important quantitative features of cellular distributions at a glance.
It should be remembered that some information about the functions and will be lost after we take the circular average. Only if the cellular distribution were isotropic would all the information about the cellular distribution be contained in and . Nonetheless, these are important diagnostics to examine to identify characteristic length scales in cell populations and investigate similarities between cellular distributions.

Multitaper Estimates of Circularly Averaged Spectra and Coherences
The above discussion concerned the main concepts underlying our reasoning for choosing to quantify cellular distributions using circularly averaged spectra and coherences. Because imaging data is inherently noisy, it is important to choose robust, stable methods for estimating these quantities. One such method is multitaper estimation. It is beyond the scope of this paper to describe in full the theoretical basis for multitaper methods (and tapers in general). However, here we will describe the main issues that arise when estimating Fourier transforms from noisy, discretely sampled, finite data, as are found in imaging measurements of cellular distributions. What is done to avoid leakage, to the extent possible, is to multiply by a new function, whose Fourier transform is as sharply peaked as possible and that has low side lobes. An additional complication arises because data in an image is also discretely sampled, i.e. we only have the value of the function measured for each pixel, not continuously across the interval.
Slepian and Pollack [1] solved the problem of which discrete functions to use to replace the rect function that minimize the amplitude of side lobes (and hence leakage). They found a set of discrete functions, which are now called Slepian sequences, that have maximal amplitude at frequencies within a specified bandwidth, , and minimal amplitude in side lobes. Slepian sequences are eigenvectors of a special eigenvalue problem. These eigenvectors are wellconcentrated in frequency when they have low side lobes. The eigenvectors are wellconcentrated when their eigenvalue is close to . The number of well-concentrated eigenvectors, , depends on both the number of points sampled, , and the bandwidth, . When the eigenvectors are ordered in decreasing size of their associated eigenvector, the first are well-concentrated.

Statistical Estimation of the Frequency Spectrum, Multitaper Methods.
A final problem in computing a frequency spectrum from imaging data is that the data is noisy. That is, the image that is formed by the camera has noise superimposed on its measurements. Therefore, to get a good estimate of the frequency spectrum, some kind of averaging is needed to smooth the data.
Thomson [2] developed a set of methods for the statistical estimation of spectra and coherences using Slepian sequences called multitaper methods. The multitaper spectral estimation method uses well-concentrated Slepian sequences as tapers to minimize the side lobes arising from an estimate calculated from finite, discrete data. In addition, multitaper methods reduce noise by averaging over a set of spectral estimates calculated with different Slepian tapers.
A multitaper spectral estimate is thus an average over the magnitudes-squared of Fourier transforms of a tapered function, where indicates the i'th Slepian sequence. Written out in series form this becomes Here, is a discrete variable. That is, it takes values on an equally spaced grid of lattice points. For this spectral estimate, because the Slepians sequences are orthogonal, meaning they sample uncorrelated parts of the function, , the estimates are approximately independent as long as the estimates are separated in spatial frequency by more than the width of the central peak in the Fourier transform of the Slepian sequence, . Since the number of points imaged, , is typically fixed by the microscope, is largely determined by the bandwidth, , of the Fourier estimates. By choosing a broad bandwidth (large and hence large ) the spectral estimates are smoother, have smaller variances and are more stable. For narrow bandwidths (small and hence small ) the spectral estimates show sharper features, but have larger variances and are less stable. Since the linear dimensions of images are typically fixed, one usually sets such that there is enough smoothing that estimates are stable, but not so much that important features are smoothed out.
Below, we show in an example how the choice of bandwidth affects spectral features. Note the smaller confidence intervals on the estimate with larger ( versus ), but note that spectral features are smoothed out more in the same spectral estimate ( versus ). The user should perform some exploratory study of their data before choosing a value of to use. Particularly if there are sharp spectral peaks of interest, a smaller value of should be used. It should also be noted that, since the total degrees of freedom in a MiCASA analysis are increased with each successive image that is analyzed, if there is excessive variance in spectral estimates due to low bandwidths, it can be overcome by increasing the number of images. For this reason, we typically fix to be on the order of 3 − 7, giving 3 − 11 degrees of freedom per image.
The effect of different bandwidths (W) on spectral estimates. A) A pattern with a sharp ring of spectral power. This is one of N = 10 such images that were used for the MiCASA analysis here. B) A MiCASA spectral estimate of the pattern in (A). B 1,2 ) Insets showing details of the MiCASA spectral estimate in (B). Spectral estimate in red (resp. green) indicates XW = 3 (resp. XW = 9); this corresponds to M = 3 (resp. M = 15) degrees of freedom in each spectral of the N = 10 estimates, giving NM = 30 (resp. NM = 150) degrees of freedom in the estimate and an SNR of NM = 5.5 (resp. NM = 12). Note that the red error bars are thicker than the green error bars because the red spectral estimate has fewer (M = 3) degrees of freedom, and therefore a smaller signal-to-noise ratio (SNR). Note, however, that the red spectral estimate has a narrower bandwidth (XW = 3) and therefore sharper spectral features. Inset (B 1 ) shows the peak due to the wave-like pattern in (A). Note the broadening of the peak in the XW = 9 estimate due to smoothing across a wider bandwidth. Inset (B 2 ) shows ripples due to (very) low amplitude spectral ripples due to pixelization in (A). Note the smoothing of the ripples in the XW = 9 estimate leading to lower amplitude ripples.
Coherence magnitudes are similarly calculated using a set of Slepian sequences (see MiCASA algorithm, below).
As a last comment, for spectrum and coherence magnitude, , (log-spectrum) and (atanh-coherence) are normally distributed, therefore it is these quantities that we will estimate and plot. In addition to this technical consideration, an advantage of plotting logspectra and atanh-coherence is that statistically significant, low-amplitude features that might be missed in a standard plot, can often be more easily identified.

Two Dimensions.
One-dimensional multitaper spectral analysis has been used extensively in signal processing [3]. For the analysis of imaging data, we generalize the method to two dimensions. A Slepian taper is applied along each spatial dimension, giving a two-dimensional Slepian The two-dimensional multitaper spectral estimate becomes In series notation, this becomes Finally, we circularly average the multitaper spectral estimate Coherences are handled similarly (see MiCASA algorithm, below).

Jackknife Error Bars.
A standard statistical method for determining errors when only a few samples are available or when the exact distribution of a quantity is unknown is the jackknife method. This is the method that we use here. One calculates a set of estimates of the quantity of interest, in our case circularly averaged log-spectrum and atanh-coherence, where each member of the set is estimated with one sample left out. An intermediate set of quantities, so-called pseudo values, is then computed. This set of estimates has been shown to be effectively normally distributed for most common data sets. Finally, the overall mean and variance of the quantity of interest are calculated. Error bars for a given confidence interval are then obtained from the variance estimates.

The Multitaper Circularly Averaged Spectrum Analysis (MiCASA) Algorithm
Our algorithm is a two-dimensional generalization of a one-dimensional, multitaper spectral analysis described in [2,3]. A typical dataset that we analyze will be comprised of fluorescence measurements at two wavelengths corresponding to fluorophores that label two different cell types, and . We will have images from a number of experiments, and , where labels the experiments. To calculate Fourier transforms, we will use Slepian sequences as tapers. We wish to calculate the circularly averaged spectra and coherences for the two datasets. Combining the concepts and considerations above, we arrive at the following algorithm: Step 1. Calculate Fourier transforms at both fluorescence wavelengths (i.e. for cell types and ) for all experiments and tapers, and where and .
is then rasterized as a joint index, , labeling all experiments and tapers.
Step 2. Interpolate the Fourier transforms to the same frequency grid. This step is necessary since typically a full 'image' consists of a collage of sub images covering a portion of tissue. Each experiment can correspond to a differently sized image.
Step 3. Compute , one-sample-left-out frequency spectrum estimates, and , and coherence magnitudes, , for all experiments and tapers of the two cell types and . Here labels the left out sample. The spectra are calculated with the expression where and can be or . Here, the sum is multiplied by instead of to give a statistically unbiased estimate. The coherence magnitudes are calculated with the expression Also, calculate and , spectra and coherence with no samples left out.
Step 5. Compute pseudo values and where .
Step 6. Compute jackknife mean and variance estimates and Compute jackknife confidence intervals for both mean and variance estimates.
Step 7. Plot and with confidence intervals. Plot and with confidence intervals.
Run times for the datasets in this paper took about 12 minutes per image on a 12 core server. It is key to make sure that you have enough RAM for the analysis, otherwise paging to memory can use up a lot of time.

Interpreting a MiCASA Analysis
A MiCASA analysis results in estimates of three circularly averaged functions, , (log-spectra) and (atanh-coherence) along with their variances. Units measuring distance, , will typically be microns and units for spatial frequency, , will be 2π/microns, respectively. When we plot , and , and their variances, we plot them as a function of , then label the frequencies in microns (instead of 1/microns). In this way, it is easier to visualize the length-scales of characteristic features of cellular distributions. One disadvantage of this is that the bandwidth of the estimate then varies along the -axis.
In an analysis of cellular distributions, peaks in the log-spectra at a given frequency indicate extra structure in the cellular distribution, relative to nearby frequencies. If there is a characteristic spatial separation between cells at a distance x, then this will appear as an increase in the log-spectrum at a frequency of . Cells characteristically clustered at a given short separation will tend to have a higher amplitude log-spectrum at high frequencies. Clustering in the cellular distribution at long separations will contribute higher amplitudes to the log-spectrum at low frequencies.
Atanh-coherences will have peaks or excess amplitude at spatial frequencies where cells from both cell populations that were imaged have similar separations relative to each other. They will also have peaks if they are anti-correlated with each other (i.e. if one set of cells is absent where the other is present). Note that it is possible to have coherence without high amplitudes in the log-spectra and it is also possible to have no coherence even though both of the log-spectra have high amplitude at a given spatial frequency.
Variances of log-spectra and atanh-coherence are also of interest. When inconsistencies arise between how cells organize, either one population relative to itself (log-spectrum) or one population relative to another (atanh-coherence) in different images or at different locations within a single image, these are measured by variances.
When performing MiCASA on a collage of images, the edges of sub images will tend to generate spurious power in the log-spectra and atanh-coherence. Edges can also increase variance estimates. This can be minimized by standardizing the original images to have background pixels with zero mean.

Some Details Concerning the Statistical Model
The statistical model governing the MiCASA analysis is the standard repeated measures design. That is, multiple tapered estimates of the spectrum are made on the same image; these estimates are then combined across multiple images to form an aggregate estimate. As was pointed out in [2], since the tapers are orthogonal within the bandwidth of the estimate and very low amplitude outside the bandwidth, the estimates are, to a very good approximation, uncorrelated within each image. Thus, as long as the estimates are independent from image to image, the confidence intervals that we derive are independent between different frequency bands. Note that it is important, in order to assure that estimates are independent from image to image, that the user not use images of multiple tissue slices from the same organ, because nearby slices will very likely be correlated. For the data shown in this paper, the estimates represent NM = 33 and NM = 44 independent degrees of freedom, thus the SNR is high even though the number of images is relatively small.
The hypothesis that estimates are comparable between tapers and images for a given condition may be tested using a fixed-effects, additive, single-factor, repeated measures design using twoway ANOVA. A MiCASA analysis is appropriate if the null hypothesis (i.e. that means across tapers and images are the same) is supported. We note that the hypothesis of interchangeability between tapers and images in our analysis is similar to the ergodic hypothesis in time-series estimates, in which temporal and spatial averages are also checked to be interchangeable.