
# AutoSpill is a principled framework that simplifies the analysis of multichromatic flow cytometry data

## Abstract

Compensation is an unavoidable challenge in the analysis of fluorescence-based flow cytometry data. Even the advent of spectral cytometry cannot circumvent the spillover problem, as spectral unmixing is an intrinsic part of such systems. The calculation of spillover coefficients from single-color controls has remained essentially unchanged since its inception, and is increasingly limited in its ability to deal with high-parameter flow cytometry. Here, we present AutoSpill, an alternative method for calculating spillover coefficients. The approach combines automated gating of cells, calculation of an initial spillover matrix based on robust linear regression, and iterative refinement to reduce error. Moreover, autofluorescence can be compensated out, by processing it as an endogenous dye in an unstained control. AutoSpill uses single-color controls and is compatible with common flow cytometry software. AutoSpill allows simpler and more robust workflows, while reducing the magnitude of compensation errors in high-parameter flow cytometry.

## Introduction

Fluorescently labeled antibodies and flow cytometry have been the workhorse for single-cell data generation in many fields of biosciences since its development in the late 1960s1. The ability to rapidly collect quantitative data from millions of single cells has driven the understanding of heterogeneity in complex cellular mixtures, and led to the development of many fluorescence-based functional assays2,3,4,5. The diverse utility of flow cytometry has driven constant demand for an expansion in the number of parameters to be simultaneously measured. Development of novel fluorophores and advances in laser technology have provided a steady increase in the number of parameters that can be measured on state-of-the-art machines, roughly doubling each decade since the 1970s (Roederer’s Law for Flow Cytometry)6.

The development from single-color flow cytometry to ultra high-parameter flow cytometry has allowed an enormous growth in the data collected per cell. In our own field of immunology, high-parameter flow cytometry panels have become necessary, with multiple markers required to identify cellular lineages, major subsets, and activation markers. A key limitation with high-parameter flow cytometry, however, is the spectral overlap of fluorescent dyes7. This results in the spillover of fluorescence to detectors different from the detector assigned to each dye (in classical flow cytometry). Removing this unwanted spillover, i.e. compensating, is a necessary preliminary step in the data analysis of multi-color flow cytometry.

State-of-the-art flow cytometers, with ~30 channels, make compensation increasingly difficult as the number of channels grows, due to the unavoidable overlap between emission spectra of fluorescent dyes. The difficulty of experimental design has followed the growth in fluorophore options, to the point where the development, refinement, and validation of ultra-high parameter panels can take months to years of expert input4,8,9,10. Indeed, the development of mass cytometry as an alternative technology is largely driven by its lack of spillover11, as otherwise the technology compares unfavorably to flow cytometry in several aspects6.

Unlike the extensive development efforts in fluorophore generation, fluidics refinement, and laser addition, the basis for dealing with spillover in flow cytometry has largely remained unchanged. Current compensation algorithms are based upon the algorithm for spillover calculation proposed by Bagwell and Adams, when flow cytometers worked with only a few fluorophores12. While aspects of data processing have been refined since then, such as autofluorescence correction, the basic compensation strategy of calculating the spillover signal between defined positive and negative populations remains the traditional approach used across the majority of software packages. These approaches provide an estimation of the spillover matrix, in which the degree of spectral spillover between channels is estimated from single-color controls. A compensation matrix is obtained by inverting the spillover matrix, by which spillover is compensated out from experimental datasets. While effective in low-parameter datasets, where spillover is moderate to start with, in the case of high-parameter data this method often requires manual adjustment before proceeding with downstream analyses. This manual tuning entails manipulating a matrix with several hundred coefficients, which can be challenging and time-consuming, thus severely constraining panel design in practice. This approach requires single-color controls with well-defined positive and negative populations, which often forces the single-color controls to differ from those of the actual panel, increasing the complexity of the experiment.
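As a minimal illustration of the matrix relationship described above, the following Python sketch builds a two-dye spillover matrix, inverts it to obtain the compensation matrix, and recovers the per-dye signal from the detector readings. The coefficients (0.20 and 0.05), the helper names, and the row-vector convention (rows are dyes, columns are detectors) are assumptions chosen for the example, not values or code from this study.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def apply_matrix(event, m):
    """Multiply one event (a row vector of per-channel values) by matrix m."""
    return [sum(event[i] * m[i][j] for i in range(len(event)))
            for j in range(len(m[0]))]


# Hypothetical spillover matrix: row = dye, column = detector.
# Dye 1 leaks 20% of its signal into detector 2; dye 2 leaks 5% into detector 1.
spillover = [[1.0, 0.20],
             [0.05, 1.0]]
compensation = invert_2x2(spillover)

true_signal = [1000.0, 300.0]                       # fluorescence emitted per dye
observed = apply_matrix(true_signal, spillover)     # what the detectors record
recovered = apply_matrix(observed, compensation)    # compensated values

print("observed: ", observed)
print("recovered:", recovered)
```

With several hundred coefficients instead of two, the same inversion applies, which is why small errors in individual spillover estimates propagate into the compensated data.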

Spectral flow cytometry is a refinement of classical flow cytometry that expands the number of parameters simultaneously measured. In these systems, spectral unmixing is used to discriminate between the spectra of similar fluorophores. The unmixing is carried out in a different way, but obtaining the spectral signature of each fluorophore is also based on single-color controls. As with compensation, unmixing requires the calculation of spillover to every detector, with more detectors used than fluorophores. Both classical flow cytometers, and the spectral systems potentially replacing them over the upcoming decades, are therefore limited by the accuracy of spillover calculation.

We have developed an algorithm, AutoSpill, to compensate flow cytometry data. This approach uses single-color controls, making it compatible with existing datasets and protocols. Unlike other compensation approaches, however, it calculates spillover coefficients by means of robust linear models. This method produces better estimation of spillover coefficients, without requiring well-defined positive and negative populations. Moreover, AutoSpill uses this improved estimation of the spillover matrix only as the initial value for an iterative algorithm that automatically refines the spillover matrix until achieving, for practical purposes, virtually perfect compensation for the given set of controls. In addition to providing optimal spillover matrices for compensating (or unmixing in spectral systems), and given that AutoSpill does not rely on well-defined positive and negative populations, it can calculate the autofluorescence spectrum of cells by treating it as an extra endogenous dye. Thus, it allows effective detection and removal of autofluorescence from experimental data.

A linear modeling approach can equally be used to estimate the increase in fluorescence noise or spread caused by compensating spillover. Thus, we also propose a second algorithm, AutoSpread, which calculates spillover spreading coefficients with linear models, thereby providing a spillover spreading matrix (SSM) without the need for well-defined positive and negative populations in the single-color controls.

Together, AutoSpill and AutoSpread remove limiting constraints of traditional compensation methods, easing the preparation of compensation controls in high-parameter flow cytometry, making errors less likely, and facilitating the practical implementation of ultra high-parameter flow cytometry. AutoSpill is available through open-source code and a freely available web service (https://autospill.vib.be). AutoSpill and AutoSpread are available in FlowJo v.10.7.

## Results

### Tessellation allows robust gating

A critical first step in the processing of flow cytometry data is the elimination of cellular debris and other non-cellular contamination. This stage is typically performed by manual or automated gating of particles with the expected size and granularity, based on forward scatter and side scatter. In order to develop a fully automated pipeline, we sought to encode this initial cellular gating in the AutoSpill algorithm (Supplementary Software 1). After numerous tests on data provided by collaborating immunologists, we settled on a multi-step process with two tessellations, which demonstrated the required features of robust cell or bead identification. Figure 1 shows the initial gating for one single-color control of each set of controls. The multi-step process robustly identified the cellular fractions as desired, regardless of the presence of high amounts of cellular debris in the HS1 and HS2 datasets (Fig. 1, second and third columns). It also worked correctly with beads (Be1 dataset), which exhibited substantially different forward-scatter/side-scatter profiles (Fig. 1, fourth column). For all channels and all datasets, the gate selected the cell/bead population in the desired density maximum, without needing manual adjustment.

### Robust linear regression effectively estimates spillover coefficients

The estimation of spillover coefficients is based on the comparison between the level of fluorescence detected in the primary channel (i.e. the detector dedicated to the dye or fluorophore, in classical systems, or the detector with highest signal, in spectral systems) and the secondary channels (i.e. every other detector). The linear relationship between the fluorescence levels of primary and secondary channels is not visible in the usual bi-exponential scale (Fig. 2, first and third columns), but it becomes apparent in linear scale (Fig. 2, second and fourth columns). The linear relationship between the primary and secondary channels shows that the ratio of fluorescence between the two channels is constant across a broad range of fluorescence levels. Thus, a linear regression can be used to properly identify the slope between the two channels, that is, the spillover coefficient. As fluorescence data is heteroskedastic, owing to the effects of photon-counting statistics, robust linear regression, which efficiently estimates the relationship while down-weighting outlier points, is more suitable for this purpose than an ordinary linear regression, which has increased sensitivity to outliers that violate the normality of the data. We sought to compare robust linear regression to the traditional approach. As our robust gating (described above) provided notable benefits on downstream spillover calculation, independent of calculation method, we applied the same initial robust gating strategies to both the traditional and robust linear regression approaches. Other than the use of improved robust gating (applied so as to not overestimate the advantages of our approach), the traditional calculation used the standard approach of identifying positive and negative peaks and selecting the median. 
The robust linear model approach produces a similar result to that achieved by the “traditional” calculation of a slope between the median values of the positive and negative populations12, which is the method usually employed (Fig. 2, first and second columns). Notably, however, the use of linear regression also allows the robust calculation of the slope in cases that the traditional approach was not designed to deal with: low numbers of positive events (Fig. 2b), without a well-defined positive population (Fig. 2c), or without well-defined positive and negative populations (Fig. 2d). The quality of compensation can be evaluated by the difference between the obtained compensation and the ideal one, with perfectly compensated data showing an exactly vertical distribution of data along the primary fluorophore (i.e. zero slope). While traditional estimation of spillover was successful to some extent in producing low-error compensation, in particular when distinct positive and negative populations were present (Fig. 2a, first and second columns), errors were identified in particular channels, especially when populations did not conform to good separation (Fig. 2c, first and second columns). Traditional algorithms struggle in the case of poor separation between positive and negative populations due to the requirement to identify two distinct populations to calculate a slope between (Fig. 2c). In extreme cases this can result in the identification of two populations within the negative cell cluster, and grossly wrong slope calculations. AutoSpill, by contrast, treats data at the single-cell level, utilizing expression data even when the positive and negative populations are low in frequency or form a tail from the negative population, driving a large correction in error (Fig. 2c). In all cases, linear regression resulted in less compensation error (Fig. 2, third and fourth columns).
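The contrast between ordinary and robust regression can be sketched on simulated data. The example below is an assumption-laden illustration, not AutoSpill's implementation: it uses Huber-weighted iteratively reweighted least squares (one common robust scheme), a made-up spillover coefficient of 0.15, a control whose expression forms a continuous tail rather than two clean populations, and a small group of contaminating events. All names and parameter values are hypothetical.

```python
import random
import statistics


def huber_fit(x, y, k=1.345, n_iter=50):
    """Fit y ~ intercept + slope * x by iteratively reweighted least squares
    with Huber weights. With n_iter=1 this is ordinary least squares."""
    w = [1.0] * len(x)
    slope = intercept = 0.0
    for _ in range(n_iter):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        intercept = my - slope * mx
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute deviation.
        med = statistics.median(resid)
        scale = 1.4826 * statistics.median(abs(r - med) for r in resid)
        if scale == 0:
            break
        # Events with large residuals are down-weighted, not discarded.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return slope, intercept


random.seed(0)
TRUE_SPILL = 0.15  # simulated spillover coefficient (assumption for the demo)

# Single-color control without a well-defined positive population.
primary = [random.expovariate(1 / 2000.0) for _ in range(2000)]
secondary = [TRUE_SPILL * p + random.gauss(0.0, 30.0) for p in primary]

# Contaminating events: dim in the primary channel, bright in the secondary.
for _ in range(60):
    primary.append(random.uniform(0.0, 500.0))
    secondary.append(random.gauss(5000.0, 100.0))

ols_slope, _ = huber_fit(primary, secondary, n_iter=1)  # ordinary least squares
robust_slope, _ = huber_fit(primary, secondary)         # Huber-weighted fit

print(f"simulated spillover: {TRUE_SPILL}")
print(f"ordinary regression: {ols_slope:.4f}")
print(f"robust regression:   {robust_slope:.4f}")
```

Because every event contributes to the fit, no threshold between positive and negative populations has to be chosen, which is the property the cases in Fig. 2b–d rely on.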

### Iterative reduction of compensation error yields optimal spillover coefficients

The spillover coefficients obtained in the first iteration step by robust linear regression produced low-error estimates of the spillover matrix for all channels (Fig. 3, with representative example in Fig. S1). While this error level outperformed that of the traditional approach (Figs. 3 and S1C), some channels exhibited a residual degree of overcompensation or undercompensation (Fig. S1E). While such errors are small, they are nonetheless noticeable in bi-exponential scale, which visually amplifies fluorescence levels close to zero. In a high-parameter flow cytometry panel, with multiple fluorophores present on small subpopulations, such errors can accumulate to the point of making individual channels effectively unusable. We therefore developed an iterative approach, by which the spillover results obtained through the robust linear regression approach (Fig. S1E) were used as the starting point for an additional round of robust linear regression. This process repeats, successively obtaining better spillover matrices allowing for further reduction of error in the compensated data, until pre-defined criteria are met. This iterative refinement of the spillover matrix reduced the compensation errors to negligible values (Figs. 3 and S1F).

While effective in most cases, this strategy for reducing compensation error can become compromised when using controls with low fluorescence levels in the primary channel or other fluorescence artifacts. Under these circumstances, iterations gave rise to oscillations in the observed compensation errors before reaching convergence (Fig. 3a, c). In order to deal with these extreme cases, we applied a fraction of the update to the spillover matrix, slowing down convergence and further decreasing compensation error (Fig. 3a, c).
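The refinement loop can be sketched for two dyes and two detectors. This is a simplified illustration under stated assumptions, not the published algorithm: it uses ordinary rather than robust regression to stay short, the update rule `s <- s + damping * (e - I) s` followed by rescaling to a unit diagonal is one plausible way to fold the residual slopes back into the matrix, and the simulated coefficients (0.15 and 0.10), noise level, and function names are all hypothetical. The `damping` argument plays the role of the fractional update mentioned above.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def compensate(events, s):
    """Multiply (ch1, ch2) events by the inverse of the 2x2 spillover matrix s."""
    (a, b), (c, d) = s
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [(e1 * inv[0][0] + e2 * inv[1][0],
             e1 * inv[0][1] + e2 * inv[1][1]) for e1, e2 in events]


def error_matrix(controls, s):
    """Residual slopes left in the compensated controls (identity = no error)."""
    c1 = compensate(controls[0], s)   # control stained with dye 1
    c2 = compensate(controls[1], s)   # control stained with dye 2
    e12 = ols_slope([e[0] for e in c1], [e[1] for e in c1])
    e21 = ols_slope([e[1] for e in c2], [e[0] for e in c2])
    return [[1.0, e12], [e21, 1.0]]


def refine(controls, n_iter=15, damping=1.0):
    """Start from the identity; the first pass regresses the raw data."""
    s = [[1.0, 0.0], [0.0, 1.0]]
    history = []
    for _ in range(n_iter):
        e = error_matrix(controls, s)
        history.append(max(abs(e[0][1]), abs(e[1][0])))
        delta = [[damping * (e[i][j] - (1.0 if i == j else 0.0))
                  for j in range(2)] for i in range(2)]
        s = [[s[i][j] + sum(delta[i][k] * s[k][j] for k in range(2))
              for j in range(2)] for i in range(2)]
        s = [[v / s[i][i] for v in s[i]] for i in range(2)]  # unit diagonal
    return s, history


random.seed(1)
TRUE_S = [[1.0, 0.15], [0.10, 1.0]]  # simulated spillover (assumption)


def simulate_control(dye, n=2000):
    """Single-color control: one dye, detector noise in both channels."""
    events = []
    for _ in range(n):
        f = random.expovariate(1 / 2000.0)
        events.append(tuple(f * TRUE_S[dye][j] + random.gauss(0.0, 30.0)
                            for j in range(2)))
    return events


controls = [simulate_control(0), simulate_control(1)]
s_hat, history = refine(controls)

print("error before compensation:", history[0])
print("error after first estimate:", history[1])
print("error after refinement:     ", history[-1])
```

In this sketch `history[0]` is the slope in the uncompensated data, `history[1]` is the residual left by the initial regression-based matrix, and later entries show the residual shrinking as the matrix is refined. Passing a `damping` value below 1 applies only a fraction of each update, at the cost of more iterations.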

Overall, the iterative refinement of spillover coefficients was effective at reducing errors in compensation. In the four representative datasets reported here, the refinement reduced error from the initial compensation step by 4–6 orders of magnitude (Fig. 3). This improvement was observed even when subsampling single-color controls to very low numbers of cells (Fig. S2A), although an increased number of iterations was required to achieve convergence (Fig. S2B). This low error amounts to optimal spillover coefficients and compensation matrices, relative to the quality of the single-color controls used as input, and therefore it removes a key challenge to successful compensation in high-dimensional flow cytometry.

### Removal of autofluorescence through compensation with an additional autofluorescence channel

Cells produce autofluorescence, due to the interaction of the constituent organic molecules with the incoming photons. The amount of autofluorescence varies between cell types, and it is, for example, higher on cells from the myeloid lineage13,14. This can create problems in the analysis of certain flow cytometry datasets. Although the amount of autofluorescence varies between cell types, the spillover from autofluorescence observed in an unstained control (Fig. 4a) behaved similarly to the spillover detected from (exogenous) fluorescent dyes (Fig. 2, first and third columns), with the key feature of not having well-defined positive and negative populations. The capacity of AutoSpill to estimate spillover coefficients without needing these populations allowed the treatment of autofluorescence as coming from an endogenous dye, whose single-color control was an unstained control, and whose fluorescence level was recorded in an extra empty channel assigned to a dummy dye. We therefore tested the ability of AutoSpill to compensate out autofluorescence, which was an issue in the HS1 and HS2 datasets. In effect, we were able to use the extra channel to measure the intensity of autofluorescence and greatly reduce its impact on the other channels (Fig. 4b, c). Importantly, the empty channel assigned to autofluorescence worked best when it was the channel with the highest level of signal in the unstained control. This way, the most autofluorescent channel was sacrificed during panel design to enhance resolution across all the other channels. As this process of autofluorescence removal is based on the calculation of spillover in the unstained control, autofluorescence removal requires all of the single-color control samples to be run from the same base cell type as the experimental samples.
Autofluorescence removal is therefore not possible in AutoSpill, or in any other computational approach of which we are aware, when single-color controls come from disparate sources (such as using beads or cellular mixes with different baseline autofluorescence). While autofluorescence removal is effective in a mixed cellular population in which different cell types have quantitatively different levels of autofluorescence, the process may fail if the sample includes a mixture of cells that differ qualitatively in their autofluorescence spectrum. Autofluorescence subtraction in samples with minimal autofluorescence could, in principle, add a low degree of noise to the data. We therefore suggest that users manually inspect unstained samples for variance in fluorescence and only use the autofluorescence subtraction option if autofluorescence is detected in the sample.
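The idea of treating autofluorescence as an endogenous dye can be illustrated with a two-detector simulation: an otherwise empty detector records autofluorescence, the unstained control provides the slope of autofluorescence spillover into a marker detector, and that slope is then used to correct other cells. Everything in this sketch is an assumption made for illustration (the coefficient 0.4, the log-normal autofluorescence distribution, ordinary rather than robust regression, and the simplification that the marker dye does not spill into the autofluorescence detector).

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


random.seed(3)
AF_SPILL = 0.4    # simulated autofluorescence spillover (assumption)
NOISE_SD = 20.0   # simulated detector noise (assumption)


def simulate_cell(marker_signal):
    """Return (autofluorescence detector, marker detector) for one cell."""
    af = random.lognormvariate(6.0, 0.8)  # cell-to-cell variable autofluorescence
    af_detector = af + random.gauss(0.0, NOISE_SD)
    marker_detector = marker_signal + AF_SPILL * af + random.gauss(0.0, NOISE_SD)
    return af_detector, marker_detector


# Unstained control: the "single-color control" of the endogenous dye.
unstained = [simulate_cell(0.0) for _ in range(2000)]
af_coeff = ols_slope([c[0] for c in unstained], [c[1] for c in unstained])

# Marker-negative cells from a stained sample, before and after correction.
negatives = [simulate_cell(0.0) for _ in range(2000)]
raw_mean = sum(c[1] for c in negatives) / len(negatives)
corrected = [c[1] - af_coeff * c[0] for c in negatives]
corrected_mean = sum(corrected) / len(corrected)

print(f"estimated autofluorescence spillover: {af_coeff:.3f}")
print(f"marker detector, negative cells, raw mean:       {raw_mean:.1f}")
print(f"marker detector, negative cells, corrected mean: {corrected_mean:.1f}")
```

The correction only works here because the control and the sample share the same autofluorescence spectrum, which mirrors the requirement that controls and experimental samples come from the same base cell type.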

### Linear models for estimation of the SSM

Spillover spreading is defined as the incremental increase in standard deviation of fluorescent intensity in one parameter caused by the increase in fluorescent intensity of another parameter. Calculation of SSM coefficients, while not a standard step in the analysis pipeline, is a useful tool for machine quality control of consistency in sensitivity and performance, and can aid in minimizing interference during the design of high parameter flow cytometry panels15. The SSM coefficients can be calculated by comparing the fluorescent intensity in the primary detector to the standard deviation of fluorescence in the secondary detector, for a pair of positive and negative populations in a single-color control corresponding to the primary detector15. It can also be demonstrated that this relationship is linear across different magnitudes of $$\sqrt{\Delta F}$$, the square root of the difference in primary-detector fluorescence between the two populations, and that the estimation of each spillover spreading coefficient is machine-dependent and compensation-matrix-dependent, but dataset-independent15. Here, we used quantile partitioning and linear regression to estimate the linear relationship observed by Nguyen et al., thereby allowing the inclusion of events above, below, or in-between the positive and negative populations of the original approach.
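For orientation, the pairwise definition that the regression approach generalizes is commonly written as below. This is a restatement of the formula from Nguyen et al. (ref. 15) in notation assumed here, not an equation reproduced from this article: pos and neg denote the positive and negative populations of the single-color control, $$F_{P}$$ is the fluorescence in the primary detector, and $$\sigma_{C}$$ is the standard deviation in the secondary detector.

$$\mathrm{SS}_{C}^{P}=\frac{\sqrt{\sigma_{C,\mathrm{pos}}^{2}-\sigma_{C,\mathrm{neg}}^{2}}}{\sqrt{F_{P,\mathrm{pos}}-F_{P,\mathrm{neg}}}}$$

Written this way, the numerator is the added spread in the secondary detector and the denominator is $$\sqrt{\Delta F}$$, so a constant coefficient corresponds to a straight line when spread is plotted against $$\sqrt{\Delta F}$$.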

The events of each single-color control were partitioned quantile-wise in the primary detector, and the standard deviation of the level of fluorescence was estimated, for each quantile bin, in every secondary detector. Next, two linear regressions were used to estimate, first, the standard deviation at zero fluorescence, and second, the spillover spreading coefficient. Coefficients deemed non-significant using an F-test were replaced with zeros, as well as any negative coefficients. The majority of quantiles were, in fact, subsamples of the traditional positive and negative populations, but the inclusion of additional quantiles improved the precision of AutoSpread in estimating spillover spreading effects, because all these events conform to the same linear relationship, assuming that they are on-scale and in the linear range of the flow cytometer (Fig. 5a). As a result, AutoSpread accurately estimated spillover spreading for datasets whose compensation matrices successfully orthogonalized the fluorescent signals present in the single-color controls (Fig. 5b).

The adjustment step of AutoSpread (the first regression) was critical. The adjustment removed the minor quadratic effect caused by σ0 in the initial estimates, thereby allowing a more accurate estimation of the coefficients $$\mathrm{SS}_{C}^{P}$$. If this adjustment step were skipped, that is, if the β’s were taken as the spillover spreading coefficients, then spreading effects would be consistently underestimated. In that case, comparison against the traditional SSM algorithm would show a clear negative bias (Fig. 5c). Including the adjustment step eliminated that bias. For datasets whose single-color controls were contaminated by uncompensated signals (e.g. autofluorescence), both AutoSpread and the traditional SSM calculation may fail to accurately estimate spillover spreading. Initial gating that actively eliminates such effects, as well as the use of an extra autofluorescence channel, can alleviate the problem for both algorithms.
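The quantile-partitioning procedure of this section can be sketched as follows. The exact regression forms are assumptions, since they are not spelled out in this section: the first regression here fits the per-bin standard deviation against the square root of the per-bin mean primary fluorescence and takes the intercept as σ0 and the slope as β, and the second fits $$\sqrt{\sigma^{2}-\sigma_{0}^{2}}$$ against the same predictor through the origin. The simulated values (spreading coefficient 1.5, σ0 of 30), the number of bins, and all function names are hypothetical, and the significance test on the coefficients is omitted.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares (slope, intercept) of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def spreading_coefficient(primary, secondary, n_bins=20):
    """Two-step estimate from quantile bins of the primary detector."""
    order = sorted(range(len(primary)), key=primary.__getitem__)
    size = len(order) // n_bins
    root_f, sigma = [], []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size]
        mean_f = sum(primary[i] for i in idx) / len(idx)
        root_f.append(math.sqrt(max(mean_f, 0.0)))
        sigma.append(statistics.pstdev([secondary[i] for i in idx]))
    # Step 1: unadjusted slope (beta) and spread at zero fluorescence (sigma0).
    beta, sigma0 = fit_line(root_f, sigma)
    # Step 2: remove sigma0 in quadrature, then regress through the origin.
    adjusted = [math.sqrt(max(s * s - sigma0 * sigma0, 0.0)) for s in sigma]
    ss = (sum(r * a for r, a in zip(root_f, adjusted))
          / sum(r * r for r in root_f))
    return beta, sigma0, ss


random.seed(2)
TRUE_SS = 1.5        # simulated spreading coefficient (assumption)
TRUE_SIGMA0 = 30.0   # simulated spread at zero fluorescence (assumption)

# Compensated single-color control: the secondary detector is centered on
# zero, with a spread that grows with the primary-detector fluorescence.
primary = [random.expovariate(1 / 2000.0) for _ in range(20000)]
secondary = [random.gauss(0.0, math.sqrt(TRUE_SIGMA0 ** 2 + TRUE_SS ** 2 * f))
             for f in primary]

beta, sigma0, ss = spreading_coefficient(primary, secondary)
print(f"simulated coefficient:    {TRUE_SS}")
print(f"unadjusted slope (beta):  {beta:.3f}")
print(f"adjusted coefficient:     {ss:.3f}")
```

In this simulation the unadjusted slope falls below the simulated coefficient, in line with the underestimation described above, and the adjusted value lands closer to it. The residual discrepancy in the sketch comes from taking σ0 from a straight-line intercept, which is a simplification of this example.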

## Discussion

Flow cytometry has been a revolutionary force in single-cell analysis. The ability to rapidly analyze protein expression of millions of cells at single-cell level, coupled with the purification capacity of fluorescence-activated cell sorting, has provided a remarkable tool for understanding cellular heterogeneity and function. Initial limitations were overcome through ingenious technical developments: the number of fluorescent parameters was expanded through the development of new dyes and lasers, intracellular staining protocols were optimized for the detection of intracellular (and even post-translationally modified) proteins, RNAflow techniques allowed measurement at the RNA level25, and numerous non-antibody-based dyes were able to detect processes from redox potential26 to organelle content and status27. The very utility of the technique has pushed flow cytometry to its technical barrier: the desire to measure everything on every cell has driven up the number of parameters that can be distinctly measured. The constraints imposed by overlapping fluorescent spectra are arguably the largest limit to the potential of flow cytometry, yet progress in the mathematical underpinnings of the analysis has substantially lagged behind the advances in the chemical and physical bases of the technology.

Newer single-cell technologies, most notably mass cytometry and single-cell RNA-Seq, do not have the spillover issues of flow cytometry. Mass cytometry is a direct competitor to flow cytometry, also primarily utilizing antibody-based detection of single-cell expression28. As the heavy metal labels do not overlap, mass cytometry panels can be built up in a modular manner, without the same design constraints required for flow cytometry29. While spectral flow cytometry and mass cytometry can readily run more than 40 parameters, classical flow cytometry experiments struggle to use more than 30 parameters, due to the challenge of distinguishing signals from each dye or fluorophore. Nonetheless, flow cytometry has major advantages over mass cytometry, most notably the speed of data acquisition (around 50-fold more rapid data collection) and the ability to sort live cells. The other main competitor to flow cytometry is single-cell RNA-Seq28. While initially limited to measurement of RNA content in a semi-quantitative manner, the advent of barcoded antibodies in protocols such as CITE-Seq30 and Abseq31 provided data directly comparable to that of flow cytometry. As barcoding approaches have no practical limit concerning compensation issues, they can compete with flow cytometry. Even in this case, however, flow cytometry has distinct technological advantages. In addition to the previously mentioned advantage of live-cell sorting, flow cytometry produces data at an unparalleled speed, with more than $$10^{6}$$ cells measured per minute, and with a data format enabling immediate analysis. In terms of price, current flow cytometry assays are several orders of magnitude cheaper than RNA-Seq, with costs on the order of 10 USD per $$10^{6}$$ cells28. Flow cytometry is therefore very much a living technology, with important advantages over competitor technologies and limited only by the parameter barrier.

The latest iteration of flow cytometry is spectral flow cytometry, a refinement where more channels (detectors) are used than dyes. Spectral flow cytometry allows for enhanced discrimination of fluorophores, including those that share a main channel, by calculating the dye origin through fluorescence at minor channels where the emission spectrum differs32. Spectral unmixing (assignment of detector signal to dyes) requires the generation of an accurate spillover matrix, which can be performed in a mathematically identical manner to the spillover matrix of traditional flow cytometry, regressing each dye against each other, but producing a rectangular rather than square spillover matrix (as channels > dyes). While different algorithms have been proposed for applying this spillover matrix to unmix the spectral data33,34, each benefits from the use of a more accurate spillover matrix. As AutoSpill focuses on improving the estimation of the spillover coefficients, rather than on how these coefficients are used, the application of the AutoSpill algorithm to spectral cytometry data can therefore yield benefits similar to those observed with traditional flow cytometry data. Indeed, spectral systems may well be the more compelling use case, as the system encourages dye crowding and the use of dyes with overlapping spectra. Moreover, spectral systems almost always have sufficient spectral resolution to orthogonalize autofluorescence from the other fluorescent spectra present in a sample. It is in these more complex cases where AutoSpill provides the greatest benefit.

We have presented here a compensation method which greatly reduces compensation error and expands the possible number of parameters in flow cytometry experiments. The use of robust linear regression and iterative refinement allows the calculation of spillover matrices without the need for using controls with well-defined positive and negative populations, thus permitting the use of the actual panel antibodies for the controls in many experiments. This method can be applied to any flow panel from 4 to 6 fluorophores up to multi-color staining sets with more than 30 fluorescent dyes. Given that the typical number of gated events in single-color controls is at least in the order of thousands, the amount of data points available enables this approach to reduce compensation errors to such small values that the resulting compensation is, in practical terms, functionally perfect for the given set of single-color controls. On the other hand, the method needs some level of fluorescence in the primary channel for each control (or at least in one of the detectors for spectral systems), to be able to regress the spillover coefficients.

An added feature of AutoSpill is the ability to compensate out autofluorescence. Although some methods have been proposed35,36,37, typically it is not possible to remove autofluorescence, with the exception of some spectral systems38,39. By default, AutoSpill does not use an unstained control, but it can be included and assigned to an extra unused channel in the flow cytometer. Data collected in this extra channel can be treated as coming from an endogenous fluorescent dye, which results in the inclusion of autofluorescence levels in the calculation of spillover coefficients and ensuing compensation. This optional approach is recommended when there are non-negligible levels of autofluorescence in one or several channels (as observed from an unstained control), and one of those high-autofluorescence channels is not used in the design of the panel. As autofluorescence can be increased by physiological and cellular processes13,40, the ability to compensate out autofluorescence can remove distortions appearing as false positives, where cellular changes are mistakenly identified as altered expression of a marker, while the signal is in fact caused by autofluorescence. This approach will be of particular utility in the study of cell populations with high intrinsic autofluorescence, such as myeloid-lineage cells13,14 or tumor cells41,42.

In comparison with previous compensation methods, which do not guarantee an upper bound on the compensation error, AutoSpill provides a spillover matrix with such a guarantee, given a set of controls. Therefore, it is now possible to address a new question: to what extent is a set of single-color controls sufficient to ensure proper compensation of data obtained with a complete panel, that is, not just for the set of controls? In our experience, some panels still require minor modifications of the spillover matrix, which implies that the single-color controls do not fully describe the fluorescence properties of the complete panel, probably because of second-order phenomena such as secondary fluorescence or other interactions between dyes. Thus, this remains an open question.

While we demonstrate the utility of this method using eight representative datasets, the tool has been beta-tested more than 1000 times over a period of 22 months by more than 100 collaborating immunologists. This has allowed the development of a robust algorithm, designed to accommodate diverse datasets and to deal with less-than-perfect data arising in real-world experiments. The code is open source and is released with a permissive license, allowing integration into existing flow cytometry analysis pipelines in academia and industry. To increase access by research communities in immunology and other fields, we also provide a website (https://autospill.vib.be) that allows the upload of sets of single-color controls for calculating the spillover matrix with AutoSpill, produced in formats compatible with common software for flow cytometry analysis. As we have demonstrated by including AutoSpill in FlowJo v.10.7, this algorithm is suitable for integration into commercial software, allowing for rapid and widespread uptake of superior flow cytometry compensation.

## Methods

### Datasets

Collaborating immunologists beta-tested AutoSpill over a period of 22 months, which allowed extensive testing and improvement of the algorithm for niche cases. Among these datasets, four are used as examples here, covering mouse cells, human cells, and beads. Compensation using AutoSpill, with default parameters, was carried out for each of these four sets of single-color controls: mouse splenocytes (MM1 dataset), human PBMCs (HS1 and HS2 datasets), and beads (Be1 dataset). We also analyzed four fully stained datasets, as examples of biological utility: mouse splenocytes (MM2 and MM3 datasets), and mouse microglia (MM4 and MM5 datasets). Data collection complied with all relevant ethical regulations for animal research and work with human participants. All animal experiments were performed in accordance with the University of Leuven Animal Ethics Committee guidelines or the Babraham Institute Animal Welfare and Ethics Review Body. Animal husbandry and experimentation complied with existing European Union and national legislation and local standards. Sample sizes for mouse experiments were chosen in conjunction with the ethics committees to allow for robust sensitivity without excessive use. For human experiments, written informed consent was obtained from all participants and the ethics committee of University Hospitals Leuven approved the study.

UltraComp eBeads™ Compensation Beads (Thermo Fisher) were used to optimize fluorescence compensation settings for multi-color flow cytometric analysis on a Symphony flow cytometer. UltraComp eBeads™ were stained with the following fluorochrome-labeled anti-human antibodies: anti-CD8–BUV805 (1:200, clone SK1), anti-CD4–BUV496 (1:50, clone SK3), anti-CD86–BUV737 (1:50, clone 2331 FUN-1), anti-CD141–BUV615-P (1:50, clone 1A4), anti-CD56–BUV563 (1:50, clone NCAM 16.2), anti-CD16–BUV395 (1:50, clone 3G8), anti-CD123–BB660-P (1:50, clone 7G3), anti-CD80–BB630 (1:50, clone L307.4), anti-CD21–BV785 (1:50, clone B-ly4), anti-CD27–BV750-P (1:40, clone L128), anti-BAFF-R–BV650 (1:50, clone 11C1), anti-CD94–BV605 (1:50, clone HP-3D9), anti-CD40–APC-R700 (1:50, clone 5C3) (all BD Biosciences); anti-CD3–PerCP-Vio700 (1:50, clone REA613) (Miltenyi Biotec); anti-CD57–FITC (1:100, clone TB01), anti-CD14–PE-Cy5.5 (1:200, clone TuK4), fixable viability dye eFluor780 (1:1000) (all eBioscience); anti-CD24–BV711 (1:50, clone ML5), anti-CD19–BV510 (1:25, clone HIB19), anti-HLA-DR–BV570 (1:40, clone L243), anti-IgM–BV421 (1:100, clone MHM-88), anti-CD11c–APC (1:40, clone 3.9), anti-CD38–PE/Dazzle 594 (1:100, clone HB-7), anti-CD10–PE-Cy5 (1:50, clone HI10a), and anti-IgD–PE-Cy7 (1:100, clone IA6-2) (all BioLegend).

### HS1 dataset, human peripheral blood mononuclear cells (PBMCs)

PBMCs were isolated from heparinized blood samples of human healthy donors using Ficoll-Paque density centrifugation (MP biomedicals), frozen and then stored in liquid nitrogen. Frozen PBMCs were thawed and counted, and cell concentration was adjusted to 1 × 10⁶ for each single-color control. Cells were plated in a V-bottom 96-well plate, washed once with PBS (Fisher Scientific) and stained with live/dead marker and fluorochrome-conjugated antibodies against surface markers: anti-CD8–BUV805 (1:200, clone SK1), anti-CD4–BUV496 (1:50, clone SK3), anti-CD95–BUV737 (1:100, clone DX2), anti-CD4–BUV615-P (1:50, clone SK3), anti-CD28–BB660-P (1:100, clone CD28.2), anti-CD4–BB630 (1:50, clone SK3), anti-CD4–BV750-P (1:50, clone SK3), anti-CD31–BV480 (1:100, clone WM59), anti-CXCR5–BV650 (1:25, clone RF8B2), anti-CD4–PE (1:100, clone SK3), anti-CD4–PE-Cy5 (1:50, clone SK3) (all BD Biosciences); anti-CD3–PerCP-Vio700 (1:50, clone REA613) (Miltenyi Biotec); anti-CD3–FITC (1:50, clone UCHT1), anti-CD4–PE-Cy5.5 (1:50, clone SK3), anti-CCR7–PE-Cy7 (1:50, clone 3D12), anti-CD4–APCeFluor780 (1:50, clone SK3) (all eBioscience); anti-CD4–BV786 (1:50, clone SK3), anti-CD4–BV711 (1:50, clone SK3), anti-CD4–BV605 (1:50, clone SK3), anti-HLA-DR–BV570 (1:40, clone L243), anti-CD127–BV421 (1:25, clone A019D5), anti-CD4–PE/Dazzle 594 (1:100, clone SK3), anti-CD4–AF647 (1:50, clone SK3) (all BioLegend).

Samples were stained for 60 min at 4 °C, washed twice in PBS/1% FBS (Tico Europe), and then fixed and permeabilized with Foxp3 Transcription Factor Staining Buffer Set (eBioscience), according to manufacturer’s instructions. Cells were stored overnight at 4 °C and were then acquired on a Symphony flow cytometer with Diva software (BD Biosciences). A minimum of 5 × 10⁴ events were acquired for each sample.

### HS2 dataset, human PBMCs

Frozen PBMCs from human healthy donors were processed as for the HS1 dataset and stained with live/dead marker and fluorochrome-conjugated antibodies against the following surface markers: anti-CD8–BUV805 (1:200, clone SK1), anti-CD4–BUV496 (1:50, clone SK3), anti-CD95–BUV737 (1:100, clone DX2), anti-CD28–BB660-P (1:100, clone CD28.2), anti-ICOS–BB630 (1:50, clone DX29), anti-CXCR3–BV785 (1:25, clone 1C6), anti-PD-1–BV750-P (1:25, clone EH12.1), anti-CXCR5–BV650 (1:25, clone RF8B2), anti-CCR2–BV605 (1:25, clone 1D9), anti-CD31–BV480 (1:100, clone WM59) (all BD Biosciences); anti-CD3–PerCP-Vio700 (1:50, clone REA613) (Miltenyi Biotec); anti-CD45RA–FITC (1:50, clone HI100), anti-CD14–PE-Cy5.5 (1:200, clone TuK4), anti-CCR7–PE-Cy7 (1:50, clone 3D12), fixable viability dye eFluor780 (all eBioscience); anti-CD25–BV711 (1:25, clone BC96), anti-HLA-DR–BV570 (1:40, clone L243), anti-CD127–BV421 (1:25, clone A019D5), and anti-CCR4–PE/Dazzle 594 (1:100, clone L291H4) (all BioLegend).

Samples were stained for 60 min at 4 °C, washed twice in PBS/1% FBS (Tico Europe), and then fixed and permeabilized with Foxp3 Transcription Factor Staining Buffer Set (eBioscience), according to manufacturer’s instructions. Cells were stained overnight at 4 °C with anti-Ki67–BUV615-P, anti-CTLA-4–PE-Cy5, anti-RORγt–PE (BD Biosciences), and anti-FOXP3–AF647 (BioLegend) anti-human intracellular antibody. Samples were acquired on a Symphony flow cytometer (BD Biosciences).

### MM1 dataset, mouse splenocytes

Splenocytes from C57Bl/6 mice were disrupted with glass slides, filtered through 100 μm mesh, and red blood cells lysed. Cells were fixed and permeabilized with Foxp3 transcription factor staining buffer set (eBioscience) according to the manufacturer’s instructions, and stained overnight at 4 °C with Fixable Viability Dye eFluor780 (eBioscience) or the following antibodies: anti-CD4–BV421 (1:200, clone GK1.5), anti-CD24–BV510 (1:400, clone M1/69), anti-CD3–BV570 (1:250, clone 145-2C11), anti-CD4–BV605 (1:200, clone RM4-5), anti-CD3–BV650 (1:400, clone 145-2C11), anti-CD4–BV711 (1:200, clone GK1.5), anti-CD4–BV785 (1:200, clone GK1.5), anti-CD3–AF488 (1:1000, clone 145-2C11)/anti-CD4–AF488 (1:200, clone RM4-5)/anti-TCRβ–AF488 (1:2000, clone H57-597), anti-CD4–PerCP-Cy5.5 (1:200, clone RM4-5), anti-CD4–PE-594 (1:200, clone RM4-5), anti-CD8–PE-Cy7 (1:2000, clone 53-6.7), anti-MHC-II–AF700 (1:1000, clone M5/114.15.2) (all Biolegend), anti-CD19–BV750 (1:500, clone 1D3), anti-CD3–BB630-P (1:1000, clone 145-2C11)/anti-Thy1.2–BB630-P (1:4000, clone 53-2.1), anti-CD45.2–BB660-P2 (1:1000, clone 104)/anti-CD3–BB660-P2 (1:1000, clone 145-2C11), anti-TCRβ–BB790-P (1:2000, clone H57-597), anti-CD4–BUV395 (1:200, clone GK1.5), anti-IgD–BUV496 (1:2000, clone 11-26c.2a), anti-CD3–BUV563 (1:400, clone 145-2C11), anti-CD3–BUV615-P (1:400, clone 145-2C11), anti-CD19–BUV661 (1:250, clone 1D3), anti-CD21–BUV737 (1:500, clone 7G6), anti-CD8–BUV805 (1:250, clone 53-6.7) (all BD Biosciences), anti-CD4–PE (1:500, clone RM4-5)/anti-CD3–PE (1:2000, clone 145-2C11)/anti-CD8–PE (1:500, clone 53-6.7), anti-IgM–PE-Cy5 (1:2000, clone Il/41), anti-CD3–PE-Cy5.5 (1:8000, clone 145-2C11) or anti-CD4–APC (1:1000, clone RM4-5) (all eBioscience). For some fluorophores, multiple antibodies were used in the same compensation control, which is indicated by slashes. Samples were acquired on a Symphony flow cytometer (BD Biosciences).

### MM2 dataset, mouse splenocytes

Splenocytes from C57Bl/6 mice were disrupted with glass slides, filtered through 100 μm mesh, and red blood cells lysed. Cells were stained with Fixable Viability Dye eFluor780 (eBioscience), fixed and permeabilized with Foxp3 transcription factor staining buffer set (eBioscience) according to the manufacturer’s instructions, and stained overnight at 4 °C with the following antibodies: anti-CD4–BV421 (1:2000, clone N418), anti-CD24–BV510 (1:2000, clone M1/69), anti-Ly6G–BV570 (1:2000, clone 1A8), anti-XCR1–BV650 (1:2500, clone ZET), anti-CD19–BV785 (1:400, clone 1D3), anti-CD3–AF488 (1:1000, clone 145-2C11), anti-PDCA-1–PerCP-Cy5.5 (1:1000, clone 927), anti-CD23–PE (1:5000, clone B3B4), anti-CD64–PE-594 (1:500, clone X54-5/7.1), anti-CD172a–PE-Cy7 (1:5000, clone P84), anti-CD45–APC (1:10,000, clone 30-F11), anti-MHCII–AF700 (1:2000, clone M5/114.15.2) (all Biolegend), anti-IgE–BV605 (1:5000, clone R35-72), anti-CD93–BV711 (1:2000, clone AA4.1), anti-CD11b–BV750 (1:2000, clone M1/70), anti-CD80–BB630-P (1:2000, clone 16-10A1), anti-CD95–BB660-P2 (1:10,000, clone Jo2), anti-TCRβ–BB790-P (1:2000, clone H57-597), anti-CD103–BUV395 (1:1000, clone M290), anti-IgD–BUV496 (1:2000, clone 11-26c.2a), anti-Ly6C–BUV563 (1:500, clone AL-21), anti-Siglec F–BUV615-P (1:1000, clone E50-2440), anti-c-Kit–BUV661 (1:5000, clone 2B8), anti-CD21/35–BUV737 (1:5000, clone 7G6), anti-CD8a–BUV805 (1:500, clone 53-6.7) (all BD Biosciences), anti-IgM–PE-Cy5 (1:1000, clone Il/41) and anti-NK1.1–PECy5.5 (1:2000, clone PK136) (eBioscience). Compensation controls were stained as described in the MM1 dataset. Samples were acquired on a Symphony flow cytometer (BD Biosciences).

### MM3 dataset, mouse splenocytes

Splenocytes from C57Bl/6 mice were disrupted with glass slides, filtered through 100 μm mesh, and red blood cells lysed. Cells were stained with Fixable Viability Dye eFluor780 (eBioscience), anti-CD90.2–BV510 (1:250, clone 53-2.1), anti-CD25–BV650 (1:200, clone PC61), anti-CD45–BUV395 (1:500, clone 30-F11) (all Biolegend), anti-CD127–PE (1:100, clone A7R34) and anti-B220–PE-Cy5 (1:200, clone RA3-6B2) (all eBioscience). Cells were fixed and permeabilized with Foxp3 transcription factor staining buffer set (eBioscience) according to the manufacturer’s instructions, and stained overnight at 4 °C with the following antibodies: anti-T-bet–BV421 (1:200, clone 4B10), anti-CD8–BV785 (1:2000, clone 53-6.7), anti-NKp46–FITC (1:500, clone 29A1.4), anti-NK1.1–PE-Cy5.5 (1:2500, clone PK136), anti-MHCII–AF700 (1:2000, clone M5/114.15.2) (all Biolegend), anti-CD11b–eFluor450 (1:1000, clone M1/70), anti-GATA3–PE-Cy7 (1:100, clone L50-823), anti-CD3–biotin (1:1000, clone 145-2C11), anti-RORt–APC (1:500, clone AFKJS-9) (all eBioscience), anti-TCRβ–BB790-P (1:4000, clone H57-597), anti-CD4–BUV496 (1:500, clone GK1.5), and anti-CD19–BUV661 (1:2000, clone 1D3) (all BD Biosciences). Antibodies used for compensation controls were anti-CD25–BV421 (1:200, clone PC61), anti-CD44–BV510 (1:200, clone IM7), anti-CD3–BV650 (1:200, clone 17A2), anti-CD8–BV785 (1:2000, clone 53-6.7), anti-NK1.1–PE-Cy5.5 (1:2500, clone PK136), anti-MHCII–AF700 (1:2000, clone M5/114.15.2) (all Biolegend), anti-CD11b–eFluor450 (1:1000, clone M1/70), anti-TCRβ–FITC (1:500, clone H57-597), anti-B220–PE-Cy5 (1:200, clone RA3-6B2), anti-CD23–PE-Cy7 (1:500, clone B3B4), anti-CD8–biotin (1:200, clone 53-6.7), anti-Foxp3–APC (1:200, clone FJK-16s), anti-CD69–PE (1:200, clone H1.2F3) (all eBioscience), anti-TCRβ–BB790-P (1:4000, clone H57-597), anti-CD103–BUV395 (1:500, clone M290), anti-CD4–BUV496 (1:200, clone GK1.5), and anti-CD19–BUV661 (1:2000, clone 1D3) (all BD Biosciences). 
Streptavidin AF350 (1:200, Invitrogen) was used to identify biotinylated antibody. Samples were acquired on a Yeti/ZE5 flow cytometer (Propel Labs/BioRad).

### MM4 dataset, mouse microglia

MHCII knockout mice43 were used on the B6 background. Leukocytes and microglia were extracted from mouse brains by chopping with a razor blade, digested in 0.4 mg/ml collagenase D (Sigma-Aldrich), and separated over 40% Percoll (GE Healthcare). Microglia were stained with anti-MHCII–FITC (1:200, clone M5/114.15.2, eBioscience), anti-CD11b–PE-Cy7 (1:500, clone M1/70, eBioscience), anti-CD45–APC (1:1000, clone 30-F11, eBioscience), anti-CD4–PE-Dazzle594 (1:500, clone GK1.5, BioLegend), and fixable viability dye eFluor780 (eBioscience). Samples were acquired on an Aurora spectral cytometer (Cytek).

### MM5 dataset, mouse splenocytes

Foxp3DTR-GFP mice44 were used on the B6 background. Splenocytes were disrupted with glass slides, filtered through 100 μm mesh, and red blood cells lysed. Splenocytes were stained with anti-CD11b–PE-Cy7 (1:2000, clone M1/70, eBioscience), anti-CD45–APC (1:1000, clone 30-F11, eBioscience), anti-CD4–PE-Dazzle594 (1:500, clone GK1.5, BioLegend), and fixable viability dye eFluor780 (1:4000, eBioscience). Samples were acquired on an Aurora spectral cytometer (Cytek).

### General implementation details of AutoSpill

AutoSpill was implemented in R v.3.6.3, using the packages flowCore v.1.52.1, flowWorkspace v.3.34.1, ggplot2 v.3.3.2, moments v.0.14, and RColorBrewer v.1.1-2. Further details on packages specific to particular steps of the algorithm are listed below.

### Initial gating

The initial gate was calculated independently for each control, over the 2d-density of events on forward and side scatter (FSC-A and SSC-A parameters). To robustly detect the population of interest, two tessellations were successively carried out to isolate the desired density peak. First, data were trimmed on extreme values (1% and 99%). Then, maxima were located numerically by a moving average (window size 3) on a soft estimation of the 2d-density (bandwidth factor 3). Maxima were used to generate non-overlapping tiles covering the entire 2d dataset (tessellation). The first tessellation was carried out on these density maxima, and the tile corresponding to the highest maximum was selected, ignoring peaks with lower values of both FSC-A and SSC-A (<5% of range). A rectangular region in the FSC-A/SSC-A-plane was chosen by using the median and 3 × the mean absolute deviation of the events contained in the selected tile. A second, finer 2d-density estimation (bandwidth factor 2) was obtained on this region, followed again by numerical detection of maxima (window size 2) and tessellation by the maxima. A final 2d-density estimation (bandwidth factor 1) was obtained on the tile containing the highest maximum, with the gate being defined as the convex hull enclosing the points that belonged to this tile and had a density larger than a threshold (33% of range).
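The last step of the gating (a convex hull around the points whose density exceeds 33% of the density range) can be illustrated with a minimal Python sketch. It is not the AutoSpill implementation, which is written in R: the function names are hypothetical, the per-point densities are assumed to have been computed beforehand, and the hull is built with Andrew's monotone chain algorithm as a stand-in for the spatial routines used by the authors.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def density_gate(points, densities, fraction=0.33):
    """Hull of the points whose density exceeds `fraction` of the density range.

    `points` are (FSC-A, SSC-A) pairs; `densities` are assumed to be the
    2d-density estimates evaluated at those points (hypothetical input).
    """
    lo, hi = min(densities), max(densities)
    threshold = lo + fraction * (hi - lo)
    kept = [p for p, d in zip(points, densities) if d > threshold]
    return convex_hull(kept)
```

Events would then be kept if they fall inside the returned polygon; the point-in-polygon test is omitted here.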

Tessellations were carried out with package deldir v.0.1-28, density estimations with package MASS v.7.3-51.6, surface interpolations with package fields v.10.3, and spatial operations with packages sp v.1.4-2 and tripack v.1.3-9.

### Robust linear models for estimation of spillover coefficients

The linearity of the quantum mechanical nature of photons implies that the ratio between the average fluorescence level (that is, the average number of photons) detected in any two detectors and from any dye is equal to the ratio between the corresponding values of the emission spectrum of the dye, regardless of the level of fluorescence. As the value of the spillover coefficient for the primary channel (the channel assigned to the dye in the single-color control, in classical systems) is usually normalized to one, the spillover coefficient of every secondary channel is equal to the fluorescence ratio above. This implies that each spillover coefficient can be directly read from the slope of a linear regression considering the fluorescence in the primary channel as the independent variable and the fluorescence in the secondary channel as the dependent variable (that is, with x and y swapped for the usual representation of single-color controls when compensating). Thus, the absence of spillover corresponds to a zero slope in this regression, that is, to the vertical direction in the usual plot where the primary channel is displayed in the y-axis. To protect the algorithm against distortions in the data, especially those coming from autofluorescence issues, robust linear regression was used, giving lower weights to events farther away from the estimated regression line. Robust linear models (motivated by the heteroscedastic data with outliers) were implemented with the package MASS v.7.3-51.6, with default parameters, i.e. M-estimation with Huber weighting and the parameter k = 1.345.
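To make the regression step concrete, the following Python sketch estimates a spillover coefficient as the slope of a Huber-weighted regression of the secondary channel on the primary channel. It is a simplified stand-in for `rlm` from the R package MASS, not a reimplementation of it: the function name is hypothetical, the fit uses iteratively reweighted least squares with the scale re-estimated from the median absolute residual at each pass, and a fixed number of iterations is assumed instead of a convergence test.

```python
import statistics


def huber_slope(x, y, k=1.345, iterations=50):
    """Slope and intercept of y ~ x by iteratively reweighted least squares.

    x: fluorescence in the primary channel (independent variable)
    y: fluorescence in the secondary channel (dependent variable)
    """
    weights = [1.0] * len(x)
    slope = intercept = 0.0
    for _ in range(iterations):
        total = sum(weights)
        mean_x = sum(w * xi for w, xi in zip(weights, x)) / total
        mean_y = sum(w * yi for w, yi in zip(weights, y)) / total
        sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mean_x) * (yi - mean_y)
                  for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual.
        scale = statistics.median(abs(r) for r in residuals) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the band, decaying as k * scale / |r| outside.
        weights = [1.0 if abs(r) <= k * scale else k * scale / abs(r)
                   for r in residuals]
    return slope, intercept
```

With a few bright outlying events added to otherwise linear data, the returned slope stays close to the underlying ratio, whereas an ordinary least-squares slope would be pulled towards the outliers.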

### Refinement of spillover matrix

After the first iteration of the algorithm, applying on compensated data the same kind of calculation used for the spillover coefficients, on channels in classical systems or on dyes in spectral systems, would produce zero values with perfect compensation, corresponding to perfectly vertical compensation plots. Otherwise, errors in compensation would yield non-zero values reflecting residual spillover. Overcompensated data would amount to excessively negative values in the secondary channel/dye, corresponding to a negative slope. Similarly, undercompensation would produce excessively positive values in the secondary channel/dye, corresponding to a positive slope.

Observed errors in compensation arise from errors in the estimation of the spillover coefficients. Crucially, it can be proved that, for the average event at any level of fluorescence, the error matrix T in the calculation of the spillover matrix S can be calculated from the observed compensation errors E as

$${\bf{T}}=-{\bf{E}}{\bf{U}}\ ,$$
(1)

where U = S + T is the (erroneous) spillover matrix used to compensate the data (see below).

By successively applying Eq. (1), that is, by iteratively refining the spillover matrix and recalculating the compensation, errors in the spillover matrix and errors in compensation can be reduced to a negligible magnitude. The algorithm starts working in linear scale, and switches to bi-exponential scale when the maximum compensation error across all single-color controls is less than a threshold fixed a priori (10⁻²). To be used in Eq. (1), compensation errors obtained in the bi-exponential scale are transformed back to a linear scale, by using the two points in the regression line with extreme values in the primary channel. Iterations stop near the convergence of the algorithm when the maximum compensation error across all single-color controls is less than a threshold of 10⁻⁴.
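As an illustration of Eq. (1), the following minimal Python sketch refines an erroneous spillover matrix in an idealized, noise-free classical system (as many channels as dyes). It is not the AutoSpill implementation. The function names are hypothetical, the true matrix `S` is used only to simulate the compensated controls, the compensation errors are read off directly instead of being estimated by robust regression, the switch to bi-exponential scale is left out, and rescaling each row so that the primary coefficient stays at one is an assumption of this sketch.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compensation_errors(S, U):
    """E[i][j]: slope of compensated dye j against compensated dye i in control i."""
    D = matmul(S, inverse(U))  # row i: control i at unit brightness, compensated with U
    n = len(S)
    return [[0.0 if i == j else D[i][j] / D[i][i] for j in range(n)]
            for i in range(n)]


def refine(S, U, tol=1e-4, max_iter=100):
    """Iterate T = -E U and U <- U - T until the largest error is below tol."""
    n = len(U)
    for iteration in range(max_iter):
        E = compensation_errors(S, U)
        if max(abs(e) for row in E for e in row) < tol:
            return U, iteration
        T = [[-v for v in row] for row in matmul(E, U)]              # Eq. (1)
        U = [[U[i][j] - T[i][j] for j in range(n)] for i in range(n)]
        U = [[v / U[i][i] for v in U[i]] for i in range(n)]          # primary coefficient = 1
    return U, max_iter
```

In this noise-free setting a single update essentially recovers the true matrix. With real controls the errors are regression estimates, so several iterations are expected before the stopping threshold is reached.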

While effective in most cases, this strategy for reducing compensation error can become compromised when using controls with low fluorescence levels in the primary channel or other fluorescence artifacts. In these situations, iterations can give rise to oscillations in the observed compensation errors before reaching convergence. To deal with these extreme cases, oscillations are detected by a moving average (size 10, initial value 1) of the decrease in the standard deviation of spillover errors. When this moving average gets below a threshold of 10⁻⁶, a fraction (10%) of the update to the spillover matrix is applied in Eq. (1), slowing down convergence and further decreasing compensation error.
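One possible reading of this damping rule is sketched below in Python. The class and its names are hypothetical, and several details are assumptions rather than statements about the published code: the moving average is taken over the last 10 iteration-to-iteration decreases, the window is pre-filled with the initial value 1, and the damped fraction applies only while the average stays below the threshold.

```python
from collections import deque


class UpdateDamper:
    """Decides which fraction of the update T to apply at each iteration."""

    def __init__(self, window=10, initial=1.0, threshold=1e-6,
                 damped_fraction=0.1):
        # Window of recent decreases in the standard deviation of the errors.
        self.decreases = deque([initial] * window, maxlen=window)
        self.threshold = threshold
        self.damped_fraction = damped_fraction
        self.previous_sd = None

    def fraction(self, error_sd):
        """Record the current error SD and return the update fraction to use."""
        if self.previous_sd is not None:
            self.decreases.append(self.previous_sd - error_sd)
        self.previous_sd = error_sd
        average = sum(self.decreases) / len(self.decreases)
        return self.damped_fraction if average < self.threshold else 1.0
```

In a refinement loop, the returned value would scale the update, for example `U - fraction * T` instead of `U - T`.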

### Spillover error

In a flow cytometry system with c channels, let us consider the spillover matrix for a set of d single-color controls, that is for d dyes, with d ≤ c. We concentrate on the dye i = 1…d during the following argument.

For any event in the flow cytometer, we have the following two row vectors: the true event data x, with length d, and the observed event data y, with length c. On average for any level of fluorescence, true and observed events are related linearly through the d × c spillover matrix S, according to

$${\bf{x}}{\bf{S}}={\bf{y}}.$$
(2)

Classical flow cytometry systems have c = d, and compensation is usually achieved by inverting the spillover matrix S and multiplying by the observed data y. Spectral systems feature c > d, and compensation is usually called unmixing and is not unequivocally defined, because Eq. (2) produces an overspecified system of equations. In the following, and for simplicity, we refer to unmixing in spectral systems also as compensation.
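The difference between the two cases can be sketched in Python. The example below solves Eq. (2) for x by ordinary least squares through the normal equations, which reduces to plain matrix inversion when c = d. Ordinary least squares is used here only as one common choice for unmixing, not as the method of any particular instrument or of AutoSpill, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def unmix(y, S):
    """Least-squares estimate of x in x S = y, for a d x c matrix S with d <= c."""
    d = len(S)
    # Normal equations: (S S^T) x^T = S y^T
    gram = [[sum(si * sj for si, sj in zip(S[i], S[j])) for j in range(d)]
            for i in range(d)]
    rhs = [sum(s * v for s, v in zip(S[i], y)) for i in range(d)]
    return solve(gram, rhs)
```

For example, with two dyes read in three channels, `unmix([110.0, 100.0, 30.0], [[1.0, 0.5, 0.1], [0.2, 1.0, 0.4]])` recovers dye levels close to 100 and 50, since the observed vector was built from exactly those values.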

Independently of the compensation method used, when the spillover matrix S is estimated as U = S + T, thus with some error T, it unavoidably gives rise to incorrectly compensated data x + p, which verifies, on average,

$$({\bf{x}}+{\bf{p}})({\bf{S}}+{\bf{T}})={\bf{y}}.$$
(3)

Therefore,

$${\bf{x}}{\bf{T}}=-{\bf{p}}{\bf{U}}.$$
(4)
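The step from Eq. (3) to Eq. (4) can be written out explicitly by expanding the product, substituting Eq. (2) and using U = S + T:

$$({\bf{x}}+{\bf{p}})({\bf{S}}+{\bf{T}})={\bf{x}}{\bf{S}}+{\bf{x}}{\bf{T}}+{\bf{p}}({\bf{S}}+{\bf{T}})={\bf{y}}+{\bf{x}}{\bf{T}}+{\bf{p}}{\bf{U}}={\bf{y}}\quad \Rightarrow \quad {\bf{x}}{\bf{T}}=-{\bf{p}}{\bf{U}}.$$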

The vectors x and p, and the matrices S, T, and U, have the following properties:

• Because x represents the true value of events in the single-color control for dye i, $${x}_{i} > 0$$ and $${x}_{j}=0$$ for all j ≠ i.

• The ith row of the spillover matrix S is normalized with $$1={S}_{ir}\ge {S}_{is}\ge 0$$, for some r = 1…c and every s ≠ r.

• The row normalization of S implies that the true value of the dye in the control, $${x}_{i}$$, can always be obtained from the observed value $${y}_{r}$$, as Eq. (2) implies $${y}_{r}={x}_{i}{S}_{ir}={x}_{i}$$. Therefore, $${p}_{i}=0$$, irrespective of errors in the estimation of the spillover matrix.

• Also because of the row normalization of the spillover matrix, the estimation of the spillover coefficient $${S}_{ir}=1$$ will always be exact, i.e. $${U}_{ir}=1$$ and $${T}_{ir}=0$$, irrespective of errors in the estimation of the spillover matrix.

Let us consider now the LHS of Eq. (4), i.e. the row vector xT. Its sth coefficient, for any s = 1…c, equals

$${({\bf{x}}{\bf{T}})}_{s}=\mathop{\sum }\limits_{j=1}^{d}{x}_{j}{T}_{js}={x}_{i}{T}_{is}.$$
(5)

Note that $${({\bf{x}}{\bf{T}})}_{r}=0$$.

Let us consider the RHS of Eq. (4), i.e. the row vector −pU. Its sth coefficient, for any s = 1…c, equals

$${(-{\bf{p}}{\bf{U}})}_{s}=-\mathop{\sum }\limits_{j=1}^{d}{p}_{j}{U}_{js}.$$
(6)

Note that the summation term $${p}_{i}{U}_{is}=0$$.

Equations (4)–(6) imply that, for any s = 1…c,

$${T}_{is}=-\mathop{\sum }\limits_{j=1}^{d}\frac{{p}_{j}}{{x}_{i}}{U}_{js}.$$
(7)

The ratio $${p}_{j}/{x}_{i}$$ can be considered as the compensation error for the average event, corresponding to a spurious signal assigned to dye j, caused by incorrectly compensated spillover from dye i. Equation (3) implies that the ratio $${p}_{j}/{x}_{i}$$ is invariant with respect to the level of fluorescence, and thus it can be estimated by regressing $${p}_{j}$$ against $${x}_{i}$$.

Let us define the compensation error matrix E as the d × d matrix with coefficients

$${E}_{ij}=\frac{{p}_{j}}{{x}_{i}}.$$
(8)

Note that $${E}_{ii}=0$$. We can then rewrite Eq. (7) as

$${T}_{is}=-\mathop{\sum }\limits_{j=1}^{d}{E}_{ij}{U}_{js}=-{\bf{E}}(i,* )\ {\bf{U}}(* ,s),$$
(9)

for any s = 1…c.

In summary, Eq. (9) allows us to calculate the ith row of the spillover error matrix T. By repeating the same argument for every dye, we can obtain all the rows i = 1…d, and thus the complete matrix as

$${\bf{T}}=-{\bf{E}}{\bf{U}}.$$
(10)

### Linear models for estimation of SSM

Successful compensation equilibrates around zero the fluorescence levels in all secondary channels, but with the cost of accentuating undesirable variance or spread in those channels. Again for quantum mechanical reasons, the variance in fluorescence for any (compensated or uncompensated) channel/dye grows linearly with the fluorescence level, and therefore the coefficients of the SSM can be estimated with linear regression.

We start with the formula for an SSM coefficient $${\mathrm{{S{S}}}}_{C}^{P}$$, which characterizes the incremental standard deviation induced in parameter C by the spillover from parameter P15,

$${\mathrm{{S{S}}}}_{C}^{P}=\frac{\sqrt{{\sigma }_{{\rm{positive}}}^{2}-{\sigma }_{{\rm{negative}}}^{2}}}{\sqrt{{F}_{{\rm{positive}}}-{F}_{{\rm{negative}}}}}\ ,$$
(11)

where $${\sigma }_{{\rm{positive}}}$$ and $${\sigma }_{{\rm{negative}}}$$ are the standard deviations in C-fluorescence in positive and negative populations, respectively, and $${F}_{{\rm{positive}}}-{F}_{{\rm{negative}}}$$ is the difference in P-fluorescence intensity between them. While the traditional algorithm estimates the above quantities using medians and robust standard deviations of fluorescence in the positive and negative populations, we will, for the sake of linear regression, let our negative be the theoretical quantity when P-fluorescence (F) is equal to zero, while the standard deviation is an unknown quantity, which we call σ0. This assumption, introduced for practical computation, excludes the quadratic effect that σ0 imparts. The effect of this exclusion is negligible, as: (i) σ0 (characterization of the cytometer’s machine noise) is guaranteed to be small when compared to the standard deviation introduced by the Poisson process of counting photons (otherwise the cytometer cannot generate meaningful data), (ii) compensation controls used during SSM calculation include negative populations that reside close to zero, and (iii) the combination of a small σ0 and the presence of a population near zero dramatically reduces the impact of the σ0 quadratic effect on the model, because together they guarantee that the data reside on a near-linear region of a parabola. This gives us the following equation relating F to σ, which is suitable for estimating σ0 by linear regression:

$$\sigma =\sqrt{F}\ \beta +{\sigma }_{0}\ .$$
(12)

Notice that the slope β is not equal to the spillover spreading coefficient $${\mathrm{{S{S}}}}_{C}^{P}$$, except in the unique case where σ0 equals zero. We thus proceed with the estimation of σ0 as the first step of AutoSpread.

To supply data for the regression, we partition the events of the single-color control for parameter P by quantile. For controls with a large number of events, we use 256 quantiles, but we allow as few as 8 to ensure enough events in each quantile to estimate standard deviation reliably. For each other parameter C, we calculate in each quantile the robust standard deviation of fluorescence (the 84th percentile minus the median) as the estimate of σ and the median fluorescence as the estimate of F. The F values may be negative and/or close to zero, so they are passed through a square-root-like transform defined by $${f}_{\sqrt{}}(x)={\rm{sign}}(x)\ (\sqrt{| x| +1}-1)$$ prior to regression, instead of the simple square root function. The resulting regression provides an estimate of σ0.
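The data preparation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the AutoSpread code (which is distributed in binary form): the function names are hypothetical, quantile bins are formed by sorting on the primary parameter and cutting into equal-sized groups, percentiles use linear interpolation, and the number of bins is passed in by the caller since the rule for choosing between 8 and 256 is not specified in the text.

```python
import math
import statistics


def sqrt_like(x):
    """Square-root-like transform sign(x) * (sqrt(|x| + 1) - 1)."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x)


def percentile(sorted_values, q):
    """Percentile by linear interpolation, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_summary(primary, secondary, n_bins=8):
    """One (transformed median F, robust SD) pair per quantile of `primary`."""
    order = sorted(range(len(primary)), key=primary.__getitem__)
    size = len(order) // n_bins
    summary = []
    for b in range(n_bins):
        last = b == n_bins - 1
        idx = order[b * size:] if last else order[b * size:(b + 1) * size]
        f_median = statistics.median(primary[i] for i in idx)
        values = sorted(secondary[i] for i in idx)
        # Robust standard deviation: 84th percentile minus the median.
        sigma = percentile(values, 0.84) - percentile(values, 0.50)
        summary.append((sqrt_like(f_median), sigma))
    return summary
```

The transform maps 0 to 0 and behaves like a signed square root for large magnitudes, for example `sqrt_like(3.0)` is 1.0 and `sqrt_like(-8.0)` is -2.0.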

Using the estimate of σ0, AutoSpread calculates for each quantile the estimate of $$\sigma ^{\prime}$$, defined by $$\sigma ^{\prime} ={f}_{\sqrt{}}({\sigma }^{2}-{\sigma }_{0}^{2})$$, and these adjusted standard deviation estimates provide the data for the second regression, $$\sigma ^{\prime} =\sqrt{F}\ {\mathrm{{S{S}}}}_{C}^{P}$$. This regression is calculated without an intercept term because the adjustment of σ0 forces it to zero.
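The two regressions can be sketched in Python as follows, taking per-quantile values of F and σ as input. This is a simplified illustration rather than the AutoSpread code: the function names are hypothetical, both fits use ordinary least squares, and the square-root-like transform is applied to F in both steps as an assumption of this sketch.

```python
import math


def sqrt_like(x):
    """Square-root-like transform sign(x) * (sqrt(|x| + 1) - 1)."""
    return math.copysign(math.sqrt(abs(x) + 1.0) - 1.0, x)


def fit_line(xs, ys):
    """Ordinary least squares with intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(xs, ys):
    """Least-squares slope of a regression without intercept term."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def spillover_spreading(F, sigma):
    """Return (SS coefficient, sigma0) from per-quantile F and sigma values."""
    root_f = [sqrt_like(f) for f in F]
    # Step 1: the intercept of sigma against transformed F estimates sigma0.
    _, sigma0 = fit_line(root_f, sigma)
    # Step 2: remove sigma0 in quadrature, then regress through the origin.
    adjusted = [sqrt_like(s * s - sigma0 * sigma0) for s in sigma]
    return fit_through_origin(root_f, adjusted), sigma0
```

When σ0 is small relative to the spread at high fluorescence, the slope of the first regression and the final coefficient are close, but they coincide only when σ0 is zero.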

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

The raw data for the eight analyzed datasets is available at FlowRepository (https://flowrepository.org), with IDs FR-FCM-Z2SV (Be1) [https://flowrepository.org/id/FR-FCM-Z2SV], FR-FCM-Z2ST (HS1 & HS2) [https://flowrepository.org/id/FR-FCM-Z2ST], FR-FCM-Z2SS (MM1) [https://flowrepository.org/id/FR-FCM-Z2SS], FR-FCM-Z2SW (MM2) [https://flowrepository.org/id/FR-FCM-Z2SW], FR-FCM-Z2SJ (MM3) [https://flowrepository.org/id/FR-FCM-Z2SJ], FR-FCM-Z2SK (MM4) [https://flowrepository.org/id/FR-FCM-Z2SK], and FR-FCM-Z2SL (MM5) [https://flowrepository.org/id/FR-FCM-Z2SL]. Note that the compensation controls for the MM2 dataset are the MM1 dataset. Source data are provided with this paper.

## Code availability

Source code for AutoSpill is available through the R package autospill, available at the GitHub repository https://github.com/carlosproca/autospill (ref. 45), which includes batch code that reproduces the reported results for the datasets MM1, HS1, HS2, and Be1. The R package is also available in the Supplementary Information as Supplementary Data. In addition, AutoSpill is accessible as a freely available web service at https://autospill.vib.be. The R package also includes batch code to reproduce results as generated by the website.

To allow a large user base to take immediate advantage of the approaches reported here, an implementation of AutoSpill is included in the release of FlowJo v.10.7. AutoSpread is available in binary form in FlowJo v.10.7 (patent pending).

## References

1. 1.

Herzenberg, L. A. et al. The history and future of the fluorescence activated cell sorter and flow cytometry: a view from Stanford. Clin. Chem. 48, 1819–1827 (2002).

2. 2.

O’Gorman, M. R. Clinically relevant functional flow cytometry assays. Clin. Lab. Med. 21, 779–94 (2001).

3. 3.

Krutzik, P. O. & Nolan, G. P. Fluorescent cell barcoding in flow cytometry allows high-throughput drug screening and signaling profiling. Nat. Methods 3, 361–368 (2006).

4. 4.

Maciorowski, Z., Chattopadhyay, P. K. & Jain, P. Basic multicolor flow cytometry. Curr. Protoc. Immunol. 117, 5.4.1–5.4.38 (2017).

5. 5.

Cossarizza, A. et al. Guidelines for the use of flow cytometry and cell sorting in immunological studies (second edition). Eur. J. Immunol. 49, 1457–1973 (2019).

6. 6.

Bendall, S. C., Nolan, G. P., Roederer, M. & Chattopadhyay, P. K. A deep profiler’s guide to cytometry. Trends Immunol. 33, 323–332 (2012).

7. 7.

Roederer, M. Spectral compensation for flow cytometry: visualization artifacts, limitations, and caveats. Cytometry 45, 194–205 (2001).

8. 8.

Carr, E. J. et al. The cellular composition of the human immune system is shaped by age and cohabitation. Nat. Immunol. 17, 461–468 (2016).

9. 9.

Mair, F. & Prlic, M. OMIP 044: 28 color immunophenotyping of the human dendritic cell compartment. Cytometry Part A 93, 402–405 (2018).

10. 10.

Brummelman, J. et al. Development, application and computational analysis of high-dimensional fluorescent antibody panels for single-cell flow cytometry. Nat. Protoc. 14, 1946–1969 (2019).

11. 11.

Bandura, D. R. et al. Mass cytometry: technique for real time single cell multitarget immunoassay based on inductively coupled plasma time-of-flight mass spectrometry. Anal. Chem. 81, 6813–6822 (2009).

12. 12.

Bagwell, C. B. & Adams, E. G. Fluorescence spectral overlap compensation for any number of flow cytometry parameters. Ann. N. Y. Acad. Sci. 677, 167–84 (1993).

13. 13.

Mitchell, A. J. et al. Technical advance: autofluorescence as a tool for myeloid cell analysis. J. Leukoc. Biol. 88, 597–603 (2010).

14. 14.

Vermaelen, K. & Pauwels, R. Accurate and simple discrimination of mouse pulmonary dendritic cell and macrophage populations by flow cytometry: methodology and new insights. Cytometry Part A 61, 170–177 (2004).

15. 15.

Nguyen, R., Perfetto, S., Mahnke, Y. D., Chattopadhyay, P. & Roederer, M. Quantifying spillover spreading for comparing instrument performance and aiding in multicolor panel design. Cytometry Part A 83A, 306–315 (2013).

16. 16.

Li, Q. & Barres, B. A. Microglia and macrophages in brain homeostasis and disease. Nat. Rev. Immunol. 18, 225–242 (2018).

17. 17.

Pasciuto, E. et al. Microglia require cd4 t cells to complete the fetal-to-adult transition. Cell 182, 625–640 (2020).

18. 18.

Chang, X. et al. The Scurfy mutation of FoxP3 in the thymus stroma leads to defective thymopoiesis. J. Exp. Med. 202, 1141–1151 (2005).

19. 19.

Zuo, T. et al. FOXP3 is an X-linked breast cancer suppressor gene and an important repressor of the HER-2/ErbB2 oncogene. Cell 129, 1275–1286 (2007).

20. 20.

Manrique, S. Z. et al. Foxp3-positive macrophages display immunosuppressive properties and promote tumor growth. J. Exp. Med. 208, 1485–1499 (2011).

21. 21.

Liston, A. et al. Lack of Foxp3 function and expression in the thymic epithelium. J. Exp. Med. 204, 475–480 (2007).

22. 22.

Li, F. et al. Autofluorescence contributes to false-positive intracellular Foxp3 staining in macrophages: a lesson learned from flow cytometry. J. Immunol. Methods 386, 101–107 (2012).

23. 23.

Kim, J. et al. Cutting Edge: depletion of Foxp3+ cells leads to induction of autoimmunity by specific ablation of regulatory T cells in genetically targeted mice. J. Immunol. 183, 7631–7634 (2009).

24. 24.

Put, S. et al. Macrophages have no lineage history of Foxp3 expression. Blood 119, 1316–1318 (2012).

25. 25.

Hanley, M. B., Lomas, W., Mittar, D., Maino, V. & Park, E. Detection of low abundance RNA molecules in individual cells by flow cytometry. PLoS ONE 8, e57002 (2013).

26. 26.

Li, R., Jen, N., Yu, F. & Hsiai, T. K. Assessing mitochondrial redox status by flow cytometric methods: vascular response to fluid shear stress. Curr. Protoc. Cytom. 58, 9.37.1–9.37.14 (2011).

27. 27.

Poot, M., Gibson, L. L. & Singer, V. L. Detection of apoptosis in live cells by Mito-Tracken(TM) red CMXRos and SYTO dye flow cytometry. Cytometry 27, 358–364 (1997).

28. 28.

Chattopadhyay, P. K., Winters, A. F., Lomas, W. E., Laino, A. S. & Woods, D. M. High-parameter single-cell analysis. Annu. Rev. Anal. Chem. 12, 411–430 (2019).

29. 29.

Spitzer, M. H. & Nolan, G. P. Mass cytometry: single cells, many features. Cell 165, 780–791 (2016).

30. 30.

Stoeckius, M. et al. Simultaneous epitope and transcriptome measurement in single cells. Nat. Methods 14, 865–868 (2017).

31. 31.

Shahi, P., Kim, S. C., Haliburton, J. R., Gartner, Z. J. & Abate, A. R. Abseq: ultrahigh-throughput single cell protein profiling with droplet microfluidic barcoding. Sci. Rep. 7, 44447 (2017).

32.

Nolan, J. P. & Condello, D. Spectral flow cytometry. Curr. Protoc. Cytom. 63, 1.27.1–1.27.13 (2013).

33.

Novo, D., Grégori, G. & Rajwa, B. Generalized unmixing model for multispectral flow cytometry utilizing nonsquare compensation matrices. Cytometry Part A 83, 508–520 (2013).

34.

Futamura, K. et al. Novel full-spectral flow cytometry with multiple spectrally-adjacent fluorescent proteins and fluorochromes and visualization of in vivo cellular movement. Cytometry Part A 87, 830–842 (2015).

35.

Roederer, M. & Murphy, R. F. Cell by cell autofluorescence correction for low signal to noise systems: application to epidermal growth factor endocytosis by 3T3 fibroblasts. Cytometry 7, 558–565 (1986).

36.

Alberti, S., Parks, D. R. & Herzenberg, L. A. A single laser method for subtraction of cell autofluorescence in flow cytometry. Cytometry 8, 114–119 (1987).

37.

Roederer, M. Distributions of autofluorescence after compensation: Be panglossian, fret not. Cytometry Part A 89, 398–402 (2016).

38.

Nitta, N., Veltri, G. & Dessing, M. Method and Theory of the Autofluorescence Unmixing in SP6800 Spectral Cell Analyzer. Technical Report (Sony Corporation, 2015).

39.

Schmutz, S., Valente, M., Cumano, A. & Novault, S. Spectral cytometry has unique properties allowing multicolor analysis of cell suspensions isolated from solid tissues. PLoS ONE 11, e0159961 (2016).

40.

Surre, J. et al. Strong increase in the autofluorescence of cells signals struggle for survival. Sci. Rep. 8, 12088 (2018).

41.

Smith, C. A., Pollice, A., Emlet, D. & Shackney, S. E. A simple correction for cell autofluorescence for multiparameter cell-based analysis of human solid tumors. Cytometry Part B 70, 91–103 (2006).

42.

Pantanelli, S. M. et al. Differentiation of malignant B-lymphoma cells from normal and activated T-cell populations by their intrinsic autofluorescence. Cancer Res. 69, 4911–4917 (2009).

43.

Madsen, L. et al. Mice lacking all conventional MHC class II genes. Proc. Natl Acad. Sci. USA 96, 10338–10343 (1999).

44.

Kim, J. M., Rasmussen, J. P. & Rudensky, A. Y. Regulatory T cells prevent catastrophic autoimmunity throughout the lifespan of mice. Nat. Immunol. 8, 191–197 (2007).

45.

Roca, C. P. et al. AutoSpill is a principled framework that simplifies the analysis of multichromatic flow cytometry data. GitHub repository https://doi.org/10.5281/zenodo.4656919 (2021).

## Acknowledgements

This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement Nos. 779295 and 874707 (EXIMIOUS). This work was also supported by the VIB, the ERC Consolidator Grant TissueTreg (to A.L.), the Biotechnology and Biological Sciences Research Council through BB/S019189/1, Institute Strategic Program Grant funding BBS/E/B/000C0427 and BBS/E/B/000C0428, and the Biotechnology and Biological Sciences Research Council Core Capability Grant to the Babraham Institute. Part of this research was conducted within the project entitled PRISMA, funded by VLAIO (Flanders Innovation & Entrepreneurship) and supported by a grant from Stichting Alzheimer Onderzoek - Fondation Recherche Maladie Alzheimer (SAO-FMA) (to S.H.-B.). The authors thank all the collaborators who extensively tested and gave feedback on the beta version of AutoSpill; Michelle Linterman, Danika Hill (Babraham Institute), and Alice Denton (Imperial College London) for supplying additional example data for the AutoSpill website; and Ruben Van Gestel (KUL) for running test datasets for the AutoSpill website.

## Author information

### Contributions

The study was conceived by C.P.R. and A.L. C.P.R. developed and tested the AutoSpill algorithm. O.T.B., T.P., C.E.W., and S.H.-B. provided the datasets, input on flow cytometry practicalities, and evaluated compensation results. R.H. and J.S. developed and tested the AutoSpread algorithm, and implemented AutoSpill into FlowJo. L.K., J.C., and A.B. developed and tested the AutoSpill website. V.G. worked on revisions. The manuscript was written by C.P.R. and A.L., and revised and approved by all authors.

### Corresponding authors

Correspondence to Carlos P. Roca or Adrian Liston.

## Ethics declarations

### Competing interests

The VIB and the Babraham Institute received funding from BD Bioscience in return for pre-publication access to and consultancy on the AutoSpill algorithm, in order to be incorporated into FlowJo v.10.7. R.H. and J.S. are affiliated with FlowJo, a wholly owned subsidiary of Becton, Dickinson and Company. The other authors declare no competing interests.

Peer review information Nature Communications thanks Pratip Chattopadhyay, Wayne Moore, David Parks and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Roca, C.P., Burton, O.T., Gergelits, V. et al. AutoSpill is a principled framework that simplifies the analysis of multichromatic flow cytometry data. Nat Commun 12, 2890 (2021). https://doi.org/10.1038/s41467-021-23126-8
