As newer and more refined technologies facilitate continued investigation of the immune system, the extent of its heterogeneity becomes more evident. Nowhere has this been more apparent than as a result of recent developments in flow cytometry and related technologies for cell analysis. It is now known that the immune system, even when only the cell types residing in blood are considered, is composed of hundreds of phenotypically and functionally distinct subsets. It is no longer tenable to study immune-related effects using 'bulk' lymphocyte populations such as 'memory CD8+ T cells'. Instead, investigation of 'fine' lymphocyte subsets is necessary to understand the mechanisms of immunological disease and protection.

To distinguish among these specific cell types, it is necessary to measure simultaneously as many as six different T-cell-associated 'markers' (typically cell surface proteins); further functional characterization requires the additional measurement of responses (such as cytokine production, proliferative capacity or apoptotic potential). This has necessitated the development of flow cytometric technologies capable of detecting considerably more parameters than the typical 3- or 4-color cytometers prevalent to date. Current technology is capable of measuring 2 scatter and 12 fluorescence parameters simultaneously and individually for each cell. Because this technology comes with unique problems (and solutions), we term it polychromatic flow cytometry (PFC), which we use to refer in general to flow cytometric analyses encompassing 6 or more colors.

Instrument manufacturers have recognized the growing demand for PFC, and most now provide the hardware necessary to carry out these types of measurements. During the past few years, several research groups have employed custom-built instruments with the capacity to quantify as many as 12 fluorescences. In this paper we explore how this multicolor cytometric technology is substantially refining extant methods to allow new experimental designs. Together, these types of studies are providing a completely new view of the immune system that encompasses a far greater heterogeneity of cell type and function than previously imagined.

Historical perspective

To develop a perspective on the future of multicolor technology, it is useful to appreciate some of the history and motivations underlying the development of the current state of the art. A more complete discussion of this topic can be found online (see Supplementary Note). Since its introduction to immunological analyses by the Herzenberg laboratory at Stanford University in the late 1960s (refs. 1–3), fluorescence-based flow cytometric analysis has become pervasive in biological analyses. The two crucial and unique achievements afforded by flow cytometry are the multiparametric analysis of cells at a very high rate and the viable separation of essentially pure cell populations.

The earliest experiments used a single fluorescence detector together with scattered light signals to measure biophysical characteristics of cells (size and shape). Such 'one-color' experiments captured the imagination of immunologists, and one of the earliest successful sorting experiments was the isolation of antibody-secreting B cells1. Nonetheless, it was quickly appreciated that one fluorescence measurement was not enough: although B and T cells could be distinguished on the basis of lineage-specific cell surface proteins, additional information could only be obtained by sorting these bulk populations and making further measurements in vitro. By then, any correlation of function with the phenotype of the original cell was lost. Gradually, researchers developed additional fluorochromes that could be coupled to antibodies, and by 1984, the most sophisticated cytometry laboratories were routinely conducting 4-color experiments. Nonetheless, another decade passed before 4-color instruments were available to the majority of research and clinical laboratories.

As discussed in the Supplementary Note online, the AIDS epidemic had a marked impact on the diffusion of flow cytometric instrumentation into the research and clinical community. AIDS-related research also spurred the extensive development effort carried out by our group at Stanford, in which we gradually increased the number of measurements, culminating in our current instrument, capable of measuring 12 fluorescence parameters (14 total) simultaneously4. The hurdles that had to be overcome—in regard to hardware, software, chemistry, and data analysis and presentation—have been discussed elsewhere5.

Applications of multicolor technology

The ability to measure such a large number of parameters simultaneously has many potential advantages and applications. Although there are many examples of the use of 3-, 4- and even 5-color flow cytometry in the literature, reports using more than 6 colors are much less frequent because of the limited availability of such technology. Here, we review some of these reports and some of our own work with the goal of illustrating applications particularly well suited to multiparameter cytometry.

Specifically, we will review three types of experiments that became possible with this technology: precise identification of cellular subsets, single-cell functional characterization and characterization of rare subsets. Additionally, we identify some of the unique advantages associated with PFC, illustrating the ways in which this technology can enhance a wide range of experiments as summarized in Advantages of 12-color PFC.

Precise identification of cellular subsets using multiple markers.

Perhaps the most powerful application of PFC is to use multiple markers for precise identification of cellular subsets. An early 7-color study of lymphocytes from patients with Hodgkin's disease identified what are probably distinct lineages of T cells, possibly arising through extrathymic differentiation6. In the setting of lymphopenia, these cells often become much more prevalent and easier to identify; now that their phenotype is better known, it is feasible to enumerate them even in healthy adults.

The original impetus for the development of PFC was the discovery that naive CD8+ T cells are lost during HIV disease7. This research made use of a specific combination of two reagents that best identify this T-cell subset8. Later, we demonstrated that the use of only two markers defining naive T cells is inadequate for accurate identification of naive T cells4. In this study, the use of three to five naive-defining markers progressively improved the accuracy of identification of naive T cells. More recently, a report demonstrated that naive T cells can be broadly subdivided using CD31 (ref. 9). This serves to reinforce the point that populations of cells previously thought to be homogeneous are better defined when the appropriate set of markers is used. Unfortunately, it is impossible to know in advance which markers to measure; therefore, the greater the number of markers that can be used in exploratory studies, the better.

As is the case for human cells, identification of naive rhesus macaque T cells also requires the simultaneous use of multiple markers. One report suggested that CD4+ and CD8+ naive T cells could be optimally distinguished from memory cells using a combination of three surface markers (CD95 and CD28, with β7 integrin for CD4+ T cells, and with CD11a for CD8+ T cells), in addition to the lineage-defining markers10.

The high degree of heterogeneity demonstrated in these few examples is not unique to the T-cell compartment. Indeed, B cells can also be divided into many complex subsets that are related by differentiation and are functionally distinct. Currently, most explorations of B-cell heterogeneity have involved the progenitor cells leading to mature B cells. For example, a study employing 11-color PFC to analyze the expression of CD4, MHC class II and CD45 (B220) among the pro–B cell population in adult bone marrow distinguished the subsets leading to the B-1 or B-2 lineages11. Two recent publications examined the role of antigen receptor affinity in the development of B cells in both T-independent and T-dependent fashion12,13. In these studies, progenitor cells uniformly expressing low-affinity receptors were mixed with B cells uniformly expressing high-affinity receptors and adoptively transferred into a wild-type mouse. The source of the B cells (low-affinity, high-affinity and wild type–derived) was distinguished by fluorescent markers, requiring 2 colors. An additional 4 or 5 colors were then used to identify B-cell stages. These studies were the first to define the role of antigen receptor affinity in the recruitment of B cells to germinal centers, as well as the selection of B cells of varying affinity for maturation. This research would not have been possible without sufficient measurements to distinguish among all of the variables needed to identify the source of the B cells and their differentiation stage. Perhaps the most interesting finding was the demonstration that affinity maturation (through somatic hypermutation) was not driven by the (lack of) affinity of a B-cell receptor; rather, mutation occurred at similar rates in both low- and high-affinity B cells. Selection of B cells for expansion and differentiation, however, was highly dependent on affinity. These studies shed new light on fundamental aspects of B-cell biology that had been researched for decades.

Single-cell functional characterization.

One of the most powerful applications of PFC is the determination of cellular functions at the single-cell level, that is, combining detailed phenotypic information with measurements of cell function. There are now several cytometry-based assays for direct or indirect determination of cellular function. For example, the method of cell permeabilization and intracellular staining has been used to assay for cytokines, cytotoxic molecules, cell-proliferation antigens (Ki-67, BrdU) and activated kinases and caspases. Cell proliferation can be quantified by measuring the dilution of a dye (carboxyfluorescein diacetate succinimidyl ester, CFSE) that accompanies cell division. Finally, there are multiple cytometric assays quantifying apoptosis and cytotoxicity at the single-cell level.
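The dye-dilution readout mentioned above can be illustrated with a short calculation. The sketch below assumes an idealized model in which each division exactly halves a cell's dye fluorescence; real data are blurred by autofluorescence and staining heterogeneity, so this is only an approximation. The function names and intensity values are hypothetical and chosen for illustration.

```python
import math


def estimated_divisions(undivided_intensity, cell_intensity, background=0.0):
    """Estimate how many times a cell has divided from its dye fluorescence.

    Idealized assumption: each division halves the background-subtracted signal,
    so divisions = log2(undivided signal / cell signal), rounded to an integer.
    """
    parent_signal = undivided_intensity - background
    cell_signal = cell_intensity - background
    if parent_signal <= 0 or cell_signal <= 0:
        raise ValueError("intensities must exceed the background")
    # Cells brighter than the undivided peak are treated as undivided.
    return max(0, round(math.log2(parent_signal / cell_signal)))


def fraction_divided(intensities, undivided_intensity, background=0.0):
    """Fraction of cells estimated to have divided at least once."""
    divided = sum(
        1
        for value in intensities
        if estimated_divisions(undivided_intensity, value, background) >= 1
    )
    return divided / len(intensities)


# Hypothetical per-cell intensities (arbitrary units); the undivided peak is 1000.
example_intensities = [1000, 900, 500, 250, 125]
example_fraction = fraction_divided(example_intensities, undivided_intensity=1000)
```

In practice the undivided peak would be taken from an unstimulated control, and the division estimate would be combined with the phenotypic and cytokine measurements made on the same cell.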

Intracellular staining for cytokines after stimulation of T cells has revealed complex response profiles. In an early study, we identified subsets of memory CD4+ T cells polarized toward the production of interferon-γ (IFN-γ) (Th1-like) or interleukin-4 (IL-4) (Th2-like)14. These subsets could be defined based on the expression of cell-surface markers; notably, the representation of these subsets (but not their function) was altered in blood from patients with tuberculoid leprosy (a Th1 disease) or lepromatous leprosy or atopic disease (Th2 diseases).

In another study, researchers used PFC to provide a detailed picture of the intracellular signaling pathways active in lymphocyte subsets15. These experiments use fluorescently conjugated antibodies specific for the active state of several kinases involved in T-cell activation pathways. There was clear resolution between cells in uninduced and induced conditions: thus, one can now sort cells that have responded to stimuli from those that have not done so, to identify other genetic or molecular correlates of responsiveness. Notably, the assay is quantitative for kinase activity, suggesting that it may be possible to accomplish T-cell separations based on even finer criteria (that is, relative extent of activation of multiple kinases). Using this assay, the authors showed that there are heterogeneous kinase activation states among primary lymphocyte subsets. Especially notable was their demonstration that the state of kinase activation is exquisitely sensitive to manipulation of cells. Their experiments highlight the importance of the ability to carry out multifactorial functional measurements on cells with minimal manipulation.

Characterization of rare subsets (antigen-specific responses).

PFC has been used in assays designed to identify and functionally characterize antigen-specific T cells. The two different forms of such assays that are commonly employed are stimulation-based assays that identify antigen-specific T cells on the basis of the upregulation of activation markers or expression of cytokines in response to the stimulation16,17, and fluorescent peptide–loaded MHC multimers (commonly referred to as 'tetramers') that selectively bind to antigen-specific T cells18.

The contemporaneous development of these assays with PFC is fortunate. Given that antigen-specific T cells are often very rare (typically <0.1% of lymphocytes), it is difficult to carry out multiple assays on these cells given the limitation of sample volume. Hence, it is useful to do many distinct measurements simultaneously. The unique capability of PFC to make as many as 14 measurements on each cell confers the ability to determine simultaneously both phenotypic and functional characteristics of such rare cells.

For example, PFC was used to identify and functionally characterize murine γδ T cells specific for a nonclassical MHC molecule19. Because γδ T cells constitute 1% of the splenic cell population, and of these only 0.5% bind the T22 tetramer, the characterization was carried out on a subset comprising only 5 per 10⁵ cells. Using PFC, these T cells were shown not only to bind to the MHC tetramer, but also to alter their surface expression of activation markers such as CD69 and CD62L after MHC tetramer binding.
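The arithmetic behind this frequency, and its consequence for how many events must be collected, can be sketched as follows. The 1% and 0.5% figures come from the text; the target of 100 analyzable cells is a hypothetical value chosen only for illustration, as are the function names.

```python
import math
from fractions import Fraction


def nested_frequency(*fractions_of_parent):
    """Multiply nested gate fractions to get the frequency among all cells."""
    frequency = Fraction(1)
    for value in fractions_of_parent:
        frequency *= Fraction(value)
    return frequency


def events_needed(target_cells, frequency):
    """Smallest number of total events expected to contain target_cells."""
    return math.ceil(target_cells / frequency)


# From the text: gamma-delta T cells are 1% of spleen cells; 0.5% of those bind the tetramer.
tetramer_frequency = nested_frequency("0.01", "0.005")
cells_per_100k = tetramer_frequency * 10**5

# Hypothetical target: 100 tetramer-binding cells for downstream phenotyping.
total_events = events_needed(100, tetramer_frequency)
```

Under these assumptions, about two million splenocytes would have to be acquired to obtain 100 tetramer-binding cells, which is why splitting such a sample across several tubes is impractical.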

One of the first detailed characterizations of an antigen-specific T-cell population studied potentially tumor-reactive cytotoxic T cells in patients with metastatic melanoma20. Because of the rarity of these cells (1 in 10⁵ lymphocytes), standard techniques are unable to derive much information about these T cells. We quantified the expression of nearly 40 different T-cell surface antigens (that is, we accurately identified the T-cell subset to which the antigen-specific T cells belong) as well as determining their cytokine profile, showing that these cells were functionally anergic (whereas viral antigen-specific T cells from the same patient samples were functionally normal).

Other advantages of PFC.

The availability of additional parameters often allows experimental procedures to be modified for further optimization of measurements. For example, one channel can be devoted exclusively to discriminating between live and dead (or dying) cells, thus potentially substantially reducing background. The ability to measure as many as 12 different surface antigens can reduce a typical leukemia-lymphoma monitoring panel from more than a dozen tubes to just two—allowing important diagnostic information to be collected from even minute sample volumes. These and other advantages are discussed in the Supplementary Note online and summarized in Advantages of 12-color PFC.

Examples of the power of PFC

Previously, data generated using 4-color flow cytometry have described at least three different populations of CD8+ T cells, termed naive, central memory and effector memory21,22,23,24,25. Naive T cells are classified by expression of CD45RA, CD27, CD28, CD62L and CCR7, whereas 'central memory' T cells are hypothesized to express CD45RO but not CD45RA, and to maintain CCR7 expression (and presumably CD27 and CD62L). The 'effector memory' T cells were defined on the basis of the loss of CCR7 expression (and probable loss of CD27 and CD62L with re-expression of CD45RA; ref. 21). Four-color flow cytometry does not permit careful phenotyping of any of the three described CD8+ T-cell populations or definitive demonstration that the effector memory population is exclusively CD45RA+CD27−CD62L−CCR7−.

Use of PFC demonstrates the oversimplification of the interpretations of these phenotypic analyses (Fig. 1). For example, in addition to the three well-documented CD8+ T-cell populations, there exists an additional population, CD45RO+CD27−. The use of additional markers reveals even more complexity. For example, within the subset of CD8+ T cells defined by a common combination taken to indicate naive T cells, CD27+CD45RO−, there are quite a few CD8+ T cells that do not express CCR7 or CD62L. Hence, although most of the CD45RO−CD27+ population is naive, assessment of additional markers indicates that this population contains a substantial number of contaminating antigen-experienced CD8+ T cells4. PFC will be invaluable in delineating the function and maturational pathways of these subsets, especially in light of recent work suggesting there are functionally equivalent T cells within both 'central' and 'effector' memory T-cell compartments26.
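The effect of adding markers to a gate can be made concrete with a small Boolean-gating sketch. The events below are made-up toy data, not measurements from the study, and the helper names are hypothetical. Each cell is reduced to positive or negative calls for four of the markers discussed above.

```python
from collections import Counter


def make_cell(cd27, cd45ro, ccr7, cd62l):
    """Build one toy event as positive/negative calls for four markers."""
    return {"CD27": cd27, "CD45RO": cd45ro, "CCR7": ccr7, "CD62L": cd62l}


def in_gate(cell, gate):
    """True if the cell matches every marker state required by the gate."""
    return all(cell[marker] == state for marker, state in gate.items())


def subset_counts(cells, markers):
    """Count cells in each Boolean phenotype defined by the given markers."""
    return Counter(tuple(cell[marker] for marker in markers) for cell in cells)


# Toy CD8+ T-cell events (invented numbers, for illustration only).
cells = (
    [make_cell(True, False, True, True)] * 6      # naive-like on all four markers
    + [make_cell(True, False, False, True)] * 1   # CD27+CD45RO- but CCR7-
    + [make_cell(True, False, True, False)] * 1   # CD27+CD45RO- but CD62L-
    + [make_cell(True, True, True, True)] * 4     # CD27+CD45RO+
    + [make_cell(False, True, False, False)] * 3  # CD27-CD45RO+
    + [make_cell(False, False, False, False)] * 2 # CD27-CD45RO-
)

two_marker_gate = {"CD27": True, "CD45RO": False}
four_marker_gate = {**two_marker_gate, "CCR7": True, "CD62L": True}

two_marker_naive = [cell for cell in cells if in_gate(cell, two_marker_gate)]
four_marker_naive = [cell for cell in cells if in_gate(cell, four_marker_gate)]
quadrants = subset_counts(cells, ["CD27", "CD45RO"])
```

In this toy set the two-marker gate captures 8 cells, of which only 6 remain when CCR7 and CD62L are also required; the other 2 are the kind of contaminating events that additional markers expose.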

Figure 1: Complex subsets of CD8+ T cells.

Peripheral blood mononuclear cells (PBMC) were stained with 9 reagents: CD3 and CD8 to define the CD8 T-cell lineage, and CD45RA, CD45RO, CCR7, CD27, CD28, CD62L and CD57 to further delineate fine subsets. As shown in the top panels, lymphocytes were scatter-gated and then CD8+ cells were gated as CD3+CD8+. CD8+ T cells were divided into 4 subsets based on the expression of CD27 and CD45RO. The remaining panels show the expression of the other markers on these 4 subsets compared to the total CD8+ population (left panel): the 2 CD27+ subsets (middle 2 panels) and the 2 CD27− subsets (right 2 panels). The phenotype for naive cells is labeled on the figure. Note that each of the CD27− populations can be further subdivided on the basis of expression of CD57, a marker with as yet undefined functional correlates. This figure also demonstrates that, although the expression of CD28 and CD57 is usually mutually exclusive, there are clearly some cells that lack expression of both (among CD27−CD45RO− cells) or that coexpress both (among CD27+CD45RO+ cells). Data were collected on a Becton Dickinson DiVa flow cytometer (San Jose, California) modified for 14 parameters (12 colors and 2 scatter measurements). Data were analyzed using FlowJo (Treestar, San Carlos, California).

Figure 2 provides another example in which PFC reveals multiple levels of complexity. In this experiment, several functional assays are combined to illustrate the phenotypic and functional characterization of cells responding to antigenic stimulation. These types of analyses can be used to identify different functional pathways that T cells follow after stimulation. For example, although most undivided CD8+ T cells produce IFN-γ, far fewer of the divided cells produce this cytokine—suggesting that T cells that do not divide upon antigen stimulation (and are perhaps terminally differentiated) are highly enriched for cytokine production. In addition, the data illustrate the complexity of cytokine repertoires that exists among the different T-cell populations. These kinds of functional correlates may be useful in determining, for example, the potential responses of antigen-specific T cells after vaccine administration.

Figure 2: Proliferation, cytokine production and phenotyping measured simultaneously in a single tube.

Peripheral blood mononuclear cells were stained with carboxyfluorescein diacetate succinimidyl ester (CFSE) and stimulated with Staphylococcus enterotoxin B (SEB) (5 μg/ml) for 5 d. On the day of analysis, cells were restimulated with phorbol myristate acetate (PMA; 25 ng/ml) and ionomycin (1 μg/ml) in the presence of brefeldin A (10 μg/ml) for 4 h. Cells were then surface-stained with a combination of reagents including CD8, CD45RO, CCR7 and HLA-DR, as well as ethidium monoazide bromide (EMA) and CD14 in a 'dump' channel. After fixation and permeabilization, cells were intracellularly stained for CD3, CD4, IFN-γ and IL-2. (Left) The CD4+ and CD8+ T cells shown have first been gated as 'dump'-negative, then by scatter and CD3 for T lymphocytes. Proliferating cells that had been stimulated in culture for 5 d can be separated into undivided and dividing populations on the basis of the extent of CFSE fluorescence dilution. In addition, cytokine production for the dividing and nondividing cells is detected by intracellular staining after short-term restimulation with PMA and ionomycin. By combining these measurements with fine phenotypic subsetting, a more precise identification of populations capable of cytokine production ('effector' cells) and those capable of proliferation is possible. IFN-γ production is inversely correlated with proliferation, in that far fewer of the divided cells make the cytokine. Most T cells that produce this cytokine have not divided and may in fact be terminally differentiated. (Right) CD4+ or CD8+ T cells were further subdivided on the basis of whether they had divided and/or express CD45RO (see quadrant gates in the middle panels on the left). Shown for each of the 4 subsets within the lineage is the expression of IL-2 and IFN-γ. Note for example that the IL-2+IFN-γ− cells are most prevalent in the CD45RO−, divided population, which arises predominantly from naive T cells.

How many colors are required?

Although 6-color analysis is possible with commercial reagents and instruments, 8- to 12-color technology is still on the cutting edge, available only to laboratories willing to manufacture and validate reagents on site. Such technology will become commercially viable once researchers require it with greater frequency. Thus, it is valid to ask: “How many colors do we really need?” This question is more fully discussed in the Supplementary Note online.

Figure 1 illustrates the existing complexity just within CD8+ T cells; similar levels of complexity exist for other lymphocyte populations. Why should we make the picture so complicated? In reality, this kind of analysis can, paradoxically, simplify our understanding of pathogenesis or immune protection. By uniquely identifying functionally distinct subsets, we can now specifically enumerate and characterize subsets that are crucial in immune protection or pathogenesis. Thus, we can eliminate the variability arising from intersubject differences in subset representation for contaminating subsets that are not relevant to what is being studied. For example, only when 3-color technology was applied was the specific loss of naive CD8+ T cells in late-stage AIDS identified7. More recently, 8-color PFC made it possible to demonstrate that functionally polarized memory CD4+ T-cell subsets vary in leprosy or atopy14.

The greatest value in PFC comes from the research that identifies the specific subsets or functions, which might then be measured by 4- to 6-color cytometers. For example, although 8-color PFC was used to identify the functionally polarized subsets important to leprosy, it is possible to construct specific 4-color combinations to enumerate these subsets. Thus, the high-end multicolor PFC experiments can be used first to scan large numbers of combinations of reagents (and therefore subsets) for those that are most central to a given project, and then to help identify a mechanism for reducing the complexity of the analysis for use in clinically available instruments.

Future directions

High-end multicolor flow cytometry is still in its infancy; many hurdles must still be overcome before this technology will become routinely available in research laboratories (see Supplementary Note online for a more detailed discussion). The most difficult obstacle at this time is reagent availability; most of the fluorochrome conjugates we use (Fig. 3) are made in our laboratory. Many dyes are now available that can be easily conjugated to antibodies for use in PFC to supplement the commercially available conjugates. With regard to hardware development, by far the most important requirement is automation. Instrument setup and calibration are far more complex with the multiple lasers and detectors for PFC; computer-aided validation of instrument performance is necessary. In addition, automated compensation (fluorescence spillover) setting is also necessary; however, this is adequately handled by most contemporary software packages.
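Compensation for fluorescence spillover is, at its core, a small linear-algebra step, which the following two-color sketch illustrates. It assumes the usual linear model in which each detector reads the sum of every dye's signal weighted by a spillover coefficient; the coefficients and intensities are hypothetical, and a 12-color instrument would use a 12 × 12 matrix and a general-purpose solver rather than the hand-written 2 × 2 inverse used here.

```python
def invert_2x2(matrix):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("spillover matrix is singular")
    return [[d / determinant, -b / determinant],
            [-c / determinant, a / determinant]]


def apply_spillover(true_signal, spillover):
    """Simulate detector readings: observed[j] = sum_i true[i] * spillover[i][j]."""
    return [sum(true_signal[i] * spillover[i][j] for i in range(2)) for j in range(2)]


def compensate(observed, spillover):
    """Recover per-dye signal by multiplying by the inverse spillover matrix."""
    inverse = invert_2x2(spillover)
    return [sum(observed[i] * inverse[i][j] for i in range(2)) for j in range(2)]


# Hypothetical spillover: row = dye, column = detector, diagonal = 1.
# Dye A leaks 15% into detector B; dye B leaks 2% into detector A.
spillover = [[1.0, 0.15],
             [0.02, 1.0]]

true_signal = [200.0, 500.0]                        # made-up per-dye intensities
observed = apply_spillover(true_signal, spillover)  # what the detectors would read
recovered = compensate(observed, spillover)         # approximately the true signal
```

The spillover coefficients themselves are normally estimated from single-stained controls, one per dye, which is part of why setup effort grows quickly with the number of colors.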

Figure 3: Example of 12-color PFC dyes.

Shown are the fluorescent dyes currently used in our laboratory for 12-color PFC, each excited by 1 of 3 lasers. The wavelength range of light collected from each dye is determined by bandpass filters designed to optimize dye collection from the specific dye while minimizing that from other dyes. Light from dyes excited by different lasers can be collected at similar wavelengths because the cells cross the lasers at different times. CasBlu, Cascade Blue; Alx, Alexa; PE, phycoerythrin; TR, Texas Red; APC, allophycocyanin; Cy, cyanine.

Data analysis is by far the most time-consuming aspect of PFC experiments. The complexity of the data is such that, with current software tools, analysis of individual samples requires inordinate amounts of time. There is a considerable demand for tools that can organize the analyses into databases, as well as assist in the exploration of the complex data sets.

For example, we recently developed automated multivariate techniques to help researchers analyze such data. These algorithms are designed to compare multidimensional distributions to identify and quantify the degree of difference between data sets27,28. In addition, these algorithms can rapidly identify regions of multivariate distributions that differ, thereby providing a mechanism for identifying those cells that are different between two samples. Figure 4 illustrates the use of this technique to compare B-cell populations from mice of disparate genetic backgrounds. Tools such as these help researchers to explore the complex data sets and identify interesting aspects that no combination of two-dimensional graphs could have revealed. The probability binning (PB) algorithm can be used to quantify the degree of similarity or disparity of highly complex distributions for the purposes of ranking these distributions. This may be useful for quantifying the number of cells that respond to a particular stimulus (perhaps having responded in complex multivariate patterns) or identifying variations in immunophenotyping patterns that correlate with pathogenic states. We are also developing automated population identification algorithms (cluster analysis) specifically tailored for flow cytometric data.
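The general idea of comparing two multivariate distributions by binning can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published PB or FDG implementation: bins are built by recursively splitting the control sample at the median of its highest-variance parameter, and the two samples are then compared with a chi-squared-like sum over bin proportions. The normalization that yields the T(χ) metric and the statistical thresholds used for difference gating are not reproduced. All function names and the toy data are hypothetical.

```python
from collections import Counter
import statistics


def build_bins(events, depth):
    """Recursively split events into up to 2**depth bins of similar size.

    Each split is made at the median of the parameter with the largest variance.
    Returns a nested tuple (dimension, threshold, low_subtree, high_subtree),
    or None for a leaf bin.
    """
    if depth == 0 or len(events) < 2:
        return None
    n_dims = len(events[0])
    dim = max(range(n_dims),
              key=lambda d: statistics.pvariance([e[d] for e in events]))
    threshold = statistics.median([e[dim] for e in events])
    low = [e for e in events if e[dim] <= threshold]
    high = [e for e in events if e[dim] > threshold]
    return (dim, threshold, build_bins(low, depth - 1), build_bins(high, depth - 1))


def bin_path(tree, event):
    """Return the leaf bin of an event as a string of '0'/'1' split decisions."""
    path = ""
    while tree is not None:
        dim, threshold, low, high = tree
        if event[dim] <= threshold:
            tree, path = low, path + "0"
        else:
            tree, path = high, path + "1"
    return path


def bin_proportions(tree, events):
    """Fraction of events falling into each leaf bin."""
    counts = Counter(bin_path(tree, event) for event in events)
    return {path: count / len(events) for path, count in counts.items()}


def binned_difference(control, sample, depth=3):
    """Sum over bins of (c - s)**2 / (c + s), where c and s are bin proportions.

    Returns the total and the per-bin terms; large per-bin terms indicate
    regions where the two samples differ most.
    """
    tree = build_bins(control, depth)
    c = bin_proportions(tree, control)
    s = bin_proportions(tree, sample)
    per_bin = {}
    for path in set(c) | set(s):
        ci, si = c.get(path, 0.0), s.get(path, 0.0)
        per_bin[path] = (ci - si) ** 2 / (ci + si)
    return sum(per_bin.values()), per_bin


# Toy two-parameter data: a 10 x 10 grid and a copy shifted along the first axis.
control = [(float(i % 10), float(i // 10)) for i in range(100)]
shifted = [(x + 5.0, y) for x, y in control]

same_score, _ = binned_difference(control, control)
shift_score, shift_terms = binned_difference(control, shifted)
```

Because the bins are defined on the control sample alone, identical distributions score zero and the score grows as the test sample redistributes events among bins. The per-bin terms give a rough indication of where the difference lies, loosely in the spirit of difference gating.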

Figure 4: Probability binning (PB) and frequency difference gating (FDG).

Mouse spleen cells were stained with 4 reagents to distinguish B-cell subsets, including antibodies to CD21, CD23, IgM and AA4.1. Cells from two littermates of 6- or 8-wk-old Balb/c, C57bl or F1 cross-bred mice were stained and compared to a single 8-wk-old Balb/c mouse. The questions we asked were: “Are these distributions different in a statistically significant way?” and, if so, “What cells are different between these?” The PB algorithm calculates a metric, T(χ), that scales with the extent of the difference between distributions; the FDG algorithm identifies regions in multivariate distributions that have statistically significant differences in distribution. a, Gating on scatter and propidium iodide (PI) to identify live lymphocytes, and on CD21 and IgM to identify B cells. b, Comparison of the distributions of CD23 and IgM on the B cells from two Balb/c mice, either 6 or 8 wk old. c, The 4-color immunophenotyping data such as those illustrated in a and b were compared using PB. Note that in every case, immunophenotype patterns from littermates were far more similar to each other than to those of other mice. Further, the F1 mice were closer to either parental strain than the 2 strains were to each other. This indicates that the patterns of surface expression of these proteins are under genetic control. Finally, the 6-wk-old mice were statistically significantly different from the 8-wk-old mice. d, Distributions from one of the 6-wk-old mice and one of the 8-wk-old mice were compared using FDG. This algorithm identifies the regions in multivariate space where 2 distributions have significant differences in the frequency of cells. Top left, overlay of the total B-cell populations from the 2 mice. Top right, overlay of only the difference-gated events (that is, those events that occupy regions that are different between the 2 mice). Bottom, overlays of the total B-cell population from each mouse with the difference-gated events from the other mouse.
The algorithm identified that the 8-wk-old mouse has more cells with an IgM-dull, CD23-bright phenotype, which is associated with increased maturity. This subtle difference between the 6- and 8-wk-old mice would have been nearly impossible to detect using standard analysis techniques.

The menu of tools available for analyzing these complex data sets is by far the most limiting aspect of PFC technology. We need a series of new tools that can guide the data exploration process, and a series of tools that can be used to automate the data analysis in a validated fashion. At this point, the hardware and chemistry are far more advanced than the software. Instrument manufacturers and developers will need to work closely with expert research laboratories to develop more sophisticated software tools.

Conclusions

Polychromatic flow cytometry is expanding vigorously in the immunological community. Just in the past year, commercial instrumentation capable of analyzing 8 or more colors has become widely available. Reagent manufacturers are rapidly developing additional fluorescent conjugates for use in immunophenotyping and functional analysis. Yet there remain substantial hurdles, both conceptually and experimentally, for laboratories beginning to use this new technology. Nonetheless, the success of this technology in revealing hitherto unappreciated aspects of the immune system, as exemplified in this review, will drive this technology to become commonplace in flow cytometry laboratories worldwide.

Note: Supplementary information is available on the Nature Medicine website.