Imagine watching the flow of the East Australian Current; how would you describe all the fish swimming in front of you or find a rare species? Biologists face a similar situation when they look at blood flow. But unlike the tropical fish that come in various sizes, shapes and colors, most blood cells can only be distinguished by their surface protein markers.

Immunologists have used flow cytometry to study blood cells for decades. A flow cytometer passes cells one by one through a fluorescence detector and records the surface-marker profile of each cell. Recent advances in fluorescent dyes and detection techniques allow close to 20 different markers to be detected simultaneously. However, new computational methods are needed to properly analyze these rich data. At the Broad Institute of the Massachusetts Institute of Technology and Harvard, Jill Mesirov, a computational biologist, recently collaborated with two experimental biology groups led by Philip De Jager and David Hafler to develop a software tool for such flow-cytometric data analysis, called flow analysis with automated multivariate estimation (FLAME).

“The traditional way is to look at the data two or three markers at a time and manually identify cell subpopulations, but when you have 12 different markers, you can't look at 12-dimensional space,” says Mesirov. This manual approach is inefficient and subjective. Worse, explains De Jager, “if you don't know what you are looking for, you get more and more limited in the successive two- or three-dimensional projections and fail to recognize the architecture of the whole cell population.”

Two important characteristics of flow-cytometric data, asymmetric distribution and outliers, complicate the analysis. “The histograms that come out of a flow cytometer always have tails, which are not captured very well if one assumes symmetric distribution,” says De Jager. “And the outlier can be also quite important,” adds Mesirov. Others have used symmetric modeling or data transformation to analyze high-dimensional flow-cytometric data. However, symmetric modeling does not fit the flow-cytometric data, and “the problem of data transformation is that very different asymmetric distributions can, after transformation, yield the same Gaussian distribution,” explains Mesirov.
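
The point about tails can be illustrated with a small sketch. It does not use real flow-cytometric data: the intensities below are simulated from a log-normal distribution, which is only an assumption chosen to produce a right-skewed histogram, and the helper name `sample_skewness` is hypothetical.

```python
import random
import statistics


def sample_skewness(values):
    """Third standardized moment: 0 for symmetric data, > 0 for a right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Simulated marker intensities (assumption: log-normal, not measured data).
rng = random.Random(0)
intensities = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]

mean = statistics.fmean(intensities)
sd = statistics.pstdev(intensities)
skew = sample_skewness(intensities)

# A symmetric model fitted to this mean and sd would predict equal numbers
# of cells beyond two standard deviations on either side of the mean.
upper_tail = sum(v > mean + 2 * sd for v in intensities)
lower_tail = sum(v < mean - 2 * sd for v in intensities)

print(f"skewness: {skew:.2f}")
print(f"cells above mean + 2 sd: {upper_tail}")
print(f"cells below mean - 2 sd: {lower_tail}")
```

A symmetric fit has zero skewness by construction, so any imbalance between the two tail counts is structure that such a model cannot represent.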

To identify subpopulations precisely in high-dimensional data despite such asymmetry and outliers, the authors used a non-Gaussian statistical model based on the multivariate skew t distribution. “Mathematically it was obvious that a new modeling approach was needed” for flow-cytometric data, says Mesirov. A postdoc in her laboratory, Saumyadipta Pyne, designed the high-dimensional mixture model and then worked with another postdoc, Kui Wang, in Geoffrey McLachlan's lab at the University of Queensland in Australia to write efficient software code for the statistical analysis. They incorporated the code into a user-friendly interface, the FLAME software.

The Mesirov and De Jager groups tested FLAME with data sets derived from peripheral blood mononuclear cells. Using two-step clustering with multiple surface markers, they isolated an important regulatory T-cell population that makes up only 0.81% of the total number of blood cells. However, FLAME does more than identify subpopulations. Aided by another mathematical approach called bipartite graph matching, FLAME can be used to detect changes in the surface-marker profile of specific cell subpopulations between distinct states. The researchers demonstrated this by matching cell populations before and after T-cell stimulation and visualizing shifts in phosphorylation and other markers.
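
The idea behind bipartite matching of cell populations can be sketched in a few lines. This is a minimal illustration, not FLAME's implementation: the cluster centers are made-up two-marker values, the cost is plain Euclidean distance (an assumption), and the hypothetical function `match_clusters` simply tries every one-to-one pairing, which is only practical for a handful of clusters.

```python
from itertools import permutations
from math import dist


def match_clusters(before, after):
    """Pair each 'before' center with one 'after' center, minimizing total distance."""
    if len(before) != len(after):
        raise ValueError("this sketch assumes equal numbers of clusters")
    best_cost, best_perm = float("inf"), None
    # Exhaustive search over all one-to-one assignments.
    for perm in permutations(range(len(after))):
        cost = sum(dist(before[i], after[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical cluster centers as (marker A, marker B) intensities.
before = [(1.0, 1.0), (5.0, 1.0), (1.0, 5.0)]
after = [(1.2, 5.1), (1.1, 2.5), (5.0, 0.9)]

pairs, total_cost = match_clusters(before, after)

# Per-marker shift of each matched population between the two states.
shifts = {
    i: tuple(a - b for a, b in zip(after[j], before[i]))
    for i, j in pairs
}

for i, j in pairs:
    print(f"before cluster {i} -> after cluster {j}, shift {shifts[i]}")
```

Once populations are paired, the shift of each pair shows which markers moved after stimulation, even when the clusters are listed in a different order in the two samples.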

“The FLAME output is much richer than the old way of looking at flow-cytometric data,” says De Jager. As a neurologist, he plans to use FLAME to discover biomarkers for neurodegenerative diseases, such as multiple sclerosis. Meanwhile, Mesirov's group is tackling more complex data, such as nonconvex distributions, to describe biological systems mathematically. Also, once FLAME is added to the GenePattern genomic software package developed by the Mesirov group, researchers will be able to compare flow-cytometric data to other high-throughput data, according to Mesirov.