Accurate prediction of clinical outcome could revolutionize cancer treatment by allowing therapies to be tailored to distinct tumour types, maximizing efficacy and minimizing toxicity. Correlating gene-expression levels with whether cell populations respond to a given treatment holds much promise for this goal, but predictive success has typically depended on large differences in expression levels between responders and non-responders. Writing in the inaugural issue of the Journal of Proteome Research, Michael Korenberg describes a computational method that can predict long-term treatment response from gene-expression profiles taken from patients with acute leukaemia at the time of diagnosis. This success contrasts with several previous attempts on the same data, which were hampered by the lack of genes whose expression-level differences correlated strongly with clinical outcome.

The data came from a landmark study by Golub et al., which showed that gene-expression profiling could be used to distinguish acute lymphoblastic leukaemia (ALL) from acute myeloid leukaemia (AML). That study also examined whether response to anthracycline-cytarabine treatment could be predicted in a group of 15 AML patients, 8 of whom failed to achieve remission, but statistical analysis found no evidence of a strong gene-expression signature predictive of clinical outcome.

Korenberg used parallel cascade identification (PCI), a method for modelling nonlinear systems that needs only input/output data gathered from the system in an experiment (here, expression levels at the time of diagnosis as input and clinical outcome as output) to train a model that can then classify further data. A useful feature of PCI is that effective classifiers can be built from very few training examples. Using the expression profiles of just one patient whose treatment failed and one whose treatment succeeded as the training input, Korenberg built a PCI model that transformed the expression levels in the remaining 13 profiles into output values significantly correlated with outcome (P < 0.0155). Indeed, the model correctly classified 5 of the 7 remaining treatment failures and 5 of the 6 remaining successes from their expression profiles. A second PCI model could also distinguish AML from ALL in a test analogous to that performed by Golub and colleagues.
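
To make the training scheme more concrete, here is a minimal Python sketch of a parallel cascade classifier under stated assumptions. It is not Korenberg's implementation: the function names (`fit_pci`, `pci_output`, `classify`), the memory length `R`, the polynomial degree, the number of cascades, the fixed impulse constant of 1, the coding of failure as -1 and success as +1, and the toy profiles are all hypothetical choices for illustration. Following the general PCI recipe, the two training profiles are concatenated into a single input sequence whose target output is a step (-1 then +1); each cascade is a linear element with memory `R`, built from cross-correlations of the input with the current residual, followed by a static polynomial fitted to that residual by least squares. For simplicity, input windows that straddle the junction between the two profiles are kept in training.

```python
import random


def _solve(A, b):
    """Solve A w = b for a small symmetric positive-definite A (Gaussian elimination)."""
    m = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    w = [0.0] * m
    for i in reversed(range(m)):  # back-substitution
        w[i] = (M[i][m] - sum(M[i][j] * w[j] for j in range(i + 1, m))) / M[i][i]
    return w


def fit_pci(x, y, R=4, degree=3, n_cascades=8, seed=0):
    """Fit a sum of (linear element -> static polynomial) cascades mapping x to y.

    Returns the cascades and the training mean-squared error after each one.
    """
    rng = random.Random(seed)
    idx = list(range(R - 1, len(x)))  # positions with a full input window
    N = len(idx)
    resid = {n: float(y[n]) for n in idx}
    cascades, mse = [], [sum(r * r for r in resid.values()) / N]
    for k in range(n_cascades):
        if k % 2 == 0:
            # Impulse response = first-order cross-correlation of input and residual.
            h = [sum(resid[n] * x[n - j] for n in idx) / N for j in range(R)]
        else:
            # Slice of the second-order cross-correlation plus a +/-1 impulse at
            # lag A (hypothetical simplification: lag and sign drawn at random).
            A = rng.randrange(R)
            h = [sum(resid[n] * x[n - j] * x[n - A] for n in idx) / N for j in range(R)]
            h[A] += rng.choice((-1.0, 1.0))
        u = {n: sum(h[j] * x[n - j] for j in range(R)) for n in idx}
        scale = max(abs(v) for v in u.values()) or 1.0  # keep polynomial inputs near [-1, 1]
        feats = {n: [(u[n] / scale) ** d for d in range(degree + 1)] for n in idx}
        # Least-squares polynomial fit to the residual (tiny ridge term for stability).
        G = [[sum(feats[n][a] * feats[n][b] for n in idx) + (1e-6 if a == b else 0.0)
              for b in range(degree + 1)] for a in range(degree + 1)]
        rhs = [sum(feats[n][a] * resid[n] for n in idx) for a in range(degree + 1)]
        w = _solve(G, rhs)
        for n in idx:  # subtract this cascade's output from the residual
            resid[n] -= sum(wd * f for wd, f in zip(w, feats[n]))
        cascades.append((h, scale, w))
        mse.append(sum(r * r for r in resid.values()) / N)
    return cascades, mse


def pci_output(cascades, x, R=4):
    """Summed output of all cascades for every full input window of x."""
    out = []
    for n in range(R - 1, len(x)):
        total = 0.0
        for h, scale, w in cascades:
            u = sum(h[j] * x[n - j] for j in range(R)) / scale
            total += sum(wd * u ** d for d, wd in enumerate(w))
        out.append(total)
    return out


def classify(cascades, profile, R=4):
    """Label a profile by the sign of its mean model output."""
    z = pci_output(cascades, profile, R)
    return "success" if sum(z) / len(z) > 0 else "failure"


if __name__ == "__main__":
    # Made-up toy "expression profiles" of 30 genes each (not the Golub data).
    failed = [1.0 + 0.15 * ((7 * i) % 5 - 2) for i in range(30)]
    success = [3.0 + 0.15 * ((3 * i) % 5 - 2) for i in range(30)]
    model, mse = fit_pci(failed + success, [-1.0] * 30 + [1.0] * 30)
    print("training MSE per cascade:", [round(m, 4) for m in mse])
    print(classify(model, failed), classify(model, success))
```

Classifying by the sign of the mean output is equivalent to asking whether the output sequence lies closer, in mean-squared error, to -1 or to +1, because (z + 1)^2 - (z - 1)^2 = 4z for every output value z. Each polynomial fit can only lower the training error, so adding cascades never worsens the fit to the two training profiles; it can, however, overfit them, which is why performance on held-out profiles is what matters.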

Such successful class prediction, achieved without large differences in gene-expression levels between classes, could lead to widespread use of PCI in cancer diagnosis and therapy. Furthermore, the method is equally applicable to images obtained from two-dimensional gel electrophoresis of proteins, and to many other biological profiles.