Nature Medicine
8, 68 - 74 (2002)
doi:10.1038/nm0102-68
Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning
Margaret A. Shipp1, Ken N. Ross2, 8, Pablo Tamayo2, 8, Andrew P. Weng3, Jeffery L. Kutok3, Ricardo C.T. Aguiar1, Michelle Gaasenbeek2, Michael Angelo2, Michael Reich2, Geraldine S. Pinkus3, Tane S. Ray6, Margaret A. Koval1, Kim W. Last4, Andrew Norton5, T. Andrew Lister4, Jill Mesirov2, Donna S. Neuberg1, Eric S. Lander2, 7, Jon C. Aster3 & Todd R. Golub1, 2
1
Dana-Farber Cancer Institute, Harvard Medical School, Boston, Massachusetts, USA
2
Whitehead Institute for Biomedical Research/Massachusetts Institute of Technology Center for Genome Research, Cambridge, Massachusetts, USA
3
Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
4
ICRF Medical Oncology Unit, St. Bartholomew's Hospital, London, UK
5
Pathology Unit, St. Bartholomew's Hospital, London, UK
6
Department of Computer Science, Maths and Physics, University of West Indies, Bridgetown, Barbados
7
Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
8
K.N.R. and P.T. contributed equally to this study.
Correspondence should be addressed to Margaret A. Shipp (margaret_shipp@dfci.harvard.edu) or Todd R. Golub (golub@genome.wi.mit.edu).
Diffuse large B-cell lymphoma (DLBCL), the most common lymphoid malignancy in adults, is curable in less than 50% of patients. Prognostic models based on pre-treatment characteristics, such as the International Prognostic Index (IPI), are currently used to predict outcome in DLBCL. However, clinical outcome models identify neither the molecular basis of clinical heterogeneity, nor specific therapeutic targets. We analyzed the expression of 6,817 genes in diagnostic tumor specimens from DLBCL patients who received cyclophosphamide, adriamycin, vincristine and prednisone (CHOP)-based chemotherapy, and applied a supervised learning prediction method to identify cured versus fatal or refractory disease. The algorithm classified two categories of patients with very different five-year overall survival rates (70% versus 12%). The model also effectively delineated patients within specific IPI risk categories who were likely to be cured or to die of their disease. Genes implicated in DLBCL outcome included some that regulate responses to B-cell–receptor signaling, critical serine/threonine phosphorylation pathways and apoptosis. Our data indicate that supervised learning classification techniques can predict outcome in DLBCL and identify rational targets for intervention.