Gene expression–based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study


Although prognostic gene expression signatures for survival in early-stage lung cancer have been proposed, clinical application requires establishing their performance across different subject populations and in different laboratories. Here we report a large, training–testing, multi-site, blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses examined whether microarray measurements of gene expression, either alone or combined with basic clinical covariates (stage, age, sex), could be used to predict overall survival in lung cancer subjects. Several of the models examined produced risk scores that correlated substantially with actual subject outcome. Most methods performed better with clinical data, supporting the combined use of clinical and molecular information when building prognostic models for early-stage lung cancer. This study also provides the largest available set of microarray data with extensive pathological and clinical annotation for lung adenocarcinomas.

Figure 1: Classifier performance.
Figure 2: Kaplan-Meier estimates of the survivor function for method A on each validation data set for the four hypotheses.
Figure 3: Kaplan-Meier estimates of the survivor function for method A (cross-validated) on training sets UM and MSK.
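For readers unfamiliar with the Kaplan-Meier estimates named in the captions for Figures 2 and 3, the following is a minimal sketch of the product-limit estimator of the survivor function. It is an illustration only, not the study's analysis code: the function name `kaplan_meier` and the follow-up values are hypothetical, and in practice one curve would be computed for each predicted risk group.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  -- follow-up duration for each subject
    events -- 1 if death was observed at that time, 0 if the subject was censored
    Returns a list of (time, estimated survival) pairs, one per distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # subjects leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up times in months for one predicted risk group.
example_curve = kaplan_meier([5, 8, 12, 20, 30], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is why the estimate changes only at observed death times.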


We thank M. Orringer, A. Pickens, F. Taylor, N. Liu, D. Lau, M. Whitehead, L. Chen, L. Vargas, Y. Xiao, M. Maddaus and C. Hoang. We thank M. Heiskanen, L. Liu, D. Reeves and S. Whitley from the US National Cancer Institute Center for Bioinformatics and W. Ricker from Information Management Services for assistance with development of the lung study database and data management. We thank D. Sawyer, J.M. Askew and A. Vaughn of the Cancer and Leukemia Group B Statistical Center, Duke University for quality control of the clinical data. We thank Affymetrix for technical support. This work was supported by US National Cancer Institute grants CA84953, CA84999, CA84995, CA85052 and CA46592 and contracts 263-MQ-319735, 263-MQ-319740, 263-MQ-319746 and 263-MQ-510430 and support from the Canadian Cancer Society.

Author information


Writing Committee: K.S., J.M.G.T., S.A.E., M.S.T., T.J.Y., W.L.G., S.E., I.J., V.E.S., M.M., R.K., K.K.D., T.L., J.W.J. and D.G.B. Members of the Writing Committee participated in the planning, initiation, data generation, data analysis and manuscript preparation for the project.

Additional participants: T.J.G., D.E.M., A.C.C. and S.H. participated in aspects of sample collection and preparation, data generation and data analysis at the University of Michigan. C.Q.Z., D.S., F.A.S., K.D. and L.S. participated in aspects of sample collection and preparation, data generation and data analysis at the Ontario Cancer Institute. K.N., N.P., B.W., R.V., C.L.-A. and T.G. participated in aspects of sample collection and preparation, data generation and data analysis at the Dana-Farber Cancer Institute and Broad Institute. M.G. assembled the clinical data at the H. Lee Moffitt Cancer Center. J.S., M.Z., V.R., M.K., A.V., N.M., W.T. and A.S. participated in aspects of sample collection and preparation, data generation and data analysis at Memorial Sloan-Kettering Cancer Center. B.C. participated in the planning and initiation of the study.

Corresponding authors

Correspondence to James W Jacobson or David G Beer.

Additional information

The consortium consists of the Writing Committee plus additional participants as detailed in the Author Contributions section.

Supplementary information

Supplementary Text and Figures

Supplementary Figs. 1 and 2 (PDF 2763 kb)

Supplementary Table 1

Supplementary Data (XLS 409 kb)

