Short Communication
Oncogene advance online publication 16 November 2009; doi: 10.1038/onc.2009.398
Predicting the site of origin of tumors by a gene expression signature derived from normal tissues
E Staub¹, H-J Buhr² and J Gröne²
- ¹Drug Discovery Informatics, Merck Serono, Merck KGaA, Darmstadt, Germany
- ²Department of General, Vascular and Thoracic Surgery, Charité-Campus Benjamin Franklin, Berlin, Germany
Correspondence: Dr E Staub, Drug Discovery Informatics, Merck KGaA, Merck Serono, Frankfurter Str. 250, 64293 Darmstadt, Germany. E-mail: eike.staub@merck.de
Received 22 July 2009; Revised 9 October 2009; Accepted 12 October 2009; Published online 16 November 2009.
Abstract
Multiple expression signatures for predicting the site of origin of metastatic cancers of unknown primary origin (CUP) have been developed. Owing to their limited coverage of tumor types and suboptimal prediction accuracy for particular tumors, there is still room for alternative CUP gene expression signatures. Whereas past CUP classifiers were trained solely on data from tumor samples, we use expression patterns from normal tissues for classifier training. This approach potentially avoids pitfalls related to the representation of genetically heterogeneous tumor subtypes during classifier training. Two expression data sets of normal human tissues were reanalysed to derive an expression signature for liver, prostate, kidney, ovarian and lung tissues. In reciprocal validation, classifiers trained on either data set and tested on the other achieved overall accuracies greater than 97%. Classifiers trained on the combined expression data from both normal tissue data sets predicted the site of origin in a cohort of 652 primary tumors with 90% accuracy. Classifiers trained on primary cancers achieved prediction accuracies in the same range, as determined by cross-validation on this cohort. For individual tumor types, normal tissue-based best-centroid classifiers achieved sensitivities ranging from 71 to 99% and specificities ranging from 91 to 99%. The primary origins of 12 of 20 metastases were predicted correctly; the false predictions highlight the need for accurate sample preparation to avoid contamination by tissue surrounding the metastases. We conclude that gene expression patterns of normal tissues harbor phenotypic information that is retained in tumors and can be sufficient to recover the type of a primary tumor from expression patterns alone.
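The central idea of the abstract is that a centroid classifier is trained on normal-tissue profiles and then applied to tumor profiles, with per-tissue sensitivity and specificity used to score it. The following minimal sketch illustrates that idea. It assumes that "best-centroid" means assigning each sample to the tissue whose mean (centroid) profile it correlates with most strongly by Pearson correlation. The study's actual similarity measure, gene selection and normalization are not stated here. The four-gene toy data and all helper names (`centroids`, `pearson`, `predict`, `sensitivity_specificity`) are hypothetical.

```python
from math import sqrt
from statistics import mean


def centroids(samples, labels):
    """Gene-wise mean expression profile for each tissue class (hypothetical helper)."""
    by_class = {}
    for profile, label in zip(samples, labels):
        by_class.setdefault(label, []).append(profile)
    return {label: [mean(values) for values in zip(*profiles)]
            for label, profiles in by_class.items()}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def predict(profile, cents):
    """Assign the tissue whose centroid correlates best with the profile (assumed rule)."""
    return max(cents, key=lambda label: pearson(profile, cents[label]))


def sensitivity_specificity(truth, pred, label):
    """One-vs-rest sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for one tissue."""
    pairs = list(zip(truth, pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical normal-tissue training data: 4 genes, 2 tissues.
normal = [[9, 8, 1, 2], [8, 9, 2, 1], [1, 2, 9, 8], [2, 1, 8, 9]]
labels = ["liver", "liver", "kidney", "kidney"]
cents = centroids(normal, labels)

# Hypothetical tumor profiles with known origins, scored by the normal-tissue classifier.
tumors = [[7, 6, 3, 2], [3, 2, 6, 7], [5, 6, 4, 3]]
truth = ["liver", "kidney", "kidney"]
preds = [predict(t, cents) for t in tumors]

print(preds)  # ['liver', 'kidney', 'liver']
for tissue in ("liver", "kidney"):
    print(tissue, sensitivity_specificity(truth, preds, tissue))
# liver (1.0, 0.5)
# kidney (0.5, 1.0)
```

Correlation to a centroid ignores overall intensity and scale. That may be one reason this kind of rule can transfer from normal tissue to tumor arrays, although this is an assumption of the sketch rather than a claim from the study. The third toy tumor is misassigned, which lowers kidney sensitivity and liver specificity.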
Keywords:
gene expression signature, CUP, cancer of unknown primary, cancer type, expression profiling, microarray
