Original Article

The Pharmacogenomics Journal (2010) 10, 310–323; doi:10.1038/tpj.2010.35

Functional analysis of multiple genomic signatures demonstrates that classification algorithms choose phenotype-related genes

The views presented in this article do not necessarily reflect those of the U.S. Food and Drug Administration.

W Shi1,11, M Bessarabova2,11, D Dosymbekov2,11, Z Dezso1,11, T Nikolskaya1,2, M Dudoladova2, T Serebryiskaya2, A Bugrim1, A Guryanov1,2, R J Brennan1, R Shah3, J Dopazo4, M Chen5, Y Deng6, T Shi7, G Jurman8, C Furlanello8, R S Thomas9, J C Corton10, W Tong5, L Shi5 and Y Nikolsky1

  1. GeneGo Inc., St Joseph, MI, USA
  2. Vavilov Institute for General Genetics, Russian Academy of Sciences, Moscow, Russia
  3. SRA International Inc., Durham, NC, USA
  4. Centro de Investigacion Principe Felipe, Valencia, Spain
  5. National Center for Toxicological Research, FDA, Jefferson, AR, USA
  6. Department of Biological Sciences, University of Southern Mississippi, Hattiesburg, MS, USA
  7. The Center for Bioinformatics and The Institute of Biomedical Sciences, College of Life Sciences, East China Normal University, Shanghai, China
  8. Fondazione Bruno Kessler, Trento, Italy
  9. The Hamner Institutes for Health Sciences, Durham, NC, USA
  10. Division of Environmental Carcinogenesis, NHEERL, Environmental Protection Agency, Durham, NC, USA

Correspondence: Dr Y Nikolsky, GeneGo Inc., 500 Renaissance Drive, no. 106, St Joseph, MI 49085, USA. E-mail: yuri@genego.com

11. These authors contributed equally to this work.

Received 22 November 2009; Revised 14 April 2010; Accepted 26 April 2010.



Gene expression signatures of toxicity and clinical response benefit both safety assessment and clinical practice; however, difficulties in connecting signature genes with the predicted end points have limited their application. The Microarray Quality Control Consortium II (MAQCII) project generated 262 signatures for ten clinical and three toxicological end points from six gene expression data sets, an unprecedented collection of diverse signatures that has permitted a wide-ranging analysis of the nature of such predictive models. A comprehensive analysis of the genes of these signatures and their nonredundant unions, using ontology enrichment, biological network building and interactome connectivity analyses, demonstrated the link between gene signatures and the biological basis of their predictive power. Different signatures for a given end point were more similar at the level of biological properties and transcriptional control than at the gene level. Signatures tended to be enriched in function and pathway in an end point- and model-specific manner, and showed a topological bias for incoming interactions. Importantly, the level of biological similarity between different signatures for a given end point correlated positively with the accuracy of the signature predictions. These findings will aid the understanding and application of predictive genomic signatures and support their broader use in predictive medicine.
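
The contrast drawn above between gene-level similarity and similarity at the level of biological function can be illustrated with a minimal sketch. It is not the authors' pipeline: the Jaccard index and the one-sided hypergeometric test are common choices assumed here for illustration, and the function names, gene symbols, category membership and background size are all hypothetical.

```python
from math import comb


def jaccard(sig_a, sig_b):
    """Gene-level similarity: shared genes divided by genes in either signature."""
    a, b = set(sig_a), set(sig_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def enrichment_p(signature, category, background_size):
    """One-sided hypergeometric p-value, P(overlap >= observed).

    Assumes every gene in `signature` and `category` belongs to a
    background of `background_size` genes (an illustrative assumption).
    """
    sig, cat = set(signature), set(category)
    n, K, N = len(sig), len(cat), background_size
    k = len(sig & cat)  # observed overlap with the functional category
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical signatures for one end point and one hypothetical category.
sig_a = {"TP53", "MYC", "EGFR", "BRCA1"}
sig_b = {"MYC", "KRAS", "EGFR", "PTEN"}
category = {"TP53", "MYC", "EGFR", "KRAS", "PTEN"}

gene_level = jaccard(sig_a, sig_b)
p_a = enrichment_p(sig_a, category, background_size=100)
p_b = enrichment_p(sig_b, category, background_size=100)
```

In this toy case the two signatures share only two of six genes, yet both overlap strongly with the same category, which is the kind of pattern the abstract describes: modest agreement at the gene level alongside agreement at the functional level.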


Keywords: genomic signatures; enrichment analysis; network reconstruction; biological pathways; interactome; MAQCII


Abbreviations: DEGs, differentially expressed genes; DI, direct interactions (network); FA, functional analysis; GO, gene ontology; GSEA, gene set enrichment analysis; KEGG, Kyoto Encyclopedia of Genes and Genomes; MAQCII, Microarray Quality Control Consortium II; TF, transcription factor; TR, transcriptional regulation (an interaction mechanism).


