With the increased use of next-generation sequencing generating large amounts of genomic data, gene expression signatures are becoming critically important tools for the interpretation of these data, and are poised to have a substantial effect on diagnosis, management, and prognosis for a number of diseases. It is becoming crucial to establish whether the expression patterns and statistical properties of sets of genes, or gene signatures, are conserved across independent datasets. Conversely, it is necessary to compare established signatures on the same dataset to better understand how they capture different clinical or biological characteristics. Here we describe how to use sigQC, a tool that enables a streamlined, systematic approach for the evaluation of previously obtained gene signatures across multiple gene expression datasets. We implemented sigQC in an R package, making it accessible to users who have knowledge of file input/output and matrix manipulation in R and a moderate grasp of core statistical principles. SigQC has been adopted in basic biology and translational studies, including, but not limited to, the evaluation of multiple gene signatures for potential clinical use as cancer biomarkers. This protocol uses a previously obtained signature for breast cancer metastasis as an example to illustrate the critical quality control steps involved in evaluating its expression, variability, and structure in breast tumor RNA-sequencing data, a different dataset from that in which the signature was originally derived. We demonstrate how the outputs created from sigQC can be used for the evaluation of gene signatures on large-scale gene expression datasets.


Data availability

All data that have been used in this publication have been made available through Zenodo at https://doi.org/10.5281/zenodo.1319848.

Code availability

All code that constitutes the sigQC R package is available for use under a GPL v3 license and can be downloaded from the CRAN repository at https://CRAN.R-project.org/package=sigQC.

All scripts used to create the figures in this paper can be downloaded through Zenodo at https://doi.org/10.5281/zenodo.1319848.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Key references using this protocol

Dhawan, A. et al. Nat. Commun. 9, 5228 (2018): https://doi.org/10.1038/s41467-018-07657-1

Haider, S. et al. Genome Biol. 17, 140 (2016): https://doi.org/10.1186/s13059-016-0999-8

Buffa, F. et al. Br. J. Cancer 102, 428–435 (2010): https://doi.org/10.1038/sj.bjc.6605450

Masiero, M. et al. Cancer Cell 24, 229–241 (2013): https://doi.org/10.1016/j.ccr.2013.06.004

Key data used in this protocol

van ’t Veer, L. J. et al. Nature 415, 530–536 (2002): https://doi.org/10.1038/415530a




Acknowledgements

This work was funded by Cancer Research UK grant 23969 to F.M.B. (F.M.B., A.B., W.-C.C., and A.D.), the Oxford Cancer Centre (A.L.H. and A.D.), the Medical Research Council Stratified Medicine Consortium MR/M016587/1 (T.M. and E.D.), and European Research Council Consolidator Grant 772970 to F.M.B. We are also grateful for a Clarendon Scholarship to A.D.

Author information


  1. Computational Biology and Integrative Genomics Lab, MRC/CRUK Oxford Institute and Department of Oncology, University of Oxford, Oxford, UK

    Andrew Dhawan, Alessandro Barberis, Wei-Chen Cheng, Enric Domingo, Tim Maughan, Adrian L. Harris & Francesca M. Buffa

  2. Division of Cancer Studies, University of Manchester, Manchester, UK

    Catharine West

  3. Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, USA

    Jacob G. Scott




Contributions

F.M.B. conceived the idea and designed the study. A.D., A.B., W.-C.C., J.G.S., and F.M.B. contributed to statistics and data visualization. A.D. performed analyses. A.D., A.B., and W.-C.C. wrote and debugged code. A.B. and F.M.B. supervised the implementation. All authors contributed to application cases and interpretation of data. A.D. and F.M.B. wrote the manuscript, with contributions from all other authors.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Francesca M. Buffa.

Integrated supplementary information

  1. Supplementary Figure 1 Measures of expression of signature genes across TCGA breast cancer dataset.

    Expression of signature genes across the TCGA breast cancer RNA-seq dataset for the metastasis gene signature (top) and a random set of genes (bottom), shown as (a) a barplot for the proportion of samples expressing a gene above the median, (b) a density plot showing the same information as the barplots in (a), and (c) a plot of the proportion of samples showing NA expression for each of the genes of the signature.

  2. Supplementary Figure 2 Assessment of standardization of dataset values on gene signature score.

    Comparison of median and z-transformed median of signature gene expression across the RNA-seq breast cancer dataset for the metastasis gene signature (left) and the random set of genes (right).
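
The quantities summarized in Supplementary Figures 1 and 2 can be illustrated with a short sketch. The following Python code is only an illustration under stated assumptions and is not the sigQC R implementation: the function names, the toy expression values, and the gene labels are hypothetical, and the expression threshold is assumed here to be the median of all non-missing values in the dataset. It computes, for each signature gene, the proportion of samples expressing it above the threshold and the proportion of samples with missing (NA) expression, and then compares the per-sample median signature score with the median of per-gene z-transformed values.

```python
import math
import statistics

def proportion_above_threshold(expr, genes, threshold):
    """Per gene: fraction of non-missing samples with expression above threshold."""
    result = {}
    for gene in genes:
        present = [v for v in expr[gene] if v is not None]
        result[gene] = (
            sum(v > threshold for v in present) / len(present)
            if present else float("nan")
        )
    return result

def proportion_missing(expr, genes):
    """Per gene: fraction of samples whose expression is missing (None)."""
    return {g: sum(v is None for v in expr[g]) / len(expr[g]) for g in genes}

def z_transform(values):
    """Standardize one gene across samples, leaving missing values as None."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return [None if v is None else (v - mean) / sd for v in values]

def median_score(expr, genes):
    """Per sample: median expression over the signature genes that are present."""
    n_samples = len(expr[genes[0]])
    scores = []
    for i in range(n_samples):
        present = [expr[g][i] for g in genes if expr[g][i] is not None]
        scores.append(statistics.median(present) if present else None)
    return scores

def pearson(x, y):
    """Pearson correlation of two equal-length lists without missing values."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )

# Hypothetical toy dataset: gene -> expression per sample (None marks NA).
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, None, 6.0, 8.0],
    "GENE_C": [0.5, 0.5, 0.5, 9.0],
}
signature = ["GENE_A", "GENE_B"]

# Assumed threshold: median of every non-missing value in the dataset.
all_values = [v for vals in expr.values() for v in vals if v is not None]
threshold = statistics.median(all_values)

prop_expressed = proportion_above_threshold(expr, signature, threshold)
prop_na = proportion_missing(expr, signature)

raw_scores = median_score(expr, signature)
z_expr = {g: z_transform(expr[g]) for g in signature}
z_scores = median_score(z_expr, signature)

# Agreement between raw and z-transformed median scores across samples.
score_agreement = pearson(raw_scores, z_scores)
```

A high value of `score_agreement` would suggest that standardizing each gene changes the ranking of samples by signature score only slightly; a low value would suggest that a few highly expressed or highly variable genes dominate the unstandardized score.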

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figs. 1 and 2, Supplementary Table 1, and Supplementary Manuals 1 and 2

  2. Reporting Summary
