As next-generation sequencing generates ever larger amounts of genomic data, gene expression signatures are becoming critically important tools for interpreting these data, and they are poised to have a substantial effect on diagnosis, management, and prognosis for a number of diseases. It is therefore crucial to establish whether the expression patterns and statistical properties of sets of genes, or gene signatures, are conserved across independent datasets. Conversely, it is necessary to compare established signatures on the same dataset to better understand how they capture different clinical or biological characteristics. Here we describe how to use sigQC, a tool that enables a streamlined, systematic approach for the evaluation of previously obtained gene signatures across multiple gene expression datasets. We implemented sigQC as an R package, making it accessible to users who have knowledge of file input/output and matrix manipulation in R and a moderate grasp of core statistical principles. The sigQC package has been adopted in basic biology and translational studies, including, but not limited to, the evaluation of multiple gene signatures for potential clinical use as cancer biomarkers. This protocol uses a previously obtained signature for breast cancer metastasis as an example to illustrate the critical quality control steps involved in evaluating its expression, variability, and structure in breast tumor RNA-sequencing data, a dataset different from the one in which the signature was originally derived. We demonstrate how the outputs created by sigQC can be used for the evaluation of gene signatures on large-scale gene expression datasets.
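To make the idea of checking a signature's expression and variability on a new dataset more concrete, the following is a minimal Python sketch. It is not sigQC code and does not reproduce the package's metrics or its R interface. The function name `signature_summary`, the toy gene names, and the choice of summaries (fraction of signature genes present, per-sample median score, per-gene coefficient of variation) are all illustrative assumptions.

```python
from statistics import mean, median, stdev

def signature_summary(expr, signature):
    """Summarize a gene signature on one expression dataset.

    expr: dict mapping gene ID -> list of expression values, one per sample
          (hypothetical layout chosen for this sketch).
    signature: list of gene IDs making up the signature.
    """
    # Signature genes that are actually measured in this dataset.
    present = [g for g in signature if g in expr]
    n_samples = len(next(iter(expr.values())))

    # Per-sample median expression across the signature genes present.
    scores = [median(expr[g][i] for g in present) for i in range(n_samples)]

    # Per-gene coefficient of variation (sample SD / mean); genes with a
    # mean of zero are skipped to avoid division by zero.
    cv = {g: stdev(expr[g]) / mean(expr[g]) for g in present if mean(expr[g]) != 0}

    return {
        "coverage": len(present) / len(signature),
        "median_scores": scores,
        "cv": cv,
    }

# Toy data with made-up gene names and values, for illustration only.
expr = {
    "GENE_A": [2.0, 4.0, 6.0],
    "GENE_B": [1.0, 1.0, 1.0],
    "GENE_C": [5.0, 3.0, 1.0],
}
signature = ["GENE_A", "GENE_B", "GENE_D"]  # GENE_D is absent from expr

summary = signature_summary(expr, signature)
print(summary)
```

In this toy case one of the three signature genes is missing from the dataset and one gene shows no variation across samples, which are the kinds of issues a quality control pass is meant to surface before a signature is applied to new data.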
All data that have been used in this publication have been made available through Zenodo at https://doi.org/10.5281/zenodo.1319848.
All code that constitutes the sigQC R package is available for use under a GPL v3 license and can be downloaded from the CRAN repository at https://CRAN.R-project.org/package=sigQC.
All scripts used to create the figures in this paper can be downloaded through Zenodo at https://doi.org/10.5281/zenodo.1319848.
Key references using this protocol
Dhawan, A. et al. Nat. Commun. 9, 5228 (2018): https://doi.org/10.1038/s41467-018-07657-1
Haider, S. et al. Genome Biol. 17, 140 (2016): https://doi.org/10.1186/s13059-016-0999-8
Buffa, F. et al. Br. J. Cancer 102, 428–435 (2010): https://doi.org/10.1038/sj.bjc.6605450
Masiero, M. et al. Cancer Cell 24, 229–241 (2013): https://doi.org/10.1016/j.ccr.2013.06.004
Key data used in this protocol
van ’t Veer, L. J. et al. Nature 415, 530–536 (2002): https://doi.org/10.1038/415530a
This work was funded by Cancer Research UK grant 23969 to F.M.B. (F.M.B., A.B., W.-C.C., and A.D.), the Oxford Cancer Centre (A.L.H. and A.D.), the Medical Research Council Stratified Medicine Consortium MR/M016587/1 (T.M. and E.D.), and European Research Council Consolidator Grant 772970 to F.M.B. We are also grateful for a Clarendon Scholarship to A.D.