Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ~18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
Supplementary information
- Supplementary Text and Figures: Supplementary Figures 1–8
- Supplementary Data 1: PRECOG meta-z matrix and source data
- Supplementary Data 2: Prognostic genes shared across multiple cancers or specific to individual cancers, and related analyses
- Supplementary Data 3: Clusters of prognostic genes and corresponding functional annotations
- Supplementary Data 4: Bivariate models incorporating FOXM1 and KLRB1 expression levels across cancer types, and significance of a FOXM1-KLRB1 score in multivariate models with clinical parameters
- Supplementary Data 5: Protein-protein association data for the top pan-cancer prognostic genes in PRECOG; analysis of transcription factors and their target genes in PRECOG
- Supplementary Data 6: CIBERSORT-inferred fractions of tumor-associated leukocytes across 25 malignancies
- Supplementary Data 7: Lung adenocarcinoma TMA analyses, including clinical data and marker quantification, multivariate survival analysis with clinical covariates, and comparison of TAL levels with circulating leukocytes