The prognostic landscape of genes and infiltrating immune cells across human cancers

Journal name: Nature Medicine
Volume: 21
Pages: 938–945
DOI: 10.1038/nm.3909

Abstract

Molecular profiles of tumors and tumor-associated cells hold great promise as biomarkers of clinical outcomes. However, existing data sets are fragmented and difficult to analyze systematically. Here we present a pan-cancer resource and meta-analysis of expression signatures from ~18,000 human tumors with overall survival outcomes across 39 malignancies. By using this resource, we identified a forkhead box M1 (FOXM1) regulatory network as a major predictor of adverse outcomes, and we found that expression of favorably prognostic genes, including KLRB1 (encoding CD161), largely reflects tumor-associated leukocytes. By applying CIBERSORT, a computational approach for inferring leukocyte representation in bulk tumor transcriptomes, we identified complex associations between 22 distinct leukocyte subsets and cancer survival. For example, tumor-associated neutrophil and plasma cell signatures emerged as significant but opposite predictors of survival for diverse solid tumors, including breast and lung adenocarcinomas. This resource and associated analytical tools (http://precog.stanford.edu) may help delineate prognostic genes and leukocyte subsets within and across cancers, shed light on the impact of tumor heterogeneity on cancer outcomes, and facilitate the discovery of biomarkers and therapeutic targets.
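
The figure legends report prognostic significance as meta-z-scores, with thresholds such as |meta-z| > 3.09 (nominal one-sided P < 0.001). As a minimal sketch of how per-study z-scores might be pooled, the Python below implements the unweighted (Stouffer) and weighted (Lipták) Z-methods cited in references 65–67. The function names and the choice of weights are illustrative assumptions, not the PRECOG implementation.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_meta_z(z_scores):
    """Unweighted Z-method: sum of z-scores divided by sqrt(number of studies)."""
    return sum(z_scores) / sqrt(len(z_scores))


def liptak_meta_z(z_scores, weights):
    """Weighted Z-method. The weights are an assumption here
    (for example, the square root of each study's sample size)."""
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / sqrt(sum(w * w for w in weights))


def one_sided_p(z):
    """Upper-tail P value of a standard normal for |z|."""
    return 1.0 - NormalDist().cdf(abs(z))


def two_sided_p(z):
    """Two-sided P value of a standard normal z-score."""
    return 2.0 * one_sided_p(z)


# Hypothetical per-study z-scores for one gene (not taken from PRECOG).
study_z = [1.0, 1.0, 1.0, 1.0]
meta_z = stouffer_meta_z(study_z)
print(meta_z, one_sided_p(3.09), two_sided_p(1.96))
```

With equal weights the weighted form reduces to the unweighted one, which is a quick sanity check on any implementation.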


Figures

  1. Prognostic landscape of gene expression across human cancers.

    (a) Schematic depicting PRECOG data pre-processing and analysis steps. (b) Number of patient samples with survival data included in PRECOG, organized by cancer type. Thirty-nine distinct histologies (for example, adenocarcinoma and squamous cell carcinoma in lung cancer, different types of blood cancer) have been grouped into 18 clusters for concise display. (c) Left, approximately two-thirds of prognostic genes (filtered for |meta-z| > 3.09, or nominal one-sided P < 0.001) are prognostic in more than one of the 39 distinct cancer histologies for which meta-z-scores were computed, while the remaining one-third are prognostic in only a single histology; the latter are cancer-specific. Right, same analysis shown at left but applied to randomly shuffled gene labels for each cancer in PRECOG. On the basis of 100,000 trials, the empirical P value for the observed enrichment of shared genes is P < 10⁻⁵ (Monte Carlo simulation). (d) Left, heat map showing genes (rows) clustered by association between expression levels and survival outcomes across 166 individual cancer studies (columns). Z-scores represent the statistical significance of each gene's association with survival, with poor prognosis genes in red, and favorable prognosis genes in green. All identified clusters were ranked by compound scores that integrate cluster size with the prognostic significance of genes within each cluster; the five top-ranking clusters are shown (left). Right, representative functional enrichments for each of the five clusters, determined by analyzing annotated gene sets with a Bonferroni-corrected hypergeometric test. All clusters, including associated data sets and compound scores, are provided in Supplementary Data 3.

  2. Genes globally associated with adverse and favorable survival.

    (a) Analysis of the number of cancer types used to identify pan-cancer prognostic genes versus the significance of these genes in validation data sets. Left, the top ten adverse and favorable pan-cancer prognostic genes were identified in training sets (comprising t cancer types) and assessed by mean meta-z-scores in validation sets (remaining 39 − t cancers). For each value of t, from 1 to 31, histologies were randomly drawn from PRECOG 100 times, and the results are presented as means ± 95% confidence interval (CI). Right, the ten most frequent cancer-wide adverse and favorable prognostic genes are shown for t = 31 (above this threshold, performance gains were marginal). Of note, global meta-z-scores (bottom x axis) reflect all cancers in PRECOG (Supplementary Data 1). (b) Comparison of global meta-z-scores between PRECOG (n = 17,808 tumors) and TCGA RNA-seq data (n = 6,663 tumors), with FOXM1 and KLRB1 indicated. Points lying between the parallel gray lines represent genes that are not significant in PRECOG, TCGA, or both (nominal two-sided P > 0.05). (c) Kaplan-Meier curves showing differences in overall survival for patients in validation sets stratified by a FOXM1 and KLRB1 expression score (see Online Methods). For each cancer, a median split was used and curve separation was assessed by a log-rank test. Survival units from different studies were standardized to months. Lung cancers were primarily stage I (approximately two-thirds), and the melanoma data consisted primarily of metastatic samples (see Online Methods). 95% confidence intervals are presented in brackets. HR, hazard ratio. (d) Top, genes ranked by mean meta-z-scores across all data sets in PRECOG (n = 23,288 genes). Center, PPA networks for the top 100 genes determined by mean pan-cancer meta-z-scores. Edges are colored to denote experimentally confirmed interactions and/or associations in curated databases (blue edges), and other sources of evidence (gray edges) (see Online Methods and Supplementary Data 5). Functional annotation P values were determined using a Benjamini-Hochberg–corrected hypergeometric test. Genes in the pan-cancer prognostic networks are colored according to the number of cancer-specific PPA networks in which they are also found. 0* indicates genes only found in PPA networks derived from all cancers. Bottom, two metrics of network connectivity are compared among PPA networks for the top 100 prognostic genes derived from all cancers (red diamonds) versus individual cancers and studies in the PRECOG data (gray circles): x axis = node degree, the average number of edges e (i.e., PPAs) per node n (i.e., protein); y axis = algebraic connectivity, a graph theoretic measure of overall network connectedness.

  3. Inferred leukocyte frequencies and prognostic associations in 25 human cancers.

    (a) Relative leukocyte fractions enumerated in solid tumors by CIBERSORT versus immunohistochemical analysis (IHC) or flow cytometry (FACS) on independent samples. CRC, colorectal cancer; LUAD, lung adenocarcinoma. To approximate ground truth proportions in CRC biopsies, levels were inferred by averaging previously reported leukocyte counts from the tumor center and invasive margin of 107 patients (Bindea et al., ref. 51). Baseline leukocyte fractions in LUAD biopsies were enumerated by FACS (n = 13 tumors; data represented as medians; details in Online Methods). CIBERSORT results are represented as mean leukocyte fractions for the corresponding histologies (Supplementary Data 6). (b) Estimated mRNA fractions of 22 leukocyte subsets across 25 cancers (Affymetrix platforms only; see Online Methods), pooled into 11 immune populations here for clarity (for full details, see Supplementary Data 6). (c) Global prognostic associations for 22 leukocyte types across 25 cancers (n = 5,782 tumors; left) and 14 solid non-brain tumors (n = 3,238 tumors; right), ranked by unweighted meta-z-score, with a false discovery rate (FDR) threshold of 25% indicated for each plot. Additional FDR thresholds are provided in Supplementary Figure 6d. For individual cancers, see Supplementary Figure 6a,c. (d) Concordance and differences in TAL prognostic associations between breast cancers and lung adenocarcinoma (for FDRs, see Supplementary Figure 6c). Resting and activated subsets in c,d are indicated by − and +, respectively. All leukocyte subset abbreviations are defined in Supplementary Data 6. Red and blue bars in c,d indicate adverse and favorable prognostic associations, respectively.

  4. Ratio of infiltrating PMN cells to PCs is prognostic in diverse solid tumors.

    (a) Prognostic associations between inferred PMN cell and PC frequencies are significantly inversely correlated across the cancer landscape (R = −0.46, P = 0.02; Pearson test). Each point represents an individual cancer: triangles, blood cancers; squares, brain cancers; circles, remaining cancers. (b) Meta-z-scores depict the prognostic significance of combining PMN cell and PC fractions into a ratiometric index, for diverse solid tumors (source data are provided in Supplementary Data 6). (c) Comparison between CIBERSORT and tissue microarray (TMA) analysis for PC, B cell, and PMN cell frequencies in lung adenocarcinoma, using IGKC, CD20 and MPO, respectively, as surrogate markers (n = 187 specimens). Lung adenocarcinoma expression arrays from publicly available data sets (GSE7670 and GSE10072) were analyzed with CIBERSORT (n = 85 tumors). (d,e) Kaplan-Meier plots depict patients stratified above ('high') or below ('low') the median level of PMN cell–to-PC fractions inferred in lung adenocarcinoma microarray studies (P = 0.0005, log-rank test; n = 453 high and 453 low patients; Supplementary Data 6) (d) and the median level of MPO to IGKC stained positive in lung adenocarcinoma tissue sections (P = 0.028, log-rank test; n = 94 high and 93 low patients) (e). Hazard ratios were 1.5 (1.2–1.9, 95% CI) for d and 1.7 (1.1–2.6, 95% CI) for e. Inferred levels of PMN cells to PCs were also significantly prognostic in continuous models assessed by univariate Cox regression in d (P = 0.003, z = 2.98) and e (P = 0.0005, z = 3.46). Data in c are presented as means ± s.e.m. All patients were right censored after 5 years in d and e.
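
Several legends above describe stratifying patients by a median split and assessing curve separation with a log-rank test. The following Python is a minimal sketch of that procedure using only the standard library; the function names, the 'high'/'low' labels as data values, and the toy cohort are assumptions for illustration, and the code is not the authors' software.

```python
from math import erfc, sqrt
from statistics import median


def median_split(scores):
    """Label each sample 'high' if its score is above the cohort median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P value).

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if censored
    groups: 'high' or 'low' per patient
    """
    observed = expected = variance = 0.0
    # Visit each distinct time at which at least one event occurred.
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = [(g, tt, e) for g, tt, e in zip(groups, times, events) if tt >= t]
        n = len(at_risk)
        n_high = sum(1 for g, _, _ in at_risk if g == "high")
        d = sum(1 for _, tt, e in at_risk if tt == t and e)
        d_high = sum(1 for g, tt, e in at_risk if tt == t and e and g == "high")
        observed += d_high
        expected += d * n_high / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = (observed - expected) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2))


# Toy cohort (hypothetical): high scorers die earliest, no censoring.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
times = [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]
events = [1] * 10
labels = median_split(scores)
stat, p = logrank_test(times, events, labels)
print(labels, round(stat, 2), p)
```

Ties at the median fall into the 'low' group here; a real analysis would need to state how ties and the 5-year right censoring are handled.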

Accession codes

Referenced accessions

Gene Expression Omnibus

References

  1. Coussens, L.M. & Werb, Z. Inflammation and cancer. Nature 420, 860–867 (2002).
  2. Zhang, L. et al. Intratumoral T cells, recurrence, and survival in epithelial ovarian cancer. N. Engl. J. Med. 348, 203–213 (2003).
  3. Topalian, S.L. et al. Safety, activity, and immune correlates of anti-PD-1 antibody in cancer. N. Engl. J. Med. 366, 2443–2454 (2012).
  4. Fan, J.B., Chee, M. & Gunderson, K. Highly parallel genomic assays. Nat. Rev. Genet. 7, 632–644 (2006).
  5. Koscielny, S. Why most gene expression signatures of tumors have not been useful in the clinic. Sci. Transl. Med. 2, ps2 (2010).
  6. Dupuy, A. & Simon, R.M. Critical review of published microarray studies for cancer outcome and guidelines on statistical analysis and reporting. J. Natl. Cancer Inst. 99, 147–157 (2007).
  7. Subramanian, J. & Simon, R. Gene expression–based prognostic signatures in lung cancer: ready for clinical use? J. Natl. Cancer Inst. 102, 464–474 (2010).
  8. Dalton, W.S. & Friend, S.H. Cancer biomarkers: an invitation to the table. Science 312, 1165–1168 (2006).
  9. Ein-Dor, L., Zuk, O. & Domany, E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc. Natl. Acad. Sci. USA 103, 5923–5928 (2006).
  10. Ransohoff, D.F. Rules of evidence for cancer molecular-marker discovery and validation. Nat. Rev. Cancer 4, 309–314 (2004).
  11. Varmus, H. Ten years on: the human genome and medicine. N. Engl. J. Med. 362, 2028–2029 (2010).
  12. Lee, H.K., Hsu, A.K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).
  13. Mizuno, H., Kitada, K., Nakai, K. & Sarai, A. PrognoScan: a new database for meta-analysis of the prognostic value of genes. BMC Med. Genomics 2, 18 (2009).
  14. Hebestreit, K. et al. Leukemia Gene Atlas: a public platform for integrative exploration of genome-wide molecular data. PLoS ONE 7, e39148 (2012).
  15. Yuan, Y. et al. Assessing the clinical utility of cancer genomic and proteomic data across tumor types. Nat. Biotechnol. 32, 644–652 (2014).
  16. Newman, A.M. et al. Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12, 453–457 (2015).
  17. Hanahan, D. & Weinberg, R.A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
  18. Mantovani, A. et al. Chemokines in the recruitment and shaping of the leukocyte infiltrate of tumors. Semin. Cancer Biol. 14, 155–160 (2004).
  19. Coussens, L.M., Zitvogel, L. & Palucka, A.K. Neutralizing tumor-promoting chronic inflammation: a magic bullet? Science 339, 286–291 (2013).
  20. Gerstein, M.B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature 489, 91–100 (2012).
  21. Leek, J.T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11, 733–739 (2010).
  22. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
  23. Newman, A.M. & Cooper, J.B. AutoSOME: a clustering method for identifying gene expression modules without prior knowledge of cluster number. BMC Bioinformatics 11, 117 (2010).
  24. Gentles, A.J., Plevritis, S.K., Majeti, R. & Alizadeh, A.A. Association of a leukemic stem cell gene expression signature with clinical outcomes in acute myeloid leukemia. J. Am. Med. Assoc. 304, 2706–2715 (2010).
  25. Zeuner, A., Todaro, M., Stassi, G. & De Maria, R. Colorectal cancer stem cells: from the crypt to the clinic. Cell Stem Cell 15, 692–705 (2014).
  26. Myatt, S.S. & Lam, E.W.-F. The emerging roles of forkhead box (Fox) proteins in cancer. Nat. Rev. Cancer 7, 847–859 (2007).
  27. Scholzen, T. & Gerdes, J. The Ki-67 protein: from the known and the unknown. J. Cell. Physiol. 182, 311–322 (2000).
  28. Fergusson, J.R. et al. CD161 defines a transcriptional and functional phenotype across distinct human T cell lineages. Cell Rep. 9, 1075–1088 (2014).
  29. Lachmann, A. et al. ChEA: transcription factor regulation inferred from integrating genome-wide ChIP-X experiments. Bioinformatics 26, 2438–2444 (2010).
  30. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
  31. Chen, X. et al. The forkhead transcription factor FOXM1 controls cell cycle-dependent gene expression through an atypical chromatin binding mechanism. Mol. Cell. Biol. 33, 227–236 (2013).
  32. Galon, J. et al. Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science 313, 1960–1964 (2006).
  33. Fridman, W.H., Pagès, F., Sautès-Fridman, C. & Galon, J. The immune contexture in human tumours: impact on clinical outcome. Nat. Rev. Cancer 12, 298–306 (2012).
  34. Lewis, C.E. & Pollard, J.W. Distinct role of macrophages in different tumor microenvironments. Cancer Res. 66, 605–612 (2006).
  35. de Visser, K.E., Eichten, A. & Coussens, L.M. Paradoxical roles of the immune system during cancer development. Nat. Rev. Cancer 6, 24–37 (2006).
  36. Beyer, M. & Schultze, J.L. Regulatory T cells in cancer. Blood 108, 804–811 (2006).
  37. Girardi, M. et al. Regulation of cutaneous malignancy by γδ T cells. Science 294, 605 (2001).
  38. Haas, W., Pereira, P. & Tonegawa, S. Gamma/delta cells. Annu. Rev. Immunol. 11, 637–685 (1993).
  39. Fridlender, Z.G. & Albelda, S.M. Tumor-associated neutrophils: friend or foe? Carcinogenesis 33, 949–955 (2012).
  40. Di Carlo, E. et al. The intriguing role of polymorphonuclear neutrophils in antitumor reactions. Blood 97, 339–345 (2001).
  41. Vakkila, J. & Lotze, M.T. Inflammation and necrosis promote tumour growth. Nat. Rev. Immunol. 4, 641–648 (2004).
  42. Sica, A., Schioppa, T., Mantovani, A. & Allavena, P. Tumour-associated macrophages are a distinct M2 polarised population promoting tumour progression: potential targets of anti-cancer therapy. Eur. J. Cancer 42, 717–727 (2006).
  43. Teramukai, S. et al. Pretreatment neutrophil count as an independent prognostic factor in advanced non-small-cell lung cancer: an analysis of Japan Multinational Trial Organisation LC00–03. Eur. J. Cancer 45, 1950–1958 (2009).
  44. de Visser, K.E., Korets, L.V. & Coussens, L.M. De novo carcinogenesis promoted by chronic inflammation is B lymphocyte dependent. Cancer Cell 7, 411–423 (2005).
  45. Liyanage, U.K. et al. Prevalence of regulatory T cells is increased in peripheral blood and tumor microenvironment of patients with pancreas or breast adenocarcinoma. J. Immunol. 169, 2756–2761 (2002).
  46. Minárik, I. et al. Regulatory T cells, dendritic cells and neutrophils in patients with renal cell carcinoma. Immunol. Lett. 152, 144–150 (2013).
  47. Yamanaka, T. et al. The baseline ratio of neutrophils to lymphocytes is associated with patient prognosis in advanced gastric cancer. Oncology 73, 215–220 (2007).
  48. Paik, S. et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 351, 2817–2826 (2004).
  49. Chen, X., Sun, X. & Hoshida, Y. Survival analysis tools in genomics research. Hum. Genomics 8, 21 (2014).
  50. Alizadeh, A.A. et al. Prediction of survival in diffuse large B-cell lymphoma based on the expression of 2 genes reflecting tumor and microenvironment. Blood 118, 1350–1358 (2011).
  51. Bindea, G. et al. Spatiotemporal dynamics of intratumoral immune cells reveal the immune landscape in human cancer. Immunity 39, 782–795 (2013).
  52. Rooney, M.S., Shukla, S.A., Wu, C.J., Getz, G. & Hacohen, N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell 160, 48–61 (2015).
  53. Gabrilovich, D.I. & Nagaraj, S. Myeloid-derived suppressor cells as regulators of the immune system. Nat. Rev. Immunol. 9, 162–174 (2009).
  54. Pardoll, D.M. The blockade of immune checkpoints in cancer immunotherapy. Nat. Rev. Cancer 12, 252–264 (2012).
  55. Ribas, A. Tumor immunotherapy directed at PD-1. N. Engl. J. Med. 366, 2517–2519 (2012).
  56. Day, A., Carlson, M.R., Dong, J., O'Connor, B.D. & Nelson, S.F. Celsius: a community resource for Affymetrix microarray data. Genome Biol. 8, R112 (2007).
  57. Dai, M. et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res. 33, e175 (2005).
  58. Gautier, L., Cope, L., Bolstad, B.M. & Irizarry, R.A. affy—analysis of Affymetrix GeneChip data at the probe level. Bioinformatics 20, 307–315 (2004).
  59. Gentleman, R.C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).
  60. Irizarry, R.A. et al. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15 (2003).
  61. Wu, Z., Irizarry, R.A., Gentleman, R., Martinez-Murillo, F. & Spencer, F. A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917 (2004).
  62. McCall, M.N., Bolstad, B.M. & Irizarry, R.A. Frozen robust multiarray analysis (fRMA). Biostatistics 11, 242–253 (2010).
  63. Piccolo, S.R. et al. A single-sample microarray normalization method to facilitate personalized-medicine workflows. Genomics 100, 337–344 (2012).
  64. Zhu, Y., Qiu, P. & Ji, Y. TCGA-Assembler: open-source software for retrieving and processing TCGA data. Nat. Methods 11, 599–600 (2014).
  65. Lipták, T. On the combination of independent tests. Magyar Tud. Akad. Mat. Kutato Int. Közl 3, 171–196 (1958).
  66. Stouffer, S., DeVinney, L. & Suchman, E. The American Soldier: Adjustment During Army Life (Princeton University Press, 1949).
  67. Zaykin, D.V. Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J. Evol. Biol. 24, 1836–1841 (2011).
  68. Johnson, W.E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
  69. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
  70. Szklarczyk, D. et al. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 39, D561–D568 (2011).
  71. Fiedler, M. Algebraic connectivity of graphs. Czech. Math. J. 23, 298–305 (1973).
  72. Chen, J., Bardes, E.E., Aronow, B.J. & Jegga, A.G. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 37, W305–W311 (2009).
  73. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
  74. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
  75. Zhong, Y. & Liu, Z. Gene expression deconvolution in linear space. Nat. Methods 9, 8–9 (2012).
  76. Xu, L. et al. Gene expression changes in an animal melanoma model correlate with aggressiveness of human melanoma metastases. Mol. Cancer Res. 6, 760–769 (2008).
  77. Su, L.-J. et al. Selection of DDX5 as a novel internal control for Q-RT-PCR from microarray data using a block bootstrap re-sampling scheme. BMC Genomics 8, 140 (2007).
  78. Landi, M.T. et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS ONE 3, e1651 (2008).
  79. Brunner, A.L. et al. Transcriptional profiling of long non-coding RNAs and novel transcribed regions across a diverse panel of archived human cancers. Genome Biol. 13, R75 (2012).
  80. Holmes, S., Kapelner, A. & Lee, P.P. An interactive java statistical image segmentation system: Gemident. J. Stat. Softw. 30, i10 (2009).


Author information

  1. Present addresses: Department of Radiation Oncology, Princess Margaret Cancer Centre, University of Toronto, Toronto, Ontario, Canada (S.V.B.); Thoracic and Gastrointestinal Oncology Branch, National Cancer Institute, Bethesda, Maryland, USA (C.D.H.).

    • Scott V Bratman &
    • Chuong D Hoang
  2. These authors contributed equally to this work.

    • Andrew J Gentles &
    • Aaron M Newman
  3. These authors jointly directed this work.

    • Sylvia K Plevritis &
    • Ash A Alizadeh

Affiliations

  1. Center for Cancer Systems Biology (CCSB), Stanford University, Stanford, California, USA.

    • Andrew J Gentles,
    • Sylvia K Plevritis &
    • Ash A Alizadeh
  2. Department of Radiology, Stanford University, Stanford, California, USA.

    • Andrew J Gentles &
    • Sylvia K Plevritis
  3. Institute for Stem Cell Biology and Regenerative Medicine, Stanford University, Stanford, California, USA.

    • Aaron M Newman,
    • Chih Long Liu,
    • Scott V Bratman,
    • Weiguo Feng,
    • Dongkyoon Kim,
    • Maximilian Diehn &
    • Ash A Alizadeh
  4. Department of Medicine, Division of Oncology, Stanford Cancer Institute, Stanford University, Stanford, California, USA.

    • Aaron M Newman,
    • Chih Long Liu &
    • Ash A Alizadeh
  5. Department of Radiation Oncology, Stanford University, Stanford, California, USA.

    • Scott V Bratman,
    • Weiguo Feng &
    • Maximilian Diehn
  6. Department of Medicine, Division of Pulmonary and Critical Care Medicine, Stanford University, Stanford, California, USA.

    • Viswam S Nair
  7. Department of Cardiothoracic Surgery, Division of Thoracic Surgery, Stanford University, Stanford, California, USA.

    • Yue Xu,
    • Amanda Khuong &
    • Chuong D Hoang
  8. Stanford Cancer Institute, Stanford University, Stanford, California, USA.

    • Maximilian Diehn &
    • Ash A Alizadeh
  9. Department of Pathology, Stanford University, Stanford, California, USA.

    • Robert B West
  10. Department of Medicine, Division of Hematology, Stanford Cancer Institute, Stanford University, Stanford, California, USA.

    • Ash A Alizadeh

Contributions

A.J.G., S.K.P. and A.A.A. conceived PRECOG, and A.M.N. and A.A.A. conceived immune-PRECOG. A.J.G., A.M.N. and A.A.A. designed the framework, collected and curated the primary data, developed strategies for implementation and optimizations in related experiments, analyzed the data, and wrote the paper. A.M.N. and A.J.G. wrote all bioinformatics software for PRECOG and related analyses. A.J.G. and C.L.L. implemented web infrastructure for hosting PRECOG. S.V.B., V.S.N., R.B.W. and M.D. curated the NSCLC tumor GEP and TMA data, including clinical annotations. Y.X., A.K. and C.D.H. identified and provided viable NSCLC patient specimens. D.K. and W.F. assisted with flow cytometry characterizations of primary NSCLC tumor specimens and enumeration of corresponding TALs. V.S.N. and R.B.W. constructed the NSCLC TMA, and R.B.W. performed in situ hybridizations and immunohistochemical characterizations for TALs. A.A.A. and S.K.P. contributed equally as senior authors to supervising and funding the project. All authors discussed the results and their implications, and commented on the manuscript at all stages.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

PDF files

  1. Supplementary Text and Figures (16,665 KB)

    Supplementary Figures 1–8

Excel files

  1. Supplementary Data 1 (10,688 KB)

    PRECOG meta-z matrix and source data

  2. Supplementary Data 2 (90 KB)

    Prognostic genes shared across multiple cancers or specific to individual cancers, and related analyses

  3. Supplementary Data 3 (631 KB)

    Clusters of prognostic genes and corresponding functional annotations

  4. Supplementary Data 4 (26 KB)

    Bivariate models incorporating FOXM1 and KLRB1 expression levels across cancer types, and significance of a FOXM1-KLRB1 score in multivariate models with clinical parameters

  5. Supplementary Data 5 (190 KB)

    Protein-protein association data for the top pan-cancer prognostic genes in PRECOG; analysis of transcription factors and their target genes in PRECOG

  6. Supplementary Data 6 (49 KB)

    CIBERSORT-inferred fractions of tumor-associated leukocytes across 25 malignancies

  7. Supplementary Data 7 (37 KB)

    Lung adenocarcinoma TMA analyses, including clinical data and marker quantification, multivariate survival analysis with clinical covariates, and comparison of TAL levels with circulating leukocytes
