Introduction

Human nuclear receptors (NRs) form a superfamily of 48 evolutionarily related transcription factors that rely on ligand binding (endogenous ligands: hormones, vitamins, and dietary lipids; exogenous ligands: pharmaceutical agents and toxins) and co-regulator recruitment to mediate the transcriptional activity of target genes1,2,3. NRs are typically composed of five common domains: (i) nonconserved N-terminal A/B domain, (ii) highly conserved DNA-binding domain (DBD, C domain), (iii) flexible hinge between the DBD and LBD regions (D domain), (iv) moderately conserved ligand-binding domain (LBD, E domain), and (v) nonconserved C-terminal F domain1. Early phylogenetic studies further classified the NR superfamily into seven subfamilies or classes based on sequence similarity, namely thyroid hormone receptor-like NRs (class I), retinoid X receptor-like NRs (class II), estrogen receptor-like NRs (class III), nerve growth factor IB-like NRs (class IV), steroidogenic factor-like NRs (class V), germ cell nuclear factor (class VI), and class 0 NRs (NR0B1 and NR0B2) that lack a DBD4,5. Though ligand binding predominantly occurs in the nucleus, a number of class III NRs (ESR1 (NR3A1), GR (NR3C1), MR (NR3C2), PR (NR3C3) or AR (NR3C4)) bind to their respective ligands in the cytoplasm, leading to subsequent NR translocation to the nucleus6,7. Despite similar structural architecture, differences in NR sequence homology (NR classes 0-VI), ligand binding (endocrine NRs, orphan NRs or adopted NRs), and NR-NR interaction (homo- and heterodimerization) ultimately result in tissue-specific responses7,8,9,10. NRs are therefore able to control a number of pivotal physiological processes (e.g. development, metabolism, reproduction, cell cycle, differentiation) and diseases (e.g. cancer, osteoporosis, diabetes, cardiovascular disease)8,11.

Consequently, about 16% of FDA-approved drugs currently target NRs, further highlighting the importance of NRs in human disease3. Due to their effect on various cancer-related processes (e.g. tumor initiation and therapeutic response), NRs have become attractive targets for anticancer drug development10,12. Tamoxifen, an estrogen receptor alpha (ERα) antagonist, was first introduced as a palliative agent for advanced breast cancer during the 1970s, but was later proven to be an effective adjuvant therapy for ERα-positive breast cancer patients13. Other pharmaceutical agents have since been FDA-approved for prostate cancer (androgen receptor (AR) antagonists), acute promyelocytic leukemia (retinoic acid receptor (RAR) agonists), AIDS-related Kaposi’s sarcoma (RAR and retinoid X receptor (RXR) agonists), and cutaneous T-cell lymphoma (RXR modulators), while others are currently in clinical trials (ER, AR, RAR, RXR, glucocorticoid receptor (GR), RAR-related orphan receptor (ROR), vitamin D receptor (VDR), peroxisome proliferator-activated receptor (PPAR), liver X receptor (LXR), and farnesoid X receptor (FXR))10. Other less well-known NRs, such as NR1I2 (PXR) and NR1I3 (CAR), have been shown to have an effect on the pharmacokinetics and pharmacodynamics of anticancer drugs14,15.

High-throughput sequencing technologies have been used to develop comprehensive insights into NR function and potential interplay between different NRs in cancer8,16. A pan-cancer study in six cancer types recently demonstrated that recurrent downregulation of NRs in cancer is only partially due to deletion or mutation17. Yet, our understanding of the impact global NR gene expression patterns have on patient clinical outcome is still limited in most cancer forms. Here, NR gene expression patterns were systematically mapped in relation to prognosis in 33 cancer types for 8,526 patients using genomic and clinical data from The Cancer Genome Atlas (TCGA), thereby pinpointing a number of interesting NR targets for future cancer drug development.

Results

RNA-seq analysis defines four main NR expression patterns in cancer

RNA-seq data for 8,526 TCGA patient samples were used to evaluate mRNA expression patterns for the 48 human NRs across 33 cancer types and 11 pan-organ groups (Tables 1 and 2). Evaluation of the gene expression profiles revealed four main expression patterns in the different neoplastic tissues, i.e. absent (absent to low expression in 100% of tissues), restricted (expressed (defined as moderate to high expression levels) in <50% of tissues), widespread (expressed in >50%, but <100% of tissues), and ubiquitous (expressed in 100% of tissues). In total, five NRs (10%; ESR2, ESRRB, NR2E3, NR6A1, RORB) were not expressed in any tissue, whereas 22 NRs (46%; AR, ESR1, ESRRG, HNF4A, HNF4G, NR0B1, NR0B2, NR1H4, NR1I2, NR1I3, NR2E1, NR2F1, NR3C2, NR4A3, NR5A1, NR5A2, PGR, PPARG, RARB, RORC, RXRG, THRB) showed restricted expression patterns in specific ‘Pan-Cancer’ organ systems, e.g. gynecologic, endocrine, urologic, central nervous system, gastrointestinal, and thoracic (Supplementary Table 1). In contrast, 11 NRs (23%; NR1D1, NR1H3, NR2F2, NR2F6, NR3C1, NR4A2, PPARA, RARG, RORA, THRA, VDR) had widespread expression and 10 NRs (21%; ESRRA, NR1D2, NR1H2, NR2C1, NR2C2, NR4A1, PPARD, RARA, RXRA, RXRB) were ubiquitous. Interestingly, ESRRG (KICH), NR0B1 (ACC), NR1I3 (LIHC), NR2E1 (GBM), and NR5A1 (ACC) were each expressed in only one neoplastic tissue (Fig. 1). Unsupervised hierarchical clustering of the expression profiles stratified the cohort fairly well by cancer type and pan-organ group. With the exception of three clusters of NRs representing NR classes I (cluster I: RORC, VDR, PPARA, NR1D1, THRA, RORA, NR1H3; cluster II: RARB, PPARG, THRB) and III (cluster III: ESR1, PGR, AR, ESRRG), NR class was not a good determinant of NR expression patterns in the different cancer types.
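The four expression patterns defined above can be expressed as a simple decision rule. The following is a minimal Python sketch rather than the code used in the study: the function name and the input format (one boolean per tissue, True meaning moderate to high expression) are hypothetical, and because the text does not say how a gene expressed in exactly 50% of tissues is handled, that case is assigned to "widespread" here as an assumption.

```python
def classify_expression_frequency(expressed_flags):
    """Assign one of the four expression patterns to a single gene.

    expressed_flags: list of booleans, one per tissue, where True means
    the gene shows moderate to high expression in that tissue
    (hypothetical input format chosen for this sketch).
    """
    if not expressed_flags:
        raise ValueError("at least one tissue is required")

    fraction = sum(expressed_flags) / len(expressed_flags)

    if fraction == 0:
        return "absent"        # absent to low expression in 100% of tissues
    if fraction == 1:
        return "ubiquitous"    # expressed in 100% of tissues
    if fraction < 0.5:
        return "restricted"    # expressed in <50% of tissues
    # Assumption: exactly 50% is treated as widespread, since the
    # definitions in the text leave this boundary unspecified.
    return "widespread"        # expressed in >50% but <100% of tissues
```

For instance, a gene flagged as expressed in a single tissue out of 33 would be labelled "restricted" by this rule.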

Table 1 TCGA cancer types and corresponding pan-cancer organ system.
Table 2 The 48 human nuclear receptors and associated ligands.
Figure 1

Human nuclear receptors display relatively similar expression patterns across ‘Pan-Cancer’ diseases. Heatmap depicting RNA-seq gene expression for 48 human NRs in 8,526 TCGA samples representing 33 ‘Pan-Cancer’ diseases. Hierarchical clustering was performed using the Manhattan distance metric and Ward’s minimum variance method (Ward.D2). Gene expression is shown in log10 normalized RSEM.

Differential gene expression reveals cancer-associated human NRs

To identify cancer-related NRs, differential gene expression was assessed in cancer (n = 5,507) and corresponding normal tissue (n = 627) for 16 of the 33 pan-cancers with available gene expression data for normal samples. On average, 33.9 ± 1.60 (±SEM, range 23–42) NRs were differentially expressed per cancer type and 11.4 ± 0.40 (range 5–16) cancer types were associated with each NR (Fig. 2A–C). In addition, lower NR expression levels were prevalent in cancer compared with normal tissue. Interestingly, NR3C2, PGR, and RORA were differentially expressed in all 16 cancer types, while HNF4G was differentially expressed in only 5/16 cancers (31.3%; Fig. 2C). The highest number of cancer-related NRs was found in LUSC (42 NRs, Fig. 3), KIRC (41 NRs), BRCA (39 NRs), LIHC (39 NRs), and LUAD (39 NRs), whereas only 23 NRs were differentially expressed in GBM.
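Because 48 NRs were tested per cancer type, the Wilcoxon p-values were corrected with the Benjamini-Hochberg procedure before counting differentially expressed NRs. The sketch below illustrates that correction step in plain Python, under the assumption that the raw p-values are already available as a list; it is a generic implementation of the procedure, not the rstatix code used in the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity so that
    # an adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def count_significant(p_values, alpha=0.05):
    """Count tests whose adjusted p-value falls below alpha."""
    return sum(1 for p in benjamini_hochberg(p_values) if p < alpha)
```

Applied to the 48 per-gene p-values of one cancer type, `count_significant` would give the kind of per-cancer tally summarized in Fig. 2B.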

Figure 2

NRs are differentially expressed in normal and cancer tissue. (A) Heatmap of Benjamini-Hochberg adjusted p-values using the Wilcoxon test depicting differences in RNA-seq gene expression levels for 16 ‘Pan-Cancer’ forms and corresponding normal tissue. Hierarchical clustering was performed using the Manhattan distance metric and Ward’s minimum variance method (Ward.D2). Statistical significance is shown in −log10[adjusted p-value], where P < 0.05 corresponds to −log10[adjusted p-value] >1.3 (light green), P ≤ 0.01 corresponds to −log10[adjusted p-value] >2 (blue green), P ≤ 0.001 corresponds to −log10[adjusted p-value] >3 (green), and P ≤ 0.0001 corresponds to −log10[adjusted p-value] >4 (dark blue). (B) Bar chart depicting the number of differentially expressed NRs (cancer vs normal) that were identified per cancer type (corresponds to the number of green to blue colored rows in the heatmap). (C) Bar chart depicting the number of cancer types associated with over- (blue bars) and underexpression (yellow bars) of each NR in cancer compared with normal tissue (corresponds to the number of green to blue colored columns in the heatmap).

Figure 3

Strong association between NR gene expression and the LUSC cancer form. The highest number of differentially expressed NRs (42/48 NRs) was found in the LUSC cancer form. Box plots showing differences in NR gene expression levels between cancer and corresponding normal tissue for the LUSC cancer form. The Wilcoxon test was used to calculate statistical significance (Benjamini-Hochberg adjusted p-values). ns = not significant (P > 0.05); *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.

Pearson correlation demonstrates distinct patterns of NR co-expression in cancer

Pairwise Pearson correlation was then used to assess co-expression of NRs in 21 of the 33 cancer types (Supplementary Figures). Examination of mRNA expression in all 21 cancer types revealed positive correlation within three NR gene clusters, namely 1) NR4A1, NR4A2, NR4A3 (NR class IV), 2) AR, ESR1, ESRRG, NR2E3, NR3C2, PGR, RORC, THRB (NR class I/II/III), and 3) HNF4A, HNF4G, NR0B2, NR1H3, NR1H4, NR1I2, NR5A2, PPARA, PPARG (NR class 0/I/II/V; Fig. 4A). However, individual cancer types were also found to exhibit distinct NR co-expression patterns (Fig. 4B–D). NR expression was generally weakly to moderately correlated (absolute correlation coefficient |r| between 0.2 and 0.6) with the expression of other NRs in most neoplastic tissues. As expected, strong positive correlation (r > 0.6) was observed between the ESR1, AR, PGR, and RARA genes in BRCA. Intriguingly, evidence of NR crosstalk was found between NR class IV genes (NR4A1, NR4A2, NR4A3) in 20/21 cancer types (absent in SKCM). Only 6/21 cancer types (GI pan-organ system: ESCA, PAAD, STAD; Urologic pan-organ system: KICH, PRAD; Hematologic/lymphatic pan-organ system: THYM) contained ≥20 strongly correlated (|r| > 0.6) NR gene pairs (Supplementary Table 2). In total, 32 NR gene pairs were co-expressed in ESCA, several of which included the HNF4A, HNF4G, NR0B2, NR1I2, NR3C2, NR4A1, and NR5A2 genes. Additionally, KICH, PAAD, PRAD, and STAD cancers were found to be associated with a number of NR gene pairs containing at least one NR class III gene (estrogen receptor-like NRs, e.g. AR, ESR1, ESRRA, ESRRB, ESRRG, NR3C1, NR3C2, and PGR), whereas THYM was strongly associated with NR class I genes (thyroid hormone receptor-like NRs, e.g. PPAR, RAR, ROR genes).
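The co-expression screen described above amounts to computing Pearson's r for every gene pair and keeping pairs whose absolute coefficient exceeds 0.6. The following Python sketch shows one way to do this with the standard library only. It assumes a hypothetical input dictionary that maps each gene symbol to its expression values across the same ordered set of samples, and it omits the P < 0.05 significance filter applied in the correlation plots.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def strongly_correlated_pairs(expression, threshold=0.6):
    """List gene pairs with |r| above the threshold.

    expression: dict mapping gene symbol -> list of expression values,
    all measured on the same samples in the same order (assumed format).
    """
    genes = sorted(expression)
    pairs = []
    for i, gene_1 in enumerate(genes):
        for gene_2 in genes[i + 1:]:
            r = pearson_r(expression[gene_1], expression[gene_2])
            if abs(r) > threshold:
                pairs.append((gene_1, gene_2, r))
    return pairs
```

Running `strongly_correlated_pairs` once per cancer type and counting the returned pairs would reproduce the kind of per-cancer tally reported in Supplementary Table 2.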

Figure 4

Pairwise Pearson correlation plots between NR gene expression in different ‘Pan-Cancer’ diseases. Correlation matrices for (A) the 21 ‘Pan-Cancer’ diseases, (B) BRCA, (C) ESCA, and (D) PAAD, with genes ordered using hierarchical clustering with the Ward’s minimum variance method (Ward.D2). Positive correlation coefficients are displayed in blue and negative correlation coefficients in red color. The color intensity and circle size are proportional to the correlation coefficients (P < 0.05), while correlation coefficients with P > 0.05 are blank.

The prognostic significance of NRs depends on the cancer type

Furthermore, the prognostic potential of NR expression was examined in 21 pan-cancers using the web-based Kaplan-Meier (KM) plotter tool with dichotomized gene expression (high and low expression) and overall survival times (Supplementary Figures). Although a number of ‘Pan-Cancer’ diseases (BRCA/CESC and PAAD/READ) and NRs (NR class I: THRA/THRB, RORA/PPARA, RARB/RORB, NR1H3/NR1H4/PPARD, and RARA/NR1H2/NR1I3; NR class II: HNF4A/HNF4G, NR2F1/RXRA/RXRG, and NR2E3/NR2C2/RXRB; NR class IV: NR4A1/NR4A2) belonging to the same groups clustered together, hierarchical clustering of the log-rank test p-values showed no clear correlation between prognostic potential and pan-organ system or NR class (Fig. 5A). On average, 21.2 ± 1.5 (±SEM, range 6–37) NRs were significantly associated with overall survival per cancer type and 9.2 ± 0.3 (range 5–13) cancer types were associated with each prognostic NR (Fig. 5B). Although NRs were generally found to be underexpressed in cancer compared with corresponding normal tissue, both high and low NR expression correlated with adverse clinical outcome (Fig. 5C). Consequently, the prognostic significance of an individual NR frequently differed for cancer types in the same ‘Pan-Cancer’ organ system. NR2E1 was the only NR for which the same expression pattern was associated with shorter overall survival in all cancer types within a ‘Pan-Cancer’ organ system (high NR2E1 expression in BRCA, CESC, OV, and UCEC among gynecologic pan-cancers). In contrast, the prognostic potential of the remaining 47 NRs was frequently found to be connected with diverse expression patterns in different cancer types within a ‘Pan-Cancer’ organ system and NR class. For example, high PPARG expression was shown to be associated with decreased risk for BRCA and increased risk for CESC in the gynecologic organ system, and decreased risk for READ/STAD and increased risk for LIHC/PAAD cancers in the GI organ system. Furthermore, high PPARG expression was found to have a protective effect in five cancer types, e.g. BLCA (HR = 0.5, 95% CI 0.35–0.7, P = 6.7e-05; Fig. 6A), and an adverse effect in seven cancer types, e.g. LIHC (HR = 2.18, 95% CI 1.51–3.14, P = 2e-05; Fig. 6B). The effect of PPARG expression on patient clinical outcome thereby depended on the cancer type (Fig. 6C).
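The survival comparisons above rest on splitting patients into high and low expression groups and comparing their overall survival with the log-rank test. The sketch below is an illustrative pure-Python version of those two steps and is not the KM plotter implementation. The function names and input format are hypothetical; patients with expression equal to the median are placed in the low group as an assumption; and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math
import statistics


def split_by_median(expression):
    """Flag each patient as high (True) or low (False) expression.

    Assumption: values equal to the median are assigned to the low group.
    """
    cutoff = statistics.median(expression)
    return [value > cutoff for value in expression]


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*: follow-up times; events_*: True if death was observed,
    False if the patient was censored at that time.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        deaths_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        deaths_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b

        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the deaths in group A at time t.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value
```

Hazard ratios and confidence intervals such as those in Fig. 6 additionally require a Cox proportional hazards fit, which is outside the scope of this sketch.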

Figure 5

NRs are associated with clinical outcome for several ‘Pan-Cancer’ forms. (A) Heatmap of log-rank test p-values depicting the effect of NR gene expression on overall survival for 21 ‘Pan-Cancer’ forms. The ESCA ‘Pan-Cancer’ disease is shown as ESCA_A (esophageal adenocarcinoma) and ESCA_S (esophageal squamous cell carcinoma). Hierarchical clustering was performed using the Manhattan distance metric and Ward’s minimum variance method (Ward.D2). Statistical significance is shown in –log10[p-value], where P < 0.05 corresponds to −log10[p-value] >1.3 (light green), P ≤ 0.01 corresponds to −log10[p-value] >2 (blue green), P ≤ 0.001 corresponds to −log10[p-value] >3 (green), and P ≤ 0.0001 corresponds to −log10[p-value] >4 (dark blue). (B) Bar chart depicting the number of identified prognostic NRs per cancer type (corresponds to the number of green to blue colored rows in the heatmap). (C) Bar chart depicting the number of cancer types associated with high (blue bars) and low expression (yellow bars) for each prognostic NR (corresponds to the number of green to blue colored columns in the heatmap).

Figure 6

Gene expression of the PPARG nuclear receptor is significantly associated with overall survival in cancer. (A,B) Kaplan–Meier analysis of PPARG expression in the BLCA and LIHC cohorts. Estimates of the probability of overall survival according to quantile expression (low or high expression). P-values, hazard ratios (HR), and 95% confidence intervals (95% CI) were calculated using the log-rank test and Cox proportional hazards regression, respectively. The x-axes depict months after initial diagnosis and the y-axes depict overall survival. (C) Forest plots illustrating univariate Cox regression analysis of the prognostic impact of PPARG expression on overall survival in 19 ‘Pan-Cancer’ forms. The x-axis is in log scale. HR <1 depicts the association between high PPARG expression and decreased risk, whereas HR >1 illustrates the association between high PPARG expression and increased risk.

Discussion

Although the spatial (expression in different tissues) and temporal (circadian regulation) effects of NR function have been studied extensively in normal mouse tissues, no large-scale studies have yet been conducted in human cancers17,18,19. Therefore, publicly accessible TCGA data containing genome-wide molecular datasets and matching clinical information offer a unique opportunity to examine the relationship between NR expression and prognosis in a range of cancer forms. This comprehensive analysis of global NR gene expression patterns in 33 TCGA cancer types provides a detailed description of NR expression and potential co-expression in specific neoplastic tissues and pan-cancer organ groups. Here, the vast majority of NRs were shown to be expressed in specific neoplastic tissues (restricted), most tissues (widespread), or all tissues (ubiquitous). Consistent with a previous report, NR expression was generally down-regulated in cancer compared with corresponding normal tissue17. However, this is the first report to evaluate the clinical utility of the NR superfamily in cancer using survival analysis.

Crystallographic studies have improved our knowledge of how one or more NR polypeptides form dimers (mono-, homo- or heterodimeric NRs) that eventually bind to DNA response elements (DNA direct repeats, palindromic repeats or monomeric sites) in the nucleus9. Intriguingly, gene expression analysis showed that approximately 20% of NRs (ESRRA, NR1D2, NR1H2, NR2C1, NR2C2, NR4A1, PPARD, RARA, RXRA, RXRB) were ubiquitously expressed in all 33 pan-cancers. Furthermore, pairwise Pearson correlation for 21/33 cancer types revealed recurrent correlation between the expression patterns for multiple class I-III NRs and retinoid X receptors (RXRs), as well as strong positive correlation between class IV NRs (NR4A1, NR4A2, NR4A3) in 20/21 pan-cancers. Although less prevalent, negative correlation was also observed, e.g. RARA and PPARA, ESR1 and PPARA, NR2E1 and RARA/AR/ESR1/PGR, NR2C2 and NR2F6/NR1H2 in BRCA, which is in line with previous reports20. Correlation between expression patterns for RXR genes and other NRs is not surprising since RXRs are common heterodimer partners with class I/II NRs (e.g. RAR, TR, VDR, LXR, PPAR, FXR, PXR, CAR)7,9. Class IV NRs have previously been associated with urologic malignancies such as bladder urothelial carcinoma, kidney renal carcinoma, and prostate adenocarcinoma, but this is the first report of widespread coordinated expression between these NRs in cancer8.

Survival analysis demonstrated that the prognostic potential of NR expression is predominantly dependent on cancer type, rather than on NR class. Each cancer type and NR were shown to be associated with ≥6 prognostic NRs and ≥5 different cancers, respectively. However, the expression levels (low or high expression) of individual prognostic NRs frequently differed between cancer types. For example, high PPARG expression correlated with decreased mortality risk for five cancer types (BLCA, BRCA, KIRC, READ, and STAD) and increased risk for eight cancer types (CESC, HNSC, LIHC, LUAD, PAAD, PCPG, SARC, THCA). Surprisingly, NR2E1 was the only NR to display similar expression levels (high expression) in association with overall survival in all cancer types (BRCA, CESC, OV, and UCEC) within a pan-cancer organ system (gynecologic pan-cancers).

In summary, this integrative pan-cancer analysis provides a detailed overview of the effects of NR expression on clinical outcome, thereby highlighting the importance of NRs in cancer. This work confirmed previously identified relationships between individual NRs and specific cancer types and revealed novel clinically relevant NRs. Taken together, these findings may therefore prompt a reevaluation of certain NRs as potential actionable targets for various cancer forms.

Methods

Patient cohorts and data acquisition

Genomic and clinical data for 33 cancer types from The Cancer Genome Atlas (TCGA) consortium were retrieved from Broad GDAC Firehose (https://gdac.broadinstitute.org/). The patient cohorts were further stratified into 11 pan-organ systems (central nervous system (CNS), endocrine, gastrointestinal, gynecologic, head and neck, hematologic and lymphatic malignancies, melanocytic, neural-crest derived, soft tissue, thoracic, urologic).

UNC RNASeqV2 level 3 expression (normalized RSEM) data for the 48 human NRs (AR, ESR1, ESR2, ESRRA, ESRRB, ESRRG, HNF4A, HNF4G, NR0B1, NR0B2, NR1D1, NR1D2, NR1H2, NR1H3, NR1H4, NR1I2, NR1I3, NR2C1, NR2C2, NR2E1, NR2E3, NR2F1, NR2F2, NR2F6, NR3C1, NR3C2, NR4A1, NR4A2, NR4A3, NR5A1, NR5A2, NR6A1, PGR, PPARA, PPARD, PPARG, RARA, RARB, RARG, RORA, RORB, RORC, RXRA, RXRB, RXRG, THRA, THRB, VDR; Table 2) were retrieved from Broad GDAC Firehose for 8,526 TCGA tumor specimens and 627 normal specimens. The prognostic significance of the 48 NRs was assessed using the web-based Kaplan-Meier (KM) plotter tool (http://kmplot.com/analysis/index.php?p=service&cancer=pancancer_rnaseq) with 7,489 TCGA RNA-seq datasets representing 21 different ‘Pan-Cancer’ diseases (the ESCA cohort was stratified into ESCA_A (esophageal adenocarcinoma) and ESCA_S (esophageal squamous cell carcinoma)). The patient cohorts are described in detail in Table 1 (ref. 21).

Statistical analysis

Statistical analyses were performed using a 0.05 p-value cutoff in R/Bioconductor (version 3.6.0). All p-values are two-sided. The distribution of NR gene expression levels was evaluated in each cancer type by calculating quantile expression (Q1–Q4) using log10-transformed RNA-seq data. Expression levels were then classified as not expressed (Q1 (0–25%: -Inf to 0.98), defined as absent, and Q2 (25–50%: 0.98 to 2.32), defined as low expression) or expressed (Q3 (50–75%: 2.32 to 2.93), defined as moderate, and Q4 (75–100%: 2.93 to 5.13), defined as high expression). The frequency of NR expression in a given cancer type was defined as absent (absent to low expression in 100% of tissues), restricted (expressed in <50% of tissues), widespread (expressed in >50%, but <100% of tissues), and ubiquitous (expressed in 100% of tissues), as described elsewhere18. The KM plotter tool first dichotomized gene expression into high and low expression using median expression as a cut-off and then constructed Kaplan-Meier plots and calculated univariate Cox proportional hazards models for the 48 genes using overall survival (OS) and the log-rank test (Supplementary Figures). Hierarchical clustering of the log10-transformed RNA-seq data and -log10-transformed p-values (survival analysis) was performed with the pheatmap R package (version 1.0.12)22 using the Manhattan distance metric and Ward’s minimum variance method (Ward.D2). Box plots were constructed using the ggpubr (version 0.2.1.999)23 and rstatix (version 0.1.1.999)24 R packages to compare gene expression levels between cancer and normal samples with the Wilcoxon test and Benjamini-Hochberg adjusted p-values. Cancer types with no available normal samples (ACC, CESC, COAD, DLBC, LAML, LGG, MESO, OV, PAAD, READ, SARC, SKCM, TGCT, THYM, UCEC, UCS, UVM) were excluded from the analysis. The pairwise Pearson’s correlation coefficient (r) was calculated per gene pair using the base stats R package to determine the level of co-expression. Gene expression correlation matrices were visualized using the corrplot R package (version 0.84)25 with Ward.D2 hierarchical clustering and P < 0.05 (95% CI). Forest plots were used to display hazard ratios (HR) for the effect of gene expression on overall survival with the forestplot R package (version 1.9)26.
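The quartile-based expression classes described above can be illustrated with a short Python sketch that applies the reported log10 cutoffs (0.98, 2.32, and 2.93) to a single normalized RSEM value. This is an illustration only, not the R code used in the study. The function names are hypothetical, and because the text does not state which class a value lying exactly on a cutoff belongs to, such values are assigned to the lower class here as an assumption.

```python
import math

# Quartile boundaries on the log10 scale, as reported in the text.
ABSENT_MAX = 0.98     # upper bound of Q1 (absent)
LOW_MAX = 2.32        # upper bound of Q2 (low)
MODERATE_MAX = 2.93   # upper bound of Q3 (moderate)


def expression_level(rsem):
    """Map a normalized RSEM value to absent, low, moderate, or high."""
    if rsem < 0:
        raise ValueError("RSEM values cannot be negative")
    # log10(0) is treated as -Inf, matching the lower bound of Q1.
    log_value = math.log10(rsem) if rsem > 0 else float("-inf")

    # Assumption: a value exactly on a cutoff falls into the lower class.
    if log_value <= ABSENT_MAX:
        return "absent"
    if log_value <= LOW_MAX:
        return "low"
    if log_value <= MODERATE_MAX:
        return "moderate"
    return "high"


def is_expressed(rsem):
    """A gene counts as expressed when its level is moderate or high."""
    return expression_level(rsem) in ("moderate", "high")
```

The booleans returned by `is_expressed` across tissues are the kind of input from which the absent, restricted, widespread, and ubiquitous frequencies are derived.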