Prognostic significance of AP-2α/γ targets as cancer therapeutics

Identifying genes with prognostic importance could improve cancer treatment. An increasing number of reports suggest the existence of successful strategies based on seemingly “untargetable” transcription factors. In addition to embryogenesis, AP-2 transcription factors are known to play crucial roles in cancer development. Members of this family can be used as prognostic factors in oncological patients, and AP-2α/γ transcription factors were previously investigated in our pan-cancer comparative study using their target genes. The present study investigates tumors that were previously found similar with an emphasis on the possible role of AP-2 factors in specific cancer types. The RData workspace was loaded back to R environment and 3D trajectories were built via Monocle3. The genes that met the requirement of specificity were listed using top_markers(), separately for mutual and unique targets. Furthermore, the candidate genes had to meet the following requirements: correlation with AP-2 factor (through Correlation AnalyzeR) and validated prognostic importance (using GEPIA2 and subsequently KM-plotter or LOGpc). Eventually, the ROC analysis was applied to confirm their predictive value; co-dependence of expression was visualized via BoxPlotR. Some similar tumors were differentiated by AP-2α/γ targets with prognostic value. Requirements were met by only fifteen genes (EMX2, COL7A1, GRIA1, KRT1, KRT14, SLC12A5, SEZ6L, PTPRN, SCG5, DPP6, NTSR1, ARX, COL4A3, PPEF1 and TMEM59L); of these, the last four were excluded based on ROC curves. All the above genes were confronted with the literature, with an emphasis on the possible role played by AP-2 factors in specific cancers. Following ROC analysis, the genes were verified using immunohistochemistry data and progression-related signatures. Staining differences were observed, as well as co-dependence on the expression of e.g. CTNNB1, ERBB2, KRAS, SMAD4, EGFR or MKI67. In conclusion, prognostic value of targets suggested AP-2α/γ as candidates for novel cancer treatment. It was also revealed that AP-2 targets are related to tumor progression and that some mutual target genes could be inversely regulated.


Results
3D trajectories revealed the dissimilarities across tumor types. As a preliminary remark, in our previous study 10 the analysis included global profiling of twenty-one tumors via Monocle3 and was directed to functionally annotate whole gene sets. In the current research, it was decided to focus more specificially on individual AP-2 targets and their significance as cancer therapeutics. Only the tumors that formerly could not be clearly distinguished were included in this study-this has resulted in an analysis which uses RData from a previous study where visualization with Uniform Manifold Approximation and Projection (UMAP) was preliminarily performed. Due to the reduction in the number of cohorts to eleven, the graphs were re-learned to ensure the best dimensional distribution. The distribution was visualized with regard to the full list of gene targets for a given transcription factor; the trajectories are presented in Fig. 1A and B, while the example three-dimensional extensions are visualized in Fig. 1C and D. Complete and interactive 3D plots corresponding to Fig. 1C and D are deposited as Supplementary File S1 and Supplementary File S2, respectively.
Despite being very similar at first glance, subtle differences were observed. For example, some STAD samples were better separated from the rest of the cohort when visualized through AP-2γ target genes. Likewise, UCEC was more distinct from other surrounding cohorts when AP-2α target genes were applied. Nevertheless, the dissimilarities between tumor types can be seen more clearly in 3D trajectories. Additionally, the tumors were grouped based on the expression level of each AP-2 factor (a common median cut-off was applied for all cancers at once). The results indicate that some tumor types demonstrate higher AP-2α/γ expression than others, and that some are more or less uniform in expression (Fig. 2).
For example in STAD or UCEC, a greater number of AP-2α "high" samples can be seen compared to AP-2γ "high", while in BLCA or LUAD, more AP-2γ "high" samples are found compared to AP-2α. Generally, it seems that both AP-2 factors demonstrate higher expression in the heterogenous mixed cluster (containing BLCA, CESC, ESCA, HNSC, LUSC, UCEC) compared to the other distinct clusters/cohorts. Between tumors there are unique target genes of prognostic importance. The study identified unique AP-2α/γ targets between tumors or in a specific cancer. For tumors, this indicated AP-2 target genes which are the most specific for each tumor when compared to others. Only the genes that simultaneously satisfy the demands of specificity, correlation with an AP-2 factor and survival prediction (with external validation) are shown in detail (see penultimate subsection of Results for total number of genes included from each methodology branch). As indicated in Fig. 3A and B (respectively for AP-2α and AP-2γ), our findings indicate that while single gene can be expressed in a few tumors, most are associated more closely with a specific cancer type. For example, CDX1 (an AP-2α target) is highly expressed in colorectal carcinoma (COAD and READ cohorts) but it is also expressed to a certain extent in STAD. Likewise, the AP-2γ target ADAM23 is mainly expressed in LUSC  www.nature.com/scientificreports/ but also in ESCA or HNSC. As expected, some tumors demonstrate more specific expression of AP-2 target genes than others. All sixty-six genes were subjected to downstream investigation. Three met all the requirements: the AP-2αregulated EMX2 for UCEC, and the AP-2γ targets PTPRN and SCG5 for PAAD. Not only were they negatively correlated with the corresponding AP-2 factor, but they also significantly affected Disease-Free Survival (DFS), as confirmed externally. Regarding specificity, PTPRN was found to be the most specific (0.73), followed by EMX2 (0.72) and SCG5 (0.65); this can be seen in Fig. 4 (the last graph in each subfigure).
Unique targets can predict patient survival for a specific cancer type. Unique AP-2α or AP-2γ targets were also independently analyzed within a specific cancer by classifying AP-2α/γ expression as "high" or "low", with regard to a median cut-off point. The three genes with the highest specificity per "high" or "low"   www.nature.com/scientificreports/ group were identified for each AP-2 factor, giving six genes per tumor. As some AP-2 targets did not meet the minimal specificity requirement (Table 1), fewer genes were included in the downstream analysis. Eventually, it was found that seven targets of AP-2α (KRT1, COL7A1, TMEM59L, KRT14, SLC12A5, GRIA1, SEZ6L) were of prognostic importance for patients having BLCA, LUAD, LUSC or PAAD. Regarding genes regulated by AP-2γ, three targets (PPEF1, ARX, COL4A3) had prognostic value for individuals from LUAD, PAAD or UCEC cohort. Most of the AP-2α unique targets concerned PAAD; interestingly, they were all negatively correlated with transcription factor-encoding gene. In contrast, the remaining part (KRT1 for BLCA, COL7A1 for LUAD and TMEM59L for LUSC) was positively correlated with AP-2α. When it comes to AP-2γ, it correlated positively with PPEF1 and ARX while negatively with COL4A3. Figure 5 presents a summary for targets of both TFs. Differently-regulated mutual target genes were found in ESCA and LUAD. To identify genes whose expression could be regulated in opposite directions by AP-2α and AP-2γ within a single tumor type, each cancer was examined individually. Samples of specific tumor were compared regarding high/low phenotypes for the two AP-2 factors; this allowed changes in expression of mutual target genes to be established between phenotypes. Only two genes satisfied all criteria: DPP6 and NTSR1. AP-2α strongly downregulates the former (Fig. 6A), while AP-2γ appears to downregulate the latter (Fig. 6B). High DPP6 expression is associated with shorter survival in ESCA patients and NTSR1 with shorter survival in LUAD.
ROC curves confirmed that most of the selected genes are good predictors. The relevant targets from previous sections were first demonstrated in Table 2 to infer the possible AP-2 role in specific cancer, and then were subjected to Receiver Operating Characteristic (ROC) analysis to confirm predictive abilities. Binary classification depended on methodology branch e.g. for genes identified within a specific cancer the "high" and "low" groups of AP-2 factor(s) were used. The targets identified by the tumor vs tumor comparisons (EMX2, PTPRN, SCG5) required a representative cohort for ROC analysis; these three genes concern only UCEC and PAAD tumors. For EMX2 expression, UCEC was compared with CESC, an other carcinoma of the female reproductive system. For PTPRN and SCG5 expression, PAAD was compared to ESCA, since it was the closest cohort in terms of size (no tumor with a similar site of origin to PAAD was included in this study). Out of fifteen genes that were suitable for ROC analysis, COL4A3, PPEF1 and TMEM59L did not predict a binary outcome sufficiently: their Area Under the Curve (AUC) was respectively 0.57, 0.56 and 0.57. The usefulness of ARX is also www.nature.com/scientificreports/ Table 1. The top three most specific genes per "high" or "low" group in tumors for both AP-2α and AP-2γ. Underlined and in bold are genes that met > 0.6 requirement in terms of specificity estimated by top_markers().  Immunohistochemistry showed staining differences and the genes were found to be related to tumor progression. To complement the results of the survival analysis, all genes that met the AUC requirement (herein denoted as post-ROC genes) were subjected to further analysis of their immunohistochemical (IHC) data (Fig. 8). Out of eleven genes, there was insufficient or no data for DPP6 or SEZ6L and NTSR1 in www.nature.com/scientificreports/ Table 2. Supposed role of AP-2α and AP-2γ in cancer based on genes that met the study requirements.

Transcription factor Tumor Gene of interest
Prognosis when gene is highly expressed

Correlation with AP-2 factor
Possible role of AP-2 factor in cancer  The study also examined whether the expression of the post-ROC genes could co-depend on expression of other genes known to play a role in tumor progression (a group of five or six genes was selected for each tumor based on literature data). It was found that patients with high or low expression of each post-ROC gene presented varied expression of at least two progression-related genes (Fig. 9).

Discussion
Apart from their role in human development, AP-2 transcription factors are also known to influence carcinogenesis; as such, they have prognostic value for cancer patients 11,12 . Once regarded as "undruggable" DNAbinding proteins, transcription factors are now the object of studies examining selective modulators of their activity 8,13,14 . Among these factors, AP-2 is of particular significance since one of the members from this family, AP-2δ (encoded by the TFAP2D gene), is one of less than three hundred candidate transcription factors suitable for studies focused on examining selective modulators of transcription factor activity 15 .
This suggests that other AP-2 representatives also deserve attention with regard to TF-based targeted therapy. Our previous research on AP-2α and AP-2γ in twenty-one tumors from TCGA identified evident differences in the expression of their target genes between tissues 10 . The present study examined tumors whose independence was not so obvious. The identification of AP-2α/γ mutual or unique targets provided an indication of their relevance for a specific cancer and indirectly, whether AP-2α or AP-2γ might be worth consideration in forthcoming TF-based therapy.
To identify the most relevant AP-2α/γ targets, mutual and unique genes were considered separately. Only fifteen genes met the initial requirements of this study; some were excluded based on ROC curves. Prior to ROC analysis, these fifteen genes were confronted with the literature data, revealing their potential in cancer therapy. Below, they are discussed in sequence.
One of methodology branches was to investigate unique AP-2α/γ target genes between tumors. Only EMX2 (for AP-2α) and PTPRN or SCG5 (both for AP-2γ) were identified; however, they demonstrated very good discriminatory properties. EMX2 encodes a homeodomain-containing transcription factor essential for growth and differentiation 16 . It is also a fundamental protein necessary for the development of the reproductive tract 17,18 ; it is therefore not surprising that it was found to be specific for uterine corpus endometrial carcinoma in the current study: EMX2 expression was noted mainly within UCEC and to a small extent within BLCA. High expression was found to be favorable for survival in UCEC patients, which is in agreement with previous data indicating that it acts as a tumor suppressor in lung, kidney, colorectal, gastric cancers or in sarcoma 16,[19][20][21][22] . Moving forward, PTPRN encodes transmembrane receptor-type protein tyrosine phosphatase, mainly expressed in neuroendocrine tissues such as pancreas 23 . Its expression was found to be PAAD-specific in this study. High PTPRN expression was found to be favorable for survival in pancreatic adenocarcinoma patients. This contradicts data www.nature.com/scientificreports/ from other tumors; for example, its presence favors metastasis and migration promotion in LUAD 24 , and it is associated with worse prognosis of patients with hepatocellular carcinoma 25 or progression of gastric cancer 26 . However, little or no research has been performed on the role of PTPRN in PAAD, and it may be the case that the context is tissue-dependent, similar to GLDC gene function across tumors 27 . Lastly, SCG5 gene encodes secretogranin V, an essential chaperone involved in signaling that influences proliferation, among other things 28,29 . SCG5 was found to be implicated in polyposis syndrome, which is associated with colorectal cancer (CRC) 30 .
Our present findings indicate that SCG5 expression was specific for the PAAD cohort. This is in line with data regarding ability of secretogranin V to regulate prohormone convertase 2 (PC2), a neuroendocrine-specific proteinase 31 . In addition, the higher SCG5 expression was found to improve survival among PAAD patients, which is consistent with the literature 32 .
The other group of unique AP-2α/γ targets identified throughout this study were derived from comparisons within a specific cancer. From genetic information of COL7A1, the alpha chain of basement-membrane protein, type VII collagen, is formed. Its high expression was found to be unfavorable for survival among lung adenocarcinoma patients. Despite the lack of literature data on the role of COL7A1 in LUAD, similar observations have been made in patients with squamous-cell skin cancer 33 , laryngeal cancer 34 and gastric cancer 35 . In contrast, high expression of GRIA1, an ionotropic receptor involved in glutamate signaling, was found to be beneficial in PAAD; indeed, GRIA1 has previously been included in a gene-based risk score system constructed for patients with that tumor 36,37 . However, its function may depend on tissue type, since GRIA1 was shown to promote tumor progression in glioma 38 . Moving forward, KRT1 encodes a member of the intermediate filament superfamily, i.e. the clade comprising the cytokeratins: common markers of differentiation, migration and proliferation in epithelial cells 39 . KRT1 is used to molecularly distinguish muscle-invasive urothelial carcinoma into luminal or basal subtypes 40 ; as such, the identification of this target as an unfavorable prognostic marker in BLCA is useful. KRT1 was previously found to be involved in bladder tumorigenesis 41 , and to be associated with advanced tumor stage and worse prognosis in melanoma patients 42 . The other cytokeratin, KRT14, was also found to worsen the outcome in patients with PAAD. This is in line with data from ovarian and lung cancers 43,44 ; however, a study on www.nature.com/scientificreports/ pancreatic carcinoma suggests that KRT14 is not predictive of outcome (hazard ratio was similar as in our study but statistical significance was not met) 45 . Another gene indicated in the PAAD cohort was SEZ6L, encoding a transmembrane protein implicated in signal transduction, protein-protein interactions and complement regulation (via inhibition of C3 convertases and promotion of C3b degradation) 46 . High expression was favorable for PAAD patients, which is in line with other studies on pancreatic carcinoma 47 or lung cancer 48 . Similarly, a potassium chloride cotransporter encoded by SLC12A5, also improved DFS in PAAD; however, no previous research exists on this tumor (various members of the solute carrier family have been discussed in pancreas neoplasm but SLC12A5 is not among them 49 ). In fact, previous studies indicate that this gene promotes tumor invasion and metastasis in BLCA 50 or proliferation and G 1 /S cell cycle transition in colon cancer 51 . The last target unique for AP-2α was the transmembrane protein-encoding gene TMEM59L (also known as C19orf4). However, its role in LUSC and other cancers remains unclear 52 . The protein is believed to regulate apoptosis 53 and the GeneCards website indicates that TMEM59L modulates the glycosylation of amyloid precursor protein. The remaining genes from unique targets were found to be dependent on AP-2γ. ARX encodes a transcription factor that is crucial in regulating the endocrine pancreas development; it is mainly expressed in the central nervous system, skeletal muscles and aforementioned lineage of pancreatic cells 54,55 . This gene correlates with an aggressive course and frequent relapses in pancreatic neuroendocrine tumors 56,57 . Our present survival analysis suggests that high ARX expression is favorable for PAAD patients, which might suggest a subtype-specific behavior that is worth investigation since pancreatic tumors arising from endocrine or epithelial portion have different properties [58][59][60] .
The prognostic outcome of COL4A3 could also vary depending on cancer type. Our findings indicate that high expression was associated with shorter survival of UCEC patients. This corresponds to research on non-small cell lung (NSCLC) and breast cancers 61,62 but contradicts data from HNSC 63 . Finally, while the molecular function of phosphatase encoded by PPEF1 is not known in detail, it has been connected to apoptosis regulation and the response to calcium (Ca 2+ is also second messenger controlling cell death) 64 . It exerts a tumorigenic role in breast cancer 65 and was found to be overexpressed in pancreatic carcinoma 66 which confirms both its prognostic value in LUAD (poorer survival when highly expressed) and literature data regarding lung cancer 64 . The presence of any mutual targets that are differently regulated by AP-2α and AP-2γ within a specific cancer type was also investigated. Only two genes were found to be inversely regulated by both transcription factors and met all the other requirements: DPP6 (in ESCA cohort) and NTSR1 (in LUAD). The first gene stores information about membrane glycoprotein of dipeptidyl peptidase IV family which regulates apoptosis, differentiation or proliferation 67,68 . Our present findings indicate that high DPP6 expression worsens survival of ESCA patients which is in line with previous data regarding colon cancer progression 67 . However, as no DFS data was present in the validation cohort, the survival analysis of DFS with regard to DPP6 level was validated using overall survival outcome. Interestingly, this gene was found to be overexpressed in long-term survivors of study on esophageal cancer compared to those with shorter survival 69 , and DPP6 hypomethylation or hypermethylation has been noted depending on cancer type 67 . It is undoubtedly an important gene in esophagus neoplasms since, in addition to opposite regulation by AP-2α/γ, its expression is regulated by ARID3A, ZNF354C: two out of five key transcription factors crucial for carcinogenesis and development of esophageal squamous cell carcinoma 70 . Moreover, the gene encoding AP-2α (TFAP2A) was also found to significantly correlate with longer survival rate 70 . Considering how strong TFAP2A and DPP6 correlated in this study (R = − 0.97) and that AP-2γ is suspected to contribute in esophageal cancer progression 9 , they clearly deserve further investigation in ESCA. Last but not least, NTSR1 gene encodes the seven-transmembrane G-protein coupled receptor, through which neurotensin acts on proliferation, DNA synthesis or migration 71 . This receptor has been the subject of more study than other neurotensin receptors 72 ; it has been found to play a tumorigenic role in PAAD, HNSC, NSCLC or CRC 73,74 . In LUAD, it has been found to correlate with poor prognosis and to participate in cancer progression 71,75 ; this corresponds to the survival analysis performed in this study. As such, NTSR1 has often been proposed as a potential therapeutic or diagnostic target 71,74,76 .
All genes with satisfactory AUC were subjected to analysis of IHC data. Their influence on tumor progression was also evaluated; each cancer was considered separately, with progression-related genes being selected for BLCA 77 , ESCA 78 , LUAD 79 , PAAD 80,81 and UCEC 82 based on literature. The prognostic value of EMX2 suggests it as a favorable marker for DFS and the staining confirms that it is present at a higher level in normal endometrial tissue than in tumor, as noted previously 83 . Similarly, high PTPRN or SCG5 expression was found to be favorable for PAAD patients; more intense staining was observed in normal pancreatic tissue, suggesting their expression is lowered during tumorigenesis. In addition, SCG5 expression has been found to be decreased in primary pancreatic cancer, and even lower in metastatic carcinoma 32 . In contrast, KRT1 or KRT14 staining was more intense in BLCA or PAAD than in corresponding normal specimens, confirming their observed unfavorable impact on patient outcome. KRT1 was found to be elevated in urospheres (which contain cancer initiating cells) compared to the parent non-tumorigenic UROtsa cell line, which was exposed to arsenite in order to acquire transformed cells 84 . Likewise, basal keratins (including KRT14) are expressed in a subset of pancreatic ductal adenocarcinoma but are undetectable in normal pancreas 45 .
The observations regarding the co-dependence of post-ROC genes and tumor progression-related genes are generally consistent with both prognostic outcome and IHC data; however, EMX2 presented intriguing tendencies. Namely, while survival analysis and immunohistochemistry are both consistent with literature data suggesting EMX2 as endometrial tumor suppressor, the EMX2 high group of patients were found to have higher expression of CTNNB1 and FGFR2, two known oncogenes implicated in endometrial oncogenesis 82 . This clearly requires future investigation. Nevertheless, the three genes identified in PAAD for which IHC data was available (PTPRN, SCG5, KRT14) indicated that an important switch exists between ERBB2 and SMAD4 expression i.e. the oncogene and tumor suppressor, respectively 80,81 . If the expression of PTPRN and SCG5 (both favorable for prognosis) was high, SMAD4 is elevated but ERBB2 is lowered, similar to the KRAS oncogene. In contrast, when KRT14 expression (being unfavorable) is high, the opposite tendency is seen for ERBB2 and SMAD4. Similar www.nature.com/scientificreports/ observations were made for the genes SEZ6L, GRIA1 and SLC12A5 in PAAD, whose impact on survival suggested anti-cancer properties but insufficient data or no differences were observed in IHC specimens. The same changes in ERBB2, SMAD4 and KRAS level are observed between the "high" and "low" groups of these genes and conforms that of PTPRN and SCG5. The remaining genes in which IHC data was insufficient or staining was not detected, concern DPP6 (for ESCA) and COL7A1 or NTSR1 (both for LUAD), all being unfavorable for patient outcome. High DPP6 expression was found in the group that also demonstrated increased NOX5 and GPX3; while both genes belong to the same biological pathway i.e. regulate reactive oxygen species levels, the former gene is upregulated during esophageal carcinogenesis while the latter is downregulated 85 . Nevertheless, the metabolism of oxygen by-products is dysregulated and this is known to affect tumor progression 86,87 . Undoubtedly, patients with high DPP6 expression demonstrate altered adhesion, as CDH1 expression is significantly decreased, which is associated with poor survival 88 . In LUAD, "high" expressing groups of both COL7A1 and NTSR1 demonstrated reduced NCALD, the gene whose low expression worsens patient outcome 89 . Furthermore, CCND3, whose high expression improves survival 79 , was reduced in the COL7A1 high group while CD44, a promoter of KRAS-dependent lung tumorigenesis 90 , was elevated in the NTSR1 high group. Lastly, KRT1 high BLCA patients had significantly higher expression of EGFR and MKI67, suggesting that proliferation is potentiated in this group; this complements the survival analysis (high KRT1 is unfavorable in BLCA) and IHC data (KRT1 staining is higher in cancer than normal specimens). Finally, there is a need to determine whether AP-2α/γ upregulate or downregulate all the above genes, as estimated by correlation analysis, and thus the role of AP-2 factors in particular cancer types. It appears that while both anti-cancer and pro-tumorigenic roles are possible, the latter is predominant ( Table 2), suggesting AP-2α/γ are potential candidates for cancer treatment. Moreover, the presence of "anti-cancer" next to "pro-tumorigenic" within a single tumor only complicates the final conclusion regarding AP-2 role in that cancer (Table 2; e.g. KRT14 vs GRIA1, SEZ6L, SLC12A5 for AP-2α within PAAD or NTSR1 vs PPEF1 for AP-2γ within LUAD). The only unequivocal "anti-cancer" cases are DPP6 in ESCA (for AP-2α) or COL4A3 in UCEC (for AP-2γ) but this cannot be concluded only on the basis of the single TF-target example. However, three genes per AP-2 factor were of favorable prognostic value for PAAD and all were negatively regulated by the TF, implying both AP-2α and AP-2γ could play tumorigenic role in this tumor. Our previous findings suggest that AP-2α may have an oncogenic role in pancreatic cancer 9 ; as such, further studies are needed of these two AP-2 factors in this tumor type. In addition, they may have different roles in other neoplasms and further research could confirm their value as novel candidate TFs suitable for targeting in cancer treatment, as aforementioned in terms of AP-2δ 15 .

Conclusions
Our findings indicate that genetic targets of AP-2α and AP-2γ differ between seemingly similar tumors. These differences can be of prognostic importance, being implicated in tumor progression, and they may be of value in targeted therapy. The study also paved the way for these two AP-2 transcription factors to be considered as candidates for developing TF-based cancer treatment.

Methods
Data collection, identification of AP-2 target genes, building trajectories. The expression and clinical data of patients (level 3 RNA-seqV2, RSEM normalized) from TCGA-dedicated GDAC Firehose Repository (gdac.broadinstitute.org), together with a list of AP-2α/γ targets (combined from GTRD v19.10, TRANSFAC v2019.2 and TRRUST v2-see Supplementary File S3) were loaded back to R environment in the form of RData workspace (available at github.com/koldam/AP2-prognostic-significance). Thus, the entire workflow (with objects) of the Monocle3 R toolkit (cole-trapnell-lab.github.io/monocle3), performed in the former research 10 , was automatically exported. The additional (i.e. not previously performed) part within Monocle3 included e.g. the generation of subsets using choose_cells(). At first, the subset of the cell_data_set was created in order to include only selected tumors (Table 3). Further comparisons depended on methodology branch (Fig. 10). For example, if the research focused on identifying unique AP-2α/γ target genes within a specific www.nature.com/scientificreports/ tumor, the remaining cohorts were temporarily excluded from the subset. The study was carried out in accordance with relevant guidelines/regulations. The plot_cells_3d() was used to visualize the differences between tumors; this relied on the learn_graph() function (with use_partition parameter set as "TRUE") that was preceded by the reduce_dimension() function (with umap.metric set as "cosine") and cluster_cells(), both with the reduction method set on UMAP. Finally, the most specific genes (specificity > 0.6) in comparisons were listed using top_markers() with fraction_expressing ≥ 0.5 and marker_test_p_value < 0.05. For tumor vs tumor comparisons, gene expression was plotted on heatmaps generated with the use of the DoHeatmap() function (scale_fill_viridis option "E" was used for aesthetics) after CreateSeuratObject() was used within the Comprehensive R Archive Network (CRAN) Seurat R-package. www.nature.com/scientificreports/ Analysis of correlation, prognostic importance (with validation) and classification model. Genes that met the requirement of specificity > 0.6 were individually correlated with gene encoding AP-2α (TFAP2A), AP-2γ (TFAP2C) or both (depending on whether the gene was unique or mutual AP-2 target). Correlation AnalyzeR (gccri.bishop-lab.uthscsa.edu/correlation-analyzer) was used to correlate genes using Pearson's correlation coefficient in desired tissue and sample type ("Gene vs gene" mode was used). Since this tool uses ARCHS4 repository as RNA-seq data source, this could advantageously show relationships independent of TCGA. Survival analysis performed in GEPIA2 (gepia2.cancer-pku.cn) was validated using separate web tools i.e. either pan-cancer RNA-seq KM plotter (kmplot.com) or LOGpc (bioinfo.henu.edu.cn/DatabaseList. jsp), depending on the data availability for specific cancer (e.g. for prognostic endpoints, disease-free or recurrence-free survival (DFS; RFS) was primarily used, with a few examples of disease-specific survival (DSS) and single overall survival (OS)). ROC curves were constructed to evaluate the genes not excluded in previous steps of analysis. Estimation of AUC and 95% Confidence Interval (CI) was done using pROC package with curve visualization made with ggroc and ggplot2 in R environment.

Evaluation of both immunostaining data and influence on tumor progression. Representative
IHC data were obtained from publicly-available Human Protein Atlas (proteinatlas.org); the same antibody for both normal and tumor specimens was selected. The "Tissue" or "Pathology" atlas was used for normal or tumor tissue data, respectively. Genes with prognostic significance were also analyzed on beanplots generated via the BoxPlotR (shiny.chemgrid.org/boxplotr), a web-tool which uses beanplot R-package. The median expression was used as a cut-off value for the post-ROC genes to identify differences in the expression of other genes representing a progression-related signature in a specific tumor. In most cases, two separate scales were used on the Y axis due to large differences in expression level.