Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types

Peng, Li; Bian, Xiu Wu; Li, Di Kang; Xu, Chuan; Wang, Guang Ming; Xia, Qing You; Xiong, Qing

doi:10.1038/srep13413

Article
Open access
Published: 21 August 2015

Li Peng¹,
Xiu Wu Bian²,
Di Kang Li³,
Chuan Xu^2,4,
Guang Ming Wang⁵,
Qing You Xia¹ &
Qing Xiong³

Scientific Reports volume 5, Article number: 13413 (2015)

Abstract

The Cancer Genome Atlas (TCGA) has accrued RNA-Seq-based transcriptome data for more than 4000 cancer tissue samples across 12 cancer types, but translating these data into biological insights remains a major challenge. We analyzed and compared the transcriptomes of 4043 cancer and 548 normal tissue samples from 12 TCGA cancer types and created a comprehensive catalog of gene expression alterations for each cancer type. By clustering genes into co-regulated gene sets, we identified seven cross-cancer gene signatures altered across a diverse panel of primary human cancer samples. A 14-gene signature extracted from these seven cross-cancer gene signatures precisely differentiated between cancerous and normal samples: the predictive accuracies of leave-one-out cross-validation (LOOCV) were 92.04%, 96.23%, 91.76%, 90.05%, 88.17%, 94.29% and 99.10% for BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC, respectively. A lung cancer-specific gene signature, containing the SFTPA1 and SFTPA2 genes, accurately distinguished lung cancer from other cancer samples; the predictive accuracies of LOOCV for TCGA and GSE5364 data were 95.68% and 100%, respectively. These gene signatures provide rich insights into the transcriptional programs that trigger tumorigenesis and metastasis, and many genes in the signature gene panels may be of significant value to the diagnosis and treatment of cancer.

Introduction

Recent advances in cancer genomics have created a rich resource for studying the causes of cancer. The Cancer Genome Atlas (TCGA)¹ (http://cancergenome.nih.gov) has accrued more than 10,000 cases of human cancer covering over 25 different cancer types. Datasets including RNA-Seq, miRNA-Seq, Exon-Seq, somatic mutations, methylation and CNV for each case are publicly available via the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp) and UCSC Cancer Genomics Hub (https://cghub.ucsc.edu). Translating these data into biological insights remains a major challenge. Several studies have analyzed genome-wide mutational patterns in different cancer types and identified genes harboring functional mutations implicated in carcinogenesis^2,3,4,5. Cancer is thought to be driven by gene expression pattern changes due to the accumulation of mutations or epigenetic modifications; thus, a comprehensive characterization of alterations in gene expression will not only advance our understanding of cancer biology but also provide a large number of new potential diagnostic and therapeutic targets for cancer. Cheng et al.⁶ introduced a method to identify cancer-associated attractors and revealed some interesting biomolecular events shared among multiple cancer types based on microarray gene expression data. However, genome-wide association analysis of RNA-Seq transcriptome data across various TCGA cancer types has rarely been reported. RNA-Seq, a revolutionary technology for genome-wide gene expression profiling, offers several key advantages compared to microarrays⁷ and could better characterize the transcriptomic changes associated with human cancers.

In this study, we analyzed and compared the RNA-Seq transcriptomes of 4043 cancer and 548 solid tissue normal samples across 12 types of cancer from TCGA. We created a catalog of gene expression alterations for each cancer type, and our results show that the alterations in gene expression vary substantially between different tumor types. Studies have shown that cancer involves many different genes and that a majority of these genes have a small to moderate effect⁸, so it is difficult to detect these effects by single gene analysis. By clustering genes into co-regulated gene sets, we are able to examine the cumulative effects of a group of functionally related genes. We performed gene set association analysis for each cancer type; our results revealed several common gene signatures shared by multiple cancer types and a lung cancer-specific gene signature. We also validated these signatures using several non-TCGA data sets. These cross-cancer and cancer-specific transcriptional aberrations improve our understanding of the etiology of human cancers and are of great importance for the diagnosis and treatment of cancer.

Results

Gene-level differential expression analysis of transcriptomes

We conducted gene differential expression analysis and created a catalog of gene expression alterations for each of 12 cancer types; the results are shown in Supplementary Table S1. Our results show that a large number of genes were differentially expressed. Among a total of 20530 genes, the proportion of differentially expressed (DE) genes with FDR < 0.01 is 0.32, 0.72, 0.51, 0.52, 0.52, 0.65, 0.68, 0.54, 0.69, 0.46, 0.46 and 0.56 for BLCA, BRCA, COAD, HNSC, LIHC, LUAD, LUSC, KICH, KIRC, KIRP, PRAD and THCA, respectively. To examine the similarity of DE genes between cancer types, we extracted the top 3% most differentially expressed genes from each cancer type. We then calculated the number of common DE genes between cancer types. As shown in Table 1, DE genes vary substantially across cancer types, and most pairs of cancer types share fewer than 20% of their DE genes. LUAD and LUSC, two forms of lung cancer, turn out to be the most similar cancers, since they share 55% of DE genes. In contrast, the DE profiles of two kidney cancers, KICH and KIRC, are quite different from each other and from the other cancers; the percentage of common DE genes is less than 10%. THCA also overlaps poorly with other cancers in terms of DE genes. The diversity in differential expression could be explained by several factors: (1) many gene expression alterations may be cancer type-specific; (2) aberrations in different genes may have the same phenotypic consequences; (3) single gene analysis may miss many subtle effects on causative genes.
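The top-3% comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-gene scores, the gene names (`G1`, `G2`, ...) and the choice of denominator for the percentage are hypothetical, since the text does not say how genes were ranked or how the percentage was normalized.

```python
from itertools import combinations

def top_fraction(scores, fraction=0.03):
    """Return the genes with the largest absolute scores (top `fraction`)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return set(ranked[:k])

def pairwise_overlap(scores_by_cancer, fraction=0.03):
    """Percentage of shared top genes for every pair of cancer types."""
    tops = {c: top_fraction(s, fraction) for c, s in scores_by_cancer.items()}
    result = {}
    for a, b in combinations(sorted(tops), 2):
        shared = len(tops[a] & tops[b])
        # Assumption: normalize by the size of the smaller top list.
        result[(a, b)] = 100.0 * shared / min(len(tops[a]), len(tops[b]))
    return result

# Toy data: 100 hypothetical genes, so the top 3% is 3 genes per cancer type.
genes = [f"G{i}" for i in range(100)]
toy = {name: {g: 0.0 for g in genes} for name in ("LUAD", "LUSC", "KICH")}
toy["LUAD"].update({"G1": 5.0, "G2": -4.0, "G3": 3.0})
toy["LUSC"].update({"G1": 6.0, "G2": 4.5, "G9": -3.5})
toy["KICH"].update({"G50": 5.0, "G51": 4.0, "G52": -3.0})

overlap = pairwise_overlap(toy)
for pair, pct in overlap.items():
    print(pair, f"{pct:.1f}%")
```

In the toy data the two lung entries share two of their three top genes, while the kidney entry shares none, which mirrors the qualitative pattern reported in Table 1 without reproducing its values.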

Table 1 The percentage of common genes and gene sets in top 3% most differentially expressed genes and gene sets between 12 cancer types*.

Gene clustering

Prior to gene set association analysis, we clustered genes based on their expression profiles over all normal samples across 12 cancer types. We obtained a total of 3236 clusters (Supplementary Table S2). The expression changes of genes in a cluster are highly correlated under various conditions; thus, it is reasonable to assume that genes in the same cluster are co-regulated or belong to the same pathway.
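The idea of grouping genes whose expression profiles are highly correlated can be illustrated with a small sketch. The clustering algorithm is not named in this section, so the greedy correlation-threshold grouping below is a stand-in chosen for brevity, not the method used in the study; the gene names, expression values and the 0.9 threshold are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def correlation_clusters(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed gene it
    correlates with at or above `threshold`; otherwise it seeds a new one."""
    clusters = []  # each cluster is a list of gene names; entry 0 is the seed
    for gene in profiles:
        for cluster in clusters:
            if pearson(profiles[gene], profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters

# Hypothetical expression profiles over four normal samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 4.0, 6.2, 8.0],  # rises together with GENE_A
    "GENE_C": [4.0, 1.0, 3.5, 1.5],  # different pattern
    "GENE_D": [3.9, 1.2, 3.4, 1.4],  # tracks GENE_C
}
clusters = correlation_clusters(profiles)
print(clusters)
```

Each resulting group plays the role of one gene set in the downstream association analysis; a real analysis over 20530 genes would need a more careful algorithm and threshold choice.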

Gene set association analysis of TCGA data

Cancers arise from aberrations in multiple genes, many of which have only moderate or weak effect sizes that are difficult to detect by analyzing individual genes. Therefore, we adopted gene set association analysis to detect the cumulative effect of a group of functionally related genes and to reveal the transcriptional program accounting for the variability in phenotype.

Carcinogenesis is caused by the accumulation of mutations and epimutations in normal cells^9,10, which confer a growth and selective advantage upon these cells, resulting in uncontrolled cell division and the evolution of these cells by natural selection¹¹. The mutations can be classified into two classes, driver mutations and passenger mutations, according to their phenotypic effects¹². Driver mutations are causally implicated in carcinogenesis, while passenger mutations do not contribute to the development of cancer. A driver mutation is expected to alter the gene expression of its target genes and/or genes that share the same biological pathway^13,14, and these changes in gene expression account for the phenotypic variance¹⁵.

The cell cycle lies at the core of cancer^16,17. In normal cells, the cell cycle is controlled by a series of signaling pathways by which a cell grows, replicates its DNA and divides. In cancers, as a result of mutations, this regulatory process malfunctions, resulting in uncontrolled cell proliferation that leads to carcinogenesis^18,19. From a pathway perspective, we hypothesize that there may be two potential carcinogenic mechanisms, as illustrated in Fig. 1: (1) one or more driver mutations lie within a cell cycle-associated pathway, altering its expression pattern and consequently leading to cancer; (2) one or more driver mutations lie in an organ/tissue-specific pathway or another pathway not related to the cell cycle, which interacts with a cell cycle-associated pathway, alters its expression pattern and ultimately results in cancer. Since the deregulation of the cell cycle is a common characteristic shared by multiple cancer types, we expected that the expression of cell cycle-associated pathways would be altered across a range of cancers. By analyzing and comparing the transcriptome data of 12 cancer types, we can test this hypothesis.

A gene signature denotes a set of genes that are significantly differentially expressed between cancer and normal samples. We refer to pathways/gene sets significantly altered in multiple cancer types as cross-cancer gene signatures and to those disrupted in just one cancer type as cancer-specific gene signatures. We performed gene set association analysis using all gene sets generated by gene clustering; the results are shown in Supplementary Table S3. We identified 20, 7, 7, 6, 7, 15, 30 and 1 significant gene sets for BLCA, BRCA, COAD, HNSC, LIHC, LUAD, LUSC and KICH, respectively. No significant associations were found for KIRC, KIRP, PRAD and THCA. Among 46 significant gene sets, seven are cross-cancer gene signatures whose expression levels were significantly altered in at least four cancer types (Fig. 2); the false discovery rates (FDRs) of these gene sets for each cancer type are shown in Table 2. To gain biological insights into these gene sets, we performed three types of pathway enrichment analysis (GO analysis, KEGG analysis and Pathway Commons analysis) as well as disease association analysis for the genes of each of these gene sets. The results of these analyses are shown in Supplementary Table S4. Interestingly, we found that these seven cross-cancer gene signatures are all closely related to cell cycle regulation, as we expected. Gene set CLUSTER2556 is significant in BLCA, COAD and LUSC. There are 9 significant gene sets shared by two cancer types. Gene set CLUSTER242 is shared by LIHC and LUSC, and the remaining 8 gene sets are shared by LUAD and LUSC. LUAD and LUSC are more similar to one another than to other cancer types, possibly because they are both lung cancers.
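The bookkeeping behind these categories can be sketched as follows. The example uses only a handful of gene sets whose cancer-type memberships are stated in the text (it is not the full Supplementary Table S3), the threshold of four cancer types is the one used above for the seven cross-cancer signatures, and the label `shared` for gene sets found in two or three cancer types is a hypothetical name introduced here for illustration.

```python
from collections import Counter

def classify_gene_sets(significant_by_cancer, cross_min=4):
    """Count the cancer types in which each gene set is significant and
    label it by that count."""
    counts = Counter()
    for gene_sets in significant_by_cancer.values():
        counts.update(set(gene_sets))  # set() guards against duplicates
    labels = {}
    for gene_set, n in counts.items():
        if n >= cross_min:
            labels[gene_set] = "cross-cancer"
        elif n == 1:
            labels[gene_set] = "cancer-specific"
        else:
            labels[gene_set] = "shared"  # hypothetical label: 2 or 3 types
    return counts, labels

# Partial membership lists taken from the text, for illustration only.
significant = {
    "BLCA": ["CLUSTER184", "CLUSTER2556"],
    "BRCA": ["CLUSTER184", "CLUSTER891"],
    "COAD": ["CLUSTER2556"],
    "LIHC": ["CLUSTER242"],
    "LUAD": ["CLUSTER184"],
    "LUSC": ["CLUSTER184", "CLUSTER2556", "CLUSTER242"],
    "KICH": ["CLUSTER2240"],
}
counts, labels = classify_gene_sets(significant)
for gene_set in sorted(labels):
    print(gene_set, counts[gene_set], labels[gene_set])
```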

Table 2 Significant gene sets at FDR < 0.15 from 12 types of cancers.

Cross-cancer gene signatures

We identified seven cross-cancer gene signatures (CLUSTER241, CLUSTER514, CLUSTER1011, CLUSTER932, CLUSTER574, CLUSTER3137 and CLUSTER184) that were altered in at least four types of human cancers. All of these signatures are associated with cell cycle regulation.

Cross-cancer gene signature 1 – CLUSTER241.

CLUSTER241 is significantly altered in seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are M Phase, Cell Cycle and Mitotic Prometaphase, respectively. The top associated disease is Aneuploidy. Aneuploidy, denoting cells with an abnormal number of chromosomes, is commonly observed in human cancer; it has been recognized as a key characteristic of cancer^20,21. This cluster contains 33 genes, several of which have reported roles in cancer. Kinesins have been reported to play critical roles in the initiation and development of human cancers^22,23. Marker of proliferation Ki-67 (MKI67) is a prognostic marker for breast cancer^24,25. Simultaneous aberration of topoisomerase (DNA) II alpha (TOP2A) and v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 2 (ERBB2/HER2) has been observed in multiple tumor types^26,27.

Cross-cancer gene signature 2 – CLUSTER514.

CLUSTER514 is significantly altered in seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are Organelle Fission, Cell Cycle and Cell Cycle, Mitotic, respectively. The top associated disease is Cancer or Viral Infections. This cluster contains 36 genes, many of which are prognostic markers for cancer. Enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2) has been linked to multiple cancers^28,29. Aurora kinase A (AURKA) causes chromosome instability by inactivating p53 and contributes to tumorigenesis/carcinogenesis^30,31,32. Baculoviral IAP repeat containing 5 (BIRC5) is over-expressed in most human cancers; the microRNA targeting BIRC5 suppresses cell proliferation in triple-negative breast cancer (TNBC) cells^33,34,35. Thymidine kinase 1 (TK1), which is elevated in the early stages of malignancies, is a universal marker for cancer^36,37. Polo-like kinase 1 (PLK1) is overexpressed in many tumor types; it is a target for cancer therapy^38,39,40. RAD51 recombinase (RAD51) plays a critical role in DNA damage repair and is a potential therapeutic target for cancer^41,42. Hyaluronan-mediated motility receptor (HMMR) is correlated with the stemness and tumorigenicity of cancer stem cells^43,44. Cyclin B1 (CCNB1), PDZ binding kinase (PBK) and cyclin-dependent kinase inhibitor 3 (CDKN3) are also prognostic biomarkers for various types of cancer^{45,46,47,48,49}.

Cross-cancer gene signature 3 – CLUSTER1011.

CLUSTER1011 is significantly altered in seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are Cell Cycle, Cell Cycle and DNA Replication, respectively. The top associated disease is Fanconi Anemia (FA). The FA proteins are involved in the cell-cycle checkpoint and DNA-repair pathways^50,51. This cluster contains 19 genes, several of which have been linked to cancer. Mutations in BRCA1 interacting protein C-terminal helicase 1 (BRIP1) have been associated with ovarian cancer and breast cancer^52,53,54. The overexpression of KIAA1524/CIP2A has been observed in multiple types of cancer^55,56,57. Centromere protein H (CENPH) is a prognostic marker for cancer^58,59,60,61.

Cross-cancer gene signature 4 – CLUSTER932.

CLUSTER932 is significantly altered in seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are Cell Cycle, Cell Cycle and Cell Cycle, Mitotic, respectively. The top associated disease is Retinoblastoma. This cluster contains 19 genes, many of which have well-known roles in cancer. Cyclin E1 (CCNE1) and cyclin E2 (CCNE2) play critical roles in cell cycle regulation and are potential therapeutic targets in cancer^62,63,64. The aberrant expression of cell division cycle 6 (CDC6) has been documented in multiple human cancers^65,66,67. E2F transcription factor 7 (E2F7) interacts with p53 and has been implicated in tumorigenesis^68,69,70. Ubiquitin-like with PHD and ring finger domains 1 (UHRF1) is an upstream regulator of the Tip60-p53 interaction and has been linked to liver cancer⁷¹.

Cross-cancer gene signature 5 – CLUSTER574.

CLUSTER574 is significantly altered in six cancer types: BRCA, COAD, HNSC, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are Mitotic Cell Cycle, Pyrimidine Metabolism and Cell Cycle, Mitotic, respectively. The top associated disease is Pancreatic Diseases. This cluster contains 17 genes, several of which have been associated with cancer. Forkhead box M1 (FOXM1) is overexpressed in the majority of human cancers and has well-known roles in cancer^72,73,74. Thymidylate synthetase (TYMS) is considered a prognostic biomarker for cancer^75,76. Ribonucleotide reductase M2 (RRM2) is associated with poor survival; it is also implicated in angiogenesis^77,78,79. Spindle and kinetochore associated complex subunit 1 (SKA1) has been highlighted as a biomarker in several types of cancers⁸⁰.

Cross-cancer gene signature 6 – CLUSTER3137.

CLUSTER3137 is significantly altered in five cancer types: BLCA, COAD, LIHC, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are Cell Cycle Process, Cell Cycle and Mitotic M-M/G1 Phases, respectively. The top associated disease is Retinoblastoma. This cluster contains 16 genes, several of which have been linked to cancer. S-phase kinase-associated protein 2, E3 ubiquitin protein ligase (SKP2) is a proto-oncogene in human tumors and is a potential cancer drug target^81,82,83. Ribonucleotide reductase M1 (RRM1) is a prognostic marker for cancer^84,85. DNA (cytosine-5-)-methyltransferase 1 (DNMT1) is overexpressed in many cancers and is correlated with aberrant methylation in human cancer cells⁸⁶. Polymorphisms of DNMT1 have been reported to increase breast cancer risk^87,88,89.

Cross-cancer gene signature 7 – CLUSTER184.

CLUSTER184 is significantly altered in four cancer types: BLCA, BRCA, LUAD and LUSC. GO analysis, KEGG analysis and Pathway Commons analysis indicate that genes in this cluster are enriched in pathways involved in the cell cycle. The top enriched GO biological process, KEGG pathway and Pathway Commons pathway are M Phase, Oocyte Meiosis and Cell Cycle, Mitotic, respectively. The top associated disease is Aneuploidy. This cluster contains 24 genes, several of which have been linked to cancer. The aurora kinase B (AURKB) was shown to be overexpressed in many types of cancer cells and it has been implicated in the carcinogenesis and tumor development process^90,91,92. Ubiquitin-conjugating enzyme E2C (UBE2C/UBCH10) has been reported to play a critical role in carcinogenesis and tumor development^93,94,95.

Although these seven cross-cancer gene signatures are all involved in the cell cycle in a broad sense, they may reflect different cellular processes leading to abnormal cell cycle regulation in malignancy. For example, DNMT1 in CLUSTER3137 is the major enzyme responsible for maintenance of the DNA methylation pattern^96,97,98. DNMT1 has been reported to be overexpressed in many cancers and to be involved in the epigenetic silencing of tumor suppressor genes in human tumor cells⁸⁶. Therefore, the perturbation of CLUSTER3137 might be an epigenetic trigger of tumorigenesis. The deregulation of CLUSTER1011 may reveal the roles of components of the Fanconi anemia/BRCA pathway in human cancers. Increasing evidence shows that FA proteins are involved in the DNA damage response^50,51. In this cluster, in addition to genes that have established roles in the DNA damage response, such as Fanconi anemia, complementation group D2 (FANCD2)⁹⁹, our study also suggests genes, e.g., downstream neighbor of SON (DONSON) and proline/serine-rich coiled-coil 1 (PSRC1), that may have as yet unrevealed functions in DNA repair, since the expression levels of these genes were up-regulated together with those of FA proteins and BRIP1 in cancer samples. Altogether, these seven cross-cancer gene signatures not only deepen and broaden our understanding of the cellular events in carcinogenesis related to the four phases of the cell cycle but also reveal many potential novel therapeutic targets that have so far not been linked to cancers but may have unknown roles in cancer biology. Our study can be considered a starting point, and further investigations (e.g., mutation analysis, survival analysis and functional analysis) of these genes or clusters may lead to the discovery of novel cancer biomarkers and the development of new anticancer therapies.

Gene signatures significantly altered in one type of cancer

Based on the TCGA cancer data sets we used, we identified 37 gene signatures significantly altered (FDR < 0.15) in only one type of cancer, of which 21 gene signatures are for lung cancers (LUAD and/or LUSC), 13 for BLCA, 1 for BRCA, 1 for HNSC and 1 for KICH. Figure 3 shows the expression patterns of some of these gene signatures across cancer and normal samples. Among these signature gene sets, several were implicated in diseases of the corresponding organ by disease association analysis and may provide insights into the transcriptional aberrations underlying the initiation and progression of a specific cancer type.

Gene signatures for lung cancers.

Twenty-one clusters were significantly altered only in one or both of the lung cancers, three of which, CLUSTER1520, CLUSTER901 and CLUSTER1057, have been implicated in lung diseases.

CLUSTER1520 contains 39 genes. Some genes in this cluster have been reported to be associated with lung cancer or other lung diseases (see Table 3 for details). Among them, two genes, SFTPA1 and SFTPA2, encode surfactant protein A (SP-A), which plays a vital role in maintaining normal lung function¹⁰⁰, and have been implicated in various lung diseases^{101,102,103,104,105,106,107,108,109}. The expression levels of SFTPA1 and SFTPA2 were much higher in lung tissue samples than in any other tissue samples; moreover, these two genes were strikingly down-regulated in lung tumor tissues as compared to the adjacent nontumor tissues (Fig. 4). We thus speculate that the expression changes in these two genes might be an important indicator of lung function abnormalities and that the 39 genes in CLUSTER1520 might form a network underlying the initiation and/or development of lung cancers. It could be valuable to elucidate the possible roles of these genes in lung cancer in an experimental setting.

Table 3 List of lung disease-associated genes in CLUSTER1520.

The top associated disease for CLUSTER901 is Lung Neoplasms (adjP = 0.0006). This cluster contains 32 genes, several of which have been reported to play roles in lung diseases. G protein-coupled receptor, class C, group 5, member A (Gprc5a) protein is detected in the lungs more than in any other tissue; Gprc5a knockout promotes lung inflammation and tumorigenesis in mice^110,111,112. Moreover, GPRC5A is down-regulated in the adjacent field and normal bronchial epithelia of patients with chronic obstructive pulmonary disease and non-small-cell lung cancer^113,114. Wingless-type MMTV integration site family, member 7A (WNT7A) has been reported to be associated with lung cancer^115,116. Claudin 18 (CLDN18) deficiency is related to alveolar barrier dysfunction^117,118. Adrenoceptor beta 2 (ADRB2) is associated with lung function and lung diseases^119,120.

The top associated diseases for CLUSTER1057 are Lung Diseases (adjP = 0.0037), Respiratory Tract Diseases (adjP = 0.0037) and Airway Obstruction (adjP = 0.0037). CLUSTER1057 contains many immunity-associated genes and might contribute to the immune reactions to lung cancers. Among them, interleukin 33 (IL33) has been linked to lung diseases^121,122; interferon (alpha, beta and omega) receptor 2 (IFNAR2) is a prognostic biomarker for lung cancer¹²³; GTPase, IMAP family member 6 (GIMAP6) and member 8 (GIMAP8) were significantly down-regulated in non-small cell lung cancer¹²⁴.

Gene signatures for BLCA.

Thirteen clusters were significantly altered only in BLCA, two of which, CLUSTER2174 and CLUSTER1860, have been implicated in bladder abnormalities. The top associated disease for CLUSTER2174 is Urogenital Abnormalities (adjP = 0.0008). This cluster contains 15 genes. Among them, fibroblast growth factor receptor 1 (FGFR1) is a well-known gene that plays a key role in the development of urothelial carcinomas^125,126. The top associated disease for CLUSTER1860 is Cystitis (adjP = 3.08e-05). This cluster contains 18 genes. Gap junction protein, gamma 1 (GJC1/CX45) is one of the two most important gap junction proteins in bladder smooth muscle cells and suburothelial myofibroblasts that are essential for the coordination of normal bladder function¹²⁷. SPARC-like 1 (SPARCL1) is down-regulated in bladder cancer and prostate cancer^128,129.

Gene signatures for BRCA.

CLUSTER891 is significantly altered only in BRCA. The top associated disease is Adenocarcinoma (adjP = 0.0182). This cluster contains 16 genes, two of which have been linked to p53. p53 represses hepatoma-derived growth factor (HDGF) and loss of p53 function contributes to tumorigenesis by elevating HDGF expression^130,131. p53 induces the expression of ferredoxin reductase (FDXR) which sensitizes cells to apoptosis^132,133. Syndecan 1 (SDC1) promotes tumor angiogenesis and growth^134,135.

Gene signatures for KICH.

CLUSTER2240 is significantly altered only in KICH. The top associated disease for CLUSTER2240 is Ciliary Motility Disorders (adjP = 1.85e-05). Ciliary dysfunction is a risk factor for both syndromic and isolated kidney cystic disease¹³⁶. This cluster contains 25 genes. Nephronophthisis 1 (NPHP1/NPH1) gene deletion is correlated with nephronophthisis^{137,138,139,140}.

We have shown that DE genes vary dramatically across 12 cancer types. To test whether there exists a more consistent DE pattern at the gene set level, we extracted the top 3% most differentially expressed gene sets from each cancer type and calculated the number of common DE gene sets between cancer types. The results are shown in the upper triangular matrix in Table 1. We found that the percentage of common gene sets is higher than the percentage of common genes for most pairs of cancer types. This suggests that there are common patterns shared by different tumor types and that these patterns can be detected more effectively at the gene set level. These similarities across cancer types shed light on biomarkers that can be used across a range of cancer types and thus have important implications for treatment.

Through genome-wide gene set association analysis of all co-regulated clusters, we identified both cross-cancer gene signatures, which regulate the cell cycle, and cancer-specific gene signatures, which are associated with diseases of the corresponding organ/tissue. These findings partly support our hypothesis that alterations in cell cycle-associated pathways directly contribute to the initiation and development of cancers, while some organ/tissue-specific pathways can lead to cancers possibly by altering the expression of cell cycle-associated pathways. More functional investigations are necessary to further validate this hypothesis.

Leave-one-out cross validation

Seven gene sets, CLUSTER241, CLUSTER514, CLUSTER1011, CLUSTER932, CLUSTER574, CLUSTER3137 and CLUSTER184, were differentially expressed in at least four of the seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC. We extracted the top two most differentially expressed genes from each of these gene sets and created a 14-gene signature, including kinesin family member 4A (KIF4A), nucleolar and spindle associated protein 1 (NUSAP1), Holliday junction recognition protein (HJURP), NIMA-related kinase 2 (NEK2), Fanconi anemia, complementation group I (FANCI), denticleless E3 ubiquitin protein ligase homolog (Drosophila) (DTL), UHRF1, flap structure-specific endonuclease 1 (FEN1), IQ motif containing GTPase activating protein 3 (IQGAP3), kinesin family member 20A (KIF20A), tripartite motif containing 59 (TRIM59), centromere protein L (CENPL), chromosome 16 open reading frame 59 (C16orf59) and UBE2C. We employed leave-one-out cross-validation (LOOCV) to assess whether this 14-gene signature can be used to differentiate between the normal and cancerous tissue samples of those seven cancer types. Machine learning techniques, for example support vector machines, have been playing a vital role in sample classification^{141,142,143,144}. LOOCV was performed using SVM-light¹⁴⁵ (http://svmlight.joachims.org/), an implementation of support vector machines. The predictive accuracy of LOOCV for each cancer type is shown in Fig. 5. The predictive accuracy is the proportion of the total number of predictions that were correct. We found that most samples were correctly classified based on the expression levels of these 14 genes: the classification accuracies for BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC were 92.04%, 96.23%, 91.76%, 90.05%, 88.17%, 94.29% and 99.10%, respectively.
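The LOOCV procedure and the accuracy definition given above can be sketched in plain Python. Note the substitution: the study used SVM-light, whereas the sketch below plugs in a simple nearest-centroid classifier so that it runs without external dependencies. The two-feature toy samples and their labels are hypothetical and stand in for the 14 expression values per sample.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class mean is closest (squared Euclidean).
    This is a stand-in classifier, not the SVM used in the study."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(x, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the
    proportion of held-out samples that were predicted correctly."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)

# Hypothetical expression values; the last tumor sample resembles normals.
x = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8],
     [3.0, 3.1], [3.2, 2.9], [2.9, 3.3], [1.0, 1.1]]
y = ["normal", "normal", "normal", "tumor", "tumor", "tumor", "tumor"]

accuracy = loocv_accuracy(x, y)
print(f"LOOCV accuracy: {accuracy:.2%}")
```

Because `loocv_accuracy` takes the classifier as a parameter, any other `predict(train_x, train_y, sample)` function, including a wrapper around a real SVM, could be substituted without changing the cross-validation loop.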

Validation of the 14-gene cross-cancer signature and a cancer-specific gene signature, CLUSTER1520, on non-TCGA data sets

We have shown that the cancerous and adjacent normal samples from BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC can be precisely classified using the 14-gene cross-cancer signature. To test whether the same holds true for non-TCGA data sources, we downloaded two RNA-Seq data sets, GSE40419¹⁴⁶ and GSE50760¹⁴⁷, and one microarray data set, GSE5364¹⁴⁸, from the Gene Expression Omnibus (GEO: http://www.ncbi.nlm.nih.gov/geo). GSE40419 includes the RNA-Seq expression values for 87 lung adenocarcinomas and 77 adjacent normal tissues, while GSE50760 contains the RNA-Seq expression values of 54 samples (18 primary colorectal cancer, 18 liver metastasis and 18 normal colon) generated from 18 colorectal cancer patients. We performed LOOCV on these two data sets based on the expression values of the 14-gene cross-cancer signature. The tumor and normal samples were accurately classified: the predictive accuracies for GSE40419 and GSE50760 were 97.14% and 93.33%, respectively. GSE5364 includes 341 samples from multiple solid cancers: 18 lung tumor samples, 12 lung normal samples, 183 breast tumor samples, 13 breast normal samples, 9 colon tumor samples, 9 colon normal samples, 9 liver tumor samples, 8 liver normal samples, 16 oesophagus tumor samples, 13 oesophagus normal samples, 35 thyroid tumor samples and 16 thyroid normal samples. LOOCV was carried out for the tumor and normal samples of each tumor type in this data set; the predictive accuracies for lung, breast, colon, liver, oesophagus and thyroid samples were 100%, 93.37%, 100%, 100%, 94.12% and 68.63%, respectively. These results show that our 14-gene cross-cancer signature precisely differentiated between tumor and normal samples for all tumor types in GSE5364 except for those from the thyroid. Interestingly, we were not able to effectively distinguish thyroid tumors from normal thyroid samples using this 14-gene cross-cancer signature, which is consistent with the results from the TCGA data.

We found that CLUSTER1520 is a lung cancer-specific gene signature. Among the 548 adjacent normal tissue samples from the 12 TCGA cancer types, the expression level of CLUSTER1520 was strikingly higher in lung tissue samples than in any other tissue samples, and the same holds for tumor samples when THCA tumor samples are excluded from the analysis (Fig. 3). Moreover, CLUSTER1520 showed substantially lower expression in lung tumor samples than in lung normal samples. To test whether this signature can differentiate lung tumors from other tumors, we divided all cancer samples from the 12 TCGA cancer types into two classes, lung cancer samples (LUAD, LUSC) and non-lung cancer samples (BLCA, BRCA, COAD, HNSC, LIHC, KICH, KIRC, KIRP, PRAD, THCA), and performed LOOCV on these two classes using the expression values of CLUSTER1520. The predictive accuracy was 95.68%; that is, lung cancer samples were effectively identified among the 12 TCGA cancer types based on the expression pattern of CLUSTER1520. We also validated CLUSTER1520 as a lung cancer-specific gene signature on a non-TCGA microarray data set (GSE5364). GSE5364 includes six tumor types, and we divided its tumor samples into two classes: lung tumor samples and non-lung tumor samples (breast, colon, liver, oesophagus, thyroid). The predictive accuracy of LOOCV for these two classes was 100%, demonstrating that lung and non-lung tumor samples were accurately classified based on CLUSTER1520. These results show that CLUSTER1520 is a lung cancer-specific gene signature and suggest that genes in this signature are potential targets for developing novel lung cancer therapies.

Gene set association analysis of two non-TCGA data sets

To test whether the significant gene sets identified from TCGA data can be validated on non-TCGA data sources, we performed gene set association analysis on two non-TCGA cohorts: a lung adenocarcinoma data set (GSE40419) and a colorectal cancer data set (GSE50760). The results are shown in Supplementary Table S5. For GSE40419, we identified one significant gene set (FDR < 0.25), CLUSTER514, which is one of the seven cross-cancer gene signatures. Moreover, 9 of the 10 gene sets with the smallest FDRs were among the significant gene sets identified from TCGA LUAD and/or LUSC data. For GSE50760, we identified five significant gene sets (FDR < 0.25); two of them, CLUSTER2556 and CLUSTER514, were also identified as significant using TCGA COAD data. Of the 10 gene sets with the smallest FDRs, 4 overlap with the significant gene sets from TCGA COAD data. These results show that at least some of the significant gene sets identified from TCGA data can be validated on these two non-TCGA data sets.

Discussion

In this study, we comprehensively characterized the gene expression alterations of 12 types of cancer at the gene and gene set levels. We identified DE genes and gene sets; some are shared by different cancer types, while others are altered in only one cancer type. We identified seven cross-cancer gene signatures that are differentially expressed in at least 4 cancer types. These signatures contain not only a number of genes with established roles in cancer, but also genes that might be new biomarkers for cancer. These results reveal aberrations in cancer transcriptomes and contribute to a deeper understanding of the formation and development of human cancers.

Traditionally, cancers have been studied one type at a time, but some patterns can only be detected by making connections across different cancer types. Our results reveal that four gene sets, CLUSTER241, CLUSTER514, CLUSTER1011 and CLUSTER932, are significantly altered across seven cancer types: BLCA, BRCA, COAD, HNSC, LIHC, LUAD and LUSC (Table 2). These similarities may indicate that common mechanisms underlie the initiation and/or development of human cancers from different organs or from different tissues in the same organ. Interestingly, the three types of kidney tumors do not show these patterns. We found that KIRC and KIRP are more similar to each other than either is to KICH, since they share 36% of DE genes (Table 1). Studies have shown that KICH is less aggressive than KIRC and KIRP¹⁴⁹,¹⁵⁰.

Gene expression changes with phenotypic consequences are driven by mutations and epimutations. The driver mutations and epimutations may be scattered across different pathways. We hypothesize that some of these mutations or epimutations may disrupt a pathway responsible for cell cycle regulation and thereby directly drive cells into uncontrolled proliferation, while others may lie within an organ-specific pathway that turns a healthy cell into a cancer cell by altering the expression of cell cycle-associated pathways. We were not able to directly detect which pathways harbor the driver mutations through gene expression analysis, but we observed evidence that at least partially supports this hypothesis: 1) aberrations in the cell cycle are a common feature shared by multiple cancer types, since all of the cross-cancer gene signatures we identified involve cell cycle processes; 2) some cancer-specific gene signatures contain genes implicated in the corresponding organ-specific diseases; 3) each type of cancer has unique features in its DE profile. It would be very interesting for future studies to explore the connections between mutational profiles and DE profiles, and how gene expression patterns change around driver mutations.

We identified some gene sets that were significantly altered in only one type of cancer. Some of these gene sets, such as CLUSTER1520 and CLUSTER2318, may be cancer-specific gene signatures that shed light on the mechanisms underlying cancer-driving abnormalities in a specific organ. Many others, however, may still represent a cellular process that is broadly perturbed across cancer types and is simply more strongly differentially expressed in one cancer type than in the others. Therefore, these signatures should be treated with caution when looking for mechanisms underlying a specific cancer type. In the future, comparing tumor types directly and examining the gene sets that differ between them could be a way to reveal cancer-specific events.

A question arising from this study is how to make connections between the mutational profiles and DE profiles of human cancers. Some genes, for example tumor protein p53 (TP53/p53), phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA) and retinoblastoma 1 (RB1), are frequently mutated in a number of cancers and are key genes contributing to tumorigenesis¹⁵¹,¹⁵²,¹⁵³. However, among 20530 genes in the BLCA, BRCA, COAD, HNSC, LIHC, LUAD, LUSC, KICH, KIRC, KIRP, PRAD and THCA datasets, TP53 was ranked at 11834, 14601, 7752, 18515, 17359, 8769, 11116, 14995, 4200, 986, 4776 and 2094, PIK3CA at 16090, 7012, 14799, 4118, 13535, 13228, 6632, 12475, 14634, 17065, 19153 and 18438, and RB1 at 15691, 15157, 6836, 16116, 16543, 9168, 19208, 8333, 9565, 4233, 16756 and 11160, respectively (Supplementary Table S1). These genes are not at the top of the DE gene lists. One reason could be that mutations in TP53, PIK3CA or RB1 substantially change the expression of their downstream target genes rather than that of the mutated genes themselves¹⁵⁴. For example, the expression levels of two p53 targets, E2F7 and HDGF⁶⁹,¹³⁰, are significantly altered across multiple cancer types. These results indicate that cancer-causing genes may show only subtle expression changes. It is thus crucial to measure the total effect of a pathway or to integrate mutation analysis into gene expression analysis.

Batch effects in high-throughput data might lead to inaccurate results when dealing with samples from multiple cancer types or data from different sequencing platforms¹⁵⁵. First, all of the RNA-Seq data used in our study came from the same sequencing platform and the same sequencing center, a design that minimizes the impact of batch effects on our analyses. Second, we clustered genes by their changing tendency in expression over samples, which can eliminate the impact of batch effects on clustering since these effects act globally on every gene in a sample. Third, we carried out gene set association analysis for each cancer type independently, thus avoiding cross-cancer bias from batch effects. Of course, proper handling of batch effects could improve cross-platform or cross-cancer consistency when analyzing data from different sequencing platforms or different cancer types.

Methods

Overview

The pipeline of our analysis is illustrated in Fig. 6. The details of each step are described below.

Data sets

Transcriptome data and clinical data were obtained from the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/tcgaHome2.jsp). To eliminate the heterogeneity introduced by different sequencing platforms, we downloaded only data in the UNC (IlluminaHiSeq_RNASeqV2) category. We chose 12 cancer types with transcriptome data available for both cancer and normal tissue samples. The two phenotype classes we used were “primary tumor” and “solid tissue normal”; that is, only samples in one of these two clinical categories were used for this study. The numbers of cancer and normal samples for each cancer type are listed in Table 4.

Table 4 Number of cancer and normal samples of 12 cancer types.


Differential expression analysis of individual genes

Differential expression analysis of individual genes was carried out using the edgeR Bioconductor package¹⁵⁶ (http://www.bioconductor.org/packages/release/bioc/html/edgeR.html). For each cancer type, we divided samples into two phenotypic groups, primary tumor and solid tissue normal, based on their clinical labels. Raw counts were extracted for these samples, and edgeR was used to find the genes differentially expressed between the two phenotypic groups.

Clustering and gene set association analysis

To perform gene set association analysis, we first clustered genes into co-regulated sets based on their expression profiles over all normal samples across 12 cancer types. RNA-Seq data were normalized by the DESeq normalization method¹⁵⁷, and clustering was then performed using APCluster¹⁵⁸,¹⁵⁹ (http://www.bioinf.jku.at/software/apcluster/). We used the Pearson correlation coefficient to measure the similarity between genes. Pearson correlation measures the similarity in shape between two expression profiles, so this metric partitions genes into groups whose expression levels rise or fall synchronously under varying conditions or in response to a sequence of environmental stimuli. We consider genes with a coherent changing tendency in expression to be co-regulated genes that possibly function in the same pathway. The number of clusters generated by APCluster is largely determined by the input preference, so we set the input preference (q) to 0.98 to obtain tight clusters in which the expression profiles of the member genes are highly correlated.
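
As a rough illustration of the two ingredients described above, the sketch below computes median-of-ratios size factors (the idea behind the DESeq normalization) and the Pearson correlation between two gene profiles. It is a simplified stand-in written for this explanation, not the DESeq or APCluster code; the function names and toy counts are hypothetical.

```python
from math import exp, log, sqrt
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts[g][s] is the raw count of gene g in sample s. Genes with a zero
    count in any sample are skipped, because their geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_geo_means = [
        sum(log(c) for c in row) / n_samples if all(c > 0 for c in row) else None
        for row in counts
    ]
    factors = []
    for s in range(n_samples):
        # Log ratio of each usable gene to its geometric mean across samples.
        log_ratios = [
            log(row[s]) - lgm
            for row, lgm in zip(counts, log_geo_means)
            if lgm is not None
        ]
        factors.append(exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    sf = size_factors(counts)
    return [[c / f for c, f in zip(row, sf)] for row in counts]


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy counts (hypothetical): sample 2 and sample 3 were sequenced 2x and 4x
# as deeply as sample 1, so every gene scales by the same factor.
toy_counts = [[10, 20, 40], [5, 10, 20], [100, 200, 400]]
toy_factors = size_factors(toy_counts)

# Two profiles with the same shape but different magnitude correlate fully.
toy_r = pearson([1.0, 2.0, 4.0, 3.0], [10.0, 20.0, 40.0, 30.0])
```

Because Pearson correlation ignores the scale and offset of a profile, genes that rise and fall together receive a high similarity even when their absolute expression levels differ; a matrix of such pairwise similarities is the kind of input an affinity propagation clusterer works from.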

We performed gene set association analysis for each cancer type to identify gene sets/clusters significantly associated with cancers. Gene set association analysis was carried out using GSAASeqSP, a software tool recently developed by our group¹⁶⁰ (http://gsaa.unc.edu). RNA-Seq raw counts were normalized by the DESeq normalization in GSAASeqSP, which is the same as that in the DESeq Bioconductor package¹⁵⁷. We chose Signal2Noise for gene-level differential expression analysis and Weighted_KS for gene set association analysis. The gene sets are the gene clusters generated by APCluster, with one cluster representing one gene set. Gene sets with fewer than 15 genes or more than 100 genes were filtered out to avoid overly narrow or broad functional categories. In this study, we set the FDR cutoff to 0.25; that is, gene sets with FDR < 0.25 were considered to be statistically significantly associated with cancers.
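
The two statistics named above can be sketched as follows, assuming the usual definitions: a signal-to-noise ratio per gene and a weighted Kolmogorov-Smirnov running sum per gene set. This is an illustrative simplification, not the GSAASeqSP implementation; the function names, the `weight` parameter and the toy values are hypothetical, and the permutation step that produces FDRs is omitted.

```python
from statistics import mean, stdev


def signal_to_noise(tumor, normal):
    """(mean_tumor - mean_normal) / (sd_tumor + sd_normal) for one gene."""
    return (mean(tumor) - mean(normal)) / (stdev(tumor) + stdev(normal))


def filter_by_size(gene_sets, min_size=15, max_size=100):
    """Keep gene sets whose size lies within [min_size, max_size]."""
    return {name: genes for name, genes in gene_sets.items()
            if min_size <= len(genes) <= max_size}


def weighted_ks(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted Kolmogorov-Smirnov running-sum statistic.

    ranked_genes: gene names sorted by decreasing gene-level score.
    scores:       dict mapping gene name to its gene-level score.
    gene_set:     set of gene names; assumed to contain at least one ranked
                  gene and to leave at least one ranked gene outside it.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    hits = [g for g in ranked_genes if g in gene_set]
    hit_total = sum(abs(scores[g]) ** weight for g in hits)
    miss_step = 1.0 / (len(ranked_genes) - len(hits))
    running, best = 0.0, 0.0
    for g in ranked_genes:
        if g in gene_set:
            # Step up in proportion to the gene's score magnitude.
            running += abs(scores[g]) ** weight / hit_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy example (hypothetical values): the set {A, B} sits at the top of the list.
toy_scores = {"A": 3.0, "B": 2.0, "C": -0.5, "D": -1.0, "E": -2.0}
toy_ranked = sorted(toy_scores, key=toy_scores.get, reverse=True)
toy_es = weighted_ks(toy_ranked, toy_scores, {"A", "B"})
toy_s2n = signal_to_noise([8.0, 10.0, 12.0], [2.0, 3.0, 4.0])
```

A gene set whose members concentrate at one end of the ranked list pushes the running sum far from zero, which is what marks the set as associated with the phenotype.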

Pathway enrichment analysis and disease association analysis

To gain a biological understanding of the gene sets significantly associated with cancers, we carried out pathway enrichment analysis and disease association analysis using WebGestalt¹⁶¹,¹⁶² (http://bioinfo.vanderbilt.edu/webgestalt/). We conducted three types of pathway enrichment analysis on the genes of the significant gene sets: GO analysis, KEGG analysis and Pathway Commons analysis. GO analysis finds the GO terms that are over-represented in a gene set based on the functional annotation of its genes. KEGG analysis and Pathway Commons analysis discover pathways enriched among the genes of a gene set; the difference between the two is that KEGG analysis is based on the KEGG PATHWAY Database¹⁶³ (http://www.genome.jp/kegg/pathway.html), while Pathway Commons analysis uses pathways collected by Pathway Commons¹⁶⁴. Disease association analysis identifies diseases in which the genes of a gene set are over-represented. We used the default parameter values in WebGestalt for both pathway enrichment analysis and disease association analysis.
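
Over-representation of a term in a gene set is commonly assessed with a hypergeometric test. The sketch below assumes that test; the text states only that WebGestalt defaults were used, so the choice of test, the function name and the toy numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_pvalue(n_background, n_annotated, n_set, n_overlap):
    """Upper-tail hypergeometric p-value for over-representation.

    n_background: number of genes in the reference background.
    n_annotated:  background genes annotated with the term or pathway.
    n_set:        number of genes in the gene set being tested.
    n_overlap:    genes in the set that carry the annotation.
    Returns P(X >= n_overlap) when n_set genes are drawn without replacement.
    """
    upper = min(n_annotated, n_set)
    total = comb(n_background, n_set)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 3 of 4 genes in the set carry a term that
# annotates 5 of the 20 background genes.
toy_p = hypergeom_pvalue(n_background=20, n_annotated=5, n_set=4, n_overlap=3)
```

When many terms are tested for the same gene set, the resulting p-values would normally be adjusted for multiple testing before any term is called enriched.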

Additional Information

How to cite this article: Peng, L. et al. Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types. Sci. Rep. 5, 13413; doi: 10.1038/srep13413 (2015).

References

Hudson, T. J. et al. International network of cancer genome projects. Nature 464, 993–998 (2010).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Alexandrov, L. B., Nik-Zainal, S., Wedge, D. C., Campbell, P. J. & Stratton, M. R. Deciphering signatures of mutational processes operative in human cancer. Cell Rep 3, 246–259 (2013).
Tamborero, D. et al. Comprehensive identification of mutational cancer driver genes across 12 tumor types. Sci Rep 3, 2650; 10.1038/srep02650 (2013).
Abaan, O. D. et al. The exomes of the NCI-60 panel: a genomic resource for cancer biology and systems pharmacology. Cancer Res 73, 4372–4382 (2013).
Cheng, W. Y., Ou Yang, T. H. & Anastassiou, D. Biomolecular events in cancer revealed by attractor metagenes. PLoS Comput Biol 9, e1002920; 10.1371/journal.pcbi.1002920 (2013).
Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57–63 (2009).
Pharoah, P. D., Dunning, A. M., Ponder, B. A. & Easton, D. F. Association studies for finding cancer-susceptibility genetic variants. Nat Rev Cancer 4, 850–860 (2004).
Loeb, K. R. & Loeb, L. A. Significance of multiple mutations in cancer. Carcinogenesis 21, 379–385 (2000).
Banno, K. et al. Epimutation and cancer: a new carcinogenic mechanism of Lynch syndrome (Review). Int J Oncol 41, 793–797 (2012).
Tomlinson, I. P., Novelli, M. R. & Bodmer, W. F. The mutation rate and cancer. Proc Natl Acad Sci USA 93, 14800–14803 (1996).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Gibson, G. Cancer: Directions for the drivers. Nature 512, 31–32 (2014).
Bashashati, A. et al. DriverNet: uncovering the impact of somatic driver mutations on transcriptional networks in cancer. Genome Biol 13, R124; 10.1186/gb-2012-13-12-r124 (2012).
Khaitovich, P., Enard, W., Lachmann, M. & Paabo, S. Evolution of primate gene expression. Nat Rev Genet 7, 693–702 (2006).
Williams, G. H. & Stoeber, K. The cell cycle and cancer. J Pathol 226, 352–364 (2012).
Collins, K., Jacks, T. & Pavletich, N. P. The cell cycle and cancer. Proc Natl Acad Sci USA 94, 2776–2778 (1997).
Hartwell, L. H. & Kastan, M. B. Cell cycle control and cancer. Science 266, 1821–1828 (1994).
Negrini, S., Gorgoulis, V. G. & Halazonetis, T. D. Genomic instability–an evolving hallmark of cancer. Nat Rev Mol Cell Biol 11, 220–228 (2010).
Sen, S. Aneuploidy and cancer. Curr Opin Oncol 12, 82–88 (2000).
Gordon, D. J., Resio, B. & Pellman, D. Causes and consequences of aneuploidy in cancer. Nat Rev Genet 13, 189–203 (2012).
Hung, P. F. et al. The motor protein KIF14 inhibits tumor growth and cancer metastasis in lung adenocarcinoma. PLoS One 8, e61664; 10.1371/journal.pone.0061664 (2013).
Liu, X., Gong, H. & Huang, K. Oncogenic role of kinesin proteins and targeting kinesin therapy. Cancer Sci 104, 651–656 (2013).
de Azambuja, E. et al. Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12,155 patients. Br J Cancer 96, 1504–1513 (2007).
Tawfik, K., Kimler, B. F., Davis, M. K., Fan, F. & Tawfik, O. Ki-67 expression in axillary lymph node metastases in breast cancer is prognostically significant. Hum Pathol 44, 39–46 (2013).
Liang, Z. et al. Analysis of EGFR, HER2 and TOP2A gene status and chromosomal polysomy in gastric adenocarcinoma from Chinese patients. BMC Cancer 8, 363; 10.1186/1471-2407-8-363 (2008).
Arriola, E. et al. Genomic analysis of the HER2/TOP2A amplicon in breast cancer and breast cancer cell lines. Lab Invest 88, 491–503 (2008).
Kleer, C. G. et al. EZH2 is a marker of aggressive breast cancer and promotes neoplastic transformation of breast epithelial cells. Proc Natl Acad Sci USA 100, 11606–11611 (2003).
Chase, A. & Cross, N. C. Aberrations of EZH2 in cancer. Clin Cancer Res 17, 2613–2618 (2011).
Lee, D. F. et al. Regulation of embryonic and induced pluripotency by aurora kinase-p53 signaling. Cell Stem Cell 11, 179–194 (2012).
Chou, C. H. et al. Chromosome instability modulated by BMI1-AURKA signaling drives progression in head and neck cancer. Cancer Res 73, 953–966 (2013).
Zhu, J., Abbruzzese, J. L., Izzo, J., Hittelman, W. N. & Li, D. AURKA amplification, chromosome instability and centrosome abnormality in human pancreatic carcinoma cells. Cancer Genet Cytogenet 159, 10–17 (2005).
Nassar, A., Lawson, D., Cotsonis, G. & Cohen, C. Survivin and caspase-3 expression in breast cancer: correlation with prognostic parameters, proliferation, angiogenesis and outcome. Appl Immunohistochem Mol Morphol 16, 113–120 (2008).
Boidot, R. et al. The expression of BIRC5 is correlated with loss of specific chromosomal regions in breast carcinomas. Genes Chromosomes Cancer 47, 299–308 (2008).
Wang, C., Zheng, X., Shen, C. & Shi, Y. MicroRNA-203 suppresses cell proliferation and migration by targeting BIRC5 and LASP1 in human triple-negative breast cancer cells. J Exp Clin Cancer Res 31, 58; 10.1186/1756-9966-31-58 (2012).
Alegre, M. M., Robison, R. A. & O’Neill, K. L. Thymidine Kinase 1: A Universal Marker for Cancer. Cancer and Clinical Oncology 2, 159–167 (2013).
Alegre, M. M., Robison, R. A. & O’Neill, K. L. Thymidine kinase 1 upregulation is an early event in breast tumor formation. J Oncol 2012, 575647; 10.1155/2012/575647 (2012).
Degenhardt, Y. & Lampkin, T. Targeting Polo-like kinase in cancer therapy. Clin Cancer Res 16, 384–389 (2010).
Weiss, L. & Efferth, T. Polo-like kinase 1 as target for cancer therapy. Exp Hematol Oncol 1, 38; 10.1186/2162-3619-1-38 (2012).
Hu, K., Law, J. H., Fotovati, A. & Dunn, S. E. Small interfering RNA library screen identified polo-like kinase-1 (PLK1) as a potential therapeutic target for breast cancer that uniquely eliminates tumor-initiating cells. Breast Cancer Res 14, R22; 10.1186/bcr3107 (2012).
Lord, C. J. & Ashworth, A. RAD51, BRCA2 and DNA repair: a partial resolution. Nat Struct Mol Biol 14, 461–462 (2007).
Nagathihalli, N. S. & Nagaraju, G. RAD51 as a potential biomarker and therapeutic target for pancreatic cancer. Biochim Biophys Acta 1816, 209–218 (2011).
Tilghman, J. et al. HMMR maintains the stemness and tumorigenicity of glioblastoma stem-like cells. Cancer Res 74, 3168–3179 (2014).
Pujana, M. A. et al. Network modeling links breast cancer susceptibility and centrosome dysfunction. Nat Genet 39, 1338–1349 (2007).
Suzuki, T. et al. Nuclear cyclin B1 in human breast carcinoma as a potent prognostic factor. Cancer Sci 98, 644–651 (2007).
Hassan, K. A. et al. Clinical significance of cyclin B1 protein expression in squamous cell carcinoma of the tongue. Clin Cancer Res 7, 2458–2462 (2001).
Hu, F. et al. PBK/TOPK interacts with the DBD domain of tumor suppressor p53 and modulates expression of transcriptional targets including p21. Oncogene 29, 5464–5474 (2010).
Shih, M. C. et al. TOPK/PBK promotes cell migration via modulation of the PI3K/PTEN/AKT pathway and is associated with poor prognosis in lung cancer. Oncogene 31, 2389–2400 (2012).
Li, T., Xue, H., Guo, Y. & Guo, K. CDKN3 is an independent prognostic factor and promotes ovarian carcinoma cell proliferation in ovarian cancer. Oncol Rep 31, 1825–1831 (2014).
D’Andrea, A. D. & Grompe, M. The Fanconi anaemia/BRCA pathway. Nat Rev Cancer 3, 23–34 (2003).
Wang, W. Emergence of a DNA-damage response network consisting of Fanconi anaemia and BRCA proteins. Nat Rev Genet 8, 735–748 (2007).
Rafnar, T. et al. Mutations in BRIP1 confer high risk of ovarian cancer. Nat Genet 43, 1104–1107 (2011).
Seal, S. et al. Truncating mutations in the Fanconi anemia J gene BRIP1 are low-penetrance breast cancer susceptibility alleles. Nat Genet 38, 1239–1241 (2006).
De Nicolo, A. et al. A novel breast cancer-associated BRIP1 (FANCJ/BACH1) germ-line mutation impairs protein stability and function. Clin Cancer Res 14, 4672–4680 (2008).
Kim, J. S., Kim, E. J., Oh, J. S., Park, I. C. & Hwang, S. G. CIP2A modulates cell-cycle progression in human cancer cells by regulating the stability and activity of Plk1. Cancer Res 73, 6667–6678 (2013).
Liu, N. et al. Overexpression of CIP2A is an independent prognostic indicator in nasopharyngeal carcinoma and its depletion suppresses cell proliferation and tumor growth. Mol Cancer 13, 111; 10.1186/1476-4598-13-111 (2014).
Vaarala, M. H., Vaisanen, M. R. & Ristimaki, A. CIP2A expression is increased in prostate cancer. J Exp Clin Cancer Res 29, 136; 10.1186/1756-9966-29-136 (2010).
Zhao, X. et al. Interruption of cenph causes mitotic failure and embryonic death and its haploinsufficiency suppresses cancer in zebrafish. J Biol Chem 285, 27924–27934 (2010).
Liao, W. T. et al. Centromere protein H is a novel prognostic marker for nasopharyngeal carcinoma progression and overall patient survival. Clin Cancer Res 13, 508–514 (2007).
Liao, W. T. et al. Centromere protein H is a novel prognostic marker for human nonsmall cell lung cancer progression and overall patient survival. Cancer 115, 1507–1517 (2009).
Liao, W. T. et al. Overexpression of centromere protein H is significantly associated with breast cancer progression and overall patient survival. Chin J Cancer 30, 627–637 (2011).
Potemski, P. et al. Cyclin E expression in breast cancer correlates with negative steroid receptor status, HER2 expression, tumor grade and proliferation. J Exp Clin Cancer Res 25, 59–64 (2006).
Sieuwerts, A. M. et al. Which cyclin E prevails as prognostic marker for breast cancer? Results from a retrospective study involving 635 lymph node-negative breast cancer patients. Clin Cancer Res 12, 3319–3328 (2006).
Nakayama, N. et al. Gene amplification CCNE1 is related to poor survival and potential therapeutic target in ovarian cancer. Cancer 116, 2621–2634 (2010).
Gonzalez, S. et al. Oncogenic activity of Cdc6 through repression of the INK4/ARF locus. Nature 440, 702–706 (2006).
Liu, Y., Gong, Z., Sun, L. & Li, X. FOXM1 and androgen receptor co-regulate CDC6 gene transcription and DNA replication in prostate cancer cells. Biochim Biophys Acta 1839, 297–305 (2014).
Robles, L. D. et al. Down-regulation of Cdc6, a cell cycle regulatory gene, in prostate cancer. J Biol Chem 277, 25431–25438 (2002).
Endo-Munoz, L. et al. E2F7 can regulate proliferation, differentiation and apoptotic responses in human keratinocytes: implications for cutaneous squamous cell carcinoma formation. Cancer Res 69, 1800–1808 (2009).
Carvajal, L. A., Hamard, P. J., Tonnessen, C. & Manfredi, J. J. E2F7, a novel target, is up-regulated by p53 and mediates DNA damage-dependent transcriptional repression. Genes Dev 26, 1533–1545 (2012).
Chen, H. Z., Tsai, S. Y. & Leone, G. Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nat Rev Cancer 9, 785–797 (2009).
Mudbhary, R. et al. UHRF1 overexpression drives DNA hypomethylation and hepatocellular carcinoma. Cancer Cell 25, 196–209 (2014).
Raychaudhuri, P. & Park, H. J. FoxM1: a master regulator of tumor metastasis. Cancer Res 71, 4329–4333 (2011).
Lokody, I. Signalling: FOXM1 and CENPF: co-pilots driving prostate cancer. Nat Rev Cancer 14, 450–451 (2014).
Halasi, M. & Gartel, A. L. Targeting FOXM1 in cancer. Biochem Pharmacol 85, 644–652 (2013).
Koumarianou, A. et al. Prognostic Markers in Early-stage Colorectal Cancer: Significance of TYMS mRNA Expression. Anticancer Res 34, 4949–4962 (2014).
Conradi, L. C. et al. Thymidylate synthase as a prognostic biomarker for locally advanced rectal cancer after multimodal treatment. Ann Surg Oncol 18, 2442–2452 (2011).
Hsu, N. Y. et al. Expression status of ribonucleotide reductase small subunits hRRM2/p53R2 as prognostic biomarkers in stage I and II non-small cell lung cancer. Anticancer Res 31, 3475–3481 (2011).
Putluri, N. et al. Pathway-centric integrative analysis identifies RRM2 as a prognostic marker in breast cancer associated with poor survival and tamoxifen resistance. Neoplasia 16, 390–402 (2014).
Zhang, K. et al. Overexpression of RRM2 decreases thrombspondin-1 and increases VEGF production in human cancer cells in vitro and in vivo: implication of RRM2 in angiogenesis. Mol Cancer 8, 11; 10.1186/1476-4598-8-11 (2009).
Sun, W., Yao, L., Jiang, B., Guo, L. & Wang, Q. Spindle and kinetochore-associated protein 1 is overexpressed in gastric cancer and modulates cell growth. Mol Cell Biochem 391, 167–174 (2014).
Gstaiger, M. et al. Skp2 is oncogenic and overexpressed in human cancers. Proc Natl Acad Sci USA 98, 5043–5048 (2001).
Lin, H. K. et al. Skp2 targeting suppresses tumorigenesis by Arf-p53-independent cellular senescence. Nature 464, 374–379 (2010).
Wang, Z. et al. Skp2: a novel potential therapeutic target for prostate cancer. Biochim Biophys Acta 1825, 11–17 (2012).
Jordheim, L. P., Seve, P., Tredan, O. & Dumontet, C. The ribonucleotide reductase large subunit (RRM1) as a predictive factor in patients with cancer. Lancet Oncol 12, 693–702 (2011).
Ceppi, P. et al. ERCC1 and RRM1 gene expressions but not EGFR are predictive of shorter survival in advanced non-small-cell lung cancer treated with cisplatin and gemcitabine. Ann Oncol 17, 1818–1825 (2006).
Agarwal, S. et al. Mahanine restores RASSF1A expression by down-regulating DNMT1 and DNMT3B in prostate cancer cells. Mol Cancer 12, 99; 10.1186/1476-4598-12-99 (2013).
Robert, M. F. et al. DNMT1 is required to maintain CpG methylation and aberrant gene silencing in human cancer cells. Nat Genet 33, 61–65 (2003).
Li, A., Omura, N., Hong, S. M. & Goggins, M. Pancreatic cancer DNMT1 expression and sensitivity to DNMT1 inhibitors. Cancer Biol Ther 9, 321–329 (2010).
Kullmann, K., Deryal, M., Ong, M. F., Schmidt, W. & Mahlknecht, U. DNMT1 genetic polymorphisms affect breast cancer risk in the central European Caucasian population. Clin Epigenetics 5, 7; 10.1186/1868-7083-5-7 (2013).
den Hollander, J. et al. Aurora kinases A and B are up-regulated by Myc and are essential for maintenance of the malignant state. Blood 116, 1498–1505 (2010).
Morozova, O. et al. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res 16, 4572–4582 (2010).
Addepalli, M. K. et al. RNAi-mediated knockdown of AURKB and EGFR shows enhanced therapeutic efficacy in prostate tumor regression. Gene Ther 17, 352–359 (2010).
Wagner, K. W. et al. Overexpression, genomic amplification and therapeutic potential of inhibiting the UbcH10 ubiquitin conjugase in human carcinomas of diverse anatomic origin. Oncogene 23, 6621–6629 (2004).
Pallante, P. et al. UbcH10 overexpression in human lung carcinomas and its correlation with EGFR and p53 mutational status. Eur J Cancer 49, 1117–1126 (2013).
Fujita, T. et al. Clinicopathological relevance of UbcH10 in breast cancer. Cancer Sci 100, 238–248 (2009).
Rhee, I. et al. DNMT1 and DNMT3b cooperate to silence genes in human cancer cells. Nature 416, 552–556 (2002).
Article CAS ADS PubMed Google Scholar
Subramaniam, D., Thombre, R., Dhar, A. & Anant, S. DNA methyltransferases: a novel target for prevention and therapy. Front Oncol 4, 80; 10.3389/fonc.2014.00080 (2014).
Article PubMed PubMed Central Google Scholar
Robertson, K. D. DNA methylation, methyltransferases and cancer. Oncogene 20, 3139–3155 (2001).
Article CAS PubMed Google Scholar
Garcia-Higuera, I. et al. Interaction of the Fanconi anemia proteins and BRCA1 in a common pathway. Mol Cell 7, 249–262 (2001).
Article CAS PubMed Google Scholar
Silveyra, P., DiAngelo, S. L. & Floros, J. An 11-nt sequence polymorphism at the 3’UTR of human SFTPA1 and SFTPA2 gene variants differentially affect gene expression levels and miRNA regulation in cell culture. Am J Physiol Lung Cell Mol Physiol 307, L106–119 (2014).
Article CAS PubMed PubMed Central Google Scholar
Grageda, M., Silveyra, P., Thomas, N. J., DiAngelo, S. L. & Floros, J. DNA methylation profile and expression of surfactant protein A2 gene in lung cancer. Exp Lung Res 41, 93–102 (2015).
Article CAS PubMed Google Scholar
Wang, Y. et al. Genetic defects in surfactant protein A2 are associated with pulmonary fibrosis and lung cancer. Am J Hum Genet 84, 52–59 (2009).
Article CAS PubMed PubMed Central Google Scholar
Lin, Z. et al. DNA methylation markers of surfactant proteins in lung cancer. Int J Oncol 31, 181–191 (2007).
CAS PubMed Google Scholar
Maitra, M., Cano, C. A. & Garcia, C. K. Mutant surfactant A2 proteins associated with familial pulmonary fibrosis and lung cancer induce TGF-beta1 secretion. Proc Natl Acad Sci USA 109, 21064–21069 (2012).
Article CAS ADS PubMed PubMed Central Google Scholar
Maitra, M., Wang, Y., Gerard, R. D., Mendelson, C. R. & Garcia, C. K. Surfactant protein A2 mutations associated with pulmonary fibrosis lead to protein instability and endoplasmic reticulum stress. J Biol Chem 285, 22103–22113 (2010).
Article CAS PubMed PubMed Central Google Scholar
Choi, E. H., Ehrmantraut, M., Foster, C. B., Moss, J. & Chanock, S. J. Association of common haplotypes of surfactant protein A1 and A2 (SFTPA1 and SFTPA2) genes with severity of lung disease in cystic fibrosis. Pediatr Pulmonol 41, 255–262 (2006).
Article PubMed Google Scholar
Heinrich, S., Hartl, D. & Griese, M. Surfactant protein A–from genes to human lung diseases. Curr Med Chem 13, 3239–3252 (2006).
Article CAS PubMed Google Scholar
Zhang, Y. et al. Identification and examination of a novel 9-bp insert/deletion polymorphism on porcine SFTPA1 exon 2 associated with acute lung injury using an oleic acid-acute lung injury model. Anim Sci J 86, 573–578 (2015).
Article CAS PubMed Google Scholar
Silveyra, P. & Floros, J. Genetic variant associations of human SP-A and SP-D with acute and chronic lung injury. Front Biosci (Landmark Ed) 17, 407–429 (2012).
Article CAS Google Scholar
Deng, J. et al. Knockout of the tumor suppressor gene Gprc5a in mice leads to NF-kappaB activation in airway epithelium and promotes lung inflammation and tumorigenesis. Cancer Prev Res (Phila) 3, 424–437 (2010).
Article CAS Google Scholar
Barta, P. et al. Enhancement of lung tumorigenesis in a Gprc5a Knockout mouse by chronic extrinsic airway inflammation. Mol Cancer 11, 4; 10.1186/1476-4598-11-4 (2012).
Article CAS PubMed PubMed Central Google Scholar
Chen, Y. et al. Gprc5a deletion enhances the transformed phenotype in normal and malignant lung epithelial cells by eliciting persistent Stat3 signaling induced by autocrine leukemia inhibitory factor. Cancer Res 70, 8917–8926 (2010).
Article CAS PubMed PubMed Central Google Scholar
Fujimoto, J. et al. G-protein coupled receptor family C, group 5, member A (GPRC5A) expression is decreased in the adjacent field and normal bronchial epithelia of patients with chronic obstructive pulmonary disease and non-small-cell lung cancer. J Thorac Oncol 7, 1747–1754 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kadara, H. et al. A Gprc5a tumor suppressor loss of expression signature is conserved, prevalent and associated with survival in human lung adenocarcinomas. Neoplasia 12, 499–505 (2010).
Article CAS PubMed PubMed Central Google Scholar
Ohira, T. et al. WNT7a induces E-cadherin in lung cancer cells. Proc Natl Acad Sci U S A 100, 10429–10434 (2003).
Article CAS ADS PubMed PubMed Central Google Scholar
Tennis, M. A., Vanscoyk, M. M., Wilson, L. A., Kelley, N. & Winn, R. A. Methylation of Wnt7a is modulated by DNMT1 and cigarette smoke condensate in non-small cell lung cancer. PLoS One 7, e32921; 10.1371/journal.pone.0032921 (2012).
Article CAS ADS PubMed PubMed Central Google Scholar
LaFemina, M. J. et al. Claudin-18 deficiency results in alveolar barrier dysfunction and impaired alveologenesis in mice. Am J Respir Cell Mol Biol 51, 550–558 (2014).
Article CAS PubMed PubMed Central Google Scholar
Micke, P. et al. Aberrantly activated claudin 6 and 18.2 as potential therapy targets in non-small-cell lung cancer. Int J Cancer 135, 2206–2214 (2014).
Article CAS PubMed Google Scholar
Torjussen, T. M. et al. Childhood lung function and the association with beta2-adrenergic receptor haplotypes. Acta Paediatr 102, 727–731 (2013).
Article CAS PubMed Google Scholar
Marson, F. A., Bertuzzo, C. S., Ribeiro, A. F. & Ribeiro, J. D. Polymorphisms in ADRB2 gene can modulate the response to bronchodilators and the severity of cystic fibrosis. BMC Pulm Med 12, 50; 10.1186/1471-2466-12-50 (2012).
Article CAS PubMed PubMed Central Google Scholar
Byers, D. E. et al. Long-term IL-33-producing epithelial progenitor cells in chronic obstructive lung disease. J Clin Invest 123, 3967–3982 (2013).
Article CAS PubMed PubMed Central Google Scholar
Li, D. et al. IL-33 promotes ST2-dependent lung fibrosis by the induction of alternatively activated macrophages and innate lymphoid cells in mice. J Allergy Clin Immunol 134, 1422–1432 (2014).
Article CAS PubMed PubMed Central Google Scholar
Tanaka, S. et al. Interferon (alpha, beta and omega) receptor 2 is a prognostic biomarker for lung cancer. Pathobiology 79, 24–33 (2012).
Article CAS PubMed Google Scholar
Shiao, Y. M. et al. Dysregulation of GIMAP genes in non-small cell lung cancer. Lung Cancer 62, 287–294 (2008).
Article PubMed Google Scholar
di Martino, E., Tomlinson, D. C. & Knowles, M. A. A Decade of FGF Receptor Research in Bladder Cancer: Past, Present and Future Challenges. Adv Urol 2012, 429213; 10.1155/2012/429213 (2012).
Article PubMed PubMed Central Google Scholar
Lamont, F. R. et al. Small molecule FGF receptor inhibitors block FGFR-dependent urothelial carcinoma growth in vitro and in vivo. Br J Cancer 104, 75–82 (2011).
Article CAS PubMed Google Scholar
Heinrich, M., Oberbach, A., Schlichting, N., Stolzenburg, J. U. & Neuhaus, J. Cytokine effects on gap junction communication and connexin expression in human bladder smooth muscle cells and suburothelial myofibroblasts. PLoS One 6, e20792; 10.1371/journal.pone.0020792 (2011).
Article CAS ADS PubMed PubMed Central Google Scholar
Zaravinos, A., Lambrou, G. I., Boulalas, I., Delakas, D. & Spandidos, D. A. Identification of common differentially expressed genes in urinary bladder cancer. PLoS One 6, e18135; 10.1371/journal.pone.0018135 (2011).
Article CAS ADS PubMed PubMed Central Google Scholar
Hurley, P. J. et al. Secreted protein, acidic and rich in cysteine-like 1 (SPARCL1) is down regulated in aggressive prostate cancers and is prognostic for poor clinical outcome. Proc Natl Acad Sci USA 109, 14977–14982 (2012).
Article CAS ADS PubMed PubMed Central Google Scholar
Sasaki, Y. et al. p53 negatively regulates the hepatoma growth factor HDGF. Cancer Res 71, 7038–7047 (2011).
Article CAS PubMed Google Scholar
Chen, S. C. et al. Hepatoma-derived growth factor regulates breast cancer cell invasion by modulating epithelial–mesenchymal transition. J Pathol 228, 158–169 (2012).
Article CAS PubMed Google Scholar
Liu, G. & Chen, X. The ferredoxin reductase gene is regulated by the p53 family and sensitizes cells to oxidative stress-induced apoptosis. Oncogene 21, 7195–7204 (2002).
Article CAS PubMed Google Scholar
Lacroix, M., Toillon, R. A. & Leclercq, G. p53 and breast cancer, an update. Endocr Relat Cancer 13, 293–325 (2006).
Article CAS PubMed Google Scholar
Yang, N., Mosher, R., Seo, S., Beebe, D. & Friedl, A. Syndecan-1 in breast cancer stroma fibroblasts regulates extracellular matrix fiber organization and carcinoma cell motility. Am J Pathol 178, 325–335 (2011).
Article CAS PubMed PubMed Central Google Scholar
Maeda, T., Desouky, J. & Friedl, A. Syndecan-1 expression by stromal fibroblasts promotes breast carcinoma growth in vivo and stimulates tumor angiogenesis. Oncogene 25, 1408–1412 (2006).
Article CAS PubMed Google Scholar
Gascue, C., Katsanis, N. & Badano, J. L. Cystic diseases of the kidney: ciliary dysfunction and cystogenic mechanisms. Pediatr Nephrol 26, 1181–1195 (2011).
Article PubMed Google Scholar
Bollee, G. et al. Nephronophthisis related to homozygous NPHP1 gene deletion as a cause of chronic renal failure in adults. Nephrol Dial Transplant 21, 2660–2663 (2006).
Article PubMed Google Scholar
Saunier, S. et al. Characterization of the NPHP1 locus: mutational mechanism involved in deletions in familial juvenile nephronophthisis. Am J Hum Genet 66, 778–789 (2000).
Article CAS PubMed PubMed Central Google Scholar
Konrad, M. et al. Large homozygous deletions of the 2q13 region are a major cause of juvenile nephronophthisis. Hum Mol Genet 5, 367–371 (1996).
Article CAS PubMed Google Scholar
Parisi, M. A. et al. The NPHP1 gene deletion associated with juvenile nephronophthisis is present in a subset of individuals with Joubert syndrome. Am J Hum Genet 75, 82–91 (2004).
Article CAS PubMed PubMed Central Google Scholar
Furey, T. S. et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906–914 (2000).
Article CAS PubMed Google Scholar
Zhang, C., Lu, X. & Zhang, X. Significance of gene ranking for classification of microarray samples. IEEE/ACM Trans Comput Biol Bioinform 3, 312–320 (2006).
Article CAS PubMed Google Scholar
Saeys, Y., Inza, I. & Larranaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007).
Article CAS PubMed Google Scholar
Sharma, A., Imoto, S. & Miyano, S. A top-r feature selection algorithm for microarray gene expression data. IEEE/ACM Trans Comput Biol Bioinform 9, 754–764 (2012).
Article PubMed Google Scholar
Joachims, T. Making large-Scale SVM Learning Practical. in Advances in Kernel Methods - Support Vector Learning (eds. Schölkopf, B., Burges, C. & Smola, A. ) (MIT-Press, 1999).
Seo, J. S. et al. The transcriptional landscape and mutational profile of lung adenocarcinoma. Genome Res 22, 2109–2119 (2012).
Article CAS PubMed PubMed Central Google Scholar
Kim, S. K. et al. A nineteen gene-based risk score classifier predicts prognosis of colorectal cancer patients. Mol Oncol 8, 1653–1666 (2014).
Article CAS PubMed PubMed Central Google Scholar
Yu, K. et al. A precisely regulated gene expression cassette potently modulates metastasis and survival in multiple solid cancers. PLoS Genet 4, e1000129; 10.1371/journal.pgen.1000129 (2008).
Article CAS PubMed PubMed Central Google Scholar
Steffens, S. et al. Clinical behavior of chromophobe renal cell carcinoma is less aggressive than that of clear cell renal cell carcinoma, independent of Fuhrman grade or tumor size. Virchows Arch 465, 439–444 (2014).
Article CAS PubMed Google Scholar
Onishi, T., Ohishi, Y., Goto, H., Suzuki, M. & Miyazawa, Y. Papillary renal cell carcinoma: clinicopathological characteristics and evaluation of prognosis in 42 patients. BJU Int 83, 937–943 (1999).
Article CAS PubMed Google Scholar
Muller, P. A. & Vousden, K. H. p53 mutations in cancer. Nat Cell Biol 15, 2–8 (2013).
Article CAS PubMed Google Scholar
Karakas, B., Bachman, K. E. & Park, B. H. Mutation of the PIK3CA oncogene in human cancers. Br J Cancer 94, 455–459 (2006).
Article CAS PubMed PubMed Central Google Scholar
Kleinerman, R. A. et al. Hereditary retinoblastoma and risk of lung cancer. J Natl Cancer Inst 92, 2037–2039 (2000).
Article CAS PubMed Google Scholar
Menendez, D., Inga, A. & Resnick, M. A. The expanding universe of p53 targets. Nat Rev Cancer 9, 724–737 (2009).
Article CAS PubMed Google Scholar
Leek, J. T. et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11, 733–739 (2010).
Article CAS PubMed Google Scholar
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Article CAS PubMed Google Scholar
Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol 11, R106; 10.1186/gb-2010-11-10-r106 (2010).
Article CAS PubMed PubMed Central Google Scholar
Bodenhofer, U., Kothmeier, A. & Hochreiter, S. APCluster: an R package for affinity propagation clustering. Bioinformatics 27, 2463–2464 (2011).
Article CAS PubMed Google Scholar
Frey, B. J. & Dueck, D. Clustering by passing messages between data points. Science 315, 972–976 (2007).
Article CAS ADS MathSciNet PubMed MATH Google Scholar
Xiong, Q., Mukherjee, S. & Furey, T. S. GSAASeqSP: a toolset for gene set association analysis of RNA-Seq data. Sci Rep 4, 6347; 10.1038/srep06347 (2014).
Article CAS ADS PubMed PubMed Central Google Scholar
Zhang, B., Kirov, S. & Snoddy, J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Res 33, W741–748 (2005).
Article CAS PubMed PubMed Central Google Scholar
Wang, J., Duncan, D., Shi, Z. & Zhang, B. WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013. Nucleic Acids Res 41, W77–83 (2013).
Article PubMed PubMed Central Google Scholar
Kanehisa, M. et al. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42, D199–205 (2014).
Article CAS PubMed Google Scholar
Cerami, E. G. et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res 39, D685–690 (2011).
Article CAS PubMed Google Scholar
Killock, D. Lung cancer: alternative rearrangements–targeting ROS1 in NSCLC. Nat Rev Clin Oncol 11, 624; 10.1038/nrclinonc.2014.180 (2014).
Article PubMed Google Scholar
Bergethon, K. et al. ROS1 rearrangements define a unique molecular class of lung cancers. J Clin Oncol 30, 863–870 (2012).
Article CAS PubMed PubMed Central Google Scholar
Campo, I. et al. A large kindred of pulmonary fibrosis associated with a novel ABCA3 gene variant. Respir Res 15, 43; 10.1186/1465-9921-15-43 (2014).
Article CAS PubMed PubMed Central Google Scholar
Wambach, J. A. et al. Genotype-phenotype correlations for infants and children with ABCA3 deficiency. Am J Respir Crit Care Med 189, 1538–1543 (2014).
Article CAS PubMed PubMed Central Google Scholar
Agrawal, A. et al. An intronic ABCA3 mutation that is responsible for respiratory disease. Pediatr Res 71, 633–637 (2012).
Article CAS ADS PubMed PubMed Central Google Scholar
Gower, W. A. et al. Fatal familial lung disease caused by ABCA3 deficiency without identified ABCA3 mutations. J Pediatr 157, 62–68 (2010).
Article CAS PubMed Google Scholar
Xie, Y. et al. Aquaporin 1 and aquaporin 4 are involved in invasion of lung cancer cells. Clin Lab 58, 75–80 (2012).
CAS PubMed Google Scholar
Warth, A. et al. Loss of aquaporin-4 expression and putative function in non-small cell lung cancer. BMC Cancer 11, 161; 10.1186/1471-2407-11-161 (2011).
Article CAS PubMed PubMed Central Google Scholar
Yang, Q. et al. STAT3 activation and aberrant ligand-dependent sonic hedgehog signaling in human pulmonary adenocarcinoma. Exp Mol Pathol 93, 227–236 (2012).
Article CAS PubMed Google Scholar
Zhou, X. et al. Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP. Hum Mol Genet 21, 1325–1335 (2012).
Article CAS PubMed Google Scholar
Li, X. et al. Importance of hedgehog interacting protein and other lung function genes in asthma. J Allergy Clin Immunol 127, 1457–1465 (2011).
Article CAS PubMed PubMed Central Google Scholar
Jonsson, A. L., Simonsen, U., Hilberg, O. & Bendstrup, E. Pulmonary alveolar microlithiasis: two case reports and review of the literature. Eur Respir Rev 21, 249–256 (2012).
Article PubMed Google Scholar
Ferreira Francisco, F. A., Pereira e Silva, J. L., Hochhegger, B., Zanetti, G. & Marchiori, E. Pulmonary alveolar microlithiasis. State-of-the-art review. Respir Med 107, 1–9 (2013).
Article PubMed Google Scholar
Edmiston, J. S. et al. Gene expression profiling of peripheral blood leukocytes identifies potential novel biomarkers of chronic obstructive pulmonary disease in current and former smokers. Biomarkers 15, 715–730 (2010).
Article CAS PubMed Google Scholar
Hawkins, G. A. et al. The IL6R variation Asp(358)Ala is a potential modifier of lung function in subjects with asthma. J Allergy Clin Immunol 130, 510–515 e511 (2012).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

The results published here are based on data generated by the TCGA Research Network: http://cancergenome.nih.gov/. This work was supported by the Open Fund of State Key Laboratory of Silkworm Genome Biology (sklsgb2013005) (Q.X.).

Author information

Authors and Affiliations

State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, 400715, China
Li Peng & Qing You Xia
Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Third Military Medical University, Chongqing, 400038, China
Xiu Wu Bian & Chuan Xu
Department of Computer Science and Technology, Department of Statistics, Southwest University, Chongqing, 400715, China
Di Kang Li & Qing Xiong
Department of Oncology, Chengdu Military General Hospital of PLA, Chengdu, 610083, China
Chuan Xu
Department of Pathology, Clinical School, Dali University, Dali, 671000, China
Guang Ming Wang

Contributions

Q.X. designed the study. L.P., D.K.L. and Q.X. performed the experiments. L.P., X.W.B., D.K.L., C.X., G.M.W., Q.Y.X. and Q.X. contributed to the interpretation of the results and to the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Tables S1-S5

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

About this article

Cite this article

Peng, L., Bian, X., Li, D. et al. Large-scale RNA-Seq Transcriptome Analysis of 4043 Cancers and 548 Normal Tissue Controls across 12 TCGA Cancer Types. Sci Rep 5, 13413 (2015). https://doi.org/10.1038/srep13413

Received: 11 December 2014
Accepted: 27 July 2015
Published: 21 August 2015
DOI: https://doi.org/10.1038/srep13413
