Pan-cancer analysis of somatic mutations and transcriptomes reveals common functional gene clusters shared by multiple cancer types

To discover functional gene clusters across cancers, we performed a systematic pan-cancer analysis of 33 cancer types. We identified genes that were associated with somatic mutations and were the cores of a co-expression network. We found that individual cancer types have relatively exclusive hub genes; however, these hub genes cooperate with each other through their functional relationships. We built a protein-protein interaction network of the hub genes and found nine functional gene clusters across cancer types. The clusters partitioned not only the regions of the network map but also the functions of the network, each having a distinct role related to the development and progression of cancer. This functional relationship between the clusters and cancers was supported by the high expression of module genes and by enrichment of programmed cell death and known candidate cancer genes. In addition to protein-coding hub genes, non-coding hub genes had a possible relationship with cancer. Overall, our approach to investigating cancer genes identified pan-cancer hub genes and common functional gene clusters shared by multiple cancer types, based on the expression status of the primary tumour and the functional relationships of genes in the biological network.

Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S2.7. Sample clustering to detect outliers.
Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S2.14. Sample clustering to detect outliers.
Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S2.20. Sample clustering to detect outliers.
Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S2.24. Sample clustering to detect outliers.
Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S2.28. Sample clustering to detect outliers.
Sample clustering is shown as a dendrogram. The arbitrary cut-off value used to exclude outliers is shown as a red horizontal line. The name of the TCGA dataset is shown in the top right corner.
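
The outlier screen described in the sample-clustering captions can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the captions do not name the distance or linkage, so Euclidean distance and average linkage are assumed here, and samples that fall outside the largest cluster below the cut-off are treated as outliers. All function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two sample expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def clusters_below_cut(samples, cut_height):
    """Average-linkage clustering, stopped once the closest pair of
    clusters is farther apart than cut_height. Because average-linkage
    merge heights never decrease, this matches cutting the dendrogram
    at that height."""
    clusters = [[i] for i in range(len(samples))]

    def avg_dist(c1, c2):
        total = sum(euclidean(samples[i], samples[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters (distance, index a, index b).
        d, a, b = min(
            (avg_dist(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > cut_height:
            break
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters


def find_outliers(samples, cut_height):
    """Assumption: every sample outside the largest cluster is an outlier."""
    clusters = clusters_below_cut(samples, cut_height)
    largest = max(clusters, key=len)
    return sorted(i for c in clusters if c is not largest for i in c)


# Hypothetical toy data: four similar samples and one distant sample.
toy_samples = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.0], [8.0, 9.0]]
print(find_outliers(toy_samples, cut_height=3.0))  # -> [4]
```

Raising the cut-off high enough merges every sample into a single cluster, in which case no sample is flagged.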

Supplementary Figure S3.2. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.3. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.4. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.5. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.6. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.7. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.8. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.9. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.10. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.11. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.12. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.14. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.15. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.16. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.17. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.18. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.19. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.20. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.21. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.22. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.23. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.24. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.25. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.26. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.27. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.28. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.29. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.30. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.31. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.32. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S3.33. Network topology analysis of soft-thresholding powers.
The scale-free fit (y-axis) against the soft-thresholding power (x-axis) is shown in the left panel, and the fit value (0.9) is shown as a red horizontal line. The mean connectivity (y-axis) against the soft-thresholding power (x-axis) is shown in the right panel. The name of the TCGA dataset is shown in the top right corner.
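
The two quantities plotted in the network topology figures can be sketched as below. This is a simplified illustration under assumptions, not the authors' code: the adjacency is taken as the absolute Pearson correlation raised to the soft-thresholding power, the scale-free fit is an unsigned R² from a straight-line fit of log10 p(k) against log10 k over equal-width bins, and the smallest power reaching the 0.9 fit value is chosen. The function names, bin count, and random toy expression matrix are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def abs_correlations(expr):
    """|correlation| for every gene pair: {(i, j): |r|} with i < j."""
    n = len(expr)
    return {(i, j): abs(pearson(expr[i], expr[j]))
            for i in range(n) for j in range(i + 1, n)}


def connectivity(abs_cor, n_genes, power):
    """k_i = sum over j != i of |cor(i, j)| ** power."""
    k = [0.0] * n_genes
    for (i, j), r in abs_cor.items():
        a = r ** power
        k[i] += a
        k[j] += a
    return k


def scale_free_r2(k, n_bins=10):
    """Unsigned R^2 of a linear fit of log10 p(k) against log10 k."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins
    if width == 0:
        return 0.0
    counts, sums = [0] * n_bins, [0.0] * n_bins
    for v in k:
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b] += 1
        sums[b] += v
    # One point per non-empty bin: (log10 mean k, log10 fraction of genes).
    points = [(math.log10(s / c), math.log10(c / len(k)))
              for s, c in zip(sums, counts) if c > 0 and s > 0]
    if len(points) < 3:
        return 0.0
    xs, ys = zip(*points)
    r = pearson(xs, ys)
    return min(1.0, r * r)


def topology_table(expr, powers):
    """Rows of (power, scale-free R^2, mean connectivity)."""
    abs_cor = abs_correlations(expr)
    rows = []
    for p in powers:
        k = connectivity(abs_cor, len(expr), p)
        rows.append((p, scale_free_r2(k), sum(k) / len(k)))
    return rows


def pick_power(rows, threshold=0.9):
    """Smallest power whose fit reaches the threshold, else None."""
    for p, r2, _ in rows:
        if r2 >= threshold:
            return p
    return None


# Hypothetical toy data: 30 genes x 20 samples sharing one latent factor.
random.seed(0)
factor = [random.gauss(0, 1) for _ in range(20)]
toy_expr = []
for _ in range(30):
    w = random.uniform(0, 2)
    toy_expr.append([w * f + random.gauss(0, 1) for f in factor])

rows = topology_table(toy_expr, powers=range(1, 11))
for p, r2, mean_k in rows:
    print(p, round(r2, 3), round(mean_k, 3))
print("chosen power:", pick_power(rows))
```

Because every absolute correlation is at most 1, the mean connectivity falls as the power increases, which is the trade-off the right-hand panels display against the fit in the left-hand panels.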

Supplementary Figure S4.2. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.3. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.4. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.5. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.6. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.7. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.9. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.10. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.11. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.12. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.13. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.14. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.15. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.16. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.18. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.19. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.20. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.

Supplementary Figure S4.31. Dendrogram of weighted gene co-expression network and module colors.
Genes were clustered by topological overlap matrix (TOM)-based dissimilarity and form the branches of the dendrogram; genes with high inter-connectivity are grouped into the same module. Genes shown in grey are not assigned to any module. The name of the TCGA dataset is shown in the top right corner.
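
The TOM-based dissimilarity behind these dendrograms can be sketched as follows. The captions do not state which TOM variant was used, so this assumes the commonly used unsigned definition, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), and leaves out the hierarchical clustering and module detection that follow. The function name and the toy adjacency matrix are hypothetical.

```python
def tom_dissimilarity(adj):
    """Return 1 - TOM for a symmetric adjacency matrix with entries in [0, 1].

    l_ij = sum over u not in {i, j} of a_iu * a_uj  (shared neighbours)
    k_i  = sum over u != i of a_iu                  (connectivity)
    Diagonal entries of adj are ignored.
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    diss = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u not in (i, j))
            tom = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
            diss[i][j] = diss[j][i] = 1.0 - tom
    return diss


# Hypothetical toy adjacency: genes 0-2 are tightly connected, gene 3 is not.
toy_adj = [
    [1.00, 0.90, 0.80, 0.10],
    [0.90, 1.00, 0.85, 0.05],
    [0.80, 0.85, 1.00, 0.10],
    [0.10, 0.05, 0.10, 1.00],
]
diss = tom_dissimilarity(toy_adj)
print(round(diss[0][1], 3), round(diss[0][3], 3))
```

In the toy matrix the dissimilarity between genes 0 and 1 is much smaller than between genes 0 and 3, so a clustering on this matrix would place the first three genes on one branch and leave the fourth apart.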

Supplementary Figure S6. Number of protein-coding and non-coding genes in pan-cancer-wide selected genes.
The lower panel shows the number of protein-coding and non-protein-coding PSGs, and the upper panel shows their proportions, for each TCGA dataset. Dark blue represents protein-coding PSGs and light blue represents non-protein-coding PSGs.

Supplementary Figure S7. Distribution of PPIs for protein-coding PSGs and genes of interaction.
PSGs with PPI information were designated 'representors' (light red), and genes that interacted with a representor were designated 'interactors' (light blue). Node density (y-axis) is plotted against node degree (x-axis).
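
The representor and interactor degree distributions can be sketched as follows. This is a minimal illustration with assumptions: the PPI network is treated as undirected, duplicate edges and self-interactions are dropped, and the relative frequency of each degree stands in for the plotted density. The function name and the gene labels G1 to G8 are hypothetical placeholders, not real interactions.

```python
from collections import Counter


def degree_distributions(edges, psgs):
    """Relative frequency of node degree for representors and interactors.

    edges: iterable of (gene_a, gene_b) interactions.
    psgs:  set of selected genes; those present in the network are
           representors, their non-PSG partners are interactors.
    """
    # Undirected network without duplicate edges or self-interactions.
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter(node for pair in pairs for node in pair)

    representors = {n for n in degree if n in psgs}
    interactors = {n for pair in pairs if pair & representors
                   for n in pair} - representors

    def freq(nodes):
        counts = Counter(degree[n] for n in nodes)
        return {d: c / len(nodes) for d, c in sorted(counts.items())}

    return {"representor": freq(representors), "interactor": freq(interactors)}


# Hypothetical toy network; G6 is a PSG with no PPI information.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"),
             ("G2", "G3"), ("G5", "G4"), ("G7", "G8")]
toy_psgs = {"G1", "G5", "G6"}
print(degree_distributions(toy_edges, toy_psgs))
# -> {'representor': {1: 0.5, 3: 0.5}, 'interactor': {2: 1.0}}
```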

Supplementary Figure S9. The number of shared genes between subnetworks.
The crescent-shaped lines show the number of genes shared between clusters. Line width and cluster size reflect the number of genes.
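
The counts behind such a figure can be sketched as pairwise set intersections. This is a minimal illustration that assumes each cluster is available as a set of gene identifiers; the function name, cluster labels, and gene labels are hypothetical.

```python
from itertools import combinations


def shared_gene_counts(clusters):
    """Number of genes shared by each pair of clusters.

    clusters: dict mapping a cluster label to a set of gene identifiers.
    Returns {(label_a, label_b): count} with label_a < label_b.
    """
    return {(a, b): len(clusters[a] & clusters[b])
            for a, b in combinations(sorted(clusters), 2)}


# Hypothetical toy clusters.
toy_clusters = {
    "C1": {"A", "B", "C"},
    "C2": {"B", "C", "D"},
    "C3": {"E"},
}
print(shared_gene_counts(toy_clusters))
# -> {('C1', 'C2'): 2, ('C1', 'C3'): 0, ('C2', 'C3'): 0}
```

In a plot of this kind, each count would set the width of the line joining two clusters, and the size of each set would set the size of its cluster.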