Functional Genomic Complexity Defines Intratumor Heterogeneity and Tumor Aggressiveness in Liver Cancer

Chronic inflammation and chromosome aneuploidy are major traits of primary liver cancer (PLC), which represent the second most common cause of cancer-related death worldwide. Increased cancer fitness and aggressiveness of PLC may be achieved by enhancing tumoral genomic complexity that alters tumor biology. Here, we developed a scoring method, namely functional genomic complexity (FGC), to determine the degree of molecular heterogeneity among 580 liver tumors with diverse ethnicities and etiologies by assessing integrated genomic and transcriptomic data. We found that tumors with higher FGC scores are associated with chromosome instability and TP53 mutations, and a worse prognosis, while tumors with lower FGC scores have elevated infiltrating lymphocytes and a better prognosis. These results indicate that FGC scores may serve as a surrogate to define genomic heterogeneity of PLC linked to chromosomal instability and evasion of immune surveillance. Our findings demonstrate an ability to define genomic heterogeneity and corresponding tumor biology of liver cancer based only on bulk genomic and transcriptomic data. Our data also provide a rationale for applying this approach to survey liver tumor immunity and to stratify patients for immune-based therapy.

Science Lab, National Cancer Institute, National Institute of health, MD 20892, USA *Corresponding author: xw3u@nih.gov

This PDF file includes:
Supplementary Materials and Methods Figure S1 to S15 Figure S1. Distribution of global correlation coefficient in PLC. Figure S2. Gene Ontology (GO) of PCC associated genes. Figure S3. Association of PCC with CIN and GIN. Figure S4. Association of amplified or deleted CIN (CINampl or CINdel) with PCC. Figure S5. Gene Ontology (GO) of tFA associated genes. Figure S6. Collective association among PCC, CIN, and tFA. Figure S7. Validation of FGC in independent cohorts. Figure S8. Comparison of tFA between HFGC and LFGC. Figure S9. Comparison of SCNA in HFGC and LFGC. Figure S10. Differentially expressed genes (DEG) between HFGC and LFGC of TCGA. Figure S11. Association of PCC with immune cytolytic activity in Thai PLC. Figure S12. Gene Set Enrichment Analysis of HFGC and LFGC of Thai PLC. Figure S13. Integrative analysis based on PCC showed TP53 as a Cancer functional genomic complexity (FGCs) driver. Figure S14. Comparison of TIL subpopulations between HFGC and LFGC. Figure S15. Association of FGC with immunomodulators

Data preprocessing
Transcriptome data and copy number data from TIGER-LC cohort were processed as

Calculation of global correlation
The global correlation coefficients and global correlation p-value based on the total transcriptome probes and corresponding genomic segments were calculated. For this, SCNA value for genomic segments corresponding to the 64,597 transcript probes was assigned using the GenomicRanges R packages. Hereafter, CN denotes copy number value and EXP denotes mRNA expression value.

CNV EXP
Where cnm represents the SCNA value of n th sample corresponding m th feature.
Matched features are expressed below. LOH and allelic specific copy number LOH (Loss of Heterozygosity) for each sample was inferred using Genotyping Console 4.0 and the output CHP file was used as an input file to calculate allele-specific copy numbers using the Partek Genomics Suite 7.5. By merging the copy number of the segmented region defined by an algorithm in Partek and LOH data, the allele-specific copy number was estimated. For further analysis, we calculated the proportion of sample with allele-specific copy number change in each segmented region.

The biological relevance of PCC or tFA associated genes
To examine if PCC or tFA was associated with the biological process, we performed a correlation analysis between PCC and all the transcriptome features. Positively or negatively associated genes were selected based on the correlation estimate and p-value (above top 5% or below the bottom 5% of correlation estimate and p-value < 0.05). Gene ontology enrichment analysis was performed using R package gProfileR based on GO: BP.

Gene Set Enrichment Analysis (GSEA) and single-sample GSEA (ssGSEA)
GSEA was implemented in GenePattern 8 based on the C5 GO gene set of biological process, C2 curated gene sets of KEGG pathway, and C6 Oncogenic gene sets in Molecular Signatures Database (MSigDB database v5.2). Expression data of individual samples were transformed into the gene set enrichment score P-value from the Kolmogorov-Smirnov (ks) test was used a single sample enrichment score.

Measurement of chromosomal instability
To infer chromosomal instability, we devised two indicators, one is based on the Therefore, the tFA is a total summarized level of chromosomal aberration in a given tumor in a univariate measure.

Differentially Expressed Genes (DEG) and Gene Ontology analysis
By comparing the expression between HFGC and LFGC of each tumor type, we selected differentially expressed genes in each subtype based on the fold change and permutation p-value from the permutation t-test with 1,000 resamplings (FC >0.5 or -0.5 & perm p-value <0.005). Gene ontology enrichment analysis was performed based on the DAVID 6.7 10 .

Immune score
An estimation of the relative fractions of immune/inflammatory cell subsets from tissue expression profiles of Thai HCC, iCCA or TCGA HCC was conducted using CIBERSORT 11 . The gene expression data was converted by quantile normalization of the log2 scaled expression matrix and relative fractions of leukocytes were quantified according to the website (https://cibersort.stanford.edu/index.php) with implemented analyses using the built-in LM22 signature matrix (LM22 and "Non-Responder" as followed; "Responder" indicates those who marked as "Complete Response" or "Partial Response", while "Non-Responder" indicates those who marked as "Clinical Progressive Disease" or "Stable Disease" according to the response column of the clinical data.

Statistical Analyses
Kaplan     Chromosomal arms are shown with respect to the frequency of arm-level gain (x-axis) and loss (y-axis), respectively. As a frequency measure, Z score from GISTIC output was used. Vertical dotted blue lines indicate Z score of the arm-level gain frequency is 1 and horizontal dotted blue lines indicate Z score of the arm-level loss frequency is 1. The arms with many gains and many losses or with few gains or few losses were highlighted in red or blue colors, respectively.