Convergent Evolution of Copy Number Alterations in Multi-Centric Hepatocellular Carcinoma

In the recent years, new molecular methods have been proposed to discriminate multicentric hepatocellular carcinomas (HCC) from intrahepatic metastases. Some of these methods utilize sequencing data to assess similarities between cancer genomes, whilst other achieved the same results with transcriptome and methylome data. Here, we attempt to classify two HCC patients with multi-centric disease using the recall-rates of somatic mutations but find that difficult because their tumors share some chromosome-scale copy-number alterations (CNAs) but little-to-no single-nucleotide variants. To resolve the apparent conundrum, we apply a phasing strategy to test if those shared CNAs are identical by descent. Our findings suggest that the conflicting alterations occur on different homologous chromosomes, which argues for multi-centric origin of respective HCCs.

Hepatocellular carcinomas (HCC) with intra-hepatic metastases (IM) are profoundly different in their development and clinical outcome from multi-centric tumors 1 . The clinical discrimination between these subtypes has historically been challenging and is usually based on tumor location, blood vessel involvement of the primary tumor, and/or hemodynamics in CT/MRI imaging before resection 2 . Once tumors are resected, pathological evaluation is performed to discriminate between the IMs and MCs. However, there is no consensus at present on how to discriminate the two and it is advised to use complementary molecular tests to reduce the uncertainty. For example, in a recent study Furuta et al. showed similarities between genomes of multi-centric tumors using recall-rates of somatic alterations 3 , while others achieved the same with transcriptome and methylome data 4,5 . In this study, we extend Furuta's strategy by accounting for convergent evolution of copy-number alterations (CNA) in multi-centric tumors that can, in certain situations, complicate data interpretation and clinical reasoning.

Results
We generated testing data from multiple-regions of chemotherapy-naïve multi-centric HCCs of two HBV/ HCV-negative patients (Table 1). In each patient, somatic mutations were identified from multi-region exome sequencing data from three synchronous tumors (Fig. 1A). Prior to the analysis, sequencing reads were mapped onto the human genome hs37d5 and GATK haplotype caller was used to identify single-nucleotide (SNV) mutations and indels simultaneously. On average, 42.8/21.0 somatic SNVs and 106/15.3 indels were detected per tumor region per patient (Fig. 1B, Supplementary Table 2), of which six and four variants per patient were cancer driver mutations (Fig. 1C). Pairwise comparison of somatic SNVs across all regions of a given tumor yielded average recall rates of 65% and 85% per patient, while across different tumors 10.9% and 7.3% ( Fig. 2; Supplementary Tables 3-6).
We therefore ensured that there was not a significant enrichment of false positive variants in our analysis by replicating 21/23 (91%) selected somatic mutations and re-assessing hematoxylin-eosin stains of tumor tissues to ensure that they contained at least 80% of tumor cells (Supplementary Table 7). We then undertook a down-sampling approach to equalize read depth across all tumor regions and repeated the analysis with additional cancer-specialized calling algorithms, including MuTect2 6 (i.e. GATK4 implementation of MuTect) and Strelka 7 . Nevertheless, read-depth equalizing or the use of different variant calling algorithms did not improve the recall rates and a gap between the degree of inter-and intra-tumor heterogeneity remained.
Most somatic SNVs occurred at low allelic frequencies irrespectively of whether they were recalled in different regions of the same tumor or not ( Fig. 3; Supplementary Tables 8-9). Since cancer driver mutations were no exception and thus no clearly clonal candidate was found, we wondered if some other type of mutations  www.nature.com/scientificreports www.nature.com/scientificreports/ could help us to classify multi-centric tumors of our patients. For example, there were four possibly identical copy-number alterations (CNA; Fig. 4) occurring in two tumors of each patient. To determine if these CNAs were not identical by descent, we used a combination of breakpoint analysis and chromosome phasing. The specific phasing algorithm is detailed in material and methods and on a Fig. 4 and is, at heart, a Welch's two sample t-test that compares allele-frequency changes at polymorphic sites of candidate CNAs of tumor samples and a tumor-free tissue, on the assumption that allele frequencies for each chromosomal haplotype are random variables with mean μ and variance σ 2 . Our null hypothesis was that CNAs occurred at the same homologous chromosome in different tumors. After reviewing each case, the null hypothesis was rejected on a significance level of 0.05 for cases of CNAs of chromosome 21q, 8q and 17p, whilst accepted for the case of 6q. In this case, however, we used the breakpoint information to argue that 6qter alterations were independent mutation events.

Discussion
In this study, we attempted to classify multi-centric hepatocellular carcinomas using the information that may be not be available for strategies that either compare tumor morphology or mutation burden. Especially, the use of the latter as proposed by Furuta et al. can be sometimes misleading because the assignment of single or multiple origin category is possible only for tumors which share most or no somatic mutations, but otherwise there is no clear threshold to distinguish them. Given that cancer driver mutations can occur in cells that do not undergo malignant transformation 8,9 , it could well be that multi-centric tumors originating independently can be difficult to distinguish from IMs that have arisen very early 10 . It is also possible that these tumors behave www.nature.com/scientificreports www.nature.com/scientificreports/ differently in development and clinical outcome from IMs that originated late in progression and are classified as true IMs. Thus, if disease prognosis and/or clinical classification of IMs is the end-point of an analysis, molecular strategies utilizing somatic mutations may not be precise enough to distinguish the differing clinical categories. Whilst our study illustrates how refinements of the existing methods can be used for better data interpretation, it also outlines some of the greater limitations that must be overcome before molecular strategies are implemented clinically.  Table 1). Sequencing data are freely available from the authors after a reasonable request. semiconductor sequencing. Ion torrent sequencing was selected for technical replication of somatic variants. Used in conjunction with the AmpliSeq Library Kit 2.0, the Ion AmpliSeq ™ Comprehensive Cancer Panel was selected to capture exons of 409 genes from the Cancer Gene Census database and the resulting libraries were ran on the Ion PGM ™ Sequencer. The raw reads were processed using the Ion Reporter software with recommended settings. sanger sequencing. Following PCR amplification, Sanger sequencing was used to detect TERT promoter mutations g.5:1295228 G > A/T and g.5:1295250 G > A. The primer sequences were 5′-CCAGGGCTTCCCACGTGC-3′ for the forward primer and 5′-ACTGGGGACCCGGGCACC-3′ for the reverse primer.

Methods
Variant Detection and Filtering. Raw sequencing reads were quality-checked (fastqc ver. 0.11.7), adapter-trimmed, duplicate-removed (Picard tools ver. 2.9) and mapped onto the hs37d5 version of the human genome (BWA ver. 0.7). The GATK pipeline (ver. 3.7) was used to perform base-quality score recalibration and variant calling. Specifically, we used GATK haplotype caller algorithm with standard settings, followed by Variant Quality Score Recalibration (VQSR) for single-base substitution identification and the Scalpel algorithm (ver. 0.5.3) for indels. Variant files containing too few variants were specifically filtered (QD > 10.0, MQ > 40.0, FS < 30.0, SOR < 3.0, MQRankSum > −12.5, ReadPosRankSum > −8.0) to extract high-quality variants. We used the human genome assembly hs37d5 and 2017 versions of ANNOVAR databases to annotate variants. Germline or somatic origin of the variants and indels were determined based on their presence or absence in the matched tumour-free tissue.
We applied the following exclusion filters to somatic variants: (i) presence in a segmental duplication region; (ii) variant present in any read from paired normal sample; (iii) fewer than ten reads in total at the variant site in the normal sample; (iv) fewer than eight reads in total in the tumor; (v) fewer three variant reads in the tumor; variant allele frequency <3% in the tumor; and (vi) presence of variant in the Exome Aggregation Consortium dataset (release 22.6.2017) at a frequency >2%. Variants identified in constitutional DNA from any of the other local, non-cancer sequencing project at a frequency of 5% (for example, 29 million variants across 284 samples www.nature.com/scientificreports www.nature.com/scientificreports/ from the Oxford-Illumina WGS500 consortium) were discarded as being more likely due to systematic error in our pipeline than genuine somatic mutations.
Copy Number Calling. Nexus Copy Number Discovery software (ver. 9.0) was used to identify CNAs from sequencing data and for independent validation of these alterations with Affymetrix Oncoscan arrays. The raw