Hepatocellular carcinomas (HCCs) are liver tumors related to various etiologies, including alcohol intake and infection with hepatitis B (HBV) or C (HCV) virus. Additional risk factors remain to be identified, particularly in patients who develop HCC without cirrhosis. We found clonal integration of adeno-associated virus type 2 (AAV2) in 11 of 193 HCCs. These AAV2 integrations occurred in known cancer driver genes, namely CCNA2 (cyclin A2; four cases), TERT (telomerase reverse transcriptase; one case), CCNE1 (cyclin E1; three cases), TNFSF10 (tumor necrosis factor superfamily member 10; two cases) and KMT2B (lysine-specific methyltransferase 2B; one case), leading to overexpression of the target genes. Tumors with viral integration mainly developed in non-cirrhotic liver (9 of 11 cases) and without known risk factors (6 of 11 cases), suggesting a pathogenic role for AAV2 in these patients. In conclusion, AAV2 is a DNA virus associated with oncogenic insertional mutagenesis in human HCC.
HCC is predominantly related to HBV or HCV infection, high levels of alcohol consumption and metabolic syndromes that cause chronic liver injury and the development of cirrhosis. Although HCC occurs following cirrhosis in the majority of cases, a subset of tumors (around 5% of all cases) develop in non-fibrotic livers without identified risk factors1. Irrespective of etiology, HCC results from an accumulation of genetic or epigenetic alterations, with frequent recurrent somatic mutations in TERT, CTNNB1, TP53, ARID1A, ARID2 and RPS6KA3 (refs. 2,3,4,5,6). In HBV-infected patients, specific oncogenic mechanisms are also related to insertional mutagenesis from the integration of viral DNA in cancer driver genes, with TERT, CCNE1 and KMT2B (also known as MLL4) frequently targeted7,8.
Recently, TERT promoter mutations were identified as the most frequent and earliest recurrent somatic genetic alterations in HCC4,5,6,9. These mutations are known to activate telomerase, a key enzyme for telomere maintenance that is known to be essential for malignant transformation4,6,9,10. While screening a series of 150 HCCs for TERT promoter mutations, we identified a 208-bp insertion of AAV2. The viral sequence was inserted 187 bp upstream of the start codon of TERT, accompanied by a 16-bp deletion of human genome sequence and a 7-bp insertion of undetermined human or viral origin (Fig. 1a).
AAVs, which are members of the parvovirus group, have single-stranded, linear DNA genomes11. They are defective viruses that require coinfection with adenovirus for productive infection12,13. Early studies showed that, during infection, the AAV2 DNA integrates into the human genome, where it remains quiescent until a new parvovirus infection occurs12,14,15. To investigate the functional consequences of viral integration, we generated a construct based on the pGL3 reporter vector, reproducing the exact AAV2 inserted sequence. In two different liver cell lines, the AAV2 insertion significantly (P < 0.05) increased TERT promoter activity in comparison to the promoter with a scrambled sequence integrated and the wild-type promoter (Fig. 1b and Supplementary Fig. 1). We observed similar activation through the introduction of the two classical hotspot somatic mutations at positions –124 and –146 bp with respect to the start codon (Supplementary Fig. 2). Accordingly, TERT mRNA was overexpressed in the tumor with the insertion (tumor/normal expression ratio = 18), as was observed in more than 90% of the HCCs.
Next, we searched for the presence of AAV2 DNA by PCR in 150 HCCs and their matched non-tumor liver tissue samples, covering the entire viral genome in 9 fragments (Supplementary Fig. 3 and Supplementary Table 1). We identified AAV2 amplicons in 33 (21%) non-tumor liver tissues and 11 (7%) HCCs. Subsequent AAV2 capture and deep sequencing for 43 paired tumor and non-tumor samples identified AAV2 reads in 7 tumors and 20 non-tumors, all of which were positive for AAV2 by PCR (see Supplementary Fig. 4 for a flowchart of the experimental design). All 11 cases that displayed a positive signal in PCR but no AAV2 reads in the deep sequencing analysis corresponded to samples with a unique small AAV2 fragment amplified by PCR. The number of captured AAV2 sequence reads was higher in tumors than in the non-tumor tissues (Fig. 1c). Chimeric human-viral reads were identified in both tumors and non-tumor tissues (Fig. 1c); however, a large number of reads (ranging from 53 to 1,460) at AAV2 integration sites were identified only in 7 tumors within 4 different genes (TERT, CCNA2, CCNE1 and TNFSF10; Fig. 1d). At these loci, we found independent clusters of sequences demonstrating the clonal nature of the AAV2 insertion in each tumor (Fig. 1d and Supplementary Fig. 5). In contrast, non-clonal AAV2 insertions were distributed throughout the genome in non-tumor samples, without enrichment of chromosome 19, as previously described in cell lines14,16 (Fig. 1d). Next, we analyzed whole-exome sequencing data for 43 additional HCCs6 and identified AAV2 insertions in CCNE1, TNFSF10 and KMT2B in 4 additional tumors but in none of their corresponding non-tumor liver tissues (Supplementary Fig. 4). Overall, we identified 11 of 193 HCCs with clonal AAV2 insertion validated by individual PCR amplification and Sanger sequencing (see Figs. 2,3,4, Table 1 and Supplementary Tables 1 and 2 for descriptions of the patients).
CCNA2 was targeted by AAV2 insertion in four cases at different integration sites clustered within 1 kb in intron 2. The insertions ranged from 238 to 1,975 bp in length, occurring in both orientations (Fig. 2a). CCNA2 encodes cyclin A2, which is commonly overexpressed in malignant tumors and controls the cell cycle during the G1/S and G2/M transitions17,18. The four tumors with AAV2 insertion showed CCNA2 overexpression similar to what was observed in 75% of HCCs, in contrast to the low-level expression in non-tumor liver tissues (Fig. 2b). In the three tumors analyzed by RNA sequencing (RNA-seq), we observed overexpression of mature spliced CCNA2 mRNAs but no mature in-frame chimeric human-viral transcripts (Fig. 2c). However, two cases with AAV2 inserted in a 5′–3′ orientation showed high levels of immature chimeric RNAs including intronic sequences and multiple stop codons. In CHC2206T, the immature mRNA ended at the classical AAV2 polyadenylation site at position 4,424, and in CHC313T the immature mRNA ended at a new viral polyadenylation site created at position 4,315 by nucleotide substitutions. Additionally, in CHC313T, 5′ donor splice sites (positions 2,867 and 4,354 in AAV2) were identified (Fig. 2c).
In two HCC cases, AAV2 integrated sequences (301 and 210 bp) mapped within the 3′ UTR of TNFSF10, 255 bp and 132 bp downstream of the stop codon, respectively (Fig. 3a). TNFSF10 encodes TRAIL, the tumor necrosis factor–related apoptosis-inducing ligand that activates a proapoptotic pathway but also promotes cell survival via activation of the nuclear factor (NF)-κB, phosphoinositide 3-kinase (PI3K) and MAPK signaling pathways19. Both AAV2 insertions occurred in the 5′–3′ orientation, associated with high overexpression of TNFSF10 mRNA in comparison to samples lacking an insertion (Fig. 3b). RNA-seq analysis identified a large number of well-spliced transcripts ending prematurely at the viral polyadenylation site at position 4,424 (Fig. 3c). We cloned the AAV2 insertions identified in CHC1602T and CHC2557T in the 3′ UTR of TNFSF10 into the pmirGLO vector and showed increased luciferase activity in comparison to the wild-type 3′ UTR and the 3′ UTR with integration of a scrambled sequence (Fig. 3d). Overall, these results suggest that TNFSF10 overexpression results from the inserted alleles and is caused by the AAV2 sequence itself through a mechanism that remains to be elucidated.
In three tumors, AAV2 insertion occurred in CCNE1 (Fig. 4), which encodes cyclin E1, a protein involved in cell proliferation and the G1/S transition17,18. In CCNE1, we identified AAV2 integrations of 221, 258 and 368 bp within 15 kb of the 5′ UTR, exon 1 and intron 4, respectively. The insertion in exon 1 occurred in the 5′–3′ orientation, whereas the two other insertions were in the opposite orientation (Fig. 4a). In all tumors with an insertion in this gene, we observed high overexpression of CCNE1 in comparison to tumors without an insertion and non-tumor liver samples (tumor/normal expression ratio = 470; Fig. 4b). RNA-seq analysis of the CHC2208T and CHC1591T tumor samples with AAV2 insertions in intron 4 showed overexpression of mature wild-type CCNE1 transcripts without expression of chimeric human-viral transcript (Supplementary Fig. 6).
The last case of AAV2 insertion occurred in exon 3 of KMT2B, encoding MLL4, a histone methyltransferase frequently mutated in HCC and prone to HBV insertion (Fig. 4c and Supplementary Fig. 7)6,7,20. Quantitative PCR showed high expression of KMT2B in the tumor for CHC1185T that displayed an AAV2 insertion in exon 3 of KMT2B, as compared to the normal sample (tumor/normal expression ratio = 8.5; Fig. 4d). No RNA-seq data were available.
Notably, in nearly all cases, only a small part of the AAV2 genome was inserted, including a portion of the 3′ inverted terminal repeat (ITR), identified as the smallest commonly inserted region in 10 of the 11 cases (Fig. 5a). The AAV2 ITR, a sequence mandatory for viral insertion in the genome, is composed of seven regions (A, A′, B, B′, C, C′ and D) that can form secondary structures in two configurations (flip and flop)21. In two cases, the integrated viral sequences resulted from an insertion in the flip conformation (with a CC′:BB′ junction), whereas the insertion was in the flop conformation (with a BB′:CC′ junction) in the other eight cases (Fig. 5b). Additionally, we observed small deletions (from 7 to 22 bp) of the human genome sequence without any distinct nucleotide motif at the insertion points. Alignment of the inserted AAV2 sequences showed 96–99% homology with the reference AAV2 sequence, with recurrent nucleotide substitutions observed at eight positions (Fig. 5a and Supplementary Figs. 8 and 9).
Correlations with clinical and histological characteristics and the genomic features identified by exome sequencing6 in the entire series of 193 tumors (Table 1 and Supplementary Table 1) showed more frequent AAV2 insertion in HCCs arising in non-fibrotic livers (8 of the 11 HCCs with METAVIR score of F0 or F1) and without known risk factors (6 of the 11 HCCs) and in patients younger than 60 years of age (Table 1). Interestingly, none of the 11 HCCs harbored TERT promoter mutations as compared to a frequency of 70% for mutation carriers in tumors lacking AAV2 insertion. Only two tumors (CHC1602T and CHC2208T) showed focal amplification of TERT6. Because TERT promoter mutations are early events during hepatocarcinogenesis, these results suggest that AAV2-associated HCCs are caused by different mechanisms of carcinogenesis at early stages, as in HBV-related HCC7,22. HCCs with AAV2 insertion were also less frequently mutated for CTNNB1 but enriched in PTEN and AXIN1 alterations, without significant differences in the total number of mutations or chromosome alterations per tumor in comparison to tumors without insertions6 (Table 1).
Here we show that, as with human papillomavirus, Merkel cell polyomavirus and HBV, AAV2 infection induces insertional mutagenesis in tumors23. Strikingly, four of the genes targeted by AAV2 integration, namely CCNA2, CCNE1, KMT2B and the TERT promoter, have also been described as cancer driver genes, recurrently targeted by HBV integration in HCC at the same hotspots (Supplementary Fig. 7)7,8,24. All tumors displayed overexpression of the gene targeted by AAV2 insertion, and, interestingly, the common AAV2 insertion region includes the A/D sequence of the 3′ ITR, which has potential transcriptional activity or could act as a cryptic promoter13,25,26. Our in cellulo models of AAV2 insertion in the TERT promoter and the 3′ UTR of TNFSF10 suggest that inserted AAV2 sequences can modulate the transcription of these target genes. Interestingly, in CCNA2, the four AAV2 insertions were clustered within the same intron, in either orientation, thus suggesting that integrated AAV2 could also act as an enhancer, the most frequent mechanism also involved at HBV insertion sites7,22,27.
Although AAV2 infection is frequent in the human population, with 30–50% of adults harboring neutralizing antibodies directed against AAV2, no specific diseases have yet been associated with natural infection28,29. The gap between the high rate of AAV2 infection in humans and the rare occurrence of HCC with AAV2 integration mirrors the Epstein-Barr virus (EBV) paradox, where the virus causes very frequent infections in the general population but only rare cancers, such as nasopharyngeal carcinoma and Burkitt lymphoma27. The ability of AAV2 to integrate into the genome, the high infectivity of cells and the apparent lack of induced diseases in humans have supported extensive development of AAV2-derived vectors for gene therapy for more than 20 years30,31. To our knowledge, no cases of HCC have been described in patients treated with AAV vectors. However, two mouse models treated by recombinant AAV-mediated gene therapy developed HCC as a result of insertional mutagenesis involving the 5′ ITR of the AAV vector in the chromosome 12 locus that includes the noncoding RNA genes Rian and Mirg32,33. In humans, dysregulation of the genes located at the syntenic locus was observed in a subset of HCCs34. In our study, we did not find any clonal insertion in this region, underlining the differences between species and with the use of recombinant AAV. Overall, the present results and the occurrence of HCC in two different mouse models infected by AAV vectors support the role of AAV in liver carcinogenesis by insertional mutagenesis in both humans and rodents32,33,35,36.
Patients and tissue samples.
A total of 193 cases were included in the study, which was approved by our local institutional review board (IRB) committees (CCPRB Paris Saint-Louis, 1997 and 2004; Bordeaux 2010-A00498-31) (Table 1 and Supplementary Table 2). In two French university hospitals (Bordeaux and Créteil), HCC tissues and corresponding non-tumor liver samples were systematically frozen immediately after surgery. For all samples from 1996 to 2014, DNA and RNA were systematically extracted. For the present study, we selected 193 HCC samples and corresponding non-tumor tissues with good-quality DNA, available clinical data, histological review and analysis of gene mutations as described in Guichard et al.3. In this series, 179 HCCs were treated by liver resection and 14 cases were treated by liver transplantation. We enriched our series for patients who developed HCC in non-fibrotic livers by including a specific national cohort of patients (NoFLIC) that focuses on molecular alterations in this subgroup of tumors. All patients provided informed consent according to French law.
A flowchart of the study inclusion practices at each analytical step is provided in Supplementary Figure 4.
Viral capture and DNA sequencing.
Genomic DNA (600 ng fragmented to 150–200 nt in length) was captured using Agilent in-solution enrichment methodology and a biotinylated RNA oligonucleotide probe library, followed by 2 × 75-bp paired-end massively parallel sequencing on the Illumina HiSeq 2000 platform37 (IntegraGen). The AAV2 genome (4.67 kb) was fragmented into 305 segments of 120 bases, allowing a tiling density of 8× (the list of probes is available upon request). ELANDv2 (CASAVA1.8, Illumina) was used to align reads to the human and AAV2 (AF043303.1) reference genomes. The algorithm identified chimeric read pairs that mapped to both a human chromosome and the AAV2 virus. After removing PCR duplicates, the reference genome was divided into overlapping bins of fixed size, in which each read pair was assigned to a cluster of chimeric viral-human sequences.
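The binning step described above can be illustrated with a minimal sketch. It assumes that each chimeric read pair has already been reduced to a pair identifier, a human chromosome and a human coordinate, and that PCR duplicates were removed upstream. The bin size, step, minimum pair count, function names and example coordinates are hypothetical; the text does not report the values actually used.

```python
from collections import defaultdict


def overlapping_bins(position, bin_size=1000, step=500):
    """Return the start coordinates of every fixed-size bin covering `position`.

    Bins start at multiples of `step`, so consecutive bins overlap whenever
    step < bin_size (both values are illustrative assumptions).
    """
    starts = []
    start = (position // step) * step
    while start >= 0 and start + bin_size > position:
        starts.append(start)
        start -= step
    return sorted(starts)


def cluster_chimeric_pairs(pairs, bin_size=1000, step=500, min_pairs=2):
    """Group chimeric read pairs by the overlapping bins they fall into.

    `pairs` is an iterable of (pair_id, chromosome, human_position) tuples,
    assumed to be free of PCR duplicates. Only bins supported by at least
    `min_pairs` distinct pairs are returned.
    """
    bins = defaultdict(set)
    for pair_id, chrom, pos in pairs:
        for start in overlapping_bins(pos, bin_size, step):
            bins[(chrom, start)].add(pair_id)
    return {key: sorted(ids) for key, ids in bins.items() if len(ids) >= min_pairs}


# Hypothetical input: three pairs near one locus and one isolated pair.
example_pairs = [
    ("p1", "chr1", 10100),
    ("p2", "chr1", 10300),
    ("p3", "chr1", 10450),
    ("p4", "chr2", 500),
]
clusters = cluster_chimeric_pairs(example_pairs)
```

With these made-up coordinates, the three neighboring pairs share two overlapping bins, whereas the isolated pair is dropped by the minimum-support filter. Overlapping bins avoid splitting a cluster that straddles a bin boundary.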
In the next step, we conducted hybrid de novo assembly of the AAV2 integration sites in the human genome. The paired-end reads were newly mapped to the fusion reference genomes using Burrows-Wheeler Aligner (BWA; version 0.7.12)39; the alignment files were then converted to BAM format and sorted using SAMtools (version 1.2)40. Read alignments were displayed at base-pair resolution using IGV (Integrative Genomics Viewer) (Supplementary Fig. 5)41.
Somatic mutations were screened in the TERT promoter and across the entire coding sequences of the CTNNB1, TP53, AXIN1, ARID1A and ARID2 genes using Sanger sequencing as previously described3.
Whole-exome sequencing and data analysis.
Sequence capture, enrichment and elution for 43 pairs of genomic DNA samples were performed by IntegraGen as previously described. Somatic variant calling was carried out as described in Schulze et al.6.
For each tumor and matched normal sample, the sequence reads were mapped de novo to the AAV2 reference genome (AF043303.1) using BWA (version 0.7.12)39; the alignment files were then converted to BAM format and sorted using SAMtools (version 1.2)40.
Read alignments were displayed at base-pair resolution using IGV41. For samples that displayed AAV2-matched reads, singleton reads and paired reads for which one read was unmapped were subsequently extracted using SAMtools (version 1.2), and the corresponding sequences were autoassembled using Sequencher (version 5.1; GeneCodes).
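The selection of reads whose mate failed to map can be sketched from the SAM FLAG field alone. The bit values below (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped) come from the SAM format specification; the function names and the example records are hypothetical, and the exact SAMtools commands used in the study are not given in the text.

```python
PAIRED = 0x1         # read is part of a pair
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate of this read is unmapped


def is_half_mapped(flag):
    """True when exactly one end of a read pair aligned to the reference."""
    if not flag & PAIRED:
        return False
    read_unmapped = bool(flag & UNMAPPED)
    mate_unmapped = bool(flag & MATE_UNMAPPED)
    return read_unmapped != mate_unmapped


def extract_half_mapped(sam_lines):
    """Return (read_name, flag) for every record belonging to a half-mapped pair."""
    selected = []
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if is_half_mapped(flag):
            selected.append((fields[0], flag))
    return selected


# Hypothetical SAM records (sequence and quality strings shortened).
example_sam = [
    "@SQ\tSN:AAV2\tLN:4679",
    "r1\t73\tAAV2\t100\t60\t4M\t=\t100\t0\tACGT\tIIII",   # mapped, mate unmapped
    "r1\t133\tAAV2\t100\t0\t*\t=\t100\t0\tTTGA\tIIII",    # unmapped, mate mapped
    "r2\t99\tAAV2\t200\t60\t4M\t=\t260\t64\tGGCA\tIIII",  # properly paired
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tCCAT\tIIII",            # both ends unmapped
]
half_mapped = extract_half_mapped(example_sam)
```

In a viral-reference alignment, the unmapped mate of such a pair is a candidate for carrying flanking human sequence, which is why these records are useful for assembling integration junctions.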
RNA was isolated using the Maxwell Tissue LEV Total RNA Purification kit and instrument (Promega); 1 μg of RNA was reverse transcribed using MultiScribe reverse transcriptase and random hexamers (Applied Biosystems). Quantitative RT-PCR was performed using predesigned TaqMan probes (Hs00996788_m1, Hs01026536_m1, Hs00921974_m1, Hs00207065_m1 and Hs00972656_m1 for CCNA2, CCNE1, TNFSF10, KMT2B (MLL4) and TERT, respectively) and the ABI BioMark HD reader (Fluidigm). Expression data (Ct values) were acquired using Fluidigm Real-Time PCR Analysis software (4.1.3) and analyzed with 18S rRNA as the calibrator, as previously described42.
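A tumor/normal expression ratio can be derived from Ct values as in the following minimal sketch, which assumes the standard comparative Ct (2^-ΔΔCt) calculation with 18S rRNA as the reference and the matched non-tumor sample as the comparison sample. The function name and the Ct values are hypothetical and are not taken from the study data.

```python
def relative_expression(ct_target_tumor, ct_ref_tumor,
                        ct_target_normal, ct_ref_normal):
    """Tumor/normal expression ratio by the comparative Ct calculation.

    Each target Ct is first normalized to the reference RNA measured in the
    same sample (delta Ct); the tumor value is then compared with the normal
    value (delta-delta Ct). Assumes roughly 100% amplification efficiency,
    i.e. a doubling of product per cycle.
    """
    delta_tumor = ct_target_tumor - ct_ref_tumor
    delta_normal = ct_target_normal - ct_ref_normal
    return 2.0 ** -(delta_tumor - delta_normal)


# Hypothetical Ct values: the target crosses threshold 4 cycles earlier in the
# tumor, with identical reference Ct values in both samples.
ratio = relative_expression(24.0, 10.0, 28.0, 10.0)
```

A difference of four cycles corresponds to a 2^4 = 16-fold higher expression in the tumor under the stated efficiency assumption.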
RNA samples were enriched for polyadenylated RNA from 5 μg of total RNA, and the enriched samples were used to generate sequencing libraries with the Illumina TruSeq Stranded mRNA kit and associated protocol as provided by the manufacturer. The libraries were sequenced on an Illumina HiSeq 2000 sequencer, yielding approximately 45 million 100-bp paired-end reads (IntegraGen). Reads were mapped with TopHat2 (v.2.0.9; default parameters and supplying Ensembl GTF annotation) according to the method described by Trapnell et al.43,44.
Next, we conducted hybrid de novo assembly of the AAV2 integration sites in the human genome using a chimeric viral-human reference genome as described above. Quantifications of transcript products and isoforms were visualized alongside the raw and de novo–aligned RNA-seq data using Sashimi plots, an implementation built into the IGV browser45.
Nucleotide alignment of the inserted AAV2 sequences.
The sequence homology of the resulting consensus sequences was determined using BLAST searches. A maximum E value of 0.001 was considered for this analysis (the lower the E value, the more significant the match).
Molecular phylogenetic analysis was performed using the maximum-likelihood method on the basis of the Tamura-Nei model47. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach, and the topology with the superior log-likelihood score was then selected. The tree is drawn to scale, with branch lengths corresponding to the number of substitutions per site. There were a total of 4,845 positions in the final data set. Evolutionary analyses were conducted in MEGA6 (ref. 48).
The 208-bp AAV2 sequence, along with the additional 7 bp identified in CHC985T, was inserted by nucleotide synthesis (GenScript) into the TERT promoter (500 bp upstream of the TSS; kindly provided by X. Mayol, Institut Hospital del Mar d'Investigacions Mèdiques) in the pGL3 vector (Supplementary Fig. 1a). The QuikChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies) was used to introduce hotspot point mutations in the TERT promoter (g.1295228G>A (−124G>A) and g.1295250G>A (−146G>A)) using specific primers (Supplementary Table 1). The 210-bp AAV2 sequence from CHC2557T and the 301-bp AAV2 sequence along with 2 bp from CHC1602T were cloned into pmirGLO vector containing a 132-bp fragment of the 3′ UTR of TNFSF10 (CHC2557T) or the entire 3′ UTR of TNFSF10 (984 bp; CHC1602T) (described in Fig. 3) after synthesis by GenScript (Supplementary Fig. 1b). Scrambled sequences were obtained by nucleotide randomization of the inserted AAV2 sequences (the CHC985T sequence for the TERT promoter and the CHC1602T sequence for the TNFSF10 3′ UTR), resulting in sequences with the same size and nucleotide content, which were then inserted at the same position. All the constructs were resequenced using the Sanger method.
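Nucleotide randomization that preserves both length and base composition amounts to shuffling the bases of the insert. The sketch below illustrates that idea only; the function name, the seed and the placeholder sequence are hypothetical, and the procedure actually used to design the scrambled controls is not detailed in the text.

```python
import random
from collections import Counter


def scramble(sequence, seed=None):
    """Return a random permutation of the bases in `sequence`.

    Shuffling keeps the length and the count of each nucleotide unchanged
    while destroying the original order, and hence any sequence motif.
    """
    rng = random.Random(seed)
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


# Placeholder sequence for illustration only (not an AAV2 sequence).
insert = "ACGTTGCAAGGCTTAACCGGATCG"
control = scramble(insert, seed=1)

# The control has the same size and nucleotide content as the insert.
same_length = len(control) == len(insert)
same_content = Counter(control) == Counter(insert)
```

Fixing the seed makes the scrambled control reproducible, which is convenient when the sequence has to be ordered for synthesis.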
Cell culture, transfection and dual-luciferase assays.
HuH7 HCC and HuH6 hepatoblastoma cells were purchased from the American Type Culture Collection (ATCC) and cultured in DMEM supplemented with 10% FBS. Cell lines were systematically screened for mycoplasma infection. Cells were transfected in reverse mode using Lipofectamine LTX PLUS (Life Technologies). Cells were cotransfected with a pGL3 plasmid containing the wild-type TERT promoter or the promoter with the two hotspot mutations, the AAV2 insertion or the insertion of a scrambled AAV2 sequence controlling a luciferase reporter gene, together with a plasmid encoding Renilla luciferase (Promega) (Supplementary Fig. 2). To study the 3′ UTR of TNFSF10, cells were transfected with pmirGLO plasmid (Promega) containing the wild-type TNFSF10 3′ UTR or the 3′ UTR with two different types of AAV2 insertion or scrambled AAV2 sequence downstream of a luciferase reporter gene. Luminescence from firefly luciferase was normalized to the corresponding Renilla luciferase activity (indicator of transfection efficiency). The fold change in activity was then calculated relative to the values obtained for constructs containing wild-type TERT promoter or TNFSF10 3′ UTR.
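The dual-luciferase normalization described above reduces to two ratios, shown in this minimal sketch. The construct labels and luminescence readings are hypothetical, and the sketch ignores replicate averaging and background subtraction, which the text does not describe.

```python
def normalized_activity(firefly, renilla):
    """Firefly luminescence corrected for transfection efficiency."""
    if renilla <= 0:
        raise ValueError("Renilla luminescence must be positive")
    return firefly / renilla


def fold_changes(readings, reference="wild_type"):
    """Fold change of each construct relative to the reference construct.

    `readings` maps a construct label to a (firefly, renilla) pair of
    luminescence values measured in the same well.
    """
    ratios = {name: normalized_activity(f, r) for name, (f, r) in readings.items()}
    baseline = ratios[reference]
    return {name: ratio / baseline for name, ratio in ratios.items()}


# Hypothetical readings in arbitrary luminescence units.
example_readings = {
    "wild_type": (1000.0, 500.0),
    "aav2_insertion": (6000.0, 600.0),
    "scrambled": (1100.0, 550.0),
}
folds = fold_changes(example_readings)
```

Dividing by the Renilla signal first means that a well with higher transfection efficiency does not appear as higher promoter or 3′ UTR activity.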
Statistical analyses were performed using R and GraphPad Prism. The relationship between AAV2 insertion and clinical, histological and genetic features of HCC was investigated using χ2 tests with Monte Carlo simulation according to Hope49. P-value adjustment was computed for a Monte Carlo test with 2,000 permutations. The strength of association and exclusion among gene mutation events was modeled using a binomial logistic regression model. A q value less than 0.1 was considered to be statistically significant.
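A Monte Carlo χ2 test of association can be sketched as follows. This version permutes the labels of one variable, which keeps both margins of the contingency table fixed, and computes the P value as (1 + number of simulated statistics at least as large as the observed one) / (number of simulations + 1). It is an illustrative stand-in, not the R routine used in the study; the function names, category labels and data are hypothetical.

```python
import random
from collections import Counter


def chi_square(cell_counts, row_totals, col_totals, n):
    """Pearson chi-square statistic for a contingency table given as counts."""
    stat = 0.0
    for r, row_total in row_totals.items():
        for c, col_total in col_totals.items():
            expected = row_total * col_total / n
            observed = cell_counts.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_chi_square(rows, cols, n_sim=2000, seed=0):
    """Return (observed statistic, Monte Carlo P value) for two label lists."""
    rng = random.Random(seed)
    n = len(rows)
    row_totals, col_totals = Counter(rows), Counter(cols)
    observed = chi_square(Counter(zip(rows, cols)), row_totals, col_totals, n)
    shuffled = list(cols)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # breaks any association, preserves the margins
        simulated = chi_square(Counter(zip(rows, shuffled)), row_totals, col_totals, n)
        if simulated >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical, perfectly associated labels for 20 tumors.
insertion = ["insertion"] * 10 + ["none"] * 10
fibrosis = ["F0-F1"] * 10 + ["F2-F4"] * 10
statistic, p_value = monte_carlo_chi_square(insertion, fibrosis)
```

Simulation-based P values are useful here because several cells of the table are small when only 11 tumors carry an insertion, a situation in which the asymptotic χ2 approximation is unreliable.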
For repeated measures, one-way analysis of variance (ANOVA) was used to compare the means of cell line–matched groups. In the experiments with transfected cell lines, the statistical significance of the quantitative values for the repeated measures was determined using Student's t tests in comparison to cell lines transfected with empty vector. A P value of less than 0.05 was considered to be statistically significant. All statistical tests were two-sided.
The sequences reported here have been deposited in EGA (European Genome-phenome Archive) under accessions EGAS00001000217, EGAS00001000679 and EGAS00001001002. The human-AAV2 chimeric sequence data from the 11 patients with HCC have been deposited in GenBank (accessions KT258720–KT258730).
We warmly thank L. Yost, E. Chevet and A. de Reynies for critical review of the manuscript and helpful discussion. We thank all the clinician surgeons and pathologists who have participated in this work: J. Saric, C. Laurent, L. Chiche, B. Le Bail and C. Castain (CHU Bordeaux) and Y. Allory, K. Leroy and D. Azoulay (CHU Henri Mondor). We also thank the Réseau National Centre de Ressources Biologiques (CRB) Foie and the tumor banks of CHU Bordeaux and CHU Henri Mondor for contributing to the tissue collection. This work was supported by Institut National du Cancer (INCa) with the International Cancer Genome Consortium (ICGC) and the PAIR-CHC project NoFLIC (also funded by Association pour la Recherche sur le Cancer (ARC)). The group is supported by the Ligue Nationale contre le Cancer. J.-C.N., M.M., C.P. and A.F. were supported by fellowships from INCa, AERIO-Boehringer-Ingelheim, ARC and the Ligue Nationale contre le Cancer, respectively.
Supplementary Figures 1–9 and Supplementary Tables 1 and 2.