Hepatocellular carcinomas (HCCs) are liver tumors related to various etiologies, including alcohol intake and infection with hepatitis B (HBV) or C (HCV) virus. Additional risk factors remain to be identified, particularly in patients who develop HCC without cirrhosis. We found clonal integration of adeno-associated virus type 2 (AAV2) in 11 of 193 HCCs. These AAV2 integrations occurred in known cancer driver genes, namely CCNA2 (cyclin A2; four cases), TERT (telomerase reverse transcriptase; one case), CCNE1 (cyclin E1; three cases), TNFSF10 (tumor necrosis factor superfamily member 10; two cases) and KMT2B (lysine-specific methyltransferase 2B; one case), leading to overexpression of the target genes. Tumors with viral integration mainly developed in non-cirrhotic liver (9 of 11 cases) and without known risk factors (6 of 11 cases), suggesting a pathogenic role for AAV2 in these patients. In conclusion, AAV2 is a DNA virus associated with oncogenic insertional mutagenesis in human HCC.


HCC is predominantly related to HBV or HCV infection, high levels of alcohol consumption and metabolic syndromes that cause chronic liver injury and the development of cirrhosis. Although HCC occurs following cirrhosis in the majority of cases, a subset of tumors (around 5% of all cases) develop in non-fibrotic livers without identified risk factors1. Irrespective of etiology, HCC results from an accumulation of genetic or epigenetic alterations, with frequent recurrent somatic mutations in TERT, CTNNB1, TP53, ARID1A, ARID2 and RPS6KA3 (refs. 2,3,4,5,6). In HBV-infected patients, specific oncogenic mechanisms are also related to insertional mutagenesis from the integration of viral DNA in cancer driver genes, with TERT, CCNE1 and KMT2B (also known as MLL4) frequently targeted7,8.

Recently, TERT promoter mutations were identified as the most frequent and earliest recurrent somatic genetic alterations in HCC4,5,6,9. These mutations are known to activate telomerase, a key enzyme for telomere maintenance that is known to be essential for malignant transformation4,6,9,10. While screening a series of 150 HCCs for TERT promoter mutations, we identified a 208-bp insertion of AAV2. The viral sequence was inserted 187 bp upstream of the start codon of TERT, accompanied by a 16-bp deletion of human genome sequence and a 7-bp insertion of undetermined human or viral origin (Fig. 1a).

Figure 1: AAV2 integration in the TERT promoter and AAV2 mapping using viral capture.

(a) The AAV2 integration site in the TERT promoter in the CHC985T tumor sample. Blue numbered boxes represent exons. The AAV2 sequences at the boundaries of the insertion region are represented in red, and the coordinates of the inserted nucleotide fragment are given relative to the AAV2 reference sequence (AF043303.1); the 5′–3′ orientation of the AAV2 inserted sequence is indicated above. The human sequences at the insertion site are shown in black, with the coordinates on chromosome 5 given according to the GRCh37 reference sequence. Base pairs inserted at the boundaries are shown in gray. (b) The impact of AAV2 integration in the TERT promoter was evaluated using promoter luciferase assays in the Huh6 and Huh7 liver cell lines. TERT promoter with inserted AAV2 or scrambled AAV2 sequence was compared to the wild-type promoter (WT). Error bars, s.d. of triplicate experiments corresponding to three independent transfections for each plasmid in each cell line. **, 0.001 < P < 0.01; ***, 0.0001 < P < 0.001; NS, not significant; significance was determined by Student's t test. (c) Number of viral (top) and chimeric (bottom) reads identified by capture of AAV2 sequences followed by deep sequencing. The seven cases showing evidence of clonal insertion in the tumor are represented. (d) The location of the AAV2 insertion on the human genome and the abundance of chimeric reads are displayed for the tumors (top) and adjacent liver samples (bottom) from the seven cases. Genes harboring clonal insertions are labeled.

AAVs, which are members of the parvovirus group, have single-stranded, linear DNA genomes11. They are defective viruses that require coinfection with adenovirus for productive infection12,13. Early studies showed that, during infection, the AAV2 DNA integrates into the human genome, where it remains quiescent until a new parvovirus infection occurs12,14,15. To investigate the functional consequences of viral integration, we generated a construct based on the pGL3 reporter vector, reproducing the exact AAV2 inserted sequence. In two different liver cell lines, the AAV2 insertion significantly (P < 0.05) increased TERT promoter activity in comparison to the promoter with a scrambled sequence integrated and the wild-type promoter (Fig. 1b and Supplementary Fig. 1). We observed similar activation through the introduction of the two classical hotspot somatic mutations at positions –124 and –146 bp with respect to the start codon (Supplementary Fig. 2). Accordingly, TERT mRNA was overexpressed in the tumor with the insertion (tumor/normal expression ratio = 18), as was observed in more than 90% of the HCCs.

Next, we searched for the presence of AAV2 DNA by PCR in 150 HCCs and their matched non-tumor liver tissue samples, covering the entire viral genome in 9 fragments (Supplementary Fig. 3 and Supplementary Table 1). We identified AAV2 amplicons in 33 (21%) non-tumor liver tissues and 11 (7%) HCCs. Subsequent AAV2 capture and deep sequencing for 43 paired tumor and non-tumor samples identified AAV2 reads in 7 tumors and 20 non-tumors, all of which were positive for AAV2 by PCR (see Supplementary Fig. 4 for a flowchart of the experimental design). All 11 cases that displayed a positive signal in PCR but no AAV2 reads in the deep sequencing analysis corresponded to samples with a unique small AAV2 fragment amplified by PCR. The number of captured AAV2 sequence reads was higher in tumors than in the non-tumor tissues (Fig. 1c). Chimeric human-viral reads were identified in both tumors and non-tumor tissues (Fig. 1c); however, a large number of reads (ranging from 53 to 1,460) at AAV2 integration sites were identified only in 7 tumors within 4 different genes (TERT, CCNA2, CCNE1 and TNFSF10; Fig. 1d). At these loci, we found independent clusters of sequences demonstrating the clonal nature of the AAV2 insertion in each tumor (Fig. 1d and Supplementary Fig. 5). In contrast, non-clonal AAV2 insertions were distributed throughout the genome in non-tumor samples, without enrichment of chromosome 19, as previously described in cell lines14,16 (Fig. 1d). Next, we analyzed whole-exome sequencing data for 43 additional HCCs6 and identified AAV2 insertions in CCNE1, TNFSF10 and KMT2B in 4 additional tumors but in none of their corresponding non-tumor liver tissues (Supplementary Fig. 4). Overall, we identified 11 of 193 HCCs with clonal AAV2 insertion validated by individual PCR amplification and Sanger sequencing (see Figs. 2,3,4, Table 1 and Supplementary Tables 1 and 2 for descriptions of the patients).

Figure 2: AAV2 integration in CCNA2 and the consequences on gene expression.

(a) Location of the AAV2 integration sites in the CCNA2 gene in the CHC2128T, CHC2112T, CHC2206T and CHC313T tumor samples (depicted using the same representation as in Fig. 1a). (b) The levels of CCNA2 mRNA were assayed in HCCs with AAV2 insertion, considering the CCNA2 target gene and other target genes, in comparison to HCCs without AAV2 insertion and non-tumor liver (NTL) tissues. The fold change (log2) in gene expression is presented relative to the expression in normal liver tissue on the y axis. The red lines correspond to the mean values; error bars, s.d. A significant difference in expression was defined by Wilcoxon rank-sum test: *P < 0.05, **P < 0.01. (c) IGV-Sashimi plots showing RNA-seq alignments for three cases with AAV2 insertion in CCNA2; a normal liver tissue sample and an HCC without AAV2 insertion are shown as controls. Alignments in exons are represented as read density, and alignments to splice junctions are shown as an arc connecting a pair of exons, where arc width is proportional to the number of reads aligning to the junction. The canonical transcripts for the gene are shown above (NM_001237). Poly(A), polyadenylation sequence.

Figure 3: AAV2 integration in TNFSF10 and the consequences on gene expression.

(a) Location of the AAV2 integration sites in TNFSF10 in the CHC2557T and CHC1602T tumor samples (depicted using the same representation as in Fig. 1a). (b) The impact of AAV2 integration on the expression of TNFSF10 was assayed in HCCs with AAV2 insertion (presented as described in Fig. 2b). *, 0.01 < P < 0.05. (c) IGV-Sashimi plots showing RNA-seq alignments for two cases with AAV2 insertion in TNFSF10; a normal liver tissue sample and an HCC without AAV2 insertion are shown as controls, as in Fig. 2c. The transcripts for canonical and described isoforms for the gene are shown above (NM_003810). (d) The impact of the two AAV2 integrations in the 3′ UTR of TNFSF10 was evaluated using luciferase assays in the Huh6 and Huh7 liver cell lines. The 3′ UTR of TNFSF10 with either of the two AAV2 insertions identified in CHC2557T and CHC1602T or a scrambled sequence cloned into the pmirGLO vector was compared to vector encoding the wild-type 3′ UTR (WT). Error bars, s.d. of triplicate experiments corresponding to three independent transfections for each plasmid in each cell line. t tests were performed; NS, P > 0.05; *, 0.01 < P < 0.05; **, 0.001 < P < 0.01; ***, 0.0001 < P < 0.001.

Figure 4: AAV2 integration in CCNE1 and KMT2B and the consequences on gene expression.

(a,c) The AAV2 integration sites located in CCNE1 and KMT2B in three tumors (CHC2141T, CHC1591T and CHC2208T) (a) and CHC1185T (c), respectively (presented as in Fig. 1a). (b,d) Expression of CCNE1 (b) and KMT2B (d) was assayed in HCCs with and without AAV2 insertion (presented as in Fig. 2b).

Table 1: Patient and tumor features according to the presence of AAV2 insertion in the tumor genome

CCNA2 was targeted by AAV2 insertion in four cases at different integration sites clustered within 1 kb in intron 2. The insertions ranged from 238 to 1,975 bp in length, occurring in both orientations (Fig. 2a). CCNA2 encodes cyclin A2, which is commonly overexpressed in malignant tumors and controls the cell cycle during the G1/S and G2/M transitions17,18. The four tumors with AAV2 insertion showed CCNA2 overexpression similar to what was observed in 75% of HCCs, in contrast to the low-level expression in non-tumor liver tissues (Fig. 2b). In the three tumors analyzed by RNA sequencing (RNA-seq), we observed overexpression of mature spliced CCNA2 mRNAs but no mature in-frame chimeric human-viral transcripts (Fig. 2c). However, two cases with AAV2 inserted in a 5′–3′ orientation showed high levels of immature chimeric RNAs including intronic sequences and multiple stop codons. In CHC2206T, the immature mRNA ended at the classical AAV2 polyadenylation site at position 4,424, and in CHC313T the immature mRNA ended at a new viral polyadenylation site created at position 4,315 by nucleotide substitutions. Additionally, in CHC313T, 5′ donor splice sites (positions 2,867 and 4,354 in AAV2) were identified (Fig. 2c).

In two HCC cases, AAV2 integrated sequences (301 and 210 bp) mapped within the 3′ UTR of TNFSF10, 255 bp and 132 bp downstream of the stop codon, respectively (Fig. 3a). TNFSF10 encodes TRAIL, the tumor necrosis factor–related apoptosis-inducing ligand that activates a proapoptotic pathway but also promotes cell survival via activation of the nuclear factor (NF)-κB, phosphoinositide 3-kinase (PI3K) and MAPK signaling pathways19. Both AAV2 insertions occurred in the 5′–3′ orientation and were associated with high overexpression of TNFSF10 mRNA in comparison to samples lacking an insertion (Fig. 3b). RNA-seq analysis identified a large number of well-spliced transcripts ending prematurely at the viral polyadenylation site at position 4,424 (Fig. 3c). We cloned the AAV2 insertions identified in CHC1602T and CHC2557T in the 3′ UTR of TNFSF10 into the pmirGLO vector and showed increased luciferase activity in comparison to the wild-type 3′ UTR and the 3′ UTR with integration of a scrambled sequence (Fig. 3d). Overall, these results suggest that TNFSF10 overexpression results from the inserted alleles and is caused by the AAV2 sequence itself through a mechanism that remains to be elucidated.

In three tumors, AAV2 insertion occurred in CCNE1 (Fig. 4), which encodes cyclin E1, a protein involved in cell proliferation and the G1/S transition17,18. In CCNE1, we identified AAV2 integrations of 221, 258 and 368 bp within 15 kb of the 5′ UTR, exon 1 and intron 4, respectively. The insertion in exon 1 occurred in the 5′–3′ orientation, whereas the two other insertions were in the opposite orientation (Fig. 4a). In all tumors with an insertion in this gene, we observed high overexpression of CCNE1 in comparison to tumors without an insertion and non-tumor liver samples (tumor/normal expression ratio = 470; Fig. 4b). RNA-seq analysis of the CHC2208T and CHC1591T tumor samples with AAV2 insertions in intron 4 showed overexpression of mature wild-type CCNE1 transcripts without expression of chimeric human-viral transcript (Supplementary Fig. 6).

The last case of AAV2 insertion occurred in exon 3 of KMT2B, encoding MLL4, a histone methyltransferase frequently mutated in HCC and prone to HBV insertion (Fig. 4c and Supplementary Fig. 7)6,7,20. Quantitative PCR showed high expression of KMT2B in the tumor for CHC1185T that displayed an AAV2 insertion in exon 3 of KMT2B, as compared to the normal sample (tumor/normal expression ratio = 8.5; Fig. 4d). No RNA-seq data were available.

Notably, in nearly all cases, only a small part of the AAV2 genome was inserted, including a portion of the 3′ inverted terminal repeat (ITR), identified as the smallest commonly inserted region in 10 of the 11 cases (Fig. 5a). The AAV2 ITR, a sequence mandatory for viral insertion in the genome, is composed of seven regions (A, A′, B, B′, C, C′ and D) that can form secondary structures in two configurations (flip and flop)21. In two cases, the integrated viral sequences resulted from an insertion in the flip conformation (with a CC′:BB′ junction), whereas the insertion was in the flop conformation (with a BB′:CC′ junction) in the other eight cases (Fig. 5b). Additionally, we observed small deletions (from 7 to 22 bp) of the human genome sequence without any distinct nucleotide motif at the insertion points. Alignment of the inserted AAV2 sequences showed 96–99% homology with the reference AAV2 sequence, with recurrent nucleotide substitutions observed at eight positions (Fig. 5a and Supplementary Figs. 8 and 9).

Figure 5: AAV2 inserted sequences and secondary structures of the wild-type AAV2 3′ ITR.

(a) Schematic of the AAV2 genome inserted in 11 different HCC samples. Recurrent nucleotide mismatches, either silent (in green) or leading to amino acid substitution (in red), in comparison to the reference AAV2 genome (AF043303.1) are indicated. The gray area highlights the common insertion region. The three promoters (P5, P19 and P40) and the reading frames encoding the Rep and Cap proteins are indicated above. (b) The viral breakpoints identified at each integration site in tumors are indicated on the wild-type secondary structures of the 3′ ITR. The AAV2 ITR is composed of seven regions (A, A′, B, B′, C, C′ and D) and can form two different secondary structures (flip and flop), as previously described21. The Rep-binding sequence (RBS) motif and terminal resolution site (trs) are indicated.

Correlations with clinical and histological characteristics and the genomic features identified by exome sequencing6 in the entire series of 193 tumors (Table 1 and Supplementary Table 1) showed more frequent AAV2 insertion in HCCs arising in non-fibrotic livers (8 of the 11 HCCs with METAVIR score of F0 or F1) and without known risk factors (6 of the 11 HCCs) and in patients younger than 60 years of age (Table 1). Interestingly, none of the 11 HCCs harbored TERT promoter mutations as compared to a frequency of 70% for mutation carriers in tumors lacking AAV2 insertion. Only two tumors (CHC1602T and CHC2208T) showed focal amplification of TERT6. Because TERT promoter mutations are early events during hepatocarcinogenesis, these results suggest that AAV2-associated HCCs are caused by different mechanisms of carcinogenesis at early stages, as in HBV-related HCC7,22. HCCs with AAV2 insertion were also less frequently mutated for CTNNB1 but enriched in PTEN and AXIN1 alterations, without significant differences in the total number of mutations or chromosome alterations per tumor in comparison to tumors without insertions6 (Table 1).

Here we show that, as with human papillomavirus, Merkel cell polyomavirus and HBV, AAV2 infection induces insertional mutagenesis in tumors23. Strikingly, four of the genes targeted by AAV2 integration, namely CCNA2, CCNE1, KMT2B and the TERT promoter, have also been described as cancer driver genes, recurrently targeted by HBV integration in HCC at the same hotspots (Supplementary Fig. 7)7,8,24. All tumors displayed overexpression of the gene targeted by AAV2 insertion, and, interestingly, the common AAV2 insertion region includes the A/D sequence of the 3′ ITR, which has potential transcriptional activity or could act as a cryptic promoter13,25,26. Our in cellulo models of AAV2 insertion in the TERT promoter and the 3′ UTR of TNFSF10 suggest that inserted AAV2 sequences can modulate the transcription of these target genes. Interestingly, in CCNA2, the four AAV2 insertions were clustered within the same intron, in either orientation, thus suggesting that integrated AAV2 could also act as an enhancer, the most frequent mechanism also involved at HBV insertion sites7,22,27.

Although AAV2 infection is frequent in the human population, with 30–50% of adults harboring neutralizing antibodies directed against AAV2, no specific diseases have yet been associated with natural infection28,29. The gap between the high rate of AAV2 infection in humans and the rare occurrence of HCC with AAV2 integration mirrors the Epstein-Barr virus (EBV) paradox, where the virus causes very frequent infections in the general population but only rare cancers, such as nasopharyngeal carcinoma and Burkitt lymphoma27. The ability of AAV2 to integrate into the genome, the high infectivity of cells and the apparent lack of induced diseases in humans have supported extensive development of AAV2-derived vectors for gene therapy for more than 20 years30,31. To our knowledge, no cases of HCC have been described in patients treated with AAV vectors. However, two mouse models treated with recombinant AAV-mediated gene therapy developed HCC as a result of insertional mutagenesis involving the 5′ ITR of the AAV vector in the chromosome 12 locus that includes the noncoding RNA genes Rian and Mirg32,33. In humans, dysregulation of the genes located at the syntenic locus was observed in a subset of HCCs34. In our study, we did not find any clonal insertion in this region, underlining the differences between species and with the use of recombinant AAV. Overall, the present results and the occurrence of HCC in two different mouse models infected by AAV vectors support the role of AAV in liver carcinogenesis by insertional mutagenesis in both humans and rodents32,33,35,36.


Patients and tissue samples.

A total of 193 cases were included in the study approved by our local institutional review board (IRB) committees (CCPRB Paris Saint-Louis, 1997 and 2004; Bordeaux 2010-A00498-31) (Table 1 and Supplementary Table 2). In two French university hospitals (Bordeaux and Créteil), HCC tissues and corresponding non-tumor liver samples were frozen systematically immediately after surgery. For all samples from 1996 to 2014, DNA and RNA were systematically extracted. For the present study, we selected 193 HCC samples and corresponding non-tumor tissues with good-quality DNA, available clinical data, histological review and analysis of gene mutations as described in Guichard et al.3. In this series, 179 HCCs were treated by liver resection and 14 cases were treated by liver transplantation. We enriched our series of patients who developed HCC in non-fibrotic livers with a specific national cohort of patients (NoFLIC), focusing on molecular alterations in this subgroup of tumors. All patients provided informed consent according to French law.

A flowchart of the study inclusion practices at each analytical step is provided in Supplementary Figure 4.

DNA amplification.

Samples were screened for the presence of AAV2 (AF043303.1) with nine pairs of primers designed with Primer3 software (Supplementary Fig. 3 and Supplementary Table 1).

Viral capture and DNA sequencing.

Genomic DNA (600 ng fragmented to 150–200 nt in length) was captured using Agilent in-solution enrichment methodology and a biotinylated RNA oligonucleotide probe library, followed by 2 × 75-bp paired-end massively parallel sequencing on the Illumina HiSeq 2000 platform37 (IntegraGen). The AAV2 genome (4.67 kb) was fragmented into 305 segments of 120 bases, allowing a tiling density of 8× (the list of probes is available upon request). ELANDv2 (CASAVA1.8, Illumina) was used to align reads to the human and AAV2 (AF043303.1) reference genomes. The algorithm identified chimeric read pairs that mapped to both a human chromosome and the AAV2 virus. After removing PCR duplicates, the reference genome was divided into overlapping bins of fixed size, in which each read pair was assigned to a cluster of chimeric viral-human sequences.
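As an illustration of the binning step, the following minimal sketch counts chimeric read pairs per overlapping genomic bin. It is not the ELANDv2/CASAVA implementation: the function name, the 1,000-bp bin size, the half-bin overlap, the cluster threshold and the example coordinates are all hypothetical choices made for the example.

```python
from collections import Counter

def bin_chimeric_pairs(human_anchors, bin_size=1000):
    """Count chimeric read pairs per overlapping genomic bin.

    human_anchors: iterable of (chromosome, position) for the human-mapped
    mate of each chimeric pair (PCR duplicates assumed already removed).
    Bins have a fixed, even size and start every bin_size // 2 bases,
    so each position falls into up to two overlapping bins.
    """
    step = bin_size // 2
    counts = Counter()
    for chrom, pos in human_anchors:
        last = pos // step
        # A position belongs to the bin starting at last*step and to the
        # previous bin, which still covers it because bins span 2*step.
        for k in (last - 1, last):
            if k >= 0:
                counts[(chrom, k * step)] += 1
    return counts

# Hypothetical anchor coordinates, for illustration only.
pairs = [("chr5", 1295100), ("chr5", 1295150), ("chr5", 1295400),
         ("chr19", 30303000)]
counts = bin_chimeric_pairs(pairs)
# Keep bins supported by at least 3 pairs (arbitrary threshold).
clusters = {b: n for b, n in counts.items() if n >= 3}
```

Overlapping bins avoid splitting a cluster of reads that straddles a bin boundary, at the cost of reporting the same cluster in two adjacent bins.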

All insertions identified by capture and sequencing were amplified by PCR using primers mapping to the AAV2 and human sequences (Supplementary Table 1) and analyzed by Sanger sequencing38.

In the next step, we conducted hybrid de novo assembly of the AAV2 integration sites in the human genome. The paired-end reads were newly mapped to the fusion reference genomes using Burrows-Wheeler Aligner (BWA; version 0.7.12)39; the alignment files were then converted to BAM format and sorted using SAMtools (version 1.2)40. Read alignments were displayed at base-pair resolution using IGV (Integrative Genomics Viewer) (Supplementary Fig. 5)41.

Somatic mutations were searched for in the TERT promoter and in the entire coding sequences of the CTNNB1, TP53, AXIN1, ARID1A and ARID2 genes using Sanger sequencing as previously described3.

Whole-exome sequencing and data analysis.

Sequence capture, enrichment and elution for 43 pairs of genomic DNA samples were performed by IntegraGen as previously described. Somatic variant calling was carried out as described in Schulze et al.6.

For each tumor and matched normal sample, the sequence reads were mapped de novo to the AAV2 reference genome (AF043303.1) using BWA (version 0.7.12)39; the alignment files were then converted to BAM format and sorted using SAMtools (version 1.2)40.

Read alignments were displayed at base-pair resolution using IGV41. For samples that displayed AAV2-matched reads, singleton reads and paired reads for which one read was unmapped were subsequently extracted using SAMtools (version 1.2), and the corresponding sequences were assembled automatically using Sequencher (version 5.1; GeneCodes).

Quantitative RT-PCR.

RNA was isolated using the Maxwell Tissue LEV Total RNA Purification kit and instrument (Promega); 1 μg of RNA was reverse transcribed using MultiScribe reverse transcriptase and random hexamers (Applied Biosystems). Quantitative RT-PCR was performed using predesigned TaqMan probes (Hs00996788_m1, Hs01026536_m1, Hs00921974_m1, Hs00207065_m1 and Hs00972656_m1 for CCNA2, CCNE1, TNFSF10, KMT2B (MLL4) and TERT, respectively) and the ABI BioMark HD reader (Fluidigm). Expression data (Ct values) were acquired using Fluidigm Real-Time PCR Analysis software (4.1.3) and analyzed using the 2^(−ΔΔCT) method with 18S rRNA as the calibrator as previously described42.
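The relative quantification can be sketched as follows. This is a generic illustration of the ΔΔCt calculation, assuming 18S rRNA is used to normalize each sample and normal liver tissue serves as the comparison sample; the function name and the Ct values are hypothetical and do not come from the study.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^(-ddCt) method.

    ct_target_*: Ct of the gene of interest (e.g. CCNA2).
    ct_ref_*:    Ct of the reference RNA (assumed here to be 18S rRNA).
    *_sample:    tumor sample; *_control: comparison sample (assumed normal liver).
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_sample - delta_control)

# Hypothetical Ct values: the target amplifies 4 cycles earlier in the tumor
# at equal 18S signal, which corresponds to a 16-fold increase.
ratio = fold_change_ddct(22.0, 10.0, 26.0, 10.0)
```

A tumor/normal expression ratio such as those quoted in the main text corresponds to this fold change.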

RNA sequencing.

RNA samples were enriched for polyadenylated RNA from 5 μg of total RNA, and the enriched samples were used to generate sequencing libraries with the Illumina TruSeq Stranded mRNA kit and associated protocol as provided by the manufacturer. The libraries were sequenced on an Illumina HiSeq 2000 sequencer, yielding approximately 45 million 100-bp paired-end reads (IntegraGen). Reads were mapped with TopHat2 (v.2.0.9; default parameters and supplying Ensembl GTF annotation) according to the method described by Trapnell et al.43,44.

Next, we conducted hybrid de novo assembly of the AAV2 integration sites in the human genome using a chimeric viral-human reference genome as described above. Quantifications of transcript products and isoforms were visualized alongside the raw and de novo–aligned RNA-seq data using Sashimi plots, an implementation built into the IGV browser45.

Nucleotide alignment of the inserted AAV2 sequences.

The sequence homology of the resulting consensus sequences was determined using BLAST searches. A maximum E value of 0.001 was considered for this analysis (the lower the E value, the higher the homology).

Multiple-sequence alignment of the inserted AAV2 sequences in the 11 cases with the AAV2 genome available from GenBank (AF043303.1) was performed using the MultAlin program46.

Molecular phylogenetic analysis was performed using the maximum-likelihood method on the basis of the Tamura-Nei model47. Initial tree(s) for the heuristic search were obtained automatically by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the maximum composite likelihood (MCL) approach, and the topology with the superior log-likelihood score was then selected. The tree is drawn to scale, with branch lengths corresponding to the number of substitutions per site. There were a total of 4,845 positions in the final data set. Evolutionary analyses were conducted in MEGA6 (ref. 48).

Site-directed mutagenesis.

The 208-bp AAV2 sequence along with the additional 7 bp identified in CHC985T was inserted into the pGL3 vector (Supplementary Fig. 1a) in the TERT promoter (500 bp upstream of the TSS) kindly provided by X. Mayol (Institut Hospital del Mar d'Investigacions Mèdiques) by nucleotide synthesis (GenScript). The QuikChange Lightning Site-Directed Mutagenesis kit (Agilent Technologies) was used to introduce hotspot point mutations in the TERT promoter (g.1295228G>A (−124G>A) and g.1295250G>A (−146G>A)) using specific primers (Supplementary Table 1). The 210-bp AAV2 sequence from CHC2557T and the 301-bp AAV2 sequence along with 2 bp from CHC1602T were cloned into pmirGLO vector containing a 132-bp fragment of the 3′ UTR of TNFSF10 (CHC2557T) or the entire 3′ UTR of TNFSF10 (984 bp; CHC1602T) (described in Fig. 3) after synthesis by GenScript (Supplementary Fig. 1b). Scrambled sequences were obtained by nucleotide randomization of the inserted AAV2 sequences (the CHC985T sequence for the TERT promoter and the CHC1602T sequence for the TNFSF10 3′ UTR), resulting in sequences with the same size and nucleotide content, which were then inserted at the same position. All the constructs were resequenced using the Sanger method.
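A scrambled control of this kind can be illustrated by shuffling the nucleotides of the inserted sequence, which preserves its length and base composition. The sketch below is an assumption about how such a randomization could be done, not the procedure actually used for synthesis; the function name, the seed and the short example sequence are hypothetical.

```python
import random
from collections import Counter

def scramble_sequence(seq, seed=0):
    """Return a random permutation of the nucleotides in seq.

    The output has the same length and the same nucleotide counts
    as the input; only the order of the bases changes.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible control
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

# Short made-up sequence standing in for an inserted AAV2 fragment.
original = "AGGAACCCCTAGTGATGGAGTTGGCCACTCCCTCTCTGCG"
scrambled = scramble_sequence(original)
same_composition = Counter(scrambled) == Counter(original)
```

Because a shuffle can by chance recreate short motifs from the original, a real control would normally be checked against the source sequence before use.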

Cell culture, transfection and dual-luciferase assays.

HuH7 HCC and HuH6 hepatoblastoma cells were purchased from the American Type Culture Collection (ATCC) and cultured in DMEM supplemented with 10% FBS. Cell lines were systematically screened for mycoplasma infection. Cells were transfected in reverse mode using Lipofectamine LTX PLUS (Life Technologies). Cells were cotransfected with pGL3 plasmid containing the wild-type TERT promoter or promoter with the two hotspot mutations, AAV2 insertion or insertion of a scrambled AAV2 sequence controlling a luciferase reporter gene and plasmid encoding Renilla luciferase (Promega) (Supplementary Fig. 2). To study the 3′ UTR of TNFSF10, cells were transfected with pmirGLO plasmid (Promega) containing the wild-type TNFSF10 3′ UTR or the 3′ UTR with two different types of AAV2 insertion or scrambled AAV2 sequence downstream of a luciferase reporter gene. Luminescence from firefly luciferase was normalized to the corresponding Renilla luciferase activity (indicator of transfection efficiency). The fold change in activity was then calculated relative to the values obtained for constructs containing wild-type TERT promoter or TNFSF10 3′ UTR.
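The normalization described above can be sketched as follows, assuming each transfection yields one firefly and one Renilla reading and that replicate ratios are averaged before comparison to the wild-type construct. The function names and luminescence values are hypothetical.

```python
def mean_normalized_activity(readings):
    """Mean firefly/Renilla ratio over replicate transfections.

    readings: list of (firefly, renilla) luminescence pairs,
    one pair per independent transfection.
    """
    ratios = [firefly / renilla for firefly, renilla in readings]
    return sum(ratios) / len(ratios)

def fold_change_vs_wild_type(construct_readings, wild_type_readings):
    """Normalized activity of a construct relative to the wild-type construct."""
    return (mean_normalized_activity(construct_readings)
            / mean_normalized_activity(wild_type_readings))

# Made-up triplicate readings (arbitrary luminescence units).
wild_type = [(100.0, 50.0), (200.0, 100.0), (300.0, 150.0)]
aav2_insert = [(400.0, 50.0), (800.0, 100.0), (1200.0, 150.0)]
fold = fold_change_vs_wild_type(aav2_insert, wild_type)
```

Dividing by the Renilla signal first removes well-to-well differences in transfection efficiency, so the remaining fold change reflects the reporter construct itself.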

Statistical analyses.

Statistical analyses were performed using R and GraphPad Prism. The relationship between AAV2 insertion and clinical, histological and genetic features of HCC was investigated using χ2 tests with Monte Carlo simulation according to Hope49. P-value adjustment was computed for a Monte Carlo test with 2,000 permutations. The strength of association and exclusion among gene mutation events was modeled using a binomial logistic regression model. A q value less than 0.1 was considered to be statistically significant.
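For readers unfamiliar with the simulated χ2 test, the following sketch shows the general idea in pure Python. It is not the R code used for the analyses; it assumes that permuting one set of labels (which keeps both table margins fixed) is an acceptable stand-in for sampling tables with fixed margins. The P value follows the (1 + hits)/(B + 1) form, and the counts in the example are invented for illustration.

```python
import random

def chi2_statistic(labels_a, labels_b):
    """Pearson chi-square statistic for two paired categorical label lists."""
    n = len(labels_a)
    rows, cols = sorted(set(labels_a)), sorted(set(labels_b))
    counts = {(r, c): 0 for r in rows for c in cols}
    for a, b in zip(labels_a, labels_b):
        counts[(a, b)] += 1
    row_tot = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat

def monte_carlo_chi2(labels_a, labels_b, n_sim=2000, seed=0):
    """Simulated P value: share of permuted tables at least as extreme."""
    rng = random.Random(seed)
    observed = chi2_statistic(labels_a, labels_b)
    shuffled = list(labels_b)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # permutation keeps both margins fixed
        if chi2_statistic(labels_a, shuffled) >= observed:
            hits += 1
    return observed, (1 + hits) / (n_sim + 1)

# Invented 2x2 example: 11 tumors with insertion (8 with the feature)
# and 182 tumors without insertion (40 with the feature).
insertion = [True] * 11 + [False] * 182
feature = [True] * 8 + [False] * 3 + [True] * 40 + [False] * 142
stat, p_value = monte_carlo_chi2(insertion, feature)
```

With B simulations the smallest attainable P value is 1/(B + 1), so 2,000 permutations cannot resolve P values below roughly 0.0005.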

For repeated measures, one-way analysis of variance (ANOVA) was used to compare the means of cell line–matched groups. In the experiments with transfected cell lines, the statistical significance of the quantitative values for the repeated measures was determined using Student's t tests in comparison to cell lines transfected with empty vector. A P value of less than 0.05 was considered to be statistically significant. All statistical tests were two-sided.


EGA (European Genome-phenome Archive), http://www.ebi.ac.uk/ega/; Primer3 software, http://fokker.wi.mit.edu/primer3/input.htm.

Accession codes.

The sequences reported here have been deposited in EGA (European Genome-phenome Archive) under accessions EGAS00001000217, EGAS00001000679 and EGAS00001001002. The human-AAV2 chimeric sequence data from the 11 patients with HCC have been deposited in GenBank (accessions KT258720–KT258730).




  1. Hepatocellular carcinoma. Lancet 379, 1245–1255 (2012).
  2. Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators. Nat. Genet. 44, 760–764 (2012).
  3. Integrated analysis of somatic mutations and focal copy-number changes identifies key genes and pathways in hepatocellular carcinoma. Nat. Genet. 44, 694–698 (2012).
  4. High frequency of telomerase reverse-transcriptase promoter somatic mutations in hepatocellular carcinoma and preneoplastic lesions. Nat. Commun. 4, 2218 (2013).
  5. Trans-ancestry mutational landscape of hepatocellular carcinoma genomes. Nat. Genet. 46, 1267–1273 (2014).
  6. Exome sequencing of hepatocellular carcinomas identifies new mutational signatures and potential therapeutic targets. Nat. Genet. 47, 505–511 (2015).
  7. Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma. Nat. Genet. 44, 765–769 (2012).
  8. Hepatitis B virus–related insertional mutagenesis occurs frequently in human liver cancers and recurrently targets human telomerase gene. Oncogene 22, 3911–3916 (2003).
  9. Telomerase reverse transcriptase promoter mutation is an early somatic genetic alteration in the transformation of premalignant nodules in hepatocellular carcinoma on cirrhosis. Hepatology 60, 1983–1992 (2014).
  10. Genomic profiling of hepatocellular adenomas reveals recurrent FRK-activating mutations and the mechanisms of malignant transformation. Cancer Cell 25, 428–441 (2014).
  11. Adenovirus-associated defective virus particles. Science 149, 754–756 (1965).
  12. Adeno-associated virus integration: virus versus vector. Gene Ther. 15, 817–822 (2008).
  13. Adeno-associated virus: a ubiquitous commensal of mammals. Hum. Gene Ther. 16, 401–407 (2005).
  14. Targeted integration of adeno-associated virus (AAV) into human chromosome 19. EMBO J. 10, 3941–3950 (1991).
  15. Preferential integration of adeno-associated virus type 2 into a polypyrimidine/polypurine-rich region within AAVS1. J. Virol. 81, 9718–9726 (2007).
  16. Site-specific integration by adeno-associated virus. Proc. Natl. Acad. Sci. USA 87, 2211–2215 (1990).
  17. Cyclins and CDKs in development and cancer: a perspective. Oncogene 24, 2909–2915 (2005).
  18. Oncogenic activation of a human cyclin A2 targeted to the endoplasmic reticulum upon hepatitis B virus genome insertion. Oncogene 16, 1277–1288 (1998).
  19. The TRAIL apoptotic pathway in cancer onset, progression and therapy. Nat. Rev. Cancer 8, 782–798 (2008).
  20. Recurrent targeted genes of hepatitis B virus in the liver cancer genomes identified by a next-generation sequencing–based approach. PLoS Genet. 8, e1003065 (2012).
  21. Adeno-associated virus: from defective virus to effective vector. Virol. J. 2, 43 (2005).
  22. Mechanisms of HBV-related hepatocarcinogenesis. J. Hepatol. 52, 594–604 (2010).
  23. Human tumor-associated viruses and new insights into the molecular mechanisms of cancer. Oncogene 27 (suppl. 2), S31–S42 (2008).
  24. Hepatitis B virus integration in a cyclin A gene in a hepatocellular carcinoma. Nature 343, 555–557 (1990).
  25. Novel transcriptional regulatory signals in the adeno-associated virus terminal repeat A/D junction element. J. Virol. 74, 8732–8739 (2000).
  26. Expression of the human multidrug resistance and glucocerebrosidase cDNAs from adeno-associated vectors: efficient promoter activity of AAV sequences and in vivo delivery via liposomes. Hum. Gene Ther. 7, 1309–1322 (1996).
  27. Why do viruses cause cancer? Highlights of the first century of human tumour virology. Nat. Rev. Cancer 10, 878–889 (2010).
  28. Worldwide epidemiology of neutralizing antibodies to adeno-associated viruses. J. Infect. Dis. 199, 381–390 (2009).
  29. Prevalence of neutralizing antibodies against adeno-associated virus (AAV) types 2, 5, and 6 in cystic fibrosis and normal populations: implications for gene therapy using AAV vectors. Hum. Gene Ther. 17, 440–447 (2006).
  30. Adenovirus-associated virus vector-mediated gene transfer in hemophilia B. N. Engl. J. Med. 365, 2357–2365 (2011).
  31. AAV2-GAD gene therapy for advanced Parkinson's disease: a double-blind, sham-surgery controlled, randomised trial. Lancet Neurol. 10, 309–319 (2011).
  32. AAV vector integration sites in mouse hepatocellular carcinoma. Science 317, 477 (2007).
  33. Vector design influences hepatic genotoxicity after adeno-associated virus gene therapy. J. Clin. Invest. 125, 870–880 (2015).
  34. Induction of hepatocellular carcinoma by in vivo gene targeting. Proc. Natl. Acad. Sci. USA 109, 11264–11269 (2012).
  35. Analysis of tumors arising in male B6C3F1 mice with and without AAV vector delivery to liver. Mol. Ther. 14, 34–44 (2006).
  36. Patterns of scAAV vector insertion associated with oncogenic events in a mouse model for genotoxicity. Mol. Ther. 20, 2098–2110 (2012).
  37. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat. Biotechnol. 27, 182–189 (2009).
  38. Frequent in-frame somatic deletions activate gp130 in inflammatory hepatocellular tumours. Nature 457, 200–204 (2009).
  39. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
  40. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  41. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
  42. Germline hepatocyte nuclear factor 1α and 1β mutations in renal cell carcinomas. Hum. Mol. Genet. 14, 603–614 (2005).
  43. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578 (2012).
  44. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
  45. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–1015 (2010).
  46. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890 (1988).
  47. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol. 10, 512–526 (1993).
  48. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
  49. A simplified Monte Carlo significance test procedure. J. R. Stat. Soc., B 30, 582–598 (1968).



We warmly thank L. Yost, E. Chevet and A. de Reynies for critical review of the manuscript and helpful discussion. We thank all the clinicians, surgeons and pathologists who have participated in this work: J. Saric, C. Laurent, L. Chiche, B. Le Bail and C. Castain (CHU Bordeaux) and Y. Allory, K. Leroy and D. Azoulay (CHU Henri Mondor). We also thank the Réseau National Centre de Ressources Biologiques (CRB) Foie and the tumor banks of CHU Bordeaux and CHU Henri Mondor for contributing to the tissue collection. This work was supported by Institut National du Cancer (INCa) with the International Cancer Genome Consortium (ICGC) and the PAIR-CHC project NoFLIC (also funded by Association pour la Recherche sur le Cancer (ARC)). The group is supported by the Ligue Nationale contre le Cancer. J.-C.N., M.M., C.P. and A.F. were supported by fellowships from INCa, AERIO-Boehringer-Ingelheim, ARC and the Ligue Nationale contre le Cancer, respectively.

Author information

Author notes

    Jean-Charles Nault, Shalini Datta & Sandrine Imbeaud

    These authors contributed equally to this work.


  1. INSERM, Unité Mixte de Recherche (UMR) 1162, Génomique Fonctionnelle des Tumeurs Solides, Equipe Labellisée Ligue contre le Cancer, Paris, France.
    Jean-Charles Nault, Shalini Datta, Sandrine Imbeaud, Andrea Franconi, Maxime Mallet, Gabrielle Couchy, Eric Letouzé, Camilla Pilati, Benjamin Verret, Julien Calderaro, Fabien Calvo & Jessica Zucman-Rossi
  2. Université Paris Descartes, Labex Immuno-Oncology, Sorbonne Paris Cité, Paris, France.
    Jean-Charles Nault, Shalini Datta, Sandrine Imbeaud, Andrea Franconi, Maxime Mallet, Gabrielle Couchy, Eric Letouzé, Camilla Pilati, Julien Calderaro, Fabien Calvo & Jessica Zucman-Rossi
  3. Université Paris 13, Sorbonne Paris Cité, Unité de Formation et de Recherche (UFR) Santé, Médecine, Biologie Humaine (SMBH), Bobigny, France.
    Jean-Charles Nault, Shalini Datta, Sandrine Imbeaud, Andrea Franconi, Maxime Mallet, Gabrielle Couchy, Eric Letouzé, Camilla Pilati, Julien Calderaro, Fabien Calvo & Jessica Zucman-Rossi
  4. Université Paris Diderot, Institut Universitaire d'Hématologie, Paris, France.
    Jean-Charles Nault, Shalini Datta, Sandrine Imbeaud, Andrea Franconi, Maxime Mallet, Gabrielle Couchy, Eric Letouzé, Camilla Pilati, Benjamin Verret, Julien Calderaro, Fabien Calvo & Jessica Zucman-Rossi
  5. Assistance Publique–Hôpitaux de Paris (AP-HP), Hôpitaux Universitaires Paris–Seine Saint-Denis, Site Jean Verdier, Pôle d'Activité Cancérologique Spécialisée, Service d'Hépatologie, Bondy, France.
    Jean-Charles Nault
  6. Centre Hospitalier Universitaire (CHU) de Bordeaux, Department of Hepatology, Hôpital Saint-André, Bordeaux, France.
    Jean-Frédéric Blanc
  7. INSERM, UMR 1053, Bordeaux, France.
    Jean-Frédéric Blanc, Charles Balabaud & Paulette Bioulac-Sage
  8. Université de Bordeaux, Bordeaux, France.
    Jean-Frédéric Blanc, Charles Balabaud & Paulette Bioulac-Sage
  9. AP-HP, Department of Pathology, CHU Henri Mondor, Créteil, France.
    Julien Calderaro
  10. AP-HP, Department of Digestive and Hepatobiliary Surgery, CHU Henri Mondor, Créteil, France.
    Alexis Laurent
  11. INSERM, U955, Créteil, France.
    Alexis Laurent
  12. IntegraGen, Evry, France.
    Mélanie Letexier
  13. CHU de Bordeaux, Pellegrin Hospital, Department of Pathology, Bordeaux, France.
    Paulette Bioulac-Sage
  14. Institut Gustave Roussy, Core Europe, Villejuif, France.
    Fabien Calvo
  15. AP-HP, Hôpital Européen Georges Pompidou, Paris, France.
    Jessica Zucman-Rossi



J.-C.N., S.D., S.I. and J.Z.-R. designed the study and wrote the manuscript. J.Z.-R. conceived and directed the research. J.-C.N., S.D., A.F., M.M., G.C., C.P. and B.V. performed the experiments. J.-C.N., S.D., S.I., A.F., M.M., G.C., E.L., C.P., B.V., F.C. and J.Z.-R. analyzed and interpreted the data. S.I., E.L. and M.L. performed bioinformatics and statistical analysis. J.-F.B., C.B., J.C., A.L. and P.B.-S. provided essential biological resources and collected clinical data. All authors approved the final manuscript and contributed to critical revisions to its intellectual content.

Competing interests

IntegraGen performed all the next-generation sequencing, and M.L. is an employee of IntegraGen. All other authors declare no competing financial interests.

Corresponding author

Correspondence to Jessica Zucman-Rossi.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–9 and Supplementary Tables 1 and 2.


Further reading