Extrachromosomal circularization of DNA is an important genomic feature in cancer. However, the structure, composition and genome-wide frequency of extrachromosomal circular DNA have not yet been profiled extensively. Here, we combine genomic and transcriptomic approaches to describe the landscape of extrachromosomal circular DNA in neuroblastoma, a tumor arising in childhood from primitive cells of the sympathetic nervous system. Our analysis identifies and characterizes a wide catalog of somatically acquired and undescribed extrachromosomal circular DNAs. Moreover, we find that extrachromosomal circular DNAs are an unanticipated major source of somatic rearrangements, contributing to oncogenic remodeling through chimeric circularization and reintegration of circular DNA into the linear genome. Cancer-causing lesions can emerge out of circle-derived rearrangements and are associated with adverse clinical outcome. It is highly probable that circle-derived rearrangements represent an ongoing mutagenic process. Thus, extrachromosomal circular DNAs represent a multihit mutagenic process, with important functional and clinical implications for the origins of genomic remodeling in cancer.
Recent studies have shown that circular DNA is more prevalent in human tissues than previously anticipated1,2,3,4,5. Based on size and copy number, at least three classes of circular DNA exist in human cells: (1) small extrachromosomal circular DNA (including microDNA; referred to as eccDNA throughout the text)3,6; (2) large, copy number–amplified extrachromosomal circular DNA (ecDNA)1; and (3) ring and/or neochromosomes7,8. ecDNA can lead to oncogene amplification and is a powerful driver of intratumoral heterogeneity1,9,10,11,12. Whether ecDNA has other cancer-causing functions is unknown, and the impact circularization has on genome remodeling is unclear.
Neuroblastoma is one of the first tumor entities where extrachromosomal oncogene circularization in the form of MYCN proto-oncogene double-minute chromosomes was detected10,13. Since the first descriptions in 1965 (refs. 14,15), the extent of DNA circularization has not been accurately quantified in neuroblastoma. We hypothesized that DNA circularization could represent a genome-wide, driving mutagenic process in neuroblastoma with functional consequences beyond oncogene amplification. We set out to systematically describe the spectrum and impact of circular DNA in neuroblastoma by using different genomic and transcriptomic approaches (Supplementary Fig. 1).
Since DNA circularity can be computationally inferred from whole-genome sequencing (WGS) data3,16,17, we analyzed WGS data from 93 neuroblastomas paired with normal blood specimens using an algorithm that detects circularity from paired-end read orientation (Fig. 1a,b). This approach detected a large tumor-specific circular DNA catalog, including MYCN double-minute chromosomes, mitochondrial DNA and many previously undescribed ecDNAs and eccDNAs (Fig. 1c,d and Supplementary Fig. 2a,b). This suggests a greater prevalence and complexity of circular DNA in neuroblastoma than previously anticipated.
To achieve complementary and more sensitive detection and characterization of circular DNA in neuroblastoma, we adapted and modified the Circle sequencing (Circle-seq) method (Supplementary Figs. 1 and 2c,d)6. We achieved specific DNA circle enrichment through >10^10-fold depletion of linear genomic DNA (gDNA; Fig. 1c and Supplementary Figs. 2c and 3a–c). Applying Circle-seq to endonuclease-treated gDNA significantly reduced read mapping to circularized genomic regions by 474-fold (P = 7.566 × 10^−11, Welch’s t-test; Fig. 1c and Supplementary Fig. 3d,e), confirming specific enrichment of circular DNA. Sequence composition was analyzed and genomic origin inferred by combining massively parallel paired-end sequencing with long-read Nanopore and single-molecule real-time sequencing (SMRT-seq). Circular head-to-tail junctions predicted computationally were confirmed by PCR and Sanger sequencing (Supplementary Fig. 3a–c). De novo sequence assembly of long reads spanning the entirety of circles allowed further physical confirmation of their circular structure in 65% of cases (Supplementary Fig. 4a–c). Circle-seq confirmed 100% of ecDNAs and 30% of eccDNAs predicted from WGS and identified on average 0.82 ecDNAs and 5,673 eccDNAs per neuroblastoma (Fig. 1c–e and Supplementary Fig. 4d–f). Although ecDNA was accurately predicted from WGS with high sensitivity (100%), our results highlight the advantages of using additional and more sensitive approaches, such as Circle-seq, to obtain a comprehensive characterization of circular DNAs in tumors.
The structure of circularized genomic loci in neuroblastoma varied considerably, with mean sizes of 680,200 base pairs (bp; ecDNA) and 2,403 bp (eccDNA) in tumors, reproducing the oscillating length distribution observed in lymphoma cancer cell lines3 (Fig. 1f and Supplementary Fig. 4g–j). In agreement with cytogenetic reports18, no ring chromosomes were detected in neuroblastoma. Notably, both ecDNAs and eccDNAs were of monoallelic origin, as determined by haplotype phasing (Fig. 1g). Inspection of circle junction sequences (ecDNA and eccDNA) indicated the probable mechanism(s) of generation, since 2.8% contained nontemplate insertions indicative of nonhomologous end joining repair or replication-associated mechanisms (Supplementary Fig. 4k). In line with reports in human lymphoma cell lines19, 6.3% of circle junctions contained sequence microhomologies (minimally 5 bp), suggesting the involvement of microhomology-mediated DNA repair (Supplementary Fig. 4l). Notably, eccDNA and ecDNA were significantly enriched in genic regions, particularly in MYCN-amplified neuroblastomas (Fig. 1h and Supplementary Fig. 5a–c). Whereas ecDNAs regularly contained entire genes (62.5%), eccDNAs mostly included fractions of genes (Fig. 1i). Our genome-wide map of circular DNA in neuroblastoma shows that DNA circularization is not restricted to proto-oncogenes but also affects various coding and noncoding regions with yet unknown functional consequences.
Extrachromosomal circularization and amplification are associated with increased oncogene expression. It is unclear whether circularization itself or subsequent circle copy number amplification drives overexpression. The majority of genomic amplifications (85.7%) identified using WGS coincided with ecDNAs, as confirmed by Circle-seq, suggesting that ecDNAs contribute to genomic amplifications. Moreover, haplotype phasing showed that ecDNAs were exclusively derived from the amplified allele, confirming extrachromosomal circularization as a potential driver of high-level focal genomic amplifications (Fig. 2a,b). Notably, circle length was significantly associated with a higher copy number of circularized regions (Supplementary Fig. 5d; P < 1 × 10^−4), implicating circle length as a determining factor for subsequent amplification/propagation of circular DNA (Supplementary Fig. 5d–f). In agreement with its prominent role in neuroblastoma genesis, MYCN was the most recurrently extrachromosomally amplified and overexpressed gene in our cohort (Fig. 2b–e and Supplementary Fig. 5a–c). Other cancer-related genes listed in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database20 were also circularized in tumors and neuroblastoma cell lines, including the JUN and MDM2 proto-oncogenes and SOX11 and TAL2 transcription factor genes (Fig. 2c and Supplementary Fig. 5a–c). However, the genomic copy number of oncogenes contained in the majority of eccDNAs was not altered (Supplementary Fig. 5g,h), suggesting that extrachromosomal circularization may be required but insufficient for oncogene amplification.
To determine the consequences of DNA circularization on gene expression, we performed total RNA sequencing (RNA-seq) on our neuroblastoma cohort. Whereas differences in gene expression were not observed for most genes affected by circularization in the form of small eccDNA (Fig. 2d and Supplementary Fig. 5i–j), massive increases in expression occurred for a small subset of genes entirely incorporated on circularized DNA and amplified as ecDNA (Fig. 2d–f). For example, NTF3, a gene encoding a neurotrophic factor with known importance in neuroblastoma21, was strongly expressed from amplified ecDNA (Fig. 2f). Allele-specific messenger RNA expression (allele-specific expression (ASE)) analysis confirmed that increased gene expression originated from the circular allele (Fig. 2a,b). In contrast, ASE from copy number–neutral extrachromosomal circles did not differ from noncircular counterparts (Supplementary Fig. 5g,i,j; binomial test for equal probability, P = 0.24), suggesting that DNA circularization was insufficient to induce high-level gene expression. Thus, even though DNA circularization is a major route to gene amplification, it appears insufficient alone (without combined amplification) to increase gene expression. Given this observation, we hypothesized that circular DNA may have additional, cancer-relevant functions.
The genome-wide frequency and functional impact of circle-derived structural rearrangements, such as chimeric circle formation (circular DNA including parts from different chromosomes)17,22, and circular DNA reintegration23, in neuroblastomas are currently unknown. We hypothesized that beyond their ability to drive gene amplification, circular DNAs may serve as substrates for oncogenic genome remodeling. We sought evidence of genomic rearrangements at circularization loci (ecDNA and eccDNA) in WGS data (Supplementary Fig. 1). Strikingly, most intrachromosomal and interchromosomal rearrangements detected in neuroblastoma genomes coincided with regions of extrachromosomal circularization, supporting the idea of circle-mediated genome remodeling (Supplementary Fig. 6a,b). Visual inspection of Circos plots from each tumor showed that interchromosomal rearrangements at circularization loci often formed a tree-shaped pattern, defined as clusters of at least three interchromosomal rearrangements with the same origin and branches reaching other distant genomic regions (Fig. 3a,b and Supplementary Fig. 7a–l). Tree-shaped rearrangement cluster origins significantly overlapped with ecDNAs, with hot spots on chromosomes 2 (including MYCN) and 12 (Fig. 3c and Supplementary Fig. 7i). Only 10.5% of MYCN-amplified neuroblastomas displayed homogeneously staining regions (Supplementary Table 1), consistent with their rarity in neuroblastomas14,24,25. Thus, the majority of MYCN-derived tree-shaped rearrangements did not represent homogeneously staining regions. Tree-shaped rearrangement patterns indicative of circle-derived rearrangements were detected in 9% of pediatric tumors in the analysis of an independent dataset of structural rearrangements in 546 pediatric cancer genomes26, confirming that this pattern is neither entity-specific nor dependent on variant detection methods (Supplementary Fig. 7j).
Our data reveal an unanticipated association between circular DNA and somatic genomic rearrangements in neuroblastoma.
We reasoned that circle-derived tree-shaped rearrangements could either represent chromosomal circle integrations or the formation of chimeric circles, incorporating different chromosomal parts. To test this, we inspected the rearrangement recipient sites for signs of extrachromosomal circularization and integration and performed de novo assembly of circular DNAs (ecDNA and eccDNA). Extrachromosomal circular DNAs (identified using Circle-seq) appeared in 5.5% of rearrangement recipient sites (tree branch intercepts), indicating chimeric circle formation (Supplementary Fig. 6). This was confirmed by long-read Nanopore sequencing and assembly-based circle reconstruction, determining chimeric structures in 2.1% of eccDNAs and 84% of ecDNAs with on average 2.2 and 4.8 chimeric segments, respectively. Chromosomal circle integration was defined as interchromosomal rearrangements connecting extrachromosomal circles with intrachromosomal sites (that is, not detected by Circle-seq). The majority of rearrangement recipient sites (83.3%) were classified as circle integrations (Fig. 3d and Supplementary Fig. 6), which were validated by visual inspection of split reads, allele-specific PCR and Sanger sequencing (Fig. 3d and Supplementary Fig. 8). Phased heterozygous SNPs near integration breakpoints further confirmed extrachromosomal DNA circles as the origin of the integrations (Fig. 3d). Thus, circle-derived, tree-shaped rearrangement clusters represent (1) formation of chimeric circles and (2) chromosomal circle integrations.
To test the functional impact of circle-derived, tree-shaped rearrangements in neuroblastoma, we inspected the rearrangement recipient sites for the presence of cancer-relevant genes and changes in gene expression (Fig. 4a). Circle integration sites and sites included in chimeric circles were significantly enriched for cancer-relevant genes (P = 0.033) and particularly for tumor suppressor genes (P = 0.033), whose expression differed from that in tumors in which the same gene was not involved in circle-derived rearrangements (Fig. 4b,c and Supplementary Fig. 9). For example, integration of an extrachromosomal circle fragment into the DCLK1 gene (shown in Fig. 3d) led to loss of heterozygosity (LOH) and was associated with significant repression of DCLK1 expression (Fig. 4b). In agreement with a tumor suppressor function in neuroblastoma, low DCLK1 expression was associated with adverse patient prognosis and short hairpin RNA-mediated DCLK1 knockdown significantly increased clonogenicity in neuroblastoma cell lines (Supplementary Fig. 10a–i). Notably, circle integration also occurred proximal to the TERT gene and was associated with enhanced TERT expression (Fig. 4c). It is tempting to speculate that enhancer hijacking27 or disruption of other cis-regulatory elements could explain such expression changes. Chimeric circle formation, on the other hand, often resulted in simultaneous amplification of multiple proto-oncogenes and aberrant circle-specific fusion transcript expression in a subset of cases (Supplementary Fig. 11). Thus, circle-derived rearrangements can contribute to aberrant expression of cellular tumor suppressors and proto-oncogenes.
Seemingly genetically identical MYCN-amplified neuroblastomas can produce strong clinical heterogeneity, representing a conundrum in the field. We hypothesized that circle-derived oncogenic lesions could functionally cooperate with extrachromosomal circular MYCN amplification, explaining some of the clinical heterogeneity observed. Indeed, the presence of circle-derived rearrangements was associated with adverse patient outcome (Fig. 4d). In line with our hypothesis, patients with MYCN-amplified neuroblastomas and circle-derived rearrangement clusters involving MYCN had significantly worse overall survival compared to patients with MYCN-amplified tumors lacking such rearrangements (Fig. 4e). Contrastingly, the number of rearrangements in MYCN-amplified tumors did not correlate with survival (Supplementary Fig. 12a–c). This implicates circle-derived rearrangements as clinically relevant genomic alterations in neuroblastoma.
Our work provides a comprehensive map of extrachromosomal DNA circularization in neuroblastoma, revealing this mutagenic process to be more frequent than previously anticipated. We demonstrate that the majority of genomic rearrangements in neuroblastoma involve circular DNA, challenging our current understanding of cancer genome remodeling. Such rearrangements have previously gone largely undetected or underestimated in WGS analyses because integrative, sequencing-based methods identifying circular DNA in tumor samples were lacking. In contrast to previous cytogenetic reports describing homogeneously staining region-based circle integration and chimeric circle formation as a means of stable gene amplification, we conclude that extrachromosomally circularized DNA can actively contribute to genome remodeling with important functional and clinical consequences (Fig. 4f). It is tempting to speculate that factors exist, such as recently described oncogenic transposases28,29,30, that could induce a mutator phenotype in the presence of circular DNA, driving circle-mediated genome remodeling. We envision that our findings extend to other cancers and that further detailed analyses of circle-derived rearrangements will yield new insights into cancer genome remodeling.
The synthetic oligonucleotides listed in Supplementary Table 2 were obtained from Eurofins Genomics and were salt-free purified. pLKO.1 shRNA vectors targeting DCLK1 (TRCN0000002145, TRCN0000002146) and control short hairpin green fluorescent protein were obtained from the RNAi Consortium (Broad Institute).
Human tumor cell lines were obtained from the DSMZ-German Collection of Microorganisms and Cell Cultures (Leibniz Institute), from ATCC or were a gift from C. J. Thiele. The identity of all cell lines was verified by short tandem repeat (STR) genotyping (Genetica DNA Laboratories and/or IDEXX BioResearch). Absence of Mycoplasma contamination was determined with a MycoAlert system (Lonza). Neuroblastoma cell lines were cultured in RPMI 1640 medium (Thermo Fisher Scientific) supplemented with penicillin, streptomycin and 10% FCS. To assess the number of viable cells, cells were trypsinized, resuspended in medium and sedimented at 500g for 5 min. Cells were then resuspended in medium, mixed in a 1:1 ratio with 0.02% Trypan Blue Solution (Thermo Fisher Scientific) and counted with a TC20 Automated Cell Counter (Bio-Rad Laboratories). Lentiviral production and transduction were performed as described previously28. Clonogenicity was assessed as described previously28. Kelly and IMR-5 cells were plated in 24-well microplates at a concentration of 5,000 cells per well and incubated for 7 d. Clonogenicity was quantified using methods described previously31.
Protein blotting was performed as described previously28 using mouse anti-β-actin (clone 8H10D10; Cell Signaling Technology), mouse anti-α-tubulin (clone DM1A; Cell Signaling Technology) and rabbit anti-DCLK1/DCAMKL1 (clone D2U3L; Cell Signaling Technology) antibodies.
PCR and Sanger sequencing
PCR reactions were performed on 50–100 ng of gDNA using 0.4 U Phusion Hot Start II High-Fidelity DNA Polymerase (Thermo Fisher Scientific), 0.5 µM forward and reverse primers (Supplementary Table 2), 200 µM deoxyribonucleotide triphosphates (Bio-Budget Technologies) and 4 µl 5× Phusion Green buffer (Thermo Fisher Scientific). PCR products were resolved on 1% agarose gels. PCR amplicons were purified using the PureLink PCR Purification Kit (Thermo Fisher Scientific). Sanger sequencing was carried out by capillary sequencing using standard procedures (Eurofins Genomics).
Quantitative PCR (qPCR)
qPCR was performed using 50 ng or 1.5 µl of template DNA and 0.5 µM primers with SYBR Green PCR Master Mix (Thermo Fisher Scientific) in FrameStar 96-well PCR plates (4titude). Reactions were run and monitored on a StepOnePlus Real-Time PCR System (Thermo Fisher Scientific) and Ct values were calculated with the StepOne Plus software v.2.3 (Thermo Fisher Scientific).
Circular DNA isolation, purification and sequencing
Circular DNA isolation and purification were performed on the samples described in Supplementary Tables 3 and 4, following previous reports of Circle-seq6. A detailed step-by-step protocol for circular DNA isolation has been deposited on the Nature Protocol Exchange server32. DNA content was measured with a NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific) and a Qubit 3.0 Fluorometer (Thermo Fisher Scientific). Amplified circular DNA was sheared to an average fragment size of 150–200 bp using an S220 focused ultrasonicator (Covaris). Libraries for next-generation sequencing were prepared using the NEBNext Ultra DNA Library Kit for Illumina according to the manufacturer’s protocol (New England Biolabs). Libraries were sequenced on MiSeq instruments with 2 × 150 bp paired-end reads, HiSeq 4000 instruments with 2 × 125 bp paired-end reads or NextSeq 500 instruments with 2 × 150 bp paired-end reads (all Illumina). SMRT-seq was performed on a PacBio RS II instrument according to the manufacturer’s protocol (Pacific Biosciences). Nanopore sequencing was performed on a MinION instrument according to the manufacturer’s protocol (Oxford Nanopore).
Circle-seq analysis
Reads were 3′ trimmed for both quality and adapter sequences, with reads removed if the length was less than 20 nucleotides. Burrows–Wheeler Aligner MEM v.0.7.15 with default parameters was used to align the reads to human reference assembly hg19; PCR and optical duplicates were removed with Picard v.2.16.0. The aligned BAM files were then analyzed in two ways. First, all read pairs and split reads containing any outward-facing read orientation, indicating potential circles, were placed in a new BAM file. Second, genomic segments enriched for signal over background were detected in the ‘all reads’ BAM file using variable-width windows from Homer v.4.11 findPeaks (http://homer.ucsd.edu/), and the edges of these enriched regions were intersected with the ‘circle only’ BAM file to quantify the number of circle-supporting reads. To determine the thresholds for significance of real circles versus background noise, matched WGS data were used to determine the background distribution of circle-oriented reads in non-circle-enriched regions that were matched for length and nucleotide composition. An empirical P value of 0.01 was used to filter putative circles and regions passing this filter were then used for downstream analysis.
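The outward-facing orientation test that underlies the ‘circle only’ read set can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Mate` record and the `is_outward_facing` helper are hypothetical names, and the sketch assumes that both mates map to the same chromosome and that each mate is described by its leftmost aligned position and strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mate:
    """Hypothetical minimal description of one aligned mate."""
    chrom: str
    start: int        # leftmost aligned position
    is_reverse: bool  # True if aligned to the reverse strand


def is_outward_facing(a: Mate, b: Mate) -> bool:
    """Return True when the two mates point away from each other."""
    # Mates on different chromosomes or on the same strand are not
    # informative for this particular test.
    if a.chrom != b.chrom or a.is_reverse == b.is_reverse:
        return False
    left, right = (a, b) if a.start <= b.start else (b, a)
    # A regular linear fragment gives forward-then-reverse mates; a read
    # pair spanning a head-to-tail junction gives reverse-then-forward.
    return left.is_reverse and not right.is_reverse
```

In this simplified view, a pair whose leftmost mate is on the reverse strand and whose rightmost mate is on the forward strand is a candidate circle-supporting pair; split reads would need a separate test.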
Circle analysis in WGS data
Alignments to hg19 were created as outlined earlier, with read trimming, Burrows–Wheeler Aligner MEM and duplicate removal. Discovery of putative tumor-specific circular DNA relied on the filtering of false positives from genomic sequence as well as circles from normal tissue. Tumor-specific circles were identified with the following approach: (1) alignments with an outward-facing read orientation served as markers of putative circle boundaries projected onto the linear genome; (2) all such regions were merged if their edges occurred within 500 bp on both ends; (3) regions not meeting the empirically defined background threshold were filtered out (P < 0.01; see Circle-seq analysis); (4) lastly, these putative circles were classified as tumor-specific once filtered against circles discovered in the matched normal genome (using steps 1–3). To allow for the detection of copy number–neutral DNA circles, copy number information was not used for this analysis. We confirmed that tandem duplications identified using variant calling algorithms did not yield the same number of circular DNAs from the WGS data (Supplementary Fig. 4).
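Steps 2 and 4 of this approach can be sketched as follows. The sketch is illustrative only: the function names are hypothetical, the merge is a simple greedy pass, and reusing the 500-bp tolerance for the tumor-versus-normal comparison is an assumption that is not stated in the text.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def edges_match(a: Region, b: Region, tol: int = 500) -> bool:
    """True if both edges of two regions lie within `tol` bp of each other."""
    return a[0] == b[0] and abs(a[1] - b[1]) <= tol and abs(a[2] - b[2]) <= tol


def merge_candidates(regions: List[Region], tol: int = 500) -> List[Region]:
    """Greedily merge candidate circles whose two edges both agree (step 2)."""
    merged: List[Region] = []
    for reg in sorted(regions):
        for i, m in enumerate(merged):
            if edges_match(reg, m, tol):
                # Keep the outermost coordinates of the merged candidates.
                merged[i] = (m[0], min(m[1], reg[1]), max(m[2], reg[2]))
                break
        else:
            merged.append(reg)
    return merged


def tumor_specific(tumor: List[Region], normal: List[Region],
                   tol: int = 500) -> List[Region]:
    """Drop tumor circles that also appear in the matched normal (step 4).

    Assumption: a circle counts as shared when both edges agree within `tol`.
    """
    return [t for t in tumor if not any(edges_match(t, n, tol) for n in normal)]
```

The background filter of step 3 is omitted here because it depends on the empirical distribution of circle-oriented reads described for the Circle-seq analysis.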
De novo assembly of extrachromosomal circular DNA
De novo assembly of long-read data (SMRT and Nanopore) was accomplished using two approaches. First, for long-read data alone, the Flye v.2.5 assembler (http://github.com/fenderglass/Flye) was used in ‘--meta’ mode with circle junctions evaluated after polishing. Second, for hybrid assemblies using both long and short read data, Unicycler v.0.4.7 (http://github.com/rrwick/Unicycler) was used with racon v.1.3.3 and SPAdes v.3.13.0 and polished with Pilon v.1.23. In all cases, circle assembly was inspected visually using Bandage v.0.8.1 (http://rrwick.github.io/Bandage/). Genic overlap with de novo assemblies was evaluated in two ways. First, a BLAST database of all assembled contigs was built and matches to human genes with at least 70% of gene length covered were scored. Second, each contig, independent of genic overlap, was mapped to hg19 using minimap2 v.2.17 (http://github.com/lh3/minimap2).
Reads from the SMRT-seq data were aligned to hg19 using the Burrows–Wheeler Aligner MEM with the ‘-x pacbio’ flag (-k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0). Since these data are single-ended, outward-facing read pairs cannot be used; thus, classification of circle junctions depended on split reads. Segments of the genome enriched for circular DNA were discovered by scanning 10-kilobase (kb) windows and calculating the false discovery rate (FDR)-adjusted P value from the Poisson distribution of the randomized reads.
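The window scan can be illustrated with a short sketch of a Poisson upper-tail test followed by Benjamini–Hochberg adjustment. It is a sketch under stated assumptions rather than the analysis code: the function names are hypothetical, the background rate `lam` is passed in by the caller (in the study it derives from randomized reads), and the 0.05 cutoff is an assumption because the text does not state the FDR threshold.

```python
import math
from typing import List


def poisson_sf(k: int, lam: float) -> float:
    """Upper-tail probability P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):  # accumulate P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_windows(counts: List[int], lam: float,
                     alpha: float = 0.05) -> List[int]:
    """Indices of windows whose read count exceeds the Poisson background.

    Assumption: `alpha` is an illustrative cutoff, not taken from the study.
    """
    qvals = benjamini_hochberg([poisson_sf(c, lam) for c in counts])
    return [i for i, q in enumerate(qvals) if q < alpha]
```

Each entry of `counts` would correspond to the number of split reads in one 10-kb window.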
Genome-wide distribution was calculated by dividing each chromosome into 1-megabase (Mb) bins and overlapping with quality-filtered circles. The number of circle reads overlapping each bin was divided by the total number of circle reads per patient, calculated separately for Circle-seq and WGS data. Genic circles were classified with bedtools v.2.25.0 intersect (http://bedtools.readthedocs.io/) against all protein-coding genes, with gene bodies covered by at least 20% being used for downstream analysis. Recurrence across samples was calculated from a high-confidence set of genic circles created from genes with at least four circle-supporting reads covering at least 80% of the shortest transcript. Patients with matched Circle-seq, WGS and RNA-seq (n = 16) were used to investigate the relationship between circles, amplification and expression with a focus on circles with genic overlap. Correlation plots were computed per patient based on circle coverage, RNA expression and copy number variation fold change. Concordance between gene expression and circles was assessed by converting normalized read counts to z-scores and correlating with circle coverage across patient samples. For further methods, see the Supplementary Note.
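The per-patient normalization of circle reads across 1-Mb bins can be sketched as below. The function name is hypothetical, and assigning each read to a single bin by its start coordinate is a simplifying assumption (reads spanning a bin boundary are not counted twice).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BIN_SIZE = 1_000_000  # 1-Mb bins, as described in the text


def bin_fractions(read_positions: Iterable[Tuple[str, int]]
                  ) -> Dict[Tuple[str, int], float]:
    """Fraction of one patient's circle reads falling into each 1-Mb bin.

    `read_positions` holds (chromosome, start) for every circle read.
    """
    counts = Counter((chrom, pos // BIN_SIZE) for chrom, pos in read_positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Dividing by the patient's total makes samples of different depth comparable.
    return {bin_key: n / total for bin_key, n in counts.items()}
```

The function would be called once per patient and per data type (Circle-seq or WGS), so that fractions are never mixed across assays.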
Circle chimerism was evaluated using split reads from Nanopore sequencing (n = 21) that either bridged another chromosome or linked to a region separated by at least 4 Mb on the same chromosome. A minimum of 5 reads at a mapping quality (MAPQ) > 30 were required for a region to be considered chimeric; all such regions within the circle length ±500 bp were merged using pgltools v.1.2.0 (http://github.com/billgreenwald/pgltools). The resulting chimeric circles were further used as a secondary metric to evaluate the FDR of clustered tree-shaped rearrangement contacts in the WGS data.
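The chimerism criteria can be made concrete with the following sketch. The `SplitRead` record and both functions are hypothetical, and grouping junctions into 500-bp bins is a simplification standing in for the pgltools-based merging described above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

MIN_DISTANCE = 4_000_000  # same-chromosome links must span at least 4 Mb
MIN_READS = 5             # minimum number of supporting split reads
MIN_MAPQ = 30             # reads must have MAPQ > 30


@dataclass(frozen=True)
class SplitRead:
    """Hypothetical record of the two anchors of one split read."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int
    mapq: int


def is_chimeric_link(read: SplitRead) -> bool:
    """True if the read bridges chromosomes or distant same-chromosome loci."""
    if read.mapq <= MIN_MAPQ:
        return False
    if read.chrom_a != read.chrom_b:
        return True
    return abs(read.pos_a - read.pos_b) >= MIN_DISTANCE


def chimeric_regions(reads: Iterable[SplitRead],
                     bin_size: int = 500) -> Set[Tuple[str, int, str, int]]:
    """Junction bins supported by at least MIN_READS qualifying split reads."""
    support: Counter = Counter()
    for r in reads:
        if is_chimeric_link(r):
            key = (r.chrom_a, r.pos_a // bin_size, r.chrom_b, r.pos_b // bin_size)
            support[key] += 1
    return {key for key, n in support.items() if n >= MIN_READS}
```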
Structural variant detection
Copy number variation was detected using Control-FREEC33 v.10.6 with contamination adjustment based on a contamination of 0.4 (that is, samples are 60% tumor), a minimalSubclonePresence of 0.244 and with ASCAT v.4.0.1 using default parameters33,34. Regions in the genome with a total copy number ≥9 were considered amplified regions following COSMIC copy number variant definition20. Amplifications were intersected with regions of circularization using bedtools v.2.25.0; circular DNAs identified over these amplified regions were classified as ecDNAs. All remaining circular DNAs were classified as eccDNAs. Structural variant detection was performed on matched tumor/normal genomes using novoBreak v.1.1.3 (ref. 35), SvABA v.1.1.1 (ref. 36), Delly2 v.0.7.7 (ref. 37), BRASS v.6.0.5 (https://github.com/cancerit/BRASS) and SMUFIN v.0.9.4 (ref. 38) using default parameters. From 97 initial neuroblastoma genomes, 4 of them (NBL47, NBL53, NBL54 and NBL61) were excluded from the analysis due to their abnormally high number of breakpoints and amplified regions. The 93 genomes left were analyzed with at least 4 variant callers each. Focusing on interchromosomal rearrangements, merging and filtering of the results from different variant calling algorithms was performed. Filtering for all variants was performed with a Brass Assembly Score (BAS) ≥99 and at least 6 variant-supporting reads with an MAPQ > 60. All rearrangements that did not have a minimum of 6 aligned supporting reads with an MAPQ > 60 at each breakpoint were discarded. For the merging of interchromosomal rearrangements, all results from different variant callers were joined after filtering. Variants with breakpoints within a window of 500 bp were collapsed. Only intrachromosomal variants supported by at least two different callers were included. Two additional samples (NBL49 and NBL50), which had exceptionally high numbers of rearrangements (z-score > 2), were discarded.
A 1-Mb genomic region was blacklisted due to its high number of recurrent, visually confirmed false positive breakpoints (z-score > 2 within the 10 highest-ranking bins). Structural variant calls from an independent cohort of WGS data of 546 pediatric cancer genomes were obtained from the DKFZ Pediatric Pan Cancer dataset (https://hgserver1.amc.nl/cgi-bin/r2/main.cgi?&dscope=DKFZ_PED&option=about_dscope). For further methods, see the Supplementary Note.
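Two of the rules in this section lend themselves to a compact sketch: the ecDNA versus eccDNA classification by overlap with amplified segments, and the collapsing of calls whose breakpoints fall within 500 bp. All names are hypothetical. Treating any overlap as sufficient for the ecDNA label is an assumption, and the caller-support threshold is applied to whichever call set is passed in rather than to a specific variant class.

```python
from typing import List, Tuple

AMPLIFICATION_CN = 9  # total copy number >= 9 counts as amplified

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_circle(circle: Interval,
                    cn_segments: List[Tuple[str, int, int, float]]) -> str:
    """Label a circle 'ecDNA' if it overlaps an amplified segment, else 'eccDNA'."""
    for chrom, start, end, total_cn in cn_segments:
        if total_cn >= AMPLIFICATION_CN and overlaps(circle, (chrom, start, end)):
            return "ecDNA"
    return "eccDNA"


def collapse_calls(calls: List[Tuple[str, str, int, str, int]],
                   window: int = 500, min_callers: int = 1
                   ) -> List[Tuple[str, int, str, int]]:
    """Collapse calls (caller, chrom1, pos1, chrom2, pos2) with nearby breakpoints."""
    groups: List[dict] = []
    for caller, c1, p1, c2, p2 in sorted(calls, key=lambda c: c[1:]):
        for g in groups:
            same_pair = (g["c1"], g["c2"]) == (c1, c2)
            if same_pair and abs(g["p1"] - p1) <= window and abs(g["p2"] - p2) <= window:
                g["callers"].add(caller)  # same event seen by another caller
                break
        else:
            groups.append({"c1": c1, "p1": p1, "c2": c2, "p2": p2,
                           "callers": {caller}})
    return [(g["c1"], g["p1"], g["c2"], g["p2"])
            for g in groups if len(g["callers"]) >= min_callers]
```

Setting `min_callers=2` keeps only events reported by at least two different callers.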
Regions of clustered rearrangements
A region of clustered, tree-shaped rearrangement pattern was defined as having three or more interchromosomal rearrangements within a 4-Mb sliding window. The outermost breakpoints defined the boundaries of a cluster region. When five or more interchromosomal rearrangements connected the same two chromosomes, these were flagged and not considered for cluster detection. When 2 or more interchromosomal rearrangements connected 2 regions <10 Mb in size, only one rearrangement was counted for cluster detection. All chromosomes with >25 interchromosomal rearrangements were not considered. All structural variants detected in our dataset, as well as regions of clustered rearrangements detected using the methods described, can be visually inspected on an openly accessible website39. To estimate the FDRs, we randomly redistributed breakpoints of each sample across the mappable genome before counting the number of rearrangements within 4-Mb sliding windows. Five hundred such randomized datasets were created. The FDR was estimated as the mean fraction of rearrangement cluster-positive samples in this randomized dataset. For the chosen threshold of 3 or more rearrangements, the estimated FDR was 0.13. The analysis of circle integration was carried out by detecting the rearrangements connecting a circularized region with a candidate insertion site. Integration sites were defined by two main characteristics: both recipient breakpoints being located on the same chromosome and at a distance between breakpoints smaller than the circularized region inserted. Visual inspection of BAM files was performed for each candidate integration site. For further methods, see the Supplementary Note.
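The sliding-window cluster definition and the randomization-based FDR estimate can be sketched as follows. This is a simplified illustration with hypothetical function names: it handles breakpoints on a single linear coordinate axis, merges overlapping qualifying windows into one cluster (an assumption), omits the exclusion rules for recurrent chromosome pairs and heavily rearranged chromosomes, and randomizes one sample at a time rather than averaging across a cohort.

```python
import random
from typing import List, Tuple


def find_clusters(positions: List[int], window: int = 4_000_000,
                  min_events: int = 3) -> List[Tuple[int, int]]:
    """Regions holding at least `min_events` breakpoints within `window` bp."""
    pos = sorted(positions)
    clusters: List[Tuple[int, int]] = []
    left = 0
    for right in range(len(pos)):
        # Shrink the window from the left until it spans at most `window` bp.
        while pos[right] - pos[left] > window:
            left += 1
        if right - left + 1 >= min_events:
            start, end = pos[left], pos[right]
            if clusters and start <= clusters[-1][1]:
                # Overlapping windows extend the previous cluster.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters


def estimate_fdr(n_breakpoints: int, genome_length: int,
                 n_iter: int = 500, seed: int = 0) -> float:
    """Fraction of randomized breakpoint sets that contain a cluster."""
    rng = random.Random(seed)
    positive = 0
    for _ in range(n_iter):
        shuffled = [rng.randrange(genome_length) for _ in range(n_breakpoints)]
        if find_clusters(shuffled):
            positive += 1
    return positive / n_iter
```

The cluster boundaries returned by `find_clusters` are the outermost breakpoints, in line with the definition above.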
Circle length analysis
To identify the length preferences for circles depending on the copy number state of the underlying genomic segment, we derived a zero-sum score, following common enrichment test strategies such as gene set enrichment analysis40,41. For a given copy number category (balanced, weak imbalance, strong imbalance, LOH and focal amplification), each circle was assigned a score of 1/k if the circle belonged to the category and −1/(n − k) otherwise, where k is the total number of circles in that category and n is the total number of circles. Circles were ranked by length and cumulative scores along the list were calculated. The absolute maximum cumulative score was tested against 10,000 random permutations of the ranked list to determine the approximate enrichment P values. For further methods, see the Supplementary Note.
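The zero-sum score and its permutation test can be written out in a few lines. The sketch below uses hypothetical function names; the input is a list of category-membership flags already ordered by circle length, and the add-one correction in the P value is a common convention assumed here rather than taken from the text.

```python
import random
from typing import List


def max_running_score(in_category: List[bool]) -> float:
    """Largest absolute cumulative score along a length-ranked circle list."""
    n = len(in_category)
    k = sum(in_category)
    if k == 0 or k == n:
        raise ValueError("category must contain some but not all circles")
    hit, miss = 1.0 / k, -1.0 / (n - k)  # the scores sum to zero over the list
    running = 0.0
    best = 0.0
    for flag in in_category:
        running += hit if flag else miss
        best = max(best, abs(running))
    return best


def permutation_p(in_category: List[bool], n_perm: int = 10_000,
                  seed: int = 0) -> float:
    """Approximate enrichment P value from random permutations of the labels."""
    rng = random.Random(seed)
    observed = max_running_score(in_category)
    labels = list(in_category)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max_running_score(labels) >= observed:
            at_least += 1
    # Assumption: add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)
```

A category whose circles cluster at one end of the length ranking yields a large running score and therefore a small P value.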
Circle breakpoint analysis
Base-pair accurate circle junctions were reassembled using SvABA v.1.1.1 with default parameters and only read pairs and split reads containing any outward-facing read orientation as input. Each precise head-to-tail rearrangement call was considered a circle junction. Homology and insertion sequences were taken from the SvABA output directly.
To screen for motifs enriched at circle junction breakpoints, hg19 reference sequences for 41-bp windows around each circle junction breakpoint were obtained. MEME v.5.0.2 (parameters -objfun de -revcomp -nmotifs 5) was used to assess these sequences for motif enrichment with respect to a set of 1 million length-matched sequences randomly sampled from hg19 (excluding poorly or nonassembled regions and the ENCODE DAC blacklist). We compared reference sequence-derived microhomology lengths for actual breakpoints versus a random permutation of breakpoint partners using a two-sided t-test. For further methods, see the Supplementary Note.
Structural variant breakpoint analysis
Base-pair accurate structural rearrangement calls from the merged structural variant set were considered for detailed breakpoint analysis. The hg19 reference sequence was obtained for a 61-bp window around each breakpoint. MEME v.5.0.2 (parameters -objfun de -revcomp -nmotifs 10) was used to identify motifs that were enriched with regard to a set of 1 million length-matched sequences randomly sampled from hg19 (excluding poorly or nonassembled regions and the ENCODE DAC blacklist). Differential enrichment was assessed in the same way to compare subsets of rearrangements (clustered rearrangements versus nonclustered rearrangements, circle–circle versus other, circle–genome versus other, genome–genome versus other). Only SvABA rearrangement calls could be readily analyzed for homology and inserted sequences at breakpoints. We compared reference sequence-derived microhomology lengths for actual breakpoints versus a random permutation of breakpoint partners using a two-sided t-test. For further methods, see the Supplementary Note.
The enrichment of rearrangements in circularization loci was tested using a two-sample test for equality of proportions with continuity correction. The enrichment of interchromosomal rearrangement breakpoint clusters within circularized regions was assessed using the union of interchromosomal rearrangements detected by all variant callers, separately for regions of circularization determined using Circle-seq and using WGS. The relative overlap of each region of clustered breakpoints with circularized regions in the respective sample was computed. The distribution of overlap was then compared to the distribution expected by chance. For each region of clustered rearrangements, 2,000 random intervals of matching length were randomly positioned over a masked genome that excluded poorly or nonassembled regions and the ENCODE DAC blacklist. The relative overlap of each random interval with circular DNA in the matching patient was then assessed. A hypothesis test was derived by comparing the mean relative overlap for the set of observed cluster regions with the distribution of the mean relative overlap for the 2,000 synthetic sets of cluster regions. The one-sided empirical P value was calculated and Benjamini–Hochberg-corrected for multiple comparisons (circle classes and circle calling methods). We also investigated the distal breakpoints of tree-shaped clustered rearrangements, testing whether these breakpoints were closer to certain classes of genes than expected by chance. We considered three gene classes: all COSMIC v.87 genes, only COSMIC v.87 oncogenes and only COSMIC v.87 tumor suppressor genes. For each breakpoint, we calculated the distance to the closest gene of the particular gene class and calculated the class-wise median of distances. Each median was assigned a one-tailed P value based on the distribution of medians in 500 synthetic datasets with breakpoint positions randomly drawn from the nonblacklisted genome.
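The randomization test for overlap between cluster regions and circularized regions, as described above, might look like the following Python sketch. It is a simplified illustration under stated assumptions: a single unmasked chromosome of length `genome_length`, half-open `(start, end)` intervals, hypothetical function names and a `+1` pseudocount in the empirical P value.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open (start, end)


def merge_intervals(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def relative_overlap(region: Interval, circles: Sequence[Interval]) -> float:
    """Fraction of `region` covered by already-merged circle intervals."""
    start, end = region
    covered = sum(max(0, min(end, c_end) - max(start, c_start))
                  for c_start, c_end in circles)
    return covered / (end - start)


def overlap_p_value(clusters: Sequence[Interval],
                    circles: Sequence[Interval],
                    genome_length: int,
                    n_iter: int = 2000,
                    seed: int = 0) -> Tuple[float, float]:
    """Observed mean relative overlap and its one-sided empirical P value."""
    rng = random.Random(seed)
    merged = merge_intervals(circles)
    observed = mean(relative_overlap(c, merged) for c in clusters)
    at_least_as_large = 0
    for _ in range(n_iter):
        synthetic = []
        for start, end in clusters:
            length = end - start
            # Place an interval of matching length at a random position.
            new_start = rng.randrange(genome_length - length + 1)
            synthetic.append((new_start, new_start + length))
        if mean(relative_overlap(s, merged) for s in synthetic) >= observed:
            at_least_as_large += 1
    # +1 pseudocount is an assumption of this sketch.
    return observed, (at_least_as_large + 1) / (n_iter + 1)
```

In a real analysis, the random placement would additionally have to respect chromosome boundaries and the masked regions.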
P values were corrected for multiple testing using the Benjamini–Hochberg method. To assess gene expression changes around rearrangement breakpoints, the expression of protein-coding genes within 2 Mb of each breakpoint was analyzed. The differential RNA expression of genes in each sample compared to the rest of the cohort was quantified and the modified z-score of their transcripts per million was calculated. Two-sided log-rank tests were used for survival analysis across subgroups. To assess the effect of rearrangement clusters at the MYCN amplicon locus, MYCN-associated clusters were defined as all clusters that overlapped the ±1-Mb window around MYCN. All violin plots depict the smoothed distribution using a Gaussian kernel with bandwidth selected according to Silverman’s rule. The box plots depict the first and third quartiles, segmented by the median; the whiskers depict the points within the 1.5× interquartile range beyond the box edges. All cell culture experiments were conducted at least three independent times, unless otherwise stated. For further details, see the Nature Research Reporting Summary. For further methods, see the Supplementary Note.
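Two of the calculations mentioned above, the Benjamini–Hochberg correction and the modified z-score, can be sketched in a few lines of Python. The text does not spell out the modified z-score formula. The version below uses the common median and median-absolute-deviation form with the constant 0.6745 and takes the rest of the cohort as the reference, both of which are assumptions of this sketch. The function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def modified_z_score(value: float, rest_of_cohort: Sequence[float]) -> float:
    """Modified z-score of one sample's value relative to the other samples.

    Uses 0.6745 * (x - median) / MAD, where the median and the median
    absolute deviation (MAD) are taken over `rest_of_cohort`.
    """
    center = median(rest_of_cohort)
    mad = median(abs(x - center) for x in rest_of_cohort)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return 0.6745 * (value - center) / mad
```

Here `value` would be, for instance, the transcripts per million of a gene in the sample carrying the rearrangement, and `rest_of_cohort` the values of the same gene in all other samples.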
Patient samples and clinical data access
This study comprised the analyses of tumor and blood samples of patients diagnosed with neuroblastoma between 1991 and 2016. Patients were registered and treated according to the trial protocols of the German Society of Pediatric Oncology and Hematology (GPOH). This study was conducted in accordance with the World Medical Association Declaration of Helsinki (2013) and good clinical practice; informed consent was obtained from all patients or their guardians. The collection and use of patient specimens were approved by the institutional review boards of Charité-Universitätsmedizin Berlin and the Medical Faculty, University of Cologne. Specimens and clinical data were archived and made available by Charité-Universitätsmedizin Berlin or the National Neuroblastoma Biobank and Neuroblastoma Trial Registry (University Children’s Hospital Cologne) of the GPOH. The MYCN gene copy number was determined as a routine diagnostic method using FISH. DNA and total RNA were isolated from tumor samples with at least 60% tumor cell content as evaluated by a pathologist. For further methods, see the Supplementary Note.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The WGS and RNA-seq data that support the findings of this study have been deposited with the European Genome-phenome Archive (https://www.ebi.ac.uk/ega/) under accession nos. EGAS00001001308 and EGAS00001004022. The Circle-seq data that support the findings of this study are available from the corresponding author upon request. Source data for Fig. 1 are available online.
Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).
Møller, H. D. et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat. Commun. 9, 1069 (2018).
Shibata, Y. et al. Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science 336, 82–86 (2012).
Pennisi, E. Circular DNA throws biologists for a loop. Science 356, 996 (2017).
Verhaak, R. G. W., Bafna, V. & Mischel, P. S. Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat. Rev. Cancer 19, 283–288 (2019).
Møller, H. D., Parsons, L., Jørgensen, T. S., Botstein, D. & Regenberg, B. Extrachromosomal circular DNA is common in yeast. Proc. Natl Acad. Sci. USA 112, E3114–E3122 (2015).
Tjio, J. H. & Levan, A. The chromosome number of man. Hereditas 42, 1–6 (1956).
Garsed, D. W. et al. The architecture and evolution of cancer neochromosomes. Cancer Cell 26, 653–667 (2014).
Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).
Kohl, N. E. et al. Transposition and amplification of oncogene-related sequences in human neuroblastomas. Cell 35, 359–367 (1983).
deCarvalho, A. C. et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat. Genet. 50, 708–717 (2018).
Nikolaev, S. et al. Extrachromosomal driver mutations in glioblastoma and low-grade glioma. Nat. Commun. 5, 5690 (2014).
Schwab, M. et al. Amplified DNA with limited homology to myc cellular oncogene is shared by human neuroblastoma cell lines and a neuroblastoma tumour. Nature 305, 245–248 (1983).
Balaban-Malenbaum, G. & Gilbert, F. Double minute chromosomes and the homogeneously staining regions in chromosomes of a human neuroblastoma cell line. Science 198, 739–741 (1977).
Cox, D., Yuncken, C. & Spriggs, A. I. Minute chromatin bodies in malignant tumours of childhood. Lancet 1, 55–58 (1965).
Sanborn, J. Z. et al. Double minute chromosomes in glioblastoma multiforme are revealed by precise reconstruction of oncogenic amplicons. Cancer Res. 73, 6036–6045 (2013).
Deshpande, V. et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019).
Avet-Loiseau, H. et al. Morphologic and molecular cytogenetics in neuroblastoma. Cancer 75, 1694–1699 (1995).
Dillon, L. W. et al. Production of extrachromosomal microDNAs is linked to mismatch repair pathways and transcriptional activity. Cell Rep. 11, 1749–1759 (2015).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Bouzas-Rodriguez, J. et al. Neurotrophin-3 production promotes human neuroblastoma cell survival by inhibiting TrkC-induced apoptosis. J. Clin. Invest. 120, 850–858 (2010).
Xu, K. et al. Structure and evolution of double minutes in diagnosis and relapse brain tumors. Acta Neuropathol. 137, 123–137 (2019).
Storlazzi, C. T. et al. Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure. Genome Res. 20, 1198–1206 (2010).
Villamón, E. et al. Genetic instability and intratumoral heterogeneity in neuroblastoma with MYCN amplification plus 11q deletion. PLoS ONE 8, e53740 (2013).
Marrano, P., Irwin, M. S. & Thorner, P. S. Heterogeneity of MYCN amplification in neuroblastoma at diagnosis, treatment, relapse, and metastasis. Genes Chromosomes Cancer 56, 28–41 (2017).
Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).
Northcott, P. A. et al. Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature 511, 428–434 (2014).
Henssen, A. G. et al. PGBD5 promotes site-specific oncogenic mutations in human tumors. Nat. Genet. 49, 1005–1014 (2017).
Henssen, A. G. et al. Genomic DNA transposition induced by human PGBD5. eLife 4, e10565 (2015).
Henssen, A. G. et al. Forward genetic screen of human transposase genomic rearrangements. BMC Genomics 17, 548 (2016).
Guzmán, C., Bagga, M., Kaur, A., Westermarck, J. & Abankwa, D. ColonyArea: an ImageJ plugin to automatically quantify colony formation in clonogenic assays. PLoS ONE 9, e92444 (2014).
Henssen, A. G., MacArthur, I., Koche, R. & Dorado-García, H. Purification and sequencing of large circular DNA from human cells. Protoc Exch (2019); https://doi.org/10.1038/protex.2019.006
Boeva, V. et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423–425 (2012).
Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
Chong, Z. et al. novoBreak: local assembly for breakpoint detection in cancer genomes. Nat. Methods 14, 65–67 (2017).
Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014).
Helmsauer, K. Tree-shaped Rearrangement Patterns in Pediatric Cancer Genomes (2019); https://kons.shinyapps.io/trees
Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
We thank A. Kentsis, S. Armstrong, B. Regenberg, F. Speleman, S. Perner, U. Ohler and N. Hübener for critical discussions, and K. Astrahantseff for editorial advice. We thank D. Sunaga-Franze, C. Quedenau, M. Sohn, K. Richter and C. Langnick for technical support. A.G.H. is supported by the Deutsche Forschungsgemeinschaft (German Research Foundation; grant no. 398299703) and the Wilhelm Sander Stiftung (2018.011.1). A.G.H., A.K. and S.F. are participants in the Berlin Institute of Health-Charité Clinical Scientist Program funded by the Charité-Universitätsmedizin Berlin and the Berlin Institute of Health. A.G.H., S.F., K.H. and V.B. are supported by Berliner Krebsgesellschaft e.V. K.H. is supported by Boehringer Ingelheim Fonds. This work was also supported by the TransTumVar project (project no. PN013600). This project was supported by the Berlin Institute of Health within the collaborative research project TERMINATE-NB (CRG04). We thank the patients and their parents for granting access to the tumor specimens and clinical information that were analyzed in this study. We thank B. Hero, H. Düren and N. Hemstedt of the Neuroblastoma Biobank and Neuroblastoma Trial Registry (University Children’s Hospital Cologne) of the GPOH for providing samples and clinical data.
The authors declare no competing interests.
Cite this article
Koche, R.P., Rodriguez-Fos, E., Helmsauer, K. et al. Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma. Nat Genet 52, 29–34 (2020). https://doi.org/10.1038/s41588-019-0547-z