Oncogenes are commonly amplified on particles of extrachromosomal DNA (ecDNA) in cancer (refs. 1,2), but our understanding of the structure of ecDNA and its effect on gene regulation is limited. Here, by integrating ultrastructural imaging, long-range optical mapping and computational analysis of whole-genome sequencing, we demonstrate the structure of circular ecDNA. Pan-cancer analyses reveal that oncogenes encoded on ecDNA are among the most highly expressed genes in the transcriptome of the tumours, linking increased copy number with high transcription levels. Quantitative assessment of the chromatin state reveals that although ecDNA is packaged into chromatin with intact domain structure, it lacks higher-order compaction that is typical of chromosomes and displays significantly enhanced chromatin accessibility. Furthermore, ecDNA is shown to have a significantly greater number of ultra-long-range interactions with active chromatin, which provides insight into how the structure of circular ecDNA affects oncogene function, and connects ecDNA biology with modern cancer genomics and epigenetics.
WGS, RNA-seq, ATAC-seq, MNase-seq, ChIP–seq and PLAC-seq data are deposited in the NCBI Sequence Read Archive, under BioProject accession PRJNA506071. Source Data for Figs. 2, 3 and Extended Data Figs. 1–6, 10 are provided with the paper. Source data of the pixel quantification of ATAC-see on metaphase chromosome spread images in Extended Data Fig. 7d are available on Figshare (https://doi.org/10.6084/m9.figshare.9826115.v1).
The following are available for use online: AmpliconArchitect (https://github.com/virajbdeshpande/AmpliconArchitect), AmpliconReconstructor (https://github.com/jluebeck/AmpliconReconstructor), and CycleViz (https://github.com/jluebeck/CycleViz).
Turner, K. M. et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125 (2017).
Verhaak, R. G. W., Bafna, V. & Mischel, P. S. Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat. Rev. Cancer 19, 283–288 (2019).
Gibcus, J. H. & Dekker, J. The hierarchy of the 3D genome. Mol. Cell 49, 773–782 (2013).
Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin domains: the unit of chromosome organization. Mol. Cell 62, 668–680 (2016).
Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers. Science 362, eaav1898 (2018).
Hnisz, D. et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458 (2016).
Møller, H. D. et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat. Commun. 9, 1069 (2018).
Shibata, Y. et al. Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science 336, 82–86 (2012).
Deshpande, V. et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 10, 392 (2019).
Mendelowitz, L. & Pop, M. Computational methods for optical mapping. Gigascience 3, 33 (2014).
Mak, A. C. et al. Genome-wide structural variation detection by genome mapping on nanochannel arrays. Genetics 202, 351–362 (2016).
Demmerle, J. et al. Strategic and practical guidelines for successful structured illumination microscopy. Nat. Protocols 12, 988–1010 (2017).
Schimke, R. T. Gene amplification in cultured animal cells. Cell 37, 705–713 (1984).
Storlazzi, C. T. et al. Gene amplification as double minutes or homogeneously staining regions in solid tumors: origin and structure. Genome Res. 20, 1198–1206 (2010).
L’Abbate, A. et al. MYC-containing amplicons in acute myeloid leukemia: genomic structures, evolution, and transcriptional consequences. Leukemia 32, 2152–2166 (2018).
Baylin, S. B. & Jones, P. A. Epigenetic determinants of cancer. Cold Spring Harb. Perspect. Biol. 8, a019505 (2016).
Lee, D. Y., Hayes, J. J., Pruss, D. & Wolffe, A. P. A positive role for histone acetylation in transcription factor access to nucleosomal DNA. Cell 72, 73–84 (1993).
Luger, K., Mäder, A. W., Richmond, R. K., Sargent, D. F. & Richmond, T. J. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251–260 (1997).
Smith, G. et al. c-Myc-induced extrachromosomal elements carry active chromatin. Neoplasia 5, 110–120 (2003).
Chen, X. et al. ATAC-see reveals the accessible genome by transposase-mediated imaging and sequencing. Nat. Methods 13, 1013–1020 (2016).
Solovei, I. et al. Topology of double minutes (dmins) and homogeneously staining regions (HSRs) in nuclei of human neuroblastoma cell lines. Genes Chromosom. Cancer 29, 297–308 (2000).
Fang, R. et al. Mapping of long-range chromatin interactions by proximity ligation-assisted ChIP-seq. Cell Res. 26, 1345–1348 (2016).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385.e18 (2018).
deCarvalho, A. C. et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat. Genet. 50, 708–717 (2018).
Lederberg, J. Cell genetics and hereditary symbiosis. Physiol. Rev. 32, 403–430 (1952).
Nathanson, D. A. et al. Targeted therapy resistance mediated by dynamic regulation of extrachromosomal mutant EGFR DNA. Science 343, 72–76 (2014).
McGranahan, N. & Swanton, C. Clonal heterogeneity and tumor evolution: past, present, and the future. Cell 168, 613–628 (2017).
Xu, K. et al. Structure and evolution of double minutes in diagnosis and relapse brain tumors. Acta Neuropathol. 137, 123–137 (2019).
Cao, H. et al. Rapid detection of structural variation in a human genome using nanochannel-based genome mapping technology. Gigascience 3, 34 (2014).
Corces, M. R. et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Juric, I. et al. MAPS: model-based analysis of long-range chromatin interactions from PLAC-seq and HiChIP experiments. PLOS Comput. Biol. 15, e1006982 (2019).
Raviram, R. et al. 4C-ker: a method to reproducibly identify genome-wide interactions captured by 4C-seq experiments. PLOS Comput. Biol. 12, e1004780 (2016).
Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).
We thank members of the Mischel laboratory, M. Farquhar for the use of the UCSD/CMM electron microscopy facility, T. Merloo and Y. Jones for electron microscopy sample preparation, UCSD Neuroscience Microscopy Shared Facility (NS047101) for providing imaging support, and the Ecker laboratory at the Salk Institute for Biological Studies for use of the Irys instrument for BioNano optical mapping. This work was supported by the Ludwig Institute for Cancer Research (P.S.M., B.R., F.B.F.), Defeat GBM Program of the National Brain Tumor Society (P.S.M., F.B.F.), NVIDIA Foundation, Compute for the Cure (P.S.M.), The Ben and Catherine Ivy Foundation (P.S.M.), and Ruth L. Kirschstein National Research Service Award NIH/NCI T32 CA009523 (R.R.). This work was also supported by the following National Institutes of Health (NIH) grants: NS73831 (P.S.M.), R35CA209919 (H.Y.C.), RM1-HG007735 (H.Y.C.), GM114362 (V.B.), NS80939 (F.B.F.), and NSF grants: NSF-IIS-1318386 and NSF-DBI-1458557 (V.B.). The TEM facility is supported in part by NIH award number S10OD023527. Work in the Law laboratory was supported by a Salk Innovation Grant and by the Rita Allen Foundation Scholars Program. H.Y.C. is an Investigator of the Howard Hughes Medical Institute.
P.S.M., H.Y.C. and R.G.W.V. are co-founders of Boundless Bio, Inc. and serve as consultants. V.B. is a co-founder, and has equity interest in Boundless Bio, Inc. and Digital Proteomics, LLC, and receives income from DP. The terms of this arrangement have been reviewed and approved by the University of California, San Diego, in accordance with its conflict of interest policies. Boundless Bio, Inc. and Digital Proteomics, LLC were not involved in the research presented here. K.M.T. and N.N. became employees of Boundless Bio, Inc. after the paper was accepted for publication.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature thanks Tony Papenfuss, Lothar Schermelleh and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, ecDNA number per metaphase in GBM39, COLO320DM and PC3 cell lines. Box plots are as in Fig. 2g. At least 20 metaphase spreads from 3 biologically independent samples were counted. b, Left, depiction of amplification status classified by AmpliconArchitect. Right, representative AmpliconArchitect of the EGFR circular amplicon in GBM39 cells. Arrows represent the orientation of the assembled contig. c, Circular amplicon in COLO320DM cells and double FISH of MYC and PCAT1 validating the amplicon structure. Scale bar, 5 μm. d, Circular amplicon in PC3 cells and double FISH validating the structure and co-existence of DENND3 and MYC in the same ecDNA. Scale bar, 5 μm. e, A detailed AmpliconArchitect-reconstructed schema showing the junctions and hg19 coordinates of ecDNA in GBM39 cells, and the number of paired-end discordant reads to support the reconstruction. f, PCR cloning (left) and Sanger sequencing validation (right) of the ecDNA circular junction in GBM39 cells using the primers in e. Exact sequence and BLAT result are shown on the right. The highlighted 4-bp nucleotides were overlaps of the two DNA segments. An ecDNA-free GBM cell line U87 was used as a negative control. M, 100-bp DNA ladder. Data are representative of three independent experiments. See Supplementary Fig. 1 for source data. g, Representative linear amplicon breakpoint graph in GBM39 cells (left), with FISH validation of its chromosomal loci (right). Scale bars, 10 μm (left) and 5 μm (right). h, Size and copy number of 41 reconstructed circular structures in 37 cancer cell lines. All imaging experiments were repeated at least three times, with similar results. Source data
a, Pipeline to integrate WGS and BioNano optical mapping. CMAPS denotes a contig mapping and analysis package. b, Intensity profile plot of the double FISH of EGFR and SEPT14 in GBM39 cells. c, FISH validating MYC-containing ecDNA in COLO320DM cells visualized by 3D-SIM. Scale bars, 5 μm (top) and 1 μm (bottom). d, Three-dimensional reconstruction showing the circular structure of two individual ecDNA structures from 3D-SIM (arrows). The height in the contour map indicates the signal intensity of DAPI. Scale bar, 1 μm. e, TEM of GBM39 ecDNA. Scale bars, 200 nm. All imaging experiments were repeated at least three times, with similar results. Source data
a, Transcriptome in the U87 GBM cell line, which lacks ecDNA. Green data points represent the same genes that are found on ecDNA in the GBM39 cell line. b, ecDNA gene expression levels within the transcriptome of COLO320DM and PC3 cells, and selected TCGA samples. Red dots represent genes located on ecDNA (circular amplification genes). c, ecDNA gene expression (red data points) in GBM39 cells, COLO320DM cells, PC3 cells, one TCGA-LGG sample (TCGA-DU-7010-01A-11) and one TCGA-SARC sample (TCGA-DX-A23R-01A-11), compared to non-circular genes in the TCGA-GBM (n = 36 biologically independent samples), TCGA-COAD (n = 52 biologically independent samples), TCGA-PRAD (n = 120 biologically independent samples), TCGA-LGG (n = 96 biologically independent samples) and TCGA-SARC (n = 36 biologically independent samples) cohorts, respectively. d, Z-score of the gene expression values in b. Z-scores were plotted as +1 to avoid negative values during log10 transformation. For TCGA samples in b and c, genes on circular amplicons are highlighted as red data points. e–g, Expression of circular amplified and non-circular genes in the TCGA-GBM, TCGA-LGG and TCGA-SARC cohorts. h, Normalized gene expression by copy number in the TCGA-SARC cohort (CDK4, P = 0.028; METTL1, P = 0.007; METTL21B, P = 0.024; two-sided Wilcoxon rank-sum test). Asterisks indicate key oncogenes. Violin plots show the overall distribution of data points. Box plots are as in Fig. 2g. Every gene in each amplicon type was analysed from at least five biologically independent samples in e–h. Source data
a, Immunofluorescence staining of active histone marks H3K4me1 and H3K27ac in metaphase GBM39, COLO320DM and PC3 cells. Scale bars, 5 μm. b, H3K4me1 and H3K27ac ChIP–seq in cycling GBM39 cells. Magnified area demonstrates the ecDNA region. c, Immunofluorescence staining of active histone marks H3K4me3 and H3K18ac in metaphase GBM39 cells. Scale bars, 5 μm. d, Immunofluorescence staining of inactive histone marks H3K9me3 and H3K27me3 in metaphase GBM39 cells. Yellow arrows indicate positive foci, blue arrows indicate ecDNA without foci. e, Quantification of H3K9me3 and H3K27me3 foci per ecDNA in GBM39 cells in metaphase. All imaging experiments were repeated at least three times, with similar results. Source data
a, Workflow to characterize the chromatin accessibility of ecDNA. b, Global and long (>1 kb) ATAC-seq read length distribution comparing ecDNA and chrDNA in COLO320DM (88 ecDNA and 987 chrDNA long fragments) and PC3 (39 ecDNA and 108 chrDNA long fragments) cells (n = 2 biologically independent samples, showing one of the representative results). P values determined by two-sided Kolmogorov–Smirnov test. c, Distribution of global and long (>1 kb) MNase-seq fragment lengths in GBM39 cells (2,699 ecDNA and 18,942 chrDNA long fragments; n = 2 biologically independent samples, showing one of the representative results). P value determined by two-sided Kolmogorov–Smirnov test. d, ATAC-seq peak number per 10 kb comparing random genome regions (313,762 windows in COLO320DM and PC3 cells), linear amplification (470 windows in COLO320DM, 15,186 windows in PC3 cells), and circular amplification regions (44 windows in COLO320DM, 510 windows in PC3 cells; n = 2 biologically independent samples). P values determined by Kruskal–Wallis rank-sum test. e, ATAC-seq and WGS tracks of TCGA samples comparing circular and linear amplified regions, before (left) and after (right) normalization to copy number. f, Representative FISH from three replicates showing amplicon location in GBM39, GBM39HSR, COLO320DM and COLO320HSR metaphase cells. Scale bars, 10 μm. g, ATAC-seq and WGS tracks of the amplified region in GBM39, GBM39HSR, COLO320DM and COLO320HSR cells. CN, copy number. h, Normalized ATAC-seq read counts (10-kb bin) by copy number comparing ecDNA and HSR regions (GBM39/HSR amplicon, 134 windows; COLO320DM/HSR amplicon, 157 windows; non-amplicon, 1,000 windows). P values determined by two-sided Dunn’s test. Violin plots show the overall distribution of data points. Box plots are as in Fig. 2g. i, Distribution of global and long (>1 kb) ATAC-seq read lengths comparing HSR and non-HSR chrDNA in GBM39HSR (15 ecDNA and 640 chrDNA long fragments) and COLO320HSR (102 ecDNA and 4,554 chrDNA long fragments) cells (n = 2 biologically independent samples, showing one of the representative results). P value determined by two-sided Kolmogorov–Smirnov test. j, Number of single nucleotide polymorphism (SNP) supported reads from the major allele (containing ecDNA) and minor allele in GBM39 cells from multiple sequencing technologies. Circular amplified region (ecDNA) is marked in red. Source data
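The fragment-length comparisons in this caption rely on the two-sample Kolmogorov–Smirnov test. As a rough illustration only (not the authors' analysis code), the sketch below computes the KS statistic, the largest gap between two empirical cumulative distributions, in plain Python. The function name `ks_statistic` and the toy fragment lengths are hypothetical, and no P value is computed here; in practice a library routine such as `scipy.stats.ks_2samp` would typically be used for that.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Return the two-sample KS statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical cumulative distribution functions.
    Both are step functions that only change at observed values, so it
    is enough to evaluate the gap at every observed value.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in a + b:
        # Fraction of each sample that is <= value.
        f_a = bisect_right(a, value) / len(a)
        f_b = bisect_right(b, value) / len(b)
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical long-fragment lengths in bp (illustrative values only).
ecdna_lengths = [1050, 1100, 1200, 1350]
chrdna_lengths = [1010, 1020, 1060, 1120]
print(ks_statistic(ecdna_lengths, chrdna_lengths))
```

A value of 0 means the two empirical distributions coincide, and a value of 1 means the samples do not overlap at all.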
a, Workflow to evaluate the accessibility of ecDNA in interphase cells. b, Representative images of FISH, ATAC-see and MitoTracker Deep Red FM signal colocalization in COLO320DM cells. c, Pearson correlation of FISH signal pixel intensity and ATAC-see signal pixel intensity in four representative single cells. At least 27,000 pixels were analysed for each cell. Source data
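Panel c reports a Pearson correlation between per-pixel FISH and ATAC-see intensities. A minimal sketch of that calculation follows, assuming the two channels have already been flattened into equal-length lists of intensities for the same pixels. The function name `pearson_r` and the toy values are hypothetical and do not come from the paper.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical intensities for the same five pixels in each channel.
fish_pixels = [10, 20, 30, 40, 50]
atac_see_pixels = [12, 24, 33, 41, 55]
print(pearson_r(fish_pixels, atac_see_pixels))
```

Values near 1 indicate that pixels bright in the FISH channel are also bright in the ATAC-see channel.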
a, The strategy of applying ATAC-see to DNA in cells in metaphase. b, Image analysis pipeline, showing ecDNA and chrDNA segmentation of the DAPI channel. The pixel intensity of ATAC-see channel was measured. c, ATAC-seq tracks and corresponding representative images of FISH and ATAC-see. Scale bars, 5 μm. d, Quantification of ATAC-see pixel intensity of ecDNA versus chrDNA from at least four independent metaphase spreads. Violin plots show the overall distribution of data points. The dashed line across the plot indicates the global mean value. The solid black lines inside each split violin plot indicate the mean of each dataset. P values determined by two-sided Z-test.
a–d, Composite circular plots displaying WGS, RNA-seq and ATAC-seq of ecDNA. For COLO320DM and PC3 cells with multiple versions of reconstructed structures, only one representative structure is shown. For TCGA samples (c, TCGA-A7-A0D9, breast invasive carcinoma; d, TCGA-L7-A6VZ, oesophageal carcinoma), the ATAC-seq data point represents the highest signal within a 1-kb window.
a, Examples of selected potential amplicons reconstructed from AmpliconArchitect in GBM39, COLO320DM and PC3 cells. For each potential amplicon, the average copy number of the segments is listed. The starting segment of the structure is outlined in green. From the starting segment, the structure can be traced by following the arrows to find the next genomic segment of the structure. Some structures have a circular path (that is, can return to the starting segment by following the arrows), which represents potential ecDNA structure.
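The tracing rule in this caption (start at one segment, follow the arrows, and check whether the path returns to the start) can be sketched as below. This is an illustration under simplifying assumptions, not AmpliconArchitect's algorithm: the segment labels and the `next_segment` mapping are hypothetical, and each segment is assumed to have at most one outgoing arrow.

```python
def trace_structure(next_segment, start):
    """Follow arrows from `start` and report whether the path is circular.

    next_segment maps a segment label to the label its arrow points to.
    Returns (path, is_circular), where path lists the segments visited in
    order and is_circular is True only if the walk returns to `start`.
    """
    path = [start]
    seen = {start}
    current = start
    while current in next_segment:
        current = next_segment[current]
        if current == start:
            # Returned to the starting segment: a circular path.
            return path, True
        if current in seen:
            # Entered a loop that does not include the start; stop here.
            return path, False
        seen.add(current)
        path.append(current)
    # Reached a segment with no outgoing arrow: a linear path.
    return path, False

# Hypothetical structures (segment labels are illustrative only).
circular_amplicon = {"seg1": "seg2", "seg2": "seg3", "seg3": "seg1"}
linear_amplicon = {"seg1": "seg2", "seg2": "seg3"}
print(trace_structure(circular_amplicon, "seg1"))
print(trace_structure(linear_amplicon, "seg1"))
```

Real breakpoint graphs can branch and carry segment orientation and copy number, which this single-successor sketch deliberately leaves out.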
a, Chromatin interaction heat maps comparing GBM39 with U87 cells, generated from PLAC-seq/HiChIP analyses using H3K27ac as the anchor. The GBM39 ecDNA region was downsampled to a comparable level of U87 to normalize for copy number. Contrast heat map shows the differential interaction. Green arrows indicate the increased corner reads in the GBM39 ecDNA junctional region but not in the U87 chrDNA locus, demonstrating ecDNA circularity. b, c, Virtual 4C read counts from viewpoints 1 (ecDNA junction) and 2 (EGFR promoter), respectively. d, Actual 4C-seq read counts, and the read count ratio of GBM39 to U87 from viewpoint 2. e, f, Models depicting local and distal interactions with the EGFR promoter and proposed model for CRISPR interference masking of the EGFR promoter. g, h, qPCR analysis of gene expression in regions proximal and distal to EGFR. Data are mean ± s.e.m.; n = 3; each data point represents three technical replicates from one representative result. criEGFR, CRISPR interference of EGFR; criNC, CRISPR interference negative control. **P < 0.01; ***P < 0.001; ****P < 0.0001, one-way ANOVA. N.S., not significant. i, Exogenous expression of EGFR variant III in U87 cells (U87-EGFRvIII) and the activation of EGFR signalling was confirmed by western blot. Experiment was repeated three times, with similar results. See Supplementary Fig. 1 for source data. j, qPCR analysis of EGFR-neighbouring gene expression in U87 cells, with and without ectopic overexpression of EGFRvIII. Data are mean ± s.e.m.; n = 3; each data point represents three technical replicates from one representative result. GBAS, *P = 0.038; EGFR, **P = 0.003; Welch’s t-test. Source data
Supplementary Figure 1: Raw images of agarose gel and western blot. Related to Extended Data Figs. 1f and 10i.
Supplementary Table 1: AmpliconArchitect classification of amplified segments in cancer cell lines. Related to Fig. 1. Lists the amplicon size, location, copy count, circular or linear classification, and genes present in PC3, COLO320DM, and GBM39 cell lines.
Supplementary Table 2: RNA-seq of circular amplified genes in selected TCGA cohorts. Related to Fig. 2. Lists the TCGA sample ID, Ensembl ID, gene, FPKM and its rank, and oncogene classification.
Supplementary Table 3: CRISPRi gRNAs. Related to Extended Data Fig. 10. Lists the sequences of the CRISPR gRNAs used to mask the EGFR promoter.
Supplementary Table 4: List of primers. Related to Extended Data Fig. 1, Fig. 4, and Extended Data Fig. 10. Lists the primer sequences used for GBM39 ecDNA junction cloning, 4C-seq, and qPCR.
Cite this article
Wu, S., Turner, K.M., Nguyen, N. et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature 575, 699–703 (2019) doi:10.1038/s41586-019-1763-5