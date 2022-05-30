

Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing

Nature Biotechnology (2022)

Abstract

High-order three-dimensional (3D) interactions between more than two genomic loci are common in human chromatin, but their role in gene regulation is unclear. Previous high-order 3D chromatin assays either measure distant interactions across the genome or proximal interactions at selected targets. To address this gap, we developed Pore-C, which combines chromatin conformation capture with nanopore sequencing of concatemers to profile proximal high-order chromatin contacts at the genome scale. We also developed the statistical method Chromunity to identify sets of genomic loci with frequencies of high-order contacts significantly higher than background (‘synergies’). Applying these methods to human cell lines, we found that synergies were enriched in enhancers and promoters in active chromatin and in highly transcribed and lineage-defining genes. In prostate cancer cells, these included binding sites of androgen-driven transcription factors and the promoters of androgen-regulated genes. Concatemers of high-order contacts in highly expressed genes were demethylated relative to pairwise contacts at the same loci. Synergies in breast cancer cells were associated with tyfonas, a class of complex DNA amplicons. These results rigorously link genome-wide high-order 3D interactions to lineage-defining transcriptional programs and establish Pore-C and Chromunity as scalable approaches to assess high-order genome structure.

Fig. 1: Pore-C concatemers yield high-fidelity maps of proximal 3D contacts.
Fig. 2: Synergy algorithm uncovers cooperativity in high-order enhancer and promoter interactions.
Fig. 3: De novo discovery of reference-distal and interchromosomal cooperativity with Chromunity.
Fig. 4: High-order contacts in highly expressed genes are preferentially demethylated.
Fig. 5: Androgen stimulation induces enhancer and promoter cooperativity at androgen-dependent loci in prostate cells.
Fig. 6: Synergy associated with a complex cancer amplicon.

Data availability

Sequence and pairwise and high-order contact data that support the findings of this study have been deposited in GEO with the accession code GSE149117.

Code availability

The source code of the packages used in these analyses, along with their dependencies, can be accessed at the following links: https://github.com/nanoporetech/pore-c, https://github.com/mskilab/chromunity, https://github.com/mskilab/gGnome and https://github.com/mskilab/GxG. Python (3.7.*) and R (3.6.0) were used for developing these tools.

Acknowledgements

We thank J. Skok for helpful comments on the manuscript. M.I. is supported by a Burroughs Wellcome Fund Career Award for Medical Scientists, a Doris Duke Clinical Foundation Clinical Scientist Development Award and The Pershing Square Sohn Prize for Young Investigators in Cancer Research.

  1. Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY, USA

    Aditya S. Deshpande, Netha Ulahannan, Julie M. Behr, Huasong Tian, Emily Adney, Juan Miguel Mosquera & Marcin Imieliński

  2. New York Genome Center, New York, NY, USA

    Aditya S. Deshpande, Netha Ulahannan, Julie M. Behr, Will Liao, Huasong Tian, Hannah G. Otis, Emily Adney & Marcin Imieliński

  3. Tri-Institutional PhD Program in Computational Biology and Medicine, New York, NY, USA

    Aditya S. Deshpande & Julie M. Behr

  4. Oxford Nanopore Technologies, New York, NY, USA

    Matthew Pendleton, Xiaoguang Dai, Carly Tyer, Priyesh Rughani, Daniel J. Turner, Sissel Juul & Eoghan Harrington

  5. Oxford Nanopore Technologies, San Francisco, CA, USA

    Lynn Ly, Daniel J. Turner & Sissel Juul

  6. Oxford Nanopore Technologies, Oxford, UK

    Stefan Schwenk, David Stoddart, Daniel J. Turner & Sissel Juul

  7. Department of Urology, Weill Cornell Medicine, New York, NY, USA

    Michael A. Augello & Christopher E. Barbieri

  8. Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA

    Michael A. Augello, Christopher E. Barbieri & Marcin Imieliński

  9. Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA

    Sarah Kudman, David Wilkes, Juan Miguel Mosquera, Christopher E. Barbieri, Ari Melnick & Marcin Imieliński

  10. Weill Cornell/Rockefeller/Sloan Kettering Tri-Institutional MD–PhD Program, New York, NY, USA

    Hannah G. Otis

  11. Division of Hematology/Oncology, Weill Cornell Medicine, New York, NY, USA

    Ari Melnick

  12. Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA

    Marcin Imieliński

Extended data

Extended Data Fig. 1 Pore-C assay development.

(a) An overview of Pore-C alignment. A directed acyclic graph (DAG) is built by partially ordering reference alignments on each concatemer query sequence. The final concatemer alignment comprises a greedy traversal of the DAG (see Methods for more details). (b) Comparison of cumulative fragment length distributions (left) and densities (right) associated with three different restriction enzyme based protocols (NlaIII, DpnII and HindIII). (c) Comparison of virtual pairwise contacts generated per Gb sequenced for Pore-C for the three restriction enzyme based protocols (NlaIII: n = 13, DpnII: n = 6, and HindIII: n = 2). P-values obtained using two-sided Wilcoxon rank sum test. For all boxplots, the centerline corresponds to the mean, box limits to the interquartile range (IQR) and the whiskers to the last datum within 1.5 * IQR. (d) Stratum adjusted correlation coefficients (SCCs) comparing single MinION runs (M1 and M2), single PromethION runs (P1 and P2) and the combination of P1 and P2 with the full GM12878 Hi-C data set (n = 23 chromosomes for each run). The bars represent the mean SCC score across all chromosomes for each run with standard error of the mean. (e) Comparison of 500 kb compartment scores (CS) between Pore-C (1.89 billion virtual pairwise contacts) and Hi-C (~4 billion pairwise contacts) for GM12878. (f) Comparison of 50 kb topologically associating domain (TAD) insulation scores (IS) between Pore-C and Hi-C. (g) Jaccard similarity between TADtree TAD boundaries in gold standard Hi-C data and one of the four data sets on the X axis. (h) Aggregate peak analysis (APA) comparing Pore-C virtual pairwise and Hi-C pairwise contact density within 100 kb of Hi-C loop anchors. Each 10 kb by 10 kb pixel represents the total number of contacts detected across the entire loop set in a standard coordinate system centered around each loop anchor. (i) Number of structural loops called by Peakachu (ref. 19) in two Hi-C replicates, SPRITE and Pore-C. (j) Precision and recall of structural loops called in Pore-C and Hi-C replicates relative to calls from gold standard Hi-C data. (k) APA peaks for structural loops called in Pore-C separated by structural loop calling scores, showing stronger signal in higher confidence structural loops. (l) Stratum adjusted correlation coefficients (SCCs) between the full Hi-C dataset for GM12878 and GM12878 Pore-C or GM12878 SPRITE, after separating high-order (3-way or higher) and pairwise contacts (n = 23 chromosomes for each run). For all boxplots, the centerline corresponds to the mean, box limits to the interquartile range (IQR) and the whiskers to the last datum within 1.5 * IQR.
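
The "virtual pairwise contacts" counted in panels (c) and (e) can be illustrated with a minimal sketch. It assumes that a concatemer with n aligned fragments is decomposed into all n(n-1)/2 unordered fragment pairs. The function name and the tuple layout for fragments are hypothetical and are not taken from the pore-c package.

```python
from itertools import combinations


def virtual_pairwise_contacts(fragments):
    """Return all unordered pairs of aligned fragments from one concatemer.

    `fragments` is a list of (chrom, start, end) tuples. This layout is an
    illustrative assumption, not the pore-c data model.
    """
    return list(combinations(fragments, 2))


# A hypothetical 4-way concatemer touching two chromosomes.
concatemer = [
    ("chr1", 1000, 1400),
    ("chr1", 52000, 52350),
    ("chr1", 910000, 910500),
    ("chr2", 7000, 7300),
]
pairs = virtual_pairwise_contacts(concatemer)
print(len(pairs))  # 4 fragments give 4 * 3 / 2 = 6 virtual pairwise contacts
```

Under this assumption a single high-order read contributes several pairwise contacts, which is one reason per-Gb yields of virtual pairwise contacts depend on how many fragments each concatemer carries.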

Extended Data Fig. 2 Pore-C concatemers guide de novo human assembly correction and scaffolding.

(a) The Shasta assembler was used to generate a draft assembly using a PromethION flow cell of nanopore WGS of the HG002 GIAB sample; the resulting contigs are plotted in length order (topmost track). Next, virtual pairwise contacts from a PromethION flow cell of Pore-C of HG002 and WGS-derived HG002 contigs were provided to the 3D-DNA tool (second track from the top) to generate scaffolds. A second round of scaffolding was carried out using SALSA2 followed by the ‘Purge Haplotigs’ tool to remove regional duplications caused by un-collapsed heterozygosity (third track from the top). The resulting assembly shows high congruity with the reference genome (bottom track). (b) Illustration of the scaffolding process for chromosomes 3, 4 and 5. The top contact map shows the contact density derived from Pore-C reads mapped against length ordered contigs from the draft assembly. The bottom contact map is derived from the same Pore-C reads mapped against the final scaffolds. The center track shows how the contigs have been ordered, re-oriented and merged into the final scaffolds, which are at or near chromosome length. (c) Performance of different assembly approaches with respect to contig/scaffold size. The combination of all three methods gives the best result with a scaffold NG50 of 125 Mb. The human reference genome with scaffold gaps removed is shown for comparison. (d) Top 5 scaffold alignments for each chromosome are plotted as a proportion of non-gap bases in each chromosome, showing that most chromosomes are assembled into a single large scaffold.
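
The scaffold NG50 quoted in panel (c) follows the standard definition, sketched below: sort sequences from longest to shortest and report the length at which the running total first reaches half of the genome size. The function name, the toy lengths and the genome size are illustrative assumptions and do not come from the paper's analysis code.

```python
def ng50(lengths, genome_size):
    """Length of the sequence at which the running total, taken from the
    longest sequence to the shortest, first reaches half of `genome_size`.

    Returns 0 if the sequences together cover less than half the genome.
    """
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


# Toy scaffold lengths (arbitrary units) against an assumed genome size of 400.
scaffolds = [80, 70, 50, 40, 30, 20]
print(ng50(scaffolds, 400))  # 80 + 70 + 50 = 200 reaches half of 400, so 50
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed genome size, so assemblies of different total length can be compared on the same footing.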

Extended Data Fig. 3 Chromunity and Synergy.

(a) Example showing the relationship between concatemer communities, contact frequency, and the resulting bin-set in the Chromunity analysis. The contact frequency is computed as the number of concatemers in the community within 1 kb of the given reference location. The bin-set is defined as the union of reference locations at or above the 85th percentile of contact frequency. (b) Volcano plots of Synergy model results for collections of random bin-sets chosen to match enhancer and promoter bin-sets in covariate space. (c) Contact map of an exemplar window with synergistic bin-sets before (top) and after (bottom) concatemer shuffling (see Methods). (d) Example distribution of concatemer order before and after shuffling in the window shown in the previous panel. Shuffled concatemers retain the original order distribution. (e) HiCRep SCC of shuffled vs. original contact maps showing close correspondence of pairwise interaction frequencies after concatemer shuffling. The bar represents the mean SCC score for chromosome 1 (n = 70 bin-sets) with standard error of the mean. Each data point represents the SCC score for a single bin-set. (f) Volcano plots of Synergy results from sliding window Chromunity analysis of GM12878. Additional volcano plots represent identical analyses using (1) shuffled concatemers for identical bin-sets and (2) random bin-sets chosen to match candidate synergies in covariate space. (g) Fraction of synergies relative to total candidate bin-sets nominated by the concatemer community detection step of Chromunity (n = 3120 bin-sets). Shuffled (n = 2890) and random (n = 1852) analyses defined as in panel (f). Fisher’s exact test was used to determine enrichment of synergy between various groups. Two-sided P values were obtained for each comparison. Error bars on bar plots represent 95% confidence intervals on the Bernoulli trial parameter. (h) Volcano plots of Synergy model results for de novo detected high-order interactions for inter-chromosomal (red) and intra-chromosomal (blue) regulatory element (RE) Chromunity-derived bin-sets. Shuffled and random bin-sets defined as in panel (f). (i) Similar to panel (g), fraction of candidate bin-sets from sliding window Chromunity analysis of GM12878 showing synergy across inter- and intra-chromosomal interactions (n = 661) relative to shuffled concatemers (n = 661) and random bin-sets (n = 425). Fisher’s exact test was used to determine enrichment of synergy between various groups. Two-sided P values were obtained for each comparison. Error bars on bar plots represent 95% confidence intervals on the Bernoulli trial parameter. (j) Circos plots showing all inter- and intra-chromosomal synergies from Chromunity analysis of E-P targets in GM12878. The blue highlighted regions are approximate genomic locations shown to be interacting by SPRITE.
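
The bin-set rule in panel (a) can be sketched as follows: given a contact frequency per reference bin, keep the bins at or above the 85th percentile. This is a minimal illustration, not the Chromunity implementation. The function names, the (chrom, start) bin keys, the toy counts and the nearest-rank percentile convention are all assumptions made for the example.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer `pct` between 1 and 100.

    The percentile convention is an assumption; the caption does not say
    which one Chromunity uses.
    """
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def bin_set(contact_frequency, pct=85):
    """Return the bins whose contact frequency is at or above the cutoff.

    `contact_frequency` maps a hypothetical (chrom, start) bin key to the
    number of community concatemers near that bin.
    """
    cutoff = percentile_nearest_rank(contact_frequency.values(), pct)
    return sorted(b for b, freq in contact_frequency.items() if freq >= cutoff)


# Toy community: ten 1 kb bins on chr1 with made-up concatemer counts.
counts = [2, 3, 1, 9, 12, 4, 2, 11, 3, 1]
frequency = {("chr1", i * 1000): c for i, c in enumerate(counts)}
print(bin_set(frequency))  # [('chr1', 4000), ('chr1', 7000)]
```

In this toy example the cutoff is 11, so only the two most frequently contacted bins enter the bin-set that would then be passed to the Synergy test.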

Extended Data Fig. 4 Methylation signal in Pore-C.

(a) Correlation of methylation data obtained from Pore-C with whole genome bisulfite sequencing (WGBS, r = 0.92) and (b) Nanopore whole genome sequencing (r = 0.94). (c) Comparison of methylation signal from 2 Pore-C replicates with 2 ENCODE WGBS replicates with Pearson correlation coefficients shown in the upper right triangle. Very high correlation was observed between two replicates of Pore-C, exceeding that shown for the two ENCODE WGBS replicates. (d) Haplotype-specific CpG island methylation across genes present on the Xa and Xi chromosomes. Genes that escape X chromosome inactivation (XCI, n = 29) show no difference in methylation of promoter-proximal CpG islands (CGI) (left). Genes that undergo XCI (n = 208) show significantly elevated DNA methylation on Xi compared to Xa. For all boxplots, the centerline corresponds to the mean, box limits to the interquartile range (IQR) and the whiskers to the last datum within 1.5 * IQR. (e) Haplotype-specific methylation fraction at CpG islands relative to the transcription start site of X chromosome genes at three categories of loci on Xa and Xi. A dip in DNA methylation is seen around the promoter in the pseudo-autosomal region (PAR), which escapes inactivation. The dip is present on Xa and in both haplotypes in the PAR, but not in other Xi regions. Random regions of the genome with similar widths as the promoters are also shown for both Xa and Xi. (f) Haplotype-specific contact maps on chromosome X in GM12878. The inactive X (Xi) haplotype (top heat map) lacks the A/B compartmentalization present in the active X (Xa) haplotype (lowest heat map) but instead demonstrates two megadomains. The border of these two megadomains, called the ‘hinge region,’ shows a mega-loop of ~16 Mb between DXZ4 and FIRRE present in Xi but absent in Xa.

Extended Data Fig. 5 LNCaP Chromunity results and controls.

(a) Volcano plot of Synergy results for bin-sets nominated by sliding-window Chromunity for LNCaP cells with (DHT+, bottom) and without (DHT-, top) androgen stimulation. Additional analyses represent negative controls: ‘Shuffled’ analysis shows Synergy results for the same candidate bin-sets using shuffled concatemers (see Methods). ‘Random’ analysis shows Synergy results on original concatemers using random bin-sets chosen to match candidate bin-sets in covariate space (see Methods). (b) Same as (a) but using regulatory element (RE) Chromunity.

Supplementary information

Supplementary Information

Supplementary Fig. 1 and Tables 1–5.

Reporting Summary

