CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction

Galan, Silvia; Machnik, Nick; Kruse, Kai; Díaz, Noelia; Marti-Renom, Marc A.; Vaquerizas, Juan M.

doi:10.1038/s41588-020-00712-y

Technical Report
Published: 19 October 2020

Nature Genetics volume 52, pages 1247–1255 (2020)

Matters Arising to this article was published on 05 December 2023

Abstract

Dynamic changes in the three-dimensional (3D) organization of chromatin are associated with central biological processes, such as transcription, replication and development. Therefore, the comprehensive identification and quantification of these changes is fundamental to understanding evolutionary and regulatory mechanisms. Here, we present Comparison of Hi-C Experiments using Structural Similarity (CHESS), an algorithm for the comparison of chromatin contact maps and automatic differential feature extraction. We demonstrate the robustness of CHESS to experimental variability and showcase its biological applications on (1) interspecies comparisons of syntenic regions in human and mouse models; (2) intraspecies identification of conformational changes in Zelda-depleted Drosophila embryos; (3) patient-specific aberrant chromatin conformation in a diffuse large B-cell lymphoma sample; and (4) the systematic identification of chromatin contact differences in high-resolution Capture-C data. In summary, CHESS is a computationally efficient method for the comparison and classification of changes in chromatin contact data.

**Fig. 1: CHESS overview and examples.**

**Fig. 2: CHESS evaluation on synthetic Hi-C matrices.**

**Fig. 3: Global comparison of syntenic region similarity between human and mouse using CHESS.**

**Fig. 4: Identification of chromatin conformational changes in fly embryos after Zelda (*zld*) knockdown.**

**Fig. 5: Identification of structural changes in a DLBCL.**

**Fig. 6: Feature extraction from Capture-C data.**

Data availability

The datasets analyzed in this study have been obtained from the Gene Expression Omnibus (Rao et al.¹⁰, GSE63525; Bonev et al.¹², GSE96107; Despang et al.⁵⁰, GSE125294) and ArrayExpress (Hug et al.⁹, E-MTAB-4918; Díaz et al.⁴⁸, E-MTAB-5875).

Code availability

The CHESS source code and the code for generating the synthetic Hi-C matrices and running tests on them are available on GitHub (https://github.com/vaquerizaslab/CHESS). The intervaltree and tqdm packages used internally in CHESS can be found at https://github.com/chaimleib/intervaltree and https://github.com/tqdm/tqdm, respectively. In addition, CHESS internally uses the following published packages: FAN-C⁷¹ (https://github.com/vaquerizaslab/fanc); Cython⁷²; SciPy⁶⁹; scikit-image⁵⁹; NumPy⁷³,⁷⁴; Pandas⁷⁵; Pathos⁷⁶; Pybedtools⁷⁷; Kneed⁷⁸.

References

Bonev, B. & Cavalli, G. Organization and function of the 3D genome. Nat. Rev. Genet. 17, 661–678 (2016).
Vietri Rudan, M. et al. Comparative Hi-C reveals that CTCF underlies evolution of chromosomal domain architecture. Cell Rep. 10, 1297–1309 (2015).
Acemel, R. D., Maeso, I. & Gómez‐Skarmeta, J. L. Topologically associated domains: a successful scaffold for the evolution of gene regulation in animals. Wiley Interdiscip. Rev. Dev. Biol. 6, e265 (2017).
Lazar, N. H. et al. Epigenetic maintenance of topological domains in the highly rearranged gibbon genome. Genome Res. 28, 983–997 (2018).
Eres, I. E., Luo, K., Hsiao, C. J., Blake, L. E. & Gilad, Y. Reorganization of 3D genome structure may contribute to gene regulatory evolution in primates. PLoS Genet. 15, e1008278 (2019).
Yang, Y., Zhang, Y., Ren, B., Dixon, J. R. & Ma, J. Comparing 3D genome organization in multiple species using phylo-HMRF. Cell Syst. 8, 494–505.e14 (2019).
Ke, Y. et al. 3D chromatin structures of mature gametes and structural reprogramming during mammalian embryogenesis. Cell 170, 367–381.e20 (2017).
Du, Z. et al. Allelic reprogramming of 3D chromatin architecture during early mammalian development. Nature 547, 232–235 (2017).
Hug, C. B., Grimaldi, A. G., Kruse, K. & Vaquerizas, J. M. Chromatin architecture emerges during zygotic genome activation independent of transcription. Cell 169, 216–228.e19 (2017).
Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Dixon, J. R. et al. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331–336 (2015).
Bonev, B. et al. Multiscale 3D genome rewiring during mouse neural development. Cell 171, 557–572.e24 (2017).
Nagano, T. et al. Cell-cycle dynamics of chromosomal organization at single-cell resolution. Nature 547, 61–67 (2017).
Gibcus, J. H. et al. A pathway for mitotic chromosome formation. Science 359, eaao6135 (2018).
Spielmann, M., Lupiáñez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018).
Krijger, P. H. L. & de Laat, W. Regulation of disease-associated gene expression in the 3D genome. Nat. Rev. Mol. Cell Biol. 17, 771–782 (2016).
Darrow, E. M. et al. Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture. Proc. Natl Acad. Sci. USA 113, E4504–E4512 (2016).
Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted correlation coefficient. Genome Res. 27, 1939–1949 (2017).
Sauria, M. E. G. & Taylor, J. QuASAR: quality assessment of spatial arrangement reproducibility in Hi-C data. Preprint at bioRxiv https://doi.org/10.1101/204438 (2017).
Ursu, O. et al. GenomeDISCO: a concordance score for chromosome conformation capture experiments using random walks on contact map graphs. Bioinformatics 34, 2701–2707 (2018).
Yan, K.-K., Yardımcı, G. G., Yan, C., Noble, W. S. & Gerstein, M. HiC-spector: a matrix library for spectral and reproducibility analysis of Hi-C contact maps. Bioinformatics 33, 2199–2201 (2017).
Shavit, Y. & Lio’, P. Combining a wavelet change point and the Bayes factor for analysing chromosomal interaction data. Mol. Biosyst. 10, 1576–1585 (2014).
Huynh, L. & Hormozdiari, F. TAD fusion score: discovery and ranking the contribution of deletions to genome structure. Genome Biol. 20, 60 (2019).
Paulsen, J. et al. HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization. Bioinformatics 30, 1620–1622 (2014).
Lareau, C. A. & Aryee, M. J. diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data. Bioinformatics 34, 672–674 (2018).
Djekidel, M. N., Chen, Y. & Zhang, M. Q. FIND: difFerential chromatin INteractions Detection using a spatial Poisson process. Genome Res. 28, 412–422 (2018).
Stansfield, J. C., Cresswell, K. G., Vladimirov, V. I. & Dozmorov, M. G. HiCcompare: an R-package for joint normalization and comparison of HI-C datasets. BMC Bioinformatics 19, 279 (2018).
Lun, A. T. L. & Smyth, G. K. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics 16, 258 (2015).
Cook, K. B., Hristov, B. H., Le Roch, K. G., Vert, J. P. & Noble, W. S. Measuring significant changes in chromatin conformation with ACCOST. Nucleic Acids Res. 48, 2303–2311 (2020).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Wang, Z., Bovik, A. C., Sheikh, H. R. & Simoncelli, E. P. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612 (2004).
Wang, Z. & Bovik, A. C. A universal image quality index. IEEE Signal Process. Lett. 9, 81–84 (2002).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Harmston, N. et al. Topologically associating domains are ancient features that coincide with Metazoan clusters of extreme noncoding conservation. Nat. Commun. 8, 441 (2017).
Lee, J. et al. Synteny Portal: a web-based application portal for synteny block analysis. Nucleic Acids Res. 44, W35–W40 (2016).
Schwarzer, W. et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51–56 (2017).
Nora, E. P. et al. Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell 169, 930–944.e22 (2017).
Haarhuis, J. H. I. et al. The cohesin release factor WAPL restricts chromatin loop extension. Cell 169, 693–707.e14 (2017).
Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. Cell 171, 305–320.e24 (2017).
Wutz, G. et al. Topologically associating domains and chromatin loops depend on cohesin and are regulated by CTCF, WAPL, and PDS5 proteins. EMBO J. 36, 3573–3599 (2017).
Gassler, J. et al. A mechanism of cohesin-dependent loop extrusion organizes zygotic genome architecture. EMBO J. 36, 3600–3618 (2017).
Lupiáñez, D. G. et al. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012–1025 (2015).
Franke, M. et al. Formation of new chromatin domains determines pathogenicity of genomic duplications. Nature 538, 265–269 (2016).
Flavahan, W. A. et al. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110–114 (2016).
Hnisz, D. et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science 351, 1454–1458 (2016).
Díaz, N. et al. Chromatin conformation analysis of primary patient tissue using a low input Hi-C method. Nat. Commun. 9, 4938 (2018).
Hughes, J. R. et al. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat. Genet. 46, 205–212 (2014).
Despang, A. et al. Functional dissection of the Sox9–Kcnj2 locus identifies nonessential and instructive roles of TAD architecture. Nat. Genet. 51, 1263–1271 (2019).
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F. & Chen, L. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nat. Biotechnol. 30, 90–98 (2011).
Lin, D. et al. Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture. Nat. Genet. 50, 754–763 (2018).
Beagrie, R. A. et al. Complex multi-enhancer contacts captured by genome architecture mapping. Nature 543, 519–524 (2017).
Cardozo Gizzi, A. M. et al. Microscopy-based chromosome conformation capture enables simultaneous visualization of genome organization and transcription in intact organisms. Mol. Cell 74, 212–222.e5 (2019).
Sampat, M. P., Wang, Z., Gupta, S., Bovik, A. C. & Markey, M. K. Complex wavelet structural similarity: a new image similarity index. IEEE Trans. Image Process. 18, 2385–2401 (2009).
Homola, T., Dohnal, V. & Zezula, P. Searching for sub-images using sequence alignment. In Proc. 2011 IEEE International Symposium on Multimedia 61–68 (IEEE, 2011).
Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
Knight, P. A. & Ruiz, D. A fast algorithm for matrix balancing. IMA J. Numer. Anal. 33, 1029–1047 (2013).
van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014).
Behara, K. N. S., Bhaskar, A. & Chung, E. Geographical window based structural similarity index for OD matrices comparison. J. Intell. Transp. Syst., https://doi.org/10.1080/15472450.2020.1795651 (2020).
Djukic, T., Hoogendoorn, S. & Van Lint, H. Reliability assessment of dynamic OD estimation methods based on structural similarity index. In Proc. Transportation Research Board 92nd Annual Meeting (Transportation Research Board, 2013).
Breakey, D. & Meskell, C. Comparison of metrics for the evaluation of similarity in acoustic pressure signals. J. Sound Vib. 332, 3605–3609 (2013).
Hines, A. & Harte, N. Speech intelligibility prediction using a Neurogram Similarity Index Measure. Speech Commun. 54, 306–320 (2012).
Tomasi, C. & Manduchi, R. Bilateral filtering for gray and color images. In Proc. Sixth International Conference on Computer Vision 839–846 (IEEE, 1998).
Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. A Syst. Hum. 9, 62–66 (1979).
Sexton, T. et al. Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148, 458–472 (2012).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Cournac, A., Marie-Nelly, H., Marbouty, M., Koszul, R. & Mozziconacci, J. Normalization of a chromosomal contact map. BMC Genomics 13, 436 (2012).
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Blythe, S. A. & Wieschaus, E. F. Zygotic genome activation triggers the DNA replication checkpoint at the midblastula transition. Cell 160, 1169–1181 (2015).
Kruse, K., Hug, C. B. & Vaquerizas, J. M. FAN-C: a feature-rich framework for the analysis and visualisation of C data. Preprint at bioRxiv https://doi.org/10.1101/2020.02.03.932517 (2020).
Behnel, S. et al. Cython: the best of both worlds. Comput. Sci. Eng. 13, 31–39 (2011).
Oliphant, T. E. A Guide to NumPy (Trelgol Publishing, 2006).
van der Walt, S., Colbert, S. C. & Varoquaux, G. The NumPy array: a structure for efficient numerical computation. Comput. Sci. Eng. 13, 22–30 (2011).
McKinney, W. Data structures for statistical computing in Python. In Proc. Python in Science Conference 56–61 (SciPy.org, 2010).
McKerns, M. M., Strand, L., Sullivan, T., Fang, A. & Aivazis, M. A. G. Building a framework for predictive science. Preprint at https://arxiv.org/abs/1202.1056 (2012).
Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).
Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a ‘Kneedle’ in a haystack: detecting knee points in system behavior. In Proc. 31st International Conference on Distributed Computing Systems Workshops 166–171 (IEEE Computer Society, 2011).

Acknowledgements

Work in the Vaquerizas laboratory is funded by the Max Planck Society, the Deutsche Forschungsgemeinschaft (DFG) Priority Programme SPP 2202 ‘Spatial Genome Architecture in Development and Disease’ (project no. 422857230 to J.M.V.), the DFG Clinical Research Unit CRU326 ‘Male Germ Cells: from Genes to Function’ (project no. 329621271 to J.M.V.), the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement (no. 643062, ZENCODE-ITN, to J.M.V.) and the Medical Research Council in the UK. This research was partially funded by the European Union’s H2020 Framework Programme through the European Research Council (grant no. 609989 to M.A.M.-R.). We acknowledge the support of the Spanish Ministerio de Ciencia, Innovación y Universidades through grant no. BFU2017-85926-P to M.A.M.-R. The Centre for Genomic Regulation acknowledges the support of the Ministerio de Ciencia, Innovación y Universidades to the European Molecular Biology Laboratory partnership, the ‘Centro de Excelencia Severo Ochoa 2013–2017’, agreement no. SEV-2012-0208, the CERCA Programme/Generalitat de Catalunya, the Spanish Ministerio de Ciencia, Innovación y Universidades through the Instituto de Salud Carlos III, the Generalitat de Catalunya through the Departament de Salut and Departament d’Empresa i Coneixement and cofinancing by the Spanish Ministerio de Ciencia, Innovación y Universidades with funds from the European Regional Development Fund corresponding to the 2014–2020 Smart Growth Operating Program. S.G. acknowledges support from the Company of Biologists (grant no. JCSTF181158) and the European Molecular Biology Organization Short-Term Fellowship programme.

Author information

These authors contributed equally: Silvia Galan, Nick Machnik.

Authors and Affiliations

Max Planck Institute for Molecular Biomedicine, Münster, Germany
Silvia Galan, Nick Machnik, Kai Kruse, Noelia Díaz & Juan M. Vaquerizas
National Centre for Genomic Analysis, Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
Silvia Galan & Marc A. Marti-Renom
Institute of Science and Technology Austria, Klosterneuburg, Austria
Nick Machnik
Centre for Genomic Regulation, Barcelona Institute of Science and Technology, Barcelona, Spain
Marc A. Marti-Renom
Pompeu Fabra University, Barcelona, Spain
Marc A. Marti-Renom
Catalan Institution for Research and Advanced Studies, Barcelona, Spain
Marc A. Marti-Renom
Medical Research Council London Institute of Medical Sciences, Institute of Clinical Sciences, Faculty of Medicine, Imperial College London, London, UK
Juan M. Vaquerizas

Contributions

N.M. and J.M.V. conceptualized the study. S.G., N.M. and K.K. devised the methodology. N.M. and J.M.V. carried out the investigation. S.G., K.K. and N.D. obtained the resources. S.G., N.M., K.K., M.A.M.-R. and J.M.V. prepared and wrote the original draft of the manuscript. S.G., N.M., K.K., N.D., M.A.M.-R. and J.M.V. wrote, reviewed and edited the draft. J.M.V. supervised the study. M.A.M.-R. and J.M.V. acquired the funding.

Corresponding author

Correspondence to Juan M. Vaquerizas.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance analysis of the CHESS algorithm.

a, CHESS P values as a function of the relative noise level in synthetic matrices. Shown are the cases of equal amounts of noise in reference R and query Q (top) and different amounts of noise (bottom, noise added only to Q). Each case is examined for normalised and observed/expected (obs/exp) matrices, and for different window sizes in the SSIM algorithm. b, Empirically determined CHESS P values as a function of the size factor between R and Q for normalised (left) and obs/exp matrices (right) (details in Methods). a, b, Solid lines indicate the mean, shaded areas the standard deviation over 100 simulations per parameter combination.

Extended Data Fig. 2 Technical details of the SSIM algorithm applied to Hi-C matrices.

a, Schematic overview of the structural similarity algorithm (SSIM). SSIM scores are calculated on all submatrices of R and Q at a given window size (WS). The final SSIM score is the mean of all SSIM submatrix scores. b, SSIM submatrix formula. Different components are coloured: luminance (green), structure × contrast (red). x, y refer to submatrices (at the same positions) of the two full matrices for which the SSIM average is computed (see panel a). μ indicates the mean, σ the standard deviation, c1 and c2 are small constants that are introduced only for numerical reasons. c, d, SSIM comparisons of a matrix to itself (red dots) and 1,000 random matrices of the same size (blue dots). c, SSIM component values as a function of SSIM score for different SSIM window sizes. d, Scatterplots of ranked SSIM scores at window size 100 versus ranked scores at smaller window sizes.
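
The windowed scoring described in panels a and b can be illustrated with the standard SSIM formula of Wang et al. (2004). What follows is a minimal, dependency-free sketch and not the CHESS implementation, which according to the Code availability section relies on scikit-image. The function names `window_ssim` and `mean_ssim`, the default values of `c1` and `c2` (the usual (0.01·L)² and (0.03·L)² for a data range L = 1), and the uniform window moved one bin at a time are all assumptions made for illustration.

```python
from statistics import fmean


def window_ssim(x, y, c1=1e-4, c2=9e-4):
    """SSIM of two equally sized submatrices, each given as a flat list.

    c1 and c2 are small stabilising constants (assumed defaults).
    """
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    # Luminance term: compares the window means.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    # Combined structure x contrast term: compares (co)variances.
    structure_contrast = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * structure_contrast


def mean_ssim(r, q, ws):
    """Mean SSIM over all ws x ws submatrices of two same-shaped matrices.

    r and q are lists of rows; the window is shifted by one bin per step
    (an assumption of this sketch).
    """
    n_rows, n_cols = len(r), len(r[0])
    scores = []
    for i in range(n_rows - ws + 1):
        for j in range(n_cols - ws + 1):
            cells = [(a, b) for a in range(i, i + ws) for b in range(j, j + ws)]
            x = [r[a][b] for a, b in cells]
            y = [q[a][b] for a, b in cells]
            scores.append(window_ssim(x, y))
    return fmean(scores)
```

Under these assumptions, comparing a matrix to itself yields a score of 1, and a matrix with an inverted contact pattern yields a negative score because the covariance term changes sign.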

Extended Data Fig. 3 Additional analysis of the CHESS algorithm.

a, Uniform distribution of empirically determined CHESS P values for comparisons of matrices with 100 % noise added. b, Distribution of structural similarity scores (SSIM) for background and truth comparisons at 25 k/Mb and 1.5 M/Mb simulated sequencing depth. Above each: Fractional change (value at x % noise/value at 0 % noise) of the standard deviation (std) of background scores and mean of truth scores over 100 simulations per parameter combination.
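
The captions refer to empirically determined P values and z-scores derived from a background of similarity scores, but the exact procedure is given only in the Methods, which are not reproduced here. The sketch below shows one common way to compute such quantities and should be read as an assumption rather than as the CHESS procedure: the function name `empirical_p_and_z`, the one-sided direction of the test (higher similarity counted as more extreme), the +1 pseudocount and the use of the population standard deviation are all illustrative choices.

```python
from statistics import fmean, pstdev


def empirical_p_and_z(observed, background):
    """Empirical P value and z-score of one score against background scores.

    Assumptions of this sketch: a one-sided test in which background scores
    at least as high as the observed score count as "as extreme", and a +1
    pseudocount so that the P value is never exactly zero.
    """
    n_at_least = sum(1 for score in background if score >= observed)
    p_value = (n_at_least + 1) / (len(background) + 1)
    # z-score: distance from the background mean in background std units.
    z_score = (observed - fmean(background)) / pstdev(background)
    return p_value, z_score
```

With this definition, scores drawn from the same distribution as the background produce approximately uniformly distributed P values, which is the behaviour panel a reports for comparisons with 100% noise added.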

Extended Data Fig. 4 CHESS is robust to changes in noise due to random ligations and sequencing depth in real Hi-C data.

We tested to what extent CHESS is able to identify two matrices as being identical after noise and sequencing depth were adjusted independently in them. Matrices are based on chromosome 19 data from Bonev et al.¹². a, Examples of 5 Mb matrices used in this analysis, with 5, 80 and 95% added noise (random ligations between pairs of loci). b, Empirically determined P values and z-scores of CHESS runs with different window sizes, noise levels and simulated sequencing depths (details in Methods). Step size and matrix resolution were both 25 kb. Lines for 2 × 10⁵ and 1 × 10⁶ overlap for runs with window sizes > 1 Mb. c, As in panel b, but comparing CHESS runs with 2.5 Mb window size on matrices binned at 25 kb and 10 kb. b, c, Solid lines indicate the mean, shaded areas the standard deviation over 1,976, 2,066, 2,156, 2,246 and 2,300 matrix pairs for window sizes 10 Mb, 7.5 Mb, 5 Mb, 2.5 Mb and 1 Mb, respectively.

Extended Data Fig. 5 Reproducibility of CHESS using different window (WS) and step sizes (SS), sequencing depths and resolutions.

This analysis tested different WS (250 kb to 3 Mb), SS (25 kb to 1 Mb), sequencing depths (20 to 80% of reads) and resolutions (10 kb and 25 kb) (details in Methods). X-axis labels: varied parameters in parentheses, fixed parameters before. The first two boxplots with red dots represent the Jaccard indices (JI) between CHESS results in Bonev et al.¹² using different WS, SS and sequencing depths. The boxplots with blue dots correspond to the Díaz et al.⁴⁸ dataset; in this case using different WS and SS, and then different WS, SS and resolutions. mESC, mouse embryonic stem cells; NPC, neural progenitor cells. Boxplot elements: centre line, median; whiskers, 1.5× interquartile range; box limits, upper and lower quartiles.
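
The Jaccard index used here to compare result sets is the size of the intersection divided by the size of the union. A minimal sketch follows; the function name `jaccard_index`, the representation of regions as `(chromosome, start, end)` tuples matched by exact identity, and the convention of returning 1.0 for two empty sets are assumptions for illustration, since how regions from different runs are matched is described only in the Methods.

```python
def jaccard_index(regions_a, regions_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of regions.

    Regions are compared by exact equality (an assumption of this sketch).
    """
    a, b = set(regions_a), set(regions_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty result sets are identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical region calls from two parameter settings.
run_1 = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
run_2 = [("chr1", 100, 200), ("chr2", 0, 100), ("chr3", 0, 100)]
overlap = jaccard_index(run_1, run_2)  # 2 shared regions out of 4 distinct
```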

Extended Data Fig. 6 CHESS benchmark against HOMER, diffHiC and ACCOST.

a, Upset plot representing the intersection size between differential interactions of CHESS, HOMER, diffHiC and ACCOST. Below, an example is shown for each intersected group. b, Computational requirements of CHESS, HOMER, diffHiC and ACCOST. The first line plot shows the CPU usage, the second the memory consumption. The vertical dashed line represents the end of the run.

Extended Data Fig. 7 CHESS performance on differently sized simulated matrices with realistic noise and sequencing depth.

Shown are empirically determined CHESS P values and z-scores (details in Methods) for comparisons of R with a read depth of 100 read pairs / 100 bins and a resized copy Q. Scaling factor is indicated on the x-axis. A noise level of 25% was added to both matrices independently. Sequencing depth was adjusted to 100 k/Mb. Solid lines indicate the mean, shaded areas the standard deviation over 100 simulations per parameter combination. Colours correspond to the different sizes of R.

Extended Data Fig. 8 Feature extraction from Capture-C data.

Examples of differential feature extraction with CHESS between the wt (top contact map) and different mutants (middle contact map) in the Despang et al.⁵⁰ dataset. Lost and gained structures in the mutants are highlighted in blue and red squares, respectively. Log₂ fold-change maps are depicted below (bottom contact map) with identified features coloured according to the directionality of the change. Below each comparison, the genomic annotation is represented, highlighting the modification of each mutant. The vertical lines define the CTCF binding motifs, dashed when deleted. Red hexagons demarcate TAD boundaries. Feature extraction between wt and a, ∆Bor, in which the border was deleted. b, ∆BorC1, in which the border and the first CTCF binding motif were deleted. c, ∆BorC1-2, in which the border and the two first CTCF binding motifs were deleted. d, ∆BorC1-4, in which the border and four CTCF binding motifs were deleted. e, ∆CTCF, in which the border and all the CTCF binding motifs were removed. f, Bor-KnockIn, in which the border was moved to a new location within the Sox9 locus. g, InvC∆Bor, in which the Sox9 sequence was inverted and the border was removed.
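
The log₂ fold-change maps mentioned above can be illustrated with a short sketch. It is an assumption-laden example, not the procedure used to produce the figure: the function name `log2_fold_change`, the pseudocount that avoids division by zero, and the mutant-over-wt orientation (positive values for contacts gained in the mutant) are all illustrative choices.

```python
from math import log2


def log2_fold_change(reference, query, pseudocount=1.0):
    """Element-wise log2((query + pc) / (reference + pc)) of two matrices.

    reference and query are same-shaped lists of rows (for example wt and
    mutant contact maps). The pseudocount is an assumption of this sketch.
    """
    return [
        [log2((q + pseudocount) / (r + pseudocount)) for r, q in zip(r_row, q_row)]
        for r_row, q_row in zip(reference, query)
    ]


# Toy 2 x 2 maps: diagonal contacts gained, off-diagonal contacts lost.
wt = [[1, 3], [3, 1]]
mutant = [[3, 1], [1, 3]]
fold_change = log2_fold_change(wt, mutant)
```

In this toy case the diagonal entries are positive (gained in the mutant) and the off-diagonal entries are negative (lost in the mutant).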

Supplementary information

Supplementary Information

Supplementary Figs. 1–5

Reporting Summary

Supplementary Table

Supplementary Table 1

About this article

Cite this article

Galan, S., Machnik, N., Kruse, K. et al. CHESS enables quantitative comparison of chromatin contact data and automatic feature extraction. Nat Genet 52, 1247–1255 (2020). https://doi.org/10.1038/s41588-020-00712-y

Received: 23 July 2018
Accepted: 04 September 2020
Published: 19 October 2020
Issue Date: November 2020
DOI: https://doi.org/10.1038/s41588-020-00712-y
