Promoter-anchored chromatin interactions predicted from genetic analysis of epigenomic data.

Promoter-anchored chromatin interactions (PAIs) play a pivotal role in transcriptional regulation. Current high-throughput technologies for detecting PAIs, such as promoter capture Hi-C, are not scalable to large cohorts. Here, we present an analytical approach that uses summary-level data from cohort-based DNA methylation (DNAm) quantitative trait locus (mQTL) studies to predict PAIs. Using mQTL data from human peripheral blood ([Formula: see text]), we predict 34,797 PAIs which show strong overlap with the chromatin contacts identified by previous experimental assays. The promoter-interacting DNAm sites are enriched in enhancers or near expression QTLs. Genes whose promoters are involved in PAIs are more actively expressed, and gene pairs with promoter-promoter interactions are enriched for co-expression. Integration of the predicted PAIs with GWAS data highlights interactions among 601 DNAm sites associated with 15 complex traits. This study demonstrates the use of mQTL data to predict PAIs and provides insights into the role of PAIs in complex trait variation.

We thank the three reviewers for their constructive comments, which have helped us improve the manuscript substantially. We have responded to all the reviewers' comments point-by-point below (in blue) and have highlighted all the relevant changes (in yellow) in the revised manuscript.

Reviewer #1:
Summary: The authors propose a novel approach to predict promoter-anchored chromatin interactions using DNA methylation QTL summary data. Their approach relies on previously published methods called SMR and HEIDI that implement Mendelian randomization to remove confounding factors and account for linkage disequilibrium (LD) to fine map SNPs belonging to the same LD blocks. The article is well written and is easy to read. The contribution of this article is significant, since the approach is novel and bridges two different fields: 3D chromatin and statistical genetics. It would help to target promoter-anchored chromatin interactions that are involved in GWAS, therefore allowing a better interpretation of the SNP effect on disease.
Re: We thank the reviewer for the positive remarks.
Major revision: 1. The authors must compare their prediction method with state-of-the-art methods for predicting long-range interactions, when possible. For instance, they can compute correlations between different DNA methylation probes (without accounting for genotype information) and show that their Mendelian randomization improves the results. A similar approach would be to compare with correlations between other kinds of chromatin data (histone mark ChIP-seq, protein binding ChIP-seq, DNase-seq, ...), or expression data (CAGE-seq from Fantom project or other expression seq data that map gene expression as well as enhancer expression with strand-specific data) from cell lines. Moreover, the authors should compare their approach with predictive models that predict promoter-enhancer interactions using epigenomic data such as in Bing He et al. (Global view of enhancer-promoter interactome in human cells, PNAS May 27, 2014 111 (21) E2191-E2199), or any other modeling approach.
Re: We have compared our SMR & HEIDI method with two state-of-the-art methods in the revised manuscript (lines 156-184). We first compared our method with a correlation-based method (i.e., a method that uses correlations of epigenomic marks to predict interactions; Ernst et al., 2011, Nature) using two different types of epigenomic data, i.e., DNA methylation (DNAm) and chromatin accessibility measured by Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq). To evaluate the performance of the methods, we used a recently released chromatin interaction data set (PCHi-C loops) generated by Jung et al. (2019, Nature Genetics) in GM12878 cell lines for validation, and quantified the enrichment of the predicted interactions in PCHi-C loops defined based on a range of PCHi-C P value thresholds. We chose the PCHi-C data from Jung et al. because the P values of all the tested loops are available and because, compared to other Hi-C data sets, chromatin interactions identified in GM12878 cell lines may be more relevant to the predicted PAIs in whole blood. We computed the fold enrichment of our predicted PAIs in the PCHi-C loops from a 2 × 2 contingency table and used Fisher's exact test to assess the statistical significance of the enrichment. The results showed that our predicted PAIs using either DNAm or chromatin accessibility data were highly enriched in the PCHi-C loops and that the fold enrichment increased as the significance threshold used to define the PCHi-C loops became more stringent (Fig. 3c), consistent with the observation from previous work that Hi-C loops with lower P values are more reproducible between biological replicates (Jin et al., 2013, Nature). Our SMR & HEIDI method outperformed the correlation-based method using either DNAm or chromatin accessibility data, as evidenced by the larger fold enrichment of our method compared to the correlation-based method at all the PCHi-C significance levels (Fig. 3c).
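The 2 × 2 enrichment calculation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the layout of the table cells, and the definition of fold enrichment as a ratio of in-loop proportions are assumptions, and the counts in the usage lines are invented purely for illustration.

```python
from math import comb

def fold_enrichment(a, b, c, d):
    """Ratio of in-loop proportions (assumed definition).

    a: predicted pairs that overlap a loop
    b: predicted pairs that do not overlap a loop
    c: non-predicted tested pairs that overlap a loop
    d: non-predicted tested pairs that do not overlap a loop
    """
    return (a / (a + b)) / (c / (c + d))

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact P value: P(X >= a) with all margins fixed."""
    n = a + b + c + d
    n_predicted = a + b      # row total
    n_in_loop = a + c        # column total
    total = comb(n, n_predicted)
    upper = min(n_predicted, n_in_loop)
    tail = sum(
        comb(n_in_loop, k) * comb(n - n_in_loop, n_predicted - k)
        for k in range(a, upper + 1)
    )
    return tail / total

# Hypothetical toy counts, for illustration only.
toy_fold = fold_enrichment(3, 1, 1, 3)
toy_p = fisher_exact_greater(3, 1, 1, 3)
print(toy_fold, toy_p)
```

Repeating the calculation with the loop set redefined at each PCHi-C P value threshold would give one fold-enrichment value per threshold, which is the kind of curve summarised in Fig. 3c.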
As pointed out by the reviewer, there are other predictive models such as the method developed by He et al. (2014, PNAS) that uses multiple genomic features to predict specific chromatin interactions (i.e., enhancer-promoter interactions). Considering that He et al.'s method and our method require very different types of data and that our predicted PAIs are not restricted to enhancer-promoter interactions, we did not compare the two methods here. Instead, we compared our SMR & HEIDI method with the pairwise hierarchical model (PHM) that also uses genetic data of regulatory elements (Kumasaka et al., 2019, Nature Genetics). We applied our SMR & HEIDI method to the summary-level chromatin accessibility QTL (caQTL) data from Kumasaka et al. Of the 15,487 causal interactions identified by the PHM approach, 10,416 were tested in our SMR & HEIDI analysis; 98.4% were replicated at a nominal significance level (PSMR < 0.05 and PHEIDI > 0.01), and 36% were significant after multiple testing correction (PSMR < 4.8 × 10^-6 (0.05/10,416) and PHEIDI > 0.01). While the PHM method requires individual-level genotype and chromatin accessibility data and is less computationally efficient due to the use of a Bayesian hierarchical model, our SMR & HEIDI method, which requires only summary-level data, is more flexible and can potentially be applied to all epigenetic QTL data. We have added these results in the revised manuscript (Fig. 3c and lines 156-184).
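The two replication criteria quoted above can be written out as a short check. This is only a sketch under stated assumptions: the function and argument names are hypothetical, and the only values taken from the text are the 0.05 significance level, the 10,416 tested interactions, and the HEIDI cut-off of 0.01.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def passes_smr_heidi(p_smr, p_heidi, smr_threshold, heidi_cutoff=0.01):
    """Keep an interaction if the SMR test is significant and HEIDI does not reject it.

    A small HEIDI P value indicates heterogeneity (linkage), so retained
    interactions need p_heidi above the cut-off.
    """
    return p_smr < smr_threshold and p_heidi > heidi_cutoff

corrected = bonferroni_threshold(0.05, 10_416)  # about 4.8e-6, as quoted in the text
print(corrected)

# Hypothetical P values, for illustration only.
print(passes_smr_heidi(1e-7, 0.2, corrected))   # significant after correction
print(passes_smr_heidi(1e-3, 0.2, 0.05))        # nominal replication only
```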
2. Predictions based on DNA methylation seem to be quite sparse when compared to Hi-C data (Figure 2d) as discussed by the authors in the Discussion. The authors should explain whether the sparsity of these predictions is due to the low density of probes along the genome, or to any other problem related to their method. They should illustrate this problem further in the Results section. Along the same lines, the authors should explain whether their predictions are biased toward certain regions of the genome due to the probe density or technical artefacts, and illustrate this with results.
Re: In Figure 2d, we compared the predicted PAIs selected at a very stringent significance level (i.e., the experiment-wise significance level) to the chromatin interactions with correlation scores > 0.4 from Grubert et al. (2015, Cell), which does not provide a fair comparison of sparsity between PAIs and Hi-C loops in this region. We have clarified this in the revised manuscript (Fig. 2).
Nevertheless, we acknowledge that the predicted PAIs are relatively sparse because of the sparsity of the DNAm array used, the underlying hypothesis of the SMR method, and the stringent statistical significance level used to claim significant PAIs (lines 409-412 and Supplementary Note 1). More specifically, first, although the Illumina 450K methylation array has genome-wide coverage, the probes cover only a limited proportion of the regulatory elements. Second, SMR requires the exposure probe to have at least one mQTL at PmQTL < 5e-8, and we limited the exposure probes to promoter regions, resulting in only a small proportion of exposure DNAm probes (m=28,732, ~6.5%) being included in the SMR & HEIDI analysis. Third, to control for false positives, we applied an experiment-wise SMR significance threshold (i.e., PSMR < 1.76e-9) to correct for multiple testing and a stringent HEIDI threshold (i.e., rejecting associations with PHEIDI < 0.01) to exclude SMR associations due to linkage. However, despite the relatively sparse distribution of the predicted PAIs across the genome, the number of predicted PAIs (m=34,797) is comparable to the number of loops identified by experimental assays such as Hi-C and PCHi-C. For example, there are only ~10,000 Hi-C loops identified from Rao et al. (2014, Cell) and ~80,000 PCHi-C loops identified from Jung et al. (2019, Nature Genetics).
In addition to the sparsity, the Illumina 450K DNAm probes are preferentially distributed towards certain genome regions (e.g., promoter; Fig 4a). Such an uneven distribution, however, would not bias the functional enrichment results of our predicted PAIs (e.g., those shown in Figs. 4 and 5) because the enrichments were tested against DNAm pairs randomly sampled from all the pairs tested in the SMR & HEIDI analysis rather than random genomic positions. We have commented on this in the revised manuscript (lines 200-204).
3. The authors should explain and illustrate with figures if their predictions are more related to general features of Hi-C data (for instance TADs or compartments) or more specific features such as loops identified from Hi-C data or ChIA-PET.
Re: Our predicted PAIs are more related to the general features of Hi-C data such as topologically associating domains (TADs) than to the specific features, as suggested by the three observations below. First, ~80% of the predicted PAIs were located in the TADs identified from Dixon et al. (2012, Nature), whereas only 130 PAIs overlapped with the ~10,000 Hi-C loops identified from Rao et al. (2014, Cell). Second, the fold enrichment of the predicted PAIs in TADs (1.89-fold) was larger than that in specific Hi-C loops (1.49-fold) using the same Hi-C data from Rao et al. (Fig. 2a). Third, we performed an additional enrichment analysis of the predicted PAIs in the POLR2A ChIA-PET loops from ENCODE and observed a significant but smaller enrichment of the predicted PAIs in ChIA-PET loops (1.44-fold, one-sided empirical P < 0.001, Fig. 3b) than that in TADs.
There are several reasons why the overlaps between the PAIs and Hi-C loops were limited. First, Hi-C loops were detected with errors. We observed that the concordances between different Hi-C data sets were very limited (Fig. S11), consistent with the conclusion from Forcato et al. (2017, Nature Methods) that the reproducibility of Hi-C loops is low at all resolutions. Second, most (65%) of our predicted PAIs are interactions between DNAm sites within 50 Kb (Fig. S2b), which are often not well captured by the 3C-based methods due to their low resolution (Kumasaka et al., 2019, Nature Genetics). Third, Hi-C loops are cell type specific (Javierre et al., 2016, Cell) so that differences between the Hi-C loops identified in cell lines and our PAIs identified in whole blood are expected. We have discussed this issue in the revised manuscript (lines 388-402).
4. The authors must add more explanations on SMR and HEIDI in the article, since I had to read carefully their previous article (Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets, Nature Genetics, 2016) to understand how the novel method works.
Re: We have added more explanations on the SMR & HEIDI method in the Methods section (lines 439-443 and 453-462).
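For readers without the 2016 SMR paper to hand, the core SMR test can be sketched as below. This is an illustrative approximation based on that earlier publication, not text from the revised Methods: the function and variable names are hypothetical, the HEIDI heterogeneity test is not shown, and the z-scores in the usage lines are invented.

```python
from math import erfc, sqrt

def smr_test(z_exposure, z_outcome):
    """Approximate SMR test from summary statistics of one instrument SNP.

    z_exposure: z-score of the SNP effect on the exposure
                (here, the promoter DNAm probe, via its top cis-mQTL).
    z_outcome:  z-score of the same SNP's effect on the outcome DNAm probe.
    Returns the test statistic and its P value from a chi-squared
    distribution with 1 degree of freedom.
    """
    zx2 = z_exposure ** 2
    zy2 = z_outcome ** 2
    t_smr = (zx2 * zy2) / (zx2 + zy2)
    # Upper tail of chi-squared(1 df): P(X > t) = erfc(sqrt(t / 2)).
    p_smr = erfc(sqrt(t_smr / 2.0))
    return t_smr, p_smr

# Hypothetical z-scores, for illustration only.
t_stat, p_value = smr_test(6.0, 6.0)
print(t_stat, p_value)
```

Because the statistic is bounded by the smaller of the two squared z-scores, the test can only be significant when the instrument is strongly associated with the exposure, which is one reason a genome-wide significant mQTL is required for the exposure probe.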
5. The authors should add some ROC and PR curves to evaluate the accuracy of PAI predictions. That would help also to compare with other methods (see comment 1).
Re: We agree with the reviewer that ROC curves are useful for method comparison. In our case, however, the quantification of the specificity and sensitivity is hindered by the lack of ground-truth positive and negative controls for chromatin interactions. There are several reasons why no ideal Hi-C data set could be used to conduct the ROC analysis. First, the Hi-C loops were detected with errors. The concordances between different Hi-C data sets were very limited (Fig. S11). Second, most of the Hi-C maps were generated at > 5 Kb resolution, while our predicted PAIs were interactions between two DNAm sites at single base-pair resolution. Therefore, it is very likely that multiple PAIs with different significance levels could be mapped to the same Hi-C loop. In this case, it is difficult to evaluate the specificity and sensitivity of the prediction. Third, Hi-C loops are cell-type specific so that there are expected differences between the Hi-C loops identified in cell lines and our PAIs identified in whole blood.
Instead, we have compared the interaction prediction methods by testing the enrichment of the predicted interactions in chromatin loops defined based on a range of Hi-C P value thresholds. As mentioned above (Remark 1), we used a recently published PCHi-C data set in GM12878 cell lines from Jung et al. (2019, Nature Genetics) because the P values of all the tested loops are available and because compared to other Hi-C data sets, chromatin interactions identified in GM12878 cell lines may be more relevant to the predicted PAIs in whole blood. The result showed that our predicted PAIs were highly enriched in the Jung et al. PCHi-C data and that the fold enrichment increased with the increase of the significance level used to claim the PCHi-C loops (Fig. 3c). Moreover, our SMR & HEIDI method outperformed the correlation-based method using either DNAm or chromatin accessibility data and is more computationally efficient and flexible in comparison to another genetic data based prediction method (i.e., the PHM approach).

Reviewer #2
(Remarks to the Author): Re: We limited the PAI analysis to DNAm pairs within a 2Mb window for the following reasons (see lines 99-102 in the revised manuscript). First, we knew from the Hi-C data that chromatin interactions between genomic sites separated by more than 2 Mb are very rare (Jin et al., 2013, Nature). Second, summary data from epigenetic QTL studies are often only available for genetic variants in cis-regions. Third, the use of a 2Mb window reduces the computational and multiple testing burdens. In fact, our results showed that only ~0.7% of the predicted PAIs were between DNAm sites greater than 1 Mb apart (lines 122-123).
3. In Page 5 Line 126, ChIA-PET could be an alternative method for mapping the chromatin 3D interaction. It would be better to show the overlap between ChIA-PET data (such as Pol2) and the PAIs of this study.
Re: We thank the reviewer for this suggestion. We have performed the analysis to test whether our predicted PAIs were also enriched in chromatin interactions identified by ChIA-PET (lines 150-151). We used the POLR2A ChIA-PET data from the ENCODE project. There were ~2,300 PAIs overlapping with the ChIA-PET loops, and the number of overlaps was significantly higher than that of the same number of DNAm pairs randomly sampled from all the tested DNAm pairs with distances matched (1.44-fold, one-sided empirical P-value < 0.001, Fig. 3b).
4. In Page 5 Line 126, the percentage of PAIs located in TADs of Hi-C data was not so high even though the statistical significance was shown. How to explain those PAIs that were not located in TADs? Are they false positives?
Re: First, we would like to clarify that the detected TAD regions are not perfect because they are predicted by computational approaches with errors/uncertainty. For example, Dali et al. (2017, Nucleic Acids Res) concluded that the predicted TADs varied greatly among prediction tools and datasets in number, size, and other biological properties. Second, since we have applied an experiment-wise significance level to correct for multiple testing in the PAI analysis, the false positive rate is expected to be very low (a probability of 0.05 to observe one or more false positives in the whole study). This is also supported by the result that ~80% of the PAIs were between DNAm sites within TADs (lines 134-137). For the PAIs that were between DNAm sites not located in any TADs, we have shown specific examples that these predicted PAIs are likely to be functionally interacted (Fig. 2d and Fig. S3), suggesting that they are not false positives but likely to be interactions yet to be identified by experimental assays.
5. In the section "Enrichment of the predicted PAIs in functional annotations" (Page 6), the analyses were confusing. For those significantly-enriched regions (e.g., repressed Polycomb regions and high DNase sensitivity sites) that the authors claimed, the fold-enrichment value seems small (most were less than 2.0). In addition, it did not make sense that the PIDSs were underrepresented surrounding transcription start sites but significantly-enriched in the bivalent promoters.
Re: The fold-enrichment was computed as the proportion of promoter-interacting DNAm sites (PIDSs) in a functional category divided by the mean of a null distribution generated by resampling variance-matched control probes at random from all the outcome probes used in the SMR analysis. On the one hand, the enrichment test is not biased by the fact that the Illumina 450K methylation array probes are preferentially distributed towards certain genomic regions because it tests against control probes sampled from those on the array rather than random genomic positions. On the other hand, however, this test is overly conservative because the control probes are enriched in certain functional genomic regions (Fig. S5a) and can possibly contain some of the PIDSs, which may explain the relatively small fold enrichments observed in this analysis. We have clarified this in the revised manuscript (lines 200-208).
In the PAI analysis, we excluded the DNAm pairs within a promoter region, which may explain why the PIDSs were depleted in promoters. If we add the within-promoter DNAm pairs back in analysis, the predicted PIDSs are significantly enriched in both promoters and bivalent promoters (Fig. S5b). We have commented on this in the revised manuscript (lines 206-208).
6. In the section "Relevance of the predicted PAIs with gene expression" (Page 6-7), the authors analysed the association between PmPmI and gene co-expression. However, in Figure 3C, the authors only included the mean value (red line) for PmPmI gene pairs while including the histogram for the "control". It is confusing why not including both distributions? In addition, the authors should point out which statistical test was used when they claimed the pairwise genes with PmPmI were more likely to be co-expressed.
Re: The method used in this analysis is an empirical test that compares the observed mean Pearson correlation of all the PmPmI gene pairs to the distribution of mean Pearson correlation values under the null. This null distribution was generated by randomly sampling, 1,000 times, a distance-matched control set from the gene pairs whose promoters were tested in the SMR analysis. So, it is a comparison of the observed mean value with the null distribution of the mean values. We have clarified this in the main text (lines 217-223) and the legend of Figure 3c (now Figure 4c in the revised manuscript).
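The empirical test described above can be sketched as follows. This is a simplified illustration and not the authors' implementation: the function name, the use of pre-computed distance bins for matching, and the (r + 1)/(n + 1) convention for the empirical P value are all assumptions, and the correlation values in the usage lines are invented.

```python
import random
from statistics import mean

def empirical_mean_test(observed_values, observed_bins, controls_by_bin,
                        n_resamples=1000, seed=0):
    """Compare an observed mean with a null distribution of resampled means.

    observed_values: co-expression correlations of the observed gene pairs.
    observed_bins:   distance bin of each observed pair (same order).
    controls_by_bin: dict mapping a distance bin to the correlations of
                     control gene pairs falling in that bin.
    Returns the observed mean, the list of null means, and a one-sided
    empirical P value.
    """
    rng = random.Random(seed)
    observed_mean = mean(observed_values)
    null_means = []
    for _ in range(n_resamples):
        # Draw one control pair per observed pair from the matching distance bin.
        sample = [rng.choice(controls_by_bin[b]) for b in observed_bins]
        null_means.append(mean(sample))
    n_exceed = sum(m >= observed_mean for m in null_means)
    p_empirical = (n_exceed + 1) / (n_resamples + 1)
    return observed_mean, null_means, p_empirical

# Hypothetical toy data, for illustration only.
obs_mean, null, p = empirical_mean_test(
    observed_values=[0.5, 0.6, 0.4],
    observed_bins=[0, 0, 1],
    controls_by_bin={0: [0.05, 0.10, 0.15], 1: [0.00, 0.10]},
)
print(obs_mean, p)
```

Plotting the null means as a histogram with the observed mean marked as a single vertical line gives the kind of display the reviewer asked about: one observed value set against the null distribution of means.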

Reviewer #3
(Remarks to the Author): Review of "Promoter-anchored chromatin interactions predicted from genetic analysis of epigenomic data," by Yang Wu et al. This paper presents a statistical method to predict promoter associated interactions using covariation of DNA methylation and mQTL data across individuals. The approach is based on Mendelian randomization and HEIDI and used to analyse mQTL data from a meta-analysis of studies on 1,980 individuals to predict the interactions between promoters and genomic regions within 4Mbp of the promoters.
Re: We thank the reviewer for the summary.
Our major concern is that the evidence in support of the predicted interactions is quite weak. Although anecdotal evidence is presented from a few gene loci, these are not well annotated and hand-picked examples where the effect of covarying methylation should be more obviously functionally significant. There is no direct experiment validation of novel interactions, and no direct comparison to previously published interaction datasets, which could be done by comparing the full prediction sets to promoter capture hi-C or hi-chip. Only one hi-c interaction is shown in a supplemental figure.
Re: To further validate our method, we have compared our predicted PAIs with the chromatin loops identified by additional experimental assays (i.e., promoter capture Hi-C (PCHi-C) and Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET)), and compared our method with two other prediction approaches, i.e., the correlation-based method from Ernst et al. (2011, Nature) and the pairwise hierarchical model (PHM) method from Kumasaka et al. (2019, Nature Genetics). All the additional results have been incorporated in the revised manuscript (lines 150-151, and 156-184). We found that our predicted PAIs were highly enriched in the ChIA-PET loops from ENCODE (Fig. 3b) and PCHi-C loops from Jung et al. (Fig. 3c). More importantly, the fold enrichment increased as the significance threshold used to define the PCHi-C loops became more stringent (Fig. 3c), consistent with the observation from previous work that Hi-C loops with lower Hi-C P values are more reproducible between biological replicates (Jin et al., 2013, Nature). Moreover, our SMR & HEIDI method outperformed the correlation-based method using either DNAm or chromatin accessibility data (Fig. 3c). We have also shown that our method has similar performance in comparison with the PHM approach. Of the 15,487 causal interactions identified by the PHM approach, 10,416 were tested in our SMR & HEIDI analysis; 98.4% were replicated at a nominal significance level (PSMR < 0.05 and PHEIDI > 0.01), and 36% were significant after multiple testing correction (PSMR < 4.8 × 10^-6 (0.05/10,416) and PHEIDI > 0.01). While the PHM approach requires individual-level genotype and chromatin accessibility data and is less computationally efficient due to the use of a Bayesian hierarchical model, our SMR & HEIDI method, which requires only summary-level data, is more flexible and can potentially be applied to all epigenetic QTL data.
We have also shown examples where the predicted PAIs are likely to be functional by integrating GWAS with PAIs ( Fig. 2d and Fig. S3), and these PAIs could be candidates for functional validations in the future. However, we agree with the reviewer that further experimental validations for these novel interactions are needed. We have mentioned this as a limitation of our study in the Discussion (lines 419-421).
Beyond that there are no hard examples shown that differential methylation is recovering known demonstrated E-P interactions.
Re: Our analytical approach was developed to detect the association between DNAm levels of two CpG sites due to the same set of underlying genetic variants rather than detecting differential methylation. We have shown that our predicted PAIs were significantly enriched in the loops identified by different experimental assays (e.g., Hi-C, PCHi-C and ChIA-PET) and further added in the revised manuscript an example that a predicted PAI is validated by an enhancer-promoter (E-P) interaction discovered in two independent Hi-C studies (lines 152-154 and Fig. S4).
While the predictions do fall within TADs more frequently by chance, this is not a direct assessment of functional interaction. The enrichment for differentially expressed genes in fig3d looks marginally significant at best.
Re: We agree with the reviewer that the predicted interactions falling within TAD regions were not necessarily functional. However, compared to the chromatin interactions identified by the 3C-based methods, our predicted PAIs may be more relevant to functional interactions, as evidenced by the observation from an additional analysis (see below) that our predicted Pm-PAI genes (genes whose promoters were involved in significant PAIs) showed stronger enrichment in active gene groups compared to the predicted target genes from the PCHi-C data (Fig. S6). In addition, the use of a genetic model also allows us to integrate PAIs with GWAS results to understand the regulatory mechanisms for complex traits (Fig. 2d, Fig. 6, Fig. S7, Fig. S10 and lines 236-241).
In the test for enrichment in differentially expressed genes, the Pm-PAI genes were tested against the same number of control genes whose promoter DNAm sites were included in the SMR analysis. This enrichment analysis is conservative because the ascertainment of genes with promoter DNAm sites tested in SMR would potentially lead to upward biases in expression levels of the control genes. Therefore, we performed an additional enrichment analysis by including all the genes available in the GTEx blood samples. We found that most of the control genes (~70%) were in the inactive gene sets and the fold enrichment of our Pm-PAI genes in active gene groups increased substantially (Fig. S6a). We also performed a similar enrichment analysis for the predicted target genes from the PCHi-C data in GM12878 cell lines (Jung et al.). There was a significant enrichment of the PCHi-C target genes in the active gene groups, but the fold enrichment was slightly smaller than that of our Pm-PAI genes (Fig. S6b). We have included these additional results in the revised manuscript (lines 236-241).
Minor: 1. All plots are poorly annotated and difficult to follow. In Fig 1a,
Re: It is true that co-varying chromatin accessibility has been used to imply the physical interactions (e.g., Gate et al., 2018, Nature Genetics). Most of these methods are based on the correlations of epigenomic marks. We have compared the correlation-based method with our SMR & HEIDI method and found that our method outperformed the correlation-based method using either DNA methylation or chromatin accessibility data (Fig. 3c and lines 156-184). Our method is flexible and can be applied to other epigenomic data such as chromatin accessibility data from the ATAC-seq. To replicate the interactions identified by Kumasaka et al. (2019, Nature Genetics) and compare the SMR & HEIDI method with their PHM approach, we applied our method to the summary-level chromatin accessibility QTL (caQTL) data from Kumasaka et al. Of the 15,487 causal interactions identified by the PHM approach, 10,416 were tested in our SMR & HEIDI analysis; 98.4% were replicated at a nominal significance level (PSMR < 0.05 and PHEIDI > 0.01), and 36% were significant after multiple testing correction (PSMR < 4.8 × 10^-6 (0.05/10,416) and PHEIDI > 0.01). As stated above, while the PHM method from Kumasaka et al. requires individual-level genotype and chromatin accessibility data and is less computationally efficient due to the use of a Bayesian hierarchical model, our SMR & HEIDI method that requires only summary-level data is more flexible and can be potentially applied to all epigenetic QTL data.
DNAm sites are more functionally enriched than control probes randomly sampled from all the "outcome" probes used in the PAI analysis. The outcome probes, however, were ascertained because the DNAm probes on an Illumina 450K methylation array are designed to target certain functional elements such as promoters (Fig. 4a). So, the test is essentially to assess whether the promoter-interacting DNAm sites are more functionally enriched than the array sites, which explains why most of the fold enrichment values were small. We have now also reported the fold enrichment values computed against random genomic positions wherever appropriate, which were several-fold larger than those computed against array probes ( Fig. S5b; lines 211-213).
We agree with the reviewer that although most PAIs are within TADs, the overlaps of PAIs with Hi-C loops are limited. We have commented on this issue in the Discussion section (also see below). "There are several reasons why the overlaps between the predicted PAIs and Hi-C loops were limited. First and most importantly, Hi-C loops were detected with substantial noises. We observed that the concordances between different Hi-C data sets were very limited (Fig.  S13), consistent with the conclusion from Forcato et al. that the reproducibility of Hi-C loops is low at all resolutions. Second, most (65%) of our predicted PAIs are interactions between DNAm sites within 50 Kb (Fig. S2b), which are often not well captured by the 3C-based methods due to its low resolution. Third, the chromatin interactions are cell type specific so that differences between the Hi-C loops identified in cell lines and our PAIs identified in whole blood are expected. For the PAIs that were between DNAm sites not located in TADs or Hi-C loops, we have shown specific examples that these predicted PAIs are likely to be functionally interacted ( Fig. 2d and Fig. S3), suggesting that these PAIs are likely to be interactions yet to be identified by experimental assays. On the other hand, compared to the loops identified based on 3C-based methods, our predicted PAIs are more likely to be functional interactions due to the use of genetic and regulatory epigenomic data, as evidenced by the observation that our predicted Pm-PAI genes showed stronger enrichment in active gene groups compared to the predicted target genes from the PCHi-C data (Fig. S8)."