Chromatin interactions and expression quantitative trait loci reveal genetic drivers of multimorbidities

Clinical studies of non-communicable diseases identify multimorbidities that suggest a common set of predisposing factors. Despite the fact that humans have ~24,000 genes, we do not understand the genetic pathways that contribute to the development of multimorbid non-communicable disease. Here we create a multimorbidity atlas of traits based on pleiotropy of spatially regulated genes. Using chromatin interaction and expression quantitative trait loci (eQTL) data, we analyse 20,782 variants (p < 5 × 10⁻⁶) associated with 1351 phenotypes to identify 16,248 putative spatial eQTL-eGene pairs that are involved in 76,013 short- and long-range regulatory interactions (FDR < 0.05) in different human tissues. Convex biclustering of spatial eGenes that are shared among phenotypes identifies complex interrelationships between nominally different phenotype-associated SNPs. Our approach enables the simultaneous elucidation of variant interactions with target genes that are drivers of multimorbidity, and those that contribute to unique phenotype-associated characteristics.


Supplementary Figures
Supplementary Figure 1 (d) Manhattan plot of significant eQTL-eGene interactions that are not mapped in the GWAS Catalog. We considered an interaction novel if the eGene is not explicitly listed in the 'MAPPED_GENE' column of a SNP association in the GWAS Catalog (v1.0.1). For (c) and (d), x-axis tick marks are ordered sequentially from chromosome 1 to 22, followed by the X chromosome. The Y chromosome is not represented in (c) or (d).
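The novelty criterion described in this caption can be sketched as a simple set-membership check. This is a minimal illustration, not the authors' code: the function names are hypothetical, the gene symbols are toy placeholders, and the assumption that 'MAPPED_GENE' entries are delimited by commas, semicolons, or a spaced hyphen should be checked against the catalog release actually used.

```python
import re


def mapped_genes(mapped_gene_field):
    """Split one MAPPED_GENE string into individual gene symbols.

    Assumption: symbols are separated by ", ", "; " or " - ".
    A hyphen without surrounding spaces is kept as part of the symbol.
    """
    if not mapped_gene_field:
        return set()
    parts = re.split(r"\s+-\s+|[,;]\s*", mapped_gene_field)
    return {p.strip() for p in parts if p.strip()}


def is_novel(egene, mapped_gene_fields):
    """Return True if the eGene is absent from every MAPPED_GENE entry
    recorded for the SNP (hypothetical helper for illustration)."""
    known = set()
    for field in mapped_gene_fields:
        known |= mapped_genes(field)
    return egene not in known


# Toy example with placeholder symbols (not real catalog rows).
snp_mapped = ["GENE1, GENE2", "ABC-1 - GENE3"]
print(is_novel("GENE2", snp_mapped))  # eGene already mapped to this SNP
print(is_novel("GENE9", snp_mapped))  # eGene not mapped to this SNP
```

Under this sketch, an eQTL-eGene pair is counted as novel only when the eGene symbol matches none of the symbols already mapped to that SNP; alias or synonym resolution is deliberately left out.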

Supplementary Figure 2. Relationships between eQTL SNPs and eGenes.
(a) Violin plot of eQTL p-values and distance between eQTL SNPs and eGenes shows that there are more eQTLs within 1 Mb of genes than there are within genes. (b) About half of eQTL SNPs affect only one gene; the other half affect multiple genes. (c) Distribution of distances between eQTL SNP and eGene Hi-C fragment loops. The distances here are between the closest SNPs and genes.
Supplementary Figure 3. Associations of phenotypes based on shared spatial interactions. (a) Phenotypes associate weakly when the 7,776 significant eQTLs are used to define their interrelationships, as shown by the sparser blue dots in the heatmap and the gaps in the Q-Q plot. (b) Relationships among phenotypes are enhanced by the eGenes. (c) 1,000 null datasets were generated by randomly assigning eGenes to phenotypes such that each control phenotype has the same number of eGenes as the corresponding sample phenotype. The mean null dataset has a different pattern from that of the sample phenotypes. Heatmaps in a, b and c are extracts of the same set of phenotypes in the same order. Darker squares in the matrices indicate higher proportions of shared eGenes (with 1 being the highest, meaning that the eGene sets of two phenotypes are the same). Q-Q plots include shared eQTL, eGene or control ratios for 861 phenotypes.
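The shared-eGene proportions and the size-matched null datasets described in this caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the proportion is taken to be the Jaccard index (which equals 1 only when two eGene sets are identical), random eGenes are drawn without replacement from the pool of all eGenes observed across phenotypes, and all function names and the toy data are hypothetical.

```python
import random


def shared_proportion(a, b):
    """Jaccard index of two eGene sets (assumed definition of the ratio)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def sharing_matrix(pheno_genes):
    """Pairwise shared-eGene proportions, phenotypes in sorted order."""
    names = sorted(pheno_genes)
    matrix = [[shared_proportion(pheno_genes[p], pheno_genes[q])
               for q in names] for p in names]
    return names, matrix


def null_dataset(pheno_genes, rng):
    """Randomly reassign eGenes so that each control phenotype keeps the
    same number of eGenes as the corresponding sample phenotype."""
    pool = sorted({g for genes in pheno_genes.values() for g in genes})
    return {p: set(rng.sample(pool, len(set(genes))))
            for p, genes in pheno_genes.items()}


def mean_null_matrix(pheno_genes, n_null=1000, seed=0):
    """Average the sharing matrix over n_null randomised datasets."""
    rng = random.Random(seed)
    names = sorted(pheno_genes)
    k = len(names)
    total = [[0.0] * k for _ in range(k)]
    for _ in range(n_null):
        _, m = sharing_matrix(null_dataset(pheno_genes, rng))
        for i in range(k):
            for j in range(k):
                total[i][j] += m[i][j]
    return names, [[v / n_null for v in row] for row in total]


# Toy phenotypes with placeholder gene symbols (not real data).
toy = {
    "phenotype_A": {"G1", "G2", "G3"},
    "phenotype_B": {"G2", "G3", "G4"},
    "phenotype_C": {"G5"},
}
names, observed = sharing_matrix(toy)
_, null_mean = mean_null_matrix(toy, n_null=200, seed=1)
```

Comparing `observed` with `null_mean` cell by cell mirrors the contrast drawn between panels (b) and (c); a different sampling pool or similarity measure would change the null pattern.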
Supplementary Figure 4. Phenotypes cluster based on common spatial eGenes. The heat map highlights the multimorbidity of 618 phenotypes that share ≥ 4 spatial eGenes with at least one other phenotype. Unsupervised clustering was performed using the convex biclustering algorithm from the cvxbiclustr R package. Deep blue squares indicate higher proportions of shared eGenes, with 1 being the highest and indicating that two phenotypes have the same set of eGenes. Ten notable clusters are annotated and described. A complete list of inter-phenotype shared eGene proportions is presented in Supplementary Data 2. A high-resolution version of this image, in which the phenotypes in these clusters can be seen, is available at DOI 10.17608/k6.auckland.7294934
Supplementary Figure 5. Biclustering of phenotypes based on shared eQTLs. The graph shows the segregation of phenotypes (on both axes) that share ≥ 4 eQTLs with at least one other phenotype by the convex biclustering algorithm from the cvxbiclustr R package. Deep blue squares indicate higher proportions of shared eGenes, with 1 being the highest and indicating that two phenotypes have the same set of eGenes. The complete proportions of the eQTLs shared among phenotypes are given in Supplementary Data 2. We required that phenotypes share ≥ 4 eQTLs with at least one other phenotype because the ratio of eGenes to SNPs (i.e. 7776 / 7917) is approximately one, which makes this threshold equivalent to the requirement for eGene biclustering (Supplementary Figure 4).
(a) Linked eQTLs in the FADS locus have different allele specificities in the CEU population, as indicated by the differences between the D′ and R² scores. The difference in allele frequencies is associated with the two distinct eQTL effect patterns (b and c) across the FADS locus. Groups A and B highlight the distinct patterns of eQTL-associated transcriptional effects. Effect sizes of spatial eQTLs on eGenes were obtained from GTEx v7 analysis. Centre line, bounds of box, and whiskers of boxplots represent the median, 2nd and 3rd quartiles, and minimum and maximum values, respectively.
Supplementary Figure 9. The CHRNA regulome is central to pulmonary disorders. (a) The LD pattern here suggests two alternating haplotype blocks, with the smaller block represented by rs503464, rs667282, rs8042374 and, to a lesser extent, rs13180. (b) CHRNA3 and CHRNA5 are differentially expressed and regulated in the same tissues and by the same eQTLs, which seems consistent with the LD patterns in (a). (c) eQTLs in the CHRNA locus region interact with multiple genes both within and outside this region, suggesting that the locus may be a super-enhancer. rs8042374, which is located at a TAD boundary, also interacts with genes in the two adjacent TADs.
Figure 10. Tissue specificity of spatial eQTL-eGene interactions. There is no correlation between eQTL p-values and (a) the total unique Hi-C interactions or (b) the total supporting interactions of eQTL SNP-eGene associations. Interactions and supporting interactions are as defined in Fig 1a. (c) Distribution of eQTL-eGene interactions among the Hi-C cell lines. (d) The proportion of spatial eQTL-eGene associations in tissues correlates positively (r = 0.87) with the number of RNA-seq and genotyped samples in GTEx.