Genetic studies promise to provide insight into the molecular mechanisms underlying type 2 diabetes (T2D). Variants associated with T2D are often located in tissue-specific enhancer clusters or super-enhancers. So far, such domains have been defined through clustering of enhancers in linear genome maps rather than in three-dimensional (3D) space. Furthermore, their target genes are often unknown. We have created promoter capture Hi-C maps in human pancreatic islets. This linked diabetes-associated enhancers to their target genes, often located hundreds of kilobases away. It also revealed >1,300 groups of islet enhancers, super-enhancers and active promoters that form 3D hubs, some of which show coordinated glucose-dependent activity. We demonstrate that genetic variation in hubs impacts insulin secretion heritability, and show that hub annotations can be used for polygenic scores that predict T2D risk driven by islet regulatory variants. Human islet 3D chromatin architecture, therefore, provides a framework for interpretation of T2D genome-wide association study (GWAS) signals.
Raw sequence reads from pcHi-C, RNA-seq, ChIP-seq, ATAC-seq and 4C-seq are available from EGA (https://www.ebi.ac.uk/ega), under accession number EGAS00001002917. Processed data files for islet pcHi-C interactions, islet regulome annotations, enhancer–promoter assignments, hub coordinates and components, and 3D model videos are provided as supplementary data. The robust set of ATAC-seq peaks, the consistent set of Mediator, cohesin, H3K27ac and H3K4me3 peaks, the list of islet super-enhancers defined using the ROSE algorithm, the islet regulome, the ChromHMM segmentation model, the list of islet TAD-like domains, PATs and the list of high-confidence pcHi-C interactions are provided as Supplementary Datasets and are also deposited at https://www.crg.eu/en/programmes-groups/ferrer-lab#datasets.
Custom code in this manuscript is available upon request.
P.R. is a shareholder and consultant for Endocells/Unicercell Biosolutions.
This research was supported by the National Institute for Health Research Imperial Biomedical Research Centre. Work was funded by grants from the Wellcome Trust (nos. WT101033 to J.F. and WT205915 to I.P.), Horizon 2020 (Research and Innovation Programme nos. 667191, to J.F., 633595, to I.P., and 676556, to M.A.M.-R.; Marie Sklodowska-Curie 658145, to I.M.-E., and 43062 ZENCODE, to G.A.), European Research Council (nos. 789055, to J.F., and 609989, to M.A.M.-R.), Marató TV3 (no. 201611, to J.F. and M.A.M.-R.), Ministerio de Ciencia Innovación y Universidades (nos. BFU2014-54284-R, RTI2018-095666, to J.F., BFU2017-85926-P, to M.A.M.-R., IJCI-2015-23352, to I.F.), AGAUR (to M.A.M.-R.), UK Medical Research Council (nos. MR/L007150/1, to P.F., and MR/L02036X/1, to J.F.), World Cancer Research Fund (WCRF UK, to I.P.) and World Cancer Research Fund International (no. 2017/1641 to I.P.), and Biobanking and Biomolecular Resources Research Infrastructure (nos. BBMRI-NL, NWO 184.021.007, to I.O.F.). Work in IDIBAPS, CRG and CNAG was supported by the CERCA Programme, Generalitat de Catalunya and Centros de Excelencia Severo Ochoa (no. SEV-2012-0208). Human islets were provided through the European islet distribution program for basic research supported by JDRF (no. 3-RSC-2016-160-I-X). We thank N. Ruiz-Gomez for technical assistance; R. L. Fernandes, T. Thorne (University of Reading) and A. Perdones-Montero (Imperial College London) for helpful discussions regarding machine learning approaches; B. Lenhard and M. Merkenschlager (London Institute of Medical Sciences, Imperial College London), F. Müller (University of Birmingham) and J. L. Gómez-Skarmeta (Centro Andaluz de Biología del Desarrollo) for critical comments on the draft; the CRG Genomics Unit; and the Imperial College High Performance Computing Service.
Integrated supplementary information
Supplementary Fig. 1. a, Schematic representation of the pcHi-C analysis workflow. b, Relative frequency of high-confidence interactions between baits and interacting regions. c, Distances from bait to interacting regions for high-confidence interactions. The dashed line represents the median distance. d, CHiCAGO score distribution of high-confidence interactions in merged pcHi-C data (n=175,784) and individual islet samples, and in distance-matched interactions. Boxplots show IQR, and whiskers show 5th and 95th percentiles. e, Pairwise Pearson correlation values of CHiCAGO scores between individual islet samples and the merged dataset. f,g, Epigenomic maps and virtual 4C profiles in merged and individual human islet samples at TCF7L2 and ISL1. h,i, pcHi-C recapitulates interactions identified by 4C-seq in human islets and the human β cell line EndoC-βH1 at the ISL1 and MAFB loci. The top track depicts a virtual 4C representation of human islet pcHi-C data in both promoters. High-confidence interactions from 4 pooled human islet samples and naïve CD4+ T cells are shown below. Inverted triangles depict viewpoints.
a, Binding patterns for indicated epitopes in ±25 kb regions centered on interacting pcHi-C baits (top) and promoter-interacting regions (bottom). Expected occupancy profiles, obtained by randomizing the positions of the indicated signals 10 times, are represented with a red line, and the IQR is shown as shading. b, Relative frequency of CTCF binding sites in baits and non-bait interacting regions. Nearly 50% of interactions are associated with CTCF binding in at least one of the interacting regions. c, CTCF-binding motif orientation at CTCF-bound interacting regions. Of 9,657 interactions, 56.62% are convergent, consistent with expectations. d, Tissue-selectivity of islet pcHi-C interactions relative to identically processed pcHi-C from erythroblasts, macrophages, naïve CD4+ T cells and total B lymphocytes. e, Genes located in baits with islet-selective interactions show increased islet-specificity scores for gene expression vs. genes with tissue-invariant interactions. The islet-specificity Z score was calculated with a gene expression distribution from 18 human tissues. P value was calculated with Wilcoxon’s two-sided signed-rank test. Boxplots represent IQRs. f, Ratio of tissue-invariant to islet-selective interactions overlapping major open chromatin classes, normalized by the total number of tissue-invariant and islet-selective interactions. All categories showed significant differences with interactions in the remaining genome (Fisher’s P < 0.01).
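The orientation classes summarized in panel c can be illustrated with a minimal sketch. It assumes each CTCF-bound interaction is reduced to the motif strand at its upstream (left) and downstream (right) anchor, and that "convergent" means a forward motif upstream facing a reverse motif downstream. The function names and the toy strand pairs are hypothetical and are not taken from the analysis code of the study.

```python
from collections import Counter

def classify_pair(left_strand, right_strand):
    """Classify CTCF motif strands at the upstream (left) and downstream (right) anchors."""
    if left_strand == "+" and right_strand == "-":
        return "convergent"   # motifs point toward each other
    if left_strand == "-" and right_strand == "+":
        return "divergent"    # motifs point away from each other
    return "tandem"           # both motifs on the same strand

def convergent_fraction(pairs):
    """Fraction of interactions whose anchor motifs are convergent."""
    counts = Counter(classify_pair(left, right) for left, right in pairs)
    total = sum(counts.values())
    return counts["convergent"] / total if total else 0.0

# Toy input (hypothetical): one (left, right) strand pair per interaction
pairs = [("+", "-"), ("+", "-"), ("-", "+"), ("+", "+")]
frac = convergent_fraction(pairs)
```

If the four strand combinations were equally likely, about a quarter of pairs would be convergent, which is the kind of baseline against which a value such as 56.62% can be read.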
a, Features of islet TAD-like domains. b, Representative example of human islet TAD-like domains (chr 11:1132582-4719948, hg19). Negative and positive directionality index (DI) scores are represented in blue and red, respectively. ESC and IMR90 TADs generated with Hi-C are shown for reference. c, Size of TAD-like domains in human islets and Hi-C TADs from ESC and IMR90 cells. d, TAD-like domains display known features of TADs, such as enrichment of CTCF binding and convergent CTCF motif orientation in borders. e, Tissue-selectivity of islet TAD-like boundary regions was estimated by comparison with TADs defined by Hi-C in 21 tissues. f, Enhancers frequently interact with more than one gene. Fraction of enhancers showing high-confidence (CHiCAGO > 5) interactions to 1-5+ promoter "baits" in the same TAD. g, Schematic of promoter-associated three-dimensional spaces (PATs), defined as the genomic space that spans high-confidence interactions originating from one bait. h, Fraction of islet TAD-like spaces occupied by each PAT. i, ChromHMM state enrichments in PATs were consistent with the expression level of their associated genes. The heatmap shows ChromHMM state median log2 fold-enrichments in PATs over their genomic distributions, in 5 bins based on bait gene expression levels in human islets. j, Active islet enhancer or H3K9me3-enriched ChromHMM states in PATs were enriched over the remaining TAD-like space in accordance with islet expression of PAT genes. Only PATs at least 25% smaller than their TAD were used (n=7,085). Median enrichments (circles) and IQR (shade) are shown. k, Emission probabilities of the 15 ChromHMM states for all islet chromatin features used to create the model. l, Sequential steps used to impute the assignment of islet enhancers to target genes. m, CHiCAGO scores for imputed enhancer-promoter pairs vs. distance-matched controls (n=50 sets). P value is from Wilcoxon’s two-sided signed-rank test. Boxplots represent IQRs. n, Genes assigned to enhancers were enriched in islet-specific genes, as compared with unassigned control genes from the same islet TAD-like structure (Chi-square P = 6 × 10⁻⁸). o, Islet exposure to 4 mM vs. 11 mM glucose causes widespread induction of H3K27 acetylation in islet enhancers. Dots represent H3K27ac-enriched regions, and are red if Benjamini-Hochberg adjusted P ≤ 0.05.
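The PAT definition in panel g (the genomic space spanned by high-confidence interactions from one bait) can be sketched as follows. This is only an illustration under stated assumptions: interactions are given as tuples on a single chromosome, high confidence means a CHiCAGO score above 5 as in panel f, and the span is taken to include the bait fragment itself. The tuple layout, the function name and the toy coordinates are hypothetical.

```python
def pat_spans(interactions, min_score=5.0):
    """Return {bait_id: (start, end)} covering each bait and its high-confidence contacts.

    Each interaction is (bait_id, bait_start, bait_end, other_start, other_end, score),
    with all coordinates assumed to lie on the same chromosome.
    """
    spans = {}
    for bait_id, bait_start, bait_end, other_start, other_end, score in interactions:
        if score <= min_score:
            continue  # keep only high-confidence interactions (score > threshold)
        lo = min(bait_start, other_start)
        hi = max(bait_end, other_end)
        if bait_id in spans:
            lo = min(lo, spans[bait_id][0])
            hi = max(hi, spans[bait_id][1])
        spans[bait_id] = (lo, hi)
    return spans

# Toy interactions (hypothetical coordinates and scores)
toy = [
    ("geneA", 1000, 2000, 50000, 51000, 7.2),   # downstream contact, kept
    ("geneA", 1000, 2000, 300, 800, 5.5),       # upstream contact, kept
    ("geneA", 1000, 2000, 90000, 91000, 3.0),   # below threshold, ignored
    ("geneB", 10000, 11000, 20000, 21000, 4.9), # below threshold, so no PAT for geneB
]
spans = pat_spans(toy)
```

A bait with no interaction above the threshold gets no entry, which mirrors the idea that a PAT only exists where high-confidence contacts were detected.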
a, T2D and FG-associated variants used to examine gene targets (see Supplementary Table 3). b, Proportion of DIAGRAM credible set SNPs with high posterior probability (PP > 0.1) mapping to islet regulome elements within intervals containing credible sets. Note the enrichment in active enhancers and promoters vs. 100 sets of elements shuffled within the genomic spaces that contain credible sets, shown as grey IQR boxplot distributions and outliers as black dots. Z-scores represent deviations from the mean of the shuffled distribution. c,d, Selected examples of loci with T2D-risk variants with gene targets supported by both significant eQTLs and pcHi-C, showing enhancer-gene assignments through pcHi-C high-confidence interactions (from pooled data, in magenta) and imputations (grey). Enhancer eQTL-eGene pairs are represented as horizontal black lines. A vertical yellow stripe highlights the eGene promoter. Concordant gene targets include c, STARD10 and d, ABCB9. pcHi-C interactions are represented as arcs connecting HindIII fragments. Boxplots show first and third quartiles as boxes and 1.5 × IQR as whiskers of gene expression for different genotypes, shown as PEER residuals, along with P and adjusted P (q) values from eQTL meta-analysis. Red dots represent individual PEER residual values of gene expression for 183 samples across different genotypes. For additional eQTL findings see Supplementary Table 2.
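The Z-scores in panel b, described as deviations from the mean of the shuffled distribution, can be illustrated with a short sketch. It assumes the Z-score is the observed overlap count minus the mean of the shuffled-set counts, divided by their sample standard deviation; whether the sample or population standard deviation was used is not stated, and the function name and toy counts are hypothetical.

```python
from statistics import mean, stdev

def enrichment_z(observed, shuffled_counts):
    """Z-score of an observed overlap count against counts from shuffled element sets."""
    mu = mean(shuffled_counts)
    sd = stdev(shuffled_counts)  # sample standard deviation (an assumption)
    if sd == 0:
        raise ValueError("shuffled counts have zero variance; Z-score is undefined")
    return (observed - mu) / sd

# Toy example (hypothetical): 30 credible-set SNPs overlap the real annotation,
# while five shuffled annotations overlap 8 to 12 SNPs each.
z = enrichment_z(30, [10, 12, 8, 11, 9])
```

In practice the null would come from many more shuffles (the legend uses 100 sets), each generated within the genomic spaces that contain credible sets.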
a, Long-range interactions of the enhancer carrying rs11257655 are replicated in individual human islet pcHi-C samples. Note how interactions between this enhancer and OPTN are detected with high confidence (CHiCAGO > 5) in each pcHi-C replicate. b, Luciferase assay in the human β cell line EndoC-βH3 shows allele-dependent activity for the rs11257655-enhancer. Data are means ± s.d. (n=3 independent experiments, with 3-6 independent transfections). Statistical significance: two-tailed Student's t-test. c,d, Analysis of OPTN and CAMK1D mRNA after c, CRISPRi of the rs11257655-enhancer in HepG2 and d, CRISPRi or CRISPRa in EndoC-βH3 cells. Bars show average values of 3-4 gRNAs targeting either the rs11257655 enhancer, or the transcriptional start sites. Data are presented as means ± s.e.m. (enhancer activation: 4 gRNAs n=6; inhibition: 4 gRNAs n=3). Statistical significance: two-tailed Student's t-test.
a, Virtual 4C representations from pooled human islet samples centered on all genes in this locus show that the region containing rs7903146 connects with TCF7L2 through moderate-confidence interactions and an imputed assignment, without evidence for interactions with other genes. The HindIII fragment that contains the enhancer with rs7903146 is highlighted in yellow. The bottom panel reveals that this enhancer shows unusually high occupancy by Mediator and islet-enriched transcription factors in islet chromatin. b, RNA analysis in EndoC-βH3 cells after deletion of either the rs7903146-enhancer or a control region in the same locus. Deletions were tested with 2 different gRNA pairs, n=3 experiments. Statistical significance was determined using two-tailed Student's t-test. Only active genes in the locus were tested. c, RNA analysis in EndoC-βH3 cells after CRISPRa or CRISPRi of the rs7903146-enhancer. Statistical significance was determined using two-tailed Student's t-test (activation: 1 gRNA, n=3 experiments; inhibition: 3 gRNAs n=3 experiments).
a,c, T2D variant-target gene assignments in the VEGFA and ZFAND3 loci. pcHi-C and virtual 4C representations are from pooled samples. b,d, VEGFA or MDGA1 and ZFAND3 mRNAs in EndoC-βH3 cells after CRISPRa or CRISPRi of T2D-associated enhancers. C6orf223 was not detectable by qPCR. Note that we did not examine all potential targets near VEGFA (see other imputed genes in Supplementary Table 3). Data are presented as means ± s.e.m. (VEGFA enhancer CRISPRa: 3 guides n=3 experiments; VEGFA enhancer CRISPRi: 4 guides n=2 experiments; ZFAND3-MDGA1 enhancer: 4 guides n=3 experiments). Statistical significance was determined using two-tailed Student's t-test.
a, Multiple logistic regression analysis was used to identify PAT features that predict islet-expressed genes with islet-selective vs. non-islet-selective expression. Islet-selective expression was examined as a surrogate endpoint because it is a property of many (though not all) genes important for islet cell identity. The PAT feature with the highest logistic regression coefficient was the number of non-islet tissues with promoter H3K27me3-enrichment. This feature was considered almost synonymous with islet-specific expression. The next highest coefficient was the number of assigned class I enhancers in the PAT. Further analysis showed that ≥3 assigned class I enhancers in a PAT optimized the prediction of islet-selective expression (Supplementary Fig. 9). b, Classification of PATs based on assigned enhancers revealed 2,623 enhancer-rich PATs (≥3 assigned class I enhancers). Enhancers are shown as red boxes. Turquoise and dashed green lines are high-confidence interactions and imputed assignments, respectively. c, Enhancer hubs were defined as enhancer-rich PATs, which were merged with other PATs connected through at least one common enhancer-associated high-confidence interaction. d, Descriptive characteristics of enhancer hubs in human islets. Multi-target enhancers show high-confidence interactions with two or more promoter-containing baits. e, Enhancer hubs are enriched in islet-selective interactions relative to non-hub PATs that had at least 1 high-confidence interaction. Boxes are IQR, notches are 95% CI of the median and P values are from Wilcoxon’s two-sided signed-rank test. f, Linear genomic space occupied by class I enhancers in three-dimensional enhancer hubs compared with the space occupied by super-enhancers (SEs) calculated with the ROSE algorithm, all enhancers from linear enhancer clusters (ECs), and stretch enhancers. g-i, Venn diagrams depicting how often hub enhancers overlap with other human islet enhancer domains: g, SEs, h, highly-bound (top two TF occupancy quartiles) ECs, and i, stretch enhancers. j-l, Islet enhancer hubs often contain enhancers that do not form part of SEs or ECs. Charts show the fraction of hub class I enhancers that overlapped SEs, ECs or stretch enhancers. Note that the genomic space occupied by stretch enhancers is an order of magnitude greater than hubs (panel g). m-o, Islet enhancer hubs very frequently contain multiple SEs, ECs or stretch enhancers.
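The hub definition in panels b and c can be sketched as a graph-merging step. The following is a minimal illustration, not the procedure used in the study: it assumes that PATs sharing at least one enhancer contacted through a high-confidence interaction are merged transitively (connected components, here via union-find), and that a merged group counts as a hub only if it contains at least one enhancer-rich PAT (≥3 assigned class I enhancers). The input dictionaries, the function name and the toy identifiers are hypothetical.

```python
from collections import defaultdict

def build_hubs(assigned, high_conf, min_enhancers=3):
    """Group PATs into hubs.

    assigned:  {pat: set of assigned class I enhancers} (used for the richness test)
    high_conf: {pat: set of enhancers contacted through high-confidence interactions}
    Returns a sorted list of hubs, each a sorted list of PAT names.
    """
    parent = {pat: pat for pat in assigned}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link PATs that share at least one high-confidence enhancer
    first_pat_for = {}
    for pat in assigned:
        for enhancer in high_conf.get(pat, ()):
            if enhancer in first_pat_for:
                parent[find(pat)] = find(first_pat_for[enhancer])
            else:
                first_pat_for[enhancer] = pat

    groups = defaultdict(list)
    for pat in assigned:
        groups[find(pat)].append(pat)

    # Keep only groups seeded by at least one enhancer-rich PAT
    return sorted(
        sorted(members)
        for members in groups.values()
        if any(len(assigned[p]) >= min_enhancers for p in members)
    )

# Toy data (hypothetical PAT and enhancer identifiers)
assigned = {"P1": {"e1", "e2", "e3"}, "P2": {"e3"}, "P3": {"e7"}, "P4": {"e8", "e9"}}
high_conf = {"P1": {"e1", "e3"}, "P2": {"e3"}, "P3": {"e7"}, "P4": {"e8"}}
hubs = build_hubs(assigned, high_conf)
```

In the toy data, P1 is enhancer-rich and shares enhancer e3 with P2, so the two form one hub, while P3 and P4 stay outside any hub. Changing `min_enhancers` corresponds to the kind of alternative thresholds (≥2 to ≥5) that the legends describe as sensitivity checks.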
We considered alternative definitions of hubs as follows: a, enhancer-rich PATs with ≥3 class I enhancers, but without merging interconnected PATs, b-e, enhancer-rich PATs with ≥2-5 assigned class I enhancers, merged with PATs interconnected through high-confidence enhancer interactions, f,g, enhancer-rich PATs with ≥2 or ≥3 class I enhancers exclusively assigned through high-confidence interactions, and then merged with PATs interconnected through high-confidence enhancer interactions, h, enhancer-rich PATs with ≥3 assigned class I enhancers, merged with PATs interconnected through promoter-promoter (instead of enhancer-promoter) interactions. We found that canonical islet-cell functional annotations ranked highest only in definitions with ≥3 assigned class I enhancers. Hubs with ≥4-5 assigned class I enhancers (d,e), as well as those defined exclusively with high-confidence interactions (f,g), showed high-ranking islet cell functional annotation enrichments, at the expense of reducing the number of hubs. Panels on the right show post-hoc VSE analysis of T2D/FG-associated SNPs (n=2,771; Supplementary Table 9). Consistent with the notion that the hub definitions in d-g were restrictive, they failed to show selective enrichment of T2D/FG-associated SNPs. Boxplots show null distributions based on 500 permutations of matched random haplotype blocks. Red dots indicate significant enrichment relative to the null distribution (Bonferroni-adjusted P < 0.01).
a, The FOXA2 locus forms a tissue-specific enhancer hub. Human islet epigenome maps and high-confidence pcHi-C interactions in islets and total B lymphocytes show that islet active enhancers, super-enhancers and enhancer clusters interact to form a single tissue-specific three-dimensional structure. b,c, 360° views of the top-scoring 3D model of the ISL1 enhancer hub in human islets and total B lymphocytes. Class I, II and III enhancers within 200 nm of the ISL1 promoter are colored dark to light red, while promoters within 200 nm of ISL1 (including ISL1) are colored blue. Islet enhancers and promoters are otherwise represented as white spheres. These models show that active islet regulatory elements interact in a common restricted space in islet nuclei. See also Supplementary Videos 1 and 2. d-h, Left panels show the most populated community of the promoter-enhancer interaction network in chosen hubs, as obtained via MCODE clustering, in human islets and total B lymphocytes. Network nodes are promoters (blue) and enhancers (dark to light red for enhancer classes I to III). Edges are mean distance values in the most populated 3D structure cluster. The central panel compares the neighborhood connectivity distribution of networks in both tissues. The right panel shows the 3D distances between hub promoters and enhancers in both tissues. All boxplots show IQRs and outliers as grey diamonds. The number of nodes analysed for each locus is shown in Supplementary Table 16. Statistical significance was computed using a two-sided Kolmogorov-Smirnov test.
a, pcHi-C and virtual 4C representations from pooled human islet samples in the ZBED3 locus for all promoters with active transcripts in the region. b, Islet pcHi-C assigns CRY2 and PHF21A as gene targets of an enhancer containing a FG-associated variant (vertical yellow stripe). c, Analysis of CRY2 and PHF21A mRNA after CRISPRa or CRISPRi of their transcriptional start sites or of the islet enhancer bearing the FG-associated variant rs1401419 in EndoC-βH3 cells. Data are presented as means ± s.e.m. (enhancer CRISPRa: 4 gRNAs n=3; CRISPRi: 2 gRNAs n=2). Statistical significance was determined using two-tailed Student's t-test.
a, Islet pcHi-C assigns C2CD4A and C2CD4B as gene targets of three enhancers containing T2D-associated variants (vertical yellow stripes) in the C2CD4A/B locus. pcHi-C and virtual 4C representations are from pooled human islet samples. b, Analysis of VPS13C, C2CD4A and C2CD4B mRNA after CRISPRa or CRISPRi targeting of their transcriptional start sites or of three islet enhancers bearing T2D-FG variants in EndoC-βH3 cells. Data are presented as means ± s.e.m. (CRISPRa: 4 gRNAs n=3 experiments; CRISPRi: 4 gRNAs n=2 experiments). Statistical significance was determined using two-tailed Student's t-test.
a, Islet pcHi-C virtual 4C representations from pooled samples, showing the T1D/T2D-associated locus GLIS3. The inset shows the enhancer bearing rs4237150. b, Luciferase assays in EndoC-βH3 cells show haplotype-dependent activity of the rs4237150-enhancer. Data are means ± s.d. (n=3 independent experiments with 4-6 independent transfections). Statistical significance: two-tailed Student's t-test. c, Analysis of GLIS3, RFX3 and RFX3-AS1 mRNA upon deletion of rs4237150-enhancer or control regions. Data are presented as means ± s.e.m. (2 pairs of gRNAs per target region, n=3 experiments each). Statistical significance: two-tailed Student's t-test. d, Analysis of predicted target gene transcripts after CRISPRa or CRISPRi targeting of the GLIS3 transcriptional start site or the rs4237150-enhancer in EndoC-βH3 cells. Data are means ± s.e.m. (enhancer CRISPRa: 3 gRNAs n=3 experiments; CRISPRi: 3 gRNAs n=2 experiments). Statistical significance: two-tailed Student's t-test. e, Top-scoring GLIS3 hub model from the most populated cluster of the ensemble in human islets and total B lymphocytes. Enhancers and promoters within 200 nm GLIS3 or RFX3 promoters are colored in red and blue, respectively, or as white spheres if located further. f, Most populated community of the promoter-enhancer interaction network obtained via MCODE clustering of this locus in human islets and total B lymphocytes. Nodes represent promoters (blue) and enhancers (dark to light red for enhancer classes I to III). Edges are mean distances in most populated 3D cluster. Although GLIS3 and RFX3 are connected in a common hub, the networks suggest that they form part of separable sub-communities. g, Neighborhood connectivity distribution between the islet and total B lymphocytes networks. h, 3D distance distribution between enhancers and promoters in GLIS3 hub. Boxplots show IQRs. Statistical significance was computed using two-sample Kolmogorov-Smirnov two-sided test as described in Supplementary Fig. 10. 
See also Supplementary Table 16.
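The two-sample Kolmogorov-Smirnov comparison named in panel h can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `ks_statistic` and the example distances are hypothetical, and the sketch computes only the test statistic D (the largest gap between the two empirical cumulative distributions), not the P value. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over all observed x.

    Hypothetical helper for illustration; returns D only, no P value.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    sorted_a, sorted_b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to evaluate the gap at every observation from either sample.
    for x in sorted_a + sorted_b:
        cdf_a = bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, x) / len(sorted_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up enhancer-promoter distances (nm) for two cell types.
islet_nm = [120.0, 150.0, 180.0, 210.0]
b_cell_nm = [180.0, 210.0, 260.0, 300.0]
d_value = ks_statistic(islet_nm, b_cell_nm)
```

A D close to 0 means the two distance distributions are nearly identical, while a D close to 1 means they barely overlap.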
Supplementary Figure 14 T2D-associated variants are enriched in interacting regions and hub class I enhancers.
a,b, VSE enrichment analysis of T2D and FG (n=2,771) and breast cancer (n=3,048) variants in islet active regulatory elements (see Supplementary Dataset 1). Box plots show null distributions based on 500 permutations of matched random haplotype blocks. Each dot denotes the VSE enrichment of disease-associated variants in each genomic feature. A red dot indicates significant enrichment relative to the null distribution (Bonferroni-adjusted P < 0.01). c, Breast cancer-associated variants show no enrichment in islet enhancer sub-classes. d,e, VSE enrichment analysis of T2D and FG and breast cancer SNPs in chromatin regions with high-confidence pcHi-C interactions in islets. f, VSE enrichment analysis of T2D and FG-associated variants in the indicated enhancer categories. All boxplots show IQRs.
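The general logic of comparing an observed overlap against a permutation-based null, as in the panels above, can be sketched as follows. This is a generic permutation-enrichment illustration and not the VSE software: the function names, the overlap counts and the number of tests are all assumptions made for the example.

```python
import statistics


def enrichment_z(observed_overlap, null_overlaps):
    """Standardize the observed overlap against the null distribution."""
    mu = statistics.mean(null_overlaps)
    sd = statistics.stdev(null_overlaps)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed_overlap - mu) / sd


def empirical_p(observed_overlap, null_overlaps):
    """One-sided empirical P value with the usual +1 correction."""
    exceed = sum(1 for x in null_overlaps if x >= observed_overlap)
    return (exceed + 1) / (len(null_overlaps) + 1)


def bonferroni(p_value, n_tests):
    """Bonferroni adjustment, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: overlaps seen in three matched random sets versus
# the overlap seen for the disease-associated set.
null_counts = [2, 4, 6]
z = enrichment_z(12, null_counts)
p = empirical_p(12, null_counts)
```

With 500 permutations, the smallest attainable empirical P value under this scheme is 1/501, which is one reason permutation-based methods often report a P value derived from the standardized score instead.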
Supplementary Figure 15 Class I enhancers in hubs contribute to heritability of beta cell-related traits.
a-e, Per-SNP heritability estimates of variants in eight islet enhancer domain subtypes calculated using summary statistics data from: a, T2D (12,931 cases and 57,196 controls); b, acute insulin release (AIR) from the intravenous glucose tolerance test (IVGTT, up to 5,567 individuals); c, insulinogenic index (OGTT, 7,807 individuals); d, HOMA-B; and e, HOMA-IR (up to ~80,000 individuals). Bars show category-specific per-SNP heritability coefficients (τ_c) divided by the LD score heritability (h²) observed for each trait. All normalized τ_c coefficients were multiplied by 10⁷ and are shown with s.e.m. τ_c coefficients were estimated using stratified LD score regression, controlling for 53 functional annotation categories included in the baseline model. f, Per-SNP T2D and Attention-Deficit/Hyperactivity Disorder (ADHD, up to 55,374 individuals) heritability estimates in islet regulatory elements and Central Nervous System (CNS) annotations. τ_c coefficients, normalizations by h² and representations are as explained in panels a-e. g, Impact of polygenic risk scores (PRS) on T2D frequency. T2D frequency (y-axis) was calculated in 40 bins, each one representing 2.5% of individuals in the UK Biobank test set. PRS values were calculated with common genetic variants in islet hub enhancers and baits (pink dots), other islet open chromatin regions (light blue dots) and in the rest of the genome (black dots). h, T2D risk stratified by BMI (left) and age of onset of T2D (right). Odds ratios (OR) for T2D were calculated for the 2.5% of individuals with the highest PRS vs. all other individuals via adjusted logistic regression. Controls were censored at the age of recruitment. Boxplots show the IQR of the risk ratio from 100 sets of pseudo-hub PRS, with whiskers at 1.5 × IQR. Dot colors as in g.
For all panels, Z-scores define standard deviations relative to average values from pseudo-hub PRS. See also Supplementary Fig. 15 and Supplementary Table 17.
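The binning described in panel g and the top-2.5% comparison in panel h can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' pipeline: the function names and toy data are hypothetical, and the odds ratio here is a crude, unadjusted 2 × 2 estimate, whereas the legend specifies adjusted logistic regression.

```python
def frequency_by_bin(scores, is_case, n_bins=40):
    """Case frequency in n_bins equal-sized bins of individuals ranked by PRS."""
    if len(scores) != len(is_case):
        raise ValueError("scores and is_case must have the same length")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    freqs = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not members:
            raise ValueError("too few individuals for the requested bins")
        freqs.append(sum(is_case[i] for i in members) / len(members))
    return freqs


def crude_odds_ratio(scores, is_case, top_fraction=0.025):
    """Unadjusted OR for the top PRS fraction versus everyone else."""
    n = len(scores)
    k = max(1, round(n * top_fraction))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = order[:k], order[k:]
    a = sum(is_case[i] for i in top)    # cases in the top group
    b = len(top) - a                    # controls in the top group
    c = sum(is_case[i] for i in rest)   # cases in the remainder
    d = len(rest) - c                   # controls in the remainder
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    return (a * d) / (b * c)


# Toy cohort of 80 individuals; scores and case labels are made up.
toy_scores = list(range(80))
toy_cases = [1 if (i < 10 or i == 79) else 0 for i in range(80)]
toy_freqs = frequency_by_bin(toy_scores, toy_cases, n_bins=40)
toy_or = crude_odds_ratio(toy_scores, toy_cases)
```

With 40 bins, each bin holds 2.5% of individuals, so the highest bin in `frequency_by_bin` contains the same individuals that `crude_odds_ratio` compares against the rest of the cohort.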
Supplementary Figs. 1–15 and Supplementary Notes 1–18
Supplementary Tables 1–17
Supplementary Datasets 1–11
Top-scoring 3D model of ISL1 enhancer hub in human islets.
Top-scoring 3D model of ISL1 enhancer hub in total B lymphocytes.