Discriminating the gene target of a distal regulatory element from other nearby transcribed genes is a challenging problem with the potential to illuminate the causal underpinnings of complex diseases. We present TargetFinder, a computational method that reconstructs regulatory landscapes from diverse features along the genome. The resulting models accurately predict individual enhancer–promoter interactions across multiple cell lines with a false discovery rate up to 15 times smaller than that obtained using the closest gene. By evaluating the genomic features driving this accuracy, we uncover interactions between structural proteins, transcription factors, epigenetic modifications, and transcription that together distinguish interacting from non-interacting enhancer–promoter pairs. Most of this signature is not proximal to the enhancers and promoters but instead decorates the looping DNA. We conclude that complex but consistent combinations of marks on the one-dimensional genome encode the three-dimensional structure of fine-scale regulatory interactions.
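The closest-gene baseline referred to above can be illustrated with a small sketch. It assigns each enhancer to the promoter nearest to it on the linear genome, then measures the false discovery rate of those assignments against a set of known interactions. This is not TargetFinder itself: the function names, coordinates, gene labels, and "true" pairs below are hypothetical toy values chosen for illustration, and the one-dimensional distance measure is a simplifying assumption.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # (enhancer id, promoter id)


def closest_gene(enhancers: Dict[str, int], promoters: Dict[str, int]) -> Set[Pair]:
    """Pair each enhancer with the promoter nearest to it in linear distance."""
    return {
        (enh, min(promoters, key=lambda p: abs(promoters[p] - pos)))
        for enh, pos in enhancers.items()
    }


def false_discovery_rate(predicted: Set[Pair], true: Set[Pair]) -> float:
    """Fraction of predicted pairs that are not true interactions."""
    if not predicted:
        return 0.0
    return len(predicted - true) / len(predicted)


# Hypothetical toy coordinates on a single chromosome (not data from the study).
enhancers = {"e1": 1000, "e2": 5000, "e3": 9000}
promoters = {"gA": 1500, "gB": 4000, "gC": 12000}

# Hypothetical experimentally supported interactions; e2 skips its nearest gene.
true_pairs = {("e1", "gA"), ("e2", "gC"), ("e3", "gC")}

baseline = closest_gene(enhancers, promoters)
baseline_fdr = false_discovery_rate(baseline, true_pairs)
print(sorted(baseline), round(baseline_fdr, 3))
```

In this toy case the baseline wrongly assigns e2 to gB, the nearest promoter, although its true target is the more distant gC. A predictive model is evaluated by computing the same false discovery rate on its own predicted pairs and comparing the two values.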
Supplementary Tables 1–3, Supplementary Figures 1–20 and Supplementary Note.