An integrated encyclopedia of DNA elements in the human genome

Journal name: Nature
Volume: 489
Pages: 57–74
DOI: doi:10.1038/nature11247

Abstract

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

Figures

  1. Impact of selection on ENCODE functional elements in mammals and human populations.

    a, Levels of pan-mammalian constraint (mean GERP score; 24 mammals, ref. 8; x axis) compared to diversity, a measure of negative selection in the human population (mean expected heterozygosity, inverted scale, y axis) for ENCODE data sets. Each point is an average for a single data set. The top-right corners have the strongest evolutionary constraint and lowest diversity. Coding (C), UTR (U), genomic (G), intergenic (IG) and intronic (IN) averages are shown as filled squares. In each case the vertical and horizontal cross hairs show representative levels for the neutral expectation for mammalian conservation and human population diversity, respectively. The spread over all non-exonic ENCODE elements greater than 2.5 kb from TSSs is shown. The inner dashed box indicates that parts of the plot have been magnified for the surrounding outer panels, although the scales in the outer plots provide the exact regions and dimensions magnified. The spread for DHS sites (b) and RNA elements (d) is shown in the plots on the left. RNA elements are either long novel intronic (dark green) or long intergenic (light green) RNAs. The horizontal cross hairs are colour-coded to the relevant data set in d. c, Spread of transcription factor motif instances either in regions bound by the transcription factor (orange points) or in the corresponding unbound motif matches in grey, with bound and unbound points connected with an arrow in each case showing that bound sites are generally more constrained and less diverse. e, Derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data. f, Aggregation of mammalian constraint scores over the glucocorticoid receptor (GR) transcription factor motif in bound sites, showing the expected correlation with the information content of bases in the motif. An interactive version of this figure is available in the online version of the paper.

  2. Modelling transcription levels from histone modification and transcription-factor-binding patterns.

    a, b, Correlative models between either histone modifications or transcription factors, respectively, and RNA production as measured by CAGE tag density at TSSs in K562 cells. In each case the scatter plot shows the output of the correlation models (x axis) compared to observed values (y axis). The bar graphs show the most important histone modifications (a) or transcription factors (b) in both the initial classification phase (top bar graph) and the quantitative regression phase (bottom bar graph), with larger values indicating increasing importance of the variable in the model. Further analysis of other cell lines and RNA measurement types is reported elsewhere (refs 59, 79). AUC, area under curve; Gini, Gini coefficient; RMSE, root mean square error.

  3. Patterns and asymmetry of chromatin modification at transcription-factor-binding sites.

    a, Results of clustered aggregation of H3K27me3 modification signal around CTCF-binding sites (a multifunctional protein involved with chromatin structure). The first three plots (left column) show the signal behaviour of the histone modification over all sites (top) and then split into the high and low signal components. The solid lines show the mean signal distribution by relative position with the blue shaded area delimiting the tenth and ninetieth percentile range. The high signal component is then decomposed further into six different shape classes on the right (see ref. 30 for details). The shape decomposition process is strand aware. b, Summary of shape asymmetry for DNase I, nucleosome and histone modification signals by plotting an asymmetry ratio for each signal over all transcription-factor-binding sites. All histone modifications measured in this study show predominantly asymmetric patterns at transcription-factor-binding sites. An interactive version of this figure is available in the online version of the paper.

  4. Co-association between transcription factors.

    a, Significant co-associations of transcription factor pairs using the GSC statistic across the entire genome in K562 cells. The colour strength represents the extent of association (from red (strongest), orange, to yellow (weakest)), whereas the depth of colour represents the fit to the GSC model (ref. 20), where white indicates that the statistical model is not appropriate, as indicated by the key. Most transcription factors have a nonrandom association to other transcription factors, and these associations are dependent on the genomic context, meaning that once the genome is separated into promoter proximal and distal regions, the overall levels of co-association decrease, but more specific relationships are uncovered. b, Three classes of behaviour are shown. The first column shows a set of associations for which strength is independent of location in promoter and distal regions, whereas the second column shows a set of transcription factors that have stronger associations in promoter-proximal regions. Both of these examples are from data in K562 cells and are highlighted on the genome-wide co-association matrix (a) by the labelled boxes A and B, respectively. The third column shows a set of transcription factors that show stronger association in distal regions (in the H1 hESC line). An interactive version of this figure is available in the online version of the paper.

  5. Integration of ENCODE data by genome-wide segmentation.

    a, Illustrative region with the two segmentation methods (ChromHMM and Segway) in a dense view and the combined segmentation expanded to show each state in GM12878 cells, beneath a compressed view of the GENCODE gene annotations. Note that at this level of zoom and genome browser resolution, some segments appear to overlap although they do not. Segmentation classes are named and coloured according to the scheme in Table 3. Beneath the segmentations are shown each of the normalized signals that were used as the input data for the segmentations. Open chromatin signals from DNase-seq from the University of Washington group (UW DNase) or the ENCODE open chromatin group (Openchrom DNase) and FAIRE assays are shown in blue; signal from histone modification ChIP-seq in red; and transcription factor ChIP-seq signal for Pol II and CTCF in green. The mauve ChIP-seq control signal (input control) at the bottom was also included as an input to the segmentation. b, Association of selected transcription factor (left) and RNA (right) elements in the combined segmentation states (x axis) expressed as an observed/expected ratio (obs./exp.) for each combination of transcription factor or RNA element and segmentation class using the heat-map scale shown in the key beside each heat map. c, Variability of states between cell lines, showing the distribution of occurrences of the state in the six cell lines at specific genome locations: from unique to one cell line to ubiquitous in all six cell lines for five states (CTCF, E, T, TSS and R). d, Distribution of methylation level at individual sites from RRBS analysis in GM12878 cells across the different states, showing the expected hypomethylation at TSSs and hypermethylation of gene bodies (T state) and repressed (R) regions.

  6. Experimental characterization of segmentations.

    Randomly sampled E state segments (see Table 3) from the K562 segmentation were cloned for mouse- and fish-based transgenic enhancer assays. a, Representative LacZ-stained transgenic embryonic day (E)11.5 mouse embryo obtained with construct hs2065 (EN167, chr10: 46052882–46055670, GRCh37). Highly reproducible staining in the blood vessels was observed in 9 out of 9 embryos resulting from independent transgenic integration events. b, Representative green fluorescent protein reporter transgenic medaka fish obtained from a construct with a basal hsp70 promoter on meganuclease-based transfection. Reproducible transgenic expression in the circulating nucleated blood cells and the endothelial cell walls was seen in 81 out of 100 transgenic tests of this construct.

  7. High-resolution segmentation of ENCODE data by self-organizing maps (SOM).

    a–c, The training of the SOM (a) and analysis of the results (b, c) are shown. Initially we arbitrarily placed genomic segments from the ChromHMM segmentation on to the toroidal map surface, although the SOM does not use the ChromHMM state assignments (a). We then trained the map using the signal of the 12 different ChIP-seq and DNase-seq assays in the six cell types analysed. Each unit of the SOM is represented here by a hexagonal cell in a planar two-dimensional view of the toroidal map. Curved arrows indicate that traversing the edges of the two-dimensional view leads back to the opposite edge. The resulting map can be overlaid with any class of ENCODE or other data to view the distribution of that data within this high-resolution segmentation. In panel a the distributions of genome bases across the untrained and trained map (left and right, respectively) are shown using heat-map colours for log10 values. b, The distribution of TSSs from CAGE experiments of GENCODE annotation on the planar representations of either the initial random organization (left) or the final trained SOM (right) using heat maps coloured according to the accompanying scales. The bottom half of b expands the different distributions in the SOM for all expressed TSSs (left) or TSSs specifically expressed in two example cell lines, H1 hESC (centre) and HepG2 (right). c, The association of Gene Ontology (GO) terms on the same representation of the same trained SOM. We assigned genes that are within 20 kb of a genomic segment in a SOM unit to that unit, and then associated this set of genes with GO terms using a hypergeometric distribution after correcting for multiple testing. Map units that are significantly associated to GO terms are coloured green, with increasing strength of colour reflecting increasing numbers of genes significantly associated with the GO terms for either immune response (left) or sequence-specific transcription factor activity (centre). In each case, specific SOM units show association with these terms. The right-hand panel shows the distribution on the same SOM of all significantly associated GO terms, now colouring by GO term count per SOM unit. For sequence-specific transcription factor activity, two example genomic regions are extracted at the bottom of panel c from neighbouring SOM units. These are regions around the DBX1 (from SOM unit 26,31, left panel) and IRX6 (SOM unit 27,30, right panel) genes, respectively, along with their H3K27me3 ChIP-seq signal for each of the tier 1 and 2 cell types. For DBX1, representative of a set of primarily neuronal transcription factors associated with unit 26,31, there is a repressive H3K27me3 signal in both H1 hESCs and HUVECs; for IRX6, representative of a set of body patterning transcription factors associated with SOM unit 27,30, the repressive mark is restricted largely to the embryonic stem (ES) cell. An interactive version of this figure is available in the online version of the paper.

  8. Allele-specific ENCODE elements.

    a, Representative allele-specific information from GM12878 cells for selected assays around the first exon of the NACC2 gene (genomic region Chr9: 138950000–138995000, GRCh37). Transcription signal is shown in green, and the three sections show allele-specific data for three data sets (POLR2A, H3K79me2 and H3K27me3 ChIP-seq). In each case the purple signal is the processed signal for all sequence reads for the assay, whereas the blue and red signals show sequence reads specifically assigned to either the paternal or maternal copies of the genome, respectively. The set of common SNPs from dbSNP, including the phased, heterozygous SNPs used to provide the assignment, are shown at the bottom of the panel. NACC2 has a statistically significant paternal bias for POLR2A and the transcription-associated mark H3K79me2, and has a significant maternal bias for the repressive mark H3K27me3. b, Pair-wise correlations of allele-specific signal within single genes (below the diagonal) or within individual ChromHMM segments across the whole genome for selected DNase-seq and histone modification and transcription factor ChIP-seq assays. The extent of correlation is coloured according to the heat-map scale indicated from positive correlation (red) through to anti-correlation (blue). An interactive version of this figure is available in the online version of the paper.

  9. Examining ENCODE elements on a per individual basis in the normal and cancer genome.

    a, Breakdown of variants in a single genome (NA12878) by both frequency (common or rare (that is, variants not present in the low-coverage sequencing of 179 individuals in the pilot 1 European panel of the 1000 Genomes project; ref. 55)) and by ENCODE annotation, including protein-coding gene and non-coding elements (GENCODE annotations for protein-coding genes, pseudogenes and other ncRNAs, as well as transcription-factor-binding sites from ChIP-seq data sets, excluding broad annotations such as histone modifications, segmentations and RNA-seq). Annotation status is further subdivided by predicted functional effect, being non-synonymous and missense mutations for protein-coding regions and variants overlapping bound transcription factor motifs for non-coding element annotations. A substantial proportion of variants are annotated as having predicted functional effects in the non-coding category. b, One of several relatively rare occurrences, where alignment to an individual genome sequence (paternal and maternal panels) shows a different readout from the reference genome. In this case, a paternal-haplotype-specific CTCF peak is identified. c, Relative level of somatic variants from a whole-genome melanoma sample that occur in DHSs unique to different cell lines. The coloured bars show cases that are significantly enriched or suppressed in somatic mutations. Details of ENCODE cell types can be found at http://encodeproject.org/ENCODE/cellTypes.html. An interactive version of this figure is available in the online version of the paper.

  10. Comparison of genome-wide-association-study-identified loci with ENCODE data.

    a, Overlap of lead SNPs in the NHGRI GWAS SNP catalogue (June 2011) with DHSs (left) or transcription-factor-binding sites (right) as red bars compared with various control SNP sets in blue. The control SNP sets are (from left to right): SNPs on the Illumina 2.5M chip as an example of a widely used GWAS SNP typing panel; SNPs from the 1000 Genomes project; SNPs extracted from 24 personal genomes (see personal genome variants track at http://main.genome-browser.bx.psu.edu (ref. 80)), all shown as blue bars. In addition, a further control used 1,000 randomizations from the genotyping SNP panel, matching the SNPs with each NHGRI catalogue SNP for allele frequency and distance to the nearest TSS (light blue bars with bounds at 1.5 times the interquartile range). For both DHSs and transcription-factor-binding regions, a larger proportion of overlaps with GWAS-implicated SNPs is found compared to any of the control sets. b, Aggregate overlap of phenotypes to selected transcription-factor-binding sites (left matrix) or DHSs in selected cell lines (right matrix), with a count of overlaps between the phenotype and the cell line/factor. Values in blue squares pass an empirical P-value threshold ≤0.01 (based on the same analysis of overlaps between randomly chosen, GWAS-matched SNPs and these epigenetic features) and have at least a count of three overlaps. The P value for the total number of phenotype–transcription factor associations is <0.001. c, Several SNPs associated with Crohn’s disease and other inflammatory diseases that reside in a large gene desert on chromosome 5, along with some epigenetic features indicative of function. The SNP (rs11742570) strongly associated with Crohn’s disease overlaps a GATA2 transcription-factor-binding signal determined in HUVECs. This region is also DNase I hypersensitive in HUVECs and T-helper TH1 and TH2 cells. An interactive version of this figure is available in the online version of the paper.
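
The empirical P value mentioned in the Figure 10 caption (overlap of GWAS SNPs with annotated regions, compared against randomized control SNP sets) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the interval and SNP coordinates are toy values, the function names are hypothetical, control SNPs are drawn uniformly from a pool rather than matched for allele frequency and distance to the nearest TSS as the caption describes, and the add-one correction in the P value is a common convention rather than something stated in the source.

```python
import bisect
import random


def count_overlaps(positions, intervals):
    """Count positions that fall inside any interval.

    `intervals` must be sorted, non-overlapping, half-open [start, end).
    """
    starts = [start for start, _ in intervals]
    hits = 0
    for pos in positions:
        # Index of the last interval whose start is <= pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            hits += 1
    return hits


def empirical_p_value(observed, null_counts):
    """Share of null draws with a count >= observed (add-one corrected)."""
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)


# Toy data (hypothetical coordinates on a single chromosome).
intervals = [(100, 200), (500, 650), (900, 1000)]  # e.g. DHS regions
lead_snps = [120, 510, 640, 950, 3000]             # e.g. GWAS lead SNPs
control_pool = list(range(0, 5000, 7))             # e.g. genotyping-panel SNPs

rng = random.Random(0)  # fixed seed so the sketch is reproducible
observed = count_overlaps(lead_snps, intervals)

# 1,000 randomizations: each draws as many control SNPs as there are lead SNPs.
# A faithful version would match each draw on allele frequency and TSS distance.
null_counts = [
    count_overlaps(rng.sample(control_pool, len(lead_snps)), intervals)
    for _ in range(1000)
]
p_value = empirical_p_value(observed, null_counts)
print(f"observed overlaps: {observed}, empirical P: {p_value:.4f}")
```

With 1,000 randomizations the smallest P value this estimator can report is 1/1001, which is why thresholds such as ≤0.01 are only meaningful when the number of randomizations is large relative to the threshold.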

References

  1. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004)
  2. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007)
  3. The ENCODE Project Consortium. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS Biol. 9, e1001046 (2011)
  4. Mouse Genome Sequencing Consortium. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002)
  5. Chiaromonte, F. et al. The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harb. Symp. Quant. Biol. 68, 245–254 (2003)
  6. Cooper, G. M. et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 15, 901–913 (2005)
  7. Parker, S. C., Hansen, L., Abaan, H. O., Tullius, T. D. & Margulies, E. H. Local DNA topography correlates with functional noncoding regions of the human genome. Science 324, 389–392 (2009)
  8. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011)
  9. Pheasant, M. & Mattick, J. S. Raising the estimate of functional human sequences. Genome Res. 17, 1245–1253 (2007)
  10. Ponting, C. P. & Hardison, R. C. What fraction of the human genome is functional? Genome Res. 21, 1769–1776 (2011)
  11. Asthana, S. et al. Widely distributed noncoding purifying selection in the human genome. Proc. Natl Acad. Sci. USA 104, 12410–12415 (2007)
  12. Landt, S. G. et al. ChIP-seq guidelines and practices used by the ENCODE and modENCODE consortia. Genome Res. http://dx.doi.org/10.1101/gr.136184.111 (2012)
  13. Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011)
  14. Harrow, J. et al. GENCODE: The reference human genome annotation for the ENCODE project. Genome Res. http://dx.doi.org/10.1101/gr.135350.111 (2012)
  15. Howald, C. et al. Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome. Genome Res. http://dx.doi.org/10.1101/gr.134478.111 (2012)
  16. Djebali, S. et al. Landscape of transcription in human cells. Nature http://dx.doi.org/10.1038/nature11233 (this issue)
  17. Derrien, T. et al. The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res. http://dx.doi.org/10.1101/gr.132159.111 (2012)
  18. Pei, B. et al. The GENCODE pseudogene resource. Genome Biol. 13, R51 (2012)
  19. Gerstein, M. B. et al. Architecture of the human regulatory network derived from ENCODE data. Nature http://dx.doi.org/10.1038/nature11245 (this issue)
  20. Bickel, P. J., Boley, N., Brown, J. B., Huang, H. Y. & Zhang, N. R. Subsampling methods for genomic inference. Ann. Appl. Stat. 4, 1660–1697 (2010)
  21. Kaplan, T. et al. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS Genet. 7, e1001290 (2011)
  22. Li, X. Y. et al. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome Biol. 12, R34 (2011)
  23. Pique-Regi, R. et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 21, 447–455 (2011)
  24. Zhang, Y. et al. Primary sequence and epigenetic determinants of in vivo occupancy of genomic DNA by GATA1. Nucleic Acids Res. 37, 7024–7038 (2009)
  25. Neph, S. et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature http://dx.doi.org/10.1038/nature11212 (this issue)
  26. Whitfield, T. W. et al. Functional analysis of transcription factor binding sites in human promoters. Genome Biol. 13, R50 (2012)
  27. Gross, D. S. & Garrard, W. T. Nuclease hypersensitive sites in chromatin. Annu. Rev. Biochem. 57, 159–197 (1988)
  28. Urnov, F. D. Chromatin remodeling as a guide to transcriptional regulatory networks in mammals. J. Cell. Biochem. 88, 684–694 (2003)
  29. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature http://dx.doi.org/10.1038/nature11232 (this issue)
  30. Kundaje, A. et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome Res. http://dx.doi.org/10.1101/gr.136366.111 (2012)
  31. Schultz, D. C., Ayyanathan, K., Negorev, D., Maul, G. G. & Rauscher, F. J., III SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes Dev. 16, 919–932 (2002)
  32. Frietze, S., O’Geen, H., Blahnik, K. R., Jin, V. X. & Farnham, P. J. ZNF274 recruits the histone methyltransferase SETDB1 to the 3′ ends of ZNF genes. PLoS ONE 5, e15082 (2010)
  33. Boyle, A. P. et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome Res. 21, 456–464 (2011)
  34. Hesselberth, J. R. et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature Methods 6, 283–289 (2009)
  35. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008)
  36. Kouzarides, T. Chromatin modifications and their function. Cell 128, 693–705 (2007)
  37. Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707–719 (2007)
  38. Hon, G. C., Hawkins, R. D. & Ren, B. Predictive chromatin signatures in the mammalian genome. Hum. Mol. Genet. 18, R195–R201 (2009)
  39. Zhou, V. W., Goren, A. & Bernstein, B. E. Charting histone modifications and the functional organization of mammalian genomes. Nature Rev. Genet. 12, 7–18 (2011)
  40. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011)
  41. Hon, G., Wang, W. & Ren, B. Discovery and annotation of functional chromatin signatures in the human genome. PLoS Comput. Biol. 5, e1000566 (2009)
  42. Ball, M. P. et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nature Biotechnol. 27, 361–368 (2009)
  43. Meissner, A. et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766–770 (2008)
  44. Ogryzko, V. V., Schiltz, R. L., Russanova, V., Howard, B. H. & Nakatani, Y. The transcriptional coactivators p300 and CBP are histone acetyltransferases. Cell 87, 953–959 (1996)
  45. Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009)
  46. Dekker, J. Gene regulation in the third dimension. Science 319, 1793–1794 (2008)
  47. Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006)
  48. Lajoie, B. R., van Berkum, N. L., Sanyal, A. & Dekker, J. My5C: web tools for chromosome conformation capture studies. Nature Methods 6, 690–691 (2009)
  49. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. The long-range interaction landscape of gene promoters. Nature http://dx.doi.org/10.1038/nature11279 (this issue)
  50. Fullwood, M. J. et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58–64 (2009)
  51. Li, G. et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell 148, 84–98 (2012)
  52. Borneman, A. R. et al. Divergence of transcription factor binding sites across related yeast species. Science 317, 815–819 (2007)
  53. Odom, D. T. et al. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nature Genet. 39, 730–732 (2007)
  54. Schmidt, D. et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science 328, 1036–1040 (2010)
  55. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010)
  56. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975)
  57. Spivakov, M. et al. Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome Biol. 13, R49 (2012)
  58. Sandelin, A. et al. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nature Rev. Genet. 8, 424–436 (2007)
  59. Dong, X. et al. Modeling gene expression using chromatin features in various cellular contexts. Genome Biol. 13, R53 (2012)
  60. Huff, J. T., Plocik, A. M., Guthrie, C. & Yamamoto, K. R. Reciprocal intronic and exonic histone modification regions in humans. Nature Struct. Mol. Biol. 17, 1495–1499 (2010)
  61. Tilgner, H. et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res. http://dx.doi.org/10.1101/gr.134445.111 (2012)
  62. Fu, Y., Sinha, M., Peterson, C. L. & Weng, Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS Genet. 4, e1000138 (2008)
  63. Kornberg, R. D. & Stryer, L. Statistical distributions of nucleosomes: nonrandom locations by a stochastic mechanism. Nucleic Acids Res. 16, 6677–6690 (1988)
  64. Schones, D. E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887–898 (2008)
  65. Valouev, A. et al. Determinants of nucleosome organization in primary human cells. Nature 474, 516–520 (2011)
  66. Frietze, S. et al. Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Genome Biol. 13, R52 (2012)
  67. Yip, K. Y. et al. Classification of human genomic regions based on experimentally-determined binding sites of more than 100 transcription-related factors. Genome Biol. 13, R48 (2012)
  68. Hoffman, M. M. et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature Methods 9, 473–476 (2012)
  69. Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007)
  70. Koch, F. et al. Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature Struct. Mol. Biol. 18, 956–963 (2011)
  71. McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nature Biotechnol. 28, 495–501 (2010)
  72. Rozowsky, J. et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 7, 522 (2011)
  73. Boyle, A. P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. http://dx.doi.org/10.1101/gr.137323.112 (2012)
  74. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009)
  75. Schaub, M. A., Boyle, A. P., Kundaje, A., Batzoglou, S. & Snyder, M. Linking disease associations with regulatory information in the human genome. Genome Res. http://dx.doi.org/10.1101/gr.136127.111 (2012)
  76. Libioulle, C. et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS Genet. 3, e58 (2007)
  77. Vernot, B. et al. Personal and population genomics of human regulatory variation. Genome Res. http://dx.doi.org/10.1101/gr.134890.111 (2012)
  78. Harismendy, O. et al. 9p21 DNA variants associated with coronary artery disease impair interferon-γ signalling response. Nature 470, 264–268 (2011)
  79. Cheng, C. et al. Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome Res. http://dx.doi.org/10.1101/gr.136838.111 (2012)
  80. Schuster, S. C. et al. Complete Khoisan and Bantu genomes from southern Africa. Nature 463, 943–947 (2010)

Author information

Affiliations

  1. Vertebrate Genomics Group, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    • Ian Dunham,
    • Ewan Birney,
    • Javier Herrero,
    • Steven P. Wilder,
    • Damian Keefe,
    • Kathryn Beal,
    • Paul Flicek,
    • Nathan Johnson &
    • Daniel Sobral
  2. Department of Computer Science, Stanford University, 318 Campus Drive, Stanford, California 94305-5428, USA.

    • Anshul Kundaje,
    • Serafim Batzoglou,
    • Nadine Hussami,
    • Sofia Kyriazopoulou-Panagiotopoulou,
    • Max W. Libbrecht &
    • Marc A. Schaub
  3. SwitchGear Genomics, 1455 Adams Drive Suite 1317, Menlo Park, California 94025, USA.

    • Shelley F. Aldred,
    • Patrick J. Collins &
    • Nathan D. Trinklein
  4. Functional Genomics, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA.

    • Carrie A. Davis,
    • Felix Schlesinger,
    • Thomas R. Gingeras,
    • Alex Dobin,
    • Wei Lin,
    • Chenghai Xue,
    • Chris Zaleski,
    • Michael T. Baer,
    • Philippe Batut,
    • Kimberly Bell,
    • Sudipto Chakrabortty,
    • Jorg Drenkow,
    • Megan Fastuca,
    • Kata Fejes-Toth,
    • Assaf Gordon,
    • Sonali Jha,
    • Jonathan B. Preall,
    • Kimberly Presaud,
    • Lei-Hoon See,
    • Huaien Wang &
    • Gregory J. Hannon
  5. College of Nanoscale Sciences and Engineering, University at Albany-SUNY, 257 Fuller Road, NFE 4405, Albany, New York 12203, USA.

    • Francis Doyle &
    • Scott A. Tenenbaum
  6. Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.

    • Charles B. Epstein,
    • Noam Shoresh,
    • Bradley E. Bernstein,
    • Tarjei S. Mikkelsen,
    • Alon Goren,
    • Oren Ram,
    • Xiaolan Zhang,
    • Li Wang,
    • Robbyn Issner,
    • Michael J. Coyne,
    • Timothy Durham,
    • Manching Ku &
    • Thanh Truong
  7. Biochemistry and Molecular Biology, USC/Norris Comprehensive Cancer Center, 1450 Biggy Street, NRT 6503, Los Angeles, California 90089, USA.

    • Seth Frietze,
    • Peggy J. Farnham &
    • Heather Witt
  8. Informatics, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, UK.

    • Jennifer Harrow,
    • Timothy J. Hubbard,
    • Felix Kokocinski,
    • Bronwen Aken,
    • Daniel Barrell,
    • Gemma Barson,
    • Andrew Berry,
    • Alexandra Bignell,
    • Veronika Boychenko,
    • Claire Davidson,
    • Gloria Despacio-Reyes,
    • Adam Frankish,
    • James Gilbert,
    • Jose Manuel Gonzalez,
    • Ed Griffiths,
    • Toby Hunt,
    • Mike Kay,
    • Jane Loveland,
    • Deepa Manthravadi,
    • Jonathan Mudge,
    • Gaurab Mukherjee,
    • Gary Saunders,
    • Stephen Searle,
    • Catherine Snow,
    • Charlie Steward,
    • Electra Tapanari,
    • Laurens Wilming &
    • Amonida Zadissa
  9. Department of Medicine, Division of Medical Genetics, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195, USA.

    • Rajinder Kaul
  10. College of Arts and Sciences, Boise State University, 1910 University Drive, Boise, Idaho 83725, USA.

    • Jainab Khatun,
    • Morgan C. Giddings,
    • John Wrobel &
    • Brian A. Risk
  11. Program in Systems Biology, Program in Gene Function and Expression, Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA.

    • Bryan R. Lajoie,
    • Amartya Sanyal,
    • Job Dekker &
    • Gaurav Jain
  12. Department of Genetics, Stanford University, 300 Pasteur Drive, M-344, Stanford, California 94305-5120, USA.

    • Stephen G. Landt,
    • Michael Snyder,
    • Nick Addleman,
    • Keith Bettinger,
    • Alan P. Boyle,
    • Philip Cayting,
    • Yong Cheng,
    • Catharine Eastman,
    • Ghia Euskirchen,
    • Fabian Grubert,
    • Manoj Hariharan,
    • Konrad J. Karczewski,
    • Maya Kasowski,
    • Phil Lacroute,
    • Hugo Lam,
    • Zhengqing Ouyang,
    • Dorrelyn Patacsil,
    • Lucia Ramirez,
    • Minyi Shi,
    • Teri Slifer,
    • Linfeng Wu &
    • Xinqiong Yang
  13. Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Section of Molecular Genetics and Microbiology, The University of Texas at Austin, 1 University Station A4800, Austin, Texas 78712, USA.

    • Bum-Kyu Lee,
    • Anna Battenhouse,
    • Akshay A. Bhinge,
    • Zheng Liu,
    • Ryan M. McDaniell,
    • Yunyun Ni &
    • Vishwanath R. Iyer
  14. HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, Alabama 35806, USA.

    • Florencia Pauli,
    • Timothy E. Reddy,
    • Chris Gunter,
    • Richard M. Myers,
    • Jason Gertz,
    • E. Christopher Partridge,
    • Katherine E. Varley,
    • Anita Bansal,
    • Preti Jain,
    • Kevin M. Bowling,
    • Marie K. Cross,
    • Michael A. Muratet,
    • Kimberly M. Newberry,
    • Amy S. Nesmith,
    • Barbara Pusey,
    • Stephanie L. Parker,
    • Nicholas S. Davis,
    • Sarah K. Meadows,
    • Tracy Eggleston,
    • J. Scott Newberry,
    • Shawn E. Levy &
    • Devin M. Absher
  15. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, 1156 High Street, Santa Cruz, California 95064, USA.

    • Kate R. Rosenbloom,
    • W. James Kent,
    • Cricket A. Sloan,
    • Katrina Learned,
    • Venkat S. Malladi,
    • Matthew C. Wong,
    • Galt P. Barber,
    • Melissa S. Cline,
    • Timothy R. Dreszer,
    • Steven G. Heitner,
    • Donna Karolchik,
    • Vanessa M. Kirkup,
    • Laurence R. Meyer,
    • Jeffrey C. Long,
    • Morgan Maddren,
    • Brian J. Raney,
    • Mark Diekhans &
    • Rachel Harte
  16. Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, Washington 98195-5065, USA.

    • Peter Sabo,
    • Michael M. Hoffman,
    • Robert E. Thurman,
    • Daniel L. Bates,
    • Theresa K. Canfield,
    • Morgan J. Diegel,
    • Douglas Dunn,
    • Erica Gist,
    • Eric Haugen,
    • Richard Humbert,
    • Audra K. Johnson,
    • Tattyana V. Kutyavin,
    • Kristen Lee,
    • Matthew T. Maurano,
    • Shane J. Neph,
    • Fiedencio V. Neri,
    • Hongzhu Qu,
    • Alex P. Reynolds,
    • Vaughn Roach,
    • Eric Rynes,
    • Richard S. Sandstrom,
    • Anthony O. Shafer,
    • Andrew B. Stergachis,
    • Sean Thomas,
    • Benjamin Vernot,
    • Jeff Vierstra,
    • Shinny Vong,
    • Hao Wang,
    • Molly A. Weaver,
    • Joshua M. Akey,
    • Michael J. MacCoss,
    • William Stafford Noble,
    • Orion J. Buske &
    • Avinash D. Sahu
  17. Institute for Genome Sciences and Policy, Duke University, 101 Science Drive, Durham, North Carolina 27708, USA.

    • Alexias Safi,
    • Lingyun Song,
    • Gregory E. Crawford,
    • Nathan C. Sheffield,
    • Darin London,
    • Tianyuan Wang &
    • Deborah Winter
  18. Department of Biology, Carolina Center for Genome Sciences, and Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, 408 Fordham Hall, Chapel Hill, North Carolina 27599-3280, USA.

    • Jeremy M. Simon,
    • Jason D. Lieb,
    • Linda L. Grasfeder,
    • Paul G. Giresi,
    • Kimberly A. Showers,
    • Christopher Shestak,
    • Matthew R. Schaner,
    • Seul Ki Kim,
    • Zhuzhu Z. Zhang,
    • Joanna O. Mieczkowska,
    • Min Jae Kim &
    • Sheera Adar
  19. Computer Science and Artificial Intelligence Laboratory, Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, Massachusetts 02139, USA.

    • Robert C. Altshuler,
    • Jason Ernst,
    • Manolis Kellis,
    • Pouya Kheradpour,
    • Lucas D. Ward,
    • Matthew L. Eaton,
    • David A. Hendrix,
    • Irwin Jungreis,
    • Michael F. Lin &
    • Stefan Washietl
  20. Department of Statistics, University of California, Berkeley, 367 Evans Hall, University of California, Berkeley, Berkeley, California 94720, USA.

    • James B. Brown,
    • Qunhua Li,
    • Peter J. Bickel,
    • Balazs Banfai,
    • Nathan P. Boley,
    • Haiyan Huang &
    • Jingyi Jessica Li
  21. Computational Biology and Bioinformatics Program, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06520, USA.

    • Chao Cheng,
    • Mark Gerstein,
    • Joel Rozowsky,
    • Kevin Y. Yip,
    • Alexej Abyzov,
    • Ekta Khurana,
    • Jing Leng,
    • Baikang Pei,
    • Cristina Sisu,
    • Roger P. Alexander,
    • Raymond K. Auerbach,
    • Suganthi Balasubramanian,
    • Nitin Bhardwaj,
    • Lukas Habegger,
    • Arif Harmanci,
    • Renqiang Min,
    • Xinmeng J. Mu,
    • Koon-Kiu Yan &
    • Lucas Lochovsky
  22. Bioinformatics and Genomics, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona 08003, Catalonia, Spain.

    • Sarah Djebali,
    • Angelika Merkel,
    • Roderic Guigó,
    • Andrea Tanzer,
    • Julien Lagarde,
    • Maik Röder,
    • Tyler Alioto,
    • Joao Curado,
    • Thomas Derrien,
    • Pedro Ferreira,
    • David Gonzalez,
    • Rory Johnson,
    • Colin Kingswood,
    • Paolo Ribeca,
    • Michael Sammeth,
    • Jorgen Skancke,
    • Hagen Tilgner,
    • Giovanni Bussotti,
    • Marco Mariotti &
    • Cedric Notredame
  23. Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA.

    • Xianjun Dong,
    • Melissa Greven,
    • Xinying Lin,
    • Jie Wang,
    • Troy W. Whitfield,
    • Jiali Zhuang &
    • Zhiping Weng
  24. Department of Genetics, The University of North Carolina at Chapel Hill, 120 Mason Farm Road, CB 7240, Chapel Hill, North Carolina 27599, USA.

    • Terrence S. Furey &
    • Zhancheng Zhang
  25. Center for Comparative Genomics and Bioinformatics, The Pennsylvania State University, Wartik Laboratory, University Park, Pennsylvania 16802, USA.

    • Belinda Giardine,
    • Ross C. Hardison,
    • Robert S. Harris,
    • Weisheng Wu &
    • Webb Miller
  26. Department of Biochemistry and Molecular Biology, The Pennsylvania State University, 304 Wartik Laboratory, University Park, Pennsylvania 16802, USA.

    • Ross C. Hardison
  27. Program in Bioinformatics, Boston University, 24 Cummington Street, Boston, Massachusetts 02215, USA.

    • Sowmya Iyer
  28. RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan.

    • Timo Lassmann,
    • Rehab F. Abdelhamid,
    • Ana Maria Suzuki,
    • Hazuki Takahashi,
    • Yoshihide Hayashizaki &
    • Piero Carninci
  29. Division of Biology, California Institute of Technology, 156-29, 1200 East California Boulevard, Pasadena, California 91125, USA.

    • Georgi K. Marinov,
    • Barbara Wold,
    • Brian A. Williams,
    • Igor Antoshechkin,
    • Brandon King,
    • Lorain Schaeffer,
    • Diane Trout,
    • Jost Vielmetter,
    • Clarke Gasper,
    • Shirley Pepke,
    • Henry Amrhein,
    • Michael Anaya,
    • Kenneth McCue,
    • Katherine I. Fisher-Aylor,
    • Gilberto DeSalvo &
    • Sreeram Balasubramanian
  30. Developmental and Cell Biology and Center for Complex Biological Systems, University of California Irvine, 2218 Biological Sciences III, Irvine, California 92697-2300, USA.

    • Ali Mortazavi &
    • Eddie Park
  31. Genome Technology Branch, National Human Genome Research Institute, 5625 Fishers Lane, Bethesda, Maryland 20892, USA.

    • Stephen C. J. Parker &
    • Elliott H. Margulies
  32. Department of Biochemistry and Molecular Pharmacology, Bioinformatics Core, University of Massachusetts Medical School, 364 Plantation Street, Worcester, Massachusetts 01605, USA.

    • Hualin S. Xi
  33. Howard Hughes Medical Institute and Department of Pathology, Massachusetts General Hospital and Harvard Medical School, 185 Cambridge St CPZN 8400, Boston, Massachusetts 02114, USA.

    • Bradley E. Bernstein,
    • Shawn Gillespie,
    • Alon Goren,
    • Oren Ram &
    • Manching Ku
  34. National Human Genome Research Institute, National Institutes of Health, 31 Center Drive, Building 31, Room 4B09, Bethesda, Maryland 20892-2152, USA.

    • Eric D. Green
  35. National Human Genome Research Institute, National Institutes of Health, 5635 Fishers Lane, Bethesda, Maryland 20892-9307, USA.

    • Michael J. Pazin,
    • Rebecca F. Lowdon,
    • Laura A. L. Dillon,
    • Leslie B. Adams,
    • Caroline J. Kelly,
    • Julia Zhang,
    • Judith R. Wexler,
    • Peter J. Good &
    • Elise A. Feingold
  36. Department of Pediatrics, Division of Medical Genetics, Duke University School of Medicine, Durham, North Carolina 27710, USA.

    • Gregory E. Crawford
  37. National Human Genome Research Institute, National Institutes of Health, 5625 Fishers Lane, Rockville, Maryland 20892, USA.

    • Laura Elnitski &
    • Hanna M. Petrykowska
  38. Affymetrix, Inc., 3380 Central Expressway, Santa Clara, California 95051, USA.

    • Thomas R. Gingeras
  39. Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra, Barcelona, Catalonia 08002, Spain.

    • Roderic Guigó
  40. Department of Genome Sciences, Box 355065, and Department of Medicine, Division of Oncology, Box 358081, University of Washington, Seattle, Washington 98195-5065, USA.

    • John A. Stamatoyannopoulos
  41. Institute for Genomics and Systems Biology, The University of Chicago, 900 East 57th Street, 10100 KCBD, Chicago, Illinois 60637, USA.

    • Kevin P. White,
    • Subhradip Karmakar,
    • Raj R. Bhanvadia,
    • Alina Choudhury,
    • Marc Domanus,
    • Lijia Ma,
    • Jennifer Moran &
    • Alec Victorsen
  42. Beckman Institute, California Institute of Technology, 156-29 1200 E. California Boulevard, Pasadena, California 91125, USA.

    • Barbara Wold,
    • Jost Vielmetter,
    • Clarke Gasper,
    • Michael Anaya,
    • Katherine I. Fisher-Aylor,
    • Gilberto DeSalvo &
    • Sreeram Balasubramanian
  43. Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Campus Box 7260, 120 Mason Farm Road, 3010 Genetic Medicine Building, Chapel Hill, North Carolina 27599, USA.

    • Yanbao Yu,
    • Harsha P. Gunawardena,
    • Heather C. Kuiper,
    • Christopher W. Maier,
    • Ling Xie &
    • Xian Chen
  44. Centro Nacional de Análisis Genómico (CNAG), C/Baldiri Reixac 4, Torre I, Barcelona, Catalonia 08028, Spain.

    • Tyler Alioto,
    • Colin Kingswood,
    • Paolo Ribeca &
    • Michael Sammeth
  45. Genomics, Affymetrix, Inc., 3380 Central Expressway, Santa Clara, California 95051, USA.

    • Ian Bell,
    • Erica Dumais,
    • Jackie Dumais,
    • Radha Duttagupta,
    • Sylvain Foissac,
    • Hui Gao &
    • Philipp Kapranov
  46. Center for Integrative Genomics, University of Lausanne, Genopode Building, 1015 Lausanne, Switzerland.

    • Jacqueline Chrast,
    • Cédric Howald,
    • Nathalie Walters &
    • Alexandre Reymond
  47. Genome Technology and Biology, Genome Institute of Singapore, 60 Biopolis Street, 02-01, Genome, Singapore 138672, Singapore.

    • Melissa J. Fullwood,
    • Oscar J. Luo,
    • Xiaoan Ruan,
    • Kuljeet Singh Sandhu,
    • Atif Shahab,
    • Yijun Ruan,
    • Meizhen Zheng &
    • Ping Wang
  48. Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, 02-01, Genome, Singapore 138672, Singapore.

    • Guoliang Li
  49. Department of Genetic Medicine and Development, University of Geneva Medical School, and University Hospitals of Geneva, 1 rue Michel-Servet, 1211 Geneva 4, Switzerland.

    • Daniel Robyr &
    • Stylianos E. Antonarakis
  50. Department of Genetics, The University of North Carolina at Chapel Hill, 5078 GMB, Chapel Hill, North Carolina 27599-7264, USA.

    • Piotr A. Mieczkowski
  51. Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, 408 Fordham Hall, Chapel Hill, North Carolina 27599-7445, USA.

    • Naim U. Rashid
  52. Center for Advanced Computing Research, California Institute of Technology, MC 158-79, 1200 East California Boulevard, Pasadena, California 91125, USA.

    • Shirley Pepke
  53. Department of Statistics, Stanford University, Sequoia Hall, 390 Serra Mall, Stanford, California 94305-4065, USA.

    • Wing H. Wong
  54. DOE Joint Genome Institute, Walnut Creek, California, USA.

    • Matthew J. Blow,
    • Axel Visel &
    • Len A. Pennachio
  55. Genomics Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, MS 84-171, Berkeley, California 94720, USA.

    • Axel Visel &
    • Len A. Pennachio
  56. Structural Computational Biology, Spanish National Cancer Research Centre (CNIO), Melchor Fernandez Almagro, 3, 28029 Madrid, Spain.

    • Iakes Ezkurdia,
    • Jose Manuel Rodriguez,
    • Michael L. Tress &
    • Alfonso Valencia
  57. School of Life Sciences, Tsinghua University, 100084 Beijing, China.

    • Zhi Lu
  58. Department of Pathology and Laboratory Medicine, Institute for Computational Biomedicine, Weill Cornell Medical College, 1305 York Avenue, Box 140, New York, New York 10065, USA.

    • Andrea Sboner
  59. Computer Science and Engineering, Washington University in St Louis, St Louis, Missouri 63130, USA.

    • Marijke J. van Baren &
    • Michael Brent
  60. Department of Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Room 353A, Bronx, New York 10461, USA.

    • Zhengdong Zhang
  61. Center for Biomolecular Science and Engineering, Howard Hughes Medical Institute, University of California, Santa Cruz, 1156 High Street, Santa Cruz, California 95064, USA.

    • David Haussler
  62. Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, California 95616, USA.

    • Alina R. Cao,
    • Henriette O’Geen &
    • Xiaoqin Xu
  63. Department of Molecular, Cellular, and Developmental Biology, Yale University, 266 Whitney Avenue, New Haven, Connecticut 06511, USA.

    • Alexandra Charos,
    • Hannah Monahan,
    • Debasish Raha &
    • Brian Reed
  64. Biological Chemistry and Molecular Pharmacology, Harvard Medical School, 240 Longwood Avenue, Boston, Massachusetts 02115, USA.

    • Joseph D. Fleming,
    • Nathan Lamarre-Vincent,
    • Marianne Lindahl-Allen,
    • Benoit Miotto,
    • Zarmik Moqtaderi &
    • Kevin Struhl
  65. Biochemistry and Molecular Biology, University of Southern California, 1501 San Pablo Street, Los Angeles, California 90089, USA.

    • Sushma Iyengar
  66. Department of Biomedical Informatics, Ohio State University, 3172C Graves Hall, 333 W Tenth Avenue, Columbus, Ohio 43210, USA.

    • Victor X. Jin
  67. Department of Genetics, Yale University, Yale University School of Medicine, 333 Cedar Street, New Haven, Connecticut 06510, USA.

    • Jin Lian &
    • Sherman M. Weissman
  68. Department of Cellular and Structural Biology, Children’s Cancer Research Institute–UTHSCSA, Mail code 7784, 7703 Floyd Curl Dr, San Antonio, Texas 78229, USA.

    • Luiz O. Penalva
  69. Centre for Organismal Studies (COS) Heidelberg, University of Heidelberg, Im Neuenheimer Feld 230, 69120 Heidelberg, Germany.

    • Thomas Auer,
    • Lazaro Centanin,
    • Michael Eichenlaub,
    • Franziska Gruhl,
    • Stephan Heermann,
    • Burkhard Hoeckendorf,
    • Daigo Inoue,
    • Tanja Kellner,
    • Stephan Kirchmaier,
    • Claudia Mueller,
    • Robert Reinhardt,
    • Lea Schertel,
    • Stephanie Schneider,
    • Rebecca Sinn,
    • Beate Wittbrodt &
    • Jochen Wittbrodt
  70. Basic Sciences Division, Fred Hutchinson Cancer Research Center, 825 Eastlake Avenue East, Seattle, Washington 98109, USA.

    • Gayathri Balasundaram,
    • Rachel Byron,
    • Miaohua Zhang,
    • Michael Bender &
    • Mark Groudine
  71. Department of Medicine, Division of Medical Genetics, Box 357720, University of Washington, Seattle, Washington 98195-7720, USA.

    • Abigail K. Ebersol,
    • Tristan Frum,
    • R. Scott Hansen,
    • Lisa Boatman,
    • Ericka M. Johnson,
    • Dimitra Lotakis,
    • Eric D. Nguyen,
    • Minerva E. Sanchez,
    • Yongqi Yan,
    • Patrick Navas &
    • George Stamatoyannopoulos
  72. Division of Human Biology, Fred Hutchinson Cancer Research Center, 825 Eastlake Avenue East, Seattle, Washington 98109, USA.

    • Kavita Garg
  73. Department of Psychiatry and Behavioral Sciences, Box 356560, University of Washington, Seattle, Washington 98195-6560, USA.

    • Michael O. Dorschner
  74. Microarray Informatics Group, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    • Alvis Brazma &
    • Margus Lukk
  75. Genomics and Regulatory Systems Group, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK.

    • Nicholas M. Luscombe &
    • Juan M. Vaquerizas
  76. Department of Pathology, Department of Genetics, Stanford University, 300 Pasteur Drive, Stanford, California 94305, USA.

    • Arend Sidow
  77. Department of Computer Science and Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195, USA.

    • William Stafford Noble
  78. Department of Electrical Engineering, University of Washington, 185 Stevens Way, Seattle, Washington 98195, USA.

    • Jeffrey A. Bilmes
  79. Center for Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, Massachusetts 02115, USA.

    • Peter V. Kharchenko &
    • Peter J. Park
  80. Departments of Biology and Mathematics and Computer Science, Emory University, Atlanta, Georgia 30322, USA.

    • Dannon Baker &
    • James Taylor
  81. Present addresses: Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32 Vassar Street, Cambridge, Massachusetts 02139, USA (A.K.); UCLA Biological Chemistry Department, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research at UCLA, Jonsson Comprehensive Cancer Center, 615 Charles E Young Dr South, Los Angeles, California 90095, USA (J.E.); Department of Statistics, 514D Wartik Lab, Penn State University, State College, Pennsylvania 16802, USA (Q.L.); Department of Biostatistics and Bioinformatics and the Institute for Genome Sciences and Policy, Duke University School of Medicine, 101 Science Drive, Durham, North Carolina 27708, USA (T.E.R.); Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong (K.Y.Y.); Department of Genetics, Washington University in St Louis, St Louis, Missouri 63110, USA (R.F.L.); Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland 20742, USA (L.A.L.D.); National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, USA (J.Z.); University of California, Davis Population Biology Graduate Group, Davis, California 95616, USA (J.R.W.); Illumina Cambridge Ltd., Chesterford Research Park, Little Chesterford, Saffron Walden, Essex CB10 1XL, UK (E.H.M.); BlueGnome Ltd., CPC4, Capital Park, Fulbourn, Cambridge CB21 5XE, UK (F.K.); Institut de Génétique et Développement de Rennes, CNRS-UMR6061, Université de Rennes 1, F-35000 Rennes, Brittany, France (T.D.); Caltech, 1200 East California Boulevard, Pasadena, California 91125, USA (K.F.-T.); A*STAR-Duke-NUS Neuroscience Research Partnership, 8 College Road, Singapore 169857, Singapore (M.J.F.); St Laurent Institute, One Kendall Square, Cambridge, Massachusetts 02139, USA (P.K.); Department of Genetics, Stanford University, Stanford, California 94305, USA (H.T.); Biomedical Sciences (BMS) Graduate Program, University of California, San Francisco, 513 Parnassus Avenue, HSE-1285, San Francisco, California 94143-0505, USA (S.L.P.); Monterey Bay Aquarium Research Institute, Moss Landing, California 95039, USA (M.J.v.B.); Department of Machine Learning, NEC Laboratories America, 4 Independence Way, Princeton, New Jersey 08540, USA (R.M.); Neuronal Circuit Development Group, Unité de Génétique et Biologie du Développement, U934/UMR3215, Institut Curie–Centre de Recherche, Pole de Biologie du Développement et Cancer, 26, rue d’Ulm, 75248 Paris Cedex 05, France (T.A.); Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK (M.L.); Unidade de Bioinformatica, Rua da Quinta Grande, 6, P-2780-156 Oeiras, Portugal (D.S.); Department of Genome Sciences, University of Washington, 3720 15th Avenue NE, Seattle, Washington 98195-5065, USA (M.W.L.); Center for Bioinformatics and Computational Biology, 3115 Ag/Life Surge Building 296, University of Maryland, College Park, Maryland 20742, USA (A.D.S.).

    • Anshul Kundaje,
    • Jason Ernst,
    • Qunhua Li,
    • Timothy E. Reddy,
    • Kevin Y. Yip,
    • Rebecca F. Lowdon,
    • Laura A. L. Dillon,
    • Julia Zhang,
    • Judith R. Wexler,
    • Elliott H. Margulies,
    • Felix Kokocinski,
    • Thomas Derrien,
    • Kata Fejes-Toth,
    • Melissa J. Fullwood,
    • Philipp Kapranov,
    • Hagen Tilgner,
    • Stephanie L. Parker,
    • Marijke J. van Baren,
    • Renqiang Min,
    • Thomas Auer,
    • Margus Lukk,
    • Daniel Sobral,
    • Max W. Libbrecht &
    • Avinash D. Sahu

Consortia

  1. The ENCODE Project Consortium

  2. Overall coordination (data analysis coordination)

    • Ian Dunham &
    • Anshul Kundaje
  3. Data production leads (data production)

    • Shelley F. Aldred,
    • Patrick J. Collins,
    • Carrie A. Davis,
    • Francis Doyle,
    • Charles B. Epstein,
    • Seth Frietze,
    • Jennifer Harrow,
    • Rajinder Kaul,
    • Jainab Khatun,
    • Bryan R. Lajoie,
    • Stephen G. Landt,
    • Bum-Kyu Lee,
    • Florencia Pauli,
    • Kate R. Rosenbloom,
    • Peter Sabo,
    • Alexias Safi,
    • Amartya Sanyal,
    • Noam Shoresh,
    • Jeremy M. Simon,
    • Lingyun Song &
    • Nathan D. Trinklein
  4. Lead analysts (data analysis)

    • Robert C. Altshuler,
    • Ewan Birney,
    • James B. Brown,
    • Chao Cheng,
    • Sarah Djebali,
    • Xianjun Dong,
    • Ian Dunham,
    • Jason Ernst,
    • Terrence S. Furey,
    • Mark Gerstein,
    • Belinda Giardine,
    • Melissa Greven,
    • Ross C. Hardison,
    • Robert S. Harris,
    • Javier Herrero,
    • Michael M. Hoffman,
    • Sowmya Iyer,
    • Manolis Kellis,
    • Jainab Khatun,
    • Pouya Kheradpour,
    • Anshul Kundaje,
    • Timo Lassmann,
    • Qunhua Li,
    • Xinying Lin,
    • Georgi K. Marinov,
    • Angelika Merkel,
    • Ali Mortazavi,
    • Stephen C. J. Parker,
    • Timothy E. Reddy,
    • Joel Rozowsky,
    • Felix Schlesinger,
    • Robert E. Thurman,
    • Jie Wang,
    • Lucas D. Ward,
    • Troy W. Whitfield,
    • Steven P. Wilder,
    • Weisheng Wu,
    • Hualin S. Xi,
    • Kevin Y. Yip &
    • Jiali Zhuang
  5. Writing group

    • Bradley E. Bernstein,
    • Ewan Birney,
    • Ian Dunham,
    • Eric D. Green,
    • Chris Gunter &
    • Michael Snyder
  6. NHGRI project management (scientific management)

    • Michael J. Pazin,
    • Rebecca F. Lowdon,
    • Laura A. L. Dillon,
    • Leslie B. Adams,
    • Caroline J. Kelly,
    • Julia Zhang,
    • Judith R. Wexler,
    • Eric D. Green,
    • Peter J. Good &
    • Elise A. Feingold
  7. Principal investigators (steering committee)

    • Bradley E. Bernstein,
    • Ewan Birney,
    • Gregory E. Crawford,
    • Job Dekker,
    • Laura Elnitski,
    • Peggy J. Farnham,
    • Mark Gerstein,
    • Morgan C. Giddings,
    • Thomas R. Gingeras,
    • Eric D. Green,
    • Roderic Guigó,
    • Ross C. Hardison,
    • Timothy J. Hubbard,
    • Manolis Kellis,
    • W. James Kent,
    • Jason D. Lieb,
    • Elliott H. Margulies,
    • Richard M. Myers,
    • Michael Snyder,
    • John A. Stamatoyannopoulos,
    • Scott A. Tenenbaum,
    • Zhiping Weng,
    • Kevin P. White &
    • Barbara Wold
  8. Boise State University and University of North Carolina at Chapel Hill Proteomics groups (data production and analysis)

    • Jainab Khatun,
    • Yanbao Yu,
    • John Wrobel,
    • Brian A. Risk,
    • Harsha P. Gunawardena,
    • Heather C. Kuiper,
    • Christopher W. Maier,
    • Ling Xie,
    • Xian Chen &
    • Morgan C. Giddings
  9. Broad Institute Group (data production and analysis)

    • Bradley E. Bernstein,
    • Charles B. Epstein,
    • Noam Shoresh,
    • Jason Ernst,
    • Pouya Kheradpour,
    • Tarjei S. Mikkelsen,
    • Shawn Gillespie,
    • Alon Goren,
    • Oren Ram,
    • Xiaolan Zhang,
    • Li Wang,
    • Robbyn Issner,
    • Michael J. Coyne,
    • Timothy Durham,
    • Manching Ku,
    • Thanh Truong,
    • Lucas D. Ward,
    • Robert C. Altshuler,
    • Matthew L. Eaton &
    • Manolis Kellis
  10. Cold Spring Harbor, University of Geneva, Center for Genomic Regulation, Barcelona, RIKEN, Sanger Institute, University of Lausanne, Genome Institute of Singapore group (data production and analysis)

    • Sarah Djebali,
    • Carrie A. Davis,
    • Angelika Merkel,
    • Alex Dobin,
    • Timo Lassmann,
    • Ali Mortazavi,
    • Andrea Tanzer,
    • Julien Lagarde,
    • Wei Lin,
    • Felix Schlesinger,
    • Chenghai Xue,
    • Georgi K. Marinov,
    • Jainab Khatun,
    • Brian A. Williams,
    • Chris Zaleski,
    • Joel Rozowsky,
    • Maik Röder,
    • Felix Kokocinski,
    • Rehab F. Abdelhamid,
    • Tyler Alioto,
    • Igor Antoshechkin,
    • Michael T. Baer,
    • Philippe Batut,
    • Ian Bell,
    • Kimberly Bell,
    • Sudipto Chakrabortty,
    • Xian Chen,
    • Jacqueline Chrast,
    • Joao Curado,
    • Thomas Derrien,
    • Jorg Drenkow,
    • Erica Dumais,
    • Jackie Dumais,
    • Radha Duttagupta,
    • Megan Fastuca,
    • Kata Fejes-Toth,
    • Pedro Ferreira,
    • Sylvain Foissac,
    • Melissa J. Fullwood,
    • Hui Gao,
    • David Gonzalez,
    • Assaf Gordon,
    • Harsha P. Gunawardena,
    • Cédric Howald,
    • Sonali Jha,
    • Rory Johnson,
    • Philipp Kapranov,
    • Brandon King,
    • Colin Kingswood,
    • Guoliang Li,
    • Oscar J. Luo,
    • Eddie Park,
    • Jonathan B. Preall,
    • Kimberly Presaud,
    • Paolo Ribeca,
    • Brian A. Risk,
    • Daniel Robyr,
    • Xiaoan Ruan,
    • Michael Sammeth,
    • Kuljeet Singh Sandhu,
    • Lorain Schaeffer,
    • Lei-Hoon See,
    • Atif Shahab,
    • Jorgen Skancke,
    • Ana Maria Suzuki,
    • Hazuki Takahashi,
    • Hagen Tilgner,
    • Diane Trout,
    • Nathalie Walters,
    • Huaien Wang,
    • John Wrobel,
    • Yanbao Yu,
    • Yoshihide Hayashizaki,
    • Jennifer Harrow,
    • Mark Gerstein,
    • Timothy J. Hubbard,
    • Alexandre Reymond,
    • Stylianos E. Antonarakis,
    • Gregory J. Hannon,
    • Morgan C. Giddings,
    • Yijun Ruan,
    • Barbara Wold,
    • Piero Carninci,
    • Roderic Guigó &
    • Thomas R. Gingeras
  11. Data coordination center at UC Santa Cruz (production data coordination)

    • Kate R. Rosenbloom,
    • Cricket A. Sloan,
    • Katrina Learned,
    • Venkat S. Malladi,
    • Matthew C. Wong,
    • Galt P. Barber,
    • Melissa S. Cline,
    • Timothy R. Dreszer,
    • Steven G. Heitner,
    • Donna Karolchik,
    • W. James Kent,
    • Vanessa M. Kirkup,
    • Laurence R. Meyer,
    • Jeffrey C. Long,
    • Morgan Maddren &
    • Brian J. Raney
  12. Duke University, EBI, University of Texas, Austin, University of North Carolina-Chapel Hill group (data production and analysis)

    • Terrence S. Furey,
    • Lingyun Song,
    • Linda L. Grasfeder,
    • Paul G. Giresi,
    • Bum-Kyu Lee,
    • Anna Battenhouse,
    • Nathan C. Sheffield,
    • Jeremy M. Simon,
    • Kimberly A. Showers,
    • Alexias Safi,
    • Darin London,
    • Akshay A. Bhinge,
    • Christopher Shestak,
    • Matthew R. Schaner,
    • Seul Ki Kim,
    • Zhuzhu Z. Zhang,
    • Piotr A. Mieczkowski,
    • Joanna O. Mieczkowska,
    • Zheng Liu,
    • Ryan M. McDaniell,
    • Yunyun Ni,
    • Naim U. Rashid,
    • Min Jae Kim,
    • Sheera Adar,
    • Zhancheng Zhang,
    • Tianyuan Wang,
    • Deborah Winter,
    • Damian Keefe,
    • Ewan Birney,
    • Vishwanath R. Iyer,
    • Jason D. Lieb &
    • Gregory E. Crawford
  13. Genome Institute of Singapore group (data production and analysis)

    • Guoliang Li,
    • Kuljeet Singh Sandhu,
    • Meizhen Zheng,
    • Ping Wang,
    • Oscar J. Luo,
    • Atif Shahab,
    • Melissa J. Fullwood,
    • Xiaoan Ruan &
    • Yijun Ruan
  14. HudsonAlpha Institute, Caltech, UC Irvine, Stanford group (data production and analysis)

    • Richard M. Myers,
    • Florencia Pauli,
    • Brian A. Williams,
    • Jason Gertz,
    • Georgi K. Marinov,
    • Timothy E. Reddy,
    • Jost Vielmetter,
    • E. Christopher Partridge,
    • Diane Trout,
    • Katherine E. Varley,
    • Clarke Gasper,
    • Anita Bansal,
    • Shirley Pepke,
    • Preti Jain,
    • Henry Amrhein,
    • Kevin M. Bowling,
    • Michael Anaya,
    • Marie K. Cross,
    • Brandon King,
    • Michael A. Muratet,
    • Igor Antoshechkin,
    • Kimberly M. Newberry,
    • Kenneth McCue,
    • Amy S. Nesmith,
    • Katherine I. Fisher-Aylor,
    • Barbara Pusey,
    • Gilberto DeSalvo,
    • Stephanie L. Parker,
    • Sreeram Balasubramanian,
    • Nicholas S. Davis,
    • Sarah K. Meadows,
    • Tracy Eggleston,
    • Chris Gunter,
    • J. Scott Newberry,
    • Shawn E. Levy,
    • Devin M. Absher,
    • Ali Mortazavi,
    • Wing H. Wong &
    • Barbara Wold
  15. Lawrence Berkeley National Laboratory group (targeted experimental validation)

    • Matthew J. Blow,
    • Axel Visel &
    • Len A. Pennacchio
  16. NHGRI groups (data production and analysis)

    • Laura Elnitski,
    • Elliott H. Margulies,
    • Stephen C. J. Parker &
    • Hanna M. Petrykowska
  17. Sanger Institute, Washington University, Yale University, Center for Genomic Regulation, Barcelona, UCSC, MIT, University of Lausanne, CNIO group (data production and analysis)

    • Alexej Abyzov,
    • Bronwen Aken,
    • Daniel Barrell,
    • Gemma Barson,
    • Andrew Berry,
    • Alexandra Bignell,
    • Veronika Boychenko,
    • Giovanni Bussotti,
    • Jacqueline Chrast,
    • Claire Davidson,
    • Thomas Derrien,
    • Gloria Despacio-Reyes,
    • Mark Diekhans,
    • Iakes Ezkurdia,
    • Adam Frankish,
    • James Gilbert,
    • Jose Manuel Gonzalez,
    • Ed Griffiths,
    • Rachel Harte,
    • David A. Hendrix,
    • Cédric Howald,
    • Toby Hunt,
    • Irwin Jungreis,
    • Mike Kay,
    • Ekta Khurana,
    • Felix Kokocinski,
    • Jing Leng,
    • Michael F. Lin,
    • Jane Loveland,
    • Zhi Lu,
    • Deepa Manthravadi,
    • Marco Mariotti,
    • Jonathan Mudge,
    • Gaurab Mukherjee,
    • Cedric Notredame,
    • Baikang Pei,
    • Jose Manuel Rodriguez,
    • Gary Saunders,
    • Andrea Sboner,
    • Stephen Searle,
    • Cristina Sisu,
    • Catherine Snow,
    • Charlie Steward,
    • Andrea Tanzer,
    • Electra Tapanari,
    • Michael L. Tress,
    • Marijke J. van Baren,
    • Nathalie Walters,
    • Stefan Washietl,
    • Laurens Wilming,
    • Amonida Zadissa,
    • Zhengdong Zhang,
    • Michael Brent,
    • David Haussler,
    • Manolis Kellis,
    • Alfonso Valencia,
    • Mark Gerstein,
    • Alexandre Reymond,
    • Roderic Guigó,
    • Jennifer Harrow &
    • Timothy J. Hubbard
  18. Stanford-Yale, Harvard, University of Massachusetts Medical School, University of Southern California/UC Davis group (data production and analysis)

    • Stephen G. Landt,
    • Seth Frietze,
    • Alexej Abyzov,
    • Nick Addleman,
    • Roger P. Alexander,
    • Raymond K. Auerbach,
    • Suganthi Balasubramanian,
    • Keith Bettinger,
    • Nitin Bhardwaj,
    • Alan P. Boyle,
    • Alina R. Cao,
    • Philip Cayting,
    • Alexandra Charos,
    • Yong Cheng,
    • Chao Cheng,
    • Catharine Eastman,
    • Ghia Euskirchen,
    • Joseph D. Fleming,
    • Fabian Grubert,
    • Lukas Habegger,
    • Manoj Hariharan,
    • Arif Harmanci,
    • Sushma Iyengar,
    • Victor X. Jin,
    • Konrad J. Karczewski,
    • Maya Kasowski,
    • Phil Lacroute,
    • Hugo Lam,
    • Nathan Lamarre-Vincent,
    • Jing Leng,
    • Jin Lian,
    • Marianne Lindahl-Allen,
    • Renqiang Min,
    • Benoit Miotto,
    • Hannah Monahan,
    • Zarmik Moqtaderi,
    • Xinmeng J. Mu,
    • Henriette O’Geen,
    • Zhengqing Ouyang,
    • Dorrelyn Patacsil,
    • Baikang Pei,
    • Debasish Raha,
    • Lucia Ramirez,
    • Brian Reed,
    • Joel Rozowsky,
    • Andrea Sboner,
    • Minyi Shi,
    • Cristina Sisu,
    • Teri Slifer,
    • Heather Witt,
    • Linfeng Wu,
    • Xiaoqin Xu,
    • Koon-Kiu Yan,
    • Xinqiong Yang,
    • Kevin Y. Yip,
    • Zhengdong Zhang,
    • Kevin Struhl,
    • Sherman M. Weissman,
    • Mark Gerstein,
    • Peggy J. Farnham &
    • Michael Snyder
  19. University of Albany SUNY group (data production and analysis)

    • Scott A. Tenenbaum,
    • Luiz O. Penalva &
    • Francis Doyle
  20. University of Chicago, Stanford group (data production and analysis)

    • Subhradip Karmakar,
    • Stephen G. Landt,
    • Raj R. Bhanvadia,
    • Alina Choudhury,
    • Marc Domanus,
    • Lijia Ma,
    • Jennifer Moran,
    • Dorrelyn Patacsil,
    • Teri Slifer,
    • Alec Victorsen,
    • Xinqiong Yang,
    • Michael Snyder &
    • Kevin P. White
  21. University of Heidelberg group (targeted experimental validation)

    • Thomas Auer,
    • Lazaro Centanin,
    • Michael Eichenlaub,
    • Franziska Gruhl,
    • Stephan Heermann,
    • Burkhard Hoeckendorf,
    • Daigo Inoue,
    • Tanja Kellner,
    • Stephan Kirchmaier,
    • Claudia Mueller,
    • Robert Reinhardt,
    • Lea Schertel,
    • Stephanie Schneider,
    • Rebecca Sinn,
    • Beate Wittbrodt &
    • Jochen Wittbrodt
  22. University of Massachusetts Medical School Bioinformatics group (data production and analysis)

    • Zhiping Weng,
    • Troy W. Whitfield,
    • Jie Wang,
    • Patrick J. Collins,
    • Shelley F. Aldred,
    • Nathan D. Trinklein,
    • E. Christopher Partridge &
    • Richard M. Myers
  23. University of Massachusetts Medical School Genome Folding group (data production and analysis)

    • Job Dekker,
    • Gaurav Jain,
    • Bryan R. Lajoie &
    • Amartya Sanyal
  24. University of Washington, University of Massachusetts Medical Center group (data production and analysis)

    • Gayathri Balasundaram,
    • Daniel L. Bates,
    • Rachel Byron,
    • Theresa K. Canfield,
    • Morgan J. Diegel,
    • Douglas Dunn,
    • Abigail K. Ebersol,
    • Tristan Frum,
    • Kavita Garg,
    • Erica Gist,
    • R. Scott Hansen,
    • Lisa Boatman,
    • Eric Haugen,
    • Richard Humbert,
    • Gaurav Jain,
    • Audra K. Johnson,
    • Ericka M. Johnson,
    • Tattyana V. Kutyavin,
    • Bryan R. Lajoie,
    • Kristen Lee,
    • Dimitra Lotakis,
    • Matthew T. Maurano,
    • Shane J. Neph,
    • Fidencio V. Neri,
    • Eric D. Nguyen,
    • Hongzhu Qu,
    • Alex P. Reynolds,
    • Vaughn Roach,
    • Eric Rynes,
    • Peter Sabo,
    • Minerva E. Sanchez,
    • Richard S. Sandstrom,
    • Amartya Sanyal,
    • Anthony O. Shafer,
    • Andrew B. Stergachis,
    • Sean Thomas,
    • Robert E. Thurman,
    • Benjamin Vernot,
    • Jeff Vierstra,
    • Shinny Vong,
    • Hao Wang,
    • Molly A. Weaver,
    • Yongqi Yan,
    • Miaohua Zhang,
    • Joshua M. Akey,
    • Michael Bender,
    • Michael O. Dorschner,
    • Mark Groudine,
    • Michael J. MacCoss,
    • Patrick Navas,
    • George Stamatoyannopoulos,
    • Rajinder Kaul,
    • Job Dekker &
    • John A. Stamatoyannopoulos
  25. Data Analysis Center (data analysis)

    • Ian Dunham,
    • Kathryn Beal,
    • Alvis Brazma,
    • Paul Flicek,
    • Javier Herrero,
    • Nathan Johnson,
    • Damian Keefe,
    • Margus Lukk,
    • Nicholas M. Luscombe,
    • Daniel Sobral,
    • Juan M. Vaquerizas,
    • Steven P. Wilder,
    • Serafim Batzoglou,
    • Arend Sidow,
    • Nadine Hussami,
    • Sofia Kyriazopoulou-Panagiotopoulou,
    • Max W. Libbrecht,
    • Marc A. Schaub,
    • Anshul Kundaje,
    • Ross C. Hardison,
    • Webb Miller,
    • Belinda Giardine,
    • Robert S. Harris,
    • Weisheng Wu,
    • Peter J. Bickel,
    • Balazs Banfai,
    • Nathan P. Boley,
    • James B. Brown,
    • Haiyan Huang,
    • Qunhua Li,
    • Jingyi Jessica Li,
    • William Stafford Noble,
    • Jeffrey A. Bilmes,
    • Orion J. Buske,
    • Michael M. Hoffman,
    • Avinash D. Sahu,
    • Peter V. Kharchenko,
    • Peter J. Park,
    • Dannon Baker,
    • James Taylor,
    • Zhiping Weng,
    • Sowmya Iyer,
    • Xianjun Dong,
    • Melissa Greven,
    • Xinying Lin,
    • Jie Wang,
    • Hualin S. Xi,
    • Jiali Zhuang,
    • Mark Gerstein,
    • Roger P. Alexander,
    • Suganthi Balasubramanian,
    • Chao Cheng,
    • Arif Harmanci,
    • Lucas Lochovsky,
    • Renqiang Min,
    • Xinmeng J. Mu,
    • Joel Rozowsky,
    • Koon-Kiu Yan,
    • Kevin Y. Yip &
    • Ewan Birney

Contributions

See the consortium author list for details of author contributions.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:

The Supplementary Information is accompanied by a Virtual Machine (VM) containing the functioning analysis data and code. Further details of the VM are available from http://encodeproject.org/ENCODE/integrativeAnalysis/VM.

    Supplementary information

    PDF files

    1. Supplementary Information 1 (1.8M)

      This file contains Supplementary Text and Data, Methods and References – see Contents list for full details.

    2. Supplementary Information 2 (2.4M)

      This file contains Supplementary Figures E1-E6, K1-K2, L1-L3, M1-M2, R1-R2, Y1 and Z1, Supplementary Tables E1-E2, K1-K2 and L1 and additional references.

    Excel files

    1. Supplementary Table 1-section U (20K)

      This data file shows the GENCODE Gene Annotation Statistics.

    2. Supplementary Table 1-section F (2.1M)

      This data file shows the TF Co‐associations.

    3. Supplementary Table 1-section M (406K)

      This data file shows the GWAS SNP phenotype associations across TF and DHS ENCODE annotations. The Supplementary Information file initially published online was corrupted and was replaced on 7 September 2012.

    4. Supplementary Table 1-section N (34K)

      This data file shows the ENCODE TF Classification in detail.

    5. Supplementary Table 1-section P (38K)

      This data file shows the ENCODE Data Production Summary.

    6. Supplementary Table 1-section Q (323K)

      This data file shows the ENCODE Element Counts and Lengths by Data Type.

    Movies

    1. Supplementary Movie 1 (930K)

      This video shows the transient expression of a piece of human DNA predicted to be an enhancer in a Medaka fish embryo. The prediction is from the K562 human cell line, which is derived from a red blood cell precursor. The expression in Medaka is in the mature red blood cells (these are nucleated cells in fish). The expressed reporter is green fluorescent protein. The video was made in the Wittbrodt laboratory at the University of Heidelberg.

    2. Supplementary Movie 2 (637K)

      This video shows the transient expression of a piece of human DNA predicted to be an enhancer in a Medaka fish embryo. The prediction is from the K562 human cell line, which is derived from a red blood cell precursor. The expression in Medaka is in the mature red blood cells (these are nucleated cells in fish). The expressed reporter is green fluorescent protein. The video was made in the Wittbrodt laboratory at the University of Heidelberg.

    Text files

    1. Supplementary Table 2-section M (907K)

      This file contains the GWAS SNP pair‐wise associations across DHS ENCODE annotations.

    2. Supplementary Table 3-section M (2.3M)

      This file contains the GWAS SNP pair‐wise associations across TF ENCODE annotations.

    Comments

    1. Comment #49966

      Steven Pelech said:

      While the ENCODE effort is a milestone achievement, I do wonder how much of the documented putative transcription factor and histone interactions with DNA sequences may actually be non-specific and truly inconsequential. Such low level interactions may simply be noise and just tolerated. However, the main reason why I have a hard time accepting that about 80% of the human genome sequence is functional and important is the data from other species with a similar number of genes, but extremely divergent amounts of DNA. For example, the fruit fly Drosophila melanogaster has 0.165 billion nucleotide base pairs, whereas the butterfly Fritillaria assyriaca has 124.9 billion nucleotide base pairs. The human genome size lies between with about 3.2 billion nucleotide base pairs. While the fruit fly has 750-times less DNA than the butterfly, both insects have somewhat comparable characteristics in terms of body structure, size, life span, diet, etc.

      There appears to be strong evolutionary pressure in multicellular organisms to retain excess baggage so as to simply make sure that the important parts are retained. There are countless cases of this ranging from the extensive remodelling of embryos during early development, to the hundreds of thousands of superfluous phosphorylation sites in the proteins encoded by the human genome. At the levels of gross anatomy down to the molecular, there are so many examples of inefficiencies in biology. As I have pointed out above, DNA sequencing studies in diverse organisms have increasingly demonstrated extreme ranges in the sizes of their genomes, whilst still having a relatively similar number of genes. It just seems highly unlikely that this is for increasing the amount of regulation of the genome in certain organisms over others.

    2. Comment #50103

      Steven Pelech said:

      In my previous comment, I mentioned that Fritillaria assyriaca's genome has 124.9 billion base pairs, which is correct, but this species is not a butterfly but rather a flowering plant. The largest genome in an insect that I could identify appears to be instead the mountain grasshopper Podisma pedestris, which has 14 billion nucleotide base pairs. The smallest genome recorded for an insect is about 85 million base pairs, in the cases of the flies Psychoda cinerea and Coboldia fuscipes. This reveals up to a 167-fold range in the amount of DNA per cell in these flies compared to grasshoppers. In the case of mammals, the largest genome identified so far is in the red viscacha rat with ~7 billion nucleotide base pairs and the smallest in bats like the straw-coloured fruit bat and the long-fingered bat with 1.56 billion base pairs. So even with mammals, there can be a 4.5-fold range in the amount of DNA, despite similar numbers of genes.
