A collection of research articles and related content describing the Encyclopedia of DNA Elements, its datasets and tools.
How cells, tissues and organisms interpret the information encoded in the genome has vital implications for our understanding of development, health and disease. Launched in 2003, the ENCyclopedia Of DNA Elements (ENCODE) project aims to map the functional elements in the human genome; its scope was later expanded to include model organisms.
This Collection showcases the main articles and related content resulting from the third phase of ENCODE, during which almost 6,000 new experiments were performed (4,834 involving human samples and 1,158 with mouse samples) and an online registry of more than one million human and mouse candidate cis-regulatory elements (cCREs) was curated. The Encyclopedia paper gives an overview of the various assays and describes the registry of cCREs, and in the companion articles, individual data types are analysed in more detail and the development of novel methodology is reported. This dedicated Collection also contains a Perspective, News & Views articles and links to other resources.
The Encyclopedia at a Glance
The authors summarize the data produced by phase III of the Encyclopedia of DNA Elements (ENCODE) project, a resource for better understanding the human and mouse genomes.
The third phase of the Encyclopedia of DNA Elements (ENCODE) project has generated the most comprehensive catalogue yet of the functional elements that regulate our genes.
The authors summarize the history of the ENCODE Project, the achievements of ENCODE 1 and ENCODE 2, and how the new data generated and analysed in ENCODE 3 complement the previous phases.
Mouse Epigenetics and Gene Expression
Analysis of 168 methylomes from 12 mouse tissues at 9 developmental stages sheds light on the epigenetic and regulatory landscape during mammalian fetal development.
Analysis of chromatin state and accessibility in mouse tissues from twelve sites and eight developmental stages provides a comprehensive view of chromatin dynamics.
RNA expression is quantified at a tissue level in seventeen mouse tissues across embryonic development, and at the single-cell level in the developing limb.
Pseudogenes are key markers of genome remodelling processes. Here the authors present genome-wide annotation of the pseudogenes in the mouse reference genome and 18 inbred mouse strains, update human pseudogene annotations, and characterise the transcription and evolution of mouse pseudogenes.
Human Gene Expression and RNA Regulation
A combination of five assays is used to produce a catalogue of RNA elements to which RNA-binding proteins bind in human cells.
ChIP–seq and CETCh–seq data are used to analyse binding maps for 208 transcription factors and other chromatin-associated proteins in a single human cell type, providing a comprehensive catalogue of the transcription factor landscape and gene regulatory networks in these cells.
A map of cohesin-mediated chromatin loops in 24 types of human cells identifies loops that show cell-type-specific variation, indicating that chromatin loops may help to specify cell-specific gene expression programs and functions.
A high-density DNase I cleavage map from 243 human cell and tissue types provides a genome-wide, nucleotide-resolution map of human transcription factor footprints.
High-resolution maps of DNase I hypersensitive sites from 733 human biosamples are used to identify and index regulatory DNA within the human genome.
ENCODE is a resource comprising thousands of functional genomic datasets. Here, the authors present custom annotation within ENCODE for cancer, highlighting a workflow that can help prioritise key elements in oncogenesis.
A genome-wide screen identifies silencer regions in human cells. Deletion of silencers linked to the transporter genes ABCC2 and ABCG2 causes their up-regulation and chemo-resistance.
Differential binding of RNA-binding proteins mediated by genetic variants (GVs) can influence posttranscriptional regulation. Here, the authors develop BEAPR, a computational approach to identify allele-specific binding events in eCLIP-Seq data.
Giovanni Quinones-Valdez et al. examined the role of over 200 RNA-binding proteins in mediating A-to-I RNA editing. They identified several RNA-binding proteins that regulate ADAR1 expression, interaction, or binding with Alu elements in a cell type-specific manner.
Computational Analysis and Tools
Supervised machine-learning models trained using Drosophila epigenetic and STARR-seq data can be transferred to predict mouse and human enhancers.
Parallelized analysis in clinical genomics can lead to sample or data mislabelling, and could have serious downstream consequences. Here the authors present a tool to quantify sample genetic relatedness and detect such mistakes, and apply it to thousands of datasets from the ENCODE consortium.