How genes are regulated can be just as important as the proteins they encode. Regulatory elements in the genome are much harder to identify than protein-coding genes because they lack clear distinguishing sequence signatures. Moreover, many regulatory elements function only in certain cell types and conditions. In a recent paper, researchers led by Manolis Kellis and Bradley Bernstein of the Broad Institute showed how to use epigenetic marks to systematically characterize regulatory elements across many cell types. This approach not only revealed many thousands of previously unannotated regulatory elements but also linked them to their likely target genes.

The researchers used chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) to map the genome-wide locations of histone modifications, one type of epigenetic mark. Because these modifications vary across cell types, the researchers profiled nine cell types: embryonic stem cells, leukemia cells, liver carcinoma cells, keratinocytes, mammary epithelial cells and four others, including lymphoblastoid cells.

Statistical analysis identified recurrent combinations of modifications, which define distinct chromatin states. These states correlate with promoters, enhancers and other types of regulatory regions, and they reflect cell type–specific activity. For example, promoters could be classified as active, repressed or 'poised', and enhancers as strong or weak.
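To make the idea of recurrent mark combinations concrete, here is a minimal Python sketch. It is a far simpler stand-in for the statistical model used in the study: it calls each mark present or absent in a genomic bin using a fixed, arbitrary threshold, then counts how often each exact combination recurs. The mark names, the threshold and the toy signal values are illustrative assumptions, not data from the paper.

```python
from collections import Counter

# Hypothetical mark panel for illustration; the article does not list the marks used.
MARKS = ("H3K4me3", "H3K4me1", "H3K27ac", "H3K27me3")


def binarize(bin_signal, threshold=5.0):
    """Turn one bin's per-mark signal into a presence (1) / absence (0) tuple."""
    return tuple(int(bin_signal.get(mark, 0.0) >= threshold) for mark in MARKS)


def recurrent_states(bins, threshold=5.0, min_count=2):
    """Count each exact mark combination and keep those seen at least min_count times."""
    counts = Counter(binarize(b, threshold) for b in bins)
    return [(combo, n) for combo, n in counts.most_common() if n >= min_count]


def describe(combo):
    """Human-readable name for a combination, e.g. 'H3K4me3+H3K27ac'."""
    present = [mark for mark, flag in zip(MARKS, combo) if flag]
    return "+".join(present) if present else "no marks"


if __name__ == "__main__":
    # Toy signal values per genomic bin (not real data).
    bins = [
        {"H3K4me3": 12, "H3K27ac": 9},   # promoter-like
        {"H3K4me3": 15, "H3K27ac": 7},
        {"H3K4me1": 8, "H3K27ac": 6},    # enhancer-like
        {"H3K4me1": 9, "H3K27ac": 11},
        {"H3K4me3": 10, "H3K27me3": 8},  # active and repressive marks together
        {},                              # unmarked bin
    ]
    for combo, n in recurrent_states(bins):
        print(describe(combo), n)
    # prints:
    # H3K4me3+H3K27ac 2
    # H3K4me1+H3K27ac 2
```

On this toy input, only the promoter-like and enhancer-like combinations recur. A probabilistic model, unlike this exact-match count, can also absorb noisy or partial combinations into the same state.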

The researchers found that promoters and enhancers each fell into a small number of clusters whose chromatin states tended to change together across cell types. Individual clusters were frequently associated with genes sharing a common function, such as immune response, cholesterol transport, metabolic processes, lipid metabolism or angiogenesis. Tissue-specific genes seemed to depend most on tissue-specific enhancers, whereas genes expressed across many cell types seemed to depend most on promoters and had few tissue-specific enhancers.

Perhaps most interestingly, the researchers linked enhancers with their likely upstream regulators and downstream target genes. They built activity profiles across the nine cell types from enhancer activity, gene expression and transcription-factor expression. By measuring how strongly the profiles of different elements correlated, they linked highly correlated elements into regulatory networks.
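Correlation-based linking of this kind can be sketched as follows. This is a minimal illustration, assuming each enhancer, gene or transcription factor is summarized by one activity value per cell type and that a fixed Pearson-correlation cutoff (0.8 here, an arbitrary choice) defines a link. The function names and toy profiles are hypothetical; the study's actual procedure may have used other statistics or constraints, such as genomic distance, that are not modeled here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / sqrt(vx * vy)


def link_by_correlation(sources, targets, threshold=0.8):
    """Return (source, target, r) for every pair whose profiles correlate at >= threshold."""
    links = []
    for s_name, s_prof in sources.items():
        for t_name, t_prof in targets.items():
            r = pearson(s_prof, t_prof)
            if r >= threshold:
                links.append((s_name, t_name, r))
    return links


if __name__ == "__main__":
    # Nine toy values per element, one per cell type.
    enhancers = {"enh_A": [9, 8, 0, 0, 1, 0, 0, 0, 0]}
    genes = {
        "gene_X": [20, 18, 1, 0, 2, 1, 0, 0, 1],
        "gene_Y": [0, 0, 10, 12, 0, 0, 9, 0, 0],
    }
    # Only enh_A -> gene_X passes the cutoff; gene_Y is anti-correlated.
    for enh, gene, r in link_by_correlation(enhancers, genes):
        print(enh, gene, round(r, 3))
```

Passing transcription-factor expression profiles as `sources` and enhancer profiles as `targets` gives the analogous upstream links. With only nine cell types, each correlation rests on few data points, which is one reason adding cell types should sharpen such links.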

To validate the predicted links, the researchers turned to expression quantitative trait locus (eQTL) studies, which map associations between single-nucleotide polymorphisms (SNPs) and gene expression levels. Enhancers linked to a given target gene were more likely to contain SNPs associated with the expression of that same gene, consistent with the predicted links.
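The logic of the eQTL check can be illustrated with a small Python sketch, assuming enhancers are given as genomic intervals and eQTL results as (chromosome, position, gene) records. It compares how often predicted enhancer-gene links, versus unlinked pairs, have an eQTL SNP for the same gene inside the enhancer. All names and coordinates are hypothetical, and a real analysis would need matched background pairs and a proper statistical comparison.

```python
def contains_eqtl(interval, gene, eqtls):
    """True if any eQTL SNP for `gene` lies inside the half-open interval [start, end)."""
    chrom, start, end = interval
    return any(
        c == chrom and start <= pos < end and g == gene
        for c, pos, g in eqtls
    )


def support_rate(pairs, enhancer_intervals, eqtls):
    """Fraction of (enhancer, gene) pairs whose enhancer holds an eQTL SNP for that gene."""
    if not pairs:
        return 0.0
    hits = sum(contains_eqtl(enhancer_intervals[e], g, eqtls) for e, g in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Hypothetical enhancer coordinates: name -> (chromosome, start, end).
    enhancer_intervals = {"enh_A": ("chr1", 100, 200), "enh_B": ("chr1", 500, 600)}
    # Hypothetical eQTLs: (chromosome, position, gene whose expression the SNP tracks).
    eqtls = [("chr1", 150, "gene_X"), ("chr1", 550, "gene_Z")]
    linked = [("enh_A", "gene_X"), ("enh_B", "gene_Z")]
    unlinked = [("enh_A", "gene_Y"), ("enh_B", "gene_X")]
    print(support_rate(linked, enhancer_intervals, eqtls))    # 1.0
    print(support_rate(unlinked, enhancer_intervals, eqtls))  # 0.0
```

The linear scan over all eQTLs is fine for a toy example; genome-scale data would call for sorting SNPs by position or using an interval index.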

Lastly, the researchers compared their chromatin state maps with genome-wide association studies, which identify SNPs associated with disease phenotypes. Disease-associated SNPs were more likely to fall in strong enhancers active in the relevant cell types. For example, SNPs associated with erythrocyte traits occurred within enhancers active in leukemia cells; SNPs associated with lupus occurred within enhancers active in lymphoblastoid cells; and SNPs associated with blood lipid and triglyceride levels occurred within regulatory elements active in liver cells. Thus, the chromatin annotations can help interpret sequence variants associated with disease phenotypes.
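An enrichment comparison of this kind can be sketched with a one-sided binomial test. The sketch assumes that a cell type's strong enhancers cover a known fraction `f` of some background (for example, the genome or all tested SNPs), that `k` of `n` disease-associated SNPs fall inside them, and that SNPs are independent. That last assumption ignores linkage disequilibrium, so this is an illustration rather than the study's statistical method; the function name and numbers are hypothetical.

```python
from math import comb


def enhancer_enrichment(k, n, f):
    """Fold enrichment and one-sided binomial p-value for k of n SNPs landing in
    regions that cover fraction f of the background."""
    if n <= 0 or not 0 <= k <= n or not 0 < f < 1:
        raise ValueError("need n > 0, 0 <= k <= n and 0 < f < 1")
    fold = (k / n) / f
    # P(X >= k) for X ~ Binomial(n, f): chance of at least k overlaps under the null.
    p_value = sum(comb(n, i) * f**i * (1 - f) ** (n - i) for i in range(k, n + 1))
    return fold, p_value


if __name__ == "__main__":
    # Toy numbers: 12 of 40 trait SNPs fall in enhancers covering 5% of the background.
    fold, p = enhancer_enrichment(12, 40, 0.05)
    print(f"fold enrichment = {fold:.1f}, p = {p:.2g}")
```

Running the same test on each cell type's strong-enhancer set and comparing the results would show whether, for instance, lipid-associated SNPs are most enriched in elements active in liver cells, as the text reports.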

“This study shows this vast noncoding space can be approached, and its dynamics can be systematically understood, and it turns out to be very relevant to disease,” says Kellis. “By linking SNPs to tissue-specific enhancers and their regulatory targets, we can predict relevant tissues and pathways for previously uncharacterized disease variants.” Moreover, the approach should become even more powerful as more cell types are profiled, because correlation-based links will become more precise and more elements will be incorporated into maps of gene networks.