Main

In many respects, the sequencing of the human genome represents just the beginning of what will be a long and difficult, but exciting, journey for researchers. With the sequence of the human genome in hand, biologists are now faced with the rather daunting challenge of identifying and understanding the myriad functional elements it contains. Much progress has already been made in pinpointing the locations of protein-coding genes within the genome, but other functional elements—such as the DNA sequences that control when and where those genes are actively transcribed—remain elusive. Distal enhancers are one category of functional genomic elements that regulate gene expression. Locating these enhancers in the vast expanses of the genome can be difficult, however, especially because enhancers are frequently located quite far away from the genes that they regulate.

Recently, a team of scientists led by Len Pennacchio at Lawrence Berkeley National Laboratory used an ambitious strategy to identify and characterize previously unknown enhancers in the human genome on a large scale. First, they performed DNA sequence comparisons to identify DNA sequences that were highly conserved between humans and other animals; they then experimentally tested these sequences for enhancer activity using an in vivo assay. It was not clear at the outset how effective this strategy would be; as Pennacchio explains, “Despite anecdotal evidence that previously identified enhancers appear highly conserved across species, we were unclear how well conservation as a starting point would yield active enhancers. This is due to the fact that there are so many different types of noncoding functions beyond transcriptional enhancement.”

Pennacchio and colleagues began their study by performing extensive sequence alignments between human genomic DNA and genomic DNA from mouse, rat and pufferfish. From these alignments they identified noncoding human genomic sequences that have been highly conserved over the course of evolution. High conservation suggests that a DNA sequence may have an important function that constrains it from changing very much. The researchers chose 167 of these highly conserved sequences and tested whether they actually function as enhancers. For their in vivo assay, they generated transgenic mice carrying the human version of each highly conserved sequence. Each transgenic mouse line carried a different DNA sequence, and the researchers engineered the mice such that DNA sequences that were bona fide enhancers could be identified on the basis of their ability to turn on expression of a reporter gene (Fig. 1).

Figure 1: Enhancers were identified based on their ability to turn on expression of a reporter gene (blue) in specific regions of a developing mouse embryo.

Reprinted with permission from Nature.

Using this strategy, the researchers identified many new enhancers, with approximately half of the sequences they tested in mice functioning as actual enhancers. Pennacchio cautions, though, that this strategy has certain limitations and may be biased toward identifying particular classes of enhancers: “We have currently focused on an extreme version of human genome noncoding conservation and these elements are not randomly distributed. Rather they are highly clustered and biased toward transcription factors and other key developmental genes (likely due to the extreme constraint on the regulation of genes important in vertebrate body plan development).” Additionally, enhancers that are active in an adult, but not during embryonic development, would be missed because the enhancer activity screen is performed only at an embryonic time point.

Nonetheless, this approach has proven to be very effective at enhancer identification. In fact, since their initial study, the researchers have continued their efforts to identify new enhancers and are doing so at an impressive rate. Pennacchio explains, “In this study, in our first pass, we tested 170 elements in about a year and are now testing closer to 500 per year. With an enhancer identification success rate of 50%, we are quickly surpassing the cumulative number of enhancers identified by all investigators to date.”