Main

Sequencing the human genome was a landmark achievement, yet the raw information alone is limited in usefulness, like a massive directory with only telephone numbers but no names or addresses. Genomic data are also available for several animal species, but these, too, underline how little is truly known. “Only 5% of [human] sequence is conserved in mammals, and only 1.5% seems to be coding sequence out of that conserved 5%,” says Elise Feingold of the National Human Genome Research Institute (NHGRI). “So what is that other three and a half percent doing, and what is the rest of the genome doing?”

Feingold is on the Scientific Management team for the ENCODE (ENCyclopedia Of DNA Elements) project, a multi-institutional and multinational initiative to develop a comprehensive directory of functional elements contained within the human genome, including protein-coding and non-protein-coding expressed sequences, regulatory elements and so forth.

The ENCODE project has begun to address this daunting challenge with two initial phases. The 'pilot' phase entails the rigorous analysis of 1% (30 megabases) of the human genome by a broad range of existing technologies. This 1% includes sequence from several noncontiguous regions, both closely characterized segments and others selected at random. To account for differences between individuals, sequence variations in conserved regions will be determined from the 48 samples being used by the HapMap consortium. Meanwhile, groups involved in the 'technology development' phase are working on innovative technologies to enhance the pursuit of high-quality data. The outcome of these two phases will ultimately determine the conduct of the monumental 'production' phase, wherein the other 99% of the genome will be studied with equal rigor.

As ENCODE project data are verified, they will be made freely available via the UCSC Genome Browser and other databases, and some of the new techniques being developed by consortium members are already making their way into publication (see Dorschner et al., pp. 219–225). Feingold says funding for the initial phases currently covers three years, and she anticipates a far longer road beyond that. But she is already greatly encouraged by the findings and collaborations that have emerged to date, indicating that getting there may be much more than half the fun.