Abstract
The far-reaching ENCODE project, backed by the National Human Genome Research Institute, hopes to advance the state of the art in genomic analysis and derive a definitive functional index of the human genome.
Main
Sequencing the human genome was a landmark achievement, yet the raw sequence alone is of limited use, like a massive directory with only telephone numbers but no names or addresses. Genomic data are also available for several animal species, but these only underline how little is truly known. “Only 5% of [human] sequence is conserved in mammals, and only 1.5% seems to be coding sequence out of that conserved 5%,” says Elise Feingold of the National Human Genome Research Institute (NHGRI). “So what is that other three and a half percent doing, and what is the rest of the genome doing?”
Feingold is on the Scientific Management team for the ENCODE (ENCyclopedia Of DNA Elements) project, a multi-institutional and multinational initiative to develop a comprehensive directory of functional elements contained within the human genome, including protein-coding and non-protein-coding expressed sequences, regulatory elements and so forth.
The ENCODE project has begun to address this daunting challenge with two initial phases. The 'pilot' phase entails the rigorous analysis of 1% (30 megabases) of the human genome by a broad range of existing technologies. This 1% includes sequence from several noncontiguous regions, both closely characterized segments and others selected at random. To account for differences between individuals, sequence variations in conserved regions will be determined from the 48 samples being used by the HapMap consortium. Meanwhile, groups involved in the 'technology development' phase are working on innovative technologies to enhance the pursuit of high-quality data. The outcome of these two phases will ultimately determine the conduct of the monumental 'production' phase, wherein the other 99% of the genome will be studied with equal rigor.
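To put the article's percentages on a common scale, the short Python sketch below converts them to megabases. It rests on two assumptions that are not stated outright in the text: the total genome size is back-calculated from the pilot figure (1% = 30 megabases, so roughly 3,000 megabases in all), and Feingold's 1.5% is read as a share of the whole genome, which is what her "other three and a half percent" implies. The variable and function names are illustrative only.

```python
# Back-of-the-envelope conversion of the article's percentages to megabases.
# Assumption: genome size is inferred from the pilot figure (1% = 30 Mb),
# not taken from an independent measurement.
PILOT_MB = 30
PILOT_PERCENT = 1
GENOME_MB = PILOT_MB * 100 // PILOT_PERCENT  # about 3,000 Mb


def share_mb(tenths_of_percent: int) -> float:
    """Megabases covered by a share expressed in tenths of a percent."""
    return GENOME_MB * tenths_of_percent / 1000


# Assumption: both percentages are fractions of the whole genome.
conserved_mb = share_mb(50)   # 5.0% conserved in mammals
coding_mb = share_mb(15)      # 1.5% protein-coding
conserved_noncoding_mb = conserved_mb - coding_mb  # the "other 3.5%"

# Sequence left for the 'production' phase after the 1% pilot.
production_mb = GENOME_MB - PILOT_MB

if __name__ == "__main__":
    print(f"Genome (inferred):        {GENOME_MB} Mb")
    print(f"Conserved in mammals:     {conserved_mb:.0f} Mb")
    print(f"Coding sequence:          {coding_mb:.0f} Mb")
    print(f"Conserved, noncoding:     {conserved_noncoding_mb:.0f} Mb")
    print(f"Left for production:      {production_mb} Mb")
```

Under these assumptions, the conserved but noncoding fraction Feingold asks about is more than three times the size of the entire pilot region.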
As ENCODE project data are verified, they will be made freely available via the UCSC Genome Browser and other databases, and some of the new techniques being developed by consortium members are already making their way into publication (see Dorschner et al., pp. 219–225). Feingold says funding for the initial phases currently covers three years, and she anticipates a far longer road beyond that—but she is already greatly encouraged by the findings and collaborations that have emerged to date, indicating that getting there may be much more than half the fun.
References
RESEARCH PAPERS
ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science 306, 636–640 (2004).
WEB SITES
The ENCODE project homepage: http://www.genome.gov/ENCODE
The UCSC genome browser: http://genome.cse.ucsc.edu/ENCODE
Cite this article
Eisenstein, M. Too much information? Not for long. Nat Methods 1, 190 (2004). https://doi.org/10.1038/nmeth1204-190a