Genome 'census' reveals hidden riches

Fruitfly and nematode studies could advance understanding of the human genome.

A genome-wide map is helping researchers to decode patterns of gene regulation. Credit: Getty

A sweeping study of fruitfly and nematode genomes has uncovered thousands of new genes, providing a better understanding of how the complex genetic networks needed to guide an animal through development are generated.

The study, reported today in four papers in Science1,2 and Nature3,4 as well as a suite of publications in other journals, is the first fruit of a project called modENCODE (Model Organism ENCyclopedia Of DNA Elements), which aims to map all of the functional elements in the genome, including those that regulate gene expression. ModENCODE teams generated nearly 1,000 new data sets, including thorough tallies of the RNA molecules produced at different stages of development, as well as maps of the DNA-binding sites used by proteins called transcription factors, which regulate gene expression.

Altogether, the modENCODE team has uncovered 100,000 new elements in the fruitfly genome that serve as a template for RNA molecules, says Susan Celniker, a geneticist at the Lawrence Berkeley National Laboratory (LBNL) in Berkeley, California, and a leader of one of the ten modENCODE teams. Among these new features are 1,938 previously unrecognized genes.

"It's quite amazing," says Mike Cherry, a computational biologist at Stanford University in California who was not a co-author on the papers, but who is an external adviser to the modENCODE project. "ModENCODE has made a great step in determining the topology of the chromosome by taking a census of its elements."

Combinatorial clues

In 2007, a related project called ENCODE released an in-depth characterization of 1% of the human genome5. Although it was a landmark moment for the field, focusing on such a small portion of the genome made it difficult to uncover patterns of gene regulation, says Manolis Kellis, a computational biologist at the Massachusetts Institute of Technology in Cambridge.

That year, modENCODE was born. The project focuses on two widely used models of animal development: the fruitfly Drosophila melanogaster and the nematode Caenorhabditis elegans. But researchers hope that the results will also inform their efforts to map the dark recesses of the human genome, particularly those mysterious stretches of DNA that do not seem to code for protein, and which make up 99% of the genome.

Within those vast expanses of DNA are elements that are important for regulating gene expression. But those elements can be difficult to find based on sequence alone. To help crack the regulatory code, the modENCODE consortium also looked for patterns in the many chemical modifications, or 'marks', carried by some proteins that interact with DNA. Those marks were known to be important for regulating gene expression, but previous studies had analysed the distribution of a single type of chemical modification at a time, says Gary Karpen, who studies epigenetics at the LBNL.

Karpen and his colleagues decided to look for patterns in the combinations of different marks. They tracked 18 different chemical modifications to DNA-associated proteins called histones, and were able to pinpoint nine predominant patterns that are associated with differences in the level at which associated genes are expressed. The team then used this information to identify additional genetic elements that regulate gene expression.

The human touch

Such information could be particularly useful as the era of personalized genomics draws near, says Mark Gerstein, a professor of biomedical informatics at Yale University in New Haven, Connecticut, and a member of the modENCODE consortium. Most of the variation among genomes will reside in those spacious noncoding regions, he notes.

Furthermore, efforts to map DNA sequences associated with disease have sometimes yielded hits in noncoding regions with unknown function, says Kellis. "It's a major question right now: how are we going to be able to interpret all of these disease variants sitting in the middle of these large, intergenic noncoding regions?" he asks.

While the modENCODE consortium has ploughed through its data, the ENCODE team has also tackled a full-genome analysis. ENCODE, which has received US$161.9 million from the US National Human Genome Research Institute, hopes to publish the work in the next year, says Gerstein. And modENCODE, which has thus far received $69.3 million, is not finished either: it still has over a year of funding, and discussions about a possible second phase of the project are underway, says Celniker.

Ultimately, the goal is to use these analyses to determine how the genome guides an animal as it develops and responds to the environment. "That's a bigger challenge," says Karpen. "But we're many steps closer now that we have a map."

References

1. Gerstein, M. B. et al. Science 330, 1775-1787 (2010).

2. The modENCODE Consortium et al. Science 330, 1787-1797 (2010).

3. Kharchenko, P. V. et al. Nature doi:10.1038/nature09725 (2010).

4. Graveley, B. R. et al. Nature doi:10.1038/nature09715 (2010).

5. The ENCODE Project Consortium. Nature 447, 799-816 (2007).

Ledford, H. Genome 'census' reveals hidden riches. Nature (2010). https://doi.org/10.1038/news.2010.687