Nature 450, 219-232 (8 November 2007) | doi:10.1038/nature06340; Received 21 July 2007; Accepted 4 October 2007

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

Alexander Stark1,2,35, Michael F. Lin1,2,35, Pouya Kheradpour2,35, Jakob S. Pedersen3,4,35, Leopold Parts5,6, Joseph W. Carlson7, Madeline A. Crosby8, Matthew D. Rasmussen2, Sushmita Roy9, Ameya N. Deoras2, J. Graham Ruby10,11, Julius Brennecke12, Harvard FlyBase curators, Berkeley Drosophila Genome Project, Emily Hodges12, Angie S. Hinrichs4, Anat Caspi13, Benedict Paten4,5,14, Seung-Won Park15, Mira V. Han16, Morgan L. Maeder17, Benjamin J. Polansky17, Bryanne E. Robson17, Stein Aerts18,19, Jacques van Helden20, Bassem Hassan18,19, Donald G. Gilbert21, Deborah A. Eastman17, Michael Rice22, Michael Weir23, Matthew W. Hahn16, Yongkyu Park15, Colin N. Dewey24, Lior Pachter25,26, W. James Kent4, David Haussler4, Eric C. Lai27, David P. Bartel10,11, Gregory J. Hannon12, Thomas C. Kaufman21, Michael B. Eisen28,29, Andrew G. Clark30, Douglas Smith31, Susan E. Celniker7, William M. Gelbart8,32 & Manolis Kellis1,2

  1. The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA
  2. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA
  3. The Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200 Copenhagen N, Denmark
  4. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA
  5. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
  6. Institute of Computer Science, University of Tartu, Estonia
  7. BDGP, LBNL, 1 Cyclotron Road MS 64-0119, Berkeley, California 94720, USA
  8. FlyBase, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA
  9. Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, USA
  10. Department of Biology, MIT, Cambridge, Massachusetts 02139, USA
  11. Whitehead Institute, Cambridge, Massachusetts 02142, USA
  12. Cold Spring Harbor Laboratory, Watson School of Biological Sciences, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA
  13. University of California, San Francisco/University of California, Berkeley Joint Graduate Group in Bioengineering, Berkeley, California 97210, USA
  14. EMBL Nucleotide Sequence Submissions, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
  15. Department of Cell Biology and Molecular Medicine, G-629, MSB, 185 South Orange Avenue, UMDNJ-New Jersey Medical School, Newark, New Jersey 07103, USA
  16. Department of Biology and School of Informatics, Indiana University, Bloomington, Indiana 47405, USA
  17. Department of Biology, Connecticut College, New London, Connecticut 06320, USA
  18. Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, VIB, 3000 Leuven, Belgium
  19. Department of Human Genetics, K. U. Leuven School of Medicine, 3000 Leuven, Belgium
  20. Département de Biologie Moléculaire, Université Libre de Bruxelles, 1050 Brussels, Belgium
  21. Department of Biology, Indiana University, Bloomington, Indiana 47405, USA
  22. Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, USA
  23. Biology Department, Wesleyan University, Middletown, Connecticut 06459, USA
  24. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA
  25. Department of Mathematics, University of California at Berkeley, Berkeley, California 94720, USA
  26. Department of Computer Science, University of California at Berkeley, Berkeley, California 94720, USA
  27. Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA
  28. Graduate Group in Biophysics, Department of Molecular and Cell Biology, and Center for Integrative Genomics, University of California, Berkeley, California 94720, USA
  29. Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, California 94720, USA
  30. Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA
  31. Agencourt Bioscience Corporation, 500 Cummings Center, Suite 2450, Beverly, Massachusetts 01915, USA
  32. The Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA
  33. FlyBase, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA.
  34. BDGP, LBNL, 1 Cyclotron Road MS 64-0119, Berkeley, California 94720, USA.
  35. These authors contributed equally to this work.
  36. Lists of participants and affiliations appear at the end of the paper.

Correspondence to: Manolis Kellis1,2. Correspondence and requests for materials should be addressed to M.K.


Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or 'evolutionary signatures', dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.

