Abstract

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies.


Acknowledgements

We thank the National Human Genome Research Institute (NHGRI) for continued support. A.S. was supported in part by the Schering AG/Ernst Schering Foundation and in part by the Human Frontier Science Program Organization (HFSPO). P.K. was supported in part by a National Science Foundation Graduate Research Fellowship. J.S.P. thanks B. Raney and R. Baertsch, and the Danish Medical Research Council and the National Cancer Institute for support. J.B. thanks the Schering AG/Ernst Schering Foundation for a postdoctoral fellowship. L. Parts thanks J. Vilo. S.R. was supported by an HHMI-NIH/NIBIB Interfaces Training Grant and thanks T. Lane and M. Werner-Washburne. D.H., D.P.B., G.J.H. and T.C.K. are Investigators of the Howard Hughes Medical Institute, and B.P., J.G.R., E.H. and J.B. are affiliated with these investigators. J.W.C. and S.E.C. were supported by the NHGRI. M.K. was supported by start-up funds from the MIT Electrical Engineering and Computer Science Laboratory, the Broad Institute of MIT and Harvard, and the MIT Computer Science and Artificial Intelligence Laboratory, and by the Distinguished Alumnus (1964) Career Development Professorship.

Author Contributions Organizing committee: Manolis Kellis, William Gelbart, Doug Smith, Andrew G. Clark, Michael B. Eisen, Thomas C. Kaufman; protein-coding gene prediction: Michael F. Lin, Ameya N. Deoras, Mira V. Han, Matthew W. Hahn, Donald G. Gilbert, Michael Weir, Michael Rice, Manolis Kellis; manual curation of protein-coding genes: Madeline A. Crosby, Harvard FlyBase curators, William M. Gelbart; validation of protein-coding genes: Joseph W. Carlson, Berkeley Drosophila Genome Project, Susan E. Celniker; non-coding RNA gene prediction: Jakob S. Pedersen, David Haussler, Yongkyu Park, Seung-Won Park, Manolis Kellis; microRNA gene prediction: Alexander Stark, Pouya Kheradpour, Leopold Parts, Manolis Kellis; microRNA cloning and sequencing: Julius Brennecke, Emily Hodges, Gregory J. Hannon; microRNA target prediction: Alexander Stark, J. Graham Ruby, Manolis Kellis, Eric C. Lai, David P. Bartel; motif identification: Alexander Stark, Pouya Kheradpour, Manolis Kellis; motif instance prediction: Alexander Stark, Pouya Kheradpour, Sushmita Roy, Morgan L. Maeder, Benjamin J. Polansky, Bryanne E. Robson, Deborah A. Eastman, Stein Aerts, Bassem Hassan, Jacques van Helden, Manolis Kellis; genome alignments: Angie S. Hinrichs, W. James Kent, Anat Caspi, Lior Pachter, Colin N. Dewey, Benedict Paten; phylogeny and branch length estimation: Matthew D. Rasmussen, Manolis Kellis; final manuscript preparation: Alexander Stark, Michael F. Lin, Pouya Kheradpour, Jakob Pedersen, Manolis Kellis.

Author information

Author notes

    • Alexander Stark
    • Michael F. Lin
    • Pouya Kheradpour
    • Jakob S. Pedersen

    These authors contributed equally to this work.

Affiliations

  1. The Broad Institute, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02140, USA

    • Alexander Stark
    • Michael F. Lin
    • Manolis Kellis
  2. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts 02139, USA

    • Alexander Stark
    • Michael F. Lin
    • Pouya Kheradpour
    • Matthew D. Rasmussen
    • Ameya N. Deoras
    • Manolis Kellis
  3. The Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen N, Denmark

    • Jakob S. Pedersen
  4. Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California 95064, USA

    • Jakob S. Pedersen
    • Angie S. Hinrichs
    • Benedict Paten
    • W. James Kent
    • David Haussler
  5. Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK

    • Leopold Parts
    • Benedict Paten
  6. Institute of Computer Science, University of Tartu, Estonia

    • Leopold Parts
  7. BDGP, LBNL, 1 Cyclotron Road MS 64-0119, Berkeley, California 94720, USA

    • Joseph W. Carlson
    • Susan E. Celniker
  8. FlyBase, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA

    • Madeline A. Crosby
    • William M. Gelbart
  9. Department of Computer Science, University of New Mexico, Albuquerque, New Mexico 87131, USA

    • Sushmita Roy
  10. Department of Biology, MIT, Cambridge, Massachusetts 02139, USA

    • J. Graham Ruby
    • David P. Bartel
  11. Whitehead Institute, Cambridge, Massachusetts 02142, USA

    • J. Graham Ruby
    • David P. Bartel
  12. Cold Spring Harbor Laboratory, Watson School of Biological Sciences, 1 Bungtown Road, Cold Spring Harbor, New York 11724, USA

    • Julius Brennecke
    • Emily Hodges
    • Gregory J. Hannon
  13. University of California, San Francisco/University of California, Berkeley Joint Graduate Group in Bioengineering, Berkeley, California 97210, USA

    • Anat Caspi
  14. EMBL Nucleotide Sequence Submissions, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

    • Benedict Paten
  15. Department of Cell Biology and Molecular Medicine, G-629, MSB, 185 South Orange Avenue, UMDNJ-New Jersey Medical School, Newark, New Jersey 07103, USA

    • Seung-Won Park
    • Yongkyu Park
  16. Department of Biology and School of Informatics, Indiana University, Indiana 47405, USA

    • Mira V. Han
    • Matthew W. Hahn
  17. Department of Biology, Connecticut College, New London, Connecticut 06320, USA

    • Morgan L. Maeder
    • Benjamin J. Polansky
    • Bryanne E. Robson
    • Deborah A. Eastman
  18. Laboratory of Neurogenetics, Department of Molecular and Developmental Genetics, VIB, 3000 Leuven, Belgium

    • Stein Aerts
    • Bassem Hassan
  19. Department of Human Genetics, K. U. Leuven School of Medicine, 3000 Leuven, Belgium

    • Stein Aerts
    • Bassem Hassan
  20. Department de Biologie Moleculaire, Universite Libre de Bruxelles, 1050 Brussels, Belgium

    • Jacques van Helden
  21. Department of Biology, Indiana University, Bloomington, Indiana 47405, USA

    • Donald G. Gilbert
    • Thomas C. Kaufman
  22. Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut 06459, USA

    • Michael Rice
  23. Biology Department, Wesleyan University, Middletown, Connecticut 06459, USA

    • Michael Weir
  24. Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin 53706, USA

    • Colin N. Dewey
  25. Department of Mathematics, University of California at Berkeley, Berkeley, California 94720, USA

    • Lior Pachter
  26. Department of Computer Science, University of California at Berkeley, Berkeley, California 94720, USA

    • Lior Pachter
  27. Department of Developmental Biology, Memorial Sloan-Kettering Cancer Center, New York, New York 10021, USA

    • Eric C. Lai
  28. Graduate Group in Biophysics, Department of Molecular and Cell Biology, and Center for Integrative Genomics, University of California, Berkeley, California 94720, USA

    • Michael B. Eisen
  29. Lawrence Berkeley National Laboratory, Life Sciences Division, Berkeley, California 94720, USA

    • Michael B. Eisen
  30. Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York 14853, USA

    • Andrew G. Clark
  31. Agencourt Bioscience Corporation, 500 Cummings Center, Suite 2450, Beverly, Massachusetts 01915, USA

    • Douglas Smith
  32. The Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts 02138, USA

    • William M. Gelbart
  33. FlyBase, The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, Massachusetts 02138, USA

    • Madeline A. Crosby
    • Beverley B. Matthews
    • Andrew J. Schroeder
    • L. Sian Gramates
    • Susan E. St Pierre
    • Margaret Roark
    • Kenneth L. Wiley Jr
    • Rob J. Kulathinal
    • Peili Zhang
    • Kyl V. Myrick
    • Jerry V. Antone
    • William M. Gelbart
  34. BDGP, LBNL, 1 Cyclotron Road MS 64-0119, Berkeley, California 94720, USA

    • Joseph W. Carlson
    • Charles Yu
    • Soo Park
    • Kenneth H. Wan
    • Susan E. Celniker
  35. Lists of participants and affiliations appear at the end of the paper.

Consortia

  1. Harvard FlyBase curators

  2. Berkeley Drosophila Genome Project

Authors

  1. Alexander Stark

  2. Michael F. Lin

  3. Pouya Kheradpour

  4. Jakob S. Pedersen

  5. Leopold Parts

  6. Joseph W. Carlson

  7. Madeline A. Crosby

  8. Matthew D. Rasmussen

  9. Sushmita Roy

  10. Ameya N. Deoras

  11. J. Graham Ruby

  12. Julius Brennecke

  13. Emily Hodges

  14. Angie S. Hinrichs

  15. Anat Caspi

  16. Benedict Paten

  17. Seung-Won Park

  18. Mira V. Han

  19. Morgan L. Maeder

  20. Benjamin J. Polansky

  21. Bryanne E. Robson

  22. Stein Aerts

  23. Jacques van Helden

  24. Bassem Hassan

  25. Donald G. Gilbert

  26. Deborah A. Eastman

  27. Michael Rice

  28. Michael Weir

  29. Matthew W. Hahn

  30. Yongkyu Park

  31. Colin N. Dewey

  32. Lior Pachter

  33. W. James Kent

  34. David Haussler

  35. Eric C. Lai

  36. David P. Bartel

  37. Gregory J. Hannon

  38. Thomas C. Kaufman

  39. Michael B. Eisen

  40. Andrew G. Clark

  41. Douglas Smith

  42. Susan E. Celniker

  43. William M. Gelbart

  44. Manolis Kellis

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Manolis Kellis.

Supplementary information

PDF files

  1. Supplementary Information

    The file contains extensive Supplementary Information.
