  • Review Article

The functional repertoires of metazoan genomes

Key Points

  • Predicting the number of protein-coding genes that are present in a genome is complicated by the presence of pseudogenes, by misannotations of non-coding sequence, by the incompleteness of assemblies and by structural variation. Nevertheless, it seems that gene counts in metazoan genomes vary by less than twofold, from approximately 13,000 to 23,000.

  • Different genes in a species evolve at different rates. At one end of the spectrum, genes involved in transcriptional and developmental regulation, and other housekeeping genes, tend to duplicate and change their coding and non-coding sequences slowly. At the other end of the spectrum, genes with products that are involved in sensing and responding to the environment tend to experience elevated rates of evolutionary change.

  • Approximately 1% of the human genome lies within protein-coding sequence. A further 2% does not encode protein but seems to be functional in divergent mammals, such as mice and humans. Up to an additional 7% of the human genome might be lineage-specific, that is, functional only in closely related primate species.

  • Vertebrate genomes seem to contain between two and five times more functional sequence than fruitfly genomes.

  • The majority of mammalian transposable elements exhibit no evidence of constraint and are therefore unlikely to be functional. Mobile genetic elements seem to contribute only a minority of eutherian-specific functional sequence.

  • Rates of chromosomal rearrangements vary dramatically across the metazoa but are unlikely to contribute greatly to phenotypic change.

  • Chromosomal rearrangements seem to be concentrated within 'fragile' GC-rich sequence. Increases in GC-content might result from sustained high rates of biased gene conversion.

  • Such mutational biases, together with a reduced efficacy of purifying selection in regions of sustained low recombination, might better explain the landscape of amniotic and nematode chromosomes than models invoking episodes of adaptive evolution.
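
The proportions quoted in the Key Points can be combined into a rough tally. The sketch below is illustrative arithmetic only, not a method from the Review: it assumes a haploid human genome of about 3.1 Gb (a figure not given above), treats the three sequence categories as non-overlapping, and uses hypothetical names throughout.

```python
# Illustrative arithmetic only; not the author's method.
# ASSUMPTION: haploid human genome of ~3.1e9 bp (not stated in the text).
GENOME_BP = 3.1e9

# Fractions of the human genome quoted in the Key Points.
FRACTIONS = {
    "protein_coding": 0.01,        # ~1% lies within protein-coding sequence
    "conserved_noncoding": 0.02,   # further 2% functional in divergent mammals
    "lineage_specific_max": 0.07,  # up to an additional 7% (upper bound)
}

# Approximate range of metazoan gene counts quoted in the Key Points.
GENE_COUNT_MIN, GENE_COUNT_MAX = 13_000, 23_000


def functional_tally(genome_bp, fractions):
    """Return per-category base-pair counts and the summed fraction."""
    counts = {name: frac * genome_bp for name, frac in fractions.items()}
    total_fraction = sum(fractions.values())
    return counts, total_fraction


counts, total = functional_tally(GENOME_BP, FRACTIONS)
for name, bp in counts.items():
    print(f"{name}: ~{bp / 1e6:.0f} Mb")
print(f"upper-bound functional fraction: {total:.0%}")

# Ratio of the largest to the smallest quoted gene count (below 2, i.e. "less than twofold").
gene_count_ratio = GENE_COUNT_MAX / GENE_COUNT_MIN
print(f"gene-count ratio: {gene_count_ratio:.2f}")
```

Under these assumptions the three categories sum to an upper bound of roughly 10% of the genome, most of which would be lineage-specific rather than deeply conserved sequence.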

Abstract

Metazoan genomes are being sequenced at an increasingly rapid rate. For each new genome, the number of protein-coding genes it encodes and the amount of functional DNA it contains are known only inaccurately. Nevertheless, there have been considerable recent advances in identifying protein-coding and non-coding sequences that have remained constrained in diverse species. However, these approaches struggle to pinpoint genomic sequences that are functional in some species but that are absent or not functional in others. Yet it is here, encoded in lineage-specific and functional sequence, that we expect physiological differences between species to be most concentrated.

Figure 1: Metazoan phylogeny.
Figure 2: A neutral insertion and deletion (indel) model.
Figure 3: Conserved sequence in the fruitfly and the human.
Figure 4: Fragile breakage model.
Figure 5: Nematode chromosomal organization.

Acknowledgements

I am grateful to all in my group for assistance. My apologies go out to all authors whose work was not cited in this Review owing to space restrictions.

Related links

FURTHER INFORMATION

Ensembl Genome Browser

MRC Functional Genomics Unit

OPTIC: Clade genomics web server

UCSC Genome Browser

Glossary

Hill–Robertson interference

When recombination fails to break down linkage disequilibrium between alleles at selected loci, the ability of selection to act on these alleles tends to be reduced.

About this article

Cite this article

Ponting, C. The functional repertoires of metazoan genomes. Nat Rev Genet 9, 689–698 (2008). https://doi.org/10.1038/nrg2413

  • DOI: https://doi.org/10.1038/nrg2413
