Analysis of expressed sequence tags indicates 35,000 human genes


The number of protein-coding genes in an organism provides a useful first measure of its molecular complexity. Single-celled prokaryotes and eukaryotes typically have a few thousand genes; for example, Escherichia coli¹ has 4,300 and Saccharomyces cerevisiae² has 6,000. Evolution of multicellularity appears to have been accompanied by a several-fold increase in gene number, the invertebrates Caenorhabditis elegans³ and Drosophila melanogaster⁴ having 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes by comparing a set of human expressed sequence tag (EST) contigs with human chromosome 22 and with a non-redundant set of mRNA sequences. The two comparisons give mutually consistent estimates of approximately 35,000 genes, substantially lower than most previous estimates. Evolution of the increased physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or alternative splicing than on a substantial increase in gene number.
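The extrapolation behind an estimate of this kind can be illustrated with a simple ratio. The sketch below is not the authors' procedure. It assumes that EST contigs sample genes uniformly across the genome, so that the fraction of contigs matching a fully annotated region approximates the fraction of all genes lying in that region. The function name, its parameters and the numbers in the usage lines are hypothetical placeholders, not data from the study.

```python
def extrapolate_gene_count(genes_in_region: int,
                           contigs_matching_region: int,
                           contigs_total: int) -> float:
    """Ratio extrapolation of a genome-wide gene count (illustrative only).

    Assumption: contigs_matching_region / contigs_total approximates the
    fraction of all genes that lie in the annotated region.
    """
    if contigs_total <= 0 or contigs_matching_region <= 0:
        raise ValueError("contig counts must be positive")
    if contigs_matching_region > contigs_total:
        raise ValueError("matching contigs cannot exceed total contigs")
    # genome-wide genes = genes in region / (matching contigs / total contigs)
    return genes_in_region * contigs_total / contigs_matching_region


# Placeholder numbers, chosen only to show the arithmetic:
# 200 annotated genes in the region, 50 of 1,000 contigs match it.
estimate = extrapolate_gene_count(200, 50, 1000)
print(estimate)
```

If the uniform-sampling assumption fails, for example because highly expressed genes are over-represented among the contigs, a ratio of this form would be biased, which is one reason two independent comparisons are more convincing than one.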



  1. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
  2. Goffeau, A. et al. Life with 6000 genes. Science 274, 563–567 (1996).
  3. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).
  4. Adams, M.D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).
  5. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
  6. Waterston, R. et al. A survey of expressed genes in Caenorhabditis elegans. Nature Genet. 1, 114–123 (1992).
  7. Hillier, L. et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).
  8. Wolfsberg, T.G. & Landsman, D. A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 25, 1626–1632 (1997).
  9. Ewing, B., Hillier, L., Wendl, M. & Green, P. Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).
  10. Ewing, B. & Green, P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).
  11. Bonaldo, M., Lennon, G. & Soares, M.B. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806 (1996).
  12. Bernardi, G. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29, 445–476 (1995).
  13. Lewin, B. Genes IV 466–481 (Oxford University Press, Oxford, 1990).
  14. Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).
  15. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nature Genet. 7, 345–346 (1994).
  16. Green, P. et al. Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716 (1993).
  17. Mironov, A.A., Fickett, J.W. & Gelfand, M.S. Frequent alternative splicing of human genes. Genome Res. 9, 1288–1293 (1999).
  18. Dickson, D. Gene estimate rises as US and UK discuss freedom of access. Nature 401, 311 (1999).
  19. Larsen, F., Gundersen, G., Lopez, R. & Prydz, H. CpG islands as gene markers in the human genome. Genomics 13, 1095–1107 (1992).



We thank C. Wilson and A. Nichols for programming assistance. This work was supported by a grant from the National Human Genome Research Institute.

Author information

Correspondence to Phil Green.
