  • Letter

Analysis of expressed sequence tags indicates 35,000 human genes

Abstract

The number of protein-coding genes in an organism provides a useful first measure of its molecular complexity. Single-celled prokaryotes and eukaryotes typically have a few thousand genes; for example, Escherichia coli¹ has 4,300 and Saccharomyces cerevisiae² has 6,000. The evolution of multicellularity appears to have been accompanied by a several-fold increase in gene number: the invertebrates Caenorhabditis elegans³ and Drosophila melanogaster⁴ have 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes by comparing a set of human expressed sequence tag (EST) contigs with human chromosome 22 and with a non-redundant set of mRNA sequences. The two comparisons give mutually consistent estimates of approximately 35,000 genes, substantially lower than most previous estimates. The evolution of the greater physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or on alternative splicing than on a substantial increase in gene number.
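The abstract does not spell out the arithmetic that turns these comparisons into a gene count. One common way to reason about it is a coverage ratio: if the EST contigs detect a fraction f of the genes in an independent reference set (for example, the genes annotated on chromosome 22, or a non-redundant mRNA set), and the contigs represent G distinct genes, then the total is roughly G / f. The sketch below assumes that reasoning only; it is not presented as the letter's exact procedure. The function names are hypothetical and the numbers in the example are made up for illustration.

```python
"""Coverage-ratio sketch for turning EST comparisons into a gene-count estimate.

Assumption (not taken from the letter): the reference genes are an unbiased
sample with respect to EST detectability, so the fraction of them detected
approximates the fraction of all genes represented in the EST contigs.
"""
import math


def coverage_fraction(reference_hits: int, reference_total: int) -> float:
    """Fraction of an independent reference gene set that the EST contigs detect."""
    if reference_total <= 0 or not 0 <= reference_hits <= reference_total:
        raise ValueError("require reference_total > 0 and 0 <= reference_hits <= reference_total")
    return reference_hits / reference_total


def estimate_total_genes(distinct_genes_in_ests: float,
                         reference_hits: int,
                         reference_total: int) -> float:
    """Scale the genes represented in the EST set by the inverse of its coverage."""
    f = coverage_fraction(reference_hits, reference_total)
    if f == 0:
        raise ValueError("no reference genes detected; the coverage ratio is undefined")
    return distinct_genes_in_ests / f


def estimates_agree(a: float, b: float, tolerance: float = 0.05) -> bool:
    """True if two estimates differ by at most `tolerance` relative to the larger one."""
    return math.isclose(a, b, rel_tol=tolerance)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only; these are not figures from the letter.
    genes_in_ests = 15_000
    est_ref_a = estimate_total_genes(genes_in_ests, reference_hits=300, reference_total=400)
    est_ref_b = estimate_total_genes(genes_in_ests, reference_hits=290, reference_total=400)
    print(f"reference A: {est_ref_a:,.0f}  reference B: {est_ref_b:,.0f}  "
          f"consistent: {estimates_agree(est_ref_a, est_ref_b)}")
```

Two independent reference sets give two estimates, and their agreement is a check on the method. Both rest on the same assumption, though: if the reference genes are easier to detect by EST sequencing than an average gene (for instance, because they are more highly expressed), f is overstated and G / f would underestimate the total.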

References

  1. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).

  2. Goffeau, A. et al. Life with 6000 genes. Science 274, 563–567 (1996).

  3. The C. elegans Sequencing Consortium. Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018 (1998).

  4. Adams, M.D. et al. The genome sequence of Drosophila melanogaster. Science 287, 2185–2195 (2000).

  5. Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).

  6. Waterston, R. et al. A survey of expressed genes in Caenorhabditis elegans. Nature Genet. 1, 114–123 (1992).

  7. Hillier, L. et al. Generation and analysis of 280,000 human expressed sequence tags. Genome Res. 6, 807–828 (1996).

  8. Wolfsberg, T.G. & Landsman, D. A comparison of expressed sequence tags (ESTs) to human genomic sequences. Nucleic Acids Res. 25, 1626–1632 (1997).

  9. Ewing, B., Hillier, L., Wendl, M. & Green, P. Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

  10. Ewing, B. & Green, P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998).

  11. Bonaldo, M., Lennon, G. & Soares, M.B. Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res. 6, 791–806 (1996).

  12. Bernardi, G. The human genome: organization and evolutionary history. Annu. Rev. Genet. 29, 445–476 (1995).

  13. Lewin, B. Genes IV 466–481 (Oxford University Press, Oxford, 1990).

  14. Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).

  15. Fields, C., Adams, M.D., White, O. & Venter, J.C. How many genes in the human genome? Nature Genet. 7, 345–346 (1994).

  16. Green, P. et al. Ancient conserved regions in new gene sequences and the protein databases. Science 259, 1711–1716 (1993).

  17. Mironov, A.A., Fickett, J.W. & Gelfand, M.S. Frequent alternative splicing of human genes. Genome Res. 9, 1288–1293 (1999).

  18. Dickson, D. Gene estimate rises as US and UK discuss freedom of access. Nature 401, 311 (1999).

  19. Larsen, F., Gundersen, G., Lopez, R. & Prydz, H. CpG islands as gene markers in the human genome. Genomics 13, 1095–1107 (1992).

Acknowledgements

We thank C. Wilson and A. Nichols for programming assistance. This work was supported by a grant from the National Human Genome Research Institute.

Author information

Corresponding author

Correspondence to Phil Green.

About this article

Cite this article

Ewing, B., Green, P. Analysis of expressed sequence tags indicates 35,000 human genes. Nat Genet 25, 232–234 (2000). https://doi.org/10.1038/76115

  • DOI: https://doi.org/10.1038/76115
