Geneticists play the numbers game in vain

New York

The precise number of human genes might never be tallied, geneticists confessed this week, after handing out a cash prize for the nearest guesstimate.

To non-geneticists at least, the admission may come as a surprise. Many had assumed that the human gene count was virtually settled at between 30,000 and 40,000, an estimate announced with great fanfare in February 2001 based on early analyses of the draft human genome sequence.

But two years on, geneticists who gathered at a Cold Spring Harbor Laboratory meeting in New York state acknowledged that they are no closer to a final tally. “I'd say we don't know the true gene number for any organism — and certainly not for humans,” said bioinformatics expert Phil Green of the University of Washington, Seattle.

This uncertainty has now been highlighted by a sweepstake on the human gene tally, dubbed Genesweep. The rules of the wager, which was light-heartedly set up at a Cold Spring Harbor conference in 2000, stated that the winner would be announced this year. More than 460 bets have since been placed.

On 30 May, the winning estimate — the lowest one — was announced by organizer Ewan Birney of the European Bioinformatics Institute in Hinxton, UK. The prize goes to Lee Rowen, who directs a sequencing project at the Institute for Systems Biology in Seattle, Washington.

Rowen's wager of 25,947 is closest to the current reckoning of 24,847 made by the genetic database Ensembl. Like many good gamblers, she describes her number as “a stab”; one runner-up picked 27,462 because his date of birth was 27 April 1962.

The final gene tally is anyone's guess — but it is unlikely to rise again to the estimates of 80,000–100,000 mooted a few years ago. Geneticists at the meeting came up with many reasons why genes — regions of DNA that code for proteins — have proved so difficult to identify.

One reason is that gene-predictor programs, which trawl through DNA for landmark sequences characteristic of a gene, are notoriously unreliable. For instance, they often erroneously pick up pseudogenes, copies of real genes that have become defunct.

Conversely, the programs can miss genes carrying variations in their landmark sequences — and are completely flummoxed by unconventional cases, such as tiny genes, overlapping genes or small genes hidden within larger ones. “No gene predictor will ever get these right,” says fruitfly geneticist Gerald Rubin of the University of California, Berkeley.
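To make the idea of scanning for landmark sequences concrete, here is a deliberately simplified sketch in Python. It is not how any real gene predictor works: it assumes the only landmarks are a start codon (ATG) and an in-frame stop codon on one strand, and the function name `find_orfs` and the `min_codons` threshold are hypothetical choices made for illustration. Even this toy version shows one of the failure modes described above, because raising the length threshold makes a short gene disappear from the results.

```python
# Toy open-reading-frame scan: a sketch only, not a real gene predictor.
# Assumption: a "gene" is ATG followed in frame by a stop codon, forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs for candidate open reading frames.

    min_codons counts codons from the start codon up to, but not
    including, the stop codon. end is the index just past the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAACCCGGGTAGCC"
print(find_orfs(sequence))                # the short candidate is reported
print(find_orfs(sequence, min_codons=5))  # a stricter threshold misses it
```

A threshold low enough to catch tiny genes would also flag many chance matches, which is one way to picture why such programs are both over-inclusive and prone to misses.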

Lines of back-up evidence that are commonly used to strengthen program predictions are also fallible. For example, a putative human gene is considered more likely to be real if it matches a gene also found in databases of mouse, fruitfly or other organisms. But an unknown number of human genes have no obvious match.

With so many obstacles to tracking human genes, “there will never be a final number”, predicts Jean Weissenbach, director of the sequencing centre Génoscope in Evry, France. Ultimately, it may turn out that every person has a different number of genes, as mutations have eliminated minor ones from their genome, he says.

http://www.ensembl.org/Genesweep

Pearson, H. Geneticists play the numbers game in vain. Nature 423, 576 (2003). https://doi.org/10.1038/423576a
