Washington

Celera Genomics last week reached a significant milestone in the race to sequence the human genome. At a US House of Representatives subcommittee hearing, the company announced that it has completed the raw sequencing stage of its project.

But experts from the rival publicly funded Human Genome Project (HGP) believe Celera may have stopped short of its original goal. They question the company's claims that it will assemble the human genome within six weeks, and raise doubts over whether Celera, by itself, will be able to annotate the sequence by the end of the year.

The controversy over Celera's latest claim centres on the ‘shifting finish line’ in the raw sequencing race. When the company first announced in 1998 that it would take on the human genome, it said it would single-handedly sequence the genome 10 times over (10X coverage), the repeats being necessary for higher accuracy. Celera revised this goal to 4X sequence coverage in January this year (see Nature 403, 119; 2000), saying that it would then combine its proprietary data with the HGP's public data.

But, according to sequencing experts such as Philip Green from the University of Washington in Seattle, the ‘endpoint’ that Celera announced last week seems closer to 3.3X coverage.

The company claims to have achieved 11X ‘clone coverage’, a measurement that is notably different from sequence coverage. In each unit of sequence coverage, machines read the entire length of every DNA fragment. But in clone coverage, sequencers only scan the ends of each fragment, skipping over the middle.
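The gap between the two measurements can be illustrated with a little arithmetic. The sketch below is not Celera's calculation: the genome size, read length and clone libraries are round, hypothetical figures chosen only to show how reading just the two ends of each cloned fragment yields a clone coverage several times larger than the sequence coverage.

```python
# Illustrative only: every number below is an assumption, not Celera's data.

def sequence_coverage(n_reads, read_length, genome_size):
    """Bases actually read, divided by the size of the genome."""
    return n_reads * read_length / genome_size


def clone_coverage(libraries, genome_size):
    """Total length of the cloned fragments (ends plus unread middle),
    divided by the size of the genome."""
    spanned = sum(n_clones * insert_length for n_clones, insert_length in libraries)
    return spanned / genome_size


GENOME_SIZE = 3_000_000_000   # assumed round figure for the human genome, in bases
READ_LENGTH = 500             # assumed length of one end read, in bases

# Hypothetical clone libraries: (number of clones, fragment length in bases)
LIBRARIES = [
    (5_000_000, 2_000),
    (2_000_000, 10_000),
]

# Each clone is read once from each end, so it contributes two reads.
n_reads = 2 * sum(n_clones for n_clones, _ in LIBRARIES)

seq_cov = sequence_coverage(n_reads, READ_LENGTH, GENOME_SIZE)
clone_cov = clone_coverage(LIBRARIES, GENOME_SIZE)

print(f"sequence coverage: {seq_cov:.1f}X")
print(f"clone coverage:    {clone_cov:.1f}X")
```

With these made-up libraries the same set of reads gives a clone coverage several times the sequence coverage, which is why the two figures cannot be compared directly.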

Clone coverage lends itself to Celera's strategy for genome sequencing. The company uses a ‘whole-genome shotgun’ approach, in which it blasts a complete piece of DNA into millions of fragments of varying sizes, reads only their ends, then uses computer algorithms to match up the overlaps.

According to the company, the paired clone ends allow the genome to be assembled “much more completely than single-stranded sequencing methods allow at comparable levels of sequence coverage”. But critics such as Green argue that Celera needs a higher amount of clone coverage to achieve the level of sequence coverage that ensures completeness and accuracy.

The public project, by comparison, is doing ‘clone-by-clone’ sequencing, reading each DNA fragment in its entirety, then placing it on a physical map.

The HGP's map serves as a scaffold on which both the public and the private projects can affix data. Robert Waterston, head of the department of genetics at Washington University in St Louis, told the committee that because Celera has access to that scaffold — as well as to the public project's data — it will always be ahead in the gene mapping effort.

Waterston believes that Celera's level of coverage, when assembled, will result in over 40,000 gaps. He based his estimate on Celera's sequence of the fruitfly Drosophila, which, at a fraction of the size of the human genome, contained about 1,200 gaps. Green says the number of gaps in Celera's human sequence might be higher, as the company did not sequence the genome as many times over as it did the Drosophila genome.
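Green's point that lower coverage leaves more gaps can be illustrated with the standard Lander-Waterman model of random shotgun sequencing. The sketch below is not the method Waterston or Green used: the genome size and read length are assumed round figures, and the model ignores paired clone ends and the public data, both of which close gaps in practice. Its absolute numbers are therefore not comparable to the 40,000 estimate; only the trend with coverage is meaningful.

```python
import math

# Idealised Lander-Waterman model: reads land uniformly at random on the
# genome and any overlap at all is enough to join two reads.
# All parameter values are assumptions for illustration.

def uncovered_fraction(coverage):
    """Expected fraction of the genome that no read covers."""
    return math.exp(-coverage)


def expected_gaps(genome_size, read_length, coverage):
    """Expected number of contigs, and hence roughly of gaps,
    at a given sequence coverage."""
    n_reads = coverage * genome_size / read_length
    return n_reads * math.exp(-coverage)


GENOME_SIZE = 3_000_000_000   # assumed round figure for the human genome, in bases
READ_LENGTH = 500             # assumed read length, in bases

for cov in (3.3, 4.0, 5.0, 10.0):
    gaps = expected_gaps(GENOME_SIZE, READ_LENGTH, cov)
    missing = uncovered_fraction(cov)
    print(f"{cov:>4}X: ~{gaps:,.0f} gaps, {missing:.4%} of bases unread")
```

In this model the expected number of gaps falls roughly exponentially as coverage rises, so the difference between about 3.3X and 10X is far larger than the ratio of the two figures suggests.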

Nevertheless, Waterston, Celera's president Craig Venter and Neil Lane, President Bill Clinton's science adviser, all said at the subcommittee hearing that a formal collaboration would result in a better quality genome than either group could produce on its own. Indeed, each side may have something the other desires.

Although Celera has access to the public data, it has no guarantee that publicly funded scientists will volunteer to help annotate the human genome and fill in any gaps that remain once the two data sets are merged.

Public scientists participated in an ‘annotation jamboree’ to annotate the genome of Drosophila. But that was on the understanding that the fruits of their labour would be available in GenBank, the publicly funded database. Celera has said it will make its version of the human genome available publicly, but only in its own database, and with restrictions on the use and redistribution of sequence data (see Nature 404, 324; 2000).

The public project, in turn, could use Celera's data to move up the completion date of its own project. The HGP has planned for a ‘rough draft’ of the genome with 5X sequence coverage by the end of this year, and a complete copy with 10X coverage in 2003.

Celera also plans to release its draft of the human genome by the end of this year — but only after it is fully assembled. By contrast, scientists can see new data being added to GenBank by the public project every 24 hours. This discrepancy between the two approaches has blocked formal cooperation from the beginning.

Celera's insistence that scientists should not be able to download, add to or redistribute information from either its prospective free or current public databases has placed further obstacles in the way of collaboration. Resolving them would allow for a more comprehensive human genome database sooner than either the public or private projects initially anticipated.