Washington and Paris

Celera Genomics Corporation appears to have cut back on its plans to complete a high-quality sequence of the human genome single-handedly. The company now says that it intends to achieve the same end by combining lower-quality sequence with data from the international, publicly funded Human Genome Project (HGP).

At a press conference on Monday in Rockville, Maryland, Celera announced that it had itself sequenced 81 per cent of the human genome, and had combined this with publicly available data to cover 90 per cent of the genome.

Venter: modifying sequencing plans. Credit: HANK MORGAN/SPL

Craig Venter, Celera's president, said the company plans to stop sequencing the human genome in June at the '4X' level, meaning that four bases of sequence will have been generated for every base of genome, instead of at 10X as originally planned.

Celera's formation in 1998 led to an acceleration of the public project, which is funded by the US National Institutes of Health and Department of Energy, and Britain's Wellcome Trust (see Nature 395, 207; 1998). By using the public data, Celera will now sequence the human genome fewer times than previously planned. "If we had to only use our own data, we might have to go as high as 10X," Venter said.

Instead, it will combine these data with the 5X draft that the publicly funded project is expected to produce this spring. Venter described Celera's use of the public data as a "de facto collaboration", and said the company would "give attribution and credit" to the public project.

The acknowledgement seems to mark the end of Celera's original plan to 'shotgun sequence' the whole human genome on its own. That approach involves sequencing millions of DNA fragments without knowing where they belong in the genome, and then using sophisticated software to work out which fragment goes where.

Now Celera will need to hang its sequence data on the framework produced by the public project. Celera used the same approach to sequence the Drosophila genome, in collaboration with the Berkeley Genome Sequencing Project (see Nature 401, 729; 1999).

Celera claimed at its press conference to have sequenced 81 per cent of the human genome, but only to a depth of 1.75X. Combining these data with the public sequence has raised the total to 90 per cent, at 2X coverage.
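
As a rough check on how sequencing depth relates to the fraction of a genome covered, the sketch below uses the idealized Lander-Waterman model, in which reads land uniformly at random and the expected fraction of bases sequenced at least once at depth c is 1 - e^(-c). The model is an assumption brought in here for illustration; it is not attributed to Celera or the public project, and real coverage falls short of it wherever sequence is repetitive or hard to clone. The function name is hypothetical.

```python
import math

def expected_fraction_covered(depth):
    """Expected fraction of bases sequenced at least once.

    Assumes reads fall uniformly at random (idealized Poisson model),
    so the chance that a given base is missed at depth c is exp(-c).
    """
    return 1.0 - math.exp(-depth)

# Depths mentioned in the article: 1.75X, 2X, 4X, 5X and 10X.
for depth in (1.75, 2.0, 4.0, 5.0, 10.0):
    print(f"{depth}X -> {expected_fraction_covered(depth):.1%}")
```

At 1.75X the model gives about 83 per cent, in the same range as the 81 per cent that Celera reported.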

Hamilton Smith, 1978 Nobel laureate in medicine and a co-founder of Celera, who is now its senior director of DNA resources, said the company's shotgun data match up well with the HGP's 'clone-by-clone' data. "Eighty per cent of the public sequence is covered in our data."

The public data represent about 50 per cent of the total in Celera's database, so the 20 per cent of unique data provided by the HGP represents a net 10 per cent contribution beyond Celera's, Smith explained.
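
Smith's arithmetic can be checked with a short sketch. The variable and function names are illustrative, and the inputs are the rounded figures quoted above, not values taken from Celera's database.

```python
# Rounded figures quoted by Smith; names are illustrative assumptions.
public_share_of_database = 0.50    # public data as a fraction of Celera's combined database
public_overlap_with_celera = 0.80  # fraction of public sequence also covered by Celera's data

def net_public_contribution(public_share, overlap):
    """Fraction of the combined database supplied only by the public data."""
    unique_fraction = 1.0 - overlap  # public sequence not matched by Celera's data
    return public_share * unique_fraction

net = net_public_contribution(public_share_of_database, public_overlap_with_celera)
print(f"Net public contribution: {net:.0%}")
```

With half the database coming from the public project and one fifth of that being unique, the product is one tenth of the total, which matches the 10 per cent that Smith cites.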

John Sulston, director of Britain's Sanger Centre in Cambridge, where one third of the public sequencing is being carried out, points out that Celera's coverage is only to a depth of 1.5X, whereas the public project has already completed half of the sequence at 5X.

He says the Celera press release could be rewritten to say that, with the company's 1.5X data on top of the 5X data, "half [the genome] is really very well covered", and that Venter has helped "extend our 50 per cent".

Members of the public effort have agreed to deposit their sequence data every 24 hours. Because Celera hasn't accepted any public money, it is under no obligation to make its data public, but Venter repeated on Monday that it will eventually do so.

Venter estimated that the combined data contain 2.58 billion base pairs (Gb), and that 97 per cent of human genes are now in the company's database. But he declined to estimate the number of genes these data contain, adding that most genes are still in fragments.

Philip Green, a biocomputing expert at the University of Washington, estimates that the sequencing outputs of Celera and the public project are running neck and neck, at 1.5 Gb per month. He says the total amount of raw data required for adequate coverage of the genome could be obtained by the summer, provided the data from both groups are combined. "Neither group seems likely to get to that point by itself before the end of the year," he says.