The whole-genome shotgun technique that has been adopted by members of the Human Genome Project (HGP) is affecting the consortium's ability to follow through on its promises over data release.

Before the high-throughput technique was taken up by the project, consortium members had agreed to place new data on a public database every 24 hours. HGP members had used this standard to differentiate the public project from the private one run by Celera Genomics of Rockville, Maryland. Celera intended to release its human data only on publication.

But the publicly funded centres are now using the shotgun technique to sequence the mouse and other organisms. And they are finding that this makes it difficult for them to stick to their agreement to release sequencing data immediately.

The HGP's initial sequencing methods lent themselves well to daily data release, because scientists could continually combine small fragments of deciphered DNA into larger ones. But shotgun sequencing is more problematic, as the technique requires that the small sequence traces produced daily are not assembled until relatively late in the procedure. And the small pieces are shorter than the HGP's agreement specifies for sequence release.

Shotgun success: the mouse genome has already been sequenced once over. Credit: AP

There is no agreement yet on how to deal with shotgun data. Baylor College of Medicine at Houston, one of five sequencing centres that worked originally on the mouse genome, has placed its own small amount of shotgun data on its website. But the Whitehead Institute for Biomedical Research in Cambridge, Massachusetts, is not releasing data on its web page. And the institute — along with Washington University in St Louis and the United Kingdom's Sanger Centre, near Cambridge — is sequencing the bulk of the mouse genome.

The Whitehead has instead given its data to repositories in Europe and the United States. But these repositories cannot yet accommodate shotgun data. The sequenced mouse fragments have therefore not been made public, even though the total genome of the animal has been sequenced once over.

The data should be available in about three weeks, once programmers have ironed out the technical kinks in the new databases, says David Lipman, director of the National Institutes of Health's National Center for Biotechnology Information.

The public project's switch to shotgun sequencing will nonetheless require some changes to the HGP's original agreement, says Mark Guyer, assistant director for scientific coordination at the National Institutes of Health's National Human Genome Research Institute, which is managing the US end of the publicly funded mouse project.

Strict adherence to earlier agreements would have meant that the mouse data were not publicly available until the organism was sequenced three times over and then assembled, a milestone targeted for April. Setting up a database of mouse sequence fragments may be an interim solution. “If we had kept to our earlier policy, the data would not have been released for six months,” says Guyer.