Paris

The ambitions of IBM, the world's largest information technology company, to become a major player in genomics took a step forward last week. The company joined a consortium set up to produce a map of human genetic markers known as single-nucleotide polymorphisms, or SNPs.

This US$50-million joint effort by drug companies and academic centres, begun last year, aims to generate a map and place it in the public domain by 2001. IBM is the first core information technology company to join, and will pay a subscription of $3 million.

Protein problem: IBM says that its computing power will help researchers work out how unfolded proteins (such as Barnase, left) become the folded form (right).

“IBM will add sophistication,” says Arthur Holden, chairman and chief executive officer of the SNP Consortium. “Bioinformatics has been dominated by biologists with an interest in computing, and genome software is very basic. We really need to harvest more of the mainstream information technology capacities.”

Other members include Britain's Wellcome Trust, AstraZeneca PLC, Aventis Pharma, Bayer AG, Bristol-Myers Squibb Company, Hoffmann-La Roche, Glaxo Wellcome PLC, Novartis, Pfizer Inc., Searle, SmithKline Beecham PLC and Motorola Inc.

SNPs are identified and analysed in Britain at the Sanger Centre near Cambridge and in the United States at the Whitehead Institute for Biomedical Research, Washington University School of Medicine, Stanford Human Genome Center, and Cold Spring Harbor Laboratory.

The consortium (http://snp.cshl.org) aims to identify 300,000 SNPs by April 2001, and to map 150,000 of them to their positions on the genome. SNPs occur about once in every 1,000 bases of the 3 billion bases in the human genome. They are key to developing genetic medicine, allowing assessment of individuals' predisposition to diseases and the tailoring of therapies.
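The figures above imply a rough scale for the project, which a few lines of arithmetic make concrete. This is a back-of-envelope sketch only: it assumes SNPs are spread evenly across the genome, and the variable names are illustrative rather than anything used by the consortium.

```python
# Figures quoted in the article.
GENOME_BASES = 3_000_000_000   # bases in the human genome
BASES_PER_SNP = 1_000          # about one SNP per 1,000 bases
IDENTIFIED_TARGET = 300_000    # SNPs to be identified by April 2001
MAP_SIZE = 150_000             # SNPs in the planned map

# Assumption: uniform SNP density, so this is an order-of-magnitude estimate.
expected_snps = GENOME_BASES // BASES_PER_SNP
fraction_targeted = IDENTIFIED_TARGET / expected_snps
average_map_spacing = GENOME_BASES // MAP_SIZE

print(f"Expected SNPs in the genome: {expected_snps:,}")
print(f"Share covered by the identification target: {fraction_targeted:.0%}")
print(f"Average spacing of a {MAP_SIZE:,}-SNP map: {average_map_spacing:,} bases")
```

On these assumptions the genome holds roughly three million SNPs, so the consortium's target covers about a tenth of them, and a map of 150,000 SNPs would place one marker every 20,000 bases on average.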

SNPs will help track down the location of genes involved in disease, when whole-genome scans of populations susceptible to a disease are compared with those of populations that are not. The strategy is to take DNA from 24 ethnically diverse individuals, create small representative libraries across the genome, sequence them and compare the overlapping traces.

It will yield “a broad evenly spaced map across the genome representative of diversity”, says Holden. He adds that the built-in diversity will yield large numbers of novel SNPs — one problem is that many of the common SNPs have no role in disease.
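The idea of comparing overlapping traces from different individuals can be illustrated with a toy example. The sketch below is not the consortium's pipeline: it assumes the reads are already aligned, gap-free and of equal length, and it ignores base-quality scores, which real SNP detection relies on. The function name and the `min_minor_count` threshold are hypothetical.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return (position, base_counts) for columns where at least two
    different bases are each seen min_minor_count times or more.

    Assumes every read covers the same region and is already aligned.
    """
    if not aligned_reads:
        return []
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")

    hits = []
    for pos in range(length):
        # Count only unambiguous bases at this column.
        counts = Counter(read[pos] for read in aligned_reads if read[pos] in "ACGT")
        if len(counts) < 2:
            continue  # every read agrees here
        # Count of the second most common base at this position.
        minor_count = counts.most_common(2)[1][1]
        if minor_count >= min_minor_count:
            hits.append((pos, dict(counts)))
    return hits


# Five made-up reads covering the same eight bases.
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACGCTAGC",
    "ACGCTAGC",
    "ACGTTAGA",
]
print(candidate_snps(reads))
```

In this made-up data, position 3 is reported because both T and C appear in at least two reads, whereas the single A at position 7 is treated as a probable sequencing error. Requiring a minimum count for the rarer base is one simple way to separate real variation from noise; it is an illustrative choice, not a description of the method used at the sequencing centres.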

To allow time for mapping, the consortium plans quarterly releases into the public domain on the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP). The second of these brings the number of mapped SNPs to 7,365.

The research centres have now scaled up to maximum output, says Holden, and 42,000 SNPs have been identified. This acceleration is “fantastic”, says Genghis Lloyd-Harris, director of biotechnology and pharmaceutical equity research at Credit Suisse First Boston in London. It will allay concerns that the project was making slower progress than expected in the face of private rivals such as the US company Celera Genomics and France's Genset, he says.

Dan McCurdy, IBM's vice-president for life sciences, says the decision marks IBM's commitment to a public-domain effort. He adds that, if SNPs are to be broadly applicable to diagnosis and treatment, massive data-handling capacities will be needed to screen an explosion in genetic information.

“Hundreds of genomes are going to be sequenced, and today's tools are relatively crude,” says Jeff Augen, IBM's director of life science solutions development. Genomics needs more computer science input and a huge increase in computer power, he says.

In return, IBM will “gain a better understanding of the genome industry by sitting on the board and technical committees of the consortium,” he says. It is also interested in using SNPs in its own research to see how polymorphisms affect protein folding.

The move follows IBM's launch in December of a $100-million research programme, built around the challenge of modelling protein folding, to build a petaflop supercomputer within five years (see Nature 402, 705; 1999). This would carry out more than one quadrillion floating-point operations (“flops”) per second: some two million times more than the best desktop machines. IBM has also created a life sciences division.
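The comparison with desktop machines can be checked with simple arithmetic. The sketch below only rearranges the two figures quoted above; the implied desktop speed is derived from them and is not a measured benchmark.

```python
# Figures quoted in the article.
PETAFLOP = 10**15                 # one quadrillion flops per second
SPEEDUP_OVER_DESKTOP = 2_000_000  # "some two million times more"

# Desktop performance implied by those two numbers.
implied_desktop_flops = PETAFLOP / SPEEDUP_OVER_DESKTOP
print(f"Implied desktop speed: {implied_desktop_flops / 1e6:.0f} megaflops")
```

Taken together, the two figures imply that the best desktop machines of the time were performing on the order of 500 million floating-point operations per second.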

IBM also recently announced $1-million grants to pilot centres within the US National Institutes of Health's Protein Structure Initiative, a structural genomics programme established last June. The grants are for IBM RS/6000 SP supercomputers and other technologies, and scientists at the centres will have access to software and other resources at IBM's Deep Computing Institute.

It is too soon to say what IBM's licensing policy for software will be, says McCurdy, but he expects it will make some tools freely available to academics. IBM recently made freely available the source code to its Visualization Data Explorer (http://www.research.ibm.com/dci/software.html).