Checking in: a powerful database used to track airline reservations will soon be tracing gene expression in mouse brains. Credit: CARROLEE BARLOW

What do biologists interested in gene expression have in common with airlines, banks and food retailers? Data, data ... and yet more data. A range of high-throughput techniques such as DNA microarray analyses are providing biologists with sets of data that dwarf anything they have ever dealt with before. And in this rising tide of information, a company that supplies data 'warehouses' has spied an opportunity.

Although largely unknown in scientific circles, Teradata of San Diego is a market leader in supplying systems that can manage more than a terabyte (1 trillion bytes) of data. Its databases help to organize supermarkets' supply chains so that shelves get filled, for instance, and are used to keep track of airline reservations and banks' financial records.

Teradata's excursion into biology follows an introduction to Carrolee Barlow, a neurobiologist at the Salk Institute for Biological Studies in La Jolla, California. Barlow and David Lockhart, president of Ambit Biosciences in San Diego, have a grant from the US National Institutes of Health to generate gene-expression profiles of the mouse brain. Monitoring the activity of thousands of genes in more than 150 different cell types in several mouse strains and under a range of different experimental conditions means that the project will generate about three-quarters of a terabyte of data. “It became clear that most commercially available databases were not suited to what we needed,” says Lockhart.

The idea of linking up with Teradata came through a chance conversation between Barlow and Sudhakar Shenoy, chairman and chief executive of Information Management Consultants (IMC) of McLean, Virginia, a company that develops specialized software tools and databases for a number of clients including the US government. “Banks, airlines and Walmart have much bigger problems with data than us,” says Barlow. “When we told Teradata of our problem, it was a no-brainer for them.”

Pump up the volume: Carrolee Barlow and David Lockhart expect to generate three-quarters of a trillion bytes of data.

The result was a collaboration between the biologists, IMC and Teradata to adapt Teradata's data warehouse for large-scale gene-expression analyses. Lockhart and Barlow supply the data and devise algorithms to ask relevant questions of the database; IMC and Teradata provide database infrastructure and software engineering. The system is expected to be ready within a month, and Barlow says that in trials, analyses that previously took days were completed within hours.

Once the system has been fully tested, Teradata plans to target the life-sciences market. “We think the bioresearch sector is a good potential market for Teradata technology,” says Alan Chow, vice-president and general manager of Teradata's development division. At a cost of about US$1 million for its most basic terabyte data warehouse, buying a Teradata system is beyond the pocket of most biology labs — although pharmaceutical companies might be interested. But IMC hopes to develop an outsourcing service, enabling biologists with particularly difficult data-management problems to use Teradata systems for a reasonable subscription fee.

Bioinformaticists are uncertain about Teradata's prospects in the commercial marketplace for biological databases, where companies such as Oracle and IBM are established players. With relatively few of today's repositories for biological data operating in the terabyte realm, Lincoln Stein of the Cold Spring Harbor Laboratory on Long Island, New York, thinks the demand for high-end data warehouses may be limited, a view echoed by Shankar Subramaniam of the University of California, San Diego. “Teradata is an excellent data warehouse for large volumes of data: the larger the volume of data, the better it is,” says Subramaniam. But he argues that its specialized hardware and software make it less suitable for users who do not need such enormous capacity.

Jeff Augen, director of strategy with IBM Life Sciences in Somers, New York, adds that bioinformatics often involves relating information in disparate formats held in different places. “Very often you have to link many databases together and the Teradata system was not designed to be used in that way,” he argues.

But Richard Winter of the Winter Corporation, a database consultancy firm in Waltham, Massachusetts, suspects that other biologists will find themselves in Barlow's position, deluged with more data than their current systems can manage and analyse. “Teradata offers a solution to a class of problems which may be important to the life-sciences sector over the next few years,” he says. “Biologists are working their way up the learning curve of how they can exploit database technology.”

http://www.teradata.com