Computers have changed biology forever, even if most biologists don't yet realize it, says Michael Levitt, a structural biologist at Stanford University and the founder of Molecular Applications Group (MAG), in Palo Alto, California. Already, drug discovery is driven by the need to apply powerful computers to voluminous data sets, and the trend, he says, is certain to extend into all other disciplines in biology.

Chris Lee, Levitt's former graduate student and co-founder of MAG, agrees, noting that most biologists today use computers only in the most elementary way, as a substitute for the typewriter and graph paper. “Bioinformatics is really going to surge when biologists realize that there's a lot of value, and a lot of new insights, in being able to work across large amounts of data that they and all the other scientists in the world have produced,” says Lee.

Levitt began using computers to solve problems in protein folding when working under John Kendrew, Max Perutz, Francis Crick and other eminent molecular biologists in the “golden age” of the 1960s at the Laboratory of Molecular Biology in Cambridge, United Kingdom. Today the 7-lb laptop he carries in his backpack has more than a thousand times the computing power, at less than a thousandth of the cost, of the punch-card behemoths of 30 years ago.

Accompanying the relentless increase in computing power is a breathtaking expansion of biological data from the genome-sequencing projects for humans and other organisms. Complementary information from pharmaceutical chemistry, neuroscience, microbiology, immunology, clinical trials, toxicology, teratology, epidemiology and other disciplines is waiting to be integrated with the genetic and structural data. Without the computer, there is no way to obtain a global view of all this information or to establish links between disparate fields of knowledge.

Myra Williams, MAG's new president, has a PhD in biophysics from Yale and was hired this summer from Glaxo Wellcome to launch the company's GeneMine Pro suite of bioinformatics tools. She observes that rates of data acquisition, far from levelling off, are accelerating. Soon innovations like Affymetrix's high-density oligonucleotide array microchip will come online, generating terabytes (‘terror-bytes’) of new sequence information. How scientists navigate this ocean of biological information will be crucial.

“To be effective,” Williams says, “bioinformatics tools must not merely automate data retrieval, but give researchers the information in usable form, through clustering, filtering, analysis and visualization, allowing them to perceive insights which might well have eluded them had they attempted to process the information manually.”

The bioinformatics capabilities of MAG's GeneMine Pro program are built around its Discovery Engine, an automated Web browser which retrieves biological information in 22 categories from servers worldwide. After processing, the information is presented at the user interface, where it can be visualized in the context of sequence alignments and three-dimensional protein structures, or read as text.

Large pharmaceutical companies, challenged by expiring patents and the high cost and slow pace of conventional drug development, are now the main source of sustenance for bioinformatics. The fact that genomics and bioinformatics are creatures of big industrial research environments inevitably leads to a blurring of distinctions between academic and industrial science.

Despite a strong and growing demand for bioinformaticists, there are few established training centres, perhaps 20 in the world, estimates Levitt. The field is still defining itself, and those who do have formal training are quickly snapped up by industry at hefty salaries, leaving too few people available to train the next generation.

Nevertheless, Lee believes that the candidate who, on his or her own initiative, “can demonstrate the ability to cross over and generate results — not even necessarily original ones — is the one who will capture the recruiter's attention.”

Levitt adds that the shortage of formal training slots coincides with exceptional opportunities for self-learning: with an Internet connection and an inexpensive computer, one can download all the databases, programs and papers needed to undertake an original research project. “Spend two days looking at the results and thinking about what you don't understand. Use e-mail to contact someone who does, and ask, ‘What should I be doing?’” he recommends. Prize-winning discoveries can be made in this way. “The problems are so difficult, and there is so much to be analysed, that the boom will not go away. It's a great time to be getting in. There's a wonderful lightness to the field.”