InforMax's BioAnnotator uses locally stored databases to find protein motifs. Credit: INFORMAX

The working biologist now has an enormous number of options when it comes to bioinformatics tools. On the one hand, there is a wealth of free, high-quality software. On the other, researchers can buy commercial products that offer added features, such as programs that streamline sequential tasks, access to proprietary databases and enhanced data security. And because software producers realize that users' needs change and that their products will rarely be used in isolation, flexibility and modularity are on the rise.

An important trend has been the increasing integration and sophistication of tools available to non-experts. A wide range of user-friendly packages incorporating tools for nucleotide and protein sequence analysis is available from companies such as MiraiBio, a Hitachi Software Engineering subsidiary based in Alameda, California; DNASTAR in Madison, Wisconsin; InforMax in Bethesda, Maryland; and Accelrys in San Diego, California. On the non-commercial side, the Biology WorkBench, maintained by the San Diego Supercomputer Center at the University of California, San Diego, is particularly popular, offering more than 80 bioinformatics tools to more than 10,000 registered users. “It's a one-stop-shop for doing a lot of things,” says lead developer Shankar Subramaniam. “You can be sitting in front of any type of computer; as long as you have a web browser, you can access it.”

Software has also become easier to use. Back in the early 1990s, users of the GCG Wisconsin package, the grandfather of molecular-biology packages (now sold by Accelrys), had to work with UNIX-based systems. Although some users still prefer these systems, researchers can now point-and-click their way through a wide range of tasks on ordinary desktop computers.

Another trend is the increased integration of data analysis with experimental design. The needs of bench scientists don't always coincide with those of professional bioinformaticians producing tools for whole-genome analyses. Genome projects require programs that can efficiently, if not very accurately, process huge amounts of sequence data, but the biologist in the lab is often interested in studying small sets of genes and their products with very high precision. Last month, for example, InforMax released GenomBench, a tool that allows users to predict the structure of genes and their splice variants, progressively refine these predictions, and then design experiments to validate them. “It's an interactive tool that can work with researchers not just to analyse the data they have, but to design the right experiment to resolve ambiguities in the data,” says Steve Lincoln, senior vice-president of life-science informatics at the company.

Others are hooking up their software to catalogues of reagents. As just one example, the genome browser run by the University of California, Santa Cruz, is being used in a collaboration with the National Cancer Institute in Bethesda, Maryland, to identify new genes to expand, and ultimately complete, the Mammalian Gene Collection — a set of cDNA clones of expressed genes for human and mouse. The browser will be linked to the collection's website, so that users can go straight from analysing an electronic representation of a gene to ordering a clone.

A key trend in the development of commercial products is the emergence of workflows, automated chains of operations that can dramatically increase analysis throughput. For example, software producer geneticXchange of Menlo Park, California, recently demonstrated a workflow that sorts gene-expression data generated by microarrays, looks up the accession numbers that identify the selected genes, collects sequence information from the US National Center for Biotechnology Information's UniGene database, gathers annotation information from the LocusLink website, and goes to Medline to assemble a list of relevant references. “You just hit a button and it does what might take a biologist 600 hours to do, in about five hours,” says Mark Haselup, chief technical officer for the company.
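To make the idea of a workflow concrete, here is a minimal Python sketch of such a chain. It is not geneticXchange's implementation. The gene names, accession numbers, sequences, annotations and references are hypothetical placeholders held in local dictionaries. They stand in for the remote queries (to UniGene, LocusLink and Medline) that a real workflow would make. Each step takes the list of records produced by the previous step and returns an enriched copy, so steps can be added, removed or reordered independently.

```python
from typing import Callable

Record = dict
Step = Callable[[list[Record]], list[Record]]

# Hypothetical local stand-ins for remote database lookups.
ACCESSIONS = {"geneA": "ACC0001", "geneB": "ACC0002", "geneC": "ACC0003"}
SEQUENCES = {"ACC0001": "ATGGCC", "ACC0002": "ATGTTT", "ACC0003": "ATGAAA"}
ANNOTATIONS = {"ACC0001": "kinase", "ACC0002": "transporter", "ACC0003": "unknown"}
REFERENCES = {"kinase": ["ref-1", "ref-2"], "transporter": ["ref-3"]}


def select_top(n: int) -> Step:
    """Build a step that sorts by absolute expression change and keeps the top n."""
    def step(records: list[Record]) -> list[Record]:
        ranked = sorted(records, key=lambda r: abs(r["log_ratio"]), reverse=True)
        return ranked[:n]
    return step


def add_accession(records: list[Record]) -> list[Record]:
    # Look up the (hypothetical) accession number for each selected gene.
    return [{**r, "accession": ACCESSIONS[r["gene"]]} for r in records]


def add_sequence(records: list[Record]) -> list[Record]:
    # Attach sequence information keyed by accession number.
    return [{**r, "sequence": SEQUENCES[r["accession"]]} for r in records]


def add_annotation(records: list[Record]) -> list[Record]:
    # Attach a one-word functional annotation keyed by accession number.
    return [{**r, "annotation": ANNOTATIONS[r["accession"]]} for r in records]


def add_references(records: list[Record]) -> list[Record]:
    # Genes whose annotation has no literature entry get an empty list.
    return [{**r, "references": REFERENCES.get(r["annotation"], [])} for r in records]


def run_workflow(records: list[Record], steps: list[Step]) -> list[Record]:
    """Apply each step in order, feeding one step's output into the next."""
    for step in steps:
        records = step(records)
    return records


if __name__ == "__main__":
    expression = [
        {"gene": "geneA", "log_ratio": 2.5},
        {"gene": "geneB", "log_ratio": -3.1},
        {"gene": "geneC", "log_ratio": 0.2},
    ]
    pipeline = [select_top(2), add_accession, add_sequence, add_annotation, add_references]
    for row in run_workflow(expression, pipeline):
        print(row["gene"], row["accession"], row["annotation"], row["references"])
```

Treating each operation as an interchangeable function is one simple way to get the "hit a button" behaviour described above. Once the chain is defined, the whole analysis runs from a single call to `run_workflow`.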

Some commercial products are valuable because they are linked to proprietary data that are not otherwise available. One of the main selling points of the Celera Discovery System, for example, is the access it provides to the biotech firm's high-quality human and mouse genome annotations. Unlike many other annotation collections, Celera's includes a high proportion of annotations generated by manual curation (see 'Putting a name on it').

Commercial products often provide greater security for researchers who would rather not process unpublished or unpatented results over the open Internet. Although some public sites offer a degree of security, commercial packages usually have more protection options and can be operated behind a firewall.

But the theme that recurs throughout the design of bioinformatics tools is increased integration. The Discovery Studio Gene package recently launched by Accelrys is a case in point. “Results are put into a project database that has the ability to be accessed by a set of applications that span both chemistry and biology,” says Scott Kahn, senior vice-president of life science at Accelrys. “We set up the ability to collaborate between domains.”

Biology WorkBench → http://workbench.sdsc.edu