
Two new chemical databases are freely available to researchers.

Two databases have been launched that aim to bridge the gap between chemistry and biology, each using a different approach.

The first database, PubChem (http://pubchem.ncbi.nlm.nih.gov/), is part of the US National Institutes of Health's 'Molecular Libraries' initiative, which already has 650,000 diverse small molecules in its collection, with a further 100,000 to be included in 2005. The collection will contain as many independent scaffolds as possible, natural products, all current FDA-approved drugs, and other compounds obtained from private and public sources. These will be sourced from NIH-funded small-molecule screening centres and elsewhere.

PubChem will provide biomedical researchers with access to structural, chemical and physical information on compounds that they can screen computationally or experimentally. The information will allow researchers to pick a suitable small molecule as a probe for important cellular pathways involved in health and disease, as an imaging agent, or potentially as a drug lead.

Bioinformatician Neil Saunders of the University of New South Wales is impressed by how PubChem integrates into the National Center for Biotechnology Information's website/database structure. “It will be of use primarily for bioassays,” he says, “or in docking simulations.” Its usefulness as a research tool will lie in allowing researchers to systematically screen thousands of small molecules.

The second database, called Chemical Entities of Biological Interest (ChEBI; http://www.ebi.ac.uk/chebi/) and produced by the European Bioinformatics Institute, contains naturally occurring and synthetic small molecular entities. However, the similarity to PubChem ends there. Whereas PubChem will provide a diverse database of potential drug leads and molecular probes, ChEBI is essentially a dictionary with a controlled vocabulary for looking up information about a chemical entity of interest.

“We see as our target audience those who use biological databases,” explains curator Marcus Ennis. Moreover, the ontological aspect of ChEBI will allow EBI to record relationships between molecular entities or classes of entities in a defined way.

Each ChEBI entry contains details about the entity's chemistry and biological activity. Synonyms for each entity are also listed and are searchable. The database also records relationships between macromolecules, such as proteins, and small molecules through cross-links to the UniProt protein knowledgebase.

Although ChEBI is small, Ennis hopes that with improved funding the database will have scope to grow. “As the EBI databases begin to use ChEBI, its public awareness will increase, which hopefully will improve our prospects for increased funding.”

Steve Bryant, Senior Investigator in charge of PubChem, says that the PubChem team is more than open to data exchange and collaboration with related international efforts. “The projects are new, and I think it will take some time to sort out what we can best do together,” he adds.

Issues of intellectual property regarding compounds should not be a problem, because a structure cannot be copyrighted even though its applications can be patented.

“Both databases appear to be good ideas attempting to address complex problems, but it is really too early to tell how useful they will be,” says Daniel Weaver of Array BioPharma, a company based in Boulder, Colorado, that specializes in informatics-led drug discovery. Curation of individual structures at each site will be key to their validity in a biological research environment.

Weaver says that the developers of both databases will soon face many of the complexity issues that pharmaceutical companies have tried to address. Questions such as which compounds to include, how to handle variations in stereochemistry and tautomers, and how to store and retrieve appropriate compound information for the user community will all need to be resolved.