ChemSpider website provides free information on millions of molecules.
A chemist running a computer server from his home is quietly easing one of his colleagues' biggest frustrations by giving the community an open-access source of chemical information.
Although biologists have enormous public databases of genes and proteins, chemists usually have to pay for access to data on molecules. Chemist Antony Williams is hoping to change this in a move likely to ruffle the feathers of the American Chemical Society. Williams, a private consultant based in Wake Forest, North Carolina, has started a website called ChemSpider that, in just a year, has compiled data on nearly 20 million molecules.
The modest project has made chemists interested in open access take notice — last week, the number of daily users of the site surpassed 5,000. “It's quite an exciting development,” says David Wild, a chemical informatics researcher at Indiana University, Bloomington, who uses the service. “ChemSpider is working to integrate information in a unique way.”
Chemical data have long been available, but at a hefty price. The largest supplier of such information is the American Chemical Society's Chemical Abstracts Service. The service, which is more than a century old, includes data on roughly 35 million molecules. But university and industry chemists must pay thousands of dollars to use the database. The society does not disclose figures, but fees for using the database are thought to make up a substantial portion of its US$311-million annual income from 'electronic services'. Some have been highly critical of the society's grip on chemical information.
In recent years, several public sources for chemical information have appeared on the scene. The largest, PubChem, is run by the National Library of Medicine in Bethesda, Maryland, and contains data on some 19 million chemical structures. But PubChem's data focus on biological information, according to Williams. Other potential sources of information, such as Wikipedia, lack the algorithms needed to search chemicals according to their structure. “I noticed there was this gap,” says Williams. “So I decided to try an experiment.”
Rather than building up a database, the ChemSpider service scans open-access sources, including PubChem and Wikipedia, for chemical data. It compiles the publicly available information in a single location, and allows users to follow links to the original source material. The site is maintained with modest profits from advertising and the work of about 30 active volunteers who double-check the data pulled in from outside.
The site is not without its flaws. “There's an awful lot of chemical information, but there's an awful lot of rubbish as well,” says Barrie Walker, a retired industrial chemist in Yorkshire, UK, who helps maintain the site. When working with such a large database, he says, “you're bound to end up with a quality issue”. Williams adds that the site still has problems with certain searches. For example, it struggles to distinguish between isomers: molecules with the same chemical formula arranged in different structures.
Williams nevertheless believes that the service may be able to compete with for-profit services. “What I'm doing is highly disruptive,” he says. “I think it can be done and it needs to be done.” The American Chemical Society declined to comment on ChemSpider.
Brumfiel, G. Chemists spin a web of data. Nature 453, 139 (2008). https://doi.org/10.1038/453139a