
Google launched a new service last week, Google Base. It allows anyone to upload files for free to its massive server farms, making the data instantly searchable. Although the service is aimed mainly at online markets for such things as homes and jobs, scientists say it could have important implications for data-sharing in science, and perhaps boost efforts to make the web more ‘intelligent’.

As well as letting people upload data, Google Base lets users describe the data with simple tags that others can then use in searches. It also allows users to structure the data by adding fields on the fly. So a web page holding a scientific article might have fields for ‘author’, ‘journal’, ‘publication date’ and other bibliographic information.
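As a rough illustration of the idea, and not of how Google Base itself is built, the Python sketch below models an uploaded item as a set of free-form tags plus whatever fields the uploader chooses to add. The `Item` class, the `search` function and all the sample records are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One uploaded item: free-form tags plus fields the uploader makes up."""
    title: str
    tags: set = field(default_factory=set)
    fields: dict = field(default_factory=dict)


def search(items, tag=None, where=None):
    """Return items that carry `tag` (if given) and match every field in `where`."""
    where = where or {}
    return [
        item for item in items
        if (tag is None or tag in item.tags)
        and all(item.fields.get(name) == value for name, value in where.items())
    ]


# Hypothetical records; the authors, journals and dates are invented.
items = [
    Item("Paper A", {"article", "genomics"},
         {"author": "A. Author", "journal": "Journal X", "publication date": "2005-11-01"}),
    Item("Paper B", {"article", "astronomy"},
         {"author": "B. Author", "journal": "Journal Y", "publication date": "2005-10-15"}),
    Item("Flat to let", {"housing"}, {"city": "Geneva"}),
]

articles = search(items, tag="article")
in_journal_x = search(items, tag="article", where={"journal": "Journal X"})
print([item.title for item in articles])
print([item.title for item in in_journal_x])
```

Because the fields are an open dictionary rather than a fixed schema, the housing advert and the journal articles can sit in the same collection, which is the sense in which fields are added ‘on the fly’.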

That might not sound like a very big deal. But advocates say that this allows web content to be structured as databases on a large scale. For a start, that makes it simple for any scientist to share data, and store it in ways that allow computers to search and retrieve it.

Scientists are still “in the Dark Ages” when it comes to sharing data, says David Haussler, director of the Center for Biomolecular Science and Engineering at the University of California at Santa Cruz. Data falling outside the relatively few big international databases, such as those for gene sequences, protein structures and astronomy data, mainly end up in supplementary tables accompanying journal articles, he says, and are “stored in some non-indexable, inconsistent and inconvenient format, if indeed they are kept at all”.

Google Base or similar services could help, says Ian Foster, a computer scientist at Argonne National Laboratory in Illinois and co-inventor of the Grid concept, in which many computers work together to provide large amounts of processing power and data storage. Science badly needs “something that would make it trivial for individuals and communities to create and share scientific data, and the programs that operate on those data”, he says.

“To have a way to easily cross-examine multiple kinds and sources of data would be a real boon to research,” agrees Paul Myers, a bioinformatician from the University of Minnesota, Morris. “I think Google is getting in early on what could be an immensely important tool.”

Smart systems

Google Base may also signal a modest start for the web to move towards the ‘intelligent’ network originally envisaged by Tim Berners-Lee when he invented the web at CERN, the European Laboratory for Particle Physics in Geneva, Switzerland, in 1989.

Most web pages are designed to be read by humans, and don't contain additional descriptive information that can be interpreted by computers. This limits their usefulness, especially for users carrying out searches. For example, it's not currently possible to search the web to find “only peer-reviewed papers dealing with experiments where the CCR5 protein activates the PYK2 protein”. And when reading a paper online, you can't ask the computer to replot a graph adding in extra data sets.

Berners-Lee champions what he calls a ‘semantic web’, where tags added to pages would allow computers to ‘understand’ what the pages contain. Computers could then check whether the data meet certain criteria and merge data sets from different sources.
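A toy sketch in Python may make this concrete. It uses plain dictionaries rather than any semantic-web standard, and the two sources, the field names (`id`, `peer_reviewed`, `activator`, `target`) and the paper identifiers are all assumptions made up for the example. It merges records that describe the same paper and then applies the kind of query mentioned above, which ordinary web search cannot answer.

```python
# Source A (hypothetical): publication metadata keyed by a shared identifier.
source_a = [
    {"id": "paper-1", "peer_reviewed": True},
    {"id": "paper-2", "peer_reviewed": False},
]

# Source B (hypothetical): machine-readable descriptions of the experiments.
source_b = [
    {"id": "paper-1", "activator": "CCR5", "target": "PYK2"},
    {"id": "paper-2", "activator": "CCR5", "target": "PYK2"},
]


def merge_on(key, *sources):
    """Combine records from several sources that share the same value of `key`."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


def meets(record, **criteria):
    """True if the record has every requested field with the requested value."""
    return all(record.get(name) == value for name, value in criteria.items())


merged = merge_on("id", source_a, source_b)
hits = [r["id"] for r in merged
        if meets(r, peer_reviewed=True, activator="CCR5", target="PYK2")]
print(hits)
```

The merge only works because both sources use the same identifier and the same field names. Agreeing on such shared vocabularies is a large part of what semantic-web standards are for.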

But although the semantic web is fast gaining ground in certain specialist areas such as bioinformatics, it has yet to take off in a big way. Scientists say Google Base could change that by bringing structured web pages to the masses. “The big issue here is whether services like this will help bootstrap the semantic web,” says Greg Tyrelle, a proteomics researcher at Chang Gung University in Taiwan.

Google power

“Flexible online storage of arbitrary data, including scientific data, is going to be a major area of research over the next couple of years,” says Leigh Dodds, a web expert at publisher Ingenta. “Google Base takes that a step further by widening it out to everyone.” He adds, however, that he would like to see governments and universities doing more to promote such services, rather than leaving it to Google.

Scientists point out, however, that Google has been conspicuous by its absence from work on the semantic web in the World Wide Web Consortium (W3C), the body that creates web standards. They also acknowledge that Google Base is a pretty crude service so far, especially compared with sophisticated specialist databases such as GenBank and UniProt. All you can do is put in information and then search it; there is no way to extract the data or run computations on them.

But most researchers believe that will change fast. Google has been a pioneer in creating what are known as ‘application programming interfaces’ to its other services, such as Google Maps. These allow anyone to write programs that can access Google's databases, and mix and match its content with other data to create completely new products.

“If Google wants to turn Google Base into more than just a tool for finding information, and into something scientists can actually use to explore data, then more is needed,” says Mark Gerstein, a bioinformatician at Yale University in New Haven, Connecticut.

But observers such as Foster believe such progress could happen fast. “Google has much relevant technology and expertise,” he says. “If it forms the right partnerships and dedicates sufficient resources, it could have a tremendous impact.”

“Google Base looks a little simple right now, and it's not clear exactly how to tap into Google's power,” adds Myers. “But we've got to start somewhere.”