Published online 18 November 2004 | Nature | doi:10.1038/news041115-13

News

Scientists get their own Google

New search engine ranks papers by importance, and finds the free versions.

Imagine searching the Internet and being able to restrict your results to academic texts. Today Google launched a free search engine that aims to do just that. Google Scholar searches only journal articles, theses, books, preprints, and technical reports across any area of research.

A test version of the search engine is available at http://scholar.google.com, so you can try it out. In a search for the phrase "human genome", for example, a normal Google web search throws back 450,000 or so hits, with genome centres, databases and other websites ranked at the top.

In contrast, Google Scholar returns just 113,000 hits, and the top-ranked items are all seminal papers on the subject rather than websites. In fact, the number one hit is the landmark article "Initial sequencing and analysis of the human genome" (ref. 1), published in Nature in 2001.

On the links

The tool is based on principles similar to those of Google's web search. The original search manages to make the most useful references appear at the top of the page using algorithms that exploit the structure of the links between web pages. Pages with many links pointing to them are considered 'authorities', and ranked highest in search returns.

The ranking is refined by taking into account the importance of the origins of links to a page. "We don't just look at the number of links," says Sergey Brin, a cofounder of Google. "A link from the Nature home page will be given more weight than a link from my home page," he explains.
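The idea that a link counts for more when it comes from an important page can be illustrated with a short sketch. The code below is a generic link-weighting iteration in the style of the published PageRank method, not Google's actual implementation; the function name, the damping value of 0.85 and the page labels are all assumptions made for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages so that links from high-scoring pages count for more.

    `links` maps each page to the list of pages it links to.
    Illustrative sketch only; the damping value is an assumed default.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with equal scores that sum to 1.
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


# Hypothetical example: two papers with one incoming link each.
example_links = {
    "hub_1": ["nature_home"],
    "hub_2": ["nature_home"],
    "hub_3": ["nature_home"],
    "nature_home": ["paper_a"],
    "personal_home": ["paper_b"],
}
scores = rank_pages(example_links)
```

In this made-up example, "paper_a" and "paper_b" each have exactly one incoming link, but "paper_a" ends up with the higher score because its link comes from a page that is itself heavily linked.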

Google Scholar works in much the same way, using the citations at the end of each paper, rather than web links. It automatically identifies the format and content of scientific texts from around the web, extracts the references and builds automatic citation analyses for all the papers indexed.
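As a rough illustration of a citation analysis, the sketch below counts how often each paper is cited once the reference lists have been extracted, and orders papers by that count. It assumes the hard part, identifying and parsing the references, has already been done; the function names and paper labels are hypothetical, and plain counting is a simplification, since a weighted scheme would also consider the importance of each citing paper.

```python
from collections import Counter


def count_citations(reference_lists):
    """Count how many distinct papers cite each paper.

    `reference_lists` maps a paper's identifier to the identifiers
    found in its reference list. Illustrative sketch only.
    """
    counts = Counter()
    for citing, cited in reference_lists.items():
        # Count each citing paper once, even if it lists a work twice.
        for target in set(cited):
            if target != citing:  # ignore self-references
                counts[target] += 1
    return counts


def rank_by_citations(reference_lists):
    """Return paper identifiers ordered from most to least cited."""
    counts = count_citations(reference_lists)
    papers = set(reference_lists) | set(counts)
    # Sort by descending citation count, then by identifier for ties.
    return sorted(papers, key=lambda paper: (-counts[paper], paper))


# Hypothetical reference lists for four papers.
example_refs = {
    "A": ["C"],
    "B": ["C", "A"],
    "C": [],
    "D": ["C", "C"],
}
ranking = rank_by_citations(example_refs)
```

Here paper "C" is cited by three other papers and therefore comes first in the ranking, followed by "A", which is cited once.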

This approach has been pioneered in computer science by ResearchIndex, software produced by the information technology company NEC.

Search for success

Much of the peer-reviewed material has been made available to Google by publishers, including Nature Publishing Group, the Association for Computing Machinery and the Institute of Electrical and Electronics Engineers, through a pilot cross-publisher search engine called CrossRef Search.

Publishers have arranged for Google robots to scan the full texts of their articles. Users clicking on a hit returned by Google Scholar are directed to the article on the publisher's site, where subscribers can access the full text and non-subscribers get an abstract or information on how to buy the article.

Google Scholar has a subversive feature, however. Each hit also links to all the free versions of the article that it has found elsewhere on the Internet, for example on personal home pages.