Nature Publishing Group
Nature
access to the literature


CrossRef launches CrossRef Search, powered by Google

Imagine searching on Google and being able to restrict results to articles published in peer-reviewed journals. A step in that direction was taken this month when CrossRef, a not-for-profit association of publishers, and nine of its members¹ launched the CrossRef Search Pilot, powered by Google². From a search box on any one publisher's site, a user can run a Google search across the content of all nine.

During 2004 the pilot will evaluate the service's functionality and gather user feedback. Other CrossRef member publishers are expected to join over the course of the year.

It is an exciting time to be in scholarly publishing. Fundamental questions are being asked about the purposes of the complex business of scholarly communication, and about how best to organize and pay for it. While the Open Access debate can get heated, all the individuals and organizations involved ultimately share the same goal: an efficient scholarly communication system that effectively disseminates and archives high-quality, peer-reviewed content.

The economic and philosophical debate over Open Access is an important one, but scholarly communication also faces other significant access issues. Everyone, scientists included, is being overwhelmed by a flood of information. As this article went to press, Google claimed 4.285 billion Web pages in its index. How, then, can users find high-quality, authoritative, peer-reviewed content on the Web?

Exploiting new functionality and technologies to improve how scholarly research is done, recorded and disseminated is one area where all publishers, traditional or Open Access, share common ground. This area is the focus of CrossRef, a collaborative, non-profit venture with 300 member publishers: large and small, commercial companies and learned societies, and Open Access publishers such as the Public Library of Science (PLoS) and BioMed Central (BMC).

Created in 2000 specifically as a collaborative framework to help integrate information sources across the Internet, CrossRef is 'business model neutral': it leaves participating publishers to decide how to make their content available and what, if anything, to charge for submission or access.

CrossRef has accomplished its founding mission of enabling access to content by creating a broad-based reference linking service for scholarly journals. As a result, scientists reading an article can now click seamlessly from a reference to the cited article on the publisher's site.

The system works by assigning a unique Digital Object Identifier (DOI) to each of the 11.1 million articles in the 9200 journals of the 300 publishers taking part in CrossRef. References to a given article are hyperlinked to its DOI, and users are automatically directed to the corresponding URL on the publisher's site.
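The indirection described above can be sketched in a few lines of Python. This is a simplified illustration, not CrossRef's implementation: a dictionary stands in for the DOI registry, and the DOI and URLs are hypothetical. The point it shows is that a citing article stores only the DOI, so when a publisher moves an article, updating a single registry entry keeps every existing reference working.

```python
"""A minimal sketch of DOI-based reference linking.

Assumption: an in-memory dictionary stands in for the DOI registry;
the DOI and URLs below are hypothetical examples.
"""

# Hypothetical registry: DOI -> current URL on the publisher's site.
registry: dict[str, str] = {
    "10.9999/example.2004.001": "http://publisher-a.example/articles/001",
}


def resolve(doi: str) -> str:
    """Return the URL currently registered for a DOI."""
    # DOIs start with the "10." directory indicator and contain a "/".
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return registry[doi]


def update_location(doi: str, new_url: str) -> None:
    """The publisher moves the article: only the registry entry changes."""
    registry[doi] = new_url


# A reference in another publisher's article links to the DOI, not the URL.
reference_link = "10.9999/example.2004.001"
print(resolve(reference_link))
update_location(reference_link, "http://publisher-a.example/new/001")
print(resolve(reference_link))  # same link, new destination
```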

In most cases a user arrives at the article's abstract page, which links to the full text. CrossRef does not collect abstracts or full text. Subscribers to the journal can go straight to the full text, whereas non-subscribers may be able to purchase the article or find subscription information. If the full text is available free of charge, all users can access it directly. Academics make heavy use of the DOI reference system, clicking some 5 to 6 million CrossRef DOI links every month.
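The access policy in the paragraph above is applied by each publisher's own site, not by CrossRef, which only delivers the reader there. As a rough sketch, the decision reduces to two questions; the names `Landing` and `route_reader` are hypothetical and chosen for illustration.

```python
from enum import Enum, auto


class Landing(Enum):
    """Hypothetical outcomes for a reader arriving via a DOI link."""
    FULL_TEXT = auto()
    PURCHASE_OR_SUBSCRIBE = auto()


def route_reader(is_subscriber: bool, free_full_text: bool) -> Landing:
    """Decide what a reader sees after landing on the abstract page.

    Free content and subscribers go straight to the full text; everyone
    else is offered purchase or subscription options.
    """
    if free_full_text or is_subscriber:
        return Landing.FULL_TEXT
    return Landing.PURCHASE_OR_SUBSCRIBE


print(route_reader(is_subscriber=False, free_full_text=False))
```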

CrossRef is also expanding beyond journal articles: it has registered about 600,000 DOIs for conference proceedings chapters and articles and about 50,000 DOIs for books, book chapters and encyclopaedia entries. This extends citation linking to a wide variety of important academic sources.

CrossRef was consciously designed as a distributed system: it helps users reach authoritative content by making it efficient for publishers to link references across different publishing systems. This has been called 'distributed aggregation', but 'distributed integration' might be a better term. CrossRef itself provides no products or content; it provides infrastructure.

It also sets the terms and conditions for participation, ensuring a level playing field for all publishers. Member publishers must register all their online journal content with CrossRef for indexing and must use DOIs to link their references to other publishers' content.
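To link a reference, a publisher needs to find the DOI of the cited work from the bibliographic details in its reference list. The sketch below assumes a deliberately simple matching key (journal, volume, first page, year) and a crude title normalization; CrossRef's actual matching is more involved, and every name, record and DOI here is hypothetical. Links are formed with the public DOI resolver at doi.org.

```python
from typing import NamedTuple, Optional


class CitationKey(NamedTuple):
    """Hypothetical matching key built from a reference's bibliographic data."""
    journal: str
    volume: str
    first_page: str
    year: int


def normalize(key: CitationKey) -> CitationKey:
    """Crude normalization so 'J. Hypothet. Sci.' and 'j hypothet sci' match."""
    journal = "".join(ch for ch in key.journal.lower() if ch.isalnum())
    return key._replace(journal=journal,
                        volume=key.volume.strip(),
                        first_page=key.first_page.strip())


# Stand-in for the records that member publishers deposit (hypothetical).
deposits: dict[CitationKey, str] = {}


def deposit(key: CitationKey, doi: str) -> None:
    """A publisher registers an article's metadata together with its DOI."""
    deposits[normalize(key)] = doi


def link_reference(key: CitationKey) -> Optional[str]:
    """Return a DOI link for a cited work, or None if no deposit matches."""
    doi = deposits.get(normalize(key))
    return f"https://doi.org/{doi}" if doi else None


# Publisher A deposits an article; publisher B's reference list cites it.
deposit(CitationKey("J. Hypothet. Sci.", "12", "345", 2004),
        "10.9999/hypothetical.12.345")
print(link_reference(CitationKey("j hypothet sci", "12", "345", 2004)))
```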

In keeping with CrossRef's role of providing infrastructure across distributed systems and helping users find authoritative scholarly content, the CrossRef Search Pilot singles out peer-reviewed content from general Web search results. The initiative combines the collaborative environment of CrossRef with Google's search technology. The pilot indexes and searches the full text of journal and conference proceedings articles from nine publishers covering the full spectrum of scholarly research.

CrossRef Search is a 'domain filtered' search of the main Google index: results are delivered from the regular Google index but filtered to include only content from the nine publishers participating in the pilot. Behind the scenes, CrossRef facilitates the 'crawling' of content on publishers' sites, that is, the automatic retrieval and indexing of pages by search-engine software.
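A 'domain filtered' search can be pictured as a post-filter on ordinary results: the index is the same, and only the set of hosts allowed into the result list changes. This is an illustration under that assumption, not a description of how Google implemented the pilot; the host names stand in for the pilot publishers' sites and are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical host names standing in for the pilot publishers' sites.
PILOT_HOSTS = {"publisher-a.example", "publisher-b.example"}


def domain_filter(results: list[str], allowed_hosts: set[str]) -> list[str]:
    """Keep result URLs whose host is, or is a subdomain of, an allowed host."""
    kept = []
    for url in results:
        host = (urlparse(url).hostname or "").lower()
        # Require a dot boundary so "evilpublisher-a.example" does not match.
        if any(host == h or host.endswith("." + h) for h in allowed_hosts):
            kept.append(url)
    return kept


# Results from the general index, in ranked order.
general_results = [
    "http://www.publisher-a.example/article/123",
    "http://blog.example.org/post/7",
    "http://publisher-b.example/doi/10.9999/abc",
]
print(domain_filter(general_results, PILOT_HOSTS))
```

Because filtering preserves the original order, the general index's ranking still decides which peer-reviewed items appear first.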

The pilot is also investigating how DOIs can improve the indexing of content and enable persistent links from search results to the full text on publishers' sites. As part of the pilot, Google will supplement its own crawling of publishers' sites with a 'directed crawl' based on a DOI Sitemap. This Sitemap, updated daily, lists all the full-text articles and DOIs from the pilot publishers. Using it, Google can quickly locate new content to crawl and ensure that it indexes 'deep Web' content on publishers' sites that may sit behind access control.
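A directed crawl only needs to fetch what changed since the previous Sitemap. The sketch below assumes a hypothetical plain-text Sitemap with one `DOI<TAB>full-text URL` line per article (the pilot's actual format is not described here) and computes the crawl queue by comparing two daily snapshots. Access control on the publisher's site is not modelled.

```python
def parse_sitemap(text: str) -> dict[str, str]:
    """Parse a hypothetical DOI Sitemap: one 'DOI<TAB>full-text URL' per line."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        doi, url = line.split("\t", 1)
        entries[doi] = url
    return entries


def crawl_queue(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """URLs to fetch: articles that are new, or whose URL changed, since last time."""
    return [url for doi, url in sorted(current.items())
            if previous.get(doi) != url]


yesterday = parse_sitemap("10.9999/a\thttp://publisher-a.example/full/a\n")
today = parse_sitemap(
    "10.9999/a\thttp://publisher-a.example/full/a\n"
    "10.9999/b\thttp://publisher-a.example/full/b\n"
)
print(crawl_queue(yesterday, today))
```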

A further benefit of the DOI system is that a persistent, unique identifier for an article will help preserve the item's Google PageRank. Google identifies Web pages by their URLs, so if a URL changes, the PageRank the item has accumulated is lost. DOIs are persistent names for articles; over time they should enhance a document's rank and protect it against URL changes.
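The effect of the identifier choice can be illustrated with a deliberately simplified stand-in for PageRank: counting inbound links. Real PageRank weights links iteratively, so this only shows why the key matters. When credit is keyed by URL, moving an article splits it across the old and new addresses; keyed by DOI, it accumulates in one place. All values are hypothetical.

```python
from collections import Counter

# Inbound links to one article, recorded before and after the publisher
# moved it to a new URL (all values hypothetical).
links = [
    {"url": "http://publisher-a.example/old/42", "doi": "10.9999/art42"},
    {"url": "http://publisher-a.example/old/42", "doi": "10.9999/art42"},
    {"url": "http://publisher-a.example/new/42", "doi": "10.9999/art42"},
]

by_url = Counter(link["url"] for link in links)  # credit split across two URLs
by_doi = Counter(link["doi"] for link in links)  # credit kept on one identifier

print(by_url)
print(by_doi)
```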

Whatever their individual economic models, scholarly publishers can potentially all work together in the CrossRef project, taking advantage of a distributed infrastructure and common policies to enable efficient access to peer-reviewed content online. While the starting point is CrossRef Search 'powered by Google', versions of CrossRef Search provided by other search engines are also a possibility.

Ed Pentz

Executive Director, CrossRef


  1. CrossRef Search pages on publishers' sites: American Physical Society; Annual Reviews; Association for Computing Machinery; Blackwell Publishing; Institute of Physics Publishing; International Union of Crystallography; Nature Publishing Group; Oxford University Press; John Wiley & Sons, Inc.

  2. http://www.crossref.org/crossrefsearch.html

© 2004 Nature Publishing Group