Steve Lawrence and C. Lee Giles stated in their Commentary that most of the popular search engines index only about 7–16 per cent of the World-Wide Web (ref. 1). This is alarming, as many scientific web pages containing important data may never be discovered. As the web grows, it will become increasingly difficult for general search engines to give comprehensive coverage. One answer to the problem could be the development of subject-specific search engines, each able to cover most of the content within its subject.
Most currently available subject-specific lists and indexes are maintained by humans. Many of them are merely collections of web addresses and lack context-based relevance ranking and retrieval of results in multivariate combinations. What is needed are search engines that can traverse down to the deepest-level pages of a subject-specific website. This would be feasible because the number of such sites would be within manageable limits. Crawlers or robots traversing such a subject-specific subset of the web could build up a comprehensive bank of keywords. In turn, crawlers equipped with these keywords could screen the deepest-level pages of each site efficiently and more frequently.
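The crawling idea can be illustrated with a minimal sketch, under stated assumptions. The site below is a hypothetical in-memory stand-in (the `TOY_SITE` dictionary, its page paths and its text are invented for illustration), and the function name `crawl_keywords` and the stop-word list are likewise assumptions rather than part of any existing search engine. The sketch follows links breadth-first from the home page down to the deepest pages and counts the words it meets to form a keyword bank.

```python
from collections import Counter, deque
import re

# Hypothetical subject-specific site: each path maps to (page text, outgoing links).
TOY_SITE = {
    "/": ("Protein structure database home", ["/kinases", "/methods"]),
    "/kinases": ("Kinase protein families", ["/kinases/cdk2"]),
    "/methods": ("Crystallography methods for protein structure", []),
    "/kinases/cdk2": ("CDK2 kinase structure and inhibitor data", []),
}

# Illustrative stop-word list; a real system would need a far larger one.
STOPWORDS = {"and", "for", "the", "of"}


def crawl_keywords(site, start="/"):
    """Visit every page reachable from `start` and count the keywords found."""
    seen = {start}
    queue = deque([start])
    keywords = Counter()
    while queue:
        url = queue.popleft()
        text, links = site[url]
        # Lower-case the text and split it into alphanumeric tokens.
        words = re.findall(r"[a-z0-9]+", text.lower())
        keywords.update(w for w in words if w not in STOPWORDS)
        # Queue only links that stay inside the site and are not yet visited.
        for link in links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen, keywords


visited, bank = crawl_keywords(TOY_SITE)
print(sorted(visited))
print(bank.most_common(3))
```

Because the crawler never leaves the site dictionary, the set of pages to revisit stays small, which is the property the argument relies on.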
Subject-specific search engines would be able to maintain the freshness of the hits, as the crawlers would check a manageable number of specific web pages more frequently than they would by moving through the entire web.
Mike Gardner (ref. 2) has rightly suggested that we need science-oriented search engines with sets of scientific metadata, as metadata are the key to better searching. Alongside that proposal, an approach similar to the peer review of scientific publications could be applied to categorize and evaluate web pages according to their content, quality and subject-specificity. It would be feasible to use algorithms and rule-based expert systems to rank retrieved results by content-richness, subject-specificity and freshness.
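A rule-based ranking of this kind might look like the following sketch. Everything specific in it is an assumption made for illustration: the `Page` record, the `score` function, the three simple rules (share of distinct words as a proxy for content-richness, share of subject terms for subject-specificity, linear decay over 30 days for freshness) and the weights are not taken from any existing system.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    words: list              # tokens extracted from the page
    days_since_checked: int  # time since the crawler last visited


def score(page, subject_terms, max_age_days=30, weights=(0.3, 0.5, 0.2)):
    """Combine three illustrative rules into one relevance score in [0, 1]."""
    if not page.words:
        return 0.0
    # Content-richness: fraction of distinct words (assumed proxy).
    richness = len(set(page.words)) / len(page.words)
    # Subject-specificity: fraction of words that belong to the subject vocabulary.
    specificity = sum(w in subject_terms for w in page.words) / len(page.words)
    # Freshness: decays linearly to zero once the page is max_age_days old.
    freshness = max(0.0, 1 - page.days_since_checked / max_age_days)
    w_rich, w_spec, w_fresh = weights
    return w_rich * richness + w_spec * specificity + w_fresh * freshness


subject_terms = {"kinase", "protein", "structure"}
pages = [
    Page("b", ["buy", "cheap", "protein", "shakes"], days_since_checked=60),
    Page("a", ["kinase", "protein", "structure", "data"], days_since_checked=3),
]
ranked = sorted(pages, key=lambda p: score(p, subject_terms), reverse=True)
print([p.url for p in ranked])
```

The weights favour subject-specificity here only as an example; in practice they would have to be tuned, or replaced by reviewer judgements of the kind the peer-review analogy suggests.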
Millions of dollars need to be invested in developing search engines. This investment could be cost-effective if it resulted in an almost zero noise-to-signal ratio and precise but comprehensive subject-relevant hits. The development work should be done in the academic sector, but the completed search engines would have commercial potential, and would generate more revenue than general search engines because of their subject-specificity.
Development of subject-specific search engines would satisfy the growing demand for the latest, precise, value-added, noise-free hits with a high level of subject relevance. Many such search engines together would be able to index most of the World-Wide Web.
1. Lawrence, S. & Giles, C. L. Nature 400, 107–109 (1999).
2. Gardner, M. Nature 401, 111 (1999).