What is power searching?

Power searching is the process of finding good quality information from the web as quickly and as easily as possible.

Why do you need to be a power searcher?

It would be fine if there were a contents or index page for the web, but there is not. Instead, you have to use one of the many search services that are available on the web, but which one should you use?

It has been estimated that there are now twice as many pages on the publicly indexable web as there are people on the planet, and a lot of this information changes daily. When you type a keyword into a search engine and find that you have 1,250,000 results, what should you do?

How much of the information on the web is publicly indexable and how much is 'invisible' to the popular search engines; how do you find this invisible portion?

Initially, universities produced most of the information on the web, but since the explosion of 'free' home pages and the influx of commercial sites and advertisements, the evaluation of content on the web has become much harder. How do you separate the wheat from the chaff?

The answer to all these questions is to become proficient in the art of power searching.

What do you need to become a power searcher?

To become a power searcher you need plenty of time to practise and familiarise yourself with the best search services available. The information in this section will vastly speed up that process by guiding you to the right places to start.

Search services

A search service can be defined as any tool that you can use on the Internet to help find a certain piece of information. Search services are found on the web, are mostly free to use and consist of a web page where you type one or more keywords into a search box. Click on the search button and a new web page is generated showing you the results of your search. The results are listed as hyperlinks to the web pages/websites that contain the information; each result normally carries a brief description. The most common search services can be divided into two categories: search directories and search engines. The two are often confused.

Search directories and search engines; what is the difference?

Search directories are compiled by human researchers, while search engines use automated programs to compile their indices. Search directories review and categorise a small number (up to three million) of websites, while search engines index a large number (claims of up to eight billion) of web pages. The distinction is blurred by the fact that many search engine websites also contain a separate search directory area, and many search directory sites also carry a search engine area; however, each tends to specialise in one service or the other.

The bottom line is that directories are best for broad, general searches while search engines are more useful for focused searches.

Yahoo! Directory; how does a search directory work?

I will illustrate how a search directory works by considering Yahoo! Search Directory, as this is still the best-known directory service. Yahoo! was launched in 1994 as a search directory, but over the years its significance has dwindled as Yahoo! repositioned itself in the market as a major web portal and search engine in its own right. A portal is a website that acts as an opening to a large and diverse selection of web resources, such as email services, online shopping, forums, news, weather, sport and entertainment updates.

Yahoo! uses many different websites for its different services. The main portal site1 for Yahoo! is http://www.yahoo.com. From this portal, the default search option uses the Yahoo! search engine to gather results, although there is an option above the search bar to select a search of the directory instead. If you intend using Yahoo! simply as a portal, then it is better to use the UK site2 (http://uk.yahoo.com/), as you then have the option of limiting your web searches to just the UK and Ireland (and the sports pages will not be full of baseball!). However, you will find that the link to Yahoo!'s directory is missing from the UK site. Yahoo! also maintains two separate sites solely for its search engine3 (http://search.yahoo.com) and search directory4 (http://dir.yahoo.com).

Go to the Yahoo! Search Directory site4 and you will see a search bar at the top of the page and a list of categories on the left hand side of the page. The directory has been built, and is maintained, by a staff of human editors. They have reviewed and categorised about three million websites and have sorted them into 14 main categories. To use the directory, just drill down the categories and subcategories until you find the sort of information you are looking for. For example, if you click on the main category Health, and then on the subcategory Dental Health, you will find at the top of this page thirteen subcategories listed, such as Amalgam, Consumer Dental Products, Dental Surgeries, Professional Resources etc (see Fig. 1). The number in brackets after each subcategory indicates how many websites are listed; an 'at' symbol (@) indicates that there are further subdirectories. To the right of the page are clearly listed 'sponsor results'; websites that organisations or businesses pay to have listed. Yahoo! tries to ensure that such sponsored sites are relevant to that category. Underneath the subcategories are about 20 dental websites listed in order of popularity.

Figure 1: Screen shot taken from the Yahoo! search directory. It shows the Dental Health category with thirteen subcategories, five sponsored listings and the first four website results.

Instead of drilling down the categories, you can perform a search of the directory by typing your keyword(s) into the search bar at the top of the page, ensuring that the option for searching the directory is selected (as opposed to searching the web). Your keyword search is then matched to words in the editor's description of the website.
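The two ways of using a directory described above (drilling down through categories, and matching keywords against the editors' descriptions) can be sketched in a few lines of Python. This is only an illustrative model: the site names, descriptions and the `DIRECTORY`, `drill_down` and `search_descriptions` names are invented for the example and say nothing about how Yahoo! actually stores its catalogue.

```python
# Toy directory: category -> subcategory -> list of (site name, editor's description).
# Every entry here is invented purely for illustration.
DIRECTORY = {
    "Health": {
        "Dental Health": [
            ("Example Dental Association", "professional resources for dentists"),
            ("Example Toothpaste Co", "consumer dental products and advice"),
        ],
        "Nutrition": [
            ("Example Diet Guide", "healthy eating advice for families"),
        ],
    },
}


def drill_down(directory, *path):
    """Follow a chain of category names and return whatever is listed there."""
    node = directory
    for name in path:
        node = node[name]
    return node


def search_descriptions(directory, keyword):
    """Return the names of sites whose editor's description contains the keyword."""
    keyword = keyword.lower()
    hits = []
    for subcategories in directory.values():
        for sites in subcategories.values():
            for name, description in sites:
                if keyword in description.lower():
                    hits.append(name)
    return hits


# Browsing: Health > Dental Health
print([name for name, _ in drill_down(DIRECTORY, "Health", "Dental Health")])
# Keyword search across every description in the directory
print(search_descriptions(DIRECTORY, "advice"))
```

Note that the keyword search only sees the short descriptions, not the content of the websites themselves, which is one reason a directory search returns far fewer results than a search engine.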

Advantages and disadvantages of a search directory

A search directory is probably the closest thing we have to a catalogue of the web. It is easy to use and is useful if you are starting with a broad topic of interest and then want to focus your search down as you explore. A directory is good for finding organisations, commercial sites and products. The main drawbacks of search directories are that they only catalogue websites (as opposed to web pages), their catalogues are comparatively small and their hyperlinks are often out of date.

Other search directories

The Open Directory Project5 uses volunteer editors (about 7,500 active editors at any one time) to catalogue the web. (The concept of using a large-scale community of editors to compile online content has been successfully applied to other types of projects such as the free encyclopedia, Wikipedia6).

The Open Directory Project is the biggest general search directory service, with links to about four million websites. You can search using the Open Directory website, but this is not recommended since the ranking of the results is poor. It is much better to use the Google Directory,7 a website that uses Google's more advanced algorithms to query the Open Directory catalogue and then rank the results more appropriately.

Google; how does a search engine work?

Google8 is currently the undisputed champion of search engines. In the USA, during the month of July 2006, there were an estimated 5.6 billion searches carried out by web users, of which 49.2% were carried out on Google. Founded in 1998 by Stanford students Larry Page and Sergey Brin, Google receives more than one billion searches a day worldwide. The name is a play on the word googol (the number 1 followed by 100 zeros) and reflects the company's mission to organise the vast amount of information available on the web. Google is the first search engine name to be added to the Oxford English Dictionary as a verb, as in 'I don't know the answer but I'll Google it'.

As mentioned earlier, search engines use automated programs to gather information from web pages. These programs are called spiders, crawlers or robots. They visit a web page, read it, and then follow links to other pages within the site. They return to the site on a regular basis (anywhere from daily to a month or two) to register any changes.
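The link-following behaviour described above can be sketched with a tiny in-memory 'web'. This is a simplified illustration, not any search engine's real crawler: the page names, the `TOY_WEB` link table and the `crawl` function are all assumptions made for the example, and no network access is involved.

```python
from collections import deque

# Hypothetical miniature web: each page maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "services"],
    "about": ["home"],
    "services": ["prices", "home"],
    "prices": [],
    "orphan": ["home"],  # no page links here, so the crawler never finds it
}


def crawl(start, links):
    """Visit pages breadth-first from a starting page, following each link once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        visited.append(page)  # "read" the page
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("home", TOY_WEB))
```

The 'orphan' page is never visited because nothing links to it, which hints at why some pages never reach a search engine's index.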

This information is used to create an index or gigantic database; when you enter your query, a complex algorithm interrogates the index and returns the results in order of relevancy. Therefore, search engines do not search the web for the results; they search their own index of the web. This explains why the results from different search engines can often be wildly different. The results will depend on a number of factors:

  • How the index has been designed

  • What information on the web page is indexed

  • The size of the index

  • How often the index is updated

  • Rules for how search terms are used to query the index

  • How the results are ranked

  • How the results are presented.
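The point that a search engine queries its own index rather than the live web can be illustrated with a minimal inverted index. The page texts and the `build_index` and `search` names are invented for this sketch, and real indices are vastly more sophisticated.

```python
from collections import defaultdict

# Invented page texts standing in for pages gathered by a crawler.
PAGES = {
    "page1": "dental implants in London",
    "page2": "London dentist offering porcelain veneers",
    "page3": "porcelain veneers and dental implants explained",
}


def build_index(pages):
    """Map each lower-cased word to the set of pages that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "porcelain veneers")))
print(sorted(search(index, "dental london")))
```

Once the index has been built, a page that changes or disappears from the web will still be returned until the index is refreshed, and two engines that index different pages or different words will return different results for the same query.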

I will use Google to illustrate this process.

How the index has been designed

Google indexes about the first 100kB of a web page. It also records some URLs of pages that it has not actually got round to indexing. In June 2005, Google stated that its index included eight billion HTML files (web pages), text documents, Adobe PDF files, Microsoft Office documents and other similar files. It also had an index of 17 million images and 1 billion Usenet messages. Google updates its index continuously.

Rules for how search terms are used to query the index

Most of the major search engines, including Google, use the rules shown in Table 1. There are also some more advanced search commands that can be used with Google, as shown in Table 2.

Table 1 Search rules commonly used by all major search engines
Table 2 More advanced search commands that can be used with Google

How the results are ranked

Google's success and popularity have a lot to do with the special way it ranks the results of a web search. In the past, search engines ranked results by looking at the location and frequency of the search words on the web pages in question. Google, on the other hand, introduced 'link popularity' as a measure of relevance, which can greatly boost the ranking of a web page. The algorithm (set of rules employed to rank the listings) used by Google is called PageRank. Google regards a page as being important if lots of pages from other websites have links to it. It considers a page as very important if pages of high importance have links to it. Google sees a link to a web page as the online equivalent of a citation in a book. Web page authors generally only link to pages that they think are important; it is almost like a peer review process for the web.
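The idea of link popularity can be sketched with a textbook simplification of PageRank. This is not Google's production algorithm: the link graph is invented, and the damping factor of 0.85 and the fixed 50 iterations are assumptions commonly used in simplified versions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly share each page's score among its links.

    `links` maps every page to the list of pages it links to.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented link graph: three pages link to "hub"; "hub" links only to "a".
LINKS = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}

ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, 'hub' scores highest because three pages link to it. Page 'a' has only one incoming link, but because that link comes from the important 'hub' page it outranks 'b' and 'c', which nobody links to.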

How the results are presented

Google has one of the least cluttered results pages, making it simple to read and understand (see Fig. 2). The results page shows the top ten results for that particular search query; it also shows the total number of results and how long the query took. Google's spell checking software will automatically suggest a more common spelling if it thinks that you have spelt a word incorrectly. On the right hand side of the results are the sponsored links, and at the bottom of the page are links to the next nine pages of results. Results are clustered so that no more than two pages per website appear in its results; if a second page is listed, it is indented under the first page.

Figure 2: The Google results page following a search on 'dental CPD online'. The red boxes show the main features of the results page. Sponsored results appear to the right of the results (Google™ screen capture © Google Inc., printed with permission).

Other features of Google

Other important features found on the Google website include:

  • Most pages are cached; this means that even if the web page has disappeared from the web, you will still be able to view the copy that is stored by Google

  • Pages written in French, German, Italian, Spanish, and Portuguese will normally show on the results page with a 'translate this page' hyperlink. Click on this to generate a new version of the page that has been automatically translated into English. Obviously text that is part of an image will not be translated

  • As mentioned earlier, as well as web pages, the results may include other documents such as Adobe PDFs and Microsoft Office files. This is indicated in the results; clicking on the link will open the appropriate file provided you have the software to read it. There is also a hyperlink to open the file as an HTML document, which is useful if you do not have the extra software installed

  • If you click on the Images tab from the results page, Google carries out a search of its index of images, and shows the results as thumbnails

  • The Groups tab takes you to Google Groups, which provides web-based access to Usenet. You can read and post messages, just like in Outlook Express, but in addition you can search for keywords in more than 845 million messages dating back to 1981

  • The News tab takes you to Google News. This searches for relevant news items from about 4,500 news sources worldwide. Some results include a small thumbnail image from the source

  • The Froogle tab takes you to Froogle, a place where you can search for products that are sold online. Information is obtained by searching online stores and from product information supplied by sellers. Google does not accept payment for any of the results listed, but there is a sponsored links section just to the right of the results. You cannot buy products via Froogle; it merely points you to merchants who will sell the product

  • The more ≫ tab takes you to a page which lists further Google services (such as the Calendar and Maps) and eight Google tools (such as the efficient Desktop Search). Some of these are Beta versions of new search services; Beta means a pilot program or test version of a product. Keep an eye out for the latest services by clicking on the Labs link

  • Adult content filter; this can prevent Google from inadvertently directing young children to offensive sites

  • If you type www.google.com into the browser address bar and click on Go, you are normally redirected to the UK site for Google; the same applies to many other search engines such as www.ask.com.

Other major search engines

Windows Live Search9 (formerly MSN) and Ask.com10 (formerly Ask Jeeves) are the other two main search engines used in the UK.

  • Microsoft launched its new Windows Live Search engine in September 2006; it currently has an index of about five billion documents and 400 million images. The results for the images are better displayed than in Google's results and there is the added feature of a Scratchpad where you can store collections of images from the results (see Fig. 3)

    Figure 3: Results from the Windows Live Search Images page when searching for the terms 'porcelain veneers'. On the right hand side is the Scratchpad where you can store selected results from image searches (Windows Live Search screen shot reprinted with permission from Microsoft Corporation).

  • Ask.com replaced Ask Jeeves in February 2006 and still encourages users to use 'natural language' questions in its search box as well as traditional keyword searching. Provided you ask a common enough question, a box will appear at the top of the results page with an appropriate human-written editorial response; there may also be links to other web resources. Below that will appear paid listings (adverts). At the bottom of the page will be results generated by the Ask.com search engine

  • As mentioned earlier, Yahoo! (once famous for its search directory) now has its own major search engine. Provided you are using the UK version of Yahoo! the Local tab can be very useful; the database of businesses is supplied through an alliance with BT Phone Book.

Other search engines of note

Many other search engines now merely show the results from another company's search engine. For example, AOL Search and Netscape use the Google search engine, while AltaVista is based on the Yahoo! search engine. AltaVista11 is a good place to search the web for MP3 files (music and sound) and video files. AltaVista also hosts the Babel Fish translation tool,12 which allows you to type (or cut and paste) up to 150 words of foreign text into a form for an instant translation. This popular, free translation service covers 12 languages, including Dutch, Japanese, Korean and Chinese. A new UK search engine site designed specifically for medical general practitioners is Search Medica.13 Searchers can limit their search to medical sites chosen by doctors, to NHS sites only or to the entire web. It has an uncluttered interface and is easy to use.

Metasearch engines

There is a different breed of search engine called a metasearch engine. It is not a search engine in its own right; it works by simultaneously forwarding your query to more than one of the major search engines. It then ranks and compiles the results in a meaningful way.
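How a metasearch engine might compile results from several sources can be sketched as follows. The actual ranking methods used by metasearch engines are not described here, so the round-robin interleaving below is purely an assumption for illustration, as are the `merge_results` name and the example result lists.

```python
from itertools import zip_longest


def merge_results(*result_lists):
    """Interleave ranked result lists, keeping only the first copy of each URL."""
    merged = []
    seen = set()
    # Take the first result from every engine, then the second, and so on.
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Invented results from two hypothetical search engines.
engine_a = ["site1.example", "site2.example", "site3.example"]
engine_b = ["site2.example", "site4.example"]

print(merge_results(engine_a, engine_b))
```

The merged list contains every distinct result once, so the user sees pages that only one of the underlying engines had indexed.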

A metasearch engine is worth using because there is little overlap between the results of the major search engines. To see this in action, visit the Thumbshots14 website where you can compare the overlap between any two search engines for any given keyword. For example, when I compared Google with Yahoo!, there was only a 21% overlap for the keywords dentist London.
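An overlap figure of this kind can be calculated in several ways. The sketch below assumes one plausible definition (results found by both engines as a percentage of all distinct results found by either); the source does not say how the Thumbshots site calculates its figure, and the function name and example URLs are invented.

```python
def overlap_percentage(results_a, results_b):
    """Percentage of all distinct results that appear in both result lists."""
    set_a, set_b = set(results_a), set(results_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 100.0 * len(set_a & set_b) / len(union)


# Invented top results from two hypothetical engines.
first = ["a.example", "b.example", "c.example"]
second = ["c.example", "d.example", "e.example"]

print(overlap_percentage(first, second))  # one shared result out of five distinct
```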

There are dozens of metasearch engines available, but here are some of the best:

  • Dogpile15 searches for web pages, images, audio, video and news items from eight search engines, including Google, Windows Live Search and Yahoo! There is also a UK version where you can limit the results to just the UK16

  • Vivisimo17 metasearch engine uses document-clustering software to categorise web page results into hierarchically sorted category folders. Unlike directories, which have been developed by human editors, Vivisimo is fully automated and can sometimes uncover results that would otherwise remain buried

  • KartOO18 is a metasearch engine with a visual display interface. It presents the results as a series of interactive maps.

Tips for using a search engine

  • Get to know at least one search engine inside out (preferably Google). Study the help files and the page for advanced searches. When you come across a new search service, you will then have something to compare it against

  • If there is the option of using a UK edition of a search engine then use it since it will often allow you the choice of limiting your search to within the UK. Just be aware that the index used by regional editions may not be as current as that used by the .com edition

  • Keep your keywords focused. Think about the sorts of words or phrases that are likely to appear on a web page with the kind of information you are looking for

  • Consider differences in language. If you are looking for American pages, think 'color' not 'colour', and 'vacation' not 'holiday'

  • Do not worry about getting millions of results; it is the top 10-20 results that you should be interested in. If necessary, use filters such as +, − and OR to refine the first page of results so that each link will provide relevant information

  • Interpret the URL before you click on it. Look for clues as to the origin of the web page, for example, a URL with .ac or .edu will be from an academic site

  • Clicking on a result may lead to a web page deep within a website with no way of navigating to the rest of the site. Look at the URL in the address bar of the browser and try deleting sections at the end of the URL until you get back to the root page

  • Recognise that a lot of search terms will produce results that will be links to other search engines and directory listings. For example, if you are searching using the keywords 'sony camcorder', you are sure to find one of the results will lead you to Kelkoo,19 Europe's biggest online price-comparison search engine. From here you can compare prices of products from 1,500 online retailers. Clicking on a link from the Kelkoo site will then take you to the retailer's page

  • Although these price-comparison search engines are useful if you are looking to buy a product, they can render other searches almost useless due to the long list of price-comparison links that litter the first page of results. One answer is to add '-pricerunner -dealtime -kelkoo' to such a search. This is most easily achieved by using a text replacement utility such as ShortKeys;20 it can be set up to place this text automatically by using a simple keyboard shortcut.
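Two of the tips above concern reading URLs: looking for clues to a page's origin, and trimming sections from the end of a URL to work back to the root page. Both can be sketched with Python's standard `urllib.parse` module. The function names and the example URL are invented, and the academic check is only a rough clue, not a reliable test.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """List progressively shorter URLs, ending at the site's root page."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    parents = []
    while segments:
        segments.pop()  # delete the last section of the path
        path = "/" + "/".join(segments)
        if segments:
            path += "/"
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


def looks_academic(url):
    """Rough clue only: does the host name carry an 'ac' or 'edu' label?"""
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    # Skip the first label so a host such as 'ac.example.com' is not matched.
    return "ac" in labels[1:] or "edu" in labels[1:]


deep_link = "http://www.example.ac.uk/dept/staff/page.html"
for candidate in parent_urls(deep_link):
    print(candidate)
print(looks_academic(deep_link))
```

Each URL printed is a candidate to try in the address bar; not every intermediate address will exist on a real site, so work upwards until one responds.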

Advantages and disadvantages of a search engine

Search engines are best at finding unique keywords, phrases, quotes and information buried in the text of web pages and associated documents. They can also be used to find images, sounds, video, online products and news.

Because search engines provide such a wide range of responses to specific queries, it is often necessary to use filters to narrow down the results. Remember that when you use a search engine, you are actually only searching a portion of the web, captured in a fixed index created at an earlier date. Results will differ between different search engines. Because the overlap between the major search engines is actually very small, consider using a metasearch engine. Some search engines have a very cluttered results page and it is easy to inadvertently click on an advert instead of a result.
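The exclusion filter suggested in the tips (adding minus-prefixed words to remove price-comparison sites) can also be applied automatically. The sketch below is a hypothetical helper, not a feature of any search engine or of ShortKeys; it simply builds the query text that you would then paste into the search box.

```python
# Site names taken from the tip on price-comparison engines.
PRICE_COMPARISON_SITES = ["pricerunner", "dealtime", "kelkoo"]


def exclude_terms(query, unwanted=PRICE_COMPARISON_SITES):
    """Append a minus-prefixed exclusion for each unwanted word."""
    exclusions = " ".join(f"-{word}" for word in unwanted)
    return f"{query} {exclusions}".strip()


print(exclude_terms("sony camcorder review"))
# prints: sony camcorder review -pricerunner -dealtime -kelkoo
```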

The invisible web

The 'invisible web' (or 'deep web') is that part of the web that is not indexed by the major search directories or engines, but is accessible to people online. It has been estimated21 that this 'invisible' part may be 500 times bigger than the visible web and is increasing in size at a greater rate. The invisible web can be divided into two main categories:

1. Databases. Most searchable databases are accessible from some sort of web page 'front end'. When a query is sent to a database, special software interrogates the database and then presents the information as a unique web page that is dynamically created. Though these dynamic pages have a unique URL that allows them to be retrieved again later, they are not persistent; search engines only index static pages/documents that have hyperlinks to other pages/documents. Most major search engines do not have direct access to query these 'invisible' databases. The vast majority of the databases are freely available to the general public; some are only open to subscribers (whether by paid or unpaid subscription), and the rest are password protected and accessible to only a minority of invited people. Databases make up the majority of the invisible web and include:

  • Private networks

  • Medical journal databases, eg PubMed22

  • Scientific databases, eg Scirus23

  • Financial information, eg Interactive Investor24

  • Publishing databases, eg online magazines requiring subscription

  • Product catalogues

  • Entertainment databases, eg The All Music Guide25

  • Auction databases, eg Ebay26

  • Interactive maps, eg Multimap27

  • Airline arrival information, eg BAA UK flight arrivals.28

2. Non-indexed pages:

  • Pages which have been hidden from search engine robots

  • Information that is either very new or changes quickly

  • Script-based pages are often intentionally ignored by search robots because if they are badly or maliciously written, they can 'trap' the robot within a loop of web pages

  • Flash and Shockwave files are difficult for search engines to index.

How to make the invisible web visible

The main search engines have indexed many of the web pages from where you can start interrogating the invisible web databases, but the difficulty is in knowing what databases are actually available for any given subject. A full catalogue of the invisible web would be useful but is not forthcoming. Google Scholar29 was launched in 2004 and does open up some of the invisible web content by linking with various academic publishers to gain access to material that is normally locked behind their subscription barriers. The bad news is that once you have located an article of interest, you are normally required to pay a fee to the publisher for the full-text content.

Figure 4 uses a diagram to represent how information on the web is indexed and is accessible to different search engines. Figure 5 shows how a metasearch engine can provide access to a greater spread of information and how databases in the invisible web (such as PubMed and the content of Google Scholar) can only be accessed from specific web pages. Note that none of these diagrams are to scale (as already mentioned, the invisible web is thought to be 500 times bigger than the visible web!).

Figure 4: Diagram representing the different types of data stored on the web and the mix of data indexed by two different search engines. Google indexes text from HTML pages, images, non-HTML documents and postings from every Usenet group. AltaVista indexes HTML pages, images, non-HTML documents, video and sound files. Note that there is a degree of overlap between the two search engines. At the bottom of the diagram are the databases from the invisible web, with the links from their web page front ends located in the visible part of the web. For example, the dark blue circle is the database for the Tesco website and the thin blue trail leads to the Tesco home page. The vast majority of the information on the Tesco website is invisible to the search engines. However, every search engine will find you the Tesco home page, from where one can access the whole Tesco database. (Most of the pages will be created 'on the fly' from information gathered from the database rather than just 'static' pages.)

Figure 5: Diagram showing the Dogpile metasearch engine. It is not a search engine in its own right, but it gathers its results by querying eight other search engines simultaneously. The bottom of the diagram illustrates other ways of accessing the invisible web. It shows the PubMed database which can be interrogated from many websites within the visible web. It also shows the Google Scholar search engine web page, which can access the data from numerous databases that are normally 'invisible' to other search engines.

PubMed

PubMed is a database service from the US National Library of Medicine that includes over 16 million citations and abstracts from Medline and other life science journals for biomedical articles dating back to the 1950s.

There are many ways of accessing PubMed's database but the easiest way is via the website of the National Center for Biotechnology Information.22 PubMed contains citations from over 4,800 international journals and includes hyperlinks to many sites providing full text articles and other related resources. Being part of the invisible web, the data itself is not stored as web pages, and because of this you cannot use an ordinary search engine to carry out a PubMed search unless that search engine also has access to the database. (Many web pages carry links to citations stored by PubMed, simply by quoting the unique PubMed index number for that particular paper.)

The easiest way of using PubMed is to carry out a text word search. When you enter your terms in the search box and click on Go, PubMed will automatically look for a match in four main lists. It looks to see if any of the terms are contained in its list of medical subject headings (MeSH), table of journals, index of authors or any other field, such as in the title or abstract. MeSH is a hierarchical index of specific terms that have been assigned to describe all the main topics; it is useful for retrieving information that may use different terminology for the same concepts. For example 'Maryland Bridge' and 'Resin Bonded Bridge' are both synonyms for the MeSH term 'Denture, Partial, Fixed, Resin-Bonded' (see Fig. 6). To find out more about how MeSH terms work, try the animated tutorials from the MeSH database page. To reach the MeSH database, go to the PubMed home page and click on the MeSH Database link, situated in the left hand column. To see all the dental MeSH terms, type 'dentistry' into the MeSH database search box. Click on Go and you will see a definition of dentistry as the first result. Click on the text to the right of the definition that says Links and then select NLM MeSH Browser to see all of the MeSH tree structures. Think of MeSH as a controlled vocabulary thesaurus.
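The role of MeSH as a controlled vocabulary can be sketched as a lookup from everyday terms to a single official heading. The two synonyms below are the ones given in the text; the `ENTRY_TERMS` table, the `expand_query` function and the format of the expanded query are assumptions made for illustration and are not PubMed's actual query syntax.

```python
# Synonyms taken from the example in the text; the real MeSH vocabulary is far larger.
ENTRY_TERMS = {
    "maryland bridge": "Denture, Partial, Fixed, Resin-Bonded",
    "resin bonded bridge": "Denture, Partial, Fixed, Resin-Bonded",
}


def expand_query(query):
    """Search for the MeSH heading as well as the words the user typed."""
    heading = ENTRY_TERMS.get(query.strip().lower())
    if heading is None:
        return f'"{query}"'  # no heading known: fall back to a plain text search
    return f'"{heading}" OR "{query}"'


print(expand_query("Maryland Bridge"))
print(expand_query("Resin Bonded Bridge"))
```

Both queries are expanded to include the same heading, so articles indexed under that heading are found whichever name the searcher happens to use.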

Figure 6: Results from the PubMed database on the search terms 'maryland bridge'. Clicking on the Details tab shows how the query has been translated by the search engine into MeSH terms and text words (see the infill in red).

Tips for using PubMed

  • To search using an author's name, enter the name in the format of surname plus initials (with no punctuation), eg smith ja, jones t

  • The abbreviation for the British Dental Journal is Br Dent J

  • ClusterMed30 from the creators of the Vivisimo metasearch engine is designed to provide easy navigation of PubMed's articles. While a PubMed search returns a chronological list of all articles matching a search criterion, ClusterMed automatically groups the articles into hierarchical folders based on subject categories (see Fig. 7). Tabs enable you to quickly reorganise the results by title/abstract, MeSH headings, author and date. Unregistered users can test a limited version of ClusterMed that will return not more than 100 results

    Figure 7: Screen shot showing the ClusterMed tool from Vivisimo. It arranges the results from PubMed into hierarchical folders based on subject categories.

  • The EviDents search engine31 for evidence-based dentistry uses a straightforward page to help focus any dental search of PubMed

  • Take into account that the Medline database is not complete; there are many articles that have been published in journals normally catalogued by Medline, which have not found their way into the database

  • For more detailed suggestions on how to use Medline for dental searches, read the articles32,33 from Primary Dental Care; there is also a useful book called Field guide to Medline: making searching simple.34
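The author format from the first tip (surname plus initials, no punctuation) is easy to get wrong by hand, so here is a small hypothetical helper that produces it. The function name is invented, and the sketch only strips punctuation from the given names; surnames are passed through unchanged.

```python
import re


def pubmed_author(surname, given_names):
    """Format a name as lower-case surname plus initials, eg 'smith ja'."""
    parts = re.split(r"[\s.\-]+", given_names)
    initials = "".join(part[0] for part in parts if part)
    return f"{surname} {initials}".lower().strip()


print(pubmed_author("Smith", "John A."))
print(pubmed_author("Jones", "T"))
```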

Other ways of finding information

  • Yahoo! Answers35 is a community-driven service that allows users to ask and answer questions posed by other users. As of November 2006, the site contained 65 million answers and more than 7 million questions

  • Visit the website of an appropriate company or organisation and ask your question by email. The quality of the response can be varied

  • The online encyclopaedia, Wikipedia,6 is an absolutely fantastic resource on just about anything you can think of. Started in 2001, it is an ongoing collaborative project that is written by many of its readers. Anyone can edit a page if they think that it can be improved (but inappropriate changes are quickly removed and repeat offenders can be blocked from editing). There are also many sister projects such as Wikiversity, Wiktionary and Wikiquote

  • Mailing lists such as GDP-UK36 can be an invaluable place to ask a dentally related question.

The critical evaluation of information or 'But I found it on the Internet!'

One of the most important skills that needs to be learnt before you become a true power searcher is the ability to critically evaluate the information that you find from the results of your searches. Much has been written about this topic37 and here are some simple guidelines to follow. Always ask yourself the following questions:

  • Who has written the information and what are their qualifications?

  • What does the author say is the purpose of the site?

  • Who is the intended audience?

  • When was the information written and when was it last updated?

  • How complete and accurate are the information and links provided?

  • Where does the information come from?

  • Is there any bias, for example, due to sponsorship?

  • Is the information consistent with other information published about the topic?

If you knew that the author of a medical article was a thirteen year old from Outer Mongolia and that it was written as part of a school project in 1996, would this change the way in which you regarded the information? For many web pages, it is simply not possible to know who the author is, where they are from, etc. For this reason, there have been many attempts to gather together hand-selected, evaluated Internet resources 'under one roof'. These are called subject gateways:

  • The Medicine gateway of Intute: Health and Life Sciences38 (which used to be known as OMNI) is probably the most important medical gateway in the UK. This gateway has been created by a core team of information specialists and subject experts led by the University of Nottingham Greenfield Medical Library. Carrying out a search on 'dent*' resulted in 434 evaluated Internet resources (the * is a wildcard filter that is used by lots of search tools and is useful for finding derivatives and spelling variants of the same word, for example dent* could represent dentist, dentistry, dental etc). Intute also have a website called the Virtual Training Suite39 that provides free Internet tutorials on how to get the best from the Web for education and research; see the tutorial for medicine

  • The Cochrane Oral Health Group contributes to the Cochrane Library,40 a collection of databases that provide high-quality, independent evidence for healthcare decision-making.
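The wildcard search mentioned above (dent*) can be illustrated with Python's standard `fnmatch` module, which uses the same * convention. The word list and the `wildcard_matches` name are invented for the sketch; each search service implements wildcards in its own way.

```python
from fnmatch import fnmatchcase

# Invented word list for illustration.
WORDS = ["dentist", "dentistry", "dental", "denture", "student", "identity"]


def wildcard_matches(pattern, words):
    """Return the words that match a pattern in which * stands for any ending."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


print(wildcard_matches("dent*", WORDS))
```

Words such as 'student' and 'identity' contain 'dent' but do not start with it, so they are not matched by dent*.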

Which search service should I use?

Firstly consider exactly what information you are looking for: how detailed, how current, how wide ranging? Compare your requirements with the strengths and weaknesses of the different search services available and then use the right tool for the job (see Table 3). Unfortunately, this does mean that there is no one service that will be suitable for every occasion. It also means that it takes time and experience to get to know what services are available.

Table 3 Some search requirements and useful tools