8.5 Basics of Searching on the Web

There are two types of search engines for finding information on the web. Site-specific search engines locate material contained within a single website, while web search engines locate information from across the web as a whole. Both types perform the same basic functions. What differs between them is how the searched database was created, how searches within that database are best conducted, and what kind of information you’ll retrieve from it.

The first thing you should do when you go to an unfamiliar search site (or even one you’ve been using, but with some degree of frustration) is to find the search “help” file. Spending a few minutes with the help file will get you ready to do a thorough and effective search.

Directories vs. Indexes

The most important distinction for you to understand about a web search site is whether it is a directory or a machine-indexed search tool.

A directory search engine organizes its contents into hierarchical lists of subject categories. Its databases are compiled and maintained by humans, and users browse these categories by subject in search of relevant information. Examples of directory search sites include the original Yahoo! Directory (discontinued in 2014) and online library catalogs.
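To make the idea of a human-built hierarchy concrete, here is a minimal Python sketch of a subject directory. It is an illustration only, not how any real directory site is implemented: the `Category` class, the `browse` function, and the example site names are all hypothetical. The point it shows is that a directory user reaches material by walking down a chain of categories rather than by typing a keyword query.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One subject category in a hypothetical, human-maintained directory."""
    name: str
    sites: list[str] = field(default_factory=list)  # sites an editor filed here
    subcategories: dict[str, "Category"] = field(default_factory=dict)

    def add(self, child: "Category") -> "Category":
        """File a subcategory under this category and return it."""
        self.subcategories[child.name] = child
        return child


def browse(root: Category, path: list[str]) -> list[str]:
    """Follow a path of category names from the top down; return the sites listed there."""
    node = root
    for name in path:
        node = node.subcategories[name]  # KeyError if no such category exists
    return node.sites


# Hypothetical directory built by hand, as a human editor would.
root = Category("Top")
science = root.add(Category("Science"))
science.add(Category("Biology", sites=["example-cell-biology-guide"]))
science.add(Category("Physics", sites=["example-physics-tutorials"]))

print(browse(root, ["Science", "Biology"]))  # ['example-cell-biology-guide']
```

Because every entry is placed by a person, a directory can only list what its editors have filed, which is one reason its coverage is usually much smaller than a machine-built index.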

Instead of relying on humans to enter information about sites, machine-indexed search engines rely on software. Danny Sullivan, founder of Search Engine Watch, explains how this “spider” (sometimes called “robot”) software works:

“The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being ‘spidered’ or ‘crawled.’ The spider returns to the site on a regular basis, such as every month or two, to look for changes. Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information. Sometimes, it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been ‘spidered’ but not yet ‘indexed.’ Until it is indexed – added to the index – it is not available to those searching with the search engine.” (2002)

A few examples of machine-indexed search engine sites are Google and Bing.
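The spider-and-index process Sullivan describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Google or Bing actually work: a hypothetical in-memory `TOY_WEB` stands in for fetching real pages, and the names `crawl`, `build_index`, and `search` are made up for this example. The index here is an inverted index (each word mapped to the set of pages containing it), a common way to organize search indexes; unlike the “giant book” in the quote, it does not keep a full copy of each page. The sketch shows why a page can be “spidered” but not yet “indexed”: `search` consults only the index, so a crawled page stays invisible until the indexing step has run.

```python
from collections import deque

# Hypothetical toy "web": each URL maps to (page text, outgoing links).
# No network access; this dictionary stands in for real web pages.
TOY_WEB = {
    "site/home": ("welcome to the library", ["site/hours", "site/catalog"]),
    "site/hours": ("library hours and locations", ["site/home"]),
    "site/catalog": ("search the catalog by subject", ["site/home"]),
}


def crawl(seed, web):
    """Spider: visit a page, read it, then follow its links (breadth-first)."""
    seen = {seed}
    queue = deque([seed])
    fetched = {}  # pages that have been spidered but not yet indexed
    while queue:
        url = queue.popleft()
        text, links = web[url]
        fetched[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(fetched, index=None):
    """Indexer: add fetched pages to an inverted index (word -> set of URLs)."""
    index = {} if index is None else index
    for url, text in fetched.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, looking only at the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


fetched = crawl("site/home", TOY_WEB)  # all three pages are spidered
index = build_index({"site/home": fetched["site/home"]})  # only one page indexed so far
print(search(index, "catalog"))  # set(): the catalog page is spidered but not indexed
index = build_index(fetched, index)  # the indexer catches up
print(sorted(search(index, "catalog")))  # ['site/catalog']
```

Separating `crawl` from `build_index` mirrors the two parts of a search engine in the quote; in real systems these stages run on different schedules, which is what produces the delay between a page being crawled and becoming searchable.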

Database “scope”: It is important to understand the scope of whatever database you are searching. The scope defines the range of materials that the database indexes. Google, for example, aims to index all the content of websites on the surface web. Google News, however, indexes only news stories, and Google Scholar indexes articles from scholarly publications. The New York Times website also has a database of news stories, but it contains only stories published in the New York Times. The Star Tribune has two databases of news stories on its website: one with stories the newspaper has published online in the past three to four years, and another with stories published in the newspaper since the 1980s. Knowing the scope of the database you are using will help you judge where you are likely to find the kind of information you need.