CPSC 110: Lab #5

There are two main types of Web search utilities: indexes and search engines. Search engines give us a variety of tools to form our searches.

Indexes

What Are They?

Indexes are like electronic "yellow pages": you can look up sites by topic. One of the most successful sites on the Internet is Yahoo!. This site began as simply a list of the favorite sites of a couple of Stanford University graduate students. Their idea gradually grew into a kind of "yellow pages" for the World Wide Web.

Yahoo! is located at http://www.yahoo.com/.

From this site, you can look for information by subject, or you can type a word or phrase in the search box on the page. The subjects are arranged hierarchically. That is, general topics are listed on the home page, and each link you select refines your topic further until (hopefully) you find what you're interested in.
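To make the idea of a hierarchy concrete, here is a minimal sketch in Python. The subject names and site names in DIRECTORY are made up for illustration and are not taken from Yahoo!; the browse function is a hypothetical helper.

```python
# A toy subject hierarchy. The categories and sites are invented for illustration.
DIRECTORY = {
    "Science": {
        "Computer Science": {
            "Programming Languages": ["site-a.example", "site-b.example"],
        },
        "Biology": ["site-c.example"],
    },
    "Recreation": {
        "Sports": ["site-d.example"],
    },
}


def browse(directory, path):
    """Follow a list of subject names down the hierarchy and return what is there."""
    node = directory
    for subject in path:
        node = node[subject]  # each selection narrows the topic one level
    return node


# Selecting three links in a row, starting from the home page:
print(browse(DIRECTORY, ["Science", "Computer Science", "Programming Languages"]))
```

Each step in the path plays the role of one click on a subject link.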

There are other indexes, such as About.com and Lycos.

Nearly all Web indexes include a search form for entering keywords. This can often work better than a search engine, because the Web index returns its own index entries. If you can get near your topic, you can then use those entries to explore it in various ways.

Using an Index

We will search using an index in class.

Search Engines

What Are They?

Search engines are computer programs that search an enormous index of Web pages for you, based on the keywords you specify. Don't let their apparent simplicity fool you: creating this software is an extremely impressive computing feat!

These enormous indexes are created by special programs that are sometimes called "web crawlers" because they crawl through the entire World Wide Web, following all of the links they can find and making a record of them.
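The link-following idea can be sketched in a few lines of Python. This is only an illustration: the LINKS table is a made-up stand-in for real Web pages (a real crawler would download each page and extract its links), and the crawl function is a hypothetical name.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home", "staff"],
    "news": ["home", "archive"],
    "staff": [],
    "archive": ["news"],
}


def crawl(start, links):
    """Visit every page reachable from start, following links breadth-first."""
    seen = {start}          # pages already recorded, so none is visited twice
    queue = deque([start])  # pages found but not yet visited
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # "making a record" of the page
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


print(crawl("home", LINKS))
```

The seen set matters: pages link back to each other, and without it the crawler would loop forever.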

Where Are They?

There are a growing number of search engines:

In addition, there are some all-in-one search engines which take your input and submit it to several search engines at once:

Yahoo! has its own list of Search Engines.

Use a Search Engine

Go to AltaVista in Netscape. You must use this search engine so that we can easily check your results.

For each of the queries in this section (and the next), record the number of links to pages that AltaVista finds. The searches for this section should return several thousand to several million pages; the searches in the next section should return several million pages.

Try a simple search

Query #1: John Calvin
AltaVista returns a page of some (but certainly not all) links that match these keywords. There is some indication of how many pages in all were found.

To insist that a particular word appear in the search results, put a plus sign in front of the word:

Query #2: +John +Calvin

If you want to search on a phrase, place double quotes around it:

Query #3: "John Calvin"

To insist that a particular word not appear in the search results, put a minus in front of the word:

Query #4: +John -Calvin
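To see how Queries #1 through #4 differ, here is a small Python sketch that applies the same rules to four made-up pages. It is a simplified model, not AltaVista's actual behavior. In particular, it assumes that a query with no plus signs matches any page containing at least one of its words. The names PAGES, contains, matches, and search are hypothetical.

```python
import re
import shlex

# Four invented pages to search.
PAGES = {
    "calvin": "John Calvin was a theologian in Geneva.",
    "hobbes": "Calvin and Hobbes is a comic strip; John likes it.",
    "smith": "John Smith sailed to Virginia.",
    "cooking": "A recipe for bread.",
}


def contains(term, words):
    """True if the term (one word or a quoted phrase) occurs in the word list."""
    wanted = term.lower().split()
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


def matches(query, text):
    words = re.findall(r"\w+", text.lower())
    required, excluded, optional = [], [], []
    for token in shlex.split(query):  # shlex keeps "quoted phrases" together
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)
    if any(contains(t, words) for t in excluded):
        return False
    if not all(contains(t, words) for t in required):
        return False
    # Assumption: with no required terms, at least one plain term must appear.
    if not required and optional:
        return any(contains(t, words) for t in optional)
    return True


def search(query):
    return sorted(name for name, text in PAGES.items() if matches(query, text))


for query in ['John Calvin', '+John +Calvin', '"John Calvin"', '+John -Calvin']:
    print(query, "->", search(query))
```

Notice that each query is stricter than the one before it until the last, which asks a different question altogether.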

Other Suggestions

If you want your search to be case insensitive (i.e., lowercase and uppercase don't matter), then enter your keywords entirely in lowercase letters. If you want your search to be case sensitive, then capitalize where it matters.
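The case rule can be written as a short Python function. This is a sketch of the rule as stated above, not of any search engine's code, and word_matches is a hypothetical name.

```python
def word_matches(keyword, word):
    """Apply the case rule: an all-lowercase keyword ignores case."""
    if keyword.islower():
        return keyword == word.lower()  # case insensitive
    return keyword == word              # capitals in the keyword must match exactly


print(word_matches("calvin", "Calvin"))  # lowercase keyword: case is ignored
print(word_matches("Calvin", "calvin"))  # capitalized keyword: case matters
```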

The asterisk functions as a wildcard character, meaning it matches zero or more characters of any kind. Thus, colleg* would match "college", "collegiate", etc.
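Python's standard fnmatch module uses the asterisk in the same way, so it gives a quick way to experiment with wildcards. This is only an analogy: fnmatch also treats ? and square brackets specially, which the search box described here may not. The word list is made up.

```python
from fnmatch import fnmatchcase

WORDS = ["college", "collegiate", "colleg", "collage", "colleges"]

# "colleg*" matches "colleg" followed by zero or more characters of any kind.
matched = [word for word in WORDS if fnmatchcase(word, "colleg*")]
print(matched)
```

"collage" is left out because its fifth letter is "a", so it never gets as far as the wildcard.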

Boolean Searches

Boolean Operators

Many search engines allow the use of Boolean operators: AND, OR, and NOT. These can be very useful in narrowing down the number of pages that result from a search.

AltaVista, for example, allows this in its Advanced Search, which you access by going to AltaVista's home page (http://www.altavista.com/) and clicking on the Advanced Search link. This will bring you to the advanced search page.

In the box marked "Boolean query:", you can enter combinations of search terms, the Boolean operators listed above, and the operator NEAR.

Examples

Suppose you wanted Web pages about children and the Internet:

Query #5: children AND internet
A Web page must contain both words (or expressions as we'll see) in order for it to match this search.

The NEAR operator looks for both words (expressions) to be near each other on the Web page. For example,

Query #6: children NEAR internet

If you are looking for Web pages about children but not about the Internet, you can use the NOT operator:

Query #7: children AND NOT internet
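Boolean operators correspond closely to set operations, which the following Python sketch illustrates. The three pages and the build_index helper are made up for this example; a real search engine's index is far larger and more elaborate.

```python
# Three invented pages.
PAGES = {
    "p1": "children and the internet",
    "p2": "children playing outside",
    "p3": "internet service providers",
}


def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = {}
    for name, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(name)
    return index


INDEX = build_index(PAGES)

both = INDEX["children"] & INDEX["internet"]        # children AND internet
either = INDEX["children"] | INDEX["internet"]      # children OR internet
only_children = INDEX["children"] - INDEX["internet"]  # children AND NOT internet

print(sorted(both), sorted(either), sorted(only_children))
```

AND can only shrink the set of results and OR can only grow it, which is why AND and NOT are the operators to reach for when a search returns too many pages.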

Suppose you are looking for Web pages about children and not just the Internet, but also video games or television. You can make more elaborate searches:

Query #8: children NEAR (internet OR "video games" OR television OR tv)
The parentheses here tell the search engine that each of the latter terms is wanted as long as it's paired with "children". We use the double quotes around "video games" so that it's treated as a phrase, not two individual words.
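A query like this one can be modeled in Python as follows. This is a rough sketch under stated assumptions: "near" is taken to mean within 10 words, which is a choice made for this example, and the function names positions and near are hypothetical.

```python
import re


def positions(term, words):
    """Return every position where the term (a word or phrase) starts."""
    wanted = term.lower().split()
    n = len(wanted)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == wanted]


def near(left, right_terms, text, window=10):
    """Model: left NEAR (term1 OR term2 OR ...). The window size is an assumption."""
    words = re.findall(r"\w+", text.lower())
    left_pos = positions(left, words)
    # The OR group: a position of any one of the alternatives will do.
    right_pos = [p for term in right_terms for p in positions(term, words)]
    return any(abs(a - b) <= window for a in left_pos for b in right_pos)


ALTERNATIVES = ["internet", "video games", "television", "tv"]

close = "Children spend hours playing video games after school."
far = "Children " + "word " * 20 + "television"

print(near("children", ALTERNATIVES, close))
print(near("children", ALTERNATIVES, far))
```

The first page matches because "video games" appears a few words after "children"; the second does not, because "television" is too far away.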

Turn In

Type up your answers in a word processor. Turn in a printout of this chart, and keep one for yourself since you will need it for Project #5. Make sure your name, course number (i.e., "110"), section letter, and "Lab #5" are written clearly on the paper. This lab is worth 8 points. See the schedule page for the due date.

Other Projects

Continue working on the email project. Start working on the Web project.




Last modified: Thu Feb 21 15:08:07 EST 2002
© Copyright 2001--2002, Jeremy D. Frens & Calvin College. Permission to copy by any means is granted as long as this copyright is preserved.