There are two main types of Web search utilities: indexes and search engines. Search engines give us a variety of tools to form our searches.
Indexes are like electronic "yellow pages": you can look up sites by topic. One of the most successful sites on the Internet is Yahoo!. This site began as simply a list of the favorite sites of a couple of Stanford University graduate students. Their idea gradually grew into a kind of "yellow pages" for the World Wide Web.
Yahoo is located at http://www.yahoo.com/.
From this site, you can look for information by subject, or you can type a word or phrase in the search box on the page. The subjects are arranged hierarchically: general topics are listed on the home page, and each link you select refines your topic further until (hopefully) you find what you're interested in.
Other indexes include About.com and Lycos.
Nearly all Web indexes include a search form for entering keywords. This is often more useful than a search engine because the Web index returns its own index entries. Once you get near your topic, you can use those entries to explore it in various ways.
We will search using an index in class.
Search engines are computer programs that search an enormous index of Web pages for you, based on the keywords you specify. Don't let their apparent simplicity fool you: creating these pieces of software is an extremely impressive computing feat!
These enormous indexes are created by special programs that are sometimes called "web crawlers" because they crawl through the entire World Wide Web, following all of the links they can find and making a record of them.
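The link-following idea can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the "web" here is a made-up dictionary (`TOY_WEB`, a hypothetical name) that maps each page to the pages it links to, and `crawl` is a hypothetical helper that visits pages breadth-first.

```python
from collections import deque

# Hypothetical in-memory "web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "news"],
    "about": ["home", "staff"],
    "news": ["home", "archive"],
    "staff": [],
    "archive": ["news"],
    "orphan": ["home"],  # no page links here, so the crawler never reaches it
}

def crawl(start, web):
    """Follow links breadth-first from start; return pages in visit order."""
    seen = {start}          # pages already recorded, so none is visited twice
    order = []              # the "record" the crawler keeps
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in web.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

visited = crawl("home", TOY_WEB)
print(visited)
```

Note that the page named "orphan" is never visited: a crawler that only follows links can record only the pages that some other page links to.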
There are a growing number of search engines:
In addition, there are some all-in-one search engines which take your input and submit it to several search engines at once:
Yahoo! has its own list of Search Engines.
Go to AltaVista in Netscape. You must use this search engine so that we can easily check your results.
For each of the queries in this section (and the next), record the number of links to pages that AltaVista finds. The searches for this section should return several thousand to several million pages; the searches in the next section should return several million pages.
Try a simple search
Query #1: John Calvin
AltaVista returns a page of some (but certainly not all) links that match these keywords. There is some indication of how many pages in all were found.
To insist that a particular word appear in the search results, put a plus sign in front of the word:
Query #2: +John +Calvin
If you want to search on a phrase, place double quotes around it:
Query #3: "John Calvin"
To insist that a particular word not appear in the search results, put a minus in front of the word:
Query #4: +John -Calvin
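To see how the four queries above differ, here is a minimal Python sketch of the matching rules. It is an illustration under stated assumptions, not AltaVista's actual algorithm: `matches` and `contains` are hypothetical helpers, case is ignored for simplicity, and a page matches plain keywords (no plus or minus) if it contains at least one of them.

```python
import re
import shlex

def contains(term, words):
    """True if the word or phrase term occurs in words as a consecutive run."""
    phrase = term.lower().split()
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

def matches(query, text):
    """Apply +word (required), -word (excluded), and "phrase" terms to text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    has_plain = False
    plain_hit = False
    for term in shlex.split(query):      # shlex keeps "quoted phrases" together
        if term.startswith("+"):
            if not contains(term[1:], words):
                return False
        elif term.startswith("-"):
            if contains(term[1:], words):
                return False
        else:
            has_plain = True
            plain_hit = plain_hit or contains(term, words)
    # Assumption: plain keywords need at least one hit among them.
    return plain_hit or not has_plain

# Made-up sample pages for illustration.
page_a = "A biography of John Calvin"
page_b = "John wrote a letter to Calvin"
page_c = "John Smith plays guitar"

for query in ['John Calvin', '+John +Calvin', '"John Calvin"', '+John -Calvin']:
    print(query, [matches(query, p) for p in (page_a, page_b, page_c)])
```

With these sample pages, Query #1 matches all three, Query #2 matches the first two, Query #3 matches only the first, and Query #4 matches only the last, which is why the result counts shrink as the queries become stricter.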
If you want your search to be case insensitive (i.e., lowercase and uppercase don't matter), then enter your keywords entirely in lowercase letters. If you want your search to be case sensitive, then capitalize where it matters.
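The case rule can be written as a short Python sketch. The function name `case_matches` is hypothetical, and the code only restates the rule described above; it is not taken from any search engine.

```python
def case_matches(keyword, word):
    """All-lowercase keyword: ignore case. Any capital letter: exact match."""
    if keyword == keyword.lower():
        return keyword == word.lower()
    return keyword == word

print(case_matches("calvin", "Calvin"))   # lowercase keyword, case ignored
print(case_matches("Calvin", "calvin"))   # capitalized keyword, case matters
```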
The asterisk functions as a wildcard character, meaning it matches zero or more characters of any kind. Thus, colleg* would match "college", "collegiate", etc.
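Python's standard `fnmatch` module uses the same asterisk convention, so it gives a quick way to experiment with wildcards. This is only an analogy: the helper `wildcard_hits` and the word list are made up for illustration, and `fnmatch` also treats `?` and `[...]` specially, which the lab's description does not mention.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, words):
    """Return the words that match pattern, ignoring case."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]

hits = wildcard_hits("colleg*", ["college", "Collegiate", "colleg", "collage", "school"])
print(hits)
```

The word "colleg" matches because the asterisk may stand for zero characters; "collage" does not match because its fifth letter differs.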
Many search engines allow the use of Boolean operators: AND, OR, and NOT. These can be very useful in narrowing down the number of pages that result from a search.
AltaVista, for example, allows this in its Advanced Search, which you access by going to AltaVista's home page (http://www.altavista.com/) and clicking on the Advanced Search link. This will bring you to the advanced search page.
In the box marked "Boolean query:", you can enter combinations of search terms, the Boolean operators listed above, and the operator NEAR.
Suppose you wanted Web pages about children and the Internet:
Query #5: children AND internet
A Web page must contain both words (or expressions as we'll see) in
order for it to match this search.
The NEAR operator requires both words (or expressions) to appear near each other on the Web page. For example:
Query #6: children NEAR internet
If you are looking for Web pages about children but not about the Internet, you can use the NOT operator:
Query #7: children AND NOT internet
Suppose you were looking for Web pages about children and not just the Internet, but also video games or television. You can make more elaborate searches:
Query #8: children NEAR (internet OR "video games" OR television OR tv)
The parentheses here tell the search engine that each of the latter
terms is wanted as long as it's paired with "children". We use the
double quotes around "video games" so that it's treated as a phrase,
not two individual words.
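Queries #5 through #8 can be mimicked with ordinary Python logic, which may help when predicting what a Boolean query will match. This is a rough sketch, not AltaVista's implementation: the helper names are hypothetical, the sample sentence is made up, and the value of `NEAR_WINDOW` is an assumption, since this lab does not say how close "near" is.

```python
import re

NEAR_WINDOW = 10  # assumption: how many words apart still counts as "near"

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(term, words):
    """Indexes at which the word or phrase term starts in words."""
    phrase = term.lower().split()
    n = len(phrase)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == phrase]

def has(term, words):
    """True if term occurs anywhere in words."""
    return bool(positions(term, words))

def near(left, right_terms, words, window=NEAR_WINDOW):
    """True if left occurs within window words of ANY term in right_terms."""
    left_pos = positions(left, words)
    return any(
        abs(i - j) <= window
        for term in right_terms
        for j in positions(term, words)
        for i in left_pos
    )

page = tokenize("Our children watch television after school "
                "and play video games on weekends.")

q5 = has("children", page) and has("internet", page)        # children AND internet
q6 = near("children", ["internet"], page)                   # children NEAR internet
q7 = has("children", page) and not has("internet", page)    # children AND NOT internet
q8 = near("children", ["internet", "video games", "television", "tv"], page)
print(q5, q6, q7, q8)
```

The list passed to `near` plays the role of the parenthesized OR group in Query #8: any one of the alternatives is enough, as long as it appears close to "children". Passing "video games" as a single string keeps it a phrase, just as the double quotes do in the query.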
Type up your answers in a word processor. Turn in a printout of this chart, and keep one for yourself since you will need it for Project #5. Make sure your name, course number (i.e., "110"), section letter, and "Lab #5" are written clearly on the paper. This lab is worth 8 points. See the schedule page for the due date.
Continue working on the email project. Start working on the Web project.
Last modified: Thu Feb 21 15:08:07 EST 2002
© Copyright 2001--2002, Jeremy D. Frens & Calvin College. Permission to
copy by any means is granted as long as this copyright is
preserved.