Lab 6: Using Classes


Introduction

There are many problems that involve processing and analyzing text in much the same way as analyzing sentences in a natural language such as English to find nouns, verbs, adjectives, adverbs, etc. and determining if they fit together according to the grammar rules of that language.

In this lab exercise, we examine two text-processing problems of this kind. They are much simpler than full grammatical analysis, so that we can focus on how text is processed in C++ using the string data type. Although the two problems are rather different, both involve string manipulation and require exploring the string type provided in the C++ string library. Some of the programming projects prepared for this lab exercise use similar string operations in simple data encryption methods. In addition to C++'s string class, the textbook (C++ for Engineering and Science) describes its istream and ostream classes and its complex class for processing complex numbers. It also describes a RandomInt class for generating random numbers, which are useful in programming simulations.

Pluralizing Nouns

Making a noun plural usually consists of adding an 's', but sometimes there are special cases. Consider these examples:

Singular noun     Plural noun
exercise          exercises
noun              nouns
word              words
abscess           abscesses
summons           summonses
box               boxes
hobby             hobbies
party             parties

Translating English into Pig Latin

As for Pig Latin, there are two basic rules:

  1. If the English word begins with a consonant, move the initial consonants to the end of the word and attach "ay".
  2. If the English word begins with a vowel, attach "yay" to the end of the word. (One variation uses "way" instead of "yay.")
Here are some examples:
English       Pig Latin        English       Pig Latin
alphabet      alphabetyay      nerd          erdnay
billygoat     illygoatbay      orthodox      orthodoxyay
crazy         azycray          prickly       icklypray
dripping      ippingdray       quasimodo     asimodoquay
eligible      eligibleyay      rhythm        ythmrhay
farm          armfay           spry          yspray
ghost         ostghay          three         eethray
happy         appyhay          ugly          uglyyay
illegal       illegalyay       vigilant      igilantvay
jury          uryjay           wretched      etchedwray
killjoy       illjoykay        xerxes        erxesxay
limit         imitlay          yellow        ellowyay
messy         essymay          zippy         ippyzay
(NOTE: Vowels are 'a', 'e', 'i', 'o', 'u', and 'y' (unless it begins the word). Words in which 'u' is the first vowel and it is preceded by 'q' also require special treatment as described later.)

Program Files

The file stringops.cpp is a program for you to use to try out some string operations described in the first part of this lab exercise.

  1. Create a program file containing the contents of this file.
  2. Add any documentation required by your instructor at the ==> in the opening documentation.
  3. Add any output statement(s) required by your instructor at the ==> at the beginning of main().

The file translate.cpp is a driver for translating keyboard input from English to Pig Latin. Later in this lab exercise you will repeat the preceding 3 steps for it and then add a function englishToPigLatin() to it, inserting its prototype and definition at the designated places of the program.

Strings of Characters

For both problems (pluralizing nouns and English-to-Pig-Latin conversion), we begin by studying C++'s string type, which is provided by the <string> library. Note the compiler directive

#include <string>
in the translate.cpp program.

The string data type in C++ is implemented with a class. The main difference between a class such as string and simple types such as int and double is that a class encapsulates both data and actions together in one object (as opposed to the simple types, which must be "shipped off" to other functions that perform the required actions). In a future lab, we'll look at how to build our own classes, but for now we'll restrict our attention to those provided in C++.

A string object is used to hold a sequence (i.e., string) of characters. A single character (i.e., a char) isn't, in general, as useful as strings; so you'll probably find yourself using strings almost every time you need to process text.

string Objects

We can easily declare and initialize string objects with string literals; for example,

string englishWord = "farm";
After the declaration, englishWord is a string object that contains the string of characters that make up the word farm.

A string is an indexed type, which means that we can access individual characters in the string by specifying their positions:

  index:      0   1   2   3
  character:  f   a   r   m

The numbers 0, 1, 2 and 3 are the indexes of the characters within englishWord and can be used to access the individual characters within it.

One important thing about indexing the characters of a string is where the indexing starts:

Remember: The indexing of a string always starts at 0.

So the first character of the string has index 0, the second character has index 1, the third character has index 2, and so on up to the last character whose index is one less than the size of the string. Keep this in mind as you use indices to access the characters and substrings of a string.

string Operations

As mentioned above, a class encapsulates both data and actions into one object. As a class, the string class provides many operations that can be performed on string objects.

Operations in a class are usually implemented as member functions (also known as methods or instance methods), which are simply functions provided within that class. In a later lab we will see how to make these definitions ourselves, but for now all we need to know is how to call them. This requires a slightly different syntax for a function call:

object.memberFunctionName(arguments);
Up to now, we haven't attached an object name to the function name when we called a function. However, to call (or "invoke") a member function inside an object, we must specify the object to which it is to be applied. And this is done by attaching the object's name to the member function with the dot operator.

Member functions can be thought of as messages sent to an object and the dot (.) as a "push-button" operation that sends the message (somewhat like buttons on a stop watch or a calculator or a phone and so on). For example,

int size = stringObject.size();
We can think of stringObject receiving a message size() that asks it to figure out and report back its size. In other words, "Hey, stringObject, what's your size?" This metaphor becomes more important when we implement classes, but it's also helpful in mastering this new syntax for calling methods.

Let's look at a few of the operations provided by the string class. In the following table, str is of type string:

Description  Syntax  Explanation
Index operator  str[index]  Returns the character at position index in str. No range check is performed, so if index is out of range for str, it may return a "garbage value" or cause the program to crash.
Size of a string  str.size()  Returns the number of characters in the string str.
Concatenation of two strings  str1 + str2  Returns the string formed by attaching str2 at the end of str1.
Equality of two strings  str1 == str2  Returns true if the two strings are equal, false otherwise.
Substrings  str.substr(start, size)  Returns the substring of str consisting of the size characters starting at index start.
Finding characters in a string  str.find_first_of(pattern, index)  Returns the index in str of the first character, at or after position index, that matches any character in pattern. If no matching character is found, string::npos is returned.

The first four operations are quite straightforward. The following statements illustrate how they are used:

string str = "milesperhour";
cout << str << endl;
cout << str[0] << str[1] << str[2] << str[3] << str[4] << endl;
cout << "Size = " << str.size() << endl;
cout << str + " (mph)" << endl;
if (str == "MilesPerHour") 
  cout << "Yes\n";
else
  cout << "No\n";

Question #7.1: What output will be produced by these statements? Predict what you think will be produced and then enter the statements in stringops.cpp and execute it to check your answers.

Now let's look at some examples of the substr() method:
string sub1 = str.substr(0,5);
cout << sub1 << endl;
sub1 = str.substr(5,3);
cout << sub1 << endl;
sub1 = str.substr(8,0);
cout << sub1 << endl;
sub1 = str.substr( 0, str.size() );
cout << sub1 << endl;

Question #7.2: What output will be produced by these statements? Predict what you think will be produced and then enter the statements in stringops.cpp and execute it to check your answers.

In these examples, it is important to note that the second argument is a size, not an index (a common mistake).

The find_first_of() method is a little more complicated, but is very useful. Consider this code:

string factorial = "n! = 1 * 2 * ... * n";
int firstIndex = factorial.find_first_of(".*!?", 0);

Question #7.3: What value will be assigned to firstIndex by these statements? Predict what you think it will be and then enter the statements in stringops.cpp, add an output statement that displays firstIndex, and execute the program to check your answer.

The find_first_of() method searches factorial to find the first occurrence of a character from the pattern ".*!?". It does not need to find the entire pattern, just one of the characters from the pattern. So, in the code above, firstIndex will be set to the index of the first exclamation mark in the string.

Starting a search at the beginning of a string (i.e., at position 0) is common but isn't required. We can start the search anywhere within the string. To illustrate, suppose we wish to continue the search in the preceding example beyond the first occurrence of ., *, !, or ?. We need only conduct a search starting with the character after the one we found in the first search:

int secondIndex = factorial.find_first_of(".*!?", firstIndex + 1);
This starts the search right after the previous search and finds the second occurrence of one of the characters.

Question #7.4: What will be the value of secondIndex?

Question #7.5: What would be the value of secondIndex if we started the search at firstIndex instead of firstIndex + 1?

Memorizing a complete description of a library such as string is usually not necessary; in fact, in most cases it isn't really feasible. What is important is to know generally what's available in the library, where you can find the library, and most importantly, where you can find documentation for the library — for example, in your textbook (perhaps in an appendix) or by searching the Internet.

Here is a handy string Library Quick Reference that describes some of the most useful string operations. Don't try to memorize them, but be able to come back to this section when working with these string operations. It might be wise to print this quick reference and keep a hard copy (or a pdf file) handy.

Noun Pluralizer

Analysis

After examining the table of examples of singular and plural nouns at the beginning of this exercise, we can formulate some simple rules:

  1. For many words, simply add "s" to the end of the word.
  2. But if the word ends in "s" or "x," add "es" to the end of the word.
  3. Or if the word ends in "y," replace the "y" with "ies."
These rules do not cover all nouns, however. For example, although the plural of the word "bonus" is "bonuses," the plural of "radius" is not "radiuses" but "radii." Complete sets of rules for forming plurals contain a dozen or more rules, so to keep the program we're developing of reasonable length, we'll proceed with the three simple rules above and develop a "pluralizing" function that uses them.

OCD

Here's how our function should behave:

Our function should receive a singular noun.
If the noun ends in "s" or "x", return the noun with "es" attached at the end.
Otherwise, if the noun ends in "y", return the noun with the "y" replaced with "ies".
Otherwise, return the noun with "s" tacked on the end.

Here are the objects we need:

Description Type Kind Movement Name
the singular noun string variable in singularNoun
the index of the last character of the noun int variable local lastCharIndex
the last character of the noun char variable local lastChar
the plural noun string variable out --

And we can write a specification for our function:

Specification:
receive: a singular noun, a string
precondition: the noun should be singular
return: the plural version of the noun, a string
Up to now, we've used assert() or an if statement to check preconditions. Trying that here would require a lot of work because there is no easy way to tell whether a word is a noun and, in addition, whether it is singular.

In such situations when it isn't practical to enforce preconditions in our code, we settle for indicating clearly to those using the function that this is a precondition. If someone uses our function with a plural word (like "nouns") and gets strange results (like "nounses"), that's not our problem because we warned them!

Using the string operations listed above and other operations we've seen before, here are the operations we need for this function:

Description Predefined? Name Library
receive a value yes parameter built-in
if...otherwise... yes if statement built-in
get last character of string yes [] and size() string
compare characters yes == built-in
concatenate two strings yes + string
extract a substring yes substr() string
return a string yes return built-in

And here is our algorithm:

  1. Receive singularNoun.
  2. Let lastCharIndex be the size of singularNoun minus 1.
  3. Let lastChar be the character of singularNoun at index lastCharIndex.
  4. If lastChar is 's' or 'x',
        Return singularNoun + "es".
     Otherwise, if lastChar is 'y',
        (a) Let base be singularNoun without the trailing "y".
        (b) Return base + "ies".
     Otherwise,
        Return singularNoun + "s".

Pig Latin Translator

Let's turn now to coding the Pig Latin translator.

Analysis

Like the pluralizer, the Pig Latin translator will have certain rules for transforming words that are based on the contents of the word. In the case of the pluralizer, the rule we used was determined by the last letter of the word. For the Pig Latin translator, it will be the location of the first vowel.

Let's look at some examples:

The main point in both rules is the first vowel. In addition to 'a', 'e', 'i', 'o', and 'u', we will treat 'y' as a vowel except when it is the first letter of the word. For example, "my" becomes "ymay" in Pig Latin, but "your" becomes "ouryay."

OCD

Here's a first attempt at how our Pig Latin translator is to behave:

Receive an English word. Find the position of the first vowel in that word, checking that a vowel was actually present. If the word begins with a vowel other than 'y', then return the concatenation of the English word and "yay". Otherwise, the Pig Latin word consists of three parts (in order): (1) the portion of the English word from the first vowel to its end, (2) the initial consonants of the English word, and (3) "ay". Return the Pig Latin word.

Here are the objects that we need:

Description Type Kind Movement Name
the English word string variable in englishWord
index of the first vowel int variable local vowelPosition
the Pig Latin word string variable out piglatinWord
portion of the English word from its first vowel to its end string variable local lastPart
consonants at the beginning of the English word string variable local firstPart
"yay" string constant local --
"ay" string constant local --

This gives the following specification for our function:

Specification:
Receive: englishWord, a string.
Precondition: englishWord should be an English word.
Return: piglatinWord, a string.

Use the specification to create a prototype for a function named englishToPigLatin(). The rest of the design is up to you — identifying the operations needed and developing an algorithm. As with the noun-pluralizing function, checking the precondition isn't really feasible. The driver program (translate.cpp) where you will put your function prompts the user to enter English sentences, so you need not be concerned about what happens if they don't.

Task: Begin work on your Pig Latin translator function by identifying the operations needed and developing an algorithm for it. Even if you aren't required to hand it in, it will help a lot if you write some of this down and have it handy as you begin to work on the function definition.

Noun Pluralizer (Again)

Coding

We can code our "pluralizing" algorithm as follows:

string pluralize(string singularNoun)
{
  int lastCharIndex = singularNoun.size() - 1;
  char lastChar = singularNoun[lastCharIndex];
  if ((lastChar == 's') || (lastChar == 'x'))
    return singularNoun + "es";
  else if (lastChar == 'y')
  {
    string base = singularNoun.substr(0, lastCharIndex);
    return base + "ies";
  }
  else
    return singularNoun + "s";
}

Now let's break this down and spend some time comparing it with the algorithm.

Step 1: Receive singularNoun.
This is done automatically through the parameter passing mechanism.

Step 2: Let lastCharIndex be the size of singularNoun minus 1.
The first statement in the pluralize function,
    int lastCharIndex = singularNoun.size() - 1;
uses the size() method to get the index of the last character.

Step 3: Let lastChar be the character of singularNoun at index lastCharIndex.
The second statement in the pluralize function,
    char lastChar = singularNoun[lastCharIndex];
uses this index to get the last character of singularNoun.

Step 4: If lastChar is 's' or 'x'
     Return singularNoun + "es"
Otherwise if lastChar is 'y'
     (a) Let base be singularNoun without the trailing "y"
     (b) Return base + "ies"
Otherwise,
     Return singularNoun + "s"

is implemented by the multi-branch if statement at the end of the function:

  if ((lastChar == 's') || (lastChar == 'x'))
    return singularNoun + "es";
  else if (lastChar == 'y')
  {
    string base = singularNoun.substr(0, lastCharIndex);
    return base + "ies";
  }
  else
    return singularNoun + "s";
Each of the conditions in the if statements compares two chars and determines which rule to apply. Each rule results in a string concatenation, the first and last of which are quite simple; but the second one requires a closer examination.

Recall that the substr() method needs an index where it is to start extracting a substring and the length of the substring to be extracted. The beginning index is easy: because we want everything but the last character (which is a 'y' we're discarding), we need to start at index 0. It doesn't matter what's in singularNoun or how long it is — a string always begins at index 0.

But how long is this substring? lastCharIndex is the index of the last character — the character we need to avoid. Because the indices of the individual characters begin with 0, the index of any one character is equal to the number of characters that precede it. For example, for the string "play", the indices of the characters are 0, 1, 2, and 3, so the last character (y) has index 3, and there are 3 characters that precede it. This explains why lastCharIndex is doing double duty as an index into singularNoun and as a size for the substring.

Now, go back and compare Step 4 of the algorithm with the code, noting how directly each step of the algorithm translates into code. You may struggle some with the syntax of the code since this is your first look at the string operations being used. The double use of lastCharIndex is a bit tricky, but otherwise the algorithm and the code match up very well.

Back to the Pig Latin Translator

Coding

The trickiest operation is finding the first vowel in a word. Unlike the noun pluralizer, you cannot simply test a fixed position with the index operator. Rather, you have to search for it. However, take another careful look at the table of string operations given earlier and the examples that follow it. One of those methods is exactly what you need!

Then, after locating the first vowel, you need only extract the appropriate substrings from the English word and use the concatenate operator (+) to build up the Pig Latin word to be returned by the function.

Testing

Test your Pig Latin translator on all of the words from the table of examples given earlier and others that you want to try (and perhaps some others that your instructor assigns.)

Maintenance — Fixes and Improvements

As you test your function, you'll discover that your function doesn't work on all words; in particular, words that begin with 'y' or that contain a 'q' in the initial consonants.

Words beginning with 'y' pose a problem because we've been considering 'y' to be a vowel. If we didn't, words like "style" or "spry" wouldn't translate properly. However, for most words that begin with 'y' such as "yellow" and "yard", the initial 'y' should be treated as a consonant so that the Pig Latin versions are "ellowyay" and "ardyay."

Question #7.6: Add code to your function to handle this special case of words beginning with 'y' and test it with the words "yellow" and "yard".

Note: Your program won't be perfect, however, because for some words, such as "Ypsilanti" and "yperite", the initial 'y' acts as a vowel. But we won't worry about these!

Maintenance — Fixes and Improvements (Again)

Words with a 'q' in the initial consonants followed by a 'u' also pose a problem because the 'u' after the 'q' should also be moved to the end of the word. For example, "squire" should translate to "iresquay", not "uiresqay". This requires a special case.

We can complicate the special case further if we permit English words that do not have a 'u' after a 'q'. Strictly speaking, this isn't permitted in English, but many foreign words absorbed into English have a 'q' without a 'u' right after it; for example, the country "Qatar."

These two cases should be considered for a full-fledged Pig Latin translator and your instructor may wish to assign them. (See Project 6.1.) There are also some other interesting string-processing projects, including two that deal with encryption and decryption.




Report errors to Larry Nyhoff (nyhl@cs.calvin.edu)