There are many problems that involve processing and analyzing text in much the same way as analyzing sentences in a natural language such as English to find nouns, verbs, adjectives, adverbs, etc. and determining if they fit together according to the grammar rules of that language.
string
data type.
string
type
provided in the C++ string
library. Some of the problems in the programming projects prepared for this lab exercise require using similar string operations in some simple data encryption methods. The textbook (C++ for Engineering and Science) describes, in addition to C++'s string class, its istream and ostream classes and its complex class for processing complex numbers. It also describes a RandomInt class for generating random numbers that are useful in programming simulations.
Making a noun plural usually consists of adding an 's', but sometimes there are special cases. Consider these examples:
| Singular noun | Plural noun |
|---|---|
| exercise | exercises |
| noun | nouns |
| word | words |
| abscess | abscesses |
| summons | summonses |
| box | boxes |
| hobby | hobbies |
| party | parties |
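As for Pig Latin, there are two basic rules:
1. If a word begins with a vowel, add "yay" to the end of the word.
2. If a word begins with one or more consonants, move those initial consonants to the end of the word and then add "ay".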
(NOTE: Vowels are 'a', 'e', 'i', 'o', 'u', and 'y' (unless it begins the word). Words in which 'u' is the first vowel and it is preceded by 'q' also require special treatment as described later.)
| English | Pig Latin | English | Pig Latin |
|---|---|---|---|
| alphabet | alphabetyay | nerd | erdnay |
| billygoat | illygoatbay | orthodox | orthodoxyay |
| crazy | azycray | prickly | icklypray |
| dripping | ippingdray | quasimodo | asimodoquay |
| eligible | eligibleyay | rhythm | ythmrhay |
| farm | armfay | spry | yspray |
| ghost | ostghay | three | eethray |
| happy | appyhay | ugly | uglyyay |
| illegal | illegalyay | vigilant | igilantvay |
| jury | uryjay | wretched | etchedwray |
| killjoy | illjoykay | xerxes | erxesxay |
| limit | imitlay | yellow | ellowyay |
| messy | essymay | zippy | ippyzay |
stringops.cpp is a program for you to use to try out some string operations described in the first part of this lab exercise.
==>
in the opening documentation.
==>
at the beginning of main()
.
The file translate.cpp is a driver for translating keyboard input from English to Pig Latin. Later in this lab exercise you will repeat the preceding 3 steps for it and then add a function englishToPigLatin() to it, inserting its prototype and definition at the designated places of the program.
For both problems (pluralizing nouns and English-to-Pig-Latin conversion) we begin by studying C++'s string type, which is provided by the <string> library. Note the compiler directive #include <string> in the translate.cpp program.
The string data type in C++ is implemented with a class. The main difference between a class such as string and simple types such as int and double is that a class encapsulates both data and actions together in one object (as opposed to the simple types, which must be "shipped off" to other functions that perform the required actions). In a future lab, we'll look at how to build our own classes, but for now we'll restrict our attention to those provided in C++.
A string object is used to hold a sequence (i.e., string) of characters. A single character (i.e., a char) isn't, in general, as useful as strings; so you'll probably find yourself using strings almost every time you need to process text.
string Objects

We can easily declare and initialize string objects with string literals; for example,

    string englishWord = "farm";

After the declaration, englishWord is a string object that contains the string of characters that make up the word farm.
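A string is an indexed type, which means that we can access individual characters in the string by specifying their positions. For englishWord, the characters 'f', 'a', 'r', and 'm' are at positions 0, 1, 2, and 3. These positions are the indices of englishWord and can be used to access the individual characters within it.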
One important thing about indexing the characters of a string is where the indexing starts:

Remember: The indexing of a string always starts at 0.

So the first character of the string has index 0, the second character has index 1, the third character has index 2, and so on up to the last character whose index is one less than the size of the string. Keep this in mind as you use indices to access the characters and substrings of a string.
string Operations

As mentioned above, a class encapsulates both data and actions into one object. As a class, the string class provides many operations that can be performed on string objects.
Operations in a class are usually implemented as member functions (also known as methods or instance methods), which are simply functions provided within that class. In a later lab we will see how to make these definitions ourselves, but for now all we need to know is how to call them. This requires a slightly different syntax for a function call:
Up to now, we haven't attached an object name to the function name when we called a function. However, to call (or "invoke") a member function inside an object, we must specify the object to which it is to be applied. And this is done by attaching the object's name to the member function with the dot operator.

    object.memberFunctionName(arguments);
Member functions can be thought of as messages sent to an object and the dot (.) as a "push-button" operation that sends the message (somewhat like buttons on a stop watch or a calculator or a phone and so on). For example,

    int size = stringObject.size();

We can think of stringObject receiving a message size() that asks it to figure out and report back its size. In other words, "Hey, stringObject, what's your size?" This metaphor becomes more important when we implement classes, but it's also helpful in mastering this new syntax for calling methods.
Let's look at a few of the operations provided by the string class. In the following table, str is of type string:

| Description | Syntax | Explanation |
|---|---|---|
| Index operator | str[index] | Returns the character at position index in str. If index is out of range for str, it will return a "garbage value" or cause the program to crash. |
| Size of a string | str.size() | Returns the size of the string str. |
| Concatenation of two strings | str1 + str2 | Returns the string formed by attaching str2 at the end of str1. |
| Equality of two strings | str1 == str2 | Returns true if the two strings are equal, false otherwise. |
| Substrings | str.substr(start, size); | Returns the substring of str consisting of the size characters starting at index start. |
| Finding characters in a string | str.find_first_of(pattern, index) | Returns the index of the first character in pattern found in str, starting at position index. If no matching character is found, string::npos is returned. |
The first four operations are quite straightforward. The following statements illustrate how they are used:

    string str = "milesperhour";
    cout << str << endl;
    cout << str[0] << str[1] << str[2] << str[3] << str[4] << endl;
    cout << "Size = " << str.size() << endl;
    cout << str + " (mph)" << endl;
    if (str == "MilesPerHour")
        cout << "Yes\n";
    else
        cout << "No\n";

Question #7.1: What output will be produced by these statements? Predict what you think will be produced and then enter the statements in stringops.cpp and execute it to check your answers.

Now let's look at some examples of the substr() method:
    string sub1 = str.substr(0, 5);
    cout << sub1 << endl;
    sub1 = str.substr(5, 3);
    cout << sub1 << endl;
    sub1 = str.substr(8, 0);
    cout << sub1 << endl;
    sub1 = str.substr(0, str.size());
    cout << sub1 << endl;

Question #7.2: What output will be produced by these statements? Predict what you think will be produced and then enter the statements in stringops.cpp and execute it to check your answers.
In these examples, it is important to note that the second argument is a size, not an index (a common mistake).
The find_first_of() method is a little more complicated, but is very useful. Consider this code:

    string factorial = "n! = 1 * 2 * ... * n";
    int firstIndex = factorial.find_first_of(".*!?", 0);

Question #7.3: What output will be produced by these statements? Predict what you think will be produced and then enter the statements in stringops.cpp and execute it to check your answers.

The find_first_of() method searches factorial to find the first occurrence of a character from the pattern ".*!?". It does not need to find the entire pattern, just one of the characters from the pattern. So, in the code above, firstIndex will be set to the index of the first exclamation mark in the string.
Starting a search at the beginning of a string (i.e., at position 0) is common but isn't required. We can start the search anywhere within the string. To illustrate, suppose we wish to continue the search in the preceding example beyond the first occurrence of '.', '*', or '!'. We need only conduct a search starting with the character after the one we found in the first search:

    int secondIndex = factorial.find_first_of(".*!?", firstIndex + 1);

This starts the search right after the previous search and finds the second occurrence of one of the characters.
Question #7.4: What will be the value of secondIndex?

Question #7.5: What would be the value of secondIndex if we started the search at firstIndex instead of firstIndex + 1?
Memorizing a complete description of a library such as string is usually not necessary; in fact, in most cases it isn't really feasible. What is important is to know generally what's available in the library, where you can find the library, and most importantly, where you can find documentation for the library: for example, in your textbook (perhaps in an appendix) or by searching the Internet.
Here is a handy string Library Quick Reference that describes some of the most useful string operations. Don't try to memorize them, but be able to come back to this section when working with these string operations. It might be wise to print this quick reference and keep a hard copy (or a pdf file) handy.
After examining the table of examples of singular and plural nouns at the beginning of this exercise, we can formulate some simple rules:
Here's how our function should behave:
Our function should receive a singular noun.
If the noun ends in "s" or "x", return the noun with "es" attached at the end.
Otherwise, if the noun ends in "y", return the noun with the "y" replaced with "ies".
Otherwise, return the noun with "s" tacked on the end.
Here are the objects we need:

| Description | Type | Kind | Movement | Name |
|---|---|---|---|---|
| the singular noun | string | variable | in | singularNoun |
| the index of the last character of the noun | int | variable | local | lastCharIndex |
| the last character of the noun | char | variable | local | lastChar |
| the plural noun | string | variable | out | -- |
And we can write a specification for our function:

Specification:
    receive: a singular noun, a string
    precondition: the noun should be singular
    return: the plural version of the noun, a string

Up to now, we've used assert() or an if statement to check preconditions. Trying that here would require a lot of work because there is no easy way to tell whether a word is a noun and, in addition, whether it is singular.
In such situations when it isn't practical to enforce preconditions in our code, we settle for indicating clearly to those using the function that this is a precondition. If someone uses our function with a plural word (like "nouns") and gets strange results (like "nounses"), that's not our problem because we warned them!
Using the string operations listed above and other operations we've seen before, here are the operations we need for this function:

| Description | Predefined? | Name | Library |
|---|---|---|---|
| receive a value | yes | parameter | built-in |
| if...otherwise... | yes | if statement | built-in |
| get last character of string | yes | [] and size() | string |
| compare characters | yes | == | built-in |
| concatenate two strings | yes | + | string |
| extract a substring | yes | substr() | string |
| return a string | yes | return | built-in |
And here is our algorithm:

1. Receive singularNoun.
2. Let lastCharIndex be the size of singularNoun minus 1.
3. Let lastChar be the character of singularNoun at index lastCharIndex.
4. If lastChar is 's' or 'x',
       Return singularNoun + "es".
   Otherwise, if lastChar is 'y',
       (a) Let base be singularNoun without the trailing "y".
       (b) Return base + "ies".
   Otherwise,
       Return singularNoun + "s".

Let's turn now to coding the Pig Latin translator.
Like the pluralizer, the Pig Latin translator will have certain rules for transforming words that are based on the contents of the word. In the case of the pluralizer, the rule we used was determined by the last letter of the word. For the Pig Latin translator, it will be the location of the first vowel.
Let's look at some examples:
The main point in both rules is the first vowel. In addition to 'a', 'e', 'i', 'o', and 'u', we will treat 'y' as a vowel except when it is the first letter of the word. For example, "my" becomes "ymay" in Pig Latin, but "your" becomes "ouryay."

Here's a first attempt at describing how our Pig Latin translator is to behave:

Receive an English word. Find the position of the first vowel in that word, checking that a vowel was actually present. If the word begins with a vowel other than 'y', then return the concatenation of the English word and "yay". Otherwise, the Pig Latin word consists of three parts (in order): (1) the portion of the English word from the first vowel to its end, (2) the initial consonants of the English word, and (3) "ay". Return the Pig Latin word.
Here are the objects that we need:

| Description | Type | Kind | Movement | Name |
|---|---|---|---|---|
| the English word | string | variable | in | englishWord |
| index of the first vowel | int | variable | local | vowelPosition |
| the Pig Latin word | string | variable | out | piglatinWord |
| portion of the English word from its first vowel to its end | string | variable | local | lastPart |
| consonants at the beginning of the English word | string | variable | local | firstPart |
| "yay" | string | constant | local | -- |
| "ay" | string | constant | local | -- |
This gives the following specification for our function:

Specification:
    Receive: englishWord, a string.
    Precondition: englishWord should be an English word.
    Return: piglatinWord, a string.
Use the specification to create a prototype for a function named englishToPigLatin(). The rest of the design is up to you: identifying the operations needed and developing an algorithm.
As with the noun-pluralizing function, checking the precondition isn't really feasible. The driver program (translate.cpp) where you will put your function prompts the user to enter English sentences, so you need not be concerned about what happens if they don't.
Task: Begin work on your Pig Latin translator function by identifying the operations needed and developing an algorithm for it. Even if you aren't required to hand it in, it will help a lot if you write some of this down and have it handy as you begin to work on the function definition.
We can code our "pluralizing" algorithm as follows:

    string pluralize(string singularNoun)
    {
        int lastCharIndex = singularNoun.size() - 1;
        char lastChar = singularNoun[lastCharIndex];
        if ((lastChar == 's') || (lastChar == 'x'))
            return singularNoun + "es";
        else if (lastChar == 'y')
        {
            string base = singularNoun.substr(0, lastCharIndex);
            return base + "ies";
        }
        else
            return singularNoun + "s";
    }
Now let's break this down and spend some time comparing it with the algorithm.

Step 1: Receive singularNoun.

This is done automatically through the parameter passing mechanism.

Step 2: Let lastCharIndex be the size of singularNoun minus 1.

The first statement in the pluralize function,

    int lastCharIndex = singularNoun.size() - 1;

uses the size() method to get the index of the last character.

Step 3: Let lastChar be the character of singularNoun at index lastCharIndex.

The second statement in the pluralize function,

    char lastChar = singularNoun[lastCharIndex];

uses this index to get the last character of singularNoun.
Step 4:

    If lastChar is 's' or 'x'
        Return singularNoun + "es"
    Otherwise if lastChar is 'y'
        (a) Let base be singularNoun without the trailing "y"
        (b) Return base + "ies"
    Otherwise,
        Return singularNoun + "s"

is implemented by the multi-branch if statement at the end of the function:

    if ((lastChar == 's') || (lastChar == 'x'))
        return singularNoun + "es";
    else if (lastChar == 'y')
    {
        string base = singularNoun.substr(0, lastCharIndex);
        return base + "ies";
    }
    else
        return singularNoun + "s";

Each of the conditions in the if statement compares two chars and determines which rule to apply. Each rule results in a string concatenation, the first and last of which are quite simple; but the second one requires a closer examination.
Recall that the substr() method needs an index where it is to start extracting a substring and the length of the substring to be extracted. The beginning index is easy: because we want everything but the last character (which is a 'y' we're discarding), we need to start at index 0. It doesn't matter what's in singularNoun or how long it is: a string always begins at index 0.

But how long is this substring? lastCharIndex is the index of the last character, which is the character we need to avoid. Because the indices of the individual characters begin with 0, the index of any one character is equal to the number of characters that precede it. For example, for the string "play", the indices of the characters are 0, 1, 2, and 3, so the last character (y) has index 3, and there are 3 characters that precede it.

This explains why lastCharIndex is doing double duty as an index into singularNoun and as a size for the substring.

Now, go back and compare Step 4 of the algorithm with the code, noting how straightforwardly each step of the algorithm translates into code. You may struggle some with the syntax of the code since this is your first look at the string operations being used. The double use of lastCharIndex is a bit tricky, but otherwise the algorithm and the code match up very well.
The trickiest operation is finding the first vowel in a word. Unlike the noun pluralizer, you cannot simply test a fixed position with the index operator. Rather, you have to search for it.

However, take another careful look at the table of string operations given earlier and the examples that follow it. One of those methods is exactly what you need!

Then, after locating the first vowel, you need only extract the appropriate substrings from the English word and use the concatenate operator (+) to build up the Pig Latin word to be returned by the function.
Test your Pig Latin translator on all of the words from the table of examples given earlier and others that you want to try (and perhaps some others that your instructor assigns.)
As you test your function, you'll discover that your function doesn't work on all words; in particular, words that begin with 'y' or that contain a 'q' in the initial consonants.
Words beginning with 'y' pose a problem because we've been considering 'y' to be a vowel. If we didn't, words like "style" or "spry" wouldn't translate properly. However, for most words that begin with 'y' such as "yellow" and "yard", the initial 'y' should be treated as a consonant so that the Pig Latin versions are "ellowyay" and "ardyay."
Note: Your program won't be perfect, however, because for some words, such as "Ypsilanti" and "yperite", the initial 'y' acts as a vowel. But we won't worry about these!

Question #7.6: Add code to your function to handle this special case of words beginning with 'y' and test it with the words "yellow" and "yard".
Words with a 'q' in the initial consonants followed by a 'u' also pose a problem because the 'u' after the 'q' should also be moved to the end of the word. For example, "squire" should translate to "iresquay", not "uiresqay". This requires a special case.
We can complicate the special case further if we permit English words that do not have a 'u' after a 'q'. Strictly speaking, this isn't permitted in English, but many foreign words absorbed into English have a 'q' without a 'u' right after it; for example, the country "Qatar."
These two cases should be considered for a full-fledged Pig Latin translator and your instructor may wish to assign them. (See Project 6.1.) There are also some other interesting string-processing projects, including two that deal with encryption and decryption.