Lab 9: Files and Streams

Introduction

Throughout this lab manual, we have made extensive use of files: containers on a hard or floppy disk that can be used to store information for long periods of time. Each source program that we have written has been stored in a file, and each binary executable program has also been stored in a file.

Files differ from programs in that a program is a sequence of instructions, and a file is a container in which data (like program code) can be stored. For example, the word processing documents that you work on are saved as files. Those files look quite different from the files that store your programs and from the files that store your executables.

But this raises the question: why can't our programs read data from and write data to a file, as a word processor does? This would be particularly useful for problems where the amount of data to be processed is so large that interactively entering the data each time the program is executed becomes inconvenient. If we could store our data in a file, then we could test our program many times over without having to retype the data each time.

Let's consider a simple problem.

When many of us were younger, we enjoyed writing secret messages, in which messages were encoded in such a way as to prevent others from reading them, unless they were in possession of a secret that enabled them to decode the message. Coded messages of this sort have a long history. For example, the Caesar cipher (invented, it is said, by Julius Caesar himself) is a simple way to encode messages.

For example, consider this message and its encoding:

Message	Encoded
`One if by land, two if by sea.`	`Rqh li eb odqg, wzr li eb vhd.`

What is the relationship between the letters in the original sentence and those in the encoded sentence? Hint: compare the "difference" between the corresponding characters in the two sentences.

This lab's exercise is to use the Caesar cipher to encode and decode messages stored in files.
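
As a rough illustration of the kind of letter shifting a Caesar cipher performs, here is a minimal sketch. The function name shiftLetter and its key parameter are hypothetical; the caesar library supplied for this lab provides its own caesarEncode(), whose interface may differ. The sketch assumes the ASCII character set and a key between 0 and 25.

```cpp
#include <cctype>
using namespace std;

// Hypothetical helper (not part of the lab's caesar library):
// shifts a letter 'key' positions forward in the alphabet,
// wrapping from z back to a. Assumes 0 <= key <= 25.
// Characters that are not letters are returned unchanged.
char shiftLetter(char ch, int key)
{
   unsigned char c = static_cast<unsigned char>(ch);
   if (isupper(c))
      return static_cast<char>('A' + (ch - 'A' + key) % 26);
   if (islower(c))
      return static_cast<char>('a' + (ch - 'a' + key) % 26);
   return ch;
}
```

Calling shiftLetter() on letters from the sample message with a few different keys is one way to check your answer to the question above.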

Files

Directory: lab9

caesar.h, caesar.cpp, and caesar.doc implement a Caesar encryption function.
encode.cpp and decode.cpp are the two drivers needed for this lab exercise.
message.text and alice.code are two sample input files.

Create the specified directory, and copy the files above into the new directory. Only gcc users need a makefile; all others should create a project and add all of the .cpp files to it.

Add your name, date, and purpose to the opening documentation of the code and documentation files; if you're modifying and adding to the code written by someone else, add your data as part of the file's modification history.

An Encoding Program

The first part of this exercise is to write a program that can be used to encode a message that is stored in a file. The encoded message will then be saved to a second file.

Design

As usual, we will apply object-centered design to solve this problem.

Behavior.

Our program should display a greeting and then prompt for and read the name of the input file. It should then connect an input stream to that file so that we can read from it, and check that the stream opened correctly. It should then prompt for and read the name of the output file. It should then connect an output stream to that file so that we can write to it, and check that the stream opened correctly. For each character in the input file, our program should read the character, encode it using the Caesar cipher, and output the encoded character to the output file. Our program should conclude by disconnecting the streams from the files.

This behavior is a bit verbose since we're just learning about files, but it's perhaps better to err on the verbose side of things rather than on the forgetting side of things. (Programs tend to crash when you forget to do things.)

Objects. Using this behavioral description, we can identify the following objects:

Description	Type	Kind	Name
A greeting	`string`	constant	---
The name of the input file	`string`	varying	`inFile`
An input stream	`ifstream`	varying	`inStream`
The name of the output file	`string`	varying	`outFile`
An output stream	`ofstream`	varying	`outStream`
A character from the input file	`char`	varying	`inChar`
An encoded character	`char`	varying	`outChar`

Using this list of objects, here's our specification:

Specification:
input (inFile): a sequence of unencoded characters.
output (outFile): a sequence of encoded characters.

This list of objects raises an important question that you should think about:

Question #9.1: What is the difference between a file name and a file stream?

One immediate hint: consider the data types. The data type determines the operations you can perform on an object. You should answer this question after you've read through this section. It's an important distinction; if you understand the difference, you'll save yourself much heartache when writing and debugging your programs.

Operations. From our behavioral description, we have these operations:

Description	Predefined?	Name	Library
Display a `string`	yes	`<<`	`iostream`
Read a `string`	yes	`>>`	`iostream`
Connect an input stream to a file	yes	`ifstream` declaration	`fstream`
Connect an output stream to a file	yes	`ofstream` declaration	`fstream`
Check...	yes	`assert()`	`cassert`
...that a stream opened properly	yes	`is_open()`	`fstream`
Read a `char` from an input stream	yes	`get()`	`fstream`
Encode a `char` using the Caesar cipher	yes	`caesarEncode()`	caesar
Write a `char` to an output stream	yes	`<<`	`fstream`
Repeat input, encoding, and output operations	yes	input loop	built-in
Determine when all `char`s have been read	yes	`eof()`	`fstream`
Disconnect a stream from a file	yes	`close()`	`fstream`

Algorithm. We can organize these operations into the following algorithm:

1. Display a greeting.
2. Prompt for and read inFile, the name of the input file.
3. Create an ifstream named inStream connecting our program to inFile.
4. Check that inStream opened correctly.
5. Prompt for and read outFile, the name of the output file.
6. Create an ofstream named outStream connecting our program to outFile.
7. Check that outStream opened correctly.
8. Loop through the following steps:
   a. Read a character from the input file.
   b. If the end-of-file was reached, then terminate repetition.
   c. Encode the character.
   d. Write the encoded character to the output file.
   End loop.
9. Close the input and output connections.
10. Display a "successful completion" message.

Coding

This algorithm should be encoded in main() in encode.cpp.

Steps 1, 2, and 5 are very familiar. Print a prompt, read in a value. The values are only strings, so nothing new yet.
The loop is tested in the middle, so we'll use a forever loop (i.e., for(;;)...) with an if-break to stop the loop.
Encoding can be done with caesarEncode() from the caesar library.

Write the code for these steps. The if-break statement can wait, but the other steps listed here are straightforward.

We only have to figure out the file I/O steps.

Opening a Connection to a File. When we want to get input from a file, we have to say so explicitly in our program. It's a fairly expensive operation (since data moves much more slowly to and from a disk than to and from a computer's main memory). We also have to be precise about which file we want. We certainly don't want all files on the machine.

So we need to open a connection between the program and a file. A connection is a thing, and all things in C++ are represented as objects. File connections are known as streams, and there are two types of streams: ifstream for input file streams and ofstream for output file streams.

Like any other object, a stream must be declared before it can be used. If inputFileName is a string object containing the name of an input file, then the declaration

ifstream inFileStream(inputFileName.data());

constructs a stream object named inFileStream as a connection to the file.

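The string method data() extracts the actual characters from a string. If your compiler rejects this, or the file unexpectedly fails to open, use the c_str() method instead: c_str() is guaranteed to return the null-terminated character array that the stream classes expect, whereas older compilers do not guarantee this for data(). The stream classes are a bit particular about the strings that they'll accept.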

If the file does not exist, the open operation fails and nothing can be read from the stream. More on this later.

Using this information, implement the step of our algorithm that creates and opens a connection inStream to the input file named inFile.

An output stream is similar:

ofstream outFileStream(outputFileName.data());

This declaration constructs an object named outFileStream as a connection to the file named outputFileName.

If the file does not exist, then a file by that name is created in the working directory. If the file does exist, then its contents are erased. An ofstream thus provides a connection to a file so that we can write data to the file.
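
The erase-on-open behavior is easy to see in a small experiment. The following is a minimal sketch, not part of the lab files; the helper names writeText() and charsInFile() are made up for illustration, and c_str() is used where the text above uses data().

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: opens fileName for output (creating it, or
// erasing what it held), writes text to it, and closes it.
void writeText(const string& fileName, const string& text)
{
   ofstream outStream(fileName.c_str());   // existing contents are erased here
   outStream << text;
   outStream.close();
}

// Hypothetical helper: returns the number of characters currently
// stored in fileName (0 if the file cannot be opened).
int charsInFile(const string& fileName)
{
   ifstream inStream(fileName.c_str());
   int total = 0;
   char ch;
   while (inStream.get(ch))                // get() is described later in this lab
      ++total;
   return total;
}
```

If writeText() is called twice on the same file name, first with a long string and then with a short one, charsInFile() afterwards reports only the length of the short string: the second open discarded everything the first call wrote.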

Using this information, implement the appropriate step of our algorithm by declaring an ofstream named outStream that serves as a connection between our program and the file whose name is in outFile.

Libraries. Try compiling your code. Oops. You should get complaints about ifstream and ofstream. The answer is in the operations chart above: you haven't included the proper library.

Include the proper library for these identifiers, and then compile your code. Don't run your code yet because your loop doesn't have a termination test.

Checking that a Connection Opened Correctly. Opening files is an operation that is highly susceptible to user errors. Suppose the user has accidentally deleted the input file and our program tries to open a connection to it? What if it never existed in the first place? If an fstream opens as expected, the operation is said to succeed, but if it does not open as expected, the operation is said to fail.

To detect the success of an open operation, fstream objects contain an is_open() method:

fileStream.is_open()

which returns true if fileStream is open, and it returns false otherwise.

In an assert(), the is_open() method provides a readable way to perform the checking steps of our algorithm.
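
One way the check might look in practice is sketched below. The helper firstCharOf() is hypothetical and exists only to give the assert() a context; in your program the check belongs in main(), right after each stream declaration.

```cpp
#include <cassert>
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: returns the first character stored in fileName.
// If the file cannot be opened, assert() halts the program with a
// diagnostic message instead of letting it continue with a dead stream.
char firstCharOf(const string& fileName)
{
   ifstream inStream(fileName.c_str());
   assert(inStream.is_open());     // stop here if the open failed
   char ch = '\0';
   inStream.get(ch);               // ch stays '\0' if the file is empty
   inStream.close();
   return ch;
}
```

Calling firstCharOf() with the name of a file that does not exist should terminate the program at the assert() line, which is exactly the point: the failure is reported where it happened.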

Write the code for these steps. Compile (but again don't execute) your program.

Input from an ifstream. The most important thing about input (and output) is that you already know how to do it:

Helpful hint: File I/O is done the same way as screen and keyboard I/O.

Just as we have used the >> operator to read data from the istream named cin, the >> operator can be used to read data from an ifstream opened for input. Since the ifstream connects a file to a program, applying >> to it transfers data from the file to the program. For this reason, this operation is described as reading from a file, even though we are actually operating on the ifstream. An expression of the form:

inputFileStream >> VariableName

thus serves to read values from an ifstream named inputFileStream into the variable VariableName. The type of the value being read must match the type of VariableName, or the operation will fail.

However, while the input operator is the appropriate operator to solve many problems involving file input, it is not the appropriate operator for our problem. The reason is that the >> operator skips leading whitespace characters. That is, if our input were

 
  One if by land.
  Two if by sea.

and we were to use the >> operator (in a loop) to read each of these characters:

inStream >> inChar;

then all whitespace characters (blanks, tabs and newlines) would be skipped, so that only non-whitespace characters would be processed, as if the file contained

Oneifbyland.Twoifbysea.

To avoid this problem, ifstream objects contain a get() method:

inputFileStream.get( CharacterVariable );

When execution reaches this statement, the next character, including whitespace characters, is read from inputFileStream and stored in CharacterVariable.
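
The difference between the two input operations can be checked side by side. This is a minimal sketch with hypothetical function names; each function collects everything it reads into a string so the results can be compared.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Reads every char of fileName with >>, which skips whitespace.
string readWithExtraction(const string& fileName)
{
   ifstream inStream(fileName.c_str());
   string result;
   char inChar;
   while (inStream >> inChar)
      result += inChar;
   return result;
}

// Reads every char of fileName with get(), which keeps whitespace.
string readWithGet(const string& fileName)
{
   ifstream inStream(fileName.c_str());
   string result;
   char inChar;
   while (inStream.get(inChar))
      result += inChar;
   return result;
}
```

Run on the same file, the first function should return the text with all blanks and newlines squeezed out, while the second should return the file's contents character for character.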

Use the get() method of the inStream object to perform the char input in the loop. Then compile your program, and continue when your program compiles without error. Don't run it yet!

Controlling a File-Input Loop. Files are created by a computer's operating system. When the operating system creates a file, it marks the end of the file with a special end-of-file mark. Input operations are then implemented in such a way as to prevent them from reading beyond the end-of-file mark, since doing so could allow a programmer unauthorized access to the files of another programmer. The input operations will just keep you at the end-of-file forever until you realize where you are.

An ifstream object has a method named eof() that can be used to control an input loop:

inputFileStream.eof()

This expression returns true if the last read from inputFileStream tried to read the end-of-file mark, and it returns false otherwise. We have to read first, then test for end-of-file.

In a forever loop like the one in the source program, the eof() method can be used as our termination test. By placing an if-break combination:

if ( /* end-of-file has been reached */ ) break;

following the input step, repetition will be terminated when all of the data in the file has been processed.
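
The shape of such a loop is sketched below on a simpler task, counting characters, so that the encoding loop is still left for you to write. The function name and the early return on a failed open are additions for this sketch. The is_open() guard matters: a read from a stream that never opened fails without ever reaching end-of-file, so eof() would stay false and the loop would never stop.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: counts the characters in fileName using a
// forever loop whose termination test follows the input step.
// Returns -1 if the file cannot be opened.
int countWithForeverLoop(const string& fileName)
{
   ifstream inStream(fileName.c_str());
   if (!inStream.is_open())
      return -1;                     // avoid looping on a dead stream

   int total = 0;
   char inChar;
   for (;;)
   {
      inStream.get(inChar);          // read first...
      if (inStream.eof()) break;     // ...then test for end-of-file
      ++total;                       // process the character
   }
   inStream.close();
   return total;
}
```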

In your source program, place an if-break in the appropriate place in our algorithm, using the eof() method of inStream as the condition in the if statement. Then compile your source program, to check the syntax of what you have written. When it is syntactically correct, continue to the next part of the exercise. You probably could run the program now, but it won't do anything interesting because it's not generating any output.

File Output. Just as we have used the << operator to write data to the ostream named cout, the << operator can be used to write data to an ofstream opened for output. Since the ofstream connects a program to a file, applying << to it transfers data from the program to the file. This operation is thus described as writing to the file, even though it is an ofstream operation.

The pattern for output should look pretty familiar:

outputFileStream << Value ;

outputFileStream is an ofstream, and Value is the value that should be written to the file.
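
One detail worth knowing about <<: what gets written depends on the type of Value. The sketch below is an illustration only (the function name is hypothetical, and it assumes the ASCII character set). It writes the same shifted value twice, once as an int expression and once as a char.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: writes ch + 1 to fileName twice,
// first as an int and then as a char, separated by a blank.
void writeShifted(const string& fileName, char ch)
{
   ofstream outStream(fileName.c_str());
   outStream << ch + 1;                      // int expression: digits are written
   outStream << ' ';
   outStream << static_cast<char>(ch + 1);   // char: a single character is written
   outStream.close();
}
```

Because arithmetic on a char produces an int, the first output statement writes a number rather than a letter. If your encoded value is stored in a char variable such as outChar, this problem should not arise.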

Use this example as a basis for a statement to finish up the loop, writing the encoded character (not the original!) to the output file. Compile your program to test the syntax of what you have written, and fix all of your compilation errors.

Closing Files. Once we are done using a stream to read from or write to a file, we should close it, to break the connection between our program and the file. This is accomplished using the method close(). Both the ifstream and ofstream classes have this method:

fileStream.close();

When execution reaches this statement, the program severs the connection between fileStream and its file.
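
Closing an output stream also ensures that any output still waiting in the stream's buffer is written to the file. The following minimal sketch, with a hypothetical function name, writes a line, closes the output stream, and then reads the line back through a separate input stream.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: writes text to fileName, closes the output
// stream, then reads the first line back through an input stream.
string roundTrip(const string& fileName, const string& text)
{
   ofstream outStream(fileName.c_str());
   outStream << text;
   outStream.close();              // flushes buffered output, releases the file

   ifstream inStream(fileName.c_str());
   string line;
   getline(inStream, line);
   inStream.close();
   return line;
}
```

For a single line of text without a newline, roundTrip() should return exactly the string it was given.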

In the appropriate place in the source program, place calls to close() on the input stream and on the output stream. Then compile your program, and ensure that it is free of syntax errors.

Testing and Debugging

When your program's syntax is correct, test it using the provided file named message.text. If what you have written is correct, your program should create an output file, containing the output:

Rqh Li Eb Odqg
Wzr Li Eb Vhd

If this file is not produced, then your program contains a logical error. Retrace your steps, comparing the statements in your source program to those described in the preceding parts of the exercise, until you find your error. Or, pretend you're the computer and walk through your program. Correct your program, recompile it, and retest your program until it performs correctly.

Applying What We Have Learned

The last part of this exercise is for you to apply what you have learned to the problem of decoding a file encoded using the Caesar cipher. Complete the skeleton program decode.cpp, which can be used to decode a message encoded using the Caesar cipher. Do all that is necessary to get this program operational, so that messages encoded with encode.cpp can be decoded with decode.cpp. Put differently, the two programs should complement one another.

The difficult part has been done for you. The caesar library contains a caesarDecode() function which does all the work of decoding. Your job is to build the driver to handle file I/O.

To test your program, you can use the output file created by encode.cpp, or alice.code, a selection from Lewis Carroll's Alice in Wonderland.

Beware! Watch the names of your output files. Every application you've used has probably warned you when you're about to overwrite an existing file. This is behavior that the program had to implement. You haven't implemented it in this program, so you won't get this warning. So if you encode message.text to be message.code, and then you decode message.code to be message.text, then say goodbye to the old message.text! The old version will disappear, and you'll have to copy it over again. It's better to use message.decode, perhaps, when you decode message.code.
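
If you would like your program to warn you itself, one possible approach is sketched here. This goes beyond what the lab requires, and the helper name wouldOverwrite() is made up for illustration. It simply tries to open the file for reading, which does not change the file, and reports whether that succeeded.

```cpp
#include <fstream>
#include <string>
using namespace std;

// Hypothetical helper: reports whether a file named fileName can
// already be opened for reading, that is, whether opening an
// ofstream on it would erase existing data.
bool wouldOverwrite(const string& fileName)
{
   ifstream probe(fileName.c_str());
   return probe.is_open();
}
```

A program could call wouldOverwrite(outFile) before declaring outStream and ask the user for a different name when it returns true.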

Submit

Turn in your code as well as the output from your programs.

Terminology

Caesar cipher, decode, file, reading (from a file), stream, writing (to a file)
