Hands on Testing Java: Lab #10

File I/O

Throughout this manual, we have made use of files---data stored in an organized way on some persistent storage. Each source file that we have written has been stored in a file; the compiled byte-code of a class is stored in a class file.

In this lab exercise we will take a closer look at the I/O facilities provided by Java that allow us to add file manipulation to our programs.

Streams

Input and output is viewed as a stream in Java. The terminology is deliberate: picture your program as someone who controls a stream of water; the program can turn the stream on and off; the program can watch everything that goes past.

We've actually been dealing with data streams all along: the Screen and Keyboard classes create stream objects that connect a program up to the screen and keyboard of the user. Whenever a program asks a Keyboard object to read in a double (with Keyboard.readDouble()), the program asks the stream to flow just enough to read in the floating-point number it expects from the keyboard.

Byte Streams

The most basic stream classes are InputStream and OutputStream. These classes are very limited; though they support a few other operations, they basically only know how to read and write bytes (8 bits of data).

InputStream                        OutputStream
read() - read a byte               write() - write a byte
close() - close the input stream   close() - close the output stream

Generally we like to think of our data at least in terms of ints, doubles, etc. Individual bytes just really aren't that useful except to build these other types of data. Fortunately, we don't have to provide this behavior; there are other classes that do it for us.

Buffered I/O

Another disadvantage of the basic byte streams is that they perform an I/O operation with every read or write. This can be a serious problem if we are accessing a hard disk or a network drive which has slow access times. To improve performance, most systems use some kind of buffering.

Java provides two classes, BufferedInputStream and BufferedOutputStream, to create buffered input and output streams. They don't supply any additional operations, but they are more efficient.

Data Streams

Java has classes that implement streams for processing different kinds of data, other than just bytes: the DataInputStream and the DataOutputStream. They can read and write any of the primitive data types that Java supports. In addition to the operations from the byte stream classes, these classes add a few more operations:

DataInputStream                   DataOutputStream
readBoolean() - read a boolean    writeBoolean() - write a boolean
readByte() - read a byte          writeByte() - write a byte
readChar() - read a char          writeChar() - write a char
readDouble() - read a double      writeDouble() - write a double
readFloat() - read a float        writeFloat() - write a float
readInt() - read an int           writeInt() - write an int
readLong() - read a long          writeLong() - write a long
readShort() - read a short        writeShort() - write a short

These are still not the classes and behaviors that we're looking for, even though their names are really, really tempting. The writing operations will record the binary representation of the data, not the character representation that we're used to reading and typing. It's nearly impossible for humans to read the output from these streams, which makes debugging very difficult.
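To make the difference concrete, here is a minimal sketch that compares the bytes produced by DataOutputStream#writeInt(int) with the bytes of the same number written as text. The class and method names (BinaryVersusText, binaryBytes, textBytes) are made up for this illustration, and an in-memory ByteArrayOutputStream stands in for a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class BinaryVersusText {
    // Returns the bytes that DataOutputStream#writeInt produces for value.
    static byte[] binaryBytes(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(sink);
            out.writeInt(value);   // always four bytes, high byte first
            out.close();
        } catch (IOException e) {
            // An in-memory sink should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Returns the bytes of the decimal text for value, as a text editor would store it.
    static byte[] textBytes(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(binaryBytes(12))); // prints [0, 0, 0, 12]
        System.out.println(Arrays.toString(textBytes(12)));   // prints [49, 50]
    }
}
```

The text version holds the character codes for '1' and '2', which any editor can display; the binary version holds the raw value, which an editor would show as unprintable characters.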

Character Streams

Java has another set of primitive stream classes for doing I/O that are based on the classes Reader and Writer. The Reader and Writer classes implement character streams.

Characters in Java are stored internally in Unicode, with each char occupying 16 bits. The actual representation of the data that is stored in a text file will typically depend on the locale of the machine (its default character encoding), but Java handles those details for us (in the same way that our text editors handle those details).

Reader                                 Writer
read() - read a character              write() - write a character
close() - close the character stream   close() - close the character stream

Buffered I/O (Again)

Just as with byte streams, we have classes that support buffered I/O for character streams: BufferedReader and BufferedWriter. In addition to the methods from the Reader and Writer classes, these buffered classes support the reading and writing of Strings with these methods:

BufferedReader                             BufferedWriter
readLine() - read a String of characters   write() - overloaded to provide writing a String
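As a rough illustration of these two methods, the sketch below reads text one line at a time with BufferedReader#readLine() and writes each line back out with the String version of BufferedWriter#write(). It uses StringReader and StringWriter (in-memory character streams) so that it runs without any files; the class name LineCopySketch and the method numberLines are invented for this example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical class name, used only for this illustration.
class LineCopySketch {
    // Copies text line by line, prefixing each line with its line number.
    static String numberLines(String text) {
        StringWriter sink = new StringWriter();
        try {
            BufferedReader in = new BufferedReader(new StringReader(text));
            BufferedWriter out = new BufferedWriter(sink);
            int lineNumber = 1;
            String line = in.readLine();             // null signals the end of the input
            while (line != null) {
                out.write(lineNumber + ": " + line); // the write(String) overload
                out.write('\n');                     // readLine() strips the line ending
                lineNumber++;
                line = in.readLine();
            }
            in.close();
            out.close();                             // flushes the buffer into sink
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(numberLines("alpha\nbeta"));
    }
}
```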

Byte Streams or Character Streams

So which should we use, byte streams or character streams? When a computer's memory (both RAM and disk space) was small and costly, it was important to make files as small as possible. But memory gets cheaper and larger, so this has become less of a concern.

This removes the main reason for using byte streams for reading and writing files; however, they do still have their uses. Most computer games use byte streams for their saved-game files so that we can't modify our games---what would be the fun if we could suddenly give ourselves billions of dollars in our game? Other programs, like word processors and spreadsheets, will use byte streams to implement proprietary file formats so that it's more difficult for other programs to read your documents. You're stuck using the commercial program and buying new versions just so that you can continue using the same saved file.

However, as noted above, the output from a byte stream is not easily read by a human. Since it's us humans who have to debug the program (which involves reading the input and output files of the program), it may be more important to make the files human readable; this means using a character stream. A problem with a character stream is that its output takes up a little more room than a byte stream, but in this day and age of extremely cheap memory and storage, it's well worth this extra cost.

Print Streams

A print stream is a kind of stream that has been extended to support the operations print() and println(). These operations are overloaded so that they can successfully print any object (using its toString() method) as well as all the primitive Java data types. In contrast to write(), the print() methods will print the data in human-readable form. Your own experience from previous labs testifies to this; we've used print() methods all along.

There are two kinds of print streams: PrintStream is a kind of OutputStream (for byte streams) and PrintWriter is a kind of Writer (for character streams).
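Here is a small sketch of the overloaded print() methods at work on a PrintWriter. The output goes to a StringWriter so that the result can be inspected as a String; the class name PrintWriterSketch and the method demo are invented for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class PrintWriterSketch {
    // Prints several kinds of values and returns the text that was produced.
    static String demo() {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(42);                      // an int
        out.print(' ');                     // a char
        out.print(2.5);                     // a double
        out.print(' ');
        out.print(true);                    // a boolean
        out.print(' ');
        out.print(Arrays.asList("a", "b")); // an object, printed via its toString()
        out.close();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42 2.5 true [a, b]
    }
}
```

Unlike the other writers in this lab, the print() methods of PrintWriter do not throw IOException, which is part of why they are so convenient for output.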

Handling Input

Input, on the other hand, is trickier than output. The best we have are the buffered streams which provide a readLine() facility. To extract useful information from a line of text we can use two things:

We saw both of these in Lab #6.

Predefined Streams

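There are three stream objects in Java defined in the System class: System.in (standard input, usually the keyboard), System.out (standard output, usually the screen), and System.err (standard error, for error messages).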

While Java programmers will typically use System.out directly, they will rarely do so with System.in. Instead, they usually build objects from System.in that make input much easier.

File Streams

The last piece of our puzzle for this lab exercise is connecting up to a file. We looked at a variety of streams, but how do we go the next step to connect up a stream to a file?

Java has four basic classes for manipulating file streams: FileInputStream, FileOutputStream, FileReader, and FileWriter.

Since we're interested in character streams, we'll be using the last two on this list.

Each of the file-stream classes has a constructor that accepts a String as the name of the file to open:

new FileStreamClass(fileName)

If the file cannot be opened, the constructor will throw some sort of an IOException.

Generally, we don't directly use the objects that these constructors build for us. At the very least we have to wrap them up in a buffered stream.

Caesar Cipher Problem

We're going to use an encryption problem as an excuse to put files to use in this lab exercise.

A Caesar cipher is a simple means of encoding messages that dates from Roman times (rumored to be created by Julius Caesar himself, hence the name). For example:

Original message                 Encoded message
One if by land, two if by sea.   Rqh li eb odqg, wzr li eb vhd.
veni, vidi, vici                 wfoj, wjej, wjdj

Both use a Caesar cipher, although with different keys. Can you figure out the encryption? The encryption key is the number of characters that each character is "shifted".
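To show what "shifting" means in code, here is one possible sketch. The lab's own CaesarCipher class is not shown in this manual, so the class name CaesarSketch and the methods shift and encode below are hypothetical and may differ from what CaesarCipher actually does. The sketch assumes that only the letters A-Z and a-z are shifted, that shifts wrap around the end of the alphabet, and that all other characters pass through unchanged.

```java
// Hypothetical class; the lab's real CaesarCipher may be implemented differently.
class CaesarSketch {
    // Shifts a single letter by key positions, wrapping around within its alphabet.
    static char shift(char c, int key) {
        if (c >= 'a' && c <= 'z') {
            return (char) ('a' + Math.floorMod(c - 'a' + key, 26));
        }
        if (c >= 'A' && c <= 'Z') {
            return (char) ('A' + Math.floorMod(c - 'A' + key, 26));
        }
        return c; // spaces, punctuation, and digits are left alone (an assumption)
    }

    // Applies shift to every character of the message.
    static String encode(String message, int key) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            result.append(shift(message.charAt(i), key));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("Hello, World!", 5)); // prints Mjqqt, Btwqi!
    }
}
```

Math.floorMod is used instead of % so that a negative key (shifting backwards) still lands inside the alphabet.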

We're going to use Caesar ciphers to encode and decode the data in files.

Getting Started

Do this...

The driver is a "command-line interface with arguments", hence the slightly awkward name. We start with these command-line arguments.

Command-Line Arguments

From a command-line prompt, you can run a Java program like this:

% java hotj.imaginary.Driver

You can add extra information on that same line like so:

% java hotj.imaginary.Driver 12 Inputs/lab10/foobar.txt Outputs/lab10/johnboy.txt

The java executable is happy so long as its first argument is a valid Java class. But what happens to that text after the name of the class? One possibility is that it's thrown away, but that seems unlikely. Another possibility is that this data is treated as the first input from the keyboard; this perhaps seems more likely, but this too is not the case.

The words of text after the name of the class are known as command-line arguments. The operating system saves away command-line arguments in a special place for our program to use. Java delivers them to us through the arguments of the #main(String[]) method:

public static void main(String[] args) { ... }

As we saw in the previous lab exercise, String[] is the data type for an array of Strings. This array, which is traditionally named args, is filled with the command-line arguments.

In the example above, args in Driver#main(String[]) will be filled with the Strings "12", "Inputs/lab10/foobar.txt", and "Outputs/lab10/johnboy.txt". The only magical thing here is that the operating system fills the array for us automatically and the Java Virtual Machine delivers it to the program in this parameter.

The array args is like any other array of Strings that we might come across, so everything we learned about arrays in the previous lab and everything we know about Strings apply.
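For instance, a program can inspect args with the usual array operations. The following sketch is not part of the lab's driver; the class name ArgsSketch and the method describe are invented for this example.

```java
// Hypothetical class name, used only for this illustration.
class ArgsSketch {
    // Builds a one-line summary of the command-line arguments.
    static String describe(String[] args) {
        StringBuilder summary = new StringBuilder();
        summary.append(args.length).append(" argument(s)");
        for (int i = 0; i < args.length; i++) {
            summary.append("; args[").append(i).append("] = ").append(args[i]);
        }
        return summary.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Running it as "java ArgsSketch 12 in.txt" would print "2 argument(s); args[0] = 12; args[1] = in.txt". Note that "12" arrives as a String, not as an int.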

As noted in the Javadoc comments for EncodeCLIWADriver#main(String[]), the EncodeCLIWADriver program should be executed like this:

java EncodeCLIWADriver key inputFile outputFile

To run the program, we must provide command-line arguments for the key, the inputFile, and the outputFile.

An Encoding Program

We want to write a program that encodes a message stored in a file. We'll write the encoded message to another file. All programs must be designed first...

Design

Our program needs three pieces of information:

  1. The encryption key.
  2. The name of the input file.
  3. The name of the output file.

The encryption key is the key used to encode the message. Each CaesarCipher instance requires its own encryption key.

Behavior of EncodeCLIWADriver#main(String[])

The driver will first check to make sure there are three command-line arguments; if not, it will exit with an error message.

Otherwise, the driver will start normal processing by displaying a greeting. Then it will process the three command-line arguments:

  1. The first command-line argument will be parsed as an encryption key.
  2. The second command-line argument will be the name of the input file.
  3. The third command-line argument will be the name of the output file.

The driver will create a new Caesar cipher using the encryption key. The driver will also create a reader for the input file and a writer for the output file.

Then, for each character in the input file, our driver should read the character, encode it using the Caesar cipher, and then output the encoded character to the output file. The driver should conclude by closing the streams and by displaying a "success" message.

If the encryption key cannot be parsed or if any of the file I/O results in an error, the driver will print an error message and will not try to recover.

Using this behavioral description, we can identify the following objects:

Objects of EncodeCLIWADriver#main(String[])
Description                                Type             Kind       Name
command-line arguments                     String[]         received   args
encryption key                             int              varying    key
the name of the input file                 String           varying    inputFilename
the name of the output file                String           varying    outputFilename
the input reader for the input file        BufferedReader   varying    theReader
the output writer for the output file      BufferedWriter   varying    theWriter
a "character value" from the input file    int              varying    inputValue
the actual character from the input file   char             varying    inputChar
an encoded character                       char             varying    outputChar

The object list here is biased towards Java: when a char is read from an input stream (or input reader), you actually read in an int. We will discuss this later in this lab, but for now, we need both the int read in and the actual char in our object list.

Specification:

command-line arguments: candidate values for key, inputFilename, outputFilename
input (input file): a sequence of unencoded characters.
output (output file): a sequence of encoded characters.
output (screen): various progress messages.

From our behavioral description, we have these operations:

Operations of EncodeCLIWADriver#main(String[])
Description                                Predefined?   Name                      Library
test number of command-line arguments      yes           !=                        built-in
conditionally complain                     yes           if                        built-in
exit the program                           yes           System.exit(int)          java.lang
parse an int from a String                 yes           parseInt(String)          java.lang.Integer
display a String                           yes           println()                 java.io.PrintStream
connect an input stream to a file          no            createReader(String)      EncodeCLIWADriver
connect an output stream to a file         no            createWriter(String)      EncodeCLIWADriver
read a character from an input stream      yes           read()                    java.io.BufferedReader
test for end-of-file                       yes           == -1                     built-in
encode a char using the Caesar cipher      yes           encode(char)              CaesarCipher
write a char to an output stream           yes           write(int)                java.io.Writer
repeat...                                  yes           forever loop              built-in
try to act normally, but handle problems   yes           try... catch... finally   built-in
disconnect a stream from a file            yes           close()                   java.io.BufferedWriter and java.io.BufferedReader

Many of these operations may be a mystery to you right now (like testing for the end-of-file); we will examine each of these operations in turn.

Organizing all of this into an algorithm takes a little more work than we're used to. Since we're dealing with operations that are likely to fail, we write the algorithm in several stages:

  1. First, write the code that we expect to work, and assume that it works perfectly. This is nearly everything in the behavior except for the "if something goes wrong" observations made at the very end.
  2. We try to execute this ideal code.
  3. Catch the problems, and process them.
  4. Fine tune the declaration of your variables, particularly I/O streams.
  5. Make sure all open streams are closed.

Algorithm of EncodeCLIWADriver#main(String[])
  1. If there aren't three command-line arguments
    1. Display an error message.
    2. Exit the program.
  2. Try to execute the following:
    1. Parse key from first command-line argument.
    2. Let inputFilename be the second command-line argument.
    3. Let outputFilename be the third command-line argument.
    4. Display a greeting.
    5. Let cipher be a new Caesar cipher with encryption key key.
    6. Let theReader be the reader connected to the file named inputFilename.
    7. Let theWriter be the writer connected to the file named outputFilename.
    8. Loop:
      1. Read inputValue from the input file.
      2. If end-of-file was reached, then terminate repetition.
      3. Let inputChar be the char version of inputValue.
      4. Let outputChar be the Caesar-cipher encoding of inputChar using cipher.
      5. Write outputChar to the output file.
      End loop.
    9. Display a "success" message.
    Handle problems with parsing of key:
    1. Display an appropriate message.
    Handle I/O problems:
    1. Display an appropriate message.
    In all cases:
    1. Close the input and output connections.

Support Methods

The design above asks for two support methods: one to create the input stream, another to create the output stream. These methods are grossly technical, and sound a bit strange when designed. It's better learning these from an example:

public static BufferedReader createReader(String filename) throws IOException {
  FileReader theFileReader = new FileReader(filename);
  BufferedReader theBufferedReader = new BufferedReader(theFileReader);
  return theBufferedReader;
}

There are several things to notice about this method:

Do this...
Add this method to EncodeCLIWADriver. The code should compile.

Fortunately, you need to do all of these same things to create your writer. The only difference is that "reader" becomes "writer".

Do this...
Write EncodeCLIWADriver#createWriter(String), and compile your code.

Coding

Coding will be a fairly major undertaking this time around. There's a bit of code in this algorithm, and unfortunately it cannot really be shortened because of the I/O processing.

EncodeCLIWADriver#main(String[]) already provides some of the structure for this algorithm, including the try-catch for the second step (which is probably giving you compilation errors). We'll go through the two catch blocks and the finally block later on.

Let's go through the steps of the algorithm, one by one.

Testing Command-Line Arguments and Exiting

The first step is already implemented for you. There's nothing too surprising there since we've already looked at arrays and if statements are old, old friends.

The System.exit(-1); statement exits the program immediately. This is somewhat dangerous to do when we're working with I/O streams, but since this is at the beginning of the driver and no I/O streams have been created at this point in the program, exiting is safe. We don't throw an exception because this is a problem with the way the user used the program; without the proper command-line arguments, there's nothing to do but print an error message and quit.

The -1 argument is an arbitrary value that's sent back to the operating system. The convention is that a program returns a 0 when the program stops normally, without any problems. However, if our driver doesn't get three command-line arguments, it cannot do anything since some crucial piece of information is missing. So we exit with a non-zero value. There really aren't any rules for what non-zero values to use. 1 and -1 are perhaps the most common.

Try to Execute Code, Handle Problems, and Cleanup

Our main algorithm really consists of one step, a step that tries to execute some ideal code that hopefully executes fine all of the time. But life isn't perfect, and something will inevitably go wrong. When something goes wrong, we need to identify the problem and handle it.

The ideal code goes in the try block; the problems are recognized by the exceptions that they throw, and these are handled by the catch blocks.

The try-catch you need is already in EncodeCLIWADriver#main(String[]).

Since there might be multiple things that could go wrong with the code we want to try to execute, we're allowed to put down as many catch blocks as we like:

  1. A NumberFormatException is thrown when we parse a number with a method like Integer#parseInt(String). It indicates that the String does not, in fact, represent a valid number. Since we can't do any encryption with an invalid key, we catch this exception and quit.
  2. An IOException is thrown when an I/O operation fails. There are different types of IOExceptions, but we won't distinguish between them.

There's an extra block at the end of the try-catch statement, the finally block. While a try can have as many catch blocks as you like, you can have at most one finally block. The code in a finally block is always executed, whether the try finishes normally or any of the catch blocks is triggered.

The finally block is necessary here because in all cases, no matter what goes right or wrong, we must always close up the input and output streams. This is easily the most common use for a finally block, to close up streams.

The finally block makes it important that we don't use System.exit() within the try or any of the catch blocks, since exiting the program bypasses the important finally block.
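The order in which the try, catch, and finally blocks run can be seen in a small sketch. This is not the lab's driver; the class name TryCatchFinallySketch and the method parseAndTrace are invented, and the method simply records which blocks were executed.

```java
// Hypothetical class name, used only for this illustration.
class TryCatchFinallySketch {
    // Tries to parse text as an int and records which blocks were executed.
    static String parseAndTrace(String text) {
        StringBuilder trace = new StringBuilder();
        try {
            int key = Integer.parseInt(text);    // may throw NumberFormatException
            trace.append("parsed ").append(key); // skipped if the parse fails
        } catch (NumberFormatException e) {
            trace.append("bad number");          // runs only when the parse fails
        } finally {
            trace.append(", cleanup");           // runs in both cases
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseAndTrace("12"));     // prints parsed 12, cleanup
        System.out.println(parseAndTrace("twelve")); // prints bad number, cleanup
    }
}
```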

Using the Command-Line Arguments

To access the first command-line argument, access the first element of the args array: args[0]. That's a String which can be passed to Integer.parseInt(String):

Integer.parseInt(string)

This returns the int value contained in string, a String. (If string does not contain a valid int, it throws a NumberFormatException.)

Do this...
Write the code to declare and initialize key from the first command-line argument.

Then two simple declarations:

Do this...
Also write the code to set inputFilename and outputFilename from the second and third command-line arguments, respectively. Compile your code to see how good your syntax is so far.

Displaying a Greeting

This is already done for you.

Creating the Cipher

Read the CaesarCipher class to see how to construct a new CaesarCipher.

Do this...
Add the statement to declare and initialize cipher. Compile your code to see how good your syntax is so far.

Creating a Buffered Reader Connected to a File

theReader should be initialized using EncodeCLIWADriver#createReader(String).

Do this...
Write the statement to declare and initialize theReader.

Since this method might throw an exception (when the file can't be found), you have to write this code inside the try. (Actually, according to the algorithm above, a lot of this code should already be in the try.)

Do this...
Compile your code to see how good your syntax is so far. You should be able to compile without errors now, but do not run anything until you've finished up the code (i.e., closed the files).

Creating a Buffered Writer Connected to a File

Do this...
Write the statement to declare and initialize theWriter using EncodeCLIWADriver#createWriter(String). Compile the code.

When opening a writer, if the file already exists, it's erased completely and is replaced by the new output from the program. If the file doesn't exist, it's created for us. (If we want our program to be careful about overwriting old data files, we have to add extra code ourselves.)
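The overwriting behavior is easy to observe with a temporary file. In the sketch below, the class name OverwriteSketch and its methods are invented for this example; it opens the same file twice with FileWriter and then reads back what is left.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical class name, used only for this illustration.
class OverwriteSketch {
    // Opens the named file for writing, writes text, and closes the file.
    static void writeText(String filename, String text) throws IOException {
        BufferedWriter out = new BufferedWriter(new FileWriter(filename));
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }

    // Writes to the same temporary file twice and returns its final contents.
    static String overwriteDemo() {
        try {
            File temp = File.createTempFile("overwrite-sketch", ".txt");
            temp.deleteOnExit();
            writeText(temp.getPath(), "first run, a long line of output");
            writeText(temp.getPath(), "second run"); // erases the first run completely
            return new String(Files.readAllBytes(temp.toPath()), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteDemo()); // prints second run
    }
}
```

A program that wants to be careful could check File#exists() before opening the writer, or use the FileWriter(String, boolean) constructor with true to append instead of erasing.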

Character Input from a BufferedReader

Now we're in the loop.

The first action in the loop is described as "reading from the file", even though we are actually operating on theReader. Colloquially we might talk about manipulating the files, but technically in the code we're manipulating readers and writers.

We are interested in reading one character at a time. The BufferedReader#read() method almost does what we want. It does read a character from the file, but it returns an int, not a char:

intVariable = reader.read();

We need the int for the time being (for the next step), but eventually we'll cast it into a char.

Do this...
Write the first statement of the forever loop. (The forever loop is already written for you.) Compile your code to see how good your syntax is so far.

End of a File

Files are created by a computer's operating system. When the operating system creates a file, it marks the end of the file with a special end-of-file mark. Input operations are implemented in such a way as to prevent a program from reading beyond the end-of-file mark since this would be a security risk.

This end-of-file mark is usually used for terminating input loops. Java indicates the end of a file by returning -1 from BufferedReader#read(). If your program tries to read more input when it's at the end-of-file, you'll either have an exception thrown, or the end-of-file value is just returned over and over again.

So testing "if end-of-file was reached" is a matter of testing if our inputValue is -1.

Do this...
Write an if-break statement to terminate the loop when the end-of-file is reached. Compile your code to see how good your syntax is so far.

Processing a char, Not an int

Now that we have tested for the end-of-file, we can turn the int into a char. This is done by a data cast:

(Type) value

For our problem, our Type is char and value is inputValue. This is the initialization expression for declaring inputChar.
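Put together, reading an int, testing it against -1, and only then casting it to a char looks roughly like the sketch below. It is not the lab's driver: it reads from an in-memory StringReader instead of a file and converts each character to upper case instead of encoding it. The class name ReadLoopSketch and the method upperCaseCopy are invented for this example, and the forever loop is written as while (true), which may differ from the loop already provided in your driver.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical class name, used only for this illustration.
class ReadLoopSketch {
    // Reads text one character at a time and returns an upper-case copy.
    static String upperCaseCopy(String text) {
        StringBuilder result = new StringBuilder();
        try {
            BufferedReader in = new BufferedReader(new StringReader(text));
            while (true) {
                int value = in.read();  // an int, so that -1 can signal end-of-file
                if (value == -1) {
                    break;              // nothing left to read
                }
                char c = (char) value;  // safe to cast now that -1 is ruled out
                result.append(Character.toUpperCase(c));
            }
            in.close();
        } catch (IOException e) {
            // An in-memory reader should not fail; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperCaseCopy("one if by land")); // prints ONE IF BY LAND
    }
}
```

The test must come before the cast: casting -1 to a char would produce an ordinary-looking character and the end of the input would never be noticed.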

Do this...
Write the declaration for inputChar. Compile your code to see how good your syntax is so far.

Now give this char to cipher.

Do this...
Write the declaration for outputChar, initialized to invoking encode(char) on cipher and inputChar.

Writer Output

We need to be able to write a character to the output file. There is a BufferedWriter#write(int) method that writes a single character; when we pass it a char, Java widens the char to an int automatically, so it is exactly what we need.

The general form is this:

writer.write(charValue) ;

Do this...
Write the last statement of the loop. Compile your code to see how good your syntax is so far.

Closing Files

Helpful hint: Other languages say that closing files is important, but it's often done for you. Not in Java! If you forget to close an output file, the output file may very well be empty when you run your program! It is your responsibility to close all of your files.

Once we are done using a stream to read from or write to a file, we must close it. This is particularly important for a buffered writer, where the buffer may be discarded when the program ends without saving the data to the file. To force a buffered writer to write the buffer to the actual file, you must flush the buffer. Closing a file flushes your output buffer and takes care of some housekeeping details, so it's important no matter what stream, reader, or writer you're using.

Closing a stream, writer, or reader is done the same way:

stream.close();

When execution reaches this statement, the program severs its connection to stream, flushing buffers and telling the operating system to tie up any open files.
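The effect of the buffer can be observed directly. In this sketch a BufferedWriter is wrapped around an in-memory StringWriter (standing in for the file), and the contents of the StringWriter are sampled before and after close(). The class name FlushSketch and the method beforeAndAfterClose are invented for this example, and the sketch relies on the few characters written being far smaller than the writer's default buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical class name, used only for this illustration.
class FlushSketch {
    // Returns what the underlying sink holds before and after the buffered writer is closed.
    static String[] beforeAndAfterClose() {
        StringWriter sink = new StringWriter();
        try {
            BufferedWriter out = new BufferedWriter(sink);
            out.write('R');                   // one character; it waits in the buffer
            out.write("qh");                  // a String; it also waits in the buffer
            String before = sink.toString();  // nothing has reached the sink yet
            out.close();                      // flushes the buffer, then closes
            String after = sink.toString();
            return new String[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterClose();
        System.out.println("before close: \"" + result[0] + "\""); // prints before close: ""
        System.out.println("after close: \"" + result[1] + "\"");  // prints after close: "Rqh"
    }
}
```

With a real file, a program that ends in the "before" state is what leaves you with an empty output file.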

Looking at the algorithm above, note that closing the files is done in the finally block. This actually takes a little more code than is immediately obvious.

  1. First, theReader and theWriter must be declared before the try. Otherwise, the compiler won't see them in the finally block. You have to move the declarations of these variables outside the try block, but they must be initialized inside the try block.
  2. Don't redeclare theReader and theWriter!
  3. Next, both of the streams have to be initialized to null when they're declared or the compiler gets anxious that they don't have a value.
  4. In the finally block we have to test to see if they are null before we close them, otherwise we might get a NullPointerException.
  5. Finally, we have to catch the IOException that the close() method might throw.

Do this...
Follow all of those steps in writing the code for the finally block.

Here's a pattern for you to follow for closing a stream (the last two steps above):

try {
  if (stream != null)
    stream.close();
}
catch (IOException e) {
  System.err.println("Problems closing a stream.");
  e.printStackTrace();
}

This code goes inside the finally block of the original try. You can put other if-close statements inside this try to close other streams. The exception handling code will print an error and display a stack trace for debugging.

This is quite easily some of the ugliest code you'll ever have to write. But there's little we can do about it. We have to do these things in these places, otherwise Really Bad Things happen.
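Here is how the five steps fit together in a complete method. This is only a sketch on a different, smaller problem (counting the characters in a file), not the lab's driver; the class name CloseInFinallySketch and the method countChars are invented for this example.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical class name, used only for this illustration.
class CloseInFinallySketch {
    // Returns the number of characters in the named file, or -1 if it cannot be read.
    static int countChars(String filename) {
        BufferedReader reader = null;  // declared before the try, initialized to null
        int count = 0;
        try {
            reader = new BufferedReader(new FileReader(filename)); // initialized inside the try
            while (reader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            System.err.println("Could not read " + filename);
            count = -1;
        } finally {
            try {
                if (reader != null) {  // still null if the file could not be opened
                    reader.close();
                }
            } catch (IOException e) {
                System.err.println("Problems closing a stream.");
                e.printStackTrace();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String filename = (args.length > 0) ? args[0] : "missing.txt";
        System.out.println(countChars(filename));
    }
}
```

If the file does not exist, the FileReader constructor throws before reader is ever assigned, which is exactly the case the null test in the finally block protects against.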

Do this...
Compile your code, but don't run it quite yet.

Handling the Problems

Our solution for handling problems is to display reprimanding messages to the user on System.err.

Do this...
Add statements to the catch blocks to reprimand the user appropriately.

Testing and Debugging

Do this...
Now, finally, compile and run your driver program. (Reread the run page to find out how to pass command-line arguments to your program!) Remember to pass the key, input file name, and output file name as arguments to the program on the command line.

Specify the input and output filenames relative to the main project folder. That is, your input filename should be Inputs/lab10/message.text, and the output file should be Outputs/lab10/message.encode. This will keep the main project folder less cluttered.

If what you have written is correct, use a key of 3, and your program should create an output file containing this output:

Rqh Li Eb Odqg
Wzr Li Eb Vhd

Also try running your program on a file that doesn't exist. It should exit gracefully with your reprimand.

Decoding

Now apply all of this knowledge to the problem of decoding a file encoded using a Caesar cipher. This program, implemented in a class DecodeCLIWADriver, should be a dual to EncodeCLIWADriver. In fact, DecodeCLIWADriver should look a lot like EncodeCLIWADriver.

This is definitely a problem where you should think about the problem first to figure out how little work you should do.

To test your DecodeCLIWADriver#main(String[]) program, you can use the output file created by EncodeCLIWADriver, using the same encryption key. Be careful with your file names!!!! Remember that output files are overwritten without any complaints. It's strongly advised that you name your decoded files with a .decode suffix or something else distinctive.

Also included is alice.code, an encoded selection from Lewis Carroll's Alice in Wonderland. Experiment with different encryption keys to figure out the proper encryption key.

Turn In

Submit the following for this lab exercise:

  1. Copies of your code files.
  2. A sample execution of your programs demonstrating the following:
  3. The original, encoded, and decoded files.

Terminology

buffered input stream, buffered output stream, buffering, Caesar cipher, character streams, command-line arguments, data cast, encryption key, file, finally block, flush a buffer, input buffer, output buffer, parsing method, persistent storage, print stream, program usage, stream, token, wrapper class