Throughout this manual, we have made use of files---data stored in an organized way on some persistent storage. Each source file that we have written has been stored in a file; the compiled byte-code of a class is stored in a class file.
In this lab exercise we will take a closer look at the I/O facilities provided by Java that allow us to add file manipulation to our programs.
Java views input and output as streams. The terminology is deliberate: picture your program as someone who controls a stream of water; the program can turn the stream on and off, and it can watch everything that goes past.
We've actually been dealing with data streams all along: the
Screen and Keyboard classes create stream
objects that connect a program up to the screen and keyboard of the
user. Whenever a program asks a Keyboard object to
read in a double (with
Keyboard.readDouble()), the program asks the stream to
flow just enough to read in the floating-point number it expects
from the keyboard.
The most basic stream classes are InputStream
and OutputStream. These classes are very limited;
though they support a few other operations, they
basically only know how to read and write bytes (8 bits of
data).
| InputStream | OutputStream |
|---|---|
| read() - read a byte | write() - write a byte |
| close() - close the input stream | close() - close the output stream |
Generally we like to think of our data at least in terms of
ints, doubles, etc. Individual
bytes just aren't that useful except to build
these other types of data. Fortunately, we don't have to provide
this behavior ourselves; there are other classes that do it for us.
Another disadvantage of the basic byte streams is that they perform an I/O operation with every read or write. This can be a serious problem if we are accessing a hard disk or a network drive which has slow access times. To improve performance, most systems use some kind of buffering.
Java provides two classes, BufferedInputStream and
BufferedOutputStream, to create buffered input and
output streams. They don't supply any additional operations, but
they are more efficient because they move data in large blocks
instead of one byte at a time.
Java has classes that implement streams for processing different
kinds of data, other than just bytes: the
DataInputStream and the DataOutputStream.
They can read and write any of the primitive data types that Java
supports. In addition to the operations from the byte stream
classes, these classes add a few more operations:
| DataInputStream | DataOutputStream |
|---|---|
| readBoolean() - read a boolean | writeBoolean() - write a boolean |
| readByte() - read a byte | writeByte() - write a byte |
| readChar() - read a char | writeChar() - write a char |
| readDouble() - read a double | writeDouble() - write a double |
| readFloat() - read a float | writeFloat() - write a float |
| readInt() - read an int | writeInt() - write an int |
| readLong() - read a long | writeLong() - write a long |
| readShort() - read a short | writeShort() - write a short |
These are still not the classes and behaviors that we're looking for, even though their names are really, really tempting. The writing operations will record the binary representation of the data, not the character representation that we're used to reading and typing. It's nearly impossible for humans to read the output from these streams, which makes debugging very difficult.
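To see why that output is hard to read, here is a small sketch (the class BinaryVersusText and its method are illustrative names, not part of the lab's files). It writes an int with DataOutputStream#writeInt(int) into a ByteArrayOutputStream, a standard java.io class used here as an in-memory stand-in for a file, and prints the raw bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class BinaryVersusText {
    // Returns the bytes that DataOutputStream#writeInt(int) records for value.
    static byte[] binaryBytes(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();  // in-memory stand-in for a file
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(value);   // always exactly 4 bytes, high byte first
        out.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Print each byte as an unsigned number (0..255).
        for (byte b : binaryBytes(1234)) {
            System.out.print((b & 0xFF) + " ");
        }
        System.out.println();
        // Compare with the characters a person would type for the same number.
        System.out.println(Integer.toString(1234));
    }
}
```

Running it prints 0 0 4 210 on the first line and 1234 on the second: the four bytes are the binary form of 1234, which a text editor would display as unprintable characters rather than the digits.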
Java has another set of primitive stream classes for doing I/O
that are based on the classes Reader and
Writer. The Reader and
Writer classes implement character
streams.
Characters in Java are stored internally in Unicode, which uses 16 bits to represent one char. The actual encoding of the data stored in a text file will typically depend on the machine's locale (its default character encoding), but Java handles those details for us (in the same way that our text editors handle those details).
| Reader | Writer |
|---|---|
| read() - read a character | write() - write a character |
| close() - close the character stream | close() - close the character stream |
Just as with byte streams, we have classes that support buffered
I/O for character streams: BufferedReader and
BufferedWriter. In addition to the methods from the
Reader and Writer classes, these buffered
classes support the reading and writing of Strings
with these methods:
| BufferedReader | BufferedWriter |
|---|---|
| readLine() - read a String of characters | write() - overloaded to provide writing a String |
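A minimal sketch of the usual readLine() loop, assuming nothing beyond java.io: readLine() returns null once the input is used up. To keep the example runnable without a data file, a StringReader (a standard Reader that reads from a String) stands in for a FileReader; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLinesDemo {
    // Collects every line from a BufferedReader; readLine() returns null at end of input.
    static List<String> readAllLines(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line = reader.readLine();
        while (line != null) {
            lines.add(line);
            line = reader.readLine();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for a file so the example runs anywhere.
        BufferedReader reader = new BufferedReader(new StringReader("veni\nvidi\nvici\n"));
        for (String line : readAllLines(reader)) {
            System.out.println(line);
        }
        reader.close();
    }
}
```

Note that readLine() strips the line terminator, so each String holds just the text of its line.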
So which should we use, byte streams or character streams? When a computer's memory (both RAM and disk space) was small and costly, it was important to make files as small as possible. As memory has become cheaper and larger, this has become less of a concern.
This removes the main reason for using byte streams for reading and writing files; however, they do still have their uses. Many computer games use byte streams for their saved-game files so that we can't easily modify our games: where would the fun be if we could suddenly give ourselves billions of dollars in our game? Other programs, like word processors and spreadsheets, use byte streams to implement proprietary file formats so that it's more difficult for other programs to read your documents. You're stuck using the commercial program and buying new versions just so that you can continue using the same saved file.
However, as noted above, the output from a byte stream is not easily read by a human. Since it's us humans who have to debug the program (which involves reading the input and output files of the program), it may be more important to make the files human readable; this means using a character stream. One drawback of a character stream is that its output takes up a little more room than a byte stream's, but in this day and age of extremely cheap memory and storage, it's well worth this extra cost.
A print stream is a kind of stream that has
been extended to support the operations print() and
println(). These operations are overloaded so that
they can successfully print any object (using its
toString() method) as well as all the primitive Java
data types. In contrast to write(), the
print() methods will print the data in human readable
form. Your own experience from previous labs testifies to this; we've
used print() methods all along.
There are two kinds of print streams: PrintStream
is a kind of OutputStream (for byte streams) and
PrintWriter is a kind of Writer (for
character streams).
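As an illustration (hypothetical class and method names), this sketch prints an int and a double through a PrintWriter. A StringWriter stands in for the screen or a file so that the result can be inspected as a String; unlike writeInt(), the output is the same digits you would type.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo {
    // Uses print() and println() to produce human-readable text.
    static String describe(int count, double average) {
        StringWriter buffer = new StringWriter();   // collects the text in memory
        PrintWriter out = new PrintWriter(buffer);
        out.print("count = ");
        out.println(count);      // the int is written as decimal digits
        out.print("average = ");
        out.println(average);    // the double is written as it would be typed
        out.close();             // flushes everything into the StringWriter
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(1234, 2.5));
    }
}
```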
Input, on the other hand, is trickier than output. The best we have are the buffered streams, which provide a readLine() facility. To extract useful information from a line of text we can use two things:

- The StringTokenizer class from the java.util package will turn a line of text (a String) into tokens, chunks of text from the String separated by delimiter characters that we provide to the StringTokenizer.
- A parsing method, such as Integer.parseInt(String), will turn a String into another type of data like an int.

We saw both of these in Lab #6.
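Here is a small sketch combining the two, assuming a line whose numbers are separated by spaces and/or commas (the class and method names are made up for illustration). The second argument to the StringTokenizer constructor lists the delimiter characters.

```java
import java.util.StringTokenizer;

class TokenizeDemo {
    // Splits a line into tokens on spaces and commas, then parses each token as an int.
    static int sumOfLine(String line) {
        StringTokenizer tokens = new StringTokenizer(line, " ,");
        int sum = 0;
        while (tokens.hasMoreTokens()) {
            sum += Integer.parseInt(tokens.nextToken());  // throws NumberFormatException on bad input
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumOfLine("12, 7 3"));   // prints 22
    }
}
```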
There are three stream objects in Java defined in the System class:

- System.in, an InputStream, supports low-level, byte-stream operations. This is connected to the keyboard, and ann.easyio.Keyboard objects are just (terribly convenient) wrappers around System.in.
- System.out, a PrintStream, supports the print() methods. This is connected to the screen, and ann.easyio.Screen objects are just wrappers around System.out.
- System.err, also a PrintStream, also supports the print() methods. This is also connected to the screen, but instead of presenting normal output, System.err is used to display error messages.

While Java programmers will typically use System.out directly, they will rarely do so with System.in. Instead, they usually build objects from System.in that make input much easier.
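One common way to build such an object, sketched here with plain java.io, wraps System.in in an InputStreamReader (a standard class, not otherwise discussed in this lab, that turns bytes into characters) and then in a BufferedReader so that readLine() is available. The names KeyboardSketch and wrap are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class KeyboardSketch {
    // Wraps a byte stream (such as System.in) first in a character reader, then in a buffer.
    static BufferedReader wrap(InputStream in) {
        return new BufferedReader(new InputStreamReader(in));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader keyboard = wrap(System.in);
        System.out.print("Enter an integer: ");
        String line = keyboard.readLine();   // null if the input has ended
        if (line == null) {
            return;
        }
        int value = Integer.parseInt(line.trim());
        System.out.println("Twice that is " + (2 * value));
    }
}
```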
The last piece of our puzzle for this lab exercise is connecting up to a file. We looked at a variety of streams, but how do we go the next step to connect up a stream to a file?
Java has four basic classes for manipulating file streams:

- FileInputStream
- FileOutputStream
- FileReader
- FileWriter

Since we're interested in character streams, we'll be using the last two on this list.
Each of the file-stream classes has a constructor that accepts a
String as the name of the file to open:

new FileStreamClass(fileName)

If the file cannot be opened, the constructor will throw some
sort of IOException.
Generally, we don't directly use the objects that these constructors build for us. At the very least we have to wrap them up in a buffered stream.
We're going to use an encryption problem as an excuse to put files to use in this lab exercise.
A Caesar cipher is a simple means of encoding messages that dates from Roman times (rumored to be created by Julius Caesar himself, hence the name). For example:
| Original message | Encoded message |
|---|---|
| One if by land, two if by sea. | Rqh li eb odqg, wzr li eb vhd. |
| veni, vidi, vici | wfoj, wjej, wjdj |
Both use a Caesar cipher, although with different keys. Can you figure out the encryption? The encryption key is the number of characters that each character is "shifted".
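To make the idea concrete, here is a sketch of a letter shift (it gives away the keys used in the table above). It is a hypothetical illustration, not the lab's CaesarCipher class; read CaesarCipher.java for how encode(char) actually behaves. It assumes only the 26 English letters are shifted, wrapping from z back to a, and that everything else passes through unchanged.

```java
class CaesarSketch {
    // Shifts one letter forward by key positions, wrapping from 'z' back to 'a'.
    // Characters that are not English letters are returned unchanged.
    static char shift(char ch, int key) {
        int k = ((key % 26) + 26) % 26;   // normalize key into the range 0..25
        if (ch >= 'a' && ch <= 'z') {
            return (char) ('a' + (ch - 'a' + k) % 26);
        }
        if (ch >= 'A' && ch <= 'Z') {
            return (char) ('A' + (ch - 'A' + k) % 26);
        }
        return ch;
    }

    // Applies shift(char, int) to every character of a message.
    static String shift(String message, int key) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            result.append(shift(message.charAt(i), key));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(shift("One if by land, two if by sea.", 3));
        System.out.println(shift("veni, vidi, vici", 1));
    }
}
```

Running it prints the two encoded messages from the table.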
We're going to use Caesar ciphers to encode and decode the data in files.
Do this...

- edu.institution.username.hotj.lab10
- Exercise Questions/lab10.txt.
- Design/lab10.txt.
- Inputs/lab10.
- Outputs/lab10.
- CaesarCipher.java and EncodeCLIWADriver.java.
- alice.code and message.text into the Inputs/lab10 folder (not the package!).

The driver is a "command-line interface with arguments", hence the slightly awkward name. We start with these command-line arguments.
From a command-line prompt, you can run a Java program like this:
% java hotj.imaginary.Driver
You can add extra information on that same line like so:
% java hotj.imaginary.Driver 12 Inputs/lab10/foobar.txt Outputs/lab10/johnboy.txt
The java executable is happy so long as its first
argument is a valid Java class. But what happens to that text
after the name of the class? One possibility is that it's
thrown away, but that seems unlikely. Another possibility is that
this data is treated as the first input from the keyboard; this
perhaps seems more likely, but this too is not the case.
The words of text after the name of the class are known as
command-line arguments. The operating system saves
away command-line arguments in a special place for our program to
use. Java delivers them to us through the arguments of the
#main(String[]) method:
public static void main(String[] args) { ... }
As we saw in the previous lab exercise, String[] is
the data type for an array of Strings. This array,
which is traditionally named args, is filled with the
command-line arguments.
In the example above, args in
Driver#main(String[]) will be filled with the
Strings "12",
"Inputs/lab10/foobar.txt", and
"Outputs/lab10/johnboy.txt". The only magical thing
here is that the operating system fills the array for us
automatically and the Java Virtual Machine delivers it to the
program in this parameter.
The array args is like any other array of
Strings that we might come across, so everything we
learned about arrays in the previous lab and everything we know
about Strings apply.
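A tiny sketch that just reports what arrives in args (ArgsDemo and describe are illustrative names). Running java ArgsDemo 12 Inputs/lab10/foobar.txt Outputs/lab10/johnboy.txt should report three arguments, indexed 0 through 2.

```java
class ArgsDemo {
    // Builds a description of each command-line argument and its index.
    static String describe(String[] args) {
        StringBuilder text = new StringBuilder();
        text.append(args.length).append(" argument(s)\n");
        for (int i = 0; i < args.length; i++) {
            text.append("args[").append(i).append("] = \"").append(args[i]).append("\"\n");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

Each argument arrives as a String, even "12"; turning it into an int is a separate parsing step.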
As noted in the Javadoc comments for
EncodeCLIWADriver#main(String[]), the
usage of the EncodeCLIWADriver
program is:
java EncodeCLIWADriver key inputFile outputFile
To run the program, we must provide command-line arguments for the key, the inputFile, and the outputFile.
We want to write a program that encodes a message stored in a file. We'll write the encoded message to another file. All programs must be designed first...
Our program needs three pieces of information:
The encryption key is the key used to encode the message. Each
CaesarCipher instance requires its own encryption
key.
Behavior of EncodeCLIWADriver#main(String[])

The driver will first check to make sure there are three command-line arguments; if not, it will exit with an error message.
Otherwise, the driver will start normal processing by displaying a greeting. Then it will process the three command-line arguments:
- The first command-line argument will be parsed as an encryption key.
- The second command-line argument will be the name of the input file.
- The third command-line argument will be the name of the output file.
The driver will create a new Caesar cipher using the encryption key. The driver will also create a reader for the input file and a writer for the output file.
Then, for each character in the input file, our driver should read the character, encode it using the Caesar cipher, and then output the encoded character to the output file. The driver should conclude by closing the streams and by displaying a "success" message.
If the encryption key cannot be parsed or if any of the file I/O results in an error, the driver will print an error message and will not try to recover.
Using this behavioral description, we can identify the following objects:
Objects of EncodeCLIWADriver#main(String[])

| Description | Type | Kind | Name |
|---|---|---|---|
| command-line arguments | String[] | received | args |
| encryption key | int | varying | key |
| the name of the input file | String | varying | inputFilename |
| the name of the output file | String | varying | outputFilename |
| the input reader for the input file | BufferedReader | varying | theReader |
| the output writer for the output file | BufferedWriter | varying | theWriter |
| a "character value" from the input file | int | varying | inputValue |
| the actual character from the input file | char | varying | inputChar |
| an encoded character | char | varying | outputChar |
The object list here is biased towards Java: when a
char is read from an input stream (or input reader),
you actually read in an int. We will discuss
this later in this lab, but for now, we need both the
int that is read in and the actual char
in our object list.
Specification:
command-line arguments: candidate values for
key, inputFilename, and outputFilename
input (input file): a sequence of unencoded characters.
output (output file): a sequence of encoded characters.
output (screen): various progress messages.
From our behavioral description, we have these operations:
Operations of EncodeCLIWADriver#main(String[])

| Description | Predefined? | Name | Library |
|---|---|---|---|
| test number of command-line arguments | yes | != | built-in |
| conditionally complain | yes | if | built-in |
| exit the program | yes | System.exit(int) | java.lang |
| parse an int from a String | yes | parseInt(String) | java.lang.Integer |
| display a String | yes | println() | java.io.PrintStream |
| connect an input stream to a file | no | createReader(String) | EncodeCLIWADriver |
| connect an output stream to a file | no | createWriter(String) | EncodeCLIWADriver |
| read a character from an input stream | yes | read() | java.io.BufferedReader |
| test for end-of-file | yes | == -1 | built-in |
| encode a char using the Caesar cipher | yes | encode(char) | CaesarCipher |
| write a char to an output stream | yes | write(int) | java.io.Writer |
| repeat... | yes | forever loop | built-in |
| try to act normally, but handle problems | yes | try... catch... finally | built-in |
| disconnect a stream from a file | yes | close() | java.io.BufferedWriter and java.io.BufferedReader |
Many of these operations may be a mystery to you right now (like testing for the end-of-file); we will examine each of these operations in turn.
Organizing all of this into an algorithm takes a little more work than we're used to. Since we're dealing with operations that are likely to fail, we write the algorithm in several stages:
Algorithm of EncodeCLIWADriver#main(String[])

- If there aren't three command-line arguments:
  - Display an error message.
  - Exit the program.
- Try to execute the following:
  - Parse key from the first command-line argument.
  - Let inputFilename be the second command-line argument.
  - Let outputFilename be the third command-line argument.
  - Display a greeting.
  - Let cipher be a new Caesar cipher with encryption key key.
  - Let theReader be the reader connected to the file named inputFilename.
  - Let theWriter be the writer connected to the file named outputFilename.
  - Loop:
    - Read inputValue from the input file.
    - If end-of-file was reached, then terminate repetition.
    - Let inputChar be the char version of inputValue.
    - Let outputChar be the Caesar-cipher encoding of inputChar using cipher.
    - Write outputChar to the output file.
  - End loop.
  - Display a "success" message.

  Handle problems with parsing of key:
  - Display an appropriate message.

  Handle I/O problems:
  - Display an appropriate message.

  In all cases:
  - Close the input and output connections.
The design above asks for two support methods: one to create the input stream, another to create the output stream. These methods are grossly technical and sound a bit strange when designed, so it's better to learn them from an example:
public static BufferedReader createReader(String filename) throws IOException {
FileReader theFileReader = new FileReader(filename);
BufferedReader theBufferedReader = new BufferedReader(theFileReader);
return theBufferedReader;
}
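There are several things to notice about this method:

- A FileReader is created for the file named filename and stored in theFileReader.
- That FileReader is wrapped in a BufferedReader, theBufferedReader. This is a usable reader, and so it gets returned.
- Since the FileReader constructor can throw an IOException (specifically, a FileNotFoundException), you have to declare this in the method's signature.

Do this...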
Add this method to EncodeCLIWADriver. The code should
compile.
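To try createReader(String) on its own before wiring it into the driver, here is a hedged sketch: a hypothetical class that uses the same two-step wrapping to read the first line of a file. The firstLine helper and the usage message are illustrative additions, not part of EncodeCLIWADriver.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class CreateReaderDemo {
    // Same two-step wrapping as createReader(String) above.
    public static BufferedReader createReader(String filename) throws IOException {
        FileReader theFileReader = new FileReader(filename);   // throws if the file can't be opened
        BufferedReader theBufferedReader = new BufferedReader(theFileReader);
        return theBufferedReader;
    }

    // Hypothetical helper: returns the first line of a file, or null if the file is empty.
    static String firstLine(String filename) throws IOException {
        BufferedReader reader = createReader(filename);
        String line = reader.readLine();
        reader.close();
        return line;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java CreateReaderDemo inputFile");
            return;
        }
        System.out.println(firstLine(args[0]));
    }
}
```

If the file does not exist, createReader throws before any reader is returned, so there is nothing to close.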
Fortunately, you need to do all of these same things to create your writer. The only difference is that "reader" becomes "writer".
Do this...
Write EncodeCLIWADriver#createWriter(String), and compile your code.
Coding will be a fairly major undertaking this time around. There's a fair amount of code in this algorithm, and unfortunately the I/O processing keeps it from getting much shorter.
EncodeCLIWADriver#main(String[]) already provides
some of the structure for this algorithm, including the
try-catch for the second step (which is
probably giving you compilation errors). We'll go through the two
catchs and the finally later on.
Let's go through the steps of the algorithm, one by one.
The first step is already implemented for you. There's nothing
too surprising there since we've already looked at arrays and
if statements are old, old friends.
The System.exit(-1); statement exits the program
immediately. This is somewhat dangerous to do when we're working
with I/O streams, but since this is at the beginning of the driver
and no I/O streams have been created at this point in the program,
exiting is safe. We don't throw an exception because this is a
problem with the way the user used the program; without the proper
command-line arguments, there's nothing to do but print an error
message and quit.
The -1 argument is an arbitrary value that's sent
back to the operating system. The convention is that a program
returns a 0 when the program stops normally, without
any problems. However, if our driver doesn't get three command-line
arguments, it cannot do anything since some crucial piece of
information is missing. So we exit with a non-zero value. There
really aren't any rules for what non-zero values to use.
1 and -1 are perhaps the most common.
Our main algorithm really consists of one step, a step that tries to execute some ideal code that hopefully executes fine all of the time. But life isn't perfect, and something will inevitably go wrong. When something goes wrong, we need to identify the problem and handle it.
The ideal code goes in the try block; the problems
are recognized by the exceptions that they throw, and these are
handled by the catch blocks.
The try-catch you need is already in
EncodeCLIWADriver#main(String[]).
Since there might be multiple things that could go wrong with the code we want to try to execute, we're allowed to put down as many catch blocks as we like:

- NumberFormatException is thrown when we parse a number with a method like Integer#parseInt(String). It indicates that the String does not, in fact, represent a valid number. Since we can't do any encryption with an invalid key, we catch this exception and quit.
- IOException is thrown when an I/O operation fails. There are different types of IOExceptions, but we won't distinguish between them.

There's an extra block at the end of the try-catch statement, the finally block. While a try can have as many catchs as you like, you can have at most one finally block. The code in a finally block is always executed, whether the try finishes normally or any of the catchs are triggered.
The finally block is necessary here because in all
cases, no matter what goes right or wrong, we must always
close up the input and output streams. This is easily the most
common use for a finally block, to close up
streams.
The finally block makes it important that we
don't use System.exit() within the
try or any of the catchs since exiting
the program bypasses the important finally block.
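A small experiment (hypothetical class and method names) shows the guarantee: whether Integer.parseInt succeeds or throws, the finally block still runs.

```java
import java.util.ArrayList;
import java.util.List;

class FinallyDemo {
    // Records which blocks run when parsing text as an int.
    static List<String> trace(String text) {
        List<String> log = new ArrayList<>();
        try {
            int value = Integer.parseInt(text);
            log.add("try finished: " + value);
        } catch (NumberFormatException e) {
            log.add("catch: not a number");
        } finally {
            log.add("finally");   // runs in both cases
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace("3"));      // [try finished: 3, finally]
        System.out.println(trace("three"));  // [catch: not a number, finally]
    }
}
```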
To access the first command-line argument, access the first
element of the args array: args[0]. That's a
String, which can be passed to
Integer.parseInt(String):

Integer.parseInt(string)

This returns the int value contained in
string, a
String. (If string does
not contain a valid int, it throws a
NumberFormatException.)
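Putting the argument check and the parse together, a minimal sketch might look like this. KeyFromArgs and keyFrom are illustrative names, and the usage message is an assumption modeled on the lab's usage line; it covers only the first steps, not the whole driver.

```java
class KeyFromArgs {
    // Parses the encryption key from the first command-line argument.
    // Integer.parseInt throws NumberFormatException if args[0] is not a valid int.
    static int keyFrom(String[] args) {
        return Integer.parseInt(args[0]);
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java KeyFromArgs key inputFile outputFile");
            System.exit(1);   // safe here: no streams have been opened yet
        }
        try {
            int key = keyFrom(args);
            System.out.println("key = " + key);
        } catch (NumberFormatException e) {
            System.err.println("The key \"" + args[0] + "\" is not a valid integer.");
        }
    }
}
```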
Do this...
Write the code to declare and initialize key
from the first command-line argument.
Then two simple declarations:
Do this...
Also write the code to set inputFilename and
outputFilename from the second and third
command-line arguments, respectively. Compile your code to see how good
your syntax is so far.
This is already done for you.
Read the CaesarCipher class to see how to construct
a new CaesarCipher.
Do this...
Add the statement to declare and initialize
cipher. Compile your code to see how good
your syntax is so far.
theReader should be initialized using
EncodeCLIWADriver#createReader(String).
Do this...
Write the statement to declare and initialize
theReader.
Since this method might throw an exception (when the file can't
be found), you have to write this code inside the try.
(Actually, according to the algorithm above, a lot of this code
should already be in the try.)
Do this...
Compile your code to see how
good your syntax is so far. You should be able to compile without
errors now, but do not run anything until you've finished
up the code (i.e., closed the files).
Do this...
Write the statement to declare and initialize
theWriter using
EncodeCLIWADriver#createWriter(String). Compile the code.
When opening a writer, if the file already exists, it's erased completely and is replaced by the new output from the program. If the file doesn't exist, it's created for us. (If we want our program to be careful about overwriting old data files, we have to add extra code ourselves.)
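One way to add that extra care, sketched under the assumption that refusing to overwrite is the desired policy, is to ask java.io.File#exists() before opening the writer. OverwriteCheck and wouldOverwrite are hypothetical names.

```java
import java.io.File;

class OverwriteCheck {
    // Returns true if something named filename already exists on disk.
    static boolean wouldOverwrite(String filename) {
        return new File(filename).exists();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java OverwriteCheck outputFile");
            return;
        }
        String outputFilename = args[0];
        if (wouldOverwrite(outputFilename)) {
            System.err.println(outputFilename + " already exists; refusing to overwrite it.");
        } else {
            System.out.println("Safe to write " + outputFilename);
        }
    }
}
```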
BufferedReader

Now we're in the loop.
The first action in the loop is described as "reading from the
file", even though we are actually operating on
theReader. Colloquially we might talk about
manipulating the files, but technically in the code we're
manipulating readers and writers.
We are interested in reading one character at a time. The
BufferedReader#read() method almost does what we want.
It does read a character from the file, but it returns an
int, not a char:
intVariable = reader.read();
We need the int for the time being (for the next
step), but eventually we'll cast it into a char.
Do this...
Write the first statement of the forever loop. (The forever loop
is already written for you.) Compile your code to see how good
your syntax is so far.
Files are managed by a computer's operating system, which keeps track of where each file ends; conceptually, every file finishes with an end-of-file mark. Input operations are implemented in such a way as to prevent a program from reading beyond the end-of-file mark, since this would be a security risk.
This end-of-file mark is usually used for terminating input
loops. Java indicates the end of a file by returning
-1 from BufferedReader#read(). If your
program tries to read more input when it's at the end-of-file,
you'll either have an exception thrown (for example,
DataInputStream#readInt() throws an EOFException), or the
end-of-file value is just returned over and over again
(BufferedReader#read() keeps returning -1).
So testing "if end-of-file was reached" is a matter of testing
if our inputValue is -1.
Do this...
Write an if-break statement to terminate
the loop when the end-of-file is reached. Compile your code to see how good
your syntax is so far.
char, Not an int

Now that we have tested for the end-of-file, we can turn the
int into a char. This is done by a
data cast:
(Type)value
For our problem, our Type is
char and value is
inputValue. This is the initialization
expression for declaring inputChar.
Do this...
Write the declaration for inputChar. Compile your code to see how good
your syntax is so far.
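Here is a hedged sketch of the read, test, and cast steps working together. A StringReader stands in for the input file, and upper-casing stands in for the Caesar encoding, so the example runs without CaesarCipher; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadLoopDemo {
    // Reads one character at a time until read() returns -1 (end of input),
    // casting each int to a char before using it.
    static String copyUpperCase(BufferedReader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        while (true) {
            int inputValue = reader.read();
            if (inputValue == -1) {
                break;                            // end-of-file reached
            }
            char inputChar = (char) inputValue;   // data cast from int to char
            result.append(Character.toUpperCase(inputChar));
        }
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("veni, vidi, vici"));
        System.out.println(copyUpperCase(reader));   // prints VENI, VIDI, VICI
        reader.close();
    }
}
```

The test for -1 happens before the cast on purpose: (char) -1 would quietly become a valid-looking character ('\uffff') instead of signaling the end of the file.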
Now give this char to
cipher.
Do this...
Write the declaration for outputChar,
initialized to invoking encode(char) on
cipher and
inputChar.
We need to be able to write a character to the output file.
BufferedWriter#write(int) writes a single character; when we pass
it a char, the char is automatically widened to an int, so this is
exactly what we need.
The general form is this:

writer.write(charValue);
Do this...
Write the last statement of the loop. Compile your code to see how good
your syntax is so far.
Helpful hint: Other languages say that closing files is important, but it's often done for you. Not in Java! If you forget to close an output file, the output file may very well be empty when you run your program! It is your responsibility to close all of your files.
Once we are done using a stream to read from or write to a file, we must close it. This is particularly important for a buffered writer, where data still sitting in the buffer may be discarded if the program ends before that data is written to the file. To force a buffered writer to write its buffer to the actual file, you must flush the buffer. Closing a writer flushes its output buffer and takes care of some housekeeping details, so closing is important no matter what stream, reader, or writer you're using.
Closing a stream, writer, or reader is done the same way:
stream.close();
When execution reaches this statement, the program severs its
connection to stream,
flushing buffers and telling the operating system to tie up any
open files.
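The effect of buffering is easy to observe with a sketch (hypothetical class name) that writes to a BufferedWriter wrapped around a StringWriter, a standard Writer that collects text in memory and here stands in for a file. As long as the text fits in the writer's buffer, nothing reaches the StringWriter until the buffered writer is flushed or closed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

class FlushDemo {
    // Returns what the destination holds before and after the writer is closed.
    static String[] beforeAndAfter() throws IOException {
        StringWriter destination = new StringWriter();   // stands in for a file
        BufferedWriter writer = new BufferedWriter(destination);
        writer.write('R');     // write(int): the char is widened to an int
        writer.write("qh");    // write(String)
        String before = destination.toString();   // still empty: the text is in the buffer
        writer.close();                           // flushes the buffer, then closes
        String after = destination.toString();    // now holds "Rqh"
        return new String[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        String[] states = beforeAndAfter();
        System.out.println("before close: \"" + states[0] + "\"");
        System.out.println("after close:  \"" + states[1] + "\"");
    }
}
```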
Looking at the algorithm above, note that closing the files is
done in the finally block. This actually takes a
little more code than is immediately obvious.
- theReader and theWriter must be declared before the try. Otherwise, the compiler won't see them in the finally block. You have to move the declarations of these variables outside the try block, but they must be initialized inside the try block.
- Initialize theReader and theWriter to null when they're declared, or the compiler gets anxious that they might not have a value.
- In the finally block we have to test whether they are null before we close them; otherwise we might get a NullPointerException.
- We have to catch any IOException that the close() method might throw.

Do this...
Follow all of those steps in writing the code for the
finally block.
Here's a pattern for you to follow for closing a stream (the
last two steps above):
try {
if (stream != null)
stream.close();
}
catch (IOException e) {
System.err.println("Problems closing a stream.");
e.printStackTrace();
}
This code goes inside the finally block of
the original try. You can put other
if-close statements inside this
try to close other streams. The exception handling
code will print an error and display a stack trace for
debugging.
This is quite easily some of the ugliest code you'll ever have to write. But there's little we can do about it. We have to do these things in these places, otherwise Really Bad Things happen.
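Here is a sketch of the whole pattern in one place: declaring the reader as null before the try, initializing it inside, and closing it in the finally with its own try-catch. To avoid giving away the driver, it counts the characters in a file instead of encoding them; CloseInFinallyDemo and countCharacters are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class CloseInFinallyDemo {
    // Counts the characters in a file; returns -1 if the file can't be read.
    static int countCharacters(String filename) {
        BufferedReader theReader = null;   // declared before the try so finally can see it
        int count = 0;
        try {
            theReader = new BufferedReader(new FileReader(filename));   // initialized inside the try
            while (theReader.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            System.err.println("Problems reading " + filename + ": " + e.getMessage());
            count = -1;
        } finally {
            try {
                if (theReader != null)     // null if the file never opened
                    theReader.close();
            } catch (IOException e) {
                System.err.println("Problems closing a stream.");
                e.printStackTrace();
            }
        }
        return count;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java CloseInFinallyDemo inputFile");
            return;
        }
        System.out.println(countCharacters(args[0]));
    }
}
```

Since Java 7, a try-with-resources statement (try (BufferedReader r = ...) { ... }) closes the reader automatically; the explicit version above is the one this lab asks for.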
Do this...
Compile your code, but don't
run it quite yet.
Our solution for handling problems is to display reprimanding
messages to the user on System.err.
Do this...
Add statements to the catch blocks to reprimand the
user appropriately.
Do this...
Now, finally, compile and run your driver program. (Reread the run page to find out how to pass
command-line arguments to your program!) Remember to pass in the
key, input file name, and output file name as arguments to the
program from the command line.
Specify the input and output filenames relative to the main
project folder. That is, your input filename should be
Inputs/lab10/message.text, and the output file should
be Outputs/lab10/message.encode. This will keep the
main project folder less cluttered.
If your code is correct, running it with a key of 3 should create an output file containing this output:
Rqh Li Eb Odqg Wzr Li Eb Vhd
Also try running your program on a file that doesn't exist. It should exit gracefully with your reprimand.
Now apply all of this knowledge to the problem of decoding a
file encoded using a Caesar cipher. This program, implemented in a
class DecodeCLIWADriver, should be a dual to
EncodeCLIWADriver. In fact,
DecodeCLIWADriver should look a lot like
EncodeCLIWADriver.
This is definitely a problem where you should think about the problem first to figure out how little work you should do.
To test your DecodeCLIWADriver#main(String[])
program, you can use the output file created by
EncodeCLIWADriver, using the same encryption key.
Be careful with your file names!!!! Remember that output
files are overwritten without any complaints. It's
strongly advised that you name your decoded files with a
.decode suffix or something else distinctive.
Also included is alice.code, an encoded selection
from Lewis Carroll's Alice in Wonderland. Experiment with
different encryption keys to figure out the proper encryption
key.
Submit the following for this lab exercise:

- message.text with three different encryption keys.
- message.text.
- alice.code.

Terminology: buffered input stream, buffered output stream, buffering, Caesar cipher, character streams, command-line arguments, data cast, encryption key, file, finally block, flush a buffer, input buffer, output buffer, parsing method, persistent storage, print stream, program usage, stream, token, wrapper class