Throughout this manual, we have made use of files---data stored in an organized way on some persistent storage. Each source file that we have written has been stored in a file; the compiled byte-code of a class is stored in a class file.
In this lab exercise we will take a closer look at the I/O facilities provided by Java that allow us to add file manipulation to our programs.
Input and output is viewed as a stream in Java. The terminology is deliberate: picture your program as someone who controls a stream of water; the program can turn the stream on and off; the program can watch everything that goes past.
We've actually been dealing with data streams all along: the Screen and Keyboard classes create stream objects that connect a program up to the screen and keyboard of the user. Whenever a program asks a Keyboard object to read in a double (with Keyboard.readDouble()), the program asks the stream to flow just enough to read in the floating-point number it expects from the keyboard.
The most basic stream classes are InputStream and OutputStream. These classes are very limited; though there are a few other operations that are supported, they basically only know how to read and write bytes (8 bits of data).
InputStream | OutputStream |
---|---|
read() - read a byte | write() - write a byte |
close() - close the input stream | close() - close the output stream |
Generally we like to think of our data at least in terms of ints, doubles, etc. Individual bytes just really aren't that useful except to build these other types of data. Fortunately, we don't have to provide this behavior ourselves; there are other classes that do it for us.
Another disadvantage of the basic byte streams is that they perform an I/O operation with every read or write. This can be a serious problem if we are accessing a hard disk or a network drive which has slow access times. To improve performance, most systems use some kind of buffering.
Java implements two classes, BufferedInputStream and BufferedOutputStream, to create buffered input and output streams. They don't supply any additional operations, but are more efficient.
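The effect of buffering can be made visible with a small experiment. The following is only a sketch: BufferingSketch and CountingSink are made-up names, and the "slow device" is simulated by an output stream that merely counts how many write operations reach it.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; the names here are not part of the lab's code.
class BufferingSketch {
    // A stand-in for a slow device: it only counts how often it is asked to write.
    static class CountingSink extends OutputStream {
        int operations = 0;

        @Override
        public void write(int b) {
            operations++;
        }

        @Override
        public void write(byte[] data, int offset, int length) {
            operations++;
        }
    }

    // Writes howMany single bytes and reports how many operations reached the sink.
    static int operationsNeeded(int howMany, boolean buffered) {
        CountingSink sink = new CountingSink();
        OutputStream out = buffered ? new BufferedOutputStream(sink) : sink;
        try {
            for (int i = 0; i < howMany; i++) {
                out.write(i);
            }
            out.close();   // closing flushes whatever is still in the buffer
        } catch (IOException e) {
            // The counting sink never fails, so this is only here to satisfy the compiler.
            throw new UncheckedIOException(e);
        }
        return sink.operations;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + operationsNeeded(1000, false));
        System.out.println("buffered:   " + operationsNeeded(1000, true));
    }
}
```

Without the buffer, every one of the 1000 single-byte writes reaches the sink. With the buffer, the bytes are collected in memory and handed over together when the stream is closed.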
Java has classes that implement streams for processing different kinds of data, other than just bytes: the DataInputStream and the DataOutputStream. They can read and write any of the primitive data types that Java supports. In addition to the operations from the byte stream classes, these classes add a few more operations:
DataInputStream | DataOutputStream |
---|---|
readBoolean() - read a boolean | writeBoolean() - write a boolean |
readByte() - read a byte | writeByte() - write a byte |
readChar() - read a char | writeChar() - write a char |
readDouble() - read a double | writeDouble() - write a double |
readFloat() - read a float | writeFloat() - write a float |
readInt() - read an int | writeInt() - write an int |
readLong() - read a long | writeLong() - write a long |
readShort() - read a short | writeShort() - write a short |
These are still not the classes and behaviors that we're looking for, even though their names are really, really tempting. The writing operations will record the binary representation of the data, not the character representation that we're used to reading and typing. It's nearly impossible for humans to read the output from these streams, which makes debugging very difficult.
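To see the difference concretely, here is a minimal sketch that produces the binary form and the text form of the same int. The class BinaryVersusText and its methods are invented for this illustration; the data is written to an in-memory byte array rather than a file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the lab's code.
class BinaryVersusText {
    // Returns the bytes that DataOutputStream#writeInt(int) produces for value.
    static byte[] binaryBytes(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(sink);
            out.writeInt(value);   // an int is always written as four bytes
            out.close();
        } catch (IOException e) {
            // An in-memory sink should not fail, so we simply rethrow.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Returns the characters a human would type or read for the same value.
    static String textForm(int value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        byte[] raw = binaryBytes(1234);
        System.out.println("binary form uses " + raw.length + " bytes");
        for (int i = 0; i < raw.length; i++) {
            System.out.println("  byte " + i + " = " + raw[i]);
        }
        System.out.println("text form is \"" + textForm(1234) + "\"");
    }
}
```

The four bytes are the internal representation of the number, not the digits 1, 2, 3, and 4, which is why a text editor shows such a file as gibberish.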
Java has another set of primitive stream classes for doing I/O that are based on the classes Reader and Writer. The Reader and Writer classes implement character streams.
Characters in Java are stored internally in Unicode, using 16 bits for each char. The actual representation of the data that is stored in a text file will typically depend on the locale of the machine, but Java handles those details for us (in the same way that our text editors handle those details).
Reader | Writer |
---|---|
read() - read a character | write() - write a character |
close() - close the character stream | close() - close the character stream |
Just as with byte streams, we have classes that support buffered I/O for character streams: BufferedReader and BufferedWriter. In addition to the methods from the Reader and Writer classes, these buffered classes support the reading and writing of Strings with these methods:
BufferedReader | BufferedWriter |
---|---|
readLine() - read a String of characters | write() - overloaded to provide writing a String |
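As a small illustration of readLine(), the sketch below counts the lines that a reader delivers. ReadLineSketch is a made-up name, and a StringReader stands in for a file so that the example needs nothing on disk. readLine() returns null once there are no more lines, which is what ends the loop.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the lab's code.
class ReadLineSketch {
    // Wraps any Reader in a BufferedReader and counts the lines it delivers.
    static int countLines(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        int lines = 0;
        try {
            while (reader.readLine() != null) {   // null means "no more lines"
                lines++;
            }
            reader.close();
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(countLines(new StringReader("first\nsecond\nthird\n")));
    }
}
```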
So which should we use, byte streams or character streams? When a computer's memory (both RAM and disk space) was small and costly, it was important to make files as small as possible. But memory gets cheaper and larger, so this has become less of a concern.
This removes the main reason for using byte streams for reading and writing files; however, they do still have their uses. Most computer games use byte streams for their saved-game files so that we can't modify our games; what would be the fun if we could suddenly give ourselves billions of dollars in our game? Other programs, like word processors and spreadsheets, will use byte streams to implement proprietary file formats so that it's more difficult for other programs to read your documents. You're stuck using the commercial program and buying new versions just so that you can continue using the same saved file.
However, as noted above, the output from a byte stream is not easily read by a human. Since it's us humans who have to debug the program (which involves reading the input and output files of the program), it may be more important to make the files human readable; this means using a character stream. A problem with a character stream is that its output takes up a little more room than a byte stream, but in this day and age of extremely cheap memory and storage, it's well worth this extra cost.
A print stream is a kind of stream that has been extended to support the operations print() and println(). These operations are overloaded so that they can successfully print any object (using its toString() method) as well as all the primitive Java data types. In contrast to write(), the print() methods will print the data in human readable form. Your own experience from previous labs testifies to this; we've used print() methods all along.
There are two kinds of print streams: PrintStream is a kind of OutputStream (for byte streams) and PrintWriter is a kind of Writer (for character streams).
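A short sketch of the character-stream variant follows. PrintSketch and render are hypothetical names; a StringWriter collects the output so that we can look at exactly which characters the print() methods produced.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class; not part of the lab's code.
class PrintSketch {
    // Prints an int, a String, and a double, and returns the resulting text.
    static String render(int count, double price) {
        StringWriter sink = new StringWriter();
        PrintWriter out = new PrintWriter(sink);
        out.print(count);          // overloaded for int
        out.print(" items at ");   // overloaded for String
        out.print(price);          // overloaded for double
        out.close();
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(3, 2.5));
    }
}
```

Each value arrives in the sink as the characters we would expect to read, which is the behavior the write() methods of the data streams do not give us.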
Input, on the other hand, is trickier than output. The best we have are the buffered streams which provide a readLine() facility. To extract useful information from a line of text we can use two things:
- The StringTokenizer class from the java.util package will turn a line of text (a String) into tokens, chunks of text from the String based on a pattern that we provide to the StringTokenizer.
- A parsing method will turn a String into another type of data like an int.
We saw both of these in Lab #6.
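As a reminder of how the two fit together, here is a minimal sketch that adds up the integers on one line of text. TokenSketch and sumLine are made-up names; the tokenizer is used with its default delimiters (whitespace), and Integer.parseInt(String) serves as the parsing method.

```java
import java.util.StringTokenizer;

// Hypothetical demo class; not part of the lab's code.
class TokenSketch {
    // Adds up the whitespace-separated integers on one line of text.
    static int sumLine(String line) {
        StringTokenizer tokens = new StringTokenizer(line);   // splits on whitespace by default
        int sum = 0;
        while (tokens.hasMoreTokens()) {
            String token = tokens.nextToken();
            sum += Integer.parseInt(token);   // throws NumberFormatException for a bad token
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumLine("12 30 -2"));
    }
}
```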
There are three stream objects in Java defined in the System class:
- System.in, an InputStream, supports low-level, byte-stream operations. This is connected to the keyboard, and ann.easyio.Keyboard objects are just (terribly convenient) wrappers around System.in.
- System.out, a PrintStream, supports the print() methods. This is connected to the screen, and ann.easyio.Screen objects are just wrappers around System.out.
- System.err, also a PrintStream, also supports the print() methods. This is also connected to the screen, but instead of presenting normal output, System.err is used to display error messages.
While Java programmers will typically use System.out directly, they will rarely do so with System.in. Instead, they usually build objects from System.in that make input much easier.
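One common way to build such an object is to wrap the byte stream in an InputStreamReader and then in a BufferedReader. The sketch below shows the wrapping; StandardInputSketch and firstLine are hypothetical names, and main uses an in-memory stream as a pretend keyboard so that the example runs without waiting for input. Passing System.in instead would read from the real keyboard.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the lab's code.
class StandardInputSketch {
    // Wraps a byte stream (such as System.in) and reads one line of text from it.
    static String firstLine(InputStream in) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in));
        try {
            return reader.readLine();   // null if the stream is already at its end
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A stand-in for the keyboard; replace with System.in to read real input.
        InputStream pretendKeyboard = new ByteArrayInputStream("3.14\nignored\n".getBytes());
        System.out.println(firstLine(pretendKeyboard));
    }
}
```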
The last piece of our puzzle for this lab exercise is connecting up to a file. We looked at a variety of streams, but how do we go the next step to connect up a stream to a file?
Java has four basic classes for manipulating file streams.
FileInputStream
FileOutputStream
FileReader
FileWriter
Since we're interested in character streams, we'll be using the last two on this list.
Each of the file-stream classes has a constructor that accepts a String as the name of the file to open:
new FileStreamClass(fileName)
If the file cannot be opened, the constructor will throw some sort of an IOException.
Generally, we don't directly use the objects that these constructors build for us. At the very least we have to wrap them up in a buffered stream.
We're going to use an encryption problem as an excuse to put files to use in this lab exercise.
A Caesar cipher is a simple means of encoding messages that dates from Roman times (rumored to be created by Julius Caesar himself, hence the name). For example:
Original message | Encoded message |
---|---|
One if by land, two if by sea. | Rqh li eb odqg, wzr li eb vhd. |
veni, vidi, vici | wfoj, wjej, wjdj |
Both use a Caesar cipher, although with different keys. Can you figure out the encryption? The encryption key is the number of characters that each character is "shifted".
We're going to use Caesar ciphers to encode and decode the data in files.
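The shifting itself can be sketched in a few lines. This is not the CaesarCipher class supplied with the lab; CaesarSketch, shift, and encode are invented names, and the sketch assumes that only the letters a-z and A-Z are shifted (wrapping around at the end of the alphabet) while every other character is passed through unchanged.

```java
// Hypothetical demo class; the lab's own CaesarCipher class may work differently.
class CaesarSketch {
    // Shifts one letter by key positions, wrapping around the alphabet.
    // Assumption: characters that are not letters are returned unchanged.
    static char shift(char ch, int key) {
        if (ch >= 'a' && ch <= 'z') {
            return (char) ('a' + Math.floorMod(ch - 'a' + key, 26));
        }
        if (ch >= 'A' && ch <= 'Z') {
            return (char) ('A' + Math.floorMod(ch - 'A' + key, 26));
        }
        return ch;
    }

    // Encodes a whole message one character at a time.
    static String encode(String message, int key) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            result.append(shift(message.charAt(i), key));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("abc xyz", 3));
    }
}
```

With this scheme, decoding is just encoding with the negated key, because Math.floorMod keeps the result inside the alphabet for negative shifts as well.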
Do this...
- edu.institution.username.hotj.lab10
- Exercise Questions/lab10.txt
- Design/lab10.txt
- Inputs/lab10
- Outputs/lab10
- CaesarCipher.java and EncodeCLIWADriver.java
- alice.code and message.text go into the Inputs/lab10 folder (not the package!).
The driver is a "command-line interface with arguments", hence the slightly awkward name. We start with these command-line arguments.
From a command-line prompt, you can run a Java program like this:
% java hotj.imaginary.Driver
You can add extra information on that same line like so:
% java hotj.imaginary.Driver 12 Inputs/lab10/foobar.txt Outputs/lab10/johnboy.txt
The java executable is happy so long as its first argument is a valid Java class. But what happens to that text after the name of the class? One possibility is that it's thrown away, but that seems unlikely. Another possibility is that this data is treated as the first input from the keyboard; this perhaps seems more likely, but this too is not the case.
The words of text after the name of the class are known as command-line arguments. The operating system saves away command-line arguments in a special place for our program to use. Java delivers them to us through the arguments of the main(String[]) method:
public static void main(String[] args) { ... }
As we saw in the previous lab exercise, String[] is the data type for an array of Strings. This array, which is traditionally named args, is filled with the command-line arguments.
In the example above, args in Driver#main(String[]) will be filled with the Strings "12", "Inputs/lab10/foobar.txt", and "Outputs/lab10/johnboy.txt". The only magical thing here is that the operating system fills the array for us automatically and the Java Virtual Machine delivers it to the program in this parameter.
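A tiny program makes this easy to try out. ArgsSketch and describe are made-up names for this illustration; the program simply lists whatever it finds in args.

```java
// Hypothetical demo class; not part of the lab's code.
class ArgsSketch {
    // Builds one line of text per command-line argument.
    static String describe(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            text.append("args[" + i + "] = " + args[i] + "\n");
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(args.length + " command-line argument(s)");
        System.out.print(describe(args));
    }
}
```

Running it with the three arguments from the example above should list "12" as args[0] and the two file names as args[1] and args[2]. Note that the name of the class itself is not part of the array.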
The array args is like any other array of Strings that we might come across, so everything we learned about arrays in the previous lab and everything we know about Strings applies.
As noted in the Javadoc comments for EncodeCLIWADriver#main(String[]), the EncodeCLIWADriver program should be executed like this:
java EncodeCLIWADriver key inputFile outputFile
To run the program, we must provide command-line arguments for the key, the inputFile, and the outputFile.
We want to write a program that encodes a message stored in a file. We'll write the encoded message to another file. All programs must be designed first...
Our program needs three pieces of information: the encryption key, the name of the input file, and the name of the output file.
The encryption key is the key used to encode the message. Each CaesarCipher instance requires its own encryption key.
Behavior of EncodeCLIWADriver#main(String[])
The driver will first check to make sure there are three command-line arguments; if not, it will exit with an error message.
Otherwise, the driver will start normal processing by displaying a greeting. Then it will process the three command-line arguments:
- The first command-line argument will be parsed as an encryption key.
- The second command-line argument will be the name of the input file.
- The third command-line argument will be the name of the output file.
The driver will create a new Caesar cipher using the encryption key. The driver will also create a reader for the input file and a writer for the output file.
Then, for each character in the input file, our driver should read the character, encode it using the Caesar cipher, and then output the encoded character to the output file. The driver should conclude by closing the streams and by displaying a "success" message.
If the encryption key cannot be parsed or if any of the file I/O results in an error, the driver will print an error message and will not try to recover.
Using this behavioral description, we can identify the following objects:
Objects of EncodeCLIWADriver#main(String[])
Description | Type | Kind | Name |
---|---|---|---|
command-line arguments | String[] | received | args |
encryption key | int | varying | key |
the name of the input file | String | varying | inputFilename |
the name of the output file | String | varying | outputFilename |
the input reader for the input file | BufferedReader | varying | theReader |
the output writer for the output file | BufferedWriter | varying | theWriter |
a "character value" from the input file | int | varying | inputValue |
the actual character from the input file | char | varying | inputChar |
an encoded character | char | varying | outputChar |
The object list here is biased towards Java: when a char is read from an input stream (or input reader), you actually read in an int. We will discuss this later in this lab, but for now, we need both the int read in and the actual char in our object list.
Specification:
command-line arguments: candidate values for key, inputFilename, outputFilename
input (input file): a sequence of unencoded characters.
output (output file): a sequence of encoded characters.
output (screen): various progress messages.
From our behavioral description, we have these operations:
Operations of EncodeCLIWADriver#main(String[])
Description | Predefined? | Name | Library |
---|---|---|---|
test number of command-line arguments | yes | != | built-in |
conditionally complain | yes | if | built-in |
exit the program | yes | System.exit(int) | java.lang |
parse an int from a String | yes | parseInt(String) | java.lang.Integer |
display a String | yes | println() | java.io.PrintStream |
connect an input stream to a file | no | createReader(String) | EncodeCLIWADriver |
connect an output stream to a file | no | createWriter(String) | EncodeCLIWADriver |
read a character from an input stream | yes | read() | java.io.BufferedReader |
test for end-of-file | yes | == -1 | built-in |
encode a char using the Caesar cipher | yes | encode(char) | CaesarCipher |
write a char to an output stream | yes | write(int) | java.io.Writer |
repeat... | yes | forever loop | built-in |
try to act normally, but handle problems | yes | try... catch... finally | built-in |
disconnect a stream from a file | yes | close() | java.io.BufferedWriter and java.io.BufferedReader |
Many of these operations may be a mystery to you right now (like testing for the end-of-file); we will examine each of these operations in turn.
Organizing all of this into an algorithm takes a little more work than we're used to. Since we're dealing with operations that are likely to fail, we write the algorithm in several stages:
Algorithm of EncodeCLIWADriver#main(String[])
- If there aren't three command-line arguments
  - Display an error message.
  - Exit the program.
- Try to execute the following:
  - Parse key from first command-line argument.
  - Let inputFilename be the second command-line argument.
  - Let outputFilename be the third command-line argument.
  - Display a greeting.
  - Let cipher be a new Caesar cipher with encryption key key.
  - Let theReader be the reader connected to the file named inputFilename.
  - Let theWriter be the writer connected to the file named outputFilename.
  - Loop:
    - Read inputValue from the input file.
    - If end-of-file was reached, then terminate repetition.
    - Let inputChar be the char version of inputValue.
    - Let outputChar be the Caesar-cipher encoding of inputChar using cipher.
    - Write outputChar to the output file.
    End loop.
  - Display a "success" message.
  Handle problems with parsing of key:
  - Display an appropriate message.
  Handle I/O problems:
  - Display an appropriate message.
  In all cases:
  - Close the input and output connections.
The design above asks for two support methods: one to create the input stream, another to create the output stream. These methods are grossly technical, and sound a bit strange when designed. It's better learning these from an example:
    public static BufferedReader createReader(String filename) throws IOException {
        FileReader theFileReader = new FileReader(filename);
        BufferedReader theBufferedReader = new BufferedReader(theFileReader);
        return theBufferedReader;
    }
There are several things to notice about this method:
- It first opens the file by constructing a FileReader, theFileReader.
- It then wraps theFileReader in a BufferedReader, theBufferedReader. This is a usable reader, and so it gets returned.
- Since the FileReader constructor throws an IOException, you have to declare this in the method's signature.
Do this...
Add this method to EncodeCLIWADriver. The code should compile.
Fortunately, you need to do all of these same things to create your writer. The only difference is that "reader" becomes "writer".
Do this...
Write EncodeCLIWADriver#createWriter(String), and compile your code.
Coding will be a fairly major undertaking this time around. There's a fair bit of code in this algorithm, and unfortunately the I/O processing keeps us from shrinking it much.
EncodeCLIWADriver#main(String[]) already provides some of the structure for this algorithm, including the try-catch for the second step (which is probably giving you compilation errors). We'll go through the two catches and the finally later on.
Let's go through the steps of the algorithm, one by one.
The first step is already implemented for you. There's nothing too surprising there since we've already looked at arrays and if statements are old, old friends.
The System.exit(-1); statement exits the program immediately. This is somewhat dangerous to do when we're working with I/O streams, but since this is at the beginning of the driver and no I/O streams have been created at this point in the program, exiting is safe. We don't throw an exception because this is a problem with the way the user used the program; without the proper command-line arguments, there's nothing to do but print an error message and quit.
The -1 argument is an arbitrary value that's sent back to the operating system. The convention is that a program returns a 0 when the program stops normally, without any problems. However, if our driver doesn't get three command-line arguments, it cannot do anything since some crucial piece of information is missing. So we exit with a non-zero value. There really aren't any rules for what non-zero values to use. 1 and -1 are perhaps the most common.
Our main algorithm really consists of one step, a step that tries to execute some ideal code that hopefully executes fine all of the time. But life isn't perfect, and something will inevitably go wrong. When something goes wrong, we need to identify the problem and handle it.
The ideal code goes in the try block; the problems are recognized by the exceptions that they throw, and these are handled by the catch blocks.
The try-catch you need is already in EncodeCLIWADriver#main(String[]).
Since there might be multiple things that could go wrong with the code we want to try to execute, we're allowed to put down as many catch blocks as we like:
- NumberFormatException is thrown when we parse a number with a method like Integer#parseInt(String). It indicates that the String does not, in fact, represent a valid number. Since we can't do any encryption with an invalid key, we catch this exception and quit.
- IOException is thrown when an I/O operation fails. There are different types of IOExceptions, but we won't distinguish between them.
There's an extra block at the end of the try-catch statement, the finally block. While a try can have as many catches as you like, you can have at most one finally block. The code in a finally block is always executed, no matter if the try finishes normally or if any of the catches are triggered.
The finally block is necessary here because in all cases, no matter what goes right or wrong, we must always close up the input and output streams. This is easily the most common use for a finally block, to close up streams.
The finally block makes it important that we don't use System.exit() within the try or any of the catches since exiting the program bypasses the important finally block.
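The order in which the blocks run is easy to observe with a small experiment. In this sketch (FinallySketch and trace are hypothetical names), each block leaves a note when it runs, and the parsing of a String stands in for the "ideal code" of the driver.

```java
// Hypothetical demo class; not part of the lab's code.
class FinallySketch {
    // Records which blocks run while trying to parse text as an int.
    static String trace(String text) {
        StringBuilder steps = new StringBuilder();
        try {
            Integer.parseInt(text);     // may throw NumberFormatException
            steps.append("try ");       // reached only if parsing succeeded
        } catch (NumberFormatException e) {
            steps.append("catch ");     // reached only if parsing failed
        } finally {
            steps.append("finally");    // reached in both cases
        }
        return steps.toString();
    }

    public static void main(String[] args) {
        System.out.println(trace("42"));
        System.out.println(trace("forty-two"));
    }
}
```

Whether the parse succeeds or fails, the note from the finally block is always the last one recorded.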
To access the first command-line argument, access the first element of the args array: args[0]. That's a String which can be passed to Integer.parseInt(String):
Integer.parseInt(string)
This returns the int value contained in string, a String. (If string does not contain a valid int, it throws the NumberFormatException.)
Do this...
Write the code to declare and initialize key from the first command-line argument.
Then two simple declarations:
Do this...
Also write the code to set inputFilename and outputFilename from the second and third command-line arguments, respectively. Compile your code to see how good your syntax is so far.
This is already done for you.
Read the CaesarCipher class to see how to construct a new CaesarCipher.
Do this...
Add the statement to declare and initialize cipher. Compile your code to see how good your syntax is so far.
theReader should be initialized using EncodeCLIWADriver#createReader(String).
Do this...
Write the statement to declare and initialize theReader.
Since this method might throw an exception (when the file can't be found), you have to write this code inside the try. (Actually, according to the algorithm above, a lot of this code should already be in the try.)
Do this...
Compile your code to see how good your syntax is so far. You should be able to compile without errors now, but do not run anything until you've finished up the code (i.e., closed the files).
Do this...
Write the statement to declare and initialize theWriter using EncodeCLIWADriver#createWriter(String). Compile the code.
When opening a writer, if the file already exists, it's erased completely and is replaced by the new output from the program. If the file doesn't exist, it's created for us. (If we want our program to be careful about overwriting old data files, we have to add extra code ourselves.)
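What such "extra code" might look like is sketched below. The lab does not ask for it; OverwriteGuardSketch, wouldOverwrite, and openCarefully are invented names, and refusing to open an existing file is just one possible policy.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical demo class; the lab's driver does not need this check.
class OverwriteGuardSketch {
    // Reports whether opening a writer on filename would replace something that exists.
    static boolean wouldOverwrite(String filename) {
        return new File(filename).exists();
    }

    // Opens a writer only when nothing would be lost; otherwise complains.
    static FileWriter openCarefully(String filename) throws IOException {
        if (wouldOverwrite(filename)) {
            throw new IOException(filename + " already exists; refusing to overwrite it.");
        }
        return new FileWriter(filename);
    }

    public static void main(String[] args) {
        // Only reports on the names given; no file is created or changed here.
        for (int i = 0; i < args.length; i++) {
            System.out.println(args[i] + (wouldOverwrite(args[i]) ? " exists" : " is free"));
        }
    }
}
```

Because openCarefully throws an IOException, a driver that used it could report the problem in the same catch block that handles its other I/O failures.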
BufferedReader
Now we're in the loop.
The first action in the loop is described as "reading from the file", even though we are actually operating on theReader. Colloquially we might talk about manipulating the files, but technically in the code we're manipulating readers and writers.
We are interested in reading one character at a time. The BufferedReader#read() method almost does what we want. It does read a character from the file, but it returns an int, not a char:
intVariable = reader.read();
We need the int for the time being (for the next step), but eventually we'll cast it into a char.
Do this...
Write the first statement of the forever loop. (The forever loop is already written for you.) Compile your code to see how good your syntax is so far.
Files are created by a computer's operating system. When the operating system creates a file, it marks the end of the file with a special end-of-file mark. Input operations are implemented in such a way as to prevent a program from reading beyond the end-of-file mark since this would be a security risk.
This end-of-file mark is usually used for terminating input loops. Java indicates the end of a file by returning -1 from BufferedReader#read(). If your program tries to read more input when it's at the end-of-file, you'll either have an exception thrown, or the end-of-file value is just returned over and over again.
So testing "if end-of-file was reached" is a matter of testing if our inputValue is -1.
Do this...
Write an if-break statement to terminate the loop when the end-of-file is reached. Compile your code to see how good your syntax is so far.
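The same read-then-test shape shows up in any loop that consumes a reader one character at a time. The sketch below counts characters instead of encoding them; EndOfFileSketch and countChars are made-up names, and a StringReader stands in for a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the lab's code.
class EndOfFileSketch {
    // Counts the characters a reader delivers before read() reports end-of-file.
    static int countChars(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        int count = 0;
        try {
            while (true) {
                int value = reader.read();   // an int, so that -1 can be told apart from a char
                if (value == -1) {
                    break;                   // end-of-file: leave the loop
                }
                count++;
            }
            reader.close();
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrown to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars(new StringReader("veni")));
    }
}
```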
char, Not an int
Now that we have tested for the end-of-file, we can turn the int into a char. This is done by a data cast:
(Type) value
For our problem, our Type is char and value is inputValue. This is the initialization expression for declaring inputChar.
Do this...
Write the declaration for inputChar. Compile your code to see how good your syntax is so far.
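A quick sketch of the cast in isolation, with the hypothetical names CastSketch and toChar:

```java
// Hypothetical demo class; not part of the lab's code.
class CastSketch {
    // Converts the int delivered by read() into the char it stands for.
    static char toChar(int inputValue) {
        return (char) inputValue;
    }

    public static void main(String[] args) {
        System.out.println(toChar(82));    // 82 is the Unicode value of 'R'
        System.out.println((int) 'R');     // the cast works in the other direction too
    }
}
```

The order of the steps matters: once -1 has been cast to a char it becomes the character '\uffff' and no longer compares equal to -1, so the end-of-file test has to be done on the int.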
Now give this char to cipher.
Do this...
Write the declaration for outputChar, initialized to invoking encode(char) on cipher and inputChar.
We need to be able to write a character on the output file. There is a BufferedWriter#write(int) method that will write a single character, which is exactly what we need. A char passed to this method is converted to an int automatically.
The general form is this:
writer.write(charValue);
Do this...
Write the last statement of the loop. Compile your code to see how good your syntax is so far.
Helpful hint: Other languages say that closing files is important, but it's often done for you. Not in Java! If you forget to close an output file, the output file may very well be empty when you run your program! It is your responsibility to close all of your files.
Once we are done using a stream to read from or write to a file, we must close it. This is particularly important for a buffered writer where the buffer may be discarded when the program ends without saving the data to the file. To force a buffered writer to write the buffer to the actual file, you must flush the buffer. Closing a file flushes your output buffer and takes care of some housekeeping details, so it's important no matter what stream, reader, or writer you're using.
Closing a stream, writer, or reader is done the same way:
stream.close();
When execution reaches this statement, the program severs its connection to stream, flushing buffers and telling the operating system to tie up any open files.
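The danger of a forgotten close() can be demonstrated without touching a file. In this sketch (FlushSketch and beforeAndAfterClose are made-up names), a StringWriter plays the role of the file, and we look at what has arrived in it before and after the BufferedWriter is closed. It assumes the text is short enough to fit in the writer's buffer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the lab's code.
class FlushSketch {
    // Returns what has reached the underlying sink before and after close().
    static String[] beforeAndAfterClose(String text) {
        StringWriter sink = new StringWriter();          // stands in for the file
        BufferedWriter writer = new BufferedWriter(sink);
        try {
            writer.write(text);                          // lands in the buffer only
            String before = sink.toString();
            writer.close();                              // flushes the buffer into the sink
            String after = sink.toString();
            return new String[] {before, after};
        } catch (IOException e) {
            // Not expected for an in-memory sink; rethrown to keep the sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = beforeAndAfterClose("Rqh li eb odqg");
        System.out.println("before close: \"" + result[0] + "\"");
        System.out.println("after close:  \"" + result[1] + "\"");
    }
}
```

If the program stopped between the write and the close, the sink would be left with the "before" contents, which is how an output file ends up empty.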
Looking at the algorithm above, note that closing the files is done in the finally block. This actually takes a little more code than is immediately obvious.
- theReader and theWriter must be declared before the try. Otherwise, the compiler won't see them in the finally block. You have to move the declarations of these variables outside the try block, but they must be initialized inside the try block.
- theReader and theWriter must be set to null when they're declared or the compiler gets anxious that they don't have a value.
- In the finally block we have to test to see if they are null before we close them, otherwise we might get a NullPointerException.
- We have to catch the IOException that the close() method might throw.
Do this...
Follow all of those steps in writing the code for the finally block.
Here's a pattern for you to follow for closing a stream (the last two steps above):
    try {
        if (stream != null)
            stream.close();
    } catch (IOException e) {
        System.err.println("Problems closing a stream.");
        e.printStackTrace();
    }
This code goes inside the finally block of the original try. You can put other if-close statements inside this try to close other streams. The exception handling code will print an error and display a stack trace for debugging.
This is quite easily some of the ugliest code you'll ever have to write. But there's little we can do about it. We have to do these things in these places, otherwise Really Bad Things happen.
Do this...
Compile your code, but don't run it quite yet.
Our solution for handling problems is to display reprimanding messages to the user on System.err.
Do this...
Add statements to the catch blocks to reprimand the user appropriately.
Do this...
Now, finally, compile and run your driver program. (Reread the run page to find out how to pass command-line arguments to your program!) Remember to pass the key, input file name, and output file name as arguments to the program from the command line.
Specify the input and output filenames relative to the main project folder. That is, your input filename should be Inputs/lab10/message.text, and the output file should be Outputs/lab10/message.encode. This will keep the main project folder less cluttered.
If what you have written is correct, use a key of 3, and your program should create an output file containing this output:
Rqh Li Eb Odqg Wzr Li Eb Vhd
Also try running your program on a file that doesn't exist. It should exit gracefully with your reprimand.
Now apply all of this knowledge to the problem of decoding a file encoded using a Caesar cipher. This program, implemented in a class DecodeCLIWADriver, should be a dual to EncodeCLIWADriver. In fact, DecodeCLIWADriver should look a lot like EncodeCLIWADriver.
This is definitely a problem where you should think about the problem first to figure out how little work you should do.
To test your DecodeCLIWADriver#main(String[]) program, you can use the output file created by EncodeCLIWADriver, using the same encryption key.
Be careful with your file names! Remember that output files are overwritten without any complaints. It's strongly advised that you name your decoded files with a .decode suffix or something else distinctive.
Also included is alice.code, an encoded selection from Lewis Carroll's Alice in Wonderland. Experiment with different encryption keys to figure out the proper encryption key.
Submit the following for this lab exercise:
- message.text with three different encryption keys.
- message.text.
- alice.code.
buffered input stream, buffered output stream, buffering, Caesar cipher, character streams, command-line arguments, data cast, encryption key, file, finally block, flush a buffer, input buffer, output buffer, parsing method, persistent storage, print stream, program usage, stream, token, wrapper class