CS 214 Lab 12: Java Introduction


Begin by creating a java subdirectory in 12; then copy the files Sum.java, Array_Sum.java, and Makefile from the course directory /home/cs/214/labs/12/java/ into your java directory. Use your favorite text editor to open these files, customize their opening documentation, and take a moment to study them to get a sense of what each does.

Computing Sums

The file Sum.java contains a program to compute the sum of a sequence of values stored in a file (without using an array). The /home/cs/214/labs/12/numbers/ directory contains several files, each containing differing numbers of random values from the range 1..100. The first line of each file indicates the number of subsequent values in the file (e.g., 1000, 10000, etc.), as does the name of the file. Our programs will use this information to allocate a dynamic array of the correct size, and then fill it with values from the file.

Open and save a spreadsheet in your 12 directory. (Save it in Excel format with the name results, and place your name and the date somewhere near the top. If you prefer, you can use Google Drive to create your spreadsheet, and then download the resulting file in Excel format when complete.) Record the name of each input file on a separate row in column A, leaving some blank rows at the top for column headings. In column B, enter the (numerical) number of values in the corresponding file. Be sure to label your columns for readability.

Compile Sum.java, and use it to compute the sums of the values in each of the input files. For example:

   ./java Sum ../numbers/1000numbers.txt

In column C of your spreadsheet, record the sum of the values in each file, as computed by the program in Sum.java. In the rest of this exercise, we will be recording how long it takes to sum an array containing these values, so this will let us validate the results produced by our other programs, to ensure that they are computing their sums correctly.

Timing the Operation

Open the file Array_Sum.java and take a few moments to study its structure, to see how it differs from Sum.java.

As you can see, the program begins by creating an ArraySum object, and sending that object the run() message.

If you examine the ArraySum() constructor, it begins by checking that the correct number of command-line arguments were specified. If so, it opens a Scanner to the input file; it then passes that Scanner to the readFile() method, which uses it to create and populate an array with the data from the input file.

From there, control moves to the run() method, which saves the current time, invokes sumArray() to sum the values in the array,, records the new time, and then outputs the sum and the difference of the two times.

The sumArray() method just computes and return the sum of the values in the array it receives via its parameter.

Compile Array_Sum.java and make certain it builds correctly before proceeding.

In your spreadsheet, add a column labeled Java. Then run Array_Sum, using each of the files in /home/cs/labs/12/numbers/. Verify that Array_Sum produces the correct sum for each file. Assuming it does, the system clock can vary significantly from execution to execution, so run the program three times for each file, and record the middle of the three execution times in your spreadsheet. These times, computed with a single thread of execution, will provide us with a baseline against which we can compare the execution times when we use multithreading.

Multithreading in Java

There are a variety of ways to perform multithreading in Java, but a common approach is to extend Java's Thread class in order to inherit its capabilities, and override its run() method with the desired functionality for our thread.

Begin by making a copy of Array_Sum.java and name the copy Threaded_Array_Sum.java. Then open Threaded_Array_Sum.java, customize its opening documentation, update the class name and update the instantiation of the class in the main() method.

Threads are built into the java.lang package, so we don't need to import anything special to use them. (Java also provides an extensive collection of classes useful for multithreading in the java.util.concurrent package, but we don't need these features for our task today.)

To allow us to control the number of threads our program uses, we must somehow get the desired number of threads from the user. Since we are already getting the name of the input file from the command-line, let's also get the number of threads to be used from there. In the ArraySum constructor, update the "usage" message to indicate that the number of threads is an optional second command-line argument.

Next, let's define a method named getNumThreads() that, given the array of command-line arguments, lets us retrieve the number of threads specified on the command-line:

  private final int getNumThreads( String [] args ) {
     if (args.length >= 2) {
        return Integer.parseInt( args[1] );
     } else {
        return 1;
     }
  }
Then, at the end of the ArraySum() constructor, add the line:
      ...
      fillArray(myArray, myScanner);
      myNumThreads = getNumThreads( args );
   }
and add a declaration of myNumThreads so that it is a private instance variable of your class.

Next, find the definition of method sumArray(). There, we want to divide the iterations of the for loop across myNumThreads Helper threads. Each Helper should know how to be a Thread, but we also want it to know how to sum a slice of the array. We can use inheritance so that the each Helper object is-a Thread but we'll have to encode the slice-summing piece ourselves. Add the following code to define an inner class (i.e. a class defined within another class, and therefore scoped within the outer class):

 private class Helper extends Thread {

     public Helper(int id) {
       super();
       myID = id;
       myPartialSum = 0;
     }

     public void run() {
         myPartialSum = sumSlice(myID);
     }

     public final long getPartialSum() {
        return myPartialSum;
     }

     private int  myID = 0;
     private long myPartialSum = 0;
  } // Helper
A class like Helper that extends Java's Thread class inherits a number of methods from Thread including start() and run(). After we create an instance of this class, we must send it the start() message to activate the thread. Among other things, this inherited start() method invokes run(), which like all Java methods, is polymorphic. The convention in Java is for the subclass to override the run() method with a definition that provides the desired behavior, as shown above.

To provide that behavior, our Helper class's run() method invokes a sumSlice() method that, given its id, uses that information to compute and return the sum of its slice of the array. We can define sumSlice() as follows:

    private long sumSlice(int id) {
        int sliceSize = myArray.length / myNumThreads;
        int start = id * sliceSize;         // starting index
        int stop = (id+1) * sliceSize;      // stopping index
        if ( id == myNumThreads-1 ) {       // have final thread
            stop = myArray.length;            //  handle leftovers
        } 
        long sliceSum = 0;
        for (int i = start; i < stop; ++i) {  // sum the ints
            sliceSum += myArray[i];           //  in my slice
        }
        return sliceSum;
  }
Note that this should be defined as a ThreadedArraySum method, not a Helper method, because our main thread (0) will need to invoke sumSlice() to sum its slice of the array.

Now that we have defined what it means to be a Helper, we still need to actually instances of these objects to do the work. Replace the body of your sumArray() method with the body shown in bold below:

  private long sumArray() { 
     Helper[] helpers = new Helper[myNumThreads];

     for (int i = 1; i < myNumThreads; ++i) {     // for each helper:
        helpers[i] = new Helper(i);                  //  create, and
        helpers[i].start();                          //  launch them
     }

     long sum = sumSlice(0);                         // main thread does slice 0

     try {
        for (int i = 1; i < myNumThreads; ++i) {  // for each helper h:
           helpers[i].join();                        //  wait for h to finish
           sum += helpers[i].getPartialSum();        //  get its partial sum
        }
     } catch( InterruptedException ie) {             // required by join()
        System.err.println("\n*** a Helper was interrupted!\n");
        System.err.println(ie);
        System.exit(1);
     }

     return sum;
  }
In this revised version of sumArray(), we start by creating an array of handles to our helper threads, so that we can access them later. Then we use a for loop to populate this array with Helper objects, passing each one its ID so that it can compute its partial sum, and then we sent it the start() message to start it running.

Having launched its helpers, our main thread computes the partial sum of slice 0 and stores that in its local variable sum.

Having computed its partial sum, our main thread then uses a second for loop to iterate through its array of helpers, sending each the join() message. join() is another method our Helper class inherits from the Thread class; if the thread to which join() is sent has not finished running, the join() message causes the thread that sent the message (i.e., our main thread) to suspend until the helper thread has finished running. This ensures that the helper thread has finished computing its partial sum, so our main thread then sends the getPartialSum() message to retrieve that value.

This approach -- in which the main thread "forks" a group of helper threads, does some work, uses "join" to ensure that each helper has finished, and then combines their partial results into a complete problem-solution -- is called the fork-join pattern, and it is commonly used in multithreaded programming.

Save all of your changes, compile, and continue when your program is free of syntax errors.

Testing and Performance

Run ThreadedArraySum on the input files from the course directory, verifying it is correctly computing the sums. For example, to test the largest file with 4 threads you can use:

   java ThreadedArraySum /home/cs/214/labs/12/numbers/1000000numbers.txt 4

To see what kind of speedup we are getting, add 5 more columns to your spreadsheet, and label them 1 2 3 4 5. For each of the input files, run it three times using 1 thread, and record the middle time in the appropriate row of your spreadsheet under column 1. How does the time for ThreadedArraySum using 1 thread compare to the time for ArraySum?

Repeat this procedure using 2, 3, 4 and 5 threads and record the middle times in the appropriate spreadsheet cells. In your spreadsheet, answer the following questions:

Java.1. How does the time using Threaded_Array and 1 thread compare to the time using Array_Sum?

Java.2. How do the times using T > 1 threads compare to the times using a single thread? Explain what is happening.

Java.3. Under what circumstances does using more threads to solve the problem make the time increase?

Interpretation vs Compilation

Java (unfortunately) is not representative of most programming languages when it comes to parallel speedup. The problem is that the Java compiler (javac) produces byte-code for the Java Virtual Machine (JVM), as opposed to machine language code for your computer. When you use java to run your program, that invokes the JVM, which interprets your code.

Interpreted programs are notoriously slow compared to machine-language programs that run natively on a computer's hardware. To improve Java's speed, recent versions of the JVM incorporate "HotSpot" technology, which runs on its own thread in the background, watching the JVM interpret your byte-code. If it determines that the JVM is re-executing the same code over and over (i.e., a "hot spot" in your code, such as a loop), the HotSpot technology will compile that code to machine-language while your program is running, and then switch to run that machine-code natively instead of interpreting the byte-code. Java calls this just in time compilation (JIT).

JIT compilation can provide a significant performance boost; to get a sense of the difference it can make, enter:

   java -Xint ThreadedArraySum /home/cs/214/labs/12/1000000numbers.txt
The -Xint flag tells the JVM to disable the JIT compiler and only run the interpreter. Use the output and your spreadsheet to answer the following question:

Java.4. For summing 1,000,000 numbers, how much does Java's HotSpot technology improve performance compared to just running the JVM's interpreter? Quantify your answer.

Java's HotSpot technology complicates our study of parallel speedup. The HotSpot technology is clearly identifying the array-summing loop as a hot spot; suppose we are processing 1,000,000 numbers and it invokes the JIT compiler after 100,000 iterations of the loop. Then:

The percentage of the execution that is performed as fast native machine code thus decreases as we add more threads, interfering with our analysis of parallel speedup. As we shall see, this interference does not occur with compiled languages whose programs run 100% as native machine-code.

Turn In

Create a script file in which you use cat to display ThreadedArraySum.java, and then show a successful compilation of your program. Run ThreadedArraySum, using the file 1000000numbers.txt and 1 thread; then run ThreadedArraySum, using the same input file and 4 threads. Save your spreadsheet, but do not close it since you will be continuing to update it in the other exercises.

That concludes the Java introduction for this lab.


Calvin > CS > 214 > Labs > 12 > Java


This page maintained by Joel Adams.