CS 214 Lab 12

CS 214 Lab 12: Parallel Programming for Multicore Processors

The "programs" we have written prior to this have all illustrated traditional sequential programming, meaning each program had a single thread of execution that performed that program's steps one at a time, in sequence. Parallel programming is creating a computation that is made up of multiple execution threads that run in parallel and cooperate to solve the problem. On a multicore processor, the program's threads can potentially run simultaneously on different cores, and thus solve the problem faster.

This exercise explores parallel programming in Ada, Clojure, Java, and Ruby. Our task is a simple one: sum the values in a large array. To illustrate, consider the following array of 16 random integers:

If we were to sum these values with a single thread (shown in blue below), that thread would have to add all up the values itself.

Computing the sum of the sixteen values (704) would require that one thread to perform all sixteen additions, which would take it time proportional to the length of the array:

But if we do were to slice the array into two halves, and use a second "helper" thread (shown in red below) to sum one of the slices at the same time as our original (blue) thread sums the other slice, each thread could compute a partial sum of its half of the array in parallel (i.e., simultaneously):

Once they have each computed their partial sums, our original (blue) thread can compute the complete sum by adding the helper thread's partial sum to its own, as shown at step 9 above. Since each thread is summing eight of the sixteen values, the time required to solve the problem should be cut nearly in half, provided the two threads can sum their slices simultaneously.

If we were to instead sum the array using three threads running in parallel (shown in blue, red, and green below), each thread could sum a third of the array (5 values) at the same time. But since the size of the array is not evenly divisible by three, that leaves a "leftover" value.

One way to deal with such "leftovers" is to have the original thread compute the partial sum of its slice of the array, and then add the leftover values to its partial sum, as shown at step 6 below:

Once it has finished adding in the leftovers, the original thread can then add in the partial sums of the other threads, as shown at steps 7 and 8 above.
A different way to deal with the leftovers is to recognize that these all come at the end of the array, immediately after the array-slice processed by the last thread, so we could just have the last thread keep on going and process these leftovers as part of its workload.
How much does this add to the final thread's workload? Given T threads and an array of length N, the number of leftovers will be the remainder of N / T, so the number of leftovers will always be less than T. That is, suppose that we are using four threads. If our array is length 16, there will be 0 leftovers; if it is length 17, there will be 1 leftover; if it is length 18, there will be 2 leftovers; if it is length 19, there will be 3 leftovers. if it is length 20, there will be 0 leftovers and the cycle then begins again: As a result, summing the leftovers will not take very long, so long as T is small.

Either of these approaches is acceptable, and we will see how to use both of them in today's exercise.

So long as the threads can sum their slices of the array simultaneously, the time required to sum the values in the array of length N using T threads should be proportional to N/T + S, where N/T is the time to compute the partial sums, and S is the time to add up those partial sums. S will generally be less than T, and T should usually be less than the number of processing cores available in the computer, so that all of the threads can run simultaneously. Most current computers do not have very many cores, which forces us to keep T small, so for large arrays, N is the dominant term.

If we consider very generally what one of our threads needs to do, it must:

Sum its particular slice of the array; and
Communicate its partial sum back to the original thread.

Taking these one at a time:

1. To help a thread compute the indices of its slice of the array we assign each thread an ID number (i.e., the original thread is 0, the first helper is 1, the second helper is 2, etc.). To illustrate, we can have the slice for the thread whose ID is 0 begin at index 0; the slice for the thread whose ID is 1 begin at index sliceSize; the slice for the thread whose ID is 2 begin at index 2 * slizeSize; and so on. If this does not seem obvious, take a moment to look at the diagram above, with the blue thread having ID 0, the red thread having ID 1, the green thread having ID 2, and the sliceSize being 16 / 3 = 5 (integer division).

Thus, if a thread knows its ID and the number of items in the array, it can compute its sliceSize and its starting and stopping index values using these equations:

   sliceSize = ID / numberOfItems
   startIndex = ID * sliceSize
   stopIndex = startIndex + sliceSize

A thread thus needs two pieces of information to compute its starting and stopping index values:

its ID number; and
the number of items (i.e., the size of the array) it is to sum.

Our original thread can pass these values to a helper thread when it creates that helper thread.

2. To communicate a thread's partial sum back to the original thread, we will use four different approaches:

In Java, our main thread will wait for each helper thread to finish its slice of the array, invoke a "getter" method to retrieve that helper thread's partial sum, and add that partial sum to its own running total.
In Ada, we will use entry procedures and their parameter-passing mechanisms to send values back and forth between the main task and the helper tasks.
In Clojure, each thread will write its partial-sum into a channel; after the main thread finishes its slice, it will retrieve these partial-sums from the channel and add them together to form a total-sum.
In Ruby, we will have each thread write its result to a unique spot in a results-array, which the main thread can then sum.

In each of these languages, we will begin with a program that

gets the name of an input file from the command-line;
reads the first line from the input file, which indicates the number of subsequent values in the file;
allocates a dynamic array of that size;
reads the subsequent values from the file into the array;
starts a timer;
sums the values in the array;
stops the timer; and
reports the sum, plus how long it took to compute.

This sequential program will thus provide a baseline time against which we can compare the times for a parallel version, using different numbers of threads.

We also want to see how the size of the array being summed affects performance, so the directory:

   /home/cs/214/labs/12/numbers

contains some files of random numbers for you to use as input files: 5numbers.txt (for correctness testing), 1000numbers.txt, 10000numbers.txt, 100000numbers.txt, and 1000000numbers.txt. For convenience sake, our program reads the name of the input file from the command-line. To conserve disk space, please access these files from there (e.g., /home/cs/214/labs/12/1000000numbers.txt), instead of making your own local copy.

We will thus get to see our languages' mechanisms for:

creating threads,
communicating between threads,
allocating dynamic arrays,
timing computations, and
processing command-line arguments.

Begin by making a new directory for this exercise and changing directory to that new directory.

Turn in. Each of the parts requires you to create files containing source code and execution traces, plus a spreadsheet with execution times and charts. When you have completed each part, use cat to create a single file that contains all of your results:

   cat ada/script.ada clojure/script.clojure java/script.java ruby/script.ruby  > lab12-results

Then submit your work by copying that single file and your spreadsheet into your personal folder in /home/cs/214/current/:

   cp lab12-results /home/cs/214/current/yourUserName

replacing yourUserName with your login name. The grader will access and grade your results from there.

Calvin > CS > 214 > Labs > 12

This page maintained by Joel Adams.