The "programs" we have written prior to this have all illustrated traditional sequential programming, meaning each program had a single thread of execution that performed that program's steps one at a time, in sequence. Parallel programming is creating a computation that is made up of multiple execution threads that run in parallel and cooperate to solve the problem. On a multicore processor, the program's threads can potentially run simultaneously on different cores, and thus solve the problem faster.
This exercise explores parallel programming in Ada, Clojure, Java, and Ruby. Our task is a simple one: sum the values in a large array. To illustrate, consider the following array of 16 random integers:
If we were to sum these values with a single thread (shown in blue below), that thread would have to add all up the values itself.
But if we do were to slice the array into two halves, and use a second "helper" thread (shown in red below) to sum one of the slices at the same time as our original (blue) thread sums the other slice, each thread could compute a partial sum of its half of the array in parallel (i.e., simultaneously):
If we were to instead sum the array using three threads running in parallel (shown in blue, red, and green below), each thread could sum a third of the array (5 values) at the same time. But since the size of the array is not evenly divisible by three, that leaves a "leftover" value.
Once it has finished adding in the leftovers, the original thread can then add in the partial sums of the other threads, as shown at steps 7 and 8 above.
How much does this add to the final thread's workload? Given T threads and an array of length N, the number of leftovers will be the remainder of N / T, so the number of leftovers will always be less than T. That is, suppose that we are using four threads. If our array is length 16, there will be 0 leftovers; if it is length 17, there will be 1 leftover; if it is length 18, there will be 2 leftovers; if it is length 19, there will be 3 leftovers. if it is length 20, there will be 0 leftovers and the cycle then begins again: As a result, summing the leftovers will not take very long, so long as T is small.
Either of these approaches is acceptable, and we will see how to use both of them in today's exercise.
So long as the threads can sum their slices of the array simultaneously, the time required to sum the values in the array of length N using T threads should be proportional to N/T + S, where N/T is the time to compute the partial sums, and S is the time to add up those partial sums. S will generally be less than T, and T should usually be less than the number of processing cores available in the computer, so that all of the threads can run simultaneously. Most current computers do not have very many cores, which forces us to keep T small, so for large arrays, N is the dominant term.
If we consider very generally what one of our threads needs to do, it must:
1. To help a thread compute the indices of its slice of the array we assign each thread an ID number (i.e., the original thread is 0, the first helper is 1, the second helper is 2, etc.). To illustrate, we can have the slice for the thread whose ID is 0 begin at index 0; the slice for the thread whose ID is 1 begin at index sliceSize; the slice for the thread whose ID is 2 begin at index 2 * slizeSize; and so on. If this does not seem obvious, take a moment to look at the diagram above, with the blue thread having ID 0, the red thread having ID 1, the green thread having ID 2, and the sliceSize being 16 / 3 = 5 (integer division).
Thus, if a thread knows its ID and the number of items in the array, it can compute its sliceSize and its starting and stopping index values using these equations:
sliceSize = ID / numberOfItems startIndex = ID * sliceSize stopIndex = startIndex + sliceSize
A thread thus needs two pieces of information to compute its starting and stopping index values:
2. To communicate a thread's partial sum back to the original thread, we will use four different approaches:
In each of these languages, we will begin with a program that
We also want to see how the size of the array being summed affects performance, so the directory:
/home/cs/214/labs/12/numberscontains some files of random numbers for you to use as input files: 5numbers.txt (for correctness testing), 1000numbers.txt, 10000numbers.txt, 100000numbers.txt, and 1000000numbers.txt. For convenience sake, our program reads the name of the input file from the command-line. To conserve disk space, please access these files from there (e.g., /home/cs/214/labs/12/1000000numbers.txt), instead of making your own local copy.
We will thus get to see our languages' mechanisms for:
Begin by making a new directory for this exercise and changing directory to that new directory.
Turn in. Each of the parts requires you to create files containing source code and execution traces, plus a spreadsheet with execution times and charts. When you have completed each part, use cat to create a single file that contains all of your results:
cat ada/script.ada clojure/script.clojure java/script.java ruby/script.ruby > lab12-resultsThen submit your work by copying that single file and your spreadsheet into your personal folder in /home/cs/214/current/:
cp lab12-results /home/cs/214/current/yourUserName
replacing yourUserName with your login name.
The grader will access and grade your results from there.