### HPC Project 6: Patterns in Pthreads

Last week, we parallelized calcPI, a sequential program to calculate PI using integration and the trapezoidal method. This week, we are going to start with calcPI2, which calculates PI using integration and the arctangent method, and uses the pthreads library to do the computation in parallel.

#### Part I: Using Mutual Exclusion

This version of calcPI2 has each thread compute its portion of the integral in a variable localSum. At the end of the computation, the thread adds its localSum to a shared variable pi. Since each thread is reading and writing this shared variable, this addition is a critical section, so the program performs this addition using the Mutual Exclusion pattern, to avoid a race condition.

Using a spreadsheet, record the precision of the answer and the time required to compute it, using 1; 10; 100; 1,000; 10,000; 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000; and 10,000,000,000 intervals. Record these timings using 1, 2, 4, 8, and 16 threads. This will provide a baseline for Part II.

#### Part II: The Reduction Pattern

One of the points at which calcPI2 loses performance is its use of the Mutual Exclusion pattern: if multiple threads finish at the same time, they are forced to add their localSum values to pi one at a time (i.e., sequentially).

Part II of this week's assignment is to replace that Mutual Exclusion pattern with a Reduction pattern, to see if we can improve the program's performance. More precisely, you should replace the lines:

```    pthread_mutex_lock(&piLock);
pi += localSum;
```
with something like this:
```    pthreadReductionSum(localSum, &pi);
```
that reliably achieves the same result, but using a tree-like parallel summation instead of a mutually exclusive sequential summation.

Unfortunately, the pthreads library does not provide any constructs that implement the Reduction pattern, so you will need to implement this yourself. There are different ways this can be done; do some research to look for a speedy way. Depending on the approach you use, you may find the Barrier pattern to be useful. If you need more information about pthreads, this tutorial by Blaise Barney is a good resource.

Store your pthreadReductionSum() function in its own header file (e.g., pthreadReduction.h, so that you can easily use it in another program if the need arises. Your function should not use any non-local variables except for ones defined in pthreadReduction.h.

Once you have finished it, time this version of your program using the same intervals and threads values as before.

Then use your program to collect the timing data needed for the 3D charts described below. For these timing runs, you should use the Ulab machines.

This is a one week project.

#### Hand In

Hard copies of:

1. A 1-page analysis of your program's behavior, exploring the relationship between the number of PEs and the number of intervals. Note any points at which performance and/or accuracy start to degrade. Note any differences you discover in the precision of the Arctangent and the Trapezoidal methods. Make your writeup as quantitative as possible, and cite data from your spreadsheet and charts to support your analysis.
2. Two 3D bar-charts (one for the Mutual Exclusion timings and one for the Reduction timings), with the X-axis being 1, 2, 4, 8, and 16 threads, the Z-axis being 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000; and 10,000,000,000 intervals; and the Y-axis being your programs' times for the corresponding X and Z values. As always, label your charts and their axes, and format them so that they displays well when printed on the Ulab printer.
3. One 2D line-chart, in which you compare the times of the Mutual Exclusion and Reduction patterns using 8 threads while varying the number of intervals. Your X-axis should be 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000; and 10,000,000,000 intervals. Your Y-axis should be the time.
4. A script file in which you use cat to list your program, use make to show that it compiles correctly, and then show its execution using 1,000,000,000 intervals and 4 threads.
Please staple these pages together and make certain that your name is on each page.

CS > 374 > Exercises > 06 > Homework Project