CS 374: Computing PI via Posix Threads


Introduction

Begin by typing

   man -k pthread
and
   man pthread_create
and skim the information that is displayed, particular the signature of the pthread_create() command.

When you have a basic understanding of what pthread_create does and how it works, create a new directory and save copies of pi.c and Makefile there.

The file pi.c contains a program that computes an approximation of PI using 1 or more threads. Since PI == 4 * arctan(1) and the arctan(x) == the integral from 0 to x of (1/(1+x*x)), the program calculates a numerical approximation of this integral and uses it to approximate PI. Take a few minutes to study the program. Don't proceed until you understand what it is doing, and why.

Exercise: Part I

On your ulab workstation, compile the program using the Makefile to ensure that it compiles correctly. Execute the program using the command:

   ./a.out 1000 1
and verify that it works as it should. The "usage" for this program is as follows:
   ./a.out intervals threads
where intervals is the number of "pieces" into which we divide the integral computation, and threads is the number of threads across which we spread the computation of those "pieces".

Note that the precision of our approximation of PI is rather limited. PI is a transcendental number, so its exact value cannot be specified -- we must always deal with an approximation. The question is how precise our approximation is.

Execute the program again, increasing the first command-line argument by a factor of ten:

   ./a.out 10000 1
What happens to the precision of PI?

Keep increasing the command-line argument by a factor of ten until (a) the program takes longer than ten seconds; or (b) the approximation of PI is as accurate as the value given for comparison purposes. Record this intervals value.

Exercise: Part II

Execute ./a.out using the command:
   time ./a.out yourIntervals 1
where yourIntervals is your intervals value from part I. Record the real, user and sys time values in a spreadsheeta. In another column, add together the user and sys times.

Repeat this procedure using 2, 4, 8, 16, and 32 for the number of threads. Using your spreadsheet, compute the speedup using 2, 4, 8, 16, and 32 threads. Create a line-chart comparing the real times with the sum of the user+sys times using 1, 2, 4, 8, 16, and 32 threads.

How do the real times compare to the sum of the user plus sys times? How can you explain this?

Exercise: Part III

Currently, the "main" thread of the program spawns off the N child threads and then sits idle until they terminate. In theory, we should be able to get better performance if we have the main thread do some of the work, rather than sitting idle.

Modify the program so that for N threads, the "main" thread creates N-1 threads numbered 1..N-1, and then performs the work formerly done by thread "0".

Repeat our "experiment" again using this modified version. Add new series to your spreadsheet for the real times, and a new line-chart comparing the real-time performances of your original and modified versions. Be sure to label each line clearly.

Which version of the program gives better performance?

Using the better version, find the "tipping point" -- the exact number of threads that provides the best performance -- on your ulab machine.

Exercise: Part IV

Now, let's compare the performance of the ulab machines against the performance of acolyte.calvin.edu, which is one of the ulab's servers.

Login to acolyte and recompile your program there, using the better of the two versions from Part III. Repeat our "experiment" there using that version of our program. Add new series to your spreadsheet giving the real times on acolyte. Add a new chart comparing the times on acolyte to the times in the ulab. Be sure to label each line clearly.

Which environment -- ulab or acolyte -- gives better performance?

Find the "tipping point" -- the exact number of threads that provides the best performance -- on acolyte.

Exercise: Part V

Now, let's compare the performance of one of our supercomputer's nodes to the machines we have examined so far.

Login to dahl.calvin.edu (the cluster's "super" node) and recompile your program there. Repeat the "experiment" a final time, using the better of the two versions of the program. Add final series to your spreadsheet and a line to your chart from Part IV so that you can compare the performance on the ulab machines, the performance on acolyte, and the performance on dahl.

Which environment gives you better performance?

Find the "tipping point" -- the exact number of threads that provides the best performance -- on dahl's "super" node.

Homework

Have a good holiday!

(Optional.) For extra credit, rewrite the circuit program as a shared-memory (multithreaded) application. Compare its performance to that of the MPI version we wrote previously.

Hand In

To learn more about POSIX threads, see the LLNL Posix Threads Programming tutorial.


CS > 374 > Homework > Pthreads


This page maintained by Joel Adams.