HPC MPI Homework Project 2:
An Embarrassing(ly Parallel) Program
Overview
In this exercise, we will use an embarrassingly parallel
MPI program to solve a problem,
and see how well it speeds up as we add more CPUs.
Exercise
There is a circuit diagram on page 97 of your text.
Page 100 presents a C program to find the inputs that cause
the circuit to produce 1.
Type in the C program, and add MPI statements to time
its execution.
Then, for each N from 1 through 16 processors,
run the program 3 times and record each time in a spreadsheet.
Use the spreadsheet to compute the median (middle) time,
and create a chart that plots these
(N vs median-time-using-N) points.
Questions to think about:
- What does your chart reveal?
- What is it about the program that is causing this behavior?
Homework
Part I.
Repeat this experiment on the cluster.
Part II.
The file ooCircuit.tgz
contains an OO version of the circuit program.
Unpack it (untar and unzip), use make to compile it,
and repeat the experiment in the ulab and on the cluster using this version.
Part III.
Using the discussion in section 4.5 of the text as a guideline,
modify the OO version so that the program displays the number
of inputs that satisfy the circuit.
Use the MPI_Reduce() function to sum the values from all the
processing elements (PEs) efficiently.
Rerun your OO experiment on the cluster.
How does this change affect the timing results there?
Hand In
- Your modified OO version of the program;
- Your spreadsheet charts; and
- A 1 page analysis in which you quantify your results:
  - What trends (general or specific) do your charts reveal?
  - How do raw execution times on the cluster compare to times in the ulab? Why? Does it depend on the number of processors being used, and if so, how?
  - How does the shape of the curve on the cluster compare to the shape of the curve in the ulab? Are they different from one another in any way?
  - How does the OO version compare to the non-OO version in execution speed?
  - How does the OO version compare to the non-OO version in size?
  - How much does collecting the number of inputs that satisfy the circuit change the OO version's performance?
Please staple your pages together.
This page maintained by
Joel Adams.