HPC MPI Homework Project 6:
Measuring Speedup
Overview
In this exercise, we want to practice tuning an algorithm,
and measuring the resulting changes in speedup and efficiency.
The problem we'll be solving is one that is conceptually simple,
computationally intensive, and important in cryptography/security:
identifying all prime numbers less than some (large) limit value.
To solve the problem, we'll be "parallelizing"
an ancient algorithm called Eratosthenes Sieve.
To find all prime numbers up to some limit value N,
this algorithm works as follows:
-
Create the sieve -- an array indexed from 2 to N.
(This can be an array of bools or ints or anything
else that supports random access and provides entries
that are easy to "mark".)
-
Set k to 2.
-
Loop:
-
In the sieve, mark all entries whose indices are
multiples of k between
k2 and N,
inclusive.
-
Find the smallest index greater than k
that is unmarked, and set k to this value.
Until k2 > N.
-
The indices of the unmarked sieve entries are prime numbers.
To see how the algorithm works, check chapter 5
(specifically Figure 5.2)
in your text for a walk-through.
Today we wll study a parallel version of this algorithm
that distributes "the sieve" across M different PEs,
to see what kind of speedup and efficiency it offers.
Exercise
Chapter 5 of your text provides a parallel version of Eratosthenes
Sieve called sieve1.c (see Figure 5.6)
and explains how it was designed.
Unfortunately, that program has a bug that makes it only work
correctly with more than one process.
Download this
corrected version and this
Makefile.
Make and run it to verify that it is working correctly.
Then rerun it, using a large limit value
(e.g., 100,000,000 or more)
on 1 PE, and record the execution time in a spreadsheet.
The Makefile provides a way to do this conveniently.
Then run it again using 2 PEs (hopefully different from those
of your classmates) and record its execution time in your spreadsheet.
Using these values, compute the speedup S2,
and the efficiency E2 as discussed in class.
Then run it again using 4 PEs (again, hopefully different from those
of your classmates) and record its execution time in your spreadsheet.
Using these values, compute the speedup S4,
and the efficiency E4.
If all is well (i.e., your classmates are not interfering with you),
you should see measurable speedup when using 2 and 4 PEs.
Which has higher speedup?
Which has higher efficiency?
There are at least three ways in which the program
can be improved.
Implementing these improvements is this week's assignment.
Homework
Your task is to write three "improved versions" of sieve1.c,
as described in your text:
-
sieve2.c,
in which you modify sieve1.c to exclude
all even integers from consideration.
(See section 5.9.1 in your text.)
-
sieve3.c,
in which you modify sieve2.c to eliminate
the broadcast step.
(See section 5.9.2 in your text.)
-
sieve4.c,
in which you modify sieve3.c by reorganizing
its loops to increase the cache hit rate.
(See section 5.9.3 in your text.)
On Ohm,
run all four programs using 1, 2, 4, 8, and 16 PEs,
and record their execution times in a spreadsheet.
Using the same spreadsheet,
compute the speedup and efficiency for each execution
using multiple PEs, as discussed in class.
Hand In
-
Your source code for the four programs.
-
Four spreadsheet charts
-- one for each program --
showing the execution times on Ohm
using 1, 2, 4, 8, and 16 PEs.
-
Four spreadsheet charts
-- one for each program --
showing the speedups on Ohm
using 2, 4, 8, and 16 PEs.
-
Four spreadsheet charts
-- one for each program --
showing the efficiencies on Ohm
using 2, 4, 8, and 16 PEs.
-
A 1 page analysis of your results:
How do the execution times of these programs compare?
Do the execution times match those predicted in the textbook?
How do their speedups compare?
How do their efficiencies compare?
What is the optimal number of PEs for maximizing speedup?
What is the optimal number of PEs for maximizing efficiency?
Are these optimal numbers the same or different for your
four programs?
Please staple or paperclip your pages together.
Up to the HPC Homework Page
Up to the Calvin HPC Course Page
This page maintained by
Joel Adams.