HPC MPI Homework Project 6: Measuring Speedup


Overview

In this exercise, we want to practice tuning an algorithm, and measuring the resulting changes in speedup and efficiency. The problem we'll be solving is one that is conceptually simple, computationally intensive, and important in cryptography/security: identifying all prime numbers less than some (large) limit value. To solve the problem, we'll be "parallelizing" an ancient algorithm called Eratosthenes Sieve.

To find all prime numbers up to some limit value N, this algorithm works as follows:

  1. Create the sieve -- an array indexed from 2 to N. (This can be an array of bools or ints or anything else that supports random access and provides entries that are easy to "mark".)
  2. Set k to 2.
  3. Loop:
    1. In the sieve, mark all entries whose indices are multiples of k between k2 and N, inclusive.
    2. Find the smallest index greater than k that is unmarked, and set k to this value.
    Until k2 > N.
  4. The indices of the unmarked sieve entries are prime numbers.
To see how the algorithm works, check chapter 5 (specifically Figure 5.2) in your text for a walk-through.

Today we wll study a parallel version of this algorithm that distributes "the sieve" across M different PEs, to see what kind of speedup and efficiency it offers.

Exercise

Chapter 5 of your text provides a parallel version of Eratosthenes Sieve called sieve1.c (see Figure 5.6) and explains how it was designed. Unfortunately, that program has a bug that makes it only work correctly with more than one process. Download this corrected version and this Makefile. Make and run it to verify that it is working correctly.

Then rerun it, using a large limit value (e.g., 100,000,000 or more) on 1 PE, and record the execution time in a spreadsheet. The Makefile provides a way to do this conveniently.

Then run it again using 2 PEs (hopefully different from those of your classmates) and record its execution time in your spreadsheet. Using these values, compute the speedup S2, and the efficiency E2 as discussed in class.

Then run it again using 4 PEs (again, hopefully different from those of your classmates) and record its execution time in your spreadsheet. Using these values, compute the speedup S4, and the efficiency E4.

If all is well (i.e., your classmates are not interfering with you), you should see measurable speedup when using 2 and 4 PEs. Which has higher speedup? Which has higher efficiency?

There are at least three ways in which the program can be improved. Implementing these improvements is this week's assignment.

Homework

Your task is to write three "improved versions" of sieve1.c, as described in your text:

On Ohm, run all four programs using 1, 2, 4, 8, and 16 PEs, and record their execution times in a spreadsheet. Using the same spreadsheet, compute the speedup and efficiency for each execution using multiple PEs, as discussed in class.

Hand In

Please staple or paperclip your pages together.


Up to the HPC Homework Page

Up to the Calvin HPC Course Page


This page maintained by Joel Adams.