### HPC Project 8: A Hybrid Project

#### Overview

In this project, we want to combine MPI and OpenMP. The problem we'll be solving is conceptually simple, computationally intensive, and important in cryptography/security: finding all prime numbers less than some (large) limit value. To solve the problem, we'll be "hybridizing" a parallel version of an ancient algorithm: the Sieve of Eratosthenes.

To find all prime numbers up to some limit value N, the sequential version of this algorithm works as follows:

1. Create the sieve -- a dynamic array indexed from 2 to N. (This can be an array of bools or ints or anything else that supports random access and provides entries that are easy to "mark".)
2. Since 1 is not prime, set k to 2.
3. Loop until k² > N:
   1. In the sieve, mark all entries whose indices are multiples of k between k² and N, inclusive.
   2. Find the smallest index greater than k that is unmarked, and set k to this value.
4. The indices of the unmarked sieve entries are the prime numbers.
To see how the sequential algorithm works, check out this video.

This week's project is to create a hybrid version of this algorithm that uses MPI to distribute "the sieve" across different PEs, and uses OpenMP to speed up the processing on those PEs.

#### Exercise

Download sieve.c and this Makefile. Make and run it to verify that it is working correctly.

Then rerun it, using a large limit value (e.g., 1,000,000,000 or more) and 1 process, and record the execution time in a spreadsheet.

Then run it again using 2 processes and verify that its execution time is shorter than it was with 1 process. Repeat this with 4, 8, 16, and 32 processes, to see how well it scales.

There are many ways in which the program can be improved.

#### Homework

Your task is to write sieve2.c: an improved version of the Sieve of Eratosthenes that uses MPI for the interprocess communication and uses OpenMP to multithread the computation a given process performs. You may either adapt sieve.c (i.e., add OpenMP to MPI), or design and implement sieve2.c from scratch.

For this assignment, use 4 OpenMP threads per process. You should develop and test your sieve2.c program in the ulab, and then move both it and sieve.c to Dahl to compare their performance.

On Dahl, run the original sieve.c program using a large limit value (e.g., 1,000,000,000 or more) and 4, 8, 16, 32, 64, 128, and 256 processes. Record their execution times in a spreadsheet. Then repeat this with your sieve2.c hybrid version, using 4, 8, 16, 32, 64, 128, and 256 threads, to quantify your performance improvement.

This is a one week project.

#### Hand In

• Your source code for sieve2.c.
• A spreadsheet and a single chart showing both programs' execution times on Dahl using 4, 8, 16, 32, 64, 128, and 256 PEs.
• A 1-page analysis of your results in which you compare and contrast the performance of sieve.c and sieve2.c. What kinds of conclusions can you draw? What was easiest about combining MPI and OpenMP? What was most difficult?
