HPC OpenMP Project: The Mandelbrot Set

the mandelbrot set

Introduction

The black figure in the 2D graphic above is called the Mandelbrot Set, named after the mathematician Benoit Mandelbrot who created early visualizations of it. The values of the (x,y) points within this figure can be determined by a simple but non-linear iterative function f(x,y). For each (x,y) within the central part of the figure (shown in black), the values produced by f(x,y) converge; for each (x,y) outside of that central area (shown in shades of blue), the values produced by f(x,y) never converge (i.e., they diverge toward infinity). The boundary between these regions of convergence and divergence (shown in gold in the figure above) is a fascinating chaotic recursive fractal that contains many patterns similar to those found in the natural world. Check out the Mandelbrot Set Wikipedia page for more information.

In the source folder for this project, the file mandel.cpp is a C++ program that draws the Mandelbrot set. This is a traditional sequential program; a single-threaded process is doing all the work. This one-week project is to create and compare two different approaches to speeding up this program:

  1. A program that uses OpenMP's version of the "Chunks" Parallel Loop, and
  2. A version that uses OpenMP's version of the "Slices" Parallel Loop.
The goal is to see which approach is faster for solving this problem.

The program uses a graphics library developed here at Calvin University called the Thread-Safe Graphics Library, aka TSGL. Other graphics libraries are not thread-safe, meaning if two threads try to perform graphical calls at the same time, the program will likely crash. To support multithreaded graphical programming, we developed TSGL, which lets different threads safely perform simultaneous graphics calls in approximately real-time.

Since TSGL is only installed in Calvin's CS labs, you will need to do this week's project using one of the CS lab workstations or one of the remote-access Guacamole systems; you cannot do this project on the cluster or your own machine (at least not without installing TSGL).

Getting Started

Make a new project folder and download the files from this folder to your project folder. Look over the statements in mandel.cpp to see how it is structured.

Use the provided Makefile to build the program; then run it. You should see the Mandelbrot set drawn using Calvin colors (i.e., a gold figure on a maroon background).

Your first task is to measure how long this sequential program is taking to compute and draw the Mandelbrot set. Where MPI had the MPI_Wtime() function to measure 'wall clock' time, OpenMP has the omp_get_wtime() function that behaves in a similar fashion. Modify the program by adding calls to omp_get_wtime(): (i) after the first printf() near the begining of the program, and (ii) just before the printf() statement near the end of the program. (Since this is an OpenMP function, you will need to #include <omp.h> as well; the Makefile is already set up to utilize the OpenMP library.) Then modify that final printf() so that it displays the total time the program takes to compute and draw the Mandelbrot set.

When you have this working, open up a spreadsheet, run the program three times, record the times in the spreadsheet, and then use the spreadsheet to compute the median of those times. We will use this as a baseline to compare against our parallel versions, and as the T1 value in our SpeedupT computations.

Version I: Equal Chunks

In a separate folder, make a copy of mandel.cpp (e.g., mandelChunks.cpp) and modify it to use OpenMP's "Equal Chunks" Parallel Loop to speed up the computation. You should have no problems doing this if you have completed this week's lab.

Note that each iteration of the Mandelbrot computation is independent of the others. In a multithreaded version, a thread can compute and draw the pixels in its 'chunk' of the window without interacting with any other threads, making this an embarrassingly-parallel problem. It follows that this approach should provide a good speedup over the sequential version.

For convenience, we want to be able to control the number of threads the program uses when we run the program. Using the examples you explored during this week's lab, add logic near the beginning of your program (i.e., before the first printf()) that passes the number of threads the user species on the command-line as an argument to the omp_set_num_threads() function. Then modify that first printf() so that is displays the number of threads being used.

Finally make a copy of the project Makefile and adjust it so that it builds your new program.

With these changes, you should be able to specify the number of threads when you run your program; for example, if your program is named mandelChunks, to run it with 1 thread you would enter:

   ./mandelChunks 1
to run it with 2 threads, enter:
   ./mandelChunks 2
to run it with 3 threads, enter:
   ./mandelChunks 3
and so on. If the user neglects to specify the number of threads on the commandline, you should display a "Usage" message that shows them how to run the program and then either terminate abnormally, or proceed using a single thread as the default number of threads.

Run your program a few times with different numbers of threads (or no commandline arguments) to verify that it is working correctly.

When you have Version I of your program fully debugged, you are ready to begin time-trials!

Time Trials. For each thread-value T = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12: run your program three times and record those times in your spreadsheet.

Then use the spreadsheet to compute the medians of those times.

Then use those medians and the medians you computed for the sequential version to compute the SpeedupT for each value of T.

Finally, use your SpeedupT values to compute the EfficiencyT for each value of T.

Version II: Slices

Create a second parallel version of the program, but use OpenMP's version of the "Slices" (aka Chunk-size 1) Parallel Loop in this version.

Run your program a few times using different numbers of threads. If you look carefully at how it is drawing the Mandelbrot set and compare its behavior to that of your "Chunks" version, you should see different graphical behaviors from the two versions. Make certain this new version is working correctly before starting the time-trials for this version.

As always, make certain that both of your programs follow good style guidelines, are well-documented, etc.

Time Trials. For each thread-value T = 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12: run your program three times and record those times in your spreadsheet.

Then use the spreadsheet to compute the medians of those times.

Then use those medians and the median you computed for the sequential version to compute the SpeedupT for each value of T.

Finally, use your SpeedupT values to compute the EfficiencyT for each value of T.

Now that we have collected data for both versions, your next task is to determine which approach (if either) works better for this problem.

Visualizations

To visualize your data, use your spreadsheet to create the following line-charts:

  1. A chart in which you compare the wall-clock times of the two versions of the program, with Time on the Y-axis and the number of threads on the X-axis.
  2. A chart in which you compare the Parallel SpeedupT of the two versions, with Speedup on the Y-axis and the number of threads on the X-axis.
  3. A chart in which you compare the Parallel EfficiencyT of the two versions, with Efficiency on the Y-axis and the number of threads on the X-axis.

As always, give your charts descriptive titles and label their axes. Make certain the lines for each version can be clearly distinguished on a gray-scale printer by using different line-styles (e.g., solid vs. dotted).

Analysis

Here are some things to keep in mind as you analyze your results:

  1. Do you notice anything unexpected about the data you have collected so far?
  2. What are the relevant factors that might be contributing to the behaviors of your two programs? Are there hardware factors that effect the behavior of your multithreaded programs as you increase the number of threads?
  3. Conventional wisdom is that the 'chunks' approach is faster than the 'slices' approach when dealing with arrays. Do your data support or contradict that wisdom?
  4. Which exhibits better Speedup on this problem, 'chunks' or 'slices'? Why?
  5. Which exhibits better Efficiency on this problem, 'chunks' or 'slices'? Why?
  6. What is unusual about the Mandelbrot computation compared to other problems we have examined?
  7. Under what circumstances does the "Chunks" Parallel Loop out-perform the "Slices" Parallel Loop, and vice versa? Can you derive a guiding rule for when to use one version as opposed to the other?
  8. Which 'law' applies in this situation: Amdahl's or Gustafson's? How do your Speedup calculations affect your answer?

Hand In

Hard copies of:

  1. Your 1-to-2 page written analysis in which you compare your programs, including their raw performance as the number of threads increases, their respective speedups and efficiencies, what you have learned about when to use the "Chunks" Parallel Loop vs. the "Slices" Parallel Loop, and anything else you discovered that was interesting, unexpected, or surprising. As always, make your analysis as quantitative as you can, citing specific values from the data you collected to support your claims.
  2. If you used AI on this project, add an extra 1-2 pages in which you indicate which AI you used, the issue(s) you were hoping it would help you solve, the specific prompts you gave it, its responses, and an estimate of how much time it saved (or cost) you.
  3. Your spreadsheet charts and data showing your programs' timing results, speedups, and efficiencies. As always, print your charts one per page, as large as possible, to maximize the amount of detail that can be seen.
  4. A recording of your console using the program script in which you: (i) use cat to list each of your programs, (ii) use make to show that they each compile correctly, and then (iii) show that they each run correctly using 4 threads.
  5. Two screen captures that show your two programs' outputs from the runs using 4 threads.

Please staple these pages together and make certain that your name is on each page.


CS > 374 > Exercises > 05 > Homework Project


This page maintained by Joel Adams.