HPC MPI Exercise 3: Homework Project
Computational models that use the Monte Carlo pattern
model a physical phenomenon using randomness, probability,
and the average of many repetitions or trials
to compute "typical" behavior.
The more trials that are used, the more accurately the model
describes the physical phenomenon.
However, running many trials can turn a fast program into a slow one,
especially as the size of the problem (in this case, the forest size)
increases.
Relatedly, we used only nine burn-probabilities in
the lab exercise,
but in reality there are 101 burn-probabilities (0, 1, 2, ..., 100).
To model the phenomenon accurately, we should include all 101 probabilities.
Your homework assignment this week is to take a sequential version
of the forest-fire model and "parallelize" it using MPI.
Make a new folder for this project;
then download
this tarball of the sequential model
and save it in your project folder.
You can extract the tarball's contents by opening a terminal,
navigating to your project folder, and entering:
tar xvzf firestarter.tgz
Compile it using the included Makefile and then run the program.
This version uses 5000 trials for each of the 101 probabilities
to model the burning of a forest of size 20x20.
When it is done, it displays the 101 burn-probabilities and their
corresponding percentage-burned values,
plus a simple graph that plots the percentage-burned values
against the burn-probabilities.
In the Lab
-
Modify this program so that in addition to the percentage-burned
values, it also calculates and displays the times
(i.e., the number of iterations) for each burn.
You may find the malloc() or calloc() function useful for this.
-
Add MPI timing commands to measure your parallel
program's "wall clock" run-time.
-
Even with a "small" (20x20) forest,
this sequential program takes a non-trivial amount of time
to compute its results.
Rewrite this program using MPI to spread the computation of
these burn-percentage and time values across N processes.
You should be able to develop a good parallel solution
to the problem, using the patterns we have explored so far.
Given that there are only 101 probabilities but the cluster
has nearly 400 cores, think carefully
about how you can best "parallelize" your program to maximize its scalability.
(Hint: Look at the two nested loops where the program spends most of its time.
How could these be rewritten to maximize the program's parallelism
and scalability?)
-
Develop, test, and debug your program,
using the 20x20 forest and 1, 2, and 4 processes.
Using 1, 2, and 4 processes,
for each of the 101 probabilities,
record in your spreadsheet: (i) the probability,
(ii) the percent-burned values,
(iii) the burn-times that your program produces, and
(iv) the MPI wall-clock time.
Create charts for the percent-burned and burn-time values,
and verify that these results are consistent with
those produced by the sequential version,
regardless of the number of processes used.
(If they are markedly different, your program has a bug that you
need to fix before proceeding further.)
-
When your program is fully tested and debugged,
copy it to the cluster.
On The Cluster
On the cluster, build and run your program using
1, 2, 4, 8, 16, 32, 64, 128, and 256 processes and
three forest-sizes--20x20, 40x40, and 80x80--for a total
of 27 different runs.
Note that these sizes are quite small compared to an actual forest,
but they should let us see how increasing the forest size
affects the performance of our model.
As in the past, use 1 cluster-node for your runs of 1, 2, 4, 8, and 16 processes;
then double the number of nodes for each subsequent run.
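The resulting schedule can be tabulated with a short shell sketch (this assumes up to 16 processes fit on one node; the actual launch command depends on your cluster's scheduler, so the mpirun line is only a comment):

```shell
#!/bin/sh
# Print the (processes, nodes) pairs for the 9 runs of one forest-size.
for p in 1 2 4 8 16 32 64 128 256; do
    if [ "$p" -le 16 ]; then
        nodes=1                  # up to 16 processes fit on one node
    else
        nodes=$(( p / 16 ))      # double the nodes for each doubling of p
    fi
    echo "processes=$p nodes=$nodes"
    # e.g.: mpirun -np $p ./fire ...  (adapt to your scheduler)
done
```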
Data Collection and Visualization
For each forest-size:
-
Open a new tab in your spreadsheet and
using your program's output on the cluster,
record your program's wall-clock run-times
when using 1, 2, 4, 8, 16, 32, 64, 128, and 256 processes.
-
Open a new spreadsheet tab and
for all 101 burn-probabilities,
record the burn-probability,
the burn-percentages, and
the burn-times
from your runs on the cluster with P == 256.
Using your spreadsheet, create three line-charts:
-
One that shows how the simulation's wall-clock run-time
varies with P, the number of processes used,
for P=1, 2, 4, 8, 16, 32, 64, 128, and 256.
-
One that shows how the simulation's burn-percentage
varies with the burn-probability,
for all 101 burn-probabilities
when P has the value 256.
-
One that shows how the simulation's burn-time (# of steps)
varies with the burn-probability,
for all 101 burn-probabilities
when P has the value 256.
On each chart, put the first quantity (e.g., wall-clock time,
burn-percentage, or burn-time) on the vertical axis and
the second quantity (e.g., P or burn-probability) on the horizontal axis.
All three charts should feature three lines--one for each forest-size--so
that any effects of increasing the forest size can be compared.
Be sure to use distinct line-styles (e.g., solid, dashed, dotted)
so that the lines are easily distinguished when printed
on a gray-scale printer.
As always,
give each chart a descriptive title, well-labelled axes, etc.
Resize each chart so that it is a full page in size when printed,
to maximize the visible details.
If necessary, copy them into a word processor for printing.
Hand In
Hard copies of:
-
A 1-2 page analysis
-- as quantitative as possible --
of what this model reveals
about the behavior of a fire in a forest,
and why it behaves that way.
In your analysis, you should:
-
Describe the relationship between the burn-probability
and the percentage of the forest that burns (on average).
-
At what probability does the burn-percentage
begin to accelerate (e.g., where is the first
45-degree tangent to the curve)?
-
At what probability does the burn-percentage
begin to decelerate (e.g., where is the second
45-degree tangent to the curve)?
-
At what probability does the curve's inflection point occur
and why is this of interest?
-
Describe the relationship between the burn-probability
and the duration of the forest fire
(according to this model).
-
At exactly what burn-probability does the fire burn the longest?
-
What is happening in the forest prior to this point that
makes the burn-time increase?
-
What is happening in the forest after this point
to make the burn-time decrease?
-
Discuss your program's scalability: How does its performance
on the cluster change as the number of processes increases?
-
Describe how the size of the forest affects:
-
Your program's run-times.
-
The fire's burn-percentages.
-
The fire's burn-times.
As usual, be as quantitative as possible, citing specific data values
to support your claims.
-
Suppose a park has 4 fire-danger levels: green (no danger),
yellow (low danger), orange (moderate danger), and red (high danger),
and that each day, a park ranger measures how dry the forest is
on a 0..100 scale (the higher the number, the drier the forest).
Based on your simulation results,
what dryness values would you recommend the park rangers use
as thresholds to decide whether a given day's fire-danger level is
green, yellow, orange, or red, and why?
-
Describe anything else you discovered that was surprising or interesting
while doing this project.
If you used AI in completing this project,
attach an additional page in which you document your use of it,
as was described in
the
course policies.
-
The charts and spreadsheet data you collected in this exercise;
-
A script file in which you use cat to list your program,
and use make to show that it compiles correctly in the lab.
-
The ".out" file produced from running it on the cluster
using 256 processes and the 20x20 forest.
Please staple these pages together
and make certain that your name is on each page.
This page maintained by
Joel Adams.