HPC MPI Exercise 3: Homework Project
Computational models that use the Monte Carlo pattern
model a physical phenomenon using randomness, probability,
and the average of many repetitions or trials
to compute "typical" behavior.
The more trials that are used, the more accurately the model
describes the physical phenomenon.
However, running many trials can turn a fast program into a slow one,
especially as the size of the problem (in this case, the forest size)
increases.
Relatedly, we used only nine burn-probabilities in
the lab exercise,
but in reality there are 101 burn-probabilities (0, 1, 2, ..., 100).
To model the phenomenon accurately, we should include all 101 probabilities.
Your homework assignment this week is to take a sequential version
of the forest-fire model and "parallelize" it using MPI.
Make a new folder for this project;
then download
this tarball of the sequential model
and save it in your project folder.
You can extract the tarball's contents by opening a terminal,
navigating to your project folder, and entering:
tar xvzf firestarter.tgz
Compile it using the included Makefile and then run the program.
This version uses 5000 trials for each of the 101 probabilities
to model the burning of a forest of size 20x20.
When it is done, it displays the 101 burn-probabilities and their
corresponding percentage-burned values,
plus a simple graph that plots the percentage-burned values
against the burn-probabilities.
In the Lab
-
Modify this program so that in addition to the percentage-burned
values, it also calculates and displays the times
(i.e., the number of iterations) for each burn.
You may find the malloc() or calloc() function useful for this.
-
Add MPI timing commands to measure your parallel
program's "wall clock" run-time.
-
Even with a "small" (20x20) forest,
this sequential program takes a non-trivial amount of time
to compute its results.
Rewrite this program using MPI to spread the computation of
these burn-percentage and time values across N processes.
You should be able to develop a good parallel solution
to the problem, using the patterns we have explored so far.
Given that there are only 101 probabilities but the cluster
has nearly 400 cores, think carefully
about how you can best "parallelize" your program to maximize its scalability.
(Hint: Look at the two nested loops where the program spends most of its time.
How could these be rewritten to maximize the program's parallelism
and scalability?)
-
Develop, test, and debug your program,
using the 20x20 forest and 1, 2, and 4 processes.
Using 1, 2, and 4 processes,
for each of the 101 probabilities,
record in your spreadsheet: (i) the probability,
(ii) the percent-burned values,
(iii) the burn-times that your program produces, and
(iv) the MPI wall-clock time.
Create charts for the percent-burned and burn-time values,
and verify that these results are consistent with
those produced by the sequential version,
regardless of the number of processes used.
(If they are markedly different, your program has a bug that you
need to fix before proceeding further.)
-
When your program is fully tested and debugged,
copy it to the cluster.
On The Cluster
On the cluster, build and run your program using
1, 2, 4, 8, 16, 32, 64, 128, and 256 processes and
three forest-sizes--20x20, 40x40, and 80x80--for a total
of 27 different runs.
Note that these sizes are quite small compared to an actual forest,
but they should let us see how increasing the forest size
affects the performance of our model.
As in the past, use 1 cluster-node for your runs of 1, 2, 4, 8, and 16 processes;
then double the number of nodes for each subsequent run.
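The resulting schedule can be tabulated with a short shell sketch (this assumes up to 16 processes fit on one node; the actual launch command depends on your cluster's scheduler, so the mpirun line is only a comment):

```shell
#!/bin/sh
# Print the (processes, nodes) pairs for the 9 runs of one forest-size.
for p in 1 2 4 8 16 32 64 128 256; do
    if [ "$p" -le 16 ]; then
        nodes=1                  # up to 16 processes fit on one node
    else
        nodes=$(( p / 16 ))      # double the nodes for each doubling of p
    fi
    echo "processes=$p nodes=$nodes"
    # e.g.: mpirun -np $p ./fire ...  (adapt to your scheduler)
done
```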
Data Collection and Visualization
For each forest-size:
-
Open a new tab in your spreadsheet and
using your program's output on the cluster,
record your program's wall-clock run-times
when using 1, 2, 4, 8, 16, 32, 64, 128, and 256 processes.
-
Open a new spreadsheet tab and
for all 101 burn-probabilities,
record the burn-probability,
the burn-percentages, and
the burn-times
from your runs on the cluster with P == 256.
Using your spreadsheet, create three line-charts:
-
One that shows how the simulation's wall-clock run-time
varies with P, the number of processes used,
for P=1, 2, 4, 8, 16, 32, 64, 128, and 256.
-
One that shows how the simulation's burn-percentage
varies with the burn-probability,
for all 101 burn-probabilities
when P has the value 256.
-
One that shows how the simulation's burn-time (# of steps)
varies with the burn-probability,
for all 101 burn-probabilities
when P has the value 256.
On each chart, put the first quantity (e.g., wall-clock time,
burn-percentage, or burn-time) on the vertical axis and
the second quantity (e.g., P or burn-probability) on the horizontal axis.
All three charts should feature three lines--one for each forest-size--so
that any effects of increasing the forest size can be compared.
Be sure to use distinct line-styles (e.g., solid, dashed, dotted)
so that the lines are easily distinguished when printed
on a gray-scale printer.
As always,
give each chart a descriptive title, well-labelled axes, etc.
Resize each chart so that it is a full page in size when printed,
to maximize the visible details.
If necessary, copy them into a word processor for printing.
Hand In
Hard copies of:
-
A 1-2 page analysis
-- as quantitative as possible --
of what this model reveals
about the behavior of a fire in a forest,
and why it behaves that way.
In your analysis, you should:
-
Describe the relationship between the burn-probability
and the percentage of the forest that burns (on average).
-
At what probability does the burn-percentage
begin to accelerate (e.g., where is the first
45-degree tangent to the curve)?
-
At what probability does the burn-percentage
begin to decelerate (e.g., where is the second
45-degree tangent to the curve)?
-
At what probability does the curve's inflection point occur
and why is this of interest?
-
Describe the relationship between the burn-probability
and the duration of the forest fire
(according to this model).
-
At exactly what burn-probability does the fire burn the longest?
-
What is happening in the forest prior to this point that
makes the burn-time increase?
-
What is happening in the forest after this point
to make the burn-time decrease?
-
Discuss your program's scalability: How does its performance
on the cluster change as the number of processes increases?
-
Describe how the size of the forest affects:
-
Your program's run-times.
-
The fire's burn-percentages.
-
The fire's burn-times.
As usual, be as quantitative as possible, citing specific data values
to support your claims.
-
Suppose a park has 4 fire-danger levels: green (no danger),
yellow (low danger), orange (moderate danger), and red (high danger),
and that each day, a park ranger measures how dry the forest is
on a 0..100 scale (the higher the number, the drier the forest).
Based on your simulation results,
what dryness values would you recommend the park rangers use
as thresholds to decide whether a given day's fire-danger level is
green, yellow, orange, or red, and why?
-
Describe anything else you discovered that was surprising or interesting
while doing this project.
If you used AI in completing this project,
attach an additional page in which you document your use of it,
as was described in
the
course policies.
-
The charts and spreadsheet data you collected in this exercise;
-
A script file in which you use cat to list your program,
and use make to show that it compiles correctly in the lab.
-
The ".out" file produced from running it on the cluster
using 256 processes and the 20x20 forest.
Please staple these pages together
and make certain that your name is on each page.
This page maintained by
Joel Adams.