HPC MPI Exercise 3: Homework Project -- Monte Carlo Forest Fire
Computational models that use the Monte Carlo pattern
model a physical phenomenon using randomness, probability,
and the average of many trials to compute "typical" behavior.
The more trials that are used, the more accurately the model
describes the physical phenomenon.
However, running many trials can greatly increase a simulation's runtime,
especially as the size of the problem (in this case, the forest size)
increases.
Recall that we only used nine burn-probabilities in
the lab exercise,
but there are 101 burn-probabilities (0, 1, 2, ..., 100).
To accurately model the phenomenon, we must include all 101 probabilities,
in case some probabilities produce unexpected behaviors.
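To make the pattern concrete, here is a minimal sketch of a Monte Carlo
loop in C. Note that burn_one_trial() is a hypothetical stand-in for the
forest-fire simulation -- not the assignment's actual code -- that returns
the fraction of the forest burned in one random trial:

   #include <stdio.h>
   #include <stdlib.h>
   #include <time.h>

   /* hypothetical stub: one random trial for a given burn-probability */
   double burn_one_trial(double burnProb) {
       return (rand() / (double)RAND_MAX) * burnProb;
   }

   int main(void) {
       const int NUM_TRIALS = 10000;
       double burnProb = 0.50;        /* one of the 101 burn-probabilities */
       double total = 0.0;

       srand((unsigned)time(NULL));   /* seed the random number generator */
       for (int trial = 0; trial < NUM_TRIALS; ++trial) {
           total += burn_one_trial(burnProb);  /* accumulate trial results */
       }
       printf("average fraction burned: %f\n", total / NUM_TRIALS);
       return 0;
   }

The real program repeats this averaging for each of the 101 probabilities.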
Your homework assignment this week is to take a sequential version
of the forest-fire simulation and "parallelize" it using MPI.
Make a new folder for this project (e.g., 03/project);
then download
this tarball of the sequential model
and save it in your project folder.
If your browser does not automatically extract the tarball's files,
you can extract its contents manually by opening a terminal,
navigating to the folder containing the tarball, and entering:
tar xvzf Fire.tgz
Compile it using the included Makefile and then run the program.
This simulation uses 10,000 trials for each of the 101 probabilities
to model the burning of a modest 20x20 forest (400 trees).
When the simulation is done, the program displays a table of
the 101 burn-probabilities,
their corresponding percentage-burned values,
and their corresponding 'burn-times' (number of iterations).
It also generates two simple graphs that,
for each of the 101 burn-probabilities, respectively plot
the percentage-burned and 'burn-time' values.
These provide you with quick visualizations of the
values the program is generating.
In the Lab
-
Add MPI timing commands to measure how much "wall clock" time
this sequential program takes
(a timing sketch appears after this list).
To time the full calculation,
start timing before the code that initializes the random number
generator, and stop timing after the for loop that
calculates and prints out the 101 probability-results.
(Do not include the code that plots the xgraph.)
-
Even with a "small" (20x20) forest,
this sequential simulation takes a non-trivial amount of time
to perform its 10,000 trials.
Rewrite this program using MPI to spread the computation of
these burn-percentage and time values across P processes.
You should be able to develop a good parallel solution
to the problem using the patterns we have explored so far;
a decomposition sketch appears after this list.
You may find the chunks.h file to be useful.
-
Test and debug your program
using the 20x20 forest and P = 1, 2, and 4 processes.
In your spreadsheet, record:
-
the MPI wall-clock time.
(As P increases, this time should decrease proportionally.)
-
for each of the 101 burn-probabilities,
(i) the burn-probability,
(ii) the average percent-burned values, and
(iii) the average burn-times (number of iterations)
that your program produces.
Create quick charts for the percent-burned and burn-time values,
and verify that these results are consistent with
those produced by the sequential version,
regardless of the number of processes used.
(If they are markedly different, your program has a bug that you
need to fix before proceeding further.)
-
When your program is fully tested and debugged,
copy it to the cluster.
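For the timing step above, a minimal sketch of where the MPI_Wtime()
calls go (with the program's actual initialization and results loop
elided as comments) is:

   #include <stdio.h>
   #include <mpi.h>

   int main(int argc, char** argv) {
       MPI_Init(&argc, &argv);

       double startTime = MPI_Wtime();   /* start BEFORE seeding the RNG */
       /* ... initialize the random number generator ... */
       /* ... for loop computing and printing the 101 probability-results ... */
       double totalTime = MPI_Wtime() - startTime;  /* stop AFTER that loop */

       printf("wall-clock time: %f secs\n", totalTime);
       /* (the xgraph-plotting code is NOT inside the timed region) */

       MPI_Finalize();
       return 0;
   }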
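For the parallelization step, one possible decomposition (a sketch, not
the only correct design) has each process run its share of the 10,000
trials for every burn-probability and then combine the per-process sums
with MPI_Reduce. As before, burn_one_trial() is a hypothetical stand-in
for the real simulation, and a chunks.h-style block decomposition would
work equally well in place of the strided loop shown here:

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>

   /* hypothetical stub for one random forest-fire trial */
   double burn_one_trial(double burnProb) {
       return (rand() / (double)RAND_MAX < burnProb) ? 1.0 : 0.0;
   }

   int main(int argc, char** argv) {
       const int NUM_TRIALS = 10000;
       int id = 0, numProcs = 1;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &id);
       MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

       srand(id + 1);                    /* distinct seed per process */

       for (int prob = 0; prob <= 100; ++prob) {
           double localSum = 0.0, globalSum = 0.0;
           /* strided loop: each process does ~NUM_TRIALS/numProcs trials */
           for (int trial = id; trial < NUM_TRIALS; trial += numProcs) {
               localSum += burn_one_trial(prob / 100.0);
           }
           MPI_Reduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM,
                      0, MPI_COMM_WORLD);
           if (id == 0) {                /* the master prints the average */
               printf("%3d%%: average fraction burned = %f\n",
                      prob, globalSum / NUM_TRIALS);
           }
       }
       MPI_Finalize();
       return 0;
   }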
On The Cluster
On the cluster, modify your program as necessary for it to build
using this Makefile.
Since we will be running it via SLURM,
you will need to remove the x_graph variables
and other code related to generating the simple plots.
You should also remove the X-graph and display files
from your project folder on the cluster.
When your code builds without errors or warnings,
generate the SLURM file(s) needed to run your program using
P = 1, 2, 4, 8, 16, 32, 64, 128, and 256 processes and
three forest-sizes--20x20, 40x40, and 80x80--a total
of 27 different runs.
Note that the forest-size can be passed to the program
via command-line arguments.
Note also that these sizes are small compared to an actual forest,
but they should let us see how increasing the forest size
affects the performance of our model.
Since each compute node on Borg has 16 cores,
use a single cluster-node for your runs of P = 1, 2, 4, 8, and 16 processes;
then double the number of nodes for each subsequent run.
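As a starting point, a SLURM script for one of these runs might look like
the following sketch. The executable name fire and its forest-size argument
are assumptions (match them to your actual program), and the --nodes and
--ntasks values must be scaled for each run (e.g., 2 nodes for P = 32,
4 nodes for P = 64, and so on):

   #!/bin/bash
   #SBATCH --nodes=1                # P <= 16 fits on one 16-core node
   #SBATCH --ntasks=16              # P, the number of MPI processes
   #SBATCH --output=fire_16_20x20.out

   # hypothetical executable and argument: '20' selects the 20x20 forest
   mpirun -np 16 ./fire 20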
Data Collection and Visualization
For each forest-size:
-
Open a new tab in your spreadsheet and
using your program's output on the cluster,
record your program's wall-clock run-times
when using P = 1, 2, 4, 8, 16, 32, 64, 128, and 256 processes.
Next to the times for which P >= 2, use your spreadsheet to calculate
your program's parallel speedup and parallel efficiency.
(Double- and triple-check the formulas you use to calculate the speedup
and efficiency, as they are easy to get wrong in a spreadsheet;
the standard definitions appear after this list.)
-
Open a new spreadsheet tab and
for all 101 burn-probabilities,
record the burn-probability,
the burn-percentages, and
the burn-times
from your runs on the cluster with P = 256.
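For reference, the standard definitions, with T(P) denoting the wall-clock
time using P processes, are:

   speedup(P)    = T(1) / T(P)
   efficiency(P) = speedup(P) / P

For example, a simulation that takes 100 seconds with 1 process and
10 seconds with 16 processes has a speedup of 10 and an efficiency of
10/16 = 0.625 (62.5%).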
Using your spreadsheet, create five line-charts:
-
One that plots your simulation's wall-clock run-times
against P, the number of processes used,
for P = 1, 2, 4, 8, 16, 32, 64, 128, and 256.
-
One that plots your simulation's parallel speedup against
P = 1, 2, 4, 8, 16, 32, 64, 128, and 256.
-
One that plots your simulation's parallel efficiency against
P = 1, 2, 4, 8, 16, 32, 64, 128, and 256.
-
One that plots your simulation's burn-percentage
against the 101 burn-probabilities
when P = 256.
-
One that plots your simulation's burn-times (# of steps)
against the 101 burn-probabilities
when P = 256.
On each chart, put the first quantity (e.g., wall-clock time,
speedup, efficiency, burn-percentage, or burn-time) on the vertical axis and
the second quantity (e.g., P or burn-probability) on the horizontal axis.
Each chart should feature three lines--one for each forest-size--so
that any effects of increasing the forest size can be seen.
Be sure to use distinct line-styles (e.g., solid, dashed, dotted)
so that the lines may be easily distinguished when printed
on a gray-scale printer.
Since you will be submitting these charts,
give each chart a descriptive title, well-labelled axes, etc.
Resize each chart so that it is a full-page in size when printed,
to maximize the visible details.
If necessary, copy them into a word processor for printing.
Hand In
Hard copies of:
-
A 2-3 page analysis
-- as quantitative as possible --
of what this model reveals
about the behavior of a forest fire,
and why it behaves that way.
In your analysis, you should:
-
Describe the relationship between the burn-probability
and the percentage of the forest that burns (on average).
-
At what burn-probability does the burn-percentage
begin to accelerate (i.e., moving left-to-right,
where does the first 45-degree tangent occur)?
-
At what burn-probability does the burn-percentage
begin to decelerate (i.e., moving left-to-right,
where does the second 45-degree tangent occur)?
-
At what burn-probability does the curve's inflection point occur?
What is its significance for the forest fire?
-
Describe the relationship between the burn-probability
and the duration of the forest fire
(according to this model).
-
At what burn-probability does the fire burn the longest?
-
What is happening to the fire in the forest prior to this point that
makes the burn-time increase?
-
What is happening to the fire in the forest after this point
to make the burn-time decrease?
-
Describe how the size of the forest affects:
-
Your program's run-times.
-
The fire's burn-percentages.
-
The fire's burn-times.
-
Discuss your program's scalability: How do its run-time, parallel speedup,
and parallel efficiency change as P increases?
Does there appear to be an upper bound on its speedup?
Is there a point beyond which its efficiency becomes unacceptable
(i.e., lower than 60%)?
-
Do the data you have collected
corroborate or contradict Amdahl's and Gustafson's Laws?
Why or why not?
-
Suppose a national park has 4 fire-danger levels: green (no danger),
yellow (low danger), orange (moderate danger), and red (high danger),
and that each day, a park ranger measures how dry the forest is
on a 0..100 scale (the higher the number, the drier the forest).
Based on the data you have collected,
what dryness values would you recommend the park rangers use
as thresholds to decide whether a given day's fire-danger level is
green, yellow, orange, or red, and why?
-
Describe anything else that you found surprising or interesting
while doing this project.
As usual, be as quantitative as possible, citing specific data values
to support each of your claims.
If you used AI in completing this project,
attach an additional 1-2 pages in which you document your use of it,
as was described in the course policies.
-
The charts you created in this exercise;
-
The spreadsheet data you collected in this exercise;
-
The final version of your source program from the cluster;
-
The SLURM ".out" file produced by running your program
on the cluster using 256 processes and the 20x20 forest.
Please staple these pages together
and make certain that your name is on each page.