HPC Project 3: Computational Modeling


Overview

In this exercise, we want to introduce the idea of computational modeling -- the modeling of real-world processes or phenomena in software.

Computational modeling is useful in situations where experimenting with a real-world physical process is impractical, costly, or dangerous. A computer model of the physical phenomenon lets us experiment with the phenomenon, and perhaps come to understand how the process works, without the limitations, cost, or danger found in the real world.

The physical process we'll be modeling is a forest fire. The model we'll use is simplified so that we can understand it easily, which reduces its representational accuracy. This simplicity-vs-accuracy tradeoff is common in modeling: models are (by definition) imperfect representations of the phenomena they represent, and the more simplified the model, the less accurately it represents the complexities of the real world.

Getting Started

Begin by experimenting with this fire applet from the Shodor Foundation's Interactivate Project. This model lets you start a forest fire by clicking on a tree, which lights it on fire. The fire then spreads probabilistically to neighboring trees, based upon the burn-probability, which models the likelihood of a fire spreading from a tree to its neighbors (e.g., perhaps reflecting local moisture levels).

When the fire burns out, the model displays the percentage of the forest that burned, and the time (number of iterations) the fire burned. Once your forest fire has burned itself out, you can "regrow" the forest by clicking the Regrow Forest button. Take a few minutes to experiment with the model, to get a sense of how it works.

Exercise

We want to use this model to explore these questions:

Click the Highlight Center Tree button to make it clear which tree is the center tree. Then for each of the burn probabilities 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9:
  1. Start a forest fire at the center tree, noting the percentage of the forest that burns, and the time (iterations) the forest burns in a spread sheet.
  2. Regrow the forest.
When you have finished, use the spreadsheet to create a chart in which you plot the percentage-burned and time (iterations) for each burn-probability. Compare your chart to the charts of your classmates. How do they compare?

This model is an example of a class of computations called monte carlo simulations, that use statistical probabilities and pseudo-random numbers to represent physical phenomena. As a result, the model is non-deterministic: if you rerun the model several times with the exact same inputs (burn-probability, center tree), you are likely to get different outputs (percentage-burned, time). For this reason, it is usually necessary to do many trials or repetitions of a monte carlo simulation, and then average the results, to get a sense of what is most likely to happen (on average).

Start a forest fire 2 more times for each burn probability, and add the percent-burned and time results to your spreadsheet so that you have six measurements (three percent-burned and three time) for each burn-probability. Then calculate the average of the three percent-burned and time measurements for each burn-probability, and create a new chart plotting these averages against the burn-probabilities. How does your new chart compare to your first one? How does it compare to your classmates' charts? Is they more or less similar than before?

Homework

As you can see, monte carlo models need to compute average results over many repetitions or trials to produce results that are "accurate." However these trials can turn a fast program into a time-consuming one, especially as the size of the problem (e.g., the forest size) increases.

Your homework assignment this week is to take a sequential version of this model (written in C), and "parallelize" it using MPI. Download this zip file of the sequential model, extract its contents (tar xvzf firestarter.tgz), compile it using the included Makefile, and then run the program. This version (by default) uses 101 probabilities (0, 1, 2, ... 100) and 5000 trials to burn a forest of size 20x20. When it is done, it displays the burn-probabilities and their corresponding percentage-burned values, plus a graph plotting these values.

  1. Modify this program so that in addition to the percentage-burned values, it also calculates and displays the times (in number of iterations) for each burn. You may find it useful to use the calloc command instead of the malloc command.
  2. Even with a "small" (20x20) forest, this program takes "a long time" to compute its results. Rewrite this program using MPI commands to spread the computation of these burn-percentage and time values across N processes. Try to develop an embarassingly parallel solution to the problem. (Hint: use a similar approach as the circuit-solving problem uses.)
  3. Add MPI timing commands to benchmark how fast your parallel program runs in "wall clock" time.
  4. Run your program on three forest-sizes: 20x20, 40x40, 80x80 (none of which is all that big for an actual forest). For each forest, plot your program's run-times for 1, 2, 4, 6, 8, 10, 12, 14, and 16 PEs.

Hand In


Up to the
HPC Homework Page
Up to the
Calvin HPC Course Page

This page maintained by Joel Adams.