HPC Project 2: The Master-Worker & Message-Passing Patterns

This week's project gives you practice using the master-worker and message-passing patterns. Your task is to write a program that uses these patterns to send a message around a ring of processes. More precisely:

The master process (i.e., rank 0) should:
1. Record the starting time.
2. Create a message containing its rank.
3. Send that message to the next process (i.e., with rank 1).
4. Receive a message from the last worker process (i.e., with rank P-1, where P is the number of processes).
5. Display the number of processes, the message received, and the time required for the message to traverse the ring.

Each worker process i should:
1. Receive a message from the process "before" it in the ring (i.e., rank i-1).
2. Determine the number of values received and verify that it is correct (process i should receive exactly i integers, the ranks 0 through i-1).
3. Append its rank (i.e., i) to the message it just received.
4. Send this new message on to the next process (i.e., with rank i+1), using the modulus operation to "wrap around" from the last worker to the master.
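
The "wrap around" in the last step can be sketched as a pair of small helper functions. This is only an illustration: the names nextRank() and prevRank() are hypothetical, and numProcs stands for P, the number of processes (the value an MPI program would obtain from MPI_Comm_size()). Note that in C99 and later, (rank - 1) % numProcs evaluates to -1 when rank is 0, so the "previous" computation adds numProcs before taking the modulus.

```c
#include <assert.h>

/* Hypothetical helpers; the names are illustrative, not required by the assignment. */

/* Rank of the process that follows `rank` in a ring of numProcs processes. */
int nextRank(int rank, int numProcs) {
    assert(numProcs > 0 && rank >= 0 && rank < numProcs);
    return (rank + 1) % numProcs;            /* last worker wraps around to 0 */
}

/* Rank of the process that precedes `rank` in the ring. */
int prevRank(int rank, int numProcs) {
    assert(numProcs > 0 && rank >= 0 && rank < numProcs);
    return (rank - 1 + numProcs) % numProcs; /* adding numProcs avoids a negative result */
}
```

With 8 processes, nextRank(7, 8) is 0 (the last worker sends to the master) and prevRank(0, 8) is 7 (the master receives from the last worker).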

If all is well, the master process should display the number of processes, the message it received from process P-1, and the time to traverse the ring. For example, with 8 processes, the master might display something like this:

      8 processes: 
      [0 1 2 3 4 5 6 7]
      time: 0.15324 secs.

Note that in your final version, no process except the master should perform any I/O.

In many solutions to this problem, the 'message' being sent and received is a char array or string. In your program, the 'message' must be a dynamically allocated integer array.

More precisely, each MPI process should allocate an integer array that is exactly big enough to store the ranks of the MPI processes performing the computation. I recommend you use the C-standard calloc() function to allocate this array, since (unlike malloc()) it auto-initializes the array's elements to 0 values. This array can then be used as the 'message' to be sent and received.
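
A minimal sketch of that allocation, assuming a hypothetical helper named allocMessage() and a numProcs value that has already been obtained (for example from MPI_Comm_size()):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a message buffer with exactly one slot per
 * process. calloc() zero-fills the memory, so every element starts at 0.
 * Returns NULL if the allocation fails; the caller must free() the result. */
int *allocMessage(int numProcs) {
    assert(numProcs > 0);
    return calloc((size_t)numProcs, sizeof(int));
}
```

Each process would call this once at startup, check the result for NULL, and free() the buffer before exiting.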

For the sake of efficiency, when a process sends its 'message', it should only send the minimally required number of integers. That is, process 0 should send a single integer [0], process 1 should send two integers [0,1], ..., and process P-1 should send P integers [0, 1, ..., P-1].
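
One way to picture the growing message is a helper that a process calls after receiving: it checks that the number of integers received equals its own rank, stores its rank in the next free slot, and returns the number of integers to send on. This is a sketch in plain C with the MPI calls deliberately left out. The name appendRank() and its error convention are hypothetical; in an MPI program the value of `received` would come from the receive status (for example via MPI_Get_count()).

```c
#include <assert.h>

/* Hypothetical helper, not part of the assignment's required interface.
 *   message  : buffer with room for numProcs integers
 *   received : how many integers arrived from the previous process
 *              (0 for the master, which starts the ring)
 * Returns the number of integers to send on, or -1 if the count is wrong. */
int appendRank(int *message, int received, int rank, int numProcs) {
    assert(message != 0 && rank >= 0 && rank < numProcs);
    if (received != rank) {
        return -1;             /* process i expects exactly i integers */
    }
    message[rank] = rank;      /* append own rank after the received values */
    return received + 1;       /* process i sends i + 1 integers */
}
```

The return value is the count a process would pass to its send call, so process 0 sends 1 integer and process P-1 sends P integers.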

To keep your program modular, your program should define and call a printResults() function that takes and displays an integer array, the size of the array, the number of processes, and the time to traverse the ring.
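
A possible shape for such a function is sketched below. The parameter order, the helper formatMessage(), and the five-decimal time format are assumptions chosen to match the sample output shown earlier; they are not requirements of the assignment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: write the array as "[0 1 2 ...]" into buf.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if buf is too small. */
int formatMessage(char *buf, size_t bufSize, const int *array, int size) {
    assert(buf != NULL && size >= 0);
    int n = snprintf(buf, bufSize, "[");
    if (n < 0 || (size_t)n >= bufSize) return -1;
    size_t used = (size_t)n;
    for (int i = 0; i < size; i++) {
        n = snprintf(buf + used, bufSize - used, i == 0 ? "%d" : " %d", array[i]);
        if (n < 0 || (size_t)n >= bufSize - used) return -1;
        used += (size_t)n;
    }
    n = snprintf(buf + used, bufSize - used, "]");
    if (n < 0 || (size_t)n >= bufSize - used) return -1;
    return (int)(used + (size_t)n);
}

/* Display the number of processes, the message, and the traversal time. */
void printResults(const int *array, int size, int numProcs, double seconds) {
    /* Each int needs at most 11 characters plus a separating space;
     * the extra 3 cover the two brackets and the terminator. */
    size_t bufSize = (size_t)size * 12 + 3;
    char *buf = malloc(bufSize);
    if (buf == NULL || formatMessage(buf, bufSize, array, size) < 0) {
        free(buf);
        return;
    }
    printf("%d processes:\n%s\ntime: %.5f secs.\n", numProcs, buf, seconds);
    free(buf);
}
```

Separating the formatting from the printing keeps the I/O in one place, which makes it easier to confirm that only the master process performs output in the final version.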

This function can also be used to help debug your program, by displaying the array each process receives. But once your program is debugged, only your master process should call this function, since worker processes should not perform any I/O. To keep your code clean, be sure you remove such 'debugging statements' and any other unnecessary code from the final version of your program.

Your program should be fully documented.

In the CS labs:

Write your program and test it using different arguments for the -np switch, so that you are confident it works correctly before continuing.
Using the MPI_Wtime() function, add code that causes the master process to calculate how long it takes the message to circulate around the "ring". Have the master output the number of processes, the message it received and the time, but do not include any I/O operations in your timing -- your timing should just measure how long it takes to transmit the message around the ring.
Since you are competing for network bandwidth and CPU cycles with others in the lab, this time may vary significantly from execution to execution, even using the same number of processes.
To measure this variance, run your program 3 times for a given number of processes. In a spreadsheet, record these three times; then use the spreadsheet to compute the max, min, median, average, and standard deviation of the three trip-time measurements. Use this procedure to test, record, and determine these statistics for rings of 4, 8, 16, 32, 64, 128, and 256 processes. (Do not use a calculator--use your spreadsheet's built-in functions to calculate these statistics!)
Be careful to only compute the time it takes the message to traverse the ring -- your master process should only time the interval from before sending its message to after receiving its message.
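
For reference, with three trip times t_1, t_2, t_3 recorded for a given number of processes, the average and standard deviation reduce to the formulas below, and the median is simply the middle value once the three times are sorted. This assumes the sample form of the standard deviation (the n-1 form that the STDEV function in common spreadsheets computes); the assignment does not say which form to use, so treat that choice as an assumption.

```latex
\bar{t} = \frac{t_1 + t_2 + t_3}{3},
\qquad
s = \sqrt{\frac{(t_1 - \bar{t})^2 + (t_2 - \bar{t})^2 + (t_3 - \bar{t})^2}{3 - 1}}
```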

On the cluster:

On the cluster (borg), make a new directory for this project. Transfer your program from the lab to this new directory on the cluster. Copy the Makefile you used last week, modify it to work with this week's program, and then use it to compile your program. (See me or one of your classmates if you need help with this.)
Copy your SLURM script-files from last week's projects and modify them as necessary for this project. To get SLURM to run your program multiple times on your behalf:
- Add this SBATCH directive to your script:
```
     #SBATCH --array=1-3                     # array job: run 3 times
```
- Modify the --output line of your script as follows:
```
     #SBATCH --output=slurm-%A-%a.out        # send stdout/err to this file
```
These changes will cause SLURM to run your program as an array job, running it 3 times and storing each job's output in a different file named slurm-, followed by the master job number (the number SLURM reports when you submit your job via sbatch), followed by -, followed by the job's array index number (1-3), followed by .out. To illustrate, if SLURM reported my job number as 25268, the output files it generates would be slurm-25268-1.out, slurm-25268-2.out, and slurm-25268-3.out.
Repeat the timing experiments you performed in the CS lab, recording three ring-traversal times for rings of 4, 8, 16, 32, 64, 128, and 256 processes.
For each different ring-size, use your spreadsheet to compute the same statistics of the three traversal-times that you computed in the lab (max, min, median, average, standard deviation).

Visualization:

In your spreadsheet, create a line-chart that plots the average times you computed for the CS Lab vs. on the cluster. The X-axis should be P, the number of processes; the Y-axis should be the time. Give your chart a descriptive title, be sure to label your axes (including any units), and include a legend that indicates which system each line represents in a way that will be clear on a gray-scale printer (e.g., use a solid line for one and a dashed line for the other). Your X-axis markers should be the actual P-values you used: 4, 8, 16, 32, 64, 128, and 256 processes.

Then create a second similar line-chart in which you plot the standard deviations you computed for each ring-size, with the number of processes P on the X-axis and the standard deviation on the Y-axis.

Size both of these charts so that each will occupy an entire page when printed, to reveal as much detail as possible. Since our X-axis values are growing by powers of 2, use a log-scale on the Y-axes to visualize these relationships. If necessary, copy-paste these charts into a word-processing program for printing.

Hand In

Hard copies of:

A 1-to-2 page written analysis in which you interpret your line-charts and spreadsheet data, discussing the statistics you recorded, especially the averages and standard deviations. To give a few examples:
- What is the relationship between the ring-traversal times and P, the number of processes? Why is this the case? Is this relationship the same in the lab and on the cluster, or different?
- How do the times in the lab compare to the times on the cluster? Why?
- How close to one another are the typical average and median values in the lab? Are the differences between these values on the cluster similar or different? Why?
- Is the average typically closer to the max or the min value? Is the same true of the median or is it different? What would explain this?
- Using the standard deviations, how does the variance in the lab compare to the variance on the cluster?
Explain what you observe; provide hypotheses for your observations, and support your hypotheses with evidence. Make your analysis as quantitative as possible, citing actual values from the data you collected to support all claims and hypotheses you make.
Compare the data you collected this week with what you collected last week. If you notice significant differences, quantify those differences and provide evidence-based hypotheses that might explain those differences.
If you used AI in completing this project, attach an additional page or two in which you document your use of it, as described in the course policies.
Your two line-charts.
The spreadsheet data you used to create the line-charts.
Your program's source code, printed with enscript or a2ps.
The Slurm .out file produced by your minimum-time 256-process run on the cluster, also printed with enscript or a2ps.

Please staple these pages together and make certain that your name is on each page.


This page maintained by Joel Adams.