HPC Project 2: The Master-Worker & Message-Passing Patterns


This week's project gives you practice with the master-worker and message-passing patterns: you are to write a program that sends a message around a ring of processes. In outline, process 0 (the master) starts the message, each process in turn adds its rank and passes the message to the next process, and process P-1 returns the message to the master.

If all is well, the master process should display the number of processes, the message it received from process P-1, and the time to traverse the ring. For example, with 8 processes, the master might display something like this:
      8 processes: 
      [0 1 2 3 4 5 6 7]
      time: 0.15324 secs.
Note that in your final version, no process except the master should perform any I/O.

In many solutions to this problem, the 'message' being sent and received is a char array or string. In your program, the 'message' must be a dynamically allocated integer array.

More precisely, each MPI process should allocate an integer array that is exactly big enough to store the ranks of the MPI processes performing the computation. I recommend you use the C-standard calloc() function to allocate this array, since (unlike malloc()) it auto-initializes the array's elements to 0 values. This array can then be used as the 'message' to be sent and received.
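As a minimal sketch of that allocation (the helper name allocateMessage() and its parameter numProcs are assumptions made for illustration, not a required interface), the array might be created like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a zero-filled array with one slot
 * per process rank. Returns NULL if the allocation fails; the caller
 * releases the array with free(). */
int* allocateMessage(int numProcs) {
    assert(numProcs > 0);  /* a ring needs at least one process */
    /* calloc() zeroes every element, unlike malloc() */
    return calloc((size_t)numProcs, sizeof(int));
}
```

Each process would call this once, after it learns the number of processes, check the result against NULL, and free() the array before exiting.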

For the sake of efficiency, when a process sends its 'message', it should only send the minimally required number of integers. That is, process 0 should send a single integer [0], process 1 should send two integers [0,1], ..., and process P-1 should send P integers [0, 1, ..., P-1].
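One way to keep these counts straight is to compute them from a process's rank. The helper functions below are hypothetical (they are not part of MPI); the sketch assumes the values they return would be supplied as the count, destination, and source arguments of MPI_Send() and MPI_Recv():

```c
#include <assert.h>

/* Rank that 'rank' sends to; process P-1 wraps around to the master (0). */
int nextInRing(int rank, int numProcs) {
    assert(numProcs > 0 && rank >= 0 && rank < numProcs);
    return (rank + 1) % numProcs;
}

/* Rank that 'rank' receives from; the master (0) receives from P-1. */
int prevInRing(int rank, int numProcs) {
    assert(numProcs > 0 && rank >= 0 && rank < numProcs);
    return (rank + numProcs - 1) % numProcs;
}

/* Number of integers 'rank' sends: the ranks 0..rank, so rank + 1 values. */
int sendCount(int rank) {
    assert(rank >= 0);
    return rank + 1;
}

/* Number of integers 'rank' receives: whatever its predecessor sent. */
int recvCount(int rank, int numProcs) {
    return sendCount(prevInRing(rank, numProcs));
}
```

With 8 processes, for example, process 7 sends sendCount(7) = 8 integers to process nextInRing(7, 8) = 0, and the master receives recvCount(0, 8) = 8 integers from process prevInRing(0, 8) = 7.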

To keep your program modular, your program should define and call a printResults() function that takes and displays an integer array, the size of the array, the number of processes, and the time to traverse the ring.

This function can also be used to help debug your program, by displaying the array each process receives. But once your program is debugged, only your master process should call this function, since worker processes should not perform any I/O. To keep your code clean, be sure you remove such 'debugging statements' and any other unnecessary code from the final version of your program.
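A printResults() function along the lines described above might look like the following sketch. The parameter names, the exact output layout (modeled on the sample output shown earlier), and the fprintResults() helper that takes an output stream are all assumptions made for illustration; only the name printResults() and the four pieces of information it receives come from the requirements.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the results to the given stream. */
void fprintResults(FILE* out, const int* array, int size,
                   int numProcs, double seconds) {
    assert(out != NULL && (array != NULL || size == 0));
    fprintf(out, "%d processes:\n[", numProcs);
    for (int i = 0; i < size; ++i) {
        if (i > 0) {
            fputc(' ', out);  /* single space between values */
        }
        fprintf(out, "%d", array[i]);
    }
    fprintf(out, "]\ntime: %.5f secs.\n", seconds);
}

/* Displays the array, the number of processes, and the traversal time. */
void printResults(const int* array, int size, int numProcs, double seconds) {
    fprintResults(stdout, array, size, numProcs, seconds);
}
```

Separating the stream from the formatting is a design choice, not a requirement: it lets the same code write to stdout in the final program and to a file while testing.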

Your program should be fully documented.

In the CS labs:

  1. Write your program and test it using different arguments for the -np switch, so that you are confident it works correctly before continuing.
  2. Using the MPI_Wtime() function, add code that causes the master process to calculate how long it takes the message to circulate around the "ring". Have the master output the number of processes, the message it received and the time, but do not include any I/O operations in your timing -- your timing should just measure how long it takes to transmit the message around the ring.

  3. Since you are competing for network bandwidth and CPU cycles with others in the lab, this time may vary significantly from execution to execution, even using the same number of processes.

    To measure this variance, run your program 3 times for a given number of processes. In a spreadsheet, record these three times; then use the spreadsheet to compute the max, min, median, average, and standard deviation of the three trip-time measurements. Use this procedure to test, record, and determine these statistics for rings of 4, 8, 16, 32, 64, 128, and 256 processes. (Do not use a calculator--use your spreadsheet's built-in functions to calculate these statistics!)

    Be careful to only compute the time it takes the message to traverse the ring -- your master process should only time the interval from before sending its message to after receiving its message.

On the cluster:

  1. On the cluster (borg), make a new directory for this project. Transfer your program from the lab to this new directory on the cluster. Copy the Makefile you used last week, modify it to work with this week's program, and then use it to compile your program. (See me or one of your classmates if you need help with this.)
  2. Copy your SLURM script-files from last week's projects and modify them as necessary for this project. To get SLURM to run your program multiple times on your behalf, make the changes that cause SLURM to run your program as an array job, running it 3 times and storing each job's output in a different file named slurm-, followed by the master job number (the number SLURM reports when you submit your job via sbatch), followed by -, followed by the job's array index number (1-3), followed by .out. To illustrate, if SLURM reported my job number as 25268, the output files it generates would be slurm-25268-1.out, slurm-25268-2.out, and slurm-25268-3.out.
  3. Repeat the timing experiments you performed in the CS lab, recording three ring-traversal times for rings of 4, 8, 16, 32, 64, 128, and 256 processes.
  4. For each different ring-size, use your spreadsheet to compute the same statistics of the three traversal-times that you computed in the lab (max, min, median, average, standard deviation).

Visualization:

In your spreadsheet, create a line-chart that plots the average times you computed for the CS Lab vs. on the cluster. The X-axis should be P, the number of processes; the Y-axis should be the time. Give your chart a descriptive title, be sure to label your axes (including any units), and include a legend that indicates which system each line represents in a way that will be clear on a gray-scale printer (e.g., use a solid line for one and a dashed line for the other). Your X-axis markers should be the actual P-values you used: 4, 8, 16, 32, 64, 128, and 256 processes.

Then create a second similar line-chart in which you plot the standard deviations you computed for each ring-size, with the number of processes P on the X-axis and the standard deviation on the Y-axis.

Size both of these charts so that each will occupy an entire page when printed, to reveal as much detail as possible. Since our X-axis values are growing by powers of 2, use a log-scale on the Y-axes to visualize these relationships. If necessary, copy-paste these charts into a word-processing program for printing.

Hand In

Hard copies of:
  1. A 1-to-2 page written analysis in which you interpret your line-charts and spreadsheet data, discussing the statistics you recorded, especially the averages and standard deviations. Explain what you observe, provide hypotheses for your observations, and support your hypotheses with evidence. Make your analysis as quantitative as possible, citing actual values from the data you collected to support all claims and hypotheses you make.

    Compare the data you collected this week with what you collected last week. If you notice significant differences, quantify those differences and provide evidence-based hypotheses that might explain those differences.

    If you used AI in completing this project, attach an additional page or two in which you document your use of it, as described in the course policies.

  2. Your two line-charts.
  3. The spreadsheet data you used to create the line-charts.
  4. Your program's source code, printed with enscript or a2ps.
  5. The SLURM .out file produced by your minimum-time 256-process run on the cluster, also printed with enscript or a2ps.
Please staple these pages together and make certain that your name is on each page.



This page maintained by Joel Adams.