HPC MPI Exercise 4: Hands-On Lab

Part I: Collective Communication

The first part of this week's exercise is to explore MPI's collective communication patterns, which include:

The broadcast pattern.
The reduction pattern.
The scatter pattern.
The gather pattern.
The scatterv and gatherv patterns.
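The scatterv and gatherv patterns differ from scatter and gather in that each process's chunk may have a different size, so the root must supply an array of counts and an array of displacements (starting offsets). As a small sketch, assuming the common strategy of giving the first N % P processes one extra item, the hypothetical helper below (it is not part of the lab's source files) computes those two arrays in plain C:

```c
/* Hypothetical helper (not part of the lab's source files): fills the
 * counts[] and displs[] arrays that MPI_Scatterv() and MPI_Gatherv()
 * expect, spreading n items as evenly as possible over numProcs
 * processes. The first (n % numProcs) processes get one extra item. */
void computeCountsAndDispls(int n, int numProcs, int counts[], int displs[]) {
    int base = n / numProcs;      /* items every process receives */
    int extra = n % numProcs;     /* leftover items to hand out   */
    int offset = 0;
    for (int rank = 0; rank < numProcs; rank++) {
        counts[rank] = base + (rank < extra ? 1 : 0);
        displs[rank] = offset;    /* where this rank's chunk starts */
        offset += counts[rank];
    }
}
```

For example, 10 items over 4 processes yields counts {3, 3, 2, 2} and displacements {0, 3, 6, 8}. In an MPI program, the root would pass these arrays as the sendcounts and displs arguments of MPI_Scatterv() (or as the recvcounts and displs arguments of MPI_Gatherv()), and each process would use counts[rank] as its own receive count.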

Each of these patterns has its own folder, containing a source program that you may use to explore that pattern and a Makefile to build the program. The source program's opening comment includes a mini-exercise you can do to explore the behavior of that program.

Create a new folder for this week's work. Then for each pattern: create a new folder and download the source program and Makefile to that folder. Build and run the program, and compare its output to the source. Use the exercise described in the source program's opening comment to experiment with the program, until you understand how the pattern works.

When you are comfortable with each of these communication patterns and understand how they differ from one another, continue on to Part II.

Part II: MPI Parallel I/O

The second part of this week's lab exercise is to explore different aspects of file input. The ideas we'll explore also apply to file output, but we will focus on input today.

To avoid wasting space, we will all read from the same set of (large) data files, which are stored in the directory: /home/cs/374/exercises/04/. Verify that you have access to this directory by entering:

   ls /home/cs/374/exercises/04

Note that this directory is located on a file server, which is across the network from your workstation. Reading from shared files on a file server is:

convenient (you can access these files from any lab workstation), and
it saves space (we avoid redundant copies of large files),

but this remote-access arrangement may affect your timings and/or their consistency during this exercise.

A. Sequential Text Input.

Begin by making a new folder named seqTextIn, cd to it, and then download the files seqTextIn.c and Makefile from this folder.

Take a few minutes to look at the program in seqTextIn.c. It is a fairly simple program that reads the contents of a text file containing randomly generated double values into an array. So that the program can determine the size of the array, the first line of this text file is an integer N that indicates the number of double values in the file. After opening the file, the program reads this value N, allocates an array of N double elements, and then uses a traditional input loop to read values from the file into the array.
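The read pattern described above can be sketched as follows. This is a hypothetical function, not the code in seqTextIn.c; it assumes the values are separated by whitespace (for example, one per line), and it reads from an already-open FILE* so that it can be tried out on a temporary file:

```c
#include <stdio.h>
#include <stdlib.h>

/* A minimal sketch (not the actual seqTextIn.c) of the read pattern:
 * the first value in the stream is the count N, followed by N
 * whitespace-separated doubles. Returns a malloc'd array and stores N
 * in *nOut, or returns NULL on a format or memory error. */
double* readTextDoubles(FILE* in, long* nOut) {
    long n;
    if (fscanf(in, "%ld", &n) != 1 || n < 0) {
        return NULL;                          /* missing or bad header */
    }
    double* values = malloc((n > 0 ? n : 1) * sizeof(double));
    if (values == NULL) {
        return NULL;
    }
    /* Traditional input loop: one fscanf() call per value. */
    for (long i = 0; i < n; i++) {
        if (fscanf(in, "%lf", &values[i]) != 1) {
            free(values);                     /* file ended too soon */
            return NULL;
        }
    }
    *nOut = n;
    return values;
}
```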

Identify the statements that open the file, fill the array, and close the file. Then surround this group of statements with calls to MPI_Wtime() to time how long it takes the program to perform this group of statements.

You will also need to 'wrap' these calls to MPI_Wtime() in calls to MPI_Init() and MPI_Finalize(), and perhaps add #include <mpi.h> to the program's include-directives.

Finally, modify the printf() at the end so that it also reports how long the program took to open, read, and close the file.

Having made these changes, use the make command to compile the program. Continue when your program compiles without errors.

To run the program, enter:

   ./seqTextIn /home/cs/374/exercises/04/1m-doubles.txt

Your program should run, reporting that it read 1,000,000 double values and the time it took to do so. Note that the time reported is quite short, and short timings can be inaccurate, so we will need to take the average of several trials.

Open a spreadsheet, write Sequential Text Input in the first row, and beneath that create the column headings N, Trial 1, Trial 2, Trial 3, and Average. Record 1000000 beneath N and the time your program reported beneath Trial 1. Run the program 2 more times, record the times beneath Trial 2 and Trial 3, and then compute the average.

Then repeat this procedure using the other text files:

/home/cs/374/exercises/04/10m-doubles.txt,
/home/cs/374/exercises/04/100m-doubles.txt, and
/home/cs/374/exercises/04/1b-doubles.txt, but for this file, just do a single trial. The time to read it will not be short, so any timing inaccuracies should have little effect.

You should now have a record of the average times required to process text files containing one million, ten million, one hundred million, and one billion double values.

To view a long listing of these files sorted by size, enter the command:

   ls -lS /home/cs/374/exercises/04

The fifth column (to the left of the month) is the size of each file in bytes. Add a new column Text File Size to your spreadsheet and beneath it, record each file's size information.

Enter the command

   less /home/cs/374/exercises/04/1m-doubles.txt

The less program lets you use the spacebar to scroll forward in the file, the b key to scroll back, and the q key to quit.

Are you able to read the double values this file contains?

When you have determined the answer to that question, type q to quit the less program and continue.

B. Sequential Binary Input.

Next, we want to see how using a binary file changes things.

Use cd .. to change to the parent directory, make a new directory there named seqBinIn, cd to that directory, and then download the files seqBinIn.c and Makefile from this folder to that directory.

As before, open up seqBinIn.c and take a few minutes to compare its contents to seqTextIn.c. Identify the statements that open the file, read values from the file into the array, and close the file, and take special note of how they differ in this program compared to the previous program.

One key difference in this program is that it uses a function getFileSize() that uses POSIX system calls to determine the size of the file in bytes. The program then uses that information to compute the number of doubles in the file, and then uses that to allocate the array. This approach works because unlike a double in a text file, each double in a binary file occupies the same number of bytes as a double in main memory.
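The size-to-count calculation can be sketched with the POSIX stat() call, which reports a file's size in its st_size field without reading the file. The function names below are hypothetical, and seqBinIn.c's getFileSize() may use different calls (for example, fstat() on an already-open file):

```c
#include <sys/stat.h>

/* Hypothetical stand-in for the getFileSize() helper in seqBinIn.c:
 * asks the operating system (via POSIX stat()) for a file's size in
 * bytes, without reading the file. Returns -1 if stat() fails. */
long long getFileSizeSketch(const char* path) {
    struct stat info;
    if (stat(path, &info) != 0) {
        return -1;
    }
    return (long long) info.st_size;
}

/* Because each binary double occupies exactly sizeof(double) bytes,
 * the element count follows directly from the byte count. */
long long countDoubles(long long fileSize) {
    return fileSize / (long long) sizeof(double);
}
```

By this reasoning, a binary file holding exactly 1,000,000 doubles and nothing else should be 1,000,000 * sizeof(double) bytes long, which is 8,000,000 bytes on platforms where a double occupies 8 bytes.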

The other key difference is that this program reads all of the values from the file into the array with a single read operation. As we'll see, this ability to fill the array with a single read will have a profound effect on the program's performance.
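A single-read version of the input can be sketched with C's fread(), which transfers all n values in one call. This is an illustration under assumptions: the function name is hypothetical, and seqBinIn.c may use the POSIX read() call rather than fread():

```c
#include <stdio.h>
#include <stdlib.h>

/* A minimal sketch (not the actual seqBinIn.c code) of filling an
 * array with ONE read: the whole file is transferred by a single
 * fread() call instead of one call per value. The caller supplies n,
 * e.g. computed from the file size. Returns NULL on any failure. */
double* readBinaryDoubles(const char* path, size_t n) {
    FILE* in = fopen(path, "rb");
    if (in == NULL) {
        return NULL;
    }
    double* values = malloc((n > 0 ? n : 1) * sizeof(double));
    if (values == NULL) {
        fclose(in);
        return NULL;
    }
    size_t got = fread(values, sizeof(double), n, in);  /* single read */
    fclose(in);
    if (got != n) {                       /* file shorter than expected */
        free(values);
        return NULL;
    }
    return values;
}
```

Compare this with the text version, which needs a separate parsing call for every value in the file.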

To see this, add calls to MPI_Wtime() (along with MPI_Init() and MPI_Finalize(), as before) and modify the printf() call so that the program computes and reports how long it takes to open, read, and close the file.

Then use the make command to compile the program. Continue when your program compiles without errors.

To run this program, enter:

   ./seqBinIn /home/cs/374/exercises/04/1m-doubles.bin

Note that the file's .bin extension indicates that this is a binary-format file. Like seqTextIn, this program will read 1,000,000 doubles from a file, but the numbers in this file are stored in binary format.
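To see why a binary file's size is predictable while a text file's is not, the hypothetical helper below measures how many bytes one double needs when written as text, assuming printf's default %f format (six digits after the decimal point) and one value per line. The lab's .txt files may use a different format; the point is only that a text value's length depends on the value, whereas every double in a binary file takes sizeof(double) bytes:

```c
#include <stdio.h>

/* Illustration with an assumed format (not necessarily the one used by
 * the lab's .txt files): the number of bytes one double occupies when
 * written as text with "%f" plus a trailing newline. */
int textBytes(double value) {
    /* snprintf() with a NULL buffer returns the length it would write. */
    int len = snprintf(NULL, 0, "%f", value);
    return len + 1;                       /* +1 for the '\n' separator */
}
```

For instance, 0.5 prints as 0.500000 (9 bytes with its newline) while 12345.678 prints as 12345.678000 (13 bytes), compared with sizeof(double), which is 8 bytes on typical platforms.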

In your spreadsheet, add a new row Sequential Binary Input. Beneath it, add column headings for N, Trial 1, Trial 2, Trial 3, and Average. Copy-paste the N values from the Sequential Text Input area of your spreadsheet and enter the time your program reported for N=1,000,000 under Trial 1. Then run the program twice more, record those times, and compute the average of the three trials.

Repeat this process using the binary files:

/home/cs/374/exercises/04/10m-doubles.bin,
/home/cs/374/exercises/04/100m-doubles.bin, and
/home/cs/374/exercises/04/1b-doubles.bin. As before, the time for this file should not be very short (though it should be much shorter than for seqTextIn), so just use a single trial.

Record the times produced on the appropriate lines in your spreadsheet.

How do these times compare to your text input times?

Which is more time-efficient, text or binary file input?

Re-enter the command:

   ls -lS /home/cs/374/exercises/04

Back in your spreadsheet, add a new Size column next to the Average column and beneath it, enter the size of each of the binary files your program read.

How do these binary file sizes compare to their text file counterparts?

Which is more space-efficient, text or binary files?

Lastly, enter:

   less /home/cs/374/exercises/04/1m-doubles.bin

Are you able to read the values this file contains?

Which is more human-friendly, text files or binary files?

When you have determined the answers to those questions, type q to quit the less program and continue.

C. Parallel Binary Input.

Lastly, we want to see if parallelization can help us improve things even further.

Use cd .. to change to the parent directory, make a new directory there named parallelBinIn, cd to that directory, and then download the files parBinIn.cpp, OO_MPI_IO.h, and Makefile from this folder to that directory.

Note that these are C++ files, not C. As you might guess from the name of the file OO_MPI_IO.h, it uses object-oriented thinking to create abstractions that hide the complexity of using MPI-IO to perform parallel I/O. More precisely, OO_MPI_IO.h declares three class templates, which are:

ParallelReader, a class template whose constructor and readChunk() methods simplify using MPI-IO's capabilities to let MPI processes read from a binary file in parallel.
ParallelWriter, a class template whose constructor and writeChunk() methods simplify using MPI-IO's capabilities to let MPI processes write to a binary file in parallel.
OO-MPI-Base, a superclass template of ParallelReader and ParallelWriter that consolidates the functionality they have in common.

The following UML diagram illustrates the relationships between them:

We will be using ParallelReader in this last part of today's exercise.

As before, open up parBinIn.cpp and take a few minutes to compare its contents to seqBinIn.c. Things to note include:

The statements that open the file, read the binary values, and close the file are now implementation details hidden within the ParallelReader abstraction.
We pass the C/C++ type of value being read (i.e., double in this case) as an argument to the template.
We pass the equivalent MPI type of the value being read (e.g., MPI_DOUBLE) as an argument to the ParallelReader constructor.
The ParallelReader abstraction provides a readChunk() method that takes a C++ vector as its argument, and fills that vector with this MPI process's chunk of the input file. All the details of determining the size of the file, this MPI process's offset within the file, the size of the process's chunk, the resizing of the vector, and so on are hidden within that method.
To save you time, we have already added the calls to MPI_Wtime() and modified the final printf() so that the program computes and reports the time required to read this process's chunk of the file.
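The chunk bookkeeping that readChunk() hides can be sketched as follows. This is a hypothetical illustration, not ParallelReader's actual code: it assumes the values are divided into contiguous blocks, with the first N % P processes each taking one extra value, and that the file holds only doubles (no header):

```c
/* Hypothetical sketch of the bookkeeping readChunk() is described as
 * hiding; the real ParallelReader may divide the file differently.
 * Given the total number of values n, this process's rank, and the
 * number of processes numProcs, compute where this process's chunk
 * starts (as an element index and as a byte offset) and how long it is.
 * The first (n % numProcs) ranks each take one extra value. */
typedef struct {
    long long firstIndex;   /* index of this rank's first value      */
    long long count;        /* number of values in this rank's chunk */
    long long byteOffset;   /* where that chunk begins in the file   */
} Chunk;

Chunk computeChunk(long long n, int rank, int numProcs) {
    long long base = n / numProcs;
    long long extra = n % numProcs;
    Chunk c;
    c.count = base + (rank < extra ? 1 : 0);
    /* Every lower rank holds base values, and the lower ranks that are
     * below extra each hold one more. */
    c.firstIndex = rank * base + (rank < extra ? rank : extra);
    c.byteOffset = c.firstIndex * (long long) sizeof(double);
    return c;
}
```

Each process needs only its own rank, the process count, and the value count (derived from the file size, as in seqBinIn.c) to locate its chunk, so no process reads data belonging to another. With MPI-IO's default file view, offsets are measured in bytes, so a value like byteOffset is what a call such as MPI_File_read_at() would take as its offset argument; how ParallelReader actually issues its reads is an implementation detail of OO_MPI_IO.h.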

Use make to compile the program; when it compiles correctly, use the genHosts.pl script to generate a fresh hosts file. Then run the program by entering:

   mpirun -np 2 -machinefile hosts ./parBinIn /home/cs/374/exercises/04/1m-doubles.bin

In your spreadsheet, make a new section named Parallel Binary Input, P = 2, similar to the preceding sections, and record the reported time under Trial 1. Run your program two more times, enter those times under Trial 2 and Trial 3, and compute the average.

Next to your Average column heading, add column headings for Speedup and Efficiency. Under the Speedup column, compute the parallel speedup for P = 2 and N = 1000000 using the formula:

   Speedup_P(N) = Time₁(N) / Time_P(N)

For Time₁(N), use the average time you recorded in your spreadsheet for Sequential Binary Input with N = 1,000,000.

Under the Efficiency column, compute the parallel efficiency using the formula:

   Efficiency_P(N) = Speedup_P(N) / P
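The spreadsheet arithmetic in this section (averaging trials, then computing speedup and efficiency) can be written out as three small C functions. This is purely illustrative and the function names are hypothetical; the lab asks you to do these calculations in your spreadsheet:

```c
/* Average of the recorded trial times. */
double averageTime(const double trials[], int numTrials) {
    double sum = 0.0;
    for (int i = 0; i < numTrials; i++) {
        sum += trials[i];
    }
    return sum / numTrials;
}

/* Speedup_P(N) = Time_1(N) / Time_P(N) */
double speedup(double time1, double timeP) {
    return time1 / timeP;
}

/* Efficiency_P(N) = Speedup_P(N) / P */
double efficiency(double speedupP, int numProcs) {
    return speedupP / numProcs;
}
```

As a worked example with made-up numbers: trials of 1.0, 2.0, and 3.0 seconds average to 2.0 seconds; if Time₁(N) = 6.0 s and Time₄(N) = 2.0 s, then Speedup₄(N) = 3.0 and Efficiency₄(N) = 3.0 / 4 = 0.75.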

Then repeat these steps using the other binary files 10m-doubles.bin, 100m-doubles.bin, and 1b-doubles.bin.

Repeat this entire procedure using 4 processes, 6 processes, and 8 processes.

Wrap up by creating three line-charts to visualize the data you have collected for Parallel Binary Input:

A chart showing how the Input Time (Y-axis) changes as the number of processes (X-axis) changes, for P = 1, 2, 4, 6, 8.
A chart showing how the Speedup (Y-axis) changes as the number of processes (X-axis) changes, for P = 1, 2, 4, 6, 8.
A chart showing how the Efficiency (Y-axis) changes as the number of processes (X-axis) changes, for P = 1, 2, 4, 6, 8.

Optional:

Local vs. Remote Files. As noted at the beginning of Part II, the input files we have read are stored on a network file server. Storing files on a network file server provides convenient access from any lab workstation, but it can add significantly to the time required to access a file, compared to files stored locally on a workstation. Each of the CS lab workstations has a local solid-state drive (SSD), and any files stored in the directory /scratch are stored on this SSD. If you wish to compare the access times for files stored on the network file server vs. files stored on a local SSD, feel free to make a 374 folder in /scratch, copy the files from /home/cs/374/exercises/04 to /scratch/374, and then rerun the programs from this exercise using those local files as the input files to your programs (e.g., /scratch/374/1m-doubles.txt). Note that since /scratch is local to each workstation, you will lose the convenience of being able to work from any workstation in the lab; you will only be able to access those files from that workstation.
Input vs. Output Files. In Part II of this exercise, we have focused on input. If you are interested in exploring output, the folders seqTextOut, seqBinOut, and parBinOut provide the output equivalents of seqTextIn, seqBinIn, and parBinIn. These can also be used to generate your own files containing pseudo-random sequences.
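The output side mirrors the single-read idea: a program can fill an array and write it with one call. The sketch below is hypothetical (it is not the code in seqBinOut) and assumes C's rand() generator and values in [0, 1); the resulting file has no header, so, as with the binary input files, its element count can be recovered from its size:

```c
#include <stdio.h>
#include <stdlib.h>

/* A minimal sketch of the output side (not the actual seqBinOut code):
 * fill an array with n pseudo-random doubles in [0, 1) and write the
 * whole array to a binary file with a single fwrite() call. The rand()
 * generator and the [0, 1) range are assumptions for illustration.
 * Returns 0 on success, -1 on failure. */
int writeRandomDoubles(const char* path, size_t n, unsigned seed) {
    double* values = malloc((n > 0 ? n : 1) * sizeof(double));
    if (values == NULL) {
        return -1;
    }
    srand(seed);
    for (size_t i = 0; i < n; i++) {
        values[i] = rand() / ((double) RAND_MAX + 1.0);
    }
    FILE* out = fopen(path, "wb");
    if (out == NULL) {
        free(values);
        return -1;
    }
    size_t written = fwrite(values, sizeof(double), n, out); /* one write */
    int closeFailed = (fclose(out) != 0);
    free(values);
    return (written == n && !closeFailed) ? 0 : -1;
}
```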

When you have finished all of the preceding steps, you are ready for this week's project.


This page maintained by Joel Adams.