HPC MPI Exercise 4: Hands-On Lab, Part 1

Part 1: Performing Sequential I/O Faster

Within your 374 course folder, create a new folder for this lab exercise (e.g., 4).

The first part of this week's lab exercise is to explore different aspects of file input. The ideas we'll explore also apply to file output, but we will focus on input today.

To avoid wasting storage space, we will all read from the same set of (large) data files, which are stored in the directory /home/cs/374/exercises/04/. Verify that you have access to this directory by entering:

   ls /home/cs/374/exercises/04

Each of the files in this folder contains randomly generated double values; the name of a file indicates the number of doubles in that file.

Note that this directory is located on a file server, which is "across the network" from your workstation. Reading from shared files on a file server is:

convenient (you can access these files from any lab workstation), and
space-saving (we avoid redundant copies of large files).

However, since you and your classmates may be accessing these files simultaneously, this remote-access approach may affect your timings and/or their consistency during this exercise. To mitigate these effects, we will limit our activities to P <= 8 processes; we will also run each program three times and use the minimum of those three times.

A. Sequential Text Input (Default).

Begin by making a new folder named seqTextIn, cd to it, and then download the files seqTextIn.c and Makefile from this folder.

Take a few minutes to look at the program in seqTextIn.c. It is a fairly simple program that reads the contents of a text file containing double values into an array. In order to determine the size of the array, the first line of this text file is an integer N that indicates the number of double values in the file. After opening the file, the program reads this value N, allocates an array of N double elements, and then uses a for loop to read the N values from the file into the array. The program thus assumes the file has a certain structure: N on the first line, followed by N double values.

Identify the statements that open the file, fill the array, and close the file. Surround this group of statements with calls to MPI_Wtime() to time how long it takes the program to perform this group of statements.

Since MPI_Wtime() is an MPI function, you will also need to 'wrap' these calls in calls to MPI_Init() and MPI_Finalize(), and add #include <mpi.h> to the program's include-directives.

Finally, modify the printf() at the end so that it also reports how long the program took to open+read+close the file.

Having made these changes, use the make command to build the program. Continue when your program builds without errors or warnings.

To run the program, enter:

   ./seqTextIn /home/cs/374/exercises/04/1m-doubles.txt

Your program should run, reporting that it read 1,000,000 double values and the time it took to do so. Note that the time reported is quite short, and short timings can be inaccurate. To compensate, we will perform three trials and use the minimum of these three times.

Open a spreadsheet, write Sequential Text Input, Default in the first row. Beneath that, create the column headings N, Trial 1, Trial 2, Trial 3, and Minimum. Record 1000000 beneath N and the time your program reported beneath Trial 1. Run the program 2 more times, record the times beneath Trial 2 and Trial 3; then use a spreadsheet function to compute and record the minimum of the three trials beneath Minimum.

Then repeat this procedure using the other text files:

/home/cs/374/exercises/04/10m-doubles.txt,
/home/cs/374/exercises/04/100m-doubles.txt, and
/home/cs/374/exercises/04/1b-doubles.txt. Reading this file takes quite a while, and over such a long run, the variation from one execution to another will be modest, so feel free to perform just one trial and use that trial's time as the minimum time for this file.

You should now have a record of the trial and minimum times to read text files containing one million, ten million, one hundred million, and one billion double values.

To view a long-listing of these files sorted by size, enter the command

   ls -lS /home/cs/374/exercises/04

The fifth column (to the left of the month) that ls displays is the size of each file in bytes. Add a new column File Size to your spreadsheet and beneath it, record each file's size information.

Enter the command

   less /home/cs/374/exercises/04/1m-doubles.txt

The less program lets you use the spacebar to scroll forward in the file, the b key to scroll back, and the q key to quit.

Discuss with your neighbor: Are you able to read the double values this file contains?

When you have determined the answer to that question, type q to quit the less program and continue.

B. Sequential Text Input (Optimized).

If you examine your Makefile, you will see that it contains the line:

   CFLAGS = -Wall -ansi -pedantic -std=c99

This line defines the compilation flags being used to build the program:

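-Wall enables most of the compiler's commonly useful warnings
-ansi tells the compiler to adhere to the ANSI C (C90) standard
-pedantic tells the compiler to strictly enforce the selected standard
-std=c99 tells the compiler to use the C99 language standard; because it appears after -ansi, it is the setting that takes effect.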

Since these are the only compilation flags being specified, the compiler will use its default level of optimization, which typically performs little or no optimization of the code. For the GNU family of compilers, this is known as -O0 (dash oh-zero) optimization. Using a text editor, change that line in the Makefile to instead read:

   CFLAGS = -O2 -Wall -ansi -pedantic -std=c99

GNU's different levels of optimization are:

-O0: perform minimal optimizations, if any (the default).
-O1: perform optimizations that reduce the execution time and the size of the binary without greatly increasing the compile time.
-O2: perform nearly all optimizations that do not trade a larger binary for greater speed; these may increase the compile time.
-O3: perform all supported optimizations, including those that reduce the execution time without regard to the size of the binary or the compile time.

By adding -O2 to the CFLAGS in our Makefile, we are telling the compiler to apply nearly all of its optimizations that do not trade a larger program for greater speed.

If you enter

     make

it will tell you that your program is up to date, because we have not changed it since you last built it. Enter:

      touch seqTextIn.c

This will update the 'last modified' date on your source file, so make will think it is newer than your program file and rebuild the latter. Then re-enter

     make

and your program should rebuild. Inspect the compilation command performed by make to verify that the -O2 switch is being used.

If you find that adding -O2 causes the compiler to generate a warning about not using the return value of fread(), you can disable that warning by editing the Makefile again so that the CFLAGS line reads:

   CFLAGS = -O2 -Wall -ansi -pedantic -std=c99 -Wno-unused-result

In your spreadsheet, add a new row Sequential Text Input, -O2 Optimized. Below this, copy-and-paste the same N, Trial 1, Trial 2, Trial 3, Minimum, and File Size column headings you used in Section A. To provide the values below each column-heading, repeat the activities from Section A. For each of the four input files (1m-doubles.txt, 10m-doubles.txt, 100m-doubles.txt, 1b-doubles.txt) run the newly-optimized version of your program three times, record these three trial-times in your spreadsheet, and use it to compute the minimum value for each file-size.

Discuss with your neighbor: How do these times compare with your times from Section A?

Optional: Feel free to experiment with the -O1 and -O3 optimization levels to see how they compare with the default and -O2 times.

C. Sequential Binary Input.

Next, let's see how using a binary input file affects performance.

Use cd .. to change to the parent directory, make a new directory there named seqBinIn, cd to that directory, and then download the files seqBinIn.c and Makefile from this folder to that directory.

Open up seqBinIn.c and take a few minutes to compare its contents to those of seqTextIn.c. Identify the statements that open the file, read values from the file into the array, and close the file, and take special note of how they differ in this program compared to the previous program.

One key difference in this program is that it uses a function getFileSize() that uses POSIX system calls to determine the size of the file in bytes. The program then uses that information to compute N, the number of doubles in the file, and then uses N to allocate the array. This approach works because unlike a double in a text file, each double in a binary file occupies exactly the same number of bytes as a double in main memory.

The other key difference is that this program reads all of the values from the file into the array via a single read. That one read should use the computer's Direct Memory Access (DMA) hardware to transfer all of the values from the file into the array. This ability to fill the array with a single read (plus the use of the binary format) will have a profound effect on the program's performance.

To see this, add calls to MPI_Wtime() to time how long it takes to open+read+close the file, and modify the printf() function to report this time.

Then use the make command to build the program. Continue when your program builds without errors or warnings.

Discuss with your neighbor: What optimization level is the Makefile using to build the program?

To run this program, enter:

   ./seqBinIn /home/cs/374/exercises/04/1m-doubles.bin

Note that the file's .bin extension indicates that this is a binary-format file. Similar to seqTextIn, this program will read 1,000,000 doubles from a file, but the numbers are stored in binary-format in this file.

Note that since a program can compute N, the number of items in a binary-format file, and use that value to allocate an array of the necessary size, there is no need to store N at the beginning of the file, the way we did with our text files.

In your spreadsheet, add a new row Sequential Binary Input. Beneath it, add column headings for N, Trial 1, Trial 2, Trial 3, Minimum, and File Size. Copy-paste the N values from one of the Sequential Text Input areas of your spreadsheet and enter the time your program reported for N=1,000,000 under Trial 1. Then run the program twice more, record those times, and compute the minimum of the three trials.

Repeat this process using the other binary files:

/home/cs/374/exercises/04/10m-doubles.bin,
/home/cs/374/exercises/04/100m-doubles.bin, and
/home/cs/374/exercises/04/1b-doubles.bin. As before, the time for this last file should not be very short (though it should be much shorter than with seqTextIn), so a single trial is sufficient for it.

Record the times produced on the appropriate lines in your spreadsheet.

Discuss with your neighbor: How do these times compare to your text input times?

Discuss with your neighbor: Which is more time-efficient, text or binary file input?

Re-enter the command:

   ls -lS /home/cs/374/exercises/04

Back in your spreadsheet, enter the sizes for each of the binary files your program read under the File Size column heading.

Discuss with your neighbor: How do these binary file sizes compare to their text file counterparts?

Discuss with your neighbor: Which are more space-efficient, text or binary files?

Lastly, enter:

   less /home/cs/374/exercises/04/1m-doubles.bin

Discuss with your neighbor: Are you able to read the values this file contains?

Discuss with your neighbor: Which are more human-friendly, text or binary files?

When you have determined the answers to those questions, type q to quit the less program and continue.

When you have finished all of the preceding steps, congratulations, you are ready for part 2 of this exercise!

Optional:

Local vs. Remote Files. As noted earlier, the input files we have read are stored on a network file server. Storing files on a network file server provides convenient access from any lab workstation, but reading a file across the network can significantly increase the time, compared to reading a file stored locally on a workstation.
Each CS lab workstation has a local solid-state drive (SSD), and any files stored in the directory /scratch are stored on this SSD. If you wish to compare the access times for files stored on the network file server vs. files stored on a local SSD, feel free to make a 374 folder in /scratch and copy the files from /home/cs/374/exercises/04 to /scratch/374/; then rerun the programs from this exercise using those local files as the input files to your programs (e.g., /scratch/374/1m-doubles.txt).
Note that since /scratch is local to each workstation, you will lose the convenience of being able to work from any workstation in the lab -- you will only be able to access those files from that workstation.
Input vs. Output Files. In this exercise, we have focused on input. If you are interested in exploring output, the folders seqTextOut and seqBinOut provide the output equivalents of seqTextIn and seqBinIn. These can also be used to generate your own files containing pseudo-random sequences.


This page maintained by Joel Adams.