OpenMP Multithreading
Here is our main() function that uses OpenMP multithreading:
int main(int argc, char** argv) {
    string fileName;
    string subSeq;
    double startTotalTime = omp_get_wtime();
    processCommandLineArgs(argc, argv, fileName, subSeq);
    long count = 0;
    int P = 0;
    double readTime = 0.0, scanTime = 0.0;
    #pragma omp parallel reduction(+:count)
    {
        P = omp_get_num_threads();
        int id = omp_get_thread_num();
        double startReadTime = omp_get_wtime();
        ParallelReader<char> pReader(fileName, MPI_CHAR, id, P);
        vector<char> dnaChunk = pReader.readChunkPlus(subSeq.size()-1);
        pReader.close();
        #pragma omp master
        readTime = omp_get_wtime() - startReadTime;
        double startScanTime = omp_get_wtime();
        count = scan(dnaChunk, subSeq);
        #pragma omp master
        scanTime = omp_get_wtime() - startScanTime;
    }
    double totalTime = omp_get_wtime() - startTotalTime;
    printResults(P, subSeq, count,
                 readTime, scanTime, totalTime);
}
Let's do a deep-dive into how this main() function works:
-
OpenMP uses #pragma directives, which are strong suggestions
to the compiler that it take a particular action, if at all possible.
If the action is not possible, the compiler ignores such directives,
but will usually generate a warning.
-
Since every program has one thread when it starts,
the directive
#pragma omp parallel
spawns P-1 additional threads,
where P is the desired number of threads.
(The lab exercise shows how to control the value of P.)
Each thread i will perform whatever statement follows the
#pragma omp parallel directive.
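To see the fork-join shape this describes, here is a hedged sketch using plain C++ standard threads rather than OpenMP itself (the function name runTeam and the fixed team size are illustrative assumptions, not OpenMP's actual implementation): the original thread spawns P-1 extra threads, acts as one member of the team itself, and then joins the others.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Mimic the fork-join of "#pragma omp parallel": spawn P-1 extra
// threads, let the original thread be one of the team, then join.
// Returns how many threads executed the body.
int runTeam(int P) {
    std::atomic<int> ran{0};
    auto body = [&ran](int id) { ++ran; };    // the "statement" each thread performs
    std::vector<std::thread> extras;
    for (int id = 1; id < P; ++id)            // fork: P-1 additional threads...
        extras.emplace_back(body, id);
    body(0);                                  // ...while the original thread acts as id 0
    for (auto& t : extras) t.join();          // join: only the original thread continues
    return ran;
}
```

After the joins, only the original thread is left running, just as only the original thread continues past the close-brace of the OpenMP block.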
-
The statement following the #pragma omp parallel is a C++
block statement--a pair of curly braces
({ and }) containing other statements.
The other statements within that block statement
encode steps 1 and 2 of our algorithm,
so #pragma omp parallel performs step 0 of our algorithm.
-
Every variable declared before #pragma omp parallel
is shared by all the threads.
Since our printResults() function uses the values
of P, count, readTime, and scanTime,
all of those variables must be declared before the #pragma
instead of inside the block statement
(the final bullet below explains why).
-
The first two lines inside the block statement call the OpenMP functions
omp_get_num_threads() and omp_get_thread_num().
These return (respectively) the total number of threads (P)
and the id number of the current thread.
These two values are useful when dividing work between multiple threads;
if we think of the P threads as a team of workers,
these functions let each thread i answer these questions:
-
How many workers do we have to divide up the work?
-
Which worker am I?
Each thread i will get i for its id value
(i.e., a different value for each thread) but the same value for P.
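Together, P and id let each thread compute which slice of the data is "its" chunk. One common division scheme is sketched below; this particular formula is an assumption for illustration, and OO_MPI_IO's actual scheme may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Compute the half-open range [start, end) of items that thread `id`
// of `P` threads should process, spreading any remainder (N % P) over
// the first few threads so chunk sizes differ by at most one.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t N, int P, int id) {
    std::size_t base  = N / P;
    std::size_t extra = N % P;
    std::size_t start = id * base + std::min<std::size_t>(id, extra);
    std::size_t len   = base + (static_cast<std::size_t>(id) < extra ? 1 : 0);
    return {start, start + len};
}
```

For example, dividing 10 items among 3 threads gives the ranges [0,4), [4,7), and [7,10): every item is covered exactly once, and every thread can compute its own range independently from just N, P, and id.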
-
The next three lines declare a ParallelReader object
named pReader, send it the readChunkPlus() message,
and then close it.
-
The ParallelReader() constructor takes
the name of the input file F,
the type of data in the file,
and the P and id values,
and uses these values to divide F into chunks.
-
ParallelReader provides two key methods:
-
readChunk(), which reads thread i's chunk, and
-
readChunkPlus(), which reads the chunk of thread i
plus a specified number of additional chars
from the chunk of thread i+1.
Both methods return what they have read in a vector;
our second line uses this vector to initialize
a thread-local variable dnaChunk,
which makes it private to that thread.
-
The close() method closes file F.
These lines thus perform steps 1a and 1b of our algorithm.
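The extra subSeq.size()-1 characters that readChunkPlus() fetches matter: without them, an occurrence of subSeq straddling the boundary between two threads' chunks would be missed by both. The boundary arithmetic can be illustrated on an in-memory string; this chunkPlus() function is a simplified stand-in (an assumption, not the library's actual file-reading code), and it assumes P divides the data size evenly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simulate readChunkPlus(): return thread `id`'s chunk of `data`, plus
// `overlap` extra chars from the start of the next thread's chunk, so
// that matches spanning a chunk boundary are not missed.
std::vector<char> chunkPlus(const std::string& data, int P, int id,
                            std::size_t overlap) {
    std::size_t chunkSize = data.size() / P;   // assumes P divides data.size()
    std::size_t start = id * chunkSize;
    std::size_t len = chunkSize
                    + (id < P - 1 ? overlap : 0);  // last chunk has no successor
    if (start + len > data.size()) len = data.size() - start;
    return std::vector<char>(data.begin() + start,
                             data.begin() + start + len);
}
```

With "AACCGGTT" split between 2 threads and an overlap of 2 (a subSeq of length 3), thread 0 receives "AACCGG" and thread 1 receives "GGTT", so a 3-char pattern crossing the midpoint is still visible to thread 0.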
-
ParallelReader is part of a Calvin University library
called OO_MPI_IO.h; this file must be #include-ed
to use a ParallelReader.
-
Input files may contain values of arbitrary data types, not just characters.
Since a ParallelReader may need to read these kinds of files,
its readChunk() and readChunkPlus()
methods return their values using a vector, not a string.
-
Each thread calls scan() on its private dnaChunk
variable, thus performing step 2a of our algorithm.
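The page does not show scan()'s definition; a plausible sketch (an assumption, not necessarily the lab's actual code) counts the occurrences of subSeq in a chunk, including overlapping ones, by testing for a match at every possible starting position.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Count occurrences of `subSeq` in `chunk`, including overlapping
// ones, by checking for a match at each possible starting index.
long scan(const std::vector<char>& chunk, const std::string& subSeq) {
    long count = 0;
    if (subSeq.empty() || chunk.size() < subSeq.size()) return count;
    for (std::size_t i = 0; i + subSeq.size() <= chunk.size(); ++i) {
        if (std::equal(subSeq.begin(), subSeq.end(), chunk.begin() + i))
            ++count;
    }
    return count;
}
```

Because each thread's count is a private copy under reduction(+:count), these per-chunk totals are summed automatically when the threads join.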
-
The P-1 "extra" threads only exist while the block statement
following #pragma omp parallel is being executed.
When execution passes the close-brace (}) of that block,
only the original thread continues,
so only it will calculate the totalTime
and invoke printResults().
-
Within the block statement,
the directive #pragma omp master causes whatever statement
follows it to be performed only by the program's original thread.
Since only the program's original thread reports the
readTime and scanTime
(i.e., by calling printResults()),
the block statement uses #pragma omp master to ensure that
only the original thread calculates these values.
Relatedly, since these time values (and P) are calculated
inside the block statement
but reported outside of the block statement,
we must declare their variables before #pragma omp parallel
and its block statement, because all variables declared within a block
statement are local to that block--their scope ends at the end of the block.
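This scope rule is ordinary C++ block scoping, independent of OpenMP; the tiny function below (the name timeDemo and its values are made up for illustration) shows why readTime must be declared outside the block to be usable after it.

```cpp
// A variable declared before a block outlives it; one declared
// inside the block ceases to exist at the matching close-brace.
double timeDemo() {
    double readTime = 0.0;        // declared before the block: usable after it
    {                             // a block statement, like the one after #pragma omp parallel
        double startRead = 1.5;   // declared inside: scope ends at the matching }
        readTime = 2.0 - startRead;
    }                             // startRead no longer exists here
    return readTime;              // OK: readTime was declared outside the block
}
```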
This page maintained by
Joel Adams.