OpenMP Multithreading
Here is our main() function that uses OpenMP multithreading:
int main(int argc, char** argv) {
    string fileName;
    string subSeq;
    double startTotalTime = omp_get_wtime();
    processCommandLineArgs(argc, argv, fileName, subSeq);
    long count = 0;
    int P = 0;
    double readTime = 0.0, scanTime = 0.0;
    #pragma omp parallel reduction(+:count)
    {
        P = omp_get_num_threads();
        int id = omp_get_thread_num();
        double startReadTime = omp_get_wtime();
        ParallelReader<char> pReader(fileName, MPI_CHAR, id, P);
        vector<char> dnaChunk = pReader.readChunkPlus(subSeq.size()-1);
        pReader.close();
        #pragma omp master
        readTime = omp_get_wtime() - startReadTime;
        double startScanTime = omp_get_wtime();
        count = scan(dnaChunk, subSeq);
        #pragma omp master
        scanTime = omp_get_wtime() - startScanTime;
    }
    double totalTime = omp_get_wtime() - startTotalTime;
    printResults(P, subSeq, count,
                 readTime, scanTime, totalTime);
}
Let's do a deep-dive into how this main() function works:
-
OpenMP uses #pragma directives, which are strong suggestions
to the compiler that it take a particular action, if at all possible.
If the action is not possible, the compiler ignores such directives,
but will usually generate a warning.
-
Since every program has one thread when it starts,
the directive
#pragma omp parallel
spawns P-1 additional threads,
where P is the desired number of threads.
(The lab exercise shows how to control the value of P.)
Each thread i will perform whatever statement follows the
#pragma omp parallel directive.
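To see the fork-join shape this describes, here is a hedged sketch using plain C++ standard threads rather than OpenMP itself (the function name runTeam and the fixed team size are illustrative assumptions, not OpenMP's actual implementation): the original thread spawns P-1 extra threads, acts as one member of the team itself, and then joins the others.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Mimic the fork-join of "#pragma omp parallel": spawn P-1 extra
// threads, let the original thread be one of the team, then join.
// Returns how many threads executed the body.
int runTeam(int P) {
    std::atomic<int> ran{0};
    auto body = [&ran](int id) { ++ran; };    // the "statement" each thread performs
    std::vector<std::thread> extras;
    for (int id = 1; id < P; ++id)            // fork: P-1 additional threads...
        extras.emplace_back(body, id);
    body(0);                                  // ...while the original thread acts as id 0
    for (auto& t : extras) t.join();          // join: only the original thread continues
    return ran;
}
```

After the joins, only the original thread is left running, just as only the original thread continues past the close-brace of the OpenMP block.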
-
The statement following the #pragma omp parallel is a C++
block statement--a pair of curly braces
({ and }) containing other statements.
The other statements within that block statement
encode steps 1 and 2 of our algorithm,
so #pragma omp parallel performs step 0 of our algorithm.
-
Every variable declared before #pragma omp parallel
is shared by all the threads.
Since our printResults() function uses the values
of P, count, readTime, and scanTime,
all of those variables must be declared before the #pragma
instead of inside the block statement
(the final bullet below explains why).
-
The first two lines inside the block statement call the OpenMP functions
omp_get_num_threads() and omp_get_thread_num().
These return (respectively) the total number of threads (P)
and the id number of the current thread.
These two values are useful when dividing work between multiple threads;
if we think of the P threads as a team of workers,
these functions let each thread i answer these questions:
-
How many workers do we have to divide up the work?
-
Which worker am I?
Each thread i will get i for its id value
(i.e., a different value for each thread) but the same value for P.
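Together, P and id let each thread compute which slice of the data is "its" chunk. One common division scheme is sketched below; this particular formula is an assumption for illustration, and OO_MPI_IO's actual scheme may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Compute the half-open range [start, end) of items that thread `id`
// of `P` threads should process, spreading any remainder (N % P) over
// the first few threads so chunk sizes differ by at most one.
std::pair<std::size_t, std::size_t> chunkRange(std::size_t N, int P, int id) {
    std::size_t base  = N / P;
    std::size_t extra = N % P;
    std::size_t start = id * base + std::min<std::size_t>(id, extra);
    std::size_t len   = base + (static_cast<std::size_t>(id) < extra ? 1 : 0);
    return {start, start + len};
}
```

For example, dividing 10 items among 3 threads gives the ranges [0,4), [4,7), and [7,10): every item is covered exactly once, and every thread can compute its own range independently from just N, P, and id.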
-
The next three lines declare a ParallelReader object
named pReader, send it the readChunkPlus() message,
and then close it.
-
The ParallelReader() constructor takes
the name of the input file F,
the type of data in the file,
and the P and id values,
and uses these values to divide F into chunks.
-
ParallelReader provides two key methods:
-
readChunk(), which reads thread i's chunk, and
-
readChunkPlus(), which reads the chunk of thread i
plus a specified number of additional chars
from the chunk of thread i+1.
Both methods return what they have read in a vector;
our second line uses this vector to initialize
a thread-local variable dnaChunk,
which makes it private to that thread.
-
The close() method closes file F.
These lines thus perform steps 1a and 1b of our algorithm.
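The extra subSeq.size()-1 characters that readChunkPlus() fetches matter: without them, an occurrence of subSeq straddling the boundary between two threads' chunks would be missed by both. The boundary arithmetic can be illustrated on an in-memory string; this chunkPlus() function is a simplified stand-in (an assumption, not the library's actual file-reading code), and it assumes P divides the data size evenly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simulate readChunkPlus(): return thread `id`'s chunk of `data`, plus
// `overlap` extra chars from the start of the next thread's chunk, so
// that matches spanning a chunk boundary are not missed.
std::vector<char> chunkPlus(const std::string& data, int P, int id,
                            std::size_t overlap) {
    std::size_t chunkSize = data.size() / P;   // assumes P divides data.size()
    std::size_t start = id * chunkSize;
    std::size_t len = chunkSize
                    + (id < P - 1 ? overlap : 0);  // last chunk has no successor
    if (start + len > data.size()) len = data.size() - start;
    return std::vector<char>(data.begin() + start,
                             data.begin() + start + len);
}
```

With "AACCGGTT" split between 2 threads and an overlap of 2 (a subSeq of length 3), thread 0 receives "AACCGG" and thread 1 receives "GGTT", so a 3-char pattern crossing the midpoint is still visible to thread 0.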
-
ParallelReader is part of a Calvin University library
called OO_MPI_IO.h; this file must be #include-ed
to use a ParallelReader.
-
Input files may contain values of arbitrary data types, not just characters.
Since a ParallelReader may need to read these kinds of files,
its readChunk() and readChunkPlus()
methods return their values using a vector, not a string.
-
Each thread calls scan() on its private dnaChunk
variable, thus performing step 2a of our algorithm.
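The page does not show scan()'s definition; a plausible sketch (an assumption, not necessarily the lab's actual code) counts the occurrences of subSeq in a chunk, including overlapping ones, by testing for a match at every possible starting position.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Count occurrences of `subSeq` in `chunk`, including overlapping
// ones, by checking for a match at each possible starting index.
long scan(const std::vector<char>& chunk, const std::string& subSeq) {
    long count = 0;
    if (subSeq.empty() || chunk.size() < subSeq.size()) return count;
    for (std::size_t i = 0; i + subSeq.size() <= chunk.size(); ++i) {
        if (std::equal(subSeq.begin(), subSeq.end(), chunk.begin() + i))
            ++count;
    }
    return count;
}
```

Because each thread's count is a private copy under reduction(+:count), these per-chunk totals are summed automatically when the threads join.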
-
The P-1 "extra" threads only exist while the block statement
following #pragma omp parallel is being executed.
When execution passes the close-brace (}) of that block,
only the original thread continues,
so only it will calculate the totalTime
and invoke printResults().
-
Within the block statement,
the directive #pragma omp master causes whatever statement
follows it to be performed only by the program's original thread.
Since only the program's original thread reports the
readTime and scanTime
(i.e., by calling printResults()),
the block statement uses #pragma omp master to ensure that
only the original thread calculates these values.
Relatedly, since these time values (and P) are calculated
inside the block statement
but reported outside of the block statement,
we must declare their variables before #pragma omp parallel
and its block statement, because all variables declared within a block
statement are local to that block--their scope ends at the end of the block.
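This scope rule is ordinary C++ block scoping, independent of OpenMP; the tiny function below (the name timeDemo and its values are made up for illustration) shows why readTime must be declared outside the block to be usable after it.

```cpp
// A variable declared before a block outlives it; one declared
// inside the block ceases to exist at the matching close-brace.
double timeDemo() {
    double readTime = 0.0;        // declared before the block: usable after it
    {                             // a block statement, like the one after #pragma omp parallel
        double startRead = 1.5;   // declared inside: scope ends at the matching }
        readTime = 2.0 - startRead;
    }                             // startRead no longer exists here
    return readTime;              // OK: readTime was declared outside the block
}
```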
This page maintained by
Joel Adams.