Open Multi-Processing (OpenMP) is an open standard for multithreading in C, C++, and Fortran. Most modern compilers, including GNU's gcc, Intel's icc, and Microsoft's Visual C++, include built-in support for OpenMP, so it provides a means of writing portable multithreaded programs.
In many multithreading platforms (e.g., POSIX threads, Java threads), the programmer must explicitly manage the threads. By contrast, OpenMP provides implicit multithreading, meaning the OpenMP library handles most of the details of managing the threads. Because of this, OpenMP is relatively easy to use compared to other multithreading platforms; in the HPC world, it has become the de facto standard for C multithreading.
The Fork-Join pattern plays a central role in OpenMP. To let you explore it, download and run forkJoin and forkJoin2. Read the comment at the beginning of each source file to see what you should do with it. Things to look for include:
Once you have explored these examples and are confident you understand how OpenMP's use of the Fork-Join pattern works, you may continue to the next part of today's exercise.
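For reference, the heart of OpenMP's Fork-Join pattern is the parallel directive. The following is a minimal sketch of the idea (it is not the downloaded forkJoin.c, and it assumes the program is compiled with OpenMP support, e.g., gcc -fopenmp):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Before: a single thread\n");    // sequential region: one thread
    #pragma omp parallel                    // fork: a team of threads is created
    {
        printf("During: hello from one of the threads\n");
    }                                       // join: the team synchronizes; only the initial thread continues
    printf("After: a single thread again\n");
    return 0;
}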
Like many other parallel platforms, OpenMP uses the Single Program Multiple Data (SPMD) pattern. To explore it, download and run spmd and spmd2. As before, read the comment at the beginning of each source file to see what you should do with it.
When you are confident you understand how OpenMP uses the SPMD pattern, continue on to the next part.
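As a quick reference, a minimal SPMD sketch (not the downloaded spmd.c) looks like this: every thread runs the same block, but each can learn its own ID and the size of the team:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel                          // single program: every thread runs this block
    {
        int id = omp_get_thread_num();            // ... but each thread has its own ID
        int numThreads = omp_get_num_threads();   // ... and can see how many threads there are
        printf("Hello from thread %d of %d\n", id, numThreads);
    }
    return 0;
}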
OpenMP also provides built-in support for the Barrier pattern. To explore it, download and run barrier. As before, read the comment at the beginning of the source file to see what you should do with it.
When you are confident you understand how to use OpenMP's version of the Barrier pattern, continue on to the next part.
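For reference, OpenMP's barrier directive looks like this in a minimal sketch (not the downloaded barrier.c): no thread proceeds past the barrier until every thread in the team has reached it:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        printf("Thread %d: before the barrier\n", id);
        #pragma omp barrier                 // each thread waits here until all threads arrive
        printf("Thread %d: after the barrier\n", id);
    }
    return 0;
}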
OpenMP provides functions by which a thread can discover how many threads are in its team and what its own ID is. These (combined with the Fork-Join pattern) are sufficient to implement the Master-Worker and Master-only patterns in OpenMP. To explore these patterns, download and run masterWorker and masterOnly. As before, read the comment at the beginning of each source file to see what you should do with it.
When you are confident you understand how to implement the Master-Worker and Master-only patterns in OpenMP, continue on to the next part.
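As a quick reference, one common way to express these two patterns (a sketch, not the downloaded masterWorker.c or masterOnly.c) is to branch on the thread ID, or to use OpenMP's master directive:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        if (id == 0) {                       // Master-Worker: the master does one thing ...
            printf("Master (thread %d) coordinating\n", id);
        } else {                             // ... while the workers do another
            printf("Worker %d working\n", id);
        }

        #pragma omp master                   // Master-only: only the master executes this block
        {
            printf("Only the master runs this\n");
        }
    }
    return 0;
}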
Simple Parallel Loops. As we have seen before, there are two versions of the Parallel Loop pattern: the 'chunks' version and the 'slices' version.
OpenMP's parallel loop supports many other features, including dynamic scheduling (aka work stealing) and the ability to control the size of the blocks in each version of the Parallel Loop pattern. We will not explore them here, but you should feel free to investigate them on your own.
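As a quick reference, the sketch below (not one of the downloaded patternlets) shows a basic parallel for loop, plus the schedule clause used to request dynamic scheduling with blocks of 2 iterations:

#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int a[N];

    #pragma omp parallel for                       // the N iterations are divided among the threads
    for (int i = 0; i < N; i++) {
        a[i] = i * i;
    }

    #pragma omp parallel for schedule(dynamic, 2)  // idle threads grab the next block of 2 iterations
    for (int i = 0; i < N; i++) {
        printf("Thread %d handled i = %d (a[i] = %d)\n", omp_get_thread_num(), i, a[i]);
    }
    return 0;
}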
Parallel While Loops. OpenMP's #pragma omp parallel for provides an easy and convenient way to parallelize a for loop, but it has some key restrictions: (i) the iterations of the loop must be independent of one another, and (ii) the loop must have the canonical for loop form:
for (initExpr; testExpr; incrExpr)
structuredBlock
The most efficient solutions to many problems require iteration without this second (canonical form) restriction; for example, the iteration may need to continue so long as changes keep occurring, or stop as soon as a solution is found.
In C/C++, there is an equivalent while loop for any for loop:
For Loop Structure:

for (initExpr; testExpr; incrExpr) {
    stmts
}

Equivalent While Loop Structure:

initExpr;
while (testExpr) {
    stmts
    incrExpr;
}
So for problems whose sequential solutions use a while loop or a non-canonical for loop, the solution can be parallelized in OpenMP using a parallel while loop instead of #pragma omp parallel for.
Download the files from the parallelWhile folder. The file parallelWhile.c is a patternlet that illustrates how to implement both the 'chunks' and the 'slices' versions of a parallel loop using OpenMP and a while loop. Examine the code in parallelWhile.c to see how to implement such loops, and follow the directions at the beginning of the file to explore their behavior.
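If you want a quick reference, the sketch below (not the downloaded parallelWhile.c) shows one common way to write the 'slices' version: each thread runs its own while loop, starting at its thread ID and striding by the number of threads:

#include <stdio.h>
#include <omp.h>

#define N 16

int main(void) {
    int a[N];

    #pragma omp parallel                     // SPMD region: each thread runs its own while loop
    {
        int id = omp_get_thread_num();
        int numThreads = omp_get_num_threads();
        int i = id;                          // 'slices': thread id handles i = id, id+P, id+2P, ...
        while (i < N) {
            a[i] = i * i;
            i += numThreads;
        }
    }

    for (int i = 0; i < N; i++) {
        printf("a[%d] = %d\n", i, a[i]);
    }
    return 0;
}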
Parallel Loops and Reductions. In OpenMP, the Reduction pattern is implemented as an optional clause of the omp parallel directive. Since OpenMP's omp parallel for is a special case of that directive, Reduction can also be used with it. To see how it works, download the files in the parallelLoop-reduction folder, and follow the directions at the beginning of the source file.
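As a quick reference, the reduction clause looks like this in a minimal sketch (not the downloaded patternlet): each thread accumulates into a private copy of sum, and OpenMP combines the copies when the loop ends:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void) {
    int sum = 0;
    #pragma omp parallel for reduction(+:sum)   // each thread gets a private sum; the copies are added at the end
    for (int i = 1; i <= N; i++) {
        sum += i;
    }
    printf("sum = %d\n", sum);                  // 500500
    return 0;
}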
OpenMP has no implementation of the broadcast pattern because it is not needed: if the master thread has read a dataset into an array that is in shared memory, then all threads can access the dataset via that array.
However, when multiple threads are accessing a shared variable (e.g., the shared array just mentioned), it is all too easy to create race conditions.
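To illustrate why, here is a minimal sketch (not one of the downloaded patternlets) in which every thread increments a shared counter without any protection; the read-modify-write steps of different threads can interleave, so the final value is usually wrong:

#include <stdio.h>
#include <omp.h>

#define REPS 100000

int main(void) {
    int counter = 0;                    // shared by all threads
    #pragma omp parallel
    {
        for (int i = 0; i < REPS; i++) {
            counter++;                  // unprotected update of a shared variable: a race condition
        }
    }
    printf("counter = %d\n", counter);  // often less than numThreads * REPS
    return 0;
}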
As you work through the exercise, work with your neighbor to find the answers to these questions:
When a race condition arises because the code contains a critical section, the Mutual Exclusion pattern can be used to protect that section. OpenMP provides two different mechanisms for this pattern: the atomic mechanism and the critical mechanism:
You can explore the atomic mechanism by downloading the files from the atomic folder. As usual, follow the directions at the beginning of the file, and map the behavior you observe to the source code producing it.
You can explore the critical mechanism by downloading the files from the critical folder and the critical2 folder. By now, you should know the drill: follow the directions at the beginning of the file, and figure out how the source code is producing the behavior you observe.
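As a quick reference, the two mechanisms look like this in a minimal sketch (not the downloaded patternlets): atomic protects a single simple update, while critical protects an arbitrary block of statements:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int counter1 = 0, counter2 = 0;
    #pragma omp parallel
    {
        #pragma omp atomic            // atomic: only one thread at a time performs this one update
        counter1++;

        #pragma omp critical          // critical: only one thread at a time executes this block
        {
            counter2++;
            printf("counter2 is now %d\n", counter2);
        }
    }
    printf("counter1 = %d, counter2 = %d\n", counter1, counter2);
    return 0;
}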
If you compare these patterns in OpenMP against their counterparts in other parallel platforms (e.g., POSIX threads), it should be evident that OpenMP's mechanisms are much simpler and easier to use. This simplicity and comparative ease of use are what have made OpenMP so popular.
The Parallel Loop patterns can be used to divide a dataset into chunks or slices and process those pieces in parallel. As such, the Parallel Loop pattern supports a particular parallel algorithm design strategy called Data Decomposition: we decompose the dataset into pieces and process those pieces in parallel.
Another common parallel algorithm design strategy is called Task Decomposition, in which a problem is decomposed into tasks, and those that can be performed in parallel are performed by different PEs.
OpenMP supports Task Decomposition, so the final parallel pattern we will examine is the Parallel Task pattern. OpenMP implements this pattern using its sections directive. To see how it works, download the files from the sections folder. Note that each section can perform a different function, so if a problem can be decomposed into independent tasks (i.e., tasks that can run in parallel), the sections directive provides a way to map those tasks to different PEs.
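As a quick reference, the sections directive looks like this in a minimal sketch (not the downloaded patternlet); doTaskA and doTaskB are hypothetical functions standing in for the independent tasks:

#include <stdio.h>
#include <omp.h>

void doTaskA(void) { printf("task A on thread %d\n", omp_get_thread_num()); }   // hypothetical task
void doTaskB(void) { printf("task B on thread %d\n", omp_get_thread_num()); }   // hypothetical task

int main(void) {
    #pragma omp parallel sections      // the sections are independent tasks ...
    {
        #pragma omp section            // ... each of which may run on a different thread
        doTaskA();

        #pragma omp section
        doTaskB();
    }
    return 0;
}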
When you are finished, you may continue to this week's project.