HPC MPI Exercise 5: Hands-On Lab

Integration Using the Trapezoidal Method

There are many different computational methods for performing integration. Since the integral from xLo to xHi of a function f(x) is the area under the curve between xLo and xHi, one way to compute the integral is to compute that area.

One way to compute this area is using the trapezoidal method, which approximates the area by fitting one or more trapezoids to the curve and computing the combined area of the trapezoids. With a single trapezoid, the approximation is pretty poor, but it becomes more accurate as more trapezoids are used.
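Written out, the combined area of the trapezoids can be expressed as a single sum. The symbols n (the number of trapezoids), h (the width of each trapezoid), and x_i (the i-th sample point) are notation introduced here for illustration; they do not come from the provided files.

```latex
h = \frac{x_{Hi} - x_{Lo}}{n}, \qquad x_i = x_{Lo} + i\,h

\int_{x_{Lo}}^{x_{Hi}} f(x)\,dx
  \;\approx\; \sum_{i=0}^{n-1} \frac{f(x_i) + f(x_{i+1})}{2}\,h
  \;=\; h\left[\frac{f(x_0) + f(x_n)}{2} + \sum_{i=1}^{n-1} f(x_i)\right]
```

The right-hand form evaluates f only once per sample point, which is why implementations commonly use it.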

A Sequential Program Using the Trapezoidal Method

Create a new folder for this project, and save a copy of these files: integral.h, integral.c, calcPI.c, and Makefile. The integrate() function in the file integral.c uses the trapezoidal method to perform integration.

The program in calcPI.c uses that function to calculate an approximate value of PI by computing the area of one quadrant of the unit circle. Since a circle's area is PI * radius^2 and the radius of a unit circle is 1, the area of the unit circle is PI. If we can find the area of one quadrant of the unit circle, multiplying that area by 4 should provide an approximate value for PI.

Use the provided Makefile to build the program. Then run it from the command line, using 1 and 10 trapezoids. The program does not report how long it takes to run. Since the I/O should not take much time, add MPI calls to measure how long the integrate() function takes to run, and have the program output that time.

Rerun the program using 1; 10; 100; 1,000; 10,000; 100,000; 1,000,000; 10,000,000; 100,000,000; 1,000,000,000; and 10,000,000,000 trapezoids. In a spreadsheet, record (i) the number of trapezoids used, (ii) the time required, and (iii) the precision (the number of correct decimal digits) of the result. Create a chart of your times, and predict (a) how many trapezoids it will take, and (b) how long it will take, to compute PI to 20 digits of precision.
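Counting correct digits by eye becomes error-prone as the results get longer. One possible helper is sketched below; the function name correct_decimal_digits() and the constant PI_REFERENCE are hypothetical and not part of the provided files. It compares the printed digits after the decimal point against a reference value of PI, so it counts digits that agree textually.

```c
#include <stdio.h>

/* Reference value of PI, written with more digits than a double holds. */
#define PI_REFERENCE 3.14159265358979323846

/* Returns how many digits after the decimal point of approx agree with
   PI, comparing both values printed to 15 decimal places. Returns 0 if
   the integer part differs. */
int correct_decimal_digits(double approx)
{
    char got[512], want[512];   /* large enough for any finite double */
    int i, count = 0;

    sprintf(got, "%.15f", approx);
    sprintf(want, "%.15f", PI_REFERENCE);

    /* The leading "3." must match before any decimal digit can count. */
    if (got[0] != want[0] || got[1] != want[1]) {
        return 0;
    }
    for (i = 2; want[i] != '\0' && got[i] == want[i]; i++) {
        count++;
    }
    return count;
}
```

For example, correct_decimal_digits(3.14) returns 2. The comparison stops at 15 places because a double can represent only about 15 to 16 significant decimal digits.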

Part II: Using Intel's Profiling Tools

In this example, finding the "hotspot" or place where the program is spending most of its time is pretty easy. In larger programs, it can be more difficult, and there may be multiple "hotspots". In this part of today's exercise, we are going to use this simple program to explore the capabilities of a program that can help us find such "hotspots".

Intel and other companies make profiling tools that analyze the performance of a sequential program to help you identify its "hotspots". Take a few minutes to watch this short video to get a high-level overview of the tools available in Intel's Parallel Studio XE.

Intel's profiling tool is called Advisor. This video walks you through the use of Intel's Advisor for multithreading, to get a general sense of its capabilities.

The general steps to use the Advisor tool are:

1. Build your program with debugging information.
2. Run your program from within Advisor, collecting "survey data", which identifies the "hotspots" where your program is spending its time.
3. Add "annotations" -- Advisor-specific directives -- to your source code's hotspot, so that Advisor can model/predict the likely effects of parallelizing it.
4. Run your program from within Advisor, collecting "suitability data". During this run, Advisor will model what would happen if you were to parallelize the hotspot, and try to predict how much that parallelization would improve your program's performance.
5. Run your program from within Advisor, collecting "correctness data". This run will try to determine whether or not adding parallelization will affect the correctness of the results your hotspot computes.
6. View Advisor's "Summary". In this step, Advisor will make recommendations as to what changes you could make to your code to add parallelism and improve its performance.

Getting the first couple of steps to work is the most challenging part, so let's start there.

To use the standalone Advisor application, your program must be compiled for debugging (i.e., with extra information saved so that the debugger can map machine instructions back to the corresponding source code instructions). With the gcc compiler, this is accomplished using the -g switch, so edit the Makefile and add -g to the definitions of CFLAGS and LFLAGS. Then save the Makefile, enter

```
    make clean
```
to remove the old files, and then enter
```
    make
```
to rebuild the program for debugging. Examine the compile and link commands the Makefile is performing, and verify that the -g switch is present.

The Intel tools reside in /opt/intel/, so to launch the Advisor, enter the following:

```
   /opt/intel/advisor/bin64/advixe-gui
```
Try it out on your own: create a project, and follow the steps in the video to run/profile your calcPI binary. When you create your project and specify calcPI as the Application, you can specify the command-line argument on the Application Parameters line. Give it enough trapezoids (e.g., 1,000,000,000) so that it has enough work to do, or else the Collect Survey Data step may report No Data. To change this command-line argument, choose:
```
   File > Project Properties
```
While you are in Project Properties, click the Source Search tab and add your project folder to the list of "Search Directories" so that Advisor can find your project's source code.

When you try the Add Annotations step, gcc will likely give you an error when it processes the directive:

```
   #include "advisor-annotate.h"
```
If this happens, gcc does not know the location of Advisor's #include directory. To tell it where to find this directory, add the following switch to the CFLAGS variable in the Makefile:
```
  -I/opt/intel/advisor/include
```
The -I switch followed by a directory tells gcc to add that directory to the list of those it searches when processing #include directives. That change should resolve that particular problem.

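If the link step then fails with undefined references to functions such as dlopen() or dlsym(), you will likely also need to add the following switch to the LFLAGS variable in the Makefile:
```
   -ldl
```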

Finally, it appears that Intel did not strictly follow the ANSI standard, so their files generate a number of ANSI warnings. If you want to suppress most of those warnings, try removing the -ansi -pedantic switches from the CFLAGS variable in the Makefile.

If you get stuck, Prof. Adams is available to help.

Feel free to try Intel's Advisor with your other projects, to see what information it reveals about their runtime behavior.

If you want to explore more, Intel's Parallel Studio includes two other tools to aid you in parallel program development:

• Intel's VTune Amplifier, a more advanced profiler for multithreading
• Intel's Inspector, a memory and threading debugger

Both of these tools are installed in our lab (in /opt/intel/) so feel free to explore them on your own.

When you are comfortable with the basics of using Intel's Advisor, you may continue to this week's project.
