HPC MPI Exercise 1: Homework Project
The Circuit Satisfiability Problem for a given binary circuit
is to find the set of inputs that cause that circuit to produce '1'
as its output.
For example, here is a diagram of a 32-bit circuit:
Since the circuit contains AND-gates, OR-gates, and inverters,
we can model that circuit in C using the boolean
AND (&&), OR (||), and NOT (!) operators.
The sequential program
circuitSat.c.
does this, and solves the problem by using a for loop to iterate
through all possible inputs, outputting those inputs for which
the circuit produces '1'.
Homework
Our research question for this project is:
Which of the two versions of the Parallel Loop pattern
produce the best performance for the Circuit Satisfiability Problem?
Or does it not matter?
To answer this question, our project has multiple parts:
-
In the Lab:
-
Download the program
circuitSat.c.
Using a text editor,
modify circuitSat.c.
so that it uses the MPI_Wtime() function
to time the computation:
...
MPI_Init(&argc, &argv);
double startTime = MPI_Wtime();
for (i = 0; i <= UINT_MAX; ++i) {
count += checkCircuit(id, i);
}
double totalTime = MPI_Wtime() - startTime;
MPI_Finalize();
printf("\nUsing 1 process, %d solutions were found in %f secs.\n\n", count, totalTime);
...
(Note that you will have to #include <mpi.h>
in order for the program to compile.)
Then compile the program as follows:
mpicc circuitSat.c -Wall -ansi -pedantic -std=c99 -o circuitSat
When your program compiles without errors or warnings, run it as follows:
./circuitSat
With these modifications, the program will self-report how long it
takes to check the circuit.
Warning: This program must evaluate the circuit using 232 inputs;
on my laptop, it took ~7 minutes to finish.
I recommend bringing some other work to do while it runs!
-
Create a spreadsheet (e.g., use Excel or a Google Sheet)
with the column headings Number of Processes and Time.
In the first row beneath these headings, enter '1' and the
time (in seconds) reported by your program.
This will serve as our baseline time against which we'll compare
subsequent times.
In an adjacent cell of the spreadsheet, record how many solutions
this sequential version of the program found.
This will let us verify that modified versions of the program
are producing the correct results.
-
Make a copy of circuitSat.c, naming the copy
slicesCircuitSat.c.
Using any text editor, update its header comment so that it explains
"the 5 Ws":
who (your name),
what (the file's name and what its code does differently),
where (Calvin University),
when (the date), and
why (for CS 374, project 1).
Style and documentation are a significant portion of your project grade,
so get in the habit of doing this for each file that you create/edit.
See the documentation of circuitSatisfiability.c for a model.
-
Using parallelLoopSlices.c as a model,
use the 'Slices' Parallel Loop pattern to
improve the performance of slicesCircuitSat.c.
-
Use MPI's version of the Reduction pattern
(discussed in class) to sum the distributed processes'
local count-values into a total count.
-
Modify the final output statement so that it also reports the
number of processes used.
-
With the exception of the outputs produced by processes as they
discover inputs that cause the circuit to output '1',
use the Master-Only pattern to perform all other output,
including the initial feedback message at the beginning,
and the final output statement at the end.
Make certain that your timing excludes those preliminary and
final output operations,
since they should be the same for any two executions.
-
Make a copy of slicesCircuitSat.c, naming the copy
chunksCircuitSat.c, and update its header comment
as necessary.
Using parallelLoopChunks.c as a model,
use the 'Chunks' Parallel Loop pattern
in this version of the program.
-
When both programs compile without errors or warnings
and produce the correct results,
time their executions with 2, 4, 8, 16, 32, 64, 128, and 256 processes.
For each program,
enter these numbers and the corresponding times in your spreadsheet.
In the lab, you are competing for cores with other students,
so depending on how many lab machines are booted into Linux
(i.e., how many machines are in your hosts file),
you may exhaust the total number of available cores,
in which case mpirun will generate an error.
If this occurs, invoke mpirun with
the -oversubscribe switch, which tells it to launch more
processes on a machine than its number of cores.
See
the mpirun manual page for more details and other options.
-
To visualize how your slices and chunks versions perform in the lab,
create a line-chart of your spreadsheet's values,
with the number of processes on the X-axis and
time on the Y-axis.
Give this chart a title that clearly indicates what it measures
and where the measurements were recorded (e.g., the CS Labs).
Label the scale of your X-axis with the number of processes
(1, 2, 4, 8, 16, 32, 64, 128, and 256)
so that this axis indicates the scale at which measurements were taken.
Since these X-axis values reflect growth by powers of 2,
use a log-scale on your Y-axis to see how the two versions perform
as you vary the number of processes.
-
On the Cluster:
-
Transfer both of your source programs to the cluster
(borg.calvin.edu).
To do this conveniently, you will need to
set up key-based authentication
between your lab machine and the cluster.
Once you have these keys set up, you can:
-
Once you have transferred your programs to the cluster,
I recommend that you save and transfer a copy of this
Makefile to the cluster,
as that will simplify compiling your programs there.
-
Having transferred these files to the cluster,
use ssh to login there.
-
On the cluster, you can cd to your
project directory, and then enter:
make
to compile both your programs there.
Note that you must recompile your programs on Borg,
as it has different CPUs from our lab machines.
-
To run programs on the cluster,
we do not want your classmates' programs to interfere with
your programs' performances.
To prevent such interference,
everyone must use a batch queueing system
named SLURM to run programs on the cluster.
To run your program, follow these directions for
Using Slurm on Borg.
-
Using those directions,
repeat the timing experiments you did in the lab,
using 1, 2, 4, 8, 16, 32, 64, 128, and 256 processes.
The cluster has 20 compute-nodes, each with 16 cores,
and their clock speeds are different from the CS lab machines.
To get the best performance, tell SLURM to
use one node for your 1, 2, 4, 8 and 16 process runs;
then for each subsequent run, tell SLURM to use 16 processes
and double the number of nodes being used.
-
Record your timing data in your spreadsheet;
then add a second line-chart to your spreadsheet,
comparing your two programs' times on the cluster.
Note that our entire class can work "at the same time" on the cluster.
The Slurm batch system uses a queue to schedule the submitted jobs,
and only runs a job when all of the resources it requires are available;
this will prevent anyone's jobs from interfering with anyone else's jobs.
You can even submit multiple jobs at the same time
and they will not interfere with one another
-- Slurm will just queued them up and schedule them
as the resources they require become available.
However, if you and lots of others wait until the final night
before the project is due to collect your timing data,
you may experience much longer wait-times for your jobs to be scheduled,
since the queue will be that much longer and there will be more contention
for Borg's 20 nodes...
-
Finally:
Write a 1-2 page analysis of this project,
in which you answer the research question
and interpret the data you have collected about your programs.
Explain what your charts reveal about your programs' performances:
how their performances change as the number of processes changes,
how your two programs' performances compare to one another,
how running them in the lab compares to running them on the cluster, etc.
Make your analysis as quantitative as possible;
support any claims you make by
citing specific values from the data you collected.
For example, a statement like this:
On the cluster, both programs' times decrease linearly
as the number of processes increases.
is a claim; a quantitative analysis must provide evidence for that claim
by citing specific data you've collected:
For example, the cluster-times for the slices version using
P=1,2,4,8,...,256 processes are respectively
128.6, 65.1, 33.4, 18.7, ..., 0.77 seconds;
the corresponding times for the chunks version are...
Of course, any data you cite must be values from your spreadsheet
(and by extension, charts).
You should also report any surprising discoveries or other observations
you've made during the course of this exercise.
If you used an AI tool for any part of the project,
add additional pages to your analysis in which you name the specific tool
you used, describe the tasks for which you used it,
list the prompts you used, indicate what worked and what did not,
what you did to verify the correctness of the AI's outputs,
and so on, as described in
the course policies.
Hand In
Hard copies of:
-
Your written analysis of this project.
Print this using the print function of
the word processor you used to write your analysis.
-
Your line charts and spreadsheet data.
The spreadsheet data can be printed using your spreadsheet application,
but each line chart should be printed on a separate page
and maximally sized to fill that page
to reveal as much detail as possible.
(If necessary, copy each chart into a word processing application,
blow it up to fill a full page, and print it from there.)
Be sure each chart has a descriptive title to indicate how it is different,
and clearly label your axes (including units).
Provide a legend beneath your chart's title and
use different kinds of lines (e.g., solid vs. dashed) on your charts
to clearly indicate which program each line represents,
so that your lines can be clearly distinguished
when printed in black-and-white or gray scale.
-
A hard copy of each program.
To save paper, use the enscript command to print them.
For example, to print your slices program,
you can enter:
enscript slicesCircuitSat.c
and your program will be printed in two-up format,
using both sides of the sheet of paper on printers that support
2-sided printing.
-
The two .out output files generated by Slurm on the cluster
using 256 processes--one for each version of your program.
Use scp to transfer these files from the cluster
back to your lab workstation.
From there, you should be able to print both files (two-up, front-and-back)
by entering:
cat *.out | enscript
and then printing on a printer that supports 2-sided printing.
Please staple all of these pages together
and make certain that your name is on each page,
in case the pages become separated.
CS >
374 >
Exercises >
01 >
Homework Project
This page maintained by
Joel Adams.