Most supercomputers prevent MPI users from working interactively, because their programs will likely interfere with one another. Instead, users submit their "jobs" (the programs they want to run) to a batch queue, which holds a given job until all of the resources it needs to run become available.
There are a variety of batch scheduling systems available. Commonly used ones include the Portable Batch System (PBS), Torque, Grid Engine, and Slurm. In this course, we will use Slurm (originally the Simple Linux Utility for Resource Management), a free, open-source batch scheduler used on many of the world's supercomputers.
Running a job via Slurm involves three steps: (i) writing a submission script, (ii) submitting that script to Slurm, and (iii) examining the output file your job produces. The first step is to write a submission script like the following:
#!/bin/bash
# Example with 2 nodes, 16 processes each = 32 processes
#
#SBATCH --nodes=2                       # how many nodes
#SBATCH --ntasks-per-node=16            # how many MPI processes per node
#SBATCH --output=slurm-%j.out           # output filename
#
# Optional: uncomment to use
##SBATCH --mail-user=yourEmailAddress   # send emails to this address

# Load the compiler and MPI library
module load openmpi-5.0.7

# Run the program
mpirun ./spmd
Within the script, we specify the number of nodes we want to use (2) and the number of processes per node (16), for a total of 2 x 16 = 32 processes. Borg has 20 compute nodes, each with 16 cores, so 20 and 16 are the maximum values permitted for --nodes and --ntasks-per-node, respectively.
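To run with a different number of processes, you only need to change those two directives. For example, one possible (hypothetical) variation that requests 4 x 16 = 64 processes:

#SBATCH --nodes=4                       # 4 nodes
#SBATCH --ntasks-per-node=16            # 16 MPI processes per node (4 x 16 = 64 total)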
There are multiple MPI implementations available on Borg, so the script also loads a particular implementation (openmpi-5.0.7) before invoking mpirun to run our program. (To see what other modules are available, you can enter the command module avail.)
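For example, the following standard module commands let you explore and manage modules (the exact list of modules you see will depend on how Borg is configured):

module avail                  # list all modules available on the system
module list                   # list the modules currently loaded in your session
module load openmpi-5.0.7     # load this MPI implementation
module unload openmpi-5.0.7   # unload it again, if needed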
When you save the script, give it a descriptive name, such as 2x16.slurm.script. You will need to create at least one script for each project, so save it in the same directory as your program and its Makefile.
You will be running each of your programs multiple times, varying the number of processes in order to test its scalability. Whether you use a single script and change its values for each submission or write a separate script for each submission is up to you. (Since you will be using a similar procedure for each MPI project, using a separate script for each submission might save you time in the long run.)
Once you have a script created, the next step is to submit it to the Slurm scheduling system. To do this, you use the sbatch command:
sbatch 2x16.slurm.script
This tells Slurm to put our program in its job queue. Slurm will read our script to determine the resources it requires (i.e., the number of nodes and the number of processes per node) and schedule our job to run when all of those resources are available.
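If you decided to write a separate script for each configuration (as suggested earlier), you could submit them all with a short shell loop; the script names below are just an illustration of one possible naming scheme:

for script in 1x4.slurm.script 2x8.slurm.script 2x16.slurm.script
do
    sbatch $script    # submit each configuration as a separate job
done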
After you submit your job, sbatch will output a line like this:
Submitted batch job 98778
The 98778 is your job's submission ID number.
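If you want to capture that ID for later use (for example, to check on or cancel the job from a shell script), sbatch's --parsable option prints just the job ID. A minimal sketch:

JOBID=$(sbatch --parsable 2x16.slurm.script)   # capture only the job ID
echo "Submitted job $JOBID"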
The more resources your script requires, the less likely it is that they will all be simultaneously available, and the longer it will take to get scheduled. (This keeps people from "hogging" a supercomputer's resources.)
#SBATCH --mail-user=yourEmailAddress
When the email notification line above is enabled in your submission script, Slurm will send you an email each time your job changes state (i.e., when your job begins running and when your job terminates).
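Depending on how a cluster's Slurm installation is configured, you may also need a --mail-type directive to specify which state changes trigger an email; for example:

#SBATCH --mail-user=yourEmailAddress   # send emails to this address
#SBATCH --mail-type=BEGIN,END,FAIL     # email when the job starts, finishes, or fails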
In between those two notifications (i.e., while your job is running), you can enter the sstat command to monitor the status of your submission interactively, for example:
sstat -j 98778
will provide a status update for submission 98778.
You can enter man sstat for more information.
Alternatively, you can use the squeue command to view Slurm's job queue and see where your job is in the queue.
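For example, to list just your own jobs rather than the entire queue, you can filter squeue by username (substituting your own username, of course):

squeue -u yourUserName    # show only your jobs; the ST column gives the state (PD = pending, R = running)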
If you need to remove your submission from the queue (e.g., it seems to be stuck in an infinite loop), you can do so using the scancel command:
scancel 98778
or you can cancel all of your submissions at once:
scancel -u yourUserName
There are many other options that can be given to these commands. See the manual pages for sbatch, sstat, scancel, and the other Slurm commands for more information.
When your job is finished, Slurm creates a text output file containing whatever your program produced on the standard output stream. To view the name of this file, enter
ls
In the example above, Slurm produced a file named slurm-98778.out. As you can see, the output file's name consists of three parts: the prefix slurm-, the job's submission ID number (98778), and the suffix .out.
To view the contents of a short output file, use the cat command:
cat slurm-98778.out
To view the contents of a longer file, use the less command:
less slurm-98778.out
If you experience difficulty getting any of this to work, please contact Chris Wieringa or Prof. Adams.
Congratulations! You can now run your MPI programs on Calvin's supercomputer!