CS 374: Using Slurm on Borg

Most supercomputers prevent MPI users from working interactively, because their programs will likely interfere with one another. Instead, users submit their "jobs" (the programs they want to run) to a batch queue, which holds a given job until all of the resources it needs to run become available.

There are a variety of batch queue systems available. Commonly used ones include the Portable Batch System (PBS), Torque, Grid Engine, and Slurm. In this course, we will use Slurm, the Simple Linux Utility for Resource Management, a free, open-source batch scheduler that runs on many of the world's supercomputers.

Running a job via Slurm involves four steps:

  1. Creating a job submission script.
  2. Submitting the job.
  3. Waiting until the job has completed.
  4. Retrieving the job's output/results.
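Taken together, a typical session looks something like the following sketch (the script name 2x16.slurm.script is just an example; we will create this script in step 1):

   sbatch 2x16.slurm.script      # steps 1-2: submit the job script you created
   squeue -u yourUserName        # step 3: check whether your job is waiting or running
   cat slurm-98778.out           # step 4: view the job's output (98778 is the job's ID number)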
Let's take these one at a time.

1. Creating a Job Submission Script

Before you can submit a job in Slurm, you must first create a text file containing a job submission script, in which you specify the resources your program needs. For example, suppose you want to run a program named spmd, using 2 of the supercomputer's nodes, with 16 processes running on each node (a total of 32 processes). Then we might create the following job submission script:
#!/bin/bash
# Example with 2 nodes, 16 processes each = 32 processes
#
#SBATCH --nodes=2                    # how many nodes
#SBATCH --ntasks-per-node=16         # how many MPI processes per node
#SBATCH --output=slurm-%j.out        # output filename
#
# Optional: uncomment these to receive email about your job
##SBATCH --mail-user=yourEmailAddress # send emails to this address
##SBATCH --mail-type=ALL              # which job events to email about (required for --mail-user to take effect)

# Load the compiler and MPI library
module load openmpi-5.0.7

# Run the program
mpirun ./spmd

Within the script, we specify the number of nodes we want to use (2), and the number of processes per node (16), for a total of 2 x 16 = 32 processes. Borg has 20 compute nodes, each with 16 cores, so those are the maximum values permitted for --nodes and --ntasks-per-node.
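If you want to check these numbers for yourself, Slurm's sinfo command summarizes the cluster's partitions and nodes (the exact columns shown depend on how Borg is configured):

   sinfo        # summarize the cluster's partitions and node states
   sinfo -N     # list each node on its own line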

There are multiple MPI implementations available on Borg, so the script also loads a particular implementation (openmpi-5.0.7) before invoking mpirun to run our program. (To see what other modules are available, you can enter the command module avail.)
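For example, you can list the available modules, load the one we use in this course, and confirm that it is loaded (module avail, module load, and module list are standard environment-module commands):

   module avail                 # list all modules installed on Borg
   module load openmpi-5.0.7    # load the MPI implementation used in this course
   module list                  # confirm which modules are currently loaded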

When you save the script, give it a descriptive name, such as 2x16.slurm.script. You will need to create at least one script for each project, so save it in the same directory as your program and its Makefile.

You will be running each of your programs multiple times, varying the number of processes in order to test its scalability. Whether you use a single script and change its values for each submission or write a separate script for each submission is up to you. (Since you will be using a similar procedure for each MPI project, using a separate script for each submission might save you time in the long run.)
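For instance, here is a sketch of a second script for a 4-node, 8-processes-per-node run (still 32 processes in total); you might save it under a name such as 4x8.slurm.script:

#!/bin/bash
# Example with 4 nodes, 8 processes each = 32 processes
#
#SBATCH --nodes=4                    # how many nodes
#SBATCH --ntasks-per-node=8          # how many MPI processes per node
#SBATCH --output=slurm-%j.out        # output filename

# Load the compiler and MPI library
module load openmpi-5.0.7

# Run the program
mpirun ./spmd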

2. Submitting the Job

Once you have a script created, the next step is to submit it to the Slurm scheduling system. To do this, you use the sbatch command:

   sbatch 2x16.slurm.script
This tells Slurm to put our program in its job queue. Slurm will read our script to determine the resources the job requires (i.e., the number of nodes and processes per node) and schedule it when all of those resources are available.

After you submit your job, sbatch will output a line like this:

   Submitted batch job 98778
The 98778 is your job's submission ID number.
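If you ever want to use this ID number in a shell command or script, sbatch's --parsable option prints just the ID, which you can capture in a variable; a small sketch:

   jobid=$(sbatch --parsable 2x16.slurm.script)   # capture just the job ID
   echo "Submitted job $jobid"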

3. Waiting Until the Job has Completed

The more resources your script requires, the less likely it is that they will all be simultaneously available, and the longer it will take to get scheduled. (This keeps people from "hogging" a supercomputer's resources.)

Depending on how busy Borg is, your job may start almost immediately or it may sit in the queue for a while. Once your job is running, you can enter the sstat command to monitor its status; for example:

   sstat -j 98778
will provide a status update for job 98778. You can enter man sstat for more information.

Alternatively, you can use the squeue command to view Slurm's job queue and see where your job is in the queue.
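For example, to list just your own jobs rather than the entire queue, you can give squeue your user name:

   squeue -u yourUserName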

If you need to remove your submission from the queue (e.g., it seems to be stuck in an infinite loop), you can do so using the scancel command:

   scancel 98778
or you can cancel all of your submissions at once:
   scancel -u yourUserName
There are many other options that can be given to these commands. See the manual pages for sbatch, squeue, sstat, scancel, and the other Slurm commands for more information.

4. Retrieving the Job's Output/Results

When your job is finished, Slurm creates a text output file containing whatever your program produced on the standard output stream. To view the name of this file, enter

   ls 
In the example above, Slurm produced a file named slurm-98778.out. As you can see, the output file's name consists of three parts:
  1. The prefix slurm-.
  2. The job ID number 98778.
  3. The suffix .out.
Each time you submit a new job, Slurm gives that submission a unique ID number, so the .out file from each submission will be unique.
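For example, after several submissions you can list all of your output files at once:

   ls slurm-*.out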

To view the contents of a short output file, use the cat command:

   cat slurm-98778.out
To view the contents of a longer file, use the less command:
   less slurm-98778.out
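You can also use standard Unix tools to pull particular lines out of a longer output file. For example, assuming your program prints timing lines containing the word "time" (adjust the search string to match whatever your program actually prints):

   grep time slurm-98778.out     # show only the lines containing 'time'
   tail slurm-98778.out          # show the last ten lines of the file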

If you experience difficulty getting any of this to work, please contact Chris Wieringa or Prof. Adams.

Congratulations! You can now run your MPI programs on Calvin's supercomputer!

