Cluster: SLURM Queueing System


A SLURM “job” is simply a Unix shell script that executes any number of commands. In many cases, it follows the sequence of commands that you would have typed at the command line: you may copy or rename files, cd into the proper directory, and so on, all before you do the “real” computational work. You then submit your job script to SLURM along with a list of requirements (memory usage, number of CPUs, run time, etc.), and SLURM finds a machine to run your job on. It automatically finds all the machines that meet your requirements, then removes any heavily loaded machines from that list. For example, if machine core-n15 has a small job running that leaves 1200MB of memory free and core-n33 has a large job running that leaves only 200MB of memory free, then SLURM will schedule your large-memory job onto core-n15. Of course, this assumes that you told SLURM how much memory you need (see below).

The DCC login nodes can be accessed on campus via ssh to dscr-slogin.oit.duke.edu. For off-campus access, connect via ssh to dscr-slogin.oit.duke.edu; Multi-Factor Authentication is required. Alternatively, connect directly to the “slogin” login node from off campus using the Duke VPN.


Options for submitting parallel jobs to SLURM:

** NOTE ** Not all programs can take advantage of multiple CPUs or multiple machines. The fact that SLURM can launch your program onto 10 machines does not mean that it will run 10 times faster, or that it will actually use those 10 machines – the program itself must be written to understand multi-processor operation. If you are unsure if your program is single-CPU or multi-CPU aware, IT IS MOST LIKELY SINGLE-CPU ONLY.


** NOTE ** We have a watchdog process which runs periodically and will KILL any job that is not attached to SLURM! If you do not use SLURM, you may be able to start your job, but it will be killed before you get any useful work done … so just use SLURM.



Basic SLURM commands

Six basic user commands should cover most of your needs:

Command   Description                                                      SGE equivalent
sbatch    Submit a job script (batch mode)                                 qsub
salloc    Create a job allocation and start a shell                        (none)
srun      Run a command within an allocation created by sbatch or salloc   (none)
scancel   Delete jobs from the queue                                       qdel
squeue    View the status of jobs                                          qstat
sinfo     View information on nodes and queues                             qhost

With each of these commands, the “--help” option prints a brief description of all options and the “--usage” option prints a list of the options.

Running Programs Interactively

If needed, programs can be run interactively without a batch script by typing “srun --pty bash -i” to reserve a compute node. The “--partition=” option also works with “srun --pty” but is not required.

tm103@dscr-slogin-02  ~ $ srun --pty bash -i
srun: job 507135 queued and waiting for resources
srun: job 507135 has been allocated resources
tm103@dscr-core-17  ~ $

** NOTE ** Do NOT run resource-intensive applications on the head nodes. All the compute nodes have the same software image, including support tools. Use “srun --pty bash -i” for big compiles or application testing.

Simple Example Script

A simple, single-machine, queue script is shown below:

#!/bin/bash
cd /home/username/seq/simple
myprog

SLURM will choose a machine for this job to run on, and then the job will change directories and run the program, “myprog”, on that machine.

Note that since no input or output files were specified, nor were any command line options given, we are assuming that any such inputs can be found in the directory that was cd’ed into, or that program-defaults were used. Similarly, any output files will most likely be produced in the current directory as well.

The first line, with the “#!”, identifies the specific shell that will be used to interpret the commands; in this case, the “bash” shell. There are two or three major shells, each of which has a slightly different syntax: tcsh, bash, and sh (bash and sh use very similar syntax). The use of “#!” on the first line is a standard Unix/Linux convention. Other lines that start with a “#” are comment lines and will be ignored when the script runs.

Job Submission

To submit the job, use the sbatch command with the above script:

% sbatch simple.q

where ‘simple.q’ is the name of the file with the above commands.

When submitting the job, we can also identify parameters for how the job should be run:

% sbatch -o simple.out simple.q

This command-line option, ‘-o simple.out’, specifies that any output that would otherwise go to the screen should instead be placed into a text file named “simple.out”. SLURM provides a number of options that control how the job should run and what kinds of resources it needs in order to run effectively, e.g. memory or CPU requirements.
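As a small illustration of the ‘-o’ option, the sketch below derives the output file name from the script name, so that simple.q writes to simple.out. The helper name output_name_for is hypothetical, and the “.q” suffix is only the naming convention used on this page. The sbatch call is guarded so the snippet does nothing harmful on a machine without SLURM.

```shell
#!/bin/bash
# Derive an output file name from a job script name, e.g. simple.q -> simple.out.
# output_name_for is a hypothetical helper, not part of SLURM.
output_name_for() {
    echo "${1%.q}.out"
}

script="simple.q"
outfile=$(output_name_for "$script")

# Show the command that would be run.
echo "sbatch -o $outfile $script"

# Submit only where sbatch exists and the job script is present.
if command -v sbatch >/dev/null 2>&1 && [ -f "$script" ]; then
    sbatch -o "$outfile" "$script"
fi
```

Keeping the output name tied to the script name makes it easier to match result files to the jobs that produced them.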

Adding SLURM Options

If you always need certain SLURM options to be specified for a given job, you can embed those options into the SLURM job script using lines that start with “#SBATCH”:

#!/bin/bash
#
#SBATCH --job-name=test
#SBATCH --output=test.out
#SBATCH --mem=100
 
cd /home/username/seq/simple
myprog

The ‘--output’ (or ‘-o’) option is used to direct screen output to a file. The ‘--mem=100’ option tells SLURM to only run the job on a node with at least 100 megabytes of RAM available. The ‘100’ here should be changed to match the actual amount of RAM you expect your job to use, e.g. ‘--mem=4000’ (about 4 gigabytes of RAM). Help with estimating your program’s memory use can be found here: Monitoring Memory Usage.

In any shell script, lines starting with a “#” are generally ignored as comments by the shell. Only SLURM interprets the lines that start with “#SBATCH”; other systems will consider them to be comments. This makes it possible for your SLURM script to still run on other Linux machines, for example.
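The following sketch shows a job script that behaves sensibly both under SLURM and in an ordinary shell. It relies on SLURM_JOB_ID, an environment variable that SLURM sets inside a running job; the function name describe_context and the job name “portable” are made up for this example.

```shell
#!/bin/bash
#SBATCH --job-name=portable
#SBATCH --output=portable.out
#SBATCH --mem=100

# Report where the script is running. SLURM sets SLURM_JOB_ID inside a job;
# in an ordinary shell the variable is unset and the #SBATCH lines above
# are treated as plain comments.
describe_context() {
    if [ -n "$1" ]; then
        echo "SLURM job $1"
    else
        echo "plain shell (no SLURM)"
    fi
}

describe_context "${SLURM_JOB_ID:-}"
```

Run directly with bash, the script reports a plain shell; submitted with sbatch, the same file reports its job number in portable.out.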

** NOTE ** The more memory your job requires, the more important it is to include a ‘--mem’ request in your script. See Monitoring Memory Usage for more info on determining how much the memory request should be.

Common SLURM Options

--mail-user=<user>

Sends email to the specified account.

--mail-type=<type>

Notifies the user by email when certain event types occur. Valid type values are BEGIN, END, FAIL, REQUEUE, and ALL (any state change). The user to be notified is indicated with --mail-user.

-o, --output=<file>

-e, --error=<file>

Direct the standard output (-o) and standard error output (-e) to the specified files. Note that this applies only to output or error output that is not otherwise directed to files; i.e. shell redirection (myprog > file.out) takes precedence.

-J, --job-name=<name>

Use this name when displaying the job in the squeue output; defaults to the name of the script file.

-d, --dependency=<dependency_list>

Defer the start of this job until the specified dependencies have been satisfied. <dependency_list> is of the form <type:job_id[:job_id][,type:job_id[:job_id]]>. Many jobs can share the same dependency, and these jobs may even belong to different users. The value may be changed after job submission using the scontrol command.

-d after:job_id[:jobid…]

This job can begin execution after the specified job(s) have begun execution.

-d afterany:job_id[:jobid…]

This job can begin execution after the specified job(s) have terminated. If you have one job that must wait for another to complete (perhaps the first one creates an output file that is needed by the second program), then you can request that the job be held until that first job completes.
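As a sketch of how the afterany dependency might be used in practice, the snippet below submits one job, reads its job ID from the “Submitted batch job <id>” line that sbatch prints, and submits a second job that waits for the first to terminate. The script names prepare.q and analyze.q and both helper functions are hypothetical; the submission itself is guarded so nothing happens where sbatch or the scripts are missing.

```shell
#!/bin/bash
# Extract the numeric job ID from the "Submitted batch job <id>" message.
job_id_from_sbatch_output() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Build an afterany dependency string from one or more job IDs,
# e.g. "afterany:101:102". The subshell keeps the IFS change local.
afterany_dependency() (
    IFS=":"
    echo "afterany:$*"
)

# prepare.q and analyze.q are hypothetical job scripts.
if command -v sbatch >/dev/null 2>&1 && [ -f prepare.q ] && [ -f analyze.q ]; then
    first_id=$(sbatch prepare.q | job_id_from_sbatch_output)
    sbatch --dependency="$(afterany_dependency "$first_id")" analyze.q
fi
```

Because afterany only waits for termination, the second job starts whether or not the first one succeeded, so the second script should check that its input file actually exists.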

--mem=1536

Requests that only machines with 1.5GB (=1536MB) or more of memory be used for this job; i.e. the job requires a lot of memory and thus is not suitable for all hosts. Note that 1G is equal to 1024M. (How do I determine how much memory my program needs? See the FAQ.) Change this amount to reserve the correct amount of memory for your job.

--time=06:30:00

Requests that 6 hours and 30 minutes be allocated for this job to run. Acceptable time formats include “minutes”, “minutes:seconds”, “hours:minutes:seconds”, “days-hours”, “days-hours:minutes” and “days-hours:minutes:seconds”.
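To make the accepted time formats concrete, here is a minimal sketch that converts a time string into seconds using the six formats listed above. The function slurm_time_to_seconds is a hypothetical helper written for this page, not a SLURM command, and it does no validation of malformed input.

```shell
#!/bin/bash
# Convert a --time value to seconds, following the formats listed above.
# Without a dash: "minutes", "minutes:seconds", "hours:minutes:seconds".
# With a dash:    "days-hours", "days-hours:minutes", "days-hours:minutes:seconds".
slurm_time_to_seconds() {
    echo "$1" | awk '
    {
        days = 0; h = 0; m = 0; s = 0
        rest = $0
        dash = index($0, "-")
        if (dash > 0) {
            days = substr($0, 1, dash - 1)
            rest = substr($0, dash + 1)
        }
        n = split(rest, f, ":")
        if (dash > 0) {
            h = f[1]
            if (n >= 2) m = f[2]
            if (n >= 3) s = f[3]
        } else if (n == 1) {
            m = f[1]
        } else if (n == 2) {
            m = f[1]; s = f[2]
        } else {
            h = f[1]; m = f[2]; s = f[3]
        }
        total = ((days * 24 + h) * 60 + m) * 60 + s
        print total
    }'
}

slurm_time_to_seconds 06:30:00   # prints 23400
slurm_time_to_seconds 90         # prints 5400 (a bare number means minutes)
slurm_time_to_seconds 1-12       # prints 129600 (1 day and 12 hours)
```

Note the easy mistake this exposes: a two-field value such as 10:30 means 10 minutes and 30 seconds, not 10 hours and 30 minutes.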

-n, --ntasks=100

Requests 100 tasks, where “100” is the total number of tasks. The default is one CPU core per task, but note that the --cpus-per-task option will change this default.

-c, --cpus-per-task=4

Requests that each task be allocated 4 CPU cores, where “4” is the number of cores per task. You MUST use this if your program is multi-threaded; you should NOT use it otherwise.
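The options above can be combined in a single job script. The sketch below assumes a multi-threaded program that uses OpenMP and therefore reads OMP_NUM_THREADS; that is an assumption about your program, not something SLURM requires. SLURM_CPUS_PER_TASK is the environment variable SLURM sets when --cpus-per-task is given. The program name my_threaded_prog and the helper total_cores are hypothetical.

```shell
#!/bin/bash
#SBATCH --job-name=threads
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=1536
#SBATCH --time=06:30:00

# Total cores requested is tasks multiplied by cores per task,
# e.g. --ntasks=100 with --cpus-per-task=4 asks for 400 cores.
total_cores() {
    echo $(( $1 * $2 ))
}

# Use the core count SLURM granted; fall back to 1 outside SLURM.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Running with $OMP_NUM_THREADS thread(s)"

# Hypothetical multi-threaded program; replace with your own.
# ./my_threaded_prog
```

Reading the thread count from SLURM_CPUS_PER_TASK rather than hard-coding it keeps the program from starting more threads than the cores it was allocated.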