Cluster: SLURM partitions — where your cluster job goes to be completed


The Duke Compute Cluster uses SLURM as its job scheduler (http://slurm.schedmd.com/). SLURM divides the cluster's compute resources into "partitions," and users direct their submissions to a particular partition. There are several DCC partitions to which batch jobs and interactive sessions can be directed:

  • common, for jobs that will run on the DCC core nodes (up to 64 GB RAM).
  • common-large, for jobs that will run on the DCC core nodes (64–240 GB RAM).
  • gpu-common, for jobs that will run on DCC GPU nodes.
  • Group partitions (partition name varies), for jobs that will run on lab-owned nodes.
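
To see which of these partitions your account can use, and the current state of their nodes, the standard SLURM query commands sinfo and scontrol can be used (output depends on your group memberships; the partition names are those listed above):

sinfo -s                                 # one summary line per partition
sinfo -p common-large                    # node availability in a single partition
scontrol show partition common-large     # full limits and defaults for a partition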

All the partitions use FIFO (first-in, first-out) scheduling, although if the job at the head of a partition's queue will not fit on the available nodes, SLURM will skip that job and try to schedule the next job in the partition.
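
To see how far a job is from running, the standard squeue command can list a partition's pending jobs along with the scheduler's reason codes (for example, Resources or Priority):

squeue -p common -t PENDING                     # pending jobs in the common partition
squeue -u $USER -o "%.10i %.12P %.8T %.20R"     # your jobs: id, partition, state, reason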

Partitions are specified in SLURM submission scripts. An example script for the “common-large” partition:

#!/bin/bash
#SBATCH -e slurm.err       # write standard error to slurm.err
#SBATCH -p common-large    # submit to the common-large partition
#SBATCH --mem=100G         # request 100 GB of memory
(application ... parameters ...)
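
As a sketch of how such a script is used, assuming it has been saved under a hypothetical name such as job.sh: it is submitted with sbatch, and an interactive session on a partition can be started with srun. The --gres=gpu:1 form is the usual SLURM way to request a GPU, but confirm the exact GPU options for the DCC GPU nodes with the documentation or rescomputing@duke.edu.

sbatch job.sh                                   # submit the batch script; SLURM prints the job ID
srun -p common --mem=8G --pty bash -i           # interactive shell on a common node
srun -p gpu-common --gres=gpu:1 --pty bash -i   # interactive shell with one GPU (option syntax may vary)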
 

For more information and assistance, please contact rescomputing@duke.edu.