

Starting in May 2019, we're testing our new Slurm setup. Slurm is similar to GridEngine: it manages a cluster, distributing user jobs in (hopefully) a fair and efficient way.

The concepts are comparable, but the syntax is not.

This page will hopefully grow organically. Feel free to make corrections and add your tips, tricks and insights.


Some defaults:

  • There are two "partitions" (like a "queue" in GridEngine), called "cpu" (for normal jobs) and "gpu" (where some GPUs are available). The default is "cpu".
  • Default runtime is 10 minutes.
  • Default memory is 10 GB.
  • By default, your job gets 1 GB of "scratch" local disk space in "$TMPDIR" (where TMPDIR=/scratch/$SLURM_JOB_ID).
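
All of these defaults can be overridden per job. A minimal sketch of an sbatch header doing so (the values are illustrative, not recommendations):

```shell
#!/bin/bash
#SBATCH -p cpu                # partition: "cpu" (the default) or "gpu"
#SBATCH --time=02:00:00       # override the 10-minute default runtime
#SBATCH --mem=4G              # override the 10 GB default memory
#SBATCH --gres=tmpspace:5G    # override the 1 GB default scratch space

# The #SBATCH lines are directives read by sbatch; to bash they are comments.
msg="job ran with overridden defaults"
echo "$msg"
```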

Running jobs

You can run jobs using "srun" (interactively), "sbatch" (like qsub), or use "salloc" to allocate resources and then "srun" your commands in that allocation.


srun is mostly useful for testing, and for interactive use. It will execute the command given, and wait for it to finish. Some examples:

  • srun sleep 60

  • srun -c 4 bash -c "hostname; stress -c 10". This will start 1 task with 4 cores (2 CPUs, 2 cores on each).

This is different from:

  • srun -n 4 bash -c "hostname; stress -c 10". This will start 4 separate "tasks", each on 1 CPU (2 cores each): eight threads in total.

The former (-c) is usually what you want: just one "job" with 4 CPU cores.

To me, the number of tasks, CPUs and cores is sometimes slightly surprising. I guess it will make sense after a while...

You can also use srun to get an interactive shell on a compute node (like qlogin):

  • srun -c 2 --mem 5G --time 01:00:00 --pty bash

Or on a specific node:

  • srun -c 2 --mem 5G --time 01:00:00 --nodelist n0014 --pty bash


sbatch is like qsub. Command-line options are similar to srun, and can be embedded in a script file:


#!/bin/bash
#SBATCH -t 00:05:00
#SBATCH --mem=20G
#SBATCH -o log.out
#SBATCH -e errlog.out
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=youremail@some.where   # email address for notifications

echo "Hello World"

To force a job to run on a specific compute node, add a nodelist directive to the script:

#SBATCH --nodelist=n0065   # force the job to run on node n0065


Quoting from the documentation:

The final mode of operation is to create a resource allocation and spawn job steps within that allocation. The salloc command is used to create a resource allocation and typically start a shell within that allocation. One or more job steps would typically be executed within that allocation using the srun command to launch the tasks. Finally the shell created by salloc would be terminated using the exit command.

Be very careful to use srun to run the commands within your allocation. Otherwise, the commands will run on the machine that you're logged in on! See:

# Allocate two compute nodes:
[mmarinus@hpcm05 ~]$ salloc -N 2
salloc: Pending job allocation 1635
salloc: job 1635 queued and waiting for resources
salloc: job 1635 has been allocated resources
salloc: Granted job allocation 1635
salloc: Waiting for resource configuration
salloc: Nodes n[0009-0010] are ready for job

# I got n0009 and n0010
[mmarinus@hpcm05 ~]$ srun hostname

# But this command just runs on the machine I started the salloc command from!
[mmarinus@hpcm05 ~]$ hostname

# Even if you "srun" something, be careful where (e.g.) variable expansion is done:
[mmarinus@hpcm05 ~]$ srun echo "running on $(hostname)"
running on hpcm05.manage.hpc
running on hpcm05.manage.hpc

# Exit the allocation
[mmarinus@hpcm05 ~]$ exit
salloc: Relinquishing job allocation 1635
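
The expansion pitfall above can be demonstrated with plain bash standing in for srun (a sketch; no Slurm needed). The question is which shell expands the substitution: with double quotes it happens in the submitting shell, with single quotes in the spawned one.

```shell
#!/bin/bash
# Double quotes: $$ is expanded by the submitting shell before the child
# ever runs (with srun, that would be the login node):
parent_pid=$(bash -c "echo $$")
# Single quotes: expansion is deferred to the child shell
# (with srun, that would be the compute node):
child_pid=$(bash -c 'echo $$')
echo "submitting shell PID: $parent_pid"
echo "child shell PID:      $child_pid"
```

The two PIDs differ, just as the hostnames would on the cluster; single-quoting the command is how you get expansion on the compute node.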

Local (scratch) disk space

If your job benefits from (faster) local disk space (like "qsub -l tmpspace=xxx"), request it like this:

srun --gres=tmpspace:250M --pty bash

This works with sbatch and salloc as well. The scratch disk space will be made available in $TMPDIR (/scratch/$SLURM_JOB_ID) and will be erased automatically when your job finishes. Note: the "--tmp" option to srun/sbatch sounds like it will do the same, but it won't. Please use the "--gres" method.
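
A minimal sketch of using the scratch space from inside a batch job (the fallback to /tmp is only there so the sketch also runs outside Slurm; the 10G value is illustrative):

```shell
#!/bin/bash
#SBATCH --gres=tmpspace:10G    # request 10 GB of local scratch
#SBATCH --time=01:00:00

# Inside a job, Slurm sets TMPDIR=/scratch/$SLURM_JOB_ID and erases it
# afterwards; fall back to /tmp when run outside Slurm.
scratch="${TMPDIR:-/tmp}"
workdir=$(mktemp -d "$scratch/demo.XXXXXX")
echo "intermediate data" > "$workdir/data.txt"
result=$(cat "$workdir/data.txt")
echo "read back: $result"
rm -rf "$workdir"              # Slurm wipes /scratch itself; tidy /tmp manually
```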

Using a GPU

Something like:

srun -p gpu -c 2 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash

will give you an interactive session with 1 GPU.

srun -p gpu --gpus-per-node=RTX2080Ti:1 --pty bash

will request a specific type of GPU. Currently we have 1 "RTX2080Ti", 4 "TeslaV100", and one node with 4 RTX6000s.
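
Inside a GPU allocation, Slurm exposes the assigned device(s) through CUDA_VISIBLE_DEVICES. A batch-script sketch (the "none" fallback only shows up when run outside an allocation, as it would be in this standalone sketch):

```shell
#!/bin/bash
#SBATCH -p gpu
#SBATCH --gpus-per-node=1
#SBATCH --time=04:00:00
#SBATCH --mem=20G

# Slurm sets CUDA_VISIBLE_DEVICES to the index(es) of the allocated GPU(s).
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible: $gpus"
```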

SGE versus SLURM

qstat                         - squeue
qsub                          - sbatch
qsub -P foo 1.sh              - sbatch -A foo 1.sh
qsub -l h_vmem=20G 1.sh       - sbatch --mem 20G 1.sh
qsub -l tmpspace=100G a1.sh   - sbatch --gres=tmpspace:100G a1.sh
qsub -l h_rt=10:00:00 a1.sh   - sbatch --time=10:00:00 a1.sh
qdel                          - scancel
qlogin                        - srun --pty bash
#$ -pe threaded 2             - #SBATCH -c 2

Examples:

Submit a job script named 1.sh:
  SGE:   qsub 1.sh
  Slurm: sbatch 1.sh
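
Putting the table above to work: a sketch of a Slurm script header translated line by line from a typical SGE header (the values are the ones from the table, not recommendations):

```shell
#!/bin/bash
# Slurm directive                 SGE original
#SBATCH -A foo                  # #$ -P foo
#SBATCH --mem=20G               # #$ -l h_vmem=20G
#SBATCH --time=10:00:00         # #$ -l h_rt=10:00:00
#SBATCH --gres=tmpspace:100G    # #$ -l tmpspace=100G
#SBATCH -c 2                    # #$ -pe threaded 2

status="converted"
echo "$status"
```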


Extra information:
Official Slurm Workload Manager documentation
Slurm Tutorials
SGE to SLURM conversion
SLURM user statistics
Slurm commands

-- Martin Marinus - 2019-05-07

Topic revision: r23 - 2021-02-16 - ReneJanssen