Slurm
Starting in May 2019, we're testing our new Slurm setup. Slurm is similar to
GridEngine: it manages a cluster, distributing user jobs in (hopefully) a fair and efficient way.
The concepts are comparable, but the syntax is not.
This page will hopefully grow organically. Feel free to make corrections and add your tips, tricks and insights.
Defaults
Some defaults:
- There is one "partition" (like a "queue" in GridEngine). It is called "one" (suggestions welcome; it cannot be called "default").
- Default runtime is 10 minutes.
- Default memory is 10 GB.
- By default, your job gets 1 GB of "scratch" local disk space in "$TMPDIR".
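Since "$TMPDIR" is only set inside a job, a script that should also run outside Slurm can fall back to /tmp. A minimal sketch (the /tmp fallback is just a convention for this example, not a Slurm feature):

```shell
#!/bin/bash
# Use the node-local scratch directory Slurm provides, or /tmp elsewhere
scratch="${TMPDIR:-/tmp}"

# Create a private working directory inside the scratch space
workdir=$(mktemp -d "$scratch/job.XXXXXX")
echo "working in: $workdir"

# ... do your work in $workdir ...

# Clean up before the job ends, so the scratch space is freed
rm -rf "$workdir"
```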
Running jobs
You can run jobs using "srun" (interactively) or "sbatch" (like qsub).
srun
srun will execute the command given, and wait for it to finish. Some examples:
- srun -n 4 bash -c "hostname; stress -c 10"
  This will start 4 separate "tasks", each getting 1 CPU (2 cores on each). Eight threads in total.
This is different from:
- srun -c 4 bash -c "hostname; stress -c 10"
  This will start 1 task, getting 4 cores (2 CPUs, 2 cores on each).
To me, the number of tasks, CPUs and cores is sometimes slightly surprising. I guess it will make sense after a while...
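One way to see what a job actually got is to print Slurm's environment variables inside the tasks (SLURM_PROCID, SLURM_NTASKS and SLURM_CPUS_PER_TASK are standard Slurm variables; this sketch assumes you run it on the cluster):

```shell
# Each of the 4 tasks prints its own rank, the task count and its node
srun -n 4 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'

# With -c, each task can see how many cores it was given
srun -c 4 bash -c 'echo "cpus for this task: $SLURM_CPUS_PER_TASK"'
```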
You can also use srun to get an interactive shell on a compute node (like qlogin):
- srun -n 2 --mem 5G --time 01:00:00 --pty bash
Or on a specific node:
- srun -n 2 --mem 5G --time 01:00:00 --nodelist n0014 --pty bash
sbatch
sbatch is like qsub. Command-line options are similar to srun's, and can also be embedded in a script file:
#!/bin/bash
#SBATCH -t 00:05:00
#SBATCH --mem=20G
#SBATCH -o log.out
#SBATCH -e errlog.out
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=youremail@some.where # Email to which notifications will be sent
env
echo "Hello World"
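Save the script as, say, job.sh (the filename is just an example), then submit it and keep an eye on it. sbatch, squeue and scancel are standard Slurm commands:

```shell
sbatch job.sh        # prints "Submitted batch job <jobid>"
squeue -u "$USER"    # list your pending and running jobs
scancel <jobid>      # cancel a job if needed
```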
SGE versus SLURM
Some examples:
- Submit a job script named 1.sh:
  SGE: qsub 1.sh
  Slurm: sbatch 1.sh
--
Martin Marinus - 2019-05-07
Comments