Difference: SlurmScheduler (1 vs. 14)

Revision 14 - 2019-12-24 - ReneJanssen

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 138 to 138
 Official Slurm Workload Manager documentation
Slurm Tutorials
SGE to SLURM conversion
Added:
> SLURM user statistics
 

Revision 13 - 2019-12-13 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 113 to 113
  Something like:
Changed:
< srun -p gpu -n 12 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash
> srun -p gpu -n 2 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash
  will give you an interactive session with 1 GPU.

srun -p gpu --gpus-per-node=RTX2080Ti:1 --pty bash

Changed:
< will request a specific type of GPU. Currently we have 1 "RTX2080Ti" and 4 "TeslaV100".
> will request a specific type of GPU. Currently we have 1 "RTX2080Ti" and 4 "TeslaV100", and one node with 4 RTX6000's.
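
For batch jobs the same options can go in the script header. A minimal sketch, assuming the "gpu" partition and the GPU type names above (the time and memory values are arbitrary placeholders):

#!/bin/bash
#SBATCH -p gpu                        # GPU partition
#SBATCH --gpus-per-node=TeslaV100:1   # one GPU of a specific type
#SBATCH --time=04:00:00               # placeholder runtime
#SBATCH --mem=20G                     # placeholder memory request

nvidia-smi    # show which GPU the job was given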
 

SGE versus SLURM

SGE vs. SLURM

Revision 12 - 2019-10-15 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 14 to 14
  Some defaults:
Changed:
<   • There is one "partition" (like a "queue" in GridEngine). It is called "one" (suggestions welcome; it cannot be called "default").
>   • There are two "partitions" (like "queues" in GridEngine), called "cpu" (for normal jobs) and "gpu" (where some GPUs are available). The default is "cpu".
 
  • Default runtime is 10 minutes.
  • Default memory is 10 GB.
  • By default, your job gets 1 GB of "scratch" local disk space in "$TMPDIR" (where TMPDIR=/scratch/$SLURM_JOB_ID).
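
Each of these defaults can be overridden per job, on the command line or in the script header. A rough sketch (the values are arbitrary placeholders):

#!/bin/bash
#SBATCH -p cpu                 # partition (default: cpu)
#SBATCH --time=02:00:00        # runtime (default: 10 minutes)
#SBATCH --mem=32G              # memory (default: 10 GB)
#SBATCH --gres=tmpspace:50G    # local scratch in $TMPDIR (default: 1 GB)

echo "Job $SLURM_JOB_ID, scratch in $TMPDIR"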
Line: 113 to 113
  Something like:
Changed:
< srun -n 12 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash
> srun -p gpu -n 12 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash
  will give you an interactive session with 1 GPU.
Changed:
< srun --gpus-per-node=RTX2080Ti:1 --pty bash
> srun -p gpu --gpus-per-node=RTX2080Ti:1 --pty bash
  will request a specific type of GPU. Currently we have 1 "RTX2080Ti" and 4 "TeslaV100".

Revision 11 - 2019-09-11 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 119 to 119
  srun --gpus-per-node=RTX2080Ti:1 --pty bash
Changed:
< will request a specific type of GPU.
> will request a specific type of GPU. Currently we have 1 "RTX2080Ti" and 4 "TeslaV100".
 

SGE versus SLURM

SGE vs. SLURM

Revision 10 - 2019-06-13 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 113 to 113
  Something like:
Changed:
< srun -n 12 --gres=tmpspace:10G --gpus=1 --time 24:00:00 --mem 100G --pty bash
> srun -n 12 --gres=tmpspace:10G --gpus-per-node=1 --time 24:00:00 --mem 100G --pty bash
  will give you an interactive session with 1 GPU.
Changed:
< srun --gpus=RTX2080Ti:1 --pty bash
> srun --gpus-per-node=RTX2080Ti:1 --pty bash
  will request a specific type of GPU.

Revision 9 - 2019-06-05 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 113 to 113
  Something like:
Changed:
< srun -n 12 --gres=tmpspace:10G --gres=gpu:1 --time 24:00:00 --mem 100G --pty bash
> srun -n 12 --gres=tmpspace:10G --gpus=1 --time 24:00:00 --mem 100G --pty bash
  will give you an interactive session with 1 GPU.
Changed:
< srun --gres=gpu:RTX2080Ti:1 --pty bash
> srun --gpus=RTX2080Ti:1 --pty bash
  will request a specific type of GPU.

Revision 8 - 2019-06-03 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 117 to 117
  will give you an interactive session with 1 GPU.
Added:
> srun --gres=gpu:RTX2080Ti:1 --pty bash
>
> will request a specific type of GPU.

 

SGE versus SLURM

SGE vs. SLURM
qstat - squeue

Revision 7 - 2019-05-31 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 109 to 109
  Of course, this works for all the commands. The scratch disk space will be made available in $TMPDIR (/scratch/$SLURM_JOB_ID) and will be erased automatically when your job is finished.
Added:
> Using a GPU
>
> Something like:
>
> srun -n 12 --gres=tmpspace:10G --gres=gpu:1 --time 24:00:00 --mem 100G --pty bash
>
> will give you an interactive session with 1 GPU.

 

SGE versus SLURM

SGE vs. SLURM
qstat - squeue

Revision 6 - 2019-05-23 - ReneJanssen

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 118 to 118
  examples :
submit job named 1.sh
Changed:
< qsub 1.sh = sbatch 1.sh
> sge = qsub 1.sh
> slurm = sbatch 1.sh
  Extra information :
Official Slurm Workload Manager documentation

Revision 5 - 2019-05-16 - ReneJanssen

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 121 to 120
 submit job named 1.sh
qsub 1.sh = sbatch 1.sh
Changed:
< Extra information : https://srcc.stanford.edu/sge-slurm-conversion
> Extra information :
> Official Slurm Workload Manager documentation
> Slurm Tutorials
> SGE to SLURM conversion

 

-- Martin Marinus - 2019-05-07

Deleted:
< Comments

Revision 4 - 2019-05-08 - MartinMarinus

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 8 to 8
  This page will hopefully grow organically. Feel free to make corrections and add your tips, tricks and insights.
Added:
>
 

Defaults

Some defaults:

Line: 15 to 17
 
  • There is one "partition" (like a "queue" in GridEngine). It is called "one" (suggestions welcome; it cannot be called "default").
  • Default runtime is 10 minutes.
  • Default memory is 10 GB.
Changed:
<   • By default, your job gets 1 GB of "scratch" local diskspace in "$TMPDIR".
>   • By default, your job gets 1 GB of "scratch" local disk space in "$TMPDIR" (where TMPDIR=/scratch/$SLURM_JOB_ID).
 

Running jobs

Changed:
< You can run jobs using "srun" (interactively) or "sbatch" (like qsub).
> You can run jobs using "srun" (interactively), "sbatch" (like qsub), or use "salloc" to allocate resources and then "srun" your commands in that allocation.
 

srun

Line: 61 to 63
 echo "Hello World"
Changed:
< SGE versus SLURM
>

salloc/srun

Quoting from the documentation:

The final mode of operation is to create a resource allocation and spawn job steps within that allocation. The salloc command is used to create a resource allocation and typically start a shell within that allocation. One or more job steps would typically be executed within that allocation using the srun command to launch the tasks. Finally the shell created by salloc would be terminated using the exit command.

Be very careful to use srun to run the commands within your allocation. Otherwise, the commands will run on the machine that you're logged in on! See:

# Allocate two compute nodes:
[mmarinus@hpcm05 ~]$ salloc -N 2
salloc: Pending job allocation 1635
salloc: job 1635 queued and waiting for resources
salloc: job 1635 has been allocated resources
salloc: Granted job allocation 1635
salloc: Waiting for resource configuration
salloc: Nodes n[0009-0010] are ready for job

# I got n0009 and n0010
[mmarinus@hpcm05 ~]$ srun hostname
n0009.compute.hpc
n0010.compute.hpc

# But this command just runs on the machine I started the salloc command from!
[mmarinus@hpcm05 ~]$ hostname
hpcm05.manage.hpc

# Even if you "srun" something, be careful where (e.g.) variable expansion is done:
[mmarinus@hpcm05 ~]$ srun echo "running on $(hostname)"
running on hpcm05.manage.hpc
running on hpcm05.manage.hpc

# Exit the allocation
[mmarinus@hpcm05 ~]$ exit
exit
salloc: Relinquishing job allocation 1635
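
One way around this is to defer the expansion with single quotes, so that it happens inside the task on the compute node. A small sketch:

# Single quotes: $(hostname) is expanded by the bash started on the compute node
srun bash -c 'echo "running on $(hostname)"'
# each task now reports its own compute node instead of the login node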

Local (scratch) disk space

If your job benefits from (faster) local disk space (like "qsub -l tmpspace=xxx"), request it like this:

srun --gres=tmpspace:250M --pty bash

Of course, this works for all the commands. The scratch disk space will be made available in $TMPDIR (/scratch/$SLURM_JOB_ID) and will be erased automatically when your job is finished.
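
As a sketch, a batch script could stage its data through this scratch area like this (the paths and the sort command are just placeholders):

#!/bin/bash
#SBATCH --gres=tmpspace:10G    # request 10 GB of local scratch
#SBATCH --time=01:00:00
#SBATCH --mem=4G

# Work in the fast local scratch area; it is erased when the job finishes,
# so copy results back to permanent storage before the script exits.
cp "$HOME/data/input.dat" "$TMPDIR/"    # placeholder input path
cd "$TMPDIR"
sort input.dat > sorted.dat             # placeholder workload
cp sorted.dat "$HOME/results/"          # placeholder output path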

SGE versus SLURM

 
SGE vs. SLURM
qstat - squeue
qsub - sbatch
qdel - scancel
Changed:
< qlogin - salloc
> qlogin - srun --pty bash
 

examples :

Revision 3 - 2019-05-07 - ReneJanssen

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 71 to 71
  examples :
submit job named 1.sh
Changed:
< qsub 1.sh sbatch 1.sh
> qsub 1.sh = sbatch 1.sh
 
Added:
> Extra information : https://srcc.stanford.edu/sge-slurm-conversion
 

Revision 2 - 2019-05-07 - ReneJanssen

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Slurm

Line: 61 to 61
 echo "Hello World"
Added:
>
SGE versus SLURM

SGE vs. SLURM
qstat - squeue
qsub - sbatch
qdel - scancel
qlogin - salloc

examples :
submit job named 1.sh
qsub 1.sh sbatch 1.sh

 -- Martin Marinus - 2019-05-07

Comments

Revision 1 - 2019-05-07 - MartinMarinus

Line: 1 to 1
Added:
>
META TOPICPARENT name="WebHome"

Slurm

Starting in May 2019, we're testing our new Slurm setup. Slurm is similar to GridEngine: it manages a cluster, distributing user jobs in (hopefully) a fair and efficient way.

The concepts are comparable, but the syntax is not.

This page will hopefully grow organically. Feel free to make corrections and add your tips, tricks and insights.

Defaults

Some defaults:

  • There is one "partition" (like a "queue" in GridEngine). It is called "one" (suggestions welcome; it cannot be called "default").
  • Default runtime is 10 minutes.
  • Default memory is 10 GB.
  • By default, your job gets 1 GB of "scratch" local diskspace in "$TMPDIR".

Running jobs

You can run jobs using "srun" (interactively) or "sbatch" (like qsub).

srun

srun will execute the command given, and wait for it to finish. Some examples:

  • srun sleep 60

  • srun -n 4 bash -c "hostname; stress -c 10". This will start 4 separate "tasks", each getting 1 CPU (2 cores on each). Eight threads in total.

This is different from:

  • srun -c 4 bash -c "hostname; stress -c 10". This will start 1 task, getting 4 cores (2 CPUs, 2 cores on each).

To me, the number of tasks, CPUs and cores is sometimes slightly surprising. I guess it will make sense after a while...
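
If in doubt, you can ask Slurm what it actually allocated. A small sketch using standard SLURM_* environment variables (single quotes so the expansion happens on the compute node):

# Four tasks with 1 CPU each; the echo runs once per task:
srun -n 4 bash -c 'echo "task $SLURM_PROCID of $SLURM_NTASKS on $(hostname)"'

# One task with 4 cores:
srun -c 4 bash -c 'echo "cpus per task: $SLURM_CPUS_PER_TASK on $(hostname)"'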

You can also use srun to get an interactive shell on a compute node (like qlogin):

  • srun -n 2 --mem 5G --time 01:00:00 --pty bash

Or on a specific node:

  • srun -n 2 --mem 5G --time 01:00:00 --nodelist n0014 --pty bash

sbatch

sbatch is like qsub. Command-line options are similar to srun's and can be embedded in a script file:

#!/bin/bash

#SBATCH -t 00:05:00
#SBATCH --mem=20G
#SBATCH -o log.out
#SBATCH -e errlog.out
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=youremail@some.where #Email to which notifications will be sent

env
echo "Hello World" 

-- Martin Marinus - 2019-05-07

Comments

 