Resource Limits in Slurm

Very short summary

The resources that we put limits on are: CPU, memory, GPU. And runtime, sort of.

Whom do we limit this to? Not to you, as an individual, but to all the people in your HPC group (your "account") together. Your job can be pending because your colleague is using all your group's resources.

There are two types of limits: just a number (your jobs can not use more than X CPU's simultaneously, no more than Y GPU's and no more than Z gigabytes of memory); and a number times the requested runtime. That last one is always tricky to explain properly; we'll get to that.

The boring stuff that's good to know

A word of terminology: in the commands to come, you will often see the terms "TRES" and "GRES". A TRES is a "trackable resource": something that you can request and that we can put limits on. To reiterate, we limit the use of: CPU's, memory and GPU's. A GRES (generic resource) is just a TRES that doesn't have its own category yet (GPU is actually a GRES). For our purposes, GRES and TRES are just the same thing.

Another thing that may be good to point out again (copy/pasted from the HowToS page...):

Most of our compute nodes have 2 physical CPU's. These are the items you can hold in your hand and install in a motherboard socket.

These physical CPU's consist of multiple CPU "cores". These are mostly independent units, equivalent to what (in the old days...) you would actually call "a CPU".

Each of these CPU cores presents itself to the operating system as two logical CPU's, so that it can run two software threads at a time. This is called hyperthreading.

Unfortunately (in my opinion), these "hyperthreads" are what Slurm actually calls a "CPU". If you specify "srun --cpus-per-task=2", you will get 2 hyperthreads, which is just 1 CPU "core". In addition, if you request an odd number of "CPU's", you will get an even number, rounded up. So: "--cpus-per-task=3" will get you 4 hyperthreads (2 CPU cores).

So, whenever you see a "cpu" limit below, remember that's actually a "hyperthread", and you can't actually request 1 "cpu", you will always get an even number.
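
The rounding rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described on this page (2 hyperthreads per core, odd requests rounded up to the next even number); the function names are made up for this example and are not part of Slurm.

```python
def allocated_hyperthreads(requested_cpus: int) -> int:
    """Round a requested number of Slurm "CPU's" up to the next even number."""
    if requested_cpus < 1:
        raise ValueError("request at least 1 CPU")
    # Odd requests get one extra hyperthread; even requests are unchanged.
    return requested_cpus + (requested_cpus % 2)


def physical_cores(requested_cpus: int) -> int:
    """Number of CPU cores behind the allocation (2 hyperthreads per core)."""
    return allocated_hyperthreads(requested_cpus) // 2


for n in (1, 2, 3):
    print(f"--cpus-per-task={n}: "
          f"{allocated_hyperthreads(n)} hyperthreads, {physical_cores(n)} core(s)")
```

So a request for 3 "CPU's" counts as 4 against any "cpu" limit.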

On to the good stuff.

How to see your group limits

The full (but slightly unreadable) resource limit configuration can be seen with the command scontrol show assoc_mgr. The full (and equally unreadable) resource requests for queued and running jobs can be seen with the command scontrol show jobid NNNN. Relating the information from both commands can be quite a bit of work. Fortunately, some kind soul (https://github.com/OleHolmNielsen/Slurm_tools) has written some tools that make this a whole lot easier. The most useful of these are available on the submit hosts, in /usr/local/bin.

To see your group limits, use the command showuserlimits. Without arguments, it shows the limits for your default "account". You can also specify -A someotheraccount or -u someotheruser.

Let's see the output of showuserlimits -u mmarinus -A bofh (my limits are very low, because I don't actually pay to use the cluster; I get paid so that you can use it...).

I'll add some comments inline.

[mmarinus@hpcs03 ~]$ showuserlimits -u mmarinus -A bofh
Association (Parent account):
ClusterName = udac   # this is the same for everyone
Account = bofh   # this is my "account" (the group that has all the actual limits)
UserName =   # no username, this applies to every member of my account group
Partition =   # no partition, this applies to all partitions
Priority = 0
ID = 3
SharesRaw/Norm/Level/Factor = 8/0.00/18909/0.00   # Let's discuss this another time :-)
UsageRaw/Norm/Efctv = 1.99/0.00/0.00
ParentAccount = root, current value = 1
Lft = 1694
DefAssoc = No
GrpJobs =   # This line (and the next 4) are limits that could have been set, but are not
GrpJobsAccrue =
GrpSubmitJobs =
GrpWall =
cpu: Limit = 1882, current value = 0   # this is the first actual limit: I can use no more than 1882 CPU's simultaneously (one job using 1882, or 188 jobs using 10, etc). If I submit more, they stay "pending". 
mem: Limit = 7000000, current value = 0   # My running jobs together can not request more than 7 TB of memory. If I request additional memory, those jobs stay pending.
gres/gpu: Limit = 8, current value = 0   # I can not use more than 8 GPU's simultaneously

GrpTRESMins =   # this would set limits on total resource consumption, including past jobs. We don't do this, we only limit current resource usage.

GrpTRESRunMins =   # this limits "requested_runtime * specified_resource" for running jobs. Time is in minutes.
cpu: Limit = 17818, current value = 0   # I can have 1 job with 2 CPU's requesting 8909 minutes, or 10 jobs with 10 CPU's requesting 178.18 minutes, etc. Additional jobs stay pending.
gres/gpu: Limit = 20160, current value = 0   # I can have 1 job with 1 GPU requesting 20160 minutes, or 4 jobs with 2 GPU's requesting 2520 minutes, etc.

MaxJobs =   # That is all we limit on.
MaxJobsAccrue =
MaxSubmitJobs =
MaxWallPJ =
MinPrioThresh =
Association (User):   # This would show any limits that are applied to me individually, rather than to my group. Nothing here, we only limit groups.
ClusterName = udac
Account = bofh
UserName = mmarinus, UID=10307
Partition =
Priority = 0
ID = 4
SharesRaw/Norm/Level/Factor = 10/0.00/50/0.00
UsageRaw/Norm/Efctv = 1.99/0.00/1.00
ParentAccount =
Lft = 1705
DefAssoc = Yes
GrpJobs =
GrpJobsAccrue =
GrpSubmitJobs =
GrpWall =

GrpTRESMins =

GrpTRESRunMins =

MaxJobs =
MaxJobsAccrue =
MaxSubmitJobs =
MaxWallPJ =
MinPrioThresh =
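
To make the two types of limits a bit more concrete, here is a small Python sketch that applies the numbers from the output above to a list of jobs. It is a simplified model that follows the description on this page ("a number" and "a number times the requested runtime"), not a reimplementation of the Slurm scheduler. The Job class and the function names are invented for this example, and memory is left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Job:
    cpus: int     # Slurm "CPU's", i.e. hyperthreads
    gpus: int
    minutes: int  # requested runtime in minutes


# Limits copied from the showuserlimits output for account "bofh".
GRP_TRES = {"cpu": 1882, "gpu": 8}
GRP_TRES_RUN_MINS = {"cpu": 17818, "gpu": 20160}


def usage(jobs):
    """Return (simultaneous use, resource * requested minutes) summed over jobs."""
    count = {"cpu": sum(j.cpus for j in jobs),
             "gpu": sum(j.gpus for j in jobs)}
    run_mins = {"cpu": sum(j.cpus * j.minutes for j in jobs),
                "gpu": sum(j.gpus * j.minutes for j in jobs)}
    return count, run_mins


def would_start(new_job, running_jobs):
    """True if new_job fits under both kinds of group limit (simplified)."""
    count, run_mins = usage(running_jobs + [new_job])
    return (all(count[r] <= GRP_TRES[r] for r in GRP_TRES)
            and all(run_mins[r] <= GRP_TRES_RUN_MINS[r]
                    for r in GRP_TRES_RUN_MINS))


# A colleague in the same account runs 2 CPU's for 8909 minutes,
# which is exactly the cpu limit under GrpTRESRunMins.
colleague = [Job(cpus=2, gpus=0, minutes=8909)]
print(would_start(Job(cpus=2, gpus=0, minutes=10), colleague))  # False: stays pending
print(would_start(Job(cpus=2, gpus=1, minutes=10), []))         # True
```

The first job stays pending even though it is tiny, because the limits apply to the whole account and the colleague's job already uses all of the group's CPU minutes.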

-- Martin Marinus - 2020-05-26


Topic revision: r1 - 2020-05-26 - MartinMarinus