Home Blog
en|ru

SLURM notes

SLURM is a job scheduler for supercomputers and HPC clusters. If that is not what you were looking for, there is no point in reading further. Here are my notes on what the system can do.

Reservations

Useful options for scontrol create reservation: Duration=infinite, StartTime=now+1day, NodeCnt=10

Useful flags:

  • IGNORE_JOBS: ignore currently running jobs when creating the reservation, so it is created even if the required number of nodes cannot be guaranteed to be free at its start time;
  • OVERLAP: the reservation may use nodes that already belong to another reservation (MAINT allows the same);
  • REPLACE_DOWN: automatically add idle nodes to the reservation to replace nodes that have gone DOWN or DRAINED;
  • MAGNETIC: if a user's job matches the conditions of the reservation, it can run in the reservation automatically (no need to write sbatch --reservation=...).
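The options and flags above can be combined into one scontrol call. Below is a minimal sketch that only assembles and prints the command line; it does not run anything. The reservation name maint_test, the user root and the helper reservation_command are hypothetical, picked purely for illustration.

```python
import shlex


def reservation_command(name, users, start="now+1day", duration="infinite",
                        node_cnt=10, flags=()):
    """Build the argument list for `scontrol create reservation`.

    The name and user list are placeholders supplied by the caller;
    the defaults mirror the option values from these notes.
    """
    args = [
        "scontrol", "create", "reservation",
        f"ReservationName={name}",
        f"StartTime={start}",
        f"Duration={duration}",
        f"NodeCnt={node_cnt}",
        "Users=" + ",".join(users),
    ]
    if flags:
        # Several flags are passed as one comma-separated value.
        args.append("Flags=" + ",".join(flags))
    return args


# Hypothetical example: reservation name and user are made up.
cmd = reservation_command("maint_test", ["root"],
                          flags=["REPLACE_DOWN", "MAGNETIC"])
print(shlex.join(cmd))
```

Building the command as a list keeps quoting out of the picture; the list could be passed to subprocess.run on a host where scontrol is installed.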

Options in slurm.conf: ResvProlog and ResvEpilog, programs that run at the start and at the end of a reservation.

Resource Allocation

This is controlled by the SelectType option in slurm.conf. With select/linear, nodes are allocated whole. With the other plugins, individual resources inside a node are allocated. select/cons_res and select/cons_tres are similar, and they can still allocate whole nodes if you set OverSubscribe=EXCLUSIVE on the partition. Possible resource types (SelectTypeParameters) for this non-linear scheduling are:

  • CR_CPU: Resource = CPU
  • CR_CPU_Memory: Resource = memory + CPU (recommended to set DefMemPerCPU)
  • CR_Core: Resource = core
  • CR_Core_Memory: Resource = memory + core
  • CR_ONE_TASK_PER_CORE: Allocate one task per core by default (instead of one per hardware thread)
  • CR_CORE_DEFAULT_DIST_BLOCK: Allocate cores within a node using block distribution by default
  • CR_LLN: Prefer the least loaded nodes
  • CR_Pack_Nodes: By default, "pack" a job's tasks onto as few of its allocated nodes as possible instead of spreading them evenly across nodes. Can be disabled with srun --distribution NoPack
  • CR_Socket: Resource = socket
  • CR_Socket_Memory: Resource = memory + socket
  • CR_Memory: Resource = memory

For select/cons_tres there are two more parameters:

  • DefCpuPerGPU: Default number of CPUs allocated per GPU
  • DefMemPerGPU: Default amount of node memory allocated per GPU, in megabytes
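To see how the options in this section fit together, here is a small sketch that renders a slurm.conf fragment. It assumes that SelectTypeParameters takes one resource type plus optional modifiers from the list above, which is my reading of the list rather than something checked against every SLURM version. The helper select_config and the value 2048 are hypothetical.

```python
# Resource types from the list above; this sketch assumes exactly one is chosen.
RESOURCE_TYPES = {
    "CR_CPU", "CR_CPU_Memory", "CR_Core", "CR_Core_Memory",
    "CR_Socket", "CR_Socket_Memory", "CR_Memory",
}
# Modifiers that are appended after the resource type.
MODIFIERS = {
    "CR_ONE_TASK_PER_CORE", "CR_CORE_DEFAULT_DIST_BLOCK",
    "CR_LLN", "CR_Pack_Nodes",
}


def select_config(resource_type, modifiers=(), def_mem_per_cpu=None,
                  plugin="select/cons_tres"):
    """Return slurm.conf lines for the chosen allocation settings."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unknown resource type: {resource_type}")
    unknown = set(modifiers) - MODIFIERS
    if unknown:
        raise ValueError(f"unknown modifiers: {sorted(unknown)}")
    lines = [
        f"SelectType={plugin}",
        "SelectTypeParameters=" + ",".join([resource_type, *modifiers]),
    ]
    if def_mem_per_cpu is not None:
        # Recommended whenever memory is part of the resource type.
        lines.append(f"DefMemPerCPU={def_mem_per_cpu}")
    return lines


# Hypothetical values: 2048 MB per CPU is only an illustration.
print("\n".join(select_config("CR_Core_Memory",
                              ["CR_ONE_TASK_PER_CORE"],
                              def_mem_per_cpu=2048)))
```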

Node parameters

Features: a comma-separated list of features, possibly with numeric arguments. They are not consumable: a node either has a feature or it does not. You can request them with sbatch --constraint.
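Since features are plain presence flags, matching a job against a node comes down to set membership. The sketch below models only the simplest case, a constraint that joins feature names with & (AND). The real --constraint syntax is richer, and this is not SLURM's own matching code. The feature names intel, ib, bigmem and gpu are made up.

```python
def satisfies(node_features: str, constraint: str) -> bool:
    """Check an AND-only constraint against a node's Features list.

    node_features: comma-separated, as in the node definition.
    constraint:    feature names joined with '&' (AND only in this sketch).
    """
    available = {f.strip() for f in node_features.split(",") if f.strip()}
    required = {f.strip() for f in constraint.split("&") if f.strip()}
    # Nothing is consumed: every required feature just has to be present.
    return required <= available


# Hypothetical feature names.
print(satisfies("intel,ib,bigmem", "intel&ib"))
print(satisfies("intel,ib", "intel&gpu"))
```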

Gres: a list of generic resources in the format <name>[:<type>][:no_consume]:<number>[K|M|G]. name must be listed in GresTypes; type is the model or a similar label. Example: Gres=gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G.
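The Gres format is easier to read once it is taken apart. Here is a rough parser sketch for exactly the format written above, with K, M and G treated as multipliers of 1024, 1024^2 and 1024^3. GresEntry and parse_gres are hypothetical names, and the sketch requires the count to be present, as the format above does.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# K, M, G multiply the count by powers of 1024.
_SUFFIX = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
_COUNT_RE = re.compile(r"^(\d+)([KMG]?)$")


@dataclass
class GresEntry:
    name: str
    gres_type: Optional[str]
    no_consume: bool
    count: int


def parse_gres(spec: str) -> List[GresEntry]:
    """Parse '<name>[:<type>][:no_consume]:<number>[K|M|G]' items."""
    entries = []
    for item in spec.split(","):
        fields = item.strip().split(":")
        if len(fields) < 2:
            raise ValueError(f"missing count in {item!r}")
        # First field is the name, last is the count, the rest is optional.
        name, *middle, raw_count = fields
        match = _COUNT_RE.match(raw_count)
        if match is None:
            raise ValueError(f"bad count in {item!r}")
        gres_type, no_consume = None, False
        for field in middle:
            if field == "no_consume":
                no_consume = True
            elif gres_type is None:
                gres_type = field
            else:
                raise ValueError(f"too many fields in {item!r}")
        count = int(match.group(1)) * _SUFFIX[match.group(2)]
        entries.append(GresEntry(name, gres_type, no_consume, count))
    return entries


for entry in parse_gres(
        "gpu:tesla:1,gpu:kepler:1,bandwidth:lustre:no_consume:4G"):
    print(entry)
```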

Misc

State is saved in /var/spool by default; this can be changed with the StateSaveLocation parameter.

TaskPlugin=task/affinity,task/cgroup: yes, using both plugins together is correct and is the recommended setup.

Nickname sergzhum is registered!