Submit jobs for CPU and memory affinity scheduling by specifying an affinity[] section in the bsub -R option, or in the RES_REQ parameter of a queue defined in lsb.queues or of an application profile defined in lsb.applications.
The affinity[] resource requirement string controls job slot and processor unit allocation and distribution within a host.
See Affinity string for detailed syntax of the affinity[] resource requirement string.
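For example, instead of specifying the requirement at each job submission, a queue defined in lsb.queues can carry the affinity requirement for every job submitted to it. The following is a minimal sketch; the queue name affinity_q is illustrative:
Begin Queue
QUEUE_NAME  = affinity_q
RES_REQ     = affinity[core(1)]
DESCRIPTION = Each task in jobs submitted to this queue is allocated and bound to one core
End Queue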
If JOB_INCLUDE_POSTPROC=Y is set in lsb.applications or lsb.queues, or LSB_JOB_INCLUDE_POSTPROC=Y is set in the job environment, LSF does not release affinity resources until post-execution processing has finished, because the slots are still occupied by the job during post-execution processing.
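As a sketch, an application profile in lsb.applications that keeps affinity resources held through post-execution processing might look like the following; the profile name postproc_app is illustrative:
Begin Application
NAME                 = postproc_app
JOB_INCLUDE_POSTPROC = Y
DESCRIPTION          = Affinity resources are released only after post-execution processing finishes
End Application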
The following examples illustrate affinity jobs that request specific processor unit allocations and task distributions.
The following job asks for 6 slots and runs within a single host. Each slot maps to one core. LSF tries to pack the 6 cores as close together as possible, on a single NUMA node or socket. If the task distribution cannot be satisfied, the job cannot be started.
bsub -n 6 -R "span[hosts=1] affinity[core(1):distribute=pack]" myjob
The following job asks for 6 slots and runs within a single host. Each slot maps to one core, but in this case the cores must be packed into a single socket; otherwise, the job remains pending.
bsub -n 6 -R "span[hosts=1] affinity[core(1):distribute=pack(socket=1)]" myjob
The following job asks for 2 slots on a single host. Each slot maps to 2 cores. The 2 cores for each slot (task) must come from the same socket, and the 2 cores for the second slot (task) must come from a different socket than the first.
bsub -n 2 -R "span[hosts=1] affinity[core(2, same=socket, exclusive=(socket, injob))]" myjob
The following job specifies that each task in the job requires 2 cores from the same socket. The allocated socket is marked exclusive for all other jobs. Each task is CPU bound at the socket level. LSF attempts to distribute the tasks of the job so that they are balanced across all cores.
bsub -n 4 -R "affinity[core(2, same=socket, exclusive=(socket, alljobs)): cpubind=socket:distribute=balance]" myjob
You can submit affinity jobs with various CPU binding and memory binding options. The following examples illustrate this.
In the following job, each of the 2 tasks requires 5 cores in the same NUMA node. The tasks are bound at the NUMA node level with mandatory memory binding.
bsub -n 2 -R "affinity[core(5,same=numa):cpubind=numa:membind=localonly]" myjob
The following job binds a multithreaded job to a single NUMA node:
bsub -n 2 -R "affinity[core(3,same=numa):cpubind=numa:membind=localprefer]" myjob
The following job distributes tasks across sockets:
bsub -n 2 -R "affinity[core(2,same=socket,exclusive=(socket,injob|alljobs)): cpubind=socket]" myjob
Each task needs 2 cores from the same socket and is bound at the socket level. The allocated sockets are exclusive, so no other tasks can use them.
The following job packs job tasks in one NUMA node:
bsub -n 2 -R "affinity[core(1,exclusive=(socket,injob)):distribute=pack(numa=1)]" myjob
Each task needs 1 core, and no other tasks from the same job are allocated CPUs from the same socket. LSF attempts to pack all tasks in the job onto one NUMA node.
LSF sets several environment variables in the execution environment of each job and task. These are designed to integrate with IBM Parallel Environment and IBM Platform MPI. However, these environment variables are available to all affinity jobs and can be used by other applications. Because LSF provides the variables expected by both IBM Parallel Environment and Platform MPI, there is some redundancy: environment variables prefixed with RM_ are implemented for compatibility with IBM Parallel Environment, although Platform MPI uses them as well, while those prefixed with LSB_ are used only by Platform MPI. The two types of variables provide similar information, but in different formats.
See the environment variable reference in the Platform LSF Configuration Reference for detailed information about these variables.
For single-host applications, the application itself does not need to do anything, and only the OMP_NUM_THREADS variable is relevant.
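For example, a single-host OpenMP program (my_openmp_app is an illustrative name) can simply be submitted with a per-task core count, and LSF sets OMP_NUM_THREADS to match the number of allocated cores:
bsub -R "affinity[core(4)]" ./my_openmp_app
In this case, OMP_NUM_THREADS=4 is set in the job environment, as shown in the worked examples later in this section.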
On the first execution host of a multi-host parallel application, Platform MPI running under LSF selects CPU resources for each task, starts the Platform MPI agent (mpid), and binds mpid to all allocated CPUs and memory policies. The corresponding environment variables, including RM_CPUTASKn, are set. Platform MPI reads RM_CPUTASKn on each host and does the task-level binding, binding each task to the CPU list selected for that task. This is the default behavior when Platform MPI runs under LSF.
To support IBM Parallel Operating Environment jobs, LSF starts the PMD program, binds the PMD process to the allocated CPUs and memory nodes on the host, and sets RM_CPUTASKn, RM_MEM_AFFINITY, and OMP_NUM_THREADS. The IBM Parallel Operating Environment will then do the binding for individual tasks.
For OpenMPI jobs, LSF provides a script that converts the job's affinity allocation into an OpenMPI rank file, with one entry per rank in the following format:
Rank 0=Host1 slot=0,1,2,3
Rank 1=Host1 slot=4,5,6,7
Rank 2=Host2 slot=0,1,2,3
Rank 3=Host2 slot=4,5,6,7
Rank 4=Host3 slot=0,1,2,3
Rank 5=Host4 slot=0,1,2,3
The script (openmpi_rankfile.sh) is located in $LSF_BINDIR. Use the DJOB_ENV_SCRIPT parameter in an application profile in lsb.applications to configure the path to the script.
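For example, the following sketch of an application profile runs the rank file script for jobs submitted to it; the profile name openmpi_app is illustrative:
Begin Application
NAME            = openmpi_app
DJOB_ENV_SCRIPT = openmpi_rankfile.sh
DESCRIPTION     = Generate an OpenMPI rank file from the affinity allocation
End Application
Submit jobs to the profile with bsub -app openmpi_app.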
For distributed applications that use blaunch directly to launch tasks or one agent per slot (not per host), LSF by default binds each task to all allocated CPUs and memory nodes on the host. That is, the CPU and memory node lists are generated at the host level. Certain distributed applications may need the binding lists generated on a task-by-task basis. This behavior is configurable in either the job submission environment or an application profile through the environment variable LSB_DJOB_TASK_BIND=Y | N, where N is the default. When this environment variable is set to Y, the binding list is generated for each task individually.
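For example, to request per-task binding lists at submission time, set the variable in the submission environment before calling bsub; bsub copies the submission environment into the job environment. Here, my_distributed_app is an illustrative name for a wrapper that launches its tasks with blaunch:
export LSB_DJOB_TASK_BIND=Y
bsub -n 4 -R "affinity[core(1)]" ./my_distributed_app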
The following examples assume that the cluster comprises only hosts with the following topology:
Host[64.0G] HostN
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(0 22) core0(1 23)
core1(2 20) core1(3 21)
core2(4 18) core2(5 19)
core3(6 16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
Each of the following examples shows:
A bsub command line with an affinity requirement
The allocation for the resulting job as displayed by bjobs
The same allocation as displayed by bhosts
The values of the job environment variables described above, once the job is dispatched
The examples cover some of the more common cases: serial and parallel jobs with simple CPU and memory requirements, as well as the effect of the exclusive clause of the affinity resource requirement string.
bsub -R "affinity[core(1)]" is a serial job asking for a single core.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core - - /0/0/0 - - -
...
...
Host[64.0G] Host1
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(2 20) core1(3 21)
core2(4 18) core2(5 19)
core3(6 16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,22
LSB_BIND_CPU_LIST=0,22
RM_CPUTASK1=0,22
bsub -R "affinity[socket(1)]" is a serial job asking for an entire socket.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 socket - - /0/0 - - -
...
...
Host[64.0G] Host1
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(*2 *20) core1(3 21)
core2(*4 *18) core2(5 19)
core3(*6 *16) core3(7 17)
core4(*8 *14) core4(9 15)
core5(*10 *12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,2,4,6,8,10,12,14,16,18,20,22
LSB_BIND_CPU_LIST=0,2,4,6,8,10,12,14,16,18,20,22
RM_CPUTASK1=0,2,4,6,8,10,12,14,16,18,20,22
bsub -R "affinity[core(4):membind=localonly] rusage[mem=2048]" is a multi-threaded single-task job requiring 4 cores and 2 GB of memory.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core - - /0/0/0 local 0 2.0GB
/0/0/1
/0/0/2
/0/0/3
...
...
Host[64.0G] Host1
NUMA[0: 2.0G / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(*2 *20) core1(3 21)
core2(*4 *18) core2(5 19)
core3(*6 *16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,2,4,6,16,18,20,22 0 1
LSB_BIND_CPU_LIST=0,2,4,6,16,18,20,22
LSB_BIND_MEM_LIST=0
LSB_BIND_MEM_POLICY=localonly
RM_MEM_AFFINITY=yes
RM_CPUTASK1=0,2,4,6,16,18,20,22
OMP_NUM_THREADS=4
bsub -n 2 -R "affinity[core(2)] span[hosts=1]" is a multi-threaded parallel job asking for 2 tasks with 2 cores each running on the same host.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core - - /0/0/0 - - -
/0/0/1
Host1 core - - /0/0/2 - - -
/0/0/3
...
...
Host[64.0G] Host1
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(*2 *20) core1(3 21)
core2(*4 *18) core2(5 19)
core3(*6 *16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,2,4,6
Host1 16,18,20,22
LSB_BIND_CPU_LIST=0,2,4,6,16,18,20,22
RM_CPUTASK1=0,2,4,6
RM_CPUTASK2=16,18,20,22
OMP_NUM_THREADS=2
LSB_BIND_CPU_LIST=0,2,4,6
RM_CPUTASK1=0,2,4,6
OMP_NUM_THREADS=2
LSB_BIND_CPU_LIST=16,18,20,22
RM_CPUTASK1=16,18,20,22
OMP_NUM_THREADS=2
bsub -n 2 -R "affinity[core(2)] span[ptile=1]" is a multi-threaded parallel job asking for 2 tasks with 2 cores each, running on different hosts. This is almost identical to the previous example, except that the allocation is across two hosts.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core - - /0/0/0 - - -
/0/0/1
Host2 core - - /0/0/0 - - -
/0/0/1
...
...
Host[64.0G] Host{1,2}
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(*2 *20) core1(3 21)
core2(4 18) core2(5 19)
core3(6 16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,2,20,22
Host2 0,2,20,22
LSB_BIND_CPU_LIST=0,2,20,22
RM_CPUTASK1=0,2,20,22
OMP_NUM_THREADS=2
bsub -R "affinity[core(1,exclusive=(socket,alljobs))]" is a single-threaded serial job asking for a core with exclusive use of its socket across all jobs. Compare this with the first two examples above, where the job simply asks for a core or a socket.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core - socket /0/0/0 - - -
...
...
Host[64.0G] Host1
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(*2 *20) core1(3 21)
core2(*4 *18) core2(5 19)
core3(*6 *16) core3(7 17)
core4(*8 *14) core4(9 15)
core5(*10 *12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
Host1 0,22
LSB_BIND_CPU_LIST=0,22
RM_CPUTASK1=0,22
From the point of view of what is available to other jobs (that is, the allocation counted against the host), the job has used an entire socket. However, in all other respects the job binds only to a single core.
bsub -R "affinity[core(1):cpubind=socket]" asks for a core, but asks for the binding to be done at the socket level. Contrast this with the previous case, where the job wanted exclusive use of the socket.
...
CPU BINDING MEMORY BINDING
------------------------ --------------------
HOST TYPE LEVEL EXCL IDS POL NUMA SIZE
Host1 core socket - /0/0/0 - - -
...
...
Host[64.0G] Host1
NUMA[0: 0M / 32.0G] NUMA[1: 0M / 32.0G]
Socket0 Socket0
core0(*0 *22) core0(1 23)
core1(2 20) core1(3 21)
core2(4 18) core2(5 19)
core3(6 16) core3(7 17)
core4(8 14) core4(9 15)
core5(10 12) core5(11 13)
Socket1 Socket1
core0(24 46) core0(25 47)
core1(26 44) core1(27 45)
core2(28 42) core2(29 43)
core3(30 40) core3(31 41)
core4(32 38) core4(33 39)
core5(34 36) core5(35 37)
...
The view from the execution side, though, is quite different: here, the list of CPUs that populate the job's binding list on the host is the entire socket.
Host1 0,2,4,6,8,10,12,14,16,18,20,22
LSB_BIND_CPU_LIST=0,2,4,6,8,10,12,14,16,18,20,22
RM_CPUTASK1=0,2,4,6,8,10,12,14,16,18,20,22
Compared to the previous example, from the point of view of what is available to other jobs (that is, the allocation counted against the host), the job has used only a single core. However, in terms of the binding list, the job process is free to use any CPU in the socket while it is running.