Managing jobs with affinity resource requirements

Use the -l -aff option of bjobs, bhist, and bacct to view the resources allocated to jobs and tasks with CPU and memory affinity resource requirements. Use bhosts -aff to view the host resources allocated to affinity jobs.
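In summary, the commands described in this section are:

bjobs -l -aff job_ID    View the current affinity allocation for a job
bhist -l -aff job_ID    View historical affinity information for a job
bacct -l -aff job_ID    View accounting affinity information for a job
bhosts -aff             View host topology for affinity scheduling
lshosts -T              View static host topology information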

Viewing job resources for affinity jobs (-aff)

The -aff option displays information about jobs with CPU and memory affinity resource requirements. A table with the heading AFFINITY shows detailed memory and CPU binding information for each task in the job, one line for each allocated processor unit.

The -aff option must be used together with the -l option of bjobs, bhist, and bacct.

Use bjobs -l -aff to display information about CPU and memory affinity resource requirements for job tasks. A table with the heading AFFINITY is displayed containing the detailed affinity information for each task, one line for each allocated processor unit. CPU binding and memory binding information are shown in separate columns in the display.

For example, the following job starts 6 tasks with these affinity resource requirements:
bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,exclusive=(socket,injob))
:cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <61> is submitted to default queue <normal>.
bjobs -l -aff 61

Job <61>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Comman
                     d <myjob>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>, 6 Processors R
                     equested, Requested Resources <span[hosts=1] rusage[mem=10
                     0]affinity[core(1,same=socket,exclusive=(socket,injob)):cp
                     ubind=socket:membind=localonly:distribute=pack]>;
Thu Feb 14 14:15:07: Started on 6 Hosts/Processors <hostA> <hostA> <hostA
                     > <hostA> <hostA> <hostA>, Execution Home </home/user1
                     >, Execution CWD </home/user1>;

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=1
                     ] affinity[core(1,same=socket,exclusive=(socket,injob))*1:
                     cpubind=socket:membind=localonly:distribute=pack]
 Effective: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=
                     1] affinity[core(1,same=socket,exclusive=(socket,injob))*1
                     :cpubind=socket:membind=localonly:distribute=pack]

 AFFINITY:
                     CPU BINDING                          MEMORY BINDING
                     ------------------------             --------------------
 HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
 hostA               core   socket socket /0/0/0          local 0    16.7MB
 hostA               core   socket socket /0/1/0          local 0    16.7MB
 hostA               core   socket socket /0/2/0          local 0    16.7MB
 hostA               core   socket socket /0/3/0          local 0    16.7MB
 hostA               core   socket socket /0/4/0          local 0    16.7MB
 hostA               core   socket socket /0/5/0          local 0    16.7MB
 ...

Use bhist -l -aff to display historical job information about CPU and memory affinity resource requirements for job tasks.

If the job is pending, the requested affinity resources are displayed. For running jobs, the effective and combined affinity resource allocation decision made by LSF is also displayed, along with a table with the heading AFFINITY that shows detailed memory and CPU binding information for each task, one line for each allocated processor unit. For finished jobs (EXIT or DONE state), the affinity requirements for the job and the effective and combined affinity resource requirement details are displayed.

The following example shows bhist output for job 61, submitted above.

bhist -l -aff 61

Job <61>, User <user1>, Project <default>, Command <myjob>
Thu Feb 14 14:13:46: Submitted from host <hostA>, to Queue <normal>, CWD <$HO
                     ME>, 6 Processors Requested, Requested Resources <span[hos
                     ts=1] rusage[mem=100]affinity[core(1,same=socket,exclusive
                     =(socket,injob)):cpubind=socket:membind=localonly:distribu
                     te=pack]>;
Thu Feb 14 14:15:07: Dispatched to 6 Hosts/Processors <hostA> <hostA> <hostA>
                     <hostA> <hostA> <hostA>, Effective RES_REQ <sel
                     ect[type == local] order[r15s:pg] rusage[mem=100.00] span[
                     hosts=1] affinity[core(1,same=socket,exclusive=(socket,inj
                     ob))*1:cpubind=socket:membind=localonly:distribute=pack] >
                     ;

AFFINITY:
                    CPU BINDING                          MEMORY BINDING
                    ------------------------             --------------------
HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
hostA               core   socket socket /0/0/0          local 0    16.7MB
hostA               core   socket socket /0/1/0          local 0    16.7MB
hostA               core   socket socket /0/2/0          local 0    16.7MB
hostA               core   socket socket /0/3/0          local 0    16.7MB
hostA               core   socket socket /0/4/0          local 0    16.7MB
hostA               core   socket socket /0/5/0          local 0    16.7MB

Thu Feb 14 14:15:07: Starting (Pid 3630709);
Thu Feb 14 14:15:07: Running with execution home </home/user1>, Execution CWD
                     </home/user1>, Execution Pid <3630709>;
Thu Feb 14 14:16:47: Done successfully. The CPU time used is 0.0 seconds;
Thu Feb 14 14:16:47: Post job process done successfully;

MEMORY USAGE:
MAX MEM: 2 Mbytes;  AVG MEM: 2 Mbytes

Summary of time in seconds spent in various states by Thu Feb 14 14:16:47
  PEND     PSUSP    RUN      USUSP    SSUSP    UNKWN    TOTAL
  81       0        100      0        0        0        181
  

Use bacct -l -aff to display accounting information about CPU and memory affinity resource allocations for job tasks. A table with the heading AFFINITY is displayed, containing the detailed affinity information for each task, one line for each allocated processor unit. CPU binding and memory binding information are shown in separate columns in the display. The following example shows bacct output for job 61, submitted above.

bacct -l -aff 61

Accounting information about jobs that are:
  - submitted by all users.
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
------------------------------------------------------------------------------

Job <61>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Comma
                     nd <myjob>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>;
Thu Feb 14 14:15:07: Dispatched to 6 Hosts/Processors <hostA> <hostA> <hostA>
                     <hostA> <hostA> <hostA>, Effective RES_REQ <sel
                     ect[type == local] order[r15s:pg] rusage[mem=100.00] span[
                     hosts=1] affinity[core(1,same=socket,exclusive=(socket,inj
                     ob))*1:cpubind=socket:membind=localonly:distribute=pack] >
                     ;
Thu Feb 14 14:16:47: Completed <done>.

AFFINITY:
                    CPU BINDING                          MEMORY BINDING
                    ------------------------             --------------------
HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
hostA               core   socket socket /0/0/0          local 0    16.7MB
hostA               core   socket socket /0/1/0          local 0    16.7MB
hostA               core   socket socket /0/2/0          local 0    16.7MB
hostA               core   socket socket /0/3/0          local 0    16.7MB
hostA               core   socket socket /0/4/0          local 0    16.7MB
hostA               core   socket socket /0/5/0          local 0    16.7MB

Accounting information about this job:
     CPU_T     WAIT     TURNAROUND   STATUS     HOG_FACTOR    MEM    SWAP
      0.01       81            181     done         0.0001     2M    137M
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second )
 Total number of done jobs:       1      Total number of exited jobs:     0
 Total CPU time consumed:       0.0      Average CPU time consumed:     0.0
 Maximum CPU time of a job:     0.0      Minimum CPU time of a job:     0.0
 Total wait time in queues:    81.0
 Average wait time in queue:   81.0
 Maximum wait time in queue:   81.0      Minimum wait time in queue:   81.0
 Average turnaround time:       181 (seconds/job)
 Maximum turnaround time:       181      Minimum turnaround time:       181
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
  

Viewing host resources for affinity jobs (-aff)

Use bhosts -aff or bhosts -l -aff to display host topology information for CPU and memory affinity scheduling. bhosts -l -aff cannot show remote host topology information in clusters configured with the LSF XL feature of LSF Advanced Edition.

The following fields are displayed:

Host[memory] host_name

Available memory on the host. If memory availability cannot be determined, a dash (-) is displayed for the host. If the -l option is specified with the -aff option, the host name is not displayed.

For hosts that do not support affinity scheduling, a dash (-) is displayed for host memory and no host topology is displayed.

NUMA[numa_node: requested_mem / max_mem]

Requested and available NUMA node memory. It is possible for requested memory for the NUMA node to be greater than the maximum available memory displayed.

Socket, core, and thread IDs are displayed for each NUMA node.

A socket is a collection of cores with a direct pipe to memory. Each socket contains 1 or more cores. This does not necessarily refer to a physical socket, but rather to the memory architecture of the machine.

A core is a single entity capable of performing computations. On hosts with hyperthreading enabled, a core can contain one or more threads.

For example:
bhosts -l -aff hostA
HOST  hostA
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -      8      0      0      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
                r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots
 Total           0.0   0.0   0.0   30%   0.0   193   25     0 8605M  5.8G 13.2G      8
 Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      -


 LOAD THRESHOLD USED FOR SCHEDULING:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -


 CONFIGURED AFFINITY CPU LIST: all

 AFFINITY: Enabled
 Host[15.7G]
     NUMA[0: 0M / 15.7G]
         Socket0
             core0(0)
         Socket1
             core0(1)
         Socket2
             core0(2)
         Socket3
             core0(3)
         Socket4
             core0(4)
         Socket5
             core0(5)
         Socket6
             core0(6)
         Socket7
             core0(7)
  
When LSF detects missing elements in the topology, it attempts to correct the problem by adding the missing levels into the topology. For example, sockets and cores are missing on hostB below:
...
Host[1.4G] hostB
    NUMA[0: 1.4G / 1.4G] (*0 *1)
...

A job that requests 2 cores, 2 sockets, or 2 CPUs will run, as will a job that requests 2 cores from the same NUMA node. However, a job that requests 2 cores from the same socket remains pending: because LSF adds the missing socket and core levels one-for-one above each detected processor unit, no single socket on hostB contains 2 cores.
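As an illustrative sketch (myjob is a placeholder command), the corresponding submissions would be:

bsub -n 1 -R "affinity[core(2)]" myjob
    Runs: any 2 cores on the host satisfy the request.
bsub -n 1 -R "affinity[core(2,same=numa)]" myjob
    Runs: hostB's single NUMA node contains both cores.
bsub -n 1 -R "affinity[core(2,same=socket)]" myjob
    Pends: each corrected socket contains only one core.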

Use lshosts -T to display host topology information for each host.

The following fields are displayed:

Host[memory] host_name

Maximum memory available on the host followed by the host name. If memory availability cannot be determined, a dash (-) is displayed for the host.

For hosts that do not support affinity scheduling, a dash (-) is displayed for host memory and no host topology is displayed.

NUMA[numa_node: max_mem]

Maximum NUMA node memory. It is possible for requested memory for the NUMA node to be greater than the maximum available memory displayed.

If no NUMA nodes are present, the NUMA layer in the output is not shown. Other relevant items such as host, socket, core, and thread are still shown.

If the host is not available, only the host name is displayed. A dash (-) is shown where available host memory would normally be displayed.

A socket is a collection of cores with a direct pipe to memory. Each socket contains 1 or more cores. This does not necessarily refer to a physical socket, but rather to the memory architecture of the machine.

A core is a single entity capable of performing computations. On hosts with hyperthreading enabled, a core can contain one or more threads.

lshosts -T output differs from bhosts -aff output in the following ways:
  • Socket and core IDs are not displayed for each NUMA node.

  • The requested memory of a NUMA node is not displayed.

  • lshosts -T displays all enabled CPUs on a host, not just those defined in the CPU list in lsb.hosts.

A node contains sockets, a socket contains cores, and a core can contain threads if the core is enabled for multithreading.
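These are the same levels that an affinity[] resource requirement can target. As an illustrative sketch (myjob is a placeholder command), each task can be allocated a processor unit at any level of the hierarchy:

bsub -R "affinity[numa(1)]" myjob       Allocate one NUMA node per task
bsub -R "affinity[socket(1)]" myjob     Allocate one socket per task
bsub -R "affinity[core(1)]" myjob       Allocate one core per task
bsub -R "affinity[thread(1)]" myjob     Allocate one thread per task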

In the following example, full topology (NUMA, socket, and core) information is shown for hostA. Hosts hostB and hostC are either not NUMA hosts or they are not available:
lshosts -T
Host[15.7G] hostA
    NUMA[0: 15.7G]
        Socket
            core(0)
        Socket
            core(1)
        Socket
            core(2)
        Socket
            core(3)
        Socket
            core(4)
        Socket
            core(5)
        Socket
            core(6)
        Socket
            core(7)

Host[-] hostB

Host[-] hostC

When LSF cannot detect the full processor unit topology, lshosts -T displays processor units at the closest level it can detect. For example:
lshosts -T
Host[1009M] hostA
    Socket (0 1)

On hostA there are two processor units, 0 and 1. LSF cannot detect core information, so the processor units are attached at the socket level.

Hardware topology information is not shown for client hosts, or for hosts in a mixed cluster or MultiCluster environment that are running a version of LSF older than 9.1.3.