Description

By default, displays information about your own pending, running, and suspended jobs.

bjobs displays output for condensed host groups and compute units. These host groups and compute units are defined by CONDENSE in the HostGroup or ComputeUnit section of lsb.hosts. These groups are displayed as a single entry with the name as defined by GROUP_NAME or NAME in lsb.hosts. The -l and -X options display uncondensed output.

If you defined LSB_SHORT_HOSTLIST=1 in lsf.conf, parallel jobs running in the same condensed host group or compute unit are displayed as an abbreviated list.

For resizable jobs, bjobs displays the autoresizable attribute and the resize notification command.

To display older historical information, use bhist.

Output: Default Display

Pending jobs are displayed in the order in which they are considered for dispatch. Jobs in higher priority queues are displayed before those in lower priority queues. Pending jobs in queues of the same priority are displayed in the order in which they were submitted, but this order can be changed with the btop or bbot commands. If more than one job is dispatched to a host, the jobs on that host are listed in the order in which they are considered for scheduling on that host, according to their queue priorities and dispatch times. Finished jobs are displayed in the order in which they were completed.
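
The ordering rule for pending jobs can be illustrated with a small Python sketch. The PendingJob class and its fields are hypothetical names used only for this illustration, and the sketch assumes that a larger queue priority number means a higher priority. It does not model reordering with btop or bbot.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    queue_priority: int  # hypothetical field: larger value = higher priority queue
    submit_time: float   # hypothetical field: submission time as a Unix timestamp


def display_order(jobs):
    """Sort by queue priority (highest first), then by submission time (earliest first)."""
    return sorted(jobs, key=lambda j: (-j.queue_priority, j.submit_time))


jobs = [
    PendingJob(101, queue_priority=30, submit_time=1000.0),
    PendingJob(102, queue_priority=40, submit_time=1005.0),
    PendingJob(103, queue_priority=30, submit_time=990.0),
]
ordered_ids = [j.job_id for j in display_order(jobs)]
print(ordered_ids)
```

Job 102 comes first because its queue has the higher priority; jobs 103 and 101 share a priority, so the earlier submission is listed first.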

A listing of jobs is displayed with the following fields:

JOBID
The job ID that LSF assigned to the job.
USER
The user who submitted the job.
STAT
The current status of the job (see JOB STATUS below).
QUEUE
The name of the job queue to which the job belongs. If the queue to which the job belongs has been removed from the configuration, the queue name is displayed as lost_and_found. Use bhist to get the original queue name. Jobs in the lost_and_found queue remain pending until they are switched with the bswitch command into another queue.

In a MultiCluster resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use -w or -l.

FROM_HOST
The name of the host from which the job was submitted.

With MultiCluster, if the host is in a remote cluster, the cluster name and remote job ID are appended to the host name, in the format host_name@cluster_name:job_ID. By default, this field truncates at 11 characters; you might not see the cluster name and job ID unless you use -w or -l.
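
A script that reads the QUEUE and FROM_HOST fields may want to split the MultiCluster forms described above. The following Python sketch is one way to do that; the function names are hypothetical, and the sketch assumes the untruncated values produced with -w or -l. A truncated value simply yields None for the missing parts.

```python
def parse_queue(field):
    """Split 'queue_name@cluster_name' into (queue, cluster); cluster is None if absent."""
    queue, sep, cluster = field.partition("@")
    return queue, (cluster if sep else None)


def parse_from_host(field):
    """Split 'host_name@cluster_name:job_ID' into (host, cluster, remote_job_id)."""
    host, sep, rest = field.partition("@")
    if not sep:
        # Local submission host: no cluster name or remote job ID is appended.
        return host, None, None
    cluster, sep2, job_id = rest.partition(":")
    remote_id = int(job_id) if sep2 and job_id.isdigit() else None
    return host, cluster, remote_id


print(parse_queue("normal@cluster2"))
print(parse_from_host("hostA@cluster1:1234"))
```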

EXEC_HOST
The name of one or more hosts on which the job is executing (this field is empty if the job has not been dispatched). If the host on which the job is running has been removed from the configuration, the host name is displayed as lost_and_found. Use bhist to get the original host name.

If the host is part of a condensed host group or compute unit, the host name is displayed as the name of the condensed group.

If you use wildcards to configure a host to belong to more than one condensed host group, bjobs can display any of those host groups as the execution host name.

JOB_NAME
The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

SUBMIT_TIME
The submission time of the job.

Output: Long format (-l)

The -l option displays a long format listing with the following additional fields:

Job
The job ID that LSF assigned to the job.
User
The ID of the user who submitted the job.
Project
The project the job was submitted from.
Application Profile
The application profile the job was submitted to.
Command
The job command.
CWD
The current working directory on the submission host.
Execution CWD
The actual CWD used when job runs.
Host file

The path to a user-specified host file used when submitting or modifying a job.

Initial checkpoint period
The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.
Checkpoint period
The checkpoint period specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.
Checkpoint directory
The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.
Migration threshold
The migration threshold specified at the job level, by bsub -mig.
Post-execute Command
The post-execution command specified at the job-level, by bsub -Ep.
PENDING REASONS
The reason the job is in the PEND or PSUSP state. The names of the hosts associated with each reason are displayed when both -p and -l options are specified.
SUSPENDING REASONS
The reason the job is in the USUSP or SSUSP state.
loadSched
The load scheduling thresholds for the job.
loadStop
The load suspending thresholds for the job.
JOB STATUS
Possible values for the status of a job include:
PEND

The job is pending. That is, it has not yet been started.

PROV
The job has been dispatched to a power-saved host that is waking up. Before the job can be sent to the sbatchd, it is in a PROV state.
PSUSP
The job has been suspended, either by its owner or the LSF administrator, while pending.
RUN
The job is currently running.
USUSP
The job has been suspended, either by its owner or the LSF administrator, while running.
SSUSP
The job has been suspended by LSF due to either of the following two causes:
  • The load conditions on the execution host or hosts have exceeded a threshold according to the loadStop vector defined for the host or queue.
  • The run window of the job's queue is closed. See bqueues(1), bhosts(1), and lsb.queues(5).
DONE
The job has terminated with status of 0.
EXIT
The job has terminated with a non-zero status – it may have been aborted due to an error in its execution, or killed by its owner or the LSF administrator.

For example, exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job.

UNKWN
mbatchd has lost contact with the sbatchd on the host on which the job runs.
WAIT
For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.
ZOMBI
A job becomes ZOMBI if:
  • A non-rerunnable job is killed by bkill while the sbatchd on the execution host is unreachable and the job is shown as UNKWN.
  • The host on which a rerunnable job is running is unavailable and the job has been requeued by LSF with a new job ID, as if the job were submitted as a new job.
  • After the execution host becomes available, LSF tries to kill the ZOMBI job. Upon successful termination of the ZOMBI job, the job's status is changed to EXIT.

    With MultiCluster, when a job running on a remote execution cluster becomes a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI jobs. In addition, it notifies the submission cluster that the job is in ZOMBI state and the submission cluster requeues the job.
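
When post-processing the STAT field, it can help to keep the status values above in a lookup table. The Python sketch below paraphrases the descriptions from this section; the grouping into SUSPENDED and FINISHED sets and the helper names are assumptions made for illustration, not part of LSF.

```python
# Status codes and short descriptions, paraphrased from the list above.
JOB_STATES = {
    "PEND": "pending, not yet started",
    "PROV": "dispatched to a power-saved host that is waking up",
    "PSUSP": "suspended by owner or administrator while pending",
    "RUN": "running",
    "USUSP": "suspended by owner or administrator while running",
    "SSUSP": "suspended by LSF",
    "DONE": "terminated with status 0",
    "EXIT": "terminated with a non-zero status",
    "UNKWN": "mbatchd lost contact with sbatchd on the execution host",
    "WAIT": "chunk job member waiting to run",
    "ZOMBI": "killed or requeued while the execution host was unreachable",
}

# Hypothetical groupings, chosen only for this example.
SUSPENDED = {"PSUSP", "USUSP", "SSUSP"}
FINISHED = {"DONE", "EXIT"}


def is_suspended(stat):
    """True for any of the three suspended states."""
    return stat in SUSPENDED


def is_finished(stat):
    """True once the job has terminated, successfully or not."""
    return stat in FINISHED


def describe(stat):
    """Return the short description, or a fallback for an unrecognized code."""
    return JOB_STATES.get(stat, "unrecognized status")
```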

RUNTIME
Estimated run time for the job, specified by bsub -We or bmod -We, -We+, -Wep.

The following information is displayed when running bjobs -WL, -WF, or -WP.

TIME_LEFT
The estimated run time that the job has remaining. Along with the time if applicable, one of the following symbols may also display.
  • E: The job has an estimated run time that has not been exceeded.
  • L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
  • X: The job has exceeded its estimated run time and the time displayed is the time remaining until the job reaches its hard run time limit.
  • A dash indicates that the job has no estimated run time and no run limit, or that it has exceeded its run time but does not have a hard limit and therefore runs until completion.

If there is less than a minute remaining, 0:0 displays.
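
The TIME_LEFT symbols can be read as a small decision procedure. The Python sketch below is one interpretation of the bullets above, not LSF's implementation: the function name is hypothetical, all values are plain minutes, and the order in which the conditions are checked is an assumption.

```python
def time_left(elapsed, estimate=None, run_limit=None):
    """Return (minutes_left, symbol); minutes_left is None when a dash applies.

    elapsed, estimate, and run_limit are in minutes. estimate is the
    estimated run time and run_limit is the hard run time limit.
    """
    # L: a hard limit exists, and there is no estimate or the estimate exceeds it.
    if run_limit is not None and (estimate is None or estimate > run_limit):
        return max(run_limit - elapsed, 0), "L"
    # E: the estimate has not been exceeded.
    if estimate is not None and elapsed <= estimate:
        return estimate - elapsed, "E"
    # X: the estimate was exceeded; count down to the hard limit instead.
    if estimate is not None and run_limit is not None:
        return max(run_limit - elapsed, 0), "X"
    # Dash: no estimate and no limit, or estimate exceeded with no hard limit.
    return None, "-"


print(time_left(30, estimate=60))
print(time_left(90, estimate=60, run_limit=120))
```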

FINISH_TIME
The estimated finish time of the job. For done/exited jobs, this is the actual finish time. For running jobs, the finish time is the start time plus the estimated run time (where set and not exceeded) or the start time plus the hard run limit.
  • E: The job has an estimated run time that has not been exceeded.
  • L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
  • X: The job has exceeded its estimated run time and had no hard run time limit set. The finish time displayed is the estimated run time remaining plus the start time.
  • A dash indicates that a pending job, a suspended job, or a job with no run limit has no estimated finish time.
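
The finish time rule for running jobs is simple date arithmetic. The Python sketch below follows the description above under stated assumptions: the function and parameter names are hypothetical, an estimate larger than the hard run limit is ignored in favor of the limit, and None stands for the dash.

```python
from datetime import datetime, timedelta


def estimated_finish(start, elapsed, estimate=None, run_limit=None):
    """Return the estimated finish time of a running job, or None for a dash."""
    estimate_usable = (
        estimate is not None
        and elapsed <= estimate                           # estimate not exceeded
        and (run_limit is None or estimate <= run_limit)  # estimate within hard limit
    )
    if estimate_usable:
        return start + estimate
    if run_limit is not None:
        return start + run_limit
    return None


start = datetime(2024, 1, 1, 12, 0)
print(estimated_finish(start, timedelta(minutes=10), estimate=timedelta(hours=1)))
```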
%COMPLETE
The estimated completion percentage of the job.
  • E: The job has an estimated run time that has not been exceeded.
  • L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
  • X: The job has exceeded its estimated run time and had no hard run time limit set.
  • A dash indicates that the job is pending, or that it is running or suspended, but has no run time limit specified.
Note: For jobs in the UNKWN state, the job run time estimate is based on internal counting by the job's mbatchd.
RESOURCE USAGE
For the MultiCluster job forwarding model, this information is not shown if MultiCluster resource usage updating is disabled. Use LSF_HPC_EXTENSIONS="HOST_RUSAGE" in lsf.conf to specify host-based resource usage.

The values for the current usage of a job include:

HOST
For host-based resource usage, specifies the host.
CPU time
Cumulative total CPU time in seconds of all processes in a job. For host-based resource usage, the cumulative total CPU time in seconds of all processes in a job running on a host.
IDLE_FACTOR
Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue, and the job has triggered an idle exception.
MEM
Total resident memory usage of all processes in a job. For host-based resource usage, the total resident memory usage of all processes in a job running on a host. The sum of host-based rusage may not equal the total job rusage, since total job rusage is the maximum historical value.

By default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).

SWAP
Total virtual memory usage of all processes in a job. For host-based resource usage, the total virtual memory usage of all processes in a job running on a host. The sum of host-based rusage may not equal the total job rusage, since total job rusage is the maximum historical value.

By default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).

NTHREAD
Number of currently active threads of a job.
PGID
Currently active process group ID in a job. For host-based resource usage, the currently active process group ID in a job running on a host.
PIDs
Currently active processes in a job. For host-based resource usage, the currently active processes in a job running on a host.
RESOURCE LIMITS
The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a per-process basis.
The possible per-job resource usage limits are:
  • CPULIMIT
  • TASKLIMIT
  • MEMLIMIT
  • SWAPLIMIT
  • PROCESSLIMIT
  • THREADLIMIT
  • OPENFILELIMIT
  • HOSTLIMIT_PER_JOB
The possible UNIX per-process resource usage limits are:
  • RUNLIMIT
  • FILELIMIT
  • DATALIMIT
  • STACKLIMIT
  • CORELIMIT

If a job submitted to the queue has any of these limits specified (see bsub(1)), then the lower of the corresponding job limit and queue limit is used for the job.

If no resource limit is specified, the resource is assumed to be unlimited. User shell limits that are unlimited are not displayed.
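
The "lower of job limit and queue limit" rule can be sketched in Python as follows. The function name and the dictionary representation are hypothetical; a missing key or a None value stands for an unspecified limit, which the text above treats as unlimited.

```python
def effective_limits(job_limits, queue_limits):
    """Combine job-level and queue-level limits, keeping the lower value of each.

    A result of None means that neither level specified the limit (unlimited).
    """
    result = {}
    for name in sorted(set(job_limits) | set(queue_limits)):
        specified = [
            value
            for value in (job_limits.get(name), queue_limits.get(name))
            if value is not None
        ]
        result[name] = min(specified) if specified else None
    return result


limits = effective_limits(
    {"MEMLIMIT": 2048, "RUNLIMIT": 60},
    {"MEMLIMIT": 4096, "CPULIMIT": 600},
)
print(limits)
```

Here MEMLIMIT resolves to the job-level value because it is lower, while CPULIMIT and RUNLIMIT are each taken from the only level that specified them.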

EXCEPTION STATUS
Possible values for the exception status of a job include:
idle

The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.

overrun
The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.
underrun
The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.
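
The three exception conditions can be expressed as simple comparisons. The Python sketch below is illustrative only: the function and parameter names are hypothetical, CPU time and run time are assumed to be in seconds, and the overrun and underrun thresholds are in minutes as described above. A threshold of None means that the exception is not configured for the queue.

```python
def job_exceptions(cpu_seconds, run_seconds, finished,
                   job_idle=None, job_overrun=None, job_underrun=None):
    """Return the list of exception names that the thresholds would trigger."""
    found = []
    # idle: the idle factor (CPU time / run time) is below the JOB_IDLE threshold.
    if job_idle is not None and run_seconds > 0:
        if cpu_seconds / run_seconds < job_idle:
            found.append("idle")
    # overrun: the job is still running past JOB_OVERRUN minutes.
    if job_overrun is not None and not finished and run_seconds > job_overrun * 60:
        found.append("overrun")
    # underrun: the job finished in fewer than JOB_UNDERRUN minutes.
    if job_underrun is not None and finished and run_seconds < job_underrun * 60:
        found.append("underrun")
    return found


print(job_exceptions(30, 600, finished=False, job_idle=0.1))
```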
Requested resources
Shows all the resource requirement strings you specified in the bsub command.
Execution rusage
This is shown if the combined RES_REQ has an rusage OR || construct. The chosen alternative will be denoted here.
Synchronous Execution
Job was submitted with the -K option. LSF submits the job and waits for the job to complete.
JOB_DESCRIPTION
The job description assigned by the user. This field is omitted if no job description has been assigned.

The displayed job description can contain up to 4094 characters.

MEMORY USAGE
Displays peak memory usage and average memory usage. For example:

MEMORY USAGE:

MAX MEM:11 Mbytes; AVG MEM:6 Mbytes

You can adjust rusage accordingly next time for the same job submission if consumed memory is larger or smaller than current rusage.
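
To compare consumed memory against a requested rusage value in a script, the MEMORY USAGE line can be parsed with a regular expression. The Python sketch below is based only on the example line shown above; the function name is hypothetical, and the pattern may need adjusting if the actual output differs in spacing or units.

```python
import re

# Pattern modeled on the example line "MAX MEM:11 Mbytes; AVG MEM:6 Mbytes".
MEM_LINE = re.compile(
    r"MAX MEM:\s*(\d+(?:\.\d+)?)\s*(\w+);\s*AVG MEM:\s*(\d+(?:\.\d+)?)\s*(\w+)"
)


def parse_memory_usage(line):
    """Return {'max': (value, unit), 'avg': (value, unit)}, or None if no match."""
    match = MEM_LINE.search(line)
    if match is None:
        return None
    return {
        "max": (float(match.group(1)), match.group(2)),
        "avg": (float(match.group(3)), match.group(4)),
    }


usage = parse_memory_usage("MAX MEM:11 Mbytes; AVG MEM:6 Mbytes")
print(usage)
```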

RESOURCE REQUIREMENT DETAILS
Displays the configured level of resource requirement details. The BJOBS_RES_REQ_DISPLAY parameter in lsb.params controls the level of detail that this column displays, which can be as follows:
  • none - no resource requirements are displayed (this column is not displayed in the -l output).
  • brief - displays the combined and effective resource requirements.
  • full - displays the job, app, queue, combined and effective resource requirements.
Requested Network
Displays network resource information for IBM Parallel Edition (PE) jobs submitted with the bsub -network option. It does not display network resource information from the NETWORK_REQ parameter in lsb.queues or lsb.applications.
For example:
bjobs -l
Job <2106>, User <user1>, Project <default>, Status <RUN>, Queue <normal>,
                     Command <my_pe_job>
Fri Jun  1 20:44:42: Submitted from host <hostA>, CWD <$HOME>, Requested Network
                      <protocol=mpi: mode=US: type=sn_all: instance=1: usage=dedicated>

If mode=IP is specified for the PE job, instance is not displayed.

Output: Forwarded job information

The -fwd option filters output to display information on forwarded jobs in MultiCluster job forwarding mode. The following additional fields are displayed:

CLUSTER
The name of the cluster to which the job was forwarded.
FORWARD_TIME
The time that the job was forwarded.

Output: Job array summary information

Use -A to display summary information about job arrays. The following fields are displayed:

JOBID

Job ID of the job array.

ARRAY_SPEC
Array specification in the format name[index]. The array specification might be truncated; use the -w option together with -A to show the full array specification.
OWNER
Owner of the job array.
NJOBS
Number of jobs in the job array.
PEND
Number of pending jobs of the job array.
RUN

Number of running jobs of the job array.

DONE
Number of successfully completed jobs of the job array.
EXIT
Number of unsuccessfully completed jobs of the job array.
SSUSP
Number of LSF system suspended jobs of the job array.
USUSP
Number of user suspended jobs of the job array.
PSUSP
Number of held jobs of the job array.

Output: Session Scheduler job summary information

JOBID
Job ID of the Session Scheduler job.
OWNER
Owner of the Session Scheduler job.
JOB_NAME
The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

NTASKS
The total number of tasks for this Session Scheduler job.
PEND
Number of pending tasks of the Session Scheduler job.
RUN
Number of running tasks of the Session Scheduler job.
DONE
Number of successfully completed tasks of the Session Scheduler job.
EXIT
Number of unsuccessfully completed tasks of the Session Scheduler job.

Output: Unfinished job summary information

Use -sum to display summary information about unfinished jobs. The count of job slots for the following job states is displayed:

RUN
The job is running.
SSUSP
The job has been suspended by LSF.
USUSP
The job has been suspended, either by its owner or the LSF administrator, while running.
UNKNOWN
mbatchd has lost contact with the sbatchd on the host where the job was running.
PEND
The job is pending, which may include PSUSP and chunk job WAIT. When -sum is used with -p in MultiCluster, WAIT jobs are not counted as PEND or FWD_PEND. When -sum is used with -r, WAIT jobs are counted as PEND or FWD_PEND.
FWD_PEND
The job is pending and forwarded to a remote cluster. The job has not yet started in the remote cluster.

Output: Affinity resource requirements information (-l -aff)

Use -l -aff to display information about CPU and memory affinity resource requirements for job tasks. A table with the heading AFFINITY is displayed containing the detailed affinity information for each task, one line for each allocated processor unit. CPU binding and memory binding information are shown in separate columns in the display.

HOST

The host on which the task is running.

TYPE

Requested processor unit type for CPU binding. One of numa, socket, core, or thread.

LEVEL

Requested processor unit binding level for CPU binding. One of numa, socket, core, or thread. If no CPU binding level is requested, a dash (-) is displayed.

EXCL

Requested processor unit binding level for exclusive CPU binding. One of numa, socket, or core. If no exclusive binding level is requested, a dash (-) is displayed.

IDS

List of physical or logical IDs of the CPU allocation for the task.

The list consists of a set of paths, each represented as a sequence of integers separated by slash characters (/), through the topology tree of the host. Each path identifies a unique processing unit allocated to the task. For example, a string of the form 3/0/5/12 represents an allocation to thread 12 in core 5 of socket 0 in NUMA node 3. A string of the form 2/1/4 represents an allocation to core 4 of socket 1 in NUMA node 2. The integers correspond to the node ID numbers displayed in the topology tree from bhosts -aff.
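
An IDS path can be split into its topology levels with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and it assumes that the path components always appear in the order NUMA node, socket, core, thread, as in the examples in this section. A leading slash is ignored.

```python
LEVELS = ("numa", "socket", "core", "thread")


def parse_ids_path(path):
    """Map a path such as '3/0/5/12' to {'numa': 3, 'socket': 0, 'core': 5, 'thread': 12}."""
    parts = [int(p) for p in path.strip("/").split("/")]
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"unexpected number of path components: {path!r}")
    # zip stops at the shorter sequence, so a three-part path omits 'thread'.
    return dict(zip(LEVELS, parts))


print(parse_ids_path("3/0/5/12"))
print(parse_ids_path("2/1/4"))
```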

POL

Requested memory binding policy. Either local or pref. If no memory binding is requested, a dash (-) is displayed.

NUMA

ID of the NUMA node that the task memory is bound to. If no memory binding is requested, a dash (-) is displayed.

SIZE

Amount of memory allocated for the task on the NUMA node.

For example, the following job starts 6 tasks with the following affinity resource requirements:
bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,
exclusive=(socket,injob)):cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <6> is submitted to default queue <normal>.
bjobs -l -aff 6

Job <6>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Comman
                     d <myjob1>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>, 6 Task(s), 
                     Requested Resources <span[hosts=1] rusage[mem=10
                     0]affinity[core(1,same=socket,exclusive=(socket,injob)):cp
                     ubind=socket:membind=localonly:distribute=pack]>;
Thu Feb 14 14:15:07: Started 6 Task(s) on Hosts <hostA> <hostA> <hostA> <hostA>
                     <hostA> <hostA>, Allocated 6 Slot(s) on Hosts <hostA>
                     <hostA> <hostA> <hostA> <hostA> <hostA>, Execution Home 
                     </home/user1>, Execution CWD </home/user1>;

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=1
                     ] affinity[core(1,same=socket,exclusive=(socket,injob))*1:
                     cpubind=socket:membind=localonly:distribute=pack]
 Effective: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=
                     1] affinity[core(1,same=socket,exclusive=(socket,injob))*1
                     :cpubind=socket:membind=localonly:distribute=pack]

 AFFINITY:
                     CPU BINDING                          MEMORY BINDING
                     ------------------------             --------------------
 HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
 hostA               core   socket socket /0/0/0          local 0    16.7MB
 hostA               core   socket socket /0/1/0          local 0    16.7MB
 hostA               core   socket socket /0/2/0          local 0    16.7MB
 hostA               core   socket socket /0/3/0          local 0    16.7MB
 hostA               core   socket socket /0/4/0          local 0    16.7MB
 hostA               core   socket socket /0/5/0          local 0    16.7MB
  

See also

bsub, bkill, bhosts, bmgroup, bclusters, bqueues, bhist, bresume, bsla, bstop, lsb.params, lsb.serviceclasses, mbatchd