Description

By default, displays information about your own pending, running, and suspended jobs.

bjobs displays output for condensed host groups and compute units. These host groups and compute units are defined by CONDENSE in the HostGroup or ComputeUnit section of lsb.hosts. These groups are displayed as a single entry with the name as defined by GROUP_NAME or NAME in lsb.hosts. The -l and -X options display uncondensed output.

If you defined LSB_SHORT_HOSTLIST=1 in lsf.conf, parallel jobs running in the same condensed host group or compute unit are displayed as an abbreviated list.

For resizable jobs, bjobs displays the autoresizable attribute and the resize notification command.

To display older historical information, use bhist.

Output: Default Display

Pending jobs are displayed in the order in which they are considered for dispatch. Jobs in higher priority queues are displayed before those in lower priority queues. Pending jobs in queues of the same priority are displayed in the order in which they were submitted, but this order can be changed with the btop or bbot commands. If more than one job is dispatched to a host, the jobs on that host are listed in the order in which they are considered for scheduling on that host, based on their queue priorities and dispatch times. Finished jobs are displayed in the order in which they were completed.
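The ordering rule for pending jobs can be pictured as a two-part sort key. This is a minimal sketch, not LSF's scheduler: it uses a hypothetical PendingJob record, assumes that a larger queue priority value means a higher-priority queue, and ignores btop/bbot reordering and the per-host ordering of dispatched jobs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingJob:
    # Hypothetical record for illustration; not an LSF data structure.
    job_id: int
    queue_priority: int  # assumption: larger value = higher-priority queue
    submit_time: float   # submission time, e.g. seconds since the epoch


def display_order(jobs: List[PendingJob]) -> List[PendingJob]:
    """Higher-priority queues first; within equal priority, earliest submission first.

    btop/bbot adjustments are not modelled.
    """
    return sorted(jobs, key=lambda j: (-j.queue_priority, j.submit_time))


if __name__ == "__main__":
    jobs = [PendingJob(101, 20, 300.0), PendingJob(102, 40, 500.0), PendingJob(103, 20, 100.0)]
    print([j.job_id for j in display_order(jobs)])  # [102, 103, 101]
```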

A listing of jobs is displayed with the following fields:

JOBID

The job ID that LSF assigned to the job.

USER

The user who submitted the job.

STAT

The current status of the job (see JOB STATUS below).

QUEUE

The name of the job queue to which the job belongs. If the queue to which the job belongs has been removed from the configuration, the queue name is displayed as lost_and_found. Use bhist to get the original queue name. Jobs in the lost_and_found queue remain pending until they are moved to another queue with the bswitch command.

In a MultiCluster resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use -w or -l.

FROM_HOST

The name of the host from which the job was submitted.

With MultiCluster, if the host is in a remote cluster, the cluster name and remote job ID are appended to the host name, in the format host_name@cluster_name:job_ID. By default, this field truncates at 11 characters; you might not see the cluster name and job ID unless you use -w or -l.

EXEC_HOST

The name of one or more hosts on which the job is executing (this field is empty if the job has not been dispatched). If the host on which the job is running has been removed from the configuration, the host name is displayed as lost_and_found. Use bhist to get the original host name.

If the host is part of a condensed host group or compute unit, the host name is displayed as the name of the condensed group.

If you configure a host to belong to more than one condensed host group by using wildcards, bjobs can display any of those host groups as the execution host name.

JOB_NAME

The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

SUBMIT_TIME

The submission time of the job.
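Because the default display is a fixed-width table in which EXEC_HOST can be empty and JOB_NAME and SUBMIT_TIME can contain spaces, splitting rows on whitespace is unreliable. One approach, sketched below, is to slice each row at the column offsets of the headings. The function name is hypothetical, and the sketch assumes every value starts in the same column as its heading and does not overflow into the next column; wide output (-w) or overlong values would break that assumption.

```python
import re
from typing import Dict, List


def parse_default_output(text: str) -> List[Dict[str, str]]:
    """Split default-format bjobs output into one dict per job, keyed by heading.

    Assumes each value starts in the same column as its heading and does not
    run into the next column.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0]
    # (heading, start column) for every heading word, left to right
    columns = [(m.group(), m.start()) for m in re.finditer(r"\S+", header)]
    rows = []
    for line in lines[1:]:
        row = {}
        for i, (name, start) in enumerate(columns):
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            row[name] = line[start:end].strip()  # an empty EXEC_HOST becomes ""
        rows.append(row)
    return rows
```

For a pending job, EXEC_HOST comes back as an empty string, while multi-word values in JOB_NAME and SUBMIT_TIME stay intact.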

Output: Long format (-l)

The -l option displays a long format listing with the following additional fields:

Job

The job ID that LSF assigned to the job.

User

The ID of the user who submitted the job.

Project

The project the job was submitted from.

Application Profile

The application profile the job was submitted to.

Command

The job command.

CWD

The current working directory on the submission host.

Execution CWD

The actual CWD used when the job runs.

Host file

The path to a user-specified host file used when submitting or modifying a job.

Initial checkpoint period

The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.

Checkpoint period

The checkpoint period specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.

Checkpoint directory

The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.

Migration threshold

The migration threshold specified at the job level, by bsub -mig.

Post-execute Command

The post-execution command specified at the job level, by bsub -Ep.

PENDING REASONS

The reason the job is in the PEND or PSUSP state. The names of the hosts associated with each reason are displayed when both -p and -l options are specified.

SUSPENDING REASONS

The reason the job is in the USUSP or SSUSP state.

loadSched: The load scheduling thresholds for the job.
loadStop: The load suspending thresholds for the job.

JOB STATUS

Possible values for the status of a job include:

PEND

The job is pending. That is, it has not yet been started.

PROV

The job has been dispatched to a power-saved host that is waking up. Before the job can be sent to the sbatchd, it is in a PROV state.

PSUSP

The job has been suspended, either by its owner or the LSF administrator, while pending.

RUN

The job is currently running.

USUSP

The job has been suspended, either by its owner or the LSF administrator, while running.

SSUSP

The job has been suspended by LSF for either of the following reasons:

The load conditions on the execution host or hosts have exceeded a threshold according to the loadStop vector defined for the host or queue.
The run window of the job's queue is closed. See bqueues(1), bhosts(1), and lsb.queues(5).

DONE

The job has terminated with status of 0.

EXIT

The job has terminated with a non-zero status. It may have been aborted due to an error in its execution, or killed by its owner or the LSF administrator.

For example, exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job.

UNKWN

mbatchd has lost contact with the sbatchd on the host on which the job runs.

WAIT

For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.

ZOMBI

A job becomes ZOMBI if:

A non-rerunnable job is killed by bkill while the sbatchd on the execution host is unreachable and the job is shown as UNKWN.
The host on which a rerunnable job is running is unavailable and the job has been requeued by LSF with a new job ID, as if the job were submitted as a new job.

After the execution host becomes available, LSF tries to kill the ZOMBI job. Upon successful termination of the ZOMBI job, the job's status is changed to EXIT.

With MultiCluster, when a job running on a remote execution cluster becomes a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI jobs. In addition, it notifies the submission cluster that the job is in ZOMBI state and the submission cluster requeues the job.
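The status codes above fall into a few broad groups. A small enum, written for illustration only, can make those groups explicit; the grouping properties and the status_for_exit_code helper are this sketch's own, not an LSF API. The exit-code helper follows the DONE and EXIT definitions: exit status 0 maps to DONE, anything else to EXIT.

```python
from enum import Enum


class JobStatus(Enum):
    """bjobs STAT values as described above; the helper properties are illustrative."""

    PEND = "PEND"
    PROV = "PROV"
    PSUSP = "PSUSP"
    RUN = "RUN"
    USUSP = "USUSP"
    SSUSP = "SSUSP"
    DONE = "DONE"
    EXIT = "EXIT"
    UNKWN = "UNKWN"
    WAIT = "WAIT"
    ZOMBI = "ZOMBI"

    @property
    def is_finished(self) -> bool:
        # DONE and EXIT are the two terminal states.
        return self in (JobStatus.DONE, JobStatus.EXIT)

    @property
    def is_suspended(self) -> bool:
        # Suspended while pending (PSUSP) or while running (USUSP, SSUSP).
        return self in (JobStatus.PSUSP, JobStatus.USUSP, JobStatus.SSUSP)


def status_for_exit_code(code: int) -> JobStatus:
    """DONE for exit status 0, EXIT for any non-zero status."""
    return JobStatus.DONE if code == 0 else JobStatus.EXIT
```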

RUNTIME

Estimated run time for the job, specified by bsub -We or bmod -We, -We+, -Wep.

The following information is displayed when running bjobs -WL, -WF, or -WP.

TIME_LEFT

The estimated run time that the job has remaining. If applicable, one of the following symbols is displayed along with the time:

E: The job has an estimated run time that has not been exceeded.
L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
X: The job has exceeded its estimated run time and the time displayed is the time remaining until the job reaches its hard run time limit.
A dash indicates that the job has no estimated run time and no run limit, or that it has exceeded its run time but does not have a hard limit and therefore runs until completion.

If there is less than a minute remaining, 0:0 displays.
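The TIME_LEFT symbol rules can be read as a small decision procedure. The sketch below is one interpretation of the rules as written, not LSF's code: the function name is hypothetical, it works in whole minutes (bsub -W and -We take [hour:]minute values), so a result of 0 corresponds to the 0:0 display, and it returns None where bjobs would show only a dash.

```python
from typing import Optional, Tuple


def time_left(
    elapsed_min: int,
    estimate_min: Optional[int] = None,
    hard_limit_min: Optional[int] = None,
) -> Tuple[Optional[int], str]:
    """Return (minutes remaining, symbol) for a running job."""
    # L: a hard limit applies and there is no estimate, or the estimate exceeds the limit.
    if hard_limit_min is not None and (estimate_min is None or estimate_min > hard_limit_min):
        return max(hard_limit_min - elapsed_min, 0), "L"
    # E: an estimate is set and has not been exceeded yet.
    if estimate_min is not None and elapsed_min < estimate_min:
        return estimate_min - elapsed_min, "E"
    # X: the estimate was exceeded; count down to the hard limit instead.
    if estimate_min is not None and hard_limit_min is not None:
        return max(hard_limit_min - elapsed_min, 0), "X"
    # Dash: no estimate and no limit, or estimate exceeded with no hard limit.
    return None, "-"
```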

FINISH_TIME

The estimated finish time of the job. For done/exited jobs, this is the actual finish time. For running jobs, the finish time is the start time plus the estimated run time (where set and not exceeded) or the start time plus the hard run limit.

E: The job has an estimated run time that has not been exceeded.
L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
X: The job has exceeded its estimated run time and had no hard run time limit set. The finish time displayed is the estimated run time remaining plus the start time.
A dash indicates that the job is pending or suspended, or has no run limit, and therefore has no estimated finish time.
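The finish-time formula for running jobs can be sketched with the standard datetime module. This hypothetical helper covers the done/exited case and the E and L rules (start time plus the estimate, or start time plus the hard run limit); the X case without a hard limit and the suspended state are not modelled, and None stands in for the dash.

```python
from datetime import datetime, timedelta
from typing import Optional


def estimated_finish(
    start: Optional[datetime],
    elapsed_min: int,
    estimate_min: Optional[int] = None,
    hard_limit_min: Optional[int] = None,
    finished_at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Estimated FINISH_TIME; None stands for the dash."""
    if finished_at is not None:
        return finished_at  # done/exited jobs: the actual finish time
    if start is None:
        return None  # pending: not started, so no estimate
    estimate_usable = (
        estimate_min is not None
        and elapsed_min < estimate_min
        and (hard_limit_min is None or estimate_min <= hard_limit_min)
    )
    if estimate_usable:
        return start + timedelta(minutes=estimate_min)  # E
    if hard_limit_min is not None:
        return start + timedelta(minutes=hard_limit_min)  # L (or X with a hard limit)
    return None  # no run limit, or X without a hard limit (not modelled)
```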

%COMPLETE

The estimated completion percentage of the job.

E: The job has an estimated run time that has not been exceeded.
L: The job has a hard run time limit specified but either has no estimated run time or the estimated run time is more than the hard run time limit.
X: The job has exceeded its estimated run time and had no hard run time limit set.
A dash indicates that the job is pending, or that it is running or suspended but has no run time limit specified.

Note: For jobs in the UNKWN state, the job run time estimate is based on internal counting by the job's mbatchd.

RESOURCE USAGE

For the MultiCluster job forwarding model, this information is not shown if MultiCluster resource usage updating is disabled. Use LSF_HPC_EXTENSIONS="HOST_RUSAGE" in lsf.conf to specify host-based resource usage.

The values for the current usage of a job include:

HOST: For host-based resource usage, specifies the host.
CPU time: Cumulative total CPU time in seconds of all processes in a job. For host-based resource usage, the cumulative total CPU time in seconds of all processes in a job running on a host.
IDLE_FACTOR: Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue, and the job has triggered an idle exception.
MEM: Total resident memory usage of all processes in a job. For host-based resource usage, the total resident memory usage of all processes in a job running on a host. The sum of host-based rusage may not equal the total job rusage, since total job rusage is the maximum historical value.
By default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).
SWAP: Total virtual memory usage of all processes in a job. For host-based resource usage, the total virtual memory usage of all processes in a job running on a host. The sum of host-based rusage may not equal the total job rusage, since total job rusage is the maximum historical value.
By default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).
NTHREAD: Number of currently active threads of a job.
PGID: Currently active process group ID in a job. For host-based resource usage, the currently active process group ID in a job running on a host.
PIDs: Currently active processes in a job. For host-based resource usage, the currently active processes in a job running on a host.
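The note that host-based MEM and SWAP values need not add up to the job total follows from the job total being a historical maximum. The toy model below, with made-up MEM samples and a hypothetical function name, illustrates the effect; it is not how LSF collects resource usage.

```python
from typing import Dict, List, Tuple


def job_and_host_mem(samples: List[Dict[str, float]]) -> Tuple[float, Dict[str, float]]:
    """samples: per-host MEM snapshots (MB) in time order.

    Returns (job MEM as the maximum historical total, latest per-host MEM).
    """
    job_mem = max(sum(snapshot.values()) for snapshot in samples)
    return job_mem, samples[-1]


if __name__ == "__main__":
    history = [{"hostA": 100.0, "hostB": 80.0}, {"hostA": 40.0, "hostB": 60.0}]
    job_mem, per_host = job_and_host_mem(history)
    print(job_mem, sum(per_host.values()))  # 180.0 100.0
```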

RESOURCE LIMITS

The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a per-process basis.

The possible per-job resource usage limits are:

CPULIMIT
TASKLIMIT
MEMLIMIT
SWAPLIMIT
PROCESSLIMIT
THREADLIMIT
OPENFILELIMIT
HOSTLIMIT_PER_JOB

The possible UNIX per-process resource usage limits are:

RUNLIMIT
FILELIMIT
DATALIMIT
STACKLIMIT
CORELIMIT

If a job submitted to the queue has any of these limits specified (see bsub(1)), the lower of the corresponding job limit and queue limit is used for the job.

If no resource limit is specified, the resource is assumed to be unlimited. User shell limits that are unlimited are not displayed.
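The "lower of job and queue limit" rule, with an unspecified limit treated as unlimited, reduces to a minimum over whichever limits are set. A minimal sketch, using None to stand for "not specified" (a convention of this example only, with a hypothetical function name):

```python
from typing import Optional


def effective_limit(job_limit: Optional[float], queue_limit: Optional[float]) -> Optional[float]:
    """Lower of the job-level and queue-level limit; None means not set (unlimited)."""
    specified = [limit for limit in (job_limit, queue_limit) if limit is not None]
    return min(specified) if specified else None
```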

EXCEPTION STATUS

Possible values for the exception status of a job include:

idle: The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.
overrun: The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.
underrun: The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.
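The three threshold tests can be written down directly. The sketch below uses hypothetical parameter names standing in for the queue's JOB_IDLE, JOB_OVERRUN, and JOB_UNDERRUN values; it checks only the arithmetic described above and ignores when and how often LSF actually evaluates these thresholds.

```python
from typing import Optional, Set


def job_exceptions(
    cpu_time_s: float,
    runtime_s: float,
    finished: bool,
    job_idle: Optional[float] = None,
    job_overrun_min: Optional[float] = None,
    job_underrun_min: Optional[float] = None,
) -> Set[str]:
    """Return the exception names whose thresholds are crossed.

    None for a threshold means it is not configured for the queue.
    """
    found: Set[str] = set()
    # idle: idle factor (CPU time / runtime) below JOB_IDLE
    if job_idle is not None and runtime_s > 0 and cpu_time_s / runtime_s < job_idle:
        found.add("idle")
    # overrun: still running after more than JOB_OVERRUN minutes
    if job_overrun_min is not None and not finished and runtime_s > job_overrun_min * 60:
        found.add("overrun")
    # underrun: finished in fewer than JOB_UNDERRUN minutes
    if job_underrun_min is not None and finished and runtime_s < job_underrun_min * 60:
        found.add("underrun")
    return found
```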

Requested resources

Shows all the resource requirement strings you specified in the bsub command.

Execution rusage

Shown if the combined RES_REQ contains an rusage OR (||) construct. This field displays the alternative that was chosen.

Synchronous Execution

The job was submitted with the -K option. LSF submits the job and waits for it to complete.

JOB_DESCRIPTION

The job description assigned by the user. This field is omitted if no job description has been assigned.

The displayed job description can contain up to 4094 characters.

MEMORY USAGE

Displays peak memory usage and average memory usage. For example:

MEMORY USAGE:

MAX MEM:11 Mbytes; AVG MEM:6 Mbytes

If the memory the job actually consumed is noticeably larger or smaller than its current rusage request, you can adjust the rusage value accordingly the next time you submit the same job.
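To compare peak memory with the rusage[mem=...] value you requested, a script could pull the numbers out of the MEMORY USAGE line. A minimal sketch with a hypothetical function name, assuming the line looks like the example above (optionally with spaces after each colon); other LSF versions may format it differently.

```python
import re
from typing import Optional, Tuple

MEM_LINE = re.compile(
    r"MAX MEM:\s*(\d+(?:\.\d+)?)\s*(\w+);\s*AVG MEM:\s*(\d+(?:\.\d+)?)\s*(\w+)"
)


def parse_memory_usage(line: str) -> Optional[Tuple[float, str, float, str]]:
    """Return (max, max_unit, avg, avg_unit), or None if the line does not match."""
    match = MEM_LINE.search(line)
    if match is None:
        return None
    return float(match.group(1)), match.group(2), float(match.group(3)), match.group(4)
```

With the example line above this returns (11.0, 'Mbytes', 6.0, 'Mbytes'); the peak value is the one to compare against the requested rusage.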

RESOURCE REQUIREMENT DETAILS

Displays the configured level of resource requirement details. The BJOBS_RES_REQ_DISPLAY parameter in lsb.params controls the level of detail that this column displays, which can be as follows:

none - no resource requirements are displayed (this column is not displayed in the -l output).
brief - displays the combined and effective resource requirements.
full - displays the job, app, queue, combined and effective resource requirements.

Requested Network

Displays network resource information for IBM Parallel Edition (PE) jobs submitted with the bsub -network option. It does not display network resource information from the NETWORK_REQ parameter in lsb.queues or lsb.applications.

For example:

bjobs -l
Job <2106>, User <user1>, Project <default>, Status <RUN>, Queue <normal>,
                     Command <my_pe_job>
Fri Jun  1 20:44:42: Submitted from host <hostA>, CWD <$HOME>, Requested Network
                      <protocol=mpi: mode=US: type=sn_all: instance=1: usage=dedicated>

If mode=IP is specified for the PE job, instance is not displayed.
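The Requested Network value is a colon-separated list of key=value pairs. A hypothetical helper that splits it into a dictionary, assuming that values never contain a colon themselves:

```python
from typing import Dict


def parse_network_req(spec: str) -> Dict[str, str]:
    """Split 'protocol=mpi: mode=US: ...' into {'protocol': 'mpi', 'mode': 'US', ...}."""
    pairs: Dict[str, str] = {}
    for part in spec.split(":"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing colon or doubled separators
        key, _, value = part.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs
```

Applied to the example above, this yields mode US and instance 1; for a mode=IP job the instance key is simply absent.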

Output: Forwarded job information

The -fwd option filters output to display information on forwarded jobs in MultiCluster job forwarding mode. The following additional fields are displayed:

CLUSTER: The name of the cluster to which the job was forwarded.
FORWARD_TIME: The time that the job was forwarded.

Output: Job array summary information

Use -A to display summary information about job arrays. The following fields are displayed:

JOBID: Job ID of the job array.
ARRAY_SPEC: Array specification in the format of name[index]. The array specification may be truncated; use the -w option together with -A to show the full array specification.
OWNER: Owner of the job array.
NJOBS: Number of jobs in the job array.
PEND: Number of pending jobs of the job array.
RUN: Number of running jobs of the job array.
DONE: Number of successfully completed jobs of the job array.
EXIT: Number of unsuccessfully completed jobs of the job array.
SSUSP: Number of LSF system suspended jobs of the job array.
USUSP: Number of user suspended jobs of the job array.
PSUSP: Number of held jobs of the job array.
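The -A counters amount to tallying element states. As an illustration only (bjobs computes these itself), the summary row could be rebuilt from a list of per-element STAT values with a hypothetical helper. States without their own column, such as UNKWN or WAIT, still count toward NJOBS here, so the per-state columns need not add up to NJOBS.

```python
from collections import Counter
from typing import Dict, Iterable

ARRAY_COLUMNS = ("PEND", "RUN", "DONE", "EXIT", "SSUSP", "USUSP", "PSUSP")


def array_summary(element_statuses: Iterable[str]) -> Dict[str, int]:
    """Rebuild the -A counters from per-element STAT values."""
    statuses = list(element_statuses)
    counts = Counter(statuses)
    summary = {"NJOBS": len(statuses)}
    for column in ARRAY_COLUMNS:
        summary[column] = counts.get(column, 0)
    return summary
```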

Output: Session Scheduler job summary information

JOBID: Job ID of the Session Scheduler job.
OWNER: Owner of the Session Scheduler job.
JOB_NAME: The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.
The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.
NTASKS: The total number of tasks for this Session Scheduler job.
PEND: Number of pending tasks of the Session Scheduler job.
RUN: Number of running tasks of the Session Scheduler job.
DONE: Number of successfully completed tasks of the Session Scheduler job.
EXIT: Number of unsuccessfully completed tasks of the Session Scheduler job.

Output: Unfinished job summary information

Use -sum to display summary information about unfinished jobs. The count of job slots for the following job states is displayed:

RUN: The job is running.
SSUSP: The job has been suspended by LSF.
USUSP: The job has been suspended, either by its owner or the LSF administrator, while running.
UNKNOWN: mbatchd has lost contact with the sbatchd on the host where the job was running.
PEND: The job is pending, which may include PSUSP and chunk job WAIT. When -sum is used with -p in MultiCluster, WAIT jobs are not counted as PEND or FWD_PEND. When -sum is used with -r, WAIT jobs are counted as PEND or FWD_PEND.
FWD_PEND: The job is pending and forwarded to a remote cluster. The job has not yet started in the remote cluster.
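Because -sum counts job slots rather than jobs, a job using several slots contributes all of them to its column. The sketch below models only the default folding described above (PSUSP and chunk-job WAIT counted as PEND, pending forwarded jobs counted as FWD_PEND) and deliberately ignores the MultiCluster -p and -r variations; the input tuple layout and function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

SUM_COLUMNS = ("RUN", "SSUSP", "USUSP", "UNKNOWN", "PEND", "FWD_PEND")
# STAT values that are reported under a different -sum column
FOLD = {"PSUSP": "PEND", "WAIT": "PEND", "UNKWN": "UNKNOWN"}


def summarize_unfinished(jobs: Iterable[Tuple[str, int, bool]]) -> Dict[str, int]:
    """jobs: (STAT, slots, forwarded) tuples. Returns slot counts per -sum column."""
    totals = dict.fromkeys(SUM_COLUMNS, 0)
    for status, slots, forwarded in jobs:
        column = FOLD.get(status, status)
        if column == "PEND" and forwarded:
            column = "FWD_PEND"
        if column in totals:  # finished states such as DONE and EXIT are not counted
            totals[column] += slots
    return totals
```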

Output: Affinity resource requirements information (-l -aff)

Use -l -aff to display information about CPU and memory affinity resource requirements for job tasks. A table with the heading AFFINITY is displayed containing the detailed affinity information for each task, one line for each allocated processor unit. CPU binding and memory binding information are shown in separate columns in the display.

HOST

The host on which the task is running.

TYPE

Requested processor unit type for CPU binding. One of numa, socket, core, or thread.

LEVEL

Requested processor unit binding level for CPU binding. One of numa, socket, core, or thread. If no CPU binding level is requested, a dash (-) is displayed.

EXCL

Requested processor unit binding level for exclusive CPU binding. One of numa, socket, or core. If no exclusive binding level is requested, a dash (-) is displayed.

IDS

List of physical or logical IDs of the CPU allocation for the task.

The list consists of a set of paths through the topology tree of the host, each represented as a sequence of integers separated by slash characters (/). Each path identifies a unique processing unit allocated to the task. For example, a string of the form 3/0/5/12 represents an allocation to thread 12 in core 5 of socket 0 in NUMA node 3. A string of the form 2/1/4 represents an allocation to core 4 of socket 1 in NUMA node 2. The integers correspond to the node ID numbers displayed in the topology tree from bhosts -aff.
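Reading an IDS path is a matter of splitting on slashes and pairing each integer with a topology level, outermost first. A minimal sketch with a hypothetical function name that handles the two forms described above (four levels down to a thread, three down to a core); any other shape is rejected rather than guessed at.

```python
from typing import Dict

LEVELS = ("numa", "socket", "core", "thread")


def parse_ids_path(path: str) -> Dict[str, int]:
    """'3/0/5/12' -> {'numa': 3, 'socket': 0, 'core': 5, 'thread': 12}."""
    parts = path.split("/")
    if not 3 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported IDS path: {path!r}")
    return dict(zip(LEVELS, (int(p) for p in parts)))
```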

POL

Requested memory binding policy. Either local or pref. If no memory binding is requested, a dash (-) is displayed.

NUMA

ID of the NUMA node that the task memory is bound to. If no memory binding is requested, a dash (-) is displayed.

SIZE

Amount of memory allocated for the task on the NUMA node.

For example, the following job starts 6 tasks with the following affinity resource requirements:

bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,
exclusive=(socket,injob)):cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <6> is submitted to default queue <normal>.

bjobs -l -aff 6

Job <6>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Comman
                     d <myjob1>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>, 6 Task(s), 
                     Requested Resources <span[hosts=1] rusage[mem=10
                     0]affinity[core(1,same=socket,exclusive=(socket,injob)):cp
                     ubind=socket:membind=localonly:distribute=pack]>;
Thu Feb 14 14:15:07: Started 6 Task(s) on Hosts <hostA> <hostA> <hostA> <hostA>
                     <hostA> <hostA>, Allocated 6 Slot(s) on Hosts <hostA>
                     <hostA> <hostA> <hostA> <hostA> <hostA>, Execution Home 
                     </home/user1>, Execution CWD </home/user1>;

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=1
                     ] affinity[core(1,same=socket,exclusive=(socket,injob))*1:
                     cpubind=socket:membind=localonly:distribute=pack]
 Effective: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=
                     1] affinity[core(1,same=socket,exclusive=(socket,injob))*1
                     :cpubind=socket:membind=localonly:distribute=pack]

 AFFINITY:
                     CPU BINDING                          MEMORY BINDING
                     ------------------------             --------------------
 HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
 hostA               core   socket socket /0/0/0          local 0    16.7MB
 hostA               core   socket socket /0/1/0          local 0    16.7MB
 hostA               core   socket socket /0/2/0          local 0    16.7MB
 hostA               core   socket socket /0/3/0          local 0    16.7MB
 hostA               core   socket socket /0/4/0          local 0    16.7MB
 hostA               core   socket socket /0/5/0          local 0    16.7MB
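In this example the six SIZE values add up to roughly 100 MB, which appears consistent with the rusage[mem=100] request being divided evenly across the six tasks (100 / 6 is about 16.7). A small parser for AFFINITY rows, written for this illustration with hypothetical names, can check that kind of arithmetic. It assumes each data row has exactly eight whitespace-separated fields, as in the sample, and only understands sizes given in MB.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AffinityRow:
    host: str
    cpu_type: str
    level: str
    excl: str
    ids: str
    pol: str
    numa: str
    size: str


def parse_affinity_rows(text: str) -> List[AffinityRow]:
    """Keep the 8-field data rows; skip headings, dashes, and the column header."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] != "HOST":
            rows.append(AffinityRow(*fields))
    return rows


def size_in_mb(size: str) -> Optional[float]:
    """'16.7MB' -> 16.7; '-' (no memory binding) -> None."""
    if size == "-":
        return None
    match = re.fullmatch(r"(\d+(?:\.\d+)?)MB", size)
    if match is None:
        raise ValueError(f"unit not handled in this sketch: {size!r}")
    return float(match.group(1))
```

Summing the sample rows gives 100.2 MB rather than exactly 100 MB, presumably because each displayed SIZE is rounded to one decimal place.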