By default, bjobs displays information about your own pending,
running, and suspended jobs.
bjobs displays output
for condensed host groups and compute units. These host groups and
compute units are defined by CONDENSE in the HostGroup or ComputeUnit section
of lsb.hosts. These groups are displayed as a
single entry with the name as defined by GROUP_NAME or NAME in lsb.hosts.
The -l and -X options display uncondensed
output.
If you defined LSB_SHORT_HOSTLIST=1
in lsf.conf, parallel jobs running in the same
condensed host group or compute unit are displayed as an abbreviated
list.
For resizable jobs, bjobs displays
the autoresizable attribute and the resize notification command.
To display older historical information, use bhist.
Output: Default Display
Pending jobs are displayed in the order in which
they are considered for dispatch. Jobs in higher priority queues are
displayed before those in lower priority queues. Pending jobs in the
same priority queues are displayed in the order in which they were
submitted, but you can change this order with the btop or bbot commands.
If more than one job is dispatched to a host, the jobs on that host
are listed in the order in which they are considered for scheduling
on this host by their queue priorities and dispatch times. Finished
jobs are displayed in the order in which they were completed.
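The pending-job ordering described above can be pictured as a two-key sort. The following is only a sketch: the PendingJob record and its field names are hypothetical, it assumes that a larger number means a higher priority queue, and it ignores any reordering done with btop or bbot.

```python
from dataclasses import dataclass


@dataclass
class PendingJob:
    job_id: int
    queue_priority: int  # assumption: larger value = higher priority queue
    submit_time: float   # seconds since the epoch


def display_order(jobs):
    """Higher priority queues first, then submission order within the same priority."""
    return sorted(jobs, key=lambda j: (-j.queue_priority, j.submit_time))


jobs = [
    PendingJob(101, 30, 1000.0),
    PendingJob(102, 40, 1005.0),
    PendingJob(103, 30, 990.0),
]
ordered_ids = [j.job_id for j in display_order(jobs)]
```

Job 102 sorts first because its queue has the higher priority; jobs 103 and 101 share a priority and fall back to submission order.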
A listing of jobs is displayed with the following
fields:
- JOBID
- The job
ID that LSF assigned to the job.
- USER
- The user
who submitted the job.
- STAT
- The current
status of the job (see JOB STATUS below).
- QUEUE
- The name
of the job queue to which the job belongs. If the queue to which the
job belongs has been removed from the configuration, the queue name
is displayed as lost_and_found.
Use bhist to get the original queue name. Jobs
in the lost_and_found queue remain pending
until they are switched with the bswitch command
into another queue.
In a MultiCluster resource leasing environment,
jobs scheduled by the consumer cluster display the remote queue name
in the format queue_name@cluster_name.
By default, this field truncates at 10 characters, so you might not
see the cluster name unless you use -w or -l.
- FROM_HOST
- The
name of the host from which the job was submitted.
With MultiCluster,
if the host is in a remote cluster, the cluster name and remote job
ID are appended to the host name, in the format host_name@cluster_name:job_ID.
By default, this field truncates at 11 characters; you might not see
the cluster name and job ID unless you use -w or -l.
- EXEC_HOST
- The
name of one or more hosts on which the job is executing (this field
is empty if the job has not been dispatched). If
the host on which the job is running has been removed from the configuration,
the host name is displayed as lost_and_found.
Use bhist to get the original host name.
If
the host is part of a condensed host group or compute unit, the host
name is displayed as the name of the condensed group.
If you
configure a host to belong to more than one condensed host group
using wildcards, bjobs can display any of the host
groups as the execution host name.
- JOB_NAME
- The
job name assigned by the user, or the command string assigned by default
at job submission with bsub. If the job name is
too long to fit in this field, then only the latter part of the job
name is displayed.
The displayed job name or job command can contain
up to 4094 characters for UNIX, or up to 255 characters for Windows.
- SUBMIT_TIME
- The
submission time of the job.
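When post-processing the QUEUE and FROM_HOST fields in a MultiCluster environment, the queue_name@cluster_name and host_name@cluster_name:job_ID formats can be split as in the sketch below. The function names are hypothetical, and the sketch assumes the field was not truncated (that is, the output came from -w or -l).

```python
def split_queue(field):
    """Split 'queue_name@cluster_name' into (queue, cluster); cluster is None when absent."""
    queue, sep, cluster = field.partition("@")
    return queue, (cluster if sep else None)


def split_from_host(field):
    """Split 'host_name@cluster_name:job_ID' into (host, cluster, remote_job_id)."""
    host, sep, rest = field.partition("@")
    if not sep:
        # Local submission host: no cluster name or remote job ID appended.
        return host, None, None
    cluster, colon, job_id = rest.partition(":")
    remote_id = int(job_id) if colon and job_id.isdigit() else None
    return host, cluster, remote_id
```

For example, split_from_host("hostA@cluster1:1234") yields the host name, the cluster name, and the remote job ID as an integer.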
Output: Long format (-l)
The -l option displays a long format
listing with the following additional fields:
- Job
- The job
ID that LSF assigned to the job.
- User
- The
ID of the user who submitted the job.
- Project
- The
project the job was submitted from.
- Application Profile
- The
application profile the job was submitted to.
- Command
- The
job command.
- CWD
- The current
working directory on the submission host.
- Execution CWD
- The actual
CWD used when job runs.
- Host file
- The
path to a user-specified host file used when submitting or modifying
a job.
- Initial checkpoint period
- The
initial checkpoint period specified at the job level, by bsub
-k, or in an application profile with CHKPNT_INITPERIOD.
- Checkpoint period
- The
checkpoint period specified at the job level, by bsub -k,
in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.
- Checkpoint directory
- The
checkpoint directory specified at the job level, by bsub
-k, in the queue with CHKPNT, or in an application profile
with CHKPNT_DIR.
- Migration threshold
- The
migration threshold specified at the job level, by bsub -mig.
- Post-execute Command
- The
post-execution command specified at the job level, by bsub
-Ep.
- PENDING REASONS
- The
reason the job is in the PEND or PSUSP state. The names of the hosts
associated with each reason are displayed when both -p and -l options
are specified.
- SUSPENDING REASONS
- The
reason the job is in the USUSP or SSUSP state.
- loadSched
- The
load scheduling thresholds for the job.
- loadStop
- The
load suspending thresholds for the job.
- JOB STATUS
- Possible
values for the status of a job include:
- PEND
-
The job is pending. That is, it has not
yet been started.
- PROV
- The
job has been dispatched to a power-saved host that is waking up. Before
the job can be sent to the sbatchd, it is in a PROV state.
- PSUSP
- The
job has been suspended, either by its owner or the LSF administrator,
while pending.
- RUN
- The job
is currently running.
- USUSP
- The
job has been suspended, either by its owner or the LSF administrator,
while running.
- SSUSP
- The
job has been suspended by LSF due
to either of the following two causes:
- The load conditions on the execution host or hosts have exceeded
a threshold according to the loadStop vector
defined for the host or queue.
- The run window of the job's queue is closed. See bqueues(1), bhosts(1),
and lsb.queues(5).
- DONE
- The
job has terminated with status of 0.
- EXIT
- The
job has terminated with a non-zero status. It may have been
aborted due to an error in its execution, or killed by its owner or
the LSF administrator.
For example, exit code 131 means that the
job exceeded a configured resource usage limit and LSF killed the
job.
- UNKWN
- mbatchd has
lost contact with the sbatchd on the host on which
the job runs.
- WAIT
- For
jobs submitted to a chunk job queue, members of a chunk job that are
waiting to run.
- ZOMBI
- A job
becomes ZOMBI if:
- A non-rerunnable job is killed by bkill while
the sbatchd on the execution host is unreachable
and the job is shown as UNKWN.
- The host on which a rerunnable job is running is unavailable and
the job has been requeued by LSF with a new job ID, as if the job
were submitted as a new job.
- After the execution host becomes available, LSF tries to kill
the ZOMBI job. Upon successful termination of the ZOMBI job, the job's
status is changed to EXIT.
With MultiCluster, when a job running
on a remote execution cluster becomes a ZOMBI job, the execution cluster
treats the job the same way as local ZOMBI jobs. In addition, it notifies
the submission cluster that the job is in ZOMBI state and the submission
cluster requeues the job.
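Scripts that read the STAT column often need a lookup of the states listed above. The table below is a sketch that paraphrases those descriptions; the grouping into FINISHED and SUSPENDED sets and the helper names are illustrative choices, not LSF terminology.

```python
JOB_STATES = {
    "PEND": "pending, not yet started",
    "PROV": "dispatched to a power-saved host that is waking up",
    "PSUSP": "suspended by owner or administrator while pending",
    "RUN": "running",
    "USUSP": "suspended by owner or administrator while running",
    "SSUSP": "suspended by LSF",
    "DONE": "terminated with status 0",
    "EXIT": "terminated with a non-zero status",
    "UNKWN": "mbatchd lost contact with sbatchd on the execution host",
    "WAIT": "chunk job member waiting to run",
    "ZOMBI": "killed or requeued while the execution host was unreachable",
}

# Illustrative groupings for filtering; not part of bjobs itself.
FINISHED = {"DONE", "EXIT"}
SUSPENDED = {"PSUSP", "USUSP", "SSUSP"}


def describe(stat):
    """Return a short description for a STAT value, or a fallback for unknown codes."""
    return JOB_STATES.get(stat, "unrecognized status")


def is_finished(stat):
    return stat in FINISHED
```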
- RUNTIME
- Estimated
run time for the job, specified by bsub -We or bmod
-We, -We+, -Wep.
The following information is displayed
when running bjobs -WL, -WF,
or -WP.
- TIME_LEFT
- The estimated run time that the job has remaining. Along with
the time if applicable, one of the following symbols may also display.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has
no estimated run time or the estimated run time is more than the hard
run time limit.
- X: The job has exceeded its estimated run time and the time displayed
is the time remaining until the job reaches its hard run time limit.
- A dash indicates that the job has no estimated run time and no
run limit, or that it has exceeded its run time but does not have
a hard limit and therefore runs until completion.
If there is less than a minute remaining, 0:0 displays.
- FINISH_TIME
- The estimated finish time of the job. For done/exited jobs, this
is the actual finish time. For running jobs, the finish time is the
start time plus the estimated run time (where set and not exceeded)
or the start time plus the hard run limit.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has
no estimated run time or the estimated run time is more than the hard
run time limit.
- X: The job has exceeded its estimated run time and had no hard
run time limit set. The finish time displayed is the estimated run
time remaining plus the start time.
- A dash indicates that a pending job, a suspended job, or a job with no run
limit has no estimated finish time.
- %COMPLETE
- The estimated completion percentage of the job.
- E: The job has an estimated run time that has not been exceeded.
- L: The job has a hard run time limit specified but either has
no estimated run time or the estimated run time is more than the hard
run time limit.
- X: The job has exceeded its estimated run time and had no hard
run time limit set.
- A dash indicates that the job is pending, or that it is running
or suspended, but has no run time limit specified.
Note: For
jobs in the state UNKNOWN, the job run time estimate is based on internal
counting by the job's mbatchd.
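The E, L, X, and dash rules for TIME_LEFT can be summarized as a small decision function. This is a sketch of the rules as stated above for a running job, not LSF's implementation: the function name and the use of whole minutes are assumptions, and the displayed time format is not reproduced.

```python
def time_left(elapsed_min, estimate_min=None, limit_min=None):
    """Return (minutes_remaining, symbol) for a running job.

    (None, "-") stands for the dash case. Arguments are whole minutes;
    None means the estimate or hard run limit was not specified.
    """
    # L: a hard limit exists and there is no estimate, or the estimate exceeds the limit.
    if limit_min is not None and (estimate_min is None or estimate_min > limit_min):
        return max(limit_min - elapsed_min, 0), "L"
    # E: an estimate exists and has not been exceeded.
    if estimate_min is not None and elapsed_min <= estimate_min:
        return estimate_min - elapsed_min, "E"
    # X: the estimate was exceeded, so count down to the hard limit instead.
    if estimate_min is not None and limit_min is not None:
        return max(limit_min - elapsed_min, 0), "X"
    # Dash: no estimate and no limit, or estimate exceeded with no hard limit.
    return None, "-"
```

For instance, a job with a 60-minute estimate and a 120-minute hard limit that has run for 90 minutes gives (30, "X").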
- RESOURCE USAGE
- For
the MultiCluster job forwarding model, this information is not shown
if MultiCluster resource usage updating is disabled. Use LSF_HPC_EXTENSIONS="HOST_RUSAGE" in lsf.conf to
specify host-based resource usage.
The values for the current usage
of a job include:
- HOST
- For
host-based resource usage, specifies the host.
- CPU time
- Cumulative
total CPU time in seconds of all processes in a job. For host-based
resource usage, the cumulative total CPU time in seconds of all processes
in a job running on a host.
- IDLE_FACTOR
- Job
idle information (CPU time/runtime) if JOB_IDLE is configured in the
queue, and the job has triggered an idle exception.
- MEM
- Total
resident memory usage of all processes in a job. For host-based resource
usage, the total resident memory usage of all processes in a job running
on a host. The sum of host-based rusage may not equal the total job
rusage, since total job rusage is the maximum historical value.
By
default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
- SWAP
- Total
virtual memory usage of all processes in a job. For host-based resource
usage, the total virtual memory usage of all processes in a job running
on a host. The sum of host-based rusage may not equal the total job
rusage, since total job rusage is the maximum historical value.
By
default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
- NTHREAD
- Number
of currently active threads of a job.
- PGID
- Currently
active process group ID in a job. For host-based resource usage, the
currently active process group ID in a job running on a host.
- PIDs
- Currently
active processes in a job. For host-based resource usage, the currently
active processes in a job running on a host.
- RESOURCE LIMITS
- The
hard resource usage limits that are imposed on the jobs in the queue
(see getrlimit(2) and lsb.queues(5)).
These limits are imposed on a per-job and a per-process basis.
The
possible per-job resource usage limits are:
- CPULIMIT
- TASKLIMIT
- MEMLIMIT
- SWAPLIMIT
- PROCESSLIMIT
- THREADLIMIT
- OPENFILELIMIT
- HOSTLIMIT_PER_JOB
The possible UNIX per-process resource usage limits are:
- RUNLIMIT
- FILELIMIT
- DATALIMIT
- STACKLIMIT
- CORELIMIT
If a job submitted to the queue has any of these limits
specified (see bsub(1)), then the lower of the
corresponding job limits and queue limits are used for the job.
If
no resource limit is specified, the resource is assumed to be unlimited.
User shell limits that are unlimited are not displayed.
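The rule that the lower of the job limit and the queue limit applies, with an unspecified limit treated as unlimited, can be sketched as follows. The function name is hypothetical and None is used here to stand for "unlimited".

```python
def effective_limit(job_limit, queue_limit):
    """Return the lower of the two limits; None means unlimited."""
    specified = [value for value in (job_limit, queue_limit) if value is not None]
    return min(specified) if specified else None
```

A job submitted with a limit of 300 to a queue whose limit is 200 therefore runs with an effective limit of 200.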
- EXCEPTION STATUS
- Possible
values for the exception status of a job include:
- idle
-
The job is
consuming less CPU time than expected. The job idle factor (CPU time/runtime)
is less than the configured JOB_IDLE threshold for the queue and a
job exception has been triggered.
- overrun
- The
job is running longer than the number of minutes specified by the
JOB_OVERRUN threshold for the queue and a job exception has been triggered.
- underrun
- The
job finished sooner than the number of minutes specified by the JOB_UNDERRUN
threshold for the queue and a job exception has been triggered.
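The three exception conditions reduce to simple threshold comparisons. The sketch below assumes CPU time and run time in seconds and the overrun and underrun thresholds in minutes, as described above; the function and parameter names are hypothetical, and when LSF actually evaluates these conditions is not modeled.

```python
def job_exceptions(cpu_time_s, run_time_s, finished,
                   job_idle=None, job_overrun_min=None, job_underrun_min=None):
    """Return the exception names that apply, given queue thresholds (None = not configured)."""
    exceptions = []
    run_min = run_time_s / 60

    # idle: idle factor (CPU time / run time) is below the JOB_IDLE threshold.
    if job_idle is not None and run_time_s > 0 and cpu_time_s / run_time_s < job_idle:
        exceptions.append("idle")

    # overrun: the job has run longer than JOB_OVERRUN minutes.
    if job_overrun_min is not None and run_min > job_overrun_min:
        exceptions.append("overrun")

    # underrun: the job finished sooner than JOB_UNDERRUN minutes.
    if finished and job_underrun_min is not None and run_min < job_underrun_min:
        exceptions.append("underrun")

    return exceptions
```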
- Requested resources
- Shows
all the resource requirement strings you specified in the bsub command.
- Execution rusage
- This
is shown if the combined RES_REQ has an rusage OR || construct. The
chosen alternative will be denoted here.
- Synchronous Execution
- Job was submitted with the -K option. LSF submits
the job and waits for the job to complete.
- JOB_DESCRIPTION
- The
job description assigned by the user. This field is omitted if no
job description has been assigned.
The displayed job description
can contain up to 4094 characters.
- MEMORY USAGE
- Displays
peak memory usage and average memory usage. For example:
MEMORY USAGE:
MAX MEM:11 Mbytes; AVG MEM:6 Mbytes
You can adjust rusage accordingly
next time for the same job submission if consumed memory is larger
or smaller than current rusage.
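To compare peak usage against the requested rusage, the MAX MEM and AVG MEM values can be pulled out of the line shown above. The regular expression below is a sketch written against that one sample line; other LSF versions or units may format the line differently.

```python
import re

# Pattern assumed from the sample line "MAX MEM:11 Mbytes; AVG MEM:6 Mbytes".
MEM_LINE = re.compile(
    r"MAX MEM:\s*(\d+(?:\.\d+)?)\s*(\w+);\s*AVG MEM:\s*(\d+(?:\.\d+)?)\s*(\w+)"
)


def parse_memory_usage(line):
    """Return {'max': (value, unit), 'avg': (value, unit)} or None if the line does not match."""
    match = MEM_LINE.search(line)
    if match is None:
        return None
    return {
        "max": (float(match.group(1)), match.group(2)),
        "avg": (float(match.group(3)), match.group(4)),
    }
```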
- RESOURCE REQUIREMENT DETAILS
- Displays
the configured level of resource requirement details. The BJOBS_RES_REQ_DISPLAY parameter
in lsb.params controls the level of detail that
this column displays, which can be as follows:
- none - no resource requirements are displayed (this column is
not displayed in the -l output).
- brief - displays the combined and effective resource requirements.
- full - displays the job, app, queue, combined and effective resource
requirements.
- Requested Network
- Displays
network resource information for IBM Parallel Edition (PE) jobs submitted
with the bsub -network option. It does not display
network resource information from the NETWORK_REQ parameter
in lsb.queues or lsb.applications.
For
example:
bjobs -l
Job <2106>, User <user1>, Project <default>, Status <RUN>, Queue <normal>,
Command <my_pe_job>
Fri Jun 1 20:44:42: Submitted from host <hostA>, CWD <$HOME>, Requested Network
<protocol=mpi: mode=US: type=sn_all: instance=1: usage=dedicated>
If mode=IP is specified for the
PE job, instance is not displayed.
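The -l header uses a repeated "Name <value>" pattern, which lends itself to a simple extraction. The sketch below assumes the wrapped output lines have already been joined into one string and that no value contains a > character (resource requirement strings can, so treat this as illustrative only). The names FIELD and parse_header are hypothetical.

```python
import re

# A field name (letters and spaces) followed by a value in angle brackets.
FIELD = re.compile(r"([A-Za-z][A-Za-z ]*?) <([^>]*)>")


def parse_header(text):
    """Return a dict mapping field names to the values found in angle brackets."""
    return {name: value for name, value in FIELD.findall(text)}


header = (
    "Job <2106>, User <user1>, Project <default>, Status <RUN>, "
    "Queue <normal>, Command <my_pe_job>"
)
fields = parse_header(header)
```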
Output: Forwarded job information
The -fwd option filters
output to display information on forwarded jobs in MultiCluster job
forwarding mode. The following additional fields are displayed:
- CLUSTER
- The
name of the cluster to which the job was forwarded.
- FORWARD_TIME
- The
time that the job was forwarded.
Output: Job array summary information
Use -A to display summary information
about job arrays. The following fields are displayed:
- JOBID
-
Job ID of the job array.
- ARRAY_SPEC
- Array
specification in the format of name[index].
The array specification may be truncated. Use the -w option
together with -A to show the full array specification.
- OWNER
- Owner
of the job array.
- NJOBS
- Number
of jobs in the job array.
- PEND
- Number
of pending jobs of the job array.
- RUN
-
Number of running jobs of the job array.
- DONE
- Number
of successfully completed jobs of the job array.
- EXIT
- Number
of unsuccessfully completed jobs of the job array.
- SSUSP
- Number
of LSF system suspended jobs of the job array.
- USUSP
- Number
of user suspended jobs of the job array.
- PSUSP
- Number
of held jobs of the job array.
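When reading the ARRAY_SPEC column from -A -w output, the name[index] format can be separated into its two parts. This sketch returns the index expression as an opaque string, because its syntax is not described here; the function name and the sample value are hypothetical.

```python
import re

ARRAY_SPEC = re.compile(r"^(?P<name>.*)\[(?P<index>[^\[\]]*)\]$")


def split_array_spec(spec):
    """Split 'name[index]' into (name, index); return None if the spec is not in that form."""
    match = ARRAY_SPEC.match(spec)
    if match is None:
        return None
    return match.group("name"), match.group("index")
```

A truncated specification that has lost its closing bracket does not match, which is one way to detect that -w is needed.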
Output: Session Scheduler job summary information
- JOBID
- Job
ID of the Session Scheduler job.
- OWNER
- Owner of the Session Scheduler job.
- JOB_NAME
- The
job name assigned by the user, or the command string assigned by default
at job submission with bsub. If the job name is
too long to fit in this field, then only the latter part of the job
name is displayed.
The displayed job name or job command can contain
up to 4094 characters for UNIX, or up to 255 characters for Windows.
- NTASKS
- The total number of tasks for this Session Scheduler job.
- PEND
- Number of pending tasks of the Session Scheduler job.
- RUN
- Number of running tasks of the Session Scheduler job.
- DONE
- Number of successfully completed tasks of the Session Scheduler
job.
- EXIT
- Number of unsuccessfully completed tasks of the Session Scheduler
job.
Output: Unfinished job summary information
Use -sum to display summary information
about unfinished jobs. The count of job slots for the following job
states is displayed:
- RUN
- The
job is running.
- SSUSP
- The
job has been suspended by LSF.
- USUSP
- The
job has been suspended, either by its owner or the LSF administrator,
while running.
- UNKNOWN
- mbatchd has
lost contact with the sbatchd on the host where
the job was running.
- PEND
- The
job is pending, which may include PSUSP and
chunk job WAIT. When -sum is
used with -p in MultiCluster, WAIT jobs
are not counted as PEND or FWD_PEND.
When -sum is used with -r, WAIT jobs
are counted as PEND or FWD_PEND.
- FWD_PEND
- The
job is pending and forwarded to a remote cluster. The job has not
yet started in the remote cluster.
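The slot counts reported by -sum can be thought of as an aggregation of per-job slot counts by state. The sketch below is an approximation under stated assumptions: it always folds PSUSP and WAIT into PEND (ignoring the -p and -r variations described above), and it maps the UNKWN job state to the UNKNOWN summary label. The function name and input shape are hypothetical.

```python
from collections import Counter

# Assumption: these states are counted under PEND in the summary.
PEND_LIKE = {"PEND", "PSUSP", "WAIT"}


def slot_summary(jobs):
    """jobs is an iterable of (state, slots) pairs; return total slots per summary state."""
    totals = Counter()
    for state, slots in jobs:
        if state in PEND_LIKE:
            key = "PEND"
        elif state == "UNKWN":
            key = "UNKNOWN"
        else:
            key = state
        totals[key] += slots
    return dict(totals)
```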
Output: Affinity resource requirements information (-l -aff)
Use -l
-aff to display information about CPU and memory affinity
resource requirements for job tasks. A table with the heading AFFINITY is
displayed containing the detailed affinity information for each task,
one line for each allocated processor unit. CPU binding and memory
binding information are shown in separate columns in the display.
- HOST
-
The host the task is running on.
- TYPE
-
Requested processor unit type for CPU
binding. One of numa, socket, core,
or thread.
- LEVEL
-
Requested processor unit binding level
for CPU binding. One of numa, socket, core,
or thread. If no CPU binding level is
requested, a dash (-) is displayed.
- EXCL
-
Requested processor unit binding level
for exclusive CPU binding. One of numa, socket,
or core. If no exclusive binding level
is requested, a dash (-) is displayed.
- IDS
-
List of physical or logical
IDs of the CPU allocation for the task.
The
list consists of a set of paths, represented as a sequence of integers
separated by slash characters (/), through
the topology tree of the host. Each path identifies a unique processing
unit allocated to the task. For example, a string of the form 3/0/5/12 represents
an allocation to thread 12 in core 5 of socket 0 in NUMA node 3. A
string of the form 2/1/4 represents an
allocation to core 4 of socket 1 in NUMA node 2. The integers correspond
to the node ID numbers displayed in the topology tree from bhosts
-aff.
- POL
-
Requested memory binding policy. Either local or pref.
If no memory binding is requested, a dash (-)
is displayed.
- NUMA
-
ID of the NUMA node that the task memory
is bound to. If no memory binding is requested, a dash (-)
is displayed.
- SIZE
-
Amount of memory allocated for the
task on the NUMA node.
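The IDS paths described above can be decoded level by level. The sketch below follows the NUMA node, socket, core, thread ordering given in the IDS description; the function name is hypothetical, and tolerating a leading slash is an assumption based on how the paths appear in sample output.

```python
LEVELS = ("numa", "socket", "core", "thread")


def parse_ids_path(path):
    """Map a path such as '3/0/5/12' onto topology levels, outermost first."""
    parts = [part for part in path.split("/") if part != ""]
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"unexpected IDS path: {path!r}")
    return dict(zip(LEVELS, (int(part) for part in parts)))
```

A three-element path stops at the core level, so the result has no thread entry.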
For example,
the following job starts 6 tasks with the following affinity resource
requirements:
bsub -n 6 -R"span[hosts=1] rusage[mem=100]affinity[core(1,same=socket,
exclusive=(socket,injob)):cpubind=socket:membind=localonly:distribute=pack]" myjob
Job <6> is submitted to default queue <normal>.
bjobs -l -aff 6
Job <6>, User <user1>, Project <default>, Status <RUN>, Queue <normal>, Comman
d <myjob1>
Thu Feb 14 14:13:46: Submitted from host <hostA>, CWD <$HOME>, 6 Task(s),
Requested Resources <span[hosts=1] rusage[mem=10
0]affinity[core(1,same=socket,exclusive=(socket,injob)):cp
ubind=socket:membind=localonly:distribute=pack]>;
Thu Feb 14 14:15:07: Started 6 Task(s) on Hosts <hostA> <hostA> <hostA> <hostA>
<hostA> <hostA>, Allocated 6 Slot(s) on Hosts <hostA>
<hostA> <hostA> <hostA> <hostA> <hostA>, Execution Home
</home/user1>, Execution CWD </home/user1>;
SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut   pg   io   ls   it   tmp   swp   mem
loadSched    -     -     -     -    -    -    -    -    -     -     -
loadStop     -     -     -     -    -    -    -    -    -     -     -
RESOURCE REQUIREMENT DETAILS:
Combined: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=1
] affinity[core(1,same=socket,exclusive=(socket,injob))*1:
cpubind=socket:membind=localonly:distribute=pack]
Effective: select[type == local] order[r15s:pg] rusage[mem=100.00] span[hosts=
1] affinity[core(1,same=socket,exclusive=(socket,injob))*1
:cpubind=socket:membind=localonly:distribute=pack]
AFFINITY:
                    CPU BINDING                          MEMORY BINDING
                    ------------------------             --------------------
HOST                TYPE   LEVEL  EXCL   IDS             POL   NUMA SIZE
hostA               core   socket socket /0/0/0          local 0    16.7MB
hostA               core   socket socket /0/1/0          local 0    16.7MB
hostA               core   socket socket /0/2/0          local 0    16.7MB
hostA               core   socket socket /0/3/0          local 0    16.7MB
hostA               core   socket socket /0/4/0          local 0    16.7MB
hostA               core   socket socket /0/5/0          local 0    16.7MB
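Because no field in the AFFINITY rows contains whitespace, the table can be read by splitting each row on whitespace. The sketch below is written against the sample above only: the function names are hypothetical, and it assumes every SIZE value carries an MB suffix.

```python
COLUMNS = ("HOST", "TYPE", "LEVEL", "EXCL", "IDS", "POL", "NUMA", "SIZE")


def parse_affinity_rows(lines):
    """Return one dict per task row; banner, separator, and header lines are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(COLUMNS) or fields[0] == "HOST":
            continue
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def size_mb(size):
    """Convert a SIZE value such as '16.7MB' to a float (assumes an MB suffix)."""
    if not size.endswith("MB"):
        raise ValueError(f"unexpected SIZE value: {size!r}")
    return float(size[:-2])


sample = [
    "CPU BINDING MEMORY BINDING",
    "------------------------ --------------------",
    "HOST TYPE LEVEL EXCL IDS POL NUMA SIZE",
    "hostA core socket socket /0/0/0 local 0 16.7MB",
    "hostA core socket socket /0/1/0 local 0 16.7MB",
    "hostA core socket socket /0/2/0 local 0 16.7MB",
    "hostA core socket socket /0/3/0 local 0 16.7MB",
    "hostA core socket socket /0/4/0 local 0 16.7MB",
    "hostA core socket socket /0/5/0 local 0 16.7MB",
]
rows = parse_affinity_rows(sample)
total_mb = sum(size_mb(row["SIZE"]) for row in rows)
```

Summing the six SIZE values gives a total close to the rusage[mem=100] request in the example, which appears to reflect per-task rounding.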
See also
bsub, bkill, bhosts, bmgroup, bclusters, bqueues, bhist, bresume, bsla, bstop, lsb.params, lsb.serviceclasses, mbatchd