The native LSF command for getting information on jobs that are running, pending, or recently finished (within the last hour) is bjobs. When called without any options, it prints a short summary of all the user's running and pending jobs.
bjobs
JOBID  USER     QUEUE  JOB_NAME    SLOTS  STAT  START_TIME    TIME_LEFT
45755  s012345  hpc    large_Set   4      RUN   Aug 21 11:00  00:14:41 L
45756  s012345  hpc    *3_LSF10_1  2      RUN   Aug 21 11:00  24:00:00 L
The output lists each of your jobs by ID (JOBID column), together with the username (USER column), the queue the job was submitted to (QUEUE column), the job name (JOB_NAME column), the number of slots used (SLOTS column, for running jobs), the job status (STAT column), the start time (START_TIME column), and the time to completion (TIME_LEFT column).
Some of the common status labels are:
PEND job is queued, waiting to be scheduled
RUN job is running
DONE job is completed after having run
EXIT job exited (after being killed, for example)
SSUSP job is suspended
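As a quick way to see how many of your jobs are in each state, the short listing can be piped through a small filter. The following is a minimal sketch, not part of LSF: the function name count_by_status is made up for illustration, and it assumes the first line is the header shown in the listing above and that no column before STAT is empty or contains spaces.

```shell
#!/bin/sh
# Count jobs per status from bjobs-style output read on stdin.
# Hypothetical helper; assumes a header line containing a STAT column
# and that every column before STAT is a single non-empty word.
count_by_status() {
    awk '
        # Header line: remember which field holds the status.
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STAT") col = i; next }
        # Data lines: count one job for the status found in that field.
        col && NF >= col { n[$col]++ }
        END { for (s in n) print s, n[s] }
    ' | sort
}

# Sample copied from the listing above, so the sketch runs without LSF.
sample='JOBID USER QUEUE JOB_NAME SLOTS STAT START_TIME TIME_LEFT
45755 s012345 hpc large_Set 4 RUN Aug 21 11:00 00:14:41 L
45756 s012345 hpc *3_LSF10_1 2 RUN Aug 21 11:00 24:00:00 L'

printf '%s\n' "$sample" | count_by_status
```

On the cluster the same filter would be used as: bjobs | count_by_status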
bjobs can be used to extract a lot more information about your job. For a list of the possible options, type
man bjobs
To get more information about a specific job, use bjobs with the job ID, optionally adding the -l option for verbose output:
bjobs -l <your-job-id>

bjobs -l 45800

Job <45800>, Job Name <Com_wrench_HYBRID>, User <s012345>, Project <default>, Se
                     rvice Class <sc_hpc1>, Mail <s012345@student.dtu.dk>, Status <RUN>,
                     Queue <hpc>, Command <#!/bin/sh;# embedd
                     ed options to bsub - #BSUB;# -- job name ---;#BSUB -J
                     Com_wrench_HYBRID;# -- email me at the beginning (b)
                     and end (e) of the execution --;#BSUB -B -N;# --
                     Select queue --;#BSUB -q hpc;# -- My email address -
                     -;#BSUB -u andbor@dtu.dk;# -- estimated wall clock ti
                     me (execution time) --;#BSUB -W 4:00;##BSUB -env "PA
                     TH";# -- parallel environment requests --;#BSUB -n 2;
                     #BSUB -M 5000;#BSUB -R "affinity[core(8)]";### -- Spe
                     cify the output and error file. %J is the job-id -- ;
                     ### -- -o and -e mean append, -oo and -eo mean overwr
                     ite -- ;#BSUB -o Output_%J.out ;#BSUB -e Error_%J.err
                     ; # -- end of LSF options --; I_MPI_HYDRA_BOOTSTRAP
                     =lsf;I_MPI_DEBUG=0; export I_MPI_HYDRA_BOOTSTRAP I_MP
                     I_DEBUG ; Num=$OMP_NUM_THREADS; Nodes=$LSB_MAX_NUM_P
                     ROCESSORS; comsol52 -nn $Nodes -np $Num batch -input
                     file wrench.mph -outputfile wrench_out.mph -tmpdir tm
                     p/ -recoverydir tmp>, Share group charged </userlongs
                     erial>
Mon Aug 21 17:30:17 2017: Submitted from host <hpclogin2>, CWD <$HOME/LSF10/TES
                     TS_Applic_7.3/COMSOL/HYBRID>, Output File <Output_458
                     00.out>, Error File <Error_45800.err>, Notify when jo
                     b begins/ends, 2 Task(s), Requested Resources <affini
                     ty[core(8)] rusage[mem=5000] span[hosts=1]>;

 RUNLIMIT
 240.0 min of n-62-21-105

 MEMLIMIT
 4.8 G
Mon Aug 21 17:30:17 2017: Started 2 Task(s) on Host(s) <2*n-62-21-105>, Allocat
                     ed 16 Slot(s) on Host(s) <16*n-62-21-105>, Execution
                     Home </zhome/xx/x/xxxxx>, Execution CWD </zhome/xx/x/
                     xxxxx/LSF10/TESTS_Applic_7.3/COMSOL/HYBRID>;
Mon Aug 21 17:32:03 2017: Resource usage collected.
                     The CPU time used is 279 seconds.
                     MEM: 1.8 Gbytes; SWAP: 0 Mbytes; NTHREAD: 62
                     PGID: 15708; PIDs: 15708 15714 15718 15855 15877
                     PGID: 15880; PIDs: 15880
                     PGID: 15885; PIDs: 15885
                     PGID: 15886; PIDs: 15886

 MEMORY USAGE:
 MAX MEM: 1.8 Gbytes; AVG MEM: 1 Gbytes

 PENDING TIME DETAILS:
 Eligible pending time (seconds): 0
 Ineligible pending time (seconds): 0

 SCHEDULING PARAMETERS:
            r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
 loadSched  -     -    -     -   -   -   -   -   -    -    -
 loadStop   -     -    -     -   -   -   -   -   -    -    -

 RESOURCE REQUIREMENT DETAILS:
 Combined: select[(model = XeonX5550 ||model = XeonE5_2680v2 ||model = XeonE5_2
                     660v3 ||model = XeonE5_2650v4 ) && (type == any)] ord
                     er[-slots:-maxslots] rusage[mem=5000.00] span[hosts=1
                     ] same[type:model] cu[pref=config:maxcus=1:type=rack]
                     affinity[core(8)*1]
 Effective: select[((model = XeonX5550 ||model = XeonE5_2680v2 ||model = XeonE5
                     _2660v3 ||model = XeonE5_2650v4 ) && (type == any))]
                     order[-slots:-maxslots] rusage[mem=5000.00] span[host
                     s=1] same[type:model] cu[type=rack:maxcus=1:pref=conf
                     ig] affinity[core(8)*1]
The output shows, in a messy format, your original jobscript and a lot of other information, such as:
the walltime limit of your job (RUNLIMIT), and how much time has passed;
how many processors you have actually asked for (Task(s));
the memory used (MEMORY USAGE).
This information can be useful to check whether your job-script was correct.
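Because the verbose output is hard to scan by eye, it can help to pull out just the values you want to check. The following is a minimal sketch under stated assumptions: the function name job_limits is hypothetical, the RUNLIMIT value is assumed to sit on the line right after the RUNLIMIT heading, and the peak memory is assumed to appear as "MAX MEM: <value> <unit>" on a single line. If your LSF version lays the output out differently, the patterns need adjusting.

```shell
#!/bin/sh
# Extract the run limit (minutes) and peak memory from "bjobs -l" output on stdin.
# Hypothetical helper; the layout assumptions are described in the text above.
job_limits() {
    awk '
        # The line following the RUNLIMIT heading starts with the limit in minutes.
        after_runlimit { print "runlimit_min=" $1; after_runlimit = 0 }
        $1 == "RUNLIMIT" { after_runlimit = 1 }
        # "MAX MEM: " is 9 characters long; keep only the value and its unit.
        match($0, /MAX MEM: [0-9.]+ [A-Za-z]+/) {
            print "max_mem=" substr($0, RSTART + 9, RLENGTH - 9)
        }
    '
}

# Excerpt of the example output, so the sketch runs without LSF.
limits_sample=' RUNLIMIT
 240.0 min of n-62-21-105

 MEMORY USAGE:
 MAX MEM: 1.8 Gbytes; AVG MEM: 1 Gbytes'

printf '%s\n' "$limits_sample" | job_limits
```

On the cluster the same filter would be used as: bjobs -l <your-job-id> | job_limits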
Note: you can also get information on a job that is waiting to be dispatched. In that case the output contains a section giving the reason for the pending status:
 PENDING REASONS:
 User has reached the per-user job slot limit of the queue;
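To see only why a job is still waiting, the pending-reasons section can be cut out of the verbose output. This is a minimal sketch, assuming the section starts at a "PENDING REASONS:" heading and ends at the next blank line; the function name pending_reasons is hypothetical.

```shell
#!/bin/sh
# Print only the pending reasons from "bjobs -l" output read on stdin.
# Hypothetical helper; assumes the section ends at the first blank line.
pending_reasons() {
    awk '
        # Start of the section; keep any text that follows the heading on the same line.
        /PENDING REASONS:/ {
            inside = 1
            sub(/.*PENDING REASONS:[ ]*/, "")
            if ($0 == "") next
        }
        # A blank line closes the section.
        inside && /^[ ]*$/ { inside = 0 }
        # Print each reason without its leading indentation.
        inside { sub(/^[ ]+/, ""); print }
    '
}

# Excerpt built from the example above, so the sketch runs without LSF.
pending_sample=' PENDING REASONS:
 User has reached the per-user job slot limit of the queue;

 SCHEDULING PARAMETERS:'

printf '%s\n' "$pending_sample" | pending_reasons
```

On the cluster the same filter would be used as: bjobs -l <your-job-id> | pending_reasons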