Managing jobs


Submitting and checking job status

The most important command is the one for submitting your job script to the queue:

qsub your_job_script
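As an illustration, a minimal job script might look like the sketch below. The #PBS directive syntax is standard Torque/PBS, but the job name, queue, walltime and resource request are placeholder values, so check man qsub and the DTU documentation for what the installation actually supports. Because #PBS lines are ordinary comments to the shell, the script can also be run locally with sh as a quick syntax check.

```shell
#!/bin/sh
# Hypothetical minimal job script: every value below is a placeholder.
# Job name, as shown by qstat (-N option):
#PBS -N my_test_job
# Queue to submit to:
#PBS -q hpc
# Maximum run time (hh:mm:ss):
#PBS -l walltime=00:10:00
# One core on one node:
#PBS -l nodes=1:ppn=1

# PBS jobs typically start in the home directory; move to the directory
# qsub was called from. Falls back to the current directory outside PBS.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Job running on $(uname -n) in $(pwd)"
```

Saved as, for example, my_job.sh, it would be submitted with qsub my_job.sh.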

qsub also accepts many options, so in principle you could skip the script and pass everything you need as options on a single long command line. This is possible, but not recommended.

To see the available options, read the man page:

man qsub

Some options are implementation-dependent, so not all of them may be available in the DTU installation. Once a job is submitted, you can still track it. Each job is assigned a job-id, which appears in the output of the qstat command:

qstat
Job ID                     Name            User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
3597252.hpc-fe1           ...-COOETC_R9-16 s012345         3401:00: R hpc
3640759.hpc-fe1           xterm-linux      s012345         0        R app

The output lists your jobs (User column) with their job-id (first column), the name you assigned them (-N option of PBS), the time used so far (Time Use column), the status (S column), and the name of the queue. Some common status letters are:

Q job is queued, eligible to run or routed

R job is running

C job has completed after running

H job is held
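To see at a glance how many jobs are in each state, the qstat output can be post-processed. The sketch below defines a hypothetical helper, count_by_state (not a command provided on the system), and assumes the default qstat layout shown above: the header ends with a row of dashes and the status letter is in the second-to-last column. Other qstat options change the layout and would need a different parser.

```shell
# Count jobs per status letter in qstat output read from stdin.
count_by_state() {
    awk '
        /^-+/ { body = 1; next }          # job lines start after the dashes row
        body && NF >= 2 { n[$(NF-1)]++ }  # tally the S column
        END { for (s in n) print s, n[s] }
    ' | sort
}
```

Usage: qstat | count_by_state. For the sample output above it would print R 2.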

qstat can extract much more information about your jobs. For a list of the available options, type

man qstat

If your job is queued (Q) and you want an idea of when it will run, take its job-id from the qstat output and type

showstart <your-job-id>

This shows the estimated start and end times of your job. It is only an estimate, based on the resources your job requests and the current schedule of the queue.
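With several queued jobs, picking each job-id by hand gets tedious. The sketch below defines two hypothetical helpers (not system commands): queued_ids extracts the job-ids in state Q from qstat output, assuming the layout shown earlier with the status in the second-to-last column, and showstart_all_queued runs showstart once for each of them.

```shell
# Print the job-ids of all queued (Q) jobs from qstat output on stdin.
queued_ids() {
    awk '/^-+/ { body = 1; next }
         body && NF >= 2 && $(NF-1) == "Q" { print $1 }'
}

# Run showstart once for every queued job, with a separator line per job.
showstart_all_queued() {
    qstat | queued_ids | while read -r id; do
        echo "== $id"
        showstart "$id"
    done
}
```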

To get more information about a job, use the checkjob command, optionally with the -v option for verbose output:

checkjob <your-job-id>

Sometimes you need to remove a job from the queue. This can be done at any stage, i.e. while the job is still waiting to run (state Q) or while it is running (state R). Get the job-id with qstat and type

qdel <your-job-id>
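To remove many jobs at once, for example all your held jobs, the job-ids can be taken from qstat and passed to qdel one by one. The sketch below defines a hypothetical helper, qdel_in_state (not a system command). It assumes the qstat layout shown earlier (User in the third column, status in the second-to-last), only selects jobs owned by the current user, and prints each job-id before deleting it. Since deletion cannot be undone, run the qstat/awk part alone first to check which jobs would be selected.

```shell
# Delete all of your jobs whose status letter equals $1 (e.g. H or Q).
qdel_in_state() {
    me=$(id -un)   # only select jobs owned by the current user
    qstat | awk -v s="$1" -v u="$me" '
        /^-+/ { body = 1; next }
        body && NF >= 4 && $(NF-1) == s && $3 == u { print $1 }
    ' | while read -r id; do
        echo "deleting $id"
        qdel "$id"
    done
}
```

Usage: qdel_in_state H removes all of your held jobs.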

It can be useful to have an overview of the status of the whole queue. This is what the showq command provides:

showq


Additional useful commands

A short overview of the current system load can be obtained with the classstat command (note the three “s”). It accepts a queue name as an argument:

classstat hpc
queue                total  used avail
----------------------------------------
hpc                   1348  1248   100

It summarizes the capacity and load of the cluster, showing the total, used, and available number of cores.
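The classstat numbers can be turned into a utilization percentage. The sketch below defines a hypothetical helper, usage_percent (not a system command), assuming the column layout shown above: queue, total, used, avail, listed after a row of dashes.

```shell
# Print the percentage of used cores per queue from classstat output on stdin.
usage_percent() {
    awk '/^-+/ { body = 1; next }
         body && NF >= 4 && $2 > 0 {
             printf "%s %.1f%% used (%d free)\n", $1, 100 * $3 / $2, $4
         }'
}
```

Usage: classstat hpc | usage_percent. For the sample output above (1248 of 1348 cores used) it would print hpc 92.6% used (100 free).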

To get an idea of the system load at node level, use nodestat, optionally followed by the name of a specific queue, e.g.:

nodestat hpc
Name                    State   Procs    Load 
... 
n-62-12-9             Running    3:8     5.08 
n-62-12-10               Busy    0:8     8.10 
n-62-12-11            Running    2:8     5.99 
...
n-62-23-7                Busy    0:20   20.07 
n-62-23-8                Busy    0:20   20.00 
n-62-23-13               Busy    0:20   20.08 
n-62-23-16               Busy    0:20   20.23 
n-62-23-1                Idle   20:20    0.11

The output is long. For each node it shows the node name, the state of the node, the pair available-cores:total-cores (Procs column), and the load.
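Since the nodestat output is long, it can help to reduce it to the nodes that still have free cores. The sketch below defines a hypothetical helper, free_cores (not a system command). It assumes the Procs value is the third field in the form available:total, as described above, and skips header and "..." lines because they do not match that form.

```shell
# List nodes with free cores and print total available:total cores,
# reading nodestat output from stdin.
free_cores() {
    awk '
        BEGIN { avail = 0; total = 0 }
        $3 ~ /^[0-9]+:[0-9]+$/ {            # only node lines have avail:total
            split($3, p, ":")
            avail += p[1]; total += p[2]
            if (p[1] + 0 > 0) print $1, p[1] " free"
        }
        END { print "total", avail ":" total }
    '
}
```

Usage: nodestat hpc | free_cores. For the lines shown above it would list n-62-12-9, n-62-12-11 and n-62-23-1, followed by the summed totals.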