Job Arrays


It is sometimes necessary to run a series of jobs that share the same computational requirements. A typical case is running the same code many times, each time with different input and output.

In this case it can be useful to use the scheduler's job array capability. The user prepares a single template script, which is then used by all the jobs in the array. A basic job script could look like this:

#!/bin/sh
# General options
# -- JobName --
#PBS -N Job-array-test 
# -- stdout/stderr redirection --
#PBS -o $PBS_JOBNAME.$PBS_JOBID.out
#PBS -e $PBS_JOBNAME.$PBS_JOBID.err 
# -- specify queue -- 
#PBS -q hpc
# -- user email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- mail notification -- 
#PBS -m abe 
# -- Job array specification --
#PBS -t 1-20
# Number of cores 
#PBS -l nodes=1:ppn=4 
# specify the wall clock time (16 hours) 
#PBS -l walltime=16:00:00 
# Execute the job from the current working directory 
cd $PBS_O_WORKDIR

# Program_name_and_options 
./my_program >  Out_$PBS_ARRAYID.out

Most of the options in the script have already been discussed (see Batch Jobs). Only the options specific to job arrays are discussed here. The option that requests a job array is

#PBS -t 1-20

This requests 20 jobs, numbered from 1 to 20. The resource manager reserves 4 processors for each of them, so 80 processors are needed in total if all the jobs run at the same time. Each job in the array is identified by its index, which is accessible through the environment variable

$PBS_ARRAYID

This can be used to differentiate each job’s input and output. In this example, it is used to create a different output file name for each job: Out_$PBS_ARRAYID.out.
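As a minimal sketch of how the index could also select the input, the snippet below derives per-job file names and picks one line from a parameter file. The names input_<index>.dat and params.txt, and the helper param_line, are hypothetical and not part of the original script; only PBS_ARRAYID and the Out_<index>.out pattern come from the example above.

```shell
#!/bin/sh
# Sketch: use the array index to choose per-job input and output.
# PBS_ARRAYID is set by the scheduler inside an array job; the default
# of 1 is only there so the snippet can be tried outside the batch system.

# Print line number $2 of file $1 (assumes one parameter set per line).
param_line() {
    sed -n "${2}p" "$1"
}

index="${PBS_ARRAYID:-1}"

# Hypothetical input naming; the output name follows the example script.
input_file="input_${index}.dat"
output_file="Out_${index}.out"

echo "job ${index}: input ${input_file}, output ${output_file}"

# In a real job script the program call could then look like one of:
# ./my_program < "$input_file" > "$output_file"
# ./my_program $(param_line params.txt "$index") > "$output_file"
```

Whether the program reads a file, standard input, or command-line arguments depends on the program itself, so the commented calls are only illustrations.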
Each job is also assigned a specific “vector” job-id of the form

<jobid>[array-id]

The user can check the status of the jobs with

qstat -t

Delete individual jobs with

qdel <jobid>[array-id]

Delete the running jobs with

qdel <jobid>

and delete all the jobs (also the ones that are queued) with

qdel <jobid>[]

Another option is to specify a range of jobs (running and/or waiting). For example,

qdel -t 3-5 <jobid>[]

deletes the jobs with indices 3, 4, and 5 from the array.
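To check which jobs a range covers before deleting anything, the range can be expanded into the equivalent per-index commands. The following is a dry-run sketch: the helper list_qdel and the job id 12345 are made up for illustration, and the script only prints the commands, it does not run them.

```shell
#!/bin/sh
# Dry run: print the per-index qdel commands for a range of array indices.
# Nothing is deleted; the commands are only echoed so they can be reviewed.
# Usage: list_qdel <jobid> <first> <last>
list_qdel() {
    jobid="$1"
    i="$2"
    while [ "$i" -le "$3" ]; do
        echo "qdel ${jobid}[${i}]"
        i=$((i + 1))
    done
}

# 12345 is a made-up job id used purely for illustration.
list_qdel 12345 3 5
```

If any of the printed commands is typed by hand, it may be necessary to quote the job id, since some shells treat unquoted square brackets as a file name pattern.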

Note: it is important to include the square brackets in these commands, as in the examples above.

Important Note:

The jobs in a job array are all completely independent, and they will not necessarily run in the order you expect, so you can never rely on the order of execution. If one job in the array needs another one to be completed before it starts, you have to take care of this yourself.
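One possible way to handle such a dependency by hand is to let each job write a marker file when it finishes, and let the dependent job poll for that file. This is only a sketch under assumptions: the marker name done_<index>, the helper wait_for_marker, and the timeout values are all hypothetical choices, not something the scheduler provides.

```shell
#!/bin/sh
# Sketch: make a job wait for a marker file written by an earlier job.
# The marker naming (done_<index>) and the timeouts are assumptions.

# Wait until file $1 exists, polling every $3 seconds for at most $2 seconds.
# Returns 0 if the file appeared, 1 on timeout.
wait_for_marker() {
    waited=0
    while [ ! -e "$1" ]; do
        [ "$waited" -ge "$2" ] && return 1
        sleep "$3"
        waited=$((waited + $3))
    done
    return 0
}

index="${PBS_ARRAYID:-1}"

# Every job except the first waits for its predecessor's marker.
if [ "$index" -gt 1 ]; then
    previous=$((index - 1))
    wait_for_marker "done_${previous}" 3600 30 || exit 1
fi

# The real work would go here, followed by writing this job's marker:
# ./my_program > "Out_${index}.out" && touch "done_${index}"
```

A waiting job still occupies its reserved cores and consumes wall clock time. If only a limited number of jobs may run at once, waiting jobs can also block the very jobs they depend on, so this pattern should be used with care.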

Good practice

With job arrays it is easy to fill up the queueing system with a large number of jobs. Remember that there are other users on the system. You can limit the number of jobs from the array that run simultaneously, as follows:

#PBS -t 1-20%5

In this way you request a job array of 20 jobs, but with at most 5 running simultaneously.
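The effect of the limit on resource usage can be worked out with a little arithmetic. The sketch below does this for an array specification of the form <first>-<last>%<limit> and a given number of cores per job. The helper array_usage is hypothetical and only illustrates the calculation; it does not query the scheduler.

```shell
#!/bin/sh
# Sketch: job count and core usage for an array spec such as 1-20%5.
# Usage: array_usage <spec> <cores_per_job>
array_usage() {
    range="${1%%\%*}"            # part before '%', e.g. 1-20
    first="${range%-*}"          # first index, e.g. 1
    last="${range#*-}"           # last index, e.g. 20
    njobs=$((last - first + 1))

    # Without a '%<limit>' suffix all jobs may run at the same time.
    case "$1" in
        *%*) limit="${1#*%}" ;;
        *)   limit="$njobs" ;;
    esac

    echo "jobs=${njobs} total_cores=$((njobs * $2)) peak_cores=$((limit * $2))"
}

# The example from this page: 20 jobs, at most 5 at once, 4 cores each.
array_usage "1-20%5" 4
# prints: jobs=20 total_cores=80 peak_cores=20
```

With the limit in place, the array from the example never holds more than 20 cores at a time, instead of up to 80 without it.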