Job Arrays under LSF


It is often necessary to run a series of jobs that share the same computational requirements. A typical case is running the same code many times, each time with different input and output files.

In this case, it can be useful to let the scheduler manage the runs as a job array. The user prepares a single template script, which is then used by all the jobs in the array. A basic job script could be the following:

#!/bin/sh
# embedded options to bsub - start with #BSUB
### -- set the job Name AND the job array --
#BSUB -J My_array[1-25]
### -- specify queue --
#BSUB -q hpc 
### -- ask for number of cores (default: 1) --
#BSUB -n 4
### -- set walltime limit: hh:mm --
#BSUB -W 03:00 
### -- specify that we need 4GB of memory per core/slot -- 
#BSUB -R "rusage[mem=4GB]"
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion--
#BSUB -N
### -- Specify the output and error file. %J is the job-id %I is the job-array index --
### -- -o and -e mean append, -oo and -eo mean overwrite -- 
#BSUB -o Output_%J_%I.out
#BSUB -e Output_%J_%I.err 

# here follow the commands you want to execute 
# Program_name_and_options
./my_program >  Out_$LSB_JOBINDEX.out

Most of the options in the script have already been discussed (see Batch Jobs), so only the options specific to job arrays are covered here. To request a job array, add the array bounds to the job name specification:

#BSUB -J My_array[1-25]

This requests 25 jobs, numbered from 1 to 25. The resource manager reserves 4 cores for each of them, so 100 cores are needed if all the jobs run at the same time.

Note 1: the array indices can also be given as a list. For example, [1,23,45-67] creates jobs with indices 1, 23, and 45 through 67. It is also possible to specify a step size: [1-21:2] creates the jobs with odd indices from 1 to 21.
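To preview which indices a specification selects before submitting, the expansion can be reproduced in a few lines of shell. This is only an illustrative sketch: expand_indices is a hypothetical helper, not an LSF command, it handles only the list, range, and step forms described in Note 1, and it assumes the seq utility is available.

```shell
#!/bin/sh
# expand_indices is a hypothetical helper (not part of LSF).
# It expands a spec such as "1,23,45-47" or "1-21:2" (given without
# the surrounding brackets) into one index per line.
expand_indices() {
    # Split the spec on commas; each item is N, N-M, or N-M:S
    old_ifs=$IFS
    IFS=,
    set -- $1
    IFS=$old_ifs
    for item in "$@"; do
        case $item in
            *-*:*)
                # range with a step, e.g. 1-21:2
                range=${item%%:*}
                step=${item##*:}
                seq "${range%%-*}" "$step" "${range##*-}"
                ;;
            *-*)
                # plain range, e.g. 45-47
                seq "${item%%-*}" "${item##*-}"
                ;;
            *)
                # single index
                echo "$item"
                ;;
        esac
    done
}

# Prints the odd numbers from 1 to 21, one per line
expand_indices "1-21:2"
```

The helper only mirrors the syntax for checking purposes; the scheduler itself performs the real expansion when the job is submitted.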

Each job in the job array is identified by its index, which is accessible through the environment variable

$LSB_JOBINDEX

This can be used to give each job its own input and output. In this example, it is used to create a different output file name for each job: Out_$LSB_JOBINDEX.out.

Note 2: LSF provides the runtime variables %I and %J, which correspond to the job array index and the job ID respectively. They can be used in the #BSUB option specifications, for example to give each job its own output and error files. In your commands, however, you have to use the environment variables LSB_JOBINDEX and LSB_JOBID.
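The same variable can also select a different input file for each element. The following is a minimal sketch of the command part of a job script, under some assumptions: the file names input_1.dat, input_2.dat, and so on are hypothetical, my_program is assumed to read its input from standard input, and the fallback index 1 is there only so that the snippet can be tried outside LSF.

```shell
#!/bin/sh
# LSB_JOBINDEX is set by LSF inside each element of the array.
# The fallback value 1 is an assumption for testing outside LSF.
index=${LSB_JOBINDEX:-1}

# Hypothetical naming scheme: one input file per array index
input_file="input_${index}.dat"
output_file="Out_${index}.out"

echo "element ${index}: reading ${input_file}, writing ${output_file}"

# Uncomment in a real job script (assumes my_program reads stdin):
# ./my_program < "$input_file" > "$output_file"
```

Deriving every file name from the index keeps a single template script valid for all the elements of the array.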

Each job is also assigned a specific “vector” job ID, made of the job ID of the array followed by the array index in square brackets:

<jobid>[array-id]

The user can check the status of the job array with

bjobs <jobid>

or the status of a single element with

bjobs <jobid>[array-id]

Delete individual jobs with

bkill <jobid>[array-id]

Delete all the jobs in the array with

bkill <jobid>

or delete some selected ones with

bkill <jobid>[1-5,212,334]
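On the command line, square brackets are also filename pattern characters for most shells, so it is safer to quote the job specification. The sketch below only prints the commands rather than running them; the job ID 1234 is a placeholder, not a real job.

```shell
#!/bin/sh
# 1234 stands in for a real job ID (placeholder, an assumption)
jobid=1234

# Quoting keeps the shell from treating [...] as a filename pattern.
# The commands are only printed here; remove "echo" to run them
# on a system where LSF is available.
echo bjobs "${jobid}[7]"
echo bkill "${jobid}[1-5,212,334]"
```

Without the quotes, some shells refuse the command outright when no file matches the pattern, and others may silently substitute a matching file name.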

Important Note:

The jobs in a job array are completely independent of each other, and they will not necessarily run in index order, so you can never rely on the order of execution. If one job in the array needs another one to complete before it starts, you have to take care of this yourself. However, LSF does let you set up dependencies among different jobs.
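As an illustration of such a dependency, a follow-up job can be asked to wait for the whole array through the -w option of bsub. The script below is a sketch under assumptions: 1234 is a placeholder for the job ID of the array, the job name and the post-processing step are hypothetical, and the exact dependency expressions supported should be checked in the LSF documentation for the installed version.

```shell
#!/bin/sh
### -- follow-up job that starts only after job 1234 is done --
### -- 1234 is a placeholder for the job ID of the array (an assumption) --
#BSUB -J Collect_results
#BSUB -q hpc
#BSUB -W 00:10
#BSUB -w "done(1234)"
#BSUB -o Collect_%J.out

# Hypothetical post-processing step: count the per-element output
# files written by the array job script (Out_1.out, Out_2.out, ...).
n_outputs=$(ls Out_*.out 2>/dev/null | wc -l)
echo "found $((n_outputs)) output files"
```

The dependency is handled by the scheduler, so the follow-up job can be submitted right after the array and simply stays pending until the condition is met.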

Good practice

  • With job arrays it is easy to fill up the queueing system with a large number of jobs. Remember that there are other users on the system. You can limit the number of jobs from the array that run simultaneously, as follows:
#BSUB -J My_array[1-20]%5

In this way you request a job array of 20 jobs, but with at most 5 running simultaneously.

  • Each job of the array is an independent job, so if you enable email notification, you will receive email from each of the jobs. This can generate a lot of emails. Please consider disabling email notification if you have a large job array, or if several of your jobs are expected to start and finish within a short time interval.