Job-level automatic requeue

Procedure

Use bsub -Q to submit a job that is automatically requeued if it exits with the specified exit values.

Use spaces to separate multiple exit codes. The reserved keyword all specifies all exit codes. Exit codes are typically between 0 and 255. Use a tilde (~) to exclude specified exit codes from the list.
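A job influences requeue only through its exit code, so a thin wrapper can translate a transient failure into a code that is listed in -Q. The following is a minimal sketch, not an LSF convention: the function name run_with_requeue_code, the choice of 99 as the requeue code, and the treatment of status 75 (EX_TEMPFAIL in sysexits.h) as a transient failure are all assumptions made for illustration.

```shell
# Hypothetical exit code reserved for "please requeue me" (an assumption).
REQUEUE_CODE=99

# Run the given command and translate its exit status:
#   0  -> 0              (success, nothing to requeue)
#   75 -> $REQUEUE_CODE  (assumed transient failure, should be requeued)
#   *  -> unchanged      (any other failure is passed through)
run_with_requeue_code() {
    "$@"
    rc=$?
    case $rc in
        0)  return 0 ;;
        75) return "$REQUEUE_CODE" ;;
        *)  return "$rc" ;;
    esac
}
```

A script that ends with this function's status could then be submitted with bsub -Q "99" ./wrapper.sh, where wrapper.sh is a hypothetical script name.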

Job-level requeue exit values override application-level and queue-level configuration of the parameter REQUEUE_EXIT_VALUES, if defined.

Jobs running with specified exit codes share the same application and queue as other jobs.

For example:
bsub -Q "all ~1 ~2 EXCLUDE(9)" myjob

Jobs that exit with any exit code except 1 and 2 are requeued. Jobs that exit with code 9 are requeued so that the failed job is not rerun on the same host (exclusive job requeue).
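When the value list is assembled in a script, building the string in one place helps keep the tilde and EXCLUDE() syntax consistent. The following sketch assumes a hypothetical helper named requeue_spec. It prints the resulting bsub command instead of running it, so it can be tried without an LSF installation.

```shell
# Hypothetical helper: build the value list passed to bsub -Q.
# Usage: requeue_spec BASE "CODES_NOT_REQUEUED" "EXCLUSIVE_CODES"
requeue_spec() {
    spec="$1"
    # Codes prefixed with a tilde are excluded from the requeue list.
    for code in $2; do
        spec="$spec ~$code"
    done
    # Codes wrapped in EXCLUDE() trigger exclusive job requeue.
    for code in $3; do
        spec="$spec EXCLUDE($code)"
    done
    printf '%s\n' "$spec"
}

spec=$(requeue_spec all "1 2" "9")

# Dry run: print the submission command rather than executing it.
printf 'bsub -Q "%s" myjob\n' "$spec"
```

With the arguments shown, the printed command matches the example above.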

Enable exclusive job requeue

Procedure

Define an exit code as EXCLUDE(exit_code) to enable exclusive job requeue.

Exclusive job requeue does not work for parallel jobs.

Note:

If mbatchd is restarted, it does not remember the previous hosts from which the job exited with an exclusive requeue exit code. In this situation, it is possible for a job to be dispatched to hosts on which the job has previously exited with an exclusive exit code.

Modify requeue exit values

Procedure

Use bmod -Q to modify or cancel job-level requeue exit values.

bmod -Q does not affect running jobs. For rerunnable and requeued jobs, bmod -Q affects the next run.
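The two cases, modify and cancel, can be sketched as follows. The job ID 1234 and the lsf wrapper function are hypothetical. The -Qn form used to cancel is assumed from the usual bmod convention of appending n to an option, so check it against the bmod reference for your LSF version. Commands are only printed unless RUN_LSF=1 is set.

```shell
# Hypothetical job ID, for illustration only.
JOBID=1234

# Print the command by default; execute it only when RUN_LSF=1 is set.
lsf() {
    if [ "${RUN_LSF:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Replace the job-level requeue exit values (applies to the next run).
lsf bmod -Q "all ~1 ~2" "$JOBID"

# Cancel the job-level requeue exit values (assumed -Qn form).
lsf bmod -Qn "$JOBID"
```

After the job-level values are canceled, any application-level or queue-level REQUEUE_EXIT_VALUES setting is no longer overridden for that job.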

MultiCluster Job forwarding model

For jobs sent to a remote cluster, arguments of bsub -Q take effect on remote clusters.

MultiCluster Lease model

The arguments of bsub -Q apply to jobs running on remote leased hosts as if they were running on local hosts.