About suspending and resuming jobs (bstop and bresume)

You can suspend a job with the bstop command and resume it with the bresume command.

A job can be suspended by its owner or by the LSF administrator with the bstop command. Jobs suspended this way are considered user-suspended, and bjobs displays their status as USUSP.

When a job is resumed with the bresume command, LSF does not start it again immediately, to avoid overloading the host. Instead, the job status changes from USUSP to SSUSP (suspended by the system). LSF resumes the SSUSP job when the host load levels are within the scheduling thresholds for that job, in the same way as for jobs suspended because of high load.
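The status changes described above can be pictured as a small state machine. The following Python code is a minimal sketch, not LSF code: the JobState enum, the on_bstop, on_bresume, and on_load_check functions, and the use of a single numeric load value compared against one scheduling threshold are all hypothetical simplifications (LSF itself compares several load indices against the configured thresholds).

```python
from enum import Enum


class JobState(Enum):
    """Hypothetical subset of the job states that bjobs can display."""
    RUN = "RUN"
    USUSP = "USUSP"  # suspended by the owner or administrator (bstop)
    SSUSP = "SSUSP"  # suspended by the system


def on_bstop(state: JobState) -> JobState:
    # bstop on a running job makes it user-suspended.
    if state is JobState.RUN:
        return JobState.USUSP
    return state


def on_bresume(state: JobState) -> JobState:
    # bresume does not run the job immediately: USUSP becomes SSUSP.
    if state is JobState.USUSP:
        return JobState.SSUSP
    return state


def on_load_check(state: JobState, host_load: float, sched_threshold: float) -> JobState:
    # A system-suspended job resumes only once the host load is within
    # the scheduling threshold (simplified here to a single number).
    if state is JobState.SSUSP and host_load <= sched_threshold:
        return JobState.RUN
    return state


if __name__ == "__main__":
    s = on_bstop(JobState.RUN)  # USUSP
    s = on_bresume(s)           # SSUSP
    s = on_load_check(s, host_load=0.9, sched_threshold=0.7)
    print(s.value)              # still SSUSP: load too high
    s = on_load_check(s, host_load=0.5, sched_threshold=0.7)
    print(s.value)              # RUN: load within threshold
```

The point of the sketch is that bresume alone never moves the job to RUN; only a later load check does.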

If a user suspends a high-priority job in a non-preemptive queue, the host load may drop low enough for LSF to start a lower-priority job in its place. The load created by the lower-priority job can then prevent the high-priority job from resuming. You can avoid this situation by configuring preemptive queues.
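The scenario above can be reproduced in a toy Python model. This is a hedged sketch under stated assumptions: the Job class, the per-job load numbers, the single threshold, and the try_resume and scenario helpers are hypothetical, and "preemption" is simplified to suspending running lower-priority jobs until the load fits. It does not reflect how LSF queues are actually configured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    priority: int   # higher number = higher priority
    load: float     # hypothetical load this job adds while running
    running: bool = False


def host_load(jobs: List[Job]) -> float:
    # In this toy model, host load comes only from running jobs.
    return sum(j.load for j in jobs if j.running)


def try_resume(job: Job, jobs: List[Job], threshold: float, preemptive: bool) -> bool:
    """Attempt to resume a system-suspended job; return True if it now runs."""
    if preemptive:
        # Simplified preemption: suspend running lower-priority jobs,
        # lowest priority first, until the load is within the threshold.
        for other in sorted(jobs, key=lambda j: j.priority):
            if host_load(jobs) <= threshold:
                break
            if other.running and other.priority < job.priority:
                other.running = False
    if host_load(jobs) <= threshold:
        job.running = True
    return job.running


def scenario(preemptive: bool) -> bool:
    high = Job("high", priority=10, load=0.8)  # suspended by its owner
    low = Job("low", priority=1, load=0.8)
    jobs = [high, low]
    threshold = 0.5
    # The host is idle while "high" is suspended, so "low" is dispatched.
    if host_load(jobs) <= threshold:
        low.running = True
    return try_resume(high, jobs, threshold, preemptive)


if __name__ == "__main__":
    print("non-preemptive:", scenario(preemptive=False))
    print("preemptive:", scenario(preemptive=True))
```

Without preemption, the load from the low-priority job keeps the high-priority job suspended; with the simplified preemption, the low-priority job is suspended first and the high-priority job can resume.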

The bstop command sends one of the following signals to the job, depending on the job type:

  • SIGTSTP for parallel or interactive jobs

    SIGTSTP is caught by the master process and passed to all the slave processes running on other hosts.

  • SIGSTOP for sequential jobs

    User programs cannot catch SIGSTOP. To have LSF send a different signal in place of SIGSTOP, set the LSB_SIGSTOP parameter in lsf.conf.
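The signal-selection rules in the list above can be summarized in a few lines of Python. This is a minimal sketch, not the bstop implementation: the bstop_signal function, the job-type labels "parallel", "interactive", and "sequential", and the use of a plain dict in place of lsf.conf are hypothetical, and it assumes LSB_SIGSTOP holds a signal name such as SIGTSTP. It runs only on POSIX systems, where SIGTSTP and SIGSTOP exist.

```python
import signal
from typing import Mapping


def bstop_signal(job_type: str, lsf_conf: Mapping[str, str]) -> signal.Signals:
    """Return the signal a bstop-like tool would send, per the rules above.

    job_type: "parallel", "interactive", or "sequential" (hypothetical labels).
    lsf_conf: a plain dict standing in for lsf.conf settings.
    """
    if job_type in ("parallel", "interactive"):
        # The master process catches SIGTSTP and forwards it to remote processes.
        return signal.SIGTSTP
    if job_type == "sequential":
        # LSB_SIGSTOP, if set, replaces the default SIGSTOP.
        name = lsf_conf.get("LSB_SIGSTOP", "SIGSTOP")
        return signal.Signals[name]
    raise ValueError(f"unknown job type: {job_type!r}")


if __name__ == "__main__":
    print(bstop_signal("parallel", {}).name)
    print(bstop_signal("sequential", {}).name)
    print(bstop_signal("sequential", {"LSB_SIGSTOP": "SIGTSTP"}).name)
```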

Allow users to resume jobs

If ENABLE_USER_RESUME=Y is set in lsb.params, you can resume your own jobs even when the LSF administrator suspended them.
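The effect of ENABLE_USER_RESUME on who may resume a job can be sketched as a permission check in Python. This sketch rests on assumptions beyond the text above and is not LSF's actual logic: the SuspendedJob class and can_resume function are hypothetical, administrators are assumed to be able to resume any job, only the owner or an administrator is assumed to be able to resume a job, and the parameter is assumed to default to N when it is not set.

```python
from dataclasses import dataclass
from typing import AbstractSet, Mapping


@dataclass
class SuspendedJob:
    owner: str
    suspended_by: str  # hypothetical: user name of whoever ran bstop


def can_resume(requester: str, job: SuspendedJob,
               admins: AbstractSet[str], lsb_params: Mapping[str, str]) -> bool:
    # Assumption: LSF administrators may resume any job.
    if requester in admins:
        return True
    # Assumption: apart from administrators, only the job owner may resume it.
    if requester != job.owner:
        return False
    # Owners can resume jobs they suspended themselves.
    if job.suspended_by == job.owner:
        return True
    # Job suspended by someone else (the administrator): allowed only when
    # ENABLE_USER_RESUME=Y (assumed default: N).
    return lsb_params.get("ENABLE_USER_RESUME", "N").upper() == "Y"


if __name__ == "__main__":
    admins = {"lsfadmin"}
    job = SuspendedJob(owner="alice", suspended_by="lsfadmin")
    print(can_resume("alice", job, admins, {}))
    print(can_resume("alice", job, admins, {"ENABLE_USER_RESUME": "Y"}))
```

In a real cluster the setting lives in lsb.params; the dict here only stands in for it.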