Configuration file |
Parameter and syntax |
Behavior |
---|---|---|
lsf.conf |
LSB_MIG2PEND=1 |
|
LSB_REQUEUE_TO_BOTTOM=1 |
|
After a checkpointable resizable job restarts (brestart), LSF restores the original job allocation request. LSF also restores job-level autoresizable attribute and notification command if they are specified at job submission.
Begin Queue
...
QUEUE_NAME=checkpoint
CHKPNT=mydir 240
DESCRIPTION=Automatically checkpoints jobs every 4 hours to mydir
...
End Queue
The bqueues command displays the checkpoint period in seconds; the lsb.queues CHKPNT parameter defines the checkpoint period in minutes.
If the command bchkpnt -k 123 is used to checkpoint and kill job 123, you can restart the job using the brestart command as shown in the following example:
Job <456> is submitted to queue <priority>
LSF assigns a new job ID of 456, submits the job to the queue named "priority," and restarts the job.
Once job 456 is running, you can change the checkpoint period using the bchkpnt command:
Job <456> is being checkpointed