To submit a VM job with checkpointing enabled, use the bsub -dc_chkpntvm option or the DC_CHKPNTVM parameter in lsb.applications to specify when the first checkpoint of the VM job is created and how often subsequent checkpoints are created.
About this task
Note:
- Do not use LSF checkpointing (the bsub -k option) with Dynamic Cluster VM job checkpointing. If you combine the two, Dynamic Cluster rejects the job submission.
- RHEL KVM hypervisors will not live migrate a VM that has a snapshot. Do not use VM job checkpointing (-dc_chkpntvm) with live migration (-dc_vmaction livemigvm) for the same job.
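The two restrictions above can be checked before submission. The following Python sketch is an illustration only and is not part of LSF or Dynamic Cluster: the function name find_checkpoint_conflicts and the sample checkpoint directory are hypothetical, and the check only looks for the option names mentioned in the note.

```python
def find_checkpoint_conflicts(bsub_args):
    """Return a list of problems found in a list of bsub arguments.

    Hypothetical helper: it only knows about the two combinations
    described in the note (-k and -dc_vmaction livemigvm).
    """
    problems = []
    if "-dc_chkpntvm" not in bsub_args:
        return problems  # VM job checkpointing is not requested
    if "-k" in bsub_args:
        problems.append("-k cannot be combined with -dc_chkpntvm")
    # Look for '-dc_vmaction' immediately followed by 'livemigvm'.
    for option, value in zip(bsub_args, bsub_args[1:]):
        if option == "-dc_vmaction" and value == "livemigvm":
            problems.append(
                "-dc_vmaction livemigvm cannot be combined with -dc_chkpntvm"
            )
    return problems


# Example: the checkpoint directory below is a made-up placeholder.
args = ["-dc_chkpntvm", "init=60 15", "-k", "/tmp/chkpnt_dir"]
for problem in find_checkpoint_conflicts(args):
    print(problem)
```

An empty list means that neither of the two documented conflicts was found; it does not guarantee that the submission is otherwise valid.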
Procedure
- To specify a custom checkpoint interval for an individual job, use the bsub -dc_chkpntvm option:
-dc_chkpntvm "init=initial_minutes interval_minutes"
where initial_minutes is the number of minutes after the job is dispatched at which the first VM job checkpoint is created, and interval_minutes is the number of minutes after the previous checkpoint at which each subsequent checkpoint is created.
For example,
bsub ... -r -dc_chkpntvm "init=60 15"
The first checkpoint is created 60 minutes after the job is dispatched, and a new checkpoint is created every 15 minutes after the previous checkpoint.
- To specify a custom checkpoint interval for all jobs, use the DC_CHKPNTVM parameter in lsb.applications:
DC_CHKPNTVM="init=initial_minutes interval_minutes"
where initial_minutes is the number of minutes after the job is dispatched at which the first VM job checkpoint is created, and interval_minutes is the number of minutes after the previous checkpoint at which each subsequent checkpoint is created.
For example,
DC_CHKPNTVM="init=120 30"
The first checkpoint is created 120 minutes after the job is dispatched, and a new checkpoint is created every 30 minutes after the previous checkpoint.
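To make the timing concrete, the following Python sketch parses a checkpoint specification in the form shown in this procedure and lists the approximate checkpoint times in minutes after dispatch. It is an illustration only and is not part of LSF: the function names parse_chkpnt_spec and checkpoint_times are hypothetical, and the sketch assumes that each checkpoint completes instantly. Because each interval is measured from the previous checkpoint, real checkpoint times are likely to drift later than the values computed here.

```python
import re


def parse_chkpnt_spec(spec):
    """Parse 'init=initial_minutes interval_minutes' into two integers."""
    match = re.fullmatch(r"\s*init=(\d+)\s+(\d+)\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized checkpoint specification: {spec!r}")
    return int(match.group(1)), int(match.group(2))


def checkpoint_times(spec, count):
    """Return the first `count` checkpoint times, in minutes after dispatch.

    Assumption: a checkpoint takes no time to create, so each interval
    starts exactly when the previous checkpoint was scheduled.
    """
    initial, interval = parse_chkpnt_spec(spec)
    return [initial + i * interval for i in range(count)]


print(checkpoint_times("init=60 15", 4))   # [60, 75, 90, 105]
print(checkpoint_times("init=120 30", 3))  # [120, 150, 180]
```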