Enable and configure host memory defragmentation at the queue level. When you enable host memory defragmentation for a queue, users can submit large memory jobs to that queue, and Dynamic Cluster can make large memory hosts available to run them.
Procedure
- Log in as the LSF administrator to any host in the cluster.
- Edit the lsb.queues file.
- Define DC_HOST_DEFRAG_TIMEOUT to specify how long, in minutes, a host memory defragmentation reservation lasts before it times out.
DC_HOST_DEFRAG_TIMEOUT = time_in_minutes
Specifying this parameter enables host memory defragmentation for the queue.
- Optional. Define DC_HOST_DEFRAG_MIN_PENDING_TIME to specify how long a job is pending, in minutes, before it triggers a host memory defragmentation.
DC_HOST_DEFRAG_MIN_PENDING_TIME = time_in_minutes
If a job cannot find an available host with enough memory, the job pends. This parameter specifies how long the job remains pending before it triggers a host memory defragmentation to free up a host.
The default is 0 (the job triggers a host memory defragmentation immediately if it cannot find an available host).
- Optional. Define DC_HOST_DEFRAG_MIN_MEMSIZE to specify that a job triggers a host memory defragmentation if its rusage mem requirement (that is, its memory resource requirement) is larger than or equal to this value.
DC_HOST_DEFRAG_MIN_MEMSIZE = size_in_GB
This parameter specifies the minimum memory requested before a job is considered a "large memory job" that can trigger a host memory defragmentation.
The default is 0 (any job with a memory resource requirement can trigger a host memory defragmentation).
- Optional. To throttle the number of concurrent live migrations due to host memory defragmentation, edit the lsb.params file and specify DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS.
DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS = integer
Specify the maximum number of concurrent Dynamic Cluster jobs that can trigger a host memory defragmentation.
This parameter helps you manage the performance impact on the regular scheduler and control the load on the network and storage infrastructure (that is, to prevent "I/O storms"). A sample configuration of these parameters is shown after this procedure.
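The following is a minimal configuration sketch. The queue name hostdefrag and all numeric values are placeholders for illustration, not recommended settings, and a real queue section typically contains other parameters as well.
In lsb.queues:
Begin Queue
QUEUE_NAME = hostdefrag
DC_HOST_DEFRAG_TIMEOUT = 30
DC_HOST_DEFRAG_MIN_PENDING_TIME = 10
DC_HOST_DEFRAG_MIN_MEMSIZE = 128
End Queue
In lsb.params:
Begin Parameters
DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS = 5
End Parameters
With these example values, a job that requests at least 128 GB of memory and has been pending for 10 minutes can trigger a host memory defragmentation, the resulting reservation times out after 30 minutes, and at most 5 such jobs can trigger a defragmentation at the same time.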
Results
By default, VM jobs are subject to live migration when their hypervisor hosts are selected for host memory defragmentation. To indicate that a job cannot be live migrated, specify -dc_livemigvm n at job submission time, or specify DC_LIVEMIGVM = N at the application profile level and submit the job to that application profile. In esub scripts, specify LSB_SUB4_DC_NOT_MIGRATABLE=Y.
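For example, an application profile that prevents live migration might look like the following sketch in lsb.applications; the profile name nomigvm and the job command are placeholders, and any other Dynamic Cluster submission options the VM job would normally use are omitted for brevity:
Begin Application
NAME = nomigvm
DC_LIVEMIGVM = N
End Application
A job submitted with bsub -app nomigvm ./my_vm_job then cannot be live migrated, which has the same effect as submitting the job with bsub -dc_livemigvm n ./my_vm_job.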
Submit a job with a large memory requirement (that is, a job with -R 'rusage[mem=memory_requirement]') to a queue configured with DC_HOST_DEFRAG_TIMEOUT. If no resource is immediately available for the job (within the period specified by the DC_HOST_DEFRAG_MIN_PENDING_TIME parameter), Dynamic Cluster attempts a host memory defragmentation under the following conditions:
- The job is pending because there are no hosts with enough available memory to run the job.
- There are smaller VM jobs running on a source hypervisor and these jobs can be live-migrated to other (target) hypervisors.
Example
For example, if you specify DC_HOST_DEFRAG_TIMEOUT in lsb.queues for the hostdefrag queue, any job with a memory requirement that you submit to the hostdefrag queue (that is, by submitting a job with -q hostdefrag -R 'rusage[mem=memory_requirement]') can trigger a host memory defragmentation if there are no available hosts to run the job (subject to the other host memory defragmentation parameters).
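For instance, the submission might look like the following, where the memory value (interpreted in the cluster's configured memory units) and the job command are placeholders:
bsub -q hostdefrag -R "rusage[mem=128000]" ./large_memory_job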
What to do next
To verify that a queue has host memory defragmentation enabled, run bqueues -l queue_name and verify that DC_HOST_DEFRAG_TIMEOUT displays a value. Any other host memory defragmentation parameters that are configured also display their values.
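For example, to check the hostdefrag queue used in the earlier example:
bqueues -l hostdefrag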