Host memory defragmentation

Enable and configure host memory defragmentation to allow large memory jobs to run on large memory hosts.

A large memory host tends to experience memory fragmentation when small memory jobs are running on the host. Because jobs run with different memory requirements, a large memory host is unlikely to have large blocks of memory available for large memory jobs, especially in busy clusters. Large memory jobs must then wait for small memory jobs to complete before they can run on the large memory hosts.
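The following minimal sketch illustrates the situation described above. It is not Dynamic Cluster code; the host names, memory sizes, and the `hosts_that_fit` helper are hypothetical and exist only to show how a cluster can have enough free memory in total while no single host can accept a large memory job.

```python
# Hypothetical free memory (in GB) per host. hostA is a large memory
# host that is partly occupied by several small memory jobs.
free_mem_gb = {"hostA": 40, "hostB": 24, "hostC": 16}

# Hypothetical memory requirement (in GB) of a pending large memory job.
large_job_gb = 64


def hosts_that_fit(job_gb, free_by_host):
    """Return the hosts whose free memory covers the job's requirement."""
    return [host for host, free in free_by_host.items() if free >= job_gb]


total_free_gb = sum(free_mem_gb.values())
candidates = hosts_that_fit(large_job_gb, free_mem_gb)

# The cluster as a whole has enough free memory, but it is spread
# across hosts, so the large memory job stays pending.
print(total_free_gb >= large_job_gb, candidates)
```

In this sketch the total free memory (80 GB) exceeds the job's requirement, yet the list of candidate hosts is empty, which is the fragmentation problem that the rest of this topic addresses.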

When this occurs, host memory can become underutilized: a large memory host that is running only a few small memory jobs leaves its remaining memory unused. At the same time, relatively large memory jobs remain pending while they wait for these small memory jobs to finish. Large memory jobs can run only on large memory hosts, whereas small memory jobs can also run on other hosts.

To schedule large memory jobs, Dynamic Cluster can use live migration to move small memory jobs off large memory hosts. In this way, large memory hosts are reserved for large memory jobs while small memory jobs are live-migrated elsewhere. This process is host memory defragmentation. Host memory defragmentation works when jobs are using guaranteed resources and does not rely on queue priority: a large memory job in a lower priority queue can trigger a live migration of smaller memory jobs in a higher or equal priority queue.

How host memory defragmentation works

Host memory defragmentation is configured at the queue level. Users can submit large memory jobs (with memory resource requirements) to this queue. If the large memory job pends because there are no hosts with enough available memory to run the job, Dynamic Cluster finds and reserves a host that can run the large memory job. After reserving the host, Dynamic Cluster live-migrates the smaller VM jobs to other hosts.
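The reserve-then-migrate sequence described above can be sketched as a small planning function. This is only an illustration under stated assumptions, not the Dynamic Cluster scheduler: the `Job` and `Host` classes, the `plan_defrag` function, and the selection heuristics (prefer the host with the fewest jobs, move the largest jobs first, move only as many jobs as needed) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    mem_gb: int


@dataclass
class Host:
    name: str
    total_gb: int
    jobs: list = field(default_factory=list)

    @property
    def free_gb(self):
        return self.total_gb - sum(job.mem_gb for job in self.jobs)


def plan_defrag(hosts, large_job_gb):
    """Pick a host to reserve and the migrations needed to free it.

    Returns (reserved_host, [(job, target_host), ...]) or None when no
    host can be freed. All heuristics here are assumptions.
    """
    # Only hosts whose total memory covers the job can be reserved.
    candidates = sorted(
        (h for h in hosts if h.total_gb >= large_job_gb),
        key=lambda h: len(h.jobs),
    )
    for source in candidates:
        # Track free memory on the other (target) hosts as jobs move.
        free = {h.name: h.free_gb for h in hosts if h is not source}
        needed = large_job_gb - source.free_gb
        moves = []
        for job in sorted(source.jobs, key=lambda j: j.mem_gb, reverse=True):
            if needed <= 0:
                break
            target = next((n for n, f in free.items() if f >= job.mem_gb), None)
            if target is None:
                continue  # No target host can take this job; try the next one.
            free[target] -= job.mem_gb
            moves.append((job.name, target))
            needed -= job.mem_gb
        if needed <= 0:
            return source.name, moves
    return None


hosts = [
    Host("big", 256, [Job("s1", 8), Job("s2", 16)]),
    Host("small1", 64),
    Host("small2", 64, [Job("s3", 32)]),
]
print(plan_defrag(hosts, 250))
# ('big', [('s2', 'small1'), ('s1', 'small1')])
```

In the example, the host named big is reserved for the 250 GB job, and both of its small jobs are planned for migration to small1. A real scheduler would also account for guarantees and for the conditions that end the reservation, which this sketch leaves out.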

The reservation for the large memory job lasts until one of the following occurs:

In general, host memory defragmentation makes a reservation as long as the large memory job is entitled to use a guarantee pool that includes the source hypervisor. Small memory jobs can find other (target) hypervisors as long as their guaranteed SLAs remain satisfied.

A large memory job can be a VM job or a PM job, but it must be a Dynamic Cluster job (that is, the job must be submitted with a Dynamic Cluster template). For PM jobs, the hypervisor host must be a KVM host and must be configured with a Dynamic Cluster template.