
Configure host memory defragmentation

Enable and configure host memory defragmentation at the queue level. When you enable host memory defragmentation for a queue, users can submit large memory jobs to that queue, and Dynamic Cluster can make large memory hosts available to run them.

Procedure

  1. Log in as the LSF administrator to any host in the cluster.
  2. Edit the lsb.queues file.
  3. Define DC_HOST_DEFRAG_TIMEOUT to specify how long a host memory defragmentation reservation lasts, in minutes, until the reservation times out.

    DC_HOST_DEFRAG_TIMEOUT = time_in_minutes

    Specifying this parameter enables host memory defragmentation for the queue. (A sample configuration that combines this parameter with the optional parameters in the following steps is shown after this procedure.)

  4. Optional. Define DC_HOST_DEFRAG_MIN_PENDING_TIME to specify how long a job is pending, in minutes, before it triggers a host memory defragmentation.

    DC_HOST_DEFRAG_MIN_PENDING_TIME = time_in_minutes

    If a job cannot find an available host with enough memory, the job pends. This parameter specifies how long the job waits for a host to become available before it triggers a host memory defragmentation to make a host available.

    The default is 0 (the job triggers a host memory defragmentation immediately if it cannot find an available host).

  5. Optional. Define DC_HOST_DEFRAG_MIN_MEMSIZE to specify that a job triggers a host memory defragmentation if its rusage mem requirement (that is, its memory resource requirement) is larger than or equal to this value.

    DC_HOST_DEFRAG_MIN_MEMSIZE = size_in_GB

    This parameter specifies the minimum memory requested before a job is considered a "large memory job" that can trigger a host memory defragmentation.

    The default is 0 (any job with a memory resource requirement can trigger a host memory defragmentation).

  6. Optional. To throttle the number of concurrent live migrations due to host memory defragmentation, edit the lsb.params file and specify DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS.

    DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS = integer

    Specify the maximum number of concurrent Dynamic Cluster jobs that can trigger a host memory defragmentation.

    This parameter allows you to manage any performance impact on the regular scheduler and to control the load on the network and storage infrastructure (that is, to prevent "I/O storms").
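The following sketch shows one way these parameters might be combined. The queue name (hostdefrag) and all numeric values are illustrative assumptions, not recommendations.

In the lsb.queues file:

    Begin Queue
    QUEUE_NAME                      = hostdefrag
    DC_HOST_DEFRAG_TIMEOUT          = 30
    DC_HOST_DEFRAG_MIN_PENDING_TIME = 10
    DC_HOST_DEFRAG_MIN_MEMSIZE      = 128
    DESCRIPTION                     = Queue with host memory defragmentation enabled
    End Queue

In the lsb.params file:

    Begin Parameters
    DC_HOST_DEFRAG_MAX_CONCURRENT_NUM_JOBS = 5
    End Parameters

With this configuration, a job that requests at least 128 GB of memory and has been pending for 10 minutes can trigger a host memory defragmentation, the resulting reservation times out after 30 minutes, and at most 5 such jobs can trigger host memory defragmentation concurrently.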

Results

By default, VM jobs are subject to live migration when their hypervisor hosts are selected for host memory defragmentation. To indicate that a job cannot be live migrated, specify -dc_livemigvm n at job submission time. At the application profile level, specify DC_LIVEMIGVM = N and submit the job to that application profile. In esub scripts, specify LSB_SUB4_DC_NOT_MIGRATABLE=Y.
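For the esub approach, the following is a minimal sketch of a custom esub script (the file name esub.novmmig is a hypothetical example). It assumes the standard LSF esub convention of appending parameter assignments to the file named by the LSB_SUB_MODIFY_FILE environment variable.

    #!/bin/sh
    # Hypothetical esub.novmmig sketch: mark every job submitted through
    # this esub as not live-migratable by Dynamic Cluster.
    echo 'LSB_SUB4_DC_NOT_MIGRATABLE=Y' >> $LSB_SUB_MODIFY_FILE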

Submit a job with a large memory requirement (that is, a job with -R 'rusage[mem=memory_requirement]') to a queue configured with DC_HOST_DEFRAG_TIMEOUT. If no resource is immediately available for the job (within the period specified by the DC_HOST_DEFRAG_MIN_PENDING_TIME parameter), Dynamic Cluster attempts a host memory defragmentation, subject to the other host memory defragmentation parameters.

Example

For example, if you specify DC_HOST_DEFRAG_TIMEOUT in lsb.queues for the hostdefrag queue, any job with a memory requirement that you submit to the hostdefrag queue (that is, by submitting a job with -q hostdefrag -R 'rusage[mem=memory_requirement]') can trigger a host memory defragmentation if there are no available hosts to run the job (subject to the other host memory defragmentation parameters).
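For instance, a submission along the following lines could trigger a host memory defragmentation if no host currently has enough free memory (the memory value, expressed in the cluster's configured memory units, and the application name are illustrative assumptions):

    bsub -q hostdefrag -R 'rusage[mem=204800]' ./large_mem_app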

What to do next

To verify that a queue has host memory defragmentation enabled, run bqueues -l queue_name and verify that DC_HOST_DEFRAG_TIMEOUT displays a value. If other host memory defragmentation parameters are configured, their values are also displayed.
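For example, assuming the hostdefrag queue from the previous example:

    bqueues -l hostdefrag

The output should include the configured DC_HOST_DEFRAG_TIMEOUT value, along with the values of any other host memory defragmentation parameters that are defined for the queue.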
