Some UNIX operating systems support dynamic hardware reconfiguration; that is, the attaching or detaching of system boards in a live system without having to reboot the host.
LSF can recognize changes in ncpus, maxmem, maxswp, and maxtmp on the following platforms:
Sun Solaris 10 and 11+
HP-UX 11
IBM AIX 5, 6 and 7 on IBM POWER
LSF automatically detects a change in the number of processors on systems that support dynamic hardware reconfiguration.
Every 2 minutes (an internal interval), the local LIM checks whether the number of processors has changed. If it detects a change, the local LIM also checks maxmem, maxswp, and maxtmp, and then sends the new values to the master LIM.
If you dynamically change maxmem, maxswp, or maxtmp without changing the number of processors, you must restart the local LIM with the command lsadmin limrestart so that it recognizes the changes.
If you dynamically change the number of processors together with any of maxmem, maxswp, or maxtmp, LSF recognizes all of the changes automatically, because a change in the number of processors causes the local LIM to recheck maxmem, maxswp, and maxtmp.
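The detection rule above can be modeled in a few lines. This is a minimal sketch, not LSF code: the class LocalLimModel, the record HostResources, and the method names poll and limrestart are hypothetical names chosen for illustration. It only captures the rule that a processor-count change triggers a recheck of maxmem, maxswp, and maxtmp, while a memory, swap, or tmp change on its own is picked up only after a restart.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model for illustration only; these are not LSF APIs.
POLL_INTERVAL_SECONDS = 120  # the 2-minute internal check interval


@dataclass(frozen=True)
class HostResources:
    ncpus: int
    maxmem: int  # MB
    maxswp: int  # MB
    maxtmp: int  # MB


class LocalLimModel:
    """Sketch of the local LIM's periodic check for dynamic hardware changes."""

    def __init__(self, initial: HostResources) -> None:
        self.reported = initial  # values last sent to the master LIM

    def poll(self, current: HostResources) -> Optional[HostResources]:
        """One periodic check: return values to send to the master LIM, or None."""
        if current.ncpus == self.reported.ncpus:
            # No processor change, so maxmem/maxswp/maxtmp are not rechecked.
            return None
        # Processor count changed: the other values are re-read and reported too.
        self.reported = current
        return current

    def limrestart(self, current: HostResources) -> None:
        """Models `lsadmin limrestart`: the restarted LIM reads every value afresh."""
        self.reported = current


lim = LocalLimModel(HostResources(ncpus=8, maxmem=32768, maxswp=8192, maxtmp=10240))
# Memory doubled without a processor change: polling does not report it.
print(lim.poll(HostResources(ncpus=8, maxmem=65536, maxswp=8192, maxtmp=10240)))  # None
```

In this model, the memory-only change stays invisible until limrestart is called, which mirrors the need to run lsadmin limrestart in that case.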
There may be a delay of up to 2 minutes before the changes are recognized by lsxxx commands (for example, before lshosts displays the changes).
There may be a delay of up to 12 minutes (2 + 10) before the changes are recognized by bxxx commands (for example, before bhosts -l displays the changes).
This is because, in addition to the local LIM's 2-minute check, mbatchd contacts the master LIM at an internal interval of 10 minutes.
Configuration changes in a local cluster are communicated from the master LIM to a remote cluster every 2 * CACHE_INTERVAL. CACHE_INTERVAL is configured in lsf.cluster.cluster_name and defaults to 60 seconds.
As a result, a remote cluster may take up to 2 minutes + 2 * CACHE_INTERVAL (4 minutes with the default value) to recognize the changes.
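The worst-case delays above are simple sums of the stated intervals. The following sketch just does that arithmetic; the constant and function names are hypothetical, and the interval values are the ones given in this section.

```python
# Worst-case propagation delays for a dynamic hardware change, in seconds.
# Interval values come from this section; all names are hypothetical.
LIM_CHECK_INTERVAL = 120      # local LIM checks ncpus every 2 minutes
MBATCHD_LIM_INTERVAL = 600    # mbatchd contacts the master LIM every 10 minutes
DEFAULT_CACHE_INTERVAL = 60   # CACHE_INTERVAL default (lsf.cluster.cluster_name)


def max_delay_lsxxx() -> int:
    """Upper bound before lsxxx commands (for example, lshosts) show the change."""
    return LIM_CHECK_INTERVAL


def max_delay_bxxx() -> int:
    """Upper bound before bxxx commands (for example, bhosts -l) show the change."""
    return LIM_CHECK_INTERVAL + MBATCHD_LIM_INTERVAL


def max_delay_remote(cache_interval: int = DEFAULT_CACHE_INTERVAL) -> int:
    """Upper bound before a remote cluster sees the change."""
    return LIM_CHECK_INTERVAL + 2 * cache_interval


for label, seconds in [
    ("lsxxx", max_delay_lsxxx()),
    ("bxxx", max_delay_bxxx()),
    ("remote cluster (default CACHE_INTERVAL)", max_delay_remote()),
]:
    print(f"{label}: {seconds // 60} min")
# prints:
# lsxxx: 2 min
# bxxx: 12 min
# remote cluster (default CACHE_INTERVAL): 4 min
```

Lowering CACHE_INTERVAL shortens only the remote-cluster bound; the local lsxxx and bxxx bounds depend on the fixed internal intervals.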
LSF uses ncpus, maxmem, maxswp, and maxtmp to make scheduling and load decisions.
When processors are added or removed, LSF licensing is affected because LSF licenses are based on the number of processors.
If you take a processor offline, the dynamic hardware change has the following effects:
Per-host or per-queue load thresholds may be exceeded sooner, because LSF uses the number of CPUs and relative CPU speeds to calculate the effective run queue length.
The values of the CPU run queue lengths (r15s, r1m, and r15m) increase.
Jobs may also be suspended or not dispatched because of load thresholds.
The per-processor job slot limit (PJOB_LIMIT in lsb.queues) may be exceeded sooner.
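To see why fewer processors make run queue thresholds trip sooner, consider the simplified model below. It is an assumption, not LSF's actual calculation: it divides the raw run queue length by the number of online processors and ignores relative CPU speed, which LSF's effective run queue length also takes into account. The function names and the threshold value are hypothetical.

```python
def per_cpu_run_queue(run_queue_length: float, ncpus: int) -> float:
    """Simplified stand-in for the effective run queue length.

    ASSUMPTION: plain division by ncpus; LSF's real calculation also
    factors in relative CPU speed.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return run_queue_length / ncpus


def exceeds_threshold(run_queue_length: float, ncpus: int, threshold: float) -> bool:
    """True if the scaled run queue length is above a hypothetical threshold."""
    return per_cpu_run_queue(run_queue_length, ncpus) > threshold


# Same workload (6 runnable processes) against a hypothetical threshold of 2.0.
print(exceeds_threshold(6, 4, 2.0))  # 1.5 > 2.0 -> False with 4 CPUs online
print(exceeds_threshold(6, 2, 2.0))  # 3.0 > 2.0 -> True after 2 CPUs go offline
```

The workload is unchanged; only the processor count drops, so the scaled value rises and crosses the threshold. Bringing processors back online reverses the effect.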
If you bring a new processor online, the dynamic hardware change has the following effects:
Load thresholds may be reached later.
The values of the CPU run queue lengths (r15s, r1m, and r15m) decrease.
Jobs suspended due to load thresholds may be resumed.
The per-processor job slot limit (PJOB_LIMIT in lsb.queues) may be reached later.
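Because PJOB_LIMIT is defined per processor, the number of job slots a queue can use on a host scales with that host's processor count. The sketch below assumes the host-level cap is simply PJOB_LIMIT multiplied by ncpus and compares against it without rounding; the function names are hypothetical, and the exact rounding LSF applies to fractional limits is not modeled.

```python
def queue_slot_cap(pjob_limit: float, ncpus: int) -> float:
    """ASSUMPTION: per-host cap for a queue is PJOB_LIMIT times the processor count."""
    return pjob_limit * ncpus


def is_over_limit(running_jobs: int, pjob_limit: float, ncpus: int) -> bool:
    """True if the queue already runs more jobs on the host than the cap allows."""
    return running_jobs > queue_slot_cap(pjob_limit, ncpus)


def can_dispatch(running_jobs: int, pjob_limit: float, ncpus: int) -> bool:
    """True if one more job from the queue fits under the cap."""
    return running_jobs < queue_slot_cap(pjob_limit, ncpus)


# Hypothetical queue with PJOB_LIMIT = 2 running 8 jobs on one host.
print(can_dispatch(8, 2, 4))   # cap 8: False, the limit is reached
print(is_over_limit(8, 2, 2))  # 2 CPUs taken offline, cap 4: True, limit exceeded
print(can_dispatch(8, 2, 6))   # 2 CPUs brought online, cap 12: True
```

The same running workload can move from "at the limit" to "over the limit" or "room to dispatch" purely because the processor count changed.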