The following commands allow for control and monitoring of host power state management.
The hpower subcommand of badmin manually switches idle hosts (individual hosts and host groups, including compute unit and host partition hosts) into a power saving state or back to a working state. The syntax is:
badmin hpower suspend | resume [-C comments] host_name […]
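A script that wraps this command can assemble its arguments from the syntax above. The following is a minimal sketch, not part of LSF: the helper name `build_hpower_command` is hypothetical, and it only builds the argument list without running anything.

```python
from typing import List, Optional, Sequence


def build_hpower_command(action: str, hosts: Sequence[str],
                         comment: Optional[str] = None) -> List[str]:
    """Build an argv list for: badmin hpower suspend | resume [-C comments] host_name ...

    Hypothetical helper: it assembles arguments only and never calls LSF.
    """
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    if not hosts:
        raise ValueError("at least one host name is required")
    argv = ["badmin", "hpower", action]
    if comment is not None:
        # -C attaches an administrator comment to the request.
        argv += ["-C", comment]
    argv += list(hosts)
    return argv


# Show the command line instead of executing it.
print(" ".join(build_hpower_command("suspend", ["host1", "host2"])))
```

On a host where LSF is installed, the returned list could be passed to `subprocess.run`; keeping construction and execution separate makes the wrapper easy to check without a cluster.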
Options:
Use badmin hist and badmin hhist to retrieve the historical information about the power state changes of hosts.
All power-related events are logged, both for badmin hpower actions and for actions triggered automatically by a configured PowerPolicy.
Power State Action | Performed by | Success/Fail | Logged Events |
---|---|---|---|
Suspend | By badmin hpower | On Success | Host <host_name> suspend request from administrator <cluster_admin_name>. Host <host_name> suspend request done. Host <host_name> suspend. |
Suspend | By badmin hpower | On Failure | Host <host_name> suspend request from administrator <cluster_admin_name>. Host <host_name> suspend request failed. Host <host_name> power unknown. |
Suspend | By PowerPolicy | On Success | Host <host_name> suspend request from power policy <policy_name>. Host <host_name> suspend request done. Host <host_name> suspend. |
Suspend | By PowerPolicy | On Failure | Host <host_name> suspend request from power policy <policy_name>. Host <host_name> suspend request failed. Host <host_name> power unknown. |
Resume | By badmin hpower | On Success | Host <host_name> resume request from administrator <cluster_admin_name>. Host <host_name> resume request done. Host <host_name> on. |
Resume | By badmin hpower | On Failure | Host <host_name> resume request from administrator <cluster_admin_name>. Host <host_name> resume request exit. Host <host_name> power unknown. |
Resume | By PowerPolicy | On Success | Host <host_name> resume request from power policy <policy_name>. Host <host_name> resume request done. Host <host_name> on. |
Resume | By PowerPolicy | On Failure | Host <host_name> resume request from power policy <policy_name>. Host <host_name> resume request exit. Host <host_name> power unknown. |
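The message templates in the table can be matched mechanically when post-processing history output. The sketch below is an illustration only: the function name `classify_event` is hypothetical, it assumes each message arrives as its own string, and because it is unclear whether real log lines keep the angle brackets around names, the patterns treat them as optional.

```python
import re
from typing import Dict, Optional

# Patterns mirror the templates in the table above; angle brackets are optional
# (an assumption about how real log lines are written).
_REQUEST = re.compile(
    r"Host <?(?P<host>[^\s<>]+)>? (?P<action>suspend|resume) request from "
    r"(?P<source>administrator|power policy) <?(?P<who>[^\s<>]+?)>?\.?"
)
_RESULT = re.compile(
    r"Host <?(?P<host>[^\s<>]+)>? (?P<action>suspend|resume) request "
    r"(?P<result>done|failed|exit)\.?"
)
_STATE = re.compile(
    r"Host <?(?P<host>[^\s<>]+)>? (?P<state>suspend|on|power unknown)\.?"
)


def classify_event(message: str) -> Optional[Dict[str, str]]:
    """Return the parsed fields of one power event message, or None if it does not match."""
    text = message.strip()
    for kind, pattern in (("request", _REQUEST), ("result", _RESULT), ("state", _STATE)):
        match = pattern.fullmatch(text)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


print(classify_event("Host <host1> suspend request failed."))
```

A failed operation then shows up as a `result` of `failed` (suspend) or `exit` (resume), followed by a `state` of `power unknown`, matching the failure rows of the table.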
Use bhosts -l to display the power state of hosts. bhosts shows the power state of a host only when PowerPolicy (in lsb.resources) is enabled. If the host status becomes unknown because a power operation failed, the power state is shown as a dash (“-”).
Final power states:
Intermediate power states:
The following states are displayed when mbatchd has sent a request for a power operation but the operation has not yet returned. When the operation command returns, LSF assumes the operation is done and replaces the intermediate state.
Final host state under administrator control:
Final host state under policy control:
Example bhosts:
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0 0
host3 unavail - 4 0 0 0 0 0
Example bhosts -w:
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed_Power - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0 0
host3 unavail - 4 0 0 0 0 0
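For monitoring scripts, the STATUS column of this output can be read programmatically. The following sketch assumes whitespace-separated columns shaped like the example above; the function names and the embedded sample text are illustrative, and the code only filters on the `_Power` suffix seen in the examples without interpreting it further.

```python
from typing import Dict, List

# Sample shaped like the bhosts -w example above (illustrative data only).
SAMPLE_BHOSTS = """\
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
host1 closed_Power - 4 0 0 0 0 0
host2 ok_Power - 4 0 0 0 0 0
host3 unavail - 4 0 0 0 0 0
"""


def host_statuses(bhosts_output: str) -> Dict[str, str]:
    """Map each host name to the value of its STATUS column."""
    statuses: Dict[str, str] = {}
    for line in bhosts_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            statuses[fields[0]] = fields[1]
    return statuses


def power_flagged_hosts(statuses: Dict[str, str]) -> List[str]:
    """Return the hosts whose status carries the _Power suffix."""
    return sorted(host for host, status in statuses.items() if status.endswith("_Power"))


print(power_flagged_hosts(host_statuses(SAMPLE_BHOSTS)))
```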
Example bhosts -l:
HOST host1
STATUS CPUF JL/U MAX NJOBS RUN SSUSP USUSP RSV DISPATCH_WINDOW
closed_Power 1.00 - 4 4 4 0 0 - -
CURRENT LOAD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem slots
Total 0.0 0.0 0.0 0% 0.0 0 0 0 31G 31G 12G 0
Reserved 0.0 0.0 0.0 0% 0.0 0 0 0 0M 0M 4096M -
LOAD THRESHOLD USED FOR SCHEDULING:
r15s r1m r15m ut pg io ls it tmp swp mem
loadSched - - - - - - - - - - -
loadStop - - - - - - - - - - -
POWER STATUS: ok
IDLE TIME: 2m 12s
CYCLE TIME REMAINING: 3m 1s
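The three power-related lines at the end of this output are simple `KEY: value` pairs, so they are easy to extract. The sketch below is a hypothetical helper, not an LSF tool; it assumes durations use the `2m 12s` form shown above, and support for an `h` (hours) unit is a guess, since the example shows only minutes and seconds.

```python
import re
from typing import Dict

_DURATION_PART = re.compile(r"(\d+)\s*([hms])")
# 'h' is an assumption; the example output only shows 'm' and 's'.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_POWER_KEYS = ("POWER STATUS", "IDLE TIME", "CYCLE TIME REMAINING")


def parse_duration(text: str) -> int:
    """Convert a duration such as '2m 12s' to a number of seconds."""
    return sum(int(number) * _UNIT_SECONDS[unit]
               for number, unit in _DURATION_PART.findall(text))


def parse_power_fields(bhosts_l_output: str) -> Dict[str, str]:
    """Collect the power-related 'KEY: value' lines from bhosts -l output."""
    fields: Dict[str, str] = {}
    for line in bhosts_l_output.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() in _POWER_KEYS:
            fields[key.strip()] = value.strip()
    return fields


sample = "POWER STATUS: ok\nIDLE TIME: 2m 12s\nCYCLE TIME REMAINING: 3m 1s\n"
power = parse_power_fields(sample)
print(power["POWER STATUS"], parse_duration(power["IDLE TIME"]))
```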
When a host in energy saving state is switched to working state by a job (that is, the job has been dispatched and is waiting for the host to resume), the job's state is not shown as pending. Instead, it is displayed as provisioning (PROV). For example:
bjobs
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
204 root PROV normal host2 host1 sleep 9999 Jun 5 15:24
The PROV state shows that the job has been dispatched to a suspended host and that this host is being resumed. The job remains in PROV state until LSF dispatches the job.
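A script can pick out jobs that are waiting on a resuming host by filtering on the STAT column. This is a sketch under the assumption that every row has all columns filled in, as in the example above; a row with an empty column would shift the fields. The function name and sample text are illustrative.

```python
from typing import Dict, List

SAMPLE_BJOBS = """\
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
204 root PROV normal host2 host1 sleep 9999 Jun 5 15:24
"""


def provisioning_jobs(bjobs_output: str) -> List[Dict[str, str]]:
    """Return the jobs whose STAT column is PROV."""
    jobs: List[Dict[str, str]] = []
    for line in bjobs_output.strip().splitlines()[1:]:  # skip the header row
        # Split only the first six columns; JOB_NAME and SUBMIT_TIME contain spaces.
        fields = line.split(None, 6)
        if len(fields) >= 6 and fields[2] == "PROV":
            jobs.append({
                "jobid": fields[0],
                "user": fields[1],
                "queue": fields[3],
                "from_host": fields[4],
                "exec_host": fields[5],
            })
    return jobs


print(provisioning_jobs(SAMPLE_BJOBS))
```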
When a job requires a host that is in energy saving state or powered off, and LSF is switching the host to working state, the following event is appended to the bjobs -l output:
Mon Nov 5 16:40:47: Will start on 2 Hosts <host1> <host2>. Waiting for machine provisioning;
The message indicates which host is being provisioned and how many slots are requested.
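The count and host names can be pulled out of this message with a regular expression. The pattern below is derived only from the single example line above, so treat it as a sketch; the function name `provisioning_targets` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Pattern derived from the one example message shown above.
_START_RE = re.compile(
    r"Will start on (\d+) Hosts((?: <[^<>\s]+>)+)\. Waiting for machine provisioning"
)


def provisioning_targets(line: str) -> Optional[Tuple[int, List[str]]]:
    """Return (leading count, host names) from the event line, or None if it does not match."""
    match = _START_RE.search(line)
    if match is None:
        return None
    hosts = re.findall(r"<([^<>\s]+)>", match.group(2))
    return int(match.group(1)), hosts


event = ("Mon Nov 5 16:40:47: Will start on 2 Hosts <host1> <host2>. "
         "Waiting for machine provisioning;")
print(provisioning_targets(event))
```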
When a job is dispatched to a standby host and this triggers provisioning to resume the host to working state, two events are saved to lsb.events and lsb.streams. For example:
Tue Nov 19 01:29:20: Host is being provisioned for job. Waiting for host <xxxx> to power on;
Tue Nov 19 01:30:06: Host provisioning is done;
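The gap between these two events gives the time the host took to resume. The sketch below computes it from lines in the format shown above. The event text carries no year, so the caller must supply one; that parameter, like the function name, is an assumption of this example.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

_STAMPED = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}): (.*)$")


def provisioning_duration(lines: Iterable[str], year: int) -> Optional[timedelta]:
    """Return the time between the 'being provisioned' and 'provisioning is done' events."""
    started = finished = None
    for line in lines:
        match = _STAMPED.match(line.strip())
        if match is None:
            continue
        # The events omit the year, so it is supplied by the caller.
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %a %b %d %H:%M:%S")
        message = match.group(2)
        if message.startswith("Host is being provisioned for job"):
            started = stamp
        elif message.startswith("Host provisioning is done"):
            finished = stamp
    if started is None or finished is None:
        return None
    return finished - started


events = [
    "Tue Nov 19 01:29:20: Host is being provisioned for job. Waiting for host <xxxx> to power on;",
    "Tue Nov 19 01:30:06: Host provisioning is done;",
]
print(provisioning_duration(events, year=2013))
```

For the two example events, the result is 46 seconds.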
Use bresources -p to show the configured energy aware scheduling policies. For example:
bresources -p
Begin PowerPolicy
NAME = policy_night
HOSTS = hostGroup1 host3
TIME_WINDOW= 23:59-5:00
MIN_IDLE_TIME= 1800
CYCLE_TIME= 60
APPLIED = Yes
End PowerPolicy
Begin PowerPolicy
NAME = policy_other
HOSTS = all
TIME_WINDOW= all
APPLIED = Yes
End PowerPolicy
In the above case, “policy_night” is defined only for hostGroup1 and host3 and applies between 23:59 and 5:00. In contrast, “policy_other” covers all other hosts not included in “policy_night” (with the exception of master and master candidate hosts) and is in effect at all hours.
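The output above can be parsed into one dictionary per policy, and a window such as 23:59-5:00 has to be treated as wrapping past midnight. The following is a minimal sketch with hypothetical function names: it handles only `all` and a single `HH:MM-HH:MM` range, and it treats both endpoints as inclusive, which is an assumption rather than documented LSF behavior.

```python
from datetime import time
from typing import Dict, List, Optional

SAMPLE_POLICIES = """\
Begin PowerPolicy
NAME = policy_night
HOSTS = hostGroup1 host3
TIME_WINDOW= 23:59-5:00
MIN_IDLE_TIME= 1800
CYCLE_TIME= 60
APPLIED = Yes
End PowerPolicy
Begin PowerPolicy
NAME = policy_other
HOSTS = all
TIME_WINDOW= all
APPLIED = Yes
End PowerPolicy
"""


def parse_power_policies(text: str) -> List[Dict[str, str]]:
    """Parse Begin/End PowerPolicy blocks into a list of key/value dictionaries."""
    policies: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Begin PowerPolicy":
            current = {}
        elif line == "End PowerPolicy":
            if current is not None:
                policies.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return policies


def _parse_clock(text: str) -> time:
    hour, minute = text.strip().split(":")
    return time(int(hour), int(minute))


def window_is_active(window: str, now: time) -> bool:
    """Check whether 'now' falls inside 'all' or a single HH:MM-HH:MM window."""
    if window.strip().lower() == "all":
        return True
    start_text, end_text = window.split("-")
    start, end = _parse_clock(start_text), _parse_clock(end_text)
    if start <= end:
        return start <= now <= end
    # The window wraps past midnight, for example 23:59-5:00.
    return now >= start or now <= end


for policy in parse_power_policies(SAMPLE_POLICIES):
    print(policy["NAME"], window_is_active(policy["TIME_WINDOW"], time(2, 0)))
```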