Monitor the progress of an SLA (bsla)

Procedure

Run bsla to display the properties of service classes configured in lsb.serviceclasses and dynamic information about the state of each configured service class.

Examples

  • The guarantee SLA bigMemSLA has 10 slots guaranteed, limited to one slot per host.

    bsla
    SERVICE CLASS NAME:  bigMemSLA
     -- 
    ACCESS CONTROL: QUEUES[normal] 
    AUTO ATTACH: Y
     
    GOAL:  GUARANTEE 
     
    POOL NAME                    TYPE  GUARANTEED   USED
    bigMemPool                  slots          10      0
  • One velocity goal of service class Tofino is active and on time. The other configured velocity goal is inactive.

    bsla
    SERVICE CLASS NAME: Tofino
      -- day and night velocity 
    PRIORITY: 20 
     
    GOAL:  VELOCITY 30  
    ACTIVE WINDOW: (17:30-8:30)  
    STATUS:  Inactive 
    SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD 
     
    GOAL: VELOCITY 10 
    ACTIVE WINDOW: (9:00-17:00) 
    STATUS: Active:On time 
    SLA THROUGHPUT:  10.00 JOBS/CLEAN_PERIOD     
     
    NJOBS   PEND    RUN     SSUSP   USUSP   FINISH    
     300    280      10        0       0      10
  • The deadline goal of service class Sooke is not being met, so bsla displays the status Active:Delayed.

    bsla
    SERVICE CLASS NAME:  Sooke
      -- working hours 
    PRIORITY: 20 
     
    GOAL:  DEADLINE  
    ACTIVE WINDOW: (8:30-19:00)  
    STATUS:  Active:Delayed 
    SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD 
     
    ESTIMATED FINISH TIME:  (Tue Oct 28 06:17) 
    OPTIMUM NUMBER OF RUNNING JOBS:  6    
    NJOBS   PEND    RUN     SSUSP   USUSP   FINISH     
     40     39       1        0       0       0
  • The configured velocity goal of the service class Duncan is active and on time. The configured deadline goal of the service class is inactive.

    bsla Duncan 
    SERVICE CLASS NAME:  Duncan
       -- Daytime/Nighttime SLA 
    PRIORITY:  23 
    USER_GROUP:  user1 user2 
     
    GOAL:  VELOCITY 8 
    ACTIVE WINDOW: (9:00-17:30)  
    STATUS:  Active:On time 
    SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD 
     
    GOAL:  DEADLINE  
    ACTIVE WINDOW: (17:30-9:00)  
    STATUS:  Inactive 
    SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD    
     
    NJOBS   PEND    RUN     SSUSP   USUSP   FINISH      
     0      0       0        0       0       0
  • The throughput goal of service class Sidney is always active. bsla displays:

    • Status as active and on time

    • An optimum number of 5 running jobs to meet the goal

    • Actual throughput of 10 jobs per hour based on the last CLEAN_PERIOD

    bsla Sidney
    SERVICE CLASS NAME:  Sidney
      -- constant throughput 
    PRIORITY:  20 
     
    GOAL:  THROUGHPUT 6 
    ACTIVE WINDOW: Always Open  
    STATUS:  Active:On time 
    SLA THROUGHPUT:  10.00 JOBs/CLEAN_PERIOD 
    OPTIMUM NUMBER OF RUNNING JOBS:  5    
     
    NJOBS   PEND    RUN     SSUSP   USUSP   FINISH    
     110     95       5        0       0      10
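
When several service classes are configured, it is tedious to scan long bsla output by eye for delayed goals. The following Python sketch extracts the service class name, goal, active window, and status from bsla text, then lists the goals whose status reports Delayed. It assumes the long-format layout shown in the examples above, with one "Key: value" field per line. The function names goal_statuses and delayed_goals are hypothetical and are not part of LSF.

```python
def goal_statuses(bsla_text):
    """Return (service_class, goal, active_window, status) tuples from bsla output.

    Assumes the long-format layout shown in the examples: a
    'SERVICE CLASS NAME:' line per class, then one block per goal that starts
    with 'GOAL:' and may contain 'ACTIVE WINDOW:' and 'STATUS:' lines.
    """
    rows = []
    service_class = None
    current = None
    for raw in bsla_text.splitlines():
        # Split on the first colon only, so values such as
        # '(17:30-8:30)' or 'Active:On time' stay intact.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank lines, descriptions, counter tables
        value = value.strip()
        if key == "SERVICE CLASS NAME":
            service_class = value
            current = None
        elif key == "GOAL":
            current = [service_class, value, None, None]
            rows.append(current)
        elif current is not None and key == "ACTIVE WINDOW":
            current[2] = value
        elif current is not None and key == "STATUS":
            current[3] = value
    return [tuple(row) for row in rows]


def delayed_goals(bsla_text):
    """Goals whose STATUS reports that the SLA is behind (Active:Delayed)."""
    return [row for row in goal_statuses(bsla_text)
            if row[3] is not None and "Delayed" in row[3]]


# Trimmed excerpt of the Tofino and Sooke examples above.
sample = """SERVICE CLASS NAME: Tofino
GOAL:  VELOCITY 30
ACTIVE WINDOW: (17:30-8:30)
STATUS:  Inactive
GOAL: VELOCITY 10
ACTIVE WINDOW: (9:00-17:00)
STATUS: Active:On time
SERVICE CLASS NAME:  Sooke
GOAL:  DEADLINE
ACTIVE WINDOW: (8:30-19:00)
STATUS:  Active:Delayed
"""
print(delayed_goals(sample))  # [('Sooke', 'DEADLINE', '(8:30-19:00)', 'Active:Delayed')]
```

The parser ignores fields it does not recognize, such as PRIORITY and SLA THROUGHPUT. It can therefore be given captured bsla output unchanged, for example subprocess.run(["bsla"], capture_output=True, text=True).stdout.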

View jobs running in an SLA (bjobs)

Procedure

Run bjobs -sla to display jobs running in a service class:
bjobs -sla Sidney
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
136     user1   RUN   normal     hostA       hostA       sleep 100  Sep 28 13:24
137     user1   RUN   normal     hostA       hostB       sleep 100  Sep 28 13:25
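
To see how the jobs of an SLA are spread across hosts, you can tally bjobs -sla output by EXEC_HOST. This Python sketch assumes the default column layout shown above and that each job runs on a single host. JOB_NAME and SUBMIT_TIME can contain spaces, so only the first six columns are split on whitespace. The function name jobs_per_host is hypothetical.

```python
from collections import Counter


def jobs_per_host(bjobs_text):
    """Count RUN jobs per execution host in 'bjobs -sla' output.

    Assumes the default column layout shown above and single-host jobs.
    Only the first six columns (JOBID through EXEC_HOST) are split on
    whitespace, because JOB_NAME and SUBMIT_TIME can contain spaces.
    """
    counts = Counter()
    for line in bjobs_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 6)
        if len(fields) >= 6 and fields[2] == "RUN":
            counts[fields[5]] += 1  # fields[5] is EXEC_HOST
    return counts


output = """JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
136     user1   RUN   normal     hostA       hostA       sleep 100  Sep 28 13:24
137     user1   RUN   normal     hostA       hostB       sleep 100  Sep 28 13:25
"""
print(dict(jobs_per_host(output)))  # {'hostA': 1, 'hostB': 1}
```

Parallel jobs that span several hosts are reported differently by bjobs, and this sketch does not attempt to handle them.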

For time-based SLAs, use -sla with -g to display job groups attached to a service class. Once a job group is attached to a time-based service class, all jobs submitted to that group are subject to the SLA.

Track historical behavior of an SLA (bacct)

Procedure

Run bacct to display the historical performance of a service class. For example, service classes Sidney and Surrey are configured with throughput goals, as the following bsla output shows:
bsla
SERVICE CLASS NAME:  Sidney
  -- throughput 6  
PRIORITY:  20 
 
GOAL:  THROUGHPUT 6 
ACTIVE WINDOW: Always Open  
STATUS:  Active:On time 
SLA THROUGHPUT:  10.00 JOBs/CLEAN_PERIOD 
OPTIMUM NUMBER OF RUNNING JOBS:  5    
 
NJOBS   PEND    RUN     SSUSP   USUSP   FINISH    
 111     94       5        0       0      12 
----------------------------------------------
SERVICE CLASS NAME:  Surrey
  -- throughput 3 
PRIORITY:  15 
 
GOAL:  THROUGHPUT 3 
ACTIVE WINDOW: Always Open  
STATUS:  Active:On time 
SLA THROUGHPUT:  4.00 JOBs/CLEAN_PERIOD 
OPTIMUM NUMBER OF RUNNING JOBS:  4    
 
NJOBS   PEND    RUN     SSUSP   USUSP   FINISH    
 104     96       4        0       0       4

These two service classes have the following historical performance. For SLA Sidney, bacct shows a total throughput of 8.94 jobs per hour over a period of 20.58 hours:

bacct -sla Sidney
Accounting information about jobs that are:
    - submitted by users user1,
    - accounted on all projects.
    - completed normally or exited
    - executed on all hosts.
    - submitted to all queues.
    - accounted on service classes Sidney,  
----------------------------------------------
SUMMARY:      ( time unit: second )   
Total number of done jobs:     183      Total number of exited jobs:     1  
Total CPU time consumed:      40.0      Average CPU time consumed:     0.2  
Maximum CPU time of a job:     0.3      Minimum CPU time of a job:     0.1  
Total wait time in queues: 1947454.0  
Average wait time in queue:10584.0  
Maximum wait time in queue:18912.0      Minimum wait time in queue:    7.0  
Average turnaround time:     12268 (seconds/job)  
Maximum turnaround time:     22079      Minimum turnaround time:      1713  
Average hog factor of a job:  0.00 ( cpu time / turnaround time )  
Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
Total throughput:             8.94 (jobs/hour)  during   20.58 hours 
Beginning time:       Oct 11 20:23      Ending time:          Oct 12 16:58

For SLA Surrey, bacct shows a total throughput of 4.36 jobs per hour over a period of 19.95 hours:

bacct -sla Surrey
Accounting information about jobs that are:
    - submitted by users user1,
    - accounted on all projects.
    - completed normally or exited.
    - executed on all hosts.
    - submitted to all queues.
    - accounted on service classes Surrey, 
----------------------------------------- 
 
SUMMARY:      ( time unit: second )   
Total number of done jobs:      87      Total number of exited jobs:     0  
Total CPU time consumed:      18.0      Average CPU time consumed:     0.2  
Maximum CPU time of a job:     0.3      Minimum CPU time of a job:     0.1  
Total wait time in queues: 2371955.0  
Average wait time in queue:27263.8  
Maximum wait time in queue:39125.0      Minimum wait time in queue:    7.0  
Average turnaround time:     30596 (seconds/job)  
Maximum turnaround time:     44778      Minimum turnaround time:      3355  
Average hog factor of a job:  0.00 ( cpu time / turnaround time )  
Maximum hog factor of a job:  0.00      Minimum hog factor of a job:  0.00
Total throughput:             4.36 (jobs/hour)  during   19.95 hours 
Beginning time:       Oct 11 20:50      Ending time:          Oct 12 16:47

Because job run times are not uniform, both service classes achieve higher throughput than their configured goals: 8.94 jobs per hour for Sidney (goal 6) and 4.36 jobs per hour for Surrey (goal 3).
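
You can cross-check the bacct totals above by hand. Both examples are reproduced by dividing the number of done plus exited jobs by the hours between the beginning and ending times: 184 jobs over 20.58 hours for Sidney, and 87 jobs over 19.95 hours for Surrey. The Python sketch below assumes that formula and that both timestamps fall in the same year. The formula is inferred from these two outputs, not taken from bacct documentation. total_throughput is a hypothetical helper.

```python
from datetime import datetime


def total_throughput(done, exited, begin, end):
    """Return (jobs per hour, hours) for a bacct summary.

    Assumes throughput = (done + exited jobs) / (ending time - beginning time),
    which reproduces both bacct examples above. bacct timestamps carry no
    year, so a placeholder leap year (2000) is prepended to both; this breaks
    if the interval crosses a year boundary.
    """
    fmt = "%Y %b %d %H:%M"
    start = datetime.strptime("2000 " + begin, fmt)
    stop = datetime.strptime("2000 " + end, fmt)
    hours = (stop - start).total_seconds() / 3600
    return (done + exited) / hours, hours


# Figures copied from the bacct and bsla examples above.
for name, done, exited, begin, end, goal in [
    ("Sidney", 183, 1, "Oct 11 20:23", "Oct 12 16:58", 6),
    ("Surrey", 87, 0, "Oct 11 20:50", "Oct 12 16:47", 3),
]:
    rate, hours = total_throughput(done, exited, begin, end)
    print(f"{name}: {rate:.2f} jobs/hour over {hours:.2f} hours, goal {goal}, met: {rate >= goal}")
# Sidney: 8.94 jobs/hour over 20.58 hours, goal 6, met: True
# Surrey: 4.36 jobs/hour over 19.95 hours, goal 3, met: True
```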

View parallel jobs in an EGO-enabled SLA

Procedure

Run bsla -N to display service class job counters in job slots instead of number of jobs. NSLOTS, PEND, RUN, SSUSP, and USUSP are all counted in slots rather than in jobs:

user1@system-02-461: bsla -N SLA1
SERVICE CLASS NAME:  SLA1
PRIORITY:  10
CONSUMER:  sla1
EGO_RES_REQ: any host
MAX_HOST_IDLE_TIME: 120
EXCLUSIVE:  N
GOAL:  VELOCITY 1
ACTIVE WINDOW: Always Open
STATUS:  Active:On time
SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD
   NSLOTS  PEND    RUN     SSUSP   USUSP
    42     28      14        0       0
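
In every counter table in this section, the first column (NJOBS, or NSLOTS with -N) equals the sum of the remaining columns. For SLA1, for example, 42 = 28 + 14 + 0 + 0. The Python sketch below parses that table and checks the relationship. This is an assumption drawn from these samples, not a documented rule. The helper names parse_counters and counters_consistent are hypothetical.

```python
def parse_counters(bsla_text):
    """Return the NJOBS or NSLOTS counter table of bsla output as a dict.

    Assumes a header row starting with NJOBS (or NSLOTS with -N) that is
    immediately followed by one row of integers, as in the examples above.
    Returns None when no such table is present (for example, bigMemSLA).
    """
    rows = [line.split() for line in bsla_text.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        if header[0] in ("NJOBS", "NSLOTS"):
            return dict(zip(header, map(int, values)))
    return None


def counters_consistent(counters):
    """Check that the total (first column) equals the sum of the other columns."""
    total_key, *state_keys = counters
    return counters[total_key] == sum(counters[k] for k in state_keys)


sample = """SLA THROUGHPUT:  0.00 JOBS/CLEAN_PERIOD
   NSLOTS  PEND    RUN     SSUSP   USUSP
    42     28      14        0       0
"""
counters = parse_counters(sample)
print(counters)                       # {'NSLOTS': 42, 'PEND': 28, 'RUN': 14, 'SSUSP': 0, 'USUSP': 0}
print(counters_consistent(counters))  # True
```

A False result does not necessarily indicate a problem with the SLA. It only means the output does not follow the pattern seen in these examples.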