Configuration overview of guaranteed resource pools

Basic service class configuration

About this task

Service classes are configured in lsb.serviceclasses. At a minimum, for each service class to be used in a guarantee policy, you must specify the following parameters:

  • NAME = service_class_name: This is the name of the service class.
  • GOALS = [GUARANTEE]: Specifies the guarantee goal, which distinguishes this service class from other types of service class.

Optionally, your service class can have a description. Use the DESCRIPTION parameter.

The following is an example of a basic service class configuration:

Begin ServiceClass
NAME = myServiceClass
GOALS = [GUARANTEE]
DESCRIPTION = Example service class.
End ServiceClass

Once a service class is configured, you can submit jobs to it with the bsub -sla submission option:

bsub -sla myServiceClass ./a.out

The service class only defines the container for jobs. In order to complete the guarantee policy, you must also configure the pool. This is done in the GuaranteedResourcePool section of lsb.resources.

Basic guarantee policy configuration

About this task

At minimum, for GuaranteedResourcePool sections you need to provide values for the following parameters:

  • NAME = pool_name: The name of the guarantee policy/pool.
  • TYPE = slots | hosts | package[slots=num_slots:mem=mem_amount] | resource[rsrc_name]
    • The resources that compose the pool.
    • package means that each guaranteed unit consists of a number of slots and an amount of memory, together on the same host.
    • resource must be a License Scheduler managed resource.
  • DISTRIBUTION = [service_class, amount[%]] …
    • Specifies the amount of resources in the pool that each service class deserves (is guaranteed).
    • A percentage guarantee means percentage of the guaranteed resources in the pool.

Optionally, you can also include a description of a GuaranteedResourcePool using the DESCRIPTION parameter.

The following is an example of a guaranteed resource pool configuration:

Begin GuaranteedResourcePool
NAME = myPool
TYPE = slots
DISTRIBUTION = [myServiceClass, 10] [yourServiceClass, 15]
DESCRIPTION = Example guarantee policy.
End GuaranteedResourcePool
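
To make the DISTRIBUTION syntax concrete, the following Python sketch parses a DISTRIBUTION value into its service class entries and totals the absolute guarantees. It is an illustration only: parse_distribution is a hypothetical helper, not an LSF tool or API, and it handles only the [service_class, amount[%]] form described above.

```python
import re

def parse_distribution(value):
    """Parse a DISTRIBUTION value into (service_class, amount, is_percent) tuples.

    Hypothetical helper for illustration; LSF does its own parsing.
    """
    pattern = r"\[\s*([^,\]\s]+)\s*,\s*(\d+)\s*(%?)\s*\]"
    return [(name, int(amount), pct == "%")
            for name, amount, pct in re.findall(pattern, value)]

# The DISTRIBUTION value from the example pool above.
shares = parse_distribution("[myServiceClass, 10] [yourServiceClass, 15]")

# Sum only the absolute (non-percentage) guarantees.
total_guaranteed = sum(amount for _, amount, is_pct in shares if not is_pct)

print(shares)
print(total_guaranteed)
```

For the example pool, the sketch reports two absolute guarantees (10 slots for myServiceClass and 15 for yourServiceClass), or 25 guaranteed slots in total.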

Controlling access to a service class

About this task

You can control which jobs are allowed into a service class by setting the following parameter in the ServiceClass section:

ACCESS_CONTROL = [QUEUES[ queue ...]] [USERS[ [user_name] [user_group] ...]] [FAIRSHARE_GROUPS[user_group ...]] [APPS[app_name ...]] [PROJECTS[proj_name...]] [LIC_PROJECTS[license_proj...]]

Where:

  • QUEUES: restricts access based on queue
  • USERS: restricts access based on user
  • FAIRSHARE_GROUPS: restricts access based on bsub -G option
  • APPS: restricts access based on bsub -app option
  • PROJECTS: restricts access based on bsub -P option
  • LIC_PROJECTS: restricts access based on bsub -Lp option

When ACCESS_CONTROL is not configured for a service class, any job can be submitted to the service class with the -sla option. If ACCESS_CONTROL is configured and a job submitted to the service class does not meet its access control criteria, the submission is rejected.

The following example shows a service class that accepts only jobs submitted by user joe to the priority queue:

Begin ServiceClass
NAME = myServiceClass
GOALS = [GUARANTEE]
ACCESS_CONTROL = QUEUES[priority] USERS[joe]
DESCRIPTION = Example service class.
End ServiceClass
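
The access check for this example can be sketched in Python as follows. This is a simplified model, not LSF's implementation. It assumes that the configured keywords are combined with AND (as the example implies) and that a job matches a keyword when its value is one of the listed names. The job field names and the meets_access_control function are hypothetical, and user group expansion is not modeled.

```python
# Hypothetical mapping from ACCESS_CONTROL keywords to job attributes.
KEYWORD_TO_FIELD = {
    "QUEUES": "queue",
    "USERS": "user",
    "FAIRSHARE_GROUPS": "user_group",
    "APPS": "app",
    "PROJECTS": "project",
    "LIC_PROJECTS": "license_project",
}

def meets_access_control(job, access_control):
    """Return True if the job satisfies every configured keyword.

    An empty access_control models a service class without
    ACCESS_CONTROL, which accepts any job.
    """
    for keyword, allowed in access_control.items():
        if job.get(KEYWORD_TO_FIELD[keyword]) not in allowed:
            return False
    return True

# Models ACCESS_CONTROL = QUEUES[priority] USERS[joe]
acl = {"QUEUES": {"priority"}, "USERS": {"joe"}}

print(meets_access_control({"queue": "priority", "user": "joe"}, acl))
print(meets_access_control({"queue": "priority", "user": "bob"}, acl))
```

In this model, a job from joe in the priority queue is accepted, while a job from another user in the same queue is rejected.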

Have LSF automatically put jobs in service classes

About this task

A job can be associated with a service class by using the bsub -sla option to name the service class. You can also configure a service class so that LSF automatically tries to put a job in the service class if the job meets the access control criteria. Use the following parameter in the ServiceClass definition:

AUTO_ATTACH=Y

When a job is submitted without an explicitly specified service class (that is, without the bsub -sla option), LSF considers the service classes with AUTO_ATTACH=Y and puts the job in the first such service class for which the job meets the access control criteria. Each job can be associated with at most one service class.

The following is an example of a service class that automatically accepts jobs from user joe in queue priority:

Begin ServiceClass
NAME = myServiceClass
GOALS = [GUARANTEE]
ACCESS_CONTROL = QUEUES[priority] USERS[joe]
AUTO_ATTACH = Y
DESCRIPTION = Example service class.
End ServiceClass
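
The first-match behavior described above can be sketched in Python. This is a simplified model under stated assumptions, not LSF's implementation: "first" is taken to mean configuration order, access control keywords are combined with AND, and the dictionary layout and function names are hypothetical. Rejection of an explicit -sla submission that fails access control is not modeled.

```python
def accepts(job, access_control):
    """True if the job matches every configured criterion (assumed AND)."""
    return all(job.get(field) in allowed
               for field, allowed in access_control.items())

def auto_attach(job, service_classes):
    """Return the service class name for a job, or None.

    service_classes is assumed to be listed in configuration order.
    """
    if job.get("sla"):  # bsub -sla was specified explicitly
        return job["sla"]
    for sc in service_classes:
        if sc.get("auto_attach") and accepts(job, sc.get("access_control", {})):
            return sc["name"]  # first match wins; at most one class per job
    return None

service_classes = [
    {"name": "myServiceClass", "auto_attach": True,
     "access_control": {"queue": {"priority"}, "user": {"joe"}}},
    {"name": "yourServiceClass", "auto_attach": False},
]

print(auto_attach({"queue": "priority", "user": "joe"}, service_classes))
print(auto_attach({"queue": "normal", "user": "joe"}, service_classes))
```

A job from joe in the priority queue is attached to myServiceClass. A job from joe in another queue is not attached to any service class, because yourServiceClass does not set AUTO_ATTACH.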

Restricting the set of hosts in a guaranteed resource pool

About this task

Each host in the cluster can belong to at most one pool of type slots, hosts, or package. To restrict the set of hosts that can belong to a pool, use the following parameters:

  • RES_SELECT = select_string
  • HOSTS = host | hostgroup …

The syntax for RES_SELECT is the same as in bsub -R "select[…]".

When LSF starts up, it goes through the hosts and assigns each host to a pool that will accept the host, based on the pool’s RES_SELECT and HOSTS parameters. If multiple pools will accept the host, then the host will be assigned to the first pool according to the configuration order of the pools.

The following is an example of a guaranteed resource policy on hosts of type X86_64 from host group myHostGroup:

Begin GuaranteedResourcePool
NAME = myPool
TYPE = slots
RES_SELECT = type==X86_64
HOSTS = myHostGroup
DISTRIBUTION = [myServiceClass, 10] [yourServiceClass, 15]
End GuaranteedResourcePool
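
The host assignment rule (each host goes to the first pool, in configuration order, that accepts it) can be sketched in Python. This is a simplified model, not LSF's implementation. RES_SELECT is represented by a Python predicate rather than a real selection string, HOSTS by an already expanded set of host names, and all names other than myPool and myHostGroup are hypothetical.

```python
def pool_accepts(pool, host):
    """A pool accepts a host only if both HOSTS and RES_SELECT allow it."""
    if pool.get("hosts") is not None and host["name"] not in pool["hosts"]:
        return False
    if pool.get("select") is not None and not pool["select"](host):
        return False
    return True

def assign_hosts(hosts, pools):
    """Map each host to the first accepting pool, in configuration order."""
    assignment = {}
    for host in hosts:
        for pool in pools:
            if pool_accepts(pool, host):
                assignment[host["name"]] = pool["name"]
                break  # a host belongs to at most one pool
    return assignment

my_host_group = {"hostA", "hostC"}  # hypothetical members of myHostGroup
pools = [
    {"name": "myPool", "hosts": my_host_group,
     "select": lambda h: h["type"] == "X86_64"},
    {"name": "otherPool"},  # hypothetical pool with no restrictions
]
hosts = [
    {"name": "hostA", "type": "X86_64"},
    {"name": "hostB", "type": "X86_64"},
    {"name": "hostC", "type": "OTHER"},
]

print(assign_hosts(hosts, pools))
```

Only hostA satisfies both restrictions of myPool. The other two hosts fall through to the next pool in configuration order.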

Loaning resources from a pool

About this task

When LSF schedules, it tries to reserve sufficient resources from the pool in order to honor guarantees. By default, if these reserved resources cannot be used immediately to satisfy guarantees, then they are left idle. Optionally, you can configure loaning to allow other jobs to use these resources when they are not needed immediately for guarantees.

To enable loaning, use the following parameter in the pool:

LOAN_POLICIES = QUEUES[all | queue_name …] [RETAIN[amount[%]]] [DURATION[minutes]] [CLOSE_ON_DEMAND]

Where:

  • QUEUES[all | queue_name …]
    • This is the only required keyword.
    • Specifies which queues are allowed to loan from the pool.
    • Each additional queue that is permitted to loan can degrade scheduling performance, so add queues sparingly if scheduling performance is a concern.
  • RETAIN[amount[%]]
    • Without this keyword, when loan policies are enabled LSF can loan out all the resources in the pool to non-owners (that is, jobs without guarantees), so there might never be a free package, and guaranteed jobs can starve if resource reservation is not used. RETAIN can therefore be used as an alternative to resource reservation in such cases.
    • When RETAIN is set, then as long as there are unused guarantees, LSF will try to keep idle the amount of resources specified in RETAIN. These idle resources can only be used to honor guarantees. Whenever the number of free resources in the pool drops below the RETAIN amount, LSF stops loaning resources from the pool.
    • With RETAIN, LSF maintains an idle buffer. The number kept idle is: MIN(RETAIN, amount needed for unused guarantees).
    • For example, suppose that a service class owns 100% of a pool and RETAIN is 10. Initially, LSF will loan out all but 10 of the resources. If the service class then occupies those 10 resources, LSF will stop loaning to non-guaranteed jobs until more than 10 resources free up (as jobs finish).
  • DURATION[minutes]
    • Specifies that only jobs with runtime (-W) or expected runtime (-We) less than the given number of minutes are permitted loans from the pool.
    • This means that if a service class with a guarantee in the pool later has demand, it does not have to wait longer than the DURATION before its guarantee can be met.
  • CLOSE_ON_DEMAND
    • Tells LSF that loaning should be disabled whenever there are pending jobs belonging to service classes with guarantees in the pool.
    • This is a very conservative policy. It should generally only be used when the service classes with guarantees in the pool have workload submitted to them only infrequently.
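
The RETAIN idle buffer described above can be worked through with a short Python sketch. It models only the formula MIN(RETAIN, amount needed for unused guarantees) with an absolute RETAIN amount. The function names are hypothetical, and percentage RETAIN values, DURATION, and CLOSE_ON_DEMAND are not modeled.

```python
def idle_buffer(retain, guaranteed, in_use_by_owners):
    """Resources kept idle: MIN(RETAIN, amount needed for unused guarantees)."""
    unused_guarantees = max(0, guaranteed - in_use_by_owners)
    return min(retain, unused_guarantees)

def loanable(free, retain, guaranteed, in_use_by_owners):
    """Free resources that may be loaned without dipping into the buffer."""
    return max(0, free - idle_buffer(retain, guaranteed, in_use_by_owners))

# A service class owns 100% of a pool of 100 resources, and RETAIN is 10.
# Initially everything is free: all but 10 can be loaned.
print(loanable(free=100, retain=10, guaranteed=100, in_use_by_owners=0))

# 90 are on loan and the owner occupies the remaining 10: nothing is free.
print(loanable(free=0, retain=10, guaranteed=100, in_use_by_owners=10))

# 15 loaned resources free up: only those beyond the buffer can be loaned again.
print(loanable(free=15, retain=10, guaranteed=100, in_use_by_owners=10))
```

In this model, loaning resumes only when more than 10 resources are free, and the buffer shrinks to zero once the service class is using its entire guarantee.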

The following is an example of a guarantee package policy that loans resources to jobs in queue short, but keeps sufficient resources for 10 packages unavailable for loaning so it can honor guarantees immediately when there is demand from the service classes:

Begin GuaranteedResourcePool
NAME = myPool
TYPE = package[slots=1:mem=1024]
DISTRIBUTION = [myServiceClass, 10] [yourServiceClass, 15]
LOAN_POLICIES = QUEUES[short] RETAIN[10]
End GuaranteedResourcePool

Configuring a high priority queue to ignore guarantees

About this task

In some cases, you want guarantees to apply to batch workload, but you want some high priority interactive or administrative workloads to start running as soon as possible, without regard to guarantees.

You can configure a queue to ignore guarantee policies by setting the following parameter in the queue definition in lsb.queues:

SLA_GUARANTEES_IGNORE=Y

This parameter essentially allows the queue to violate any configured guarantee policies. The queue can take any resources that should be reserved for guarantees. As such, queues with this parameter set should have infrequent or limited workload.

The following example shows how to configure a high priority interactive queue to ignore guarantee policies:

Begin Queue
QUEUE_NAME = interactive
PRIORITY = 100
SLA_GUARANTEES_IGNORE = Y
DESCRIPTION = A high priority interactive queue that ignores all guarantee policies.
End Queue

Best practices for configuring guaranteed resource pools

About this task

  • In each guarantee pool, hosts should be relatively homogeneous in terms of the resources that will be available to the jobs.
  • Each job with a guarantee should ideally be able to fit within a single unit of the guaranteed resources.
    • In a slot type pool, each job with a guarantee should require only a single slot to run. Otherwise, multiple slots may be reserved on different hosts and the job may not run.
    • In a package type pool, each job should require only a single package.
  • For each guarantee policy that enables loaning, you must list the queues that are able to loan from the pool. For each such queue, LSF must try scheduling from the queue twice during each scheduling session, which can degrade scheduling performance. If scheduling performance is a concern, limit the number of queues able to loan.
  • When configuring the RES_SELECT parameter for a pool, use only static resources (e.g. maxmem) instead of dynamically changing resources (e.g. mem).