
Restrict job size requested by parallel jobs

Specifying a list of allowed job sizes (number of tasks) in queues or application profiles enables LSF to check the requested job sizes when submitting, modifying, or switching jobs.

About this task

Certain applications may yield better performance with specific job sizes (for example, powers of two, so that the job sizes are 2^x), or some sites may want to run all job sizes to generate high cluster resource utilization. The JOB_SIZE_LIST parameter in lsb.queues or lsb.applications allows you to define a discrete list of allowed job sizes for the specified queues or application profiles.

LSF rejects jobs requesting job sizes that are not in this list, or jobs requesting a range of job sizes. The first job size in this list is the default job size, which is the job size assigned to jobs that do not explicitly request a job size. The rest of the list can be defined in any order.

For example, if the job size list for the queue1 queue allows 2, 4, 8, and 16 tasks, and you submit a parallel job requesting 10 tasks in this queue (bsub -q queue1 -n 10 ...), that job is rejected because the job size of 10 is not explicitly allowed in the list. To assign a default job size of 4, specify 4 as the first value in the list, and job submissions that do not request a job size are automatically assigned a job size of 4 (JOB_SIZE_LIST=4 2 8 16).
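
The acceptance rule described above can be sketched in a few lines of Python. This is an illustrative model only, not LSF code: the function name resolve_job_size and the use of None to mean "no job size requested" are assumptions made for the example.

```python
from typing import Optional, Sequence


def resolve_job_size(job_size_list: Sequence[int], requested: Optional[int]) -> int:
    """Return the job size a submission would get, or raise ValueError if rejected.

    Hypothetical model of the JOB_SIZE_LIST check; not part of LSF.
    """
    if not job_size_list:
        raise ValueError("job size list is empty")
    if requested is None:
        # No explicit request: the first entry in the list is the default.
        return job_size_list[0]
    if requested not in job_size_list:
        raise ValueError(f"job size {requested} is not in the allowed list")
    return requested


queue1 = [4, 2, 8, 16]  # models JOB_SIZE_LIST=4 2 8 16
print(resolve_job_size(queue1, None))  # 4 (the default)
print(resolve_job_size(queue1, 8))     # 8
```

In this model, a request for 10 tasks raises ValueError, which corresponds to the rejected bsub -q queue1 -n 10 submission.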

When using resource requirements to specify job size, the request must specify a single fixed job size and not multiple values or a range of values.

For example, the job size list for the normal queue allows 2, 4, and 8 tasks, with 2 as the default (JOB_SIZE_LIST=2 4 8). For the resource requirement "2*{-}+{-}", the last term ({-}) does not contain a fixed number of tasks, so this compound resource requirement is rejected in any queue that has a job size list. The following submissions show how this rule applies:
  • For the following job submission with the compound resource requirement:

    bsub -R "2*{-}+{-}" -q normal myjob

    This job submission is rejected because the compound resource requirement does not contain a fixed number of tasks.

  • For the following job submission with the compound resource requirement:

    bsub -n 4 -R "2*{-}+{-}" -q normal myjob

    This job submission is accepted because -n 4 requests a fixed number of tasks, even though the compound resource requirement does not.

  • For the following job submission with compound and alternative resource requirements:

    bsub -R "{2*{-}+{-}}||{4*{-}}" -q normal myjob

    This job submission is rejected for specifying a range of values because the first alternative (2*{-}+{-}) does not imply a fixed job size.

  • For the following job submission with compound and alternative resource requirements for the interactive queue:

    bsub -R "{2*{-}+{-}}||{4*{-}}" -q interactive -H myjob

    This job submission is accepted because the interactive queue does not have a job size list. However, if you try to modify or switch this job to any queue or application profile with a job size list before the job starts, the request is rejected. For example, if this job has job ID 123 and has not started, the following request is rejected because the normal queue has a job size list:

    bswitch normal 123

    Similarly, if the app1 application profile has the same job size list as the normal queue, the following request is also rejected:

    bmod -app app1 123
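
The "single fixed job size" rule for the resource requirement strings shown above can be modeled roughly as follows. This is a simplified sketch, not LSF's parser: the helper names are hypothetical, it only understands the n*{...}, {...}, + and || forms used in these examples, and it assumes that alternatives resolving to different sizes count as "multiple values".

```python
import re
from typing import List, Optional


def _split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring separators nested inside braces."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and text.startswith(sep, i):
            parts.append(text[start:i])
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:])
    return [p.strip() for p in parts]


def _strip_outer_braces(text: str) -> str:
    """Remove one pair of braces if it encloses the whole string."""
    if not (text.startswith("{") and text.endswith("}")):
        return text
    depth = 0
    for i, ch in enumerate(text):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if depth == 0 and i < len(text) - 1:
            return text  # first brace closes early, as in "{-}+{-}"
    return text[1:-1]


def _compound_size(expr: str) -> Optional[int]:
    """Sum the task counts of a compound requirement; None if any term is open-ended."""
    total = 0
    for term in _split_top_level(expr, "+"):
        match = re.fullmatch(r"(\d+)\s*\*\s*\{.*\}", term, flags=re.S)
        if match is None:
            return None  # a term such as "{-}" has no fixed task count
        total += int(match.group(1))
    return total


def fixed_job_size(resreq: str) -> Optional[int]:
    """Return the single fixed job size implied by resreq, or None if there is none."""
    sizes = set()
    for alternative in _split_top_level(resreq, "||"):
        size = _compound_size(_strip_outer_braces(alternative))
        if size is None:
            return None
        sizes.add(size)
    # Assumption: alternatives that disagree on size are treated as "multiple values".
    return sizes.pop() if len(sizes) == 1 else None


print(fixed_job_size("2*{-}+{-}"))             # None
print(fixed_job_size("{2*{-}+{-}}||{4*{-}}"))  # None
print(fixed_job_size("4*{-}"))                 # 4
```

A result of None here corresponds to the rejected submissions in the list above when no -n value is given. An explicit -n value is not modeled, because the text only confirms its effect for an open-ended requirement.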

When defined in both a queue (lsb.queues) and an application profile (lsb.applications), the job size request must satisfy both requirements. In addition, JOB_SIZE_LIST overrides any TASKLIMIT (formerly PROCLIMIT) parameters defined at the same level. Job size requirements do not apply to queues and application profiles with no job size lists, nor do they apply to other levels of job submissions (that is, host level or cluster level job submissions).
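
As a rough illustration of the "must satisfy both" rule, the following Python sketch checks a requested size against a queue list and an application profile list. The function name is hypothetical, and None stands for a level that has no job size list; this is a model of the rule as stated here, not LSF code.

```python
from typing import Optional, Sequence


def satisfies_job_size_lists(
    requested: int,
    queue_list: Optional[Sequence[int]],
    app_list: Optional[Sequence[int]],
) -> bool:
    """Return True if requested is allowed by every job size list that is defined."""
    for sizes in (queue_list, app_list):
        # A level without a job size list (None) imposes no restriction.
        if sizes is not None and requested not in sizes:
            return False
    return True


normal_queue = [2, 4, 8]  # models JOB_SIZE_LIST=2 4 8 in lsb.queues
app1 = [4, 2, 8, 16]      # models JOB_SIZE_LIST=4 2 8 16 in lsb.applications

print(satisfies_job_size_lists(8, normal_queue, app1))   # True
print(satisfies_job_size_lists(16, normal_queue, app1))  # False
print(satisfies_job_size_lists(16, None, app1))          # True
```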

Specify a job size list for queues and application profiles as follows:

Procedure

  1. Log on as root or the LSF administrator on any host in the cluster.
  2. Define the JOB_SIZE_LIST parameter for the specific application profiles (in lsb.applications) or queues (in lsb.queues).

    JOB_SIZE_LIST=default_size [size ...]

    For example,
    • lsb.applications:
      Begin Application 
      NAME = app1 
      ... 
      JOB_SIZE_LIST=4 2 8 16 
      ... 
      End Application
    • lsb.queues:
      Begin Queue 
      QUEUE_NAME = queue1 
      ... 
      JOB_SIZE_LIST=4 2 8 16 
      ... 
      End Queue
  3. Save the changes to the modified configuration files.
  4. Use badmin ckconfig to check the new queue or application profile definition. If any errors are reported, fix the problem and check the configuration again.
  5. Run badmin reconfig to reconfigure mbatchd.
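
Before running badmin ckconfig, it can be convenient to confirm what was actually written to the file. The Python sketch below is a simplified, hypothetical reader: it only handles KEY = value lines between Begin and End markers and # comments, and it is not a substitute for badmin ckconfig. The function name and the sample text are assumptions for illustration.

```python
from typing import Dict, List


def job_size_lists(config_text: str, section: str, name_key: str) -> Dict[str, List[int]]:
    """Map each section's name to the integers in its JOB_SIZE_LIST, if defined."""
    result: Dict[str, List[int]] = {}
    current: Dict[str, str] = {}
    inside = False
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if line == f"Begin {section}":
            inside, current = True, {}
        elif line == f"End {section}":
            if inside and "JOB_SIZE_LIST" in current:
                # int() raises ValueError if an entry is not a whole number.
                sizes = [int(tok) for tok in current["JOB_SIZE_LIST"].split()]
                result[current.get(name_key, "")] = sizes
            inside = False
        elif inside and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return result


sample = """Begin Queue
QUEUE_NAME = queue1
JOB_SIZE_LIST=4 2 8 16
End Queue
"""
lists = job_size_lists(sample, "Queue", "QUEUE_NAME")
print(lists)                # {'queue1': [4, 2, 8, 16]}
print(lists["queue1"][0])   # 4, the default job size
```

For lsb.applications, the same function would be called with "Application" and "NAME" as the section and name key.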