Estimated runtime and runtime limits

Instead of specifying an explicit runtime limit for jobs, you can specify an estimated run time. LSF uses the estimated value for job scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined runtime limit. The format of the runtime estimate is the same as that of the run limit set by the bsub -W option or the RUNLIMIT parameter in lsb.queues and lsb.applications.

Use JOB_RUNLIMIT_RATIO in lsb.params to limit the runtime estimate that users can set. If JOB_RUNLIMIT_RATIO is set to 0, no restriction is applied to the runtime estimate. The ratio does not apply to the RUNTIME parameter in lsb.applications.

The job-level runtime estimate setting overrides the RUNTIME setting in an application profile in lsb.applications.

The following LSF features use the estimated runtime value to schedule jobs:
  • Job chunking

  • Advance reservation

  • SLA

  • Slot reservation

  • Backfill

Define a runtime estimate

At the application level, define the RUNTIME parameter in lsb.applications. At the job level, use the bsub -We option.

You can specify the runtime estimate as hours and minutes, or minutes only. The following examples show an application-level runtime estimate of three hours and 30 minutes:

  • RUNTIME=3:30

  • RUNTIME=210

Configure normalized run time

LSF uses normalized run time for scheduling to account for the different processing speeds of the execution hosts.

Tip: If you want the scheduler to use wall-clock (absolute) run time instead of normalized run time, define ABS_RUNLIMIT=Y in lsb.params, or in lsb.applications for the application associated with your job.

LSF calculates the normalized run time using the following formula:

NORMALIZED_RUN_TIME = RUNTIME * CPU_Factor_Normalization_Host / CPU_Factor_Execute_Host

You can specify a host name or host model with the runtime estimate so that LSF uses that host or model as the normalization host. If you do not specify a host name or host model, LSF uses the CPU factor for the default normalization host as described in the following table.

| If you define… | In the file… | Then… |
|---|---|---|
| DEFAULT_HOST_SPEC | lsb.queues | LSF selects the default normalization host for the queue. |
| DEFAULT_HOST_SPEC | lsb.params | LSF selects the default normalization host for the cluster. |
| No default host at either the queue or cluster level | | LSF selects the submission host as the normalization host. |
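The fallback order in the table can be sketched in a few lines of Python. This is an illustration only, not LSF code: the function name and arguments are hypothetical, and the sketch assumes that a queue-level DEFAULT_HOST_SPEC takes precedence over a cluster-level one when both are defined.

```python
from typing import Optional


def default_normalization_host(
    queue_host_spec: Optional[str],
    cluster_host_spec: Optional[str],
    submission_host: str,
) -> str:
    """Pick the default normalization host (hypothetical helper).

    Assumption: the queue-level DEFAULT_HOST_SPEC (lsb.queues) wins over
    the cluster-level one (lsb.params); with neither defined, the
    submission host is used.
    """
    if queue_host_spec:
        return queue_host_spec
    if cluster_host_spec:
        return cluster_host_spec
    return submission_host


# Host names below are placeholders.
print(default_normalization_host("hostQ", "hostC", "hostS"))  # hostQ
print(default_normalization_host(None, "hostC", "hostS"))     # hostC
print(default_normalization_host(None, None, "hostS"))        # hostS
```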

To specify a host name (defined in lsf.cluster.clustername) or host model (defined in lsf.shared) as the normalization host, insert a slash (/) between the runtime value and the host name or model, as shown in the following examples:

RUNTIME=3:30/hostA
bsub -We 3:30/hostA

LSF calculates the normalized run time using the CPU factor defined for hostA.

RUNTIME=210/Ultra5S
bsub -We 210/Ultra5S

LSF calculates the normalized run time using the CPU factor defined for host model Ultra5S.

Tip: Use lsinfo to see host name and host model information.
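As a worked illustration of the estimate format and the normalization formula, the following Python sketch splits an estimate such as 3:30/hostA into minutes and a normalization host, then applies the formula. The function names and the CPU factor values are hypothetical and chosen only to make the arithmetic easy to follow; LSF's own parsing may accept or reject inputs differently.

```python
from typing import Optional, Tuple


def parse_estimate(spec: str) -> Tuple[int, Optional[str]]:
    """Split '[hours:]minutes[/host_or_model]' into (minutes, host_or_model)."""
    time_part, sep, host = spec.partition("/")
    if ":" in time_part:
        hours, minutes = time_part.split(":", 1)
        total = int(hours) * 60 + int(minutes)
    else:
        total = int(time_part)
    return total, (host if sep else None)


def normalized_run_time(runtime: float, norm_cpu_factor: float,
                        exec_cpu_factor: float) -> float:
    """RUNTIME * CPU_Factor_Normalization_Host / CPU_Factor_Execute_Host."""
    return runtime * norm_cpu_factor / exec_cpu_factor


# Hypothetical CPU factors, for illustration only.
cpu_factors = {"hostA": 2.0, "hostB": 4.0}

minutes, norm_host = parse_estimate("3:30/hostA")
print(minutes, norm_host)  # 210 hostA

# A 210-minute estimate normalized to hostA, for a job that runs on hostB.
print(normalized_run_time(minutes, cpu_factors[norm_host], cpu_factors["hostB"]))  # 105.0
```

With these assumed factors, the execution host is twice as fast as the normalization host, so the 210-minute estimate corresponds to 105 minutes on the execution host.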

Guidelines for defining a runtime estimate

  1. You can define an estimated run time together with a runtime limit (at the job level with bsub -W, at the application level with RUNLIMIT in lsb.applications, or at the queue level with RUNLIMIT in lsb.queues).

  2. If a runtime limit is defined, the job-level (-We) or application-level (RUNTIME) estimate must be less than or equal to the run limit. LSF ignores the estimated runtime value and uses the run limit value for scheduling when either of the following is true:
    • The estimated runtime value exceeds the run limit value.

    • An estimated runtime value is not defined.

    Note: When LSF uses the run limit value for scheduling, and the run limit is defined at more than one level, LSF uses the smallest run limit value to estimate the job duration.

  3. For chunk jobs, ensure that the estimated runtime value is
    • Less than the CHUNK_JOB_DURATION defined in the file lsb.params, or

    • Less than 30 minutes, if CHUNK_JOB_DURATION is not defined.
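Guidelines 2 and 3 can be read as two small checks, sketched below in Python. The function names are hypothetical and all values are in minutes. The sketch assumes that, when run limits exist at several levels, the estimate is compared against the smallest of them; the guidelines state only that the smallest limit is used once the estimate is ignored.

```python
from typing import Iterable, Optional


def scheduling_duration(estimate: Optional[float],
                        run_limits: Iterable[float]) -> Optional[float]:
    """Return the duration used for scheduling (hypothetical helper).

    The estimate is used when it is defined and does not exceed the
    (smallest) run limit; otherwise the smallest run limit is used.
    """
    limits = list(run_limits)
    smallest = min(limits) if limits else None
    if estimate is not None and (smallest is None or estimate <= smallest):
        return estimate
    return smallest


def fits_chunking(estimate: float,
                  chunk_job_duration: Optional[float] = None) -> bool:
    """Check guideline 3: estimate below CHUNK_JOB_DURATION, or below 30 if unset."""
    threshold = chunk_job_duration if chunk_job_duration is not None else 30
    return estimate < threshold


print(scheduling_duration(30, [60, 120]))    # 30
print(scheduling_duration(90, [60, 120]))    # 60
print(scheduling_duration(None, [60, 120]))  # 60
print(fits_chunking(20))                     # True
print(fits_chunking(20, chunk_job_duration=15))  # False
```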

How estimated run time interacts with run limits

The following table lists the expected behavior for each combination of job-level runtime estimate (-We), job-level run limit (-W), application-level runtime estimate (RUNTIME), application-level run limit (RUNLIMIT), and queue-level run limit (RUNLIMIT, both default and hard limit). Ratio is the value of JOB_RUNLIMIT_RATIO defined in lsb.params. A dash (-) indicates that no value is defined for the job.

| Job-runtime estimate | Job-run limit | Application runtime estimate | Application run limit | Queue default run limit | Queue hard run limit | Result |
|---|---|---|---|---|---|---|
| T1 | - | - | - | - | - | Job is accepted. Jobs running longer than T1*ratio are killed. |
| T1 | T2>T1*ratio | - | - | - | - | Job is rejected. |
| T1 | T2<=T1*ratio | - | - | - | - | Job is accepted. Jobs running longer than T2 are killed. |
| T1 | T2<=T1*ratio | T3 | T4 | - | - | Job is accepted. Jobs running longer than T2 are killed. T2 overrides T4 or T1*ratio overrides T4. T1 overrides T3. |
| T1 | T2<=T1*ratio | - | - | T5 | T6 | Job is accepted. Jobs running longer than T2 are killed. If T2>T6, the job is rejected. |
| T1 | - | T3 | T4 | - | - | Job is accepted. Jobs running longer than T1*ratio are killed. T2 overrides T4 or T1*ratio overrides T4. T1 overrides T3. |
| T1 | - | - | - | T5 | T6 | Job is accepted. Jobs running longer than T1*ratio are killed. If T1*ratio>T6, the job is rejected. |
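The rows of the table can be condensed into a small decision function, sketched here in Python as a reading aid rather than a description of LSF internals. The names Decision and evaluate are hypothetical. The sketch assumes a JOB_RUNLIMIT_RATIO greater than 0 (a ratio of 0, meaning no restriction, is not modeled), and it omits T3, T4, and T5 because the table shows them being overridden or not affecting the result.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    accepted: bool
    kill_after: Optional[float]  # run time after which the job is killed


def evaluate(t1: float, ratio: float,
             t2: Optional[float] = None,
             t6: Optional[float] = None) -> Decision:
    """Apply the table's rules for a job with runtime estimate T1.

    t2 is the job-level run limit and t6 the queue hard run limit;
    None stands for a dash (no value defined).
    """
    cap = t1 * ratio
    if t2 is not None:
        if t2 > cap:
            return Decision(False, None)  # T2 > T1*ratio: rejected
        effective = t2                    # killed after T2
    else:
        effective = cap                   # killed after T1*ratio
    if t6 is not None and effective > t6:
        return Decision(False, None)      # exceeds the queue hard limit
    return Decision(True, effective)


print(evaluate(60, 2))                  # Decision(accepted=True, kill_after=120)
print(evaluate(60, 2, t2=150))          # Decision(accepted=False, kill_after=None)
print(evaluate(60, 2, t2=100))          # Decision(accepted=True, kill_after=100)
print(evaluate(60, 2, t2=100, t6=90))   # Decision(accepted=False, kill_after=None)
print(evaluate(60, 2, t6=100))          # Decision(accepted=False, kill_after=None)
```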