Instead of specifying an explicit runtime limit for jobs, you can specify an estimated run time. LSF uses the estimated value for job scheduling purposes only, and does not kill jobs that exceed this value unless the jobs also exceed a defined runtime limit. The format of the runtime estimate is the same as that of the run limit set by the bsub -W option or the RUNLIMIT parameter in lsb.queues and lsb.applications.
Use JOB_RUNLIMIT_RATIO in lsb.params to limit the runtime estimate that users can set. If JOB_RUNLIMIT_RATIO is set to 0, no restriction is applied to the runtime estimate. The ratio does not apply to the RUNTIME parameter in lsb.applications.
The job-level runtime estimate setting overrides the RUNTIME setting in an application profile in lsb.applications.
The following LSF features use the estimated runtime value to schedule jobs:
- Job chunking
- Advance reservation
- SLA
- Slot reservation
- Backfill
Define the RUNTIME parameter at the application level, or use the bsub -We option at the job level.
You can specify the runtime estimate as hours and minutes, or minutes only. The following examples show an application-level runtime estimate of three hours and 30 minutes:
RUNTIME=3:30
RUNTIME=210
If you want the scheduler to use wall-clock (absolute) run time instead of normalized run time, define ABS_RUNLIMIT=Y in the file lsb.params or in the file lsb.applications for the application associated with your job. Otherwise, LSF calculates the normalized run time with the following formula:
NORMALIZED_RUN_TIME = RUNTIME * CPU_Factor_Normalization_Host / CPU_Factor_Execute_Host
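As a rough illustration of the formula above, the following Python sketch computes a normalized run time. The function name and the CPU factor values are hypothetical and chosen only for the example; real CPU factors come from the cluster configuration.

```python
def normalized_run_time(runtime_minutes: float,
                        cpu_factor_normalization_host: float,
                        cpu_factor_execute_host: float) -> float:
    """NORMALIZED_RUN_TIME = RUNTIME * CPU_Factor_Normalization_Host / CPU_Factor_Execute_Host"""
    if cpu_factor_execute_host <= 0:
        raise ValueError("CPU factor of the execution host must be positive")
    return runtime_minutes * cpu_factor_normalization_host / cpu_factor_execute_host


# Hypothetical CPU factors: the execution host is twice as fast as the
# normalization host, so a 210-minute estimate normalizes to 105 minutes.
print(normalized_run_time(210, cpu_factor_normalization_host=2.0,
                          cpu_factor_execute_host=4.0))  # 105.0
```

A faster execution host (larger CPU factor) shortens the normalized value, and a slower one lengthens it.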
| If you define… | In the file… | Then… |
|---|---|---|
| DEFAULT_HOST_SPEC | lsb.queues | LSF selects the default normalization host for the queue. |
| DEFAULT_HOST_SPEC | lsb.params | LSF selects the default normalization host for the cluster. |
| No default host at either the queue or cluster level | | LSF selects the submission host as the normalization host. |
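The selection rules in the table can be sketched as a simple fallback chain. This is only an illustration: the function and argument names are hypothetical, and it assumes that a queue-level DEFAULT_HOST_SPEC takes precedence over a cluster-level one when both are defined, which the table does not state explicitly.

```python
from typing import Optional


def pick_normalization_host(queue_default_host_spec: Optional[str],
                            cluster_default_host_spec: Optional[str],
                            submission_host: str) -> str:
    """Return the host (or model) whose CPU factor is used for normalization."""
    # Assumption: the queue-level setting (lsb.queues) wins over the cluster level.
    if queue_default_host_spec:
        return queue_default_host_spec
    # Cluster-level setting (lsb.params).
    if cluster_default_host_spec:
        return cluster_default_host_spec
    # No default at either level: fall back to the submission host.
    return submission_host


# Hypothetical host names.
print(pick_normalization_host(None, "hostB", "hostC"))  # hostB
print(pick_normalization_host(None, None, "hostC"))     # hostC
```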
To specify a host name (defined in lsf.cluster.clustername) or host model (defined in lsf.shared) as the normalization host, insert the "/" character between the minutes and the host name or model, as shown in the following examples:
RUNTIME=3:30/hostA
bsub -We 3:30/hostA
LSF calculates the normalized run time using the CPU factor defined for hostA.
RUNTIME=210/Ultra5S
bsub -We 210/Ultra5S
LSF calculates the normalized run time using the CPU factor defined for host model Ultra5S.
Use lsinfo to see host name and host model information.
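To make the accepted formats concrete, here is a minimal Python sketch that parses a runtime estimate string of the form [hours:]minutes[/host_name | /host_model]. The parser and its names are hypothetical and are not part of LSF; it only mirrors the syntax shown in the examples above.

```python
from typing import NamedTuple, Optional


class RuntimeEstimate(NamedTuple):
    minutes: int
    host_spec: Optional[str]  # host name or host model, if one was given


def parse_runtime_estimate(value: str) -> RuntimeEstimate:
    """Parse strings such as '210', '3:30', '3:30/hostA', or '210/Ultra5S'."""
    time_part, slash, host_part = value.strip().partition("/")
    if slash and not host_part:
        raise ValueError("expected a host name or model after '/'")
    fields = time_part.split(":")
    if len(fields) not in (1, 2) or not all(f.isdigit() for f in fields):
        raise ValueError(f"invalid runtime estimate: {value!r}")
    if len(fields) == 1:
        minutes = int(fields[0])                         # minutes only
    else:
        minutes = int(fields[0]) * 60 + int(fields[1])   # hours:minutes
    return RuntimeEstimate(minutes, host_part or None)


estimate = parse_runtime_estimate("3:30/hostA")
print(estimate.minutes, estimate.host_spec)  # 210 hostA
```

Both 3:30 and 210 parse to the same 210 minutes, which matches the equivalence of the two RUNTIME examples.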
You can define an estimated run time along with a runtime limit (job level with bsub -W, application level with RUNLIMIT in lsb.applications, or queue level with RUNLIMIT in lsb.queues).
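LSF uses the run limit value, not the estimated runtime value, for scheduling when:
- The estimated runtime value exceeds the run limit value, or
- An estimated runtime value is not defined

When LSF uses the run limit value for scheduling, and the run limit is defined at more than one level, LSF uses the smallest run limit value to estimate the job duration.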
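The choice between the estimate and the run limit might be sketched as follows. The function name is hypothetical, and two points are assumptions rather than documented behavior: the estimate is compared against the smallest defined run limit, and a missing estimate falls back to that run limit.

```python
from typing import Iterable, Optional


def scheduling_duration(estimate: Optional[int],
                        run_limits: Iterable[Optional[int]]) -> Optional[int]:
    """Return the duration (minutes) used for scheduling, or None if nothing is defined.

    run_limits holds the job-, application-, and queue-level run limits;
    None marks a level where no limit is defined.
    """
    defined = [limit for limit in run_limits if limit is not None]
    smallest_limit = min(defined) if defined else None
    if smallest_limit is None:
        return estimate            # no run limit anywhere: only the estimate is available
    if estimate is None or estimate > smallest_limit:
        return smallest_limit      # fall back to the smallest run limit
    return estimate


print(scheduling_duration(120, [60, None, 240]))  # 60
print(scheduling_duration(30, [60, None, 240]))   # 30
```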
LSF considers a job for chunking when its estimated run time is:
- Less than the CHUNK_JOB_DURATION defined in the file lsb.params, or
- Less than 30 minutes, if CHUNK_JOB_DURATION is not defined.
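The duration test for chunking can be expressed as a short predicate. This is a sketch with hypothetical names that covers only the threshold comparison described here; any other conditions that LSF applies before chunking a job are not modeled.

```python
from typing import Optional

DEFAULT_CHUNK_THRESHOLD_MINUTES = 30  # applies when CHUNK_JOB_DURATION is not defined


def is_chunk_candidate(duration_minutes: int,
                       chunk_job_duration: Optional[int] = None) -> bool:
    """Return True if the duration is strictly below the chunking threshold."""
    if chunk_job_duration is not None:
        threshold = chunk_job_duration
    else:
        threshold = DEFAULT_CHUNK_THRESHOLD_MINUTES
    return duration_minutes < threshold


print(is_chunk_candidate(20))       # True: below the 30-minute default
print(is_chunk_candidate(40, 60))   # True: below CHUNK_JOB_DURATION=60
print(is_chunk_candidate(30))       # False: not strictly less than 30
```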
The following table shows the expected behavior for each combination of runtime estimates and run limits. Ratio is the value of JOB_RUNLIMIT_RATIO in lsb.params, and a dash (-) means that no value is defined.

| Job runtime estimate | Job run limit | Application runtime estimate | Application run limit | Queue default run limit | Queue hard run limit | Result |
|---|---|---|---|---|---|---|
| T1 | - | - | - | - | - | Job is accepted. Jobs running longer than T1 * ratio are killed. |
| T1 | T2 > T1 * ratio | - | - | - | - | Job is rejected. |
| T1 | T2 <= T1 * ratio | - | - | - | - | Job is accepted. Jobs running longer than T2 are killed. |
| T1 | T2 <= T1 * ratio | T3 | T4 | - | - | Job is accepted. Jobs running longer than T2 are killed. T2 overrides T4 or T1 * ratio overrides T4. T1 overrides T3. |
| T1 | T2 <= T1 * ratio | - | - | T5 | T6 | Job is accepted. Jobs running longer than T2 are killed. If T2 > T6, the job is rejected. |
| T1 | - | T3 | T4 | - | - | Job is accepted. Jobs running longer than T1 * ratio are killed. T2 overrides T4 or T1 * ratio overrides T4. T1 overrides T3. |
| T1 | - | - | - | T5 | T6 | Job is accepted. Jobs running longer than T1 * ratio are killed. If T1 * ratio > T6, the job is rejected. |
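The job-level and queue-level rows of the table can be approximated with the decision function below. It is a sketch under stated assumptions, not LSF's implementation: the names are hypothetical, application-level values (T3, T4) are left out because the table says the job-level values override them, and a ratio of 0 is treated as "no ceiling derived from the estimate".

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    accepted: bool
    kill_after: Optional[float]  # minutes; None if rejected or no limit applies


def admit(t1: float, t2: Optional[float] = None, ratio: float = 0.0,
          t6: Optional[float] = None) -> Decision:
    """t1: job runtime estimate, t2: job run limit, t6: queue hard run limit."""
    # Assumption: ratio == 0 means the estimate imposes no ceiling.
    ceiling = t1 * ratio if ratio > 0 else None
    # A job run limit above T1 * ratio causes rejection.
    if t2 is not None and ceiling is not None and t2 > ceiling:
        return Decision(False, None)
    # The effective limit is T2 when defined, otherwise T1 * ratio.
    limit = t2 if t2 is not None else ceiling
    # The effective limit must not exceed the queue hard run limit.
    if limit is not None and t6 is not None and limit > t6:
        return Decision(False, None)
    return Decision(True, limit)


print(admit(60, ratio=2.0))                 # Decision(accepted=True, kill_after=120.0)
print(admit(60, t2=150, ratio=2.0))         # Decision(accepted=False, kill_after=None)
print(admit(60, t2=100, ratio=2.0, t6=90))  # Decision(accepted=False, kill_after=None)
```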