Configuration to enable job submission and execution controls

This feature is enabled by the presence of at least one esub or one eexec executable in the directory specified by the parameter LSF_SERVERDIR in lsf.conf. LSF does not include a default esub or eexec; you should write your own executables to meet the job requirements of your site.

| Executable file | UNIX naming convention | Windows naming convention |
| --- | --- | --- |
| esub | LSF_SERVERDIR/esub.application | LSF_SERVERDIR\esub.application.exe or LSF_SERVERDIR\esub.application.bat |
| eexec | LSF_SERVERDIR/eexec | LSF_SERVERDIR\eexec.exe or LSF_SERVERDIR\eexec.bat |

The name of your esub should indicate the application with which it runs. For example: esub.fluent.

Restriction:

The name esub.user is reserved. Do not use the name esub.user for an application-specific esub.

Valid file names contain only alphanumeric characters, underscores (_), and hyphens (-).
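
The naming rules above can be checked mechanically before an esub is installed. The following is a minimal sketch and is not part of LSF; the function name valid_esub_app_name is hypothetical, and it checks only the application part of the name (the text after "esub.").

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LSF): checks an application name
# against the naming rules before you install esub.<name>.
valid_esub_app_name() {
    name=$1
    case $name in
        "")               return 1 ;;  # empty name
        user)             return 1 ;;  # esub.user is reserved
        *[!A-Za-z0-9_-]*) return 1 ;;  # only alphanumerics, "_" and "-"
    esac
    return 0
}
```

For example, valid_esub_app_name fluent succeeds, while valid_esub_app_name user fails because the name is reserved.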

Once LSF_SERVERDIR contains one or more esub executables, users can specify the esub executables to run for each job they submit by using the bsub -a option. If an eexec exists in LSF_SERVERDIR, LSF invokes that eexec for all jobs submitted to the cluster.

The following esub executables are provided as separate packages, available from IBM Inc. upon request:
  • esub.openmpi: OpenMPI job submission
  • esub.pvm: PVM job submission
  • esub.poe: POE job submission
  • esub.ls_dyna: LS-Dyna job submission
  • esub.fluent: FLUENT job submission
  • esub.afs or esub.dce: for installing LSF onto an AFS or DCE filesystem
  • esub.lammpi: LAM/MPI job submission
  • esub.mpich_gm: MPICH-GM job submission
  • esub.intelmpi: Intel® MPI job submission
  • esub.bproc: Beowulf Distributed Process Space (BProc) job submission
  • esub.mpich2: MPICH2 job submission
  • esub.mpichp4: MPICH-P4 job submission
  • esub.mvapich: MVAPICH job submission
  • esub.tv, esub.tvlammpi, esub.tvmpich_gm, esub.tvpoe: TotalView® debugging for various MPI applications.

Environment variables used by esub

When you write an esub, you can use the following environment variables provided by LSF for the esub execution environment:

LSB_SUB_PARM_FILE
Points to a temporary file that LSF uses to store the bsub options entered in the command line. An esub reads this file at job submission and either accepts the values, changes the values, or rejects the job. Job submission options are stored as name-value pairs on separate lines with the format option_name=value.

For example, suppose that a user submits the following job:

bsub -q normal -x -P myproject -R "r1m rusage[mem=100]" -n 90 myjob

The LSB_SUB_PARM_FILE contains the following lines:
LSB_SUB_QUEUE="normal"
LSB_SUB_EXCLUSIVE=Y
LSB_SUB_RES_REQ="r1m rusage[mem=100]"
LSB_SUB_PROJECT_NAME="myproject"
LSB_SUB_COMMAND_LINE="myjob"
LSB_SUB_NUM_PROCESSORS=90
LSB_SUB_MAX_NUM_PROCESSORS=90

An esub can change any or all of the job options by writing to the file specified by the environment variable LSB_SUB_MODIFY_FILE.
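
Because the option_name=value lines have the same form as shell assignments, an esub written for the Bourne shell can load them by sourcing the file. The sketch below assumes that every line of the file is a valid shell assignment, as in the sample contents; the function names load_sub_parms and show_sub_parms are hypothetical.

```shell
#!/bin/sh
# Sketch of the first step in a shell esub: load the submission options.
# Assumption: every line of the parameter file is a valid shell assignment.
load_sub_parms() {
    # LSF sets LSB_SUB_PARM_FILE in the esub environment.
    [ -r "${LSB_SUB_PARM_FILE:-}" ] || return 1
    . "$LSB_SUB_PARM_FILE"
}

# Report a few options on stderr; options that were not set print as "-".
show_sub_parms() {
    echo "queue=${LSB_SUB_QUEUE:--} project=${LSB_SUB_PROJECT_NAME:--}" \
         "slots=${LSB_SUB_NUM_PROCESSORS:--}" >&2
}
```

After load_sub_parms returns, each submitted option is available as an ordinary shell variable, for example $LSB_SUB_QUEUE.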

The temporary file pointed to by LSB_SUB_PARM_FILE stores the following information:

| Option | bsub or bmod option | Description |
| --- | --- | --- |
| LSB_SUB_ADDITIONAL | -a | String that contains the application name or names of the esub executables requested by the user. Restriction: This is the only option that an esub cannot change or add at job submission. |
| LSB_SUB_BEGIN_TIME | -b | Begin time, in seconds since 00:00:00 GMT, Jan. 1, 1970 |
| LSB_SUB_CHKPNT_DIR | -k | Checkpoint directory. The file path of the checkpoint directory can contain up to 4000 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name. |
| LSB_SUB_COMMAND_LINE | bsub job command argument | LSB_SUB_COMMANDNAME must be set in lsf.conf to enable esub to use this variable. |
| LSB_SUB_CHKPNT_PERIOD | -k | Checkpoint period in seconds |
| LSB_SUB_DEPEND_COND | -w | Dependency condition |
| LSB_SUB_ERR_FILE | -e, -eo | Standard error file name |
| LSB_SUB_EXCLUSIVE | -x | Exclusive execution, specified by "Y" |
| LSB_SUB_HOLD | -H | Hold job |
| LSB_SUB_HOST_SPEC | -c or -W | Host specifier; limits the CPU time or run time. |
| LSB_SUB_HOSTS | -m | List of requested execution host names |
| LSB_SUB_IN_FILE | -i, -io | Standard input file name |
| LSB_SUB_INTERACTIVE | -I | Interactive job, specified by "Y" |
| LSB_SUB_LOGIN_SHELL | -L | Login shell |
| LSB_SUB_JOB_DESCRIPTION | -Jd | Job description |
| LSB_SUB_JOB_NAME | -J | Job name |
| LSB_SUB_JOB_WARNING_ACTION | -wa | Job warning action |
| LSB_SUB_JOB_ACTION_WARNING_TIME | -wt | Job warning time period |
| LSB_SUB_MAIL_USER | -u | Email address to which LSF sends job-related messages |
| LSB_SUB_MAX_NUM_PROCESSORS | -n | Maximum number of processors requested |
| LSB_MC_SUB_CLUSTERS | -clusters | Cluster names |
| LSB_SUB_MODIFY | bmod | Indicates that bmod invoked esub, specified by "Y". |
| LSB_SUB_MODIFY_ONCE | bmod | Indicates that the job options specified at job submission have already been modified by bmod, and that bmod is invoking esub again, specified by "Y". |
| LSB_SUB4_NETWORK | -network | Defines network requirements before job submission |
| LSB_SUB4_ORPHAN_TERM_NO_WAIT | -ti | Tells LSF to terminate an orphaned job immediately (ignores the grace period). |
| LSB_SUB_NOTIFY_BEGIN | -B | LSF sends an email notification when the job begins, specified by "Y". |
| LSB_SUB_NOTIFY_END | -N | LSF sends an email notification when the job ends, specified by "Y". |
| LSB_SUB_NUM_PROCESSORS | -n | Minimum number of processors requested. |
| LSB_SUB_OTHER_FILES | bmod -f | Indicates the number of files to be transferred. The value is SUB_RESET if bmod is being used to reset the number of files to be transferred. The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name. |
| LSB_SUB_OTHER_FILES_number | bsub -f | The number indicates the particular file transfer value in the specified file transfer expression. For example, for bsub -f "a > b" -f "c < d", the following would be defined: LSB_SUB_OTHER_FILES=2, LSB_SUB_OTHER_FILES_0="a > b", LSB_SUB_OTHER_FILES_1="c < d" |
| LSB_SUB4_OUTDIR | -outdir | Output directory |
| LSB_SUB_OUT_FILE | -o, -oo | Standard output file name. |
| LSB_SUB_PRE_EXEC | -E | Pre-execution command. The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name. |
| LSB_SUB_PROJECT_NAME | -P | Project name. |
| LSB_SUB_PTY | -Ip | An interactive job with PTY support, specified by "Y" |
| LSB_SUB_PTY_SHELL | -Is | An interactive job with PTY shell support, specified by "Y" |
| LSB_SUB_QUEUE | -q | Submission queue name |
| LSB_SUB_RERUNNABLE | -r | "Y" specifies a rerunnable job. "N" specifies a nonrerunnable job (specified with bsub -rn); the job is not rerunnable even if it was submitted to a rerunnable queue or application profile. For bmod -rn, the value is SUB_RESET. |
| LSB_SUB_RES_REQ | -R | Resource requirement string; does not support multiple resource requirement strings |
| LSB_SUB_RESTART | brestart | "Y" indicates to esub that the job options are associated with a restarted job. |
| LSB_SUB_RESTART_FORCE | brestart -f | "Y" indicates to esub that the job options are associated with a forced restarted job. |
| LSB_SUB_RLIMIT_CORE | -C | Core file size limit |
| LSB_SUB_RLIMIT_CPU | -c | CPU limit |
| LSB_SUB_RLIMIT_DATA | -D | Data size limit. For AIX, if the XPG_SUS_ENV=ON environment variable is set in the user's environment before the process is executed and a process attempts to set the limit lower than current usage, the operation fails with errno set to EINVAL. If the XPG_SUS_ENV environment variable is not set, the operation fails with errno set to EFAULT. |
| LSB_SUB_RLIMIT_FSIZE | -F | File size limit |
| LSB_SUB_RLIMIT_PROCESS | -p | Process limit |
| LSB_SUB_RLIMIT_RSS | -M | Resident size limit |
| LSB_SUB_RLIMIT_RUN | -W | Wall-clock run limit in seconds. (Note this is not in minutes, unlike the run limit specified by bsub -W) |
| LSB_SUB_RLIMIT_STACK | -S | Stack size limit |
| LSB_SUB_RLIMIT_THREAD | -T | Thread limit |
| LSB_SUB_TERM_TIME | -t | Termination time, in seconds since 00:00:00 GMT, Jan. 1, 1970 |
| LSB_SUB_TIME_EVENT | -wt | Time event expression |
| LSB_SUB_USER_GROUP | -G | User group name |
| LSB_SUB_WINDOW_SIG | -s | Window signal number |
| LSB_SUB2_JOB_GROUP | -g | Submits a job to a job group |
| LSB_SUB2_LICENSE_PROJECT | -Lp | License Scheduler project name |
| LSB_SUB2_IN_FILE_SPOOL | -is | Spooled input file name |
| LSB_SUB2_JOB_CMD_SPOOL | -Zs | Spooled job command file name |
| LSB_SUB2_JOB_PRIORITY | -sp | Job priority. For bmod -spn, the value is SUB_RESET. |
| LSB_SUB2_SLA | -sla | SLA scheduling options |
| LSB_SUB2_USE_RSV | -U | Advance reservation ID |
| LSB_SUB3_ABSOLUTE_PRIORITY | bmod -aps, bmod -apsn | For bmod -aps, the value is equal to the APS string given. For bmod -apsn, the value is SUB_RESET. |
| LSB_SUB3_AUTO_RESIZABLE | -ar | Job autoresizable attribute. LSB_SUB3_AUTO_RESIZABLE=Y if bsub -ar -app or bmod -ar is specified. LSB_SUB3_AUTO_RESIZABLE=SUB_RESET if bmod -arn is used. |
| LSB_SUB3_APP | -app | Application profile name. For bmod -appn, the value is SUB_RESET. |
| LSB_SUB3_CWD | -cwd | Current working directory |
| LSB_SUB3_INIT_CHKPNT_PERIOD | -k init | Initial checkpoint period |
| LSB_SUB_INTERACTIVE, LSB_SUB3_INTERACTIVE_SSH | bsub -IS | The session of the interactive job is encrypted with SSH. |
| LSB_SUB_INTERACTIVE, LSB_SUB_PTY, LSB_SUB3_INTERACTIVE_SSH | bsub -ISp | If LSB_SUB_INTERACTIVE is specified by "Y", LSB_SUB_PTY is specified by "Y", and LSB_SUB3_INTERACTIVE_SSH is specified by "Y", the session of the interactive job with PTY support is encrypted by SSH. |
| LSB_SUB_INTERACTIVE, LSB_SUB_PTY, LSB_SUB_PTY_SHELL, LSB_SUB3_INTERACTIVE_SSH | bsub -ISs | If LSB_SUB_INTERACTIVE is specified by "Y", LSB_SUB_PTY is specified by "Y", LSB_SUB_PTY_SHELL is specified by "Y", and LSB_SUB3_INTERACTIVE_SSH is specified by "Y", the session of the interactive job with PTY shell support is encrypted by SSH. |
| LSB_SUB3_JOB_REQUEUE | -Q | String format parameter containing the job requeue exit values. For bmod -Qn, the value is SUB_RESET. |
| LSB_SUB3_MIG | -mig, -mign | Migration threshold |
| LSB_SUB3_POST_EXEC | -Ep | Run the specified post-execution command on the execution host after the job finishes (you must specify the first execution host). The file path of the directory can contain up to 4094 characters for UNIX and Linux, or up to 255 characters for Windows, including the directory and file name. |
| LSB_SUB3_RESIZE_NOTIFY_CMD | -rnc | Job resize notification command. LSB_SUB3_RESIZE_NOTIFY_CMD=<cmd> if bsub -rnc or bmod -rnc is specified. LSB_SUB3_RESIZE_NOTIFY_CMD=SUB_RESET if bmod -rncn is used. |
| LSB_SUB3_RUNTIME_ESTIMATION | -We | Runtime estimate in seconds. (Note this is not in minutes, unlike the runtime estimate specified by bsub -We) |
| LSB_SUB3_RUNTIME_ESTIMATION_ACC | -We+ | Runtime estimate that is the accumulated run time plus the runtime estimate |
| LSB_SUB3_RUNTIME_ESTIMATION_PERC | -Wep | Runtime estimate in percentage of completion |
| LSB_SUB3_USER_SHELL_LIMITS | -ul | Pass user shell limits to execution host |
| LSB_SUB_INTERACTIVE, LSB_SUB3_XJOB_SSH | bsub -IX | If both are set to "Y", the session between the X-client and X-server as well as the session between the execution host and submission host are encrypted with SSH. |
| LSF_SUB4_SUB_ENV_VARS | -env | Controls the propagation of job submission environment variables to the execution hosts. If any environment variables in LSF_SUB4_SUB_ENV_VARS conflict with the contents of the LSB_SUB_MODIFY_ENVFILE file, the conflicting environment variables in LSB_SUB_MODIFY_ENVFILE will take effect. |

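
The numbered LSB_SUB_OTHER_FILES_number entries described in the table can be walked with a simple counter. The sketch below assumes that the parameter file has already been loaded into shell variables; the function name list_file_transfers is hypothetical.

```shell
#!/bin/sh
# Sketch: list the file transfer expressions recorded for bsub -f.
# Assumes LSB_SUB_OTHER_FILES and LSB_SUB_OTHER_FILES_<number> are
# already set as shell variables.
list_file_transfers() {
    count=${LSB_SUB_OTHER_FILES:-0}
    # bmod can set the value to SUB_RESET instead of a number.
    case $count in
        ''|*[!0-9]*) return 0 ;;
    esac
    i=0
    while [ "$i" -lt "$count" ]; do
        # Indirect lookup of LSB_SUB_OTHER_FILES_<i>.
        eval "expr_i=\${LSB_SUB_OTHER_FILES_$i:-}"
        echo "$i: $expr_i"
        i=$((i + 1))
    done
}
```

With the values from the bsub -f "a > b" -f "c < d" example in the table, the function prints one line per transfer expression, each prefixed by its index.
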
LSB_SUB_MODIFY_FILE
Points to the file that esub uses to modify the bsub job option values stored in the LSB_SUB_PARM_FILE. You can change the job options by having your esub write the new values to the LSB_SUB_MODIFY_FILE in any order, using the same format shown for the LSB_SUB_PARM_FILE. The value SUB_RESET, integers, and boolean values do not require quotes. String parameters must be entered with quotes around each string, or space-separated series of strings.

When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_FILE and applies changes so that the job runs with the revised option values.

Restriction:

LSB_SUB_ADDITIONAL is the only option that an esub cannot change or add at job submission.
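
The quoting rules for LSB_SUB_MODIFY_FILE can be wrapped in two small helpers. This is a sketch only: the function names are hypothetical, the policy in apply_site_policy is an assumption chosen for illustration, and string values are assumed to contain no double quotes.

```shell
#!/bin/sh
# Sketch: helpers that append option changes to the file that LSF names
# in LSB_SUB_MODIFY_FILE. Function names are hypothetical.

# String values are written with quotes around them.
set_string_option() {
    printf '%s="%s"\n' "$1" "$2" >> "$LSB_SUB_MODIFY_FILE"
}

# SUB_RESET, integers, and boolean values are written without quotes.
set_plain_option() {
    printf '%s=%s\n' "$1" "$2" >> "$LSB_SUB_MODIFY_FILE"
}

# Example policy (an assumption for illustration): send the job to a
# queue called "normal" and request exclusive execution.
apply_site_policy() {
    set_string_option LSB_SUB_QUEUE normal
    set_plain_option  LSB_SUB_EXCLUSIVE Y
}
```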

LSB_SUB_MODIFY_ENVFILE
Points to the file that esub uses to modify the user environment variables with which the job is submitted (not specified by bsub options). You can change these environment variables by having your esub write the values to the LSB_SUB_MODIFY_ENVFILE in any order, using the format variable_name=value, or variable_name="string".

LSF uses the LSB_SUB_MODIFY_ENVFILE to change the environment variables on the submission host. When your esub runs at job submission, LSF checks the LSB_SUB_MODIFY_ENVFILE and applies changes so that the job is submitted with the new environment variable values. LSF associates the new user environment with the job so that the job runs on the execution host with the new user environment.
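
Writing to LSB_SUB_MODIFY_ENVFILE follows the same pattern. In this sketch the function name set_job_env is hypothetical, and the value is assumed to contain no double quotes.

```shell
#!/bin/sh
# Sketch: add or override one user environment variable for the job.
set_job_env() {
    # Uses the variable_name="string" format.
    printf '%s="%s"\n' "$1" "$2" >> "$LSB_SUB_MODIFY_ENVFILE"
}
```

For example, set_job_env MY_APP_HOME /opt/myapp appends the line MY_APP_HOME="/opt/myapp"; both the variable name and the path are placeholders.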

LSB_SUB_ABORT_VALUE
Indicates to LSF that a job should be rejected. For example, if you want LSF to reject a job, your esub should contain the line:
exit $LSB_SUB_ABORT_VALUE
Restriction: When an esub exits with the LSB_SUB_ABORT_VALUE, esub must not write to LSB_SUB_MODIFY_FILE or to LSB_SUB_MODIFY_ENVFILE.

If multiple esubs are specified and one of the esubs exits with a value of LSB_SUB_ABORT_VALUE, LSF rejects the job without running the remaining esubs and returns a value of LSB_SUB_ABORT_VALUE.
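
A rejection check might look like the following sketch. The policy (every job must name a project) is an assumption chosen for illustration and the function name require_project is hypothetical; the check assumes that the parameter file can be sourced as shell assignments.

```shell
#!/bin/sh
# Sketch of a rejection check. The policy (every job must name a project)
# is an assumption chosen for illustration, not an LSF default.
require_project() {
    # Load the bsub options as shell variables.
    . "$LSB_SUB_PARM_FILE"
    if [ -z "${LSB_SUB_PROJECT_NAME:-}" ]; then
        echo "Job rejected: specify a project with bsub -P" >&2
        return 1
    fi
    return 0
}

# In the esub, reject before anything is written to LSB_SUB_MODIFY_FILE
# or LSB_SUB_MODIFY_ENVFILE:
#   require_project || exit "$LSB_SUB_ABORT_VALUE"
```

Running the check first keeps the esub within the restriction on writing to the modify files when it aborts.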

LSB_INVOKE_CMD
Specifies the name of the LSF command that most recently invoked an external executable.

The length of environment variables used by esub must be less than 4096.

Environment variables used by eexec

When you write an eexec, you can use the following environment variables in addition to all user-environment or application-specific variables.
LS_EXEC_T
Indicates the stage or type of job execution. LSF sets LS_EXEC_T to:
  • START at the beginning of job execution
  • END at job completion
  • CHKPNT at job checkpoint start
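
An eexec can branch on these three values. The following is a minimal sketch; the function name handle_stage is hypothetical and the messages stand in for site-specific actions.

```shell
#!/bin/sh
# Sketch of an eexec that branches on the execution stage.
# The echoed messages are placeholders for site-specific actions.
handle_stage() {
    case ${LS_EXEC_T:-} in
        START)  echo "job starting" ;;
        END)    echo "job finished" ;;
        CHKPNT) echo "checkpoint starting" ;;
        *)      echo "unexpected LS_EXEC_T: ${LS_EXEC_T:-unset}" >&2
                return 1 ;;
    esac
}
```
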
LS_JOBPID
Stores the process ID of the LSF process that invoked eexec. If eexec is intended to monitor job execution, eexec must spawn a child and then have the parent eexec process exit. The eexec child should periodically test that the job process is still alive using the LS_JOBPID variable.
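
The monitoring pattern can be sketched as follows. POLL_SECONDS is a hypothetical tuning variable, not an LSF variable, and the function name wait_for_job_exit is also hypothetical; the sketch assumes that the eexec is permitted to signal the process identified by LS_JOBPID.

```shell
#!/bin/sh
# Sketch of the monitoring pattern.
# POLL_SECONDS is a hypothetical tuning knob, not an LSF variable.
POLL_SECONDS=${POLL_SECONDS:-30}

# Returns once the process identified by LS_JOBPID no longer exists.
wait_for_job_exit() {
    # kill -0 sends no signal; it only tests that the process exists
    # and that the caller is permitted to signal it.
    while kill -0 "$LS_JOBPID" 2>/dev/null; do
        sleep "$POLL_SECONDS"
    done
}

# In the eexec: run the watcher as a background child, then let the
# parent eexec process exit.
#   wait_for_job_exit &
#   exit 0
```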