Running Jobs with Task Geometry

Specifying task geometry allows you to group tasks of a parallel job step to run together on the same node. Task geometry allows for flexibility in how tasks are grouped for execution on system nodes. You cannot specify the particular nodes that these groups run on; the scheduler decides which nodes run the specified groupings.

Using the task geometry environment variable

Use the LSB_TASK_GEOMETRY environment variable to specify task geometry for your jobs. LSB_TASK_GEOMETRY replaces LSB_PJL_TASK_GEOMETRY, which is kept for compatibility with earlier versions. LSB_TASK_GEOMETRY overrides any process group or command file placement options.

The environment variable LSB_TASK_GEOMETRY is checked for all parallel jobs. If LSB_TASK_GEOMETRY is set and users submit a parallel job (a job that requests more than one slot), LSF attempts to shape LSB_MCPU_HOSTS accordingly.

The mpirun.lsf script sets the LSB_MCPU_HOSTS environment variable in the job according to the task geometry specification.

The syntax is:

setenv LSB_TASK_GEOMETRY "{(task_ID,...) ...}"

For example, to submit a job that spawns 8 tasks across 4 nodes, specify:

setenv LSB_TASK_GEOMETRY "{(2,5,7)(0,6)(1,3)(4)}"

The results are:

  • Tasks 2, 5, and 7 run on one node

  • Tasks 0 and 6 run on another node

  • Tasks 1 and 3 run on a third node

  • Task 4 runs alone on a fourth node
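The grouping above can be reproduced with a short script. The following is a minimal sketch, not part of LSF: the function name task_to_group is hypothetical, and the group index it reports is only the position of the group in the specification, since LSF decides which actual host runs each group.

```python
import re

def task_to_group(geometry: str) -> dict:
    """Map each task ID to the index of the parenthesized group it appears in.

    The index is the group's position in the string, not a host name:
    the scheduler chooses the hosts.
    """
    # Each "(...)" holds the comma-separated task IDs for one node.
    groups = re.findall(r"\(([^()]*)\)", geometry)
    mapping = {}
    for index, group in enumerate(groups):
        for task_id in group.split(","):
            mapping[int(task_id)] = index
    return mapping

placement = task_to_group("{(2,5,7)(0,6)(1,3)(4)}")
for task_id in sorted(placement):
    print(f"task {task_id} -> group {placement[task_id]}")
```

Tasks that share a group index run on the same node, so tasks 2, 5, and 7 all report group 0.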

Each task_ID number corresponds to a task ID in a job, and each set of parentheses contains the task IDs assigned to one node. Tasks can appear in any order, but the range of task IDs must begin with 0 and must include every task ID number; you cannot skip a task ID number. Use braces to enclose the entire task geometry specification, and use parentheses to enclose the group of tasks assigned to each node. Use commas to separate task IDs.

For example:

setenv LSB_TASK_GEOMETRY "{(1)(2)}"

is incorrect because it does not start from task 0.

setenv LSB_TASK_GEOMETRY "{(0)(3)}"

is incorrect because it does not specify tasks 1 and 2.
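These rules can be checked before the job is submitted. The sketch below is an illustration only and is not an LSF utility: parse_task_geometry is a hypothetical name, the pattern assumes no whitespace inside the specification, and rejecting a task ID that appears twice is an added assumption that the text above does not state.

```python
import re

def parse_task_geometry(spec: str) -> list:
    """Return the task-ID groups in spec, or raise ValueError if a rule is broken."""
    spec = spec.strip()
    # Braces around the whole specification, parentheses around each group,
    # commas between task IDs. Whitespace is not accepted (assumption).
    if not re.fullmatch(r"\{(\(\d+(,\d+)*\))+\}", spec):
        raise ValueError(f"malformed geometry: {spec!r}")
    groups = [
        [int(task_id) for task_id in group.split(",")]
        for group in re.findall(r"\(([^()]*)\)", spec)
    ]
    # Task IDs must start at 0 and include every number up to the highest one.
    # This comparison also rejects repeated IDs (assumption, see lead-in).
    task_ids = sorted(task_id for group in groups for task_id in group)
    if task_ids != list(range(len(task_ids))):
        raise ValueError(f"task IDs must start at 0 with none skipped: {task_ids}")
    return groups

for candidate in ("{(2,5,7)(0,6)(1,3)(4)}", "{(1)(2)}", "{(0)(3)}"):
    try:
        print(candidate, "->", parse_task_geometry(candidate))
    except ValueError as error:
        print(candidate, "rejected:", error)
```

The two incorrect specifications above are rejected by the same check, because neither yields the full sequence of task IDs starting at 0.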

LSB_TASK_GEOMETRY cannot request more hosts than specified by the bsub -n option. For example:

setenv LSB_TASK_GEOMETRY "{(0)(1)(2)}"

specifies three nodes, one task per node. A correct job submission must request at least 3 hosts:

bsub -n 3 -R "span[ptile=1]" -I -a pe mpirun.lsf my_job

Job <564> is submitted to queue <hpc_linux>

<<Waiting for dispatch ...>>

<<Starting on hostA>>

...

Planning your task geometry specification

Plan task geometry in advance and specify the job resource requirements so that LSF selects appropriate hosts.

Use bsub -n and -R "span[ptile=]" to make sure LSF selects appropriate hosts to run the job, so that:

  • The correct number of nodes is specified

  • All execution hosts have the same number of available slots

  • The ptile value is the maximum number of CPUs required on one node by task geometry specifications
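The values implied by this list can be derived from the geometry string itself. The following is a rough sketch under stated assumptions: geometry_requirements is a hypothetical helper, and the slot count is computed as nodes multiplied by ptile, which is an inference from the requirement that every host offers the same number of slots rather than a formula given here.

```python
import re

def geometry_requirements(spec: str) -> dict:
    """Derive node count, ptile, and a slot count from a task geometry string."""
    # One parenthesized group per node; its size is the CPUs needed on that node.
    sizes = [len(group.split(",")) for group in re.findall(r"\(([^()]*)\)", spec)]
    nodes = len(sizes)
    ptile = max(sizes)  # the largest group sets the per-node requirement
    # Assumption: every host must be able to hold the largest group,
    # so the job requests nodes * ptile slots.
    return {"nodes": nodes, "ptile": ptile, "slots": nodes * ptile}

req = geometry_requirements("{(2,5,7)(0,6)(1,3)(4)}")
print(f'-n {req["slots"]} -R "span[ptile={req["ptile"]}]"')
```

For the specification "{(0)(1)(2)}", the same calculation gives 3 nodes, a ptile of 1, and 3 slots, which matches the bsub -n 3 -R "span[ptile=1]" submission shown earlier.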

LSB_TASK_GEOMETRY guarantees only the geometry, not the host order. You must make sure that each host selected by LSF can run any group of tasks specified in LSB_TASK_GEOMETRY.

You can also use bsub -x to run jobs exclusively on a host. No other jobs share the node once this job is scheduled.