
-hostfile

Submits a job with a user-specified host file.

Categories

resource

Synopsis

bsub -hostfile file_path

Description

When you submit a job, you can point it to a file that allocates specific hosts and a number of slots on each host for job processing. For example, if you know the best host allocation for a job based on factors such as network connection status, you can submit the job with a user-specified host file.

The user-specified host file specifies the order in which to launch tasks, ranking the slots specified in the file. The resulting rank file is made available to other applications (such as MPI).

mbatchd does not read the user-specified host file directly. Instead, it stores a condensed form of the file's contents in memory and in the lsb.events file for communication and recovery.

The following command specifies the path of the user-specified host file:

bsub -hostfile "host_alloc_file" ./a.out
Important:
  • The -hostfile option cannot be used with either the -n or -m option.
  • The -hostfile option cannot be combined with the -R option or a compound res_req.
  • Do not use a user-specified host file if task geometry is enabled, because the two can conflict and jobs might fail.
  • If resources might not be available when a task is ready, use advance reservation instead of a user-specified host file to ensure that reserved slots are available and that the job runs smoothly.
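
The first two restrictions above can be checked before submission. The following Python sketch is a hypothetical helper, not part of LSF: the function name hostfile_conflicts and the idea of scanning a list of bsub arguments are assumptions made for illustration. It does only a plain token comparison, so it cannot detect task geometry settings and it would also flag a -n, -m, or -R that belongs to the job command itself.

```python
# Hypothetical pre-submission check; this helper is not provided by LSF.
# Options that the list above says cannot be combined with -hostfile.
CONFLICTING_OPTIONS = ("-n", "-m", "-R")


def hostfile_conflicts(bsub_args):
    """Return the options in bsub_args that conflict with -hostfile.

    bsub_args is the list of arguments that would follow "bsub".
    An empty list means no conflict was found by this simple token check.
    """
    if "-hostfile" not in bsub_args:
        return []
    return [opt for opt in CONFLICTING_OPTIONS if opt in bsub_args]


# A submission that mixes -hostfile with -n is reported as a conflict.
print(hostfile_conflicts(["-hostfile", "host_alloc_file", "-n", "4", "./a.out"]))
# A submission that uses -hostfile alone is reported as clean.
print(hostfile_conflicts(["-hostfile", "host_alloc_file", "./a.out"]))
```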

Any user can create a user-specified host file. The file must be accessible by the user from the submission host. It lists one host per line, in the following format:

# This is a user specified host file
<host_name1>   [<# slots>]
<host_name2>   [<# slots>]
<host_name1>   [<# slots>]
<host_name2>   [<# slots>]
<host_name3>   [<# slots>]
<host_name4>   [<# slots>]
The following rules apply to the user-specified host file:
  • Comments start with the # character.
  • Specifying the number of slots for a host is optional. If no slot number is given, the default is 1.
  • A host name can be either a host in a local cluster or a host leased in from a remote cluster (host_name@cluster_name).
  • A user-specified host file should contain hosts from the same cluster only.
  • A host name can be entered with or without the domain name.
  • Host names can be used multiple times, and the order in which they are entered represents the placement of tasks. For example:
    #first three tasks
    host01                      3
    #fourth task
    host02
    #next three tasks
    host03                      3
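
The rules above can be illustrated with a short Python sketch that reads host file text and expands it into the order of task placement. This is not LSF's own parser. The function names parse_host_file and task_order are hypothetical, and the sketch assumes that comments occupy a whole line, that blank lines are ignored, and that a slot count must be a positive integer.

```python
def parse_host_file(text):
    """Parse host file text into (host, slots) pairs, keeping file order.

    Assumptions of this sketch: a comment is a whole line starting with '#',
    blank lines are skipped, and a missing slot count defaults to 1.
    """
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) > 2:
            raise ValueError(f"line {line_no}: expected 'host [slots]', got {raw!r}")
        host = fields[0]
        slots = int(fields[1]) if len(fields) == 2 else 1
        if slots < 1:
            raise ValueError(f"line {line_no}: slot count must be at least 1")
        entries.append((host, slots))
    return entries


def task_order(entries):
    """Expand (host, slots) pairs into one host name per task, in order."""
    return [host for host, slots in entries for _ in range(slots)]


sample = """\
#first three tasks
host01                      3
#fourth task
host02
#next three tasks
host03                      3
"""

for rank, host in enumerate(task_order(parse_host_file(sample))):
    print(rank, host)
```

Keeping the entries as an ordered list rather than a dictionary preserves repeated host names, which matters because the order of the lines determines task placement.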

When a job is submitted with a user-specified host file, the rank file that the LSB_DJOB_RANKFILE environment variable points to is generated from the user-specified host file. If a job is not submitted with a user-specified host file, then LSB_DJOB_RANKFILE points to the same file as LSB_DJOB_HOSTFILE.
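
An application running inside the job can use this variable to find the host assigned to each task. The following Python sketch assumes that the rank file contains one host name per line and that ranks are counted from 0. The function names read_rank_file and host_for_rank are hypothetical and are not part of LSF.

```python
import os


def read_rank_file(environ=os.environ):
    """Return the host names listed in the file named by LSB_DJOB_RANKFILE.

    Assumption of this sketch: the rank file holds one host name per line.
    """
    path = environ.get("LSB_DJOB_RANKFILE")
    if path is None:
        raise RuntimeError("LSB_DJOB_RANKFILE is not set; is this an LSF job?")
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def host_for_rank(rank, environ=os.environ):
    """Return the host for a task rank, counting ranks from 0 (an assumption)."""
    return read_rank_file(environ)[rank]
```

Within a job, calling host_for_rank(0) would return the host of the first task. The environ parameter exists only so that the functions can be exercised outside LSF with a plain dictionary.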

Entries with duplicate host names are combined, and their slot counts are added together to give the total number of slots for each host. The combined result is used for scheduling (LSB_DJOB_HOSTFILE groups the hosts together) and for LSB_MCPU_HOSTS, which represents the job allocation.

In an esub, use the LSB_SUB4_HOST_FILE parameter to read and modify the value of the -hostfile option.

The following is an example of a user-specified host file that includes duplicate host names:

user1: cat ./user1_host_file
# This is my user specified host file for job242
host01   3
host02    
host03   3
host01    
host02   2

This user-specified host file tells LSF to allocate 10 slots in total (4 slots on host01, 3 slots on host02, and 3 slots on host03). The order of the lines determines the order of task placement.

The result is the following:

LSB_DJOB_RANKFILE:
host01
host01
host01
host02
host03
host03
host03
host01
host02
host02
LSB_DJOB_HOSTFILE:
host01
host01
host01
host01
host02
host02
host02
host03
host03
host03
LSB_MCPU_HOSTS = host01 4 host02 3 host03 3
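
The three results above can be reproduced with a few lines of Python. This is a sketch of the arithmetic only, not of how LSF computes the values. The entries list is typed in by hand from the example file, and listing combined hosts in order of first appearance is an assumption that happens to match the example.

```python
# (host, slots) pairs in file order; a missing slot count is written as 1.
entries = [("host01", 3), ("host02", 1), ("host03", 3), ("host01", 1), ("host02", 2)]


def rank_order(entries):
    """One host name per task, in the order the lines were written."""
    return [host for host, slots in entries for _ in range(slots)]


def combine_slots(entries):
    """Sum the slots of duplicate hosts, keeping first-appearance order."""
    totals = {}
    for host, slots in entries:
        totals[host] = totals.get(host, 0) + slots
    return totals


totals = combine_slots(entries)
# Hosts grouped together, one line per slot.
grouped = [host for host, slots in totals.items() for _ in range(slots)]
# Space-separated "host slots" pairs.
mcpu = " ".join(f"{host} {slots}" for host, slots in totals.items())

print("LSB_DJOB_RANKFILE:", *rank_order(entries), sep="\n")
print("LSB_DJOB_HOSTFILE:", *grouped, sep="\n")
print("LSB_MCPU_HOSTS =", mcpu)
```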

The user-specified host file is deleted along with other job-related files when a job is cleaned.
