-network

For LSF IBM Parallel Environment (IBM PE) integration. Specifies the network resource requirements to enable network-aware scheduling for IBM PE jobs.

Categories

resource

Synopsis

bsub -network "network_res_req"

Description

If any network resource requirement is specified in the job, queue, or application profile, the jobs are treated as IBM PE jobs. IBM PE jobs can only run on hosts where IBM PE pnsd daemon is running.

The network resource requirement string network_res_req has the same syntax as the NETWORK_REQ parameter defined in lsb.applications or lsb.queues.

network_res_req has the following syntax:

[type=sn_all | sn_single] [:protocol=protocol_name[(protocol_number)][,protocol_name[(protocol_number)]]] [:mode=US | IP] [:usage=shared | dedicated] [:instance=positive_integer]
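As an illustration of how the fields in this syntax fit together, the following minimal sketch splits a network_res_req string into its parts. The function parse_network_req and the KNOWN_KEYS set are hypothetical names introduced here; they are not part of LSF, and LSF's own parser may treat whitespace and errors differently.

```python
import re

# Hypothetical helper, not part of LSF: splits a network_res_req string
# into its key=value fields so the pieces of the syntax are visible.
KNOWN_KEYS = {"type", "protocol", "mode", "usage", "instance"}


def parse_network_req(req):
    """Return a dict of the fields found in a network_res_req string."""
    fields = {}
    # Fields are separated by colons; this sketch tolerates spaces around them.
    for part in req.split(":"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in KNOWN_KEYS:
            raise ValueError(f"unrecognized field: {part!r}")
        fields[key] = value

    # Split the protocol list into (name, contexts) pairs; contexts is None
    # when no protocol_number is given.
    if "protocol" in fields:
        protocols = []
        for item in fields["protocol"].split(","):
            m = re.fullmatch(r"\s*([A-Za-z_]\w*)\s*(?:\((\d+)\))?\s*", item)
            if m is None:
                raise ValueError(f"bad protocol entry: {item!r}")
            name, number = m.group(1), m.group(2)
            protocols.append((name, int(number) if number else None))
        fields["protocol"] = protocols
    return fields
```

For example, parse_network_req("protocol=mpi(2),lapi:type=sn_all") yields the protocol list [("mpi", 2), ("lapi", None)] and the type "sn_all". Other values are left as strings.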

LSF_PE_NETWORK_NUM must be defined to a non-zero value in lsf.conf for LSF to recognize the -network option. If LSF_PE_NETWORK_NUM is not defined or is set to 0, the job submission is rejected with a warning message.

The -network option overrides the value of NETWORK_REQ defined in lsb.applications or lsb.queues.

The following network resource requirement options are supported:
type=sn_all | sn_single
Specifies the adapter device type to use for message passing: either sn_all or sn_single.
sn_single
When used for switch adapters, specifies that all windows are on a single network.
sn_all
Specifies that one or more windows are on each network, and that striped communication should be used over all available switch networks. The networks specified must be accessible by all hosts selected to run the IBM PE job. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about submitting jobs that use striping.

If mode is IP and type is specified as sn_all or sn_single, the job will only run on IB adapters (IPoIB). If mode is IP and type is not specified, the job will only run on Ethernet adapters (IPoEth). For IPoEth jobs, LSF ensures the job is running on hosts where pnsd is installed and running. For IPoIB jobs, LSF ensures the job is running on hosts where pnsd is installed and running, and that InfiniBand networks are up. Because IP jobs do not consume network windows, LSF does not check if all network windows are used up or the network is already occupied by a dedicated IBM PE job.

Equivalent to the IBM PE MP_EUIDEVICE environment variable and -euidevice IBM PE flag. See the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information. Only sn_all or sn_single are supported by LSF. The other types supported by IBM PE are not supported for LSF jobs.

protocol=protocol_name[(protocol_number)]
Network communication protocol for the IBM PE job, indicating which message passing API is being used by the application. The following protocols are supported by LSF:
mpi
The application makes only MPI calls. This value applies to any MPI job regardless of the library that it was compiled with (IBM PE MPI, MPICH2).
pami
The application makes only PAMI calls.
lapi
The application makes only LAPI calls.
shmem
The application makes only OpenSHMEM calls.
user_defined_parallel_api
The application makes only calls from a parallel API that you define. For example: protocol=myAPI or protocol=charm.

The default value is mpi.

LSF also supports an optional protocol_number (for example, mpi(2)), which specifies the number of contexts (endpoints) per parallel API instance. The number must be a power of 2, but no greater than 128 (1, 2, 4, 8, 16, 32, 64, 128). LSF will pass the communication protocols to IBM PE without any change. LSF will reserve network windows for each protocol.

When you specify multiple parallel API protocols, you cannot make calls to both LAPI and PAMI (lapi, pami) or LAPI and OpenSHMEM (lapi, shmem) in the same application. Protocols can be specified in any order.

See the MP_MSG_API and MP_ENDPOINTS environment variables and the -msg_api and -endpoints IBM PE flags in the Parallel Environment Runtime Edition for AIX: Operation and Use guide (SC23-6781-05) for more information about the communication protocols that are supported by IBM PE.

mode=US | IP
The network communication system mode used by the specified communication protocol: US (User Space) or IP (Internet Protocol). The default value is US. A US job can only run with adapters that support user space communications, such as the IB adapter. IP jobs can run with either Ethernet adapters or IB adapters. When IP mode is specified, the instance number cannot be specified, and network usage must be unspecified or shared.

Each US mode instance requested by a task running on switch adapters requires an adapter window. For example, if a task requests both the MPI and LAPI protocols such that both protocol instances require US mode, two adapter windows will be used.

usage=dedicated | shared
Specifies whether the adapter can be shared with tasks of other job steps: dedicated or shared. Multiple tasks of the same job can share one network even if usage is dedicated.
instance=positive_integer
The number of parallel communication paths (windows) per task made available to the protocol on each network. The number actually used depends on the implementation of the protocol subsystem.

The default value is 1.

If the specified value is greater than MAX_PROTOCOL_INSTANCES in lsb.params or lsb.queues, LSF rejects the job.
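The constraints described for these options can be collected into a single pre-submission check. The following is a minimal sketch only: check_network_req and its parameters are hypothetical names, not an LSF API, and the checks simply restate the rules above (allowed type values, protocol_number as a power of 2 up to 128, the disallowed LAPI combinations, the IP mode restrictions, and the MAX_PROTOCOL_INSTANCES limit). LSF itself remains the authority on whether a job is accepted.

```python
# Hypothetical checker, not an LSF API: it only restates the documented rules.
ALLOWED_CONTEXTS = {1, 2, 4, 8, 16, 32, 64, 128}  # powers of 2 up to 128
CONFLICTING_PAIRS = [{"lapi", "pami"}, {"lapi", "shmem"}]


def check_network_req(protocols=(("mpi", None),), type_=None, mode="US",
                      usage=None, instance=None, max_protocol_instances=None):
    """Return a list of rule violations; an empty list means none were found.

    protocols is a sequence of (name, contexts) pairs, where contexts is the
    optional protocol_number or None. The defaults mirror the documented
    defaults (protocol mpi, mode US).
    """
    problems = []

    if type_ not in (None, "sn_all", "sn_single"):
        problems.append(f"type={type_} is not sn_all or sn_single")

    for name, contexts in protocols:
        if contexts is not None and contexts not in ALLOWED_CONTEXTS:
            problems.append(f"{name}({contexts}): not a power of 2 up to 128")

    names = {name for name, _ in protocols}
    for pair in CONFLICTING_PAIRS:
        if pair <= names:
            problems.append("cannot combine " + " and ".join(sorted(pair)))

    if mode not in ("US", "IP"):
        problems.append(f"mode={mode} is not US or IP")
    if usage not in (None, "shared", "dedicated"):
        problems.append(f"usage={usage} is not shared or dedicated")

    # In IP mode, instance cannot be given and usage cannot be dedicated.
    if mode == "IP":
        if instance is not None:
            problems.append("instance cannot be specified in IP mode")
        if usage == "dedicated":
            problems.append("usage must be unspecified or shared in IP mode")

    if instance is not None:
        if instance < 1:
            problems.append("instance must be a positive integer")
        elif (max_protocol_instances is not None
              and instance > max_protocol_instances):
            problems.append("instance exceeds MAX_PROTOCOL_INSTANCES")

    return problems
```

Returning a list of messages rather than raising on the first violation lets a wrapper script report every problem in one pass before calling bsub.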

The following IBM LoadLeveler job command file options are not supported in LSF:
  • collective_groups
  • imm_send_buffers
  • rcxtblocks

See Administering IBM Platform LSF for more information about network-aware scheduling and running and managing workload through IBM Parallel Environment.

Examples

bsub -n 2 -R "span[ptile=1]" -network "protocol=mpi,lapi: type=sn_all: instance=2: usage=shared" poe /home/user1/mpi_prog

For this job running on hostA and hostB, each task will reserve 8 windows (2*2*2), for 2 protocols, 2 instances and 2 networks. If enough network windows are available, other network jobs with usage=shared can run on hostA and hostB because networks used by this job are shared.
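The window arithmetic in this example can be written out as a short calculation. This is a sketch: windows_per_task is a hypothetical helper, and the count of 2 networks is taken from the example text, where type=sn_all stripes across the two networks available on the hosts.

```python
def windows_per_task(num_protocols, instances, num_networks):
    """Windows reserved per task: one per protocol, per instance, per network."""
    return num_protocols * instances * num_networks


# Values from the example: protocols mpi and lapi, instance=2, and 2 networks.
print(windows_per_task(num_protocols=2, instances=2, num_networks=2))  # 8
```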