You can enable LSF so applications can use Nvidia Graphics Processing Units (GPUs) or Intel MIC (Phi) co-processors in a Linux environment. LSF supports parallel jobs that request GPUs or MICs, allowing you to specify a certain number of GPUs or MICs on each node at run time, based on availability.
Specifically, LSF supports the following:
Nvidia GPUs and Intel MICs for serial and parallel jobs. Parallel jobs should be launched by blaunch.
Intel MIC (Phi co-processor) for LSF jobs in offload mode, both serial and parallel.
CUDA 4.0 to CUDA 5.5.
Linux x64: MIC is supported on Linux x64. GPUs are supported on Linux x64 for RHEL/Fedora/SLES.
LSF also supports the collection of metrics for GPUs and MICs using elims and predefined LSF resources.
Information collected by the elim GPU includes:
ngpus: Total number of GPUs
ngpus_shared: Number of GPUs in share mode
ngpus_excl_t: Number of GPUs in exclusive thread mode
ngpus_excl_p: Number of GPUs in exclusive process mode
ngpus_shared is a consumable resource in the LIM. Its value is set to the number of CPU cores on the host. You can place any number of tasks on a shared-mode GPU, but more tasks might degrade performance.
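For example, a job that runs on a shared-mode GPU host and reserves four of its ngpus_shared slots could be submitted as follows (a sketch; gpu_app stands in for your application):
bsub -R "select[ngpus>0] rusage[ngpus_shared=4]" gpu_app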
Information collected by the optional elim includes:
ngpus_prohibited: Number of GPUs prohibited
gpu_driver: GPU driver version
gpu_mode*: Mode of each GPU
gpu_temp*: Temperature of each GPU
gpu_ecc*: ECC errors for each GPU
gpu_model*: Model name of each GPU
Information collected by the elim MIC includes:
nmics: Number of MIC devices detected by elim.mic
For each co-processor, the optional elim detects:
mic_ncores*: Number of cores
mic_temp*: MIC temperature
mic_freq*: MIC frequency
mic_freemem*: MIC free memory
mic_util*: MIC utilization
mic_power*: MIC total power
* If there is more than one device, an index is appended to the resource name, starting at 0. For example, for gpu_mode you might see gpu_mode0, gpu_mode1, and gpu_mode2.
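Once the elims are running, you can display any of these numeric indices with lsload; for example (the index names assume at least one GPU and one MIC are present on some host):
$ lsload -I ngpus:gpu_temp0:gpu_ecc0:nmics:mic_temp0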
When enabling LSF support for GPU or MIC, note the following:
With LSF 9.1.2, the old elim.gpu is replaced by a new elim.gpu binary.
Checkpoint and restart are not supported.
Preemption is not supported.
Resource duration and decay are not supported.
elims built for CUDA 4.0 also work with CUDA 5.5.
To configure and use GPU or MIC resources:
Binaries for the base elim.gpu and elim.mic are located under $LSF_SERVERDIR. The source for the optional elim.gpu.ext (elim.gpu.ext.c) and its Makefile are located under LSF_TOP/9.1/misc/examples/elim.gpu.ext. The optional elim.mic.ext (a script file) is located under LSF_TOP/9.1/util/elim.mic.ext.
Ensure the elim executables are in $LSF_SERVERDIR.
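For example, a typical sequence to build and deploy the optional elims might look like the following (a sketch; it assumes the shipped Makefile supports a plain make, and paths follow the locations above):
cd LSF_TOP/9.1/misc/examples/elim.gpu.ext
make
cp elim.gpu.ext $LSF_SERVERDIR
cp LSF_TOP/9.1/util/elim.mic.ext $LSF_SERVERDIR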
For GPU support, ensure the following third-party software is installed correctly:
CUDA driver
CUDA toolkit
NVIDIA Management Library (NVML)
The CUDA samples are optional.
The CUDA version should be 4.0 or later.
Starting with CUDA 5.0, the CUDA driver, CUDA toolkit, and CUDA samples are in one package.
Nodes must have at least one Nvidia GPU from the Fermi/Kepler family. Earlier Tesla and desktop GPUs (8800 and later cards) are supported, but not all features are available for those earlier cards: cards earlier than Fermi do not support ECC error reporting, and some do not support temperature queries.
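Before configuring the elims, you can confirm that the driver and NVML are working with nvidia-smi, which ships with the driver; for example:
$ nvidia-smi --query-gpu=name,driver_version,compute_mode --format=csv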
For Intel Phi co-processor support, ensure the following third-party software is installed correctly:
Intel Phi co-processor (Knights Corner).
Intel MPSS version 2.1.4982-15 or newer.
Runtime support library/tools from Intel for Phi offload support.
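To check that MPSS is installed and the co-processors are visible to the host, you can run the micinfo utility that ships with MPSS (its path can vary by MPSS version):
$ micinfo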
Configure the LSF cluster that contains the GPU or MIC resources:
Configure lsf.shared: For GPU support, define the following resources in the Resource section, assuming that the maximum number of GPUs per host is three. The first four resources are provided by the base elim; the others are optional. ngpus is not a consumable resource. Remove changes related to the old GPU solution before defining the new one:
Begin Resource
RESOURCENAME      TYPE     INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
ngpus             Numeric  60        N           N           (Number of GPUs)
ngpus_shared      Numeric  60        N           Y           (Number of GPUs in Shared Mode)
ngpus_excl_t      Numeric  60        N           Y           (Number of GPUs in Exclusive Thread Mode)
ngpus_excl_p      Numeric  60        N           Y           (Number of GPUs in Exclusive Process Mode)
ngpus_prohibited  Numeric  60        N           N           (Number of GPUs in Prohibited Mode)
gpu_driver        String   60        ()          ()          (GPU driver version)
gpu_mode0         String   60        ()          ()          (Mode of 1st GPU)
gpu_temp0         Numeric  60        Y           ()          (Temperature of 1st GPU)
gpu_ecc0          Numeric  60        N           ()          (ECC errors on 1st GPU)
gpu_model0        String   60        ()          ()          (Model name of 1st GPU)
gpu_mode1         String   60        ()          ()          (Mode of 2nd GPU)
gpu_temp1         Numeric  60        Y           ()          (Temperature of 2nd GPU)
gpu_ecc1          Numeric  60        N           ()          (ECC errors on 2nd GPU)
gpu_model1        String   60        ()          ()          (Model name of 2nd GPU)
gpu_mode2         String   60        ()          ()          (Mode of 3rd GPU)
gpu_temp2         Numeric  60        Y           ()          (Temperature of 3rd GPU)
gpu_ecc2          Numeric  60        N           ()          (ECC errors on 3rd GPU)
gpu_model2        String   60        ()          ()          (Model name of 3rd GPU)
...
End Resource
For Intel Phi support, define the following resources in the Resource section. The first resource (nmics) is required. The others are optional:
Begin Resource
RESOURCENAME  TYPE     INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
nmics         Numeric  60        N           Y           (Number of MIC devices)
mic_temp0     Numeric  60        Y           N           (MIC device 0 CPU temp)
mic_temp1     Numeric  60        Y           N           (MIC device 1 CPU temp)
mic_freq0     Numeric  60        N           N           (MIC device 0 CPU freq)
mic_freq1     Numeric  60        N           N           (MIC device 1 CPU freq)
mic_power0    Numeric  60        Y           N           (MIC device 0 total power)
mic_power1    Numeric  60        Y           N           (MIC device 1 total power)
mic_freemem0  Numeric  60        N           N           (MIC device 0 free memory)
mic_freemem1  Numeric  60        N           N           (MIC device 1 free memory)
mic_util0     Numeric  60        Y           N           (MIC device 0 CPU utility)
mic_util1     Numeric  60        Y           N           (MIC device 1 CPU utility)
mic_ncores0   Numeric  60        N           N           (MIC device 0 number cores)
mic_ncores1   Numeric  60        N           N           (MIC device 1 number cores)
...
End Resource
Note that mic_util is a numeric resource, so lsload does not display it as an internal load index.
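You can display it explicitly by naming the indices; for example (the device indices are illustrative):
$ lsload -I nmics:mic_util0:mic_util1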
Configure lsf.cluster.<clustername>: For GPU support, define the following in the ResourceMap section. The first four resources are provided by elim.gpu; the others are optional. Remove changes related to the old GPU solution before defining the new one:
Begin ResourceMap
RESOURCENAME      LOCATION
...
ngpus             ([default])
ngpus_shared      ([default])
ngpus_excl_t      ([default])
ngpus_excl_p      ([default])
ngpus_prohibited  ([default])
gpu_mode0         ([default])
gpu_temp0         ([default])
gpu_ecc0          ([default])
gpu_mode1         ([default])
gpu_temp1         ([default])
gpu_ecc1          ([default])
gpu_mode2         ([default])
gpu_temp2         ([default])
gpu_ecc2          ([default])
...
End ResourceMap
For Intel Phi support, define the following in the ResourceMap section. The first resource (nmics) is provided by elim.mic; the others are optional:
Begin ResourceMap
RESOURCENAME  LOCATION
...
nmics         [default]
mic_temp0     [default]
mic_temp1     [default]
mic_freq0     [default]
mic_freq1     [default]
mic_power0    [default]
mic_power1    [default]
mic_freemem0  [default]
mic_freemem1  [default]
mic_util0     [default]
mic_util1     [default]
mic_ncores0   [default]
mic_ncores1   [default]
...
End ResourceMap
Configure lsb.resources: Optionally, for the ngpus_shared, ngpus_excl_t, ngpus_excl_p, and nmics resources, you can set attributes in the ReservationUsage section with the following values:
Begin ReservationUsage
RESOURCE      METHOD    RESERVE
ngpus_shared  PER_HOST  N
ngpus_excl_t  PER_HOST  N
ngpus_excl_p  PER_HOST  N
nmics         PER_TASK  N
End ReservationUsage
If this file has no configuration for GPU or MIC resources, LSF treats all of these resources as PER_HOST by default.
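After you edit lsf.shared, lsf.cluster.<clustername>, and lsb.resources, reconfigure the cluster so that the changes take effect, for example:
$ lsadmin reconfig
$ badmin mbdrestart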
Use lsload -l (or lsload -I with a list of resource names) to show GPU/MIC resources:
$ lsload -I nmics:ngpus:ngpus_shared:ngpus_excl_t:ngpus_excl_p
HOST_NAME  status  nmics  ngpus  ngpus_shared  ngpus_excl_t  ngpus_excl_p
hostA      ok      -      3.0    12.0          0.0           0.0
hostB      ok      1.0    -      -             -             -
hostC      ok      1.0    -      -             -             -
hostD      ok      1.0    -      -             -             -
hostE      ok      1.0    -      -             -             -
hostF      ok      -      3.0    12.0          0.0           0.0
hostG      ok      -      3.0    12.0          0.0           1.0
hostH      ok      -      3.0    12.0          1.0           0.0
hostI      ok      2.0    -      -             -             -
Use bhosts -l to see how the LSF scheduler has allocated GPU or MIC resources. These resources are treated as normal host-based resources:
$ bhosts -l hostA
HOST  hostA
STATUS  CPUF   JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV  DISPATCH_WINDOW
ok      60.00  -     12   2      2    0      0      0    -

CURRENT LOAD USED FOR SCHEDULING:
          r15s  r1m  r15m  ut  pg   io  ls  it  tmp  swp   mem    slots  nmics
Total     0.0   0.0  0.0   0%  0.0  3   4   0   28G  3.9G  22.5G  10     0.0
Reserved  0.0   0.0  0.0   0%  0.0  0   0   0   0M   0M    0M     -      -

          ngpus  ngpus_shared  ngpus_excl_t  ngpus_excl_p
Total     3.0    10.0          0.0           0.0
Reserved  0.0    2.0           0.0           0.0

LOAD THRESHOLD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched  -     -    -     -   -   -   -   -   -    -    -
loadStop   -     -    -     -   -   -   -   -   -    -    -

           nmics  ngpus  ngpus_shared  ngpus_excl_t  ngpus_excl_p
loadSched  -      -      -             -             -
loadStop   -      -      -             -             -
Use lshosts -l to see the information for GPUs and Phi co-processors collected by the elims:
$ lshosts -l hostA
HOST_NAME: hostA
type    model        cpuf  ncpus  ndisks  maxmem  maxswp  maxtmp  rexpri  server  nprocs  ncores  nthreads
X86_64  Intel_EM64T  60.0  12     1       23.9G   3.9G    40317M  0       Yes     2       6       1

RESOURCES: (mg)
RUN_WINDOWS: (always open)

LOAD_THRESHOLDS:
r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem  nmics  ngpus  ngpus_shared  ngpus_excl_t  ngpus_excl_p
-     3.5  -     -   -   -   -   -   -    -    -    -      -      -             -             -
Submit jobs: Use a selection string to choose hosts that have GPU or MIC resources, and use rusage[] to tell LSF how many GPU or MIC resources to use. The following are some examples:
Use a GPU in shared mode:
bsub -R "select[ngpus>0] rusage[ngpus_shared=2]" gpu_app
Use a GPU in exclusive thread mode for a PMPI job:
bsub -n 2 -R "select[ngpus>0] rusage[ngpus_excl_t=2]" mpirun -lsf gpu_app1
Use a GPU in exclusive process mode for a PMPI job:
bsub -n 4 -R "select[ngpus>0] rusage[ngpus_excl_p=2]" mpirun -lsf gpu_app2
Use MIC in a PMPI job:
bsub -n 4 -R "rusage[nmics=2]" mpirun -lsf mic_app
Request Phi co-processors:
bsub -R "rusage[nmics=n]"
Consume one MIC on the execution host:
bsub -R "rusage[nmics=1]" mic_app
Run the job on one host and consume 2 MICs on that host:
bsub -R "rusage[nmics=2]" mic_app
Run a job on 1 host with 8 tasks on it, using 2 ngpus_excl_p in total:
bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_p=2] span[hosts=1]" mpirun -lsf gpu_app2
Run a job on 8 hosts with 1 task per host, where each task uses 2 ngpus_shared on its host:
bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_shared=2] span[ptile=1]" mpirun -lsf gpu_app2
Run a job on 4 hosts with 2 tasks per host, where the tasks use a total of 2 ngpus_excl_t per host:
bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_t=2] span[ptile=2]" mpirun -lsf gpu_app2