Define GPU or MIC resources

You can configure LSF so that applications can use Nvidia Graphics Processing Units (GPUs) or Intel MIC (Phi co-processors) in a Linux environment. LSF supports parallel jobs that request GPUs or MICs, so you can specify a number of GPUs or MICs on each node at run time, based on availability.

Specifically, LSF supports the following:

Nvidia GPUs and Intel MICs for serial and parallel jobs. Parallel jobs should be launched by blaunch.
Intel MIC (Phi co-processor) for LSF jobs in offload mode, both serial and parallel.
CUDA 4.0 to CUDA 5.5.
Linux x64: MIC supports Linux x64. Linux-based GPUs support x64 for RHEL/Fedora/SLES.

LSF also supports the collection of metrics for GPUs and MICs using elims and predefined LSF resources.

Information collected by the elim GPU includes:

ngpus: Total number of GPUs
ngpus_shared: Number of GPUs in shared mode
ngpus_excl_t: Number of GPUs in exclusive thread mode
ngpus_excl_p: Number of GPUs in exclusive process mode

ngpus_shared is a consumable resource in the LIM. Its value is set to the number of CPU cores. You can place any number of tasks on a shared-mode GPU, but more tasks might degrade performance.

Information collected by the optional elim includes:

ngpus_prohibited: Number of GPUs prohibited
gpu_driver: GPU driver version
gpu_mode*: Mode of each GPU
gpu_temp*: Temperature of each GPU
gpu_ecc*: ECC errors for each GPU
gpu_model*: Model name of each GPU

Information collected by the elim MIC includes:

The elim MIC detects the number of MICs: nmics
For each co-processor, the optional elim detects:
- mic_ncores*: Number of cores
- mic_temp*: MIC temperature
- mic_freq*: MIC frequency
- mic_freemem*: MIC free memory
- mic_util*: MIC utilization
- mic_power*: MIC total power

* If a host has more than one device, each resource name carries a device index, starting at 0. For example, for gpu_mode you might see gpu_mode0, gpu_mode1 and gpu_mode2.
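The indexed naming can be illustrated with a short Python sketch. It is not part of LSF: the helper name indexed_names is hypothetical, and it assumes that an index is always appended to the base name, starting at 0.

```python
def indexed_names(base, count):
    """Return per-device resource names such as gpu_mode0, gpu_mode1, ...

    base is the resource name without the trailing asterisk, and count is
    the number of devices on the host. Indexes start at 0.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return [f"{base}{i}" for i in range(count)]


print(indexed_names("gpu_mode", 3))  # ['gpu_mode0', 'gpu_mode1', 'gpu_mode2']
```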

When enabling LSF support for GPU or MIC, note the following:

With LSF 9.1.2, the old elim.gpu is replaced with the new elim.gpu.
Checkpoint and restart are not supported.
Preemption is not supported.
Resource duration and decay are not supported.
elims for CUDA 4.0 can work with CUDA 5.5.

Configure and use GPU or MIC resources

To configure and use GPU or MIC resources:

Binaries for the base elim.gpu and elim.mic are located under $LSF_SERVERDIR. The source for the optional elim.gpu.ext (elim.gpu.ext.c) and its Makefile are located under LSF_TOP/9.1/misc/examples/elim.gpu.ext. The optional elim.mic.ext (a script file) is located under LSF_TOP/9.1/util/elim.mic.ext.

Ensure elim executables are in LSF_SERVERDIR.

For GPU support, ensure the following third-party software is installed correctly:
- CUDA driver
- CUDA toolkit
- NVIDIA Management Library (NVML)
- CUDA sample is optional.
- CUDA version should be 4.0 or higher.
- From CUDA 5.0, the CUDA driver, CUDA toolkit and CUDA samples are in one package.
- Nodes must have at least one Nvidia GPU from the Fermi/Kepler family. Earlier Tesla GPUs and desktop GPUs (8800 and later cards) are also supported, but not all features are available for them. Cards earlier than Fermi do not support ECC error reporting, and some do not support temperature queries.
For Intel Phi Co-processor support, ensure the following third-party software is installed correctly:
- Intel Phi Co-processor (Knights Corner).
- Intel MPSS version 2.1.4982-15 or newer.
- Runtime support library/tools from Intel for Phi offload support.

Configure the LSF cluster that contains the GPU or MIC resources:

Configure lsf.shared: For GPU support, define the following resources in the Resource section, assuming that the maximum number of GPUs per host is three. The first four resources (ngpus, ngpus_shared, ngpus_excl_t and ngpus_excl_p) are provided by the base elim. The others are optional. ngpus is not a consumable resource. Remove any changes related to the old GPU solution before defining the new one:

Begin Resource
RESOURCENAME     TYPE      INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
ngpus            Numeric   60        N           N           (Number of GPUs)
ngpus_shared     Numeric   60        N           Y           (Number of GPUs in Shared Mode)
ngpus_excl_t     Numeric   60        N           Y           (Number of GPUs in Exclusive Thread Mode)
ngpus_excl_p     Numeric   60        N           Y           (Number of GPUs in Exclusive Process Mode)
ngpus_prohibited Numeric   60        N           N           (Number of GPUs in Prohibited Mode)
gpu_driver       String    60        ()          ()          (GPU driver version)
gpu_mode0        String    60        ()          ()          (Mode of 1st GPU)
gpu_temp0        Numeric   60        Y           ()          (Temperature of 1st GPU)
gpu_ecc0         Numeric   60        N           ()          (ECC errors on 1st GPU)
gpu_model0       String    60        ()          ()          (Model name of 1st GPU) 
gpu_mode1        String    60        ()          ()          (Mode of 2nd GPU)
gpu_temp1        Numeric   60        Y           ()          (Temperature of 2nd GPU)
gpu_ecc1         Numeric   60        N           ()          (ECC errors on 2nd GPU)
gpu_model1       String    60        ()          ()          (Model name of 2nd GPU)
gpu_mode2        String    60        ()          ()          (Mode of 3rd GPU)
gpu_temp2        Numeric   60        Y           ()          (Temperature of 3rd GPU)
gpu_ecc2         Numeric   60        N           ()          (ECC errors on 3rd GPU)
gpu_model2       String    60        ()          ()          (Model name of 3rd GPU)
...
End Resource
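When hosts carry more GPUs than the three assumed above, the per-GPU rows follow a regular pattern. The following is a minimal Python sketch, not part of LSF, that generates those rows for a given maximum number of GPUs. The function name per_gpu_rows, the column widths and the index-based description text are assumptions made for illustration.

```python
# Optional per-GPU resources from the table above:
# (base name, TYPE, INCREASING, description template)
PER_GPU = [
    ("gpu_mode", "String", "()", "Mode of GPU {i}"),
    ("gpu_temp", "Numeric", "Y", "Temperature of GPU {i}"),
    ("gpu_ecc", "Numeric", "N", "ECC errors on GPU {i}"),
    ("gpu_model", "String", "()", "Model name of GPU {i}"),
]


def per_gpu_rows(max_gpus, interval=60):
    """Return lsf.shared Resource rows for GPUs 0 .. max_gpus - 1."""
    rows = []
    for i in range(max_gpus):
        for base, rtype, increasing, desc in PER_GPU:
            name = f"{base}{i}"
            # CONSUMABLE is left as () for these rows, as in the table above.
            rows.append(
                f"{name:<16} {rtype:<9} {interval:<9} {increasing:<11} "
                f"{'()':<11} ({desc.format(i=i)})"
            )
    return rows


# Print the rows for hosts with up to four GPUs.
print("\n".join(per_gpu_rows(4)))
```

The output is meant to be pasted between the base resources and End Resource, then reviewed by hand before reconfiguring the cluster.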

For Intel Phi support, define the following resources in the Resource section. The first resource (nmics) is required. The others are optional:

Begin Resource 
RESOURCENAME TYPE    INTERVAL  INCREASING  CONSUMABLE  DESCRIPTION
nmics        Numeric 60        N           Y           (Number of MIC devices)
mic_temp0    Numeric 60        Y           N           (MIC device 0 CPU temp)
mic_temp1    Numeric 60        Y           N           (MIC device 1 CPU temp)
mic_freq0    Numeric 60        N           N           (MIC device 0 CPU freq)
mic_freq1    Numeric 60        N           N           (MIC device 1 CPU freq)
mic_power0   Numeric 60        Y           N           (MIC device 0 total power)
mic_power1   Numeric 60        Y           N           (MIC device 1 total power)
mic_freemem0 Numeric 60        N           N           (MIC device 0 free memory)
mic_freemem1 Numeric 60        N           N           (MIC device 1 free memory)
mic_util0    Numeric 60        Y           N           (MIC device 0 CPU utility)
mic_util1    Numeric 60        Y           N           (MIC device 1 CPU utility)
mic_ncores0  Numeric 60        N           N           (MIC device 0 number cores)
mic_ncores1  Numeric 60        N           N           (MIC device 1 number cores)
...
End Resource

Note that mic_util is a numeric resource, so lsload will not display it as the internal resource.

Configure lsf.cluster.<clustername>: For GPU support, define the following in the ResourceMap section. The first four resources are provided by elim.gpu. The others are optional. Remove any changes related to the old GPU solution before defining the new one:

Begin ResourceMap
RESOURCENAME      LOCATION
...
ngpus             ([default])
ngpus_shared      ([default])
ngpus_excl_t      ([default])
ngpus_excl_p      ([default])
ngpus_prohibited  ([default])
gpu_mode0         ([default])
gpu_temp0         ([default])
gpu_ecc0          ([default])
gpu_mode1         ([default])
gpu_temp1         ([default])
gpu_ecc1          ([default])
gpu_mode2         ([default])
gpu_temp2         ([default])
gpu_ecc2          ([default])
gpu_mode3         ([default])
gpu_temp3         ([default])
gpu_ecc3          ([default])
...
End ResourceMap

For Intel Phi support, define the following in the ResourceMap section. The first resource (nmics) is provided by elim.mic. The others are optional:

Begin ResourceMap
RESOURCENAME      LOCATION
...
nmics             [default]
mic_temp0         [default]
mic_temp1         [default]
mic_freq0         [default]
mic_freq1         [default]
mic_power0        [default]
mic_power1        [default]
mic_freemem0      [default]
mic_freemem1      [default]
mic_util0         [default]
mic_util1         [default]
mic_ncores0       [default]
mic_ncores1       [default]
...
End ResourceMap
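Every resource that is mapped in the ResourceMap section also needs a definition in the Resource section of lsf.shared. The following Python sketch shows one way to cross-check the two files before reconfiguring. It is not an LSF tool: the function names are hypothetical, and the parsing assumes the simple one-resource-per-line layout used in the examples above.

```python
def section_names(text, section):
    """Return resource names listed between 'Begin <section>' and 'End <section>'."""
    names, inside = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line == f"Begin {section}":
            inside = True
        elif line == f"End {section}":
            inside = False
        elif inside and line and not line.startswith(("#", "RESOURCENAME", "...")):
            # The resource name is the first whitespace-separated field.
            names.append(line.split()[0])
    return names


def undefined_in_shared(shared_text, cluster_text):
    """Return mapped resource names that have no definition in lsf.shared."""
    defined = set(section_names(shared_text, "Resource"))
    mapped = section_names(cluster_text, "ResourceMap")
    return [name for name in mapped if name not in defined]


shared = """Begin Resource
RESOURCENAME TYPE    INTERVAL INCREASING CONSUMABLE DESCRIPTION
nmics        Numeric 60       N          Y          (Number of MIC devices)
mic_temp0    Numeric 60       Y          N          (MIC device 0 CPU temp)
End Resource
"""
cluster = """Begin ResourceMap
RESOURCENAME LOCATION
nmics        [default]
mic_temp0    [default]
mic_temp1    [default]
End ResourceMap
"""
print(undefined_in_shared(shared, cluster))  # ['mic_temp1']
```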

Configure lsb.resources: Optionally, for ngpus_shared, ngpus_excl_t, ngpus_excl_p and nmics, you can set attributes in the ReservationUsage section with the following values:
```
Begin ReservationUsage 
RESOURCE         METHOD        RESERVE
ngpus_shared     PER_HOST      N
ngpus_excl_t     PER_HOST      N
ngpus_excl_p     PER_HOST      N
nmics            PER_TASK      N
End ReservationUsage
```
If this file has no configuration for GPU or MIC resources, by default LSF considers all resources as PER_HOST.
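The effect of the METHOD column can be sketched in Python as follows. The sketch assumes that PER_HOST reserves the rusage amount once on each host, and that PER_TASK reserves it once for every task placed on the host. The helper reserved_on_host is hypothetical and covers only the two methods shown in the table.

```python
def reserved_on_host(amount, tasks_on_host, method="PER_HOST"):
    """Return how much of a resource one job reserves on a single host.

    amount is the value given in rusage[], and tasks_on_host is the number
    of the job's tasks that run on that host.
    """
    if method == "PER_HOST":
        return amount
    if method == "PER_TASK":
        return amount * tasks_on_host
    raise ValueError(f"unsupported method: {method}")


# rusage[ngpus_excl_p=2] with 8 tasks on one host, PER_HOST: 2 GPUs in total.
print(reserved_on_host(2, 8, "PER_HOST"))  # 2
# rusage[nmics=2] with 4 tasks on one host, PER_TASK: 8 MICs in total.
print(reserved_on_host(2, 4, "PER_TASK"))  # 8
```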

Use lsload -I to show GPU/MIC resources:

$ lsload -I nmics:ngpus:ngpus_shared:ngpus_excl_t:ngpus_excl_p
HOST_NAME       status nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
hostA           ok     -     3.0   12.0         0.0          0.0
hostB           ok     1.0   -     -            -            -
hostC           ok     1.0   -     -            -            -
hostD           ok     1.0   -     -            -            -
hostE           ok     1.0   -     -            -            -
hostF           ok     -     3.0   12.0         0.0          0.0
hostG           ok     -     3.0   12.0         0.0          1.0
hostH           ok     -     3.0   12.0         1.0          0.0
hostI           ok     2.0   -     -            -            -
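Output in this form is straightforward to post-process. The following Python sketch parses it into a dictionary per host. It assumes the layout shown above: a header line followed by one line per host, a single-word status, and a hyphen for a resource that the host does not report. The function name parse_lsload is hypothetical.

```python
def parse_lsload(text):
    """Parse 'lsload -I ...' output into {host: {"status": ..., resource: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()  # HOST_NAME, status, then the resource names
    hosts = {}
    for ln in lines[1:]:
        fields = ln.split()
        host, status, values = fields[0], fields[1], fields[2:]
        hosts[host] = {"status": status}
        for name, value in zip(header[2:], values):
            # A hyphen means the host does not report this resource.
            hosts[host][name] = None if value == "-" else float(value)
    return hosts


sample = """HOST_NAME status nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
hostA     ok     -     3.0   12.0         0.0          0.0
hostB     ok     1.0   -     -            -            -
hostI     ok     2.0   -     -            -            -
"""
hosts = parse_lsload(sample)
print([h for h, r in hosts.items() if r["ngpus"]])  # ['hostA']
```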

Use bhosts -l to see how the LSF scheduler has allocated GPU or MIC resources. These resources are treated as normal host-based resources:

$ bhosts -l hostA
HOST  hostA
STATUS   CPUF  JL/U   MAX  NJOBS  RUN  SSUSP  USUSP  RSV DISPATCH_WINDOW
ok       60.00  -     12   2      2    0      0      0   -
 
CURRENT LOAD USED FOR SCHEDULING:
         r15s  r1m  r15m  ut  pg   io  ls it tmp  swp   mem   slots nmics
Total    0.0   0.0  0.0   0%  0.0  3   4  0  28G  3.9G  22.5G  10   0.0
Reserved 0.0   0.0  0.0   0%  0.0  0   0  0  0M   0M    0M      -    - 
 
          ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
Total     3.0   10.0         0.0          0.0
Reserved  0.0   2.0          0.0          0.0
 
LOAD THRESHOLD USED FOR SCHEDULING:
           r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -    -     -    -   -   -   -   -   -    -    -  
loadStop    -    -     -    -   -   -   -   -   -    -    -  
 
            nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p 
loadSched   -     -     -            -            -  
loadStop    -     -     -            -            -

Use lshosts -l to see the information for GPUs and Phi co-processors collected by elim:

$ lshosts -l hostA
 
HOST_NAME:  hostA
type    model        cpuf ncpus ndisks maxmem maxswp maxtmp rexpri server nprocs ncores nthreads
X86_64  Intel_EM64T  60.0 12    1      23.9G  3.9G   40317M 0      Yes    2      6      1
 
RESOURCES: (mg)
RUN_WINDOWS:  (always open)
 
LOAD_THRESHOLDS:
r15s  r1m  r15m ut pg io ls it tmp swp mem nmics ngpus ngpus_shared ngpus_excl_t ngpus_excl_p
-     3.5  -    -  -  -  -  -  -   -   -   -     -     -            -            -

Submit jobs: Use the selection string (select[]) to choose hosts that have GPU or MIC resources. Use rusage[] to tell LSF how many GPU or MIC resources to use. The following are some examples:
- Use a GPU in shared mode:
  
  bsub -R "select[ngpus>0] rusage[ngpus_shared=2]" gpu_app
- Use a GPU in exclusive thread mode for a PMPI job:
  
  bsub -n 2 -R "select[ngpus>0] rusage[ngpus_excl_t=2]" mpirun -lsf gpu_app1
- Use a GPU in exclusive process mode for a PMPI job:
  
  bsub -n 4 -R "select[ngpus>0] rusage[ngpus_excl_p=2]" mpirun -lsf gpu_app2
- Use MIC in a PMPI job:
  
  bsub -n 4 -R "rusage[nmics=2]" mpirun -lsf mic_app
- Request Phi co-processors:
  
  bsub -R "rusage[nmics=n]"
- Consume one MIC on the execution host:
  
  bsub -R "rusage[nmics=1]" mic_app
- Run the job on one host and consume 2 MICs on that host:
  
  bsub -R "rusage[nmics=2]" mic_app
- Run a job on 1 host with 8 tasks on it, using 2 ngpus_excl_p in total:
  
  bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_p=2] span[hosts=1]" mpirun -lsf gpu_app2
- Run a job on 8 hosts with 1 task per host, where every task uses 2 ngpus_shared per host:
  
  bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_shared=2] span[ptile=1]" mpirun -lsf gpu_app2
- Run a job on 4 hosts with 2 tasks per host, where the tasks use a total of 2 ngpus_excl_t per host:
  
  bsub -n 8 -R "select[ngpus > 0] rusage[ngpus_excl_t=2] span[ptile=2]" mpirun -lsf gpu_app2
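The submission examples all combine the same three sections: select[], rusage[] and span[]. The following Python sketch assembles such a resource requirement string and a quoted bsub command line. The helper names gpu_resreq and bsub_command are hypothetical, and the sketch only builds the command text; it does not submit anything.

```python
import shlex


def gpu_resreq(resource, amount, span=None, select="ngpus>0"):
    """Build a resource requirement string for bsub -R.

    Pass select=None for MIC jobs that need no select[] section.
    """
    parts = []
    if select:
        parts.append(f"select[{select}]")
    parts.append(f"rusage[{resource}={amount}]")
    if span:
        parts.append(f"span[{span}]")
    return " ".join(parts)


def bsub_command(resreq, command, ntasks=None):
    """Return a shell-quoted bsub command line as a single string."""
    argv = ["bsub"]
    if ntasks is not None:
        argv += ["-n", str(ntasks)]
    argv += ["-R", resreq] + list(command)
    return " ".join(shlex.quote(arg) for arg in argv)


resreq = gpu_resreq("ngpus_excl_p", 2, span="hosts=1")
print(bsub_command(resreq, ["mpirun", "-lsf", "gpu_app2"], ntasks=8))
# bsub -n 8 -R 'select[ngpus>0] rusage[ngpus_excl_p=2] span[hosts=1]' mpirun -lsf gpu_app2
```

Quoting the string with shlex.quote keeps the brackets and the > character from being interpreted by the shell.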