Define the dynamic external resources in lsf.shared. By default, these resources are host-based (local to each host) until the LSF administrator configures a resource-to-host-mapping in the ResourceMap section of lsf.cluster.cluster_name. The presence of the dynamic external resource in lsf.shared and lsf.cluster.cluster_name triggers LSF to start the elim executables.
You must run the command lsadmin reconfig followed by badmin mbdrestart to apply changes.
Create one or more elim executables in the directory specified by the parameter LSF_SERVERDIR. LSF does not include a default elim; you should write your own executable to meet the requirements of your site. The section Create an elim executable provides guidelines for writing an elim.
IBM® General Parallel File System (GPFS™) is a high-performance shared-disk file system that supports the AIX®, Linux, and Windows operating systems. GPFS is a parallel file system rather than only a clustered one, which is what allows it to scale to very large configurations. Using Platform RTM, you can monitor GPFS data.
In the RTM GUI, you can monitor GPFS per LSF host and per LSF cluster, either as a whole or per volume.
Host level:
Average MB In/Out per second
Maximum MB In/Out per second
Average file Reads/Writes per second
Average file Opens/Closes/Directory Reads/Node Updates per second
Cluster level:
MB available capacity In/Out
Resources can be reserved and used up to the maximum bandwidth that is currently available. For example, to reserve 100 kbytes of inbound bandwidth at cluster level for 20 minutes, run:
bsub -q normal -R "rusage[gtotalin=100:duration=20]" ./myapplication myapplication_options
Configure the following ELIMs in LSF before proceeding:
elim.gpfshost - Monitors GPFS performance counters at LSF host level
elim.gpfsglobal - Monitors available GPFS bandwidth at LSF cluster level
The ELIM scripts are available for LSF 9.1.1 and later versions.
Configure the constants in elim.gpfshost:
Set VOLUMES to the name of the GPFS file system to monitor.
[Optional] Set CHECK_INTERVAL, FLOATING_AVG_INTERVAL, and DECIMAL_DIGITS.
Configure the constants in elim.gpfsglobal:
Set VOLUMES to the name of the GPFS file system to monitor.
Set MAX_INBOUND to the maximum write bandwidth for each GPFS file system.
Set MAX_OUTBOUND to the maximum read bandwidth for each GPFS file system.
[Optional] Set CHECK_INTERVAL, FLOATING_AVG_INTERVAL, and DECIMAL_DIGITS.
| Configuration file | Parameter and syntax | Description |
| --- | --- | --- |
| lsf.shared | RESOURCENAME resource_name | |
| | TYPE Numeric | |
| | INTERVAL seconds | |
| | INCREASING Y \| N | |
| | RELEASE Y \| N | |
| | DESCRIPTION description | |
Once external resources are defined in lsf.shared, they must be mapped to hosts in the ResourceMap section of lsf.cluster.cluster_name.
| Configuration file | Parameter and syntax | Default behavior |
| --- | --- | --- |
| lsf.cluster.cluster_name | RESOURCENAME resource_name | |
| | LOCATION | |
You can write one or more elim executables. The load index names defined in your elim executables must be the same as the external resource names defined in the lsf.shared configuration file.
| Operating system | Naming convention |
| --- | --- |
| UNIX | LSF_SERVERDIR/elim.application |
| Windows | LSF_SERVERDIR\elim.application.exe or LSF_SERVERDIR\elim.application.bat |
The name elim.user is reserved for backward compatibility. Do not use the name elim.user for your application-specific elim.
LSF invokes any elim that follows this naming convention, so move backup copies out of LSF_SERVERDIR or choose a name that does not follow the convention. For example, use elim_backup instead of elim.backup.
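As a rough illustration of the naming rule, the following Python sketch classifies a file name found in LSF_SERVERDIR. The function name `elim_name_status` and its three labels are hypothetical. The sketch encodes only the elim.application pattern and the reserved elim.user name described above; it ignores the Windows .exe and .bat suffixes and any other rules LSF may apply.

```python
def elim_name_status(filename):
    """Classify a file name against the elim.application naming convention.

    Hypothetical helper: it only checks the pattern described in the text.
    """
    prefix = "elim."
    if filename == "elim.user":
        # Reserved for backward compatibility; not for site-specific elims.
        return "reserved"
    if filename.startswith(prefix) and len(filename) > len(prefix):
        # Matches elim.application, so LSF would treat it as an elim.
        return "invoked"
    # Anything else (for example elim_backup) does not match the convention.
    return "ignored"
```

A check like this could be run over a directory listing to spot backup copies such as elim.backup that would otherwise be started as elims.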
Exit upon receipt of a SIGTERM signal from the load information manager (LIM).
| Value | Defines |
| --- | --- |
| number_indices | |
| index_name | |
| index_value | |
For example, the string
3 tmp2 47.5 nio 344.0 tmp 5
reports three indices: tmp2, nio, and tmp, with values 47.5, 344.0, and 5, respectively.
The load update string must end with only one \n or only one space. On Windows, echo adds \n.
The load update string must report values between -INFINIT_LOAD and INFINIT_LOAD as defined in the lsf.h header file.
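The format rules above can be sketched in Python as a small helper that builds one load update string. The name `format_load_update` is hypothetical. The numeric value used for INFINIT_LOAD is an assumption and should be confirmed against lsf.h at your site; treating the bounds as exclusive is also an assumption.

```python
# Assumed value of INFINIT_LOAD; confirm it against lsf.h before relying on it.
ASSUMED_INFINIT_LOAD = float(0x7FFFFFFF)


def format_load_update(indices, limit=ASSUMED_INFINIT_LOAD):
    """Build 'number_indices name value ...' terminated by exactly one newline.

    indices is a list of (index_name, index_value) pairs.
    """
    fields = [str(len(indices))]
    for name, value in indices:
        # Reject values outside the allowed range (this also rejects NaN).
        if not -limit < value < limit:
            raise ValueError(f"value for {name} is out of range: {value}")
        fields.extend([name, str(value)])
    return " ".join(fields) + "\n"
```

For example, `format_load_update([("tmp2", 47.5), ("nio", 344.0), ("tmp", 5)])` yields the example string shown earlier, followed by a single newline.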
If the elim executable is a C program, check the return value of printf(3s).
If the elim executable is a shell script, check the return code of /bin/echo(1).
If the elim executable is implemented as a C program, use setbuf(3) during initialization to send unbuffered output to stdout.
Each LIM sends updated load information to the master LIM every 15 seconds; the elim executable should write the load update string at most once every 15 seconds. If the external load index values rarely change, program the elim to report the new values only when a change is detected.
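The reporting guidelines above (unbuffered output, at most one update per 15 seconds, and reporting only on change) might look like the following Python sketch. The source describes C and shell elims, so using Python here is an assumption, and `report_changes` and its parameters are hypothetical names. The sketch installs no signal handler, so SIGTERM keeps its default disposition and terminates the process.

```python
import io
import time


def report_changes(read_indices, out, interval=15.0, sleep=time.sleep, max_cycles=None):
    """Write a load update string only when the sampled values change.

    read_indices returns a list of (index_name, index_value) pairs.
    max_cycles exists only so the loop can be bounded in a demonstration.
    """
    last = None
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        current = read_indices()
        if current != last:
            fields = [str(len(current))]
            for name, value in current:
                fields += [name, str(value)]
            # A failed write raises an exception, the Python analogue of
            # checking the return value of printf or echo.
            out.write(" ".join(fields) + "\n")
            out.flush()  # push the line out immediately, like unbuffered stdout
            last = current
        cycles += 1
        sleep(interval)  # never report more often than once per interval


# Demonstration with canned samples and no real sleeping.
samples = iter([[("tmp", 20)], [("tmp", 20)], [("tmp", 18)]])
demo_out = io.StringIO()
report_changes(lambda: next(samples), demo_out, sleep=lambda s: None, max_cycles=3)
```

In the demonstration, the second sample repeats the first, so only two update strings are written. A real elim would pass sys.stdout as `out` and leave `max_cycles` unset.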
If you map any external resource as default in lsf.cluster.cluster_name, all elim executables in LSF_SERVERDIR run on all hosts in the cluster. If LSF_SERVERDIR contains more than one elim executable, you should include a header that checks whether the elim is programmed to report values for the resources expected on the host. For detailed information about using a checking header, see the section How environment variables determine elim hosts.
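A checking header of this kind could be sketched in Python as follows. This section does not name the environment variable involved; the sketch assumes that LIM exports the resources mapped to the host as a space-separated list in LSF_RESOURCES, which should be verified against the referenced section. The function name `host_expects` is hypothetical.

```python
import os


def host_expects(my_resources, environ=None):
    """Return True if any resource this elim reports is mapped to this host.

    Assumes LSF_RESOURCES holds a space-separated list of resource names.
    """
    env = os.environ if environ is None else environ
    mapped = set(env.get("LSF_RESOURCES", "").split())
    return any(name in mapped for name in my_resources)
```

An elim would call this once at startup and exit immediately when it returns False, so that only the elims relevant to a host keep running there.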
An elim executable can be used to override the value of a built-in load index. For example, if your site stores temporary files in the /usr/tmp directory, you might want to monitor the amount of space available in that directory. An elim can report the space available in the /usr/tmp directory as the value for the tmp built-in load index.
To override a built-in load index value, write an elim executable that periodically measures the value of the dynamic external resource and writes the numeric value to standard output. The external load index must correspond to a numeric, dynamic external resource as defined by TYPE and INTERVAL in lsf.shared.
You can find the built-in load index type and name in the lsinfo output.
For example, an elim collects available space under /usr/tmp as 20M. Then, it can report the value as available tmp space (the built-in load index tmp) in the load update string: 1 tmp 20.
The following built-in load indices cannot be overridden by an elim: logins, idle, cpu, and swap.
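The /usr/tmp example can be sketched in Python as below. The function name `tmp_override_update` and the injectable `disk_usage` parameter are hypothetical, and reporting whole megabytes is an assumption based on the 20M example above.

```python
import shutil


def tmp_override_update(path="/usr/tmp", disk_usage=shutil.disk_usage):
    """Return a load update string that reports free space under path as tmp.

    disk_usage is injectable so the function can be exercised without
    touching the real file system.
    """
    free_mb = disk_usage(path).free // (1024 * 1024)  # bytes to whole MB
    return f"1 tmp {free_mb}\n"
```

With 20 MB free under /usr/tmp, the function returns the string 1 tmp 20 followed by a newline, matching the load update string in the example.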
| Attribute name | Attribute type | Resource name |
| --- | --- | --- |
| OperatingSystemName | string | osname |
| OperatingSystemVersion | string | osver |
| CPUArchitectureName | string | cpuarch |
| IndividualCPUSpeed | int64 | cpuspeed |
| IndividualNetworkBandwidth | int64 | bandwidth (This is the maximum bandwidth). |
The file elim.jsdl is automatically configured to collect these resources. To enable the use of elim.jsdl, uncomment the lines for these resources in the ResourceMap section of the file lsf.cluster.cluster_name.
See the section How environment variables determine elim hosts for an example of a simple elim script.
You can find more elim examples in the LSF_MISC/examples directory. The elim.c file is an elim written in C. You can modify this example to collect the external load indices that are required at your site.