badmin

Administrative tool for LSF.

Synopsis

badmin subcommand options
badmin [-h | -V]

Description

Important:

This command can only be used by LSF administrators.

badmin provides a set of subcommands to control and monitor LSF. If no subcommands are supplied for badmin, you are prompted for a subcommand from standard input.

Information about each subcommand is available through the -h option.

The badmin subcommands include privileged and non-privileged subcommands. Privileged subcommands can only be invoked by root or LSF administrators. Privileged subcommands are:

diagnose

reconfig

mbdrestart

qopen

qclose

qact

qinact

hopen

hclose

hpower

hrestart

hshutdown

hstartup

hghostadd

hghostdel

perflog

perfmon

To allow a non-root user to run the privileged subcommand hstartup, the configuration file lsf.sudoers must be set up.

All other subcommands are non-privileged and can be invoked by any LSF user. If the privileged subcommands are to be executed by the LSF administrator, badmin must be installed setuid root, because it needs to send the request using a privileged port.

When using subcommands for which multiple hosts can be specified, do not enclose the host names in quotation marks.

Subcommand synopsis

ckconfig [-v]
diagnose pending_jobID ...
diagnose -c query [ [-f logfile_name] [-d duration] | [-o]]
reconfig [-v] [-f]
mbdrestart [-C comment] [-v] [-f] [-p]
qopen [-C comment] [queue_name ... | all]
qclose [-C comment] [queue_name ... | all]
qact [-C comment] [queue_name ... | all]
qinact [-C comment] [queue_name ... | all]
qhist [-t time0,time1] [-f logfile_name] [queue_name ...]
hopen [-C comment] [host_name ... | host_group ... | compute_unit ... | all]
hclose [-C comment] [host_name ... | host_group ... | compute_unit ... | all]
hpower [suspend | resume] [-C comment] [host_name ...]
hrestart [-f] [host_name ... | all]
hshutdown [-f] [host_name ... | all]
hstartup [-f] [host_name ... | all]
hhist [-t time0,time1] [-f logfile_name] [host_name ...]
mbdhist [-t time0,time1] [-f logfile_name]
hist [-t time0,time1] [-f logfile_name]
hghostadd [-C comment] host_group | compute_unit | host_name [host_name ...]
hghostdel [-f] [-C comment] host_group | compute_unit | host_name [host_name ...]
help [command ...] | ? [command ...]
quit
mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]
mbdtime [-l timing_level] [-f logfile_name] [-o]
sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]
sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]
schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]
schdtime [-l timing_level] [-f logfile_name] [-o]
showconf mbd | [sbd [ host_name ... | all ]]
showstatus
perflog [-t sample_period] [-d duration] [-f logfile_name] [-o]
perfmon start [sample_period]| stop | view | setperiod sample_period
-h
-V

Options

subcommand

Executes the specified subcommand. See Usage section.

-h

Prints command usage to stderr and exits.

-V

Prints LSF release version to stderr and exits.

Usage

ckconfig [-v]

Checks LSF configuration files located in the LSB_CONFDIR/cluster_name/configdir directory, and checks LSF_ENVDIR/lsf.licensescheduler.

The LSB_CONFDIR variable is defined in lsf.conf (see lsf.conf), in LSF_ENVDIR or /etc (if LSF_ENVDIR is not defined).

By default, badmin ckconfig displays only the result of the configuration file check. If warning errors are found, badmin prompts you to display detailed messages.

-v

Verbose mode. Displays detailed messages about configuration file checking to stderr.

diagnose pending_jobID ...

Displays the full list of pending reasons for the specified jobs if CONDENSE_PENDING_REASONS=Y is set in lsb.params. For example:

badmin diagnose 1057

diagnose -c query [[-f logfile_name] [-d minutes] | [-o]]

This feature is helpful when an unexpected mbatchd query load slows the cluster or stops it from responding to requests. For example, many bjobs queries can cause a high network load and prevent mbatchd from responding. Running this command with its options makes mbatchd dump query source information into a log file. The log file shows who issued the requests, where the requests came from, and the data size of each query, so that you can troubleshoot the problem.

You can also configure this feature by enabling DIAGNOSE_LOGDIR and ENABLE_DIAGNOSE in lsb.params to log the entire query information as soon as the cluster starts. However, the dynamic settings via the command override the static parameter settings. Also, once the duration you specify to keep track of the query information expires, the static diagnosis settings take effect.

You can use the following options to dynamically set the time, specify a log file and allow mbatchd to collect information:

-c query

Required.

-f

Specifies a log file in which to save the information. It is either a filename, which will be located in DIAGNOSE_LOGDIR, or a full path filename.

The default name for the log file is query_info.querylog.<hostname>.

The owner of the log file is LSF_ADMIN. The log file permissions are the same as mbatchd log permissions. Everyone has read and execute access but the LSF_ADMIN has write, read and execute access.

If you specify the log file in lsb.params and then later specify a different log file in the command line, the one in the command line takes precedence. Logging continues until the specified duration is over, or until you stop dynamic logging. It then switches back to the static log file location.

-d minutes

Specifies the duration in minutes for which query information is tracked. mbatchd reverts to the static settings once the duration is over, or when you stop dynamic logging manually, restart mbatchd (badmin mbdrestart), or reconfigure mbatchd (badmin reconfig). The default value is infinite; that is, query information is always logged.

-o

Turns off dynamic diagnosis (stop logging). If ENABLE_DIAGNOSE=query is configured, it returns to the static configuration.
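
As a rough sketch of how these options combine, the helper functions below wrap the two forms of the synopsis: starting dynamic query logging with an optional duration and log file, and turning it off with -o. The function names and the whole-number check on the duration are assumptions made for illustration; only the badmin options come from the text above.

```shell
# Hypothetical helpers; only the badmin options are taken from the synopsis.

# Start dynamic query logging.
#   $1: duration in minutes (optional; empty means no -d, so logging continues)
#   $2: log file name or full path (optional; empty means the default name)
start_query_diagnosis() {
  minutes=${1:-}
  logfile=${2:-}
  set -- diagnose -c query
  if [ -n "$logfile" ]; then
    set -- "$@" -f "$logfile"
  fi
  if [ -n "$minutes" ]; then
    # Assumption: the duration is a whole number of minutes
    case $minutes in
      *[!0-9]*)
        echo "duration must be a whole number of minutes" >&2
        return 1
        ;;
    esac
    set -- "$@" -d "$minutes"
  fi
  badmin "$@"
}

# Stop dynamic query logging; static lsb.params settings, if any, apply again.
stop_query_diagnosis() {
  badmin diagnose -c query -o
}

# Example (not run here): start_query_diagnosis 30 query.log
```

Keeping the start and stop forms in separate functions mirrors the synopsis, where -o cannot be combined with -f or -d.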

reconfig [-v] [-f]

Dynamically reconfigures LSF.

Configuration files are checked for errors and the results displayed to stderr. If no errors are found in the configuration files, a reconfiguration request is sent to mbatchd and configuration files are reloaded. When live configuration using bconf is enabled (LSF_LIVE_CONFDIR is defined in lsf.conf) badmin reconfig uses configuration files generated by bconf.

With this subcommand, mbatchd is not restarted and lsb.events is not replayed. To restart mbatchd and replay lsb.events, use badmin mbdrestart.

When you issue this command, mbatchd remains available to service requests while configuration files are reloaded. Configuration changes made since system boot or the last reconfiguration take effect.

If warning errors are found, badmin prompts you to display detailed messages. If fatal errors are found, reconfiguration is not performed, and badmin exits.

If you add a host to a queue or to a host group or compute unit, the new host is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must use the command badmin mbdrestart.

Resource requirements determined by the queue no longer apply to a running job after running badmin reconfig. For example, if you change the RES_REQ parameter in a queue and reconfigure the cluster, the previous queue-level resource requirements for running jobs are lost.

-v

Verbose mode. Displays detailed messages about the status of the configuration files. Without this option, the default is to display the results of configuration file checking. All messages from the configuration file check are printed to stderr.

-f

Disables interaction and proceeds with reconfiguration if configuration files contain no fatal errors.

mbdrestart [-C comment] [-v] [-f] [-p]

Dynamically reconfigures LSF and restarts mbatchd and mbschd. When live configuration using bconf is enabled (LSF_LIVE_CONFDIR is defined in lsf.conf) badmin mbdrestart uses configuration files generated by bconf.

Configuration files are checked for errors and the results printed to stderr. If no errors are found, configuration files are reloaded, mbatchd and mbschd are restarted, and events in lsb.events are replayed to recover the running state of the last mbatchd. While mbatchd restarts, it is unavailable to service requests.

If warning errors are found, badmin prompts you to display detailed messages. If fatal errors are found, mbatchd and mbschd restart is not performed, and badmin exits.

If lsb.events is large, or many jobs are running, restarting mbatchd can take several minutes. If you only need to reload the configuration files, use badmin reconfig.

-C comment

Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

-v

Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to stderr.

-f

Disables interaction and forces reconfiguration and mbatchd restart to proceed if configuration files contain no fatal errors.

-p

Allows parallel mbatchd restart. This will fork a child mbatchd process to help minimize downtime for LSF. LSF starts a new/child mbatchd process to read the configuration files and replay the event file. The old master mbatchd can respond to client commands (bsub, bjobs, etc.), handle job scheduling and status updates, dispatching, and updating new events to event files. When complete, the child takes over as master mbatchd, and the old master mbatchd dies.
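
The choice between reconfig and mbdrestart described above can be sketched as a small wrapper. The function name and its mode keywords (reload, restart, restart-parallel) are hypothetical; the badmin subcommands and options are the ones documented in this section. The -f option is used so that the wrapper does not stop for prompts.

```shell
# Hypothetical helper: use the lighter operation unless a restart is needed.
#   $1: reload | restart | restart-parallel (default: reload)
#   $2: administrator comment (used by the restart modes only)
apply_lsf_config() {
  mode=${1:-reload}
  comment=${2:-}
  case $mode in
    reload)
      # Reloads configuration files; mbatchd keeps servicing requests
      badmin reconfig -f
      return
      ;;
    restart)
      # Restarts mbatchd and mbschd and replays lsb.events
      set -- mbdrestart -f
      ;;
    restart-parallel)
      # Same as restart, but a child mbatchd does the reload and replay
      set -- mbdrestart -f -p
      ;;
    *)
      echo "usage: apply_lsf_config reload|restart|restart-parallel [comment]" >&2
      return 2
      ;;
  esac
  if [ -n "$comment" ]; then
    set -- "$@" -C "$comment"
  fi
  badmin "$@"
}

# Example (not run here): apply_lsf_config restart "added hosts to group"
```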

qopen [-C comment] [queue_name ... | all]

Opens specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. A queue can accept batch jobs only if it is open.

-C comment

Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qclose [-C comment] [queue_name ... | all]

Closes specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. A queue does not accept any job if it is closed.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qact [-C comment] [queue_name ... | all]

Activates specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. Jobs in a queue can be dispatched if the queue is activated.

A queue inactivated by its run windows cannot be reactivated by this command.

-C comment

Logs the text of the comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qinact [-C comment] [queue_name ... | all]

Inactivates specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. No job in a queue can be dispatched if the queue is inactivated.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.
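
Because closing a queue stops it from accepting jobs while inactivating it stops dispatch, the two are often used together. The sketch below pairs them; the function names are hypothetical, and it assumes that badmin returns a non-zero exit status on failure, which this reference does not state.

```shell
# Hypothetical helpers built from qclose, qinact, qact, and qopen.

# -C accepts at most 512 characters, so reject longer comments early.
check_comment() {
  if [ "${#1}" -gt 512 ]; then
    echo "comment exceeds 512 characters" >&2
    return 1
  fi
}

# Stop a queue from accepting new jobs and from dispatching queued ones.
drain_queue() {
  queue=$1
  comment=$2
  check_comment "$comment" || return 1
  # Assumption: a failed qclose yields a non-zero status, so qinact is skipped
  badmin qclose -C "$comment" "$queue" &&
    badmin qinact -C "$comment" "$queue"
}

# Reverse drain_queue. A queue inactivated by its run windows stays inactive.
resume_queue() {
  queue=$1
  comment=$2
  check_comment "$comment" || return 1
  badmin qact -C "$comment" "$queue" &&
    badmin qopen -C "$comment" "$queue"
}

# Example (not run here): drain_queue night "storage upgrade"
```

The comment text is recorded with both events, so qhist shows why each state change was made.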

qhist [-t time0,time1] [-f logfile_name] [queue_name ...]

Displays historical events for specified queues, or for all queues if no queue is specified. Queue events are queue opening, closing, activating and inactivating.

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See bhist for the time format. The default is to display all queue events in the event log file.

-f logfile_name

Specifies the file name of the event log file. Either an absolute or a relative path name may be specified. The default is to use the event log file currently used by the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the queue control commands qclose, qopen, qact, and qinact, qhist displays the comment text.

hopen [-C comment] [host_name ... | host_group ... | compute_unit ... | all]

Opens batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are opened if the reserved word all is specified. If no host, host group, or compute unit is specified, the local host is assumed. A host accepts batch jobs if it is open.

Important:

If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA in lsb.params, and a host is closed by EGO, it cannot be reopened by badmin hopen. Hosts closed by EGO have status closed_EGO in bhosts -l output.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

If you open a host group or compute unit, each member displays with the same comment string.

hclose [-C comment] [host_name ... | host_group ... | compute_unit ... | all]

Closes batch server hosts. Specify the names of any server hosts, host groups, or compute units. All batch server hosts are closed if the reserved word all is specified. If no argument is specified, the local host is assumed. A closed host does not accept any new job, but jobs already dispatched to the host are not affected. Note that this is different from a host closed by a window; all jobs on it are suspended in that case.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

If you close a host group or compute unit, each member displays with the same comment string.
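
Since hclose falls back to the local host when no argument is given, a script that builds its host list dynamically should guard against an empty list. The following sketch reads one host name per line from standard input and passes the names as separate arguments; the function name is hypothetical.

```shell
# Hypothetical helper: close every host named on standard input.
#   $1: administrator comment logged to lsb.events
close_hosts_from_list() {
  comment=${1:-}
  # Collect one host name per input line as separate arguments
  set --
  while IFS= read -r host; do
    if [ -n "$host" ]; then
      set -- "$@" "$host"
    fi
  done
  # With no arguments, hclose would act on the local host; refuse instead
  if [ "$#" -eq 0 ]; then
    echo "no host names on standard input; not closing the local host" >&2
    return 1
  fi
  badmin hclose -C "$comment" "$@"
}

# Example (not run here; hosts.txt is a hypothetical file):
#   close_hosts_from_list "kernel patch" < hosts.txt
```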

hghostadd [-C comment] host_group | compute_unit | host_name [host_name ...]

If dynamic host configuration is enabled, dynamically adds hosts to a host group or compute unit. After receiving the host information from the master LIM, mbatchd dynamically adds the host without triggering a reconfig.

Once the host is added to the host group or compute unit, it is considered part of that group with respect to scheduling decision making for both newly submitted jobs and for existing pending jobs.

This command fails if any of the specified host groups, compute units, or host names are not valid.

Restriction:

If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA in lsb.params, you cannot use hghostadd because all host allocation is under control of IBM Platform EGO.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

hghostdel [-f] [-C comment] host_group | compute_unit | host_name [host_name ...]

Dynamically deletes hosts from a host group or compute unit by triggering an mbatchd reconfig.

This command fails if any of the specified host groups, compute units, or host names are not valid.

CAUTION:

If you want to change a dynamic host to a static host, first use the command badmin hghostdel to remove the dynamic host from any host group or compute unit that it belongs to, and then configure the host as a static host in lsf.cluster.cluster_name.

Restriction:

If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA in lsb.params, you cannot use hghostdel because all host allocation is under control of Platform EGO.

-f

Disables interaction and does not ask for confirmation when reconfiguring mbatchd.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

hpower [suspend | resume] [-C comment] [host_name ...]

Manually switches hosts between a power-saving state and a working state.

suspend | resume

The state that you want to switch the host to.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

hrestart [-f] [host_name ... | all]

Restarts sbatchd on the specified hosts, or on all server hosts if the reserved word all is specified. If no host is specified, the local host is assumed. sbatchd reruns itself from the beginning. This allows new sbatchd binaries to be used.

-f

Disables interaction and does not ask for confirmation for restarting sbatchd.

hshutdown [-f] [host_name ... | all]

Shuts down sbatchd on the specified hosts, or on all batch server hosts if the reserved word all is specified. If no host is specified, the local host is assumed. sbatchd exits upon receiving the request.

-f

Disables interaction and does not ask for confirmation for shutting down sbatchd.

hstartup [-f] [host_name ... | all]

Starts sbatchd on the specified hosts, or on all batch server hosts if the reserved word all is specified. Only root and users listed in the file lsf.sudoers can use the all and -f options. If no host is specified, the local host is assumed.

-f

Disables interaction and does not ask for confirmation for starting sbatchd.

hhist [-t time0,time1] [-f logfile_name] [host_name ...]

Displays historical events for specified hosts, or for all hosts if no host is specified. Host events are host opening and closing. Also, both badmin and policy (job)-triggered power related events (suspend, resume, reset) are displayed.

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See bhist for the time format. The default is to display all host events in the event log file.

-f logfile_name

Specify the file name of the event log file. Either an absolute or a relative path name may be specified. The default is to use the event log file currently used by the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the host control commands hclose or hopen, hhist displays the comment text.
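
For offline analysis, -t and -f are typically combined against a copy of the event log. A minimal sketch follows, with a hypothetical function name; the two time values are passed through unchanged and must already be in the format that bhist documents.

```shell
# Hypothetical helper: query host events from a saved event log.
#   $1: path to an event log file (for example, a copy of lsb.events)
#   $2, $3: time0 and time1, already in the bhist time format
#   remaining arguments: host names (none means all hosts)
host_events_offline() {
  if [ "$#" -lt 3 ]; then
    echo "usage: host_events_offline logfile time0 time1 [host_name ...]" >&2
    return 2
  fi
  logfile=$1
  t0=$2
  t1=$3
  shift 3
  # time0 and time1 are joined with a comma, as in the synopsis
  badmin hhist -t "$t0,$t1" -f "$logfile" "$@"
}
```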

mbdhist [-t time0,time1] [-f logfile_name]

Displays historical events for mbatchd. Events describe the starting and exiting of mbatchd.

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See bhist for the time format. The default is to display all mbatchd events in the event log file.

-f logfile_name

Specify the file name of the event log file. Either an absolute or a relative path name may be specified. The default is to use the event log file currently used by the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the mbdrestart command, mbdhist displays the comment text.

hist [-t time0,time1] [-f logfile_name]

Displays historical events for all the queues, hosts and mbatchd. Also, both badmin and policy (job)-triggered power related events (suspend, resume, reset) are displayed.

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See bhist for the time format. The default is to display all events in the event log file.

-f logfile_name

Specify the file name of the event log file. Either an absolute or a relative path name may be specified. The default is to use the event log file currently used by the LSF system: LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for offline analysis.

If you specified an administrator comment with the -C option of the queue, host, and mbatchd commands, hist displays the comment text.

help [command ...] | ? [command ...]

Displays the syntax and functionality of the specified commands.

quit

Exits the badmin session.

mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

Sets message log level for mbatchd to include additional information in log files. You must be root or the LSF administrator to use this command.

See sbddebug for an explanation of options.

mbdtime [-l timing_level] [-f logfile_name] [-o]

Sets timing level for mbatchd to include additional timing information in log files. You must be root or the LSF administrator to use this command.

See sbdtime for an explanation of options.

sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]

Sets the message log level for sbatchd to include additional information in log files. You must be root or the LSF administrator to use this command.

In MultiCluster, debug levels can only be set for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.

If the command is used without any options, the following default values are used:

class_name=0 (no additional classes are logged)

debug_level=0 (LOG_DEBUG level in parameter LSF_LOG_MASK)

logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name

host_name=local host (host from which command was submitted)

-c class_name ...

Specifies software classes for which debug messages are to be logged.

Format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in lsf.h.

Valid log classes are:
  • LC_ADVRSV and LC2_ADVRSV: Log advance reservation modifications

  • LC_AFS and LC2_AFS: Log AFS messages

  • LC_AUTH and LC2_AUTH: Log authentication messages

  • LC_CHKPNT and LC2_CHKPNT: Log checkpointing messages

  • LC_COMM and LC2_COMM: Log communication messages

  • LC_DCE and LC2_DCE: Log messages pertaining to DCE support

  • LC_EEVENTD and LC2_EEVENTD: Log eeventd messages

  • LC_ELIM and LC2_ELIM: Log ELIM messages

  • LC_EXEC and LC2_EXEC: Log significant steps for job execution

  • LC_FAIR: Log fairshare policy messages

  • LC_FILE and LC2_FILE: Log file transfer messages

  • LC2_GUARANTEE: Log messages related to guarantee SLAs

  • LC_HANG and LC2_HANG: Mark where a program might hang

  • LC_JARRAY and LC2_JARRAY: Log job array messages

  • LC_JLIMIT and LC2_JLIMIT: Log job slot limit messages

  • LC_LOADINDX and LC2_LOADINDX: Log load index messages

  • LC_M_LOG and LC2_M_LOG: Log multievent logging messages

  • LC_MEMORY and LC2_MEMORY: Log messages related to memory allocation

  • LC_MPI and LC2_MPI: Log MPI messages

  • LC_MULTI and LC2_MULTI: Log messages pertaining to MultiCluster

  • LC_PEND and LC2_PEND: Log messages related to job pending reasons

  • LC_PERFM and LC2_PERFM: Log performance messages

  • LC_PIM and LC2_PIM: Log PIM messages

  • LC_PREEMPT and LC2_PREEMPT: Log preemption policy messages

  • LC_RESOURCE and LC2_RESOURCE: Log messages related to resource broker

  • LC_RESREQ and LC2_RESREQ: Log resource requirement messages

  • LC_SCHED and LC2_SCHED: Log messages pertaining to the mbatchd scheduler

  • LC_SIGNAL and LC2_SIGNAL: Log messages pertaining to signals

  • LC_SYS and LC2_SYS: Log system call messages

  • LC_TRACE and LC2_TRACE: Log significant program walk steps

  • LC_XDR and LC2_XDR: Log everything transferred by XDR

  • LC_XDRVERSION and LC2_XDRVERSION: Log messages for XDR version

Default: 0 (no additional classes are logged)

-l debug_level

Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.

Possible values:

0 LOG_DEBUG level for parameter LSF_LOG_MASK in lsf.conf.

1 LOG_DEBUG1 level for extended logging.

2 LOG_DEBUG2 level for extended logging.

3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.

Default: 0 (LOG_DEBUG level in parameter LSF_LOG_MASK)

-f logfile_name

Specify the name of the file into which debugging messages are to be logged. A file name with or without a full path may be specified.

If a file name without a path is specified, the file is saved in the LSF system log directory.

The name of the file that is created has the following format:

logfile_name.daemon_name.log.host_name

On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

Default: current LSF system log file in the LSF system log file directory.

-o

Turns off temporary debug settings and resets them to the daemon starting state. The message log level is reset back to the value of LSF_LOG_MASK and classes are reset to the value of LSB_DEBUG_MBD, LSB_DEBUG_SBD.

The log file is also reset back to the default log file.

host_name ...

Optional. Sets debug settings on the specified host or hosts.

Lists of host names must be separated by spaces and enclosed in quotation marks.

Default: local host (host from which command was submitted)
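
The -c, -l, and -o options can be sketched together as follows. The function names are hypothetical, the helpers act on the local host only, and the class names in the example come from the list above. The class list is passed as a single quoted argument, as the -c description requires.

```shell
# Hypothetical helpers for temporary sbatchd debug settings on the local host.

# Raise the sbatchd debug level, optionally for specific log classes.
#   $1: debug level, 0 to 3
#   $2: space-separated class names (optional)
set_sbd_debug() {
  level=${1:-}
  classes=${2:-}
  case $level in
    0|1|2|3) ;;
    *)
      echo "debug level must be 0, 1, 2, or 3" >&2
      return 2
      ;;
  esac
  if [ -n "$classes" ]; then
    # One quoted argument holds the whole class list
    badmin sbddebug -c "$classes" -l "$level"
  else
    badmin sbddebug -l "$level"
  fi
}

# Return to the settings sbatchd started with.
reset_sbd_debug() {
  badmin sbddebug -o
}

# Example (not run here): set_sbd_debug 2 "LC_EXEC LC_TRACE"
```

Pairing every set_sbd_debug call with a later reset_sbd_debug keeps extended logging from running longer than intended.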

sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]

Sets the timing level for sbatchd to include additional timing information in log files. You must be root or the LSF administrator to use this command.

In MultiCluster, timing levels can only be set for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.

If the command is used without any options, the following default values are used:

timing_level=no timing information is recorded

logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name

host_name=local host (host from which command was submitted)

-l timing_level

Specifies detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.

Valid values: 1|2|3|4|5

The higher the number, the more functions in the software that are timed and whose execution time is logged. The lower numbers include more common software functions. Higher levels include all lower levels.

Default: undefined (no timing information is logged)

-f logfile_name

Specify the name of the file into which timing messages are to be logged. A file name with or without a full path may be specified.

If a file name without a path is specified, the file is saved in the LSF system log file directory.

The name of the file created has the following format:

logfile_name.daemon_name.log.host_name

On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.

On Windows, if the specified path is not valid, no log file is created.

Note: Both timing and debug messages are logged in the same files.

Default: current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.

-o

Optional. Turns off temporary timing settings and resets them to the daemon starting state. The timing level is reset back to the value of the parameter for the corresponding daemon (LSB_TIME_MBD, LSB_TIME_SBD).

The log file is also reset back to the default log file.

host_name ...

Sets the timing level on the specified host or hosts.

Lists of hosts must be separated by spaces and enclosed in quotation marks.

Default: local host (host from which command was submitted)
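
A small sketch of the -l and -f options follows, together with a helper that composes the documented file name pattern logfile_name.daemon_name.log.host_name. The function names are hypothetical, and the daemon name used in the example (sbatchd) is an assumption about what LSF substitutes for daemon_name.

```shell
# Hypothetical helpers for sbatchd timing on the local host.

# Enable timing at the given level, optionally logging to a named file.
#   $1: timing level, 1 to 5
#   $2: log file name, with or without a path (optional)
set_sbd_timing() {
  level=${1:-}
  logfile=${2:-}
  case $level in
    1|2|3|4|5) ;;
    *)
      echo "timing level must be 1, 2, 3, 4, or 5" >&2
      return 2
      ;;
  esac
  if [ -n "$logfile" ]; then
    badmin sbdtime -l "$level" -f "$logfile"
  else
    badmin sbdtime -l "$level"
  fi
}

# Compose the name described for a file created through -f logfile_name.
#   $1: logfile_name  $2: daemon_name (supplied by the caller)  $3: host_name
timing_log_name() {
  printf '%s.%s.log.%s\n' "$1" "$2" "$3"
}

# Example (not run here): set_sbd_timing 3 mytiming
```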

schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

Sets message log level for mbschd to include additional information in log files. You must be root or the LSF administrator to use this command.

See sbddebug for an explanation of options.

schdtime [-l timing_level] [-f logfile_name] [-o]

Sets timing level for mbschd to include additional timing information in log files. You must be root or the LSF administrator to use this command.

See sbdtime for an explanation of options.

showconf mbd | [sbd [ host_name ... | all ]]

Displays all configured parameters and their values set in lsf.conf or ego.conf that affect mbatchd and sbatchd.

In a MultiCluster environment, badmin showconf only displays the parameters of daemons on the local cluster.

Running badmin showconf from a master candidate host reaches all server hosts in the cluster. Running badmin showconf from a slave-only host may not be able to reach other slave-only hosts.

badmin showconf only displays the values used by LSF.

badmin showconf displays the value of EGO_MASTER_LIST from wherever it is defined. You can define either LSF_MASTER_LIST or EGO_MASTER_LIST in lsf.conf. LIM reads lsf.conf first, and ego.conf if EGO is enabled in the LSF cluster. The value of LSF_MASTER_LIST is displayed only if EGO_MASTER_LIST is not defined at all in ego.conf.

For example, if EGO is enabled in the LSF cluster, and you define LSF_MASTER_LIST in lsf.conf and EGO_MASTER_LIST in ego.conf, badmin showconf displays the value of EGO_MASTER_LIST in ego.conf.

If EGO is disabled, ego.conf is not loaded, so parameters defined in lsf.conf are displayed.

showstatus

Displays current LSF runtime information about the whole cluster, including information about hosts, jobs, users, user groups, and mbatchd startup and reconfiguration.

perflog [[-t sample_period] [-f logfile_name] [-d duration] | [-o]]

This feature is useful for troubleshooting large clusters that may not be responding because of mbatchd performance problems. In such cases, mbatchd may be slow in handling high-volume requests such as job submission, job status requests, and job rusage requests.

-t

Specifies the sampling period in minutes for performance metric collection. The default value is 5 minutes.

-f

Specifies a log file in which to save the information. It is either a filename or a full path filename. If you do not specify the path for the log file then its default path is used. The default name for the log file is mbatchd.perflog.<hostname>.

The owner of the log file is LSF_ADMIN. The log file permissions are the same as mbatchd log permissions. Everyone has read and execute access but the LSF_ADMIN has write, read and execute access.

-d

Specifies the duration in minutes for which performance metric data is logged. mbatchd stops logging once the duration is over, or when you stop logging manually, restart mbatchd, or reconfigure mbatchd. The default value is infinite; that is, performance metric data is always logged.

-o

Turns off dynamic performance metric logging (stop logging). If LSB_ENABLE_PERF_METRICS_LOG is enabled, it returns to the static configuration.

perfmon start [sample_period] | stop | view | setperiod sample_period

Dynamically enables and controls scheduler performance metric collection.

Collecting and recording performance metric data may affect the performance of LSF. Smaller sampling periods result in the lsb.streams file growing faster.

The following metrics are collected and recorded in each sample period:
  • The number of queries handled by mbatchd

  • The number of queries for each of jobs, queues, and hosts. (bjobs, bqueues, and bhosts, as well as other daemon requests)

  • The number of jobs submitted (divided into job submission requests and jobs actually submitted)

  • The number of jobs dispatched

  • The number of jobs completed

  • The number of jobs sent to remote clusters

  • The number of jobs accepted from remote clusters

  • The file descriptors used by mbatchd

  • Scheduler performance metrics:
    • A shorter scheduling interval means the job is processed more quickly

    • Number of different resource requirement patterns for jobs in use which may lead to different candidate host groups. The more matching hosts required, the longer it takes to find them, which means a longer scheduling session.

    • Number of buckets (groups) in which jobs are put based on resource requirements and different scheduling policies. More buckets means a longer scheduling session.

start [sample_period]

Starts performance metric collection dynamically. Optionally specifies a sampling period in seconds for performance metric collection.

If no sampling period is specified, the default period set in SCHED_METRIC_SAMPLE_PERIOD in lsb.params is used.

stop

Stops performance metric collection dynamically.

view

Displays real-time performance metric information for the current sampling period.

setperiod sample_period

Sets a new sampling period in seconds.
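
The four perfmon actions take different arguments: start accepts an optional period, setperiod requires one, and stop and view take none. One possible wrapper that enforces this is sketched below; the function name is hypothetical.

```shell
# Hypothetical helper: validate perfmon arguments before calling badmin.
#   $1: start | stop | view | setperiod
#   $2: sampling period in seconds (optional for start, required for setperiod)
perfmon_ctl() {
  action=${1:-}
  period=${2:-}
  case $action in
    start)
      if [ -n "$period" ]; then
        badmin perfmon start "$period"
      else
        # Falls back to SCHED_METRIC_SAMPLE_PERIOD from lsb.params
        badmin perfmon start
      fi
      ;;
    stop|view)
      badmin perfmon "$action"
      ;;
    setperiod)
      if [ -z "$period" ]; then
        echo "setperiod requires a sampling period in seconds" >&2
        return 2
      fi
      badmin perfmon setperiod "$period"
      ;;
    *)
      echo "usage: perfmon_ctl start [period] | stop | view | setperiod period" >&2
      return 2
      ;;
  esac
}

# Example (not run here): perfmon_ctl start 30
```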

See also

bqueues, bhosts, lsb.params, lsb.queues, lsb.hosts, lsf.conf, lsf.cluster, sbatchd, mbatchd, mbschd