Configure mbatchd to use multithreading

About this task

When mbatchd has a dedicated port specified by the parameter LSB_QUERY_PORT in lsf.conf, it forks a child mbatchd which in turn creates threads to process bjobs query requests.

As soon as mbatchd has forked a child mbatchd, the child mbatchd takes over, and listens on the port to process more bjobs query requests. For each query request, the child mbatchd creates a thread to process it.

If you specify LSB_QUERY_ENH=Y in lsf.conf, batch query multithreading is extended to all mbatchd query commands except for the following:

  • bread

  • bstatus

  • tspeek

The child mbatchd continues to listen on the port specified by LSB_QUERY_PORT and creates threads to service requests until a job changes status, a new job is submitted, or the time specified by MBD_REFRESH_TIME in lsb.params has passed. For pending jobs that change state (for example, from PEND to EXIT because of the automatic orphan job termination feature), a new child mbatchd is created based only on the time configured by the MBD_REFRESH_TIME parameter.

MBD_REFRESH_TIME specifies the time interval, in seconds, at which mbatchd forks a new child mbatchd to service query requests, which keeps the information sent back to clients up to date. A child mbatchd processes query requests by creating threads.

MBD_REFRESH_TIME has the following syntax:

MBD_REFRESH_TIME=seconds [min_refresh_time]

where min_refresh_time defines the minimum time (in seconds) that the child mbatchd will stay to handle queries. For MBD_REFRESH_TIME, the valid range is 0 - 300 seconds and the default is 5 seconds.

  • If MBD_REFRESH_TIME < min_refresh_time, the child mbatchd exits at MBD_REFRESH_TIME, even if a job changes status or a new job is submitted before MBD_REFRESH_TIME expires.

  • If MBD_REFRESH_TIME > min_refresh_time:

    • If a job changes status or a new job is submitted before min_refresh_time, the child mbatchd exits at min_refresh_time.

    • After min_refresh_time has passed, the child mbatchd exits when a job changes status or a new job is submitted.

  • If MBD_REFRESH_TIME > min_refresh_time and no job changes status and no new job is submitted, the child mbatchd exits at MBD_REFRESH_TIME.

The default for min_refresh_time is 10 seconds.
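
The exit rules above can be summarized in a small model. The following is a minimal sketch, not LSF code: the function name child_exit_time and its arguments are hypothetical, times are measured in seconds from the moment the child mbatchd is forked, and first_event stands for the time of the first job status change or new job submission (None if there is none). The case where MBD_REFRESH_TIME equals min_refresh_time is not covered by the rules above; the sketch assumes the child exits at MBD_REFRESH_TIME in that case. The special handling of pending jobs that change state is not modeled.

```python
from typing import Optional


def child_exit_time(refresh: float,
                    min_refresh: float = 10.0,
                    first_event: Optional[float] = None) -> float:
    """Return the time (seconds after the fork) at which the child exits.

    refresh      -- MBD_REFRESH_TIME
    min_refresh  -- min_refresh_time (default 10, as stated above)
    first_event  -- time of the first status change or submission, or None
    """
    if first_event is None:
        # No status change and no new job: the child lives until refresh.
        return refresh
    # An event cannot end the child before min_refresh, and the child
    # never lives longer than refresh.
    return min(refresh, max(min_refresh, first_event))


# MBD_REFRESH_TIME < min_refresh_time: exit at MBD_REFRESH_TIME regardless.
print(child_exit_time(5, 10, first_event=2))    # 5
# Event before min_refresh_time: exit at min_refresh_time.
print(child_exit_time(60, 10, first_event=3))   # 10
# Event after min_refresh_time: exit when the event occurs.
print(child_exit_time(60, 10, first_event=25))  # 25
# No event: exit at MBD_REFRESH_TIME.
print(child_exit_time(60, 10))                  # 60
```

The single min/max expression covers every bullet above, which makes it easy to see why lowering either value shortens the life of each child mbatchd.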

If you extend multithreaded query support to batch query requests (by specifying LSB_QUERY_ENH=Y in lsf.conf), the child mbatchd will also exit if any of the following commands are run in the cluster:

  • bconf
  • badmin reconfig
  • badmin commands to change a queue's status (badmin qopen, badmin qclose, badmin qact, and badmin qinact)
  • badmin commands to change a host's status (badmin hopen and badmin hclose)
  • badmin perfmon start

If you use the bjobs command and do not get up-to-date information, you may want to decrease the value of MBD_REFRESH_TIME or min_refresh_time in lsb.params to make it more likely that successive job queries return newly submitted job information.

Note:

Lowering the value of MBD_REFRESH_TIME or min_refresh_time increases the load on mbatchd and might negatively affect performance.

Procedure

  1. Specify a query-dedicated port for the mbatchd by setting LSB_QUERY_PORT in lsf.conf.
  2. Optional: Set an interval of time to indicate when a new child mbatchd is to be forked by setting MBD_REFRESH_TIME in lsb.params. The default value of MBD_REFRESH_TIME is 5 seconds, and valid values are 0-300 seconds.
  3. Optional: Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.

Set a query-dedicated port for mbatchd

About this task

To change the default mbatchd behavior so that mbatchd forks a child mbatchd that can create threads, specify a port number with LSB_QUERY_PORT in lsf.conf.

Tip:

This configuration only works on UNIX platforms that support thread programming.

Procedure

  1. Log on to the host as the primary LSF administrator.
  2. Edit lsf.conf.
  3. Add the LSB_QUERY_PORT parameter and specify a port number that will be dedicated to receiving requests from hosts.
  4. Save the lsf.conf file.
  5. Restart mbatchd so that the change takes effect:

    badmin mbdrestart
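
Before restarting mbatchd, it can be useful to confirm that the edited file really contains a usable port. The following is a minimal sketch, assuming lsf.conf consists of NAME=value lines with "#" starting a comment and that the last assignment wins; the helper names read_lsf_param and query_port, and the sample values, are hypothetical and only for illustration.

```python
from typing import Optional


def read_lsf_param(conf_text: str, name: str) -> Optional[str]:
    """Return the value assigned to `name` in lsf.conf-style text, or None."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip().strip('"')   # assumption: last assignment wins
    return value


def query_port(conf_text: str) -> Optional[int]:
    """Return LSB_QUERY_PORT as an integer, or None if it is not defined."""
    value = read_lsf_param(conf_text, "LSB_QUERY_PORT")
    if value is None:
        return None
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"LSB_QUERY_PORT={port} is not a valid port number")
    return port


# Hypothetical lsf.conf excerpt; the port value is an example only.
sample = """
# query-dedicated port for mbatchd
LSB_QUERY_PORT=6891
"""
print(query_port(sample))
```

To check a real installation, read the contents of your lsf.conf file into a string and pass it to query_port.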

Specify an expiry time for child mbatchds (optional)

About this task

Use MBD_REFRESH_TIME in lsb.params to define how often mbatchd forks a new child mbatchd.

Procedure

  1. Log on to the host as the primary LSF administrator.
  2. Edit lsb.params.
  3. Add the MBD_REFRESH_TIME parameter and specify a time interval in seconds to fork a child mbatchd.

    The default value for this parameter is 5 seconds. Valid values are 0 - 300 seconds.

  4. Save the lsb.params file.
  5. Reconfigure the cluster as follows:

    badmin reconfig
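
A value for MBD_REFRESH_TIME can be checked against the documented syntax and range before it is written to lsb.params. The following is a minimal sketch; the function name parse_mbd_refresh_time is hypothetical, and rejecting out-of-range values is a choice made for this sketch (how LSF itself treats such values is not described here).

```python
from typing import Tuple

DEFAULT_MIN_REFRESH = 10  # documented default for min_refresh_time


def parse_mbd_refresh_time(value: str) -> Tuple[int, int]:
    """Parse 'seconds [min_refresh_time]' into (seconds, min_refresh_time)."""
    parts = value.split()
    if len(parts) not in (1, 2):
        raise ValueError("expected 'seconds [min_refresh_time]'")
    seconds = int(parts[0])
    if not 0 <= seconds <= 300:
        raise ValueError("MBD_REFRESH_TIME must be in the range 0 - 300")
    min_refresh = int(parts[1]) if len(parts) == 2 else DEFAULT_MIN_REFRESH
    if min_refresh < 0:
        raise ValueError("min_refresh_time must not be negative")
    return seconds, min_refresh


print(parse_mbd_refresh_time("30"))     # (30, 10)
print(parse_mbd_refresh_time("60 15"))  # (60, 15)
```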

Specify hard CPU affinity

About this task

You can specify the master host CPUs on which mbatchd child query processes can run (hard CPU affinity). This improves mbatchd scheduling and dispatch performance by binding query processes to specific CPUs so that higher priority mbatchd processes can run more efficiently.

When you define this parameter, LSF runs mbatchd child query processes only on the specified CPUs. However, the operating system can still assign other processes to run on the same CPU if utilization of the bound CPU is lower than utilization of the unbound CPUs.

Procedure

  1. Identify the CPUs on the master host that will run mbatchd child query processes.
    • Linux: To obtain a list of valid CPUs, view the contents of the file /proc/cpuinfo, for example by running the command

      cat /proc/cpuinfo

    • Solaris: To obtain a list of valid CPUs, run the command

      psrinfo

  2. In the file lsb.params, define the parameter MBD_QUERY_CPUS.

    For example, if you specify:

    MBD_QUERY_CPUS=1 2

    the mbatchd child query processes will run only on CPU numbers 1 and 2 on the master host.

    You can specify CPU affinity only for master hosts that use one of the following operating systems:
    • Linux 2.6 or higher

    • Solaris 8 or higher

    If failover to a master host candidate occurs, LSF maintains the hard CPU affinity, provided that the master host candidate has the same CPU configuration as the original master host. If the configuration differs, LSF ignores the CPU list and reverts to default behavior.

  3. Verify that the mbatchd child query processes are bound to the correct CPUs on the master host.
    1. Start up a query process by running a query command such as bjobs.
    2. Check to see that the query process is bound to the correct CPU.
      • Linux: Run the command taskset -p <pid>

      • Solaris: Run the command ps -AP
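
On Linux, the verification step can also be scripted. The following is a minimal sketch, assuming Python 3 on a Linux master host; it uses os.sched_getaffinity from the Python standard library, which reports the set of CPUs a process is allowed to run on. The helper names parse_cpu_list and unexpected_cpus are hypothetical, and finding the process ID of the child mbatchd is left to the reader.

```python
import os
from typing import Set


def parse_cpu_list(value: str) -> Set[int]:
    """Parse a space-separated CPU list such as the value of MBD_QUERY_CPUS."""
    return {int(token) for token in value.split()}


def unexpected_cpus(pid: int, allowed: Set[int]) -> Set[int]:
    """Return CPUs that process `pid` may run on but that are not allowed.

    An empty result means the process is bound within the allowed CPUs.
    Linux only: os.sched_getaffinity is not available on other platforms.
    """
    return set(os.sched_getaffinity(pid)) - allowed


allowed = parse_cpu_list("1 2")  # matches the MBD_QUERY_CPUS=1 2 example
print(sorted(allowed))           # [1, 2]
```

For a child mbatchd with process ID child_pid, unexpected_cpus(child_pid, allowed) returns an empty set when the binding matches the configured CPUs.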

Configure mbatchd to push new job information to child mbatchd

Before you begin

LSB_QUERY_PORT must be defined in lsf.conf.

About this task

If you have enabled multithreaded mbatchd support, the bjobs command may not display up-to-date information if two consecutive query commands are issued before a child mbatchd expires, because the job information held by the child mbatchd is not updated. Use NEWJOB_REFRESH=Y in lsb.params to enable a child mbatchd to get up-to-date new job information from the parent mbatchd.

When NEWJOB_REFRESH=Y, the parent mbatchd pushes new job information to a child mbatchd. Job queries with bjobs display new jobs submitted after the child mbatchd was created.

Procedure

  1. Log on to the host as the primary LSF administrator.
  2. Edit lsb.params.
  3. Add NEWJOB_REFRESH=Y.

    You should set MBD_REFRESH_TIME in lsb.params to a value greater than 10 seconds.

  4. Save the lsb.params file.
  5. Reconfigure the cluster as follows:

    badmin reconfig
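
The prerequisite and the recommendation in this task can be checked together before reconfiguring. The following is a minimal sketch; the function name newjob_refresh_warnings is hypothetical, the two arguments are plain dictionaries of parameter names to values that you have already read from lsf.conf and lsb.params, and an unset NEWJOB_REFRESH is assumed to mean the feature is not enabled.

```python
from typing import Dict, List


def newjob_refresh_warnings(lsf_conf: Dict[str, str],
                            lsb_params: Dict[str, str]) -> List[str]:
    """Return warnings about the NEWJOB_REFRESH setup (empty list if none)."""
    warnings: List[str] = []
    if lsb_params.get("NEWJOB_REFRESH", "N").upper() != "Y":
        return warnings  # feature not enabled, nothing to check
    if "LSB_QUERY_PORT" not in lsf_conf:
        warnings.append("LSB_QUERY_PORT must be defined in lsf.conf")
    # Documented default for MBD_REFRESH_TIME is 5 seconds.
    refresh_value = lsb_params.get("MBD_REFRESH_TIME") or "5"
    seconds = int(refresh_value.split()[0])
    if seconds <= 10:
        warnings.append("MBD_REFRESH_TIME should be greater than 10 seconds")
    return warnings


# Hypothetical settings, for illustration only.
print(newjob_refresh_warnings(
    {"LSB_QUERY_PORT": "6891"},
    {"NEWJOB_REFRESH": "Y", "MBD_REFRESH_TIME": "30 10"},
))  # []
```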

Multithread batch queries

Earlier versions of LSF supported multithreading for bjobs queries only, but not for other query commands. LSF now supports multithreaded batch queries for several other common batch query commands. Only the following batch query commands do not support multithreaded batch queries:

  • bread

  • bstatus

  • tspeek

The LSB_QUERY_ENH parameter (in lsf.conf) extends multithreaded query support to other batch query commands in addition to bjobs. In addition, the mbatchd system query monitoring mechanism starts automatically instead of being triggered by a query request. This ensures a consistent query response time within the system.

To extend multithreaded queries to other batch query commands, set LSB_QUERY_ENH=Y in lsf.conf and run badmin mbdrestart for the change to take effect.
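
The way LSB_QUERY_PORT and LSB_QUERY_ENH combine can be expressed as a short decision function. The following is a minimal sketch of the rules described in this document, not of LSF internals; the function name served_by_threads is hypothetical, and the caller is assumed to pass only mbatchd query commands (bqueues is used below on the assumption that it is one).

```python
# Query commands that this document lists as never multithreaded.
NOT_MULTITHREADED = frozenset({"bread", "bstatus", "tspeek"})


def served_by_threads(command: str,
                      query_port_set: bool,
                      query_enh: bool) -> bool:
    """Return True if the query is handled by child mbatchd threads.

    command        -- name of an mbatchd query command
    query_port_set -- True if LSB_QUERY_PORT is defined in lsf.conf
    query_enh      -- True if LSB_QUERY_ENH=Y is set in lsf.conf
    """
    if not query_port_set:
        return False  # no dedicated query port, no child mbatchd
    if command == "bjobs":
        return True   # bjobs is multithreaded whenever the port is set
    return query_enh and command not in NOT_MULTITHREADED


print(served_by_threads("bjobs", True, False))    # True
print(served_by_threads("bqueues", True, False))  # False
print(served_by_threads("bqueues", True, True))   # True
print(served_by_threads("bread", True, True))     # False
```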