Control mbatchd

Procedure

Use the badmin command to control mbatchd.

Reconfigure mbatchd

About this task

If you add a host to a host group or to a queue, or change the resource configuration in the Hosts section of lsf.cluster.cluster_name, jobs that were submitted before you reconfigured do not recognize the change.

If you want the new host to be recognized, you must restart mbatchd (or add the host using the bconf command if you are using live reconfiguration).

Procedure

Run badmin reconfig.

Results

When you reconfigure the cluster, mbatchd is not restarted. Only configuration files are reloaded.
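As a minimal sketch, the reconfigure step could be wrapped in a small shell function that reports whether the command succeeded. The function name reconfigure_mbatchd and the BADMIN variable are hypothetical names for this example, not part of LSF. The sketch also assumes that badmin exits with a non-zero status on failure.

```shell
#!/bin/sh
# BADMIN is a hypothetical override used only in this sketch (for example,
# to substitute a stub while testing). It defaults to the real badmin command.
BADMIN=${BADMIN:-badmin}

# Reload the LSF configuration files without restarting mbatchd.
reconfigure_mbatchd() {
    if "$BADMIN" reconfig; then
        echo "reconfig: request completed, mbatchd was not restarted"
    else
        # Assumption: badmin returns a non-zero exit status on failure.
        echo "reconfig: badmin reported a failure" >&2
        return 1
    fi
}
```

Call reconfigure_mbatchd after you edit the configuration files. Setting BADMIN=echo beforehand prints the badmin arguments instead of contacting the cluster.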

Restart mbatchd

Procedure

Run badmin mbdrestart.

LSF checks configuration files for errors and prints the results to stderr. If no errors are found, the following occurs:

  • Configuration files are reloaded

  • mbatchd is restarted

  • Events in lsb.events are reread and replayed to recover the running state of the last mbatchd

Results

Tip:

Whenever mbatchd is restarted, it is unavailable to service requests. In large clusters where there are many events in lsb.events, restarting mbatchd can take some time. To avoid replaying events in lsb.events, use the command badmin reconfig.
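The choice between the two commands can be sketched as a small helper. This is only an illustration of the guidance in this topic: the function name choose_badmin_action and the change-type labels new-host and other are hypothetical, and the helper prints a suggestion without running badmin.

```shell
#!/bin/sh
# Print the badmin subcommand suggested by this topic for a given kind of change.
# The labels "new-host" and "other" are hypothetical names for this sketch.
choose_badmin_action() {
    case "$1" in
        new-host)
            # A newly added host is recognized only after mbatchd restarts.
            echo "mbdrestart"
            ;;
        *)
            # Reloads configuration files without replaying lsb.events.
            echo "reconfig"
            ;;
    esac
}

# Example: decide what to run after adding a host.
action=$(choose_badmin_action new-host)
echo "Suggested command: badmin $action"
```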

Log a comment when restarting mbatchd

Procedure

  1. Use the -C option of badmin mbdrestart to log an administrator comment in lsb.events.

    For example:

    badmin mbdrestart -C "Configuration change"

    The comment text "Configuration change" is recorded in lsb.events.

  2. Run badmin hist or badmin mbdhist to display administrator comments for mbatchd restart.
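The two steps above can be combined in a short script, shown here as a sketch. The function name restart_with_comment and the BADMIN variable are hypothetical. Running badmin mbdhist immediately after the restart is an assumption about a convenient workflow, not a documented requirement.

```shell
#!/bin/sh
# BADMIN is a hypothetical override used only in this sketch.
# It defaults to the real badmin command.
BADMIN=${BADMIN:-badmin}

# Restart mbatchd with an administrator comment, then display the mbatchd history.
restart_with_comment() {
    if [ -z "$1" ]; then
        echo "usage: restart_with_comment \"comment text\"" >&2
        return 1
    fi
    # -C records the comment in lsb.events.
    "$BADMIN" mbdrestart -C "$1" || return 1
    # Display the mbatchd history, which includes administrator comments.
    "$BADMIN" mbdhist
}
```

For example, restart_with_comment "Configuration change" issues the same restart command that is shown in step 1.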

Shut down mbatchd

Procedure

  1. Run badmin hshutdown to shut down sbatchd on the master host.

    For example:

    badmin hshutdown hostD
    Shut down slave batch daemon on <hostD> .... done

  2. Run badmin mbdrestart:

    badmin mbdrestart
    Checking configuration files ... 
    No errors found.

    This causes mbatchd and mbschd to exit. mbatchd cannot be restarted because sbatchd is shut down. All LSF services are temporarily unavailable, but existing jobs are not affected. When mbatchd is later started by sbatchd, its previous status is restored from the event log file and job scheduling continues.
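The shutdown sequence can be sketched as a script that stops at the first failed step. The function name shutdown_mbatchd and the BADMIN variable are hypothetical, and the host name is supplied by the caller. The sketch assumes that badmin exits with a non-zero status on failure.

```shell
#!/bin/sh
# BADMIN is a hypothetical override used only in this sketch.
# It defaults to the real badmin command.
BADMIN=${BADMIN:-badmin}

# Shut down mbatchd by first stopping sbatchd on the master host.
shutdown_mbatchd() {
    master_host=$1
    if [ -z "$master_host" ]; then
        echo "usage: shutdown_mbatchd master_host" >&2
        return 1
    fi
    # Step 1: stop sbatchd on the master host so that it cannot start a new mbatchd.
    "$BADMIN" hshutdown "$master_host" || return 1
    # Step 2: mbatchd and mbschd exit and stay down while sbatchd is stopped.
    "$BADMIN" mbdrestart
}
```

For example, shutdown_mbatchd hostD runs the two commands from the procedure in order.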