Improve performance of mbatchd for job array switching events

You can improve mbatchd performance when switching large job arrays to another queue by enabling the JOB_SWITCH2_EVENT parameter in lsb.params. When this parameter is enabled, mbatchd generates the JOB_SWITCH2 event log, which records the switch of the whole array to another queue as a single event instead of logging the switch of each array element. When this parameter is not enabled, mbatchd generates the older JOB_SWITCH event instead, one for each array element. For a very large job array, this produces a correspondingly large number of JOB_SWITCH events. mbatchd then needs a large amount of memory to replay all of these events, which can cause performance problems when mbatchd starts up.
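
As a minimal sketch of the setting described above, the fragment below enables the parameter in lsb.params. It assumes the parameter takes the value Y and sits in the usual Parameters section; confirm the accepted values and the default in the lsb.params reference for your LSF version.

```
# lsb.params (sketch; the value Y is an assumption to verify for your version)
Begin Parameters
JOB_SWITCH2_EVENT=Y
End Parameters
```

Changes to lsb.params normally take effect only after mbatchd is reconfigured, for example with badmin reconfig. Whether this particular parameter needs a reconfiguration or a full mbatchd restart is not stated here, so check before applying the change to a production cluster.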

JOB_SWITCH2 has the following advantages:
  • Reduces the memory that mbatchd uses when replaying bswitch destination_queue job_ID operations, where job_ID is the job ID of the job array that was switched.

  • Reduces the time for reading records from lsb.events when mbatchd starts up.

  • Reduces the size of lsb.events.

Performance of the Master Batch Scheduler (mbschd) is also improved when switching large job arrays to another queue. Previously, when you ran bswitch on a large job array, mbatchd signaled mbschd to switch each job array element individually, which meant thousands of signals for a job array with thousands of elements. This flood of signals blocked mbschd from dispatching pending jobs. Now, mbatchd sends only one signal to mbschd, to switch the whole array, and mbschd remains free to dispatch pending jobs.
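
The scaling difference between per-element and whole-array handling can be illustrated with a toy model. The Python sketch below is not LSF code: SwitchRecord, per_element_records, and whole_array_record are hypothetical names, and a "record" stands in for either one logged switch event or one signal sent to mbschd.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SwitchRecord:
    """Hypothetical stand-in for one logged switch event or one signal."""
    job_id: int
    element: Optional[int]  # None means the record covers the whole array
    queue: str


def per_element_records(job_id: int, array_size: int, queue: str) -> List[SwitchRecord]:
    """Old behavior in this model: one record per array element."""
    return [SwitchRecord(job_id, index, queue) for index in range(1, array_size + 1)]


def whole_array_record(job_id: int, queue: str) -> List[SwitchRecord]:
    """New behavior in this model: a single record for the whole array."""
    return [SwitchRecord(job_id, None, queue)]


if __name__ == "__main__":
    # Illustrative values only: a 5000-element array switched to a queue named "night".
    old = per_element_records(1234, 5000, "night")
    new = whole_array_record(1234, "night")
    print(len(old), "records per element versus", len(new), "for the whole array")
```

In this model the work grows linearly with the array size under per-element handling and stays constant under whole-array handling, which is the reason the single-event and single-signal approach relieves both mbatchd and mbschd.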