How mbatchd reconfiguration and restart affect historical run time

After mbatchd is restarted or reconfigured, the historical run time of finished jobs might differ from its value before the restart, because the earlier value can include jobs that were already cleaned from mbatchd. On restart, mbatchd reads only recently finished jobs from lsb.events, according to the value of CLEAN_PERIOD in lsb.params. Jobs that were cleaned before the restart are lost and are not included in the new calculation of the dynamic priority.
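The effect can be illustrated with a minimal Python sketch. It is not how mbatchd is implemented: the function name, the (finish_time, run_time) job records, and the use of seconds for every value are assumptions made for illustration, and decay of historical run time is ignored to keep the example small.

```python
def historical_run_time(finished_jobs, now, clean_period=None):
    """Sum the run time of finished jobs (hypothetical helper).

    finished_jobs: list of (finish_time, run_time) tuples, in seconds.
    clean_period:  if given, jobs that finished more than clean_period
                   seconds before `now` are skipped, which models a
                   restart that replays only recently finished jobs.
    """
    total = 0.0
    for finish_time, run_time in finished_jobs:
        if clean_period is not None and now - finish_time > clean_period:
            continue  # job was cleaned before the restart, so it is lost
        total += run_time
    return total


# Hypothetical data: one old job and one recent job.
jobs = [(1000, 600), (9000, 300)]
now = 10000

before_restart = historical_run_time(jobs, now)                    # all jobs counted
after_restart = historical_run_time(jobs, now, clean_period=3600)  # recent jobs only
print(before_restart, after_restart)
```

In this sketch the older job finished 9000 seconds before the restart, which is outside the 3600-second window, so its run time no longer contributes to the total after the restart.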

Example

The following fairshare parameters are configured in lsb.params:

CPU_TIME_FACTOR = 0
RUN_JOB_FACTOR  = 0
RUN_TIME_FACTOR = 1
FAIRSHARE_ADJUSTMENT_FACTOR = 0

Note that in this configuration, only run time is considered in the calculation of dynamic priority. This simplifies the formula to the following:

dynamic priority = number_shares / (run_time * RUN_TIME_FACTOR)

Without the historical run time, the dynamic priority increases suddenly as soon as the job finishes running, because the run time becomes zero. Pending jobs that belong to other users then get no chance to start.

When historical run time is included in the priority calculation, the formula becomes:

dynamic priority = number_shares / ((historical_run_time + run_time) * RUN_TIME_FACTOR)

Now the dynamic priority increases gradually as the historical run time decays over time.
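The two simplified formulas above can be compared with a short Python sketch. The source states only that historical run time decays over time, so the decay curve here is an assumption for illustration: the decayed helper and its decay_hours parameter are hypothetical, as is the choice to return infinity when the denominator is zero.

```python
import math


def dynamic_priority(number_shares, run_time,
                     historical_run_time=0.0, run_time_factor=1.0):
    """Simplified priority from the example: only run time is weighted."""
    denominator = (historical_run_time + run_time) * run_time_factor
    if denominator == 0:
        # Sketch-only choice: treat a zero denominator as unbounded priority.
        return math.inf
    return number_shares / denominator


def decayed(historical_run_time, hours_elapsed, decay_hours=5.0):
    """Hypothetical decay: the value shrinks to one tenth every decay_hours."""
    return historical_run_time * 0.1 ** (hours_elapsed / decay_hours)


shares = 100
finished_run_time = 2.0  # hours used by a job that has just finished

# Without historical run time, priority jumps as soon as run_time is zero.
print(dynamic_priority(shares, run_time=0.0))

# With historical run time, priority rises gradually as the value decays.
for hours in range(0, 11, 2):
    hist = decayed(finished_run_time, hours)
    print(hours, dynamic_priority(shares, 0.0, historical_run_time=hist))
```

The first call shows the sudden jump described for the case without historical run time. The loop shows the priority climbing step by step as the remembered run time shrinks.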