The files lsf.shared and lsf.cluster.cluster_name are shared only among the LIMs on hosts that the LSF_MASTER_LIST parameter lists as master candidates.
The preferred master host is no longer the first host in the cluster list in lsf.cluster.cluster_name, but the first host in the list specified by LSF_MASTER_LIST in lsf.conf.
Whenever you reconfigure, only master LIM candidates read lsf.shared and lsf.cluster.cluster_name to get updated information. The elected master LIM sends configuration information to slave LIMs.
The order in which you specify hosts in LSF_MASTER_LIST is the preferred order for selecting hosts to become the master LIM.
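As an illustration, an lsf.conf entry might look like the following fragment. The host names hostA, hostB, and hostC are placeholders, not values taken from a real cluster. With this setting, hostA is the preferred master, followed by hostB and then hostC.

```shell
# lsf.conf fragment (host names are illustrative placeholders)
# Hosts are listed in order of preference for becoming the master LIM.
LSF_MASTER_LIST="hostA hostB hostC"
```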
Generally, the files lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates should be identical.
When the cluster is started up or reconfigured, LSF rereads configuration files and compares lsf.cluster.cluster_name and lsf.shared for hosts that are master candidates.
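Before starting or reconfiguring the cluster, you can check by hand whether two master candidates hold identical copies of a configuration file. The following is a minimal sketch, not an LSF command: the function name compare_conf and the paths in the usage comment are hypothetical. It tests byte-for-byte identity, which is stricter than the load index count that LSF compares.

```shell
# Hypothetical helper: report whether two copies of a configuration file
# (for example lsf.shared from two master candidates) are byte-identical.
# Usage: compare_conf <reference_file> <candidate_file>
compare_conf() {
    # cmp -s is silent and signals the result through its exit status
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Example (paths are placeholders for wherever each copy is accessible):
#   compare_conf /path/to/hostA/lsf.shared /path/to/hostB/lsf.shared
```

A result of "different" does not by itself mean that a host will be rejected, since a comment or whitespace change alters the file without changing the number of load indices. It does indicate that the files should be reviewed.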
If the master candidates do not share identical files, the files can get out of sync. This section describes what happens when lsf.cluster.cluster_name and lsf.shared on a master candidate are not identical to those on the elected master host.
LSF rejects a candidate master host listed in LSF_MASTER_LIST from the cluster only if the number of load indices in its lsf.cluster.cluster_name or lsf.shared file is different from the number of load indices in the corresponding file of the elected master.
A warning is logged in the log file lim.log.master_host_name and the cluster continues to run, but without the hosts that were rejected.
If you want the rejected hosts to be part of the cluster, ensure that the number of load indices in lsf.cluster.cluster_name and lsf.shared is identical for all master candidates, then restart the LIMs on the master and all master candidates:
lsadmin limrestart hostA hostB hostC
If the elected master host goes down and the number of load indices in lsf.cluster.cluster_name or lsf.shared for the newly elected master is different from the number in the files of the master that went down, LSF rejects all master candidates that do not have the same number of load indices in their files as the newly elected master. LSF also rejects all slave-only hosts. This can leave the newly elected master as the only host considered part of the cluster.
A warning is logged in the log file lim.log.new_master_host_name and the cluster continues to run, but without the hosts that were rejected.
To resolve this, from the current master host, restart all LIMs:
lsadmin limrestart all
After the restart, all slave-only hosts are considered part of the cluster. Master candidates with a different number of load indices in their lsf.cluster.cluster_name or lsf.shared files are still rejected.
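Because the restart is issued from the current master host, it can help to confirm first which host that is. The sketch below assumes that the lsid command prints a line of the form "My master name is host_name". The function name master_from_lsid is hypothetical, and the exact output format may differ between LSF versions.

```shell
# Hypothetical helper: extract the master host name from lsid output on stdin.
# Assumes lsid prints a line of the form "My master name is <host>".
master_from_lsid() {
    awk '/^My master name is / { print $NF }'
}

# Usage on a host in the cluster (requires the LSF commands in PATH):
#   lsid | master_from_lsid
```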
When the master that was down comes back up, ensure that the load indices defined in lsf.cluster.cluster_name and lsf.shared are identical for all master candidates, then restart the LIMs on all master candidates.