IBM Platform LSF 9.1.1 Fix 239508 Readme File
Abstract
APAR P100551: Scheduling performance fixes for Nvidia
Description
Readme documentation for IBM Platform LSF 9.1.1 Fix 239508 including installation-related instructions, prerequisites and co-requisites, and list of fixes.
This fix addresses the following issue:
This fix introduces the following two new parameters to enhance mbschd performance for Nvidia:
1. LSB_SHARED_RSRC_ENH=Y (lsf.conf)
This parameter changes how LSF schedules jobs that use a (site-defined) shared resource. LSF permits multiple instances of such a resource.
For example, you can have one instance of resource R consisting of 10 units on hosts 1 and 2 and a second instance consisting of 10 units on hosts 3 and 4. Each host can be associated with at most one instance.
When scheduling a job that requires a shared resource in its rusage expression, LSF may discover that the job cannot use a host because that host lacks the resource. Without this parameter, LSF continues checking other hosts, since there may be multiple instances of the resource available.
A common case is a single instance of a shared resource, for example, a floating software license.
When the above parameter is set, for resources with a single instance defined in the cluster, LSF stops considering a job for dispatch as soon as it finds that the instance does not have enough of the resource available. This can improve scheduling performance when single-instance shared resources are configured for the cluster.
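The early-exit behavior described above can be illustrated with a small shell model. This is only a sketch of the idea, not LSF code: the function name count_host_checks and the "units:host,host" notation for a resource instance are assumptions made for this example. It counts how many hosts would be examined for one job.

```shell
#!/bin/sh
# count_host_checks NEED ENH INSTANCE...
#   NEED      units of the shared resource the job requires
#   ENH       Y to model LSB_SHARED_RSRC_ENH=Y, anything else for the default
#   INSTANCE  "units:hostA,hostB" (hypothetical notation for this sketch)
# Prints the number of hosts examined before the job is placed or given up.
count_host_checks() {
    need=$1; enh=$2; shift 2

    # Early exit: exactly one instance, not enough units, enhancement on.
    if [ "$enh" = "Y" ] && [ "$#" -eq 1 ]; then
        units=${1%%:*}
        if [ "$units" -lt "$need" ]; then
            echo 0
            return 0
        fi
    fi

    # Otherwise walk the hosts of every instance until one can satisfy the job.
    checks=0
    for inst in "$@"; do
        units=${inst%%:*}
        hosts=${inst#*:}
        old_ifs=$IFS; IFS=,
        for host in $hosts; do
            checks=$((checks + 1))
            if [ "$units" -ge "$need" ]; then
                IFS=$old_ifs
                echo "$checks"
                return 0
            fi
        done
        IFS=$old_ifs
    done
    echo "$checks"
}

# A license with 3 free units on 4 hosts, job needs 5:
count_host_checks 5 N "3:h1,h2,h3,h4"   # default: every host is examined
count_host_checks 5 Y "3:h1,h2,h3,h4"   # enhancement on: no host is examined
```

With more than one instance the model falls back to the per-host walk, which matches the statement that the early stop applies only to single-instance resources.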
2. LSB_SKIP_FULL_HOSTS=Y (lsf.conf)
By default, LSF removes unusable hosts from consideration at the beginning of each scheduling session. For example, hosts that are down (unavail or unreach), closed by the administrator (closed_Adm), or closed due to a load threshold (closed_Busy) are unusable by any job and can be removed from consideration. This is done to help improve LSF scheduling performance.
However, hosts with all slots occupied (closed_Full) are not removed, since they can still be used by jobs in preemptive queues, if queue-based preemption is enabled.
For sites without preemption configured, it is not necessary for LSF to consider hosts whose job slots are all occupied. When the parameter is set, LSF removes those fully occupied hosts from consideration at the beginning of each scheduling session, as long as either: (1) the preemption plugin is not loaded, or (2) there is no preemption relationship between queues (see the PREEMPTION parameter in lsb.queues).
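The host filtering rule above can be sketched as a shell filter. This is an illustrative model only, not how mbschd is implemented: the function name filter_candidate_hosts, its two Y/N arguments, and the "host status" input format are assumptions for this example.

```shell
#!/bin/sh
# filter_candidate_hosts SKIP_FULL PREEMPTION
#   SKIP_FULL   Y to model LSB_SKIP_FULL_HOSTS=Y
#   PREEMPTION  Y if the preemption plugin is loaded and queues have a
#               preemption relationship, N otherwise
# Reads "host status" lines on stdin and prints the hosts that stay under
# consideration for the scheduling session.
filter_candidate_hosts() {
    skip_full=$1; preemption=$2
    while read -r host status; do
        [ -n "$host" ] || continue
        case $status in
            # Unusable by any job: always removed.
            unavail|unreach|closed_Adm|closed_Busy)
                continue ;;
            # Fully occupied: removed only when the parameter is set and
            # no job could preempt its way onto the host.
            closed_Full)
                if [ "$skip_full" = "Y" ] && [ "$preemption" = "N" ]; then
                    continue
                fi ;;
        esac
        echo "$host"
    done
}

printf 'h1 ok\nh2 closed_Full\nh3 unavail\n' | filter_candidate_hosts Y N
```

When preemption is in effect, the closed_Full host is kept even with the parameter set, since a job in a preemptive queue might still use it.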
Readme file for: IBM® Platform LSF
Product/Component Release: 9.1.1
Update Name: Fix 239508
Fix ID: LSF-9.1.1-build239508
Publication date: 21 July 2014
Last modified date: 21 July 2014
Contents:
1. List of fixes
2. Download location
3. Products or components affected
4. System requirements
5. Installation and configuration
6. List of files
7. Copyright and trademark information
1. List of fixes
P100551
2. Download location
Download Fix 239508 from the following location: http://www.ibm.com/eserver/support/fixes/
3. Products or components affected
Affected components include: LSF/mbschd, LSF/mbatchd, LSF/schmod_default.so, LSF/schmod_reserve.so, LSF/schmod_preemption.so, LSF/schmod_affinity.so, LSF/schmod_parallel.so, LSF/schmod_advrsv.so, LSF/schmod_aps.so, LSF/schmod_bluegene.so, LSF/schmod_cpuset.so, LSF/schmod_craylinux.so, LSF/schmod_crayx1.so, LSF/schmod_dc.so, LSF/schmod_dist.so, LSF/schmod_fairshare.so, LSF/schmod_fcfs.so, LSF/schmod_jobweight.so, LSF/schmod_limit.so, LSF/schmod_mc.so, LSF/schmod_ps.so, LSF/schmod_pset.so, LSF/schmod_rms.so, LSF/schmod_xl.so, LSF/bjobs
4. System requirements
Linux2.6-glibc2.3-x86_64
5. Installation and configuration
5.1 Before installation
(LSF_TOP=Full path to the top-level installation directory of LSF.)
1) Log on to the LSF master host as root
2) Set your environment:
- For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
5.2 Installation steps
1) Go to the patch install directory: cd $LSF_ENVDIR/../9.1/install/
2) Copy the patch file to the install directory $LSF_ENVDIR/../9.1/install/
3) Run patchinstall: ./patchinstall <patch>
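The installation steps can be collected into a small script. The following is a sketch under stated assumptions: the helper names run and install_fix, the DRY_RUN switch, and the patch file name in the example are all hypothetical. It assumes the LSF environment has already been set as described in 5.1, so that LSF_ENVDIR is defined.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_fix PATCH_FILE
# Copies the patch into the install directory and runs patchinstall there.
install_fix() {
    patch_file=$1
    if [ -z "${LSF_ENVDIR:-}" ]; then
        echo "LSF_ENVDIR is not set; source cshrc.lsf or profile.lsf first" >&2
        return 1
    fi
    install_dir="$LSF_ENVDIR/../9.1/install"
    run cp "$patch_file" "$install_dir/" || return 1
    (
        # Subshell so the caller's working directory is left unchanged.
        run cd "$install_dir" || exit 1
        run ./patchinstall "$(basename "$patch_file")"
    )
}

# Example (the file name is a placeholder for the package you downloaded):
#   DRY_RUN=1; install_fix /tmp/example_patch_file
```

Running with DRY_RUN=1 first prints the commands without changing anything, which is a convenient way to confirm the paths before installing as root.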
5.3 After installation
1) Log on to the LSF master host as root
2) Set the following parameters in lsf.conf:
LSB_SHARED_RSRC_ENH=Y
LSB_SKIP_FULL_HOSTS=Y
3) Run lsadmin limrestart and badmin mbdrestart
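If you prefer to script the lsf.conf change rather than edit the file by hand, one possible approach is shown below. The helper set_lsf_param is hypothetical and not part of LSF; it assumes simple KEY=VALUE lines and replaces any existing assignment of the key. Back up lsf.conf before using anything like this.

```shell
#!/bin/sh
# set_lsf_param FILE KEY VALUE
# Removes any existing KEY= line from FILE and appends KEY=VALUE.
set_lsf_param() {
    file=$1; key=$2; value=$3
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp="$file.tmp.$$"
    # grep -v exits non-zero when nothing is left to print; that is not an error here.
    grep -v "^[[:space:]]*$key=" "$file" > "$tmp" || true
    echo "$key=$value" >> "$tmp"
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, followed by the restarts from step 3:
#   set_lsf_param "$LSF_ENVDIR/lsf.conf" LSB_SHARED_RSRC_ENH Y
#   set_lsf_param "$LSF_ENVDIR/lsf.conf" LSB_SKIP_FULL_HOSTS Y
#   lsadmin limrestart
#   badmin mbdrestart
```

Because the old line is dropped before the new one is appended, running the helper twice leaves a single assignment for each parameter.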
5.4 Uninstallation
To roll back a patch:
1) Log on to the LSF master host as root
2) Run ./patchinstall -r <patch>
3) Remove the following parameters from lsf.conf:
LSB_SHARED_RSRC_ENH=Y
LSB_SKIP_FULL_HOSTS=Y
4) Run lsadmin limrestart and badmin mbdrestart
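The parameter removal in step 3 of the rollback can also be scripted. This is a minimal sketch, assuming simple KEY=VALUE lines in lsf.conf; the helper unset_lsf_param is hypothetical and not an LSF command. Keep a backup of lsf.conf before running it.

```shell
#!/bin/sh
# unset_lsf_param FILE KEY
# Deletes every line in FILE that assigns KEY.
unset_lsf_param() {
    file=$1; key=$2
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    tmp="$file.tmp.$$"
    sed "/^[[:space:]]*$key=/d" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file so its ownership and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, followed by the restarts from step 4:
#   unset_lsf_param "$LSF_ENVDIR/lsf.conf" LSB_SHARED_RSRC_ENH
#   unset_lsf_param "$LSF_ENVDIR/lsf.conf" LSB_SKIP_FULL_HOSTS
#   lsadmin limrestart
#   badmin mbdrestart
```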
6. List of files
mbschd
mbatchd
schmod_default.so
schmod_reserve.so
schmod_preemption.so
schmod_affinity.so
schmod_parallel.so
schmod_advrsv.so
schmod_aps.so
schmod_bluegene.so
schmod_cpuset.so
schmod_craylinux.so
schmod_crayx1.so
schmod_dc.so
schmod_dist.so
schmod_fairshare.so
schmod_fcfs.so
schmod_jobweight.so
schmod_limit.so
schmod_mc.so
schmod_ps.so
schmod_pset.so
schmod_rms.so
schmod_xl.so
bjobs
7. Copyright and trademark information
© Copyright IBM Corporation 2014
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.