Readme file for IBM® Spectrum LSF 10.1 fix 601102

Abstract

P104627. This fix:
1. Prevents the mbatchd daemon from core dumping when jobs on dynamic hosts are resized with bresize.
2. Prevents the mbatchd daemon from forwarding a "DATA AVAILABILITY" message to the execution cluster.
3. Prevents the mbschd daemon from core dumping when a pending array job is switched from a fairshare queue that uses egroup to a queue whose fairshare does not use egroup.
4. Prevents the query mbatchd daemon from core dumping when it handles a bhosts request.
5. Enables LSF to create an NVIDIA Multi-Instance GPU (MIG) instance when you do not specify mig in the -gpu option.
6. Prevents the mbschd daemon from core dumping when it checks the job group limit.
7. Writes JGRP_ADD events to the LSF batch event log file (lsb.events) in the correct order when the mbatchd daemon performs event switching.

Description

Readme documentation for IBM® Spectrum LSF 10.1 fix 601102, including installation-related instructions, prerequisites and co-requisites, and a list of fixes.

This fix addresses the following issues:

1. The mbatchd daemon core dumps when jobs on dynamic hosts are resized with bresize.
2. A "DATA AVAILABILITY" message is forwarded to the execution cluster after the job has already finished.
3. The mbschd daemon core dumps when a pending array job is switched from a fairshare queue that uses egroup to a queue whose fairshare does not use egroup.
4. The query mbatchd daemon core dumps when it handles a bhosts request.
5. Currently, LSF creates a MIG instance only when you specify the "mig" keyword in the -gpu option. As a result, a job cannot run if you specify only the gmem option. For example, job1 "bsub -gpu num=1:gmem=8G ./app" cannot run, but job2 "bsub -gpu num=1:mig=2:gmem=8G ./app" can run. After you apply this fix, job1 can also run: LSF automatically sets the mig value according to the gmem value.
6. The mbschd daemon core dumps when it checks the job group limit.
7. The JGRP_ADD events are out of order in lsb.events when the mbatchd daemon performs event switching.

Readme file for: IBM® Spectrum LSF

Product/Component release: 10.1

Update name: Fix 601102

Fix ID: LSF-10.1-build601102

Publication date: May 25, 2022

Last modified date: May 25, 2022

Contents

1. List of fixes

2. Download location

3. Product or components affected

4. System requirements

5. Installation and configuration

6. List of files

7. Product notifications

8. Copyright and trademark information


1. List of fixes

P104627


2. Download location

Download fix 601102 from the following location: https://www.ibm.com/support/fixcentral


3. Product or components affected

Affected product or components include (the list matches the list of issues in the Description section):

mbatchd, mbschd, ebrokerd, bjobs


4. System requirements

linux2.6-glibc2.3-x86_64

linux3.10-glibc2.17-x86_64


5. Installation and configuration

5.1 Before you install

(LSF_TOP=Full path to the top-level installation directory of LSF.)

1) You must have LSF 10.1 Fix Pack 12 installed before you install this fix. Download this fix pack from IBM Fix Central (https://www.ibm.com/support/fixcentral) and search for build600488. Contact IBM LSF Support if you have any questions or problems with installing Fix Pack 12.

2) Starting in IBM Spectrum LSF Version 10.1 Fix Pack 13, the default values of the following three GPU parameters are changed to:
LSF_GPU_AUTOCONFIG=Y
LSB_GPU_NEW_SYNTAX=extend
LSF_GPU_RESOURCE_IGNORE=Y

If you have Fix Pack 13 installed, no further action is needed to set these parameters. If you have IBM Spectrum LSF Version 10.1 Fix Pack 12 or earlier, consider explicitly configuring the same values for these three parameters.

If you want to keep the former GPU behavior, and any one of the three parameters is missing from the lsf.conf configuration file, you must explicitly configure the following default settings that are defined in Fix Pack 12 or earlier:
LSF_GPU_AUTOCONFIG=N
LSB_GPU_NEW_SYNTAX=N
LSF_GPU_RESOURCE_IGNORE=N

3) Log on to the LSF management host as root

4) Set the LSF cluster environment:

- For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

- For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
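
As a rough way to see which of the three GPU parameters from item 2) are already set explicitly, a small shell function can scan lsf.conf. This is a minimal sketch, not an IBM-provided tool: the function name check_gpu_params is hypothetical, the location $LSF_ENVDIR/lsf.conf is an assumption about your installation, and reporting the last occurrence when a parameter appears more than once is an assumption about precedence.

```shell
# Hypothetical helper: report whether each GPU parameter is explicitly
# set (uncommented) in the given lsf.conf file.
check_gpu_params() {
    conf_file="$1"
    for param in LSF_GPU_AUTOCONFIG LSB_GPU_NEW_SYNTAX LSF_GPU_RESOURCE_IGNORE; do
        # Print the value of every uncommented "PARAM=value" line and keep
        # the last one (assumption: a later line overrides an earlier one).
        value=$(sed -n "s/^[[:space:]]*${param}=//p" "$conf_file" | tail -n 1)
        if [ -n "$value" ]; then
            echo "${param}=${value}"
        else
            echo "${param} is not set"
        fi
    done
}
```

After you set the LSF cluster environment, you might call it as: check_gpu_params "$LSF_ENVDIR/lsf.conf". Any parameter that is reported as not set is a candidate for the explicit configuration described in item 2).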

5.2 Installation steps

1) Go to the fix install directory: cd $LSF_ENVDIR/../10.1/install/

2) Copy the fix file to the install directory $LSF_ENVDIR/../10.1/install/

3) Run patchinstall: ./patchinstall <fix>
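
The three steps above can be sketched as one shell function. This is an illustration under stated assumptions, not an official procedure: the function name install_fix and the DRY_RUN variable are hypothetical, and passing only the file name of the copied package to patchinstall is an assumption about how <fix> is given. With DRY_RUN set, the commands are printed instead of run, so you can review them first.

```shell
# Hypothetical wrapper for the installation steps.
# $1 is the absolute path to the downloaded fix package.
install_fix() {
    fix_file="$1"
    install_dir="$LSF_ENVDIR/../10.1/install"
    # When DRY_RUN is set and non-empty, prefix each command with echo
    # so that it is printed rather than executed.
    run=${DRY_RUN:+echo}
    (
        # The subshell keeps the caller's working directory unchanged.
        $run cd "$install_dir" &&
        $run cp "$fix_file" "$install_dir/" &&
        $run ./patchinstall "$(basename "$fix_file")"
    )
}
```

For example, running DRY_RUN=1 install_fix with the absolute path of the package prints the cd, cp, and patchinstall commands without changing anything. Use an absolute path, because the copy happens after the change of directory.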

5.3 After you install

1) Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment

2) Run badmin mbdrestart

5.4 Uninstallation

1) Log on to the LSF management host as root and set the LSF cluster environment

2) Go to the fix install directory: cd $LSF_ENVDIR/../10.1/install/

3) Run ./patchinstall -r <fix>

4) Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment

5) Run badmin mbdrestart


6. List of files

mbatchd

mbschd

ebrokerd

bjobs


7. Product notifications

To receive information about product solution and fix updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information that you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.



8. Copyright and trademark information

© Copyright IBM Corporation 2022


U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.