Readme file for IBM® Spectrum LSF 10.1 Fix 601691

Abstract

P104943. This is an enhancement to the LSF cgroup integration. LSF now sets the share of CPU that a job receives on a host based on how many of the job's tasks run on that host. Additionally, LSF can be configured so that jobs that use a specific application profile, or jobs submitted to a specific queue, receive a fraction of the new default.

 

Description

Readme documentation for IBM Spectrum LSF 10.1 Fix 601691 including installation-related instructions, prerequisites and co-requisites, and list of fixes.

This fix addresses the following issue:

If the CPU cgroup subsystem is enabled and LSF is configured to use cgroups, LSF creates a cgroup for a job on each execution host. Previously, LSF relied on the OS default for cpu.shares (1024 for cgroup v1) and cpu.weight (100 for cgroup v2), so every job received an equal share or weight of the CPU regardless of how CPU intensive it was. Jobs that are less CPU intensive can now be given a fraction of the CPU share or weight that other jobs receive. By default, a job's CPU share or weight now scales with the number of tasks that the job requests or receives on the host.

A new lsf.conf parameter, LSB_CGROUP_CPU_SHARES_OLD_DEFAULT, controls whether LSF relies on the OS default for cpu.shares or cpu.weight, or sets an LSF default based on the number of tasks that the job requests. If this parameter is undefined or set to N|n, LSF initially sets cpu.shares or cpu.weight from the number of tasks requested by the job, using the formula (requested_cores/MXJ)*1024 for both cgroup versions. If the host's MXJ value is undefined, LSF uses the OS default. If the parameter is set to Y|y, LSF uses the OS default.

Syntax: LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=Y|y|N|n

Default: N
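As a minimal sketch of how the documented values of LSB_CGROUP_CPU_SHARES_OLD_DEFAULT map to behavior, the following Python function interprets the raw lsf.conf setting. The function name and the choice to reject undocumented values are assumptions for illustration, not part of LSF. Note that even when the function returns False, LSF still falls back to the OS default on hosts where MXJ is undefined.

```python
from typing import Optional


def old_default_enabled(value: Optional[str]) -> bool:
    """Return True when LSB_CGROUP_CPU_SHARES_OLD_DEFAULT selects the OS default.

    `value` is the raw setting from lsf.conf, or None when the parameter
    is not defined. (Hypothetical helper, not an LSF API.)
    """
    if value is None:
        return False  # undefined behaves like the documented default, N
    setting = value.strip()
    if setting in ("Y", "y"):
        return True   # keep the OS default (1024 for v1, 100 for v2)
    if setting in ("N", "n"):
        return False  # use the task-based LSF default when MXJ is defined
    # Assumption: the readme documents only Y|y|N|n; rejecting other values
    # is this sketch's choice, not documented LSF behavior.
    raise ValueError(f"unsupported LSB_CGROUP_CPU_SHARES_OLD_DEFAULT value: {value!r}")
```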

The lsb.applications and lsb.queues files have been extended with a new parameter, CGROUP_CPU_SHARES_FACTOR, which allows the administrator to specify a percentage of the default value for CPU shares or CPU weight. The value must be an integer greater than 0 and less than 100. If an unsupported value is specified, LSF ignores the setting and issues a warning message. CGROUP_CPU_SHARES_FACTOR can be specified at the application profile level, the queue level, or both. When it is defined at multiple levels, LSF uses the smallest of the values.

Syntax: CGROUP_CPU_SHARES_FACTOR=<int_value>
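The validation and precedence rules above can be sketched in Python as follows. The helper names and the warning text are hypothetical, and the sketch assumes that an invalid value at one level is simply dropped while a valid value at the other level still applies.

```python
import warnings
from typing import Optional


def parse_factor(value: Optional[str], source: str) -> Optional[int]:
    """Validate one CGROUP_CPU_SHARES_FACTOR setting (None = not defined)."""
    if value is None:
        return None
    try:
        factor = int(value)
    except ValueError:
        factor = None
    if factor is None or not 0 < factor < 100:
        # The readme says an unsupported value is ignored with a warning;
        # this message text is the sketch's own, not LSF's.
        warnings.warn(f"{source}: ignoring CGROUP_CPU_SHARES_FACTOR={value!r}")
        return None
    return factor


def effective_factor(app_value: Optional[str], queue_value: Optional[str]) -> Optional[int]:
    """Smallest valid factor across the application profile and queue levels."""
    valid = [
        f
        for f in (
            parse_factor(app_value, "lsb.applications"),
            parse_factor(queue_value, "lsb.queues"),
        )
        if f is not None
    ]
    return min(valid) if valid else None
```

For example, an application profile with CGROUP_CPU_SHARES_FACTOR=50 and a queue with CGROUP_CPU_SHARES_FACTOR=25 resolve to 25.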

 

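CGROUP_CPU_SHARES_FACTOR not defined:

Condition                                               | cgroup v1 (cpu.shares)   | cgroup v2 (cpu.weight)
------------------------------------------------------- | ------------------------ | ------------------------
MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N  | 1024*(#TasksOnHost/MXJ)  | 1024*(#TasksOnHost/MXJ)
Otherwise                                               | 1024                     | 100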

 

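CGROUP_CPU_SHARES_FACTOR is defined (abbreviated CCSF):

Condition                                               | cgroup v1 (cpu.shares)               | cgroup v2 (cpu.weight)
------------------------------------------------------- | ------------------------------------ | ------------------------------------
MXJ is defined and LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N  | 1024*(CCSF/100)*(#TasksOnHost/MXJ)   | 1024*(CCSF/100)*(#TasksOnHost/MXJ)
Otherwise                                               | (CCSF/100)*1024                      | (CCSF/100)*100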
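The two tables can be expressed as a single function. This is a sketch under stated assumptions: the function name is hypothetical, old_default=False stands for LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N or undefined, ccsf=None means CGROUP_CPU_SHARES_FACTOR is not defined, and the result is returned unrounded because this readme does not say how LSF rounds it before writing the integer cgroup file.

```python
from typing import Optional

# OS defaults that LSF relied on before this fix.
OS_DEFAULT = {1: 1024, 2: 100}  # cgroup v1 cpu.shares, cgroup v2 cpu.weight


def job_cpu_value(
    cgroup_version: int,
    tasks_on_host: int,
    mxj: Optional[int],
    old_default: bool = False,
    ccsf: Optional[int] = None,
) -> float:
    """Value for cpu.shares (v1) or cpu.weight (v2) following the tables above.

    mxj=None means MXJ is not defined for the host; old_default=True means
    LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=Y; ccsf=None means
    CGROUP_CPU_SHARES_FACTOR is not defined.
    """
    if cgroup_version not in OS_DEFAULT:
        raise ValueError("cgroup_version must be 1 or 2")
    if mxj is not None and mxj <= 0:
        raise ValueError("MXJ must be a positive integer")
    factor = 1.0 if ccsf is None else ccsf / 100
    if mxj is not None and not old_default:
        # Task-based default: the same formula for cgroup v1 and v2.
        return 1024 * factor * (tasks_on_host / mxj)
    # Otherwise: the OS default, scaled by CCSF when it is defined.
    return factor * OS_DEFAULT[cgroup_version]
```

For example, a job running 4 tasks on a host with MXJ=16 gets 1024*(4/16) = 256 under either cgroup version; with CGROUP_CPU_SHARES_FACTOR=50 it gets 128. On a cgroup v2 host without MXJ, the same factor gives 50.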

 

Also, LSF's resizable job feature has been enhanced: if a new execution host is added to a resizable job, LSF creates a cgroup for the job on the new host (if the host supports cgroups). As a resizable job grows or shrinks, LSF adjusts the cpu.shares or cpu.weight value of the job's cgroups.
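To illustrate the resize behavior, the sketch below recomputes cgroup v1 cpu.shares per host when a job's allocation changes and reports which hosts would need a new cgroup and which existing values would be adjusted. It assumes MXJ is defined on every host, LSB_CGROUP_CPU_SHARES_OLD_DEFAULT=N, and no CGROUP_CPU_SHARES_FACTOR. Host names and function names are hypothetical, and the sketch does not model what happens to cgroups on hosts that leave the allocation, since this readme does not describe it.

```python
from typing import Dict, List, Tuple


def per_host_shares(alloc: Dict[str, int], mxj: Dict[str, int]) -> Dict[str, float]:
    """cgroup v1 cpu.shares per host: 1024 * (tasks_on_host / MXJ)."""
    return {host: 1024 * tasks / mxj[host] for host, tasks in alloc.items()}


def resize_actions(
    old_alloc: Dict[str, int], new_alloc: Dict[str, int], mxj: Dict[str, int]
) -> Tuple[List[str], List[str], Dict[str, float]]:
    """Compare per-host values before and after a resize."""
    old = per_host_shares(old_alloc, mxj)
    new = per_host_shares(new_alloc, mxj)
    # Hosts that joined the job: a new cgroup would be created there.
    created = sorted(h for h in new if h not in old)
    # Hosts that stayed but whose task count changed: value is adjusted.
    adjusted = sorted(h for h in new if h in old and new[h] != old[h])
    return created, adjusted, new
```

For example, shrinking hostA from 4 to 2 tasks (MXJ=16) halves its value from 256 to 128, while a newly added hostB running 4 tasks (MXJ=8) starts at 512.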

 

Note: For cgroup v2 support, you must use the lsf10.1_lnx310-lib217-x86_64 package, not the lsf10.1_linux2.6-glibc2.3-x86_64 package.

 

Readme file for: IBM® Spectrum LSF

Product or component release: 10.1

Update name: Fix 601691

Fix ID: LSF-10.1-build601691

Publication date: 7 November 2023

 

Contents

1. List of fixes

2. Download location

3. Product or components affected

4. System requirements

5. Installation and configuration

6. List of files

7. Product notifications

8. Copyright and trademark information

 

1. List of fixes

P104943

 

2. Download location

Download Fix 601691 from the following location: https://www.ibm.com/support/fixcentral

 

3. Product or components affected

Affected product or components include:

LSF/sbatchd

LSF/res

LSF/mbatchd

LSF/mbschd

LSF/bqueues

LSF/bapp

LSF/bjobs

LSF/bhosts

LSF/ebrokerd

 

4. System requirements

linux3.10-glibc2.17-x86_64

linux2.6-glibc2.3-x86_64

 

5. Installation and configuration

Before you install

LSF_TOP is the full path to the top-level installation directory of LSF.

1.      Before you apply this fix, ensure that you have installed LSF 10.1 Fix Pack 12 or later. Download LSF 10.1 Fix Pack 12 from https://www.ibm.com/support/fixcentral and search for build600488. Contact IBM LSF Support if you have any questions or problems installing Fix Pack 12.

2.      Starting in LSF 10.1 Fix Pack 13, the default values of the following three GPU parameters are changed to:
LSF_GPU_AUTOCONFIG=Y
LSB_GPU_NEW_SYNTAX=extend
LSF_GPU_RESOURCE_IGNORE=Y

If you have Fix Pack 13 or later installed and any of these GPU parameters is not configured in the lsf.conf configuration file, LSF uses the new default value for that parameter. Parameters that are already configured in the lsf.conf file are not affected.

If you want to keep the former GPU behavior and any of the three parameters is missing from the lsf.conf configuration file, you must explicitly configure it with the default value used in Fix Pack 12 and earlier:
LSF_GPU_AUTOCONFIG=N
LSB_GPU_NEW_SYNTAX=N
LSF_GPU_RESOURCE_IGNORE=N

3.      Log on to the LSF management host as the LSF primary administrator.

4.      Set your environment:
-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
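Step 2 above can be checked before you install with a short script that reports which of the three GPU parameters are missing from lsf.conf and lists the Fix Pack 12 values to add. This is a sketch: the function names are hypothetical, and the parser assumes simple KEY=VALUE lines with # comments, so it does not handle every lsf.conf construct.

```python
from typing import Dict, List

# Defaults in Fix Pack 12 and earlier, as listed in this readme.
PRE_FP13_GPU_DEFAULTS = {
    "LSF_GPU_AUTOCONFIG": "N",
    "LSB_GPU_NEW_SYNTAX": "N",
    "LSF_GPU_RESOURCE_IGNORE": "N",
}


def parse_lsf_conf(text: str) -> Dict[str, str]:
    """Minimal lsf.conf reader: KEY=VALUE lines, '#' comment lines skipped."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def missing_gpu_settings(text: str) -> List[str]:
    """Lines to add to lsf.conf to keep the pre-Fix Pack 13 GPU behavior."""
    configured = parse_lsf_conf(text)
    return [
        f"{key}={value}"
        for key, value in PRE_FP13_GPU_DEFAULTS.items()
        if key not in configured  # already-configured values are left alone
    ]
```

To use it, read the lsf.conf file from the directory named by LSF_ENVDIR (set when you source cshrc.lsf or profile.lsf) and pass its contents to missing_gpu_settings().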

Installation steps

1.    Run badmin hclose all

2.    Run badmin qinact all

3.    Log on to the LSF management host as root and set the LSF cluster environment.

4.    Go to the install directory: cd $LSF_ENVDIR/../10.1/install/

5.    Copy the fix file to the install directory: $LSF_ENVDIR/../10.1/install/

6.    Run patchinstall: ./patchinstall <fix>

After you install

1.      Log on to the LSF management host as the LSF primary administrator and set your environment:

-For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

2.    Run lsadmin resrestart all

3.    Run badmin hrestart all

4.    Run badmin hopen all

5.    Run badmin qact all

Note: If the MXJ value for a host changes, restart the sbatchd daemon on that host (for example, with badmin hrestart host_name).

 

Uninstallation

1.    Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment.

2.    Run badmin hclose all

3.    Run badmin qinact all

4.    Log on to the LSF management host as root and set the LSF cluster environment.

5.    Go to the fix install directory: cd $LSF_ENVDIR/../10.1/install/

6.    Run ./patchinstall -r <patch>

7.    Log on to the LSF management host as the LSF cluster primary administrator and set the LSF cluster environment.

8.   Run lsadmin resrestart all

9.    Run badmin hrestart all

10.   Run badmin hopen all

11.   Run badmin qact all
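The installation, post-installation, and uninstallation command sequences in this section are symmetric: only the patchinstall step differs. The hypothetical checklist generator below records each command with the account that runs it and renders the sequence in order. It does not execute anything, and it leaves out the log-on, cd, and file-copy steps; the fix file name is a parameter because this readme uses a <fix>/<patch> placeholder.

```python
from typing import Dict, List, Tuple

# Each step is (account, command). "admin" is the LSF primary administrator
# and "root" is root; both need the LSF cluster environment set.
_CLOSE = [("admin", "badmin hclose all"), ("admin", "badmin qinact all")]
_REOPEN = [
    ("admin", "lsadmin resrestart all"),
    ("admin", "badmin hrestart all"),
    ("admin", "badmin hopen all"),
    ("admin", "badmin qact all"),
]

PLANS: Dict[str, List[Tuple[str, str]]] = {
    # patchinstall runs from $LSF_ENVDIR/../10.1/install/
    "install": _CLOSE + [("root", "./patchinstall {fix}")] + _REOPEN,
    "uninstall": _CLOSE + [("root", "./patchinstall -r {fix}")] + _REOPEN,
}


def checklist(action: str, fix: str) -> List[str]:
    """Render the command sequence for 'install' or 'uninstall' as numbered lines."""
    return [
        f"{i}. [{who}] {cmd.format(fix=fix)}"
        for i, (who, cmd) in enumerate(PLANS[action], start=1)
    ]
```

Printing checklist("install", "<fix>") line by line gives an ordered runbook that an operator can follow or paste into a change ticket.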

 

6. List of files

The following components are included in all Linux and UNIX packages:

LSF/sbatchd

LSF/res

LSF/mbatchd

LSF/mbschd

LSF/bqueues

LSF/bapp

LSF/bjobs

LSF/bhosts

LSF/ebrokerd

 

7. Product notifications

To receive information about product solution and fix updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notification about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.

 

8. Copyright and trademark information

©Copyright IBM Corporation 2023

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.