Readme file for IBM® Spectrum LSF 10.1 fix 601504
Abstract
P104844. This fix supports the following enhancements to LSF:
· LSF rate limiter
This fix is applicable for LSF Fix Pack 12 or later.
If you want to apply LSF 10.1 fix 601284, do so before applying this fix (601504). Otherwise, you will need to uninstall this fix and revert any configuration changes, and then install both fixes in order.
Description
Readme documentation for IBM® Spectrum LSF 10.1 fix 601504 including installation-related instructions, prerequisites and co-requisites, and list of fixes.
This fix provides the following new features and enhancements:
ID: P104844
Description:
LSF now supports a new component, called the rate limiter, which prevents excessive requests to the mbatchd daemon. Applications that use the LSF batch library should be relinked with the updated library when the rate limiter is enabled.
By default, all LSF batch commands contact the mbatchd daemon (or the mbatchd query child, if configured). When there are excessive requests, such as scripts that run bjobs commands in a tight loop, mbatchd can become overloaded, which negatively affects cluster performance. To protect mbatchd from heavy request loads, a new rate limiter component, lsfproxyd, acts as a gatekeeper between the commands and the mbatchd daemon.
If rate limiting is configured, the lsfproxyd daemon controls the number of requests that can reach the mbatchd daemon. A request must first obtain a token from lsfproxyd to be able to contact mbatchd. After the request finishes, the token is returned to lsfproxyd. The lsfproxyd daemon distributes tokens in round-robin fashion among users, so that each user has a fair chance to get their requests served, even under heavy request loads.
Once the number of tokens in use reaches its maximum value, a request (from the same category) from an ordinary user (that is, a user who is neither root nor a cluster administrator) is not granted a token. After a certain number of retries (configured by LSF_PROXYD_NOTOKEN_NTRIES in the lsf.conf configuration file), the command stops trying to contact mbatchd and fails with an error message. Requests from root or LSF cluster administrators are exempt from this policy: they are always granted a token, although their requests still count towards the tokens currently in use.
Configuring the rate limiter
The rate limiter component is enabled by configuring the two new LSF_PROXYD_HOSTS and LSF_PROXYD_PORT parameters in the lsf.conf configuration file:
· LSF_PROXYD_HOSTS - Defines the hosts that run lsfproxyd.
· LSF_PROXYD_PORT - Port number for the lsfproxyd daemon.
The lsfproxyd daemon is started by the lim daemon. For the configuration to take effect, restart lim (by running lsadmin limrestart) on all the hosts listed in your old and new LSF_PROXYD_HOSTS parameters. Also, when you change the LSF_PROXYD_HOSTS parameter, you must restart the mbatchd daemon (by running badmin mbdrestart).
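As an illustration, a minimal lsf.conf fragment that enables the rate limiter might look like the following. The host names and the port number are placeholders, and the quoted, space-separated host list is an assumption about the value syntax, so check the lsf.conf reference for your fix pack before using it.

```shell
# lsf.conf fragment (illustrative values only)
# hostA and hostB are placeholder host names; 7000 is a placeholder port.
LSF_PROXYD_HOSTS="hostA hostB"
LSF_PROXYD_PORT=7000
```

After editing the file, the restarts described above (lsadmin limrestart on the affected hosts, then badmin mbdrestart) apply the change.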
The badmin command now recognizes the new lsfproxyd keyword:
Running badmin lsfproxyd enable|disable all|query|sub|other enables or disables the limiter functionality while the lsfproxyd daemon continues to run. If all is specified, the rate limiter is enabled or disabled for all request types. If query is specified, only query requests are affected. If sub is specified, only submission requests are affected. Finally, if other is specified, requests that are neither query nor submission requests are affected. If a request type is disabled, lsfproxyd does not distribute tokens for that request type, and the requests proceed to mbatchd instead. If lsfproxyd is disabled (or the lsfproxyd daemons are all down), queries without tokens will be accepted by mbatchd.
Running badmin lsfproxyd status displays whether the different request types are enabled. The status information also shows whether lsfproxyd is connected to mbatchd, its share of the token limit, how many tokens are in use, and how many tokens are in use by privileged users (root or cluster administrators). Metrics are also reported for requests where tokens were granted, rejected, or blocked, or where an error occurred. Metrics are displayed for the last (60-second) reporting period, the maximum seen during any reporting period, and the average since lsfproxyd started:
lsfproxyd service status:
When using the rate limiter, running badmin lsfproxyd block allows an administrator to temporarily block non-root, non-administrator users, hosts, or both from sending requests to the mbatchd daemon. Administrators can run this command to temporarily stop abusive or misbehaving users from interacting with the LSF cluster, and to avoid a performance impact on other users.
Example of removing all from a blocklist:
Example of blocking user1 and user2:
Example of blocking hostA and hostB: $ badmin lsfproxyd block -m "hostA hostB"
Example of blocking user1 at hostA: $ badmin lsfproxyd block -u "user1" -m "hostA"
Example of blocking user1 and user2, at hostA and hostB: $ badmin lsfproxyd block -u "user1 user2" -m "hostA hostB"
Example of unblocking user1: $ badmin lsfproxyd unblock -u user1
Example of unblocking user1 at hostA and hostB: $ badmin lsfproxyd unblock -u "user1" -m "hostA hostB"
Example to see a summary of who is currently blocked: $ badmin lsfproxyd blocklist
Each block command is treated as a rule. An unblock command must match one of the block rules to remove it. For example, in this sequence of block commands:
Rule 3 only removes rule 2, not rule 1, so that user1@host1 is still blocked.
By the same logic, unblock all removes only a block all rule and leaves all other rules in place.
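The rule semantics can be pictured with a small shell model. This is only a sketch of the behaviour described above, not how lsfproxyd stores its blocklist. The block_rule, unblock_rule, and RULES names are hypothetical, and the three-step sequence is an assumed example chosen to be consistent with the description of rules 1, 2, and 3, including the assumption that blocking a user without a host applies on any host.

```shell
#!/bin/sh
# Toy model of the blocklist: every block command adds one "users|hosts"
# rule, and an unblock command removes only a rule with the same text.
RULES=""

block_rule() {      # block_rule USERS HOSTS (use "" for "not specified")
    RULES=$(printf '%s\n%s\n' "$RULES" "$1|$2" | sed '/^$/d')
}

unblock_rule() {    # unblock_rule USERS HOSTS
    RULES=$(printf '%s\n' "$RULES" | grep -Fxv -e "$1|$2" || true)
}

block_rule "user1" "host1"   # rule 1: block user1 at host1
block_rule "user1" ""        # rule 2: block user1 (assumed: on any host)
unblock_rule "user1" ""      # rule 3: matches rule 2 only

printf '%s\n' "$RULES"       # prints user1|host1, so rule 1 remains
```

Because the unblock step compares whole rules rather than individual users or hosts, rule 1 survives and user1 at host1 stays blocked in this model.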
If the rate limiter is enabled, and either ENABLE_DIAGNOSE=lsfproxyd is configured in the lsb.params configuration file or lsfproxyd diagnosis is enabled with the badmin diagnose -c lsfproxyd command and its options, lsfproxyd logs all requests to a diagnostic log file with a default file name of info.lsfproxydlog.<hostname> (similar to the diagnostic log file generated by mbatchd). The content of the lsfproxyd diagnostic log, however, is slightly different from that of the mbatchd diagnostic log. Here is an example of an lsfproxyd diagnostic log file:
Feb 8 07:57:12 2023 QUERY_REQ,BATCH_JOB_QUERY,user1,host1,184,0x1A,1,075423,075426
Feb 8 07:57:13 2023 QUERY_REQ,BATCH_HOST_INFO,user1,host1,6624,-,1,075424,075450
Feb 8 07:57:15 2023 QUERY_REQ,BATCH_JOB_QUERY,user1,host1,184,0x1A,1,075530,075535
Feb 8 07:57:16 2023 QUERY_REQ,BATCH_HOST_INFO,user1,host1,6624,-,0,075520,075620
Feb 8 07:57:36 2023 QUERY_REQ,BATCH_HOST_INFO,user1,host1,6624,-,0,075540,075640
Feb 8 07:57:41 2023 SUBMISSION_REQ,BATCH_JOB_SUB,user1,host1,36,1,075541,075542
Compare the mbatchd diagnostic logging format:
DATE TIME YEAR COMMAND,USER,HOSTNAME,SIZE,OPTION
with the lsfproxyd diagnostic logging format, which captures more information:
DATE TIME YEAR CATEGORY,BATCH_OPCODE,USER,HOSTNAME,SIZE,ACCEPT,RECEIVE TIME,PROCESS TIME
where:
· CATEGORY: The request category (QUERY_REQ, SUBMISSION_REQ, or OTHER_REQ).
· BATCH_OPCODE: The batch opcode of the request.
· ACCEPT: Whether the token request was accepted (1) or rejected (0).
· RECEIVE TIME: Time in HHMMSS format, when the request was received by lsfproxyd.
· PROCESS TIME: Time in HHMMSS format, when the request was processed by lsfproxyd.
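A log in this format can be summarized with standard tools. The following sketch counts accepted and rejected requests per user. The summarize_proxyd_log name is hypothetical. The query records in the sample carry one more comma-separated field (0x1A or -) than the format line lists, so the sketch reads ACCEPT by counting from the end of the record. That approach is an assumption based only on the sample shown here.

```shell
#!/bin/sh
# Summarize accepted and rejected requests per user from an lsfproxyd
# diagnostic log. Reads the named files, or standard input if none are given.
summarize_proxyd_log() {
    awk '
    {
        # The record is the last whitespace-separated word on the line;
        # everything before it is the DATE TIME YEAR prefix.
        n = split($NF, f, ",")
        if (n < 8) next              # not a request record
        user = f[3]
        accept = f[n - 2]            # ACCEPT: 1 = accepted, 0 = rejected
        if (accept == 1) ok[user]++; else rejected[user]++
        seen[user] = 1
    }
    END {
        # Output order across users is unspecified in awk.
        for (u in seen)
            printf "%s accepted=%d rejected=%d\n", u, ok[u], rejected[u]
    }' "$@"
}
```

For example, summarize_proxyd_log info.lsfproxydlog.host1 prints one line per user. For the six sample records above, it reports user1 accepted=4 rejected=2.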
In addition to the required configuration for rate limiter enablement and logging, you can also set several optional parameters in the lsf.conf configuration file:
· LSF_DEBUG_PROXYD - Sets the log class for debugging lsfproxyd.
· LSF_PROXYD_BYPASS - Determines how mbatchd responds to query requests without tokens.
· LSF_PROXYD_HEARTBEAT_INTERVAL - Sets the frequency at which lsfproxyd sends a heartbeat message to mbatchd.
· LSF_PROXYD_TOKEN_WAIT_TIME - The amount of time that a token request can age before it is considered too old.
· LSF_ADDON_HOSTS - Any requests for tokens received from the specified hosts are treated as privileged requests. A request for a token is granted regardless of the current count of tokens in use, unless the user or host has been explicitly blocked.
· LSF_PROXYD_NOTOKEN_NTRIES - Number of times that an LSF command attempts to contact the lsfproxyd daemon after it is not granted a token. If this parameter is set to a value, LSF attempts to contact lsfproxyd only the defined number of times and then quits.
  Default value: infinite
  Valid values: Any positive integer.
  Example: LSF_PROXYD_NOTOKEN_NTRIES=3
Existing parameters in the lsf.conf file for the rate limiter
· LSB_DEBUG_MBD - Sets the debugging log class for mbatchd.
· PROXYD_POLICIES - Specifies the max and nominal token values, and the throttle value, when lsfproxyd is configured.
  o max - The maximum number of tokens used for the specified requests.
  o nominal - If the number of in-use tokens is below this threshold value, tokens are granted as quickly as possible.
  o throttle - If the number of in-use tokens has reached the nominal value, lsfproxyd waits this throttle value, in milliseconds, before granting another token.
  The max and nominal values are divided evenly among the running lsfproxyd daemons in the cluster.
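The interaction of max, nominal, and throttle can be sketched as a small decision function. This is a simplified model of the description above, not the lsfproxyd implementation. The per_daemon_share and token_decision names are hypothetical, integer division for the even split is an assumption, and the model assumes that privileged requests are granted without throttling, which the description does not state.

```shell
#!/bin/sh
# per_daemon_share TOTAL DAEMONS: even split of a cluster-wide value.
# Integer division is an assumption; remainder handling is not documented here.
per_daemon_share() {
    echo $(( $1 / $2 ))
}

# token_decision IN_USE NOMINAL MAX PRIVILEGED
# PRIVILEGED is 1 for root or a cluster administrator, 0 otherwise.
token_decision() {
    in_use=$1
    nominal=$2
    max=$3
    privileged=$4
    if [ "$privileged" -eq 1 ]; then
        echo "grant"                  # always granted, still counted as in use
    elif [ "$in_use" -ge "$max" ]; then
        echo "deny"                   # limit reached for ordinary users
    elif [ "$in_use" -ge "$nominal" ]; then
        echo "grant-after-throttle"   # wait the throttle value (ms) first
    else
        echo "grant"                  # below nominal: grant immediately
    fi
}

# Example: max=100 and nominal=60 shared by 2 running lsfproxyd daemons.
daemon_max=$(per_daemon_share 100 2)       # 50
daemon_nominal=$(per_daemon_share 60 2)    # 30
token_decision 10 "$daemon_nominal" "$daemon_max" 0   # grant
token_decision 40 "$daemon_nominal" "$daemon_max" 0   # grant-after-throttle
token_decision 50 "$daemon_nominal" "$daemon_max" 0   # deny
token_decision 50 "$daemon_nominal" "$daemon_max" 1   # grant
```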
Readme file for: IBM® Spectrum LSF
Product or component release: 10.1
Update name: Fix 601504
Fix ID: LSF-10.1-build601504
Publication date: 26 March 2023
Contents
1. List of fixes
2. Download location
3. Product or components affected
4. System requirements
5. Installation and configuration
6. List of files
7. Product notifications
8. Copyright and trademark information
1. List of fixes
P104844
2. Download location
Download fix 601504 from the following location: https://www.ibm.com/support/fixcentral
3. Product or components affected
Affected product or components in this fix reflect the list of enhancements previously described:
lim
lsfproxyd
mbatchd
mbschd
nios
res
sbatchd
ebrokerd
bacct
badmin
bapp
battach
battr
bbot
bchkpnt
bclusters
bconf
bctrld
bdc
bentags
bgadd
bgdel
bgmod
bgpinfo
bhist
bhosts
bhpart
bimages
bjdepinfo
bjgroup
bjobs
bkill
blimits
bmg
bmgroup
bmig
bmod
bparams
bpeek
bpost
bqc
bqueues
bread
breboot
breconfig
brequeue
bresize
bresources
brestart
bresume
brsvadd
brsvdel
brsvjob
brsvmod
brsvs
brsvsub
brun
bsla
bslots
bstatus
bstop
bsub
bswitch
btop
bugroup
busers
bwait
lshosts
lsload
liblsf.a
liblsf.so
libbat.a
libbat.so
liblsbstream.a
liblsbstream.so
lsbatch.h
lsf.h
4. System requirements
linux2.6-glibc2.3-x86_64
linux3.10-glibc2.17-x86_64
5. Installation and configuration
Before you install
LSF_TOP is the full path to the top-level installation directory of LSF.
1. You must have LSF 10.1 Fix Pack 12 or later installed before you install this fix. Download this fix pack from IBM Fix Central (https://www.ibm.com/support/fixcentral) and search for build600488 (Fix Pack 12) or build601088 (Fix Pack 13).
2. Starting in LSF 10.1 Fix Pack 13, the default values of the following three GPU parameters are changed to:
LSF_GPU_AUTOCONFIG=Y
LSB_GPU_NEW_SYNTAX=extend
LSF_GPU_RESOURCE_IGNORE=Y
If you have Fix Pack 13 installed, and these three GPU parameters are not configured in the lsf.conf configuration file, they will take the default values, and the parameters already configured in the lsf.conf file will not be affected.
If you want to keep the former GPU behavior, and if any of the three parameters are missing in the lsf.conf configuration file, you must explicitly configure the following default settings that are defined in Fix Pack 12 or earlier:
LSF_GPU_AUTOCONFIG=N
LSB_GPU_NEW_SYNTAX=N
LSF_GPU_RESOURCE_IGNORE=N
3. Log on to the LSF management host as the LSF primary administrator.
4. Set the LSF cluster environment:
- For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
5. Run badmin hclose all
6. Run badmin qinact all
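For sh-compatible shells, steps 4 to 6 can be collected into one function. This is a convenience sketch rather than part of the fix: the prepare_cluster_for_fix name is hypothetical, and it assumes that the LSF_TOP variable has been set to the top-level installation directory.

```shell
#!/bin/sh
# Pre-installation steps 4 to 6 as a function. Run it as the LSF primary
# administrator on the management host, with LSF_TOP set beforehand.
prepare_cluster_for_fix() {
    : "${LSF_TOP:?set LSF_TOP to the LSF installation directory}"
    . "$LSF_TOP/conf/profile.lsf" || return 1   # set the cluster environment
    badmin hclose all || return 1               # close all hosts
    badmin qinact all                           # inactivate all queues
}
```

Calling prepare_cluster_for_fix stops at the first command that fails, so the queues are not inactivated if the hosts could not be closed.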
Installation steps
1. Log on to the LSF management host as root and set the LSF cluster environment.
2. Go to the installation directory for the fix: cd $LSF_ENVDIR/../10.1/install/
3. Copy the fix file to the installation directory $LSF_ENVDIR/../10.1/install/
4. Run the patchinstall command: ./patchinstall <fix>
After you install
1. Log on to the LSF management host as the LSF primary cluster administrator and set the LSF cluster environment.
2. Run lsadmin limrestart all
3. Run lsadmin resrestart all
4. Run badmin hrestart all
5. Run badmin mbdrestart
6. Run badmin hopen all
7. Run badmin qact all
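Steps 2 to 7 can likewise be chained so that a failure stops the sequence before hosts and queues are reopened. The restart_cluster_after_fix name is hypothetical. Some of these commands may ask for confirmation interactively, so treat this as a sketch rather than an unattended script.

```shell
#!/bin/sh
# Post-installation steps 2 to 7 as a function; stops at the first failure.
# Run it as the LSF primary cluster administrator with the LSF environment set.
restart_cluster_after_fix() {
    lsadmin limrestart all &&
    lsadmin resrestart all &&
    badmin hrestart all &&
    badmin mbdrestart &&
    badmin hopen all &&
    badmin qact all
}
```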
Uninstallation
1. Log on to the LSF management host as the LSF primary cluster administrator and set the LSF cluster environment.
2. Run badmin hclose all
3. Run badmin qinact all
4. Log on to the LSF management host as root and set the LSF cluster environment.
5. Go to install directory of the fix: cd $LSF_ENVDIR/../10.1/install/
6. Run the patchinstall command: ./patchinstall -r <fix>
7. Log on to the LSF management host as the LSF primary cluster administrator and set the LSF cluster environment.
8. Run lsadmin limrestart all
9. Run lsadmin resrestart all
10. Run badmin hrestart all
11. Run badmin mbdrestart
12. Run badmin hopen all
13. Run badmin qact all
6. List of files
lim
lsfproxyd
mbatchd
mbschd
nios
res
sbatchd
ebrokerd
bacct
badmin
bapp
battach
battr
bbot
bchkpnt
bclusters
bconf
bctrld
bdc
bentags
bgadd
bgdel
bgmod
bgpinfo
bhist
bhosts
bhpart
bimages
bjdepinfo
bjgroup
bjobs
bkill
blimits
bmg
bmgroup
bmig
bmod
bparams
bpeek
bpost
bqc
bqueues
bread
breboot
breconfig
brequeue
bresize
bresources
brestart
bresume
brsvadd
brsvdel
brsvjob
brsvmod
brsvs
brsvsub
brun
bsla
bslots
bstatus
bstop
bsub
bswitch
btop
bugroup
busers
bwait
lshosts
lsload
liblsf.a
liblsf.so
libbat.a
libbat.so
liblsbstream.a
liblsbstream.so
lsbatch.h
lsf.h
7. Product notifications
To receive information about product solution and fix updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notifications about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.
8. Copyright and trademark information
©Copyright IBM Corporation 2023
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.