IBM Spectrum LSF 10.1.0.9 Fix 548727
Readme
Abstract
This fix resolves an issue with mbschd and preemption that might cause a core dump.
Description
Readme documentation for IBM Spectrum LSF 10.1.0.9 Fix 548727, including installation-related instructions, prerequisites and co-requisites, and a list of fixes.
This fix addresses the following issue:
There is a potential for a core dump in the mbschd daemon that is related to preemption. This fix resolves that issue.
Readme file for: IBM® Spectrum LSF
Product/Component Release: 10.1.0.9
Update Name: Fix 548727
APAR: P103620
Fix ID: LSF-10.1.0.9-build548727
Publication date: 27 May 2020
Last modified date: 27 May 2020
Contents:
ID | Description |
RFE#140975 | An enhancement to LSF that changes the default handling of a job that fails during setup. You can now configure LSF to retry scheduling or dispatching a job if the setup fails with a specific CSM API error code. |
P103447 | Fixed preemption calculations in the allocation planner related to NO_PREEMPT_INTERVAL, NO_PREEMPT_FINISH_TIME, and PREEMPT_DELAY. |
P103322 | Fixed an incorrect calculation of the hierarchical fairshare factor for absolute priority scheduling. |
P103235 | Resolves an issue where a job might fail during its post-job processing. The failure was caused by a segmentation fault in sbatchd. |
P103168 | If the name of a compute node could not be resolved (for example, because of misconfigured DNS), LSF incorrectly set the associated login node as unavailable. As a result, all the compute nodes associated with that login node also became unavailable. |
P103167 | Resolves a discrepancy between the output of 'bjobs -prio' and 'bqueues -r' when the queue is configured with the FAIRSHARE and APS_PRIORITY policies. |
P103106 | Fixed an issue where, when "JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]" is configured, using bstop to suspend a jsrun job caused the job to be killed instead of requeued. |
P103032 | Normally, when a job reaches its run limit, LSF sends the job SIGUSR2. If the job does not exit within 10 minutes, LSF sends SIGQUIT. If the job continues to run, LSF sends SIGTERM. Finally, if the job is still running, LSF sends SIGKILL. LSF can now be configured so that when a JSM job reaches its run limit, LSF skips SIGUSR2 and starts with SIGQUIT. To enable this behavior, set LSB_JSM_RUNLIMIT_SKIP_SIGUSR2=y in the LSF_ENVDIR/lsf.conf configuration file and restart all the sbatchd daemons. It is recommended that you do not enable this feature until JSM has been updated to handle SIGQUIT. |
P103031 | Resolves a scheduler hang that was introduced by a previous fix for a system advance reservation (AR) issue, in which jobs would run into an AR during backfill. This fix resolves the hang by improving the scheduler's performance during backfill and includes a new fix for the system AR issue. This fix also resolves an issue where some jobs do not backfill. |
P102991 | A job submitted with "-jsm d" ran with an incorrect CSM allocation type of "user-managed". This fix corrects the allocation type to "jsm". |
P102969 | The CLEAN_PERIOD_DONE parameter, which sets the amount of time that finished jobs are kept in mbatchd core memory, was limited to one week. This fix removes that limit. |
RFE#132797 | This solution enhances granular security control for the bhist and bacct commands. |
RFE#132887 | Introduces a new level, SECURE_JOB_INFO_LEVEL=5, that allows users to view summary information for all jobs, including jobs that belong to other users, regardless of whether those users belong to the same user group. |
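The run-limit signal order described for P103032 in the list above can be illustrated with a small model. This is only a sketch of the documented order, not LSF code: the function names `parse_lsf_conf` and `runlimit_signals` are hypothetical, the simple NAME=value parsing is an assumption about the lsf.conf layout, and treating the value case-insensitively is also an assumption. The timing between signals is left out because only the first 10-minute interval is stated.

```python
# Illustrative model only; this is not how LSF is implemented.
# Documented default order of signals after a job reaches its run limit.
ESCALATION = ["SIGUSR2", "SIGQUIT", "SIGTERM", "SIGKILL"]


def parse_lsf_conf(text):
    """Parse NAME=value lines, ignoring blank lines and '#' comments.

    Assumption: a simplified view of the lsf.conf format.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip().strip('"')
    return params


def runlimit_signals(params, is_jsm_job):
    """Return the ordered signals sent once the run limit is reached."""
    # Assumption: the value is compared case-insensitively against "y".
    skip = params.get("LSB_JSM_RUNLIMIT_SKIP_SIGUSR2", "n").lower() == "y"
    if is_jsm_job and skip:
        # JSM jobs start with SIGQUIT when the parameter is enabled.
        return ESCALATION[1:]
    return list(ESCALATION)


conf = parse_lsf_conf("# lsf.conf excerpt\nLSB_JSM_RUNLIMIT_SKIP_SIGUSR2=y\n")
print(runlimit_signals(conf, is_jsm_job=True))
print(runlimit_signals(conf, is_jsm_job=False))
```

In this model, only JSM jobs are affected by the parameter; other jobs keep the full sequence that starts with SIGUSR2.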
Download Fix 548727 from the following location: http://www.ibm.com/eserver/support/fixes/
To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page http://www.ibm.com/support/mynotifications/ on the IBM Support website (http://support.ibm.com). You can edit your subscription settings to choose the types of information that you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.
Affected components include:
LSF/mbschd
7.1 Before installation
7.2 Installation steps
7.3 After installation
7.4 Uninstallation
lsf-10.1.0.9-548727.ppc64le_csm.bin
The contents of lsf-10.1.0.9-548727.ppc64le_csm.bin:
lsf-common-10.1.0.9-548727.ppc64le.rpm
lsf-master-10.1-548727.ppc64le.rpm
lsf-misc-10.1.0.9-548727.ppc64le.rpm
lsf-python2-api-1.0.6-10.1.0.9.ppc64le.rpm
lsf-server-10.1.0.9-548727.ppc64le.rpm
ibm_smpi-jsm-10.03.01.00rtm5-rh7_20191114.ppc64le.rpm
© Copyright IBM Corporation 2020
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml