IBM Spectrum LSF 10.1.0.9 Fix 548727 Readme

Abstract

This fix resolves an issue with mbschd and preemption that might cause a core dump.

Description

Readme documentation for IBM Spectrum LSF 10.1.0.9 Fix 548727, including installation-related instructions, prerequisites and co-requisites, and a list of fixes.

This fix addresses the following issue:

There is a potential for a core dump in the mbschd daemon that is related to preemption. This fix resolves that issue.

Readme file for: IBM® Spectrum LSF
Product/Component Release: 10.1.0.9
Update Name: Fix 548727
APAR: P103620
Fix ID: LSF-10.1.0.9-build548727
Publication date: 27 May 2020
Last modified date: 27 May 2020

Contents:

  1. List of fixes

    P103620

    Previous fixes since May 21, 2020

    ID
    Description

    RFE#140975
    An enhancement to LSF to change the default behavior of what happens to a job that fails during setup. This enhancement allows you to configure LSF so that it repeats the attempt to schedule or dispatch a job if the setup fails with a specific CSM API error code.

    P103447
    Fixed preemption calculations in the allocation planner related to NO_PREEMPT_INTERVAL, NO_PREEMPT_FINISH_TIME, and PREEMPT_DELAY.

    P103322
    Fixed incorrect calculation of the hierarchical fairshare factor for absolute priority scheduling.

    P103235
    Resolved an issue where a job might fail during post-job processing because of a segmentation fault in sbatchd.

    P103168
    If a compute node's name could not be resolved (for example, because of misconfigured DNS), LSF incorrectly marked the associated login node as unavailable. As a result, all the compute nodes associated with that login node were also unavailable.

    P103167
    Fixed a discrepancy between the output of 'bjobs -prio' and 'bqueues -r' when the queue is configured with the FAIRSHARE and APS_PRIORITY policies.

    P103106
    Fixed an issue where, when "JOB_CONTROLS = SUSPEND[bmig $LSB_BATCH_JID]" is configured, using bstop to suspend a jsrun job caused the job to be killed instead of requeued.

    P103032
    Normally, when a job reaches its run limit, LSF sends the job SIGUSR2. If the job does not exit within 10 minutes, LSF sends SIGQUIT. If the job continues to run, LSF sends SIGTERM. Finally, if the job is still running, LSF sends SIGKILL. LSF can now be configured so that, when a JSM job reaches its run limit, LSF skips SIGUSR2 and starts with SIGQUIT. To enable this behavior, set LSB_JSM_RUNLIMIT_SKIP_SIGUSR2=y in the LSF_ENVDIR/lsf.conf configuration file and restart all the sbatchd daemons. It is recommended that you do not enable this feature until JSM has been updated to handle SIGQUIT.

    P103031
    Resolved a scheduler hang that was introduced by a previous system advance reservation (AR) fix, which addressed jobs running into an AR during backfill. This fix resolves the hang by improving the scheduler's performance during backfill and includes a new fix for the system AR issue. This fix also resolves an issue where some jobs do not backfill.

    P102991
    A job submitted with "-jsm d" would run with an incorrect CSM allocation type of "user-managed". This fix corrects the allocation type to "jsm".

    P102969
    Removed the one-week limit on the CLEAN_PERIOD_DONE parameter, which sets the amount of time that finished jobs are kept in mbatchd core memory.

    RFE#132797
    This enhancement provides more granular security control for the bhist and bacct commands.

    RFE#132887
    Introduces a new level, SECURE_JOB_INFO_LEVEL=5, that allows users to view summary information for all jobs, including jobs that belong to other users, regardless of whether the other users belong to the same user group.
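
The P103032 entry above describes enabling LSB_JSM_RUNLIMIT_SKIP_SIGUSR2=y in lsf.conf. The following is a minimal sketch of one way to make that change without creating duplicate entries. The helper name set_conf_param is hypothetical and not part of LSF, and the sketch assumes GNU sed and a configuration file of plain NAME=VALUE lines that ends with a newline.

```shell
# Hypothetical helper (not an LSF command): set NAME=VALUE in a
# key=value configuration file such as lsf.conf.
# Assumes GNU sed (-i) and that the file ends with a newline.
set_conf_param() {
    local file="$1"
    local name="$2"
    local value="$3"
    if grep -q "^${name}=" "$file"; then
        # The parameter is already present: rewrite it in place.
        sed -i "s|^${name}=.*|${name}=${value}|" "$file"
    else
        # The parameter is absent: append it as a new line.
        printf '%s=%s\n' "$name" "$value" >> "$file"
    fi
}
```

A possible invocation is set_conf_param "$LSF_ENVDIR/lsf.conf" LSB_JSM_RUNLIMIT_SKIP_SIGUSR2 y, followed by a restart of the sbatchd daemons (for example, with badmin hrestart all) so that the setting takes effect.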

  5. Download Location
    Download Fix 548727 from the following location: http://www.ibm.com/eserver/support/fixes/

  7. Product notifications
    To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page http://www.ibm.com/support/mynotifications/ on the IBM Support website (http://support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notification about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.

  9. Products or components affected
    Affected components include:

     LSF/mbschd

  11. System requirements

    linux3.10-glibc2.17-ppc64le

  13. Installation and configuration
      7.1 Before installation

      1. Shut down LSF on all workload manager (WLM) and launch node (LN) hosts.

      2. Back up the LSF configuration from the conf ($LSF_ENVDIR) directory and any scripts or binary files that you added to the LSF_SERVERDIR directory (for example, customized elims, esubs, stage-in pre-scripts, and stage-in post-scripts).

      3. Run rpm -qa | grep lsf to list the currently installed RPM packages.

      4. Use yum erase or rpm -evh to uninstall the existing LSF packages.
        rpm -ev --allmatches --notriggers 'list of package names as they appear in step 3'
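
The last two steps above can be combined so that the erase command is generated from the package listing instead of being typed by hand. The following is a minimal sketch under stated assumptions: the function name lsf_erase_command is hypothetical and not part of LSF, and the function only prints the command for review without running it. It uses the standard double-dash spelling of the rpm --notriggers option.

```shell
# Hypothetical helper (not an LSF command): read package names on stdin
# (for example, the output of `rpm -qa`), keep the names that contain
# "lsf", and print a single rpm erase command for review.
# Nothing is uninstalled by this function.
lsf_erase_command() {
    local pkgs
    # `|| true` keeps the pipeline usable when no package matches.
    pkgs=$({ grep lsf || true; } | sort | tr '\n' ' ')
    pkgs=${pkgs% }    # drop the trailing space left by tr
    if [ -z "$pkgs" ]; then
        echo "no LSF packages found" >&2
        return 1
    fi
    echo "rpm -ev --allmatches --notriggers $pkgs"
}
```

For example, rpm -qa | lsf_erase_command prints the command to run. Review the package list before running the printed command as root.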

      7.2 Installation steps

      1. Download the lsf-10.1.0.9-548727.ppc64le_csm.bin package.

        Run lsf-10.1.0.9-548727.ppc64le_csm.bin to extract the RPM files. Accept the license agreement when prompted to continue with the file extraction.
        lsf-common-10.1.0.9-548727.ppc64le.rpm
        lsf-master-10.1.0.9-548727.ppc64le.rpm
        lsf-misc-10.1.0.9-548727.ppc64le.rpm
        lsf-server-10.1.0.9-548727.ppc64le.rpm
        lsf-python2-api-1.0.6-10.1.0.9.ppc64le.rpm
        ibm_smpi-jsm-10.03.01.00rtm5-rh7_20191114.ppc64le.rpm

        Use the rpm -ivh or yum install command to deploy the common, server, and master RPM packages. The installation is relocatable (the --prefix option is supported).

        On the workload manager:
        rpm -ivh lsf-common-10.1.0.9-548727.ppc64le.rpm lsf-server-10.1.0.9-548727.ppc64le.rpm lsf-master-10.1.0.9-548727.ppc64le.rpm

        On the launch node:
        rpm -ivh lsf-common-10.1.0.9-548727.ppc64le.rpm lsf-server-10.1.0.9-548727.ppc64le.rpm


        Verify that the installation is successful:
        rpm -qa | grep lsf
        lsf-server-10.1.0.9-548727.ppc64le
        lsf-common-10.1.0.9-548727.ppc64le
        lsf-master-10.1.0.9-548727.ppc64le
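
The verification above can also be scripted so that a missing package is reported explicitly. The following is a minimal sketch; the function name lsf_check_packages is hypothetical and not part of LSF, and the build string is taken from the package names listed in this readme.

```shell
# Hypothetical helper (not an LSF command): read installed package names
# on stdin (for example, the output of `rpm -qa`) and confirm that each
# component named as an argument is present at the build from this readme.
lsf_check_packages() {
    local build="10.1.0.9-548727"
    local installed missing component
    missing=0
    installed=$(cat)
    for component in "$@"; do
        # Fixed-string match on "lsf-<component>-<build>."
        if ! printf '%s\n' "$installed" | grep -F "lsf-${component}-${build}." >/dev/null; then
            echo "missing: lsf-${component}-${build}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On the workload manager, a possible invocation is rpm -qa | lsf_check_packages common server master. On a launch node, where the master package is not installed, use rpm -qa | lsf_check_packages common server.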

      7.3 After installation

      1. Restore the previously backed-up LSF conf directory and any customized scripts or binaries in LSF_SERVERDIR.

      2. (Optional) Under LSF_TOP:
        Rename work.rpmsave to work

      3. Using bash, run the following command as root on each WLM and LN host:
        . /opt/ibm/spectrumcomputing/lsf/conf/profile.lsf
        or if you have used the --prefix to define your own LSF_TOP
        . /conf/profile.lsf

      4. Start up LSF
        lsf_daemons start
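
The last two steps above can be wrapped so that the same commands work for both the default location and a relocated installation. The following is a minimal sketch; the function names lsf_profile_path and start_lsf are hypothetical and not part of LSF, and the sketch assumes that a relocated installation keeps profile.lsf in the conf directory under the LSF_TOP that was chosen with --prefix.

```shell
# Hypothetical helper (not an LSF command): print the profile.lsf path
# for a given LSF_TOP, falling back to the default location named in
# this readme. Assumes the profile is under <LSF_TOP>/conf.
lsf_profile_path() {
    local top="${1:-/opt/ibm/spectrumcomputing/lsf}"
    echo "${top%/}/conf/profile.lsf"
}

# Hypothetical wrapper: source the profile, then start the LSF daemons.
# Run as root on each WLM and LN host.
start_lsf() {
    local profile
    profile=$(lsf_profile_path "${1:-}")
    if [ ! -r "$profile" ]; then
        echo "cannot read $profile" >&2
        return 1
    fi
    . "$profile"
    lsf_daemons start
}
```

With the default installation, start_lsf with no argument is enough. For a relocated installation, pass the LSF_TOP directory as the argument.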

      7.4 Uninstallation

      1. Shut down LSF on all WLM and LN hosts.
        lsf_daemons stop

      2. Back up the LSF conf directory and any customized scripts or binaries, as described in 7.1 Before installation.

      3. Use the yum erase or rpm -evh command to uninstall LSF, as described in 7.1 Before installation.
  14. List of files

    lsf-10.1.0.9-548727.ppc64le_csm.bin

    The contents of lsf-10.1.0.9-548727.ppc64le_csm.bin:

    lsf-common-10.1.0.9-548727.ppc64le.rpm
    lsf-master-10.1.0.9-548727.ppc64le.rpm
    lsf-misc-10.1.0.9-548727.ppc64le.rpm
    lsf-python2-api-1.0.6-10.1.0.9.ppc64le.rpm
    lsf-server-10.1.0.9-548727.ppc64le.rpm
    ibm_smpi-jsm-10.03.01.00rtm5-rh7_20191114.ppc64le.rpm

  15. Copyright and trademark information
      © Copyright IBM Corporation 2020

      U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

      IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml