IBM Platform LSF 9.1.3 Fix Pack 7 (424842) Readme File

Abstract

LSF Version 9.1.3 Fix Pack 7. This Fix Pack includes all fixed issues and solutions included in previous LSF Version 9.1.3 Fix Packs and addresses new issues fixed between 16 June 2016 and 30 September 2016. For detailed descriptions of the issues and solutions in this Fix Pack, refer to the LSF 9.1.3 Fix Pack 7 Fixed Bugs List (lsf9.1.3.7_fixed_bugs.pdf, which can be downloaded from Fix Central by using fix ID lsf-9.1.3.7-spk-2016-Sep-build424842).

Description

Readme documentation for IBM Platform LSF 9.1.3 Fix Pack 7 (424842), including installation-related instructions, prerequisites and co-requisites, and a list of fixes.

The new issues addressed in LSF Version 9.1.3 Fix Pack 7:

ID

Fixed Date

Description

P101928

2016/09/29

sbatchd daemon crashes on the execution host for MPI and blaunch jobs.

P101911

2016/09/27

After a failover occurs, the sbatchd on the secondary master host gives mbatchd a fixed time (90 seconds) to start before it treats mbatchd as not started and tries to start a new one. This fix extends the time to align with the first master host, and extends it further if the first startup attempt fails. In addition, instead of only starting a new mbatchd, LSF terminates the old mbatchd process before it starts the new one.

P101900

2016/09/27

If a job's rusage includes reserved and unreserved resources, and the reserved resources are not satisfied in the cluster, the job's pending reason is reported incorrectly.

P101891

2016/09/14

When parallel jobs are submitted across multiple nodes, jobs exit unexpectedly.

P101882

2016/10/12

When submitting an interactive job with the "-cwd" option, if the specified directory does not exist, the client-end shows the message "Terminated while pending". This causes confusion because the job has been dispatched to the execution host. This fix changes the message to "Terminated on execution host" and appends a prompt message.

P101876

2016/09/28

This fix modifies the preemption scheduling plugin, schmod_preemption.so, to reduce the time spent on job preemption condition evaluation. This fix reduces the time spent on the POSTPROC phase of the scheduling cycle when there are some running preemptable jobs.

P101870

2016/09/01

When using LSF's liblsbstream.so to parse an LSF stream file, parsing fails if "JOB_FORCE" type records are in the file.

P101865

2016/09/12

This fix allows LSF users to adjust the NICE value of each LSF daemon process on UNIX platforms.

P101859

2016/08/30

With JOB_CWD_TTL=0 configured, LSF does not delete a finished interactive job's temporary current working directory (CWD).

P101856

2016/08/18

Job environment variables LSB_JOBPIDS and LSB_JOBPGIDS are not set correctly when the parameters LSF_DAEMON_WRAP=Y and LSF_HPC_EXTENSIONS="CUMULATIVE_RUSAGE" are configured in lsf.conf.

P101852

2016/09/02

When cgroup is not enabled and a job process runs in a separate process group, the CPU time in the bjobs output is underreported after that process completes while another process of the job is still running.

P101844

2016/09/09

When submitting an interactive job with LSB_NUM_NIOS_CALLBACK_THREADS configured in lsf.conf, mbatchd logs the following error message to the mbatchd log: "niosCallback_ ... failed: Connection refused".

P101842

2016/09/12

During a scheduler session, hosts are selected as candidate hosts and pending reasons are set for them. However, subsequent cross-host parallel jobs still try to ALLOC on these candidate hosts in the same scheduler session, so the scheduler session takes a long time in the ALLOC phase. This fix decreases the amount of time needed for the ALLOC phase to improve scheduling efficiency.

P101841

2016/08/15

In the lsload output, many hosts appear to be in busy status by "r1m" load index. However, the "r1m" index value from "lsload -E" output does not exceed the threshold defined in the lsf.cluster file.

P101840

2016/08/17

When external scheduler plugins use the following LSF API in lssched.h, mbschd might core dump in a few special use cases:
extern int extsched_getJobInfo(INT_Job *job, struct jobInfo *info);

P101837

2016/08/17

After mbatchd parallel restart, there may be jobs in the RUN state that are not actually running, and they may remain in that status indefinitely.

P101836

2016/08/15

When the vemkd working file ($LSF_TOP/work//ego/vemkd/allocation/status) has been damaged by a program exception, a lack of hardware resources, or an invalid operation, egosc core dumps when EGO is started ("egosh ego start").

P101819

2016/08/04

Fix for the extsched API that is used to create plug-ins for the LSF scheduler. This fix ensures that the correct context is set during preemption scheduling. The scheduling context is obtained by the use of the extsched API function extsched_getCheckAllocContext() from within a plug-in's checkAlloc callback.

P101797

2016/07/22

Fix to resolve an issue where, with the bsub -L option, an environment variable is incorrectly kept and passed to the running job environment if its assigned value is the same as the name of an environment variable that should be kept.

P101791

2016/08/05

When using APIs to perform a continuous query, the query might fail if the float client has expired.

P101788

2016/08/17

Fix to give the res of a parallel blaunch job enough time to send SIGINT, SIGTERM, and SIGKILL signals to the job when the job loses its connection to the job res.

P101785

2016/07/20

Jobs with alternative license resource requirements involving multiple license resources cannot run if one rusage string is not satisfied and the other rusage string is satisfied.
For example, a job with an alternative resource requirement string such as "A || B,C" cannot run if "A" is not satisfied but "B,C" is satisfied in the cluster for the job.

P101781

2016/08/08

When enabling the LSF_PROCESS_TRACKING and LSF_LINUX_CGROUP_ACCT parameters in the lsf.conf file, the parent sbatchd cannot clean up any obsolete job cgroup directories.

P101780

2016/07/29

The leased-in cluster defines a host group that uses an asterisk as a group member to present the leased-in hosts, and the submitted jobs specifying the defined host group are forwarded to the leased-out cluster.

P101776

2016/07/11

lim experiences memory leaks when using the GPU feature.

P101774

2016/07/07

After enabling LSB_LIMIT_CACHE in lsf.conf, jobs that use license project mode resources are always pending even when the resource requirement is satisfied in the cluster.

P101757

2016/06/17

When a job submitted with the energy aware scheduling feature finishes, the attribute count has the wrong value for the JOB_FINISH2 record in the lsb.stream file. As a result, some fields cannot be read correctly.

P101619

2016/07/01

When a job's pre-execution script fails repeatedly, the recorded run time in the stream file is not correct. This fix enables LSF to correctly record the run time by including a job's failed pre-execution run time.

124222

2016/09/29

When using "bmod -g /group ", the command reports the error "Cannot combine modify job group or service class option with others. Job not modified."

108691

2016/07/04

Job submission fails from a float client after restarting LSF in the submission cluster.
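
For P101856 in the list above, one way to see whether a host has both parameters set is to read them from lsf.conf. The following is a minimal sketch and not part of the fix: the helper name lsf_conf_get is hypothetical, and it assumes that lsf.conf holds one KEY=VALUE assignment per line and that LSF_ENVDIR points to the directory that contains lsf.conf.

```shell
# Hypothetical helper: print the value of one parameter from an lsf.conf-style file.
# Assumes one KEY=VALUE per line; commented-out lines (leading '#') do not match.
lsf_conf_get() {
    key=$1
    file=$2
    # Use the last matching assignment and strip surrounding double quotes.
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Report the two parameters named in P101856 if lsf.conf is readable.
conf="${LSF_ENVDIR:-/nonexistent}/lsf.conf"
if [ -r "$conf" ]; then
    echo "LSF_DAEMON_WRAP=$(lsf_conf_get LSF_DAEMON_WRAP "$conf")"
    echo "LSF_HPC_EXTENSIONS=$(lsf_conf_get LSF_HPC_EXTENSIONS "$conf")"
else
    echo "lsf.conf not found; set your LSF environment first"
fi
```

The same helper can be pointed at other lsf.conf parameters mentioned in this list, such as LSB_NUM_NIOS_CALLBACK_THREADS or LSB_LIMIT_CACHE.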


The new solutions in LSF Version 9.1.3 Fix Pack 7:

ID

Fixed Date

Description

RFE#88294

2016/07/26

Enhance the LSF external scheduler plugin API. You can use the LSF external scheduler plugin API to customize existing scheduling policies or implement new ones that can operate with existing LSF scheduler plugin modules.

P101924

2016/10/15

This enhancement allows selected queues to ignore the RETAIN and DURATION settings in a guarantee host policy that enables LOAN_POLICIES. If you precede a queue name with '!' in the LOAN_POLICIES line, the RETAIN and DURATION policies are ignored for that queue when LSF decides whether a job in the queue can borrow unused guaranteed resources.
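
To review which queues carry the '!' prefix described for P101924, a short shell helper can list them. This is a sketch under stated assumptions: the function name loan_exempt_queues is hypothetical, the sample line in the comment uses made-up queue names and values, and it assumes that the '!' is written directly before the queue name inside QUEUES[...] on the LOAN_POLICIES line.

```shell
# Hypothetical helper: list queues marked with '!' on LOAN_POLICIES lines.
# Example of the assumed form (queue names and values are illustrative only):
#   LOAN_POLICIES = QUEUES[!short normal] RETAIN[2] DURATION[10]
loan_exempt_queues() {
    grep -E '^[[:space:]]*LOAN_POLICIES' "$1" |
        sed -n 's/.*QUEUES\[\([^]]*\)\].*/\1/p' |   # keep the text inside QUEUES[...]
        tr ' ' '\n' |                               # one queue name per line
        sed -n 's/^!//p'                            # print only names that had '!'
}
```

Called with the path to the configuration file that holds the guarantee policy, the helper prints one exempted queue name per line, or nothing if no queue is marked.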


The fixed issues and solutions included in previous LSF Version 9.1.3 Fix Packs can be found in lsf9.1.3.7_fixed_bugs.pdf.

Readme file for: IBM® Platform LSF

Product/Component Release: 9.1.3

Update Name: Fix 424842

Fix ID: lsf-9.1.3.7-spk-2016-Sep-build424842

Publication date: 17 November 2016

Last modified date: 17 November 2016

Contents:

1.     List of fixes

2.     Download location

3.     Products or components affected

4.     System requirements

5.     Installation and configuration

6.     List of files

7.     Product notifications

8.     Copyright and trademark information

 

1.   List of fixes

P101928, P101924, P101911, P101900, P101891, P101882, P101876, P101870, P101865, P101859, P101856,
P101852, P101844, P101842, P101841, P101840, P101837, P101836, P101819, P101797, P101791, P101788,
P101785, P101781, P101780, P101776, P101774, P101757, P101619, 124222 (No APAR), 108691 (No APAR),
RFE#88294

2.   Download location

Download Fix 424842 from the following location: http://www.ibm.com/eserver/support/fixes/

3.   Products or components affected

Components affected by the new issues addressed in LSF Version 9.1.3 Fix Pack 7 include:
LSF/nios
LSF/lim
LSF/pim
LSF/res
LSF/sbatchd
LSF/mbatchd
LSF/lsadmin
LSF/badmin
LSF/blaunch
LSF/lsbatch.h
LSF/lsf.h
LSF/libbat.a
LSF/liblsf.a
LSF/libbat.so
LSF/liblsf.so
LSF/mbschd
LSF/liblsbstream.so
LSF/egosc
LSF/lssched.h
LSF/bsub
LSF/bmod
LSF/cal_jobweight.so
LSF/schmod_default.so
LSF/schmod_parallel.so
LSF/schmod_preemption.so
LSF/schmod_reserve.so
LSF/brestart (Only on Linux2.6-glibc2.3-x86_64)
LSF/mesub (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_advrsv.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_affinity.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_aps.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_bluegene.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_cpuset.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_craylinux.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_crayx1.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_dc.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_demand.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_dist.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_fairshare.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_fcfs.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_jobweight.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_limit.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_mc.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_ps.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_pset.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_rms.so (Only on Linux2.6-glibc2.3-x86_64)
LSF/schmod_xl.so (Only on Linux2.6-glibc2.3-x86_64)

 

4.   System requirements

Linux2.6-glibc2.3-x86_64
Linux3.10-glibc2.17-ppc64le

 

5.   Installation and configuration

 

5.1          Before installation

 

 LSF_TOP=Full path to the top-level installation directory of LSF.

1)    Log on to the LSF master host as root

2)    Set your environment:

-      For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-      For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
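
The environment step above can be wrapped in a small check for sh, ksh, or bash. This is a sketch only: the function name load_lsf_env is hypothetical and /opt/lsf is a placeholder for your own LSF_TOP. It relies on profile.lsf setting LSF_ENVDIR, which the installation steps use later.

```shell
# Hypothetical helper: source profile.lsf from the given LSF top-level directory.
load_lsf_env() {
    top=$1
    if [ -r "$top/conf/profile.lsf" ]; then
        . "$top/conf/profile.lsf"
    else
        echo "profile.lsf not found under $top/conf" >&2
        return 1
    fi
}

# /opt/lsf is only a placeholder; set LSF_TOP to your installation directory.
if load_lsf_env "${LSF_TOP:-/opt/lsf}"; then
    echo "LSF_ENVDIR=$LSF_ENVDIR"
fi
```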

 

5.2          Installation steps

 

1)    Go to the patch install directory: cd $LSF_ENVDIR/../9.1/install/

2)    Copy the patch file to the install directory $LSF_ENVDIR/../9.1/install/

3)    Run
badmin hclose all
badmin qinact all

4)    Run patchinstall: ./patchinstall <patch>
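
The installation steps above can be rehearsed with a dry-run wrapper before they are run for real. This is a sketch, not an IBM-supplied script: the names run and apply_patch are hypothetical, and the patch file path is whatever you downloaded. By default the wrapper only prints each command; set DRY_RUN=0 on the LSF master host to execute them.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical wrapper for section 5.2; $1 is the path to the downloaded patch file.
apply_patch() {
    patch_file=$1
    install_dir="$LSF_ENVDIR/../9.1/install"
    # Copy before changing directory so that a relative patch path still resolves.
    run cp "$patch_file" "$install_dir/" || return 1
    run cd "$install_dir" || return 1
    # Close all hosts and inactivate all queues before patching.
    run badmin hclose all || return 1
    run badmin qinact all || return 1
    run ./patchinstall "$(basename "$patch_file")"
}
```

For example, calling apply_patch with the path to the patch file and DRY_RUN unset lists the five commands in order without changing the cluster.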

 

5.3          After installation

 

1)    Run
badmin hshutdown all
lsadmin resshutdown all
lsadmin limshutdown all

2)    Run
lsadmin limstartup all
lsadmin resstartup all
badmin hstartup all

3)    Run
badmin hopen all
badmin qact all
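
The post-installation sequence above is order-sensitive: daemons are shut down as sbatchd, res, then lim, and started in the reverse order before hosts and queues are reopened. A minimal sketch of that ordering follows; the names run and restart_lsf_daemons are hypothetical, and the wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "+ $*"; fi
}

# Hypothetical wrapper for section 5.3.
restart_lsf_daemons() {
    # Shut down sbatchd, then res, then lim on all hosts.
    run badmin hshutdown all || return 1
    run lsadmin resshutdown all || return 1
    run lsadmin limshutdown all || return 1
    # Start up in the reverse order: lim, then res, then sbatchd.
    run lsadmin limstartup all || return 1
    run lsadmin resstartup all || return 1
    run badmin hstartup all || return 1
    # Reopen hosts and reactivate queues.
    run badmin hopen all || return 1
    run badmin qact all
}
```

Stopping at the first failed command (the "|| return 1" after each step) keeps a partially shut down cluster from being reopened by mistake.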

 

5.4          Uninstallation

 

To roll back a patch:

1)    Log on to the LSF master host as root

2)    Set your environment:

-      For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-      For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

3)    Run
badmin hclose all
badmin qinact all

4)    Run ./patchinstall -r <patch>

5)    Run
badmin hshutdown all
lsadmin resshutdown all
lsadmin limshutdown all

6)    Run
lsadmin limstartup all
lsadmin resstartup all
badmin hstartup all

7)    Run
badmin hopen all
badmin qact all

6.   List of files in package

 

filelist.txt
fixlist.txt
include/
include/lsf/
include/lsf/lsbatch.h
include/lsf/lsf.h
include/lsf/lssched.h
linux2.6-glibc2.3-x86_64/
linux2.6-glibc2.3-x86_64/bin/
linux2.6-glibc2.3-x86_64/bin/bacct
linux2.6-glibc2.3-x86_64/bin/badmin
linux2.6-glibc2.3-x86_64/bin/bapp
linux2.6-glibc2.3-x86_64/bin/bgpinfo
linux2.6-glibc2.3-x86_64/bin/bhist
linux2.6-glibc2.3-x86_64/bin/bhosts
linux2.6-glibc2.3-x86_64/bin/bjobs
linux2.6-glibc2.3-x86_64/bin/bkill
linux2.6-glibc2.3-x86_64/bin/blaunch
linux2.6-glibc2.3-x86_64/bin/blimits
linux2.6-glibc2.3-x86_64/bin/bmgroup
linux2.6-glibc2.3-x86_64/bin/bmod
linux2.6-glibc2.3-x86_64/bin/bparams
linux2.6-glibc2.3-x86_64/bin/bpeek
linux2.6-glibc2.3-x86_64/bin/bqueues
linux2.6-glibc2.3-x86_64/bin/bresize
linux2.6-glibc2.3-x86_64/bin/bresources (Only on Linux2.6-glibc2.3-x86_64)
linux2.6-glibc2.3-x86_64/bin/brestart
linux2.6-glibc2.3-x86_64/bin/bsub
linux2.6-glibc2.3-x86_64/bin/bswitch
linux2.6-glibc2.3-x86_64/bin/lsadmin
linux2.6-glibc2.3-x86_64/bin/lsgrun
linux2.6-glibc2.3-x86_64/bin/lshosts
linux2.6-glibc2.3-x86_64/bin/lsmakerm
linux2.6-glibc2.3-x86_64/bin/lsrun
linux2.6-glibc2.3-x86_64/bin/pam
linux2.6-glibc2.3-x86_64/lib/
linux2.6-glibc2.3-x86_64/lib/cal_jobweight.so
linux2.6-glibc2.3-x86_64/lib/libbat.a
linux2.6-glibc2.3-x86_64/lib/libbat.so
linux2.6-glibc2.3-x86_64/lib/liblsbstream.so
linux2.6-glibc2.3-x86_64/lib/liblsf.a
linux2.6-glibc2.3-x86_64/lib/liblsf.so
linux2.6-glibc2.3-x86_64/lib/libptmalloc3.so (Only on Linux2.6-glibc2.3-x86_64)
linux2.6-glibc2.3-x86_64/lib/schmod_advrsv.so
linux2.6-glibc2.3-x86_64/lib/schmod_affinity.so
linux2.6-glibc2.3-x86_64/lib/schmod_aps.so
linux2.6-glibc2.3-x86_64/lib/schmod_bluegene.so
linux2.6-glibc2.3-x86_64/lib/schmod_cpuset.so
linux2.6-glibc2.3-x86_64/lib/schmod_craylinux.so
linux2.6-glibc2.3-x86_64/lib/schmod_crayx1.so
linux2.6-glibc2.3-x86_64/lib/schmod_dc.so
linux2.6-glibc2.3-x86_64/lib/schmod_default.so
linux2.6-glibc2.3-x86_64/lib/schmod_demand.so (Only on Linux2.6-glibc2.3-x86_64)
linux2.6-glibc2.3-x86_64/lib/schmod_dist.so
linux2.6-glibc2.3-x86_64/lib/schmod_fairshare.so
linux2.6-glibc2.3-x86_64/lib/schmod_fcfs.so
linux2.6-glibc2.3-x86_64/lib/schmod_jobweight.so
linux2.6-glibc2.3-x86_64/lib/schmod_limit.so
linux2.6-glibc2.3-x86_64/lib/schmod_mc.so
linux2.6-glibc2.3-x86_64/lib/schmod_parallel.so
linux2.6-glibc2.3-x86_64/lib/schmod_preemption.so
linux2.6-glibc2.3-x86_64/lib/schmod_ps.so
linux2.6-glibc2.3-x86_64/lib/schmod_pset.so
linux2.6-glibc2.3-x86_64/lib/schmod_reserve.so
linux2.6-glibc2.3-x86_64/lib/schmod_rms.so
linux2.6-glibc2.3-x86_64/lib/schmod_xl.so
linux2.6-glibc2.3-x86_64/etc/
linux2.6-glibc2.3-x86_64/etc/daemons.wrap
linux2.6-glibc2.3-x86_64/etc/eauth
linux2.6-glibc2.3-x86_64/etc/ebrokerd
linux2.6-glibc2.3-x86_64/etc/egosc
linux2.6-glibc2.3-x86_64/etc/elim.gpu
linux2.6-glibc2.3-x86_64/etc/elim.hpc (Only on linux2.6-glibc2.3-x86_64)
linux2.6-glibc2.3-x86_64/etc/erestart
linux2.6-glibc2.3-x86_64/etc/gpolicyd
linux2.6-glibc2.3-x86_64/etc/krbrenewd
linux2.6-glibc2.3-x86_64/etc/lim
linux2.6-glibc2.3-x86_64/etc/mbatchd
linux2.6-glibc2.3-x86_64/etc/mbschd
linux2.6-glibc2.3-x86_64/etc/melim
linux2.6-glibc2.3-x86_64/etc/mesub
linux2.6-glibc2.3-x86_64/etc/nios
linux2.6-glibc2.3-x86_64/etc/pim
linux2.6-glibc2.3-x86_64/etc/res
linux2.6-glibc2.3-x86_64/etc/sbatchd
linux2.6-glibc2.3-x86_64/etc/vemkd
misc/
misc/examples/
misc/examples/elim.gpu.ext/
misc/examples/elim.gpu.ext/Makefile
misc/examples/elim.gpu.ext/README
misc/examples/elim.gpu.ext/elim.gpu.ext
misc/examples/elim.gpu.ext/elim.gpu.ext.c
misc/examples/elim.gpu.ext/elim.gpu.topology
misc/examples/elim.gpu.ext/elim.gpu.topology.c
misc/examples/elim.gpu.ext/hwloc_nvml.so
misc/examples/external_plugin/
misc/examples/external_plugin/allocexample.c
misc/examples/external_plugin/matchexample.c
packagedef.txt

 

7.   Product notifications

To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.


8.   Copyright and trademark information

© Copyright IBM Corporation 2016

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.