IBM Spectrum LSF 10.1 Fix Pack 3 (466165) Readme File
Abstract
LSF Version 10.1 Fix Pack 3 for linux3.12-glibc2.17-armv8. This Fix Pack includes the issues and solutions resolved between 16 February 2017 and 06 July 2017. For detailed descriptions of the issues and solutions in this Fix Pack, refer to the LSF 10.1 Fix Pack 3 Fixed Bugs List (lsf10.1.0.3_fixed_bugs.pdf), which can be downloaded from Fix Central by using fix ID lsf-10.1.0.3-spk-2017-Jul-build466165.
Description
Readme documentation for IBM Spectrum LSF 10.1 Fix Pack 3 (466165) including installation-related instructions, prerequisites and co-requisites, and list of fixes.
The new issues addressed in LSF Version 10.1 Fix Pack 3:
ID Fixed Date Description
P102288
2017/06/30
This fix addresses an issue with parallel or multiple host jobs. When the bresize command is used to release one host's slots, LSF might abort the tasks on other hosts because of improper signals from the job res.
P102282
2017/06/05
This fix allows jobs to be scheduled even if the LSF_LOGDIR parameter is not defined in the lsf.conf file.
P102278
2017/07/06
This fix addresses an issue where if the load schedule and stop policy has been configured on hosts (to control swap resources), running jobs on those hosts would incorrectly enter SSUSP status after executing lsadmin reconfig.
P102277
2017/07/06
This fix addresses an issue when using the bsub -pack option. If you use an esub to modify the host options, the bsub command core dumps.
P102273
2017/06/29
This fix addresses an issue where if the job submission has compound resource requirements that include compute units, LSF might over-schedule jobs that exceed the specified limits definition.
P102272
2017/06/29
This fix addresses an issue where if the job's dependencies contain the wildcard character (*) on both sides of the job name (for example, "*
P102267
2017/06/30
This fix addresses an issue where if a user runs an invalid bmod command where the argument for an option is a multi-line command, the content of the command is recorded literally in the lsb.events file, causing the bhist command to report a "Bad event format" error for each line of that record.
P102266
2017/06/23
This fix addresses an issue where if JOB_INCLUDE_POSTPROC=Y is set, a child job will be started when the brequeue command is used for the parent job in the post-execution phase.
P102253
2017/06/17
This fix addresses an issue where if the LSF_DAEMON_WRAP parameter is enabled in the lsf.conf file and the DJOB_ENV_SCRIPT parameter is not configured in the lsb.applications file, the user-defined script is incorrectly invoked as if the DJOB_ENV_SCRIPT parameter is defined.
P102252
2017/06/13
This fix addresses the following issue:
The environment variable DAEMON_WRAP_ENABLE_BPOST is added to control the bpost call in daemons.wrap. Set DAEMON_WRAP_ENABLE_BPOST=y to enable a bpost call when cleartool setview fails.
P102241
2017/06/21
This fix addresses an issue that occurs in multicluster environments where the version of the submission cluster is lower than LSF 10.1 and the version of the execution cluster is LSF 10.1.
P102236
2017/06/06
This fix addresses an issue where the bhist command shows the incorrect pending time when there are two exit JOB_STATUS events in the lsb.events file.
P102234
2017/06/21
This fix addresses an issue where if the scheduler binary file is built with compiler optimization, the child scheduler might crash.
This is because the child scheduler uses a function that might cause a core dump under a specific case, but the child scheduler normally
checks for this case before running the function to prevent the core dump. The compiler optimization removes this check, which causes
the child scheduler to run the function regardless of the circumstances and potentially cause a core dump. This fix restores the
check to prevent the function from running.
P102231
2017/06/14
Prior to this patch MBD didn't distinguish between expected and unexpected error cases during recovery of the jobinfo cache. This patch will properly distinguish these cases and treat unexpected errors as fatal errors. When a fatal error occurs, MBD will exit. Cache recovery will be retried when SBD restarts MBD. This patch also improves the logging of file system access errors during recovery of the lsb.jobinfo.events file.
P102230
2017/06/05
This fix addresses the following issue:
A pending interactive job cannot be modified by the bmod command if automatic job rerun is enabled in the queue (that is, the RERUNNABLE parameter is set to "yes" in the lsb.queues file).
P102229
2017/06/05
This fix addresses an issue where a pending interactive job cannot be modified by the bmod command if automatic job rerun is enabled in the queue (that is, the RERUNNABLE parameter is set to "yes" in the lsb.queues file).
P102224
2017/05/18
In the lsb.resources file, when the length of a resource limit name is 40 characters, its limit usage cannot be shown with the blimits command.
P102208
2017/05/12
This fix addresses the following issue:
When the "PREEMPT_JOBTYPE" parameter is configured as "EXCLUSIVE" in the lsb.params file, the mbschd daemon crashes when resuming a suspended cross-host preemptive parallel job.
P102199
2017/05/02
This fix addresses the issue where a migrated job has an incorrect runtime after mbatchd restart, caused when part of the job event is switched.
P102198
2017/05/02
This fix addresses the issue where if a host group contains hosts that are unknown, some good status hosts might be excluded from this host group after the mbatchd daemon restarts.
P102196
2017/04/26
This fix addresses the issue where if an LSF host group contains some hosts that are not static servers, some good status hosts might be excluded from this host group after a reconfiguration. In addition, this might result in the host group containing no hosts, which results in the entire host group being unavailable.
P102188
2017/04/04
This fix addresses the issue where jobs with run limits at the queue and application level fail to submit if the run limits at the queue and application level are both identical and the ABS_RUNLIMIT parameter is set to Y in the lsb.params file.
P102186
2017/04/21
This fix addresses the issue where if LSF is rotating event files, bhist might not be able to display the complete job history because of a race condition.
P102178
2017/04/04
This fix addresses the issue where if a parallel restart happens at the same time as the lsb.jobinfo.events file is rewritten, records in that event file might be corrupted. This fix prevents the corruption by properly synchronizing the two operations.
P102171
2017/04/10
This fix addresses the issue where ENABLE_HOST_INTERSECTION is defined as Y in the lsb.params file and jobs are submitted with a specified host list to a queue and some specified hosts are removed from the queue's host list later.
P102170
2017/04/13
This fix addresses the issue where if LSB_KRB_TGT_FWD is set to Y in the lsf.conf file and JOB_INFO_MEMORY_CACHE_SIZE is set to a non-zero value in the lsb.params file, the job's Kerberos TGT file in the LSB_JOBINFO_DIR directory cannot be deleted.
P102169
2017/03/22
This fix addresses the issue where the liblsbstream.a and libfairshareadjust.a static libraries are missing from the LSF lib directory.
P102158
2017/04/11
This fix addresses the issue where rerunnable jobs get requeued when execution hosts are unavailable. This means that IBM Spectrum LSF Analytics does not correctly calculate the job pending time.
P102152
2017/03/30
This fix addresses the issue where after running the badmin mbdrestart command, if the mbatchd daemon exits abnormally when starting up, the sbatchd daemon will wait for a very long time before trying to restart mbatchd again.
P102147
2017/03/24
This fix addresses the issue where if cgroup is enabled, after JOB_POSTPROC_TIMEOUT is expired, LSF does not kill all processes launched by post-exec, but cgroup is still tracking those processes.
P102141
2017/03/22
This fix addresses the issue where the bmgroup command does not display host groups with names that contain the text "others".
P102139
2017/03/24
This fix addresses the issue where jobs are not being scheduled in the specified time window even when the bqueues command shows that the queue is open and active. This issue happens only when the RUN_WINDOW parameter in the lsb.queues file is configured to always be open.
P102136
2017/03/17
This fix addresses the issue where the RUNLIMIT cannot be enforced for jobs with pre-execution functions if they were submitted to an LSF cluster that only has Fix Packs applied that are older than 390354 and the pre-execution functions finish before applying a newer Fix Pack (and restarting sbatchd to apply the changes).
P102119
2017/03/10
This fix addresses the issue where if the LSF_COLLECT_ENERGY_USAGE parameter is configured as Y in the lsf.conf file, the sbatchd daemon that is running on the host is unresponsive after multiple jobs are dispatched to the same host. When attempting to restart sbatchd on the host with the "badmin hrestart" command, the command fails with one of the following error messages:
P102115
2017/03/07
This fix addresses the issue where if a user defines the JOB_SPOOL_DIR parameter in the lsb.params file with variable substitutions, the bpeek command cannot read the job output.
P102105
2017/03/10
This fix addresses the issue where if bsub is rejected on non-LSF hosts, bsub does not call the epsub file.
P102103
2017/03/02
This fix addresses the issue where if bpeek is run in a pseudo terminal that is started by an interactive job, the error "Job
P102100
2017/02/17
This fix addresses the issue where if the file attribute of lsb.lease.state has been changed and the cluster manager cannot open it, the mbatchd daemon cannot start in an LSF multicluster environment.
157354
2017/07/04
This fix allows users with a restrictive user mask (0077, for instance) to start a Docker-based LSF job successfully.
156718
2017/07/06
This fix addresses the issue where LSF uses submission environment variables to overwrite environment variables in the Docker container. Users need special environment variables (like LD_LIBRARY_PATH in the image). This solution merges PATH, LD_LIBRARY_PATH, and LD_PRELOAD from the job environment with the image settings.
156008
2017/06/30
This fix prevents mbschd from generating large numbers of core files when using the LSF Express version entitlement file.
155022
2017/07/06
This fix addresses the issue where MPS fails to start in a multi-host job if LSF_TMPDIR points to a shared file system.
152237
2017/05/15
This fix allows the sbatchd daemon to correctly log cgroup-related error messages.
151340
2017/05/09
This fix addresses the issue where if you create an advance reservation and submit an exclusive job (without a RUN LIMIT or the job cannot finish before the current end time of the advance reservation) to the advance reservation after it becomes active, the brsvmod command cannot extend the advance reservation when the exclusive job is running.
149041
2017/05/23
This fix addresses the issue where fixed Parallel Environment (PE) jobs remain pending when a user disables the fair share plugin in all LSF queues.
148457
2017/04/20
This fix prevents the mbatchd daemon from corrupting files in the directory LSB_LOCALDIR/LSB_SHAREDIR when LSB_LOCALDIR and LSB_SHAREDIR are set as the same value.
147039
2017/04/17
This fix allows the mbatchd daemon to release the memory resource usage on leased-in hosts.
141493
2017/02/17
This fix prevents the mbatchd daemon from crashing when LSF_LOGDIR and DIAGNOSE_LOGDIR are not configured.
The new solutions in LSF Version 10.1 Fix Pack 3:
ID Fixed Date Description
152022 2017/06/29
The new bresize request subcommand option allows you to request additional tasks to be allocated to a running resizable job, which grows the resizable job. This means that you can both grow and shrink a resizable job by using the bresize command.
RFE#98501 2017/06/21
This solution allows the end user to submit jobs with remote hosts using the "-m remote_host@remote_cluster ..." option to send-jobs queues in the job forwarding model when using the LSF multicluster capability.
139905 2017/06/15
LSF provides a feature to log profiling information for the mbatchd and mbschd daemons to track the time that the daemons spend on key functions. This can assist IBM Support with diagnosing daemon performance problems.
RFE#101255 2017/06/09
This solution provides a way to directly get a list of jobs using a particular reservation ID by using the new bjobs -U option.
RFE#92852 2017/06/07
This enhancement increases the maximum length of the bsub -u email address from 63 to 511 characters.
RFE#75418 2017/05/26
This fix enables the bpeek -f command to exit when the peeked job is completed.
RFE#94879 2017/05/15
This enhancement modifies the MAX_PEND_JOBS parameter to limit the maximum number of pending jobs. This enhancement also adds the new MAX_PEND_SLOTS parameter to replace the current MAX_PEND_JOBS parameter. Both of these parameters provide ways to protect cluster service globally across the cluster or at the user level.
RFE#79255 2017/05/15
This solution improves the performance of daemons.wrap by not checking the ClearCase view, and reports the failure reason of "cleartool setview" called by daemons.wrap in bjobs -l, bhist -l, bstatus, and bread. This solution also adds the process ID (PID) to daemons.wrap.log.
146129 2017/05/10
This enhancement enables users to create and schedule advance reservations in the same way as a job. Once the reservation is active, the jobs submitted to the reservation can run within it.
146128 2017/04/20
This solution allows you to configure a script in the options field of the CONTAINER parameter in the lsb.applications file. Before the container job runs, LSF first runs the script with LSF administrator privileges. While the script is running, all of the job's environment variables are passed to the script at run time. When the script finishes running, its output is used in the container startup options.
P102156 2017/04/20
This enhancement adds new functionalities to LSF:
RFE#98823 2017/04/13
For jobs that are pending because there are not enough licenses available, the bjobs -p, -p1, -p2, and -p3 options only show which licenses do not have enough available, and do not display the project or cluster to which the licenses belong. This might be confusing because this can show that the license server has free licenses, but jobs cannot run, without showing that the project or cluster has limits on the licenses that are available.
Readme file for:
IBM® Spectrum LSF
Product/Component Release:
10.1
Update Name:
Fix 466165
Fix ID:
lsf-10.1.0.3-spk-2017-Jul-build466165
Publication date:
12 September 2017
Last modified date:
12 September 2017
Contents:
1. List of fixes
2. Download location
3. Products or components affected
4. System requirements
5. Installation and configuration
6. List of files
7. Product notifications
8. Copyright and trademark information
1. List of fixes
P102288,P102282,P102278,P102277,P102273,P102272,P102267,P102266,P102253,P102252,P102241,P102236,
2. Download location
Download Fix 466165 from the following location: http://www.ibm.com/eserver/support/fixes/
3. Products or components affected
Components affected by the new issues addressed in LSF Version 10.1 Fix Pack 3 include:
4. System requirements
linux3.12-glibc2.17-armv8
5. Installation and configuration
5.1 Before installation
LSF_TOP=Full path to the top-level installation directory of LSF.
1) Log on to the LSF master host as root
2) Set your environment:
- For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
5.2 Installation steps
1) Go to the patch install directory: cd $LSF_ENVDIR/../10.1/install/
2) Copy the patch file to the install directory $LSF_ENVDIR/../10.1/install/
3) Run
badmin hclose all
badmin qinact all
badmin hshutdown all
lsadmin resshutdown all
lsadmin limshutdown all
4) Run patchinstall: ./patchinstall <patch>
5.3 After installation
1) Run
lsadmin limstartup all
lsadmin resstartup all
badmin hstartup all
2) Run
badmin hopen all
3) Run
badmin qact all
5.4 Uninstallation
To roll back a patch:
1) Log on to the LSF master host as root
2) Set your environment:
- For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf
- For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf
3) Run
badmin hclose all
badmin qinact all
badmin hshutdown all
lsadmin resshutdown all
lsadmin limshutdown all
4) Run ./patchinstall -r <patch>
5) Run
lsadmin limstartup all
lsadmin resstartup all
badmin hstartup all
6) Run
badmin hopen all
7) Run
badmin qact all
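The install and rollback sequences in section 5 can be sketched as a small dry-run helper that only builds the ordered command list and runs nothing. This is an illustration, not an IBM-provided tool: the function name patch_commands is hypothetical, and placing the five shutdown commands before patchinstall and the five startup commands after it is an assumption based on the order in which the commands appear in this readme.

```python
# Daemon-control commands, in the order they are listed in this readme.
SHUTDOWN = [
    "badmin hclose all",
    "badmin qinact all",
    "badmin hshutdown all",
    "lsadmin resshutdown all",
    "lsadmin limshutdown all",
]
STARTUP = [
    "lsadmin limstartup all",
    "lsadmin resstartup all",
    "badmin hstartup all",
    "badmin hopen all",
    "badmin qact all",
]


def patch_commands(patch, rollback=False):
    """Return the ordered commands for installing or rolling back a patch.

    Assumed grouping: shut everything down, run patchinstall (with -r for a
    rollback), then start everything up again. Nothing is executed here.
    """
    flag = "-r " if rollback else ""
    return SHUTDOWN + [f"./patchinstall {flag}{patch}"] + STARTUP
```

Printing each element of patch_commands("<patch>") gives a checklist to review before running the commands by hand as root on the LSF master host.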
6. List of files in package
filelist.txt
7. Product notifications
To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notification about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.
8. Copyright and trademark information
© Copyright IBM Corporation 2017
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
This fix prevents a child job from being triggered in this instance.
When the runtime resource usage limit of a forwarded job is modified in the submission cluster and the mbatchd daemon of the submission cluster can recognize the name of the execution host, the execution host enters an unreach status.
In addition, this might result in the host group containing no hosts, which results in the entire host group being unavailable.
After mbatchd is reconfigured and the jobs are requeued, they are incorrectly dispatched to the removed hosts.
This fix adds a job signal event before the job finish event in lsb.stream, allowing Analytics to correctly calculate pending time for these jobs.
A new parameter, LOG_JOB_SIGNAL_FOR_HOST_UNAVAIL, is added to lsb.params to control whether to log an additional job signal event.
Syntax: LOG_JOB_SIGNAL_FOR_HOST_UNAVAIL = Y|y|N|n
Description: If enabled (set to Y|y), when slave hosts become unavailable, LSF logs a job signal event before the first job finish event in the lsb.stream file for rerunnable jobs that run on those slave hosts.
Default: N
For example, if you set the RUN_WINDOW parameter as follows:
RUN_WINDOW = 0:0:0-4:09:00 4:13:00-5:13:00 5:14:00-1:13:00 01:14:00-0:0:0
The bqueues command shows that the queue is still open and active, but jobs in the corresponding queue cannot be scheduled and show the pending reason "New job is waiting for scheduling;"
This is because the logic for calculating the run window close time in LSF does not correctly handle the case where the configured run window is always open.
1. "Host control failed: Failed in an LSF library call: Failed in sending/receiving a message: Connection reset by peer"
2. "Host control failed: Failed in an LSF library call: Communication time out"
After applying this fix, the bpeek command will support the following user defined variables to be used in JOB_SPOOL_DIR:
%U: username
%H: (first) execution host name
%P: project name
%JG: job group name
%C: execution cluster name
NOTE: This fix is not supported on Windows operating systems.
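The variable substitution listed above can be pictured with a short sketch. This is not how LSF implements it. It only shows one way to expand the five tokens in a JOB_SPOOL_DIR-style template. The function name expand_spool_dir, the sample path, and the choice to leave unresolved tokens untouched are all assumptions made for illustration.

```python
import re

# The five placeholders listed in this readme. "JG" is placed first in the
# alternation so that the two-letter token is matched as a whole.
_TOKEN = re.compile(r"%(JG|U|H|P|C)")


def expand_spool_dir(template, values):
    """Expand %U, %H, %P, %JG and %C in a JOB_SPOOL_DIR-style template.

    `values` maps a token name without the leading '%' to its replacement.
    A token with no entry in `values` is left in the result unchanged.
    """
    return _TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```

For example, expand_spool_dir("/scratch/%U/%P/%JG", {"U": "alice", "P": "chipA", "JG": "nightly"}) yields a per-user, per-project path built from the hypothetical values supplied.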
The merge rule places the value from the user (job) environment first, followed by the value from the image:
PATH=job_PATH:image_PATH
LD_LIBRARY_PATH=job_LD_LIBRARY_PATH:image_LD_LIBRARY_PATH
LD_PRELOAD=job_LD_PRELOAD:image_LD_PRELOAD
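The merge rule above can be expressed as a few lines of code. This is a minimal sketch of the rule as stated in this readme, not LSF's implementation. The function name merge_search_paths is hypothetical, and skipping a side that is unset or empty (rather than emitting a dangling colon) is an assumption. Only the three variables named in the readme are handled.

```python
# The variables that this readme says are merged rather than overwritten.
MERGED_VARS = ("PATH", "LD_LIBRARY_PATH", "LD_PRELOAD")


def merge_search_paths(job_env, image_env):
    """Return the merged values of PATH, LD_LIBRARY_PATH and LD_PRELOAD.

    The job value comes first and the image value second, joined with ':'.
    A variable that is unset or empty on one side contributes nothing, and a
    variable that is unset on both sides is omitted from the result.
    """
    merged = {}
    for name in MERGED_VARS:
        parts = [v for v in (job_env.get(name), image_env.get(name)) if v]
        if parts:
            merged[name] = ":".join(parts)
    return merged
```

Because the job value comes first, a directory from the submission environment takes precedence over one from the image when both provide the same executable or library name.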
To enable daemon profiling with the default settings, edit the lsf.conf file, then specify LSB_PROFILE_MBD=Y for the mbatchd daemon or specify LSB_PROFILE_SCH=Y for the mbschd daemon. You can also add keywords within these parameters to further customize the daemon profilers.
This solution also adds the "rsvid" field to the bjobs -o option to display individual advance reservation IDs.
If the peeked job is requeued or migrated, the bpeek command only exits if the job is completed again. In addition, the bpeek command cannot get the new output of the job. To avoid these issues, abort the previous bpeek -f command and rerun the bpeek -f command after the job is requeued or migrated.
RFE#82487
RFE#92576
RFE#98712
After applying this solution, the ClearCase view set by CLEARCASE_ROOT will not be checked under any condition. This means NOCHECKVIEW_POSTEXEC becomes obsolete.
The ClearCase environment is set directly by "cleartool setview" in daemons.wrap, and the failure reason is passed to mbatchd by the bpost command. The bjobs -l, bhist -l, bstatus, and bread commands display the message.
This fix provides three scripts to create schedulable advance reservations and query advance reservation jobs. The scripts are installed under $LSF_BINDIR.
- brsvsub: create a schedulable advance reservation.
- lsfrsv: part of brsvsub. Submit a job to run lsfrsv to update the time window and hosts in an advance reservation.
- brsvjob: query information about jobs that were submitted with an advance reservation name.
To write your own script to create a schedulable advance reservation, you need to know the following information:
- This fix adds a new option -p to brsvadd to create an advance reservation without a time window or hosts. This is referred to as a placeholder advance reservation.
- Use brsvmod to add a time window and hosts to the placeholder advance reservation.
- Submit jobs to the placeholder advance reservation. After the advance reservation is filled and active, jobs can run within it.
- Adds the new option "-o" to the lsload command to set the customized output format.
- Adds the new parameter LSF_LSLOAD_FORMAT to the lsf.conf file and new runtime environment variable LSF_LSLOAD_FORMAT to define the default lsload output format.
Note: The "-json" option is not supported on linux3.12-glibc2.17-armv8.
After this enhancement, the bjobs -p, -p1, -p2, -p3 options also show the project name for jobs that request project mode or fast dispatch mode features, and the bjobs -p options also show the cluster name for jobs that request cluster mode features. The bjobs -p0 and bjobs -l output are not affected by this enhancement.
For further details on these solutions, refer to https://www.ibm.com/support/knowledgecenter/SSWRJV_10.1.0/lsf_release_notes/lsf_relnotes_whatsnew10.1.0.3.html
P102234,P102231,P102230,P102229,P102224,P102208,P102199,P102198,P102196,P102188,P102186,P102178,
P102171,P102170,P102169,P102158,P102152,P102147,P102141,P102139,P102136,P102119,P102115,P102105,
P102103,P102100,157354(No APAR),156718(No APAR),156008(No APAR),155022(No APAR),152237(No APAR),
151340(No APAR),149041(No APAR),148457(No APAR),147039(No APAR),141493(No APAR),152022(No APAR),
RFE#98501,139905(No APAR),RFE#101255,RFE#92852,RFE#75418,RFE#94879,RFE#82487,RFE#79255,RFE#92576,
RFE#98712,146129(No APAR),146128(No APAR),P102156,RFE#98823
LSF/lsf.h
LSF/lsbatch.h
LSF/lssched.h
LSF/lim
LSF/mbatchd
LSF/res
LSF/mbschd
LSF/sbatchd
LSF/ebrokerd
LSF/daemons.wrap
LSF/eauth.krb5
LSF/egosc
LSF/nios
LSF/mesub
LSF/krbrenewd
LSF/eauth.cve
LSF/pim
LSF/bsub
LSF/bmod
LSF/bhist
LSF/bpeek
LSF/bapp
LSF/bjobs
LSF/bqueues
LSF/lsload
LSF/brsvadd
LSF/brsvs
LSF/bconf
LSF/busers
LSF/bparams
LSF/bhosts
LSF/brestart
LSF/bacct
LSF/lsid
LSF/badmin
LSF/bresize
LSF/brsvsub
LSF/lsfrsv
LSF/brsvjob
LSF/bgmod
LSF/bwait
LSF/blimits
LSF/bresources
LSF/bresume
LSF/lsadmin
LSF/blaunch
LSF/bclusters
LSF/lsinfo
LSF/lshosts
LSF/lsmon
LSF/brsvmod
LSF/bslots
LSF/lsgrun
LSF/lsloadadj
LSF/lslogin
LSF/lsplace
LSF/lsrun
LSF/bgadd
LSF/libbat.a
LSF/libbat.so
LSF/liblsf.a
LSF/liblsf.so
LSF/schmod_demand.so
LSF/liblsbstream.a
LSF/libfairshareadjust.a
LSF/liblsbstream.so
LSF/lsf_release
LSF/misc/examples/clearcase/daemons.wrap.c
LSF/misc/examples/lsfhint.py
LSF/misc/examples/external_plugin/allocexample.c
LSF/misc/examples/external_plugin/Makefile
LSF/misc/examples/external_plugin/matchexample.c
LSF/misc/examples/external_plugin/README
LSF/misc/examples/external_plugin/sch.mod.fcfs.c
LSF/misc/examples/external_plugin/myplugin.c
badmin hclose all
badmin qinact all
badmin hshutdown all
lsadmin resshutdown all
lsadmin limshutdown all
lsadmin limstartup all
lsadmin resstartup all
badmin hstartup all
badmin hopen all
badmin qact all
fixlist.txt
lsf.h
lsbatch.h
lssched.h
lim
mbatchd
res
mbschd
sbatchd
ebrokerd
daemons.wrap
eauth.krb5
egosc
nios
mesub
krbrenewd
eauth.cve
pim
bsub
bmod
bhist
bpeek
bapp
bjobs
bqueues
lsload
brsvadd
brsvs
bconf
busers
bparams
bhosts
brestart
bacct
lsid
badmin
bresize
brsvsub
lsfrsv
brsvjob
bgmod
bwait
blimits
bresources
bresume
lsadmin
blaunch
bclusters
lsinfo
lshosts
lsmon
brsvmod
bslots
lsgrun
lsloadadj
lslogin
lsplace
lsrun
bgadd
libbat.a
libbat.so
liblsf.a
liblsf.so
schmod_demand.so
liblsbstream.a
libfairshareadjust.a
liblsbstream.so
lsf_release
misc/examples/clearcase/daemons.wrap.c
misc/examples/lsfhint.py
misc/examples/external_plugin/allocexample.c
misc/examples/external_plugin/Makefile
misc/examples/external_plugin/matchexample.c
misc/examples/external_plugin/README
misc/examples/external_plugin/sch.mod.fcfs.c
misc/examples/external_plugin/myplugin.c
packagedef.txt