Readme file for:LL-3.5.1.20-power-Linux-SLES10
Product/Component Release:3.5.1.20
Update Name:LL-3.5.1.20-power-Linux-SLES10
Fix ID:LL-3.5.1.20-power-Linux-SLES10
Publication Date:17 June 2013
Last modified date:17 June 2013

Online version of the readme file:http://www-01.ibm.com/support/docview.wss?rs=0&uid=isg400001550
Important: The most current version of the readme file can always be found online.

Below is a list of components, platforms, and file names that apply to this Readme file.

Use these instructions to install the LoadLeveler update RPMs on IBM Systems running supported versions of Red Hat Enterprise Linux and SUSE Linux Enterprise Server.

Read-me-first installation notes

This installation procedure assumes that you are updating a machine running the full version of LoadLeveler.

License RPM

Do not proceed if the LoadL-full-license- < OS-ARCH > - < installed_license_version > license package is not installed. For example, if you are currently running LoadLeveler 3.3.x for Linux, you cannot install an update for LoadLeveler 3.4.x. You must first upgrade to LoadLeveler 3.4.0.0 before you can install an update package for a 3.4.0.X release. Please contact your IBM marketing representative for information on how to obtain the appropriate LoadLeveler CD-ROM that has the license package.

If you are installing a mod level (point release), you can use either the current license or the mod level license. For example, if you are installing LoadLeveler 3.4.1.0, you can use either the LoadLeveler 3.4.0.0 or the LoadLeveler 3.4.1.0 license package.

When uninstalling the "LoadL-full" RPM currently installed on your machine, do not uninstall the "LoadL-full-license" RPM for your currently installed LoadLeveler release. The "LoadL-full" update RPM has a dependency on the currently installed license RPM. Also, the LoadLeveler daemons will not start unless the license package is installed and you have accepted the terms and conditions of the license agreement.

Submit-only machines

The update steps for a LoadLeveler submit-only machine are similar to those associated with a machine running the full product. Simply replace the relevant "full" file names by their "so" counterparts. Also, on a submit-only machine there is no need to execute the llctl drain, llctl stop, or llctl start commands because LoadLeveler daemons do not run on a submit-only machine.

File name conventions

Download packages and RPMs for LoadLeveler updates use consistent file naming conventions. By looking at the components of a file name, you can recognize which update package you need for your machine and Linux operating system, and which installed and update RPMs you need to work with to update your current LoadLeveler version.

Example

If your System p (POWER server) is running SLES9 with LoadLeveler 3.4.0.1 installed, and you want to download and install the LoadLeveler 3.4.0.2 update, then the file name components would be as follows:

Download package:

LoadL- < update_version > . < OS-ARCH > . < base_version > .tar.gz
Specifies the LL-3.4.0.2-power-Linux-SLES9.tar.gz file

Package RPMs:

LoadL-full- < OS-ARCH > - < update_version > . < arch > .rpm
Specifies the LoadL-full-SLES9-PPC64-3.4.0.2-0.ppc64.rpm file

LoadL-so- < OS-ARCH > - < update_version > . < arch > .rpm
Specifies the LoadL-so-SLES9-PPC64-3.4.0.2-0.ppc64.rpm file

LoadL-full-lib- < OS-ARCH > - < update_version > . < arch > .rpm
Specifies the LoadL-full-lib-SLES9-PPC-3.4.0.2-0.ppc.rpm file

Currently installed LoadLeveler and License RPMs:

LoadL-full- < OS-ARCH > - < installed_version >
Specifies that LoadL-full-SLES9-PPC64-3.4.0.0-0 is installed

LoadL-full-license- < OS-ARCH > - < installed_license_version >
Specifies the LoadL-full-license-SLES9-PPC64-3.4.0.0-0 license

where

< OS-ARCH >

The Linux Operating System and platform architecture of the machine where you are installing a LoadLeveler update package. For example, if you are upgrading an installation of LoadLeveler on a PowerPC 64 machine running SLES9, then < OS-ARCH > would be SLES9-PPC64.

< update_version >

Specifies the version number of the LoadLeveler update package that you want to install on your system. For example, if you are updating to LoadLeveler 3.4.0.2-0, then the < update_version > is the number 3.4.0.2-0. The < update_version > appears in the file name of the package download (*.tar.gz file) and in the update RPMs in the package.

< arch >

Used in RPM file names. Specifies the platform architecture, where < arch > =
i386 (32-bit IBM System x),
x86_64 (64-bit IBM System x) or
ppc64 (64-bit IBM System p)

< base_version >

Used only in the file name of the downloadable update package (gzipped tar file). Specifies the LoadLeveler Version/Release to which the update can be applied, for example, 3.4.X.

< currently_installed_version >

Specifies the version number of the current LoadLeveler release installed on your system.

< installed_license_version >

Specifices the license RPM for the base release (or a lower release) of the LoadLeveler version you are updating. For example, to install the a LoadLeveler 3.3.2.6 update, you must have an installed license RPM for LoadLeveler 3.3.0.0 (base version) or 3.3.1.x or 3.3.2.x.

Installation procedure
1. Change to the < UpdateDirectory > directory, that is, the directory where the *.tar.gz file for the LoadLeveler update resides and where you have write access:
  
  cd < UpdateDirectory >
2. Extract the RPM files from the tar file:
  
  tar -xzvf LoadL- < update_version > . < OS-ARCH > . < base_version > .tar.gz
  
  At the end of this step the files extracted from the archive should match the files listed in the Readme ("View" link) for the LoadLeveler update you downloaded.
3. Verify that the LoadLeveler "license" RPM for the LoadLeveler version that you are updating is currently installed on this system:
  
  rpm -qa | grep LoadL
  
  The output of this command should be similar to the following:
  
  LoadL-full- < OS-ARCH > - < currently_installed_version >
  LoadL-full-license- < OS-ARCH > - < installed_license_version >
4. If LoadLeveler is running on this machine, enter the following command to "drain" the LoadL_schedd and LoadL_startd daemons on this machine and all other machines in the LoadLeveler cluster:
  
  llctl -g drain
  
  Note: To avoid potential incompatibility problems, all machines in the cluster should be upgraded to the same LoadLeveler update release before restarting the LoadLeveler daemons.
5. Use the llstatus command to verify that the LoadL_schedd and LoadL_startd daemons are in "Drned" (drained) state, and then enter the following command to stop the LoadLeveler daemons:
  
  llctl -g stop
6. To apply the update, use one of the following options:
  
  OPTION 1
  
  Uninstall the currently installed "LoadL-full" RPM and then install the "LoadL-full" update package by running the following commands:
  
  rpm -e LoadL-full- < OS-ARCH > - < currently_installed_version >
  rpm -ivh LoadL-full- < OS-ARCH > - < update_version > . < arch > .rpm
  
  For Blue Gene/L or Parallel POE only
  
  Run these additional commands:
  rpm -e LoadL-full-lib- < OS-ARCH > - < currently_installed_version >
  rpm -ivh LoadL-full-lib- < OS-ARCH > - < update_version > . < arch > .rpm
  
  OPTION 2
  
  Use the -U option of the rpm command to apply the update directly:
  
  rpm -Uvh LoadL-full- < OS-ARCH > - < update_version > . < arch > .rpm
  
  For Blue Gene/L or Parallel POE only
  
  Run this additional command:
  rpm -Uvh LoadL-full-lib- < OS-ARCH > - < update_version > . < arch > .rpm
7. Repeat step 6 for all the machines in the LoadLeveler cluster. On completion of this task, restart the LoadLeveler daemons with the following command:
  
  llctl -g start
For further information, consult the LoadLeveler Library for the appropriate version of the LoadLeveler AIX 6, AIX 5 and Linux Installation Guide.

Additional information

Package contents

Update for SUSE Linux Enterprise Server 10 (SLES 10) on IBM servers with POWER processors

LL-3.5.1.20-power-Linux-SLES10 is a corrective fix for LoadLeveler for SLES 10 on POWER systems, version 3.5.1.X.

The updates contained in the LL-3.5.1.20-power-Linux-SLES10.tar.gz file available for download from this site provide corrective fixes for LoadLeveler for SLES 10 on POWER systems, version 3.5.1.0. Updates for both the full LoadLeveler product and the submit-only component of the LoadLeveler product are provided.

Update to Version:

3.5.1.20

Update from Version:

3.5.1.0 through 3.5.1.19

Update (tar file) contents:
RPM files:

LoadL-full-SLES10-PPC64-3.5.1.20-0.ppc64.rpm
LoadL-so-SLES10-PPC64-3.5.1.20-0.ppc64.rpm
LoadL-full-lib-SLES10-PPC-3.5.1.20-0.ppc.rpm

README file:

README

For further information, consult the LoadLeveler for AIX and Linux Version 3.5.0 Installation Guide.

Changelog

General fixes
Multicluster fixes

Blue Gene fixes
APAR listing

Notes

Unless specifically noted otherwise, this history of problems fixed for LoadLeveler 3.5.1.x applies to:

LoadLeveler 3.5.1.x for AIX 6
LoadLeveler 3.5.1.x for AIX 5
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on POWER servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on POWER servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on POWER servers
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on POWER servers
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on POWER servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on Intel based servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on Intel based servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on Intel based servers
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on Intel based servers
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on Intel based servers
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on servers with 64-bit Opteron or EM64T processors
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 10 (SLES10) on servers with 64-bit Opteron or EM64T processors
LoadLeveler 3.5.1.x for SUSE LINUX Enterprise Server 9 (SLES9) on servers with 64-bit Opteron or EM64T processors
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 5 (RHEL5) on servers with 64-bit Opteron or EM64T processors
LoadLeveler 3.5.1.x for Red Hat Enterprise Linux 4 (RHEL4) on servers with 64-bit Opteron or EM64T processors

Warning section

A coexistence problem was introduced in TWS LoadLeveler 3.5.0.5 and TWS LoadLeveler 3.5.1.1 which can not be corrected. The entire cluster will need to be migrated to either TWS LoadLeveler 3.5.0.5 or TWS LoadLeveler 3.5.1.1 at the same time.

Restriction section

TWS LoadLeveler 3.5 does not support checkpointing for data staging jobs.

General LoadLeveler problems

Problems fixed in LoadLeveler 3.5.1.19 [October 19, 2012]

LoadLeveler is modified to set the correct userid to prevent checkpoint files from being deleted and the correct checkpoint file is being read by the starter.
LoadLeveler Schedd daemon crash problem is fixed when accessing fairshare data when the job is terminating.
A performance optimization of the central manager is done.
The central manager will not schedule other steps of a co-scheduled job if one of them fails to schedule.
The central manager crash problem with SIGABT signal when removing step is fixed.

Problems fixed in LoadLeveler 3.5.1.18 [July 12, 2012]

LoadLeveler will not display the misleading message about image_size check in the command "llq -s" and in the Negotiator log for determining that a machine could not be used was found already.
LoadLeveler will now assign the correct number of cpus for blocking steps requesting cpu affinity.
LoadLeveler negotiator dameon will not core dump if a llmovespool command is run when there is a multistep job which has some steps completed and others still running.

Problems fixed in LoadLeveler 3.5.1.17 [May 3, 2012]

Under some rare conditions, the LoadL_schedd daemon can core dump when a job is rejected multiple times. The core dump was the result of an array index not being reset properly upon a 2nd dispatch of the same job step. This problem has been corrected by setting that array index back to -1 when a job step is redispatched.

General LoadLeveler problems

Problems fixed in LoadLeveler 3.5.1.16 [February 23, 2012]

The step count limitation is now calculated correctly for the user for each of its class if the step was modified by the llmodify command.

Problems fixed in LoadLeveler 3.5.1.15 [December 15, 2011]

The LOADL_HOSTFILE environment variable is now set in the environment for the user prolog when the job type is set to mpich.
LoadLeveler can prevent the potential negotiator core dump caused by a race condition when querying a terminating job.
The abort is now prevented by correcting the startd daemon locking when processing files in the execute directory during startup.

Problems fixed in LoadLeveler 3.5.1.14 [October 13, 2011]

The llsubmit command will fail if the smt and rset keywords are used together.
LoadLeveler can now handle jobs from users who belong to more than 64 system groups. The jobs submitted will not be rejected during launching on the compute node.
The Negotiator has been changed so that it no longer depends on processor core 0 having CPUs configured. The Negotiator will no longer core dump if it encounters such a configuration.

Problems fixed in LoadLeveler 3.5.1.13 [August 4, 2011]

LoadLeveler will not submit the job if there are no class in the default class list that can satisfy the job requirements.
LoadLeveler "llq -s" command will provide information about why a step is in Deferred state.
The unthread_open() error in the Schedd Log will no longer be printed when querying the remote cluster job since LoadLeveler will no longer try to route a nonexistent remote submit job command file in a multi cluster environment.
LoadLeveler will pick up the default values for the undefined keywords after a reconfig. e.g. job_user_prolog

Problems fixed in LoadLeveler 3.5.1.12 [June 17, 2011]

The llctl command will now check to make sure the Schedd daemon's port is available to be used before starting up LoadLeveler.
A new field, Eligibility Time, is added to the llq and llsummary long listing output which records the last time the job became eligible for dispatch.
LoadLeveler now creates cpuset files with permissions that are searchable by non-root users under the /dev/cpuset directory.
The llmkres command should now be able to create the reservations consistently without hitting the timing error message 2512-856.
LoadLeveler LoadL_negotiator daemon will not core dump when processing a multi-step job which contains a long dependency statement.
A unique security issue has been identified for TWS LoadLeveler Web User Interface that could potentially compromise your system. It is recommended that you apply this update to protect your system.

Problems fixed in LoadLeveler 3.5.1.11 [April 7, 2011]

Modifying the recurring reservation's attribute will now be seen in the first occurrence's attribute value under the llqres -l command.
On Linux/P nodes, jobs requesting memory affinity with MCM_MEM_NONE, the job will always consume memory from the local MCM and will start paging once memory on the local MCM is over consumed; even though memory is available on other MCMs on the node. Now, if a job is submitted with memory affinity option, MCM_MEM_NONE, the task will be bounded to all the MCMs on the node and the memory will be consumed from all the MCMs on the node.
Dependent steps are not given a new qdate when they are put onto the idle queue, while steps at the maxidle limit for a given user within a class are given a new qdate and a new sysprio based on that qdate. A change was made so that dependent steps are also counted as "queued" steps for the purposes of enforcing maxqueued and maxidle limits, and so a dependent step which is at the maxidle limit will get a new qdate.

Problems fixed in LoadLeveler 3.5.1.10 [February 10, 2011]

LoadLeveler affinity RSET_MCM_AFFINITY cannot work on AIX because the vmo command output had changed. LoadLeveler had enhanced the vmo handler code so RSET_MCM_AFFINITY can be enabled now.
The llsummary command might crash if the default class requirement value doesn't match the job requirement value. Fixed the llsummary command to select the correct requirement value from the default class list if there is no job class specified in job command file.
The llsummary command will fail when it tries to access invalid data memory in the job history file. Fixed the llsummary command to be able to ignore the bad data areas and just report the valid data in the job history file.
If the class-user sub-stanzas in the "default" class stanza are not defined in alphabetical order, the class-user sub-stanzas might incorrectly inherit the wrong values from the default class. LoadLeveler will now inherit the default values for the class-user sub-stanzas from the "default" class correctly.
Loadleveler doesn't set the environment variable, LOADL_JOB_STEP_EXIT_CODE, when executing the user epilog script. LoadLeveler will now set the right environment variables when executing the epilog script.
LoadLeveler schedd may ignore jobs if the job queue contains invalid job keys. The schedd daemon will now collect the correct job data when scanning the job queue files.
The LoadLeveler command, llmodify, has a limitation where the startdate and wall_clock_limit job attributes cannot be modify for idle jo bs. llmodify is now enhanced to be able to modify the startdate and wall_clock_limit job attributes for idle jobs.
- In the LoadLeveler Command and API Reference, SC23-6701-00, under Chapter 1. Commands, llmodify - Change attributes of a submitted job step,
  - New keyword wall_clock_limit for the -k option: Changes the wall clock limit of a job step. The value of the specified wall clock limit must be longer than the value of the current wall clock limit. This is a LoadLeveler administrator only option.
  - New keyword startdate for the -k option: Changes the start time of a idle-like job step. This is a LoadLeveler administrator only option.

Problems fixed in LoadLeveler 3.5.1.9 [December 16, 2010]

Fixed the llclass command to show the correct value for the "Free Slots" field when LoadLeveler is configured to use the LL_DEFAULT scheduler.
Fixed llchres command to check requested nodes additions to make sure those nodes have no jobs running on them or already assigned to another reservation. If no idle nodes can be found, the llchres command will fail.
Fixed the schedd daemon so it will not crash if the job's output file path contains the "%" character.
Fixed LoadLeveler to correctly reserve the reservation's resources after the central manager daemon restarts so that jobs with overlapping resources with the reservations will not be allowed to start.
Fixed the central manager to make sure pending status changes to the machines are properly locked so that jobs being scheduled to the down machines will no longer crash the central manager daemon.
Fixed the llsummary -s or -e command to report all jobs that match the filter requirement. In the TWS LoadLeveler documentation, Command and API Reference and the llsummary.l manual page, the -s and -e options will state the accounting data report will contain information about every job that contains at least one step that falls within the specified range.
Fixed the job launch program so that it does not need to verify the group name so jobs will be executed using the submitting GID number.
Fixed the negotiator crash by correcting the argument used to format the message which describes why the step cannot start be scheduled.

Problems fixed in LoadLeveler 3.5.1.8 [October 8, 2010]

Fixed llsummary command to display the correct job id for jobs which have been moved from one schedd to another using the llmovespool command.
Fixed the startd daemon to ignore the completion job command state if the job step was already terminated to prevent jobs from being stuck in the job queue.
Fixed jobs to run on partitions that had removed exclude_bg keyword from the partition's default class configuration.
Fixed LoadLeveler to do retries on the getpwnam() API so the correct passwd and group information will be retrieved if there are network issues instead of returning a "NOT FOUND" error.
Fixed central manager deadlock and core dump from occurring by removing the completed step from the user and group class queues before the dependent steps get requeued.
Fixed LoadLeveler from crashing by calling thread safe dirname() and filename() APIs during multi-thread executions.
Fixed LoadLeveler to accept jobs with environment variables up to 100KB.
Fixed the job step's completion code to return the wait3 UNIX system call status when the job is cancelled.

Problems fixed in LoadLeveler 3.5.1.7 [July 20, 2010]

Resources will be held correctly if two reservations in the cluster were reserving the same resources with the second reservation's start time corresponding to the first reservation's end time.
Fixed the llsummary command from crashing when the history file was being modified at the same time the command was trying to read it.
Fixed the llsummary command to handle small data fragments in the history file so job steps will now be displayed correctly.
Fixed the llacctmrg command from crashing if the global history file was greater than 2 GB.
Fixed llsummary and llacctmrg commands to be able to access history files greater than 2GB.
Fixed the central manager from crashing by locking the job step so different threads can not operated on it concurrently.
Fixed the llqres command so that it will now work in a mixed 32 bit and 64 bit cluster environment without seeing the 2512-301 error message.
Fixed the user prolog environment variables to be passed to the user epilog.
Fixed Loadleveler to prevent duplicate job id error by trying other remote inbound schedds for remote job submission if the network connections to the inbound schedd is not stable.
During file system failures, new mechanisms are implemented to reaccess file handlers in order to recover LoadLeveler to working state. The new implementations are to have a new timer to enable the schedd to come up automatically if file access was available and to set the schedd to drain state if the schedd file handlers can not be recovered.

Problems fixed in LoadLeveler 3.5.1.6 [May 20, 2010]

Fixed the evaluation of consumable cpus calculation for jobs which dynamically turn smt on and off so jobs will be scheduled properly on a Power5 or Power6 systems.
Fixed the negotiator core dump when the "START" expression was not configured for a machine.
Fixed design of dependent steps to get new qdate when they are put onto the idle queue due to enforcement of the maxqueued and maxidle limits.

Problems fixed in LoadLeveler 3.5.1.5 [March 22, 2010]

Fixed LoadLeveler to be able to honor the task order in the task_geometry keyword when assigning cpus to task ids.
Fixed llstatus to display the correct configuration expressions for all expression keywords.
Fixed the dispatch cycle of routed jobs so when the central manager failover takes place, the preempted jobs will now be able to run.
Fixed LoadLeveler to set the environment variable from the prolog output if each line contains at most 65534 characters. All lines containing more than 65534 characters will be ignored.
Fixed LoadLeveler jobs to start correctly in the login shell and know when to run under a login shell so the pmd will not hang during execution.
Fixed the reservation debug message field so the central manager will not core dump.

Problems fixed in LoadLeveler 3.5.1.4 [January 18, 2010]

Fix LoadLeveler from crashing when started in drain mode.
Fix the LoadL_negotiator daemon from core dumping by initializing an internal variable that was being used.
Fix LoadLeveler jobs from hanging in preempted pending state by correcting the machine state for the jobs being preempted.
Fix the schedd daemon memory leak when processing reservations by removing the reservation element object after use.
Fix the llsummary command segmentation fault by skipping over data that are not valid when processing the history file.
Fix user id name to have a length up to 256 so jobs can now run when submitted using those ids.
Fix llqres -l to output the correct days of the month under the Recurrence section if the month have less than 31 days.
Fix LoadLeveler to send emails to the right administrator accounts when LoadLeveler detects errors.
Fix LoadLeveler to execute the rescan function so jobs can not be scheduled once the running jobs are completed when using the default scheduler.
Fix submitted jobs to be rejected when user id is not valid.
Fix LoadLeveler to not send notification emails if the api process has already reported the errors.
Fix LoadLeveler jobs to run with the correct gid on AIX platform.
Fix LoadLeveler multi-step jobs to run with the correct umask value.
Fix the negotiator daemon to ensure that resource counts are now being updated correctly when a step is canceled during the window of time after it has been scheduled but before the job start order has been dispatched.

Problems fixed in LoadLeveler 3.5.1.3 [November 2, 2009]

Fix the job command file parsing error 2512-059 when the first non-blank line is neither a comment line nor a LoadLeveler keyword or if the first character of the first non-blank line is not a '#' sign.
Fix the resource count for coschedule job steps so if the step is canceled after it has been scheduled and waiting for preemption to take place, those resource counts will now be updated correctly for future dispatching cycles.
Fix LoadLeveler performance by reducing the overhead of handling llq query requests so that the impact to the overall scheduling progress is also reduced.
Fix documentation on why using different flags for llq will generate different outputs for the same job.

Problems fixed in LoadLeveler 3.5.1.2 [August 19, 2009]

Fix the LoadL_schedd SIGSEGV termination while many jobs are submitted by correcting reference counting on the data area while threads are still referencing it.
Fix LoadLeveler to use unsigned int64 variables instead of integer for file size calculations whenever transmitting files, including transmitting history files that are greater than 2G to the llacctmrg command.
Fix the llqres output to show the correct month value under the "Recurrence" section.
Fix the LoadL_startd increased memory size consumption by modifying LoadLeveler to dynamically load the libraries only once.
Fix the schedd memory leak when performing a llctl reconfig while having parallel, user space jobs on the queue in running state by correcting the memory leaks in the adapter objects.
Fix the job step staying in the complete state for a long period of time by changing the central manager job termination/cleanup processing.
Fix LoadLeveler to have better performance when scheduling jobs, especially in a cluster which has huge number of nodes with similar resources on each node.

Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

Notice: This is a mandatory service update to TWS LoadLeveler 3.5.1.0.

Data staging options DSTG_NODE=MASTER and DSTG_NODE=ALL can now be used.
Fix the accounting output of llsummary command to not have multiple same step entries after LoadLeveler restarts on a multistep job.
Fix the child starter process to ensure it is started as root so that the process could set up the environment and credentials to run the job.
Fix the negotiator handling of step dependencies so jobs that are supposed to run will run and those that shouldn't would not.
Linux: On linux platforms with multiple CPUs, it is possible for the seteuid function to malfunction. When the LoadLeveler startd daemon encounters this failure, its effective user id may be set incorrectly, in which case it is possible for jobs to become stuck in ST state. A workaround to the glibc issue is provided in this service update.

LoadLeveler Multicluster problems

Problems fixed in LoadLeveler 3.5.1.16 [February 23, 2012]

Locking is added to the LoadLeveler schedd daemon to serialize threads receiving multi-cluster jobs from threads processing llq -x requests to prevent the daemon from core dumping.

Problems fixed in LoadLeveler 3.5.1.11 [April 7, 2011]

LoadLeveler is changed to protect the schedd from core dumping if the same cluster stanza is configured as local for more than one cluster in a scale-across multi-cluster environment.

Problems fixed in LoadLeveler 3.5.1.3 [November 2, 2009]

Fix LoadL_schedd memory leak when running the llstatus -X command in a multi-cluster environment.
Fix LoadLeveler so jobs can be submitted to the remote cluster in a mixed 3.5.X and 3.4.3.X multi-cluster environment.

Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

Fix the llstatus -X from core dumping when there are adapters or MCMs on the nodes.

LoadLeveler Blue Gene problems

Problems fixed in LoadLeveler 3.5.1.20 [June 12, 2013]

Negotiator cores at 3.5.1.19 on BG/P.

Problems fixed in LoadLeveler 3.5.1.17 [May 3, 2012]

In a blue gene environment, if LoadLeveler is doing preemption and the preempting job affects exactly one running job with the same size, LoadLeveler will re-use the existing initialized partition, eliminating the need to boot a new partition of the same size.

Problems fixed in LoadLeveler 3.5.1.14 [October 13, 2011]

The base partition state as well as the nodecard state will be checked before dispatching the job. If the nodecards that the job requires are available, the job will run even if not all nodecards are in a good state on the base partition.

Problems fixed in LoadLeveler 3.5.1.13 [August 4, 2011]

When running jobs are preempted by new incoming jobs, the top-dog job will not lose its status as top-dog.

Problems fixed in LoadLeveler 3.5.1.11 [April 7, 2011]

In a blue gene environment, the partition state will be checked before dispatching the job so that the job will not be scheduled onto a down partition.
LoadLeveler can change the duration of an active Blue Gene Partition created by job command file on the BG/P system.
In a Blue Gene environment a job was being scheduled to midplanes which had linkcards in ERROR state causing a failure when booting the partition. This caused jobs to be placed in HOLD. Now, jobs will not be scheduled to midplanes that have a linkcard error.

Problems fixed in LoadLeveler 3.5.1.9 [December 16, 2010]

Fixed the llq -b command in a Blue Gene environment to not display invalid values as the partition state.

Problems fixed in LoadLeveler 3.5.1.6 [May 20, 2010]

The duration of an active blue gene partition can now be modified on the Blue Gene/P system.

Problems fixed in LoadLeveler 3.5.1.5 [March 22, 2010]

Fixed LoadLeveler Blue Gene jobs to start on free nodes by skipping over invalid partitions in the Blue Gene database during partition load and continue to load on valid partitions.

Problems fixed in LoadLeveler 3.5.1.1 [May 18, 2009]

Added scheduling enhancements to make it easier to find resources to run jobs on large Blue Gene systems.

*TWS LoadLeveler Corrective Fix listing*
Fix Level	APAR numbers
LL 3.5.1.20	AIX:IV43790
	LINUX:IV39500
LL 3.5.1.19	AIX:IV28108 IV07131 IV16136 IV28971
	LINUX:
LL 3.5.1.18	AIX:IV12614 IV23816
	LINUX:
LL 3.5.1.17	AIX:IV19195 IV19236
	LINUX:
LL 3.5.1.16	AIX:IV10394 IV11567 IV15094
	LINUX:
LL 3.5.1.15	AIX:IV09427 IV10204 IV11547
	LINUX:
LL 3.5.1.14	AIX:IV05141 IV08342 IV08363 IZ89656
	LINUX:
LL 3.5.1.13	AIX:IV00739 IV01418 IV01429 IV03310 IV03328 IV03346
	LINUX:
LL 3.5.1.12	AIX:IV00814 IV00815 IV00820 IV00831 IZ95944 IZ99311 IZ99669 IZ99914
	LINUX:IZ92264
LL 3.5.1.11	AIX:IZ94734 IZ96419 IZ96424 IZ96429 IZ96432 IZ96515 IZ96991
	LINUX:IZ86999 IZ90254 IZ91951 IZ93320 IZ94855
LL 3.5.1.10	AIX:IZ89659 IZ90701 IZ91661 IZ91756 IZ93221 IZ93227 IZ93260 IZ93266
	LINUX:
LL 3.5.1.9	AIX:IZ85108 IZ86934 IZ87143 IZ87433 IZ87442 IZ89564 IZ89566 IZ89568
	LINUX:
LL 3.5.1.8	AIX:IZ80738 IZ83416 IZ83769 IZ84426 IZ85339 IZ85341 IZ85385 IZ85390 IZ85406
	LINUX: IZ78027 IZ81226 IZ82094 IZ83772
LL 3.5.1.7	AIX: IZ69053 IZ75381 IZ75989 IZ76705 IZ77606 IZ77612 IZ78490 IZ78493 IZ79273 IZ79274
	LINUX: IZ77000 IZ77610
LL 3.5.1.6	AIX: IZ52059 IZ73349 IZ75142 IZ75265
	LINUX: IZ70744
LL 3.5.1.5	AIX: IZ56208 IZ67241 IZ67479 IZ67760 IZ69047 IZ70280 IZ70760 IZ70787
	LINUX: IZ66534
LL 3.5.1.4	IZ60760 IZ62312 IZ64121 IZ64435 IZ64717 IZ64913 IZ64956 IZ65273 IZ65278 IZ66454 IZ66461 IZ66874 IZ66914 IZ67156
LL 3.5.1.3	IZ55661 IZ58696 IZ59450 IZ59799 IZ59841 IZ59842 IZ63572
LL 3.5.1.2	IZ51180 IZ51401 IZ52219 IZ52926 IZ52927 IZ53764 IZ53825 IZ54439
LL 3.5.1.1	IZ48410 IZ48545 IZ48548 IZ50225 IZ50226

Notices

INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some jurisdictions do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Microsoft, Windows, and Windows Server are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Other company, product, or service names may be trademarks or service marks of others.

THIRD-PARTY LICENSE TERMS AND CONDITIONS, NOTICES AND INFORMATION

The license agreement for this product refers you to this file for details concerning terms and conditions applicable to third party software code included in this product, and for certain notices and other information IBM must provide to you under its license to certain software code. The relevant terms and conditions, notices and other information are provided or referenced below. Please note that any non-English version of the licenses below is unofficial and is provided to you for your convenience only. The English version of the licenses below, provided as part of the English version of this file, is the official version.

Notwithstanding the terms and conditions of any other agreement you may have with IBM or any of its related or affiliated entities (collectively "IBM"), the third party software code identified below are "Excluded Components" and are subject to the following terms and conditions:

Product/Component Name:	Platform:	Fix:
LoadLeveler	Linux 64-bit,pSeries SLES 10	LL-3.5.1.20-power-Linux-SLES10

Readme and Release notes for release 3.5.1.20 LoadLeveler 3.5.1.20 LL-3.5.1.20-power-Linux-SLES10 Readme