- LoadLeveler 4.1.1.x for AIX 6
- LoadLeveler 4.1.1.x for AIX 5
- LoadLeveler 4.1.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on POWER servers
- LoadLeveler 4.1.1.x for SUSE LINUX Enterprise Server 11 (SLES11) on servers with 64-bit Opteron or EM64T processors
- LoadLeveler 4.1.1.x for Red Hat Enterprise Linux 6 (RHEL6) on POWER servers
- LoadLeveler 4.1.1.x for Red Hat Enterprise Linux 6 (RHEL6) on servers with 64-bit Opteron or EM64T processors
- LoadLeveler 4.1.1.x for Red Hat Enterprise Linux 5 (RHEL5) on servers with 64-bit Opteron or EM64T processors
- LoadLeveler 4.1.1.x for Red Hat Enterprise Linux 5 (RHEL5) on Intel based servers
For LL 4.1.1:
- If the scheduler and resource manager components on the same machine are not at the same level, the daemons will not start up.
Problems fixed in LoadLeveler 4.1.1.14 [February 7, 2013]
- The import of environment variables containing semicolons has been corrected.
- Fixed an issue that caused the llsummary command to core dump.
- The failure of the jobs attempting to create the output or error file has been fixed.
- Resource Manager only:
- Fixed an issue where LOADL_PROCESSOR_LIST was not set correctly for serial jobs.
- LoadLeveler is changed to no longer wait for network tables to be loaded for MPICH jobs.
- The issue that LoadLeveler fails to load the network table before starting a single node parallel job has been fixed.
- Scheduler only:
- Fixed an issue that caused the scheduler to core dump when coscheduling jobs.
- Fixed issue of scheduler getting into especially long dispatching cycles.
- Performance improvement to the scheduling of work by the central manager.
- Fixed an issue that caused the scheduler to core dump when removing unusable internal objects.
Problems fixed in LoadLeveler 4.1.1.13 [November 1, 2012]
- There is no error message displayed in Startd log when NRT is not installed on the cluster.
- The llq command can now show error messages when a user ID issue is encountered.
- The central manager daemon stalling and llq commands hanging problem has been fixed.
- LoadLeveler will not core dump when the user issues the command "llstatus -L machine".
- Resource Manager only:
- Obsolete code which attempted to terminate left over job processes is removed.
- LoadLeveler startd daemon will no longer abort when trying to reject a job when a network table load fails.
- Scheduler only:
- Fixed a problem where the LoadL_negotiator daemon terminated with a SIGSEGV when trying to schedule a job step while the number of machines in the cluster had changed.
- The top dog start time search has been optimized and additional changes reduce the number of waiting threads that are started during the dispatching loop.
Problems fixed in LoadLeveler 4.1.1.12 [August 29, 2012]
- A core dump problem that occurred when querying the step adapter usage information has been fixed.
- The fix eliminates the deadlock as a cause for jobs to become stuck in RP state and permits llctl stop to terminate the LoadL_startd process.
- LoadLeveler has sufficiently reduced its calls to the user registry.
- The problems of completed jobs not giving back resources for a long time, and of no jobs appearing to start at all, have been fixed.
- Scheduler only:
- The llmovespool command now works correctly for a multistep job that has some steps completed and others still running.
- LoadLeveler no longer displays the misleading message about the image_size check in the "llq -s" output and in the Negotiator log when a reason that a machine could not be used has already been found.
- The accounting record which has a negative wall clock value is now skipped by the llsummary command.
- The problem of the central manager crashing with signal SIGABRT when removing a job step has been fixed.
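The llsummary change in 4.1.1.12 (skipping accounting records that have a negative wall clock value) can be pictured with the following minimal sketch. It is not LoadLeveler code: the record layout and the field name wall_clock are hypothetical and are used only to show the filtering idea.

```python
def usable_records(records):
    """Yield accounting records whose wall clock value is not negative.

    Each record is a dict with a hypothetical "wall_clock" field (seconds).
    """
    for record in records:
        if record["wall_clock"] < 0:
            # A negative wall clock value marks a bad record: skip it
            # instead of letting it distort the totals.
            continue
        yield record


def total_wall_clock(records):
    """Sum the wall clock time over the usable records only."""
    return sum(record["wall_clock"] for record in usable_records(records))


history = [
    {"step": "a.1.0", "wall_clock": 120},
    {"step": "a.2.0", "wall_clock": -35},  # bad record, skipped
    {"step": "a.3.0", "wall_clock": 60},
]
print(total_wall_clock(history))
```

Skipping the record, rather than aborting the report, keeps the remaining valid history data usable.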
Problems fixed in LoadLeveler 4.1.1.11 [June 7, 2012]
- Implemented internal LoadLeveler data contention improvements.
- Resource Manager only:
- Under some rare conditions, the LoadL_schedd daemon can core dump when a job is rejected multiple times. The core dump was the result of an array index not being reset properly upon a 2nd dispatch of the same job step. This problem has been corrected by setting that array index back to -1 when a job step is redispatched.
- The problem of the adapter state shown in llstatus being incorrect due to a deadlock in the region manager is now resolved.
- Scheduler only:
- If a step required CPU affinity with blocking, LoadLeveler assigned more CPUs than requested. LoadLeveler now assigns the correct number of CPUs for blocking steps requesting CPU affinity.
Problems fixed in LoadLeveler 4.1.1.10 [April 5, 2012]
- Parsing of the alternate regional managers has been corrected so that error message 2512-636 is no longer issued; defining alternate regional managers no longer prevents LoadLeveler from starting up. Also, setting the name_server keyword to "LOCAL" now sends only one stop to the machine, using the machine's short host name.
- Fixed startd daemon core dumping in cases where the job was rejected due to adapter windows failures.
- Resource Manager only:
- LoadLeveler was modified to set the correct user ID, which prevents checkpoint files from being deleted, and to have the starter read the correct checkpoint file, so that the job restarts when the llctl flush and resume commands are executed twice.
- Locking has been added to the LoadLeveler schedd daemon to serialize threads receiving multi-cluster jobs with threads processing llq -x requests, which prevents the schedd from core dumping.
- The LoadLeveler schedd daemon will now write the host smt status to the accounting history file before the job gets terminated so that all the host smt status will be shown in the llsummary -l output.
- A problem in pe_rm_connect() that caused read() to be called on a socket that was not ready to be read has been corrected, allowing pe_rm_connect() to continue to retry the connection for the specified rm_timeout amount of time.
- Scheduler only:
Problems fixed in LoadLeveler 4.1.1.9 [February 9, 2012]
- LoadLeveler can now display the host name correctly based on the name_server configuration. The previous limitation of the name_server keyword being ignored is now lifted.
- LoadLeveler has been changed to prevent unnecessary logging of multi-cluster messages to the SchedLog.
- Fixed a core dump in the llconfig -c command when the cluster has more than 128 machines.
- If there are adapter window failures from unloading or cleaning, LoadLeveler will now mark these adapters as unusable and will give the central manager only windows that can be used.
- Resource Manager only:
- The timing issue between the cancel and job start transaction is now corrected so that the job will be canceled after the llcancel command is issued and the startd daemon will no longer hang after executing the llctl stop command.
- The LoadLeveler method for reporting job step status has been corrected to report R state, even for parallel jobs which do not invoke an MPI run time manager (e.g. poe).
- Scheduler only:
- There could be a problem in determining the count of idle job steps towards the maxidle limit for a user within a class when the class of a job step was modified with the llmodify command. The step count limit is now calculated correctly for the user in each of the user's classes.
Problems fixed in LoadLeveler 4.1.1.8 [December 15, 2011]
- LoadLeveler can prevent the potential core dump caused by a race condition when querying a terminating job.
- LoadLeveler will ignore the machine_list keyword if the syntax is not defined correctly.
- Changes have been made to the processing of the machine_list keyword so that hyphens can now be used as part of the machine name and that multiple number ranges can be specified in each of the machine name expressions.
e.g. machine_list = c250f01c[02-08]n[01-08]-ib0,c250f[02-04]c[01-08]n[01-08]-ib0
- Changes have been made in the way job keys are handled in LoadLeveler so that it is no longer possible for more than one job having the same job key to be active in the cluster at the same time.
- Resource Manager only:
- The LOADL_HOSTFILE environment variable is now set in the environment for the user prolog when the job type is set to mpich.
- A startd daemon abort is now prevented by correcting the startd daemon locking when processing files in the execute directory during startup.
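The machine_list syntax described for 4.1.1.8 (hyphens inside machine names and several number ranges per expression) can be illustrated with a small expansion sketch. This is not LoadLeveler's parser; the function names are hypothetical, and keeping the zero padding of the lower bound is an assumption made for the example.

```python
import itertools
import re

# Matches one bracketed numeric range such as "[02-08]".
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_expression(expr):
    """Expand every [lo-hi] range in one machine name expression."""
    # With two capture groups, re.split returns:
    # literal, lo, hi, literal, lo, hi, ..., literal
    parts = _RANGE.split(expr)
    choices = []
    for i in range(0, len(parts), 3):
        choices.append([parts[i]])  # literal text, which may contain hyphens
        if i + 2 < len(parts):
            lo, hi = parts[i + 1], parts[i + 2]
            width = len(lo)  # assumption: pad to the width of the lower bound
            choices.append(
                [str(n).zfill(width) for n in range(int(lo), int(hi) + 1)]
            )
    return ["".join(combo) for combo in itertools.product(*choices)]


def expand_machine_list(value):
    """Expand a comma-separated list of machine name expressions."""
    names = []
    for expr in value.split(","):
        names.extend(expand_expression(expr.strip()))
    return names


names = expand_machine_list(
    "c250f01c[02-08]n[01-08]-ib0,c250f[02-04]c[01-08]n[01-08]-ib0"
)
print(len(names), names[0], names[-1])
```

For the example value, the first expression covers 7 x 8 names and the second covers 3 x 8 x 8 names.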
Problems fixed in LoadLeveler 4.1.1.7 [October 27, 2011]
- The llsubmit command will fail if the smt and rset keywords are used together.
- The processing of the preempt_class configuration keywords has been fixed so that changes will take effect after the llctl reconfig command is issued.
- The Negotiator has been changed so that it no longer depends on processor core 0 having CPUs configured. The Negotiator will no longer core dump if it encounters such a configuration.
- The memory error in the LoadLeveler String library is corrected to prevent crashes if the function is used.
- The LoadLeveler commands will not generate the 2512-030 error message when there is no /etc/LoadL.cfg file on the system.
- Resource Manager only:
- The Startd has been fixed to ensure that the correct effective user ID is used when cleaning up job status and job usage files in the execute directory during job termination.
- Scheduler only:
- LoadLeveler will now select and hold CPUs that are already in use for top dog usage; therefore, other jobs can now run with CPUs that are currently available.
Problems fixed in LoadLeveler 4.1.1.6 [September 1, 2011]
- LoadLeveler can now handle jobs from users who belong to more than 64 system groups.
- LoadLeveler is now able to support ETHoIB using the bond0 interface mapped to an IB User Space device on Linux systems if APAR IV06393 for the rsct.lapi.rte fileset is also applied.
Problems fixed in LoadLeveler 4.1.1.5 [July 28, 2011]
- Multiple configuration editor and form-based GUI issues are resolved.
- LoadLeveler will not submit the job if there is no class in the default class list that can satisfy the job requirements.
- LoadLeveler now creates cpuset files with permissions that are searchable by non-root users under the /dev/cpuset directory.
- The unthread_open() error in the Schedd Log will no longer be printed when querying the remote cluster job since LoadLeveler will no longer try to route a nonexistent remote submit job command file in a multi cluster environment.
- LoadLeveler has been enhanced so it now displays the job eligibility time.
- Intel MPI and Open MPI are now supported under LoadLeveler.
- Resource Manager only:
- The llctl command is now able to support "start drained" option on the remote node.
- Scheduler only:
- The LoadLeveler LoadL_negotiator daemon will not core dump when processing a multi-step job which contains a dependency statement longer than 2048 characters.
- LoadLeveler "llq -s" command will provide information about why a step is in Deferred state.
- The llsummary command and API will no longer core dump if the number of history files is greater than or equal to the PTHREAD_DATAKEYS_MAX constant value.
Problems fixed in LoadLeveler 4.1.1.4 [May 27, 2011]
- The llctl command will now check to make sure the Schedd daemon's port is available to be used before starting up LoadLeveler.
- A new keyword, restart, is implemented for the class stanza in the admin configuration.
- If a value is not set for the keyword max_starters in database configuration mode, the default value used for max_starters will be adjusted when the count of classes specified in the keyword class is changed.
- Absolute paths containing http/https are changed to relative paths for the configuration editor to run.
- Resource Manager only:
- LoadLeveler will now set the right environment variables when executing the user epilog script.
- The llmkres command should now be able to create the reservations consistently without hitting the timing error message 2512-856.
- In a multicluster environment, the llq -s command will now invoke the correct query command on the remote cluster.
- Scheduler only:
- Modifying the recurring reservation's attribute will now be seen in the first occurrence's attribute value under the llqres -l command.
- A unique security issue has been identified for TWS LoadLeveler Web User Interface that could potentially compromise your system. It is recommended that you apply this update to protect your system.
Problems fixed in LoadLeveler 4.1.1.3 [March 25, 2011]
- The ability to set the name_server in LoadLeveler is now disabled. The setting under LoadLeveler will now always be set to DNS.
- When configuring class limits using the config editor, adding or updating a limit failed when there was more than one class limit. The config editor can now be used to update class limits or add new hard and soft limits.
- If the class-user sub-stanzas in the "default" class stanza are not defined in alphabetical order, the class-user sub-stanzas might incorrectly inherit the wrong values from the default class. LoadLeveler will now inherit the default values for the class-user sub-stanzas from the "default" class stanza correctly.
- On Linux/P nodes, a job requesting memory affinity with MCM_MEM_NONE always consumed memory from the local MCM and started paging once memory on the local MCM was overconsumed, even though memory was available on other MCMs on the node. Now, if a job is submitted with the memory affinity option MCM_MEM_NONE, the task is bound to all the MCMs on the node and memory is consumed from all the MCMs on the node.
- An incorrect spelling prevented the class stanza keyword striping_with_minimum_networks from being set when DB configuration was used. The spelling of the column name in the database is now corrected.
- LoadLeveler schedd may ignore jobs if the job queue contains invalid job keys. Now, LoadLeveler schedd will collect the correct job data when scanning the job queue files.
- The llrstatus -a reports "No adapters are available" after issuing the llrctl reconfig command. When a machine running a Resource Manager or Region Manager daemon is reconfigured, information about adapters on other machines was being wiped out. The configuration processing code in Resource Manager and Region Manager has been fixed so that existing adapter information will remain intact.
- When configuring the resources=keyword(all) in the machine group stanza in database mode, the llstatus -R command will show no resources being set. Resources will now become effective when setting the resources=keyword(all) in the machine group stanza in database mode.
- The schedd can core dump when a scale-across multi-cluster environment is configured incorrectly. This can happen if scale-across multi-cluster is configured and the same cluster stanza is specified as local for more than one cluster. LoadLeveler is changed to protect the schedd from core dumping when the same cluster stanza is configured as local for more than one cluster in a scale-across multi-cluster environment.
- Resource Manager only:
- The LoadL_startd daemon leaks memory due to a failure to release memory allocated for data structures to hold switch table data for a job step. The LoadL_startd daemon is corrected to release all memory allocated for data structures to hold switch table data for a job step, when the job step data structure is de-allocated.
- A crash may occur in either the resource manager daemon or the negotiator daemon if those daemons received incorrect routing data during an update from startd. This could have happened when the feature keyword was used in the machine_group stanza under the administration file or database setup. The correct bits are now set by the startd daemon so that routing of the data will not cause the resource manager or the central manager to core dump.
- Scheduler only:
- LoadLeveler will occasionally show the wrong number of class resource slots or even miss some classes from the llclass output if too many class query requests come in simultaneously. LoadLeveler is now fixed to show the correct class resources in the llclass output.
- When maxidle is used for a given user within a class, dependent steps can be queued at a higher priority than non-dependent steps. Dependent steps are not given a new qdate when they are put onto the idle queue, while steps at the maxidle limit for a given user within a class are given a new qdate and a new sysprio based on that qdate. A change was made so that dependent steps are also counted as "queued" steps for the purposes of enforcing maxqueued and maxidle limits, and so a dependent step which is at the maxidle limit will get a new qdate.
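The 4.1.1.3 fix for class-user sub-stanzas (inheriting from the "default" class stanza regardless of the order in which the sub-stanzas are defined) amounts to matching sub-stanzas by user name rather than by position. The sketch below shows that idea only; the dictionary layout and function name are hypothetical and do not reflect LoadLeveler's internal data structures.

```python
def inherit_user_defaults(default_users, class_users):
    """Fill in missing class-user values from the "default" class stanza.

    Both arguments map a user name to that user's sub-stanza settings.
    Sub-stanzas are matched by user name, so definition order is irrelevant.
    """
    merged = {}
    for user, settings in class_users.items():
        inherited = dict(default_users.get(user, {}))  # look up by name
        inherited.update(settings)  # values set in the class itself win
        merged[user] = inherited
    return merged


# The default stanza lists its users in non-alphabetical order on purpose.
default_users = {
    "zoe": {"maxjobs": 2, "maxidle": 1},
    "adam": {"maxjobs": 8},
}
class_users = {
    "adam": {"maxidle": 4},
    "zoe": {"maxjobs": 5},
}
print(inherit_user_defaults(default_users, class_users))
```

Because the lookup is keyed on the user name, "adam" can never pick up values that were defined for "zoe", whatever the stanza order.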
Problems fixed in LoadLeveler 4.1.1.2 [January 28, 2011]
- The LoadLeveler startd drain status will be lost if the negotiator daemon restarts. Fixed the startd drain status to be stored onto each individual startds. When the negotiator daemon restarts, the startd drain information will be restored from all the startds.
- The llsummary command might crash if the default class requirement value doesn't match the job requirement value. Fixed the llsummary command to select the correct requirement value from the default class list if there is no job class specified in job command file.
- The llsummary command will fail when it tries to access invalid data memory in the job history file. Fixed the llsummary command to be able to ignore the bad data areas and just report the valid data in the job history file.
- LoadLeveler schedd may ignore jobs if the job queue contains invalid job keys. The schedd will now collect the correct job data when scanning the job queue files.
- The LoadLeveler command, llmodify, has a limitation where the startdate and wall_clock_limit job attributes cannot be modified for idle jobs. llmodify is now enhanced to be able to modify the startdate and wall_clock_limit job attributes for idle jobs.
New documentation:
- In the LoadLeveler Command and API Reference, SC23-6701-00, under Chapter 1. Commands, llmodify - Change attributes of a submitted job step,
- New keyword wall_clock_limit for the -k option: Changes the wall clock limit of a job step. The value of the specified wall clock limit must be longer than the value of the current wall clock limit. This is a LoadLeveler administrator only option.
- New keyword startdate for the -k option: Changes the start time of an idle-like job step. This is a LoadLeveler administrator only option.
- Resource Manager only:
- User jobs were not launched on AIX if the group name did not match the one from the job submission. Fixed the job launch program so that it does not need to verify the group name; jobs are executed using the submitting GID number.
- A crash may occur in either the LoadL_resource_mgr daemon or the LoadL_negotiator daemon when the feature keyword is used in the machine_group stanza under the administration file or database setup. A fix has been made in supporting the specifying of the feature keyword in the machine_group stanza in the administration file or in the database.
- Scheduler only:
- LoadLeveler machines and jobs may have the wrong state if some startd are down and the region manager is enabled. Fixed LoadLeveler to handle machines and jobs status correctly when the region manager detects a machine to be down.
- LoadLeveler was trying to load the network table for jobs with job_type=MPICH, and the job would fail to run if the network table could not be loaded. Since jobs with job_type=MPICH do not require the loading of the network table, LoadLeveler will no longer load the network table when this job type is specified in the job command file.
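The rule documented for the new llmodify wall_clock_limit keyword in 4.1.1.2 (the specified limit must be longer than the current limit) can be expressed as a simple check. The sketch below is illustrative only: the function names are hypothetical, and the colon-separated "[[hours:]minutes:]seconds" text form is an assumption about how the limit is written.

```python
def to_seconds(limit):
    """Convert "[[hours:]minutes:]seconds" text to a number of seconds."""
    seconds = 0
    for field in limit.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def check_new_wall_clock_limit(current, requested):
    """Return the requested limit in seconds if it is longer than the current one."""
    new_value = to_seconds(requested)
    if new_value <= to_seconds(current):
        raise ValueError(
            "new wall clock limit must be longer than the current limit"
        )
    return new_value


print(check_new_wall_clock_limit("1:30:00", "2:00:00"))
```

A request that is equal to or shorter than the current limit is rejected, which mirrors the documented restriction.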
Problems fixed in LoadLeveler 4.1.1.1 [December 10, 2010]
- Fixed the schedd daemon so it will not crash if the job's output file path contains the "%" character.
- Fixed the -s and -e options in the llsummary command to report all the jobs that match the filter requirement. In the TWS LoadLeveler documentation, Command and API Reference and the llsummary.l manual page, the -s and -e options will state the accounting data report will contain information about every job that contains at least one step that falls within the specified range.
- Resource Manager only:
- Fixed the processor affinity environment to be setup correctly for jobs to run in when the job prolog is configured in LoadLeveler.
- Scheduler only:
- Fixed the llclass command to show the correct value for the "Free Slots" field when LoadLeveler is configured to use the LL_DEFAULT scheduler.
- Fixed the llchres command to check requested node additions to make sure that those nodes have no jobs running on them and are not already assigned to another reservation. If no idle nodes can be found, the llchres command will fail.
- Fixed LoadLeveler to correctly reserve the reservation's resources after the central manager daemon restarts so that jobs with overlapping resources with the reservations will not be allowed to start.
- Fixed the central manager to make sure pending status changes to the machines are properly locked so that jobs being scheduled to the down machines will no longer crash the central manager daemon.
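The documented behavior of the llsummary -s and -e options in 4.1.1.1 (the report covers every job that contains at least one step that falls within the specified range) can be sketched as follows. The job layout, the step_dates field, and the use of a single date per step are assumptions made for the example, not details of the history file format.

```python
from datetime import date


def jobs_in_range(jobs, start, end):
    """Return the jobs that have at least one step dated within [start, end].

    Each job is a dict with a hypothetical "step_dates" list, one date per step.
    A matching job is reported as a whole, including its out-of-range steps.
    """
    return [
        job
        for job in jobs
        if any(start <= step_date <= end for step_date in job["step_dates"])
    ]


jobs = [
    {"id": "job1", "step_dates": [date(2010, 11, 30), date(2010, 12, 2)]},
    {"id": "job2", "step_dates": [date(2010, 11, 1)]},
    {"id": "job3", "step_dates": [date(2010, 12, 5), date(2010, 12, 20)]},
]
selected = jobs_in_range(jobs, date(2010, 12, 1), date(2010, 12, 10))
print([job["id"] for job in selected])
```

Here job1 and job3 qualify because one of their steps falls inside the range, while job2 has no step in the range and is left out.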
Copyright and trademark information
http://www.ibm.com/legal/copytrade.shtml
Notices
INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE. Some jurisdictions do not allow disclaimer of express or
implied warranties in certain transactions, therefore, this
statement may not apply to you.
This information could include technical inaccuracies or
typographical errors. Changes are periodically made to the
information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or
changes in the product(s) and/or the program(s) described in this
publication at any time without notice.
Microsoft, Windows, and Windows Server are trademarks of Microsoft
Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino,
Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and
Pentium are trademarks or registered trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
Other company, product, or service names may be trademarks or
service marks of others.
THIRD-PARTY LICENSE TERMS AND CONDITIONS, NOTICES AND INFORMATION
The license agreement for this product refers you to this file for
details concerning terms and conditions applicable to third party
software code included in this product, and for certain notices
and other information IBM must provide to you under its license
to certain software code. The relevant terms and conditions,
notices and other information are provided or referenced below.
Please note that any non-English version of the licenses below is
unofficial and is provided to you for your convenience only. The
English version of the licenses below, provided as part of the
English version of this file, is the official version.
Notwithstanding the terms and conditions of any other agreement
you may have with IBM or any of its related or affiliated entities
(collectively "IBM"), the third party software code identified
below are "Excluded Components" and are subject to the following
terms and conditions:
- the Excluded Components are provided on an "AS IS" basis
- IBM DISCLAIMS ANY AND ALL EXPRESS AND IMPLIED WARRANTIES AND CONDITIONS WITH RESPECT TO THE EXCLUDED COMPONENTS, INCLUDING, BUT NOT LIMITED TO, THE WARRANTY OF NON-INFRINGEMENT OR INTERFERENCE AND THE IMPLIED WARRANTIES AND CONDITIONS OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- IBM will not be liable to you or indemnify you for any claims related to the Excluded Components
- IBM will not be liable for any direct, indirect, incidental, special, exemplary, punitive or consequential damages with respect to the Excluded Components.