History of problems fixed
The following fixes apply to IBM PE_RTE updates for both Power and x86 Systems, unless noted otherwise.
IBM PE_RTE 1.2.0.17 [September 29, 2015]
MPI:
MPICH:
PAMI:
- Critical fixes for Silent Data Corruption
- General fixes
PDB:
POE:
IBM PE_RTE 1.2.0.16 [October 30, 2014]
MPI:
MPICH:
PAMI:
- Critical fixes for Silent data corruption in certain PE RTE PAMI and MPICH apps
PDB:
POE:
IBM PE_RTE 1.2.0.15 [May 16, 2014]
MPI:
MPICH:
PAMI:
- Critical fixes for Silent data corruption in some PAMI, MPICH, & OpenSHMEM apps, PAMI Fence Operations
- General fixes
PDB:
POE:
IBM PE_RTE 1.2.0.14 [January 13, 2014]
MPI:
MPICH:
- GENCI: negative source returned from MPI_Waitany
- General fixes.
POE:
- Fixed a problem in mpfort due to library path name.
- Fixed a problem with line numbers missing in lightweight core files.
- Fixed a problem that caused 0031-161 EOF ON SOCKET CONNECTION error message.
- Fixed an issue that generated excessive GOMP_CPU_AFFINITY warning messages.
- Fixed a problem in some PE compiler scripts which write error messages to stdout.
- General fixes.
MDCR:
PAMI:
- Changed the default values for MP_POLLING_INTERVAL and MP_DEBUG_SLOT_DATA_SIZE.
- General fixes.
PDB:
IBM PE_RTE 1.2.0.13 [May 28, 2013]
MPI:
- Possible data corruption with Linux RDMA applications that change malloc behavior.
POE:
MDCR:
PAMI:
- Possible data corruption with Linux RDMA applications that change malloc behavior.
PDB:
IBM PE_RTE 1.2.0.12 [March 15, 2013]
MPI:
- Display a warning message in cases where FCA is not supported.
- Add support for MP_EAGER_LIMIT data sizes with a K or M value.
- Fixed an issue in MPI_Scatterv with complex data types.
- Fixed a deadlock due to out-of-order messages.
- Fixed a problem in MPI_SendRecv with data sizes less than 232 bytes.
- Fixed a Linux Fortran module compatibility issue.
- Fixed various Intel MPI binary compatibility issues.
- Fixed a problem with the MPI IO file format.
- Changed the MP_EUIDEVELOP default to "min".
- Fixed a problem where MPI jobs hang with MPI_Iprobe using mpich2.
- Fixed a problem with the number of available MPI-IO threads.
- Fixed an "internal buffer too small" error with MPCI.
- Fixed an issue with library paths.
- Changed alltoall throttle to 32 outstanding sends.
- Fixed a shared memory assert case on initialization.
- Ensure that MPI_Wtime is synchronized with MPICH2.
POE:
- Fixed an issue with creating an initial PMD log in /tmp.
- Added out of memory killer support.
- Fixed a problem where POE reports a signal number when a task exits.
- Fixed a problem with user space jobs hanging on nodes where InfiniBand is not configured.
- Warn users when lightweight core files are requested but the system is not able to generate them.
- Updated PMD to dynamically open shared libraries where appropriate.
- Change POE to use a default of core task affinity when MP_PE_AFFINITY is set for x86 environments.
- Fixed a problem with MP_COREFILE_FORMAT set to STDERR causing poe pulse timeouts.
- Fixed a problem with the format of the debug socket data.
- Fixed an issue in standalone POE to set the XLSMPOPTS bindlist option.
- Updated static CPU set configuration command.
- Fixed a LoadLeveler compatibility issue with PE 1.1 and PE 1.2.
- Updated packaging & installation for support of multiple releases & rolling upgrades.
- Fixed an issue with SSH security authentication message exchanges.
- Fixed a problem with incorrect endpoint values while running sub jobs.
MDCR:
PAMI:
- Fixed several issues in the pe_node_diag script displaying host name & shmmax results.
- Fixed an issue with MPI & MPICH support of concurrent sub jobs.
- Disable FCA support of IP jobs over InfiniBand.
- Added out of memory killer PNSD support.
- Support adapter and memory affinity on x86.
- Fixed a problem with PNSD IB port numbers displayed incorrectly when the adapter is queried.
- Fixed a problem with opening a lot of pipes & spawning processes excessively.
- Display the affinity CPU set data as part of debug info for each task.
- Fixed an issue with displaying confusing debug messages.
- Fixed an issue with overriding child task information for sub jobs.
- Fixed a problem in the alltoall must-query algorithms.
- Fixed a LoadLeveler compatibility issue with PE 1.1 and PE 1.2.
- Prevent BSR & CAU debug output messages when not requested.
- Fixed an issue with incorrect IPv6 and network information for x86 nodes.
- Allow an option to prevent "HFI_FORK_SAFE" from being set.
- Fixed an issue with LAPI when MP_RELIABLE_HW is set to an incorrect value.
- Fixed an RDMA timeout issue when InfiniBand ports are masked.
- Fixed a problem restarting PNSD.
- Fixed a problem where PNSD configuration file could not be read.
- Fixed a problem with MPI_Reduce with mp_collective_selection = yes.
- Changed the MP_RFIFO_SIZE default to 16MB.
- Fixed a PNSD problem when XRC is enabled & the number of available file descriptors is exceeded.
- Fixed an issue that could cause an application to core dump.
- Fixed a compliance issue for IBM OpenSHMEM support.
- Fixed cpuid inline assembly to avoid register clobbering.
- Communication resources are not cleaned up properly after abnormal termination of jobs.
- Fixed possible application hangs when sending small messages if MP_RELIABLE_HW=yes.
- Fixed MPI_COMM_SPLIT core issue when MP_COLLECTIVE_OFFLOAD=fca.
PDB:
IBM PE_RTE 1.2.0.11 [March 11, 2013]
MPI:
POE:
- Fixed several issues with Linux checkpoint/restart support.
MDCR:
PAMI:
- Prevent BSR from running on P7IH with MCM affinity.
PDB:
IBM PE_RTE 1.2.0.10 [December 19, 2012]
MPI:
- Possible data corruption in collective I/O routines.
- Fixed hang in MPI I/O routines.
- Fixed an issue with RDMA without shared memory.
- MPI-IO operation hangs at MPI_File_Read with large scale jobs.
- Fixed a problem with communication subsystem errors with MPI I/O routines.
POE:
MDCR:
PAMI:
- Fixed a hang in MPI Allreduce.
PDB:
IBM PE_RTE 1.2.0.9 [October 12, 2012]
MPI:
- Added a timeout to prevent a hang with MPI_Finalize.
- Fixed an issue with MPICH library search paths.
- Fixed an issue with low bandwidth when -map keywords are used.
- Fixed an issue with Intel MPI compiler scripts to reduce compile time.
- Fixed a problem with incorrect MPICH shared memory values reported with MP_PRINTENV.
- Corrected a missing symbolic link for binary compatibility.
- Fixed a problem with memory exhausted when MP_BUFFER_MEM is set to zero.
- Fixed a hang when -wait_mode is specified.
- Provided an option to limit FCA use.
POE:
- Fixed an issue with checkpoint/restart of batch jobs.
- Fixed an issue with restarting large scale jobs.
- Fixed a problem with MPMD job launch.
- Fixed a problem with dynamic tasking multiple job launch.
- Added a script for static cpuset configuration.
- Fixed a problem with programs containing large ELF header sizes.
- Added more debug timing information.
- Added support for out of memory conditions.
MDCR:
PAMI:
- Increased the maximum number of contexts for multiple endpoints.
- Add NRT API to check invalid adapter name and type error codes.
- Updated PNSD port configurations.
- Fixed a memory leak in pami_init.
- Fixed an issue with MP_DEBUG_NOTIMEOUT not recognized by PAMI.
- Fixed an issue with point to point benchmark performance.
- Fixed an issue with the hardware multicast function running IP over HFI.
PDB:
IBM PE_RTE 1.2.0.8 [August 27, 2012]
MPI:
- Fixed a problem with a hang with striping.
- Fixed a problem with missing MPI_Sizeof and MPI typeclass Fortran definitions.
- Fixed a problem where MPI_All_reduce fails in INTEL MPICH2.
- Enabled UFS IO compilation to support GPFS.
POE:
- Fixed the parsing of MP_SHMEM_PT2PT MPI environment variables.
- Fixed a problem parsing the hostfile containing invalid characters.
- Fixed a problem where the job failed to run if rmpool and hostfile were both set.
- Fixed a problem with the Linux coscheduler prematurely aborting.
MDCR:
PAMI:
- Fixed a problem caused by double freeing of global objects.
- Added support for deadlock detection.
- Fixed a problem during resume if HAL_OPEN fails.
- Fixed a problem where the job failed to load ntbl after preemption.
- Fixed a problem at geometry creation with multiple endpoints enabled.
- Added a new environment variable to use IB LMC at run time.
- Fixed a problem in RDMA mode when LMC=1 and MP_INSTANCES=2.
- Fixed an issue with increased latency on x86.
PDB:
- Fixed an issue where a PDB interrupt would kill tasks.
IBM PE_RTE 1.2.0.7 [July 19, 2012]
MPI:
- Add n-way binary compatibility for MPICH2 programs.
- Fixed a problem where MPI did not properly create the datatype requested.
POE:
- Fixed a problem with clock reinitialization during a restore.
MDCR:
- Add more log messages when checkpoint cannot connect to the job file.
- Add new options to fix a shared memory problem in container environment.
PAMI:
- Improved multi-threaded application performance.
- Fixed a problem where the PAMI Client query returned a numeric processor name.
- Fixed a problem with a PAMI core dump when program was invoked without POE.
- Increased the PAMI HFI receive FIFO limit from 16MB to 64MB.
- Fixed an issue with Resume not working after full preemption.
- Fixed a problem with timeouts based on wall time instead of real time.
- Fixed a problem when user space initialization hangs with shared memory.
- Fixed multiple issues with geometry creation.
- Fixed a problem causing a timeout in MPI_Finalize at large task counts.
- Fixed a memory leak with MPI_Allgather in PAMI_Type_create.
- Improved the scalability in RC FIFO mode.
- Fixed a problem causing memory growth when using RDMA.
- Handled Client_create failure when MP_MSG_API is not set.
- Return error if total number of shared memory endpoints is greater than 128.
- Moved initialization of Client's gc_read after counters are initialized to avoid race condition.
- Fixed a problem that causes start_pe to fail if SHMEM_ERROR_REPORT=print.
- Fixed a problem to improve performance on HFI DD2.0 hardware with SHMEM_DEBUG_HFI20_WORKAROUND=yes.
- Distribute shmem locks to different PEs to avoid congestion.
- Provided user-level RDMA on IB for performance.
- Improve performance by pre-generating all PE's endpoint numbers.
PDB:
IBM PE_RTE 1.2.0.6 [June 19, 2012]
MPI:
- Fixed a problem with the MPI_File file handle size for Fortran.
- Fixed a problem with incorrect statistics data from an MPICH2 job.
- Fixed a performance problem by removing PAMI_Rget from mpich2 shared memory path.
- Fixed a problem where clock source changed after a restart.
- Fixed a problem with missing library symbolic links for OpenMP ABI.
- Fixed a problem with missing _wtime_global symbol.
- Fixed a hang in ibsend.
POE:
- Fixed a problem with clock reinitialization during a restore.
- Fixed an issue with jobs failing with POE pulse timeouts.
- Updated AIX packaging to ensure latest SCI daemon gets started.
- Fixed a problem running under LDAP.
- Fixed a problem running jobs from a script.
- Updated packaging to allow SCI to be shared across other components.
MDCR:
- Fixed an issue with redirected STDOUT.
PAMI:
- Improved performance by removing reliability acknowledgements.
- Fixed a memory leak with IP and IP over InfiniBand.
- Fixed a hang with shared memory in reliable hardware mode.
- Fixed a problem with the PNSD API memory leak on resume.
- Fixed a hang with multiple client create cases.
- Fixed a problem compiling Fortran programs.
- Fixed a problem in MPI_Win_fence with multiple MPI jobs.
- Fixed an abnormal termination or hang with 8K tasks.
PDB:
- Fixed a problem where the wrong taskid is reported.
- Fixed a problem with a core dump when PDB failed to launch the job.
- Fixed a problem attaching to tasks.
IBM PE_RTE 1.2.0.5 [June 11, 2012]
MPI:
- Fixed a problem with hangs in MPI jobs after MPI_Finalize times out.
- Fixed a problem with nonblocking MPI IO failures with MPICH2.
POE:
- Fixed a problem with WAIT next to COMPLETE in a command file.
- Fixed an issue where MPI jobs were killed when using the POE run-queue co-scheduler.
- Fixed a problem with the coscheduler library path not being correctly set.
- Added support for checkpoint/restart.
- Added packaging & installation improvements.
- Fixed a problem with an empty MP_MSG_API value.
- Fixed an issue with task signal handler reentry.
- Updated pe_node_diag script to display xinetd settings.
- Added MP_EUILIBPATH value to the LD_LIBRARY_PATH when launching the job.
- Fixed a problem with internal messaging for TotalView support.
- Fixed a problem with open files in task child code with task affinity.
- Fixed an issue with I/O buffer overflow when MP_LABELIO=YES is set.
PAMI:
- Fixed an issue with jobs timing out due to AIX clock drift.
- Fixed a problem where PNSD will sometimes crash when processing the STATUS_ADAPTER request.
- Fixed an issue with User Space jobs failing on P7IH.
- Fixed a problem with collective operations under Fortran programs.
- Fixed a problem where the NRT API provides an incorrect return code value on unload functions.
- Fixed a problem with multiple endpoints with large task counts.
- Fixed an issue with long delays & timeouts during job initialization with shared memory in use.
- Fixed a problem with initialization with RDMA when resuming a job.
- Fixed a problem with a hang during sub jobs with MPICH2.
- Added latency performance improvements.
- Fixed a problem with a failure when using shared memory.
- Fixed a hang using multiple protocols.
- Added support to create multiple PAMI contexts.
- Improved shared memory bandwidth performance.
PDB:
- Fixed a problem debugging 32-bit applications.
- Fixed a problem where PDB was consuming a lot of CPU time.
- Fixed a problem with a hang in PDB launch mode.
IBM PE_RTE 1.2.0.2 [April 9, 2012]
MPI:
- Add support for Statistics utility functions.
- Fixed a problem building MPI programs with MPICH2 compiler scripts.
POE:
- Fixed a problem when multiple protocol jobs hang during exit or when real-time signals are generated.
- Fixed an issue with pthread_cancel.
- Fixed a problem accessing the library path from the environment.
- Fixed an issue with long running applications.
PAMI:
- Fixed an issue with wrong answers after a restart.
- Fixed a shared memory leak issue.
- Fixed a FORTRAN C binding name mangling problem.
- Fixed a problem with creating queue pairs.
PDB:
- Fixed a problem debugging 32-bit applications.
IBM PE_RTE 1.2.0.1 [March 19, 2012]
MPI:
- Fixed problems compiling & running MPI/MPICH2 programs.
- Provide the library level used during job execution.
- Fixed MPI program migration issues.
- Fixed RDMA performance issues.
- Fixed an issue with MPI_FILE_READ_ALL hanging.
- Fixed an issue with MPI I/O lockless support.
- Fixed MPI point to point issues.
POE:
- Updated man pages to correct some documentation details.
- Corrected output for the poe -v option to show installed RPM levels.
- Cleaned up the pelinks script for Linux.
- Corrected issues with POE running with huge pages.
- Fixed task affinity issues on x86 Linux platforms.
- Fixed an issue with POE supporting short host names.
- Provided PE run time installation enhancements including automatically setting the SSH library version.
PAMI:
- Fixed an issue with MPI_Reduce_scatter_block producing bad data.
- Fixed issues with MPICH RDMA reporting wrong data.
- Fixed issues with compiling & running MPI/MPICH2 programs.
PDB:
- Fixed an issue where PDB hides output.
- Corrected issues with mangled C++ function names.
- Fixed support initialization issues.