Description: Readme documentation for IBM Spectrum MPI 10.1.0 Fix Pack 4, including installation instructions, prerequisites, and a list of fixes.
Readme file for: IBM Spectrum MPI
Product/Component Release: 10.1.0
Update Name: Fix pack 4
Fix ID: Spectrum_MPI_10.01.00.04
Publication date: 29 August 2017
Last modified date: 29 August 2017
View the IBM Spectrum MPI 10.1.0 Fix Pack 4 website (http://www.ibm.com/systems/spectrum-computing/products/mpi/index.html).
IBM Spectrum MPI
None.
IBM Spectrum MPI must be
installed on all machines in the same directory or be accessible through the
same shared network path. The following describes the process of installing the
product using the RPM toolset.
Step 1: Obtain the software packages.
Step 2: Ensure that you have root authority to install the packages.
Step 3: Install the RPM packages and accept the license.
To review the license terms and manually accept the license: if you did not set the environment variable IBM_SPECTRUM_MPI_LICENSE_ACCEPT=yes, the license RPM installation prints the location of a license acceptance script that must be run to review and accept the license.
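For example, a minimal sketch of a non-interactive installation, assuming root authority; the RPM file names below are illustrative placeholders, since actual package names vary by version and platform:

export IBM_SPECTRUM_MPI_LICENSE_ACCEPT=yes
rpm -ivh <license-package>.rpm
rpm -ivh <product-package>.rpm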
None.
Spectrum MPI one-sided operations are not thread safe. Multi-threaded applications that require MPI_THREAD_MULTIPLE and use MPI one-sided operations may experience silent data corruption.
Spectrum MPI 10.1.0.4 has disabled all MPI one-sided communication for jobs that initialize the MPI execution environment with MPI_THREAD_MULTIPLE. This is enforced in the MPI_Init_thread() API when MPI_THREAD_MULTIPLE is requested.
Users of Spectrum MPI 10.1.0.4 who call MPI_Init_thread() with MPI_THREAD_MULTIPLE will see the following error at the first call to MPI_Win_create():
[<host>,<pid>] *** An error occurred in MPI_Win_create
[<host>,<pid>] *** reported by process [3497459713,0]
[<host>,<pid>] *** on communicator MPI_COMM_WORLD
[<host>,<pid>] *** MPI_ERR_WIN: invalid window
[<host>,<pid>] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[<host>,<pid>] ***    and potentially your MPI job)
There are no known issues using MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED,
or MPI_THREAD_SERIALIZED.
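For reference, a minimal C sketch (illustrative, not part of this readme) of the pattern that triggers the error above: requesting MPI_THREAD_MULTIPLE at initialization and then creating a window.

#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    int buf[4];
    MPI_Win win;

    /* Request MPI_THREAD_MULTIPLE; Spectrum MPI 10.1.0.4 disables
       one-sided communication when this level is requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* With 10.1.0.4 this first window creation fails with
       MPI_ERR_WIN, producing the error output shown above. */
    MPI_Win_create(buf, sizeof(buf), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}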
The MPI standard defines several APIs that allow for synchronization between the different MPI processes involved in a one-sided operation. In some cases, when Spectrum MPI is used with the PAMI interconnect protocol, the one-sided synchronization calls may return early, before the one-sided remote operation is complete. The early return is intermittent, and not reliably reproducible in all environments.
The following MPI APIs may return early, before the remote operation is complete (see the sketch after this list):
o MPI_Win_fence()
o MPI_Win_unlock()
o MPI_Win_complete()
o MPI_Win_wait()
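For context, a minimal C sketch, assuming at least two ranks, of a fence-synchronized epoch; under this issue, the closing MPI_Win_fence() may return before the MPI_Put() has completed at the target:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, local = 42, target = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(&target, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);        /* open the access epoch */
    if (rank == 0)                /* assumes at least two ranks */
        MPI_Put(&local, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);        /* close the epoch; per this issue, the
                                     call may return before the MPI_Put()
                                     is complete at the target */
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}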
Silent data corruption will occur with IBM Spectrum MPI when using MPI_Reduce_scatter(), MPI_Ireduce_scatter(), MPI_Reduce_scatter_block(), and/or MPI_Ireduce_scatter_block() for message buffer sizes of 8 KB or larger, when the datatype size and the datatype extent are different.
The datatype size is the amount of data that will be transferred by the MPI collective operation; that is, the amount of memory, in bytes, consumed by the actual data of one instance of the datatype.
The datatype extent is the amount of memory spanned from the first byte to the last byte occupied by one entry of this datatype. The datatype extent may include padding space between successive elements to allow for more optimal memory placement of the datatype.
User-defined non-contiguous datatypes are the most common case in which the datatype size and the datatype extent do not match.
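As an illustration, a minimal C sketch of a datatype whose size and extent differ; the vector type below carries 16 bytes of data (size) but spans 28 bytes of memory (extent):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype vec;
    MPI_Aint lb, extent;
    int size;

    MPI_Init(&argc, &argv);

    /* 4 blocks of 1 int with a stride of 2 ints: the type carries
       4 * 4 = 16 bytes of data but spans (3 * 2 + 1) * 4 = 28 bytes. */
    MPI_Type_vector(4, 1, 2, MPI_INT, &vec);
    MPI_Type_commit(&vec);

    MPI_Type_size(vec, &size);               /* size   = 16 */
    MPI_Type_get_extent(vec, &lb, &extent);  /* extent = 28 */
    printf("size=%d extent=%ld\n", size, (long)extent);

    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}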
As a workaround with previous versions of IBM Spectrum MPI 10.1.0.x, the issue can be avoided by disabling the specific collective algorithm. Adding the following options to the mpirun command line avoids the issue for both the blocking and non-blocking MPI APIs:
-mca coll_ibm_skip_reduce_scatter true -mca coll_ibm_skip_ireduce_scatter true
Silent data corruption may occur when using the MPI_Ireduce non-blocking collective. Customers will experience silent data corruption using the MPI_Ireduce collective with Spectrum MPI when the following conditions are met:
o 4 or fewer ranks participate in the collective
o the message size is 64 KB or larger
o MPI_IN_PLACE is used
Silent data corruption may occur for
Spectrum MPI applications that use the one-sided operations MPI_Put()
or MPI_Accumulate() with user-defined non-contiguous
datatypes on Power systems.
This error only affects applications
running on Power (ppc64le) systems.
Applications running on x86_64 systems are not affected.
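For reference, a minimal C sketch, assuming at least two ranks, of the affected pattern: MPI_Put() with a user-defined non-contiguous (vector) datatype:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, src[8] = {0}, dst[8] = {0};
    MPI_Datatype vec;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* User-defined non-contiguous type: 4 ints, every other element. */
    MPI_Type_vector(4, 1, 2, MPI_INT, &vec);
    MPI_Type_commit(&vec);

    MPI_Win_create(dst, sizeof(dst), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0)                /* assumes at least two ranks */
        MPI_Put(src, 1, vec, 1, 0, 1, vec, win);
    MPI_Win_fence(0, win);

    MPI_Win_free(&win);
    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}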
Spectrum MPI uses the "-pami" interconnect protocol by default. The Mellanox fabric collectives must be explicitly requested using the mpirun command line option "-HCOLL/-hcoll" or "-FCA/-fca". When these options are used, some MPI_Reduce() operations can result in silent data corruption.
As a workaround with previous versions of IBM Spectrum MPI 10.1.0.x, users can add "-x HCOLL_ML_DISABLE_REDUCE=1" to the mpirun command line. This disables the HCOLL MPI_Reduce() algorithm and causes another MPI_Reduce() algorithm to be selected to complete the transaction.
MPI_Gatherv(), MPI_Igatherv(), MPI_Allgatherv(), MPI_Iallgatherv(), MPI_Alltoallv(), and/or MPI_Ialltoallv() all pack data before transmission and unpack data after transmission. User-defined datatypes with a true lower bound that is not zero are not unpacked correctly. This may cause silent data corruption in the receive-side buffer, or may result in segmentation violations (segfaults).
As a workaround with previous versions of IBM Spectrum MPI 10.1.0.x, the specific IBM collective algorithms may be disabled to avoid the issue. To disable all coll/ibm collectives with a single mpirun command line option:
-mca coll ^ibm
Alternatively, disable just those specific collectives (blocking and non-blocking) affected by the issue. This is the preferred method:
-mca coll_ibm_skip_gatherv t[rue]
-mca coll_ibm_skip_allgatherv t[rue]
-mca coll_ibm_skip_alltoallv t[rue]
-mca coll_ibm_skip_igatherv t[rue]
-mca coll_ibm_skip_iallgatherv t[rue]
-mca coll_ibm_skip_ialltoallv t[rue]
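As an illustration, a minimal C sketch of a datatype with a non-zero true lower bound, the kind of type affected by this issue; the single data block starts three ints into the buffer, so the true lower bound is 12 bytes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Datatype t;
    MPI_Aint true_lb, true_extent;
    int blocklen[1] = {2};
    int disp[1]     = {3};    /* block starts 3 ints into the buffer */

    MPI_Init(&argc, &argv);

    /* One block of 2 ints at displacement 3: the first byte of real
       data is at offset 12, so the true lower bound is not zero. */
    MPI_Type_indexed(1, blocklen, disp, MPI_INT, &t);
    MPI_Type_commit(&t);

    MPI_Type_get_true_extent(t, &true_lb, &true_extent);
    printf("true_lb=%ld true_extent=%ld\n",
           (long)true_lb, (long)true_extent);  /* true_lb=12 true_extent=8 */

    MPI_Type_free(&t);
    MPI_Finalize();
    return 0;
}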
When MPI_Type_create_darray() is used with a list of distribs[] arguments that contains both MPI_DISTRIBUTE_BLOCK and MPI_DISTRIBUTE_CYCLIC, the resulting datatype may have incorrect offsets and thus produce silent data corruption of data transferred with MPI calls that use the affected datatype. The datatype layout is incorrect, causing data to be placed at the wrong memory offsets.
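For reference, a minimal C sketch, assuming a four-process (2 x 2) grid, of a darray datatype that mixes MPI_DISTRIBUTE_BLOCK and MPI_DISTRIBUTE_CYCLIC in distribs[]:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    int gsizes[2]   = {8, 8};
    int distribs[2] = {MPI_DISTRIBUTE_BLOCK, MPI_DISTRIBUTE_CYCLIC};
    int dargs[2]    = {MPI_DISTRIBUTE_DFLT_DARG, MPI_DISTRIBUTE_DFLT_DARG};
    int psizes[2]   = {2, 2};
    MPI_Datatype darray;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Mixing BLOCK and CYCLIC distributions in distribs[] is the
       pattern that produced incorrect offsets in this issue.
       Assumes the job is run with exactly 4 processes (2 x 2 grid). */
    MPI_Type_create_darray(4, rank, 2, gsizes, distribs, dargs,
                           psizes, MPI_ORDER_C, MPI_INT, &darray);
    MPI_Type_commit(&darray);

    MPI_Type_free(&darray);
    MPI_Finalize();
    return 0;
}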
MPI_Accumulate() is used to combine data that is received from a remote process with data that already exists in a local process. One typical programming model is to allow multiple remote ranks to all MPI_Accumulate() data into a single target buffer.
In some cases, silent data corruption can occur when multiple ranks use MPI_Accumulate() to combine data in a single target buffer. This silent data corruption only occurs when the RDMA one-sided component (osc=rdma) is used.
This issue only applies to supported options on x86_64 systems. The issue is not present in the PAMI (osc=pami) component or in the point-to-point (osc=pt2pt) component.
There is no workaround for this issue when using the osc=rdma component.
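For reference, a minimal C sketch of the affected pattern: every rank uses MPI_Accumulate() to combine its contribution into a single target buffer on rank 0:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, contrib, result = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    contrib = rank + 1;

    MPI_Win_create(&result, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    /* Every rank combines its contribution into the single target
       buffer on rank 0: the pattern affected when osc=rdma is used. */
    MPI_Accumulate(&contrib, 1, MPI_INT, 0, 0, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("sum = %d\n", result);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}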
Silent data corruption may occur when using MPI_Recv, MPI_Irecv, MPI_Sendrecv, MPI_Recv_init, MPI_Mrecv, and/or MPI_Imrecv to receive data into a user-defined non-contiguous receive buffer that is allocated on the GPU. The gaps in the receive data buffer may be overwritten during the data transfer.
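For reference, a minimal C sketch, assuming at least two ranks and a CUDA-aware build, of the affected pattern: receiving into a non-contiguous receive buffer allocated on the GPU with cudaMalloc():

#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    int rank, host[4] = {1, 2, 3, 4};
    int *gpu_buf;
    MPI_Datatype vec;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Non-contiguous receive type: 4 ints, every other element. */
    MPI_Type_vector(4, 1, 2, MPI_INT, &vec);
    MPI_Type_commit(&vec);

    cudaMalloc((void **)&gpu_buf, 8 * sizeof(int));

    if (rank == 0) {                  /* assumes at least two ranks */
        MPI_Send(host, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Receiving into a non-contiguous GPU buffer: per this issue,
           the gaps in gpu_buf may be overwritten during the transfer. */
        MPI_Recv(gpu_buf, 1, vec, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(gpu_buf);
    MPI_Type_free(&vec);
    MPI_Finalize();
    return 0;
}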
· Update to Open MPI 2.0.1
· Add support for STAT Debugger
· Add support for PGI Compilers
· Add support for Mellanox HCOLL on RHEL 7.3 with MOFED 3.4 (Power only)
· Add support for usNIC (x86_64 only)
· Add support for PSM2 on RHEL 7 and SLES 12 (x86_64 only)
· Add "-pami_noib" option to allow PAMI shmem use on a single node without InfiniBand (Power only)
· Add check for license acceptance in $MPI_ROOT and the default install location (/opt/ibm/spectrum_mpi)
· Add support for Dynamic Connect Transport
· Add support for non-blocking collectives with CUDA Aware
· Fix issue with serial CUDA jobs
· Fix epoll ADD failure
· Fix "mpirun --debug" option
· Fix "-stdio file" option
· Fix LD_PRELOAD to honor user settings
· Fix shmem object leaks in PAMI
· General improvements for IBM libcoll
· General improvements for RMA (one-sided) operations
In RMA operations, when a communication epoch is closed with one of the one-sided synchronization APIs listed above (MPI_Win_fence(), MPI_Win_unlock(), MPI_Win_complete(), or MPI_Win_wait()), the application could experience silent data corruption of the receive-side memory buffers.
Copyright IBM Corporation 2017
U.S. Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corp.
IBM, the IBM logo and ibm.com are
trademarks of International Business Machines Corp., registered in many
jurisdictions worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is available on the
Web at "Copyright and trademark information" at
www.ibm.com/legal/copytrade.shtml.