Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 4.2.x applies to all supported platforms.
Problems fixed in IBM Spectrum Scale 4.2.2.3 [March 17, 2017]
- Fix an issue where mmlsdisk may hit a segmentation fault when it receives a SIGTERM.
- Fix a GPFS cluster hang that can occur after an mmdiag --threads command hangs.
- Fix a "No such file or directory" error in gpfs_get_fssnaphandle on Linux.
- Fix a problem in deleting AFM recovery snapshots that are left over after using the mmdelsnapshot command.
- Fix a rare assert "Assert exp(e == E_OK)" which can happen while running the mmcrfs command.
- Fix an 'incompleteOk' assertion that can occur during CCR enable.
- Fix hang problems with reads over an NFS target for AFM filesets.
- Fix a problem in which FlushPending succeeds on a fileset which is in an unlinked state.
- Fix slow system monitoring startup due to a deadlock caused by clearing the FAILED CES flag as part of a failover upon the startup of system monitoring.
- Fix a deadlock that can occur when changeSecondary of a primary fileset is in progress.
- Fix a hang on token requests after token domain recovery. When a token manager fails or the token manager list changes, GPFS performs token domain recovery. DOMAIN_RECOVERING status could cause a token reset request to be mishandled, leaving a token in COPYSET status forever and making subsequent requests on that token hang.
- Fix a GPFS daemon crash that can occur while replaying operations to home with node failures at home.
- Fix a GNR recovery group failure to recover due to too many unstable drives.
- Fix a quorum loss that can occur when tiebreaker disks are not available during GPFS startup (CCR).
- Fix a deadlock that can occur when trying to queue transfers to the old gateway that was serving the fileset. This can occur if the new gateway node for that fileset is running a recovery for that fileset.
- Fix a deadlock in the AFM environment that can occur when a new gateway node is added using mmchnode while IO is in progress.
- Fix a problem where the fileset is left in an intermediate state after a changeSecondary failure.
- Fix a rare race condition which can lead to assert (blockToExpand == ofP->metadata.getLastDataBlock()) in expandLastBlock.
- Fix a no space error in FPO pool if the specified WADFG is invalid.
- Fix a problem in which mmchqos -N node1 pool=system,other=200iops is applied to all nodes and not just node1.
- Fix unexpected NFS errors that can occur during snapshot deletion.
- Fix a problem in which Ganesha operation numbers do not match GPFS operation numbers.
- Fix a problem in the convertToPrimary command where if the command is run with the --secondary-snapname option, it reports that the fileset is left in primInitFail state.
- Fix a problem in which mmrepquota reports unused quota entries when -v is not specified.
- Fix an mmprotocoltrace crash that can occur on a cluster with an enabled sudo wrapper.
- Fix a problem where GPFS did not interpret the return value from IOCINFO ioctl call on AIX correctly. That caused the wrong disk size to be reported during mmadddisk.
- Fix a problem in which mmnfs change config did not accept negative values.
- Fix a potential dirty read during small sequential reads of a compressed file. If the compressed file is overwritten from a different node, the small sequential read could continue to retrieve obsolete data from a small side cache for decompressed data.
- Fix a GPFS daemon failure due to a signal 11 in NotGlobalMutexClass::acquire, called from AUExtentGroup::lcCallback. This bug affects only ESS and GSS systems.
- Fix a problem in which users were not able to use 'DEFAULT' as the value for a Ganesha configuration attribute.
- Fix a problem in which "mmlsquota -g" fails to get gid. This can occur on Linux or AIX if there is a very long line in /etc/passwd or /etc/group.
- Fix a kernel crash that can occur while GPFS is handling the OPENHANDLE_GET_VERIFIER op for Ganesha.
- Fix a problem in which internal recovery snapshots could not be deleted.
- Address a problem where a fileset does not move out of the disconnected state when the home has active NFS running.
- Improve message and user action information when the kernel NFS service is preventing CES NFS service startup.
- Fix a memory leak on client file reads when checksums are enabled.
- Fix error 124 that occurs when resync is run on a file system whose version is lower than 4.1.1.
- Fix a segmentation fault that can occur while purging the CCR request queue.
- Fix an issue where the current working directory (CWD) may appear in the prefix of device names in /proc/mounts and /etc/mtab.
- Fix a problem in which AFM prefetch does not fetch empty directories.
- Fix a problem in which CES cannot assign public IP addresses to CES designated nodes.
- Fix a problem in which a user could not use the '-' character in a netgroup name when attempting to add an export with 'mmnfs export add'.
- Fix a problem in which mmchcarrier --replace fails trying to update firmware. This fix applies to GSS/ESS customers.
- Fix code to prevent a potential kernel crash when performing read/write on a very large file. This could occur when the number of prefetched buffers exceeds 32767.
- Fix a problem in which GPFS is left in the arbitrating state. This can occur if you configure a tiebreaker disk that not all quorum nodes have access to.
- Fix a kernel panic that occurs when the GPFS Dead Man Switch timer expires in GSS and ESS configurations.
- This update addresses the following APARs: IV93134 IV93163 IV93596 IV93597 IV93710 IV93711 IV93712 IV93713 IV93714 IV93715 IV93716 IV93813 IV93952 IV93953 IV93956 IV93958.
Problems fixed in IBM Spectrum Scale 4.2.2.2 [January 27, 2017]
- Fix an mmfsd daemon crash caused by the assert rmsgP != __null.
- Fix a problem in which mmcrrecoverygroup fails when the admin and daemon networks are different. This fix applies to GSS/ESS customers.
- Fix a signal 11 that can occur with active AFM filesets and heavy IO.
- Improve encryption performance on AIX.
- Fix a problem where the system monitoring framework was not reacting to configuration changes.
- Fix a problem in which the admin network is not listed in the output of "mmhealth node show network".
- Fix a problem in which gpfs_getacl returns ENOSPC. This can occur when the ACL length exceeds the size of the buffer provided.
- Fix a problem in which gpfs.service gets disabled and ccrmonitor gets killed after an upgrade.
- Fix a problem in which RemoteCmd gets stuck waiting for the pcacheListMutex.
- Fix a problem in which "mmhealth cluster show" shows the wrong information for the components which were unhealthy and then became HEALTHY, while the corresponding services were STOPPED, DISABLED or SUSPENDED.
- Fix a multipath device failure that reads "blk_cloned_rq_check_limits: over max size limit" which can occur when kernel function bio_get_nr_vecs() returns a value which is larger than the value of max sectors of the block device.
- Fix a problem with CES node fail-over when the failing CES node stays powered off.
- Fix a problem where error messages regarding cesiplist were shown in /var/adm/ras/mmsysmonitor.log on clusters that are not CES-enabled.
- Fix a problem in which mmbackup falsely claims skipped files if multiple tasks combine outputs.
- Fix an issue in the AFM environment where reads and evictions of a file can cause deadlocks.
- mmperfmon query option -N now supports the following inputs: node number, node short name, node full name, node IP address.
- Fix an 'isValidSocket(sock)' assertion that can occur when the incoming CCR request rate is much higher than the CCR server can handle.
- This update addresses the following APARs: IV92103 IV92104 IV92105 IV92397 IV92398.
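
The four input forms that the mmperfmon query option -N accepts in this release (node number, node short name, node full name, node IP address) can be told apart with a few simple checks. The following is only an illustrative sketch, not the logic used by mmperfmon: the helper name classify_node_spec and its heuristics (all digits means a node number, a dot means a full name) are assumptions made for this example.

```python
import ipaddress


def classify_node_spec(spec: str) -> str:
    """Guess which -N input form a string looks like.

    Hypothetical helper for illustration only; mmperfmon's real
    parsing rules are not described in these release notes.
    """
    spec = spec.strip()
    if not spec:
        raise ValueError("empty node specification")

    # Assumption: a string of ASCII digits is a node number.
    if spec.isascii() and spec.isdigit():
        return "node number"

    # Anything the standard library parses as IPv4/IPv6 is an address.
    try:
        ipaddress.ip_address(spec)
        return "node IP address"
    except ValueError:
        pass

    # Assumption: a name containing a dot is fully qualified.
    return "node full name" if "." in spec else "node short name"


if __name__ == "__main__":
    for example in ("3", "node1", "node1.example.com", "10.0.0.5"):
        print(example, "->", classify_node_spec(example))
```

A script that wraps mmperfmon could use a check like this to validate user input before building the command line. The host names and addresses shown are placeholders.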
Problems fixed in IBM Spectrum Scale 4.2.2.1 [December 16, 2016]
- CNFS: fix recursive calls during shutdown which may cause a LOGASSERT.
- Fix data corruption that can occur writing large files using parallel IO and multiple gateway nodes.
- Fix a kernel crash during mmshutdown in kxAttachSharedMemory by the tsctl process.
- Fix "mutexMagic != 0x59C0DEAD" assert by the SGExceptionLocalPanicThread that can occur during quorum loss.
- mmlsfileset: allow remote file systems. Options -i and -d are now allowed.
- mmstartup: verify kernel/module configuration before starting runmmfs.
- Fix a very rare race condition that can lead to a kernel panic. This issue could only occur on Linux when a mix of AIO and buffered IO is being used to read and write to the same file from multiple nodes.
- Fix data mismatch problems and snapshot mismatches between old primary and acting primary during an applyUpdates in the failbackToPrimary procedure.
- Fix a problem in which mmbackup returns the wrong number of objects handled. This can occur if NUMBERFORMAT is set incorrectly.
- Fix Kernel BUG: illegal operation locks_wake_up_blocks+0x6c.
- mmlsconfig: fix a problem in which the command may not return the correct value due to a stale cache.
- Fix code to prevent a possible kernel panic during GPFS daemon shutdown on Linux. For the problem to occur, the file system must be mounted with an active workload when the GPFS daemon is being shut down.
- Fix an assert in the GPFS daemon by disallowing GetAttr for a DR Primary fileset.
- Fix an assert in the daemon that can occur when a gateway queue transfer happens at the same time as a new gateway node joins.
- Fix slow offline fsck repair. Allow doDeferredDeletions() to clean up AFM pre-destroyed inodes in ex-AFM filesets.
- Fix a deadlock that can occur if a node of the cluster is down while GPFS attempts to establish an RDMA connection using RDMA CM.
- Fix an issue in the AFM environment where the queue memory usage threshold (afmHardMemThreshold) is not honored.
- Fix a remote error 17 creating and deleting hardlinks to files during log recovery.
- Fix a mmsdrrestore command failure that can occur when a tiebreaker disk is in use (CCR setup).
- Ensure that data is synced to disk only when required.
- Fix an mmfsd crash in an AFM environment that can occur when a lookup on the same file is triggered from multiple nodes.
- Fix a debug assert in performPcacheMsg that can occur when the hardQMemLimit is hit for a fileset.
- Fix an occasional node crash during IFS system upgrade when AFM is configured and in use.
- Fix renames clogging the AFM queue during recovery for IW filesets.
- Fix bogus messages in the mmfs log files during file system creation.
- This modification changes neither the functionality nor the appearance of GPFS. It improves the speed of the code in certain disk operations and introduces a mechanism, explicitly distinguishing user-space and kernel-space objects, that can be used to implement other critical parts on the s390x platform.
- Fix a deadlock that can occur during high volume file creations and deletions in a multi cluster environment.
- Fix mmlsquota -j returning a wrong answer that can occur if there is a special character in the name of the stripe group.
- Correctly handle the upgrade of Windows nodes from 3.5 and 4.1 to 4.2.
- Fix a problem in which mmces fails to change the address of object protocol attributes.
- Honor the parent folder's DELETE_CHILD right during rename operations from a Windows node.
- Fix a problem in which mmbackup with -B value > 32768 causes missed files.
- Applications using DMAPI for HSM may benefit from knowing whether a file that is being migrated is currently open and being accessed on some node in the cluster. Extend the DMAPI dm_get_fileattr function to accept a new flag, "DM_AT_FOPEN", that queries whether the file is open anywhere in the cluster.
- Prevent the execution of mmhealth on unsupported nodes.
- The DMAPI dm_stat_t structure returned by dm_get_fileattr and used by TCT has changed. This update must be coordinated with the TCT update.
- This update addresses the following APARs: IV90831 IV90833 IV90864 IV90865 IV90869 IV90874 IV90878 IV90880 IV90891 IV90892 IV91054 IV91144 IV91145 IV91327 IV91443.