Unless specifically noted otherwise, this history of problems fixed for GPFS 4.1.x applies to all supported platforms.
Problems fixed in GPFS 4.1.0.2 [August 4, 2014]
- Add a TSM server selection option and improve messages.
- write(fd, NULL, bs) returns -1 with inconsistent behavior. Added a check to validate whether the user-provided buffer is NULL. If the buffer passed to a read/write system call is NULL, an error is now returned much earlier in the code path.
- Fix various problems with RDMA reconnection.
- Fix a rare livelock that can occur when an FPO file system is low on space.
- Fix two integer overflow problems in the GPFS block map allocation module caused by adding a larger disk to an existing file system. The problem can lead to lost blocks and data corruption.
- Avoid very rare log recovery failure after restripe of snapshot files.
- Prevent the GPFS daemon (mmfsd) from crashing on a GNR/GSS system while deleting a log vdisk.
- Fix a problem in locking code that can cause a GPFS daemon assert under a certain rare race condition. The likelihood is slightly higher under 4.1.
- Prevent file system errors in the face of too many disk errors.
- Fix an offline fsck false positive on the fileset 0 record in file systems of version 3.3 and older.
- Fix a defect in the fileset snapshot restore tool when it tries to restore attributes of directories that were deleted after the fileset snapshot was created.
- Apply if you see tsapolicy failing immediately after a helper node goes down.
- Eliminate FSSTRUCT errors occurring during the image restore process. Prevent the common function gpfsLookup() from proceeding if the stripe group is being image restored.
- Fix a node crash in the gpfs_ireaddirx GPFS API when it is used to list changed dentries for a directory whose data is stored in the inode.
- Improved stability of mmfsadm dump tscomm.
- Install this fix if you have non-standard enclosure / drawer hardware not found in GSS systems.
- Fix a defect in the fileset snapshot restore tool when it tries to restore a clone file that was deleted after the snapshot was created.
- Ignore Grace messages on nodes that do not support Ganesha.
- Fixed a hang caused by lock overflow.
- Fix a problem where a user-registered callback is unexpectedly invoked when using mount instead of mmmount.
- Fix a generation number mismatch defect when creating a fileset in a GPFS secvm file system.
- Fixed a race condition that could lead to an assertion failure when mmpmon is used.
- Fixed Assert 'filesetP->hasInodeSpace == 0' in offline fsck.
- Fixed problem in multi acquire and release with FGDL.
- When there is a GPFS failure, return EUNATCH to Ganesha.
- Fileset snapshot restore tool restores dir attributes more effectively.
- Without this fix, a setup with 4 or more drawers in an enclosure may not be able to survive the loss of the enclosure, even though mmlsrecoverygroup -L states that disk group fault tolerance can survive an enclosure loss.
- Fix a defect where the restore tool cannot sync the fileset being restored when the file system manager node for that fileset is running GPFS 4.1.0.0 and the restore command is run on a node running a later version.
- Fixed online fsck assert after block allocation map export.
- Must make sure that all the interfaces are enabled.
- Fixed Ganesha not using right interface in RHEL6.5.
- Fix the GPFS_FCNTL_GET_DATABLKDISKIDX fcntl API to correctly return location information for preallocated blocks.
- Clear sector 0 and the last 4 KB of the disk before it is created as an NSD, to prevent accidental GPT table recovery by the UEFI driver.
- Fix a race condition problem in fileset snapshot restore tool when it tries to restore extended attributes for a directory.
- When GPFS kernel module is loaded on Linux, look up dependent symbols on demand.
- Fix stale mount handling in case of home mount issues.
- Fixed problem in scanForFileset when sgmgr resigns while the scan is in progress.
- Fixed a problem where the GPFS daemon may terminate abnormally while performing encryption key rewrap (mmapplypolicy command with a "CHANGE ENCRYPTION KEYS" rule). Fixed a problem where mmrestorefs -j on an encrypted file system may result in the file system being unmounted.
- Prevent multiple eviction processes from running simultaneously.
- Assert in Deque after gracePeriodThread runs.
- Update mmchmgr to pick the best candidate as the new file system manager when the user does not specify a new manager node.
- Fix a memory leak in the GPFS daemon associated with Events Exporter, mmpmon, and SNMP support.
- In GPFS systems employing GPFS Native RAID, there was a situation in which failover, failback, and disk replacement operations could hang, requiring GPFS to be restarted on the node to clear the hang. The fix is extremely low risk and highly recommended.
- Fix an mmsetquota bug that returns an invalid argument error if a stanza contains a fileset attribute along with type=FILESET.
- Fix a deadlock if the file system panics during E_IO error processing.
- mmdeldisk is blocked while phase3 recovery is doing deferred deletions. It is enough to wait until log recovery is done.
- Ensure SQL migration is done on GSS nodes only.
- Use maxLogReplicas instead of defaultMaxMetadataReplicas to calculate the number of new log items when a new striped log is added.
- Limit PORTMAP inactive failures caused by a busy DNS.
- Ensure that vdisks are automatically scrubbed periodically.
- Initialize fromFsP to NULL in openArch() to guard against ever calling gpfs_free_fssnaphandle() with a bad argument. Add an informative message to look for an error log in /tmp when the file writer pipeline is broken.
- Correct the multi-release table to avoid releasing fcntl tokens prematurely.
- Fixed race condition between two threads trying to become metanode at the same time.
- Ensure that a file is not created for NFS if it already exists when Ganesha is running.
- Fixed a typo in the removeOpenFileFrombgdList function that caused signal 11.
- Fix code to prevent a potential GPFS daemon assert during log recovery. This problem only occurs when the file system is in 4.1 format with 4K alignment enabled (4K inode size, etc.). Data replication must be enabled, and direct I/O must be used for writes with a buffer size that is not 4K-aligned.
- This update addresses the following APARs: IV61626 IV61628 IV61630 IV61655 IV61988 IV61991 IV61995 IV62043 IV62091 IV62215 IV62243 IV62418.
Problems fixed in GPFS 4.1.0.1 [June 06, 2014]
- Fix a thread-safety problem in dumping GPFS daemon thread backtraces.
- Fixes a problem with fsck repair of deleted root directory inode of independent filesets.
- Fixed a problem in clusters configured for secure communications (cipherList configuration variable containing a cipher other than AUTHONLY) which may cause communications between nodes to become blocked.
- After a file system is panicked, new lock range requests will not be accepted.
- This fix only affects customers running GNR/GSS on Linux, and who have in the past misconfigured their GNR servers by turning the config parameter "numaMemoryInterleave" off, and who experienced IO errors on Vdisks as a result of that misconfiguration. These IO errors can potentially corrupt in-memory metadata of the GNR/GSS server, which can lead to data loss later on. This fix provides a tool that can be used to locate and repair such corruption.
- Remove mmchconfig -N restrictions for aioWorkerThreads and enableLinuxReplicatedAio.
- Fixed a problem when reading a clone child from a snapshot.
- Fixed a rare race condition causing an assert when two threads attempt a metanode operation at the same time while the node is in the process of becoming a metanode.
- Fixed a deadlock in a complicated scenario involving restripe, token revoke, and exceeding the file cache limit.
- Fixed race between log recovery and mnodeResign thread
- E_VALIDATE errors in the aclFile after node failure
- Deal with stress condition where mmfsd was running out of threads
- Fix a problem in log recovery that would cause it to fail when replaying a directory insert record. The error only occurs for filesystems in version 4.1 format, where the hash value of the file name being inserted is the same as an existing file in the directory. The problem is also dependent on the length of the file name, and only happens if the system crashes after the log record is committed, but before the directory contents are flushed.
- Fixed a problem caused by a hole in the cleanBuffer after the file system panicked.
- Close a hole where the fileset snapshot restore tool (mmrestorefs -j) may fail to restore changed data for a clone child file.
- Fix a rare assert which happens under low disk space situation
- Fixed deadlock during mmap pagein
- Fixed the problem of excessive RPCs to get indirect blocks and the problem of metanode lock starvation involving a very large sparse file.
- A problem has been fixed where the GPFS daemon terminates abnormally when direct I/O and vector I/O (readv/writev) are used on encrypted files, and the data is replicated or must go through an NSD server.
- Fix a potential deadlock when selinux is enabled and FS is dmapi managed.
- Close a hole where the fileset snapshot restore tool (mmrestorefs -j) may fail to restore a snapshot in a race condition where one restore thread is deleting a file while another restore thread is trying to get that file's attributes.
- Fixed a kernel oops caused by a race among multiple NFS reads on the same large file.
- mmchfirmware command will avoid accessing non-existent disk path.
- Fix a directory generation mismatch problem in an encrypted secvm file system.
- shutdown hangs in the kernel trying to acquire revokeLock
- Apply at your convenience. Even if you hit this bug, an equivalent cleanup is completed later in the command execution.
- Improved stability of daemon-to-daemon communications when cipherList is set to a real cipher (i.e. not AUTHONLY).
- The serial number of physical disks is now recorded in the GNR event log, and displayed in the mmlspdisk command.
- GNR on AIX allows only a 32K segment.
- Fixes a problem with fsck repair of corrupt root directory inode
- mmbackup tricked by false TSM success messages. mmbackup can be fooled by TSM output when dsmc decides to roll back a transaction of multiple files being backed up. When the TSM server runs out of data storage space, the current transaction, which may hold many files, is rolled back and retried with each file separately. The failure of a file to be backed up in this case was not detected because the earlier message from dsmc contained "Normal File --> [Sent]" even though the file was later rolled back. Fixes in tsbuhelper now detect the failure signature string "** Unsuccessfull **" and, instead of simply ignoring it, revert the changes in the shadow DB for the matching record(s). The hash table already tracks the last change in each record, so reverting is now a legal state transition for hashed records. Some debug messages were reorganized and some common code was streamlined. The 'failed' string now also triggers reversion updates. Fixed pattern matching in tsbackup33.pl to properly display all "ANS####" messages.
- Fix an RO cache I/O error when mounting the file system in read-only mode.
- Don't release the mutex if the daemon dies.
- Fix the path buffer length calculation to return the correct length for dm_handle_to_path() functions.
- Fix a bug in mmauth that may cause duplicate configuration entries and a node number mismatch in the configuration file.
- Fix a problem with creating directory if the parent directory has default POSIX ACL.
- mmbackup fails to read hex environment values. mmbackup debug values, progress reporting settings, and possibly other user settings may be presented in decimal or hex, especially the bit-mapped progress and debugging settings. Perl does not always interpret hex values correctly unless they are converted with the oct() function.
- Correct an NLS-related problem with mmchdisk and similar commands.
- This update addresses the following APARs: IV60187 IV60468 IV60469 IV60471 IV60475 IV60478 IV60543 IV60863 IV60864.
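The hex-versus-decimal issue in the mmbackup entry above can be illustrated with a small parsing sketch. It is written in Python rather than Perl and is not the mmbackup or tsbuhelper code; the function names and the EXAMPLE_PROGRESS_BITS variable are hypothetical. In Perl, a string such as "0x1C" converts to 0 in plain numeric context, which is why oct() is needed. Because oct() reads an unprefixed string as octal, the sketch applies hex conversion only to 0x-prefixed values and parses everything else as decimal.

```python
import os


def parse_int_setting(raw: str) -> int:
    """Parse a numeric setting written either in decimal ("28") or hex ("0x1C")."""
    text = raw.strip()
    if text.lower().startswith("0x"):
        # Hex form: convert explicitly instead of relying on implicit numeric conversion.
        return int(text, 16)
    # No prefix: treat as decimal (Perl's oct() would read this as octal instead).
    return int(text, 10)


def read_int_env(name: str, default: int = 0) -> int:
    """Read an integer setting from the environment; use default when unset or empty."""
    raw = os.environ.get(name, "")
    if not raw.strip():
        return default
    return parse_int_setting(raw)


if __name__ == "__main__":
    # Hypothetical bit-mapped progress setting supplied in hex.
    os.environ["EXAMPLE_PROGRESS_BITS"] = "0x1C"
    bits = read_int_env("EXAMPLE_PROGRESS_BITS")
    enabled = [b for b in range(8) if bits & (1 << b)]
    print(bits, enabled)
```

Normalizing every value to an integer at the point it is read keeps the later bit tests independent of how the user happened to write the setting.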