Unless specifically noted otherwise, this history of problems fixed for GPFS 3.5.x applies to all supported platforms.
Problems fixed in GPFS 3.5.0.9 [March 28, 2013]
- Add support for the Linux ATTR_FORCE flag to reset setuid/setgid bits on truncate.
- Fix a deadlock that can occur during recovery when an NSD server node and disks that are twin-tailed to that server and another server fail nearly simultaneously. These near-simultaneous failures trigger the deadlock only within a narrow timing window.
- Fix a potential mount hang caused by multiple DMAPI sessions on the same node managing the file system. This can happen after the cluster loses quorum, quorum is reestablished, and the cluster manager role then migrates to another node.
- Fix a problem where the batched token release optimization introduced in 3.5 may cause inconsistent lock token state under certain race conditions in a multi-cluster environment.
- Avoid crashes when snapshots are used with high update loads in a filesystem with at least 200M files.
- Account for missing storage pool information in disks created prior to GPFS 3.4.
- Fixed the behavior of resync and failover operations on an AFM fileset whose queue state is Dropped.
- Fix the return code of the dm_read_invis() call so that it returns the actual error code instead of -1.
- Ensure both SSL key files are restored with a single command invocation.
- Install this fix when convenient, or sooner if you are bothered by "orphan" policy processes!
- Fix mmdefedquota command usage.
- Fix mmsnmpagentd errors (hangs and core dumps) caused by a lack of serialization among threads accessing the socket connection to the GPFS daemon.
- This update addresses the following APARs: IV35916 IV36649 IV36671 IV37163 IV37402 IV37407 IV37410 IV37432 IV37434 IV38096 IV38480.
Problems fixed in GPFS 3.5.0.8 [February 19, 2013]
- Fix the copy-on-write (COW) code so that it does not try to copy an overflow block into the previous snapshot if the inode doesn't exist in the target snapshot.
- Reject as invalid an NFS4 ACL that is appended to a posix ACL.
- Corrected a potential deadlock in the RAID reconstruction code.
- Avoid very rare hang when concurrently modifying a shared directory.
- Customers may experience an assert like "lfVersion != other.lfVersion || lfVersion== 0 || tailLsn == other.tailLsn" during log recovery due to double allocation of the same log file to two nodes. The fix avoids allocating the same log file twice.
- Fix a race between AsyncRecovery and mmcrfileset that causes an assert.
- Fix a stale NFS file handle error that occurs under some conditions when listing a per-directory snaplink directory.
- Fix rare race condition in a multi-cluster environment that may cause the gpfs daemon to fail with "!oldDiskAddrFound.compAddr(*oldDiskAddrP)" assert when there are frequent conflicting accesses to the same file from different remote clusters.
- Fix an assert in getDatabuf caused by blockOffset not being reset.
- Fix a problem where only one node does work during restripefs.
- Fix a race condition that could cause the same file allocator to be used by two different files.
- Ensure that mmfsck thoroughly cleans up file systems affected by inodes with cross-linked blocks.
- Fix a race between NFS reads of the same file that could cause a kernel panic.
- On AIX, change the error message issued when mmclone is run on a filesystem without 'fastea' enabled.
- Don't allow more than one PIT job to run at a time if the cluster configuration version is lower than 3.5.
- Fix a "No such file or directory" error after a successful mmcrsnapshot.
- Fixed allocation code that caused memory corruption, which could lead to FSSTRUCT errors. The problem only occurs when mmadddisk fails due to an unexpected error such as running out of metadata space.
- This fix applies to all releases of the mmapplypolicy command from 3.4 and onwards.
- mmbackup will exclude Socket special files from backup.
- Fix a problem in the mmfileid command so that it can find the disk address of an xattr overflow block correctly.
- This fix applies to all releases of the mmapplypolicy command from 3.1 and onwards.
- Eliminate a bogus error message in "mmrepquota -a".
- This fix applies to all releases of GPFS with an mmapplypolicy command that supports the -q option. The -q option is rather esoteric! Most customers will not use it directly, BUT the mmbackup command does invoke mmapplypolicy with -q.
- This update addresses the following APARs: IV33613 IV35513 IV35748 IV35750 IV35751 IV35754 IV35760 IV35761 IV35762 IV35802.
Problems fixed in GPFS 3.5.0.7 [December 14, 2012]
- Under heavy load, kswapd calls shrink dcache, which finds a candidate dentry to prune whose attached inode has already been deleted (I_CLEAR). The problem occurs when a pcache NFS lookup finds an anonymous (no-name) dentry in the cache. This can happen if the dentry was previously created through d_alloc_anon() when instantiating an inode from a cached fh instead of a regular lookup. To "repair" this dentry (fill in the name), we allocate a new dentry with the name being looked up, instantiate it with the same inode using d_materialise_unique(), and release the old no-name dentry. d_materialise_unique() drops the inode count on success, so when shrink dcache runs, it finds the freed no-name dentry with an attached inode whose i_count is 0, and asserts. The fix adds an extra hold on the inode before calling d_materialise_unique().
- gpfsClose(), when called for an NFS delayed-open instance of a deleted snapshot, asserts that the gnode is stale but the vinfo is still connected. On snapshot delete, disconnect any matching vinfos for open NFS instances during sgmMsgQuiesceOps.
- Fix restripe code to prevent a potential deadlock when a disk is deleted or added at the same time as a restripe.
- Fix an incompatibility between GPFS 3.5 and GPFS 3.4. When both versions coexist, nodes running GPFS 3.4 can experience an assert like "logAssertFailed: this == newDesc sgdescio.C".
- Reduce unnecessary compensation passes. Duplicate entries in the TSM audit failure list cost mmbackup extra passes over the shadow DB to compensate for the failures. Since the failure list is sorted by inode order, use the sort -u option to remove duplicates up front. To avoid throwing away entries that fail the grep, add a number suffix to the inode order key, which gives each entry the best chance of landing in the right order.
- Update repair code to prevent a replica mismatch on EA data after restarting a down disk. This only affects file systems that have FASTEA enabled and EA data that can no longer be stored inside the inode.
- Fix mmrpldisk, which migrated data off any suspended disk in addition to the disk being replaced. This could lead to both replicas being placed on the replacement disk.
- Fix corruption caused during a rare race updating a shared directory.
- Fix a potential deadlock caused by reloading policy rules when file system manager nodes die and another node takes over.
- Fixed a problem when closing a filesystem where a policy file openfile object was still in the hash tables, causing a loop in verifyAllGone.
- Fix assert "secLsn == curLsn" that may occur under metadata intensive NFS workloads.
- Fix striped log file corruption due to snapshot restore.
- Fix a timestamp issue for Linux AIO in which the modification timestamp of a file accessed through the Linux native AIO interfaces might be set incorrectly in some cases.
- Fix the syncnfs mount option not being effective in some error tests.
- Fixed rare deadlock in aclMsgFailureUpdate.
- This only affects GPFS GNR users, and only those that are running 3.4.0.17 or later in combination with 3.4.0.16 or earlier on the server pair for a single Recovery Group. If failover from a newer software version to an older software version occurs during a rolling upgrade, or if the newer software is downgraded back to the older software, some Pdisks may become "missing", or the whole Recovery Group (and therefore all Vdisks in it) will not be recoverable. This update fixes the problem.
- Customers may experience an assert like "isMoreRecent && tailLsn >= other.tailLsn || !isMoreRecent && tailLsn <= other.tailLsn" during log recovery. The fix provides a more graceful workaround so that log recovery can proceed safely.
- This fix applies to all supported releases of the mmapplypolicy command, but is only important if you ever run with -L 3 or higher and a SHOW value that is a very long character string.
- Ignore quorum loss event from remote cluster.
- GPFS Native RAID (GNR) is unable to recover after intermittent disk communication failures. Only affects GNR users, in situations where intermittent hardware failure causes multiple disks to temporarily report write errors, while the GNR server is writing primordial Vdisk Configuration Data (VCD), and then only if the GNR server has to restart (perhaps due to server failover) shortly after the temporary write errors occurred. The failure will be indicated in the log by showing either error 214 or checksum error when recovering. This PTF fixes this problem, and allows the GNR server to recover cleanly. There is no other practical workaround that preserves user data. If all Vdisk NSDs in the affected recovery group can be destroyed, one can instead manually clear all the Pdisks in the failed recovery group (by overwriting the first 4 MiB with zeroes), then manually delete that affected recovery group (with the -p flag on mmdelrecoverygroup), then recreate the recovery group.
- This update addresses the following APARs: IV28687 IV31663 IV31816 IV32726 IV32729 IV32823 IV33246 IV33393 IV33394.
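The shrink dcache entry for 3.5.0.7 hinges on inode reference counting. The following Python toy model is a hypothetical illustration, not kernel or GPFS code: the Inode and Dentry classes and the iget(), iput(), materialise(), and shrink_check() helpers are invented here, and materialise() only mimics the behavior described in the entry, where d_materialise_unique() drops the inode count on success. It shows why an extra hold taken before the call keeps the count positive while the old no-name dentry still references the inode.

```python
class Inode:
    """Toy inode that only tracks a reference count (hypothetical model)."""
    def __init__(self):
        self.i_count = 0

class Dentry:
    """Toy dentry: an optional name plus an attached inode."""
    def __init__(self, name, inode=None):
        self.name = name
        self.inode = inode

def iget(inode):
    inode.i_count += 1  # take a reference

def iput(inode):
    inode.i_count -= 1  # drop a reference

def materialise(new_dentry, inode):
    # Hypothetical stand-in for d_materialise_unique() as described in the
    # entry: attach the inode to the new dentry and drop one reference.
    new_dentry.inode = inode
    iput(inode)

def shrink_check(dentries):
    # Toy version of the shrinker's sanity check: every cached dentry that
    # still has an inode attached must see a positive i_count.
    return all(d.inode is None or d.inode.i_count > 0 for d in dentries)

def repair_lookup(anon, name, extra_hold):
    """Give an anonymous dentry's inode a named dentry; return the cache."""
    inode = anon.inode
    named = Dentry(name)
    if extra_hold:
        iget(inode)  # the fix: hold the inode across materialise()
    materialise(named, inode)
    # The old no-name dentry is released but can still sit in the cache
    # with its inode attached until the shrinker visits it.
    return [anon, named]

def scenario(extra_hold):
    inode = Inode()
    iget(inode)  # reference owned by the anonymous dentry
    anon = Dentry(None, inode)
    cache = repair_lookup(anon, "file.txt", extra_hold)
    return inode.i_count, shrink_check(cache)

print(scenario(extra_hold=False))  # (0, False): the shrinker would assert
print(scenario(extra_hold=True))   # (1, True)
```

In this model the invariant is simply that an attached inode must never be seen with a zero count; taking the hold before the reference-consuming call is what preserves it.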
Problems fixed in GPFS 3.5.0.6 [November 16, 2012]
- This fix applies only to GPFS users on Linux. On Linux operating systems, the readdir() API on a GPFS file system did not return valid file types in the d_type member of the struct dirent output parameter. The code now returns valid file types in the d_type member of struct dirent for readdir() on a GPFS file system.
- Fix a race between lookup and mnode takeover which caused lookup to get inconsistent data.
- Fix EA code which caused GPFS daemon assert on filesystem with FASTEA enabled. This is mostly a problem on Windows.
- Fix a deadlock which occurs on GNR configurations in certain situations. This deadlock can occur when the active RecoveryGroup server fails, and the backup server experiences a SAS problem that prevents access to a sufficient number of disks, preventing RecoveryGroup recovery. In this situation, it is sometimes possible to see failure recovery blocked because the NSD transactions are waiting for the backup server to take over, when it cannot.
- If rebuilding the shadow file encounters a severe problem when mmbackup is invoked with the query option (-q), mmbackup will stop further backup processing against the TSM server.
- Add synchronization between filesystem manager resign and some ACL related operations. This is needed to prevent a possible GPFS daemon assert while running mmchmgr command.
- Fix automount problem on SELinux enforcing systems.
- Release the fileset lock before jumping to noDataToCopy.
- Fix range revoke handler to better handle error conditions such as IO error. Instead of causing the GPFS daemon to assert, this fix just panics the filesystem.
- Fix a problem with the tsfattr() API where a kernel panic may occur when multiple threads execute the GPFS_IREAD/GPFS_IREAD64 command on the same file at the same time. This problem only occurs on AIX.
- Fixed code that can cause GPFS daemon assert when multiple threads try to write to the same file after it has been truncated to size 0.
- Fix CNFS problem on SELinux enforcing systems.
- Close a hole in the fileset snapshot restore tool when it restores a renamed file.
- Fix slow sequential read performance of very large files in a file system with a 16K block size and a very large pagepool.
- Fixed a bug in new background deletion code where it is trying to queue the deletion instead of handling it when maxBackgroundDeletionThreads is zero.
- Force log writes for synchronous NFS unlink operations.
- Fix rdwrFast.fastpathGetstate() == 0 assert after a cluster membership loss.
- Fix a race condition where multiple threads appending to the same file with synchronous writes could cause a deadlock.
- Added additional information to the noDiskSpace event to distinguish the reason for the event: 1. Added %reason, which can be either diskspace or inodespace. 2. Added %storagePool to indicate the pool name when %reason is diskspace. 3. Added %filesetName to indicate the fileset name when %reason is inodespace.
- Fix a problem where GPFS daemon assert can occur when restripe fails or aborts on very large files.
- Fixed a problem in readpage/splice_read where EFAULT was returned instead of ETIMEDOUT when accessing an HSM-migrated file from an NFS client.
- Avoid hang with long 'open snapInode0' waiters.
- Fixed a bug when setting the file size with a truncate operation.
- This update addresses the following APARs: IV28097 IV28120 IV28672 IV28685 IV29929 IV29930 IV30005 IV30613 IV30738 IV30744 IV31645 IV31647 IV31684 IV31815.
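The readdir() d_type fix in 3.5.0.6 matters to any program that classifies directory entries without calling stat(). As an illustration outside GPFS itself, Python's os.scandir() takes the entry type from the d_type field returned by readdir() where the platform provides it; per the Python documentation, DirEntry.is_dir() and is_file() need no extra system call for non-symlinks unless the file system reports DT_UNKNOWN. The classify_entries() helper below is hypothetical, and the demonstration uses a scratch directory rather than a GPFS mount.

```python
import os
import tempfile

def classify_entries(path):
    """Split a directory's entries into files, directories, and others.

    On Linux, os.scandir() uses the d_type value that readdir() returns and
    only falls back to an lstat() call when d_type is DT_UNKNOWN, so a file
    system that fills d_type with a wrong (non-DT_UNKNOWN) value would make
    this classification wrong.
    """
    files, dirs, other = [], [], []
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                dirs.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                files.append(entry.name)
            else:
                other.append(entry.name)  # symlinks, FIFOs, sockets, devices
    return sorted(files), sorted(dirs), sorted(other)

# Demonstration on a scratch directory; on a GPFS mount, pass its path instead.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "subdir"))
    with open(os.path.join(tmp, "data.txt"), "w") as f:
        f.write("x")
    os.symlink("data.txt", os.path.join(tmp, "link"))
    print(classify_entries(tmp))  # (['data.txt'], ['subdir'], ['link'])
```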
Problems fixed in GPFS 3.5.0.4 [September 14, 2012]
- Fix code to prevent a rare condition where periodic sync could start many inode expansion threads. This can cause the GPFS daemon to run out of resources for starting new threads.
- Fix a segfault after a node takes over as ccmgr and finishes the DMAPI session recovery process.
- Fix code that could cause a GPFS daemon assert when multiple threads working on the same file triggered a race condition.
- Add the environment variable MMBACKUP_RECORD_ROOT, which specifies an alternate directory in which to store shadow files, list files, temp files, etc.
- Fix an assert encountered while opening NSDs. The assert occurs due to a rare race condition that requires the device backing a particular NSD to disappear completely from the operating system while the NSD is being opened.
- This fix only applies to customers who want SUBSTR interpreted sensibly for negative indices.
- Fix null pointer dereference when an RDMA connection breaks during memory buffer adjustment and verbsRdmaSend is enabled.
- Mask out ReadEA (which is the same as ReadNamed) from unallowed rights so that the lack of it is not interpreted as a denial. Only the presence of an explicit ACE can deny the ReadEA right.
- Fix an issue in a mixed-version cluster where a node running GPFS 3.4 or older failing in a small window during mount could cause spurious log recovery errors.
- Fix CNFS to recognize GPFS filesystem in RHEL6.3.
- Fixed an assert in a trace statement that occurred after an xattr overflow block was copied to a snapshot.
- This fix applies to any customer who needs to kill the scripts started by mmapplypolicy, or who is wondering why, on AIX, a faulty program started by mmapplypolicy "hangs" instead of aborting.
- Fix assert "MSGTYPE == 34" that occurs in pre and post-3.4.0.7 mixed multicluster environment.
- Fix the offline bit being lost after CIFS calls gpfs_set_winattrs.
- Fix a problem which occurs in GNR configurations with replicated file systems. Should an NSD checksum error occur between an NSD client and GNR server, the first such error on a transaction could be mistakenly ignored, resulting in no callback invocation or event generated for it. Additionally, if the checksum error is persistent on the same transaction, the code could attempt to retry the transaction one more time than allowed by the configuration settings.
- Fix a sequential write performance problem and a deadlock present since 3.4.0.12 and 3.5.
- This fix applies to any customer who has policy rules that reference the PATH_NAME variables AND who might encounter a path_name whose length exceeds 1024 bytes.
- Fix a segfault in the dm_getall_disp() functions.
- This update addresses the following APARs: IV27283 IV27287 IV27288 IV27290 IV27291.
Problems fixed in GPFS 3.5.0.3 [August 21, 2012]
- Fixed potential live-lock in snapshot copy-on-write of the extended attribute overflow block when the next snapshot is being deleted. Problem occurred in rare cases after the inode file increases in size
- mmbackup will check whether the session between a remote TSM client node and the TSM server is healthy and will remove the combination from the transaction if an unhealthy session is detected
- Prevent an assert accessing files via DIO
- mmbackup will filter the "ANS1361E Session Rejected: The specified node name is currently locked" error and will exit with an error
- mmbackup will filter file names that contain characters not supported by TSM
- When a tiebreaker disk is being used, avoid quorum loss under heavy load when the tiebreaker disk is down but all quorum nodes are still up
- Fix the file close code to prevent a daemon assert that can occur on AIX with a DMAPI-enabled filesystem
- Fix an infinite wait during delsnapshot
- Fix a problem where mmdf could not return correct inode information in a mixed big-endian and little-endian cluster
- Fix an assert when copying the inode block to the previous snapshot
- Added logic to reduce the chance of failure for "mmfsadm dump cfgmgr"
- When a tiebreaker disk is used, prevent situations where more than one cluster configuration manager is present simultaneously in the same cluster
- Fixed an old bug in getSpareLogFileDA caused by a typo
- Fix assertion failure when multiple threads use direct I/O to write to the same block of a file that has data replication enabled
- Fix a daemon crash during log recovery when the log file becomes corrupted
- Fix a problem that would cause mmadddisk failure
- Fix assert "isValid()" that occurs during mmbackup of a snapshot
- Fix an assertion caused by leftover "isBeingRestriped" bit after a failed restripe operation
- Update mmrpldisk to issue a warning instead of an error when it cannot invalidate disk contents because the disk is in the down state
- Fix regression introduced in 3.4.0.13 and 3.5.0.1 that could in some cases cause "mmchdisk ... start" to fail with spurious "Inconsistency in file system metadata" error
- Avoid deadlock creating files under extreme stress conditions
- Fix code to ensure that the E_ISDIR error is returned when the FWRITE flag is used to open a directory
- Fix snapshot creation code to prevent a possible GPFS daemon assert when filesystem is very low on disk space
- Fix problems with using mmapped files after a filesystem has been force unmounted by a panic or cluster membership loss
- Fix regression where a race condition between restripe and unmount could cause the GPFS daemon to restart with error message "assert ... logInodeNum == descP->logDataP[i].logInodeNum" in the GPFS console log
- mmbackup will report a severe error if dsmc hits ANS1351E (Session rejected: All server sessions are currently in use)
- Fix issue in multi-cluster environment, where nodes in different remote clusters updating the same set of files could cause deadlock under high load
- mmbackup will correctly filter file names that contain newlines
- Improve error handling for completed tracks
- Fix a bug that causes slowness during mmautoload/mmstartup on systems with automount file system. The performance hit is noticeable on large clusters
- Prevent very rare race condition between fileset commands and mount
- Fixed rare assert in log migration
- Fix assert "writeNSectors == nSectors" that occurs during "mmchfs --enable-fastea"
- Update mmlsquota -j Fileset usage message
- Fix the allocation message handler to prevent a GPFS daemon assert. The assert could happen when a filesystem is being used by more than one remote cluster
- Block Linux NFS read of a file when CIFS holds a deny share lock
- Speed-up recovery when multiple nodes fail, and multiple mmexpelnode commands are invoked with each failed node as target. Applies mostly to DB2 environments
- Fix rare assert under workload with concurrent updates to a small directory from multiple nodes
- Fix a null pointer dereference in the case of an I/O failure on a gateway node
- Fixed hang problem when deleting HSM migrated file after creating a snapshot
- Fix an issue in the GPFS API gpfs_next_inode where it did not scan the file whose inode number is the maximum inode number of the file system or fileset
- Fixed assertion when generating read or destroy events
- Fix the mmcrfs command to handle the -n numNodes value greater than 8192
- Extend mmbackup's tolerance of TSM failures listed in the audit log even when paths are duplicated or unrequested. TSM frequently logs a number of unexpected path names in the audit log. Sometimes a path name is a duplicate, due to repeated errors or to TSM trying to back up objects in a different order than presented in the list file. Other times the object simply was not requested and TSM tries to back it up anyway. Make mmbackup ignore these log messages during shadow database compensation. Log all uncompensated error messages to files in backupStore (root) in mmbackup.auditUnresolved and mmbackup.auditBadPaths. Add a new debug bit to DEBUGmmbackup: 0x08 causes a pause before backup activities commence and a second pause before analysis of audit logs. Correct minor errors in close() handling of various temp files
- Fixed sig 11 when background deletion tries to access an OpenFile object that was removed from the cache while waiting for quiesce to finish
- Fixed race condition between FakeSync and RemoveOpenFile
- Fix a kernel panic caused by a race between two NFS reads
- Fix restripe code that could cause potential filesystem corruption. The problem only affects filesystems that were created without FASTEA enabled but were later upgraded to enable FASTEA via mmmigratefs with the --fastea option
- Loss of access to files with ACLs can occur if independent filesets are, or have been, created in the filesystem
- This fix only applies to customers running GPFS on Linux/PowerPC, using WEIGHT clauses in their policy rules
- Fix mmdeldisk to ignore special files that do not have data in a pool
- Close a hole where gpfs_ireadx/ireadx64 could not find more than 128 delts. Close a hole where calling gpfs_ireadx/ireadx64 for an overwritten file could hit an assert if the input offset is not 0
- Fixed a problem where 'mmchmgr -c' fails on a cluster configured with a tiebreaker disk, resulting in quorum loss
- EINVAL returned from gpfs_fputattrs when an empty NFSv4 ACL is included
- FSErrBadAclRef reported when lockGetattr called RetrieveAcl with a zero aclRef
- Fixed a deadlock resulting from out-of-order aclFile/buffer locking
- This fix only applies to customers who have set tscCmdPortRange, run mmapplypolicy, and run a firewall that prevents policy from exploiting multi-nodal operation
- Fix code to avoid unavailable disks when there is no metadata replication
- Fix rare race condition where a node failure while writing a replicated data block under certain workloads could lead to replica inconsistencies. A subsequent disk failure or disk recovery could cause reads to return stale data for the affected data block
- Fix hung AIX IO when the disk transfer size is smaller than the GPFS blocksize
- gpfs_i_unlink failed to release d_lock, causing a d_prune_aliases crash
- This fix only applies to customers who are on AIX and have gotten "no enough space" errors when running mmapplypolicy
- This update addresses the following APARs: IV21750 IV21756 IV21758 IV21760 IV23290 IV23810 IV23812 IV23814 IV23842 IV23855 IV23877 IV23879 IV24151 IV24382 IV24426 IV24942 IV25185 IV25484 IV25487 IV25488 IV25762 IV25763 IV25771
- IV24937 is documented further at the URL: http://www.ibm.com/developerworks/forums/thread.jspa?threadID=448578&tstart=0
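The 3.5.0.3 entry that extends mmbackup's tolerance of audit-log failures amounts to filtering failure records before shadow database compensation. The Python sketch below is a hypothetical illustration of that idea, not mmbackup's implementation: the filter_audit_failures() name, the assumption that each failure record reduces to a path name, and the sample paths are all invented for the example. Duplicate paths are collapsed, paths that were never in the list file are set aside as unrequested, and the rest are the ones to compensate.

```python
def filter_audit_failures(failed_paths, requested_paths):
    """Split audit-log failure paths for shadow DB compensation.

    Returns (to_compensate, unrequested). Duplicates are collapsed while
    keeping first-seen order, and paths that were not in the list file
    handed to TSM are reported separately instead of being compensated.
    """
    requested = set(requested_paths)
    seen = set()
    to_compensate, unrequested = [], []
    for path in failed_paths:
        if path in seen:
            continue  # repeated error for the same object
        seen.add(path)
        if path in requested:
            to_compensate.append(path)
        else:
            unrequested.append(path)  # TSM tried an object that was not listed
    return to_compensate, unrequested

# Hypothetical audit-log failures and list-file contents.
failed = ["/gpfs/fs1/a", "/gpfs/fs1/b", "/gpfs/fs1/a", "/gpfs/fs1/tmp/x"]
requested = ["/gpfs/fs1/a", "/gpfs/fs1/b"]
print(filter_audit_failures(failed, requested))
# (['/gpfs/fs1/a', '/gpfs/fs1/b'], ['/gpfs/fs1/tmp/x'])
```

Separating the unrequested paths, rather than dropping them silently, mirrors the entry's note that uncompensated messages are still logged for later inspection.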
Problems fixed in GPFS 3.5.0.2 [May 30, 2012]
- mmbackup will exit 1 when auditlog file is not available for result analysis after backup transaction is done.
- Fix a problem stealing buffers in a large pagepool after installing 3.4.0.11.
- When a backup partially fails, mmbackup continues to compensate the shadow file even though multiple failures are reported for the same file in the audit log file.
- Fixed a bug in log recovery which could result in a "CmpMismatch" file system corruption problem.
- Fix for the iP->i_count == 0 kernel assert in super.c. This problem only affects Linux 2.6.36 and later.
- Fix a rare deadlock where a kernel process gets blocked waiting for a free mailbox to send to the GPFS daemon.
- mmbackup will exit 1 when an incremental backup partially fails and shadow file compensation succeeds.
- Correct mmlsfileset output for junctions of deleted filesets in some cases.
- Fix a memory allocation problem when online mmfsck runs on a node with a heavy mmap workload.
- mmbackup will not stop processing when there is no audit log file if only expiration processing is done.
- mmbackup will display progress msg "Expiring files..." correctly if expiration transaction takes longer than 30 mins.
- Prevent the cluster manager from being expelled as a consequence of some communication outage with another node.
- mmbackup with multiple TSM clients will catch all error messages from dsmc command output.
- Fix a problem where the 'expelnode' callback indicates that the chosen node had joined the cluster first.
- Fix a problem with nBytesNonStealable accounting.
- Fixed the message handler for filesystem quiesce, which caused a GPFS assert when the filesystem manager failed while the filesystem was being quiesced.
- Fix printing of long fileset names in mmrepquota and mmlsquota commands.
- Fix mmap operations to go through the NSD server when direct access to disks is no longer possible.
- Fix mmsetquota to handle numerical fileset names.
- mmbackup can back up files and directories with long path names, up to the lengths supported by GPFS and TSM.
- Fix an error message in mmchattr command with -M/R/m/r option.
- Fix a problem where restripe fell into an infinite loop when the sg panicked on the busy node.
- mmbackup will display a backup/expiration progress message at every interval specified by the MMBACKUP_PROGRESS_INTERVAL environment variable, if it is set. Otherwise, mmbackup will display the backup/expiration progress message every 30 mins.
- Fixed rare assert when deleting files in a fileset.
- Fixed rare hang problem during sg or token recovery.
- Fix deadlock when doing inode scan (mmapplypolicy/mmbackup) in small pagepool.
- getxattr for ACLs may overwrite the kernel buffer if a small buffer size (less than 8 bytes) is specified.
- When the mmbackup shadow file is rebuilt by the --rebuild or -q option, mmbackup will get CTIME information from the TSM server, so files modified after the previous backup but before the shadow file is rebuilt will be backed up by the subsequent incremental backup.
- GNR: fix a problem where certain errors from a pdisk, such as media errors, caused RecoveryGroup open to fail. The code now continues attempting to open the RecoveryGroup and simply discounts the pdisk(s) returning media errors (and unexpected error codes).
- Prevent disks from being marked as 'down' when a node with the configuration option unmountOnDiskFail=yes receives an I/O error or loses connectivity to a disk.
- When mmbackup cannot back up files, the message is more informative.
- Fixed mailbox calls that can lead to deadlock during filesystem quiesce. The deadlock is most likely to happen on an extremely overloaded system.
- When a backup fails (partially or entirely) due to an error from the TSM client, mmbackup will display the error message from the TSM client for easy problem detection. mmbackup displays each error message only once, even if the same error occurs multiple times.
- Make GPFS more resilient to sporadic errors during disk access. Upon an unexpected error during disk open, such as ERROR_INSUFFICIENT_BUFFER, GPFS now retries the call after a brief pause.
- When compensating the shadow file takes a long time because a backup partially failed, mmbackup will show a progress message.
- Fixed a backward compatibility error when reading data across nodes running different versions. This fix is needed if you are upgrading from 3.4.0.6 or lower to 3.4.0.12 or higher.
- mmbackup Device -t incremental and --rebuild is valid syntax and will work properly.
- Fix the problem that deldisk returned success even though it failed.
- handleRevokeM loops when a lease callback is not responded to.
- This update addresses the following APARs: IV19037 IV19165 IV20350 IV20610 IV20613 IV20615 IV20618 IV20619 IV20625 IV20627 IV20630 IV20634.
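The 3.5.0.2 entry on sporadic disk-access errors describes retrying a failed disk open after a brief pause. A minimal Python sketch of that pattern follows; open_with_retry() and flaky_open() are hypothetical, and the retry count, the pause length, and the use of OSError as the retryable error class (standing in for errors such as ERROR_INSUFFICIENT_BUFFER) are assumptions for illustration, not GPFS's actual values or code.

```python
import time

def open_with_retry(open_fn, retries=3, pause=0.5, sleep=time.sleep):
    """Call open_fn(); on an OSError, pause briefly and try again.

    Hypothetical helper: retries, pause, and treating OSError as the
    retryable error are illustrative assumptions. The last error is
    re-raised once all retries are used up.
    """
    for attempt in range(retries + 1):
        try:
            return open_fn()
        except OSError:
            if attempt == retries:
                raise  # give up after the final attempt
            sleep(pause)  # brief pause before trying again

# Simulated disk open that fails twice before succeeding.
calls = []

def flaky_open():
    calls.append(1)
    if len(calls) < 3:
        raise OSError("transient error")
    return "handle"

# A no-op sleep keeps the demonstration fast.
print(open_with_retry(flaky_open, sleep=lambda s: None), len(calls))  # handle 3
```

Injecting the sleep function keeps the retry logic testable without real delays; a bounded retry count ensures a persistent error still surfaces to the caller.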