Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 4.1.x applies to all supported platforms.
Problems fixed in GPFS 4.1.1.4 [December 15, 2015]
- Fix a problem counting the number of mmpmon clients; prevent improper double close of a file descriptor.
- Fix GNR AU log long waiters seen in SSD replacement.
- Fix a deadlock that occurs when GPFS writes to a memory-mapped buffer and the same thread already holds a lock on it.
- Fix a failure when truncate(2) is used to extend a clone child file.
- Add gpdQuorumLossShutdown as one of the assert conditions.
- Fix a hang in AFM when a write to a sparse file is propagated to home.
- Fix an issue in the log code that can cause log recovery to be incorrectly skipped after a node failure. This could only occur on a 4K-aligned file system where GPFS runs into a problem completing a log wrap operation.
- Fix a restore failure when restoring clone child files.
- Fix a data mismatch on clone child files after restore.
- Fix a data mismatch on regular (non-clone) files after restore.
- Fix the log write-behind code to prevent writing a log record to an old disk address while the log file is being migrated. This issue shows up as a log recovery error if a node fails shortly after a log record was written to the wrong location.
- Update the log recovery code to set the junction bit when replaying the log to recover the directory for a newly created fileset. The missing junction bit can only be detected via offline fsck.
- Fix the failover/resync to support outband trucking.
- Fix the data inconsistency issue between cache and home during resync on appended files.
- Fix the restore failure that happened at attributes restoring phase.
- Fix deadlock scenario that can occur when deleting a snapshot.
- Fix the ACL/EA mismatch during resync by considering ctime changed option.
- Fix setattr handling with an NFS backend: ATTR_MTIME_SET implies ATTR_MTIME, but GPFS ignored setattr(ATTR_MTIME_SET) when ATTR_MTIME was not also set.
- Fix code to avoid high CPU usage by the mmfsd process under Windows.
- Update locking code to prevent a GPFS daemon assert. The assert could happen when more than MaxFcntlRangesPerFile (default 200) advisory locks were placed on a single file.
- Fix a signal 11 that can occur when deleting a pdisk in the middle of a recovery group (RG) failover.
- Fix a dentry count leak by adding code to call dput in the error path.
- Fix out-of-quota errors that can occur on file systems with a format less than 1400.
- Fix the mtime mismatch between cache and home for zero sized files by copying mtime from openfile to child attributes.
- Apply if you use -B number with number > 2**31-1 in any of your commands or scripts.
- Fix is recommended for all GNR (ESS/GSS) customers. The problem could occur in the event of an actual disk enclosure failure.
- Add the ability to add a 4K native disk to an existing non-4K-aligned file system, provided that the disk is used as dataOnly, the file system data block size is at least 128K, and the file system version is at least 4.1.1.4.
- Allow prefetch to continue if the parent cannot be found for some files.
- Fix the memory mapped read performance issue on AFM filesets.
- Fix the mmrestorefs[479] : daemon command memory fault issue.
- Fix a problem with copying key files in mmsdrrestore when the node that is being restored does not have passwordless, prompt-free access to the issuing node.
- Fix the case where the ESS storage enclosure slot location that is cached in the daemon can get stale and is not getting updated.
- Do not allow the AioWorkerThread to steal a dirty buffer. This prevents a deadlock.
- Fix the mmdiscovercomp command that is failing with "Constraint error" when trying to add servers to the component database.
- Fix code to avoid quorum loss declaration of the current cluster manager, when the network is broken between two nodes.
- Fix the fileset unlink hang by closing the control file before calling unmount.
- If a system built on GNR/GSS/ESS servers has been getting IO errors on GPFS file systems (reported all the way to the end user application, not internal disk IO errors on individual physical disks), and those IO errors happened exactly at a time when some pdisks were unreachable (for example due to cabling or connectivity issues), and those pdisks would have been reachable from the backup node of the GNR server, then this fix will prevent the IO errors, by failing the recovery group containing the affected vdisk over to the backup node.
- Add code to flush data buffers first before setting cached bit.
- Fix the path to the Linux modprobe command that the mmchfirmware command uses when --type adapter is specified.
- Starting with 4.1.1, GPFS changed the contents of the Linux NFS filehandle, compared to earlier versions (while still supporting older filehandles). This means if the AFM home is upgraded to 4.1.1 or later, existing AFM filesets detect a change in export since the filehandle changes and will suspend future synchronization with home. Similarly, a change from knfsd to Ganesha at home also causes a filehandle change even though the export is the same. The only solution is to resync the cache using failover which is expensive. This fix handles upgrades if home is running GPFS by detecting and upgrading cached filehandle when the filehandle changes for an inode.
- Fix the mmdiscovercomp command that is failing when there are multiple building blocks.
- Re-enable online replica compare and repair.
- This update addresses the following APARs: IV76482 IV78653 IV78662 IV78666 IV78669 IV78672 IV78810 IV78910 IV78912 IV78913 IV78914 IV78915 IV78932 IV79336 IV79338 IV79339.
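One of the 4.1.1.4 items above lists three conditions for adding a 4K native disk to a non-4K-aligned file system. They can be read as a simple predicate. The Python sketch below is illustrative only: the function name, the parameter names, and the tuple encoding of the file system version are assumptions made for this example and are not part of any GPFS interface.

```python
# Thresholds taken from the release note; everything else is hypothetical.
MIN_BLOCK_SIZE = 128 * 1024        # data block size of at least 128K
MIN_FS_VERSION = (4, 1, 1, 4)      # file system version of at least 4.1.1.4


def can_add_4k_native_disk(usage, data_block_size, fs_version):
    """Return True when all three conditions from the note hold.

    usage           -- disk usage string, for example "dataOnly"
    data_block_size -- file system data block size in bytes
    fs_version      -- file system version as a tuple of integers
    """
    return (
        usage == "dataOnly"
        and data_block_size >= MIN_BLOCK_SIZE
        and tuple(fs_version) >= MIN_FS_VERSION
    )
```

For example, a dataOnly disk in a file system with a 256K block size at version (4, 1, 1, 4) satisfies the predicate, while the same disk in a file system at version (4, 1, 1, 3) does not.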
Problems fixed in GPFS 4.1.1.3 [October 29, 2015]
- Fix a problem in a Disaster Recovery (multi-site) environment. If a network outage prevents the two main sites from talking to each other while both sites can still communicate with the tie-breaker (single-node) site, it is possible that the cluster manager may end up moving from the primary to the backup site. That may cause the primary site to lose quorum.
- Fix a PreAlloc log assert which happens when "offset + len" wraps through zero.
- Fix a regression which breaks FPO locality aware restripe.
- Fix the gpfs_get_fssnaphandle_by_name API to return the proper number of bytes when called from a 32-bit application, so that the heap is not corrupted.
- Fix a memory-mapped I/O offset issue where GPFS may not handle I/O properly for very large files.
- Fix an mmrestorefs command failure during the data changes restore phase.
- Handle minquorumNodes correctly in a CCR-enabled cluster.
- Fix GPFS SNMP subagent to work with newer Net-SNMP versions. This fix should be applied to any GPFS cluster node given the role of snmp_collector, if it is running RHEL 7.1, or some other Linux version that includes Net-SNMP 5.6 or beyond.
- Do not return AFM-specific internal attributes in gpfs_fgetattrs().
- On Linux kernels 2.6.39 and later, add explicit blk_start_plug/blk_finish_plug calls inside the GPFS I/O submit routine to give the I/O scheduler more chances to merge I/Os into larger requests.
- This update addresses the following APARs: IV77541 IV77542 IV77544 IV78046.
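The PreAlloc item above concerns a range whose end, "offset + len", wraps through zero. The Python sketch below shows the general overflow check for that situation. It assumes unsigned 64-bit offsets, which the note does not state, and the function name is hypothetical; it is not the GPFS code.

```python
U64_MASK = (1 << 64) - 1  # assumed width: unsigned 64-bit offsets


def range_end_wraps(offset, length):
    """Return True if offset + length overflows an unsigned 64-bit value.

    Python integers do not overflow, so the addition is masked to 64 bits
    to imitate fixed-width arithmetic. A wrapped sum is always smaller
    than the starting offset.
    """
    if not (0 <= offset <= U64_MASK and 0 <= length <= U64_MASK):
        raise ValueError("offset and length must fit in 64 unsigned bits")
    wrapped_end = (offset + length) & U64_MASK
    return wrapped_end < offset
```

A caller would typically reject or clamp the request when the function returns True instead of passing the wrapped end value on.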
Problems fixed in GPFS 4.1.1.2 [September 10, 2015]
- Avoid buffer overrun risk in AFM multi-byte scratch file name generation.
- Fix the cause of a crash of the GPFS mmsnmpagentd daemon. The fix only applies to GPFS clusters where a node has been given the snmp_collector role, as seen in mmlscluster output.
- Fix mmbackup which could report success even when some designated files did not back up. The count of objects backed up can become inaccurate due to a persistent problem that the reported number of objects backed up can be inflated by "dsmc" when it chooses to back up additional items such as parent directories. Correct the count of objects backed up by carefully monitoring for any possible misrepresentation from the individual dsmc commands.
- Fix the mmrestorefs command failure at the attributes restore phase of the command.
- mmfsadm dump improvements: add a stricter loop bound so that the dump loop exits after the original number of cached record addresses has been dumped, and improve SIGFPE support during the dump.
- Fix rare case of deadlock in direct IO code path when flushing the stolen buffer.
- Fix memory fault (core dump), loop or hang in mmimgrestore during exit processing.
- This fix affects environments installing the Object protocol on an external Keystone where the administrator wants the install to automatically create the Swift entries in the Keystone server.
- When slab allocator creation fails, print a warning message with printk and fail the mmfslinux.ko load instead of panicking the kernel.
- Fix a possible GPFS daemon crash when using the mmcharrier command to replace a disk in the P7 disk enclosure in which some of the disk slots were not populated. Fix is recommended for P7IH customers and not relevant to other systems.
- Re-enable quota limits automatically after "mmcrfs -Q yes" and "mmchfs -Q yes". This behavior had been wrongly disabled since GPFS v4.
- Fix potential signal 11 encountered during dump of NSD IO buffers.
- Fix the daemon hang during handler cleanup in AFM environment.
- Fix an error when mmafmctl flushpending is invoked without fileset name.
- Fix a data restore problem for small files that consist of only a fragment block.
- This fix affects environments running the Object protocol with a locally-installed Keystone server with SSL support.
- Fix assert that might occur on systems configured with a small shared segment under stress workload that includes metadata updates and frequent buffer steals.
- Fix code to avoid removing wrong address during deletion of addresses from the cesiplist configuration file.
- Increased stability of the library used to retrieve keys used for file encryption from ISKLM.
- RecLockModuleReset call to __posix_lock_file encounters bad file pointer
- Fix a deadlock caused by not releasing the DMAPI lock in failure path of AFM read.
- Fix a problem where suspended disks are still marked as "tobeemptied" after a successful restripe.
- Migrating files in RO fileset causes SetXAttr to be queued at gateway node.
- Fix the undefined symbols in 32-bit version of libgpfs.so.
- Fix null pointer dereferencing in AFM expiration code by limiting it to work only on valid and registered fileset handlers.
- The GSKit toolkit has been updated to version 8.0.50.47, which (1) fixes the vulnerability described in CVE-2015-1788 and (2) improves the performance of secure sends (when cipherList is set to a cipher other than empty or AUTHONLY).
- Fix a problem where the GSS/ESS component database information can appear out of sync.
- Optimize cifsProcess::isRegistered when the hash chain is empty
- Fix a specific case where the remote cluster is removed before cleaning up the remote mount entries when using mmremotefs delete.
- Upgraded LROC to support new NSD disk layout.
- Drop the GNR track mutex when trying to acquire the log mutex
- Fix signal 11 in saveInodePts when configured to use a localCache.
- Fix performance degradation under a workload accessing a large number of files, due to unnecessary atime refresh messages.
- Improve performance for workloads with large numbers of files on systems with fast metadata storage.
- Update code to ignore EINPROGRESS error from flush when setting up pipe for invoking external script from GPFS daemon.
- Fix signal 11 in daemon caused by removing a localCache device.
- When mmchfs is run with a rapid repair option this fix will check to see if the file system is unmounted before executing the command. An error is issued if the file system is mounted.
- Update the threshold to print 'memory usage approaching the limit' warning message that was triggered too early.
- Fix a problem in the AIX operating system, where some system calls like open() may set errno to EPERM, even if returning successfully, when run from non-root users. System calls like shmat() (when used to map a file) may fail with the same value of errno.
- Relax server license requirement for NSD disks in system.log pool
- This update addresses the following APARs: IV75396 IV75999 IV76016 IV76017 IV76018 IV76019 IV76020 IV76383 IV76455 IV76457 IV76458 IV76461 IV76467 IV76471 IV76473 IV76475 IV76518 IV76759
Problems fixed in IBM Spectrum Scale 4.1.1.1 [July 30, 2015]
- Fix a rare case that could cause mmsnmpagentd to consume up to 100% of CPU when GPFS daemon terminates. Only affects clusters where a node is given the SNMP collector role.
- Avoid rare kernel assert while deleting many snapshots concurrently on a sluggish system.
- The mmrpldisk command now reports a no-space error instead of panicking GPFS file systems that have several almost-full disks.
- Provide a default user exit for nodeLeave event for FPO clusters so that the disks could be marked as down and the data integrity is not compromised.
- Print accurate remaining redundancy in the log when rebuild fails due to insufficient disk space.
- Fix "No disk name found" error when all of the disks are either in "emptied" or "to be emptied" state.
- Fix an nsdperf hang that can occur when it is used on a large number of Linux nodes.
- Change gpfs_prealloc not to preallocate blocks when the requested preallocate size is within the last block of the file but less than the file size. The allocation blocks are rounded to GPFS block boundaries when the file has fragments.
- mmdeldisk (relocation of aclFile blocks) results in lost ACL
- Fix a rare kernel crash case in incompleteAioListRemove when doing AIO on Linux.
- Fix a deadlock resulting from running fsck and recovery in parallel.
- Fsck reports false positive DA corruption
- Fix a problem where file blocks are not distributed in metablock units among nodes when the FPO file system does not have enough failure groups.
- Enhance FPO autorecovery log for clarity
- Fix a problem encountered when dumping buffers with NSD checksum errors.
- When a vdisk I/O times out, fail over the recovery group to the backup node.
- Prevent an assert due to a race condition while both creating and deleting snapshots concurrently.
- Avoid truncation failures on clone child files by moving the truncation of clone child files to the later delta restore phase.
- Extract the correct log file name from the "device" input of the mmrestorefs command, so that users no longer see an internal failure error when the restore process fails.
- Fix a deadlock during Ganesha queue cleanup. When the daemon crashes, the Ganesha queue is no longer cleaned by the Ganesha thread; it is cleaned later during SG cleanup.
- Fix slow performance of some administration commands when CCR (Cluster Configuration Repository) is enabled
- Correct a small vulnerability in takeover after SG manager failure during a snapshot command.
- Fix secondary kernel exception (get_stcP) on Linux cNFS server
- Enhance mmfsctl to work with topology vector failure group, NSD stanza file.
- Fix performance issues in ESS/GSS clusters under very high stress. This fix applies to customers with client nodes in an ESS/GSS cluster containing Connect-IB adapters.
- Fix memory fault (core dump) in mmimgrestore during exit processing
- Improve the performance of communication across daemons when the 'cipherList' configuration parameter is set to something other than empty or AUTHONLY.
- Ganesha: a file descriptor was used after it was released, causing an assertion. The release is now done at exit, after all references to the files are done.
- Provide inode number information to an assertion within the low-level file write operation.
- Fix assert "openInstCount >= 0" under stress workload that includes file deletions.
- kxSendFlock needs to copyin user objects
- Fix a problem where the disk failed LED may not light when the disk state is set to failed.
- Fix a problem with AIO writes past the end of file where the file size change may be lost if the GPFS daemon fails or the file system panics shortly after the write completes.
- Fix a problem where mmdf shows 0 free blocks for suspended disks.
- Fix a kernel panic due to NULL pointer dereference during hard reboot of the partner node in Ganesha environment
- Fix a problem with DIRECT_IO writes that can cause data loss: when a DIRECT_IO write past the end of file increases the file size and the file system panics or the node fails after the write, the file size increase could be lost.
- Enhance the file system inconsistency check during the restore process and exit gracefully if an inconsistency is detected.
- Fix a problem where data was not written to a newly allocated data block when a file was expanded to a size larger than the previously allocated data block.
- Fix a problem with the VMware NFS v3 client in a Ganesha environment by providing an option to enable the short_file_handle that the VMware NFS client uses.
- Fix a replica mismatch problem that was caused by using the wrong block index in the indirect block.
- The GPFS Hadoop connector supports the Hadoop 2.7.x release.
- The GPFS Hadoop connector supports the hdfs:// scheme.
- This update addresses the following APARs: IV74661 IV74686 IV74697 IV74732 IV75108 IV75394
Summary of changes for IBM Spectrum Scale version 4 release 1.1 as updated, June 2015
Changes to this release of the IBM Spectrum Scale licensed program and the IBM Spectrum Scale library include the following:
- Active file management asynchronous fileset-level data replication for disaster recovery (DR) Asynchronous replication of data at the file level enables you to create a primary (active)/secondary (passive) relationship at the fileset level. Data is asynchronously replicated to the secondary on a periodic basis. To enable this function, ensure that you run the following commands: * If you are migrating from a previous release, run the mmchconfig release=LATEST command. * Run the mmchfs -V full command.
- Cluster Configuration Repository (CCR) Enhancements were made to restore broken configuration and files to bring a cluster back online or a broken node to a working state. In the case of a disaster recovery setup, steps are provided to downgrade the quorum assignments when half or more of the quorum nodes are no longer available at one of the sites. Consult the "Establishing disaster recovery for your GPFS cluster" topic in the IBM Spectrum Scale: Advanced Administration Guide.
- Cygwin 64-bit version requirement for Windows nodes The 32-bit version of Cygwin is no longer supported for Windows nodes running GPFS. Users that are running GPFS 4.1 with the 32-bit version of Cygwin installed must upgrade to the 64-bit version of Cygwin before installing IBM Spectrum Scale 4.1.1. Users with SUA on GPFS releases prior to 4.1 should upgrade directly to the 64-bit version of Cygwin.
- Data collection for expelled nodes When a node is about to be expelled for unknown reasons, debug data is collected automatically to help find the root cause.
- Deadlock amelioration Deadlock breakup requests can be issued on demand at a time that is chosen by a system administrator. A user callback for the deadlockOverload event can be added to notify a system administrator to check the system and workload for an overload condition.
- File Placement Optimizer (FPO) FPO enhancements deliver the ability to change block allocation of an existing file with the mmrestripefile and mmchattr commands and efficient removal of disks when disks have already been emptied with the auto recovery process. Auto recovery has been optimized to handle multiple failure and recovery events more efficiently.
- Fileset-level integrated archive manager (IAM) modes Fileset-level integrated archive manager (IAM) modes give users the ability to set four different IAM modes at the fileset level, including the root fileset, so that users can modify the file-operation restrictions that normally apply to immutable files. For more information, see the following: * topic about immutability and appendOnly restrictions in the Information Lifecycle Management chapter of the IBM Spectrum Scale: Advanced Administration Guide * mmchfileset and mmlsfileset command descriptions in the IBM Spectrum Scale: Administration and Programming Reference To enable this function, ensure that you run the following commands: * If you are migrating from a previous release, run the mmchconfig release=LATEST command. * Run the mmchfs -V full command.
- GPFS Native RAID (GNR) and Elastic Storage Server (ESS) documentation The documentation for GNR and ESS was removed from the information units in the IBM Spectrum Scale library. This includes GNR commands, GNR callbacks available to the mmaddcallback command, vdisk performance monitoring with the mmpmon command, messages in the ranges 6027-1850 - 6027-1899 and 6027-3000 - 6027-3099, and the chapter in the IBM Spectrum Scale: Advanced Administration Guide titled GPFS Native RAID (GNR). For more information about GNR, see GPFS Native RAID: Administration. For more information about ESS, see Deploying the Elastic Storage Server.
- Hadoop support Hadoop support was expanded from FPO storage to shared storage. This allows data stored in current GPFS clusters using shared storage to be accessible to Hadoop applications. IBM Spectrum Scale Hadoop Connector has been enhanced to transparently support both FPO based storage pools to leverage data locality and shared storage where locality information is not applicable. This allows FPO and shared storage pool to be used within the same file system, which allows Hadoop applications to access data in the entire file systems transparently. IBM Spectrum Scale Hadoop Connector fully supports Hadoop version 2.5, and it can also be used with Hadoop version 2.6 in compatibility mode (Hadoop file system APIs in 2.6 are not yet implemented). The mmhadoopctl command was introduced to simplify IBM Spectrum Scale Hadoop Connector configuration and management.
- Inode expansion optimization In this release, inode expansion, which allows dynamic growth of inodes, is optimized to reduce the contention that can flare up during bursts of file creates. To enable this function, ensure that you run the following commands: * If you are migrating from a previous release, run the mmchconfig release=LATEST command. * Run the mmchfs -V full command to enable all of the new functionality that requires different on-disk data structures. For more information, see the topics on completing migration and use of disk storage and file structure in file systems in the IBM Spectrum Scale: Concepts, Planning, and Installation Guide.
- Installation toolkit The installation toolkit can be used to do the following: * Install and configure GPFS. * Add GPFS nodes to an existing cluster. * Deploy and configure SMB, NFS, OpenStack Swift, and performance monitoring tools on top of GPFS. * Configure authentication services for protocols. * Upgrade GPFS and protocols. For details, see the spectrumscale command description in the IBM Spectrum Scale: Administration and Programming Reference.
- Multi protocol data access Data access to a shared storage infrastructure through enhanced protocol support for NFS, SMB, and Swift Object. For more information, see the IBM Spectrum Scale: Advanced Administration Guide and the IBM Spectrum Scale: Administration and Programming Reference.
- Performance improvements for mmfsck The mmfsck command can now store information that is found during a scan of the file system into a patch file. The information in the patch file can then be used as input to repairing the file system. Using a patch file to repair the file system prevents an additional scan before starting the repair actions. For more information, see the mmfsck command description in the IBM Spectrum Scale: Administration and Programming Reference.
- PIT inode list The parallel inode traversal (PIT) scan used for the mmchdisk, mmdeldisk, mmrestripefs, and mmrpldisk commands has now been updated to produce a list of inodes with interesting attributes, for example: those having broken disk addresses or those being ill placed. While the mmfileid command can be used to list files with broken disk addresses, this can be a slow process. Two new optional parameters, --inode-criteria CriteriaFile and -o InodeResultFile have been added to the more commonly-used mmchdisk, mmdeldisk, mmrestripefs, and mmrpldisk commands. These parameters allow you to find files matching certain criteria without a separate invocation of mmfileid. With this new feature, you can easily find the interesting files and their inode numbers. The output file will contain a list of inode numbers that meet the specified flags along with the name of the flag and the file type. For more information about these commands and for a description of the optional parameters and flags, see the commands in the IBM Spectrum Scale: Administration and Programming Reference. To enable this function, ensure that you run the following commands: * If you are migrating from a previous release, run the mmchconfig release=LATEST command. * Run the mmchfs -V full command to enable all of the new functionality that requires different on-disk data structures.
- Policy improvements: This release includes the following policy improvements: mmapplypolicy --sort-command SortCommand The mmapplypolicy --sort-command parameter allows you to specify an alternative sort command to be used, rather than the default sort command provided with the operating system. Implicit SET POOL 'first-data-pool' rule For file systems that are at or have been upgraded to 4.1.1, the system recognizes that, even if no policy rules have been installed to a file system by mmchpolicy, data files should be stored in a non-system pool if available (rather than in the system pool, which is the default for earlier releases). For more information, see the following: * Information Lifecycle Management chapter in the IBM Spectrum Scale: Advanced Administration Guide * mmchpolicy command description in the IBM Spectrum Scale: Administration and Programming Reference
- Quota management Quota management improvements for file system format 4.1.1 and higher include: * Allowing quota management to be enabled and disabled without unmounting the file system. To enable this function, ensure that you run the following commands: * If you are migrating from a previous release, run the mmchconfig release=LATEST command. * Run the mmchfs -V full command.
- Read replica policy In a file system with replicas, there are replicas for each data block stored in different disks in different failure groups. Now, using the readReplicaPolicy attribute of the mmchconfig command you can specify the location from which the policy is to read replicas. readReplicaPolicy lets you specify that the first replica be read, the local or closest replica, or the fastest. For more information, see the mmchconfig command in the IBM Spectrum Scale: Administration and Programming Reference.
- Performance Monitoring Tool The Performance Monitoring tool aims to provide performance information after collecting the metrics from GPFS and protocol nodes using the mmperfmon query command with an appropriate query. The tool helps in detecting performance issues and problems. The predefined queries and metrics help in investigating every node or any particular node that is collecting metrics. For more information, see the following: * "Performance Monitoring tool overview" topic in the IBM Spectrum Scale: Advanced Administration Guide * mmperfmon command description in the IBM Spectrum Scale: Administration and Programming Reference
- Documented commands, structures, and subroutines The following lists the modifications to the documented commands, structures, and subroutines: New commands The following commands are new: * mmces * mmdumpperfdata * mmhadoopctl * mmnfs * mmobj * mmperfmon * mmprotocoltrace * mmsmb * mmuserauth * spectrumscale New structures There are no new structures. New subroutines There are no new subroutines. Changed commands The following commands were changed: * gpfs.snap * mmaddcallback * mmafmctl * mmafmlocal * mmapplypolicy * mmbackup * mmbuildgpl * mmchconfig * mmchdisk * mmcheckquota * mmchfileset * mmchnode * mmchpool * mmchpolicy * mmcrcluster * mmcrfileset * mmdeldisk * mmedquota * mmfsck * mmlscluster * mmlsfileset * mmlsfs * mmlspolicy * mmlsquota * mmpsnap * mmrepquota * mmrestorefs * mmrestripefile * mmrestripefs * mmrpldisk Changed structures There are no changed structures. Changed subroutines There are no changed subroutines. Deleted commands There are no deleted commands. Deleted structures There are no deleted structures. Deleted subroutines There are no deleted subroutines.
- Messages The following lists the new, changed, and deleted messages: New messages 6027-962, 6027-2145, 6027-2230, 6027-2234, 6027-2235, 6027-2238, 6027-2240, 6027-2241, 6027-2242, 6027-2245, 6027-2246, 6027-2247, 6027-2248, 6027-2249, 6027-2250, 6027-2251, 6027-2252, 6027-2253, 6027-2254, 6027-2255, 6027-2256, 6027-2257, 6027-2258, 6027-2259, 6027-2260, 6027-2261, 6027-2262, 6027-2263, 6027-2264, 6027-2265, 6027-2266, 6027-2267, 6027-2268, 6027-2269, 6027-2270, 6027-2271, 6027-2272, 6027-2273, 6027-2274, 6027-2281, 6027-2282, 6027-2283, 6027-2284, 6027-2285, 6027-2286, 6027-2287, 6027-2288, 6027-2289, 6027-2290, 6027-2291, 6027-2292, 6027-2293, 6027-2294, 6027-2295, 6027-2296, 6027-2297, 6027-2298, 6027-2299, 6027-2300, 6027-2301, 6027-2302, 6027-2303, 6027-2304, 6027-2305, 6027-2306, 6027-2307, 6027-2308, 6027-2309, 6027-2310, 6027-2311, 6027-2312, 6027-2313, 6027-2314, 6027-2315, 6027-2316, 6027-2317, 6027-2318, 6027-2319, 6027-2320, 6027-2321, 6027-2322, 6027-2323, 6027-2324, 6027-2325, 6027-2326, 6027-2327, 6027-2329, 6027-2330, 6027-2331, 6027-2332, 6027-2333, 6027-2334, 6027-2335, 6027-2336, 6027-2337, 6027-2338, 6027-2339, 6027-2340, 6027-2341, 6027-2342, 6027-2343, 6027-2344, 6027-2345, 6027-2346, 6027-2347, 6027-2348, 6027-2349, 6027-2350, 6027-2351, 6027-2352, 6027-3255, 6027-3256, 6027-3257, 6027-3306, 6027-3307, 6027-3308, 6027-3309, 6027-3310, 6027-3311, 6027-3312, 6027-3313, 6027-3314, 6027-3315, 6027-3316, 6027-3551, 6027-3552, 6027-3553, 6027-3554, 6027-3579, 6027-3580, 6027-3581, 6027-3708, 6027-3709, 6027-3710, 6027-3711, 6027-3712, 6027-3713, 6027-3714, 6027-3715, 6027-3716, 6027-3717, 6027-3718, 6027-3719, 6027-3900, 6027-3901, 6027-3902, 6027-3903, 6027-3904, 6027-3905, 6027-3906, 6027-3907, 6027-3908, 6027-3909, 6027-3910, 6027-3911, 6027-3912, 6027-4000, 6027-4001, 6027-4002, 6027-4003, 6027-4004, 6027-4005, 6027-4006, 6027-4007, 6027-4008, 6027-4009, 6027-4010, 6027-4011, 6027-4012, 6027-4013, 6027-4014, 6027-4015 Changed messages 
6027-625, 6027-872, 6027-1305, 6027-2181, 6027-2183, 6027-2229, 6027-2714, 6027-2715, 6027-2758, 6027-3248, 6027-3249 Deleted messages 6027-2622, 6027-2632, 6027-3511, 6027-3514, 6027-3515, 6027-3516, 6027-3536, 6027-3544
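Several items in the summary above name the same two enablement steps: mmchconfig release=LATEST (only when migrating from a previous release) followed by mmchfs -V full. The Python sketch below only assembles those command lines in order and does not run anything. The function name and the device name "fs1" are hypothetical, and the placement of the device argument before -V is an assumption about the mmchfs syntax that should be checked against the command reference.

```python
def enablement_commands(device, migrating_from_previous_release):
    """Return the enablement commands, in order, as argument lists.

    device -- file system device name (hypothetical example: "fs1")
    migrating_from_previous_release -- True if the cluster was upgraded
        from an earlier release and still needs release=LATEST
    Nothing is executed here; the caller decides how to run the commands.
    """
    commands = []
    if migrating_from_previous_release:
        commands.append(["mmchconfig", "release=LATEST"])
    commands.append(["mmchfs", device, "-V", "full"])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in enablement_commands("fs1", migrating_from_previous_release=True):
        print(" ".join(argv))
```

Keeping the commands as data makes it easy to review them before they are run by hand on a cluster.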
Problems fixed in GPFS 4.1.0.8 [May 26, 2015]
- Correct a small vulnerability in takeover after file system manager failure during a snapshot command.
- The code change ensures that online replica compare tool does not report false positive mismatches when the file system has suspended disks.
- Fix an AFM recovery issue during the fileset unlink.
- Fix a problem when determining whether copy-on-write is needed or not in the presence of snapshots. Sometimes this problem may result in spurious write operation failures (especially, but not limited to file/directory creation).
- Fix a hang in mmrestripefs, which may also result in waiters for "PIT_Start_MultiJob". The problem may happen if the set of nodes specified in the '-N' option to the command includes nodes which are still in the process of being started (or restarted).
- The mmcrsnapshot, mmdelsnapshot and mmfileset commands quiesce the file system before they start actual work. If, during that quiesce, a thread deleting an HSM-migrated file is stuck waiting for a recall, the commands could time out, because the recall can take a long time (for example, due to slow tapes). This fix allows those commands to proceed while a deletion is waiting for recall.
- Close a very small deadlock window caused by releasing the kssLock and calling cxiWaitEventWakeupOne, where a thread that is not waiting for the exclusive lock is woken up, leaving the thread that is actually waiting for the lock asleep.
- Avoid a GPFS crash when running mmrestorefs or mmbackup where there are deleted filesets.
- Enable offline fsck to validate the extended attribute file.
- Fix a problem with directory lookup code that can cause an FSErrInodeCorrupted error to be incorrectly issued. This could occur when a lookup on the '..' entry of a directory occurs at the same time as its parent is being deleted.
- Ensure that EA migration to enable FastEA support for a file system does not assert for the 'Data-in-Inode' case under certain conditions.
- Enable online fsck to fix AFM pre-destroyed inodes. Use PIT to clean up unlinked inodes in AFM disabled fileset.
- Update allocation code to close a small timing window that could lead to file system corruption. The problem could only occur when a GPFS client has a file system panic at the same time as the new file system manager is performing a take over after the old manager resigned.
- Fix a signal 11 problem in a multi-cluster environment that occurs when the GPFS daemon relays an fsync request through the metanode but the OpenFile is stolen on the metanode in the meantime.
- Remove confusing trace stop failed error messages on Windows.
- The privateSubnetOverride configuration parameter may be used to allow multiple clusters on the same private subnet to communicate even when cluster names are not specified in the 'subnets' configuration parameter.
- Indicate that the mmfileid command does not work when only the GPFS Express Edition is installed.
- Fix a workload counter used for NVRAM log tip I/O processing queues. Recommended if NVRAM log tip is in use.
- Avoid a potential crash during normal OS shutdown of a CNFS node.
- Fix issue where file create performance optimization was sometimes disabled unnecessarily.
- In a cluster configured with node quorum, fix a problem where, if the cluster manager fails and the cluster is left with only the bare-minimum number of nodes to maintain node quorum, the cluster may still lose quorum.
- Enable offline fsck to fix AFM orphan directory entries in a single run.
- Fix a problem where the number of nodes allowed in a cluster is reset from 16384 to 8192.
- This affects GSS/ESS customers who are using chdrawer to prepare to replace a failed storage enclosure drawer on an active system.
- Correct a problem in the v4.1 release with directory listings in file systems created prior to v3.2.
- Fix a problem where a race between the log wrap and repair threads caused a checksum mismatch in indirect blocks.
- Fix a daemon crash in AFM by ensuring that the setInFlight() method has a positive 'numExecuted' value when calculating the average wait time of the messages.
- Fix a problem on a GPFS CCR cluster where GPFS commands may not work on inactive configuration servers after a new security key is generated.
- Fix poor command performance on clusters that have no security key.
- Fix a problem with DIRECT_IO writes that can cause data loss when the file system panics or a node fails after a DIRECT_IO write extends past the end of the file and increases the file size. The file size increase could be lost.
- Fix a problem where the file cache filled up with deleted objects (Linux NFS).
- Fix a hardlink creation issue by handling the E_NEEDS_COPIED error in the SFSLinkFile function for AFM files.
- Fix handling of policy rules like ... MIGRATE ... TO some-group-pool THRESHOLD (hi,lo) ...
- The /var/mmfs/etc/RKM.conf configuration file used to configure file encryption now supports a wider set of characters.
- Trigger a back-off when 90% of the configured hard memory limit is hit during queuing of AFM recovery operations.
- ESS customers, using zimon, may see GPFS daemon crashes in the performance monitoring code.
- Add support for multiple RDMA completion threads and completion queues.
- Fix signal 11 in verbs::verbsCheckConn_i.
- Fix signal 11 in runTSPcache caused by an uninitialized variable in error paths.
- Fix a problem where mmauth inadvertently changes cipherList to an invalid string.
  Changed Externals:
  New messages:
  GPFS: 6027-3708 [E] Incorrect passphrase for backend '%s'.
  GPFS: 6027-3709 [E] Error encountered when parsing line %d: expected a new RKM backend stanza.
  GPFS: 6027-3710 [E] Error encountered when parsing line %d: invalid key '%s'.
  GPFS: 6027-3711 [E] Error encountered when parsing line %d: invalid key-value pair.
  GPFS: 6027-3712 [E] Error encountered when parsing line %d: incomplete RKM backend stanza '%s'.
  GPFS: 6027-3713 [E] An error was encountered when parsing line %d: duplicate key '%s'.
  GPFS: 6027-3714 [E] Incorrect permissions for the /var/mmfs/etc/RKM.conf configuration file.
  Deleted messages:
  GPFS: 6027-3536 [E] Incorrect passphrase '%s' for backend '%s'.
  GPFS: 6027-3511 [E] Error encountered when parsing '%s': expected a new RKM backend stanza.
  GPFS: 6027-3515 [E] Error encountered when parsing '%s': invalid key-value pair.
  GPFS: 6027-3514 [E] Error encountered when parsing '%s': invalid key '%s'.
  GPFS: 6027-3516 [E] Error encountered when parsing '%s': incomplete RKM backend stanza '%s'.
  GPFS: 6027-3544 [E] An error was encountered when parsing '%s': duplicate key '%s'.
- This update addresses the following APARs: IV71419 IV71569 IV71601 IV71607 IV71613 IV71616 IV71628 IV71633 IV71634 IV71636 IV71648 IV71692 IV71815 IV72029 IV72033 IV72039 IV72042 IV72048 IV72684 IV72687 IV72688 IV72694 IV72695 IV72698 IV72700 IV72890.