Unless specifically noted otherwise, this history of problems fixed for IBM Spectrum Scale 4.2.x applies for all supported platforms.
Changes to this release of the IBM Spectrum Scale licensed program and the IBM Spectrum Scale library include the following:
- Auditing configuration changes
A syslog entry is automatically written whenever a GPFS command makes a configuration change. Adding the information to the syslog gives flexibility in mining, processing, and redirecting these events. Entries can also be written to the standard GPFS log. The commandAudit parameter of the mmchconfig command controls this option.
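As an illustrative sketch (exact parameter values may vary by release), command auditing is toggled with mmchconfig:

```shell
# Enable auditing of configuration-changing GPFS commands; entries go to syslog.
# Other documented values are assumed to include "syslogOnly" and "no".
mmchconfig commandAudit=yes

# Verify the current setting.
mmlsconfig commandAudit
```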
- Automated configuration of sensors for performance monitoring
IBM Spectrum Scale now supports automated configuration of sensors for its performance monitoring tool.
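A hypothetical sketch of the automated sensor setup, assuming node1 as the collector node (the node name is a placeholder):

```shell
# Generate an initial performance monitoring configuration with node1 as collector.
mmperfmon config generate --collectors node1

# Review the resulting sensor configuration.
mmperfmon config show
```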
- Callback event for file system structure errors
A new user callback event fsstruct (file system structure error) is triggered when the file system detects an error in the metadata. Immediate notification enables the callback program to act to mitigate further errors.
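A minimal sketch of registering such a callback; the script path and callback identifier are placeholders:

```shell
# Run a notification script whenever a file system structure error is detected.
# %eventName and %fsName are standard mmaddcallback substitution variables.
mmaddcallback fsstructAlert --command /usr/local/bin/fsstruct-notify.sh \
    --event fsstruct --parms "%eventName %fsName"
```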
- CES, NFS, and SMB protocols: Support for SLES V12 on x86 systems
Cluster Export Services (CES) partially supports SUSE Linux Enterprise Server (SLES) V12 on x86 systems. The SMB and NFS protocols are now supported via a manual installation process.
- Compression support for FPO environments
File compression is expanded to support the File Placement Optimizer (FPO) environment. For the FPO environment, you must set the block group factor to a multiple of 10 to avoid degrading file system performance.
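A brief illustrative example of compressing an existing file with mmchattr (the file path is a placeholder); compression can also be driven by ILM policy rules:

```shell
# Compress a file immediately (-I yes makes the change synchronous).
mmchattr -I yes --compression yes /gpfs/fpofs/colddata/bigfile
```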
- Deadlock management and debug data control
Deadlock management is extended with the following features: * The detection thresholds for deadlocks are automatically adjusted according to waiter length and cluster overloaded status. * New defaults more suitable for customer environments are established for the configuration variables deadlockDataCollectionDailyLimit, deadlockDataCollectionMinInterval, and others. A new configuration variable debugDataControl controls the amount of debug data that is collected. The default setting is a minimal amount of debug information that is the most important for debugging issues.
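The amount of collected debug data can be adjusted with the new variable, as sketched below; the value names are assumed from the 4.2.x documentation:

```shell
# Reduce debug data collection to the essentials ("none", "light", "medium",
# and "heavy" are the assumed documented levels).
mmchconfig debugDataControl=light
mmlsconfig debugDataControl
```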
- /dev/ device for a file system on Linux
On Linux, GPFS no longer creates the /dev/ device for a file system. Applications that relied on the file system device under /dev to be present, or that relied on "/dev" to be displayed in the output of the mount command, must find other ways to obtain the information. As a substitute, consider the information provided by the /etc/fstab file and /proc/mounts entries.
- Encryption: Simplified setup and Vormetric DSM support
* A new console command mmkeyserv greatly simplifies the setup of encryption both on the key server and the client node. IBM Security Key Lifecycle Manager (SKLM) V2.5.0.4 or later (including V2.6) is required. * Encryption support is added for key servers that run Vormetric Data Security Manager (DSM) V5.2.3 or later.
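A hypothetical setup sketch; the exact subcommand syntax may differ by release, and the server name, tenant, client, and RKM ID are placeholders:

```shell
# Register the SKLM key server and create a client for this cluster.
mmkeyserv server add sklm1.example.com
mmkeyserv tenant add devTenant --server sklm1.example.com
mmkeyserv client create c1Client1 --server sklm1.example.com
mmkeyserv client register c1Client1 --tenant devTenant --rkm-id rkm1
```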
- Federation in the performance monitoring tool
A performance monitoring tool installation with multiple collectors is called a federation. Federation is introduced in performance monitoring to increase the scalability or the fault-tolerance of the performance monitoring system.
- Guided installation
The spectrumscale installation toolkit now provides next-step hints that guide customers who are new to IBM Spectrum Scale through an easy workflow for installing and configuring an IBM Spectrum Scale cluster.
- Hadoop Support for IBM Spectrum Scale
* HDFS transparency now supports running the Hadoop Map/Reduce workload inside Docker containers. * Federation is introduced in HDFS to solve the HDFS NameNode scaling problem. * Hadoop distcp is used for data migration from HDFS to the IBM Spectrum Scale file system and between two IBM Spectrum Scale file systems. * HDFS transparency security has been introduced for the simple security mode and the Kerberos mode. * User authentication and authorization are weak in the simple mode: the data transfers and RPCs from the clients to the NameNode and DataNode are not encrypted. The Kerberos mode introduced in the Hadoop ecosystem provides a secure Hadoop environment.
- InfiniBand and RDMA performance
Performance is improved for clusters that use InfiniBand and RDMA for their internode communications network.
- Linux on z Systems: Expanded features
* Quality of Service (QoS) support. * Improved extended count key data (ECKD™) device handling: On different nodes, different bus IDs can refer to the same device. * IBM Spectrum Scale GUI now supported on Linux for z Systems.
- --metadata-only parameter for mmrestripefs
A --metadata-only option for the mmrestripefs command allows the restripe to complete in less time than a full restripe of metadata and data. The savings in time is useful in situations where there is a concern about file system operations and you want to restripe. This operation is supported for migrating data off disks, rebalancing, restoring replication, and comparing replicas.
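An illustrative invocation (the file system name is a placeholder):

```shell
# Restore replication of metadata only; -r restores replication,
# while -b (rebalance) and -c (compare replicas) also accept --metadata-only.
mmrestripefs fs1 -r --metadata-only
```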
- mmhealth: Monitoring services hosted on cluster nodes
A new command, mmhealth, is added to monitor the health status of nodes and the different services hosted on them. The mmhealth command also displays the event logs responsible for the unhealthy status of nodes and services, which helps you analyze and determine the cause of a service failure.
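A short usage sketch; the component name NFS is one example of the services that can be queried:

```shell
# Show the health state of all monitored components on the current node.
mmhealth node show

# Show more detail, including active events, for a single component.
mmhealth node show NFS -v
```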
- Object storage improvements
* Added support for starting and stopping the ibmobjectizer service. * For problem determination, added potential problem scenarios with proposed solutions. * Added support for object encryption. * Added new constraints for unified file and object access. * Added support for simplified enablement of S3. * Added support for multi-region object deployment with a highly available keystone service. * Added support for OpenStack Liberty packages. * Added support to run mmobj commands from any IBM Spectrum Scale client node. * Added monitoring support for the external AD and LDAP servers used for object authentication and for the main object services.
- Quality of Service for I/O operations (QoS) improvements
Quality of Service for I/O operations is expanded to support the File Placement Optimizer (FPO) environment.
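An illustrative sketch of enabling QoS and capping maintenance-class I/O (the file system name and IOPS value are placeholders):

```shell
# Limit maintenance tasks (such as restripe) to 300 IOPS in every pool of fs1.
mmchqos fs1 --enable pool=*,maintenance=300IOPS

# Display the current QoS settings for fs1.
mmlsqos fs1
```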
- Re-create and restore options for protocols cluster failover
The failover procedure can choose between re-create and restore options.
- Re-create and restore options for failing back to an old primary for protocols cluster
When failing back to an old primary, the file protocol configuration can either be re-created or restored.
- Re-create and restore options for failing back to a new primary for protocols cluster
When failing back to a new primary, the file protocol configuration can either be re-created or restored.
- Support for Transparent Cloud Tiering
* The Transparent Cloud Tiering feature leverages the existing ILM policy available in IBM Spectrum Scale, and administrators can define policies to migrate cold data to a cloud storage tier or recall data from the cloud storage tier on reaching certain threshold levels. * A new command, mmcloudgateway, is added to manage and configure the cloud storage tier.
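A hypothetical sketch of manual migrate/recall operations; the subcommands are assumed from the 4.2.x documentation and the file path is a placeholder:

```shell
# Push cold data to the configured cloud storage tier.
mmcloudgateway files migrate /gpfs/fs1/cold/archive.dat

# Recall the file from the cloud tier on demand.
mmcloudgateway files recall /gpfs/fs1/cold/archive.dat
```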
- workerThreads tunes file system performance
The workerThreads parameter of the mmchconfig command controls an integrated group of variables that tune file system performance. Use this variable to tune file systems in environments that are capable of high sequential or random read/write workloads or small-file activity. This variable can be used in any installation and is preferred over worker1Threads and prefetchThreads in new installations.
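A minimal sketch of tuning this parameter; 512 is an illustrative value, and the change takes effect after the GPFS daemon is restarted on each node:

```shell
# Raise workerThreads cluster-wide.
mmchconfig workerThreads=512

# Confirm the configured value.
mmlsconfig workerThreads
```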
- IBM Spectrum Scale GUI changes
* Renamed Monitoring > Topology page to NSDs. The NSDs page facilitates monitoring the status of Network Shared Disks (NSD) and node-to-NSD mapping in the system. * Added new Monitoring > Nodes page in the GUI. The Nodes page provides an easy way to monitor the performance, health status, and configuration aspects of all available nodes in the IBM Spectrum Scale cluster. The properties of a node display the status of various CES services such as Object, NFS, and SMB, as well as the authentication status of these services if they are enabled. It also displays other details such as network status, information on attached NSDs and file systems, and so on. * Monitoring performance of transparent cloud tiering services through Performance and Dashboard pages. * Renamed Monitoring > Performance page to Statistics. * Added capacity monitoring options in the Statistics page. * Added monitoring options for GPFS waiters in the Monitoring > Statistics panel. * You can assign names to dashboards and switch between them. * Dashboards are now stored on the server instead of in the browser, so they can be shared among users and browsers. * Default dashboards are shipped with the GUI. When you open the IBM Spectrum Scale GUI after installation or upgrade, you can see the default dashboards. You can modify or delete the default dashboards to suit your requirements. * Renamed the Download Logs page to Diagnostic Data. The GUI can now be used instead of the gpfs.snap command to collect the details of an issue. * The Files > Information Lifecycle page facilitates defining compression and deletion rules. * The new Settings > Object Service page facilitates starting and stopping object services. * Up to 1000 nodes are supported. * The GUI can now be used in an IBM Spectrum Scale cluster where sudo wrappers are used. * IBM Spectrum Scale GUI support for the z Systems platform is available on RHEL 7.2 and SLES 12.
* By default, GUI commands that change the configuration of the cluster cause an audit message to be sent to syslog. Optionally, an audit message can also be sent to the GPFS log.
- NFS and SMB protocol troubleshooting information added
* New AD Discovery tool to query and validate several AD settings. * New troubleshooting information for NFS mount issues, NFS error events, and NFS error scenarios. * New troubleshooting information for SMB client on Linux failures, SMB mount errors, SMB error events, and SMB access issues.
- Documented commands, structures, and subroutines
The following lists the modifications to the documented commands, structures, and subroutines: New commands mmadquery, mmcloudgateway, mmhealth, mmkeyserv New structures There are no new structures. New subroutines There are no new subroutines. Changed commands gpfs.snap, mmafmlocal, mmcallhome, mmchconfig, mmchnode, mmcesdr, mmcrsnapshot, mmdelsnapshot, mmlscluster, mmnfs, mmobj, mmrestripefs, mmsmb, mmprotocoltrace, mmuserauth Changed structures There are no changed structures. Changed subroutines gpfs_iopen() gpfs_iopen64() Deleted commands There are no deleted commands. Deleted structures There are no deleted structures. Deleted subroutines There are no deleted subroutines. New messages 6027-1826, 6027-2363, 6027-2364, 6027-2365, 6027-2366, 6027-2367, 6027-2368, 6027-2369, 6027-2370, 6027-2371, 6027-2372, 6027-2373, 6027-2374, 6027-2375, 6027-2376, 6027-2377, 6027-2378, 6027-3108, 6027-3720, 6027-3721, 6027-3722, 6027-3723, 6027-3724, 6027-3725, 6027-3726, 6027-3727, 6027-3728, 6027-3915, 6027-3916, 6027-3594, 6027-3595, 6027-3596 Changed messages 6027-1368, 6027-1235, 6027-1545, 6027-2271, 6027-2272, 6027-2273, 6027-2274, 6027-2951 Deleted messages 6027-1997
Problems fixed in IBM Spectrum Scale 4.2.0.4 [June 29, 2016]
- Noticeable performance improvement for Spectrum Scale customers that have NFS enabled, use the GPFS GUI to monitor NFS exports, and have a large number of exports (on the order of thousands).
- Fix a problem where mounting a large number of file systems on a node with a small pagepool may fail or hang due to running out of space for log buffers.
- Fix a kernel panic due to write after free in mmfs26 mmkproc.
- Fix an issue in the AFM environment where recovery, resync, and prefetch operations can fail because of a large number of files to be queued.
- This fix provides improvements for mmap read SMP scalability.
- If asynchronous NFS/NLM locking is used, this fix prevents a potential kernel crash.
- Fix a problem in which policy skips a file that has an 's' in its mode field.
- This fix will try to force through log recovery even when all stripes of a log home vdisk are marked stale (logically unreadable) in the metadata. This will only occur when run under debug control. This applies to GSS & ESS installations.
- This fix will avoid random node reboots when files are updated from different nodes.
- Fix a problem in which CES gets shutdown on the local cluster manager nodes when the remote cluster has quorum loss.
- Fix an assert that was hit when running offline fsck for a file system that has one or more files with multiple trailing duplicate blocks.
- Fix a segmentation fault issue when running the mmsetquota command. This issue would only happen when GPFS overwrite tracing is enabled on Linux.
- Fix a problem in which too much data is dumped when collecting data for deadlocks and expels. This was causing performance issues.
- Fix a problem where, during a stress workload, appends to a small file could cause a kernel panic.
- Fix a daemon crash that can occur while trying to execute a pcache command with maxThrottle set.
- Fix network communication problems that can occur when mmdiag --iohist and overload detection happen at the same time.
- Fix a problem in which no server will try to activate the recoverygroup after a mmchrecoverygroup failure. This fix applies to GSS/ESS customers.
- Fix an alloc segment steal problem that can lead to an assert after more than 22 minutes of searching for a free buffer.
- Fix random memory corruption and kernel crashes in the AFM environment which are likely to happen while deleting a non-empty directory at the home or secondary clusters.
- Fix the readdir performance issue of independent writer mode filesets in the AFM environment. Introduce a new configuration option afmDIO at the fileset level to replicate data from cache to home using direct IO.
- Fix QOS performance issues.
- Fix an mmdiscovercomp failure that can occur if the cluster is configured to use different admin and daemon node names.
- Update log recovery code to better handle certain invalid data that could cause GPFS daemon to die with Signal 11. This change will allow offline mmfsck to run and fix the problem.
- This update addresses the following APARs: IV84196 IV85084 IV86147 IV86154 IV86156 IV86158 IV86159 IV86160 IV86161 IV86162 IV86163 IV86164 IV86175 IV86176 IV86177 IV86178 IV86181 IV86182 IV86183.