Power8 System Firmware

Applies to: 9119-MHE, 9119-MME, 9080-MHE and 9080-MME.

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.

1.0 Systems Affected
1.1 Minimum HMC Code Level
1.2 AIX iFix Required
2.0 Important Information
2.1 IPv6 Support and Limitations
2.2 Concurrent Firmware Updates
2.3 DPSS Updates
2.4 Memory Considerations for Firmware Upgrades
3.0 Firmware Information
3.1 Firmware Information and Description Table
4.0 How to Determine Currently Installed Firmware Level
5.0 Downloading the Firmware Package
6.0 Installing the Firmware
7.0 Firmware History
8.0 Change History Revised (11/27/2017)

1.0 Systems Affected

This package provides firmware for Power System E880 (9119-MHE), Power Systems E880C (9080-MHE), Power System E870 (9119-MME) and Power Systems E870C (9080-MME) servers only.

The firmware level in this package is:

SC860_103 / FW860.30

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" required by the System Firmware to complete the firmware installation process. The HMC level must be equal to or higher than the "Minimum HMC Code Level" before the system firmware update is started. If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code level for this firmware is: HMC V8 R8.6.0 (PTF MH01654) with Mandatory efix (PTF MH01655) or higher.

Although the Minimum HMC Code level for this firmware is listed above, HMC V8 R8.6.0 Service Pack 1 (PTF MH01656) with iFix (PTF MH01703) or higher is recommended.
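
The level comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the string format is assumed to follow the "V8 R8.6.0" pattern used in this section, and the required PTFs (MH01654, MH01655) are not checked.

```python
import re

def parse_hmc_level(text):
    """Parse a level string such as 'V8 R8.6.0' into a tuple of integers."""
    match = re.search(r"V(\d+)\s*R(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized HMC level: {text!r}")
    return tuple(int(part) for part in match.groups())

# Minimum level stated in this section (PTFs are NOT checked by this sketch).
MINIMUM_HMC_LEVEL = parse_hmc_level("V8 R8.6.0")

def meets_minimum(installed, minimum=MINIMUM_HMC_LEVEL):
    """Return True if the installed level is equal to or higher than the minimum."""
    return parse_hmc_level(installed) >= minimum

if __name__ == "__main__":
    for level in ("V8 R8.5.0", "V8 R8.6.0", "V8 R8.7.0"):
        print(level, meets_minimum(level))
```

A level that passes this comparison still needs the mandatory PTFs listed above before the firmware update is started.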

For information concerning HMC releases and the latest PTFs, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home

NOTES:
- You must be logged in as hscroot in order for the firmware installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this System Firmware level.

1.2 AIX iFix Required

For IBM Power System servers with the PCIe 2-port Async EIA-232 Adapter installed on AIX partitions, an AIX fix resolving the async port interrupt handling (APAR IV77596) must be installed before updating to the SC840_056 (FW840.00) or later level of firmware. The ports on the adapter (feature code EN27/EN28, CCIN 57D4) may become unusable with the installation of that firmware level due to an issue with how interrupts are handled. Many JAS_RTS error log entries are written to the error log due to this issue.

Prior to this APAR shipping in a future Service Pack, AIX intends to publish ifixes for the latest Service Packs on all active Technology Levels on our ftp server, in ftp://aix.software.ibm.com/aix/ifixes/iv77596/ on or before Oct 13, 2015. If you need an ifix other than the ones on this server, contact IBM support to request one for your specific situation.

This procedure is intended to be performed by the customer. If you have questions or concerns about the procedure, contact IBM Support:
US Support: 1.800.IBM.SERV
WW Support (select your country): http://www.ibm.com/planetwide/

2.0 Important Information

Downgrading firmware from any given release level to an earlier release level is not recommended.

If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.

2.1 IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

2.3 DPSS Updates

Power8 servers use a programmable power controller called a DPSS (Digital Power Subsystem Sweep), which is located in each system node. The DPSS is used to control P8 fan speeds and to check the power supplies in the system node for proper voltage levels and operation. The DPSS image is persistent and is only reloaded if a system firmware update contains a DPSS change. If there is a DPSS change and the system firmware update is concurrent, the DPSS update is delayed until the next IPL of the CEC, which adds 18 to 20 minutes to the IPL. If there is a DPSS change and the firmware update is disruptive, the DPSS update occurs when the service processor is resetting to the service processor standby state, and adds 18 to 20 minutes to this transition. During the DPSS update, the HMC or op-panel displays DPSS update progress codes in the range C100C300 through C100C3FF, although these codes may be overwritten on the HMC. If there is a DPSS change in a system firmware service pack, the change is designated as deferred in the service pack README. DPSS changes are described, along with a reminder of the additional 18 to 20 minutes, in the Firmware Information and Description section of the README.

The DPSS download progress codes are documented in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER8/p8eai/C1xx_info.htm
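
When reviewing progress codes captured from the HMC or op-panel, a small helper can flag the ones that fall in the DPSS update range given above (C100C300 through C100C3FF). The following is a minimal sketch; the function name is hypothetical and the codes are assumed to be plain 8-digit hexadecimal strings.

```python
# Range of DPSS update progress codes stated in section 2.3.
DPSS_FIRST = 0xC100C300
DPSS_LAST = 0xC100C3FF

def is_dpss_progress_code(code):
    """Return True if the hex string falls within C100C300..C100C3FF."""
    try:
        value = int(code.strip(), 16)
    except ValueError:
        # Not a hexadecimal string at all.
        return False
    return DPSS_FIRST <= value <= DPSS_LAST

if __name__ == "__main__":
    for code in ("C100C300", "C100C3A0", "C100C400"):
        print(code, is_dpss_progress_code(code))
```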

2.4 Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:

Number of logical partitions
Partition environments of the logical partitions
Number of physical and virtual I/O devices used by the logical partitions
Maximum memory values given to the logical partitions

Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.
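
As a rough worked example of the 8% guideline, the estimate can be computed as follows. This is a sketch for planning purposes only: the function name is hypothetical, the result is an approximation that the actual usage will generally stay below, and it does not account for the model-specific absolute minimums mentioned above.

```python
def estimate_firmware_memory_gb(installed_gb, fraction=0.08):
    """Approximate the memory used by server firmware as a fraction of installed memory."""
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction

if __name__ == "__main__":
    # Example: a system with 1024 GB of installed memory.
    print(f"{estimate_firmware_memory_gb(1024):.2f} GB")
```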

Additional information can be found at:
http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8hat/p8hat_lparmemory.htm

3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

01SCxxx_yyy_zzz

xxx is the release level
yyy is the service pack level
zzz is the last disruptive service pack level

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01SC830_040_040 and 01SC860_040_045 are different service packs.

An installation is disruptive if:

The release levels (xxx) are different.

Example: Currently installed release is 01SC850_040_040, new release is 01SC860_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

Example: SC830_040_040 is disruptive, no matter what level of SC830 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is SC830_040_040 and new service pack is SC830_050_045.

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is SC830_040_040, new service pack is SC830_071_040.
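
The rules above can be expressed as a short Python sketch. The function and type names are hypothetical, the input is assumed to follow the 01SCxxx_yyy_zzz naming convention (with or without the leading "01" and the ".rpm" suffix), and the system is assumed to be HMC managed, since installation on systems without an HMC is always disruptive.

```python
import re
from typing import NamedTuple

class Level(NamedTuple):
    release: int          # xxx
    service_pack: int     # yyy
    last_disruptive: int  # zzz

def parse_level(name):
    """Parse names such as '01SC860_103_056.rpm' or 'SC830_040_040'."""
    match = re.fullmatch(r"(?:01)?SC(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?", name)
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return Level(*(int(group) for group in match.groups()))

def classify_install(installed, new):
    """Return 'disruptive' or 'concurrent' for an HMC-managed system."""
    current = parse_level(installed)
    target = parse_level(new)
    if current.release != target.release:
        return "disruptive"  # release levels (xxx) differ
    if target.service_pack == target.last_disruptive:
        return "disruptive"  # yyy and zzz of the new level are the same
    if current.service_pack < target.last_disruptive:
        return "disruptive"  # installed yyy is below the new zzz
    return "concurrent"

if __name__ == "__main__":
    print(classify_install("SC830_040_040", "SC830_050_045"))
    print(classify_install("SC830_040_040", "SC830_071_040"))
```

The two calls at the end reproduce the last disruptive example and the concurrent example from this section.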

3.1 Firmware Information and Description


*Filename*	*Size*	*Checksum*
01SC860_103_056.rpm	85416035	14129

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01SC860_103_056.rpm
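
If the package has been downloaded to a workstation without the AIX sum command, the checksum can be approximated in Python. The sketch below assumes that the default AIX sum algorithm is the byte-by-byte rotating 16-bit checksum (the same one used by BSD sum); that assumption should be verified against the AIX documentation. The function names are hypothetical.

```python
def bsd_sum_update(checksum, data):
    """Fold bytes into a 16-bit rotating checksum (BSD 'sum' style)."""
    for byte in data:
        # Rotate right by one bit within 16 bits, then add the next byte.
        checksum = (checksum >> 1) | ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum

def bsd_sum_file(path, chunk_size=1 << 16):
    """Compute the rotating checksum of a file, reading it in chunks."""
    checksum = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = bsd_sum_update(checksum, chunk)
    return checksum

if __name__ == "__main__":
    import sys
    # Usage: python checksum.py 01SC860_103_056.rpm
    print(bsd_sum_file(sys.argv[1]))
```

The printed value can then be compared with the Checksum column in the table above. A pure-Python loop is slow for a file of this size, so expect it to take noticeably longer than the native command.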

SC860
For Impact, Severity and other firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The following fix description table contains only the N (current) and N-1 (previous) levels.

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html

SC860_103_056 / FW860.30
06/30/17
Impact: Availability
Severity: SPE

New features and functions

- Support was added for the Redfish API to allow the ISO 8601 extended format for the time and date, so that the date and time can be represented as an offset from UTC (Coordinated Universal Time).

- Support was added to the Redfish API for power and thermal properties for the chassis. The new URIs are as follows:
  https://<fsp ip>/redfish/v1/Chassis/<id>/Power : Provides power supply data
  https://<fsp ip>/redfish/v1/Chassis/<id>/Thermal : Provides fan data
  Only the Redfish GET operation is supported for these resources.

System firmware changes that affect all systems

- A problem was fixed for service actions with SRC B150F138 missing an Advanced System Management Interface (ASMI) Deconfiguration Record. The deconfiguration records make it easier to organize the repairs that are needed for the system, and they need to be consistent with the periodic maintenance reminders that are logged for the failed FRUs.

- A problem was fixed for a false 1100026B1 (12V power good failure) caused by an I2C bus write error for a LED state. This error can be triggered by the fan LEDs changing state.

- A problem was fixed for a fan LED turning solid amber when there is no fan fault, or when the fan fault is for a different fan. This error can be triggered anytime a fan LED needs to change its state. The fan LEDs can be recovered to a normal state concurrently using the steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm

- A problem was fixed for sporadic blinking amber LEDs for the system fans with no SRCs logged. There was no problem with the fans. The LED corruption occurred when two service processor tasks attempted to update the LED state at the same time.
The fan LEDs can be recovered to a normal state concurrently using the steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm

- A problem was fixed for a Redfish PATCH on the "Chassis" or "IBMEnterpriseComputerSystem" with empty data that caused a "500 Internal Server Error". Validation for the empty data case has been added to prevent the server error.

- A problem was fixed for hardware dumps only collecting data for the master processor if a run-time service processor failover had occurred prior to the dump. Therefore, there would be only master chip and master core data in the event of a core unit checkstop. To recover to a system state that is able to do a full collection of debug data for all processors and cores after a run-time failover, a re-IPL of the system is needed.

- A problem was fixed for a Redfish PATCH on power mode to "MaxPowerSaver" that caused a "500 Internal Server Error" when that power mode was not supported on the system. With the fix, the Redfish server response is a list of the valid power modes that can be used for the system.

- A problem was fixed for the loss of Operations Panel function 30 (displaying ethernet port HMC1 and HMC2 IP addresses) after a concurrent repair of the Operations Panel. Operations Panel function 30 can be restored concurrently using the steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm

- A problem was fixed for a core dump of the rtiminit (service processor time of day) process that logs an SRC B15A3303 and could invalidate the time on the service processor. If the error occurs while the system is powered on, the hypervisor has the master time and will refresh the service processor time, so no action is needed for recovery.
If the error occurs while the system is powered off, the service processor time must be corrected on systems having only a single service processor. Use the following steps from the IBM Knowledge Center to change the UTC time with the Advanced System Management Interface: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hby/viewtime.htm

- A problem was fixed for the service processor boot watch-dog timer expiring too soon during DRAM initialization in the reset/reload, causing the service processor to go unresponsive. On systems with a single service processor, the SRC B1817212 was displayed on the control panel. For systems with redundant service processors, the failing service processor was deconfigured. To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action. This problem is intermittent and very infrequent, as most of the reset/reloads of the service processor will work correctly to restore the service processor to a normal operating state.

- A problem was fixed for host-initiated resets of the service processor causing the system to terminate. A prior fix for this problem did not work correctly because some of the host-initiated resets were being translated to unknown reset types that caused the system to terminate. With this new correction for failed host-initiated resets, the service processor will still be unresponsive but the system and partitions will continue to run. On systems with a single service processor, the SRC B1817212 will be displayed on the control panel. For systems with redundant service processors, the failing service processor will be deconfigured. To recover the failed service processor, the system will need to be powered off with AC power removed during a regularly scheduled system service action.
This problem is intermittent and very infrequent, as most of the host-initiated resets of the service processor will work correctly to restore the service processor to a normal operating state.

- A problem was fixed for a service processor reset triggered by a spurious IIC interrupt request in the kernel. On systems with a single service processor, the SRC B1817201 is displayed on the Operator Panel. For systems with redundant service processors, an error failover to the backup service processor occurs. The problem is extremely infrequent and does not impact processes on the running system.

- A problem was fixed for the System Attention LED failing to light for an error failover for the redundant service processors with an SRC B1812028 logged.

- A problem was fixed for a system failure at run time with SRC B111E450 corefir(55) that could not reIPL. A system node should have been deconfigured for an ABUS error on a processor chip but, instead, the system was terminated. To recover from this problem, manually guard the node containing the failed processor and then the IPL will be successful.

- A problem was fixed for an incorrect Redfish error message when trying to use the $metadata URI: "The resource at the URI https://<systemip>/redfish/v1/%24metadata was not found.". The "%24" is meaningless and has been replaced with a "$" in the error message. The Redfish $metadata URI is not supported.

- A problem was fixed for a system failure caused by Hostboot problems with one node while the other nodes are good. With the fix, the node that is failing the Hostboot is deconfigured and the system is able to IPL on the remaining nodes. To recover from this problem, manually guard the node that is failing and reIPL.

System firmware changes that affect certain systems

- DEFERRED: On systems using PowerVM firmware, a problem was fixed to improve link stability for the PCIe3 I/O expansion drawer (#EMX0).
The settings for the continuous time linear equalizers (CTLE) were updated for all the PCIe adapters for the PCIe links to the expansion drawer. The system must be re-IPLed for the fix to activate.

- On systems using PowerVM firmware with a Linux Little Endian (LE) partition, a problem was fixed for system reset interrupts returning the wrong values in the debug output for the NIP and MSR registers. This problem reduces the ability to debug hung Linux partitions using system reset interrupts. The error occurs every time a system reset interrupt is used on a Linux LE partition.

- On systems using PowerVM firmware, a problem was fixed for "Time Power On" enabled partitions not being capable of suspend and resume operations. This means Live Partition Mobility (LPM) would not be able to migrate this type of partition. As a workaround, the partition could be transitioned to a "Non-time Power On" state and then made capable of suspend and resume operations.

- On systems using PowerVM firmware, a problem was fixed for manual vNIC failovers (from the HMC, manually "Make the Backing Device Active") so that the selected server was chosen for the failover, regardless of its priority. With the problem, the server chosen for the vNIC failover will be the one with the most favorable priority. There are two possible workarounds to the problem:
  (1) Disable auto-priority-failover; change priority to the server that is needed as the target of the failover; force the vNIC failover; change priority back to the original setting.
  (2) Use auto-priority-failover and change the priority so the server that is needed as the target of the failover is favored.

- On systems using PowerVM firmware, a problem was fixed for extra error logs in the VIOS due to failovers taking place while the client vNIC is inactive. The inactive client vNIC failovers are skipped unless the force flag is on.
With the problem occurring, Enhanced Error Handling (EEH) Freeze/Temporary Error/Recovery logs posted in the VIOS error log of the client partition boot can be ignored unless an actual problem is experienced.

- On systems using PowerVM firmware, a problem was fixed for a Live Partition Mobility (LPM) migration abort and reboot on the FW860 target CEC caused by a mismatched address space for the source and target partition. The occurrence of this problem is very rare and related to performance improvements made in the memory management on the FW860 system that exposed a timing window in the partition memory validation for the migration. The reboot of the migrated partition recovers from the problem as the migration was otherwise successful.

- On systems using PowerVM firmware, a problem was fixed for reboot retries for IBM i partitions such that the first load source I/O adapter (IOA) is retried instead of bypassed after the first failed attempt. The reboot retries are done for an hour before the reboot process gives up. This error can occur if there is more than one known load source, and the IOA of the first load source is different from the IOA of the last load source. The error can be circumvented by retrying the boot of the partition after the load source device has become available.

- On systems using PowerVM firmware, a problem was fixed for adapters failing to transition to shared SR-IOV mode on the IPL after changing the adapter from dedicated mode. This intermittent problem could occur on systems using SR-IOV with very large memory configurations.

- On systems using PowerVM firmware, a problem was fixed for SR-IOV adapters in shared mode for a transmission stall or time out with SRC B400FF01 logged. The time out happens during Virtual Function (VF) shutdowns and during Function Level Resets (FLRs) with network traffic running.
This fix updates adapter firmware to 10.2.252.1927, for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57. The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters. A system reboot will update all SR-IOV shared-mode adapters with the new firmware level. In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced). Lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC). To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm
Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running).

- On systems with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), a problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from overheating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged. This happened because of On-Chip Controller (OCC) timeout errors when collecting Analog Power Subsystem Sweep (APSS) data, used by the OCC to tune the processor frequency. This problem occurs more frequently on systems that are running heavy workloads.
Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
Checking or validating that Safe mode is not active on the system requires a dynamic celogin password from IBM Support to use the service processor command line:
  1) Log into ASMI as celogin with the dynamic celogin password generated by IBM Support
  2) Select System Service Aids
  3) Select Service Processor Command Line
  4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode", should say "NOMINAL", which means the system is in normal mode and Safe mode is not active.

- A problem has been fixed for systems losing performance and going into Safe mode (a power mode with reduced processor frequencies intended to protect the system from overheating and excessive power consumption) with B1xx2AC3/B1xx2AC4 SRCs logged. This happened because of an On-Chip Controller (OCC) internal queue overflow. The problem has only been observed for systems running heavy workloads with maximum memory configurations (where every DIMM slot is populated - size of DIMM does not matter), but this may not be required to encounter the problem. Recovery from Safe mode back to normal performance can be done with a re-IPL of the system, or concurrently using the steps at the following link for a soft reset of the service processor: https://www.ibm.com/support/knowledgecenter/POWER8/p8hby/p8hby_softreset.htm
Checking or validating that Safe mode is not active on the system requires a dynamic celogin password from IBM Support to use the service processor command line:
  1) Log into ASMI as celogin with the dynamic celogin password generated by IBM Support
  2) Select System Service Aids
  3) Select Service Processor Command Line
  4) Enter "tmgtclient --query_mode_and_function" from the command line
The first line of the output, "currSysPwrMode", should say "NOMINAL", which means the system is in normal mode and Safe mode is not active.

- On systems using PowerVM firmware, a problem was fixed for a partition boot from a USB 3.0 device that has an error log SRC BA210003. The error is triggered by an Open Firmware entry to the trace buffer during the partition boot. The error log can be ignored as the boot is successful to the OS.

- On systems using PowerVM firmware, a problem was fixed for a partition boot fail or hang from a Fibre Channel device having fabric faults. Some of the fabric errors returned by the VIOS are not interpreted correctly by the Open Firmware VFC driver, causing the hang instead of generating helpful error logs.

- On systems with redundant service processors, a problem was fixed for an extra SRC B150F138 logged for a power supply that had already been replaced. The problem was triggered by a service processor failover and an old power supply fault event that was not cleared on the backup service processor. This caused the SRC B150F138 to be logged for a second time. This problem can be circumvented by clearing the error log associated with the bad FRU when the FRU is replaced.

- On systems using PowerVM firmware, a problem was fixed for a Power Enterprise Pool (PEP) resource Grace Period not being reset when the server is in the "Out of Compliance" state and the resource has been returned to put the server back in Compliance.
The Grace Period was not being reset after a double-commit of a resource (doing a "remove" of an active resource) was resolved by restarting the server with the double-committed resource. When the Grace Period ends, the "double-committed" resources on the server must have been freed up from use to prevent the server from going to "Out of Compliance". If the user fails to free up the resource, the PEP is in an "Out of Compliance" state, and the only PEP actions allowed are ones to free up the double-commit. Once that is completed, the PEP is back In Compliance. The loss of the Grace Period for the error makes it difficult to move resources around in the PEP. Without the fix, the user can "Add" another PEP resource to the server, and the action of adding a PEP resource resets the Grace Period timer. One could then "Remove" the PEP resource just added, and then any further "removes" of PEP resources would behave as expected with the full Grace Period in effect.

- On systems using PowerVM firmware, a problem was fixed for Power Enterprise Pool (PEP) IFL processor assignments causing an "Out of Compliance" for normal processor licenses. The number of IFL processors purchased was first credited as satisfying any "unreturned" PEP processor resources, thus potentially leaving the system "Out Of Compliance", since IFL processors should not be taking the place of the normal (expensive) processor usage. In this situation, without the fix, the user will need to either purchase more "expensive" non-IFL processors to satisfy the non-IFL workloads or adjust the partitions to reduce the usage of non-IFL processors. This is a very infrequent problem for the following reasons:
  1) PEP processors are infrequently left "unreturned" for short periods of time for specialized operations such as LPM migrations.
  2) The user would have to purchase IFL processors from IBM, which is not a common occurrence.
  3) The user would have to put in a COD key for IFL processors while a PEP processor is still "unreturned".

- On systems using PowerVM firmware, a problem was fixed for a power off hanging at D200C1FF caused by a vNIC VF failover error with SRC B200F011. The power off hang error is infrequent because it requires that a VF failover error has occurred first. The system can be recovered by using the power off immediate option from the Hardware Management Console (HMC).

- On systems using PowerVM firmware, a problem was fixed for the incorrect reporting of the Universally Unique Identifier (UUID) to the OS, which prevented the tracking of a partition as it moved within a data center. The UUID value as seen on the HMC or the NovaLink did not match the value as displayed in the OS.

- On systems using PowerVM firmware, a problem was fixed for an error finding the partition load source that has a GPT format. GUID Partition Table (GPT) is a standard for the layout of the partition table on a physical storage device used in the server, such as a hard disk drive or solid-state drive, using globally unique identifiers (GUID). Other drives that are working may be using the older master boot record (MBR) partition table format. This problem occurs whenever load sources utilizing the GPT format occur in other than the first entry of the boot table. Without the fix, a GPT disk drive must be the first entry in the boot table to be able to use it to boot a partition.

- On systems using PowerVM firmware, a problem was fixed for an SRC BA090006 serviceable event log occurring whenever an attempt was made to boot from an ALUA (Asymmetric Logical Unit Access) drive. These drives are always busy by design and cannot be used for a partition boot, but no service action is required if a user inadvertently tries to do that. Therefore, the SRC was changed to be an informational log.
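
The Redfish chassis Power and Thermal URIs introduced in SC860_103_056 can be exercised with a small client. The following Python sketch only illustrates the URI layout and a GET request: the function names, the IP address and the chassis id are hypothetical, and authentication and certificate handling for the service processor are deliberately left out because they depend on the installation.

```python
import json
import urllib.request

def chassis_resource_urls(fsp_ip, chassis_id):
    """Build the Power and Thermal URIs for one chassis."""
    base = f"https://{fsp_ip}/redfish/v1/Chassis/{chassis_id}"
    return {"Power": f"{base}/Power", "Thermal": f"{base}/Thermal"}

def get_resource(url, timeout=30):
    """Issue a GET (the only operation supported for these resources)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )
    # Credentials and TLS settings are omitted in this sketch.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)

if __name__ == "__main__":
    # Hypothetical address and chassis id, for illustration only.
    for name, url in chassis_resource_urls("192.0.2.10", "1").items():
        print(name, url)
```

On systems with redundant service processors, the Redfish service is accessible only on the primary service processor, so the address used should be that of the primary.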

SC860_082_056 / FW860.20
03/17/17
Impact: Availability
Severity: SPE

New features and functions

- Support for the Redfish API for provisioning of Power Management tunable (EnergyScale) parameters. The Redfish Scalable Platforms Management API ("Redfish") is a DMTF specification that uses RESTful interface semantics to perform out-of-band systems management (http://www.dmtf.org/standards/redfish). The Redfish service enables platform management tasks to be controlled by client scripts developed using secure and modern programming paradigms.
  For systems with redundant service processors, the Redfish service is accessible only on the primary service processor. Usage information for the Redfish service is available at the following IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER8/p8hdx/p8_workingwithconsoles.htm
  The IBM Power server supports DMTF Redfish API (DSP0266, version 1.0.3 published 2016-06-17) for systems management.
  A copy of the Redfish schema files in JSON format published by the DMTF (http://redfish.dmtf.org/schemas/v1/) is packaged in the firmware image. The schema files are distributed on chip to enable proper functioning in deployments with no WAN connectivity.
  IBM extensions to the Redfish schema are published at http://public.dhe.ibm.com/systems/power/redfish/schemas/v1
  Copyright notices for the DMTF Redfish API and schemas are at: (a) http://www.dmtf.org/about/policies/copyright, and (b) http://redfish.dmtf.org/schemas/README8010.html

- Support added to reduce memory usage for shared SR-IOV adapters.

- Support for the Advanced System Management Interface (ASMI) was changed to allow the special characters of "I", "O", and "Q" to be entered for the serial number of the I/O Enclosure under the Configure I/O Enclosure option. These characters have only rarely been found in an IBM serial number, so typing in these characters will normally be an incorrect action.
However, ASMI no longer blocks entry of these special characters, so the exception case is supported. Without the enhancement, the typing of one of the special characters causes the message "Invalid serial number" to be displayed. Support was added to the Advanced System Management Interface (ASMI) "System Service Aids => Cable Validation" to add a timestamp showing when the cables were last validated. System firmware changes that affect all systems A problem was fixed for disabling the periodic notification for a call home error log SRC B150F138 for Memory Buffer resources (membuf) from the Advanced System Management Interface (ASMI). A problem was fixed for the call home data for the B1xx2A01 SRC to include the min/max/average readings for more values. The values for processor utilization, memory utilization, and node power usage were added. A problem was fixed for incorrect callouts of the Power Management Controller (PMC) hardware with SRC B1112AC4 and SRC B1112AB2 logged. These extra callouts occur when the On-Chip Controller (OCC) has placed the system in the safe state for a prior failure that is the real problem that needs to be resolved. A problem was fixed for System Vital Product Data (SVPD) FRUs being guarded but not having a corresponding error log entry. This is a failure to commit the error log entry that has occurred only rarely. A problem was fixed for the failover to the backup PNOR on a Hostboot Self Boot Engine (SBE) failure. Without the fix, the failed SBE causes loss of processors and memory with B15050AD logged. With the fix, the SBE is able to access the backup PNOR and IPL successfully by deconfiguring the failing PNOR and calling it out as a failed FRU. A problem was fixed for the Advanced System Management Interface (ASMI) "System Service Aids => Error/Event Logs" panel not showing the "Clear" and "Show" log options and also having a truncated error log when there are a large number of error logs on the system.
A problem was fixed for a system going into safe mode with SRC B1502616 logged as informational without a call home notification. Notification is needed because the system is running with reduced performance. If there are unrecoverable error logs and any are marked with reduced performance and the system has not been rebooted, then the system is probably running in safe mode with reduced performance. With the fix, the SRC B1502616 is an Unrecoverable Error (UE). A problem was fixed for valid IPv4 static IP addresses not being allowed to communicate on the network and not being allowed to be configured. The Advanced System Management Interface (ASMI) static IPv4 address configuration was not allowing "255" in the IP address subfields. The corrected range checking is as follows: Allowed values: x.255.x.x, x.x.255.x, x.255.255.x Disallowed values: x.x.x.255 The failure for the communication on the network is seen if the problematic IP addresses are in use prior to a firmware update to 860.00, 860.10, 860.11, or 860.12. After the firmware update, the service processor is unable to communicate on the network. The problem can be circumvented by changing the service processor to use DHCP addressing, or by moving the IP address to a different static IP range, prior to doing the firmware update. A problem was fixed for corrupt service processor error log entries caused by incorrect error log synchronization between primary and backup service processor during firmware updates. At the time of the corruption a B1818601 is logged with a fipsdump generated. Then during normal operations, periodic B1818A12 SRC may be logged as the corrupted error log entries are encountered. No service action is needed for the corrupted error logs as the old corrupted entries will be deleted as new error logs are added as part of the error log housekeeping. A problem was fixed for an unneeded service action request for an informational VRM redundant phase fail error logged with SRC 11002701.
If reminders for service action with SRC B150F138 are occurring for this problem, then firmware containing the fix needs to be installed and ASMI error logs need to be cleared in order to stop the periodic reminder. System firmware changes that affect certain systems On systems using PowerVM firmware, a problem was fixed for a blank SRC in the LPA dump for user-initiated non-disruptive adjunct dumps. The A2D03004 SRC is needed for problem determination and dump analysis. On a system using PowerVM firmware with an IBM i partition and VIOS, a problem was fixed for a Live Partition Mobility migration for an IBM i partition that fails if there is a VIOS failover during the migration suspended window. On a system using PowerVM firmware and VIOS, a problem was fixed for an HMC "Incomplete State" after a Live Partition Mobility migration followed by a VIOS failover. The error is triggered by a delete operation on a migration adapter on the VIOS that did the failover. The HMC "Incomplete State" can be recovered from by doing a re-IPL of the system. This error can also prevent a VIOS from activating. On systems using PowerVM firmware, a problem was fixed with SR-IOV adapter error recovery where the adapter is left in a failed state in nested error cases for some adapter errors. The probability of this occurring is very low since the problem trigger is multiple low-level adapter failures. With the fix, the adapter is recovered and returned to an operational state. On systems using PowerVM firmware with PCIe adapters in Single Root I/O Virtualization (SR-IOV) shared mode, a problem was fixed for the hypervisor SR-IOV adjunct partition failing during the IPL with SRCs B200F011 and B2009014 logged. The SR-IOV adjunct partition successfully recovers after it reboots and the system is operational.
On systems using PowerVM firmware with PCIe adapters in Single Root I/O Virtualization (SR-IOV) shared-mode in a PCIe slot with Enlarged IO Capacity and 2TB or more of system memory, a problem was fixed for the hypervisor SR-IOV adjunct partition failing during the IPL with SRCs B200F011 and B2009014 logged. In this configuration, it is possible the SR-IOV adapter will not become functional following a system reboot or when an adapter is first configured into shared-mode. Larger system memory configurations of 2TB or more than 1TB are more likely to encounter the problem. The problem can be avoided by reducing the number of PCIe slots with Enlarged IO Capacity enabled so it does not include adapters in SR-IOV shared-mode. Another circumvention option is to move the adapter to an SR-IOV capable PCIe slot where Enlarged IO Capacity is not enabled. On a system using PowerVM firmware and VIOS, a problem was fixed for a Live Partition Mobility (LPM) migration for an Active Memory Sharing (AMS) partition that hangs if there is a VIOS failover during the migration. On systems using PowerVM firmware, a problem was fixed for the PCIe3 Optical Cable Adapter for the PCIe3 Expansion Drawer failing with SRC B7006A84 error logged during the IPL. The failed cable adapter can be recovered by using a concurrent repair operation to power it off and on. Or the system can be re-IPLed to recover the cable adapter. The affected optical cable adapters have feature codes #EJ05, #EJ06, and #EJ08 with CCINs 2B1C, 6B52, and 2CE2, respectively. On systems using PowerVM firmware, the hypervisor "vsp" macro was enhanced to show the type of the adjunct partition. The "vsp -longname" macro option was also updated to list the location codes for the SR-IOV adjunct partitions. The hypervisor macros are used by IBM support to help debug Power system problems. 
On systems using PowerVM firmware, a problem was fixed for PCIe Host Bridge (PHB) outages and PCIe adapter failures in the PCIe I/O expansion drawer caused by error thresholds being exceeded for the LEM bit [21] errors in the FIR accumulator. These are typically minor and expected errors in the PHB that occur during adapter updates and do not warrant a reset of the PHB and the PCIe adapter failures. Therefore, the threshold LEM[21] error limit has been increased and the LEM fatal error has been changed to a Predictive Error to avoid the outages for this condition. On systems using PowerVM firmware, a problem was fixed for improved link stability for the PCIe3 I/O expansion drawer (#EMX0). The settings for the continuous time linear equalizers (CTLE) were updated for all the PCIe adapters for the PCIe links to the expansion drawer. The CEC must be re-IPLed for the fix to activate. On systems using PowerVM firmware with IBM i partitions, a problem was fixed for frequent logging of informational B7005120 errors due to communications path closed conditions during messaging from HMCs to IBM i partitions. In the majority of cases these errors are due to normal operating conditions and not due to errors that require service or attention. The logging of informational errors due to this specific communications path closed condition that are the result of normal operating conditions has been removed. On a system using PowerVM firmware with an IBM i partition, a problem was fixed for a D-mode boot failure for IBM i from a USB RDX cartridge. There is a hang at the LPAR progress code C2004130 for a period of time and then a failure with SRC B2004158 logged. There is a USB External Dock (FC #EU04) and Removable Disk Cartridge (RDX) 63B8-005 attached. The error is intermittent so the RDX can be powered off and back on to retry the D-mode boot to recover.
On systems using PowerVM firmware, the following problems were fixed for SR-IOV adapters: 1) Insufficient resources reported for SR-IOV logical port configured with promiscuous mode enabled and a Port VLAN ID (PVID) when creating a new interface on the SR-IOV adapters. 2) Spontaneous dumps and reboot of the adjunct partition for SR-IOV adapters. 3) Adapter enters firmware loop when single bit ECC error is detected. System firmware detects this condition as an adapter command timeout. System firmware will reset and restart the adapter to recover the adapter functionality. This condition will be reported as a temporary adapter hardware failure. 4) vNIC interfaces not being deleted correctly causing SRC B400FF01 to be logged and Data Storage Interrupt (DSI) errors with failure on boot of the LPAR. This set of fixes updates adapter firmware to 10.2.252.1926, for the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EN0M, EN0N, EN0K, EN0L, EL38, EL3C, EL56, and EL57. The SR-IOV adapter firmware level update for the shared-mode adapters happens under user control to prevent unexpected temporary outages on the adapters. A system reboot will update all SR-IOV shared-mode adapters with the new firmware level. In addition, when an adapter is first set to SR-IOV shared mode, the adapter firmware is updated to the latest level available with the system firmware (and it is also updated automatically during maintenance operations, such as when the adapter is stopped or replaced). And lastly, selective manual updates of the SR-IOV adapters can be performed using the Hardware Management Console (HMC). To selectively update the adapter firmware, follow the steps given at the IBM Knowledge Center for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are currently running in dedicated mode and assigned to a partition, can be updated concurrently either by the OS that owns the adapter or the managing HMC (if OS is AIX or VIOS and RMC is running). On systems using PowerVM firmware with an IBM i partition, a problem was fixed for incorrect maximum performance reports based on the wrong number of "maximum" processors for the system. Certain performance reports that can be generated on IBM i systems contain not only the existing machine information, but also "what-if" information, such as "how would this system perform if it had all the processors possible installed in this system". This "what-if" report was in error because the maximum number of processors possible was too high for the system. On systems using PowerVM firmware, a problem was fixed for degraded PCIe3 links for the PCIe3 expansion drawer with SRC B7006A8F not being visible on the HMC. This occurred because the SRC was informational. The problem occurs when the link attaching a drawer to the system trains to x8 instead of x16. With the fix, the SRC has been changed to a B70006A8B permanent error for the degraded link. On systems using PowerVM firmware, a problem was fixed for a concurrent exchange of a CAPI adapter that left the new adapter in a deactivated state. The system can be powered off and IPLed again to recover the new adapter. The CAPI adapters have the following feature codes: #EC3E, #EC3F, #EC3L, #EC3M, #EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B. On a system using PowerVM firmware with SR-IOV adapters, a problem was fixed for a DLPAR remove on a Virtual Function (VF) of a ConnectX-4 (CX4) adapter that failed with AIX error "0931-013 Unable to isolate the resource". The HMC reported error is "HSCL12B5 The operation to remove SR-IOV logical port xx failed because of the following error: HSCL131D The SR-IOV logical port is still in use by the partition".
The failing PCIe3 adapters are sourced from Mellanox Corporation based on ConnectX-4 technology and have the following feature codes and CCINs: #EC3E and #EC3F with CCIN 2CEA; #EC3L and #EC3M with CCIN 2CEC; and #EC3T and #EC3U with CCIN 2CEB. The issue occurs each time a DLPAR remove operation is attempted on the VF. Restarting the partition after a failed DLPAR remove recovers from the error. On systems using PowerVM firmware, a problem was fixed for NVRAM corruption that can occur when deleting a partition that owns a CAPI adapter, if that CAPI adapter is not assigned to another partition before the system is powered off. On a subsequent IPL, the system will come up in recovery mode if there is NVRAM corruption. To recover, the partitions must be restored from the HMC. The frequency of this error is expected to be rare. The CAPI adapters have the following feature codes: #EC3E, #EC3F, #EC3L, #EC3M, #EC3T, #EC3U, #EJ16, #EJ17, #EJ18, #EJ1A, and #EJ1B. On systems using PowerVM firmware, a problem was fixed for NVRAM corruption and an HMC recovery state when using Simplified Remote Restart partitions. The failing systems will have at least one Remote Restart partition and on the failed IPL there will be a B70005301 SRC with word 7 being 0X00000002. On systems using PowerVM firmware, a problem was fixed for a group of shared processor partitions being able to exceed the designated capacity placed on a shared processor pool. This error can be triggered by using the DLPAR move function for the shared processor partitions, if the pool has already reached its maximum specified capacity. To prevent this problem from occurring when making DLPAR changes when the pool is at the maximum capacity, do not use the DLPAR move operation but instead break it into two steps: DLPAR remove followed by DLPAR add. This gives enough time for the DLPAR remove to be fully completed prior to starting the DLPAR add request.
On systems using PowerVM firmware, a problem was fixed for partition boot failures and run time DLPAR failures when adding I/O that log BA210000, BA210003, and/or BA210005 errors. The fix also applies to run time failures configuring an I/O adapter following an EEH recovery that log BA188001 events. The problem can impact IBM i partitions running in any processor mode or AIX/Linux partitions running in P7 (or older) processor compatibility modes. The problem is most likely to occur when the system is configured in the Manufacturing Default Configuration (MDC) mode. The trigger for the problem is a race-condition between the hypervisor and the physical operations panel with a very rare frequency of occurrence.
SC860_070_056 / FW860.12 01/13/17	Impact: Availability Severity: SPE
SC860_063_056 / FW860.11 12/05/16	Impact: N/A Severity: N/A This Service Pack contained updates for MANUFACTURING ONLY.
SC860_056_056 / FW860.10 11/18/16	Only DISRUPTIVE and DEFERRED fix descriptions are displayed for this service pack. The complete Firmware Fix History for this Release Level can be reviewed at the following URL: http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html Impact: New Severity: New System firmware changes that affect certain systems DISRUPTIVE: On systems using the PowerVM firmware, a problem was fixed for an "Incomplete" state caused by initiating a resource dump with selector macros from NovaLink (vio -dump -lp 1 -fr). The failure causes a communication process stack frame, HVHMCCMDRTRTASK, size to be exceeded with a hypervisor page fault that disrupts the NovaLink and/or HMC communications. The recovery action is to re-IPL the CEC but that will need to be done without the assistance of the management console. For each partition that has an OS running on the system, shut down the partition from the OS. Then from the Advanced System Management Interface (ASMI), power off the managed system. Alternatively, the system power button may also be used to do the power off. If the management console Incomplete state persists after the power off, the managed system should be rebuilt from the management console. For more information on management console recovery steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm. The fix is disruptive because the size of the PowerVM hypervisor must be increased to accommodate the over-sized stack frame of the failing task. DEFERRED: On systems using the PowerVM firmware, a problem was fixed for a CAPI function unavailable condition on a system with the maximum number of CAPI adapters and partitions. Not enough bytes were allocated for CAPI for the maximum configuration case. The problem may be circumvented by reducing the number of active partitions or CAPI adapters.
The fix is deferred because the size of the hypervisor must be increased to provide the additional CAPI space. DEFERRED: On systems using PowerVM firmware, a problem was fixed for cable card capable PCI slots that fail during the IPL. Hypervisor I/O Bus Interface UE B7006A84 is reported for each cable card capable PCI slot that doesn't contain a PCIe3 Optical Cable Adapter for the PCIe Expansion Drawer (feature code #EJ05). PCI slots containing a cable card will not report an error but will not be functional. The problem can be resolved by performing an AC cycle of the system. The trigger for the failure is that the I2C devices used to detect the cable cards are not coming out of the power on reset process in the correct state due to a race condition.

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: SC830_123.

5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or FTP server.

6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: SCxxx_yyy_zzz

Where xxx = release level

If the release level will stay the same (Example: Level SC830_040_040 is currently installed and you are attempting to install level SC830_071_040) this is considered an update.
If the release level will change (Example: Level SC830_040_040 is currently installed and you are attempting to install level SC840_050_050) this is considered an upgrade.
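
The update/upgrade rule above can be sketched in a few lines of Python. This is only an illustration, not an IBM tool: the function names and the regular expression are assumptions based on the SCxxx_yyy_zzz naming shown in this document, and only the xxx (release level) field is interpreted.

```python
import re

# Assumed pattern for level names such as SC830_040_040 or SC860_103.
# Only the three digits after "SC" (the release level) are captured.
LEVEL_RE = re.compile(r"^SC(\d{3})_\d{3}(?:_\d{3})?$")


def release_level(level_name):
    """Return the xxx release level from a name like SCxxx_yyy_zzz."""
    match = LEVEL_RE.match(level_name.strip())
    if match is None:
        raise ValueError("unrecognized firmware level name: %r" % level_name)
    return match.group(1)


def classify_install(installed, target):
    """Return 'update' if the release level stays the same, else 'upgrade'."""
    if release_level(installed) == release_level(target):
        return "update"
    return "upgrade"


if __name__ == "__main__":
    # The two examples given in the text above.
    print(classify_install("SC830_040_040", "SC830_071_040"))
    print(classify_install("SC830_040_040", "SC840_050_050"))
```

The sketch compares only the release level, so it does not say whether a given installation is concurrent, deferred, or disruptive; refer to the installation instructions for that.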

Instructions for installing firmware updates and upgrades can be found at http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8ha1/updupdates.htm

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

7.0 Firmware History

The complete Firmware Fix History (including HIPER descriptions) for this Release level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html

8.0 Change History

Date	Description
November 27, 2017	Fix Description update for SC860_082 / FW860.20.
October 17, 2017	Fix list correction for firmware level SC860_103_056 / FW860.30.
August 07, 2017	Fix Description update for firmware level: SC860_082_056 / FW860.20. One of the fixes requires a re-IPL of the system to activate but has not been marked as deferred. This is a fix for improved link stability for the PCIe expansion drawer (F/C #EMX0).