Power10 System Firmware

Applies to:   9080-HEX

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.



1.0 Systems Affected

This package provides firmware for IBM Power System E1080 (9080-HEX) server only.

The firmware level in this package is MH1010_117 / FW1010.20.

1.1 Minimum HMC Code Level

This section describes the "Minimum HMC Code Level" that the System Firmware requires to complete the firmware installation process. The HMC level must be equal to or higher than the "Minimum HMC Code Level" before the system firmware update is started.  If the HMC managing the server targeted for the System Firmware update is running a code level lower than the "Minimum HMC Code Level", the firmware update will not proceed.

The Minimum HMC Code levels for this firmware for HMC x86,  ppc64 or ppc64le are listed below.

x86 - This term refers to the legacy HMC that runs on x86/Intel/AMD hardware, as well as the Virtual HMC that can run on the Intel hypervisors (KVM, XEN, VMware ESXi).
ppc64 or ppc64le - Describes the Linux code that is compiled to run on Power-based servers or LPARs (Logical Partitions).
The Minimum HMC level supports the following HMC models:
HMC models: 7063-CR1 and 7063-CR2
x86 - KVM, XEN, VMware ESXi (6.0/6.5)
ppc64le - vHMC on PowerVM (POWER8, POWER9, and POWER10 systems)

For information concerning HMC releases and the latest PTFs, go to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/

For specific fix level information on key components of IBM Power Systems running the AIX, IBM i and Linux operating systems, we suggest using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home


NOTES:

- You must be logged in as hscroot for the firmware installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this System Firmware level.

2.0 Important Information

NovaLink levels earlier than "NovaLink 1.0.0.16 Feb 2020 release" with partitions running certain SR-IOV capable adapters are NOT supported at this firmware release.

NovaLink levels earlier than "NovaLink 1.0.0.16 Feb 2020 release" do not support IO adapter FCs EC2R/EC2S, EC2T/EC2U, EC66/EC67 with FW1010 and later. 

Live Partition Mobility (LPM) support restrictions for FW1010.00:

Live Partition Mobility (LPM) support restrictions for FW1010.00 have been removed for FW1010.10 and later releases.

Note: For guidance on migrating between firmware levels, follow the LPM firmware support matrix for POWER10 in the following IBM documentation article:
https://www.ibm.com/docs/en/power10?topic=mobility-firmware-support-matrix-partition

Firmware Update Failure on Power10:

A system firmware update might fail with errors HSCF0180E and E302F854 logged on the Hardware Management Console (HMC) and System Reference Code (SRC) B181303F logged on the flexible service processor (FSP).

See the following link for further information: https://www.ibm.com/support/pages/node/6527300

2.1 IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered.

When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.

2.2 Concurrent Firmware Updates

Concurrent system firmware update is supported on HMC Managed Systems only.

Ensure that there are no RMC connection issues for any system partitions prior to applying the firmware update.  If there is an RMC connection failure to a partition during the firmware update, the RMC connection will need to be restored and additional recovery actions for that partition will be required to complete partition firmware updates.

2.3 Memory Considerations for Firmware Upgrades

Firmware Release Level upgrades and Service Pack updates may consume additional system memory.
Server firmware requires memory to support the logical partitions on the server. The amount of memory required by the server firmware varies according to several factors.
Factors influencing server firmware memory requirements include the following:
Generally, you can estimate the amount of memory required by server firmware to be approximately 8% of the system installed memory. The actual amount required will generally be less than 8%. However, there are some server models that require an absolute minimum amount of memory for server firmware, regardless of the previously mentioned considerations.

Additional information can be found at:
https://www.ibm.com/support/knowledgecenter/9080-M9S/p9hat/p9hat_lparmemory.htm
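
As a rough illustration of the 8% guideline in this section, the following sketch computes the rule-of-thumb ceiling for a given amount of installed memory. The function name and the sample memory size are hypothetical and are not part of any IBM tool. The result is only an upper estimate and does not account for any model-specific minimum.

```python
def estimate_firmware_memory_gb(installed_gb: float, fraction: float = 0.08) -> float:
    """Return the rule-of-thumb upper estimate of memory used by server firmware.

    The default fraction of 0.08 reflects the "approximately 8%" guideline;
    the actual amount required will generally be lower.
    """
    if installed_gb < 0:
        raise ValueError("installed memory must be non-negative")
    return installed_gb * fraction


if __name__ == "__main__":
    installed = 4096  # hypothetical system with 4096 GB installed
    print(f"Estimated ceiling: {estimate_firmware_memory_gb(installed):.1f} GB")
```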

2.4 SBE Updates

Power10 servers contain SBEs (Self Boot Engines) that are used to boot the system.  An SBE is internal to each of the Power10 chips and is used to "self boot" the chip.  The SBE image is persistent and is only reloaded if a system firmware update contains an SBE change.  If there is an SBE change and the system firmware update is concurrent, the SBE update is delayed to the next IPL of the CEC, which adds 3-5 minutes per processor chip in the system to that IPL.  If there is an SBE change and the system firmware update is disruptive, the SBE update adds 3-5 minutes per processor chip in the system to the IPL.  During the SBE update process, the HMC or op-panel will display service processor code C1C3C213 for each of the SBEs being updated.  This is a normal progress code, and the system boot should not be terminated by the user.  The additional time is estimated at 12-20 minutes per drawer, or up to 48-80 minutes for a maximum configuration.

The SBE image is updated with this service pack.
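
The time estimates above follow from the 3-5 minutes per processor chip figure. The sketch below shows that arithmetic. The function name is hypothetical, and the chip counts in the usage example (4 chips per drawer, 16 chips in a maximum configuration) are inferred from the per-drawer and maximum figures quoted above rather than stated explicitly.

```python
from typing import Tuple


def sbe_update_extra_minutes(processor_chips: int) -> Tuple[int, int]:
    """Return the (minimum, maximum) extra IPL minutes caused by an SBE update.

    Uses the 3-5 minutes per processor chip range from the text.
    """
    if processor_chips < 0:
        raise ValueError("processor chip count must be non-negative")
    return 3 * processor_chips, 5 * processor_chips


if __name__ == "__main__":
    # Assumed chip counts: 4 per drawer, 16 for a maximum configuration.
    for chips in (4, 16):
        low, high = sbe_update_extra_minutes(chips)
        print(f"{chips} chips: {low}-{high} additional minutes")
```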


3.0 Firmware Information

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

For systems that are not managed by an HMC, the installation of system firmware is always disruptive.

Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as Deferred and/or Partition-Deferred. Deferred fixes can be installed concurrently, but will not be activated until the next IPL. Partition-Deferred fixes can be installed concurrently, but will not be activated until a partition reactivate is performed. Deferred and/or Partition-Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For these types of fixes (Deferred and/or Partition-Deferred) within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.

Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

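01VHxxx_yyy_zzz

where xxx is the release level, yyy is the service pack level, and zzz is the last disruptive service pack level.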

NOTE: Values of service pack and last disruptive service pack level (yyy and zzz) are only unique within a release level (xxx). For example, 01VH900_040_040 and 01VH910_040_045 are different service packs.

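An installation is disruptive if:

The release levels (xxx) are different.

Example: Currently installed release is 01VH900_040_040, new release is 01VH910_050_050.

The service pack level (yyy) and the last disruptive service pack level (zzz) are the same.

Example: VH910_040_040 is disruptive, no matter what level of VH910 is currently installed on the system.

The service pack level (yyy) currently installed on the system is lower than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is VH910_040_040 and new service pack is VH910_050_045.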

An installation is concurrent if:

The release level (xxx) is the same, and
The service pack level (yyy) currently installed on the system is the same or higher than the last disruptive service pack level (zzz) of the service pack to be installed.

Example: Currently installed service pack is VH910_040_040, new service pack is VH910_041_040.
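
The rules and examples in this section can be sketched as a small check. This is an illustration only, not an IBM utility: the names FirmwareLevel, parse_level, and is_disruptive are hypothetical, and the parsing pattern assumes names of the form shown above (an optional "01", a two-letter prefix, then xxx_yyy_zzz, with an optional ".rpm" suffix).

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # prefix plus xxx, for example "VH910"
    service_pack: int     # yyy
    last_disruptive: int  # zzz


# Assumed name format: optional "01", two letters, xxx_yyy_zzz, optional ".rpm".
_LEVEL_RE = re.compile(r"^(?:01)?([A-Z]{2})(\d+)_(\d+)_(\d+)(?:\.rpm)?$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a firmware name into release, service pack, and last disruptive level."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    prefix, release, service_pack, last_disruptive = match.groups()
    return FirmwareLevel(prefix + release, int(service_pack), int(last_disruptive))


def is_disruptive(installed: str, new: str, hmc_managed: bool = True) -> bool:
    """Return True if installing `new` over `installed` would be disruptive."""
    current, target = parse_level(installed), parse_level(new)
    if not hmc_managed:
        return True  # systems not managed by an HMC are always disruptive
    if current.release != target.release:
        return True  # release levels (xxx) differ
    if target.service_pack == target.last_disruptive:
        return True  # yyy equals zzz in the new service pack
    # Disruptive when the installed yyy is below the new pack's zzz.
    return current.service_pack < target.last_disruptive


if __name__ == "__main__":
    print(is_disruptive("01VH900_040_040", "01VH910_050_050"))
    print(is_disruptive("VH910_040_040", "VH910_050_045"))
    print(is_disruptive("VH910_040_040", "VH910_041_040"))
```

The disruptive checks are evaluated first, so an installation is reported as concurrent only when none of them applies.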

3.1 Firmware Information and Description

 
Filename                Size        Checksum   md5sum
01MH1010_117_094.rpm    145050817   40853      7fa60c7e415946109516e424da30f953

Note: The Checksum can be found by running the AIX sum command against the rpm file (only the first 5 digits are listed).
For example: sum 01MH1010_117_094.rpm
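
Where the AIX sum command is not available, the md5sum value from the table can be checked instead. The following is a minimal sketch using Python's standard hashlib module; the function names are hypothetical, and the file is assumed to have been downloaded to a local path.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path: str, expected: str) -> bool:
    """Compare a file's MD5 digest with a published value, ignoring case and padding."""
    return md5_of_file(path) == expected.strip().lower()
```

For this package, the call would be matches_published_md5("01MH1010_117_094.rpm", "7fa60c7e415946109516e424da30f953"), assuming the rpm file is in the current directory.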

MH1010
For Impact, Severity, and other firmware definitions, please refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs

The complete Firmware Fix History for this Release Level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/MH-Firmware-Hist.html
MH1010_117_094 / FW1010.20

03/31/22
Impact: Availability    Severity:  SPE

New Features and Functions

  • Support was added for an Advanced System Management Interface (ASMI) System Configuration panel option to disable or enable the system Lateral Cast-Out function (LCO).  LCO is enabled by default and a change to disable it must be done at service processor standby.  POWER processor chips since POWER7 have a feature called “Lateral Cast-Out” (LCO), enabled by default, where the contents of data cast out of one core’s L3 can be written into another core’s L3.  Then if a core has a cache miss on its own L3, it can often find the needed data block in another local core’s L3. This has the useful effect of slightly increasing the length of time that a storage block gets to stay in a chip’s cache, providing a performance boost for most applications.  However, for some applications such as SAP HANA, the performance can be better if LCO is disabled.  More information on how LCO is being configured by SAP HANA can be found in the SAP HANA on Power Advanced Operation Guide manual that can be accessed using the following link: 
    http://ibm.biz/sap-linux-power-library
    Follow the "SAP HANA Operation" link on this page to the "SAP HANA Operation Guides" folder.  In this folder, locate the updated "SAP_HANA_on_Power_Advanced_Operation_Guide" manual that has a new topic added of "Manage IBM Power Lateral Cast Out settings" which provides the additional information.
    The default behavior of the system (LCO enabled) will not change in any way by this new feature.  The customer will need to power off and disable LCO in ASMI to get the new behavior.
  • Support was added for Secure Boot for SUSE Linux Enterprise Server (SLES) partitions.  The SUSE Linux level must be SLES 15 SP4 or later.  Without this feature, partitions with SLES 15 SP4 or later and which have the OS Secure Boot partition property set to "Enabled and Enforced" will fail to boot.  A workaround to this is to change the partition's Secure Boot setting in the HMC partition configuration to "Disabled" or "Enabled and Log only".

System firmware changes that affect all systems

  • A problem was fixed for a possible unexpected SRC B1812641 logged if the system is powered off immediately after an IPL. The frequency of this problem is expected to be very rare because systems are not normally powered off immediately after powering on.  If this SRC occurs in this scenario, it can be ignored.
  • A problem was fixed for a logical partition failing to boot with an SRC B700F104 logged after a memory DDIMM power fault.  This is a rare problem needing a double failure on the Power Management Integrated Circuit (PMIC) that handles memory DDIMM power regulation for the OpenCAPI Memory Buffer (OCMB).  A re-IPL of the system is needed to recover from this problem.
  • A problem was fixed for a firmware update error with "HSCF0180E Operation failed" displayed on the HMC with error code E302F854. This fix is only available for firmware updates from FW1010.20 to a later service pack.  For firmware updates from earlier levels to FW1010.20, a failure is expected unless the following circumvention is performed:  On the firmware update from the HMC, select the "Advanced options" to automatically accept the new code level.  This is the default setting for an HMC at PTF levels MF69286 or MF69287 for HMC V10 R1 M1011.2.  For earlier levels of the HMC, the automatically accept option defaults to off and must be manually changed to on when performing the code update.  To do this, use the following steps:
    1.  When running the HMC code update wizard, click on "Advanced options".
    2.  From "Advanced options", select "Install and Activate (Implied retrieve)".
    3.  On the "Install and Activate" panel, you will see the guidance text of "Select a LIC level type and accept option; then click OK."  The two accept options displayed are as follows:
    o Automatically accept
    o Do Not automatically accept
    To prevent the problem from occurring, the "Automatically accept" option must be selected.
  • A problem was fixed for errors that can occur if doing a Live Partition Mobility (LPM) migration and a Dynamic Platform Optimizer (DPO) operation at the same time.  The migration may abort, or the system or partition may crash.  This problem requires running multiple migrations and DPO at the same time.  As a circumvention, do not use DPO while doing LPM migrations.
  • A problem was fixed for a system hypervisor hang and an Incomplete state on the HMC after a logical partition (LPAR) is deleted that has an active virtual session from another LPAR.  This problem happens every time an LPAR is deleted with an active virtual session.  This is a rare problem because virtual sessions from an HMC (a more typical case) prevent an LPAR deletion until the virtual session is closed, but virtual sessions originating from another LPAR do not have the same check.
  • A problem was fixed for vTPM 2.0 updates not being applied concurrently on a firmware update.  The updates are applied after a reboot of the system.
  • A problem was fixed for vague and misleading errors caused by using an invalid logical partition (LP) id for a resource dump request.  With the fix, the invalid LP id is rejected immediately as a user input error instead of being processed by the main storage dump to create what appear to be severe errors.
  • A problem was fixed for a partition with an SR-IOV logical port (VF) having a delay in the start of the partition. If the partition boot device is an SR-IOV logical port network device, this issue may result in the partition failing to boot with SRCs BA180010 and BA155102 logged and then becoming stuck on progress code SRC 2E49 for an AIX partition.  This problem is infrequent because it requires multiple error conditions at the same time on the SR-IOV adapter.  To trigger this problem, multiple SR-IOV logical ports for the same adapter must encounter EEH conditions at roughly the same time such that a new logical port EEH condition is occurring while a previous EEH condition's handling is almost complete but not notified to the hypervisor yet.  To recover from this problem, reboot the partition.
  • A problem was fixed for a secondary fault after a partition creation error that could result in a Terminate Immediate (TI) of the system with an SRC B700F103 logged.  The failed creation of partitions can be explicit or implicit that might trigger the secondary fault.  One example of an implicit partition create is the ghost partition created for a Live Partition Mobility (LPM) migration.  This type of partition can fail to create when there is insufficient memory available for the hardware page table (HPT) for the new partition.
  • A problem was fixed for an I/O adapter slot error when powering on the slot with SRC B4000202 and B400F104 logged.  One example where this problem has been seen is when moving an SR-IOV adapter to shared mode.  This problem is infrequent and can be recovered by retrying the operation that failed, such as DLPAR, starting the partition, or moving the SR-IOV adapter.
  • A problem was fixed for a System Management Services (SMS) iSCSI information panel being incorrect and an SMS abort when navigating away from the panel.   The iSCSI target and initiator names are not shown. The configured IP addresses to be used for an iSCSI boot are all zeroes even after they are set. Navigating away from the iSCSI information panel causes an SMS abort.  This problem is triggered by setting an iSCSI disk alias in SMS menus then attempting to show information with the following selection:  "Select Boot Options -> Configure Boot Device Order -> Select 1st Boot Device ->Network -> ISCSI -> iscsi-disk1 -> Information".  The probability is low that this issue will be encountered because it requires iSCSI disk aliases to be used for a boot.  Normally for an iSCSI boot disk,  most users use a fully qualified iSCSI OF device path which does not trigger the problem.  If an SMS abort does occur when navigating away from the iSCSI information menu, the logical partition (LPAR) can be restarted to SMS menus.
  • A problem was fixed for a Hostboot hang during an IPL with SRC BC141E2B logged.  This is a very rare failure for a timing problem involving multiple process threads.  To recover from the problem, do a re-IPL of the system.
  • A problem was fixed for detecting a bad SBE SEEPROM with a SEEPROM and processor callout with SRC BC102224 logged when an SBE update is attempted and fails.  The fix allows the boot to continue on the old level of the SEEPROM.  This is a rare problem that only occurs with an SBE SEEPROM that cannot be written.  Without the fix, the IPL will loop and hang with issues with the SBE update being continually logged.
  • A problem was fixed for a clock error during the IPL that should have been recoverable but instead failed the IPL with extra error logs that included BC8A285E and B111B901.  The trigger for this problem requires a recoverable Hostboot IPL failure of some kind to occur (such as a clock error) and specifically a situation that does not result in a deconfiguration of Hostboot targets.
  • A problem was fixed for a system hang caused by an Open Memory Interface (OMI) memory loop.  This is a very rare error that can only occur if the OMI  host memory controller data link has gone into degraded bandwidth mode (x8->x4) because of another error and it also requires a specific memory data pattern to be transmitted when in this degraded mode for the problem to occur. 
  • A problem was fixed for an IPL failure involving a processor that does not have any functional cores.  For this rare problem to occur, a processor with only one functional core must have that core fail with a checkstop.  Then on the ensuing post-dump IPL, the error occurs during the deconfiguration of the failed processor.  This fix updates the Self Boot Engine (SBE).
  • A problem was fixed for ASMI TTY menus allowing an unsupported change in hypervisor mode to OPAL.  This causes an IPL failure with BB821410 logged if OPAL is selected.  The hypervisor mode is not user-selectable in POWER9 and POWER10.  Instead,  the hypervisor mode is determined by the MTM of the system. With this fix, the "Firmware Configuration" option in ASMI TTY menus is removed so that it matches the options given by the ASMI GUI menus.

System firmware changes that affect certain systems

  • For a system with an AIX or Linux partition, a problem was fixed for a partition start failure for AIX or Linux with SRC BA54504D logged.  This problem occurs if the partition is an MDC default partition with virtual Trusted Platform Module (vTPM) enabled.  As a circumvention, power off the system and disable vTPM using the HMC GUI to change the default partition property for Virtualized Trusted Platform Module (VTPM) to off.
  • For a system with an IBM i partition in MDC mode, a problem was fixed for a possible system hang if an HMC virtual IBM i console fails to connect.  A rare timing problem with a shared lock can occur during the console connect attempt.  This problem can be recovered by a re-IPL of the system.
  • For systems with Linux partitions, a problem was fixed for Linux energy scale features not being enabled in Linux partitions for POWER10.  With the problem, Linux is prevented from knowing that energy scale operations are available for use by the partition.

4.0 How to Determine The Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: MH1010_117.


5.0 Downloading the Firmware Package

Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not internet-connected, you will need to download the new firmware level to a USB flash memory device or FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of firmware which is currently installed on your server. The release level can be determined by the prefix of the new firmware's filename.

Example: MHxxx_yyy_zzz

Where xxx = release level

Instructions for installing firmware updates and upgrades can be found at https://www.ibm.com/support/knowledgecenter/9080-M9S/p9eh6/p9eh6_updates_sys.htm

IBM i Systems:

For information concerning IBM i Systems, go to the following URL to access Fix Central: 
http://www-933.ibm.com/support/fixcentral/

Choose "Select product", under Product Group specify "System i", under Product specify "IBM i", then Continue and specify the desired firmware PTF accordingly.

HMC and NovaLink Co-Managed Systems (Disruptive firmware updates only):

A co-managed system is managed by HMC and NovaLink, with one of the interfaces in the co-management master mode.
Instructions for installing firmware updates and upgrades on systems co-managed by an HMC and NovaLink are the same as above for HMC managed systems, since the firmware update must be done by the HMC in the co-management master mode.  Before the firmware update is attempted, ensure that the HMC is set in the master mode using the steps at the following IBM KnowledgeCenter link for NovaLink co-managed systems:
https://www.ibm.com/support/knowledgecenter/9009-22A/p9eig/p9eig_kickoff.htm

Then the firmware updates can proceed with the same steps as for HMC managed systems, except that the system must be powered off because only a disruptive update is allowed.  If a concurrent update is attempted, the following error will occur: "HSCF0180E Operation failed for <system name> (<system mtms>).  The operation failed.  E302F861 is the error code:"
https://www.ibm.com/support/knowledgecenter/9009-22A/p9eh6/p9eh6_updates_sys.htm

7.0 Firmware History

The complete Firmware Fix History (including HIPER descriptions) for this Release level can be reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/MH-Firmware-Hist.html