Power6 High-End System Firmware

Applies to:  9119-FHA

This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.


1.0 Systems Affected

This package provides firmware for Power 595 (9119-FHA) servers only.  Do not use it on any other systems.

The firmware level in this package is:


2.0 Cautions and Important Information

2.1 Cautions

Concurrent Maintenance Restrictions

Problems have been identified in the system firmware that impact CEC Concurrent Maintenance on Power6 9119-FHA servers.

CEC Concurrent Maintenance functions must not be performed until a firmware level containing the fixes has been installed on the server.

The fixes for these problems will be available in a future service pack.

2.2 Important Information

HMC-Managed Systems

This firmware level requires HMC V7R3.3.0 with MH01119 and MH01130.
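As a rough pre-installation check, the sketch below compares an HMC level string and a list of installed HMC fix IDs against the prerequisites named above.  It assumes the level is written in the form shown here (for example "V7R3.3.0"), that V7R3.3.0 is a minimum rather than an exact requirement, and that you have already collected the installed fix IDs yourself; the function names are hypothetical and nothing here queries an HMC.

```python
import re

# Prerequisites named in this readme.  Treating V7R3.3.0 as a minimum level
# (rather than an exact one) is an assumption of this sketch.
REQUIRED_LEVEL = "V7R3.3.0"
REQUIRED_FIXES = {"MH01119", "MH01130"}

# Assumed layout: "V<version>R<release>" followed by dot-separated numbers.
_LEVEL_RE = re.compile(r"^V(\d+)R(\d+)((?:\.\d+)*)$")


def parse_hmc_level(level: str) -> tuple:
    """Turn 'V7R3.3.0' into (7, 3, 3, 0) so levels compare numerically."""
    match = _LEVEL_RE.match(level.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized HMC level: {level!r}")
    version, release, rest = match.groups()
    extra = [int(part) for part in rest.split(".") if part]
    return (int(version), int(release), *extra)


def missing_prerequisites(level: str, installed_fixes: set) -> list:
    """List the unmet prerequisites; an empty list means none were found."""
    problems = []
    if parse_hmc_level(level) < parse_hmc_level(REQUIRED_LEVEL):
        problems.append(f"HMC level {level} is below {REQUIRED_LEVEL}")
    for fix in sorted(REQUIRED_FIXES - set(installed_fixes)):
        problems.append(f"missing HMC fix {fix}")
    return problems


print(missing_prerequisites("V7R3.3.0", {"MH01119"}))
# prints ['missing HMC fix MH01130']
```

Comparing tuples of integers avoids the string-comparison trap where "V7R3.10.0" would sort before "V7R3.3.0".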

Go to the following URL to access the HMC code packages:

NOTE:   You must be logged in as hscroot in order for the firmware installation to complete correctly.
 

IPv6 Support and Limitations

IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware.  There are several limitations that should be considered.

1.  When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained.  For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.

2.  A single network interface card can be chosen only once for the boot device list.  In other words, the interface cannot be configured for both the IPv4 and IPv6 protocols at the same time.

3.  A failure will occur if the overall device pathname string and its parameters exceed 255 bytes.  One symptom of a string that is too long is an odd-looking boot device string in the AIX start banner, as in the following example:

    -------------------------------------------------------------------------------
                                    Welcome to AIX.
                           boot image timestamp: HH:MM MM/DD
                     The current time and date: 10:15:24 04/22/2008
            processor count: 2;  memory size: 1024MB;  kernel size: 28034141
                                    boot device: /l
    -------------------------------------------------------------------------------

  To reduce the overall string length, try the following:

      A.  Use the compressed form of IPv6 addresses whenever possible.  For example, change the address

             FEA0:0:0:0:3CD6:F0FF:FD00:3004

             to

             FEA0::3CD6:F0FF:FD00:3004

      B.  Keep the TFTP filename as short as possible.

      C.  Leave the gateway IP address blank unless it is required.
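The address compression in item A and the 255-byte limit described above can be checked before the values are typed into SMS.  The sketch below uses Python's standard `ipaddress` module to produce the compressed form and then measures a candidate string.  The way the parameters are joined here (a hypothetical "path:param,param,..." layout) is an assumption for illustration only and is not the actual SMS format, so treat the byte count as an estimate.  Python writes the hex digits in lowercase; IPv6 text addresses are case-insensitive (RFC 4291).

```python
import ipaddress

MAX_BOOT_STRING_BYTES = 255  # limit stated in this readme


def compress_ipv6(address: str) -> str:
    """Return the compressed text form, e.g. FEA0:0:0:0:... -> fea0::..."""
    return ipaddress.IPv6Address(address).compressed


def build_boot_string(device_path: str, server_ip: str, filename: str,
                      client_ip: str, gateway_ip: str = "") -> str:
    """Hypothetical layout: '<device_path>:<param>,<param>,...'.

    This is NOT the real SMS format; it only approximates the total length.
    A blank gateway is omitted, following item C above.
    """
    params = [compress_ipv6(server_ip), filename, compress_ipv6(client_ip)]
    if gateway_ip:
        params.append(compress_ipv6(gateway_ip))
    return device_path + ":" + ",".join(params)


def fits_limit(boot_string: str) -> bool:
    """True if the string is at most 255 bytes when UTF-8 encoded."""
    return len(boot_string.encode("utf-8")) <= MAX_BOOT_STRING_BYTES


print(compress_ipv6("FEA0:0:0:0:3CD6:F0FF:FD00:3004"))
# prints fea0::3cd6:f0ff:fd00:3004
```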

4.  When global IPv6 addresses are used for the client and the server, and there are more than two gateways on the same link, the gateway with the best route to the server should be used.  Using a gateway that does not have the best route to the server can cause the ping test or network boot to fail.


3.0 Firmware Information and Description

Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.

Note:  The concurrent levels of system firmware may, on occasion, contain fixes that are known as deferred. These deferred fixes can be installed concurrently, but will not be activated until the next IPL.  Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document.  For deferred fixes within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.
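Because deferred fixes carry a "DEFERRED" label in the update descriptions, one quick way to see which changes in a level will not take effect until the next IPL is to filter the description entries on that label.  The sketch below assumes the entries have already been copied into a list of strings (the sample entries are taken from the EH330_046_034 descriptions); it does not read this document or any IBM tool output, and the function name is hypothetical.

```python
def deferred_fixes(entries: list) -> list:
    """Return the update-description entries labelled DEFERRED.

    The label is taken to be the text before the first colon, as in
    "DEFERRED:  ..." or "DEFERRED and HIPER:  ...".  Deferred fixes can be
    installed concurrently but are only activated at the next IPL.
    """
    result = []
    for entry in entries:
        # Tolerate a leading bullet character copied along with the entry.
        label, sep, _ = entry.strip().lstrip("•").strip().partition(":")
        if sep and "DEFERRED" in label:
            result.append(entry.strip())
    return result


entries = [
    "HIPER:  A problem was fixed that caused the system to terminate abnormally with SRC B131E504.",
    "DEFERRED:  Enhancements were made to the system firmware to reduce the system boot time on power up.",
    "DEFERRED:  A problem was fixed that caused SRC B1608CB0 to be logged if a separate I/O frame is attached to the CEC frame.",
    "A problem was fixed that caused SRC B1818A10 to be erroneously generated after a successful installation of system firmware.",
]
for fix in deferred_fixes(entries):
    print(fix)
```

Matching only the label, not the whole sentence, keeps entries such as "PortInfo:linkWidthActive" from being misread as labelled.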

Note:  The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be released.

System firmware file naming convention:

     01EHXXX_YYY_ZZZ

NOTE:  The service pack level (YYY) and the last disruptive service pack level (ZZZ) are unique only within a release level (XXX).  For example, 01EH330_067_045 and 01EH340_067_053 are different service packs.

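An installation is disruptive if:

  • The release levels (XXX) of the currently installed and new firmware are different.
        Example:  Currently installed release is EH330, new release is EH340.
  • The service pack level (YYY) and the last disruptive service pack level (ZZZ) of the new service pack are equal.
        Example:  EH330_120_120 is disruptive, no matter what level of EH330 is currently installed on the system.
  • The service pack level (YYY) currently installed on the system is lower than the last disruptive service pack level (ZZZ) of the service pack to be installed.
        Example:  Currently installed service pack is EH330_120_120 and new service pack is EH330_152_130.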

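An installation is concurrent if:

  • The release level (XXX) is the same, and the service pack level (YYY) currently installed on the system is the same as or higher than the last disruptive service pack level (ZZZ) of the service pack to be installed.
        Example:  Currently installed service pack is EM310_126_120, new service pack is EM310_143_120.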
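Once a name is split into its release (XXX), service pack (YYY) and last disruptive service pack (ZZZ) fields, the rules above reduce to a few integer comparisons.  The following Python sketch encodes them.  The regular expression, the handling of an optional "01" prefix, ".rpm" suffix and missing ZZZ (the ASMI Welcome pane shows only a value such as "EH330_046"), and the function names are assumptions made for illustration; confirm the result against the update descriptions before scheduling an installation.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: optional "01" prefix, two-letter family (e.g. EH), release
# XXX, service pack YYY, optional last disruptive service pack ZZZ, optional
# ".rpm" suffix.
_NAME_RE = re.compile(
    r"^(?:01)?([A-Z]{2})(\d{3})_(\d{3})(?:_(\d{3}))?(?:\.rpm)?$", re.IGNORECASE
)


class FirmwareLevel(NamedTuple):
    family: str                     # e.g. "EH"
    release: int                    # XXX
    service_pack: int               # YYY
    last_disruptive: Optional[int]  # ZZZ, if present in the name


def parse_level(name: str) -> FirmwareLevel:
    match = _NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware name: {name!r}")
    family, xxx, yyy, zzz = match.groups()
    return FirmwareLevel(family.upper(), int(xxx), int(yyy),
                         int(zzz) if zzz else None)


def install_type(installed: str, new: str) -> str:
    """Classify installing `new` over `installed` as 'disruptive' or 'concurrent'."""
    cur, nxt = parse_level(installed), parse_level(new)
    if nxt.last_disruptive is None:
        raise ValueError("the new level must include its ZZZ field")
    if (cur.family, cur.release) != (nxt.family, nxt.release):
        return "disruptive"  # release levels (XXX) differ
    if nxt.service_pack == nxt.last_disruptive:
        return "disruptive"  # new service pack has YYY == ZZZ
    if cur.service_pack < nxt.last_disruptive:
        return "disruptive"  # installed YYY is below the new ZZZ
    return "concurrent"      # same release, installed YYY >= new ZZZ


print(install_type("EH330_120_120", "EH330_152_130"))    # disruptive
print(install_type("EM310_126_120", "EM310_143_120"))    # concurrent
print(install_type("EH330_034", "01EH330_046_034.rpm"))  # concurrent
```

The checks run in the same order as the rules, so the "YYY equals ZZZ" case is reported as disruptive regardless of what is installed.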
 

Firmware Information and Update Description

 
Filename               Size (bytes)   Checksum
01EH330_046_034.rpm    36780708       21257
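This readme does not say which tool produced the checksum.  If it is the 16-bit BSD-style checksum printed by the Unix `sum` command (an assumption), a downloaded package can be checked against the size and checksum above with a sketch like the following.  If the values do not match, the file may be incomplete, or the checksum may come from a different algorithm (for example, the System V variant of `sum`).

```python
import os

# Values from the table above.  Interpreting the checksum as the BSD-style
# 16-bit `sum` checksum is an assumption; the readme does not name the tool.
EXPECTED_SIZE = 36780708   # bytes
EXPECTED_CHECKSUM = 21257


def bsd_sum_update(checksum: int, data: bytes) -> int:
    """Fold `data` into a running BSD 16-bit rotating checksum."""
    for byte in data:
        checksum = (checksum >> 1) | ((checksum & 1) << 15)  # rotate right by one bit
        checksum = (checksum + byte) & 0xFFFF                 # add byte, keep 16 bits
    return checksum


def bsd_sum_file(path: str, chunk_size: int = 1 << 16) -> int:
    """Checksum a file in chunks (pure Python, so a 35 MB file takes a while)."""
    checksum = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            checksum = bsd_sum_update(checksum, chunk)
    return checksum


def verify_package(path: str) -> bool:
    """Check the size first (cheap), then the assumed BSD checksum."""
    if os.path.getsize(path) != EXPECTED_SIZE:
        return False
    return bsd_sum_file(path) == EXPECTED_CHECKSUM
```

Calling `verify_package("01EH330_046_034.rpm")` in the download directory returns True only if both values match.  On Linux, GNU `sum` (whose default is the BSD algorithm) offers a quick cross-check.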
 
EH330
EH330_046_034

08/28/08

Impact:  Function          Severity:  HIPER

System firmware changes that affect all systems:

  • DEFERRED and HIPER:  A problem was fixed that, under certain rare circumstances, allowed an application to cause a processor to go into an error state and the system to crash.
  • HIPER:  A problem was fixed that caused the system to terminate abnormally with SRC B131E504.
  • HIPER:  A problem was fixed that might cause a partition to crash during a partition migration before the migration was complete.
  • DEFERRED:  Enhancements were made to the system firmware to reduce the system boot time on power up.
  • DEFERRED:  A problem was fixed that, under certain rare circumstances, prevented the new secondary system controller from communicating with the system after a system controller failover.
  • DEFERRED:  A problem was fixed that caused SRC B1608CB0 to be logged if a separate I/O frame is attached to the CEC frame.
  • A problem was fixed that caused multiple instances of SRC B1818A03 and B1818A0A to be logged erroneously, and multiple calls home to be made, during a frame connection reset.
  • A problem was fixed that caused SRC B1819506 to be erroneously generated, and a call home to be made, when service processor (or system controller) error log entries were generated faster than they could be processed.
  • A problem was fixed that caused the hardware management console (HMC) to show an "Incomplete" state after it attempted to read a file with an incorrect size from the service processor (or system controller).  This problem also occurred if the "factory configuration" option was used on the advanced system management interface (ASMI) menus.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of the time-of-day clock circuitry.
  • A problem was fixed that prevented a dump file larger than 4 GB from being successfully off-loaded to the hardware management console (HMC).
  • On systems with redundant bulk power controllers, a problem was fixed that caused the hardware management console (HMC) to get stuck at "Pending Authentication" for one of the bulk power controllers (BPCs).
  • On systems with I/O drawers attached, a problem was fixed that might have caused some I/O slots in the drawers not to be configured when the system was booted.
  • In systems with clustered processors, various problems were fixed in the InfiniBand interconnection networks.
  • A problem was fixed that caused the location codes of the external InfiniBand ports on a 5791 I/O drawer with the InfiniBand interface to be reported incorrectly on the HMC.
  • A problem was fixed that caused SRC B7006971 to be generated because the firmware was incorrectly performing operations on PCI-Express I/O adapters during dynamic LPAR (DLPAR) operations on memory.
  • A problem was fixed that might have caused an out-of-memory condition in the hypervisor, with SRC B7000200 being logged.
  • A problem was fixed in the thermal management firmware that caused SRCs B1812635 and B1812636 to be logged, and the system or node to run in low power mode when it should have been in nominal, or nominal when it should have been in low power mode.
  • A problem was fixed that caused SRC B1818A10 to be erroneously generated after a successful installation of system firmware.
  • A problem was fixed that caused the AIX commands "lsmcode" and "diag" to fail after a partition migration.
  • A problem was fixed that caused the message "BA330000malloc error!" to be displayed on the operating system console after a partition migration, even though SRC BA330000 had not been logged.  When this problem occurred, the partition migration appeared to be successful.  However, a process within the partition was either hung or had failed, and in most cases the partition had to be rebooted to fully recover.
  • A problem was fixed that caused the status of the connection between the hardware management console (HMC) and the service processor to be set to an invalid state.  This might cause problems when the HMC and service processor tried to communicate.
  • A problem was fixed that caused partitions that were being rebooted to hang at D200E0AF after a concurrent firmware update under certain circumstances.
  • A problem was fixed that prevented the replacement of a system controller from completing successfully if the system controller had been guarded out prior to its replacement.
  • A problem was fixed that caused the system controller to go through an unnecessary reset/reload cycle when a checkstop occurred or the system was powered off.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of the node controller.
  • A problem was fixed that caused predictive SRC B181EF88 to be logged when, under certain circumstances, a system controller failover occurred at runtime.
  • A problem was fixed that caused redundancy to be erroneously enabled when the system came back up if redundancy had been disabled and the emergency power off (EPO) switch had then been used to power off the system.
  • Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of a node controller.
  • A problem was fixed that caused the service processor (or system controller) to lose its communication link with the hypervisor, and SRC A181D000 to be logged, under certain rare circumstances.
  • On systems using virtual shared processor pools (VSPP), a problem was fixed that caused the number of processors assigned to the partitions to be reduced after a memory-preserving IPL.

EH330_034_034

06/10/08

Impact:  Function        Severity:  HIPER

This level is a disruptive update from the prior level, EH330_018.  The system should be powered off before installing this level of system firmware.  If this level is installed while the system is running, the CECs will be rebooted, causing all partitions to be terminated, and a reboot will be required.

System firmware changes that affect all systems:

  • HIPER:  A problem was fixed that caused a concurrent firmware installation to hang with SRC BA00E840 being logged.  This problem may also cause a partition migration to hang, under certain circumstances, with the same SRC, BA00E840, being logged.  This SRC will be logged when this level of firmware is installed and will generate a call home; it should be ignored.  It will not be logged during subsequent installations.
  • HIPER: The processor initialization settings were changed to reduce the likelihood of a processor going into an error state and causing a checkstop or system crash.
  • HIPER: A problem was fixed that, under certain circumstances, caused a system termination during a service processor failover.
  • HIPER: A problem was fixed that caused large numbers of enhanced error handling (EEH) errors to be logged against the 4-port gigabit Ethernet adapter, F/C 5740, under certain circumstances.
  • HIPER:  On systems with redundant system controllers installed and enabled, a problem was fixed that might cause a communications hang between the two system controllers.  When this occurred, it triggered a reset/reload of the primary system controller, and the resulting fail-over to the secondary system controller failed in such a way that the system crashed.
  • Several problems were fixed that might cause one or both of the clock cards to be deconfigured, and erroneously called out as bad, when the system boots up from the power-off state. 
  • A problem was fixed that caused the /tmp directory on the system controllers and the service processor in the bulk power controller (BPC) to fill up, resulting in an out-of-memory condition.  When this problem occurred, the system controllers or service processor in the BPC usually performed a reset/reload.  This is one possible cause of SRC B1817201 being logged.
  • A problem was fixed that prevented the "i5/OS enable/disable" setting (in the ASMI power on/off menu) from taking effect when the system is booted.  This solution requires the system to be booted to hypervisor standby twice after the setting is changed to "enabled".  A future service pack will remove the requirement for the second boot to hypervisor standby.
  • A problem was fixed that caused the firmware to receive a false error indication when reading the registers on the LED controller.  SRC B1811340 was logged when this happened.
  • A problem was fixed that prevented an error fail-over to the secondary system controller from completing successfully.
  • A problem was fixed that might have caused a system firmware installation to fail with SRC B18138B7 being logged.
  • A problem was fixed that caused an error log to be generated that called out system controller A (Un-P1-C2), instead of the correct callout, which was system controller B (Un-P2-C5).
  • A problem was fixed that caused the P1 LED on the front light strip to be on when it should have been off.
  • A problem was fixed that caused the wrong memory DIMM location to be called out when certain types of failures occurred.
  • A problem was fixed that might have caused cache chip failures when the system is operating in Power Save mode.  Error log entries that might indicate that this problem is occurring include correctable errors and uncorrectable errors in L2, i-cache and d-cache memory, parity errors, and SRC B181E504. 
  • The firmware was enhanced so that the IDs "celogin1" and "celogin2" allow an authorized service provider to log into the bulk power controller (BPC).
  • A problem was fixed that caused a partition using a host channel adapter (HCA) or host Ethernet adapter (HEA) to appear to hang (with progress code D200C1FF being displayed) before successfully shutting down.  The amount of time the partition appeared to hang depended on the amount of memory assigned to the partition and the usage of HCA or HEA.
  • A problem was fixed that prevented the HMC from connecting to the managed system if the HMC's DHCP server IP range was changed while the managed system was running.
  • The error logging and FRU callout firmware was enhanced so that if a failure occurs on one or both clock cards, only one will get deconfigured, and the system will continue to try to boot instead of terminating.
  • The firmware was enhanced to improve the system memory error recovery.
  • The firmware was enhanced so that the contents of the /tmp directory are included when a service processor dump is taken.
  • A problem was fixed in the hypervisor that might cause a partition migration to fail.
  • The firmware was enhanced so that:
    • A failure when writing VPD to a P6 processor will cause the node to be deconfigured rather than terminating the system.
    • A failed VPD write operation will not corrupt the VPD table.  (A corrupted VPD table can lead to unnecessary system downtime and unnecessary FRU replacement.)

System firmware changes that affect certain systems:
  • On systems using QLogic InfiniBand switches, a problem was fixed that caused the PortInfo:linkWidthActive and PortInfo:linkSpeedActive values to be stored inaccurately and shown incorrectly in the subnet parameters display.

EH330_018_018

05/13/08

Impact:  New        Severity:  New
  • GA Level


4.0 How to Determine Currently Installed Firmware Level

You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane.  It appears in the top right corner.   Example:  EH330_046.

5.0 Downloading the Firmware Package

Follow the instructions on the web page. You must read and agree to the license agreement to obtain the firmware packages.

Note: If your HMC is not connected to the Internet, you will need to download the new firmware level to a CD-ROM or an FTP server.


6.0 Installing the Firmware

The method used to install new firmware will depend on the release level of the firmware that is currently installed on your server.  The release level of the new firmware can be determined from the prefix of its filename.

Example: EHXXX_YYY_ZZZ

where XXX = release level
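Choosing between the update and upgrade instructions comes down to comparing the release level (XXX) of the installed firmware, as shown on the ASMI Welcome pane (for example, EH330_046), with the prefix of the new package's filename.  The sketch below assumes that a matching release means a service pack update and a different release means a release upgrade; the parsing pattern and function names are hypothetical, and the linked instructions remain the authority on the procedure.

```python
import re

# Hypothetical pattern: optional "01" prefix, two-letter family, XXX release,
# then up to two "_NNN" fields (ASMI shows e.g. "EH330_046"), optional ".rpm".
_LEVEL_RE = re.compile(r"^(?:01)?([A-Z]{2})(\d{3})(?:_\d{3}){0,2}(?:\.rpm)?$",
                       re.IGNORECASE)


def release_of(name: str) -> str:
    """Return family plus release level, e.g. '01EH340_067_053.rpm' -> 'EH340'."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    family, release = match.groups()
    return family.upper() + release


def install_kind(installed_level: str, new_package: str) -> str:
    """'update' within the same release level, 'upgrade' to a different one."""
    if release_of(installed_level) == release_of(new_package):
        return "update"
    return "upgrade"


print(install_kind("EH330_046", "01EH330_046_034.rpm"))  # update
print(install_kind("EH330_046", "01EH340_067_053"))      # upgrade
```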

Instructions for installing firmware updates and upgrades can be found at http://publib.boulder.ibm.com/infocenter/systems/scope/hw/topic/ipha1/updateschapter.htm


7.0 Change History

 
DATE           Description
Mar 18, 2009   Added information in Section 2.1 pertaining to Concurrent Maintenance Restrictions.
Dec 02, 2008   Revised the link in Section 6.0 for updating and upgrading firmware.
Sep 09, 2008   Revised the HMC information to include MH01130.