Power6 High-End System Firmware
Applies to: 9125-F2A
This document provides information about the installation of Licensed
Machine or Licensed Internal Code, which is sometimes referred to generically
as microcode or firmware.
1.0 Systems Affected
This package provides firmware for Power 575 (9125-F2A) Servers only.
Do not use on any other systems.
The firmware level in this package is:
2.0 Important Information
Firmware Installation
Attention: 9125-F2A servers should be evaluated for ECA845 installation
prior to performing firmware upgrades. Contact your service provider
for more information about ECA845. If firmware must be upgraded
prior to the installation of ECA845, use the following special instructions:
Firmware updates should be performed on an entire Managed Frame and all Managed
Systems contained in that frame at one time. The following instructions can be
used to update one Managed Frame and all Managed Systems that it contains. The
instructions can be repeated as many times as needed until all Managed Frames
have been updated.
1) Power off all Managed Systems in the Managed Frame
2) Reinstall the current firmware level on all Managed Systems in the Managed Frame:
The following command can be used to determine the current firmware level:
lslic -t sys -m <managedsystem_1> -F activated_level
The following command will retrieve and reinstall the current firmware level on
one Managed System:
updlic -o a -m <managedsystem_1> -t sys -l <current_fsp_level> -r <repos>
<current_fsp_level> is the level determined with the lslic command.
<repos> is the location of the firmware, such as "-r dvd", "-r ibmwebsite", etc.
Additional parameters might be required, depending on the repository selection.
If the firmware has already been retrieved to the HMC, the HMC hard drive
(-r disk) should be used as the repository.
After the firmware has been retrieved to the HMC, the HMC hard drive (-r disk)
should be used as the repository to update the remaining Managed Systems:
updlic -o a -m <managedsystem_2> -t sys -l <current_fsp_level> -r disk
updlic -o a -m <managedsystem_3> -t sys -l <current_fsp_level> -r disk
......
updlic -o a -m <managedsystem_N> -t sys -l <current_fsp_level> -r disk
These commands can be run in the background in parallel to speed up the processing.
Wait for all updlic commands to complete before proceeding to step 3.
3) Install and activate new BPC firmware only (during this step the Managed Systems
will transition from "Power Off" to "No Connection" and then back to "Power Off"):
Select one Managed System as the target of the updlic command and update the BPC
firmware. The following command will retrieve firmware from the repository and
update the BPCs:
updlic -o a -m <managedsystem_1> -t power -l latest -r <repos> .....
<repos> is the location of the firmware, such as "-r dvd", "-r ibmwebsite", etc.
Additional parameters might be required, depending on the repository selection.
If the firmware has already been retrieved to the HMC, the HMC hard drive
(-r disk) should be used as the repository.
4) Wait for all Managed Systems to return to "Power Off" state.
5) Install and activate new firmware on all Managed Systems in the Managed Frame:
updlic -o a -m <managedsystem_1> -t sys -l latest -r <repos> .....
<repos> is the location of the firmware, such as "-r dvd", "-r ibmwebsite", etc.
Additional parameters might be required, depending on the repository selection.
If the firmware has already been retrieved to the HMC, the HMC hard drive
(-r disk) should be used as the repository.
After the firmware has been retrieved to the HMC, the HMC hard drive (-r disk)
should be used as the repository to update the remaining Managed Systems:
updlic -o a -m <managedsystem_2> -t sys -l latest -r disk
updlic -o a -m <managedsystem_3> -t sys -l latest -r disk
......
updlic -o a -m <managedsystem_N> -t sys -l latest -r disk
These commands can be run in the background in parallel to speed up the process.
Wait for all updlic commands to complete before proceeding to step 6.
6) Power on the Managed Systems
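Steps 2, 4, and 5 above can be scripted from the HMC command line. The following is a minimal sketch, not an IBM-provided procedure. It assumes that your HMC session accepts ordinary bash functions, loops, and background jobs (the HMC uses a restricted shell, so check this first), that the firmware for step 2 has already been retrieved to the HMC hard drive, and that repositories needing additional parameters are handled separately. The function names and the POLL_SECONDS and MAX_POLLS settings are hypothetical. lslic and updlic are used only with the options shown above; the lssyscfg query in wait_for_power_off (-r sys -m <system> -F state) is not part of these instructions and should be verified against the HMC command reference for your level.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the frame update steps above (sketch only).

# Step 2: reinstall the currently activated level on each named Managed System,
# in parallel, using the copy on the HMC hard drive ("-r disk").
reinstall_current_level() {
  local ms level pid rc=0
  local -a pids=()
  for ms in "$@"; do
    # Query the level that is currently activated on this Managed System.
    if ! level=$(lslic -t sys -m "$ms" -F activated_level); then
      echo "lslic failed for $ms" >&2
      rc=1
      continue
    fi
    updlic -o a -m "$ms" -t sys -l "$level" -r disk &
    pids+=("$!")
  done
  # Return only after every background updlic has finished.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}

# Step 4: poll until every named Managed System reports "Power Off",
# giving up after MAX_POLLS attempts (illustrative defaults, not IBM guidance).
wait_for_power_off() {
  local interval=${POLL_SECONDS:-60} max=${MAX_POLLS:-60}
  local attempt ms state all_off
  for (( attempt = 1; attempt <= max; attempt++ )); do
    all_off=yes
    for ms in "$@"; do
      state=$(lssyscfg -r sys -m "$ms" -F state)
      [[ $state == "Power Off" ]] || all_off=no
    done
    [[ $all_off == yes ]] && return 0
    sleep "$interval"
  done
  echo "Timed out waiting for Power Off" >&2
  return 1
}

# Step 5: update the first system from the given repository (for example
# "dvd"), then the remaining systems from the HMC hard drive in parallel.
update_frame_to_latest() {
  if (( $# < 2 )); then
    echo "usage: update_frame_to_latest REPOS FIRST_SYSTEM [OTHER_SYSTEMS...]" >&2
    return 2
  fi
  local repos=$1 first=$2 ms pid rc=0
  local -a pids=()
  shift 2
  # Foreground: per the instructions above, afterwards the firmware is on the HMC disk.
  updlic -o a -m "$first" -t sys -l latest -r "$repos" || return 1
  for ms in "$@"; do
    updlic -o a -m "$ms" -t sys -l latest -r disk &
    pids+=("$!")
  done
  # Wait for every background updlic before moving on to step 6.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}
```

For example, step 5 for a frame could be run as update_frame_to_latest dvd <managedsystem_1> <managedsystem_2> ... <managedsystem_N>. The first update is kept in the foreground because the remaining systems depend on the firmware already being on the HMC hard drive.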
HMC-Managed Systems
This firmware level requires HMC V7 R3.3.0
For more information and the latest PTFs, go to the following URL to
access the HMC code packages:
NOTE: You must be logged in as hscroot in order for
the firmware installation to complete correctly.
IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management
Services (SMS) in this level of system firmware. There are several
limitations that should be considered.
1. When configuring a network interface card (NIC) for remote IPL, only
the most recently configured protocol (IPv4 or IPv6) is retained.
For example, if the network interface card was previously configured with
IPv4 information and is now being configured with IPv6 information, the
IPv4 configuration information is discarded.
2. A single network interface card may only be chosen once for the boot
device list. In other words, the interface cannot be configured for
the IPv6 protocol and for the IPv4 protocol at the same time.
3. A failure will occur if the overall device pathname string and its parameters
exceed 255 bytes. One symptom of the string being too long is an
odd-looking boot device string in the AIX start banner, as in the
following example:
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: HH:MM MM/DD
The current time and date: 10:15:24 04/22/2008
processor count: 2; memory size: 1024MB; kernel size: 28034141
boot device: /l
-------------------------------------------------------------------------------
The following measures can help reduce the overall string length:
A. Use the compressed form of IPv6 addresses whenever possible. For example,
change the address FEA0:0:0:0:3CD6:F0FF:FD00:3004 to FEA0::3CD6:F0FF:FD00:3004.
B. Keep the TFTP filename as short as possible.
C. Leave the gateway IP address blank unless it is required.
4. When global IPv6 addresses are used for the client and the
server, and there are more than two gateways on the same link, the gateway
with the best route to the server should be used. Using a gateway
that does not have the best route to the server can cause the ping test
or network boot to fail.
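Suggestion A above (compressing IPv6 addresses) and the 255-byte limit can be checked before values are entered in SMS. The following bash sketch is illustrative only: it assumes the address is written out in full as eight colon-separated groups, follows the common convention of replacing the longest run of two or more all-zero groups with "::", and prints the groups in uppercase to match the example. The function names are hypothetical, and fits_boot_string_limit only measures a string you have assembled yourself; it does not know how SMS builds the device pathname.

```bash
#!/usr/bin/env bash
# Compress a fully written-out IPv6 address (eight groups): drop leading
# zeros in each group and replace the longest run of >= 2 zero groups with "::".
compress_ipv6() {
  local IFS=:
  local -a g
  read -r -a g <<< "$1"
  if (( ${#g[@]} != 8 )); then
    echo "expected 8 colon-separated groups: $1" >&2
    return 1
  fi
  local i best_start=-1 best_len=0 run_start=0 run_len=0
  for (( i = 0; i < 8; i++ )); do
    # Re-print each group as uppercase hex, which strips leading zeros.
    g[i]=$(printf '%X' "0x${g[i]}") || return 1
    if [[ ${g[i]} == 0 ]]; then
      if (( run_len == 0 )); then run_start=$i; fi
      run_len=$(( run_len + 1 ))
      if (( run_len > best_len )); then
        best_start=$run_start
        best_len=$run_len
      fi
    else
      run_len=0
    fi
  done
  # A single zero group is left as "0" rather than compressed.
  if (( best_len < 2 )); then
    echo "${g[*]}"
    return 0
  fi
  # "${g[*]:...}" joins the selected groups with ":" (the first character of IFS).
  echo "${g[*]:0:best_start}::${g[*]:best_start+best_len}"
}

# Succeeds if the given string is at most 255 bytes long.
fits_boot_string_limit() {
  local bytes
  bytes=$(printf '%s' "$1" | wc -c)
  (( bytes <= 255 ))
}
```

With the address from suggestion A, compress_ipv6 FEA0:0:0:0:3CD6:F0FF:FD00:3004 prints FEA0::3CD6:F0FF:FD00:3004.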
3.0 Firmware Information and Description
Use the following examples as a reference to determine whether your installation
will be concurrent or disruptive.
Note: The concurrent levels of system firmware may, on occasion,
contain fixes that are known as deferred. These deferred fixes can be installed
concurrently, but will not be activated until the next IPL. Deferred
fixes, if any, will be identified in the "Firmware Update Descriptions"
table of this document. For deferred fixes within a service pack,
only the fixes in the service pack which cannot be concurrently activated
are deferred.
Note: The file names and service pack levels used in the
following examples are for clarification only, and are not necessarily
levels that have been, or will be released.
System firmware file naming convention:
01ESXXX_YYY_ZZZ
-
XXX is the release level
-
YYY is the service pack level
-
ZZZ is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack
level (YYY and ZZZ) are only unique within a release level (XXX).
For example, 01ES330_067_045 and 01ES340_067_053 are different service
packs.
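A file name can be split into these fields mechanically. The bash sketch below assumes, as in every example in this document, that XXX, YYY, and ZZZ are each three digits and that the name may end in .rpm; the function name parse_fw_name is hypothetical.

```bash
#!/usr/bin/env bash
# Split a system firmware file name of the form 01ESXXX_YYY_ZZZ[.rpm]
# into release level, service pack level, and last disruptive level.
parse_fw_name() {
  local re='^01ES([0-9]{3})_([0-9]{3})_([0-9]{3})(\.rpm)?$'
  if [[ $1 =~ $re ]]; then
    echo "release=${BASH_REMATCH[1]} service_pack=${BASH_REMATCH[2]} last_disruptive=${BASH_REMATCH[3]}"
  else
    echo "not a recognized system firmware file name: $1" >&2
    return 1
  fi
}
```

For the file in this package, parse_fw_name 01ES330_095_078.rpm prints release=330 service_pack=095 last_disruptive=078.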
An installation is disruptive if:
-
The release levels (XXX) are different.
Example: Currently installed release is ES330, new release is ES340
-
The service pack level (YYY) and the last disruptive service
pack level (ZZZ) are equal.
Example: ES330_120_120 is disruptive, no matter what level of ES330 is
currently installed on the system
-
The service pack level (YYY) currently installed on the system is lower
than the last disruptive service pack level (ZZZ) of the service pack to
be installed.
Example: Currently installed service pack is ES330_120_120 and
new service pack is ES330_152_130
An installation is concurrent if:
-
The service pack level (YYY) is higher than the service pack
level currently installed on your system.
Example: Currently installed service pack is ES330_126_120, new service
pack is ES330_143_120.
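The rules above can be applied in order, checking the disruptive cases before the concurrent one. The bash sketch below is a hypothetical helper, not an IBM tool; it assumes both levels are written in the ESXXX_YYY_ZZZ form used in the examples, and it reports "undetermined" for cases the rules do not cover, such as reinstalling the same service pack.

```bash
#!/usr/bin/env bash
# Decide whether installing service pack NEW over INSTALLED is disruptive
# or concurrent, using the rules above. Usage: classify_install INSTALLED NEW
classify_install() {
  local re='^ES([0-9]{3})_([0-9]{3})_([0-9]{3})$'
  local cur_rel cur_sp new_rel new_sp new_ld
  [[ $1 =~ $re ]] || { echo "bad level: $1" >&2; return 1; }
  cur_rel=${BASH_REMATCH[1]}
  cur_sp=$((10#${BASH_REMATCH[2]}))   # 10# avoids octal parsing of "095"
  [[ $2 =~ $re ]] || { echo "bad level: $2" >&2; return 1; }
  new_rel=${BASH_REMATCH[1]}
  new_sp=$((10#${BASH_REMATCH[2]}))
  new_ld=$((10#${BASH_REMATCH[3]}))

  if [[ $cur_rel != "$new_rel" ]]; then
    echo disruptive     # different release levels (XXX)
  elif (( new_sp == new_ld )); then
    echo disruptive     # YYY equals ZZZ in the new service pack
  elif (( cur_sp < new_ld )); then
    echo disruptive     # installed YYY is below the new pack's ZZZ
  elif (( new_sp > cur_sp )); then
    echo concurrent     # newer service pack within the same release
  else
    echo undetermined   # not covered by the rules above
  fi
}
```

Using the examples above, classify_install ES330_120_120 ES330_152_130 prints disruptive, and classify_install ES330_126_120 ES330_143_120 prints concurrent.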
Firmware Information and Update Description
Filename            | Size     | Checksum
01ES330_095_078.rpm | 23776233 | 52843
ES330
ES330_095_078 (08/31/09)
Impact: Usability
Severity: HIPER
System firmware changes that affect all systems:
-
DEFERRED: This fix corrects the handling of a specific processor
instruction sequence that was generated by a particular heavily tuned High
Performance Computing (HPC) application. This specific instruction sequence
has the potential to produce an incorrect result. This instruction sequence
has only been observed in a single HPC application. However, it is
strongly recommended that you apply this fix.
-
HIPER: A problem was fixed that caused the migration of a
partition using shared processors to fail with a reason code of 4180043,
or caused the source system to hang or crash.
-
A problem was fixed that caused SRC 1000911B to be erroneously logged during
a reset/reload of the service processor.
System firmware changes that affect certain systems:
-
On systems with 7311-D11, 7314-G30, 5790, or 5796 19" drawers attached,
a problem was fixed that caused SRC 10009138 to be erroneously logged.
Concurrent maintenance (CM) firmware fixes:
-
A problem was fixed that caused SRC B7005603 to be erroneously logged when
a F/C 5802 or 5877 drawer was concurrently added.
ES330_092_078 (05/18/09)
Impact: Availability
Severity: HIPER
System firmware changes that affect all systems:
-
HIPER: The firmware was enhanced to improve the service processor's
capability to recover from bad bits in the flash memory. A predictive
error, or an unrecoverable error, will be logged against the card that
contains the system firmware if the number of correctable or uncorrectable
errors exceeds the threshold.
-
A problem was fixed that prevented the service processor from automatically
booting from the permanent (or P) side if the temporary (or T) side of
the firmware flash was corrupted. When the problem occurred, the
service processor stopped instead of booting from the P side.
-
The firmware was enhanced so that SRC B1xxE458 (with word 6=0000E42B) will
be logged as informational instead of generating a call home.
-
A problem was fixed that caused the system to crash, under certain circumstances,
with SRC B112E504 being logged, followed by SRC B181C350, when a system
dump was initiated.
-
A problem was fixed that caused a partition being migrated to become unresponsive
on the target system when firmware-assisted dump was enabled.
-
A problem was fixed that caused hardware to be deconfigured when the system
encountered network errors, even though the SRCs were being logged as informational.
-
A problem was fixed that caused the detailed data at the end of an "early
power off warning type 5" AIX error log entry to be filled with invalid
data instead of zeros.
-
A problem was fixed that caused a partition being migrated to crash on
the target system.
-
A problem was fixed that might have caused a system to crash with SRC B170E504
when a processor was dynamically deconfigured.
-
The firmware was enhanced such that when data is written to the VPD (Anchor)
card, the results are verified, resulting in fewer VPD cards being replaced.
System firmware changes that affect certain systems:
-
In systems using InfiniBand switches for processor clustering, a problem
was fixed that caused packets to be dropped under certain circumstances.
ES330_078_078 (01/15/09)
Impact: Function
Severity: HIPER
This level is a disruptive update from any ES330 firmware level.
The system should be powered off before installing this level of system
firmware. If this level is installed when the system is running,
the CECs will be rebooted, causing all partitions to be terminated, and
a reboot will be required.
System firmware changes that affect all systems:
-
DEFERRED and HIPER: The system initialization settings were
changed to reduce the likelihood of a system crash under extremely rare
circumstances.
-
HIPER: A problem was fixed that caused nodes to guard out
processor cores, or checkstop, during the transition to nominal voltage
from "power save" mode.
-
HIPER: A problem was fixed that caused a system to fail to
reboot after a B1xxE504 SRC was logged, due to a processor interconnection
bus failure. The same SRC, B1xxE504, was logged when the reboot failed.
-
A problem was fixed that, if a platform dump occurred, might have caused a
reset/reload of the service processor and corruption of the platform dump.
-
A problem was fixed that caused incorrect field replaceable unit (FRU)
part numbers to be returned for the BPF scroll assembly and the UEPO
panel.
-
A problem was fixed that prevented the system from rebooting if an error
occurred during a memory-preserving IPL.
-
The firmware was enhanced so that a call home will be made if the hypervisor
issues a "terminate immediate" interrupt.
-
The firmware's redundant bit steering logic was enhanced to improve performance.
-
A problem was fixed that caused the location codes for multi-port PCI adapters,
such as the 4-port Ethernet adapters, to be incorrect.
-
A problem was fixed that prevented service processor and hypervisor error
log entries from being reported to the operating system after a successful
partition migration. This problem only affected the partition that
was migrated.
-
On systems running AIX or Linux, a problem was fixed that, under certain
rare circumstances, might cause the operating system to crash.
-
A problem was fixed that, in certain configurations, caused the removal
of a host Ethernet adapter (HEA) port to fail when using a dynamic LPAR
(DLPAR) operation.
-
A problem was fixed that, under certain rare circumstances, caused the
hypervisor to crash when it was booting with SRC B6000103 being logged.
-
A problem was fixed that, under certain circumstances, prevented the operating
system from recovering a PCI-E adapter on which a temporary enhanced error
handling (EEH) error occurred.
-
A problem was fixed that, under certain rarely occurring circumstances,
caused the system to crash if an L2 or L3 cache failure was not discovered
and repaired when it initially occurred.
-
A problem was fixed that caused the service processor diagnostics to call
out a processor as the failing item, instead of the memory DIMMs, when
a large number of memory error correction coding (ECC) errors occurred.
-
A problem was fixed that caused the wrong field replaceable unit (FRU)
to be called out when SRC B152F109, which indicates a problem with the
NVRAM in a bulk power controller (BPC), was logged.
-
A problem was fixed that might cause a default catch to occur when booting
from an iSCSI device.
System firmware changes that affect certain systems:
-
On systems with a host Ethernet adapter (HEA) or host channel adapter (HCA)
assigned to a Linux partition, a problem was fixed that prevented the partition
from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the partition.
When this problem occurred, SRC B700F105 was logged.
-
In systems with clustered processors, various problems were fixed
in the InfiniBand interconnection networks.
-
A problem was fixed that, under certain circumstances, caused an AIX or
Linux partition to fail to boot with SRC D200E0AF being logged.
-
On systems with external I/O frames, a problem was fixed that might have
prevented the firmware from "unthrottling" processors after entering power
save mode.
ES330_046_034 (08/28/08)
Impact: Function
Severity: HIPER
System firmware changes that affect all systems:
-
DEFERRED and HIPER: A problem was fixed in which, under certain rarely
occurring circumstances, an application could cause a processor to go into
an error state and the system to crash.
-
HIPER: A problem was fixed that caused the system to terminate
abnormally with SRC B131E504.
-
HIPER: A problem was fixed that might cause a partition to
crash during a partition migration before the migration was complete.
-
A problem was fixed that caused the location codes of multi-port PCI adapters,
such as a 4-port Ethernet card, to be displayed incorrectly.
-
A problem was fixed that caused multiple instances of SRC B1818A03 and
B1818A0A to be logged erroneously, and multiple calls home to be made,
during a frame connection reset.
-
A problem was fixed that caused SRC B1819506 to be erroneously generated,
and a call home to be made, when service processor (or system controller)
error log entries were generated faster than they could be processed.
-
A problem was fixed that caused the hardware management console (HMC) to
show an "Incomplete" state after it attempted to read a file with an incorrect
size from the service processor (or system controller). This problem
also occurred if the "factory configuration" option was used on the advanced
system management interface (ASMI) menus.
-
Enhancements were made to the firmware to improve the FRU callouts for
certain types of failures of the time-of-day clock circuitry.
-
A problem was fixed that prevented a dump file larger than 4 GB from being
successfully off-loaded to the hardware management console (HMC).
-
On systems with redundant bulk power controllers, a problem was fixed that
caused the hardware management console (HMC) to get stuck at "Pending Authentication"
for one of the bulk power controllers (BPCs).
-
On systems with I/O drawers attached, a problem was fixed that might have
caused some I/O slots in the drawers not to be configured when the system
was booted.
-
In systems with clustered processors, various problems were fixed in the
InfiniBand interconnection networks.
-
A problem was fixed that caused the location codes of the external InfiniBand
ports on a 5791 I/O drawer with the InfiniBand interface to be reported
incorrectly on the HMC.
-
A problem was fixed that caused SRC B7006971 to be generated because the
firmware was incorrectly performing operations on PCI-Express I/O
adapters during dynamic LPAR (DLPAR) operations on memory.
-
A problem was fixed that might have caused an out-of-memory condition in
the hypervisor, with SRC B7000200 being logged.
-
A problem was fixed in the thermal management firmware that caused SRCs
B1812635 and B1812636 to be logged, and the system or node to run in low
power mode when it should have been in nominal, or nominal when it should
have been in low power mode.
-
A problem was fixed that caused SRC B1818A10 to be erroneously generated
after a successful installation of system firmware.
-
A problem was fixed that caused the AIX commands "lsmcode" and "diag" to
fail after a partition migration.
-
A problem was fixed that caused the message "BA330000malloc error!" to
be displayed on the operating system console after a partition migration,
even though SRC BA330000 had not been logged. When this problem occurred,
the partition migration appeared to be successful. However, a process
within the partition was either hung or had failed, and in most cases the
partition had to be rebooted to fully recover.
-
A problem was fixed that caused the status of the connection between the
hardware management console (HMC) and the service processor to be set to
an invalid state. This might cause problems when the HMC and service
processor tried to communicate.
-
A problem was fixed that caused partitions that were being rebooted to
hang at D200E0AF after a concurrent firmware update under certain circumstances.
ES330_034_034 (06/10/08)
Impact: Function
Severity: HIPER
This level is a disruptive update from the prior level, ES330_018.
The system should be powered off before installing this level of system
firmware. If this level is installed when the system is running,
the CECs will be rebooted, causing all partitions to be terminated, and
a reboot will be required.
System firmware changes that affect all systems:
-
HIPER: A problem was fixed that caused a concurrent firmware
installation to hang with SRC BA00E840 being logged. This problem
may also cause a partition migration to hang, under certain circumstances,
with the same SRC, BA00E840, being logged. This SRC will be logged
when this level of firmware is installed and will generate a call home;
it should be ignored. It will not be logged during subsequent installations.
-
HIPER: The processor initialization settings were changed
to reduce the likelihood of a processor going into an error state and causing
a checkstop or system crash.
-
HIPER: A problem was fixed that caused large numbers of enhanced
error handling (EEH) errors to be logged against the 4-port gigabit Ethernet
adapter, F/C 5740, under certain circumstances.
-
A problem was fixed that caused the /tmp directory on the system controllers
and the service processor in the bulk power controller (BPC) to fill up,
which resulted in an out-of-memory condition. When this problem occurred,
the system controllers or service processor in the BPC usually performed
a reset/reload. This is one possible cause of SRC B1817201 being
logged.
-
A problem was fixed in the repair and verify (R and V) function on the
HMC that caused an unnecessary shutdown of the processor node when an error
was logged against a bulk power regulator (BPR).
-
A problem was fixed that caused a partition using a host channel adapter
(HCA) or host Ethernet adapter (HEA) to appear to hang (with progress code
D200C1FF being displayed) before successfully shutting down. The
amount of time the partition appeared to hang depended on the amount of
memory assigned to the partition and the usage of HCA or HEA.
-
A problem was fixed that prevented the HMC from connecting to the managed
system if the HMC's DHCP server IP range is changed when the managed system
is running.
-
The firmware was enhanced so that the IDs "celogin1" and "celogin2" allow
an authorized service provider to log into the bulk power controller (BPC).
-
The firmware was enhanced to improve the system memory error recovery.
-
The firmware was enhanced so that the contents of /tmp are included when
a service processor dump is taken.
-
A problem was fixed in the hypervisor that might cause a partition migration
to fail.
-
The firmware was enhanced so that:
  -
  A failure when writing VPD to a P6 processor will cause the node to be
  deconfigured rather than the system being terminated.
  -
  The failure of a VPD write operation will not corrupt the VPD table; a
  corrupted VPD table can lead to unnecessary system downtime and unnecessary
  FRU replacement.
System firmware changes that affect certain systems:
-
On systems using QLogic InfiniBand switches, a problem was fixed that caused
the PortInfo:linkWidthActive and PortInfo:linkSpeedActive values to be stored
inaccurately and shown incorrectly in the display of subnet parameters.
ES330_018_018 (05/13/08)
Impact: New
Severity: New
4.0 How to Determine Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System
Management Interface (ASMI) Welcome pane. It appears in the top right
corner. Example: ES330_095.
5.0 Downloading the Firmware Package
Follow the instructions on the web page. You must read and agree to the
license agreement to obtain the firmware packages.
Note: If your HMC is not connected to the internet, you will need to download
the new firmware level to a CD-ROM or an FTP server.
6.0 Installing the Firmware
The method used to install new firmware will depend on the release level
of firmware which is currently installed on your server. The release level
can be determined by the prefix of the new firmware's filename.
Example: ESXXX_YYY_ZZZ
Where XXX = release level
-
If the release level will stay the same (Example: Level ES330_075_075
is currently installed and you are attempting to install level ES330_081_075),
this is considered an update.
-
If the release level will change (Example: Level ES330_081_075 is currently
installed and you are attempting to install level ES340_096_096), this is
considered an upgrade.
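The update/upgrade distinction depends only on the release level (XXX). A minimal bash sketch, assuming both levels are written in the ESXXX_YYY_ZZZ form shown above; the function name update_or_upgrade is hypothetical.

```bash
#!/usr/bin/env bash
# Label a change from the installed level to a new level as an "update"
# (same release level XXX) or an "upgrade" (different XXX).
update_or_upgrade() {
  local re='^ES([0-9]{3})_[0-9]{3}_[0-9]{3}$' cur new
  [[ $1 =~ $re ]] || { echo "bad level: $1" >&2; return 1; }
  cur=${BASH_REMATCH[1]}
  [[ $2 =~ $re ]] || { echo "bad level: $2" >&2; return 1; }
  new=${BASH_REMATCH[1]}
  if [[ $cur == "$new" ]]; then
    echo update
  else
    echo upgrade
  fi
}
```

For the two examples above, update_or_upgrade ES330_075_075 ES330_081_075 prints update, and update_or_upgrade ES330_081_075 ES340_096_096 prints upgrade.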
Instructions for installing firmware updates and upgrades can be found
at http://publib.boulder.ibm.com/infocenter/systems/scope/hw/topic/ipha1/updupdates.htm