Power6 High-End System Firmware
Applies to: 9119-FHA
This document provides information about the installation of Licensed
Machine or Licensed Internal Code, which is sometimes referred to
generically as microcode or firmware.
1.0 Systems Affected
This package provides firmware for Power 595 (9119-FHA) Servers only.
Do not use on any other systems.
The firmware level in this package is:
2.0 Cautions and Important Information
2.1 Cautions
Concurrent Maintenance Restrictions
Clients currently at 320 or 330 system firmware release levels will need
to upgrade to the 350 level to enable the CCM functionality.
As always, upgrading from one firmware release level to another will
require a system re-IPL (reboot of the frame).
2.2 Important Information
HMC-Managed Systems
The minimum HMC level required by this firmware level is HMC V7 R3.3.0
with MH01150 and MH01169 (or higher).
Although this EH330 service pack can be installed on a system being
managed by a 7.3.3 HMC, some fixes and functions are only available when
the system is managed by a 7.3.5 HMC. Therefore, HMC level 7.3.5 with
MH01212 and MH01217 (or higher) is recommended for this firmware level.
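The HMC level guidance above can be sketched as a simple version
comparison. This is only an illustration: the function names and the
dotted "7.3.5" input format are assumptions, and the check does not look
at the required PTFs (MH01150, MH01169, MH01212, MH01217), which must
still be verified separately.

```python
from typing import Tuple

# Levels taken from the text above; the PTF requirements are NOT checked here.
MINIMUM_HMC = (7, 3, 3)
RECOMMENDED_HMC = (7, 3, 5)


def parse_hmc_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted level such as '7.3.5' into a tuple of integers."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected a level like '7.3.5', got {text!r}")
    return tuple(int(part) for part in parts)


def hmc_status(version: str) -> str:
    """Classify an HMC level against the minimum and recommended levels."""
    level = parse_hmc_version(version)
    if level < MINIMUM_HMC:
        return "below minimum"
    if level < RECOMMENDED_HMC:
        return "supported"
    return "recommended"
```

For example, hmc_status("7.3.3") reports "supported", while
hmc_status("7.3.5") reports "recommended".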
For information concerning HMC releases and the latest PTFs, go to the
following URL to access the HMC code packages:
http://www14.software.ibm.com/webapp/set2/sas/f/hmcl/home.html
NOTE: You must be logged in as hscroot in order for the firmware
installation to complete correctly.
IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management
Services (SMS) in this level of system firmware. There are several
limitations that should be considered.
1. When configuring a network interface card (NIC) for remote IPL, only
the most recently configured protocol (IPv4 or IPv6) is retained.
For example, if the network interface card was previously configured
with IPv4 information and is now being configured with IPv6 information,
the IPv4 configuration information is discarded.
2. A single network interface card may only be chosen once for the boot
device list. In other words, the interface cannot be configured for
the IPv6 protocol and for the IPv4 protocol at the same time.
3. A failure will occur if the overall device pathname string and its
parameters exceed 255 bytes. One symptom of the string being too long
is an odd-looking boot device string in the AIX start banner as in the
following example:
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: HH:MM MM/DD
The current time and date: 10:15:24 04/22/2008
processor count: 2; memory size: 1024MB; kernel size: 28034141
boot device: /l
-------------------------------------------------------------------------------
To reduce the overall string length, try the following:
A. Use the compressed form of the IPv6 IP addresses whenever possible.
For example, change the address
FEA0:0:0:0:3CD6:F0FF:FD00:3004
to
FEA0::3CD6:F0FF:FD00:3004
B. Keep the TFTP filename as short as possible.
C. Leave the gateway IP address blank unless it is required.
4. When global IPv6 addresses are used for the client and the server,
and there are more than two gateways on the same link, the gateway with
the best route to the server should be used. Using a gateway that does
not have the best route to the server can cause the ping test or network
boot to fail.
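The address compression and the 255-byte limit described above can be
checked before the values are entered in SMS. The following is a minimal
sketch using the Python standard library "ipaddress" module; the function
names are hypothetical, and the exact layout of the device pathname
string built by the firmware is not shown here, so the caller must supply
the full string.

```python
import ipaddress

# Limit stated in the text: the pathname plus its parameters must not
# exceed 255 bytes.
MAX_BOOT_PATH_BYTES = 255


def compress_ipv6(address: str) -> str:
    """Return the compressed form of an IPv6 address, in upper case."""
    return ipaddress.IPv6Address(address).compressed.upper()


def fits_boot_path(device_path: str) -> bool:
    """Return True if the full device pathname string is within the limit."""
    return len(device_path.encode("utf-8")) <= MAX_BOOT_PATH_BYTES
```

For example, compress_ipv6("FEA0:0:0:0:3CD6:F0FF:FD00:3004") returns
"FEA0::3CD6:F0FF:FD00:3004", which saves six bytes in the overall string.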
3.0 Firmware Information and Description
Use the following examples as a reference to determine whether your
installation will be concurrent or disruptive.
Note: The concurrent levels of system firmware may, on occasion, contain
fixes that are known as deferred. These deferred fixes can be installed
concurrently, but will not be activated until the next IPL. Deferred
fixes, if any, will be identified in the "Firmware Update Descriptions"
table of this document. For deferred fixes within a service pack, only
the fixes in the service pack which cannot be concurrently activated are
deferred.
Note: The file names and service pack levels used in the following
examples are for clarification only, and are not necessarily levels that
have been, or will be released.
System firmware file naming convention:
01EHXXX_YYY_ZZZ
- XXX is the release level
- YYY is the service pack level
- ZZZ is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack level (YYY
and ZZZ) are only unique within a release level (XXX).
For example, 01EH330_067_045 and 01EH340_067_053 are different service
packs.
An installation is disruptive if:
- The release levels (XXX) are different.
Example: Currently installed release is EH330, new release is EH340
- The service pack level (YYY) and the last disruptive service pack
level (ZZZ) are equal.
Example: EH330_120_120 is disruptive, no matter what level of EH330 is
currently installed on the system
- The service pack level (YYY) currently installed on the system is
lower than the last disruptive service pack level (ZZZ) of the service
pack to be installed.
Example: Currently installed service pack is EH330_120_120 and new
service pack is EH330_152_130
An installation is concurrent if:
- The release level (XXX) is the same, and
- The service pack level (YYY) currently installed on your system is the
same or higher than the last disruptive service pack level (ZZZ) of the
service pack to be installed.
Example: Currently installed service pack is EM310_126_120, new service
pack is EM310_143_120.
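The disruptive and concurrent rules above can be expressed as a short
check. This is a sketch only, not an IBM tool: the names FirmwareLevel,
parse_level, and is_disruptive are hypothetical, and the pattern assumes
level names of the form 01EHXXX_YYY_ZZZ (the "01" prefix and ".rpm"
suffix are treated as optional).

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # for example "EH330" (prefix plus XXX)
    service_pack: int     # YYY
    last_disruptive: int  # ZZZ


_LEVEL_RE = re.compile(r"^(?:01)?([A-Z]{2}\d{3})_(\d{3})_(\d{3})(?:\.rpm)?$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a level name such as '01EH330_104_034.rpm' into its parts."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return FirmwareLevel(match.group(1), int(match.group(2)), int(match.group(3)))


def is_disruptive(installed: str, new: str) -> bool:
    """Apply the three 'disruptive' rules; otherwise the install is concurrent."""
    current, target = parse_level(installed), parse_level(new)
    if current.release != target.release:
        return True  # release levels (XXX) differ
    if target.service_pack == target.last_disruptive:
        return True  # YYY equals ZZZ in the new service pack
    # Installed YYY is lower than the new pack's last disruptive level ZZZ.
    return current.service_pack < target.last_disruptive
```

Using the examples from the text, is_disruptive("EH330_120_120",
"EH330_152_130") is True, and is_disruptive("EM310_126_120",
"EM310_143_120") is False (a concurrent installation).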
Firmware Information and Update Description
Filename | Size | Checksum
01EH330_104_034.rpm | 37536596 | 03747
Note: The checksum can be found by running the AIX sum command against
the rpm file (only the first 5 digits are listed).
Example: sum 01EH330_104_034.rpm
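If the downloaded file has to be checked on a workstation without the
AIX sum command, the checksum can be approximated in a few lines. The
sketch below implements the 16-bit rotating, byte-by-byte checksum; it
assumes that this is the algorithm the AIX sum command uses by default,
which should be confirmed against the AIX documentation. The function
names are hypothetical.

```python
from typing import Iterable


def rotating_checksum(chunks: Iterable[bytes]) -> int:
    """16-bit rotating checksum over a sequence of byte chunks."""
    checksum = 0
    for chunk in chunks:
        for byte in chunk:
            # Rotate right by one bit within 16 bits, then add the byte.
            checksum = (checksum >> 1) | ((checksum & 1) << 15)
            checksum = (checksum + byte) & 0xFFFF
    return checksum


def file_checksum(path: str) -> str:
    """Return the checksum of a file as a zero-padded 5-digit string."""
    with open(path, "rb") as handle:
        blocks = iter(lambda: handle.read(1 << 16), b"")
        return f"{rotating_checksum(blocks):05d}"
```

The result of file_checksum("01EH330_104_034.rpm") can then be compared
with the Checksum column in the table above.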
EH330 |
EH330_104_034
04/26/10
|
Impact: Availability
Severity: ATT
System firmware changes that affect all systems
- DEFERRED: This fix corrects the handling of a specific processor
instruction sequence that has the potential to result in undetected data
errors. This specific instruction sequence has only been observed in a
small number of highly tuned floating point-intensive applications.
However, it is strongly recommended that this fix be applied to all
POWER6 systems. This fix has the potential to decrease system
performance on applications that make extensive use of floating point
divide, square root, or estimate instructions.
- A problem was fixed that caused SRC B181440C to be erroneously logged,
and a call home to be erroneously made, during the installation of
system firmware.
- A problem was fixed that caused SRC B1818A0A to be erroneously logged
during a concurrent firmware update.
- The firmware was enhanced such that SRCs B181F126, B181F127, and
B181F129 are correctly logged, and no longer cause unnecessary calls
home to be made.
- The firmware was enhanced so that SRC B181720D, and occasionally a
service processor dump, will not be generated when the service
processor's two Ethernet interfaces are on the same subnet. (This is an
invalid configuration.)
- In partitions running AIX or Linux, a problem was fixed that, under
certain rare circumstances, caused the addition of an I/O slot to a
partition using a dynamic LPAR (DLPAR) add operation to fail.
- A problem was fixed that caused the system to hang with SRCs B182953C,
B182954C and B17BE434 being logged.
- A problem was fixed that caused SRC B1818902 to be erroneously logged
during a firmware installation.
- A problem was fixed that caused a reset/reload of a node controller.
System firmware changes that affect certain systems
- On partitions running AIX or Linux, a problem was fixed that caused a
dynamic LPAR (DLPAR) operation to add an I/O slot to fail.
- On systems running redundant VIOS partitions, a problem was fixed that
prevented Ethernet traffic from being properly bridged between the two
partitions. This problem also prevented shared Ethernet adapter
failover from working correctly.
- On systems using InfiniBand switches for processor clustering, a
problem was fixed that caused InfiniBand ports to intermittently drop
out.
|
EH330_095_034
08/31/09
|
Impact: Usability
Severity: HIPER
System firmware changes that affect all systems
- DEFERRED: This fix corrects the handling of a specific processor
instruction sequence that was generated on a particular heavily-tuned
High Performance Computing (HPC) application. This specific instruction
sequence has the potential to produce an incorrect result. This
instruction sequence has only been observed in a single HPC application.
However, it is strongly recommended that you apply this fix.
- HIPER: A problem was fixed that caused the migration of a partition
using shared processors to fail with a reason code of 4180043, or caused
the source system to hang or crash.
- A problem was fixed that caused SRC 1000911B to be erroneously logged
during a reset/reload of the service processor.
System firmware changes that affect certain systems
- On systems with 7311-D11, 7314-G30, 5790, or 5796 19" drawers
attached, a problem was fixed that caused SRC 10009138 to be erroneously
logged.
Concurrent maintenance (CM) firmware fixes
- A problem was fixed that caused SRC B7005603 to be erroneously logged
when a F/C 5802 or 5877 19" drawer was concurrently added to the system.
|
EH330_092_034
05/18/09
|
Impact: Usability
Severity: Special Attention
System firmware changes that affect all systems:
- DEFERRED: A problem was fixed that caused the advanced system
management interface (ASMI) menus to become unresponsive, and the system
to appear to hang, when a GX adapter slot reservation was attempted when
the system was at service processor standby.
- The firmware was enhanced to improve the service processor's
capability to recover from bad bits in the flash memory. A predictive
error, or an unrecoverable error, will be logged against the card that
contains the system firmware if the number of correctable or
uncorrectable errors exceeds the threshold.
- A problem was fixed that prevented the service processor from
automatically booting from the permanent (or P) side if the temporary
(or T) side of the firmware flash was corrupted. When the problem
occurred, the service processor stopped instead of booting from the P
side.
- The firmware was enhanced so that SRC B1xxE458 (with word 6=0000E42B)
will be logged as informational instead of generating a call home.
- A problem was fixed that caused non-terminating SRCs (such as
B1818A1E) that indicate registry read errors to be logged during a
disruptive installation of system firmware.
- The firmware was enhanced to improve the field replaceable unit (FRU)
callouts when a clock failure occurs.
- A problem was fixed that caused a partition being migrated to become
unresponsive on the target system when firmware-assisted dump was
enabled.
- The callouts for SRC B181E6ED, which is logged when a system is booted
with service processor redundancy disabled, were improved to indicate
that redundancy was disabled rather than calling out a firmware failure.
- A problem was fixed that caused hardware to be deconfigured when the
system encountered network errors, even though the SRCs were being
logged as informational.
- A problem was fixed that caused the detailed data at the end of an
"early power off warning type 5" AIX error log entry to be filled with
invalid data instead of zeros.
- A problem was fixed that caused a partition being migrated to crash on
the target system.
- A problem was fixed that might cause a system to crash with SRC
B170E504 when a processor was dynamically deconfigured.
- The firmware was enhanced such that when data is written to the VPD
(Anchor) card, the results are verified, resulting in fewer VPD cards
being replaced.
- A problem was fixed that prevented all of the necessary files from
being synchronized between the primary and the secondary system
controllers. One possible symptom of this problem was the time-of-day
clocks being out of synch after a system controller failover.
- A problem was fixed that caused SRC B1818601 to be logged, and a
service processor dump to be generated, at runtime.
System firmware changes that affect certain systems:
- In systems using InfiniBand switches for processor clustering, a
problem was fixed that caused packets to be dropped under certain
circumstances.
- On systems with five or more nodes, a problem was fixed that prevented
the identify LED function from turning on the correct node's LED.
|
EH330_076_034
12/05/08
|
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The system initialization settings were changed to
reduce the likelihood of a system crash under extremely rare
circumstances.
- HIPER: A problem was fixed that caused a system to fail to reboot
after a B1xxE504 SRC was logged, due to a processor interconnection bus
failure. The same SRC, B1xxE504, was logged when the reboot failed.
- A problem was fixed that caused SRC 11001D1x to be erroneously logged
during system boot.
- A problem was fixed that might, if a platform dump occurred, have
caused a reset/reload of the service processor, and the platform dump to
be corrupted.
- A problem was fixed that caused incorrect field replaceable unit (FRU)
part numbers to be returned for the BPF scroll assembly, UEPO panel and
the CEC MDA scroll assembly.
- A problem was fixed that prevented the system from rebooting if an
error occurred during a memory-preserving IPL.
- The firmware was enhanced so that if a system with redundant system
controllers is booted with redundancy disabled, a call home error will
be logged.
- The firmware was enhanced so that a call home will be made if the
hypervisor issues a "terminate immediate" interrupt.
- A problem was fixed that prevented service processor and hypervisor
error log entries from being reported to the operating system after a
successful partition migration. This problem only affected the
partition that was migrated.
- On systems running AIX or Linux, a problem was fixed that, under
certain rare circumstances, might cause the operating system to crash.
- A problem was fixed that, in certain configurations, caused the
removal of a host Ethernet adapter (HEA) port to fail when using a
dynamic LPAR (DLPAR) operation.
- A problem was fixed that, under certain rare circumstances, caused the
hypervisor to crash when it was booting with SRC B6000103 being logged.
- A problem was fixed that, under certain circumstances, prevented the
operating system from recovering a PCI-E adapter on which a temporary
enhanced error handling (EEH) error occurred.
- A problem was fixed that, under certain rarely occurring
circumstances, caused the system to crash if an L2 or L3 cache failure
is not discovered and repaired when it initially occurs.
- A problem was fixed that caused the service processor diagnostics to
call out a processor as the failing item, instead of the memory DIMMs,
when a large number of memory error correction coding (ECC) errors
occurred.
- A problem was fixed that prevented the system from powering on after
the "reset to factory settings" option was selected in the advanced
system management interface (ASMI) menus.
- A problem was fixed that caused the wrong field replaceable unit (FRU)
to be called out when SRC B152F109, which indicates a problem with the
NVRAM in a bulk power controller (BPC), was logged.
- A problem was fixed that might cause a default catch to occur when
booting from an iSCSI device.
System firmware changes that affect certain systems:
- On systems with a host Ethernet adapter (HEA) or host channel adapter
(HCA) assigned to a Linux partition, a problem was fixed that prevented
the partition from booting if 512 GB, 1 TB, or 1.5 TB of memory was
assigned to the partition. When this problem occurred, SRC B700F105 was
logged.
- In systems with clustered processors, various problems were fixed in
the InfiniBand interconnection networks.
- A problem was fixed that, under certain circumstances, caused an AIX
or Linux partition to fail to boot with SRC D200E0AF being logged.
- On systems with external I/O frames, a problem was fixed that might
have prevented the firmware from "unthrottling" processors after
entering power save mode.
|
EH330_046_034
08/28/08
|
Impact: Function
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: A problem was fixed in which, under certain rarely
occurring circumstances, an application could cause a processor to go
into an error state, and the system to crash.
- HIPER: A problem was fixed that caused the system to terminate
abnormally with SRC B131E504.
- HIPER: A problem was fixed that might cause a partition to crash
during a partition migration before the migration was complete.
- DEFERRED: Enhancements were made to the system firmware to reduce the
system boot time on power up.
- DEFERRED: A problem was fixed such that under certain rare
circumstances, if a system controller failover occurred, the new
secondary system controller was not able to communicate with the system.
- DEFERRED: A problem was fixed that caused SRC B1608CB0 to be logged if
a separate I/O frame is attached to the CEC frame.
- A problem was fixed that caused multiple instances of SRC B1818A03 and
B1818A0A to be logged erroneously, and multiple calls home to be made,
during a frame connection reset.
- A problem was fixed that caused SRC B1819506 to be erroneously
generated, and a call home to be made, when service processor (or system
controller) error log entries were generated faster than they could be
processed.
- A problem was fixed that caused the hardware management console (HMC)
to show an "Incomplete" state after it attempted to read a file with an
incorrect size from the service processor (or system controller). This
problem also occurred if the "factory configuration" option was used on
the advanced system management interface (ASMI) menus.
- Enhancements were made to the firmware to improve the FRU callouts for
certain types of failures of the time-of-day clock circuitry.
- A problem was fixed that prevented a dump file larger than 4 GB from
being successfully off-loaded to the hardware management console (HMC).
- On systems with redundant bulk power controllers, a problem was fixed
that caused the hardware management console (HMC) to get stuck at
"Pending Authentication" for one of the bulk power controllers (BPCs).
- On systems with I/O drawers attached, a problem was fixed that might
have caused some I/O slots in the drawers not to be configured when the
system was booted.
- In systems with clustered processors, various problems were fixed in
the InfiniBand interconnection networks.
- A problem was fixed that caused the location codes of the external
InfiniBand ports on a 5791 I/O drawer with the InfiniBand interface to
be reported incorrectly on the HMC.
- A problem was fixed that caused SRC B7006971 to be generated because
the firmware was incorrectly performing operations on PCI-Express I/O
adapters during dynamic LPAR (DLPAR) operations on memory.
- A problem was fixed that might have caused an out-of-memory condition
in the hypervisor, with SRC B7000200 being logged.
- A problem was fixed in the thermal management firmware that caused
SRCs B1812635 and B1812636 to be logged, and the system or node to run
in low power mode when it should have been in nominal, or nominal when
it should have been in low power mode.
- A problem was fixed that caused SRC B1818A10 to be erroneously
generated after a successful installation of system firmware.
- A problem was fixed that caused the AIX commands "lsmcode" and "diag"
to fail after a partition migration.
- A problem was fixed that caused the message "BA330000malloc error!" to
be displayed on the operating system console after a partition
migration, even though SRC BA330000 had not been logged. When this
problem occurred, the partition migration appeared to be successful.
However, a process within the partition was either hung or had failed,
and in most cases the partition had to be rebooted to fully recover.
- A problem was fixed that caused the status of the connection between
the hardware management console (HMC) and the service processor to be
set to an invalid state. This might cause problems when the HMC and
service processor tried to communicate.
- A problem was fixed that caused partitions that were being rebooted to
hang at D200E0AF after a concurrent firmware update under certain
circumstances.
- A problem was fixed that prevented the replacement of a system
controller from completing successfully if the system controller had
been guarded out prior to its replacement.
- A problem was fixed that caused the system controller to go through an
unnecessary reset/reload cycle when a checkstop occurred or the system
was powered off.
- Enhancements were made to the firmware to improve the FRU callouts for
certain types of failures of the node controller.
- A problem was fixed that caused predictive SRC B181EF88 to be logged
when, under certain circumstances, a system controller failover occurred
at runtime.
- A problem was fixed such that if redundancy was disabled, and the
emergency power off (EPO) switch was then used to power off the system,
redundancy was erroneously enabled when the system came back up.
- Enhancements were made to the firmware to improve the FRU callouts for
certain types of failures of a node controller.
- A problem was fixed that caused the service processor (or system
controller) to lose its communication link with the hypervisor, and SRC
A181D000 to be logged, under certain rare circumstances.
- On systems using virtual shared processor pools (VSPP), a problem was
fixed that caused the number of processors assigned to the partitions to
be reduced after a memory-preserving IPL.
|
EH330_034_034
06/10/08
|
Impact: Function
Severity: HIPER
This level is a disruptive update from the prior level, EH330_018.
The system should be powered off before installing this level of system
firmware. If this level is installed when the system is running, the
CECs will be rebooted, causing all partitions to be terminated, and a
reboot will be required.
System firmware changes that affect all systems:
- HIPER: A problem was fixed that caused a concurrent firmware
installation to hang with SRC BA00E840 being logged. This problem may
also cause a partition migration to hang, under certain circumstances,
with the same SRC, BA00E840, being logged. This SRC will be logged when
this level of firmware is installed and will generate a call home; it
should be ignored. It will not be logged during subsequent
installations.
- HIPER: The processor initialization settings were changed to reduce
the likelihood of a processor going into an error state and causing a
checkstop or system crash.
- HIPER: A problem was fixed that, under certain circumstances, caused a
system termination during a service processor failover.
- HIPER: A problem was fixed that caused large numbers of enhanced error
handling (EEH) errors to be logged against the 4-port gigabit Ethernet
adapter, F/C 5740, under certain circumstances.
- HIPER: On systems with redundant system controllers installed and
enabled, a problem was fixed that might cause a communications hang
between the two system controllers. When this occurred, it triggered a
reset/reload of the primary system controller, and the resulting
fail-over to the secondary system controller failed in such a way that
the system crashed.
- Several problems were fixed that might cause one or both of the clock
cards to be deconfigured, and erroneously called out as bad, when the
system boots up from the power-off state.
- A problem was fixed that caused the /tmp directory on the system
controllers and the service processor in the bulk power controller (BPC)
to fill up, which results in an out-of-memory condition. When this
problem occurred, the system controllers or service processor in the BPC
usually performed a reset/reload. This is one possible cause of SRC
B1817201 being logged.
- A problem was fixed that prevented the "i5/OS enable/disable" setting
(in the ASMI power on/off menu) from taking effect when the system is
booted. This solution requires the system to be booted up to hypervisor
standby twice after the setting is changed to "enabled". This will be
fixed in a future service pack to remove the requirement for the second
boot to hypervisor standby.
- A problem was fixed that caused the firmware to receive a false error
indication when reading the registers on the LED controller. SRC
B1811340 was logged when this happened.
- A problem was fixed that prevented an error fail-over to the secondary
system controller from completing successfully.
- A problem was fixed that might have caused a system firmware
installation to fail with SRC B18138B7 being logged.
- A problem was fixed that caused an error log to be generated that
called out system controller A (Un-P1-C2), instead of the correct
callout, which was system controller B (Un-P2-C5).
- A problem was fixed that caused the P1 LED on the front light strip to
be on when it should have been off.
- A problem was fixed that caused the wrong memory DIMM location to be
called out when certain types of failures occurred.
- A problem was fixed that might have caused cache chip failures when
the system is operating in Power Save mode. Error log entries that
might indicate that this problem is occurring include correctable errors
and uncorrectable errors in L2, i-cache and d-cache memory, parity
errors, and SRC B181E504.
- The firmware was enhanced so that the IDs "celogin1" and "celogin2"
allow an authorized service provider to log into the bulk power
controller (BPC).
- A problem was fixed that caused a partition using a host channel
adapter (HCA) or host Ethernet adapter (HEA) to appear to hang (with
progress code D200C1FF being displayed) before successfully shutting
down. The amount of time the partition appeared to hang depended on the
amount of memory assigned to the partition and the usage of HCA or HEA.
- A problem was fixed that prevented the HMC from connecting to the
managed system if the HMC's DHCP server IP range is changed when the
managed system is running.
- The error logging and FRU callout firmware was enhanced so that if a
failure occurs on one or both clock cards, only one will get
deconfigured, and the system will continue to try to boot instead of
terminating.
- The firmware was enhanced to improve the system memory error recovery.
- The firmware was enhanced so that the contents of the /tmp directory
are included when a service processor dump is taken.
- A problem was fixed in the hypervisor that might cause a partition
migration to fail.
- The firmware was enhanced so that:
- A failure when writing VPD to a P6 processor will cause the node to be
deconfigured rather than terminating the system.
- The failure of a VPD write operation will not corrupt the VPD table,
which may lead to unnecessary system down-time and unnecessary FRU
replacement.
System firmware changes that affect certain systems:
- On systems using QLogic InfiniBand switches, a problem was fixed that
caused the PortInfo:linkWidthActive and PortInfo:linkSpeedActive to be
inaccurately stored and displayed on the display of subnet parameters.
|
EH330_018_018
05/13/08
|
Impact: New
Severity: New
|
4.0 How to Determine Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System
Management Interface (ASMI) Welcome pane. It appears in the top right
corner. Example: EH330_104.
5.0 Downloading the Firmware Package
Follow the instructions on the web page. You must read and agree to the
license agreement to obtain the firmware packages.
Note: If your HMC is not internet-connected you will need to download
the new firmware level to a CD-ROM or ftp server.
6.0 Installing the Firmware
The method used to install new firmware will depend on the release level
of firmware which is currently installed on your server. The release
level can be determined by the prefix of the new firmware's filename.
Example: EHXXX_YYY_ZZZ
Where XXX = release level
- If the release level will stay the same (Example: Level EH330_075_075
is currently installed and you are attempting to install level
EH330_081_075) this is considered an update.
- If the release level will change (Example: Level EH330_081_075 is
currently installed and you are attempting to install level
EH340_096_096) this is considered an upgrade.
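The update/upgrade distinction above depends only on the release-level
prefix, so it can be sketched in a few lines. The helper names below are
hypothetical and the code is an illustration, not part of any IBM tool;
it assumes level names of the form EHXXX_YYY_ZZZ with an optional leading
"01".

```python
def release_of(level: str) -> str:
    """Return the release prefix (for example 'EH330') of a level name."""
    name = level.strip()
    if name.startswith("01"):
        name = name[2:]  # drop the file-name prefix, if present
    return name.split("_", 1)[0]


def install_type(installed: str, new: str) -> str:
    """Return 'update' if the release level stays the same, else 'upgrade'."""
    return "update" if release_of(installed) == release_of(new) else "upgrade"
```

With the examples from the text, install_type("EH330_075_075",
"EH330_081_075") gives "update" and install_type("EH330_081_075",
"EH340_096_096") gives "upgrade".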
Instructions for installing firmware updates and upgrades can be found
at http://publib.boulder.ibm.com/infocenter/systems/scope/hw/topic/ipha1/updupdates.htm