Power6 Mid-Range Firmware
Applies to: 9117-MMA and 9406-MMA
This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.
1.0 Systems Affected
This package provides firmware for System p 570 (9117-MMA), System i 570 (9406-MMA) and Power 570 (9117-MMA) servers only. Do not use it on any other systems.
The firmware level in this package is EM320_101_045.
Updating your system firmware from EM320_031 or EM320_040 directly to EM320_101 is DISRUPTIVE.
2.0 Cautions and Important Information
2.1 Cautions
Concurrent Maintenance Restrictions
Problems have been identified in the system firmware that impact CEC Concurrent Maintenance on Power6 servers 9117-MMA and 9406-MMA. CEC Concurrent Maintenance functions must not be performed until the firmware level containing the fixes has been installed on the server. The fixes for these functions will be available in a future Service Pack.
New minimum system firmware requirements for model MMA systems shipped starting in May 2008
9117-MMA systems shipped beginning in May 2008 have a unique system VPD value that identifies them to the HMC. These systems ship with EM320_059 or higher system firmware installed. Installing a level of system firmware lower than EM320_059 will result in an error condition, with SRC B155A46B being logged when the system is powered on.
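As a minimal sketch (not an IBM tool), the check below compares an installed level against the EM320_059 minimum. It assumes levels are written as EMXXX_YYY, optionally with the 01 prefix, _ZZZ suffix, and .rpm extension described in section 3.0, and that a higher release (XXX) is always newer than a lower one. `parse_level` and `meets_minimum` are hypothetical names.

```python
import re

# Hypothetical helper (not an IBM tool): accepts "EM320_059",
# "01EM320_101_045" or "01EM320_101_045.rpm".
_LEVEL = re.compile(r"^(?:01)?EM(\d{3})_(\d{3})(?:_\d{3})?(?:\.RPM)?$")

# Minimum for 9117-MMA systems shipped beginning in May 2008 (from the text).
MAY_2008_MINIMUM = "EM320_059"


def parse_level(level: str) -> tuple:
    """Return (release XXX, service pack YYY) as integers."""
    m = _LEVEL.match(level.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    return int(m.group(1)), int(m.group(2))


def meets_minimum(installed: str, minimum: str = MAY_2008_MINIMUM) -> bool:
    """True if installed >= minimum, comparing release first, then service pack."""
    return parse_level(installed) >= parse_level(minimum)


if __name__ == "__main__":
    for lvl in ("EM320_046", "EM320_059", "01EM320_101_045.rpm"):
        ok = meets_minimum(lvl)
        print(f"{lvl}: {'OK' if ok else 'below minimum (SRC B155A46B expected at power on)'}")
```

Comparing (release, service pack) tuples keeps the check correct across releases as well as within one.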
HMC-Managed Systems
NOTE:
- For systems shipped beginning in May 2008, you must upgrade your HMC code to Version 7, Release 3.3.0.
- For systems shipped before May 2008, you may continue using HMC code Version 7, Release 3.2.
For information concerning HMC releases and to access the HMC code packages, go to the following URL: http://www14.software.ibm.com/webapp/set2/sas/f/hmcl/home.html
NOTE: You must be logged in as hscroot in order for the firmware installation to complete correctly.
Updating firmware from EM320_031 to EM320_101
Prior to updating server firmware from EM320_031 to EM320_101, ensure that a backup of the partition profile(s) is current and that HEA settings have been collected (when applicable).
The following steps may be required if the HMC shows a Recovery state after updating server firmware:
1. Restore partition data from the backup.
2. Check and, if necessary, reset any of the following settings, which may have been lost during the server firmware update:
- Promiscuous partition flag
- HEA
- The boot lists in SMS on all LPARs
- Other settings
9117-MMA Systems with IBM i partition(s) at EM320_040 or EM320_046 updating to EM320_101
9117-MMA systems with IBM i partition(s) currently running server firmware level EM320_040 or EM320_046 must check that the system is enabled to run i5/OS. This option is specified in the Advanced System Management Interface (ASMI), under the Power/Restart Control, Power On/Off System menu. If this setting is not set to Enabled, change it to Enabled, save the settings, and then perform a disruptive firmware update (system IPL required) to EM320_101. After the disruptive firmware update is complete, verify that i5/OS Capable = 'True' on the HMC. This managed system setting is shown on the HMC in the System properties, under the Capabilities tab.
System stability may be affected by the installation of downlevel system firmware, or by using the i5/OS operating system's firmware installation procedure
Both of these actions may affect system stability:
1. Installation of system firmware via the i5/OS operating system (also known as an inband update).
2. Downleveling the system firmware using the HMC (an out-of-band installation) from EM320_059 or EM320_061 to a lower level, then performing a concurrent firmware installation of EM320_059 or EM320_061.
Neither of the above actions is supported; both may result in SRC 11001611, 11001621 or 11001631 being displayed on the operator panel and logged as a serviceable event in Service Focal Point. These SRCs indicate a regulator failure, which can be a recoverable error (logged only, with no effect on the state of the system) or a hard error (the system powers down due to the regulator error). Because an HMC is attached to the system, the only supported method of installing system firmware is via the HMC (the way the system was originally shipped from manufacturing).
Do not backlevel from EM320_031 to EM310
Do not attempt to backlevel firmware from the EM320_031 level to the EM310 release level. This will corrupt the service processor(s) code and will require the service processor(s) to be replaced.
Firmware update or upgrade fails with SRC E302F842
This problem occurs when both of the following conditions apply:
- The HMC is at V7.3.2 with fix MH01081 installed.
- The managed system being updated or upgraded is at firmware level EM310_048.
To determine if MH01081 is installed, enter the following command on an HMC command line:
lshmc -V
This command will produce a report similar to the following:
MH01081: Pegasus security fix, code update fix, and new DST updates (01-09-2008)
To prevent this failure from occurring, install fix MH01084. If you have already experienced this problem, install fix MH01084 and then reinstall the system firmware. For information about the recovery procedure, call your next level of support.
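The MH01081/E302F842 check above can be scripted against saved `lshmc -V` output. This is a minimal sketch under stated assumptions: it relies only on fix lines shaped like the `MH01081: ...` example shown, the caller supplies the HMC version and the managed system's level rather than having them parsed, and treating an installed MH01084 as clearing the risk is an inference from the text. The function names are hypothetical, and the script is meant to run on a workstation against copied output, not on the HMC.

```python
import re

# Assumption: only lines shaped like "MH01081: <description>" are relied on;
# every other line of the lshmc -V report is ignored.
_FIX_LINE = re.compile(r"^\s*(MH\d{5}):", re.MULTILINE)


def installed_fixes(report: str) -> set:
    """Return the HMC fix IDs (for example 'MH01081') listed in the report."""
    return set(_FIX_LINE.findall(report))


def at_risk_of_e302f842(report: str, hmc_is_v7r3_2: bool, system_level: str) -> bool:
    """Apply the conditions described above.

    True when the HMC is V7.3.2 with MH01081, the managed system is at
    EM310_048, and MH01084 (the preventive fix) is not installed.
    """
    fixes = installed_fixes(report)
    return (
        hmc_is_v7r3_2
        and "MH01081" in fixes
        and "MH01084" not in fixes
        and system_level.strip().upper().startswith("EM310_048")
    )


# Sample text copied from the report excerpt above (saved output, not a live HMC call).
SAMPLE = "MH01081: Pegasus security fix, code update fix, and new DST updates (01-09-2008)\n"

if __name__ == "__main__":
    print(sorted(installed_fixes(SAMPLE)))                 # ['MH01081']
    print(at_risk_of_e302f842(SAMPLE, True, "EM310_048"))  # True
```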
2.2 Important Information
IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management Services (SMS) in this level of system firmware. There are several limitations that should be considered:
1. When configuring a network interface card (NIC) for remote IPL, only the most recently configured protocol (IPv4 or IPv6) is retained. For example, if the network interface card was previously configured with IPv4 information and is now being configured with IPv6 information, the IPv4 configuration information is discarded.
2. A single network interface card may only be chosen once for the boot device list. In other words, the interface cannot be configured for the IPv6 protocol and for the IPv4 protocol at the same time.
3. A failure will occur if the overall device pathname string and its parameters exceed 255 bytes. One symptom of the string being too long is an odd-looking boot device string in the AIX start banner, as in the following example:
-------------------------------------------------------------------------------
Welcome to AIX.
boot image timestamp: HH:MM MM/DD
The current time and date: 10:15:24 04/22/2008
processor count: 2; memory size: 1024MB; kernel size: 28034141
boot device: /l
-------------------------------------------------------------------------------
Several things can be done to try to reduce the overall string length:
A. Use the compressed form of IPv6 addresses whenever possible. For example, change the address FEA0:0:0:0:3CD6:F0FF:FD00:3004 to FEA0::3CD6:F0FF:FD00:3004.
B. Keep the TFTP filename as short as possible.
C. Leave the gateway IP address blank unless it is required.
4. When global IPv6 addresses are used for the client and the server, and there are more than two gateways on the same link, the gateway with the best route to the server should be used. Using a gateway that does not have the best route to the server can cause the ping test or network boot to fail.
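The compression tip (item A) and the 255-byte limit (item 3) can be checked before typing values into SMS. This is a minimal sketch using Python's standard `ipaddress` module. How SMS assembles the full device pathname string is not described here, so `fits_boot_string_limit` (a hypothetical helper, like `compress_ipv6`) simply measures whatever string you pass it.

```python
import ipaddress

# Limit stated in the text for the device pathname string plus its parameters.
MAX_BOOT_STRING_BYTES = 255


def compress_ipv6(addr: str) -> str:
    """Return the compressed form of an IPv6 address.

    ipaddress drops leading zeros and replaces the longest run of zero
    groups with '::'. It emits lowercase hex, so the result is upper-cased
    here only to match the style used in this document.
    """
    return ipaddress.IPv6Address(addr).compressed.upper()


def fits_boot_string_limit(device_string: str) -> bool:
    """True if the string, measured in bytes, is within the 255-byte limit."""
    return len(device_string.encode("utf-8")) <= MAX_BOOT_STRING_BYTES


if __name__ == "__main__":
    print(compress_ipv6("FEA0:0:0:0:3CD6:F0FF:FD00:3004"))  # FEA0::3CD6:F0FF:FD00:3004
```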
Signal Cable in an InfiniBand loop, and InfiniBand I/O drawer power on/off
The problems noted in this section in earlier levels of this description file were corrected in the EM320_059 firmware level by the last two fixes in the "affects certain systems" section.
ECA702 Released for 9117-MMA Systems
ECA702 was released on 12/07/2007 to update 9117-MMA systems to firmware level EM310_063_048 (or higher). In addition to system firmware, the ECA also provides corresponding HMC updates. Product Engineering strongly recommends installing the ECA. Customers who want IBM service to perform the installation of this firmware, free of charge, should call 1-800-IBM-SERV or their country's service organization to request mandatory ECA702.
Memory Considerations for Firmware Upgrades
Later firmware releases use more memory than earlier releases because of the additional functionality they provide.
2.3 Planning Information
Processor MES that requires a minimum firmware level to be installed on the system before MES installation
Firmware level EM320_093 or higher must be installed on the system before installing an MES processor upgrade with any of the following processor feature codes that have been shipped from the plant after 06/22/09:
Machine Type-Model | Processor Feature Code
9117-MMA, 9406-MMA | 5620 - 3.5 GHz Proc Card, 0/2 Core POWER6
9117-MMA, 9406-MMA | 5622 - 4.2 GHz Proc Card, 0/2 Core POWER6
9117-MMA, 9406-MMA | 7380 - 4.7 GHz Proc Card, 0/2 Core POWER6
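The planning rule above can be expressed as a small check. This sketch assumes the plant ship date of the processor feature is known and that levels use the EMXXX_YYY form from section 3.0. `firmware_update_needed_before_mes` is a hypothetical helper, and the lookup values are copied from the table.

```python
import re
from datetime import date

# Values taken from the planning table above.
AFFECTED_MODELS = {"9117-MMA", "9406-MMA"}
AFFECTED_FEATURE_CODES = {"5620", "5622", "7380"}
SHIPPED_AFTER = date(2009, 6, 22)  # "shipped from the plant after 06/22/09"
REQUIRED_LEVEL = (320, 93)         # EM320_093

_LEVEL = re.compile(r"^(?:01)?EM(\d{3})_(\d{3})(?:_\d{3})?$")


def _parse(level: str) -> tuple:
    """Return (release XXX, service pack YYY) as integers."""
    m = _LEVEL.match(level.strip().upper())
    if m is None:
        raise ValueError(f"unrecognized firmware level: {level!r}")
    return int(m.group(1)), int(m.group(2))


def firmware_update_needed_before_mes(model: str, feature_code: str,
                                      ship_date: date, installed_level: str) -> bool:
    """True if EM320_093 or higher must be installed before this processor MES."""
    if model not in AFFECTED_MODELS or feature_code not in AFFECTED_FEATURE_CODES:
        return False
    if ship_date <= SHIPPED_AFTER:
        return False  # rule applies only to features shipped after 06/22/09
    return _parse(installed_level) < REQUIRED_LEVEL
```

For example, a 5622 card shipped on 07/01/09 for a system at EM320_083 returns True, so the firmware must be updated first.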
3.0 Firmware Information and Description
Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.
For systems that are not managed by an HMC, the installation of system firmware is always disruptive.
Note: The concurrent levels of system firmware may, on occasion, contain fixes that are known as deferred. These deferred fixes can be installed concurrently, but will not be activated until the next IPL. Deferred fixes, if any, will be identified in the "Firmware Update Descriptions" table of this document. For deferred fixes within a service pack, only the fixes in the service pack which cannot be concurrently activated are deferred.
Note: The file names and service pack levels used in the following examples are for clarification only, and are not necessarily levels that have been, or will be, released.
System firmware file naming convention:
01EMXXX_YYY_ZZZ
- XXX is the release level
- YYY is the service pack level
- ZZZ is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack level (YYY and ZZZ) are only unique within a release level (XXX). For example, 01EM310_067_045 and 01EM320_067_053 are different service packs.
An installation is disruptive if:
- The release levels (XXX) are different.
Example: Currently installed release is EM310, new release is EM320.
- The service pack level (YYY) and the last disruptive service pack level (ZZZ) are the same.
Example: EM310_120_120 is disruptive, no matter what level of EM310 is currently installed on the system.
- The service pack level (YYY) currently installed on the system is lower than the last disruptive service pack level (ZZZ) of the service pack to be installed.
Example: Currently installed service pack is EM310_120_120 and new service pack is EM310_152_130.
An installation is concurrent if:
- The release level (XXX) is the same, and
- The service pack level (YYY) currently installed on your system is the same as or higher than the last disruptive service pack level (ZZZ) of the service pack to be installed.
Example: Currently installed service pack is EM310_126_120, new service pack is EM310_143_120.
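The disruptive/concurrent rules above can be encoded as a short function. This is a sketch under stated assumptions, not an IBM utility: it accepts names in the 01EMXXX_YYY_ZZZ form (the 01 prefix and .rpm extension are optional, and the _ZZZ suffix is optional for the installed level only), and `parse_level` and `is_disruptive` are hypothetical names. The examples in `__main__` are the ones given in the text.

```python
import re
from typing import NamedTuple, Optional

# 01EMXXX_YYY_ZZZ; the 01 prefix, _ZZZ and .rpm suffix are optional here so
# that an installed level such as "EM320_046" can also be parsed.
_NAME = re.compile(r"^(?:01)?EM(\d{3})_(\d{3})(?:_(\d{3}))?(?:\.RPM)?$")


class Level(NamedTuple):
    release: int                    # XXX
    service_pack: int               # YYY
    last_disruptive: Optional[int]  # ZZZ (None if not given)


def parse_level(text: str) -> Level:
    m = _NAME.match(text.strip().upper())
    if m is None:
        raise ValueError(f"not an EMXXX_YYY[_ZZZ] level: {text!r}")
    zzz = m.group(3)
    return Level(int(m.group(1)), int(m.group(2)),
                 int(zzz) if zzz is not None else None)


def is_disruptive(installed: str, new: str, hmc_managed: bool = True) -> bool:
    """Apply the disruptive/concurrent rules listed above."""
    cur, nxt = parse_level(installed), parse_level(new)
    if nxt.last_disruptive is None:
        raise ValueError("the new level must include ZZZ, e.g. EM320_101_045")
    if not hmc_managed:
        return True                                # no HMC: always disruptive
    if cur.release != nxt.release:
        return True                                # different release (XXX)
    if nxt.service_pack == nxt.last_disruptive:
        return True                                # YYY equals ZZZ
    return cur.service_pack < nxt.last_disruptive  # installed YYY below new ZZZ


if __name__ == "__main__":
    print(is_disruptive("EM310_120_120", "EM310_152_130"))    # True
    print(is_disruptive("EM310_126_120", "EM310_143_120"))    # False
    print(is_disruptive("EM320_040", "01EM320_101_045.rpm"))  # True
```

These rules describe only what the firmware levels allow. The i5/OS enablement procedure in section 2.1 calls for a disruptive update to EM320_101 even from EM320_046, which these rules alone would classify as concurrent.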
Firmware Information and Update Description
Filename | Size | Checksum
01EM320_101_045.rpm | 22393332 | 05901
EM320
EM320_101_045 (10/22/09)
Impact: Function
Severity: HIPER
System firmware changes that affect all systems
- HIPER: A problem was fixed that caused the migration of a partition using shared processors to fail with a reason code of 4180043, or caused the source system to hang or crash.
- DEFERRED: This fix corrects the handling of a specific processor instruction sequence that was generated by a particular heavily-tuned High Performance Computing (HPC) application. This instruction sequence has the potential to produce an incorrect result. It has only been observed in a single HPC application; however, it is strongly recommended that you apply this fix.
- The firmware was enhanced so that SRCs B181F126, B181F127, and B181F129 are correctly managed and no longer cause unnecessary calls home.
- A problem was fixed that caused SRC B7005603 to be erroneously logged when an F/C 5802 or 5877 19" drawer was concurrently added to the system.
- A problem was fixed that caused SRC B1817201 to be erroneously logged during the installation of system firmware.
System firmware changes that affect certain systems
- On systems using on/off (temporary) memory capacity on demand (COD), the firmware was enhanced to improve the billing process for this feature.
EM320_093_045 (05/04/09)
Impact: Function
Severity: SPE
System firmware changes that affect all systems
- DEFERRED: The firmware was enhanced so that the system recovers gracefully from an I/O load time-out, rather than issuing a machine check, which crashes the system.
- A problem was fixed that caused the service processor diagnostics to report a "TOD (time-of-day) overflow" error, instead of an uncorrectable memory error, when failures occurred on memory DIMMs.
- A problem was fixed that, in certain configurations, caused the removal of a host Ethernet adapter (HEA) port to fail when using a dynamic LPAR (DLPAR) operation.
- A problem was fixed that, under certain circumstances, prevented the operating system from recovering a PCI-E adapter on which a temporary enhanced error handling (EEH) error occurred.
- A problem was fixed that caused the hardware management console (HMC) to show the managed system's status as incomplete after adding a drawer using the concurrent maintenance operation.
- The firmware was enhanced to improve the service processor's capability to recover from bad bits in the flash memory. A predictive error, or an unrecoverable error, will be logged against the card that contains the system firmware if the number of correctable or uncorrectable errors exceeds the threshold.
- The firmware was enhanced so that a call home will be made if the hypervisor issues a "terminate immediate" interrupt.
- A problem was fixed that prevented service processor and hypervisor error log entries from being reported to the operating system after a successful partition migration. This problem only affected the partition that was migrated.
- The firmware was enhanced so that if a system with redundant service processors is booted with redundancy disabled, a call home error will be logged.
- A problem was fixed that prevented the system from powering on after the "reset to factory settings" option was selected in the advanced system management interface (ASMI) menus.
- A problem was fixed that caused the migration of an AIX or Linux partition to fail when firmware-assisted dump was enabled. When this problem occurs, the partition becomes unresponsive on the target system, and the target system may have to be rebooted to recover.
- A problem was fixed that prevented the service processor from automatically booting from the permanent side (P side) if the temporary side (T side) of the firmware flash was corrupted. When the problem occurred, the service processor stopped instead of booting from the P side.
- A problem was fixed that caused SRC B1818601 to be logged, and a service processor dump to be generated, at runtime.
- The firmware was enhanced to include processor card #1 in the list of field replaceable units (FRUs) that are called out if an I2C bus error occurs when accessing the processor backplane's vital product data (VPD).
- A problem was fixed that prevented all of the necessary files from being synchronized between the primary and the secondary service processors. One possible symptom of this problem was the time-of-day clocks being out of sync after a service processor failover.
System firmware changes that affect certain systems
- On systems with a host Ethernet adapter (HEA) or host channel adapter (HCA) assigned to a Linux partition, a problem was fixed that prevented the partition from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the partition. When this problem occurred, SRC B700F105 was logged.
- On systems with multiple host channel adapter (HCA) cards, a problem was fixed that caused logical ports on the HCA cards to be intermittently inactive.
- On systems with the integrated xSeries adapter (IXA), a problem was fixed that prevented the creation of a system plan on the HMC.
- On systems with redundant service processors, a problem was fixed that caused registry read errors or registry value errors to be generated during the installation of system firmware.
- On systems running AIX partitions, a problem was fixed that caused AIX to erroneously log a hardware error in which the LABEL field is "INTRPPC_ERR" and the INTERRUPT LEVEL is "0009 0001", after a concurrent firmware update or partition migration. This error did not affect the operation of the system or partition.
EM320_083_045 (09/24/08)
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: A problem was fixed in which, under certain rarely occurring circumstances, an application could cause a processor to go into an error state and the system to crash.
- DEFERRED and HIPER: The system initialization settings were changed to reduce the likelihood of a system crash under certain circumstances.
- HIPER: A problem was fixed that caused the system to terminate abnormally with SRC B131E504.
- HIPER: A problem was fixed that caused a system to fail to reboot after a B1xxE504 SRC was logged due to a processor interconnection bus failure. The same SRC, B1xxE504, was logged when the reboot failed.
- HIPER: A problem was fixed that might cause a partition to crash during a partition migration before the migration was complete.
- DEFERRED: A problem was fixed in which, under certain rare circumstances, if a service processor failover occurred, the new secondary service processor was not able to communicate with the system.
- A problem was fixed that caused SRC B1818A10 to be erroneously generated after the successful installation of system firmware.
- Enhancements were made to the firmware to improve the FRU callouts for certain types of failures of the time-of-day clock circuitry.
- A problem was fixed that, under certain rarely occurring circumstances, caused the system to crash if an L2 or L3 cache failure occurred.
- The firmware was enhanced so that the contents of /tmp are included when a service processor dump is taken.
- A problem was fixed that caused a predictive SRC, B181EF88, to be erroneously logged after a successful installation of system firmware, and a subsequent slow-mode IPL, of the system.
- A problem was fixed that, under certain rarely occurring circumstances, caused the system to crash with SRC B7005191 being logged.
- A problem was fixed that prevented the system from rebooting if an error occurred during a memory-preserving IPL.
- A problem was fixed that prevented the diagnostic commands in AIX (diag and lsmcode, for example) from working after a partition migration.
- A problem was fixed that, under certain rarely occurring circumstances, caused a partition shutdown or partition reboot to hang with SRC D200B077.
- A problem was fixed that, under certain rarely occurring circumstances, caused the hypervisor to lose its communication link to the service processor and log SRC A181D000.
- A problem was fixed that, under certain rarely occurring circumstances, might have caused dynamic LPAR (DLPAR) operations on memory to fail.
- A problem was fixed that prevented I/O hardware operations from completing before dynamic LPAR (DLPAR) operations were performed on memory. This caused PCI bus errors, and multiple instances of SRC B7006971 to be logged.
- A problem was fixed in the hypervisor that, under certain rarely occurring circumstances, caused a system-level activation to fail.
- A problem was fixed that caused SRC B7006971 to be generated because the firmware was incorrectly performing operations on PCI-Express I/O adapters during dynamic LPAR (DLPAR) operations on memory.
- A problem was fixed that might have caused a processor checkstop after a node repair or node add operation.
- A problem was fixed that caused the message "BA330000malloc error!" to be displayed on the operating system console after a partition migration, even though SRC BA330000 had not been logged. When this problem occurred, the partition migration appeared to be successful. However, a process within the partition was either hung or had failed, and in most cases the partition had to be rebooted to fully recover.
- The firmware was enhanced to improve the description and service actions that are logged with SRC BA210012.
- A problem was fixed that, under certain rare circumstances, prevented a partition migration from completing successfully if processors were removed from the partition being migrated prior to the migration using dynamic LPAR (DLPAR) operations.
- A problem was fixed that, under certain rare circumstances, caused a system crash during partition migration operations.
- A problem was fixed that, under certain rare circumstances, caused the hypervisor to crash when it was booting.
System firmware changes that affect certain systems:
- On systems that are managed by a hardware management console (HMC), a problem was fixed that caused the HMC to show an "Incomplete" state after it attempted to read a file with an incorrect size from the service processor (or system controller). This problem also occurred if the "factory configuration" option was used on the advanced system management interface (ASMI) menus.
- On systems with I/O drawers attached, a problem was fixed that might have caused some I/O slots in the drawers not to be configured when the system was booted.
- On i5 partitions using IOP-based I/O adapters that are configured to use i5 clustering (SAN), a problem was fixed that caused the failover of an I/O drawer or tower to a system that previously owned the drawer or tower to fail.
- On systems with a large number of fibre channel disks, a problem was fixed that caused SRC BA210003 to be logged (calling out the fibre channel adapter) when the system management services (SMS) boot firmware was searching for a boot disk.
- On systems with clustered processors, various problems were fixed in the InfiniBand interconnection networks.
EM320_076_045 (06/09/08)
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The processor initialization settings were changed to reduce the likelihood of a processor going into an error state and causing a checkstop or system crash.
- HIPER: A problem was fixed in the hypervisor that might cause a partition migration to fail.
- HIPER: A problem was fixed that caused large numbers of enhanced error handling (EEH) errors to be logged against the 4-port gigabit Ethernet adapter, F/C 5740, under certain circumstances.
- HIPER: A problem was fixed that caused the firmware to erroneously log VPD errors against the processors. This prevented drawers from powering on.
- HIPER: On systems with a redundant service processor installed and enabled, a problem was fixed that caused a communications hang between the two service processors. When this occurred, it triggered a reset/reload of the primary service processor, and the resulting failover to the secondary service processor failed in such a way that the system crashed and logged SRC B1813410. Service processor dumps were also taken.
- HIPER: On systems with redundant service processors installed and enabled, the firmware was enhanced so that if a failure occurs during a service processor failover, the firmware will attempt to reset/reload one of the service processors. This may allow the system to recover and stay up instead of crashing.
- HIPER: On systems with redundant service processors installed and enabled, a problem was fixed that caused the system to crash if a service processor failover occurred while the VPD files were being synchronized.
- The firmware was enhanced to improve system memory error recovery.
- A problem was fixed that caused the /tmp directory on the service processor to fill up, which resulted in an out-of-memory condition. When this problem occurred, the service processor usually performed a reset/reload. This is one possible cause of SRC B1817201 being logged.
- A problem was fixed that caused panel function 02 to fail when trying to set the "next IPL speed" or "next IPL side".
- The firmware was enhanced so that serial port S1 is not automatically designated the local console, even if the console is not selected within 60 seconds of the system first being booted. This enhancement allows the console to be selected again, if no selection was made on the previous boot, instead of defaulting to the S1 port.
EM320_061_031 (Mfg Only, 05/09/08)
Impact: Serviceability
Severity: HIPER
- HIPER: A problem was fixed that caused a concurrent firmware installation to hang with SRC BA00E840 being logged. This problem may also cause a partition migration to hang, under certain circumstances, with the same SRC, BA00E840, being logged. This SRC will be logged when this level of firmware is installed and will generate a call home; it should be ignored. It will not be logged during subsequent installations.
EM320_059_031 (Mfg Only, 05/06/08)
Impact: Function
Severity: Special Attention
New features and functions:
Fixes that affect all systems:
- HIPER: A problem was fixed that caused capacity-on-demand (COD) data to be retrieved in an unreadable format from the Anchor (VPD) card.
- HIPER: A problem was fixed that caused enhanced error handling (EEH) to fail on certain I/O adapters.
- HIPER: A problem was fixed that might cause the system to terminate while IPLing partitions soon after a system boot. This problem might also have been seen if the partitions were set to "autostart". This failure is typically seen on systems with a large amount of memory; SRC B181D138 is usually logged when this error occurs.
- DEFERRED: A problem was fixed that caused the system to appear to hang with C10090B8 in the control (operator) panel during a slow mode boot.
- A problem was fixed that prevented the processor clock from being deconfigured with the fabric bus after a hardware error.
- A problem was fixed that caused the L2 deconfiguration option to be displayed on the advanced system management interface (ASMI) menus on systems on which it is not supported.
- A problem was fixed that caused the GX adapter slot reservation option to be displayed on the advanced system management interface (ASMI) menus on systems on which it is not supported.
- A problem was fixed that caused the wrong slot location to be provided in the message displayed when no slot reservations were available for adding the next Feature Code 1800 or 1802 adapter.
- A problem was fixed that caused the location code reported with enhanced error handling (EEH) errors on certain embedded slots to have a -Cx suffix instead of the correct -T# suffix for the underlying adapter. This also affected the HMC's System Planning tool.
- A problem was fixed that caused the Linux boot loader to lose its command line parameters (and fail to boot a Linux partition) during a reconfiguration reboot.
- A problem was fixed that caused the "iSCSI" and "network1" aliases to be created incorrectly in the SMS menus; this might have prevented the system or partition from booting from that device.
- A problem was fixed that caused this informational message to be erroneously sent to the operating system console:
subq[5][0] destination address is 0!!!
Check whether the subq is needed. If it is, allocate MEM.
- A problem was fixed that caused the AIX command lsvpd to hang if it was executed during a partition migration.
- A problem was fixed that caused the system or partition to hang at the "Welcome to AIX" banner, following an iSCSI boot, during the installation of AIX.
- A problem was fixed that caused an iSCSI login to fail under certain circumstances. When this failure occurred, the message sent to the console looked something like this:
iscsiFailed to LOGIN to target, rc = 1
failed to login.
could not open target 0x9034751 :system04 for r/w, aborting...
tcpOPEN: iscsi open failed
!BA012010 !
- A problem was fixed that caused the location codes of devices attached to the integrated USB ports to have a duplicate port suffix. For example, when this problem occurred, the location code of the device was shown as:
/usb-scsi@1 U789D.001.DQDGARW-P1-T2-T2-L1
instead of the correct location code, which is:
/usb-scsi@1 U787D.001.DQDGARW-P1-T2-L1
- Two translation issues were fixed. The first one caused the string "No alias" to always be displayed on the iSCSI menus in SMS in English, even though it should have been translated into the other languages that the SMS menus support. The second one caused the NIC (network interface card) parameters, such as the client IP address in the SMS ping menu, to be displayed with message strings in English; these should have been translated as well.
- A problem was fixed that caused the SMS menus to drop into the open firmware prompt with the message "DEFAULT CATCH!" when the ping test failed.
- A problem was fixed that prevented the operating system from setting the boot device list in NVRAM.
- A problem was fixed that caused approximately 20-25 occurrences of informational SRC B7005300 to be logged during every IPL, which was filling up the error logs.
- A problem was fixed that prevented the "100 Mbps/full duplex" setting for the HEA 1 Gbps ports from being implemented from the HMC. When this occurred, there was no error message on the HMC, but the setting never took effect.
- A problem was fixed that caused the MAC addresses displayed on the HMC, in the HEA logical port information for the second port group, to show invalid addresses.
- A problem was fixed that caused a service processor dump to be generated with SRC B181EF88 when the advanced system management interface (ASMI) client was closed abruptly, or a network failure disconnected the client and the ASMI.
- Enhancements were made to improve the field replaceable unit (FRU) isolation for phase-locked loop (PLL) clock failures on multi-CEC drawer systems. SRCs B114F6D2, B114F6C1, B113F6C1, B157F12E, B18187EF, and B158E500 were typically seen with this type of failure.
- Enhancements were made to the error analysis firmware to provide better FRU callouts for certain types of processor fabric bus failures. SRCs B114E504, B114B2DF, and B181B10B were typically seen with this type of failure.
- Enhancements were made to the firmware to improve the reliability of memory DIMMs.
- A change was made to the firmware such that predictive SRCs B18138B0, B1813862, or B1813882 are now logged as informational.
System firmware changes that affect certain model MMA systems:
- On systems using the EnergyScale(TM) technology, enhancements were made to include status, log, and error information about the Power Save mode in the service processor error logs.
- On systems with redundant service processors enabled, a problem was fixed that caused the "restore factory configuration" function on the Advanced System Management Interface (ASMI) to fail.
- On systems with 7314-G30 drawers attached, a problem was fixed that caused the InfiniBand I/O device to drop packets, which resulted in an unrecoverable error.
- On systems with 7314-G30 drawers attached, a problem was fixed that caused the drawer to fail when performing concurrent maintenance on the associated InfiniBand loop.
- On systems with 7314-G30 drawers attached, a problem was fixed that caused the partition to become unresponsive when an InfiniBand cable in a redundantly-cabled loop was disconnected.
Note: The last two defects in this section corrected the issues detailed in the section titled "Signal Cable in an InfiniBand loop, and InfiniBand I/O drawer power on/off" in earlier levels of the firmware description file.
EM320_046_031 (06/09/08)
Impact: Serviceability
Severity: HIPER
Fixes that affect all model MMA systems:
- HIPER: A problem was fixed that caused a concurrent firmware installation to hang with SRC BA00E840 being logged. Under certain circumstances, this problem may also cause a partition migration to hang with the same SRC, BA00E840, being logged. This SRC will be logged when this level of firmware is installed and will generate a call home; it should be ignored. It will not be logged during subsequent installations.
- HIPER: On systems with redundant service processors installed and enabled, a problem was fixed that caused the system to crash if a service processor failover occurred while the VPD files were being synchronized.
- HIPER: On systems with redundant service processors installed and enabled, the firmware was enhanced so that if a failure occurs during a service processor failover, the firmware will attempt to reset/reload one of the service processors. This may allow the system to recover and stay up instead of crashing.
- HIPER: A problem was fixed that caused the firmware to erroneously log VPD errors against the processors. This prevented drawers from powering on.
EM320_040_031 | 03/03/08
Impact: Serviceability
Severity: Special Attention
Fixes that affect all model MMA systems:
- DEFERRED: A problem that caused a system crash (with SRC B131E504) was fixed by changing the initialization settings of the I/O control hardware.
- A problem was fixed that could cause the hypervisor to hang after a reset/reload of the service processor.
- A problem was fixed that, under certain circumstances, caused the InfiniBand adapter to stop responding to InfiniBand requests.
- A problem was fixed that caused SRC B1813014 to be logged after a successful system firmware installation. This SRC will be logged when this level of firmware is installed and will generate a call home; it should be ignored. It will not be logged during subsequent installations.
- The FRU list was changed so that clock card failures in a multi-drawer system will be easier to debug and require fewer parts to fix.
- A problem was fixed that caused the service processor to get stuck in a reset/reload loop, which prevented the system from booting to standby.
System firmware changes that affect certain model MMA systems:
- On systems with redundant service processors enabled, a problem was fixed that could cause a significant increase in system boot time.
- On systems with two service processors installed and with redundancy disabled, a problem was fixed that caused the secondary service processor to go into the dump state, and remain in the dump state, after a platform dump.
- On systems with redundant service processors, SRCs B1813833 and B1813834, which were being logged intermittently after a side-switch IPL, were changed to informational.
- On systems with a 1519-100 tower attached, a problem was fixed that caused the location code of a connector on the integrated virtual IOP to be displayed as Un-SE1-SE1-T1 instead of Un-SE1-T1.
- On systems with 7314-G30 I/O drawers attached in certain cabling configurations, a problem was fixed that prevented the I/O port labels from being displayed for the port location codes on the hardware topology screens.
EM320_031_031 | 12/03/07
Impact: Function
Severity: Attention
New Features and Functions:
- Support for redundant service processors with failover on model MMA systems.
- Support for the concurrent addition of a RIO/HSL adapter on model MMA systems. For more information, see Section 2.1 Cautions, paragraph Concurrent Maintenance.
- Support for the concurrent replacement of a RIO/HSL adapter on model MMA systems. For more information, see Section 2.1 Cautions, paragraph Concurrent Maintenance.
- Support for the "hyperboot" boot speed option in the power on/off menu on the Advanced System Management Interface (ASMI).
- Support for the creation of multiple virtual shared processor pools (VSPPs) within the one physical pool. (For AIX performance tools to report the correct information on systems configured with multiple shared processor pools, a minimum of AIX 5.3 TL07 or AIX 6.1 must be running.)
- Support for the capability to move a running AIX or Linux partition from one system to another compatible system with a minimum of disruption.
- Support for the collection of extended I/O device information (independent of the presence of an operating system) when a system is first connected to an HMC and is still in the manufacturing default state.
- Improved VPD collection time on model MMA systems.
- Support for the migration of DDR2 memory DIMMs during the MES upgrade from a 9117-570 server to a 9117-MMA server, when processor card F/C 5621 is ordered as part of the initial system upgrade MES order.
- Support for EnergyScale(TM) and Active Energy Manager(TM). For more information on the energy management features now available, see the EnergyScale(TM) white paper.
EM310
EM310_074_048 | 11/10/2008
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The system initialization settings were changed to reduce the likelihood of a system crash under certain circumstances.
- HIPER: A problem was fixed that caused a system to fail to reboot after a B1xxE504 SRC was logged because of a processor interconnection bus failure. The same SRC, B1xxE504, was logged when the reboot failed.
- A problem was fixed that, under certain rarely occurring circumstances, caused the system to crash if an L2 or L3 cache failure was not discovered and repaired when it initially occurred.
- The firmware was enhanced so that the contents of /tmp are included when a service processor dump is taken.
- A problem was fixed that, in certain configurations, caused the removal of a host Ethernet adapter (HEA) port using a dynamic LPAR (DLPAR) operation to fail.
- A problem was fixed that, under certain rare circumstances, caused the hypervisor to crash while booting, with SRC B6000103 being logged.
- A problem was fixed that, under certain circumstances, prevented the operating system from recovering a PCIe adapter on which a temporary enhanced error handling (EEH) error occurred.
- A problem was fixed that prevented service processor and hypervisor error log entries from being reported to the operating system after a successful partition migration. This problem only affected the partition that was migrated.
- The firmware was enhanced so that a call home will be made if the hypervisor issues a "terminate immediate" interrupt.
System firmware changes that affect certain systems:
- On systems with clustered processors, various problems were fixed in the InfiniBand interconnection networks.
- On systems with a host Ethernet adapter (HEA) or host channel adapter (HCA) assigned to a Linux partition, a problem was fixed that prevented the partition from booting if 512 GB, 1 TB, or 1.5 TB of memory was assigned to the partition. When this problem occurred, SRC B700F105 was logged.
EM310_071_048 | 07/30/2008
Impact: Serviceability
Severity: HIPER
System firmware changes that affect all systems:
- DEFERRED and HIPER: The processor initialization settings were changed to reduce the likelihood of a processor going into an error state and causing a checkstop or system crash.
- HIPER: A problem was fixed that, under certain circumstances, caused large numbers of enhanced error handling (EEH) errors to be logged against the 4-port gigabit Ethernet adapter, F/C 5740.
- DEFERRED: A problem was fixed that caused informational SRCs B181B964 and B150D134 to be logged multiple times, filling the service processor error log, during normal operation of the system.
- DEFERRED: The firmware was enhanced so that if an L3 cache controller gets deconfigured at runtime, the associated processor cores will also be deconfigured. This prevents the system from going into an error state and causing a checkstop or system crash.
- A problem was fixed that caused the /tmp directory on the service processor to fill up, which resulted in an out-of-memory condition. When this problem occurred, the service processor usually performed a reset/reload. This is one possible cause of SRC B1817201 being logged.
- Enhancements were made to improve the field replaceable unit (FRU) isolation for phase-locked loop (PLL) clock failures on multi-CEC drawer systems. SRCs B114F6D2, B114F6C1, B113F6C1, B157F12E, B18187EF, and B158E500 were typically seen with this type of failure.
- A problem was fixed that caused SRC B1813014 to be erroneously generated when a new level of system firmware was installed on the managed system.
- A problem was fixed that caused SRC B7006971 to be erroneously generated during dynamic LPAR (DLPAR) operations on memory.
- A problem was fixed that caused an "HTML viewer error", followed by the message "Cannot complete service action for reference code 'xxxxyyyy'", to occur in Service Focal Point on the HMC when trying to perform the service actions for certain SRCs.
- A problem was fixed in partition firmware that could cause a partition running AIX to crash under certain circumstances.
System firmware changes that affect certain systems:
- On a partition running Linux, a problem was fixed that might cause the hypervisor to erroneously deconfigure a processor core.
- On partitions with a large number of hard disks attached to fibre channel adapters, a problem was fixed that might cause SRC BA210003 to be erroneously generated when the partition is booting. The partition might or might not boot when this error occurs.
- On systems with 7314-G30 drawers attached, a problem was fixed that caused the port labels to be missing on the hardware topology screens with certain cable configurations.
- On systems with 7314-G30 drawers attached, a problem was fixed that caused the partition to become unresponsive when an InfiniBand cable in a redundantly-cabled loop was disconnected.
- On systems with 7314-G30 drawers attached, a problem was fixed that might have caused some I/O slots in the drawers not to be configured when the system was booted.
Note: The last two defects in this section corrected the issues described in the sections titled "Signal Cable in an InfiniBand loop" and "InfiniBand I/O drawer power on/off" in earlier levels of this firmware description file.
EM310_069_048 | 02/11/2008
Impact: Availability
Severity: HIPER
Fixes that affect all model MMA systems:
- HIPER: A problem was fixed that caused some functions that perform hardware operations during runtime to generate temporary enhanced error handling (EEH) errors.
- DEFERRED: A problem that caused a system crash (with SRC B131E504) was fixed by changing the initialization settings of the I/O control hardware. Note: This fix is not in the EM320_031_031 level listed above; it is included in the EM320_040_031 level.
- A problem was fixed that prevented a system from recovering after SRC B1xxB9xx was logged.
- A problem was fixed that caused a firmware installation to fail with SRC B1813028.
- A problem was fixed that caused SRC B1818A10 to be erroneously logged during a disruptive firmware installation.
- A problem was fixed that, under certain circumstances, caused the buttons on the control (operator) panel to be inoperative.
- A problem was fixed that prevented the system planning tool from deploying a sysplan with certain HEA MCS values.
- A problem was fixed that caused SRC B1813108 to be erroneously logged during system boot.
- A problem was fixed that, under certain circumstances, caused the InfiniBand adapter to stop responding to InfiniBand requests.
- A problem was fixed that caused the error "MSGVIOSE0300E002-0154 There is insufficient memory available for firmware" to be logged on the HMC.
System firmware changes that affect certain model MMA systems:
- On model MMA systems with multiple drawers, a problem was fixed that prevented the pin-hole reset switch on the control (operator) panel from resetting the system.
- On model MMA systems with an uninterruptible power supply (UPS) attached, a problem was fixed that prevented the UPS from notifying the operating system that a utility failure or low battery condition had occurred.
- On systems with 3 or more licensed processors and 2 or more unlicensed processors, a problem was fixed that caused the system boot to be slower than normal, or to hang with SRC C700406E.
- On model MMA systems with 7314-G30 I/O expansion drawers attached, problems were fixed that caused the wrong FRUs to be called out with SRC B70069ED, and caused the hypervisor to loop if certain invalid cabling configurations were encountered.
- On model MMA systems with a large number of I/O towers attached, a problem was fixed that caused the HMC to go to the incomplete state when an additional tower was added to a loop.
EM310_063_048 | 11/19/07
Impact: Availability
Severity: HIPER
- HIPER: A problem was fixed that caused a time-out in a hardware device driver. This time-out is identified only when both SRCs B181B920 and B181D147 are logged. Other SRCs may also be present, including, but not limited to, B1xxB9xx, B1xxE504, and B150D141. Occasionally, the system crashes. If SRCs B181B920 and B181D147 are logged, check for any resources that were deconfigured at the time of these errors and reconfigure them using the ASMI menus. No hardware should be replaced. To recover from this error condition, the service processor must be reset by removing, then reapplying, the managed system's power.
- DEFERRED: On multi-drawer model MMA systems, a problem found in testing was fixed that, when the L3 cache was disabled, could under very unusual (and rare) circumstances cause data to be overwritten in the cache and the system to crash. Although the exposure to this issue is very low, and no problems have been reported from the field, the system impact would be high if it occurred. Product Engineering recommends that you schedule time to install this deferred fix at your earliest convenience.
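The identification rule in the HIPER item above (the time-out is indicated only when both B181B920 and B181D147 are present; other SRCs may or may not accompany them) can be expressed as a small check. This is a minimal sketch in Python. The function name and the idea of passing in a list of SRC strings copied from the service processor error log are hypothetical and used only for illustration; this is not an IBM tool or interface.

```python
def is_em310_063_timeout(logged_srcs):
    """Return True only if both SRCs that identify the hardware device
    driver time-out fixed in EM310_063_048 are present.

    logged_srcs: a list of SRC strings (hypothetical input, for example
    copied by hand from the service processor error log).
    """
    required = {"B181B920", "B181D147"}
    # Normalize case and surrounding whitespace so " b181b920" still matches.
    present = {src.strip().upper() for src in logged_srcs}
    # Both SRCs are required; either one on its own does not identify this problem.
    return required <= present


print(is_em310_063_timeout(["B181B920", "B181D147", "B150D141"]))  # True
print(is_em310_063_timeout(["B181B920", "B150D141"]))              # False
```

Extra SRCs such as B150D141 do not change the result. Only the presence of both required SRCs matters, which matches the wording "Other SRCs may also be present".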
EM310_057_048 | 9/14/07
Impact: Availability
Severity: HIPER
Additional features and functions:
- Added support for 9406-MMA.
System firmware changes that affect all 9117-MMA systems:
- HIPER: A problem was fixed that caused the system to crash with SRC B170E450.
- HIPER: A problem was fixed that, in rare circumstances, could cause the system to hang due to the improper handling of certain exceptions.
- HIPER: A problem was fixed that prevented the operating system from being notified of certain EPOW (early power-off warning) conditions that could lead to the system or partition being shut down, with the possible loss of data. These EPOW conditions included the ambient temperature being too high, the loss of utility power (with or without UPS backup), and a user-initiated power off using the white power button or the HMC.
- A problem was fixed that could cause a firmware installation from the HMC to fail with SRC E302F85C on the HMC, and SRC B1813088, B1818A0F, or B1813011 logged in the service processor error log.
- A change was made so that if a failure occurs during a memory-preserving reboot, the system continues to reboot rather than remaining in the termination (powered off) state.
- A problem was fixed that caused EEH (enhanced error handling) errors to be erroneously logged against certain I/O adapters.
- A problem was fixed that prevented "linked" resources that had been guarded out from being reconfigured during the next reboot after a service action on one of the guarded parts.
- A problem was fixed that, after the backplane was replaced in a 7314-G30 I/O drawer, prevented the partition that owned the drawer from seeing those resources.
- A problem was fixed that caused the serial connection to a partition to be lost. When this occurred, SRCs B181D307, B200E0AA, and/or B200813A were generated by the service processor and the hypervisor.
- A problem was fixed in partition firmware that, in some circumstances, prevented a CD-ROM or tape device from being in the default service mode boot list, even if one was present in the system.
- A problem was fixed that caused the HMC to go to the incomplete state, and SRC B182953C to be logged in the service processor error log every five minutes or so, when the managed system was booted.
- A problem was fixed that caused the system to intermittently fail to configure devices attached to the integrated USB port when booting.
- A problem was fixed that might have caused erroneous callouts if a problem was found with certain levels of memory controller chips.
- A problem was fixed that caused the system to call home and reboot instead of allowing the failing part (a memory controller or DIMM) to be deconfigured by PRD (processor runtime diagnostics).
Additional information concerning this service pack:
In addition to the fixes described above, this service pack also contains a fix for a low-probability problem, plus content intended for newly manufactured systems or enhancements to system internal interfaces that is not required for systems already in production use. This content will not be activated on systems that install this service pack concurrently. Even though this content is not required for systems that are already installed and in use, a disruptive installation of this service pack or a re-IPL after installing it will cause this content to become active. It is not necessary to plan a window to re-IPL the system to activate this content.
EM310_048_048 | 6/22/07
Impact: New
Severity: New
4.0 How to Determine Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System Management Interface (ASMI) Welcome pane. It appears in the top right corner. Example: EM320_101.
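If you record levels from several servers, a small helper can split the level shown on the ASMI Welcome pane (for example EM320_101) into its release and service pack numbers and compare it with a required minimum, such as the EM320_059 minimum stated in section 2.1. This is a minimal sketch in Python. The function names are hypothetical. The ordering is an assumption: within a release, a higher service pack number is treated as a newer level (consistent with the dates in the fix list above), and different releases are ordered by release number first.

```python
import re

# Accepts the short form shown on the ASMI Welcome pane (e.g. "EM320_101")
# and the full package form (e.g. "EM320_046_031").
LEVEL_RE = re.compile(r"^EM(\d{3})_(\d{3})(?:_(\d{3}))?$")


def parse_level(text):
    """Return (release, service_pack) as integers, e.g. "EM320_101" -> (320, 101)."""
    match = LEVEL_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"not an EM firmware level: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`.

    Assumption: tuples compare release first, then service pack.
    """
    return parse_level(installed) >= parse_level(minimum)


print(parse_level("EM320_101"))               # (320, 101)
print(is_at_least("EM320_101", "EM320_059"))  # True
print(is_at_least("EM310_074", "EM320_059"))  # False
```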
5.0 Downloading the Firmware Package
Follow the instructions on the web page. You must read and agree to the license agreement to obtain the firmware packages.
Note: If your HMC is not Internet-connected, you will need to download the new firmware level to a CD-ROM or FTP server.
6.0 Installing the Firmware
The method used to install new firmware depends on the release level of the firmware that is currently installed on your server. The release level can be determined from the prefix of the firmware's filename.
Example: EMXXX_YYY_ZZZ
Where XXX = release level
- If the release level will stay the same (example: level EM310_075_075 is currently installed and you are attempting to install level EM310_081_075), this is considered an update.
- If the release level will change (example: level EM310_081_075 is currently installed and you are attempting to install level EM320_096_096), this is considered an upgrade.
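The update-or-upgrade rule above compares only the XXX release field of the installed level with that of the new level. The following is a minimal sketch in Python that applies the rule to the two examples given. The function names are hypothetical. The sketch deliberately reads only the XXX field, because that is the only field this section defines. It does not determine whether an installation is concurrent or disruptive; section 1.0 states that separately.

```python
import re

# EMXXX_YYY_ZZZ, with the _ZZZ part optional so that the short form shown
# on the ASMI Welcome pane (e.g. "EM320_101") is accepted too.
LEVEL_RE = re.compile(r"^EM(\d{3})_(\d{3})(?:_(\d{3}))?$")


def release_of(level):
    """Return the XXX release field of a level such as "EM310_081_075"."""
    match = LEVEL_RE.match(level.strip().upper())
    if match is None:
        raise ValueError(f"not an EM firmware level: {level!r}")
    return match.group(1)


def install_type(installed, new):
    """Classify moving from `installed` to `new` as "update" or "upgrade"."""
    # Same release level (XXX) -> update; different release level -> upgrade.
    if release_of(installed) == release_of(new):
        return "update"
    return "upgrade"


print(install_type("EM310_075_075", "EM310_081_075"))  # update
print(install_type("EM310_081_075", "EM320_096_096"))  # upgrade
```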
Instructions for installing firmware updates and upgrades can be found at http://publib.boulder.ibm.com/infocenter/systems/scope/hw/topic/ipha1/updateschapter.htm
7.0 Change History
Date | Description
Feb 17, 2010 | Updating system firmware from EM320_031 or EM320_040 is DISRUPTIVE.