Power7 High-End System Firmware
Applies to: 9125-F2C
This document provides information about the installation of Licensed
Machine or Licensed Internal Code, which is sometimes referred to
generically as microcode or firmware.
1.0 Systems Affected
This package provides firmware for Power 775 (9125-F2C) Servers only.
The firmware level in this package is:
1.1 Minimum HMC Code Level
This section is intended to describe the "Minimum HMC Code Level"
required by the System Firmware to complete the firmware installation
process. When installing the System Firmware, the HMC level must be
equal to or higher than the "Minimum HMC Code Level" before starting
the system firmware update. If the HMC managing the server
targeted for the System Firmware update is lower than the "Minimum HMC
Code Level" the firmware update will not proceed.
The Minimum HMC Code Level for this firmware is: HMC V7 R7.3.0 (PTF
MH01255 or MH01256).
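The level comparison described above can be sketched as a small check. This is an illustrative sketch only: the function names (parse_hmc_level, meets_minimum) are hypothetical, the level string format is assumed to look like "V7 R7.3.0", and the PTF requirement (MH01255 or MH01256) is not checked.

```python
import re


def parse_hmc_level(text):
    """Parse a level such as 'V7 R7.3.0' or 'V7R7.3.0' into a tuple of ints."""
    match = re.fullmatch(r"\s*V(\d+)\s*R(\d+(?:\.\d+)*)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized HMC level: {text!r}")
    parts = (int(match.group(1)),) + tuple(
        int(piece) for piece in match.group(2).split(".")
    )
    # Pad to four components so 'V7 R7.3' compares equal to 'V7 R7.3.0'.
    return parts + (0,) * (4 - len(parts))


# Minimum level stated in this document (PTFs are not modelled here).
MINIMUM_HMC_LEVEL = parse_hmc_level("V7 R7.3.0")


def meets_minimum(installed, minimum=MINIMUM_HMC_LEVEL):
    """Return True if the installed HMC level is equal to or higher than the minimum."""
    return parse_hmc_level(installed) >= minimum


if __name__ == "__main__":
    for level in ("V7 R7.2.0", "V7 R7.3.0", "V7R7.6.0"):
        print(level, "->", "ok" if meets_minimum(level) else "too low")
```

Comparing tuples of integers avoids the pitfalls of comparing the level strings directly.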
Although the Minimum HMC Code Level for this firmware is listed
above, HMC level V7 R7.3.0 with PTF MH01313 (Service Pack 4) and
PTF MH01320 (fix for V7R7.3.0 SP4), or higher, is suggested for this
firmware level.
For information concerning HMC releases and the latest PTFs, go to the
following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power
Systems running the AIX, IBM i and Linux operating systems, we suggest
using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home
NOTE: You must be logged in as hscroot in order for the firmware
installation to complete correctly.
2.0 Important Information
Additional Details About Installing This Service Pack
The new level of optical link firmware is installed automatically during
a node boot after this service pack is installed; the update runs before
optical init executes. It happens in parallel with the hypervisor
starting, and it prevents usage of the hub HFIs. After the update and
optical init are complete, the optical interconnects will be fully
functional. Allow an additional 1 to 1.25 hours of boot time per node
on the next reboot after installing this service pack for this
operation.
This new optical module firmware fixes several issues, among them:
- Invalid thermal alarms
- Unexpected resets because of watchdog timer time-outs
- Laser faults not re-asserting
IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management
Services (SMS) in this level of system firmware. There are several
limitations that should be considered.
When configuring a network interface card (NIC) for remote IPL, only
the most recently configured protocol (IPv4 or IPv6) is retained. For
example, if the network interface card was previously configured with
IPv4 information and is now being configured with IPv6 information, the
IPv4 configuration information is discarded.
A single network interface card may only be chosen once for the boot
device list. In other words, the interface cannot be configured for the
IPv6 protocol and for the IPv4 protocol at the same time.
Memory Considerations for Firmware Upgrades
Firmware Release Level upgrades and Service Pack updates may consume
additional system memory.
Server firmware requires memory to support the logical partitions on
the server. The amount of memory required by the server firmware varies
according to several factors.
Factors influencing server firmware memory requirements include the
following:
- Number of logical partitions
- Partition environments of the logical partitions
- Number of physical and virtual I/O devices used by the logical
partitions
- Maximum memory values given to the logical partitions
Generally, you can estimate the amount of memory required by server
firmware to be approximately 8% of the system installed memory. The
actual amount required will generally be less than 8%. However, there
are some server models that require an absolute minimum amount of
memory for server firmware, regardless of the previously mentioned
considerations.
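As a rough planning aid, the 8% rule of thumb above can be written as a one-line calculation. This is a sketch under stated assumptions: the function name is hypothetical, the result is an upper-end estimate rather than a measured value, and the model-specific absolute minimum mentioned above is not modelled.

```python
def estimate_firmware_memory_gb(installed_gb, fraction=0.08):
    """Upper-end estimate of the memory consumed by server firmware.

    fraction defaults to the approximate 8% figure given in this document;
    the actual amount required will generally be lower.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction


if __name__ == "__main__":
    installed = 512  # GB of installed system memory (example value only)
    estimate = estimate_firmware_memory_gb(installed)
    print(f"Plan for up to about {estimate:.2f} GB of firmware memory")
```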
Additional information can be found at:
http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/topic/p7hat/iphatlparmemory.htm
Downgrading firmware from any given release level to an earlier release
level is not recommended. If you feel that it is necessary to downgrade
the firmware on your system to an earlier release level, please contact
your next level of support.
3.0 Firmware Information and Description
Use the following examples as a reference to determine whether your
installation will be concurrent or disruptive.
Note: The concurrent levels of system firmware may, on occasion,
contain fixes that are known as deferred. These deferred fixes can be
installed concurrently, but will not be activated until the next IPL.
Deferred fixes, if any, will be identified in the "Firmware Update
Descriptions" table of this document. For deferred fixes within a
service pack, only the fixes in the service pack which cannot be
concurrently activated are deferred.
Note: The file names and service pack levels used in the following
examples are for clarification only, and are not necessarily levels
that have been, or will be released.
System firmware file naming convention:
01ASXXX_YYY_ZZZ
- XXX is the release level
- YYY is the service pack level
- ZZZ is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack level
(YYY and ZZZ) are only unique within a release level (XXX). For
example, 01AS330_067_045 and 01AS340_067_053 are different service
packs.
An installation is disruptive if:
- The release levels (XXX) are different.
Example: Currently installed release is AS330, new release is AS340.
- The service pack level (YYY) and the last disruptive service pack
level (ZZZ) are the same.
Example: AS330_120_120 is disruptive, no matter what level of AS330 is
currently installed on the system.
- The service pack level (YYY) currently installed on the system is
lower than the last disruptive service pack level (ZZZ) of the service
pack to be installed.
Example: Currently installed service pack is AS330_120_120 and
new service pack is AS330_152_130.
An installation is concurrent if:
- The release level (XXX) is the same, and
- The service pack level (YYY) currently installed on the system is
the same or higher than the last disruptive service pack level (ZZZ)
of the service pack to be installed.
Example: Currently installed service pack is AS330_126_120,
new service pack is AS330_143_120.
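The disruptive and concurrent rules above can be expressed as a short decision function. This is a minimal sketch, not an IBM tool: the names FirmwareLevel, parse_level and is_disruptive are hypothetical, and the parser assumes names of the form ASXXX_YYY_ZZZ with an optional "01" prefix and ".rpm" suffix.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # XXX, the release level
    service_pack: int     # YYY, the service pack level
    last_disruptive: int  # ZZZ, the last disruptive service pack level


_LEVEL_RE = re.compile(r"^(?:01)?AS(\d{3})_(\d{3})_(\d{3})(?:\.rpm)?$")


def parse_level(name):
    """Split a name such as '01AS330_067_045' into its three fields."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level: {name!r}")
    return FirmwareLevel(match.group(1), int(match.group(2)), int(match.group(3)))


def is_disruptive(installed, new):
    """Apply the three 'disruptive' rules; anything else is concurrent."""
    current, target = parse_level(installed), parse_level(new)
    if current.release != target.release:
        return True  # release levels (XXX) differ
    if target.service_pack == target.last_disruptive:
        return True  # YYY equals ZZZ in the new service pack
    # Installed YYY is lower than the new service pack's ZZZ.
    return current.service_pack < target.last_disruptive


if __name__ == "__main__":
    print(is_disruptive("AS330_120_120", "AS330_152_130"))
    print(is_disruptive("AS330_126_120", "AS330_143_120"))
```

The two calls at the end reuse the example levels from the lists above, which, as noted, are for clarification only.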
Filename | Size | Checksum |
01AS730_118_093.rpm | 37811338 | 05659 |
Note: The checksum can be found by running the AIX sum command against
the rpm file (only the first 5 digits are listed).
Example: sum 01AS730_118_093.rpm
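Where the AIX sum command is not available, the published value can be approximated with a short script. This sketch rests on an assumption that should be verified: that the default AIX sum algorithm is the byte-by-byte rotating 16-bit checksum also used by BSD sum. The function names are hypothetical, and running sum on AIX remains the authoritative check.

```python
def update_checksum(checksum, data):
    """Fold bytes into a 16-bit rotating checksum (BSD-style 'sum')."""
    for byte in data:
        # Rotate the 16-bit value right by one bit, then add the next byte.
        checksum = (checksum >> 1) + ((checksum & 1) << 15)
        checksum = (checksum + byte) & 0xFFFF
    return checksum


def file_checksum(path, chunk_size=65536):
    """Compute the rotating checksum of a file without loading it all at once."""
    checksum = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            checksum = update_checksum(checksum, chunk)
    return checksum


def format_checksum(checksum):
    """Render the checksum as the 5-digit, zero-padded form used in the table."""
    return f"{checksum:05d}"


def matches_published(path, published):
    """Compare a file's checksum with the published 5-digit value."""
    return format_checksum(file_checksum(path)) == published
```

For example, matches_published("01AS730_118_093.rpm", "05659") would compare a downloaded package against the value in the table above, assuming the algorithm assumption holds.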
AS730
For Impact, Severity and other firmware definitions, refer to the
'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete Firmware Fix History for this Release Level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/AS-Firmware-Hist.html
AS730_118_093 | 11/02/12
Impact: Function
Severity: SPE
System firmware changes that affect all systems
- DEFERRED: A problem was fixed that could cause a live lock on the
power bus resulting in a system crash.
- The firmware was enhanced to increase the performance of certain
applications by updating the routing tables.
- A problem was fixed that caused a segmentation fault in the service
processor firmware. When this occurred, a PERC error with SRC B181C350
was logged.
- On systems on which Internet Explorer (IE) is used to access the
Advanced System Management Interface (ASMI) on the Hardware Management
Console (HMC), a problem was fixed that caused IE to hang for about 10
minutes after saving changes to network parameters on the ASMI.
- A problem was fixed that caused the gateway network address to be
shown incorrectly on the System Management Services (SMS) menus when
booting a partition on an iSCSI network.
- A problem was fixed that caused a "code accept" during a concurrent
firmware installation from the HMC to fail with SRC E302F85C.
- On storage drawers in a cross-coupled topology, an attempt to place an
indirect (failover) route at an SNID location in the SRT1 route table
may result in a failover route that uses the opposite compute
sub-cluster as a bounce point. The firmware was enhanced to prevent
this, since there are no physical links between the two compute
sub-clusters in a cross-coupled topology. Having a failover route
through the opposite compute sub-cluster will lead to packet loss and
application failure.
- A problem was fixed that prevented predictive guard errors from being
deleted on the secondary service processor. This caused hardware to be
erroneously guarded out if a service processor failover occurred.
- A problem was fixed that caused the service processor to be reset
during a CEC power off or reboot. This caused the system to terminate,
followed by a platform reboot. SRC B181E6C7 is typically logged when
this problem occurs.
- A problem was fixed that caused a system crash with unrecoverable SRC
B7000103 and "ErFlightRecorder" in the failing stack.
- A problem was fixed that caused the following symptoms on user-level
jobs:
1. During job initialization when starting communication over the
cluster fabric, an error message similar to the following:
4:ERROR 629 fD4fs: Message type 21 from source 4
4:MPI-PAMI ERROR: pami_init() failed with rc(1)
4:ERROR: 0031-309 Connect failed during message passing initialization, task 4, reason:
2. The initialization may succeed, but an HFI translation failure may
occur, causing a time out on the cluster network and other side
effects.
System firmware changes that affect certain systems
- A problem was fixed that caused the dual-port Ethernet
adapter, F/C 5270 and F/C 5708, to fail to power on with SRC B7006970.
- On systems in a high-performance computing (HPC) cluster in
8D topology, a problem was fixed that caused a secondary route to be
linked to an indirect route chain. Jobs that are run in indirect
route mode may experience hangs and performance problems.
- The firmware was enhanced to improve the performance when
indirect routing is used in large cluster systems.
AS730_103_093 | 06/27/12
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- A problem was fixed that caused a
segmentation fault in the service processor firmware. When this
occurred, a perc error with SRC B181C350 was logged.
System firmware changes that affect certain systems
- On nodes with a single DCCA running AS730_093, a problem
was fixed that prevented the node from booting, with SRC 10008732
erroneously logged.
AS730_093_093 | 06/13/12
Impact: Serviceability
Severity: SPE
System firmware changes that affect all systems
- DEFERRED: The firmware was enhanced to fix a
potential performance degradation on systems utilizing the stride-N
stream prefetch instructions dcbt (with TH=1011) or dcbtst (with
TH=1011). Typical applications executing these algorithms include
High Performance Computing, data intensive applications exploiting
streaming instruction prefetches, and applications utilizing the
Engineering and Scientific Subroutine Library (ESSL) 5.1.
- The firmware was enhanced to correctly handle bus errors between the
P7 processor chip and the I/O hub chip.
- The firmware was enhanced to correctly diagnose the failing
FRU when SRC B1xxE504 with error signature "MCFIR[14] - Hang timer
detector" was logged.
- The firmware was enhanced to improve the FRU callouts when
the number of multi-bit errors on a POWER7 processor bus exceeds the
threshold. This reduces the number of FRUs replaced on a failing
system.
- A problem was fixed that caused a system to crash when the
system was in low power (or safe mode), and the system attempted to
switch over to nominal mode.
- The firmware was enhanced to reduce the impact of heavy
volume errors, which can be logged as "sender hang" errors.
- The firmware was enhanced to reduce the number of "retry
fetch CE" and "DRAM spare" error log entries that call out memory
DIMMs.
- A problem was fixed that caused the first processor module
in a node to be erroneously called out if an over-temperature condition
was detected, instead of the processor module that was reporting the
over-temperature condition.
- The firmware was enhanced to handle the I/O hub ISR
(Integrated Switch Router) link port errors as software-recoverable,
rather than as hard failures. Before this enhancement, the links
would have been guarded out even though these errors were recoverable.
- A problem was fixed that caused a service processor kernel
panic due to an out-of-memory condition, with SRC B181720D.
System firmware changes that affect certain systems
- On systems with F/C 5708 and 5270 Dual port 10GB Ethernet
adapter cards installed, a problem was fixed that caused SRC B7006970
to be erroneously logged when the card was powered on.
- In asymmetric and cross-coupled topologies, if there are no
direct dlink connections between a storage drawer and a compute
supernode (either through fail-in-place or through having a compute
drawer or drawers at standby), then the storage drawer, upon restart or
re-initialization of the lnmc daemon (lnmcd), does not provide a
failover route to the target compute supernode even though there are
suitable bounce points within the compute sub-cluster that can provide
the indirect route. The firmware was enhanced to provide this
indirect route.
AS730_084_084 | 04/12/12
Impact: Function
Severity: SPE
New Features and Functions
- Support for cross-coupled compute-to-storage topology for a
2 drawer storage sub-cluster.
- Support for cross-coupled compute-to-storage topology for a
4 drawer storage sub-cluster.
System firmware changes that affect all systems
- The firmware was enhanced to allow a node to continue to
boot when unrecoverable SRC B181B70C is logged.
- A problem was fixed that caused an extraneous error log entry calling
out DCCA-B and hub R5 when power was removed from DCCA-A, and the
service processor and TPMD in DCCA-A were primary.
- The firmware was enhanced to more gracefully handle the system
shutdown that is required when a hypervisor hang condition is
encountered. SRCs B7000602, B182951C, B1813918 and A7001151 were
logged, and a service processor failover occurred, when the hypervisor
hang condition and subsequent system crash occurred.
- The firmware was enhanced to cause the secondary service
processor to automatically pick up configuration changes stored on the
primary service processor. This prevents the new configuration
information from being lost if a service processor failover occurs
before the secondary has picked up the new configuration information;
typically this problem will only be encountered just after a system is
installed.
- The firmware was enhanced to gracefully recover, and log
the correct error logs, if the secondary DCCA loses power.
- A problem was fixed that prevented communication between the compute
and storage networks in asymmetric ISR network topologies. This
affected network topologies DD2_64_8_2A, DD2_64_8_2B, DD2_64_8_4A, and
DD2_64_8_4B.
- A problem was fixed that caused SRC B181E6F1
("RMGR_PERSISTENT_EVENT_TIMEOUT") to be erroneously logged.
- The firmware was enhanced to reduce the number of memory
DIMMs replaced due to correctable errors being logged.
- A problem was fixed that caused unrecoverable SRC B130CD03
to be erroneously logged.
- A problem was fixed that caused SRC B7000602 to be
erroneously logged at power on.
- The firmware was enhanced to prevent a potential deadlock in the
opposite-side storage drawer if all of the cross-coupled dlinks between
a compute supernode (at runtime) and a storage drawer (at runtime) are
taken down. This problem also affects indirect routing from compute to
storage over cross-coupled links.
- A problem was fixed that caused the Local Network Management
Controller (LNMC) to be set to the wrong state during a service
processor (DCCA) fail-over. If this problem occurs, the most likely
symptom will be a communication failure on the ISR network.
- A problem was fixed that caused a partition running AIX to
crash.
- A new level of optical link firmware is included in this service
pack, and the optical link firmware update function is enabled. The new
optical link device firmware will be automatically installed the next
time the node is booted after this service pack is installed. Please see
"Additional Details About Installing This Service Pack" in the
"Important Information" section.
- The firmware was enhanced to increase the threshold of soft
NVRAM errors on the service processor to 32 before SRC B15xF109 is
logged. (Replacement of the service processor is recommended if
more than one B15xF109 is logged per week.)
4.0 How to Determine Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System
Management Interface (ASMI) Welcome pane. It appears in the top right
corner.
Example: AS730_123.
5.0 Downloading the Firmware Package
Follow the instructions on the web page. You must read and agree to the
license agreement to obtain the firmware packages.
Note: If your HMC is not internet-connected you will need to download
the new firmware level to a CD-ROM or ftp server.
6.0 Installing the Firmware
The method used to install new firmware will depend on the release
level of firmware which is currently installed on your server. The
release level can be determined by the prefix of the new firmware's
filename.
Example: ASXXX_YYY_ZZZ
Where XXX = release level
- If the release level will stay the same (Example: Level AS330_075_075
is currently installed and you are attempting to install level
AS330_081_075), this is considered an update.
- If the release level will change (Example: Level AS330_081_075 is
currently installed and you are attempting to install level
AS340_096_096), this is considered an upgrade.
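The update versus upgrade distinction above depends only on the release-level prefix, which can be sketched as follows. The helper names release_level and classify_install are hypothetical, and the code assumes names of the form ASXXX_YYY_ZZZ with an optional leading "01".

```python
def release_level(level_name):
    """Return the release-level prefix, e.g. 'AS330' from 'AS330_081_075'."""
    name = level_name.strip()
    if name.startswith("01"):
        name = name[2:]  # drop the leading '01' used in package file names
    return name.split("_", 1)[0]


def classify_install(installed, new):
    """Return 'update' if the release level stays the same, else 'upgrade'."""
    if release_level(installed) == release_level(new):
        return "update"
    return "upgrade"


if __name__ == "__main__":
    print(classify_install("AS330_075_075", "AS330_081_075"))
    print(classify_install("AS330_081_075", "AS340_096_096"))
```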
Instructions for installing firmware updates and upgrades can be found
at http://publib.boulder.ibm.com/infocenter/powersys/v3r1m5/index.jsp?topic=/p7ha1/updupdates.htm
7.0 Firmware History
The complete Firmware Fix History for this Release level can be
reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/AS-Firmware-Hist.html
8.0 Change History
Date | Description |
January 31, 2013 | Added fix description for "an out-of-memory condition, with SRC B181720D" to level AS730_093. |