AH730
For Impact, Severity and other firmware definitions, please
refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete Firmware Fix History for this Release Level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/AH-Firmware-Hist.html
AH730_114_035 / FW730.70
04/03/13
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- A problem was fixed that caused a card (and its children)
that had been removed after the system was booted to continue to be listed
in the guard menus of the Advanced System Management Interface (ASMI).
- A problem was fixed that prevented predictive guard errors
from being deleted on the secondary service processor. This
caused hardware to be erroneously guarded out if a service processor
failover occurred and the system was then rebooted.
- A problem was fixed that caused the system controller
(service processor) to reset during a CEC power-off or reboot.
This caused a terminating SRC and other serviceable events to be
logged, followed by a platform reboot.
- A problem was fixed that caused SRC B1813221, which
indicates a failure of the battery on the service processor, to be
erroneously logged after a service processor reset or power cycle.
- A problem was fixed that caused various SRCs to be
erroneously logged at boot time including B181E6C7 and B1818A14.
- A problem was fixed that caused a code update operation to
fail with a time-out error, creating a call-home with SRC B1818A0F.
This problem is more likely to occur on HMC-managed systems
experiencing a high level of management activity during a code update.
- A problem was fixed that caused the service processor (or
system controller) to crash when it boots from the new level during a
concurrent firmware installation.
- A problem was fixed that caused SRC B7006A72 to be
erroneously logged.
- The Power Hypervisor was enhanced to ensure better
synchronization of vSCSI and NPIV I/O interrupts to partitions.
- A problem was fixed that caused SRC B15A3303 ("CEC
Hardware: Time-Of-Day Hardware Predictive Error") to be erroneously
logged, and the time-of-day to be set to Jan 1, 1970.
- A problem was fixed that occurred when a virtual adapter was
modified from the management console command line with a command that
specified an Ethernet adapter, but the virtual ID specified belonged to
an adapter type other than Ethernet. When this problem occurred, the
managed system had to be rebooted to restore communications with the
management console, and SRC B7000602 was also logged.
System firmware changes that affect certain systems
- On systems with an I/O tower attached, a problem was fixed that
caused SRCs 10009135 and 10009139 to be erroneously logged.
- On systems running Selective Memory Mirroring (SMM),
a problem was fixed that caused the hypervisor to hang or crash when an
uncorrectable hardware error occurred in a memory DIMM.
- On systems with redundant service processors, a problem was
fixed that caused the sibling service processor state to show up as
"unknown" in the service processor error log if a code synchronization
problem was detected after a service processor was replaced.
- A problem was fixed that caused the HMC to display
incorrect data for a virtual Ethernet adapter's transaction statistics.
- A problem was fixed that caused a hibernation resume
operation to hang if the connection to the paging space was lost near
the end of the resume processing. This problem is more likely on a
partition that supports remote restart.
- A problem was fixed that caused the system to terminate
with a bad address checkstop during mirroring defragmentation.
- A problem was fixed that prevented the HMC command
"lshwres" from showing any I/O adapters if any adapter name contained
the ampersand character in the VPD.
- On a system running a Live Partition Mobility (LPM)
operation, a problem was fixed that caused the partition to appear
successfully on the target system but then hang with a 2005 SRC.
- On a partition with a large number of potentially bootable
devices, a problem was fixed that caused the partition to fail to boot
with a default catch, and SRC BA210000 may also be logged.
- On systems running Active Memory Sharing (AMS) partitions,
a problem was fixed that may arise due to the incorrect handling of a
return code in an error path during the logical partition migration
(LPM) of an AMS partition.
- On systems running Active Memory Sharing (AMS) partitions,
a timing problem was fixed that may occur if the system is undergoing
AMS pool size changes.
Concurrent hot add/repair maintenance firmware fixes
- A problem was fixed that caused a concurrent hot add/repair
maintenance (CHARM) operation to fail after this sequence of events
occurred:
1. A user-initiated platform system dump is requested (from the
ASMI or management console).
2. A service processor reset/reload takes place while dump
collection is in progress.
3. A concurrent hot add/repair maintenance operation is
attempted.
- A problem was fixed that caused the system to crash during
a processor book upgrade when the backup clock was in an error
state and the system switched over to the backup clock.
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to hang at the "Determining current FRU
information" panel on the management console.
- A problem was fixed that caused a service processor (system
controller) reset/reload during a concurrent hot add/repair maintenance
operation on a fully-configured system with several hundred partitions.
- The firmware was enhanced to reduce the number of
concurrent hot add/repair maintenance failures due to the operation
timing out on fully-configured systems.
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to fail if a memory channel failure on the CEC
was followed by a service processor reset/reload.
- On systems in which there are no processors in the shared
processor pool, a problem was fixed that caused the Hypervisor to
become unresponsive (the service processor starts logging time-out
errors against the Hypervisor, and the HMC can no longer talk to the
Hypervisor) during a concurrent hot add/repair maintenance operation.
- A problem was fixed that caused a hypervisor memory leak
during a concurrent hot add/repair maintenance operation.
- A problem was fixed that caused a concurrent node repair or
upgrade to fail during the system deactivation step with a hypervisor
error code of 0x300.
- A problem was fixed that caused the system to terminate
with a bad address checkstop during a concurrent hot add/repair
maintenance operation.
- A problem was fixed that caused the system to hang if
memory relocation is performed during a concurrent hot add/repair
maintenance operation.
- A problem was fixed that caused partition activations to
fail during or after a node repair operation.
- A problem was fixed that caused synchronization problems in
an application using the Barrier Synchronization Register (BSR)
facility during the memory relocation that occurs in a concurrent hot
add/repair maintenance operation.
- A problem was fixed that prevented the I/O slot information
from being presented on the management console after a concurrent node
repair.
- A problem was fixed that caused a service processor (system
controller) reset/reload, instead of a standard error being logged,
when a concurrent hot add/repair maintenance operation failed.
- A problem was fixed that caused a node repair operation to
fail when preceded by another node repair operation.
- A problem was fixed that caused a node repair or node add
to fail.
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to fail with SRC B181C350.
- On systems running multiple IBM i partitions that are
configured to communicate with each other via virtual OptiConnect,
concurrent hot add/repair maintenance operations may time out.
When this problem occurs, a platform reboot may be required to recover.
AH730_099_035
10/24/12
Impact: Availability
Severity: HIPER - High Impact/PERvasive, should be installed as soon as
possible.
System firmware changes that affect all systems
- HIPER/Non-Pervasive: DEFERRED: A problem was fixed
that caused a system crash with SRC B170E540.
- HIPER/Non-Pervasive: A related problem was also fixed that could
cause a live lock on the power bus resulting in a system crash.
- To address poor placement of partitions following a reboot
of a server with unlicensed cores, the firmware was enhanced to run the
affinity manager when the initialize configuration operation is done
from the HMC. A problem was also fixed that caused the hypervisor
to be left in an inconsistent state after a partition create operation
failed.
- A problem was fixed that caused a crash while the system
was being serviced and the TPMD (Thermal and Power Management Device)
reset unexpectedly.
AH730_095_035
08/23/12
Impact: Availability
Severity: SPE
New Features and Functions
- Support for booting the IBM i operating system from a USB
tape drive.
System firmware changes that affect all systems
- A problem was fixed that caused a partition with dedicated
processors to hang with SRC BA33xxxx when rebooted, after it was
migrated using a Live Partition Mobility (LPM) operation from a system
running Ax730 to a system running Ax740, or vice versa.
- The firmware was enhanced to call out the correct field
replaceable units (FRUs) when SRC B124E504 with description "Chnl init
TO due to SN stuck in recovery" was logged.
- A problem was fixed that caused SRC B1818A10 to be
erroneously logged after a system firmware installation.
- A problem was fixed that caused booting from a virtual
fibre channel tape device to fail with SRC B2008105.
- The firmware was enhanced to log SRCs BA180030 and BA180031
as informational instead of predictive.
- A problem was fixed that caused a "code accept" during a
concurrent firmware installation from the HMC to fail with SRC
E302F85C. This is most likely to occur on model FHB systems.
- On systems running the AIX operating system, a problem was
fixed that caused the hypervisor to crash with SRC B7000103 after an
HEA (Host Ethernet Adapter) error was logged while there was heavy
AIX activity on the HEAs.
- A problem was fixed that caused the suspension of a
partition to fail if a large amount of data had to be stored to resume
the partition.
- A problem was fixed that caused a system crash with
unrecoverable SRC B7000103 with "ErFlightRecorder" in the failing
stack.
- On systems booting from an NPIV (N-port ID virtualization)
device, a problem was fixed that caused the boot to intermittently
terminate with the message "PReP-BOOT: unable to load full PReP
image". This problem occurs more frequently on the IBM V7000
Storage System running the SAN Volume Controller (SVC), but does not
occur on every boot.
- A problem was fixed that caused SRC B181E6F1 with the
description "RMGR_PERSISTENT_EVENT_TIMEOUT" to be erroneously logged.
- A problem was fixed that caused a memory leak in the
service processor firmware.
- A problem was fixed that caused SRC B155A491 to be
erroneously logged during multiple system IPLs. This SRC may
cause the system to terminate.
- A problem was fixed that caused the lsstat command on the
HMC to display an erroneously high number of packets transmitted and
received on a VLAN interface.
- The firmware was enhanced to improve the service actions
when a processor error with the signature "MCFIR[14] - Hang timer
detector" was logged.
- A problem was fixed that caused the system to crash after a
recoverable error was logged on an I/O hub.
System firmware changes that affect certain systems
- The firmware was enhanced to fix a potential performance degradation on
systems using the stride-N stream prefetch instructions dcbt (with
TH=1011) or dcbtst (with TH=1011). Typical applications using
these instructions include High Performance Computing, data-intensive
applications exploiting streaming prefetch instructions, and
applications using the Engineering and Scientific Subroutine
Library (ESSL) 5.1.
- On systems on which Internet Explorer (IE) is used to
access the Advanced System Management Interface (ASMI) on the Hardware
Management Console (HMC), a problem was fixed that caused IE to hang
for about 10 minutes after saving changes to network parameters on the
ASMI.
- A problem was fixed that caused informational SRC A70047FF,
which may indicate that the Anchor (VPD) card should be replaced, to be
erroneously logged again after the Anchor card was replaced.
- A problem was fixed that caused a network installation of
IBM i to fail when the client was on the same subnet as the server.
- On systems with a 5796 or 5797 I/O drawer attached, a
problem was fixed that could cause a system hang.
- On 9119-FHB systems with F/C 5803 or 5873 I/O drawers
attached via InfiniBand, and that are running IBM i partitions, a
problem was fixed that prevented slots on the same PCI bus from being
assigned to other partitions. This can result in SRC B600690B
being logged.
Concurrent hot add/repair maintenance firmware fixes
- A problem was fixed that prevented the DASD roll-up fault
LED from working properly after a node add or node remove operation.
- A problem was fixed that caused a hot node repair operation
to fail with PhypRc=0x0300, indicating the deactivate system resource
operation failed.
- During a CHARM replacement of a memory card on a system
running with mirrored memory, a problem was fixed that caused the
operation to fail with "PhypRc = 0x0326".
- A problem was fixed that caused a node evacuation to fail
with "PhypRc=0x0300".
AH730_087_035
05/18/12
Impact: Availability
Severity: SPE
New Features and Functions
- Support for IBM i Live Partition Mobility (LPM)
System firmware changes that affect all systems
- A problem was fixed that prevented the user from changing
the boot mode or keylock setting after a remote restart-capable
partition was created, even after the partition's paging device was
online.
AH730_078_035
03/14/12
Impact: Availability
Severity: HIPER - High Impact/PERvasive, should be installed as soon as
possible.
System firmware changes that affect all systems
- The firmware was enhanced to properly display a memory
controller that has been guarded out manually on the "Deconfiguration
Records" menu option (under "System Service Aids") on the Advanced
System Management Interface (ASMI).
- A problem was fixed that caused multiple service processor
dumps to be unnecessarily taken during a concurrent firmware
update. SRC B181EF9A, which indicates that the dump space on the
service processor is full, was logged as a result.
- The firmware was enhanced to increase the threshold for
recoverable SRC B113E504 so that the processor core reporting the SRC
is not guarded out. This prevents unnecessary performance loss
and the unnecessary replacement of processor modules.
- A problem was fixed that caused SRC B7000602 to be
erroneously logged at power on.
- The firmware was enhanced to recognize new USB-attached
devices so that they will be listed as boot devices in the System
Management Services (SMS) menus.
- A problem was fixed that caused booting or installing a
partition or system from a USB device to fail with error code
BA210012. This usually occurs when an operating system (OS) other
than the OS that is already on the partition or system is booted or
installed.
- On the System Management Services (SMS) remote IPL (RIPL)
menus, a problem was fixed that caused the SMS menu to continue to show
that an Ethernet device is configured for iSCSI, even though the user
has changed it to BOOTP.
- The firmware was enhanced to log SRCs BA180030 and BA180031
as informational instead of predictive.
- The firmware was enhanced to increase the threshold of soft
NVRAM errors on the service processor to 32 before SRC B15xF109 is
logged. (Replacement of the service processor is recommended if
more than one B15xF109 is logged per week.)
- A problem was fixed that caused a system to crash when the
system was in low power (or safe) mode, and the system attempted to
switch over to nominal mode.
- The firmware was enhanced to correctly log an error when
the bulk power controllers' firmware levels don't match.
- A problem was fixed that caused SRC B7006990 to be
incorrectly logged, instead of SRC B7006991, when the hypervisor is
unable to communicate with the secondary system controller at system
boot.
- A problem was fixed that caused corruption of the bus
between the thermal/power management device (TPMD) and the
processors. When this problem occurred, SRCs B1812A11 ("TPMD to
processor communication failure"), B113E504 ("core thermal unit PCB
error"), B114E504 ("nest thermal unit PCB error"), and B121E504
("memory controller thermal unit PCB error") were logged, and the
processors were guarded out.
- A problem was fixed that caused a partition to hang at
progress code C200xxxx when booting.
- A problem was fixed that caused the Advanced System
Management Interface (ASMI) to continue to show that processors were
deconfigured after they had been replaced.
- Two problems were fixed with the firmware that controls the
lightstrip LED:
  - A problem that prevented the P10 and P11 LEDs from coming on when
    power was applied to the lightstrip
  - A problem that caused a node power LED (P2, P3, P4...P9) to come on
    even though a node was not installed
- The firmware was enhanced to more gracefully handle the
system shutdown that is required when a hypervisor hang condition was
encountered. SRCs B7000602, B182951C, B1813918 and A7001151 were
logged, and a service processor failover occurred, when the hypervisor
hang condition and subsequent system crash occurred.
System firmware changes that affect certain systems
- HIPER/Non-Pervasive: On systems running the Advanced Energy Manager,
a problem was fixed that caused the system to crash with SRC B114E504.
- A problem was fixed that caused the hypervisor to hang
during a concurrent operation on a F/C 5802, 5803, 5873 or 5877 I/O
drawer. Recovering from the hypervisor hang required a platform
reboot.
- A problem was fixed that degraded performance if profiling
was enabled in one or more partitions. Performance profiling is
enabled:
  - In an AIX or VIOS partition using the tprof command (-a, -b, -B,
    or -E option) or the pmctl command (-a or -E option).
  - In an IBM i partition when PEX *TRACE profile (TPROF) collections
    or PEX *PROFILE collections are active.
  - In a Linux partition using the perf command, which is available in
    RHEL6 and SLES11; profiling with oprofile does not cause the problem.
- On a system that is being upgraded from Ax720 system
firmware to Ax730 system firmware, the firmware was enhanced to log
B1818A0F as informational instead of predictive if it occurs during the
firmware upgrade.
- On systems running Active Memory Sharing (AMS), the
allocation of the memory was enhanced to improve performance.
- A problem was fixed that caused the suspension of a logical
partition running Active Memory Sharing (AMS) to fail because the disk
headers had not been erased.
- On systems with an iSCSI network, when booting a logical
partition using that iSCSI network, a problem was fixed that caused the
iSCSI gateway parameter displayed on the screen to be incorrect.
It did not impact iSCSI boot functionality.
- On systems using affinity groups, a problem was fixed that
prevented one of the partitions from being placed correctly.
Concurrent hot add/repair maintenance firmware fixes
- A problem was fixed that caused a checkstop to occur during
a node repair operation.
- A problem was fixed that caused the system to hang
during a CHARM operation.
- A problem was fixed that caused unrecoverable SRCs B1813918
and B182953C during a CHARM operation.
- A problem was fixed that caused the replacement of a
processor module during a CHARM operation to fail because the CCIN of
the replacement module was different from the CCIN of the module being
replaced, even though they were functionally equivalent.
- A problem was fixed that caused multiple types of failures
(CHARM node operations and Advanced Energy Manager (AEM) state changes,
among others), after a CHARM hot node operation on the first node was
followed by a concurrent firmware installation.
AH730_066_035
12/08/11
Impact: Availability
Severity: HIPER - High Impact/PERvasive, should be installed as soon as
possible.
System firmware changes that affect certain systems
- HIPER/Pervasive on systems
with a Virtual Input/Output (VIO) client running AIX, and with a F/C
5803 or 5873 I/O drawer attached: A problem was fixed that
caused the system to crash with SRC B700F103.
AH730_065_035
11/22/11
Impact: Availability
Severity: HIPER - High Impact/PERvasive, should be installed as soon as
possible.
System firmware changes that affect all systems
- HIPER/Pervasive: On systems running firmware level AH730_051 or
AH730_058, a problem was fixed that caused the target server to hang,
or go to the incomplete state on the management console, after a Live
Partition Mobility (LPM) operation. This problem can also occur when a
partition hibernation operation is performed.