SV810_081_081 / FW810.10
09/08/14
Impact: Availability
Severity: SPE
New features and functions
- Extended the availability of the IBM Power System S812L
(8247-21L) that was enabled in the 810.00 release.
- Expansion of the maximum number of SAS drives on the Power System
S814 (8286-41A) from 8 (SSD, disk, or a combination thereof) to 10 drives.
- Support for SAS EXP24S expansion drawer (#5887, #EL1S)
attached using a PCIe slot.
- Support for large M64-based BARs (Base Address Registers) for systems
in the OPAL environment.
- Fan speed settings were enhanced so that, when a system has a fan
failure, the remaining fans are set to a speed based on system thermal
conditions instead of all being forced to an overdrive speed setting.
- Support for a PCIe Gen3 x16 FPGA adapter that acts as a co-processor
for the POWER8 processor chip for gzip compression and decompression.
Feature codes #EJ12 and #EJ13 are electronically identical, with the
same CCIN of 59AB. #EJ12 has a full-height tailstock and is supported
by the 8286-41A and 8286-42A. #EJ13 has a low-profile tailstock and is
supported by the 8284-22A. Supported OS levels are AIX 6.1 and AIX 7.1
or later. IBM i and Linux are not supported.
- Support for use of system and partition templates on the
management console.
- Support for Coherent Accelerator Processor Interface (CAPI)
for the PCIe Gen3 FPGA on OPAL. The supported operating system is
Linux.
- Support was added to allow concurrent initialization of the
processor cores. This expands the range of concurrent firmware
updates to accommodate core initialization changes and also allows for
dynamic repairs of processor and cache memory.
- Support was added for cache memory L2/L3 column repair to
allow concurrent repair of memory and propagation of memory errors for
better fault isolation of memory components.
- The system operator panel was enhanced so that panel function 1
shows the firmware mode of the system (PowerVM or OPAL) during the IPL.
- The service processor Processor Runtime Diagnostics (PRD)
was enhanced to collect debug data for failures in host boot
initialization for the Self-Boot Engine (SBE).
- Support was added to the Advanced System Management
Interface (ASMI) USB menu to allow a system dump to be collected to USB
while the system is powered on. This allows the dump to be
collected with the system memory state intact.
- Support for AIX NIM (Network Installation Management) or Linux
network install capability on enhanced 10 Gb Ethernet adapters that
were previously announced for Power8. The enhanced adapters are the
following:
    PCIe2 4-port (10Gb+1GbE) SR+RJ45 Adapter (#EN0S, #EN0T)
    PCIe2 4-port (10Gb+1GbE) SFP+Copper+RJ45 Adapter (#EN0U, #EN0V)
      The level of adapter microcode required is level 20100130 or later.
    PCIe2 LP 2-port 10/1GbE BaseT RJ45 Adapter (#EN0W, #EN0X, #EL3Z)
      The level of adapter microcode required is level 30080130 or later.
- Support for a new 4-port Ethernet adapter with two 10 Gb
and two 1 Gb ports (#EN0M, #EN0N with CCIN 2CC0). The adapter offers NIC
and FCoE over its 10 Gb ports and NIC over its 1 Gb ports, and is SR-IOV
capable. The 10 Gb ports are LR (long range) fiber optic,
supporting distances up to 10 km. Except for the transceivers and
cabling of the 10 Gb ports, this adapter is functionally identical to
the 4-port SR optical adapter (#EN0H, #EN0J, #EL38) and the 4-port
active copper twinax adapter (#EN0K, #EN0L, #EL3C).
- Support for a new PCIe 2-port Async adapter (#EN27, #EN28)
that serves the same function as the predecessor PCIe 2-port
Async adapter (#5289, #5290) on Power7+ and earlier
servers. This adapter provides connections for two
asynchronous EIA-232 devices. The ports are programmable to support
EIA-232 protocols at a line speed of 128 Kbps. Two RJ45 connectors are
located on the rear of the adapter. To attach to devices that use a
9-pin (DB9) connection, use an RJ45-to-DB9 converter. For convenience,
one converter is included with this feature; one converter is needed
for each port that attaches to a DB9 connection.
- Support for additional PCIe adapters that were previously
supported on Power7+ and earlier servers, to help with server
migration:
    Ethernet 10 Gb LAN: 1-port optical SR (#5769, #5275)
    Ethernet and FCoE: 4-port 10 Gb/1 Gb Copper (#EN0K, #EN0L, #EL3C)
    Ethernet RoCE: 2-port 10 Gb copper (#EC27, #EC28, #EL27)
    Fibre Channel: 2-port 4 Gb (#5774, #5276, #EL09)
    SAS: 2-port 3 Gb 380 MB cache (#5805)
- Support was added for a new Advanced System Management
Interface (ASMI) menu to allow the user to choose between an IPMI or a
serial console when in OPAL mode.
System firmware changes that affect all systems
- A problem was fixed in the service processor that
caused SRC B1504804 to be logged as many as 30 times over five
minutes for an operations panel voltage regulator error. The error
logging has been reduced to one SRC for this error.
- A problem was fixed to prevent an intermittent system hang, lasting
until IPL time-out, after a processor core checkstop. This secondary
failure after a core checkstop had a low probability of occurring.
- A problem was fixed to maintain time-of-day (TOD) clock
redundancy for the hypervisor time-keeping services in the case of a
TOD error and fail-over to the backup clock topology. The TOD fail-over
process failed to correctly assign the new backup TOD topology, causing
loss of redundancy for the next TOD error.
- A problem was fixed for the service processor reset/reload
process to eliminate an extra dump and SRC B1818601 caused by an
internal core dump during the reset/reload.
- A problem was fixed for a processor error that incorrectly called out
a memory card with SRC B124E504; the memory card FRU call out has been
eliminated. The processor error call out with SRC B170E540 was correct.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus so that restoring factory settings with "System Service
Aids/Factory Configuration/Service Processor Reset/All Reset" also
restores the Hypervisor mode (PowerVM or OPAL) to its factory default.
- A problem was fixed in how the processor clock speed was
reported to the hypervisor, which caused partitions to show a clock
speed about 200 MHz faster than the actual processor clock speed.
- A problem was fixed for DRAM repair in the case where two
DRAM modules fail in the same rank and spares are needed to repair
each DRAM error. Without the fix, the second DRAM is not repaired and
could eventually be called out and guarded with a UE SRC.
- A problem was fixed for system hardware dump collection to
collect all the hardware registers by stopping all functional clocks
before starting the collection.
- A problem was fixed for repairing spare memory DRAM so that
repair solutions for failed spares persist across IPLs of the system,
by writing the repair solutions to the Vital Product Data (VPD)
of the DRAM.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus to change the name of the "Hypervisor Configuration" menu
to "Firmware Configuration" to more accurately describe the menu
function of being able to change firmware between the PowerVM and OPAL
modes.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus to move the IPMI password reset operation from the
"Firmware Configuration" menu to the "Login Profile/Change password"
menu. This change was made to put all the password change
operations together under one menu.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menu for "Resource Dump" to give the message "This feature is
not supported for OPAL environments" when the system is in OPAL
mode. Previously, ASMI incorrectly stated that the
"Resource Dump" function was not supported on the machine type.
- A problem was fixed in the service processor to add missing
call outs for the memory buffer and memory controller FRUs when a
time-out error on the power bus is logged with PE SRC B170E540.
- A problem was fixed in memory diagnostics and fault
isolation that deconfigured more memory than necessary for memory
errors.
- A problem was fixed that caused the Utility COD display of
historical usage data to be truncated on the management console.
- A problem was fixed to eliminate service processor dumps
after AC power cycles of the CEC.
- A problem was fixed to add a missing hardware call out for
service processor FSI bus errors logged with SRC BC8A0A11. This
causes the failing hardware to be deconfigured and guarded for the next
IPL of the system.
- A problem was fixed so that if an IPL failure occurs that
causes the system to power off, error SRCs will be logged instead
of the system hanging for ten minutes and not logging any SRCs.
- A problem was fixed in system dump data collection where memory data
was missing; memory data is now collected after hardware
deconfiguration checkstop errors.
- A problem was fixed for in-band code update to prevent loss of a
processor support interface (PSI) link that is in a backup role.
- A problem was fixed in system dump collection for a system
hang after a checkstop. The system failed to go to the terminate
state and reboot.
- A problem was fixed in system dump collection to return
full dump data when a secondary error occurs during dump data
collection for the checkstop primary error.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menu "System Configuration/Hardware Deconfiguration/Memory
Deconfiguration" so that DIMMs can be manually configured and
deconfigured.
- A problem was fixed for system terminations that could
occur as a result of PCIe adapters using a Level Signaled Interrupt
(LSI) before the hypervisor interrupt handler was ready. This
could occur during PCIe adapter recovery for an error with SRCs
B7006970 and B700B971 logged. The PCIe adapters are now
held in reset until initialization sequences are completed to ensure
all interrupt handlers are ready for PCIe adapter interrupts.
- A problem was fixed for a management console firmware
update "Remove and Activate" operation that fails to activate the new
OCC (On-Chip Controller for thermal and power management) code level,
with SRCs B18B2616 and B1812601 logged. An IPL is needed to
activate the OCC code level to complete the firmware update.
- A problem was fixed for IPL failures caused by Host Boot
PNOR memory corruption. If an IPL Terminate Immediate (TI) from
Host Boot has an SRC without a specific reason code, a corruption check
on the Host Boot memory partitions is run and the Host Boot partitions
are corrected to recover them.
- A problem was fixed for the power usage regulation of
memory to keep memory power usage below its specified limits.
Insufficient memory throttling was allowing the memory to consume
power past its set limits, leaving the system exposed to power faults
or unexpected power throttling in other areas of the system.
- A problem was fixed to guard cores on hang errors. A
processor core was not being guarded on hang errors where the core
timed out waiting for an instruction to complete.
- A problem was fixed to allow memory diagnostics during a
re-IPL of the CEC, ensuring that problem memory will be guarded or
recovered and preventing possible error log flooding with memory errors.
- A problem was fixed for system dump process memory
corruption that could cause the wrong dump type to be created for a
system failure, resulting in a system dump with the wrong content.
- A problem was fixed for a service processor reset/reload
causing an FSP dump with a Firmware Database (fwdb) core dump captured
within it.
- A problem was fixed for a processor core forward progress
parity error so that the core could be guarded without causing a system
checkstop.
- A problem was fixed in the run time diagnostics of DIMMs to
read the raw card type correctly, preventing failures in the memory
repair.
- A problem was fixed to prevent an intermittent hostboot IPL
deadlock/hang in the deferred work queue with progress code CC009543
and termination with SRC B1813450.
- A problem was fixed in memory diagnostics to be able to
handle multiple DIMM failures without a time-out failure, reducing the
amount of memory that needs to be guarded for the errors.
- A problem was fixed in DIMM initialization to prevent
intermittent B181BA08 DIMM failures in host boot during IPL.
- A problem was fixed to call home guarded FRUs on each
IPL. Only the initial failure of the hardware was being reported
to the error log.
- A problem was fixed for the incorrect fan FRU call outs of
SRC 110076111 so that 4U systems (8286-41A, 8286-42A) have FRU 00FV629
for the 80 mm fan and the 2U systems (8284-22A, 8247-21L,
8247-22L) have FRU 00FV726 for the 60 mm fan.
- A problem was fixed for a memory write error becoming a
system checkstop instead of being handled by the memory error handling
and recovery processes.
- A problem was fixed in the error processing of processor core
checkstops at runtime so that the guard on the failed core is not
ignored on the next IPL of the system, preventing additional failures
during host boot on that IPL.
- A problem was fixed for error recovery for a failed
processor that has all cores guarded such that host boot is able to
re-IPL using the working processor. In certain situations,
the re-IPL on the good processor was failing with SRC B113E504 with PRD
signature PB_CENT_CRESP_ADDR_ERROR.
- A problem was fixed for run-time guarding of a processor
core that had resulted in a system checkstop when the core guard
attempt failed. The processor with the non-guarded broken core
caused the On-Chip Controller (OCC) to have a power measurement
time-out to the processor with SRC B1102A00 that resulted in the system
termination.
- A problem was fixed to prevent incorrect logging of SRC
11007221 whenever the operator panel is missing (or broken). This
SRC indicates ambient temperature of the system is too high and a
performance throttle may occur to lower the temperature, causing
performance loss. A missing operator panel should not cause lower
performance of the system.
- A problem was fixed for undefined hardware states in the
system that caused an early IPL failure with SRC B1101314 when
configuring the Self Boot Engine (SBE) for hostboot.
- A problem was fixed for the operator panel where the
Enclosure Fault LED was swapped with the Attention/Check Log LED.
- A problem was fixed for memory diagnostics to guard all
unusable memory due to a channel failure. This prevents the
hypervisor from trying to start partitions with memory associated with
the bad channel, which caused the partitions to crash.
- A problem was fixed to ensure all memory is scrubbed for
correctable errors to prevent run-time memory failures and possible
checkstops. If memory scrubbing actions found that the preceding
memory rank had persistent ECC errors, the next rank of memory was
sometimes skipped.
- A problem was fixed in the Hostboot Self Boot Engine (SBE)
to re-IPL without guarding the processor on an SBE step that has
infrequent failures that are recoverable with a retry.
System firmware changes that affect certain systems
- A problem was fixed for processor local bus errors during
an IPL to call out both the master and slave bus components with SRC
BC14090F, identifying all the possible failing components. Previously,
only the bus slave components were called out on a bus error, leaving
open the possibility that the faulty component might not be guarded or
repaired.
- On systems that have a boot disk located on a SAN, a
problem was fixed where the SAN boot disk would not be
found in the default boot list, so the boot disk had to be selected
from the SMS menus. This problem would normally be seen for new
partitions that had tape drives configured before the SAN boot disk.
- On systems in IPv6 networks, a problem was fixed for
DHCP where a duplicate address detection (DAD) message to the
DHCP client on the service processor could fail, resulting in duplicate
IP addresses being configured on the network.
- On systems that have Active Memory Sharing (AMS)
partitions, a problem was fixed for a Dynamic Logical Partitioning
(DLPAR) memory remove operation that left a logical memory block (LMB)
in an unusable state until partition reboot.
- On systems in IPv6 networks, a problem was fixed for
a network boot/install failing with SRC B2004158 because IP address
resolution using neighbor solicitation to the partition firmware
client failed.
- On systems in Dynamic Power Saver (DPS) mode, a
problem was fixed so SRC B1812A61 is not logged when power throttling
is needed for a workload over the power capacity. In DPS
mode, a system power usage adjustment is not an error condition.
- On systems in OPAL mode, a problem was fixed for OPAL
network boots to add retries to DHCP to prevent network boot time-out
errors caused by network lags and slowdowns.
- On systems in OPAL mode, a problem was fixed in the fault
isolation procedures to not call out hardware FRUs for software
failures, reducing unnecessary loss of hardware on errors.
- On systems in PowerVM mode, a problem was fixed in
Live Partition Mobility (LPM) where, for systems at or near the new 32K
maximum number of virtual devices, insufficient space existed to store
the device attributes of the migrated system, causing RMC failures
and incorrect MTMS values for the migrated partition.
- On systems in PowerVM mode, a problem was fixed for
I/O adapters so that BA400002 errors are now informational for
memory boundary adjustments made to the size of DMA map-in
requests. These size adjustments were previously marked as UE,
although the condition is normal.
- On Power8 2U systems, a problem was fixed for the C5 PCIe
slot failing. This PCIe configuration was not supported on the
8284-22A, 8247-21L, and 8247-22L systems.
- On Power8 2U systems, a problem was fixed in the fan
speed management to lower the maximum RPMs of the fans and reduce the
noise level of the system. This problem affects the 8284-22A,
8247-21L, and 8247-22L systems.
- On systems in PowerVM mode using dedicated processors, a
problem with concurrent firmware update was fixed to prevent a quiesce
of the hypervisor process that can result in a system hang.
- On systems in PowerVM mode, a problem was fixed for
unresponsive PCIe adapters after a partition power off or a partition
reboot.
- On systems with 64 GB DIMM memory (F/C #EM8D), a problem was
fixed to allow 64 GB DIMM memory error-correcting code (ECC) repairs
instead of logging a predictive error with no repair to the memory.