SV810
For Impact, Severity, and other firmware definitions, please
refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete Firmware Fix History for this Release Level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SV-Firmware-Hist.html
SV810_124_081 / FW810.30
05/29/15
Impact: Availability
Severity: SPE
New features and functions
- Support for setting Power Management Tuning Parameters from
the Management Console (Fixed Maximum Frequency (FMF), Idle Power Save,
and DPS Tunables) without needing to use the Advanced System Management
Interface (ASMI) on the service processor. This allows FMF mode
to be set by default without having to modify any tunable parameters
using ASMI.
- Support was added for a new menu for the Advanced System
Management Interface (ASMI) that is used to reset/reload the service
processor. A reset/reload or "soft reset" maintains the state of
the hypervisor and the operating systems running in the partitions
while rebooting the service processor so it can recover from service
processor errors. The menu that does this function is called
"System Service Aids/Soft Reset Service Processor."
- Support was added to the Advanced System Management
Interface (ASMI) to display Anchor card VPD failures in the
"Deconfigurations records" menu.
- Support for the Nvidia Compute Intensive Accelerator (PCIe
attached 300W GPU) with F/C #EC4B. This feature is only supported
on the IBM Power System S824L (8247-42L). It is a PCIe 3
X16/Long/Full High/Double wide adapter with the PCIe connection in the
left slot, and it overlaps another PCIe slot. This feature ships with
an auxiliary power cord used inside the system to support the 300W card.
System firmware changes that affect all systems
- A problem was fixed for systems with a corrupted date of "1900"
showing for the Update Access Key (UAK). The firmware update is
allowed to proceed on systems with a bad UAK date because the fix is
in an emergency service pack. After the fix is installed, the user
should correct the UAK date, if needed, by using the original UAK
for the system. On the Management Console, enter the original update
access key via the "Enter COD Code" panel. Alternatively, on the
Advanced System Management Interface (ASMI), enter the original update
access key via the "On Demand Utilities/COD Activation" panel.
- A problem was fixed for the iptables process consuming all
available memory, causing an out of memory dump and reset/reload of the
service processor.
- A problem was fixed for a CEC IPL hang failure with CEC
Hardware Subsystem SRC UE B150BE14 when having persistent L2/L3 cache
memory errors. The IPL was stuck in a loop with progress codes
C1C3C200 through C1C3C213 and having repeating error log informational
SRCs of HostBoot BC8A1402 and Processor Unit (CPU) BC13E504. With
the fix, the failing core chiplet is guarded out and the IPL is able to
complete.
- A problem was fixed for the NEBS DC power supply showing up
in the part inventories for the CEC as "IBM AC PS". The
description string has been changed to "IBM PS" as power supplies can
be of DC or AC type.
- A problem was fixed for missing hardware callouts in Vital
Product Data (VPD) error logs.
- A problem was fixed for the callouts for a checkstop with
SRC B111E504 with PBCENTFIR[5] of PB_CENT_CRESP_ADDR_ERROR so that
FSPSP16 is added as the high priority callout. This checkstop is
most likely caused by software error, not hardware.
- A problem was fixed for SRC B1104800 having duplicate FRU
call outs for the PNOR flash FRU.
- A problem was fixed in the hardware server to prevent a UE
B181BA07 abort when a host boot dump collection is in progress.
- A problem was fixed for the unnecessary guarding of DIMMs
for a memory bus error for SRC Memory Card/FRU B124E504. The
error recovery has been improved so that DIMMs are not guarded and the
failing memory bus lane is replaced by the spare memory bus data lane.
- A problem was fixed for a processor core unit being
deconfigured but not guarded for a SRC B113E504 processor error in host
boot with fault isolation register (FIR) code
"RC_PMPROC_CHKSLW_NOT_IN_ETR" that caused the CEC to go to
termination. By guarding the failed processor core, the fix
ensures the core is not used on the re-IPL of the CEC.
- A problem was fixed for the On Chip Controller (OCC) taking
the system into safe mode under certain work loads by increasing the
time allowed for getting an update of the Analog Power Subsystem Sweep
(APSS) data for current temperatures and power consumption. If
the OCC does not get data from the APSS within its time-out
period, the OCC will go to safe mode and run the processor at a
minimum frequency.
- A problem was fixed for intermittent firmware database
errors that logged a UE SRC of B1818611 and had a fwdbServer core dump.
- A problem was fixed for an intermittent reset/reload of the
service processor during the early part of an IPL with SRC B1814616
logged.
- A problem was fixed that prevented a second management
console from being added to the CEC. In some cases, network
outages caused defunct management console connection entries to remain
in the service processor connection table, making connection
slots unavailable for new management consoles. Without the fix, a
reset of the service processor could be used to remove the defunct entries.
- A problem was fixed for a false guarding and call out of a
PSI link with SRC B15CDA27. This failure is very infrequent but
sometimes seen after the reset/reload of the service processor during a
concurrent firmware update. Since there is no actual
hardware failure, a manual unguarding of the PSI link allows it to be
reused.
- A problem was fixed for performance dumps to speed up their
processing so they can handle partitions with a large number of
processors configured. Previously, for large systems, the
performance dump took too long to collect performance data to be
useful in debugging some performance problems.
- A problem was fixed for a CEC power off error with SRC
B1818903 logged. The error causes a dump and reset of the service
processor that allows the power off operation to complete.
- A problem was fixed for firmware update to be able to do a
code update downgrade from an SV830 release to SV810. This error
caused the service processor to go to a stopped state, with a user power
cycle needed to recover to the P-side, which will be correctly at the
SV810 level.
- A problem was fixed for missing "fastarray" data in
hardware dump type HWPROC. The "fastarray" contains debug
information for the processor cores.
- A problem was fixed for the Dynamic Power Saving (DPS) mode
where, when favoring performance, the system instead favored
lower power use. A work-around for the problem is to use
the Advanced System Management Interface (ASMI) menu of System
Configuration/Power Management/Tuning parameters to change the
parameter labelled "Utilization threshold to determine active cores
with slack" to 10.0%.
- A problem was fixed for the Automatic Power On Policy
(APOR) where the system failed to re-IPL after an AC power loss.
The APOR process needed to wait longer for the AC fault to clear before
doing the IPL retry.
- A problem was fixed for the Advanced System Management
Interface (ASMI) IPv4 Network Configuration where the IP address
was being overwritten by the value in the subnet mask field for the initial
values of the panel. If the network configuration was saved
without fixing the IP address, the wrong IP address was also saved.
- A problem was fixed for missing call outs when having
multiple "Memory Card/FRU" failures with SRC B124E504. There is a
call out for the first memory FRU of the failures but any other memory
FRUs failing at the same time are not reported.
- A problem was fixed for errors during a CEC power off with
SRCs B1812616 and B1812601. These occurred if the CEC was powered
off immediately after a power on such that the On-Chip Controllers
(OCCs) had to shut down during their initialization.
- A problem was fixed for a highly intermittent IPL failure
with SRC B18187D9 caused by a defunct attention handler process.
For this problem, the IPL will continue to fail until the service
processor is reset.
- A security vulnerability, commonly referred to as GHOST,
was fixed in the service processor glibc functions gethostbyname() and
gethostbyname2() that allowed remote users of the functions to cause a
buffer overflow and execute arbitrary code with the permissions of the
server application. There is no way to exploit this vulnerability
on the service processor but it has been fixed to remove the
vulnerability from the firmware. The Common Vulnerabilities and
Exposures issue number is CVE-2015-0235.
- A security problem in GNU Bash was fixed to prevent
arbitrary commands hidden in environment variables from being run
during the start of a Bash shell. Although GNU Bash is not
actively used on the service processor, it does exist in a library so
it has been fixed. This is IBM Product Security Incident Response
Team (PSIRT) issue #2211. The Common Vulnerabilities and
Exposures issue numbers for this problem are CVE-2014-6271,
CVE-2014-7169, CVE-2014-7186, and CVE-2014-7187.
- A security problem was fixed in OpenSSL where the service
processor would, under certain conditions, accept Diffie-Hellman client
certificates without the use of a private key, allowing a user to
falsely authenticate. The Common Vulnerabilities and Exposures
issue number is CVE-2015-0205.
- A security problem was fixed in OpenSSL to prevent a denial
of service when handling certain Datagram Transport Layer Security
(DTLS) messages. A specially crafted DTLS message could exhaust
all available memory and cause the service processor to reset.
The Common Vulnerabilities and Exposures issue number is CVE-2015-0206.
- A security problem was fixed in OpenSSL to prevent a denial
of service when handling certain Datagram Transport Layer Security
(DTLS) messages. A specially crafted DTLS message could trigger a
null pointer dereference and cause the service processor to
reset. The Common Vulnerabilities and Exposures issue number is
CVE-2014-3571.
- A security problem was fixed in OpenSSL to fix multiple
flaws in the parsing of X.509 certificates. These flaws could be
used to modify an X.509 certificate to produce a certificate with a
different fingerprint without invalidating its signature, and possibly
bypass fingerprint-based blacklisting. The Common Vulnerabilities
and Exposures issue number is CVE-2014-8275.
- A security problem was fixed in the OpenSSL (Secure Sockets
Layer) protocol that allowed a man-in-the-middle attacker, via a
specially crafted fragmented handshake packet, to force a TLS/SSL
server to use TLS 1.0, even if both the client and server supported
newer protocol versions. The Common Vulnerabilities and Exposures issue
number for this problem is CVE-2014-3511.
- A security problem was fixed in OpenSSL for formatting
fields of security certificates without null-terminating the output
strings. This could be used to disclose portions of the program
memory on the service processor. The Common Vulnerabilities and
Exposures issue number for this problem is CVE-2014-3508.
- Multiple security problems were fixed in the way that
OpenSSL handled Datagram Transport Layer Security (DTLS) packets.
A specially crafted DTLS handshake packet could cause the service
processor to reset. The Common Vulnerabilities and Exposures
issue numbers for these problems are CVE-2014-3505, CVE-2014-3506 and
CVE-2014-3507.
- A security problem was fixed in OpenSSL to prevent a denial
of service when handling certain Datagram Transport Layer Security
(DTLS) ServerHello requests. A specially crafted DTLS handshake
packet with an included Supported EC Point Format extension could cause
the service processor to reset. The Common Vulnerabilities and
Exposures issue number for this problem is CVE-2014-3509.
- A security problem was fixed in OpenSSL to prevent a denial
of service by using an exploit of a null pointer de-reference during
anonymous Diffie Hellman (DH) key exchange. A specially crafted
handshake packet could cause the service processor to reset. The
Common Vulnerabilities and Exposures issue number for this problem is
CVE-2014-3510.
- A problem was fixed for an intermittent problem in a CEC
IPL where an On-Chip Controller is stuck in a reset loop, logs
repeated B1702A17 SRCs, and eventually places the CEC in safe mode,
running at minimum processor clock frequencies.
- A problem was fixed for NVRAM initialization to support a
service processor side switch after an in-band firmware downgrade from
SV830 to SV810. A service processor failure with SRC
B1817212 occurs on a side switch after the downgrade (side switching
from SV810 to SV830). This happens because of the difference in
size of the NVRAM used between SV810 and SV830 with a need for more
NVRAM initialization on level SV830. This problem does not affect
out of band firmware downgrades to a new release using the Management
Console because in that case a code update accept automatically occurs
and the T and P sides are updated to the same SV810 release level.
- A problem was fixed for an error on a re-IPL of a
powered-on CEC that fails with a time-of-day topology error with SRC
B111BA24.
- A problem was fixed to provide a service alert for failed
VPD on the anchor card. Previously, only an informational (INF)
SRC B155A435 was generated for this failure. Now the SRC has been
made a predictive error (PE) and the failed anchor card VPD is guarded
and ready for service.
- A problem was fixed for the clearing of all guard records
associated with one error log entry. If a FRU is replaced for any
of the related guard records, all the related guard records are
cleared. Previously, only the guard record for the replaced FRU
was cleared and the association was lost.
- A problem was fixed to reduce switching noise on the memory address
bus for DIMMs.
Noise on the bus could cause a failure for a marginal DIMM, so this fix
has the effect of potentially improving the reliability of the memory.
- A fix was made to prevent processor speculative memory
loads from the service processor mailbox Direct Memory Access (DMA)
area in the CEC memory. The speculative loads caused memory cache
faults and system checkstops with SRC B181E540.
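The guard record item above (clearing every guard record tied to one
error log entry once any related FRU is replaced) can be pictured with a
small sketch. This is only an illustration under stated assumptions: the
class, the log identifiers, and the FRU names are hypothetical and do not
reflect the service processor's actual data structures.

```python
from collections import defaultdict


class GuardStore:
    """Toy model: guard records grouped by the error log entry that created them."""

    def __init__(self):
        # Hypothetical layout: error log id -> set of guarded FRU names.
        self._by_log = defaultdict(set)

    def guard(self, log_id, fru):
        """Record that `fru` was guarded because of error log `log_id`."""
        self._by_log[log_id].add(fru)

    def guarded(self):
        """Return the set of all FRUs that are currently guarded."""
        return {fru for frus in self._by_log.values() for fru in frus}

    def replace_fru(self, fru):
        """Clear every guard record that shares an error log entry with `fru`."""
        related_logs = [log for log, frus in self._by_log.items() if fru in frus]
        for log in related_logs:
            del self._by_log[log]


store = GuardStore()
store.guard("log-1", "DIMM-A")
store.guard("log-1", "DIMM-B")
store.guard("log-2", "CORE-3")

# Replacing DIMM-A also clears DIMM-B, which came from the same error log.
store.replace_fru("DIMM-A")
remaining = sorted(store.guarded())
print(remaining)
```

Keying the records by error log entry, rather than by FRU, is what keeps
the association intact when one of the related FRUs is serviced.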
System firmware changes that affect certain systems
- For a system with a degraded power supply, a problem was fixed so
that inaccurate output voltage levels would be handled by the Voltage
Regulator Modules (VRMs) and not cause a system failure.
- For a system with a missing or broken operations panel, a
problem was fixed for excessive logging of SRC B181A734 for the error
condition.
- On systems using Virtual IO Server (VIOS) with the
partitions, a problem was fixed for a mainstore dump (MSD)
failure with SRC B2005123 when it attempted to write to a loadsource
DASD connected via VIOS. VIOS was unable to handle the I/O write
request exceeding 256K.
- On systems using OPAL, the time-outs for errors on the PCIe
Host Bridge (PHB) were increased to allow time for PCIe link error
recoveries to complete where possible to reduce partition and system
errors caused by link errors.
- A problem was fixed for a PowerVM hypervisor hang after a
processor core and system checkstop. The failed processor core
was not put into a guarded state and the hypervisor hung when it tried
to use the failed core.
- On systems using Field Core Override (FCO) feature code
#2319 to reduce the number of available cores, a problem was fixed
where failed cores were not being replaced by unconfigured cores,
causing the system to fail to IPL with a no cores available
condition. The fix now allows unconfigured cores to be
substituted for licensed cores that have failed.
- On systems using OPAL, a problem was fixed for an
unnecessary guarding of a processor core on a L2 or L3 cache
error. This error was caused by an errant attempt to repair the
cache using an operation that is not supported on OPAL. Guarding
of the processor core on OPAL now only occurs after a daily threshold
of cache errors is exceeded instead of guarding on the second cache
error for the core.
- On systems using PowerVM, a problem was fixed for the
handling of the error of multiple cache hits in the instruction
effective-to-real address translation cache (IERAT). A multi-hit
IERAT error was causing system termination with SRC B700F105. The
multi-hit IERAT is now recognized by the hypervisor and reported to the
OS where it is handled.
- On systems using PowerVM, a problem was fixed to prevent a
hypervisor task failure with a B7000602 SRC logged, if multiple
resource dumps running concurrently run out of dump buffer space. The
failed hypervisor task could prevent basic logical partition operations
from working, potentially leading to an Incomplete state on the
Management Console.
- On systems using PowerVM, a problem was fixed for
partitions going back to Epoch Time (1970) after a real-time clock
(RTC) battery replacement. If the RTC battery is replaced and the
correct time is set using the Advanced System Management Interface, the
partitions end up with the wrong time based in 1970.
- On systems using PowerVM, a problem was fixed to allow
partitions to recover PCIe links from multiple link errors occurring at
the same time. The only recovery without the fix would be to
re-IPL the CEC.
- On systems using PowerVM, a problem was fixed for
partitions with Virtual Trusted Platform Module (VTPM)
resources so they could restart partitions after a CEC power off and
power on sequence without hanging at progress code C2006009.
- On systems using PowerVM, a problem was fixed to fully
deconfigure cores that have cache repair failures so they cannot be
referenced by an On-Chip Controller (OCC) reset. This will
prevent an OCC reset failure because of the failed cores, logged with
SRC B1112AB4 and BC82203B, that forces the OCC into safe mode (minimum
processor clock frequency) for all of its remaining cores. A CEC
re-IPL is needed to get an OCC out of safe mode.
- On systems using Virtual IO Server (VIOS) to share physical
I/O resources among client logical partitions using virtual Small
Computer System Interface (vSCSI) adapters, a problem was fixed that
prevented the VIOS from accessing storage hosted by a physical adapter
that had storage mapped to a vSCSI adapter. The VIOS showed
errors on disks under that physical adapter and was unresponsive.
To recover from this problem, the VIOS must be rebooted.
- On systems using the Virtual I/O Server (VIOS) to share
physical I/O resources among client logical partitions, a problem was
fixed for memory relocation errors during page migrations for the
virtual control blocks. These errors caused a CEC termination
with SRC B700F103. The memory relocation could be part of the
processing for the Dynamic Platform Optimizer (DPO), Active Memory
Sharing (AMS) between partitions, mirrored memory defragmentation, or a
concurrent FRU repair.
- On systems using PowerVM, a problem was fixed for the PCIe
Host Bridge (PHB) error recovery process which failed, causing the PCIe
slots to fail. The recovery process has been enhanced to allow
for delays caused by active power bus operations during the recovery
and to handle recovery from simultaneous PCIe switch and PHB errors.
Without the fix, a CEC re-IPL is needed to get the failed PCIe slots
working again.
- On systems using PowerVM, a problem was fixed that could
result in unpredictable behavior if a memory UE is encountered while
relocating the contents of a logical memory block during one of these
operations:
- Reducing the size of an Active Memory Sharing (AMS) pool.
- On systems using mirrored memory, using the memory mirroring
optimization tool.
- Performing a Dynamic Platform Optimizer (DPO) operation.
- On systems using Virtual Shared Processor Pools (VSPP), a
problem was fixed for an inaccurate pool idle count over a small
sampling period.
- On systems using PowerVM and Virtual Trusted Platform
Module (VTPM) partitions, a problem was fixed for a Management
Console error that occurred while restoring a backup profile that
caused the system to go to the Management Console "Incomplete
state". The failed system had a suspended VTPM partition and a
B7000602 SRC logged.
- On systems using PowerVM, a problem was fixed for a
partition deletion error on the Management Console with error code
0x4000E002 and message "...insufficient memory for PHYP". The
partition delete operation has been adjusted to accommodate the
temporary increase in memory usage caused by memory fragmentation,
allowing the delete operation to be successful.
- On systems using PowerVM, a problem was fixed for Live
Partition Mobility (LPM) migrations of Linux partitions running in P8
compatibility mode. After an active migration, the resumed
partition may experience performance degradation.
- On systems using PowerVM, a problem was fixed for a false
error message with error code 0x8006 when creating a virtual ethernet
adapter with the Integrated Virtualization Manager (IVM). The
error message can be ignored as the virtual ethernet slot is fully
functional.
- On systems using PowerVM with a PCIe 3D graphics adapter
(F/C #EC41 or #EC42) in a partition, a problem was fixed for a
partition hang or BA21xxxx error conditions during partition
initialization.
- On systems using PowerVM, a problem was fixed for the Live
Partition Mobility (LPM) migration of virtual devices to a Power8
system to update each virtual device location code correctly to
reflect the location code in the target system instead of the location
code in the source system. This problem prevented the management
console from being able to look up AIX Object Data Manager (ODM) names
for the virtual devices so that operations such as remove on the device
could not be performed.
- On systems using PowerVM with a Linux partition, a problem
was fixed for the Linux "lsslot" command so that it is able to find the
F/C EC41 and EC42 PCIe 3D graphics adapter installed in the CEC,
instead of showing the slot as "empty". The Linux graphics
adapter worked correctly even though it showed as "empty".
- On systems using PowerVM, support was added for USB 2.0
HUBs so that a keyboard plugged into the USB 2.0 HUB will work
correctly at the SMS menus. Previously, a keyboard plugged into a
USB 2.0 HUB was not a recognized device.
- On systems using PowerVM, a problem was fixed for a
hypervisor deadlock that results in the system being in an "Incomplete
state" as seen on the management console. This deadlock is the
result of two hypervisor tasks using the same locking mechanism for
handling requests between the partitions and the management
console. Except for the loss of the management console control of
the system, the system is operating normally when the "Incomplete
state" occurs.
- On systems using OPAL firmware, a problem was fixed
for Coherent Accelerator Processor Interface (CAPI) devices not
being available to the partitions after a re-IPL of a CEC with power on.
- On systems using OPAL firmware, a problem was fixed to
support a kdump of a baremetal Little Endian (LE) kernel using XFS
mounts to prevent a hang in Big Endian (BE) Petitboot. For this
problem, there was an endian switch on the re-mount of the XFS file
system and Petitboot was unable to read the XFS logs to do recovery.
Petitboot now mounts the XFS file system read-only with no recovery:
"-o ro,norecovery" to prevent the problem.
- On systems using OPAL firmware, a problem was fixed in
Petitboot for the default selection of the OS to use the first grub
entry if no matching OS labels are found in the grub configuration
file. Previously, if a grub label did not match, the user had to
manually select the OS and boot it.
- On systems using OPAL firmware, a security problem
was fixed to prevent an out-of-bounds read in the glibc's iconv()
function when converting certain encoded data to UTF-8. This
could cause a crash of OPAL. The Common Vulnerabilities and
Exposures issue number is CVE-2014-6040.
- On systems using OPAL firmware, a security problem
was fixed for Name Service Switch (NSS) to prevent a denial of service
attack from an application performing key based look-ups on a database
in an infinite loop. The Common Vulnerabilities and
Exposures issue number is CVE-2014-8121.
- On systems using OPAL firmware, a security problem
was fixed for the snap utility of powerpc-utils to prevent plain text
passwords from being extracted from archives containing configuration
snapshots of services. The Common Vulnerabilities and Exposures
issue number is CVE-2014-4040.
- On systems using OPAL firmware, a problem was fixed
for the OPAL lsdevinfo command as it did not correctly process the path
to the device, which made the path unreadable in the output. With
the fix, the path is displayed correctly.
- On systems using OPAL firmware, a problem was fixed for
Resource Monitoring and Control (RMC) failing and going inactive after
several OPAL Linux partition migrations. The validation
operations failed when the Machine, Type, Model, and Serial number
(MTMS) were set incorrectly.
- On systems using OPAL firmware, a problem was fixed for the
OPAL drmgr utility so it correctly gathers Logical Memory Block (LMB)
information while performing Memory Dynamic Logical Partitioning
(DLPAR) on the little-endian variation of the Power processor.
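The Petitboot default-selection item above (fall back to the first grub
entry when no OS label matches) amounts to a simple rule. The following
is a minimal sketch of that rule only, assuming menu entries are
available as a list of labels in file order; the function name and the
sample labels are hypothetical and are not taken from Petitboot itself.

```python
def pick_default_entry(entries, wanted_labels):
    """Choose the boot entry to use by default.

    entries       -- grub menu labels in the order they appear in the file
    wanted_labels -- labels the loader would prefer to boot (assumed input)

    Returns the first entry whose label is wanted; if none match, falls
    back to the first entry. Returns None for an empty menu.
    """
    for label in entries:
        if label in wanted_labels:
            return label
    return entries[0] if entries else None


menu = ["Linux A", "Linux B (rescue)"]
matched = pick_default_entry(menu, {"Linux B (rescue)"})
fallback = pick_default_entry(menu, {"Some Other OS"})
print(matched)
print(fallback)
```

With the fallback in place, a label mismatch no longer leaves the user to
select and boot the OS manually.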
SV810_108_081 / FW810.21
01/09/15
Impact: Security
Severity: SPE
System firmware changes that affect all systems
- A problem was fixed to prevent the Advanced System
Management Interface (ASMI) "System Service Aids/Factory Configuration"
panel option from restoring to factory configuration for FSP or ALL if
one boot side of the service processor is marked invalid. The
following informational message is issued: "The request cannot be
performed because a firmware boot side is marked invalid. This
state may have been caused by a previous firmware update failure."
- A problem was fixed for firmware updates from USB to allow
the code update progress to be seen with the addition of progress code
C100B100. This progress code means that the firmware update is
busy unpacking the firmware image file and that the USB key should not
be removed until the operation is completed.
- A security problem was fixed in OpenSSL for padding-oracle
attacks known as Padding Oracle On Downgraded Legacy Encryption
(POODLE). This attack allows a man-in-the-middle attacker to
obtain a plain text version of the encrypted session data. The Common
Vulnerabilities and Exposures issue number is CVE-2014-3566. The
service processor POODLE fix is implemented by disabling SSL protocol
SSLv3 and requiring TLSv1.2 protocol on all secured connections.
The Hardware Management Console (HMC) also requires a POODLE fix for
APAR MB03867 (FIX FOR CVE-2014-3566 FOR HMC V8 R8.1.0 SP1 with PTF
MH01481). This HMC minimum requirement is enforced by the
firmware update process for this defect.
- A security problem was fixed in OpenSSL for memory leaks
that allowed remote attackers to cause a denial of service (out of
memory on the service processor). The Common Vulnerabilities and
Exposures issue numbers are CVE-2014-3513 and CVE-2014-3567.
- A problem was fixed for two light-emitting diodes (LEDs)
turning on incorrectly on the operator panel after a system power
off. These LEDs are the blue LED (Identify) and the amber LED
(enclosure fault indicator LED with the exclamation point symbol "!").
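The POODLE item above says the service processor now disables SSLv3 and
requires TLSv1.2 on secured connections. How the firmware does this is
not shown in these notes. As a hedged illustration of the same policy
from the client side, the sketch below uses Python's standard ssl module
to refuse anything older than TLS 1.2; the helper name is hypothetical,
and a floor of TLS 1.2 still permits newer protocol versions.

```python
import ssl


def make_tls12_context():
    """Build a client context that will not negotiate below TLS 1.2."""
    ctx = ssl.create_default_context()
    # SSLv3, TLS 1.0, and TLS 1.1 all fall below this floor and are refused.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_tls12_context()
print(ctx.minimum_version.name)
```

A tool that connects to a secured endpoint with such a context fails the
handshake instead of silently downgrading to a weaker protocol.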
System firmware changes that affect certain systems
- On systems with partitions using shared processors, a problem was
fixed that could result in latency or timeout issues with IO devices.
SV810_101_081 / FW810.20
10/24/14
Impact: Availability
Severity: HIPER
New features and functions
- Support for the IBM Power System S824L (8247-42L).
- Support for NEBS-3 48VDC 750 W power supply with CCIN 51D8
and F/C #EB3H on the S822 (8284-22A) and the S822L (8247-22L).
- Support for 128 GB CDIMM DDR3 DRAM with F/C #EM8E on the IBM
Power System S824 (8286-42A). These need to be ordered in pairs
and each DIMM within a DIMM pair must be of the same capacity.
- Support for the Nvidia Compute Intensive Accelerator (PCIe
attached GPU) with F/C #EC47. This feature is only supported on the
IBM Power System S824L (8247-42L). It is a PCIe 3 X16/Long/Full
High/Double wide adapter with the PCIe connection in the left slot.
- Support was added to enable fast sleep on OPAL systems, allowing
for significant power savings.
- Support for an Intelligent Platform Management Interface (IPMI)
enhancement to provide a host Linux boot device path on OPAL systems.
- Enhancement to the service processor dump for easier problem
debugging by collecting full kcore dumps as a gzipped file instead of
truncating the large kcore files.
- Enhancement made to the Advanced System Management Interface (ASMI)
"System Service Aids/Factory Configuration" menu to clear all firmware
NVRAM for PowerVM and OPAL, regardless of the current firmware
selection. Previously, only the NVRAM for the current firmware type
was cleared.
- Support for additional PCIe adapters, which had previously been
supported on Power7+ and earlier servers, to help with server migration:
- Ethernet 1 Gb LAN: 2-port UTP/TX (#5767, #5281), 2-port SX (#5768,
#5274), and 4-port UTP/TX (#5717, #5271)
- Ethernet and FCoE: 2-port 10 Gb (#5708, #5270)
- SAS: 3-port 6 Gb/1.8 GB cache (#5913, #ESA3)
System firmware changes that affect all systems
- A problem was fixed in the error handling of memory channel
failures with SRC B181E540 to prevent false processor errors with SRC
B113E504 during the next IPL after the memory fault.
- A problem was fixed for L4 cache errors being assigned an
incorrect subsystem of "Memory Controller" in the SRC B121E504 error
log instead of "Memory Fru". L4 cache resides on the
DIMM and is not a memory controller.
- A problem was fixed in the Advanced System Management
Interface (ASMI) "Performance Setup/Logical Memory Block Size"
menu that prevented the user from selecting valid Logical Memory Block
(LMB) sizes because they were greyed out.
- A problem was fixed to capture missing trace data for the
hardware compression accelerator (NX) checkstop failures to allow for
easier debug of the failures.
- A problem was fixed to add call outs for the operations
panel FRU for SRCs B1504804 and B1504805 for operation panel
failures. The FRU call out had been missing in the error log.
- A problem was fixed that caused the system to hang in the
IPL state during a system dump with SRC B182901E shown in the error
log. The hang occurred when system dump detected a prior system
dump already in place. The second system dump would normally be
bypassed to allow the IPL to complete.
- A problem was fixed for the service processor error log
handling that caused SRC B150BAC5 errors when converting an error log
entry from an object into a flattened array of bytes.
- A problem was fixed for truncated fan part numbers in the
FRU call outs of SRC 110076111 so that 4U systems (8286-41A, 8286-42A,
8247-42L) have FRU 00FV629 for the 80 mm fan and the 2U systems
(8284-22A, 8247-21L, 8247-22L) have FRU 00FV726 for the 60 mm
fan. FRU 00FV62 and FRU 00FV72 were being incorrectly reported,
showing the right-most character of the part number truncated.
- A problem was fixed in the fault isolation of FRUs for
errors in the Time Of Day (TOD) oscillator topologies and the
processors to reduce the number of incorrect call outs. When a
problem is detected in a connection between the processor and TOD
oscillator, the oscillator is now called out with high priority
and the processor with low priority, but neither is guarded, to prevent
unnecessary loss of system resources.
- A problem was fixed with the DIMM pairing rules to ensure
that only the one DIMM that is the paired mate of a failing or missing
DIMM is guarded. An error in the pairing rules was causing
additional DIMMs to be called out and guarded in the case of a single
DIMM failure.
- A problem was fixed so that when a L2/L3 cache repair
cannot be performed because there is no repair available, the error log
written is a Predictive Error instead of a hidden Recoverable
Error. This improves the customer awareness that the processor
cache is becoming degraded.
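The fan FRU fix above amounts to a lookup from machine type-model to the full seven-character part number. The following is a minimal sketch of such a check, using only the models and part numbers listed in that entry; the function name `check_fan_fru` and the result strings are hypothetical and are not part of the firmware.

```python
# Fan FRU part numbers by machine type-model, as listed in the fix description.
FAN_FRU_BY_MODEL = {
    "8286-41A": "00FV629",  # 4U systems, 80 mm fan
    "8286-42A": "00FV629",
    "8247-42L": "00FV629",
    "8284-22A": "00FV726",  # 2U systems, 60 mm fan
    "8247-21L": "00FV726",
    "8247-22L": "00FV726",
}


def check_fan_fru(model: str, reported: str) -> str:
    """Classify a reported fan FRU as 'ok', 'truncated', or 'mismatch'.

    'truncated' means the reported value is the expected part number
    with its right-most character missing (the symptom described above).
    """
    expected = FAN_FRU_BY_MODEL[model]
    if reported == expected:
        return "ok"
    if len(reported) == len(expected) - 1 and expected.startswith(reported):
        return "truncated"
    return "mismatch"
```

For example, `check_fan_fru("8286-41A", "00FV62")` classifies the value seen before the fix as truncated.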
System firmware changes that affect certain systems
- HIPER/Pervasive:
On systems using PowerVM firmware, a performance problem was fixed that
may affect shared processor partitions where there is a mixture of
dedicated and shared processor partitions with virtual IO connections,
such as virtual ethernet or Virtual IO Server (VIOS) hosting, between
them. In high availability cluster environments this problem may
result in a split brain scenario.
- On systems using OPAL firmware, a performance problem was
fixed where the On-Chip Controller (OCC) failed to establish a session to OPAL,
resulting in all the system processors being set to minimum (safe mode)
frequencies.
- On systems using PowerVM firmware, a problem was fixed for
systems in networks using the Juniper 1GbE and 10GbE switches (F/Cs
#1108, #1145, and #1151) to prevent network ping errors and boot from
network (bootp) failures. The Address Resolution Protocol (ARP)
table information on the Juniper aggregated switches is not being
shared between the switches and that causes problems for address
resolution in certain network configurations. Therefore, the CEC
network stack code has been enhanced to add three gratuitous ARPs (ARP
replies sent without a request received) before each ping and bootp
request to ensure that all the network switches have the latest network
information for the system.
- On systems using OPAL firmware, a problem was fixed
for the 10/1Gb Ethernet adapter (F/C #EL3Z) where it failed by
rebooting into the wrong endian mode.
- On systems using PowerVM firmware, a problem was fixed for
a false error message displayed on the management console during
firmware code updates that include Concurrent Core Initialization (CCI)
for the processors.  All processor cores are correctly initialized
but the management console displays this message: "An open
serviceable event related to system firmware was found. The
firmware update process will not be interrupted. Please address
any open serviceable events on the system(s) ... HSCF0223".
- On systems using PowerVM firmware, a problem was
fixed so that a system dump with Advanced System Management Interface
(ASMI) server firmware content of "maximum" or "HCA IO" will not
cause the system to fail with SRC B700F103.  There is no
InfiniBand (IB) Host Channel Adapter (HCA) on an IBM Power8 system, so
this caused an unexpected problem in the hypervisor dump data
collection for IB adapters.
- On systems using PowerVM firmware, a problem was
fixed for network boot/install where a null pointer was used when the
network adapter buffers were depleted, failing the boot with SRC BA210003 -
"Partition firmware detected a data storage error".
- On the IBM Power System S824 (8286-42A) with IBM i
partitions, a problem was fixed to block a non-applicable IBM i
console warning message "CPF9E17 - Usage limit exceeded - operator
action required". IBM i software license key 5722-SS1 feature
5052, the user entitlement key for the number of users who are
authorized to use the operating system, is not required for the
8286-42A system. This system has the Software Tier P20 licensing,
which does not have user based licensing and includes the 5250 features.
- On systems using OPAL firmware, a problem was fixed
when switching into the PowerVM mode to prevent the management console
from going into recovery mode.
- On systems using PowerVM firmware, a problem was fixed for
a hypervisor time-keeping services topology failover that caused errors
to be wrongly attributed to the new time-of-day topology, resulting in
processor FRUs being guarded falsely.
- On systems with a PCIe dual-x4 SAS adapter (F/C #5901,
#5278, or #EL10), a problem was fixed for the system fans running too
fast and loud. This PCIe adapter was incorrectly assigned a hot
PCIe rating and this caused the system fans to go to high speed for the
required extra cooling.
This fix is not applicable to the IBM Power System S824L (8247-42L).
- On systems using OPAL firmware, a problem was fixed
for CAPP (Coherent Attached Processor Proxy) system checkstops that
should have been recoverable errors.
- On systems using OPAL firmware, a problem was fixed
for the CEC memory controllers to increase the operation time-out value
to be able to handle long-running Coherent Accelerator Processor
Interface (CAPI) and Peripheral Component Interconnect Express (PCIe)
operations.
- On systems using OPAL firmware, a problem was fixed in the
Advanced System Management Interface (ASMI) "Real Time progress
indicator" to not delete the first character of the second line of the
display.
- On systems using PowerVM firmware, a problem was fixed to
allow booting off an iSCSI device. For the failure, the partition
firmware error logs had SRC BA012010 "Opening the TCP node failed." and
SRC BA010013 "The information in the error log entry for this SRC
provides network trace data." The open firmware standard output
trace showed SRC BA012014 "The TCP re-transmission count of 8 was
exceeded. This indicates a large number of lost packets between this
client and the boot or installation server" followed by SRC BA012010.
- On systems using PowerVM firmware, a problem was fixed for
partition firmware stack corruption that would cause spurious output to
the console for failed ping or network boot operations. When a
stack imbalance is encountered, text is displayed on the console
indicating a stack depth error along with a number of values and the
text string "CUTILS" similar, in format, to the following:
6 1
2 2 0 da15b007 22901dc
CUTILS: bad exit depth? SCHEDULER call-c-wrapper exit: depth=7 ,
_indepth=4 , _#inparms=0
- On systems using PowerVM firmware, a problem was fixed so
that the thermal and power management tunable parameters for the
On-Chip Controller (OCC) in the Advanced System Management Interface
(ASMI) "System Configuration/Power Management/Tuning Parameters" are
not set back to the defaults when the CEC is powered off.
- On systems using PowerVM firmware, a problem was fixed in
checkstop error recovery to force a re-IPL instead of a system
termination for checkstops that occur during memory-preserving
IPLs. This allows the system to recover from the IPL error
without any operator intervention needed.
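The Juniper switch fix above relies on gratuitous ARPs, that is, ARP replies sent without a request. As an illustration only, the sketch below builds such a frame with the Python standard library. It is not the CEC network stack code; the function names are hypothetical, and using the broadcast address as the target hardware address is an assumption (implementations differ on this field).

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_OP_REPLY = 2


def build_gratuitous_arp(sender_mac: bytes, sender_ip: str) -> bytes:
    """Return an Ethernet frame carrying an unsolicited ARP reply."""
    if len(sender_mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip = socket.inet_aton(sender_ip)
    # Ethernet header: destination, source, EtherType.
    eth_header = BROADCAST_MAC + sender_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                  # hardware type: Ethernet
        0x0800,             # protocol type: IPv4
        6,                  # hardware address length
        4,                  # protocol address length
        ARP_OP_REPLY,       # a reply, although no request was received
        sender_mac, ip,     # sender hardware and protocol address
        BROADCAST_MAC, ip,  # target IP equals sender IP (gratuitous)
    )
    return eth_header + arp_body


def frames_before_request(sender_mac: bytes, sender_ip: str, count: int = 3):
    """Frames to send ahead of each ping or bootp request (three per the fix)."""
    return [build_gratuitous_arp(sender_mac, sender_ip) for _ in range(count)]
```

Because the sender and target protocol addresses are identical, every switch that sees the frame can refresh its own ARP table entry for the system, even when the switches do not share that table with each other.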
|
SV810_087_081 / FW810.11
09/26/14 |
Impact: Data
Severity: HIPER
System firmware changes that affect certain systems
- HIPER/Pervasive:
A problem was fixed in PowerVM where the effect of the problem is
non-deterministic but may include undetected corruption of data.
This problem can occur if VIOS (Virtual I/O Server) version 2.2.3.x or
later is installed and either one of the following statements is true:
(A) A storage adapter (including Fibre Channel) is assigned to a VIOS
and shared between multiple partitions (one of which must be an IBM i
partition, others can be AIX, Linux or IBM i partitions), and at least
one of the other partitions is performing LPM (Live Partition Mobility)
or an immediate or abnormal shutdown operation.
-or-
(B) A Shared Ethernet Adapter (SEA) with fail over enabled is
configured on the VIOS.
|
SV810_081_081 / FW810.10
09/08/14 |
Impact: Availability
Severity: SPE
New features and functions
- Extended the availability of the IBM Power System S812L
(8247-21L) that was enabled in the 810.00 release.
- Expansion of maximum number of SAS drives on Power System
S814 (8286-41A) from 8 (SSD, disk, or combination thereof) to 10 drives.
- Support for SAS EXP24S expansion drawer (#5887, #EL1S)
attached using a PCIe slot.
- Support for large M64 based BARs for systems in the OPAL
environment.
- Fan speed settings were enhanced for the case of systems
with fan failure to set the speed based on system thermal conditions
instead of forcing all remaining fans to an overdrive speed setting.
- Support for a PCIe Gen3 FPGA x16 slot adapter that acts as
a co-processor for the POWER8 processor chip for gzip compressions and
decompressions.  Feature codes #EJ12 and #EJ13 are electronically
identical with the same CCIN of 59AB.  #EJ12 has a full-height tail
stock and is supported by 8286-41A and 8286-42A.  #EJ13 has a low
profile tail stock and is supported by 8284-22A. OS levels
supported are AIX 6.1 and AIX 7.1 or later. IBM i and Linux are
not supported.
- Support for use of system and partition templates on the
management console.
- Support for Coherent Accelerator Processor Interface (CAPI)
for the PCIe Gen 3 FPGA on OPAL. Operating system supported is
Linux.
- Support was added to allow concurrent initialization of the
processor cores. This expands the range of concurrent firmware
updates to accommodate core initialization changes and also allows for
dynamic repairs of processor and cache memory.
- Support was added for cache memory L2/L3 column repair to
allow concurrent repair of memory and propagation of memory errors for
better fault isolation of memory components.
- The system operator panel was enhanced to show the firmware
mode of the system during the IPL of either PowerVM or OPAL for panel
function 1.
- The service processor Processor Runtime Diagnostics (PRD)
was enhanced to collect debug data for failures in host boot
initialization for the Self-Boot Engine (SBE).
- Support was added to the Advanced System Management
Interface (ASMI) USB menu to allow a system dump to be collected to USB
with the power on to the system. This allows the dump to be
collected with the system memory state intact.
- Support for enhanced 10 Gb ethernet adapters that were
previously announced for Power8 for AIX NIM (Network Install
Management) or Linux Network Install capability. The enhanced
adapters are the following:
PCIe2 4-port(10Gb+1GbE) SR+RJ45 Adapter (#EN0S,
#EN0T)
PCIe2 4-port(10Gb+1GbE) SFP+Copper+RJ45 Adapter
(#EN0U, #EN0V)
The level of adapter microcode required is level
20100130 or later.
PCIe2 LP 2-port 10/1GbE BaseT RJ45 Adapter (#EN0W,
#EN0X, #EL3Z)
The level of adapter microcode required is level
30080130 or later.
- Support for a new 4-port Ethernet Adapter with two 10 Gb
and two 1Gb ports (#EN0M, #EN0N with CCIN 2CC0). The adapter offers NIC
and FCoE over its 10 Gb ports and NIC over the 1 Gb ports and is SR-IOV
capable. The 10 Gb ports are LR (long range) fiber optic,
supporting distances up to 10 km. Except for the transceivers and
cabling of the 10 Gb ports, this adapter is functionally
identical to the 4-port adapter (#EN0H, #EN0J, #EL38) SR optical and
(#EN0K, #EN0L, #EL3C) active copper twinax.
- Support for a new PCIe 2-port Async adapter (#EN27, #EN28)
that serves the same function as the predecessor PCIe 2-port
Async adapter (#5289, #5290) on the Power7+ and earlier
servers. This adapter provides connection for 2
asynchronous EIA-232 devices. Ports are programmable to support EIA-232
protocols, at a line speed of 128K bps. Two RJ45 connections are
located on the rear of the adapter. To attach to devices using a 9-pin
(DB9) connection, use an RJ45-to-DB9 converter. For convenience, one
converter is included with this feature. One converter is needed for
each connector that requires a DB9 connection.
- Support for additional PCIe adapters, which had previously
been supported on Power7+ and earlier servers, to help with server
migration:
Ethernet 10 Gb LAN: 1-port optical SR (#5769, #5275)
Ethernet and FCoE: 4-port 10 Gb/1 Gb Copper (#EN0K,
#EN0L, #EL3C)
Ethernet RoCE: 2-port 10 Gb copper (#EC27, #EC28,
#EL27)
Fibre Channel: 2-port 4 Gb (#5774, #5276, #EL09)
SAS: 2-port 3 Gb 380 MB cache (#5805)
- Support was added for a new Advanced System Management
Interface (ASMI) menu to allow the user to choose between an IPMI or a
serial console when in OPAL mode.
System firmware changes that affect all systems
- A problem was fixed in the service processor that
caused the SRC B1504804 to be logged as many as 30 times over five
minutes for an operations panel voltage regulator error.  The error
logging has been reduced to one SRC for this error.
- A problem was fixed to prevent an
intermittent system hang until IPL time-out after a processor core
checkstop. This secondary failure after a core checkstop had a
low probability of occurring.
- A problem was fixed to maintain time-of-day (TOD) clock
redundancy for the hypervisor time-keeping services in the case of a
TOD error and fail-over to the backup clock topology. There was a
failure in the TOD fail-over process to correctly assign the new backup
TOD topology, causing loss of redundancy for the next TOD error.
- A problem was fixed for the service processor reset/reload
process to eliminate an extra dump and SRC B1818601 caused by an
internal core dump during the reset/reload.
- A problem was fixed for a processor error with an incorrect
call out of a memory card with SRC B124E504 to eliminate the memory
card FRU call out. The processor error call out of SRC B170E540
was correct.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus to restore factory settings so that the default for the
Hypervisor mode (PowerVM or OPAL) was restored to the factory setting
using "System Service Aids/Factory Configuration/Service Processor
Reset/All Reset".
- A problem was fixed in how the processor clock speed was
reported to the hypervisor, causing the partitions to show a clock
speed that was about 200 MHz faster than the actual processor clock
speed.
- A problem was fixed for DRAM repair for the case where two
DRAM modules are having failures at the same rank such that spares are
used to repair each DRAM error. Without the fix, the second DRAM
is not repaired and could eventually be called out and guarded with a
UE SRC.
- A problem was fixed for system hardware dump collection to
collect all the hardware registers by stopping all functional clocks
before starting the collection.
- A problem was fixed for repairing spare memory DRAM so that
repair solutions for failed spares persist across IPLs of the system
by getting the repair solutions written to the Vital Product Data (VPD)
of the DRAM.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus to change the name of the "Hypervisor Configuration" menu
to "Firmware Configuration" to more accurately describe the menu
function of being able to change firmware between the PowerVM and OPAL
modes.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menus to move the IPMI password reset operation from the
"Firmware Configuration" menu to the "Login Profile/Change password"
menu.  This change was made to put all the password change
operations together under one menu.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menu for "Resource Dump" to give the message "This feature is
not supported for OPAL environments" when the system is in OPAL
mode. Previously, ASMI incorrectly stated that the
"Resource Dump" function was not supported on the machine type.
- A problem was fixed in the service processor to add missing
call outs for the memory buffer and memory controller FRUs when there
is a time-out error on the power bus with a PE SRC of B170E540 logged.
- A problem was fixed in memory diagnostics and fault
isolation that deconfigured more memory than necessary for memory
errors.
- A problem was fixed that caused the Utility COD display of
historical usage data to be truncated on the management console.
- A problem was fixed to eliminate service processor dumps
after AC power cycles of the CEC.
- A problem was fixed to add a missing hardware call out for
service processor FSI bus errors logged with SRC BC8A0A11. This
causes the failing hardware to be deconfigured and guarded for the next
IPL of the system.
- A problem was fixed so that if an IPL failure occurs that
causes the system to power off, error SRCs will be logged instead
of the system hanging for ten minutes and not logging any SRCs.
- A problem was fixed in the system dump data collection for
missing memory data to collect memory data after hardware
de-configuration checkstop errors.
- A problem was fixed for in-band code update to prevent loss of a
processor support interface (PSI) link that is in a backup role.
- A problem was fixed in system dump collection for a system
hang after a checkstop.  The system failed to go to the terminate
state and reboot.
- A problem was fixed in system dump collection to return
full dump data when a secondary error occurs during dump data
collection for the checkstop primary error.
- A problem was fixed in the Advanced System Management Interface
(ASMI) menu "System Configuration/Hardware Deconfiguration/Memory
Deconfiguration" to be able to manually configure and deconfigure DIMMs.
- A problem was fixed for system terminations that could
occur as a result of PCIe adapters using a Level Signaled Interrupt
(LSI) before the hypervisor interrupt handler was ready. This
could occur when in PCIe adapter recovery for an error with SRC logs
of B7006970 and B700B971. The PCIe adapters are now
held in reset until initialization sequences are completed to ensure
all interrupt handlers are ready for PCIe adapter interrupts.
- A problem was fixed for a management console firmware
update "Remove and Activate" operation that fails to activate the OCC
(On-Chip Controller for thermal and power management) new code level
with SRCs logged of B18B2616 and B1812601. An IPL is needed to
activate the OCC code level to complete the firmware update.
- A problem was fixed for IPL failures caused by Host Boot
PNOR memory corruption.  If an IPL Terminate Immediate (TI) from
Host Boot has a SRC without a specific reason code, a corruption check
on the Host Boot memory partitions is run and the Host Boot partitions
are corrected to recover them.
- A problem was fixed for the power usage regulation of
memory to keep memory power usage below its specified limits.
Lack of enough memory throttling was allowing the memory to consume
power past its set limits, leaving the system exposed to power faults
or unexpected power throttling in other areas of the system.
- A problem was fixed to guard cores on hang errors. A
processor core was not being guarded on hang errors where a core
timed-out waiting for an instruction to complete.
- A problem was fixed to allow memory diagnostics during a
re-IPL of the CEC, ensuring that problem memory will be guarded or
recovered and preventing possible error log flooding with memory errors.
- A problem was fixed for system dump process memory
corruption that could cause the wrong dump type to be created for a
system failure, resulting in a system dump with the wrong content.
- A problem was fixed for a service processor reset/reload
causing a FSP dump with a Firmware Database (fwdb) core dump captured
within it.
- A problem was fixed for a processor core forward progress
parity error so that the core could be guarded without causing a system
checkstop.
- A problem was fixed in the run time diagnostics of DIMMs to
read the raw card type correctly, preventing failures in the memory
repair.
- A problem was fixed to prevent an intermittent hostboot IPL
deadlock/hang in the deferred work queue with progress code CC009543
and termination with SRC B1813450.
- A problem was fixed in memory diagnostics to be able to
handle multiple DIMM failures without a time-out failure, reducing the
amount of memory that needs to be guarded for the errors.
- A problem was fixed in DIMM initialization to prevent
intermittent B181BA08 DIMM failures in host boot during IPL.
- A problem was fixed to call home guarded FRUs on each
IPL. Only the initial failure of the hardware was being reported
to the error log.
- A problem was fixed for the incorrect fan FRU call outs of
SRC 110076111 so that 4U systems (8286-41A, 8286-42A) have FRU 00FV629
for the 80 mm fan and the 2U systems (8284-22A, 8247-21L,
8247-22L) have FRU 00FV726 for the 60 mm fan.
- A problem was fixed for a memory write error becoming a
system checkstop instead of being handled by the memory error handling
and recovery processes.
- A problem was fixed for the error processing of processor
core checkstops at runtime to not ignore the guard on the failed core
on the next IPL of the system, thus preventing additional failures with
the next IPL during host boot.
- A problem was fixed for error recovery for a failed
processor that has all cores guarded such that host boot is able to
re-IPL using the working processor. In certain situations,
the re-IPL on the good processor was failing with SRC B113E504 with PRD
signature PB_CENT_CRESP_ADDR_ERROR.
- A problem was fixed for run-time guarding of a processor
core that had resulted in a system checkstop when the core guard
attempt failed. The processor with the non-guarded broken core
caused the On-Chip Controller (OCC) to have a power measurement
time-out to the processor with SRC B1102A00 that resulted in the system
termination.
- A problem was fixed to prevent incorrect logging of SRC
11007221 whenever the operator panel is missing (or broken). This
SRC indicates ambient temperature of the system is too high and a
performance throttle may occur to lower the temperature, causing
performance loss. A missing operator panel should not cause lower
performance of the system.
- A problem was fixed for undefined hardware states in the
system that caused an early IPL failure with SRC B1101314 when
configuring the Self Boot Engine (SBE) for hostboot.
- A problem was fixed for the Operator panel where the
Enclosure Fault LED was swapped with the Attention/Check Log LED.
- A problem was fixed for memory diagnostics to guard all
unusable memory due to a channel failure. This prevents the
hypervisor from trying to start partitions with memory associated with
the bad channel and having the partition crash.
- A problem was fixed to ensure all memory is scrubbed for
correctable errors to prevent run-time memory failures and possible
checkstops. If memory scrubbing actions found the preceding
memory rank had persistent ECC errors, the next rank of memory was
sometimes skipped.
- A problem was fixed in the Hostboot Self Boot Engine (SBE)
to re-IPL without guarding the processor on a SBE step that has
infrequent failures that are recoverable with a retry.
System firmware changes that affect certain systems
- A problem was fixed for processor local bus errors during
an IPL to call out the master and slave bus components with a BC14090F
SRC to identify all the possible failing components. For the
problem, only the bus slave components were being called out on a bus
error, leaving open the possibility that the faulty component might not
be guarded or repaired.
- On systems that have a boot disk located on a SAN, a
problem was fixed where the SAN boot disk would not be
found on the default boot list and then the boot disk would have
to be selected from SMS menus. This problem would normally
be seen for new partitions that had tape drives configured before the
SAN boot disk.
- On systems in IPv6 networks, a problem was fixed for
DHCP where a duplicate address detection (DAD) message to the
DHCP-client on the service processor could fail, resulting in duplicate
IP addresses being configured on the network.
- On systems that have Active Memory Sharing (AMS)
partitions, a problem was fixed for Dynamic Logical Partitioning
(DLPAR) for a memory remove, leaving a logical memory block (LMB) in an
unusable state until partition reboot.
- On systems in IPv6 networks, a problem was fixed for
a network boot/install failing with SRC B2004158 and IP address
resolution failing using neighbor solicitation to the partition
firmware client.
- On systems in Dynamic Power Saver (DPS) mode, a
problem was fixed so SRC B1812A61 is not logged when power throttling
is needed for a workload over the power capacity. In DPS
mode, a system power usage adjustment is not an error condition.
- On systems in OPAL mode, a problem was fixed for OPAL
network boots to add retries to DHCP to prevent network boot time-out
errors caused by network lags and slow downs.
- On systems in OPAL mode, a problem was fixed in the fault
isolation procedures to not call out hardware FRUs for software
failures to reduce loss of hardware on errors.
- On systems in PowerVM mode, a problem was fixed in
Live Partition Mobility (LPM) for systems at or near the new 32K
maximum for virtual devices where insufficient space existed to store
device attributes of the migrated system, causing RMC failures
and incorrect MTMS values for the migrated partition.
- On systems in PowerVM mode, a problem was fixed for
I/O adapters so that BA400002 errors were changed to informational for
memory boundary adjustments made to the size of DMA map-in
requests. These size adjustments were marked as UE previously for
a condition that is normal.
- On Power8 2U systems, a problem was fixed for the C5 PCIe
slot failing. This PCIe configuration was not supported on the
8284-22A, 8247-21L, and 8247-22L systems.
- On Power8 2U systems, a problem was fixed in the fan
speed management to lower the maximum RPMs of the fans and reduce the
noise level of the system. This problem affects the 8284-22A,
8247-21L, and 8247-22L systems.
- On systems in PowerVM mode using dedicated processors, a
problem with concurrent firmware update was fixed to prevent a quiesce
of the hypervisor process that can result in a system hang.
- On systems in PowerVM mode, a problem was fixed for
unresponsive PCIe adapters after a partition power off or a partition
reboot.
- On systems with 64 GB DIMM memory (F/C #EM8D), a problem was
fixed to allow 64 GB DIMM memory error-correcting code (ECC) repairs
instead of logging a predictive error with no repair to the memory.
|
SV810_061_054 / FW810.02
07/29/14 |
Impact: Data
Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed in PowerVM
where the usage of P8 transactional memory and vector facilities could
result in undetected corruption of data if the system is running in
Power8 native mode. OS levels that support Power8 native mode are RHEL
7 and AIX 7.1 TL3 SP3 and later.
System firmware changes that affect certain systems
- HIPER/Pervasive: A problem was fixed
with Live Partition Mobility (LPM) on PowerVM when
migrating a partition between two Power8 systems that are running in
Power8 native mode. This problem could result in unpredictable behavior
when the partition resumes execution on the target system, including
potential undetected corruption of data, a system crash, or a partition
crash. OS levels that support Power8 native mode are RHEL 7 and AIX 7.1
TL3 SP3 and later.
- A problem was fixed for an IBM i D-mode IPL failure with
SRC B2003110 when the alternative load source could not be found.
If a system encounters this issue prior to installing the fix, the
Service Pack can be applied via the management console or using a USB
flash drive with the system powered off.
|
SV810_058_054 / FW810.01
06/23/14 |
Impact: Security
Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive:
A security problem was fixed in the OpenSSL implementation of the Secure
Sockets Layer (SSL) protocol that allowed clients and servers, via a specially crafted
handshake packet, to use weak keying material for communication.
A man-in-the-middle attacker could use this flaw to decrypt and modify
traffic between the management console and the service processor.
The Common Vulnerabilities and Exposures issue number for this problem
is CVE-2014-0224.
- HIPER/Pervasive:
A security problem was fixed in OpenSSL for a buffer overflow in the
Datagram Transport Layer Security (DTLS) when handling invalid DTLS
packet fragments. This could be used to execute arbitrary code on
the service processor. The Common Vulnerabilities and Exposures
issue number for this problem is CVE-2014-0195.
- HIPER/Pervasive:
Multiple security problems were fixed in the way that OpenSSL handled
read and write buffers when the SSL_MODE_RELEASE_BUFFERS mode was
enabled to prevent denial of service. These could cause the
service processor to reset or unexpectedly drop connections to the
management console when processing certain SSL commands. The
Common Vulnerabilities and Exposures issue numbers for these problems
are CVE-2010-5298 and CVE-2014-0198.
- HIPER/Pervasive:
A security problem was fixed in OpenSSL to prevent a denial of service
when handling certain Datagram Transport Layer Security (DTLS)
ServerHello requests. A specially crafted DTLS handshake packet could
cause the service processor to reset. The Common Vulnerabilities
and Exposures issue number for this problem is CVE-2014-0221.
- HIPER/Pervasive:
A security problem was fixed in OpenSSL to prevent a denial of service
by using an exploit of a null pointer de-reference during anonymous
Elliptic Curve Diffie Hellman (ECDH) key exchange. A specially
crafted handshake packet could cause the service processor to
reset. The Common Vulnerabilities and Exposures issue number for
this problem is CVE-2014-3470.
- A problem was fixed for hardware dumps on the service
processor so that valid dump data could be collected from multiple
processor checkstops. Previously, the hardware data from multiple
processor checkstops would only be correct for the first processor.
- A problem was fixed for platform dumps so that certain
operations would work after the platform dump completed.
Operations such as firmware updates or reset/reloads of the service
processor after a platform dump would cause the service processor to
become inaccessible.
|
SV810_054_054 / FW810.00
06/10/14 |
Impact: New
Severity: New
New Features and Functions
- GA Level
NOTE:
- POWER8 firmware addresses
the security problem in the OpenSSL Transport Layer Security
(TLS) and Datagram Transport Layer Security (DTLS) to not allow
Heartbeat Extension packets to trigger a buffer over-read to steal
private keys for the encrypted sessions on the service processor.
The Common Vulnerabilities and Exposures issue number is CVE-2014-0160
and it is also known as the Heartbleed vulnerability.
- POWER8 (and later) servers include an “update access key”
that is checked when system firmware updates are applied to the
system. The initial update access keys include an expiration date
which is tied to the product warranty. System firmware updates will not
be processed if the calendar date has passed the update access key’s
expiration date, until the key is replaced. As these update
access keys expire, they need to be replaced using either the Hardware
Management Console (HMC) or the Advanced System Management Interface (ASMI) on
the service processor. Update access keys can be obtained via the
key management website: http://www.ibm.com/servers/eserver/ess/index.wss
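The update access key rule in the note above can be read as a simple date comparison. The sketch below follows only that wording; the function name `firmware_update_allowed` is hypothetical, and treating the expiration date itself as still valid is an assumption drawn from the phrase "has passed".

```python
from datetime import date


def firmware_update_allowed(today: date, uak_expiration: date) -> bool:
    """Return True while the calendar date has not passed the key's expiration.

    Assumption: an update on the expiration date itself is still allowed.
    """
    return today <= uak_expiration
```

Under this reading, a key that expires on 2017-06-10 permits an update on that day and blocks it from 2017-06-11 onward, until the key is replaced.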
|