SC840
For Impact, Severity, and other firmware definitions, please
refer to the 'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete firmware fix history for this release level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
SC840_147_056 / FW840.40
10/28/16
Impact: Availability
Severity: SPE
New features and functions
- The requirement to upgrade the managing HMCs from HMC 840
to HMC 850 before installing FW840.40 on the E870 (9119-MME), E880
(9119-MHE), E870C (9080-MME) and E880C (9080-MHE) systems has been
removed. However, to properly manage the E870C and E880C systems,
the managing HMC(s) must be at V8 R8.5.0 SP1 or later.
- Support was added to protect the service processor from
booting on a level of firmware that is below the minimum MIF
level. If this is detected, an SRC B18130A0 is logged. A
disruptive firmware update would then need to be done to the minimum
firmware level or higher. This new support has no effect on the
system being updated with the service pack but has been put in place to
provide an enhanced firmware level for the IBM field stock service
processors.
- Support for the Advanced System Management Interface (ASMI)
was changed to not create VPD deconfiguration records and call home
alerts for hardware FRUs that have one VPD chip of a redundant
pair broken or inaccessible. The backup VPD chip for the FRU
allows continued use of the hardware resource. The notification
of the need for service for the FRU VPD is not provided until both of
the redundant VPD chips have failed for a FRU.
System firmware changes that affect all systems
- A problem was fixed for an infrequent IPL hang and
termination that can occur if the backup clock card is failing. The
following SRCs may be logged with this termination: B1813450,
B181460B, B181BA07, B181E6C7 and B181E6F1. If the IPL error
occurs, the system can be re-IPLed to recover from the problem.
- A problem was fixed for the Advanced System Management
Interface "Network Services/Network Configuration" "Reset Network
Configuration" button that was not resetting the static routes to the
default factory setting. The manufacturing default is to have no
static routes defined so the fix clears any static routes that had been
added. A circumvention to the problem is to use the ASMI "Network
Services/Network Configuration/Static Route Configuration" "Delete"
button before resetting the network configuration.
- A problem was fixed for a partial callout for a failed
SPIVID (Serial Peripheral Interface Voltage Identification) interface
on the power supply VRM (Voltage Regulator Module). The SPIVID
interface allows the processor to control its external voltage
supply level, but if it fails, only the processor FRU (SCM) is called
out, not the VRM.
The system IPL will complete with a CEC drawer deconfigured. The
error log will only contain the processor but not the defective
processor VRM. Hostboot does not detect a SPIVID error, but fails
on a SCOM operation to the processor chip. The errors show up
with SRC BCxx090F logged by Hostboot and word 7 containing one of
three error values for a SPIVID_SLAVE_PART callout:
1) RC_SBE_SET_VID_TIMEOUT = 0x005ec1b2
2) RC_SBE_SPIVID_STATUS_ERROR = 0x00902aac
3) RC_SBE_SPIVID_WRITE_RETURN_STATUS_ERROR = 0x0045d3cd with HWP Error
description: "Procedure: proc_sbe_setup_evid SPIVID Device did not
return good status the Boot Voltage Write operation" and HWSV RC of
BA24.
Without the fix, replace both the identified SCM and the associated VRM.
- A problem was fixed for the HMC Exchange FRU procedure for
a DVD drive with MTM 7226-1U3 and feature codes 5757/5762/5763 where it
did not verify the DVD drive was plugged in at the end of the exchange
procedure. Without the fix, the user must manually verify
that the DVD drive is plugged in.
- A problem was fixed for a 3.3V power fault on the primary
system clock card causing a failover to the backup clock without an
error log and a callout for the primary clock card. This clock
card is part of a redundant set in the System Control Unit with CCIN
6B49.
- A problem was fixed for a PLL unlock error on the backup
clock card by using spread spectrum to maintain the phase-locked loop
for the clock frequency. This technique was already in use for
the primary clock card. The PLL unlock error is rare in the
backup clock for the Power systems but it has been seen more frequently
for the same part in other IBM systems. This clock card is part
of a redundant set in the System Control Unit with CCIN 6B49.
- A problem was fixed for the Advanced System Management
Interface (ASMI) incorrectly showing the Anchor card as guarded
whenever any redundant VPD chip is guarded.
- A problem was fixed for the health monitoring of the NVRAM
and DRAM in the service processor that had been disabled. The
monitoring has been re-established, and early warnings of service
processor memory failure are logged with one of the following Predictive
Error SRCs: B151F107, B151F109, B151F10A, or B151F10D.
- A problem was fixed for infrequent VPD cache read failures
during an IPL causing an unnecessary guarding of DIMMs with SRC
B123A80F logged. With the fix, a VPD cache read failure causes a
temporary deconfiguration of the associated DIMM, but the DIMM is
recovered on the next IPL.
- A problem was fixed for a processor hang where the error
recovery was not guarding the failing processor. The failure
causes an SRC B111E540 to be logged with a Signature Description of "
ex(n0p3c1) (COREFIR[55]) NEST_HANG_DETECT: External Hang
detected". With the fix, the failing processor FRU is called out
and guarded so that the error does not recur when the system is
re-IPLed.
- A problem was fixed for the service processor recovery from
intermittent MAX31760 fan controller faults logged with SRC
B1504804. The fan controller faults caused an out of memory
condition on the service processor, forcing it to reset and failover to
the backup service processor with SRCs B181720D, B181E6E9, and
B182951C logged. With the fix, the fan controller faults are
handled without memory loss and the only SRC logged is B1504804 for
each fan controller fault.
- A problem was fixed for a DDR4 memory training step during
hostboot that incorrectly failed DIMMs on the timing margins for the
HOLD limit. The DIMMs may be recovered by manually unguarding the
failed DIMM hardware. This affects the 256GB DDR4 memory DIMM
with feature code #EM8Y.
- A problem was fixed for a failed IPL with unrecoverable error
(UE) SRC BC8A090F that does not have a hardware callout or a guard of
the failing hardware. The system may be recovered by guarding out the
processor associated with the error and re-IPLing the system.
With the fix, the bad processor core is guarded and the system is able
to IPL.
- A problem was fixed for losing the setting that disables
the periodic notification for a call home error log after a
failover to the backup service processor on a redundant service
processor system. The call home for the presence of a failed
resource can get re-enabled (if manually disabled in ASMI on the
primary service processor) after a concurrent firmware update or any
scenario that causes the service processor to fail over and change
roles. With the fix, the periodic notification flag is
synchronized between the service processors when the flag value is
changed.
- A problem was fixed for a shortened "Grace Period" for
"Out of Compliance" users of a Power Enterprise Pool (PEP).
The "Grace Period" is short by one hour, so the user has one less hour
to resolve compliance issues before the HMC disallows any more
borrowing of PEP resources. For example, if the "Grace Period"
should have been 48 hours as shown in the "Out of Compliance" message,
it really is 47 hours in the hypervisor firmware. The borrowing
of PEP resources is not a common usage scenario. It is most often
found in Live Partition Mobility (LPM) migrations where PEP resources
are borrowed from the source server and loaned to the target server.
- A problem was fixed for an infrequent service processor
failover hang that results in a reset of the backup service processor
that is trying to become the new primary. This error occurs more
often on a failover to a backup service processor that has been in that
role for a long period of time (many months). This error can
cause a concurrent firmware update to fail. To reduce the chance
of a firmware update failure because of a bad failover, an
Administrative Failover (AFO) can be requested from the HMC prior to
the start of the firmware update. When the AFO has completed, the
firmware update can be started as normally done.
- A problem was fixed for On-Chip Controller (OCC) errors
that had excessive callouts for processor FRUs. Many of the OCC
errors are recoverable and do not require that the processor be called
out and guarded. With the fix, the processors will only be called
out for OCC errors if there are three or more OCC failures during a
time period of a week.
- A problem was fixed for an Operations Panel Function 04
(Lamp test) during an IPL causing the IPL to fail. With the fix,
the lamp test request is rejected during the IPL until the hypervisor
is available. The lamp test can be requested without problems
anytime after the system is powered on to hypervisor ready or an OS is
running in a partition.
- A problem was fixed for a false thermal alarm in the active
optical cables (AOC) for the PCIe3 expansion drawer with SRCs B7006AA6
and B7006AA7 being logged every 24 hours. The AOC cables have
feature codes of #ECC6 through #ECC9, depending on the length of the
cable. The SRCs should be ignored as they call for the
replacement of the cable, cable card, or the expansion drawer
module. With the fix, the false AOC thermal alarms are no longer
reported.
- A problem was fixed for CEC drawer deconfiguration during an
IPL due to SRCs BC8A0307 and BC8A1701 that did not have the correct
hardware callout for the failing SCM. With the fix, the failing
SCM is called out and guarded so the CEC drawer will IPL even though
there is a failed processor.
- A problem was fixed for extra resources being assigned in a
Power Enterprise Pool (PEP). This only occurs if all of
these things happen:
o Power server is in a PEP pool
o Power server has PEP resources assigned to it
o Power server powered down
o User uses HMC to 'remove' resources from the powered-down
server
o Power server is then restarted. It should come up with no
PEP resources, but it starts up and shows it still is using PEP
resources it should not have.
To recover from this problem, the HMC 'remove' of the PEP resources
from the server can be performed again.
- A problem was fixed for the On-Chip Controller (OCC)
incorrectly calling out processors with SRC B1112A16 for L4 Cache DIMM
failures with SRC B124E504. This false error logging can occur if
the DIMM slot that is failing is adjacent to two unoccupied DIMM slots.
System firmware changes that affect certain systems
- On systems using PowerVM firmware, a problem was fixed for
network issues, causing critical situations for customers, when an
SR-IOV logical port or vNIC is configured with a non-zero Port VLAN ID
(PVID). This fix updates adapter firmware to 10.2.252.1922, for
the following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EL38,
EN0M, EN0N, EN0K, EN0L, and EL3C.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On systems using the PowerVM hypervisor firmware and
NovaLink, a problem was fixed for a NovaLink installation error where
the hypervisor was unable to get the maximum logical memory block
(LMB) size from the service processor. The maximum supported LMB
size should be 0xFFFFFFFF but in some cases it was initialized to a
value that was less than the amount of configured memory, causing the
service processor read failure with error code 0x00000134.
- On systems using PowerVM firmware with a system partition
with more than 64 cores, a problem was fixed for Live Partition
Mobility (LPM) migration operations failing with HSCL365C.
The partition migration is stopped because the platform detects a
firmware error anytime the partition has more than 64 cores.
- On systems using PowerVM firmware, a problem was fixed for
an AIX or Linux partition failing with an SRC B2008105 LP 00005 on a
re-IPL after a dump (firmware assisted or error generated dump)
following a Live Partition Mobility (LPM) migration operation.
The problem does not occur if the migrated partition completes a normal
IPL after the migration.
- On systems using PowerVM firmware, a problem was fixed to
prevent NovaLink managed or co-managed systems from blocking SR-IOV
configurations. When configuring or deconfiguring SR-IOV, it is
highly likely that the NovaLink VMC virtual device will interfere with
SR-IOV virtual devices. Without the fix, SR-IOV ignores the
NovaLink VMC device and tries to use the same virtual slot.
- On systems using PowerVM firmware, a problem was fixed for
intermittent long delays in the NX co-processor for asynchronous
requests such as NX 842 compressions. This problem was observed
for AIX DB2 when it was doing hardware-accelerated compressions of data
but could occur on any asynchronous request to the NX co-processor.
- On systems using PowerVM firmware that have an attached
HMC, a problem was fixed for a Live Partition Mobility migration
that resulted in the source managed system going to the Hardware
Management Console (HMC) Incomplete state after the migration to the
target system was completed. This problem is very rare and has
only been detected once. The problem trigger is that the source
partition does not halt execution after the migration to the target
system. The HMC went to the Incomplete state for the source
managed system when it failed to delete the source partition because
the partition would not stop running. When this problem occurred,
the customer network was running very slowly and this may have
contributed to the failure. The recovery action is to re-IPL the
source system but that will need to be done without the assistance of
the HMC. For each partition that has an OS running on the source
system, shut down each partition from the OS. Then from the
Advanced System Management Interface (ASMI), power off the
managed system. Alternatively, the system power button may also
be used to do the power off. If the HMC Incomplete state persists
after the power off, the managed system should be rebuilt from the
HMC. For more information on HMC recovery steps, refer to this
IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
- On systems using PowerVM firmware, a problem was fixed for
a latency time of about 2 seconds being added to a target Live
Partition Mobility (LPM) migration system when there is a latency time
check failure. With the fix, in the case of a latency time check
failure, a much smaller default latency is used instead of two
seconds. This error would not be noticed if the customer system
is using an NTP time server to maintain the time.
- On systems using PowerVM firmware that have an attached
HMC, a problem was fixed for a Live Partition Mobility migration
that resulted in a system hang when an EEH error occurred
simultaneously with a request for a page migration operation. On
the HMC, it shows an incomplete state for the managed system with
reference code A181D000. The recovery action is to re-IPL the
source system but that will need to be done without the assistance of
the HMC. From the Advanced System Management Interface
(ASMI), power off the managed system. Alternatively, the
system power button may also be used to do the power off. If the
HMC Incomplete state persists after the power off, the managed system
should be rebuilt from the HMC. For more information on HMC
recovery steps, refer to this IBM Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm
- On systems using PowerVM firmware, a problem was fixed for
a system dump post-dump IPL that resulted in adjunct partition errors
of SRC BA54504D, B7005191, and BA220020 when they could not be created
due to false space constraints. These adjunct partition failures
will prevent normal operations of the hypervisor such as creating new
partitions, so a power off and power on of the system is needed to
recover it. If the customer system is experiencing this error
(only some systems will be impacted), it is expected to occur for each
system dump post-dump IPL until the fix is applied.
- On systems using PowerVM firmware, a problem was
fixed for a shared processor pool partition showing an incorrect zero
"Available Pool Processor" (APP) value after a concurrent firmware
update. The zero APP value means that no idle cycles are present
in the shared processor pool but in this case it stays zero even when
idle cycles are available. This value can be displayed using the
AIX "lparstat" command. If this problem is encountered, the
partitions in the affected shared processor pool can be dynamically
moved to a different shared processor pool. Before the dynamic
move, the "uncapped" partitions should be changed to "capped" to
avoid a system hang. The old affected pool would continue to have the
APP error until the system is re-IPLed.
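As a hedged illustration (the interval and count values below are
arbitrary, and the "app" column is only reported when the partition is
authorized to collect shared pool utilization data), the APP value can
be checked from an AIX partition in the pool with:
  lparstat 1 5
The "app" column of the output shows the available (idle) physical
processors in the shared pool; with this problem present it incorrectly
stays at 0.00 after a concurrent firmware update.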
- On systems using PowerVM firmware, a rare problem was
fixed for a system hang that can occur when dynamically moving
"uncapped" partitions to a different shared processor pool. To
prevent a system hang, the "uncapped" partitions should be changed to
"capped" before doing the move.
- On systems using PowerVM firmware, a problem was
fixed for a DLPAR add of the USB 3.0 adapter (#EC45 and #EC46) to an
AIX partition where the adapter could not be configured: the
AIX "cfgmgr" command fails with EEH errors and an
outstanding illegal DMA transaction. The trigger for the problem
is the DLPAR add operation of the USB 3.0 adapter that has a USB
External Dock (#EU04) and RDX Removable Disk Drives attached, or a USB
3.0 adapter that has a flash drive attached. The PCI slot can be
powered off and on to recover the USB 3.0 adapter.
- On systems using PowerVM firmware, a problem was
fixed for a missing OF trace buffer in the resource dump. This
happens any time a resource dump is requested. The missing FFDC
data may require that problems be recreated before they can be debugged.
- On systems using PowerVM firmware, a problem was fixed for
a Live Partition Mobility (LPM) error where the target partition
migration fails with an HSCLB98C error. Frequency of this error
can be moderate with source partitions that have a vNIC resource but
extremely low if the source partition does not have a vNIC
resource. The failure originates at the VIOS VF level, so
recovery from this error may need a re-IPL of the system to regain full
use of the vNIC resources.
SC840_139_056 / FW840.30
09/28/16
Impact: Availability
Severity: SPE
New features and functions
- Support for the E870C (9080-MME) and E880C (9080-MHE)
systems. These systems are cloud-enabled and require a minimum
HMC level of V8 R8.5.0 SP1.
- The certificate store on the service processor has been
upgraded to include the changes contained in version 2.6 of the CA
certificate list published by the Mozilla Foundation at the mozilla.org
website as part of the Network Security Services (NSS) version 3.21.
System firmware changes that affect all systems
- A problem was fixed for host-initiated resets of the
service processor that can cause the service processor to
terminate. In this state, the service processor will be
unresponsive but the system and partitions will continue to run.
On systems with a single service processor, the SRC B1817212 will be
displayed on the control panel. For systems with redundant
service processors, the failing service processor will be
deconfigured. To recover the failed service processor, the system
will need to be powered off with AC power removed during a regularly
scheduled system service action. The problem is intermittent and
very infrequent as most of the host-initiated resets of the service
processor will work correctly to restore the service processor to a
normal operating state.
SC840_132_056 / FW840.24
08/31/16
Impact: Availability
Severity: HIPER
System firmware changes that affect certain systems
- HIPER/Non-Pervasive: For a system using PowerVM firmware at
a FW840 level and having an AIX partition or VIOS partition at
specific back levels, a problem was fixed for PCI adapters not
getting configured in the OS. DVD boots hang with status code 518
when attempts are made to boot off the AIX or VIOS DVD image. NIM
installs hang with status code 608. If the firmware is updated to
840_104 through 840_118 for a SAS booted system, the subsequent reboot
will hang with status code 554.
The failing AIX and VIOS levels are as follows:
AIX:
AIX 7100-02-06 - AIX 7100-02-07
AIX 6100-08-06 - AIX 6100-08-07
VIOS:
VIOS 2.2.2.6 - VIOS 2.2.2.70
Without the fix, the problem may be circumvented by upgrading the AIX
to 7100-03-03 or 6100-09-03 and the VIOS to 2.2.3.4.
Depending on the adapter not getting configured, the error may result
in Defined devices, EEH errors, and/or failure to boot the partition
(if the failing adapter is the boot device). These errors may
also be seen for a rebooted partition after a LPM migration to FW840.
With the fix applied, the error state for some of the adapters in
the running OS may persist and it will be necessary to reboot the OS to
recover from those errors.
SC840_118_056 / FW840.23
07/28/16
Impact: Data
Severity: HIPER
System firmware changes that affect certain systems
- HIPER/Non-Pervasive:
DEFERRED: On systems with DDR4 memory installed, a problem
was fixed for the handling of data errors in the L4 cache.
If a data error occurs in the L4 cache of the memory buffer on an
affected system and it is pushed out to mainline memory, the data error
will not be correctly handled. A data error originating in
the L4 cache may result in incorrect data being stored into
memory. The DDR4 DRAM has feature code (FC) EM8Y for a 256GB 1600
MHz CDIMM.
At this firmware level, DDR4 and DDR3 memory cannot be mixed in the
system. At FW860.10, DDR4 and DDR3 can be mixed in a system, but
each system node must have either DDR3 or DDR4 only.
IBM strongly recommends that customers plan an outage to
install the firmware fix immediately. Fix activation requires a
subsequent platform IPL following the installation of the firmware fix
to eliminate any exposure to this issue.
SC840_113_056 / FW840.22
07/06/16
Impact: Availability
Severity: ATT
New features and functions
- Support was added to Live Partition Mobility to allow
migrations between partitions at firmware level FW760 and FW840.22 or
later. Previously, migration operations were not allowed between
FW760 and FW840 partitions.
System firmware changes that affect all systems
- Support was added
for additional First Failure Data Capture (FFDC) data for processor
clock failover errors provided by creating daily clock status reports
with SRC B150CCDA informational error logs. This clock status SRC
log is written into the Hardware Management Console (HMC) iqyylog.log
as a platform error log (PEL) event. The PEL event contains a
dump of the clock registers. If a processor clock fails over with
SRC B158CC62 posted to the serviceable events log, the iqyylog.log
file on the HMC should be collected to help debug the clock problem
using the B150CCDA data. This support had been dropped in
FW840.21 because of an IPL initialization conflict that has been
resolved and the support is now re-instated.
System firmware changes that affect certain systems
- On systems using
PowerVM firmware, a problem was fixed for a sequence of two or more
Live Partition Mobility migrations that caused a partition to crash
with an SRC BA330000 logged (Memory allocation error in partition
firmware). The sequence of LPM migrations that can trigger the
partition crash are as follows:
The original source partition level can be any FW760.xx, FW763.xx,
FW770.xx, FW773.xx, FW780.xx, or FW783.xx P7 level or any FW810.xx,
FW820.xx, FW830.xx, or FW840.xx P8 level. It is migrated first to
a system running one of the following levels:
1) FW730.70 or later 730 firmware or
2) FW740.60 or later 740 firmware
And then a second migration is needed to a system running one of the
following levels:
1) FW760.00 - FW760.20 or
2) FW770.00 - FW770.10
The twice-migrated system partition is now susceptible to the BA330000
partition crash during normal operations until the partition is
rebooted. If an additional LPM migration is done to any firmware
level, the thrice-migrated partition is also susceptible to the
partition crash until it is rebooted.
With the fix applied, the susceptible partitions may still log multiple
BA330000 errors but there will be no partition crash. A reboot of
the partition will stop the logging of the BA330000 SRC.
SC840_111_056 / FW840.21
06/24/16
Impact: Availability
Severity: SPE
NOTE: Critical firmware update for FW840.20 (SC840_104) level systems
System IPLed with FW840.20:
A critical firmware update is required for all 9119-MME and 9119-MHE
systems that have been IPLed with FW840.20 (SC840_104). The FW840.20
level can cause a failed IPL or a potential unplanned outage. If the
server is already in production, the customer should plan an outage at
a convenient time to apply FW840.21 (SC840_111) or higher and IPL.
System had FW840.20 concurrently applied:
If firmware level FW840.20 was concurrently installed (i.e., the
system was NOT IPLed after installing the level), customers are not
impacted by this issue provided they apply FW840.21 (SC840_111) or
higher prior to the next planned system reboot. NOTE: FW840.21 can
be applied concurrently.
System IPLed with any other version of firmware:
If the current firmware level of the system is not FW840.20, the
system is not exposed to this issue. Customers can install this
level or later at the next scheduled update window.
To verify the firmware level installed on the server, select “Updates”
from the left side of the HMC and place a check mark on the server of
interest. Then select “View system information” from the bottom view,
and select “None - Display current values”. The Platform IPL Level
will indicate the last level the system was booted on.
System firmware changes that affect all systems
- A problem was fixed for an intermittent failure in Hostboot
during the system IPL resulting in SRCs BC70090F and BC8A1701 logged
with a hardware procedure return code of
"RC_PROC_BUILD_SMP_ADU_STATUS_MISMATCH". The system terminates
with a Terminate Immediate (TI) condition. The system must be
re-IPLed to recover. The failure is very infrequent and was
caused by a race condition introduced as part of clock card failure
data collection procedure which has now been removed (see below).
- Support was removed for additional First Failure Data
Capture (FFDC) data for processor clock failover errors added in
FW840.20. The FFDC was provided by creating daily clock
status reports with SRC B150CCDA informational error logs. This
change was removed because it could trigger intermittent IPL &
initialization failures.
SC840_104_056 / FW840.20
05/31/16
Impact: Availability
Severity: SPE
New features and functions
- Support for a system control unit (SCU) with three fans
instead of four on the E870 (9119-MME) and E880 (9119-MHE) system
models. The SCU fan has CCIN 6B44 with part number 00FV798.
- Support was added for the Stevens6+ option of the internal
tray loading DVD-ROM drive with F/C #EU13. This is an 8X/24X(max)
Slimline SATA DVD-ROM Drive. The Stevens6+ option is a FRU
hardware replacement for the Stevens3+. MTM 7226-1U3
(Oliver) FC 5757/5762/5763 attaches to IBM Power Systems and
lists Stevens6+ as optional for Stevens3+. If the Stevens6+
DVD drive is installed on the system without the required firmware
support, the boot of an AIX partition will fail when the DVD is used as
the load source. Also, an IBM i partition cannot consistently
boot from the DVD drive using D-mode IPL. An SRC C2004130 may be
logged for the load source not found error.
- Support for the IBM PCIe3 12GB cache RAID plus SAS dual
4-port 6Gb x8 adapter with feature code #EJ14 and CCIN 57B1. This
adapter is very similar to the #EJ0L SAS adapter, but it uses a second
chip in the card to provide more IOPS capacity (significant performance
improvement) and can attach more SSD. This adapter uses
integrated flash memory to provide protection of the write cache,
without need for batteries, in case of power failure.
- Support for PowerVM vNIC extended to Linux OS Ubuntu 16.04
LE with up to ten vNIC client adapters for each partition.
PowerVM vNIC combines many of the best features of SR-IOV and PowerVM
SEA to provide a network solution with options for advanced functions
such as Live Partition Mobility along with better performance and I/O
efficiency when compared to PowerVM SEA. In addition PowerVM vNIC
provides users with bandwidth control (QoS) capability by leveraging
SR-IOV logical ports as the physical interface to the network.
- PowerVM CoD was enhanced to eliminate the yearly Utility
CoD renewal on systems using Utility CoD. The Utility CoD usage
is already monitored to make sure systems are running within the
prescribed threshold limit of unreported usage, so a yearly customer
renewal is not needed to manage the Utility CoD processor usage.
- Support was added to the DHCP client on the service
processor for non-random backoff mode needed for Data Center
Manageability Interface (DCMI) V1.5 compliance. By default,
the DHCP client does random backoff delays for retries during DHCP
discovery. For DCMI V1.5, non-random backoff delays were
introduced as an option. Disabling the random back-off mode is
not required for normal operations, but if wanted, the system
administrator can override the default and disable the random back-off
mode by sending the “SET DCMI Configuration Parameters” for the random
back-off property of the Discovery Configuration parameter. A
value of "0" for the bit means "Disabled". Or, the DHCP
configuration file can be modified to add "random-backoff off", causing
the non-random mode for the retry delays to be used during DHCP
discovery.
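As a minimal sketch (the exact name and location of the DHCP client
configuration file on the service processor are not documented here and
are an assumption), the directive is a single line added to that file:
  # Use fixed (non-random) backoff delays for DHCP discovery retries,
  # as required for DCMI V1.5 compliance.
  random-backoff off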
- Support was added for enhanced diagnostics for PowerVM
Simplified Remote Restart (SRR) partitions. This service
pack level is recommended when using SRR partitions. You can
learn more about SRR partitions at the IBM Knowledge Center: " http://www.ibm.com/support/knowledgecenter/HW4P4/p8hat/p8hat_createremotereslpar.htm".
- Support was added for auto-correction in the Advanced
System Manager Interface (ASMI) for the "Feature Code/Sequence Number"
field of the "System Configuration/Program Vital Product Data/System
Enclosures" menu selection. Lower case letters are invalid in the
"Feature Code/Sequence Number" field so these are now changed to upper
case letters to help form a valid entry. For example, if
"78c9-001" was entered, it would be changed to "78C9-001".
- Support was added for HTTP Strict Transport Security (HSTS)
compliance for The Advanced System Management Interface (ASMI) web
connection. Even without this feature, any attempt to access ASMI
with the HTTP protocol was rejected because the service processor
firewall blocks port 80 (HTTP). But enabling HSTS for ASMI
prevents HSTS security warnings for the service processor during
network scans by security scanner programs such as IBM AppScan.
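As a hedged example (the service processor address is hypothetical, and
the max-age value shown is illustrative rather than the value ASMI
returns), HSTS compliance can be spot-checked by reading the ASMI HTTPS
response headers:
  curl -kI https://<service-processor-address>
A compliant response includes a header of the form
"Strict-Transport-Security: max-age=31536000", which instructs clients
to use only HTTPS for subsequent connections.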
System firmware changes that affect all systems
- DEFERRED: A problem was fixed in the dynamic
RAM (DRAM) initialization to update the VREF on the DIMMs to the
optimal settings and to add an additional margin check test to improve
the reliability of the DRAM by screening out more marginal DIMMs before
they can result in a run-time memory fault.
- A problem was fixed for a degraded PCI link causing a
processor core to be guarded if a non-cacheable unit (NCU) store
time-out occurred with SRC B113E540 and PRD signature
"(NCUFIR[9]) STORE_TIMEOUT: Store timed out on PB". With the fix,
the processor core is not guarded because of the NCU error. If
this problem occurs and a core is deconfigured, clear the guard record
and re-IPL to regain the processor core. The solution for
degraded PCI links is different from the fix for this problem, but a
re-IPL of the CEC or a reset of the PCI adapters could help to recover
the PCI links from their degraded mode.
- A problem was fixed for an incorrect reduction in FRU
callouts for Processor Run-time Diagnostic (PRD) errors after a
reference oscillator clock (OSCC) error has been logged. Hardware
resources are not called out and guarded as expected. Some of the
missing PRD data can be found in the secondary SRC of B181BAF5 logged
by hardware services. The callouts that PRD would have made are
in the user data of that error log.
- A problem was fixed for a Qualys network scan for security
vulnerabilities causing a core dump in the Intelligent Platform
Management Interface (IPMI) process on the service processor with
SRC B181EF88. The error occurs anytime the Qualys scan is run
because it sends an invalid IPMI session id that should have been
handled and discarded without a core dump.
- A security problem was fixed in OpenSSL for a possible
service processor reset on a null pointer de-reference during RSA PSS
signature verification. The Common Vulnerabilities and Exposures issue
number is CVE-2015-3194.
- A security problem was fixed in the lighttpd server on the
service processor, where a remote attacker, while attempting
authentication, could insert strings into the lighttpd server log
file. Under normal operations on the service processor, this does
not impact anything because the log is disabled by default. The
Common Vulnerabilities and Exposures issue number is CVE-2015-3200.
- Support was added for a cable validation option in the Advanced System
Management Interface (ASMI). A new panel option called "Cable
Validation" has been added to the "System Service Aids" menu. The
cable validation can be performed on the FSP, Clock, UPIC, and SMP
cables.
- A problem was fixed for a missing error log when a clock
card fails over to the backup clock card. This problem causes
loss of redundancy on the clock cards without a callout notification
that there is a problem with the FRU. If the fix is applied to a
system that had a failed clock, that condition will not be known until
the system is IPLed again, when an error log and callout of the clock
card will occur if it is in a persistent failed state.
- A problem was fixed for the Hardware Management Console
(HMC) "chpwrmgmt" command not providing a meaningful error message when
used to try to enable an invalid power saver mode of "dynamic_favor_power"
on the 9119-MME or 9119-MHE models. This power saver mode is not
available on these models but the error message issued was "HSCL1400 An
error has occurred during the operation to the managed system. Try the
task again." The following is the corrected error message:
"HSCL1402 This operation failed due to the following reasons: HSCL02F3
The managed system does not support the specified power saver mode."
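For illustration (the managed system name is hypothetical, and the
option flags follow the common HMC CLI pattern rather than being quoted
from this fix), the failing request would look like:
  chpwrmgmt -m mySystem -r sys -o enable -t dynamic_favor_power
With the fix, this now returns the HSCL1402/HSCL02F3 message above
instead of the generic HSCL1400 error.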
- A problem was fixed for a secondary clock card (CCIN 6B49)
failure on the system control unit (SCU) being called out as a local
clock card (CCIN 6B2D) failure on the node with SRC B158E504. For
this failure to occur, the primary clock card on the SCU must have been
previously failed and guarded.
- Support was added for additional First Failure Data Capture
(FFDC) data for processor clock failover errors provided by creating
daily clock status reports with SRC B150CCDA informational error
logs. This clock status SRC log is written into the Hardware
Management Console (HMC) iqyylog.log as a platform error log (PEL)
event. The PEL event contains a dump of the clock
registers. If a processor clock failover with SRC B158CC62
occurs on the service processor, the iqyylog.log file on the HMC should
be collected to help debug the clock problem using the B150CCDA data.
- A problem was fixed for the service processor going to the
reset state instead of the termination state when the anchor card is
missing or broken. At the termination state, the Advanced System
Manager Interface (ASMI) can be used to collect failure data and debug
the problem with the anchor card.
- A problem was fixed for error log entries created by
Hostboot not getting written to the error log in some situations.
This can cause hardware detected as failed by Hostboot to not get
reported or have a call-home generated. This problem will occur
whenever Hostboot commits a recovered or informational error as its
last error log in the current IPL. In the next IPL, one or
more error logs from Hostboot will be lost.
- A problem was fixed for a service processor failure during
a system power off that causes a reset of the service processor.
The service processor is in the correct state for a normal system power
on after the error. The frequency for this error should be low as
it is caused by a very rare race condition in the power off process.
- A problem was fixed so that service processor NVRAM bit
flips are now detected and reported as predictive errors after a
certain threshold of failures have occurred. The SRCs reported
are B151F109 (threshold of NVRAM errors was reached) or B151F10A (a
NVRAM address has failed multiple times). Previously, these
normal wear errors in the NVRAM were ignored. The bit flip is
self-corrected and does not cause a problem but a high occurrence of
these could mean that a service processor card FRU or system backplane
FRU, as called out in the SRC, is in need of service.
- A security problem was fixed in OpenSSL for a possible
service processor reset on a null pointer de-reference during SSL
certificate management. The Common Vulnerabilities and Exposures issue
number is CVE-2016-0797.
System firmware changes that affect certain systems
- DEFERRED: On systems using PowerVM firmware, a performance
improvement was made by disabling the Hot/Cold Affinity (HCA) hardware
feature, which gathers memory usage statistics for consumption by
partition operating system memory management algorithms. The
statistics gathering can, in rare cases, cause performance to
degrade. The workloads that may experience issues are
memory-intensive workloads that have little locality of reference and
thus cannot take advantage of hardware memory cache. As a
consequence, the problem occurs very infrequently or not at all except
for very specific workloads in an HPC environment. This performance
fix requires an IPL of the system to activate it after it is applied.
- DEFERRED: On systems using 256GB DDR4 DIMMs, a problem was
fixed in the 3DS packaging that could result in a recoverable memory
error. This fix requires an IPL of the system to take effect.
Any system with DDR4 DIMMs should be re-IPLed at the next opportunity
after applying this service pack to provide the best running
conditions for the DDR4 DIMMs for reliable operation.
- On systems with DDR4 memory DIMMs installed, a fix was made
for the longer IPL times needed to initialize DDR4 memory. The
time needed for the IPL has been reduced to be comparable to systems
using other DIMM types such as DDR3.
- On systems with a PowerVM Active Memory Sharing (AMS)
partition with AIX Level 7.2.0.0 or later with Firmware Assisted
Dump enabled, a problem was fixed for a Restart Dump operation failing
into KDB mode. If "q" is entered to exit from KDB mode, the
partition fails to start. The AIX partition must be powered off
and back on to recover. The problem can be circumvented by
disabling Firmware Assisted Dump (default is enabled in AIX 7.2).
- On a PowerVM system, a problem was fixed for an incorrect
date in partitions created with a Simplified Remote Restart-Capable
(SRR) attribute where the date is created as Epoch 01/01/1970
(MM/DD/YYYY). Without the fix, the user must change the partition
time of day when starting the partition for the first time to make it
correct. This problem only occurs with SRR partitions.
- On a PowerVM system with licensed Power Integrated Facility
for Linux (IFL) processors, a problem was fixed for a system hang that
could occur if the system contains both 1) dedicated processor
partitions configured to share processors while active and 2)
shared processor partitions. This problem is more likely to occur
on a system with a low number of non-IFL processors.
- On systems using PowerVM firmware with dedicated processor
partitions, a problem was fixed for the dedicated processor
partition becoming intermittently unresponsive. The problem can be
circumvented by changing the partition to use shared processors.
This is a follow-on to the fix provided in 840.11 for a different issue
for delays in dedicated processor partitions that were caused by low
I/O utilization.
- A problem was fixed for transmit time-outs on a Virtual
Function (VF) during stressful network traffic, on systems using PCIe
adapters in Single Root I/O Virtualization (SR-IOV) shared-mode.
This fix updates adapter firmware to 10.2.252.1918, for the following
Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J, EL38, EN0M, EN0N,
EN0K, EN0L, and EL3C.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- On PowerVM systems using Elastic Capacity on Demand (CoD)
(also known as On/Off CoD), a problem was fixed for losing entitlement
amounts when upgrading from FW820 or FW830. If you upgrade to a
service pack level that does not have this fix and lose the
entitlement, you can get another On/Off (Elastic) CoD Enablement code
from IBM Support. This problem only pertains to the E850
(8408-E8E), E870 (9119-MME), and E880 (9119-MHE) models.
SC840_087_056 / FW840.11
03/18/16
Impact: Availability
Severity: ATT
New features and functions
- The default setting for the "Enlarged I/O Memory Capacity"
feature was disabled on newly manufactured E850, E870 & E880 models
to reduce hypervisor memory usage. Customers of the new systems
using PCI adapters that leverage "Enlarged I/O Memory Capacity" will
need to explicitly enable this feature for the supported PCI slots,
using ASMI Menus while the system is powered off. Existing
systems will not see a change in their current setting. For
existing systems with only AIX and IBM i partitions that do not benefit
from this feature, it can be disabled by using the Advanced System
Management Interface (ASMI) for the "System Configuration-> I/O
Adapter Enlarged Capacity" panel to uncheck the option for the "I/O
Adapter Enlarged Adapter Capacity" feature.
System firmware changes that affect certain systems
- On systems using PowerVM partitions, a problem was fixed
for error recovery from failed Live Partition Mobility (LPM)
migrations. The recovery error is caused by a partition reset
that leaves the partition in an unclean state with the following
consequences: 1) A retry on the migration for the failed source
partition may not be allowed; and 2) With enough failed migration
recovery errors, it is possible that any new migration attempts for any
partition will be denied. This error condition can be cleared by
a re-IPL of the system. The partition recovery error after a failed
migration is much more likely to occur for partitions managed by
NovaLink but it is still possible to occur for Hardware Management
Console (HMC) managed partitions.
SC840_079_056 / FW840.10
03/04/16
Impact: Availability
Severity: SPE
New features and functions
- Support for a 256GB DDR4 memory DIMM. Memory feature
code #EM8Y provides a total of 1024GB of memory with four 256GB
CDIMMs (1600 MHz, 8Gbit DDR4). Note that DDR4 and DDR3
DIMMs cannot be mixed in a system for FW840. Also, the
minimum firmware level needed for DDR4 usage is FW840.23 due to a fix
needed for a data integrity problem. At firmware level FW860.10,
DDR4 and DDR3 DIMMs can be mixed in a system, but no mixing is allowed
in a node.
- Support was added to block a full Hardware Management
Console (HMC) connection to the service processor when the HMC is at a
lower firmware major and minor release level than the service
processor. In the past, this check was done only for the major
version of the firmware release but it now has been extended to the
minor release version level as well. The HMC at the lower
firmware level can still make a limited connection to the higher
firmware level service processor. This will put the CEC in a
"Version Mismatch" state. Firmware updates are allowed with the
CEC in the "Version Mismatch" state so that the condition can be
corrected with either a HMC update or a firmware update of the CEC.
- Support for PowerVM vNIC with more vNIC client adapters for
each partition, up to 10 from a limit of 6 at the FW840.00 level.
PowerVM vNIC combines many of the best features of SR-IOV and PowerVM
SEA to provide a network solution with options for advanced functions
such as Live Partition Mobility along with better performance and I/O
efficiency when compared to PowerVM SEA. In addition PowerVM vNIC
provides users with bandwidth control (QoS) capability by leveraging
SR-IOV logical ports as the physical interface to the network.
- Support for a 10-core 4.19 GHz Power8 processor with
feature code #EPBS on the IBM Power System E880 (9119-MHE). This
feature provides a 40-core processor planar containing four ten-core
processor SCMs. Each processor core has 512KB of L2 cache and 8MB
of L3 cache.
- The default setting for the "Enlarged I/O Memory Capacity"
feature was disabled on newly manufactured E850, E870 & E880 models
to reduce hypervisor memory usage. Customers using PCI adapters
that leverage "Enlarged I/O Memory Capacity" will need to explicitly
enable this feature for the supported PCI slots, using ASMI Menus while
the system is powered off.
System firmware changes that affect all systems
- On multi-node systems with a power fault, a problem was fixed
for On-Chip Controller errors caused by the power fault being reported
as predictive errors for SRC B1602ACB. These have been corrected
to be informational error logs. If running without the fix, the
predictive and unrecoverable errors logged for the OCC on loss of power
to the node can be ignored.
- A problem was fixed for a system IPL hang at C100C1B0 with
SRC 1100D001 when the power supplies have failed to supply the
necessary 12-volt output for the system. The 1100D001 SRC
was calling out the planar when it should have called out the power
supplies. With the fix, the system will terminate as needed and
call out the power supply for replacement. One mode of power
supply failure that could trigger the hang is sync-FET failures that
disrupt the 12-volt output.
- A problem was fixed for a PCIe3 I/O expansion drawer
(#EMX0) not getting all error logs reported when its error log queue is
full. In the case where the error log queue is full with 16
entries, only one entry is returned to the hypervisor for
reporting. This error log truncation only occurs during periods
of high error activity in the expansion drawer.
- A problem was fixed for the callout of a VPD collection
fault and system termination with SRC 11008402 to include the 1.2vcs
VRM FRU. The power good fault for the 1.2 volts would be a
primary cause of this error. Without the fix, the VRM is missing
in the callout list and only has the VPDPART isolation procedure.
- A problem was fixed for excessive logging of the SRC
11002610 on a power good (pgood) fault when detected by the Digital
Power Subsystem Sweep (DPSS). Multiple pgood interrupts are
signaled by the DPSS in the interval between the first pgood failure
and the node power down. A threshold was added to limit the
number of error logs for the condition.
- A problem was fixed for redundant logging of the SRC
B1504804 for a fan failure, once every five seconds. With the
fix, the failure is logged only at the initial time of failure in the
IPL.
- A problem was fixed to speed recovery for VPD collection
time-out errors for PCIe resources in an I/O drawer logged with SRC
10009133 during concurrent firmware updates. With the fix, the
hypervisor is notified as soon as the VPD collection has finished so
the PCIe resources can report as available. Without the fix,
there is a delay as long as two hours for the recovery to complete.
- A problem was fixed to allow IPMI entity IDs to be used in
ipmitool raw commands on the service processor to get the temperature
reading. Without the fix, the DCMI entity IDs have to be used in
the raw command for the "Get temperature" function.
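As a hedged sketch (the connection parameters are hypothetical, and the
request bytes follow the DCMI "Get Temperature Readings" command layout
as an assumption about the scenario this fix addresses), a raw
temperature query using a standard IPMI entity ID such as 0x37 (inlet
air) would look like:
  ipmitool -I lanplus -H <fsp-address> -U <user> raw 0x2c 0x10 0xdc 0x01 0x37 0x00 0x01
Here 0x2c/0x10 is the DCMI Get Temperature Readings command, 0xdc the
DCMI group extension, 0x01 the temperature sensor type, 0x37 the entity
ID, 0x00 the entity instance (all instances), and 0x01 the starting
instance. Without the fix, only DCMI-specific entity IDs (for example
0x40 for inlet temperature) were accepted in the raw command.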
- A problem was fixed for a false unrecoverable error (UE)
logged for B1822713 when an invalid cooling zone is found during the
adjustment of the system fan speeds. This error can be ignored as
it does not represent a problem with the fans.
- A problem was fixed for a processor clock failover error
with SRC B158CC62 calling out all processors instead of isolating to
the suspect processor. The callout priority correctly has a clock
and a procedure callout as the highest priority, and these should be
performed first to resolve the problem before moving on to the
processors.
- A problem was fixed for loss of back-level protection
during firmware updates if an anchor card has been replaced. The
Power system manufacturing process sets the minimum code level a system
is allowed to have for proper operation. If an anchor card is
replaced, it is possible that the replacement anchor card is one that
has the Minimum MIF Level (MinMifLevel) given as "blank", and
this removes the system back-level protection. With the fix, blanks or
nulls on the anchor card for this field are handled correctly to
preserve the back-level protection. Systems that have already
lost the back-level protection due to anchor card replacement remain
vulnerable to an accidental downgrade of code level by operator error,
so code updates to a lower level for these systems should only be
performed under guidance from IBM Support. The following command
can be run from the Advanced System Management Interface (ASMI) to
determine if the system has lost the back-level protection with the
presence of "blanks" or ASCII 20 values for MinMifLevel:
"registry -l cupd/MinMifLevel" with output:
"cupd/MinMifLevel:
2020202020202020 2020202020202020 [ ]
2020202020202020 2020202020202020 [ ]"
- A problem was fixed for a code update error from FW830 to a
FW840 level that causes temperature sensors to be lost so that the
ipmitool command to list the temperature sensors fails with an IPMI
program core dump. If the temperature sensors are already corrupted
due to a preceding code update, this fix adds back in the temperature
sensors to allow the ipmitool to work for listing the temperature
sensors.
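For illustration (the connection parameters are hypothetical), the
temperature sensor listing affected by this problem can be produced
with standard ipmitool subcommands such as:
  ipmitool -I lanplus -H <fsp-address> -U <user> sdr type Temperature
  ipmitool -I lanplus -H <fsp-address> -U <user> sensor list
where "sdr type Temperature" filters the sensor data repository to
temperature sensors only.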
- A problem was fixed for a system checkstop caused by a L2
cache least-recently used (LRU) error that should have been a
recoverable error for the processor and the cache. The cache
error should not have caused an L2 HW CTL error checkstop.
- A problem was fixed for a re-IPL with power on failing with
SRC B181A40F logged for VPD not found for a DIMM FRU. The DIMM
had been moved to another slot or just removed. In this
situation, an IPL of the system from power off will work without errors,
but a re-IPL with power on, such as that done after processing a
hardware dump, will fail with the B181A40F. Power off the system
and IPL to recover. Until the fix is applied, the problem can be
circumvented after a DIMM memory move by putting the PNOR flash memory
in genesis mode by running the following commands in ASMI with the CEC
powered off:
1) hwsvPnorCmd -c
2) hwsvPnorCmd -g
- A problem was fixed for the service processor becoming
inaccessible when having a dynamic IP address and being in DCMI
"non-random" mode for DHCP discovery by customer configuration.
The problem can occur intermittently during an AC power on of the
system. If the service processor does not respond on the network,
AC power cycle to recover. Without the fix, the problem can be
circumvented by using the DHCP client in the DCMI "random" mode for
DHCP discovery, which is the default on the service processor.
- A problem was fixed for priority callouts for system clock
card errors with SRC B158CC62. These errors had high priority
callouts for the system clock card and medium callouts for FRUs in the
clock path. With the fix, all callouts are set to medium priority
as the clock card is not the most probable FRU to have failed but is
just a candidate among the many FRUs along the clock path.
- A problem was fixed for a memory initialization error
reported with SRC BC8A0506 that terminates the IPL. This problem
is unlikely to occur because it depends on a specific memory location
being used by the code load. The system can be recovered from the error
by doing another IPL.
System firmware changes that affect certain systems
- On PowerVM systems, a problem was fixed to address a performance
degradation. The problem surfaces under the following conditions:
1) There is at least one VIOS or Linux partition that
is running with dedicated processors AND
2) There is at least one VIOS or Linux partition
running with shared processors AND
3) There is at least one AIX or IBM i partition
configured with shared processors.
If ALL the above conditions are met AND one of the following actions
occurs, there is an exposure for a loss in performance:
1) A VIOS/Linux dedicated processor partition is
configured to share processors while active OR
2) A dynamic platform optimization operation (HMC
'optmem' command) is performed OR
3) Processors are unlicensed via a capacity on demand
operation.
- On systems using
PowerVM firmware, a problem was fixed for PCIe switch recovery to
prevent a partition switch failure during the IPL with error logs for
SRC B7006A22 and B7006971 reported. This problem can occur
when doing recovery for an informational error on the switch. If
this problem occurs, the partition must be restarted to recover the
affected I/O adapters.
- On systems using PowerVM firmware, a problem was fixed for
a concurrent FRU exchange of a CAPI (Coherent Accelerator
Processor Interface) adapter for a standard I/O adapter that results in
a vary off failure. If this failure occurs, the system needs to
be re-IPLed to fix the adapter. The trigger for this failure is a
dual exchange where the CAPI adapter is exchanged first for a standard
(non-like-typed) adapter. Then an attempt is made to exchange the
standard adapter for a CAPI adapter which fails.
- On systems using PowerVM firmware, a problem was fixed for
a CAPI (Coherent Accelerator Processor Interface) device going to
a "Defined" state instead of "Available" after a partition boot.
If the CAPI device is doing recovery and logging error data at the time
of the partition boot, the error may occur. To recover from the
error, reboot the partition. With the fix, the hypervisor will
wait for the logging of error data from the CAPI device to finish
before proceeding with the partition boot.
- On systems using PowerVM firmware, a problem was fixed for
a hypervisor adjunct partition failing with "SRC B2009008 LP=32770" for
an unexpected SR-IOV adapter configuration. Without the fix, the
system must be re-IPLed to correct the adjunct error. This error
is infrequent and can only occur if an adapter port configuration is
being changed at the same time that error recovery is occurring for the
adapter.
- On systems using PowerVM firmware and PCIe adapters in
SR-IOV mode, the following problem was addressed with a Broadcom
Limited (formerly known as Avago Technologies and Emulex) adapter
firmware update to 10.2.252.1913: Transmit time-outs on a Virtual
Function (VF) during stressful network traffic.
- On systems using PowerVM firmware with an invalid P-side or
T-side in the firmware, a problem was fixed in the partition firmware
Run-Time Abstraction Services (RTAS) so that system Vital Product Data
(VPD) is returned at least from the valid side instead of returning no
VPD data. This allows AIX host commands such as lsmcode,
lsvpd, and lsattr that rely on the VPD data to work to some extent even
if there is one bad code side. Without the fix, all the VPD
data is blocked from the OS until the invalid code side is recovered by
either rejecting the firmware update or attempting to update the system
firmware again.
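As an illustration (a usage sketch, not taken from the fix itself),
these are typical AIX commands that read VPD through RTAS and would
previously have returned no data when one code side was invalid:
lsmcode -c        # display the current platform firmware level
lsvpd             # list the system Vital Product Data
lsattr -El sys0   # list attributes of the system object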
- On systems using PowerVM firmware without an HMC (and in
Manufacturing Default Configuration (MDC) mode with a single host
partition), a problem was fixed for missing dumps of type SYSDUMP,
FSPDUMP, LOGDUMP, and RSCDUMP that were not off-loaded to the host
OS. This is an infrequent timing error that causes the dump
notification signal to the host OS to be lost.
The missing/pending dumps can be retrieved by rebooting the host OS
partition. The rebooted host OS will receive new notifications of
the dumps that have to be off-loaded.
- On systems using PowerVM firmware, a problem was fixed for
truncation on the memory fields displayed in the Advanced System
Management Interface on the COD panels. ASMI shows three fields
of memory called "Installed memory", "Permanent memory", and "Inactive
memory". The largest value that can be displayed in the fields
was "9999" GB. This has been expanded to a maximum of "999999" GB
for each of the ASMI fields. The truncation was only in the
displayed memory value, not in the actual memory size being used by the
system which was correct.
- On systems using PowerVM firmware and a partition using
Active Memory Sharing (AMS), a problem was fixed for a Live Partition
Mobility (LPM) migration of the AMS partition that can hang the
hypervisor on the target CEC. When an AMS partition migrates to
the target CEC, a hang condition can occur after processors are resumed
on the target CEC, but before the migration operation completes.
The hang will prevent the migration from completing, and will likely
require a CEC reboot to recover the hung processors. For this
problem to occur, there needs to be memory page-based activity (e.g.
AMS dedup or Pool paging) that occurs exactly at the same time that the
Dirty Page Manager's PSR data for that page is being sent to the target
CEC.
- On systems using PowerVM firmware and having an IBM i
partition with more than 64 cores, a performance problem was fixed with
the choice of processor cores assigned to the partition.
This problem only applies to the E870 (9119-MME) and E880 (9119-MHE)
models.
- On systems using PowerVM firmware, a problem was fixed for
PCIe adapter hangs and network traffic error recovery during Live
Partition Mobility (LPM) and SR-IOV vNIC (virtual ethernet
adapter) operations. An error in the PCI Host Bridge (PHB)
hardware can persist in the L3 cache and fail all subsequent network
traffic through the PHB. The PHB error recovery was
enhanced to flush the PHB L3 cache to allow network traffic to resume.
- On systems using PowerVM firmware with AIX or Linux
partitions with greater than 8TB of memory, a problem was fixed for
Dynamic DMA Window (DDW) enabled adapters IPLing into a "Defined"
state, instead of "Available", and being unusable with a "0" size DMA
window. If a DDW enabled adapter is plugged into an HDDW (Huge
Dynamic DMA Window) slot in a partition with the large memory size, the
OS changes the default DMA window to "0" in size. To prevent this
problem, the Advanced System Management Interface (ASMI) in the service
processor can be used to set "I/O Enlarged Capacity" to "0" (which is
off), and all the DDW enabled adapters will work on the next IPL.
- On a multi-node system, a problem was fixed for a
power fault with SRC 11002610 having incorrect FRU callouts. The
wrong second FRU callout is made on nodes 2, 3, and 4 of a multi-node
system. Instead of calling out the processor FRU, the enclosure
FRU is called out. The first FRU callout is correct.
- On PowerVM systems with partitions running Linux, a problem
was fixed for intermittent hangs following a Live Partition Mobility
(LPM) migration of a Linux partition. A partition migrating from
a source system running FW840.00 to a system running any other
supported firmware level may become unresponsive and unusable once it
arrives on the target system. The problem only affects Linux
partitions and is intermittent. Only partitions that have
previously been migrated to a FW840.00 system are susceptible to a hang
on subsequent migration to another system. If a partition is hung
following a LPM migration, it must be rebooted on the target system to
resume operations.
- On systems using OPAL firmware, a problem was fixed that
prevented multiple NVIDIA Tesla K80 GPUs from being attached to one
PCIe adapter. This prevented using a PCIe attached GPU
drawer. This fix increases the PCIe MMIO (memory-mapped I/O)
space to 1 TB from a previous maximum of 64 GB per PHB/PCIe slot.
- On PowerVM systems with dedicated processor partitions with
low I/O utilization, a problem was fixed for the dedicated processor
partition becoming intermittently unresponsive. The problem can be
circumvented by changing the partition to use shared processors.
- On systems using OPAL firmware, a problem was fixed in OPAL
to identify the PCI Host Bridge (PHB) on CAPI adapter errors and not
always assume PHB0.
- On systems using OPAL firmware, a problem was fixed in the
OPAL gard utility to remove gard records after guarded components have
been replaced. Without the fix, Hostboot and the gard utility
could be in disagreement on the replaced components, causing some
components to still display as guarded after a repair.
- On systems using PowerVM firmware with partitions that have
a very large number of PCIe adapters, a problem was fixed for partitions
that would hang because the partition firmware ran out of memory for the
OpenFirmware FCode device drivers for PCIe adapters. With the
fix, the hypervisor is able to dynamically increase the memory to
accommodate the larger partition configurations of I/O slots and
adapters.
- On PowerVM systems with vNIC adapters, a problem was fixed
for doing a network boot or install from the adapter using a VLAN
tag. Without the fix, support for a network boot using a VLAN
tag is missing from the SMS RIPL menu.
- On systems using PowerVM firmware, a problem was fixed for
a Live Partition Mobility (LPM) migration of a partition with large
memory that had a migration abort when the partition took longer than
five minutes to suspend. This is a rare problem and is triggered
by an abnormally slow response time from the migrating partition.
With the fix, the five minute time limit on the suspend operation has
been removed.
- On systems using PowerVM firmware at FW840.00 with an AIX
VIO client partition at level 7.1 TL04 SP03 or 7.2 TL01 SP00 or later,
a problem was fixed for virtual ethernet adapters with an
IPv6 largesend packet (i.e., data packets of size greater than
the maximum transmission unit (MTU)) that hung and/or ran slow because
largesend packets were discarded by the hypervisor. For
example, telnet and ping commands for the system will be working but as
soon as a send of a large packet of data is attempted, the network
connection hangs. This firmware fix requires AIX levels 7.1 TL04
SP03 or 7.2 TL01 SP00 or later for the largesend feature to work.
The problem can be circumvented by disabling "mtu_bypass" (largesend)
on the AIX VIO client. The "mtu_bypass" is disabled by default
but many network administrators enable it for a performance gain.
To disable "mtu_bypass" on the AIX VIO client, use the following
steps:
(0) This change may impact existing connections, so shut down the
affected network interfaces (where X is the interface number) prior to
the change.
(1) Log in to the AIX VIO client from the console as root.
(2) ifconfig enX down; ifconfig enX detach
(3) chdev -l enX -a mtu_bypass=off
(4) chdev -l enX -a state=up
(5) mkdev -l inet0
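The change can be verified with the standard AIX attribute query (a
usage sketch, assuming enX is the interface changed above); the output
should show mtu_bypass set to "off":
lsattr -El enX -a mtu_bypass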
|
SC840_056_056 / FW840.00
12/04/15 |
Impact: New
Severity: New
New Features and Functions
NOTE:
- POWER8 (and later) servers include an “update access key”
that is checked when system firmware updates are applied to the
system. The initial update access keys include an expiration date
which is tied to the product warranty. System firmware updates will not
be processed if the GA date of the desired firmware level occurred
after the update access key’s expiration date. As these update
access keys expire, they need to be replaced using either the Hardware
Management Console (HMC) or the Advanced Management Interface (ASMI) on
the service processor. Update access keys can be obtained via the
key management website: http://www.ibm.com/servers/eserver/ess/index.wss.
- Support for allowing the PowerVM hypervisor to continue to
run when communication between the service processor and platform
firmware has been lost and cannot be re-established. A SRC
B1817212 may be logged and any active partitions will continue to run
but they will not be able to be managed by the management
console. The partitions can be allowed to run until the next
scheduled service window at which time the service processor can be
recovered with an AC power cycle or a pin-hole reset from the operator
panel. This error condition would only be seen on a system that
had been running with a single service processor (no redundancy for the
service processor).
- Support
in the Advanced System Management Interface (ASMI) for managing
certificates on the service processor with option "System
Configuration/Security/Certificate Management". Certificate
management includes 1) Generation of Certificate Signing Request (CSR)
2) Download of CSR and 3) Upload of signed certificates. For more
information on managing certificates, go to the IBM KnowledgeCenter
link for "Certificate Management"
(https://www-01.ibm.com/support/knowledgecenter/P8ESS/p8hby/p8hby_securitycertificate.htm)
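For reference, a downloaded CSR and the returned signed certificate can
be inspected with standard OpenSSL commands before the upload step (a
sketch; the file names are placeholders, not names produced by ASMI):
openssl req -in csr.pem -noout -text                       # display the CSR contents
openssl x509 -in signed_cert.pem -noout -subject -dates    # check subject and validity dates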
- Support for concurrent add of the PCIe expansion drawer
(F/C #EMX0) and concurrent add of PCIe optical cable adapters (F/C #EJ07
and CCIN 6B52). For concurrent add guidance, go to the IBM
KnowledgeCenter links for "Connecting a PCIe Gen3 I/O expansion drawer
to your system"(https://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8egp/p8egp_connect_kickoff.htm?lang=en-us)
and for "PCIe adapters for the 9119-MHE and 9119-MME" (https://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8hak/p8hak_87x_88x_kickoff.htm?lang=en-us).
- Support for concurrent repair/exchange of the PCIe3 6-slot
Fanout module for the PCIe3 Expansion Drawer, PCIe Optical Cable
adapters and PCIe3 Optical Cable. For concurrent repair/exchange
guidance for these parts, go to the IBM KnowledgeCenter link for
"Removing and replacing parts in the PCIe Gen3 I/O expansion drawer"(https://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8egr/p8egr_emx0_kickoff.htm?lang=en-us).
Below are the feature codes for the affected parts:
#EMX0 - PCIe3 Expansion Drawer
#EMXF - PCIe3 6-Slot Fanout Module for PCIe3 Expansion Drawer (all
server models)
#EJ07 (CCIN 6B52) - PCIe3 Optical Cable Adapter for PCIe3 Expansion
Drawer
#ECC6 - 2M Optical Cable Pair for PCIe3 Expansion Drawer
#ECC8 - 10M Optical Cable Pair for PCIe3 Expansion Drawer
#ECC9 - 20M Optical Cable Pair for PCIe3 Expansion Drawer
- PowerVM support for Coherent Accelerator
Processor Interface (CAPI) adapters. The PCIe3 LP CAPI
Accelerator Adapter with F/C #EJ16 is used on the S812L (8247-21L) and
S822L (8247-22L) models. The PCIe3 CAPI FlashSystem
Accelerator Adapter with F/C #EJ17 is used on the S814 (8286-41A)
and S824 (8286-42A) models. The PCIe3 CAPI FlashSystem Accelerator
Adapter with F/C #EJ18 is used on the S822 (8284-22A), E870 (9119-MME),
and E880 (9119-MHE) models. This feature does not apply to the
S824L (8247-42L) model.
- Management console enhancements for support of concurrent
maintenance of CAPI-enabled adapters.
- Support for
PCIe3 Expansion Drawer (#EMX0) lower cable failover, using lane
reversal mode to bring up the expansion drawer from the top
cable. This eliminates a single point of failure by supporting
lane reversal in case of problems with the lower cable.
- Expanded support of Virtual Ethernet Large send from IPv4
to the IPv6 protocol in PowerVM.
- Support for IBM i network install on an IEEE 802.1Q
VLAN. The OS supported levels are IBM i 7.2 TR3 or later.
This feature applies only to S814 (8286-41A), S824(8286-42A), E870
(9119-MME), and E880 (9119-MHE) models.
- Support for PowerVM vNIC with up to six vNIC client
adapters for each partition. PowerVM vNIC combines many of the
best features of SR-IOV and PowerVM SEA to provide a network solution
with options for advanced functions such as Live Partition Mobility
along with better performance and I/O efficiency when compared to
PowerVM SEA. In addition, PowerVM vNIC provides users with
bandwidth control (QoS) capability by leveraging SR-IOV logical ports
as the physical interface to the network.
Note: If more than six vNIC client adapters are used in a
partition, the partition will run, as there is no check to prevent the
extra adapters, but certain operations such as Live Partition Mobility
may fail.
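The vNIC client adapters defined for each partition can be listed from
the HMC command line to confirm the six-adapter limit is respected (a
sketch, assuming an HMC level with vNIC CLI support; the managed system
and partition names are placeholders):
lshwres -r virtualio --rsubtype vnic -m <managed-system> --filter "lpar_names=<partition>"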
- Enhanced handling of errors to allow partial data in a
Shared Storage Pool (SSP) cluster. Under partial data error
conditions, the management console "Manage PowerVM" GUI will correctly
show the working VIOS clusters along with information about the broken
VIOS clusters, instead of showing no data.
- Live Partition Mobility (LPM) was enhanced to allow the
user to specify VIOS concurrency level overrides.
- Support was added for PowerVM hard compliance enforcement
of the Power Integrated Facility for Linux (IFL). IFL is an
optional lower cost per processor core activation for Linux-only
workloads on IBM Power Systems. Power IFL processor cores can be
activated that are restricted to running Linux workloads. In
contrast,
processor cores that are activated for general-purpose workloads can
run any supported operating system. PowerVM will block
partition activation, LPM, and DLPAR requests on a system with IFL
processors configured if the total entitlement of the AIX and IBM i
partitions would exceed the number of licensed general-purpose
processors. For example, on a system with eight general-purpose
processor activations, AIX and IBM i partitions with a combined
entitlement of more than eight cores cannot be activated. For AIX
and IBM i partitions configured with uncapped processors, the PowerVM
hypervisor will limit the entitlement and uncapped resources consumed
to the number of general-purpose processors that are currently
licensed.
- Support was added to allow Power Enterprise Pools to
convert permanently-licensed (static) processors to Pool Processors
using a CPOD COD activation code provided by the management
console. Previously, only unlicensed processors were able to
become Pool Processors.
- The management console was enhanced to allow a Live
Partition Mobility (LPM) if there is a failed VIOS in a redundant
pair. During LPM, if the VIOS is inactive, the management console
will use stored configuration information to perform the LPM.
- The firmware update process from the management console and
from in-band OS (except for IBM i PTFs) has been enhanced to download
new "Update access keys" as needed to prevent the access key from
expiring. This provides an automatic renewal process for the
entitled customer.
- Live Partition Mobility support was added to allow the user
to specify a different virtual Ethernet switch on the target server.
- PowerVM was enhanced to support an AIX Live Update where
the AIX kernel is updated without a reboot. The AIX
OS level must be 7.2 or later. Starting with AIX Version 7.2, the
AIX operating system provides the AIX Live Update function which
eliminates downtime associated with patching the AIX operating system.
Previous releases of AIX
required systems to be rebooted after an interim fix was applied to a
running system. This new feature allows workloads to remain active
during a Live Update operation and the operating system
can use the interim fix immediately without needing to restart the
entire system. In the first release of this feature, AIX Live Update
will allow customers to install interim fixes (ifixes) only. For more
information on AIX Live Update, go to the IBM KnowledgeCenter
link for "Live Update"
(https://www-01.ibm.com/support/knowledgecenter//ssw_aix_72/com.ibm.aix.install/live_update_install.htm).
- The management console has been enhanced to use standard
FTP in its firmware update process instead of a custom
implementation. This will provide a more consistent interface for
the users.
- Support for setting Power Management Tuning Parameters from
the management console (Fixed Maximum Frequency (FMF), Idle Power Save,
and DPS Tunables) without needing to use the Advanced System Management
Interface (ASMI) on the service processor. This allows FMF mode
to be set by default without having to modify any tunable parameters
using ASMI.
- Support for a Corsa PCIe adapter with accelerator FPGA for
low latency connection using CAPI (Coherent Accelerator Processor
Interface) attached to a FlashSystem 900 using two 8Gb optical SR Fibre
Channel (FC) connections.
Supported IBM Power Systems for this feature are the following:
1) E880 (9119-MHE) with CAPI Activation feature #EC19 and Corsa
adapter #EJ18 Low profile on AIX.
2) E870 (9119-MME) with CAPI Activation feature #EC18 and Corsa adapter
#EJ18 Low profile on AIX.
3) S822 (8284-22A) with CAPI Activation feature #EC2A and Corsa
adapter #EJ18 Low profile on AIX.
4) S814 (8286-41A) with CAPI Activation feature #EC2A and Corsa adapter
#EJ17 Full height on AIX.
5) S824 (8286-42A) with CAPI Activation feature #EC2A and Corsa adapter
#EJ17 Full height on AIX.
6) S812L (8247-21L) with CAPI Activation feature #EC2A and Corsa
adapter #EJ16 Low profile on Linux.
7) S822L (8247-22L) with CAPI Activation feature #EC2A and Corsa
adapter #EJ16 Low profile on Linux.
OS levels that support this feature are PowerVM AIX 7.2 or later and
OPAL bare-metal Linux Ubuntu 15.10.
The IBM FlashSystem 900 storage system is model 9840-AE2 (one year
warranty) or 9843-AE2 (three year warranty) at the 1.4.0.0 or later
firmware level with feature codes #AF23, #AF24, and #AF25 supported
for 1.2 TB, 2.9 TB, 5.7 TB modules, respectively.
- The Digital Power Subsystem Sweep (DPSS) FPGA, used to
control P8 fan speeds and memory voltages, was enhanced to support the
840 GA level. This DPSS update is delayed to the next IPL of the CEC
and adds 18 to 20 minutes to the IPL. See the "Concurrent
Firmware Updates" section above for details.
- Support for Data Center Manageability Interface (DCMI) V1.5
and Energy Star compliance. DCMI features were added to the
Intelligent Platform Management Interface (IPMI) 2.0 implementation on
the service processor. DCMI adds platform management capability
for monitoring elements such as system temperatures, power supplies,
and bus errors. It also includes automatic and manually driven
recovery capabilities such as local or remote system resets, power
on/off operations, and logging of abnormal or "out-of-range"
conditions for later examination. It also allows querying for
inventory information that can help identify a failed hardware unit
along with power management options for getting and setting power
limits.
Note: A deviation from the DCMI V1.5 specification exists for
840.00 for the DCMI Configuration Parameters for DHCP Discovery.
Random back-off mode is enabled by default instead of being
disabled. The random back-off puts a random variation delay in
the DHCP retry interval so that the DHCP clients are not responding at
the same time. Disabling the back-off is not required for normal
operations but, if desired, the system administrator can override the
default and disable the random back-off mode by sending the "Set DCMI
Configuration Parameters" command for the random back-off property of
the Discovery Configuration parameter. A value of "0" for the bit
means "Disabled".
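For administrators who do want to disable the random back-off, the
override can be sent with a generic IPMI utility such as ipmitool.
The following is a sketch only: it assumes out-of-band IPMI LAN access
to the service processor is configured, and the byte values (NetFn 0x2c,
command 0x12 for "Set DCMI Configuration Parameters", group extension
0xdc, followed by the assumed Discovery Configuration parameter
selector, set selector, and parameter data with the random back-off bit
cleared) must be verified against the DCMI V1.5 specification before
use:
ipmitool -I lanplus -H <fsp-ip> -U <userid> -P <password> raw 0x2c 0x12 0xdc 0x02 0x00 0x00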
|