SC860
For Impact, Severity and other Firmware definitions, please
refer to the 'Glossary of firmware terms' URL below:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The following Fix description table will
only contain the N (current) and N-1 (previous) levels.
The complete Firmware Fix History for this
Release Level can be
reviewed at the following url:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
|
SC860_240_165 / FW860.B0
01/21/22 |
Impact: Availability
Severity: SPE
System firmware changes that
affect all systems
- A problem was fixed
for an incorrect SRC logged for a #EMX0 PCIe expansion drawer power
fault found on the low CXP cable. An SRC B7006A85 (AOCABLE,
PCICARD) is logged instead of the correct SRC of B7006A86 (PCICARD,
AOCABLE). This happens every time there is a power fault on the
low CXP cable.
- A problem was fixed for a Live Partition Mobility (LPM)
hang during LPM validation on the target system. This is a rare
system problem triggered during an LPM migration that causes LPM
attempts to fail as well as other functionality such as configuration
changes and partition shutdowns.
To recover from this problem and restore LPM and other operations
such as configuration changes and shutting down partitions, the system
must be re-IPLed.
- A problem was fixed for the HMC Repair and Verify (R&V)
procedure failing with "Unable to isolate the resource" during
concurrent maintenance of the #EMX0 Cable Card. This could lead
one to take disruptive action in order to do the repair. This should
occur infrequently and only with cases where a physical hardware
failure has occurred which prevents access to the PCIe reset line
(PERST) but allows access to the slot power controls. As a
workaround, pulling both cables from the Cable Card to the #EMX0
expansion drawer will result in a completely failed state that can be
handled by bringing up the "PCIe Hardware Topology" screen from either
ASMI or the HMC. Then retry the R&V operation to recover the Cable
Card.
- A problem was fixed for a partition with an SR-IOV logical
port (VF) having a delay in the start of the partition. If the
partition boot device is an SR-IOV logical port network device, this
issue may result in the partition failing to boot with SRCs BA180010
and BA155102 logged and then stuck on progress code SRC 2E49 for an AIX
partition. This problem is infrequent because it requires
multiple error conditions at the same time on the SR-IOV adapter.
To trigger this problem, multiple SR-IOV logical ports for the same
adapter must encounter EEH conditions at roughly the same time such
that a new logical port EEH condition is occurring while a previous EEH
condition's handling is almost complete but not notified to the
hypervisor yet. To recover from this problem, reboot the
partition.
- A problem was fixed for a system hypervisor hang and an
Incomplete state on the HMC after a logical partition (LPAR) is deleted
that has an active virtual session from another LPAR. This
problem happens every time an LPAR is deleted with an active virtual
session. This is a rare problem because virtual sessions from an
HMC (a more typical case) prevent an LPAR deletion until the virtual
session is closed, but virtual sessions originating from another LPAR
do not have the same check.
- The following problems were fixed for certain SR-IOV
adapters:
1) An error was fixed that occurs during a VNIC failover where the VNIC
backing device has a physical port down due to an adapter internal
error with an SRC B400FF02 logged. This is an improved version of
the fix delivered in earlier service pack FW860.A0 for adapter firmware
11.4.415.37 and it significantly reduces the frequency of the error
being fixed.
2) An adapter in SR-IOV shared mode may cause a network interruption
and SRCs B400FF02 and B400FF04 logged. The problem occurs
infrequently during normal network traffic.
These fixes update the adapter firmware to 11.4.415.41 for the
following Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3,
#EN17/#EN18 with CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N
with CCIN 2CC0, and #EN0K/#EN0L with CCIN 2CC1.
Update instructions: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
- For a system with an AIX or Linux partition, a
problem was fixed for Platform Error Logs (PELs) that are
truncated to only eight bytes for error logs created by the firmware
and reported to the AIX or Linux OS. These PELs may appear to be
blank or missing on the OS. This rare problem is triggered by
multiple error log events in the firmware occurring close together in
time and each needing to be reported to the OS, causing a truncation in
the reporting of the PEL. As a workaround, the full error
logs for the truncated logs can be viewed on the HMC or by using ASMI on
the service processor.
- A problem was fixed for Platform Error Logs (PELs) not
being logged and shown by the OS if they have an Error Severity code of
"critical error". The trigger is the reporting by a system
firmware subsystem of an error log that has set an Event/Error Severity
in the 'UH' section of the log to a value in the range, 0x50 to 0x5F.
The following error logs are affected:
B200308C ==> PHYP ==> A problem occurred during the IPL of
a partition. The adapter type cannot be determined. Ensure that a
valid I/O Load Source is tagged.
B700F104 ==> PHYP ==> Operating System error. Platform
Licensed Internal Code terminated a partition.
B7006990 ==> PHYP ==> Service processor failure
B2005149 ==> PHYP ==> A problem occurred during the IPL of
a partition.
B700F10B ==> PHYP ==> A resource has been disabled due to
hardware problems
A7001150 ==> PHYP ==> System log entry only, no service action
required. No action needed unless a serviceable event was logged.
B7005442 ==> PHYP ==> A parity error was detected in the hardware
Segment Lookaside Buffer (SLB).
B200541A ==> PHYP ==> A problem occurred during a partition
Firmware Assisted Dump
B7001160 ==> PHYP ==> Service processor failure.
B7005121 ==> PHYP ==> Platform LIC failure
BC8A0604 ==> Hostboot ==> A problem occurred during the IPL
of the system.
BC8A1E07 ==> Hostboot ==> Secure Boot firmware
validation failed.
Note that these error logs are still reported to the service processor
and HMC properly. This issue does not affect the Call Home action
for the error logs.
- A problem was fixed for the Device Description in a System
Plan related to Crypto Coprocessors and NVMe cards that were only
showing the PCI vendor and device ID of the cards. This is not
enough information to verify which card is installed without looking up
the PCI IDs first. With the fix, more specific/useful information
is displayed and this additional information does not have any adverse
impact on sysplan operations. The problem is seen every time a
System Plan is created for an installed Crypto Coprocessor or NVMe
card.
- A problem was fixed for correct ASMI passwords being
rejected when accessing ASMI using an ASCII terminal with a serial
connection to the server. This problem always occurs for systems
at firmware level FW860.A0 and later.
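The "critical error" trigger described in the PEL fix above (an Event/Error Severity value in the 'UH' section in the range 0x50 to 0x5F) amounts to a simple range check. The following is a minimal sketch for illustration only, not firmware code: the function name is hypothetical, and only the 0x50 to 0x5F range is taken from the fix description.

```python
def is_critical_severity(severity: int) -> bool:
    """Return True when a severity byte lies in the 0x50-0x5F range.

    Hypothetical helper: only the range itself comes from the fix
    description; how the byte is read from a PEL is not shown here.
    """
    return 0x50 <= severity <= 0x5F


# Classify a few sample severity values on either side of the range.
for value in (0x40, 0x50, 0x5F, 0x60):
    print(f"0x{value:02X}: {is_critical_severity(value)}")
```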
System firmware changes that affect
certain systems
- On systems with an IBM i partition, a problem was fixed for
a Live Partition Mobility (LPM) hang while performing the migration of
an IBM i partition. In some situations, there is a timing
issue when the hypervisor is managing IBM i software licenses.
When a subsequent LPM operation is performed, the LPM operation
hangs. To recover from this problem and restore LPM, the
system must be re-IPLed.
- For a system with an IBM i partition, a problem was
fixed for an IBM i partition running in P7 or P8 processor
compatibility mode failing to boot with SRCs BA330002 and B200A101
logged. This problem can be triggered as larger configurations
for processors and memory are added to the partition. As a
circumvention, reduce the number of processors and the amount of
memory in the partition, or boot the partition in P9 or later
compatibility mode.
|
SC860_236_165 / FW860.A2
12/07/21 |
Impact: Security
Severity: HIPER
System firmware changes that
affect all systems
- HIPER/Non-Pervasive:
A security problem was fixed to prevent an attacker that gains service
access to the FSP service processor from reading and writing PowerVM
system memory using a series of carefully crafted service
procedures. This problem is Common Vulnerability and Exposure
number CVE-2021-38917.
- HIPER/Non-Pervasive:
A problem was fixed for the IBM PowerVM Hypervisor where a
specific sequence of VM management operations could lead to a violation
of the isolation between peer VMs. The Common Vulnerability and
Exposure number is CVE-2021-38918.
|
SC860_234_165 / FW860.A1
09/16/21 |
Impact: Data
Severity: HIPER
System firmware changes that
affect all systems
- HIPER: A
problem was fixed which may occur on a target system following a Live
Partition Mobility (LPM) migration of an AIX partition utilizing Active
Memory Expansion (AME) with 64 KB page size enabled using the vmo
tunable: "vmo -ro ame_mpsize_support=1". The problem may result
in AIX termination, file system corruption, application segmentation
faults, or undetected data corruption.
Note: If you are doing an LPM migration of an AIX partition
utilizing AME and 64 KB page size enabled involving a POWER8 or POWER9
system, ensure you have a Service Pack including this change for the
appropriate firmware level on both the source and target systems.
|
SC860_231_165 / FW860.A0
07/08/21 |
Impact: Availability
Severity: SPE
New features and functions
- Support added to Redfish to provide a command to set the
ASMI user passwords using a new AccountService schema.
Using this service, the ASMI admin, HMC, and general user passwords can
be changed.
System firmware changes that
affect all systems
- A problem was fixed
for Time of Day (TOD) being lost for the real-time clock (RTC) with an
SRC B15A3303 logged when the service processor boots or resets.
This is a very rare problem that involves a timing problem in the
service processor kernel. If the server is running when the error
occurs, there will be an SRC B15A3303 logged, and the time of day on
the service processor will be incorrect for up to six hours until the
hypervisor synchronizes its (valid) time with the service
processor. If the server is not running when the error occurs,
there will be an SRC B15A3303 logged, and if the server is subsequently
IPLed without setting the date and time in ASMI to fix it, the IPL will
abort with an SRC B7881201 which indicates to the system operator that
the date and time are invalid.
- A problem was fixed in ASMI to allow setting static routes
with two default gateway IP addresses. Without the fix, ASMI
always fails with "Invalid entry. Gateway address" for this
configuration. As a workaround, the static routes could be created
using the ASMI command line and the "route add" command.
- A problem was fixed for intermittent failures for a reset
of a Virtual Function (VF) for SR-IOV adapters during Enhanced Error
Handling (EEH) error recovery. This is triggered by EEH events at a VF
level only, not at the adapter level. The error recovery fails if a
data packet is received by the VF while the EEH recovery is in
progress. A VF that has failed can be recovered by a partition
reboot or a DLPAR remove and add of the VF.
- A problem was fixed for time-out issues in Power Enterprise
Pools 1.0 (PEP 1.0) that can affect performance by having non-optimal
assignments of processors and memory to the server logical partitions
in the pool. For this problem to happen, the server must be in a PEP
1.0 pool and the HMC must take longer than 2 minutes to provide the
PowerVM hypervisor with the information about pool resources owned by
this server. The problem can be avoided by running the HMC optmem
command before activating the partitions.
- A problem was fixed where the Floating Point Unit
Computational Test, which should be set to "staggered" by default, has
been changed in some circumstances to be disabled. If you wish to
re-enable this option, this fix is required. After applying this
service pack, do the following steps:
1) Sign into the Advanced System Management Interface (ASMI).
2) Select Floating Point Computational Unit under the System
Configuration heading and change it from disabled to what is needed:
staggered (run once per core each day) or periodic (a specified time).
3) Click "Save Settings".
- The following problems were fixed for certain SR-IOV
adapters:
1) An error was fixed that occurs during a VNIC failover where the VNIC
backing device has a physical port down or read port errors with an SRC
B400FF02 logged.
2) A problem was fixed for adding a new logical port that has a PVID
assigned that is causing traffic on that VLAN to be dropped by other
interfaces on the same physical port which uses OS VLAN tagging for
that same VLAN ID. Each time a logical port
with a non-zero PVID that is the same as an existing VLAN is
dynamically added to a partition or is activated as part of a partition
activation, the traffic flow stops for other partitions with OS
configured VLAN devices with the same VLAN ID. This problem can
be recovered by configuring an IP address on the logical port with the
non-zero PVID and initiating traffic flow on this logical port.
This problem can be avoided by not configuring logical ports with a
PVID if other logical ports on the same physical port are configured
with OS VLAN devices.
This fix updates the adapter firmware to 11.4.415.37 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0, and
#EN0K/#EN0L with CCIN 2CC1.
Update instructions: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
- A problem was fixed for some serviceable events specific to
the reporting of EEH errors not being displayed on the HMC. The sending
of an associated call home event, however, was not affected. This
problem is intermittent and infrequent.
- A problem was fixed for newer hardware record names
(hardware delivered after the original POWER8 GA) not being displayed
correctly in the ASMI deconfiguration records. For example, Capp
is displayed as "Unknown".
- A problem was fixed for a system termination with SRC
B700F107 following a time facility processor failure with SRC
B700F10B. With the fix, the transparent replacement of the failed
processor will occur for the B700F10B if there is a free core, with no
impact to the system.
- A problem was fixed for possible partition errors following
a concurrent firmware update from FW810 or later. A precondition for
this problem is that DLPAR operations of either physical or virtual I/O
devices must have occurred prior to the firmware update. The error can
take the form of a partition crash at some point following the update.
The frequency of this problem is low. If the problem occurs, the OS
will likely report a DSI (Data Storage Interrupt) error. For
example, AIX produces a DSI_PROC log entry. If the partition does not
crash, it is also possible that some subsequent I/O DLPAR operations
will fail.
- A problem was fixed for spurious out-of-range (greater than
127 C) temperatures being reported for the processor with SRC B1112A10.
With the fix, only valid temperature sensor readings are used when
reporting processors that have exceeded the Over Temperature (OT) value.
- A problem was fixed in ASMI for setting a static route with
a network address for the IP such as "xxx.xxx.xxx.0". Without the
fix, ASMI always fails with "Invalid entry. IP address" for this
network address format. As a workaround, the static route could
be created with the individual IP endpoint entered instead of the
network address, or created using the ASMI command line and the "route
add" command.
System firmware changes that affect
certain systems
- On systems with an IBM i partition, a problem was fixed for
physical I/O property data not being able to be collected for an
inactive partition booted in "IOR" mode with SRC B200A101 logged. This
can happen when making a system plan (sysplan) for an IBM i partition
using the HMC and the IBM i partition is inactive. The sysplan
data collection for the active IBM i partitions is successful.
- On systems with only Integrated Facility for Linux (IFL)
processors and AIX or IBM i partitions, a problem was fixed for
performance issues for IFL VMs (Linux and VIOS). This problem
occurs if AIX or IBM i partitions are active on a system with IFL-only
cores. As a workaround, AIX or IBM i partitions should not be activated
on an IFL-only system. With the fix, the activation of AIX and IBM i
partitions is blocked on an IFL-only system. If this fix is installed
concurrently with AIX or IBM i partitions running, these partitions
will be allowed to continue to run until they are powered off.
Once powered off, the AIX and IBM i partitions will not be allowed to
be activated again on the IFL-only system.
|
SC860_226_165 / FW860.90
12/09/20 |
Impact: Data
Severity: HIPER
New features and functions
- Enable periodic logging of internal component operational
data for the PCIe3 expansion drawer paths. The logging of this
data does not impact the normal use of the system.
System firmware changes that
affect all systems
- HIPER/Pervasive:
A problem was fixed for certain SR-IOV adapters for a condition that
may result from frequent resets of adapter Virtual Functions (VFs), or
transmission stalls and could lead to potential undetected data
corruption.
The following additional fixes are also included:
1) The VNIC backing device goes to a powered off state during a VNIC
failover or Live Partition Mobility (LPM) migration. This failure
is intermittent and very infrequent.
2) Adapter time-outs with SRC B400FF01 or B400FF02 logged.
3) Adapter time-outs related to adapter commands becoming blocked with
SRC B400FF01 or B400FF02 logged.
4) VF function resets occasionally not completing quickly enough
resulting in SRC B400FF02 logged.
This fix updates the adapter firmware to 11.4.415.33 for the following
Feature Codes and CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with
CCIN 2CE4, #EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0, and
#EN0K/#EN0L with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A rare problem was fixed for a checkstop during an IPL that
fails to isolate and guard the problem core. An SRC is logged
with B1xxE5xx and an extended hex word 8 xxxxDD90. With the fix,
the suspected failing hardware is guarded and a node is possibly
deconfigured to allow the subsequent IPLs of the system to be
successful.
- A problem was fixed for the REST/Redfish interface to
change the success return code for object creation from "200" to
"201". The "200" status code means that the request was received
and understood and is being processed. A "201" status code
indicates that a request was successful and, as a result, a resource
has been created. The Redfish Ruby Client, "redfish_client" may
fail a transaction if a "200" status code is returned when "201" is
expected.
- A problem was fixed to allow quicker recovery of PCIe links
for the #EMX0 PCIe expansion drawer for a run-time fault with B7006A22
logged. The time for recovery attempts can exceed six minutes on
rare occasions which may cause I/O adapter failures and failed
nodes. With the fix, the PCIe links will recover or fail faster
(in the order of seconds) so that redundancy in a cluster configuration
can be used with failure detection and failover processing by other
hosts, if available, in the case where the PCIe links fail to recover.
- A problem was fixed for a concurrent maintenance "Repair
and Verify" (R&V) operation for a #EMX0 fanout module that fails
with an "Unable to isolate the resource" error message. This
should occur only infrequently for cases where a physical hardware
failure has occurred which prevents access to slot power
controls. This problem can be worked around by bringing up the
"PCIe Hardware Topology" screen from either ASMI or the HMC after the
hardware failure but before the concurrent repair is attempted.
This will avoid the problem with the PCIe slot isolation.
These steps can also be used to recover from the error to allow the
R&V repair to be attempted again.
- A problem was fixed for a B7006A96 fanout module FPGA
corruption error that can occur in unsupported PCIe3 expansion
drawer(#EMX0) configurations that mix an enhanced PCIe3 fanout module
(#EMXH) in the same drawer with legacy PCIe3 fanout modules (#EMXF,
#EMXG, #ELMF, or #ELMG). This causes the FPGA on the enhanced
#EMXH to be updated with the legacy firmware and it becomes a
non-working and unusable fanout module. With the fix, the
unsupported #EMX0 configurations are detected and handled gracefully
without harm to the FPGA on the enhanced fanout modules.
- A problem was fixed for possible dispatching delays for
partitions running in POWER8 processor compatibility mode.
- A problem was fixed for system memory not returned after
create and delete of partitions, resulting in slightly less memory
available after configuration changes in the systems. With the fix, an
IPL of the system will recover any of the memory that was orphaned by
the issue.
- A problem was fixed for utilization statistics for commands
such as HMC lslparutil and third-party lpar2rrd that do not accurately
represent CPU utilization. The values are incorrect every time
for a partition that is migrated with Live Partition Mobility
(LPM). Power Enterprise Pools 2.0 is not affected by this
problem. If this problem has occurred, here are three possible
recovery options:
1) Re-IPL the target system of the migration.
2) Or delete and recreate the partition on the target system.
3) Or perform an inactive migration of the partition. The cycle
values get zeroed in this case.
- A problem was fixed for a PCIe3 expansion drawer cable that
has hidden error logs for a single lane failure. This happens
whenever a single lane error occurs. Subsequent lane failures are
not hidden and have visible error logs. Without the fix, the hidden or
informational logs would need to be examined to gather more information
for the failing hardware.
- A problem was fixed for a DLPAR remove of memory from a
partition that fails if the partition contains 65535 or more LMBs. With
16 MB LMBs, this error threshold is 1 TB of memory. With 256 MB LMBs, it
is 16 TB of memory. A reboot of the partition after the DLPAR will
remove the memory from the partition.
- A problem was fixed for extraneous B400FF01 and B400FF02
SRCs logged when moving cables on SR-IOV adapters. This is an
infrequent error that can occur if the HMC performance monitor is
running at the same time the cables are moved. These SRCs can be
ignored when accompanied by cable movement.
- A problem was fixed for B400FF02 errors for certain SR-IOV
adapters during adapter initialization or error recovery. This is
a rare error that can occur because of a race condition in the firmware.
This fix pertains to adapters with the following Feature Codes and
CCINs: #EN15/#EN16 with CCIN 2CE3, #EN17/#EN18 with CCIN 2CE4,
#EN0H/#EN0J with CCIN 2B93, #EN0M/#EN0N with CCIN 2CC0, and #EN0K/#EN0L
with CCIN 2CC1.
- A problem was fixed for not logging SRCs for certain cable
pulls from the #EMX0 PCIe expansion drawer. With the fix, the
previously undetected cable pulls are now detected and logged with SRC
B7006A8B and B7006A88 errors.
- A problem was fixed for a rare system hang that can occur
when a page of memory is being migrated. Page migration (memory
relocation) can occur for a variety of reasons, including predictive
memory failure, DLPAR of memory, and normal operations related to
managing the page pool resources.
- A problem was fixed for running PCM on a system with SR-IOV
adapters in shared mode that results in an "Incomplete" system state
with certain hypervisor tasks deadlocked. This problem is rare
and is triggered when using SR-IOV adapters in shared mode and
gathering performance statistics with PCM (Performance Collection and
Monitoring) and also having a low level error on an adapter. The
only way to recover from this condition is to re-IPL the system.
- A problem was fixed so that an SRC B7006A99 informational log is
now posted as a Predictive error with a call out of the CXP cable FRU.
This fix improves FRU isolation for cases where a CXP cable alert
causes a B7006A99 that occurs prior to a B7006A22 or B7006A8B.
Without the fix, the SRC B7006A99 is informational and the latter SRCs
cause a larger hardware replacement even though the earlier event
identified a probable cause for the cable FRU.
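For the REST/Redfish return code change listed above, a client that has to work against firmware levels both with and without the fix can accept either status code for object creation. This is a sketch under that assumption; the function name is hypothetical and no particular Redfish client library is implied.

```python
from http import HTTPStatus


def creation_succeeded(status_code: int) -> bool:
    """Treat both 200 (before the fix) and 201 (with the fix) as success.

    Hypothetical helper for a client that talks to mixed firmware levels.
    """
    return status_code in (HTTPStatus.OK, HTTPStatus.CREATED)


# Classify a few sample status codes.
for code in (200, 201, 404):
    print(code, creation_succeeded(code))
```

A client that only targets firmware with the fix could instead check for 201 alone, which matches the behavior the fix description calls for.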
System firmware changes that affect
certain systems
- On systems with an IBM i partition, a problem was fixed for
only seeing 50% of the total Power Enterprise Pools (PEP) 1.0 memory
that is provided. This happens when querying resource information
via QAPMCONF which calls MATMATR 0x01F6. With the fix, an error
is corrected in the IBM i MATMATR option 0x01F6 that retrieves the
memory information for the Collection Services.
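The memory thresholds quoted for the DLPAR memory-remove fix in this service pack (about 1 TB with 16 MB LMBs, about 16 TB with 256 MB LMBs) follow from multiplying the 65535-LMB limit by the LMB size. A minimal arithmetic sketch, with an illustrative function name that is not part of any IBM tool:

```python
LMB_LIMIT = 65535  # LMB count at which the DLPAR remove failed, per the fix description


def threshold_tb(lmb_size_mb: int) -> float:
    """Return the partition memory size, in TB, that corresponds to LMB_LIMIT LMBs."""
    total_mb = LMB_LIMIT * lmb_size_mb
    return total_mb / (1024 * 1024)  # MB -> TB (binary units)


# Print the threshold for the two LMB sizes named in the fix description.
for size_mb in (16, 256):
    print(f"{size_mb} MB LMBs: about {threshold_tb(size_mb):.2f} TB")
```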
|
SC860_215_165 / FW860.81
03/04/20 |
Impact: Security
Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive:
A problem was fixed for an HMC "Incomplete" state for a system after
the HMC user password is changed with ASMI on the service
processor. This problem can occur if the HMC password is changed
on the service processor but not also on the HMC, and a reset of the
service processor happens. With the fix, the HMC will get the
needed "failed authentication" error so that the user knows to update
the old password on the HMC.
|
SC860_212_165 / FW860.80
12/17/19 |
Impact: Security
Severity: SPE
New features and functions
- Support was added for improved security for the service
processor password policy. For the service processor, the "admin",
"hmc" and "general" passwords must be set on first use for newly
manufactured systems and after a factory reset of the system. The
REST/Redfish interface will return an error saying the user account is
expired in these scenarios. This policy change helps to ensure that the
service processor is not left in a state with a well-known
password. The user can change from an
expired default password to a new password using the Advanced System
Management Interface (ASMI).
- Support was added for real-time
data capture for PCIe3 expansion drawer (#EMX0) cable card connection
data via resource dump selector on the HMC or in ASMI on the service
processor. Using the resource selector string of "xmfr
-dumpccdata" will non-disruptively generate an RSCDUMP type of dump
file that has the current cable card data, including data from cables
and the retimers.
System firmware changes that affect all systems
- A problem was
fixed for SR-IOV
adapters to provide a consistent Informational message level for cable
plugging issues. For transceivers not plugged on certain SR-IOV
adapters, an unrecoverable error (UE) SRC B400FF03 was changed to an
Informational message logged. This affects the SR-IOV adapters
with the following feature codes and CCINs: #EC2R/EC2S with CCIN
58FA; #EC2T/EC2U with CCIN 58FB; and #EC3L/EC3M with CCIN
2CEC.
For copper cables unplugged on certain SR-IOV adapters, a missing
message was replaced with an Informational message logged. This
affects the SR-IOV adapters with the following feature codes and CCINs:
#EN17/EN18 with CCIN 2CE4; and #EN0K/EN0L with CCIN 2CC1.
- The following problem related to SR-IOV
was fixed: If the SR-IOV logical port's VLAN ID (PVID) is modified
while the logical port is configured, the adapter will use an incorrect
PVID for the Virtual Function (VF). This problem is rare because
most users do not change the PVID once the logical port is configured,
so they will not have the problem.
This fix updates adapter firmware to 10.2.252.1940 for the
following Feature Codes and CCINs: #EN15/EN16 with CCIN 2CE3;
#EN17/EN18 with CCIN 2CE4; #EN0H/EN0J with CCIN 2B93; #EN0M/EN0N with
CCIN 2CC0; and #EN0K/EN0L with CCIN 2CC1.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- A problem was fixed for
Novalink failing to activate partitions that have names with character
lengths near the maximum allowed character length. This problem
can be circumvented by changing the partition name to have 32
characters or less.
- A problem was fixed where a Linux or AIX partition type was
incorrectly reported as unknown. Symptoms include: IBM Cloud
Management Console (CMC) not being able to determine the RPA partition
type (Linux/AIX) for partitions that are not active; and HMC attempts
to dynamically add CPU to Linux partitions may fail with a HSCL1528
error message stating that there are not enough Integrated Facility for
Linux (IFL) cores for the operation.
- A problem was fixed for
a possible system crash with SRC B7000103 if the HMC session is closed
while the performance monitor is active. As a circumvention for
this problem, make sure the performance monitor is turned off before
closing the HMC sessions.
- A problem was fixed for a Live
Partition Mobility (LPM) migration of a large memory partition to a
target system that causes the target system to crash and for the HMC to
go to the "Incomplete" state. For servers with the default LMB
size (256 MB), if a partition is 16 TB or larger and the desired memory is
different than the maximum memory, LPM may fail on the target
system. Servers with LMB sizes less than the default could hit
this problem with smaller memory partition sizes. A circumvention
to the problem is to set the desired and maximum memory to the same
value for the large memory partition that is to be migrated.
- A problem was fixed for
system hangs or incomplete states displayed by HMC(s) caused by a loop
in the handling of Segment Lookaside Buffer (SLB) cache memory parity
errors where SRC B7005442 may be logged. This problem has a low
frequency of occurrence as it requires severe errors in the SLB cache
that are not cleared by an error flush of the entries. A re-IPL
of the system can be used to recover from this error.
- A problem was fixed for a failed
clock card causing a node to be guarded during the IPL of a multi-node
system. With the fix, the redundant clock card allows all the
nodes to IPL in the case of a single clock card failure.
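The LPM large-memory conditions described in the list above (default 256MB LMB size, a partition of 16TB or more, and desired memory different from maximum memory) can be expressed as a quick pre-migration check. The following is a minimal Python sketch, not an IBM tool. The function name is hypothetical, comparing the desired memory against the size threshold is an assumption, and so is scaling the threshold linearly with the LMB size (the fix text only says that smaller LMB sizes can hit the problem with smaller partitions).

```python
# Sketch only: flags partitions that match the conditions quoted in the fix text.
MB = 1024 ** 2
TB = 1024 ** 4

DEFAULT_LMB_BYTES = 256 * MB
# 16 TB at the default 256 MB LMB size corresponds to this many LMBs.
LMB_COUNT_THRESHOLD = (16 * TB) // DEFAULT_LMB_BYTES


def lpm_target_at_risk(desired_bytes, maximum_bytes, lmb_bytes=DEFAULT_LMB_BYTES):
    """Return True when a partition matches the described exposure.

    ASSUMPTION: the size threshold is a fixed LMB count, so it shrinks
    proportionally for LMB sizes below the default. The source does not
    state the exact relationship.
    ASSUMPTION: "partition size" is taken to be the desired memory.
    """
    if desired_bytes == maximum_bytes:
        # Matches the documented circumvention: desired == maximum.
        return False
    return desired_bytes // lmb_bytes >= LMB_COUNT_THRESHOLD


if __name__ == "__main__":
    print(lpm_target_at_risk(16 * TB, 20 * TB))   # at or above 16 TB, desired != maximum
    print(lpm_target_at_risk(16 * TB, 16 * TB))   # circumvention applied
```

A partition flagged by this check would be a candidate for the circumvention given above: set the desired and maximum memory to the same value before migrating.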
System firmware changes that affect certain systems
- On systems with an IBM i partition, a problem was fixed
for a D-mode IPL failure when using a USB DVD drive in an IBM 7226
multimedia storage enclosure. Error logs with SRC BA16010E,
B2003110, and/or B200308C can occur. As a circumvention, an
external DVD drive can be used for the D-mode IPL.
- On systems with a single node, a problem
was fixed for unknowingly running at lower (the default) frequencies
when changing into Fixed Max Frequency (FMF) mode. This problem
should be unlikely to happen because it requires that the system
is already in FMF mode and that the user then requests a change into FMF
mode.  This request is not handled correctly as the tunable
parameters get reset to default which allows the processor frequency to
be reduced to the minimum value. The recovery for this problem is
to change the power mode to "Nominal" and then change it to FMF.
- On systems with IBM i partitions, a
rare problem was fixed for an intermittent failure of a DLPAR remove of
an adapter. In most cases, a retry of the operation will be
successful.
- On systems
with Integrated Facility for Linux (IFL) processors and Linux-only
partitions, a problem was fixed for Power Enterprise Pools (PEP)
1.0 not going back into "Compliance" after resources are moved from
Server 1 to Server 2.  The move causes an expected "Approaching Out Of
Compliance" state, but the Pool does not automatically go back into
compliance when the resources are no longer used on Server 1.  As a
circumvention, the user can do an extra "push" and "pull" of one
resource to make the Pool discover it is back in "Compliance".
- On systems with an IBM i partition,
a problem was fixed for a possibly incorrect number of Memory COD
(Capacity On Demand) resources shown when gathering performance data
with IBM i Collection Services. Memory resources activated by
Power Enterprise Pools (PEP) 1.0 will be missing from the data.
An error was corrected in the IBM i MATMATR option 0X01F6 that
retrieves the Memory COD information for the Collection Services.
|
SC860_205_165 / FW860.70
06/18/19 |
Impact: Availability
Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: The
following problems related to SR-IOV were fixed:
1) A problem was fixed for new or replacement SR-IOV adapters with
feature codes EN15, EN16, EN17, and EN18 being rendered non-functional
when moved to SR-IOV mode. This includes cards moved from dedicated
device mode, newly installed adapters, and FRU replacements. This
problem occurs when the adapter firmware is updated to the 10.2.252.x
levels from 11.x adapter firmware levels.
2) A problem was fixed for certain SR-IOV adapters where SRC B400FF01
errors are seen during vNIC failovers and Live Partition Mobility (LPM)
migration of vNIC clients. This may also result in errors seen in
partitions (for example, some partitions may show LNC2ENT_TX_ERR).
3) A problem was fixed where network multicast traffic is not received
by a SR-IOV logical port (VF) network interface for a Linux partition.
The failure can occur when the partition transitions the network
interface out of promiscuous or multicast promiscuous mode.
These fixes update adapter firmware to 10.2.252.1939 for the
following Feature Codes: EN15, EN16, EN17, EN18, EN0H, EN0J,
EN0M, EN0N, EN0K, and EN0L.
The SR-IOV adapter firmware level update for the shared-mode adapters
happens under user control to prevent unexpected temporary outages on
the adapters. A system reboot will update all SR-IOV shared-mode
adapters with the new firmware level. In addition, when an
adapter is first set to SR-IOV shared mode, the adapter firmware is
updated to the latest level available with the system firmware (and it
is also updated automatically during maintenance operations, such as
when the adapter is stopped or replaced). And lastly, selective
manual updates of the SR-IOV adapters can be performed using the
Hardware Management Console (HMC). To selectively update the
adapter firmware, follow the steps given at the IBM Knowledge Center
for using HMC to make the updates: https://www.ibm.com/support/knowledgecenter/HW4M4/p8efd/p8efd_updating_sriov_firmware.htm.
Note: Adapters that are capable of running in SR-IOV mode, but are
currently running in dedicated mode and assigned to a partition, can be
updated concurrently either by the OS that owns the adapter or the
managing HMC (if OS is AIX or VIOS and RMC is running).
- DEFERRED: PARTITION_DEFERRED: A problem was fixed for repeated CPU
DLPAR remove operations by Linux (Ubuntu, SUSE, or RHEL) OSes possibly
resulting in a partition crash.  No specific SRCs or error logs are
reported.
The problem can occur on any DLPAR CPU remove operation if running on
Linux. The occurrence is intermittent and rare. The
partition crash may result in one or more of the following console
messages (in no particular order):
1) Bad kernel stack pointer addr1 at addr2
2) Oops: Bad kernel stack pointer
3) ******* RTAS CALL BUFFER CORRUPTION *******
4) ERROR: Token not supported
This fix does not activate until there is a reboot of the partition.
- A problem was fixed for a loss of service processor
redundancy if an attempt is made to boot from a corrupted flash side on
the primary service processor. Although the primary service
processor recovers, the backup service processor ends up stuck in the
IPLing state. The backup service processor must be reset to
recover from the IPL hang and restore service processor redundancy.
- A problem was fixed for an incorrect SRC B150F138 being
logged against the backup service processor when service processor
redundancy has been disabled. This SRC is logged at system
run-time when the backup service processor is in the standby or
termination state.  This is the expected state of the backup service
processor when redundancy is disabled, so no SRC should be logged, and
the SRC can be ignored if it appears.
- A problem was fixed for a PCIe Hub checkstop with SRC
B138E504 logged that fails to guard the errant processor chip.
With the fix, the problem hardware FRU is guarded so there is not a
recurrence of the error on the next IPL.
- A problem was fixed for an incorrect SRC of B1810000 being
logged when a firmware update fails because of Entitlement Key
expiration. The error displayed on the HMC and in the OS is
correct and meaningful. With the fix, for this firmware update
failure the correct SRC of B181309D is now logged.
- A problem was fixed for informational logs flooding
the error log if a "Get Sensor Reading" is not working.
- A problem was fixed for a Redfish (REST) Patch
request for PowerSaveMode with an unsupported mode value returning an
error code "500" instead of the correct error code of "400".
- A problem was fixed for a rare Live Partition Mobility
migration hang with the partition left in VPM (Virtual Page Mode) which
causes performance concerns.  This error is triggered by a
migration failover operation occurring during the migration state of
"Suspended" when there are insufficient VASI buffers available to
clear all partition state data waiting to be sent to the migration
target.  Migration failovers are rare and the migration state of
"Suspended" is a migration state lasting only a few seconds for most
partitions, so this problem should not be frequent. On the HMC,
there will be an inability to complete either a migration stop or a
recovery operation. The HMC will show the partition as migrating
and any attempt to change that will fail. The system must be
re-IPLed to recover from the problem.
- A problem was fixed for shared processor partitions going
unresponsive after changing the processor sharing mode of a
dedicated processor partition from "allow when partition is active" to
either "allow when partition is inactive" or "never". This
problem can be circumvented by avoiding disabling processor sharing
when active on a dedicated processor partition. To recover if the issue
has been encountered, enable "processor sharing when active" on the
dedicated partition.
- A problem was fixed for an error in deleting a partition
with the virtualized Trusted Platform Module (vTPM) enabled and SRC
B7000602 logged. When this error occurs, the encryption process
in the hypervisor may become unusable. The problem can be
recovered from with a re-IPL of the system.
- A problem was fixed in Live Partition Mobility (LPM) of a
partition to a shared processor pool, which results in the partition
being unable to consume uncapped cycles on the target system. To
prevent the issue from occurring, partitions can be migrated to the
default shared processor pool and then dynamically moved to the desired
shared processor pool.  To recover from the issue, do one of
the following:
1) Use DLPAR to add or remove a virtual processor to/from the
affected partition;
2) dynamically move the partition between shared processor pools;
3) reboot the partition;
4) re-IPL the system.
- A problem was fixed for a boot failure using an N_Port ID
Virtualization (NPIV) LUN for an operating system that is installed on
a disk of 2 TB or greater, and having a device driver for the disk that
adheres to a non-zero allocation length requirement for the "READ
CAPACITY 16" command.  The IBM partition firmware had always used an
invalid zero allocation length for the return of data and that had been
accepted by previous device drivers. Now some of the newer device
drivers are adhering to the specification and needing an allocation
length of non-zero to allow the boot to proceed.
- A problem was fixed for a clock card failure with SRC
B158CC62 logged calling out the wrong clock card and not calling out
the cable and system backplane as needed. This fix does not add
processors to the callout but in some cases the processor has also been
identified as the cause of the clock card failure.
- A problem was fixed for failing to boot from an AIX mksysb
backup on a USB RDX drive with SRCs logged of BA210012, AA06000D, and
BA090010. The problem trigger is a boot attempt from the
RDX device. The boot error does not occur if a serial console is used
to navigate the SMS menus.
- A problem was fixed for a system IPLing with an invalid
time set on the service processor that causes partitions to be reset to
the Epoch date of 01/01/1970. With the fix, on the IPL, the
hypervisor logs a B700120x when the service processor real time clock
is found to be invalid and halts the IPL to allow the time and date to
be corrected by the user. The Advanced System Management
Interface (ASMI) can be used to correct the time and date on the
service processor.  On the next IPL, if the time and date have not
been corrected, the hypervisor will log an SRC B7001224 (indicating the
user was warned on the last IPL) and allow the partitions to start, but
with the time and date set to the Epoch value.
- A security problem was fixed in the service processor
Network Security Services (NSS) which, with a
man-in-the-middle attack, could provide false completion or errant
network transactions or exposure of sensitive data from intercepted SSL
connections to ASMI, Redfish, or the service processor message
server. The Common Vulnerabilities and Exposures issue number is
CVE-2018-12384.
- A problem was fixed for a hypervisor task getting deadlocked
if partitions are powered on at the same time that SR-IOV is being
configured for an adapter. With this problem, workloads will
continue to run but it will not be possible to change the
virtualization configuration or power partitions on and off. This
error can be recovered by doing a re-IPL of the system.
- A problem was fixed for hypervisor tasks getting deadlocked
that cause the hypervisor to be unresponsive to the HMC (this shows as
an incomplete state on the HMC) with SRC B200F011 logged.  This is
a rare timing error.  With this problem, OS workloads will
continue to run but it will not be possible for the HMC to interact
with the partitions.  This error can be recovered by doing a
re-IPL of the system with a scheduled outage.
- A problem was fixed for false indication of a real time
clock (RTC) battery failure with SRC B15A3305 logged. This error
happens infrequently. If the error occurs, and another battery
failure SRC is not logged within 24 hours, ignore the error as it was
caused by a timing issue in the battery test.
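For the NPIV boot fix in the list above, the difference between a zero and a non-zero allocation length in "READ CAPACITY 16" can be seen by building the command descriptor block (CDB) directly. The following is a minimal Python sketch under stated assumptions: the field layout follows the SCSI Block Commands definition of READ CAPACITY (16) (operation code 9Eh with service action 10h, allocation length in bytes 10 to 13), the function name is hypothetical, and this is an illustration, not the partition firmware's actual code.

```python
import struct

SERVICE_ACTION_IN_16 = 0x9E       # SCSI operation code shared by several service actions
READ_CAPACITY_16 = 0x10           # service action selecting READ CAPACITY (16)
READ_CAPACITY_16_DATA_LEN = 32    # size of the parameter data the command returns


def build_read_capacity_16_cdb(allocation_length=READ_CAPACITY_16_DATA_LEN):
    """Build a 16-byte READ CAPACITY (16) CDB (hypothetical helper).

    A zero allocation length asks the target to return no data, which is
    the value the fix text describes as invalid, so it is rejected here.
    """
    if not 0 < allocation_length <= 0xFFFFFFFF:
        raise ValueError("allocation length must be a non-zero 32-bit value")
    return struct.pack(
        ">BBQIBB",
        SERVICE_ACTION_IN_16,   # byte 0: operation code
        READ_CAPACITY_16,       # byte 1: service action
        0,                      # bytes 2-9: logical block address (unused here)
        allocation_length,      # bytes 10-13: allocation length, big-endian
        0,                      # byte 14: reserved / obsolete bits
        0,                      # byte 15: control
    )


if __name__ == "__main__":
    print(build_read_capacity_16_cdb().hex())
```

As background on the 2 TB boundary: assuming 512-byte blocks, a 2 TB disk holds 2^32 blocks, which is more than the 32-bit block address returned by READ CAPACITY (10) can describe, so the 16-byte form of the command is needed for such disks.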
System firmware changes that affect certain systems
- DEFERRED: On
systems with a PCIe3 I/O expansion drawer (#EMX0), a problem was fixed
for the PCIe3 I/O expansion drawer links to improve
stability. Intermittent training failures on the links
occurred during the IPL with SRC B7006A8B logged. With the fix,
the link settings were changed to lower the peak link signal
amplification to bring the signal level into the middle of the
operating range, thus improving the high margin to reduce link training
failures. The system must be re-IPLed for the fix to activate.
- On a system with an IBM i partition, a problem was fixed
for a DLPAR force-remove of a physical IO adapter from an IBM i
partition and a simultaneous power off of the partition causing the
partition to hang during the power off. To recover the partition
from the error, the system must be re-IPLed. This problem is rare
because there is only a 2-second timing window for the DLPAR and power
off to interfere with each other.
- On a system with an active IBM i partition, a problem was
fixed for a SPCN firmware download to the PCIe3 I/O expansion drawer
(feature #EMX0) Chassis Management Card (CMC) that could possibly get
stuck in a pending state. This failure is very unlikely as it
would require a concurrent replacement of the CMC card that is loaded
with a SPCN level that is older than 2015 (01MEX151012a). The
failure with the SPCN download can be corrected by a re-IPL of the
system.
- On a system with an AMS (Active Memory Sharing) partition,
a problem was fixed for a Live Partition Mobility (LPM) migration
failure when migrating from P9 to a pre-FW860 P8 or P7 system.
This failure can occur if the P9 partition is in dedicated memory mode,
and the Physical Page Table (PPT) ratio is explicitly set on the HMC
(rather than keeping the default value) and the partition is then
transitioned to AMS mode prior to the migration to the older
system. This problem can be avoided by using dedicated memory in
the partition being migrated back to the older system.
- On a system with a vNIC configuration with multiple backing
Virtual Functions (VFs), a problem was fixed for a backing VF failure
after a sequence of repeated failovers where one of the VF backing
devices goes to a powered off state. This problem is infrequent
and only occurs after many vNIC failovers. A reboot of the
partition with the affected VF will recover it.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a UE B700BA01 logged after a FRU
was replaced in the PCIe Expansion drawer. The log should have
been informational instead of unrecoverable because it is normal to
have this log for a replaced part in the expansion drawer that has a
different serial number from the old part. If a part in the
expansion drawer has been replaced, the UE error log can be ignored.
- On systems with IBM i partitions, a problem was fixed
for Live Partition Mobility (LPM) migrations that could have incorrect
hardware resource information (related to VPD) in the target partition
if a failover had occurred for the source partition during the
migration. This failover would have to occur during the Suspended
state of the migration, which only lasts about a second, so this should
be rare. With the fix, at a minimum the migration error will be
detected to abort the migration so it can be restarted. And at a
later IBM i OS level, the fix will allow the migration to complete even
though the failover has occurred during the Suspended state of the
migration.
- On systems with PCIe3 expansion drawers (feature #EMX0), a
problem was fixed for PCI link recovery failure during a PCI Host
Bridge (PHB) reset with SRCs of B7006A80, B7006A22, B7006A8B, and
B7006970 logged. This causes the cable card to fail, losing all
slots in the expansion drawer. This is a rare problem. If
this error occurs, a concurrent maintenance operation could reboot the
expansion drawer or a re-IPL of the system could be done to recover the
drawer.
- On systems with an IBM i partition with greater than 9999
GB installed, a problem was fixed for On/Off COD memory-related
amounts not being displayed correctly.  This only happens when
retrieving the On/Off COD numbers via a particular IBM i MATMATR MI
command option value.
- On systems with PCIe3 expansion drawers (feature code
#EMX0), a problem was fixed for a concurrent exchange of a PCIe
expansion drawer cable card that, although successful, leaves the
fault LED turned on.
- A problem was fixed for shared processor pools
where uncapped shared processor partitions placed in a pool may not be
able to consume all available processor cycles. The problem may
occur when the sum of the allocated processing units for the pool
member partitions equals the maximum processing units of the pool.
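The trigger condition for the shared processor pool problem above (the sum of the member partitions' allocated processing units equals the pool maximum) can be checked from configuration data. The following is a minimal Python sketch, assuming the processing unit values are available as decimal strings. The function name and input format are hypothetical, and `Decimal` is used so that fractional units such as 0.1 compare exactly.

```python
from decimal import Decimal


def pool_fully_allocated(member_units, pool_max_units):
    """Return True when the members' allocated units equal the pool maximum.

    member_units: iterable of decimal strings, one per pool member partition
                  (hypothetical input format).
    pool_max_units: decimal string for the pool's maximum processing units.
    """
    total = sum((Decimal(u) for u in member_units), Decimal(0))
    return total == Decimal(pool_max_units)


if __name__ == "__main__":
    # Two members with 1.5 and 2.5 units in a pool capped at 4.0 units.
    print(pool_fully_allocated(["1.5", "2.5"], "4.0"))
```

Pools for which this returns True match the condition under which, before the fix, uncapped partitions may not be able to consume all available processor cycles.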
|
SC860_180_165 / FW860.60
10/31/18 |
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- A security problem was fixed in the Dynamic Host Configuration
Protocol (DHCP) client on the service processor for an out-of-bounds
memory access flaw that could be used by a malicious DHCP server to
crash the DHCP client process. The Common Vulnerabilities and
Exposures issue number is CVE-2018-5732.
|
SC860_165_165 / FW860.51
05/22/18 |
Impact: Security
Severity: SPE
Response for Recent Security
Vulnerabilities
- DISRUPTIVE:
In response to recently reported security vulnerabilities, this
firmware update is being released to address Common Vulnerabilities and
Exposures issue number CVE-2018-3639. In addition, Operating
System updates are required in conjunction with this FW level for
CVE-2018-3639.
|
SC860_160_056 / FW860.50
05/03/18 |
Impact: Availability
Severity: SPE
System firmware changes that affect all systems
- DEFERRED: A
problem was fixed for a PCIe3 I/O expansion drawer (with feature code
#EMX0) where control path stability issues may cause certain SRCs to be
logged. Systems using copper cables may log SRC B7006A87 or
similar SRCs, and the fanout module may fail to become active.
Systems using optical cables may log SRC of B7006A22 or similar
SRCs. For this problem, the errant I/O drawer may be recovered by
a re-IPL of the system.
|
SC860_138_056 / FW860.42
01/09/18 |
Impact: Security
Severity: SPE
New features and functions
- In response to recently reported security vulnerabilities,
this firmware update is being released to address Common
Vulnerabilities and Exposures issue numbers CVE-2017-5715,
CVE-2017-5753 and CVE-2017-5754. Operating System updates are
required in conjunction with this FW level for CVE-2017-5753 and
CVE-2017-5754.
|
SC860_127_056 / FW860.41
12/08/17 |
Impact: Availability
Severity: SPE
|
SC860_118_056 / FW860.40
11/08/17 |
Impact: Availability
Severity: SPE
System firmware changes that affect certain systems
- DEFERRED: On systems using PowerVM firmware, a problem was fixed for
DPO (Dynamic Platform Optimizer) operations taking a very long time and
impacting the server system with a performance degradation.  The problem
is triggered by a DPO operation being done on a system with unlicensed
processor cores and a very high I/O load.  The fix involves using a
different lock type for the memory relocation activities (to prevent lock
contention between memory relocation threads and partition threads) that
is created at IPL time, so an IPL is needed to activate the fix.  More
information on the DPO function can be found at the IBM Knowledge
Center: https://www.ibm.com/support/knowledgecenter/en/8247-42L/p8hat/p8hat_dpoovw.htm
|
SC860_103_056 / FW860.30
06/30/17 |
Impact: Availability
Severity: SPE
System firmware changes that affect certain systems
- DEFERRED: On
systems using PowerVM firmware, a problem was fixed to improve PCIe3 I/O
expansion drawer (#EMX0) link stability.  The settings
for the continuous time linear equalizers (CTLE) were updated for all
the PCIe adapters for the PCIe links to the expansion drawer.  The
system must be re-IPLed for the fix to activate.
|
SC860_082_056 / FW860.20
03/17/17 |
Impact: Availability
Severity: SPE
|
SC860_070_056 / FW860.12
01/13/17 |
Impact: Availability
Severity: SPE
|
SC860_063_056 / FW860.11
12/05/16 |
Impact: N/A
Severity: N/A
- This Service Pack contained updates for MANUFACTURING
ONLY.
|
SC860_056_056 / FW860.10
11/18/16 |
Impact: New
Severity: New
System firmware changes that affect certain systems
- DISRUPTIVE: On systems using the PowerVM firmware, a problem was fixed
for an "Incomplete" state caused by initiating a resource dump with
selector macros from NovaLink (vio -dump -lp 1 -fr).  The failure causes
a communication process stack frame, HVHMCCMDRTRTASK, size to be exceeded
with a hypervisor page fault that disrupts the NovaLink and/or HMC
communications.  The recovery action is to re-IPL the CEC but that will
need to be done without the assistance of the management console.  For
each partition that has an OS running on the system, shut down the
partition from the OS.  Then from the Advanced System Management
Interface (ASMI), power off the managed system.  Alternatively, the
system power button may also be used to do the power off.  If the
management console Incomplete state persists after the power off, the
managed system should be rebuilt from the management console.  For more
information on management console recovery steps, refer to this IBM
Knowledge Center link: https://www.ibm.com/support/knowledgecenter/en/POWER7/p7eav/aremanagedsystemstate_incomplete.htm.
The fix is disruptive because the size of the PowerVM hypervisor must
be increased to accommodate the over-sized stack frame of the failing
task.
- DEFERRED: On systems using the PowerVM firmware, a problem was fixed
for a CAPI function unavailable condition on a system with the maximum
number of CAPI adapters and partitions.  Not enough bytes were allocated
for CAPI for the maximum configuration case.  The problem may be
circumvented by reducing the number of active partitions or CAPI
adapters.  The fix is deferred because the size of the hypervisor must
be increased to provide the additional CAPI space.
- DEFERRED: On systems using PowerVM firmware, a problem was fixed for
cable card capable PCI slots that fail during the IPL.  Hypervisor I/O
Bus Interface UE B7006A84 is reported for each cable card capable PCI
slot that doesn't contain a PCIe3 Optical Cable Adapter for the PCIe
Expansion Drawer (feature code #EJ05).  PCI slots containing a cable
card will not report an error but will not be functional.  The problem
can be resolved by performing an AC cycle of the system.  The trigger
for the failure is that the I2C devices used to detect the cable cards
are not coming out of the power on reset process in the correct state
due to a race condition.
|