Power8 System Firmware
Applies to: 9119-MHE and 9119-MME.
This document provides information about the installation of Licensed
Machine or Licensed Internal Code, which is sometimes referred to
generically as microcode or firmware.
1.0 Systems Affected
This package provides firmware for Power System E880 (9119-MHE) and
Power System E870 (9119-MME) servers only.
The firmware level in this package is:
1.1 Minimum HMC Code Level
This section describes the "Minimum HMC Code Level" required by the
System Firmware to complete the firmware installation process. When
installing the System Firmware, the HMC level must be equal to or higher
than the "Minimum HMC Code Level" before starting the system firmware
update. If the HMC managing the server targeted for the System Firmware
update is running a code level lower than the "Minimum HMC Code Level",
the firmware update will not proceed.
The Minimum HMC Code level for this firmware is: HMC V8 R8.3.0
(PTF MH01513) with Mandatory efix (PTF MH01514).
Although the Minimum HMC Code level for this firmware is listed above,
HMC V8 R8.3.0 Service Pack 1 (PTF MH01540) with fix (PTF MH01574) or
higher is recommended.
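The level comparison described above can be sketched in a few lines of Python. This is only an illustration, not an IBM tool: the function names are hypothetical, the input is assumed to use the "V8 R8.3.0" notation that appears in this document, and the sketch does not check whether the required PTFs (MH01513, MH01514) are installed.

```python
import re

# Matches the "V<version> R<release>.<mod>.<fix>" notation used in this document.
_HMC_LEVEL_RE = re.compile(r"V(\d+)\s*R(\d+)\.(\d+)\.(\d+)")


def parse_hmc_level(text):
    """Return the HMC level in `text` as a tuple of integers, e.g. (8, 8, 3, 0)."""
    match = _HMC_LEVEL_RE.search(text)
    if match is None:
        raise ValueError(f"no HMC level found in {text!r}")
    return tuple(int(part) for part in match.groups())


# Minimum level stated in this section (PTF requirements are NOT checked here).
MINIMUM_HMC_LEVEL = parse_hmc_level("HMC V8 R8.3.0")


def hmc_meets_minimum(level_text):
    """True if the given HMC level is equal to or higher than the minimum."""
    return parse_hmc_level(level_text) >= MINIMUM_HMC_LEVEL


if __name__ == "__main__":
    for level in ("HMC V8 R8.2.0", "HMC V8 R8.3.0", "HMC V8 R8.4.0"):
        print(level, "->", "ok" if hmc_meets_minimum(level) else "too low")
```

Comparing integer tuples avoids the pitfalls of comparing version strings character by character.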
For information concerning HMC releases and the latest PTFs, go to the
following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
For specific fix level information on key components of IBM Power
Systems running the AIX, IBM i and Linux operating systems, we suggest
using the Fix Level Recommendation Tool (FLRT):
http://www14.software.ibm.com/webapp/set2/flrt/home
NOTES:
- You must be logged in as hscroot in order for the firmware
installation to complete correctly.
- Systems Director Management Console (SDMC) does not support this
System Firmware level.
1.2 AIX iFix Required
For IBM Power System
servers with the PCIe 2-port Async EIA-232 Adapter installed on AIX
partitions, an AIX fix resolving the async port interrupt handling
(APAR IV77596) must be installed before updating to the SC830_068
(FW830.10) or later level of firmware. The ports on the adapter
(feature code EN27/EN28, CCIN 57D4) may become unusable with the
installation of that firmware level due to an issue with how interrupts
are handled. Many JAS_RTS error log entries are written to the
error log due to this issue.
Prior to this APAR shipping in a future Service Pack, AIX intends to
publish ifixes for the latest Service Packs on all active Technology
Levels on our ftp server, in ftp://aix.software.ibm.com/aix/ifixes/iv77596/
on or before Oct 13, 2015. If you need an ifix other than the
ones on this server, contact IBM support to request one for your
specific situation.
The procedure is intended to be performed by the customer. If you have
questions or concerns about the procedure, contact IBM Support:
US Support: 1.800.IBM.SERV
WW Support (select your country): http://www.ibm.com/planetwide/
2.0 Important Information
Downgrading firmware from any given release level to an earlier release
level is not recommended.
If you feel that it is necessary to downgrade the firmware on your
system to an earlier release level, please contact your next level of
support.
IPv6 Support and Limitations
IPv6 (Internet Protocol version 6) is supported in the System Management
Services (SMS) in this level of system firmware. There are several
limitations that should be considered.
When configuring a network interface card (NIC) for remote IPL, only the
most recently configured protocol (IPv4 or IPv6) is retained. For
example, if the network interface card was previously configured with
IPv4 information and is now being configured with IPv6 information, the
IPv4 configuration information is discarded.
A single network interface card may only be chosen once for the boot
device list. In other words, the interface cannot be configured for the
IPv6 protocol and for the IPv4 protocol at the same time.
Concurrent Firmware Updates
Concurrent system firmware update is supported only on HMC-managed
systems.
Memory Considerations for Firmware Upgrades
Firmware Release Level upgrades and Service Pack updates may consume
additional system memory.
Server firmware requires memory to support the logical partitions on
the server. The amount of memory required by the server firmware varies
according to several factors.
Factors influencing server firmware memory requirements include the
following:
- Number of logical partitions
- Partition environments of the logical partitions
- Number of physical and virtual I/O devices used by the logical
partitions
- Maximum memory values given to the logical partitions
Generally, you can estimate the amount of memory required by server
firmware to be approximately 8% of the system installed memory. The
actual amount required will generally be less than 8%. However, there
are some server models that require an absolute minimum amount of
memory for server firmware, regardless of the previously mentioned
considerations.
Additional information can be found at:
http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8hat/p8hat_lparmemory.htm
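As a rough worked example of the 8% rule of thumb above, the following Python sketch computes the planning estimate for a given amount of installed memory. The function name is hypothetical, and the result is only an upper-end estimate: it does not model the per-partition factors listed above or any model-specific absolute minimum.

```python
def estimate_firmware_memory_gb(installed_gb, fraction=0.08):
    """Rule-of-thumb estimate of server firmware memory, in GB.

    Uses the approximate 8% figure from this document; the actual amount
    is generally lower and depends on the partition configuration.
    """
    if installed_gb < 0:
        raise ValueError("installed memory cannot be negative")
    return installed_gb * fraction


if __name__ == "__main__":
    # Example: a system with 1024 GB installed would budget roughly 82 GB.
    print(round(estimate_firmware_memory_gb(1024), 2))
```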
3.0 Firmware Information
Use the following examples as a reference to determine whether your
installation will be concurrent or disruptive.
For systems that are not managed by an HMC, the installation of system
firmware is always disruptive.
Note: The concurrent levels of system firmware may, on occasion, contain
fixes that are known as Deferred and/or Partition-Deferred. Deferred
fixes can be installed concurrently, but will not be activated until the
next IPL. Partition-Deferred fixes can be installed concurrently, but
will not be activated until a partition reactivate is performed.
Deferred and/or Partition-Deferred fixes, if any, will be identified in
the "Firmware Update Descriptions" table of this document. For these
types of fixes (Deferred and/or Partition-Deferred) within a service
pack, only the fixes in the service pack which cannot be concurrently
activated are deferred.
Note: The file names and service pack levels used in the following
examples are for clarification only, and are not necessarily levels that
have been, or will be released.
System firmware file naming convention:
01SCxxx_yyy_zzz
- xxx is the release level
- yyy is the service pack level
- zzz is the last disruptive service pack level
NOTE: Values of service pack and last disruptive service pack level
(yyy and zzz) are only unique within a release level (xxx). For example,
01SC830_040_040 and 01SC840_040_045 are different service packs.
An installation is disruptive if:
- The release levels (xxx) are different.
Example: Currently installed release is 01SC830_040_040, new release is
01SC840_050_050.
- The service pack level (yyy) and the last disruptive service pack
level (zzz) are the same.
Example: SC830_040_040 is disruptive, no matter what level of SC830 is
currently installed on the system.
- The service pack level (yyy) currently installed on the system is
lower than the last disruptive service pack level (zzz) of the service
pack to be installed.
Example: Currently installed service pack is SC830_040_040 and new
service pack is SC830_050_045.
An installation is concurrent if:
- The release level (xxx) is the same, and
- The service pack level (yyy) currently installed on the system is the
same or higher than the last disruptive service pack level (zzz) of the
service pack to be installed.
Example: Currently installed service pack is SC830_040_040, new service
pack is SC830_071_040.
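The rules above can be expressed as a short Python sketch. It is illustrative only: the names `FirmwareLevel`, `parse_level` and `is_disruptive` are hypothetical, and the `hmc_managed` flag is an assumption added to reflect the statement that installation on systems not managed by an HMC is always disruptive. The disruptive rules are checked in the order they are listed.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    release: str          # "SC" plus xxx, e.g. "SC830"
    service_pack: int     # yyy
    last_disruptive: int  # zzz


# Accepts names such as "01SC830_075_048.rpm", "01SC830_040_040" or "SC830_040_040".
_LEVEL_RE = re.compile(r"^(?:01)?(SC\d{3})_(\d{3})_(\d{3})(?:\.rpm)?$")


def parse_level(name):
    """Split a firmware file or level name into its xxx, yyy and zzz parts."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized firmware level name: {name!r}")
    return FirmwareLevel(match.group(1), int(match.group(2)), int(match.group(3)))


def is_disruptive(installed, new, hmc_managed=True):
    """Apply the rules in this section to decide whether an install is disruptive."""
    current, target = parse_level(installed), parse_level(new)
    if not hmc_managed:
        return True  # without an HMC, installation is always disruptive
    if current.release != target.release:
        return True  # release levels (xxx) differ
    if target.service_pack == target.last_disruptive:
        return True  # yyy equals zzz in the new level
    # Disruptive when the installed yyy is below the new level's zzz.
    return current.service_pack < target.last_disruptive


if __name__ == "__main__":
    examples = [
        ("01SC830_040_040", "01SC840_050_050"),
        ("SC830_040_040", "SC830_050_045"),
        ("SC830_040_040", "SC830_071_040"),
    ]
    for installed, new in examples:
        kind = "disruptive" if is_disruptive(installed, new) else "concurrent"
        print(f"{installed} -> {new}: {kind}")
```

The three sample pairs are the examples given in this section; only the last one is concurrent.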
3.1 Firmware Information and Description
Filename | Size | Checksum
01SC830_075_048.rpm | 74858079 | 42576
Note: The Checksum can be found by running the AIX sum command against
the rpm file (only the first 5 digits are listed).
For example: sum 01SC830_075_048.rpm
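A minimal Python sketch for comparing the output of the sum command with the checksum listed in the table follows. It assumes the usual sum output layout (checksum, block count, file name) and compares only the first 5 digits, as the note describes. The function name is hypothetical, and the block count in the sample line is made up for illustration.

```python
def checksum_matches(sum_output, listed_checksum):
    """Compare the first field of `sum` output with the listed checksum.

    Only the first 5 digits are compared, because only those digits are
    listed in this document.
    """
    fields = sum_output.split()
    if not fields:
        raise ValueError("empty sum output")
    return fields[0][:5] == listed_checksum


if __name__ == "__main__":
    # Sample line: checksum from the table above; block count is illustrative.
    sample = "42576 73104 01SC830_075_048.rpm"
    print(checksum_matches(sample, "42576"))
```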
SC830
For Impact, Severity and other Firmware definitions, please refer to the
'Glossary of firmware terms' at the following URL:
http://www14.software.ibm.com/webapp/set2/sas/f/power5cm/home.html#termdefs
The complete Firmware Fix History for this Release Level can be reviewed
at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html
|
SC830_075_048 / FW830.11
11/11/15 |
Impact: Availability  Severity: HIPER
System firmware changes that affect all systems
- HIPER/Pervasive: A problem was fixed for recovering from embedded
MultiMediaCard (eMMC) flash NAND errors that caused the service
processor to go to a failed state with SRC B1817212 on systems with a
single service processor. On systems with redundant service processors,
the failed service processor would get guarded with a B151E6D0 or
B152E6D0 SRC depending on which service processor fails.
- HIPER/Pervasive: A problem associated with workloads using
transactional memory on PowerVM was discovered and is fixed in this
service pack. The effect of the problem is non-deterministic but may
include undetected corruption of data.
- DEFERRED: A problem was fixed for memory on-die termination (ODT)
settings to improve the signal integrity of the memory channel.
- A problem was fixed for recovery from unaligned addresses
for MSI interrupts from PCIe adapters. The recovery prevents an
adapter timeout caused by resource exhaustion. With the fix, the
resources for each bad interrupt are returned, allowing the PCIe
adapter to continue to run for the normal traffic.
- A problem was fixed for an Operations Panel SRC of B1504804
with no FRU callout. A callout of the failed hardware has been
added.
- A problem was fixed to prevent recoverable power faults of
short duration from causing the system to lose power supply
redundancy. Without the fix, the faulted state persisted for the
recovered power fault, causing a problem with a system power off if
other power supplies were lost at a later time.
- A problem was fixed for a PCIe3 I/O expansion drawer (#EMX0) link
failure with SRC B7006A8B. The settings for the continuous time linear
equalizers (CTLE) were adjusted to improve the incoming signal strength
and the stability of the links. The expansion drawer must be power
cycled or the CEC can be re-IPLed for the fix to activate.
- A problem was fixed for recovery from a processor local bus
(PLB) hang on the service processor. The errant PLB hang recovery
would be seen in concurrent firmware updates that, on rare occasions,
fail to do a side switch to activate to the new level of
firmware. On the management console, the error message would be
"HSCF010180E Operation failed ... E302F873 is the error code."
Other than the failed code level activation, the firmware update is
successful. If this problem occurs, the system can be set to the
new firmware level by doing a power off from the management console and
then doing a power on with side switch selected in the advanced
properties.
System firmware changes that affect certain systems
- A problem was fixed
for the System Feature Code for the E870 (9119-MME) being displayed as
"EPBB" by IBM i "DSPSYSVAL QPRCFEAT" when it should be
"EPBA". This created a problem for certain IBM i software
packages whose license was tied to the System Feature Code. This
fix has a concurrent activation. For FW830.10, a similar,
non-concurrent fix for the feature codes was made but the System
Feature Code, as seen in IBM i partitions, did not update
immediately.
|
SC830_068_048 / FW830.10
09/10/15 |
Impact: Availability  Severity: HIPER
New features and functions
- The firmware code update process was enhanced with a
feature to block a firmware "downgrade" to a level that is below the
system's manufactured code level.
System firmware changes that affect all systems
- HIPER/Pervasive: DEFERRED: A problem was fixed for a TCP/IP
performance degradation on PCIe Ethernet adapters with Remote Direct
Memory Access (RDMA) over Converged Ethernet (RoCE). By adjusting the
system memory caching, a significant improvement was made to the data
throughput speed to restore performance to expected levels. This fix
requires a system re-IPL to take effect. This problem affects the E850
(8408-E8E), E870 (9119-MME), and E880 (9119-MHE) systems.
- HIPER/Pervasive: A problem was fixed for an Ethernet adapter hanging
on the service processor. This hang prevents TCP/IP network traffic
from the management console and the Advanced System Management
Interface (ASMI) browsers. It makes it appear as if the service
processor is unresponsive and can be confused with a service processor
in the stopped state. An A/C power cycle would recover a hung Ethernet
adapter.
- HIPER/Pervasive: A problem was fixed for missing the interrupts for
processor local bus (PLB) time-outs. This problem could hang the
service processor or cause it to panic with a reset/reload of the
service processor. There is a possibility the reset of the service
processor could take it to a stopped state where the service processor
would be unresponsive. In the service processor stopped state, any
active partitions will continue to run but they will not be able to be
managed by the management console. The partitions can be allowed to run
until the next scheduled service window at which time the service
processor can be recovered with an AC power cycle or a pin-hole reset
from the operator panel.
- HIPER/Pervasive: A problem was fixed for a system reset to clear the
boot registers to prevent the reset from being mishandled as a chip
reset. If a "system reset" is misinterpreted as a "chip reset", the boot
of the service processor can go inadvertently to a stopped state and be
unresponsive. Pin-hole resets from the operations panel could also fail
to the service processor stopped state. In the service processor stopped
state, any active partitions will continue to run but they will not be
able to be managed by the management console. The partitions can be
allowed to run until the next scheduled service window at which time the
service processor can be recovered with an AC power cycle or a pin-hole
reset from the operator panel.
- HIPER/Pervasive: A problem was fixed so a corrupted file system
partition table can be recovered and not have the service processor lose
the ability to do P and T-side switches. In error recovery situations,
the loss of the side-switch option could present itself as an
unresponsive service processor if it was needed to prevent a failure to
the service processor stopped state.
- HIPER/Pervasive: A problem was fixed for a runaway interrupt request
(IRQ) condition that caused the service processor to go to a stopped
state. In the service processor stopped state, any active partitions
will continue to run but they will not be able to be managed by the
management console. The partitions can be allowed to run until the next
scheduled service window at which time the service processor can be
recovered with an AC power cycle or a pin-hole reset from the operator
panel.
- HIPER/Pervasive: A problem was fixed for a dump partition full
condition that caused the service processor to go to a stopped state.
In the service processor stopped state, any active partitions will
continue to run but they will not be able to be managed by the
management console. The partitions can be allowed to run until the next
scheduled service window at which time the service processor can be
recovered with an AC power cycle or a pin-hole reset from the operator
panel.
- DEFERRED: A problem was fixed for a PCIe3 I/O expansion drawer (#EMX0)
link failure with SRC B7006A8B. Data packet send retries were increased
and link recovery was enabled to improve the stability of the links.
The CEC must be re-IPLed for the fix to activate.
- A problem was fixed for a SRC 11002613 logged during a
concurrent repair of a power supply. This SRC was erroneously
logged and did not represent a real problem.
- A problem was fixed for an intermittent SRC B1504804 logged
on a re-IPL of the CEC but that did not result in an IPL failure.
- A problem was fixed for the capture of the registers for
the Hostboot Self-Boot Engine (SBE) for SBE failures. These
registers had been missing from failure data for SBE failures, making
these problems more difficult to debug.
- A problem was fixed to remove an unnecessary delay in the
system IPL to reduce the time needed to IPL by 30 seconds.
- A problem was fixed for an unneeded error log with SRC
B181DB04 that occurred in a failed IPL for a normal condition of lost
PNOR flash access after a re-IPL process had started and taken over the
access.
- A problem was fixed for an Advanced System Management
Interface (ASMI) error message of "Error in function 'connect', error
code 111" when a browser attempted to connect before the service
processor was ready. The browser connection through the web
server is now held off until the ASMI process is ready after a reset of
the service processor or an AC power cycle of the system.
- A problem was fixed for an incorrect call home for SRC
B1818A0F. There was no real problem so this call home should have
been ignored.
- A problem was fixed for a dump re-IPL that failed with SRC
B1818601 and B181460B after processor checkstops had terminated the
system.
- A problem was fixed for an infrequent service processor
database corruption during concurrent firmware update that caused the
system to terminate.
- A problem was fixed for a failed PCI oscillator that was
not guarded, causing repeated errors with SRC B15050A6 and B158E504
logged on each IPL of the system.
- A problem was fixed for a local clock card (LCC)
failure with SRC 11001515 that was missing a part number and location
code. This information has been added for LCC faults so the FRU
to replace is properly identified.
- A problem was fixed for a defective PCI oscillator in the
local clock card (LCC) with SRC BC58090F that caused an IPL failure for
the node instead of failing over to the redundant LCC.
- A problem was fixed for a service processor dump with error
logs B181E911 and B181D172 during an IPL. The error logs
were for the detection of defunct processes but otherwise the IPL was
successful.
- A problem was fixed for Digital Power Subsystem Sweep
(DPSS) firmware updates that caused an error log with SRC B1819906 but
otherwise was successful.
- A problem was fixed for missing Keyword (KW) and Resource
ID (RID) for SRC B181A40F.
- A problem was fixed for an I2C bus lock error during a CEC
power off that caused a ten minute delay for the power off and
error log SRCs B1561314 and B1814803 with error number (errno) 3E.
- A problem was fixed for the System Feature Code for the
E870 (9119-MME) being displayed as "EPBB" by IBM i "DSPSYSVAL
QPRCFEAT" when it should be "EPBA". This created a problem
for certain IBM i software packages whose license was tied to the
System Feature Code. The System Feature Code, as seen in IBM
i partitions, does not update immediately with concurrent
activation of the fix pack, but it will eventually change to the
correct "EPBA" value within 24 hours. If it is necessary to see
the new System Feature Code value immediately, a re-IPL of the
system is needed.
- A problem was fixed for concurrent firmware updates to a
system that needed to be re-IPLed after getting a B113E504 SRC during
activation of the new firmware level on the hypervisor. The code
update activate failed if the Sleep Winkle (SLW) images were
significantly different between the firmware levels. The SLW
contains the state of the processor and cache so it can be restored
after sleep or power saving operations.
- A problem was fixed for System Power Control Network (SPCN)
failover for an I/O module A/C power fault on the PCIe3 I/O
expansion drawer (#EMX0). A sideband failure on one I/O module
was blocking SPCN commands for the entire drawer instead of SPCN
failing over to a working I/O module. The broken SPCN
communications path prevented concurrent maintenance operations
on the expansion drawer.
- A problem was fixed for a possible lack of recovery for an
A/C power loss condition on the PCIe3 I/O expansion drawer
(#EMX0). If there was an outstanding problem on the
expansion drawer and an A/C loss occurred while the earlier error was
still unprocessed, the auto-recovery for the A/C power loss would not
have happened.
- A problem was fixed for a missing FRU call out for error
SRC B7006A87 when unable to read the drawer module logical flash
VPD for the PCIe3 I/O expansion drawer (#EMX0).
- For a partition that has been migrated with Live Partition
Mobility (LPM) from FW730 to FW740 or later, a problem was fixed for a
Main Storage Dump (MSD) IPL failing with SRC B2006008. The MSD
IPL can happen after a system failure and is used to collect failure
data. If the partition is rebooted anytime after the migration,
the problem cannot happen. The potential for the problem existed
between the active migration and a partition reboot.
- A problem was fixed for partial loss of Entitlement for
On/Off Memory Capacity On Demand (also called Elastic COD). Users
with large amounts of Entitlement on the system of greater than "65535
GB * Days" could have had a truncation of the Entitlement value on a
re-IPL of the system. To recover lost Entitlement, the customer
can request another On/Off Enablement Code from IBM support to
"re-fill" their entitlement.
- A problem was fixed for a management console command line
failure with a return code 0x40000147 (invalid lock state) when trying
to delete SR-IOV shared mode configurations. This could have
occurred if the adapter slot had been re-purposed without involvement
of the management console and was owned and operational at the time of
the requested delete. With the fix, the current ownership of the
slot is honored and only the SR-IOV shared mode configuration data is
deleted on the force delete.
- A problem was fixed for an incorrect restriction on
the amount of "Unreturned" resources allowed for a Power
Enterprise Pool (PEP). PEP allows for logical moving of resources
(processors and memory) from one server to another. Part of this
is 'borrowing' resources from one server to move to another. This may
result in "Unreturned" resources on the source server. The management
console controls how many total "Unreturned" PEP resources can
exist. For this problem, the user had some "Unreturned" PEP
memory and asked to borrow more but this request was incorrectly
refused by the hypervisor.
- A problem was fixed for a PCIe3 I/O expansion drawer
(#EMX0) error with SRCs B7006A82 and B7004137 for a missing FRU
location code. The FRU location code for the Active Optical Cable
(AOC) was added to identify the failing drawer side.
- A problem was fixed for a PCIe3 I/O expansion drawer
(#EMX0) failing to IPL when the IPL includes a FPGA update for
the drawer. The FPGA update is actually good but perceived as a
failure when the FPGA resets as part of the update. For the
problem, a re-IPL of the system would have fixed the drawer.
- A problem was fixed for Live Partition Mobility (LPM) to
prevent a memory access error during LPM operations with unpredictable
effects. When data is moved by LPM, the underlying firmware code
requires that the buffers be 4K aligned. The fixes made now force
the buffers to be 4K aligned and if there is still an alignment issue,
the LPM operation will fail without impacting the system.
- A problem was fixed for an On-Chip Controller (OCC) failure
after a system dump with SRCs B18B2616 and BC822024 reported.
This resulted in the system running with reduced performance in safe
mode, where processor clock frequencies are lowered to minimum levels
to avoid hardware errors since the OCC is not available to monitor the
system. A re-IPL of the system would have resolved the
problem.
- A performance problem was fixed for systems entering
processor hang recovery prematurely with SRC B111E504 and PBCENTFIR(9)
"PB_CENT_HANG_RECOV". The ability of the L3 cache to prefetch
memory was extended to speed the memory accesses and prevent a
processor hang condition for applications running with lower memory
affinity.
- A problem was fixed for a processor error causing a
Hostboot terminate instead of a deconfiguration of the bad hardware and
continuation of the IPL. The state of the processors was
synchronized between the service processor and the Hostboot process to
correct the error.
- A problem was fixed for a USB Save and Restore of machine
configuration to not lose the system name.
- A problem was fixed for Advanced System Management
Interface (ASMI) help text for menu "I/O Adapter Enlarged Capacity"
being missing with the system IPLed and partitions running. The
help text is now available for the system in the powered on state as
well as in the powered off state.
- A problem was fixed for an intermittent power supply error
SRC 1100D008 with a flood of VPD SRC B1504804 with errno 3E logged on
a re-IPL of the CEC but that did not result in an IPL failure.
- A problem was fixed for an LED intermittently not lighting
for an enclosure with a fault.
- A problem was fixed for an intermittent PSI link error with
SRC B15CDA27 after a firmware update or reset/reload of the service
processor.
- A problem was fixed for PCIe3 adapters failing when
requesting more than 32 Message Signaled Interrupts (MSI-X). The
adapter may fail to ping or cause OS tasks to hang that are using the
adapter. This problem was found specifically on the 10 Gb
Ethernet-SR (Short Range) PCIe3 adapter with feature codes #5275 and
#5769 and on the 56 Gb Infiniband (IB) Fourteen Data Rate (FDR) adapter
with feature codes #EC32, #EC33, #EL3D, and #EL50 and CCIN 2CE7.
However, other PCIe adapters may also be affected.
- A problem was fixed for IBM copyright statements being
displayed on the System Management Services (SMS) menu after a repair
or replacement of system hardware.
System firmware changes that affect certain systems
- HIPER/Pervasive: For partitions with a graphics console and USB
keyboard, a problem was fixed for an OS boot hang at the CA00E100
progress SRC. For the problem, the hang can be avoided by issuing the
boot command from the Open Firmware (OF) prompt.
- HIPER/Pervasive: On systems using PowerVM with shared processor
partitions that are configured as capped or in a shared processor pool,
there was a problem found that delayed the dispatching of the virtual
processors which caused performance to be degraded in some situations.
Partitions with dedicated processors are not affected. The problem is
rare and can be mitigated, until the service pack is applied, by
creating a new shared processor AIX or Linux partition and booting it to
the SMS prompt; there is no need to install an operating system on this
partition. Refer to help document
http://www.ibm.com/support/docview.wss?uid=nas8N1020863 for additional
details.
- DEFERRED: A problem was fixed for Non-Volatile Memory express (NVMe)
adapters, plugged into PCIe3 switches, mis-training to generation 1
instead of generation 3. NVMe adapters attached directly to the PCIe3
slots trained correctly to the generation 3 specification. This fix
requires a re-IPL of the system to correct the training of any
mis-trained adapters.
- On multiple-node systems, a problem was fixed for a missing
location code, part, and serial number for a faulty symmetric
multiprocessing (SMP) cable in the call home B1504922 error log.
- On multiple-node systems, a problem was fixed for a two
hour IPL hang in HostBoot caused by multiple B18ABAAB errors from more
than one node. The Hostboot process failed to go into its
reconfiguration loop to do error recovery and continue the IPL.
- On a system with redundant service processors, a
problem was fixed for an IPL failure for a bad service processor cable
on the primary service processor with SRCs B1504904 and B18ABAAB
logged. The system should have done an error failover to the
backup service processor and continued the IPL to get the partitions
running.
- On a system with redundant service processors where
redundancy is disabled, a problem was fixed for an unrecoverable (UE)
SRC B181DA19 being logged on a re-IPL after a checkstop error.
The error log did not interfere with the re-IPL, which was successful.
- On multiple-node systems, a problem was fixed for
extraneous error logs after a 12V power fault. After termination,
there were additional 110026Bx error log entries that should have been
ignored.
- On a system with redundant service processors, a problem
was fixed for the isolation procedures for an Anchor card error and
system VPD collection failure with termination SRC B181A40F.
FSPSP04 and FSPSP06 are no longer called out as part of reporting the
VPD collection failure. FSPSP30 has been updated with isolation
steps for this problem and is called out and should be used for the
problem isolation. Retain tip H213935 also provides the FRU
isolation steps. Procedure FSPSP30 tries to replace the service
processor first. If that does not work, then the procedure has
the Anchor card replaced.
- On multiple-node systems, a problem was fixed to isolate a
power fault during IPL to the specific node and guard the node, and
allow the rest of the system to IPL. Previously, the power fault
would not be localized to the problem node and it caused the IPL of all
the nodes of the system to fail.
- On a system with redundant service processors, a problem
was fixed for failovers to the backup service processor that caused an
On-Chip Controller (OCC) abort. This placed the CEC in a "safe"
mode where it ran at reduced processor clock frequencies to prevent
exceeding the power limits while not under OCC control.
- On a system with an IBM i partition using Active Memory
Sharing (AMS), a problem was fixed for internal memory management
errors caused by deleting an IBM i partition that had been powered off
in the middle of a Main Storage Dump (MSD). Until the fix is
installed, if an MSD is interrupted for an IBM i partition that has AMS,
the partition should be powered on and powered off normally before a
delete of the partition is done to prevent errors with unpredictable
effects. This problem does not affect the S822 (8284-22A),
S812L (8247-21L), S822L (8247-22L), S824L (8247-42L), and E850 (8408-E8E)
models.
- On a system with redundant service processors, a problem
was fixed for a failover to the backup service processor during a power
off of the CEC that caused a hypervisor time-out with SRC
B182953C. This error was caused by a delay in synchronizing the
state of the hypervisor to the backup service processor but it did not
prevent the power off from completing successfully.
- On a system with redundant service processors, a problem
was fixed for a firmware update causing an error log server dump with
SRC B1818601. The error log server restarted automatically to
recover from the error and the firmware update was successful.
|
SC830_048_048 / FW830.00
06/08/15 |
Impact: New  Severity: New
New Features and Functions
NOTE:
- POWER8 (and later) servers include an “update access key”
that is checked when system firmware updates are applied to the
system. The initial update access keys include an expiration date
which is tied to the product warranty. System firmware updates will not
be processed if the calendar date has passed the update access key’s
expiration date, until the key is replaced. As these update
access keys expire, they need to be replaced using either the Hardware
Management Console (HMC) or the Advanced System Management Interface (ASMI) on
the service processor. Update access keys can be obtained via the
key management website: http://www.ibm.com/servers/eserver/ess/index.wss.
- Support for Little Endian (LE) Linux in PowerVM. With
PowerVM LE guest support, all three Linux on Power distribution
partners (SUSE, Canonical, and Red Hat) with LE operating systems can
run on the same IBM Power Systems.
- Support for allowing the PowerVM hypervisor to continue to
run after the service processor has become unresponsive with a SRC
B1817212. Any active partitions will continue to run but they
will not be able to be managed by the management console. The
partitions can be allowed to run until the next scheduled service
window at which time the service processor can be recovered with an AC
power cycle or a pin-hole reset from the operator panel. This
error condition would only be seen on a system that had been running
with a single service processor (no redundancy for the service
processor).
- Support for three and four node configurations of the E880
(9119-MHE) system.
- Support for an increase of the maximum number of PCIe 3 I/O
expansion drawers (#EMX0) that can be attached to an E870/E880 node
from two to four.
- Support for Single Root I/O Virtualization (SR-IOV) that
enables the hypervisor to share an SR-IOV-capable PCI-Express adapter
across multiple partitions. Twelve Ethernet adapters are supported with
the SR-IOV NIC capability when placed in the P8 system (SR-IOV is
supported both in native mode and through VIOS):
- PCIe3 4-port 10GbE SR Adapter (F/C EN15 and CCIN 2CE3)
- PCIe3 4-port 10GbE SR Adapter (F/C EN16 and CCIN 2CE3).
Fits E870/E880 system node PCIe slot.
- PCIe3 4-port 10GbE SFP+ Copper Adapter (F/C EN17 and CCIN 2CE4)
- PCIe3 4-port 10GbE SFP+ Copper Adapter (F/C EN18 and CCIN 2CE4).
Fits E870/E880 system node PCIe slot.
- PCIe2 4-port (10Gb FCoE & 1GbE) SR and RJ45 SFP+
Adapter (F/C EN0H and CCIN 2B93)
- PCIe2 LP 4-port (10Gb FCoE & 1GbE) SR and RJ45 SFP+
Adapter (F/C EN0J and CCIN 2B93)
- PCIe2 LP Linux 4-port (10Gb FCoE & 1GbE) SR and RJ45 SFP+
Adapter (F/C EL38 and CCIN 2B93)
- PCIe2 4-port (10Gb FCoE & 1GbE) LR and RJ45 Adapter
(F/C EN0M and CCIN 2CC0)
- PCIe2 LP 4-port (10Gb FCoE & 1GbE) LR and RJ45 Adapter
(F/C EN0N and CCIN 2CC0)
- PCIe2 4-port (10Gb FCoE & 1GbE) SFP+Copper and RJ45
Adapter (F/C EN0K and CCIN 2CC1)
- PCIe2 LP 4-port (10Gb FCoE & 1GbE) SFP+Copper and
RJ45 Adapter (F/C EN0L and CCIN 2CC1)
- PCIe2 LP Linux 4-port (10Gb FCoE & 1Gb Ethernet) SFP+Copper and
RJ45 (F/C EL3C and CCIN 2CC1)
These adapters each have four ports, and all four ports are enabled
with SR-IOV function. The entire adapter (all four ports) is configured
for SR-IOV or none of the ports is.
System firmware updates the adapter firmware level on these adapters to
10.2.252.16 when a supported adapter is placed into SR-IOV mode.
Support for SR-IOV adapter sharing is now available for adapters in the
PCIe3 I/O Expansion Drawer with F/C #EMX0.
SR-IOV NIC on the Power P8 systems is supported by:
- AIX 6.1 TL9 SP4 and APAR IV63331, or later
- AIX 7.1 TL3 SP4 and APAR IV63332, or later
- IBM i 7.1 TR8, or later (Supported on S824/S814)
- IBM i 7.2, or later (Supported on S824/S814)
- IBM i 7.1 TR9, or later (Supported on E870/E880)
- IBM i 7.2 TR1, or later (Supported on E870/E880)
- Red Hat Enterprise Linux 6.5, or later (Supported on
E870/E880/S812L/S822/S822L/S814/S824/S824L except for adapters with
F/Cs EN15/EN16/EN17/EN18)
- Red Hat Enterprise Linux 6.6, or later (Supported
on E850 and minimum level needed for adapters with F/Cs
EN15/EN16/EN17/EN18)
- Red Hat Enterprise Linux 7.1, or later
- SUSE Linux Enterprise Server 11 SP1, or later
(Supported on S812L/S822/S822L/S814/S824/S824L)
- SUSE Linux Enterprise Server 11 SP3, or later
(Supported on E870/E880)
- SUSE Linux Enterprise Server 12, or later
(Supported on E850)
- Ubuntu 15.04, or later (Supported on
E850/S812L/S822/S822L/S814/S824/S824L)
- VIOS 2.2.3.4 with interim fix IV63331, or later
- Support for an upgrade from 8-core processors to 12-core
processors for the E880 (9119-MHE) system.
- Support for dynamically adjusting the voltage regulators' input
voltage based on regulator slave failures, to achieve the optimal
voltage for system operation under normal and degraded conditions.
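The update access key rule from the NOTE above (updates are not processed once the calendar date has passed the key's expiration date) can be illustrated with a minimal Python sketch. The function names and the sample dates are hypothetical and are not part of any HMC or ASMI interface; treating the expiration day itself as still valid is an assumption based on the wording "has passed".

```python
from datetime import date


def firmware_update_allowed(uak_expiration: date, today: date) -> bool:
    """Return True if a firmware update would still be processed.

    Assumption: the update is refused only after the expiration date
    has passed, so the expiration day itself still counts as valid.
    """
    return today <= uak_expiration


def days_until_expiry(uak_expiration: date, today: date) -> int:
    """Days left before the key expires (negative once it has expired)."""
    return (uak_expiration - today).days


if __name__ == "__main__":
    expiry = date(2018, 6, 8)   # hypothetical expiration date
    check_day = date(2018, 6, 1)  # hypothetical "today"
    print(firmware_update_allowed(expiry, check_day))
    print(days_until_expiry(expiry, check_day))
```

A check like this could be run before scheduling a maintenance window, so that an expired key is replaced before the update is attempted.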
System firmware changes that
affect all systems
- A problem was fixed to eliminate unneeded guard data from
call home messages for the cases where there is no hardware error in
the system.
- On systems with redundant service processors, a problem was fixed in
the run-time error failover to the backup service processor so it does
not terminate on FRU support interface (FSI) errors. In the case of
FSI errors on the new primary service processor, the primary will do a
reset/reload instead of a terminate.
- A problem was fixed to call home guarded FRUs on each
IPL. Only the initial failure of the hardware was being reported
to the error log.
- Support was added to the Advanced System Management Interface (ASMI)
USB menu to allow a system dump to be collected to USB with the power
on to the system. This allows the dump to be collected with the system
memory state intact.
- A problem was fixed for the service processor error log handling that
caused SRC B150BAC5 errors when converting an error log entry from an
object into a flattened array of bytes.
- A problem was fixed that prevented a second management console from
being added to the CEC. In some cases, network outages caused defunct
management console connection entries to remain in the service
processor connection table, making connection slots unavailable for
new management consoles. A reset of the service processor could be
used to remove the defunct entries.
- A problem was fixed to eliminate a false error log and call home for
an SRC 1100154F fan fault caused by an unplugged power cable.
- A problem was fixed for a highly intermittent IPL failure with SRC
B18187D9 caused by a defunct attention handler process. For this
problem, the IPL will continue to fail until the service processor is
reset.
- A problem was fixed for missing FRU information in SRC 11001515. SRC
11001515 was logged indicating replacement of power supply hardware,
but did not include the location code, the part number, the CCIN, or
the serial number.
- A problem was fixed for systems with a corrupted date of "1900"
showing for the Update Access Key (UAK). The firmware update is allowed
to proceed on systems with a bad UAK date because the fix is in an
emergency service pack. After the fix is installed, the user should
correct the UAK date, if needed, by using the original UAK for the
system. On the Management Console, enter the original update access
key via the "Enter COD Code" panel. Alternatively, on the Advanced
System Management Interface (ASMI), enter the original update access
key via the "On Demand Utilities/COD Activation" panel.
- A problem with concurrent PCIe adapter maintenance was fixed that
caused On-Chip Controller (OCC) resets with SRCs B18B2616 and BC822029
logged, forcing the system into safe mode (processor voltage/frequency
reduced to a "safe" level where thermal monitoring is not required).
Recovery from safe mode requires a system re-IPL.
- A problem was fixed for I/O adapters so that BA400002 errors were
changed to informational for memory boundary adjustments made to the
size of DMA map-in requests. These DMA size adjustments were
previously marked as unrecoverable errors (UE) for a condition that is
normal.
System firmware changes that
affect certain systems
- On systems in PowerVM mode, a problem was fixed for
unresponsive PCIe adapters after a partition power off or a partition
reboot.
- On systems using Virtual Shared
Processor Pools (VSPP), a problem was fixed for an inaccurate pool idle
count over a small sampling period.
- On systems with partitions using shared
processors, a problem was fixed that could result in latency or timeout
issues with I/O devices.
- On systems using PowerVM, a problem was fixed for a hypervisor
deadlock that results in the system being in an "Incomplete state" as
seen on the management console. This deadlock is the result of two
hypervisor tasks using the same locking mechanism for handling requests
between the partitions and the management console. Except for the loss
of the management console control of the system, the system is
operating normally when the "Incomplete state" occurs.
- On systems with memory mirroring enabled, a problem was fixed for
PowerVM over-estimating its memory needs. With the fix, more memory is
available for use by the partitions.
- On systems using PowerVM, a problem was fixed for the handling of the
error of multiple cache hits in the instruction effective-to-real
address translation cache (IERAT). A multi-hit IERAT error was causing
system termination with SRC B700F105. The multi-hit IERAT is now
recognized by the hypervisor and reported to the OS where it is
handled.
- On systems using PowerVM, a problem was fixed to allow booting off an
iSCSI device. For the failure, the partition firmware error logs had
SRC BA012010 "Opening the TCP node failed." and SRC BA010013 "The
information in the error log entry for this SRC provides network trace
data." The open firmware standard output trace showed SRC BA012014
"The TCP re-transmission count of 8 was exceeded. This indicates a
large number of lost packets between this client and the boot or
installation server" followed by SRC BA012010.
- On systems using PowerVM, support was added for USB 2.0
HUBs so that a keyboard plugged into the USB 2.0 HUB will work
correctly at the SMS menus. Previously, a keyboard plugged into a
USB 2.0 HUB was not a recognized device.
4.0 How to Determine the Currently Installed Firmware Level
You can view the server's current firmware level on the Advanced System
Management Interface (ASMI) Welcome pane. It appears in the top right
corner.
Example: SC830_123.
5.0 Downloading the Firmware Package
Follow the instructions on Fix Central. You must read and agree to the
license agreement to obtain the firmware packages.
Note: If your HMC is not internet-connected, you will need to download
the new firmware level to a USB flash memory device or FTP server.
6.0 Installing the Firmware
The method used to install new firmware depends on the release level
of the firmware that is currently installed on your server. The release
level can be determined from the prefix of the new firmware's filename.
Example: SCxxx_yyy_zzz
Where xxx = release level
- If the release level will stay the same (Example: Level SC830_040_040
is currently installed and you are attempting to install level
SC830_071_040), this is considered an update.
- If the release level will change (Example: Level SC830_040_040 is
currently installed and you are attempting to install level
SC840_050_050), this is considered an upgrade.
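The update/upgrade rule above can be sketched in Python. This is a minimal illustration, not an IBM tool: the names `parse_level` and `classify_install` are hypothetical, and because this section only defines xxx as the release level, the other two fields are kept under the placeholder names `yyy` and `zzz`.

```python
import re
from typing import NamedTuple


class FirmwareLevel(NamedTuple):
    prefix: str   # two-letter prefix, e.g. "SC"
    release: int  # xxx, the release level
    yyy: int      # second field (placeholder name)
    zzz: int      # third field (placeholder name)


# Matches names of the form SCxxx_yyy_zzz, e.g. "SC830_071_040".
_LEVEL_RE = re.compile(r"^([A-Z]{2})(\d{3})_(\d{3})_(\d{3})$")


def parse_level(name: str) -> FirmwareLevel:
    """Split a firmware level name into its prefix and numeric fields."""
    match = _LEVEL_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a firmware level name: {name!r}")
    prefix, release, yyy, zzz = match.groups()
    return FirmwareLevel(prefix, int(release), int(yyy), int(zzz))


def classify_install(installed: str, new: str) -> str:
    """Return "update" if the release level stays the same, else "upgrade"."""
    current, target = parse_level(installed), parse_level(new)
    if current.prefix != target.prefix:
        raise ValueError("levels belong to different firmware families")
    return "update" if current.release == target.release else "upgrade"


if __name__ == "__main__":
    print(classify_install("SC830_040_040", "SC830_071_040"))
    print(classify_install("SC830_040_040", "SC840_050_050"))
```

The two calls at the bottom use the same level names as the examples in the list above.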
Instructions for installing firmware updates and upgrades can be found
at http://www-01.ibm.com/support/knowledgecenter/9119-MHE/p8ha1/updupdates.htm
IBM i Systems:
For information concerning IBM i Systems, go
to the following URL to access Fix Central:
http://www-933.ibm.com/support/fixcentral/
Choose "Select product", under
Product Group specify "System i", under
Product specify "IBM i", then Continue and specify the desired firmware
PTF accordingly.
7.0 Firmware History
The complete Firmware Fix History for this Release Level can be
reviewed at the following URL:
http://download.boulder.ibm.com/ibmdl/pub/software/server/firmware/SC-Firmware-Hist.html