AH780_056_040 / FW780.10
04/25/14
Impact: Serviceability
Severity: SPE
New Features and Functions
- Support was added for monitored compliance of the Power
Integrated Facility for Linux (IFL). IFL is an optional lower
cost per processor core activation for Linux-only workloads on IBM
Power Systems. Power IFL processor cores can be activated that
are restricted to running Red Hat Linux or SUSE Linux. In
contrast, processor cores that are activated for general-purpose
workloads can run any supported operating system. Power IFL
processor cores are enabled by feature code ELJ1 using Capacity Upgrade
on Demand (CUoD). Linux partitions can use IFL processors and the
other processor cores but AIX and IBM i5/OS cannot use the IFL
processors. The IFL monitored compliance process will send
customer alert messages to the management console if the system is out
of compliance for the number of IFL processors and general-purpose
workload processors that are in active use compared to the number that
have been licensed.
Power IFL and monitored compliance are not supported on the IBM Power ESE
(8412-EAD) system because it runs the AIX operating system only.
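The comparison of active use against licensed activations described in the Power IFL item above can be illustrated with a minimal sketch. This is not the firmware's actual algorithm. It assumes a simple model in which Linux partitions draw on IFL activations first and spill over to general-purpose activations, while AIX and IBM i partitions use general-purpose activations only. The Partition class, the OS labels, and check_compliance are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    name: str
    os: str              # hypothetical labels: "linux", "aix", "ibmi"
    active_cores: float  # processor cores in active use


def check_compliance(partitions: List[Partition],
                     licensed_ifl: float,
                     licensed_general: float) -> Tuple[bool, float]:
    """Return (in_compliance, general_needed) under the assumed model.

    Linux usage is covered by IFL activations first; any excess, plus
    all AIX and IBM i usage, must fit within general-purpose activations.
    """
    linux = sum(p.active_cores for p in partitions if p.os == "linux")
    other = sum(p.active_cores for p in partitions if p.os != "linux")
    # Linux cores beyond the IFL activations fall back on general-purpose ones.
    linux_overflow = max(0.0, linux - licensed_ifl)
    general_needed = other + linux_overflow
    return general_needed <= licensed_general, general_needed


if __name__ == "__main__":
    parts = [Partition("lpar1", "linux", 6), Partition("lpar2", "aix", 4)]
    print(check_compliance(parts, licensed_ifl=4, licensed_general=8))
    print(check_compliance(parts, licensed_ifl=4, licensed_general=5))
```

In this model the first call is in compliance (6 general-purpose cores needed, 8 licensed) and the second is not (6 needed, 5 licensed), which is the kind of condition that would produce an alert on the management console.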
System firmware changes that affect all systems
- A problem was fixed that prevented an HMC-managed system
from being converted to manufacturing default configuration (MDC) mode
when the management console command "lpcfgop -m <server> -o
clear" failed to create the default partition. The management
console went to the incomplete state for this error.
- A problem was fixed that logged an incorrect call home
B7006956 NVRAM error during a power off of the system. This error
log indicates that the NVRAM of the system is in error and will be
cleared on the next IPL of the system. However, there is no NVRAM
error and the error log was created because a reset/reload of the
service processor occurred during the power off.
- Help text for the Advanced System Management Interface
(ASMI) "System Configuration/Hardware Deconfiguration/Clear All
Deconfiguration Errors" menu option was enhanced to clarify that when
selecting "Hardware Resources" value of "All hardware resources", the
service processor deconfiguration data is not cleared. The
"Service processor" must be explicitly selected for that to be cleared.
- A firmware code update problem was fixed that caused the
Hardware Management Console (HMC) to go to "Incomplete State" for the
system with SRC E302F880 when assignment of a partition universal
unique identifier (UUID) failed for a partition that was already
running. This problem happens for disruptive code updates from
pre-770 levels to 770 or later levels.
- A problem was fixed that caused frequent SRC B1A38B24 error
logs with a call home every 15 seconds when service processor network
interfaces were incorrectly configured on the same subnet. The
frequency of the notification of the network subnet error has been
reduced to once every 24 hours.
- A problem was fixed that prevented guard error logs from
being reported for FRUs that were guarded during the system power
on. This could happen if the same FRU had been previously
reported as guarded on a different power on of the system. The
requirement is now met that guarded FRUs are logged on every power on
of the system.
- A problem was fixed for the Advanced System Management
Interface (ASMI) "Login Profile/Change Password" menu where ASMI would
fail with "Console Internal Error, status code 500" displayed on the
web browser when an incorrect current password was entered.
- A problem was fixed for a resource remove operation on a system
with pool resources that caused the number of unreturned
resources to become incorrect. This problem occurred if the
system first became out of compliance with overdue unreturned resources
and then another remove of pool resources from the server was
attempted.
- A problem was fixed that caused the Advanced System Management
Interface (ASMI) "System Information/Firmware Maintenance
History" menu option on the service processor to display the message
"No code update history log was found" instead of the firmware
maintenance history.
- A problem was fixed for a Live Partition Mobility (LPM)
suspend and transfer of a partition that caused the time of day to skip
ahead to an incorrect value on the target system. The problem
only occurred when a suspended partition was migrated to a target CEC
that had a hypervisor time that was later than the source CEC.
- A problem was fixed for IBM Power Enterprise System Pools
that prevented the management console from changing from the backup to
the master role for the enterprise pool. The following error
message was displayed on the management console: "HSCL90F7 An
internal error occurred trying to set a new master management console
for the Power enterprise pool. Try the operation again. If this
error persists, contact your service representative."
This defect does not pertain to the IBM Power 770 (9117-MMB) and IBM
Power 780 (9179-MHB) systems.
- A problem was fixed for Live Partition Mobility (LPM) where
a 2x performance decrease occurred during the resume phase of the
migration when migrating from a system with 780 or later firmware back
to a system with a pre-780 level of firmware.
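The misconfiguration behind the SRC B1A38B24 item above (service processor network interfaces placed on the same subnet) can be checked ahead of time. The following is a minimal sketch using the Python standard library; the function name same_subnet and the sample addresses are assumptions for illustration and are not taken from the firmware or from ASMI.

```python
import ipaddress


def same_subnet(iface_a: str, iface_b: str) -> bool:
    """Return True if two interfaces, given in CIDR form such as
    '192.0.2.10/24', belong to overlapping networks."""
    net_a = ipaddress.ip_interface(iface_a).network
    net_b = ipaddress.ip_interface(iface_b).network
    # An IPv4 network and an IPv6 network never share a subnet.
    if net_a.version != net_b.version:
        return False
    return net_a.overlaps(net_b)


if __name__ == "__main__":
    # Sample addresses from the documentation ranges, not real defaults.
    print(same_subnet("192.0.2.10/24", "192.0.2.20/24"))     # same subnet
    print(same_subnet("192.0.2.10/24", "198.51.100.10/24"))  # different subnets
```

A configuration for which same_subnet returns True for the two service processor interfaces is the case that triggered the repeated error logs.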
System firmware changes that affect certain systems
- On systems with multiple CEC drawers or nodes, a problem
was fixed in the service processor Advanced System Management Interface
(ASMI) performance dump collection that only allowed performance data
to be collected for the first node of the system. The
"System Service Aids/Performance Dump" menu of the ASMI is used to work
with the performance dump.
- On systems involved in a series of consecutive Live
Partition Mobility (LPM) operations, a memory leak problem was fixed in
the run time abstraction service (RTAS) that caused a partition run
time AIX crash with SRC 0c20. Other possible symptoms include
error logs with SRC BA330002 (RTAS memory allocation failure).
- On systems running Dynamic Platform Optimizer (DPO) with
one or more unlicensed processors, a problem was fixed where the system
performance was significantly degraded during the DPO operation.
The amount of performance degradation was greater for systems with larger
numbers of unlicensed processors.
- On systems with a redundant service processor, a problem
was fixed where the service processor allowed a clock failover to occur
without an SRC B158CC62 error log and without a hardware deconfiguration
record for the failed clock source. This resulted in the system
running with only one clock source and without any alerts to warn that
clock redundancy had been lost.
- On systems with a management console and service processors
configured with Internet Protocol version 6 (IPv6) addresses, a
problem was fixed that prevented the management console from
discovering the service processor. The Service Location Protocol
(SLP) on the service processor was not being enabled for IPv6, so it
was unable to respond to IPv6 queries.
- On a system with a partition with an AIX and Linux boot
source to support dual booting, a problem was fixed that caused the
Host Ethernet Adapter (HEA) to be disabled when rebooting from Linux to
AIX. Linux had disabled interrupts for the HEA on power down,
causing an error for AIX when it tried to use the HEA to access the
network.
- On a system with a disk device with multiple boot
partitions, a problem was fixed that caused System Management Services
(SMS) to list only one boot partition. Even though only one boot
partition was listed in SMS, the AIX bootlist command could still be
used to boot from any boot partition.
Concurrent hot add/repair maintenance firmware fixes
- On a system with sixteen or more logical partitions, a
problem was fixed for a memory relocation error during concurrent hot
node repair that caused a hang or a failure. The problem can also
be triggered by mirrored memory defragmentation on a system with
selective memory mirroring.
AH780_040_040 / FW780.00
12/06/13
Impact: New
Severity: New
New Features and Functions
- Support was added to the Virtual I/O Server (VIOS) for
shared storage pool mirroring (RAID-1) using the virtual SCSI (VSCSI)
storage adapter to provide redundancy for data storage.
- Support was added to upgrade the service processor to
OpenSSL version 1.0.1 and for compliance with National Institute of
Standards and Technology (NIST) Special Publication 800-131A.
SP 800-131A compliance requires the use of stronger cryptographic keys
and more robust cryptographic algorithms.
- Support was added to the Management Console command line to
allow configuring a shared control channel for multiple pairs of Shared
Ethernet Adapters (SEAs). This simplifies the control channel
configuration to reduce network errors when the SEAs are in fail-over
mode.
- Support was added in Advanced System Management Interface
(ASMI) to facilitate capture and reporting of debug data for system
performance problems. The "System Service Aids/Performance
Dump" menu was added to ASMI to perform this function.
- Support was added to the Management Console for group-based
LDAP authentication.
- Partition Firmware was enhanced to be able to recognize
and boot from disks formatted with the GUID Partition Table (GPT)
format that are capable of being greater than 2TB in size. GPT is
a standard for the layout of the partition table on a physical hard
disk, using globally unique identifiers (GUID), that does not have the
2TB limit that is imposed by the DOS partition format.
- The call home data for every serviceable event of the
system was enhanced to include information on every guarded element
(processor, memory, I/O chip, etc.) and to contain the part number and
location codes of the FRUs and the service processor deconfiguration
policy settings.
- Support for IBM PCIe 3.0 x8 dual 4-port SAS RAID adapter
with 12 GB cache with feature code EJ0L and CCIN 57CE.
- Support for Dynamic Platform Optimizer (DPO) enhancements
to show the logical partition current and potential affinity
scores. The Management Console has also been enhanced to show the
partition scoring. The operating system (OS) levels that support
DPO:
◦ AIX 6.1 TL8 or later
◦ AIX 7.1 TL2 or later
◦ VIOS 2.2.2.0
◦ IBM i 7.1 PTF MF56058
◦ Linux RHEL7
◦ Linux SLES12
Note:
If DPO is used with an older version of the OS that predates the above
levels, either:
- The partition needs to be rebooted after DPO completes to optimize
placement, or
- The partition is excluded from participating in the DPO operation
(through a command line option on the "optmem" command that is used to
initiate a DPO operation).
- Support was added to the Management Console and the Virtual
I/O Server (VIOS) to provide the capability to enable and disable
individual virtual ethernet adapters from the management console.
- Support for Management Console logical partition
Universally Unique IDs (UUIDs) so that the HMC preserves the UUID for
logical partitions on backup/restore and migration.
- Support for Management Console command line to configure
the ECC call home path for SSL proxy support.
- Support for Management Console to minimize recovery state
problems by using the hypervisor and VIOS configuration data to
recreate partition data when needed.
- Support for Management Console to provide scheduled
operations to check if the partition affinity falls below a threshold
and alert the user that Dynamic Platform Optimizer (DPO) is needed.
- Support for enhanced platform serviceability to extend call
home to include hardware in need of repair and to issue periodic
service events as reminders of failed hardware.
- Support for IBM PCIe 3.0 x8 non-caching 2-port SAS RAID
adapter with feature code EJ0J and CCIN 57B4.
- Support for Virtual I/O Server (VIOS) to support 4K block
size DASD as a virtual device.
- Support for performance improvements for concurrent Live
Partition Mobility (LPM) migrations.
- Support for Management Console to handle all Virtual I/O
Server (VIOS) configuration tasks and provide assistance in configuring
partitions to use redundant VIOS.
- Support for Management Console to maintain a profile that
is synchronized with the current configuration of the system, including
Dynamic Logical Partitioning (DLPAR) changes.
- Support for Power System Pools, which allow Capacity on
Demand (CoD) resources, including processors and memory, to be
aggregated and moved from one pool server to any other pool server as needed.
- Support for a Management Console Performance and Capacity
Monitor (PCM) function to monitor and manage both physical and virtual
resources.
- Support for virtual server network (VSN) Phase 2 that
delivers IEEE standard 802.1Qbg based on Virtual Ethernet Port
Aggregator (VEPA) switching. This supports the Management Console
assignment of the VEPA switching mode to virtual Ethernet switches used
by the virtual Ethernet adapters of the logical partitions. The
server properties in the Management Console will show the capability
"Virtual Server Network Phase 2 Capable" as "True" for the system.
- Support for Virtual I/O Server (VIOS) for an IBM i client
data connection to a SIS64 device driver backed by VSCSI physical
volumes.
- Support for the Power 795 GX++ 1-port 4X Infiniband QDR
adapter with CCIN 2B76 and feature code EN25.
- Support was dropped for Secure Sockets Layer (SSL) protocol
version 2 and SSL weak and medium cipher suites in the service
processor web server (Lighttpd). Unsupported web browser
connections to the Advanced System Management Interface (ASMI) secured
port 443 (using https://) will now be rejected if those browsers do not
support SSL version 3. Supported web browsers for Power7 ASMI are
Netscape (version 9.0.0.4), Microsoft Internet Explorer (version 7.0),
Mozilla Firefox (version 2.0.0.11), and Opera (version 9.24).
- Support was added in Advanced System Management Interface
(ASMI) "System Configuration/Firmware Update Policy" menu to detect and
display the appropriate Firmware Update Policy (depending on whether
system is HMC managed) instead of requiring the user to select the
Firmware Update Policy. The menu also displays the "Minimum Code
Level Supported" value.
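The 2TB limit of the DOS partition format mentioned in the GPT item above follows from simple arithmetic. The sketch below assumes 512-byte logical sectors and relies on the fact that a DOS (MBR) partition entry stores sector addresses and counts in 32-bit fields, whereas GPT uses 64-bit logical block addresses. The function name max_addressable_bytes is hypothetical.

```python
SECTOR_BYTES = 512   # assumed logical sector size
MBR_LBA_BITS = 32    # width of the sector fields in a DOS/MBR partition entry
GPT_LBA_BITS = 64    # width of logical block addresses in GPT


def max_addressable_bytes(lba_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest size, in bytes, that can be described with lba_bits-wide
    sector fields and the given sector size."""
    return (2 ** lba_bits) * sector_bytes


if __name__ == "__main__":
    tib = 2 ** 40
    # 2**32 sectors * 512 bytes = 2**41 bytes, that is, 2 TiB.
    print(max_addressable_bytes(MBR_LBA_BITS) // tib)
    # With 64-bit addresses the limit is far beyond any current disk.
    print(max_addressable_bytes(GPT_LBA_BITS) // tib)
```

With larger sector sizes the DOS limit scales accordingly, which is why the sector size is kept as a parameter.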
System firmware changes that affect all systems
- A problem was fixed that caused a service processor kernel
panic on an out-of-memory condition with SRC B181720D when an incorrect
MTMS was specified for a frame in the Advanced System Management
Interface (ASMI).
- A problem was fixed that caused a service processor OmniOrb
core dump with SRC B181EF88 logged.
- A problem was fixed that caused the system attention LED to
stay lit when a bad FRU was replaced.
- A problem was fixed that caused a memory leak of 50 bytes
of service processor memory for every call home operation. This
could potentially cause an out of memory condition for the service
processor when running over an extended period of time without a reset.
- A problem was fixed that caused an L2 cache error to not
guard out the faulty processor, allowing the system to checkstop again
on an error to the same faulty processor.
- A problem was fixed that caused an HMC code update failure
for the FSP on the accept operation with SRC B1811402, or left the FSP
unable to boot on the updated side.
- A problem was fixed that caused a system checkstop during
hypervisor time keeping services.
- A problem was fixed that caused a built-in self test (BIST)
for GX slots to create corrupt error log values that core dumped the
service processor with a B18187DA. The corruption was caused by a
failure to initialize the BIST array to 0 before starting the tests.
- The Hypervisor was enhanced to allow the system to continue
to boot using the redundant Anchor (VPD) card, instead of stopping the
Hypervisor boot and logging SRC B7004715, when the primary Anchor
card has been corrupted.
- A problem was fixed with the Dynamic Platform Optimizer
(DPO) that caused memory affinity to be incorrectly reported to the
partitions before the memory was optimized. When this
occurs, performance is lower than what would have been achieved
with the optimized memory values.
- A problem was fixed that caused a migrated partition to
reboot during transfer to a target system running VIOS 2.2.2.0 or later. A
manual reboot would be required if transferred to a target system
running an earlier VIOS release. Migration recovery may also be
necessary.
- A problem was fixed that can cause Anchor (VPD) card
corruption and A70047xx SRCs to be logged. Note: If a
serviceable event with SRC A7004715 is present or was logged
previously, damage to the VPD card may have occurred. After the fix is
applied, replacement of the Anchor VPD card is recommended in
order to restore full redundancy.
- The firmware was enhanced to display on the management
console the correct number of concurrent Live Partition Mobility (LPM)
operations that is supported.
- A problem was fixed that caused a 1000911E platform event
log (PEL) to be marked as not call home. The PEL is now a call
home to allow for correction. This PEL is logged when the
hypervisor has changed the Machine Type Model Serial Number (MTMS) of
an external enclosure to UTMP.xxx.xxxx because it cannot read the vital
product data (VPD), or the VPD has invalid characters, or if the MTMS
is a duplicate to another enclosure.
- A problem was fixed that caused the state of the Host
Ethernet Adapter (HEA) port to be reported as down when the physical
port is actually up.
- When powering on a system partition, a problem was fixed
that caused the partition universal unique identifier (UUID) to not get
assigned, causing a B2006010 SRC in the error log.
- For the sequence of a reboot of a system partition followed
immediately by a power off of the partition, a problem was fixed where
the hypervisor virtual service processor (VSP) incorrectly retained
locks for the powered off partition, causing the CEC to go into
recovery state during the next power on attempt.
- A problem was fixed that caused an error log generated by
the partition firmware to show conflicting firmware levels. This
problem occurs after a firmware update or a Live Partition Mobility
(LPM) operation on the system.
- A problem was fixed that caused the system attention LED
to be lit without a corresponding SRC and error log for the
event. This problem typically occurs when an operating system on
a partition terminates abnormally.
- A problem was fixed that caused the slot index to be
missing for virtual slot number 0 for the dynamic reconfiguration
connector (DRC) name for virtual devices. This error was visible
from the management console when using commands such as "lshwres -r
virtualio --rsubtype slot -m machine" to show the hardware resources
for virtual devices.
- A problem was fixed that caused a system checkstop with SRC
B113E504 for a recoverable hardware fault.
- A problem was fixed during resource dump processing that
caused a read of an invalid system memory address and an SRC
B181C141. The invalid memory reference resulted from the service
processor incorrectly referencing memory that had been relocated by the
hypervisor.
System firmware changes that affect certain systems
- On systems with a redundant service processor, a problem
was fixed that caused fans to run at high speed after a failover to
the sibling service processor.
- On systems with a redundant service processor, a problem
was fixed that prevented the deconfiguration details of a guarded
sibling service processor from being shown in the Advanced
System Management Interface (ASMI).
- On systems with a redundant service processor, a problem
was fixed that caused an SRC B150D15E to be erroneously logged after a
failover to the sibling service processor.
- When switching between turbocore and maxcore mode, a
problem was fixed that caused the number of supported partitions to be
reduced by 50%.
- On systems in turbocore mode with unlicensed processors, a
problem was fixed that caused an incorrect processor count. The
AIX command lparstat gave too high a value for "Active Physical CPUs in
system" when it included unlicensed turbocore processors in the count
instead of just counting the licensed processors.
- A problem was fixed that was caused by an attempt to modify
a virtual adapter from the management console command line when the
command specifies it is an Ethernet adapter, but the virtual ID
specified is for an adapter type other than Ethernet. The managed
system has to be rebooted to restore communications with the management
console when this problem occurs; SRC B7000602 is also logged.
- On systems running AIX or Linux, a problem was fixed that
caused the operating system to halt when an InfiniBand Host Channel
Adapter (HCA) fails or malfunctions.
- On systems running AIX or Linux, a hang in a Live Partition
Mobility (LPM) migration for remote restart-capable partitions was
fixed by adding a time-out for the required paging space to become
available. If after five minutes the required paging space is not
available, the start migration command returns an error code of
0x40000042 (PagingSpaceNotReady) to the management console.
- On systems running Dynamic Platform Optimizer (DPO) with no
free memory, a problem was fixed that caused the Hardware
Management Console (HMC) lsmemopt command to report the wrong status of
completed with no partitions affected. It should have indicated
that DPO failed due to insufficient free memory. DPO can only run
when there is free memory in the system.
- On systems with partitions using physical shared processor
pools, a problem was fixed that caused partition hangs if the shared
processor pool was reduced to a single processor.
- On a system running a Live Partition Mobility (LPM)
operation, a problem was fixed that caused the partition to
successfully appear on the target system, but hang with a 2005 SRC.
- On systems using IPv6 addresses, the firmware was enhanced
to reduce the time it takes to install an operating system using the
Network Installation Manager (NIM).
- On systems managed by a management console, a problem was
fixed that caused a partition to become unresponsive when the AIX
command "update_flash -s" is run.
- On systems with turbo-core enabled that are a target of
Live Partition Mobility (LPM), a problem was fixed where
cache properties were not recognized and SRCs BA280000 and BA250010
were reported.
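The time-out added for remote restart-capable partitions in the LPM item above (wait up to five minutes for paging space, then return 0x40000042) follows a common poll-with-deadline pattern. The sketch below shows that pattern only and is not the hypervisor's implementation. The function wait_for_paging_space, the is_ready callback, the polling interval, and the use of 0 as a success value are all assumptions for illustration. Only the five-minute limit and the error code come from the release note.

```python
import time

PAGING_SPACE_NOT_READY = 0x40000042  # error code named in the release note
SUCCESS = 0                          # assumed success value


def wait_for_paging_space(is_ready, timeout_s=300.0, poll_s=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or timeout_s seconds elapse.

    Returns SUCCESS when the paging space became available in time,
    otherwise PAGING_SPACE_NOT_READY instead of waiting forever.
    """
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return SUCCESS
        if clock() >= deadline:
            return PAGING_SPACE_NOT_READY
        sleep(poll_s)


if __name__ == "__main__":
    # Short time-out so that the demonstration finishes quickly.
    code = wait_for_paging_space(lambda: False, timeout_s=0.2, poll_s=0.05)
    print(hex(code))
```

The clock and sleep parameters exist so that the behavior can be exercised with a simulated clock rather than a real five-minute wait.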
Concurrent hot add/repair maintenance firmware fixes
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to fail on an erroneously logged error for the
service processor battery with SRCs B15A3303, B15A3305, and
B181EA35 reported.
- The firmware was enhanced to reduce the number of
concurrent hot add/repair maintenance failures due to the operation
timing out on fully-configured systems.
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to fail if a memory channel failure on the CEC
was followed by a service processor reset/reload.
- A problem was fixed that caused SRC B15A3303 to be
erroneously logged as a predictive error on the service processor
sibling after a successful concurrent repair maintenance operation for
the real-time clock (RTC) battery.
- A problem was fixed that caused a concurrent hot add/repair
maintenance operation to fail with SRC B181C350.
- A problem was fixed that prevented the I/O slot information
from being presented on the management console after a concurrent node
repair.
- A problem was fixed that caused Capacity on Demand (CoD)
"Out of Compliance" messages during concurrent maintenance operations
when the system was actually in compliance for the licensed amount of
resources in use.