IBM POWER9 Systems LC Server Firmware
Applies to: IC922 (9183-22X)
This document provides information about the installation of Licensed Machine or Licensed Internal Code, which is sometimes referred to generically as microcode or firmware.
This package provides firmware for the Power System IC922 (9183-22X) server only.
OP940 supports DD2.2 and DD2.3 processor versions.
The firmware level in this package is:
•OP940.22 / PNOR OP9-v2.4.1-4.31 and BMC op940.22.mih-1
WARNING: Once the OP940.20 service pack is installed, the system will not boot successfully with any service packs prior to OP940.20. The system can be recovered by reloading OP940.20 or any newer service pack.
This section specifies the "Minimum ipmitool Code Level" required by the System Firmware for managing the system. OpenPOWER requires ipmitool level v1.8.15 or later to execute correctly on the OP910 and later firmware. It must be capable of establishing an IPMI v2 session with the IPMI support on the BMC.
Verify your ipmitool level on your Linux workstation using the following command:
bash-4.1$ ipmitool -V
ipmitool version 1.8.15
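The version check above can also be scripted. The following is a minimal sketch, assuming GNU sort with the -V (version sort) option is available on the workstation. The helper name version_ge is illustrative and is not part of ipmitool.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Assumes GNU sort, which provides the -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.8.15"   # minimum ipmitool level stated in this document

if command -v ipmitool >/dev/null 2>&1; then
    # "ipmitool -V" prints a line such as "ipmitool version 1.8.15".
    installed=$(ipmitool -V 2>&1 | awk '{ print $3; exit }')
    if version_ge "$installed" "$required"; then
        echo "ipmitool $installed meets the minimum level $required"
    else
        echo "ipmitool $installed is older than the minimum level $required"
    fi
else
    echo "ipmitool is not installed on this workstation"
fi
```

The comparison is done on the whole dotted string, so a level such as 1.8.18 is treated as newer than 1.8.15.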
If you need to update or add ipmitool on your Linux workstation, you can compile ipmitool (current level 1.8.15) for Linux from SourceForge as follows:
1.1.1 Download the ipmitool tarball from http://sourceforge.net/projects/ipmitool/ to your Linux system
1.1.2 Extract tarball on Linux system
1.1.3 cd to top-level directory
1.1.4 ./configure
1.1.5 make
1.1.6 ipmitool will be under src/ipmitool
You may also get the ipmitool package directly from your workstation Linux packages.
For specific fix level information on key components of IBM Power Systems LC and Linux operating systems, please refer to the documentation in the IBM Knowledge Center for the IC922 (9183-22X):
https://www.ibm.com/support/knowledgecenter/POWER9/p9hdx/9183_22x_landing.htm
If using xCAT on the host OS to do firmware updates, the minimum xCAT level that should be used is 2.13.4 because it has stability improvements for the firmware update process. See the xCAT 2.13.4 release notes below for more information.
https://github.com/xcat2/xcat-core/wiki/XCAT_2.13.4_Release_Notes
The NVIDIA CUDA driver on the Linux OS must be at the recommended level 396.44 or later, or at the minimum level 396.26, to be compatible with OP920.00 and later levels. Without a driver at one of these levels, a GPU that has faulted and gone through a GPU reset can cause a Terminate Immediate (TI) for the system. Level 396.44 is recommended because it provides ATS performance improvements.
The Power IC922 server supports four Tesla T4 PCIe GPUs across its two processor sockets.
For OP940.20, the recommended levels for NVIDIA CUDA are 410.129 (CUDA 10.0) or later, 418.165.02 (CUDA 10.1) or later, and 440.118.02 (CUDA 10.2) or later to get the latest fixes for the GPUs.
To search manually for a driver, use the NVIDIA download page at http://www.nvidia.com/Download/index.aspx?lang=en-us with the following information. For example, to find and download the 410.129 driver for CUDA 10.0, enter the following at the NVIDIA web page:
Manually find drivers for my NVIDIA products.
Product Type: Data Center / Tesla
Product Series: V-Series
Product: Tesla V100
Operating System: Linux POWER LE RHEL 7
CUDA Toolkit: 10.0
Language: English (US)
Search results:
Version: 410.129
Release Date: 2019.9.4
Operating System: Linux POWER LE RHEL 7
CUDA Toolkit: 10.0
Language: English (US)
File Size: 35.45 MB
On Red Hat Enterprise Linux (RHEL) for PPC, RHEL-ALT 7.6, the Trusted Platform Module (TPM) device driver is not loaded automatically at boot time. Without this driver, the TPM device will not be accessible.
This affects any user-space application needing to access the TPM, as well as kernel security functions, such as the Integrity Measurement Architecture subsystem (IMA) in the Linux kernel. Without the TPM driver loaded, IMA will be unable to record trusted measurements to the TPM.
To load the driver manually, as root:
# modprobe tpm_i2c_nuvoton
To load the driver automatically at boot time:
# echo "tpm_i2c_nuvoton" > /etc/modules-load.d/tpm.conf
The TPM device driver will be integrated as a built-in kernel module in a future release of RHEL-ALT 7. Once this is done, it will be loaded automatically and this procedure will no longer be necessary.
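To confirm whether the driver is currently loaded, the kernel's module list can be inspected. The following is a minimal sketch. The helper name is_module_loaded is illustrative, and the optional second argument exists only so that the function can be exercised against a sample file.

```shell
#!/bin/sh
# is_module_loaded NAME [FILE]: succeeds when NAME is in the first column of
# FILE, which defaults to /proc/modules (the kernel's list of loaded modules).
is_module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

if [ -r /proc/modules ]; then
    if is_module_loaded tpm_i2c_nuvoton; then
        echo "tpm_i2c_nuvoton is loaded"
    else
        # A driver built into the kernel does not appear in /proc/modules,
        # so this message alone does not prove that the TPM is inaccessible.
        echo "tpm_i2c_nuvoton is not listed; run 'modprobe tpm_i2c_nuvoton' as root"
    fi
fi
```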
PCIe4 adapters with feature codes that include #EC62, #EC64, #EC6E, and #EC6G for both air and water cooled systems may require additional cooling over what is given by the system default thermal mode settings. To handle the cooling needs of these adapters, the openbmctool can be used to set a recommended thermal mode to provide the cooling.
Please refer to IBM Knowledge Center for selection of the appropriate thermal mode for the PCIe4 adapter, system models, and cable types:
https://www.ibm.com/support/knowledgecenter/en/POWER9/p9ei3/p9ei3_thermal_mode.htm
The Knowledge Center article provides guidance on which thermal mode to select and how to use openbmctool to select it. Be aware that the thermal modes needed to supply additional cooling are not the system default and must be selected manually. Once a new thermal mode is selected, with the system powered off or on, it persists until it is changed by the user or a factory reset of the system is done. The minimum openbmctool version needed to select the thermal mode is v1.14, and the recommended version is v1.17. There is no functional difference between these versions for thermal modes, but the error messages were updated in v1.17 to reflect that systems at the OP940 release level and later do not have to be powered on to set the thermal mode, as was required in earlier releases. The thermal modes that can be selected include "DEFAULT", "CUSTOM", "HEAVY_IO", and "MAX_BASE_FAN_FLOOR". The system comes pre-selected with "DEFAULT", which may not provide enough cooling for some adapters and configurations.
Note: If your system has one of the affected PCIe4 adapters and you are using optical cables for that adapter, a thermal mode other than "DEFAULT" must be selected with openbmctool to get the proper cooling for the system.
Downgrading firmware from any given release level to an earlier release level is not recommended.
If you feel that it is necessary to downgrade the firmware on your system to an earlier release level, please contact your next level of support.
Notes:
Updating back to an OP940 level earlier than OP940.20 from OP940.20 or later is not supported. OP940.20 is a secure firmware level with anti-rollback protection so this may result in a system hang or other failures.
Concurrent Firmware Updates not available for LC servers.
Concurrent system firmware update is not supported on LC servers.
Because of PCIe Gen4 signal integrity (SI) concerns with the IC922 Riser-A, the host firmware (PNOR) works around the issue by limiting the maximum speed of the Riser-A slots from "Gen4" to "Gen3".
Set the access privilege when using IPMI to create user accounts to avoid disrupting user account viewing.
When using IPMI to create user accounts, there are three steps needed to create a valid user account that does not interfere with the display of the user account records.
The first step is to create the user account. The second step is to assign the privilege level for the user. The third step is to enable the user. Failure to assign the privilege level disrupts the viewing of the user account records through the BMC GUI or through Redfish. The problem with the user account viewing is resolved once the privilege has been assigned for the account in IPMI using the ipmitool channel command.
Here is an example of setting a user name and assigning privilege, and then enabling the user. In this example, the userid "4" is assigned name "username" and password "password" and given administrator access on channel 1:
# ipmitool user set name 4 username
# ipmitool user set password 4 password
# ipmitool channel setaccess 1 4 link=on ipmi=on privilege=4
# ipmitool user enable 4
If the ipmitool channel command is not used to set the access privilege, this will cause issues when displaying the user accounts.
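Because omitting the privilege step is the cause of the display problem, it can help to generate the full command sequence in one place. The following is a minimal sketch that only prints the commands from the example above and does not run them. The function name ipmi_user_commands and its argument order are illustrative.

```shell
#!/bin/sh
# ipmi_user_commands ID NAME PASSWORD CHANNEL PRIVILEGE
# Prints, without running them, the commands for creating a user in the
# order described above: name, password, channel access, then enable.
ipmi_user_commands() {
    printf 'ipmitool user set name %s %s\n' "$1" "$2"
    printf 'ipmitool user set password %s %s\n' "$1" "$3"
    printf 'ipmitool channel setaccess %s %s link=on ipmi=on privilege=%s\n' "$4" "$1" "$5"
    printf 'ipmitool user enable %s\n' "$1"
}

# Same values as the example above: user ID 4, administrator (4) on channel 1.
ipmi_user_commands 4 username password 1 4
```

After reviewing the printed commands and running them, "ipmitool user list 1" can be used to confirm that the account and its privilege level are displayed.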
When the root user or a user with "CAP_SYS_ADMIN" privileges executes the "perf" command with the trace_imc performance monitoring unit event to monitor applications or KVM threads, the execution may result in a checkstop (system crash) on an IBM POWER9 DD2.2 (or later) system with firmware OP930 (or later) and Red Hat Enterprise Linux 8.1.
The Linux host system can crash if the guest is rebooted while the host is monitoring the guest with 'perf kvm record' on the qemu process and using the "-p" option.
Example of failure sequence:
1) On the Linux host, do "perf kvm record -e trace_cycles -p <Guest qemu pid>"
2) Run the guest for a duration of time while tracing it from the host
3) Reboot the guest and the host system may crash
This problem can be circumvented by using the "-i" or "--no-inherit" parameter on the "perf kvm record" command so that child processes do not inherit the performance counters:
perf kvm record -e trace_cycles -p <Guest qemu pid> -i
Use the following examples as a reference to determine whether your installation will be concurrent or disruptive.
For the LC server systems, the installation of system firmware is always disruptive.
The BMC and PNOR image tar files are used to update the primary side of the PNOR and the primary side of the BMC only, leaving the golden sides unchanged.
Filename | Size | Checksum |
obmc-mihawk-op940.22.hmc.ubi.mtd.tar | 22876160 | 460fc94ba3ccb88581053fc34d43cd98 |
mihawk-IBM-OP9-v2.4.1-4.31_prod.pnor.squashfs.tar | 29829120 | 09a57ac7be43883c841c95059a752ded |
Note: The Checksum can be found by running the Linux/Unix/AIX md5sum command against the Hardware Platform Management (hpm) file (all 32 characters of the checksum are listed), for example: md5sum <filename>
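The checksum comparison can be automated before the images are uploaded. The following is a minimal sketch, assuming both tar files have been downloaded into the current directory. The helper name verify_md5 is illustrative.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED: succeeds when the MD5 checksum of FILE matches.
verify_md5() {
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# File names and checksums copied from the table above.
while read -r file sum; do
    if [ ! -f "$file" ]; then
        echo "$file: not found in the current directory"
    elif verify_md5 "$file" "$sum"; then
        echo "$file: checksum OK"
    else
        echo "$file: checksum MISMATCH"
    fi
done <<'EOF'
obmc-mihawk-op940.22.hmc.ubi.mtd.tar 460fc94ba3ccb88581053fc34d43cd98
mihawk-IBM-OP9-v2.4.1-4.31_prod.pnor.squashfs.tar 09a57ac7be43883c841c95059a752ded
EOF
```

A mismatch usually indicates an incomplete or corrupted download, so the file should be downloaded again rather than installed.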
After a successful update to this firmware level, the PNOR components and BMC should be at the following levels.
To display the PNOR level, use the following BMC command: "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION | grep -A 12 IBM"
And the BMC command line command "cat" can be used to display the BMC level: "cat /etc/os-release".
Note: FRU information for the PNOR level does not show the updated levels via the fru command until the system has been booted once at the updated level.
PNOR firmware level: driver content
IBM-mihawk-ibm-OP9_v2.4.1-4.31-prod
op-build-v2.3-rc2-509-g58296ed
buildroot-2019.05.2-11-g8e3337d
skiboot-v6.5.3-29-g74a7a87a-p550ec19
hostboot-6027b81-p18d8df0
occ-9047e57
linux-5.3.7-openpower1-pe53056a
petitboot-v1.11
machine-xml-b1fc7ca
hostboot-binaries-hw101520a.op940
capp-ucode-p9-dd2-v4
sbe-47abe2a-p5e8fd50
hcode-hw031221a.op940
BMC firmware level: driver content
BMC Primary side version:
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) "
VERSION="op940.22.mih-1"
VERSION_ID="op940.22.mih-1-0-g41157d8d2e"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) op940.22.mih-1"
BUILD_ID="op940.22.mih-1"
OPENBMC_TARGET_MACHINE="mihawk"
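The BMC level in the listing above can be checked from a script run on the BMC. The following is a minimal sketch. It relies on /etc/os-release using shell assignment syntax, and the helper name os_release_version is illustrative.

```shell
#!/bin/sh
# os_release_version [FILE]: prints the VERSION value from an os-release
# style file (default /etc/os-release). The file is sourced in a subshell
# so that the caller's variables are not modified.
os_release_version() {
    ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION" )
}

expected="op940.22.mih-1"   # BMC level delivered by this package

if [ -r /etc/os-release ]; then
    actual=$(os_release_version)
    if [ "$actual" = "$expected" ]; then
        echo "BMC is at $expected"
    else
        echo "BMC reports '$actual', expected '$expected'"
    fi
fi
```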
OP940 | |
PNOR OP9-v2.4.1-4.31 with BMC op940.22.mih-1 OP940.22
09/30/2021 | Impact: Security Severity: HIPER
System firmware changes that affect all systems
HIPER/Pervasive: A security problem was fixed that allowed a network attacker to use specially crafted IPMI messages to bypass authentication and gain full control of the system. This is security vulnerability CVE-2021-39296. |
PNOR OP9-v2.4.1-4.31 with BMC op940.21.mih-1 OP940.21
03/31/2021 | Impact: Availability Severity: HIPER
System firmware changes that affect all systems
HIPER/Pervasive: A problem was fixed for a processor core checkstop with SRC BC70E540 logged with Signature Description " ex(n0p1c4) (NCUFIR[11]) NCU no response to snooped TLBIE". This problem is intermittent and random but occurs with relatively high frequency for certain workloads. The trigger for the failure is one core of a fused core pair going into a stopped state while the other core of the pair continues running.
The BMC SNMP trap was enhanced to add the BMC uptime to the SNMP trap data.
|
PNOR OP9-v2.4.1-4.26 with BMC op940.20.mih-2 OP940.20
03/12/2021
|
The original PNOR image that was published was signed with the wrong key. At code update time, the signature validation failed and thus the PNOR image was not updated on the target machine.
This re-publish is to deliver the updated PNOR image. There are no functional changes in the image outside of those listed below.
|
PNOR OP9-v2.4.1-4.25 with BMC op940.20.mih-2 OP940.20
01/22/2021 | Impact: Security Severity: SPE
New features and functions:
Host firmware support for anti-rollback protection. This feature implements firmware rollback protection as described in NIST SP 800-147B “BIOS Protection Guidelines for Servers”. All host firmware is signed with a "secure version". The secure boot verification process will block any firmware secure version that is less than the "minimum secure version" that is maintained in the processor hardware. During the system power on the host firmware will update the "minimum secure version" to match the currently running firmware.
WARNING: Once this service pack is installed the system will not boot successfully with any service packs prior to this one. The system can be recovered by reloading this or any newer service pack.
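The rollback check described above can be modeled in a few lines. The following is an illustrative sketch only: it treats secure versions as plain integers, which is an assumption because the real encoding is not given in this document, and the function names boot_allowed and next_minimum are hypothetical.

```shell
#!/bin/sh
# Illustrative model only: secure versions are treated as plain integers,
# which is an assumption; the real encoding is not given in this document.

# boot_allowed FW_VERSION MIN_VERSION: succeeds unless the firmware's secure
# version is less than the minimum secure version held in processor hardware.
boot_allowed() {
    [ "$1" -ge "$2" ]
}

# next_minimum FW_VERSION MIN_VERSION: prints the minimum secure version after
# a power on, when it is updated to match the currently running firmware.
next_minimum() {
    if boot_allowed "$1" "$2"; then
        echo "$1"
    else
        echo "$2"   # boot was blocked, so the stored minimum is unchanged
    fi
}

# Walk through an update to a newer level followed by a downgrade attempt.
min=0
for fw in 0 1 0; do
    if boot_allowed "$fw" "$min"; then
        echo "firmware secure version $fw: boot allowed (minimum was $min)"
    else
        echo "firmware secure version $fw: boot blocked (minimum is $min)"
    fi
    min=$(next_minimum "$fw" "$min")
done
```

The third iteration corresponds to the WARNING above: once the newer level has booted and raised the stored minimum, an older level is blocked.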
Host support added for dynamic PCIe slot table.
Support added for Red Hat Enterprise Linux 8.2.
Support was added for a 16 GB DDR4 single-rank DIMM with feature code #8A2643. This memory DIMM can be used in place of #EM61, a dual-rank 16 GB DIMM, for faster memory performance.
Support was added to allow adapters with Radeon cards to be the petitboot console. This specifically includes the graphics adapter with IBM feature code #EC51. The default is for the system to use the embedded AST graphics card for the console. To change to the Radeon card, the user must manually change the Skiroot kernel command line by appending arguments that instruct which console to use. On the Petitboot shell, do one of the following:
1) Disable the default AST graphics card by blacklisting the driver for the embedded AST graphics card:
# nvram -p ibm,skiboot --update-config bootargs="`cat /proc/cmdline` module_blacklist=ast"
2) Or map the framebuffer console to the Radeon driver (with AST graphics still active):
# nvram -p ibm,skiboot --update-config bootargs="`cat /proc/cmdline` fbcon=map:N"
where "N" corresponds to the mapping function of currently active framebuffer devices (see 'Documentation/fb/fbcon.rst' in the Linux kernel source for more information).
The user can configure the console output back to the embedded AST graphics card by removing any 'module_blacklist=ast' or 'fbcon=map:N' entries from the 'bootargs' nvram setting, or by resetting it to the default:
# nvram -p ibm,skiboot --update-config bootargs=""
Support was added for a Manufacturing ID and a Product ID.
Support added for the VPD EEPROM.
Support for a GPU thermal policy was added so that when the GPU temperature exceeds 92 degrees C, the system will automatically shut down.
Support was added for the following GRUB2 enhancements for petitboot: 1) 'source' command is now supported. 2) UUID and label are now supported in the 'search' command.
System firmware changes that affect all systems
A problem was fixed to be able to detect a failed PFET sensing circuit in a core at runtime, and prevent a system failure with an incomplete state when a core fails to wake up. The failed core is detected on the subsequent IPL. With the fix, a core is called out with the PFET failure with SRC BC13090F and hardware description "CME detected malfunctioning of PFET headers." to isolate the error better with a correct callout.
A problem was fixed for mixing memory DIMMs with different timings (different vendors) under the same memory controller that fail with an SRC BC20E504 error and DIMMs deconfigured. This is an "MCBIST_BRODCAST_OUT_OF_SYNC" error. The loss of memory DIMMs can result in an IPL failure. This problem can happen if the memory DIMMs have a certain level of timing differences. If the timings are not compatible, the failure will occur on the IPL during the memory training. To circumvent this problem, each memory controller should have only memory DIMMs from the same vendor plugged.
A problem was fixed for PCIe resources under a deconfigured PCIe Host Bridge (PHB) being shown on the OS host as available resources when they should be shown as deconfigured. While this fix can be applied concurrently, a re-IPL of the system is needed to correct the state of the PCIe resources if a PHB had already been deconfigured.
A problem was fixed for a possible host hang if the BMC becomes unresponsive, such as during a BMC reboot. With the fix, the uart will drop console write message data to the BMC if the BMC does not respond to avoid a host hang.
A problem was fixed for a re-IPL after a system crash failing with a processor core in an incorrect state. This can happen if the core is in a STOP state to save power at the time of the initial crash and then this stopped core cannot be initialized correctly on the re-IPL.
A problem was fixed for non-optimal On-Chip Controller (OCC) processor frequency adjustments when system power limits or user power caps are exceeded. When a workload causes power limits or caps to be exceeded, there can be large frequency swings for the processors and a processor chip can get stuck at minimum frequency. With the fix, the OCC now waits for new power readings when changing the processor frequency and uses a master power capping frequency to keep all processors at the same frequency. As a workaround for this problem, do not set a power cap or run a workload that would exceed the system power limit.
A problem was fixed for an intermittent mismatch in the number of fan sensors versus the actual number of system fans. The problem can be circumvented by a re-IPL of the system. Fan sensors may be missing or there could be extra fan sensors when the problem happens.
A problem was fixed for the "Firmware Revision" showing as "0.00" when displayed by the "mc info" IPMI command. The REST commands show the "Firmware Revision" correctly.
A problem was fixed for a checkstop not generating any eSels or guards of hardware. Without the fix, there is no data to debug the system failure when this happens.
A problem was fixed for BMC recovery from errors that cause directory paths to be deleted on the BMC. This problem causes pervasive Common.Error.InternalFailure errors during BMC boots because of the missing directories.
A problem was fixed to tune the equalization settings for OpenCAPI.
A problem was fixed for intermittent host IPMI aborts caused by the use of invalid pointers that had been freed.
A problem was fixed to force PEC2 to Gen3 by default.
A problem was fixed for security vulnerability CVE-2020-14156. With this vulnerability, a weakness in file permissions allows a remote authenticated attacker to recover the cleartext password of IPMI users. Without the fix, the problem can be avoided by changing the file permissions to restrict access to the root user only using the following procedure:
A problem was fixed for failing fans not being called out in an error log and not being marked non-functional in inventory. Without the fix, this problem can be circumvented by manually starting the fan monitor service on the BMC twenty seconds after a power on has been requested, using the following BMC command: "systemctl start phosphor-fan-monitor@0.service".
A problem was fixed for the REST AccountService/Accounts APIs to provide more feedback to guide the user on why a local account creation or password change has failed. If the password is not good enough, the service will indicate that result with the exact reason provided from the Linux PAM modules. Creating a new local user account and updating the password of an existing local user account both require a password to be supplied. Without the fix, when this password is not acceptable (complexity rules, etc), the request fails, with a suggestion to use a better password, but a diagnostic message from PAM is not provided to help the user.
A problem was fixed for handling On-Chip Controller (OCC) UE errors so that the OCC can reset without terminating the system. Without the fix, the system will checkstop on any OCC UE that requires a reset of the OCC. A re-IPL of the system will recover from this error.
A problem was fixed for a system HMI that can occur if a GPU Address Translation Request (ATR) exceeds the time out period. With the fix, the timeout period was extended to allow the worst-case extra time for memory accesses if the data was not in the cache.
A security vulnerability was fixed to block access to unknown I2C devices on the TPM bus. With the fix, bus traffic is allowed only to the I2C bus address explicitly defined in the firmware. If an unknown address is used by the host Linux, an I2C timeout error is returned to the OS without forwarding the request to the TPM bus.
A problem was fixed for an OPAL boot hang if OPAL keeps aborting early in the boot path. With the fix, an OPAL Terminate Immediate (TI) is used to signal the BMC about the OPAL termination. This allows the BMC to quiesce the system for debug if the reboot tries have exceeded the reboot counter limit in the AutoReboot policy.
A problem was fixed for a failure in a GPU that can occur if the GPU is reset at run time.
A problem was fixed for allocation of memory on the LPC bus that is not a power of 2 being truncated. Using memory space that is not allocated can cause unexpected system failure as unowned memory becomes corrupted. This problem is rare because it can only occur for requests of memory that are near the maximum memory size for an allocation.
A problem was fixed for a spurious error message when Linux is first booted: " LPC[000]: Got SYNC no-response error. Error address reg: 0xd0010082". Without the fix, this error message can be ignored as it just indicates a boot progress indicator was not implemented by the BMC.
A problem was fixed for the I2C signal used to reset OpenCAPI adapters so that when the slot is powered off, the reset pin is no longer kept high by the I2C controller. This eliminates applying a voltage to part of the FPGA, which can send the FPGA into a bad state or even damage the card.
A problem was fixed for a GPU being reset from an error state remaining fenced and not usable.
A problem was fixed for not reporting OpenCAPI slot reset failures. Although resets were attempted to recover from errors, any resets that failed on adapters were not reported to the OS, which could result in hangs in operations directed at broken adapters. With the fix, a message is logged for the broken hardware stating that the PHY state could not be reset to working, and the OS does not attempt to use it.
A problem was fixed for the LED nomenclature to make it compatible with xCAT.
A problem was fixed for a policy table missing for the BMC web SEL logs.
A problem was fixed so that power fail events are repeatedly generated at runtime when there is a power fault. With the fix, the pseq-monitor generates the "PSU1 powerfail" events for as long as the fault with the power good signal exists.
A problem was fixed for GPU sensors showing a yellow "warning" even though no GPUs are installed.
A problem was fixed for an intermittent IPL failure with SRC B181E540 logged with fault signature " ex(n2p1c0) (L2FIR[13]) NCU Powerbus data timeout". No FRU is called out. The error may be ignored and the re-IPL is successful. The error occurs very infrequently. This is the second iteration of the fix that has been released. Expedient routing of the Powerbus interrupts did not occur in all cases in the prior fix, so the time-out problem was still occurring.
A problem was fixed for RHEL OpenShift boot failures caused by dependencies on unsupported features in petitboot GRUB2. Functionality in the GRUB2 parser was extended to allow RHEL OpenShift to boot correctly. Certain Red Hat cloud image installations were failing without the fix.
A problem was fixed for an LDAP Bind failure on the first configuration attempt from the BMC GUI. This problem applies to systems that have had a factory reset or to systems that are newly shipped from manufacturing when using LDAP for the first time from the BMC GUI. The BMC GUI is not setting the LDAP Bind password. The Bind password is needed for LDAP Authentication with the LDAP server. Without the fix, the workaround to this problem is to use openbmctool to configure LDAP. Here is an example of using the openbmctool command for LDAP configuration creation:
openbmctool -U root -P "0penBmc" -H <BMC IP address> ldap enable -a "<LDAP server URI>" -B "<bind DN of the LDAP server>" -b "<base DN of the LDAP server>" -p "<bind password of the LDAP server>" -S sub -t OpenLDAP
System firmware changes that affect certain systems
For systems with deconfigured cores, a rare problem was fixed for an incorrect voltage/frequency setting for the processors during heavy workloads with high ambient temperature. This error could impact power usage, expected performance, or system availability if a processor fault occurs. This problem can be avoided by not using CPU frequency scaling when there are cores deconfigured in the system. Instead, use the "powersave" governor to run the CPUs at the minimum frequency or the "userspace" governor to run the CPUs at a user-specified fixed frequency.
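The governor change described above is normally made through the Linux cpufreq sysfs files. The following is a minimal sketch, assuming the standard /sys/devices/system/cpu layout. The helper name set_governor is illustrative, and the optional second argument exists only so that the function can be exercised against a test directory.

```shell
#!/bin/sh
# set_governor GOVERNOR [CPU_ROOT]: writes GOVERNOR to every
# cpufreq/scaling_governor file under CPU_ROOT (default /sys/devices/system/cpu).
# Writing to the real sysfs tree requires root.
set_governor() {
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue       # the glob matched nothing
        printf '%s\n' "$1" > "$f" || return 1
    done
}

# Report the governors currently in use; changing them is left to the caller.
for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    if [ -r "$f" ]; then
        printf '%s: %s\n' "$f" "$(cat "$f")"
    fi
done
```

Running "set_governor powersave" as root would then apply the workaround to all CPUs. The setting does not persist across a reboot.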
|
PNOR OP9-v2.4-4.37 with BMC op940.00.mih-5 OP940.00
02/07/2020 | Impact: New Severity: New
New features and functions:
Improved BMC password policy. For the BMC, the root password must be set on first use for newly manufactured systems and after a factory reset of the system. This policy change helps ensure that the BMC is not left in a state with a well-known password. In firmware level OP940.00 and later, the root password is expired by default. The BMC administrator must change the password before the functions of the BMC can be accessed. If you are upgrading from a previous OpenBMC firmware level, you do not have to change the password. The administrator can change from the default password to a new password using one of the following interfaces:
1. The Web GUI.
2. The Redfish APIs, to change your expired password from a network interface.
3. The openbmctool set_password subcommand, to change your expired password from the OpenBMC tool command.
Examples of using these interfaces for expired password recovery can be found in the following IBM Knowledge Center article:
Support was added for using DD2.2 and DD2.3 version P9 processors in the same system.
Support was added for Redfish APIs on the BMC. OpenBMC-based systems can be managed by using the DMTF Redfish APIs. Redfish is a REST API used for platform management and is standardized by the Distributed Management Task Force, Inc. (http://www.dmtf.org/standards/redfish). For more information, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_redfish.htm
Support was added for SSL certificate upload and generation. For more information, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_ssl.htm
Support was added for IPMI version 2.0 on the BMC. IPMI network access can be disabled. The command to disable IPMI network access is: "ipmitool lan set 1 access off". You can use in-band IPMI if you want to re-enable IPMI network access: "ipmitool lan set 1 access on".
Support was added for Lightweight Directory Access Protocol (LDAP) on the BMC. For more information on using LDAP, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_ldap.htm
Support was added for creating multiple local user accounts on the BMC. For more information on managing user accounts to add or remove new users, modify user settings, manage user account policy settings, and view privilege role descriptions, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_usermanage.htm
Support was added for KVM (Keyboard, Video and Mouse) consoles on the BMC. For more information on how to use the remote KVM console, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_kvm.htm
Support was added for virtual media devices on the BMC. For more information on how to use virtual media device to start a session, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_virtualmedia.htm
Support was added for remote logging in the BMC GUI.
Support was added for boot options in the BMC GUI.
Support was added for SNMP traps and alerts on the BMC. For more information on SNMP settings on the BMC, see the IBM Knowledge Center at the following link: https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_snmpsettings.htm
New thermal modes added to support PCIe4 adapters with feature codes #EC62, #EC64, #EC6E, and #EC6G for both air and water cooled systems. Please refer to IBM Knowledge Center for selection of the appropriate thermal mode for the PCIe4 adapter, system models, and cable types: https://www.ibm.com/support/knowledgecenter/en/POWER9/p9ei3/p9ei3_thermal_mode.htm
This article provides guidance on which thermal mode to select and how to use openbmctool to select it. Be aware that the thermal modes needed to supply additional cooling are not the system default and must be selected manually. Once a new thermal mode is selected, with the system powered off or on, it persists until it is changed by the user or a factory reset of the system is done. The minimum openbmctool version needed to select the thermal mode is v1.14, and the recommended version is v1.17. There is no functional difference between these versions for thermal modes, but the error messages were updated in v1.17 to reflect that systems at the OP940 release level and later do not have to be powered on to set the thermal mode, as was required in earlier releases. The thermal modes that can be selected include "DEFAULT", "CUSTOM", "HEAVY_IO", and "MAX_BASE_FAN_FLOOR". The system comes pre-selected with "DEFAULT", which may not provide enough cooling for some adapters and configurations. Note: If your system has one of the affected PCIe4 adapters and you are using optical cables for that adapter, a thermal mode other than "DEFAULT" must be selected with openbmctool to get the proper cooling for the system.
|
OS levels supported by the LC 9183 servers:
- Red Hat Enterprise Linux 8 for POWER, or later, with all available maintenance updates
- Red Hat Enterprise Linux 7 for POWER9, version 7.6, or later, with all available maintenance updates
- NVIDIA Tesla CUDA recommended driver level 396.44 or later, or minimum driver level 396.26 from the CUDA 9.2 toolkit
Additional OS level supported:
- Ubuntu Server 18.04.1, with all available maintenance updates.
IBM Power LC 9183 servers support Linux, which provides a UNIX-like implementation across many computer architectures. Linux supports almost all of the Power System I/O, and the configurator verifies support on order. For more information about the software that is available on IBM Power Systems, see the Linux on IBM Power Systems website:
http://www.ibm.com/systems/power/software/linux/index.html
The Linux operating system is an open source, cross-platform OS. It is supported on every Power Systems server IBM sells. Linux on Power Systems is the only Linux infrastructure that offers both scale-out and scale-up choices.
A supported version of Linux on the Power LC 9183 is Red Hat Enterprise Linux 7.6 for IBM Power LE (POWER9) (RHEL 7.6-ALT LE).
For additional questions about the availability of this release and supported Power servers, consult the Red Hat Hardware Catalog at
https://access.redhat.com/products/red-hat-enterprise-linux/#addl-arch
For a system that is configured without GPUs, there is the option of using Linux Ubuntu 18.04 or later as the OS.
For more information about Linux on Power, see the Linux on Power developer center at
https://developer.ibm.com/linuxonpower/
For information about the features and external devices that are supported by Linux, see this website:
http://www.ibm.com/systems/power/software/linux/index.html
Use one of the following commands at the Linux command prompt to determine the current Linux level:
•cat /proc/version
•uname -a
The output string from the command will provide the Linux version level.
The opal-prd package on the Linux system collects the OPAL Processor Recovery Diagnostics messages to log file /var/log/syslog. It is recommended that this package be installed if it is not already present as it will help with maintaining the system processors by alerting the users to processor maintenance when needed.
On Red Hat Linux, run the command "rpm -qa | grep -i opal-prd". If the rpm for opal-prd is found and displayed in the command output, the package is installed on your system. This package provides a daemon to load and run the OpenPOWER firmware's Processor Recovery Diagnostics binary, which is responsible for run-time maintenance of Power hardware. If the package is not installed on your system, the following command can be run on Red Hat to install it:
sudo yum install opal-prd
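The package check can also be written as a small script. The sketch below uses "rpm -q", which queries the exact package name, and falls back to a message on systems without rpm. The function name opal_prd_status is illustrative only.

```shell
#!/bin/sh
# Sketch: report whether the opal-prd package is installed on an
# RPM-based system. The function name is hypothetical.
opal_prd_status() {
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found: this check applies to RPM-based distributions"
    elif rpm -q opal-prd >/dev/null 2>&1; then
        # rpm -q prints the full name-version-release of the package.
        echo "installed: $(rpm -q opal-prd)"
    else
        echo "not installed"
    fi
}

opal_prd_status
```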
To display the PNOR level, use the following BMC command: "cat /var/lib/phosphor-software-manager/pnor/ro/VERSION"
To display the BMC level, use the following BMC command: "cat /etc/os-release"
Note: The "cat" commands are run after using ssh to log in to the BMC as root. The default password is 0penBmc (where 0 is the zero character).
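The two levels can also be read from a workstation over ssh. The sketch below assumes that /etc/os-release on the BMC uses the usual NAME="value" layout and that the level is carried in a VERSION_ID field; both are assumptions to check against the actual file. The helper name os_release_field and the bmc_ip_address placeholder are illustrative.

```shell
#!/bin/sh
# Print the value of one field from os-release style text read on stdin,
# with surrounding double quotes removed. Helper name is hypothetical.
os_release_field() {
    sed -n "s/^$1=//p" | tr -d '"'
}

# Usage from a workstation (bmc_ip_address is a placeholder):
#   ssh root@bmc_ip_address cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
#   ssh root@bmc_ip_address cat /etc/os-release | os_release_field VERSION_ID
```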
Follow the instructions on Fix Central. You must read and agree to the license agreement to obtain the firmware packages.
WARNING: Once the OP940.20 service pack is installed, the system will not boot successfully with any service packs prior to this one. The system can be recovered by reloading this or any newer service pack.
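Before loading an older image, the warning above can be turned into a quick check. The sketch below compares a candidate level string with the OP940.20 floor by using version sort. It assumes that level names follow the OPnnn.nn pattern used in this document and that the workstation's sort supports the -V option (GNU coreutils); the function name is illustrative.

```shell
#!/bin/sh
# Returns 0 (true) when the candidate level is older than the floor level.
# Assumes "sort -V" (version sort) is available on the workstation.
is_below_floor() {
    candidate=$1
    floor=${2:-OP940.20}
    [ "$candidate" != "$floor" ] &&
        [ "$(printf '%s\n%s\n' "$candidate" "$floor" | sort -V | head -n 1)" = "$candidate" ]
}

for level in OP940.00 OP940.20 OP940.22; do
    if is_below_floor "$level"; then
        echo "$level: older than OP940.20, will not boot once OP940.20 is installed"
    else
        echo "$level: OK"
    fi
done
```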
The updating and upgrading of system firmware depends on several factors, such as the current firmware that is installed and which operating system is running on the system.
These scenarios and the associated installation instructions are comprehensively outlined in the firmware section of Fix Central, found at the following website:
http://www.ibm.com/support/fixcentral/
Any hardware failures should be resolved before proceeding with the firmware updates to help ensure that the system will not be running degraded after the updates.
The process of updating firmware on the OpenBMC managed servers is documented below.
The sequence of events that must happen is the following:
•Power off the Host
•Update and Activate BMC
•Update and Activate PNOR
•Reboot the BMC (applies new BMC image)
•Power on the Host (applies new PNOR image)
The OpenBMC firmware updates (BMC and PNOR) for the LC 9183 servers can be managed via the command line with the openbmctool.
The openbmctool is obtained using the IBM Support Portal.
1.Go to the IBM Support Portal.
2.In the search field, enter your machine type and model. Then click the correct product support entry for your system.
3.From the Downloads list, click the openbmctool for your machine type and model.
4.Follow the instructions to install and run the openbmctool. You will need to provide the file locations of the BMC firmware image tar and PNOR firmware image tar that must be downloaded from Fix Central for the update level needed.
Information on the openbmctool and the firmware update process can be found in the IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/POWER9/p9ei8/p9ei8_update_firmware_openbmctool.htm
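The five-step update sequence described in this section can be sketched as a dry-run script that prints one openbmctool command per step, in order, without executing anything. The subcommand names ("chassis power off", "firmware flash bmc -f", "firmware flash pnor -f", "bmc reset cold", "chassis power on") and the -A password prompt are assumptions based on common openbmctool usage, and the host, user, and tar file names are placeholders; check each command against "openbmctool -h" and the Knowledge Center instructions before running it.

```shell
#!/bin/sh
# Dry-run sketch: prints the planned commands in the required order.
# Subcommand names are assumptions; host, user, and file names are placeholders.
update_plan() {
    bmc_host=$1
    bmc_user=$2
    bmc_tar=$3
    pnor_tar=$4
    base="openbmctool -H $bmc_host -U $bmc_user -A"
    echo "$base chassis power off"                 # Power off the host
    echo "$base firmware flash bmc -f $bmc_tar"    # Update and activate BMC
    echo "$base firmware flash pnor -f $pnor_tar"  # Update and activate PNOR
    echo "$base bmc reset cold"                    # Reboot the BMC
    echo "$base chassis power on"                  # Power on the host
}

update_plan bmc.example.com root bmc-image.tar pnor-image.tar
```

When the steps are run for real, allow the BMC to finish rebooting before issuing the final power-on command.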
The service processor, or baseboard management controller (BMC), provides a hypervisor and operating system-independent layer that uses the robust error detection and self-healing functions that are built into the POWER processor and memory buffer modules. OpenPOWER application layer (OPAL) is the system firmware in the stack of POWER processor-based Linux-only servers.
The service processor, or baseboard management controller (BMC), is the primary control for autonomous sensor monitoring and event logging features on the LC server.
The BMC supports the Intelligent Platform Management Interface (IPMI) for system monitoring and management. The BMC monitors the operation of the firmware during the boot process and also monitors the OPAL hypervisor for termination.
Various risks that are associated with the Intelligent Platform Management Interface (IPMI) have been identified and documented in the information technology (IT) security community.
Possible risks include the following three Common Vulnerabilities and Exposures (CVEs):
1) CVE-2013-4037:
The Remote Authenticated Key-Exchange Protocol (RAKP), which is specified by the IPMI standard for authentication, has flaws. Although the system does not allow the use of null passwords, a hacker might reverse engineer the RAKP transactions to determine a password. The authentication process for IPMI requires the management controller to send a hash of the requested password of the user to the client before the client authenticates. This process is a key part of the IPMI specification. The password hash can be broken by using an offline brute force or dictionary attack.
2) CVE-2013-4031:
IBM Power Systems and OpenPower Systems are preconfigured with one IPMI user account, which has the same default login name and password on all affected systems. If a malicious user gains access to the IPMI interface by using this preconfigured account, the user can power off or on, or restart the host server, and create or change user accounts possibly preventing legitimate users from accessing the system. On OpenPower Systems, the default IPMI user name is root. Additionally, if a user fails to change the default user name and password on each of the systems that is deployed, the user has the same login information for each of those systems.
3) CVE-2013-4786:
The IPMI 2.0 specification supports RMCP+ Authenticated Key-Exchange Protocol (RAKP) authentication, which allows remote attackers to obtain password hashes and conduct offline password guessing attacks by obtaining the hash-based message authentication code (HMAC) from a RAKP message 2 response from a BMC.
If you are not managing a server by using IPMI, you can configure the system to disallow IPMI network access from the user accounts. This task can be accomplished by using the IPMItool utility or a similar utility for managing and configuring the IPMI management controllers. Use the following IPMItool command to disable the network access for an IPMI user:
ipmitool channel setaccess 1 #user_slot# privilege=15
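As a sketch of how the #user_slot# placeholder is filled in, the helper below validates a numeric slot and prints the resulting command without executing it. The slot numbers in use can be listed with "ipmitool user list 1", and the result can be checked afterward with "ipmitool channel getaccess 1 <user_slot>". The function name and the example slot number 2 are illustrative only.

```shell
#!/bin/sh
# Sketch: print the command that sets a user slot on a channel to
# privilege level 15 (no access). Nothing is executed.
disable_ipmi_user_cmd() {
    channel=$1
    user_slot=$2
    # Reject empty or non-numeric slot values.
    case "$user_slot" in
        ''|*[!0-9]*)
            echo "user slot must be a number: $user_slot" >&2
            return 1
            ;;
    esac
    echo "ipmitool channel setaccess $channel $user_slot privilege=15"
}

# Example with a hypothetical slot number.
disable_ipmi_user_cmd 1 2
```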
For more information on the IPMI security vulnerabilities and configuration options and best practices to minimize the risks of this interface, go to the IBM Knowledge Center at the following URL:
https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_openbmc_security.htm
The OpenPOWER Abstraction Layer (OPAL) provides hardware abstraction and run time services to the running host Operating System.
For the 9183 servers, only the OPAL bare-metal installs can be used.
Find out more about OPAL skiboot here:
https://github.com/open-power/skiboot
The Intelligent Platform Management Interface (IPMI) is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. The LC 9183 servers provide two 1G baseT IPMI ports.
The ipmitool is a utility for managing and configuring devices that support IPMI. It provides a simple command-line interface to the service processor. You can install the ipmitool from the Linux distribution packages in your workstation, sourceforge.net, or another server (preferably on the same network as the installed server).
For installing ipmitool from sourceforge, please see section 1.1 "Minimum ipmitool Code Level".
There are several good references for ipmitool commands:
•The man page
•The built-in command line help, which provides a list of IPMItool commands:
# ipmitool help
You can also get help for many specific IPMItool commands by adding the word help after the command:
# ipmitool channel help
For a list of common ipmitool commands and help on each, you may use the following link:
https://www.ibm.com/support/knowledgecenter/POWER9/p9eih/p9eih_managing_with_ipmi.htm
To connect to your host system with IPMI, you need to know the IP address of the server and have a valid password. To power on the server with the ipmitool, follow these steps:
1. Open a terminal program.
2. Power on your server with the ipmitool:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password power on
3. Activate your IPMI console:
ipmitool -I lanplus -H bmc_ip_address -P ipmi_password sol activate
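The same steps can be run without typing the password on the command line. The sketch below builds the ipmitool command lines with the -E option, which makes ipmitool read the password from the IPMI_PASSWORD environment variable, and adds a "power status" query before the power-on. The helper only prints the commands; the function name and the 192.0.2.10 address are placeholders.

```shell
#!/bin/sh
# Sketch: print ipmitool command lines that take the password from the
# IPMI_PASSWORD environment variable (-E) instead of the -P option.
ipmi_cmd() {
    bmc_ip=$1
    shift
    echo "ipmitool -I lanplus -H $bmc_ip -E $*"
}

# Placeholder address; export IPMI_PASSWORD before running the printed commands.
ipmi_cmd 192.0.2.10 power status
ipmi_cmd 192.0.2.10 power on
ipmi_cmd 192.0.2.10 sol activate
```

Keeping the password in an environment variable avoids exposing it in the shell history and in the process list of the workstation.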
Petitboot is a kexec-based bootloader used by IBM POWER9 systems for doing the bare-metal installs on the 9183 servers.
After the POWER9 system powers on, the petitboot bootloader scans local boot devices and network interfaces and returns a list of the boot options that are available to the system. If you are using a static IP or if you did not provide boot arguments in your network boot server, you must provide the details to petitboot. You can configure petitboot to find your boot options with the following instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabppetitbootadvanced.html
You can edit petitboot configuration options, such as the amount of time before petitboot automatically boots, with these instructions:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabppetitbootconfig.html
After you select to boot the ISO media for the Linux distribution of your choice, the installer wizard for that Linux distribution walks you through the steps to set up disk options, your root password, time zones, and so on.
You can read more about the petitboot bootloader program here:
https://www.kernel.org/pub/linux/kernel/people/geoff/petitboot/petitboot.html
This guide helps you install Linux on a Power Systems server.
Overview
Use the information found in https://www.ibm.com/support/knowledgecenter/linuxonibm/liabw/liabwp9kickoff.html to install Linux on a non-virtualized (bare metal) IBM Power LC server.
Date | Description |
09/30/2021 | OP940.22 release |
03/31/2021 | OP940.21 release |
03/12/2021 | OP940.20 republish – to re-sign the PNOR image with the correct key |
01/22/2021 | OP940.20 release – Removed LDAP Bind failure guidance, Anti-rollback Support |
02/07/2020 | New for IC922 LC servers for the OP940.00 release |