PCIe4 2-port 100GbE RoCE Adapter (FC:  EC75, EC76)

 

 

******* PLEASE READ THIS ENTIRE NOTICE *********

 
DATE: July 22, 2024

Table of Contents

1.0     Microcode and Document Revision History

2.0     General information

3.0     Installation time

4.0     Machines Affected

5.0     Linux Requirements

6.0     AIX Requirements

6.1     AIX APARS

7.0     Determine the Current Microcode Level for AIX

8.0     Downloading the RPM Format File to the Target Server for AIX

9.0     Discovery Tool Microcode CD-ROM creation and download instructions

10.0        Verifying microcode before download and notes for AIX

11.0        Microcode Download Procedure for AIX

11.1        Setting up for Microcode download

11.2        Downloading Microcode to the Adapter

11.3        Re-configure and Verify adapters

 

1.0     Microcode and Document Revision History

 

Firmware Level

Description

22.40.1000 / 002200401000

Impact: Usability  Severity: ATT

The following list covers bug fixes between 22.31.2006 and 22.40.1000 that could potentially affect this product. The full list can be found in the NVIDIA documentation:

Bug Fixes History - NVIDIA Docs

 

·        Fixed a rare issue that prevented changes in mlxconfig from taking effect upon warm reboot.

·        Modified the TCP IPv4 flows so that the steering TIR rx_hash_symmetric field is now valid only when both the SRC and DST fields are not set to zero.

·        Added a locking mechanism to protect the firmware from a race condition between insertion and deletion of the same rule in parallel. Such behavior occasionally resulted in firmware accessing a memory that has already been released, thus causing IOMMU / translation error.

·        Fixed an issue in the steering definers used in LAG with IPv6 packets.

·        Added a safety mechanism to prevent the link from getting stuck when receiving bad tuning results. In this case, the linkup flow is restarted and the mechanism retries to raise the link.

·        Fixed an issue that resulted in stuck IO when handling a software WQE with no request for CQE.

·        Fixed an inaccurate rate issue when running with multiple flows.

·        Fixed an issue that caused the commands sent by the MLNX_OFED driver to the NIC to fail when loading the VirtIO driver.

·        Fixed an issue where, due to a firmware limitation, enabling tx_port_ts resulted in syndrome 0x5d2974.

·        Fixed an issue that caused the NIC to access the host memory when in idle mode.

·        Added a 50 usec delay during PML1 exit to avoid PCIe replay timer timeouts.

·        Fixed a rare HW/FW timing race of serdes' power-up sequence.

·        Fixed an issue that resulted in temporary packet drops while changing PTP/FCS configuration when the links were up.

·        Fixed an issue that resulted in wrong port calibration due to incorrect mapping of the port during initialization stage.

·        Fixed an issue that resulted in the firmware getting stuck and causing unexpected behavior when connecting an optical transceiver that supports RXLOS while the remote side port was down.

·        Fixed an issue that caused the link status to be reported incorrectly and consequently caused the link to go down due to the wrong definition of the RX_LOS polarity in the INI.

·        Limited the external loopback speed to the used module's capabilities.

·        Modified the Rx flow to go directly to the QP without going through the RX steering flow, so that the Tx and Rx statistics are reflected correctly.

·        Fixed a BER issue on the Serdes by updating the mapping of logical to physical port configuration for Link-Maintenance flow.

·        Fixed an issue that caused some commands to get stuck or fail when configuring the HCA_CAP.cmdif_checksum to 0x3 and using firmware version lower than 22.31.1004.

·        Removed firmware dependency on credits reset during link reset flow.

·        Fixed a rare case of a doorbell drop that caused the Rx side to get stuck when running traffic on top of a virtio device.

·        Fixed unexpected and excessive interrupts caused by internally misconfigured EQs that consumed PCI bandwidth, introduced PCIe latency, and as a result caused virtio Tx pps degradation.

·        Fixed unexpected and excessive interrupts received by the host when running virtio emulation application traffic due to an internally misconfigured EQ in the NIC.

The Firmware Levels Below Are No Longer Supported By IBM Once They Have Been Removed From The Microcode Download Website.

It is best practice to update to the latest FW level, not only for IBM support of these products, but also for optimal performance and to ensure that all of the required HW/FW fixes are installed. Once new FW has been released to the field, we will provide a 6-month grace period for customers to update these products to the currently supported FW level.

Please Update To The Latest Level At Your Earliest Convenience

 

22.31.2006 / 002200312006

Impact: Availability Severity: ATT

 

1. Fixed the rate select mechanism in QSFP modules.

2. Fixed classification issues for "Passive" cables to be more robust.

3. Fixed an issue that caused a fatal error, and eventually resulted in the HCA hanging when a packet was larger than a strided receive WQE that was being scattered.

4. Fixed an issue that caused Tx to hang when a duplicate packet rollback occurred.

5. Fixed an issue that prevented events from being sent when only the DCBX oper version was changed.

6. Fixed an issue that prevented a SFP28 cable from linking up in a 25GbE speed.

7. Increased the default number of outstanding read bytes on the PCIe link for PCIe Gen4 devices when working in PCIe Gen3 servers. This will enable the NIC to maximize the PCIe link and achieve maximum bandwidth.

22.24.8000 (for AIX and Linux)

Impact: NEW Severity: NEW   

Original Release for EC75 and EC76 adapter

 

Document Revision History

Description

V1.0 – 08/17/2018

Original Release

V1.1 – 08/30/2018

Added AIX APARS section

V1.2 – 04/25/2019

Updated instructions for new fw 16.24.8000 / 001600248000 release

V1.3 – 05/04/2021

Updated instructions for new fw 16.29.1017 / 001600291017 release
Updated General Information section with SRIOV info.

V1.4 – 07/18/2022

Updated instructions for new fw 22.31.2006 / 002200312006 release

V1.5 – 07/22/2024

Updated instructions for new fw 22.40.1000 / 002200401000 release

 

 

2.0     General information

 

This Readme file gives directions on how to update the microcode on the PCIe4 2-port 100GbE RoCE Adapter.

 

1.   Non-Concurrent Download (Linux Only)

The microcode installation does NOT support concurrent download in Linux. The device can be used during and after the download, but the update will not take effect until a reboot is performed.

 

 

2.   It is recommended that the installation be scheduled during a maintenance window or during non-peak production periods.

 

3.   It is best practice to update to the latest FW level, not only for IBM support of these products, but also for optimal performance and to ensure that all the required HW/FW fixes are installed.

 

4.   Once new FW has been released to the field, we will provide a 6-month grace period for customers to update these products to the currently supported FW level.

 

5.   Adapter in PowerVM SRIOV shared mode

- These adapter firmware release notes apply to adapters configured in dedicated mode.

- When the adapter is transitioned to SRIOV mode, the system firmware updates the adapter firmware, which may differ from the firmware used in dedicated mode.

- When the adapter is moved back to dedicated mode, the user will need to update the adapter firmware to the level mentioned in these release notes.

 

6.   Release Notes for adapter firmware in PowerVM SRIOV shared mode.

Please visit fix central (http://www.ibm.com/support/fixcentral/) and review the release notes pertaining to your system MTM and installed system firmware.

 

7.   For more information about adapters running in  PowerVM SRIOV shared mode visit:  SRIOV FAQs   and vNIC FAQs

 

 

 

3.0     Installation time

 

Approximately 20 minutes.

 

4.0     Machines Affected

 

Feature Code: EC76

·       9786-42H

·       9105-41B/42A

·       9043-MRX

·       9040-MR9

·       9009-41A/41G/42A/42G

·       9223-42H/42S

 

Feature Code: EC75

·       9786-22H

·       9105-22A/22B

·       9080-M9S/HEX

·       9009-22A/22G

·       9008-22L

·       9223-22H/22S

·       9183-22X

·       8335-GTH/GTX

 

5.0     Linux Requirements

 

For Linux operating systems, use the following procedure:

 

1. Find the PCI slot the ConnectX-5 adapter is plugged into. Issue the following command:

 

lspci -nn | grep "1019"

 

For example:

# lspci -nn |grep 1019

0001:01:00.0 Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]

0001:01:00.1 Ethernet controller [0200]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] [15b3:1019]

Note:

a. This output shows that the adapter is at PCIe address 0001:01:00.0, which is needed in the following steps.

b. Each port is listed with its own PCIe ID:

   0001:01:00.0 – Port 1 of the adapter

   0001:01:00.1 – Port 2 of the adapter

c. Either port's PCIe ID can be used to perform the microcode install.
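
The lookup in step 1 can also be scripted. The following is a minimal sketch, not part of the official procedure: the function name list_adapter_ports is hypothetical, and the default ID 15b3:1019 is simply the vendor:device ID shown in the example output above. It reads "lspci -nn" output on standard input and prints one PCI address per port.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): print the PCI address, which is
# the first field, of every lspci line carrying the given vendor:device ID.
# The default ID is the one shown in the example output of step 1.
list_adapter_ports() {
    id="${1:-15b3:1019}"
    # index() does a literal substring match, so the brackets and colon
    # in "[15b3:1019]" need no escaping.
    awk -v id="[$id]" 'index($0, id) { print $1 }'
}

# Typical use on the target system:
#   lspci -nn | list_adapter_ports
```

Either address printed by the helper can then be passed to flint in the steps that follow.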

 

2. Start the MFT tools by running "mst start":

 

# mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI module - Success

Loading MST PCI configuration module - Success

Create devices

 

If the system does not have the mst command available, please install the Mellanox Firmware Tools (MFT) available here: http://www.mellanox.com/page/management_tools

 

 

 

3. Verify the current firmware level by running "flint -d <pci bus ID> q". Replace <pci bus ID> with the ID found by the lspci command in step 1.

 

# flint -d 0001:01:00.0 q

Image type:            FS4

FW Version:            22.40.1000

FW Release Date:       4.2.2024

Product Version:       22.40.1000

Description:           UID                GuidsNumber

Base GUID:             506b4b0300350142        4

Base MAC:              506b4b350142            4

Image VSD:             N/A

Device VSD:            N/A

PSID:                  IBM0000000037

Security Attributes:   N/A

 

This command reports the current FW version, which is 22.40.1000 in this example. If the version is lower than 22.40.1000, please update.
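
The version check in step 3 can be sketched in shell as follows. This is an illustration under stated assumptions: both function names are hypothetical, the parsing relies on the "FW Version:" line format shown in the sample output above, and the comparison assumes GNU sort with its -V (version sort) option is available.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only) for the check in step 3.

# Extract the value of the "FW Version:" line from "flint ... q" output
# supplied on standard input.
fw_version_from_query() {
    awk '/^FW Version:/ { print $3; exit }'
}

# Succeed (exit status 0) only when version $1 is older than version $2.
# Assumes GNU sort, whose -V option orders dotted version strings.
fw_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Typical use on the target system:
#   cur=$(flint -d 0001:01:00.0 q | fw_version_from_query)
#   fw_is_older "$cur" 22.40.1000 && echo "update required"
```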

 

4. Download the firmware RPM file to the destination machine.

 

Use this method to download the new microcode to a Linux system:

NOTE: The instructions that follow are specific Linux commands. Linux commands are CASE (lower and upper) SENSITIVE, and must be entered exactly as shown, including filenames.

  1. Transfer the RPM format file to the /tmp directory (using "Save as...."). You will see the filename for the RPM file. 
  2. Install the RPM package on your Linux system by typing:

rpm -ivh /tmp/b315191014103506.002200401000.Linux.rpm

 

The microcode package will install the firmware images in the /lib/firmware directory. If a message is displayed saying the "package <package_name> is already installed", you will need to uninstall the listed rpm package. On the command line type:

rpm -e <package_name>

where <package_name> is the name of the package that was returned in the message. Return to Step 2 and attempt to install the file again.

 

  1. The microcode file will be installed to the /lib/firmware directory. The file name is

 

5. Verify the contents of the image before flashing:

ls -l /lib/firmware/b315191014103506.002200401000 to verify file size:

·        b315191014103506.002200401000 = 33554432

 

sum /lib/firmware/b315191014103506.002200401000 to verify Checksum:

·        b315191014103506.002200401000 = 38492
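
The size and checksum comparison in step 5 can be automated. The sketch below is an illustration only: verify_image is a hypothetical name, and it assumes the local "sum" utility uses the same algorithm that produced the checksum published above. It returns 0 only when both values match.

```shell
#!/bin/sh
# Hypothetical helper (illustration only) for step 5: compare a file's size
# in bytes and its "sum" checksum against the expected values.
# Returns 0 if both match, 1 on a mismatch, 2 if the file is unreadable.
verify_image() {
    file="$1"; want_size="$2"; want_sum="$3"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    got_size=$(wc -c < "$file")
    got_sum=$(sum "$file" | awk '{ print $1 }')
    # Compare numerically so padding or leading zeros do not matter.
    awk -v gsize="$got_size" -v wsize="$want_size" \
        -v gsum="$got_sum" -v wsum="$want_sum" \
        'BEGIN { exit !((gsize + 0 == wsize + 0) && (gsum + 0 == wsum + 0)) }'
}

# Example with the values listed in step 5:
#   verify_image /lib/firmware/b315191014103506.002200401000 33554432 38492
```

Do not proceed to the burn step if the helper reports a mismatch.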

 

 

6. Update the FW with "flint -d <pci bus id> -i <image filename> burn"

 

 

 

# flint -d 0001:01:00.0 -i b315191014103506.002200401000 burn

 

    Current FW version on flash:  22.31.2006

    New FW version:               22.40.1000

 

Burning FW image without signatures - OK

Restoring signature                  - OK

 

 

7. At this point the new FW has been burned onto the Mellanox adapter. However, the new FW will not take effect until the system is rebooted.

Reloading the drivers is not enough; the Linux partition must be rebooted.

 

 

6.0     AIX Requirements

 

The adapter is supported starting with the August 2018 fix pack of:

·       AIX 7.2 with Technology Level 2 and Service Pack 2 and above

·       AIX 7.1 with Technology Level 5 and Service Pack 2 and above

·       VIOS 2.2.6.23 and above

 

If you are using another release of AIX, ensure that the adapter is supported on that release before you install the adapter. Contact service and support for assistance.

6.1       AIX APARS

The following APARs must be installed before installing the microcode. Failure to do so may result in failures in AIX advanced diagnostics.

IJ37699

IJ08890

 

7.0     Determine the Current Microcode Level for AIX

 

Before you install the microcode, it is important to determine the microcode level of the Adapter installed in the target system. Use the following instructions to read the ROM level stored in the Adapter's VPD.

A.    List all PCIe4 2-port 100GbE RoCE Adapters installed in the system by typing:
  lsdev |grep b315191014103506

B.    To check the current microcode level for the adapter or controller enter the following command:

lsmcode -cd entX

Where "X" is the instance of the adapter. The command will produce output similar to:

# lsmcode -cd ent1

The current microcode level for ent1 is 002200401000

 

If the ROM Level is less than 002200401000 you should update the microcode.
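
The level comparison in step B can be sketched in shell. This is an illustration only: both function names are hypothetical, and the parsing relies on the lsmcode message format shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helpers (illustration only) for the check in step B.

# Pull the level (the last field) out of a line such as
# "The current microcode level for ent1 is 002200401000".
level_from_lsmcode() {
    awk '/current microcode level/ { print $NF; exit }'
}

# Succeed (exit status 0) only when level $1 is numerically lower than $2.
level_is_older() {
    awk -v cur="$1" -v tgt="$2" 'BEGIN { exit !(cur + 0 < tgt + 0) }'
}

# Typical use on the target system:
#   lvl=$(lsmcode -cd ent1 | level_from_lsmcode)
#   level_is_older "$lvl" 002200401000 && echo "update required"
```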

8.0     Downloading the RPM Format File to the Target Server for AIX


Use this method to download to an AIX system:

 

NOTE: The instructions that follow are specific AIX commands. AIX commands are CASE (lower and upper) SENSITIVE, and must be entered exactly as shown, including filenames.

 

A.    Make two directories on your AIX system to receive the RPM format file.
Enter:      "mkdir /tmp/microcode"
and then create this directory
Enter:      "mkdir /tmp/microcode/RPM"

 

B.    Transfer the RPM format file to the /tmp/microcode/RPM directory (using "Save as ..."). Change to that directory with "cd /tmp/microcode/RPM"; you will see the filename for the RPM file. Unpack the file by entering:
       "rpm -ihv --ignoreos b315191014103506.002200401000.aix.rpm"

C.   For AIX:  The microcode files will be added to /etc/microcode/. 

D.   Microcode file will be copied to "/etc/microcode".  The file size and checksum of the microcode image will be verified in Section 10.0.
File Names:
b315191014103506.002200401000

 

NOTE:
 - "/etc/microcode" is a symbolic link to "/usr/lib/microcode".
 - If permission does not allow the copy to the above stated directory or file then the user will be prompted for a new location.

 

9.0     Discovery Tool Microcode CD-ROM creation and download instructions

To obtain information how to burn a CD-ROM and run the Discovery Tool for an AIX or Linux System please go to:

 

http://www-304.ibm.com/webapp/set2/firmware/lgjsn?mode=10&page=cdrom.html

 

A.    After the Discovery Tool has run successfully, the "/tmp/microcode/RPM" directory will have been created and your rpm files copied into it from the CD-ROM.

B.    Change to that directory, "cd /tmp/microcode/RPM".

C.   Unpack the file by entering the command:
"rpm -ihv --ignoreos b315191014103506.002200401000.aix.rpm"

D.   Microcode file will be copied to "/etc/microcode".  The file size and checksum of the microcode image will be verified in Section 10.0.
File Names:
b315191014103506.002200401000

 

 

NOTE:

- "/etc/microcode" is a symbolic link to "/usr/lib/microcode".

- If permission does not allow the copy to the above stated directory or file then the user will be prompted for a new location.

- For customers using the AIX Diagnostics CD, please refer to the IBM System Hardware information Center for instructions.

 

 

 

 

10.0 Verifying microcode before download and notes for AIX

Please verify the file size and checksum of the raw microcode files matches what is listed below.


ls -l /etc/microcode/b315191014103506.002200401000 to verify file size:

·        b315191014103506.002200401000 = 33554432

 

sum /etc/microcode/b315191014103506.002200401000 to verify Checksum:

·        b315191014103506.002200401000 = 38492

 

 

11.0 Microcode Download Procedure for AIX

11.1    Setting up for Microcode download

 

A.    Stop all applications that use this interface/adapter.

B.    Remove the interface/IP address from all ports identified in section 7.0 for the adapters that will be upgraded.

a.    Before detaching the interface, record the IP address and any other pertinent information that was configured on the Adapter.  This information may be needed if the microcode update overwrites this section on the Adapter.

C.   If the interfaces are members of an SEA, the SEA devices must be moved to a defined state.

a.    "rmdev -l enX" - where "X" is the interface number for the Shared Ethernet Adapter.

b.    "rmdev -l entX" - where "X" is the interface number for the Shared Ethernet Adapter.

D.   If the interfaces are members of an EtherChannel, the EtherChannel device must be moved to a defined state.

a.    "rmdev -l enX" - where "X" is the interface number for the EtherChannel adapter.

b.    "rmdev -l entX" - where "X" is the interface number for the EtherChannel adapter.

E.    For every port associated with the adapter, the enX interfaces must be changed to a defined state.

a.    "rmdev -l enX" - where "X" is the interface number for the adapter port.

b.    This command will be run 2 times, once for each port on the adapter.
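
The per-port preparation in steps B and E can be drafted as a dry run. The sketch below is an illustration only: print_detach_commands is a hypothetical name, and the use of "ifconfig" and "lsattr -El" to record the configuration is an assumption that is not part of this procedure. The helper only prints commands; nothing is executed.

```shell
#!/bin/sh
# Hypothetical dry-run helper (illustration only): for each interface number
# given, print the commands that record its configuration and then move it
# to the defined state. Review the output before running any of it.
print_detach_commands() {
    for n in "$@"; do
        echo "ifconfig en$n"      # record the IP settings first (assumed step)
        echo "lsattr -El en$n"    # record the interface attributes (assumed step)
        echo "rmdev -l en$n"      # move the interface to the defined state
    done
}

# Example for an adapter whose two ports are en2 and en3:
#   print_detach_commands 2 3
```

Printing rather than executing keeps the operator in control of the order, which matters when SEA or EtherChannel devices must be handled first.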

 

            

11.2    Downloading Microcode to the Adapter

A.    At the command line type "diag"

B.    Select the "Task Selection" from diagnostics menu.

C.   Select "Microcode Tasks", then select "Download Microcode" from the menu.

D.   Select all of the entX PCIe4 2-port 100GbE RoCE Adapters that need to be updated from the list of devices by using the arrow keys to highlight the entry and pressing "Enter" to mark it.  Press "F7" or "ESC+7" when you are done marking all the adapters you want to flash.

E.    If a source selection menu is displayed, Select "/etc/microcode".

F.    A dialogue box may be displayed on screen.  It will state that the current microcode level on the adapter is not in the /etc/microcode directory.  This is acceptable because the adapter will reject any incorrect code.  Press "Enter" to continue.

G.   Select 002200401000 level and press "Enter" to flash the adapter.

H.   The following message will appear on the screen when download is completed: "Microcode download complete successfully.  The current microcode level for the device is ...  Please run diagnostics on the adapter to ensure that it is functioning properly." 

I.     If you selected more than one adapter to update, then steps 6-9 will repeat until all adapters are updated.

J.     Exit diagnostics.

11.3    Re-configure and Verify adapters

A.    Run "cfgmgr" to reconfigure the adapters that were moved to defined before the update. 

B.    Verify the code level is 002200401000 by typing "lsmcode -cd entX" for each adapter updated, where "X" is the instance of the PCIe4 2-port 100GbE RoCE Adapter.