Readme file for IBM Watson® Machine Learning Accelerator Interim Fix 559338

 

Readme file for: IBM Watson® Machine Learning Accelerator
Product/Component Release: 1.2.1
Fix ID: dli-1.2.3-build559338-wmla-welfg

Publication date: October 1, 2020

 

Interim fix 559338 includes support for running native PyTorch and TensorFlow CPU jobs using the dlicmd command line tool or REST API.

 

Contents

1.      Download location 

2.      Products or components affected

3.      Installation and configuration

4.      Uninstallation

5.      List of files

6.      Product notifications 

7.      Copyright and trademark information

 

1.    Download location

Download interim fix 559338 from the following location: https://www.ibm.com/eserver/support/fixes/

 

2.    Products or components affected

Component Name    Platform         Fix ID

dlpd              Linux ppc64le    dli-1.2.3-build559338-wmla-welfg

 

3.    Installation and configuration

3.1 Before installation

Before installing the interim fix, complete the following steps to prepare your environment:

 

1.      Log on to the master host as the cluster administrator (CLUSTERADMIN) and source the environment according to your shell environment.

For sh, ksh or bash:

$ . $EGO_TOP/profile.platform

For csh or tcsh:

$ source $EGO_TOP/cshrc.platform

 

where EGO_TOP is the IBM Spectrum Conductor Deep Learning Impact installation path.

 

2.      Run the following command to log in:

$ egosh user logon -u user_name -x password

where user_name and password are your login credentials. For example:

$ egosh user logon -u Admin -x Admin

 

3.      Create a backup directory and back up the following file:

      $DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py
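
The backup in step 3 can be sketched as a small shell function. This is a minimal sketch and not part of the interim fix: the function name backup_dlioptgen and the backup directory argument are illustrative assumptions. Only the source path $DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py comes from this readme.

```shell
# Hypothetical helper (not shipped with the fix): copy dlioptgen.py into a
# backup directory of your choice, creating the directory if it is missing.
backup_dlioptgen() {
    backup_dir=${1:-}
    src="$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py"

    if [ -z "$backup_dir" ]; then
        echo "usage: backup_dlioptgen BACKUP_DIR" >&2
        return 2
    fi
    if [ ! -f "$src" ]; then
        echo "file not found: $src" >&2
        return 1
    fi

    # -p preserves the mode and timestamps of the original file.
    mkdir -p "$backup_dir" && cp -p "$src" "$backup_dir/"
}
```

For example, backup_dlioptgen /dlifixes/backup_559338 would place a copy in that directory; the directory name is an assumption, so use any location outside $DLI_SHARED_FS/tools/dl_plugins that the cluster administrator can write to.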

 

3.2 Installation steps 

Apply the interim fix by completing the following steps:

1.      Stop services.

$ egosh service stop dlpd

 

2.      On each management host (including the master host), download the packages to a directory. For example, packages can be downloaded to the /dlifixes directory.

 

3.      As a root user, change the permission of the interim fix files.

For Linux ppc64le:

$ chmod o+r /dlifixes/dlicore-1.2.3.0_ppc64le_build559338.tar.gz

 

4.      Run the egoinstallfixes command to install cluster jars.

For Linux ppc64le:

$ egoinstallfixes /dlifixes/dlicore-1.2.3.0_ppc64le_build559338.tar.gz

 

NOTE: Running the “egoinstallfixes” command automatically backs up the current binary files to a fix backup directory for recovery purposes. Do not delete this backup directory; you need it if you want to recover the original files. For more information on using this command, see the egoinstallfixes command reference.

 

5.      Run the pversions command to verify the interim fix installation.

$ pversions -b 559338

 

6.      As a cluster administrator, log in to the master host and extract the dli-1.2.3.0_build559338_share.tar.gz package to the top-level $DLI_SHARED_FS directory:

$ tar zoxf dli-1.2.3.0_build559338_share.tar.gz -C $DLI_SHARED_FS

 

DLI_SHARED_FS must be the same as the IBM Spectrum Conductor Deep Learning Impact installation setting.

7.      Start services.

$ egosh service start dlpd
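
The installation steps above can be collected into a dry-run wrapper that prints the commands in order before anything is executed. This is a minimal sketch under stated assumptions, not a supported tool: the function names step and apply_fix_559338 and the DRY_RUN variable are hypothetical, both packages are assumed to be in the same directory, and the cluster is assumed to have a single management host. The readme asks for root in step 3 and the cluster administrator in step 6, so running every command from one account may not work in your environment.

```shell
# Hypothetical wrapper (not shipped with the fix). By default each command is
# only printed; set DRY_RUN=0 to execute the commands on the current host.
step() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

apply_fix_559338() {
    # Directory that holds the downloaded packages (assumed: /dlifixes).
    fix_dir=${1:-/dlifixes}
    core_pkg="$fix_dir/dlicore-1.2.3.0_ppc64le_build559338.tar.gz"
    share_pkg="$fix_dir/dli-1.2.3.0_build559338_share.tar.gz"

    # Steps 1 and 3 through 7 of section 3.2, stopping at the first failure.
    step egosh service stop dlpd &&
    step chmod o+r "$core_pkg" &&
    step egoinstallfixes "$core_pkg" &&
    step pversions -b 559338 &&
    step tar zoxf "$share_pkg" -C "$DLI_SHARED_FS" &&
    step egosh service start dlpd
}
```

Calling apply_fix_559338 /dlifixes with DRY_RUN unset prints the six commands, each prefixed with "+", which can be compared against the steps above before running them by hand.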

 

3.3 After installation

To verify that the interim fix was installed successfully, submit a CPU job by setting the gpuPerWorker parameter to 0. For example:

'args': '--exec-start PyTorch --cs-datastore-meta type=fs --gpuPerWorker 0 --model-main pytorch_mnist_HPO.py --model-dir pytorch_hpo'
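
The argument string in the example can be assembled with a small shell function so that --gpuPerWorker 0 is never left out. This is a minimal sketch: the function name build_cpu_job_args is hypothetical, the option names are copied from the example above, and the framework name that you pass must be one that your cluster accepts.

```shell
# Hypothetical helper: print the job argument string for a CPU-only job.
# Arguments: framework name, main model file, model directory.
build_cpu_job_args() {
    framework=$1
    model_main=$2
    model_dir=$3
    # --gpuPerWorker 0 is what requests a CPU job, as described in this readme.
    printf '%s' "--exec-start $framework --cs-datastore-meta type=fs --gpuPerWorker 0 --model-main $model_main --model-dir $model_dir"
}
```

Running build_cpu_job_args PyTorch pytorch_mnist_HPO.py pytorch_hpo reproduces the argument string shown in the example.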

 

4.    Uninstallation

If required, follow the instructions in this section to uninstall this interim fix from hosts in your cluster.

1.      Log in to the management host as a cluster administrator (CLUSTERADMIN) and source the environment.

2.      Stop the dlpd service.

$ egosh service stop dlpd

3.      Log on to each management host in the cluster and roll back the interim fix.

$ egoinstallfixes -r 559338

4.      Manually restore the dlioptgen.py file from the backup that you created before installation (section 3.1, step 3) to its original location in the $DLI_SHARED_FS/tools/dl_plugins directory:

   $DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py

5.      Start the dlpd service.

$ egosh service start dlpd
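
The manual restore in step 4 can be sketched as a shell function that copies the saved file back into place. This is a minimal sketch and not part of the interim fix: the function name restore_dlioptgen and the backup directory argument are illustrative assumptions, and the backup directory is whichever one you created before installation.

```shell
# Hypothetical helper (not shipped with the fix): copy the saved dlioptgen.py
# from a backup directory back to $DLI_SHARED_FS/tools/dl_plugins.
restore_dlioptgen() {
    backup_dir=${1:-}
    dest_dir="$DLI_SHARED_FS/tools/dl_plugins"

    if [ -z "$backup_dir" ]; then
        echo "usage: restore_dlioptgen BACKUP_DIR" >&2
        return 2
    fi
    if [ ! -f "$backup_dir/dlioptgen.py" ]; then
        echo "no backup found in: $backup_dir" >&2
        return 1
    fi

    # Overwrite the file installed by the fix with the saved original.
    cp -p "$backup_dir/dlioptgen.py" "$dest_dir/dlioptgen.py"
}
```

Run the function after the rollback in step 3 and before the dlpd service is started again, for example restore_dlioptgen /dlifixes/backup_559338 if that is the directory you used for the backup.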

 

5.    List of files 

$EGO_TOP/dli/1.2.3/dlpd/lib/cws_dl-core-1.2.3.jar

$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py

 

6.    Product notifications

To receive information about product solution and patch updates automatically, subscribe to product notifications on the My Notifications page http://www.ibm.com/support/mynotifications/ on the IBM Support website (http://support.ibm.com). You can edit your subscription settings to choose the types of information that you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.

 

7.    Copyright and trademark information 

© Copyright IBM Corporation 2020

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml