Readme file for IBM Watson® Machine Learning Accelerator Interim Fix 559338
Readme file for: IBM Watson® Machine Learning Accelerator
Product/Component Release: 1.2.1
Fix ID: dli-1.2.3-build559338-wmla-welfg
Publication date: October 1, 2020
Interim fix 559338 includes support for running native PyTorch and TensorFlow CPU jobs using the dlicmd command line tool or REST API.
1. Download location
2. Products or components affected
3. Installation and configuration
4. Uninstallation
5. List of files
6. Product notifications
7. Copyright and trademark information
1. Download location
Download interim fix 559338 from the following location: https://www.ibm.com/eserver/support/fixes/
2. Products or components affected
Component Name, Platform, Fix ID:
dlpd, Linux ppc64le, dli-1.2.3-build559338-wmla-welfg
3. Installation and configuration
3.1 Before installation
Before installing the interim fix, complete the following steps to prepare your environment:
1. Log on to the master host as the cluster administrator (CLUSTERADMIN) and source the environment according to your shell environment.
For sh, ksh or bash:
$ . $EGO_TOP/profile.platform
For csh or tcsh:
$ source $EGO_TOP/cshrc.platform
where EGO_TOP is the IBM Spectrum Conductor Deep Learning Impact installation path.
2. Run the following command to log in:
$ egosh user logon -u user_name -x password
where user_name and password are your login credentials. For example:
$ egosh user logon -u Admin -x Admin
3. Create a backup directory and back up the following file:
$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py
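The backup in step 3 can be sketched as a small shell function. The function name backup_file and the backup location in the usage comment are illustrative assumptions and are not part of the interim fix; any directory outside $DLI_SHARED_FS/tools/dl_plugins that the cluster administrator can write to should work.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the fix): copy one file into a
# backup directory, creating the directory if needed and preserving the
# file's mode and timestamps.
backup_file() {
    src=$1
    backup_dir=$2
    if [ ! -f "$src" ]; then
        echo "backup_file: $src not found" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    cp -p "$src" "$backup_dir/"
}

# Example usage (the backup directory is an assumed location):
#   backup_file "$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py" "$HOME/dli_fix_559338_backup"
```

Keep the backup directory until you are sure that you will not need to uninstall the interim fix.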
3.2 Installation steps
Apply the interim fix by completing the following steps:
1. Stop services.
$ egosh service stop dlpd
2. On each management host (including the master host), download the packages to a directory. For example, packages can be downloaded to the /dlifixes directory.
3. As the root user, change the permissions of the interim fix files.
For Linux ppc64le:
$ chmod o+r /dlifixes/dlicore-1.2.3.0_ppc64le_build559338.tar.gz
4. Run the egoinstallfixes command to install cluster jars.
For Linux ppc64le:
$ egoinstallfixes /dlifixes/dlicore-1.2.3.0_ppc64le_build559338.tar.gz
NOTE: Running the “egoinstallfixes” command automatically backs up the current binary files to a fix backup directory for recovery purposes. Do not delete this backup directory; you need it if you want to recover the original files. For more information on using this command, see the egoinstallfixes command reference.
5. Run the pversions command to verify the interim fix installation.
$ pversions -b 559338
6. As a cluster administrator, log in to the master host and extract the dli-1.2.3.0_build559338_share.tar.gz package to the top-level $DLI_SHARED_FS directory:
$ tar zoxf dli-1.2.3.0_build559338_share.tar.gz -C $DLI_SHARED_FS
DLI_SHARED_FS must be the same as the IBM Spectrum Conductor Deep Learning Impact installation setting.
7. Start services.
$ egosh service start dlpd
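Steps 3 and 4 depend on the package being readable by users other than root. A minimal sketch of a pre-check follows; the function name is_world_readable is a hypothetical helper and is not part of the interim fix or of egoinstallfixes.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "others" have read permission on
# a regular file. In the mode string printed by `ls -ld` (for example
# -rw-r--r--), character 8 is the others-read bit.
is_world_readable() {
    [ -f "$1" ] || return 1
    other_read=$(ls -ld "$1" | cut -c8)
    [ "$other_read" = "r" ]
}

# Example usage before running egoinstallfixes:
#   is_world_readable /dlifixes/dlicore-1.2.3.0_ppc64le_build559338.tar.gz \
#       || echo "Run chmod o+r on the package first" >&2
```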
3.3 After installation
To verify that the interim fix was installed successfully, submit a CPU job by setting the gpuPerWorker parameter to 0. For example:
'args': '--exec-start PyTorch --cs-datastore-meta type=fs --gpuPerWorker 0 --model-main pytorch_mnist_HPO.py --model-dir pytorch_hpo'
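The argument string above can be assembled from its parts, which makes it harder to forget the --gpuPerWorker 0 setting when you adapt the example to another model. The function name build_cpu_job_args is a hypothetical helper; the sketch only builds the string shown in this readme, and how you pass it to dlicmd or the REST API depends on your environment.

```shell
#!/bin/sh
# Hypothetical helper: print the 'args' value for a CPU-only job.
# --gpuPerWorker is fixed at 0, as this readme requires for CPU jobs;
# the other flags are copied from the example above.
build_cpu_job_args() {
    framework=$1     # for example PyTorch
    model_main=$2    # for example pytorch_mnist_HPO.py
    model_dir=$3     # for example pytorch_hpo
    printf '%s' "--exec-start $framework --cs-datastore-meta type=fs --gpuPerWorker 0 --model-main $model_main --model-dir $model_dir"
}

# Example usage:
#   build_cpu_job_args PyTorch pytorch_mnist_HPO.py pytorch_hpo
```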
4. Uninstallation
If required, follow the instructions in this section to uninstall this interim fix from hosts in your cluster.
1. Log in to the management host as a cluster administrator (CLUSTERADMIN) and source the environment.
2. Stop the dlpd service.
$ egosh service stop dlpd
3. Log on to each management host in the cluster and roll back the interim fix.
$ egoinstallfixes -r 559338
4. Manually restore the dlioptgen.py file from the backup that you created before installation to the $DLI_SHARED_FS/tools/dl_plugins directory:
$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py
5. Start the dlpd service.
$ egosh service start dlpd
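The manual restore in step 4 can be sketched as follows. The function name restore_file and the backup path in the usage comment are illustrative assumptions; use the backup directory that you created before installation.

```shell
#!/bin/sh
# Hypothetical helper: copy a backed-up file over its original location
# and confirm that the restored copy is byte-identical to the backup.
restore_file() {
    backup_copy=$1
    target=$2
    if [ ! -f "$backup_copy" ]; then
        echo "restore_file: backup $backup_copy not found" >&2
        return 1
    fi
    cp -p "$backup_copy" "$target" || return 1
    cmp -s "$backup_copy" "$target"
}

# Example usage (the backup directory is an assumed location):
#   restore_file "$HOME/dli_fix_559338_backup/dlioptgen.py" \
#       "$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py"
```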
5. List of files
$EGO_TOP/dli/1.2.3/dlpd/lib/cws_dl-core-1.2.3.jar
$DLI_SHARED_FS/tools/dl_plugins/dlioptgen.py
6. Product notifications
To receive information about product solution and patch updates automatically, subscribe to product notifications on the My Notifications page (http://www.ibm.com/support/mynotifications/) on the IBM Support website (http://support.ibm.com). You can edit your subscription settings to choose the types of information that you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.
7. Copyright and trademark information
© Copyright IBM Corporation 2020
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml