Readme File for adding additional framework support to IBM Spectrum Conductor Deep Learning Impact 1.1

Readme file for: IBM Spectrum Conductor Deep Learning Impact

Product/Component Release: 1.1.0

Fix ID: dli-1.1.0-build492708

Publication date: June 22, 2018

 

IBM Spectrum Conductor Deep Learning Impact 1.1.0 supports additional frameworks through the command line interface (CLI). With the dlicmd command, you can run deep learning jobs with any framework on your existing cluster resources. Apply this fix to add “Bring Your Own Framework” support to IBM Spectrum Conductor Deep Learning Impact 1.1. This fix also includes elastic updates.

 

Contents:

 

1.      System requirements

2.      Installation

3.      More information

4.      Copyright and trademark information

 

1. System requirements

 

Ensure that your system meets the hardware and software requirements for IBM Spectrum Conductor Deep Learning Impact 1.1. See https://www.ibm.com/support/knowledgecenter/SSWQ2D_1.1.0/in/installation-requirements.html.

 

 

2. Installation

 

Before you install

 

·       Before applying the iFix, make sure that no applications are running.

·       The /opt/ibm/spectrumcomputing directory is set as the top installation directory.

·       The IBM Spectrum Conductor Deep Learning Impact shared directory is defined by $DLI_SHARED_FS. Make sure to export the DLI_SHARED_FS environment variable.

·       You only need to apply this fix on the IBM Spectrum Conductor with Spark management nodes.
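
The checks above can be sketched as a small preflight script. This is only a minimal sketch and is not part of the fix: check_prereqs is a hypothetical helper name, and it confirms only that a DLI_SHARED_FS value is supplied and that the two directories exist. It does not detect running applications.

```shell
#!/bin/sh
# Preflight sketch (hypothetical helper, not shipped with the fix).
# Usage: check_prereqs <top_install_dir> <shared_fs_dir>
check_prereqs() {
    top_dir="$1"
    shared_fs="$2"

    # The DLI_SHARED_FS value must be non-empty.
    if [ -z "$shared_fs" ]; then
        echo "DLI_SHARED_FS is not set" >&2
        return 1
    fi
    # The shared directory must exist on this host.
    if [ ! -d "$shared_fs" ]; then
        echo "shared directory not found: $shared_fs" >&2
        return 1
    fi
    # The top installation directory must exist on this host.
    if [ ! -d "$top_dir" ]; then
        echo "installation directory not found: $top_dir" >&2
        return 1
    fi
    echo "prerequisites OK"
}

# Example call, run as the user who will apply the fix:
#   check_prereqs /opt/ibm/spectrumcomputing "${DLI_SHARED_FS:-}"
```

The helper takes both paths as arguments so that it can be tried against a scratch directory before it is run against the real installation.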

 

 

Installation

 

To apply this fix to IBM Spectrum Conductor Deep Learning Impact, do the following:

 

1.      Log in to the master host as root.

2.      Download the iFix (dli-1.1.0.0_build492708.tar.gz) to a local directory on the master host, for example: /iFix_492708

3.      Switch to EGOADMIN, for example:

# su egoadmin

4.      Change directory to $EGO_TOP/dli/dlpd. The rest of this document assumes this is the working directory.

$ cd $EGO_TOP/dli/dlpd

5.      Stop the dlpd service by running:

$ egosh service stop dlpd

6.      Back up core files. If HA is enabled, run these commands on each management host.

$ cp lib/cws_dl-core-1.1.0.jar lib/cws_dl-core-1.1.0.jar.0618ifix.bak

$ cp lib/cws_dl-common-1.1.0.jar lib/cws_dl-common-1.1.0.jar.0618ifix.bak

 

7.      Back up fabric files.

$ cp -r $DLI_SHARED_FS/fabric $DLI_SHARED_FS/fabric.bak

$ cp $DLI_SHARED_FS/tools/spark_tf_launcher/launcher.py $DLI_SHARED_FS/tools/spark_tf_launcher/launcher.py.bak

$ cp $DLI_SHARED_FS/conf/spark-env.sh $DLI_SHARED_FS/conf/spark-env.sh.bak

 

8.      Copy the fix tar file to the current directory and extract it. If HA is enabled, run this command on each management host.

$ tar xvf dli-1.1.0.0_build492708.tar.gz

9.      Copy the plugin directory and the updated fabric, launcher, and spark-env.sh files to your shared location.

$ cp -r tools/dl_plugins $DLI_SHARED_FS/tools

$ chmod 755 $DLI_SHARED_FS/tools/dl_plugins/*.sh

$ tar xvf fabric-1.0.0-linux-ppc64le.tar.gz -C $DLI_SHARED_FS/fabric

$ cp tools/spark_tf_launcher/launcher.py $DLI_SHARED_FS/tools/spark_tf_launcher

$ mv spark-env.sh $DLI_SHARED_FS/conf

 

10.      If HA is enabled, run the following command on the master host.

$ cp -r conf/dl_plugins $EGO_CONFDIR/../../dli/dlpd/conf

11.      Start the dlpd service by running:

$ egosh service start dlpd 

12.      Switch back to root, back up the existing swagger.yaml file, and move the new file into place:

# cp ../../wlp/usr/servers/dlrest/apps/dlrest/META-INF/swagger.yaml ../../wlp/usr/servers/dlrest/apps/dlrest/META-INF/swagger.yaml.0618ifix.bak

# mv swagger.yaml ../../wlp/usr/servers/dlrest/apps/dlrest/META-INF/

 

13.   Check that dlpd started successfully by ensuring there are no errors in the dlpd.log file.

# tail -f logs/dlpd.log

 

14.  Start using dlicmd:

# . $DLI_SHARED_FS/conf/spark-env.sh

# python bin/dlicmd.py
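
The backup steps (6 and 7) and the log check (step 13) above can be double-checked with two small helpers. This is a minimal sketch, not part of the fix: verify_backup and check_dlpd_log are hypothetical names. The log check assumes that problems appear in dlpd.log as lines containing the word "error" in any letter case, which might not match the actual log format.

```shell
#!/bin/sh
# verify_backup (hypothetical helper): succeeds only if the backup file
# exists and is byte-for-byte identical to the original file.
verify_backup() {
    original="$1"
    backup="$2"
    if [ ! -f "$backup" ]; then
        echo "missing backup: $backup" >&2
        return 1
    fi
    if ! cmp -s "$original" "$backup"; then
        echo "backup differs from original: $backup" >&2
        return 1
    fi
    echo "backup OK: $backup"
}

# check_dlpd_log (hypothetical helper): prints every line that contains
# "error" (case-insensitive, an assumed marker) and fails if any are found.
check_dlpd_log() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 2
    fi
    if grep -i -n "error" "$log_file"; then
        return 1
    fi
    echo "no errors found in $log_file"
}

# Example calls from the $EGO_TOP/dli/dlpd working directory.
# Run verify_backup right after step 6, before the fix replaces the files:
#   verify_backup lib/cws_dl-core-1.1.0.jar lib/cws_dl-core-1.1.0.jar.0618ifix.bak
# Run check_dlpd_log after the dlpd service has been restarted:
#   check_dlpd_log logs/dlpd.log
```

Unlike tail -f, check_dlpd_log reads the log once and returns an exit status, so it can be used in a script. It scans the whole file, so errors logged before the fix was applied are reported as well.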

 

3. More information

To obtain more information about IBM Spectrum Conductor Deep Learning Impact, see IBM Knowledge Center at www.ibm.com/support/knowledgecenter/en/SSWQ2D_1.1.0.

For any questions regarding this solution, ask us directly on our Slack channel. For instructions on how to sign up for our Slack channel, see www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Wa0adcb6b782c_4e8b_b3c8_d633cb9456d8/page/Slack%20channel%20(IBM%20Cloud%20Technology)%20sign%20up%20page  

 

4. Copyright and trademark information

© Copyright IBM Corporation 1992, 2018.

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.