Readme for IBM® Spectrum Conductor with Spark 2.2.1 Interim Fix 487185 

Readme file for: IBM Spectrum Conductor with Spark

Product/Component Release: 2.2.1

Update Name: Interim Fix 487185

Fix ID: cws-2.2.1-build487185

Publication date: Jun 8, 2018

 

Abstract

Some Deep Learning Impact jobs, for example TensorFlow jobs, request dedicated resources. If the cluster is fully allocated, multiple masters might each hold part of the resources they need and wait for one another to release the rest, causing a deadlock.

Description

This interim fix resolves a deadlock in Deep Learning Impact training jobs that run in an IBM Spectrum Conductor with Spark v2.2.1 Spark instance group using Spark version 1.6.1, 2.1.1, or 2.2.0.

 

Contents

1.     Download location 

2.     Products or components affected

3.     Installation and configuration

4.     List of files

5.     Copyright and trademark information 

 

1.  Download location

Download Fix 487185 from the following location: http://www.ibm.com/eserver/support/fixes/.

2.  Products or components affected 

·       IBM Spectrum Conductor with Spark v2.2.1

·       Spark versions 1.6.1, 2.1.1, and 2.2.0

·       Linux 64-bit

·       cws-2.2.1-build487185

3.  Installation and configuration

System requirements

·       Linux 64-bit

Before installation

1.     IBM Spectrum Conductor with Spark v2.2.1 must be installed on a supported operating system. For details, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.1/installing/install_upgrade.html.

2.     Log on as the cluster administrator and stop the ascd service:

> egosh user logon -u Admin -x Admin

> egosh service stop ascd

3.     For recovery purposes, log on to each management host in the cluster and back up the following files to another directory:

$EGO_TOP/ascd/2.2.1/lib/asc-common-2.2.1.jar

$EGO_TOP/ascd/2.2.1/lib/asc-core-2.2.1.jar
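The backup step above can be scripted so that it runs the same way on every management host. The following is a minimal sketch, not part of the product: backup_ascd_jars is a hypothetical helper name, and the backup directory is whatever location you choose.

```shell
# Sketch only: backup_ascd_jars is a hypothetical helper, not a product command.
# Usage: backup_ascd_jars "$EGO_TOP" /path/to/backup/dir
backup_ascd_jars() {
    ego_top=$1
    backup_dir=$2
    # Refuse to run with missing arguments.
    if [ -z "$ego_top" ] || [ -z "$backup_dir" ]; then
        echo "usage: backup_ascd_jars EGO_TOP BACKUP_DIR" >&2
        return 2
    fi
    mkdir -p "$backup_dir" || return 1
    for jar in asc-common-2.2.1.jar asc-core-2.2.1.jar; do
        # cp -p preserves mode and timestamps so the copy can be restored as-is.
        cp -p "$ego_top/ascd/2.2.1/lib/$jar" "$backup_dir/" || return 1
    done
}
```

The function stops at the first file that cannot be copied, so a non-zero return code means the backup is incomplete and installation should not proceed.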

 

Installation

1.     Log on to each management host in your cluster as the cluster administrator, decompress the cws-2.2.1.0_build487185.tgz package to a temporary directory, and then decompress the lifecycle.tgz file that it contains to the directory where you installed IBM Spectrum Conductor with Spark ($EGO_TOP):

> mkdir -p /tmp/fix487185

> tar zoxf cws-2.2.1.0_build487185.tgz -C /tmp/fix487185

> tar zoxf /tmp/fix487185/lifecycle.tgz -C $EGO_TOP
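After extraction, it can be useful to confirm that the two jar files were actually overwritten. The following sketch compares each installed jar with the copy saved in the backup step. It assumes that the fix changes both jars byte for byte and that lifecycle.tgz writes them to $EGO_TOP/ascd/2.2.1/lib; check_jars_replaced is a hypothetical helper name.

```shell
# Sketch only: check_jars_replaced is a hypothetical helper, not a product command.
# Usage: check_jars_replaced "$EGO_TOP" /path/to/backup/dir
check_jars_replaced() {
    ego_top=$1
    backup_dir=$2
    status=0
    for jar in asc-common-2.2.1.jar asc-core-2.2.1.jar; do
        installed="$ego_top/ascd/2.2.1/lib/$jar"
        saved="$backup_dir/$jar"
        if [ ! -f "$installed" ] || [ ! -f "$saved" ]; then
            echo "$jar: cannot compare (installed or backup copy missing)"
            status=1
        elif cmp -s "$installed" "$saved"; then
            # cmp -s exits 0 only when the two files are byte-identical.
            echo "$jar: unchanged"
            status=1
        else
            echo "$jar: replaced"
        fi
    done
    return "$status"
}
```

A result of "unchanged" suggests that the archive was extracted to a different directory than the one the cluster is using.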

 

2.     Delete all subdirectories and files in the ascd/workarea directory:

> rm -rf $EGO_TOP/ascd/workarea/*

NOTE: If you changed the default value of the WLP_OUTPUT_DIR environment variable and APPEND_HOSTNAME_TO_WLP_OUTPUT_DIR is set to true in the $EGO_CONFDIR/wlp.conf file, you must delete the contents of the $WLP_OUTPUT_DIR/ascd_hostname/ascd/workarea/ directory.
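Because this step deletes files recursively, a guarded variant may be safer than typing the path by hand. The sketch below is one possible approach: clean_workarea is a hypothetical helper that accepts only a path ending in ascd/workarea and empties it while keeping the directory itself. Unlike a shell wildcard, it also removes hidden files.

```shell
# Sketch only: clean_workarea is a hypothetical helper, not a product command.
# Usage: clean_workarea "${EGO_TOP:?EGO_TOP is not set}/ascd/workarea"
clean_workarea() {
    workarea=$1
    # Only accept paths that end in ascd/workarea, so a mistyped argument
    # cannot delete something else.
    case "$workarea" in
        */ascd/workarea|*/ascd/workarea/) ;;
        *) echo "refusing to clean '$workarea'" >&2; return 2 ;;
    esac
    [ -d "$workarea" ] || { echo "no such directory: $workarea" >&2; return 1; }
    # -mindepth 1 keeps the workarea directory itself and deletes everything in it.
    find "$workarea" -mindepth 1 -delete
}
```

The same function can be pointed at the workarea directory under $WLP_OUTPUT_DIR when the condition in the NOTE applies.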

3.     Start the ascd service:

> egosh user logon -u Admin -x Admin

> egosh service start ascd
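Before continuing, you might want to confirm that ascd has started. The sketch below reads a service listing on standard input and prints the state of one service. It assumes that the listing (for example, from egosh service list) shows the service name in the first column and its state in the second; verify this against the output on your cluster. service_state is a hypothetical helper name.

```shell
# Sketch only: service_state is a hypothetical helper, not a product command.
# Assumes column 1 is the service name and column 2 is its state.
# Usage: egosh service list | service_state ascd
service_state() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1 }
        END { exit !found }   # non-zero exit when the service is not listed
    '
}
```

The function exits with a non-zero code when the named service does not appear in the listing, so it can be used in a retry loop.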

 

4.   On the client machine where you have a browser, decompress the cws-2.2.1.0_build487185.tgz package. For example, on Linux:

> mkdir -p /tmp/fix487185

> tar zoxf cws-2.2.1.0_build487185.tgz -C /tmp/fix487185
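To confirm that the Spark packages are available for upload, you can check the extracted directory. This sketch assumes that all three packages listed in this readme sit at the top level of the extraction directory, as the Spark 1.6.1 package does; check_spark_packages is a hypothetical helper name.

```shell
# Sketch only: check_spark_packages is a hypothetical helper, not a product command.
# Usage: check_spark_packages /tmp/fix487185
check_spark_packages() {
    dir=$1
    missing=0
    for version in 1.6.1 2.1.1 2.2.0; do
        pkg="$dir/Spark${version}-Conductor2.2.1.tgz"
        if [ -f "$pkg" ]; then
            echo "found: $pkg"
        else
            echo "missing: $pkg"
            missing=1
        fi
    done
    # Returns 0 only when every package was found.
    return "$missing"
}
```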

 

5.   Launch a browser and clear the browser cache; then log in to the cluster management console as Admin.

 

6.   If the Spark package for the version that you are updating already exists in the cluster, remove it. For example, for Spark 1.6.1:

a.     Click Workload > Spark > Version Management.

b.     Check 1.6.1.

c.     Click Remove.

 

7.   Add the new Spark package to your cluster. For example, for Spark 1.6.1:

a.     Click Workload > Spark > Version Management.

b.     Click Add.

c.     Click Browse and select the /tmp/fix487185/Spark1.6.1-Conductor2.2.1.tgz package.

 

8.   Click Add.   

 

After installation

1.     Create a new Spark instance group that uses the new Spark package, for example, Spark 1.6.1. For details, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.1/developing_instances/instance_create_about.html.

 

2.     If required, upgrade your existing Spark instance groups to use the new Spark package, for example, Spark 1.6.1. For details, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.1/managing_instances/instance_update_spark_version.html.

    

For existing Spark instance groups, updating does not involve deleting and re-creating Spark instance groups. This interim fix takes effect for both newly created and updated Spark instance groups.

4.  List of files 

·       Spark1.6.1-Conductor2.2.1.tgz

·       Spark2.1.1-Conductor2.2.1.tgz

·       Spark2.2.0-Conductor2.2.1.tgz

·       asc-common-2.2.1.jar

·       asc-core-2.2.1.jar

 

5.  Copyright and trademark information

© Copyright IBM Corporation 2018

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml