Readme for IBM Spectrum Conductor with Spark 2.2 Interim Fix 441347

Readme file for: IBM® Spectrum Conductor with Spark
Product/Component Release: 2.2.0
Update Name: Interim fix 441347
Fix ID: cws-2.2-build441347-jpmc
Publication date: Feb 10, 2017

Host blocking with Spark 2.1.0 for IBM Spectrum Conductor with Spark V2.2.0

This Readme file describes how to enable host blocking for a Spark instance group that uses Spark 2.1.0. When hosts in a Spark instance group have environment errors (for example, errors relating to Spark installation, Java home configuration, an incorrect Java version, or an inaccessible working directory), Spark drivers fail to start or run on those hosts. With host blocking enabled, a host on which Spark drivers of the Spark instance group fail 3 or more times is added to a blocked host list, and the Spark master no longer allocates resources from that host for driver startup.

A host is blocked under the following conditions:

o    When a driver cannot start on a host for any of the following reasons:

     o    User account does not exist.

     o    Failed to change container password.

     o    Container fails to start.

     o    Startup command does not exist.

     o    Command cannot be executed.

     o    Failed to create stdout, stderr redirection files.

o    When the driver starts on a host but then fails because the application exit reason matches the rules specified for host blocking. For more information, see the Configuration section.
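
The blocking behavior described above can be pictured with a small Python model. This is only an illustrative sketch, not the Spark master's implementation: the HostBlockTracker class and its method names are hypothetical, and it assumes that failures are counted cumulatively per host and that the count is cleared when a host is unblocked. The threshold of 3 comes from this Readme.

```python
from collections import defaultdict

# Threshold taken from this Readme: a host is blocked after 3 or more failures.
FAILURE_THRESHOLD = 3


class HostBlockTracker:
    """Hypothetical model of host blocking; not the product's real code."""

    def __init__(self, threshold=FAILURE_THRESHOLD):
        self.threshold = threshold
        self.failures = defaultdict(int)  # host name -> qualifying failure count
        self.blocked = set()              # hosts excluded from driver startup

    def record_driver_failure(self, host):
        """Count one qualifying driver failure; block the host at the threshold."""
        self.failures[host] += 1
        if self.failures[host] >= self.threshold:
            self.blocked.add(host)

    def can_allocate(self, host):
        """Return True if the host may still be used for driver startup."""
        return host not in self.blocked

    def unblock(self, host):
        """Model an administrator unblocking a repaired host.

        ASSUMPTION: the failure count is cleared on unblock.
        """
        self.blocked.discard(host)
        self.failures.pop(host, None)


tracker = HostBlockTracker()
for _ in range(3):
    tracker.record_driver_failure("testhost1.lab.abc.com")
print(tracker.can_allocate("testhost1.lab.abc.com"))  # False
print(tracker.can_allocate("testhost2.lab.abc.com"))  # True
```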

 

Installation and configuration

System requirements

Linux 64-bit

Installation

Before you begin, IBM Spectrum Conductor with Spark V2.2.0 must be installed on a supported operating system. For details, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.0/installing/install_upgrade.html.

1.     From IBM Fix Central, look for Fix ID cws-2.2-build437098 and download the Spark2.1.0-Conductor2.2.0.tgz package to a local directory on your computer.

2.     Launch a browser and log in to the cluster management console as cluster administrator.

3.     Add the Spark 2.1.0 package to your cluster.

a.     Click Workload > Spark > Version Management.

b.     Click Add.

c.     Click Browse and select the package you downloaded previously.

d.     Click Add.

4.     Create a new Spark instance group that will use the new Spark 2.1.0 package. For details, see http://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.0/developing_instances/developing_instances.html.

 

Configuration

To enable hosts of a Spark instance group to be blocked when Spark drivers fail on a host, you must modify a Spark instance group’s configuration and configure the SPARK_EGO_DRIVER_BLOCKHOST_CONF parameter for Spark 2.1.0.

Enabling host blocking for failed Spark drivers

To enable host blocking when Spark drivers fail 3 or more times on a host, configure the SPARK_EGO_DRIVER_BLOCKHOST_CONF parameter. By default, this parameter is undefined. The valid value is a string enclosed within single quotation marks (') that contains one or more keywords. Keywords can contain uppercase and lowercase letters, numbers, underscores (_), and hyphens (-). Multiple keywords within a rule must be separated by a space ( ); multiple rules must be separated by a semicolon (;).

For example, with the configuration defined as 'key1;key2 key3', the semicolon (;) separates two rules: key1 and key2 key3. If the driver exit reason on a host matches at least one rule, that host is blocked. The following scenarios show the resulting behavior:

·         The Spark driver does not start on a host because the startup command does not exist. In this case, the host is blocked and the resources are returned.

·         The Spark driver starts on a host and then fails; the exit reason is “key1”, which meets the host blocking criteria. In this case, the host is blocked and the resources are returned.

·         The Spark driver starts on a host and then fails; the exit reason is “key4”, which does not meet the host blocking criteria. In this case, the host is not blocked.
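
To make the value syntax concrete, the following Python sketch splits a SPARK_EGO_DRIVER_BLOCKHOST_CONF value (without its surrounding single quotation marks) into rules and keywords, and checks the keyword characters listed above. The function names are hypothetical and are not part of the product. The matching function rests on an assumption that this Readme does not confirm: a rule is treated as matched when every keyword in the rule occurs in the exit reason.

```python
import re

# Keyword characters allowed by this Readme: letters, digits, underscore, hyphen.
KEYWORD_RE = re.compile(r"[A-Za-z0-9_-]+")


def parse_blockhost_conf(value):
    """Split a value such as 'key1;key2 key3' into a list of rules.

    Each rule is returned as a list of keywords. Raises ValueError when a
    rule is empty or a keyword contains a character that is not allowed.
    """
    rules = []
    for raw_rule in value.split(";"):
        keywords = raw_rule.split()
        if not keywords:
            raise ValueError("empty rule in %r" % value)
        for keyword in keywords:
            if not KEYWORD_RE.fullmatch(keyword):
                raise ValueError("invalid keyword %r" % keyword)
        rules.append(keywords)
    return rules


def exit_reason_blocks_host(exit_reason, rules):
    """ASSUMPTION: a rule matches when all of its keywords occur in the reason."""
    return any(all(k in exit_reason for k in rule) for rule in rules)


rules = parse_blockhost_conf("key1;key2 key3")
print(rules)                                   # [['key1'], ['key2', 'key3']]
print(exit_reason_blocks_host("key1", rules))  # True
print(exit_reason_blocks_host("key4", rules))  # False
```

The last two lines mirror the "key1" and "key4" scenarios above. Hosts where the driver cannot start at all are blocked regardless of these rules, so that case is not modeled here.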

Unblocking a blocked host

To list blocked hosts and unblock a host, use the egosh alloc view and egosh alloc unblock commands. For example:

1.     Find the consumer that the Spark instance group uses, then use the egosh alloc list command to get the allocation IDs for this consumer (for example, /SparkConsumer/Driver).

# egosh alloc list

ALLOC    CONSUMER     CLIENT       RGROUP       RESOURCE     ALLOCATED ACTI

318      /SparkConsu* SPARK_RESMG* DriversRG

285      /SparkConsu* SPARK_RESMG* DriversRG

319      /SparkConsu* SPARK_RESMG* ExecutorsRG

286      /SparkConsu* SPARK_RESMG* DriversRG

2.     Use the egosh alloc view allocID command to display the blocked host list. For example:

# egosh alloc view 285

Allocation ID    : 285

Allocation Client: SPARK_RESMGR:2b24aa52-095e-425a-9fe9-154f5da7e0c2-spark21-sparkms-batch-1

Allocation User  : Admin

 

ALLOCATION REQUEST:

Allocation Name : 2b24aa52-095e-425a-9fe9-154f5da7e0c2-spark21-sparkms-batch-1

Consumer        : /SparkConsumer/Driver

Resource Group  : DriversRG

Requirement     : select(('X86_64' || 'LINUXPPC64' || 'LINUXPPC64LE') && ('ascd_pkg_deployed'==1))

MINSLOTS MAXSLOTS EXCLUSIVE TILE

0        0        No        0

 

ALLOCATED RESOURCE:

RESOURCE              ALLOCATED OCCUPY

 

BLOCKED RESOURCE LIST:

testhost1.lab.abc.com testhost2.lab.abc.com

3.     After you fix the misconfiguration that causes errors on these hosts, use the egosh alloc unblock command to unblock each blocked host so that it can be allocated to its consumer again. For example:

# egosh alloc unblock

Please enter allocation ID:285

Please enter the number of hosts: 1

<1> Please enter one host name:testhost1.lab.abc.com

Request to unblock allocation <285>  has been submitted successfully,

use <alloc view> to check results
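
When many allocations must be checked, the blocked host list can be extracted from saved egosh alloc view output with a short script. The Python sketch below is one possible approach, based only on the output layout shown in the example above. The blocked_hosts function name is hypothetical, and the rule used to detect the end of the list (another uppercase heading that ends with a colon) is an assumption.

```python
# Abbreviated copy of the "egosh alloc view 285" output shown in this Readme.
SAMPLE_OUTPUT = """\
Allocation ID    : 285
Allocation User  : Admin

ALLOCATED RESOURCE:
RESOURCE              ALLOCATED OCCUPY

BLOCKED RESOURCE LIST:
testhost1.lab.abc.com testhost2.lab.abc.com
"""


def blocked_hosts(alloc_view_output):
    """Return the host names listed under 'BLOCKED RESOURCE LIST:'."""
    hosts = []
    in_blocked_section = False
    for line in alloc_view_output.splitlines():
        stripped = line.strip()
        if stripped == "BLOCKED RESOURCE LIST:":
            in_blocked_section = True
            continue
        if in_blocked_section:
            # ASSUMPTION: another uppercase heading ending in ':' ends the list.
            if stripped.endswith(":") and stripped.isupper():
                break
            hosts.extend(stripped.split())
    return hosts


print(blocked_hosts(SAMPLE_OUTPUT))
# ['testhost1.lab.abc.com', 'testhost2.lab.abc.com']
```

On a cluster host, the same function could be applied to output captured from the egosh alloc view command instead of the embedded sample text.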

 

Copyright and trademark information

© Copyright IBM Corporation 2017

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.