Readme for IBM Spectrum Conductor with Spark 2.2 Interim Fix 441347
Readme file for: IBM® Spectrum Conductor with Spark
Product/Component Release: 2.2.0
Update Name: Interim fix 441347
Fix ID: cws-2.2-build441347-jpmc
Publication date: Feb 10, 2017
Host blocking with Spark 2.1.0 for IBM Spectrum Conductor with Spark V2.2.0
This readme file describes how to enable host blocking for a Spark instance group that uses Spark 2.1.0. With host blocking, hosts are blocked when Spark drivers of a Spark instance group fail 3 or more times on a host. When hosts in a Spark instance group have environment errors (for example, errors relating to the Spark installation, Java home configuration, an incorrect Java version, or an inaccessible working directory), Spark drivers fail to start or run on these hosts. When this function is enabled and drivers fail 3 or more times on a host, the host is added to a blocked host list and the Spark master no longer allocates resources from that host for driver startup.
A host is blocked under the following conditions:
o When a driver cannot start on a host for any of the following reasons:
  o The user account does not exist.
  o The container password cannot be changed.
  o The container fails to start.
  o The startup command does not exist.
  o The command cannot be executed.
  o The stdout and stderr redirection files cannot be created.
o When the driver starts on a host but then fails because the application exit reason matches the rules specified for host blocking. For more information, see the Enabling host blocking for failed Spark drivers section.
Installation and configuration
System requirements
Linux 64-bit
Installation
Before you begin, IBM Spectrum Conductor with Spark V2.2.0 must be installed on a supported operating system. For details, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.0/installing/install_upgrade.html.
1. From IBM Fix Central, search for Fix ID cws-2.2-build437098 and download the Spark2.1.0-Conductor2.2.0.tgz package to a local directory on your computer.
2. Launch a browser and log in to the cluster management console as the cluster administrator.
3. Add the Spark 2.1.0 package to your cluster.
   a. Click Workload > Spark > Version Management.
   b. Click Add.
   c. Click Browse and select the package that you downloaded previously.
   d. Click Add.
4. Create a new Spark instance group that uses the new Spark 2.1.0 package. For details, see http://www.ibm.com/support/knowledgecenter/SSZU2E_2.2.0/developing_instances/developing_instances.html.
To enable hosts of a Spark instance group to be blocked when Spark drivers fail on a host, you must modify the Spark instance group's configuration and configure the SPARK_EGO_DRIVER_BLOCKHOST_CONF parameter for Spark 2.1.0.
Enabling host blocking for failed Spark drivers
To enable host blocking when Spark drivers fail 3 or more times on a host, configure the SPARK_EGO_DRIVER_BLOCKHOST_CONF parameter. By default, this parameter is undefined. The valid value is one or more keywords, enclosed in single quotation marks ('), that can contain uppercase and lowercase letters, numbers, underscores (_), and hyphens (-). Multiple keywords within a rule must be separated by a space ( ); multiple rules must be separated by a semicolon (;).
For example, with the configuration defined as 'key1;key2 key3', the semicolon (;) separates two rules: 'key1' and 'key2 key3'. If the driver exit reason on a host matches even one rule, that host is blocked. Consider the following scenarios:
· The Spark driver does not start on a host because the startup command does not exist. In this case, the host is blocked and its resources are returned.
· The Spark driver starts on a host and then fails; the exit reason is "key1", which meets the host blocking criteria. In this case, the host is blocked and its resources are returned.
· The Spark driver starts on a host and then fails; the exit reason is "key4", which does not meet the host blocking criteria. In this case, the host is not blocked.
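As a concrete illustration, the parameter might be set as follows in the Spark instance group's Spark 2.1.0 configuration. The keyword values shown here are placeholders only, not values mandated by this fix; substitute error strings that match the driver exit reasons in your environment:
SPARK_EGO_DRIVER_BLOCKHOST_CONF='OutOfMemoryError;NoClassDefFoundError JavaHomeNotSet'
With this setting, two rules are defined ('OutOfMemoryError' and 'NoClassDefFoundError JavaHomeNotSet'); a host is blocked when a driver's exit reason on that host matches either rule.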
Unblocking a blocked host
To list blocked hosts and unblock a host, use the alloc view and alloc unblock commands from the egosh command line. For example:
1. Find the consumer that the Spark instance group uses, then use the egosh alloc list command to get the allocation IDs for this consumer (for example, /SparkConsumer/Driver).
# egosh alloc list
ALLOC  CONSUMER      CLIENT        RGROUP       RESOURCE  ALLOCATED  ACTI
318    /SparkConsu*  SPARK_RESMG*  DriversRG
285    /SparkConsu*  SPARK_RESMG*  DriversRG
319    /SparkConsu*  SPARK_RESMG*  ExecutorsRG
286    /SparkConsu*  SPARK_RESMG*  DriversRG
2. Use the egosh alloc view allocID command to display the blocked host list. For example:
# egosh alloc view 285
Allocation ID    : 285
Allocation Client: SPARK_RESMGR:2b24aa52-095e-425a-9fe9-154f5da7e0c2-spark21-sparkms-batch-1
Allocation User  : Admin
ALLOCATION REQUEST:
Allocation Name  : 2b24aa52-095e-425a-9fe9-154f5da7e0c2-spark21-sparkms-batch-1
Consumer         : /SparkConsumer/Driver
Resource Group   : DriversRG
Requirement      : select(('X86_64' || 'LINUXPPC64' || 'LINUXPPC64LE') && ('ascd_pkg_deployed'==1))
MINSLOTS  MAXSLOTS  EXCLUSIVE  TILE
0         0         No         0
ALLOCATED RESOURCE:
RESOURCE  ALLOCATED  OCCUPY
BLOCKED RESOURCE LIST:
testhost1.lab.abc.com
testhost2.lab.abc.com
3. After fixing the misconfiguration that caused errors on these hosts, use the egosh alloc unblock command to unblock the blocked host so that the host can be allocated to its consumer again. For example:
# egosh alloc unblock
Please enter allocation ID:285
Please enter the number of hosts: 1
<1> Please enter one host name:testhost1.lab.abc.com
Request to unblock allocation <285> has been submitted successfully, use <alloc view> to check results
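To verify the result, view the same allocation again; the unblocked host should no longer appear in the BLOCKED RESOURCE LIST section. For example, using allocation ID 285 from the earlier output:
# egosh alloc view 285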
Copyright and trademark information
© Copyright IBM Corporation 2017
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.