===============================================================================

Readme file for: IBM Platform Conductor for Spark

Product/Component Release: 1.1.0

Update name: Spark 1.5.2 Package

Fix ID: pcs-1.1-build398394

Publication date: 8 April 2016

 

Updated Spark 1.5.2 package for IBM® Platform Conductor for Spark v1.1.0

===============================================================================

 

=========================

CONTENTS

=========================

1. About the Spark 1.5.2 package

2. Prerequisites

3. Installation and Configuration

4. Copyright

 

=========================

1. About the Spark 1.5.2 package

=========================

The Spark version package (Spark1.5.2-Conductor1.1.tgz) supports Spark version 1.5.2 for Platform Conductor for Spark v1.1.0 and includes the following updates:

·         Applications & Notebooks page did not display details if the Spark instance group execution user was different from Cluster Admin
Previously, if the execution user specified when creating a Spark instance group was not the Cluster Admin specified at installation time, the Applications & Notebooks page did not display any applications. This issue is now fixed.

·         Spark applications could not be killed by spark-submit
Spark applications could not be killed when multiple applications were running at the same time. This issue is now fixed.

·         Service-to-slot ratio
You can now configure each task to run with multiple slots at the Spark instance group level. To configure this setting, edit the Spark instance group configuration and define SPARK_EGO_SLOTS_PER_TASK under Session Scheduler settings. Valid values are positive integers (1 or greater); the default is 1 (see the configuration sketch after this list).

·         Support for exclusive slot allocation
You can now schedule Spark applications using the exclusive slot allocation policy. When slot allocation is exclusive, all of a host's free slots are allocated to one consumer at a time. Use this allocation to resolve resource fragmentation.

·         Support for hybrid policy with exclusive slot allocation
When using the exclusive slot allocation policy with a hybrid policy, you can now configure the reclaim grace period to specify how long (in seconds) the Spark master waits before reclaiming resources from applications. To configure this setting, edit the Spark instance group configuration and define SPARK_EGO_RECLAIM_GRACE_PERIOD under Session Scheduler settings. Valid values are 0 to 8640000. The default is 0, in which case the Spark driver kills any running tasks and returns resources to the Spark master immediately (see the configuration sketch after this list).

·         Spark application crashes
Previously, a JNI library mismatch with the EGO version caused some Spark applications to crash. This issue is now fixed.

·         SPARK_EGO_EXECUTOR_SLOTS_RESERVE behavior
In some cases, setting SPARK_EGO_EXECUTOR_SLOTS_RESERVE (under Spark on EGO settings) to more than 1 increased the resource demand of Spark applications far more than expected. This issue is now fixed.

·         Prevent kill action from the Master UI when EGO_AUTH mode set
You can no longer kill (stop) Spark applications from the Spark master UI when EGO_AUTH mode is configured, even with spark.ui.killEnabled set to true.

·         Support for uname and passwd to stop Spark applications using RESTful APIs
You can now stop (kill) Spark applications through RESTful APIs when spark.ego.uname and spark.ego.passwd are set (see the spark-submit sketch after this list).

·         Master UI now displays Spark applications
Previously, the Spark master UI did not render Spark applications correctly. This issue is now fixed, and the master UI displays Spark applications as expected.

·         Rebuilding the historical Spark web UI no longer blocks the master
Previously, when a long-running application finished, it could take a while (sometimes minutes) to rebuild the Spark web UI, and no other applications could register with the master in the meantime. This issue is now fixed (https://issues.apache.org/jira/browse/SPARK-12062).
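
The following is a minimal sketch of how the two new Session Scheduler settings described above might look, shown in NAME=VALUE form as defined in the Spark instance group configuration. The values 2 and 300 are illustrative only; choose values that suit your workload.

    SPARK_EGO_SLOTS_PER_TASK=2             # each task runs with 2 slots (default is 1)
    SPARK_EGO_RECLAIM_GRACE_PERIOD=300     # the Spark master waits up to 300 seconds before reclaiming resources (default is 0)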
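
The following sketch shows one way to supply spark.ego.uname and spark.ego.passwd when an application is submitted. The user name and password (Admin/Admin) and the placeholders for the remaining submit options are examples only; as with other Spark configuration properties, these can typically also be set in the application's spark-defaults.conf.

    spark-submit --conf spark.ego.uname=Admin --conf spark.ego.passwd=Admin <other submit options> <application JAR> [application arguments]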

 

=========================
2. Prerequisites
=========================

2.1 Platform Conductor for Spark v1.1.0 must be installed on a supported operating system. For more information, see http://www.ibm.com/support/knowledgecenter/SSVH2B_1.1.0/install/install.dita.

2.2 Interim fix 398394 (conductor1.1.0_x86_64-build398394.tar.gz) must be installed. Refer to readme_build398394.html for instructions.

=========================

3. Installation and Configuration

=========================

3.1  Download the Spark1.5.2-Conductor1.1.tgz package to a local directory on your computer.

3.2  Log in to the Platform Management Console (PMC) as cluster administrator.

3.3  From the PMC, click Workload > Spark > Version Management.

3.4  If you have previously added Spark 1.5.2, select the Spark 1.5.2 package and click Remove.

3.5  Click Add.

3.6  Click Browse and select the Spark1.5.2-Conductor1.1.tgz package from this interim fix.

3.7  Click Add.

            The updated Spark 1.5.2 package is added to Platform Conductor for Spark. 

NOTE: If you do not want your existing Spark instance groups to pick up the new Spark 1.5.2 version, do not edit the Spark configuration for those Spark instance groups. Editing the configuration will cause the new Spark 1.5.2 package to be deployed for that Spark instance group.

3.8 To modify an existing Spark instance group using a previous Spark 1.5.2 package to use the new Spark 1.5.2 package, perform the following:
1) Stop the existing Spark instance group.

2) Back up the configuration files for this Spark instance group (for example, using commands like those shown after this list):

   $SPARK_HOME/conf
   $SPARK_HOME/../master_conf
   $SPARK_HOME/../shuffle_conf
   $SPARK_HOME/../history_conf (if history was enabled)
   Any other manual configuration changes you have made.
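
   For example, a minimal backup sketch. The backup location /tmp/sig_backup and the use of cp are illustrative only; $SPARK_HOME refers to the Spark deployment directory of the instance group:

      mkdir -p /tmp/sig_backup
      cp -rp $SPARK_HOME/conf            /tmp/sig_backup/conf
      cp -rp $SPARK_HOME/../master_conf  /tmp/sig_backup/master_conf
      cp -rp $SPARK_HOME/../shuffle_conf /tmp/sig_backup/shuffle_conf
      cp -rp $SPARK_HOME/../history_conf /tmp/sig_backup/history_conf   # only if history was enabled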

3) Follow Option 1 or 2 to deploy the new Spark package to the existing Spark instance group. The Spark package referred to below is the Spark1.5.2.tgz file that you extract from the Spark1.5.2-Conductor1.1.tgz file.

OPTION 1: Modify the Spark configuration in the existing Spark instance group. Modifying the configuration triggers a redeploy of the Spark package, which then picks up the latest version.

OPTION 2: Manually update the Spark package for the existing Spark instance group:
a) From the command line interface, source the profile.

Find the package name for the existing Spark instance group that you want to update. The package name is in the form SPARKINSTANCEGROUPNAME_Spark1.5.2. For example, if the name of a Spark instance group is LOB, the package name is LOB_Spark1.5.2. Also, find the consumer path for this package; this is the top-level consumer for the Spark instance group.
Run the command:
soamdeploy add <packagename> -p <Path to Spark1.5.2.tgz file> -c <consumerPath>

For example:
soamdeploy add LOB_Spark1.5.2 -p /tmp/Spark1.5.2.tgz -c /LOB

This command creates a new version of the package for your Spark instance group. (A complete command-line example for this step appears after step b.)
b) From the Spark Instance Groups page, navigate to the Hosts tab for the group and click Deploy Spark to Hosts.
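
For reference, the command-line portion of step a) might look like the following sequence. The installation directory (/opt/ibm/conductorspark), the profile name (profile.platform, as used by other Platform EGO-based products), and the Spark instance group name (LOB) are assumptions; substitute the values for your own cluster.

    . /opt/ibm/conductorspark/profile.platform                      # sets up the command-line environment; path and file name depend on your installation
    soamdeploy add LOB_Spark1.5.2 -p /tmp/Spark1.5.2.tgz -c /LOB    # creates a new package version for the LOB Spark instance group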

 

4) Start the Spark instance group.

 

=========================

4. Copyright

=========================

© IBM Corporation 2016

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. 

Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the 

Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml