===============================================================================
Readme file for: IBM® Platform Conductor for Spark
Product/Component Release: 1.1.0
Fix ID: pcs-1.1-build398394

Publication date: 8 April 2016

Abstract:
Interim fix containing fixes/enhancements for IBM® Platform Conductor for Spark v1.1.0
===============================================================================

=========================
CONTENTS
=========================
1. About this interim fix
2. List of enhancements
3. List of fixes
4. Installation and configuration
5. Copyright

=========================
1. About this interim fix
=========================

This Platform Conductor for Spark interim fix applies to a v1.1.0 cluster and includes the following updates:

·         Significant enhancements, including the ability to retrieve Spark driver logs from within the Platform Management Console (PMC). See subsequent sections for a complete list.

·         Several fixes. See subsequent sections for a complete list.

=========================
2. List of enhancements
=========================

This interim fix includes the following enhancements:

·         Validation for Add Notebook wizard parameters
Validation for the following fields in the Add Notebook wizard has been changed to prevent ascd crashes:
- Start Command
- Stop Command
- Prestart Command
- Job Monitor Command
- Environment Variable Name/Value

The new set of supported characters is:
A-Z a-z 0-9 ~ ` ! @ $ ^ * ( ) _ + \ - | } { : ? > < , . \/ ; ' ] [ "

·         Submit Spark batch applications without arguments from the PMC
You can now submit Spark batch applications from the PMC without specifying any arguments, much like you would with the spark-submit command.
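
For comparison, this matches submitting through spark-submit with no trailing application arguments (the class and jar path here are illustrative):

         spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/lib/spark-examples.jar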

·         Source additional third-party environment settings
You can now configure an additional directory with your own spark-env.sh file to pick up environment variable settings that you require beyond the Spark environment variables. To load this spark-env.sh file on startup, set SPARK_EGO_CONF_DIR_EXTRA under Spark on EGO to an existing directory containing your custom spark-env.sh file.
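
For example, a minimal custom file might look like the following (the directory and the extra settings are illustrative; only the SPARK_EGO_CONF_DIR_EXTRA parameter comes from this fix):

         # /opt/custom/spark-conf/spark-env.sh (illustrative contents)
         export HADOOP_CONF_DIR=/etc/hadoop/conf
         export LD_LIBRARY_PATH=/opt/thirdparty/lib:$LD_LIBRARY_PATH

You would then set SPARK_EGO_CONF_DIR_EXTRA=/opt/custom/spark-conf under Spark on EGO.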

·         Retrieve Spark Driver logs from the PMC

You can now download Spark logs for a failed driver from the PMC if you are a cluster administrator or have the GUI Access Spark Application Logs permission. On the Applications & Notebooks page, when you have a failed driver in the Applications tab, you will see a (logs) hyperlink beside the application state. Click the link to download stdout and stderr logs.

For running or finished Spark applications, go to the Spark Applications page; then in the Logs & Metrics tab, you will see the Spark Driver Logs hyperlink.

·         Separate consumer and resource group for Shuffle service and executors
When creating a Spark instance group, you can now choose a different consumer and resource group for both the Shuffle service and the executors.

NOTE: If you choose a different resource group, ensure that both resource groups have the same set of hosts.

·         Create Spark Instance Groups without EGO Services Credential Configuration permission
Previously, you could create a Spark instance group only if you were a cluster administrator or a user with a role that had the EGO Services Credential Configuration permission. Consumer administrators can now create Spark instance groups without the EGO Services Credential Configuration permission.

=========================
3. List of fixes
=========================

This interim fix includes the following fixes:

·         PMC with SSL disabled
With SSL disabled, cookies for the PMC were still marked as secure, so the browser was unable to read them, resulting in login prompt issues. Sessions used after 24 hours could also produce errors because of expired tokens. This issue has been fixed.

·         Authentication required when ASCD is running on another host; cookies not cleared on new user login

With this interim fix, logging in to the cluster now uses the appropriate user cookies; communication with the ASCD/WEBGUI running on different hosts no longer prompts you to log in to the REST server.

To support this fix, ensure that all management hosts run in the same domain. For example, with two management hosts MANGA and MANGB in your cluster, both hosts must run in the same domain and must include the fully qualified domain name in the hostname: MANGA.domainname.com, MANGB.domainname.com.
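
To verify the configuration, you can check that each management host reports a fully qualified name (a quick check; the expected output is illustrative):

         hostname -f
         # Expected output, for example: MANGA.domainname.com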

·         Login issues following PMC timeout

Previously, when the PMC timed out for one user and you tried to log in as a different user, the error message did not specify who the currently logged-in user was. This interim fix improves the error message to tell you which user is logged in. You must then log in as that user, log out, and log in again as the new user.

·         NullPointerException when removing a Spark Instance Group

Previously, when a consumer being used by a Spark Instance Group was deleted, you could no longer delete the Spark Instance Group. With this interim fix, you can now delete Spark Instance Groups even if the associated consumer was deleted previously.

·         Consumer Admin can no longer add/remove Spark versions and notebooks

With this interim fix, the cluster administrator is the only out-of-the-box role with permissions to add or remove Spark versions and notebooks.

·         MaxInstancesPerHost setting removed for notebook services

When assigning users to notebooks, the service profile generated for the notebook no longer includes the MaxInstancesPerHost setting, which does not work with the exclusive slot allocation policy.

NOTE: This change applies to new notebooks assigned to users. Existing notebooks assigned to users will still use MaxInstancesPerHost.

·         Master UI now displays Spark applications

Previously, the Spark master UI did not render Spark applications correctly. This issue is now fixed, and the master UI builds correctly to display Spark applications.

·         User is no longer logged out while using the PMC

Previously, when using the Spark menu on the PMC, users were sometimes timed out of the PMC even while actively using it. With this interim fix, a user is logged out only after the session has been inactive for longer than the value specified by the SessionExpireTimeout parameter.

·         Spark Shuffle Service copies configuration files once

Previously, the Spark Shuffle Service would copy the configuration files for every host it started on. When using IBM Spectrum Scale, where this configuration file directory was the same for all hosts, this behavior could corrupt the configuration files. This issue has now been fixed, so that the configuration is copied only once.

·         Rebuilding the historical Spark web UI no longer blocks the master

Previously, when a long-running application finished, rebuilding the Spark web UI could take minutes, during which no other applications could register with this master. This issue has now been fixed (https://issues.apache.org/jira/browse/SPARK-12062).

·         Communication issues between shipper service and logstash forwarder

Under heavy load, the shipper service (elk-indexer) and the logstash forwarder (elk-shipper) in the ELK stack sometimes lost communication. As a result, Kibana dashboards did not show updated information and submitted applications did not display. This issue is now resolved.

·         Applications pages in PMC become extremely slow with hundreds of applications

When hundreds of Spark applications were submitted, the application pages under Workload > Spark became very slow. These pages now perform better because the amount of network data sent has been reduced.

·         RS logs incorrectly show ERROR message related to SOAM_HOME setting

The repository server (RS) logs included an ERROR message stating that SOAM_HOME was not set. Because this message has no impact on the RS, it is now logged as a WARN.

·         $EGO_TOP/soam/deploy incorrectly shows engbuild as file owner

For Spark instance group deployments, the owner of the $EGO_TOP/soam/deploy folder was previously engbuild. The owner is now correctly set to the user performing the deployment.

·         Improved deployment error messages

During deployments, if the deploy.sh file does not exist or cannot be executed because of permission issues, error messages now help you better identify the problem.

=========================
4. Installation and configuration
=========================

Prerequisite: Platform Conductor for Spark v1.1.0 must be installed. For more information, see http://www.ibm.com/support/knowledgecenter/SSVH2B_1.1.0/install/install.dita.


NOTE: When copying files as described in the following steps, ensure that file permissions are identical to permissions set for the cluster administrator.
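
For example, after copying a file you can compare its ownership and mode against your backup copy (the egoadmin user name and the 644 mode shown here are illustrative):

         ls -l $EGO_CONFDIR/RBAC_Permission_EGO.xml
         chown egoadmin:egoadmin $EGO_CONFDIR/RBAC_Permission_EGO.xml
         chmod 644 $EGO_CONFDIR/RBAC_Permission_EGO.xml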


4.1.  Log in as the cluster administrator.


4.2.  From the egosh command prompt, shut down all services and the cluster:
        egosh service stop all
        egosh ego shutdown -f all


4.3.  Back up the following files in your installation to a separate backup directory; this interim fix replaces these files (see the sample backup commands after the list):

$EGO_TOP/asc/1.1.1/lib/egogui.jar

$EGO_TOP/asc/1.1.1/lib/asc-core-1.1.1.jar

$EGO_TOP/asc/1.1.1/lib/asc-common-1.1.1.jar

$EGO_TOP/asc/1.1.1/lib/commons-ego.jar

$EGO_TOP/gui/3.3/lib/egogui.jar

$EGO_TOP/gui/3.3/lib/asc-common-1.1.1.jar

$EGO_TOP/gui/3.3/lib/commons-ego.jar

$EGO_TOP/wlp/usr/shared/resources/rest/3.3/egogui.jar

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/framework/web/filter/SessionTimeOutFilter.class

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/instances/action/SparkyInstanceListAction.class

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/instances/action/struts.xml

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/common/SparkTimeoutResult.class

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/common/SparkRestClient.class

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/common/js/ConductorSparkApp.js

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/instance/js/instanceViewApplications.controller.js

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/js/appsAndNotebooksList.controller.js

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/js/applicationView.controller.js

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/applicationView.html

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/applicationView.jsp

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/i18n/locale-en.json

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/instance/i18n/locale-en.json

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/addNotebook.html

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/i18n/locale-en.json

$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/js/notebookList.controller.js

$EGO_TOP/wlp/usr/shared/resources/rest/3.3/commons-ego.jar

$EGO_TOP/3.3/linux-x86_64/lib/librbac_ego_default.so

$EGO_TOP/3.3/linux-x86_64/lib/rbac_ego_default.so

$EGO_TOP/perf/soam/7.1.1/lib/commons-ego.jar

$EGO_TOP/soam/7.1.1/linux-x86_64/bin/soamdeploy

$EGO_TOP/integration/elk/scripts/esjobmonitor.sh

$EGO_TOP/integration/elk/scripts/indexerjobmonitor.sh

$EGO_TOP/integration/elk/scripts/kbjobmonitor.sh

$EGO_TOP/integration/elk/scripts/shipper.conf

$EGO_TOP/integration/elk/scripts/shipperjobmonitor.sh

$EGO_TOP/integration/elk/scripts/startes.sh

$EGO_TOP/integration/elk/scripts/startindexer.sh

$EGO_TOP/integration/elk/scripts/startshipper.sh

$EGO_CONFDIR/../../gui/conf/useracl/permission_GUIPermissionSoam.acl

$EGO_CONFDIR/../../gui/conf/useracl/permission_actionElementsSoam.acl

$EGO_CONFDIR/RBAC_Permission_EGO.xml   
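
For example, you can copy each file into a backup directory while preserving its path (a sketch assuming GNU cp; the /tmp/pcs-fix-backup location is illustrative):

         mkdir -p /tmp/pcs-fix-backup
         cp --parents -p $EGO_TOP/asc/1.1.1/lib/egogui.jar /tmp/pcs-fix-backup
         cp --parents -p $EGO_CONFDIR/RBAC_Permission_EGO.xml /tmp/pcs-fix-backup
         # ...repeat for each file listed above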


4.4.  On every host in the cluster, extract the conductor1.1.0_x86_64-build398394.tar.gz file:

         tar zxfo conductor1.1.0_x86_64-build398394.tar.gz -C $EGO_TOP
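
To spot-check the extraction on a host, you can verify that one of the replaced files now carries the build's timestamp, for example:

         ls -l $EGO_TOP/gui/3.3/lib/egogui.jar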


4.5.  On any management host in the cluster, complete one of the following steps:


·         If you have not customized RBAC roles, copy $EGO_TOP/tmp/RBAC_Role.xml to $EGO_CONFDIR/RBAC_Role.xml.

·         If you added new user roles to the default user roles, edit the $EGO_CONFDIR/RBAC_Role.xml file in your preferred text editor as follows:

Add the <Permission>GUI_ACCESS_SPARK_APPLICATION_LOGS</Permission> permission for the following roles:
- Cluster Admin
- Consumer Admin
- Consumer User
- Cluster Admin (Read Only)
- Consumer Admin (Read Only)
- Any custom roles that should be able to download the Spark Driver logs from the PMC

Remove the <Permission>SPARK_NOTEBOOK_CONFIGURE</Permission> and <Permission>SPARK_VERSION_CONFIGURE</Permission> permissions from the Consumer Admin role. A sketch of the intended result follows.
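
The following sketch shows the intended result for the Consumer Admin role. The wrapper elements are assumptions based on a typical role definition; match them to the actual layout of your RBAC_Role.xml:

         <Role name="Consumer Admin">                                    <!-- illustrative wrapper -->
             ...
             <Permission>GUI_ACCESS_SPARK_APPLICATION_LOGS</Permission>  <!-- added in this step -->
             <!-- removed: <Permission>SPARK_NOTEBOOK_CONFIGURE</Permission> -->
             <!-- removed: <Permission>SPARK_VERSION_CONFIGURE</Permission> -->
         </Role>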


4.6.  On any management host in the cluster, copy $EGO_TOP/kernel/conf/RBAC_Permission_EGO.xml to $EGO_CONFDIR/RBAC_Permission_EGO.xml.


4.7.  On any management host in the cluster, copy $EGO_TOP/gui/conf/useracl/permission_GUIPermissionSoam.acl to $EGO_CONFDIR/../../gui/conf/useracl/permission_GUIPermissionSoam.acl.


4.8.  On any management host in the cluster, copy $EGO_TOP/gui/conf/useracl/permission_actionElementsSoam.acl to $EGO_CONFDIR/../../gui/conf/useracl/permission_actionElementsSoam.acl.


4.9.  On every management host in the cluster, modify $EGO_TOP/integration/elk/scripts/startes.sh to replace the following three lines:

·         Line 3: Replace @JAVA_HOME@ with the string EGO_TOP/jre/3.3/linux-x86_64, where EGO_TOP is the value of the $EGO_TOP environment variable in your environment. You can also script this edit; see the sed sketch after this list.
For example, after modification, the line could be:
export JAVA_HOME=/opt/ibm/platform/jre/3.3/linux-x86_64

·         Line 85: Replace the line

curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks -d '{"title":"Completed Tasks","visState":"{\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"min_doc_count\":1,\"extended_bounds\":{},\"json\":\"{\\\"interval\\\":\\\"10s\\\"}\"}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'

with

curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks -d '{"title":"Completed Tasks","visState":"{\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"type\":\"max\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"min_doc_count\":1,\"extended_bounds\":{},\"json\":\"{\\\"interval\\\":\\\"10s\\\"}\"}},{\"id\":\"3\",\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"1\"}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'


·         Line 117: Replace the line

curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks-per-Executor -d '{"title":"Completed Tasks per Executor","visState":"{\"type\":\"histogram\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"scale\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{},\"spyPerPage\":10},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"custom\",\"orderAgg\":{\"id\":\"2-orderAgg\",\"type\":\"count\",\"schema\":\"orderAgg\",\"params\":{}}}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'

with

curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks-per-Executor -d '{"title":"Completed Tasks per Executor","visState":"{\"type\":\"histogram\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"scale\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{},\"spyPerPage\":10},\"aggs\":[{\"id\":\"1\",\"type\":\"max\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"custom\",\"orderAgg\":{\"id\":\"2-orderAgg\",\"type\":\"count\",\"schema\":\"orderAgg\",\"params\":{}}}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'
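
If you prefer to script the line 3 edit instead of changing it by hand, a one-line substitution works (a sketch assuming GNU sed; back up the script first):

         sed -i "3s|@JAVA_HOME@|$EGO_TOP/jre/3.3/linux-x86_64|" $EGO_TOP/integration/elk/scripts/startes.sh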


4.10.  On every management host in the cluster, modify $EGO_TOP/integration/elk/scripts/startindexer.sh to replace @JAVA_HOME@ on line 3 with the string EGO_TOP/jre/3.3/linux-x86_64, where EGO_TOP is the value of the $EGO_TOP environment variable in your environment.
          For example, after modification, the line could be:
          export JAVA_HOME=/opt/ibm/platform/jre/3.3/linux-x86_64
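
The same scripted substitution applies here (again a sketch assuming GNU sed):

          sed -i "3s|@JAVA_HOME@|$EGO_TOP/jre/3.3/linux-x86_64|" $EGO_TOP/integration/elk/scripts/startindexer.sh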


4.11.  Restart the cluster:
          egosh ego start -f all


4.12. Log in to the PMC as the cluster administrator.


After installing this interim fix, ensure that all users who connect to the PMC clear their browser cache.

=========================
5. Copyright
=========================

© IBM Corporation 2016

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. 

Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.