===============================================================================
Readme file for: IBM® Platform Conductor for Spark
Product/Component Release: 1.1.0
Fix ID: pcs-1.1-build398394
Publication date: 8 April 2016
Abstract: Interim fix containing fixes/enhancements for IBM® Platform Conductor for Spark v1.1.0
===============================================================================
=========================
CONTENTS
=========================
1. About this interim fix
2. List of enhancements
3. List of fixes
4. Installation and configuration
5. Copyright
=========================
1. About this interim fix
=========================
This Platform Conductor for Spark interim fix applies to a v1.1.0 cluster and
includes the following updates:
· Significant enhancements, including the ability to retrieve Spark driver logs
  from within the Platform Management Console (PMC). See section 2 for a
  complete list.
· Several fixes. See section 3 for a complete list.
=========================
2. List of enhancements
=========================
This interim fix includes the following enhancements:
· Validation for Add Notebook wizard parameters
  Validation for the following fields in the Add Notebook wizard has been
  changed to prevent ascd crashes:
  - Start Command
  - Stop Command
  - Prestart Command
  - Job Monitor command
  - Environment Variable Name/Value
  The new set of supported characters is:
  A-Z a-z 0-9 ~ ` ! @ $ ^ * ( ) _ + \ - | } { : ? > < , . \/ ; ' ] [ "
· Submit Spark batch applications without arguments from the PMC
  You can now submit Spark batch applications from the PMC without specifying
  any arguments, much like you would with the spark-submit command.
· Source additional third-party environment settings
  You can now configure an additional directory with your own spark-env.sh file
  to pick up environment variable settings that you require beyond the Spark
  environment variables. To load this spark-env.sh file on startup, set
  SPARK_EGO_CONF_DIR_EXTRA under Spark on EGO to an existing directory
  containing your custom spark-env.sh file (see the sketch at the end of this
  section).
· Retrieve Spark Driver logs from the PMC
  You can now download Spark logs for a failed driver from the PMC if you are a
  cluster administrator or have the GUI Access Spark Application Logs
  permission. On the Applications & Notebooks page, when you have a failed
  driver in the Applications tab, a (logs) hyperlink appears beside the
  application state. Click the link to download the stdout and stderr logs.
  For running or finished Spark applications, go to the Spark Applications
  page; the Spark Driver Logs hyperlink appears in the Logs & Metrics tab.
· Separate consumer and resource group for the Shuffle service and executors
  When creating a Spark instance group, you can now choose a different consumer
  and resource group for both the Shuffle service and the executors.
  NOTE: If you choose a different resource group, ensure that both resource
  groups have the same set of hosts.
· Create Spark instance groups without the EGO Services Credential
  Configuration permission
  Previously, you could create a Spark instance group only if you were a
  cluster administrator or a user with a role that had the EGO Services
  Credential Configuration permission. Now, consumer administrators can create
  Spark instance groups without the EGO Services Credential Configuration
  permission.
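A minimal sketch of the SPARK_EGO_CONF_DIR_EXTRA enhancement above; the
directory path and the variables exported in spark-env.sh are illustrative
assumptions, not values shipped with this fix:
# Create a directory for your custom spark-env.sh (example path):
mkdir -p /opt/custom-spark-conf
cat > /opt/custom-spark-conf/spark-env.sh <<'EOF'
# Additional environment settings to source at Spark startup (example values):
export MY_LIB_HOME=/opt/mylib
export LD_LIBRARY_PATH=$MY_LIB_HOME/lib:$LD_LIBRARY_PATH
EOF
# Then, in the PMC, set SPARK_EGO_CONF_DIR_EXTRA under Spark on EGO to
# /opt/custom-spark-conf so that this file is loaded on startup.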
=========================
3. List of fixes
=========================
This interim fix includes the following fixes:
· PMC with SSL disabled
  With SSL disabled, cookies for the PMC were still sent as secure cookies,
  which the browser was unable to read. This issue has been fixed to avoid
  login prompt issues. Previously, if the PMC was used after 24 hours, errors
  could also occur on account of expired tokens.
· Authentication required when the ASCD is running on another host; cookies
  not cleared on new user login
  With this interim fix, logging in to the cluster now uses the appropriate
  user cookies; communication with the ASCD/WEBGUI running on different hosts
  no longer prompts you to log in to the REST server.
  To support this fix, ensure that all management hosts run in the same domain.
  For example, with two management hosts MANGA and MANGB in your cluster, both
  hosts must run in the same domain and must include the fully qualified domain
  name in the hostname: MANGA.domainname.com, MANGB.domainname.com.
· Login issues following PMC timeout
  Previously, when the PMC timed out for one user and you tried to log in as a
  different user, the error message did not specify who the currently
  logged-in user was. This interim fix improves the error message to tell you
  which user is logged in. You must then log in as that user, log out, and log
  in again as the new user.
· NullPointerException when removing a Spark instance group
  Previously, when a consumer being used by a Spark instance group was deleted,
  you could no longer delete the Spark instance group. With this interim fix,
  you can delete Spark instance groups even if the associated consumer was
  deleted previously.
· Consumer Admin can no longer add or remove Spark versions and notebooks
  With this interim fix, the cluster administrator is the only out-of-the-box
  role with permissions to add or remove Spark versions and notebooks.
· MaxInstancesPerHost setting removed for notebook services
  When assigning users to notebooks, the service profile generated for the
  notebook no longer includes the MaxInstancesPerHost setting, which does not
  work with the exclusive slot allocation policy.
  NOTE: This change applies to new notebooks assigned to users. Existing
  notebooks assigned to users will still use MaxInstancesPerHost.
· Master UI now displays Spark applications
  Previously, the Spark master UI did not render Spark applications correctly.
  With this issue now fixed, the master UI displays Spark applications
  correctly.
· User is no longer logged out while using the PMC
  Previously, when using the Spark menu on the PMC, users were sometimes timed
  out of the PMC even while actively using it. With this interim fix, a user is
  logged out only when the session has been inactive for the value specified by
  the SessionExpireTimeout parameter.
· Spark Shuffle service copies configuration files only once
  Previously, the Spark Shuffle service copied the configuration files on every
  host it started on. When using IBM Spectrum Scale, where this configuration
  file directory was the same for all hosts, this behavior could corrupt the
  configuration files. This issue has now been fixed, so that the configuration
  is copied only once.
· Rebuilding the historical Spark web UI is now asynchronous
  Previously, when a long-running application finished, it took a while
  (sometimes minutes) to rebuild the Spark web UI. In the meantime, no other
  applications could register with the master. This issue has now been fixed
  (https://issues.apache.org/jira/browse/SPARK-12062).
· Communication issues between the shipper service and the logstash forwarder
  Under heavy load, the shipper service (elk-indexer) and the logstash
  forwarder (elk-shipper) in the ELK stack sometimes lost communication. As a
  result, Kibana dashboards did not have updated information, and submitted
  applications did not display. This issue is now resolved.
· Applications pages in the PMC became extremely slow with hundreds of
  applications
  Previously, when hundreds of Spark applications were submitted, the
  application pages under Workload > Spark became very slow. The performance of
  these pages is now improved by reducing the amount of network data sent.
· RS logs incorrectly show an ERROR message related to the SOAM_HOME setting
  In the repository server (RS) logs, an ERROR message was logged stating that
  SOAM_HOME was not set. This message now shows as a WARN, as it has no impact
  on the RS.
· $EGO_TOP/soam/deploy incorrectly shows engbuild as the file owner
  For Spark instance group deployments, the owner of the $EGO_TOP/soam/deploy
  folder was previously engbuild. The owner is now set correctly to the user
  executing the deployment.
· Improved deployment error messages
  During deployments, if the deploy.sh file does not exist or lacks execute
  permission, error messages now help you better identify the issue.
=========================
4. Installation and configuration
=========================
Prerequisite: Platform Conductor for Spark v1.1.0 must be installed. For more
information, see http://www.ibm.com/support/knowledgecenter/SSVH2B_1.1.0/install/install.dita
NOTE: When copying files as described in the following steps, ensure that file
permissions are identical to the permissions set for the cluster administrator.
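For example, the -p option of cp preserves mode, ownership, and timestamps;
this is a general shell sketch with example paths, not a step of this fix:
# Copy a file while preserving its permissions and ownership:
cp -p /path/to/source/file /path/to/destination/file
# Verify that the permissions match:
ls -l /path/to/source/file /path/to/destination/file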
4.1. Log in as the cluster administrator.
4.2. From the egosh command prompt, shut down all services and the cluster:
egosh service stop all
egosh ego shutdown -f all
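Optionally, before continuing, you can confirm that the services have stopped
by listing them (output format varies by version):
egosh service list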
4.3. Back up the following files in your installation to a separate backup
directory; these files will be replaced by new ones in this interim fix (a
backup sketch follows the list):
$EGO_TOP/asc/1.1.1/lib/egogui.jar
$EGO_TOP/asc/1.1.1/lib/asc-core-1.1.1.jar
$EGO_TOP/asc/1.1.1/lib/asc-common-1.1.1.jar
$EGO_TOP/asc/1.1.1/lib/commons-ego.jar
$EGO_TOP/gui/3.3/lib/egogui.jar
$EGO_TOP/gui/3.3/lib/asc-common-1.1.1.jar
$EGO_TOP/gui/3.3/lib/commons-ego.jar
$EGO_TOP/wlp/usr/shared/resources/rest/3.3/egogui.jar
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/framework/web/filter/SessionTimeOutFilter.class
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/instances/action/SparkyInstanceListAction.class
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/instances/action/struts.xml
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/common/SparkTimeoutResult.class
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/WEB-INF/classes/com/platform/gui/spark/common/SparkRestClient.class
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/common/js/ConductorSparkApp.js
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/instance/js/instanceViewApplications.controller.js
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/js/appsAndNotebooksList.controller.js
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/js/applicationView.controller.js
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/applicationView.html
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/applicationView.jsp
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/appsandnotebooks/i18n/locale-en.json
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/instance/i18n/locale-en.json
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/addNotebook.html
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/i18n/locale-en.json
$EGO_TOP/wlp/usr/servers/gui/apps/soam/7.1.1/symgui/spark/notebooks/js/notebookList.controller.js
$EGO_TOP/wlp/usr/shared/resources/rest/3.3/commons-ego.jar
$EGO_TOP/3.3/linux-x86_64/lib/librbac_ego_default.so
$EGO_TOP/3.3/linux-x86_64/lib/rbac_ego_default.so
$EGO_TOP/perf/soam/7.1.1/lib/commons-ego.jar
$EGO_TOP/soam/7.1.1/linux-x86_64/bin/soamdeploy
$EGO_TOP/integration/elk/scripts/esjobmonitor.sh
$EGO_TOP/integration/elk/scripts/indexerjobmonitor.sh
$EGO_TOP/integration/elk/scripts/kbjobmonitor.sh
$EGO_TOP/integration/elk/scripts/shipper.conf
$EGO_TOP/integration/elk/scripts/shipperjobmonitor.sh
$EGO_TOP/integration/elk/scripts/startes.sh
$EGO_TOP/integration/elk/scripts/startindexer.sh
$EGO_TOP/integration/elk/scripts/startshipper.sh
$EGO_CONFDIR/../../gui/conf/useracl/permission_GUIPermissionSoam.acl
$EGO_CONFDIR/../../gui/conf/useracl/permission_actionElementsSoam.acl
$EGO_CONFDIR/RBAC_Permission_EGO.xml
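The following backup sketch assumes GNU cp (for the --parents option) and that
$EGO_TOP and $EGO_CONFDIR are set in your shell; the backup directory is an
example path:
BACKUP_DIR=/tmp/pcs-1.1-build398394-backup   # example location
mkdir -p "$BACKUP_DIR"
# Repeat for each file in the list above; --parents keeps the directory layout:
cp -p --parents "$EGO_TOP/asc/1.1.1/lib/egogui.jar" "$BACKUP_DIR"
cp -p --parents "$EGO_CONFDIR/RBAC_Permission_EGO.xml" "$BACKUP_DIR"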
4.4. On every host in the cluster, extract the
conductor1.1.0_x86_64-build398394.tar.gz file (see the sketch after this step):
tar zxfo conductor1.1.0_x86_64-build398394.tar.gz -C $EGO_TOP
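A sketch for running the extraction on all hosts from one machine; it assumes
passwordless SSH, a hosts.txt file listing every cluster host (both
assumptions), and that $EGO_TOP is the same path on every host:
for host in $(cat hosts.txt); do
    scp conductor1.1.0_x86_64-build398394.tar.gz "$host:/tmp/"
    ssh "$host" "tar zxfo /tmp/conductor1.1.0_x86_64-build398394.tar.gz -C $EGO_TOP"
done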
4.5. On any management host in the cluster, complete one of the following steps
(a hypothetical excerpt follows this step):
· If you have not customized RBAC roles, copy $EGO_TOP/tmp/RBAC_Role.xml to
  $EGO_CONFDIR/RBAC_Role.xml.
· If you added new user roles to the default user roles, edit the
  $EGO_CONFDIR/RBAC_Role.xml file in your preferred text editor as follows:
  Add the permission <Permission>GUI_ACCESS_SPARK_APPLICATION_LOGS</Permission>
  for the following roles:
  - Cluster Admin
  - Consumer Admin
  - Consumer User
  - Cluster Admin (Read Only)
  - Consumer Admin (Read Only)
  - Any custom roles that should be able to download the Spark Driver logs
    from the PMC.
  Remove the permissions <Permission>SPARK_NOTEBOOK_CONFIGURE</Permission> and
  <Permission>SPARK_VERSION_CONFIGURE</Permission> from the Consumer Admin role.
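The excerpt below illustrates the intent of this edit only; apart from the
<Permission> entries named above, the surrounding element and attribute names
are assumptions, so follow the structure of your existing RBAC_Role.xml:
<!-- Hypothetical role entry; only the <Permission> values come from this fix -->
<Role name="Consumer Admin">
    <Permission>GUI_ACCESS_SPARK_APPLICATION_LOGS</Permission> <!-- added -->
    <!-- Removed from the Consumer Admin role:
    <Permission>SPARK_NOTEBOOK_CONFIGURE</Permission>
    <Permission>SPARK_VERSION_CONFIGURE</Permission>
    -->
</Role>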
4.6. On any management host in the cluster, copy $EGO_TOP/kernel/conf/RBAC_Permission_EGO.xml to $EGO_CONFDIR/RBAC_Permission_EGO.xml.
4.7. On any management host in the cluster, copy $EGO_TOP/gui/conf/useracl/permission_GUIPermissionSoam.acl to $EGO_CONFDIR/../../gui/conf/useracl/permission_GUIPermissionSoam.acl.
4.8. On any management host in the cluster, copy $EGO_TOP/gui/conf/useracl/permission_actionElementsSoam.acl to $EGO_CONFDIR/../../gui/conf/useracl/permission_actionElementsSoam.acl.
4.9. On every management host in the cluster, modify the
$EGO_TOP/integration/elk/scripts/startes.sh script to replace the following
three lines:
· Line 3: Replace @JAVA_HOME@ with the string EGO_TOP/jre/3.3/linux-x86_64,
  where EGO_TOP is the value of the $EGO_TOP environment variable in your
  environment (see the sed sketch after step 4.10). For example, after
  modification, the line could be:
  export JAVA_HOME=/opt/ibm/platform/jre/3.3/linux-x86_64
· Line 85: Replace the line
  curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks -d '{"title":"Completed Tasks","visState":"{\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"min_doc_count\":1,\"extended_bounds\":{},\"json\":\"{\\\"interval\\\":\\\"10s\\\"}\"}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'
  with
  curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks -d '{"title":"Completed Tasks","visState":"{\"type\":\"area\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":true,\"smoothLines\":false,\"scale\":\"linear\",\"interpolate\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{}},\"aggs\":[{\"id\":\"1\",\"type\":\"max\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"date_histogram\",\"schema\":\"segment\",\"params\":{\"field\":\"@timestamp\",\"interval\":\"auto\",\"min_doc_count\":1,\"extended_bounds\":{},\"json\":\"{\\\"interval\\\":\\\"10s\\\"}\"}},{\"id\":\"3\",\"type\":\"terms\",\"schema\":\"group\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"1\"}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'
· Line 117: Replace the line
  curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks-per-Executor -d '{"title":"Completed Tasks per Executor","visState":"{\"type\":\"histogram\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"scale\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{},\"spyPerPage\":10},\"aggs\":[{\"id\":\"1\",\"type\":\"sum\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"custom\",\"orderAgg\":{\"id\":\"2-orderAgg\",\"type\":\"count\",\"schema\":\"orderAgg\",\"params\":{}}}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'
  with
  curl -s -XPUT http://$HOSTNAME:$ELK_ESHTTP_PORT/.kibana/visualization/Spark_Completed-Tasks-per-Executor -d '{"title":"Completed Tasks per Executor","visState":"{\"type\":\"histogram\",\"params\":{\"shareYAxis\":true,\"addTooltip\":true,\"addLegend\":false,\"scale\":\"linear\",\"mode\":\"stacked\",\"times\":[],\"addTimeMarker\":false,\"defaultYExtents\":false,\"setYExtents\":false,\"yAxis\":{},\"spyPerPage\":10},\"aggs\":[{\"id\":\"1\",\"type\":\"max\",\"schema\":\"metric\",\"params\":{\"field\":\"executor.threadpool.completeTasks\"}},{\"id\":\"2\",\"type\":\"terms\",\"schema\":\"segment\",\"params\":{\"field\":\"spark.executor.id\",\"size\":1000,\"order\":\"desc\",\"orderBy\":\"custom\",\"orderAgg\":{\"id\":\"2-orderAgg\",\"type\":\"count\",\"schema\":\"orderAgg\",\"params\":{}}}}],\"listeners\":{}}","description":"","savedSearchId":"spark-executor-log","version":1,"kibanaSavedObjectMeta":{"searchSourceJSON":"{\"filter\":[]}"}}'
4.10. On every management host in the cluster, modify the
$EGO_TOP/integration/elk/scripts/startindexer.sh script to replace @JAVA_HOME@
on line 3 with the string EGO_TOP/jre/3.3/linux-x86_64, where EGO_TOP is the
value of the $EGO_TOP environment variable in your environment (see the sketch
after this step). For example, after modification, the line could be:
export JAVA_HOME=/opt/ibm/platform/jre/3.3/linux-x86_64
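A sketch of the @JAVA_HOME@ replacement in steps 4.9 and 4.10, assuming GNU
sed, that $EGO_TOP is set, and that line 3 of each script still contains the
literal @JAVA_HOME@ placeholder:
for f in startes.sh startindexer.sh; do
    # Substitute the placeholder on line 3 only; keep a .orig backup of each script:
    sed -i.orig "3s|@JAVA_HOME@|$EGO_TOP/jre/3.3/linux-x86_64|" "$EGO_TOP/integration/elk/scripts/$f"
    # Show the modified line to verify:
    sed -n '3p' "$EGO_TOP/integration/elk/scripts/$f"
done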
4.11. Restart the cluster:
egosh ego start -f all
4.12. Log in to the PMC as the cluster administrator.
After installing this interim fix, ensure that all users that connect to the
PMC clear their browser cache.
=========================
5. Copyright
=========================
© IBM Corporation 2016
U.S. Government Users Restricted Rights - Use, duplication or disclosure
restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and
service names might be trademarks of IBM or other companies. A current list of
IBM trademarks is available on the Web at "Copyright and trademark information"
at www.ibm.com/legal/copytrade.shtml