===============================================================================
Readme file for IBM® Spectrum Conductor with Spark
2.1.0 Fix Pack 1 (Interim Fix 421129)
Product/Component Release: 2.1.0 Fix Pack 1
Update name: Interim Fix 421129
Fix ID: Jupyter4.1.0-CwS2.1.0.1-Build421129
Publication date: September 15, 2016
Abstract: Interim fix upgrading IPython notebook
3.2.1 to Jupyter notebook 4.1.0.
===============================================================================
=========================
CONTENTS
=========================
1. About this interim fix
2. Supported operating systems
3. Prerequisites
4. Installation and Configuration
5. Copyright
=========================
1. About this interim fix
=========================
This interim fix upgrades the IPython notebook 3.2.1 to Jupyter notebook 4.1.0. The updated notebook package
supports Spark 1.5.2 and Spark 1.6.1 in IBM Spectrum Conductor with Spark
v2.1.0 Fix Pack 1.
=========================
2. Supported operating systems
=========================
Red Hat Enterprise Linux 64-bit 7.x
=========================
3. Prerequisites
=========================
3.1. IBM Spectrum Conductor with Spark v2.1.0 Fix Pack 1 must be installed. For more
information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/install/install.html.
3.2. Download Anaconda2-4.1.1-Linux-x86_64.sh from https://repo.continuum.io/archive/.
3.3. Install cURL 7.28.0 or higher on all hosts that will run the Jupyter notebook.
You can download cURL from https://curl.haxx.se/download.html.
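For example, you can confirm the installed cURL version on each host before you proceed
(a quick check, assuming cURL is already on the PATH; the exact output varies by build):
# curl --version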
=========================
4. Installation and Configuration
=========================
4.1. Download the Jupyter-4.1.0-scripts.tar.gz package to a local directory on your computer.
4.2. Untar the Jupyter-4.1.0-scripts.tar.gz package:
# tar xzvf Jupyter-4.1.0-scripts.tar.gz
When extracted, you should see the following files:
build.version
deployment.xml
scripts/
scripts/00-pyspark-setup.py
scripts/undeploy.sh
scripts/jupyter_notebook_config.py
scripts/stop_jupyter.sh
scripts/start_jupyter.sh
scripts/prestart_jupyter.sh
scripts/custom.js
scripts/docker_jupyter.sh
scripts/jobMonitor.sh
scripts/common.inc
scripts/loginegoauth.py
scripts/deploy.sh
scripts/readycheck.sh
NOTE: To replace the open source Anaconda script file (Anaconda2-4.1.1-Linux-x86_64.sh)
with your own script, edit the scripts/common.inc file and set the ANACONDA_SCRIPT_NAME
parameter to the name of your own Anaconda file.
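For example, the entry in scripts/common.inc might look as follows (the installer file
name shown here is hypothetical; substitute the name of your own script):
ANACONDA_SCRIPT_NAME="MyCustomAnaconda2.sh"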
4.3. Create the notebook package using the files included in Jupyter-4.1.0-scripts.tar.gz.
Copy the scripts folder and the deployment.xml file from Jupyter-4.1.0-scripts.tar.gz, and
copy the Anaconda2-4.1.1-Linux-x86_64.sh file that you downloaded into a folder named
package. Ensure that all the scripts and the Anaconda2-4.1.1-Linux-x86_64.sh file have
execution (x) permission for all your users and user groups (see the example after this step).
For example:
# tar czvf jupyter.tar.gz deployment.xml scripts package
The package structure should be:
packagename
  scripts
    <files_from_scripts_folder>
  package
    Anaconda2-4.1.1-Linux-x86_64.sh
  deployment.xml
For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_notebooks/notebook_create.html.
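Before creating the tarball, one way to grant execution permission to all users (assuming
you run this from the directory that contains the scripts and package folders):
# chmod a+x scripts/* package/Anaconda2-4.1.1-Linux-x86_64.sh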
4.4. Add the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_notebooks/notebook_add.html.
When adding the Jupyter notebook without Docker support, specify the following required
settings in the Deployment Settings tab:
· Name: Jupyter
· Version: 4.1.0
· Prestart command: ./scripts/prestart_jupyter.sh
· Start command: ./scripts/start_jupyter.sh
· Stop command: ./scripts/stop_jupyter.sh
· Job monitor command: ./scripts/jobMonitor.sh
IMPORTANT: If you specify a base port (from which the system tries to find available
ports for the Jupyter notebook), note that this base port setting is ignored. Notebook
port numbers are generated dynamically; as a result, the actual port used by the notebook
may differ from the specified port.
4.5. Create a Spark instance group and select the Jupyter notebook that you added. Edit
the notebook configuration to specify the execution user for this notebook in the
Deployment Settings tab. Make other changes as required. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/developing_instances/instance_create.html.
IMPORTANT: When creating the Spark instance group, if you want to define a resource group
for Spark executors that is different from the resource group defined for this notebook,
ensure that all hosts in the Spark executors resource group have Python 2.7.11 and all
application-related dependencies installed. Then, edit the Spark instance group
configuration to set the installed Python path as the value of the PYSPARK_PYTHON
parameter within the Environment Variables settings.
For example, for Linux Server (RHEL) 7.0 x86-64, download Anaconda2-4.1.1-Linux-x86_64.sh
from https://repo.continuum.io/archive/, and then run the script as follows:
# bash Anaconda2-4.1.1-Linux-x86_64.sh -p $INSTALLEDIR
At the end of the installation, choose yes to prepend the Anaconda install location to
PATH in your /root/.bashrc. Then, set PYSPARK_PYTHON to $INSTALLEDIR/bin/python, where
$INSTALLEDIR is the Anaconda installation directory.
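As a quick sanity check on each executor host, you can confirm that the deployed
interpreter reports the required Python version (this assumes the installation path used
above):
# $INSTALLEDIR/bin/python --version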
4.6. Deploy and start the Spark instance group. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/instances_deploy.html.
4.7. Assign users for the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/mapping_notebooks.html.
NOTE: If high availability is enabled for the master host in the cluster after the
Jupyter notebook is assigned to a user and the notebook service has started, you must
restart the notebook service.
4.8. Launch the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_applications/notebooks_launching.html.
Once the notebook launches in a new window, enter the notebook password. This is the
password of the user assigned to this notebook.
After successful authentication, you can use Jupyter with Spark through PySpark. For more
information about how to use Jupyter, see http://jupyter.org/.
NOTE: The Jupyter notebook does not automatically log out a user after a period of
inactivity. As a result, the logged-in user does not have to re-enter the password if the
web browser window is still open. To avoid potential security issues, ensure that you
always log out from the Jupyter notebook (click Logout within the notebook) or clear the
web browser cache.
To upgrade packages bundled with Anaconda in the future, follow these steps:
IMPORTANT: When upgrading packages, it is your responsibility to ensure that all the
packages and dependencies are updated properly. Only minor version upgrades are
compatible with this Jupyter notebook sample.
1. Log in to the cluster management console as the administrator.
2. Stop all the Jupyter notebooks. See https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/notebooks_stop.html.
3. Upgrade the packages as required. All Anaconda packages are installed to the
$NOTEBOOK_DEPLOY_DIR/install directory on all related hosts (see the example after these
steps).
4. Start all the Jupyter notebooks. See https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/notebooks_start.html.
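For example, a minor-version upgrade of a single package might use the conda binary
bundled with the deployed Anaconda (a sketch, assuming the standard Anaconda layout under
$NOTEBOOK_DEPLOY_DIR/install; the package name is only an illustration):
# $NOTEBOOK_DEPLOY_DIR/install/bin/conda update pandas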
=========================
5. Copyright
=========================
© Copyright IBM Corporation 2016
U.S. Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide.
Other product and service names might be trademarks of IBM or other companies.
A current list of IBM trademarks is available on the Web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml