===============================================================================
Readme file for IBM® Spectrum Conductor with Spark 2.1.0 Fix Pack 1 (Interim Fix 421129)


Product/Component Release: 2.1.0 Fix Pack 1
Update name: Interim Fix 421129
Fix ID: Jupyter4.1.0-CwS2.1.0.1-Build421129
Publication date: September 15, 2016

Abstract: This interim fix upgrades IPython notebook 3.2.1 to Jupyter Notebook 4.1.0.
===============================================================================

=========================
CONTENTS
=========================
1. About this interim fix
2. Supported operating systems
3. Prerequisites
4. Installation and configuration
5. Copyright

=========================
1. About this interim fix
=========================
This interim fix upgrades IPython notebook 3.2.1 to Jupyter Notebook 4.1.0. The updated notebook package supports Spark 1.5.2 and Spark 1.6.1 in IBM Spectrum Conductor with Spark v2.1.0 Fix Pack 1.

=========================
2. Supported operating systems
=========================
Red Hat Enterprise Linux 64-bit 7.x
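
To confirm that a host runs a supported operating system, you can check the release file (a quick check; the output should report a 7.x release):

# cat /etc/redhat-release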


=========================
3. Prerequisites
=========================
3.1 IBM Spectrum Conductor with Spark v2.1.0 Fix Pack 1 must be installed. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/install/install.html.

3.2 Download Anaconda2-4.1.1-Linux-x86_64.sh from https://repo.continuum.io/archive/.

3.3 Install cURL 7.28.0 or higher on all hosts that will run the Jupyter notebook. You can download cURL from https://curl.haxx.se/download.html.
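
For example, you can download the Anaconda installer (step 3.2) and verify the installed cURL version (step 3.3) from a shell:

# curl -O https://repo.continuum.io/archive/Anaconda2-4.1.1-Linux-x86_64.sh
# curl --version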


=========================
4. Installation and configuration
=========================
4.1. Download the Jupyter-4.1.0-scripts.tar.gz package to a local directory on your computer.

4.2. Untar the Jupyter-4.1.0-scripts.tar.gz package:

       # tar xzvf Jupyter-4.1.0-scripts.tar.gz

When extracted, you should see the following files:

build.version
deployment.xml
scripts/
scripts/00-pyspark-setup.py
scripts/undeploy.sh
scripts/jupyter_notebook_config.py
scripts/stop_jupyter.sh
scripts/start_jupyter.sh
scripts/prestart_jupyter.sh
scripts/custom.js
scripts/docker_jupyter.sh
scripts/jobMonitor.sh
scripts/common.inc
scripts/loginegoauth.py
scripts/deploy.sh
scripts/readycheck.sh

NOTE: To replace the open source Anaconda script file (Anaconda2-4.1.1-Linux-x86_64.sh) with your own script, edit the scripts/common.inc file and set the ANACONDA_SCRIPT_NAME parameter to the name of your own Anaconda file.
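
For example, if your installer is named MyAnaconda-4.1.1-Linux-x86_64.sh (a hypothetical file name), the entry in scripts/common.inc might look like the following, assuming the parameter is defined as a shell variable assignment:

ANACONDA_SCRIPT_NAME=MyAnaconda-4.1.1-Linux-x86_64.sh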

4.3. Create the notebook package using the files included in Jupyter-4.1.0-scripts.tar.gz: copy the scripts folder and the deployment.xml file from the archive, and copy the downloaded Anaconda2-4.1.1-Linux-x86_64.sh file into a package folder. Ensure that all the scripts and the Anaconda2-4.1.1-Linux-x86_64.sh file have execute (x) permission for all of your users and user groups.

For example:

# tar czvf jupyter.tar.gz deployment.xml scripts package

The structure should be:

packagename
    scripts
       <files_from_scripts_folder>
    package
       Anaconda2-4.1.1-Linux-x86_64.sh
    deployment.xml
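
The following commands sketch one way to assemble this structure and create the final package (folder names are illustrative):

# mkdir -p packagename/package
# cp -r scripts deployment.xml packagename/
# cp Anaconda2-4.1.1-Linux-x86_64.sh packagename/package/
# chmod a+x packagename/scripts/* packagename/package/Anaconda2-4.1.1-Linux-x86_64.sh
# cd packagename
# tar czvf jupyter.tar.gz deployment.xml scripts package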

For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_notebooks/notebook_create.html.

4.4. Add the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_notebooks/notebook_add.html.

When adding the Jupyter notebook without Docker support, specify the following required settings in the Deployment Settings tab:

- Name: Jupyter
- Version: 4.1.0
- Prestart command: ./scripts/prestart_jupyter.sh
- Start command: ./scripts/start_jupyter.sh
- Stop command: ./scripts/stop_jupyter.sh
- Job monitor command: ./scripts/jobMonitor.sh

IMPORTANT: If you specify a base port (from which the system tries to find available ports for the Jupyter notebook), note that this setting is ignored. Notebook port numbers are generated dynamically, so the actual port used by the notebook may differ from the specified port.

4.5. Create a Spark instance group and select the Jupyter notebook that you added. Edit the notebook configuration to specify the execution user for this notebook in the Deployment Settings tab. Make other changes as required. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/developing_instances/instance_create.html.

IMPORTANT: When creating the Spark instance group, if you want to define a resource group for Spark executors that is different from the resource group defined for this notebook, ensure that all hosts in the Spark executor resource group have Python 2.7.11 and all application-related dependencies installed. Then, edit the Spark instance group configuration to set the installed Python path as the value of the PYSPARK_PYTHON parameter in the Environment Variables settings.

For example, for Red Hat Enterprise Linux 7.x x86-64, download Anaconda2-4.1.1-Linux-x86_64.sh from https://repo.continuum.io/archive/, and then run the script as follows:

bash Anaconda2-4.1.1-Linux-x86_64.sh -p $INSTALLEDIR

At the end of the installation, choose yes to prepend the Anaconda install location to PATH in your /root/.bashrc. Then, set PYSPARK_PYTHON to $INSTALLEDIR/bin/python, where $INSTALLEDIR is the Anaconda installation directory.
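
For an unattended installation on each executor host, the Anaconda installer also supports batch mode (-b accepts the license and skips all prompts). Afterward, you can confirm the interpreter that PYSPARK_PYTHON should point to:

# bash Anaconda2-4.1.1-Linux-x86_64.sh -b -p $INSTALLEDIR
# $INSTALLEDIR/bin/python --version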


4.6. Deploy and start the Spark instance group. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/instances_deploy.html.

4.7. Assign users for the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/mapping_notebooks.html.

NOTE: If high availability is enabled for the master host in the cluster after the Jupyter notebook is assigned to a user and the notebook service is started, you must restart the notebook service.

4.8. Launch the Jupyter notebook. For more information, see https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_applications/notebooks_launching.html.

Once the notebook launches in a new window, enter the notebook password. This is the password of the user assigned to this notebook.

After successful authentication, you can use Jupyter with PySpark. For more information about how to use Jupyter, see http://jupyter.org/.

NOTE: The Jupyter notebook does not automatically log out a user following a period of inactivity. As a result, the logged-in user does not have to re-enter the password if the web browser window is still open. To avoid potential security issues, ensure that you always log out from the Jupyter notebook (click Logout within the notebook) or clear the web browser cache.

To upgrade packages bundled with Anaconda in the future, follow these steps:

IMPORTANT: When upgrading packages, you are responsible for ensuring that all packages and dependencies are updated properly. Only minor version upgrades are compatible with this Jupyter notebook sample.

1. Log in to the cluster management console as an administrator.
2. Stop all the Jupyter notebooks. See https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/notebooks_stop.html.
3. Upgrade the packages as required. All Anaconda packages are installed to the $NOTEBOOK_DEPLOY_DIR/install directory on all related hosts (see the sketch after this list).
4. Start all the Jupyter notebooks. See https://www.ibm.com/support/knowledgecenter/SSZU2E_2.1.0/managing_instances/notebooks_start.html.
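
For example, to upgrade a package such as numpy by a minor version, you might use the conda binary inside the deployed Anaconda directory (a sketch; the exact conda path depends on your deployment layout):

# $NOTEBOOK_DEPLOY_DIR/install/bin/conda update numpy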

=========================
5. Copyright
=========================
© Copyright IBM Corporation 2016
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. 
Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml