IBM Spectrum LSF 10.1 Solution LSF Docker Job Support Readme File

Abstract

RFE#86007. This solution allows LSF to run jobs in Docker containers on demand. LSF manages the entire life cycle of jobs that run in containers, just as it does for ordinary jobs.

Description

Readme documentation for IBM Spectrum LSF 10.1 Solution build 423991, including installation-related instructions, prerequisites and co-requisites, and a list of fixes.

This solution enables LSF to run and manage jobs in Docker containers.

This solution introduces a new parameter CONTAINER in the lsb.applications file for configuring the Docker job application profile.

Syntax
CONTAINER=docker[image(image-name) options(docker-run-options) starter(user-name)]

Description
image: Required. Specifies the name of the Docker image that is used to run jobs.
options: Optional. Specifies the Docker run options that LSF passes to the "docker run" command when it starts the job container.
starter: Optional. Specifies the name of the user that runs the "docker run" command to launch job containers. The default user is the LSF primary administrator.
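The following shell function is a minimal sketch that prints an example lsb.applications section using the CONTAINER syntax above, adding the optional keywords only when they are given. The profile name docker matches the "bsub -app docker" example later in this file; the DESCRIPTION text and any image name or starter user you pass in (for example, ubuntu:16.04 or lsfadmin) are hypothetical placeholders. The function only writes to standard output; review the result, add it to lsb.applications yourself, and then reconfigure the cluster (for example, with "badmin reconfig").

```shell
#!/bin/sh
# print_docker_app_profile IMAGE [OPTIONS] [STARTER]
# Prints an example lsb.applications section for a Docker application profile.
# The profile name "docker" and the DESCRIPTION text are placeholders.
print_docker_app_profile() {
    image="$1"           # required keyword: Docker image used to run jobs
    options="${2:-}"     # optional keyword: docker run options
    starter="${3:-}"     # optional keyword: user that runs "docker run"

    # Build the CONTAINER value, adding optional keywords only when set.
    value="docker[image(${image})"
    if [ -n "$options" ]; then value="${value} options(${options})"; fi
    if [ -n "$starter" ]; then value="${value} starter(${starter})"; fi
    value="${value}]"

    cat <<EOF
Begin Application
NAME        = docker
DESCRIPTION = Run jobs in Docker containers
CONTAINER   = ${value}
End Application
EOF
}
```

For example, `print_docker_app_profile ubuntu:16.04 "--rm --network=host --ipc=host" lsfadmin` prints a section whose CONTAINER line includes all three keywords. Pass the options as one quoted argument so that they stay together inside options().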

Note
1. Before you specify Docker run options, make sure that these options work with the "docker run" command on the command line.
    The "--cgroup-parent", "--user (-u)", and "--name" options are reserved for LSF internal use. Do not use these options in the options keyword configuration.
    The "-w" and "--ulimit" options are set by LSF automatically. Do not use these options in the options keyword configuration, because any values specified there override the LSF settings.
    LSF automatically uses the "-v" option to mount the directories that it needs: the current working directory, the job spool directory, the destination file of the "bsub -f" command, the tmp directory, the LSF top-level directory, and, on demand, the checkpoint directory.
2. It is recommended that you specify "--rm" in the options keyword configuration so that containers are removed automatically after jobs finish.
3. The starter account must be root or a user in the "docker" user group. To add a user to the "docker" user group, run the following command:
    sudo usermod -aG docker starter_username
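The notes above can be turned into a quick pre-check before you edit lsb.applications. The following is a minimal sketch, not an LSF tool: check_container_options and check_docker_starter are hypothetical helper names. The first function rejects the options that the notes reserve for LSF or that LSF sets itself (it also treats --workdir, the long form of -w, as set by LSF) and warns about "--sig-proxy=false", which is listed as a known issue. It matches whole words only and does not fully parse "docker run" syntax, so combined short flags can slip through. The second function checks that a starter account is root or a member of the "docker" group on the current host.

```shell
#!/bin/sh
# check_container_options OPT...
# Fails if any word is a docker run option that LSF reserves (--cgroup-parent,
# --user/-u, --name) or sets itself (-w/--workdir, --ulimit).
# Warns about --sig-proxy=false. Word matching only; not a full parser.
check_container_options() {
    status=0
    for opt in "$@"; do
        case "$opt" in
            --cgroup-parent|--cgroup-parent=*|--user|--user=*|-u|-u?*|--name|--name=*)
                echo "error: $opt is reserved for LSF internal use" >&2
                status=1 ;;
            -w|-w?*|--workdir|--workdir=*|--ulimit|--ulimit=*)
                echo "error: $opt is set by LSF automatically" >&2
                status=1 ;;
            --sig-proxy=false)
                echo "warning: --sig-proxy=false can hang the system on bkill" >&2 ;;
        esac
    done
    return "$status"
}

# check_docker_starter USER
# Succeeds if USER is root or belongs to the "docker" group on this host.
check_docker_starter() {
    [ "$1" = root ] && return 0
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx docker
}
```

For example, `check_container_options --rm --network=host --ipc=host` succeeds, while `check_container_options --rm --name myjob` prints an error and returns a non-zero status. Run check_docker_starter on each execution host, because group membership is local to each host.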

Examples
    CONTAINER=docker[image(image-name) options(--rm)]
For blaunch to work, network and IPC communication must work across containers, and a file that maps execution user IDs to user names must be mounted into the container for blaunch authentication:
    CONTAINER=docker[image(image-name) options(--rm --network=host --ipc=host -v /path/to/my/passwd:/etc/passwd)]

The passwd file is in the following format:
     user1:x:10001:10001:::
     user2:x:10002:10002:::
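One way to produce a file in this format is sketched below. The helper name make_passwd_map is hypothetical. It assumes that the execution users have the same UIDs on the host where you run it as on the execution hosts, and it uses each user's primary group ID from "id -g" for the GID field, which is an assumption because the example above only shows matching UID and GID values.

```shell
#!/bin/sh
# make_passwd_map USER...
# Prints one name:x:uid:gid::: line per user, in the format shown above.
# UID and primary GID are looked up with `id` on the current host.
make_passwd_map() {
    for user in "$@"; do
        uid=$(id -u "$user") || return 1   # fails if the user is unknown
        gid=$(id -g "$user") || return 1
        printf '%s:x:%s:%s:::\n' "$user" "$uid" "$gid"
    done
}
```

For example, `make_passwd_map user1 user2 > /path/to/my/passwd` writes a file that can be mounted with `-v /path/to/my/passwd:/etc/passwd` as in the example above. The function stops at the first unknown user, so a partial file indicates a lookup failure.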

Jobs submitted to the docker application profile run in a container. For example:
    bsub -app docker ./myjob.sh

Known issues
1. For parallel Docker jobs, LSF reports the incorrect status DONE instead of EXIT under the following conditions:
    - Ctrl+C is used to cancel an interactive parallel Docker job.
    - One task crashes when RTASK_GONE_ACTION=KILLJOB_TASKEXIT is configured in the lsb.applications file.

2. Do not use "--sig-proxy=false" in the options keyword of the CONTAINER parameter. This option can trigger a kernel bug that hangs the whole system when you run bkill on a Docker job that was started with it.

Readme file for: IBM® Spectrum LSF

Product/Component Release: 10.1

Update Name: Solution LSF Docker job support

Fix ID: LSF-10.1-build 423991

Publication date: 28 September 2016

Last modified date: 28 September 2016

Contents:

1.     List of fixes

2.     Download location

3.     Products or components affected

4.     System requirements

5.     Installation and configuration

6.     List of files

7.     Product notifications

8.     Copyright and trademark information

 

1.   List of fixes

RFE#86007. LSF Docker job support

2.   Download location

Download Fix build 423991 from the following location: http://www.ibm.com/eserver/support/fixes/

3.   Products or components affected

Affected components include: mbschd, mbatchd, sbatchd, res, bapp, blaunch, lsadmin, badmin

 

4.   System requirements

Linux2.6-glibc2.3-x86_64

Linux3.10-glibc2.17-x86_64

 

5.   Installation and configuration

 

5.1          Before installation

 

 (LSF_TOP is the full path to the top-level LSF installation directory.)

1)    Log on to the LSF master host as root

2)    Set your environment:

-      For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-      For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

 

5.2          Installation steps

 

1)    Go to the patch install directory: cd $LSF_ENVDIR/../10.1/install/

2)    Copy the patch file to the install directory $LSF_ENVDIR/../10.1/install/

3)    Run patchinstall: ./patchinstall <patch>
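The installation steps above can be sketched as a small shell function with a dry-run default, so you can review the exact commands before running them as root. The function name install_patch and the "run" argument are hypothetical conventions for this sketch, not part of patchinstall. It assumes that LSF_ENVDIR was set by sourcing cshrc.lsf or profile.lsf as described in 5.1, and it copies the patch first and then runs patchinstall inside the install directory, which has the same effect as steps 1) to 3).

```shell
#!/bin/sh
# install_patch PATCH_FILE [run]
# Sketch of the steps in 5.2. Prints the commands by default (dry run);
# pass "run" as the second argument to execute them (as root).
install_patch() {
    patch="$1"
    mode="${2:-dry-run}"
    if [ -z "${LSF_ENVDIR:-}" ]; then
        echo "error: LSF_ENVDIR is not set; set your LSF environment first" >&2
        return 1
    fi
    install_dir="$LSF_ENVDIR/../10.1/install"
    name=$(basename "$patch")

    if [ ! -d "$install_dir" ]; then
        echo "error: $install_dir not found" >&2
        return 1
    fi
    if [ ! -f "$patch" ]; then
        echo "error: patch file $patch not found" >&2
        return 1
    fi

    if [ "$mode" = run ]; then
        # Copy the patch into the install directory, then run patchinstall there.
        cp "$patch" "$install_dir/" || return 1
        (cd "$install_dir" && ./patchinstall "$name")
    else
        echo "cp $patch $install_dir/"
        echo "cd $install_dir && ./patchinstall $name"
    fi
}
```

For example, `install_patch /tmp/<patch>` only prints the two commands; `install_patch /tmp/<patch> run` executes them. After a successful run, continue with the steps in 5.3.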

 

5.3          After installation

 

1)    Log on to the LSF master host as root

2)    Run lsfrestart

 

5.4          Uninstallation

 

To roll back a patch:

1)    Log on to the LSF master host as root

2)    From the $LSF_ENVDIR/../10.1/install/ directory, run ./patchinstall -r <patch>

3)    Run lsfrestart

6.   List of files

 

mbschd
mbatchd
sbatchd
res
bapp
blaunch
badmin
lsadmin

 

7.   Product notifications

To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to get notification about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.


8.   Copyright and trademark information

© Copyright IBM Corporation 2016

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.