IBM Platform LSF 9.1.3 Fix 384866 Readme File

Abstract

P101565. Fix to allow LSF to start NVIDIA MPS for GPU jobs.

Description

Readme documentation for IBM Platform LSF 9.1.3 Fix 384866 including installation-related instructions, prerequisites and co-requisites, and list of fixes.

CUDA MPS (Multi-Process Service), formerly known as CUDA Proxy, is a feature that allows multiple CUDA processes to share a single GPU context. MPS works with GPUs in EXCLUSIVE_PROCESS and DEFAULT modes.

This fix allows LSF to start MPS for jobs that require GPUs in EXCLUSIVE_PROCESS or DEFAULT mode.

Enable this feature by defining the following parameter in lsf.conf:

LSB_START_MPS

Syntax
LSB_START_MPS=y|Y

Description
If set to y or Y, LSF starts CUDA MPS for GPU jobs that require only GPUs in EXCLUSIVE_PROCESS or DEFAULT mode. If a job requires GPUs in EXCLUSIVE_THREAD mode, LSF does not start CUDA MPS for that job.

This parameter can be overridden at the job level by specifying the LSB_START_JOB_MPS environment variable:

LSB_START_JOB_MPS=y|Y|n|N
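
The interaction between the lsf.conf parameter, the job-level variable, and the GPU mode can be sketched as a small shell function. This is only a model of the rule described above, not LSF code. The function name should_start_mps is hypothetical, and the sketch assumes that the EXCLUSIVE_THREAD restriction still applies when the job-level variable is set to y.

```shell
# Sketch of the start-up rule described above; this is not LSF source code.
# Arguments:
#   $1  cluster-level LSB_START_MPS value from lsf.conf (may be empty)
#   $2  job-level LSB_START_JOB_MPS value (empty means "not set")
#   $3  mode of the GPUs that the job requires
# Prints "yes" if MPS would be started for the job, otherwise "no".
should_start_mps() {
    cluster_setting=$1
    job_setting=$2
    gpu_mode=$3

    # MPS is only started for EXCLUSIVE_PROCESS or DEFAULT GPUs.
    # Assumption: this check applies even when the job-level value is y.
    case $gpu_mode in
        EXCLUSIVE_PROCESS|DEFAULT) ;;
        *) echo no; return 0 ;;
    esac

    # The job-level variable, when set, overrides the lsf.conf parameter.
    effective=${job_setting:-$cluster_setting}

    case $effective in
        y|Y) echo yes ;;
        *)   echo no ;;
    esac
}
```

For example, should_start_mps Y n DEFAULT prints "no", because the job-level value n overrides the cluster-level Y.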

If LSF starts MPS for a job, LSF sets CUDA_MPS_PIPE_DIRECTORY instead of CUDA_VISIBLE_DEVICES. The GPU jobs communicate with MPS through a named pipe that is defined by CUDA_MPS_PIPE_DIRECTORY. The CUDA_MPS_PIPE_DIRECTORY is stored under the directory that is specified by LSF_TMPDIR. When the job finishes, LSF removes the pipe.
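
A job script can inspect these two variables to see which path it was given. The following is a minimal sketch, assuming the job script runs under a POSIX shell. The function name gpu_access_mode and its output labels are hypothetical and are not part of LSF.

```shell
# Reports how the current job environment would reach the GPU.
# Prints "mps" when CUDA_MPS_PIPE_DIRECTORY names an existing directory,
# "direct" when only CUDA_VISIBLE_DEVICES is usable, and "none" otherwise.
gpu_access_mode() {
    if [ -n "${CUDA_MPS_PIPE_DIRECTORY:-}" ] && [ -d "$CUDA_MPS_PIPE_DIRECTORY" ]; then
        echo mps
    elif [ -n "${CUDA_VISIBLE_DEVICES:-}" ]; then
        echo direct
    else
        echo none
    fi
}
```

Calling gpu_access_mode at the top of a job script and logging the result can help confirm whether MPS was started for that job.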

If the cgroup feature is enabled, LSF also creates a cgroup for MPS under the job level cgroup.

The MPS Server supports up to 16 client CUDA contexts concurrently. This limitation is per user per job and means that MPS can only support up to 16 CUDA processes at one time, even if LSF allocated multiple GPUs. MPS cannot exit normally if GPU jobs are killed. The LSF cgroup feature can help resolve this situation.
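
Because the limit applies regardless of how many GPUs are allocated, a job script might check its planned process count before launching. The following is a minimal sketch. The names MPS_MAX_CLIENTS and fits_mps_client_limit are hypothetical. The value 16 is taken from the limit stated above.

```shell
# Client limit stated in this readme: 16 CUDA contexts per user per job.
MPS_MAX_CLIENTS=16

# Succeeds (exit status 0) if the planned number of concurrent CUDA
# processes fits within the MPS client limit.
#   $1  planned number of concurrent CUDA processes
fits_mps_client_limit() {
    planned=$1
    [ "$planned" -le "$MPS_MAX_CLIENTS" ]
}
```

For example, fits_mps_client_limit 24 returns a non-zero status, so a script could print a warning or reduce its process count before starting.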

The MPS function is supported by CUDA Version 5.5 or later.
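
The version requirement can be checked with a simple comparison. The following is a minimal sketch that compares a MAJOR.MINOR version string against 5.5. The function name cuda_supports_mps is hypothetical, and how the version string is obtained on a given host is left to the administrator.

```shell
# Succeeds if the given CUDA version string (MAJOR.MINOR) is 5.5 or later.
#   $1  CUDA version string, for example "5.5" or "7.0"
cuda_supports_mps() {
    version=$1
    major=${version%%.*}
    rest=${version#*.}
    # A bare "6" has no minor part; treat it as minor 0.
    if [ "$rest" = "$version" ]; then
        minor=0
    else
        minor=${rest%%.*}
    fi
    [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 5 ]; }
}
```

The major and minor parts are compared as integers, so a version such as 10.1 is treated as later than 5.5.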

Readme file for: IBM® Platform LSF

Product/Component Release: 9.1.3

Update Name: Fix 384866

Fix ID: LSF-9.1.3-build384866

Publication date: 18 February 2016

Last modified date: 18 February 2016

Contents:

1.     List of fixes

2.     Download location

3.     Products or components affected

4.     System requirements

5.     Installation and configuration

6.     List of files

7.     Product notifications

8.     Copyright and trademark information

 

1.   List of fixes

P101565

2.   Download location

Download Fix 384866 from the following location: http://www.ibm.com/eserver/support/fixes/

3.   Products or components affected

Affected components include: LSF/sbatchd, LSF/res, LSF/bjobs, LSF/bhist

 

4.   System requirements

Linux2.6-glibc2.3-x86_64
Linux3.10-glibc2.17-ppc64le

 

5.   Installation and configuration

 

5.1          Before installation

 

 (LSF_TOP is the full path to the top-level installation directory of LSF.)

1)    Log on to the LSF master host as root

2)    Set your environment:

-      For csh or tcsh: % source LSF_TOP/conf/cshrc.lsf

-      For sh, ksh, or bash: $ . LSF_TOP/conf/profile.lsf

 

5.2          Installation steps

 

1)    Go to the patch install directory: cd $LSF_ENVDIR/../9.1/install/

2)    Copy the patch file to the install directory $LSF_ENVDIR/../9.1/install/

3)    Run patchinstall: ./patchinstall <patch>
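
The installation steps above can be previewed as a dry run before anything is changed. The following is a minimal sketch that only prints the commands. The function name print_patch_install_plan is hypothetical, and the patch file path passed to it is a placeholder for the file that was actually downloaded. The sketch assumes that the patch path is absolute.

```shell
# Prints the commands for the installation steps without running them.
#   $1  value of LSF_ENVDIR
#   $2  absolute path to the downloaded patch file (placeholder)
print_patch_install_plan() {
    install_dir="$1/../9.1/install"
    patch_file=$2
    echo "cd $install_dir"
    echo "cp $patch_file ."
    # patchinstall is run against the copied file, so use the base name.
    echo "./patchinstall ${patch_file##*/}"
}
```

Running print_patch_install_plan "$LSF_ENVDIR" with the path of the patch file lists the three commands in order so that they can be reviewed first.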

 

5.3          After installation

 

1)    Log on to the LSF master host as root

2)    Set LSB_START_MPS=Y in lsf.conf

3)    Run badmin hrestart -f all

4)    Run lsadmin resrestart -f all
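
Before restarting the daemons, it can be useful to confirm that the parameter is actually active in lsf.conf. The following is a minimal sketch, assuming that the parameter appears on its own line without quotation marks. The function name mps_enabled_in_conf is hypothetical.

```shell
# Succeeds if the given lsf.conf-style file has an active
# LSB_START_MPS=y or LSB_START_MPS=Y line. Commented-out lines are ignored.
#   $1  path to the configuration file
mps_enabled_in_conf() {
    grep -Eq '^[[:space:]]*LSB_START_MPS=[yY][[:space:]]*$' "$1"
}
```

For example, mps_enabled_in_conf "$LSF_ENVDIR/lsf.conf" returns status 0 only when the parameter is set to y or Y.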

 

5.4          Uninstallation

 

To roll back a patch:

1)    Log on to the LSF master host as root

2)    From the patch install directory ($LSF_ENVDIR/../9.1/install/), run ./patchinstall -r <patch>

3)    Remove LSB_START_MPS from lsf.conf

4)    Run badmin hrestart -f all

5)    Run lsadmin resrestart -f all
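
Removing the parameter in step 3 can be done by hand or with a filter. The following is a minimal sketch that writes a cleaned copy to standard output and leaves the original file untouched, so that the result can be reviewed before it replaces lsf.conf. The function name strip_start_mps is hypothetical.

```shell
# Writes the contents of an lsf.conf-style file to standard output with
# every LSB_START_MPS assignment removed. The input file is not modified.
#   $1  path to the configuration file
strip_start_mps() {
    sed '/^[[:space:]]*LSB_START_MPS=/d' "$1"
}
```

The pattern matches only the LSB_START_MPS parameter, so other lines in the file are left in place.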

6.   List of files

 

sbatchd res bjobs bhist

 

7.   Product notifications

To receive information about product solution and patch updates automatically, subscribe to product notifications on the My notifications page (www.ibm.com/support/mynotifications) on the IBM Support website (support.ibm.com). You can edit your subscription settings to choose the types of information you want to be notified about, for example, security bulletins, fixes, troubleshooting, and product enhancements or documentation changes.


8.   Copyright and trademark information

© Copyright IBM Corporation 2016

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.