Readme file for IBM® Spectrum LSF Simulator 10.1 Fix Pack 13 (601213)

Abstract

Readme documentation for IBM Spectrum LSF Simulator 10.1 Fix Pack 13 including installation-related instructions, prerequisites, and limitations.

Use the LSF Simulator to analyze and tune an LSF configuration by simulating an LSF cluster in a separate environment. It allows administrators to run experiments with different sets of LSF configurations and workload snapshots without interrupting the LSF production environment.

Description

Readme documentation for IBM Spectrum LSF Simulator V10.1 Fix Pack 13 that includes installation-related instructions, prerequisites and co-requisites, and a list of new features.

Readme file for: IBM® Spectrum LSF Simulator
Product or component release: 10.1 Fix Pack 13
Publication date: 24 June 2022
Last modified date: 24 June 2022

 

What’s new

 

·        Force password change for first-time logon.

·        Upgrade Elasticsearch from version 7.10.2 to 7.16.3 with JDK 17.0.3.1 to avoid Log4j and Java security issues.

·        Upgrade Python from 3.7.9 to 3.7.11 to address a security vulnerability.

·        Add a utilization graph of cluster-level memory and slot usage for experiment results.

·        Handle symbolic links (symlinks) when loading the LSF configuration.

 

Installation

 

1. System Requirements

RHEL x86_64, version 7.2 or later; or CentOS x86_64, version 7.2 or later

Ubuntu x86_64, version 18.04

Docker-ce 19.03.* or later is installed and running
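
The Docker requirement above can be checked from a shell before you continue. The following is a minimal sketch and is not part of the product: version_ge is a hypothetical helper name, and it assumes GNU sort with the -V (version sort) option.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V to order the two version strings.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example usage on the selected host (requires a running Docker daemon):
#   installed=$(docker version --format '{{.Server.Version}}')
#   version_ge "$installed" 19.03 || echo "Docker $installed is older than 19.03"
```

The comparison is kept separate from the docker call so that it can be reused for other version checks.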

2. Installation

1.      Select a Linux host that has enough disk storage. The host must have internet access to download the Docker-in-Docker (dind) image, which is used to start the LSF simulation Docker container. If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment. The directory where you extract the package is the working directory; make sure that it has at least 100 GB of free space. The host must have at least 32 GB of memory.

2.      Log on to the selected Linux host as root. Check that the maximum virtual memory map count (vm.max_map_count) is at least 262144:
# sysctl -n vm.max_map_count

3.      Check that Docker Engine is installed and running, and that the following ports are not in use:
5050, 20022, 2222, 9200

4.      Add the selected user, for example lsfadmin, to the docker user group:
# usermod -aG docker lsfadmin

5.      Log in again to the same host as the selected user (lsfadmin) and extract the LSF Cognitive package, lsf_cognitive_simulator_v1.tar.gz:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz
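
The checks in steps 2 and 3 can be scripted. The following is a minimal sketch under stated assumptions, not a tool that ships with the package: map_count_ok and occupied_ports are hypothetical helper names, and the port check assumes the column layout of "ss -ltn" from iproute2 (state in the first column, local address and port in the fourth).

```shell
#!/bin/sh
# Values taken from the installation steps above.
REQUIRED_MAP_COUNT=262144
REQUIRED_PORTS="5050 20022 2222 9200"

# Hypothetical helper: succeeds when the given vm.max_map_count value is large enough.
map_count_ok() {
    [ "$1" -ge "$REQUIRED_MAP_COUNT" ]
}

# Hypothetical helper: reads "ss -ltn" style output on stdin and prints each
# required port that already has a listener, one port per line.
occupied_ports() {
    awk -v ports="$REQUIRED_PORTS" '
        BEGIN { n = split(ports, want, " ") }
        $1 == "LISTEN" {
            # Field 4 is the local address, for example 0.0.0.0:5050 or [::]:5050;
            # the port is the text after the last colon.
            m = split($4, parts, ":")
            for (i = 1; i <= n; i++)
                if (parts[m] == want[i]) print want[i]
        }' | sort -un
}

# Example usage on the selected host:
#   map_count_ok "$(sysctl -n vm.max_map_count)" || echo "vm.max_map_count is too low"
#   ss -ltn | occupied_ports
```

If the map count is too low, root can raise it for the running system with sysctl -w vm.max_map_count=262144; making the change persistent depends on the distribution.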

 

3. Manage LSF Simulator service

1.      As the selected user (lsfadmin) or root, start the service:
$ cd lsf_cognitive_v1/
$ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"

The -v option is optional. It is required only when you load the LSF configuration directly in the web GUI.

The -v option mounts an additional volume or directory from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER).

You must use the same path on both sides of the pair. The service also provides default volume mounts, which you can use as a bridge to transfer files to the service container.

The service provides the following default volume mounts. $PACKAGE_TOP is the full path of the directory that is created when you extract the original package, for example: /opt/test/lsf_cognitive_v1

Volume path outside the container     Volume path inside the container
$PACKAGE_TOP                          $PACKAGE_TOP
$PACKAGE_TOP/work                     /opt/ibm/prediction/work
$PACKAGE_TOP/logs                     /opt/ibm/prediction/logs
$PACKAGE_TOP/config                   /opt/ibm/prediction/config

Note:
If the LSF production cluster is installed with LSF Suite, the installation might create a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import LSF configurations from this production cluster directly, you must map both directories when you start the service. For example:

$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"

 

2.      As the selected user (lsfadmin) or root, stop the service:
$ ./bcogn stop

3.      Show the current service status:
$ ./bcogn status

SERVICE           STATUS     ID               PORT
lsf_cognitive:v1  STARTED    e370e9723742     5050, 9200, 20022
docker:dind       STARTED    c420d5ad63e2     2375,2376
webssh            STARTED    22651            2222


4.      Upgrade the current service to a new sub-version, or to a new package of the same version:

a.      Go to the parent directory of the current installation, then stop the service:
$ cd xxx
$ ./lsf_cognitive_v1/bcogn stop

b.      Extract the new package to overwrite the old package:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz

c.      Upgrade and start the services with the -u option:
$ ./lsf_cognitive_v1/bcogn start -u
Or upgrade without starting the services:
$ ./lsf_cognitive_v1/bcogn upgrade

5.      Uninstall the package:
The following command stops the services and removes the Docker image lsf_cognitive:v1 from the image repository:
$ ./lsf_cognitive_v1/bcogn stop -f
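
Two parts of the service management steps above lend themselves to small helpers: building the -v argument for step 1, and checking the status output from step 3. The following is a minimal sketch, not part of the bcogn tool. The names volume_arg and all_started are hypothetical, and all_started assumes that the status output keeps the column layout of the sample shown in step 3.

```shell
#!/bin/sh
# Hypothetical helper: builds the -v argument from directories that are mounted
# at the same path inside the container. Pairs are joined with ";" as in the
# LSF Suite example above.
volume_arg() {
    arg=""
    for dir in "$@"; do
        arg="${arg:+$arg;}$dir:$dir"
    done
    printf '%s\n' "$arg"
}

# Hypothetical helper: reads "bcogn status" output on stdin and succeeds only
# when at least one service row is listed and every row reports STARTED.
all_started() {
    awk '
        $1 == "SERVICE" || NF == 0 { next }   # skip the header and blank lines
        { rows++ }
        $2 != "STARTED" { bad = 1 }
        END { exit (rows == 0 || bad) ? 1 : 0 }
    '
}

# Example usage from the package directory:
#   ./bcogn start -v "$(volume_arg /opt/ibm/lsfsuite /sharedir/lsf)"
#   ./bcogn status | all_started && echo "all services are started"
```

Using the same directory on both sides of each pair follows the rule that the outside and inside paths must match.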

 

4. Log on to the LSF Cognitive service

1.      Prepare your browser by importing the certificate as a trusted root certificate: config/https/cacert_lsf.pem

2.      From your desktop browser, log on to the service using the following URL: https://<SERVICE_HOST>:5050/
Username: Admin
Password: Admin
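
Before you open the browser, you can confirm from a shell that the service answers over HTTPS. The following is a minimal sketch that assumes curl is installed and that the command runs from the package directory; service_url and check_service are hypothetical helper names. The check can fail if the host name in the URL does not match the name in the service certificate.

```shell
#!/bin/sh
# Hypothetical helper: builds the logon URL for a given service host.
service_url() {
    printf 'https://%s:5050/\n' "$1"
}

# Hypothetical helper: requests the logon page and validates the server
# certificate against the CA certificate file passed as the second argument.
check_service() {
    curl --silent --show-error --cacert "$2" --output /dev/null "$(service_url "$1")"
}

# Example usage (replace <SERVICE_HOST> with the host that runs the service):
#   check_service <SERVICE_HOST> config/https/cacert_lsf.pem && echo "service is reachable"
```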

 

5. Use the LSF Cognitive service

Basic concepts:

Experiment

An experiment is an LSF simulation run that uses a selected LSF configuration and workload snapshot.

LSF configuration

An LSF configuration is a full set of LSF cluster configuration and workload policies.

Workload Snapshot

A workload snapshot is a set of job submission and completion records that are imported from the LSF cluster events files (lsb.events*).
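
To get a rough idea of what an events file contains before you import it, you can count its records by event type. The following is a minimal sketch, not part of the simulator: count_event_types is a hypothetical helper name, and it assumes that each record starts with the event type in double quotation marks.

```shell
#!/bin/sh
# Hypothetical helper: reads event records on stdin and prints one
# "count event_type" line per event type, sorted by event type.
count_event_types() {
    awk '
        NF > 0 {
            evt = $1
            gsub(/"/, "", evt)   # strip the surrounding quotation marks
            count[evt]++
        }
        END { for (evt in count) print count[evt], evt }
    ' | sort -k2
}

# Example usage, run in the directory that holds the events files:
#   cat lsb.events* | count_event_types
```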

Major workflow

To analyze and tune an LSF configuration, import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service, then run experiments with them. You can also modify the LSF configurations and the workload snapshot, rerun the experiments, and compare the experiment results.

 

6. Data and logs

All data that is related to experiments, LSF configurations, and workload snapshots is saved in the lsf_cognitive_v1/work directory, which persists across service restarts. You can back up the data to another directory at any time.
Service daemon logs are saved in the directory:
lsf_cognitive_v1/logs
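
A backup of the work directory can be taken with tar. The following is a minimal sketch, not a tool that ships with the package: backup_work is a hypothetical helper name, and the archive name and time stamp format are illustrative choices.

```shell
#!/bin/sh
# Hypothetical helper: archives the work directory under the package
# directory ($1) into a time-stamped tarball in the destination directory ($2),
# then prints the path of the archive.
backup_work() {
    package_top=$1
    dest_dir=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest_dir/lsf_cognitive_work-$stamp.tar.gz"
    mkdir -p "$dest_dir" || return 1
    # -C switches to the package directory so that the archive stores relative paths.
    tar -czf "$archive" -C "$package_top" work || return 1
    printf '%s\n' "$archive"
}

# Example usage from the parent directory of the installation:
#   backup_work lsf_cognitive_v1 /path/to/backup
```

Storing paths relative to the package directory lets you restore the archive into any extracted package with tar -xzf <archive> -C <package directory>.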

 

Limitations

·        Dynamic hosts are not recognized. If the LSF cluster to be simulated contains dynamic hosts, the simulator data collecting tool tries to convert those dynamic hosts to static hosts in the collected data, so that they show up as static hosts in the simulated cluster. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data, and adds the dynamic hosts to the hostlist line. The hosts are then treated as static hosts and are recognized by the simulator.

·        If LSF Data Manager is enabled in the LSF production environment, the experiments count 1024 CPUs for the data manager primary host, which leads to improper results. Disable the LSF Data Manager before you load the LSF configuration into the LSF Cognitive system.

·        Dynamic resources that are reported by an elim are converted to fixed resources when you load the LSF configuration.

·        LSF Simulator 10.1 Fix Pack 13 works only with a single LSF cluster and does not support LSF multi-cluster mode.

·        User groups that are defined at the queue level for fairshare are not imported when you import a workload snapshot. User groups that are specified with bsub -G are imported properly.
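
To check the result of the dynamic host conversion described in the first limitation, you can list the hosts that are defined in the Host section of the collected lsf.cluster.<cluster_name> file. The following is a minimal sketch, not part of the data collecting tool: list_static_hosts is a hypothetical helper name, and it assumes the usual layout of the file, with host entries between the "Begin Host" and "End Host" lines and a header line that starts with HOSTNAME.

```shell
#!/bin/sh
# Hypothetical helper: prints the host names that are defined in the Host
# section of the lsf.cluster file given as the first argument.
list_static_hosts() {
    awk '
        tolower($1) == "begin" && tolower($2) == "host" { in_host = 1; next }
        tolower($1) == "end"   && tolower($2) == "host" { in_host = 0 }
        # Inside the section, skip blank lines, comments, and the header line.
        in_host && NF > 0 && $1 !~ /^#/ && toupper($1) != "HOSTNAME" { print $1 }
    ' "$1"
}

# Example usage, with the path to the collected conf directory supplied by you:
#   list_static_hosts <collected data>/conf/lsf.cluster.<cluster_name>
```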

 

Copyright and Trademark Information

©Copyright IBM Corporation 2022 

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.