IBM Spectrum LSF Simulator 10.1.0.13 Readme File 

Abstract

Readme documentation for IBM Spectrum LSF Simulator 10.1.0.13 including installation-related instructions, prerequisites and limitations.

Use the LSF Simulator to analyze and tune LSF configuration by simulating an LSF cluster in a separate environment. It allows administrators to run experiments with different sets of LSF configurations and workload snapshots without interrupting the LSF production environment.

 

Description

Readme documentation for IBM Spectrum LSF Simulator V10.1.0.13 that includes installation-related instructions, prerequisites and co-requisites, and a list of new features.

Readme file for: IBM® Spectrum LSF Simulator
Product/Component Release: 10.1.0.13
Publication date: 28 July, 2022
Last modified date: 28 July, 2022

 

What’s new

 

·       Enforce a password change at first logon

·       Support experiment cluster-level resource utilization charts

·       Handle symbolic links when loading the LSF configuration

·       Support an external Elasticsearch with security enabled

 

New Features Description

 

1.     Enforce a password change at first WebGUI logon

To protect the system, the administrator must change the WebGUI logon password from the default logon (Admin/Admin).

This is required only once, at the first logon.

 

2.     Support experiment cluster-level resource utilization charts

When an experiment is running or has finished, users can view the cluster-level slot and memory usage charts by clicking the “Graphs” tab.

 

3.     Handle symbolic links when loading the LSF configuration

The LSF production cluster might include symbolic links under the conf/ directory. When the simulator loads the LSF configuration, it replaces each symbolic link with the real content, without touching the original files under the LSF conf/ directory.
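
As an illustration only, the effect described above resembles copying the conf/ directory with symbolic links dereferenced. The sketch below is not the simulator's implementation; the function name copy_conf_resolved is hypothetical, and the destination directory is assumed not to exist yet.

```shell
# Hypothetical helper, not part of the LSF Simulator package.
# Copy a conf/ directory to a new location, replacing every symbolic link
# with the content it points to. The source directory is only read.
copy_conf_resolved() {   # usage: copy_conf_resolved SRC_CONF_DIR DEST_DIR
    src=$1
    dest=$2
    # -r: recurse into subdirectories.
    # -L: follow symbolic links and copy the referenced files, not the links.
    cp -rL "$src" "$dest"
}
```

After such a copy, the files in the destination are regular files, while the links in the original conf/ directory remain unchanged.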

 

 

4.     Support external Elasticsearch with security enabled

 

Two enhancements are related to this new feature:

 

1). PACKAGE_TOP/config/providers/es1.json is the ONLY source of Elasticsearch connection settings across the whole system.

     Any previous es1.json under lsfconfs/ and experiments/ is ignored.

 

2). Administrators can configure an external Elasticsearch with security mode enabled. HTTPS connections with a CA certificate are supported,

     with ES_USERNAME and ES_PASSWORD for user logon.

 

The following cases are supported after this enhancement:

 

case 1). Default installation

Same as before: the built-in Elasticsearch is started inside the lsf-cognitive container, and the simulator uses the built-in Elasticsearch for the whole system.

             Limitation: security cannot be enabled for the built-in Elasticsearch.

 

case 2). Use an external, security-enabled Elasticsearch with the following steps:

step 1: Before starting the service, edit the file PACKAGE_TOP/config/providers/es1.json and define:

"use_ssl" : true,

"ca_certs": "CA_FILE_NAME",

"verify_certs": true,

 

step 2: Copy CA_FILE_NAME to PACKAGE_TOP/config/providers/

 

step 3: Define the environment variables

export ES_USERNAME=elastic

export ES_PASSWORD=xxxxxx

export ElasticSearchURL=https://fp13-rhel85-2:9200   # can be any value; setting it disables the built-in Elasticsearch

 

step 4: Start the service as the admin account

cd /opt/test/lsf_cognitive_v1

./bcogn start -v xxx
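
Before starting the service in case 2, it can help to confirm that the variables from step 3 are set and that the CA file from step 2 is in place. The following is a minimal sketch of such checks; the function names missing_vars and ca_file_present are hypothetical and not part of the package.

```shell
# Hypothetical pre-start checks for the external Elasticsearch setup.

# Print the name of each listed environment variable that is unset or empty.
missing_vars() {   # usage: missing_vars NAME...
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Succeed only if the CA file was copied to PACKAGE_TOP/config/providers/.
ca_file_present() {   # usage: ca_file_present PACKAGE_TOP CA_FILE_NAME
    [ -f "$1/config/providers/$2" ]
}
```

For example, "missing_vars ES_USERNAME ES_PASSWORD ElasticSearchURL" prints nothing when all three variables are set, and "ca_file_present /opt/test/lsf_cognitive_v1 CA_FILE_NAME" returns a non-zero status if the certificate has not been copied yet.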

 

 

Fresh Installation Steps

 

1. System Requirements

RHEL x86_64, Version 7.2 or later; or CentOS x86_64, Version 7.2 or later

Ubuntu x86_64, Version 18.04

Docker-ce 19.03.* is installed and running

2. Installation

     1.  Select a Linux host that has enough disk storage

          The server must have internet access in order to download the Docker-in-Docker (dind) image for starting the LSF simulation docker container.

          If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment.

          The directory used to untar the package will be the working directory; make sure it has at least 100 GB of free space.

          Note: the directory must be an absolute path and cannot include a symbolic link in the path.

 

          The host memory should be at least 32 GB.

     2.  Log on to the selected Linux host as root and check that the maximum virtual memory setting is greater than 262144.

          # sysctl -n vm.max_map_count

     3.  Check that Docker Engine is installed and running, and that the following ports are not in use:

           5050, 20022, 2222, 9200

     4.  Add the selected user (for example: lsfadmin) to the docker user group

           # usermod -aG docker lsfadmin

     5.  Log in again to the same host as the selected user (lsfadmin) and untar the LSF Cognitive package lsf_cognitive_v1.tar.gz:

          $ tar -zxvf lsf_cognitive_v1.tar.gz
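
The checks in steps 2 and 3 can be collected into a small script. The following is a sketch under stated assumptions: the function names are hypothetical, and the preflight function assumes that sysctl, ss (from iproute2), awk, and the docker command-line client are available on the host.

```shell
# Hypothetical pre-flight helpers; not part of the LSF Simulator package.

# Print "ok" if ACTUAL is greater than REQUIRED, otherwise "too low".
check_greater() {   # usage: check_greater ACTUAL REQUIRED
    if [ "$1" -gt "$2" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# Print every port from the REQUIRED list that also appears in the BUSY list.
busy_ports() {   # usage: busy_ports "REQUIRED PORTS" "BUSY PORTS"
    for port in $1; do
        for used in $2; do
            if [ "$port" = "$used" ]; then
                echo "$port"
            fi
        done
    done
}

# Run the checks on the local host.
preflight() {
    echo "vm.max_map_count: $(check_greater "$(sysctl -n vm.max_map_count)" 262144)"
    # Take the local address column of listening TCP sockets and keep the port.
    listening=$(ss -ltn | awk 'NR > 1 { n = split($4, a, ":"); print a[n] }' | sort -u)
    echo "ports in use: $(busy_ports "5050 20022 2222 9200" "$listening" | tr '\n' ' ')"
    if docker info > /dev/null 2>&1; then
        echo "docker: running"
    else
        echo "docker: not reachable"
    fi
}
```

Calling preflight on the selected host prints one line per check. An empty "ports in use" line means that none of the four required ports has a listener.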

 

3. Manage the LSF Cognitive service

     As the selected user (lsfadmin) or root, start the service:

1.      $ cd lsf_cognitive_v1/

2.      $ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"

-v is optional; it is required only when users load the LSF configuration directly in the WebGUI.

The -v option provides additional volume/directory mounting from the physical host (LSF_TOP_OUTSIDE_CONTAINER) to the inside of the docker container (LSF_TOP_INSIDE_CONTAINER).

The same volume path must be used on both sides of the pair. The service also provides default volume mounts, which users can use as a bridge to transfer files to the service container.

The service provides the following volume mounts by default:

$PACKAGE_TOP is the full directory created when the original package is extracted, for example: /opt/test/lsf_cognitive_v1

Volume Path Outside Container        Volume Path Inside the Container

$PACKAGE_TOP                         $PACKAGE_TOP

$PACKAGE_TOP/work                    /opt/ibm/prediction/work

$PACKAGE_TOP/logs                    /opt/ibm/prediction/logs

$PACKAGE_TOP/config                  /opt/ibm/prediction/config

 

Note: If the LSF production cluster is installed with LSF Suite, the installation might have created a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import the LSF configuration from this production cluster directly, both directories must be mapped when starting the service. For example:

$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
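
The mapping in the note above can be derived by resolving the symbolic link. The sketch below is illustrative only: volume_spec is a hypothetical helper, it assumes GNU readlink with the -f option, and it builds the value in the same format as the example (pairs of identical outside and inside paths, separated by a semicolon).

```shell
# Hypothetical helper; not part of the LSF Simulator package.
# Print a -v value for TOP, adding a second pair when TOP/SUBDIR resolves
# to a location outside TOP (for example, through a symbolic link).
volume_spec() {   # usage: volume_spec TOP SUBDIR
    top=$1
    real_top=$(readlink -f "$top")
    real_sub=$(readlink -f "$top/$2")
    case "$real_sub" in
        "$real_top"/*)
            # The subdirectory really lives under TOP: one pair is enough.
            echo "$top:$top"
            ;;
        *)
            # The subdirectory points elsewhere: map the real location too.
            echo "$top:$top;$real_sub:$real_sub"
            ;;
    esac
}
```

With the layout from the note, the service could then be started with: ./bcogn start -v "$(volume_spec /opt/ibm/lsfsuite lsf)"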

 

     As the selected user (lsfadmin) or root, stop the service:

             $ ./bcogn stop

     Show the current service status:

             $ ./bcogn status

                 SERVICE             STATUS      ID              PORT

                 lsf_cognitive:v1    STARTED     e370e9723742    5050, 9200, 20022

                 docker:dind         STARTED     c420d5ad63e2    2375,2376

                 webssh              STARTED     22651           2222

     Upgrade the current service to a new sub-version, or to a new package with the same version:

              1). Go to the parent directory of the current installation, then stop the service:

                    $ cd xxx

                    $ ./lsf_cognitive_v1/bcogn stop

              2). Untar the new package to overwrite the old package:

                    $ tar -zxvf lsf_cognitive_v1.tar.gz

              3). Upgrade and start the services with option -u:

                    $ ./lsf_cognitive_v1/bcogn start -u

 

                    Or upgrade without starting:

                    $ ./lsf_cognitive_v1/bcogn upgrade

     Uninstall the package:

              The following command stops the services and removes the docker images lsf_cognitive:v1 and lsf_simulator:v1 from the image repository.

              $ ./lsf_cognitive_v1/bcogn stop -f

   

4. Log on to the LSF Cognitive service

Prepare the browser: import the certificate config/https/cacert_lsf.pem into the browser as a Trusted Root Certificate.

From your desktop browser, log on to the service using the URL: https://<SERVICE_HOST>:5050/

Username:  Admin

Password:   Admin

 

5. Use the LSF Cognitive service

     Basic concepts:

         Experiment --- an experiment is an LSF simulation run using a selected LSF configuration and workload snapshot

         LSF Configuration --- an LSF configuration is a full set of LSF cluster configuration and workload policies

         Workload Snapshot --- a workload snapshot is a set of job submission and completion records imported from the LSF cluster events files (lsb.events*)

 

    Major workflow:

        To analyze and tune the LSF configuration, users import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service system, then run an experiment with them. Users can also modify the LSF configuration and the workload snapshot, rerun the experiment, and compare the experiment results.

 

6. Data and logs

     All data related to experiments, LSF configurations, and workload snapshots is saved in the directory lsf_cognitive_v1/work and persists across service restarts. Users can back up the data to another safe location at any time.

     Service daemon logs are saved in the directory lsf_cognitive_v1/logs
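
A backup of the work directory can be taken with standard tools. The following is a minimal sketch using tar; the function name backup_work and the archive naming scheme are hypothetical, and the destination directory is assumed to exist.

```shell
# Hypothetical backup helper; not part of the LSF Simulator package.
# Archive PACKAGE_TOP/work into DEST_DIR and print the archive path.
backup_work() {   # usage: backup_work PACKAGE_TOP DEST_DIR
    archive="$2/lsf_cognitive_work_$(date +%Y%m%d_%H%M%S).tar.gz"
    # -C switches into PACKAGE_TOP so the archive stores the relative path work/.
    tar -czf "$archive" -C "$1" work && echo "$archive"
}
```

For example, "backup_work /opt/test/lsf_cognitive_v1 /backup" writes a time-stamped archive under /backup and prints its path.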

 

Limitations

·       Dynamic hosts are not recognized: If the LSF cluster to be simulated contains dynamic hosts, the simulator data collecting tool tries to convert them to static hosts in the collected data, so that they show up as static hosts in the simulated cluster. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data and adds them to the hostlist line. The hosts are then treated as static hosts and are recognized by the simulator.

·       If LSF Data Manager is enabled in the LSF production environment, the experiments count 1024 CPUs for the data manager master host, which leads to improper results. Disable LSF Data Manager before loading the LSF configuration into the LSF Cognitive system.

·       Dynamic resources reported from elim are converted to fixed resources when the LSF configuration is loaded.

·       LSF Simulator 10.1 works only with a single LSF cluster and does not support LSF multi-cluster mode.

·       User groups defined at the queue level for fairshare are not imported when a workload snapshot is imported. User groups specified with bsub -G are imported properly.

 

Copyright and Trademark Information

©Copyright IBM Corporation 2022

 

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.