IBM Spectrum LSF Simulator 10.1.0.14 Fix (602097)

 

Abstract

This fix enables LSF Simulator to simulate an LSF cluster with a special installation structure, in which LSF_ENVDIR is not under LSF_TOP.

 

Description

Readme documentation for IBM Spectrum LSF Simulator V10.1.0.14 fix 602097, including installation instructions, prerequisites and co-requisites, and a list of new features.

Readme file for: IBM® Spectrum LSF Simulator
Product/Component Release: 10.1.0.14
Publication date: 10 July 2024
Last modified date: 10 July 2024

 

Download Location

 

Download from the following location: http://www.ibm.com/eserver/support/fixes/

 

Installation

 

1. System Requirements

RHEL X86_64, Version 7.9 or later; or CentOS X86_64, Version 7.9 or later

Ubuntu X86_64, Version 20.04, 22.04, or 24.04

Docker-ce 19.03.* or later is installed and running

2. Installation

     1.  Select a Linux host that has enough disk storage.

          The server must have internet access in order to download the Docker-in-Docker (dind) image for starting the LSF simulation Docker container.

          If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment.

          The directory where you untar the package becomes the working directory; make sure it has at least 100 GB of free space.

          The host must have at least 32 GB of memory.

     2.  Log on to the selected Linux host as root and check that the maximum virtual memory map count is at least 262144:

          # sysctl -n vm.max_map_count
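If the reported value is below the minimum, it must be raised before the service will run. A minimal check-and-advise sketch, assuming the 262144 minimum stated above:

```shell
# Sketch: verify vm.max_map_count meets the documented minimum (262144).
required=262144
current=$(sysctl -n vm.max_map_count 2>/dev/null || echo 0)
if [ "$current" -lt "$required" ]; then
    # As root: sysctl -w vm.max_map_count=262144 (add to /etc/sysctl.conf to persist)
    echo "vm.max_map_count is $current; raise it to at least $required"
else
    echo "vm.max_map_count is $current; OK"
fi
```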

     3.  Check that the Docker engine is installed and running, and that the following ports are not in use:

           5050, 20022, 2222, 9200
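A quick way to probe the ports above is bash's built-in /dev/tcp redirection; this sketch (the port list is taken from this readme) reports each port's state on the local host:

```shell
# Sketch: report whether each port required by the service is already in use.
for port in 5050 20022 2222 9200; do
    if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
        echo "port $port is in use"
    else
        echo "port $port is free"
    fi
done
```

A connect that succeeds means something is already listening on that port; a tool such as ss -tlnp can then show which process owns it.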

     4.  Add the selected user (for example, lsfadmin) to the docker group:

           # usermod -aG docker lsfadmin

     5.  Log in to the same host again as the selected user (lsfadmin), and untar the LSF Cognitive package lsf_cognitive_simulator_v1.tar.gz:

          $ tar -zxvf lsf_cognitive_simulator_v1.tar.gz

 

3. Manage LSF Simulator service

     As the selected user (lsfadmin) or root, start the service:

1.      $ cd lsf_cognitive_v1/

2.      $ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"

The -v option is optional; it is required only when you load the LSF configuration directly in the web GUI.

The -v option mounts an additional volume/directory from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER).

The same path must be used for both sides of the pair. The service provides default volume mounts, which you can use as a bridge to transfer files into the service container.

The following volume mounts are provided by default by the service.

$PACKAGE_TOP is the full directory created when extracting the original package, for example: /opt/test/lsf_cognitive_v1

 Volume Path Outside Container        Volume Path Inside the Container

 $PACKAGE_TOP                         $PACKAGE_TOP

 $PACKAGE_TOP/work                    /opt/ibm/prediction/work

 $PACKAGE_TOP/logs                    /opt/ibm/prediction/logs

 $PACKAGE_TOP/config                  /opt/ibm/prediction/config

 

Note: if the LSF production cluster is installed with LSF Suite, the installation might have created a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import the LSF configuration from this production cluster directly, you must map both directories when starting the service. For example:

$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
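A quick check for whether such a double mapping is needed is to test whether the LSF top directory is a symbolic link; the path below is the LSF Suite default used in the example above, so adjust it for your cluster:

```shell
# Sketch: detect whether LSF_TOP is a symlink whose target also needs mapping.
lsf_top=/opt/ibm/lsfsuite/lsf    # assumed LSF Suite install path
if [ -L "$lsf_top" ]; then
    target=$(readlink -f "$lsf_top")
    echo "symlink detected: $lsf_top -> $target; map both directories with -v"
else
    echo "no symlink: the single default mapping is sufficient"
fi
```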

 

     As the selected user (lsfadmin) or root, stop the service:

             $ ./bcogn stop

     Show the current service status:

             $ ./bcogn status

                 SERVICE             STATUS     ID              PORT

                 lsf_cognitive:v1    STARTED    e370e9723742    5050, 9200, 20022

                 docker:dind         STARTED    c420d5ad63e2    2375, 2376

                 webssh              STARTED    22651           2222

     Upgrade the current service to a new subversion, or to a new package with the same version:

               1). Go to the parent directory of the current installation, then stop the service:

                    $ cd xxx

                    $ ./lsf_cognitive_v1/bcogn stop

               2). Untar the new package to overwrite the old package:

                     $ tar -zxvf lsf_cognitive_simulator_v1.tar.gz

               3). Upgrade and start the services with the -u option:

                     $ ./lsf_cognitive_v1/bcogn start -u

 

     Or upgrade without starting:

                     $ ./lsf_cognitive_v1/bcogn upgrade

     Uninstall the package:

                    The following command stops the services and removes the Docker image lsf_cognitive:v1 from the image repository.

                   $ ./lsf_cognitive_v1/bcogn stop -f

   

4. Log on to the LSF Cognitive service

Prepare your browser: import the certificate config/https/cacert_lsf.pem into the browser as a trusted root certificate.

From your desktop browser, log on to the service at the URL: https://<SERVICE_HOST>:5050/

Username:  Admin

Password:   Admin

 

5. Use the LSF Cognitive service

     Basic concepts:

         Experiment --- an experiment is an LSF simulation run that uses a selected LSF configuration and workload snapshot

         LSF Configuration --- an LSF configuration is a full set of LSF cluster configuration and workload policies

         Workload Snapshot --- a workload snapshot is a set of job submission and completion records imported from the LSF cluster event files (lsb.events*)

 

    Major workflow:

        To analyze and tune an LSF configuration, import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service, then run an experiment with them. You can also modify the LSF configuration and the workload snapshot, rerun the experiment, and compare the experiment results.

 

6. Data and logs

     All the data related to experiments, LSF configurations, and workload snapshots is saved in the directory lsf_cognitive_v1/work; it persists across service restarts. You can back up this data to another safe location at any time.
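One way to take such a backup, sketched with tar; the timestamped archive name is an assumption, while the work path is the one given in this readme (run it from the parent directory of lsf_cognitive_v1):

```shell
# Sketch: archive the simulator data directory under a timestamped name.
ts=$(date +%Y%m%d_%H%M%S)
tar -czf "lsf_cognitive_work_backup_${ts}.tar.gz" lsf_cognitive_v1/work
echo "backup written: lsf_cognitive_work_backup_${ts}.tar.gz"
```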

    Service daemon logs are saved in the directory lsf_cognitive_v1/logs.

 

Limitations

·       Dynamic hosts are not recognized: if the LSF cluster to be simulated contains dynamic hosts, the simulator data collection tool converts those dynamic hosts to static hosts in the collected data so that they show up as static hosts in the simulated cluster. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data and adds them to the host list line. The simulator then recognizes them as static hosts.

·       If LSF Data Manager is enabled in the LSF production environment, experiments count 1024 CPUs for the Data Manager master host, which leads to improper results. Disable LSF Data Manager before loading the LSF configuration into the LSF Cognitive system.

·     Dynamic resources reported by elim are converted to fixed resources when the LSF configuration is loaded.

·     LSF Simulator 10.1 Fix Pack 13 works only with a single LSF cluster; it does not support LSF multicluster mode.

·     User groups defined as fairshare at the queue level are not imported when you import a workload snapshot. User groups specified with bsub -G are imported properly.

 

Copyright and Trademark Information

©Copyright IBM Corporation 2024

 

U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.