IBM Spectrum LSF Simulator 10.1.0.14 Fix (602097)
Abstract
This fix ensures that LSF Simulator can simulate an LSF cluster with a special installation structure, in which LSF_ENVDIR is not under LSF_TOP.
Description
Readme documentation for IBM Spectrum LSF Simulator V10.1.0.14 fix 602097 that includes installation-related instructions, prerequisites and co-requisites, and a list of new features.
Readme file for: IBM® Spectrum LSF Simulator
Product/Component Release: 10.1.0.14
Publication date: 10 July 2024
Last modified date: 10 July 2024
Download Location
Download from the following location: http://www.ibm.com/eserver/support/fixes/
Installation
1. System Requirements
RHEL x86_64, version 7.9 or later; or CentOS x86_64, version 7.9 or later
Ubuntu x86_64, version 20.04, 22.04, or 24.04
Docker CE (docker-ce) 19.03.* or later, installed and running
2. Installation
1. Select a Linux host that has enough disk storage
The host must have internet access to download the Docker-in-Docker (dind) image that is used to start the LSF simulation Docker container.
If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment.
The directory where you untar the package becomes the working directory; make sure it has at least 100G of free space.
The host should have at least 32G of memory.
2. Log on to the selected Linux host as root and check that the vm.max_map_count kernel setting (the maximum number of memory map areas a process may have) is at least 262144:
# sysctl -n vm.max_map_count
3. Check that the Docker engine is installed and running, and that the following ports are not in use:
5050, 20022, 2222, 9200
4. Add the selected user (for example: lsfadmin) to the docker user group
# usermod -aG docker lsfadmin
5. Log in again to the same host as the selected user (lsfadmin) so that the new docker group membership takes effect, then untar the LSF Cognitive package lsf_cognitive_simulator_v1.tar.gz:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz
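The checks in steps 1 to 4 can be gathered into one pre-installation report. The following is a minimal sketch, not part of the package; the function names (preflight, busy_ports, and so on) are hypothetical. It uses standard Linux tools (df, /proc/meminfo, ss, id) plus `docker info`. Disk and memory are only printed in GiB next to the readme's "100G" and "32G" figures, because the readme does not say whether G means GB or GiB and the kernel reserves part of physical RAM; the function fails only on the exact checks.

```shell
#!/usr/bin/env bash
# Hypothetical pre-installation report for steps 1-4 (not part of the package).
# Disk and memory are printed for you to judge; the function returns non-zero
# only on exact checks: vm.max_map_count, port conflicts, Docker reachability,
# and docker group membership.

REQUIRED_MAP_COUNT=262144

# Free space (GiB) on the filesystem that holds directory $1.
free_disk_gib() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Total memory (GiB); MemTotal in /proc/meminfo is reported in kB.
total_mem_gib() {
  awk '/^MemTotal:/ { print int($2 / 1048576) }' /proc/meminfo
}

# Reads `ss -ltn` output on stdin and prints each required port that already
# has a listener (column 4 is the local address:port, e.g. [::]:9200).
busy_ports() {
  awk 'BEGIN { split("5050 20022 2222 9200", want, " ") }
       { n = split($4, a, ":"); if (n > 0) used[a[n]] = 1 }
       END { for (i = 1; i <= 4; i++) { k = want[i]; if (k in used) print k } }'
}

# Succeeds if user $1 is a member of the docker group.
in_docker_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qx docker
}

preflight() {
  local dir="${1:-$PWD}" user="${2:-lsfadmin}" rc=0 count busy
  echo "free disk in $dir: $(free_disk_gib "$dir") GiB (readme: at least 100G)"
  echo "host memory: $(total_mem_gib) GiB (readme: at least 32G)"
  # Same value that `sysctl -n vm.max_map_count` prints.
  count="$(cat /proc/sys/vm/max_map_count)"
  if [ "$count" -ge "$REQUIRED_MAP_COUNT" ]; then
    echo "vm.max_map_count: $count (ok)"
  else
    echo "vm.max_map_count: $count (need at least $REQUIRED_MAP_COUNT)"; rc=1
  fi
  busy="$(ss -ltn | busy_ports | tr '\n' ' ')"
  if [ -z "$busy" ]; then
    echo "ports 5050 20022 2222 9200: free (ok)"
  else
    echo "ports already in use: $busy"; rc=1
  fi
  if docker info >/dev/null 2>&1; then
    echo "docker engine: reachable (ok)"
  else
    echo "docker engine: not reachable by $(id -un)"; rc=1
  fi
  if in_docker_group "$user"; then
    echo "$user is in the docker group (ok)"
  else
    echo "$user is not in the docker group"; rc=1
  fi
  return "$rc"
}
```

For example, after sourcing the script, `preflight /opt/test lsfadmin` reports on /opt/test and the lsfadmin account. If vm.max_map_count is too low, `sysctl -w vm.max_map_count=262144` (as root) raises it until the next reboot; adding `vm.max_map_count=262144` to /etc/sysctl.conf makes it persistent.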
3. Manage LSF Simulator service
As the selected user (lsfadmin) or root, start the service:
1. $ cd lsf_cognitive_v1/
2. $ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"
The -v option is optional; it is required only when you load the LSF configuration directly in the WebGUI.
The -v option mounts an additional volume (directory) from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER). Use the same path on both sides of each pair.
The service also provides the following default volume mounts, which you can use as a bridge to transfer files into the service container. $PACKAGE_TOP is the full path of the directory created when you extract the package, for example /opt/test/lsf_cognitive_v1.
Volume path outside the container    Volume path inside the container
$PACKAGE_TOP                         $PACKAGE_TOP
$PACKAGE_TOP/work                    /opt/ibm/prediction/work
$PACKAGE_TOP/logs                    /opt/ibm/prediction/logs
$PACKAGE_TOP/config                  /opt/ibm/prediction/config
Note: If the LSF production cluster is installed with LSF Suite, the installation might create a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import the LSF configuration from this production cluster directly, map both directories when you start the service. For example:
$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
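When several directories must be mounted, the -v value can be assembled rather than typed by hand. This is a minimal sketch; build_volume_arg is a hypothetical helper, not a bcogn feature. It maps each host directory to the same path inside the container, joins the pairs with ";" as in the LSF Suite example above, drops duplicates, and rejects relative or missing paths.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bcogn): build the value for
# `./bcogn start -v` from a list of host directories. Each directory is
# mapped to the same path inside the container, pairs are joined with ';',
# and duplicates are dropped.
build_volume_arg() {
  local dir p dup out=""
  local -a paths=()
  for dir in "$@"; do
    # Reject relative or missing paths so a typo is caught before start-up.
    case "$dir" in
      /*) ;;
      *) echo "not an absolute path: $dir" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    dup=0
    for p in "${paths[@]}"; do
      [ "$p" = "$dir" ] && dup=1
    done
    [ "$dup" -eq 1 ] || paths+=("$dir")
  done
  for p in "${paths[@]}"; do
    # Same path on both sides of ':', as the readme requires.
    out+="${out:+;}$p:$p"
  done
  printf '%s\n' "$out"
}
```

For the LSF Suite case, `./bcogn start -v "$(build_volume_arg /opt/ibm/lsfsuite "$(readlink -f /opt/ibm/lsfsuite/lsf)")"` produces the same mapping as the example above when the link resolves to /sharedir/lsf. If LSF_ENVDIR lives outside LSF_TOP (the layout this fix addresses), you may also need to pass that directory. This is an assumption; the readme documents only the LSF_TOP mapping.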
As the selected user (lsfadmin) or root, stop the service:
$ ./bcogn stop
Show the current service status:
$ ./bcogn status
SERVICE            STATUS    ID              PORT
lsf_cognitive:v1   STARTED   e370e9723742    5050, 9200, 20022
docker:dind        STARTED   c420d5ad63e2    2375,2376
webssh             STARTED   22651           2222
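In scripts, it can help to wait until every service shows STARTED before opening the WebGUI. This is a minimal sketch with hypothetical function names. It assumes the `./bcogn status` output keeps the layout shown above: one header line, then one row per service with the status in the second column. Any value other than STARTED is treated as not ready.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they assume `bcogn status` prints one header line,
# then one row per service with the status in column 2.

# Reads `bcogn status` output on stdin; succeeds only if there is at least
# one service row and every row reports STARTED.
all_started() {
  awk 'NR > 1 && NF > 0 { rows++; if ($2 != "STARTED") bad++ }
       END { exit (rows == 0 || bad > 0) }'
}

# Polls `bcogn status` until all services report STARTED.
#   $1 = package directory containing bcogn (default: current directory)
#   $2 = number of attempts (default 30), 10 seconds apart
wait_for_service() {
  local pkg="${1:-.}" tries="${2:-30}" i
  for ((i = 0; i < tries; i++)); do
    "$pkg/bcogn" status | all_started && return 0
    sleep 10
  done
  echo "services not all STARTED after $tries checks" >&2
  return 1
}
```

For example, `./bcogn start && wait_for_service .` run from lsf_cognitive_v1/ returns once the status rows all read STARTED, or fails after about five minutes.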
To upgrade the current service to a new sub-version, or to a new package with the same version:
1) Go to the parent directory of the current installation, then stop the service:
$ cd xxx
$ ./lsf_cognitive_v1/bcogn stop
2) Untar the new package to overwrite the old package:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz
3) Upgrade and start the services with the -u option:
$ ./lsf_cognitive_v1/bcogn start -u
Or upgrade without starting the services:
$ ./lsf_cognitive_v1/bcogn upgrade
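The three upgrade steps can be chained so that a failure in one step stops the rest, for example so that the old package is not overwritten if the stop fails. This is a minimal sketch; upgrade_simulator is a hypothetical name. It uses only the commands listed above and assumes the installation directory is named lsf_cognitive_v1, as in this readme.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the upgrade steps in order, stopping at the
# first failure.
#   $1 = parent directory of the current lsf_cognitive_v1 installation
#   $2 = path to the new lsf_cognitive_simulator_v1.tar.gz
upgrade_simulator() {
  local parent="$1" tarball
  # Make the tarball path absolute before changing directory.
  tarball="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")" || return 1
  (
    # Subshell, so the caller's working directory is unchanged.
    cd "$parent" &&
      ./lsf_cognitive_v1/bcogn stop &&
      tar -zxvf "$tarball" &&
      ./lsf_cognitive_v1/bcogn start -u
  )
}
```

To upgrade without starting, replace the last command in the chain with `./lsf_cognitive_v1/bcogn upgrade`. The steps are chained with `&&` rather than relying on `set -e`, because `set -e` is ignored when the function is called in a tested context such as `upgrade_simulator ... || echo failed`.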
Uninstall the package:
The following command stops the services and removes the Docker image lsf_cognitive:v1 from the image repository:
$ ./lsf_cognitive_v1/bcogn stop -f
4. Log on to the LSF Cognitive service
Prepare the browser: import the certificate config/https/cacert_lsf.pem into your browser as a trusted root certificate.
From your desktop browser, log on to the service at this URL: https://<SERVICE_HOST>:5050/
Username: Admin
Password: Admin
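Before troubleshooting the browser, you can confirm from a shell that the service answers over HTTPS with the shipped CA certificate. This is a minimal sketch; check_webgui is a hypothetical name. Only port 5050 and the config/https/cacert_lsf.pem path come from this readme, and the certificate is assumed to be under the package directory ($PACKAGE_TOP/config is one of the default mounts).

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that https://<host>:5050/ is reachable and
# that its certificate validates against the package's CA certificate.
#   $1 = service host (default: localhost)
#   $2 = package directory, e.g. lsf_cognitive_v1 (default: current directory)
check_webgui() {
  local host="${1:-localhost}" package_top="${2:-$PWD}"
  local cacert="$package_top/config/https/cacert_lsf.pem"
  [ -r "$cacert" ] || { echo "CA certificate not found: $cacert" >&2; return 2; }
  # -s: no progress output, -f: fail on HTTP errors, -o /dev/null: discard the body
  curl -sf --cacert "$cacert" -o /dev/null "https://${host}:5050/"
}
```

For example, `check_webgui myhost.example.com /opt/test/lsf_cognitive_v1` returns 0 on success. curl also verifies that the host name matches the server certificate, so a certificate issued for a different name makes the check fail even when the service is running.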
5. Use the LSF Cognitive service
Basic concepts:
Experiment: an LSF simulation run that uses a selected LSF configuration and workload snapshot.
LSF configuration: a full set of LSF cluster configuration and workload policies.
Workload snapshot: a set of job submission and completion records imported from the LSF cluster event files (lsb.events*).
Major workflow:
To analyze and tune the LSF configuration, import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service, then run an experiment with them. You can also modify the LSF configuration and the workload snapshot, rerun the experiment, and compare the experiment results.
6. Data and logs
All data related to experiments, LSF configurations, and workload snapshots is saved in the lsf_cognitive_v1/work directory and persists across service restarts. You can back up this data to another safe location at any time.
Service daemon logs are saved in the lsf_cognitive_v1/logs directory.
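A dated archive of the work directory is one simple way to make the backup described above. This is a minimal sketch; backup_work and the archive naming scheme are hypothetical. The destination should be outside the work directory so the archive does not include itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: archive $PACKAGE_TOP/work into a timestamped tarball.
#   $1 = package directory (e.g. /opt/test/lsf_cognitive_v1)
#   $2 = destination directory for the archive (outside the work directory)
# Prints the path of the archive it created.
backup_work() {
  local package_top="$1" dest_dir="$2" stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/lsf_cognitive_work-$stamp.tar.gz"
  mkdir -p "$dest_dir" &&
    tar -czf "$archive" -C "$package_top" work &&
    echo "$archive"
}
```

For example, `backup_work /opt/test/lsf_cognitive_v1 /backup/lsf_simulator` writes /backup/lsf_simulator/lsf_cognitive_work-<timestamp>.tar.gz. The archive stores paths relative to the package directory (work/...), so it can be restored with `tar -xzf <archive> -C <PACKAGE_TOP>`.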
Limitations
· Dynamic hosts are not recognized: If the LSF cluster to be simulated contains dynamic hosts, the simulator data collection tool tries to convert them to static hosts in the collected data. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data and adds them to the hostlist line, so that they appear as static hosts in the simulated cluster and are recognized by the simulator.
· If LSF Data Manager is enabled in the LSF production environment, experiments count 1024 CPUs for the data manager master host, which leads to incorrect results. It is recommended to disable LSF Data Manager before you load the LSF configuration into the LSF Cognitive system.
· Dynamic resources reported by an elim are converted to fixed resources when the LSF configuration is loaded.
· LSF Simulator 10.1 fix pack 13 works only with a single LSF cluster; it does not support LSF multi-cluster mode.
· User groups defined at the queue level for fairshare are not imported when you import a workload snapshot. User groups specified with bsub -G are imported properly.
Copyright and Trademark Information
©Copyright IBM Corporation 2024
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.