Readme file for IBM® Spectrum LSF Simulator 10.1 Fix Pack 13 (601213)
Abstract
Readme documentation for IBM Spectrum LSF Simulator 10.1 Fix Pack 13 including installation-related instructions, prerequisites, and limitations.
Use the LSF Simulator to analyze and tune LSF configuration by simulating an LSF cluster in a separate environment. It allows administrators to run experiments with different sets of LSF configurations and workload snapshots without interrupting the LSF production environment.
Description
Readme documentation for IBM Spectrum LSF Simulator V10.1 Fix Pack 13 that includes installation-related instructions, prerequisites and co-requisites, and a list of new features.
Readme file for: IBM® Spectrum LSF Simulator
Product or component release: 10.1 Fix Pack 13
Publication date: 24 June 2022
Last modified date: 24 June 2022
What’s new
· Force password change for first-time logon.
· Upgrade Elasticsearch from version 7.10.2 to 7.16.3 with JDK 17.0.3.1 to address Log4j and Java security issues.
· Upgrade Python from version 3.7.9 to 3.7.11 to address a security vulnerability.
· Add utilization graphs of cluster-level memory and slot usage to experiment results.
· Handle symbolic links (symlinks) when loading LSF configuration.
Installation
1. System Requirements
RHEL x86_64, version 7.2 or later; or CentOS x86_64, version 7.2 or later
Ubuntu x86_64, version 18.04
Docker-ce 19.03.* or later, installed and running
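To confirm the Docker requirement before you install, you can compare the installed Docker server version against 19.03. The following is a minimal sketch that assumes GNU sort (for its -V version-sort option); the function name version_at_least is hypothetical and is not part of the product.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU "sort -V" (version sort), which orders 19.03 before 19.03.0 and 20.10.7.
version_at_least() {
  local have=$1 need=$2
  # The smaller of the two versions sorts first; if that is $need, then $have >= $need.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}
```

After you define the function in a bash session, check the Docker server version and the host architecture:
$ version_at_least "$(docker version --format '{{.Server.Version}}')" 19.03 && echo "Docker version OK"
$ [ "$(uname -m)" = x86_64 ] && echo "x86_64 host"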
2. Installation
1. Select a Linux host that has enough disk storage. The host must have internet access to download the Docker-in-Docker (dind) image that starts the LSF simulation Docker container. If you want to load LSF configuration directly, the host must be a client or a server in the LSF production environment. The directory where you extract the package is the working directory. Make sure that the host has at least 100 GB of free disk space and at least 32 GB of memory.
2. Log on to the selected Linux host as root. Check that the virtual memory map count limit (vm.max_map_count) is at least 262144:
# sysctl -n vm.max_map_count
3. Check that the Docker engine is installed and running, and that the following ports are not in use: 5050, 20022, 2222, 9200
4. Add the selected user, for example lsfadmin, to the docker user group:
# usermod -aG docker lsfadmin
5. Log in again to the same host as the selected user (lsfadmin) and extract the LSF Cognitive package, lsf_cognitive_simulator_v1.tar.gz:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz
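The checks in installation steps 2 to 4 can be scripted as a small preflight sketch. It rests on several assumptions: the function names are hypothetical, the port check parses the output of "ss -ltn" from iproute2 (with the local address assumed to be in the fourth column), and 262144 is the minimum value given in step 2.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers for installation steps 2 to 4.

MIN_MAP_COUNT=262144
REQUIRED_PORTS="5050 20022 2222 9200"

# Step 2: succeeds if the given value (default: the current kernel setting)
# is at least MIN_MAP_COUNT.
map_count_ok() {
  local value=${1:-$(sysctl -n vm.max_map_count)}
  [ "$value" -ge "$MIN_MAP_COUNT" ]
}

# Step 3: succeeds if the Docker daemon responds.
docker_running() {
  docker info >/dev/null 2>&1
}

# Step 3: reads "ss -ltn" output on stdin and prints every required port
# that already has a listening socket. The port is taken from after the
# last colon of the local address in column 4; the header line is skipped.
busy_required_ports() {
  local listening port
  listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')
  for port in $REQUIRED_PORTS; do
    if printf '%s\n' "$listening" | grep -qx "$port"; then
      echo "$port"
    fi
  done
}

# Step 4: succeeds if user $1 is a member of group $2.
user_in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}
```

After you define the functions in a bash session, run the checks (commands with the # prompt need root):
# map_count_ok || sysctl -w vm.max_map_count=262144
$ docker_running || echo "Docker engine is not running"
$ ss -ltn | busy_required_ports
# user_in_group lsfadmin docker || usermod -aG docker lsfadmin
A value set with sysctl -w does not survive a reboot; to keep it, you can add the line vm.max_map_count=262144 to a file under /etc/sysctl.d/. New group membership takes effect only in a new login session, which is why step 5 logs in again.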
3. Manage LSF Simulator service
1. As the selected user (lsfadmin) or root, start the service:
$ cd lsf_cognitive_v1/
$ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"
The -v option is optional. It is required only when you load LSF configuration directly in the web GUI. It mounts an additional volume or directory from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER). You must use the same path on both sides of the pair. The service also provides default volume mounts, which you can use as a bridge to transfer files to the service container.
The service provides the following default volume mounts, where $PACKAGE_TOP is the full path of the directory that is created when you extract the original package, for example /opt/test/lsf_cognitive_v1:
Volume path outside the container    Volume path inside the container
$PACKAGE_TOP                         $PACKAGE_TOP
$PACKAGE_TOP/work                    /opt/ibm/prediction/work
$PACKAGE_TOP/logs                    /opt/ibm/prediction/logs
$PACKAGE_TOP/config                  /opt/ibm/prediction/config
Note: If the LSF production cluster is installed with LSF Suite, the installation might create a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import LSF configurations from this production cluster directly, you must map both directories when you start the service. For example:
$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
2. As the selected user (lsfadmin) or root, stop the service:
$ ./bcogn stop
3. Show the current service status:
$ ./bcogn status
SERVICE            STATUS    ID              PORT
lsf_cognitive:v1   STARTED   e370e9723742    5050, 9200, 20022
docker:dind        STARTED   c420d5ad63e2    2375,2376
webssh             STARTED   22651           2222
4. Upgrade the current service to a new sub-version, or to a new package with the same version:
a. Go to the parent directory of the current installation, then stop the service:
$ cd xxx
$ ./lsf_cognitive_v1/bcogn stop
b. Extract the new package to overwrite the old package:
$ tar -zxvf lsf_cognitive_simulator_v1.tar.gz
c. Upgrade and start the services with the -u option:
$ ./lsf_cognitive_v1/bcogn start -u
Or upgrade without starting the services:
$ ./lsf_cognitive_v1/bcogn upgrade
5. Uninstall the package. The following command stops the services and removes the Docker image lsf_cognitive:v1 from the image repository:
$ ./lsf_cognitive_v1/bcogn stop -f
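The service commands in this section can also be wrapped in shell functions, for example to build the -v value for several directories or to check the status output from a script. This sketch makes several assumptions: the function names are hypothetical; the -v format (outside:inside pairs separated by ;) is taken from the LSF Suite example in step 1; the status check assumes the column layout of the sample ./bcogn status output (service name, then status); and the upgrade function assumes that bcogn returns a nonzero exit status when a command fails.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the bcogn commands described in this section.

# Builds a bcogn -v value that mounts each directory at the same path
# inside the container, for example "/a:/a;/b:/b".
volume_spec() {
  local dir spec=""
  for dir in "$@"; do
    spec="${spec:+$spec;}$dir:$dir"
  done
  printf '%s\n' "$spec"
}

# Reads "./bcogn status" output on stdin. Succeeds only if at least one
# service is listed and every listed service reports STARTED in column 2.
all_services_started() {
  awk 'NR > 1 && NF { n++; if ($2 != "STARTED") bad = 1 }
       END { exit (bad || n == 0) }'
}

# Upgrade steps a to c. The body runs in a subshell so that "cd" does not
# change the caller's directory. Mode "start" (default) runs "bcogn start -u";
# mode "upgrade" runs "bcogn upgrade" without starting the services.
# Pass TARBALL as an absolute path because the function changes directory.
upgrade_simulator() (
  parent=$1 tarball=$2 mode=${3:-start}
  cd "$parent" || exit 1
  ./lsf_cognitive_v1/bcogn stop || exit 1
  tar -zxvf "$tarball" || exit 1
  if [ "$mode" = "upgrade" ]; then
    ./lsf_cognitive_v1/bcogn upgrade
  else
    ./lsf_cognitive_v1/bcogn start -u
  fi
)
```

Example usage after you define the functions; readlink -f prints the target of the LSF Suite symbolic link, which also needs a mount:
$ readlink -f /opt/ibm/lsfsuite/lsf
$ ./bcogn start -v "$(volume_spec /opt/ibm/lsfsuite /sharedir/lsf)"
$ ./bcogn status | all_services_started && echo "All services started"
$ upgrade_simulator "$PWD" "$PWD/lsf_cognitive_simulator_v1.tar.gz"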
4. Log on to the LSF Cognitive service
1. Prepare your browser by importing the following certificate as a trusted root certificate: config/https/cacert_lsf.pem
2. From your desktop browser, log on to the service at the following URL: https://<SERVICE_HOST>:5050/
Username: Admin
Password: Admin
You must change the password at your first logon.
5. Use the LSF Cognitive service
Basic concepts:
Experiment
An experiment is an LSF simulation run that
uses selected LSF configuration and workload snapshot.
LSF configuration
An LSF configuration is a full set of LSF
cluster configuration and workload policies.
Workload snapshot
A workload snapshot is a set of job submission
and completion records that are imported from the LSF cluster events files (lsb.events*).
Major workflow
To analyze and tune an LSF configuration, import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service, then run experiments with them. You can also modify the LSF configurations and workload snapshots, rerun the experiments, and compare the experiment results.
6. Data and logs
All data related to experiments, LSF configurations, and workload snapshots is saved in the lsf_cognitive_v1/work directory, which persists across service restarts. You can back up the data to another directory at any time.
Service daemon logs are saved in the lsf_cognitive_v1/logs directory.
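To back up the experiment data, you can archive the work directory with tar. The following is a minimal sketch; the function name backup_work and the archive naming scheme are hypothetical, and PACKAGE_TOP is the directory created when you extract the package (for example /opt/test/lsf_cognitive_v1).

```bash
#!/usr/bin/env bash
# Hypothetical backup helper for the lsf_cognitive_v1/work directory.
# Usage: backup_work PACKAGE_TOP DEST_DIR
# Creates DEST_DIR if needed and prints the path of the new archive.
backup_work() {
  local top=$1 dest=$2 archive
  archive="$dest/lsf_cognitive_work_$(date +%Y%m%d_%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # -C stores paths relative to PACKAGE_TOP, so the archive contains work/...
  tar -czf "$archive" -C "$top" work || return 1
  printf '%s\n' "$archive"
}
```

For example:
$ backup_work /opt/test/lsf_cognitive_v1 /backup/lsf_cognitive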
Limitations
· Dynamic hosts are not recognized. If the LSF cluster to be simulated contains dynamic hosts, the simulator data collecting tool tries to convert them to static hosts in the collected data, so that they show up as static hosts in the simulated cluster. The conversion adds the dynamic hosts to the hostlist line of the lsf.cluster.<cluster_name> file in the conf directory of the collected data, so that the simulator recognizes them as static hosts.
· If LSF Data Manager is enabled in the LSF production environment, the experiments count 1024 CPUs for the data manager primary host, which leads to incorrect results. Disable LSF Data Manager before you load the LSF configuration into the LSF Cognitive system.
· Dynamic resources that are reported by elim are converted to fixed resources when you load the LSF configuration.
· LSF Simulator 10.1 Fix Pack 13 works only with a single LSF cluster and does not support LSF multi-cluster mode.
· User groups that are defined at the queue level for fairshare are not imported when you import a workload snapshot. User groups that are specified with bsub -G are imported properly.
Copyright and Trademark Information
©Copyright IBM Corporation 2022
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.