IBM Spectrum LSF Simulator 10.1.0.13 Readme File
Abstract
Readme documentation for IBM Spectrum LSF Simulator 10.1.0.13 including installation-related instructions, prerequisites and limitations.
Use the LSF Simulator to analyze and tune the LSF configuration by simulating an LSF cluster in a separate environment. It allows administrators to run experiments with different sets of LSF configurations and workload snapshots without interrupting the LSF production environment.
Description
Readme documentation for IBM Spectrum LSF Simulator V10.1.0.13 that includes installation-related instructions, prerequisites and co-requisites, and a list of new features.
Readme file for: IBM® Spectrum LSF Simulator
Product/Component Release: 10.1.0.13
Publication date: 28 July, 2022
Last modified date: 28 July, 2022
What’s new
· Enforce a password change at first logon
· Support cluster-level resource utilization charts for experiments
· Handle symbolic links when loading the LSF configuration
· Support external Elasticsearch with security enabled
New Features
1. Enforce password change at first logon to the WebGUI
To protect the system, the administrator must change the WebGUI logon password from the default (Admin/Admin). This is required only once, at the first logon.
2. Support cluster-level resource utilization charts for experiments
When an experiment is running or has finished, users can view the cluster-level slot and memory usage charts by clicking the “Graphs” tab.
3. Handle symbolic links when loading the LSF configuration
The LSF production cluster might include symbolic links under the conf/ directory. When the simulator loads the LSF configuration, it replaces each symbolic link with the real content, without modifying the original files under the LSF conf/ directory.
4. Support external Elasticsearch with security enabled
This feature includes two enhancements:
1). PACKAGE_TOP/config/providers/es1.json is now the ONLY Elasticsearch connection settings file across the whole system. Earlier es1.json files under lsfconfs/ and experiments/ are ignored.
2). Administrators can configure an external Elasticsearch with security mode enabled. HTTPS connections are supported with a CA certificate (cacert), and user logon uses the ES_USERNAME and ES_PASSWORD credentials.
The following cases are supported after this enhancement:
case 1). Default installation, same as before: the built-in Elasticsearch is started inside the lsf-cognitive container, and the simulator uses it for the whole system.
Limitation: security cannot be enabled for the built-in Elasticsearch.
case 2). Use an external, security-enabled Elasticsearch with the following steps:
step 1:
Before starting the service, edit the file PACKAGE_TOP/config/providers/es1.json and define:
"use_ssl" : true,
"ca_certs": "CA_FILE_NAME",
"verify_certs": true,
step 2:
Copy CA_FILE_NAME to PACKAGE_TOP/config/providers/
step 3:
Define the environment variables:
export ES_USERNAME=elastic
export ES_PASSWORD=xxxxxx
export ElasticSearchURL=https://fp13-rhel85-2:9200   # can be any value; setting it disables the built-in Elasticsearch
step 4:
Start the service as the admin account:
cd /opt/test/lsf_cognitive_v1
./bcogn start -v xxx
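Steps 1 and 2 of case 2 can be checked before the service is started. The following is a minimal sketch, assuming es1.json keeps each key name in double quotes on the same line as its value; the function name prepare_external_es is hypothetical and is not part of bcogn.

```shell
# Hypothetical helper (not part of bcogn): validate es1.json for step 1 and
# stage the CA file for step 2. Arguments: PACKAGE_TOP and the CA file path.
prepare_external_es() {
  local package_top="$1" ca_file="$2" es_json key
  es_json="$package_top/config/providers/es1.json"

  # Step 1: the three SSL-related keys must be present in es1.json.
  for key in use_ssl ca_certs verify_certs; do
    if ! grep -q "\"$key\"" "$es_json"; then
      echo "missing \"$key\" in $es_json" >&2
      return 1
    fi
  done

  # ca_certs is assumed to name the CA file by its base name, because step 2
  # places the file in the same directory as es1.json.
  if ! grep -qF "\"$(basename "$ca_file")\"" "$es_json"; then
    echo "es1.json does not reference $(basename "$ca_file")" >&2
    return 1
  fi

  # Step 2: copy the CA file next to es1.json.
  cp "$ca_file" "$package_top/config/providers/"
}
```

For example, prepare_external_es /opt/test/lsf_cognitive_v1 /path/to/CA_FILE_NAME returns non-zero and prints the reason if es1.json is incomplete; after it succeeds, export the variables from step 3 and start the service as in step 4.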
Fresh Installation Steps
1. System Requirements
RHEL x86_64, version 7.2 or later; or
CentOS x86_64, version 7.2 or later; or
Ubuntu x86_64, version 18.04
Docker-ce 19.03.* installed and running
2. Installation
1. Select a Linux host that has enough disk storage.
The server must have internet access to download the Docker-in-Docker (dind) image used to start the LSF simulation Docker container.
If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment.
The directory where you untar the package becomes the working directory; make sure it has at least 100 GB of free space.
Note: the directory must be an absolute path and cannot include symbolic links.
The host memory should be at least 32 GB.
2. Log on to the selected Linux host as root and check that the virtual memory setting vm.max_map_count is at least 262144:
# sysctl -n vm.max_map_count
3. Check that the Docker engine is installed and running, and that the following ports are not occupied:
5050, 20022, 2222, 9200
4. Add the selected user (for example, lsfadmin) to the docker user group:
# usermod -aG docker lsfadmin
5. Log in again to the same host as the selected user (lsfadmin) and untar the LSF Cognitive package, lsf_cognitive_v1.tar.gz:
$ tar -zxvf lsf_cognitive_v1.tar.gz
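The host checks in installation steps 1 to 3 can be scripted as a preflight. This is a sketch, not a product tool: the thresholds (262144 map areas, 100 GB free, ports 5050, 20022, 2222 and 9200) come from the steps above, the function names are hypothetical, and the port check assumes the ss utility from iproute2 is installed.

```shell
# Hypothetical preflight helpers for installation steps 1-3 (not bcogn commands).

# Step 2: vm.max_map_count must be at least 262144. An explicit value can be
# passed for testing; otherwise the live kernel setting is read.
check_max_map_count() {
  local value="${1:-$(sysctl -n vm.max_map_count)}"
  [ "$value" -ge 262144 ]
}

# Step 1: the working directory must be an absolute path with no symbolic
# links in it, so resolving it must give back the same path.
check_workdir_path() {
  local dir="${1%/}"
  case "$dir" in
    /*) ;;            # absolute path: keep checking
    *) return 1 ;;    # relative path: rejected
  esac
  [ "$(readlink -f "$dir")" = "$dir" ]
}

# Step 1: at least MIN_GB gigabytes (default 100) free on the file system
# that holds the working directory. df -Pk reports available space in KB.
check_free_space() {
  local dir="$1" min_gb="${2:-100}" avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}

# Step 3: none of the service ports may already have a TCP listener.
check_ports_free() {
  local port busy=0
  for port in 5050 20022 2222 9200; do
    # Column 4 of "ss -ltn" is the local address:port of each listener.
    if ss -ltn | awk -v p=":$port" '$4 ~ (p "$") { found = 1 } END { exit !found }'; then
      echo "port $port is already in use" >&2
      busy=1
    fi
  done
  return "$busy"
}
```

For example, if check_max_map_count fails, the setting can be raised as root with sysctl -w vm.max_map_count=262144; check_workdir_path "$PWD" and check_free_space "$PWD" can be run from the directory where the package will be extracted.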
3. Manage LSF Cognitive service
As the selected user (lsfadmin) or root, start the service:
1. $ cd lsf_cognitive_v1/
2. $ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"
The -v option is optional; it is required only when the user loads the LSF configuration directly in the WebGUI. It mounts an additional volume (directory) from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER).
Both paths in the pair must be the same. The service also provides default volume mountings, which users can use as a bridge to transfer files into the service container.
The following volume mountings are provided by default, where $PACKAGE_TOP is the full path of the directory created when the original package is extracted, for example /opt/test/lsf_cognitive_v1:
Volume Path Outside Container    Volume Path Inside Container
$PACKAGE_TOP                     $PACKAGE_TOP
$PACKAGE_TOP/work                /opt/ibm/prediction/work
$PACKAGE_TOP/logs                /opt/ibm/prediction/logs
$PACKAGE_TOP/config              /opt/ibm/prediction/config
Note: if the LSF production cluster is installed with LSF Suite, it might have created a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import the LSF configuration from this production cluster directly, map both directories when starting the service. For example:
$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
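The -v argument in the example above pairs each directory with itself and separates the pairs with ";". The hypothetical helper below builds such a string from a list of directories; the ";" separator is taken from that example, and the function name build_volume_arg is illustrative only, not part of bcogn.

```shell
# Hypothetical helper: build a bcogn -v argument that mounts each directory
# at the same path inside the container, joining the pairs with ";".
build_volume_arg() {
  local dir arg=""
  for dir in "$@"; do
    # ${arg:+;} adds the separator only after the first pair.
    arg="$arg${arg:+;}$dir:$dir"
  done
  printf '%s\n' "$arg"
}
```

For example, ./bcogn start -v "$(build_volume_arg /opt/ibm/lsfsuite /sharedir/lsf)" expands to the same command as the LSF Suite example. Keeping the whole argument in double quotes matters, because an unquoted ";" would end the shell command.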
As the selected user (lsfadmin) or root, stop the service:
$ ./bcogn stop
Show the current service status:
$ ./bcogn status
SERVICE            STATUS    ID              PORT
lsf_cognitive:v1   STARTED   e370e9723742    5050, 9200, 20022
docker:dind        STARTED   c420d5ad63e2    2375, 2376
webssh             STARTED   22651           2222
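When the service is started from a script, the status output can be checked automatically. This sketch assumes the layout shown above (a header line, then one line per service with STATUS in the second column); any status other than STARTED is treated as a failure. The function name check_status is hypothetical.

```shell
# Hypothetical helper: read "bcogn status" output on stdin, report every
# service whose STATUS column is not STARTED, and return non-zero if any is.
check_status() {
  awk 'NR > 1 && NF > 0 && $2 != "STARTED" { print "not running: " $1; bad = 1 }
       END { exit bad + 0 }'
}
```

For example: ./bcogn status | check_status || echo "some services are not running"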
Upgrade the current service to a new sub-version, or to a new package of the same version:
1). Go to the parent directory of the current installation, then stop the service:
$ cd xxx
$ ./lsf_cognitive_v1/bcogn stop
2). Untar the new package to overwrite the old package:
$ tar -zxvf lsf_cognitive_v1.tar.gz
3). Upgrade and start the services with the -u option:
$ ./lsf_cognitive_v1/bcogn start -u
Or upgrade without starting:
$ ./lsf_cognitive_v1/bcogn upgrade
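The three upgrade steps can be wrapped in one function that stops at the first failure. This is a sketch: the parent directory (xxx above) and the path to the new tarball are passed as arguments, and the function name upgrade_cognitive is hypothetical.

```shell
# Hypothetical wrapper for upgrade steps 1)-3). The body runs in a subshell
# ( ... ) so the cd does not change the caller's working directory.
upgrade_cognitive() (
  parent_dir="$1"
  # Resolve the tarball path before changing directory, so relative paths work.
  new_package=$(readlink -f "$2") || exit 1
  cd "$parent_dir" || exit 1
  # 1) Stop the running service.
  ./lsf_cognitive_v1/bcogn stop || exit 1
  # 2) Untar the new package over the old one.
  tar -zxvf "$new_package" || exit 1
  # 3) Upgrade and start the services.
  ./lsf_cognitive_v1/bcogn start -u
)
```

To upgrade without starting, replace the last command with ./lsf_cognitive_v1/bcogn upgrade.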
Uninstall the package:
The following command stops the services and removes the Docker images lsf_cognitive:v1 and lsf_simulator:v1 from the image repository:
$ ./lsf_cognitive_v1/bcogn stop -f
4. Log on to the LSF Cognitive service
Prepare the browser: import the certificate config/https/cacert_lsf.pem into the browser as a Trusted Root Certificate.
From your desktop browser, log on to the service using the URL https://<SERVICE_HOST>:5050/
Username: Admin
Password: Admin
You must change this default password at the first logon.
5. Use the LSF Cognitive service
Basic concepts:
Experiment: an LSF simulation run using a selected LSF configuration and workload snapshot.
LSF Configuration: a full set of LSF cluster configuration and workload policies.
Workload Snapshot: a set of job submission and completion records imported from the LSF cluster event files (lsb.events*).
Major workflow:
To analyze and tune the LSF configuration, import the LSF configuration and a workload snapshot from the production cluster into the LSF Cognitive service, then run an experiment with them. Users can also modify the LSF configuration and the workload snapshot, rerun the experiment, and compare the experiment results.
6. Data and logs
All data related to experiments, LSF configurations, and workload snapshots is saved in the directory lsf_cognitive_v1/work and persists across service restarts. Users can back up the data to another safe location at any time.
Service daemon logs are saved in the directory lsf_cognitive_v1/logs.
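A backup of the work directory can be as simple as a timestamped tarball. This is a minimal sketch; PACKAGE_TOP and the destination directory are arguments you supply, and the function name backup_work and the archive naming scheme are hypothetical.

```shell
# Hypothetical helper: archive PACKAGE_TOP/work into DEST_DIR as a
# timestamped .tar.gz and print the archive path on success.
backup_work() {
  local package_top="$1" dest_dir="$2" archive
  archive="$dest_dir/lsf_cognitive_work-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C makes the archive contain paths relative to PACKAGE_TOP (work/...).
  tar -C "$package_top" -zcf "$archive" work || return 1
  printf '%s\n' "$archive"
}
```

For example, backup_work /opt/test/lsf_cognitive_v1 /backup. If an experiment is writing to work/ while the archive is created, tar may report files that changed during reading; running the backup while the service is stopped avoids that.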
Limitations
· Dynamic hosts are not recognized: if the LSF cluster to be simulated contains dynamic hosts, the simulator data collection tool tries to convert them to static hosts in the collected data so that they show up as static hosts in the simulated cluster. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data and adds the dynamic hosts to the hostlist line, so that the simulator recognizes them as static hosts.
· If LSF Data Manager is enabled in the LSF production environment, experiments count 1024 CPUs for the data manager master host, which leads to incorrect results. Disable LSF Data Manager before loading the LSF configuration into the LSF Cognitive system.
· Dynamic resources reported by elim are converted to fixed resources when the LSF configuration is loaded.
· LSF Simulator 10.1 works only with a single LSF cluster; it does not support LSF multicluster mode.
· User groups defined at the queue level for fairshare are not imported when a workload snapshot is imported. User groups specified with bsub -G are imported properly.
Copyright and Trademark Information
©Copyright IBM Corporation 2022
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.