Readme file for IBM® Spectrum LSF Predictor 10.1 Fix Pack 13 (601187)
Abstract
Readme documentation for IBM Spectrum LSF Predictor 10.1 Fix Pack 13 including installation-related instructions, prerequisites, and limitations.
Use LSF Predictor to create and train artificial intelligence (AI) models that predict LSF job memory and run time. It allows administrators to test the AI models, compare the results, and deploy the models to LSF production environments.
Description
Readme documentation for IBM Spectrum LSF Predictor V10.1 Fix Pack 13, including installation instructions, prerequisites and co-requisites, and a list of new features.
Readme file for: IBM® Spectrum LSF Predictor
Product/Component Release: 10.1 Fix Pack 13
Publication date: 24 June 2022
Last modified date: 24 June 2022
What’s new
· Support AutoAI in IBM Cloud Pak for Data 4.0 running in a public cloud environment.
· Upgrade the ibm-watson-machine-learning API from 1.0.79 to 1.0.175.
· Force a password change for first-time logon.
· Upgrade the Elasticsearch version from 7.10.2 to 7.16.3 with JDK 17.0.3.1 to avoid Log4j and Java security vulnerabilities.
· Upgrade Python from 3.7.9 to 3.7.11 to avoid security vulnerabilities.
· Add a utilization graph of cluster-level memory and slot usage to experiment results.
· Handle symbolic links (symlinks) when loading LSF configurations.
Installation
1. System Requirements
RHEL x86_64 version 7.2 or later, or CentOS x86_64 version 7.2 or later
Ubuntu x86_64 version 18.04
Docker-ce version 19.03.* or later, installed and running
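As a quick sanity check before installing, you can confirm from a shell whether a Docker engine is already present (a minimal sketch; `docker` is not on the PATH if Docker-ce has not been installed yet):

```shell
# Check whether a Docker engine is available and report its version.
if command -v docker >/dev/null 2>&1; then
    docker_ok=yes
    docker --version           # expect version 19.03.* or later
else
    docker_ok=no
    echo "docker not found; install docker-ce 19.03 or later"
fi
echo "docker_ok=$docker_ok"
```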
2. Installation
1. Select a Linux host that has enough disk storage. The server must have internet access to download the Docker-in-Docker (dind) image for starting the LSF simulation Docker container. If you want to load the LSF configuration directly, the host must be a client or a server in the LSF production environment. The directory used to extract the package is the working directory. Make sure you have at least 100 GB of free space and at least 32 GB of host memory.
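The disk and memory guidance above can be checked with standard GNU/Linux tools (a hedged sketch; run it from the intended working directory):

```shell
# Report free disk space in the current directory and total host memory.
need_disk_gb=100
need_mem_gb=32
free_disk_gb=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')
total_mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)
echo "free disk: ${free_disk_gb}G (recommended: at least ${need_disk_gb}G)"
echo "total memory: ${total_mem_gb}G (recommended: at least ${need_mem_gb}G)"
```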
2. Log on to the selected Linux host as root. Check that the maximum virtual memory setting is larger than 262144:
# sysctl -n vm.max_map_count
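If the reported value is below 262144, it can be raised as follows (a sketch; the write requires root, and the /etc/sysctl.conf line makes the change persist across reboots):

```shell
# Read the current setting and print the commands to raise it if needed.
required=262144
current=$(sysctl -n vm.max_map_count 2>/dev/null || cat /proc/sys/vm/max_map_count)
if [ "${current:-0}" -lt "$required" ]; then
    echo "vm.max_map_count=$current is too low; as root, run:"
    echo "  sysctl -w vm.max_map_count=$required"
    echo "  echo 'vm.max_map_count=$required' >> /etc/sysctl.conf   # persist across reboots"
else
    echo "vm.max_map_count=$current is sufficient"
fi
```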
3. Check that the Docker engine is installed and running, and that the following ports are not occupied:
5050, 20022, 2222, 9200
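One way to confirm that those ports are free (a sketch using `ss` from iproute2; `netstat -tln` works similarly on older systems):

```shell
# List any of the required ports that already have a listener.
required_ports="5050 20022 2222 9200"
busy=""
for p in $required_ports; do
    # ss -ltn prints listening TCP sockets; the local address ends in ":<port>"
    if ss -ltn 2>/dev/null | grep -q ":$p "; then
        busy="$busy $p"
    fi
done
if [ -z "$busy" ]; then
    echo "all required ports are free"
else
    echo "ports already in use:$busy"
fi
```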
4. Add the selected user, for example lsfadmin, to the Docker user group:
# usermod -aG docker lsfadmin
5. Log in again to the same host as the selected user (lsfadmin) and extract the LSF Cognitive package, lsf_cognitive_predictor_v1.tar.gz:
$ tar -zxvf lsf_cognitive_predictor_v1.tar.gz
3. Prepare IBM Cloud Pak for Data AutoAI on public cloud
The AutoAI service in IBM Cloud Pak for Data is provided by Watson Studio. You need to prepare the environment before LSF Predictor can use it to train on LSF job data.
1. Set up an IBM Cloud Pak for Data AutoAI account on public cloud by completing the following steps:
https://github.com/IBM/predict-insurance-charges-with-autoai#step-3-create-ibm-cloud-services
2. Create a Watson Studio project in the Watson Studio web console. Go to the main menu, select View all projects > New Project +, and select Create an empty project.
3. Create a deployment space and record the space_id. Go to the main menu, select View all spaces > New deployment space +. Fill in the fields, then click Create. In the list of spaces, click the new deployment space name to see the space details. Copy and save the space GUID as input for the space ID on the LSF Predictor settings page.
4. Generate an IBM Cloud API key (apikey). The apikey is the only credential used to log on to IBM Cloud Pak for Data AutoAI on public cloud. For more information about generating the apikey, see https://github.com/IBM/predict-insurance-charges-with-autoai#71-get-ibm-cloud-api-key.
4. Manage the LSF Predictor service
1. As the selected user (lsfadmin) or root, start the service:
$ cd lsf_cognitive_v1/
$ ./bcogn start -v "<LSF_TOP_OUTSIDE_CONTAINER>:<LSF_TOP_INSIDE_CONTAINER>"
The -v option is optional. It is required only when you load the LSF configuration directly in the web GUI. The -v option mounts an additional volume or directory from the physical host (LSF_TOP_OUTSIDE_CONTAINER) into the Docker container (LSF_TOP_INSIDE_CONTAINER). You must use the same volume path for both sides of the pair. The service provides default file volume mountings, which you can use as a bridge to transfer files to the service container.
The following volume mountings are provided by default, where $PACKAGE_TOP is the full directory that is created when you extract the original package (for example, /opt/test/lsf_cognitive_v1):
Volume path outside container      Volume path inside container
$PACKAGE_TOP                       $PACKAGE_TOP
$PACKAGE_TOP/work                  /opt/ibm/prediction/work
$PACKAGE_TOP/logs                  /opt/ibm/prediction/logs
$PACKAGE_TOP/config                /opt/ibm/prediction/config
Note: If the LSF production cluster is installed with LSF Suite, it might create a symbolic link from /opt/ibm/lsfsuite/lsf to /sharedir/lsf. To import LSF configurations from this production cluster directly, you must map both directories when you start the service. For example:
$ ./bcogn start -v "/opt/ibm/lsfsuite:/opt/ibm/lsfsuite;/sharedir/lsf:/sharedir/lsf"
2. As the selected user (lsfadmin) or root, stop the service:
$ ./bcogn stop
To show the current service status:
$ ./bcogn status
SERVICE STATUS ID PORT
lsf_cognitive:v1 STARTED e370e9723742 5050, 9200, 20022
docker:dind STARTED c420d5ad63e2 2375,2376
webssh STARTED 22651 2222
3. Upgrade the current service to a new sub-version, or to a new package with the same version:
a. Go to the parent directory of the current installation, then stop the service:
$ cd xxx
$ ./lsf_cognitive_v1/bcogn stop
b. Extract the new package to overwrite the old package:
$ tar -zxvf lsf_cognitive_v1.tar.gz
c. Upgrade and start the services with the -u option:
$ ./lsf_cognitive_v1/bcogn start -u
Or upgrade without starting:
$ ./lsf_cognitive_v1/bcogn upgrade
4. Uninstall the package:
The following command stops the services and removes the Docker image lsf_cognitive:v1 from the image repository.
$ ./lsf_cognitive_v1/bcogn stop -f
5. Log on to the LSF Cognitive service
1. Prepare your browser by importing the certificate config/https/cacert_lsf.pem as a trusted root certificate.
2. From your web browser, log on to the service by using the following URL: https://<SERVICE_HOST>:5050/
Username: Admin
Password: Admin
6. Use the LSF Cognitive service
Basic concepts:
Experiment
An experiment is an LSF simulation run that uses a selected LSF configuration and workload snapshot.
LSF configuration
An LSF configuration is a full set of LSF cluster configuration and workload policies.
Workload snapshot
A workload snapshot is a set of job submission and completion records that are imported from the LSF cluster events files (lsb.events*).
Prediction
A prediction is the process that generates a job resource prediction model, which includes collecting LSF job event data, cleaning the data, sending the data to IBM Cloud Pak for Data AutoAI for training, downloading the model, and so on.
Inference
An inference is a service that predicts a best value for a newly submitted LSF job request, based on a pre-trained model.
7. Data and logs
All data related to experiments, LSF configurations, workload snapshots, and prediction models is saved in the directory lsf_cognitive_v1/work, which persists across service restarts. You can back up the data to another directory at any time.
Service daemon logs are saved in the directory lsf_cognitive_v1/logs.
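For example, the work directory can be archived to a timestamped tarball before an upgrade (a minimal sketch; the PACKAGE_TOP and BACKUP_DIR paths are illustrative and should be adjusted to your installation):

```shell
# Archive the persistent work/ data directory to a timestamped tarball.
PACKAGE_TOP="${PACKAGE_TOP:-$PWD/lsf_cognitive_v1}"   # adjust to your install path
BACKUP_DIR="${BACKUP_DIR:-$PWD/backups}"
mkdir -p "$PACKAGE_TOP/work" "$BACKUP_DIR"            # work/ normally exists already
stamp=$(date +%Y%m%d-%H%M%S)
tar -czf "$BACKUP_DIR/work-$stamp.tar.gz" -C "$PACKAGE_TOP" work
echo "backed up work/ to $BACKUP_DIR/work-$stamp.tar.gz"
```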
Limitations
· Dynamic hosts are not recognized. If the LSF cluster to be simulated contains dynamic hosts, the simulator data collection tool tries to convert those dynamic hosts to static hosts in the collected data, so they show up as static hosts in the simulated cluster. The conversion adds those hosts to the lsf.cluster.<cluster_name> file in the conf directory of the collected data and adds the dynamic hosts to the hostlist line. The hosts are added as static hosts and are recognized by the simulator.
· If LSF Data Manager is enabled in the LSF production environment, experiments count 1024 CPUs for the data manager primary host, which leads to improper results. Disable LSF Data Manager before you load the LSF configuration into the LSF Cognitive system.
· Dynamic resources that are reported from elim are converted to fixed resources when you load the LSF configuration.
· LSF Predictor 10.1 Fix Pack 13 only works with a single LSF cluster and does not support LSF multi-cluster mode.
· User groups that are defined at the queue level as fairshare are not imported when you import a workload snapshot. User groups that are defined with bsub -G are imported properly.
Copyright and Trademark Information
©Copyright IBM Corporation 2022
U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
IBM®, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.