===============================================================================

Readme file for: IBM Platform LSF SNMP Integration

Product/Component Release: 9.1.3

Update name: Fix 466928

Publication date: 15 September 2017 

Last modified: 15 September 2017

===============================================================================


Using LSF with SNMP


=========================

CONTENTS

=========================

1. Overview

2. About the LSF SNMP Agent

3. LSF SNMP Agent for Linux

4. Generating SNMP Traps

5. Structure of the LSF MIB

6. Example Of Using A Simple Command Line SNMP Client

7. Copyright


=========================

1. Overview

=========================

This README document provides instructions for using the LSF SNMP (Simple Network Management Protocol) agent and configuring SNMP trap generation. It assumes that you are already familiar with SNMP concepts.


=========================

2. About the LSF SNMP Agent

=========================

-------------------------

2.1 Overview

-------------------------

To integrate with existing network and system management frameworks, LSF supports SNMP, an IETF (Internet Engineering Task Force) standard protocol used to monitor and manage devices and software on the network. A Management Information Base (MIB) specific to LSF has been defined.


-------------------------

2.2 LSF MIB

-------------------------

Any SNMP client, from command-line utilities to full network and system management frameworks, can monitor information provided by the LSF SNMP agent. 


-------------------------

2.3 SNMP Clients

-------------------------

The LSF SNMP agent is compatible with most SNMP version 1 clients. For information about a particular network or system management framework, refer to the documentation supplied by the vendor.


=========================

3. LSF SNMP Agent for Linux

=========================

-------------------------

3.1 Requirements

-------------------------

LSF version 9.1.3 must be installed before you can install the LSF SNMP agent.


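The snmpd binary expects to find the shared library libelf.so.1.


If snmpd fails to start because libelf.so.1 is missing, create a symbolic link named libelf.so.1 that points to the libelf library installed on your system. For example, if libelf-0.137.so is installed in /usr/lib64, run the following commands as root:

    cd /usr/lib64

    ln -s libelf-0.137.so libelf.so.1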


-------------------------

3.2 Distribution Files

-------------------------

The Linux SNMP distribution includes the following files:

  1. The SNMP agent (snmpd).
  2. An installation script (setup) that installs the SNMP agent and its support files into your current LSF installation.
  3. A script to start the SNMP agent (lsfsnmpd).
  4. The LSF MIB (LSF-AGENT-MIB.txt).
  5. Support MIB files for the agent (RFC-1215.txt, SNMPv2-SMI.txt, and SNMPv2-TC.txt).
  6. A configuration file (snmpd.conf).
  7. The SNMP trap program (snmpv1trap).

Although the files are distributed together, the SNMP agent and the SNMP trap program do not have to be installed together.


-------------------------

3.3 Installation

-------------------------

Log in as root and enter the command: ./setup


This script uses your lsf.conf file (in the directory specified by LSF_ENVDIR, or in /etc if LSF_ENVDIR is not set) to find the location of your LSF installation. When the script finishes, the files are located in the following directories:


- LSF_SERVERDIR

    snmpd

    lsfsnmpd

- LSF_CONFDIR/snmp

    snmpd.conf

    LSF-AGENT-MIB.txt

    SNMPv2-SMI.txt

    SNMPv2-TC.txt

    RFC-1215.txt
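After running ./setup, you might want to confirm that every file landed where the list above says it should. The following is a minimal sketch, not part of the LSF distribution: missing_snmp_files is a hypothetical helper, and it assumes you pass in the LSF_SERVERDIR and LSF_CONFDIR values from your own lsf.conf.

```python
import os

# Files that ./setup places in each directory, as listed above.
EXPECTED_FILES = {
    "LSF_SERVERDIR": ["snmpd", "lsfsnmpd"],
    "LSF_CONFDIR/snmp": [
        "snmpd.conf",
        "LSF-AGENT-MIB.txt",
        "SNMPv2-SMI.txt",
        "SNMPv2-TC.txt",
        "RFC-1215.txt",
    ],
}


def missing_snmp_files(serverdir, confdir):
    """Return the paths of expected LSF SNMP files that do not exist.

    Hypothetical helper: serverdir and confdir are the values of
    LSF_SERVERDIR and LSF_CONFDIR taken from your lsf.conf.
    """
    dirs = {
        "LSF_SERVERDIR": serverdir,
        "LSF_CONFDIR/snmp": os.path.join(confdir, "snmp"),
    }
    missing = []
    for key, names in EXPECTED_FILES.items():
        for name in names:
            path = os.path.join(dirs[key], name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

An empty list means all seven files were found; any returned path points at a file that setup did not install.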


-------------------------

3.4 Starting the LSF SNMP Agent

-------------------------


3.4.1 Overview

-------------------------

To simplify the startup process, the LSF SNMP binary file (snmpd) is accompanied by a script (lsfsnmpd). The LSF administrator can modify the parameters in the lsfsnmpd script, or run the snmpd binary file without using the script.


The lsfsnmpd script is customized to start the agent in a specific LSF environment. For example, it contains the location of the configuration file used by the agent (LSF_CONFDIR/snmp/snmpd.conf), and the log file that is created when the agent runs (LSF_LOGDIR/snmpd.log).


3.4.2 Synopsis 

-------------------------

snmpd [-f] [-L] [-c conf_file] [-l log_file] [-p port_number]


Options 


-f    

Do not fork from the calling shell.

-L  

Do not open a log file; write output to stdout and stderr instead.

-c conf_file

Read conf_file as a configuration file.

-l log_file

Log all output from the agent (including stdout and stderr) to log_file.

-p port_number

Listen on port port_number (default: port 161).
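If you run snmpd directly instead of through lsfsnmpd, the options above combine into a single command line. The sketch below shows one way a start-up wrapper might assemble it, using the same configuration file and log file locations that lsfsnmpd uses. build_snmpd_command and its parameters are hypothetical; only the -f, -L, -c, -l, and -p options come from the synopsis.

```python
import os


def build_snmpd_command(serverdir, confdir, logdir, port=None,
                        foreground=False, log_to_stdout=False):
    """Assemble an snmpd argument list from the documented options.

    Hypothetical helper: serverdir, confdir, and logdir stand for
    LSF_SERVERDIR, LSF_CONFDIR, and LSF_LOGDIR.
    """
    cmd = [os.path.join(serverdir, "snmpd")]
    if foreground:
        cmd.append("-f")  # do not fork from the calling shell
    if log_to_stdout:
        cmd.append("-L")  # no log file; write to stdout and stderr
    else:
        # Same log file that lsfsnmpd uses.
        cmd += ["-l", os.path.join(logdir, "snmpd.log")]
    cmd += ["-c", os.path.join(confdir, "snmp", "snmpd.conf")]
    if port is not None:
        cmd += ["-p", str(port)]  # without -p the agent listens on 161
    return cmd
```

The sketch treats -L and -l as alternatives, because -L tells the agent not to open a log file at all.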



-------------------------

3.5 Configuring the LSF SNMP Agent

-------------------------


3.5.1 Overview 

-------------------------


Configuration of the agent is optional.


The configuration file (LSF_CONFDIR/snmp/snmpd.conf) contains one directive per line. Lines beginning with the '#' character are treated as comments and are not parsed.
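The line format described above is simple enough to read with a short script, for example to audit snmpd.conf across clusters. This is a minimal sketch, not the agent's own parser: parse_snmpd_conf is hypothetical, and it assumes that blank lines are ignored and that a comment may be preceded by whitespace. It does not validate directive names.

```python
def parse_snmpd_conf(text):
    """Split snmpd.conf text into (directive, arguments) pairs.

    One directive per line; lines beginning with '#' are comments.
    Skipping blank lines and indented comments is an assumption.
    """
    directives = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # The first word is the directive; the rest are its arguments.
        parts = line.split(None, 1)
        args = parts[1] if len(parts) > 1 else ""
        directives.append((parts[0], args))
    return directives
```

For example, a file containing a comment followed by the line authtrapenable 1 yields [("authtrapenable", "1")].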


3.5.2 Directives in snmpd.conf

-------------------------

Directives which can be set in this file are:

Sets the system location for the agent in the system table of the MIB-II tree to string.

Sets the system contact for the agent in the system table of the MIB-II tree to string.

Sets the host to receive traps. For example, the agent sends a Cold Start trap when it starts up.


To enable multiple hosts to receive traps, add a new line for each additional host. The default value is null (no hosts receive traps).

Sets the community string in the trap PDU (Protocol Data Unit) to string.

Enables the sending of authentication failure traps (authtrapenable) when set to 1 (enable). The default value is 2 (disable).

Sets the community string kept in the given slot to string.


The agent has 5 slots available to keep community strings. Valid values for slot are from 1 to 5. SNMP PDUs sent to the agent should contain one of the communities in the 5 slots.


The default values for each slot are:


1 public

2 private

3 regional

4 proxy

5 core


If the LSF SNMP agent receives a PDU that does not contain a known community, it discards the request. If authtrapenable is also set to 1, the agent generates an authentication failure trap.
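The community rules above can be summarized as a small decision function. This is a model of the documented behavior written for illustration, not the agent's source code; check_community and set_community are hypothetical helpers.

```python
# Default community strings for slots 1 to 5, as listed above.
DEFAULT_COMMUNITIES = {1: "public", 2: "private", 3: "regional",
                       4: "proxy", 5: "core"}


def set_community(slots, slot, community):
    """Return a copy of slots with the given slot set to community."""
    if not 1 <= slot <= 5:
        raise ValueError("slot must be between 1 and 5")
    updated = dict(slots)
    updated[slot] = community
    return updated


def check_community(pdu_community, slots=DEFAULT_COMMUNITIES,
                    authtrapenable=2):
    """Return (accepted, send_auth_failure_trap) for an incoming PDU."""
    if pdu_community in slots.values():
        return True, False
    # Unknown community: the request is discarded, and a trap is sent
    # only when authtrapenable is 1 (enable); 2 (disable) is the default.
    return False, authtrapenable == 1
```

Note that replacing the string in a slot also stops the agent from accepting the old default value for that slot.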


=========================

4. Generating SNMP Traps

=========================

-------------------------

4.1 Overview

-------------------------

Users of network management software might require SNMP traps and notifications that correspond to LSF daemon operations. The following events are supported:

  1. LIM goes down (detected by the master LIM). This event can also be reported if LIM temporarily stops communicating with the master LIM.
  2. RES goes down (detected by the master LIM).
  3. sbatchd goes down (detected by mbatchd).
  4. A host becomes the new master host (detected by the master LIM).
  5. The master host stops being the master (detected by the master LIM).
  6. mbatchd comes up and is ready to schedule jobs (detected by mbatchd).
  7. mbatchd goes down (detected by mbatchd).
  8. mbatchd receives a reconfiguration request and is being reconfigured (detected by mbatchd).
  9. LSB_SHAREDIR becomes full (detected by mbatchd).

LSF provides a program which translates LSF events into SNMP traps. This program runs on the master host.


-------------------------

4.2 Requirements

-------------------------

LSF version 9.1.3 must be installed before you can install the SNMP trap program.


-------------------------

4.3 Distribution Files

-------------------------

Although the files are distributed together, the SNMP agent and the SNMP trap program do not have to be installed together. The SNMP trap program is snmpv1trap.

-------------------------

4.4 Set Up SNMP Traps

-------------------------

1 Copy the SNMP trap program (snmpv1trap) from the SNMP distribution to a directory on the master host, such as LSF_SERVERDIR.


2 In lsf.conf, set LSF_EVENT_PROGRAM and specify LSF's snmpv1trap program as the event program.


For example:


LSF_EVENT_PROGRAM=snmpv1trap


This example assumes that snmpv1trap is installed in LSF_SERVERDIR. Otherwise, specify the full path name.


3 In lsf.conf, set LSF_EVENT_RECEIVER and specify the trap location parameters of the network management software you use (the host to receive the trap, and the community string in the trap PDU).


Use the syntax:


LSF_EVENT_RECEIVER=host_name[:community]


If you do not specify community, the default is public.

For example:


LSF_EVENT_RECEIVER=hostA:private


Specifies the software that monitors traps on hostA in the private community.


LSF_EVENT_RECEIVER=hostB.domainA


Specifies the software that monitors traps on hostB.domainA in the public community.


4 Save the changes to lsf.conf.


5 Reconfigure the cluster with the commands lsadmin reconfig and badmin mbdrestart.


LSF checks for any configuration errors. If no unrecoverable errors are found, you are asked to confirm reconfiguration. If unrecoverable errors are found, reconfiguration exits.
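Before you reconfigure, a short script can read back the two lsf.conf settings from steps 2 and 3 to show where traps will go. This is a simplified sketch: parse_lsf_conf and event_settings are hypothetical helpers, and the parser only handles plain NAME=value lines (comments are skipped and surrounding double quotes removed; no other lsf.conf syntax is interpreted).

```python
import os


def parse_lsf_conf(text):
    """Collect NAME=value lines from lsf.conf text (simplified)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip().strip('"')
    return settings


def event_settings(settings, serverdir):
    """Return (program_path, receiver_host, community) for SNMP traps."""
    program = settings["LSF_EVENT_PROGRAM"]
    if os.sep not in program:
        # A bare name such as snmpv1trap is expected in LSF_SERVERDIR;
        # any other value must already be a full path name.
        program = os.path.join(serverdir, program)
    # LSF_EVENT_RECEIVER=host_name[:community]; community defaults to public.
    host, _, community = settings["LSF_EVENT_RECEIVER"].partition(":")
    return program, host, community or "public"
```

With the examples above, LSF_EVENT_RECEIVER=hostA:private resolves to host hostA and community private, and LSF_EVENT_RECEIVER=hostB.domainA falls back to the public community.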


=========================

5. Structure of the LSF MIB

=========================

-------------------------

5.1 Overview

-------------------------

The LSF MIB (LSF_CONFDIR/snmp/LSF-AGENT-MIB.txt) consists of several tables of information, organized into three groups: lsfHosts, lsfResources, and lsfBatch.

Note: The LSF MIBs import definitions from the standard MIBs. Modify the lsfsnmpd script to set MIBDIRS to include both $LSF_ENVDIR/snmp and your host's default MIB directory.
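The MIBDIRS value described in the note is a colon-separated list of directories. The sketch below builds that string; build_mibdirs is hypothetical, and /usr/share/snmp/mibs is only an assumed default MIB directory, so check where the SNMP tools on your host keep their standard MIBs.

```python
import os


def build_mibdirs(lsf_envdir, default_mib_dir="/usr/share/snmp/mibs"):
    """Return a MIBDIRS value covering LSF_ENVDIR/snmp and the default MIBs.

    default_mib_dir is an assumption; replace it with your host's path.
    """
    dirs = [os.path.join(lsf_envdir, "snmp"), default_mib_dir]
    # Colon-separated, as MIBDIRS is on UNIX hosts.
    return ":".join(dirs)
```

The resulting string is what you would assign to MIBDIRS in the lsfsnmpd script.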


-------------------------

5.2 The lsfHosts MIB Group

-------------------------

lsfStaticTable


Consists of one row for each LSF host, indexed by host IP address. Each row contains static host information, corresponding to the lshosts command.


lsfDynamicTable 


Consists of one row for each LSF server, indexed by host IP address. Each row contains dynamic host information corresponding to the lsload command, composed of built-in load indices and host status.


-------------------------

5.3 The lsfResources MIB Group

-------------------------


lsfNumericTable


Consists of several rows for each resource, one row for each LSF host using that resource, indexed by resource number (generated by the agent) and host IP address. Each row contains the name of a numeric shared resource or external index, a location (a host using the resource), and the resource value.


For shared resources, the resource value is the same for all the hosts that share the resource instance.


-------------------------

5.4 The lsfBatch MIB Group

-------------------------

lsbHostsTable 


Consists of one row for each LSF server, indexed by host IP address. Each row contains the host limits as well as the host counters.


lsbQueuesTable 


Consists of one row for each LSF queue, indexed by a number generated by the agent (corresponding to alphabetical order of queue names). Each row contains queue limits and queue counters.


lsbAllJobTable


Provides information on all jobs. 


lsbJobsTable 


Consists of one row for each running batch job, indexed by job ID (for performance reasons, only running jobs are displayed). Each row contains information such as queue, user and execution hosts, and job resource information.


lsbPendingJobTable


Provides information on pending and suspended jobs. This table includes the job ID, job name, job user, job queue, job status, submission host, execution host, submission time, memory usage, and virtual memory usage.
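The lsbQueuesTable index is described as a number generated by the agent that follows the alphabetical order of queue names. The sketch below models that idea so that you can predict which row a queue occupies. queue_indices is hypothetical, and two details are assumptions: that numbering starts at 1, and that names are compared by character code (so uppercase letters sort before lowercase), which may differ from the agent's ordering.

```python
def queue_indices(queue_names):
    """Map each queue name to an index in alphabetical order.

    Assumes numbering starts at 1 and uses Python's default string
    ordering; the agent's actual numbering may differ.
    """
    return {name: i for i, name in enumerate(sorted(queue_names), start=1)}
```

Because the index depends on the full set of queue names, adding or removing a queue can shift the index of other queues, so a monitoring script should match rows by queue name rather than by index.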


=========================

6. Example Of Using A Simple Command Line SNMP Client

=========================

Make sure that the LSF snmpd is installed and running as described above.


Set the following environment variables, replacing <LSF_ENVDIR> with the value of LSF_ENVDIR for your cluster:


export MIBFILES="<LSF_ENVDIR>/snmp/LSF-AGENT-MIB.txt"

export PREFIX=".iso.org.dod.internet.private.enterprises.platform.lsfAgent"


In the example below, <snmpd_host> is the name of the host where the LSF snmpd is running.


$ snmpwalk -c public -v 1 <snmpd_host> lsfBatch.lsbJobTable

LSF-AGENT-MIB::lsbJobId.1098.0 = INTEGER: 1098

LSF-AGENT-MIB::lsbJobName.1098.0 = STRING: sleep 9999

LSF-AGENT-MIB::lsbJobUser.1098.0 = STRING: pdetina

LSF-AGENT-MIB::lsbJobQueue.1098.0 = STRING: normal

LSF-AGENT-MIB::lsbJobStatus.1098.0 = INTEGER: running(3)

LSF-AGENT-MIB::lsbJobSubmissionHost.1098.0 = STRING: detina1.eng.platformlab.ibm.com

LSF-AGENT-MIB::lsbJobExecutionHost.1098.0 = STRING: detina1.eng.platformlab.ibm.com

LSF-AGENT-MIB::lsbJobSubmitTime.1098.0 = STRING: Wed Apr 24 14:36:10 2013

LSF-AGENT-MIB::lsbJobProcessGroupIds.1098.0 = STRING: 17602

LSF-AGENT-MIB::lsbJobProcessIds.1098.0 = STRING: 17602 17603 17605 17608

LSF-AGENT-MIB::lsbJobCpuUsage.1098.0 = STRING: 0

LSF-AGENT-MIB::lsbJobMemoryUsage.1098.0 = INTEGER: 5

LSF-AGENT-MIB::lsbJobVirtualMemoryUsage.1098.0 = INTEGER: 169
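To feed output like the example above into another tool, a script can group the rows by table index so that each job becomes one record. This is a minimal sketch, assuming the MODULE::column.index = TYPE: value line format shown above; parse_snmpwalk is a hypothetical helper. Some net-snmp versions print STRING values in double quotes, so the sketch removes them if present.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   LSF-AGENT-MIB::lsbJobId.1098.0 = INTEGER: 1098
LINE_RE = re.compile(
    r"^(?P<module>[\w-]+)::(?P<column>\w+)\.(?P<index>[\d.]+)"
    r" = (?P<type>[\w-]+): ?(?P<value>.*)$"
)


def parse_snmpwalk(lines):
    """Group snmpwalk output rows as {index: {column: value}}.

    Values are kept as strings; lines that do not match are ignored.
    """
    rows = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        value = m.group("value")
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]  # drop quotes around STRING values
        rows[m.group("index")][m.group("column")] = value
    return dict(rows)
```

For example, passing in the lines of the snmpwalk output above (captured with Python's subprocess module or saved to a file) gives a single record under index "1098.0" whose lsbJobUser entry is "pdetina".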


=========================

7. Copyright

=========================

©Copyright IBM Corporation 2017


U.S. Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.


IBM®, the IBM logo and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.