Sysplex situations

The sysplex situations are described below in alphabetical order. You can access the description of a specific situation by selecting its name in the Contents tab.

KM5_Model_Sysplex_DASD_Filter

KM5_Model_Sysplex_DASD_Filter uses the DASD Device Collection Filtering attributes Average Response Time and I/O Rate, but does not provide any preset values. This situation is intended as the basis for creating your own DASD filter. The formula is:

*VALUE DASD_Device_Collection_Filtering.Average_Response_Time *GT 0.0
*AND *VALUE DASD_Device_Collection_Filtering.I/O_Rate *GT 0.0

No DASD device data is collected unless a DASD filter situation is activated. To create a DASD filter, use the Situation editor to create a copy of this situation and specify values appropriate to your environment. You can further refine the filter criteria using other attributes in the DASD Device Collection Filtering attribute group. Set the situation to autostart, and distribute it to the *MVS_SYSPLEX managed system list.
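The way the two filter conditions combine can be sketched in Python. This is only an illustration of the formula's logic, not how the monitoring agent evaluates situations. The DasdSample type, its field names, and the thresholds 25.0 and 5.0 are hypothetical values chosen for the example; the model situation itself ships with 0.0 for both.

```python
from dataclasses import dataclass


@dataclass
class DasdSample:
    """Hypothetical record holding the two filter attributes for one device."""
    volser: str
    average_response_time: float
    io_rate: float


def passes_filter(sample, min_response_time=25.0, min_io_rate=5.0):
    """Mirror the model formula: both attributes must exceed their thresholds."""
    return (sample.average_response_time > min_response_time
            and sample.io_rate > min_io_rate)


samples = [
    DasdSample("VOL001", 40.0, 12.0),  # slow and busy: kept
    DasdSample("VOL002", 40.0, 0.5),   # slow but nearly idle: dropped
    DasdSample("VOL003", 3.0, 80.0),   # busy but fast: dropped
]
kept = [s.volser for s in samples if passes_filter(s)]
print(kept)
```

Because the conditions are joined with *AND, raising either threshold narrows the set of devices for which data is collected.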

KM5_No_Sysplex_DASD_Filter_Warn

KM5_No_Sysplex_DASD_Filter_Warn is raised when no data is being reported. This indicates either that no DASD filter situation is being used or that the filter being used is too strong and all devices are eliminated by the criteria. The formula is:

*IF *COUNT Sysplex_DASD_Group.Volume_Serial_Number *EQ 1
*AND *VALUE Sysplex_DASD_Group.Volume_Serial_Number *EQ $none$

This situation is set to run at startup and distributed to the *MVS_SYSPLEX managed system list.

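If there is no sysplex DASD filter in place and you want to start monitoring DASD information, add a sysplex DASD situation. You can use the product-provided situation KM5_Model_Sysplex_DASD_Filter as the basis for your filtering situation: create a copy of it in the Situation editor, specify values appropriate to your environment, set the copy to autostart, and distribute it to the *MVS_SYSPLEX managed system list.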

If there is a sysplex DASD filter in use, weaken the filtering criteria to allow some devices to be monitored.

KM5_Weak_Plex_DASD_Filter_Warn

KM5_Weak_Plex_DASD_Filter_Warn monitors for a high count of DASD devices, which indicates either that the DASD filtering criteria are not strong enough or that the threshold needs to be raised to reduce the number of reported devices. The provided threshold is 1000, but you can change the threshold to avoid false positive results. The formula is:

*COUNT Sysplex_DASD_Group.Volume_Serial_Number *GT 1000

This situation is auto-started and distributed to the *MVS_SYSPLEX managed system list.
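Taken together, KM5_No_Sysplex_DASD_Filter_Warn and KM5_Weak_Plex_DASD_Filter_Warn bracket the healthy range for a DASD filter. The following Python sketch classifies a list of reported volume serial numbers accordingly. It assumes, as the first formula suggests, that a single row with the literal volume serial $none$ is reported when nothing passes the filter; the function name and the returned labels are hypothetical.

```python
NO_DATA_MARKER = "$none$"      # placeholder volume serial used in the formula
WEAK_FILTER_THRESHOLD = 1000   # provided default; adjust for your environment


def filter_health(volsers, weak_threshold=WEAK_FILTER_THRESHOLD):
    """Classify the reported rows as 'no_data', 'weak_filter', or 'ok'."""
    if len(volsers) == 1 and volsers[0] == NO_DATA_MARKER:
        return "no_data"       # no filter active, or filter eliminates everything
    if len(volsers) > weak_threshold:
        return "weak_filter"   # too many devices pass the filter
    return "ok"


print(filter_health(["$none$"]))
print(filter_health(["VOL001", "VOL002"]))
```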

MVS_CFStruct_Status_Crit

MVS_CFStruct_Status_Crit monitors the status of coupling facility structures and raises a Critical alert when a structure has failed.

The formula is:

VALUE CF_Structures.Structure_Status EQ Failed

If this situation is raised, an immediate rebuild is necessary. If another coupling facility is available as a backup, it should be made the primary facility.

MVS_CFStruct_Status_Warn

MVS_CFStruct_Status_Warn monitors the status of coupling facility structures and raises a Warning alert when a structure is in "Rebuild" status.

The formula is:

VALUE CF_Structures.Structure_Status GT 0X14

If this situation is raised, a coupling facility structure is about to fail. A rebuild may soon be necessary. If another coupling facility is available as a backup, it should be made the primary facility.

MVS_CFStructStat_FalseLock_Crit

MVS_CFStructStat_FalseLock_Crit monitors the statistics related to coupling facility structures and raises a Critical alert when the false lock table contention count reaches a high level.

The formula is:

VALUE CF_Structures.False_Lock_Table_Entry_Contention_Count GT 100

False contention should be less than half of total lock contention or less than 0.5% of total lock requests. If this condition occurs frequently, the size of the lock structure should be increased.

MVS_CFStructStat_FalseLock_Warn

MVS_CFStructStat_FalseLock_Warn monitors the statistics related to coupling facility structures and raises a Warning alert when false contention for lock structures is approaching a high level.

The formula is:

VALUE CF_Structures.False_Lock_Table_Entry_Contention_Count GT 0 AND
VALUE CF_Structures.False_Lock_Table_Entry_Contention_Count LE 100

False contention should be less than half of total lock contention or less than 0.5% of total lock requests. If this condition occurs frequently, the size of the lock structure should be increased.
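The guideline above can be written as a small Python check. This is a sketch of one reading of the sentence, in which false contention is acceptable if it satisfies either limit; the function and parameter names are hypothetical, and the counts in the example are invented.

```python
def false_contention_ok(false_contention, total_contention, total_requests):
    """Return True if false contention is within either guideline limit."""
    under_half_of_contention = false_contention < 0.5 * total_contention
    under_half_percent_of_requests = false_contention < 0.005 * total_requests
    return under_half_of_contention or under_half_percent_of_requests


# 120 false contentions out of 200 contentions and 100,000 lock requests:
# over half of contention, but under 0.5% of requests (500), so acceptable.
print(false_contention_ok(120, 200, 100_000))

# Same contention counts with only 10,000 requests: 0.5% is 50, so not acceptable.
print(false_contention_ok(120, 200, 10_000))
```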

MVS_CFStructToMVS_Requests_Crit

MVS_CFStructToMVS_Requests_Crit monitors the number of requests that the coupling facility makes to a given z/OS system and raises a Critical alert when the coupling facility performance has deteriorated severely.

The formula is:

VALUE CF_Structure_to_MVS_System.Average_Queued_Request_Time GT 100 OR
VALUE CF_Structure_to_MVS_System.Request_Rates GT 1000

If this situation is raised, check the status of the coupling links and the performance data for the coupling facility machine or LPAR. A link may be down or a processor may be offline or malfunctioning. If the coupling facility is in an LPAR with shared processors, the overall CPC may be fully loaded, thus causing the LPAR weights to come into effect. If more LPs are assigned to the CF than its share of total weight can support, try varying an LP offline.

MVS_CFStructToMVS_Requests_Warn

MVS_CFStructToMVS_Requests_Warn monitors the number of requests that the coupling facility makes to a given z/OS system and raises a Warning alert when the coupling facility performance has deteriorated to the extent that requests may begin to time out or otherwise fail.

The formula is:

(VALUE CF_Structure_to_MVS_System.Average_Queued_Request_Time GT 20.0 AND
VALUE CF_Structure_to_MVS_System.Average_Queued_Request_Time LE 100.0) OR
(VALUE CF_Structure_to_MVS_System.Request_Rates GT 500.0 AND
VALUE CF_Structure_to_MVS_System.Request_Rates LE 1000.0) OR
VALUE CF_Structure_to_MVS_System.Requests_Converted GT 50

If this situation is raised, you may have exhausted the current capacity. Check the status of the coupling links and the performance data for the coupling facility machine or LPAR.

MVS_CFStructUsers_Connect_Crit

MVS_CFStructUsers_Connect_Crit monitors the connectivity status of coupling facility structure users and raises a Critical alert when a connection has failed and a rebuild is not working.

The formula is:

VALUE CF_Clients.Connection_Status EQ Failing

If another coupling facility is available as a backup, it should be made the primary facility. Check the links and the status of the coupling facility hardware.

MVS_CFStructUsers_Connect_Warn

MVS_CFStructUsers_Connect_Warn monitors the connectivity status of coupling facility structure users and raises a Warning alert when the connection status indicates a reconnection, or when there are any problem connections ("Failed" or "Failed Persistent").

The formula is:

VALUE CF_Clients.Connection_Status EQ ConnectedRebuild OR
VALUE CF_Clients.Connection_Problem_Flag GT 0

If this situation is raised, it indicates that coupling facility connections are failing and a rebuild is in effect. If another coupling facility is available as a backup, it should be made the primary facility. Check the links and the status of the coupling facility hardware.

MVS_CFSystems_Performance_Crit

MVS_CFSystems_Performance_Crit monitors the performance of coupling facility systems and raises a Critical alert when CPU utilization exceeds 95%, an indicator shows coupling facility system status as "Failed", or the I/Os per second exceed 5000.

The formula is:

VALUE CF_Systems.CPU_Percent GT 95 OR
VALUE CF_Systems.Status EQ Failed OR
VALUE CF_Systems.I/Os_Per_Second GT 5000

If this situation is raised, it may indicate that the coupling facility is in an overload condition. Check the performance data for the coupling facility and, if it is unacceptable, determine if some large structures can be replicated in another facility to reduce the data rate. The long-term solution may require an increase in coupling facility capacity or a technology upgrade to faster coupling links.

MVS_CFSystems_Performance_Warn

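MVS_CFSystems_Performance_Warn monitors the performance of coupling facility systems and raises a Warning alert when CPU utilization is between 80% and 95%, the coupling facility system status is "Reconcile", the I/Os per second are between 2000 and 5000, or the count of structures that are out of policy is greater than 0.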

The formula is:

(((VALUE CF_Systems.CPU_Percent GT 80 AND
VALUE CF_Systems.CPU_Percent LE 95) OR
VALUE CF_Systems.Status EQ Reconcile) OR
(VALUE CF_Systems.I/Os_Per_Second GT 2000 AND
VALUE CF_Systems.I/Os_Per_Second LE 5000)) OR
VALUE CF_Systems.Structure_Count_Out_Policy GT 0

If this situation is raised, it may indicate that the coupling facility is in an overload condition. Check the performance data for the coupling facility and, if it is unacceptable, determine if some large structures can be replicated in another facility to reduce the data rate. The long-term solution may require an increase in coupling facility capacity or a technology upgrade to faster coupling links.
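The relationship between the MVS_CFSystems_Performance_Crit and MVS_CFSystems_Performance_Warn formulas can be sketched as a single Python function. In the product the two situations are evaluated independently; combining them and letting Critical take precedence is a simplification made for this example. The function name, parameter names, and the "Active" status value are hypothetical.

```python
def cf_system_severity(cpu_percent, status, ios_per_second, out_of_policy_count=0):
    """Return 'Critical', 'Warning', or None using the two formulas' thresholds."""
    # Conditions from MVS_CFSystems_Performance_Crit
    if cpu_percent > 95 or status == "Failed" or ios_per_second > 5000:
        return "Critical"
    # Conditions from MVS_CFSystems_Performance_Warn
    if (80 < cpu_percent <= 95
            or status == "Reconcile"
            or 2000 < ios_per_second <= 5000
            or out_of_policy_count > 0):
        return "Warning"
    return None


print(cf_system_severity(97, "Active", 1200))
print(cf_system_severity(85, "Active", 1200))
print(cf_system_severity(40, "Active", 1200))
```

The Warning bands end exactly where the Critical thresholds begin (95% CPU, 5000 I/Os per second), so a given value falls into at most one band.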

MVS_CFSystemPaths_Busy_Crit

MVS_CFSystemPaths_Busy_Crit monitors the number of times a system path was busy when a request was made and raises a Critical alert when a path busy condition occurs greater than 95% of the time.

The formula is:

VALUE CF_Path.Contention_Percent GT 95

Severe performance degradation will continue until action is taken to reduce the traffic or increase the data bandwidth to the coupling facility. Traffic may be reduced by adding coupling facilities; bandwidth may be increased by using more links or faster links. See the IBM Redbook OS/390 Parallel Sysplex Configuration Volume 2: Cookbook, publication number SG24-5638, for details and further recommendations.

MVS_CFSystemPaths_Busy_Warn

MVS_CFSystemPaths_Busy_Warn monitors the number of times a system path was busy when a request was made and raises a Warning alert when a path busy condition occurs between 60% and 95% of the time, or a path is not operational.

The formula is:

(VALUE CF_Path.Contention_Percent GT 60 AND
VALUE CF_Path.Contention_Percent LE 95) OR
VALUE CF_Path.Status EQ NotOperational

This situation is raised when coupling facility path contention is high, or a link is down. Performance degradation may continue until action is taken to reduce the traffic or increase the data bandwidth to the coupling facility. Traffic may be reduced by adding coupling facilities; bandwidth may be increased by using more links or faster links. See the IBM Redbook OS/390 Parallel Sysplex Configuration Volume 2: Cookbook, publication number SG24-5638, for details and further recommendations.

MVS_GRS_RespTime_Crit

MVS_GRS_RespTime_Crit monitors the response time in the global resource serialization (GRS) complex and raises a Critical alert when response time exceeds the specified threshold, 25 milliseconds by default.

The formula is:

VALUE GRS_Ring.Response_Time GT 25

If no hardware problem is indicated, this situation indicates that there is probably a bottleneck in one of the connected systems, causing the GRS token to become stalled. If the sysplex is a parallel sysplex, this kind of problem can be eliminated by changing to a GRS Star configuration, using the coupling facility to hold and exchange GRS data across all systems in the sysplex.

MVS_GRS_RespTime_Warn

MVS_GRS_RespTime_Warn monitors the response time in the global resource serialization (GRS) complex and raises a Warning alert when response time is high or one of the systems in the ring is inactive.

The formula is:

(VALUE GRS_Ring.Response_Time GT 15 AND
VALUE GRS_Ring.Response_Time LE 25) OR
VALUE GRS_Ring.Status EQ Inactive

If this situation is raised, use operator commands to display GRS status and attempt to restart the GRS ring. If no hardware problem is indicated, there is probably a bottleneck in one of the connected systems, causing the GRS token to become stalled. If the sysplex is a parallel sysplex, this kind of problem can be eliminated by changing to a GRS Star configuration, using the coupling facility to hold and exchange GRS data across all systems in the sysplex.

MVS_XCFGroupMembers_Status_Crit

MVS_XCFGroupMembers_Status_Crit monitors the status of the members of cross-system coupling facility (XCF) groups and raises a Critical alert when the XCF connection to one or more systems has been lost.

The formula is:

VALUE XCF_Member.Status EQ Missing

If this situation is raised, check that the systems are up, then use operator commands to try to restart the connections.

MVS_XCFGroupMembers_Status_Warn

MVS_XCFGroupMembers_Status_Warn monitors the status of the members of cross-system coupling facility (XCF) groups and raises a Warning alert when an XCF connection to one or more systems has failed.

The formula is:

VALUE XCF_Member.Status EQ Failed

If this situation is raised, use operator commands to try to restart the connections.

MVS_XCFSystemPaths_Crit

MVS_XCFSystemPaths_Crit monitors the number of times a cross-system coupling facility (XCF) path was busy when a request was made and raises a Critical alert when that count exceeds 95% of the defined limit.

The formula is:

VALUE XCF_Path.Retry_Percent GT 95

If this situation is raised, an XCF path is experiencing a very high number of retries and failure may occur soon. If another path is available, it should be brought online.

MVS_XCFSystemPaths_Warn

MVS_XCFSystemPaths_Warn monitors the number of times a cross-system coupling facility (XCF) path was busy when a request was made and raises a Warning alert when the retry percent is between 80% and 95%, or the path's status is "Failed", "Rebuilding", or "Quiescing".

The formula is:

(VALUE XCF_Path.Retry_Percent GT 80 AND
VALUE XCF_Path.Retry_Percent LE 95) OR
VALUE XCF_Path.Status GT 0X07

If this situation is raised, an XCF path is experiencing a high number of retries and failure may occur soon. If another path is available, it should be brought online.

OS_CMD_CF_Systems_Perform_Crit

OS_CMD_CF_Systems_Perform_Crit monitors the performance of a coupling facility system and issues a Critical alert when a coupling facility is experiencing a very high activity rate or has failed. The situation sends a message to designated TSO user IDs when it is true.

The formula is:

VALUE CF_Systems.CPU_Percent GT 95 OR
VALUE CF_Systems.Status EQ Failed OR
VALUE CF_Systems.I/Os_Per_Second GT 5000

If this situation is raised, use the details view to determine which problem is occurring. If the status is "Failed", switch to the secondary coupling facility, then restart and rebuild the failed one. If the request rate or CPU usage is high, bring the situation to the attention of a system programmer. Reconfiguration or load balancing may be necessary.

OS_CMD_DASD_Device_ContIdx_Warn

OS_CMD_DASD_Device_ContIdx_Warn monitors the contention index of a DASD device and issues a Warning alert when the contention index exceeds the specified threshold (0.5 by default). It sends a message to designated TSO user IDs indicating that this situation is true.

The formula is:

VALUE Sysplex_DASD.Average_Device_Contention_Index GT .500

If this situation is raised, give this information to a system programmer or DASD specialist. If the device is on a cached storage subsystem with RAID, this may not be much of a problem. If that is the case, the threshold should be adjusted to an appropriate value for the subsystem.

OS_CMD_WLM_Performance_Idx_Crit

OS_CMD_WLM_Performance_Idx_Crit monitors the workload manager performance index of a service class period and issues a Critical alert when the performance index exceeds the specified threshold (1.50 by default). It sends a message to designated TSO user IDs indicating that this situation is true.

The formula is:

VALUE Sysplex_WLM_Service_Class_Period.Performance_Index GT 1

This situation is raised when a service class is failing to meet its goal. If this situation persists, the person responsible for the Workload Manager service definition should determine if the missed goal needs to be changed. Bottleneck analysis of the service class should provide useful information.

Sysplex_DASD_Dev_ContIndx_Warn

Sysplex_DASD_Dev_ContIndx_Warn monitors contention for a DASD device and issues a Warning alert when the index is greater than the specified threshold (0.5 by default).

The formula is:

VALUE Sysplex_DASD.Average_Device_Contention_Index GT .500

If this situation is raised, notify a system programmer or DASD specialist. If the device is on a cached storage subsystem with RAID, this may not be much of a problem. If that is the case, the threshold should be adjusted to an appropriate value for the subsystem.

Sysplex_DASDSys_VaryStatus_Warn

Sysplex_DASDSys_VaryStatus_Warn monitors DASD devices and issues a Warning alert if a device vary status is "Boxed" or "Not Ready".

The formula is:

VALUE Sysplex_DASD_Device.Vary_Status EQ 0X04 OR
VALUE Sysplex_DASD_Device.Vary_Status EQ 0X05

This situation is raised when a DASD volume is in a "Boxed" or "Not Ready" status and is not available. If needed data sets are on the device and no backup devices are available, a system or subsystem failure may occur. If you cannot use the VARY command to return the device to online status, there may be a hardware malfunction and a Severity 1 or 2 hardware incident should be opened with the hardware vendor.

Sysplex_GlobalEnq_Wait_Crit

Sysplex_GlobalEnq_Wait_Crit monitors the wait time on a global enqueue and issues a Critical alert when either the wait time or maximum wait time exceeds 60 seconds.

The formula is:

VALUE Global_Enqueues.Maximum_Wait_Time GT 60 OR
VALUE Global_Enqueues.Wait_Time GT 60

This situation is raised when enqueue delay is high and has remained so for over a minute. Check the details to determine who is holding the ENQ. If it is a batch job that can be canceled and requeued, the deadlock can be broken by doing so. If it is a started task or online user, a system programmer should be called.

Sysplex_GlobalEnq_Wait_Warn

Sysplex_GlobalEnq_Wait_Warn monitors the wait time on a global enqueue and issues a Warning alert when either the wait time or maximum wait time falls between 30 and 60 seconds.

The formula is:

(VALUE Global_Enqueues.Maximum_Wait_Time GT 30 AND
VALUE Global_Enqueues.Maximum_Wait_Time LE 60) OR
(VALUE Global_Enqueues.Wait_Time GT 30 AND
VALUE Global_Enqueues.Wait_Time LE 60)

This situation is raised when enqueue delay is high. Check the details to determine who is holding the ENQ. If it is a batch job that can be canceled and requeued, the deadlock can be broken by doing so. If it is a started task or online user, a system programmer should be called.

Sysplex_Workloads_PerfIdx_Crit

Sysplex_Workloads_PerfIdx_Crit monitors the performance index related to workload on the sysplex and issues a Critical alert when the performance index of any service class period exceeds the specified threshold (1.50 by default).

The formula is:

VALUE Sysplex_WLM_Service_Class_Period.Performance_Index GT 1.50

This situation is raised when a service class is failing to meet its goal. If this situation persists, the person responsible for the Workload Manager service definition should determine if the missed goal needs to be changed. Bottleneck analysis of the service class should provide useful information.

Sysplex_Workloads_PerfIdx_Warn

Sysplex_Workloads_PerfIdx_Warn monitors the performance index related to workload on the sysplex and issues a Warning alert when the performance index of any service class period falls between 1.20 and 1.50.

The formula is:

VALUE Sysplex_WLM_Service_Class_Period.Performance_Index GT 1.20 AND
VALUE Sysplex_WLM_Service_Class_Period.Performance_Index LE 1.50

This situation is raised when a service class is failing to meet its goal. If this situation persists, the person responsible for the Workload Manager service definition should determine if the missed goal needs to be changed. Bottleneck analysis of the service class should provide useful information.
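Several of the situations in this section come in Warn/Crit pairs that split a single attribute into bands, with the Warning band ending where the Critical threshold begins. The Python sketch below collects those threshold pairs from the formulas in this section into a table and classifies a value against them. The table layout and the function name are hypothetical, and the product evaluates each situation separately rather than through a shared table like this.

```python
# (warning_threshold, critical_threshold) pairs taken from the formulas above
THRESHOLDS = {
    "CF_Path.Contention_Percent": (60, 95),
    "GRS_Ring.Response_Time": (15, 25),
    "XCF_Path.Retry_Percent": (80, 95),
    "Global_Enqueues.Wait_Time": (30, 60),
    "Sysplex_WLM_Service_Class_Period.Performance_Index": (1.20, 1.50),
}


def severity(attribute, value):
    """Return 'Critical', 'Warning', or None for a single attribute value."""
    warn, crit = THRESHOLDS[attribute]
    if value > crit:       # Crit formulas use GT on the upper threshold
        return "Critical"
    if value > warn:       # Warn formulas use GT lower AND LE upper
        return "Warning"
    return None


print(severity("GRS_Ring.Response_Time", 25))
print(severity("GRS_Ring.Response_Time", 26))
print(severity("Sysplex_WLM_Service_Class_Period.Performance_Index", 1.1))
```

A value exactly equal to the upper threshold stays in the Warning band, because the Warn formulas use LE and the Crit formulas use GT.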

Sysplex_XCFGroups_Warn

Sysplex_XCFGroups_Warn monitors cross-system coupling facility (XCF) groups and issues a Warning alert when the problem count (members with a status of "Failed" or "Missing") is greater than 0.

The formula is:

VALUE XCF_Group.Problem_Count GT 0

This situation is raised when there is a problem in XCF. A system programmer should be contacted to research and correct the problem.

Sysplex_XCFSystems_Status_Crit

Sysplex_XCFSystems_Status_Crit monitors the status of a cross-system coupling facility (XCF) and issues a Critical alert when cross-system communication within the sysplex is disabled.

The formula is:

VALUE XCF_System.Status EQ Missing

A system programmer should be contacted to determine why XCF is down and to restore it as soon as possible.

Sysplex_XCFSystems_Status_Warn

Sysplex_XCFSystems_Status_Warn monitors the status of a cross-system coupling facility (XCF) and issues a Warning alert when a system's status shows it is being shut down.

The formula is:

VALUE XCF_System.Status EQ BeingRemoved

When this situation is raised, a sysplex communication failure can be expected imminently. System programmer help is required to determine why the shutdown is occurring and to restart the facility as soon as possible.