Component Outage On-Line (COOL) Measurement for the Cisco 12000 Series Router
This document describes the Cisco IOS Component Outage On-Line (COOL) Measurement feature, which provides autonomous, real-time outage monitoring and measurement within an individual networking device. COOL measurement allows network operators to monitor networking device component outages and collect raw outage data through the CISCO-OUTAGE-MONITOR-MIB. The management information base (MIB)-monitored components include hardware, software, local interfaces, and remote interfaces at next-hop network devices.
Feature History for the Component Outage On-Line Measurement Feature
Finding Support Information for Platforms and Cisco IOS Software Images
Use Cisco Feature Navigator to find information about platform support and Cisco IOS software image support. Access Cisco Feature Navigator at http://www.cisco.com/go/fn. You must have an account on Cisco.com. If you do not have an account or have forgotten your username or password, click Cancel at the login dialog box and follow the instructions that appear.
Contents
•
Prerequisites for COOL Measurement
•
Restrictions for COOL Measurement
•
How to Configure COOL Measurement
Prerequisites for COOL Measurement
The following hardware and software components are minimum requirements on the networking device to support the COOL Measurement feature functionality:
•
CISCO-OUTAGE-MONITOR-MIB, IF-MIB, and ENTITY-MIB
•
Advanced Technology Attachment (ATA) Flash memory on the Route Processor (RP).
•
On redundant processor systems, ATA Flash memory on both RPs.
Restrictions for COOL Measurement
•
Restrictions for on-line insertion or removal (OIR) devices:
–
If a physical entity (line card, LC) object and interface objects on the same LC are both configured for COOL measurement monitoring, OIR removal of the LC also removes the interface objects on that LC from the COOL monitoring configuration.
–
A Field Replaceable Unit (FRU) physical entity object, such as an LC or RP, can be removed and inserted into a different slot that is already configured for another physical entity object. To resolve the conflict, the existing physical entity object is removed from the COOL monitoring configuration.
•
To minimize any performance impact caused by the COOL measurements, do not configure more than 2000 local or remote measurement objects.
•
The event history table size is configurable up to 5000 events. When the maximum number of entries has been reached in the table, the oldest entry in the table is removed.
•
When the router crashes, it cannot send a router-device DOWN message to the network management system (NMS). Also, when an RP fails or goes down, there is not enough information to determine the reason for the failure. DOWN events for logical objects are therefore captured when the failed object restarts.
•
If, on reading the persistent file in Flash memory, the COOL software detects a different COOL version (because of a Cisco IOS software upgrade or downgrade), the system disables COOL measurement.
•
To downgrade to an image older than 12.0(31)S3, disable COOL measurement manually before reloading the router. See the "Disabling COOL Measurement" section for more information.
•
For COOL measurement, gigabit interface converter (GBIC) and small form-factor pluggable (SFP) objects are treated as interface objects (not FRU physical entity objects), and OIR of these objects is tracked as interface DOWN and UP events.
•
If an RP physical entity object failure causes the entire system to crash, the outage is captured by the Router-Device object, not by the RP physical entity object.
About COOL Measurement
To configure COOL measurement, you should understand the following topics:
COOL Measurement Overview
Measuring system and component outages for a network device is typically carried out in various ways by a network management system (NMS) located somewhere on the network. With COOL measurement, the network devices themselves perform outage monitoring, simple event filtering, event notification, and storing of outage data within the networking device for NMS or other network management tools to poll.
COOL measurement allows network operators to monitor networking device component outages and collect raw outage data through the CISCO-OUTAGE-MONITOR-MIB. The MIB-monitored components include hardware, software, local interfaces, remote interfaces at next-hop network devices, and logical devices. The collected raw outage data assists the NMS in several ways:
•
Determining the source of failure events in real time.
•
Deriving component Mean Time Between Failure (MTBF) and Mean Time To Repair (MTTR) values and, in turn, the networking device system availability.
•
Allowing service providers to monitor Service Level Agreements (SLAs) in terms of traffic outage with their customers.
Compared to other outage monitoring tools, COOL measurement has the following benefits:
•
Monitoring—COOL measurement allows the monitoring of networking devices in real-time and helps in determining the source of failure events.
•
Accuracy—COOL measurement performs outage measurement from within a networking device, as opposed to external polling from an NMS. Internal measurement can pinpoint the source of failure events and downtime for such events as link failure and RP failure.
•
Data persistency—COOL measurement stores the outage data in persistent storage on the networking device for NMS or other network management tools to access. This avoids certain data loss due to unreliable network transport, link outage, link congestion, or a device crash itself.
•
Efficiency—COOL measurement performs event filtering operations close to the event sources, which reduces the processing overhead incurred by network and system resources at the upper layer of an NMS.
Figure 1 illustrates networking device-based measurements in networks with one or more NMS. Outage monitoring is divided into two functions:
•
Outage monitoring and measurement—Performed by COOL measurement in the individual networking devices.
•
Outage correlation and calculation—Performed by one or more NMS or NMS-associated tools. Outage correlation and calculation are outside the scope of the COOL measurement function.
Figure 1
Outage Monitoring and Measurement Using COOL
Through the SNMP manager and MIB objects, the NMS collects the outage data in two ways:
•
Event notification from the networking devices
•
Data access to the networking devices
With the event notification mechanism, the NMS receives outage data upon each occurrence of an outage event. With the data access mechanism, the NMS "reads" the outage data stored in the networking devices periodically or on demand. In other words, the outage data is "pushed" by the networking devices to the NMS or "pulled" by the NMS from the networking devices.
To enable the NMS to manage COOL measurement on the networking device, a control channel (Telnet session, for example) must be established between the NMS and the networking device. The control channel is managed by command line interface (CLI) commands. Standard protocols such as FTP may be used for downloading the COOL measurement configuration information from the NMS to the networking device.
Outage Measurement
The COOL Measurement feature has a number of functional elements:
•
Outage Model
•
Outage Measurement Metrics
•
Outage Event Characterization
Each of these elements is described in the following sections.
Outage Model
The target of COOL outage measurement is an object, which is a generalized term for physical and logical networking device components. COOL measurement provides outage measurements for defined types of networking objects:
•
Physical objects
•
Interface objects
•
Remote objects
•
Software objects
•
Logical objects
Each type of object monitored by the COOL Measurement feature has associated failure modes. Figure 2 shows a sample edge network environment with each type of monitored object.
Figure 2
Objects Monitored for COOL Measurement
Table 1 lists the failure modes for each type of monitored object.
Outage Measurement Metrics
COOL measurement tracks a number of metrics on a per-object basis:
•
Accumulated Outage Time (AOT) since measurement started
•
Number of Accumulated Failures (NAF) since measurement started
•
Recording Start Time (RST)
AOT is the total outage time (in seconds) accumulated over a given measurement interval. NAF is the number of failures over a given measurement interval. RST is the time at which the object was first added to monitoring. The current time (TC) is also used in the calculation.
For a failure event, the corresponding object's NAF is increased by 1. For a recovery event, the outage time is calculated and added to the corresponding object's AOT.
Consider the example shown in Figure 3. For object i, the AOT and NAF are measured. From these measurements, the NMS can calculate MTTR and MTBF values for the object.
•
The measurement interval = TC - RST = 1,400,000 minutes
•
AOT = 14 minutes
•
NAF = 2
•
Availability = 1 - AOT / (TC - RST) = 1 - 14 / 1,400,000 = 99.999%
•
MTTR = AOT / NAF = 14 / 2 = 7 minutes
•
MTBF = (TC - RST) / NAF = 1,400,000 / 2 = 700,000 minutes ≈ 11,667 hours
Figure 3
Sample MTTR and MTBF Calculation Using AOT and NAF Measurements for Object i
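The calculation above can be reproduced on the NMS side with a few lines of Python. This is a minimal sketch, not part of the COOL feature: the function name `outage_metrics` and its return format are illustrative assumptions, and the inputs are the AOT, NAF, RST, and TC values from the worked example (all times in minutes).

```python
def outage_metrics(aot: float, naf: int, rst: float, tc: float) -> dict:
    """Derive availability, MTTR, and MTBF from raw COOL measurements.

    aot, rst, and tc must share one time unit (minutes in the example above).
    The function name and return format are hypothetical.
    """
    interval = tc - rst
    if interval <= 0:
        raise ValueError("TC must be later than RST")
    availability = 1 - aot / interval
    # MTTR and MTBF are undefined until at least one failure is recorded.
    mttr = aot / naf if naf else None
    mtbf = interval / naf if naf else None
    return {"availability": availability, "mttr": mttr, "mtbf": mtbf}


# Values from the worked example for object i.
m = outage_metrics(aot=14, naf=2, rst=0, tc=1_400_000)
print(f"availability = {m['availability']:.3%}")
print(f"MTTR = {m['mttr']:g} minutes")
print(f"MTBF = {m['mtbf']:g} minutes ({m['mtbf'] / 60:,.0f} hours)")
```

Returning `None` when NAF is zero avoids a division by zero for objects that have never failed during the measurement interval.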
Outage Event Characterization
At any given time, each object that is monitored by COOL measurement is in one of two states, UP or DOWN. Before the COOL object manager changes the state of any object, it evaluates several outage characteristics:
•
Duration threshold—minimum interval that a condition must persist before a state change occurs; configured by the network administrator.
•
Start time—time when the outage is first detected.
•
End time—time when the outage ends.
When COOL measurement detects an object failure (outage), the outage manager records the event start time. When the outage ends, the end time is also recorded. If the outage lasts longer than the configured duration threshold, the state of the object is changed from UP to DOWN in the event history table for that object. If the outage ends before the duration threshold elapses, it is not considered an outage.
Similarly, when an object recovers, the recovery time is recorded. If an object that is recovering from an outage experiences another outage before the duration threshold elapses, the recovery is not counted and the state is not changed from DOWN to UP. If the recovery lasts longer than the duration threshold, the state is changed to UP in the event history table for that object.
Figure 4 illustrates how the outage event detection and recovery time is determined.
Figure 4
Outage Event Detection and Recovery Timing
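The duration-threshold filtering described above can be sketched as a small state machine. This is an illustrative model only, not the COOL implementation: the class name `MonitoredObject`, the `observe` method, and the choice to apply the same threshold to outages and recoveries and to stamp the event with the time the condition was first detected are all assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MonitoredObject:
    """Hypothetical model of one monitored object with a duration threshold."""
    threshold: float                      # duration threshold, in seconds
    state: str = "UP"
    pending_since: Optional[float] = None  # when the raw condition first differed
    history: List[Tuple[float, str]] = field(default_factory=list)

    def observe(self, now: float, raw_up: bool) -> None:
        """Feed one raw status sample taken at time `now` (seconds)."""
        raw_state = "UP" if raw_up else "DOWN"
        if raw_state == self.state:
            # The condition reverted before the threshold elapsed:
            # it is not counted as an outage (or as a recovery).
            self.pending_since = None
            return
        if self.pending_since is None:
            self.pending_since = now      # start time of the candidate event
        if now - self.pending_since > self.threshold:
            # The condition persisted past the threshold: change state.
            self.state = raw_state
            self.history.append((self.pending_since, raw_state))
            self.pending_since = None


obj = MonitoredObject(threshold=8)
obj.observe(0, raw_up=False)    # outage detected
obj.observe(5, raw_up=True)     # ended after 5 s: ignored
obj.observe(10, raw_up=False)   # new outage detected
obj.observe(19, raw_up=False)   # still down after 9 s: state becomes DOWN
print(obj.state, obj.history)
```

A shorter threshold makes the model report more short-lived glitches as outages; a longer one suppresses them at the cost of later detection.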
Outage Management
Managing COOL measurement has a number of components:
•
Outage Event Detection and Calculation
•
Persistent and Redundant Data Storage
•
Outage Parameter Configuration
The outage manager detects and collects outage-related events.
Outage Event Detection and Calculation
The COOL manager detects outage events on a per-object basis. The outage information for each object includes the Accumulated Outage Time (AOT) and the Number of Accumulated Failures (NAF) since the measurement Recording Start Time (RST). This enables the calculation of object Mean Time To Repair (MTTR), Mean Time Between Failure (MTBF), and availability.
The event detection for specific object types (physical, interface, remote, or logical) is discussed in the following sections.
Physical Objects
The physical object state changes to DOWN when the object is physically removed, or a software or hardware failure of the physical object exceeds the duration threshold. The physical object state changes to UP when the same physical object is inserted into any slot of the router, or the software or hardware failure is fixed. The DOWN state duration is captured as AOT and the DOWN state event increases NAF by one.
Interface Objects
The interface object state is changed to DOWN when the interface descriptor block (IDB) detects down status, caused by a software or hardware failure of the interface object or by a peer interface failure, that exceeds the duration threshold. The interface object state is changed to UP when the software or hardware failure of the interface object recovers or the peer interface recovers. The DOWN state duration is captured as AOT and the DOWN state event increases NAF by one. When a subinterface object is removed by configuration (the IDB no longer exists), that interface object is removed from the COOL measurement configuration.
Remote Objects
The remote object state is changed to DOWN when 100 percent of the periodic ICMP pings to the object fail. The remote object state is changed to UP when the periodic ICMP ping to the object is successful. The number of remote devices and the frequency of ICMP messages configured on the networking device depend on individual site requirements and should be determined with overall networking device performance in mind: the more remote devices monitored, the more processing the networking device must perform. When a remote object is configured, its state is INIT until the first ICMP ping result is available.
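The remote object rule can be summarized in a short function. This is a sketch under stated assumptions: the name `remote_state` is hypothetical, one polling cycle is represented as a list of booleans (one per ICMP echo), and "successful" is interpreted as at least one reply in the cycle, which the text above does not state explicitly.

```python
def remote_state(ping_results, previous="INIT"):
    """Classify a remote object from one polling cycle (hypothetical helper).

    ping_results: list of booleans, one per ICMP echo in the cycle
    (True means a reply was received). An empty list means that no
    result is available yet.
    """
    if not ping_results:
        return previous      # nothing measured yet: keep INIT or the last state
    if not any(ping_results):
        return "DOWN"        # 100 percent of the pings failed
    return "UP"              # assumption: one reply is enough to count as UP


print(remote_state([]))                    # no result yet
print(remote_state([False] * 5))           # every ping in the cycle failed
print(remote_state([False, True, False]))  # partial loss
```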
Logical Object Outage
A logical object consists of the router device and the Cisco IOS software on the route processor.
The router device outage can be calculated during the system restart by retrieving the last system up time and subtracting it from the current system time. The last system up time is kept by periodically updating the system time into the time stamp file on the persistent storage.
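The time-stamp technique can be illustrated as follows. This is a minimal sketch, not the device implementation: the function names, the file name `cool_timestamp`, and the plain-text file format are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path


def write_timestamp(path: Path, now: float) -> None:
    """Called periodically while the system is up (hypothetical helper)."""
    path.write_text(f"{now:.0f}\n")


def outage_on_restart(path: Path, now: float) -> float:
    """Estimate the outage in seconds: current time minus last recorded up time."""
    last_up = float(path.read_text().strip())
    return now - last_up


# Hypothetical file standing in for the time stamp file on persistent storage.
stamp = Path(tempfile.mkdtemp()) / "cool_timestamp"
write_timestamp(stamp, now=1000.0)                  # last update before the failure
estimated = outage_on_restart(stamp, now=1233.0)    # first check after restart
print(f"estimated router device outage: {estimated:.0f} seconds")
```

Because the last up time is only as recent as the last periodic write, the estimate can overstate the outage by up to one update interval.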
The Cisco IOS software failure on the route processor (RP) can be calculated by using a "crash reason" file, which records the crash reason during the RP crash. Because a software outage may be attributed to different causes and the specific cause is not always clear, the COOL Measurement feature categorizes the software outage as "best case" and "worst case":
•
Best Case Software Outage (MIN-IOS-SW)—includes only the exception error code of the event that appears to be the main cause of software outage.
•
Worst Case Software Outage (MAX-IOS-SW)—includes all the error codes that contributed either fully or partially to software outage.
The IOS software object can be further categorized into the predefined MIN-IOS-SW-Outage object and MAX-IOS-SW-Outage object per RP.
Outage Monitor MIB
The CISCO-OUTAGE-MONITOR MIB provides the ability to use the Simple Network Management Protocol (SNMP) to monitor in real time the hardware and software outage information for a networking device and connections with neighbor devices.
This MIB describes, stores, and reports outage-related information generated by individual hardware and software components comprising a networking device.
The MIB consists of an outage event notification and six information tables as follows:
•
Outage Event Notification—Notifies the NMS of outage-related events generated from the networking device. The notification data are the same as the entry in Outage Event History Table.
•
Outage Event History Table—Maintains a history of outage-related events generated from the networking device.
•
Outage Object Table—Maintains outage information for all the objects being monitored.
•
Outage Event Reason Map Table—Maintains the event reason and description information corresponding to the event index entry in the outage history table.
•
Process MIB Map Table—Maintains an index of the software process object type and provides mapping to two table indexes (CPUTotalIndex and cpmProcessPID) in the CISCO-PROCESS-MIB. (Not supported in this release.)
•
Remote Object Map Table—Maintains remote device descriptions corresponding to the remote object type index.
•
Logical Object Map Table—Maintains logical device descriptions corresponding to the logical object type index.
Figure 5 illustrates the relationship of the CISCO-OUTAGE-MONITOR MIB information table with the IF-MIB, ENTITY-MIB, and CISCO-PROCESS-MIB.
Figure 5
CISCO-OUTAGE-MONITOR MIB Information Table Relationships
The COOL Measurement feature maintains outage information in two tables:
Outage Object Table
This table maintains entries for all the objects being monitored. Objects are added to the object table through the COOL measurement command-line interface. An entry in the table is updated following the detection of a state change for the object. Entries are removed only when COOL measurement is disabled or the configuration is reset through the command-line interface. The object table data includes the following information for each entry:
•
Object Table Index—identifies the specific index of the monitored object based on the object type.
•
Object State—the current state of the object (UP or DOWN).
•
Recording Start Time (RST)—the date and time at which outage recording started for the object. Recording starts when the object is added to the table.
•
Object AOT—accumulated outage time of the object since the initiation of the outage measurement process.
•
Object NAF—number of accumulated failures of the object since the initiation of the outage measurement process.
Outage Event History Table
This table maintains a history of outage-related events generated by the networking device for each object monitored by COOL measurement. An event occurs any time a monitored object changes state from UP to DOWN or from DOWN to UP. Entries are dynamically added to the table when outage-related events occur. The event table data includes the following information:
•
Object Type—type of the object table that holds the object, such as interface, physical entity, software process, or remote object.
•
Object Table Index—index to the outage object table entry of a given object type.
•
Event Type Index—index to the event type map table entry which contains the event type (UP or DOWN) and the event description (event category code and reason).
•
Event Time—time of the event.
•
Event Interval—time interval from the previous event to the current event.
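The interaction between the two tables can be modeled in a few lines. This is an illustrative sketch, not the MIB implementation: the class names, the tuple layout of a history entry, and the use of a bounded `deque` to drop the oldest entry when the table is full are assumptions. The update rules (a DOWN event increments NAF, an UP event adds the outage time to AOT) follow the metric definitions earlier in this document.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    """Hypothetical object table entry."""
    state: str = "UP"
    aot: float = 0.0          # accumulated outage time, in seconds
    naf: int = 0              # number of accumulated failures
    last_change: float = 0.0  # time of the last state change (initially RST)


class OutageTables:
    """Hypothetical model of the object table and the event history table."""

    def __init__(self, history_size: int = 500):
        self.objects = {}                          # (type, index) -> ObjectEntry
        self.history = deque(maxlen=history_size)  # oldest entry dropped when full

    def add_object(self, key, rst: float) -> None:
        self.objects[key] = ObjectEntry(last_change=rst)

    def record(self, key, event: str, now: float) -> None:
        entry = self.objects[key]
        if event == entry.state:
            return                             # not a state change: no event
        interval = now - entry.last_change     # time since the previous event
        if event == "DOWN":
            entry.naf += 1                     # a failure increments NAF
        else:
            entry.aot += interval              # a recovery adds outage time to AOT
        entry.state, entry.last_change = event, now
        self.history.append((key, event, now, interval))


tables = OutageTables()
tables.add_object(("LNT", 1), rst=0)
tables.record(("LNT", 1), "DOWN", now=33)
tables.record(("LNT", 1), "UP", now=266)
print(tables.objects[("LNT", 1)])
print(list(tables.history))
```

With these inputs the object ends with an AOT of 233 seconds and an NAF of 1, and the two history entries carry intervals of 33 and 233 seconds.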
Persistent and Redundant Data Storage
The COOL Measurement feature supports persistent data storage with two levels of storage: it keeps event-driven outage data temporarily in NVRAM and periodically writes outage data changes to the Flash disk (persistent storage). Data persistency makes it possible to retain outage information despite unreliable network transport, link outages, link congestion, and networking device failures over the measurement period.
Administrators can configure the periodic Flash storage update interval using the "Flash file update timer" and can control event-driven updates using the "event storm period timer" and "event storm count" parameters, based on site requirements.
The COOL Measurement feature also provides a method of redundant data storage. If the networking device supports redundant processors, the outage data is saved into persistent storage on both processors. Data redundancy makes it possible to retain outage information following RP switchovers or single-point-of-storage failures over the measurement period. It also makes it possible to retain the networking device crash information even if one of the processors containing the outage data is physically replaced.
Figure 6 illustrates the COOL Measurement data persistency and redundancy.
Figure 6
COOL Measurement Data Persistency and Redundancy
Outage Parameter Configuration
The COOL Measurement feature provides a command line interface for adding objects, configuring measurement parameters, displaying the monitoring status, and checking the availability and outage data for each object being monitored. For more information about outage monitor configuration, see the "How to Configure COOL Measurement" section. For COOL measurement commands, see the "Command Reference" section.
The COOL Measurement feature provides a number of configurable parameters and settings to customize outage monitoring. Table 2 provides a summary of these parameters.
Table 2
Outage Monitoring Configuration Parameters
Event History Table Size
The number of entries in the event history table is configurable. Changes to the history table size have the following effects:
•
If the new table size is bigger than the previous table size, the new table keeps all the existing event history entries.
•
If the new size is smaller than the previous table size and the number of existing entries exceeds the new size, the oldest event history entries are dropped.
•
If memory for the new event table cannot be allocated (the resize fails), the previous event history entries are lost.
•
The new event history table size is stored in the persistent file immediately after it is configured.
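The first two resize rules can be sketched with a bounded queue. This is a minimal illustration, assuming the history is held in a Python `deque`; the function name `resize_history` is hypothetical, and the memory-allocation failure case is not modeled.

```python
from collections import deque


def resize_history(history: deque, new_size: int) -> deque:
    """Return a history table with the new maximum size (hypothetical helper).

    deque(iterable, maxlen=n) keeps the last n items, so when the table
    shrinks, the oldest events are the ones discarded.
    """
    if new_size < 1:
        raise ValueError("table size must be at least 1")
    return deque(history, maxlen=new_size)


events = deque(["e1", "e2", "e3", "e4", "e5"], maxlen=5)
print(list(resize_history(events, 10)))  # grown: every entry is kept
print(list(resize_history(events, 3)))   # shrunk: only the newest three remain
```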
Flash File Update Timer
The flash file update timer provides a configurable interval at which outage information is written from memory to a Flash disk file (persistent storage).
Duration Threshold Timer
This timer provides a configurable interval for the duration threshold of an outage or recovery event. A shorter duration may result in more events being reported as outages, when in fact the objects are recovering within an acceptable time.
Time Stamp File Update Timer
The time stamp file update timer provides a configurable interval at which the COOL measurement manager writes the system time to Flash disk to record the last system up time. The system outage can be calculated during the system restart by retrieving the last system up time and subtracting it from the current system time. The time-stamp period is configurable to ensure that the outage measurement gap is within the outage resolution requirement for MTBF based on site requirements.
Event Storm Period Timer
An event storm occurs when the number of outage events exceeds a specified event threshold in a specified time period. This event storm period timer sets that duration.
Event Storm Count
An event storm occurs when the number of outage events exceeds a specified event threshold in a specified time period. The event storm count sets that threshold value.
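The two event storm parameters work together, as the following sketch shows. It is an assumption-laden illustration rather than the device algorithm: the class name `EventStormDetector` is hypothetical, a sliding time window is assumed (the device may use fixed periods), and the default values of 1 second and 5 events are taken from the default parameter display shown later in this document.

```python
from collections import deque


class EventStormDetector:
    """Hypothetical sliding-window model of event storm detection."""

    def __init__(self, period: float = 1.0, count: int = 5):
        self.period = period      # event storm period timer, in seconds
        self.count = count        # event storm count (threshold)
        self._times = deque()     # timestamps of recent outage events

    def on_event(self, now: float) -> bool:
        """Record one outage event; return True if a storm is in progress."""
        self._times.append(now)
        # Discard events that have fallen out of the window.
        while self._times and now - self._times[0] > self.period:
            self._times.popleft()
        # A storm occurs when the number of events exceeds the threshold.
        return len(self._times) > self.count


detector = EventStormDetector()
results = [detector.on_event(t) for t in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5)]
print(results)  # the sixth event within one second exceeds the count of 5
```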
How to Configure COOL Measurement
This section contains the procedures for configuring COOL measurement on the networking devices using the CLI. Each procedure is identified as either required or optional.
•
Enabling COOL Measurement (required)
•
Adding an Interface Object to be Monitored (optional)
•
Adding a Physical Entity Object to be Monitored (optional)
•
Adding a Remote Device Object to be Monitored (optional)
•
Configuring COOL Parameters (optional)
•
Configuring SNMP COOL Measurement Notifications (required)
•
Disabling COOL Measurement (optional)
Enabling COOL Measurement
To activate the COOL measurement process, perform the steps in this section. Enabling the COOL Measurement feature provides measurement of the Router-Device, MIN-IOS-SW, and MAX-IOS-SW objects by default.
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
cool run
4.
end
5.
show cool object-table
6.
show running-config | include cool
DETAILED STEPS
Examples
To verify that the COOL Measurement feature is running, use the show cool object-table command. The cool run command initially enables three types of logical entity objects: Router-Device, minimum Cisco IOS software, and maximum Cisco IOS software. In this example, the networking device has redundant processors with a minimum and maximum entity object for the processor in slot 8 and in slot 9.
Router(config)# cool run
Router(config)# end
Router# show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time     Object-Name
LNT  1     UP     0   0   Nov 03 2005 17:50:46 Router-Device
LNT  2     UP     0   0   Nov 03 2005 17:50:46 MIN-IOS-SW9
LNT  3     UP     0   0   Nov 03 2005 17:50:46 MAX-IOS-SW9
LNT  4     UP     0   0   Nov 03 2005 17:50:46 MIN-IOS-SW8
LNT  5     UP     0   0   Nov 03 2005 17:50:46 MAX-IOS-SW8
Router# show running-config | include cool
cool run
cool parameters
What to Do Next
Add additional interface, FRU entity, and remote device objects for monitoring by the COOL measurement process.
Adding an Interface Object to be Monitored
To add an interface object to be monitored by the COOL measurement process running on the networking device, perform the following steps:
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
cool interface interface-name
4.
end
5.
show cool object-table
6.
show running-config | include cool
DETAILED STEPS
Examples
In the following example, the interface object name is added to the list of monitored objects by the COOL measurement process. Additionally, the entry is added to the configuration file for the networking device.
Router# config terminal
Router(config)# cool interface ethernet 0
Router# show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time     Object-Name
LNT  1     UP     0   0   Nov 03 2005 17:54:53 Router-Device
LNT  2     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW9
LNT  3     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW9
LNT  4     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW8
LNT  5     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW8
INF  2     UP     0   0   Nov 03 2005 17:55:13 Ethernet0
Router# show running-config | include cool
cool run
cool interface Ethernet0
cool parameters
Adding a Physical Entity Object to be Monitored
To add field replaceable unit (FRU) objects to the list of objects monitored for COOL measurement, perform the following steps:
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
cool physical-FRU-slot fru-slot-number [bay subslot]
4.
end
5.
show cool object-table
6.
show running-config | include cool
DETAILED STEPS
Examples
In this example, the physical object in slot 7 is added to the list of objects monitored for COOL measurement. If the configuration is successful, the objects appear in the object table and in the configuration file for the networking device.
Router# config terminal
Router(config)# cool physical-FRU-slot 7
Router# show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time     Object-Name
LNT  1     UP     0   0   Nov 03 2005 17:54:53 Router-Device
LNT  2     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW9
LNT  3     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW9
LNT  4     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW8
LNT  5     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW8
ENT  37    UP     0   0   Nov 03 2005 17:57:50 slot 7
Router# show running-config | include cool
cool run
cool physical-FRU-slot 7
cool parameters
In this example, the shared port adapter (SPA) device in slot 1, bay 0 is added to the list of objects monitored by the COOL measurement process. If the configuration is successful, the objects appear in the object table and in the configuration file for the networking device.
Router# config terminal
Router(config)# cool physical-FRU-slot 1 bay 0
Router# sh cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time     Object-Name
LNT  1     UP     0   0   Nov 03 2005 17:54:53 Router-Device
LNT  2     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW9
LNT  3     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW9
LNT  4     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW8
LNT  5     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW8
ENT  37    UP     0   0   Nov 03 2005 17:57:50 slot 7
ENT  69    UP     0   0   Nov 03 2005 17:59:41 module 1/0
Router# show running-config | include cool
cool run
cool physical-FRU-slot 1 bay 0
cool physical-FRU-slot 7
cool parameters
Adding a Remote Device Object to be Monitored
To add a remote networking device object to be monitored for COOL measurement, perform the following steps:
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
cool remote-device entry-index remote-ip-address name seconds repeat [local-ip-address mode]
4.
end
5.
show cool object-table
6.
show running-config | include cool
DETAILED STEPS
Example
In this example, the remote device "rm-test" is added to the list of objects monitored under COOL measurement. The status message configuration is set to 60 seconds and the repeat value is set to 5.
Router# config terminal
Router(config)# cool remote-device 2 10.10.10.10 rm-test 60 5
Router# show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time     Object-Name
LNT  1     UP     0   0   Nov 03 2005 17:54:53 Router-Device
LNT  2     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW9
LNT  3     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW9
LNT  4     UP     0   0   Nov 03 2005 17:54:53 MIN-IOS-SW8
LNT  5     UP     0   0   Nov 03 2005 17:54:53 MAX-IOS-SW8
RMT  2     INIT   0   0   Nov 03 2005 18:23:53 rm-test
Router# show running-config | include cool
cool run
cool remote-device 2 10.10.10.10 rm-test 60 5
cool parameters
Configuring COOL Parameters
Note
Configuring COOL measurement parameters is presented here as one task; however, it is not required that you configure these parameters at the same time or in any particular sequence.
To update the COOL parameter settings, perform the following steps:
SUMMARY STEPS
1.
enable
2.
show running-config
3.
show cool parameters
4.
configure terminal
5.
cool parameters
6.
size event-table
7.
timer timestamp-file
8.
timer flash-file
9.
timer duration
10.
timer event-storm
11.
count event-storm
12.
end
13.
show cool parameters
DETAILED STEPS
Examples
In this example, the event table size value is decreased from 500 to 200 entries. All other parameters remain at their default settings.
Router# show cool parameters
Time Stamp Period: Configured (30 seconds) Default (30 seconds)
Flash Update Period: Configured (15 minutes) Default (15 minutes)
Duration Threshold: Configured (8 seconds) Default (8 seconds)
Event Storm Count Number: Configured (5 times) Default (5 times)
Event Storm Check Period: Configured (1 seconds) Default (1 seconds)
Max Event History Table Size: Configured (500 entries) Default (500 entries)
Router# config terminal
Router(config)# cool parameters
Router(config-cool-pars)#
Router(config-cool-pars)# size event-table 200
Router(config-cool-pars)# exit
Router# show cool parameters
Time Stamp Period: Configured (30 seconds) Default (30 seconds)
Flash Update Period: Configured (15 minutes) Default (15 minutes)
Duration Threshold: Configured (8 seconds) Default (8 seconds)
Event Storm Count Number: Configured (5 times) Default (5 times)
Event Storm Check Period: Configured (1 seconds) Default (1 seconds)
Max Event History Table Size: Configured (200 entries) Default (500 entries)
Router# show running-config | include cool
cool run
!
cool parameters
size event-table 200
Configuring SNMP COOL Measurement Notifications
To configure and activate COOL measurement to send SNMP trap notifications to the NMS and to enable polling from the NMS, perform the following steps:
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
snmp-server enable traps outage
4.
snmp-server host host-address [traps | informs] [version {1 | 2c | 3 [auth | noauth | priv]}] community-string [udp-port port] [notification-type] [outage]
5.
end
DETAILED STEPS
What to Do Next
Configure and activate COOL measurement.
Disabling COOL Measurement
Disabling COOL measurement removes all COOL configuration and local outage measurement data. To disable COOL measurement on the networking device, perform the following steps:
SUMMARY STEPS
1.
enable
2.
configure terminal
3.
no cool run
4.
end
DETAILED STEPS
Example
The no cool run command disables the COOL measurement, deletes configuration settings, and removes local outage measurement data. To verify that the COOL Measurement feature is disabled, use the show cool object-table and show running-config commands.
Router# config terminal
Router(config)# no cool run
Router# show cool
cool has not been enabled
config ---> cool run
Router# show running-config | include cool
Router#
Outage Measurement Examples
This section provides the following COOL measurement examples:
•
Network Device Outage Measurement: Example
•
Cisco IOS Software Outage Measurement: Example
•
Line Card Outage Measurement: Example
•
Interface Outage Measurement: Example
•
Remote Device Outage Measurement: Example
Note
The examples in this section illustrate the AOT and NAF measurements on the networking device running the COOL measurement process. The examples do not show MTTR and MTBF calculations, which are completed by the NMS using the COOL measurement data.
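The NMS-side calculation mentioned in the note can be sketched as follows. This is a minimal Python illustration, assuming the common definitions MTTR = AOT / NAF and MTBF = T / NAF, where T is the length of the measurement period. The function name, the one-hour period, and the availability formula are assumptions made for illustration and are not taken from any NMS software.

```python
def outage_metrics(aot_seconds, naf, period_seconds):
    """Derive MTTR, MTBF and availability from COOL counters.

    Assumed definitions (not stated in this document):
      MTTR = AOT / NAF
      MTBF = measurement period / NAF
      availability = 1 - AOT / measurement period
    """
    if period_seconds <= 0:
        raise ValueError("measurement period must be positive")
    availability = 1.0 - aot_seconds / period_seconds
    if naf == 0:
        # No failures were recorded, so MTTR and MTBF are undefined.
        return None, None, availability
    return aot_seconds / naf, period_seconds / naf, availability


# Router-Device counters from the network device example (AOT 233, NAF 1),
# evaluated over a hypothetical one-hour measurement period.
mttr, mtbf, availability = outage_metrics(233, 1, 3600)
print(mttr, mtbf, round(availability, 4))
```

The networking device only accumulates AOT and NAF; the choice of measurement period belongs to the NMS.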
Network Device Outage Measurement: Example
This example illustrates the accumulated outage time (AOT) and number of accumulated failures (NAF) changes for a network device outage due to a reload operation, power outage, or networking device hardware failure in the redundant route processor configuration.
The following initial object table information shows that there is no AOT or NAF information:
Router# show cool
No entry is available !
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 03 2005 18:43:56 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW9
Following a reload of the entire router, the AOT and NAF information is updated, and a DOWN and an UP event notification are sent to account for the downtime. In this example, the event history table shows that the network device went DOWN just 33 seconds after COOL measurement was enabled. The event table also shows that the device was DOWN for 233 seconds before it returned to the UP state. The object table provides the updated AOT and NAF information.
Router#show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 LNT 1 DOWN 33 Nov 03 2005 18:44:29 Router-Device
2 LNT 1 UP 233 Nov 03 2005 18:48:22 Router-Device
Router#show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 233 1 Nov 03 2005 18:48:22 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW9
Cisco IOS Software Outage Measurement: Example
This example illustrates the AOT and NAF changes for a Cisco IOS software outage on a networking device that supports redundant processors. MIN-IOS-SW9 and MAX-IOS-SW9 are logical software entities that represent the active processor in the networking device. MIN-IOS-SW8 and MAX-IOS-SW8 are logical software entities that represent the standby processor in the networking device.
Note
The MIN-IOS-SW object represents a known range of software functions in the networking device. Changes only to the MIN-IOS-SW values in the table reflect a software-only outage. The MAX-IOS-SW object represents a broad range of functions in the networking device, and changes to the MAX-IOS-SW values in the table may or may not represent a software-only outage.
In this example, the standby RP in slot 8 crashed due to a bus error, which may not be due to a purely software problem. A look at the event outage table prior to the RP crash shows only historical event information related to prior events on the networking device.
Router# show cool
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 LNT 1 DOWN 33 Nov 03 2005 18:44:29 Router-Device
2 LNT 1 UP 233 Nov 03 2005 18:48:22 Router-Device
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 233 1 Nov 03 2005 18:48:22 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW9
Following the RP outage in slot 8, the AOT and NAF information is updated to account for the downtime. In this example, the outage information is updated for MAX-IOS-SW8, which indicates a possible software and hardware problem. The event history table shows that the RP went DOWN 530 seconds after COOL measurement was enabled. The event table also shows that the RP was DOWN for 216 seconds before it returned to the UP state. The object table provides the updated AOT and the incremented NAF.
Router#show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 LNT 1 DOWN 33 Nov 03 2005 18:44:29 Router-Device
2 LNT 1 UP 233 Nov 03 2005 18:48:22 Router-Device
3 LNT 3 DOWN 530 Nov 03 2005 18:52:46 MAX-IOS-SW8
4 LNT 3 UP 216 Nov 03 2005 18:56:22 MAX-IOS-SW8
Router#show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 233 1 Nov 03 2005 18:48:22 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW8
LNT 3 UP 216 1 Nov 03 2005 18:56:22 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:43:56 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:43:56 MAX-IOS-SW9
Line Card Outage Measurement: Example
This example illustrates the measurements (AOT and NAF) recorded for a physical entity (line card) outage:
!
!Review the AOT and NAF values in the COOL object table.
Router# show cool object-table
No entry is available !
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 03 2005 18:38:06 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW9
ENT 40 UP 0 0 Nov 03 2005 18:39:49 slot 7
!Notice that slot 7 is a physical entity with no current outage info.
!Reload slot 7 to trigger an outage event for the sake of this example.
Router#hw-module slot 7 reload
!Review the values in the COOL event history table. Observe the history for slot 7. Note
!that the line card went down 53 seconds after its previous state change at 18:39:49.
Router#show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 ENT 40 DOWN 53 Nov 03 2005 18:40:42 slot 7
!Review the contents of the COOL object table again. Here, AOT is 17 seconds, but this is
!just a snapshot of the down time and the outage time is continuing to increase.
Router#show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 03 2005 18:38:06 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW9
ENT 40 DOWN 17 1 Nov 03 2005 18:40:42 slot 7
!Display the COOL event and object tables again. This time the event table information
!shows that the line card in slot 7 is back up after 129 seconds. The AOT and NAF
!information is updated in the COOL object table for slot 7.
Router#show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 ENT 40 DOWN 53 Nov 03 2005 18:40:42 slot 7
2 ENT 40 UP 129 Nov 03 2005 18:42:51 slot 7
Router# show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 03 2005 18:38:06 Router-Device
LNT 2 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 03 2005 18:38:06 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 03 2005 18:38:06 MAX-IOS-SW9
ENT 40 UP 129 1 Nov 03 2005 18:42:51 slot 7
Interface Outage Measurement: Example
This example illustrates the AOT and NAF measurements for an interface outage:
! Configure the interface.
Router#config terminal
Router(config)#cool interface ethernet 0
! Display the COOL object table to verify the entry in the table.
Router#show cool object-table
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:47:58 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW9
INF 2 UP 0 0 Nov 15 2005 06:48:09 Ethernet0
! Display the running configuration.
Router#show running-config | include cool
cool run
cool interface Ethernet0
cool parameters
! Cause a failure on the interface
Router#config terminal
Router(config)#interface ethernet 0
Router(config-if)#shutdown
! Display the COOL event and object tables to view the event.
Router#show cool
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 INF 2 DOWN 60 Nov 15 2005 06:49:09 Ethernet0
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:47:58 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW9
INF 2 DOWN 116 1 Nov 15 2005 06:49:09 Ethernet0
! Restart the interface and display the COOL event and object tables again to view the event.
Router#config terminal
Router(config)#interface Ethernet 0
Router(config-if)#no shutdown
Router#show cool
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 INF 2 DOWN 60 Nov 15 2005 06:49:09 Ethernet0
2 INF 2 UP 112 Nov 15 2005 06:51:01 Ethernet0
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:47:58 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:47:58 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:47:58 MAX-IOS-SW9
INF 2 UP 112 1 Nov 15 2005 06:51:01 Ethernet0
Remote Device Outage Measurement: Example
This example illustrates the AOT and NAF measurements for a remote device outage:
! Configure a remote object.
! The remote object status is initially INIT status, since the object status is not
! confirmed yet. After the status verification, the object status will be changed to UP or
! DOWN.
Router#config terminal
Router(config)#cool remote-device 2 128.107.165.42 remote-obj1 50 5
Router#show cool
No entry is available !
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:53:40 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW9
RMT 2 INIT 0 0 Nov 15 2005 06:55:33 remote-obj1
Router#show cool
No entry is available !
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:53:40 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW9
RMT 2 UP 0 0 Nov 15 2005 06:55:33 remote-obj1
! The remote object becomes down.
Router#show cool
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 RMT 2 DOWN 463 Nov 15 2005 07:03:16 remote-obj1
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:53:40 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW9
RMT 2 DOWN 3 1 Nov 15 2005 07:03:16 remote-obj1
! After the remote object is UP, view the COOL event and object table to see the event.
Router#show cool
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 RMT 2 DOWN 463 Nov 15 2005 07:03:16 remote-obj1
2 RMT 2 UP 53 Nov 15 2005 07:04:09 remote-obj1
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Nov 15 2005 06:53:40 Router-Device
LNT 2 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW8
LNT 3 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW8
LNT 4 UP 0 0 Nov 15 2005 06:53:40 MIN-IOS-SW9
LNT 5 UP 0 0 Nov 15 2005 06:53:40 MAX-IOS-SW9
RMT 2 UP 53 1 Nov 15 2005 07:04:09 remote-obj1
Additional References
For additional information related to the COOL Measurement feature, refer to the following references:
Related Documents
Related Topic | Document Title
Configuring SNMP support | Configuration Fundamentals Configuration Guide, Part 3, Release 12.0
Cisco IOS commands | Cisco IOS Release 12.0 command reference publications
Standards
Standards (see footnote 1) | Title
ATIS Technical Requirements | Recording Outages in Packet Network Elements, Document Number T1.TRQ.11-2004
1 Not all supported standards are listed.
MIBs
MIBs (see footnote 1) | MIBs Link
•
CISCO-OUTAGE-MONITOR-MIB
•
IF-MIB
•
ENTITY-MIB
To obtain lists of supported MIBs by platform and Cisco IOS release, and to download MIB modules, go to the Cisco MIB website on Cisco.com at the following URL:
http://www.cisco.com/public/sw-center/netmgmt/cmtk/mibs.shtml
1 Not all supported MIBs are listed.
To locate and download MIBs for selected platforms, Cisco IOS releases, and feature sets, use Cisco MIB Locator found at the following URL:
http://tools.cisco.com/ITDIT/MIBS/servlet/index
If Cisco MIB Locator does not support the MIB information that you need, you can also obtain a list of supported MIBs and download MIBs from the Cisco MIBs page at the following URL:
http://www.cisco.com/public/sw-center/netmgmt/cmtk/mibs.shtml
To access Cisco MIB Locator, you must have an account on Cisco.com. If you have forgotten or lost your account information, send a blank e-mail to cco-locksmith@cisco.com. An automatic check will verify that your e-mail address is registered with Cisco.com. If the check is successful, account details with a new random password will be e-mailed to you. Qualified users can establish an account on Cisco.com by following the directions found at this URL:
RFCs
Technical Assistance
Command Reference
This section documents new and modified commands. All other commands used with this feature are documented in the Cisco IOS Release 12.0 command reference publications.
New Commands
Modified Commands
•
snmp-server enable traps outage
clear cool event-table
To clear and then reset the outage event history table, use the clear cool event-table command in privileged EXEC mode or user EXEC mode.
clear cool event-table
Syntax Description
This command has no arguments or keywords.
Command Modes
Privileged EXEC
User EXEC
Command History
Examples
The following example shows the contents of the COOL measurement event table before and after clearing it:
Router# show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 ENT 45 DOWN 42 Sep 19 2005 23:59:07 slot 1
2 ENT 45 UP 140 Sep 20 2005 00:01:27 slot 1
3 INF 4 DOWN 29591 Sep 20 2005 08:05:56 GigabitEthernet7/0
Router# clear cool event-table
Router# show cool event-table
no entry is available !
Related Commands
clear cool persist-files
To clear and reset any persistent files before enabling component outage on-line (COOL) measurement, use the clear cool persist-files command in privileged EXEC mode or user EXEC mode.
clear cool persist-files
Syntax Description
This command has no arguments or keywords.
Command Modes
Privileged EXEC
User EXEC
Command History
Usage Guidelines
Outage data is stored in Advanced Technology Attachment (ATA) Flash memory in the networking device. Saving outage data allows it to persist after a component or device failure, system crash, or fail-over. The persistent data is useful in calculating outage information after the device is running normally again.
Examples
The following example shows the files in disk0: on the networking device, including the persistent outage data file. The clear cool persist-files command deletes the file from disk0:.
Router# dir disk0:
Directory of disk0:/
1 -rw- 342 Aug 04 2005 15:30:16 outage_data_persist_file
2 -rw- 34657 Aug 23 2000 03:06:16 router1-confg
3 -rw- 34865 Aug 23 2000 03:06:34 router2-confg
Router# clear cool persist-files
Router# dir disk0:
Directory of disk0:/
2 -rw- 34657 Aug 23 2000 03:06:16 router1-confg
3 -rw- 34865 Aug 23 2000 03:06:34 router2-confg
cool if-filter
To filter the event notifications that are sent for an interface, use the cool if-filter command in global configuration mode. To disable filtering of event notifications for the interface, use the no form of this command.
cool if-filter interface
no cool if-filter interface
Syntax Description
interface
Interface from which the COOL measurement trap originates. The argument includes the interface type and number in platform-specific syntax (for example, type/slot/port).
Defaults
No interface is configured for filtering from the COOL event notification.
Command Modes
Global configuration
Command History
Usage Guidelines
A key benefit of interface event notification filtering is that it reduces the overhead of event notification messages, especially during an event storm. Use the show cool object-table detail command to display the interface objects and their filtering flags.
Examples
The following example shows how to configure event notification filtering for interface Ethernet 5/1/0:
cool if-filter Ethernet 5/1/0
Related Commands
Command | Description
show cool object-table | Displays a list of objects monitored for COOL measurement.
cool interface
To specify an interface for component outage on-line (COOL) measurement, use the cool interface command in global configuration mode. To disable COOL monitoring for an interface, use the no form of this command.
cool interface interface
no cool interface interface
Syntax Description
interface
Interface from which the COOL measurement trap originates. The argument includes the interface type and number in platform-specific syntax (for example, type/slot/port).
Defaults
No interface is specified to be monitored for COOL measurement.
Command Modes
Global configuration
Command History
Usage Guidelines
Use this command to monitor events from a particular interface.
Note
The COOL Measurement feature can monitor up to 2,000 objects. Each interface or field replaceable unit is counted as one object.
Use the show cool object-table command to display a list of monitored objects.
Examples
The following example shows how to enable COOL measurement on Ethernet interface 5/1/0:
cool interface Ethernet5/1/0
Related Commands
Command | Description
show cool object-table | Displays a list of objects monitored for COOL measurement.
cool parameter
To enter component outage on-line (COOL) parameter configuration mode, use the cool parameter command in global configuration mode.
cool parameter
Syntax Description
This command has no arguments or keywords.
Defaults
COOL parameters are set to their default values. No COOL parameter configuration is made for a parameter that is left at its default value.
Command Modes
Global configuration
Command History
Usage Guidelines
The COOL parameters define the characteristics of COOL management. To view the COOL parameters, enter the show cool parameters command.
Examples
The following example shows how to enter COOL parameter configuration mode:
Router(config)# cool parameter
Router(config-cool-pars)#
Related Commands
Command | Description
show cool parameters | Displays the component outage on-line (COOL) measurement parameters.
cool physical-fru-slot
To add an entity for monitoring using component outage on-line (COOL) measurement, use the cool physical-fru-slot command in global configuration mode. To stop monitoring a specific FRU entity, use the no form of this command.
cool physical-fru-slot fru-slot-number [bay subslot-number]
no cool physical-fru-slot fru-slot-number [bay subslot-number]
Syntax Description
fru-slot-number
Indicates the physical slot number in the networking device.
subslot-number
Indicates the subslot (bay) number within the slot.
Defaults
No physical entities are monitored for COOL measurement.
Command Modes
Global configuration
Command History
Usage Guidelines
By entering the slot number, you specify the physical device located in that slot.
Examples
The following example shows how to add the physical object located in slot 9 for monitoring under COOL measurement:
cool physical-FRU-slot 9
The following example shows how to add a shared port adapter (SPA) device in bay 0 of slot 1 to the list of objects monitored under COOL measurement.
cool physical-FRU-slot 1 bay 0
cool remote-device
To add a remote networking device for component outage on-line (COOL) measurement monitoring, use the cool remote-device command in global configuration mode. To disable monitoring of a remote device, use the no form of this command.
cool remote-device entry-index remote-ip-address name seconds repeat [local-ip-address mode]
no cool remote-device entry-index remote-ip-address name seconds repeat [local-ip-address mode]
Syntax Description
Defaults
The default mode is ping.
Command Modes
Global configuration
Command History
Usage Guidelines
Remote networking devices can be monitored for outages.
Examples
The following example shows the addition of a remote device named CPE1. This device has index number 1 and is configured to generate one ping message every 30 seconds.
cool remote-device 1 10.4.4.1 CPE1 30 5
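To make the positional arguments easier to read, the following Python sketch splits a cool remote-device line into named fields that follow the order given in the command syntax. The class and function names are hypothetical, and the sketch does not interpret the meaning of each field; it only checks the argument count and that the addresses are well formed.

```python
import ipaddress
from typing import NamedTuple, Optional


class RemoteDevice(NamedTuple):
    entry_index: int
    remote_ip: str
    name: str
    seconds: int
    repeat: int
    local_ip: Optional[str] = None
    mode: Optional[str] = None


def parse_remote_device(line):
    """Split a 'cool remote-device' line into its positional arguments."""
    tokens = line.split()
    if tokens[:2] != ["cool", "remote-device"]:
        raise ValueError("not a cool remote-device command")
    args = tokens[2:]
    if len(args) not in (5, 7):
        raise ValueError("expected 5 arguments, or 7 with local-ip-address and mode")
    ipaddress.ip_address(args[1])  # raises ValueError on a malformed address
    device = RemoteDevice(int(args[0]), args[1], args[2], int(args[3]), int(args[4]))
    if len(args) == 7:
        ipaddress.ip_address(args[5])
        device = device._replace(local_ip=args[5], mode=args[6])
    return device


device = parse_remote_device("cool remote-device 1 10.4.4.1 CPE1 30 5")
print(device)
```

A check of this kind can be useful in a provisioning script before the line is pushed to the networking device.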
Related Commands
Command | Description
show cool object-table | Displays a list of objects monitored for COOL measurement.
cool run
To enable component outage on-line (COOL) measurement, use the cool run command in global configuration mode. To disable COOL measurements, use the no form of this command.
cool run
no cool run
Syntax Description
This command has no arguments or keywords.
Defaults
COOL measurement is disabled.
Enabling COOL measurement automatically enables three logical event measurements: Router-Device, minimum Cisco IOS software, and maximum Cisco IOS software.
Command Modes
Global configuration
Command History
Examples
The following example shows how to start COOL measurement functions on the networking device:
cool run
Related Commands
Command | Description
snmp-server enable traps outage | Enables sending COOL SNMP notifications.
snmp-server host | Specifies the recipient of an SNMP notification operation.
count event-storm
To set the threshold counter for event storms in objects monitored by component outage on-line (COOL) measurement, use the count event-storm command in cool parameter configuration mode. To remove the entry, use the no form of this command.
count event-storm number
no count event-storm {number}
Syntax Description
number
Indicates the event-storm threshold value. The default value is 5 events. The range is 1 to 100.
Defaults
The default event-storm count is 5.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
When the COOL measurement monitor detects an object failure, the event storm counter is incremented. If the number of failures for an object exceeds the event storm counter threshold for the period determined by the timer event-storm command, an event storm is detected. At that time, the COOL measurement manager stops writing event information to event tables in memory on the networking device until the event count becomes less than the threshold. This prevents event storm information from overloading the event tables in system memory and Flash (persistent) memory.
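The behavior described above can be sketched as a sliding-window counter. This is a minimal Python illustration, not the COOL implementation: the class name, the per-object window, and the choice to keep counting failures while logging is suppressed are assumptions. The defaults of 5 events and 1 second are the default values shown by the show cool parameters command.

```python
from collections import deque


class EventStormDetector:
    """Sliding-window failure counter for a single monitored object."""

    def __init__(self, threshold=5, period_seconds=1.0):
        self.threshold = threshold
        self.period_seconds = period_seconds
        self._failure_times = deque()

    def record_failure(self, timestamp):
        """Register a failure and report whether the event should be logged."""
        self._failure_times.append(timestamp)
        # Forget failures that fall outside the check period.
        while timestamp - self._failure_times[0] >= self.period_seconds:
            self._failure_times.popleft()
        # Logging is suppressed while the count exceeds the threshold.
        return len(self._failure_times) <= self.threshold


detector = EventStormDetector(threshold=5, period_seconds=1.0)
# Seven failures 0.1 seconds apart, then one failure after a quiet gap.
burst = [detector.record_failure(0.1 * i) for i in range(7)]
after_gap = detector.record_failure(5.0)
print(burst, after_gap)
```

In this sketch the sixth and seventh failures of the burst are not logged, and logging resumes once the count within the check period falls back to the threshold or below.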
Examples
The following example shows how to set the event-storm threshold counter to 10:
count event-storm 10
Related Commands
show cool event-table
To display the outage event history for objects monitored by component outage on-line (COOL) measurement, use the show cool event-table command in privileged EXEC mode or user EXEC mode.
show cool event-table [number-of-events]
Syntax Description
number-of-events
(Optional) Displays a specified number of recent events. The argument may be any number up to 500.
Command Modes
Privileged EXEC
User EXEC
Command History
Usage Guidelines
The outage event table provides the outage information for each monitored object.
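As an illustration of how the event table relates to the object table counters, the following Python sketch parses event table rows and rebuilds AOT and NAF for completed outages. It assumes, based on the examples in this document, that the Interval of an UP event is the time the object spent DOWN and that each DOWN event counts as one failure. The function name and regular expression are hypothetical and cover only the row layout shown in the examples.

```python
import re
from collections import defaultdict

EVENT_ROW = re.compile(
    r"^(?P<index>\d+)\s+(?P<type>[A-Z]{3})\s+(?P<obj_id>\d+)\s+"
    r"(?P<event>UP|DOWN)\s+(?P<interval>\d+)\s+"
    r"(?P<time>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2})\s+(?P<name>.+)$"
)


def summarize_events(text):
    """Return {(type, obj_id): {"aot": seconds, "naf": count}} for completed outages."""
    totals = defaultdict(lambda: {"aot": 0, "naf": 0})
    for line in text.splitlines():
        match = EVENT_ROW.match(line.strip())
        if match is None:
            continue  # banner, column header, or blank line
        key = (match["type"], int(match["obj_id"]))
        if match["event"] == "DOWN":
            totals[key]["naf"] += 1
        else:
            # For an UP event the interval is the time spent DOWN.
            totals[key]["aot"] += int(match["interval"])
    return dict(totals)


sample = """\
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 LNT 1 DOWN 33 Nov 03 2005 18:44:29 Router-Device
2 LNT 1 UP 233 Nov 03 2005 18:48:22 Router-Device
3 LNT 3 DOWN 530 Nov 03 2005 18:52:46 MAX-IOS-SW8
4 LNT 3 UP 216 Nov 03 2005 18:56:22 MAX-IOS-SW8
"""
summary = summarize_events(sample)
print(summary)
```

An outage that is still in progress (a DOWN event with no matching UP event) contributes to NAF but not to AOT in this sketch, whereas the object table on the networking device reports a running AOT value for such an object.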
Examples
The following example shows the output of the show cool event-table command:
Router# show cool event-table
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 ENT 45 DOWN 42 Sep 19 2005 23:59:07 slot 1
2 ENT 45 UP 140 Sep 20 2005 00:01:27 slot 1
3 INF 4 DOWN 29591 Sep 20 2005 08:05:56 GigabitEthernet7/0
The following example shows only type 1 (physical) entities in the COOL measurement event table:
Router# show cool event-table 1
**** COOL Event Table ****
Index Type Obj-ID Event Interval Event-Time Object-Name
1 ENT 45 DOWN 42 Sep 19 2005 23:59:07 slot 1
2 ENT 45 UP 140 Sep 20 2005 00:01:27 slot 1
Table 3 describes the fields shown in the display.
Related Commands
show cool object-table
To display monitored objects and their status, use the show cool object-table command in privileged EXEC mode or user EXEC mode.
show cool object-table [object-type] detail
Syntax Description
object-type
(Optional) Object being monitored: interface (1), physical-object (2), process (3), remote object (4), logical object (5).
Command Modes
Privileged EXEC
User EXEC
Command History
Usage Guidelines
Use the show cool object-table command to display the list of monitored objects with their object index numbers and outage information.
Examples
The following is sample output from the show cool object-table command:
Router#show cool object
**** COOL Object Table ****
Type Index Status AOT NAF LAST-Change-Time Object-Name
LNT 1 UP 0 0 Sep 19 2005 23:52:31 Router-Device
LNT 2 UP 0 0 Sep 19 2005 23:52:31 MIN-IOS-SW8
LNT 3 UP 0 0 Sep 19 2005 23:52:31 MAX-IOS-SW8
LNT 4 UP 0 0 Sep 19 2005 23:52:31 MIN-IOS-SW9
LNT 5 UP 0 0 Sep 19 2005 23:52:31 MAX-IOS-SW9
INF 4 DOWN 106 1 Sep 20 2005 08:05:56 GigabitEthernet7/0
INF 5 DOWN 29685 1 Sep 19 2005 23:52:58 GigabitEthernet7/1
ENT 45 UP 140 1 Sep 20 2005 00:01:27 slot 1
The following is sample output showing detailed information on type 1 (physical) entities:
Router#sh cool obj 1 detail
**** COOL Detailed Object Table ****
-----------------------------------------------
Object Type: ENT
Object Index: 45
Object Status: UP
Object Start Time: Sep 19 2005 23:58:25
Object Last Change Time: Sep 20 2005 00:01:27
Object Name: slot 1
Serial Name: SAD072104KM
AOT: 140
NAF: 1
Slot Number: 1
Table Status: Active
-----------------------------------------------
Table 4 describes the fields shown in the displays.
show cool parameters
To display component outage on-line (COOL) measurement parameters, use the show cool parameters command in privileged EXEC mode or user EXEC mode.
show cool parameters
Syntax Description
This command has no keywords or arguments.
Command Modes
Privileged EXEC
User EXEC
Command History
Usage Guidelines
Use the show cool parameters command to display the default and currently configured COOL measurement parameters.
Examples
The following is sample output from the show cool parameters command:
Router# show cool parameters
Time Stamp Period: Configured (30 seconds) Default (30 seconds)
Flash Update Period: Configured (15 minutes) Default (15 minutes)
Duration Threshold: Configured (8 seconds) Default (8 seconds)
Event Storm Count Number: Configured (5 times) Default (5 times)
Event Storm Check Period: Configured (1 second) Default (1 second)
Max Event History Table Size: Configured (500 entries) Default (500 entries)
Table 4 describes the fields shown in the display.
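When the output of show cool parameters is collected by a script, the configured and default values can be compared automatically. The following Python sketch parses the display format shown in this document and lists the parameters that differ from their defaults. The function names and the regular expression are hypothetical, and the sample text is adapted from the outputs in this document, with the event table size set to 200 entries.

```python
import re

PARAM_ROW = re.compile(
    r"^(?P<name>[^:]+):\s+Configured \((?P<configured>\d+) (?P<unit>\w+)\)\s+"
    r"Default \((?P<default>\d+) \w+\)$"
)


def parse_cool_parameters(text):
    """Map each parameter name to its configured value, default value, and unit."""
    params = {}
    for line in text.splitlines():
        match = PARAM_ROW.match(line.strip())
        if match:
            params[match["name"]] = {
                "configured": int(match["configured"]),
                "default": int(match["default"]),
                "unit": match["unit"],
            }
    return params


def non_default(params):
    """Return the names of parameters whose configured value differs from the default."""
    return [name for name, p in params.items() if p["configured"] != p["default"]]


sample = """\
Time Stamp Period: Configured (30 seconds) Default (30 seconds)
Flash Update Period: Configured (15 minutes) Default (15 minutes)
Duration Threshold: Configured (8 seconds) Default (8 seconds)
Event Storm Count Number: Configured (5 times) Default (5 times)
Event Storm Check Period: Configured (1 second) Default (1 second)
Max Event History Table Size: Configured (200 entries) Default (500 entries)
"""
params = parse_cool_parameters(sample)
print(non_default(params))
```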
size event-table
To set the size of the event table that is maintained for objects being monitored by component outage on-line (COOL) measurement, use the size event-table command in cool parameter configuration mode. To remove a table size configuration entry, use the no form of this command.
size event-table number
no size event-table number
Syntax Description
number
Specifies the maximum number of entries in the event table. The default size of the event table is 500 entries. The range is 1 to 5000 entries.
Defaults
The default setting is 500 entries.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
The COOL event history table size can be configured for 1 to 5000 entries. The default size is 500. An event occurs any time a monitored object changes state from UP to DOWN or from DOWN to UP. Changes to the history table size have the following effects:
•
If the new table size is larger than the previous table size, the new table keeps all the existing event history entries.
•
If the new table size is smaller than the previous table size and the table holds more entries than the new size allows, the oldest event history entries are dropped.
•
If memory cannot be allocated for the new event table (the resize fails), the previous event history entries are lost.
•
The new event history table size is stored in the persistent file immediately after the configuration change.
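The grow and shrink cases described above can be sketched with a bounded queue. This Python fragment is only an illustration of the retention rule; the function name is hypothetical, and the memory-allocation failure case and the persistent file are not modeled.

```python
from collections import deque


def resize_event_table(events, new_size):
    """Return a new bounded event table holding at most new_size entries.

    When the existing history does not fit, the oldest entries are dropped,
    matching the shrink case described above.
    """
    if not 1 <= new_size <= 5000:
        raise ValueError("event table size must be between 1 and 5000")
    # deque(maxlen=n) keeps only the last n items of the iterable.
    return deque(events, maxlen=new_size)


table = deque(range(1, 501), maxlen=500)   # 500 events, oldest first
grown = resize_event_table(table, 2000)    # growing keeps every entry
shrunk = resize_event_table(table, 200)    # shrinking keeps the newest 200
print(len(grown), len(shrunk), shrunk[0])
```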
Examples
The following example shows how to change the size of the event table to 2000:
size event-table 2000
Related Commands
Command | Description
show cool parameters | Displays a list of component outage on-line (COOL) measurement parameters and their settings.
snmp-server enable traps outage
To enable the sending of component outage on-line (COOL) measurement Simple Network Management Protocol (SNMP) notifications, use the snmp-server enable traps outage command in global configuration mode. To disable sending COOL SNMP notifications, use the no form of this command.
snmp-server enable traps outage
no snmp-server enable traps outage
Syntax Description
This command has no arguments or keywords.
Defaults
Disabled.
Command Modes
Global configuration.
Command History
Examples
The following example shows how to enable sending of COOL SNMP notifications to an SNMP server:
snmp-server enable traps outage
Related Commands
Command | Description
cool run | Enables COOL measurement.
snmp-server host | Specifies the recipient of an SNMP notification operation.
snmp-server host
To specify the recipient of a Simple Network Management Protocol (SNMP) notification operation, use the snmp-server host global configuration command. To remove the specified host, use the no form of this command.
snmp-server host host-address [traps | informs] [version {1 | 2c | 3 [auth | noauth | priv]}] community-string [udp-port port] [notification-type] [outage]
no snmp-server host host [traps | informs]
Syntax Description
Defaults
This command is disabled by default. No notifications are sent.
If you enter this command with no keywords, the default is to send all trap types to the host. No informs will be sent to this host.
If no version keyword is present, the default is version 1. The no snmp-server host command with no keywords will disable traps, but not informs, to the host. In order to disable informs, use the no snmp-server host informs command.
Note
If the community-string is not defined using the snmp-server community command prior to using this command, the default form of the snmp-server community command will automatically be inserted into the configuration. The password (community-string) used for this automatic configuration of the snmp-server community will be the same as specified in the snmp-server host command. This is the default behavior for Cisco IOS Release 12.0(3) and later.
Command Modes
Global configuration
Command History
Usage Guidelines
SNMP notifications can be sent as traps or inform requests. Traps are unreliable because the receiver does not send acknowledgments when it receives traps. The sender cannot determine if the traps were received. However, an SNMP entity that receives an inform request acknowledges the message with an SNMP response. If the sender never receives the response, the inform request can be sent again. Thus, informs are more likely to reach their intended destination.
However, informs consume more resources in the agent and in the network. Unlike a trap, which is discarded as soon as it is sent, an inform request must be held in memory until a response is received or the request times out. Also, traps are sent only once, while an inform may be retried several times. The retries increase traffic and contribute to a higher overhead on the network.
If you do not enter an snmp-server host command, no notifications are sent. In order to configure the networking device to send SNMP notifications, you must enter at least one snmp-server host command. If you enter the command with no keywords, all trap types are enabled for the host.
In order to enable multiple hosts, you must issue a separate snmp-server host command for each host. You can specify multiple notification types in the command for each host.
When multiple snmp-server host commands are given for the same host and kind of notification (trap or inform), each succeeding command overwrites the previous command. Only the last snmp-server host command will be in effect. For example, if you enter an snmp-server host inform command for a host and then enter another snmp-server host inform command for the same host, the second command will replace the first.
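The overwrite rule can be pictured as a table keyed by host and notification kind. The following Python sketch is a simplified model under that assumption; the function name and the tuple layout are hypothetical and are not part of Cisco IOS.

```python
def apply_host_commands(commands):
    """Return the effective configuration after a sequence of commands.

    Each command is a (host, kind, notification_types) tuple, where kind
    is "trap" or "inform". A later command for the same (host, kind)
    pair replaces the earlier one instead of being merged with it.
    """
    table = {}
    for host, kind, types in commands:
        table[(host, kind)] = set(types)   # last command wins
    return table


effective = apply_host_commands([
    ("myhost.cisco.com", "inform", ["hsrp"]),
    ("myhost.cisco.com", "inform", ["bgp"]),   # replaces the hsrp entry
    ("myhost.cisco.com", "trap", ["snmp"]),    # different kind, kept
])
```

Here the host keeps one inform entry (bgp only) and one trap entry, because the two inform commands share the same host and kind.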
The snmp-server host command is used in conjunction with the snmp-server enable command. Use the snmp-server enable command to specify which SNMP notifications are sent globally. For a host to receive most notifications, at least one snmp-server enable command and the snmp-server host command for that host must be enabled.
However, some notification types cannot be controlled with the snmp-server enable command. For example, some notification types are always enabled. Other notification types are enabled by a different command. For example, the linkUpDown notifications are controlled by the snmp trap link-status command. These notification types do not require an snmp-server enable command.
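As a rough model of how the two commands interact, the notifications that reach a host are those that are both enabled globally and selected for that host. The Python sketch below makes that assumption explicit. It deliberately ignores the notification types that are always enabled or that are controlled by another command, and the function name is hypothetical.

```python
def types_sent(globally_enabled, host_types):
    """Return the notification types that would be sent to one host.

    globally_enabled: types turned on with snmp-server enable commands.
    host_types: types named in the snmp-server host command; an empty
    collection models the command entered with no keywords, which
    selects all types for the host.
    """
    enabled = set(globally_enabled)
    if not host_types:
        return enabled
    return enabled & set(host_types)
```

For example, with only bgp enabled globally and only isdn named for the host, the result is empty, which matches the BGP and ISDN case shown in the Examples section.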
A notification-type option's availability depends on the networking device type and the Cisco IOS software features supported on the networking device. For example, the envmon notification type is available only if the environmental monitor is part of the system. To see which notification types are available on your system, use the command help (?) at the end of the snmp-server host command.
Examples
If you want to configure a unique SNMP community string for traps, but you want to prevent SNMP polling access with this string, the configuration should include an access list. In the following example, the community string is named comaccess and the access list is numbered 10:
Router(config)# snmp-server community comaccess ro 10
Router(config)# snmp-server host 172.20.2.160 comaccess
Router(config)# access-list 10 deny any
The following example sends RFC 1157 SNMP traps to the host specified by the name myhost.cisco.com. Other traps are enabled, but only SNMP traps are sent because only snmp is specified in the snmp-server host command. The community string is defined as comaccess.
Router(config)# snmp-server enable traps
Router(config)# snmp-server host myhost.cisco.com comaccess snmp
The following example sends the SNMP and Cisco environmental monitor enterprise-specific traps to address 172.30.2.160:
Router(config)# snmp-server enable traps snmp
Router(config)# snmp-server enable traps envmon
Router(config)# snmp-server host 172.30.2.160 public snmp envmon
The following example enables the networking device to send all traps to the host myhost.cisco.com using the community string public:
Router(config)# snmp-server enable traps
Router(config)# snmp-server host myhost.cisco.com public
The following example will not send traps to any host. The BGP traps are enabled for all hosts, but only the ISDN traps are enabled to be sent to a host.
Router(config)# snmp-server enable traps bgp
Router(config)# snmp-server host bob public isdn
The following example enables the networking device to send all inform requests to the host myhost.cisco.com using the community string public:
Router(config)# snmp-server enable traps
Router(config)# snmp-server host myhost.cisco.com informs version 2c public
The following example sends HSRP MIB informs to the host specified by the name myhost.cisco.com. The community string is defined as public.
Router(config)# snmp-server enable traps hsrp
Router(config)# snmp-server host myhost.cisco.com informs version 2c public hsrp
The following example enables the host myhost.cisco.com to receive component outage on-line (COOL) measurement information as SNMP traps. The community string is defined as public.
Router(config)# snmp-server host myhost.cisco.com public outage
Related Commands
timer event-storm
To set the time interval used by component outage on-line (COOL) measurement to detect event storms caused by multiple object failures, use the timer event-storm command in COOL parameter configuration mode. To remove the specified time interval, use the no form of this command.
timer event-storm seconds
no timer event-storm seconds
Syntax Description
seconds
Specifies the time interval used by the COOL measurement monitor to detect an event storm. The default interval is 1 second. The range is 1 to 5 seconds.
Defaults
The default setting is one second.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
When the COOL measurement monitor detects an object failure, an event storm counter is incremented. If the number of object failures exceeds the event storm counter threshold as determined by the count event-storm command for the period determined by the timer event-storm command, an event storm is detected. The COOL monitor continues to check for event storms at the configured interval. The COOL measurement manager stops writing event information to event tables in memory until the event count becomes less than the threshold for the configured interval. This prevents event storm information from overloading the event tables in system memory and Flash (persistent) memory.
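The detection rule can be illustrated with a small Python sketch that counts failures in fixed windows of the configured length. This is an interpretation of the paragraph above, not the COOL implementation. The function name is hypothetical, the threshold value stands in for whatever count event-storm sets (its default is not stated here), and treating a count equal to the threshold as "no storm" is an assumption.

```python
from collections import Counter


def storm_windows(failure_times, interval_s=1, threshold=5):
    """Flag each window in which the failure count exceeds the threshold.

    failure_times: failure timestamps in seconds.
    interval_s: window length, as set by timer event-storm.
    threshold: hypothetical stand-in for the count event-storm value.
    Returns {window_index: True if an event storm is detected}.
    """
    counts = Counter(int(t // interval_s) for t in failure_times)
    return {window: n > threshold for window, n in sorted(counts.items())}


# Four failures in the first second, one failure two seconds later.
result = storm_windows([0.1, 0.2, 0.3, 0.4, 2.5], interval_s=1, threshold=3)
```

In this sample the first window is flagged as a storm and the later window is not, so under the rule described above event recording would resume once the count falls back below the threshold.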
Examples
The following example shows how to configure the event storm detection interval to 2 seconds:
timer event-storm 2
Related Commands
timer flash-file
To set the time interval for writing component outage on-line (COOL) measurement event information from memory to Flash memory, use the timer flash-file command in cool parameter configuration mode. To remove a timer configuration entry, use the no form of this command.
timer flash-file minutes
no timer flash-file minutes
Syntax Description
minutes
Specifies the time interval for writing COOL event information from memory to Flash memory. The default interval is 15 minutes. The range is 10 to 60 minutes.
Defaults
The default setting is 15 minutes.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
COOL measurement supports two levels of data storage. It keeps the event-driven outage data temporarily in memory and periodically writes the entire outage data table and object table to the Flash disk (persistent storage) at the interval specified by the timer flash-file command. A Flash disk is generally larger than memory (NVRAM), but Flash memory supports only a limited number of read/write operations. NVRAM storage is relatively small compared with Flash memory, but it has no read/write limitation.
To further limit write operations to the Flash disk, data is written to Flash memory only if new events are detected in outage or object tables.
Administrators can use these parameters to tune COOL measurement and storage behavior to match site dependencies and requirements.
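The two-level storage scheme amounts to a periodic flush that is skipped when nothing has changed. The Python sketch below models only that idea. The class, its attributes, and its method names are hypothetical, and the timer itself is represented by explicit calls to on_flash_timer.

```python
class OutageStore:
    """Toy model of in-memory event storage with periodic Flash writes."""

    def __init__(self):
        self.memory_events = []   # event-driven data held in memory
        self.dirty = False        # True when memory holds unsaved events
        self.flash_writes = 0     # number of writes made to Flash

    def record(self, event):
        """Store a new outage or object event in memory."""
        self.memory_events.append(event)
        self.dirty = True

    def on_flash_timer(self):
        """Run once per timer flash-file interval.

        Returns True if the tables were written to Flash, or False if
        the write was skipped because no new events were detected.
        """
        if not self.dirty:
            return False
        self.flash_writes += 1
        self.dirty = False
        return True
```

With this model, a quiet device that records no events performs no Flash writes at all, regardless of how short the configured interval is.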
Examples
The following example shows how to change the interval for writing event information to Flash disk to 10 minutes:
timer flash-file 10
Related Commands
Command
Description
show cool parameters
Displays a list of component outage on-line (COOL) measurement parameters and their settings.
timer duration
To set the minimum time that must pass before an object that is being monitored by component outage on-line (COOL) measurement can change state after an outage or recovery is detected, use the timer duration command in COOL parameter configuration mode. To remove a timer configuration entry, use the no form of this command.
timer duration seconds
no timer duration seconds
Syntax Description
Defaults
The default setting is 8 seconds.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
At any given time, each object that is monitored by COOL measurement exists in one of two states, UP or DOWN. When the COOL measurement monitor detects an object failure (outage) or recovery, the event start time is recorded. When the outage has ended, the end time is also recorded; however, the object's state is not changed until the timer duration interval (start time to end time) has passed. For example, if an outage is shorter than the duration threshold, it is not considered an outage, but if the outage time exceeds the duration threshold, the state is changed from UP to DOWN.
Similarly, when an object recovers, the recovery time is recorded. If that object experiences an outage before the timer duration threshold has passed, the event is not considered a recovery, and the state is not changed from DOWN to UP. If the recovery exceeds the timer duration, the state is deemed UP.
Administrators can use these parameters to adjust COOL measurement to match site dependencies and requirements for calculating mean time between failures (MTBF) or mean time to repair (MTTR) values.
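The state-change rule behaves like a debounce filter on the UP and DOWN states. The following Python sketch is one reading of the description above, not the COOL implementation. The function name is hypothetical, the default of 8 seconds is the documented default, and leaving the state unchanged when the condition lasts exactly as long as the threshold is an assumption.

```python
def next_state(current, condition_start, condition_end, duration_s=8):
    """Return the object state after an outage or recovery condition.

    current: "UP" or "DOWN" before the condition began.
    condition_start, condition_end: times in seconds. The condition is
    an outage when current is "UP" and a recovery when it is "DOWN".
    The state flips only if the condition outlasts duration_s.
    """
    lasted = condition_end - condition_start
    if lasted <= duration_s:
        return current            # too short: ignored as a transient
    return "DOWN" if current == "UP" else "UP"
```

For instance, with the default threshold a 3-second outage leaves an object UP, while a 10-second outage moves it to DOWN.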
Examples
The following example shows how to set the timer duration to 10 seconds:
timer duration 10
Related Commands
Command
Description
show cool parameters
Displays a list of component outage on-line (COOL) measurement parameters and their settings.
timer timestamp-file
To set the time interval for updating the component outage on-line (COOL) measurement timestamp file, use the timer timestamp-file command in COOL parameter configuration mode. To remove the timer timestamp-file configuration, use the no form of this command.
timer timestamp-file {seconds}
no timer timestamp-file {seconds}
Syntax Description
seconds
Indicates the interval at which the system updates the timestamp file in Flash memory (persistent storage). The default interval is 30 seconds. The interval range is 1 to 100 seconds.
Defaults
The default timer setting is 30 seconds.
Command Modes
COOL parameter configuration
Command History
Usage Guidelines
At intervals configured by the timer timestamp-file command, the COOL measurement manager writes the system time to Flash memory to record the last system up time. During system restart, COOL reads the timestamp file and calculates the system outage by subtracting the last recorded system up time from the current system time. The timestamp interval is configurable so that the outage measurement gap stays within the outage resolution required for mean time between failures (MTBF) calculations at the site.
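The restart calculation is a simple subtraction, and the timestamp interval bounds how far the result can be off. The Python sketch below illustrates this under stated assumptions. The function name is hypothetical, and the error bound reflects an inference that the system may have stayed up for at most one full interval after the last timestamp was written.

```python
def restart_outage(last_timestamp, restart_time, interval_s=30):
    """Estimate the system outage measured at restart.

    last_timestamp: last system up time read from the timestamp file.
    restart_time: current system time at restart (same units, seconds).
    interval_s: timestamp interval set by timer timestamp-file.
    Returns (estimated_outage_s, max_overestimate_s).
    """
    outage = restart_time - last_timestamp
    # The system may have run for up to one interval after the last
    # write, so the estimate can exceed the true outage by that much.
    return outage, interval_s
```

A shorter interval tightens the bound at the cost of more frequent writes to Flash memory.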
Examples
The following example shows how to set the time stamp interval to 60 seconds:
timer timestamp-file 60
Related Commands
Command
Description
show cool parameters
Displays a summary of the configured and default component outage on-line (COOL) measurement parameter settings.
Copyright © 2006 Cisco Systems, Inc. All rights reserved.







