Remote Monitoring Suite Overview
The Remote Monitoring Suite (RMS) components include the Listener, LGMapper, LGArchiver, and AlarmTracker Client.
This chapter provides:
•Background information on the security and user rights issues that arise when the Remote Monitoring Suite components and your configuration databases work together in a network.
•The functional architecture of the Remote Monitoring Suite.
•A brief description of the components you use to configure and maintain Listener, LGMapper, LGArchiver, and AlarmTracker Client.
The Distributed Diagnostics and Services Network (DDSN) is a support architecture facility that gathers event and message information from multiple systems at a central point. Service provider personnel can then monitor this system information, react to urgent problems immediately, and examine a system activity history to discover chronic problems.
In this scenario, each system runs a facility that detects and reports any unusual conditions or events that occur.
These events and messages range from informational messages to reports of serious errors. This information is passed on to a process called the Listener (Figure 1-1).
The Listener typically runs stand-alone at the (network) service provider site. A single Listener can receive events from multiple systems. Depending on the installation, the systems might connect to the Listener via a modem and a dial-up connection using the Windows Remote Access Service (RAS) or via a direct network connection.
LGMapper is a server that accepts data from the Listener process and maps Listener objects into a preconfigured object hierarchy. This server also caches managed object attribute data, and updates connected AlarmTracker Client applications with new event and alarm data. This event mechanism gives quick notification to network service provider representatives when a problem occurs.
Figure 1-1 provides an overview of the Remote Monitoring Suite error reporting process.
Figure 1-1 Error Reporting Overview
ICM, ISN, or other Cisco products send event information to the Listener application. These products inform the Listener application of any significant errors or unexpected conditions they encounter.
Four processes on the ICM Logger (or SDDSN) handle error reporting. Two of the processes are used for remote monitoring and are discussed here. These processes are the:
•Customer Support Forwarding Service (CSFS)
–Receives events, filters them, and forwards the events to other processes that request the data.
•DDSN Transfer Process (DTP)
–Receives data from CSFS
–Transfers the events and export files to the machine running the Listener. It uses either a dial-up connection and the Remote Access Service (RAS) or a direct network connection. The Listener stores the events in a customer-specific directory on its machine.
Event messages received by the Listener include information about when and where the error occurred and the full message as reported on the event feed.
The DTP process keeps EMS events in memory until they are delivered to the Listener. To minimize the traffic to the Listener, and particularly the number of RAS connections needed over time, messages are batched together and sent periodically. However, if the DTP process receives a high priority event, it immediately sends the event to the Listener. If an attempt to establish a RAS connection fails, the DTP process periodically tries to re-establish the RAS connection.
The DTP process checks to see if there are EMS events to be processed. When there are new events, the DTP process sends the events to the Listener, establishing a RAS connection if necessary.
Note You can configure the time interval for the DTP process to check for EMS events; thirty minutes is the default setting.
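The batching behavior described above can be sketched as follows. This is a minimal illustration, not product code; the class and method names are hypothetical, and the real DTP also manages RAS connection setup and retry, which is omitted here.

```python
HIGH_PRIORITY = 1  # illustrative priority flag; the real event feed has its own levels

class DtpBatcher:
    """Sketch of the DTP batching behavior: ordinary events are held in
    memory and flushed on a periodic check, while high-priority events
    are sent to the Listener immediately."""

    def __init__(self, send_to_listener):
        # send_to_listener is a callable that delivers a list of events
        self.send_to_listener = send_to_listener
        self.pending = []

    def receive(self, event, priority):
        if priority == HIGH_PRIORITY:
            # High-priority events bypass the batch and go out immediately.
            self.send_to_listener([event])
        else:
            # Ordinary events wait for the next periodic check to minimize
            # traffic (and the number of RAS connections) to the Listener.
            self.pending.append(event)

    def periodic_check(self):
        # Runs on the configured interval (thirty minutes by default);
        # sends any batched events in one transfer.
        if self.pending:
            batch, self.pending = self.pending, []
            self.send_to_listener(batch)
```

In this sketch, `periodic_check` would be driven by a timer set to the configured interval.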
Remote Monitoring Suite Functional Architecture
Figure 1-2 illustrates the overall functional architecture of the Remote Monitoring Suite.
Figure 1-2 Remote Monitoring Suite Functional Architecture
The major Remote Monitoring Suite components are discussed in Table 1-1.
LGMapper and LGArchiver Differences
LGArchiver operation is very similar to LGMapper operation: both connect to the Listener as clients, and both manage databases containing object mappings and Alarm Objects. Their purposes, however, are quite different.
The LGMapper server is designed to serve multiple AlarmTracker Client applications. It processes Listener event messages and manages Alarm Objects in the database, and it must notify each active AlarmTracker Client of relevant Listener events. In addition, AlarmTracker Clients make direct queries against the Alarms Database in order to update their displays. Since interactive client response time is a requirement, it is recommended that the time history of Alarm Objects be set to a reasonable value to guarantee adequate response time. This time history is set using the LGMapperCnfg tool; its value should depend on your needs and on the number of customers and products you need to monitor. The default value used when the product is installed is 7 days (168 hours).
The LGArchiver server, on the other hand, has no clients attached to it. Its primary purpose is to process Listener events and manage the Archived Alarms Database. It is intended to archive alarms over a longer period so you can run trend analysis and report against it. For this reason, the default alarms time history setting for the LGArchiver is 30 days. This value can be changed using the LGArchiverCnfg tool.
Samples of web-based reports that can be run against the Archived Alarms Database can be found on the Operations Support CD-ROM in the Samples > Reporting subdirectory.
The system design does not limit LGMappers to a one-to-one relationship with Listener processes. From the Listener point of view, an LGMapper is a client. Thus, the connection between the Listener and the LGMapper can cross a network boundary.
The Listener design limits the number of Listener processes to two. To achieve fault tolerance, these processes can be distributed in a LAN or at different sites connected via a WAN.
The LGMapper processes can also be distributed, and they are not limited to two instances. Judicious placement of LGMapper servers optimizes the use of network bandwidth. For example, if you plan to set up a support center at a site that is remote from the Listener locations, it makes sense to place one or more LGMapper servers in the LAN at the support center so that AlarmTracker Client connections are always in the LAN. That way, only one or two WAN connections are needed from the LGMapper to the Listener. Co-locating the LGMapper server with the Listener, by contrast, would require a WAN connection for each AlarmTracker Client.
Fault Tolerant Functional Architecture
Figure 1-3 illustrates the overall functional architecture of the system in the context of operating in a fault-tolerant environment.
Figure 1-3 Fault Tolerant Functional Architecture
Note the following features in Figure 1-3:
•Redundant Alarms Databases
–They provide fault tolerance when one of the hosts running a Listener/LGMapper/Alarms Database is unreachable.
•Each AlarmTracker Client is capable of communicating with two running LGMapper servers, but to conserve network bandwidth, only one of the servers is designated as primary (or active).
–While the AlarmTracker Client is connected to the LGMapper servers, the primary LGMapper server performs all communication between the two processes. In the event of a primary server failure, the AlarmTracker Client switches over to the backup (or standby) server.
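The primary/standby switchover can be sketched as follows. This is an illustration of the failover notion only; the class name and callables are hypothetical stand-ins for the real client/server protocol.

```python
class AlarmTrackerConnection:
    """Sketch of primary/standby failover: all traffic goes to the
    designated primary server to conserve bandwidth, and the client
    switches to the standby server only when the primary fails."""

    def __init__(self, primary, standby):
        # Each server is modeled as a callable that handles a request.
        self.servers = [primary, standby]
        self.active = 0  # index of the server currently handling traffic

    def query(self, request):
        try:
            # Normal case: only the primary (active) server is used.
            return self.servers[self.active](request)
        except ConnectionError:
            # Primary failure: switch over to the backup (standby) server
            # and retry the request there.
            self.active = 1 - self.active
            return self.servers[self.active](request)
```

A real client would also need to detect when the failed server returns to service; that logic is omitted here.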
Further discussion on the deployment of machines to achieve fault-tolerant performance is found in Fault Tolerant Considerations.
The LGArchiver Server is not shown in Figure 1-3. There is no fault-tolerant scheme for it. It should be installed on a high-availability server with little or nothing else running on it. You should perform standard SQL Server backups on the database to avoid data loss.
The Alarms Database used by the LGMapper and LGArchiver is a Microsoft SQL Server 7.0 database containing the information needed to map Listener objects to attributes of a node in a hierarchy of product objects. The database also contains a table of Instance Nodes which is populated by the LGMapper at runtime. This table is saved between LGMapper sessions and represents actual Instance Nodes created based on Rules that were successfully applied at runtime. In addition, the Alarms Database contains a history of Alarm Objects representing the past and current states of the monitored product.
Note The schema of the Alarms Database (see "Alarms Database Schema") connected to the LGMapper and the Archived Alarms Database connected to the LGArchiver are identical. Thus, the discussion of the Alarms Database applies to both databases.
New to the Remote Monitoring Suite is the ability to monitor products other than the ICM. During installation, the Alarms Database is populated with Object Identifier (OID) Nodes and Rules for supporting the Cisco ICM and ISN products. As other products are supported, updates to the tables that manage this information can be provided as hot fixes.
SQL Server 7.0 (Service Pack 2 or higher) is an integral part of the Remote Monitoring Suite. SQL Server 7.0 must be installed on every machine running the LGMapper or LGArchiver Server.
The Rule Mapping Process
The Alarms Database is installed with a pre-loaded set of OID Nodes and Rules used by the LGMapper and LGArchiver to map incoming Listener objects to a specific attribute in an Instance Node somewhere in the hierarchy. In the current version, OID Nodes and Rules are supplied for the Cisco ICM and ISN products. Other products can be added later as hot fixes. This information is stored in the database in the OIDNodes, Attributes, OIDNodeAttributes, and Rules tables.
As Listener events are received by the LGMapper, it uses the set of Rules to map the object to an attribute of a particular OID Node. It then uses the Rule contents to determine the Instance Node to which the object applies. Once the Attribute and Instance Node are determined, the LGMapper updates the Alarms Database and it updates the state of the affected node. This, in turn, may result in a state change in other nodes (specifically its parent node) since the Remote Monitoring Suite supports the notion of state roll up in the node hierarchy.
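The state roll-up notion described above can be sketched with a minimal node hierarchy. This is an illustration only; the class and attribute names are hypothetical, and the real LGMapper tracks node state in the Alarms Database rather than in plain objects.

```python
class InstanceNode:
    """Minimal stand-in for an Instance Node that rolls its state up
    to its parent, as described for the Remote Monitoring Suite."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.state = "up"
        if parent:
            parent.children.append(self)

    def set_state(self, state):
        # A state change on one node may change its parent's state too.
        self.state = state
        self.roll_up()

    def roll_up(self):
        # A parent is "down" if any of its children is down; the change
        # propagates upward through the hierarchy.
        if self.parent:
            self.parent.state = (
                "down" if any(c.state == "down" for c in self.parent.children)
                else "up"
            )
            self.parent.roll_up()
```

For example, marking a single PG node "down" rolls up and marks its customer node "down"; clearing it rolls the customer back to "up" if no sibling is down.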
Use the LGCnfg tool to modify the set of Rules and to create new OID Nodes for products. The LGCnfg tool can also define new products, although you should not typically need it for this purpose. As previously mentioned, the Alarms Database comes pre-loaded with the OID Nodes and Rules for ICM and ISN. However, if you are interested in creating your own object hierarchy, or modifying the existing one, the LGCnfg tool is provided for that purpose. If you plan to experiment and create your own Rules and/or OID object hierarchy, we strongly suggest you use the LGCnfg tool to create a new database for this purpose. More information on the use of this tool is given in the LGCnfg Tool section.
If you detect any Listener objects that are not mapped (unmapped objects appear in the Unmapped Objects Node under each customer product instance), or Listener objects you believe are mapped to the wrong Instance Node, contact the Cisco Technical Assistance Center (TAC) to report the problem.
When a Listener event message is processed, it is mapped by a Rule to an attribute of an OID Node. The Rule, along with the message content, is then used to determine the Instance Node to which it applies. If this Instance Node does not exist, it is created by the LGMapper software. When it is created, an entry in the InstanceNodes table is created and an in-memory copy of the node is created in the proper place in the hierarchy so that its state can be properly tracked. In addition, there is an associated table (the Customers table) which manages the set of known customers.
When the LGMapper initializes, the full set of Instance Nodes is read from the InstanceNodes table. Every Alarm Object must be associated with an attribute of an Instance Node. Thus, the Instance Nodes are persisted from session to session. As time goes on, you may find that you may want to prune this information as devices are retired or as customers are deleted. The LGCnfg tool is used to delete Instance Nodes as needed. More information on how to do this is provided in "LGCnfg Tool".
Note It is important to use the LGCnfg tool to modify the database contents and not SQL Server directly. This is because of the relationships that exist between the tables.
One of the most important purposes of the Alarms Database is to manage a set of Alarm Objects. An Alarm Object is defined as an object that generally indicates some type of failure condition for some component in a system. Typically, an Alarm Object is created by an event that signals or raises the alarm. The Alarm Object has a state consisting of the object being raised (down) or cleared (up), and an Assignment Status indicating the action TAC is taking in response to the alarm. An Alarm Object consists of one or more Listener Events indicating its state transitions.
Alarm Objects are persisted in two tables in the Alarms Database: the Alarms table (which stores information about each Alarm Object), and the Events table (which stores information about each Event that makes up the Alarm Object). In addition, a third table (the Simples table) stores a special kind of single-state Alarm Object. The Simples table stores Simple Events from the Listener, which are considered to be lower-priority events. Since these events are stateless and lower priority, they are separated from the main Alarms tables.
Another table (the ObjectState table) is used by LGMapper (but not LGArchiver) to store the current object state of all Listener objects. The ObjectState table contains a cross reference from the Listener ObjectName (qualified by ProductID and CustomerID) to the current Alarm Object referencing it.
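The cross-reference held in the ObjectState table can be pictured as a lookup keyed by the qualified object name. The sketch below is illustrative only; the function names are hypothetical, and the real table lives in SQL Server rather than in memory.

```python
# Illustrative stand-in for the ObjectState cross-reference: a Listener
# ObjectName, qualified by ProductID and CustomerID, resolves to the
# current Alarm Object that references it.
object_state = {}

def reference_alarm(product_id, customer_id, object_name, alarm_id):
    # Record which Alarm Object currently references this Listener object.
    object_state[(product_id, customer_id, object_name)] = alarm_id

def current_alarm(product_id, customer_id, object_name):
    # Look up the current Alarm Object, or None if the object is unknown.
    return object_state.get((product_id, customer_id, object_name))
```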
Not only do the LGMapper and LGArchiver maintain the current state of the set of Alarm Objects, they also maintain an archived history of closed Alarm Objects. A closed Alarm Object is one whose state is "up" and whose assignment status is "unassigned".
You can see both closed and open Alarm Objects in the Alarms View display using the AlarmTracker Client. Select a different filter to view just the open Alarm Objects.
LGMapper and LGArchiver both have a configuration setting called Alarms Objects History which manages the size of the Alarms Database so that it does not grow unbounded. This setting specifies how long closed Alarms Objects are maintained in the database before being purged.
The Alarms Objects History setting affects only closed Alarm Objects. Open Alarm Objects are never purged, regardless of how old they are. A closed Alarm Object is subject to being purged when the time it was closed is older than the value of the Alarms Objects History setting. Purging is done when the LGMapper or the LGArchiver starts up, and at occasional intervals while it is running.
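The purge rule can be sketched as follows. This is an illustration under assumed names: the dictionary keys and function names are hypothetical, and the definition of "closed" (state "up", assignment status "unassigned") is taken from the discussion above.

```python
from datetime import datetime, timedelta

def is_closed(alarm):
    # A closed Alarm Object is one whose state is "up" and whose
    # assignment status is "unassigned".
    return alarm["state"] == "up" and alarm["assignment"] == "unassigned"

def purge(alarms, history_hours, now):
    """Drop closed Alarm Objects whose close time is older than the
    Alarms Objects History setting. Open Alarm Objects are never
    purged, regardless of age."""
    cutoff = now - timedelta(hours=history_hours)
    return [a for a in alarms
            if not (is_closed(a) and a["closed_at"] < cutoff)]
```

With the LGMapper default of 168 hours (7 days), a closed alarm ten days old would be purged while any open alarm is kept.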
For the LGMapper, the initial default value is 7 days (168 hours). Use the LGMapperCnfg tool to change this setting if necessary. Since the LGMapper and AlarmTracker Client are considered to be tactical products, you can determine how long a history you want AlarmTracker Clients to see. The value you use may affect the response time of the AlarmTracker Clients. The larger the value, the longer it takes for each database query because the queries are performed against more data, and more data is returned across the wire in the result set. You must determine what setting is best for you and your environment.
For the LGArchiver, the initial default value is 30 days. LGArchiver real-time performance is less critical than that of the LGMapper; thus, the Alarms Objects History is expected to be longer for the LGArchiver. The recommended value depends solely on how much disk storage you are willing to allocate for the database. As a guideline, monitoring 160 ICM customers with a 30-day history requires a database of approximately 1.3 GBytes. The database size is directly proportional to the number of customers and products being monitored. Thus, if your installation is monitoring 40 customers, you could probably set a 120-day history and end up with a database of approximately the same size.
The following formula can be used to estimate the SQL Server database MDF file size:
[MDF File Size in MBytes] = 0.3 × [Number of Customers] × [Time History in Days]
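As a quick check, the formula can be evaluated directly. The function name below is illustrative only:

```python
def estimated_mdf_size_mbytes(num_customers, history_days):
    # The guideline formula above: roughly 0.3 MBytes of MDF file
    # per customer per day of alarm history retained.
    return 0.3 * num_customers * history_days
```

For example, 160 customers with a 30-day history and 40 customers with a 120-day history both work out to about 1440 MBytes, consistent with the guideline that the two configurations yield databases of approximately the same size.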
Remember, the database is purged so its size does not grow without bound. In addition, a check is made at startup, and once per week the database is completely re-indexed using the DBCC DBREINDEX T-SQL command.
Alarms Database Schema
We have already introduced the names of several of the tables maintained in the Alarms Database. Further documentation on the specific content of all tables in the database can be found in Appendix A (the Alarms Database Schema Description). This schema information is used to create specific queries and reports to perform trend analysis on the LGArchiver Alarms Database.
Accessing the LGArchiver Database
The LGArchiver Alarms Database is intended to perform longer-term Alarm archiving than the LGMapper Alarms Database. The schema is open and allows you to write customized queries and reports against this database to perform specific analyses or trend reporting.
When accessing the database, keep security in mind. By default, the three LGMapper user groups are given access to the database: the LGM Readers group is given read-only access, the LGM Users group is given read/write access, and the LGM Administrators group is given administrator rights. Edit these rights as you see fit, but make sure the LGMapper account retains administrator rights, since the LGArchiver Server runs under this account and does all the database work.
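A customized report against the Archived Alarms Database might look like the sketch below. The Alarms and Customers table names come from the schema discussion above, but the column names used here are assumptions; consult the Alarms Database Schema appendix for the real layout. SQLite stands in for SQL Server so the example is self-contained.

```python
import sqlite3

# Build a tiny stand-in for the Archived Alarms Database. Column names
# (CustomerID, Name, AlarmID, State) are illustrative assumptions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Alarms (AlarmID INTEGER PRIMARY KEY,
                         CustomerID INTEGER, State TEXT);
    INSERT INTO Customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO Alarms VALUES (10, 1, 'down'), (11, 1, 'up'), (12, 2, 'up');
""")

# A simple trend-style report: alarm count per customer, the kind of
# query the open schema allows you to write against the archive.
rows = con.execute("""
    SELECT c.Name, COUNT(*) AS AlarmCount
    FROM Alarms a JOIN Customers c ON a.CustomerID = c.CustomerID
    GROUP BY c.Name ORDER BY c.Name
""").fetchall()
```

Against the real database, such a query should be run under an account in the LGM Readers group, since reporting needs read access only.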
Fault Tolerant Considerations
Figure 1-3 shows a fault-tolerant deployment of the LGMapper Servers. Each LGMapper Server connects to a single Listener, so you can install an LGMapper Server on the same machine as the Listener. However, this should be done only after reading the Hardware Recommendations section that follows, which describes guidelines for deployment based on the estimated data volume. To achieve LGMapper fault tolerance, you must use at least two machines (four machines if you do not install LGMapper on the same machine as the Listener).
The LGArchiver is not designed to be fault tolerant; therefore, you can install it on any machine that meets the requirements indicated in Hardware Recommendations. For example, you might install it on one of the LGMapper machines. However, you should read the Hardware Recommendations section carefully before doing this. If you are supporting a large number of customers, you may find it necessary to put the LGArchiver on a separate machine in order for the LGMapper to provide adequate real-time response for AlarmTracker Clients. Because the LGArchiver is not fault tolerant, make sure you schedule backups of the database so you do not lose data in the event of a system or disk failure.
The system design does not limit LGMappers to a one-to-one relationship with Listener processes. In fact, from the Listener point of view, the LGMapper is just another client. This feature has some interesting implications when the architecture is viewed in a distributed environment.
The Listener design limits the number of Listener process to two. These processes may be distributed in a LAN or at different sites connected via a WAN to achieve fault tolerance. Likewise, the LGMapper processes can also be distributed, but are not limited to two instances. In fact, judicious placement of LGMapper Servers can optimize the use of network bandwidth. For example, if you intend to set up a support center at a site that is remote from the Listener locations, it makes sense to place one or more LGMapper Servers in the LAN at the support center so that AlarmTracker Client connections are always in the LAN. In this situation, only one or two WAN connections are needed from the LGMapper to the Listener (as opposed to co-locating the LGMapper Server with the Listener which results in having a WAN connection for each AlarmTracker Client). Figure 1-4 shows an example of this type of deployment.
Figure 1-4 Example Separated Support Center Deployment Strategy
It is difficult to precisely specify hardware requirements since the requirements are driven by the volume of data to be processed. From an LGMapper/LGArchiver point of view, the volume of data is directly proportional to the number of customers supported and the number of products monitored. However, the customer's configuration also affects the volume of data. One customer ICM installation may include 40 PGs whereas another might only have 4 PGs. This makes it difficult to make general statements about the hardware requirements.
However, as a minimum, an LGMapper/LGArchiver server should have at least 768 MBytes of memory and an 800 MHz CPU or better. Since SQL Server is a memory-intensive program, it always performs better on a machine with more memory.
For sites supporting fewer than 25 customers, it may be possible to install an LGMapper Server on the same machine as a Listener. For sites with more than 25 customers, it is strongly recommended that a separate machine be provided for each server process shown in Figure 1-4. For sites supporting fewer than 25 customers, the LGArchiver Server can run on the same machine as the LGMapper Server, but the Listener should run on a separate machine. If you do choose to run multiple servers on the same machine, the recommended minimum memory is 768 MBytes.
For sites supporting more than 100 customers, LGMapper Server machines must have a minimum of 768 MBytes of memory, with 1 GByte or more recommended. This ensures adequate AlarmTracker Client response time. LGArchiver Server machines must also have at least 768 MBytes of memory when supporting more than 100 customers.
AlarmTracker Client machines must have a minimum of 256 MBytes of memory and an 800 MHz processor or better. For sites supporting fewer than 25 customers, it may be possible to run AlarmTracker Clients with only 128 MBytes of memory.