Table of Contents
Managing Fault Alarms
Understanding Faults and Fault State Machines
Using the Alarms Summary
Configuring Resource Health Monitoring
Fault State Machine Notification
Troubleshooting Fault Management
Fault Management Reference
Alarms Tab
Network Monitoring Tab
Resource Fault State Machines Reference
Server Fault State Machines
Network Fault State Machines
Network Services Fault State Machines
Storage Fault State Machines
Managing Fault Alarms
VFrame issues fault alarms if a problem occurs with a managed physical resource or an operational virtual resource (such as a device or service network). You can view fault alarms within the application and configure e-mail or syslog notifications to enhance your ability to respond to faults that VFrame cannot resolve.
This chapter explains how to monitor and configure faults, and includes the following topics:
• Understanding Faults and Fault State Machines
• Using the Alarms Summary
• Configuring Resource Health Monitoring
• Fault State Machine Notification
• Troubleshooting Fault Management
• Fault Management Reference
Understanding Faults and Fault State Machines
VFrame defines several fault state machines to track the status of various types of resources, both physical (real devices) and logical (defined in a service network). Each fault state machine includes a set of states. Whenever a managed resource enters or leaves a state, VFrame issues a fault alarm. These alarms appear on the Alarms tab (select Tools > Alarms).
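As a rough mental model, a fault state machine can be pictured as a current state plus a rule that emits one alarm record per transition, carrying fields like those on the Current tab of the Alarms tab (Attribute, Status, Previous State, Time). The following Python sketch illustrates that model only; the class and field names are hypothetical and are not a VFrame API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class FaultAlarm:
    element: str          # physical resource or service element name
    attribute: str        # fault state machine name ("Attribute" on the Current tab)
    status: str           # state entered ("Status")
    previous_state: str   # state left ("Previous State")
    time: datetime        # when the transition was observed ("Time")


@dataclass
class FaultStateMachine:
    name: str
    states: set[str]
    state: str
    alarms: list[FaultAlarm] = field(default_factory=list)

    def observe(self, element: str, new_state: str) -> FaultAlarm | None:
        """Record an observed state; issue an alarm only on a transition."""
        if new_state not in self.states:
            raise ValueError(f"unknown state {new_state!r} for {self.name}")
        if new_state == self.state:
            return None  # still in the same state: no alarm
        alarm = FaultAlarm(element, self.name, new_state, self.state,
                           datetime.now(timezone.utc))
        self.state = new_state
        self.alarms.append(alarm)
        return alarm


# Example: the Heartbeat state machine described in Table 15-3
heartbeat = FaultStateMachine("Heartbeat", {"OK", "Mapped Virtual Server Failed"}, "OK")
heartbeat.observe("server-01", "Mapped Virtual Server Failed")  # issues one alarm
```

Observing the same state twice produces no second alarm, which is why a resource that stays in one state does not flood the alarms summary.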
Fault state machines track the health of the devices in the network. Fault alarms alert you to important changes in the network, such as hardware errors or configuration change errors. These faults can be detected during normal operations or during device rediscovery or reinventory.
VFrame automatically handles many fault alarms, particularly those relating to virtual resources defined in service networks. However, fault alarms for physical devices sometimes require intervention by operations personnel. For example, if VFrame can no longer get a heartbeat signal from a physical server, you might need to fix a physical problem such as a disconnected Ethernet cable. After you fix the problem, VFrame can clear the fault through its normal monitoring or inventory process.
You cannot explicitly clear a fault. After you fix the problem, VFrame determines whether the fault is cleared. In some cases, you might need to clear the configuration status of a configuration problem identified on the Configuration Results tab of the Operations tab (select View > Operations). If the fault is for a service network whose status is Deployment Blocked, you probably also need to redeploy the network. For server faults, you might need to restart the server.
If you unmanage a device, VFrame clears all active faults on the device.
Whether the fault state machine is for logical devices defined in a service network or for physical resources, you can set up notifications so that VFrame e-mails the appropriate users when faults of your selected severity, or worse, occur. For example, you can use e-mail-based paging facilities to alert operators to critical faults so that operators do not need to watch the alarms summary constantly.
For more information on the fault state machines, how to manage them, and how to configure notifications, see the following topics:
• Fault state machines for physical resources—Resource Fault State Machines Reference.
• Fault state machines for virtual resources—Logical Element Fault State Machine Reference, page 14-54.
• Setting notifications for physical resource faults—Configuring Resource Health Monitoring.
• Setting notifications for virtual resource faults—Defining Notifications for Alarms, page 14-29.
• Managing fault alarms—Using the Alarms Summary.
Using the Alarms Summary
The alarms summary displays fault alarms that were created by various system events. An operator should monitor fault alarms and respond to them as needed.
VFrame can respond to many faults and resolve them, particularly faults involving service-oriented elements such as logical servers or service network problems. However, faults involving physical assets sometimes require operator intervention to determine the problem and resolve it. After you fix the problem, VFrame will clear the fault during normal monitoring or reinventory processes.
You can view alarms from any context. In the Admin context, you can view alarms for all other contexts. If you view alarms from within a virtual context, you see only the faults for that context.
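The visibility rule can be expressed as a simple filter. This Python sketch assumes that every alarm is tagged with exactly one owning context and that the administrative context is named "Admin"; both are assumptions made for illustration, not VFrame internals.

```python
from dataclasses import dataclass

ADMIN_CONTEXT = "Admin"  # assumed name of the administrative context


@dataclass(frozen=True)
class Alarm:
    element: str  # physical resource or service element name
    context: str  # virtual context that owns the faulted element (assumed single owner)


def visible_alarms(alarms, current_context):
    """Return the alarms an operator logged in to current_context can see."""
    if current_context == ADMIN_CONTEXT:
        return list(alarms)  # the Admin context sees faults from every context
    return [a for a in alarms if a.context == current_context]
```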
Procedure
Step 1
Choose Tools > Alarms to access the Alarms tab (see Alarms Tab).
The top half of the screen lists summary fault information for each element (physical device or service network element) that is affected by faults. There is one row per element, and each row might represent multiple fault alarms.
Pay attention to the state and severity of the faults:
• State—Active indicates that there is a current problem. The system is trying to resolve the fault, but the problem might need operator intervention.
• Severity—Fatal faults prevent the use of the affected element. Info faults usually appear when the element previously had a Fatal fault that has since been cleared, and the element is now operating normally.
Step 2
To view the list of current faults on the element, select it in the summary table. The details table in the bottom panel lists the current state of all fault alarms on the Current tab. Use this information to evaluate whether operator intervention is required.
• To see the fault in a more readable format, double-click it to open a details dialog box.
• Click History to view the previous changes to the fault alarms on this element.
Step 3
If a fault requires manual intervention, and you are working in a multi-operator organization, you can acknowledge the fault so that other operators know someone is already working on it:
• To acknowledge a subset of the faults on an element, select the faults on the Current tab and click the Acknowledge button on the same tab. This changes the state from Active to Acknowledged, but does not affect how VFrame perceives the state of the device.
• To acknowledge all of the faults on an element, select the element in the summary table and click the Acknowledge button for the summary table. This changes the state of all Active alarms on the element.
Step 4
If you fix the problem that is causing the fault, VFrame should notice the fix and clear the fault automatically. However, you might need to clear some configuration problems that affect a service network, and redeploy the network, before certain service network-related faults can be cleared. For more information, see Understanding Faults and Fault State Machines.
Tips
• If you acknowledge a fault but cannot fix it, select it and click Unacknowledge. This returns the state to Active (unless VFrame was able to clear the fault) so that another operator can take up the problem.
• To view the properties of an element, right-click it in the summary table and select Show Properties. To view its alarm details, select Show Details.
• If you select more than one element in the upper pane, the lower pane shows the faults on all selected elements.
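The acknowledgment rules in Step 3 and in the tips above, together with the Unacknowledge button rule described in the Alarms Tab reference, amount to a few small state transitions. The Python sketch below models them under the assumption that each fault is in one of the three states shown on the Alarms tab (Active, Acknowledged, Cleared); the function names are hypothetical. Acknowledging only annotates a fault for other operators; it does not change how VFrame handles the fault.

```python
from enum import Enum


class AlarmState(Enum):
    ACTIVE = "Active"
    ACKNOWLEDGED = "Acknowledged"
    CLEARED = "Cleared"


def acknowledge(states):
    """Acknowledge: only Active faults move to Acknowledged."""
    return [AlarmState.ACKNOWLEDGED if s is AlarmState.ACTIVE else s for s in states]


def can_unacknowledge(states):
    """Summary-table Unacknowledge is enabled only when every fault is Acknowledged."""
    return bool(states) and all(s is AlarmState.ACKNOWLEDGED for s in states)


def unacknowledge(states):
    """Return Acknowledged faults to Active; faults VFrame has cleared stay Cleared."""
    return [AlarmState.ACTIVE if s is AlarmState.ACKNOWLEDGED else s for s in states]
```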
Related Topics
• Understanding Faults and Fault State Machines
Configuring Resource Health Monitoring
Physical resource health monitoring enables you to configure e-mail notifications for resource fault alarms. These notifications are based on the alarm notification level, which uses the same names as the alarm severity. Each fault state has a default alarm notification level equal to its alarm severity, but you can modify the notification level to suit your notification requirements. Changing the alarm notification level does not affect how the system responds to the faults.
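The separation between alarm severity and alarm notification level can be pictured as two attributes of each state: a fixed severity that travels with the alarm, and a notification level that defaults to that severity and is used only for routing e-mail. In this Python sketch the state name, its severity, and the class names are illustrative assumptions.

```python
from enum import IntEnum


class Level(IntEnum):
    # Higher value = more serious, following the Fatal, Error, Warning, Info order
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


class StatePolicy:
    """One state of a fault state machine (hypothetical model)."""

    def __init__(self, name, severity):
        self.name = name
        self.severity = severity  # carried by the alarm; not changed by notification edits
        self._override = None     # operator-chosen notification level, if any

    @property
    def notification_level(self):
        # Defaults to the alarm severity until an operator changes it
        return self._override if self._override is not None else self.severity

    def set_notification_level(self, level):
        self._override = level


# Hypothetical state; the severity is chosen for illustration only
policy = StatePolicy("Example State", Level.WARNING)
policy.set_notification_level(Level.FATAL)  # page operators sooner; severity is unchanged
```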
Before You Begin
The system settings must identify an e-mail address and related account information that VFrame can use for sending e-mail notifications. Select Tools > VFrame Administrator > General to configure these settings if you have the appropriate permissions.
Procedure
Step 1
Select Tools > VFrame Administrator > Network to open the Resource Health Monitoring tab.
Step 2
Configure e-mail notifications for each category of fault state machine (Server, Network, Network Services, Storage), as follows:
a. Click New in the Notification group to create a new notification setting. You are prompted to enter the e-mail addresses of the people who should be notified for each fault alarm level (severity). Separate multiple addresses with a comma and a space. Users receive notifications for the selected severity and all worse severities.
b. Click Save to save your changes. Adding items to the notifications table does not save them.
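The matching rule (the selected severity and all worse severities) and the comma-and-space address format can be sketched as follows. The Python code assumes a notification setting is a mapping from level to the address string typed into the Fault Notification Entry dialog box; the function names and data layout are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


def parse_addresses(text):
    """Split an entry such as 'a@example.com, b@example.com'."""
    return [addr.strip() for addr in text.split(",") if addr.strip()]


def recipients(setting, fault_level):
    """Addresses to notify for a fault at fault_level.

    An address entered for level L receives faults at L or any worse level.
    """
    result = []
    for level, text in setting.items():
        if fault_level >= level:
            for addr in parse_addresses(text):
                if addr not in result:  # avoid duplicate e-mails
                    result.append(addr)
    return result
```

With this rule, an address entered only at the Fatal level is never paged for Warning faults, while an address entered at Warning also receives Error and Fatal faults.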
Step 3
(Optional.) To modify the notification alarm levels of the states defined for each fault state machine, select each of the fault state machines listed in the left column of the Monitoring group. The right column shows the states for each. For an explanation of each fault state machine, see Resource Fault State Machines Reference.
For some states, you can also specify how many polling cycles must indicate the state change before a fault alarm is issued. Increasing the number of required polling cycles can reduce false positives on a slow or congested network; however, it also delays detection of real failures, which can increase network downtime.
Click Save to save your changes.
Related Topics
• Understanding Faults and Fault State Machines
• Resource Fault State Machines Reference
Fault State Machine Notification
Service network fault state machine notification enables you to configure e-mail notifications for fault alarms on a service network and the devices within it. These notifications are based on the alarm notification level, which uses the same names as the alarm severity. Each fault state has a default alarm notification level equal to its alarm severity, but you can modify the notification level to suit your notification requirements. Changing the alarm notification level does not affect how the system responds to the faults.
Procedure
Step 1
Choose View > Operations to open the Operations tab (see Network Monitoring Tab).
Step 2
Select a service network from the list at the left.
Step 3
Click the Policies button to open the Service Network Policy Management dialog box.
Step 4
Select the Notification tab.
a. Click the + (plus) button to create a new notification setting. You are prompted to enter the e-mail addresses of the people who should be notified for each fault alarm level (severity). Separate multiple addresses with a comma and a space. Users receive notifications for the selected severity and all worse severities.
b. You can create a notification setting for the entire service network or for any device within it. To create one for an individual device, select the device's icon in the device list at the left.
c. Click OK to save your notification settings.
Step 5
(Optional.) To modify the notification alarm levels of the states defined for each fault state machine, select each of the fault state machines listed in the left column of the Service Network Policy Management dialog box.
Step 6
Select the Alarms tab to open a list of State Machines.
Step 7
Select a State Machine from the list. This opens a column to the right that contains alarm severities that can be changed for the selected state machine.
For some states, you can also specify how many polling cycles must indicate the state change before a fault alarm is issued. Increasing the number of required polling cycles can reduce false positives on a slow or congested network; however, it also delays detection of real failures, which can increase network downtime.
Click OK to save your changes.
Troubleshooting Fault Management
This section describes some problems you might encounter when working with fault alarms, along with their solutions, and includes the following topics:
• Users do not receive e-mail notifications for fault alarms.
• A fault alarm changes states every few minutes.
Problem
Users do not receive e-mail notifications for fault alarms.
Solution
If you configured notification settings for a fault state machine, and a fault alarm occurred that should have generated an e-mail but did not, the problem is probably that the SMTP settings for VFrame are not configured correctly. Choose Tools > VFrame Administration > General, and then select the SMTP tab. Ensure that a valid, existing e-mail address and SMTP server are specified. Click Test Settings to verify that VFrame can use the account.
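If Test Settings fails and you want to rule out the mail server itself, you can probe it from another host. The following Python sketch uses the standard-library smtplib module to open a connection and send NOOP. It does not authenticate or send mail, so it does not prove that the account configured in VFrame is valid, and a result from a different host does not reflect firewall rules between the VFrame server and the mail server. The host name in the usage note is a placeholder.

```python
import smtplib


def smtp_reachable(host, port=25, timeout=10.0):
    """Return True if an SMTP server at host:port greets us and accepts NOOP.

    Connectivity check only: no login, no message is sent.
    """
    try:
        # The connection (and server greeting) happens in the constructor
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            code, _ = server.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        # Refused, timed out, DNS failure, or an unexpected SMTP reply
        return False
```

For example, smtp_reachable("smtp.example.com", 25) returns True only if the server answers the connection and replies 250 to NOOP.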
Problem
A fault alarm changes states every few minutes.
Solution
If a fault alarm changes states every few minutes, network congestion might be preventing heartbeat packets or other polling traffic from arriving within the default time frame. For example, a switch might have an SNMP Reachable fault that repeatedly toggles between down and up. If there is no real problem with the affected resource, you can use either of two methods to reduce the frequency of state changes:
• For all fault state machines, you can increase the SNMP timeout. Choose Tools > VFrame Administrator > General, click the General tab, and increase the SNMP Timeout value. This allows VFrame to wait longer before deciding that an SNMP poll will not be answered.
• For many fault state machines, you can increase the polling interval or the number of polling cycles. Choose Tools > VFrame Administration > Network, and then click the Monitoring tab in the Network dialog box. Select the tab for the appropriate resource category (Server, Network, Network Services, Storage, or VirtualMachineManagement), and then increase the Polling Interval. Where the number of polling cycles can be changed, increase it for the bad states. Increasing the cycles for bad states lets VFrame wait before marking a resource bad without preventing it from marking the resource good right away.
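The effect of per-state polling cycles, including the advice to raise the count only for bad states, can be illustrated with a small debouncer. This Python sketch is a generic model, not VFrame's implementation: it assumes each poll yields a simple healthy or unhealthy result, and that the reported state changes only after the configured number of consecutive disagreeing polls. The class, its parameters, and the helper function are hypothetical. The helper shows the trade-off: the worst-case delay before a real failure is reported is roughly the polling interval multiplied by the number of cycles.

```python
class PollDebouncer:
    """Change the reported state only after N consecutive polls agree.

    bad_cycles and good_cycles are separate, so a resource can be slow to be
    marked bad but quick to be marked good again.
    """

    def __init__(self, bad_cycles=3, good_cycles=1):
        self.bad_cycles = bad_cycles
        self.good_cycles = good_cycles
        self.healthy = True  # currently reported state
        self._streak = 0     # consecutive polls disagreeing with the reported state

    def poll(self, ok):
        """Feed one poll result; return True if the reported state changed."""
        if ok == self.healthy:
            self._streak = 0  # a single agreeing poll resets the count
            return False
        self._streak += 1
        needed = self.good_cycles if ok else self.bad_cycles
        if self._streak >= needed:
            self.healthy = ok
            self._streak = 0
            return True
        return False


def worst_case_detection(polling_interval, cycles):
    """Approximate longest delay before a real failure is reported (same units as the interval)."""
    return polling_interval * cycles
```

With bad_cycles=3, two dropped polls followed by a good one never raise a fault, while a real outage is reported on the third consecutive failed poll.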
Fault Management Reference
This section describes the tabs and dialog boxes you use when managing faults, and includes the following topics:
• Alarms Tab
• Network Monitoring Tab
• Resource Fault State Machines Reference
Alarms Tab
Use the Alarms tab to monitor the system for fault alarms. Fault alarms alert you to problems in either physical resources or virtual resources. For example, if a server that is acquired by a service network becomes unresponsive, you will see fault alarms for both the physical resource and the virtual resource. Using this tab, data center operators can acknowledge fault alarms and fix the problems that VFrame cannot fix itself.
How to Get to This Tab
Choose Tools > Alarms to open the Alarms tab.
Related Topics
• Understanding Faults and Fault State Machines
• Using the Alarms Summary
Field Reference
Table 15-1 Alarms Tab
Element
|
Description
|
Fault Summary List
The fault alarms for the elements in the system that have current or recent faults. These faults are within the scope of the current virtual context to which you are logged in, unless you are logged into the Admin context, which lists faults for all virtual contexts.
When you select a row, the faults for that element appear in the details panel at the bottom of the window. You can also double-click a row to view the alarm details, which is the same information that appears in this table in a more readable format.
If you right-click a row, you can perform the following actions:
• Acknowledge—The same as clicking the Acknowledge button.
• Unacknowledge—The same as clicking the Unacknowledge button.
• Show Properties—Opens a dialog box where you can view the properties of the device or service element that is affected by the faults.
• Show Details—Opens the Alarm Details dialog box, which is the same as double-clicking the row.
|
Acknowledge button
|
Click this button to acknowledge all faults on the selected element, that is, to change their state to Acknowledged. Only faults in the Active state are changed. To change the state of only a subset of the faults on the element, select those faults on the Current tab in the details panel.
Acknowledging a fault does not change how VFrame handles it. The change lets other operators know that someone has taken ownership of the fault and is trying to fix it.
|
Unacknowledge button
|
Click this button to change the state of all faults on the selected element to Active. This button is enabled only if all faults for an element are in the Acknowledged state. To change the state of only a subset of the faults on the element, select those faults on the Current tab in the details panel.
Unacknowledging a fault does not affect anything in the VFrame system. The change lets other operators know that no one is trying to fix the problem, so that another operator can take on the task.
|
Max Results
|
The maximum number of alarms to display in the table.
|
Filter button
|
Click this button to open the Filter Settings dialog box where you can define a filter to limit the types of faults displayed in the table.
In the Filter Settings dialog box, on the Filter tab, select the attributes that define the fault alarms you want to see. Click the More link to add other types of attributes. You can filter on most of the fields shown in the fault summary table (described below).
|
Severity
|
The worst alarm severity level of the faults on the element. The order of seriousness is Fatal, Error, Warning, Info.
|
State
|
The state of the faults on the element:
• Active—At least one current fault is Active, that is, it represents an unresolved problem.
• Cleared—All of the faults are currently in the Cleared state, indicating that they are no longer problems.
• Acknowledged—All of the current problems have been acknowledged, indicating that someone is fixing the fault.
|
Name
|
The name of the physical resource or service element that generated the fault alarms.
|
Summary
|
A summary of all the fault alarms. Each alarm shows the fault state machine name and status in brackets with a description of the fault. These faults are listed in a more readable format on the Current tab in the details panel below the table.
|
Type
|
The type of element, such as Firewall Services Module. The type can be a physical resource type or a service network block type:
• Physical resources—A resource that is discovered and managed on the Resources tab (select View > Resources). If the physical resource is used by a service network, there is probably also a fault generated for the related service element. You can view more details about physical resource fault state machines, and configure them, by selecting Tools > VFrame Administrator > Network and then selecting the Monitoring tab.
• Service network elements—An element block in a service network, for example, a logical server, as defined within a specific service network. You can view more details about service network fault state machines by selecting View > Operations and then clicking the Policies button, within the virtual context in which the service network is running.
|
Service Network
|
The service network whose operation generated the fault alarm, if any.
|
Context
|
The virtual context in which the service network operates, if applicable.
|
Time Modified
|
The most recent date and time that a fault alarm for this element was updated.
|
Details Panel
The detailed fault information for the items selected in the fault summary table.
The Acknowledge and Unacknowledge buttons function as described for the summary table, except that they change the state only of the alarms selected on the Current tab.
The Filter button works the same way as the one described for the summary table, but its scope is limited to the specific table displayed in the details pane. You can filter on each tab in the pane.
|
Current tab
|
The current state of the fault alarms on the element. The fields are the same as the ones described for the summary table, except that the fields apply specifically to the listed fault. There are also the following additional fields (double-click a fault to see all fields in a more readable format):
• Attribute—The name of the fault state machine.
• Status—The status of the fault based on the state of the fault state machine.
• Description—The description of the fault state.
• Previous State—The state of the fault state machine for this element prior to the current state.
• Time—The time this state change occurred.
|
History tab
|
A list of the state changes that have occurred for all fault state machines on the selected element. The fields are the same as the ones described for the summary and current faults tables, with this additional field (double-click a fault to see all fields in a more readable format):
• Generated by—Who generated the state change.
|
Acquired Resources tab
|
The physical resources that are assigned to this service network element. Information includes the name of the resource and its type.
|
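The Severity and State columns of the fault summary list are aggregates over the element's individual faults. The following Python sketch expresses those rules as described in Table 15-1; the function names are hypothetical, and treating a mix of Cleared and Acknowledged faults as Acknowledged is an interpretation of the Acknowledged description rather than documented behavior.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Higher value = more serious: Fatal > Error > Warning > Info
    INFO = 1
    WARNING = 2
    ERROR = 3
    FATAL = 4


def summary_severity(severities):
    """Worst severity among an element's faults."""
    return max(severities)


def summary_state(states):
    """Summary State column for one element, given its faults' states."""
    if "Active" in states:
        return "Active"  # at least one unresolved problem
    if all(s == "Cleared" for s in states):
        return "Cleared"
    return "Acknowledged"  # every remaining problem has been acknowledged
```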
Network Monitoring Tab
Use the Network Monitoring tab to modify the alarm notification levels for physical resource faults and to configure e-mail notifications for types of faults. For example, you can configure fatal faults to generate e-mail to the text-based pagers of key operators.
This tab applies only to physical resources. When you design a service network, you can also configure fault policies for that service network.
How to Get to This Tab
Choose Tools > VFrame Administration > Network to open the Network dialog box, and then select the Monitoring tab at the top of the dialog box.
Related Topics
• Understanding Faults and Fault State Machines
• Configuring Resource Health Monitoring
Field Reference
Table 15-2 Resource Health Monitoring Tab
Element
|
Description
|
Major tabs:
Server
Network
Network Services
Storage
VirtualMachineManagement
|
Each of the following tabs contains the same fields, although the types of faults for each resource are different.
• Server—Faults related to physical servers.
• Network—Faults related to network equipment (Ethernet switches).
• Network Services—Faults related to service modules, such as FWSM (Firewall Services Module) and CSM (Content Switching Module).
• Storage—Faults related to SAN or NAS storage.
• VirtualMachineManagement—A list of the fault state machines defined for the device. These fault state machines are categorized in folders by type of resource.
|
Monitoring section
|
Polling Interval
|
How often VFrame contacts the resource or otherwise checks its status. Shorter intervals can provide quicker failover when the resource becomes unresponsive. However, short polling intervals increase traffic on your network and can reduce overall performance.
|
Resource MOTypes
|
When you select a fault state machine from this list, you see its states in the right pane. These are the states that VFrame monitors. Each state has a default alarm notification level that is the same as the fault alarm severity. If the state changes, VFrame generates a fault alarm with the state's severity, not its alarm notification level. Use the alarm notification level to create the e-mail notification settings that you require.
Changing the alarm notification level does not alter how seriously VFrame views a particular fault.
Some states have additional parameters that you can change. For an explanation of each fault state machine and its parameters, see Resource Fault State Machines Reference.
|
Notifications section
|
Plus button (New)
|
Click this button to add a notification setting to the table. The Fault Notification Entry dialog box opens where you can enter a list of e-mail addresses for each fault alarm notification level, which are listed by severity level. If you enter more than one e-mail address for a given alarm notification level, separate them with a comma and space, such as:
sysadmin@example.com, sysadmin2@example.com
E-mail notifications are sent to these addresses for any faults of that alarm notification level (or worse) on this type of resource (server, network, network services, storage).
|
Pencil button (Edit)
|
Click this button to modify the selected fault notification setting. You can also double-click the setting in the table. The Fault Notification Entry dialog box opens so that you can modify the settings.
|
Garbage Can button (Delete)
|
Click this button to delete the selected notification setting from the table.
|
Notification table
|
Lists the notifications configured for this resource type.
|
Apply button
|
Click this button to save your changes, including additions, deletions, or modifications of entries in the notifications table. With notifications, adding an item to the table does not save the entry.
|
Resource Fault State Machines Reference
This section describes the fault state machines that are used with physical resources, and includes the following topics:
• Server Fault State Machines
• Network Fault State Machines
• Network Services Fault State Machines
• Storage Fault State Machines
Server Fault State Machines
The server fault state machines relate to your application servers. You can configure the alarm notification level (priority) for fault alarms generated by changes to these states. The alarm notification level determines which notifications are sent based on your notification settings.
Table 15-3 explains the server fault state machines and is organized according to the folder representation in the interface.
Table 15-3 Physical Server Resource Fault State Machines
Fault State Machine
|
Description and States
|
Physical Server
Fault state machines related to physical servers.
|
Server Inventory
|
When you reboot a managed server, or when you right-click the server on the Resources tab and select Retrieve Server Attributes, VFrame takes an inventory of the system hardware features and compares it to the previously known state of the server. The possible results are as follows:
• Manageability Changed—The reinventory job determined that there was a hardware change that makes the server no longer eligible for management. For example, the server might not be connected to a managed switch. The server cannot be acquired by a service network when it is in this state. Fix whatever problem is making the server unmanageable.
• Minimum Requirement Not Met—The server does not satisfy the minimum requirements for a managed server. The CPU does not use the x86 architecture, or the host bus adapter (HBA) is not an Emulex or Qlogic adapter.
• Inventory Failed—The reinventory job failed. The failure might indicate a problem with the network connection or with the server. Try to perform the inventory again.
• OK—The server features have not changed.
• Inventory Changed—The server features changed. For example, it might have more memory than it used to have. If your resource pools for servers are dynamic and use rules that relate to the changed features, the server might move from one pool to another pool.
|
Server LOM
|
The status of the lights-out management (LOM) interface on the physical server:
• LOM Missing—VFrame cannot find a LOM interface for the physical server on which a power management function needs to be performed. Ensure the server LOM interface is connected to the network and that VFrame discovers it. If the server does not have a LOM interface, install one or replace the server.
• OK—The LOM interface and macros are functioning correctly.
|
Heartbeat
|
Whether VFrame is getting heartbeat messages from the VFrame Host Agent (VHA) running on a server. Heartbeats are issued only from physical servers that were acquired as logical servers for a service network. The possible states are as follows:
• Mapped Virtual Server Failed—VFrame did not get a heartbeat from the logical server. This state indicates the physical server is not functioning properly. Depending on the policies defined for the service network, VFrame might obtain a new physical server for the mapped logical server.
• OK—VFrame is getting heartbeats from the logical server, which means the physical server is functioning properly.
|
Physical Server LOM
Fault state machines related to LOM Managers. You create LOM Managers in the LOM Managers dialog box (select Tools > LOM Managers), where they are also listed.
|
LOM
|
The status of the lights-out management (LOM) Manager:
• LOM Macro Failed—VFrame tried to perform a power management function using the LOM Manager, but the macro used to perform the function did not work properly. Test your macro and make the changes required for it to function correctly.
• OK—The LOM Manager is functioning correctly.
|
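The Server Inventory results can be read as a comparison between the previous and current hardware inventory. The Python sketch below encodes the conditions listed in Table 15-3 (x86 CPU, Emulex or Qlogic HBA, connection to a managed switch). The attribute names, the order in which the checks run, and the use of None for a failed inventory job are assumptions for illustration, not VFrame's actual logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerInventory:
    # Hypothetical subset of the hardware attributes an inventory records
    cpu_family: str          # assumed label, e.g. "x86"
    hba_vendor: str          # e.g. "Emulex" or "Qlogic"
    memory_gb: int
    on_managed_switch: bool


SUPPORTED_HBA_VENDORS = {"emulex", "qlogic"}


def classify_inventory(previous: ServerInventory,
                       current: Optional[ServerInventory]) -> str:
    """Map a reinventory result to a Server Inventory state from Table 15-3."""
    if current is None:
        return "Inventory Failed"  # the reinventory job did not complete
    if current.cpu_family != "x86" or current.hba_vendor.lower() not in SUPPORTED_HBA_VENDORS:
        return "Minimum Requirement Not Met"
    if not current.on_managed_switch:
        return "Manageability Changed"  # e.g. no longer connected to a managed switch
    if current != previous:
        return "Inventory Changed"  # e.g. more memory than before
    return "OK"
```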
Related Topics
• Understanding Faults and Fault State Machines
• Configuring Resource Health Monitoring
• Network Monitoring Tab
Network Fault State Machines
The network fault state machines relate to your Ethernet switches. You can configure the following settings:
• Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.
• Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state is changed. If your network is congested, setting the number of polling cycles higher than one can prevent unnecessary state changes.
Table 15-4 explains the network fault state machines and is organized according to the folder representation in the interface.
Table 15-4 Physical Network Resource Fault State Machines
Fault State Machine
|
Description and States
|
Network Interface
Fault state machines related to the switch network interfaces.
|
Inventory Check
|
VFrame regularly takes inventory of network interfaces connected to managed devices to determine whether something has changed. This inventory check can produce the following states:
• Missing—The network interface is currently used in a service network, but it is no longer on the switch. This is a serious problem that affects service network functioning.
• OK—There are no identifiable problems with the port.
|
Port Status
|
The status of the port. The possible states are as follows:
• Down—The port is not functioning properly.
• Admin Down—The port has been shut down intentionally by an administrator.
• Up—The port is up and functioning properly.
|
Module Status
|
Specifies whether the module that hosts the Ethernet port is functioning correctly. The possible states are as follows:
• Up—The module is functioning correctly.
• Down—There is a problem with the module and it is not functioning correctly.
|
Line Card Module
Fault state machines related to an Ethernet port module in the switch.
|
OS Version State
|
VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the Ethernet module with the versions that VFrame supports. The possible states are as follows:
• Unknown OS Version—VFrame could not determine the OS version running on the module. VFrame cannot use a module in this state.
• Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.
• OK OS Version—The module is running a supported OS version.
|
Module Status
|
Whether the line card module is functioning correctly:
• Up—The module is functioning correctly.
• Down—There is a problem with the module and it is not functioning correctly.
|
Inventory Check
|
VFrame regularly takes inventory of the managed modules in a switch to determine whether something has changed. This inventory check can produce the following states:
• Missing—A managed line card is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.
• OK—There are no identifiable problems with the module.
|
Supervisor Module
Fault state machines related to the switch supervisor engine.
|
Module Status
|
Whether the supervisor engine is functioning correctly:
• Up—The module is functioning correctly.
• Down—There is a problem with the module and it is not functioning correctly.
|
Inventory Check
|
VFrame regularly takes inventory of the managed modules in a switch to determine whether anything has changed. This inventory check can produce the following states:
• Missing—A managed supervisor engine is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.
• OK—There are no identifiable problems with the module.
|
OS Version State
|
VFrame rediscovers switches as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the supervisor engine with the versions that VFrame supports. The possible states are as follows:
• Unknown OS Version—VFrame could not determine the OS version running on the module. VFrame cannot use a module in this state.
• Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a switch in this state.
• OK OS Version—The switch is running a supported OS version.
|
Switch Chassis
Fault state machines related to the switch as a whole.
|
OS Version State
|
VFrame rediscovers switches as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the switch with the versions that VFrame supports. The possible states are as follows:
• Unsupported Version—VFrame cannot manage a switch running this version of the switch OS. VFrame cannot use a switch in this state.
• OK OS Version—The switch is running a supported OS version.
|
Device OS Version Change
|
Whether the operating system (OS) version running on the device has changed since the service network was deployed.
• Change—The OS version changed.
• No Change—The OS version on the device has not changed.
|
Inventory Check
|
VFrame regularly takes inventory of managed switches to determine whether anything has changed. This inventory check can produce the following states:
• VLAN Missing—A managed VLAN is no longer defined on the switch. Check the switch and recreate the VLAN, or unmanage the VLAN.
• OK—There are no identifiable problems with the switch.
|
SNMP Reachable
|
Whether an SNMP query of the switch succeeded while VFrame was monitoring it. The possible states are as follows:
• Down—The SNMP query was unsuccessful. Ensure that you configured the correct SNMP credentials in VFrame. If the credentials are correct, check the network connectivity (ensure that VFrame can route to the switch) and check the switch for configuration problems.
• Up—The SNMP query was successful.
|
SSH Reachable
|
Whether VFrame can connect to the device using SSH. The possible states are as follows:
• Unreachable—VFrame cannot connect to the device using SSH. Check the credentials defined for the device. If the credentials are correct, check the network connectivity (ensure that VFrame can route to the switch) and check the switch for configuration problems.
• Reachable—VFrame can connect to the device using SSH.
|
Hardware Change
|
Whether the physical hardware for this switch changed since the last inventory. The possible states are as follows:
• Hardware ID Change—The serial number of the switch at this management IP address does not match the serial number that VFrame previously stored for that address. Service networks cannot acquire the switch until you clear this problem. Your options are:
– If you accidentally used an existing management IP address on a new switch, fix the new switch configuration and rediscover both switches.
– If you are replacing a switch, unmanage the switch in VFrame, delete it, and then rediscover it. You will also have to rediscover any servers that were connected to the replaced switch (see Discovering Servers, page 6-10).
• OK—The physical hardware has not changed.
|
VLAN
Fault state machines related to the VLANs defined in the switches.
|
VFrame-Created VLAN Config
|
Whether a VLAN that only VFrame should create was found unexpectedly on a switch. If you define a VLAN as a VFrame-created VLAN, it should appear on a switch only if the switch is used by a deployed (operational) service network. (For information on defining VFrame-created VLANs, see Adding New VLAN Resources, page 9-9.) This fault state machine does not apply to discovered (pre-defined) VLANs.
The possible states are as follows:
• Found—The VFrame-created VLAN is defined on one or more switches but it is not being used by a service network. If the VLAN is intended solely for VFrame use as a VFrame-created VLAN, then you can delete it from the switch. If it is not intended as a VFrame-created VLAN, then delete it from the VFrame-created VLAN resource pool.
• OK—This state can indicate the following conditions:
– For VFrame-created VLANs that are not currently being used by service networks, the VLANs are not configured on any managed switch. This is the expected result.
– For VFrame-created VLANs that are currently being used by service networks, this state does not imply that the VLAN was configured correctly on the switches. To verify VLAN configuration, from the Operations tab (select View > Operations), select a VLAN, then click the Verify button to check that VFrame correctly configured the VLAN.
|
Inventory Check
|
VFrame regularly takes inventory of managed switches to determine which VLANs are defined on each switch. This inventory check can produce the following states:
• Missing—A managed VLAN is no longer defined on one or more of the managed switches. Check the switch and recreate the VLAN, or unmanage the VLAN.
• OK—There are no identifiable problems with the VLAN configuration.
|
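The VLAN Inventory Check described above compares what VFrame expects to find with what the latest inventory reports. The following is a minimal sketch of that comparison, assuming a simple data model in which each managed VLAN ID maps to the set of switches it was expected on, and each switch maps to the VLAN IDs discovered on it. The function name and data model are hypothetical and are not VFrame's internal representation.

```python
from typing import Dict, List, Set, Tuple

def vlan_inventory_check(
    expected: Dict[int, Set[str]],
    discovered: Dict[str, Set[int]],
) -> Dict[int, Tuple[str, List[str]]]:
    """Hypothetical sketch: classify each managed VLAN as 'Missing' or 'OK'.

    expected   -- managed VLAN ID -> switches the VLAN should be defined on (assumed model)
    discovered -- switch name -> VLAN IDs found on that switch in this inventory pass
    """
    results: Dict[int, Tuple[str, List[str]]] = {}
    for vlan_id, switches in expected.items():
        # A switch that returned no inventory data is treated as not defining the VLAN.
        missing_on = sorted(
            sw for sw in switches if vlan_id not in discovered.get(sw, set())
        )
        results[vlan_id] = ("Missing" if missing_on else "OK", missing_on)
    return results
```

For example, if VLAN 10 is expected on switches sw1 and sw2 but the inventory finds it only on sw1, the sketch reports VLAN 10 as Missing on sw2, which corresponds to the case where you would check the switch and recreate the VLAN, or unmanage it.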
Related Topics
•
Understanding Faults and Fault State Machines
•
Configuring Resource Health Monitoring
•
Network Monitoring Tab
Network Services Fault State Machines
The network services fault state machines relate to your Ethernet switch service modules, such as Firewall Services Modules (FWSM). You can configure the following settings:
•
Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.
•
Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state changes. If your network is congested, setting this value higher than one cycle can prevent unnecessary state changes.
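The Polling Cycles setting acts as a debounce: a single unusual poll result does not immediately change the state or raise an alarm. The following minimal sketch illustrates the idea, assuming that the new observation must be seen on the configured number of consecutive polls and that a poll matching the current state cancels any pending change. The class and parameter names are hypothetical, and VFrame's actual counting rules may differ.

```python
from typing import Optional

class DebouncedState:
    """Hypothetical sketch of a fault state guarded by a polling-cycle threshold."""

    def __init__(self, initial: str, polling_cycles: int = 1) -> None:
        if polling_cycles < 1:
            raise ValueError("polling_cycles must be at least 1")
        self.state = initial
        self.polling_cycles = polling_cycles
        self._candidate: Optional[str] = None  # pending new state, if any
        self._count = 0                        # consecutive polls reporting the candidate

    def observe(self, observed: str) -> bool:
        """Record one poll result; return True if the state changed (an alarm would fire)."""
        if observed == self.state:
            # The poll agrees with the current state, so discard any pending change.
            self._candidate = None
            self._count = 0
            return False
        if observed == self._candidate:
            self._count += 1
        else:
            # A different new observation starts its own count.
            self._candidate = observed
            self._count = 1
        if self._count >= self.polling_cycles:
            self.state = observed
            self._candidate = None
            self._count = 0
            return True
        return False
```

With `DebouncedState("Up", polling_cycles=3)`, the poll sequence Down, Up, Down, Down, Down changes the state to Down only on the last poll, because the intervening Up resets the count. With a value of 1, the first Down poll changes the state immediately.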
Table 15-5 explains the network services fault state machines and is organized according to the folder representation in the interface.
Table 15-5 Physical Network Services Resource Fault State Machines
Fault State Machine
|
Description and States
|
CSM Module
Fault state machines related to the Content Switching Module (CSM).
|
Failover State
|
The failover state of the module.
• Unexpected State—The failover state is not the expected one; for example, it is neither active nor standby.
• Standby—The failover state is in the standby state.
• Active—The failover state is in the active state.
• Non-HA—The module is not configured for high availability.
|
OS Version State
|
VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the OS version running on the module with the versions that VFrame supports. The possible states are as follows:
• Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.
• OK OS Version—The module is running a supported OS version.
|
HA State
|
The high-availability (HA) state of the module.
• Changed to Non-HA Detected on Device—The module was configured to be part of a high-availability group, but its configuration was changed to non-HA.
• Second Member Changed to Non-HA—This module is the last member of the HA group and its configuration was changed to non-HA.
• Changed to HA Detected on Device—The module configuration was changed from non-HA to HA.
• OK—There is no change to the HA configuration.
• Non-HA—The module is not configured for high availability.
• DB Non-HA State Synchronized with Device—During reinventory, VFrame discovers that a module configuration was changed from HA to non-HA. VFrame removes the module from the HA pair in the VFrame database. Note that you can only remove one member of the HA pair at a time.
• DB HA State Synchronized with Device—During reinventory, a non-VFrame-managed module that is part of an HA pair with a VFrame-managed module was discovered and placed into management.
|
Failover Link State
|
Whether the failover link state between members of an HA pair has changed.
• Up—The failover link is up.
• Down—The failover link is down.
• Non-HA—The module is not configured for high availability.
|
Change HA Group
|
Whether the module has joined a different HA group.
• HA Group Change Detected—The module has joined a different HA group.
• OK—The module HA configuration has not changed.
• Non-HA—The module is not configured for high availability.
|
Config Sync State
|
Whether the module is synchronized with its HA peer.
• HA Pair Out of Sync—The configuration is not synchronized with the peer configuration.
• Non-HA—The module is not configured for high availability.
• HA Pair in Sync—The configuration is synchronized with the peer configuration.
|
Peer Config
|
The state of the peer configuration for a high-availability configuration.
• Peer Missing—The peer is missing.
• OK—There has been no change in the peer configuration since the last reinventory.
• Non-HA—The module is not configured for high availability.
|
Resource Type Change
|
Whether the resource type for the module has changed.
• Changed to Non-HA—The module was originally part of a managed HA pair, but was reconfigured to remove its HA configuration. To resolve this fault, unmanage the module, and then manage it again.
• OK—The resource type has not changed.
|
Module Status
|
Whether the service module is functioning correctly:
• Up—The module is functioning correctly.
• SSH Unreachable—VFrame cannot log in to the module using SSH. SSH login is required to configure the module.
• Down—There is a problem with the module and it is not functioning correctly.
|
Inventory Check
|
VFrame regularly takes inventory of the managed modules in a switch to determine whether anything has changed. This inventory check can produce the following states:
• Missing—A managed CSM is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.
• OK—There are no identifiable problems with the module.
|
FW Services Module
Fault state machines related to the Firewall Services Module (FWSM).
|
Peer Config
|
The state of the peer configuration for a high-availability configuration.
• Peer Missing—The peer is missing.
• OK—There has been no change in the peer configuration since the last reinventory.
• Non-HA—The module is not configured for high availability.
|
OS Version State
|
VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the OS version running on the module with the versions that VFrame supports. The possible states are as follows:
• Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.
• OK OS Version—The module is running a supported OS version.
|
Failover Link State
|
Whether the failover link state between members of an HA pair has changed.
• Up—The failover link is up.
• Down—The failover link is down.
• Non-HA—The module is not configured for high availability.
|
HA State
|
The high-availability (HA) state of the module.
• Changed to Non-HA Detected on Device—The module was configured to be part of a high-availability group, but its configuration was changed to non-HA.
• Second Member Changed to Non-HA—This module is the last member of the HA group and its configuration was changed to non-HA.
• Changed to HA Detected on Device—The module configuration was changed from non-HA to HA.
• OK—There is no change to the HA configuration.
• Non-HA—The module is not configured for high availability.
• DB Non-HA State Synchronized with Device—During reinventory, VFrame discovers that a module configuration was changed from HA to non-HA. VFrame removes the module from the HA pair in the VFrame database. Note that you can only remove one member of the HA pair at a time.
• DB HA State Synchronized with Device—During reinventory, a non-VFrame-managed module that is part of an HA pair with a VFrame-managed module was discovered and placed into management.
|
Change HA Group
|
Whether the module has joined a different HA group.
• HA Group Change Detected—The module has joined a different HA group.
• OK—The module HA configuration has not changed.
• Non-HA—The module is not configured for high availability.
|
Resource Type Change
|
Whether the resource type for the module has changed.
• Change to Non-HA—The module was originally part of a managed HA pair, but was reconfigured to remove its HA configuration. To resolve this fault, unmanage the module, then manage it again.
• OK—The resource type has not changed.
|
Module Status
|
Whether the service module is functioning correctly:
• SSH Unreachable—VFrame cannot log in to the module using SSH. SSH login is required to configure the module.
• Up—The module is functioning correctly.
• Down—There is a problem with the module and it is not functioning correctly.
|
Inventory Check
|
VFrame regularly takes inventory of the managed modules in a switch to determine whether anything has changed. This inventory check can produce the following states:
• Missing—A managed FWSM is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.
• OK—There are no identifiable problems with the module.
|
Context Mode Change State
|
Whether the FWSM has changed from multiple context mode to single context mode.
• Changed to Single Context Mode—The mode has changed.
• OK—The mode has not changed.
|
Virtual Context
Fault state machines related to FWSM security contexts.
|
Firewall Virtual Context Status
|
The status of the security context.
• Active—The security context is functioning correctly.
• Not Initialized—The security context configuration is not complete. For example, the URL might not be configured, or the security context is down due to some other error.
• Unexpected State—VFrame could not determine the state or the state was not the expected one.
|
Firewall Mode Change State
|
Whether the firewall mode has changed for the security context.
• Changed to Transparent—The mode has changed to transparent mode.
• Change to Routed—The mode has changed to routed mode.
• OK—The mode has not changed.
|
Failover State
|
The failover state of the security context.
• Unexpected State—The failover state is not the expected one; for example, it is neither active nor standby.
• Standby—The failover state is in the standby state.
• Active—The failover state is in the active state.
• Non-HA—The security context is not configured for high availability.
|
Firewall Virtual Context Present
|
VFrame regularly takes inventory of the security contexts defined in an FWSM to determine whether anything has changed. This inventory check can produce the following states:
• Missing—A managed security context is no longer in the FWSM. If the security context was intentionally removed, unmanage the context. Otherwise, identify and fix any problems associated with the context.
• OK—There are no identifiable problems with the security context.
|
Peer Config
|
The state of the peer configuration for a high-availability configuration.
• Peer Missing—The peer is missing.
• OK—There has been no change in the peer configuration since the last reinventory.
• Non-HA—The security context is not configured for high availability.
|
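Several of the fault state machines above (the Failover State for CSMs and for FWSM security contexts) map the role a device reports into one of four states. The following minimal sketch shows that mapping, assuming the device reports its role as a string such as "active" or "standby" and that VFrame already knows whether the resource is configured for high availability. The function name, parameters, and role strings are hypothetical stand-ins for whatever the device actually returns.

```python
from typing import Optional

def failover_state(ha_configured: bool, reported_role: Optional[str]) -> str:
    """Hypothetical sketch: map a reported failover role to a Failover State value."""
    if not ha_configured:
        # Without an HA configuration, failover roles do not apply.
        return "Non-HA"
    role = (reported_role or "").strip().lower()
    if role == "active":
        return "Active"
    if role == "standby":
        return "Standby"
    # No answer, or any role other than active or standby, is unexpected.
    return "Unexpected State"
```

Treating every unrecognized answer as Unexpected State keeps the mapping conservative: an operator sees an alarm rather than a silently assumed role.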
Related Topics
•
Understanding Faults and Fault State Machines
•
Configuring Resource Health Monitoring
•
Network Monitoring Tab
Storage Fault State Machines
The storage fault state machines relate to your storage devices, such as storage area network (SAN) equipment and NAS filers. You can configure the following settings:
•
Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.
•
Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state changes. If your network is congested, setting this value higher than one cycle can prevent unnecessary state changes.
Table 15-6 explains the storage fault state machines and is organized according to the folder representation in the interface.
Table 15-6 Physical Storage Resource Fault State Machines
Fault State Machine
|
Description and States
|
CIFS Share
Fault state machines related to CIFS shares.
|
CIFS Share Destroyed
|
Whether the CIFS share was destroyed, true or false.
|
CIFS Share Volume Offline
|
Whether the volume used by the share is offline, true or false.
|
CIFS Share State
|
The current state of a managed CIFS share:
• Missing on Filer—The share does not exist on the NAS filer.
• Unknown—The share state is unknown because of a communication failure with the NAS filer.
• Known—The share state is known.
|
CIFS Share Volume Missing
|
Whether the volume used by the share is missing from the database, true or false.
|
FC Port
Fault state machines related to Fibre Channel ports.
|
Storage Port VSAN Membership State
|
Whether the managed storage array port is in the expected VSAN.
• Not Present—The port is not in the expected VSAN. This state might happen if you move a port to a different VSAN or if the port is offline. Check the storage array to determine the exact problem.
• Present—The port is in the expected VSAN.
|
Storage Inventory
|
If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, VFrame takes inventory of the managed Fibre Channel ports in a storage array to determine whether anything has changed. This inventory check can produce the following states:
• Physical Resource Missing—A managed port is no longer on the storage array.
• OK—There were no changes to managed ports.
|
Logical Fabric
Fault state machines related to SAN fabrics.
|
VSAN Membership State
|
Whether all of the expected storage ports are members of the VSAN.
• OK—All of the expected storage ports are members of the VSAN.
• Missing Storage Ports—At least one storage port that is supposed to be part of the VSAN is not part of it. The fault alarm shows the World Wide Name (WWN) of the missing port.
|
VSAN State
|
The current state of a managed VSAN:
• Down—The seed switch indicates that the VSAN operational state is down.
• Missing—The VSAN was previously discovered in the fabric, but was not found during rediscovery. Check the VSAN configuration in the storage device. Before removing a VSAN, you should first unmanage it in VFrame to avoid this fault.
• Up—The seed switch indicates that the VSAN operational state is up.
|
Logical Unit
Fault state machines related to SAN logical units (LUNs).
|
Storage Inventory
|
If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, VFrame takes inventory of the managed LUNs in a storage array to determine whether anything has changed. This inventory check can produce the following states:
• Physical Resource Missing—A managed LUN no longer exists.
• All Mappings Missing—This managed LUN does not have any mappings to ports, which means it is not visible to outside networks. The LUN is not usable in this state. Correct the storage configuration in the storage array.
• Certain Mappings Missing—A mapping that was previously discovered to exist (between a LUN and a port) no longer exists. If the mapping should still exist, check the storage configuration in the storage array for errors.
• OK—There were no changes to managed LUNs.
|
NFS Qtree
Fault state machines related to NFS Qtrees.
|
Qtree Volume is Not Online
|
Whether the volume that contains the Qtree is offline, true or false. True indicates that the volume is offline.
|
Qtree Removed
|
Whether the Qtree was removed, true or false.
|
NFS Volume
Fault state machines related to NFS filer volumes.
|
Volume State
|
The state of the volume as reported in the ONTAPI response from the device, or Unreachable if VFrame cannot complete an ONTAPI call to obtain the volume state.
The only state that allows normal function is Online. All other states mean that VFrame cannot use the volume.
|
NFS Volume Availability Check
|
The availability state of an NFS volume:
• Destroyed—The volume was destroyed.
• Export Rule Not Programmed—The export rule for the volume is not defined, and the volume is not usable.
• Online—The volume is online and available for use.
|
Storage Array
Fault state machines related to SAN storage arrays.
|
Storage Inventory
|
If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, VFrame takes inventory of the managed storage arrays to determine whether anything has changed. This inventory check can produce the following states:
• Physical Resource Missing—A managed storage array cannot be found.
• OK—There were no changes to managed storage arrays.
|
Storage Station
Fault state machines related to storage managers.
|
Storage Station State
|
The current state of the storage manager, which is used for discovering and managing storage arrays.
• Login Error—The credentials defined for logging into the storage manager did not work. Correct the credentials defined for the storage manager.
• Unreachable—The storage manager cannot be pinged from VFrame. Verify that the storage manager is online and that it is on a network that VFrame can route to.
• OK—The storage manager is functional.
|
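The Logical Unit Storage Inventory states distinguish between a LUN that has disappeared, a LUN that has lost every port mapping, and a LUN that has lost only some of its previously discovered mappings. The following minimal sketch captures that ordering, assuming each LUN is described by the set of storage port identifiers it was previously mapped to and the set found during rediscovery (or `None` if the LUN itself was not found). The function name and data model are hypothetical. The sketch also assumes that newly added mappings are not a fault and therefore yield OK.

```python
from typing import Optional, Set

def lun_inventory_state(previous_ports: Set[str], current_ports: Optional[Set[str]]) -> str:
    """Hypothetical sketch: classify a managed LUN after storage rediscovery.

    previous_ports -- port identifiers the LUN was mapped to at the last inventory
    current_ports  -- port identifiers found now, or None if the LUN no longer exists
    """
    if current_ports is None:
        return "Physical Resource Missing"
    if not current_ports:
        # No mappings at all: the LUN is not usable until the array is corrected.
        return "All Mappings Missing"
    if previous_ports - current_ports:
        # At least one previously discovered LUN-to-port mapping has disappeared.
        return "Certain Mappings Missing"
    return "OK"
```

Checking the most severe condition first means that a LUN with no remaining mappings is reported as All Mappings Missing rather than Certain Mappings Missing, which matches the distinction the table draws between the two states.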
Related Topics
•
Understanding Faults and Fault State Machines
•
Configuring Resource Health Monitoring
•
Network Monitoring Tab