Cisco VFrame Data Center 1.1 Administration Guide
Managing Faults

Table Of Contents

Managing Fault Alarms

Understanding Faults and Fault State Machines

Using the Alarms Summary

Configuring Resource Health Monitoring

Troubleshooting Fault Management

Fault Management Reference

Alarms Tab

Resource Health Monitoring Tab

Resource Fault State Machine Reference

Server Fault State Machines

Network Fault State Machines

Network Services Fault State Machines

Storage Fault State Machines


Managing Fault Alarms


VFrame issues fault alarms if a problem occurs with a managed physical resource or an operational virtual resource (such as a device or service network). You can view fault alarms within the application and configure e-mail or syslog notifications to enhance your ability to respond to faults that VFrame cannot resolve.

The following topics explain how to monitor and configure faults:

Understanding Faults and Fault State Machines

Using the Alarms Summary

Configuring Resource Health Monitoring

Troubleshooting Fault Management

Fault Management Reference

Understanding Faults and Fault State Machines

VFrame defines several fault state machines to track the status of various types of resources, both physical (real devices) and logical (defined in a service network). Each fault state machine includes a set of states. Whenever a managed resource enters or leaves a state, VFrame issues a fault alarm. These alarms appear on the Alarms tab (select View > Alarms).
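The rule that an alarm is issued whenever a resource enters or leaves a state can be sketched as a small model. This is an illustrative sketch only, not VFrame's implementation: the class names `FaultAlarm` and `FaultStateMachine`, the `observe` method, and the resource name `switch-1` are all hypothetical. The SNMP Reachable machine and its Up and Down states are taken from the reference tables in this chapter.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FaultAlarm:
    resource: str   # name of the managed resource (hypothetical field)
    machine: str    # fault state machine name, for example "SNMP Reachable"
    previous: str   # state the resource left
    current: str    # state the resource entered


@dataclass
class FaultStateMachine:
    """Hypothetical model of one fault state machine on one resource."""
    resource: str
    name: str
    states: List[str]
    current: str
    alarms: List[FaultAlarm] = field(default_factory=list)

    def observe(self, new_state: str) -> None:
        """Record an observed state; issue an alarm only on a state change."""
        if new_state not in self.states:
            raise ValueError(f"unknown state: {new_state}")
        if new_state == self.current:
            return  # no transition, so no alarm
        self.alarms.append(
            FaultAlarm(self.resource, self.name, self.current, new_state)
        )
        self.current = new_state


fsm = FaultStateMachine("switch-1", "SNMP Reachable", ["Up", "Down"], "Up")
fsm.observe("Up")    # unchanged: no alarm
fsm.observe("Down")  # Up -> Down: alarm issued
fsm.observe("Up")    # Down -> Up: alarm issued
```

In this sketch, three observations produce two alarms, because only the two actual transitions count.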

Fault state machines maintain awareness of the health of the devices in the network. Fault alarms alert you to important changes in the network, such as hardware errors or configuration change errors. These faults can be detected during normal operations, or during device rediscovery or reinventory.

VFrame automatically handles many fault alarms, particularly those relating to virtual resources defined in service networks. However, fault alarms for physical devices sometimes require intervention by operations personnel. For example, if VFrame can no longer get a heartbeat signal from a physical server, you might need to fix a physical problem such as a disconnected Ethernet cable. After you fix the problem, VFrame can clear the fault through its normal monitoring or inventory process.

You cannot explicitly clear a fault. After you fix the problem, VFrame must decide whether the fault is cleared. In some cases, you might need to clear the configuration status of a configuration problem identified on the Configuration Results tab on the Operations tab (select Operations > Operations). If the fault is for a service network, and the service network's status is Deployment Blocked, you will probably also have to redeploy the network. For server faults, you might have to restart the server.

If you unmanage a device, VFrame clears all active faults on the device.

Whether the fault state machine is for logical devices defined in a service network or for physical resources, you can set up notifications so that VFrame e-mails appropriate users when faults of your selected severity occur. Thus, you can use e-mail-based paging facilities to alert operators of critical faults, so that operators do not need to constantly watch the alarms summary.

For more information on the fault state machines, how to manage them, and how to configure notifications, see these topics:

Fault state machines for physical resources—Resource Fault State Machine Reference.

Fault state machines for virtual resources—Logical Element Fault State Machine Reference.

Setting notifications for physical resource faults—Configuring Resource Health Monitoring.

Setting notifications for virtual resource faults—Defining Notifications for Alarms.

Managing fault alarms—Using the Alarms Summary.

Using the Alarms Summary

The alarms summary displays fault alarms that were created by various system events. An operator should monitor fault alarms and respond to them as needed.

VFrame can respond to many faults and resolve them, particularly faults involving service-oriented elements such as logical servers or service network problems. However, faults involving physical assets sometimes require operator intervention to determine the problem and resolve it. After you fix the problem, VFrame will clear the fault during normal monitoring or reinventory processes.

You can view alarms from any context. In the Admin context, you can view alarms for all other contexts. If you view alarms from within a virtual context, you only see the faults for that context.
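The visibility rule above amounts to a filter on the alarm's context. The following is a minimal sketch under assumed names: the `visible_alarms` function, the dictionary layout, and the context names `ctx-a` and `ctx-b` are hypothetical and do not come from VFrame.

```python
from typing import Dict, List

ADMIN_CONTEXT = "Admin"


def visible_alarms(alarms: List[Dict[str, str]], viewer_context: str) -> List[Dict[str, str]]:
    """Return the alarms a viewer can see from the given context."""
    if viewer_context == ADMIN_CONTEXT:
        return list(alarms)  # the Admin context sees alarms for all contexts
    # A virtual context sees only its own faults.
    return [a for a in alarms if a["context"] == viewer_context]


alarms = [
    {"name": "web-server-1", "context": "ctx-a"},
    {"name": "db-server-1", "context": "ctx-b"},
]
```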

Procedure


Step 1 Select View > Alarms to open the Alarms tab (see Alarms Tab).

The top half of the tab lists summary fault information for each element (physical device or service network element) that is affected by faults. There is one row per element, and the row might represent multiple fault alarms.

Pay attention to the state and severity of the faults:

State—Active indicates that there is a current problem. The system is trying to resolve the fault, but the problem might need operator intervention.

Severity—Fatal faults prevent the use of the affected element. Info faults usually appear if there had been a Fatal fault on the element, but the fault was cleared and the element is now operating normally.

Step 2 To view the list of current faults on the element, select it in the summary table. The details table in the bottom panel lists the current state of all fault alarms on the Current tab. Use this information to evaluate whether operator intervention is required.

To see the fault in a more readable format, double-click it to open a details dialog box.

Click History to view the previous changes to the fault alarms on this element.

Step 3 If a fault requires manual intervention, and you are working in a multi-operator organization, you can acknowledge the fault so that other operators know someone is already working on it:

To acknowledge a subset of the faults on an element, select the faults on the Current tab and click the Acknowledge button on the same tab. This changes the state from Active to Acknowledged, but does not affect how VFrame perceives the state of the device.

To acknowledge all of the faults on an element, select the element in the summary table and click the Acknowledge button for the summary table. This changes the state of all Active alarms on the element.

Step 4 If you fix the problem that is causing the fault, VFrame should notice the fix and clear the fault automatically. However, you might need to clear some configuration problems that affect a service network, and redeploy the network, before certain service network-related faults can be cleared. For more information, see Understanding Faults and Fault State Machines.


Tips

If you acknowledge a fault but cannot fix it, select it and click Unacknowledge. This returns the state to Active (unless VFrame was able to clear the fault) so that another operator might take up the problem.

You can view detailed information about an element by right-clicking it in the summary table and selecting Show Properties.

If you select more than one element in the upper pane, the lower pane shows the faults on all selected elements.
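The acknowledge and unacknowledge behavior described in this procedure can be summarized as two small state transitions. This is a sketch for illustration only. The `AlarmState` enum and the function names are hypothetical. The rules it encodes come from this chapter: only Active faults become Acknowledged, and unacknowledging does not reactivate a fault that VFrame has already cleared.

```python
from enum import Enum
from typing import List


class AlarmState(Enum):
    ACTIVE = "Active"
    ACKNOWLEDGED = "Acknowledged"
    CLEARED = "Cleared"


def acknowledge(state: AlarmState) -> AlarmState:
    """Only Active faults become Acknowledged; other states are unchanged."""
    return AlarmState.ACKNOWLEDGED if state is AlarmState.ACTIVE else state


def unacknowledge(state: AlarmState) -> AlarmState:
    """Acknowledged returns to Active; a cleared fault stays Cleared."""
    return AlarmState.ACTIVE if state is AlarmState.ACKNOWLEDGED else state


def acknowledge_all(states: List[AlarmState]) -> List[AlarmState]:
    """Model the summary-table Acknowledge button for one element."""
    return [acknowledge(s) for s in states]
```

Neither function models any effect on the device itself, which matches the statement that acknowledging a fault does not change how VFrame perceives the device.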

Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Physical resource health monitoring enables you to configure e-mail notifications for resource fault alarms. These notifications are based on alarm notification level, which uses the same names as those used for alarm severity. Although each fault state has a default alarm notification level that is the same as the alarm severity, you can modify the alarm notification level to suit your notification requirements. Changing the alarm notification level does not affect how the system responds to the faults.
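The relationship between alarm severity, alarm notification level, and e-mail recipients can be sketched as follows. This is an illustration under assumptions, not VFrame's implementation: the function names and the address `pager@example.com` are hypothetical. The order of seriousness (Fatal, Error, Warning, Info) and the rule that users are notified for the selected level and worse levels are taken from this chapter.

```python
from typing import Dict, List, Optional

# Order of seriousness, worst first.
LEVELS = ["Fatal", "Error", "Warning", "Info"]


def notification_level(severity: str, override: Optional[str] = None) -> str:
    """Default to the alarm severity unless an operator set a different level."""
    return override if override is not None else severity


def recipients(level: str, subscriptions: Dict[str, List[str]]) -> List[str]:
    """Return addresses subscribed at this level or at a less severe level.

    A user who selected "Warning" is notified for Warning, Error, and Fatal.
    """
    rank = LEVELS.index(level)
    result: List[str] = []
    for selected, addresses in subscriptions.items():
        if LEVELS.index(selected) >= rank:
            for address in addresses:
                if address not in result:
                    result.append(address)
    return result


subscriptions = {
    "Fatal": ["pager@example.com"],       # hypothetical address
    "Warning": ["sysadmin@example.com"],
}

# A state whose severity is Error, raised to Fatal for notification only.
level = notification_level("Error", override="Fatal")
```

In this sketch the override changes only who is notified. The severity of the alarm itself remains Error.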

Before You Begin

The system settings must identify an e-mail address and related account information that VFrame can use for sending e-mail notifications. Select File > System Settings to configure these settings if you have the appropriate permissions.

Procedure


Step 1 Select Tools > Resource Health Monitoring to open the Resource Health Monitoring tab (see Resource Health Monitoring Tab).

Step 2 Configure e-mail notifications for each category of fault state machine (Server, Network, Network Services, Storage), as follows:

a. Click New in the Notification group to create a new notification setting. You are prompted to enter the e-mail addresses of the people who should be notified for each fault alarm level (severity). Use a comma and space to separate multiple entries. Users will get notifications for the selected severity and worse severities.

b. Click Save to save your changes. Adding items to the notifications table does not save them.

Step 3 (Optional.) To modify the notification alarm levels of the states defined for each fault state machine, select each of the fault state machines listed in the left column of the Monitoring group. The right column shows the states for each. For an explanation of each fault state machine, see Resource Fault State Machine Reference.

For some states, you can also specify how many polling cycles must indicate the state change before a fault alarm is issued. Increasing the number of required polling cycles can reduce false positives if you have a slow network; however, it also delays the detection of real failures, which can increase your network downtime.

Click Save to save your changes.


Related Topics

Understanding Faults and Fault State Machines

Resource Fault State Machine Reference

Troubleshooting Fault Management

These are some problems you might encounter with fault management and their solutions:

Users do not receive e-mail notifications for fault alarms.

A fault alarm changes states every few minutes.

Problem   Users do not receive e-mail notifications for fault alarms.

Solution   If you configured notification settings for a fault state machine, and a fault alarm occurred that should have generated an e-mail but did not, the problem is probably that the SMTP settings for VFrame are not configured correctly. Select File > System Settings, and then the SMTP tab. Ensure that a valid, existing e-mail address and SMTP server are specified. Click Test Settings to verify that VFrame can use the account.

Problem   A fault alarm changes states every few minutes.

Solution   If a fault alarm changes states every few minutes, there might be network congestion that is preventing heartbeat packets or other types of polling from registering within the default time frame. For example, a switch might have an SNMP Reachable fault that changes states between down and up every few seconds. If there is no real problem with the affected resource, you can try two methods to reduce the frequency of state changes.

For all fault state machines, you can increase the SNMP timeout. Select File > System Settings, click the Properties tab, and increase the time for SNMP Timeout. This allows VFrame to wait longer before deciding that an SNMP poll will not be answered.

For some fault state machines, you can also increase the number of polling cycles that must be missed before VFrame changes the fault state. Select Tools > Resource Health Monitoring, and find the fault state machine that is causing you problems. If you can change the number of polling cycles, increase the number for the bad states (such as Down for SNMP Reachable), but leave the number for the good states at 1. Increasing the cycles for bad states allows VFrame to wait before marking a resource bad without preventing VFrame from marking it good right away.
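The effect of requiring more polling cycles for bad states than for good states can be sketched as a simple debounce. This is a hypothetical model, not VFrame's algorithm. The `DebouncedState` class is invented for illustration, and the sketch assumes that the required polls must be consecutive, which this guide does not state.

```python
from typing import Dict


class DebouncedState:
    """Hypothetical debounce: a new state must be seen N consecutive times."""

    def __init__(self, initial: str, required_cycles: Dict[str, int]):
        self.current = initial
        self.required = required_cycles
        self._candidate = initial
        self._count = 0

    def poll(self, observed: str) -> str:
        """Feed one poll result and return the reported state."""
        if observed == self.current:
            # The poll agrees with the reported state; drop any pending change.
            self._candidate, self._count = observed, 0
            return self.current
        if observed == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = observed, 1
        if self._count >= self.required.get(observed, 1):
            self.current = observed
            self._count = 0
        return self.current


# Down needs three polls in a row; Up is accepted on the first poll.
snmp = DebouncedState("Up", {"Down": 3, "Up": 1})
reported = [snmp.poll(s) for s in ["Down", "Up", "Down", "Down", "Down", "Up"]]
```

In this run the single missed poll at the start never changes the reported state. Only the third consecutive Down poll does, and the first Up poll after it restores the good state immediately.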

Fault Management Reference

These topics describe the main tabs and dialog boxes related to managing faults:

Alarms Tab

Resource Health Monitoring Tab

Resource Fault State Machine Reference

Alarms Tab

Use the Alarms tab to monitor the system for fault alarms. Fault alarms alert you to problems in either physical resources or virtual resources. For example, if a server that is acquired by a service network becomes unresponsive, you will see fault alarms for both the physical resource and the virtual resource. Using this tab, data center operators can acknowledge fault alarms and fix those problems that VFrame cannot fix itself.

How to Get to This Tab

Select View > Alarms to open the Alarms tab.

Related Topics

Understanding Faults and Fault State Machines

Using the Alarms Summary

Field Reference

Table 15-1 Alarms Tab 

Element
Description

Fault Summary List

The fault alarms for the elements in the system that have current or recent faults. These faults are within the scope of the current virtual context to which you are logged in, unless you are logged into the Admin context, which lists faults for all virtual contexts.

When you select a row, the faults for that element appear in the details panel at the bottom of the window. You can also double-click a row to view the alarm details, which is the same information that appears in this table in a more readable format.

If you right-click on a fault, you can perform these actions:

Acknowledge—The same as clicking the Acknowledge button.

Unacknowledge—The same as clicking the Unacknowledge button.

Show Properties—Opens a dialog box where you can view the properties of the device or service element that is affected by the faults.

Show Details—Opens the Alarm Details dialog box, which is the same as double-clicking the row.

Acknowledge button

Click this button to acknowledge all faults on the selected element, that is, change the state of all faults to Acknowledged. Only faults in the active state are changed to the acknowledged state. If you want to change the state of only a subset of the faults on the element, select the faults on the Current tab in the details panel.

Acknowledging a fault does not affect anything in the VFrame system. The change lets other operators know that someone has taken ownership of the fault and is trying to fix it.

Unacknowledge button

Click this button to change the state of all faults on the selected element to Active. This button is active only if all faults for an element are in the Acknowledged state. If you want to change the state of only a subset of the faults on the element, select the faults on the Current tab in the details panel.

Unacknowledging a fault does not affect anything in the VFrame system. The change lets other operators know that no one is trying to fix the problem, so that another operator can take on the task.

Max Results

The maximum number of alarms to display in the table.

Filter button

Click this button to open the Filter Settings dialog box where you can define a filter to limit the types of faults displayed in the table.

In the Filter Settings dialog box, on the Filter tab, select the attributes that define the fault alarms you want to see. Click the More link to add other types of attributes. You can filter on most of the fields shown in the fault summary table (described below).

Severity

The worst alarm severity level of the faults on the element. The order of seriousness is Fatal, Error, Warning, Info.

State

The state of the faults on the element:

Active—At least one current fault is in the Active state, that is, there is a current problem.

Cleared—All of the faults are currently in the Cleared state, indicating that they are no longer problems.

Acknowledged—All of the current problems have been acknowledged, indicating that someone is fixing the fault.

Name

The name of the physical resource or service element that generated the fault alarms.

Summary

A summary of all the fault alarms. Each alarm shows the fault state machine name and status in brackets with a description of the fault. These faults are listed in a more readable format on the Current tab in the details panel below the table.

Type

The type of element such as the Firewall Services Module. The type can be a physical resource type or a service network block type:

Physical resources—A resource that is discovered and managed on the Resources tab (select View > Resources). If the physical resource is used by a service network, there is probably also a fault generated for the related service element. You can view more details about physical resource fault state machines, and configure them, by selecting Tools > Resource Health Monitoring.

Service network elements—An element block in a service network, for example, a logical server, as defined within a specific service network. You can view more details about service network fault state machines by selecting Operations > Operations, and then by clicking the Policies button, within the virtual context in which the service network is running.

Service Network

The service network whose operation generated the fault alarm, if any.

Context

The virtual context in which the service network operates, if applicable.

Time Modified

The most recent date and time that a fault alarm for this element was updated.

Details Panel

The detailed fault information for the items selected in the fault summary table.

The Acknowledge and Unacknowledge buttons function as described for the summary table, except that they change the state only of the alarm selected on the Current tab.

The Filter button works the same way as the one described for the summary table, but its scope is limited to the specific table displayed in the details pane. You can filter on each tab in the pane.

Current tab

The current state of the fault alarms on the element. The fields are the same as the ones described for the summary table, except that the fields apply specifically to the listed fault. There are also these additional fields (double-click a fault to see all fields in a more readable format):

Attribute—The name of the fault state machine.

Status—The status of the fault based on the fault state machine's state.

Description—The description of the fault state.

Previous State—The state of the fault state machine for this element prior to the current state.

Time—The time this state change occurred.

History tab

A list of the state changes that have occurred for all fault state machines on the selected element. The fields are the same as the ones described for the summary and current faults tables, with this additional field (double-click a fault to see all fields in a more readable format):

Generated by—Who generated the state change.

Affected Resources tab

The service network resources that are affected by this physical device fault, if any. Information includes the name of the service network in which the resource is being used, and the context where the network is running, as well as the resource name and type.

Acquired Resources tab

The physical resources that are assigned to this service network element. Information includes the name of the resource and its type.
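The Severity and State columns of the fault summary list in Table 15-1 combine the faults on one element into a single row. The following sketch shows one way to read those rules. The `summarize` function and its input layout are hypothetical, and treating an element with a mix of Acknowledged and Cleared faults as Acknowledged is an interpretation of the table, not confirmed behavior.

```python
from typing import List, Tuple

SEVERITY_ORDER = ["Fatal", "Error", "Warning", "Info"]  # worst first


def summarize(faults: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Combine (severity, state) pairs for one element into one summary row."""
    if not faults:
        raise ValueError("an element in the summary has at least one fault")
    # Severity column: the worst severity among the element's faults.
    worst = min((severity for severity, _ in faults), key=SEVERITY_ORDER.index)
    states = [state for _, state in faults]
    if "Active" in states:
        state = "Active"        # at least one fault is still a problem
    elif "Acknowledged" in states:
        state = "Acknowledged"  # every remaining problem is acknowledged
    else:
        state = "Cleared"       # all faults are cleared
    return worst, state
```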


Resource Health Monitoring Tab

Use the Resource Health Monitoring tab to modify the alarm notification levels for physical resource faults and to configure e-mail notifications for types of faults. For example, you can configure fatal faults to generate e-mail to the text-based pagers of key operators.

This tab applies only to physical resources. When you design a service network, you also have the opportunity to configure service-network-related fault policies.

How to Get to This Tab

Select Tools > Resource Health Monitoring to open the Resource Health Monitoring tab.

Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Field Reference

Table 15-2 Resource Health Monitoring Tab 

Element
Description

Major tabs:

Server

Network

Network Services

Storage

Each of these tabs contains the same fields, although the types of faults for each resource are different.

Server—Faults related to physical servers.

Network—Faults related to network equipment (Ethernet switches).

Network Services—Faults related to service modules, such as FWSM (Firewall Services Module) and CSM (Content Switching Module).

Storage—Faults related to SAN or NAS storage.

Monitoring section

Polling Interval

How often VFrame should contact the resource or otherwise access the status of the resource. Shorter times can provide quicker failover in cases where the resource becomes unresponsive. However, short polling times can increase traffic on your network and reduce overall performance.

Resource MOTypes

A list of the fault state machines defined for the device. These fault state machines are categorized in folders by type of resource.

When you select a fault state machine from this list, you see the states for it in the right pane. These are the states that VFrame monitors. Each state has a default alarm notification level that is the same as the fault alarm severity. If the state changes, VFrame generates a fault alarm using the severity, not the alarm notification level. Use the alarm notification level to help you create the e-mail notification settings that you require.

Changing the alarm notification level does not alter how seriously VFrame views a particular fault.

Some states have additional parameters that you can change. For an explanation of each fault state machine and its parameters, see Resource Fault State Machine Reference.

Notifications section

New button

Click this button to add a notification setting to the table. The Fault Notification Entry dialog box opens where you can enter a list of e-mail addresses for each fault alarm notification level, which are listed by severity level. If you enter more than one e-mail address for a given alarm notification level, separate them with a comma and space, such as:

sysadmin@example.com, sysadmin2@example.com

E-mail notifications are sent to these addresses for any faults of that alarm notification level (or worse) on this type of resource (server, network, network services, storage).

Edit button

Click this button to modify the selected fault notification setting. You can also double-click the setting in the table. The Fault Notification Entry dialog box opens so that you can modify the settings.

Delete button

Click this button to delete the selected notification setting from the table.

Notification table

Lists the notifications configured for this resource type.

Save button

Click this button to save your changes, including additions, deletions, or modifications of entries in the notifications table. With notifications, adding an item to the table does not save the entry.


Resource Fault State Machine Reference

These sections describe the fault state machines that are used with physical resources:

Server Fault State Machines

Network Fault State Machines

Network Services Fault State Machines

Storage Fault State Machines

Server Fault State Machines

The server fault state machines relate to your application servers. You can configure the fault alarm notification level (priority) for fault alarms generated by changes to each state. The alarm notification level determines which notifications are sent based on your notification settings.

Table 15-3 explains the server fault state machines and is organized according to the folder representation in the interface.

Table 15-3 Physical Server Resource Fault State Machines 

Fault State Machine
Description and States

Physical Server

Fault state machines related to physical servers.

Server Inventory

When you reboot a managed server, or select the server on the Resources tab and select Actions > Server Inventory, VFrame takes an inventory of the system's hardware features and compares it to the previously known state of the server. The following are the possible results:

Manageability Changed—The reinventory job determined that there was a hardware change that makes the server no longer eligible for management. For example, the server might not be connected to a managed switch. The server cannot be acquired by a service network when it is in this state. Fix whatever problem is making the server unmanageable.

Minimum Requirement Not Met—The server does not satisfy the minimum requirements for a managed server. The CPU does not use the x86 architecture, or the host bus adapter (HBA) is not an Emulex or Qlogic adapter.

Inventory Failed—The reinventory job failed. The failure might indicate a problem with the network connection or with the server. Try to perform the inventory again.

OK—The server's features have not changed.

Inventory Changed—The server's features changed. For example, it might have more memory than it used to have. If your resource pools for servers are dynamic and use rules that relate to the changed features, the server might move from one pool to another pool.

Server LOM

The status of the lights-out management (LOM) interface on the physical server:

LOM Missing—VFrame cannot find a LOM interface for the physical server on which a power management function needs to be performed. Ensure the server's LOM interface is connected to the network and that VFrame discovers it. If the server does not have a LOM interface, install one or replace the server.

OK—The LOM interface and macros are functioning correctly.

Heartbeat

Whether VFrame is getting heartbeat messages from the VFrame Host Agent (VHA) running on a server. Heartbeats are issued only from physical servers that were acquired as logical servers for a service network. These are the states:

Mapped Virtual Server Failed—VFrame did not get a heartbeat from the logical server. This state indicates the physical server is not functioning properly. Depending on the policies defined for the service network, VFrame might obtain a new physical server for the mapped logical server.

OK—VFrame is getting heartbeats from the logical server, which means the physical server is functioning properly.

Physical Server LOM

Fault state machines related to LOM Managers. You create LOM Managers on the LOM Managers tab (select Design > LOM Managers) and they appear on the Resources tab in the LOM Managers folder.

LOM

The status of the lights-out management (LOM) Manager:

LOM Macro Failed—VFrame tried to perform a power management function using the LOM Manager, but the macro used to perform the function did not work properly. Test your macro and make the changes required for it to function correctly.

OK—The LOM Manager is functioning correctly.
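The Server Inventory states in Table 15-3 can be read as the result of comparing a new inventory with the previously known one. The sketch below is hypothetical throughout: the function name, the dictionary keys (`cpu_arch`, `hba_vendor`, `on_managed_switch`, `memory_gb`), and the order in which the checks are applied are assumptions made for illustration. Only the state names and the example conditions come from the table.

```python
from typing import Any, Dict, Optional

SUPPORTED_HBA_VENDORS = {"Emulex", "Qlogic"}


def server_inventory_state(
    previous: Dict[str, Any], current: Optional[Dict[str, Any]]
) -> str:
    """Classify a reinventory result; None means the inventory job failed."""
    if current is None:
        return "Inventory Failed"
    if (current.get("cpu_arch") != "x86"
            or current.get("hba_vendor") not in SUPPORTED_HBA_VENDORS):
        return "Minimum Requirement Not Met"
    if not current.get("on_managed_switch", False):
        # Example from the table: the server is not connected to a managed switch.
        return "Manageability Changed"
    if current != previous:
        return "Inventory Changed"
    return "OK"


baseline = {
    "cpu_arch": "x86",
    "hba_vendor": "Emulex",
    "on_managed_switch": True,
    "memory_gb": 8,
}
```

For example, calling `server_inventory_state(baseline, {**baseline, "memory_gb": 16})` in this sketch yields Inventory Changed, which mirrors the table's example of a server that has more memory than it used to have.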


Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Resource Health Monitoring Tab

Network Fault State Machines

The network fault state machines relate to your Ethernet switches. You can configure these settings:

Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.

Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state is changed. If your network is congested, configuring the polling cycle to be higher than one cycle can prevent unnecessary state changes.

Table 15-4 explains the network fault state machines and is organized according to the folder representation in the interface.

Table 15-4 Physical Network Resource Fault State Machines 

Fault State Machine
Description and States

Network Interface

Fault state machines related to the switch's network interfaces.

Inventory Check

VFrame regularly takes inventory of network interfaces connected to managed devices to determine if something has changed. This inventory check can result in these states:

Missing—The network interface is currently used in a service network, but it is no longer on the switch. This is a serious problem that affects service network functioning.

OK—There are no identifiable problems with the port.

Port Status

The status of the port. These are the states:

Down—The port is not functioning properly.

Admin Down—The port has been shut down intentionally by an administrator.

Up—The port is up and functioning properly.

Module Status

Whether the service module that hosts the Ethernet port is functioning correctly:

Up—The module is functioning correctly.

Down—There is a problem with the module and it is not functioning correctly.

Line Card Module

Fault state machines related to an Ethernet port module in the switch.

OS Version State

VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the Ethernet module with the versions that VFrame supports. These are the possible states:

Unknown OS Version—VFrame could not determine the OS version running on the module. VFrame cannot use a module in this state.

Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.

OK OS Version—The module is running a supported OS version.

Module Status

Whether the line card module is functioning correctly:

Up—The module is functioning correctly.

Down—There is a problem with the module and it is not functioning correctly.

Inventory Check

VFrame regularly takes inventory of the managed modules in a switch to determine if something has changed. This inventory check can result in these states:

Missing—A managed line card is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.

OK—There are no identifiable problems with the module.

Supervisor Module

Fault state machines related to the switch's supervisor engine.

Module Status

Whether the supervisor engine is functioning correctly:

Up—The module is functioning correctly.

Down—There is a problem with the module and it is not functioning correctly.

Inventory Check

VFrame regularly takes inventory of the managed modules in a switch to determine if something has changed. This inventory check can result in these states:

Missing—A managed supervisor engine is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.

OK—There are no identifiable problems with the module.

OS Version State

VFrame rediscovers switches as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the supervisor engine with the versions that VFrame supports. These are the possible states:

Unknown OS Version—VFrame could not determine the OS version running on the module. VFrame cannot use a module in this state.

Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a switch in this state.

OK OS Version—The switch is running a supported OS version.

Switch Chassis

Fault state machines related to the switch as a whole.

OS Version State

VFrame rediscovers switches as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the operating system (OS) version running on the switch with the versions that VFrame supports. These are the possible states:

Unsupported Version—VFrame cannot manage a switch running this version of the switch OS. VFrame cannot use a switch in this state.

OK OS Version—The switch is running a supported OS version.

Device OS Version Change

Whether the operating system (OS) version running on the device has changed since the service network was deployed.

Change—The OS version changed.

No Change—The OS version on the device has not changed.

Inventory Check

VFrame regularly takes inventory of managed switches to determine if something has changed. This inventory check can result in these states:

VLAN Missing—A managed VLAN is no longer defined on the switch. Check the switch and recreate the VLAN, or unmanage the VLAN.

OK—There are no identifiable problems with the switch.

SNMP Reachable

Whether an SNMP query of the switch succeeded while VFrame was monitoring the switch. These are the states:

Down—The SNMP query was unsuccessful. Ensure that you configured the correct SNMP credentials in VFrame. If the credentials are correct, check the network connectivity (ensure that VFrame can route to the switch) and check the switch for configuration problems.

Up—The SNMP query was successful.

SSH Reachable

Whether VFrame can connect to the device using SSH. These are the states:

Unreachable—VFrame cannot connect to the device using SSH. Check the credentials defined for the device. If the credentials are correct, check the network connectivity (ensure that VFrame can route to the switch) and check the switch for configuration problems.

Reachable—VFrame can connect to the device using SSH.

Hardware Change

Whether the physical hardware for this switch changed since the last inventory. These are the states:

Hardware ID Change—The switch's serial number is not the same as the one previously stored for the switch based on the management IP address. The switch cannot be acquired by service networks until you clear this problem. Your options are:

If you accidentally used an existing management IP address on a new switch, fix the new switch's configuration and rediscover both switches.

If you are replacing a switch, unmanage the switch in VFrame, delete it, and then rediscover it. You will also have to rediscover any servers that were connected to the replaced switch (see Discovering Servers, page 6-13).

OK—The physical hardware has not changed.

VLAN

Fault state machines related to the VLANs defined in the switches.

VFrame-Created VLAN Config

Whether a VLAN that only VFrame should create was found unexpectedly on a switch. If you define a VLAN as a VFrame-created VLAN, it should appear on a switch only if the switch is used by a deployed (operational) service network. (For information on defining VFrame-created VLANs, see Adding New VLAN Resources.) This fault state machine does not apply to discovered (pre-defined) VLANs.

These are the possible states:

Found—The VFrame-created VLAN is defined on one or more switches but it is not being used by a service network. If the VLAN is intended solely for VFrame's use as a VFrame-created VLAN, then you can delete it from the switch. If it is not intended as a VFrame-created VLAN, then delete it from the VFrame-created VLAN resource pool.

OK—This state can indicate these conditions:

For VFrame-created VLANs that are not currently being used by service networks, the VLANs are not configured on any managed switch. This is the expected result.

For VFrame-created VLANs that are currently being used by service networks, this state does not imply that the VLAN was configured correctly on the switches. To verify VLAN configuration, from the Operations tab (select Operations > Operations), use the Verification Results and Configuration Results tabs to check that VFrame correctly configured the VLAN.

Inventory Check

VFrame regularly takes inventory of managed switches to determine which VLANs are defined on each switch. This inventory check can result in these states:

Missing—A managed VLAN is no longer defined on one or more of the managed switches. Check the switch and recreate the VLAN, or unmanage the VLAN.

OK—There are no identifiable problems with the VLAN's configuration.
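The VLAN inventory check described above amounts to comparing the managed VLANs against what each managed switch reports. The following Python sketch illustrates that comparison only. The function name, the data shapes, and the assumption that each VLAN has a known set of expected switches are hypothetical and are not part of VFrame.

```python
def vlan_inventory_states(managed_vlans, vlans_by_switch):
    """Classify each managed VLAN as "OK" or "Missing".

    managed_vlans:   dict of VLAN ID -> set of switch names expected to define it
                     (hypothetical bookkeeping, assumed for this sketch)
    vlans_by_switch: dict of switch name -> set of VLAN IDs found during inventory

    Returns a dict of VLAN ID -> (state, sorted list of switches lacking the VLAN).
    """
    states = {}
    for vlan_id, expected_switches in managed_vlans.items():
        # A switch that returned no inventory at all is treated as lacking the VLAN.
        missing_on = sorted(
            switch for switch in expected_switches
            if vlan_id not in vlans_by_switch.get(switch, set())
        )
        if missing_on:
            states[vlan_id] = ("Missing", missing_on)
        else:
            states[vlan_id] = ("OK", [])
    return states
```

A VLAN is reported as Missing as soon as one expected switch no longer defines it, which matches the "one or more of the managed switches" wording of the Missing state.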


Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Resource Health Monitoring Tab

Network Services Fault State Machines

The network services fault state machines relate to your Ethernet switch service modules, such as Firewall Services Modules (FWSM). You can configure these settings:

Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.

Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state is changed. If your network is congested, setting the number of polling cycles higher than one can prevent unnecessary state changes.
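The effect of the Polling Cycles setting can be pictured as a debounce on the observed state. The following Python sketch is an illustration under one assumption: a new state must be observed for the configured number of consecutive polls before it is committed. The class and attribute names are hypothetical and do not describe the VFrame implementation.

```python
class DebouncedState:
    """Commit a state change only after N consecutive polls report it."""

    def __init__(self, initial, polling_cycles=1):
        if polling_cycles < 1:
            raise ValueError("polling_cycles must be at least 1")
        self.state = initial              # last committed state
        self.polling_cycles = polling_cycles
        self._candidate = None            # pending state not yet committed
        self._count = 0                   # consecutive polls that saw the candidate

    def poll(self, observed):
        """Record one poll result; return True if the committed state changed."""
        if observed == self.state:
            # The resource is back in its committed state: drop any pending change.
            self._candidate, self._count = None, 0
            return False
        if observed == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = observed, 1
        if self._count >= self.polling_cycles:
            self.state = observed
            self._candidate, self._count = None, 0
            return True
        return False
```

With `polling_cycles=3`, a single failed poll on a congested network does not change the state; only three failed polls in a row do.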

Table 15-5 explains the network services fault state machines and is organized according to the folder representation in the interface.

Table 15-5 Physical Network Services Resource Fault State Machines 

Fault State Machine
Description and States
CSM Module

Fault state machines related to the Content Switching Module (CSM).

Failover State

The failover state of the module.

Unexpected State—The failover state was not the expected state. For example, the failover state is neither active nor standby.

Standby—The failover state is standby.

Active—The failover state is active.

Non-HA—The module is not configured for high availability.

OS Version State

VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the OS version running on the module with the versions that VFrame supports. These are the possible states:

Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.

OK OS Version—The module is running a supported OS version.

HA State

The high-availability (HA) state of the module.

Changed to Non-HA Detected on Device—The module was configured to be part of a high-availability group, but its configuration was changed to non-HA.

Second Member Changed to Non-HA—This module is the last member of the HA group and its configuration was changed to non-HA.

Changed to HA Detected on Device—The module configuration was changed from non-HA to HA.

OK—There is no change to the HA configuration.

Non-HA—The module is not configured for high availability.

DB Non-HA State Synchronized with Device—During reinventory, VFrame discovers that a module configuration was changed from HA to non-HA. VFrame removes the module from the HA pair in the VFrame database. Note that you can only remove one member of the HA pair at a time.

DB HA State Synchronized with Device—During reinventory, a non-VFrame-managed module that is part of an HA pair with a VFrame-managed module was discovered and placed into management.

Failover Link State

Whether the failover link state between members of an HA pair has changed.

Up—The failover link is up.

Down—The failover link is down.

Non-HA—The module is not configured for high availability.

Change HA Group

Whether the module has joined a different HA group.

HA Group Change Detected—The module has joined a different HA group.

OK—The module's HA configuration has not changed.

Non-HA—The module is not configured for high availability.

Config Sync State

Whether the module is synchronized with its HA peer.

HA Pair Out of Sync—The configuration is not synchronized with the peer's configuration.

Non-HA—The module is not configured for high availability.

HA Pair in Sync—The configuration is synchronized with the peer's configuration.

Peer Config

The state of the peer configuration for a high-availability configuration.

Peer Missing—The peer is missing.

OK—There has been no change in the peer configuration since the last reinventory.

Non-HA—The module is not configured for high availability.

Resource Type Change

Whether the resource type for the module has changed.

Changed to Non-HA—The module was originally part of a managed HA pair, but was reconfigured to remove its HA configuration. To resolve this fault, unmanage the module, and then manage it again.

OK—The resource type has not changed.

Module Status

Whether the service module is functioning correctly:

Up—The module is functioning correctly.

SSH Unreachable—VFrame cannot log into the module using SSH. SSH login is required to configure the module.

Down—There is a problem with the module and it is not functioning correctly.

Inventory Check

VFrame regularly takes inventory of the managed modules in a switch to determine if something has changed. This inventory check can result in these states:

Missing—A managed CSM is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.

OK—There are no identifiable problems with the module.

FW Services Module

Fault state machines related to the Firewall Services Module (FWSM).

Peer Config

The state of the peer configuration for a high-availability configuration.

Peer Missing—The peer is missing.

OK—There has been no change in the peer configuration since the last reinventory.

Non-HA—The module is not configured for high availability.

OS Version State

VFrame rediscovers modules as a regular part of maintaining an accurate device inventory. During rediscovery, VFrame compares the OS version running on the module with the versions that VFrame supports. These are the possible states:

Unsupported Version—VFrame cannot manage a module running this version of the OS. VFrame cannot use a module in this state.

OK OS Version—The module is running a supported OS version.

Failover Link State

Whether the failover link state between members of an HA pair has changed.

Up—The failover link is up.

Down—The failover link is down.

Non-HA—The module is not configured for high availability.

HA State

The high-availability (HA) state of the module.

Changed to Non-HA Detected on Device—The module was configured to be part of a high-availability group, but its configuration was changed to non-HA.

Second Member Changed to Non-HA—This module is the last member of the HA group and its configuration was changed to non-HA.

Changed to HA Detected on Device—The module configuration was changed from non-HA to HA.

OK—There is no change to the HA configuration.

Non-HA—The module is not configured for high availability.

DB Non-HA State Synchronized with Device—During reinventory, VFrame discovers that a module configuration was changed from HA to non-HA. VFrame removes the module from the HA pair in the VFrame database. Note that you can only remove one member of the HA pair at a time.

DB HA State Synchronized with Device—During reinventory, a non-VFrame-managed module that is part of an HA pair with a VFrame-managed module was discovered and placed into management.

Change HA Group

Whether the module has joined a different HA group.

HA Group Change Detected—The module has joined a different HA group.

OK—The module's HA configuration has not changed.

Non-HA—The module is not configured for high availability.

Resource Type Change

Whether the resource type for the module has changed.

Change to Non-HA—The module was originally part of a managed HA pair, but was reconfigured to remove its HA configuration. To resolve this fault, unmanage the module, and then manage it again.

OK—The resource type has not changed.

Module Status

Whether the service module is functioning correctly:

SSH Unreachable—VFrame cannot log into the module using SSH. SSH login is required to configure the module.

Up—The module is functioning correctly.

Down—There is a problem with the module and it is not functioning correctly.

Inventory Check

VFrame regularly takes inventory of the managed modules in a switch to determine if something has changed. This inventory check can result in these states:

Missing—A managed FWSM is no longer in the switch. If the module was intentionally removed, unmanage the module. Otherwise, identify and fix any problems associated with the module.

OK—There are no identifiable problems with the module.

Context Mode Change State

Whether the FWSM has changed from multiple mode to single mode.

Changed to Single Context Mode—The mode has changed.

OK—The mode has not changed.

Virtual Context

Fault state machines related to FWSM security contexts.

Firewall Virtual Context Status

The status of the security context.

Active—The security context is functioning correctly.

Not Initialized—The security context configuration is not complete. For example, the URL might not be configured, or the security context is down due to some other error.

Unexpected State—VFrame could not determine the state or the state was not the expected one.

Firewall Mode Change State

Whether the firewall mode has changed for the security context.

Changed to Transparent—The mode has changed to transparent mode.

Change to Routed—The mode has changed to routed mode.

OK—The mode has not changed.

Failover State

The failover state of the security context.

Unexpected State—The failover state was not the expected state. For example, the failover state is neither active nor standby.

Standby—The failover state is standby.

Active—The failover state is active.

Non-HA—The security context is not configured for high availability.

Firewall Virtual Context Present

VFrame regularly takes inventory of the security contexts defined in an FWSM to determine if something has changed. This inventory check can result in these states:

Missing—A managed security context is no longer in the FWSM. If the security context was intentionally removed, unmanage the context. Otherwise, identify and fix any problems associated with the context.

OK—There are no identifiable problems with the security context.

Peer Config

The state of the peer configuration for a high-availability configuration.

Peer Missing—The peer is missing.

OK—There has been no change in the peer configuration since the last reinventory.

Non-HA—The security context is not configured for high availability.
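The Failover State entries in Table 15-5 list four states: Active, Standby, Non-HA, and Unexpected State. As a rough illustration, the following Python sketch maps a reported failover role to one of those states. The function name, its parameters, and the form of the reported value are hypothetical; VFrame's actual detection logic is not documented here.

```python
def failover_state(ha_configured, reported_role):
    """Map a reported failover role to one of the four documented states.

    ha_configured: True if the module or security context is configured for HA
    reported_role: role string read from the device (assumed format), or None
    """
    if not ha_configured:
        return "Non-HA"
    role = (reported_role or "").strip().lower()
    if role == "active":
        return "Active"
    if role == "standby":
        return "Standby"
    # Anything else, including no answer, is neither active nor standby.
    return "Unexpected State"
```

Checking the HA configuration first keeps a non-HA module from ever being reported in Unexpected State.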


Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Resource Health Monitoring Tab

Storage Fault State Machines

The storage fault state machines relate to your storage devices, such as storage area network (SAN) equipment and NAS filers. You can configure these settings:

Alarm Notification Level—The fault alarm notification level (priority) for fault alarms generated by changes to this state. The alarm notification level determines which notifications are sent based on your notification settings.

Polling Cycles—In many of these fault state machines, you can set the number of polling cycles that must occur before the state is changed. If your network is congested, setting the number of polling cycles higher than one can prevent unnecessary state changes.
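One way to picture how the Alarm Notification Level setting interacts with your notification settings is as a threshold per notification channel. The following Python sketch is only a sketch under that assumption. The level names, their ordering, and the threshold model are hypothetical and are not taken from VFrame. The e-mail and syslog channels are the two notification types this chapter mentions.

```python
# Hypothetical level names, ordered from least to most severe.
LEVELS = ["info", "warning", "critical"]


def channels_for(alarm_level, settings):
    """Return the sorted channel names that should receive this alarm.

    alarm_level: notification level assigned to the fault state machine
    settings:    dict of channel name -> minimum level that triggers it
                 (an assumed model of the notification settings)
    """
    rank = LEVELS.index(alarm_level)
    return sorted(
        channel for channel, minimum in settings.items()
        if rank >= LEVELS.index(minimum)
    )
```

For example, with `{"email": "critical", "syslog": "info"}`, an alarm at the hypothetical "warning" level would go to syslog only.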

Table 15-6 explains the storage fault state machines and is organized according to the folder representation in the interface.

Table 15-6 Physical Storage Resource Fault State Machines 

Fault State Machine
Description and States
CIFS Share

Fault state machines related to CIFS shares.

CIFS Share Destroyed

Whether the CIFS share was destroyed, true or false.

CIFS Share Volume Offline

Whether the volume used by the share is offline, true or false.

CIFS Share State

The current state of a managed CIFS share:

Missing on Filer—The share does not exist on the NAS filer.

Unknown—The share state is unknown because of a communication failure with the NAS filer.

Known—The share state is known.

CIFS Share Volume Missing

Whether the volume used by the share is missing from the database, true or false.

FC Port

Fault state machines related to Fibre Channel ports.

Storage Port VSAN Membership State

Whether the managed storage array port is in the expected VSAN.

Not Present—The port is not in the expected VSAN. This state might happen if you move a port to a different VSAN or if the port is offline. Check the storage array to determine the exact problem.

Present—The port is in the expected VSAN.

Storage Inventory

If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, the system takes inventory of the managed Fibre Channel ports in a storage array to determine if something has changed. This inventory check can result in these states:

Physical Resource Missing—A managed port is no longer on the storage array.

OK—There were no changes to managed ports.

Logical Fabric

Fault state machines related to SAN fabrics.

VSAN Membership State

Whether a VSAN's storage port membership includes the expected storage ports.

OK—All of the expected storage ports are members of the VSAN.

Missing Storage Ports—At least one storage port that is supposed to be part of the VSAN is not part of it. The fault alarm shows the World Wide Name (WWN) of the missing port.

VSAN State

The current state of a managed VSAN:

Down—The seed switch indicates that the VSAN's operational state is down.

Missing—The VSAN was previously discovered in the fabric, but was not found during rediscovery. Check the VSAN configuration in the storage device. Before removing a VSAN, you should first unmanage it in VFrame to avoid this fault.

Up—The seed switch indicates that the VSAN's operational state is up.

Logical Unit

Fault state machines related to SAN logical units (LUNs).

Storage Inventory

If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, the system takes inventory of the managed LUNs in a storage array to determine if something has changed. This inventory check can result in these states:

Physical Resource Missing—A managed LUN no longer exists.

All Mappings Missing—This managed LUN does not have any mappings to ports, which means it is not visible to outside networks. The LUN is not usable in this state. Correct the storage configuration in the storage array.

Certain Mappings Missing—A mapping that was previously discovered to exist (between a LUN and a port) no longer exists. If the mapping should still exist, check the storage configuration in the storage array for errors.

OK—There were no changes to managed LUNs.

NFS Qtree

Fault state machines related to NFS Qtrees.

Qtree Volume is Not Online

Whether the Qtree volume is not online. True indicates that the volume is offline.

Qtree Removed

Whether the Qtree was removed, true or false.

NFS Volume

Fault state machines related to NFS filer volumes.

Volume State

The state of the volume as identified in the device's ONTAPI response, or Unreachable if VFrame cannot complete an ONTAPI call to obtain the volume state.

The only state that allows normal function is Online. All other states mean that VFrame cannot use the volume.

NFS Volume Availability Check

The state of an NFS volume's availability:

Destroyed—The volume was destroyed.

Export Rule Not Programmed—The export rule for the volume is not defined, and the volume is not usable.

Online—The volume is online and available for use.

Storage Array

Fault state machines related to SAN storage arrays.

Storage Inventory

If you use storage manager templates and run a discovery job to rediscover storage devices that you already manage in VFrame, the system takes inventory of the managed storage arrays to determine if something has changed. This inventory check can result in these states:

Physical Resource Missing—A managed storage array cannot be found.

OK—There were no changes to managed storage arrays.

Storage Station

Fault state machines related to storage managers.

Storage Station State

The current state of the storage manager, which is used for discovering and managing storage arrays.

Login Error—The credentials defined for logging into the storage manager did not work. Correct the credentials defined for the storage manager.

Unreachable—The storage manager cannot be pinged from VFrame. Verify that the storage manager is online and that it is on a network that VFrame can route to.

OK—The storage manager is functional.
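The Storage Inventory states listed for logical units in Table 15-6 can be read as a series of checks applied after rediscovery. The following Python sketch shows one plausible ordering of those checks. The function name, the parameters, and the precedence of the checks are assumptions made for illustration, not a description of VFrame internals.

```python
def lun_inventory_state(lun_found, known_mappings, current_mappings):
    """Classify a managed LUN after a rediscovery job.

    lun_found:        True if the LUN still exists on the storage array
    known_mappings:   set of port identifiers the LUN was mapped to previously
    current_mappings: set of port identifiers the LUN is mapped to now
    """
    if not lun_found:
        return "Physical Resource Missing"
    if not current_mappings:
        # No LUN-to-port mappings remain, so the LUN is not usable.
        return "All Mappings Missing"
    if known_mappings - current_mappings:
        # At least one previously discovered mapping has disappeared.
        return "Certain Mappings Missing"
    return "OK"
```

In this sketch, new mappings that appear since the last discovery do not raise a fault; only lost mappings do.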


Related Topics

Understanding Faults and Fault State Machines

Configuring Resource Health Monitoring

Resource Health Monitoring Tab