System Monitoring Overview

This guide describes how to configure and use system monitoring to manage a Cisco UCS Manager environment.

Cisco UCS Manager can detect system faults of the following severities: critical, major, minor, and warning. We recommend that:

  • You monitor all faults of either critical or major severity status, as immediate action is not required for minor faults and warnings.

  • You monitor faults that are not of type Finite State Machine (FSM), as FSM faults will transition over time and resolve.
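The two recommendations above amount to a simple filter. The following is a minimal sketch, assuming a simplified, hypothetical `Fault` record; the field names, the `"fsm"` type value, and the `EX-` codes are illustrative assumptions, not the Cisco UCS Manager data model.

```python
from dataclasses import dataclass

# Severities that the recommendation treats as requiring attention.
ACTIONABLE_SEVERITIES = {"critical", "major"}


@dataclass(frozen=True)
class Fault:
    """Hypothetical, simplified fault record used only for illustration."""
    code: str
    severity: str    # for example "critical", "major", "minor", "warning"
    fault_type: str  # for example "equipment" or "fsm" (assumed values)
    description: str


def needs_attention(fault: Fault) -> bool:
    """Return True for critical or major faults that are not FSM faults."""
    return (
        fault.severity.lower() in ACTIONABLE_SEVERITIES
        and fault.fault_type.lower() != "fsm"
    )


def actionable_faults(faults):
    """Keep only the faults worth monitoring, preserving input order."""
    return [fault for fault in faults if needs_attention(fault)]


if __name__ == "__main__":
    sample = [
        Fault("EX-1", "critical", "equipment", "example equipment fault"),
        Fault("EX-2", "major", "fsm", "example FSM fault"),
        Fault("EX-3", "minor", "equipment", "example minor fault"),
    ]
    for fault in actionable_faults(sample):
        print(fault.code, fault.severity, fault.description)
```

In this sketch only `EX-1` passes the filter: `EX-2` is dropped because it is an FSM fault, and `EX-3` because its severity is minor.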

This guide covers the following information:

  • System Log

    • System logs including faults, failures, and alarm thresholds (Syslog)

    • The three types of Syslogs: Fault, Event, and Audit logs

    • The Global Fault Policy and settings that control Syslogs

  • System Event Log

    • System hardware events for servers and chassis components and their internal components (System Event Log [SEL] logs)

    • The SEL policy that controls SEL logs

  • Simple Network Management Protocol

    • SNMP for monitoring devices from a central network management station and the host and user settings

    • Fault suppression policies for SNMP traps, Call Home notifications, and specific devices

  • Core File Exporter and logs, such as Syslog, Audit Log, and the System Event Log

  • Statistics Collection and Threshold Policies for adapters, chassis, host, ports, and servers

  • Call Home and Smart Call Home Cisco embedded device support

  • Hardware monitoring using the Cisco UCS Manager user interface

  • Traffic Monitoring sessions for analysis by a network analyzer

  • Cisco NetFlow Monitor for IP network traffic accounting, usage-based network billing, network planning, security, Denial of Service monitoring, and network monitoring

The Cisco UCS Manager Core and Fault Generation

The Cisco UCS Manager core consists of three elements: the Data Management Engine, the Application Gateway, and the user-accessible northbound interface. The northbound interface comprises SNMP, Syslog, the XML API, and the UCSM CLI.

You can monitor the Cisco UCS Manager servers through the XML API, SNMP, and Syslog. SNMP and Syslog are used only for monitoring: both are read-only, so no configuration changes are allowed from these interfaces. The XML API, by contrast, is a read-write interface, which allows you to monitor Cisco UCS Manager and change the configuration if needed.

Figure 1. Cisco UCS Manager Core and Monitoring Interfaces

Data Management Engine (DME)

The DME is the center of the Cisco UCS Manager system. It maintains:

  • The Cisco UCS XML database, which houses the inventory of all physical elements (blade and rack-mount servers, chassis, modules, and fabric interconnects).

  • The logical configuration data for profiles, policies, pools, and vNIC and vHBA templates.

  • The networking-related configuration details, such as VLANs, VSANs, port channels, network uplinks, and server downlinks.

The DME monitors:

  • The current health and state of all components of all physical and logical elements in a Cisco UCS domain.

  • The transition information of all Finite State Machine (FSM) tasks occurring.

Only the current inventory, health, and configuration data of the managed endpoints is stored in the Cisco UCS XML database, which keeps the data near real time. By default, the DME does not store a historical log of faults that have occurred in a Cisco UCS domain. As fault conditions are raised on the endpoints, the DME creates faults in the Cisco UCS XML database. As those faults are mitigated, the DME clears and removes the faults from the Cisco UCS XML database.
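The current-state-only behavior described above can be pictured with a toy model. The following is a minimal sketch, assuming a hypothetical `CurrentFaultStore` class keyed by an identifier string; it is not how the DME is implemented, only an illustration of a store that holds active faults and keeps no history.

```python
class CurrentFaultStore:
    """Toy model of a store that keeps only active faults, with no history."""

    def __init__(self):
        # Maps a hypothetical fault identifier to its current details.
        self._active = {}

    def raise_fault(self, fault_id, severity, description):
        """Record a fault when its condition is raised on an endpoint."""
        self._active[fault_id] = {
            "severity": severity,
            "description": description,
        }

    def clear_fault(self, fault_id):
        """Remove a fault once mitigated; nothing is archived."""
        self._active.pop(fault_id, None)

    def active(self):
        """Return a copy of the faults that are active right now."""
        return dict(self._active)


if __name__ == "__main__":
    store = CurrentFaultStore()
    store.raise_fault("example/fault-1", "major", "example fault")
    print(store.active())
    store.clear_fault("example/fault-1")
    print(store.active())
```

After `clear_fault` runs, the store holds no trace that the fault ever existed, which mirrors the default behavior described above.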

Application Gateway (AG)

Application Gateways are software agents that communicate directly with the endpoints to relay the health and state of the endpoints to the DME. AG-managed endpoints include servers, chassis, modules, fabric extenders, fabric interconnects, and NX-OS. The AGs actively monitor the server through the IPMI and SEL logs using the Cisco Integrated Management Controller (CIMC). They provide the DME with the health, state, configuration, and potential fault conditions of a device. The AGs manage configuration changes from the current state to the desired state during FSM transitions when changes are made to the Cisco UCS XML database.

The module AG and chassis AG communicate with the Chassis Management Controller (CMC) to get information about the health, state, configuration, and fault conditions observed by the CMC. The fabric interconnect NX-OS AG communicates directly with NX-OS to get information about the health, state, configuration, statistics, and fault conditions observed by NX-OS on the fabric interconnects. All AGs provide the inventory details to the DME about the endpoints during the various discovery processes. The AGs perform the state changes necessary to configure an endpoint during FSM-triggered transitions, monitor the health and state of the endpoints, and notify the DME of any faults.

Northbound Interfaces

The northbound interfaces include SNMP, Syslog, CLI, and XML API. The XML API present in the Apache webserver layer sends login, logout, query, and configuration requests using HTTP or HTTPS. SNMP and Syslog are both consumers of data from the DME.
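To make the login, query, and logout requests concrete, the following is a minimal sketch that builds XML request bodies and reads a session cookie from a login response. The method and attribute names (`aaaLogin`, `inName`, `inPassword`, `outCookie`, `configResolveClass`, `classId`, `inHierarchical`, `aaaLogout`, `inCookie`) and the `faultInst` class are assumptions that should be verified against the Cisco UCS Manager XML API reference. Sending the bodies over HTTP or HTTPS is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_login(username: str, password: str) -> str:
    """Build a login request body (element and attribute names assumed)."""
    elem = ET.Element("aaaLogin", {"inName": username, "inPassword": password})
    return ET.tostring(elem, encoding="unicode")


def extract_cookie(login_response: str) -> str:
    """Read the session cookie from a login response (attribute name assumed)."""
    root = ET.fromstring(login_response)
    cookie = root.get("outCookie")
    if not cookie:
        raise ValueError("login response did not contain a session cookie")
    return cookie


def build_class_query(cookie: str, class_id: str = "faultInst") -> str:
    """Build a query for all objects of one class, such as fault instances."""
    elem = ET.Element(
        "configResolveClass",
        {"cookie": cookie, "classId": class_id, "inHierarchical": "false"},
    )
    return ET.tostring(elem, encoding="unicode")


def build_logout(cookie: str) -> str:
    """Build a logout request body that releases the session."""
    elem = ET.Element("aaaLogout", {"inCookie": cookie})
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(build_login("admin", "example-password"))
    # A made-up response, used only to exercise the parsing step.
    cookie = extract_cookie('<aaaLogin response="yes" outCookie="example-cookie" />')
    print(build_class_query(cookie))
    print(build_logout(cookie))
```

Building the bodies with an XML library rather than string concatenation keeps attribute values correctly escaped, which matters when passwords contain characters such as `&` or `"`.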

SNMP informs and traps are translated directly from the fault information stored in the Cisco UCS XML database. SNMP GET requests are sent through the same object translation engine in reverse, where the DME receives a request from the object translation engine. The data is translated from the XML database to an SNMP response.

Syslog messages use the same object translation engine as SNMP, where the source of the data (faults, events, audit logs) is translated from XML into a Cisco UCS Manager-formatted Syslog message.
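The translation from an XML fault record to a Syslog line can be sketched as follows. This is an illustration only: the input attribute names (`dn`, `code`, `severity`, `descr`), the severity mapping, and the output layout are assumptions, and the output is not the actual Cisco UCS Manager Syslog format. The priority value follows the standard syslog convention of facility times 8 plus severity.

```python
import xml.etree.ElementTree as ET

LOCAL7 = 23  # standard syslog facility number for "local7"

# Assumed mapping from fault severity to syslog severity, for illustration.
SEVERITY_TO_SYSLOG = {"critical": 2, "major": 3, "minor": 4, "warning": 5}
DEFAULT_SYSLOG_SEVERITY = 6  # informational


def fault_xml_to_syslog(fault_xml: str, facility: int = LOCAL7) -> str:
    """Translate one XML fault element into a syslog-style text line."""
    fault = ET.fromstring(fault_xml)
    severity = fault.get("severity", "")
    level = SEVERITY_TO_SYSLOG.get(severity, DEFAULT_SYSLOG_SEVERITY)
    pri = facility * 8 + level  # syslog PRI value
    dn = fault.get("dn", "-")
    code = fault.get("code", "-")
    descr = fault.get("descr", "")
    return f"<{pri}>{dn} [{code}][{severity}] {descr}"


if __name__ == "__main__":
    example = (
        '<faultInst dn="sys/example" code="EX-1" '
        'severity="major" descr="example fault" />'
    )
    print(fault_xml_to_syslog(example))
```

Keeping the severity mapping in one dictionary makes it easy to adjust once the real mapping used by a given deployment is known.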

Cisco UCS Manager User Documentation

Cisco UCS Manager offers a new set of smaller, use-case-based guides, described in the following table:

| Guide | Description |
| --- | --- |
| Cisco UCS Manager Getting Started Guide | Discusses Cisco UCS architecture and Day 0 operations, including Cisco UCS Manager initial configuration and configuration best practices. |
| Cisco UCS Manager Administration Guide | Discusses password management, role-based access configuration, remote authentication, communication services, CIMC session management, organizations, backup and restore, scheduling options, BIOS tokens, and deferred deployments. |
| Cisco UCS Manager Infrastructure Management Guide | Discusses physical and virtual infrastructure components used and managed by Cisco UCS Manager. |
| Cisco UCS Manager Firmware Management Guide | Discusses downloading and managing firmware, upgrading through Auto Install, upgrading through service profiles, directly upgrading at endpoints using firmware auto sync, managing the capability catalog, deployment scenarios, and troubleshooting. |
| Cisco UCS Manager Server Management Guide | Discusses the new licenses, registering Cisco UCS domain with Cisco UCS Central, power capping, server boot, server profiles, and server-related policies. |
| Cisco UCS Manager Storage Management Guide | Discusses all aspects of storage management, such as SAN and VSAN in Cisco UCS Manager. |
| Cisco UCS Manager Network Management Guide | Discusses all aspects of network management, such as LAN and VLAN connectivity in Cisco UCS Manager. |
| Cisco UCS Manager System Monitoring Guide | Discusses all aspects of system and health monitoring, including system statistics in Cisco UCS Manager. |