The Ultra M Manager Node can be configured to aggregate events received from different Ultra M components as identified in Table 1.
Note: This functionality is currently supported only with Ultra M deployments based on OSP 10 that leverage the Hyper-Converged architecture.
Table 1 Component Event Sources

Solution Component: UCS server hardware
Event Source Type: CIMC
Details: Reports on events collected from UCS C-Series hardware via a CIMC-based subscription. These events are monitored in real time.

Solution Component: VIM (Overcloud)
Event Source Type: OpenStack service health
Details: Reports on OpenStack service fault events pertaining to:
- Failures (stopped, restarted)
- High availability
- Ceph / storage
- Neutron / compute host and network agent
- Nova scheduler (VIM instances)
By default, these events are collected at a 900-second polling interval as specified within the ultram_cfg.yaml file.
Note: To ensure optimal performance, it is strongly recommended that you do not change the default polling interval.

Solution Component: UAS (AutoVNF, UEM, and ESC)
Event Source Type: UAS cluster/USP management component events
Details: Reports on UAS service fault events.
By default, these events are collected at a 900-second polling interval as specified within the ultram_cfg.yaml file.
Note: To ensure optimal performance, it is strongly recommended that you do not change the default polling interval.
Events received from the solution components, regardless of the source type, are mapped against the Ultra M SNMP MIB (CISCO-ULTRAM-MIB.my, refer to Ultra M MIB). The event data is parsed and categorized against the following conventions:
- Fault code: Identifies the area in which the fault occurred for the given component. Refer to the “CFaultCode” convention within the Ultra M MIB for more information.
- Severity: The severity level associated with the fault. Refer to the “CFaultSeverity” convention within the Ultra M MIB for more information. Since the Ultra M Manager Node aggregates events from different components within the solution, the severities supported within the Ultra M Manager Node MIB map to those for the specific components. Refer to Ultra M Component Event Severity and Fault Code Mappings for details.
- Domain: The component in which the fault occurred (e.g. UCS hardware, VIM, UEM, etc.). Refer to the “CFaultDomain” convention within the Ultra M MIB for more information.
UAS and OpenStack events are monitored at the configured polling interval as described in Table 1. At each polling interval, the Ultra M Manager Node:
1. Collects data from UAS and OpenStack.
2. Generates/updates .log and .report files as well as an SNMP-based fault table with this information. The fault table also includes related data about the fault, such as the specific source, creation time, and description.
3. Processes any events that occurred:
- If an error or fault event is identified, a .error file is created and an SNMP trap is sent.
- If the event received is a clear condition, an informational SNMP trap is sent to “clear” an active fault.
- If no event occurred, no further action is taken beyond Step 2.
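The per-interval handling above can be summarized with a minimal conceptual sketch. This is not the Ultra M Manager implementation; the event labels and function shown here are assumptions made purely for illustration.
from typing import Iterable, List, Tuple

def process_interval(events: Iterable[Tuple[str, str]]) -> List[str]:
    # Steps 1 and 2 have already run for this interval: data was collected and
    # the .log/.report files and SNMP fault table were generated/updated.
    actions: List[str] = []
    # Step 3: act on any events that occurred during the interval.
    for source, kind in events:
        if kind == "fault":
            actions.append(f"{source}: create .error file and send SNMP trap")
        elif kind == "clear":
            actions.append(f"{source}: send informational trap to clear the fault")
        # no event for this source: nothing happens beyond Step 2
    return actions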
UCS events are monitored and acted upon in real time. When events occur, the Ultra M Manager Node generates a .log file and updates the SNMP fault table. Active faults are reported only once, not on every polling interval; as a result, only a single trap is sent for as long as the fault remains active. Once the fault is cleared, an informational trap is sent.
Note: UCS events are considered to be the “same” if a previously received fault has the same distinguished name (DN), severity, and lastTransition time. UCS events are considered “new” only if any of these elements change.
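A minimal sketch of this de-duplication rule, using hypothetical field names rather than the actual UCS/CIMC data model, might look like the following:
from typing import Dict, Set, Tuple

_seen: Set[Tuple[str, str, str]] = set()

def is_new_ucs_fault(fault: Dict[str, str]) -> bool:
    # A fault is the "same" as one already received if its DN, severity,
    # and lastTransition time are all unchanged; otherwise it is "new".
    key = (fault["dn"], fault["severity"], fault["lastTransition"])
    if key in _seen:
        return False   # same fault: no additional trap is generated
    _seen.add(key)
    return True        # new fault: logged and reported via an SNMP trap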
These processes are illustrated in Figure 2. Refer to About Ultra M Manager Log Files for more information.
Figure 2. Ultra M Manager Node Event Aggregation Operation
An example of the snmp_faults_table file is shown below; the entry syntax is described in Figure 3:
"0": [3 "neutonoc-osd-compute-0: neutron-sriov-nic-agent.service" 1 8 "status known"] "1": [3 "neutonoc-osd-compute-0: ntpd" 1 8 "Service is not active state: inactive"] "2": [3 "neutonoc-osd-compute-1: neutron-sriov-nic-agent.service" 1 8 "status known"] "3": [3 "neutonoc-osd-compute-1: ntpd" 1 8 "Service is not active state: inactive"] "4": [3 "neutonoc-osd-compute-2: neutron-sriov-nic-agent.service" 1 8 "status known"] "5": [3 "neutonoc-osd-compute-2: ntpd" 1 8 "Service is not active state: inactive"]
Refer to About Ultra M Manager Log Files for more information.
Figure 3. SNMP Fault Table Entry Description
Each element in the SNMP Fault Table Entry corresponds to an object defined in the Ultra M SNMP MIB as described in Table 2. (Refer also to Ultra M MIB.)
Table 2 SNMP Fault Entry Table Element Descriptions

SNMP Fault Table Entry Element: Entry ID
MIB Object: cultramFaultIndex
Additional Details: A unique identifier for the entry.

SNMP Fault Table Entry Element: Fault Domain
MIB Object: cultramFaultDomain
Additional Details: The component area in which the fault occurred. The following domains are supported in this release:
- hardware(1): Hardware, including UCS servers
- vim(3): OpenStack VIM manager
- uas(4): Ultra Automation Services modules

SNMP Fault Table Entry Element: Fault Source
MIB Object: cultramFaultSource
Additional Details: Information identifying the specific component within the Fault Domain that generated the event. The format of the information differs based on the Fault Domain. Refer to Table 3 for details.

SNMP Fault Table Entry Element: Fault Severity
MIB Object: cultramFaultSeverity
Additional Details: The severity associated with the fault as one of the following:
- emergency(1): System-level fault impacting multiple VNFs/services
- critical(2): Critical fault specific to a VNF/service
- major(3): Component-level failure within a VNF/service
- alert(4): Warning condition for a service/VNF; may eventually impact service
- informational(5): Informational only; does not impact service
Refer to Ultra M Component Event Severity and Fault Code Mappings for details on how these severities map to events generated by the various Ultra M components.

SNMP Fault Table Entry Element: Fault Code
MIB Object: cultramFaultCode
Additional Details: A unique ID representing the type of fault. The following codes are supported:
- other(1): Other events
- networkConnectivity(2): Network connectivity failure events
- resourceUsage(3): Resource usage exhausted events
- resourceThreshold(4): Resource threshold crossing alarms
- hardwareFailure(5): Hardware failure events
- securityViolation(6): Security alerts
- configuration(7): Configuration error events
- serviceFailure(8): Process/service failures
Refer to Ultra M Component Event Severity and Fault Code Mappings for details on how these fault codes map to events generated by the various Ultra M components.

SNMP Fault Table Entry Element: Fault Description
MIB Object: cultramFaultDescription
Additional Details: A message containing details about the fault.
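To illustrate how these objects appear in practice, consider the second entry from the earlier snmp_faults_table example. Assuming the bracketed fields follow the same order as the MIB objects listed above (domain, source, severity, code, description), the entry decodes as follows:
"1": [3 "neutonoc-osd-compute-0: ntpd" 1 8 "Service is not active state: inactive"]
- cultramFaultIndex: "1" (the entry ID)
- cultramFaultDomain: 3 = vim(3), the OpenStack VIM
- cultramFaultSource: "neutonoc-osd-compute-0: ntpd"
- cultramFaultSeverity: 1 = emergency(1)
- cultramFaultCode: 8 = serviceFailure(8)
- cultramFaultDescription: "Service is not active state: inactive"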
Table 3 cultramFaultSource Format Values

FaultDomain: Hardware (UCS Servers)
Format Value of cultramFaultSource: Node: <UCS-SERVER-IP-ADDRESS>, affectedDN: <FAULT-OBJECT-DISTINGUISHED-NAME>
Where:
<UCS-SERVER-IP-ADDRESS>: The management IP address of the UCS server that generated the fault.
<FAULT-OBJECT-DISTINGUISHED-NAME>: The distinguished name of the affected UCS object.

FaultDomain: UAS
Format Value of cultramFaultSource: Node: <UAS-MANAGEMENT-IP>
Where:
<UAS-MANAGEMENT-IP>: The management IP address of the UAS instance.

FaultDomain: VIM (OpenStack)
Format Value of cultramFaultSource: <OS-HOSTNAME>: <SERVICE-NAME>
Where:
<OS-HOSTNAME>: The hostname of the OpenStack node that generated the fault.
<SERVICE-NAME>: The name of the OpenStack service that generated the fault.
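For example, the source values in the earlier snmp_faults_table entries, such as "neutonoc-osd-compute-0: ntpd", follow the VIM (OpenStack) format of <OS-HOSTNAME>: <SERVICE-NAME>.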
Fault and alarm collection and aggregation functionality within the Hyper-Converged Ultra M solution is configured and enabled through the ultram_cfg.yaml file. (An example of this file is located in Example ultram_cfg.yaml File.) Parameters in this file dictate feature operation and enable SNMP on the UCS servers and event collection from the other Ultra M solution components.
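The fragment below is a purely illustrative sketch of how such parameters might be expressed in the file; the key names and structure shown are assumptions, not the file's actual schema. Refer to Example ultram_cfg.yaml File for the authoritative syntax.
# Hypothetical ultram_cfg.yaml fragment for illustration only;
# see Example ultram_cfg.yaml File for the actual parameter names.
snmp:
  enabled: true            # enable SNMP on the UCS servers
polling-interval: 900      # seconds; keeping the default is strongly recommended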
To enable this functionality on the Ultra M solution:
1. Install the Ultra M Manager bundle RPM using the instructions in Install the Ultra M Manager RPM.
Note: This step is not needed if the Ultra M Manager bundle was previously installed.
2. Become the root user.
sudo -i
3. Navigate to /etc.
cd /etc
4. Edit the ultram_cfg.yaml file based on your deployment scenario.
Note: The ultram_cfg.yaml file pertains to both the syslog proxy and event aggregation functionality. Some parts of this file's configuration overlap and may have been configured in relation to the other function.
5. Navigate to /opt/cisco/usp/ultram-health.
cd /opt/cisco/usp/ultram-health
6. Start the Ultra M Manager Service.
7. Verify the configuration by checking the ultram_health.log file.
cat /var/log/cisco/ultram_health.log