Understanding Health Scores
ACME's Operations team has regularly been challenged to answer basic questions about the current status, performance, and availability of the system it operates. To address this, the team can now use the Cisco Application Centric Infrastructure (ACI), whose health scores report on status, performance, and availability. Such answers may be easy to provide for an individual device or link, but that information alone has little to no value without data on how it affects the overall health of the network. Manually collecting and correlating this information was previously a long and tedious task. With health scores, data throughout the fabric is collected, computed, and correlated by the Application Policy Infrastructure Controller (APIC) in real time and then presented in an understandable format.
Traditional network monitoring and management systems attempt to provide a model of the infrastructure that has been provisioned and to describe the relationships between the various devices and links. In ACI, by contrast, the object model is inherent to the infrastructure. A single consolidated health score therefore reflects the current status of all of the objects, including links, devices, their relationships, and the real-time status of their utilization, and gives a quick at-a-glance assessment of the entire system or any subset of it. This visibility has a number of practical use cases, which this chapter classifies as reactive and proactive. ACI also provides the flexibility to monitor some aspects of how the health scores are calculated, and how various faults impact the calculation of the health score.
Most objects in the model have an associated health score, which can be found on the Dashboard or Policy tabs of the object in the GUI. To check the overall fabric health, in the APIC GUI, go to . You can view the following information:
The controller health
Nodes with health less than 99
Tenants with health less than 99
A health graph depicting the health score of the system over a period of time
The health graph is a good indication of any system issues. If the system is stable, the graph is constant; otherwise, it fluctuates.
All health scores are instantiated from the healthInst class and can be extracted through the API.
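As a minimal sketch of that extraction, the Python below parses a class-query response for healthInst and lists the objects scoring below a threshold, much like the "health less than 99" lists above. The query path, the response shape (an imdata list of healthInst objects with dn and cur attributes), and the sample payload are assumptions for illustration and should be checked against the APIC REST API documentation for your release; the function names are hypothetical.

```python
import json

# Assumed class-query path; verify it against your APIC release.
HEALTH_CLASS_QUERY = "/api/class/healthInst.json"


def parse_health_scores(payload):
    """Return {dn: current score} from an assumed healthInst class-query response."""
    scores = {}
    for item in payload.get("imdata", []):
        attrs = item.get("healthInst", {}).get("attributes", {})
        if "dn" in attrs and "cur" in attrs:
            scores[attrs["dn"]] = int(attrs["cur"])
    return scores


def below_threshold(scores, threshold=99):
    """Return the DNs whose score is below the threshold, lowest score first."""
    return sorted((dn for dn, s in scores.items() if s < threshold), key=scores.get)


# Illustrative payload, not captured from a real fabric.
SAMPLE = json.loads("""
{"totalCount": "2", "imdata": [
  {"healthInst": {"attributes": {"dn": "topology/pod-1/node-101/sys/health", "cur": "72"}}},
  {"healthInst": {"attributes": {"dn": "uni/tn-tenant1/health", "cur": "100"}}}
]}
""")

if __name__ == "__main__":
    scores = parse_health_scores(SAMPLE)
    for dn in below_threshold(scores):
        print(dn, scores[dn])
```

In practice the payload would come from an authenticated HTTPS request to the APIC rather than from an embedded string; that step is omitted here so the sketch runs without a fabric.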
In a reactive capacity, ACI health scores provide a quick check: a newly occurred issue immediately degrades the health score. The root cause of the issue can be found by exploring the faults. Health scores also provide real-time correlation in a failure scenario, giving immediate feedback as to which tenants, applications, and EPGs are impacted by that failure.
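One way to picture the reactive check is as a comparison of each object's current score with its previous score. The sketch below is a hypothetical illustration, not how the APIC itself works: the HealthSample record, the degraded function, and the sample values are all assumptions made for this example.

```python
from typing import NamedTuple


class HealthSample(NamedTuple):
    """Hypothetical record holding one object's current and previous score."""
    dn: str
    current: int
    previous: int


def degraded(samples, min_drop=1):
    """Return (dn, drop) pairs for scores that fell by at least min_drop, largest drop first."""
    drops = [
        (s.dn, s.previous - s.current)
        for s in samples
        if s.previous - s.current >= min_drop
    ]
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Illustrative values only.
    samples = [
        HealthSample("uni/tn-tenant1/health", 100, 100),
        HealthSample("topology/pod-1/node-101/sys/health", 72, 95),
        HealthSample("topology/pod-1/node-102/sys/health", 90, 95),
    ]
    for dn, drop in degraded(samples):
        print(f"{dn} dropped by {drop}")
```

The objects it reports are the ones whose faults are worth exploring first.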
Almost every object and policy has a Health tab. As an example, to check if a specific EPG has faults, you can go to . In the work pane, look for the Health tab. You can also access the Health tab under . This tab provides the affected object and how it is tied within the larger model. By clicking on the +, you can explore the health tree of any affected object or policy to reveal the faults.
Proactively, ACI health scores can help identify potential bottlenecks in terms of hardware resources, bandwidth utilization, and other capacity planning exercises. Operations teams also stand a better chance of identifying issues before they impact customers or users.
Ideally, the health of all application and infrastructure components should always be at 100%. However, this is not always realistic given the dynamic nature of data center environments: links, equipment, and endpoints fail. Instead, the health score should be seen as a metric that changes over time, with the goal of increasing the average health score of a given set of components over time.
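Tracking that average can be sketched in a few lines. The example below assumes each snapshot is a plain mapping from component name to score, collected at successive points in time; the function names and the sample data are hypothetical.

```python
from statistics import mean


def average_health(snapshot):
    """Average the scores of every component in one snapshot."""
    return mean(snapshot.values())


def average_over_time(snapshots):
    """Return the per-snapshot averages for a time-ordered list of snapshots."""
    return [average_health(s) for s in snapshots]


def improvement(snapshots):
    """Return the change in average health between the first and last snapshot."""
    averages = average_over_time(snapshots)
    return averages[-1] - averages[0]


if __name__ == "__main__":
    # Illustrative values only: two components observed at two points in time.
    history = [
        {"leaf-101": 72, "tenant1": 100},
        {"leaf-101": 90, "tenant1": 100},
    ]
    print(average_over_time(history))
    print(improvement(history))
```

A positive result from improvement means the average health of the set has risen over the observed period, which is the stated goal.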
Viewing a Health Score Using the NX-OS-Style CLI
You can use the NX-OS-style CLI to view the health of specific objects.
To view the health of a tenant:
show health tenant tenant
To view the health of a bridge domain of a tenant:
show health tenant tenant bridge domain bd
To view the health of an endpoint group of an application within a tenant:
show health tenant tenant application app epg epg
To view the health of a leaf:
show health leaf leafnode
The following example views the health of tenant "tenant1":
apic1# show health tenant tenant1
 Score  Change(%)  UpdateTS                       Dn
 -----  ---------  -----------------------------  ------------------------------
 100    0          2015-11-13T18:23:14.767-08:00  uni/tn-pineapple/health
The following example views the health of leaf 101:
apic1# show health leaf 101
 Score  Change(%)  UpdateTS                       Dn
 -----  ---------  -----------------------------  ----------------------------------
 72     10         2015-11-11T12:55:52.847-08:00  topology/pod-1/node-101/sys/health
Viewing a Health Score Using the iShell
You can use the iShell to view the health of specific objects.
To view the health of an APIC:
show health controller ID
To view the health of a switch:
show health switch node
The following example views the health of switch 101:
admin@apic1:~> show health switch 101
 Current Score  Previous Score  Timestamp                      Dn
 -------------  --------------  -----------------------------  ----------------------------------
 72             65              2015-11-11T12:55:52.847-08:00  topology/pod-1/node-101/sys/health