However large and complex an ever-growing infrastructure becomes, administrators are fundamentally expected to prevent problems and to recover from them quickly when they do occur.
Troubleshooting, root-cause analysis, and remediation of network issues are common challenges for any infrastructure operation. They require network operators to have a high level of domain expertise and the ability to correlate data across complex IT environments, so that issues can be prevented or fixed with minimum disruption while maintaining the uptime needed to honor Service-Level Agreements (SLAs).
Current network operations tools do not address the needs of the modern network. These tools are:
● Fragmented: Unintegrated tools address siloed visibility use cases and work on different protocols. Some tools are outdated or expensive.
● Reactive: They have inconsistent API architectures and are not proactive. Because the tools act on stale data, root causes are difficult to analyze and it is often too late to react to issues.
● Limited in insight: These tools lack data-plane visibility, and the data they do collect is too low in fidelity to be actionable. Without data correlation, it is hard to get a holistic picture.
The Cisco® Network Insights applications are designed to address this challenge in a comprehensive and scalable way. Cisco Network Insights for Resources (NIR) provides analysis and correlation of software and hardware telemetry data, especially for day-2 network operations use cases, focusing on identifying anomalies and providing drill-down to specific issues.
Empower your team with proactive monitoring
● Learn from your network and recognize anomalies before your end users do.
● Generate proactive alerts useful in preventing outages.
Shorten time to remediation for troubleshooting
● Minimize critical troubleshooting time through automated root-cause analysis of data-plane anomalies such as packet drops, latency, workload movements, routing issues, and ACL drops.
● Assist auditing and compliance checks with searchable historical data presented in time-series format.
Increase speed and agility for capacity planning
● Detect and highlight components exceeding capacity thresholds through fabric-wide visibility of resource utilization and historical trends.
● Review time-series trends of capacity utilization so you can plan for resizing, restructuring, and repurposing.
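The capacity-planning idea above (flag resources at or above a threshold, then use the historical trend to estimate when others will reach it) can be sketched in a few lines of Python. This is an illustrative sketch only, not NIR's implementation: the `Sample` type, the function names, and the choice of a least-squares linear trend are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One utilization reading (hypothetical structure, not an NIR data type)."""
    hour: float         # hours since the start of the observation window
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def exceeds_threshold(samples: List[Sample], threshold: float) -> bool:
    """Return True if the most recent sample is at or above the threshold."""
    return bool(samples) and samples[-1].utilization >= threshold


def hours_until_threshold(samples: List[Sample], threshold: float) -> Optional[float]:
    """Estimate hours until the threshold is reached, using a least-squares line.

    Returns 0.0 if the latest sample is already at or above the threshold,
    and None if there are too few samples or the trend is flat or falling.
    """
    if len(samples) < 2:
        return None
    latest = samples[-1]
    if latest.utilization >= threshold:
        return 0.0

    n = len(samples)
    mean_t = sum(s.hour for s in samples) / n
    mean_u = sum(s.utilization for s in samples) / n
    var_t = sum((s.hour - mean_t) ** 2 for s in samples)
    if var_t == 0:
        return None  # all samples share one timestamp, so no trend can be fitted

    slope = sum((s.hour - mean_t) * (s.utilization - mean_u) for s in samples) / var_t
    if slope <= 0:
        return None  # utilization is not growing

    intercept = mean_u - slope * mean_t
    crossing_hour = (threshold - intercept) / slope
    return max(0.0, crossing_hour - latest.hour)
```

For example, five hourly samples rising steadily from 50% to 70% utilization give a slope of 5 percentage points per hour, so a 90% threshold is projected roughly four hours after the last sample. A production system would likely use a more robust model than a straight line; the point is only that time-series history turns a static threshold check into a forecast.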
Cisco Network Insights for Resources (NIR) gathers resource information through data collection to provide an overview of available resources and their active processes and configurations across the entire APIC. NIR works for both Cisco ACI™/APIC and Cisco NX-OS/DCNM platforms. This data gives you more usable information in a shorter time, which supports crucial business decisions and can lead to greater operational effectiveness. The application monitors and records hardware and software telemetry data over time to identify anomalies in the fabric and help automate troubleshooting, root-cause analysis, capacity planning, and remediation. It helps infrastructure owners comply with the SLAs required by their customers.
NIR is a tool whose GUI is integrated as a plugin into the Cisco ACI APIC and DCNM controller GUIs. NIR draws the administrator’s attention to significant matters relevant to the task at hand, such as troubleshooting, monitoring, auditing, or planning. NIR broadly consists of the following components:
● Anomaly detection: This involves understanding the behavior of each fabric component by using different machine-learning algorithms. When the resource behavior deviates from an expected pattern, anomalies are raised.
● End-point analytics: This monitors the availability, location, and health of end points and provides visibility into any impact on these end points from events or changes in the infrastructure. It helps derive potential root causes and reduce MTTR.
● Resource utilization: This is useful for capacity planning because it offers early detection of resources that are exceeding capacity thresholds. These analytics include monitoring of software and hardware resources such as CPU, memory, and VRFs to ensure that they are being used optimally. It also identifies anomalies by observing parameters such as CPU, memory, temperature, power draw, and fan speed.
● Statistics: This monitors and detects anomalies related to interface utilization, errors, protocol statistics, and state machines. It helps detect, locate, and root-cause issues. Correlation with end-point analytics provides impact-analysis data.
● Flow analytics: This helps identify, locate, and analyze root causes of data-path issues such as latency and packet drops for specific flows.
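The anomaly-detection principle described above, raising an anomaly when a resource's behavior deviates from an expected pattern, can be illustrated with a deliberately simple stand-in. The source states only that NIR uses machine-learning algorithms and does not say which ones; the sketch below instead uses a trailing-window z-score, and the function name, parameters, and default values are assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List


def detect_anomalies(
    values: List[float],
    window: int = 5,
    z_limit: float = 3.0,
    min_spread: float = 1.0,
) -> List[int]:
    """Return the indices of values that deviate from a trailing baseline.

    This is a simple z-score stand-in, NOT the algorithm NIR uses.
    window:     number of recent normal samples that define the baseline
    z_limit:    deviations (in units of spread) that count as anomalous
    min_spread: floor on the spread so a very flat baseline does not
                flag tiny fluctuations (same units as the metric)
    """
    history: Deque[float] = deque(maxlen=window)
    anomalies: List[int] = []
    for index, value in enumerate(values):
        if len(history) == window:
            baseline = mean(history)
            spread = max(pstdev(history), min_spread)
            if abs(value - baseline) / spread > z_limit:
                anomalies.append(index)
                continue  # keep anomalous readings out of the baseline
        history.append(value)
    return anomalies
```

Given a series of CPU readings that hover around 40% with a single spike to 95%, the function reports only the index of the spike. The first `window` samples are used purely to learn the baseline, which mirrors the idea that expected behavior has to be observed before deviations can be recognized.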
The dashboard view is intended for quick action on specific issues that need attention, as shown below: