Introduction

Networking has been rapidly evolving over the past two decades. This evolution has led to a broad range of available networking technologies (numerous PHY/MACs, routing protocols, QoS, high availability, security, etc.) in order to serve a plethora of applications with highly differentiated requirements (from best effort to deterministic) and at unprecedented scales.

Networks today are the lifeblood of many organizations worldwide. Whether in healthcare, manufacturing, education, retail, or other industries, the proper, ongoing operation of the network is vital to the operation of today’s organizations. As such, network analytics has become critically important: it allows an organization to gauge how well the network supports the goals of its business and to constantly adapt to new requirements serving a broad range of IT and IoT applications.

Are applications being accessed efficiently by users?

Are devices able to communicate with their servers and services seamlessly?

Are the users of the network getting the service levels they need to be productive?

The answers to these and many other questions allow the organization to determine how well the network is operating, to quickly and efficiently troubleshoot problems when they arise, and to keep adapting the network’s topology, configurations, and dimensioning.

Beyond this, there is the issue of scale. Networks today are so large, so widespread, and encompass so many interacting technologies that the sheer volume of data provided by the network quickly becomes overwhelming for the network operator. Moreover, how can the network move from a purely reactive stance to a more proactive one, spotting key trends and issues and making the necessary network changes before problems occur?

Let’s Take the Example of Troubleshooting

What network operators need is a platform capable of extracting astute and relevant insights.  Insights into the services provided to their applications and end users. Insights into network operations via multiple correlated key performance indicators (KPIs). Insights into network trending which reveals key aspects of network and application operation. Insights that help to rapidly distill the many thousands of otherwise-disparate data points down to a few key conclusions that drive actionable outcomes for the business.

Predictive Analytics Is Yet Another Example

Can the network predict issues before they happen, or even trigger automatic remediation (for example, through a closed loop control system)?

This is where the power of machine learning (ML) comes into play. Cisco has built a breakthrough learning platform, making use of an extremely rich data platform used to train the most advanced machine learning technologies.

This white paper explores how such a platform is used in the context of Cisco’s Digital Network Architecture (Cisco DNA). ML capability is now being introduced with Cisco DNA Center, bringing powerful ML capabilities – and more importantly, the key business insights that ML provides – within reach of networks and organizations worldwide.

This introductory white paper provides an overview of such a learning platform in the context of Cisco AI Network Analytics, a solution which is initially focused on the complex technologies and challenges inherent to wireless networks. Future revisions of this document will discuss other areas, such as switching, SD-WAN, and cross-domain, to name a few, as Cisco’s machine learning technologies are applied to these areas.

Motivations

As we begin to explore the uses and technologies provided by the Cisco AI Network Analytics solution, let’s start with a few common wireless challenges that will be immediately familiar to any network operator, manager, or architect: wireless joining/roaming times and per-user wireless application throughput.

Why Machine Learning (ML)?

What does "Learning" mean, in this context? "Learning" refers to the ability to train a mathematical (machine learning) model capable of understanding or modeling the behavior of a complex system, a given KPI, or a variable.

Figure 4. Determining the root cause of an issue

Last but not least, the Cisco AI Network Analytics platform is capable of relevance learning (RL). Based on simple user feedback about the relevance of the anomalies reported to the network manager, the platform adjusts itself (“auto-adjusts”) to raise (that is, bring to the user’s attention) the anomalies that are of most interest to that user.

Consider the use case where the system tries to predict a networking equipment failure: this is a situation where there is a ground truth (the equipment either failed or did not). Thus, the precision or accuracy of the prediction is easy to measure. In contrast, when raising an anomaly in a wireless network, an anomaly may be considered relevant by one user and irrelevant by another (there is no ground truth), making the notion of accuracy, in this case, subjective.

It is worth discussing the critical topic of relevancy. Anomaly detection (AD) refers to the capacity of a system to detect anomalies, where an anomaly is an outlier. Said differently, an event may be (rightly, mathematically) considered a real anomaly even though the user may not find it relevant from a user or networking standpoint. Consequently, in contrast to problems with ground truth, the notion of the accuracy of the system makes little sense.
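The source does not describe how relevance learning is implemented. Purely as an illustration of feedback-driven adjustment, the sketch below keeps a relevance estimate per anomaly type and nudges it toward 1 or 0 on each piece of user feedback. The class name, the update rule, and the `threshold` and `rate` parameters are all assumptions made for this example, not the platform's actual mechanism.

```python
class RelevanceFilter:
    """Hypothetical feedback-driven filter: one relevance estimate per anomaly type."""

    def __init__(self, threshold=0.5, rate=0.2):
        self.threshold = threshold  # minimum estimate needed to raise an anomaly (assumed)
        self.rate = rate            # how strongly one piece of feedback moves the estimate (assumed)
        self.weights = {}           # anomaly type -> relevance estimate in [0, 1]

    def feedback(self, anomaly_type, relevant):
        """Move the estimate toward 1.0 (relevant) or 0.0 (not relevant)."""
        current = self.weights.get(anomaly_type, 1.0)  # unseen types start as relevant
        target = 1.0 if relevant else 0.0
        self.weights[anomaly_type] = current + self.rate * (target - current)

    def should_raise(self, anomaly_type):
        """Raise only anomaly types the user still appears to care about."""
        return self.weights.get(anomaly_type, 1.0) >= self.threshold


relevance = RelevanceFilter()
for _ in range(4):
    relevance.feedback("guest-ssid-onboarding", relevant=False)
```

With these assumed defaults, an anomaly type stops being raised after four consecutive "not relevant" votes, while types that have received no feedback continue to be raised.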

For example, in the case of cognitive analytics, determining that an AP is offering significantly less throughput for a given class of device at regular times of day in a complex network (without any explicit and manual configuration) is extremely difficult using classic approaches. Finding a pattern to predict the actual user experience of a video call, taking into account the nature of the application, video codec parameters, the states of the network (data rate, RF, etc.), the current observed load on the network, and the destination being reached – to mention a few parameters – is simply impossible using existing so-called rules-based systems. By contrast, an ML approach, leveraging intelligent algorithms and the power of big data, can succeed at such tasks.

Yet another example where ML provides high value relates to the detection of “subtle” changes over time, which, after enough time has passed, may become a real anomaly. In this case, algorithms are used to detect that an access point (AP) provides a throughput that slowly but surely degrades over time compared to other APs in the network, alerting the network manager so that corrective action may be taken – possibly before any end user actually notices a problem!
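A minimal sketch of the peer-comparison idea, assuming weekly average throughput per AP is already available: fit a least-squares trend line per AP and flag APs whose trend falls well below the median trend of their peers. The function names, the `margin` parameter, and the sample values are hypothetical, and a plain linear trend is a stand-in here, not the algorithm the platform uses.

```python
from statistics import mean, median


def slope(values):
    """Least-squares slope of a series against its sample index (units per step)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    numerator = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    denominator = sum((i - x_bar) ** 2 for i in range(n))
    return numerator / denominator


def degrading_aps(weekly_throughput, margin=1.0):
    """Return APs whose trend is more than `margin` below the peer median trend."""
    slopes = {ap: slope(series) for ap, series in weekly_throughput.items()}
    peer_trend = median(slopes.values())
    return sorted(ap for ap, s in slopes.items() if s < peer_trend - margin)


# Hypothetical weekly average throughput (Mbps) for three peer APs.
weekly_throughput = {
    "ap-1": [50, 50, 51, 50, 49, 50],
    "ap-2": [48, 49, 48, 49, 48, 49],
    "ap-3": [50, 48, 46, 44, 42, 40],  # loses about 2 Mbps every week
}
print(degrading_aps(weekly_throughput))
```

Because each AP is compared with its peers rather than with a fixed number, a network-wide change (for example, a seasonal drop in load) does not by itself flag every AP.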

This is illustrated in Figure 5, where such a parameter is tracked over several weeks and months  and a deviation from the “normal” operation of the AP is highlighted, both by itself and compared with other, peer APs. In this diagram, each bubble represents an AP, the relative size of the bubbles indicates the number of clients served by that AP, and the line running across the various weeks tracks the AP’s progress and deviation over time, allowing the network manager to easily spot the issue and determine both impact and severity at a single glance.

Figure 5. Using machine learning to detect subtle degradation over time

This shows the power of ML in action – distilling a vast amount of data down to a clear set of insights from which actionable outcomes can be drawn, without dependence on static thresholds that can, and do, vary from one network deployment to another.

Which Machine Learning Algorithm Are You Using?

This is, without doubt, a very common question. Unfortunately, no single ML algorithm is capable of solving all (or even most) use cases (this fact is referred to as the “No Free Lunch” theorem in ML). Efficient ML approaches rely on a set of ML algorithms working together to achieve a desired outcome. Cisco AI Network Analytics is a learning platform that uses a collection of ML algorithms focused on providing key insights and outcomes not readily possible by other means.

Cisco AI Network Analytics focuses on use cases where an ML approach offers the only viable answer and where existing techniques fall short; specifically, high-dimensional problem spaces where patterns must be learned over time from vast data sets, an area in which ML has proven to be extremely powerful. For example, in the case of cognitive analytics, determining that an AP is offering significantly less throughput for a given class of device, considering a specific set of parameters, in a complex network (without any explicit and manual configuration) is extremely difficult with classic threshold-based approaches. By leveraging the power of machine learning, such insights and outcomes can rapidly be provided.

A Few Words About the Role of the Data Platform

In the world of ML, data is ultimately more important than the algorithm. Indeed, the volume of data used to feed the ML algorithm is a strong factor influencing the overall efficiency of the solution. However, sheer volume of data is not sufficient; diversity is also critical. Traffic and network characteristics vary drastically between networks, and sometimes even between areas of a given network. One cannot expect an ML model to make good predictions on data unlike any it has been trained on. Thus, it is imperative to build a data platform with high diversity.

Last, but not least, is the quality of the data. The paradigm under which models (such as a deep neural network) may be fed with “infinite” input variables, relying on some form of automatic selection and filtering, is far from being proven for most use cases. Feeding models with noise unavoidably leads to random output. The Cisco AI Network Analytics solution makes use of techniques to automatically ensure data quality.

Selecting a small number of relevant anomalies: Building an ML system capable of raising anomalies is not the toughest challenge. Many anomaly detection (AD) applications have been designed over the past two decades (statistical deviation, percentile regression, and auto-encoders, to name a few). Still, raising a large number of anomalies is likely to make the system unusable for the network operator. The number of anomalies raised should be limited and highly relevant. Cisco AI Network Analytics makes use of several advanced techniques to reach such an objective, combining an ML approach with Cisco’s 35 years of deep subject matter expertise in networking.
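To make the "few but relevant" goal concrete, here is a minimal sketch that uses the simplest technique named above, statistical deviation: each KPI's latest value is scored by how many standard deviations it sits from its own history, and only the top `k` scores above a floor are raised. The function name, the KPI names, the sample values, and the `k` and `min_score` parameters are hypothetical; the platform's actual selection techniques are not described in this paper.

```python
from statistics import mean, stdev


def top_anomalies(history, latest, k=2, min_score=3.0):
    """Return at most k KPI names, ordered by deviation from their own history."""
    scored = []
    for name, past in history.items():
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # a constant history gives no scale to measure deviation against
        score = abs(latest[name] - mu) / sigma
        if score >= min_score:
            scored.append((score, name))
    scored.sort(reverse=True)  # largest deviation first
    return [name for _, name in scored[:k]]


# Hypothetical KPI history and latest observations.
history = {
    "onboarding_time_s": [2.0, 2.2, 1.8, 2.1, 1.9],
    "throughput_mbps": [40, 42, 38, 41, 39],
    "dhcp_time_ms": [100, 110, 90, 105, 95],
}
latest = {"onboarding_time_s": 6.0, "throughput_mbps": 40, "dhcp_time_ms": 150}
print(top_anomalies(history, latest))
```

Capping the output at `k` items is what keeps the operator's list short. A deviation score alone says nothing about whether the operator finds the anomaly relevant.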

Architecture Overview

Data Flow

Figure 6 shows how the data flows across the main components of the platform:

  • Data collection
    • From various sources, data is collected and aggregated on-premises by the Network Data Platform (NDP) component within the Cisco DNA Center appliance.
  • Data validation and anonymization
    • Handled by the local ML agent, operating on-premises.
    • Data is sent to the cloud after being anonymized on-premises.
  • Data processing
    • Handled in the cloud, based on anonymized data.
    • Data is stored and fed into the machine learning models.
    • Insights are generated in the cloud and sent back to the on-premises agent.
  • Issues/insight visualization
    • Data received from the cloud is first de-anonymized.
    • Data is then displayed in the Cisco AI Network Analytics UI, which is fully and seamlessly integrated with Cisco DNA Center.

Data Sources

The Cisco AI Network Analytics platform is architected to feed its analytics with a wide variety of data sourced from across the network - not only the wireless components, but also routers, switches, management stations, application servers, RADIUS/DHCP servers, and more. During the initial phases, however, the focus of the data input into the Cisco AI Network Analytics solution will be data produced by Cisco Wireless LAN Controllers (WLCs).

The Cisco Wireless LAN Controller platform and version requirements are the same as for Cisco DNA Center:

  • Software version: AireOS 8.5.120.0 (8.5MR4+ recommended)
  • WLC models: 3504, 5520, 8510, 8540

Configuration requirements:

  • The Application Visibility and Control (AVC) feature must be enabled on the WLC/WLAN configuration to enable throughput-related use cases.
  • APs need to be assigned to a building floor in Cisco DNA Center to allow per-site data aggregation.

Data Anonymization

The security of customer data is of paramount importance. In order to ensure the quality of telemetry while at the same time guaranteeing privacy, personal and confidential data is anonymized, namely when it pertains to:

  • End-user identity (user name, device MAC address, etc.)
  • Device location (hostname, AP location string, etc.)
  • Network addresses (IPv4 / IPv6), including routing table information

Other nonsensitive data - such as the relative location of clients and APs, mobility of clients between APs and controllers, and similar variables - need to be kept intact in order to feed meaningful data to the Cisco AI Network Analytics engines.

The anonymization scheme used is based on strong encryption. Every piece of sensitive data is run through an AES-based encryption function using a strong key that is generated, managed, and archived by the customer on-premises (it is kept in the secure storage of Cisco DNA Center). The original data is replaced with the encrypted version before being sent to the Cisco AI Network Analytics cloud.

Once the output data (results, alerts, visualization, etc.) comes back from the platform cloud to be displayed in the Cisco DNA Center UI in the user’s browser, it is run by a local UI proxy through a de-anonymization process that decrypts the anonymized values and restores the original data as needed. The user is then presented with relevant names, IDs, addresses, and other such information, even though the analytics were done using anonymized versions of those items.
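The round trip described above (replace sensitive fields on-premises, analyze in the cloud, restore locally) can be sketched with the Python standard library. Note the substitution: the standard library has no AES, so this sketch uses a keyed HMAC-SHA256 token plus a local reverse-lookup table as a stand-in for the AES-based encryption and decryption the paper describes. The field names, class name, and token format are assumptions made for illustration only.

```python
import hashlib
import hmac

# Assumed field names; the paper lists categories (identity, location, addresses), not a schema.
SENSITIVE_FIELDS = {"username", "client_mac", "ap_location", "client_ip"}


class LocalAnonymizer:
    """Stand-in for the on-premises agent: keyed tokens out, local lookup back."""

    def __init__(self, key: bytes):
        self._key = key     # stays on-premises, like the customer-held key in the paper
        self._reverse = {}  # token -> original value; never sent to the cloud

    def _token(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = "anon-" + digest[:16]  # truncated for readability in this sketch
        self._reverse[token] = value
        return token

    def anonymize(self, record: dict) -> dict:
        """Replace sensitive fields; leave nonsensitive measurements intact."""
        return {k: self._token(v) if k in SENSITIVE_FIELDS else v for k, v in record.items()}

    def deanonymize(self, record: dict) -> dict:
        """Restore original values for any known token; pass everything else through."""
        return {k: self._reverse.get(v, v) if isinstance(v, str) else v
                for k, v in record.items()}


anon = LocalAnonymizer(key=b"example-key-kept-on-premises")
record = {"client_mac": "aa:bb:cc:dd:ee:ff", "username": "alice", "rssi_dbm": -61}
sent_to_cloud = anon.anonymize(record)
restored = anon.deanonymize(sent_to_cloud)
```

The tokens are deterministic, so the same client maps to the same token in every record. This is what lets analysis on anonymized data still follow a client as it moves between APs.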

Deployment Model

Cisco AI Network Analytics is part of the Cisco DNA Advantage software license for Cisco DNA Center. It is provided as an additional component that seamlessly blends in with the Cisco DNA Assurance user interface. The solution provides advanced ML-generated insights and issues, along with the visualization tools required for analysing, troubleshooting, and reacting to the issues raised by the ML engines.

Deploying Cisco AI Network Analytics is very straightforward, thanks to Cisco DNA infrastructure, and simply requires a working instance of Cisco DNA Center (which runs in an appliance form factor) as well as HTTPS connectivity to the Cisco AI Network Analytics cloud. All data is mapped, processed, and anonymized before being sent to the cloud; results and insights are returned by the Cisco AI Network Analytics cloud services and are displayed after de-anonymization, directly in the Cisco DNA Assurance UI on Cisco DNA Center.

Communication and Authentication to the Cisco AI Network Analytics Cloud

All communications to the Cisco AI Network Analytics cloud (hosted on AWS) are secured using Transport Layer Security (TLS) 1.2 with strong encryption. Mutual authentication between the Cisco AI Network Analytics agent on Cisco DNA Center and the associated cloud services is ensured through the use of certificates generated and managed by Cisco. The initial enrollment with the Cisco AI Network Analytics cloud certificate authority is authenticated by a customer ID and a secure onboarding key (acting as a one-time enrollment password).

All connections to the Cisco AI Network Analytics cloud are outbound on TCP port 443; no inbound connections are required (that is, the Cisco AI Network Analytics cloud will not be initiating TCP flows toward Cisco DNA Center). A list of cloud server fully qualified domain names (FQDNs) and IP addresses is provided in order to configure firewalls accordingly (please refer to the Installation Overview > Proxy Configuration section in the Cisco AI Network Analytics documentation for current information). Cisco DNA Center must also be able to perform DNS lookups for the cloud server addresses.

Connections to Cisco AI Network Analytics cloud may also go through a proxy (explicit or transparent), if required. The proxy server setting, if any, is inherited from Cisco DNA Center.
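When validating firewall and DNS settings for these requirements, a small script can help. This is a minimal sketch using Python's standard `ssl` and `socket` modules: it resolves a hostname, opens an outbound connection on TCP port 443, and refuses anything older than TLS 1.2. The function names are hypothetical, and the host to test must come from the FQDN list in the product documentation. The sketch verifies only the server certificate (it does not present the client certificate used for mutual authentication) and does not handle proxies.

```python
import socket
import ssl


def build_tls_context() -> ssl.SSLContext:
    """Client-side context that verifies the server and requires TLS 1.2 or newer."""
    context = ssl.create_default_context()  # certificate and hostname checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def check_cloud_reachability(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Resolve `host`, connect outbound over TLS, and return the negotiated version."""
    socket.getaddrinfo(host, port)  # raises socket.gaierror if the DNS lookup fails
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with build_tls_context().wrap_socket(raw, server_hostname=host) as tls:
            return tls.version()
```

Calling `check_cloud_reachability(...)` with one of the documented FQDNs returns the negotiated protocol version as a string. A DNS failure, a blocked port, or a certificate problem surfaces as an exception instead.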

Scope of Data Collection

The following categories of information will be collected from the Cisco Wireless LAN Controllers involved:

  • Client: MAC and IP address, VLAN, association / connection AP / state / time, CCX, device / OS type, RF rate / RSSI/SNR, and ACLs
  • AP: MAC and IP addresses, WLC, state, SN / model/location, RF channel / stats / interference, neighbors, client count, and RF quality/power
  • Application-level stats, such as those provided by Cisco AVC (Application Visibility and Control)
  • System: CPU and memory usage, and client/AP count
  • RF: interferer and rogue AP/RSSI/SNR / channels, and SSID
  • DHCP: server IP address, counters, and statistics
  • RADIUS authentication and authorization: server IP address, counters, and statistics
  • Client events (state changes during association, roaming, etc.)
  • AP/RRM events (Radio Resource Management events)

Conclusion

This paper provides a short overview of Cisco AI Network Analytics – a breakthrough cloud-based ML/AI platform for the network, with a specific focus on the initial components of the ML platform dedicated to use with the wireless network.

The most advanced ML/AI techniques and algorithms have been developed by Cisco for this solution to support cognitive and predictive analytics. The system learns ranges of normality for a number of variables, detects relevant anomalies, and adjusts continuously to user feedback. Other functions of the solution detect long-term anomalies, such as subtle changes over a long period of time, and use ML algorithms to compare network element performance metrics, or even to compare the organization’s network performance with that of comparable peers.

The ML models employed are trained with a vast amount of data of high diversity and quality, a must-have for any ML-based platform. The solution also leverages strong data anonymization techniques with built-in security to ensure the appropriate level of handling for sensitive customer data.

In short, the Cisco AI Network Analytics solution gives network operators, managers, and architects key network insights that drive actionable outcomes, leveraging the power of machine learning to tackle problems that were not previously tractable. Note that the output of such a system may also be used by a closed-loop control system for automation.

Initially, the Cisco AI Network Analytics solution (an instantiation of a broader learning platform) is focused on wireless use cases, as these are some of the most difficult-to-troubleshoot and difficult-to-analyze areas of the network – and yet, at the same time, some of the most critical areas for modern organizations as they transition to a wireless-first future. 

Over time, further revisions of this document will extend to other use cases, such as switching, SD-WAN and cross-domain, to name a few, as the solution continues to evolve and grow.