What You Will Learn
Application, security, and infrastructure architects face two significant, interdependent challenges: the need to increase the speed of application deployment and the associated infrastructure changes, and the need to protect those applications and their business data against a rapidly evolving set of security threats.
In response to these two large-scale changes, infrastructure architects are moving to intent-based cloud architectures in which business and application policy is clearly defined in a logical manner and then implemented on some combination of on-premises and off-premises infrastructure. A formal logical description of the business and application security policy allows application, security, and network administrators to focus on building and maintaining the governance rules and policies that the business requires, and lets the infrastructure tools implement the detailed configuration of the various devices.
Multiple mechanisms are being used to help with this transition, ranging from configuration management tools such as Puppet, Ansible, and Chef, through IaaS platforms such as OpenStack, to fuller-featured cloud and application programming interface (API) driven platforms such as Amazon Web Services and Cisco® Application Centric Infrastructure. However, a key question remains:
How do you verify that the configuration of the infrastructure which has been performed by an automation system (or prior to automation, through manual processes) truly represents the intent defined at the logical layer? Is network traffic being forwarded correctly according to application policy? Have application components been correctly deployed together? Are security rules being programmed and obeyed? How do you migrate or make changes to new or existing application policy?
This problem is not a new one. However, in the modern data center, with its rapid rates of change and need for much tighter security, questions like these are almost impossible to answer without the advancements offered by a modern machine learning and analytics platform.
In this paper, you will learn:
● Why policy compliance is critical to a responsive and secure data center
● The importance of baselining applications
● How Cisco Tetration Analytics™ application insight is used to generate an application policy baseline
● How what-if simulations can help you model and test:
◦ Migration of applications locally, remotely, and to the cloud
◦ New application policy adjustments
◦ Movement to whitelist-based security enforcement
◦ Migration to the Cisco® Application Centric Infrastructure (Cisco ACI™) platform
● How to monitor live traffic in nearly real time to detect:
◦ Permitted traffic
◦ Mistakenly dropped traffic
◦ Escaped traffic
◦ Rejected traffic
Policy Compliance Use Cases
Automation has given the network operator a completely new level of flexibility in deploying cost-effective, scalable, and maintainable data centers. However, with this approach, the operator loses a direct relationship with the underlying device hardware. Given the abstraction that comes with most automation, and given that configuration is in many cases generated by controllers (and may not even be human readable), how do administrators verify that their requirements have truly been programmed in the hardware? Without an automated method to validate what is actually happening, only the illusion of security may be in place, with vulnerabilities surfacing many months later, after a critical issue occurs.
However, large networks that are not explicitly policy based (that is, standalone network architectures) face the same issue. In this case, though, the responsibility for rendering the business and security policy (there is always a logical policy that a data center attempts to follow) is in the hands of the network team. This team usually must manually maintain a distributed set of distinct configuration files to deliver the desired end-to-end application policy for the network. The need to validate that the configuration is working correctly and that the security policies are accurate is the same.
Figure 1 shows both approaches.
Figure 1. The Same Business Intent Rendered Using a Controller-Based Approach and Using an Operator-Guided Approach
The term “policy” at its most basic level in this context describes the intent of the network operator. Controllers such as the Cisco Application Policy Infrastructure Controller (APIC) merely capture that intent in a formal manner and then automate the process of rendering (applying) the configuration on the necessary hardware or software devices. Data centers have been built on policy since their creation, although the term wasn’t necessarily used. In a traditional model, the operator is charged with both capturing the policy intent and rendering it on devices (by building the appropriate configuration files for each device and applying them one by one).
No matter how policy is rendered (whether using a controller or not), formally proving that the end-to-end intent has been achieved is a difficult task. Yet without validation, silent gaps in policy may be present throughout the network (Figure 2).
Figure 2. A Conceptual Representation of Logical Policy (Intent) and Security Gaps That Can Appear During the Rendering Process: For Example, a Misconfigured Access Control List Might Open Too Many Ports
Furthermore, how can the cascading implications of modifying a small piece of policy be measured? Particularly in a whitelist-based network security approach, validating the accuracy of any policy is vital. Without validation, after you deploy the new policy you may discover that critical parts of an application have become unreachable. Remediation of the problem may require a complete rollback of any changes, costing expensive engineering time, creating the illusion of rigidity of the network, and causing the application team to lose confidence that the network can respond to necessary changes in a timely manner.
Comparing the forwarding results of two policy iterations makes deployment of changes predictable and verifiable. In the example in Figure 3, HTTPS traffic continues to be permitted (as expected, shown by a blue arrow directed toward the host). Secure Shell (SSH) traffic becomes allowed in the new policy iteration (a potential regression in the logical security policy, represented by the red arrow), and HTTP traffic erroneously becomes denied, causing critical parts of the application to become unreachable (represented by the yellow arrow).
Figure 3. Sample Comparison of Current and Proposed Policy for a Host Serving a Web Application
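The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only, not the platform's implementation: the `Policy` type, the `compare_policies` function, and the flow keys are hypothetical names chosen for this example, and a flow is reduced to a protocol and destination port for brevity.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: a flow is (protocol, destination port) and a policy
# maps each explicitly listed flow to an allow (True) or deny (False) result.
FlowKey = Tuple[str, int]
Policy = Dict[FlowKey, bool]


def decision(policy: Policy, flow: FlowKey) -> bool:
    """Whitelist semantics: a flow that is not listed is denied."""
    return policy.get(flow, False)


def compare_policies(current: Policy, proposed: Policy,
                     flows: Iterable[FlowKey]) -> Dict[FlowKey, str]:
    """Label each flow by how its forwarding result changes between iterations."""
    report = {}
    for flow in flows:
        before = decision(current, flow)
        after = decision(proposed, flow)
        if before == after:
            report[flow] = "unchanged"
        elif after:
            report[flow] = "newly permitted"  # possible security regression
        else:
            report[flow] = "newly denied"     # possible application outage
    return report


# Mirrors the Figure 3 scenario: HTTPS stays, SSH opens, HTTP is lost.
current = {("tcp", 443): True, ("tcp", 80): True}
proposed = {("tcp", 443): True, ("tcp", 22): True}
report = compare_policies(current, proposed,
                          [("tcp", 443), ("tcp", 22), ("tcp", 80)])
```

Only the flows labeled "newly permitted" or "newly denied" need review before the proposed policy is deployed.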
Until now, addressing this problem has been nearly impossible because the size and complexity of the modern data center have grown to such a scale that humans cannot feasibly keep track of where legitimate workloads exist while at the same time tracking the complete set of communication patterns that applications should exhibit under normal operating conditions. This problem is exacerbated as the application base starts to shift from client-server, bare-metal, and virtual machine based applications to a scale-out, container-based model (Figure 4).
Figure 4. Sample Application Dependency Graph for a Simple Three-Tier Web Application: Even at a Small Scale, the Dependency Graph Can Quickly Become Complex
The Cisco Tetration Analytics platform offers a completely new analytics-based architecture designed from the foundation to solve this problem.
Finding the Flows That Matter
Imagine that you were asked to find a needle in a haystack. If you were given the option to have the needle removed and placed to the side of the haystack, would you take it? Under usual circumstances, any reasonable person would.
But how do you know what is the haystack and what is the needle? What if what you want to find isn’t a needle, but something never studied before?
By intimately understanding the known features of the haystack, right down to the level of each stalk of hay, you can immediately identify the needle as something obviously different.
The same concept can be applied to policy compliance. By understanding the desired policy at a very detailed level, you can easily detect any deviation from the policy, even without having previously studied or even understanding the characteristics of the deviation (Figure 5).
Figure 5. With Cisco Tetration Analytics Analysis, Compliant Flows Are Easily Separated from Noncompliant Flows
This new approach reduces response times to minutes, giving your organization the opportunity to detect and remediate problems faster than ever before, helping avoid application downtime, keeping user data secure, and avoiding the costs that result from unpermitted network activity.
Applying Application Insight
Cisco Tetration Analytics application insight (for more information, see the references listed at the end of this document) detects the exact application policy that needs to be in place for an application to function correctly and securely. This application policy can be exported and then imported into any number of different controllers because of its open format.
Application insight uses unsupervised machine learning to generate accurate clustering of similar endpoints, along with the whitelist policies needed for secure application communication. It “learns the haystack.”
Whitelisting is considered by some, such as the Australian government information security department, to be the number-one mitigation strategy to protect against targeted cyber intrusions (Australian Government, 2014). However, whitelists are expensive to implement and difficult to maintain. Whitelisting is the polar opposite of the more traditional blacklisting security method: communication between endpoints is explicitly permitted on an as-needed basis, and all other traffic is denied by default.
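The difference between the two models can be shown with a short Python sketch. The function names and the rule tuples are hypothetical and exist only for this illustration; the point is how each model treats a flow that nobody anticipated.

```python
# Hypothetical rule sets: each rule is (source cluster, destination cluster, port).
whitelist = {("web", "app", 8080), ("app", "db", 3306)}
blacklist = {("web", "db", 3306)}


def whitelist_allows(rules, src, dst, port):
    """Whitelist: permit only what is explicitly listed; deny everything else."""
    return (src, dst, port) in rules


def blacklist_allows(rules, src, dst, port):
    """Blacklist: deny only what is explicitly listed; permit everything else."""
    return (src, dst, port) not in rules


# A flow that appears in neither rule set.
unknown_flow = ("web", "db", 22)
whitelist_result = whitelist_allows(whitelist, *unknown_flow)
blacklist_result = blacklist_allows(blacklist, *unknown_flow)
```

Under the whitelist the unanticipated flow is denied by default, whereas under the blacklist it is permitted until someone adds a rule for it.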
Application policy recommendations from application insight can then easily be published in the policy compliance pipeline for historical simulation and live analysis. The application policy can be published in a number of open formats, including JSON, XML, and YAML. The exported policy contains definitions of discovered clusters and the communication policies required between respective clusters. Because of the open and generic nature of the policy, it is compatible with almost any network infrastructure.
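As an illustration of how an exported policy might be consumed, the following Python sketch loads a JSON document and answers whether a flow between two endpoints is allowed. The schema shown here (the `clusters`, `endpoints`, `policies`, `consumer`, and `provider` keys) is an assumption made for this example and is not the platform's documented export format.

```python
import json

# Hypothetical export: discovered clusters plus the policies between them.
exported = """
{
  "clusters": [
    {"name": "web", "endpoints": ["10.0.1.10", "10.0.1.11"]},
    {"name": "app", "endpoints": ["10.0.2.20"]}
  ],
  "policies": [
    {"consumer": "web", "provider": "app", "protocol": "tcp", "port": 8080}
  ]
}
"""

doc = json.loads(exported)

# Index endpoints by cluster so that a flow can be resolved to cluster names.
endpoint_to_cluster = {
    ip: cluster["name"]
    for cluster in doc["clusters"]
    for ip in cluster["endpoints"]
}

# Whitelist of (consumer, provider, protocol, port) tuples.
allowed = {
    (p["consumer"], p["provider"], p["protocol"], p["port"])
    for p in doc["policies"]
}


def is_allowed(src_ip, dst_ip, protocol, port):
    """Return True only if both endpoints are known and a policy matches."""
    src = endpoint_to_cluster.get(src_ip)
    dst = endpoint_to_cluster.get(dst_ip)
    if src is None or dst is None:
        return False
    return (src, dst, protocol, port) in allowed
```

Because the document is plain JSON, the same data could be translated into whatever form a given controller or device expects.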
Testing the Future by Simulating the Past
A good way to plan for the future is to study the past. Much time and money is spent in many data centers just keeping applications running, and many data centers lack the flexibility to respond to business needs quickly. One of the main reasons that data centers can’t respond to new needs quickly is that testing any proposed change is difficult, if not impossible. Projects are thus completed slowly and rolled out over an extended period of time.
Because Cisco Tetration Analytics has a complete archive of every flow, however, it can simulate the implications of any new policy change.
At a technical level, Cisco Tetration Analytics builds a virtual policy lookup table (VPLT) based on the rules discovered by application insight. Historical traffic is then replayed against the lookup table, and the application policy decision is monitored. This approach closely mimics the behavior that would be observed if the same flows were generated by production workloads and enforced by a network device (Figure 6).
Figure 6. VPLT Detailing the Forwarding Decision for Several Sample Flows
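The replay idea can be approximated with a small Python sketch. It is a simplification under stated assumptions: the `Flow` record, `build_vplt`, and `replay` are hypothetical names, flows are already resolved to cluster names, and the lookup table is modeled as a plain set rather than whatever structure the platform uses internally.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Flow(NamedTuple):
    """Hypothetical historical flow record, already resolved to clusters."""
    src: str
    dst: str
    protocol: str
    port: int


def build_vplt(rules: Iterable[dict]) -> Set[Tuple[str, str, str, int]]:
    """Build a lookup table of explicitly permitted (src, dst, protocol, port)."""
    return {(r["src"], r["dst"], r["protocol"], r["port"]) for r in rules}


def replay(vplt: Set[Tuple[str, str, str, int]],
           history: Iterable[Flow]) -> List[Tuple[Flow, str]]:
    """Replay archived flows against the table and record each decision."""
    results = []
    for flow in history:
        verdict = "permit" if tuple(flow) in vplt else "deny"
        results.append((flow, verdict))
    return results


rules = [{"src": "web", "dst": "app", "protocol": "tcp", "port": 8080}]
history = [
    Flow("web", "app", "tcp", 8080),  # matches the discovered rule
    Flow("web", "db", "tcp", 3306),   # no rule, so denied by default
]
results = replay(build_vplt(rules), history)
```

No production traffic is touched in this kind of test: the decisions are computed purely from archived flow records.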
The user is then presented with a time-series graph characterizing the results of the simulation (Figure 7). Permitted traffic can be removed from the graph, bringing to the attention of the operator mistakenly dropped, escaped, and rejected traffic. These are the flows that will have a direct business impact if they are not addressed. They might indicate necessary components that are unable to talk to each other, potential security holes in the policy rule sets, or compromised hosts that are generating malicious traffic.
Figure 7. Sample Application Policy Analysis Results Graph
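A time series like the one in Figure 7 can be thought of as per-interval counts of each traffic category. The Python sketch below shows one way to compute such counts while hiding permitted traffic. The `time_series` function, its parameters, and the record layout are hypothetical and are used only for illustration.

```python
from collections import Counter, defaultdict


def time_series(records, bucket_seconds=60, hide=("permitted",)):
    """Count flows per category in fixed time buckets, skipping hidden categories.

    records: iterable of (timestamp_in_seconds, category) pairs.
    Returns {bucket_start: {category: count}} ordered by bucket start.
    """
    series = defaultdict(Counter)
    for timestamp, category in records:
        if category in hide:
            continue  # permitted traffic is background noise here
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        series[bucket_start][category] += 1
    return {start: dict(counts) for start, counts in sorted(series.items())}


records = [
    (0, "permitted"),
    (5, "escaped"),
    (59, "escaped"),
    (61, "rejected"),
    (125, "mistakenly dropped"),
]
series = time_series(records)
```

With permitted traffic filtered out, each remaining bucket contains only the flows that call for the operator's attention.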
Any noncompliant flows that are identified as permitted (“part of the haystack”) by the operator can be instantly sent back to the application insight feature to tailor the application policy and improve the machine learning algorithms (Figure 8).
Figure 8. Instantly Tailoring the Application Insight Policy After Analysis
Tests can be run after any policy change, whether small or large, enabling data center operators to rapidly make changes, test them, and deploy them with a new level of confidence.
The Cisco Tetration Analytics platform generates and tests the policies required to operate and secure your network. However, it does not enforce the application policy. A number of different technologies are available that can use policy to enforce network security.
Although Cisco Tetration Analytics is not intrinsically linked with Cisco ACI, the policy that Cisco Tetration Analytics generates can be imported into an application infrastructure controller with the click of a button, using the open-source Cisco ACI toolkit as a simple middleware component. In practice, applications and clusters will be configured as application profiles and endpoint groups (EPGs) with communication policies built as contracts in Cisco ACI.
In deployments other than those using Cisco ACI, for example, deployments using Cisco Nexus® 9000 Series standalone switches, the policy can be rendered using access control lists and other configuration options, either manually by the network operator or using a network management tool (Figure 9).
Figure 9. A Policy Feedback Loop Is Easily Implemented Using the Cisco APIC and Tetration Analytics
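For standalone deployments, rendering the policy amounts to expanding cluster-level rules into per-endpoint permit entries followed by a default deny. The Python sketch below illustrates that expansion. The `render_acl` function and its input layout are hypothetical, and the output is generic ACL-style text written for illustration; it is not the exact syntax of any particular switch or firewall and should be adapted to the target device.

```python
def render_acl(name, rules, clusters):
    """Expand cluster-to-cluster rules into generic ACL-style text lines.

    rules: list of dicts with consumer, provider, protocol, and port keys.
    clusters: mapping of cluster name to a list of endpoint addresses.
    """
    lines = [f"access-list {name}"]
    for rule in rules:
        for src in clusters[rule["consumer"]]:
            for dst in clusters[rule["provider"]]:
                lines.append(
                    f"  permit {rule['protocol']} host {src} host {dst} "
                    f"eq {rule['port']}"
                )
    # Whitelist model: anything not explicitly permitted is denied.
    lines.append("  deny ip any any")
    return lines


clusters = {"web": ["10.0.1.10", "10.0.1.11"], "app": ["10.0.2.20"]}
rules = [{"consumer": "web", "provider": "app", "protocol": "tcp", "port": 8080}]
acl = render_acl("web-to-app", rules, clusters)
```

One cluster-level rule expands into one entry per endpoint pair, which is why manually maintained configurations grow quickly as clusters scale.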
You can also use the Cisco Tetration Analytics platform to construct the filter rules in firewalls or in any other network devices that can restrict network access at Layer 4. The platform does not preclude generating policy for third-party vendor switches, routers, firewalls, and security appliances, nor does it require Cisco Nexus 9000 Series Switches to collect telemetry information, reinforcing the open nature of the platform.
Focusing on the Problems
Now, after simulating policy compliance, you can begin to monitor the network in nearly real time for compliance. Telemetry information is streamed to the analytics cluster at sub-second intervals and analyzed within 15 minutes of ingestion, most commonly in less than 5 minutes. This extremely low end-to-end lag time gives data center owners a real opportunity to actively manage and remediate any network problems (Figure 10).
Figure 10. Policy Analysis Timeline: From Capturing Telemetry Information to Results
Again, Cisco Tetration Analytics builds a VPLT with four buckets used to classify traffic: permitted, mistakenly dropped, escaped, and rejected. Instead of historical simulation (where you are not looking at the results of the actual network), here the policy or forwarding decision of the real network is compared to the real intent of the policy as represented by the VPLT (Figure 11).
Now traffic permitted by the network can be compared against the VPLT. Do the results match? If they do, then this is normal application traffic. If they don’t match, then you need to quickly understand why.
Figure 11. Policy Allow Combined with Network Allow Indicates a Permitted Flow: Whether You Are Troubleshooting or Looking for Anomalous Flows, Permitted Traffic Is Unnecessary Background Noise
If the network should have permitted traffic but in fact did not, you are observing a mistakenly dropped flow (Figure 12).
Figure 12. Policy Allow Combined with Network Deny or Drop Indicates a Mistakenly Dropped Flow
The next critical question that Cisco Tetration Analytics answers is: Why did the drop occur?
Drops occur for two main reasons: application policy enforcement errors and forwarding failures. The exact reason (disposition) is reported by the hardware sensor in the Cisco Nexus 9000 Series Switch and made available to the operator.
If traffic was dropped as a result of a policy enforcement error, the rendered policy may be catching and denying legitimate traffic flows. This scenario may indicate that parts of your application are unable to connect to each other when they should. If traffic was dropped as a result of a forwarding failure, the application is affected because the network is unable to forward packets across the network, a problem requiring investigation by the operations team.
If the network should have denied traffic but did not, an escaped flow is now present on the network (Figure 13).
Figure 13. Policy Deny Combined with Network Allow Indicates an Escaped Flow
An escaped flow is potentially dangerous, and at the very least, it is an anomalous flow. It should be addressed immediately. It could indicate a configuration error such as a Cisco ACI contract that opened too many ports or an unintended gap in the logical security policy. It could also indicate a malicious actor forcibly removing or circumventing network enforcement, activity that cannot be hidden from the Cisco Tetration Analytics sensors.
If the network should, and does, drop a flow, a rejected flow has been successfully denied. This behavior should reassure the data center team that its policy intent is being correctly rendered by all the devices. However, it still warrants careful inspection by all teams because it could be the activity of a compromised host that is attempting to spread, or a malicious actor that is attempting to access resources to which it should not have access (Figure 14).
Figure 14. Policy Deny Combined with Network Deny Indicates a Successfully Rejected Flow
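The four buckets shown in Figures 11 through 14 form a simple two-by-two decision table: the policy intent (as represented by the VPLT) on one axis and the observed network behavior on the other. The Python sketch below expresses that table. The `classify` function and the sample flow labels are hypothetical names used only for this illustration.

```python
def classify(policy_allows: bool, network_allows: bool) -> str:
    """Combine the intended decision with the observed network decision."""
    if policy_allows and network_allows:
        return "permitted"           # intent and network agree: normal traffic
    if policy_allows and not network_allows:
        return "mistakenly dropped"  # legitimate traffic was blocked
    if not policy_allows and network_allows:
        return "escaped"             # traffic got through that should not have
    return "rejected"                # intent and network agree: denied


# Each observation: (label, policy allows?, network allowed?)
observations = [
    ("web->app:8080", True, True),
    ("app->db:3306", True, False),
    ("web->db:22", False, True),
    ("guest->db:3306", False, False),
]

labels = {name: classify(policy, network) for name, policy, network in observations}

# Permitted traffic is background noise; everything else deserves a look.
needs_attention = {name: label for name, label in labels.items()
                   if label != "permitted"}
```

Filtering on the label leaves the mistakenly dropped, escaped, and rejected flows for the operator to investigate.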
Cisco Tetration Analytics Policy Compliance in Action
To understand how Cisco Tetration Analytics Policy Compliance works, step through the process:
1. Publish policies from application insight into policy compliance.
2. To simulate the application policy discovered by application insight against historical flows using the virtual policy lookup table, select the policy group that you want to test and the time period over which to test it.
3. Check the time-series graph for permitted, mistakenly dropped, escaped, and rejected traffic.
4. Make any adjustments to the application insight policy directly from the policy compliance tool.
5. Monitor traffic in nearly real time for compliance with the application insight policy.
6. Drill down from the time-series graph into permitted, mistakenly dropped, escaped, and rejected traffic.
If knowledge equals power, the Cisco Tetration Analytics platform gives you a new level of power by giving you previously unobtainable knowledge about your data center. You gain the power to run your data center at an entirely new level: to respond quickly to required changes and to changing network conditions, while becoming more secure each day as additional application policy is discovered, tested, applied, and verified.
For More Information
● Cisco Tetration Analytics: http://www.cisco.com/c/en/us/products/data-center-analytics/tetration-analytics/index.html
● Behavior-Based Application Insight: Understand What’s Running in Your Data Center
● Cisco Advantage Series Next Generation Data Center Flow Telemetry