Secure Network Infrastructure: Protect Video over IP Services
Purpose of Document
Designing a secure IP/MPLS infrastructure to defend against complex threats and malicious attacks, which continue to change in behavior and characteristics, is more critical today than ever. This document uses a design model based on existing technologies within an IP/MPLS infrastructure; the model takes advantage of these technologies to reduce risk and mitigate threats targeted at the infrastructure.
Service availability, reliability, and quality are critical attributes that service providers must protect when deploying video technologies over IP/MPLS. A security baseline implemented within an IP/MPLS infrastructure protects these attributes by:
Service Provider Security Challenges and Requirements
The primary challenge faced by today's service providers is maintaining service predictability in the presence of an outbreak of malicious traffic sourced from multiple endpoints spread across multiple network boundaries. In today's terms, this type of behavior has been identified with threats such as distributed denial of service (DDoS) attacks, turbo worms, e-mail spam, and viruses. The amount of traffic generated by an outbreak has the capability of disrupting the normal operation of an IP/MPLS network and adds risk to the supporting devices routing and switching packets.
Because emerging threats now scale to multi-gigabit rates of traffic, service provider requirements have shifted from using standalone security appliances to requiring that security be integrated into the network infrastructure. Integrating security within the network infrastructure provides the following advantages:
By integrating security functionality within the network infrastructure, the same operational tools used to manage data services are also used to support security operations. Common tools such as Authentication/Authorization/Accounting (AAA) services, SNMP, SYSLOG, routing protocols, device counters, and packet analysis tools enforce and monitor the security policies required to ensure reliable operation. As the network devices become more intelligent, operational processes become more proactive in identifying and mitigating threats to key video attributes.
Business Control Framework
Security is the foundation of internetworking's future; we have moved from an Internet of implicit trust to an Internet of pervasive distrust. The Business Control Framework (BCF) model takes the functionality of a router and merges this functionality into a pervasive policy enforcement model.
BCF Design Model
The BCF design model is used to implement security networkwide without relying on any single technology. Multiple technologies and features are used networkwide to ensure two crucial design goals:
The BCF model uses six technology categories (Figure 2) to identify a security baseline to mitigate risk to video services over an IP infrastructure (availability, reliability and quality).
Threat Vectors Mapped to IP Planes of Operation
IP networks can be categorized into three planes of operation:
Each plane of operation must be properly secured and monitored (Control + Visibility) to ensure the reliable operation of the network. Figure 1 uses a matrix to map threat vectors to the three planes of operation. The areas of focus for this document are peering/interconnect, core, and Layer 3 aggregation.
Figure 1. Threat Vectors Mapped to Network Roles
BCF Technology to Mitigate Threat Vectors
Figure 2 illustrates the visual concept of BCF and how it relates to multiple technology categories:
Figure 2. BCF Pillars of Control and Visibility
BCF Trust Boundaries
Trust boundaries should be extended within the IP infrastructure across devices under control of the service provider. Networks and devices outside of the control of the service provider will be regarded as being untrusted.
Untrusted networks (external sources) include:
Secure Core and Edge Network Resources
Protect the Data Plane
Infrastructure Access Control Lists (iACLs)
In an effort to protect routers from various risks—both accidental and malicious—infrastructure protection ACLs should be deployed at network ingress points. These ACLs deny access from external sources to all infrastructure addresses such as router interfaces, while simultaneously permitting routine transit traffic to flow uninterrupted.
In normal operations, the vast majority of traffic simply flows through a router en route to its ultimate destination.
By denying access to routers from external sources, many of the external risks associated with direct router attack are mitigated. Furthermore, infrastructure ACLs help enforce security policy by permitting only explicitly authorized IP addresses and protocols to enter the network from untrusted networks.
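The filtering behavior described above can be sketched as a first-match rule list. This is a minimal Python illustration, not a device configuration: the `Rule` type, the `evaluate` function, and every address (all taken from documentation prefixes) are assumptions made for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                  # "permit" or "deny"
    src: str                     # source prefix
    dst: str                     # destination prefix
    proto: str = "any"           # "tcp", "udp", "icmp", or "any"
    dport: Optional[int] = None  # None matches any destination port


# Hypothetical policy; 192.0.2.0/24 stands in for the infrastructure block.
IACL = [
    # Explicitly authorized: BGP from a known external peer to one router.
    Rule("permit", "198.51.100.1/32", "192.0.2.1/32", "tcp", 179),
    # Everything else aimed at infrastructure addresses is denied.
    Rule("deny", "0.0.0.0/0", "192.0.2.0/24"),
    # Transit traffic flows uninterrupted.
    Rule("permit", "0.0.0.0/0", "0.0.0.0/0"),
]


def evaluate(src, dst, proto, dport=None, rules=IACL):
    """Return the action of the first matching rule (implicit deny at the end)."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    for rule in rules:
        if (s in ipaddress.ip_network(rule.src)
                and d in ipaddress.ip_network(rule.dst)
                and rule.proto in ("any", proto)
                and rule.dport in (None, dport)):
            return rule.action
    return "deny"
```

Ordering matters in a first-match list: the narrow permit for the authorized peer has to precede the broad deny that covers the infrastructure block.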
In general, an infrastructure ACL is composed of four sections:
Deny Forged IP Packets
Some threats/attacks use a technique to forge or spoof the source IP address of packets to evade security mechanisms. The packet may be forged with a source IP address belonging to the trusted network to gain access to a device. In another case, a customer may attempt to spoof the IP address of another customer to steal service or to hide his or her identity while attempting to launch a DoS attack at the infrastructure.
Infrastructure ACLs define policy for allowed packets entering the network from external sources, but anti-spoofing identifies valid source IP addresses. Infrastructure ACLs should include entries to deny the following IP address ranges from entering the network:
A bogon prefix is a route that should never appear in the Internet routing table. A packet routed over the public Internet (not including over VPN or other tunnels) should never have a source address in a bogon range. These are commonly found as the source addresses of DDoS attacks. To get a current list of unused IP prefixes which should not be routed from the Internet, use the tools at http://www.cymru.com/Documents/bogon-list.html.
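As a rough illustration of a bogon source check, the sketch below tests a source address against a small sample of well-known reserved IPv4 ranges. The list is deliberately incomplete and only illustrative; a real filter would be generated from a current bogon list such as the one referenced above. The names `BOGON_SAMPLE` and `is_bogon_source` are hypothetical.

```python
import ipaddress

# Illustrative subset only (assumption): a production filter should be built
# from a current, maintained bogon list.
BOGON_SAMPLE = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8",       # "this network"
    "10.0.0.0/8",      # private (RFC 1918)
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private (RFC 1918)
    "192.168.0.0/16",  # private (RFC 1918)
    "224.0.0.0/4",     # multicast, never valid as a source
    "240.0.0.0/4",     # reserved
)]


def is_bogon_source(src_ip):
    """True if the source address falls inside one of the sample bogon ranges."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in BOGON_SAMPLE)
```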
While static ACLs provide an efficient mechanism for blocking forged packets (source spoofing) at the peering edge or at interconnect points, there are drawbacks to using ACLs on customer interfaces. For example, if 1000 customer routers are connected to the aggregation router, the network would need 1000 unique ACLs, one per customer. The configuration files become very large and complex, and at some point the router's performance could be affected.
The second approach uses a feature called Unicast Reverse Path Forwarding (uRPF). This feature helps mitigate problems caused by malformed or forged (spoofed) IP packets entering the network by discarding IP packets that lack a verifiable IP source address. It provides flexibility because it automatically adapts to changes in dynamic and static routing tables. There are two modes of operation for uRPF:
If a packet fails the uRPF check (loose or strict mode), the packet is then dropped. Enable logging for these drops to report attempts by host systems sending forged packets.
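The difference between the two uRPF modes can be sketched with a toy forwarding table. The sketch assumes the usual definitions: strict mode requires that the best route back to the source points out the interface the packet arrived on, while loose mode only requires that some route to the source exists. The FIB contents, interface names, and function names are hypothetical.

```python
import ipaddress

# Hypothetical FIB: prefix -> interface used to reach that prefix.
FIB = {
    ipaddress.ip_network("198.51.100.0/24"): "cust1",
    ipaddress.ip_network("203.0.113.0/24"): "cust2",
}


def lookup(src_ip):
    """Longest-prefix match on the source address; None if no route exists."""
    addr = ipaddress.ip_address(src_ip)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    return FIB[max(matches, key=lambda net: net.prefixlen)]


def urpf_pass(src_ip, in_iface, mode):
    """Return True if the packet passes the uRPF check, False if it is dropped."""
    route_iface = lookup(src_ip)
    if route_iface is None:
        return False                    # no route back to the source
    if mode == "strict":
        return route_iface == in_iface  # must arrive on the reverse-path interface
    if mode == "loose":
        return True                     # any route to the source is enough
    raise ValueError("mode must be 'strict' or 'loose'")
```

In this toy table, a packet that claims a source in 198.51.100.0/24 but arrives on `cust2` fails strict mode and passes loose mode, which is why strict mode suits single-homed customer interfaces.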
QoS policies implemented throughout the IP infrastructure provide a means to guarantee IP services (video signaling, video content) during periods of congestion, including congestion caused by an attack targeted at the data plane. QoS is also a means to segment resources per service, isolating video-related traffic from normal “best-effort” Internet traffic. Create QoS classes for video and data traffic:
QoS policies should be implemented at the customer edge to enforce packet marking policies (ToS or DiffServ marking) and, if needed, police the amount of traffic sourced from the customer network. QoS policies should ensure that in periods of congestion, traffic marked as best-effort is dropped first so that video traffic continues to be serviced from higher-priority queues.
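One common way to police traffic from a customer network is a single-rate token bucket. The sketch below is an assumption about how such a policer might behave, not a description of any particular router's implementation; the class name, parameters, and the explicit `now` timestamps are chosen for illustration and to keep the example deterministic.

```python
class TokenBucketPolicer:
    """Single-rate policer: packets conform while tokens (bytes) are available."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.burst = float(burst_bytes)   # bucket depth in bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # time of the previous packet

    def conforms(self, size_bytes, now):
        """Return True to forward the packet, False to drop (or re-mark) it."""
        # Refill for the elapsed time, never exceeding the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False
```

With `TokenBucketPolicer(rate_bps=8000, burst_bytes=1500)`, a 1500-byte packet at time 0 conforms, a 500-byte packet 0.1 s later exceeds the contract, and a 500-byte packet at 1.0 s conforms again once the bucket has refilled.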
QoS policies can also be used as a response mechanism to control attack traffic as it enters the network. This method uses BGP to signal a new class of service to the edge routers, whose policy is preprogrammed to rate-limit this traffic to an allowable rate or simply drop it.
Protect the Control Plane
The IP/MPLS infrastructure's control plane is responsible for the optimal routing of traffic on the data plane. Control plane traffic destined to a router is handled by the route processor (RP) CPU and is critical to network operation. Any service disruption to the route processor, and hence the control plane, can lead to business-affecting network outages. A denial of service (DoS) attack targeting the route processor, which can be perpetrated either inadvertently or maliciously, typically involves high rates of RP-destined traffic that result in excessive CPU utilization on the route processor itself. Such an attack can be devastating to network stability and availability and may include the following symptoms:
Control plane policies should be implemented on all infrastructure devices. Policies should define allowed classes of traffic and deny all other classes. Examples of allowed protocols include:
As a baseline, four classes should be defined and assigned to specific policy:
Table 1 shows the policies which should be associated with each class. Policy action for control plane traffic should include:
Table 1. Policy Mapped to Class
It is important to monitor control plane policies to ensure that traffic destined to the device is controlled and normal operation is not disrupted. As control plane traffic profiles become better known, additional granularity can be added to further mitigate risk, even from internal trusted hosts.
In the control plane context, filtering fragments adds a layer of protection against a DoS attack that uses only non-initial fragments (that is, fragment offset FO > 0). However, denying non-initial fragments may in rare instances deny a valid session that requires fragmentation.
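To make the FO > 0 test concrete, the following sketch reads the fragment offset from a raw IPv4 header. In IPv4, bytes 6 and 7 of the header hold three flag bits followed by a 13-bit fragment offset. The function name is hypothetical, and the sketch ignores IPv6, where fragmentation is carried in an extension header.

```python
import struct


def is_non_initial_fragment(ipv4_header: bytes) -> bool:
    """True if the IPv4 fragment offset (FO) is greater than zero."""
    if len(ipv4_header) < 20:
        raise ValueError("need at least the 20-byte fixed IPv4 header")
    # Bytes 6-7: 3 flag bits (reserved, DF, MF) followed by the 13-bit offset.
    (flags_and_offset,) = struct.unpack_from("!H", ipv4_header, 6)
    fragment_offset = flags_and_offset & 0x1FFF
    return fragment_offset > 0
```

An initial fragment (MF set, offset 0) still carries the Layer 4 header and can be matched by normal ACL entries; only fragments with a nonzero offset are caught by this test.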
Protect the Management Plane
The management plane, like the control plane, is terminated on each routing device. Protocols like SNMP and SSH are used to access the management plane for device:
It is critical to restrict access to network devices to only internal sources (trusted network) using allowed protocols. The management plane utilizes technologies within the Identity/Trust category to validate user credentials and define a trust level for device management. This level of security adds another layer of defense to protect against device break-ins or unauthorized configuration changes.
AAA services are available on routing devices to:
AAA services for management purposes should be separated from subscriber AAA services to segment external users from internal users. Common practice has been to use TACACS+ for device management and RADIUS for subscriber management. TACACS+ provides secure communications channels between the network device and AAA server to protect user credentials passed along the data plane. Routing devices need not maintain user credentials and privilege levels locally; the AAA server maintains the username/password database and assigns policy per user.
To assign policy per user, it is good practice to define user roles for management access and restrict CLI access to the specific commands needed to perform each role. At the very least, three roles should be defined and assigned to specific privilege levels:
Users should be carefully assigned to the admin role; this role should be restricted to a small number of people to protect against configuration problems that could compromise device performance and availability. Usernames should be unique to track configuration changes to a specific username, and accounting records from the device include the username with timestamps for CLI input.
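The idea of per-role command authorization with per-user accounting can be sketched as follows. Apart from the admin role mentioned above, the role names, usernames, and command prefixes are hypothetical, and a real deployment would hold this policy on the AAA server rather than in local code.

```python
from datetime import datetime, timezone

# Hypothetical role -> permitted command prefixes (assumption for illustration).
ROLE_COMMANDS = {
    "read-only": ("show ",),
    "operator": ("show ", "ping ", "traceroute "),
    "admin": ("",),  # the empty prefix matches every command
}

# Unique usernames let each change be traced to one person.
USER_ROLES = {"alice": "admin", "bob": "read-only"}

accounting_log = []


def authorize(username, command):
    """Check a CLI command against the user's role and record the attempt."""
    role = USER_ROLES.get(username)
    allowed = role is not None and command.startswith(ROLE_COMMANDS[role])
    accounting_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "username": username,
        "command": command,
        "allowed": allowed,
    })
    return allowed
```

Every attempt is logged with a username and timestamp, including denied commands and unknown users, so the accounting trail covers failed access as well as successful changes.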
Granular Access Rules per Management Interface
If the network devices support additional permit lists per management interface, these should be implemented to add granular protection policies per management protocol (SNMP, SSH, Telnet, Web GUI). In addition, implementing device-specific access rules per management interface adds another layer of defense.
Network monitoring is essential to identifying a baseline of normal network operations. A known good baseline is used to measure against current network conditions to identify anomalies that could be associated with an attack targeted to the IP infrastructure or connected services.
Anomalies can occur due to an attack or an outbreak of malicious traffic sourced from external networks. To reduce the time needed to react and apply countermeasures, anomalies need to be identified based on the amount of change in traffic behavior relative to the normal baseline. Successful anomaly detection systems have the ability to:
Flow analysis tools provide visibility into network traffic based on:
It is common for routing devices to support NetFlow Version 5 to export flow records using the above fields in a UDP packet to a collection device. Flow analysis is defined on the router interfaces using either sampled mode or full analysis mode. Sampled mode is commonly used to avoid compromising router performance; it uses a ratio to define which packets to perform flow analysis on, for example, 1 out of every 100 packets. The collection device receives NetFlow records from multiple devices. Tools are available today for service providers to collect and correlate NetFlow records to perform:
Another method to identify an anomaly or a behavior change related to an attack targeted to the IP infrastructure is to monitor SNMP:
A large-scale network attack would cause an abnormal increase in the counters tracking interface statistics. In turn, an increase in traffic destined to the router would also cause CPU spikes (control plane behavior). Tools such as Multi-Router Traffic Grapher (MRTG) can be used to collect SNMP statistics and provide visual graphs of traffic rates and device CPU utilization.
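As a rough sketch of how polled counters might be turned into an anomaly signal, the code below converts two readings of a wrapping 32-bit counter into a rate and compares it with a baseline. The threshold of the baseline mean plus three standard deviations is an assumption chosen for illustration, not a method prescribed here, and the function names are hypothetical.

```python
from statistics import mean, stdev

COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap back to zero


def counter_rate(prev, curr, interval_s, modulus=COUNTER32_MODULUS):
    """Per-second rate between two polls, tolerating a single counter wrap."""
    delta = (curr - prev) % modulus
    return delta / interval_s


def is_anomalous(rate, baseline_rates, k=3.0):
    """True if the rate exceeds the baseline mean by more than k deviations."""
    mu = mean(baseline_rates)
    sigma = stdev(baseline_rates)
    return rate > mu + k * sigma
```

The modulo handles the case where the counter wrapped between polls; with long polling intervals on fast links a 32-bit counter can wrap more than once, which this sketch cannot detect.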
To verify ACL policies and monitor the number of packets matching deny statements, ACL logging can be enabled, with messages sent to the syslog facility and stored on a remote server. A sharp increase in deny messages could be a warning sign of an ongoing attack entering the network. Within these logs are:
Care should be taken when enabling logging per ACL entry; CPU resources are required to create the log entry. Verify device capabilities to limit the number of messages generated per ACL entry. In other words, thresholds should be implemented to suspend logging to prevent the device CPU from being overwhelmed when handling a large number of messages. It should only be necessary to log messages for entries “denying” access.
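The thresholding idea can be sketched as a per-entry limiter that emits at most a fixed number of log messages per interval and counts the rest as suppressed. This is an illustrative model only; the class name and parameters are hypothetical, and actual devices expose their own rate-limit controls.

```python
class AclLogLimiter:
    """Allow at most max_per_interval log messages per interval for one ACL entry."""

    def __init__(self, max_per_interval, interval_s):
        self.max_per_interval = max_per_interval
        self.interval_s = interval_s
        self.window_start = None  # start time of the current interval
        self.emitted = 0          # messages logged in the current interval
        self.suppressed = 0       # total messages withheld so far

    def should_log(self, now):
        """Return True if a deny match at time `now` should produce a log message."""
        if self.window_start is None or now - self.window_start >= self.interval_s:
            self.window_start = now  # open a new interval
            self.emitted = 0
        if self.emitted < self.max_per_interval:
            self.emitted += 1
            return True
        self.suppressed += 1
        return False
```

Keeping a suppressed count preserves the signal that deny matches are spiking even while logging is paused to protect the CPU.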
Network Monitoring Summary
With each of these monitoring points, it is important that a baseline representing “normal” behavior is established. The baseline allows operations to more rapidly identify an anomaly or critical change in network behavior. The baseline should be updated periodically, because traffic patterns may change as new applications and services are deployed within the IP infrastructure.
The information retrieved from the above-mentioned monitoring tools should also be used to monitor policies as part of a cycle of continuous improvement. Visibility into network behavior is like having a feedback loop:
Networkwide Response Tools
The BCF design model described in the above sections to protect the data, control, and management planes can also be leveraged to respond to distributed denial of service (DDoS) attacks targeted at the infrastructure or a connected device. Combined, the following tools provide a mitigation response driven from the IP infrastructure, Remote Triggered Black Hole Filtering (RTBHF):
Source-based RTBHF provides the ability to drop traffic at the network edge based on a specific source address or range of source addresses. The source address (or range of addresses) of the attack is identified (spoofed or not) by flow analysis tools (NetFlow) monitoring inbound traffic from the Internet. Legitimate IP packets are allowed to reach the target. Implementation of source-based black hole filtering works with Unicast Reverse Path Forwarding (uRPF) loose mode.
Loose uRPF validates the source IP of a packet. If there is a route entry for the source IP of the incoming packet in the router Forwarding Information Base (FIB), the packet is routed normally. If the router does not have a FIB entry for the source IP address, or if the entry points to Null0, the RPF check fails and the packet is dropped as shown in Figure 3.
Figure 3. uRPF Loose Mode Check
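The interaction between a triggered black-hole route and the loose uRPF check can be modeled in a few lines. The sketch assumes that the trigger results in a host route for the attack source pointing to Null0 on each edge router; in practice that route would be signaled with BGP, which is not modeled here. The class, method names, and addresses are hypothetical.

```python
import ipaddress

NULL0 = "Null0"


class EdgeRouter:
    """Toy edge router: a FIB plus a loose-mode uRPF check on inbound packets."""

    def __init__(self):
        # Hypothetical route toward an external network via a peering link.
        self.fib = {ipaddress.ip_network("203.0.113.0/24"): "peer1"}

    def install_route(self, prefix, next_hop):
        """Stand-in for a route learned from the trigger (normally via BGP)."""
        self.fib[ipaddress.ip_network(prefix)] = next_hop

    def loose_urpf_pass(self, src_ip):
        """Pass only if the best route to the source exists and is not Null0."""
        addr = ipaddress.ip_address(src_ip)
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return False  # no FIB entry for the source
        best = max(matches, key=lambda net: net.prefixlen)
        return self.fib[best] != NULL0


def black_hole_source(routers, attacker_ip):
    """Install a /32 route to Null0 for the attack source on every edge router."""
    for router in routers:
        router.install_route(attacker_ip + "/32", NULL0)
```

Because the /32 is more specific than the covering /24, only the attack source is dropped; other hosts in the same prefix continue to pass the check and reach the target.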
This process has three steps, which are illustrated in Figure 4:
Figure 4. RTBHF Process
BCF Design Validation
The BCF design purpose was to secure the IP/MPLS infrastructure and protect video services:
The tools used within the IP/MPLS infrastructure accomplish two goals: maintaining network control and providing networkwide visibility.
Figure 5 reviews the technologies to protect the three planes of operation from:
Figure 5. BCF Network Roles
Peering / Internet Edge
The tools used at the peering edge protect video services from being disrupted by external sources. Access to infrastructure devices is secured from external sources. The management and control planes are restricted and protected against unauthorized access and against DoS attacks targeted at the route processor (RP). The peering edge tools create a layer of defense protecting core, access, and aggregation routers from external threats. Network monitoring provides visibility into the behavior of traffic coming from external sources. Identify, classify, and trace back attacks, and mitigate them using network response tools (NetFlow + BGP + uRPF loose mode).
Core routers implement self-protection to ensure the control and management planes are not compromised by external or internal sources. Monitoring tools ensure edge policies are working as expected.
Subscriber L3 Aggregation Edge
Ensure subscribers source IP packets with valid IP source addresses. Maintain QoS policies to protect upstream video resources and the video services delivered to other customers. Deny access to the infrastructure devices but permit transit access to services (Internet, video content, portals, e-mail, etc.). Monitor traffic to and from subscribers to identify malicious behavior targeted at the IP/MPLS infrastructure.
Vaughn Suazo is a Consulting Systems Engineer for Service Provider customers, specializing in Service Provider Security technologies and solutions.