Purpose of Document

Designing a secure IP/MPLS infrastructure to defend against complex threats and malicious attacks, which continue to change in behavior and characteristics, is more critical today than ever. This document uses a design model based on existing technologies within an IP/MPLS infrastructure; the model takes advantage of these technologies to reduce risk and mitigate threats targeted at the infrastructure.

Service availability, reliability, and quality are critical attributes that service providers must protect when deploying video technologies over IP/MPLS. A security baseline implemented within an IP/MPLS infrastructure protects these attributes by:

Implementing a multilayered defense system
Controlling network-based behavior
Supporting visibility into network behavior

Service Provider Security Challenges and Requirements

The primary challenge faced by today's service providers is maintaining service predictability in the presence of an outbreak of malicious traffic sourced from multiple endpoints spread across multiple network boundaries. In today's terms, this type of behavior has been identified with threats such as distributed denial of service (DDoS) attacks, turbo worms, e-mail spam, and viruses. The amount of traffic generated by an outbreak has the capability of disrupting the normal operation of an IP/MPLS network and adds risk to the supporting devices routing and switching packets.

Based on emerging threats scaling to multi-gigabit rates of traffic, service provider requirements have now shifted from using standalone security appliances to requiring that security now be integrated into the network infrastructure. Integrating security within the network infrastructure provides the following advantages:

Functionality without affecting overall network performance
Operation with existing high-availability services
The ability to identify, classify, and trace back anomalous behavior networkwide
Increasing the overall security posture of the network
The ability to distribute counter measures to multiple ingress points of the network

By integrating security functionality within the network infrastructure, the same operational tools used to manage data services are also used to support security operations. Common tools such as Authentication/Authorization/Accounting (AAA) services, SNMP, SYSLOG, routing protocols, device counters, and packet analysis tools enforce and monitor the security policies required to ensure reliable operation. As the network devices become more intelligent, operational processes become more proactive in identifying and mitigating threats to key video attributes.

Business Control Framework

Security is the foundation of internetworking's future; we have moved from an Internet of implicit trust to an Internet of pervasive distrust. The Business Control Framework (BCF) model takes the functionality of a router and merges this functionality into a pervasive policy enforcement model.

QoS = Security
High-Availability = Security
Edge Subscriber Policy = Security

BCF Design Model

The BCF design model is used to implement security networkwide with no reliance on a single technology. Multiple technologies and features are used networkwide to ensure two crucial design goals:

Control network behavior
Maintain visibility into network behavior

The BCF model uses six technology categories (Figure 2) to identify a security baseline to mitigate risk to video services over an IP infrastructure (availability, reliability and quality).

Threat Vectors Mapped to IP Planes of Operation

IP networks can be categorized into three planes of operation:

Data Plane—The data plane receives, processes, and transmits network data between network elements, and represents the bulk of network traffic that passes to and through the router.
Control Plane—The control plane is where all routing control information is exchanged, making the control plane and its components a target. Because control plane resiliency depends on CPU processing power and scalability, “out-of-resources” attacks against the CPU are not uncommon.
Management Plane—The management plane is the logical path of all traffic related to the system management of the routing platform. In a distributed and modular environment, the management plane offers new levels of complexity, and hence, increased requirements to maintain secure access.

Each plane of operation must be properly secured and monitored (Control + Visibility) to ensure the reliable operation of the network. Figure 1 uses a matrix to map threat vectors to the three planes of operation. The areas of focus for this document are peering/interconnect, core, and Layer 3 aggregation.

Figure 1. Threat Vectors Mapped to Network Roles

Reconnaissance—Scan network topologies to identify vulnerable devices (open ports, no passwords, OS vulnerabilities, etc.)
Distributed Denial of Service / Infrastructure—IP packet based attacks launched at the network infrastructure to compromise network performance and reliability
Break-Ins—Usually follow reconnaissance; unauthorized access to a given device with the intention of compromising device security
Theft of Service / Fraud—Unauthorized use of network resources

BCF Technology to Mitigate Threat Vectors

Figure 2 illustrates the visual concept of BCF and how it relates to multiple technology categories:

Network Control—Policy
- Policy Enforcement—Access Control Lists, QoS policy actions
- Isolation / Segmentation—QoS Resources
- Instrumentation and Management—Control Plane Protection
Visibility—Classification
- Protocol and Application Awareness—NetFlow, Access Control Lists, QoS classes
- Identity and Trust—Authentication, Authorization, and Accounting (AAA services), Management Access Control Lists, Anti-Spoofing
- Correlation—Networkwide Monitoring (SNMP, NetFlow records, Syslog)

Figure 2. BCF Pillars of Control and Visibility

The BCF model as a whole takes into account the operational process model to implement control and visibility technologies across an IP/MPLS infrastructure. It is the goal of the BCF design to work within a sustainable operational model supporting both network and security operations. Each technology pillar works together as a whole, providing a secure system of defense.

BCF Trust Boundaries

Trust boundaries should be extended within the IP infrastructure across devices under control of the service provider. Networks and devices outside of the control of the service provider will be regarded as being untrusted.

Untrusted networks (external sources) include:

Connected customer networks
Internet (remote autonomous systems)

Secure Core and Edge Network Resources

Protect the Data Plane

Infrastructure Access Control Lists (iACLs)

In an effort to protect routers from various risks—both accidental and malicious—infrastructure protection ACLs should be deployed at network ingress points. These ACLs deny access from external sources to all infrastructure addresses such as router interfaces, while simultaneously permitting routine transit traffic to flow uninterrupted.

In normal operations, the vast majority of traffic simply flows through a router while on route to its ultimate destination.

By denying access to routers from external sources, many of the external risks associated with direct router attack are mitigated. Furthermore, infrastructure ACLs help enforce security policy by permitting only explicitly authorized IP addresses and protocols to enter the network from untrusted networks.

In general, an infrastructure ACL is composed of four sections:

Explicitly permitted externally sourced traffic destined to infrastructure addresses.
Anti-spoofing entries that deny packets with source addresses that belong within your AS from entering the AS from an external source.
Deny statements for all other externally sourced traffic to infrastructure addresses.
Permit statements for all other traffic for “normal” backbone traffic en route to non-infrastructure destinations.
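
The four sections above are evaluated in order, with the first match deciding the outcome. The following Python sketch models that ordering only; it is not router configuration. The prefixes, the trusted peer address, and the function name iacl_decision are hypothetical values chosen for illustration (documentation address ranges), and a real iACL would carry many more entries.

```python
import ipaddress

# Hypothetical address plan, used only for illustration.
INFRASTRUCTURE = ipaddress.ip_network("198.51.100.0/24")  # router interfaces and loopbacks
INTERNAL_SPACE = ipaddress.ip_network("203.0.113.0/24")   # other prefixes that belong to our AS
TRUSTED_BGP_PEER = ipaddress.ip_address("192.0.2.10")     # external peer allowed to reach us


def iacl_decision(src, dst, protocol, dst_port=None):
    """Return 'permit' or 'deny' for a packet arriving from an external source."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)

    # Section 1: explicitly permitted traffic destined to infrastructure addresses.
    if (src == TRUSTED_BGP_PEER and dst in INFRASTRUCTURE
            and protocol == "tcp" and dst_port == 179):
        return "permit"

    # Section 2: anti-spoofing. Our own addresses must never arrive from outside.
    if src in INTERNAL_SPACE or src in INFRASTRUCTURE:
        return "deny"

    # Section 3: deny all other external traffic aimed at infrastructure addresses.
    if dst in INFRASTRUCTURE:
        return "deny"

    # Section 4: permit normal transit traffic to non-infrastructure destinations.
    return "permit"
```

For example, a BGP session from the trusted peer to a router interface is permitted, while an SSH attempt from an arbitrary Internet host to the same interface falls through to the third section and is denied.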

Deny Forged IP Packets

Some threats/attacks use a technique to forge or spoof the source IP address of packets to evade security mechanisms. The packet may be forged with a source IP address belonging to the trusted network to gain access to a device. In another case, a customer may attempt to spoof the IP address of another customer to steal service or to hide his or her identity while attempting to launch a DoS attack at the infrastructure.

Infrastructure ACLs define policy for allowed packets entering the network from external sources, but anti-spoofing identifies valid source IP addresses. Infrastructure ACLs should include entries to deny the following IP address ranges from entering the network:

RFC 1918 private addresses (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) and the loopback range (127.0.0.0/8)
Bogon Prefixes (unused IP address ranges)
Any packets using the IP address range assigned to internal IP infrastructure

A bogon prefix is a route that should never appear in the Internet routing table. A packet routed over the public Internet (not including over VPN or other tunnels) should never have a source address in a bogon range. These are commonly found as the source addresses of DDoS attacks. To get a current list of unused IP prefixes which should not be routed from the Internet, use the tools at http://www.cymru.com/Documents/bogon-list.html.
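
As a rough illustration of the source-address check these entries perform, the Python sketch below tests a source address against a deny list. The private and loopback prefixes are the well-known ranges; the internal infrastructure prefix is an assumed placeholder, and the bogon list is deliberately left out because it changes over time and should be loaded from a current source.

```python
import ipaddress

# Sources that should never be seen on an external-facing interface.
# The last entry is a hypothetical stand-in for the provider's own
# infrastructure block; a real list would also include current bogons.
DENIED_SOURCES = [
    ipaddress.ip_network("10.0.0.0/8"),       # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),    # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),   # RFC 1918
    ipaddress.ip_network("127.0.0.0/8"),      # loopback
    ipaddress.ip_network("198.51.100.0/24"),  # internal infrastructure (assumed)
]


def is_forged_source(src):
    """Return True if the source address falls inside any denied range."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in DENIED_SOURCES)
```

Note that the 172.16.0.0/12 range ends at 172.31.255.255, so 172.32.0.1 is not treated as private by this check.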

While static ACLs provide an efficient mechanism for blocking forged packets (source spoofing) at the peering edge or at interconnect points, there are drawbacks to using ACLs on customer interfaces. For example, if 1000 customer routers are connected to the aggregation router, the network would need 1000 unique ACLs, one ACL per customer. The configuration files will become very large and complex, and at some point the router's performance could be affected.

The second approach uses a feature called Unicast Reverse Path Forwarding (uRPF). This feature helps mitigate problems caused by malformed or forged (spoofed) IP packets entering the network by discarding IP packets that lack a verifiable IP source address. It provides flexibility because it automatically adapts to changes in dynamic and static routing tables. There are two modes of operation for uRPF:

Loose Mode—Typically used on multipoint interfaces or on routers where asymmetrical routing is used (packets are received on one interface but the return path is not on the same interface). Loose mode verifies a source address by looking in the forwarding information base (FIB), created by routing protocols, to verify that there is a return route to the source and that the path uses a valid interface.
Strict Mode—Typically used on point-to-point interfaces where the same interface is used for both directions of packet flows. Strict mode uses the same verification method as loose mode but adds one additional verification process. If the source address has a return route in the FIB table, it is then checked against the adjacency table to ensure the interface receiving the packet is the same interface used for the return path.

If a packet fails the uRPF check (loose or strict mode), the packet is then dropped. Enable logging for these drops to report attempts by host systems sending forged packets.
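
The difference between the two modes can be sketched in a few lines of Python. This is a simplified model, not a description of any platform's implementation: the FIB contents and interface names are invented, and the separate adjacency table lookup used by strict mode is folded into a single prefix-to-interface dictionary.

```python
import ipaddress

# Hypothetical FIB: prefix -> interface used to reach that prefix.
FIB = {
    ipaddress.ip_network("203.0.113.0/24"): "Gi0/0",
    ipaddress.ip_network("198.51.100.0/24"): "Gi0/1",
}


def return_interface(src):
    """Longest-prefix match on the source; returns an interface name or None."""
    addr = ipaddress.ip_address(src)
    matches = [net for net in FIB if addr in net]
    if not matches:
        return None
    return FIB[max(matches, key=lambda net: net.prefixlen)]


def urpf_pass(src, in_interface, mode):
    """Return True if the packet passes the uRPF check in the given mode."""
    interface = return_interface(src)
    if interface is None:
        return False                      # no return route: fails in both modes
    if mode == "loose":
        return True                       # any valid return route is enough
    if mode == "strict":
        return interface == in_interface  # return path must use the arrival interface
    raise ValueError("mode must be 'loose' or 'strict'")
```

A packet sourced from 203.0.113.9 that arrives on Gi0/1 passes the loose check but fails the strict one, which is why strict mode is a poor fit where routing is asymmetric.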

QoS

QoS policies implemented throughout the IP infrastructure provide a means to guarantee IP services (video signaling, video content) during periods of congestion, including congestion caused by an attack targeted at the data plane. QoS is also a means to segment resources per service to isolate video-related traffic from normal “best-effort” Internet traffic. Create QoS classes for video and data traffic:

IP Video traffic—classified with “Type-of-Service” (TOS) 5 or Diffserv code point Expedited Forwarding (EF)
Internet Data traffic—classified with TOS 0 or Diffserv code point Best Effort (BE)

QoS policies should be implemented at the customer edge to enforce packet marking policies (TOS or Diffserv marking) and, if needed, police the amount of traffic sourced from the customer network. QoS policies should ensure that in periods of congestion, traffic marked as best-effort is dropped first so that video traffic is serviced from higher priority queues.
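
The relationship between the two marking schemes, and the effect of servicing video first, can be shown with a small Python sketch. The DSCP value 46 for Expedited Forwarding is the standard code point; the packet dictionaries and the schedule function are a deliberately simplified, hypothetical model of a strict-priority queue rather than an actual scheduler.

```python
DSCP_EF = 46  # Expedited Forwarding, used here for IP video
DSCP_BE = 0   # Best Effort, used here for Internet data


def tos_byte(dscp):
    """DSCP occupies the upper six bits of the former TOS byte."""
    return dscp << 2


def ip_precedence(dscp):
    """The top three DSCP bits correspond to the legacy precedence value."""
    return dscp >> 3


def schedule(packets, capacity):
    """Serve up to `capacity` packets, highest DSCP first; the rest are dropped."""
    ordered = sorted(packets, key=lambda p: p["dscp"], reverse=True)
    return ordered[:capacity], ordered[capacity:]
```

EF maps to precedence 5, which is consistent with the TOS value given for video above. Under congestion, with room for two packets out of three, the video packet is always served and a best-effort packet is dropped.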

QoS policies can also be used as a response mechanism to control attack traffic as it enters the network. This method uses BGP to signal a new class of service to the edge routers, whose policy is preprogrammed to rate-limit this traffic to an allowable rate or simply drop it.

Protect the Control Plane

The IP/MPLS infrastructure's control plane is responsible for the optimal routing of traffic on the data plane. Control plane traffic routed to a router is handled by the route processor (RP) CPU and is critical to network operation. Any service disruption to the route processor, and hence the control plane, can lead to business-affecting network outages. A denial of service (DoS) attack targeting the route processor, which can be perpetrated either inadvertently or maliciously, typically involves high rates of RP-destined traffic that result in excessive CPU utilization on the route processor itself. Such an attack can be devastating to network stability and availability and may include the following symptoms:

High route processor CPU utilization (near 100%)
Loss of line protocol keep-alives and routing protocol updates, leading to route flaps and major network transitions
Interactive sessions via the command line interface (CLI) are slow or completely unresponsive due to high CPU utilization
Route processor resource exhaustion: resources such as memory and buffers are unavailable for legitimate IP data packets
Packet queues back up, leading to indiscriminate drops (or drops due to lack of buffer resources) of other incoming packets

Control plane policies should be implemented on all infrastructure devices. Policies should define allowed classes of traffic and deny all other classes. Examples of allowed protocols include:

Routing protocols (BGP, OSPF, RIP)
Management protocols (SNMP, telnet, ssh, http/https)
ICMP messages
Tunneling protocols (GRE, IPSec)

As a baseline, four classes should be defined and assigned to specific policy:

Critical Class
- Include routing and signaling protocols used by the internal IP infrastructure
Important Class
- Include management protocols used to operate the network infrastructure; access should only be allowed from trusted hosts
Normal Class
- Include essential ICMP messages (echo-reply, ttl-exceeded, port-unreachable, etc.)
Undesirable Class
- Use this class to include known bad traffic

Table 1 shows the policies which should be associated with each class. Policy action for control plane traffic should include:

Permit
Police to a specific rate (drop excess traffic)
Drop and log

Table 1. Policy Mapped to Class

Policy	Class
Permit	Critical Class
Permit and Police	Important Class, Normal Class
Deny / Drop and log	Undesirable Class
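
The class-to-policy mapping in Table 1 can be modeled as a simple lookup. The Python sketch below is illustrative only: the packet fields, the trusted management host address, and the choice to treat anything unmatched as undesirable are assumptions made for the example, and the police function is a per-interval counter rather than a real token bucket.

```python
# Policy per class, following Table 1.
CLASS_POLICY = {
    "critical": "permit",
    "important": "permit-and-police",
    "normal": "permit-and-police",
    "undesirable": "drop-and-log",
}

TRUSTED_MGMT_HOSTS = {"192.0.2.50"}  # hypothetical NOC host
ESSENTIAL_ICMP = {"echo-reply", "ttl-exceeded", "port-unreachable"}


def classify(packet):
    """Map a control plane packet (a dict) to one of the four baseline classes."""
    protocol = packet["protocol"]
    if protocol in {"bgp", "ospf"}:
        return "critical"
    if protocol in {"ssh", "snmp"} and packet["src"] in TRUSTED_MGMT_HOSTS:
        return "important"
    if protocol == "icmp" and packet.get("icmp_type") in ESSENTIAL_ICMP:
        return "normal"
    return "undesirable"  # assumption: anything unmatched is treated as bad traffic


def police(packet_count, limit):
    """Return (forwarded, dropped) packet counts for one measurement interval."""
    forwarded = min(packet_count, limit)
    return forwarded, packet_count - forwarded
```

With this model, an SSH attempt from an untrusted host does not reach the important class at all; it lands in the undesirable class and is dropped and logged.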

It is important for the control plane policies to be monitored to ensure behavior to the device is controlled and normal operation is not disrupted. As control plane traffic profiles become better known, additional granularity can be added to further mitigate risk from even the internal trusted hosts.

In the control plane context, filtering fragments adds an additional layer of protection against a DoS attack that uses only non-initial fragments (i.e., fragment offset FO > 0); however, denying non-initial fragments may in rare instances deny a valid session that requires fragmentation.

Protect the Management Plane

The management plane, like the control plane, is terminated on each routing device. Protocols like SNMP and SSH are used to access the management plane for device:

Monitoring
CLI access

It is critical to restrict access to network devices to only internal sources (trusted network) using allowed protocols. The management plane utilizes technologies within the Identity/Trust category to validate user credentials and define a trust level for device management. This level of security adds another layer of defense to protect against device break-ins or unauthorized configuration changes.

Authentication/Authorization/Accounting

AAA services are available on routing devices to:

Validate user credentials (username, password)
Authorize privilege levels
Record accounting records to track configuration changes made by a specific user

AAA services for management purposes should be separated from subscriber AAA services to segment external users from internal users. Common practice has been to use TACACS+ for device management and RADIUS for subscriber management. TACACS+ provides secure communications channels between the network device and AAA server to protect user credentials passed along the data plane. Routing devices need not maintain user credentials and privilege levels locally; the AAA server maintains the username/password database and assigns policy per user.

To assign policy per user, it is good practice to define user roles for management access and restrict CLI access to the specific commands needed to perform the role. At the very least, three roles should be defined and assigned to specific privilege levels:

Admin Role—Full access to all commands
Provisioning Role—Access to configuration commands (interfaces, subscriber services, etc.)
Monitor Role—Access to monitoring and troubleshooting commands.

Users should be carefully assigned to the admin role; this role should be restricted to a small number of people to protect against configuration problems that could compromise device performance and availability. Usernames should be unique to track configuration changes to a specific username, and accounting records from the device include the username with timestamps for CLI input.
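
A minimal Python sketch of role-based command authorization with per-user accounting follows. It models the idea only; it is not how TACACS+ or any AAA server is implemented. The command keywords assigned to each role and the record layout are hypothetical.

```python
# Hypothetical mapping of roles to the leading command keywords they may use.
ROLE_COMMANDS = {
    "admin": None,                                   # None means all commands
    "provisioning": ("configure", "interface", "show"),
    "monitor": ("show", "ping", "traceroute"),
}


def authorize(role, command):
    """Return True if the role may run the command, based on its first keyword."""
    if role not in ROLE_COMMANDS:
        return False
    allowed = ROLE_COMMANDS[role]
    if allowed is None:
        return True
    words = command.split()
    return bool(words) and words[0] in allowed


def accounting_record(username, command, timestamp):
    """Build the record that ties a CLI input to a unique username and time."""
    return {"user": username, "command": command, "time": timestamp}
```

Because usernames are unique, each accounting record identifies exactly one person, which is what makes configuration changes traceable.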

Granular Access Rules per Management Interface

If the network devices support additional permit lists per management interface, these should be implemented to add granular protection policies per management protocol (SNMP, SSH, Telnet, Web GUI). Implementing device-specific access rules per management interface adds another layer of defense.

Networkwide Monitoring

Network monitoring is essential to identifying a baseline of normal network operations. A known good baseline is used to measure against current network conditions to identify anomalies that could be associated with an attack targeted to the IP infrastructure or connected services.

Anomalies can occur due to an attack or an outbreak of malicious traffic sourced from external networks. To reduce the time to react and insert counter-measures to mitigate, anomalies need to be identified based on amount of change in traffic behavior versus normal baseline. Successful anomaly detection systems have the ability to:

Identify an anomaly—Increase in traffic to the device management plane
Classify an anomaly—Flood of SNMP requests from external hosts
Trace back an anomaly—SNMP flood entering at a peering connection

Flow Analysis

Flow analysis tools provide visibility into network traffic based on:

Source/destination IP address
Protocol
Source/destination ports
TCP flags
Bytes transmitted
TOS bits (type-of-service bits used to classify for QoS)
Next-hop information (BGP)

It is common for routing devices to support NetFlow Version 5 to export flow records using the above fields in a UDP packet to a collection device. Flow analysis is defined on the router interfaces using either sampled mode or full analysis mode. Sampled mode is commonly used to keep router performance from being compromised; it uses a ratio to define which packets flow analysis is performed on, for example, a sample rate of 1 out of every 100 packets. The collection device receives NetFlow records from multiple devices. Tools are available today for service providers to collect and correlate NetFlow records to perform:

Anomaly detection
Traffic accounting
Capacity planning
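
When flows are sampled, a collector has to scale the sampled counts back up to estimate real traffic. The Python sketch below shows that arithmetic for a 1-in-N sample rate and ranks sources by estimated volume. The record layout (a dict with src and bytes keys) is an assumption made for the example and does not reflect the NetFlow Version 5 wire format.

```python
from collections import Counter


def estimate_totals(flow_records, sample_rate):
    """Scale sampled byte counts per source up by the 1-in-N sample rate."""
    totals = Counter()
    for record in flow_records:
        totals[record["src"]] += record["bytes"] * sample_rate
    return totals


def top_sources(flow_records, sample_rate, n=3):
    """Return the n sources with the largest estimated byte counts."""
    return estimate_totals(flow_records, sample_rate).most_common(n)
```

The estimate is statistical: short flows may be missed entirely by sampling, so the scaled totals are most reliable for the large flows that matter in anomaly detection and capacity planning.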

SNMP Counters

Another method to identify an anomaly or a behavior change related to an attack targeted to the IP infrastructure is to monitor SNMP:

Interface counters
CPU utilization

A large-scale network attack would cause an abnormal increase in the amount of the counters tracking interface statistics. In turn, an increase in traffic destined to the router would also cause CPU spikes (control plane behavior). Tools such as Multi-Router Traffic Grapher (MRTG) could be used to collect SNMP statistics and provide visual graphs for traffic rates and device CPU utilization.
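
Interface counters are cumulative, so a monitoring tool derives a traffic rate from the difference between two polls. The Python sketch below shows that calculation for a 32-bit octet counter, including the case where the counter wraps between polls. The function name is hypothetical, and the sketch assumes at most one wrap per polling interval.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to zero at this value


def octet_rate_bps(previous, current, interval_seconds, modulus=COUNTER32_MODULUS):
    """Bits per second between two polls of an octet counter.

    The modulo handles a single counter wrap between the two samples.
    """
    delta_octets = (current - previous) % modulus
    return delta_octets * 8 / interval_seconds
```

On fast interfaces a 32-bit counter can wrap more than once within a long polling interval, in which case the computed rate would be too low; shorter intervals or 64-bit counters avoid this.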

ACL Logs

To verify ACL policies and monitor number of packets matching deny statements, logging is available which could be sent to the syslog facility and stored on a remote server. A sharp increase in deny messages could be a warning sign of an ongoing attack entering the network. Within these logs are:

Timestamps
Offending source IP
ACL name/number
Rules violated

Care should be taken when enabling logging per ACL entry; CPU resources are required to create the log entry. Verify device capabilities to limit the number of messages generated per ACL entry. In other words, thresholds should be implemented to suspend logging to prevent the device CPU from being overwhelmed when handling a large number of messages. It should only be necessary to log messages for entries “denying” access.
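
The thresholding idea can be sketched as a small limiter that logs deny hits up to a ceiling per interval and counts the rest as suppressed. This Python class is a conceptual model under assumed names; actual devices expose their own rate-limiting controls for ACL logging, which should be checked against the platform documentation.

```python
class AclLogLimiter:
    """Log only 'deny' hits, up to a maximum number of messages per interval."""

    def __init__(self, max_messages_per_interval):
        self.max_messages = max_messages_per_interval
        self.logged = 0
        self.suppressed = 0

    def record(self, action):
        """Return True if a syslog message should be generated for this hit."""
        if action != "deny":
            return False          # permit entries are not logged
        if self.logged < self.max_messages:
            self.logged += 1
            return True
        self.suppressed += 1      # over the threshold: count but do not log
        return False

    def start_new_interval(self):
        """Reset the counters when a new logging interval begins."""
        self.logged = 0
        self.suppressed = 0
```

The suppressed count is still useful: a large value during one interval is itself a sign of the sharp increase in deny hits described above.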

Network Monitoring Summary

With each of these monitoring points, it is important that a baseline representing “normal” behavior is established. The baseline allows operations to more rapidly identify an anomaly or critical change in network behavior. The baseline should be maintained and updated as traffic patterns change or as new applications/services are deployed within the IP infrastructure.
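
One simple way to compare a current measurement with a baseline is to ask how many standard deviations it lies from the baseline mean. The Python sketch below does this and also shows a rolling update of the baseline. The threshold of three standard deviations and the window size are illustrative assumptions, not recommendations from this document; production anomaly detection tools use considerably richer models.

```python
import statistics


def is_anomalous(baseline_samples, current_value, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(baseline_samples)
    spread = statistics.pstdev(baseline_samples)
    if spread == 0:
        return current_value != mean  # flat baseline: any change stands out
    return abs(current_value - mean) / spread > threshold


def update_baseline(baseline_samples, new_sample, window=288):
    """Append a sample and keep only the most recent `window` samples."""
    return (baseline_samples + [new_sample])[-window:]
```

Samples taken during a known attack should be kept out of the baseline so that the abnormal traffic does not become part of what is considered normal.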

The information retrieved from the above-mentioned monitoring tools should also be used to monitor policies and drive a cycle of continuous improvement. Visibility into network behavior is like having a feedback loop:

Policy is defined
Policy tested
Policy implemented
Policy monitored
Policy improved
New policy implemented (repeat cycle at step 2)

Networkwide Response Tools

The BCF design model described in the above sections to protect the data, control and management planes can also be leveraged to respond to Distributed Denial of Service attacks which could be targeted to the Infrastructure or a connected device. The following tools combined provide a mitigation response tool used from the IP infrastructure, Remote Triggered Black Hole Filtering (RTBHF):

NetFlow analysis
uRPF Loose Mode
BGP
SNMP Counters

Source-based RTBHF provides the ability to drop traffic at the network edge based on a specific source address or range of source addresses. The source address (or range of addresses) of the attack is identified (spoofed or not) by flow analysis tools (NetFlow) monitoring inbound traffic from the Internet. Legitimate IP packets are allowed to reach the target. Implementation of source-based black hole filtering works with Unicast Reverse Path Forwarding (uRPF) loose mode.

Loose uRPF validates the source IP of a packet: if there is a route entry for the source IP of the incoming packet in the router Forwarding Information Base (FIB), the packet is routed normally. If the router does not have a FIB entry for the source IP address, or if the entry points to Null0, the RPF check fails and the packet is dropped as shown in Figure 3.

Figure 3. uRPF Loose Mode Check

Because uRPF validates a source IP against its FIB entry, dropping traffic from specific source addresses requires only that you configure loose uRPF on the external interface and make the RPF check fail by inserting a route to the source with a next hop of Null0. You can do this by using a trigger device to send iBGP updates. These updates set the next hop for the source IP to an unused IP address that has a static entry at the edge pointing to Null0, as explained below.

This process has three steps, which are illustrated in Figure 4:

The setup (Preparation)
- The trigger must have an iBGP peering relationship with all the edge routers, or with BGP route reflectors. The trigger must also be configured to redistribute static routes to its iBGP peers.
- The edge routers must have a static route for an unused IP address space (for example, 192.0.2.1/32) set to Null0
- Loose uRPF must be configured on all external facing interfaces at the edges (PEs)
The trigger
- An administrator adds a static route to the trigger; the static route represents the source IP of the attacker(s). The trigger redistributes the route by sending a BGP update to all its iBGP peers, which sets the next hop for the attacker's source IP to the unused address (for example, 192.0.2.1).
- Each edge router receives the iBGP update and sets its next hop for the source IP to the unused IP address 192.0.2.1. The next hop to this address is set to Null0 by a static routing entry in the router configuration, so the next hop entry in the FIB for the source IP address now resolves to Null0.
- All traffic from the source IP will fail the loose uRPF check and as a consequence will be dropped.
The withdrawal
- Once the trigger is in place, all traffic from the source IP address(es) will be dropped at the edge routers. When the threat no longer exists, the administrator must manually remove the static route from the triggering device, which sends a BGP route withdrawal to its iBGP peers. This prompts the edge routers to remove the existing route for the source IP that points to 192.0.2.1 and install a new route in the FIB based on the IGP RIB. If this new route is valid, loose uRPF checks will pass and traffic from the blocked source will be forwarded normally.
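
The three steps can be traced with a small Python simulation. It is a conceptual model only: the EdgeRouter and Trigger classes, the peer-link next hop, and the 203.0.113.0/24 prefix are invented for illustration, and the iBGP exchange is reduced to the trigger writing directly into each router's route table.

```python
import ipaddress

BLACKHOLE_NEXT_HOP = "192.0.2.1"  # unused address used in the steps above


class EdgeRouter:
    def __init__(self):
        # prefix -> next hop. The /24 is a hypothetical normally learned route.
        self.routes = {
            ipaddress.ip_network("203.0.113.0/24"): "peer-link",
            # Setup step: static route for the unused address pointing to Null0.
            ipaddress.ip_network(BLACKHOLE_NEXT_HOP + "/32"): "Null0",
        }

    def resolve(self, address):
        """Longest-prefix match; follows the black hole next hop to Null0."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        next_hop = self.routes[max(matches, key=lambda net: net.prefixlen)]
        if next_hop == BLACKHOLE_NEXT_HOP:
            return self.resolve(next_hop)
        return next_hop

    def loose_urpf_pass(self, src):
        """Fail when there is no route or the route resolves to Null0."""
        return self.resolve(src) not in (None, "Null0")


class Trigger:
    def __init__(self, edge_routers):
        self.edge_routers = edge_routers

    def add_static(self, source_prefix):
        """Trigger step: advertise the source with the black hole next hop."""
        net = ipaddress.ip_network(source_prefix)
        for router in self.edge_routers:
            router.routes[net] = BLACKHOLE_NEXT_HOP

    def remove_static(self, source_prefix):
        """Withdrawal step: remove the advertised route from every edge."""
        net = ipaddress.ip_network(source_prefix)
        for router in self.edge_routers:
            router.routes.pop(net, None)
```

After add_static("203.0.113.66/32"), packets from 203.0.113.66 fail the loose uRPF check on every edge router while other hosts in 203.0.113.0/24 still pass; after remove_static, the less specific route applies again and the source is forwarded normally.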

Figure 4. RTBHF Process

RTBHF provides the following benefits versus static ACL entries:

Attack source list is distributed from a single trigger router to multiple edge routers
The drop list is removed by the trigger router after the attack subsides; there is no need to manually remove ACL entries from all edge routers
No need to access multiple routers for CLI and generate ACL entries; minimize CLI input, therefore reducing configuration errors
Drop list is updated in less than 60 seconds across the edge network

BCF Design Validation

The BCF design purpose was to secure the IP/MPLS infrastructure and protect video services:

Video quality—bandwidth resources, traffic classification
Video availability—multicast and unicast convergence
Video performance—customer and service segmentation

The tools used within the IP/MPLS infrastructure accomplish two goals: maintaining network control and maintaining networkwide visibility.

Figure 5 reviews the technologies to protect the three planes of operation from:

Reconnaissance
DDoS
Device break-ins
Theft of service/fraud

Figure 5. BCF Network Roles

Peering / Internet Edge

The tools used at the peering edge protect video services from being disrupted by external sources. Access to infrastructure devices is secured from external sources. The management and control planes are restricted and protected against unauthorized access and against a DoS attack targeted at the route processor (RP). The peering edge tools create a layer of defense protecting core, access, and aggregation routers from external threats. Network monitoring provides visibility into the traffic behavior coming from external sources. Identify, classify, and trace back attacks, and mitigate using network response tools (NetFlow + BGP + uRPF loose mode).

Core

Core routers implement self-protection to ensure the control and management planes are not compromised by external or internal sources. Monitoring tools ensure edge policies are working as expected.

Subscriber L3 Aggregation Edge

Ensure subscribers source IP packets with valid IP source addresses. Maintain QoS policies to protect upstream video resources and the video services delivered to other customers. Deny access to the infrastructure devices but permit transit access to services (Internet, video content, portals, e-mail, etc.). Monitor traffic to and from subscribers to identify malicious behavior targeted at the IP/MPLS infrastructure.

Acknowledgements

Vaughn Suazo is a Consulting Systems Engineer for Service Provider customers, specializing in Service Provider Security technologies and solutions.

This document is part of the Cisco Security portal. Cisco provides the official information contained on the Cisco Security portal in English only.

This document is provided on an “as is” basis and does not imply any kind of guarantee or warranty, including the warranties of merchantability or fitness for a particular use. Your use of the information in the document or materials linked from the document is at your own risk. Cisco reserves the right to change or update this document without notice at any time.
