Triage: Initial Analysis and Response
Worm Mitigation Reaction Methodology
Tools and Techniques
Arbor Networks Peakflow
Networkwide ACL and VACL Propagation (Rapid Deployment)
Applying the Tools to an Enterprise Environment
Containment and Isolation
IP Telephony: Cisco CallManager
Appendix A - Aggregated Bogon List
Appendix B - Freeware Tools
SNMP: net-snmp Toolset
SNMP: Multirouter Traffic Grapher
SNMP: Round Robin Database Tool
NetFlow Visualization: FlowScan
Internet worms have had a severe impact on many enterprise customers. Recently developed tools and architectural techniques can be employed to assist with the mitigation of worm activity in an enterprise environment.
This paper provides:
- A conceptual overview of worm mitigation techniques
- Details for deployment of these techniques into an overall solution for enterprise customers
This document has been written from a solution standpoint. It is primarily designed to provide a tool kit for dealing with the issue of Internet worms within an enterprise environment. Although this is the primary motivation of this document, the overall solution has application well beyond this primary purpose and additionally provides capability for detecting and responding to other security incidents.
Cisco Security Agent - Cisco Security Agent provides significant protection from many worms at a workstation or server level. Cisco Security Agent requires a controlled deployment and a degree of tuning for any uniqueness in a specific customer environment, both of which take time. Therefore, Cisco Security Agent should be seen as a strategic solution and not as a tactical solution when preparing for malicious activity.
Network Admission Control - Network Admission Control (NAC) is similar to Cisco Security Agent. At the time of first writing (August 2004) NAC was in its first stage of release. A more comprehensive solution will be provided in subsequent stages. For these reasons, NAC should also be seen as a strategic solution and not as a tactical solution.
The techniques described in this document can be deployed today.
The techniques described in this document were originally developed for large Internet service providers (ISPs) and have been adapted for use in enterprise environments. They are well-understood and mature technologies, now applied in a new way to solve a new problem.
Cisco uses the techniques in this paper on its own network to defend against a range of malicious activity, including worms and other security incidents. Many customers have asked for information on how Cisco deals with these typical day-to-day threats, which has been part of the motivation for this solution.
An organization’s internal operational processes are a critical aspect of dealing with any security incident. Although sophisticated software can isolate possible security incidents, there is still a significant degree of human intervention required, making the establishment of reliable incident response procedures vital.
This document has a technical focus and is not intended to provide a detailed overview of incident response procedures. However, because these concepts are an integral part of the overall solution, a high-level overview of incident response is presented. Figure 1 illustrates the overall framework and process.
Figure 1: General Incident Response Guidelines
The overall goal of an incident response process is to maintain business operations.
Although preparation is not part of the formal incident response process, this document presents techniques that must be in place prior to the occurrence of a security incident. Having response procedures in place facilitates efficient response during an actual incident.
The Cisco Network Consulting Engineers suggest the following preparatory steps:
- Develop a clear understanding of the organization’s primary business and IT resources.
- Arrange for 24x7 access to someone who can authorize business decisions during a security incident.
- Establish open lines of communication. Operations groups need to know the key contacts within the organization.
- Collect links to Internet sites that provide up-to-date and reliable details of security threats and Internet worm activity, such as www.dshield.org, www.securityfocus.com, and bugtraq.
- Maintain updated contact details for your ISP or ISPs.
The first phase of incident response is to verify that the event is an actual security incident, such as an attack or worm event. In some cases, an incident could be the result of scheduled maintenance activities.
After the event is confirmed, take quick action to limit the damage. Doing so might entail steps such as turning off a device or removing a device from the network. However, any actions taken need to be in line with maintaining business continuity.
The second phase is the analysis phase. A key part of this process is incident classification, which involves understanding the type of attack and the damage it is causing. It is important to perform the analysis with as little impact as possible on business functions.
Next, determine the scope of the incident: the number of devices, data, and other resources affected. It is important to look beyond the initially identified target, because the event might be more widespread than initially thought.
In some cases, it might be necessary to perform a traceback to the origin of the attack; this activity might involve working through your ISP. In other cases, restoration of business operations might require priority over any traceback activities.
Measure the impact: what are the resulting effects of the incident on the organization? Has the event caused a minor problem, or has it had a major impact on the business?
The results of this analysis will help determine the most appropriate reaction techniques for the specific incident.
The reaction phase involves some action to counter the attack. Each situation dictates the action to be taken. Examples include widely deploying access control lists (ACLs) in a worm event; restoring a device to normal operation after a server compromise by reloading the OS from the original media and restoring data from backups; or changing any static passwords because they might have been compromised. In some situations, an entirely reasonable response might be to do nothing.
A post-mortem involves a full, in-depth analysis of the event and the response to the event. The goal is to determine what can be done to build resistance and prevent this type of attack from happening again; essentially, it means learning from the experience. As a simple example, if a network penetration occurred, it would be prudent to identify what vulnerability was used to obtain access, and then fix all occurrences of that vulnerability. Additionally, it should be determined whether the incident was detected in an acceptable time; if not, measures should be deployed to speed detection in the event of further incidents.
The post-mortem is a step that is often ignored. It is critical that it is not forgotten.
Worm Mitigation Reaction Methodology
The procedure outlined in this section should be followed when responding to a worm incident.
The first stage of the reaction process is to contain the spread of the worm inside the network. Compartmentalization, a core principle of the SAFE Blueprint from Cisco, is key because it allows isolation of parts of the network that are not yet infected. Figure 2 shows containment options.
The inoculation phase involves patching all systems. If the appropriate signature files or plug-ins are available for tools such as NESSUS, it is worthwhile to start scanning the network for vulnerable systems. This activity might allow operations staff to find vulnerable systems before they become infected.
During a worm crisis, there are three types of systems in your network:
- Patched systems
- Unpatched systems
- Infected systems
Figure 2: Containment Options
Inoculating uninfected systems is imperative and usually happens in parallel with the quarantine and treatment phases.
The quarantine phase involves finding each infected machine and disconnecting, removing, or blocking them from the network to prevent them from infecting other unpatched machines on the network. To achieve this goal, the infected systems need to be isolated and quarantined.
Later sections of this document will outline tools such as remote-triggered black hole routing. This technique allows the rapid isolation of infected machines, limiting their capability to spread the infection.
The treatment phase involves the cleaning and the patching of each infected system. Some worms might require complete reinstallations of the core system to ensure that the machine is clean.
All of this activity requires planning prior to a worm event. When these events occur, reaction time is critical, and these processes need to be in place. It is strongly recommended that every organization plan the reaction methodology ahead of the next crisis.
Tools and Techniques
It is important to view the following techniques as a tool kit. There is currently no simple guaranteed solution for dealing with these types of security incidents. As such, it is recommended that customers become familiar with the tools and techniques in this document.
The main tools discussed in this document include:
- NetFlow and NetFlow export
- Unicast Reverse Path Forwarding (uRPF)
- Routing techniques such as remote-triggered black hole filtering, also known as remote-triggered black hole routing
- Cisco routers and switches
- NetFlow collectors
- Arbor Networks Peakflow X and Peakflow DoS
There are many other products and features that can be used as security tools. This document focuses on a subset of these so that an effective solution can be deployed for worm mitigation.
ACLs as Security Tools
ACLs serve a dual purpose as security tools. They provide:
- A mechanism to permit or deny traffic
- A mechanism to detect certain traffic types
The use of ACLs to permit or deny traffic is a well-understood and well-documented security feature. In terms of worm mitigation, ACLs are likely to play a key role in preventing the spread of a worm by blocking its attack vector, usually a TCP or UDP port.
Using ACLs as a Detection Tool
The most common technique when using ACLs as a detection tool is to configure the router as a pseudo packet sniffer. To do so, use an ACL with a series of permit statements to provide a view of the traffic flow (Figure 3). The counters in the ACL entries can then be used to find which protocol types are potential culprits.
Figure 3: ACL Entries for Detection
Extended IP access list 169
    permit icmp any any echo (21374 matches)
    permit icmp any any echo-reply (2 matches)
    permit udp any any eq echo
    permit udp any eq echo any
    permit tcp any any established (150 matches)
    permit tcp any any (15 matches)
    permit ip any any (45 matches)
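As a rough illustration of how the counters in Figure 3 might be read programmatically, the following Python sketch ranks ACL entries by match count. It assumes the output has already been captured as text in the layout shown in Figure 3; the function name `rank_acl_entries` and the regular expression are illustrative assumptions, not part of any Cisco tool.

```python
import re

# Sample text in the layout of Figure 3 (one ACL entry per line).
SAMPLE = """\
Extended IP access list 169
    permit icmp any any echo (21374 matches)
    permit icmp any any echo-reply (2 matches)
    permit udp any any eq echo
    permit udp any eq echo any
    permit tcp any any established (150 matches)
    permit tcp any any (15 matches)
    permit ip any any (45 matches)
"""

# An entry starts with permit/deny and may end with "(N matches)".
ENTRY = re.compile(
    r"^\s*(permit|deny)\s+(.*?)(?:\s+\((\d+) match(?:es)?\))?\s*$"
)


def rank_acl_entries(text):
    """Return (rule, matches) pairs sorted by match count, highest first."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # header line or anything that is not an ACL entry
        action, rest, count = m.groups()
        # Entries without a counter have not matched any packets yet.
        entries.append((f"{action} {rest}", int(count or 0)))
    return sorted(entries, key=lambda e: e[1], reverse=True)


for rule, matches in rank_acl_entries(SAMPLE):
    print(f"{matches:>8}  {rule}")
```

With the Figure 3 counters, the ICMP echo entry ranks first by a wide margin, which is the kind of outlier an operator would investigate.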
VLAN access control lists (VACLs) operate somewhat like router-based ACLs. They are a means to apply access control to packets bridged within a VLAN or routed between VLANs. In terms of worm mitigation, VACLs allow access control to be applied directly to the access port.
VACLs use the same Access Control Entry (ACE) format used by router-based ACLs. The permit and deny statements based on Layer 2-4 header information are used to determine what traffic to permit and to deny. VACLs have no sense of direction, unlike router-based ACLs, which are applied on either an inbound or outbound basis. VACLs apply to traffic at both ingress and egress.
The configuration of a VACL differs for switches running Cisco Catalyst OS and native Cisco IOS Software. Figure 4 and Figure 5 illustrate the difference.
Figure 4: VACLs in Cisco Catalyst OS
Console> (enable) set security acl ip filter_http permit ip any host 10.1.1.1
filter_http editbuffer modified. Use 'commit' command to apply changes.
Console> (enable) commit security acl filter_http
ACL commit in progress.
ACL filter_http successfully committed.
Console> (enable) set security acl map filter_http 55
Mapping in progress.
VLAN 55 successfully mapped to ACL filter_http.
Figure 5: VACLs in Native Cisco IOS Software
6500 (config#) vlan access-map filter_http
6500 (config#) access-list 101 permit tcp any host 10.1.1.1 eq 80
6500 (config#) vlan access-map filter_http
6500 (config-access-map#) match ip address 101
6500 (config#) vlan filter filter_http vlan 45
NetFlow is used as the foundational technology for obtaining traffic flow information across a network. A flow is defined by seven unique keys: source IP address, destination IP address, source port, destination port, Layer 3 protocol type, ToS byte (Differentiated Services Code Point [DSCP]), and input logical interface (ifIndex). By observing traffic flows across the network, it is possible to see events that might be malicious. Some events might cause high traffic volumes, such as a denial of service (DoS) attack; others might be more subtle. In any case, observation of the flow information can detect these events. The NetFlow documentation is available at the following links:
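To make the seven-key flow definition concrete, the following Python sketch groups packets into flows using exactly those keys. It is a conceptual model only and says nothing about how the router implements its flow cache; the `FlowKey` type, the field names, and the sample addresses are assumptions made for illustration.

```python
from collections import Counter
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The seven keys that define a flow (field names are illustrative)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int       # Layer 3 protocol type, e.g. 6 = TCP, 17 = UDP
    tos: int            # ToS byte
    input_ifindex: int  # input logical interface


def count_packets_per_flow(packets):
    """Group packets (dicts carrying the seven key fields) into flows."""
    flows = Counter()
    for pkt in packets:
        flows[FlowKey(**pkt)] += 1
    return flows


# Hypothetical packets: two belong to the same flow, the third differs
# only in destination address and therefore forms a separate flow.
packets = [
    dict(src_ip="10.0.0.5", dst_ip="10.0.1.9", src_port=2952,
         dst_port=135, protocol=6, tos=0, input_ifindex=2),
    dict(src_ip="10.0.0.5", dst_ip="10.0.1.9", src_port=2952,
         dst_port=135, protocol=6, tos=0, input_ifindex=2),
    dict(src_ip="10.0.0.5", dst_ip="10.0.1.10", src_port=2952,
         dst_port=135, protocol=6, tos=0, input_ifindex=2),
]

flows = count_packets_per_flow(packets)
for key, pkts in flows.items():
    print(key.src_ip, "->", key.dst_ip, "port", key.dst_port, "packets", pkts)
```

A change in any one of the seven fields produces a new flow, which is why a scanning host generates a large number of small flows.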
NetFlow has the capability of performing a flow export function. In this case, all expired flow information is sent to a collector. Collectors could be a number of devices, including a Cisco NetFlow Collector, CFLOWD tools, OSU flow-tools (CFLOWD Successor), or the Arbor Networks collector, which is used as a key component of this solution. Export packets are approximately 1500 bytes and typically contain 20 to 50 flow records. Figure 6 illustrates this concept.
Figure 6: Collection of Flow Information
The current NetFlow information is also available via the command-line interface (CLI) of the router. The sample output in Figure 7 shows two clients infected with the Blaster worm that are scanning for other systems to infect. Note: Ports are displayed in hexadecimal, so the destination port value 0087 (0x87) corresponds to port 135.
Figure 7: NetFlow Output
Router>show ip cache flow | include 0087
SrcIf  SrcIPaddress  DstIf  DstIPaddress  Pr SrcP DstP Pkts
Fa2/0  XX.XX.XX.242  Fa1/0  XX.XX.XX.119  06 0B88 0087 1
Fa2/0  XX.XX.XX.242  Fa1/0  XX.XX.XX.169  06 0BF8 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.63   06 0E80 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.111  06 0CB0 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.95   06 0CA0 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.79   06 0C90 0087 1
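The scanning pattern in Figure 7 can also be picked out with a short script. The Python sketch below assumes the flow cache output has been captured as text with the eight columns shown in Figure 7; the function name `find_scanners` and the `min_targets` threshold are hypothetical choices for illustration.

```python
from collections import defaultdict

# Flow cache lines in the column layout of Figure 7 (addresses masked
# exactly as in the figure).
SAMPLE = """\
SrcIf  SrcIPaddress  DstIf  DstIPaddress  Pr SrcP DstP Pkts
Fa2/0  XX.XX.XX.242  Fa1/0  XX.XX.XX.119  06 0B88 0087 1
Fa2/0  XX.XX.XX.242  Fa1/0  XX.XX.XX.169  06 0BF8 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.63   06 0E80 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.111  06 0CB0 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.95   06 0CA0 0087 1
Fa2/0  XX.XX.XX.204  Fa1/0  XX.XX.XX.79   06 0C90 0087 1
"""


def find_scanners(text, port, min_targets=2):
    """Map each source to its number of distinct destinations on `port`.

    Only sources that contacted at least `min_targets` destinations
    are returned.
    """
    targets = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8 or fields[0] == "SrcIf":
            continue  # skip the header and malformed lines
        _, src_ip, _, dst_ip, _, _, dst_port_hex, _ = fields
        # The CLI prints ports in hexadecimal: 0087 is port 135.
        if int(dst_port_hex, 16) == port:
            targets[src_ip].add(dst_ip)
    return {src: len(dsts) for src, dsts in targets.items()
            if len(dsts) >= min_targets}


print(find_scanners(SAMPLE, port=135))
```

For the Figure 7 data this reports two sources, one with two distinct targets and one with four. A suitable threshold for a production network would need to be determined from normal traffic behavior.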
NetFlow monitors an interface’s ingress traffic only. Therefore, to obtain a full picture of bidirectional flow information, NetFlow must be deployed such that all ingress and egress flows are captured. Figure 8 illustrates a possible deployment scenario.
As part of the flow record, NetFlow exports the subinterface information.
Figure 8: NetFlow Deployment for Capturing Bidirectional Flow
NetFlow will have some performance impact, particularly on software-based routers. NetFlow is not a switching path; it is a companion feature that works with both fast switching and Cisco Express Forwarding. In all cases, the use of NetFlow with Cisco Express Forwarding switching is recommended. The largest dependency from a performance perspective is the number of flows. A Cisco 7200 Series NPE-300 or NPE-400 router will have its maximum forwarding rate (that is, no-drop rate) fall by somewhere between 20 percent (at 4 flows) and 50 percent (with 256,000 flows). This performance impact needs to be assessed on a case-by-case basis. In worst-case scenarios, router upgrades might be required.
There are many options for collecting exported NetFlow information. A commercial option is the Cisco CNS NetFlow Collection Engine. This can be deployed on a number of platforms, including Solaris, HP UX, and Linux.
The daily archival of flow data allows in-depth analysis of any security incidents. For example, if an infected machine is found to be participating in an outbound DoS attack, it is possible to analyze the NetFlow information for a command and control channel. If outbound DoS activity is detected, the information can be passed to the appropriate network operator.
Exporting and Analyzing Flow Information for Anomalies
The section "Arbor Networks Peakflow" provides further details of how the Arbor Peakflow products integrate into the overall solution. From a NetFlow perspective, the flow information needs to be exported to the Arbor collector as well as the NetFlow Collector. This is performed using either dual NetFlow export statements or a separate device that performs the export function for both the Arbor collector and the flow archive collector.
Additional NetFlow Information
The NetFlow Services Solutions Guide is available at the following link:
Caveats at the Time of Writing
The Cisco Catalyst 6500 Series Supervisor Engine 2 and Supervisor Engine 720, prior to the PFC3BXL, do not export the ToS bits, and neither exports the TCP flags.
In many cases, these restrictions will not be an issue, particularly for the analysis of traffic on internal networks, and especially for internal anomaly detection purposes. In other circumstances, visibility of the TCP flags is important, particularly for the detection of certain external attacks such as SYN flood and RST attacks on external Internet interfaces (outside the firewalls). In these cases, NetFlow export from a dedicated Cisco router platform such as a 7500 Series or 12000 Series would be the preferred option. The above restrictions do not exist on these platforms.
Arbor Networks Peakflow
The detection and recognition of an attack or a security event is a critical component of any security solution. To successfully mitigate an attack, it is essential that it is accurately detected and the appropriate alerts are quickly raised to security operations staff.
Although IDSs provide detection capability, at the time of writing they are signature-based, and therefore of limited benefit against new worms for which no signature yet exists. Cisco itself has used the Arbor Peakflow DoS anomaly detection system to successfully detect and mitigate several worms. As such, it is included as a key part of this worm mitigation solution.
IDSs do have a significant role in the detection of many attacks and, as such, are still strongly recommended to complement an anomaly-based system. However, in the case of a worm mitigation solution (and some other security events), the Arbor anomaly-based approach has proven to be more effective.
Arbor offers two solutions to this problem. From the outset, it must be noted that Arbor’s capabilities go well beyond what is discussed in this document. Any organization deploying an Arbor solution will gain far more than just tools for worm mitigation.
The primary application of Peakflow DoS is the detection of external threats and events, making this product widely deployed by ISPs. For enterprises, using Peakflow DoS to detect the presence of an external security event (an event outside the firewall) is key to being in a position to quickly secure the network "internally" from the threat.
In the context of this solution, Peakflow DoS would be used to monitor traffic outside an organization’s firewall.
The primary application of Peakflow X is the detection of internal threats and events. Peakflow X provides an internal anomaly detection solution through relational modeling of the enterprise’s internal network.
In the context of this solution, Peakflow X provides a detailed visualization of the application-level conversations inside an enterprise network.
Placement of the Arbor Collectors
Both Arbor Peakflow X and Peakflow DoS use a collector and controller architecture, illustrated in Figure 9. The Arbor collector receives the flow records exported from the routers. Multiple routers can export flow information to a single collector. A controller provides a Web interface, sits in the hierarchy above the collectors, and generally consolidates the information from the collectors.
Figure 9: NetFlow Deployment for Capturing Bidirectional Flow
NetFlow Data Export (NDE) needs to be reliable. It is UDP-based, so a fast, reliable path between the exporting routers and the collector is strongly recommended.
If NDE is being performed over a WAN link, it will produce approximately 1 to 1.5 percent of the saturated interface bandwidth in NDE traffic. Using a hub-and-spoke network as an example, it is best to collect the NetFlow information on the ingress interfaces of the central (hub) router as opposed to collecting it from the spoke routers, particularly if low-speed WAN links are involved. The consequence is the loss of visibility of any inter-interface traffic on the remote (spoke) routers, but the approach conserves expensive WAN bandwidth. This might or might not be an issue.
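As a quick worked example of the 1 to 1.5 percent figure above, the following Python sketch estimates the export traffic generated on a saturated link. The function name and the 2048 kbps link speed are illustrative assumptions; actual export volume depends on the traffic mix.

```python
def nde_bandwidth_kbps(link_kbps, low=0.01, high=0.015):
    """Estimate NDE traffic for a saturated link.

    Uses the 1 to 1.5 percent range quoted in the text and returns a
    (low, high) estimate in kbps.
    """
    return link_kbps * low, link_kbps * high


# Hypothetical 2048 kbps WAN link running at saturation.
low_kbps, high_kbps = nde_bandwidth_kbps(2048)
print(f"Estimated NDE traffic: {low_kbps:.1f} to {high_kbps:.1f} kbps")
```

On a link of this size the export load is roughly 20 to 31 kbps, which is modest on its own but adds up quickly when many spoke routers export across low-speed links.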
Traffic from the collectors can be "teed" (duplicated and forwarded to a second destination) to avoid the need for the routers to perform a dual export function. The Arbor collector is capable of passing the original NDE source address or rewriting it.
Routing the NDE traffic back through a monitored interface should be avoided.
Peakflow Design Parameters
The Arbor Peakflow DoS 2.3 software release supports:
- 14 collectors per controller
- Up to five routers of Cisco 12000 Series (GSR) size per collector
Smaller routers produce less export traffic, so these design figures could probably be increased for them. However, at the time of writing, no other design recommendations were available from Arbor.
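The design figures above translate into a simple sizing calculation. The Python sketch below applies only the two limits quoted for Peakflow DoS 2.3 and assumes every router is of GSR 12000 size; the function name is hypothetical, and the result is a conservative planning estimate rather than vendor guidance.

```python
import math

# Limits quoted in the text for Arbor Peakflow DoS 2.3.
COLLECTORS_PER_CONTROLLER = 14
ROUTERS_PER_COLLECTOR = 5  # GSR 12000 size routers


def peakflow_sizing(router_count):
    """Return (collectors, controllers) needed for `router_count` routers."""
    collectors = math.ceil(router_count / ROUTERS_PER_COLLECTOR)
    controllers = math.ceil(collectors / COLLECTORS_PER_CONTROLLER)
    return collectors, controllers


for routers in (23, 70, 71):
    collectors, controllers = peakflow_sizing(routers)
    print(f"{routers} routers: {collectors} collectors, "
          f"{controllers} controllers")
```

A single controller therefore covers at most 14 x 5 = 70 routers of this size; the 71st router requires a fifteenth collector and, with it, a second controller.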
Introduction to Sinkholes
A sinkhole is a multifaceted security tool: essentially, a portion of the network that is designed to accept and analyze attack traffic. Sinkholes were originally used by ISPs to engulf attack traffic, in many cases drawing attacks away from a customer or other target. In more recent times, sinkholes have been used in enterprise environments to monitor attacks, detect scanning activity from infected machines, and generally monitor for other malicious activity.
This document illustrates how a sinkhole can be used in diverting attack traffic, monitoring for worm propagation, and monitoring other potentially malicious traffic.
Traditional Sinkhole - Diverting Attack Traffic
In the first sinkhole application, a publicly accessible Web server is the target of either a DoS or DDoS attack. Figure 10 illustrates how server WWW1 is unavailable due to the attack. Additionally, the extremely high traffic volume has saturated links and routers, making server WWW2 unavailable as well.
Figure 10: Traditional Sinkhole - The Attack
Figure 11 illustrates how a sinkhole can be used to pull attack traffic destined for WWW1 away from the target. Although this technique does not restore connectivity to WWW1 for legitimate users, it can be used to alleviate the collateral damage to other servers at the same site that might have been disabled due to link congestion caused by the attack, such as WWW2. This is a fairly drastic step. In an ISP/hosting environment, this would require customer consent.
Figure 11: Traditional Sinkhole - The Diversion
A sinkhole is also a useful tool for analyzing an attack. The sinkhole router can be used to forward the attack traffic to a back-end switch where a network analyzer, such as a sniffer or Ethereal, can be used to look at the details of the attack.
Sinkhole - Monitoring for Worm Propagation
Figure 12 illustrates how a sinkhole can be deployed to monitor for worm propagation internally within an enterprise. In this example, a host has become infected and is scanning for other hosts to infect. In this case, the sinkhole "sucks in" any internally originated traffic destined for both bogon addresses and the dark IP address space. As a result, a worm’s scanning activity can be rapidly detected at the sinkhole.
In enterprise applications, it is particularly important to monitor the dark IP address space as opposed to only a block from the bogon address space. Future worms might be written to purposely ignore these address blocks to improve scanning and propagation efficiency.
Figure 12: Sinkhole - Monitoring for Worm Propagation
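The distinction between bogon space, dark IP space, and in-use space can be sketched in a few lines of Python. All address blocks below are hypothetical placeholders: the enterprise block, the in-use subnets, and the two-entry bogon list are assumptions for illustration only (a real deployment would use the organization's own allocations and a full bogon list such as the one in Appendix A).

```python
import ipaddress

# Hypothetical address plan, for illustration only.
ENTERPRISE_BLOCK = ipaddress.ip_network("10.0.0.0/8")
IN_USE = [ipaddress.ip_network(n) for n in ("10.1.0.0/16", "10.2.0.0/16")]
# A deliberately tiny, illustrative bogon list.
BOGONS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "169.254.0.0/16")]


def classify_destination(dst):
    """Classify a destination address as 'bogon', 'dark', or 'normal'."""
    addr = ipaddress.ip_address(dst)
    if any(addr in net for net in BOGONS):
        return "bogon"
    # Dark space: inside the enterprise allocation but not assigned
    # to any in-use subnet.
    if addr in ENTERPRISE_BLOCK and not any(addr in net for net in IN_USE):
        return "dark"
    return "normal"


for dst in ("10.1.4.7", "10.77.0.1", "192.0.2.55"):
    print(dst, "->", classify_destination(dst))
```

Traffic classified as bogon or dark is what the sinkhole announcements attract; in the scenario of Figure 12, a source that repeatedly appears with such destinations is a likely scanner.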
Sinkholes - Detecting Other Malicious Activity
Although this example specifically illustrates the application of a sinkhole for detecting worm propagation, monitoring the bogon and dark IP address space can also detect other activity that is usually malicious. Under normal operational conditions, there is little reason any user or host should be attempting to access these address ranges. For example, scanning activity on these address ranges might be an internally compromised host performing reconnaissance to look for other internal targets. As such, any activity seen on the sinkhole should be investigated.
Packets with unreachable destinations, including the router null0 interface, will have an Internet Control Message Protocol (ICMP) unreachable message sent back to the source address. This "unreachable noise" is known as backscatter (Figure 13). A sinkhole is likely to draw in a substantial amount of backscatter traffic. This is particularly true for Internet-based sinkholes.
Figure 13: Backscatter Traffic
Backscatter traffic on the Internet is often the result of large-scale DoS or DDoS attacks in which spoofed source addresses have been used. As a result, backscatter traffic is sprayed randomly over the Internet address space, possibly to both assigned and unassigned addresses. There might be a sharp increase in backscatter traffic seen on the Internet during a large-scale worm event. It is often this backscatter traffic that gives the first sign of a new worm’s release.
Deployment of a Sinkhole
Figure 14 and Figure 15 present two possible sinkhole design options. A sinkhole does not need to be expensive, but it is recommended that it be connected with a high-speed interface into the core network. A sinkhole can be as simple as a Linux system running a routing daemon such as Zebra and a sniffer package such as Ethereal. However, a minimum recommendation is a dedicated router and an Ethernet switch to allow the attachment of various monitoring tools.
The first design option is illustrated in Figure 14. In this scenario, the target router on the right might be a low-cost device, possibly a Cisco 2600 or 3600 series router. Its primary purpose is to gather and export NetFlow information. Depending on the desired sophistication of the sinkhole, the NetFlow information might be exported to either a NetFlow Collector or an Arbor Peakflow collector.
Routing announcements for the bogon and dark IP address space can be made from either the target router or the sinkhole gateway. Announcing the address blocks from the sinkhole gateway, possibly via the redistribution of static routes, is preferred. It is also preferable to configure a static Address Resolution Protocol (ARP) entry for the target router. If these two items are configured, traffic will continue to flow onto the sinkhole network even if the target router becomes overwhelmed and fails to process the traffic load.
Figure 14: First Sinkhole Design Option
The second design option uses some form of dedicated high-speed router. This router should be of sufficient capacity to handle a high traffic load while performing NetFlow. Although it will normally be idle, it needs to be sized to accommodate an attack scenario. A second Ethernet interface should be available on this router for both NetFlow export and dedicated Simple Network Management Protocol (SNMP) polling. This approach keeps any malicious or attack traffic separated from the monitoring segment, making analysis of the attack traffic a far easier task.
Figure 15: Second Sinkhole Design Option
As in the first option, bogon and dark IP address space is announced from the sinkhole router, preferably via the redistribution of static routes. The static routes will use a bogus next hop and a static ARP entry to push traffic onto the switched network. Figure 16 illustrates the partial configuration example.
Figure 16: Sinkhole Static ARP Configuration
!
! Static route to 188.8.131.52 /3 network
ip route 184.108.40.206 220.127.116.11 192.0.2.200
!
...
!
ip arp 192.0.2.200 00.00.0c.12.34.56 arpa
!
Care must be taken from a routing protocol perspective to ensure that any address blocks announced from the sinkhole do not conflict with other production address ranges in the network.
Sinkholes and Packet Analyzers
The introduction to this section briefly mentions the use of a sniffer within a sinkhole. The ability to capture and analyze a new worm is a useful exercise. For example, SQL Slammer functioned on UDP port 1434 and used a 376-byte packet. This is valuable information that can be used, for example, to formulate ACLs, should that be the most appropriate action. Other information, such as the use of spoofed source addresses, IP options, a specific IP precedence, or specific DSCP is also valuable. The more the worm is understood, the more easily it can be mitigated.
Sniffers such as the open-source tool Ethereal, or a commercial product such as the Network General Sniffer, would be excellent solutions.
Overview of Routing Techniques
Previously we have shown how a sinkhole can be used in an attack scenario to draw the attack traffic away from a target host. As explained previously, this does not help to restore service to customers wishing to access the host under attack, but it does alleviate the collateral damage, particularly if links or routers are congested as a result.
Again using an attack scenario as an example, there are many cases where it will not be desirable or feasible to shift the attack stream to a sinkhole. In these cases, it might be preferable to simply drop the stream as close to ingress as possible.
As such, a technique called remote-triggered black hole routing (also known as remote-triggered black hole filtering) can be used. Although the technique was originally developed for dealing with attacks in ISP environments, it can also be used effectively in an enterprise network for preventing worm spread. Additionally, this technique can be used for "black holing" any internal hosts participating in outbound DoS attacks, in the event that a host (such as a roaming laptop) has been compromised in this way.
This technique performs multiple functions:
- Black hole traffic at the line rate
- Provide remote trigger capability to multiple routers
- Process a large number of addresses if required
- Drop traffic based on both destination and source address, if required
To explain the technique, we will initially illustrate how it is used to mitigate an Internet-based DoS or DDoS attack. We will then explain how it can be adapted in an enterprise network.
Black Hole Routing
A black hole routing scheme is based on the concept of forwarding traffic to null0. The technique achieves a similar result to an ACL based on destination address. However, because the technique occurs directly in the forwarding (or Cisco Express Forwarding) path, it achieves a dropping function with no performance impact. Figure 17 illustrates the concept.
Figure 17: Black Hole Routing
Remote-Triggered Black Hole Routing
Although black hole routing is an effective technique for dropping traffic at line rates, we need to add remote trigger capability. This is achieved with two steps. The first step is to configure an unused route to null0. This needs to be configured on all routers that will act as remote-trigger black hole routers.
ip route 192.0.2.0 255.255.255.0 Null0
192.0.2.0/24 is an unused address block called the Test-Net. As such, it is not publicly allocated and is often used for this application.
In the second step, Border Gateway Protocol (BGP) is used to propagate information about a prefix we want to black hole. BGP is the only routing protocol that is capable of propagating a nondirectly connected next hop. Because BGP cannot carry a next hop of null0, we must use an intermediate step. This is achieved by announcing the prefix we wish to black hole with a next hop of 192.0.2.1 (from Step 1). From here, any black hole routers know this as a route to null0 and will drop any traffic with this next-hop address.
This prefix is typically announced by a dedicated router, often known as the black hole announcement router or trigger router. The router does not need to be high-end; often, a Cisco 2600 Series device is adequate. The only requirement is that it has BGP connectivity into the provider’s core and is capable of making announcements. Figure 18 illustrates the principle.
Figure 18: Remote-Triggered Black Hole Routing
After the trigger router is in place, a configuration like the one in Figure 19 is typically used to announce the prefixes that should be black holed.
Figure 19: Configuration for Announcing Prefixes to Send to Black Hole
router bgp 999
 ...
 redistribute static route-map STATIC-TO-BGP
 ...
!
route-map STATIC-TO-BGP permit 10
 match tag 66
 set ip next-hop 192.0.2.1
 set local-preference 50
 set community no-export 999:000
 set origin igp
!
route-map STATIC-TO-BGP permit 20
!
...
ip route 171.xxx.xxx.1 255.255.255.255 Null0 Tag 66
!
Figure 20 and Figure 21 provide a more detailed diagrammatic explanation of the operation of the remote-triggered black hole process at the router level.
Figure 20: Mapping of a Prefix to Null0
Figure 21: Internal Routing Operation of Remote-Triggered Black Hole Routing (Filtering)
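The two-step lookup described above can be modelled in a few lines of Python. This is only a sketch of the forwarding logic, not of any router implementation: the table is a plain dictionary, the interface names and the 198.51.100.0/24 addresses are made-up examples, and the function names are hypothetical.

```python
import ipaddress

def longest_match(fib, address):
    """Return the next hop of the most specific route covering address, or None."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in fib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]

def resolve(fib, address, max_depth=8):
    """Follow IP next hops recursively until an interface name is reached."""
    next_hop = longest_match(fib, address)
    for _ in range(max_depth):
        if next_hop is None:
            return None  # no route at all
        try:
            ipaddress.ip_address(next_hop)
        except ValueError:
            return next_hop  # not an IP address, so an interface such as "Null0"
        next_hop = longest_match(fib, next_hop)
    raise RuntimeError("next-hop recursion too deep")

# Illustrative table on a black hole router (all values are assumptions).
fib = {
    "0.0.0.0/0": "Serial0/0",
    "192.0.2.0/24": "Null0",              # step 1: static route to null0
    "198.51.100.0/24": "FastEthernet2/0",
    "198.51.100.7/32": "192.0.2.1",       # step 2: learned via BGP from the trigger router
}

print(resolve(fib, "198.51.100.7"))  # Null0
print(resolve(fib, "198.51.100.8"))  # FastEthernet2/0
```

Only the /32 announced by the trigger router resolves to null0; its neighbour in the same subnet is still forwarded normally.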
Dropping on Source Address
For remote-triggered black hole routing to be effective as a security tool, it must be able to drop traffic based on both destination and source addresses. For example, if a host is infected with a worm, it will be identified by its source address. To prevent the spread of the worm, it is necessary to have the capability to drop any traffic originating from that source address.
A second scenario requiring a mitigation technique is one in which spoofed source addresses are used. With recent worms, such as SQL Slammer and Blaster, the host’s real IP address is used to propagate the worm. This is not to say that other worms might not use spoofed addresses. As such, the scenario needs to be accommodated. There is no reason that any host should ever send out a packet with an address other than what was assigned to it. Any packets being sent out with illegitimate source addresses should be dropped at the first router hop.
The feature that enables both of these requirements is Unicast Reverse Path Forwarding (Unicast RPF). Information about Unicast RPF is available at:
Figure 22 illustrates Unicast RPF in the traditional strict mode. If a packet is received on an interface, a route to that packet’s source address must be available back through the same interface on which the packet was received. If this route does not exist, the packet fails the RPF check and is dropped. It is recommended that this technique be deployed on all user-facing interfaces. This technique is an effective way to drop any spoofed packets from hosts on the local network at the first router hop.
Figure 22: Unicast RPF in Strict Mode
Figure 23: Configuration of Unicast RPF in Strict Mode
!
interface FastEthernet2/0
 ip address 192.xxx.xxx.50 255.255.255.0
 ip verify unicast reverse-path
 ...
 speed 100
 full-duplex
!
Figure 24 illustrates Unicast RPF in loose check mode, which is essentially a relaxation of the strict check mode. In the case of loose check, the only requirement is that the source address must appear in the router’s Cisco Express Forwarding table. If the route does not exist or it has a destination of null0, the packet is dropped.
Unicast RPF with loose check is part of the mechanism that is used to perform remote-triggered black hole filtering based on source addresses. In the previous example, we saw how BGP (typically iBGP) can be used to distribute a prefix (or prefixes) to trigger the black holing of traffic for a destination address. The address 192.0.2.1 was used as the next hop for this purpose. This, in turn, triggered the black hole operation at the network edge across a number of routers.
If Unicast RPF loose check is configured, packets from any source address that has a route to null0 will also be dropped. The same technique of distributing these prefixes with a next hop of 192.0.2.1, for example, is used. BGP as a protocol is capable of carrying hundreds of thousands of prefixes, so this technique is scalable and can easily be used to black hole thousands of individual (/32) source addresses of infected machines.
Figure 24: Unicast RPF in Loose Check Mode
Figure 25 illustrates the configuration of Unicast RPF in loose check mode.
Figure 25: Configuration of Unicast RPF in Loose Check Mode
!
interface FastEthernet2/0
 ip address 192.xxx.xxx.50 255.255.255.0
 ip verify unicast source reachable-via any
 ...
 speed 100
 full-duplex
!
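The difference between the strict and loose checks can be illustrated with a small Python model. It is a simplified sketch under stated assumptions: the forwarding table is a dictionary of already-resolved routes, the 10.x addresses and interface names are invented for the example, and the special handling a router may apply to a default route is not modelled.

```python
import ipaddress

def best_route(fib, address):
    """Return the outgoing interface of the longest-prefix match, or None."""
    addr = ipaddress.ip_address(address)
    matches = [(ipaddress.ip_network(p), ifc) for p, ifc in fib.items()
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

def urpf_permits(fib, source, in_interface, mode="strict"):
    """Return True if a packet from `source` arriving on `in_interface` passes."""
    out_interface = best_route(fib, source)
    if out_interface is None or out_interface == "Null0":
        return False  # no route back, or the source has been black holed
    if mode == "strict":
        return out_interface == in_interface  # must point back out the same interface
    return True  # loose: any usable route to the source is enough

# Illustrative forwarding table (all values are assumptions).
fib = {
    "10.1.0.0/16": "FastEthernet2/0",
    "10.2.0.0/16": "FastEthernet3/0",
    "10.1.5.242/32": "Null0",  # infected host black holed via BGP
}

print(urpf_permits(fib, "10.2.0.9", "FastEthernet2/0", "strict"))   # False
print(urpf_permits(fib, "10.2.0.9", "FastEthernet2/0", "loose"))    # True
print(urpf_permits(fib, "10.1.5.242", "FastEthernet2/0", "loose"))  # False
```

The first packet is treated as spoofed in strict mode because its source is reachable only through a different interface; the last is dropped even in loose mode because its source resolves to null0.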
The previous sections on NetFlow and sinkholes provided a set of techniques for identifying infected machines and listed a variety of abnormal behaviors that might represent a security incident. When an infected machine or security event is identified, the operations staff has the option of black holing the device. For example, the configuration in Figure 26 would black hole the infected machines at xxx.xxx.xxx.242 and .204.
Figure 26: Configuration for Black Holing Addresses
!
ip route xxx.xx.xxx.242 255.255.255.255 Null0 Tag 66
ip route xxx.xx.xxx.204 255.255.255.255 Null0 Tag 66
!
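When many hosts must be black holed at once, the static routes for the trigger router can be generated rather than typed. The following Python sketch assumes the tag value 66 used in the earlier route map; the function name and the 198.51.100.x addresses are hypothetical, and delivering the lines to the router is left to whatever change process is in place.

```python
import ipaddress

def black_hole_routes(hosts, tag=66, withdraw=False):
    """Build one tagged /32 static route to Null0 per infected host.

    Addresses are validated, de-duplicated and sorted. With withdraw=True
    the matching "no ip route" lines are produced instead.
    """
    addresses = sorted({ipaddress.IPv4Address(h) for h in hosts})
    if withdraw:
        return [f"no ip route {a} 255.255.255.255 Null0" for a in addresses]
    return [f"ip route {a} 255.255.255.255 Null0 tag {tag}" for a in addresses]

infected = ["198.51.100.242", "198.51.100.204", "198.51.100.242"]
for line in black_hole_routes(infected):
    print(line)
# ip route 198.51.100.204 255.255.255.255 Null0 tag 66
# ip route 198.51.100.242 255.255.255.255 Null0 tag 66
```

Validating each address before it reaches the trigger router matters here, because a mistyped entry would be propagated networkwide within seconds.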
The result is illustrated in Figure 27. We now have the capability to use iBGP to selectively trigger the remote dropping of any traffic based on either source or destination address. Although this is a service provider example, it shows how attack traffic can be dropped effectively based on source addresses.
Figure 27: Selective Remote Traffic Dropping
Networkwide ACL and VACL Propagation (Rapid Deployment)
The ACLs as Security Tools section provided an overview of the use of ACLs as a blocking mechanism during worm events. Both router ACLs and VACLs have a place within a reaction methodology toolkit. Router ACLs can easily be deployed to prevent the spread of a worm from subnet to subnet. VACLs allow port-level filtering on a VLAN basis. This is a more effective technique for blocking the spread of a worm, because it can be applied directly on the switch port, allowing filtering on a per-host basis to prevent the possibility of intra-VLAN propagation.
Because reaction time is critical during a security event, a mechanism is required to apply ACLs on a widespread basis quickly. Each network is unique, so an ACL deployment scenario needs to be designed and tested as a part of a worm mitigation solution.
Network management tools such as ACL Manager, which is an optional part of CiscoWorks Resource Manager Essentials (RME), are one type of mechanism that can be used for the widespread deployment of ACLs.
At the time of writing, ACL Manager had been extended to support VACLs and private VLANs. Information about ACL Manager 1.4 is available at:
A second option for the widespread deployment of any configuration change is NetConfig, which is also part of CiscoWorks RME. Information is available at:
Although step-by-step details are not included in this document, some comments and suggestions for deployment are outlined here:
- Analyze the network topology and understand all VLANs, the devices that reside within them, and the traffic flows (if possible).
- Understand where the most critical resources reside. For example, on which VLANs do critical servers reside? Prioritize these resources so the most critical are the first to be protected in the event of a worm outbreak.
- Allocate a range of extended ACL numbers that can be dynamically used networkwide for responding to security events. For example, extended ACLs 180 to 189 might be a suitable range. Clearly document this range and possibly configure a dummy list on each device with a suitable comment so that all network operations staff will know that the range is reserved and will know its purpose. Dummy lists will also avoid any accidental conflict in ACL ranges when the extended ACLs are deployed during an incident.
- Create templates within NetConfig to assign one or more predetermined ACLs (for example, access list 180) to each VLAN on a networkwide basis. With these assignment templates in place, a widespread configuration change consists of three steps: create a suitable ACL (for example, access list 180), push it out to all devices, and perform a widescale assignment using the previously created templates.
- Keep in mind that switches running Cisco Catalyst OS and native Cisco IOS Software have different configuration formats. Any strategy will need to incorporate router ACLs suitable for both routers and the multilayer switch feature cards (MSFCs). Likewise, templates will have to incorporate VACLs on both Cisco Catalyst OS and native Cisco IOS Software-based switches.
Private VLANs are a technique for providing Layer 2 isolation of hosts within a VLAN. This technique can improve the security posture of a network by isolating servers that do not need to communicate with each other. From a security standpoint, if one server were to become infected with a worm, its inability to communicate with other servers would prevent the spread. In this case, each server would be attached to an isolated port.
Figure 28 illustrates the concept of private VLANs and the associated port types.
Figure 28: Private VLANs
From a general security perspective, private VLANs should be widely deployed, particularly for the attachment of servers. In many cases, servers within server clusters are only required to communicate with clients and not with each other. Using isolated ports within a private VLAN achieves Layer 2 separation.
Private VLANs do not provide perfect isolation. If a router is attached to a promiscuous port, it is possible to hop from one host or server device to another via the router, using the intended destination IP address with the router’s MAC address. However, an ACL can be applied to the router port to prevent this behavior.
The use of private VLANs is explained extensively in the documentation for the SAFE Blueprint from Cisco. It is recommended that anyone deploying these techniques also be familiar with SAFE Blueprint best practices.
Applying the Tools to an Enterprise Environment
The previous sections have described the use of several security tools. In some cases, use was explained in the context of a traditional deployment in a service provider environment. This section discusses how these tools can be applied in an enterprise environment.
Although the primary aim of this solution is to mitigate a worm outbreak, the techniques have much wider application in responding to many other security incidents. The total solution provides response capability for the following situations:
- Mitigation of worm outbreaks
- Detection and mitigation of scanning activity from compromised hosts
- Detection and response to internally compromised machines participating in external DDoS or botnet attacks (zombies)
From a network design perspective, the integration of these techniques in specific customer environments will require a reasonable degree of independent judgment. Some networks are complex, with many interdependencies. Careful application of these techniques on a case-by-case basis is required.
The application and use of these techniques during a security incident will also require independent judgment. This section has been written based on experience. Although it is highly likely that the lessons learned can be reused with confidence, it is important to keep in mind that the security threat landscape continues to evolve.
Figure 29 illustrates a sample enterprise network, typical of many enterprises. The sample network and the examples are designed to illustrate the application of the various techniques. They are not intended to replace any of the architectural techniques recommended as part of the SAFE Blueprint architecture.
Figure 29: Sample Enterprise Network
Three primary detection techniques (or tools) have been presented as part of this solution:
- (1) NetFlow, NetFlow export to a collection device
- (2a) Arbor Peakflow DoS and Peakflow X (using NetFlow export)
- (2b) Custom scripts, custom applications, manual analysis (cheaper alternatives)
- (3) Sinkholes
Remember that these are tools; it is their application that will determine the effectiveness of a solution. The tools should not be seen as mutually exclusive answers; each has a place in an overall solution, even if some overlap occurs across the various options.
Currently the most effective and readily automated solution of the three mentioned above is the Arbor Networks solution. When a new worm outbreak occurs, it is highly likely that its spread will be seen on the Internet side of firewalls first. For example, during the SQL Slammer outbreak a large amount of activity on UDP port 1434 was detected. Because this was anomalous, it was quickly detected and then determined to be malicious.
During the early stages of a new worm event, applicable IDS signatures are not normally available. This limits the usefulness of these devices in the initial stages of an outbreak. At such a time, anomaly detection is invaluable. On this basis, Arbor Peakflow (in conjunction with NetFlow export) is very suitable for use as the primary detection mechanism. Figure 30 shows how NetFlow information would be exported to an Arbor collector and analyzed for anomalies by the Arbor Peakflow system.
The Arbor solution is also extremely effective in the detection of a variety of other attacks against or within enterprise infrastructures. An example would be a DoS or DDoS attack against a large enterprise’s Internet presence, whether it be a simple site providing a business overview or a large-scale e-commerce site. After the Arbor solution quickly and accurately detects the attack, the appropriate response can be applied.
Figure 30: Using Arbor Peakflow to Collect and Analyze NetFlow Information
There is a cost associated with the purchase of the Arbor product, which might not be acceptable to all clients. Although Arbor has been included as an integral component of this solution, it is certainly possible to build a solution without it. Alternative detection techniques (using NetFlow as the foundation) are possible; these include internally created scripts and other custom applications, or even manual analysis. If the cost of the Arbor solution is an issue, alternatives can be investigated.
Information on third-party or freeware tools is included in Appendix B.
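As an indication of what an internally created script might look like, the Python sketch below flags hosts that contact an unusually large number of distinct destinations on a single port, which is the typical signature of worm scanning. It assumes flow records have already been exported and parsed into (source, destination, protocol, destination port) tuples; that record layout, the function name, and the threshold of 50 targets are illustrative assumptions and would need tuning against a real baseline.

```python
from collections import defaultdict

def find_scanners(flows, min_targets=50):
    """Return (source, protocol, port, distinct_destinations) for likely scanners.

    flows: iterable of (src_ip, dst_ip, protocol, dst_port) tuples.
    Results are ordered with the widest fan-out first.
    """
    targets = defaultdict(set)
    for src, dst, proto, dst_port in flows:
        targets[(src, proto, dst_port)].add(dst)
    hits = [(src, proto, port, len(dsts))
            for (src, proto, port), dsts in targets.items()
            if len(dsts) >= min_targets]
    return sorted(hits, key=lambda hit: hit[3], reverse=True)

# Synthetic records: one host probing UDP 1434 on 60 addresses, one normal client.
flows = [("10.1.1.9", f"10.2.0.{i}", "udp", 1434) for i in range(1, 61)]
flows += [("10.1.1.20", "10.3.0.5", "tcp", 80)] * 3

print(find_scanners(flows))  # [('10.1.1.9', 'udp', 1434, 60)]
```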
Application of Sinkholes
A sinkhole is a valuable and complementary tool. At a minimum, a basic sinkhole should be deployed to monitor the client’s dark IP address space. Any network activity in this space usually indicates some sort of security issue or misconfiguration. A more aggressive approach for a sinkhole is to announce a default route. This will engulf any traffic without an explicit route to the destination. This technique is more advanced and must be carefully planned in conjunction with the overall routing architecture.
Figure 31 and Figure 32 illustrate worm-infected laptops scanning for further targets. In the case of Figure 31, the infected laptop is within workgroup 1.
Figure 31: Worm-Infected Laptop Scanning for Targets (Internal Laptop)
Figure 32: Worm-Infected Laptop Scanning for Targets (Remote-Access Laptop)
The greatest danger has typically been the introduction of a worm through back doors, such as externally infected laptops. This scenario is illustrated in Figure 32. These machines are often connected to the Internet via a home or hotel connection, where it is easy to become infected. Subsequently, the machines connect to the internal network either when the employee returns to the office or when connecting through a remote-access solution such as an IPSec VPN. In any case, we must stress that this has been the typical experience; we cannot guarantee that future scenarios will occur in the same manner.
As previously explained, a sinkhole is one technique that can be deployed to detect infected machines. After a sinkhole is deployed, monitoring the traffic through it becomes the main issue. There are a number of approaches, but the desired outcome is to detect any rapid increase in traffic to the sinkhole.
The Arbor solution has the capability of monitoring traffic in the enterprise’s dark IP address space. Using a sinkhole in conjunction with Arbor Peakflow is a complementary match.
Alternative Techniques for Monitoring the Sinkhole
The following list provides alternative techniques for sinkhole monitoring:
- SNMP polling of sinkhole egress interface, CiscoWorks for Switched Internetworks Device Fault Manager (monitor for sudden increase)
- SNMP polling of sinkhole egress interface, monitor with MRTG or Cricket
- Router-based Remote Monitoring (RMON) traps, monitor egress interface traffic
- NetFlow deployed on sinkhole ingress, monitor for changes
- ACLs with the log keyword
- Packet analysis
- Commercial tools such as Network General Sniffer
- Public domain tools, especially Ethereal
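For the SNMP polling options, the underlying check is simple: convert successive interface octet counters into a rate and alert when the rate jumps well above what has been seen so far. The Python sketch below shows one crude way to do this. The sample format, the function names, the five-fold factor and the 10 kbps floor are assumptions made for illustration, and collecting the counters themselves (for example with the net-snmp tools) is outside the sketch.

```python
def interface_rates(samples, counter_bits=32):
    """Convert (timestamp_seconds, octet_counter) samples to bits per second.

    The modulo arithmetic tolerates a single counter wrap between samples.
    """
    modulus = 2 ** counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = (c1 - c0) % modulus
        rates.append(delta * 8 / (t1 - t0))
    return rates

def spikes(rates, factor=5.0, floor_bps=10_000.0):
    """Return indices of rates above `factor` times the average of earlier rates.

    The floor suppresses alerts on trivially small absolute traffic levels.
    """
    alerts = []
    for i in range(1, len(rates)):
        baseline = sum(rates[:i]) / i
        if rates[i] > max(baseline * factor, floor_bps):
            alerts.append(i)
    return alerts

# Five-minute polls of the sinkhole interface (synthetic values).
samples = [(0, 1_000), (300, 4_000), (600, 7_000), (900, 3_007_000)]
rates = interface_rates(samples)
print(rates)          # [80.0, 80.0, 80000.0]
print(spikes(rates))  # [2]
```

Because the baseline is a plain running average, a sustained outbreak raises it and later polls become less sensitive; a production check would more likely compare against a baseline recorded during normal operation.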
Containment and Isolation
When a worm outbreak is detected, it is necessary to move quickly to contain its spread. Under normal conditions, corporate firewalls should prevent worms from entering the internal network. However, every network is unique and different networks might have unique security holes where a worm (or other attack) might enter.
During the containment phase, keep in mind that the mitigation strategy might have a business impact. It is important to have access to someone who can quickly make a business decision if required.
In the case of SQL Slammer, malicious traffic was detected on UDP port 1434. This port would normally be blocked by a properly configured firewall. However, because most infections enter by way of back doors, it would be necessary to block this port throughout the internal network to properly prevent the spread of this worm. In this case, the application of port-level ACLs was the containment mechanism. Figure 33 illustrates this concept.
Figure 33: Using Port-Level ACLs to Prevent Spread of SQL Slammer Worm
ACLs can be deployed as router ACLs or VACLs. Router ACLs (or MSFC-based ACLs in the case of a Cisco Catalyst® 6500 Series system) are the simplest to deploy. VACLs are more complex to deploy, but because they can block traffic entering the VLAN at the port (host) level, they are more effective at preventing intra-VLAN infection.
Figure 34 illustrates the difference in operation between a router ACL and a VACL in the prevention of worm propagation.
Figure 34: Using Router ACL vs. VLAN ACL for Worm Containment
In some cases, the port on which the worm is spreading might be critical to business operation. For example, when SQL Slammer was propagating, some organizations could not block UDP port 1434 because it was required for access to the SQL server for legitimate business transactions. In this case, some alternatives would need to be considered.
If the network devices using the service on the affected port are known, permitting selective access might be an option. For example, if only a small number of clients were using SQL server, an option might have been to open UDP port 1434 to critical devices. Selective access is not guaranteed to solve the problem, but it will certainly lower the probability of infection.
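A containment ACL of this kind, with optional selective access, can be produced from a short script so that it is ready to push during an incident. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, ACL number 180 is taken from the reserved range suggested earlier, and the client and server addresses are invented. It emits numbered extended ACL lines only; applying them to interfaces or VLANs is a separate step.

```python
import ipaddress

def containment_acl(number, protocol, port, allowed_pairs=()):
    """Build an extended ACL that blocks one port, with optional exceptions.

    allowed_pairs: iterable of (client_ip, server_ip) that may still use the port.
    """
    if not (100 <= number <= 199 or 2000 <= number <= 2699):
        raise ValueError("not an extended ACL number")
    if protocol not in ("tcp", "udp"):
        raise ValueError("protocol must be tcp or udp")
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")

    lines = [f"access-list {number} remark worm containment {protocol}/{port}"]
    for client, server in allowed_pairs:
        c = ipaddress.IPv4Address(client)  # validates the address
        s = ipaddress.IPv4Address(server)
        lines.append(
            f"access-list {number} permit {protocol} host {c} host {s} eq {port}")
    lines.append(f"access-list {number} deny {protocol} any any eq {port}")
    lines.append(f"access-list {number} permit ip any any")
    return lines

# Block SQL Slammer traffic but keep one critical client working.
for line in containment_acl(180, "udp", 1434, [("10.1.1.5", "10.2.2.10")]):
    print(line)
```

The exceptions are placed before the deny entry because ACL entries are evaluated in order, and the final permit keeps all other traffic flowing.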
Using Arbor’s Peakflow X provides useful baseline information about normal application-level conversations within the network. Having this baseline information available during a security incident aids in decision making and helps avoid the risk of inadvertently shutting down critical services.
Using Remote-Triggered Black Hole Filtering
A further containment technique (or tool) is remote-triggered black hole filtering. As explained previously, any traffic from an infected node can be dropped, based on its source address, on any appropriately configured router (or Cisco Catalyst 6500/4500 switch). The main advantage of this technique is the speed at which it can be activated. Because BGP routing is used to activate the drop, it can be activated on thousands of points across large networks in typically less than one second. This is a fast and effective technique for preventing the spread of the worm from an infected machine beyond its directly attached VLAN. During the SQL Slammer event, this proved to be an effective technique, with minimal intra-VLAN infection occurring. Figure 35 illustrates the concept.
Figure 35: Remote-Triggered Black Hole Filtering for Worm Containment
The later versions of Arbor Peakflow are capable of announcing the black hole route directly into iBGP. This feature will improve the response time when an infected or compromised device is located.
Detecting and black holing the infected machine should be viewed as a temporary fix. It is strongly recommended that further tracing of the machine be performed in order to isolate the machine at the switch port (quarantine phase). Following this, it is crucial that the owner be contacted and the machine properly treated (treatment phase).
Other Quarantine Techniques
Several other selective quarantine techniques are currently being developed. These techniques include:
Port control using scripting - It is not overly complex to construct a shell script (using TCL/Expect or Perl, for example) that identifies the attached port for an infected host from the IP address. After this port is identified, the host can be totally isolated from the network (shut down the port) or moved to a quarantine VLAN.
Policy-based routing - It is possible to use policy-based routing (PBR) to route all traffic from an infected host (based on source address) to a quarantine network. Typically, within this restricted network would be a server on which the infected clients could download tools for such tasks as treating the infection and patching the machine. The solution would require some level of scripting to activate the PBR.
Web Cache Communication Protocol - Using the same basic approach as PBR, Web Cache Communication Protocol (WCCP) can be used to divert traffic to a quarantine network.
MAC addresses - The first requirement is to know the MAC address of the infected hosts. These will soon be able to be identified, using an upcoming NetFlow enhancement in which the MAC address is exported. After the MAC address is known, it is possible to use some custom scripting (such as a Perl script) on the Dynamic Host Configuration Protocol (DHCP) server to grant an IP address from a quarantine pool. These addresses can then be routed via PBR or WCCP to the quarantine area.
802.1x - Through the use of 802.1x, an infected machine could be placed into a quarantine VLAN. In the same manner as above, this restricted network would contain a server where infected clients could download tools for such tasks as treating the infection and patching the machine. Alternatively, 802.1x can be used to apply a suitable ACL such that access is permitted only to the treatment server.
Remote access - Hosts dialing into the network or accessing the network using a VPN can typically have access restrictions applied using a RADIUS attribute. An ACL allowing access only to the treatment server could be defined universally on these remote-access points and then applied using RADIUS at login.
During a worm outbreak, it is critical to remember that infection might spread by entering the remote-access network. Often, machines that are connected to the Internet become infected and then connect internally using a VPN client. As such, any mitigation strategy needs to include the remote-access components of the network (both remote dial access and VPNs).
The primary intent of this document has been to describe response techniques for a worm outbreak. These same tools and techniques can also be used against external attacks on the corporate infrastructure, such as DoS and DDoS attacks on an e-commerce website. In these cases, the response options are slightly different, but include:
Filtering (using ACL entries or ACLs) - Deny explicit sources, destinations, ports, services, and flags.
Rate limiting (via Committed Access Rate entries or class-based policing) - Throttle malicious traffic to maintain service availability.
Remote-triggered black hole routing using BGP - Drop explicit sources and destinations.
Diversion of attack traffic to a sinkhole - Redirect anomalous traffic to a sinkhole for analysis.
Diversion and scrubbing architectures - Assuming the attack traffic has not saturated the inbound link to the site, it can be diverted to a dedicated scrubbing device such as the Cisco Guard XT (formerly the Riverhead Guard). This device provides sophisticated algorithms to remove attack traffic while allowing legitimate customer traffic to pass.
It is important to have a flexible response capability - different attacks will require different responses. For example, if the attack is an ICMP flood from a large number of fixed IP addresses, filtering using ACLs or remote-triggered black hole routing are the logical choices. In contrast, if the attack is a UDP flood or a SYN flood from spoofed addresses, rate limiting or diversion (to a Cisco Guard XT appliance) might be more appropriate.
IP Telephony: Cisco CallManager
Some of the most critical devices that must be protected during a worm outbreak are the Cisco CallManager systems. Compromise or infection of these devices could lead to a loss of telephony across the organization.
It is recommended that, during a worm outbreak or other security incident, the nature of the event be assessed to determine its impact on the Cisco CallManager systems. It is beyond the scope of this document to provide full details for securing these systems, but it is strongly recommended that during normal operation, strict port filters be applied to restrict Cisco CallManager access to only the required ports.
Appendix A - Aggregated Bogon List
At the time of writing, the following prefixes are bogons. Details about bogons are available on the Internet at:
Appendix B - Freeware Tools
SNMP: net-snmp Toolset
Open-source SNMP command-line tools, library, trap generator, and agent are available at:
Perl modules are available from the Comprehensive Perl Archive Network (CPAN) at:
SNMP: Multirouter Traffic Grapher (MRTG)
MRTG, an open-source SNMP visualization toolset developed by Tobias Oetiker, is available on the Internet. Written in Perl, it has its own SNMP implementation.
SNMP: Round Robin Database Tool
The Round Robin Database (RRD) Tool is another open-source SNMP visualization toolset developed by Tobias Oetiker. It is available at:
RRD can be used in conjunction with MRTG. It does not perform its own SNMP collection. It can also be used with NetFlow using OSU flow-tools and FlowScan.
There are many HTML and PHP front ends such as Cacti, Cricket, and Big Sister.
The OSU flow-tools package was developed and is maintained by Mark Fullmer. It is available at:
Command-line tools allow for display and sorting of specific criteria, such as source/destination IP, source/destination ASN, protocol, or port.
Data can be batched and imported into a database such as Oracle, MySQL, or Postgres.
Flow-tools can be combined with other tools to provide visualization of traffic patterns.
NetFlow Visualization: FlowScan
FlowScan is an open-source NetFlow graphing and visualization tool. It was developed and is maintained by Dave Plonka. It is available at:
FlowScan uses NetFlow data collected using OSU flow-tools to build traffic graphs. It supports reports such as top talkers by subnet and others, and uses the RRD Tool for graphing. Add-ons such as the JKFlow module allow more detailed graphing.
Unassigned IP address space.
 Address space allocated to this enterprise, but not assigned.
This document is part of the Cisco Security Research & Operations.
This document is provided on an "as is" basis and does not imply any kind of guarantee or warranty, including the warranties of merchantability or fitness for a particular use. Your use of the information on the document or materials linked from the document is at your own risk. Cisco reserves the right to change or update this document at any time.