
Dealing with mallocfail and High CPU Utilization Resulting From the "Code Red" Worm

Updated October 10, 2001




Introduction

The "Code Red" worm exploits a vulnerability in the Index Service of the Microsoft Internet Information Server (IIS) version 5.0. When the "Code Red" worm infects a host, it causes the host to begin probing and infecting a random series of IP addresses, which causes a sharp increase in network traffic. This is especially problematic if there are redundant links in the network and/or Cisco Express Forwarding (CEF) is not being used to switch packets. This document describes the "Code Red" worm and the problems it can cause in a Cisco routing environment. It also discusses techniques to prevent infestation and provides links to related advisories that discuss workarounds for worm-related problems.

How the "Code Red" Worm Infects Other Machines

The "Code Red" worm attempts to connect to randomly generated IP addresses. Every infected IIS server may attempt to infect the same set of devices.

The worm's source IP address and TCP port are traceable because they are not spoofed. Unicast Reverse Path Forwarding (URPF) does not help suppress a worm attack because the source address is a legitimate address.

Advisories that Discuss the "Code Red" Worm

The following advisories discuss the "Code Red" worm and how to patch software affected by the worm. Information about Cisco products affected by the worm can be found in the URLs listed below.

Symptoms

The following is a list of symptoms that might be seen on Cisco routers affected by the "Code Red" worm.

A low memory condition or sustained high CPU utilization (100 percent) at interrupt level could cause a Cisco IOS® router to reload because a process misbehaves under the stress conditions.

If you do not suspect that devices in your site are infected by or are the target of the "Code Red" worm, please check the Related Information section for additional URLs on how to troubleshoot the issues you may be experiencing.

Identifying the Infected Device

Flow switching could be used to identify the source IP address of the affected device. Configure ip route-cache flow on all the interfaces to record all the flows switched by the router.

After a few minutes, issue the command show ip cache flow to see the recorded entries. During the initial phase of the "Code Red" worm infection, the worm tries to replicate itself by sending HTTP requests to random IP addresses, so we need to look for cache flow entries with destination port 80 (HTTP, 0050 in hex).

The following command displays all the cache entries with a TCP port 80 (0050 in hex):

Router#show ip cache flow | include 0050 
... 

SrcIf         SrcIPaddress    DstIf         DstIPaddress    Pr SrcP  DstP  Pkts 
Vl1        193.23.45.35      Vl3           2.34.56.12     06 0F9F  0050     2 
Vl1     211.101.189.208     Null        158.36.179.59     06 0457  0050     1 
Vl1        193.23.45.35      Vl3        34.56.233.233     06 3000  0050     1 
Vl1      61.146.138.212     Null        158.36.175.45     06 B301  0050     1 
Vl1        193.23.45.35      Vl3        98.64.167.174     06 0EED  0050     1 
Vl1      202.96.242.110     Null        158.36.171.82     06 0E71  0050     1 
Vl1        193.23.45.35      Vl3        123.231.23.45     06 121F  0050     1 
Vl1        193.23.45.35      Vl3          9.54.33.121     06 1000  0050     1 
Vl1        193.23.45.35      Vl3         78.124.65.32     06 09B6  0050     1 
Vl1       24.180.26.253     Null       158.36.179.166     06 1132  0050     1

If you find an abnormally high number of entries with the same source IP address, random destination IP addresses(*), DstP = 0050 (HTTP), and Pr = 06 (TCP), you have probably found an infected device. In the output above, the source IP address is 193.23.45.35 and the traffic arrives on VLAN1 (Vl1).
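On a busy router the flow cache can be long, so counting entries by eye is error prone. The following Python sketch shows one way to tally port 80 TCP flows per source address from captured show ip cache flow output. It assumes the eight-column row layout shown above; the function name count_http_sources and the variable FLOW_SAMPLE are hypothetical names chosen for this illustration, not part of any Cisco tool.

```python
from collections import Counter

# Rows copied from the "show ip cache flow | include 0050" output above.
FLOW_SAMPLE = """\
Vl1        193.23.45.35      Vl3           2.34.56.12     06 0F9F  0050     2
Vl1     211.101.189.208     Null        158.36.179.59     06 0457  0050     1
Vl1        193.23.45.35      Vl3        34.56.233.233     06 3000  0050     1
Vl1      61.146.138.212     Null        158.36.175.45     06 B301  0050     1
Vl1        193.23.45.35      Vl3        98.64.167.174     06 0EED  0050     1
Vl1      202.96.242.110     Null        158.36.171.82     06 0E71  0050     1
Vl1        193.23.45.35      Vl3        123.231.23.45     06 121F  0050     1
Vl1        193.23.45.35      Vl3          9.54.33.121     06 1000  0050     1
Vl1        193.23.45.35      Vl3         78.124.65.32     06 09B6  0050     1
Vl1       24.180.26.253     Null       158.36.179.166     06 1132  0050     1
"""


def count_http_sources(text):
    """Count flows per source IP where Pr = 06 (TCP) and DstP = 0050 (port 80)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank lines and rows that do not have 8 columns
        _src_if, src_ip, _dst_if, _dst_ip, proto, _src_port, dst_port, _pkts = fields
        try:
            # The Pr and DstP columns are printed in hexadecimal.
            is_tcp = int(proto, 16) == 6
            is_http = int(dst_port, 16) == 80
        except ValueError:
            continue  # header row or other non-numeric text
        if is_tcp and is_http:
            counts[src_ip] += 1
    return counts


counts = count_http_sources(FLOW_SAMPLE)
for src_ip, flows in counts.most_common():
    print(src_ip, flows)
```

With the sample above, 193.23.45.35 accounts for six of the ten flows, which matches the conclusion drawn from the output. How many flows from one source count as "abnormally high" depends on your network and is left to the operator.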

(*) Another version of the "Code Red" worm called "Code Red II" does not choose a totally random destination IP address. Instead, it propagates by keeping the network portion of the IP address, and then choosing a random host portion of the IP address. This allows the worm to spread itself faster within the same network.

"Code Red II" uses the following networks and masks:

Mask        Probability of Infection 
0.0.0.0       12.5% (random) 
255.0.0.0     50.0% (same class A) 
255.255.0.0   37.5% (same class B) 

Target IP addresses that are excluded: 127.X.X.X and 224.X.X.X, and no octet is allowed to be 0 or 255. In addition, the host will not attempt to re-infect itself.

For more information on "Code Red II", refer to Code Red (II).
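When a suspect source has been identified, the mask table above suggests a quick check: compare how many of its destinations share the first octet or the first two octets with the source. The following Python sketch groups destinations that way. The function names and the sample addresses are hypothetical and chosen only for illustration. A destination generated with mask 255.0.0.0 can also match the second octet by chance, so the resulting fractions are only approximate.

```python
def classify_target(src_ip, dst_ip):
    """Label a destination by how much of the source network it shares."""
    src = src_ip.split(".")
    dst = dst_ip.split(".")
    if src[:2] == dst[:2]:
        return "class_b"  # consistent with mask 255.255.0.0
    if src[0] == dst[0]:
        return "class_a"  # consistent with mask 255.0.0.0
    return "random"       # consistent with mask 0.0.0.0


def target_mix(src_ip, destinations):
    """Return the fraction of destinations that falls under each label."""
    if not destinations:
        return {}
    totals = {}
    for dst_ip in destinations:
        label = classify_target(src_ip, dst_ip)
        totals[label] = totals.get(label, 0) + 1
    return {label: n / len(destinations) for label, n in totals.items()}


# Hypothetical destinations observed from a hypothetical source 10.1.2.3.
observed = [
    "10.1.9.9", "10.1.77.4", "10.1.5.200",              # same first two octets
    "10.44.3.2", "10.200.1.7", "10.9.8.7", "10.3.3.3",  # same first octet only
    "172.16.5.4",                                       # unrelated network
]
mix = target_mix("10.1.2.3", observed)
print(mix)
```

A mix that leans heavily toward the source's own class A and class B networks is consistent with "Code Red II" rather than the original worm, although a small sample proves little either way.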

It may not always be possible to run netflow to detect a "Code Red" infestation attempt because you may be running a version of code that does not support netflow, or because the router has insufficient or excessively fragmented memory to enable netflow. It may also be undesirable to enable netflow when there are multiple ingress interfaces and only one egress interface on the router because netflow accounting is done on the ingress path. In this case, it is better to enable IP accounting on the lone egress interface.

Note: IP accounting disables DCEF. Do not enable IP accounting on any platform where you need to do DCEF switching.

Router(config)#interface vlan 1000 
Router(config-if)#ip accounting 
  
Router#show ip accounting 
  Source           Destination              Packets               Bytes 
20.1.145.49    75.246.253.88                    2                  96 
20.1.145.43    17.152.178.57                    1                  48 
20.1.145.49    20.1.49.132                      1                  48 
20.1.104.194   169.187.190.170                  2                  96 
20.1.196.207   20.1.1.11                        3                 213 
20.1.145.43    43.129.220.118                   1                  48 
20.1.25.73     43.209.226.231                   1                  48 
20.1.104.194   169.45.103.230                   2                  96 
20.1.25.73     223.179.8.154                    2                  96 
20.1.104.194   169.85.92.164                    2                  96 
20.1.81.88     20.1.1.11                        3                 204 
20.1.104.194   169.252.106.60                   2                  96 
20.1.145.43    126.60.86.19                     2                  96 
20.1.145.49    43.134.116.199                   2                  96 
20.1.104.194   169.234.36.102                   2                  96 
20.1.145.49    15.159.146.29                    2                  96 

In the show ip accounting output, look for source addresses that are attempting to send packets to multiple destination addresses. If the infected host is in the scan phase, it is attempting to establish HTTP connections to other hosts, so you will see attempts to reach many different IP addresses. Because most of these connection attempts are likely to fail, you will see only a small number of packets transferred, each with a small byte count. In the above example, it is likely that 20.1.145.49 and 20.1.104.194 are infected.
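The same check can be scripted against captured show ip accounting output. The following Python sketch counts the distinct destinations each source has tried to reach and lists the sources at or above a threshold. The threshold of four destinations is an assumption made for this sample, not a Cisco recommendation, and the function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the "show ip accounting" output above.
ACCOUNTING_SAMPLE = """\
  Source           Destination              Packets               Bytes
20.1.145.49    75.246.253.88                    2                  96
20.1.145.43    17.152.178.57                    1                  48
20.1.145.49    20.1.49.132                      1                  48
20.1.104.194   169.187.190.170                  2                  96
20.1.196.207   20.1.1.11                        3                 213
20.1.145.43    43.129.220.118                   1                  48
20.1.25.73     43.209.226.231                   1                  48
20.1.104.194   169.45.103.230                   2                  96
20.1.25.73     223.179.8.154                    2                  96
20.1.104.194   169.85.92.164                    2                  96
20.1.81.88     20.1.1.11                        3                 204
20.1.104.194   169.252.106.60                   2                  96
20.1.145.43    126.60.86.19                     2                  96
20.1.145.49    43.134.116.199                   2                  96
20.1.104.194   169.234.36.102                   2                  96
20.1.145.49    15.159.146.29                    2                  96
"""


def destinations_per_source(text):
    """Map each source address to the set of destinations it sent packets to."""
    fanout = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns and numeric packet and byte counts.
        if len(fields) != 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        src, dst = fields[0], fields[1]
        fanout[src].add(dst)
    return fanout


def suspects(text, min_destinations=4):
    """List sources that reached at least min_destinations distinct addresses."""
    fanout = destinations_per_source(text)
    return sorted(src for src, dsts in fanout.items() if len(dsts) >= min_destinations)


print(suspects(ACCOUNTING_SAMPLE))
```

With this sample and threshold, the sketch reports 20.1.104.194 and 20.1.145.49, the same two hosts identified above. On a production network, legitimate clients such as proxies also contact many destinations, so treat the result as a starting point for investigation.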

When running Multi-Layer Switching (MLS) on the Catalyst 5000 Series and the Catalyst 6000 Series, the steps taken to enable netflow accounting and track down the infestation are slightly different. In a Cat6000 switch equipped with Supervisor 1 Multilayer Switch Feature Card (MSFC1) or Sup1/MSFC2, netflow-based MLS is enabled by default, but the "flow-mode" is "destination-only" so the source IP address is not cached. You can enable "full-flow" mode to help track down infected hosts using the set mls flow full command on the supervisor.

For Hybrid mode:

6500-sup(enable)set mls flow full
Configured IP flowmask is set to full flow. 
Warning: Configuring more specific flow mask may dramatically increase the number of 
MLS entries.

For Native IOS mode:

Router(config)#mls flow ip full

Enabling "full-flow" mode causes a warning to display about a dramatic increase in MLS entries. The impact of the increased MLS entries is justifiable for a short duration if you are already suffering from an infestation of "Code Red" because your MLS entries may be excessive and on the rise.

To display the collected information, use the following commands:

For Hybrid mode:

6500-sup(enable)show mls ent
Destination-IP  Source-IP       Prot  DstPrt SrcPrt Destination-Mac   Vlan EDst 
ESrc DPort     SPort     Stat-Pkts  Stat-Bytes  Uptime   Age 
--------------- --------------- ----- ------ ------ ----------------- ---- ---- 
---- --------- --------- ---------- ----------- -------- -------- 

Note: All of the above fields will be filled in when in "full-flow" mode.

For Native IOS mode:

Router#show mls ip 
DstIP           SrcIP           Prot:SrcPort:DstPort  Dst i/f:DstMAC 
-------------------------------------------------------------------- 
Pkts         Bytes       SrcDstPorts    SrcDstEncap Age   LastSeen 
-------------------------------------------------------------------- 

Once you have determined the source IP address and destination port involved in the attack, you can set MLS back to "destination-only" mode.

For Hybrid mode:

6500-sup(enable) set mls flow destination 
Usage: set mls flow <destination|destination-source|full> 

For Native IOS mode:

Router(config)#mls flow ip destination 

The Supervisor 2/MSFC2 combination is protected from attack because it performs CEF switching in hardware and also maintains netflow statistics. So, even during a "Code Red" attack, enabling full-flow mode should not swamp the router, because of the faster switching mechanism. The commands to enable full-flow mode and display the statistics are the same on both the Sup1/MSFC1 and the Sup2/MSFC2.

Prevention Techniques

The following techniques could be used to minimize the impact of the "Code Red" worm on the router.

Blocking Traffic to Port 80

If it is feasible in your network, the easiest way to prevent the "Code Red" attack is to block all traffic to TCP port 80, the well-known port for WWW. Build an access list that denies TCP packets destined to port 80 and apply it inbound on the interface facing the infection source.

Reducing ARP Input Memory Usage

Heavy memory usage by the ARP Input process occurs when a static route points to a broadcast interface, such as the following:

ip route 0.0.0.0 0.0.0.0 Vlan3

Every packet that matches the default route is sent to VLAN3. Because no next hop IP address is specified, the router sends an ARP request for the destination IP address, and the next hop router for that destination replies with its own MAC address, unless proxy ARP is disabled. This creates an additional entry in the ARP table, in which the destination IP address of the packet is mapped to the next-hop MAC address. Because the "Code Red" worm sends packets to random IP addresses, each random destination address adds a new ARP entry, which consumes more and more memory under the ARP Input process.

Creating a static default route to an interface is not good practice, especially if the interface is broadcast (Ethernet/Fast Ethernet/GE/SMDS) or multipoint (Frame Relay/ATM). Any static default route should point to the IP address of the next hop router. After you change the default route to point to the next hop IP address, use the clear arp-cache command to clear all the ARP entries. This fixes the memory utilization problem.
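The difference between the two kinds of default route can be illustrated with a small model. The following Python sketch is a deliberate simplification, not a description of how IOS implements ARP: it assumes one ARP entry per address the router must resolve, and it ignores entry timeouts and directly connected hosts. The next-hop address 192.0.2.1 and the function name are hypothetical.

```python
import random


def arp_entries_needed(destinations, next_hop=None):
    """Return the set of addresses the router must resolve with ARP."""
    if next_hop is None:
        # Default route points at the interface: the router ARPs for
        # every distinct destination address it forwards to.
        return set(destinations)
    # Default route points at a next hop: only that address is resolved.
    return {next_hop}


# Simulate a scan toward 10,000 random destination addresses.
rng = random.Random(2001)  # fixed seed so the run is repeatable
destinations = [
    ".".join(str(rng.randint(1, 254)) for _ in range(4))
    for _ in range(10000)
]

via_interface = arp_entries_needed(destinations)
via_next_hop = arp_entries_needed(destinations, next_hop="192.0.2.1")
print(len(via_interface), len(via_next_hop))
```

In this model the interface route needs roughly one ARP entry per scanned address, while the next-hop route needs a single entry regardless of how many addresses the worm probes.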

Using Cisco Express Forwarding (CEF) Switching

You can lower CPU utilization on an IOS router by changing from Fast/Optimum/Netflow switching to CEF switching. There are a few caveats for enabling CEF. The following section discusses the difference between CEF and fast switching and the implications of enabling CEF.

Cisco Express Forwarding vs Fast Switching

Enabling CEF may be a method to alleviate the increased traffic load caused by the "Code Red" worm. CEF is supported in IOS releases 11.1( )CC, 12.0, and later on the Cisco 7200/7500/GSR platforms. Support for CEF on other platforms may be in IOS release 12.0 or later. You can investigate further with the Software Advisor tool on CCO.

It may not be possible to enable CEF on all routers for one of the following reasons:

Fast Switching Behavior and Implications

The following are implications of using fast switching:

The command to change the above values is: ip cache-ager-interval X Y Z, where:

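In the example configuration below, we used ip cache-ager-interval 60 5 25. As the output shows, the router then ages 1/25 of the cache every 60 seconds, or 1/5 of the cache when memory is low.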

Router#show ip cache
IP routing cache 2 entries, 332 bytes
   27 adds, 25 invalidates, 0 refcounts
Cache aged by 1/25 every 60 seconds (1/5 when memory is low).
Minimum invalidation interval 2 seconds, maximum interval 5 seconds, 
quiet interval 3 seconds, threshold 0 requests
Invalidation rate 0 in last second, 0 in last 3 seconds
Last full cache invalidation occurred 03:55:12 ago

Prefix/Length          Age    Interface     Next Hop
4.4.4.1/32        03:44:53    Serial1       4.4.4.1
192.168.9.0/24    00:03:15    Ethernet1    20.4.4.1

Router#show ip cache verbose
IP routing cache 2 entries, 332 bytes
   27 adds, 25 invalidates, 0 refcounts
Cache aged by 1/25 every 60 seconds (1/5 when memory is low).
Minimum invalidation interval 2 seconds, maximum interval 5 seconds,
   quiet interval 3 seconds, threshold 0 requests
Invalidation rate 0 in last second, 0 in last 3 seconds
Last full cache invalidation occurred 03:57:31 ago

Prefix/Length          Age    Interface     Next Hop
4.4.4.1/32-24     03:47:13    Serial1       4.4.4.1
                     4  0F000800
192.168.9.0/24-0  00:05:35    Ethernet1    20.4.4.1
                    14  00000C34A7FC00000C13DBA90800

Depending on the setting of your cache ager, some percentage of your cache entries will be aged out of your fast-cache table. The benefit of aging entries more quickly and aging a larger percentage of the fast-cache table is that it keeps a smaller cache table and reduces memory consumption on the router. The disadvantage is that if traffic is still flowing for the entries that were aged out of the cache table, the initial packets will be process-switched causing a short spike in CPU consumption in IP Input until a new cache entry is built for the flow.

From IOS releases 10.3(8), 11.0(3) and later, the following changes have been made to the handling of the IP cache ager:

Note: Executing the following commands will cause the router's CPU utilization to increase. Use only when absolutely necessary.

Router#clear ip cache ? 
A.B.C.D Address prefix 
<CR>--> will clear the entire cache and free the memory used by it! 

Router#debug ip cache
IP cache debugging is on

CEF Advantages

Caution: Again, a default route pointing to a broadcast or multipoint interface means that the router will ARP for every new destination, potentially creating a huge adjacency table until the router runs out of memory. If CEF fails to allocate memory, CEF/DCEF disables itself and must be manually re-enabled.

Sample Output: CEF

The sample output below shows memory usage. It is a snapshot from a Cisco 7200 route server running IOS 12.0.

Router>show ip cef summary
IP CEF with switching (Table Version 2620746)
  109212 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 84625
  109212 leaves, 8000 nodes, 22299136 bytes, 2620745 inserts, 2511533 invalidations
  17 load sharing elements, 5712 bytes, 109202 references
  universal per-destination load sharing algorithm, id 6886D006
  1 CEF resets, 1 revisions of existing leaves
  1 in-place/0 aborted modifications
  Resolution Timer: Exponential (currently 1s, peak 16s)
  refcounts:  2258679 leaf, 2048256 node

Adjacency Table has 16 adjacencies 


Router>show processes memory | include CEF
 PID TTY  Allocated      Freed    Holding    Getbufs    Retbufs Process
  73   0     147300       1700     146708          0          0 CEF process
  84   0        608          0       7404          0          0 CEF Scanner


Router>show processes memory | include BGP

   2   0    6891444    6891444       6864          0          0 BGP Open
  80   0       3444       2296       8028          0          0 BGP Open
  86   0     477568     476420       7944          0          0 BGP Open
  87   0 2969013892  102734200  338145696          0          0 BGP Router
  88   0   56693560 2517286276       7440     131160    4954624 BGP I/O
  89   0      69280   68633812      75308          0          0 BGP Scanner
  91   0    6564264    6564264       6876          0          0 BGP Open
 101   0    7635944    7633052       6796        780          0 BGP Open
 104   0    7591724    7591724       6796          0          0 BGP Open
 105   0    7269732    7266840       6796        780          0 BGP Open
 109   0    7600908    7600908       6796          0          0 BGP Open
 110   0    7268584    7265692       6796        780          0 BGP Open


Router>show memory summary | include FIB

Alloc PC        Size     Blocks      Bytes    What
0x60B8821C        448          7       3136    FIB: FIBIDB
0x60B88610      12000          1      12000    FIB: HWIDB MAP TABLE
0x60B88780        472          6       2832    FIB: FIBHWIDB
0x60B88780        508          1        508    FIB: FIBHWIDB
0x60B8CF9C       1904          1       1904    FIB 1 path chunk pool
0x60B8CF9C      65540          1      65540    FIB 1 path chunk pool
0x60BAC004       1904        252     479808    FIB 1 path chun
0x60BAC004      65540        252   16516080    FIB 1 path chun


Router>show memory summary | include CEF

0x60B8CD84       4884          1       4884    CEF traffic info
0x60B8CF7C         44          1         44    CEF process
0x60B9D12C      14084          1      14084    CEF arp throttle chunk
0x60B9D158        828          1        828    CEF loadinfo chunk
0x60B9D158      65540          1      65540    CEF loadinfo chunk
0x60B9D180        128          1        128    CEF walker chunk
0x60B9D180        368          1        368    CEF walker chunk
0x60BA139C         24          5        120    CEF process
0x60BA139C         40          1         40    CEF process
0x60BA13A8         24          4         96    CEF process
0x60BA13A8         40          1         40    CEF process
0x60BA13A8         72          1         72    CEF process
0x60BA245C         80          1         80    CEF process
0x60BA2468         60          1         60    CEF process
0x60BA65A8      65488          1      65488    CEF up event chunk


Router>show memory summary | include adj

0x60B9F6C0        280          1        280    NULL adjacency
0x60B9F734        280          1        280    PUNT adjacency
0x60B9F7A4        280          1        280    DROP adjacency
0x60B9F814        280          1        280    Glean adjacency
0x60B9F884        280          1        280    Discard adjacency
0x60B9F9F8      65488          1      65488    Protocol adjacency chunk

Things to Consider

When there is a large number of flows, CEF typically consumes less memory than fast switching. If memory is already consumed by a fast switching cache, you should clear the ARP cache (using clear ip arp) before enabling CEF. Note that clearing the cache causes a spike in the router's CPU utilization.

"Code Red" Questions and Answers

Q. I am running NAT, and am experiencing 100 percent CPU utilization in IP Input. When I execute show proc cpu, my CPU utilization is high in interrupt level - 100/99 or 99/98. Could this be related to "Code Red"?

A. A recently fixed NAT bug (CSCdu63623) involves scalability. When there are tens of thousands of NAT flows (the exact number depends on the platform type), the bug causes 100 percent CPU utilization at process or interrupt level.

A description of the bug can be found in CSCdu63623.

To determine if you are running into this bug, issue the command show align and verify that the router is not having alignment errors. If you do see alignment errors or spurious memory accesses, issue the show align command a couple of times and see if the counters are incrementing. If they are incrementing, the high CPU utilization at interrupt level could be caused by the alignment errors and not by CSCdu63623. See Spurious Accesses and Alignment Errors for more information.

The command show ip nat translation shows you how many translations are active. The meltdown point for an NPE-300 class processor is about 20,000 to 40,000 translations. This number varies depending on the platform.

This meltdown problem was seen previously by a couple of customers, but since "Code Red", more customers have experienced this problem. The only workaround is to run NAT (instead of PAT), so there are fewer active translations. If you have a 7200, use a NSE-1 and lower the NAT timeout values.

Q. I am running IRB and am experiencing high CPU utilization in the HyBridge Input process. Why is this happening? Is it related to "Code Red"?

A. HyBridge Input is the process responsible for handling any packets that could not be fast switched by the IRB process. The reasons a packet could not be fast switched can include:

HyBridge Input will have trouble if there are thousands of point-to-point interfaces in the same bridge group. It will also have trouble, but to a lesser extent, if there are thousands of VCs in the same multipoint interface.

What are possible reasons for problems with IRB? Let's say that a device infected with "Code Red" is scanning IP addresses.

Q. My CPU utilization is high at interrupt level, and I see flushes when I do a show log. The traffic rate is also only somewhat higher than normal. What could be happening?

A. Here's an example show logging output:
Router#show logging
     Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)
                                                  ^ 
                                                this value is non-zero
         Console logging: level debugging, 9 messages logged
   

Are you logging to the console? If so, does the logged traffic include HTTP requests? Are there any access lists with log keywords, or debugs watching particular IP flows? If flushes are incrementing, it may be because the console, usually a 9600 baud device, is having trouble handling the amount of information received. In this scenario, the router disables interrupts and does nothing but process console messages. The solution is to disable console logging or remove whatever type of logging you are doing.
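To confirm that flushes are incrementing, compare the counter in two show logging captures taken some time apart. The following Python sketch extracts the counters from the "Syslog logging" line in the format shown above. Other IOS releases may print this line differently, in which case the pattern needs adjusting. The function names and the second capture are hypothetical.

```python
import re

# Matches the "Syslog logging" line as it appears in the output above.
SYSLOG_LINE = re.compile(
    r"Syslog logging: (?P<state>\w+) \((?P<dropped>\d+) messages dropped, "
    r"(?P<flushes>\d+) flushes, (?P<overruns>\d+) overruns\)"
)


def logging_counters(show_logging_output):
    """Return the dropped, flushes, and overruns counters, or None if absent."""
    match = SYSLOG_LINE.search(show_logging_output)
    if match is None:
        return None
    return {name: int(match.group(name)) for name in ("dropped", "flushes", "overruns")}


def flushes_increasing(earlier_output, later_output):
    """Report whether the flushes counter grew between two captures."""
    before = logging_counters(earlier_output)
    after = logging_counters(later_output)
    if before is None or after is None:
        raise ValueError("no 'Syslog logging' line found")
    return after["flushes"] > before["flushes"]


EARLIER = "Syslog logging: enabled (0 messages dropped, 0 flushes, 0 overruns)"
# Hypothetical second capture, taken a few minutes later.
LATER = "Syslog logging: enabled (0 messages dropped, 57 flushes, 0 overruns)"

print(logging_counters(EARLIER))
print(flushes_increasing(EARLIER, LATER))
```

A counter that keeps growing between captures points to the console logging problem described above rather than to the traffic rate itself.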

Q. I am seeing numerous HTTP connection attempts on my IOS router, which is running ip http server. Is this because of the "Code Red" worm scan?

A. "Code Red" could be the problem. We recommend disabling the ip http server on the IOS router so that it does not have to deal with numerous connection attempts from infected hosts.

Workarounds

There are various workarounds that are discussed in the Advisories that Discuss the "Code Red" Worm section. See these advisories for the workarounds.

Another method for blocking the "Code Red" worm at network ingress points uses Network-Based Application Recognition (NBAR) and Access Control Lists (ACLs) within IOS software on Cisco routers. This method should be used in conjunction with the recommended patches for IIS servers from Microsoft. For more information on this method please see Using NBAR and ACLs for Blocking the "Code Red" Worm at Network Ingress Points.


Related Information



All contents are Copyright © 1992--2001 Cisco Systems Inc. All rights reserved. Important Notices and Privacy Statement.


Updated: Oct 29, 2002
Document ID: 12808