

# Router Architecture And IOS Internals

#### **Agenda**

- Routing and Switching
- Cisco IOS Switching Paths
- Cisco Express Forwarding
- Router Architectures & Parallel Express Forwarding

# **Routing and Switching**

## **Switching**

Cisco.com

- The destination in the layer two header remains the same when a packet passes through a switch



#### Routing

- Host A transmits the packet to the router
- The router determines the correct outbound port, then rewrites the layer 2 header so the packet is now destined to B



#### Routing

- Switching, in the context of routers, involves this process of looking up the next hop, finding the layer 2 rewrite "string," rewriting the layer 2 header, and transmitting the packet
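The rewrite step can be sketched in a few lines of Python. This is only an illustration: `rewrite_layer2` is a hypothetical helper, the MAC addresses are made-up example values, and a 14-byte Ethernet header (destination MAC, source MAC, EtherType) is assumed on both sides.

```python
def rewrite_layer2(frame: bytes, old_header_len: int, rewrite: bytes) -> bytes:
    """Replace the inbound layer 2 header with a prebuilt rewrite string."""
    return rewrite + frame[old_header_len:]


# Hypothetical inbound frame: 14-byte Ethernet header followed by the layer 3 packet.
inbound = bytes.fromhex("00e0b06423f5" "aabbccddeeff" "0800") + b"IP-PACKET"

# Hypothetical rewrite string: next hop MAC, outbound interface MAC, EtherType.
rewrite = bytes.fromhex("00000c7ef7cf" "00e0b06423f6" "0800")

outbound = rewrite_layer2(inbound, 14, rewrite)
```

The layer 3 packet is carried over untouched; only the layer 2 header in front of it changes.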



- When the term "layer 3 switch" was first coined, it meant switching packets in hardware based on the layer 3 information
- However, the lines are rarely so neatly drawn in the real world












- Where is switching done?
  - On the main processor, in a "normal" process
  - On the main processor in a special mode (interrupt context)
  - On a separate general purpose processor
  - On an application specific chip (ASIC)



# **Cisco IOS Switching Paths**

#### **IOS Process Scheduling**



- Each disk represents a Process in the Process Ready Queue
- Each Process is assigned a Priority (Critical, High, Medium or Low)
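A minimal sketch of a ready queue with these four priorities is shown below. It assumes strict priority between levels and first-in, first-out order within a level, which is a simplification and not a statement of how the IOS scheduler actually behaves. `ReadyQueues` and its methods are hypothetical names; the process names come from the `show processes` output later in this deck.

```python
from collections import deque

PRIORITIES = ("critical", "high", "medium", "low")


class ReadyQueues:
    """One FIFO queue of ready processes per priority level."""

    def __init__(self):
        self.queues = {priority: deque() for priority in PRIORITIES}

    def make_ready(self, name, priority):
        self.queues[priority].append(name)

    def next_process(self):
        """Return the next process to run, or None if nothing is ready."""
        for priority in PRIORITIES:          # highest priority first (assumption)
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None


ready = ReadyQueues()
ready.make_ready("ARP Input", "low")
ready.make_ready("IP Input", "medium")
ready.make_ready("Critical Bkgnd", "critical")
ready.make_ready("Exec", "medium")
run_order = [ready.next_process() for _ in range(4)]
```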

Step 1: The interface processor DMAs the packet into the RX ring buffer.





When the packet is passed to the processor, buffer ownership is transferred to the processor.

Once ownership has passed, the interrupt is released.



were suspended when the **RX Interrupt arrived** 

**Shared memory** 



Step 6: The input process looks up the destination in the forwarding table, determines the output interface, writes the new MAC header, and places the packet in the output queue.



The interface polls the TX ring (in shared memory) and DMAs packets out for transmission.



# Demand Generated Cache Based Switching ("Fast" Switching)

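**Forwarding Table** (CPU memory)

| Prefix      | Next Hop       |
|-------------|----------------|
| 1.1.0.0/16  | via 172.16.2.1 |
| 10.1.1.0/24 | via 172.16.1.1 |

**ARP Table** (CPU memory)

| Address    | Layer 2 Information      |
|------------|--------------------------|
| 172.16.1.1 | 0F000800                 |
| 172.16.2.1 | 10134567A...ECE030178654 |

**Fast Cache**

| Prefix/Length | Age      | Interface | Next Hop   | Header Length and Rewrite       |
|---------------|----------|-----------|------------|---------------------------------|
| 1.1.0.0/16    | 00:00:15 | Ethernet0 | 172.16.2.1 | 14 00000C7EF7CF00E0B06423F60800 |
| 10.1.1.0/24   | 00:00:15 | Serial1   | 172.16.1.1 | 4 0F000800                      |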

The interface processor DMAs the packet into the RX ring buffer, and the packet's L3 info is used for the cache lookup.



| Prefix/Length | Age      | Interface | Next Hop            |
|---------------|----------|-----------|---------------------|
| 10.1.2.3/32   | 00:00:15 | E0        | 10.1.2.1 14 aae0cd  |
| 11.1.2.0/24   | 00:00:15 | s1        | 10.2.3.1 4 0f000800 |



The optimum cache entry is used to write the MAC header.







# Demand Generated Cache Based Switching Issues

- First packet towards a given destination is always process switched
- Fast cache entries must be timed out periodically to prevent stale information from being used in switching
- When an ARP entry or the routing table changes, we must clear some portion of the fast cache and wait for process switched traffic to rebuild it
- We store a prebuilt MAC header for each possible destination. This wastes space and causes duplicated effort
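The issues above can be seen in a toy model of a demand-built cache. This is a sketch under stated assumptions, not the IOS data structure: `FastCache` is a hypothetical class, entries are keyed by destination address for brevity, and `slow_lookup` is a stand-in for the routing table and ARP lookups done at process level.

```python
class FastCache:
    """Toy demand-built route cache (illustrative only)."""

    def __init__(self, max_age):
        self.max_age = max_age        # seconds before an entry counts as stale
        self.entries = {}             # dst -> (interface, mac_header, created_at)
        self.process_switched = 0     # packets that had to take the slow path

    def switch(self, dst, now, slow_lookup):
        entry = self.entries.get(dst)
        if entry is not None and now - entry[2] < self.max_age:
            return entry[0], entry[1], "fast"
        # Miss or stale entry: process switch this packet and (re)build the entry.
        self.process_switched += 1
        interface, mac_header = slow_lookup(dst)
        self.entries[dst] = (interface, mac_header, now)
        return interface, mac_header, "process"

    def invalidate(self):
        """Clear the cache, e.g. after an ARP or routing table change."""
        self.entries.clear()


def slow_lookup(dst):
    # Hypothetical process-level lookup; always returns the same example result.
    return "Serial1", bytes.fromhex("0f000800")


cache = FastCache(max_age=60)
first = cache.switch("10.1.1.5", now=0, slow_lookup=slow_lookup)
second = cache.switch("10.1.1.5", now=1, slow_lookup=slow_lookup)
```

The first packet is process switched, the second hits the cache, and any timeout or invalidation sends the next packet back to the slow path.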

#### **Show Processes**

7206#show processes

CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%

| PID | QTy | PC       | Runtime (ms) | Invoked | uSecs | Stacks      | TTY | Process         |
|-----|-----|----------|--------------|---------|-------|-------------|-----|-----------------|
| 2   | M*  | 0        | 8            | 86      | 93    | 9888/12000  | 0   | Exec            |
| 3   | Lst | 60655C58 | 345736       | 129733  | 2664  | 5740/6000   | 0   | Check heaps     |
| 4   | Cwe | 6064C268 | 4            | 1       | 4000  | 5568/6000   | 0   | Chunk Manager   |
| 5   | Cwe | 6065BC70 | 12           | 17      | 705   | 5596/6000   | 0   | Pool Manager    |
| 14  | Lwe | 60719100 | 5604         | 103710  | 54    | 5236/6000   | 0   | ARP Input       |
| 20  | Cwe | 60661090 | 0            | 1       | 0     | 5608/6000   | 0   | Critical Bkgnd  |
| 21  | Mwe | 6061BC70 | 232          | 209650  | 1     | 10164/12000 | 0   | Net Background  |
| 22  | Lwe | 605ACD38 | 0            | 26      | 0     | 11504/12000 | 0   | Logger          |
| 24  | Msp | 6061B1C0 | 32336        | 1277140 | 25    | 6920/9000   | 0   | Per-Second Jobs |
| 35  | Mwe | 60747998 | 4276         | 64668   | 66    | 10648/12000 | 0   | IP Input        |
| 82  | Msp | 6061B200 | 85188        | 21328   | 3994  | 5660/6000   | 0   | Per-minute Jobs |

#### For the five-second window, the output shows both the total CPU time and the interrupt time (the two values in `0%/0%`)
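As a small worked example, the utilization line can be split into its parts. The first five-second figure is total CPU and the second is time spent at interrupt level, so the difference is the time spent in processes. `parse_cpu_line` is a hypothetical helper, and the `busy` line is an invented value for illustration.

```python
import re

CPU_LINE = re.compile(
    r"five seconds: (\d+)%/(\d+)%; one minute: (\d+)%; five minutes: (\d+)%"
)


def parse_cpu_line(line):
    """Split a 'CPU utilization' line into total, interrupt and process load."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a CPU utilization line")
    total, interrupt, one_min, five_min = (int(value) for value in match.groups())
    return {
        "total_5s": total,
        "interrupt_5s": interrupt,
        "process_5s": total - interrupt,   # time spent in scheduled processes
        "one_min": one_min,
        "five_min": five_min,
    }


sample = "CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%"
busy = "CPU utilization for five seconds: 45%/30%; one minute: 40%; five minutes: 38%"
idle_stats = parse_cpu_line(sample)
busy_stats = parse_cpu_line(busy)
```

A high interrupt figure suggests the load comes from packets switched in interrupt context; a high process figure points at the processes listed in the table.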

#### **Show Processes CPU**

7206#show processes cpu

CPU utilization for five seconds: 0%/0%; one minute: 0%; five minutes: 0%

| PID | Runtime (ms) | Invoked | uSecs | 5Sec  | 1Min  | 5Min  | TTY | Process         |
|-----|--------------|---------|-------|-------|-------|-------|-----|-----------------|
| 2   | 68           | 227     | 299   | 0.00% | 0.00% | 0.00% | 0   | Exec            |
| 3   | 368920       | 138425  | 2665  | 0.08% | 0.02% | 0.00% | 0   | Check heaps     |
| 4   | 4            | 1       | 4000  | 0.00% | 0.00% | 0.00% | 0   | Chunk Manager   |
| 5   | 20           | 21      | 952   | 0.00% | 0.00% | 0.00% | 0   | Pool Manager    |
| 14  | 6608         | 119562  | 55    | 0.00% | 0.00% | 0.00% | 0   | ARP Input       |
| 20  | 0            | 1       | 0     | 0.00% | 0.00% | 0.00% | 0   | Critical Bkgnd  |
| 21  | 248          | 218242  | 1     | 0.00% | 0.00% | 0.00% | 0   | Net Background  |
| 22  |              | 28      |       | 0.00% | 0.00% | 0.00% | 0   | Logger          |
| 24  | 35704        | 1362619 | 26    | 0.00% | 0.00% | 0.00% | 0   | Per-Second Jobs |
| 35  | 4520         | 68993   | 65    | 0.00% | 0.00% | 0.00% | 0   | IP Input        |
| 82  | 90896        | 22759   | 3993  | 0.00% | 0.00% | 0.00% | 0   | Per-minute Jobs |

More specific information on the CPU time occupied by the processes

# **Cisco Express Forwarding**

### **Cisco Express Forwarding**

- Background
- CEF Theory
- The CEF Mtrie
- The Adjacency Table
- Adjacency Table Entries
- Load Sharing with CEF
- CEF Accounting

## **Background: Process Level Switching**

- Process Level Switching has speed limitations on high speed networks

## **Background: Fast Switching**

- Caching the results of the lookup routines was the first solution and is known as Fast Switching
- This solution encounters scalability problems on Internet backbone routers where the routing table is changing rapidly and there are many different flows of traffic
- CEF (Cisco Express Forwarding) was developed to address the scalability issues of Process and Fast Switching
- CEF doesn't cache switching information; it builds switching tables


#### What Do We Need to Switch a Packet?



## **CEF Theory**


# CEF Builds Two Tables to Contain this Information:



## **CEF Packet Switching**

- Read in packet from the interface and store packet into memory
- Raise an interrupt to the processor; the rest of the packet switching takes place within the interrupt
- Use CEF mtrie to lookup packet destination; determine correct next-hop info by following pointer in the last CEF mtrie node
- Use Adjacency table info to rewrite the layer 2 (MAC) header
- Place packet on the outbound interface queue
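The steps above can be sketched as follows. This is an illustration under assumptions: a linear longest-prefix search stands in for the mtrie, `Adjacency`, `fib_lookup` and `cef_switch` are hypothetical names, and the prefixes and rewrite strings are the example values from the fast cache slide.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Adjacency:
    interface: str     # outbound interface
    rewrite: bytes     # prebuilt layer 2 header


def fib_lookup(fib, dst):
    """Longest-prefix match; a linear scan stands in for the mtrie here."""
    address = ipaddress.ip_address(dst)
    best_net, best_adj = None, None
    for net, adj in fib.items():
        if address in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_adj = net, adj
    return best_adj


def cef_switch(fib, output_queues, dst, payload):
    """Look up, rewrite and enqueue one packet; False means no FIB entry."""
    adjacency = fib_lookup(fib, dst)
    if adjacency is None:
        return False
    frame = adjacency.rewrite + payload                  # layer 2 rewrite
    output_queues.setdefault(adjacency.interface, []).append(frame)
    return True


fib = {
    ipaddress.ip_network("1.1.0.0/16"): Adjacency(
        "Ethernet0", bytes.fromhex("00000c7ef7cf00e0b06423f60800")),
    ipaddress.ip_network("10.1.1.0/24"): Adjacency(
        "Serial1", bytes.fromhex("0f000800")),
}
output_queues = {}
switched = cef_switch(fib, output_queues, "10.1.1.7", b"payload")
missed = cef_switch(fib, output_queues, "192.0.2.1", b"payload")
```

Note that nothing here is built on demand: both tables exist before the first packet arrives.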

## **CEF Theory**


# What's the Difference between a Tree and a Trie?



The MAC Header Rewrite Information Is Stored in the Tree Itself

## **CEF Theory**


# What's the Difference between a Tree and a Trie?





- Nodes point to other nodes or leaves
- Leaves point to the adjacency table




Router#sh ip cef summary

IP CEF with switching (Table Version 4)

4 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 0

4 leaves, 8 nodes, 8832 bytes, 4 inserts, 0 invalidations

0 load sharing elements, 0 bytes, 0 references

universal per-destination load sharing algorithm, id 20340B24

1 CEF resets, 0 revisions of existing leaves

0 in-place/0 aborted modifications

Resolution Timer: Exponential (currently 1s, peak 1s)

refcounts: 533 leaf, 536 node

(Figure: the main processor and RIB, with the input queue and output queue connected by "the pipe")

#### The CEF Mtrie Notes

- Where in the switching path do we build the CEF table?
- Nowhere! The CEF table is built from the routing table before (and while) packets are being switched
- Because the CEF table is directly related to the routing table, we can build it for every destination in the routing table without waiting on any packets to be switched

## **Two Separate Tables**

- The Routing Table and the CEF Mtrie Are Directly Related
- The CEF Table Contains Reachability and Next Hop Information



# **Empty Table**




### Add 10.0.0.0/8




### Add 20.1.0.0/16




### Add 20.1.1.0/24

### Add 30.1.1.0/29









- Normally there are 4 levels of nodes, with each node having 256 children (one per value of an 8-bit stride)
- Prefix and traffic distribution sometimes make the mtrie perform better if there are different numbers of children for nodes at each level
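The build sequence from the preceding slides (empty table, then 10.0.0.0/8, 20.1.0.0/16, 20.1.1.0/24 and 30.1.1.0/29) can be reproduced with a small multiway trie. This is a sketch that assumes a fixed 8-8-8-8 stride, so each node has 256 slots; `Node`, `Leaf`, `insert` and `lookup` are hypothetical names, and the adjacency values are plain strings standing in for pointers into the adjacency table.

```python
import ipaddress

STRIDE = 8                 # bits consumed per level (assumed 8-8-8-8 layout)
FANOUT = 1 << STRIDE       # 256 slots per node


class Leaf:
    def __init__(self, prefix, adjacency):
        self.prefix = prefix          # ipaddress.IPv4Network
        self.adjacency = adjacency    # stand-in for an adjacency table pointer


class Node:
    def __init__(self, fill=None):
        # Every slot starts with the leaf (or None) that covered this node.
        self.children = [fill] * FANOUT


def _fill(node, index, leaf):
    child = node.children[index]
    if isinstance(child, Node):
        # Longer prefixes live below: only overwrite slots owned by shorter ones.
        for i in range(FANOUT):
            _fill(child, i, leaf)
    elif child is None or child.prefix.prefixlen <= leaf.prefix.prefixlen:
        node.children[index] = leaf


def insert(root, prefix, adjacency):
    net = ipaddress.ip_network(prefix)
    leaf = Leaf(net, adjacency)
    octets = net.network_address.packed
    node, level = root, 0
    # Walk down one level for every full octet the prefix extends past.
    while (level + 1) * STRIDE < net.prefixlen:
        child = node.children[octets[level]]
        if not isinstance(child, Node):
            child = Node(fill=child)              # push a shorter leaf down
            node.children[octets[level]] = child
        node, level = child, level + 1
    # Expand the prefix across every slot it covers at this level.
    span = 1 << ((level + 1) * STRIDE - net.prefixlen)
    start = octets[level] & ~(span - 1) & 0xFF
    for index in range(start, start + span):
        _fill(node, index, leaf)


def lookup(root, address):
    """Follow one octet per level until a leaf (or an empty slot) is reached."""
    node = root
    for octet in ipaddress.ip_address(address).packed:
        entry = node.children[octet]
        if not isinstance(entry, Node):
            return entry
        node = entry
    return None


root = Node()                                     # empty table
for prefix, adjacency in [("10.0.0.0/8", "A"), ("20.1.0.0/16", "B"),
                          ("20.1.1.0/24", "C"), ("30.1.1.0/29", "D")]:
    insert(root, prefix, adjacency)
```

A lookup never needs more than four array indexings, regardless of how many prefixes are installed. The /29 occupies eight adjacent slots in its fourth-level node, which is how prefixes that do not end on an octet boundary fit into a fixed stride.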



The interface processor DMAs the packet into the RX ring buffer.

1. A packet arrives at an input interface and an RX interrupt is generated
2. The IP destination prefix is read
3. CEF's FIB database is searched, using the destination prefix as the search key
4. A successful mtrie lookup results in a FIB entry being found
   - 4a. If the mtrie lookup is unsuccessful, the packet is dropped
5. A FIB path is selected
6. The selected FIB path points to the necessary entry in the adjacency table



## Switch During the Receive Interrupt

- Features are processed along each switching path.
- Each feature represents a function call which may fail, succeed, or just not exist.



- At any point while the packet is being processed, it can be punted to the next slower switching path by allowing the processor to jump to the next pointer in the chain.



- At any point in the chain, the packet may also be enqueued for process switching.



**Enqueue packet and terminate interrupt** 
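The chain of switching paths can be sketched as a list of function pointers tried in order. Everything here is illustrative: the handler names, their order, and the packet fields `in_fib` and `in_cache` are assumptions, and a `None` entry models a path that does not exist on the platform.

```python
def receive_interrupt(packet, chain, process_queue):
    """Try each switching path in turn; fall back to the process queue."""
    for name, handler in chain:
        if handler is None:        # this path does not exist: skip it
            continue
        if handler(packet):        # path succeeded: the packet is switched
            return name
        # path failed: punt to the next slower path in the chain
    process_queue.append(packet)   # enqueue packet and terminate interrupt
    return "process"


def cef_path(packet):
    return packet.get("in_fib", False)


def fast_path(packet):
    return packet.get("in_cache", False)


chain = [("cef", cef_path), ("optimum", None), ("fast", fast_path)]
process_queue = []

by_cef = receive_interrupt({"in_fib": True}, chain, process_queue)
by_fast = receive_interrupt({"in_cache": True}, chain, process_queue)
by_process = receive_interrupt({}, chain, process_queue)
```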


# Depending on the Type of Route, a CEF Table Entry Can Be Several Different Types

- Attached
- Connected
- Receive
- Recursive

- Attached: An "attached" mtrie entry means the destination is attached to the router
- Connected: A "connected" entry is the result of an IP address being configured on an interface
- An entry may be both Attached and Connected
- Receive: Indicates packets that are destined to the router and do not need to be switched to another interface
- Recursive: References another node to find the next-hop information

## The Adjacency Table

- The Mtrie is used to look up the next-hop for a prefix
- The final node encountered in the Mtrie during a prefix lookup includes a pointer to the correct next-hop in the adjacency table



## The Adjacency Table

- The ARP Cache and the Adjacency Table are directly related
- The adjacency table doesn't contain any information about networks; it only contains information about next hops



## The Adjacency Table

- Allows next-hops to change without changing the mtrie
- A change in next-hop just requires the final mtrie node's pointer to the adjacency table to be updated
- Routing table changes also don't impact the adjacency table
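The value of this indirection shows up in a short sketch where several prefixes share one adjacency. The dictionaries stand in for the FIB and the adjacency table, the `Adjacency` class is hypothetical, and the new MAC address in step (a) is an invented value.

```python
class Adjacency:
    def __init__(self, interface, rewrite):
        self.interface = interface
        self.rewrite = rewrite     # layer 2 rewrite string for this next hop


adjacency_table = {
    "172.16.1.1": Adjacency("Serial1", bytes.fromhex("0f000800")),
    "172.16.2.1": Adjacency("Ethernet0",
                            bytes.fromhex("00000c7ef7cf00e0b06423f60800")),
}

# Each FIB entry holds a pointer to an adjacency, not a copy of the header.
fib = {
    "10.1.1.0/24": adjacency_table["172.16.1.1"],
    "10.1.2.0/24": adjacency_table["172.16.1.1"],
    "1.1.0.0/16": adjacency_table["172.16.2.1"],
}

# (a) The next hop's layer 2 information changes: one update, FIB untouched.
adjacency_table["172.16.2.1"].rewrite = bytes.fromhex(
    "00000c7ef7d000e0b06423f60800")

# (b) A route moves to another next hop: repoint one FIB entry,
#     adjacency table untouched.
fib["10.1.2.0/24"] = adjacency_table["172.16.2.1"]
```

Compare this with the fast cache, where the prebuilt MAC header is stored per destination and must be invalidated and rebuilt.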

## The Adjacency Table

- Update the FIB when changes in the routing table occur
- Update the adjacency table when changes in connected adjacencies occur

## **Adjacency Table Entries**

- Auto adjacencies
- Punt Adjacencies
- Glean Adjacency
- Drop Adjacencies
- Discard Adjacencies
- Null Adjacencies
- Cached Adjacencies

# **Adjacency Table Entries (Auto)**

- Auto Adjacencies: The most common type of adjacency; they include all the information needed to rewrite the packet header and place the packet in the proper interface's output queue

# **Adjacency Table Entries (Auto)**



## **Adjacency Table Entries**

- Punt Adjacencies: A punt adjacency indicates that the packet should be switched by the next slower switching scheme

- Glean Adjacency: Only one per router; indicates that the destination is attached to the router but the layer two information has not been acquired; results in an ARP request when a packet is switched to this destination


Router#sh ip interface brief

Interface IP-Address OK? Method Status Protocol

Ethernet0/0 20.0.0.1 YES manual up up

Router#sh ip cef adjacency glean

Prefix Next Hop Interface

20.0.0.0/8 attached Ethernet0/0





## **Adjacency Table Entries**

- Drop Adjacency: Indicates the packet should be dropped

# **Adjacency Table Entries (Drop)**


Router#sh ip cef adjacency drop

Prefix Next Hop Interface

224.0.0.0/4 drop

## **Adjacency Table Entries**

- Discard Adjacency: Indicates destinations which are part of a loopback's subnet, but are not the actual IP address configured on the interface

# **Adjacency Table Entries (Discard)**



## **Adjacency Table Entries**

- Null Adjacency: Indicates the packet should be switched to a Null interface on the router

# **Adjacency Table Entries (Null)**


Router(config)#ip route 60.0.0.0 255.0.0.0 null0

Router#sh ip cef adjacency null

Prefix Next Hop Interface

60.0.0.0/8 attached Null0
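The adjacency types described so far can be summarized as a dispatch on the type found at the end of the lookup. This is a simplified sketch: `AdjType` and `forward` are hypothetical names, the action strings are invented labels, and cached adjacencies are left out because the slides do not describe their handling.

```python
from enum import Enum


class AdjType(Enum):
    AUTO = "auto"
    PUNT = "punt"
    GLEAN = "glean"
    DROP = "drop"
    DISCARD = "discard"
    NULL = "null"


def forward(adj_type, payload, rewrite=b""):
    """Return (action, data) for a packet that resolved to this adjacency type."""
    if adj_type is AdjType.AUTO:
        return ("transmit", rewrite + payload)   # rewrite header and enqueue
    if adj_type is AdjType.PUNT:
        return ("punt", payload)                 # next slower switching scheme
    if adj_type is AdjType.GLEAN:
        return ("arp_request", payload)          # layer 2 info not yet known
    return ("drop", None)                        # drop, discard and null


auto_result = forward(AdjType.AUTO, b"packet", rewrite=bytes.fromhex("0f000800"))
glean_result = forward(AdjType.GLEAN, b"packet")
null_result = forward(AdjType.NULL, b"packet")
```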




router#show ip cef 33.97.1.0 255.255.255.0 detail

33.97.1.0/24, version 13, attached, connected, cached adjacency to Serial4/3

0 packets, 0 bytes

via Serial4/3, 0 dependencies

valid cached adjacency

- `cached adjacency`: the type of adjacency this CEF table entry points to
- `0 dependencies`: the number of table entries which point to (depend on) this entry
- `0 packets, 0 bytes`: the number of packets and bytes which have been switched through this entry; configure `ip cef accounting per-prefix` for this to work


```
router#show ip cef summary

IP CEF with switching (Table Version 46), flags=0x0

22 routes, 0 reresolve, 0 unresolved (0 old, 0 new), peak 0

25 leaves, 19 nodes, 22960 bytes, 49 inserts, 24 invalidations
0 load sharing elements, 0 bytes, 0 references
universal per-destination load sharing algorithm, id F2F8D257
```

- `22 routes`: total number of entries in the CEF table
- `0 reresolve`: number of entries which need to be re-resolved
- `0 unresolved`: number of entries which do not have resolved recursions



- A: Packets and Bytes Switched Through This Adjacency
- **B: MAC Header Rewrite String**
- C: When This Entry Will Be Refreshed; In This Case, All Point2Points Are Refreshed Every Minute


router#show int ethernet1/0 stat
Ethernet1/0

| Switching path | Pkts In | Chars In | Pkts Out | Chars Out |
|----------------|---------|----------|----------|-----------|
| Processor      | 977121  | 70149655 | 578014   | 56457133  |
| Route cache    | 0       | 0        | 0        | 0         |

#### **Route cache Includes CEF Switched Packets**

# Router Architectures & Parallel Express Forwarding

#### Introduction

Routers have to deal in three "planes" of operation:

- The "Control" Plane: building and maintaining data structures such as "forwarding tables"
- The "Management" Plane: dealing with configuration files, gathering and providing statistics, providing and responding to control protocol messages
- The "Data" Plane: switching of packets, manipulation of packets (header and content), packet delivery scheduling (queuing)

#### **Introduction – Consumable Resources**

When any or all of the resources are exhausted, inconsistent behavior will be observed

## **Routers Operationally**

- Maintain/manipulate routing information
  - Listen for updates/update neighbors
- Classify packets for manipulation/queuing/permit-deny, etc.
  - Compare packets to classification lists and perform control
- Perform Layer 3 switching
  - Create outbound Layer 2 encapsulation
  - Layer 3 checksum
  - TTL/hop count update
- Management/billing (statistics)
  - Interface statistics, NetFlow export
  - Telnet, SNMP, ping, trace route, HTTP

## **Routers Functionally**

- (Attempt to) switch packets
  - Layer 3 switching based on routing information
- (Attempt to) transmit packets
  - Access outbound media
- Manipulate packets
  - Change contents of packet (CAR/NAT/compression/encryption)
- Consume packets
  - Routing protocol updates, service advertisements (SAP), ICMP, SNMP
- Generate packets
  - Routing protocol packets, SAPs, ICMP, SNMP
  - Tunnels: GRE, IPSec, DLSw, etc.

## **Router Hardware**

- Interface Processors
- The Central Processing Unit
- Memory
- The Backplane



## The Central Processing Unit

- Provides horsepower for all control plane functions, such as system maintenance, building routing tables, etc.
- On some platforms, it also provides the horsepower for actually switching packets



### **Shared Memory Architecture**

#### Applicable Platforms

Cisco 1xxx, Cisco 2xxx, Cisco 3xxx, Cisco 4xxx

## **Shared Memory Architecture**



#### **Shared Memory Architecture (Hardware "Assist")**

#### Applicable Platforms

Cisco 7200, Cisco 7300, Cisco 7400, Cisco 10000

#### **Shared Memory Architecture (Hardware "Assist")**



## **Distributed Shared Memory**

Applicable Platforms

Cisco 7500, Catalyst 5xxx RSM

## **Distributed Shared Memory**



#### **Distributed Cross Bar**

Applicable Platforms

Cisco 6500/7600 OSR, Cisco 12000 (GSR)

#### **Cross Bar Data Path**

(Figure: a central CPU with CPU memory (DRAM) holding the forwarding table; interface cards, each with its own CPU, packet memory, and Tx/Rx paths, some holding a forwarding table (FT) copy; serial input/output lines running "to" and "from" the ASIC X-Bar fabric)

#### **Cross Bar Data Path (Multiple Fabrics)**



#### Parallel Express Forwarding

- PXF is one kind of "Function Specific Hardware"
- PXF Architecture
- PXF packet switching

# Cisco-Developed Unique Value-Add PXF IP Services Processor


#### Internally-Developed Cisco Processing Technology





#### **Benefit of PXF Acceleration**



### Parallel eXpress Forwarding (PXF) Engine

- New technology switching engine for high-touch L3 services with optimized throughput
- Programmable architecture to allow for future feature upgrades
- Based on custom pipelined array processor (ASIC)

# Power of Cisco Parallel Processing


#### **PXF Network Processor**



- Matrix of separate processors
- Implements "assembly line" for exceptional performance
- "Assembly line" enables consistent throughput
- Little deviation when services are enabled/disabled

#### **PXF Processor Services**

#### **Example**



### Parallel eXpress Forwarding (PXF)

- Each PXF ASIC has 16 processors arranged in 4 rows x 4 columns
- Two PXF ASICs connected serially: 4x8, 32 CPUs total for an ESR
  - Parallelism and pipelining => Improved feature throughput
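The effect of combining parallelism and pipelining can be estimated with simple arithmetic. The model below is an assumption, not a description of the actual PXF microarchitecture: it treats each row as an independent pipeline, each column as one stage taking one cycle, and compares that with a single processor running all stages back to back. `pxf_cycles` and `serial_cycles` are hypothetical names.

```python
def pxf_cycles(num_packets, rows=4, columns=8):
    """Cycles to push packets through rows of pipelines with `columns` stages."""
    if num_packets == 0:
        return 0
    per_row = -(-num_packets // rows)      # ceiling division: packets per row
    return columns + per_row - 1           # fill the pipeline, then one per cycle


def serial_cycles(num_packets, stages=8):
    """Cycles for one processor that runs every stage for every packet."""
    return num_packets * stages


processors = 4 * 8                         # two 4x4 ASICs in series: 32 CPUs
pipelined = pxf_cycles(32)                 # 8 packets per row
single = serial_cycles(32)
```

Under these assumptions a single packet still takes as long as on one processor; the gain is in sustained throughput once the pipeline is full.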






#### **PXF Packet Forwarding**

- Packets arrive from the line cards through the input interface
- Headers pass through the Toaster (To-Toaster and From-Toaster buffer complexes, each with SDRAM)
- Modified headers and bodies are moved to packet buffer memory (SDRAM)
- Complete packets are moved from SDRAM to the output line cards through the output queue controller

## Summary


- 90 minutes is way too much to summarize in one slide, and not enough time to cover these topics!
- Routers scale based on CPU, processing hardware, memory and bandwidth
- Resource exhaustion results in dropped packets
- No one architecture has all the answers; different platforms are appropriate for different roles in your network

#### Recommended Reading

- **Inside Cisco IOS Software Architecture** (CCIE Professional Development), ISBN: 1-57870-181-3
- **IP Routing Fundamentals**, ISBN: 1-57870-071-X
- **Cisco Router Configuration, Second Edition**, ISBN: 1-57870-241-0
- **CEF Whitepaper:** http://www.cisco.com/warp/public/732/Tech/switching/docs/cef\_ov\_final.pdf

## Life Is Short, Eat Dessert First!



