RSVP graceful restart allows RSVP TE enabled nodes to recover gracefully following a node failure in the network such that the RSVP state after the failure is restored as quickly as possible. The node failure may be completely transparent to other nodes in the network.
RSVP graceful restart preserves the label values and forwarding information and works with third-party or Cisco routers seamlessly.
RSVP graceful restart depends on RSVP hello messages to detect that a neighbor went down. Hello messages include Hello Request or Hello Acknowledgment (ACK) objects between two neighbors.
A node hello is transmitted when graceful restart is globally configured and the first LSP to the neighbor is created.
Interface hello is an optional configuration. If you configure the graceful restart Hello command on an interface, the interface hello is considered to be an additional hello instance with the neighbor.
The router transmits an interface hello for graceful restart when all of the following conditions are met:
Graceful restart is configured globally.
Graceful restart is configured on the interface.
An LSP to the neighboring router is created and goes over the interface.
Cisco recommends that you use node hellos if the neighbor supports node hellos, and configure interface hellos only if the neighbor router does not support node hellos.
Interface hellos differ from node hellos. as follows:
—The source address in the IP header of the hello message has an IP address that matches the interface that the Hello message sent out. The destination address in the IP header is the interface address of the neighbor on the other side of the link. A TTL of 1 is used for per-interface hellos as it is destined for the directly-connected neighbor.
—The source address in the IP header of the Hello message includes the TE router ID of the sending router. The destination address of the IP header has the router ID of the neighbor to which this message is sent. A TTL of more than 1 is used.
The figure below shows the graceful restart extension to these messages that an object called Restart_Cap, which tells neighbors that a node, may be capable of restarting if a failure occurs. The time-to-live (TTL) in these messages is set to 255 so that adjacencies can be maintained through alternate paths even if the link between two neighbors goes down.
Figure 1. How Graceful Restart Works
The Restart_Cap object has two values—the restart time, which is the sender’s time to restart the RSVP_TE component and exchange hello messages after a failure; and the recovery time, which is the desired time that the sender wants the receiver to synchronize the RSVP and MPLS databases.
In the figure above, graceful restart is enabled on Router 1, Router 2, Router 3, and Router 4. For simplicity, assume that all routers are restart capable. A TE label switched path (LSP) is signaled from Router 1 to Router 4.
Router 2 and Router 3 exchange periodic graceful restart hello messages every 10000 ms (10 seconds), and so do Router 2 and Router 1 and Router 3 and Router 4. Assume that Router 2 advertises its restart time as 60000 ms (60 seconds) and its recovery time as 60000 ms (60 seconds) as shown in the following example:
23:33:36: Outgoing Hello:
23:33:36: version:1 flags:0000 cksum:883C ttl:255 reserved:0 length:32
23:33:36: HELLO type HELLO REQUEST length 12:
23:33:36: Src_Instance: 0x6EDA8BD7, Dst_Instance: 0x00000000
23:33:36: RESTART_CAP type 1 length 12:
23:33:36: Restart_Time: 0x0000EA60
, Recovery_Time: 0x0000EA60
The restart and recovery time are shown in
bold in the last entry.
Router 3 records this into its database. Also, both neighbors maintain the neighbor status as UP. However, Router 3’s control plane fails at some point (for example, a Primary Route Processor failure). As a result, RSVP and TE lose their signaling information and states although data packets continue to be forwarded by the line cards.
When four ACK messages are missed from Router 2 (40 seconds), Router 3 declares communication with Router 2 lost “indicated by LOST” and starts the restart time to wait for the duration advertised in Router 2’s restart time previously and recorded (60 seconds). Router 1 and Router 2 suppress all RSVP messages to Router 3 except hellos. Router 3 keeps sending the RSVP Path and Resv refresh messages to Router 4 and Router 5 so that they do not expire the state for the LSP; however, Router 3 suppresses these messages for Router 2.
A node restarts if it misses four ACKs or its hello src_instance (last source instance sent to its neighbor) changes so that its restart time = 0.
Before the restart time expires, Router 2 restarts and loads its configuration and graceful restart makes the configuration of Router 2 send the hello messages with a new source instance to all the data links attached. However, because Router 2 has lost the neighbor states, it does not know what destination instance it should use in those messages; therefore, all destination instances are set to 0.
When Router 3 sees the hello from Router 2, Router 3 stops the restart time for Router 2 and sends an ACK message back. When Router 3 sees a new source instance value in Router 2’s hello message, Router 3 knows that Router 2 had a control plane failure. Router 2 gets Router 3’s source instance value and uses it as the destination instance going forward.
Router 3 also checks the recovery time value in the hello message from Router 2. If the recovery time is 0, Router 3 knows that Router 2 was not able to preserve its forwarding information and Router 3 deletes all RSVP state that it had with Router 2.
If the recovery time is greater than 0, Router 1 sends Router 2 Path messages for each LSP that it had previously sent through Router 2. If these messages were previously refreshed in summary messages, they are sent individually during the recovery time. Each of these Path messages includes a Recovery_Label object containing the label value received from Router 2 before the failure.
When Router 3 receives a Path message from Router 2, Router 3 sends a Resv message upstream. However, Router 3 suppresses the Resv message until it receives a Path message.