IPSec Anti-Replay Check Failures

Document ID: 116858

Updated: Jan 14, 2014

Contributed by Atri Basu, Wen Zhang, and Nehal Naik, Cisco TAC Engineers.

Introduction

This document describes Internet Protocol Security (IPSec) anti-replay check failures and provides troubleshooting procedures and possible solutions to the problem.

Note: Anti-replay protection is an important security service that the IPSec protocol offers. Disabling IPSec anti-replay has security implications and should only be done with caution.

Background Information

Replay Attack Description

A replay attack is a form of network attack in which a valid data transmission is maliciously or fraudulently repeated or delayed. It is an attempt to subvert security by someone who records legitimate communications and repeats them in order to impersonate a valid user, and to disrupt or cause negative impact for legitimate connections.

Replay Check Failure Description

IPSec protects against an attacker who duplicates encrypted packets by assigning a monotonically increasing sequence number to each encrypted packet. The receiving IPSec endpoint uses these numbers, together with a sliding window of acceptable sequence numbers, to keep track of which packets it has already processed. Currently, the default anti-replay window size in the Cisco IOS® implementation is 64 packets. This is illustrated in this figure:

Here are the steps to process incoming IPSec traffic on the receiving tunnel endpoint with anti-replay enabled:

  1. When a packet is received, if the sequence number falls within the window and was not previously received, the packet is accepted, and marked as received before it is sent to integrity verification.

  2. If the sequence number falls within the window and was previously received, the packet is dropped, and the replay counter is incremented.

  3. If the sequence number is greater than the highest sequence number in the window, the packet is accepted, and marked as received. The sliding window is then moved to the right.

    Note: This only occurs if the packet is valid and passes integrity checks.



  4. If the sequence number is less than the lowest sequence in the window, the packet is dropped, and the replay counter is incremented.

In the second and fourth scenarios, a replay check failure occurs, and the router displays an error message similar to this:

%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed connection id=#, sequence
number=#
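The four scenarios above can be sketched in a few lines of Python. This is a minimal illustration of a counter-based sliding window, not the Cisco IOS implementation: the class name `ReplayWindow`, its attributes, and the `check` method are hypothetical, and the sketch leaves out the integrity verification that a real endpoint performs before it moves the window.

```python
class ReplayWindow:
    """Illustrative counter-based anti-replay window (hypothetical names)."""

    def __init__(self, size=64):
        self.size = size            # window size in packets (64 is the default cited above)
        self.highest = 0            # highest sequence number accepted so far
        self.seen = set()           # sequence numbers in the window already received
        self.replay_failures = 0    # stands in for the replay counter

    def check(self, seq):
        """Return True if the packet is accepted, False on a replay check failure."""
        # ESP sequence numbers start at 1, so the window never extends below 1.
        lowest = max(1, self.highest - self.size + 1)
        if seq > self.highest:
            # Scenario 3: right of the window. Accept and slide the window.
            self.highest = seq
            self.seen.add(seq)
            new_lowest = seq - self.size + 1
            self.seen = {s for s in self.seen if s >= new_lowest}
            return True
        if seq < lowest:
            # Scenario 4: left of the window. Drop and count.
            self.replay_failures += 1
            return False
        if seq in self.seen:
            # Scenario 2: inside the window but already received. Drop and count.
            self.replay_failures += 1
            return False
        # Scenario 1: inside the window and not yet received. Accept.
        self.seen.add(seq)
        return True
```

For example, after `check(100)` on a window of size 64, the window covers sequence numbers 37 through 100, so a late packet with sequence number 36 fails the check while a late packet with sequence number 37 is still accepted.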

Note: Group Encrypted Transport VPN (GETVPN) has an entirely different anti-replay check called Time Based Anti-Replay Failure. This document only covers counter-based anti-replay.

Problem

As previously described, the purpose of replay checks is to protect against malicious repetitions of packets. However, there are some scenarios where a failed replay check might not be due to a malicious reason:

  • The error might result from a packet reorder in the transmission medium. This is especially true if parallel paths exist.

  • The error might be caused by unequal packet processing paths inside Cisco IOS. For example, on a system under load, large IPSec packets that require IP reassembly before decryption might be delayed long enough to fall outside of the replay window by the time they are processed.

  • The error might be caused by Quality of Service (QoS) enabled on the sending IPSec endpoint. With the Cisco IOS implementation, IPSec encryption happens before QoS in the egress direction. Certain QoS features, such as Low Latency Queueing (LLQ), can cause IPSec packets to be delivered out of order and then dropped by the receiving endpoint due to a replay check failure.
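The QoS case can be illustrated with a toy simulation. The sketch below assumes a strict-priority scheduler that drains every queued priority packet before any best-effort packet. This is a deliberate simplification of LLQ, and the function names, the traffic mix, and the window size of 128 are hypothetical values chosen only for illustration.

```python
def count_replay_drops(arrival_order, window=64):
    """Count packets that would fail a counter-based replay check."""
    highest = 0      # highest sequence number accepted so far
    seen = set()     # sequence numbers already accepted
    drops = 0
    for seq in arrival_order:
        if seq > highest:
            highest = seq                    # window slides to the right
            seen.add(seq)
        elif seq <= highest - window or seq in seen:
            drops += 1                       # left of the window, or a duplicate
        else:
            seen.add(seq)                    # late, but still inside the window
    return drops


def llq_egress(queued):
    """Toy strict-priority scheduler (a simplification of LLQ).

    `queued` is a list of (sequence_number, is_priority) tuples in the order
    the packets were encrypted, all waiting during one congestion period.
    """
    priority = [seq for seq, is_prio in queued if is_prio]
    best_effort = [seq for seq, is_prio in queued if not is_prio]
    return priority + best_effort


# Sequence numbers 1-10 are best-effort data, 11-110 are priority (voice).
queued = [(seq, seq > 10) for seq in range(1, 111)]
wire_order = llq_egress(queued)

print(count_replay_drops(wire_order, window=64))    # 10 data packets dropped
print(count_replay_drops(wire_order, window=128))   # 0 packets dropped
```

Because the sequence numbers are assigned before the scheduler reorders the packets, the ten data packets arrive after sequence number 110 and fall to the left of a 64-packet window, while a larger window still covers them.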

Troubleshoot IPSec Replay Drops

The key to troubleshooting IPSec replay drops is to identify the packets dropped due to replay, and to use packet captures in order to confirm whether these packets are indeed replayed packets or packets that arrived on the receiving router outside of the replay window. In order to correctly match the dropped packets to what is captured in the sniffer trace, the first step is to identify the peer and the IPSec flow to which the dropped packets belong. This is done differently based on the router platform.

Cisco Integrated Services Router (ISR)/ISR G2 Platform that Runs Cisco IOS Classic

In order to troubleshoot on this platform, identify the conn-id in the error message and look for it in the show crypto ipsec sa output, since replay is a per-Security Association (SA) check (as opposed to a per-peer check). The syslog message also provides the Encapsulating Security Payload (ESP) sequence number, which helps uniquely identify the dropped packet in the packet capture.

Note: With different versions of code, the conn-id is either the conn id or flow_id for the inbound SA.

This is illustrated here:

%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed
connection id=529, sequence number=13


Router#show crypto ipsec sa | in peer|conn id
current_peer 10.2.0.200 port 500
conn id: 529, flow_id: SW:529, sibling_flags 80000046, crypto map: Tunnel0-head-0
conn id: 530, flow_id: SW:530, sibling_flags 80000046, crypto map: Tunnel0-head-0
Router#


Router#show crypto ipsec sa peer 10.2.0.200 detail

interface: Tunnel0
Crypto map tag: Tunnel0-head-0, local addr 10.1.0.100

protected vrf: (none)
local ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
remote ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
current_peer 10.2.0.200 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 27, #pkts encrypt: 27, #pkts digest: 27
#pkts decaps: 27, #pkts decrypt: 27, #pkts verify: 27
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#pkts no sa (send) 0, #pkts invalid sa (rcv) 0
#pkts encaps failed (send) 0, #pkts decaps failed (rcv) 0
#pkts invalid prot (recv) 0, #pkts verify failed: 0
#pkts invalid identity (recv) 0, #pkts invalid len (rcv) 0
#pkts replay rollover (send): 0, #pkts replay rollover (rcv) 0
##pkts replay failed (rcv): 21
#pkts internal err (send): 0, #pkts internal err (recv) 0

local crypto endpt.: 10.1.0.100, remote crypto endpt.: 10.2.0.200
path mtu 2000, ip mtu 2000, ip mtu idb Serial2/0
current outbound spi: 0x8B087377(2332586871)
PFS (Y/N): N, DH group: none

inbound esp sas:
spi: 0xE7EDE943(3891128643)
transform: esp-gcm ,
in use settings ={Tunnel, }
conn id: 529, flow_id: SW:529, sibling_flags 80000046, crypto map:
Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4509600/3223)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE

<SNIP>

As can be seen from this output, the replay drop is from the 10.2.0.200 peer address with an inbound ESP SA Security Parameter Index (SPI) of 0xE7EDE943. It can also be noted from the log message itself that the ESP sequence number for the dropped packet is 13. So, the combination of peer address, SPI number, and the ESP sequence number can be used in order to uniquely identify the packet dropped in the packet capture.
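When many of these messages must be matched against a capture, the connection ID and sequence number can be pulled out of the syslog text with a short script. The sketch below is an assumption-laden helper, not a Cisco tool: the function name `parse_replay_error` is hypothetical, and the pattern is based only on the message format shown in this document.

```python
import re

# Pattern modelled on the %CRYPTO-4-PKT_REPLAY_ERR message shown above.
# \s+ tolerates the line wrap between "failed" and "connection id".
REPLAY_RE = re.compile(
    r"%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed\s+"
    r"connection id=(?P<conn_id>\d+),\s*sequence\s+number=(?P<seq>\d+)"
)


def parse_replay_error(message):
    """Return (conn_id, esp_sequence_number), or None if the text does not match."""
    match = REPLAY_RE.search(message)
    if match is None:
        return None
    return int(match.group("conn_id")), int(match.group("seq"))


log_line = (
    "%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed\n"
    "connection id=529, sequence number=13"
)
print(parse_replay_error(log_line))   # (529, 13)
```

The connection ID is then looked up in the show crypto ipsec sa output in order to find the peer and SPI, as described above.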

Note: The Cisco IOS syslog message is rate-limited for dataplane packet drops. In order to get an accurate count of the packets dropped, use the show crypto ipsec sa detail command as shown previously. Also note that in code earlier than Cisco IOS Version 12.4(4)T, the counters might be updated incorrectly. This is fixed in Cisco bug ID CSCsa90034.

Cisco Aggregation Services Router (ASR) that Runs Cisco IOS-XE

On the ASR platform, the REPLAY_ERROR reported in some of the earlier Cisco IOS-XE releases might not print the actual IPSec flow where the replayed packet is dropped, as shown here:

%IOSXE-3-PLATFORM: F0: cpp_cp: QFP:00 Thread: 095 TS:00000000240306197890
%IPSEC-3-REPLAY_ERROR:
 IPSec SA receives anti-replay error, DP Handle 3

In order to identify the correct IPSec peer and flow information, use the Data Plane (DP) Handle printed in the syslog message as the SA Handle input parameter in this command, which retrieves the IPSec flow information on the Quantum Flow Processor (QFP):

Router#show platform hardware qfp active feature ipsec sa 3
QFP ipsec sa Information

QFP sa id: 3
pal sa id: 2
QFP spd id: 1
QFP sp id: 2
QFP spi: 0x4c1d1e90(1276976784)
crypto ctx: 0x000000002e03bfff
flags: 0xc000800 (Details below)
: src:IKE valid:Yes soft-life-expired:No hard-life-expired:No
: replay-check:Yes proto:0 mode:0 direction:0
: qos_preclassify:No qos_group:No
: frag_type:BEFORE_ENCRYPT df_bit_type:COPY
: sar_enable:No getvpn_mode:SNDRCV_SA
: doing_translation:No assigned_outside_rport:No
: inline_tagging_enabled:No
qos_group: 0x0
mtu: 0x0=0
sar_delta: 0
sar_window: 0x0
sibling_sa: 0x0
sp_ptr: 0x8c392000
sbs_ptr: 0x8bfbf810
local endpoint: 10.1.0.100
remote endpoint: 10.2.0.200

cgid.cid.fid.rid: 0.0.0.0
ivrf: 0
fvrf: 0
trans udp sport: 0
trans udp dport: 0
first intf name: Tunnel1
<SNIP>

If the Cisco IOS-XE version on the ASR is earlier than Version 3.7, the error message logs only the DP Handle, with no information about the peer or SPI to which the offending packet belongs. This is where Cisco bug ID CSCtw69096 becomes relevant:

CSCtw69096: ASR prints DP Handle in IPsec syslogs. Fixed in XE3.7 / 15.2(4)S.

With this defect fix, the peer IP address and the SPI are also printed. This message:

%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6

now shows up as:

%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6,
src_addr 10.0.0.2, dest_addr 10.0.0.1, SPI 0x1a2b3c4d

On releases without this fix, this Embedded Event Manager (EEM) script can be used in order to see which peer and SPI trigger the anti-replay messages:


event manager applet Replay-Error
 event syslog pattern "%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error"
 action 1.0 regexp "([0-9]+)$" "$_syslog_msg" dph
 action 2.0 cli command "enable"
 action 3.0 cli command "show platform hardware qfp active feature ipsec sa $dph | append bootflash:replay-error.txt"

In order to see the output on the ASR itself, enter the more bootflash:replay-error.txt command periodically.
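For offline analysis of collected syslogs, the same extraction can be sketched in Python. The helper below is hypothetical and is based only on the two message formats shown in this document. It handles both the older format, where the DP Handle is the last field, and the newer format, where the addresses and SPI follow it. Note that the EEM pattern `([0-9]+)$` relies on the DP Handle being at the very end of the message, which is only true of the older format.

```python
import re

# DP Handle is always present; src_addr/dest_addr/SPI appear only in the
# newer message format, so that part of the pattern is optional.
ASR_REPLAY_RE = re.compile(
    r"%IPSEC-3-REPLAY_ERROR:\s*IPSec SA receives anti-replay error,\s*"
    r"DP Handle (?P<dp_handle>\d+)"
    r"(?:,\s*src_addr (?P<src>[\d.]+),\s*dest_addr (?P<dst>[\d.]+),"
    r"\s*SPI (?P<spi>0x[0-9a-fA-F]+))?"
)


def parse_asr_replay_error(message):
    """Return a dict of the fields found, or None if the text does not match."""
    match = ASR_REPLAY_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["dp_handle"] = int(fields["dp_handle"])
    return fields


old_format = (
    "%IPSEC-3-REPLAY_ERROR:\n"
    " IPSec SA receives anti-replay error, DP Handle 3"
)
new_format = (
    "%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6,\n"
    "src_addr 10.0.0.2, dest_addr 10.0.0.1, SPI 0x1a2b3c4d"
)
print(parse_asr_replay_error(old_format))
print(parse_asr_replay_error(new_format))
```

With the older format, only `dp_handle` is populated and the remaining fields are `None`, in which case the DP Handle still has to be resolved with the show platform hardware qfp active feature ipsec sa command.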

Work with the ASR Datapath Packet Tracing Feature

With the more recent Cisco IOS-XE software for the ASR1000, information about the peer as well as the IPSec SPI are also printed in order to help troubleshoot anti-replay problems. However, one key piece of information that is still missing compared to what is printed on the ISR G2 platforms that run Cisco IOS classic is the ESP sequence number. The ESP sequence number is used in order to uniquely identify an IPSec packet within a given IPSec flow. Without the sequence number, it becomes difficult to identify exactly which packet gets dropped in a packet capture.

In Cisco IOS-XE Version 3.10 (15.3(3)S), a new packet tracing infrastructure was introduced in order to help troubleshoot dataplane packet forwarding issues. It can be used in this particular troubleshooting situation, where this replay drop is observed on the ASR:

%IOSXE-3-PLATFORM: F0: cpp_cp: QFP:0.0 Thread:060 TS:00000001132883828011
%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 3, src_addr
10.2.0.200, dest_addr 10.1.0.100, SPI 0x4c1d1e90

In order to help identify the ESP sequence number for the packet dropped, complete these steps with the packet tracing feature:

  1. Set up the platform conditional debugging filter in order to match traffic from the peer device:

    debug platform condition ipv4 10.2.0.200/32 ingress
    debug platform condition start

  2. Enable packet tracing with the copy option in order to copy the packet header information:

    debug platform packet enable
    debug platform packet-trace packet 64
    debug platform packet-trace copy packet input l3 size 100


  3. When replay errors are detected, use the packet trace buffer in order to identify the packet dropped due to replay. The ESP sequence number can be found in the copied packet:

    Router#show platform packet-trace summary 
    Pkt Input Output State Reason
    0 Gi4/0/0 Tu1 CONS Packet Consumed
    1 Gi4/0/0 Tu1 CONS Packet Consumed
    2 Gi4/0/0 Tu1 CONS Packet Consumed
    3 Gi4/0/0 Tu1 CONS Packet Consumed
    4 Gi4/0/0 Tu1 CONS Packet Consumed
    5 Gi4/0/0 Tu1 CONS Packet Consumed
    6 Gi4/0/0 Tu1 DROP 053 (IpsecInput)
    7 Gi4/0/0 Tu1 DROP 053 (IpsecInput)
    8 Gi4/0/0 Tu1 CONS Packet Consumed
    9 Gi4/0/0 Tu1 CONS Packet Consumed
    10 Gi4/0/0 Tu1 CONS Packet Consumed
    11 Gi4/0/0 Tu1 CONS Packet Consumed
    12 Gi4/0/0 Tu1 CONS Packet Consumed
    13 Gi4/0/0 Tu1 CONS Packet Consumed


    The previous output shows that packet numbers 6 and 7 are dropped, so they can be examined in detail now:

    Router#show platform packet-trace packet 6
    Packet: 6 CBUG ID: 6
    Summary
    Input : GigabitEthernet4/0/0
    Output : Tunnel1
    State : DROP 053 (IpsecInput)
    Timestamp : 3233497953773
    Path Trace
    Feature: IPV4
    Source : 10.2.0.200
    Destination : 10.1.0.100
    Protocol : 50 (ESP)
    Feature: IPSec
    Action : DECRYPT
    SA Handle : 3
    SPI : 0x4c1d1e90
    Peer Addr : 10.2.0.200
    Local Addr: 10.1.0.100
    Feature: IPSec
    Action : DROP
    Sub-code : 019 - CD_IN_ANTI_REPLAY_FAIL
    Packet Copy In
    45000428 00110000 fc329575 0a0200c8 0a010064 4c1d1e90 00000006 790aa252
    e9951cd9 57024433 d97c7cb8 58e0c869 2101f1ef 148c2a12 f309171d 1b7a4771
    d8868af7 7bae9967 7d880197 46c6a079 d0143e43 c9024c61 0045280a d57b2f5e
    23f06bc3 ab6b6b81 c1b17936 98939509 7aec966e 4dd848d2 60517162 9308ba5d


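    The ESP sequence number is located at an offset of 24 bytes from the start of the IP header: the 20-byte IP header is followed by the 4-byte SPI (4c1d1e90) and then the 4-byte sequence number, which is the seventh 32-bit word (00000006) in the previous output. In this particular example, the ESP sequence number for the dropped packet is 0x6.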
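The offset arithmetic can be checked with a short script that decodes the copied packet. This is a minimal sketch that assumes the dump starts at the IPv4 header, as in the Packet Copy In output above. The function name `parse_esp_header` and the returned dictionary keys are hypothetical.

```python
import ipaddress
import struct


def parse_esp_header(hex_dump):
    """Decode the IPv4 addresses, ESP SPI, and ESP sequence number from a hex dump."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    ihl = (raw[0] & 0x0F) * 4            # IPv4 header length in bytes (20 without options)
    if raw[9] != 50:                     # IP protocol 50 is ESP
        raise ValueError("not an ESP packet")
    # The ESP header follows the IP header: 4-byte SPI, then 4-byte sequence number.
    spi, seq = struct.unpack_from("!II", raw, ihl)
    return {
        "src": str(ipaddress.IPv4Address(raw[12:16])),
        "dst": str(ipaddress.IPv4Address(raw[16:20])),
        "spi": spi,
        "seq": seq,
    }


# First row of the Packet Copy In output shown above.
packet_copy = "45000428 00110000 fc329575 0a0200c8 0a010064 4c1d1e90 00000006 790aa252"
info = parse_esp_header(packet_copy)
print(info["src"], info["dst"], hex(info["spi"]), info["seq"])
# 10.2.0.200 10.1.0.100 0x4c1d1e90 6
```

The decoded peer address and SPI match the values reported in the REPLAY_ERROR syslog message, and the sequence number of 6 identifies the dropped packet within that flow.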

Solution

After the peer is identified, there are three possible scenarios:

  1. It is a valid packet: Packet captures help confirm whether the packet is actually valid, and whether the problem is insignificant (due to network latency or transmission path issues) or requires more in-depth troubleshooting. For example, the capture shows a packet with a sequence number of X that arrives out of order, and the window size is set to 64. If a packet with a sequence number of X + 64 or higher arrives before packet X, the window has already moved past X, and packet X is dropped due to a replay failure (it is not really an attack).

    In such scenarios, increase the size of the replay window in order to ensure that such delays are accounted for and prevent legitimate packets from being dropped. By default, the window size is fairly small (window size of 64). If you increase the size, it does not greatly increase the risk of an attack. For information on how to configure an IPsec Anti-Replay Window, refer to the How to Configure IPsec Anti-Replay Window: Expanding and Disabling article.

    Note: A commonly encountered problem on ASRs, with respect to the window size, is that the ASR does not actually support a window size of 1024, even though the command provides the option to set the limit to 1024. In that case the window size defaults back to 512, and the window size reported in the show crypto ipsec sa output is incorrect. For more information, refer to Cisco bug ID CSCso45946.



  2. It is a packet that falls outside of the receiver's anti-replay window: If the receiving IPSec endpoint drops replayed packets (as it is supposed to), simultaneous sniffer captures on the WAN side of both the sender and the receiver help determine whether this is caused by misbehavior of the sender or by packets replayed in the transit network.

  3. It is due to QoS configuration on the sender's end: This situation requires careful examination and some QoS tuning in order to mitigate the condition. For a more in-depth description of this topic and a potential solution, refer to the Anti-Replay Considerations in a Voice and Video Enabled IPSec VPN (V3PN) article.


Note: Replay check failures are only seen when an authentication algorithm is enabled in the IPSec transform set. Another way to suppress this error message is to disable authentication and perform encryption only; however, this is strongly discouraged due to the security implications of disabled authentication.
