This document describes Internet Protocol Security (IPSec) anti-replay check failures and provides troubleshooting procedures and possible solutions.
Note: Anti-replay protection is an important security service that the IPSec protocol offers. Disabling IPSec anti-replay has security implications and should be done only with caution.
A replay attack is a form of network attack in which a valid data transmission is maliciously or fraudulently repeated or delayed. It is an attempt to subvert security by someone who records legitimate communications and repeats them in order to impersonate a valid user, and to disrupt or cause negative impact for legitimate connections.
IPSec protects against an attacker who duplicates encrypted packets by assigning a monotonically increasing sequence number to each encrypted packet. The receiving IPSec endpoint uses these numbers, together with a sliding window of all acceptable sequence numbers, to keep track of which packets it has already processed. Currently, the default anti-replay window size in the Cisco IOS® implementation is 64 packets.
Note: Enhancement requests CSCva65805 and CSCva65836 have been filed to increase the default replay window size to 512, because 64 is considered impractically small for modern networks.
This is illustrated in this figure:
Here are the steps to process incoming IPSec traffic on the receiving tunnel endpoint with anti-replay enabled:
Note: This only occurs if the packet is valid and passes integrity checks.
In the second and fourth scenarios, a replay check failure occurs, and the router displays an error message similar to this:
%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed connection id=#, sequence number=#
Note: Group Encrypted Transport VPN (GETVPN) uses an entirely different anti-replay check, called Time-Based Anti-Replay. This document only covers counter-based anti-replay.
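The counter-based sliding window check described above can be sketched in a few lines of Python. This is a generic illustration in the spirit of the ESP anti-replay mechanism, not the Cisco IOS implementation; the class name `ReplayWindow`, its method, and the reason strings are hypothetical, and the sketch assumes the packet has already passed its integrity check.

```python
class ReplayWindow:
    """Counter-based anti-replay window (illustrative sketch, not Cisco code)."""

    def __init__(self, size=64):
        self.size = size      # window size in packets (64 is the default cited above)
        self.highest = 0      # highest sequence number accepted so far
        self.bitmap = 0       # bit i set => sequence (highest - i) was already seen

    def check(self, seq):
        """Return (accepted, reason); the window is updated only on acceptance.

        Assumption: the caller has already verified packet integrity.
        """
        if seq < 1:
            return False, "invalid sequence number"
        if seq > self.highest:
            # To the right of the window: slide the window forward.
            shift = seq - self.highest
            if shift >= self.size:
                self.bitmap = 1                      # all older state falls out
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True, "new highest, window advanced"
        offset = self.highest - seq
        if offset >= self.size:
            # Older than the lowest sequence number the window still covers.
            return False, "left of window"
        if self.bitmap & (1 << offset):
            # Inside the window, but this sequence number was already received.
            return False, "duplicate"
        self.bitmap |= 1 << offset
        return True, "inside window, first time seen"
```

For example, with a 64-packet window, once sequence numbers 1, 2, and 100 have been accepted, sequence number 37 is still accepted (it is 63 behind the highest), a second copy of 37 is rejected as a duplicate, and 36 is rejected because it falls to the left of the window. The two rejection branches correspond to the replay check failures described above.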
As previously described, the purpose of replay checks is to protect against malicious repetitions of packets. However, there are some scenarios where a failed replay check might not be due to a malicious reason:
The key to troubleshooting IPSec replay drops is to identify the packets dropped due to replay, and to use packet captures in order to confirm whether these packets are indeed replayed packets or packets that arrived on the receiving router outside of the replay window. In order to correctly match the dropped packets to what is captured in the sniffer trace, the first step is to identify the peer and the IPSec flow to which the dropped packets belong. This is done differently based on the router platform.
In order to troubleshoot on this platform, identify the conn-id in the error message and look for it in the show crypto ipsec sa output, since replay is a per-SA (Security Association) check (as opposed to a per-peer check). The syslog message also provides the Encapsulating Security Payload (ESP) sequence number, which can help uniquely identify the dropped packet in the packet capture.
Note: With different versions of code, the conn-id is either the conn id or flow_id for the inbound SA.
This is illustrated here:
%CRYPTO-4-PKT_REPLAY_ERR: decrypt: replay check failed
connection id=529, sequence number=13
Router#show crypto ipsec sa | in peer|conn id
current_peer 10.2.0.200 port 500
conn id: 529, flow_id: SW:529, sibling_flags 80000046, crypto map: Tunnel0-head-0
conn id: 530, flow_id: SW:530, sibling_flags 80000046, crypto map: Tunnel0-head-0
Router#
Router#show crypto ipsec sa peer 10.2.0.200 detail
interface: Tunnel0
Crypto map tag: Tunnel0-head-0, local addr 10.1.0.100
protected vrf: (none)
local ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
remote ident (addr/mask/prot/port): (0.0.0.0/0.0.0.0/0/0)
current_peer 10.2.0.200 port 500
PERMIT, flags={origin_is_acl,}
#pkts encaps: 27, #pkts encrypt: 27, #pkts digest: 27
#pkts decaps: 27, #pkts decrypt: 27, #pkts verify: 27
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#pkts no sa (send) 0, #pkts invalid sa (rcv) 0
#pkts encaps failed (send) 0, #pkts decaps failed (rcv) 0
#pkts invalid prot (recv) 0, #pkts verify failed: 0
#pkts invalid identity (recv) 0, #pkts invalid len (rcv) 0
#pkts replay rollover (send): 0, #pkts replay rollover (rcv) 0
##pkts replay failed (rcv): 21
#pkts internal err (send): 0, #pkts internal err (recv) 0
local crypto endpt.: 10.1.0.100, remote crypto endpt.: 10.2.0.200
path mtu 2000, ip mtu 2000, ip mtu idb Serial2/0
current outbound spi: 0x8B087377(2332586871)
PFS (Y/N): N, DH group: none
inbound esp sas:
spi: 0xE7EDE943(3891128643)
transform: esp-gcm ,
in use settings ={Tunnel, }
conn id: 529, flow_id: SW:529, sibling_flags 80000046, crypto map: Tunnel0-head-0
sa timing: remaining key lifetime (k/sec): (4509600/3223)
IV size: 8 bytes
replay detection support: Y
Status: ACTIVE
<SNIP>
As can be seen from this output, the replay drop is from the 10.2.0.200 peer address with an inbound ESP SA Security Parameter Index (SPI) of 0xE7EDE943. It can also be noted from the log message itself that the ESP sequence number for the dropped packet is 13. So, the combination of peer address, SPI number, and the ESP sequence number can be used in order to uniquely identify the packet dropped in the packet capture.
Note: The Cisco IOS syslog message is rate-limited for dataplane packet drops. In order to get an accurate count of the packets dropped, use the show crypto ipsec sa detail command as shown previously. Also note that in code earlier than Cisco IOS Version 12.4(4)T, the counters might be updated incorrectly. This is fixed in Cisco bug ID CSCsa90034.
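As a rough aid for matching a syslog entry to a capture, the following Python sketch extracts the conn-id and ESP sequence number from a PKT_REPLAY_ERR message and builds a display filter from the peer address, SPI, and sequence number. The function names are hypothetical, and the filter syntax assumes the capture is analyzed in Wireshark (the fields `ip.src`, `esp.spi`, and `esp.sequence`); the SPI still has to be looked up in the show crypto ipsec sa output as described above.

```python
import re

# Matches the conn-id and ESP sequence number in a PKT_REPLAY_ERR message.
# \s+ tolerates a line wrap between "sequence" and "number".
REPLAY_RE = re.compile(r"connection id=(\d+),\s*sequence\s+number=(\d+)")


def parse_replay_syslog(message):
    """Return (conn_id, esp_sequence) from the message, or None if absent."""
    match = REPLAY_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def display_filter(peer, spi, sequence):
    """Build a Wireshark-style display filter for one specific ESP packet.

    peer     -- remote crypto endpoint address (string)
    spi      -- inbound ESP SPI as an integer
    sequence -- ESP sequence number from the syslog message
    """
    return (
        f"ip.src == {peer} && esp.spi == {spi:#010x} "
        f"&& esp.sequence == {sequence}"
    )
```

With the values from the example above (peer 10.2.0.200, SPI 0xE7EDE943, sequence number 13), `display_filter` produces `ip.src == 10.2.0.200 && esp.spi == 0xe7ede943 && esp.sequence == 13`.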
On the ASR platform, the REPLAY_ERROR reported in some of the earlier Cisco IOS-XE releases might not print the actual IPSec flow where the replayed packet is dropped, as shown here:
%IOSXE-3-PLATFORM: F0: cpp_cp: QFP:00 Thread: 095 TS:00000000240306197890
%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 3
In order to identify the correct IPSec peer and flow information, use the Data Plane (DP) Handle printed in the syslog message as the input parameter SA Handle in this command in order to retrieve the IPSec flow information on the Quantum Flow Processor (QFP):
Router#show platform hardware qfp active feature ipsec sa 3
QFP ipsec sa Information
QFP sa id: 3
pal sa id: 2
QFP spd id: 1
QFP sp id: 2
QFP spi: 0x4c1d1e90(1276976784)
crypto ctx: 0x000000002e03bfff
flags: 0xc000800 (Details below)
: src:IKE valid:Yes soft-life-expired:No hard-life-expired:No
: replay-check:Yes proto:0 mode:0 direction:0
: qos_preclassify:No qos_group:No
: frag_type:BEFORE_ENCRYPT df_bit_type:COPY
: sar_enable:No getvpn_mode:SNDRCV_SA
: doing_translation:No assigned_outside_rport:No
: inline_tagging_enabled:No
qos_group: 0x0
mtu: 0x0=0
sar_delta: 0
sar_window: 0x0
sibling_sa: 0x0
sp_ptr: 0x8c392000
sbs_ptr: 0x8bfbf810
local endpoint: 10.1.0.100
remote endpoint: 10.2.0.200
cgid.cid.fid.rid: 0.0.0.0
ivrf: 0
fvrf: 0
trans udp sport: 0
trans udp dport: 0
first intf name: Tunnel1
<SNIP>
If the Cisco IOS-XE version on the ASR is earlier than Version 3.7, the error message logs only the DP Handle, with no information about the peer or SPI to which the offending packet belongs. This is where Cisco bug ID CSCtw69096 becomes relevant:
CSCtw69096 ASR prints DP Handle in IPsec syslogs - Fixed in XE3.7 / 15.2(4)S
With this defect fix, the peer IP address and the SPI are also printed. This message:
%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6
now shows up as:
%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 6,
src_addr 10.0.0.2, dest_addr 10.0.0.1, SPI 0x1a2b3c4d
On releases without this fix, this Embedded Event Manager (EEM) script can be used in order to see which peer and SPI trigger the anti-replay messages:
event manager applet Replay-Error
event syslog pattern "%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error"
action 1.0 regexp "([0-9]+)$" "$_syslog_msg" dph
action 2.0 cli command "enable"
action 3.0 cli command "show platform hardware qfp active feature ipsec sa $dph | append bootflash:replay-error.txt"
In order to see the output on the ASR itself, enter the more bootflash:replay-error.txt command periodically.
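When the syslog messages are also collected off-box, a short script can tally the errors per DP Handle. The Python sketch below is only an illustration: the function names are hypothetical, it assumes each message arrives as a single unwrapped string, and the regular expression covers just the two REPLAY_ERROR formats shown above (with and without the src_addr, dest_addr, and SPI fields).

```python
import re
from collections import Counter

# DP Handle is always present; peer addresses and SPI only appear in
# releases that include the fix described above, so that part is optional.
REPLAY_ERROR_RE = re.compile(
    r"%IPSEC-3-REPLAY_ERROR:\s*IPSec SA receives anti-replay error,\s*"
    r"DP Handle (?P<handle>\d+)"
    r"(?:,\s*src_addr\s+(?P<src>[\d.]+),\s*dest_addr\s+(?P<dst>[\d.]+),"
    r"\s*SPI\s+(?P<spi>0x[0-9a-fA-F]+))?"
)


def parse_replay_error(line):
    """Return a dict with handle, src, dst, spi (None when not printed)."""
    match = REPLAY_ERROR_RE.search(line)
    if match is None:
        return None
    return match.groupdict()


def count_by_handle(lines):
    """Count REPLAY_ERROR messages per DP Handle across an iterable of lines."""
    counts = Counter()
    for line in lines:
        record = parse_replay_error(line)
        if record is not None:
            counts[int(record["handle"])] += 1
    return counts
```

Because the syslog message is rate-limited, such a tally indicates which flows are affected rather than the exact number of dropped packets.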
With the more recent Cisco IOS-XE software for the ASR1000, information about the peer as well as the IPSec SPI is also printed in order to help troubleshoot anti-replay problems. However, one key piece of information that is still missing compared to what is printed on the ISR G2 platforms that run Cisco IOS classic is the ESP sequence number. The ESP sequence number is used in order to uniquely identify an IPSec packet within a given IPSec flow. Without the sequence number, it becomes difficult to identify exactly which packet gets dropped in a packet capture.
In Cisco IOS-XE Version 3.10 (15.3(3)S), a new packet tracing infrastructure was introduced in order to help troubleshoot dataplane packet forwarding issues. It can be used in this particular troubleshooting situation, where this replay drop is observed on the ASR:
%IOSXE-3-PLATFORM: F0: cpp_cp: QFP:0.0 Thread:060 TS:00000001132883828011
%IPSEC-3-REPLAY_ERROR: IPSec SA receives anti-replay error, DP Handle 3, src_addr
10.2.0.200, dest_addr 10.1.0.100, SPI 0x4c1d1e90
In order to help identify the ESP sequence number for the packet dropped, complete these steps with the packet tracing feature:
debug platform condition ipv4 10.2.0.200/32 ingress
debug platform condition start
debug platform packet enable
debug platform packet-trace packet 64
debug platform packet-trace copy packet input l3 size 100
Router#show platform packet-trace summary
Pkt Input Output State Reason
0 Gi4/0/0 Tu1 CONS Packet Consumed
1 Gi4/0/0 Tu1 CONS Packet Consumed
2 Gi4/0/0 Tu1 CONS Packet Consumed
3 Gi4/0/0 Tu1 CONS Packet Consumed
4 Gi4/0/0 Tu1 CONS Packet Consumed
5 Gi4/0/0 Tu1 CONS Packet Consumed
6 Gi4/0/0 Tu1 DROP 053 (IpsecInput)
7 Gi4/0/0 Tu1 DROP 053 (IpsecInput)
8 Gi4/0/0 Tu1 CONS Packet Consumed
9 Gi4/0/0 Tu1 CONS Packet Consumed
10 Gi4/0/0 Tu1 CONS Packet Consumed
11 Gi4/0/0 Tu1 CONS Packet Consumed
12 Gi4/0/0 Tu1 CONS Packet Consumed
13 Gi4/0/0 Tu1 CONS Packet Consumed
Router#show platform packet-trace pac 6
Packet: 6 CBUG ID: 6
Summary
Input : GigabitEthernet4/0/0
Output : Tunnel1
State : DROP 053 (IpsecInput)
Timestamp : 3233497953773
Path Trace
Feature: IPV4
Source : 10.2.0.200
Destination : 10.1.0.100
Protocol : 50 (ESP)
Feature: IPSec
Action : DECRYPT
SA Handle : 3
SPI : 0x4c1d1e90
Peer Addr : 10.2.0.200
Local Addr: 10.1.0.100
Feature: IPSec
Action : DROP
Sub-code : 019 - CD_IN_ANTI_REPLAY_FAIL
Packet Copy In
45000428 00110000 fc329575 0a0200c8 0a010064 4c1d1e90 00000006 790aa252
e9951cd9 57024433 d97c7cb8 58e0c869 2101f1ef 148c2a12 f309171d 1b7a4771
d8868af7 7bae9967 7d880197 46c6a079 d0143e43 c9024c61 0045280a d57b2f5e
23f06bc3 ab6b6b81 c1b17936 98939509 7aec966e 4dd848d2 60517162 9308ba5d
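The ESP sequence number can be read out of the Packet Copy In hex dump, because the ESP header (a 4-byte SPI followed by a 4-byte sequence number) starts immediately after the IPv4 header. The Python sketch below decodes those fields; the function name is hypothetical, and it assumes the copy starts at the IPv4 header (as requested with the l3 option) and carries plain ESP (IP protocol 50), not UDP-encapsulated ESP.

```python
import ipaddress
import struct


def parse_esp_from_hex(dump):
    """Decode source, destination, SPI, and ESP sequence from a hex dump.

    Assumes the dump begins at the IPv4 header and the payload is plain ESP.
    """
    data = bytes.fromhex("".join(dump.split()))
    version_ihl = data[0]
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4      # IHL is counted in 32-bit words
    if data[9] != 50:
        raise ValueError("IP protocol is not ESP (50)")
    src = ipaddress.IPv4Address(data[12:16])
    dst = ipaddress.IPv4Address(data[16:20])
    # ESP header: 32-bit SPI, then 32-bit sequence number, network byte order.
    spi, sequence = struct.unpack("!II", data[header_len:header_len + 8])
    return {"src": str(src), "dst": str(dst), "spi": spi, "sequence": sequence}


# First line of the packet copy shown above.
packet_copy = (
    "45000428 00110000 fc329575 0a0200c8 0a010064 4c1d1e90 00000006 790aa252"
)
fields = parse_esp_from_hex(packet_copy)
print(fields["src"], fields["dst"], hex(fields["spi"]), fields["sequence"])
```

For the packet copy above, this yields source 10.2.0.200, destination 10.1.0.100, SPI 0x4c1d1e90 (which matches the SPI in the path trace), and ESP sequence number 6.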
After the peer is identified, there are three possible scenarios:
Tip: If the replay window is disabled or altered in the IPSec profile and the IPSec profile is used with tunnel protection on a Virtual Tunnel Interface (VTI), the changes will not take effect until the protection profile is either removed and reapplied or the tunnel interface is reset. This is expected behavior because IPSec profiles are just a template to create the tunnel profile map when the tunnel interface is enabled (not shut). Once the interface is already up, changes to the profile do not impact the tunnel until re-applied or the interface is reset.
Note: A commonly encountered problem on ASRs, with respect to the anti-replay window size, is that the classic ASR1K models (such as the ASR1K with ESP5, ESP10, ESP20, and ESP40, along with the ASR1001) do not actually support a window size of 1024. Even though the command allows you to set this limit to 1024, the window size is reset to 512 by the hardware. Because of this, the window size that is reported in the show crypto ipsec sa command output might not be correct. Enter the show crypto ipsec sa peer ip-address platform command in order to verify the hardware anti-replay window size. The default window size is 64 packets on all platforms. For more information, refer to Cisco bug ID CSCso45946. Newer ASR1K models (such as the ASR1K with ESP100 and ESP200, the ASR1001-X and ASR1002-X, and also the ISR-4400) do support a window size of 1024 packets in Versions 15.2(2)S and later.
Note: Replay check failures are only seen when an authentication algorithm is enabled in the IPSec transform set. Another way to suppress this error message is to disable authentication and perform encryption only; however, this is strongly discouraged due to the security implications of disabled authentication.