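This document describes how to troubleshoot a cluster setup on the Firepower Next-Generation Firewall (NGFW). Most of the items covered in this document also apply fully to Adaptive Security Appliance (ASA) cluster troubleshooting.
Cisco recommends that you have knowledge of these topics (see the Related Information section for links):
The information in this document was created from devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
The configuration part of a cluster deployment is covered in the FMC and FXOS configuration guides:
It is important to understand how a Firepower 41xx or 93xx series appliance handles transit packets:
A Firepower appliance provides multiple capture points that give visibility into transit flows. When you troubleshoot with clustering enabled, the main challenges are:
This image shows a 2-unit cluster (for example, FP41xx/FP9300):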
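When an asymmetric TCP connection is established, the TCP SYN and SYN/ACK exchange looks like this:
Forward traffic
Return traffic
For more details on this scenario, read the related section in the Cluster Connection Establishment Case Studies.
Based on this packet exchange, all the possible cluster capture points are:
For the forward traffic (for example, TCP SYN), capture on:
For the return traffic (for example, TCP SYN/ACK), capture on: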
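How to Enable Cluster Captures
FXOS Captures
The process is described in the FXOS Configuration Guide: Packet Capture
Note: FXOS captures can only be taken in the ingress direction from the internal switch point of view.
Data Plane Captures
The recommended way to enable a capture on all cluster members is with the cluster exec command.
Consider a 3-unit cluster:
To verify whether there are active captures on all cluster units, use this command: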
firepower# cluster exec show capture
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
firepower#
To enable a data plane capture on all units on Po1.201 (INSIDE):
firepower# cluster exec capture CAPI interface INSIDE
It is highly recommended to specify a capture filter and, if you expect a lot of traffic, to increase the capture buffer:
firepower# cluster exec capture CAPI buffer 33554432 interface INSIDE match tcp host 192.168.240.50 host 192.168.241.50 eq 80
Verification
firepower# cluster exec show capture
unit-1-1(LOCAL):******************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 5140 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-2-1:*************************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 260 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-3-1:*************************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
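When several units and captures are involved, it can help to tabulate how many bytes each unit has captured. The sketch below parses text in the shape of the `cluster exec show capture` output shown above. The function name and the regular expressions are illustrative assumptions based only on that sample, and other software versions may format the output differently.

```python
import re

# Unit separator lines such as "unit-1-1(LOCAL):*****" or "unit-2-1:*****".
UNIT_HEADER = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# Capture summary lines such as "capture CAPI ... [Capturing - 5140 bytes]".
CAPTURE_LINE = re.compile(r"^capture (\S+) .*\[[A-Za-z ]+ - (\d+) bytes\]")


def capture_bytes_per_unit(output: str) -> dict[str, dict[str, int]]:
    """Map each unit name to {capture name: bytes currently in the buffer}."""
    result: dict[str, dict[str, int]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = UNIT_HEADER.match(line)
        if header:
            current = header.group(1)
            result[current] = {}
            continue
        capture = CAPTURE_LINE.match(line)
        if capture and current is not None:
            result[current][capture.group(1)] = int(capture.group(2))
    return result


SAMPLE = """\
unit-1-1(LOCAL):******************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 5140 bytes]
  match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-2-1:*************************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 260 bytes]
  match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-3-1:*************************************************************
capture CAPI type raw-data buffer 33554432 interface INSIDE [Capturing - 0 bytes]
  match tcp host 192.168.240.50 host 192.168.241.50 eq www
"""

for unit, captures in capture_bytes_per_unit(SAMPLE).items():
    print(unit, captures)
```

A unit that shows 0 bytes for a filtered capture has not seen matching packets on that interface, which is a quick first hint about which units take part in a flow.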
To see the contents of all captures (this output can be very long):
firepower# terminal pager 24
firepower# cluster exec show capture CAPI
unit-1-1(LOCAL):******************************************************
21 packets captured
1: 11:33:09.879226 802.1Q vlan#201 P0 192.168.240.50.45456 > 192.168.241.50.80: S 2225395909:2225395909(0) win 29200 <mss 1460,sackOK,timestamp 1110209649 0,nop,wscale 7>
2: 11:33:09.880401 802.1Q vlan#201 P0 192.168.241.50.80 > 192.168.240.50.45456: S 719653963:719653963(0) ack 2225395910 win 28960 <mss 1380,sackOK,timestamp 1120565119 1110209649,nop,wscale 7>
3: 11:33:09.880691 802.1Q vlan#201 P0 192.168.240.50.45456 > 192.168.241.50.80: . ack 719653964 win 229 <nop,nop,timestamp 1110209650 1120565119>
4: 11:33:09.880783 802.1Q vlan#201 P0 192.168.240.50.45456 > 192.168.241.50.80: P 2225395910:2225396054(144) ack 719653964 win 229 <nop,nop,timestamp 1110209650 1120565119>
unit-2-1:*************************************************************
0 packet captured
0 packet shown
unit-3-1:*************************************************************
0 packet captured
0 packet shown
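Capture Traces
If you want to see how the data plane on each unit handles the ingress packets, use the trace keyword. This traces the first 50 ingress packets. You can trace up to 1000 ingress packets. Note that if multiple captures are applied on an interface, a single packet is traced only once.
To trace the first 1000 ingress packets on interface OUTSIDE on all cluster units: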
firepower# cluster exec cap CAPO int OUTSIDE buff 33554432 trace trace-count 1000 match tcp host 192.168.240.50 host 192.168.241.50 eq www
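Once you capture the flow of interest, make sure that you trace the packets of interest on each unit. The important thing to remember is that a specific packet can be #1 on unit-1-1, but #2 on another unit, and so on.
In this example, the SYN/ACK is packet #2 on unit-1-1, but packet #1 on unit-3-1: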
firepower# cluster exec show capture CAPO | include S.*ack
unit-1-1(LOCAL):******************************************************
1: 12:58:31.117700 802.1Q vlan#202 P0 192.168.240.50.45468 > 192.168.241.50.80: S 441626016:441626016(0) win 29200 <mss 1380,sackOK,timestamp 1115330849 0,nop,wscale 7>
2: 12:58:31.118341 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
unit-2-1:*************************************************************
unit-3-1:*************************************************************
1: 12:58:31.111429 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
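Because the same packet carries a different number on each unit, the lookup can be scripted. This is a minimal sketch that assumes the text layout of the `cluster exec show capture` output above. The helper names are hypothetical, and the generated commands only reuse the `cluster exec unit <name> show cap ... packet-number ... trace` form that appears in this document.

```python
import re

UNIT_HEADER = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# A SYN/ACK line has the "S" flag followed by an "ack" field.
SYN_ACK = re.compile(r"^(\d+): .*: S \d+:\d+\(\d+\) ack \d+")


def syn_ack_packet_numbers(output: str) -> dict[str, list[int]]:
    """Return, per unit, the capture packet numbers that are SYN/ACKs."""
    found: dict[str, list[int]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = UNIT_HEADER.match(line)
        if header:
            current = header.group(1)
            found[current] = []
        elif current is not None:
            packet = SYN_ACK.match(line)
            if packet:
                found[current].append(int(packet.group(1)))
    return found


def trace_commands(capture: str, found: dict[str, list[int]]) -> list[str]:
    """Build one trace command per located packet."""
    return [
        f"cluster exec unit {unit} show cap {capture} packet-number {number} trace"
        for unit, numbers in found.items()
        for number in numbers
    ]


SAMPLE = """\
unit-1-1(LOCAL):******************************************************
   1: 12:58:31.117700 802.1Q vlan#202 P0 192.168.240.50.45468 > 192.168.241.50.80: S 441626016:441626016(0) win 29200 <mss 1380,sackOK,timestamp 1115330849 0,nop,wscale 7>
   2: 12:58:31.118341 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
unit-2-1:*************************************************************
unit-3-1:*************************************************************
   1: 12:58:31.111429 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
"""

numbers = syn_ack_packet_numbers(SAMPLE)
for command in trace_commands("CAPO", numbers):
    print(command)
```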
To trace packet #2 (SYN/ACK) on the local unit:
firepower# cluster exec show cap CAPO packet-number 2 trace
unit-1-1(LOCAL):******************************************************
2: 12:58:31.118341 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
Phase: 1
Type: CAPTURE
Subtype:
Result: ALLOW
Config:
Additional Information:
MAC Access list
...
To trace the same packet (SYN/ACK) on the remote unit:
firepower# cluster exec unit unit-3-1 show cap CAPO packet-number 1 trace
1: 12:58:31.111429 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45468: S 301658077:301658077(0) ack 441626017 win 28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
Phase: 1
Type: CAPTURE
Subtype:
Result: ALLOW
Config:
Additional Information:
MAC Access list
...
CCL Capture
To enable a capture on the CCL link (on all units):
firepower# cluster exec capture CCL interface cluster
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
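Reinject Hide
By default, a capture enabled on a data plane data interface shows all packets:
If you do not want to see the reinjected packets, use the reinject-hide option. This is helpful if you want to verify whether a flow is asymmetric: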
firepower# cluster exec capture CAPI_RH reinject-hide interface INSIDE match tcp host 192.168.240.50 host 192.168.241.50 eq 80
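This capture shows only what the local unit actually receives on the specific interface directly from the physical network, and not what it receives from the other cluster units.
ASP Drops
If you want to check for software drops for a specific flow, you can enable an asp-drop capture. If you do not know which drop reason to focus on, use the keyword all. Additionally, if you are not interested in the packet payload, you can specify the headers-only keyword. This allows you to capture 20-30 times more packets: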
firepower# cluster exec cap ASP type asp-drop all buffer 33554432 headers-only
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Additionally, you can specify the IP addresses of interest in the ASP capture:
firepower# cluster exec cap ASP type asp-drop all buffer 33554432 headers-only match ip host 192.0.2.100 any
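Clear a Capture
Use this command to clear the buffer of every capture that runs on the cluster units. This does not stop the captures, it only clears the buffers: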
firepower# cluster exec clear capture /all
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Stop a Capture
There are 2 ways to stop an active capture on all cluster units. You can resume the capture later.
Way 1
firepower# cluster exec cap CAPI stop
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
To resume
firepower# cluster exec no capture CAPI stop
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Way 2
firepower# cluster exec no capture CAPI interface INSIDE
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
To resume
firepower# cluster exec capture CAPI interface INSIDE
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
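Collect a Capture
There are multiple ways to export a capture.
Way 1: To a remote server
This lets you upload a capture from the data plane to a remote server (for example, TFTP). Note that the capture file names are automatically changed to reflect the source unit: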
firepower# cluster exec copy /pcap capture:CAPI tftp://192.168.240.55/CAPI.pcap
unit-1-1(LOCAL):******************************************************
Source capture name [CAPI]?
Address or name of remote host [192.168.240.55]?
Destination filename [CAPI.pcap]?
INFO: Destination filename is changed to unit-1-1_CAPI.pcap !!!!!!!
81 packets copied in 0.40 secs
unit-2-1:*************************************************************
INFO: Destination filename is changed to unit-2-1_CAPI.pcap !
unit-3-1:*************************************************************
INFO: Destination filename is changed to unit-3-1_CAPI.pcap !
The uploaded pcap files:
Way 2: Fetch the captures from the FMC
This method applies only to FTD. First, copy the capture to the FTD disk:
firepower# cluster exec copy /pcap capture:CAPI disk0:CAPI.pcap
unit-1-1(LOCAL):******************************************************
Source capture name [CAPI]?
Destination filename [CAPI.pcap]?
!!!!!
62 packets copied in 0.0 secs
From expert mode, copy the file from the /mnt/disk0/ directory to the /ngfw/var/common/ directory:
> expert
admin@firepower:~$ cd /mnt/disk0
admin@firepower:/mnt/disk0$ sudo cp CAPI.pcap /ngfw/var/common
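Finally, on the FMC navigate to System > Health > Monitor. Choose View System & Troubleshoot Details > Advanced Troubleshooting and fetch the capture file:
Delete a Capture
To remove a capture from all cluster units, use this command: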
firepower# cluster exec no capture CAPI
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
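Offloaded Flows
On FP41xx/FP9300, flows can be offloaded to the hardware accelerator either statically (for example, Fastpath rules) or dynamically. For more details on flow offload, check this document:
If a flow is offloaded, only a few packets go through the FTD data plane. The rest are handled by the hardware accelerator (Smart NIC).
From a capture point of view, this means that if you enable only FTD data plane-level captures, you do not see all the packets that go through the device. In this case, you also need to enable FXOS chassis-level captures.
If you take a capture on the CCL, you notice that the cluster units exchange different types of messages. The ones of most interest are: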
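Protocol | Description
UDP 49495 | Cluster heartbeats (keepalives). L3 broadcast (255.255.255.255). Every cluster unit sends these packets at 1/3 of the health-check hold time value. Note that not all UDP 49495 packets seen in the capture are heartbeats. The heartbeats contain a sequence number.
UDP 4193 | Cluster Control Protocol data path messages. Unicast. These packets contain information (metadata) about the flow owner, director, backup owner, and so on. Examples: a "cluster add" message is sent from the owner to the director when a new flow is created, and a "cluster delete" message is sent from the owner to the director when a flow is terminated.
Data packets | Data packets that belong to the various traffic flows that traverse the cluster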
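Cluster Heartbeat
In addition to the heartbeat messages, a number of cluster control messages are exchanged over the CCL in specific scenarios. Some of them are unicast messages, while others are broadcasts.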
CLUSTER_QUIT_REASON_MASTER_UNIT_HC
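Whenever a unit misses 3 consecutive heartbeat messages from the control node, it generates a CLUSTER_QUIT_REASON_MASTER_UNIT_HC message over the CCL. This message:
Q. What is the purpose of CLUSTER_QUIT_REASON_MASTER_UNIT_HC?
A. From the point of view of unit-3-1 (Site-B), it loses the connection to both unit-1-1 and unit-2-1 in Site-A, so it needs to remove them from its member list as soon as possible. Otherwise, if unit-2-1 is still in its member list and happens to be the director of a connection, the flow query to unit-2-1 fails.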
CLUSTER_QUIT_REASON_UNIT_HC
Whenever the control node misses 3 consecutive heartbeat messages from a data node, it sends a CLUSTER_QUIT_REASON_UNIT_HC message over the CCL. This message is unicast.
CLUSTER_QUIT_REASON_STRAY_MEMBER
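When a split partition reconnects with a peer partition, the new data node is treated as a stray member by the dominant control unit and receives a CCP quit message with the reason CLUSTER_QUIT_REASON_STRAY_MEMBER.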
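CLUSTER_QUIT_MEMBER_DROPOUT
A message that is generated by a data node and sent as a broadcast. Once a unit receives this message, it moves to the DISABLED state. Additionally, auto-rejoin does not start: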
firepower# show cluster info trace | include DROPOUT
Nov 04 00:22:54.699 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 to unit-1-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
Nov 04 00:22:53.699 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 to unit-2-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
The cluster history shows:
MASTER DISABLED Received control message DISABLE (member dropout announcement)
Key Points
Use this command to check the cluster health counters:
firepower# show cluster info health details
----------------------------------------------------------------------------------
| Unit (ID)| Heartbeat| Heartbeat| Average| Maximum| Poll|
| | count| drops| gap (ms)| slip (ms)| count|
----------------------------------------------------------------------------------
| unit-2-1 ( 1)| 650| 0| 4999| 1| 0|
| unit-3-1 ( 2)| 650| 0| 4999| 1| 0|
----------------------------------------------------------------------------------
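Description of the main columns
Column | Description
Unit (ID) | The ID of the remote cluster peer
Heartbeat count | The number of heartbeats received from the remote peer over the CCL
Heartbeat drops | The number of missed heartbeats. This counter is calculated from the sequence numbers of the received heartbeats
Average gap | The average time interval between received heartbeats
Poll count | When this counter becomes 3, the unit is removed from the cluster. The poll query interval is the same as the heartbeat interval, but it runs independently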
To reset the counters, use this command:
firepower# clear cluster info health details
Q. How do you verify the heartbeat frequency?
A. Check the Average gap value:
firepower# show cluster info health details
----------------------------------------------------------------------------------
| Unit (ID)| Heartbeat| Heartbeat| Average| Maximum| Poll|
| | count| drops| gap (ms)| slip (ms)| count|
----------------------------------------------------------------------------------
| unit-2-1 ( 1)| 3036| 0| 999| 1| 0|
----------------------------------------------------------------------------------
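Since this document states that heartbeats are sent at 1/3 of the health-check hold time, the Average gap column can be turned into a rough hold time estimate. The sketch below is an illustration under that assumption. The parser, its field names, and the rounding to whole seconds are not part of any Cisco tool.

```python
import re

# Data rows look like "| unit-2-1 ( 1)| 3036| 0| 999| 1| 0|".
ROW = re.compile(
    r"^\|\s*(\S+)\s*\(\s*(\d+)\)\|\s*(\d+)\|\s*(\d+)\|\s*(\d+)\|\s*(\d+)\|\s*(\d+)\|"
)
FIELDS = (
    "heartbeat_count",
    "heartbeat_drops",
    "average_gap_ms",
    "maximum_slip_ms",
    "poll_count",
)


def parse_health_details(output: str) -> list[dict]:
    """Extract one dict per remote unit from 'show cluster info health details'."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = {"unit": match.group(1), "id": int(match.group(2))}
            row.update(zip(FIELDS, map(int, match.groups()[2:])))
            rows.append(row)
    return rows


def estimate_hold_time_s(average_gap_ms: int) -> int:
    """Heartbeat interval is taken as hold time / 3, so hold time is about 3 x gap."""
    return round(3 * average_gap_ms / 1000)


SAMPLE = """\
| Unit (ID)| Heartbeat| Heartbeat| Average| Maximum| Poll|
| | count| drops| gap (ms)| slip (ms)| count|
| unit-2-1 ( 1)| 3036| 0| 999| 1| 0|
"""

for row in parse_health_details(SAMPLE):
    print(row["unit"], "estimated hold time (s):", estimate_hold_time_s(row["average_gap_ms"]))
```

With the two outputs shown in this section, an average gap of 999 ms corresponds to a hold time of about 3 seconds, and 4999 ms to about 15 seconds.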
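Q. How can you change the cluster hold time on FTD?
A. Use FlexConfig
Q. Who becomes the control node after a split-brain?
A. The unit with the highest priority (lowest number):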
firepower# show run cluster | include priority
priority 9
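Check HC failure scenario 1 for more details.
Cluster HC Mechanism Visualization
Indicative timers: the minimum and maximum depend on the arrival time of the last received CCL packet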
Hold time | Poll query check (frequency) | Minimum detection time | Maximum detection time
3 sec (default) | ~1 sec | ~3.01 sec | ~3.99 sec
4 sec | ~1.33 sec | ~4.01 sec | ~5.32 sec
5 sec | ~1.66 sec | ~5.01 sec | ~6.65 sec
6 sec | ~2 sec | ~6.01 sec | ~7.99 sec
7 sec | ~2.33 sec | ~7.01 sec | ~9.32 sec
8 sec | ~2.66 sec | ~8.01 sec | ~10.65 sec
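The values in this table follow a simple pattern: the poll interval is the hold time divided by 3, the minimum detection time is just above the hold time, and the maximum is just below the hold time plus one poll interval. The sketch below reproduces the table under that reading. The formula is inferred from the listed values and is an assumption, not a documented algorithm. It works in hundredths of a second to avoid floating-point rounding, and it truncates the poll interval as the table appears to do.

```python
def detection_window_cs(hold_time_s: int) -> tuple[int, int, int]:
    """Return (poll interval, min detection, max detection) in centiseconds."""
    hold_cs = hold_time_s * 100
    poll_cs = hold_cs // 3              # truncated, for example 5 s -> 1.66 s
    minimum_cs = hold_cs + 1            # failure starts right after a heartbeat check
    maximum_cs = hold_cs + poll_cs - 1  # failure starts right before the next check
    return poll_cs, minimum_cs, maximum_cs


for hold in range(3, 9):
    poll, low, high = detection_window_cs(hold)
    print(f"{hold} sec | ~{poll / 100:.2f} sec | ~{low / 100:.2f} sec | ~{high / 100:.2f} sec")
```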
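The goal of this section is to demonstrate:
Topology
Cluster Configuration
Unit-1-1 | Unit-2-1 | Unit-3-1
cluster group GROUP1 | cluster group GROUP1 | cluster group GROUP1
Cluster Status
Unit-1-1 | Unit-2-1 | Unit-3-1
firepower# show cluster info | firepower# show cluster info | firepower# show cluster info
Scenario 1
CCL communication loss for about 4 seconds in both directions
Before the failure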
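FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
After the recovery (no changes in the unit roles)
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
Analysis
The failure (CCL communication was lost)
The data plane console messages on unit-3-1: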
firepower#
WARNING: dynamic routing is not supported on management interface when cluster interface-mode is 'spanned'.
If dynamic routing is configured on any management interface, please remove it.
Cluster unit unit-3-1 transitioned from SLAVE to MASTER
Cluster disable is performing cleanup..done.
All data interfaces have been shutdown due to clustering being disabled.
To recover either enable clustering or remove cluster group configuration.
Unit-1-1 cluster trace logs:
firepower# show cluster info trace | include unit-3-1
Nov 02 09:38:14.239 [INFO]Notify chassis de-bundle port for blade unit-3-1, stack 0x000055a8918307fb 0x000055a8917fc6e8 0x000055a8917f79b5
Nov 02 09:38:14.239 [INFO]FTD - CD proxy received state notification (DISABLED) from unit unit-3-1
Nov 02 09:38:14.239 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
Nov 02 09:38:14.239 [INFO]Notify chassis de-bundle port for blade unit-3-1, stack 0x000055a8917eb596 0x000055a8917f4838 0x000055a891abef9d
Nov 02 09:38:14.239 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_UNIT_HC
Nov 02 09:38:14.239 [CRIT]Received heartbeat event 'slave heartbeat failure' for member unit-3-1 (ID: 1).
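Split-Brain
Unit-1-1 | Unit-2-1 | Unit-3-1
firepower# show cluster info | firepower# show cluster info | firepower# show cluster info
Cluster History
Unit-1-1 | Unit-2-1 | Unit-3-1
No events | No events | 09:38:16 UTC Nov 2 2020
CCL Communication Restoration
Unit-1-1 detects the current control node and, since unit-1-1 has a higher priority, it sends a CLUSTER_QUIT_REASON_STRAY_MEMBER message to unit-3-1 to trigger a new election process. In the end, unit-3-1 rejoins as a data node.
When a split partition reconnects with a peer partition, the data node is treated as a stray member by the dominant control node and receives a CCP quit message with the reason CLUSTER_QUIT_REASON_STRAY_MEMBER.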
Unit-3-1 console logs show:
Cluster unit unit-3-1 transitioned from MASTER to DISABLED
The 3DES/AES algorithms require a Encryption-3DES-AES activation key.
Detected Cluster Master.
Beginning configuration replication from Master.
WARNING: Local user database is empty and there are still 'aaa' commands for 'LOCAL'.
..
Cryptochecksum (changed): a9ed686f 8e2e689c 2553a104 7a2bd33a
End configuration replication from Master.
Cluster unit unit-3-1 transitioned from DISABLED to SLAVE
Both units (unit-1-1 and unit-3-1) show this in their cluster logs:
firepower# show cluster info trace | include retain
Nov 03 21:20:23.019 [CRIT]Found a split cluster with both unit-1-1 and unit-3-1 as master units. Master role retained by unit-1-1, unit-3-1 will leave then join as a slave
Nov 03 21:20:23.019 [CRIT]Found a split cluster with both unit-1-1 and unit-3-1 as master units. Master role retained by unit-1-1, unit-3-1 will leave then join as a slave
Additionally, syslog messages are generated for the split-brain:
firepower# show log | include 747016
Nov 03 2020 21:20:23: %FTD-4-747016: Clustering: Found a split cluster with both unit-1-1 and unit-3-1 as master units. Master role retained by unit-1-1, unit-3-1 will leave then join as a slave
Nov 03 2020 21:20:23: %FTD-4-747016: Clustering: Found a split cluster with both unit-1-1 and unit-3-1 as master units. Master role retained by unit-1-1, unit-3-1 will leave then join as a slave
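Cluster History
Unit-1-1 | Unit-2-1 | Unit-3-1
No events | No events | 09:47:33 UTC Nov 2 2020
Scenario 2
CCL communication loss for about 3-4 seconds in both directions
Before the failure
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
After the recovery (no changes in the unit roles)
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
Analysis
Event 1: The control node misses 3 HCs from unit-3-1 and sends a message to unit-3-1 to leave the cluster.
Event 2: The CCL recovered very fast and the CLUSTER_QUIT_REASON_STRAY_MEMBER message from the control node made it to the remote side. Unit-3-1 goes directly to DISABLED mode and there is no split-brain.
On unit-1-1 (control node) you see: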
firepower#
Asking slave unit unit-3-1 to quit because it failed unit health-check.
Forcing stray member unit-3-1 to leave the cluster
On unit-3-1 (data node) you see:
firepower#
Cluster disable is performing cleanup..done.
All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group configuration.
Cluster unit unit-3-1 transitioned from SLAVE to DISABLED
Cluster unit unit-3-1 transitioned to the DISABLED state and, once the CCL communication is restored, it rejoins as a data node:
firepower# show cluster history
20:58:40 UTC Nov 1 2020
SLAVE DISABLED Received control message DISABLE (stray member)
20:58:45 UTC Nov 1 2020
DISABLED ELECTION Enabled from CLI
20:58:45 UTC Nov 1 2020
ELECTION SLAVE_COLD Received cluster control message
20:58:45 UTC Nov 1 2020
SLAVE_COLD SLAVE_APP_SYNC Client progression done
20:59:33 UTC Nov 1 2020
SLAVE_APP_SYNC SLAVE_CONFIG Slave application configuration sync done
20:59:44 UTC Nov 1 2020
SLAVE_CONFIG SLAVE_FILESYS Configuration replication finished
20:59:45 UTC Nov 1 2020
SLAVE_FILESYS SLAVE_BULK_SYNC Client progression done
21:00:09 UTC Nov 1 2020
SLAVE_BULK_SYNC SLAVE Client progression done
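To quantify how long a rejoin takes, the history entries can be parsed and the time between two states computed. This is a minimal sketch that assumes the layout shown above, with a timestamp line followed by a "from-state to-state reason" line. The real CLI output is columnar, so the parser would need adjusting, and the function names are illustrative.

```python
from datetime import datetime

# Month names are parsed with %b, which assumes an English/C locale.
STAMP_FORMAT = "%H:%M:%S UTC %b %d %Y"


def parse_history(output: str) -> list[tuple[datetime, str, str, str]]:
    """Return (time, from_state, to_state, reason) for each history entry."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    events = []
    for stamp, transition in zip(lines[0::2], lines[1::2]):
        when = datetime.strptime(stamp, STAMP_FORMAT)
        old_state, new_state, reason = transition.split(None, 2)
        events.append((when, old_state, new_state, reason))
    return events


def seconds_between(events, entered_first: str, entered_later: str) -> int:
    """Seconds from entering one state until a later state is entered."""
    start = next(when for when, _, new, _ in events if new == entered_first)
    end = next(
        when for when, _, new, _ in events if new == entered_later and when >= start
    )
    return int((end - start).total_seconds())


SAMPLE = """\
20:58:40 UTC Nov 1 2020
SLAVE DISABLED Received control message DISABLE (stray member)
20:58:45 UTC Nov 1 2020
DISABLED ELECTION Enabled from CLI
20:58:45 UTC Nov 1 2020
ELECTION SLAVE_COLD Received cluster control message
20:58:45 UTC Nov 1 2020
SLAVE_COLD SLAVE_APP_SYNC Client progression done
20:59:33 UTC Nov 1 2020
SLAVE_APP_SYNC SLAVE_CONFIG Slave application configuration sync done
20:59:44 UTC Nov 1 2020
SLAVE_CONFIG SLAVE_FILESYS Configuration replication finished
20:59:45 UTC Nov 1 2020
SLAVE_FILESYS SLAVE_BULK_SYNC Client progression done
21:00:09 UTC Nov 1 2020
SLAVE_BULK_SYNC SLAVE Client progression done
"""

events = parse_history(SAMPLE)
print("Seconds from DISABLED to SLAVE:", seconds_between(events, "DISABLED", "SLAVE"))
```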
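Scenario 3
CCL communication loss for about 3-4 seconds in both directions
Before the failure
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
After the recovery (the control node changed)
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Data node | Control node | Data node
Analysis
CCL Recovery
Cluster History
Unit-1-1 | Unit-2-1 | Unit-3-1
19:53:09 UTC Nov 2 2020 | 19:53:06 UTC Nov 2 2020 | 19:53:06 UTC Nov 2 2020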
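Scenario 4
CCL communication loss for about 3-4 seconds
Before the failure
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
After the recovery (the control node changed sites)
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Data node | Data node | Control node
Analysis
The failure
This is a different flavor of the same failure. In this case, unit-1-1 also missed 3 HC messages from unit-3-1. Once it received a new keepalive, it tried to remove unit-3-1 with a STRAY message, but the message never reached unit-3-1:
Note
If the CCL does not recover in step 5, FTD1 becomes the new control node in Site-A and, after the CCL recovers, it wins the new election.
Syslog messages on unit-1-1: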
firepower# show log | include 747
Nov 03 2020 23:13:08: %FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-3-1,DISABLED,0x0000000000000000)
Nov 03 2020 23:13:09: %FTD-4-747015: Clustering: Forcing stray member unit-3-1 to leave the cluster
Nov 03 2020 23:13:09: %FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-2-1,DISABLED,0x0000000000000000)
Nov 03 2020 23:13:10: %FTD-4-747015: Clustering: Forcing stray member unit-3-1 to leave the cluster
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state MASTER to DISABLED
Nov 03 2020 23:13:12: %FTD-7-747006: Clustering: State machine is at state DISABLED
Nov 03 2020 23:13:12: %FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MY_STATE (state DISABLED,0x0000000000000000,0x0000000000000000)
Nov 03 2020 23:13:18: %FTD-6-747004: Clustering: State machine changed from state ELECTION to ONCALL
Cluster trace logs on unit-1-1:
firepower# show cluster info trace | include QUIT
Nov 03 23:13:10.789 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-1-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 03 23:13:10.769 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 to unit-1-1 for reason CLUSTER_QUIT_REASON_MASTER_UNIT_HC
Nov 03 23:13:10.769 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_STRAY_MEMBER
Nov 03 23:13:09.789 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-2-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 03 23:13:09.769 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_STRAY_MEMBER
Nov 03 23:13:08.559 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
Nov 03 23:13:08.559 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_UNIT_HC
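The QUIT messages are easier to follow when they are reduced to direction, sender, target, and reason, and read in chronological order. The sketch below is a hypothetical helper that assumes the line format shown in the trace above and that the trace lists the newest entry first, as the timestamps in this output suggest.

```python
import re
from typing import NamedTuple, Optional


class QuitEvent(NamedTuple):
    stamp: str
    direction: str          # "Send" or "Receive"
    sender: str
    target: Optional[str]   # None when the log line names no target unit
    reason: str


QUIT = re.compile(
    r"^(\w{3} \d{2} [\d:.]+) \[\w+\](Send|Receive) CCP message.*?"
    r"CCP_MSG_QUIT from (\S+)(?: to (\S+))? for reason (\S+)"
)


def parse_quit_messages(trace: str) -> list[QuitEvent]:
    """Return CCP_MSG_QUIT events, oldest first (the trace is newest first)."""
    events = []
    for line in trace.splitlines():
        match = QUIT.match(line.strip())
        if match:
            events.append(QuitEvent(*match.groups()))
    return list(reversed(events))


SAMPLE = """\
Nov 03 23:13:10.789 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-1-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 03 23:13:10.769 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 to unit-1-1 for reason CLUSTER_QUIT_REASON_MASTER_UNIT_HC
Nov 03 23:13:10.769 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_STRAY_MEMBER
Nov 03 23:13:08.559 [DBUG]Send CCP message to id 1: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_UNIT_HC
"""

for event in parse_quit_messages(SAMPLE):
    print(event.stamp, event.direction, event.sender, "->", event.target, event.reason)
```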
Syslog messages on unit-3-1:
firepower# show log | include 747
Nov 03 2020 23:13:09: %FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-2-1,DISABLED,0x0000000000000000)
Nov 03 2020 23:13:10: %FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-1-1,DISABLED,0x0000000000000000)
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state SLAVE to MASTER
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state MASTER_FAST to MASTER_DRAIN
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state MASTER_DRAIN to MASTER_CONFIG
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state MASTER_CONFIG to MASTER_POST_CONFIG
Nov 03 2020 23:13:10: %FTD-7-747006: Clustering: State machine is at state MASTER_POST_CONFIG
Nov 03 2020 23:13:10: %FTD-6-747004: Clustering: State machine changed from state MASTER_POST_CONFIG to MASTER
Nov 03 2020 23:13:10: %FTD-7-747006: Clustering: State machine is at state MASTER
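Cluster History
Unit-1-1 | Unit-2-1 | Unit-3-1
23:13:13 UTC Nov 3 2020 | 23:13:12 UTC Nov 3 2020 | 23:13:10 UTC Nov 3 2020
Scenario 5
Before the failure
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
After the recovery (no changes)
FTD1 | FTD2 | FTD3
Site-A | Site-A | Site-B
Control node | Data node | Data node
The failure
Unit-3-1 sent QUIT messages to both unit-1-1 and unit-2-1, but due to connectivity issues only unit-2-1 received the QUIT message.
Unit-1-1 cluster trace logs: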
firepower# show cluster info trace | include QUIT
Nov 04 00:52:10.429 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 04 00:51:47.059 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-2-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 04 00:51:45.429 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
Nov 04 00:51:45.429 [DBUG]Send CCP message to unit-3-1(1): CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_REASON_UNIT_HC
Unit-2-1 cluster trace logs:
firepower# show cluster info trace | include QUIT
Nov 04 00:52:10.389 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 04 00:51:47.019 [DBUG]Send CCP message to all: CCP_MSG_QUIT from unit-2-1 for reason CLUSTER_QUIT_REASON_RETIREMENT
Nov 04 00:51:46.999 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-3-1 to unit-2-1 for reason CLUSTER_QUIT_REASON_MASTER_UNIT_HC
Nov 04 00:51:45.389 [DBUG]Receive CCP message: CCP_MSG_QUIT from unit-1-1 to unit-3-1 for reason CLUSTER_QUIT_MEMBER_DROPOUT
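Cluster History
Unit-1-1 | Unit-2-1 | Unit-3-1
No events | 00:51:50 UTC Nov 4 2020 | 00:51:47 UTC Nov 4 2020
NGFW Capture Points
The NGFW provides capture capabilities at these points:
When you troubleshoot data path issues on a cluster, the capture points used in most cases are the FXOS and FTD data plane engine captures.
For more details on NGFW captures, check this document:
Cluster Unit Flow Roles Basics
Connections can be established through a cluster in multiple ways, which depend on factors like: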
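Flow role | Description | Flag(s)
Owner | Usually, the unit that initially receives the connection | UIO
Director | The unit that handles owner lookup requests from forwarders | Y
Backup owner | As long as the director is not the same unit as the owner, the director is also the backup owner. If the owner chooses itself as the director, a separate backup owner is chosen | Y (if the director is also the backup owner); y (if the director is not the backup owner)
Forwarder | A unit that forwards packets to the owner | z
Fragment owner | The unit that handles fragmented traffic | -
Chassis backup | In an inter-chassis cluster, when both the director/backup and the owner flows are owned by units of the same chassis, a unit in one of the other chassis becomes a secondary backup/director. This role is specific to inter-chassis clusters of the Firepower 9300 series with more than 1 blade | w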
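The flag column of this table can be used to label entries in `show conn` output. The sketch below is a simplified illustration with a hypothetical function name. It inspects only the first flag token, maps Y, y, z, and w as listed in the table, and treats an entry with no stub flag but with the U flag as the owner. That last rule is an assumption made for this example.

```python
# Stub-flow flags from the table above and the roles they indicate.
STUB_FLAG_ROLES = {
    "Y": ("director", "backup owner"),
    "y": ("backup owner",),
    "z": ("forwarder",),
    "w": ("chassis backup",),
}


def cluster_roles(flags: str) -> list[str]:
    """Map a 'flags' field such as 'UIO N1' or 'y' to cluster flow roles."""
    parts = flags.split()
    token = parts[0] if parts else ""
    roles: list[str] = []
    for flag, names in STUB_FLAG_ROLES.items():
        if flag in token:
            roles.extend(names)
    # Assumption: a full connection that is up (U) and carries no stub flag
    # is the owner entry, as in the UIO row of the table.
    if not roles and "U" in token:
        roles.append("owner")
    return roles


for sample in ("UIO N1", "Y", "y", "z"):
    print(sample, "->", cluster_roles(sample))
```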
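Cluster Connection Establishment Case Studies
The next section covers various case studies that demonstrate some of the ways a connection can be established through a cluster. The goals are to:
Topology
Cluster units and IDs:
Unit-1-1 | Unit-2-1 | Unit-3-1
Cluster GROUP1: On | Unit "unit-2-1" in state SLAVE | Unit "unit-3-1" in state SLAVE
Cluster captures enabled: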
cluster exec cap CAPI int INSIDE buffer 33554432 match tcp host 192.168.240.50 host 192.168.241.50 eq 80
cluster exec cap CAPO int OUTSIDE buffer 33554432 match tcp host 192.168.240.50 host 192.168.241.50 eq 80
cluster exec cap CAPI_RH reinject-hide int INSIDE buffer 33554432 match tcp host 192.168.240.50 host 192.168.241.50 eq 80
cluster exec cap CAPO_RH reinject-hide int OUTSIDE buffer 33554432 match tcp host 192.168.240.50 host 192.168.241.50 eq 80
cluster exec cap CCL int cluster buffer 33554432
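Note: These tests were run in a lab environment with minimal traffic through the cluster. In production, use capture filters that are as specific as possible (for example, the destination port and, whenever possible, the source port) to minimize the "noise" in the captures.
Case Study 1. Symmetric Traffic (Owner Is Also the Director)
Observation 1. The reinject-hide captures show packets only on unit-1-1. This means that the flow in both directions went through unit-1-1 (symmetric traffic):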
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data interface cluster [Capturing - 33513 bytes]
capture CAPI type raw-data buffer 33554432 trace interface INSIDE [Buffer Full - 33553914 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO type raw-data buffer 33554432 trace interface OUTSIDE [Buffer Full - 33553914 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPI_RH type raw-data reinject-hide buffer 33554432 interface INSIDE [Buffer Full - 33553914 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO_RH type raw-data reinject-hide buffer 33554432 interface OUTSIDE [Buffer Full - 33553914 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
unit-2-1:*************************************************************
capture CCL type raw-data interface cluster [Capturing - 23245 bytes]
capture CAPI type raw-data buffer 33554432 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO type raw-data buffer 33554432 trace interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPI_RH type raw-data reinject-hide buffer 33554432 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO_RH type raw-data reinject-hide buffer 33554432 interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
unit-3-1:*************************************************************
capture CCL type raw-data interface cluster [Capturing - 24815 bytes]
capture CAPI type raw-data buffer 33554432 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO type raw-data buffer 33554432 trace interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPI_RH type raw-data reinject-hide buffer 33554432 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
capture CAPO_RH type raw-data reinject-hide buffer 33554432 interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq 80
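The reasoning in Observation 1 can be written as a small check: a flow looks symmetric when every reinject-hide capture has packets on exactly one unit and it is the same unit for all of them. The sketch below applies that rule to byte counts taken from the output above. The data structure and function names are illustrative assumptions.

```python
def units_with_packets(rh_bytes: dict[str, dict[str, int]]) -> dict[str, set[str]]:
    """For each reinject-hide capture, list the units whose buffer is not empty."""
    seen: dict[str, set[str]] = {}
    for unit, captures in rh_bytes.items():
        for name, size in captures.items():
            seen.setdefault(name, set())
            if size > 0:
                seen[name].add(unit)
    return seen


def looks_symmetric(rh_bytes: dict[str, dict[str, int]]) -> bool:
    """True when one single unit received the traffic on every capture."""
    per_capture = list(units_with_packets(rh_bytes).values())
    return (
        bool(per_capture)
        and all(len(units) == 1 for units in per_capture)
        and len(set.union(*per_capture)) == 1
    )


# Byte counts of the CAPI_RH and CAPO_RH captures from the output above.
CASE_STUDY_1 = {
    "unit-1-1": {"CAPI_RH": 33553914, "CAPO_RH": 33553914},
    "unit-2-1": {"CAPI_RH": 0, "CAPO_RH": 0},
    "unit-3-1": {"CAPI_RH": 0, "CAPO_RH": 0},
}

print("Symmetric:", looks_symmetric(CASE_STUDY_1))
```

This check relies on capture filters that match only the flow of interest. With broader filters, packets from unrelated flows on other units would make a symmetric flow look asymmetric.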
Observation 2. Connection flag analysis for the flow with source port 45954
firepower# cluster exec show conn
unit-1-1(LOCAL):******************************************************
22 in use, 25 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 0 in use, 122 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 1 enabled, 0 in effect, 2 most enabled, 1 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:45954, idle 0:00:00, bytes 487413076, flags UIO N1
unit-2-1:*************************************************************
22 in use, 271 most used
Cluster:
fwd connections: 0 in use, 2 most used
dir connections: 0 in use, 2 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 1 enabled, 0 in effect, 249 most enabled, 0 most in effect
unit-3-1:*************************************************************
17 in use, 20 most used
Cluster:
fwd connections: 1 in use, 2 most used
dir connections: 1 in use, 127 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:443 NP Identity Ifc 192.168.240.50:39698, idle 0:00:23, bytes 0, flags z
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:45954, idle 0:00:06, bytes 0, flags y
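Unit | Flag | Note
Unit-1-1 | UIO | Flow owner: the unit handles the flow. Director: since unit-3-1 has "y" and not "Y", unit-1-1 was chosen as the director for this flow. Because it is also the owner, another unit (unit-3-1 in this case) was elected as the backup owner
Unit-2-1 | - | -
Unit-3-1 | y | The unit is the backup owner
This can be visualized as:
Observation 3. Capture with trace shows that both directions go only through unit-1-1
Step 1. Identify the flow and packets of interest on all cluster units based on the source port: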
firepower# cluster exec show capture CAPI | i 45954
unit-1-1(LOCAL):******************************************************
1: 08:42:09.362697 802.1Q vlan#201 P0 192.168.240.50.45954 > 192.168.241.50.80: S 992089269:992089269(0) win 29200 <mss 1460,sackOK,timestamp 495153655 0,nop,wscale 7>
2: 08:42:09.363521 802.1Q vlan#201 P0 192.168.241.50.80 > 192.168.240.50.45954: S 4042762409:4042762409(0) ack 992089270 win 28960 <mss 1380,sackOK,timestamp 505509125 495153655,nop,wscale 7>
3: 08:42:09.363827 802.1Q vlan#201 P0 192.168.240.50.45954 > 192.168.241.50.80: . ack 4042762410 win 229 <nop,nop,timestamp 495153657 505509125>
…
unit-2-1:*************************************************************
unit-3-1:*************************************************************
firepower# cluster exec show capture CAPO | i 45954
unit-1-1(LOCAL):******************************************************
1: 08:42:09.362987 802.1Q vlan#202 P0 192.168.240.50.45954 > 192.168.241.50.80: S 2732339016:2732339016(0) win 29200 <mss 1380,sackOK,timestamp 495153655 0,nop,wscale 7>
2: 08:42:09.363415 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45954: S 3603655982:3603655982(0) ack 2732339017 win 28960 <mss 1460,sackOK,timestamp 505509125 495153655,nop,wscale 7>
3: 08:42:09.363903 802.1Q vlan#202 P0 192.168.240.50.45954 > 192.168.241.50.80: . ack 3603655983 win 229 <nop,nop,timestamp 495153657 505509125>
…
unit-2-1:*************************************************************
unit-3-1:*************************************************************
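Step 2. Since this is a TCP flow, trace the 3-way handshake packets. As this output shows, unit-1-1 is the owner. For simplicity, the non-relevant trace phases are omitted: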
firepower# show cap CAPI packet-number 1 trace
25985 packets captured
1: 08:42:09.362697 802.1Q vlan#201 P0 192.168.240.50.45954 > 192.168.241.50.80: S 992089269:992089269(0) win 29200 <mss 1460,sackOK,timestamp 495153655 0,nop,wscale 7>
...
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) am becoming owner
...
The return traffic (TCP SYN/ACK):
firepower# show capture CAPO packet-number 2 trace
25985 packets captured
2: 08:42:09.363415 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.45954: S 3603655982:3603655982(0) ack 2732339017 win 28960 <mss 1460,sackOK,timestamp 505509125 495153655,nop,wscale 7>
...
Phase: 3
Type: FLOW-LOOKUP
Subtype:
Result: ALLOW
Config:
Additional Information:
Found flow with id 9364, using existing flow
Observation 4. The FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | include 45954
unit-1-1(LOCAL):******************************************************
Dec 01 2020 08:42:09: %FTD-6-302013: Built inbound TCP connection 9364 for INSIDE:192.168.240.50/45954 (192.168.240.50/45954) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 08:42:18: %FTD-6-302014: Teardown TCP connection 9364 for INSIDE:192.168.240.50/45954 to OUTSIDE:192.168.241.50/80 duration 0:00:08 bytes 1024000440 TCP FINs from INSIDE
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Dec 01 2020 08:42:09: %FTD-6-302022: Built backup stub TCP connection for INSIDE:192.168.240.50/45954 (192.168.240.50/45954) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 08:42:18: %FTD-6-302023: Teardown backup TCP connection for INSIDE:192.168.240.50/45954 to OUTSIDE:192.168.241.50/80 duration 0:00:08 forwarded bytes 0 Cluster flow with CLU closed on owner
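Case Study 2. Symmetric Traffic (Owner Different Than the Director)
Observation 1. The owner is different than the director
Connection flag analysis for the flow with source port 46278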
firepower# cluster exec show conn
unit-1-1(LOCAL):******************************************************
23 in use, 25 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 0 in use, 122 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 2 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46278, idle 0:00:00, bytes 508848268, flags UIO N1
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46276, idle 0:00:03, bytes 0, flags aA N1
unit-2-1:*************************************************************
21 in use, 271 most used
Cluster:
fwd connections: 0 in use, 2 most used
dir connections: 0 in use, 2 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
unit-3-1:*************************************************************
17 in use, 20 most used
Cluster:
fwd connections: 1 in use, 5 most used
dir connections: 1 in use, 127 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 NP Identity Ifc 192.168.240.50:46276, idle 0:00:02, bytes 0, flags z
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46278, idle 0:00:06, bytes 0, flags Y
Unit | Flag | Note
unit-1-1 | UIO | Flow owner: the unit handles the flow
unit-2-1 | - | -
unit-3-1 | Y | Director and backup owner: unit-3-1 has the flag Y (director)
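The flag-to-role reading used in this table can be sketched in code. This is an illustrative helper under stated assumptions: the names `role_from_flags` and `roles_for_port` are hypothetical, the parser expects `show conn` lines formatted as in this document, and it relies only on the stub flags Y (director), y (backup) and z (forwarder) that appear in these outputs. Any entry without a stub flag is treated as a full flow on the owner.

```python
import re

UNIT_RE = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# Captures the client-side port and the flag string of a "show conn" entry
CONN_RE = re.compile(r":(\d+), idle \S+, bytes \d+, flags (.+)$")


def role_from_flags(flags):
    """Map cluster stub flags to a role; no stub flag means a full flow."""
    if "z" in flags:
        return "forwarder"
    if "Y" in flags:
        return "director"
    if "y" in flags:
        return "backup owner"
    return "owner"


def roles_for_port(text, port):
    """Return {unit: role} for the connection with the given source port."""
    roles = {}
    unit = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = UNIT_RE.match(line)
        if banner:
            unit = banner.group(1)
            continue
        conn = CONN_RE.search(line)
        if conn and unit and int(conn.group(1)) == port:
            roles[unit] = role_from_flags(conn.group(2))
    return roles


# Abridged from the "cluster exec show conn" output of this case study
SAMPLE = """\
unit-1-1(LOCAL):******************************************************
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46278, idle 0:00:00, bytes 508848268, flags UIO N1
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46276, idle 0:00:03, bytes 0, flags aA N1
unit-2-1:*************************************************************
unit-3-1:*************************************************************
TCP OUTSIDE 192.168.241.50:80 NP Identity Ifc 192.168.240.50:46276, idle 0:00:02, bytes 0, flags z
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46278, idle 0:00:06, bytes 0, flags Y
"""

print(roles_for_port(SAMPLE, 46278))
```

For port 46278 the sketch reports unit-1-1 as owner and unit-3-1 as director, which agrees with the table. As the table notes, the director in this case study is also the backup owner; that dual role is not visible in the flag alone.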
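This can be visualized as:
Observation 2. Capture with trace shows that both directions go only through unit-1-1.
Step 1. Use the same approach as in Case Study 1 to identify, based on the source port, the flow and packets of interest in all cluster units: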
firepower# cluster exec show cap CAPI | include 46278
unit-1-1(LOCAL):******************************************************
3: 11:01:44.841631 802.1Q vlan#201 P0 192.168.240.50.46278 > 192.168.241.50.80: S 1972783998:1972783998(0) win 29200 <mss 1460,sackOK,timestamp 503529072 0,nop,wscale 7>
4: 11:01:44.842317 802.1Q vlan#201 P0 192.168.241.50.80 > 192.168.240.50.46278: S 3524167695:3524167695(0) ack 1972783999 win 28960 <mss 1380,sackOK,timestamp 513884542 503529072,nop,wscale 7>
5: 11:01:44.842592 802.1Q vlan#201 P0 192.168.240.50.46278 > 192.168.241.50.80: . ack 3524167696 win 229 <nop,nop,timestamp 503529073 513884542>
…
unit-2-1:*************************************************************
unit-3-1:*************************************************************
firepower#
Capture on the OUTSIDE interface:
firepower# cluster exec show cap CAPO | include 46278
unit-1-1(LOCAL):******************************************************
3: 11:01:44.841921 802.1Q vlan#202 P0 192.168.240.50.46278 > 192.168.241.50.80: S 2153055699:2153055699(0) win 29200 <mss 1380,sackOK,timestamp 503529072 0,nop,wscale 7>
4: 11:01:44.842226 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46278: S 3382481337:3382481337(0) ack 2153055700 win 28960 <mss 1460,sackOK,timestamp 513884542 503529072,nop,wscale 7>
5: 11:01:44.842638 802.1Q vlan#202 P0 192.168.240.50.46278 > 192.168.241.50.80: . ack 3382481338 win 229 <nop,nop,timestamp 503529073 513884542>
unit-2-1:*************************************************************
unit-3-1:*************************************************************
firepower#
Step 2. Focus on the ingress packets (TCP SYN and TCP SYN/ACK):
firepower# cluster exec show cap CAPI packet-number 3 trace
unit-1-1(LOCAL):******************************************************
824 packets captured
3: 11:01:44.841631 802.1Q vlan#201 P0 192.168.240.50.46278 > 192.168.241.50.80: S 1972783998:1972783998(0) win 29200 <mss 1460,sackOK,timestamp 503529072 0,nop,wscale 7>
…
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) am becoming owner
Trace the SYN/ACK on unit-1-1:
firepower# cluster exec show cap CAPO packet-number 4 trace
unit-1-1(LOCAL):******************************************************
4: 11:01:44.842226 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46278: S 3382481337:3382481337(0) ack 2153055700 win 28960 <mss 1460,sackOK,timestamp 513884542 503529072,nop,wscale 7>
Phase: 3
Type: FLOW-LOOKUP
Subtype:
Result: ALLOW
Config:
Additional Information:
Found flow with id 9583, using existing flow
Observation 3. The FTD data plane syslogs show the connection creation and termination on the owner and the backup owner:
firepower# cluster exec show log | include 46278
unit-1-1(LOCAL):******************************************************
Dec 01 2020 11:01:44: %FTD-6-302013: Built inbound TCP connection 9583 for INSIDE:192.168.240.50/46278 (192.168.240.50/46278) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 11:01:53: %FTD-6-302014: Teardown TCP connection 9583 for INSIDE:192.168.240.50/46278 to OUTSIDE:192.168.241.50/80 duration 0:00:08 bytes 1024001808 TCP FINs from INSIDE
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Dec 01 2020 11:01:44: %FTD-6-302022: Built director stub TCP connection for INSIDE:192.168.240.50/46278 (192.168.240.50/46278) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 11:01:53: %FTD-6-302023: Teardown director TCP connection for INSIDE:192.168.240.50/46278 to OUTSIDE:192.168.241.50/80 duration 0:00:08 forwarded bytes 0 Cluster flow with CLU closed on owner
Case Study 3. Asymmetric Traffic (the Director Forwards the Traffic)
Observation 1. The reinject-hide captures show packets on unit-1-1 and unit-2-1 (asymmetric flow):
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33554320 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Buffer Full - 98552 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 98552 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Buffer Full - 98552 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99932 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-2-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33553268 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-3-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Capturing - 53815 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Capturing - 658 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Capturing - 658 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
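Comparing byte counters across units by eye becomes tedious with many captures. The sketch below is a hypothetical helper (`capture_bytes` and `units_with_packets` are not Cisco names) that assumes the capture status strings `Capturing` and `Buffer Full` seen in this document; other status strings are not handled.

```python
import re

UNIT_RE = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# Capture name plus the byte counter from the bracketed status
CAP_RE = re.compile(r"^capture (\S+) .*\[(?:Capturing|Buffer Full) - (\d+) bytes\]$")


def capture_bytes(text):
    """Return {unit: {capture_name: bytes_captured}}."""
    result = {}
    unit = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = UNIT_RE.match(line)
        if banner:
            unit = banner.group(1)
            result[unit] = {}
            continue
        cap = CAP_RE.match(line)
        if cap and unit:
            result[unit][cap.group(1)] = int(cap.group(2))
    return result


def units_with_packets(caps, name):
    """List the units whose capture 'name' holds at least one byte."""
    return [u for u, c in caps.items() if c.get(name, 0) > 0]


# Abridged from the output above: reinject-hide captures only
SAMPLE = """\
unit-1-1(LOCAL):******************************************************
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Buffer Full - 98552 bytes]
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99932 bytes]
unit-2-1:*************************************************************
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99052 bytes]
unit-3-1:*************************************************************
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Capturing - 658 bytes]
"""

CAPS = capture_bytes(SAMPLE)
print(units_with_packets(CAPS, "CAPI_RH"))
print(CAPS["unit-2-1"])
```

Only unit-1-1 holds packets in CAPI_RH, while unit-2-1 fills CAPO_RH without any INSIDE packets, which is the asymmetry this observation describes. A small non-zero counter, such as the 658 bytes on unit-3-1, needs a look at the packets themselves before any conclusion is drawn.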
Observation 2. Connection flag analysis for the flow with source port 46502:
firepower# cluster exec show conn
unit-1-1(LOCAL):******************************************************
23 in use, 25 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 0 in use, 122 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 2 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46502, idle 0:00:00, bytes 448760236, flags UIO N1
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46500, idle 0:00:06, bytes 0, flags aA N1
unit-2-1:*************************************************************
21 in use, 271 most used
Cluster:
fwd connections: 0 in use, 2 most used
dir connections: 1 in use, 2 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46502, idle 0:00:00, bytes 0, flags Y
unit-3-1:*************************************************************
17 in use, 20 most used
Cluster:
fwd connections: 1 in use, 5 most used
dir connections: 0 in use, 127 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
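Unit | Flag | Note
unit-1-1 | UIO | Flow owner: the unit handles the flow
unit-2-1 | Y | Director: because unit-2-1 has the flag Y, it was chosen as the director for this flow. Backup owner. Finally, although it is not obvious from this output, the show capture and show log outputs make it evident that unit-2-1 forwards this flow to the owner (although technically it is not considered a forwarder in this scenario). Note: A unit cannot be both a director (Y flow) and a forwarder (z flow); the two roles are mutually exclusive. A director (Y flow) can still forward traffic. See the show log output later in this case study.
unit-3-1 | - | -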
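This can be visualized as:
Observation 3. Capture with trace shows the asymmetric traffic and the redirection from unit-2-1 to unit-1-1.
Step 1. Identify the packets that belong to the flow of interest (port 46502):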
firepower# cluster exec show capture CAPI | include 46502
unit-1-1(LOCAL):******************************************************
3: 12:58:33.356121 802.1Q vlan#201 P0 192.168.240.50.46502 > 192.168.241.50.80: S 4124514680:4124514680(0) win 29200 <mss 1460,sackOK,timestamp 510537534 0,nop,wscale 7>
4: 12:58:33.357037 802.1Q vlan#201 P0 192.168.241.50.80 > 192.168.240.50.46502: S 883000451:883000451(0) ack 4124514681 win 28960 <mss 1380,sackOK,timestamp 520893004 510537534,nop,wscale 7>
5: 12:58:33.357357 802.1Q vlan#201 P0 192.168.240.50.46502 > 192.168.241.50.80: . ack 883000452 win 229 <nop,nop,timestamp 510537536 520893004>
unit-2-1:*************************************************************
unit-3-1:*************************************************************
The return direction:
firepower# cluster exec show capture CAPO | include 46502
unit-1-1(LOCAL):******************************************************
3: 12:58:33.356426 802.1Q vlan#202 P0 192.168.240.50.46502 > 192.168.241.50.80: S 1434968587:1434968587(0) win 29200 <mss 1380,sackOK,timestamp 510537534 0,nop,wscale 7>
4: 12:58:33.356915 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: S 4257314722:4257314722(0) ack 1434968588 win 28960 <mss 1460,sackOK,timestamp 520893004 510537534,nop,wscale 7>
5: 12:58:33.357403 802.1Q vlan#202 P0 192.168.240.50.46502 > 192.168.241.50.80: . ack 4257314723 win 229 <nop,nop,timestamp 510537536 520893004>
unit-2-1:*************************************************************
1: 12:58:33.359249 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: S 4257314722:4257314722(0) ack 1434968588 win 28960 <mss 1460,sackOK,timestamp 520893004 510537534,nop,wscale 7>
2: 12:58:33.360302 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: . ack 1434968736 win 235 <nop,nop,timestamp 520893005 510537536>
3: 12:58:33.361004 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: . 4257314723:4257316091(1368) ack 1434968736 win 235 <nop,nop,timestamp 520893006 510537536>
…
unit-3-1:*************************************************************
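To see at a glance which direction each unit receives, the packet lines can be grouped by sender per unit. This is a minimal sketch with a hypothetical function name (`senders_per_unit`); it assumes the packet line layout shown in this document, and the sample lines are truncated after the TCP flags for brevity.

```python
import re

UNIT_RE = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# "<n>: <time> ... <src-ip>.<src-port> > <dst-ip>.<dst-port>:"
PKT_RE = re.compile(
    r"^\d+: \S+ .*?(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def senders_per_unit(text):
    """Return {unit: {source_ip: packet_count}} for the listed packets."""
    result = {}
    unit = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = UNIT_RE.match(line)
        if banner:
            unit = banner.group(1)
            result[unit] = {}
            continue
        pkt = PKT_RE.match(line)
        if pkt and unit:
            src = pkt.group(1)
            result[unit][src] = result[unit].get(src, 0) + 1
    return result


# Abridged from the CAPO output above (TCP options removed)
SAMPLE = """\
unit-1-1(LOCAL):******************************************************
3: 12:58:33.356426 802.1Q vlan#202 P0 192.168.240.50.46502 > 192.168.241.50.80: S
4: 12:58:33.356915 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: S
5: 12:58:33.357403 802.1Q vlan#202 P0 192.168.240.50.46502 > 192.168.241.50.80: .
unit-2-1:*************************************************************
1: 12:58:33.359249 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: S
2: 12:58:33.360302 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: .
3: 12:58:33.361004 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: .
unit-3-1:*************************************************************
"""

print(senders_per_unit(SAMPLE))
```

In this sample unit-2-1 lists only packets sent by the server (192.168.241.50), so it receives the return direction only, while unit-1-1 lists both directions and unit-3-1 lists nothing.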
Step 2. Trace the packets. By default, only the first 50 ingress packets are traced. For simplicity, the non-relevant trace phases are omitted.
unit-1-1 (owner):
firepower# cluster exec show capture CAPI packet-number 3 trace
unit-1-1(LOCAL):******************************************************
3: 12:58:33.356121 802.1Q vlan#201 P0 192.168.240.50.46502 > 192.168.241.50.80: S 4124514680:4124514680(0) win 29200 <mss 1460,sackOK,timestamp 510537534 0,nop,wscale 7>
...
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) am becoming owner
unit-2-1 (forwarder)
Return traffic (TCP SYN/ACK). The unit of interest is unit-2-1, which is the director/backup owner and forwards the traffic to the owner:
firepower# cluster exec unit unit-2-1 show capture CAPO packet-number 1 trace
1: 12:58:33.359249 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46502: S 4257314722:4257314722(0) ack 1434968588 win 28960 <mss 1460,sackOK,timestamp 520893004 510537534,nop,wscale 7>
...
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) am early redirecting to (0) due to matching action (-1).
Observation 4. The FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | i 46502
unit-1-1(LOCAL):******************************************************
Dec 01 2020 12:58:33: %FTD-6-302013: Built inbound TCP connection 9742 for INSIDE:192.168.240.50/46502 (192.168.240.50/46502) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 12:59:02: %FTD-6-302014: Teardown TCP connection 9742 for INSIDE:192.168.240.50/46502 to OUTSIDE:192.168.241.50/80 duration 0:00:28 bytes 2048000440 TCP FINs from INSIDE
unit-2-1:*************************************************************
Dec 01 2020 12:58:33: %FTD-6-302022: Built forwarder stub TCP connection for OUTSIDE:192.168.241.50/80 (192.168.241.50/80) to unknown:192.168.240.50/46502 (192.168.240.50/46502)
Dec 01 2020 12:58:33: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.241.50/80 to unknown:192.168.240.50/46502 duration 0:00:00 forwarded bytes 0 Forwarding or redirect flow removed to create director or backup flow
Dec 01 2020 12:58:33: %FTD-6-302022: Built director stub TCP connection for INSIDE:192.168.240.50/46502 (192.168.240.50/46502) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 12:59:02: %FTD-6-302023: Teardown director TCP connection for INSIDE:192.168.240.50/46502 to OUTSIDE:192.168.241.50/80 duration 0:00:28 forwarded bytes 2048316300 Cluster flow with CLU closed on owner
unit-3-1:*************************************************************
firepower#
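The `forwarded bytes` counter in the 302023 teardown messages shows which stub actually carried traffic toward the owner. The following sketch extracts it per unit; the function name `forwarded_bytes` is hypothetical and the pattern assumes the message wording shown in this document.

```python
import re

UNIT_RE = re.compile(r"^(unit-\d+-\d+)(?:\(LOCAL\))?:\*+$")
# Stub type and forwarded byte counter from a 302023 teardown message
TEARDOWN_RE = re.compile(
    r"%FTD-6-302023: Teardown (\w+) TCP connection .* forwarded bytes (\d+)"
)


def forwarded_bytes(text):
    """Return {unit: [(stub_type, forwarded_bytes), ...]}."""
    result = {}
    unit = None
    for raw in text.splitlines():
        line = raw.strip()
        banner = UNIT_RE.match(line)
        if banner:
            unit = banner.group(1)
            result[unit] = []
            continue
        m = TEARDOWN_RE.search(line)
        if m and unit:
            result[unit].append((m.group(1), int(m.group(2))))
    return result


# Abridged from the output above (only teardown lines are kept)
SAMPLE = """\
unit-1-1(LOCAL):******************************************************
Dec 01 2020 12:59:02: %FTD-6-302014: Teardown TCP connection 9742 for INSIDE:192.168.240.50/46502 to OUTSIDE:192.168.241.50/80 duration 0:00:28 bytes 2048000440 TCP FINs from INSIDE
unit-2-1:*************************************************************
Dec 01 2020 12:58:33: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.241.50/80 to unknown:192.168.240.50/46502 duration 0:00:00 forwarded bytes 0 Forwarding or redirect flow removed to create director or backup flow
Dec 01 2020 12:59:02: %FTD-6-302023: Teardown director TCP connection for INSIDE:192.168.240.50/46502 to OUTSIDE:192.168.241.50/80 duration 0:00:28 forwarded bytes 2048316300 Cluster flow with CLU closed on owner
unit-3-1:*************************************************************
"""

print(forwarded_bytes(SAMPLE))
```

For this flow the short-lived forwarder stub on unit-2-1 forwarded 0 bytes, and the director stub that replaced it forwarded 2048316300 bytes. This is consistent with the statement that a director can still forward traffic.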
Case Study 4. Asymmetric Traffic (the Owner Is the Director)
Observation 1. The reinject-hide captures show packets on unit-1-1 and unit-2-1 (asymmetric flow):
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33554229 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Buffer Full - 98974 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 98974 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Buffer Full - 98974 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99924 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-2-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33552925 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-3-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Capturing - 227690 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Capturing - 4754 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
Observation 2. Connection flag analysis for the flow with source port 46916:
firepower# cluster exec show conn
unit-1-1(LOCAL):******************************************************
23 in use, 25 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 0 in use, 122 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 1 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46916, idle 0:00:00, bytes 414682616, flags UIO N1
unit-2-1:*************************************************************
21 in use, 271 most used
Cluster:
fwd connections: 1 in use, 2 most used
dir connections: 0 in use, 2 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 NP Identity Ifc 192.168.240.50:46916, idle 0:00:00, bytes 0, flags z
unit-3-1:*************************************************************
17 in use, 20 most used
Cluster:
fwd connections: 0 in use, 5 most used
dir connections: 1 in use, 127 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46916, idle 0:00:04, bytes 0, flags y
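Unit | Flag | Note
unit-1-1 | UIO | Flow owner: the unit handles the flow. Director: because unit-3-1 has the flag y and not Y, unit-1-1 was chosen as the director for this flow. Since unit-1-1 is also the owner, another unit (unit-3-1 in this case) was chosen as the backup owner
unit-2-1 | z | Forwarder
unit-3-1 | y | Backup owner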
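This can be visualized as:
Observation 3. Capture with trace shows the asymmetric traffic and the redirection from unit-2-1 to unit-1-1.
unit-2-1 (forwarder)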
firepower# cluster exec unit unit-2-1 show capture CAPO packet-number 1 trace
1: 16:11:33.653164 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46916: S 1331019196:1331019196(0) ack 3089755618 win 28960 <mss 1460,sackOK,timestamp 532473211 522117741,nop,wscale 7>
...
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) am early redirecting to (0) due to matching action (-1).
Observation 4. The FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | i 46916
unit-1-1(LOCAL):******************************************************
Dec 01 2020 16:11:33: %FTD-6-302013: Built inbound TCP connection 10023 for INSIDE:192.168.240.50/46916 (192.168.240.50/46916) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 16:11:42: %FTD-6-302014: Teardown TCP connection 10023 for INSIDE:192.168.240.50/46916 to OUTSIDE:192.168.241.50/80 duration 0:00:09 bytes 1024010016 TCP FINs from INSIDE
unit-2-1:*************************************************************
Dec 01 2020 16:11:33: %FTD-6-302022: Built forwarder stub TCP connection for OUTSIDE:192.168.241.50/80 (192.168.241.50/80) to unknown:192.168.240.50/46916 (192.168.240.50/46916)
Dec 01 2020 16:11:42: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.241.50/80 to unknown:192.168.240.50/46916 duration 0:00:09 forwarded bytes 1024009868 Cluster flow with CLU closed on owner
unit-3-1:*************************************************************
Dec 01 2020 16:11:33: %FTD-6-302022: Built backup stub TCP connection for INSIDE:192.168.240.50/46916 (192.168.240.50/46916) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 16:11:42: %FTD-6-302023: Teardown backup TCP connection for INSIDE:192.168.240.50/46916 to OUTSIDE:192.168.241.50/80 duration 0:00:09 forwarded bytes 0 Cluster flow with CLU closed on owner
Case Study 5. Asymmetric Traffic (Owner Is Different from the Director)
Observation 1. The reinject-hide captures show packets on unit-1-1 and unit-2-1 (asymmetric flow):
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33553207 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Buffer Full - 99396 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 99224 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Buffer Full - 99396 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99928 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-2-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33554251 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Buffer Full - 99052 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
unit-3-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Capturing - 131925 bytes]
capture CAPI type raw-data buffer 100000 trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO type raw-data buffer 100000 trace interface OUTSIDE [Capturing - 2592 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPI_RH type raw-data reinject-hide buffer 100000 interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
capture CAPO_RH type raw-data reinject-hide buffer 100000 interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.241.50 eq www
Observation 2. Connection flag analysis for the flow with source port 46994:
firepower# cluster exec show conn
unit-1-1(LOCAL):******************************************************
23 in use, 25 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 0 in use, 122 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 1 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46994, idle 0:00:00, bytes 406028640, flags UIO N1
unit-2-1:*************************************************************
22 in use, 271 most used
Cluster:
fwd connections: 1 in use, 2 most used
dir connections: 0 in use, 2 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 NP Identity Ifc 192.168.240.50:46994, idle 0:00:00, bytes 0, flags z
unit-3-1:*************************************************************
17 in use, 20 most used
Cluster:
fwd connections: 2 in use, 5 most used
dir connections: 1 in use, 127 most used
centralized connections: 0 in use, 0 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.241.50:80 INSIDE 192.168.240.50:46994, idle 0:00:05, bytes 0, flags Y
Unit | Flag | Note
unit-1-1 | UIO | Flow owner: the unit handles the flow
unit-2-1 | z | Forwarder
unit-3-1 | Y | Backup owner and director
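This can be visualized as:
Observation 3. Capture with trace shows the asymmetric traffic and the redirection from unit-2-1 to unit-1-1.
unit-1-1 (owner)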
firepower# cluster exec show cap CAPI packet-number 1 trace
unit-1-1(LOCAL):******************************************************
…
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (0) am becoming owner
unit-2-1 (forwarder)
firepower# cluster exec unit unit-2-1 show cap CAPO packet-number 1 trace
1: 16:46:44.232074 802.1Q vlan#202 P0 192.168.241.50.80 > 192.168.240.50.46994: S 2863659376:2863659376(0) ack 2879616990 win 28960 <mss 1460,sackOK,timestamp 534583774 524228304,nop,wscale 7>
…
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (1) am early redirecting to (0) due to matching action (-1).
Observation 4. The FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | i 46994
unit-1-1(LOCAL):******************************************************
Dec 01 2020 16:46:44: %FTD-6-302013: Built inbound TCP connection 10080 for INSIDE:192.168.240.50/46994 (192.168.240.50/46994) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 16:46:53: %FTD-6-302014: Teardown TCP connection 10080 for INSIDE:192.168.240.50/46994 to OUTSIDE:192.168.241.50/80 duration 0:00:09 bytes 1024000440 TCP FINs from INSIDE
unit-2-1:*************************************************************
Dec 01 2020 16:46:44: %FTD-6-302022: Built forwarder stub TCP connection for OUTSIDE:192.168.241.50/80 (192.168.241.50/80) to unknown:192.168.240.50/46994 (192.168.240.50/46994)
Dec 01 2020 16:46:53: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.241.50/80 to unknown:192.168.240.50/46994 duration 0:00:09 forwarded bytes 1024000292 Cluster flow with CLU closed on owner
unit-3-1:*************************************************************
Dec 01 2020 16:46:44: %FTD-6-302022: Built director stub TCP connection for INSIDE:192.168.240.50/46994 (192.168.240.50/46994) to OUTSIDE:192.168.241.50/80 (192.168.241.50/80)
Dec 01 2020 16:46:53: %FTD-6-302023: Teardown director TCP connection for INSIDE:192.168.240.50/46994 to OUTSIDE:192.168.241.50/80 duration 0:00:09 forwarded bytes 0 Cluster flow with CLU closed on owner
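For the next case studies, the topology used is based on a cluster with inline sets:
Case Study 6. Asymmetric Traffic (Inline Set, the Owner Is the Director)
Observation 1. The reinject-hide captures show packets on unit-1-1 and unit-2-1 (asymmetric flow). Additionally, the owner is unit-2-1: its reinject-hide captures hold packets on both the INSIDE and OUTSIDE interfaces, while unit-1-1 has packets only on OUTSIDE: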
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33553253 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Buffer Full - 523432 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Buffer Full - 523432 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
unit-2-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33554312 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Buffer Full - 523782 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Buffer Full - 523782 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Buffer Full - 524218 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Buffer Full - 523782 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
unit-3-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Capturing - 53118 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
Observation 2. Connection flag analysis for the flow with source port 51844:
firepower# cluster exec show conn addr 192.168.240.51
unit-1-1(LOCAL):******************************************************
30 in use, 102 most used
Cluster:
fwd connections: 1 in use, 1 most used
dir connections: 2 in use, 122 most used
centralized connections: 3 in use, 39 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.240.51:80 NP Identity Ifc 192.168.240.50:51844, idle 0:00:00, bytes 0, flags z
unit-2-1:*************************************************************
23 in use, 271 most used
Cluster:
fwd connections: 0 in use, 2 most used
dir connections: 4 in use, 26 most used
centralized connections: 0 in use, 14 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
TCP OUTSIDE 192.168.240.51:80 INSIDE 192.168.240.50:51844, idle 0:00:00, bytes 231214400, flags b N
unit-3-1:*************************************************************
20 in use, 55 most used
Cluster:
fwd connections: 0 in use, 5 most used
dir connections: 1 in use, 127 most used
centralized connections: 0 in use, 24 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.240.51:80 INSIDE 192.168.240.50:51844, idle 0:00:01, bytes 0, flags y
Unit | Flag | Note
unit-1-1 | z | Forwarder
unit-2-1 | b N | Flow owner: the unit handles the flow
unit-3-1 | y | Backup owner
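This can be visualized as:
Observation 3. Capture with trace shows the asymmetric traffic and the redirection from unit-1-1 to unit-2-1.
unit-2-1 (owner/director)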
firepower# cluster exec unit unit-2-1 show cap CAPI packet-number 1 trace
1: 18:10:12.842912 192.168.240.50.51844 > 192.168.240.51.80: S 4082593463:4082593463(0) win 29200 <mss 1460,sackOK,timestamp 76258053 0,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
Phase: 2
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (1) am becoming owner
unit-1-1 (forwarder)
firepower# cluster exec show cap CAPO packet-number 1 trace
unit-1-1(LOCAL):******************************************************
1: 18:10:12.842317 192.168.240.51.80 > 192.168.240.50.51844: S 2339579109:2339579109(0) ack 4082593464 win 28960 <mss 1460,sackOK,timestamp 513139467 76258053,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (0) am asking director (1).
Return traffic (TCP SYN/ACK)
unit-2-1 (owner/director)
firepower# cluster exec unit unit-2-1 show cap CAPO packet-number 2 trace
2: 18:10:12.843660 192.168.240.51.80 > 192.168.240.50.51844: S 2339579109:2339579109(0) ack 4082593464 win 28960 <mss 1460,sackOK,timestamp 513139467 76258053,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: FULL
I (1) am owner, update sender (0).
Phase: 2
Type: FLOW-LOOKUP
Subtype:
Result: ALLOW
Config:
Additional Information:
Found flow with id 7109, using existing flow
Observation 4. FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | include 51844
unit-1-1(LOCAL):******************************************************
Dec 02 2020 18:10:12: %FTD-6-302022: Built forwarder stub TCP connection for OUTSIDE:192.168.240.51/80 (192.168.240.51/80) to unknown:192.168.240.50/51844 (192.168.240.50/51844)
Dec 02 2020 18:10:22: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.240.51/80 to unknown:192.168.240.50/51844 duration 0:00:09 forwarded bytes 1024001740 Cluster flow with CLU closed on owner
unit-2-1:*************************************************************
Dec 02 2020 18:10:12: %FTD-6-302303: Built TCP state-bypass connection 7109 from INSIDE:192.168.240.50/51844 (192.168.240.50/51844) to OUTSIDE:192.168.240.51/80 (192.168.240.51/80)
Dec 02 2020 18:10:22: %FTD-6-302304: Teardown TCP state-bypass connection 7109 from INSIDE:192.168.240.50/51844 to OUTSIDE:192.168.240.51/80 duration 0:00:09 bytes 1024001888 TCP FINs
unit-3-1:*************************************************************
Dec 02 2020 18:10:12: %FTD-6-302022: Built backup stub TCP connection for INSIDE:192.168.240.50/51844 (192.168.240.50/51844) to OUTSIDE:192.168.240.51/80 (192.168.240.51/80)
Dec 02 2020 18:10:22: %FTD-6-302023: Teardown backup TCP connection for INSIDE:192.168.240.50/51844 to OUTSIDE:192.168.240.51/80 duration 0:00:09 forwarded bytes 0 Cluster flow with CLU closed on owner
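Case Study 7. Asymmetric Traffic (Inline-Set, the Owner Is Different from the Director)
The owner is unit-2-1 (it has packets in the reinject-hide captures on both the INSIDE and OUTSIDE interfaces, while unit-3-1 has them only on OUTSIDE):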
firepower# cluster exec show cap
unit-1-1(LOCAL):******************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Capturing - 13902 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Capturing - 90 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
unit-2-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33553936 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Buffer Full - 523126 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Buffer Full - 523126 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Buffer Full - 524230 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Buffer Full - 523126 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
unit-3-1:*************************************************************
capture CCL type raw-data buffer 33554432 interface cluster [Buffer Full - 33553566 bytes]
capture CAPO type raw-data trace interface OUTSIDE [Buffer Full - 523522 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI type raw-data trace interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPO_RH type raw-data reinject-hide interface OUTSIDE [Buffer Full - 523432 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
capture CAPI_RH type raw-data reinject-hide interface INSIDE [Capturing - 0 bytes]
match tcp host 192.168.240.50 host 192.168.240.51 eq www
Observation 2. Connection flag analysis for the flow with source port 59210
firepower# cluster exec show conn addr 192.168.240.51
unit-1-1(LOCAL):******************************************************
25 in use, 102 most used
Cluster:
fwd connections: 0 in use, 1 most used
dir connections: 2 in use, 122 most used
centralized connections: 0 in use, 39 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 4 most enabled, 1 most in effect
TCP OUTSIDE 192.168.240.51:80 INSIDE 192.168.240.50:59210, idle 0:00:03, bytes 0, flags Y
unit-2-1:*************************************************************
21 in use, 271 most used
Cluster:
fwd connections: 0 in use, 2 most used
dir connections: 0 in use, 28 most used
centralized connections: 0 in use, 14 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 249 most enabled, 0 most in effect
TCP OUTSIDE 192.168.240.51:80 INSIDE 192.168.240.50:59210, idle 0:00:00, bytes 610132872, flags b N
unit-3-1:*************************************************************
19 in use, 55 most used
Cluster:
fwd connections: 1 in use, 5 most used
dir connections: 0 in use, 127 most used
centralized connections: 0 in use, 24 most used
VPN redirect connections: 0 in use, 0 most used
Inspect Snort:
preserve-connection: 0 enabled, 0 in effect, 1 most enabled, 0 most in effect
TCP OUTSIDE 192.168.240.51:80 NP Identity Ifc 192.168.240.50:59210, idle 0:00:00, bytes 0, flags z
Unit | Flag | Notes
unit-1-1 | Y | Director/backup owner
unit-2-1 | b N | Flow owner: the unit handles the flow
unit-3-1 | z | Forwarder
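The flag-to-role mapping in the tables above can be sketched as a small helper. This is a minimal illustration, not part of the product: the function name `cluster_role` is hypothetical, and the rule that any flag string without a stub flag (Y, y, z) belongs to the flow owner is a simplifying assumption drawn from the outputs shown here.

```python
# Stub-flow flags taken from the tables above. Treating every other flag
# combination as a full (owner) flow is an assumption made for illustration.
STUB_ROLES = {"Y": "director", "y": "backup owner", "z": "forwarder"}


def cluster_role(flags: str) -> str:
    """Return the cluster role implied by the flags column of `show conn`."""
    roles = [STUB_ROLES[char] for char in flags if char in STUB_ROLES]
    return "/".join(roles) if roles else "flow owner"


# Flags from the three units in the output above.
for unit, flags in [("unit-1-1", "Y"), ("unit-2-1", "b N"), ("unit-3-1", "z")]:
    print(unit, "->", cluster_role(flags))
```

A helper like this is only useful for a quick read of `cluster exec show conn` output; the flag legend printed by `show conn detail` remains the reference.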
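This can be visualized as:
Note: It is important that step 2 (the packet through the CCL) occurs before step 4 (the data traffic). Otherwise (for example, in a race condition), the director is not aware of the flow. Because this is an inline-set, it then forwards the packet towards the destination. If the interfaces are not in an inline-set, the packet is dropped.
Observation 3. Captures with trace show the asymmetric traffic and the exchanges over the CCL:
Forward traffic (TCP SYN)
Unit-2-1 (owner)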
firepower# cluster exec unit unit-2-1 show cap CAPI packet-number 1 trace
1: 09:19:49.760702 192.168.240.50.59210 > 192.168.240.51.80: S 4110299695:4110299695(0) win 29200 <mss 1460,sackOK,timestamp 130834570 0,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
Phase: 2
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'INSIDE'
Flow type: NO FLOW
I (1) am becoming owner
Return traffic (TCP SYN/ACK)
Unit-3-1 (ID 2 - forwarder) sends the packet through the CCL to unit-1-1 (ID 0 - director)
firepower# cluster exec unit unit-3-1 show cap CAPO packet-number 1 trace
1: 09:19:49.760336 192.168.240.51.80 > 192.168.240.50.59210: S 4209225081:4209225081(0) ack 4110299696 win 28960 <mss 1460,sackOK,timestamp 567715984 130834570,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: NO FLOW
I (2) am asking director (0).
Unit-1-1 (director): unit-1-1 (ID 0) knows that the flow owner is unit-2-1 (ID 1) and sends the packet back through the CCL to unit-3-1 (ID 2 - forwarder)
firepower# cluster exec show cap CAPO packet-number 1 trace
unit-1-1(LOCAL):******************************************************
1: 09:19:49.761038 192.168.240.51.80 > 192.168.240.50.59210: S 4209225081:4209225081(0) ack 4110299696 win 28960 <mss 1460,sackOK,timestamp 567715984 130834570,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: STUB
I (0) am director, valid owner (1), update sender (2).
Unit-3-1 (ID 2 - forwarder) gets the packet through the CCL and sends it to unit-2-1 (ID 1 - owner)
firepower# cluster exec unit unit-3-1 show cap CAPO packet-number 2 trace
...
2: 09:19:49.761008 192.168.240.51.80 > 192.168.240.50.59210: S 4209225081:4209225081(0) ack 4110299696 win 28960 <mss 1460,sackOK,timestamp 567715984 130834570,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: STUB
I (2) am becoming forwarder to (1), sender (0).
The owner reinjects the packet and forwards it to the destination:
firepower# cluster exec unit unit-2-1 show cap CAPO packet-number 2 trace
2: 09:19:49.775701 192.168.240.51.80 > 192.168.240.50.59210: S 4209225081:4209225081(0) ack 4110299696 win 28960 <mss 1460,sackOK,timestamp 567715984 130834570,nop,wscale 7>
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'OUTSIDE'
Flow type: FULL
I (1) am owner, sender (2).
Observation 4. FTD data plane syslogs show the connection creation and termination on all units:
firepower# cluster exec show log | i 59210
unit-1-1(LOCAL):******************************************************
Dec 03 2020 09:19:49: %FTD-6-302022: Built director stub TCP connection for INSIDE:192.168.240.50/59210 (192.168.240.50/59210) to OUTSIDE:192.168.240.51/80 (192.168.240.51/80)
Dec 03 2020 09:19:59: %FTD-6-302023: Teardown director TCP connection for INSIDE:192.168.240.50/59210 to OUTSIDE:192.168.240.51/80 duration 0:00:09 forwarded bytes 0 Cluster flow with CLU closed on owner
unit-2-1:*************************************************************
Dec 03 2020 09:19:49: %FTD-6-302303: Built TCP state-bypass connection 14483 from INSIDE:192.168.240.50/59210 (192.168.240.50/59210) to OUTSIDE:192.168.240.51/80 (192.168.240.51/80)
Dec 03 2020 09:19:59: %FTD-6-302304: Teardown TCP state-bypass connection 14483 from INSIDE:192.168.240.50/59210 to OUTSIDE:192.168.240.51/80 duration 0:00:09 bytes 1024003336 TCP FINs
unit-3-1:*************************************************************
Dec 03 2020 09:19:49: %FTD-6-302022: Built forwarder stub TCP connection for OUTSIDE:192.168.240.51/80 (192.168.240.51/80) to unknown:192.168.240.50/59210 (192.168.240.50/59210)
Dec 03 2020 09:19:59: %FTD-6-302023: Teardown forwarder TCP connection for OUTSIDE:192.168.240.51/80 to unknown:192.168.240.50/59210 duration 0:00:09 forwarded bytes 1024003188 Cluster flow with CLU closed on owner
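The role each unit took for a flow can also be read from the connection-build syslogs above. The sketch below is an illustration only: `role_from_syslog` is a hypothetical helper, and it assumes that a "Built ... TCP" message without the word "stub" was logged by the flow owner, as is the case in the output shown here.

```python
import re

# Matches "Built <role> stub TCP ..." (stub flows) and "Built TCP ..." (full flow).
BUILT_RE = re.compile(r"Built (?:(\w+) stub )?TCP")


def role_from_syslog(line: str):
    """Return the role named in a connection-build syslog, or None for other lines."""
    match = BUILT_RE.search(line)
    if match is None:
        return None
    # Assumption: a build message without a stub keyword comes from the owner.
    return match.group(1) or "owner"


sample = (
    "Dec 03 2020 09:19:49: %FTD-6-302022: Built forwarder stub TCP connection "
    "for OUTSIDE:192.168.240.51/80 (192.168.240.51/80) to "
    "unknown:192.168.240.50/59210 (192.168.240.50/59210)"
)
print(role_from_syslog(sample))  # forwarder
```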
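Cluster Troubleshooting Introduction
Cluster problems can be categorized into:
Important configuration considerations
High PAT pool range usage due to traffic sourced from low ports, which causes cluster IP imbalance
The FTD divides a PAT IP into ranges and tries to keep the xlate in the same source range. This table shows how a source port is translated to a global port within the same source range.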
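Original source port | Translated source port
1-511 | 1-511
512-1023 | 512-1023
1024-65535 | 1024-65535
When a source port range is full and a new PAT xlate needs to be allocated from that range, the FTD moves to the next IP to allocate new translations for that source port range.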
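The range lookup described in the table can be sketched as follows. The function name `pat_source_range` is hypothetical; only the three ranges come from the table above.

```python
# The three source-port ranges from the table above (pre-6.7/9.15.1 behavior).
PAT_RANGES = [(1, 511), (512, 1023), (1024, 65535)]


def pat_source_range(source_port: int):
    """Return the (low, high) range from which the translated port is taken."""
    for low, high in PAT_RANGES:
        if low <= source_port <= high:
            return (low, high)
    raise ValueError(f"not a valid source port: {source_port}")


print(pat_source_range(49464))  # (1024, 65535)
```

Because the low ranges hold far fewer ports, a burst of connections sourced from ports below 1024 fills its range on one PAT IP quickly and pushes allocation to the next IP, which is the imbalance described in this section.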
Symptom
Connectivity problems for NATed traffic that traverses the cluster
Verification
# show nat pool
FTD data plane logs show PAT pool exhaustion:
Dec 9 09:00:00 192.0.2.10 FTD-FW %ASA-3-202010: PAT pool exhausted. Unable to create TCP connection from Inside:192.0.2.150/49464 to Outside:192.0.2.250/20015
Dec 9 09:00:00 192.0.2.10 FTD-FW %ASA-3-202010: PAT pool exhausted. Unable to create TCP connection from Inside:192.0.2.148/54141 to Outside:192.0.2.251/443
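Mitigation
Configure the NAT flat port range and include the reserve ports.
Additionally, in post-6.7/9.15.1 you can end up with an unbalanced port block distribution only when nodes leave or join the cluster while a large amount of background traffic is subject to PAT. The only way it recovers by itself is when port blocks are freed and redistributed across the nodes.
With port block-based distribution, a node is allocated, for example, 10 port blocks such as pb-1, pb-2 ... pb-10. The node always starts with the first available port block and allocates a random port from it until the block is exhausted. Allocation moves to the next port block only when all port blocks up to that point are exhausted.
For example, if a host establishes 512 connections, the unit allocates mapped ports for all those 512 connections from pb-1 at random. With all these 512 connections active, when the host establishes the 513th connection, the unit moves to pb-2 and allocates a random port from it. Now assume that, out of the 513 connections, the 10th connection finishes and frees one port in pb-1. At this point, if the host establishes the 514th connection, the cluster unit allocates a mapped port from pb-1 and not from pb-2, because pb-1 now has a free port (released when the 10th connection was removed).
The important point is that allocation happens from the first available port block that has free ports, so that the last port blocks are always available for redistribution in a normally loaded system. In addition, PAT is typically used for short-lived connections, so the likelihood that a port block becomes available in a short time is very high. Consequently, the time needed for the pool distribution to become balanced can improve with port block-based pool distribution.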
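The first-available-block behavior in the 512/513/514 connection example can be reproduced with a small simulation. This is only a sketch of the allocation order described above, not the actual data plane logic; the class `PortBlockAllocator` and its methods are hypothetical names.

```python
import random


class PortBlockAllocator:
    """Toy model: allocate a random port from the first block that has a free port."""

    def __init__(self, blocks: int, block_size: int = 512, first_port: int = 1024):
        self.free = [
            set(range(first_port + i * block_size, first_port + (i + 1) * block_size))
            for i in range(blocks)
        ]

    def allocate(self):
        for index, ports in enumerate(self.free):
            if ports:
                port = random.choice(sorted(ports))
                ports.remove(port)
                return index, port
        raise RuntimeError("all port blocks are exhausted")

    def release(self, index: int, port: int) -> None:
        self.free[index].add(port)


allocator = PortBlockAllocator(blocks=10)           # pb-1 .. pb-10 (index 0 .. 9)
conns = [allocator.allocate() for _ in range(512)]  # fills pb-1 completely
conn_513 = allocator.allocate()                     # first port taken from pb-2
allocator.release(*conns[9])                        # the 10th connection ends
conn_514 = allocator.allocate()                     # reuses the freed port in pb-1
print(conn_513[0], conn_514[0])                     # block indexes: 1 0
```

The last blocks stay untouched in this run, which is why they remain candidates for redistribution on a normally loaded node.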
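However, if all port blocks from pb-1 to pb-10 are exhausted, or each port block holds a port for a long-lived connection, the port blocks are never freed quickly and redistributed. In such a case, the least disruptive approach is to:
Warning: This disrupts the relevant connections.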
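Unable to browse to dual-channel websites (like webmail, banking, etc.) or SSO websites when they redirect to a different destination
Symptom
Unable to browse to dual-channel websites (like webmail, banking websites, etc.). When a user connects to a website that requires the client to open a second socket/connection, the second connection is hashed to a different cluster member than the first one, and the traffic uses an IP PAT pool, the server resets the traffic because it receives the connections from different public IP addresses.
Verification
Take data plane cluster captures to see how the affected transit flow is handled. In this case, a TCP reset comes from the destination website.
Mitigation (pre-6.7/9.15.1)
About the Ether-channel load-balance algorithm: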
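Low cluster performance due to all traffic sent to the control node because there are not enough PAT IPs in the pools
Symptom
There are not enough PAT IPs in the cluster to allocate a free IP to the data nodes, and therefore all traffic subject to the PAT configuration is forwarded to the control node for processing.
Verification
Use the show nat pool cluster command to see the allocations for each unit and confirm that they all own at least one IP in the pool.
Mitigation
For pre-6.7/9.15.1, ensure that the PAT pool size is at least equal to the number of nodes in the cluster. In post-6.7/9.15.1, port blocks are allocated from all of the PAT pool IPs. If the PAT pool usage is really high, which leads to frequent exhaustion of the pool, you need to increase the PAT pool size (see the FAQ section).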
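Low performance due to all traffic sent to the control node because xlates are not per-session enabled
Symptom
A large number of high-speed UDP backup flows are processed through the cluster control node, which can impact performance.
Background
Only connections that use per-session enabled xlates can be processed by a data node that uses PAT. Use the command show run all xlate to see the xlate per-session configuration.
Per-session enabled means that the xlate is torn down immediately when the associated connection is torn down. This helps to improve the connections-per-second performance when connections are subject to PAT. Non per-session xlates live for another 30 seconds after the associated connection is torn down, and if the connection rate is high enough, the available 65k TCP/UDP ports on each global IP can be used up in a short amount of time.
By default, all TCP traffic is per-session xlate enabled and only UDP DNS traffic is per-session enabled. This means that all non-DNS UDP traffic is forwarded to the control node for processing.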
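The default behavior described above can be summarized in a short decision helper. This is a simplified sketch under stated assumptions: the function name `data_node_can_process` is hypothetical, "UDP DNS" is modeled as UDP with destination port 53, and any custom xlate per-session rules are ignored.

```python
def data_node_can_process(protocol: str, dst_port=None) -> bool:
    """Return True if, with the default per-session settings described above,
    a PATed flow of this type can be processed by a data node."""
    protocol = protocol.lower()
    if protocol == "tcp":
        return True                # all TCP traffic is per-session by default
    if protocol == "udp":
        return dst_port == 53      # assumption: "UDP DNS" means destination port 53
    return False                   # for example ICMP: multi-session PAT, control node


print(data_node_can_process("udp", 514))  # False: forwarded to the control node
```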
Verification
Use these commands to check the connection and packet distribution among the cluster units:
firepower# show cluster info conn-distribution
firepower# show cluster info packet-distribution
firepower# show cluster info load-monitor
Use the cluster exec show conn command to see which cluster nodes own the UDP connections.
firepower# cluster exec show conn
Use this command to understand the pool usage across the cluster nodes.
firepower# cluster exec show nat pool ip| in UDP
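Mitigation
Configure per-session PAT (the xlate per-session permit udp command) for the traffic of interest (for example, UDP). For ICMP, you cannot change the default multi-session PAT, so ICMP traffic is always processed by the control node when PAT is configured.
PAT pool distribution becomes unbalanced when nodes leave/join the cluster
Symptom
Verification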
%ASA-3-202010: NAT pool exhausted. Unable to create TCP connection from inside:192.0.2.1/2239 to outside:192.0.2.150/80
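Mitigation
Symptom
Major connectivity problems for traffic that is PATed by the cluster. This is because the FTD data plane, by design, does not send GARP for global NAT addresses.
Verification
The ARP table of directly connected devices shows a different MAC address for the cluster data interface after a change of the control node: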
root@kali2:~/tests# arp -a
? (192.168.240.1) at f4:db:e6:33:44:2e [ether] on eth0
root@kali2:~/tests# arp -a
? (192.168.240.1) at f4:db:e6:9e:3d:0e [ether] on eth0
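Mitigation
Configure a static (virtual) MAC on the cluster data interfaces.
Connections subject to PAT fail
Symptom
Connectivity problems for traffic that is PATed by the cluster.
Verification/Mitigation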
firepower# debug nat 2
nat: no free blocks available to reserve for 192.168.241.59, proto 17
nat: no free blocks available to reserve for 192.168.241.59, proto 17
nat: no free blocks available to reserve for 192.168.241.58, proto 17
nat: no free blocks available to reserve for 192.168.241.58, proto 17
nat: no free blocks available to reserve for 192.168.241.57, proto 17
To stop the debug:
firepower# un all
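ASA and FTD clustering PAT improvements (after 9.15 and 6.7)
What changed?
The PAT operation was redesigned. Individual IPs are no longer distributed to each of the cluster members. Instead, the PAT IPs are split into port blocks, and those port blocks are distributed as evenly as possible among the cluster members, in combination with the IP stickiness operation.
The new design addresses these limitations (see the previous section):
Technically, instead of the default 1-511, 512-1023, and 1024-65535 port ranges, there is now 1024-65535 as the default port range for PAT. This default range can be extended to include the privileged port range 1-1023 for regular PAT (the "include-reserve" option).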
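As a quick arithmetic check, the number of port blocks per PAT IP follows from the default range and the 512-port block size that this document gives for regular dynamic PAT. The function name `port_blocks_per_ip` is hypothetical, and the even three-way split assumes a 3-unit cluster.

```python
def port_blocks_per_ip(first_port: int = 1024, last_port: int = 65535,
                       block_size: int = 512) -> int:
    """Number of whole port blocks that fit in the PAT port range of one IP."""
    return (last_port - first_port + 1) // block_size


blocks = port_blocks_per_ip()
print(blocks)        # 126 blocks in the default 1024-65535 range
print(blocks // 3)   # 42 blocks per unit when split evenly across 3 units
```

These values agree with the balanced `show nat pool cluster summary` sample in this document (126 - 42 / 42 / 42).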
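This is an example of a PAT pool configuration on FTD 6.7. For additional details, check the related section in the Configuration Guide:
Additional troubleshooting information about PAT
FTD data plane syslogs (post-6.7/9.15.1)
A stickiness invalidation syslog is generated when all ports in the sticky IP are exhausted on a cluster node and allocation moves to the next available IP with free ports, for example: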
%ASA-4-305021: Ports exhausted in pre-allocated PAT pool IP 192.0.2.100 for host 198.51.100.100 Allocating from new PAT pool IP 203.0.113.100.
A pool imbalance syslog is generated on a node when it joins the cluster and gets no port blocks or an unequal share of them, for example:
%ASA-4-305022: Cluster unit ASA-4 has been allocated 0 port blocks for PAT usage. All units should have at least 32 port blocks.
%ASA-4-305022: Cluster unit ASA-4 has been allocated 12 port blocks for PAT usage. All units should have at least 32 port blocks.
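Show commands
Pool distribution status
In the show nat pool cluster summary output, for each PAT IP address, the port block counts of the nodes must not differ by more than 1 in a balanced distribution scenario. Examples of a balanced and an unbalanced port block distribution follow. Balanced distribution: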
firepower# show nat pool cluster summary
port-blocks count display order: total, unit-1-1, unit-2-1, unit-3-1
IP OUTSIDE:ip_192.168.241.57-59 192.168.241.57 (126 - 42 / 42 / 42)
IP OUTSIDE:ip_192.168.241.57-59 192.168.241.58 (126 - 42 / 42 / 42)
IP OUTSIDE:ip_192.168.241.57-59 192.168.241.59 (126 - 42 / 42 / 42)
Unbalanced distribution:
firepower# show nat pool cluster summary
port-blocks count display order: total, unit-1-1, unit-4-1, unit-2-1, unit-3-1
IP outside:src_map 192.0.2.100 (128 - 32 / 22 / 38 / 36)
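The balance rule (no more than 1 port block of difference between nodes for a given PAT IP) can be checked against the summary lines shown above. This is a minimal sketch: the helper `is_balanced` is a hypothetical name, and the regular expression assumes the line layout of the two sample outputs.

```python
import re

# Matches e.g. "IP outside:src_map 192.0.2.100 (128 - 32 / 22 / 38 / 36)".
SUMMARY_RE = re.compile(r"^IP\s+\S+\s+(\S+)\s+\((\d+)\s+-\s+([\d\s/]+)\)")


def is_balanced(summary_line: str) -> bool:
    """Return True if the per-unit port block counts differ by at most 1."""
    match = SUMMARY_RE.match(summary_line.strip())
    if match is None:
        raise ValueError("not a 'show nat pool cluster summary' IP line")
    counts = [int(count) for count in match.group(3).split("/")]
    return max(counts) - min(counts) <= 1


print(is_balanced("IP OUTSIDE:ip_192.168.241.57-59 192.168.241.57 (126 - 42 / 42 / 42)"))  # True
print(is_balanced("IP outside:src_map 192.0.2.100 (128 - 32 / 22 / 38 / 36)"))            # False
```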
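Pool ownership status
In the show nat pool cluster output, there must not be a single port block with either the owner or the backup shown as UNKNOWN. If there is one, it indicates a problem with the pool ownership communication. Example: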
firepower# show nat pool cluster | in
[3072-3583], owner unit-4-1, backup <UNKNOWN>
[56832-57343], owner <UNKNOWN>, backup <UNKNOWN>
[10240-10751], owner unit-2-1, backup <UNKNOWN>
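A check for port blocks without a known owner or backup can be sketched as below. The helper name `blocks_with_unknown` is hypothetical, the pattern assumes the line layout of the sample above, and the first line of the sample text in the code is a made-up healthy entry added for contrast.

```python
import re

# Matches e.g. "[3072-3583], owner unit-4-1, backup <UNKNOWN>".
BLOCK_RE = re.compile(r"\[(\d+)-(\d+)\], owner (\S+), backup (\S+)")


def blocks_with_unknown(output: str):
    """Return (start, end) of every port block whose owner or backup is <UNKNOWN>."""
    flagged = []
    for start, end, owner, backup in BLOCK_RE.findall(output):
        if "<UNKNOWN>" in (owner, backup):
            flagged.append((int(start), int(end)))
    return flagged


sample_output = """\
[1024-1535], owner unit-1-1, backup unit-2-1
[3072-3583], owner unit-4-1, backup <UNKNOWN>
[56832-57343], owner <UNKNOWN>, backup <UNKNOWN>
[10240-10751], owner unit-2-1, backup <UNKNOWN>
"""
print(blocks_with_unknown(sample_output))
```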
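Accounting of port allocations in port blocks
The show nat pool command is enhanced with additional options to display detailed information as well as filtered output. Example: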
firepower# show nat pool detail
TCP PAT pool INSIDE, address 192.168.240.1, range 1-1023, allocated 0
TCP PAT pool INSIDE, address 192.168.240.1, range 1024-65535, allocated 18
UDP PAT pool INSIDE, address 192.168.240.1, range 1-1023, allocated 0
UDP PAT pool INSIDE, address 192.168.240.1, range 1024-65535, allocated 20
TCP PAT pool OUTSIDE, address 192.168.241.1, range 1-1023, allocated 0
TCP PAT pool OUTSIDE, address 192.168.241.1, range 1024-65535, allocated 18
UDP PAT pool OUTSIDE, address 192.168.241.1, range 1-1023, allocated 0
UDP PAT pool OUTSIDE, address 192.168.241.1, range 1024-65535, allocated 20
UDP PAT pool OUTSIDE, address 192.168.241.58
range 1024-1535, allocated 512
range 1536-2047, allocated 512
range 2048-2559, allocated 512
range 2560-3071, allocated 512
...
unit-2-1:*************************************************************
UDP PAT pool OUTSIDE, address 192.168.241.57
range 1024-1535, allocated 512 *
range 1536-2047, allocated 512 *
range 2048-2559, allocated 512 *
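Note: '*' indicates that it is a backed-up port block
To resolve this, use the clear xlate global <ip> gport <start-end> command to manually clear some of the port blocks on other nodes so that they are redistributed to the required nodes.
Manually triggered redistribution of port blocks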
firepower# show nat pool detail | i 19968
range 19968-20479, allocated 512
range 19968-20479, allocated 512
range 19968-20479, allocated 512
firepower# clear xlate global 192.168.241.57 gport 19968-20479
INFO: 1074 xlates deleted
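Frequently Asked Questions (FAQ) for post-6.7/9.15.1 PAT
Q. If the number of available IPs matches the number of units in the cluster, can you still use 1 IP per unit as an option?
A. Not anymore, and there is no toggle to switch between the IP address-based and the port block-based pool distribution schemes.
The older IP address-based pool distribution scheme resulted in multi-session application failures, where multiple connections from a host (that are part of a single application transaction) are load-balanced to different nodes of the cluster and thus translated by different mapped IP addresses, which makes the destination server see them as sourced from different entities.
And, with the new port block-based distribution scheme, even though you can now work with as few as one PAT IP address, it is always recommended to have enough PAT IP addresses for the number of connections that need to be PATed.
Q. Can you still have a pool of IP addresses for the PAT pool of the cluster?
A. Yes, you can. Port blocks from all the PAT pool IPs are distributed across the cluster nodes.
Q. If you use a number of IP addresses for the PAT pool, is the same block of ports given to each member for each IP address?
A. No, each IP is distributed independently.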
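Q. Do all cluster nodes have all the public IPs, but only a subset of ports? If this is the case, is it guaranteed that a given source IP uses the same public IP every time?
A. That is correct, each PAT IP is partly owned by each node. If a chosen public IP is exhausted on a node, a syslog is generated which indicates that the sticky IP cannot be preserved, and the allocation moves to the next available public IP. Be it a standalone, HA, or cluster deployment, IP stickiness is always on a best-effort basis, subject to pool availability.
Q. Is everything based on a single IP address in the PAT pool, and does it not apply if more than one IP address is used in the PAT pool?
A. It applies to multiple IP addresses in the PAT pool as well. Port blocks from every IP in the PAT pool are distributed across the cluster nodes. Every IP address in the PAT pool is split across all the members in the cluster. So, if you have a class C of addresses in the PAT pool, every cluster member has port pools from every one of the PAT pool addresses.
Q. Does it work with CGNAT?
A. Yes, CGNAT is supported as well. CGNAT, also known as block-allocation PAT, has a default block size of '512', which can be modified through the xlate block-allocation size CLI. In the case of regular dynamic PAT (non-CGNAT), the block size is always '512', which is fixed and not configurable.
Q. If a unit leaves the cluster, does the control node assign the port block range to other units or keep it to itself?
A. Each port block has an owner and a backup. Every time an xlate is created from a port block, it is also replicated to the backup node of the port block. When a node leaves the cluster, the backup node owns all the port blocks and all the current connections. The backup node, since it has become the owner of these additional port blocks, elects a new backup for them and replicates all current xlates to that node to handle failure scenarios.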
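Q. Based on that alert, what action can be taken to enforce stickiness?
A. There are two possible reasons why stickiness cannot be preserved.
Reason 1: Traffic is improperly load-balanced, so that one of the nodes sees a higher number of connections than the others, which leads to exhaustion of a particular sticky IP. This can be addressed if you ensure that traffic is evenly distributed across the cluster nodes. For example, on an FPR41xx cluster, tune the load-balance algorithm on the connected switches. On an FPR9300 cluster, ensure an equal number of blades across the chassis.
Reason 2: PAT pool usage is really high, which leads to frequent exhaustion of the pool. To address this, increase the PAT pool size.
Q. How is support for the extended keyword handled? Does it show an error and prevent the whole NAT command from being added during the upgrade, or does it remove the extended keyword and show a warning?
A. The PAT "extended" option is not supported in a cluster from ASA 9.15.1/FP 6.7 onwards. The configuration option is not removed from any of the CLI/ASDM/CSM/FMC. When it is configured (directly or indirectly through an upgrade), you are notified with a warning message and the configuration is accepted, but you do not see the extended functionality of PAT in action.
Q. Is the number of translations the same as the number of concurrent connections?
A. In pre-6.7/9.15.1, although the range was 1-65535, the source ports in the range 1-1024 are hardly ever used, so it was effectively 1024-65535 (64512 connections). In the post-6.7/9.15.1 implementation, with "flat" as the default behavior, it is 1024-65535. If you want to use 1-1024, you can do so with the "include-reserve" option.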
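Q. If a node joins the cluster again, does it have the old backup node as its backup, and does that backup node give its old port blocks back to it?
A. It depends on the availability of port blocks at that time. When a node leaves the cluster, all its port blocks are moved to the backup node. It is then the control node that accumulates free port blocks and distributes them to the required nodes.
Q. If the state of the control node changes and a new control node is elected, is the PAT block allocation maintained, or are the port blocks reallocated based on the new control node?
A. The new control node knows which blocks have been allocated and which are free, and starts from there.
Q. With this new behavior, is the maximum number of xlates the same as the maximum number of concurrent connections?
A. Yes. The maximum number of xlates depends on the availability of PAT ports. It has nothing to do with the maximum number of concurrent connections. If you only allow 1 address, you have 65535 possible connections. If you need more, you have to allocate more IP addresses. If there are enough addresses/ports, you can reach the maximum number of concurrent connections.
Q. What is the process of port block allocation when a new cluster member is added? What happens if a cluster member is added due to a reboot?
A. Port blocks are always distributed by the control node. Port blocks are allocated to a new node only when there are free port blocks. A free port block is one in which no connection is served through any of its mapped ports.
Further, upon rejoin, each node recalculates the number of blocks it can own. If a node holds more blocks than it is supposed to, it releases such additional port blocks to the control node as and when they become available. The control node then allocates them to the newly joined data node.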
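Q. Is it supported only with the TCP and UDP protocols, or with SCTP as well?
A. SCTP was never supported with dynamic PAT. For SCTP traffic, the recommendation is to use static network object NAT only.
Q. If a node runs out of block ports, does it drop packets instead of using the next available IP block?
A. No, it does not drop immediately. It uses available port blocks from the next PAT IP. If all the port blocks across all the PAT IPs are exhausted, then it drops the traffic.
Q. To avoid an overload of the control node in a cluster upgrade window, is it better to select a new control node manually earlier (for example, halfway through a 4-unit cluster upgrade), rather than wait for all connections to be handled on the control node?
A. The control node must be upgraded last. This is because, when the control node runs the newer version, it does not initiate pool distribution unless all nodes run the newer version. Additionally, while an upgrade runs, all data nodes with the newer version ignore pool distribution messages from a control node that runs an older version.
To explain this in detail, consider a cluster deployment with 4 nodes A, B, C, and D, with A as the control node. Here are the typical hitless upgrade steps: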
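a. Processes the PAT configuration
b. Breaks each PAT IP into port blocks
c. Has all port blocks in an unassigned state
d. Ignores the older version of the cluster PAT messages received from the control node
e. Redirects all PAT connections to the control node
4. Similarly, bring up the other nodes with the new version.
5. Reload the unit 'A' control node. Since there is no backup for the control node, all existing connections are dropped.
6. The new control node starts the distribution of port blocks in the newer format.
7. Unit 'A' rejoins and is able to accept and act on port block distribution messages.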
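Symptom
In inter-site cluster deployments, fragmented packets that must be handled in 1 specific site (site-local traffic) can still be sent to units in other sites, since one of those sites can hold the fragment owner.
In the cluster logic, there is an additional role defined for connections with fragmented packets: the fragment owner.
For fragmented packets, the cluster unit that receives a fragment determines the fragment owner based on a hash of the fragment source IP address, destination IP address, and packet ID. All fragments are then forwarded to the fragment owner over the cluster control link. Fragments can be load-balanced to different cluster units because only the first fragment includes the 5-tuple used in the switch load-balance hash. The other fragments do not contain the source and destination ports and can be load-balanced to other cluster units. The fragment owner temporarily reassembles the packet so that it can determine the director based on a hash of the source/destination IP addresses and ports. If it is a new connection, the fragment owner becomes the connection owner. If it is an existing connection, the fragment owner forwards all fragments to the connection owner over the cluster control link. The connection owner then reassembles all fragments.
Consider this topology with the flow of a fragmented ICMP echo request from the client to the server:
In order to understand the order of operations, cluster-wide packet captures with the trace option are configured on the inside, outside, and cluster control link interfaces. Additionally, a packet capture with the reinject-hide option is configured on the inside interface.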
firepower# cluster exec capture capi interface inside trace match icmp any any
firepower# cluster exec capture capir interface inside reinject-hide trace match icmp any any
firepower# cluster exec capture capo interface outside trace match icmp any any
firepower# cluster exec capture capccl interface cluster trace match icmp any any
Order of operations within the cluster:
1. unit-1-1 in site 1 receives the fragmented ICMP echo request packets.
firepower# cluster exec show cap capir
unit-1-1(LOCAL):******************************************************
2 packets captured
1: 20:13:58.227801 802.1Q vlan#10 P0 192.0.2.10 > 203.0.113.10 icmp: echo request
2: 20:13:58.227832 802.1Q vlan#10 P0
2 packets shown
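2. unit-1-1 selects unit-2-2 in site 2 as the fragment owner and sends the fragmented packets to it.
The destination MAC address of the packets sent from unit-1-1 to unit-2-2 is the MAC address of the CCL link in unit-2-2.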
firepower# show cap capccl packet-number 1 detail
7 packets captured
1: 20:13:58.227817 0015.c500.018f 0015.c500.029f 0x0800 Length: 1509
192.0.2.10 > 203.0.113.10 icmp: echo request (wrong icmp csum) (frag 46772:1475@0+) (ttl 3)
1 packet shown
firepower# show cap capccl packet-number 2 detail
7 packets captured
2: 20:13:58.227832 0015.c500.018f 0015.c500.029f 0x0800 Length: 637
192.0.2.10 > 203.0.113.10 (frag 46772:603@1480) (ttl 3)
1 packet shown
firepower# cluster exec show interface po48 | i MAC
unit-1-1(LOCAL):******************************************************
MAC address 0015.c500.018f, MTU 1500
unit-1-2:*************************************************************
MAC address 0015.c500.019f, MTU 1500
unit-2-2:*************************************************************
MAC address 0015.c500.029f, MTU 1500
unit-1-3:*************************************************************
MAC address 0015.c500.016f, MTU 1500
unit-2-1:*************************************************************
MAC address 0015.c500.028f, MTU 1500
unit-2-3:*************************************************************
MAC address 0015.c500.026f, MTU 1500
3. unit-2-2 receives and reassembles the fragmented packets, and becomes the owner of the flow.
firepower# cluster exec unit unit-2-2 show capture capccl packet-number 1 trace
11 packets captured
1: 20:13:58.231845 192.0.2.10 > 203.0.113.10 icmp: echo request
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'inside'
Flow type: NO FLOW
I (2) received a FWD_FRAG_TO_FRAG_OWNER from (0).
Phase: 2
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'inside'
Flow type: NO FLOW
I (2) have reassembled a packet and am processing it.
Phase: 3
Type: CAPTURE
Subtype:
Result: ALLOW
Config:
Additional Information:
MAC Access list
Phase: 4
Type: ACCESS-LIST
Subtype:
Result: ALLOW
Config:
Implicit Rule
Additional Information:
MAC Access list
Phase: 5
Type: ROUTE-LOOKUP
Subtype: No ECMP load balancing
Result: ALLOW
Config:
Additional Information:
Destination is locally connected. No ECMP load balancing.
Found next-hop 203.0.113.10 using egress ifc outside(vrfid:0)
Phase: 6
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'inside'
Flow type: NO FLOW
I (2) am becoming owner
Phase: 7
Type: ACCESS-LIST
Subtype: log
Result: ALLOW
Config:
access-group CSM_FW_ACL_ global
access-list CSM_FW_ACL_ advanced trust ip any any rule-id 268435460 event-log flow-end
access-list CSM_FW_ACL_ remark rule-id 268435460: PREFILTER POLICY: igasimov_prefilter1
access-list CSM_FW_ACL_ remark rule-id 268435460: RULE: r1
Additional Information:
...
Phase: 19
Type: FLOW-CREATION
Subtype:
Result: ALLOW
Config:
Additional Information:
New flow created with id 1719, packet dispatched to next module
...
Result:
input-interface: cluster(vrfid:0)
input-status: up
input-line-status: up
output-interface: outside(vrfid:0)
output-status: up
output-line-status: up
Action: allow
1 packet shown
firepower# cluster exec unit unit-2-2 show capture capccl packet-number 2 trace
11 packets captured
2: 20:13:58.231875
Phase: 1
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Config:
Additional Information:
Input interface: 'inside'
Flow type: NO FLOW
I (2) received a FWD_FRAG_TO_FRAG_OWNER from (0).
Result:
input-interface: cluster(vrfid:0)
input-status: up
input-line-status: up
Action: allow
1 packet shown
4. unit-2-2 allows the packets based on the security policy and sends them through the outside interface from site 2 to site 1.
firepower# cluster exec unit unit-2-2 show cap capo
2 packets captured
1: 20:13:58.232058 802.1Q vlan#20 P0 192.0.2.10 > 203.0.113.10 icmp: echo request
2: 20:13:58.232058 802.1Q vlan#20 P0
Observations/Caveats
Interface: inside
Configuration: Size: 200, Chain: 24, Timeout: 5, Reassembly: virtual
Run-time stats: Queue: 0, Full assembly: 0
Drops: Size overflow: 0, Timeout: 0,
Chain overflow: 0, Fragment queue threshold exceeded: 0,
Small fragments: 0, Invalid IP len: 0,
Reassembly overlap: 0, Fraghead alloc failed: 0,
SGT mismatch: 0, Block alloc failed: 0,
Invalid IPV6 header: 0, Passenger flow assembly failed: 0
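In cluster deployments, the fragment owner or the connection owner puts the fragmented packets into the fragment queue. The size of the fragment queue is limited by the value of the Size counter (by default 200), which is configured with the fragment size <size> <nameif> command. When the fragment queue size reaches 2/3 of Size, the fragment queue threshold is considered exceeded, and any new fragments that are not part of the current fragment chain are dropped. In this case, the Fragment queue threshold exceeded counter is incremented and the syslog message FTD-3-209006 is generated.
firepower# show fragment inside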
Interface: inside
Configuration: Size: 200, Chain: 24, Timeout: 5, Reassembly: virtual
Run-time stats: Queue: 133, Full assembly: 0
Drops: Size overflow: 0, Timeout: 8178,
Chain overflow: 0, Fragment queue threshold exceeded: 40802,
Small fragments: 0, Invalid IP len: 0,
Reassembly overlap: 9673, Fraghead alloc failed: 0,
SGT mismatch: 0, Block alloc failed: 0,
Invalid IPV6 header: 0, Passenger flow assembly failed: 0
%FTD-3-209006: Fragment queue threshold exceeded, dropped TCP fragment from 192.0.2.10/21456 to 203.0.113.10/443 on inside interface.
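The 2/3 threshold can be checked against a show fragment output with a few lines of code. This is a sketch under assumptions: the helper names are hypothetical, and rounding the threshold down to a whole number is an assumption that happens to agree with the sample above (Size: 200, Queue: 133).

```python
import re


def fragment_queue_threshold(size: int = 200) -> int:
    """Queue depth at which the threshold is treated as exceeded (2/3 of Size)."""
    return size * 2 // 3  # assumption: the value is rounded down


def queue_at_threshold(show_fragment_output: str) -> bool:
    """Compare the Queue counter with 2/3 of the configured Size."""
    size = int(re.search(r"Size: (\d+)", show_fragment_output).group(1))
    queue = int(re.search(r"Queue: (\d+)", show_fragment_output).group(1))
    return queue >= fragment_queue_threshold(size)


sample = """Interface: inside
Configuration: Size: 200, Chain: 24, Timeout: 5, Reassembly: virtual
Run-time stats: Queue: 133, Full assembly: 0
"""
print(fragment_queue_threshold())   # 133
print(queue_at_threshold(sample))   # True
```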
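As a workaround, increase the size in Firepower Management Center > Devices > Device Management > [Edit Device] > Interfaces > [Interface] > Advanced > Security Configuration > Override Default Fragment Setting, save the configuration, and deploy the policies. Then monitor the Queue counter in the show fragment command output and the occurrence of the syslog message FTD-3-209006.
Intermittent connectivity problems through the cluster due to active L4 checksum verification in the ACI pod
Symptom
Mitigation
Symptom
The unit cannot join the cluster and this message is shown: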
The slave has left the cluster because application configuration sync is timed out on this unit. Disabling cluster now!
Cluster disable is performing cleanup..done.
Unit unit-2-1 is quitting due to system failure for 1 time(s) (last failure is Slave application configuration sync timeout). Rejoin will be attempted after 5 minutes.
All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group configuration.
Verification/Mitigation
firepower# show interface
Interface Port-channel1 "Inside", is up, line protocol is up
Hardware is EtherSVI, BW 40000 Mbps, DLY 10 usec
MAC address 3890.a5f1.aa5e, MTU 9084
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 40000 Mbps, DLY 10 usec
Description: Clustering Interface
MAC address 0015.c500.028f, MTU 9184
IP address 127.2.2.1, subnet mask 255.255.0.
firepower# ping 127.2.1.1 size 9184
Switch# show interface
port-channel12 is up
admin state is up,
Hardware: Port-Channel, address: 7069.5a3a.7976 (bia 7069.5a3a.7976)
MTU 9084 bytes, BW 40000000 Kbit , DLY 10 usec
port-channel13 is up
admin state is up,
Hardware: Port-Channel, address: 7069.5a3a.7967 (bia 7069.5a3a.7967)
MTU 9084 bytes, BW 40000000 Kbit , DLY 10 use
Symptom
The unit cannot join the cluster and this message is shown:
Interface mismatch between cluster master and joining unit unit-2-1. unit-2-1 aborting cluster join.
Cluster disable is performing cleanup..done.
Unit unit-2-1 is quitting due to system failure for 1 time(s) (last failure is Internal clustering error). Rejoin will be attempted after 5 minutes.
All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group configuration.
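Verification/Mitigation
Log in to the FCM GUI on each chassis, navigate to the Interfaces tab, and verify that all cluster members have the same interface configuration:
Symptom
There are multiple control units in the cluster. Consider this topology:
Chassis 1: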
firepower# show cluster info
Cluster ftd_cluster1: On
Interface mode: spanned
This is "unit-1-1" in state MASTER
ID : 0
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2103TU5H
CCL IP : 127.2.1.1
CCL MAC : 0015.c500.018f
Last join : 07:30:25 UTC Dec 14 2020
Last leave: N/A
Other members in the cluster:
Unit "unit-1-2" in state SLAVE
ID : 1
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2103TU4D
CCL IP : 127.2.1.2
CCL MAC : 0015.c500.019f
Last join : 07:30:26 UTC Dec 14 2020
Last leave: N/A
Unit "unit-1-3" in state SLAVE
ID : 3
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2102THJT
CCL IP : 127.2.1.3
CCL MAC : 0015.c500.016f
Last join : 07:31:49 UTC Dec 14 2020
Last leave: N/A
Chassis 2:
firepower# show cluster info
Cluster ftd_cluster1: On
Interface mode: spanned
This is "unit-2-1" in state MASTER
ID : 4
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2103TUN1
CCL IP : 127.2.2.1
CCL MAC : 0015.c500.028f
Last join : 11:21:56 UTC Dec 23 2020
Last leave: 11:18:51 UTC Dec 23 2020
Other members in the cluster:
Unit "unit-2-2" in state SLAVE
ID : 2
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2102THR9
CCL IP : 127.2.2.2
CCL MAC : 0015.c500.029f
Last join : 11:18:58 UTC Dec 23 2020
Last leave: 22:28:01 UTC Dec 22 2020
Unit "unit-2-3" in state SLAVE
ID : 5
Site ID : 1
Version : 9.15(1)
Serial No.: FLM2103TUML
CCL IP : 127.2.2.3
CCL MAC : 0015.c500.026f
Last join : 11:20:26 UTC Dec 23 2020
Last leave: 22:28:00 UTC Dec 22 2020
Verification
firepower# ping 127.2.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 127.2.1.1, timeout is 2 seconds:
?????
Success rate is 0 percent (0/5)
firepower# show arp
cluster 127.2.2.3 0015.c500.026f 1
cluster 127.2.2.2 0015.c500.029f 1
firepower# capture capccl interface cluster
firepower# show capture capccl | i 127.2.1.1
2: 12:10:57.652310 arp who-has 127.2.1.1 tell 127.2.2.1
41: 12:11:02.652859 arp who-has 127.2.1.1 tell 127.2.2.1
74: 12:11:07.653439 arp who-has 127.2.1.1 tell 127.2.2.1
97: 12:11:12.654018 arp who-has 127.2.1.1 tell 127.2.2.1
126: 12:11:17.654568 arp who-has 127.2.1.1 tell 127.2.2.1
151: 12:11:22.655148 arp who-has 127.2.1.1 tell 127.2.2.1
174: 12:11:27.655697 arp who-has 127.2.1.1 tell 127.2.2.1
Mitigation
This is a sample switch configuration:
Nexus# show run int po48-49
interface port-channel48
description FPR1
switchport access vlan 48
vpc 48
interface port-channel49
description FPR2
switchport access vlan 48
vpc 49
Nexus# show vlan id 48
VLAN Name Status Ports
---- ----------- --------- -------------------------------
48 CCL active Po48, Po49, Po100, Eth1/53, Eth1/54
VLAN Type Vlan-mode
---- ----- ----------
48 enet CE
1 Po1 up success success 10,20
48 Po48 up success success 48
49 Po49 up success success 48
Nexus1# show vpc brief
Legend:
(*) - local vPC is down, forwarding via vPC peer-link
vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : success
vPC role : primary
Number of vPCs configured : 3
Peer Gateway : Disabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Disabled
Delay-restore status : Timer is off.(timeout = 30s)
Delay-restore SVI status : Timer is off.(timeout = 10s)
vPC Peer-link status
---------------------------------------------------------------------
id Port Status Active vlans
-- ---- ------ --------------------------------------------------
1 Po100 up 1,10,20,48-49,148
vPC status
----------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
-- ---- ------ ----------- ------ ------------
1 Po1 up success success 10,20
48 Po48 up success success 48
49 Po49 up success success 48
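Symptom
One or more data port-channel interfaces are suspended. When an administratively enabled data interface is suspended, all cluster units in the same chassis are kicked out of the cluster due to the interface health check failure.
Consider this topology:
Verification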
firepower#
Beginning configuration replication to Slave unit-2-2
End Configuration Replication to slave.
Asking slave unit unit-2-2 to quit because it failed interface health check 4 times (last failure on Port-channel1). Clustering must be manually enabled on the unit to rejoin.
firepower# Unit is kicked out from cluster because of interface health check failure.
Cluster disable is performing cleanup..done.
All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group configuration.
Cluster unit unit-2-1 transitioned from SLAVE to DISABLED
firepower# show cluster history
==========================================================================
From State To State Reason
==========================================================================
12:59:37 UTC Dec 23 2020
ONCALL SLAVE_COLD Received cluster control message
12:59:37 UTC Dec 23 2020
SLAVE_COLD SLAVE_APP_SYNC Client progression done
13:00:23 UTC Dec 23 2020
SLAVE_APP_SYNC SLAVE_CONFIG Slave application configuration sync done
13:00:35 UTC Dec 23 2020
SLAVE_CONFIG SLAVE_FILESYS Configuration replication finished
13:00:36 UTC Dec 23 2020
SLAVE_FILESYS SLAVE_BULK_SYNC Client progression done
13:01:35 UTC Dec 23 2020
SLAVE_BULK_SYNC DISABLED Received control message DISABLE (interface health check failure)
firepower# show cluster info trace module hc
Dec 23 13:01:36.636 [INFO]cluster_fsm_clear_np_flows: The clustering re-enable timer is started to expire in 598000 ms.
Dec 23 13:01:32.115 [INFO]cluster_fsm_disable: The clustering re-enable timer is stopped.
Dec 23 13:01:32.115 [INFO]Interface Port-channel1 is down
FPR2(fxos)# show port-channel summary
Flags: D - Down P - Up in port-channel (members)
I - Individual H - Hot-standby (LACP only)
s - Suspended r - Module-removed
S - Switched R - Routed
U - Up (port-channel)
M - Not in use. Min-links not met
--------------------------------------------------------------------------
Group Port-Channel Type Protocol Member Ports
--------------------------------------------------------------------------
1 Po1(SD) Eth LACP Eth2/1(s) Eth2/2(s) Eth2/3(s) Eth2/4(s)
48 Po48(SU) Eth LACP Eth3/1(P) Eth3/2(P) Eth3/3(P) Eth3/4(P)
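When many port-channels are configured, it can help to extract the suspended members programmatically. The sketch below is only an illustration: it assumes the `show port-channel summary` text was already saved to a string, and the function name `suspended_members` and the regular expressions are hypothetical helpers, not part of any Cisco tool. It treats a member as suspended when its flag is a lowercase `s`, as the legend in the output states.

```python
import re

# Two data rows copied from the `show port-channel summary` output above.
SAMPLE = """\
1 Po1(SD) Eth LACP Eth2/1(s) Eth2/2(s) Eth2/3(s) Eth2/4(s)
48 Po48(SU) Eth LACP Eth3/1(P) Eth3/2(P) Eth3/3(P) Eth3/4(P)
"""

# Assumed row layout: group, Po<N>(flags), type, protocol, member ports.
ROW = re.compile(r"^\s*(\d+)\s+(Po\d+)\((\w+)\)\s+\S+\s+\S+\s+(.*)$")
# Assumed member layout: Eth<slot>/<port>(<single-letter flag>).
MEMBER = re.compile(r"(Eth\d+/\d+(?:/\d+)?)\((\w)\)")


def suspended_members(summary_text):
    """Return {port_channel: [member ports flagged 's' (Suspended)]}."""
    result = {}
    for line in summary_text.splitlines():
        row = ROW.match(line)
        if not row:
            continue  # legend, separator, and header lines do not match
        members = MEMBER.findall(row.group(4))
        suspended = [name for name, flag in members if flag == "s"]
        if suspended:
            result[row.group(2)] = suspended
    return result


if __name__ == "__main__":
    print(suspended_members(SAMPLE))
```

With the sample rows, only Po1 is reported, because all four of its members carry the `s` flag while the Po48 members carry `P`.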
Mitigation
Symptom
The unit leaves the cluster.
Verification/Mitigation
firepower# show cluster history
FPR4150# connect local-mgmt
FPR4150 (local-mgmt)# dir cores
If disk utilization in the /ngfw partition of a cluster unit reaches 94%, the unit leaves the cluster. The disk utilization check runs every 3 seconds:
> show disk
Filesystem Size Used Avail Use% Mounted on
rootfs 81G 421M 80G 1% /
devtmpfs 81G 1.9G 79G 3% /dev
tmpfs 94G 1.8M 94G 1% /run
tmpfs 94G 2.2M 94G 1% /var/volatile
/dev/sda1 1.5G 156M 1.4G 11% /mnt/boot
/dev/sda2 978M 28M 900M 3% /opt/cisco/config
/dev/sda3 4.6G 88M 4.2G 3% /opt/cisco/platform/logs
/dev/sda5 50G 52M 47G 1% /var/data/cores
/dev/sda6 191G 191G 13M 100% /ngfw
cgroup_root 94G 0 94G 0% /dev/cgroups
In this case, the show cluster history output shows:
15:36:10 UTC May 19 2021
MASTER MASTER Event: Master unit unit-1-1 is quitting
due to diskstatus Application health check failure, and
master's application state is down
or
14:07:26 CEST May 18 2021
SLAVE DISABLED Received control message DISABLE (application health check failure)
Another way to verify the failure is:
firepower# show cluster info health
Member ID to name mapping:
0 - unit-1-1(myself) 1 - unit-2-1
0 1
Port-channel48 up up
Ethernet1/1 up up
Port-channel12 up up
Port-channel13 up up
Unit overall healthy healthy
Service health status:
0 1
diskstatus (monitor on) down down
snort (monitor on) up up
Cluster overall healthy
Additionally, if the disk is at about 100%, the unit can have difficulty rejoining the cluster until some disk space is freed.
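The /ngfw threshold described in this section can be checked offline against saved `show disk` output. This is a minimal sketch under stated assumptions: the 94% value comes from the text above, while the function names `mount_usage` and `ngfw_at_risk`, and the assumption that each data row has exactly six whitespace-separated fields, are illustrative only. It does not reproduce how the device itself performs the check.

```python
NGFW_THRESHOLD_PERCENT = 94  # threshold stated in the text above

# A few rows copied from the `show disk` output above.
SAMPLE = """\
Filesystem Size Used Avail Use% Mounted on
rootfs 81G 421M 80G 1% /
/dev/sda5 50G 52M 47G 1% /var/data/cores
/dev/sda6 191G 191G 13M 100% /ngfw
"""


def mount_usage(show_disk_text):
    """Map each mount point to its Use% value as an integer."""
    usage = {}
    for line in show_disk_text.splitlines():
        fields = line.split()
        # Assumed data row: 6 fields, with the fifth being '<digits>%'.
        # The header row has 7 fields ("Mounted on") and is skipped.
        if len(fields) == 6 and fields[4].endswith("%") and fields[4][:-1].isdigit():
            usage[fields[5]] = int(fields[4][:-1])
    return usage


def ngfw_at_risk(show_disk_text, threshold=NGFW_THRESHOLD_PERCENT):
    """Return True when /ngfw utilization is at or above the threshold."""
    return mount_usage(show_disk_text).get("/ngfw", 0) >= threshold


if __name__ == "__main__":
    print(mount_usage(SAMPLE))
    print(ngfw_at_risk(SAMPLE))
```

For the sample rows, /ngfw is at 100%, so the check reports the unit as at risk of leaving the cluster.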
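Every 5 minutes, each cluster unit checks the CPU and memory utilization of the local and peer units. If the utilization exceeds the system thresholds (LINA CPU 50% or LINA memory 59%), an informational message is shown in: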
firepower# more log/cluster_trace.log | i CPU
May 20 16:18:06.614 [INFO][CPU load 87% | memory load 37%] of module 1 in chassis 1 (unit-1-1) exceeds overflow protection threshold [CPU 50% | Memory 59%]. System may be oversubscribed on member failure.
May 20 16:18:06.614 [INFO][CPU load 87% | memory load 37%] of chassis 1 exceeds overflow protection threshold [CPU 50% | Memory 59%]. System may be oversubscribed on chassis failure.
May 20 16:23:06.644 [INFO][CPU load 84% | memory load 35%] of module 1 in chassis 1 (unit-1-1) exceeds overflow protection threshold [CPU 50% | Memory 59%]. System may be oversubscribed on member failure.
The message indicates that if a unit fails, the resources of the remaining units can be oversubscribed.
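To see which resource triggered each message, the log lines can be parsed and compared against the thresholds they report. The sketch below assumes the message format shown in the sample output above; the function name `overflow_events` and the regular expression are hypothetical, and the strict "greater than" comparison is an assumption about what "exceeds" means here.

```python
import re

# Two lines copied from the cluster_trace.log output above.
SAMPLE = """\
May 20 16:18:06.614 [INFO][CPU load 87% | memory load 37%] of module 1 in chassis 1 (unit-1-1) exceeds overflow protection threshold [CPU 50% | Memory 59%]. System may be oversubscribed on member failure.
May 20 16:18:06.614 [INFO][CPU load 87% | memory load 37%] of chassis 1 exceeds overflow protection threshold [CPU 50% | Memory 59%]. System may be oversubscribed on chassis failure.
"""

# Pattern derived from the sample messages only; other releases may differ.
PATTERN = re.compile(
    r"\[CPU load (?P<cpu>\d+)% \| memory load (?P<mem>\d+)%\] of (?P<scope>.+?) "
    r"exceeds overflow protection threshold "
    r"\[CPU (?P<cpu_max>\d+)% \| Memory (?P<mem_max>\d+)%\]"
)


def overflow_events(log_text):
    """Return one dict per matching line, listing which thresholds were exceeded."""
    events = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if not match:
            continue
        cpu, mem = int(match["cpu"]), int(match["mem"])
        exceeded = []
        if cpu > int(match["cpu_max"]):  # assumption: strictly greater
            exceeded.append("cpu")
        if mem > int(match["mem_max"]):
            exceeded.append("memory")
        events.append(
            {"scope": match["scope"], "cpu": cpu, "memory": mem, "exceeded": exceeded}
        )
    return events


if __name__ == "__main__":
    for event in overflow_events(SAMPLE):
        print(event)
```

In the sample, both the module-level and the chassis-level messages are caused by CPU load (87% against a 50% threshold), while memory load (37%) stays below its 59% threshold.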
Behavior in pre-6.3 FMC versions
Post-6.3 FMC
Minimum Supported Manager | Managed Devices | Minimum Supported Managed Device Version Required | Notes
FMC 6.3 | FTD clusters on FP9300 and FP4100 only | 6.2.0 | This is an FMC feature only
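Warning: After the cluster is formed on FTD, you need to wait for the automatic registration to start. You must not try to register the cluster nodes manually (Add Device); use the Reconcile option instead.
Symptom
Node registration failures
Mitigation
If the data node registration fails for any reason, there are 2 options: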