Each agent creates clusters of hosts, based on similar network location,
geolocation, and application traffic characteristics, to best identify anomalies. By
organizing individual hosts into clusters, the system can better provide an overall view
of your network's anomalies, while allowing you to view anomalous connections between
specific hosts. Cluster names describe the commonality of all hosts within the cluster.
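As a rough sketch of the idea (not the system's actual clustering algorithm), hosts that share the same observed characteristics could be grouped and named by their commonality. The feature names and host records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical host records, described by the characteristics the text
# mentions: network location (subnet), geolocation, and application
# traffic. A real agent learns these from live traffic.
hosts = [
    {"ip": "10.1.0.5", "subnet": "10.1.0.0/24", "geo": "us-east", "apps": ("dns", "http")},
    {"ip": "10.1.0.9", "subnet": "10.1.0.0/24", "geo": "us-east", "apps": ("dns", "http")},
    {"ip": "10.2.0.7", "subnet": "10.2.0.0/24", "geo": "eu-west", "apps": ("smtp",)},
]

def cluster_hosts(hosts):
    """Group hosts that share all three characteristics."""
    clusters = defaultdict(list)
    for h in hosts:
        key = (h["subnet"], h["geo"], h["apps"])
        clusters[key].append(h["ip"])
    # Name each cluster after the commonality of its members,
    # as the documentation describes.
    return {f"{geo} {subnet} {'/'.join(apps)}": ips
            for (subnet, geo, apps), ips in clusters.items()}

print(cluster_hosts(hosts))
```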
Because each network deployment is different, the system does not predefine clusters; it
generates them dynamically from the traffic each agent observes.
During the initial learning phase, each agent analyzes the traffic passing through the router
it is installed on, identifying hosts, traffic characteristics, and the applications in that
traffic. It also identifies edges: traffic of an application between clusters.
After the initial learning phase, the system
compares observed traffic to the baseline and identifies anomalies, deviations from the
baseline. The system also assigns an internal anomalous behavior score to each anomaly,
which signifies how much this behavior deviates from the baseline.
Because each ISR sees only a portion of your overall traffic, each agent views a different
portion of your network. Each agent views the hosts internal to its branch; because the
system dynamically generates clusters, an agent may reassign internal hosts to clusters based
on additional observed information. Each agent also views hosts external to its branch.
Because each branch detects different traffic, multiple agents may create different
baselines, and may include the same external host in differently named clusters. Based on
additional information, they may also reassign an external host to a different cluster.
After an agent generates clusters, it identifies network traffic characteristics by examining
NetFlow data and performing deep packet inspection (DPI), and identifies applications using
Network Based Application Recognition (NBAR). The system groups similar applications into
application groups to streamline user analysis.
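The application-to-group rollup can be pictured as a simple lookup. The group names below come from the examples later in this section; the membership table and lookup function are illustrative, not NBAR's actual taxonomy:

```python
# Application-to-group mapping, using the app groups named later in
# this section (office, file-xfer). Membership here is illustrative.
APP_GROUPS = {
    "pop3": "office", "imap": "office", "smtp": "office",
    "ftp": "file-xfer", "sftp": "file-xfer", "smb": "file-xfer",
}

def app_group(nbar_app: str) -> str:
    """Return the application group for an NBAR-identified application."""
    return APP_GROUPS.get(nbar_app.lower(), "other")

print(app_group("SMTP"))   # office
print(app_group("sftp"))   # file-xfer
```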
After it identifies traffic and applications, each agent identifies the edges between clusters. An edge is the traffic of an application group between hosts in different clusters.
One edge can consist of traffic of the same
application group among multiple hosts, as long as the hosts share clusters. For
example, both of the following are part of the same edge:
office application group traffic between Host 1 in Cluster A and Host 2 in Cluster B
office application group traffic between Host 3 in Cluster A and Host 4 in Cluster B
Two clusters may share multiple edges, each
edge consisting of traffic of a different application group. For example, two clusters
may share the following edges:
email traffic over the
office application group (POP3, IMAP, and SMTP)
file transfer traffic over the
file-xfer application group (FTP, SFTP, and SMB)
One cluster may share edges with multiple
clusters for traffic of the same application group. For example, Cluster A is part of
the following email traffic edges:
POP3, IMAP, and SMTP email
traffic between Cluster A and Cluster B
POP3, IMAP, and SMTP email
traffic between Cluster A and Cluster C
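The edge examples above can be sketched by keying each edge on the unordered cluster pair plus the application group, so that flows between the same pair of clusters in the same group aggregate into one edge. This is a simplified model, not the agent's internal representation:

```python
from collections import defaultdict

# Illustrative flows: (source cluster, destination cluster, app group, bytes).
flows = [
    ("Cluster A", "Cluster B", "office",    1200),  # POP3/IMAP/SMTP traffic
    ("Cluster B", "Cluster A", "office",     800),  # same edge, reverse direction
    ("Cluster A", "Cluster B", "file-xfer", 5000),  # second edge, same cluster pair
    ("Cluster A", "Cluster C", "office",     400),  # same group, different pair
]

edges = defaultdict(int)
for src, dst, group, nbytes in flows:
    # An edge is identified by the cluster pair and the app group,
    # regardless of which hosts or direction carried the traffic.
    edges[(frozenset((src, dst)), group)] += nbytes

print(len(edges))  # 3 edges: A-B office, A-B file-xfer, A-C office
```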
Over the initial learning phase, the system applies machine learning algorithms to cluster
and edge information to create an internal baseline model (baseline) of your network
traffic. During this learning phase, and over the lifetime of the system, it continually
applies additional machine learning algorithms to network traffic to detect deviations from
the baseline and identify anomalies. An anomaly is traffic that deviates from the baseline.
The destination host involved in the anomalous and related conversations is considered the
target host. Based on the deviation from the baseline and the behavioral analysis, an agent
assigns an internal anomalous behavior score to the anomaly.
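As a very simplified picture of a deviation score (the system's actual score is produced by machine learning models, not this formula), one could compare an observed value to the mean and spread of a learned baseline:

```python
import statistics

def behavior_score(baseline_samples, observed):
    """Hypothetical deviation score: how many standard deviations the
    observed value sits from the learned baseline mean. The system's
    internal anomalous behavior score is model-based, not a z-score."""
    mean = statistics.fmean(baseline_samples)
    stdev = statistics.pstdev(baseline_samples) or 1.0
    return abs(observed - mean) / stdev

# Baseline: nightly backup traffic in MB observed during the learning phase.
baseline = [100, 110, 95, 105, 102]
print(behavior_score(baseline, 104))  # near the baseline, low score
print(behavior_score(baseline, 450))  # large deviation, high score
```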
The system also assigns a severity rating to the anomaly, which signifies the likelihood
of negative impact on your network. The severity rating is based on threat intelligence
collected from sources such as Talos, and can be low, medium, or high.
The system uses both the severity rating and the
internal anomalous behavior score to determine whether to report the anomaly. Severity and
the internal score are distinct measures; they often correspond, but are not directly tied
to each other. The system, for example, may identify an anomaly with a low anomalous
behavior score, but report it with a medium to high severity, because malicious hosts are
involved. Alternatively, the system may identify an anomaly with a high anomalous behavior
score, meaning it is a large deviation from the baseline, but report it with a low severity
because threat intelligence did not identify any malicious factors.
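The reporting decision described above can be pictured as combining two independent signals. The thresholds and rule below are hypothetical, since the documentation states only that both measures influence whether an anomaly is reported:

```python
def should_report(severity: str, behavior_score: float) -> bool:
    """Hypothetical reporting rule: severity (from threat intelligence)
    and the internal behavior score are independent inputs; either a
    malicious rating or a strong deviation can justify reporting."""
    if severity in ("medium", "high"):
        return True  # malicious hosts involved, report even small deviations
    return behavior_score >= 3.0  # benign rating, report only large deviations

# Low deviation, but malicious hosts involved: reported anyway.
print(should_report("high", 0.5))  # True
# Large deviation, no malicious factors: reported, at low severity.
print(should_report("low", 8.0))   # True
print(should_report("low", 0.5))   # False
```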
Anomalies are not inherently malicious. For
example, if you normally back up your servers at 3 AM, and you change your backup time to
2 AM, the system first categorizes your 2 AM backup as a seasonal edge anomaly, because
it did not previously detect any traffic at 2 AM. However, if an unauthorized host
attempts to exfiltrate data from your servers at 2 AM, the system also categorizes that
as a seasonal edge anomaly.