When the Big Bad Wolf Eyes the Brick House: Advanced Malware Protection in Our Data Centers

 

Published: January 2020

Any reputable security expert would advise the Three Little Pigs to start by strengthening their straw and stick houses. The brick house can wait a bit because it’s resistant to the most common attacks: huFF and P.UFF.

Replace straw and stick houses with Windows and Mac endpoints—and brick houses with Linux and Windows servers—and you get the gist of our deployment strategy for Cisco Advanced Malware Protection (AMP). In 2019 we began replacing our third-party antivirus solution with Cisco AMP, which not only blocks suspected malware at the point of entry but also continually scans for infected files that manage to sneak past our defenses. We started with our 125,000 Windows and Mac endpoints because most malware attacks target those operating systems. The malware detection rate doubled, as described in this blog.

But like brick houses, Windows and Linux data center servers are not invulnerable. The number of malware attacks against even Linux servers is rising—a side effect of their growing numbers. We’ve got thousands of Linux servers in our data centers, increasing the potential for malware. Some of that malware could give hackers a foothold to attack Windows and Mac endpoints.

Deployment: data center by data center, starting with non-production servers

We added another layer of protection to our Windows and Linux data center servers by deploying Cisco AMP. At the time our Windows servers ran a popular antivirus solution, deployed on-premises, and our Linux servers didn’t have antivirus protection. 

As a cloud-based solution, Cisco AMP spares us from having to deploy and maintain servers for the management console, updates, reporting, etc.—tasks that used to take about 30 hours per quarter even when there were no issues. With Cisco AMP cloud we don’t need any on-premises hardware. Instead we just install lightweight connectors on the servers we’re monitoring.

To roll out AMP we started with lab servers, moved on to development servers, and later added production servers. In the initial deployment we configured Cisco AMP in Monitor mode so that it alerted us to suspected malware without quarantining the files. This kept false positives from inconveniencing our users while we tuned the rules. Running in Monitor mode also prevented conflicts between Cisco AMP and the old Windows antivirus solution, which we continued to use during the cutover.

Migrating from our previous Windows server antivirus solution

While Cisco AMP was running in Monitor mode, we built and tuned policies for applications running on Windows servers. We made sure to replicate the antivirus exclusions we had entered into the old antivirus software to prevent problems like high CPU usage, slow application response time, etc.

When we were confident that Cisco AMP detected malware and had a low false-positive rate, we cut over from the old antivirus solution. First we disabled the antivirus features in the old product and set Cisco AMP to Protect mode, and then later we uninstalled the old AV product from the servers.

The issues we saw when we rolled out the solution on non-production servers were minor. They included high CPU usage on busy servers (resolved with tuning), some false positives, and a lack of server tagging and reporting.

Today Cisco AMP is set to Protect mode across all our Windows servers, actively monitoring and blocking known malware. It also provides tagging and reporting capabilities. Not having to manage backend server infrastructure saves considerable time. How much? In the 10 months preceding our migration to Cisco AMP, we spent about 30 hours responding to five change requests related to servers running the old AV product. In the most recent quarter, after we had decommissioned 85% of on-premises infrastructure, we spent only four hours on change requests. That’s 26 hours of found time for new projects.

Lessons learned as “customer zero”

As customer zero for Cisco AMP for Endpoints and one of the first to use it for data center servers, we’re sharing what we’ve learned to help customers simplify their own deployments.

First, we’re big fans of server grouping because it allows us to customize policies at a granular level. Creating groups and moving systems based on their function is important for successful policy management. We strongly recommend automating group assignments. If you have thousands of servers, as we do, assigning them to groups manually would take a lot of time. We automated the process using the Cisco AMP API to assign servers to groups based on attributes like lifecycle or Active Directory membership.
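The grouping logic described above can be sketched roughly as follows. This is a minimal illustration, not our production script: the group names, the Active Directory group name, and the rule order are all hypothetical, and the step that actually moves a server through the Cisco AMP API is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Server:
    hostname: str
    lifecycle: str                                # e.g. "lab", "dev", "prod"
    ad_groups: Set[str] = field(default_factory=set)


# Hypothetical rules, checked in order; the first match wins.
RULES: List[Tuple[Callable[[Server], bool], str]] = [
    (lambda s: "SQL-Servers" in s.ad_groups, "Windows-Database"),
    (lambda s: s.lifecycle == "prod", "Production-General"),
    (lambda s: s.lifecycle in {"lab", "dev"}, "Non-Production"),
]
DEFAULT_GROUP = "Unassigned-Review"               # hypothetical catch-all group


def choose_group(server: Server) -> str:
    """Return the name of the first group whose rule matches the server."""
    for matches, group in RULES:
        if matches(server):
            return group
    return DEFAULT_GROUP


def plan_assignments(servers: List[Server],
                     current: Dict[str, str]) -> Dict[str, str]:
    """Return {hostname: target_group} only for servers that need to move."""
    plan = {}
    for server in servers:
        target = choose_group(server)
        if current.get(server.hostname) != target:
            plan[server.hostname] = target
    return plan
```

Computing a plan first, and applying it in a separate step, makes it easy to review the proposed moves before anything changes in the console.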

Second, because some Cisco AMP connector upgrades require a reboot to complete, we now schedule these upgrades for just before other planned activities that require a reboot, such as operating system patching. This reduces server downtime and helps to make sure we have the most current and stable Cisco AMP connector version.
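One way to express this scheduling rule in code is shown below. It is only a sketch under stated assumptions: the one-hour lead time, the host names, and the function names are hypothetical, and the patch windows would come from whatever change-management system you use.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


def schedule_upgrade(patch_window_start: datetime,
                     lead_time: timedelta = timedelta(hours=1)) -> datetime:
    """Start the connector upgrade shortly before the patch window,
    so the reboot already planned for patching also completes the upgrade."""
    return patch_window_start - lead_time


def plan_upgrades(patch_windows: Dict[str, datetime],
                  needs_upgrade: Set[str],
                  lead_time: timedelta = timedelta(hours=1)) -> Dict[str, datetime]:
    """Return {hostname: upgrade_start} for hosts that need a connector upgrade
    and have a patch window on the calendar."""
    return {
        host: schedule_upgrade(start, lead_time)
        for host, start in patch_windows.items()
        if host in needs_upgrade
    }
```

Hosts that need an upgrade but have no patch window are simply left out of the plan, so they can be picked up at the next patching cycle.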

The payoff: deeper understanding of malware helps us shore up our defenses

Beyond antivirus protection, Cisco AMP helps us better understand what malware does and how it gets into our network. As Sun Tzu said, “Know your enemy.” Here are our favorite features.

Sandboxing, for safe investigation of suspect files

When Cisco AMP flags an unknown file as “suspect,” it moves the file to an isolated sandbox environment. The sandbox detonates the file safely and records its actions in a report. Comparing the behavior of new suspect files to the behavior in these reports helps us determine if they are malicious.

Trajectory analysis: how the file got in, where it went, and what it did

Unlike our old antivirus solution, Cisco AMP lets us see the path the malware took through the network, known as the file trajectory. A map shows how malware got onto a device and how it spread. We can see, for example, that a file infected with Skidmap came in via SCP at 11:47 a.m. when John Doe executed a particular command. Seeing which servers and endpoints Skidmap has touched lets us immediately isolate and repair the affected servers before the malware can spread. Without this feature we’d have to search large numbers of servers to identify the infected ones. That’s a lengthy process, increasing the risk of malware spreading and doing further damage.

Figure 1. Device Trajectory

Figure 2. File Trajectory
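To make the idea of a trajectory concrete, here is a small sketch of how the set of touched hosts could be derived from file-transfer events. It illustrates the concept only and is not how Cisco AMP computes trajectories. The event format (timestamp, source host, destination host) and the function name are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical event record for one file hash: (timestamp, source_host, destination_host)
Event = Tuple[int, str, str]


def touched_hosts(events: Iterable[Event], entry_host: str) -> List[str]:
    """Return hosts the file reached, in the order it reached them.

    Events are replayed in time order. A transfer only counts if the
    source host already held the file when the transfer happened.
    """
    touched = [entry_host]
    seen = {entry_host}
    for _, src, dst in sorted(events, key=lambda e: e[0]):
        if src in seen and dst not in seen:
            seen.add(dst)
            touched.append(dst)
    return touched
```

The resulting list is the set of servers to isolate and repair, without having to search every server in the data center.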

Automatic submission of unknown files to Threat Grid

Cisco AMP automatically sends files to Cisco Threat Grid, the cloud service we use for all file analysis. Automatic submission of files saves time for our operations staff—and helps us identify new threats sooner. Right now Cisco AMP is submitting about 10,000 files daily for Cisco IT.

Summing up, Cisco AMP blocks most of the binaries the Big Bad Wolves of the world lob our way. But some malware will inevitably sneak in. Cisco AMP is on the case, continually scanning files and showing malware the door.

For more information

Cisco AMP

To read additional Cisco IT business solution case studies, visit Cisco on Cisco: Inside Cisco IT