End users around the world are reporting an increase in spam. Much of this increase can be attributed to a spam resurgence in 2006, propelled by the emergence of new, more sophisticated forms of image spam.
How Image Spam Works
In image spam, the "call to action" of the message is contained in an embedded image attachment, such as a GIF or JPEG file, instead of in the body of the email. These images are automatically displayed to end users, yet the content of the image itself remains hidden from most spam filters.
Image Spam Reduces Productivity
The increase in more complex image spam attacks has caused spam capture rates across the email security industry to decline. The sheer increase in the volume of spam, combined with a higher percentage of larger spam messages:
- Reduces productivity
- Frustrates end users
- Clogs the email infrastructure because many mail systems cannot keep pace with the volume
Learn How to Protect Your Organization
Here we will discuss:
- Recent trends in image spam
- Why image spam is difficult to detect
- How Cisco IronPort can protect you from this increasing threat
You can also download the 2006 Image Spam Report (PDF - 235 KB)
Trends and Solutions
According to the SenderBase Network, spam volumes leveled off in 2005, but surged again in the second quarter of 2006. Figure 1 shows that worldwide spam volumes grew from approximately 30 billion messages per day to more than 50 billion over 12 months.
Spam Increase of 40 Percent
IronPort saw a 40 percent increase in spam volumes during Q2 2006 alone. Therefore, even if the spam capture rate had held constant, average end users would have noticed 40 percent more spam in their inboxes since April.
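The arithmetic behind this claim can be sketched as follows. Only the 40 percent growth figure comes from the text; the capture rate and message counts are illustrative assumptions. The point is that with a constant capture rate, spam reaching the inbox grows in direct proportion to spam sent.

```python
def delivered_spam(sent: float, capture_rate: float) -> float:
    """Spam messages that reach the inbox for a given filter capture rate."""
    return sent * (1.0 - capture_rate)

# Hypothetical values for illustration only.
capture_rate = 0.95               # assumed constant filter capture rate
sent_before = 1000.0              # assumed spam aimed at one user before Q2
sent_after = sent_before * 1.40   # 40 percent more spam sent (figure from the text)

before = delivered_spam(sent_before, capture_rate)
after = delivered_spam(sent_after, capture_rate)
increase = (after - before) / before

print(f"Inbox spam before: {before:.0f}, after: {after:.0f}, increase: {increase:.0%}")
```

Whatever capture rate is assumed, the relative increase in the inbox stays at 40 percent as long as that rate does not change.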
Image Spam Increase to More than 20 Percent
Much of this increase in overall spam volume can be attributed to the growth in image spam. As illustrated in Figure 1, image spam rose from around 3 percent of spam in July 2005 to more than 20 percent a year later. When overall spam volumes spiked in Q4 2005 and Q2 2006, image spam was propelling the increase.
Figure 1: Worldwide Increase in Image Spam Caused Overall Spam Volumes to Surge in Q2 2006
Root Cause
The root cause of this sharp increase in spam volumes is money. Spammers are single-minded: They send spam to make money. The more messages that are delivered to inboxes, the better the chances that recipients take action on the messages, resulting in more income for spammers.
Image Spam Will Remain a Problem
As we discuss next:
- Randomized image spam is especially difficult for most spam filters to detect, so more of the spam gets delivered.
- Spammers can also make their images appear quite normal and compelling to users, resulting in higher response rates.
Because neither of these factors is likely to change in the near term, Cisco IronPort expects image spam to remain a problem for the foreseeable future. Spammers innovate rapidly in their use of image spam, suggesting that image spam will soon become even more challenging to detect.
Why Image Spam Is Difficult to Detect
Image spam has been around for years. It was originally created to get past "heuristic" filters, which block messages containing words and phrases commonly found in spam. Because image files are in an entirely different format from the text found in an email, heuristic filters never "see" the content of the message. So, these filters were easily defeated by this type of spam.
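A minimal sketch of why a phrase-matching filter misses image spam is shown below. The phrase list and the function name `heuristic_is_spam` are hypothetical; real heuristic filters use far larger rule sets, but the blind spot is the same: the filter inspects only the text body.

```python
import re

# Hypothetical phrase list, for illustration only.
SPAM_PHRASES = ["buy now", "hot stock", "act today"]

def heuristic_is_spam(body_text: str) -> bool:
    """Flag a message if its text body contains a known spam phrase."""
    normalized = re.sub(r"\s+", " ", body_text.lower())
    return any(phrase in normalized for phrase in SPAM_PHRASES)

text_spam = "Hot stock tip! Buy now before the price jumps."

# In image spam the same pitch is rendered as pixels inside an attached
# image, so the text body that the filter inspects is empty or innocuous.
image_spam_body = ""

print(heuristic_is_spam(text_spam))        # True
print(heuristic_is_spam(image_spam_body))  # False
```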
First Attempts to Address the Image Spam Problem
To deal with this problem, antispam vendors developed "fuzzy signature" technologies. These signature-based technologies collect samples of known spam and then classify "near-identical" messages as spam. These signatures were sometimes written against just the message attachment, so messages with different content but the same attachment would still be marked as spam.
Spammers Randomize Images with Dots
Signature-based defenses remained effective for several years. In 2006, however, spammers began randomizing images to appear the same to the human viewer but totally different to spam filters. For example, some messages advertise stock purchases with an attached GIF file. Random "dots" and borders of subtly different color and width are inserted in the image (Figure 2).
Figure 2: Embedded GIF File Containing All "Text" with Dots Randomly Inserted in the Image to Make Every Message Appear Unique to Spam Filters
The signatures that most antispam vendors rely on to detect these attacks vary dramatically, based on these small changes to the image. So, antispam vendors may publish a rule that stops one instance, but this rule does not stop all the remaining spam messages in the attack.
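The effect can be illustrated with a simplified sketch. It uses an exact SHA-256 hash of raw pixel bytes as a stand-in for a vendor signature, which is an assumption: real signatures are more tolerant of variation, and the text does not describe how any vendor computes them. The image layout and dot positions are likewise hypothetical.

```python
import hashlib

WIDTH, HEIGHT = 64, 32

def make_image(dot_positions: list) -> bytes:
    """Build a grayscale image (one byte per pixel) with dots at the given indices."""
    pixels = bytearray([255] * (WIDTH * HEIGHT))   # white background
    for x in range(10, 50):                        # stand-in for the advertised text
        pixels[16 * WIDTH + x] = 0
    for pos in dot_positions:                      # the spammer's inserted "dots"
        pixels[pos] = 128
    return bytes(pixels)

def signature(image: bytes) -> str:
    """Simplified signature: an exact hash of the pixel data."""
    return hashlib.sha256(image).hexdigest()

# Two copies of the same advertisement with dots in different places.
known_spam = make_image([100, 700, 1500])
new_copy = make_image([101, 650, 1490])

differing = sum(a != b for a, b in zip(known_spam, new_copy))
print(f"Pixels that differ: {differing} of {len(known_spam)}")
print("Signatures match:", signature(known_spam) == signature(new_copy))
```

Here only 6 of 2,048 pixels differ, so a person sees the same picture, yet the two signatures have nothing in common.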
Numerous Other Ways to Randomize Images
Spammers can randomize images in an almost unlimited number of ways. In addition to inserting dots, spammers have recently used techniques such as:
- Varying the colors used in an image
- Changing the width and pattern of the border
- Altering the font style
- "Slicing" images down into smaller pieces
Figure 3 shows an example of the "slicing" technique recently used by spammers. Images are broken down into many smaller files of varying sizes and then reassembled in the mail client. They appear as a single image to the email recipient.
Figure 3: Image "Sliced" into Smaller Pieces and Reassembled
The rectangle in Figure 3 represents the border of one of more than a dozen image files used to construct this message. This technique is used to defeat signature-based defenses and break up words that could be found by optical character recognition.
Optical Character Recognition as a Defense
Some vendors have recently introduced optical character recognition (OCR) to detect image spam. OCR technology extracts typewritten text from an image. While more effective than signature-based solutions alone, OCR has several limitations:
- It is very computationally expensive. Fully rendering each message and looking for word matches against different character set libraries can take several seconds per message, which can reduce system throughput to below levels acceptable to most ISPs and enterprises.
- It is extremely vulnerable to obfuscation.
While modern OCR technology can reliably detect typed letters and numbers, it can be easily fooled by basic techniques used by spammers. For example, OCR is ineffective at detecting image spam that includes handwritten text, graphics, or any abstract data.
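The throughput cost can be estimated with a back-of-the-envelope calculation. The text says only that OCR can take "several seconds per message"; the 3-second figure and the worker count below are assumptions chosen for illustration.

```python
def ocr_throughput_per_hour(seconds_per_message: float, workers: int) -> float:
    """Messages per hour when every message must be fully scanned by OCR."""
    return workers * 3600.0 / seconds_per_message

# Hypothetical appliance: 8 parallel OCR workers, 3 seconds per message.
per_hour = ocr_throughput_per_hour(seconds_per_message=3.0, workers=8)
print(f"{per_hour:.0f} messages per hour")
```

Under these assumptions a single system handles 9,600 messages per hour, which shows how quickly per-message OCR cost limits capacity.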
Protection with Cisco IronPort Anti-Spam
Cisco IronPort Anti-Spam uses a unique, multilayer approach that stops more than 98 percent of image-based spam, with near-zero false positives. The first layer of defense is powered by Cisco IronPort Context Adaptive Scanning Engine (CASE). Next is an inner layer of image spam protection powered by the patent-pending Cisco IronPort Multidimensional Pattern Recognition (MPR) technology.
Content Analysis Alone Is Not Enough
Most antispam filters depend greatly on content analysis to stop spam. However, message content is something that spammers themselves can easily manipulate. Image spam is just one instance where content-based filters fall short. As Figures 2 and 3 show, the "content" of the spam is invisible to many filters because it is embedded in the image itself.
Context Adaptive Scanning
To detect image spam, Cisco IronPort has augmented traditional content-based techniques with techniques that analyze the full context in which the message was received. Specifically, CASE detects threats by analyzing four broad areas:
- Who sent the message and what do we know about this sender?
- Where does the call to action in the message take you?
- What is the nature of the message content?
- How was the message technically constructed?
Instead of generating a signature based on the content of the message, Cisco IronPort creates a specific spam profile for an image-based spam attack. This profile combines the "who, where, what, and how" of a message.
CASE Example
For example, one profile might be created for a message that:
- Originates from a dynamic IP address
- Contains a certain header pattern
- Has an embedded image of a specific size range and type
- Contains little or no text in the body of the email itself
None of these factors alone is likely to indicate with certainty that a message is spam, but they are highly accurate when combined. Context adaptive scanning helps Cisco IronPort filter the majority of image-based spam attacks without decoding the image file. The second layer of protection is provided by Multidimensional Pattern Recognition (MPR).
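The idea of combining weak signals into one profile can be sketched as below. This is not the CASE implementation, which the text does not describe. The `Message` fields, the size range, and the simple all-conditions-must-hold rule are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Message:
    from_dynamic_ip: bool        # "who": sender characteristic
    header_pattern_match: bool   # "how": message construction
    image_type: str              # "what": embedded image type, "" if none
    image_bytes: int             # "what": embedded image size
    body_text_chars: int         # "what": amount of text in the body

def matches_profile(msg: Message) -> bool:
    """Return True only when every factor in the hypothetical profile holds."""
    checks = [
        msg.from_dynamic_ip,
        msg.header_pattern_match,
        msg.image_type == "gif" and 5_000 <= msg.image_bytes <= 20_000,
        msg.body_text_chars < 20,
    ]
    return all(checks)

image_spam = Message(True, True, "gif", 12_000, 0)
# Shares three of the four factors, but carries a normal amount of body text.
newsletter = Message(True, True, "gif", 12_000, 800)

print(matches_profile(image_spam))   # True
print(matches_profile(newsletter))   # False
```

No image data is decoded here: every check uses metadata about the sender, the headers, and the attachment, which is what keeps this first layer inexpensive.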
Obfuscating Image Spam
To the human eye, image spam is easily recognizable. In fact, this property is one of the things that makes image spam attractive to spammers. Obfuscating image spam content to avoid filtering takes far less effort than obfuscating traditional text spam. But if this spam is so obvious to the end user, why are spam filters unable to identify it?
Why OCR Falls Short
The challenge is that humans interpret the content of messages using a much more comprehensive data set than just the text displayed. A reader's perception of a message is shaped by image attributes such as:
- Color
- Shape
- Font size and type
- Graphics
- Many other characteristics
This information is entirely hidden from traditional content filters, and technologies like OCR capture only a fraction of this information.
Multidimensional Pattern Recognition
Cisco IronPort Anti-Spam includes a patent-pending technology called Multidimensional Pattern Recognition (MPR) to address this problem. After decoding the binary image files, Cisco IronPort uses MPR to analyze the decompressed image data across more than 13 dimensions. This analysis determines whether or not the message is spam.
MPR Example
Color is an example of a dimension that provides plentiful information about the content of a message. Cisco IronPort analyzes the distribution of colors found in each message to establish the likelihood that the message is spam. Two examples are:
- MPR can scan a GIF file to look for pixel patterns
- MPR can also detect anomalous "dots" in images
Pixel patterns can indicate that the image file is displaying "all text" to the user. This pattern is common in spam but rare in legitimate email. Most legitimate GIF files contain pictures, not text.
Anomalous "dots" do not fit the "smoother" gradients of light typically found in legitimate email. These dots may represent attempts by the spammer to defeat signatures.
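One way to look for such anomalous dots is sketched below. MPR's actual dimensions and algorithms are not described in the text, so this is a hypothetical check: it counts pixels that contrast sharply with all four direct neighbors, which should be rare in an image with smooth gradients. The function name and the contrast threshold are assumptions.

```python
def count_isolated_dots(img: list, min_contrast: int = 60) -> int:
    """Count interior pixels that differ sharply from all four direct neighbors."""
    count = 0
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            p = img[y][x]
            neighbors = (img[y - 1][x], img[y + 1][x], img[y][x - 1], img[y][x + 1])
            if all(abs(p - n) >= min_contrast for n in neighbors):
                count += 1
    return count

# A smooth 8x8 grayscale gradient: adjacent pixels differ by only 10.
smooth = [[10 * x + 10 * y for x in range(8)] for y in range(8)]

# The same image with two bright dots inserted.
dotted = [row[:] for row in smooth]
dotted[2][3] = 255
dotted[5][5] = 255

print(count_isolated_dots(smooth))  # 0
print(count_isolated_dots(dotted))  # 2
```

A count like this would be only one input among many; a single stray pixel in a legitimate image should not decide the verdict on its own.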
Deep Inspection Without Compromising Performance
To make this level of inspection possible without compromising performance, Cisco IronPort applies the concept of "early exit":
- The more intensive MPR process is applied to messages only after images have passed through regular context adaptive scanning.
- Within MPR, if part of the image file has been analyzed and the message has already been identified as spam, the rest of the image file is not analyzed.
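The second form of early exit can be sketched as follows. The chunking scheme, the scoring function, and the threshold are hypothetical, since the text does not say how MPR scores partial images. The sketch shows only the control flow: stop as soon as the evidence is sufficient.

```python
from typing import Callable, Iterable, Tuple

def classify_with_early_exit(
    chunks: Iterable[bytes],
    score_chunk: Callable[[bytes], float],
    threshold: float,
) -> Tuple[bool, int]:
    """Score image chunks in order and stop once the running total reaches
    the threshold. Returns (is_spam, number_of_chunks_examined)."""
    total = 0.0
    examined = 0
    for chunk in chunks:
        examined += 1
        total += score_chunk(chunk)
        if total >= threshold:
            return True, examined   # verdict reached; skip the remaining chunks
    return False, examined

def dot_score(chunk: bytes) -> float:
    """Hypothetical scorer: count mid-gray (0x80) pixels in the chunk."""
    return float(chunk.count(0x80))

spam_image = [b"\x80\x80", b"\x80\x00", b"\x00\x00", b"\x00\x00"]
clean_image = [b"\x00\x00", b"\x00\x00", b"\x00\x00", b"\x00\x00"]

print(classify_with_early_exit(spam_image, dot_score, threshold=3))   # (True, 2)
print(classify_with_early_exit(clean_image, dot_score, threshold=3))  # (False, 4)
```

In this example the spam verdict is reached after two of four chunks, so half of the image is never examined.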
The end result is a process that is not only more accurate, but also several times faster than traditional OCR technologies. Critical to the effectiveness of this technology is the real-time nature of Cisco IronPort Anti-Spam. Updates to the system are made every 5 minutes, helping to ensure immediate and accurate protection from image-based threats.