Anti-virus software supplier Kaspersky Lab has refined a new approach to filtering, which it says will pick up spam and other unwanted text messages that may have been hidden in image files.
Some graphical spam goes undetected because it is hard to filter. Before an image can be checked for spam, the software first has to determine whether it contains any text. “Spammers intentionally distort and create ‘noise’ in images to make detection more difficult,” the vendor explained.
Adding random noise or confetti to a message is only one way a spammer can get around anti-spam software. Blurring of text outlines, construction of the image from multiple image layers assembled within an HTML e-mail, or use of animated image formats can also be very effective.
Kaspersky said the majority of anti-spam systems fail to detect image spam because most of them use some form of machine recognition to screen images for text, and this requires uniformity in the size, style and arrangement of symbols.
Currently, the surest known countermeasure for image spam is to discard all messages containing images that do not appear to come from an already white-listed e-mail address. Kaspersky reckons the approach it has developed is an improvement on this.
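The white-list countermeasure described above can be sketched in a few lines of Python using the standard library's email module. This is only an illustration of the policy, not any vendor's implementation: the `WHITELIST` set, the example addresses and the function names are assumptions made for the sketch.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical allow-list; a real deployment would load this from configuration.
WHITELIST = {"alice@example.com"}


def has_image_part(msg):
    """Return True if any MIME part of the message is an image."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def should_discard(raw_message, whitelist=WHITELIST):
    """Discard messages that carry images unless the sender is white-listed."""
    msg = message_from_string(raw_message)
    # parseaddr splits "Bob <bob@example.org>" into ("Bob", "bob@example.org").
    _, sender = parseaddr(msg.get("From", ""))
    return has_image_part(msg) and sender.lower() not in whitelist
```

The sketch only looks at images attached as MIME parts and trusts the From header as written, so it would also throw away legitimate pictures from any sender not yet on the list.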
The patented technology is based on a probabilistic and statistical approach. Whether or not an image contains text is determined by the layout of the graphic patterns of words and lines, as well as the content of the letters and words in those patterns.
Dedicated filters ensure that the system is not affected by noise elements or the fracturing of text within images, while obfuscation techniques used in graphic spam such as warping and rotating are counteracted using a unique method of detecting text lines.
Because the method does not call for machine recognition, Kaspersky claims it provides high-speed detection.
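The general idea of judging whether an image contains text from the layout of its lines, rather than by recognising characters, can be illustrated with a toy heuristic. The sketch below is not Kaspersky's patented method, whose details are not given here. It assumes a binarised image supplied as a list of rows of 0s and 1s (1 meaning ink), and it treats an image as text-like when it contains several horizontal bands of ink of similar height. The function names and thresholds are invented for the example.

```python
def row_bands(image):
    """Return (start, end) row ranges, end exclusive, that contain any ink."""
    bands, start = [], None
    for y, row in enumerate(image):
        inked = any(row)
        if inked and start is None:
            start = y                  # a new band of ink begins
        elif not inked and start is not None:
            bands.append((start, y))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(image)))
    return bands


def looks_like_text(image, min_lines=3, max_height_ratio=2.0):
    """Guess whether the image holds lines of text, without reading any characters.

    Assumed heuristic: at least `min_lines` separate bands of ink whose
    heights are roughly uniform.
    """
    bands = row_bands(image)
    if len(bands) < min_lines:
        return False
    heights = [end - start for start, end in bands]
    return max(heights) <= max_height_ratio * min(heights)
```

A heuristic this simple would be defeated by the noise, warping and rotation described above, which is why the article mentions dedicated filters and a separate method for detecting text lines.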
“The method has sufficiently low resource requirements for it to be used in Kaspersky Lab’s spam filter,” said Eugene Smirnov, the developer of the technology and manager of the Anti-Spam Development Group at Kaspersky Lab. The new method can also detect text in images in almost any language, he claimed.