The Pitfalls of Hunting Cyber Threats with AI

Artificial intelligence (AI) will not automatically detect and resolve every potential malware or cyberthreat incident, but when it combines both bad and good behavior modeling it becomes a powerful weapon against even the most advanced malware.

From Trapping to Hunting

By their very nature, malware detection tools must constantly evolve to stay up to date with ever-changing crimeware. One of the biggest evolutions in malware detection is the migration from trapping to hunting. In threat trapping, passive technologies identify malware using models of bad behavior, such as signatures. If a malware signature is found in an object, that object is classified as malicious.
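
As a rough illustration of trapping, the sketch below treats a signature as the SHA-256 digest of a whole object. That is a simplifying assumption: real products use richer signatures, and the names `KNOWN_BAD_SHA256` and `is_known_malware` are hypothetical.

```python
import hashlib

# Hypothetical signature set: SHA-256 digests of payloads already known to be bad.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"known-malicious-payload").hexdigest(),
}

def is_known_malware(data: bytes) -> bool:
    """Return True when the object's digest matches a known-bad signature."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256

print(is_known_malware(b"known-malicious-payload"))   # True
print(is_known_malware(b"known-malicious-payload!"))  # False: one extra byte changes the digest
```

The check is passive: it can only flag objects whose signature is already in the set.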

The Pitfalls of Hunting Cyber Threats with AI - CBR — Giovanni Vigna, Co-founder and CTO, Lastline.

For decades, most malware detection products have used and depended on trapping, or bad behavior models.

With threat hunting, good-behavior models proactively search out anomalous and malicious activities that don’t match the models. In theory, by modeling what’s good you don’t have to model what’s bad. You can always detect the bad because you know what is good, and everything different from good must therefore be bad.

This shift from trapping to hunting, or from bad behavior to good behavior modeling, is necessary because advanced malware is so sophisticated that it can easily evade security solutions that rely on bad behavior models like signatures. Today’s malware authors are quite adept at creating single-use or limited-use malware that will never be seen by signature-creating vendors. Without signatures, conventional detection tools that are dependent on bad behavior models are entirely ineffective. They will not detect this type of advanced malware.

Because hunting technologies use good behavior modeling and don’t rely on signatures, they are much more effective at discovering modern evasive malware. Products that use good behavior modeling will detect many forms of malware that a signature-based tool will miss. Because of their proactive behavior monitoring, malware hunting tools detect anomalies like:

An excessive use of certain resources (such as CPU or memory)
Connections to hosts with which the target of an infection never communicated in the past
An unusual amount of data transferred to an external host
The invocation of programs, such as compilers or network exploration tools, that have never been used before
Logins at unusual times
… and many other anomalies
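
The checks listed above can be sketched as a comparison between an observed event and a per-host baseline. This is a minimal sketch, not how any particular product works: the `HostBaseline` fields, the event keys, and the thresholds are all hypothetical, and a real baseline would be learned from data rather than hard-coded.

```python
from dataclasses import dataclass, field

@dataclass
class HostBaseline:
    """Hypothetical good-behavior model for one host."""
    known_peers: set = field(default_factory=set)
    known_programs: set = field(default_factory=set)
    login_start: int = 8                  # assumed usual login window: 08:00 ...
    login_end: int = 19                   # ... up to, but not including, 19:00
    max_upload_bytes: int = 50_000_000    # assumed ceiling for one transfer

def find_anomalies(baseline: HostBaseline, event: dict) -> list:
    """List every way the event departs from the baseline (empty list = normal)."""
    anomalies = []
    peer = event.get("peer")
    if peer is not None and peer not in baseline.known_peers:
        anomalies.append("new_peer")
    if event.get("upload_bytes", 0) > baseline.max_upload_bytes:
        anomalies.append("large_upload")
    program = event.get("program")
    if program is not None and program not in baseline.known_programs:
        anomalies.append("new_program")
    hour = event.get("login_hour")
    if hour is not None and not (baseline.login_start <= hour < baseline.login_end):
        anomalies.append("odd_login_hour")
    return anomalies

baseline = HostBaseline(known_peers={"10.0.0.5"}, known_programs={"excel.exe"})
event = {"peer": "203.0.113.9", "upload_bytes": 900_000_000, "login_hour": 3}
print(find_anomalies(baseline, event))
# ['new_peer', 'large_upload', 'odd_login_hour']
```

Note that the function only reports departures from the baseline; it says nothing about whether a departure is actually malicious.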

AI Automates Good Behavior Modeling

Unfortunately, developing accurate malware detection products based on good behavior modeling is not easy. It is necessary to collect and analyze a huge amount of data, capturing, processing, and classifying virtually everything legitimate programs and users do. That requires not only access to the data but also an extraordinary amount of processing power, and the job never ends. Because behaviors are always changing, a model of good behavior is never complete and quickly becomes obsolete.

Performing all of these good behavior modeling tasks would be virtually impossible to do manually. However, AI or machine learning is especially suited for this type of work. Unlike human beings, AI never tires, scales in extraordinary ways to handle very large datasets, and can automatically generate baseline models of normal behavior.

Although behavior modeling is a never-ending job, AI has the horsepower to constantly stay on top of the changes. As soon as new behaviors emerge, they will automatically be accounted for in real time.

Overcoming False Positives

Applying AI to the task of developing good behavior models solves many of the technical and resource challenges of detecting advanced malware. However, even though AI is a powerful approach, there are caveats to how well it can develop accurate models. AI is important, but it’s not a silver bullet. It’s something you need in your arsenal but it’s not a solution to every problem.

In particular, there is one very significant obstacle that AI needs to overcome—false positives.

The false-positive problem arises because anomaly-based malware detection is built on the flawed assumption that anomalies are inherently bad: anything out of the ordinary is dangerous, or at least could be dangerous and needs further examination. In reality, it is normal to have some anomalies that are good and some bad behaviors that never show up as anomalies.

This fundamental but flawed assumption causes a major problem because a malware detection system that generates alerts based on abnormal events will likely create a lot of unnecessary work for administrators. Every time the system sees something weird, someone has to manually check it out. For example, an after-hours login could be evidence of an attack or could be just someone working late for the first time in their career. Each of these anomalies creates a potential fire that very skilled and expensive security personnel must extinguish.

Augmenting Good Behavior Modeling with Bad Behavior Modeling

To overcome the high number of false positives generated by good behavior modeling, each anomaly must undergo further evaluation. For example, an unusual connection to a distant site might be legitimate, or it might be part of a data breach attack. Likewise, a large data transfer might be an unauthorized exfiltration, or it could be an administrator setting up a remote company server. A security analyst or administrator could manually validate both of these anomalies by looking at the data and the people involved and, if necessary, talking with those individuals to understand the context.

This manual method of evaluating each anomaly works for a small number of incidents. But no company has enough human resources to manually evaluate a large number of alerts about possible security threats.

Fortunately, with advanced AI capabilities we don’t have to rely entirely on human resources to evaluate all of the potential security incidents. Despite a number of serious shortcomings, bad behavior models are actually quite effective at helping to weed out false positives. When AI applies both bad and good behavior models, it reduces the number of false positives to a manageable number.

To illustrate this, let’s return to our earlier examples of unusual connections and large data transfers. Bad behavior models will frequently include information about hosts and IP addresses that are known to be malicious. They also contain information about malicious traffic and data flows. This information is extremely valuable. AI can evaluate an unusual connection or large data transfer against these known models of bad behavior. A match indicates with near certainty that the actions are malicious. On the other hand, if the addresses, data flows, and traffic don’t match any known bad behavior, confidence is dramatically increased that the anomaly is just that, an anomaly and nothing more.
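
The second-stage check described above can be sketched as a lookup of each flagged anomaly against known bad behavior. This is a minimal sketch under stated assumptions: the bad-behavior model is reduced to a set of known-malicious hosts, and `KNOWN_BAD_HOSTS`, `triage`, and the anomaly fields are hypothetical names. The addresses come from ranges reserved for documentation.

```python
# Hypothetical bad-behavior model: destinations already known to be malicious.
KNOWN_BAD_HOSTS = {"203.0.113.9"}

def triage(anomaly: dict) -> str:
    """Classify an anomaly that a good-behavior model has already flagged."""
    if anomaly["peer"] in KNOWN_BAD_HOSTS:
        return "malicious"       # matches known bad behavior: escalate
    return "likely_benign"       # unusual, but no known-bad match: lower priority

anomalies = [
    {"kind": "large_upload", "peer": "203.0.113.9"},
    {"kind": "new_peer", "peer": "198.51.100.20"},
]
for a in anomalies:
    print(a["kind"], "->", triage(a))
# large_upload -> malicious
# new_peer -> likely_benign
```

The second label is "likely_benign" rather than "benign" on purpose: the absence of a match raises confidence but does not prove the anomaly is harmless.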

AI Dramatically Lessens Reliance on Humans

As malware detection tools continue to evolve, AI will play a larger and more important role. Although the good behavior models most commonly associated with AI often create false positives, adding the automated application of bad behavior models can reduce these errors to a manageable number.

Although it’s difficult to imagine a world where human resources aren’t involved in combating evasive malware, the recent advances in AI will dramatically lessen our reliance on human assistance.