Derek Lin is Exabeam’s Chief Data Scientist and has more than 20 years of experience in the cybersecurity industry. His previous work has included behaviour-based security analytics such as malware detection and insider threat detection, risk-based online banking fraud detection, data loss prevention, voice-biometrics security, and speech and language processing. He spoke to Ed Targett about why the “devil is in the data”.
How much are AI, machine learning and data science helping cybersecurity?
Cybersecurity solutions for detecting threats used to rely on signature-based blacklisting or correlation rules, and those approaches have long been inadequate. These days, data science plays a significant role in areas ranging from endpoint protection to insider threat detection. This is particularly true of user and entity behaviour monitoring, where each user’s or entity’s current activity is continuously compared against its historical profile (its normal behaviour). This kind of proactive monitoring is impossible without the automated, data-driven analytics that data science provides.
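To make “current activity compared against a historical profile” concrete, here is a minimal Python sketch. It is an illustration only, not Exabeam’s method: it assumes each user’s profile is simply a list of past daily event counts, scores today’s count as a z-score against that user’s own history, and flags users beyond a hypothetical threshold of 3 standard deviations.

```python
import statistics


def anomaly_score(history: list[float], current: float) -> float:
    """Z-score of the current value against one user's own historical profile."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Perfectly flat history: no change scores 0, any change is maximally unusual.
        return 0.0 if current == mean else float("inf")
    return (current - mean) / stdev


def flag_anomalies(profiles: dict[str, list[float]], today: dict[str, float],
                   threshold: float = 3.0) -> list[str]:
    """Return users whose activity today deviates strongly from their own baseline."""
    flagged = []
    for user, count in today.items():
        history = profiles.get(user)
        if not history:
            continue  # no baseline yet; a real system would need a cold-start policy
        if abs(anomaly_score(history, count)) >= threshold:
            flagged.append(user)
    return sorted(flagged)


# Hypothetical daily event counts over the past week.
profiles = {"alice": [10, 12, 11, 9, 10, 11, 12], "bob": [40, 35, 42, 38, 41, 39, 40]}
print(flag_anomalies(profiles, {"alice": 55, "bob": 41}))  # ['alice']
```

Bob generates more events than Alice in absolute terms, yet only Alice is flagged: the comparison is against each entity’s own normal behaviour rather than a global rule.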
What’s the biggest advancement these technologies have provided to the cybersecurity industry?
In addition to user and entity behaviour analytics, where machine learning has shone, a big advancement in detecting malicious network traffic and malware binaries is deep learning (a machine learning approach). Deep learning is… now applied to analyse network packets and binary executable files, looking for clues in packet or byte streams that would be impossible to spot with the human eye.
How far can AI/advanced analytics go to automate threat detection and response?
The goal is not to fully replace human analysts, but to make them more productive. Without advanced analytics, security analysts have been mired in mountains of alerts: much effort is wasted chasing down false positives while many threats go undetected. Advanced analytics aims to give humans better, sharper signals, so there are fewer leads to start with. Once alerts are raised, systems can automatically handle low-tier incidents and leave the few difficult ones to human analysts.
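The split between automated handling and human review can be pictured as a simple risk-based router. The Python sketch below is hypothetical: the `Alert` type, the 0 to 100 `risk_score` and the cut-off of 30 are assumptions for illustration, not part of any product described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    risk_score: float  # hypothetical 0-100 score produced by upstream analytics


def triage(alerts: list[Alert], auto_threshold: float = 30.0) -> tuple[list[str], list[str]]:
    """Split alerts into ones handled automatically and ones escalated to analysts.

    Both lists are ordered from highest to lowest risk, so analysts see the
    most suspicious leads first.
    """
    ranked = sorted(alerts, key=lambda a: a.risk_score, reverse=True)
    automated = [a.alert_id for a in ranked if a.risk_score < auto_threshold]
    escalated = [a.alert_id for a in ranked if a.risk_score >= auto_threshold]
    return automated, escalated


alerts = [Alert("A-1", 12.0), Alert("A-2", 88.0), Alert("A-3", 45.0), Alert("A-4", 5.0)]
automated, escalated = triage(alerts)
print(escalated)  # ['A-2', 'A-3']
```

The sharper the upstream scores, the more of the queue can safely fall below the cut-off, which is the productivity gain described above.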
How is AI/advanced analytics being utilised by cybercriminals?
Cybersecurity is a game of cat and mouse, and cybercriminals have the same toolset as we do. For example, to avoid domain name blacklisting, they have used domain generation algorithms to produce fast-changing domain names made of random-looking characters. Once we were able to detect domain names with random characters, they switched to generating domain names from random words to evade detection. Security work has always meant building the wall a little higher until it is scaled again.
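One common way to illustrate detecting “random characters” in domain names is character entropy. The Python sketch below is a simplified heuristic, not the detector used by Exabeam or anyone else in particular; the minimum label length of 8 and the 3.5-bit threshold are assumptions chosen for the example, and the label extraction ignores multi-part suffixes such as `.co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())


def registered_label(domain: str) -> str:
    """Label just left of the top-level domain (simplified: ignores suffixes like .co.uk)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def looks_generated(domain: str, min_length: int = 8, threshold: float = 3.5) -> bool:
    """Flag long labels whose characters are spread out the way random strings tend to be."""
    label = registered_label(domain)
    return len(label) >= min_length and shannon_entropy(label) > threshold


for d in ["xkqzjvbtwpmfhgyr.net", "wikipedia.org", "greenappletree.com"]:
    print(d, looks_generated(d))  # True, False, False
```

The third domain, built from ordinary words, passes as benign, which is exactly the evasion described above; catching word-based generation would need different features, such as word segmentation or n-gram language models.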
How is data currently being used in the infosec sector?
Let’s take the area of insider threat detection. Data is everywhere, yet it is an open secret that much security data needs curation before it becomes useful. Large volumes of raw data need to be parsed and cross-correlated so that data elements are fully normalised before they can be used. Once the data is curated, we use it to monitor user activity for anomalous behaviour. We can also use it to derive system intelligence that supplements IT knowledge, and to detect threats that match known behavioural scenarios.
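The parsing and cross-correlation step can be illustrated with two toy log sources. In this Python sketch both log formats, the field names and the IP-to-user table are invented for illustration: VPN records already name the user, while proxy records only carry a source IP, so they are joined against the mapping before every event is reduced to the same (user, UTC time, action) shape. Records that cannot be attributed are simply dropped here; a production pipeline would need a more careful policy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Event:
    """A normalised event: one user identity, one UTC timestamp, one action name."""
    user: str
    time: datetime
    action: str


def parse_vpn(line: str) -> Event:
    # Hypothetical VPN format: "2019-05-01T08:15:00Z ALICE vpn_login"
    ts, user, action = line.split()
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return Event(user.lower(), when.astimezone(timezone.utc), action)


def parse_proxy(line: str, ip_to_user: dict[str, str]) -> Optional[Event]:
    # Hypothetical proxy format: "1556698800|10.0.0.7|web_upload" (epoch seconds|source IP|action).
    # The proxy only knows IP addresses, so we cross-correlate with an IP-to-user
    # mapping (for example, built from DHCP or VPN assignment records) to recover the user.
    epoch, ip, action = line.split("|")
    user = ip_to_user.get(ip)
    if user is None:
        return None  # cannot attribute this record to a user
    return Event(user.lower(), datetime.fromtimestamp(int(epoch), tz=timezone.utc), action)


def normalise(vpn_lines, proxy_lines, ip_to_user):
    """Merge both sources into one time-ordered list of normalised events."""
    events = [parse_vpn(line) for line in vpn_lines]
    events += [e for line in proxy_lines if (e := parse_proxy(line, ip_to_user)) is not None]
    return sorted(events, key=lambda e: e.time)


timeline = normalise(
    ["2019-05-01T08:15:00Z ALICE vpn_login"],
    ["1556698800|10.0.0.7|web_upload", "1556698900|10.0.0.99|web_upload"],
    {"10.0.0.7": "Alice"},
)
for e in timeline:
    print(e.user, e.time.isoformat(), e.action)
# alice 2019-05-01T08:15:00+00:00 vpn_login
# alice 2019-05-01T08:20:00+00:00 web_upload
```

Only after this step do the two sources describe the same person on the same clock, which is what makes a per-user behavioural baseline possible.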
Will Machine Learning continue to disrupt the industry, or have all of its uses been found?
We shouldn’t limit our horizon to existing data sources or to the current level of machine learning research. A good parallel is the progress of natural language processing. Not many years ago, deep learning disrupted the then-plateaued field of speech recognition and brought new gains in accuracy and new applications. We are still at the beginning of applying machine learning to cybersecurity.
What is the next step for AI in cybersecurity?
Compared to other industries, relatively few machine learning scientists work in cybersecurity. In theory, as the community grows, more innovations will come. But what can really encourage progress is collaboration at both the human level and the data level. Future industry-wide efforts to share security data and incidents for research purposes will benefit research tremendously.