December 17, 2018

Microsoft: Here’s an “Unprecedented” Dataset – Predict Infection, Win $20k

First prize, $12,000: 261 competitors and counting...

By CBR Staff Writer

Microsoft has launched a new competition challenging researchers and programmers to come up with an AI model that predicts the likelihood of malware infection based on a machine’s configuration.

It is providing an “unprecedented malware dataset” to train the AI on. The winner will receive $12,000, with a second prize of $7,000.

The competition was announced on December 13 on Kaggle, which has been described as “an AirBnB for Data Scientists”. Kaggle is a community platform for data scientists owned by Google. It has over 536,000 active members.

There are already 261 competitors, with the competition closing in three months.

Microsoft’s aim is to further improve its layered defence system by establishing a predictive approach to system vulnerabilities.

Announcing the competition, the company said: “The malware industry continues to be a well-organized, well-funded market dedicated to evading traditional security measures. Once a computer is infected by malware, criminals can hurt consumers and enterprises in many ways.”

With more than one billion enterprise and consumer customers, Microsoft takes this problem very seriously and is deeply invested in improving security.


Microsoft Malware Prediction AI

Participants are tasked with building the models using 9.4GB of anonymised data collected from 16.8 million devices by Microsoft.

This data is divided into two files, train.csv and test.csv.

Within these data sets each row corresponds to one machine, identified by the column MachineIdentifier. A second column, HasDetections, indicates whether malware was detected on that machine.

Using the information and labels in train.csv, the participants are tasked with predicting the value for HasDetections for each device in test.csv.

The train.csv file contains a wealth of machine configuration information such as the Operating System, processor type, country location and the current firewall setup.
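
As a rough illustration of the task, the sketch below estimates the chance of infection from a single configuration column. It computes the malware detection rate for each category in the training rows, then looks that rate up for each test machine. The toy rows, the `Firewall` column name and its values, and the helper names `fit_rates` and `predict` are assumptions made for illustration. A real entry would read the competition's train.csv and test.csv and would almost certainly use many more features and a stronger model.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for train.csv and test.csv (hypothetical rows and values).
TRAIN_CSV = """MachineIdentifier,Firewall,HasDetections
a1,1,0
a2,1,0
a3,0,1
a4,0,1
a5,1,1
a6,0,0
"""

TEST_CSV = """MachineIdentifier,Firewall
b1,1
b2,0
b3,unknown
"""


def fit_rates(train_file, feature):
    """Return (per-category detection rate, overall detection rate)."""
    counts = defaultdict(lambda: [0, 0])  # category -> [detections, rows]
    total_detections = 0
    total_rows = 0
    for row in csv.DictReader(train_file):
        label = int(row["HasDetections"])
        counts[row[feature]][0] += label
        counts[row[feature]][1] += 1
        total_detections += label
        total_rows += 1
    rates = {cat: det / n for cat, (det, n) in counts.items()}
    return rates, total_detections / total_rows


def predict(test_file, feature, rates, fallback):
    """Map each MachineIdentifier to a predicted HasDetections probability."""
    return {
        row["MachineIdentifier"]: rates.get(row[feature], fallback)
        for row in csv.DictReader(test_file)
    }


rates, overall = fit_rates(io.StringIO(TRAIN_CSV), "Firewall")
predictions = predict(io.StringIO(TEST_CSV), "Firewall", rates, overall)
for machine_id, probability in predictions.items():
    print(f"{machine_id},{probability:.3f}")
```

Categories that never appear in the training rows (here, `unknown`) fall back to the overall detection rate, which keeps the sketch from failing on unseen configurations.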

Chase Thomas and Robert McCann of the Windows Defender Research team commented in a security blog: “The competition provides academics and researchers with varied backgrounds a fresh opportunity to work on a real-world problem using a fresh set of data from Microsoft.”

“Results from the contest will help us identify opportunities to further improve Microsoft’s layered defenses, focusing on preventative protection. Not all machines are equally likely to get malware; competitors will help build models for identifying devices that have a higher risk of getting malware so that preemptive action can be taken.”


You can enter the Microsoft Malware Prediction competition on Kaggle. It closes on March 13, 2019. The maximum team size is eight people, and up to five entries can be submitted per day.

Microsoft is also awarding
