LinkedIn, the world’s largest professional social networking site, announced last week that it has open-sourced a machine learning library named ‘Isolation Forest’, an implementation of the widely used machine learning algorithm of the same name. LinkedIn uses the library to detect abuse on its platform and protect its users from it.
At its core, LinkedIn’s implementation of the Isolation Forest is machine learning, a modern approach to writing software in which the software, rather than a human, makes decisions based on what it has learned from data. In its article announcing the new library, the firm outlined the “unique challenges” it faces in using machine learning to tackle the issue of online abuse.
To name a few, the firm said these challenges stem primarily from labelling and from adversarial adaptivity. Labelled data is what ‘supervised learning’ (a widely used approach to implementing machine learning) typically relies on, while adversarial adaptivity simply means that the ‘abusers’ are “quick to adapt and evolve”.
Detecting Suspicious Abnormalities in the Data
As a result of these challenges, the team decided to take a different approach, still within machine learning, using a well-known algorithm called ‘Isolation Forest’. The algorithm identifies outliers (essentially, data points that look different from the norm) in data that has not been labelled. Put simply, it makes it much easier to tell when something in the data is strange.
The team noted that this lets them identify potentially abusive behaviour and safeguard their users even where labelled examples are hard to come by: “For some types of abuse, such as spam, it is possible to have a scalable review process where humans label training examples as spam or not spam. There are other types of abuse, such as scraping, where this kind of scalable human labeling is much more difficult, or impossible.”
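To make the idea of ‘isolating’ an outlier more concrete, here is a minimal pure-Python sketch of the general Isolation Forest technique. It is not LinkedIn’s library or its API: every function name, the toy data and the parameter values (100 trees, samples of 64 points) are assumptions chosen for illustration. Each tree repeatedly splits a random sample of the data on a random feature at a random threshold, and points that end up alone after only a few splits receive a higher anomaly score.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of the harmonic number H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:  # simplification: stop if the chosen feature cannot be split
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Count the splits needed to reach a leaf, adjusting for points left unsplit there."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    branch = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(point, node[branch], depth + 1)


def build_forest(data, n_trees, sample_size, rng):
    """Grow each tree on its own random sample of the data."""
    max_depth = math.ceil(math.log2(sample_size))
    return [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]


def anomaly_score(point, forest, sample_size):
    """Score in (0, 1): the shorter the average path, the closer the score is to 1."""
    mean_depth = sum(path_length(point, tree) for tree in forest) / len(forest)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Toy data (an assumption for this sketch): 200 'normal' points plus one obvious outlier.
rng = random.Random(0)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
outlier = (8.0, 8.0)
forest = build_forest(inliers + [outlier], n_trees=100, sample_size=64, rng=rng)

outlier_score = anomaly_score(outlier, forest, 64)
inlier_scores = [anomaly_score(p, forest, 64) for p in inliers]
mean_inlier_score = sum(inlier_scores) / len(inlier_scores)
print("outlier score:", round(outlier_score, 3))
print("mean inlier score:", round(mean_inlier_score, 3))
```

Note that no labels are used anywhere in the sketch: the forest is built from the raw points alone, which is what makes the approach suitable for cases where human labelling is difficult or impossible.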
Open Sourced and Applicable for Payment Fraud Through to Data Center Monitoring
As noted earlier, LinkedIn has open-sourced this software library, meaning that developers at other firms will be able to apply this unsupervised machine learning approach in a variety of contexts. The team suggests the following potential uses:
- Automation detection
- Payment fraud
- ML health assurance
- Data center monitoring
In closing, other implementations of Isolation Forest are already available for developers to use. However, given the size and scale of LinkedIn’s platform, and the scale of the firm’s engineering output, it is likely that this open-sourced library will benefit other technology teams looking to solve similar outlier detection problems and, most importantly, protect users in a variety of contexts.