Elon Musk is now famous as the man who wants to end his days on Mars, after he’s revolutionised the transport and energy sectors.
He got his chance to do this with the money he made from his pioneering ecommerce firm, PayPal.
Apart from Musk’s inimitable leadership style, PayPal rose to prominence by building a real-time view of its entire payment network. This innovative way of visualising data, and the complex web of connections between users, was a big factor in its breakout success and subsequent market domination.
A big part of its success was the way it headed off the threat of online fraud. And although no fraud prevention measures are ever 100% foolproof, significant progress can be achieved by looking beyond the individual data points to the relationships between them, in the way PayPal successfully managed to do.
So how did PayPal successfully beat the fraudsters? Looking at data relationships isn’t straightforward and doesn’t necessarily mean gathering new or more data. The key is to look at the existing data in a new way, one that makes the underlying connections and patterns explicit, using a powerful but proven approach: graph databases.
Unlike most other ways of looking at data, graph databases are designed to exploit relationships in data. That means they can uncover patterns difficult to detect using traditional representations such as tables. PayPal, which moved $230bn worth of currency over its networks in 2014, still employs graph techniques to perform sophisticated fraud detection at massive, global scale. IDC estimates that this has already saved the firm more than $700m – and that graphs have enabled the company to perform robust predictive fraud analysis.
It’s not just the PayPals of the world who are exploiting the power of graphs. An increasing number of enterprises, from banks to ecommerce firms, are using them to solve a variety of connected data problems, first and foremost being the speedy detection of suspicious online activity.
There are various types of fraud – first-party, insurance, and e-commerce fraud, for instance. What they all have in common: layers of dishonesty to hide the crime – indirection that can really only be uncovered through connected analysis. In each of these types of fraud, graph databases offer a significant opportunity to augment existing methods of fraud detection, making evasion substantially more difficult: let’s see how.
Stopping The Bust Out
First-party fraud involves criminals who apply for credit cards, loans, overdrafts and unsecured banking credit lines but who have no intention of ever paying the money back. It’s a serious problem for banks, who lose tens of billions of pounds every year to this form of fraud, and it’s believed that as much as 20% of unsecured bad debt at leading US and European banks is just such first-party fraud.
First-party fraud is hard to detect and the perpetrators are skilled at emulating legitimate customers, until the moment they do their ‘Bust-Out,’ i.e. cleaning out all their accounts and disappearing.
Another factor is the exponential nature of the relationship between the number of participants in the fraud ring and the total value the ring can control. However, while these characteristics make such schemes very damaging, they also render them especially open to graph-based methods of fraud detection.
Why? That’s because a first-party fraud ring involves two or more people sharing a subset of legitimate contact information, combining them to create a number of synthetic identities. With these fake IDs, they will open new accounts for unsecured credit lines, credit cards, overdraft protection, personal loans, etc.
The accounts are used in a normal manner, with regular purchases and timely payments, so that the banks gain confidence and slowly increase credit over time. One day the ring makes its move, maxing out credit lines and disappearing. Collections processes ensue, but this is shutting the stable door after the horse has bolted: the fraudsters are long gone and ready to start all over again.
The potential scale of all this is quite worrying. Take two fraudsters sharing only a phone number and address (two pieces of data). Just these two miscreants can combine them to create four synthetic identities with fake names, with four to five accounts per identity, for a total of 18 accounts. Assuming an average of £4,000 in credit exposure per account, the bank’s loss could be £72,000 – perhaps more. The potential loss in a ten-person fraud ring is no less than £1.5m, assuming 100 false identities and three financial instruments per identity, each with a £5,000 credit limit.
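The arithmetic behind those figures can be sketched in a few lines of Python. This is an illustration only: it assumes each fraudster contributes one phone number and one address and that every phone and address pairing becomes a synthetic identity, which is one reading of the scenario above. The function names and placeholder values are hypothetical.

```python
from itertools import product

def synthetic_identities(phones, addresses):
    """Every (phone, address) pairing the ring could present as a distinct identity.

    Assumption: identities are formed by pairing any phone with any address.
    """
    return list(product(phones, addresses))

def credit_exposure(n_identities, accounts_per_identity, limit_per_account):
    """Total credit at risk if every account is maxed out at bust-out."""
    return n_identities * accounts_per_identity * limit_per_account

# Two fraudsters, each contributing one phone number and one address.
two_person = synthetic_identities(["phone_A", "phone_B"], ["addr_A", "addr_B"])
# 18 accounts spread over the identities (four to five each), £4,000 per account.
two_person_loss = credit_exposure(len(two_person), 18 / len(two_person), 4_000)

# Ten fraudsters: 10 phones x 10 addresses, three instruments each at £5,000.
ten_person = synthetic_identities(
    [f"phone_{i}" for i in range(10)],
    [f"addr_{i}" for i in range(10)],
)
ten_person_loss = credit_exposure(len(ten_person), 3, 5_000)

print(len(two_person), two_person_loss)
print(len(ten_person), ten_person_loss)
```

Under this pairing assumption the number of identities grows with the square of the ring size, which is why a handful of extra participants raises the exposure so sharply.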
Clearly we need better ways to combat this. Analyst group Gartner thinks we need a layered model for fraud prevention that starts with simple discrete methods but progresses to more elaborate types of analysis. Its ultimate suggested layer, "Entity Link Analysis", leverages connected data in order to detect organised fraud – and this is a form of analysis graph databases excel in.
Complex modelling
Financial services companies currently use a number of methods to beat fraud. Standard checks, such as flagging a deviation from normal purchasing patterns, rely on discrete data rather than connections. Discrete methods are very useful for catching fraudsters acting alone, but cannot pick up on the more elaborate ‘shared identifiers’ that typify fraud rings (collectives of miscreants, often cross-border, even cross-continent). Furthermore, many such methods are prone to producing false positives.
Uncovering rings with traditional relational database approaches requires modelling the data as a set of tables and columns, then carrying out complex joins and self-joins. The problem is that such queries are complex to build and expensive to run, and scaling them in a way that supports real-time access is problematic, with performance becoming exponentially worse as the size of the ring increases and the total data set grows.
Window getting ever-smaller
Graph databases, by contrast, have emerged as an ideal tool for overcoming just such hurdles. Used with powerful new query languages like Cypher, which provides simple semantics for detecting rings in the graph and navigating connections in memory, in real time, they can be a powerful way of spotting connections between fraudsters and their activities.
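To make the idea of detecting a ring through shared identifiers concrete, here is a minimal sketch in plain Python. It is not Cypher and does not use any graph database API; it simply treats account holders as nodes, links any two who share a phone number or address, and groups linked holders with a union-find structure. All names and sample records are invented for illustration.

```python
from collections import defaultdict

def find_rings(holders, min_size=2):
    """Group account holders that are linked through any shared identifier.

    holders: dict mapping holder name -> set of identifiers (phone, address, ...).
    Returns a list of sets of holder names, each with at least min_size members.
    """
    parent = {name: name for name in holders}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link each holder to the first holder seen with the same identifier.
    first_seen = {}
    for name, identifiers in holders.items():
        for ident in identifiers:
            if ident in first_seen:
                union(name, first_seen[ident])
            else:
                first_seen[ident] = name

    groups = defaultdict(set)
    for name in holders:
        groups[find(name)].add(name)
    return [group for group in groups.values() if len(group) >= min_size]

# Hypothetical sample data: three holders chained by a shared phone and address.
holders = {
    "J. Smith": {"phone:555-0101", "addr:1 High St"},
    "A. Jones": {"phone:555-0101", "addr:9 Mill Ln"},
    "P. Brown": {"phone:555-0199", "addr:9 Mill Ln"},
    "M. Green": {"phone:555-0142", "addr:3 Park Rd"},
}
rings = find_rings(holders)
print(rings)
```

Note that J. Smith and P. Brown share nothing directly; they are connected only through A. Jones, which is exactly the kind of indirection a record-by-record check misses. A shared identifier is a signal rather than proof, since family members and flatmates legitimately share addresses, so a real system would weigh it alongside other evidence.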
What’s more, running appropriate entity link analysis queries using a graph database, augmented by checks tied to the right kinds of customer and account lifecycle ‘events’, can help banks identify probable fraud rings during or even before the fraud occurs.
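A check tied to a lifecycle event might look something like the sketch below, which runs whenever a new application arrives. It is a simplified, in-memory stand-in for a graph query, looking only one hop out from the applicant; the class name, method names and threshold are assumptions made for the example, not part of any product.

```python
from collections import defaultdict

class SharedIdentifierIndex:
    """In-memory index from identifier to the applicants who have used it."""

    def __init__(self):
        self._by_identifier = defaultdict(set)

    def linked_applicants(self, identifiers):
        """Existing applicants sharing at least one of the given identifiers."""
        linked = set()
        for ident in identifiers:
            linked |= self._by_identifier.get(ident, set())
        return linked

    def on_new_application(self, applicant, identifiers, threshold=2):
        """Handle a 'new application' event: check for links, then record it.

        Returns (flagged, linked). flagged is True when the applicant shares
        identifiers with at least `threshold` applicants already on file.
        """
        linked = self.linked_applicants(identifiers)
        for ident in identifiers:
            self._by_identifier[ident].add(applicant)
        return len(linked) >= threshold, linked

# Hypothetical sequence of applications arriving over time.
index = SharedIdentifierIndex()
first = index.on_new_application("J. Smith", {"phone:555-0101", "addr:1 High St"})
second = index.on_new_application("A. Jones", {"phone:555-0101", "addr:9 Mill Ln"})
third = index.on_new_application("P. Brown", {"phone:555-0101", "addr:9 Mill Ln"})
print(first, second, third)
```

Because the check runs at the moment of application, the third applicant is flagged before any credit is extended, rather than after the bust-out.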
As business processes become faster and more automated, the window for detecting fraud is becoming narrower – increasing the need for real-time solutions which graphs are key to enabling. Traditional technologies, while still suitable for certain types of prevention, are not designed to detect the most elaborate fraud operations. Graph databases provide a unique ability to uncover a variety of important fraud patterns, in real time.
Forrester Research predicted that just over a quarter of enterprises would be using such databases by 2017. Years back, Elon Musk saw their potential – and so should you.
The author is co-founder and CEO of Neo Technology, the company behind Neo4j graph database