A. Adam Wilson, Informatica
Can you give me a quick overview of Informatica’s ILM division?
Convincing companies that data is an asset is easy. Convincing them that it is a liability is hard.
Companies need to put more focus on ‘lean data management’, by which I mean aligning the cost of storing and managing data with its value to the business. Why does all data fly first class, maintained and managed by expensive personnel? We help companies to pull the data out and partition it more intelligently. We started doing that for live data, because you save costs and get better performance in the process.
But companies were telling us that they had a problem with obsolete data. Nobody wants to unplug anything, but we enable them to move all that obsolete data into an optimised archive so they can retire applications. Once it’s in the dedicated archive you can apply records management principles and still query the data if you need to, but you don’t have to pay for a database. That is part of Informatica Data Archive, which has built-in legal hold and all the tagging you need for things like defensible data disposal.
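To make the disposal logic concrete: the check he describes boils down to ‘retention expired and no legal hold’. A minimal sketch of that principle in Python, with hypothetical field names (an illustration of the idea, not Informatica Data Archive’s actual interface):

    # Defensible disposal: a record in the archive may be purged only when
    # its retention period has expired AND it is not under legal hold.
    # The record layout here is hypothetical, purely for illustration.
    from datetime import date

    def can_dispose(record, today=None):
        today = today or date.today()
        return record["retention_until"] < today and not record["legal_hold"]

    record = {"id": 42, "retention_until": date(2010, 12, 31), "legal_hold": False}
    print(can_dispose(record, today=date(2011, 6, 1)))  # True -> safe to purge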
What’s your view of Big Data?
We can help companies get rid of data that there’s no business reason to keep. We can help to make Big Data small.
The growth in data shows no sign of letting up. But at some point the relational database model breaks down, because it costs a fortune. There’s a better place to put that low-density but high-volume data: we can move it to disk with the cost characteristics of tape, but it’s still online or nearline when you want to query it.
The storage guys have never understood the application layer. They understand what a block or a LUN is, but not what a purchase order looks like. We can understand the data in a transactionally astute way – for example, a purchase order that’s older than six months may be OK to archive based on its metadata. We also do entity inference – inferring entity definitions from the data itself – and make suggestions on that basis, which helps to accelerate the process.
In what sort of cases might that help?
For example, a typical Oracle E-Business application may have hundreds or even thousands of disconnected tables. That’s an example of where we can go in and infer entity definitions, and because we’ve done that for customers we can use what we know to help other customers. We have that for Amdocs, SAP, Oracle, PeopleSoft and so on.
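Entity inference of the kind he describes can be pictured as a containment test: if every value in one table’s column also appears in another table’s key column, the two are probably related. A rough Python sketch, with made-up table layouts (not Informatica’s actual algorithm):

    # Guess parent/child links between tables by value containment:
    # if all values of a child column appear in a parent's key column,
    # suggest a relationship. Table and column names are invented.
    def column_values(rows, column):
        return {row[column] for row in rows}

    def infer_links(tables):
        """tables maps name -> (rows, key_column); returns link guesses."""
        suggestions = []
        for parent, (parent_rows, key) in tables.items():
            key_values = column_values(parent_rows, key)
            for child, (child_rows, _) in tables.items():
                if child == parent or not child_rows:
                    continue
                for column in child_rows[0]:
                    values = column_values(child_rows, column)
                    if values and values <= key_values:
                        suggestions.append((child, column, parent))
        return suggestions

    headers = [{"po_id": 1}, {"po_id": 2}]
    lines = [{"line_id": 10, "po_id": 1}, {"line_id": 11, "po_id": 2}]
    print(infer_links({"po_headers": (headers, "po_id"),
                       "po_lines": (lines, "line_id")}))
    # [('po_lines', 'po_id', 'po_headers')] -- a suggested relationship

A real implementation would sample values and score confidence rather than demand strict containment, but the containment test conveys the idea.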
We can also run simulations – for example, asking why some purchase orders are not eligible to be archived. It may be that they have no matching invoice, in which case they can be flagged for deletion if they are old enough.
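The simulation he sketches is essentially a three-way classification of each record. A small illustrative version in Python (the six-month rule and field names are assumptions taken from his examples, not product behaviour):

    # Classify purchase orders: archive if old enough AND invoiced,
    # flag for deletion review if old but uninvoiced, otherwise keep.
    from datetime import date, timedelta

    RETENTION = timedelta(days=180)  # "older than six months"

    def simulate(purchase_orders, invoiced_po_ids, today):
        archive, flag, keep = [], [], []
        for po in purchase_orders:
            old_enough = today - po["created"] > RETENTION
            if old_enough and po["po_id"] in invoiced_po_ids:
                archive.append(po["po_id"])
            elif old_enough:
                flag.append(po["po_id"])   # no matching invoice
            else:
                keep.append(po["po_id"])   # still active in production
        return archive, flag, keep

    pos = [{"po_id": 1, "created": date(2011, 1, 5)},
           {"po_id": 2, "created": date(2011, 2, 9)}]
    print(simulate(pos, invoiced_po_ids={1}, today=date(2011, 9, 1)))
    # ([1], [2], []) -- PO 1 can be archived, PO 2 is flagged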
A lot of what you are talking about seems to refer to production data. What about non-production, for example dev and test?
It’s a valid question, because for everything companies have in production they typically have 8 to 12 copies in non-production. That’s not only very expensive, it’s a huge security risk. The non-production environment is like the Wild West. The biggest risk of a data breach is not people coming through the firewall – 58% of breaches come down to negligence or malice. The threat has shifted. We’ve been doing this for several years, but there’s been an explosion of interest in the last six months.
So what’s changed?
There are a lot of privileged users in any organisation; companies are more aware of the dangers, and the regulators are stronger and more vocal. The fines are huge and the reputational damage is massive. We can help a company ‘skinny down’ production copies for non-production use, then obfuscate that data so that sensitive information is not at risk. That’s our Test Data Management and Informatica Data Masking products.
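‘Skinnying down’ a production copy amounts to sampling parent rows and pulling only their related children, so the subset stays referentially intact. A toy Python sketch under those assumptions (not Informatica’s Test Data Management API):

    # Subset production-like data: sample some customers, then keep only
    # the orders that reference them, preserving referential integrity.
    import random

    def subset(customers, orders, sample_size, seed=42):
        random.seed(seed)  # repeatable test data sets
        picked = random.sample(customers, sample_size)
        ids = {c["customer_id"] for c in picked}
        related = [o for o in orders if o["customer_id"] in ids]
        return picked, related

    customers = [{"customer_id": i} for i in range(1000)]
    orders = [{"order_id": i, "customer_id": i % 1000} for i in range(5000)]
    small_customers, small_orders = subset(customers, orders, sample_size=10)
    print(len(small_customers), len(small_orders))  # 10 customers, 50 orders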
How does this compare to standard encryption?
We complement encryption. Privileged users have the right to see and manipulate data because they are building and testing new applications – you can’t just encrypt it all. You still want birthdays to look like birthdays, so we use a number of masking techniques that keep the data realistic but make it practically worthless if it gets spilled.
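One way to make ‘birthdays look like birthdays’ is deterministic format-preserving substitution: hash the real value with a secret and map it onto a plausible date, so the same input always masks to the same output across tables. A sketch of that one technique in Python (the approach and parameters are assumptions, not a description of Informatica’s masking algorithms):

    # Replace a real birthday with a stable pseudo-random but plausible
    # date of birth; deterministic per input so joins still line up.
    import hashlib
    from datetime import date, timedelta

    def mask_birthday(birthday, secret=b"per-project-secret"):
        digest = hashlib.sha256(secret + birthday.isoformat().encode()).digest()
        offset = int.from_bytes(digest[:4], "big")
        start = date(1940, 1, 1)                     # plausible DOB range
        span = (date(2000, 12, 31) - start).days
        return start + timedelta(days=offset % span)

    print(mask_birthday(date(1975, 6, 14)))  # same input -> same masked date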
We’ve seen take-up of this technology in things like clinical trials, too – you need people to be able to manipulate the data but there are pieces of personally identifiable information that nobody needs to see, and we can obfuscate that.
And is the data masking piece used in production environments too?
Yes, we can do this kind of masking for production as well as non-production environments. We had a telco that was having to fire five or six operators a month because they were caught trawling celebrity phone data – obviously extremely sensitive given recent headlines. We obfuscate that data dynamically, so that depending on the user’s authority they see only limited pieces of it. It’s all about reducing the risk. The potential fines and reputational damage mean that this is an area where projects are getting funded and accelerated.
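Dynamic masking leaves the stored value untouched and varies what each user sees by their authority. A minimal sketch of the idea in Python (the roles and formats are invented for illustration, not Informatica’s policy model):

    # Role-based view of a phone number: full, partial, or fully masked.
    def view_phone(number, role):
        if role == "fraud_investigator":
            return number                        # full access
        if role == "call_centre":
            return "***-***-" + number[-4:]      # last four digits only
        return "*" * len(number)                 # everyone else: nothing

    for role in ("fraud_investigator", "call_centre", "marketing"):
        print(role, "->", view_phone("020-555-0123", role))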
Are there any new capabilities in the pipeline?
We’ll bring out a new version of our Test Data Management product soon. Without giving too much away, one of the features of that upgrade will be broader database support for Dynamic Data Masking. Today we support Oracle, SQL Server and DB2; we’ll be adding Teradata and, later on, mainframe environments like VSAM, IMS and so on. There’s a lot of data in the typical organisation that could do with some ILM.