MapR’s converged data platform is being future-proofed with enhancements that focus on security, data governance and performance.
In recent months the importance of data governance and security has been in the spotlight as agreements have been reached on Safe Harbour/Privacy Shield and the EU’s General Data Protection Regulation.
These upcoming regulations place a much greater emphasis on businesses having strict control over their data, and this is where MapR’s updates come into play.
File and stream access control expressions (ACEs) have been added to simplify the granting of permissions to users and groups across both data files and directories using Boolean expressions. This is designed to make security administration more scalable and manageable, the company said.
The problem being addressed arises on big data platforms, where convergence means that many different applications and data sets are available together, making granular access control difficult.
To counter this, MapR’s ACEs aim to provide more than a simple group or role definition. Such definitions can be difficult to set up, and they make it challenging to view the access control list and understand who has access and who doesn’t.
The introduction of Boolean expressions, which are expressions that evaluate to a value of either true or false, aims to improve this system.
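As a rough illustration of how a Boolean access expression can be evaluated, the sketch below parses a small expression language and checks it against a user and that user's groups. The `u:` and `g:` prefixes, the `&`, `|` and `!` operators, and the `evaluate` function are assumptions made for this example; this is not MapR's implementation, and the real ACE syntax may differ.

```python
import re

# Tokens: principals such as u:alice or g:finance, plus the operators
# & (and), | (or), ! (not) and parentheses. This syntax is illustrative.
TOKEN = re.compile(r"\s*([ug]:\w+|[&|!()])")

def tokenize(expr):
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def evaluate(expr, user, groups):
    """Return True if the user (with the given groups) satisfies expr."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or():          # lowest precedence
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():         # highest precedence
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in ("&", "|", "!", ")"):
            raise ValueError(f"unexpected token {tok}")
        kind, name = tok.split(":", 1)
        return name == user if kind == "u" else name in groups

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result

# Anyone in the finance group may read, except the user "mallory".
rule = "g:finance & !u:mallory"
print(evaluate(rule, "alice", {"finance"}))    # True
print(evaluate(rule, "mallory", {"finance"}))  # False
```

A single expression like this replaces several separate allow and deny entries, which is the kind of simplification the company describes.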
Jack Norris, SVP, Data and Applications, MapR, told CBR: "We applied it not only to the data but the whole volume. MapR has logical volumes to separate out policies and access that can be over a series of directories." These logical volumes can be used to protect sensitive information, such as personal financial information.
In multi-tenancy systems, organisations typically set up a separate volume for each customer and set an access control expression at that level. Then, no matter what people do underneath that, there is a second line of defence.
Essentially, this ensures that someone mistyping a name does not accidentally give access to people who shouldn’t have it.
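The "second line of defence" idea can be sketched as a check that must pass at both the volume level and the file level. The tenant, the user names and the `can_access` helper below are hypothetical, and the sketch assumes the two checks are simply combined with a logical AND.

```python
def can_access(user, volume_allows, file_allows):
    """Grant access only when both the volume rule and the file rule pass."""
    return volume_allows(user) and file_allows(user)

# Hypothetical tenant volume: only this customer's staff may enter it.
acme_staff = {"alice", "bob"}
volume_rule = lambda user: user in acme_staff

# A file-level rule that mistakenly names "mallory", a user from another tenant.
file_rule = lambda user: user in {"alice", "mallory"}

print(can_access("alice", volume_rule, file_rule))    # True
print(can_access("mallory", volume_rule, file_rule))  # False: blocked at the volume
```

Even though the file-level rule contains the mistake, the volume-level rule stops the request.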
In addition to access control, the company has audit capabilities that are designed to provide a granular view of every operation with respect to the cluster. Norris said that data reads and writes are logged into the audit system in a JSON file.
Because the logs are in JSON format, Apache Drill can be used to query the files directly for self-service analysis. No fixed reports are included, so users can extend the analysis themselves and apply machine learning constructs that look for anomalies.
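To give a sense of the kind of self-service analysis JSON audit logs allow, here is a minimal sketch that counts reads per user and flags unusually heavy readers. The field names (`user`, `operation`, `path`), the sample records and the threshold are all assumptions for illustration and do not reflect MapR's actual audit schema. The article mentions Apache Drill for querying; plain Python is used here only to keep the example self-contained.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line. The field names
# are assumptions for illustration, not MapR's actual audit schema.
SAMPLE_LOG = """\
{"user": "alice", "operation": "READ", "path": "/finance/q1.csv"}
{"user": "alice", "operation": "WRITE", "path": "/finance/q1.csv"}
{"user": "bob", "operation": "READ", "path": "/finance/q1.csv"}
{"user": "bob", "operation": "READ", "path": "/finance/q2.csv"}
{"user": "bob", "operation": "READ", "path": "/finance/q3.csv"}
{"user": "bob", "operation": "READ", "path": "/hr/salaries.csv"}
"""

def reads_per_user(log_text):
    """Count READ operations per user across all log lines."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["operation"] == "READ":
            counts[record["user"]] += 1
    return counts

def flag_heavy_readers(counts, threshold):
    """Return users whose read count exceeds the threshold."""
    return sorted(user for user, n in counts.items() if n > threshold)

counts = reads_per_user(SAMPLE_LOG)
print(flag_heavy_readers(counts, threshold=3))  # ['bob']
```

A fixed threshold is the simplest possible anomaly check; the same per-user counts could instead be fed into a statistical or machine learning model.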
This ties in with the upcoming regulations. Norris said: "I think it goes hand in hand with our view that this is a core platform for business operations. Once you focus on that, then this high level of security and auditing are all part of that."
In addition to the focus on data security and governance, the company has been highlighting some performance improvements. According to research firm ESG, MapR Streams benchmark testing produced performance of 18,000,000 messages per second with over 3.5GB/s throughput.
Improvements around Solid State Drives have been made with the optimisation of I/O and parallelisation for SSD.
The final part of the rolling upgrade comes with Docker and containers in mind. The issue being tackled is that lines are being drawn through operations, placing transient applications that don’t require state on one side and, on the other, stateful apps that require a connection to persistent storage.
A stateful app is one that stores information about what has happened or changed since it started running. Stateless apps don’t expose any of that information: they give the same response to the same request, function or method call every time. HTTP, for example, is stateless in its raw form.
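The distinction can be shown in a few lines. In this sketch the function and class names are purely illustrative: `to_upper` is stateless because its result depends only on its input, while `VisitCounter` is stateful because each call changes what the next call returns.

```python
def to_upper(text):
    """Stateless: the result depends only on the argument."""
    return text.upper()

class VisitCounter:
    """Stateful: each call changes what the next call returns."""

    def __init__(self):
        self.visits = 0

    def record_visit(self):
        self.visits += 1
        return self.visits

counter = VisitCounter()
print(to_upper("hello"), to_upper("hello"))            # HELLO HELLO
print(counter.record_visit(), counter.record_visit())  # 1 2
```

If the process holding `counter` is restarted, the count is lost unless it has been written somewhere durable, which is why stateful apps in containers need a connection to persistent storage.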
MapR has made the converged data platform capable of providing the underlying layer to support stateful apps by allowing it to be mounted on any physical server in the data centre. Through those mounts, an application can access any data across the cluster.
The benefit is that as containers move and sit on different physical infrastructure, they will have access to the data anywhere in the data centre. According to Norris, this "opens up what’s possible."
The CDP is already available and cloud-based deployments will be available this month in AWS Marketplace, Azure Marketplace and CenturyLink Marketplace.