The update integrates the system with Hitachi’s Hi-Command management software and flagship Universal Storage Platform disk array, and adds replication, encryption and de-duplication functions.
HDS began OEM’ing Archivas’ CAS system last summer, and by the time it bought Archivas outright for a rumored $110m in February 2007, it had scored only around four customers for the re-branded product.
That added to about a dozen customers that Archivas had won independently of that OEM deal, in around two years on the market.
According to HDS, one reason for the slow sales is that Archivas was only selling software to be installed on multiple Linux servers front-ending third-party disk. Another is that customers were suffering the usual cold feet about buying a high-end technology from a start-up, although HDS does not explain why the HDS re-branding did not dispel those nerves.
The player to beat in the CAS market is EMC, which claims to have sold more than 100PB capacity of its Centera disk archive to over 3,000 customers since that device was launched five years ago.
Hitachi’s announcement of its Content Services Archive Platform version 2.0 made more than one veiled reference to the Centera. Like other suppliers, HDS took a swipe at what is perceived to be Centera’s biggest weakness, which is its throughput.
The Hitachi system comprises up to 80 Linux servers or nodes running the CAP software, and according to Hitachi delivers up to 470% greater throughput than first-generation CAS solutions.
Until now, Hitachi has been selling the CAP as a front end to its mid-range ATA-capable WMS-100 and AMS-100 arrays. It promised last year that it would put the system in front of its giant USP box, and now it has delivered on that promise, or at least it will when the CAP 2.0 begins shipping next month.
The USP can virtualize or aggregate the disk capacity of third-party disk arrays, and HDS said that the CAP will be able to access that third-party capacity up to 20PB across 80 nodes, or up to 400m objects per node. Hitachi also pointed to the separation of front-end processing nodes and back-end storage in its system, which allows customers to strike whatever ratio of nodes to disk they want.
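To put those limits in perspective, the short calculation below works out what they imply per node. It is only back-of-the-envelope arithmetic on the figures HDS quoted: it assumes decimal units (1PB = 10^15 bytes) and that the capacity and object ceilings could both be reached at once, neither of which HDS has stated.

```python
# Back-of-the-envelope arithmetic on the quoted CAP 2.0 limits.
# Assumptions (not from HDS): decimal units, and both ceilings reached together.
PB = 10**15
TB = 10**12

max_capacity_bytes = 20 * PB        # "up to 20PB"
max_nodes = 80                      # "across 80 nodes"
objects_per_node = 400_000_000      # "up to 400m objects per node"

# Capacity each node would front if 20PB were spread evenly over 80 nodes
capacity_per_node_tb = max_capacity_bytes // max_nodes // TB

# Object count for a fully populated 80-node system
total_objects = max_nodes * objects_per_node

# Average object size if both limits were hit simultaneously
avg_object_bytes = max_capacity_bytes // total_objects

print(capacity_per_node_tb, "TB per node")
print(total_objects, "objects in total")
print(avg_object_bytes, "bytes per object on average")
```

On those assumptions, a maxed-out system works out to 250TB behind each node and 32 billion objects, with an average object size of 625KB.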
The next most frequently targeted weakness of the Centera is the fact that it presents a proprietary API to applications, unlike the CAP and other systems, which offer CIFS, NFS, HTTP and WebDAV interfaces. But CIFS, NFS and HTTP interfaces were added to the Centera about a year after it was launched, and over 240 archiving software vendors support the Centera, compared to around 16 for the CAP. Not only that, but the CAP also sports its own proprietary API.
Hitachi argues that it is far easier to use a CIFS, NFS or HTTP interface to reach meta-data on the CAP than on the Centera. “We had a Korean developer come to visit us. He integrated his application with our system on the 18-hour flight back to Korea,” said HDS marketing director Michael Hay. And unlike the Centera API, the CAP API provides access to a search engine, HDS said.
CAS systems that take digital fingerprints or hashes of files are intrinsically de-duplicating data. But the SHA-256 and other hashing algorithms used in CAS boxes are theoretically capable of creating identical fingerprints from different files, in what is known as a hash collision. That could lead to the inadvertent deletion of what was actually unique and not duplicate data, but Hitachi’s CAP 2.0 now performs a complete bit-by-bit comparison of files before any supposedly duplicate files are discarded.
How many of Hitachi’s customers have ever experienced a hash collision in the wild, rather than just read about laboratory-induced collisions? “It’s very unlikely to occur,” Hay admitted. “But we wanted to eliminate the risk entirely, and not have to deal with the FUD,” he said.
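The idea of checking the actual bytes before trusting a matching hash can be sketched in a few lines of Python. This is a toy illustration and not Hitachi’s implementation: the `DedupStore` class, its methods and the injectable `hash_fn` parameter are all hypothetical names invented here, with the last included only so that a collision can be simulated.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self, hash_fn=None):
        # hash_fn can be swapped out so that a hash collision can be simulated
        self._hash_fn = hash_fn or (lambda data: hashlib.sha256(data).hexdigest())
        # digest -> list of distinct payloads that share that digest
        self._objects = {}

    def put(self, data: bytes):
        """Store data; return (key, was_duplicate)."""
        digest = self._hash_fn(data)
        bucket = self._objects.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            # A matching digest is not trusted on its own:
            # the full contents are compared byte for byte.
            if existing == data:
                return (digest, index), True
        # Same digest but different bytes (a collision), or a new digest:
        # keep the data instead of discarding it as a duplicate.
        bucket.append(data)
        return (digest, len(bucket) - 1), False

    def get(self, key):
        digest, index = key
        return self._objects[digest][index]
```

With the default SHA-256 fingerprint, storing the same bytes twice returns the same key and flags the second write as a duplicate. If `hash_fn` is replaced with a deliberately weak function that maps everything to one value, two different files still get separate keys, because the byte comparison fails and nothing is thrown away.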
The encryption is completed using a key that is split up and stored in parts across multiple nodes. That means that if part of a system were to end up in the wrong hands, hackers or would-be data thieves would still not have access to the complete key needed to decrypt the data.
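Hitachi has not said how the key is divided, but one simple way to get the property described is XOR-based splitting, in which every share is needed to rebuild the key and any incomplete set of shares reveals nothing about it. The sketch below assumes that scheme purely for illustration; the function names `split_key` and `combine_shares` are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, parts: int) -> list:
    """Split key into `parts` shares, all of which are needed to rebuild it."""
    if parts < 2:
        raise ValueError("need at least two shares")
    # The first parts-1 shares are pure random data of the same length as the key
    shares = [secrets.token_bytes(len(key)) for _ in range(parts - 1)]
    # The final share is the key XORed with every random share
    shares.append(reduce(xor_bytes, shares, key))
    return shares


def combine_shares(shares: list) -> bytes:
    """XOR all shares together; the random shares cancel out, leaving the key."""
    return reduce(xor_bytes, shares)


# Example: a 256-bit key split into one share per node on a four-node system
key = secrets.token_bytes(32)
shares = split_key(key, 4)
recovered = combine_shares(shares)
```

A scheme like this requires every node’s share to be available. A production system would more plausibly use a threshold scheme that tolerates the loss of some nodes, but the article gives no detail either way.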
The object-level, IP-based replication is from CAP to CAP, and is entirely separate to any replication functions in Hitachi’s USP array.
An entry-level 5TB CAP system will carry a list price of around $70,000.