Many organizations now recognize that they need to refresh their technology hardware every few years to ensure that the hardware their archives are stored on remains current. However, little thought has been given to the file format of the information itself.

Enterprise content management (ECM) vendors have started to consider the problem, and often offer organizations the option of converting stored information to XML, which is seen as a de facto standard and a format that will survive into the future. The drawback is that XML comes in many dialects, some of them industry-specific, but conversion to XML is at least a move in the right direction.

New file formats are emerging that are designed for the long-term retention of information, including PDF/A and the Microsoft Office XML file format. Various bodies are also examining the issue of long-term retention. In the meantime, organizations that worry that changing the format of their information could make them non-compliant could take a copy of their archived information and convert it to a format designed for long-term retention. This copy could then be migrated each time a new format emerges. There would then always be a readable version of the information, while the original remains intact and unchanged, even if it eventually becomes unreadable.

It is not just unstructured information that needs to be retained for long periods. Transactional, structured data stored in relational databases must also be retained. At present, when a new version of a database is released that changes the format of the data, existing data has to be migrated to the new format. As the amount of structured data held in archives grows, however, this will become a much more onerous task.

One problem for organizations, and to a certain extent for vendors, is that it is difficult to focus on long-term issues when individuals typically stay in a job for a relatively short time. Yet employees starting work now could still be with the same company in 40 years' time, which is long enough for several generations of file formats to be developed and become obsolete.

Vendors should take the lead in developing file formats for the long-term retention of data, and organizations should push them to work together on long-term formats that do not depend on any particular application to read them. Organizations also need to start thinking about how they can 'future-proof' their retained information against application and file format obsolescence. When information is requested in 100 years' time, it will be the organization, not the vendors, that is held accountable if the information cannot be accessed, and that responsibility may well fall to our grandchildren or great-grandchildren.

Source: OpinionWire by Butler Group (www.butlergroup.com)