
OpenXML and the British Library - Part 3

The British Library is one of the UK's copyright (legal deposit) libraries; the others include the Cambridge and Oxford University Libraries and the national libraries of Scotland and Wales. UK publishers are therefore obliged to send it a (free) copy of every book they publish. It has amassed a vast collection of material, some of it stored in its main building next to St Pancras station in London, some of it elsewhere.

A hundred years ago, all its material was paper-based, but over the last few decades it has acquired an increasing proportion of digital material. Paper slowly turns brown and decays, particularly if it contains too much acid (as cheap paperbacks usually do), but the problems of preserving paper-based information and making it available to researchers are well understood.

When floppy disks and other magnetic media first came in, people did not give much thought to their shelf-life. That life turned out to be shorter than expected but, even more important, there has been rapid technical change. So even a perfectly preserved disk may be hard to deal with, because (1) it was designed for drives which are no longer made, and (2) the data is written in a file format which is no longer used.

So how is a heritage of old disks to be preserved? Opinions seem to differ. Problem (1) is relatively easy to deal with. The important thing to preserve is the digital data, the 0s and 1s, not the disk. A medieval manuscript may be a thing of unique beauty, but one 5 1/4 inch floppy is much the same as another. So step 1 is to extract the data from the old disk and store it, as is, on a more modern device. These days terabyte drives are cheap, so such storage is not a significant cost.
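For readers who want to see what "store it, as is" might look like in practice, here is a minimal Python sketch. It assumes the disk has already been read into an image file by some imaging tool, which is not shown. The function names, the archive layout and the choice of SHA-256 are my own illustration, not anything the British Library is known to use.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_image(source, archive_dir, chunk_size=1024 * 1024):
    """Copy a disk image byte for byte into archive_dir and verify the copy.

    Hypothetical helper: 'source' is an image file already extracted
    from the old disk; nothing in it is interpreted or converted.
    """
    source = Path(source)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name

    # Hash the bytes while copying them, so the digest describes the source.
    digest = hashlib.sha256()
    with open(source, "rb") as src, open(target, "wb") as dst:
        for chunk in iter(lambda: src.read(chunk_size), b""):
            digest.update(chunk)
            dst.write(chunk)
    expected = digest.hexdigest()

    # Re-read the copy from the new device and compare digests.
    if sha256_of(target, chunk_size) != expected:
        raise IOError(f"copy of {source} does not match the original")

    # Keep the digest next to the copy so it can be re-checked later.
    checksum_file = archive_dir / (source.name + ".sha256")
    checksum_file.write_text(f"{expected}  {source.name}\n")
    return target, expected
```

The stored digest lets a future archivist confirm that the 0s and 1s on the new drive are still the ones that came off the floppy, whatever later happens to converted copies.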

The more interesting problem is (2). The Adam Farquhar approach is apparently to convert the file (if it happens to be in some old MS file format) into OpenXML and then to preserve that. Well, maybe that has some merit. An alternative approach would be to convert it to a .pdf file. Clearly you lose something that way. You lose the ability to edit the file, but editing is presumably the last thing you want to do. More important, macros will stop working and you may lose “meta-data” showing such things as the change-history of the file.

Any serious historian or archivist would immediately and instinctively insist that whatever else you did, you must preserve the original digital file (on your modern terabyte drives). Future generations may want to examine bits and bytes that currently do not interest us and are lost in the conversion process.

But, leaving that aside, is the OpenXML version likely to be significantly more useful to the typical researcher than the .pdf version?

That is an entirely different question from whether a typical business is likely to find an OpenXML conversion or an ODF conversion more useful for its legacy files. But the two questions do have one thing in common: I have seen absolutely no evidence on either. Nor, apparently, has Adam Farquhar.
