Whilst fears of the millennium bug largely proved unfounded, they did at least help to raise awareness of the fragility of digital data. Once upon a time the paper archive of site notebooks, context records and plan and section drawings would be boxed up and put on a museum shelf. Should some future researcher need to consult them, then short of fire or flood, those records would still be available in 50 or 100 years: a little dustier, but still legible. From the late 1970s onwards a growing proportion of these records were converted into a digital format: text files, databases, CAD files, and so on. Some types of data, such as geophysical surveys, were digital from the outset; by 2001 many field-workers were collecting other forms of data straight into a digital format, including context descriptions, site photographs, and layer and feature coordinates. Moreover, whilst some data can be printed onto paper or microfilm as a security copy, much now depends upon its digital form for its very meaning. The functionality of a GIS, or a 3-D virtual model, cannot be replicated in hard copy.
Such digital data cannot simply be left on the shelf if there is to be any hope of ever being able to use them again. Digital data require active curation. The computer diskette is particularly vulnerable to changes in temperature, dust and magnetism. It is also a specific storage medium that requires a particular hardware device to read it. The CD-ROM may provide a more durable means of preserving a particular sequence of binary digits but, contrary to popular belief, once the drive has been rendered redundant by the next upgrade in storage technology it will be no more secure than a 5¼ inch or an 8 inch floppy disc, a punched card, or even paper tape. Furthermore, the data on it may require a specific version of a software program to extract them, and the data held within the application may require knowledge of specific codes to comprehend them. Unless care is taken over each of these four elements (medium, hardware, software and documentation), it is likely that existing digital data will be useless within five years or less.
The BBC Domesday Project of the mid-1980s is a particularly good example of the challenges of digital preservation. In 1086 William the Conqueror completed the first national survey of the English countryside. The Domesday Book, as it became known, was recorded in ink on vellum and still survives in usable form. In 1986 BBC Domesday was launched to celebrate the 900th anniversary of the original Domesday Book, with the idea of capturing a massive range of information on the social, environmental, cultural and economic make-up of the UK. The survey was recorded on two 12-inch video discs which could be viewed using a special BBC Microcomputer. Problems of hardware and software dependence have now rendered the system obsolete. While the video discs are likely to remain in good condition for many years to come, the 1980s computers which read them and the BBC Micro software which interprets the digital data have a finite lifetime. With few working examples left, the information in this remarkable historical resource will soon disappear forever unless action is taken.
There are a number of strategies for digital preservation (Beagrie and Jones 2000; Hendley 1998; Ross 2000; Russell 2000; TFDI 1995). The three main ones are hardware preservation, hardware emulation, and migration. Hardware preservation requires the maintenance of original hardware in order to keep software applications running on it. Clearly this can be expensive and requires a high level of technological expertise. As time goes on, costs increase as more and more antiquated machines have to be preserved, and it is really only a solution of last resort. Hardware emulation also tries to keep old versions of software applications in running order, but it does this by emulating old operating systems on new computers. Of course, as operating systems constantly develop it becomes necessary to have emulations running within emulations, and the whole business can become extremely complex. The Archaeology Data Service therefore favours the third strategy of migration. This approach relies upon the assumption that it is the information content, rather than the look and feel of a particular application, that is important. In the case of archaeological data we feel this is justifiable, although it may not be appropriate for all disciplines. Where possible, data are converted to open file formats, such as comma-delimited ASCII files. Other data types, such as CAD, are converted to standard exchange formats, such as DXF files, and will require migration to new versions as formats develop. This strategy requires the greatest investment of labour at the point of deposition, and the expectation is that the bulk copying of files to new versions will be relatively easy to automate, although a sampling strategy for the validation of files will be essential. As part of migration it is also necessary to ensure regular backup and refreshment of the physical storage media. The Arts and Humanities Data Service is also developing a central facility for the offline deep storage of large quantities of digital data in preservation formats.
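By way of illustration, the bulk copying and sampled validation described above might be automated along the following lines. This is a minimal sketch rather than the ADS workflow itself: the directory names, the 5% sample rate and the choice of SHA-256 as a fixity check are all assumptions made for the example.

```python
"""Sketch of a media-refreshment step: bulk copy plus sampled validation.

All paths and parameters here are hypothetical, chosen for illustration.
"""
import hashlib
import random
import shutil
from pathlib import Path

SOURCE = Path("archive/old_media")   # hypothetical source directory
TARGET = Path("archive/new_media")   # hypothetical refreshed copy
SAMPLE_RATE = 0.05                   # fully verify a 5% sample of files


def sha256(path: Path) -> str:
    """Fixity value used to confirm a copy is bit-identical to its source."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


copied = []
for src in SOURCE.rglob("*"):
    if src.is_file():
        dst = TARGET / src.relative_to(SOURCE)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)       # copy2 preserves timestamps
        copied.append((src, dst))

# Every file gets a cheap size comparison; a random sample also gets a
# full checksum comparison, mirroring a sampling strategy for validation.
for src, dst in copied:
    assert src.stat().st_size == dst.stat().st_size, f"size mismatch: {src}"
    if random.random() < SAMPLE_RATE:
        assert sha256(src) == sha256(dst), f"checksum mismatch: {src}"

print(f"refreshed {len(copied)} files; sampled validation passed")
```

In practice a checksum recorded for every file at deposition, and re-verified at each refreshment, gives a stronger guarantee; the sample here simply keeps the example short.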
A user survey conducted in 1998 revealed a very low level of awareness of good digital archiving practice within archaeology (Condron et al. 1999, 33-9). Many organisations were holding digital data, but 47% had not adopted any means of protecting the physical media (Condron et al. 1999, fig. 6.11). The ADS gained first-hand experience in data archaeology through work on the Newham Museum Archive (Kilbride 2000; Austin et al. 2001). When the Newham Museum Archaeological Service was closed, the digital data collected over the previous ten years were hurriedly dispatched to the ADS, where they were catalogued and accessioned. The archive arrived on 220 floppy discs containing some 6432 individual files. About 5% of the total was already corrupt by the time it arrived in York. Of the remainder, 1500 files contained site reports, or elements of site reports. There were well over 700 database files and 1200 geophysics files. Each of these had to be recorded in turn and converted from its original proprietary format into formats recommended for long-term preservation. Some 900 files were held in formats that could not be identified and thus remain unreadable. However, the main problems were caused not by degradation of media or obsolete file formats, but by inadequate documentation. Thus, there are various catalogues of small finds that, although consistent and apparently correct, contain no indication of which excavation they relate to. Given that over 150 separate excavations are represented, this renders them more or less useless. In another case, a large cemetery had been recorded in great detail. Each bone had been recorded with a descriptive code, but there was no means of expanding the codes, so the thousands of records generated are worthless. From a cemetery with several hundred burials, only one patella can be identified with any certainty, surviving because it was referred to in a free-text field.
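The cemetery problem is easy to demonstrate. In the sketch below the burial records, the bone codes and the codebook are all invented for illustration; the point is that the first file is meaningless without the second, so the codebook must be deposited as part of the archive.

```python
import csv
import io

# Records as they might appear in an undocumented catalogue: each bone
# is described only by a code (invented here for the example).
records = io.StringIO(
    "burial_id,bone_code\n"
    "B001,PAT\n"
    "B001,FEM\n"
    "B002,TIB\n"
)

# The codebook is the documentation that was missing at Newham. Without
# this mapping the codes above cannot be expanded.
codebook = {"PAT": "patella", "FEM": "femur", "TIB": "tibia"}

for row in csv.DictReader(records):
    meaning = codebook.get(row["bone_code"], "UNKNOWN CODE")
    print(row["burial_id"], row["bone_code"], "->", meaning)
```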
Newham is certainly not exceptional, and had the foresighted Newham archaeologists not recognised the value of their data when the Museum closed down, then all of it would have been lost for good. How many similar boxes of floppy discs live on the shelves of museums and contracting units? Strategies for Digital Data suggests there are thousands (Condron et al. 1999, fig. 6.3).
Fortunately, digital preservation has now moved up the agenda of many organisations, and in the UK a Digital Preservation Coalition has been founded. Its members include the British Library, the Consortium of University Research Libraries, the Joint Information Systems Committee of the Higher and Further Education Funding Councils, and the Public Record Office. The Coalition recognises that digital preservation presents shared problems, shared solutions and economies of scale. In the archaeological sector, the Strategies for Digital Data report recognised that it will also be necessary to develop a system of designation for approved digital archives, comparable to that which exists for registered museums (Condron et al. 1999, Recommendation 9, 81-2). The idea of the registration of archives has been taken further by the Research Libraries Group and the Online Computer Library Center in a report entitled Trusted Digital Repositories: Attributes and Responsibilities. The report also recommends adoption of the reference model for an Open Archival Information System (OAIS), a common framework for describing and comparing the architectures and operations of digital archives. Compliance with this model is a defining attribute of a trusted digital repository.
© Internet Archaeology
URL: http://intarch.ac.uk/journal/issue15/7/jr4.html
Last updated: Wed 28 Jan 2004