Killing the Duck to Keep the Quack

(Using the World Wide Web for digital preservation)

After my father's death in 1990 I found amongst his papers several hundred faded photographs of Central Australian Aborigines taken during a camel expedition in 1933. Because they illustrated an important story and because they may be the only existing record of some of these people I felt a responsibility to send them into the future.

The images are now in a digital form along with my father's journals, artefacts and related material from the expedition. Together, they form the core of a large and evolving archival site on the World Wide Web entitled The Flight of Ducks. The problems I have encountered in seeking to preserve this digital collection could apply to any collection of digital material. It is ironic that as the digital era approaches, ephemeral standards and technological obsolescence not only make the preservation of digital material difficult but also threaten its usefulness.

So, why use such a fluid and unstable medium as the World Wide Web as a repository for digital material? To answer this question it is necessary to understand how this new medium overcomes some of the difficulties peculiar to digital data.

At the same time as the Voyager missions left the earth carrying analog messages into space and a distant future, Landsat was sending back digital information about the earth. The data from the first Landsat satellite is now effectively irretrievable because no working machine can read the tapes used for storage in the 1970s.

At the 'Capturing The Rainbow' conference on digital preservation in 1995 Maggie Exon from the Australian National Library warned:

'There is a very real possibility that nothing created, stored and disseminated electronically will survive in the long term. The problem does need to be stated this dramatically. I have an unfailing sinking feeling whenever anybody links the concepts of digitisation and preservation. I have a profound and unchanging disbelief that these two concepts belong in any sense in the same world.'

Lost In Transit

In spite of the many confirmations of Marshall McLuhan's alliterative slogan, 'the medium is the massage', the process of digitisation reminds us that information can usually be separated from the medium which carries it. This loosening of the bonds imposed by the medium allows data in almost any form (text, sound, image) to be reused and recombined with a facility we have yet to learn how to exploit. When significant collections of material in digital form can be dispersed yet accessed from multiple locations in a single session, re-collecting takes on a new meaning with new vulnerabilities. The currency of the term, Cultural Memory, has more to do with an anguish over our collective ability to forget (lose data) than with any confidence in the reliability of recollection.

Because digital material does not wear out, we now find that the traditional balance kept between availability for public access and preservation has been reversed. Access and preservation have become inseparable.

The important issue here is not that we lose data through the deterioration of storage media (CD-ROM, DAT tape, etc.) but that permanent access eludes us because both hardware and software rapidly become obsolete. Every time a word or image processor vendor introduces an improved update with a slightly incompatible file format, more information becomes inaccessible. We have even abnegated archival responsibility to the technology itself ("the computer crashed!").

Devices and processes used to record, store, and retrieve digital information are in a permanent state of transition. CD-ROM, DAT tape, DVD, etc. have an expected life cycle of between 2 and 5 years. The practice known as refreshing digital information by copying it onto new media is particularly vulnerable to problems of backward compatibility and interoperability. Software capable of emulating obsolete systems currently relies on the goodwill and enthusiasm of talented programmers but has little basis in economic reality.
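The refreshing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (sha256_of, refresh, demo) and the file names are hypothetical, and a temporary directory stands in for the old and new media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical."""
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh of {source} failed verification")
    return after


def demo() -> bool:
    """Refresh a small hypothetical file inside a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        old = Path(workdir) / "old_medium.dat"  # hypothetical file name
        new = Path(workdir) / "new_medium.dat"  # hypothetical file name
        old.write_bytes(b"journal entry, 1933")
        return refresh(old, new) == sha256_of(old)


if __name__ == "__main__":
    print(demo())
```

A matching checksum shows only that the bits survived the move. It says nothing about whether any future hardware or software will be able to interpret them, which is the larger problem.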

An even greater vulnerability is that the custodians of data, be they private or institutional, are unlikely to be able to bear the costs and complexities of moving digital information. Either deliberately or inadvertently, through a simple failure to act, they will render the data irretrievable. After 30 years we are just beginning to define the true nature of this problem. Most digital preservation research has concentrated on material which can be represented in print format, primarily because this is, to date, the bulk of all archival holdings. Today (without diminishing the importance of text), relevant and vibrant culture is more likely to reside in material which cannot be represented in print.

Digital Archaeology

Just as the development of the Elizabethan printing press led to variations in text which sparked 400 years of Shakespearian textual scholarship, so the evolution of character coding may well provide digital archaeologists with years of employment as they mine the digital middens of redundant storage media.

At this stage ASCII (American Standard Code for Information Interchange) is the most widely deployed representation of text. Information exchange on the internet relies on it almost exclusively. Behind the screens of the World Wide Web lies ASCII marked up with Hypertext Markup Language (HTML). This markup language is a kind of wrapping paper or metatext. It can provide formatting information, designate hypertext links and assist in search and retrieval. Its advantages are that it can be transmitted across all networks and that it is not only independent of the material it encloses but also of hardware or software. Its simplicity allows people with almost no computer skills to create, format and disseminate digital material in a networked medium.
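To make the idea of markup as wrapping paper concrete, the following minimal sketch uses Python's standard html.parser module to separate the links and the enclosed text of a small page. The page content, the file names in the links and the names MarkupSplitter and split_markup are all hypothetical; they are not taken from the site.

```python
from html.parser import HTMLParser

# A hypothetical page: plain ASCII text wrapped in HTML markup.
PAGE = (
    "<html><body>"
    "<h1>Expedition journal</h1>"
    '<p>See the <a href="photographs.html">photographs</a> '
    'and the <a href="artefacts.html">artefacts</a>.</p>'
    "</body></html>"
)


class MarkupSplitter(HTMLParser):
    """Collect hypertext links and enclosed text separately."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Anchor tags carry the hypertext links in their href attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

    def handle_data(self, data):
        # Everything between tags is the material the markup encloses.
        self.text.append(data)


def split_markup(page: str):
    """Return (links, text) for a page that must be pure ASCII."""
    raw = page.encode("ascii")  # raises UnicodeEncodeError otherwise
    parser = MarkupSplitter()
    parser.feed(raw.decode("ascii"))
    parser.close()
    return parser.links, "".join(parser.text)


links, text = split_markup(PAGE)
```

Here links holds the two link targets and text holds the words with the wrapping removed, which illustrates how the markup stays independent of the material it encloses.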

Proliferating Organisms

As we slide into a networked mediation of knowledge, the borders around collectable information have expanded. To properly understand how this contributes to the evolution of a new medium requires a shift of understanding.

First, it is necessary to abandon ideas about the finished work and to redefine concepts of the published work. Above all, this is a proliferating medium of rapid, if not instant, global dissemination. For the archivist, the notion that we can save a copy of every work published is as absurd as it is to think that each screen might be anything more than an evolving variant of a continuous stream of material.

Second, one might usefully swap the word idea for screen in order to get closer to a description of the dynamic processes involved. Sometimes the screen only exists in virtual form during the time of access. Outside this moment of access, the screen consists only of a set of rules and references to fragments from other sources from which this screen is to be derived. Like thought itself, the medium is inherently unstable, not because the medium is unable to carry and assemble these fragments reliably, but because the fragments themselves may be continuously changing.
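A toy model of such a virtual screen, offered as an assumption-laden sketch rather than a description of how any real site works: the stored screen is only a rule (SCREEN_RULE) that refers to fragments held elsewhere, and the dictionary named fragments is a hypothetical stand-in for those other sources.

```python
from string import Template

# Hypothetical fragment store; on the Web these would live at other sources.
fragments = {
    "caption": "Camel expedition, 1933",
    "comment": "No comments yet.",
}

# What is actually stored: a rule for assembly, not a finished screen.
SCREEN_RULE = Template("$caption | $comment")


def render_screen() -> str:
    """Assemble the screen from whatever the fragments hold right now."""
    return SCREEN_RULE.substitute(fragments)


first = render_screen()

# A fragment changes between two moments of access.
fragments["comment"] = "A visitor identifies the location."
second = render_screen()
```

The rule never changes, yet first and second differ, because the screen exists only at the moment of access. An archived copy of either rendering would be one variant among many.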

Meta Meta Meta

The impact of metadata on this same material by networked providers, archivists, collectors, researchers, commentators and special interest groups is just beginning to be felt. Metadata springs from the self-referential characteristic of networked media; it is, literally, data about data. It might include navigational and discovery aids, access counts, guest lists, data bases, combing screens, emails, chat group references, even essays like this, which itself contains metadata. Most of this referential information, even archival information, is similarly live and constantly changing, as, like the material it describes, it evolves beneath the weight of comment, input and upgrade.

A way to obtain usable metadata for electronic resources is to give authors and information providers a means to describe the resources themselves, without having to undergo the extensive training required to create records conforming to established standards. Problems arise because textual description is difficult without agreed terminology. The World Wide Web has blurred distinctions between image, speech and writing to the extent that it is difficult to separate concise discourse about the Web as a cybernetic system from the system itself.

One need only enter one of the graphical on-line chat sites such as The Palace to see how speech has become entangled with text and image and emotion. The consequences for semiotics are profound, particularly where writing and images are also reorganised graphically through hypertext links. It is no longer possible to define the participant as either reader or writer. Each participant reads an individual path by actively selecting a series of links which, in turn, transform that participant into the writer of a kind of dendritic trace of ideas. In this way hypertextual reading, writing and thinking mirror the cognitive process itself. What is new is the representation of this process, not just because this path can be traced but because it can be recorded.

In theory, every screen on the web could be integrated into a single site or period of access. Or, each site or access could be seen as representing fragments of an evolving idea.

The Flight of Ducks

'The Flight of Ducks' has always been an evolving idea. It is part history, part data-base, part novel, part research diary, part news-group, part museum, part poem, part shed.

Above all, it is a communication between its story and its audience or participants. Here, a collection of digital objects is given meaning, not just because they have historical significance, but because the story is still unfolding. Like oral epic poetry, the site publicly accommodates its own evolution as a living, breathing, proliferating organism.

Storage without access is like trying to keep the quack by killing the duck. I have just moved the site from the RMIT minyos computer into the Victorian State Film Centre's digital library - Cinemedia, where access provisions mean that the site will continue to evolve. The National Library of Australia are looking at ways to capture it as part of a pilot study into the preservation of digital material. 'Capture' includes regular updates (frequency to be determined).

The World Wide Web has evolved as a flexible carrier of digital data across both hardware and software. Its ability to disseminate digital material globally, combined with its inherent flexibility, allows for the accommodation of evolving standards of encoding and markup. Survival of significant material on-line is dependent on use, and use is related to ease of access. These are not technological problems: inexperience and lack of infrastructure are the primary threats to preservation, and both need to be addressed by humans rather than machines.

Simon Pockley is currently completing a PhD at RMIT based on 'The Flight of Ducks'. The site recently won two ATOM awards for the best Australian On-line Production and the Premier's Gold Award for the best Australian Multimedia Product.

Simon welcomes your questions and comments and can be contacted by email at:

