Killing the Duck to Keep the Quack
(networked proliferation and long-term access)
An evolving report (since 1995) on the challenges of maintaining access to the on-line documentary entitled The Flight of Ducks.
For ease of reading, the text is designed to be printed (20 pages)
Updates and relationships to the work it describes are available [on-line]
http://www.duckdigital.net/FOD/FOD0055.html
[Voice] 0418 575 525 [Email] simonpockley@gmail.com
© reserved Simon Pockley
Provenance:

2005-10-03: Re-marked up with updated style sheet
2001-03-24: Same title given to a presentation at the ETD 2001 Forum
1998-03-11: Deleted from RMIT University server at request of RMIT Human Research Ethics Committee
1997-11-01: Text submitted as part of Australia's first on-line doctoral thesis
1997-07-30: Regularly captured by The National Library of Australia's Pandora Archive
1997-04-02: An edited print version published in the RMIT Alumni Magazine
1996-09-05: First captured by Stanford University Conservation On-Line as "Lest we Forget"
Subsections:
  1. Summary
  2. The Fragility of Cultural Memory
  3. Technological Obsolescence
  4. Death in Custody
  5. Describing Digital Content
  6. Markup - digital wrapping paper
  7. A New Medium - frozen moments
  8. Metadata
  9. Evolving an Infrastructure - a case study
  10. A few more words
  11. External references & notes
  12. Electronic conversation
1. Summary:

The ease of distributed access accompanying the digital revolution has a darker side. Technological obsolescence and ephemeral formats have left little firm ground upon which to build the infrastructure necessary for the effective management and preservation of digital resources.

An infrastructure supporting long-term access needs to be able to accommodate the continuous evolution of the technology as well as continuous streams of data. This means that a durable record of such work now includes a flexible and adaptable approach to maintaining live access.

Long-term access is the life-blood of evolving, mutating works such as The Flight of Ducks. Rather than killing such a work by fixing it in closed storage media, this essay argues that the transformative qualities of networked technologies should be seen as providing an archival advantage. These proliferating information spaces, relying on the inherent simplicity and durability of ASCII text, are showing themselves to be capable of carrying the content of ideas, not only across platforms and standards, but perhaps even across to new technologies.

2. The Fragility of Cultural Memory

Concern over the loss of historical awareness accompanying the gradual infiltrations of a global culture has led to the currency of the term cultural memory. My own attempts to rediscover a fragmented family history have made me less confident in the reliability of recall and more confident in our collective ability to forget the origins of our cultural diversity.

In 1977, Carl Sagan led a team at NASA to put together a collection of messages to travel into the future aboard the Voyager spacecraft. Using the technology of the day, these messages were recorded onto 12-inch gold-plated copper phonograph records encased in protective aluminium jackets. The records (encoded in analog form) contained sounds and images from many of the world's vanishing cultures. It was a remarkable and prescient exercise, as much a communication with an endangered present as it was a journey through space and time. It will be forty thousand years before they (Voyager 1 & 2) come within a light year of a star, and millions of years before either reaches any other planetary system. It is quite possible that these artefacts may become the only evidence of our existence.

In the past, the custodians of cultural records had to continually balance the preservation of manuscripts, books, recordings, film and video against their availability for public access. Digital content reverses this balance. In spite of Marshall McLuhan's book-bound insights ('the medium is the massage'), the process of digitisation now allows us to separate content from the medium which carries it. This loosening of the bonds imposed by the carrier medium allows data in almost any form (text, sound, image) to be reused and recombined with a facility we have yet to learn how to exploit.

The rush to digital technology has been so sudden that we have barely had time to work out how we can practically apply the process to records of the past, let alone consider the questions necessary to ensure the durability of the present. Nor have we had time to quietly reflect on the stability and durability of the means of mediation - computers and software. Some of the more thoughtful archivists are sounding alarms.

There is a very real possibility that nothing created, stored and disseminated electronically will survive in the long term. The problem does need to be stated this dramatically. I have an unfailing sinking feeling whenever anybody links the concepts of digitisation and preservation. I have a profound and unchanging belief that these two concepts do not belong in any sense in the same world.

Maggie Exon: Long-Term Management Issues in the Preservation of Electronic Information 2

After my father's death in 1990, I extracted from his belongings a collection of artefacts, journals and several hundred photographs relating to a camel expedition into Central Australia in 1933. Some of the rarest artefacts (stone tjurunga) were missing. The journal was barely legible and many of the surviving negatives, stored in rusty cigarette tins for 57 years, had deteriorated.

The view that there is a responsibility to send this collection into the future, because it is of historical importance, is a cultural value not shared by everyone. Perhaps it is the journey of the son around the father. Perhaps it is simply that this collection is part of my own cultural memory, but I feel a duty to protect it. The collection has significance today because it opens a window into a time and place from where many recollections are now a source of national anguish and pain.

While there are well-developed protective measures useful for prolonging the material life of this collection, most (removal from light etc) restrict accessibility. Digital surrogates of physical material have the advantage of being infinitely replicable without degradation. Common practice, when it comes to archiving, has been to record these digital surrogates onto storage media such as CD-ROM, which soon becomes redundant. The development of the World Wide Web has created the opportunity, not only to replicate content, but also to distribute it widely.

A preservation strategy using networked space, as a 'keeping place', appears to have been overlooked as a viable long-term repository for digital material. Currently, the problem of hardware obsolescence is simply pushed out to a single point, at the server level, where backward compatibility and incremental upgrade is the norm. However, less regulated, even viral, file sharing networks present an evolving medium for the distribution of digital work. In these spaces infrastructures are truly transitional and a lack of the selective processes inherent in custodial control, may provide the flexibility needed to carry digital content into the future.

In the digital environment the links between selection of materials, provision of access to those materials, and preservation of them over time is so inextricably linked that at the National Library we tend to talk increasingly simply of providing short and long term access rather than even making a semantic distinction between preservation and access.

Maggie Jones: Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age 3

In order to demonstrate and investigate this approach, the 1933 photographs, along with my father's journals and Aboriginal artefacts, now provide the core of an evolving and participatory work on the World Wide Web entitled The Flight of Ducks. Here, appropriation is encouraged and the accessible and replicable digital surrogates of a decaying physical collection join another collection of digital objects created around and about them - on-line.

With digital archives there is always the possibility of keeping everything. The tendency towards electronic copia is becoming a characteristic of sites concerned with collections. The primary challenge is to select, not what should be included, but what should be left out. These selection decisions now reach down into the digital fabric of the material, as resolution is traded off against file size. In the swamp 4 of digital data, beneath which our libraries are sinking, we find ourselves losing the ability to recall 5 (with precision).

With electronically-stored information, the paradigm shift from concern about durability to concern about permanence has been completed. We may worry about hackers but we do not worry about genuine use. In fact, we revel, we positively boast when we can show an exponential growth in the use of the information services we provide. It requires a large shift in perception to realise that the best chance electronic information has of being preserved is that it should go on being used, regularly and continually. As soon as it is not used, it is in trouble.

Maggie Exon: Long-Term Management Issues in the Preservation of Electronic Information 6

This essay explores the contexts, the difficulties and the unexpected benefits I have encountered in seeking to preserve this material by putting it on-line; these could apply to any collection.

3. Technological Obsolescence

While the loss of data associated with the deterioration of storage media 7 (CD-ROM, DAT tape etc.) is important, the main issue confronting the archivist is that both hardware and software rapidly become obsolete. Who today has a punched card reader or a working copy of FORTRAN II? (Lesk 95) Information from the first Landsat satellite is now irretrievable because no working machine can read the tapes used for storage in the 1970s. In many cases, it is impossible to know the extent of data degradation, because the equipment required to read it no longer exists. Boxes full of 5 1/4" floppy disks collect dust on top of cupboards. Every time a word or image processor vendor 8 introduces an improved update with a slightly incompatible file format, more information becomes inaccessible.

An interesting phenomenon has been the abnegation of archival responsibility by the individual to the technology itself ("the computer crashed!"). Those of us who use computers can provide numerous examples of data loss, often for reasons shrugged off as inexplicable.

Devices and processes used to record, store and retrieve digital information now have a life cycle of between two and five years, far less than some of the most fragile physical materials they are seeking to preserve. The practice known as 'refreshing' digital information by copying it onto new media is particularly vulnerable to problems of 'backward compatibility' and 'interoperability'. Software capable of emulating obsolete systems currently relies on the goodwill and enthusiasm of talented programmers but has little basis in economic reality.

The key point here, of course, is that these examples have been driven not by economic considerations but by the desire of talented programmers to preserve obsolete material because of the value (often entertainment) they gained from that material in the past. So long as the Internet culture of open access at essentially no cost to the tools and information necessary for talented people to create such emulation software continues, making it possible for people to assist with preservation on a volunteer basis at the cost primarily of their time, I believe that it will remain possible to preserve the majority of digital information available today.

Andrew Pam: Email (1996) 9

4. Death in Custody

The digital revolution may have profound effects on the operation of institutions previously responsible for holding archival material. It is becoming more obvious that many are unlikely to be able to bear the costs and complexities of moving digital content into the future. They will deliberately or inadvertently, through a simple failure to act, render the information irretrievable. Most collecting institutions have yet to develop a body of knowledge and experience, or infrastructure, to deal with these issues. The trend is to outsource the digitisation of existing collections to companies with only a short-term interest in the process. It is often easier to ignore the existence of digital work or to treat it as somehow less worthy of collection and preservation.

The greatest fear the report raises is that in a world when more and more cost-justification is required, the owners of information will not take the steps needed to keep it available; nor will they permit others to do so; and much will disappear.

Michael Lesk: Preserving Digital Objects: Recurrent Needs and Challenges 10

The search for a suitable pond for The Flight of Ducks has provided insights into three quite different custodial environments:

RMIT University - Melbourne

Like most universities, postgraduate teaching, supervision and research at RMIT is mediated through the use of computers. It is now standard practice 11 to use word processors to produce even the most basic essays. Access to more powerful computers, digital scanners and sophisticated software has resulted in these works becoming digital mixtures of text, images, sounds and even moving images in 2D and 3D. It is an irrational and complex process to transform these efforts into paper-bound theses. Moreover, these theses, once shelved in the University Library, are rarely, if ever, accessed.

Like many tertiary institutions, RMIT University does not accept any custodial responsibility for on-line research during its development. There is little or no infrastructure to support long-term access, and Internet accounts are seen as short-term or transitional. In 1996-97, at the Higher Degree level, the University was hindered in the development of a preservation strategy for digital work by not having a Head Librarian. Efforts to put these issues on the agenda of the RMIT University Library and the Higher Degrees Committee were met with indifference, even hostility. At the Pro-Vice Chancellor level there was a similar lack of will and understanding.

Even if an infrastructure had existed when this project was submitted for examination (1997) an unexpected vulnerability emerged. The University Human Research Ethics Committee (RMIT HREC) requested that the site be removed from the University server. This action was taken because several members of the HREC suspected that the research might be culturally offensive (see Blinding the Duck). Political censorship is a vulnerability that digital works can only resist through proliferation. It took less than three minutes to delete the site. If The Flight of Ducks had not spread into other more stable locations it would now be inaccessible or locked up in closed storage media.

The National Library of Australia (NLA)

In recognising that the preservation of digital material is more than just a technical problem, the NLA set out to develop a flexible approach that could evolve with the technology. A primary initiative in 1996 was the PANDORA 12 project. This project set out to discover, through a series of pilot projects, how to build the infrastructure needed to link the areas involved in these complex issues.

Whether the creator is an organisation with both the commitment and resources to maintain digital information over time themselves or not is likely to determine the nature of the relationship between them and the Library. For example if the creator is not in a position to maintain digital information over time but the information is considered to be significant, the National Library may well undertake the responsibility for maintaining it if another institution is not considered to be a more appropriate site. These national mechanisms have yet to be worked out...

Maggie Jones: Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age 13

The Flight of Ducks was selected as a pilot site through which many of the issues of digital preservation could be explored.

In November 1997, as a result of the work by the PANDORA team at the NLA, most of The Flight of Ducks was formally archived for long-term access. The original plan had been to capture the site every month, but this proved to stretch the resources of the project. Like all government projects, PANDORA cannot rely on continued support. One can only speculate on what would happen to this fledgling archive if funding were withdrawn and the unit dismantled.

Cinemedia

In 1997 an umbrella organisation known as Cinemedia was launched, as the convergent needs of screen culture in Victoria (Australia) were identified as having a common future. I moved a mirror of The Flight of Ducks to Cinemedia (then the State Film Centre) at the beginning of 1996 in order to take advantage of an under-utilised server and to escape from the lack of on-line infrastructure at the University.

While Cinemedia lacked an infrastructure for the preservation of on-line work, its mission was to encourage access to screen content and culture. There was hope that, in this more receptive environment, it would be possible to build an infrastructure that could draw on both the achievements of the NLA and Cinemedia's own need for a curated digital media showcase. Such a showcase was planned for 2001, when a new building, housing [on-line] digital gallery spaces, would open in the centre of Melbourne.

The organisation's resources were almost entirely directed towards film and video culture. Curatorial policy for evolving digital work had to compete with the paradigms of its Digital Media Library (containing closed media) or the aspirations of its various business units. These units were more comfortable adapting the production needs of film to CD-ROM than discovering how they could support on-line productions that are more service than product.

Before revisiting these organisations, for an examination of The Flight of Ducks as a case study, it is important to look at some of the encoding contexts that underlie any infrastructure capable of supporting long-term access.

5. Describing Digital Content

In December 1994, the Commission on Preservation and Access and the Research Libraries Group created the Task Force on Archiving of Digital Information 14 to investigate the means of ensuring continued access, indefinitely into the future, to records stored in digital electronic form. Their report is essential reading and is the source of most policy in this area. The Task Force divides digital information into two distinct categories or kinds of objects (a term derived from programming): document-like objects, and more dynamic, program-like objects.

Most research has concentrated on the preservation of the first group (document-like objects), primarily because these are, to date, the bulk of all archival holdings. Today, relevant and vibrant culture is more likely to reside in the second group, which should probably be divided further into interactive and non-interactive objects. Either way, our ability to retain these digital objects depends on properties inherent to binary coding that give both types their versatility.

An important function of a code is to represent characters in a standardised way. It must be able to translate human communication into something that digital architectures (computers) can understand, and it must be standardised so that these machines can communicate with each other without loss of data integrity. While the on and off switches of binary coding can be represented by 0s and 1s, the length of a binary sequence varies according to the operating system, and the meaning of each sequence varies according to the kind of information it describes (a letter of text, a number, an image, etc.).
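
By way of illustration only (a minimal sketch in Python, not part of the original essay), the mapping from characters to standardised numeric codes, and from codes to binary sequences, looks like this:

    # Each character is stored as a numeric code; the code, in turn, is a binary sequence.
    for ch in 'Duck':
        code = ord(ch)                        # the character's standardised code number
        print(ch, code, format(code, '08b'))  # e.g. D 68 01000100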

The basic challenge for the digital archivist is to be able to migrate the structure and content of information through a maze of competing digital coding systems. The source of this maze can be found in the history and development of character encoding standards. It is useful to look very briefly at this development in order to understand the forces that continue to determine, and undermine, standards today.

Character coding really began in the 19th century with the first attempts to automate the typesetting process. It was the telegraph that led to the development of remote typesetting and, subsequently, to the growth of newspaper chains. The first standard (the Baudot code) was too limited to represent the 26 letters of the alphabet along with punctuation and numbers, so several shift codes were used to increase the number of available codes. This later served as a model for the option, alt, control and command keys on today's computers. Walter Morey introduced a variant of Baudot's code, the Teletypesetter (TTS) code. The problem with both of these shift codes, however, was that if a shift was lost in transmission, all subsequent codes were misrepresented until another shift code was received. This particular problem was not solved until the 1960s.
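
The fragility of shift codes is easy to demonstrate. The sketch below (Python, with an invented three-letter alphabet, purely illustrative) decodes a stream in which the same value means a letter or a figure depending on the last shift signal received; dropping a single shift signal garbles everything that follows it:

    # Hypothetical two-register shift code in the spirit of Baudot/TTS
    LETTERS = {1: 'A', 2: 'B', 3: 'C'}
    FIGURES = {1: '1', 2: '2', 3: '3'}
    FIGS, LTRS = 'FIGS', 'LTRS'   # the shift signals

    def decode(stream):
        table, out = LETTERS, []
        for symbol in stream:
            if symbol == FIGS:
                table = FIGURES       # interpret following codes as figures
            elif symbol == LTRS:
                table = LETTERS       # interpret following codes as letters
            else:
                out.append(table[symbol])
        return ''.join(out)

    msg = [1, 2, FIGS, 1, 2, LTRS, 3]
    print(decode(msg))                             # 'AB12C' - as intended
    print(decode([s for s in msg if s != LTRS]))   # 'AB123' - one lost shift garbles the tail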

There are two ways to fill a technological gap such as that which existed for character codes. One way is for a group of companies to get together, spend months or years in careful consideration and bring forth a standard (we see this today with W3C). The other way is for one company to create its own solution, quickly implement it, and expect all the other companies to follow along (today's example is Microsoft). Both of these paths were taken, producing two competing codes 15.

ASCII (American Standard Code for Information Interchange) and EBCDIC (Extended Binary Coded Decimal Interchange Code) were the result of these approaches; each is used by a different type of computer. One of the main difficulties in translating one into the other is that each system uses a different type of keyboard with keys on one that simply do not exist on the other.

Because ASCII only defines 128 characters, many computers define another 128 characters to create a proprietary 'extended ASCII' character set. The majority of these have now been standardised and codified as ISO character sets, with "Latin-1" being the commonly assumed default in most modern systems. The character set in use also does not address structural issues such as the permissible length of lines or the encoding of the end of a line or a document. This is where most of the incompatibilities arise: MacOS uses a carriage return to mark the end of a line, Unix uses a line feed, and DOS, Windows and OS/2 use a carriage return followed by a line feed. 16
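
A minimal sketch (Python, illustrative only) of the end-of-line problem and its routine cure, normalisation to a single convention:

    # The same two lines of text under the three end-of-line conventions
    mac  = 'quack\rquack'      # MacOS: carriage return
    unix = 'quack\nquack'      # Unix: line feed
    dos  = 'quack\r\nquack'    # DOS, Windows, OS/2: carriage return + line feed

    def normalise(text):
        # reduce all three conventions to one before comparison or storage
        return text.replace('\r\n', '\n').replace('\r', '\n')

    assert normalise(mac) == normalise(unix) == normalise(dos)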

At this point ASCII is not only the most widely deployed representation of text but also the most durable. In the interest of interoperability, information exchange on the Internet relies on it almost exclusively. However, the Internet reaches communities all over the world, and if it is to become a significant cultural force, the needs of languages using non-ASCII character sets will eventually have to be addressed. At the moment, a new character coding system begun at Xerox (with many more codeable characters), known as Unicode, appears to have the support of many of the major computer players. The Unicode Standard 17 uses a 16-bit code set that provides codes for more than 65,000 characters. Nevertheless, the decisions made by a currently dominant company, like Microsoft, show us that standards can be sacrificed to the maintenance of market share. The persistence of the QWERTY keyboard layout is further evidence that rationality does not always prevail.
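
Again as a sketch (Python; the characters are arbitrary examples), code points below 128 are plain ASCII, those up to 255 belong to extended sets such as Latin-1, and Unicode's 16-bit range reaches far beyond both:

    print(ord('A'))   # 65    - within 7-bit ASCII
    print(ord('é'))   # 233   - extended ASCII / Latin-1, beyond 7-bit ASCII
    print(ord('中'))  # 20013 - within Unicode's 16-bit range of over 65,000 codes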

6. Markup - digital wrapping paper

Just as the development of the Elizabethan printing press led to variations in text that sparked 400 years of Shakespearean textual scholarship, so the evolution of character coding may well provide the digital archaeologists with years of employment.

Electronic texts 18 have been used in textual scholarship for nearly 50 years; it is only recently that they have come to be expected to be reusable and accessible. Markup, a kind of wrapping paper or meta-text, encodes the logical structure and content of a defined document. Standard Generalised Markup Language (SGML) became an international standard (ISO 8879) in 1986. It has the advantage of being independent not only of the text it encodes but also of any particular hardware or software. It can be transmitted across all networks.

Behind every screen on the World Wide Web lies Hypertext Markup Language (HTML), made of ASCII text. HTML is a simpler but universally recognised application of SGML. It can provide formatting information, designate hypertext links and assist in search and retrieval. Its simplicity allows people with almost no computer skills to create, format and disseminate digital material in a networked medium. HTML is limited in its ability to describe content because it does not include many of the meta-information codings of SGML markup. As a result, it is rapidly being extended.
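
To make the point concrete (a hypothetical screen; the markup is illustrative and not taken from the site), an entire web screen is nothing more than ASCII text, which is why it survives any platform or vendor:

    # A complete, if minimal, web screen expressed as plain ASCII text
    page = ('<HTML><HEAD><TITLE>The Flight of Ducks</TITLE></HEAD>\n'
            '<BODY><P>Markup is <B>wrapping paper</B> for content.</P>\n'
            '<A HREF="FOD0055.html">next screen</A></BODY></HTML>\n')

    page.encode('ascii')  # succeeds: nothing here depends on platform or vendor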

In order to increase the contextual abilities of the web as a manager of the world's information, the direction of evolution is to make web user agents able to receive and process generic SGML in the way that they are now able to receive and process HTML. The bridge may well be an Extensible Markup Language known as XML.
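
One reason for that hope, sketched here with Python's standard XML parser (an anachronism for the time of writing, but the point holds, and the journal content is invented), is that XML insists on well-formedness where browsers forgive sloppy HTML:

    import xml.etree.ElementTree as ET

    # Well-formed markup parses...
    ET.fromstring('<journal><entry day="1">Left the depot camp</entry></journal>')

    # ...while markup that a browser would quietly tolerate is rejected outright
    try:
        ET.fromstring('<journal><entry>an unclosed element</journal>')
    except ET.ParseError as err:
        print('rejected:', err)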

The Multipurpose Internet Mail Extensions (MIME) introduce Internet Media Types, including text representations other than ASCII. HTML is a standard Internet Media Type as well as an SGML application. In the MIME and SGML specifications, however, character representation is notoriously complex, and the two specifications are inconsistent and incompatible. The Internet Engineering Task Force (IETF) and its MIME-SGML, HTML and Hypertext Transfer Protocol (HTTP) working groups are attempting to rectify these inconsistencies and are discussing the best ways of incorporating text representations other than ASCII.

From this point, markup description rapidly becomes overwhelmed by the acronyms representing various competing specifications, formats and markup variants. We are now seeing Microsoft's Internet Explorer rapidly eroding the dominance of Netscape as a de facto browser standard. Neither fully complies with the other or with the various W3C HTML specifications. Such is the speed of development that it becomes difficult for even the most active participants in these races to keep up, and equally hard to predict what standards, if any, will evolve. There is every reason to expect that the same leapfrogging processes that applied to character encoding will also apply to markup languages.

It is the simplicity of HTML that has led to its success. However, HTML is also capable of accommodating embedded programs, such as Java applets, that play a contradictory role: they complicate the markup process yet, at the same time, automate or simplify many interactive tasks. This creates a tension between the forces that seek to automate the development of web-based activities for mass participation and the simplicity that led to the web's spectacularly rapid adoption. Nevertheless, the dynamic qualities of these programs have profound implications as they transform the World Wide Web into a new medium.

7. A New Medium - frozen moments

We are now plunging headlong into a networked mediation of knowledge where the very nature of digital information has expanded beyond the notion of stable encoded objects. To understand properly how this contributes to a new medium requires a paradigm shift: it is necessary to abandon ideas about the finished work and to redefine concepts of the published work. 19 The web is a proliferating medium of rapid, if not instant, global dissemination where data flows in both directions, between sender and receiver. Unlike the closed media of book, film, video or CD-ROM, people can and do talk back. This electronic conversation, or interaction, can inform, challenge and even overwrite material. In The Flight of Ducks, email is woven into the work to form unexpected links and relationships. Questions, attacks, interviews, photographs, stories, even animations arrive daily, causing the site and its content to grow incrementally and to mutate constantly. This paper itself has undergone hundreds of on-line updates as electronic conversation leads to refinement. For example:

It's the quote you put under the "Death in Custody" heading, the last sentence...

These national mechanisms have yet to be worked out - other institutions take on roles as both facilitators and active participants in preserving digital information.

She believes the end of the sentence does not portray the message correctly and would prefer it if you would remove the end of the sentence instead so that it would read:

These national mechanisms have yet to be worked out....

I think that it's because the other institutions don't take on those roles yet, but the current sentence may sound like they do. (the dot dot dot she put in I guess to signify there was more discussion on the subject, but would optional if you don't like it).

Deborah Woodyard (NLA) email (Fri October 25, 1996)

For the archivist, the notion that we can save a copy of every work published is as absurd as it is to think that each screen or data display might be anything more than an evolving variant of a continuous stream.

Taken further, one might usefully swap the word idea for display in order to get closer to a description of the dynamic processes involved. The display resulting from the action of a search engine is a prime example: its content might exist only during the moment of access, and that moment consists only of a set of rules and references to fragments from other sources from which the display is derived. Like thought itself, the medium is inherently unstable, not because it is unable to carry and assemble these fragments reliably, but because the fragments themselves may be continuously changing. Add to this the possibility of live feeds of video, sound, G.P.S. data or temperature (not necessarily from the same places) and we see that we are, in fact, generating a new representation of perception and thought. The narrative conventions of this medium (where humans enter it) lie somewhere between a phone tap and a postcard.

In the case of networked information, when we don't have a tangible item owned by the Library, we have no obvious link to whose responsibility it is to maintain it, no obvious way of being able to tell whether it is in fact endangered, and no easy way to find it in the first place, much less make an assessment of its significance.

Maggie Jones: Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age 19

One need only enter one of the graphical on-line chat sites such as The Palace 20 to see how speech has become entangled with text and image. The consequences for semiotics are profound, particularly where data is reorganised through mediated links. In these spaces it also becomes impossible to define the participant as either reader or writer. We follow Vannevar Bush's journey in reading an individual path by actively selecting that path which, in turn, transforms us into the writer of a kind of dendritic trace of ideas. In this way the mediation of reading, writing and thinking may mirror the cognitive process itself.

The Net navigator, or cybernaut, has learned to find her way around in the rhizomatic flood of hypertext links. She knows that there's no original text, no 'actual' document to which all other documents are to be related. She's figured out that on the Net it's primarily a matter of forming small machines, creative text designs and sensible images out of the manifold and dispersed text segments. These machines, designs and images, which didn't exist previously in this way and won't continue to exist in the future, are ontologically transitional in type.

Mike Sandbothe: The Transversal Logic of the World Wide Web: A Philosophical Analysis 21

One can imagine that every screen on the web could be integrated into a single site. Each period of access could be seen as representing a fragment of a single evolving idea. The passing of these fleeting moments turns the archivist into an assembler of 'snapshots' - or frozen moments.

In describing its own making, The Flight of Ducks uses these conventions to show how its development has evolved. The fleeting glimpses of conversation arising from its construction are as much a part of the work as the thoughts recorded in the original journal around which the site is built.

Sometimes the data received by email or attachment carries with it information which its sender might not want to be publicly accessible. The conversation can be edited because the work is built by hand. However, as the site has grown, this has become extremely time-consuming. I find myself, out of respect and propriety, correcting spelling, stripping out personal references and deleting embarrassments.

While there are on-site warnings and procedures for helping me to identify restricted information, the distinction between messages which arrive with suggestions and comment and those which are private is not always easy to make. This is made more difficult when much of the information supplied actually influences the development of the work.

The digital archivist must ask what happens to all this data if it is not allowed to join the material it describes. History has shown us that it is often those things we discard that, in hindsight, prove important. From an editorial point of view they form a narrative in themselves. They have come to characterise a medium often mistakenly thought of as distanced from human nuance. These are issues that need to be explored and confronted. They are similar to the issues of Aboriginal cultural sensitivity explored in the third of these hypertexts, Blinding the Duck.

8. Metadata

The impact of metadata applied to these snapshots by networked providers, archivists, collectors, researchers, commentators and special interest groups is just beginning to be felt.

Metadata springs from the self-referential quality of networked media; it is, literally, data about data. Metadata might even come to outweigh the information it describes. It might include format descriptions, rights statements, keywords, sources, navigational aids, discovery aids, access counts, databases, combing screens, emails, chat group references, even essays like this, which itself contains metadata (see the source code of FOD0055.html). Most of this referential information, even archival information, is similarly live and in a state of constant change as, like the material it describes, it evolves beneath the weight of comment, input and upgrade. The extent to which this metadata is capable of being separated from the information it describes, influences or extends is a difficult question, made even more complex when the context of the information is evolving with the medium itself.

In order to avoid infinite regress and ask questions about the usefulness of metadata as an aid to finding information, the participants of the OCLC/NCSA Metadata Workshop 22 recast this question into: how can a simple metadata record be defined that sufficiently describes a wide range of electronic objects?

Once the link between access and preservation is understood, it becomes clear that resource discovery (finding things) was, and remains, the single most important need that metadata can satisfy. The sheer volume of data [on-line] means that there is far more information than professional abstractors, indexers and cataloguers can manage using existing methods and systems. It is now becoming obvious that metadata needs to be generated at the point of creation of a digital resource. Authors and information providers need a means to describe the resources themselves, without having to undergo the extensive training required to create records conforming to established archival standards.

It is important not to forget that archival communities already catalogue a vast amount of information. Effective standards and proven practices for providing description of, and access to, various types of resources have been in use for so long that there is considerable resistance to the idea of simplifying these mature approaches so that they can be used by ordinary people on-line.

During the life of The Flight of Ducks there have been four developmental metadata workshops. The first was held in Dublin (Ohio) and gave its name to a simple set of 13 elements known as the Dublin Core. The second, held at the University of Warwick in the UK, led to the development of a conceptual framework for different varieties of metadata known as the Warwick Framework. The third, back in Dublin, added two more elements to the core relating to images. The fourth, held in Canberra (Australia, 1997), attempted to reach a consensus about how these basic elements could be refined. At this meeting there was a division between those who sought to complicate resource description (the structuralists) and those who sought to keep it simple (the minimalists). The structuralists usually belong to the traditional community of librarians and archivists and are primarily concerned with how the Dublin Core can be used to integrate on-line discovery mechanisms with traditional off-line procedures.

International projects, such as the Text Encoding Initiative (TEI) 23, which seek to evolve a set of interpretive structures allowing text to be adequately described, may well founder on these tensions. Put simply, the complexity required to tag texts for their most idiosyncratic components comes at the expense of the simplicity required for the effective generation of these tags and, consequently, for the effective searching of large databases.

There are other tensions through which these differing approaches have come to characterise the use of the World Wide Web itself. Many people are impatient to use the platform- and software-dependent plug-ins and proprietary scripts which produce the bells and whistles moving the web towards full-screen sound and movement. They forget that the success and vibrancy of the World Wide Web has been built on its platform and software independence. The Flight of Ducks is unashamedly minimalist in its use of hand-written markup and metadata.

Some of the most useful metadata is produced by persons other than librarians and document owners, and it can be found neither in card catalogues nor in self-descriptions of the documents themselves. Many kinds of such "third-party" metadata (e.g., bibliographies) are indispensable aids to information discovery. It should be possible to allow topic-oriented metadata documents with semantic network functionality to be cooperatively authored, interchanged, and integrated into master documents. Such documents (and amalgamated master documents) might resemble traditional catalogues, indexes, thesauri, encyclopedia, bibliographies, etc., with functional enhancements such as the hiding of references that are outside the scope of the researcher's interest, etc.

Stuart Weibel: OCLC/NCSA Metadata Workshop 24

Please take a moment to view the source code of this (or any) display screen by pressing [View] then [Document Source] in your browser.

During its evolution The Flight of Ducks has been the site of one of the Internet's few large-scale implementations of Dublin Core type metadata. This development has been based on those elements that either appeared to be stable or had particular functions within the scope of its structure and purpose. Because the concept of on-line metadata has been in development, there have been many false starts and blind alleys. Metadata development continues today as the various on-line working groups (in which I participate) argue their cases from their minimalist or structuralist standpoints.
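
By way of illustration (the element values below are hypothetical; the site's actual records can be read in the source of any screen, such as FOD0055.html), a Dublin Core description was typically embedded in each screen as simple META tags, which a few lines of Python can generate:

    # A hypothetical Dublin Core record for a single screen (illustrative values only)
    record = {
        'DC.Title':   'Killing the Duck to Keep the Quack',
        'DC.Creator': 'Simon Pockley',
        'DC.Date':    '1997-11-01',
        'DC.Format':  'text/html',
    }

    # Emit the record as the HTML META tags embedded in the head of a screen
    for name, content in record.items():
        print('<META NAME="%s" CONTENT="%s">' % (name, content))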

9. Evolving an Infrastructure - a case study

In 1995 it was unusual for a digital new media documentary to be constructed for the purpose of preserving its content. Surrogates of the core material had only just been created when the Dublin Core was first proposed.

It is still hard to find projects of similar depth and scope. Indeed, the site has now become so large and complex that any global changes or upgrades represent an enormous investment of time and effort. Text editors can help with some global changes, but in most cases screens are unique entities carrying individual sets of metadata, such as identifiers and screen format descriptions, that can only be marked up by hand.

The Flight of Ducks will continue to be constructed and upgraded publicly on-line. Because of the rapid development of on-line technology, both the sound files and moving images can be upgraded (when time permits) in a way that will transform the site into a more dynamic space. It is the available bandwidth, rather than the limitations of the encoding, that limits these forms of upgrade. To date, the pragmatic demands of a realistic transfer rate of about 500 bytes per second mean that all files have to be kept to a minimum if they are to download in an acceptable time.
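
The arithmetic behind this constraint is simple (a sketch; the file sizes are assumed for illustration):

    rate = 500                     # realistic transfer rate, in bytes per second
    for size_kb in (5, 30, 100):   # assumed sizes for a text, image and sound file
        print('%4d KB -> about %3.0f seconds' % (size_kb, size_kb * 1024 / rate))
    # a 30 KB file takes about a minute to arrive: every file is kept to a minimum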

The site is currently made up of nearly 1000 screens. Beyond the fragile source material from which it was originally derived, the content of the site exists only in digital form. Apart from various and largely unknown third-party captures of selected material, diverse versions of the digital content of The Flight of Ducks have demonstrated its ability to proliferate by spreading across a range of servers and storage media.

There have been so many changes during the site's evolution that when I look at some of the old captures, it is unrecognisable. Unless some form of global Internet meltdown occurs, preservation of this material largely depends on the successful and continued migration of the digital content across servers. It would be interesting to be writing this section in 20 years' time. For the record, it may be useful to identify the following known vulnerabilities:

RMIT University

At the time of writing, The Flight of Ducks has been deleted from the University server at the request of the RMIT Human Research Ethics Committee. If/when it is finally allowed to be submitted for examination as a Ph.D. project, the RMIT University Library will be required to retain a copy of the snapshot used for the examination. There is as yet no clear indication of how a decision will be made about whether the University will run this snapshot on its own server or opt to link to the NLA catalogue. The current chairman of the Higher Degrees Committee is of the opinion that CD-ROM is an appropriate long-term storage and examination medium. The RMIT Library, on the other hand, now recognises that this is only a temporary storage medium.

In order to provide some focus for discussion, I have written a short protocol (see Appendix) outlining many of the issues which need to be considered, entitled The Submission, Storage and Examination of Online Projects at RMIT. This paper rests heavily on the work done at Virginia Tech (USA), where this essay appears to have found an audience.

The politically motivated actions of the RMIT Human Research Ethics Committee have not only delayed examination by six months but effectively frozen all constructive discussion about the submission and archiving of the work at the Higher Degrees Committee level. The RMIT Library takes its instructions from the Higher Degrees Committee and, as a result, is ill-prepared to establish an infrastructure capable of supporting long-term access to the work.

Cinemedia

The Cinemedia server is now the only live (responsive) site for The Flight of Ducks. The work will continue to evolve after its examination as a Ph.D. project. Cinemedia, through a memorandum of understanding with RMIT's Centre for Animation and Interactive Multimedia (AIM), has committed itself (at executive level) to displaying the results of student work, but has not worked out who should take responsibility for it.

The Flight of Ducks currently enjoys the formal status, at Cinemedia, of being a screen partner. Under this arrangement I have FTP access to the server. However, Cinemedia retains control of all material on its server and has the right to remove the site or selected content at any time. The RMIT HREC attempted to test this procedure in its quest to have the work censored. As a result of this action, the designated authority of the webmaster has been restated. Informally, the site is secure for as long as I continue to work there and watch over it.

I am currently employed as a consultant on matters to do with the management of digital resources. Cinemedia is in a unique position to invent a co-operative and convergent infrastructure for the collection, preservation and distribution of curated digital work.

In an effort to provide the policy development necessary for curated access to on-line digital works, I have developed a draft Digital Management Strategy describing what an infrastructure framework might look like. If this leads to the development of such a framework, then The Flight of Ducks will have a better chance of remaining accessible.

A detail of this framework dealing with disaster planning was recently highlighted when, on Friday 15 May 1998, a hacker who had penetrated the Cinemedia server and begun to delete files was intercepted. This was not an isolated incident.

The National Library of Australia (NLA)

The Library has been a staunch ally when it comes to political censorship and the politics of control. In the hands of professional librarians and curators of cultural memory, there is little cause for concern. There are certainly problems in integrating the metadata into the Library's main catalogue, but these may dissolve with hardware upgrades.

To date, the NLA has been unable to capture the password-protected area (built to house restricted material) now containing confidential letters and email. We are currently working on an agreement concerning password management and responsibility. This is an important area of infrastructure development if the collection is to remain whole. While the location of the digital surrogates of the tjurunga has been problematic (from the University's point of view), it is hoped that this will eventually cease to be a divisive issue. The delay allows time for the development of an effective 3D scanning technique, so that all the artefacts can be included and not just surrogate photographs of them.

10. A few more words

Overall, agreement between these three organisations is still as elusive as any commitment to build an infrastructure between them.

My greatest concern is that we may be trying to be too elaborate; the plethora of projects, organisations and competing systems is a worry. In particular, everything we are building now appears dependent on elaborate webs (world-wide or otherwise) of cooperation. Maybe the idea of localised, bounded collections has not quite yet had its day and the world's large collecting institutions should not so readily divest themselves of their selection and collection responsibilities in favour of vaguer notions of the coordination of devolved responsibilities.

Maggie Exon: Long-Term Management Issues in the Preservation of Electronic Information 25

Unfortunately, the web is too often characterised as a superficial medium in which one surfs or browses through material without spending time, even effort, appraising it. It is a medium yet to develop accepted conventions of authority, provenance or boundaries. We invent these as we go.

These are not technical issues. They are issues that draw resources away from any serious commitment to the curation and identification of significant digital works. Commitment is essential before the seminal works of this new era of networked communication become irretrievable.

11. External references & notes
  1. Rudd, R. [1997, Feb 24] Voyager's greetings to the Universe Interstellar Outreach Program.
    Available online: http://vraptor.jpl.nasa.gov/voyager/record.html [Accessed 1998, April 28].
  2. Exon, M. [1995, Nov 30] Long-Term Management Issues in the Preservation of Electronic Information. Papers from the National Preservation Office Annual Conference - Multimedia Preservation: Capturing The Rainbow.
    Available online: http://www.nla.gov.au/3/npo/conf/npo95me.html [Accessed 1998, April 28].
    For clarity, I have changed the syntax of the last sentence. Her actual words were:
    I have a profound and unchanging disbelief that these two concepts belong in any sense in the same world.
  3. Jones, Maggie. [1995, Nov 30] Preservation Roles and Responsibilities of Collecting Institutions in the Digital Age. Papers from the National Preservation Office Annual Conference Multimedia Preservation: Capturing The Rainbow.
    Available online: http://www.nla.gov.au/nla/staffpaper/npomj.html [Accessed 1998, April 28].
  4. Wainwright, Eric. [1995, Nov 30] Culture and Cultural Memory: Challenges of the Electronic Era. Papers from the National Preservation Office Annual Conference - Multimedia Preservation: Capturing The Rainbow.
    Available online: http://www.nla.gov.au/nla/staffpaper/npoew.html [Accessed 1998, April 28].
  5. Cathro, W. [1997 Sept 10] Metadata: An Overview, Standards Australia Seminar: Matching Discovery and Recovery.
    Available online: http://www.nla.gov.au/nla/staffpaper/cathro3.html#purp [Accessed 1998, April 28].
    'Recall' is used in the technical sense (librarian term) in which information retrieval is measured in terms of recall and precision. Recall and precision factors of 10-20% are often acceptable for most purposes. Web search engines frequently involve precision factors of much less than one percent.
  6. Op. cit. Exon, Maggie.
  7. Harvey, Ross. [1995, Nov 30] The Longevity of Electronic Media: from Electronic Artefact to Electronic Object. Papers from the National Preservation Office Annual Conference - Multimedia Preservation: Capturing The Rainbow.
    Available online: http://www.nla.gov.au/3/npo/conf/npo95rh.html [Accessed 1998, April 28].
  8. For example, Windows 95 will not run many of the existing DOS and Windows programs, and hence information archived using DOS or Windows is lost to Windows 95 unless new versions of the software programs are released.
  9. Pam, A. (avatar@glasswings.com.au) [1996, Sept 8] Lest We Forget. Email to Simon Pockley (simonpockley@gmail.com).
  10. Lesk, M. [1995, Nov 28] Preserving Digital Objects: Recurrent Needs and Challenges. Keynote address: National Preservation Office Annual Conference - Multimedia Preservation: Capturing The Rainbow.
    Available online: http://community.bellcore.com/lesk/auspres/aus.html [Accessed 1998, April 28].
  11. Roberts, A. [1997 Sept 29] Theses Unbound Ariadne.
    Available online: http://www.ariadne.ac.uk/issue11/cover/ [Accessed 1998, April 28].
  12. Cameron, J. [1997 Oct 15] Preserving & Accessing Networked Documentary Resources of Australia (PANDORA) Project
    Available online: http://www.nla.gov.au/policy/plan/pandora.html [Accessed 1998, April 28].
    PANDORA is a project initiated by the National Library of Australia to investigate strategies for the storage, preservation and accessibility of digital data in the context of the creation of an electronic archive of library materials. This study is being funded in part by the Australian Vice-Chancellors' Committee (AVCC) Working Group on Electronic Publishing.
  13. Op. cit. Maggie Jones
  14. Garrett, J. & Waters, D. [1995] Preserving Digital Information. Report of the Task Force on Archiving of Digital Information.
    Available online: http://lyra.rlg.org/ArchTF/tfadi.index.htm [Accessed 1998, April 28].
    It is from this report that I have taken the sub-heading 'The Fragility of Cultural Memory'.
  15. Sanders, J. [1993 Oct 5] The History and Function of ASCII Code
    Unavailable online: http://www.it.rit.edu/~jas3263/doc_process/ascii.html [Accessed 1996, May 18].
    Vanished; no author email contact. It was also quoted in Young, R. [1997 April] Mac-PC Text Differences & Demonstrations of Carriage Return versus Line Feed.
    Available online: http://130.212.8.138/msp/Instructors/rey/text/Textdif.htm [Accessed 1998, April 28].
  16. [No date] The Unicode Standard.
    Available online: http://www.unicode.org/unicode/standard/standard.html [Accessed 1998, April 28].
  17. Hockey, Susan. [No date] Describing Electronic Texts: The Text Encoding Initiative and SGML.
    Available online: http://lcweb.loc.gov/catdir/semdigdocs/hockey.html [Accessed 1998, April 28].
  18. Mitchell, William. [1992] The Reconfigured Eye. MIT Press, p. 52.
  19. Op. cit. Maggie Jones
  20. [1998] The Palace
    Available online: http://www.thepalace.com/ [Accessed 1998, April 28].
    The Palace is virtual world chat with real-time pictures and sounds. Users create a personal appearance (avatar) by importing their own graphics or by selecting from a suitcase of prepared images included with the software. Authoring with the Palace is easy, and registered users can quickly create their own personal Palace on the Internet! Palace-like environments are proliferating with varying degrees of platform dependence.
  21. Sandbothe, M. [1997 July 30] The Transversal Logic of the World Wide Web: A Philosophical Analysis. Paper given at the Computers and Philosophy Conference in Pittsburgh.
    Available online: http://www.uni-magdeburg.de/~iphi/ms/transversal.html [Accessed 1998, April 28].
  22. Weibel, S., Godby, J. & Miller, E. [No date] OCLC/NCSA Metadata Workshop. Los Alamos National Laboratory.
    Available online: http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html [Accessed 1998, April 28].
  23. Op. cit. Hockey, Susan.
  24. Op. cit. OCLC/NCSA Metadata Workshop
  25. Op. cit. Exon, Maggie.
Some Useful Projects