See: Killing the Duck
Simon
I'm glad you are happy with the title entry page.
Do you mean 'Does Harvest gather them?' If so, the answer is that Harvest
often will not. It has trouble with many of the Java applications. While
our ability to deal with Java will probably increase, preserving one-off
proprietary programs into the future will be exceedingly difficult and
resource intensive.
A software bank is worth a try. We have specified one for our archive
management system or digital object server as we now call it. It may only
help us in the short term, until the platform changes, but we won't know
unless we try. It's good to hear that another agency is doing some work on
this too and we may be able to learn from each other.
It's exciting that you have the opportunity to work with Cinemedia in this
way.
Cheers Margaret
>Margaret
None of this need concern you at all. The University Higher Degrees Committee will not accept the NLA archive. It must be archived by the RMIT Library and the link will be from their archive to the live site running at Cinemedia.
Fine.
Oh dear.
How complicated!
Sure! I was just talking to our IT fellow about capturing the password
controlled space. His earlier advice to me, when he was very new to the
job, was that Harvest would be able to cope. However, he said this morning
that he cannot work out how to make Harvest do it. Unfortunately this
software is no longer supported so there is no one to provide him with
advice.
He is experimenting with some other software that he thinks may be more
successful in this area. We'll keep in touch on this point.
Hi Simon
Congratulations! Persistence pays off. I take it that this means you can
now proceed to the assessment phase.
I note the requirement for a link to the Cinemedia site from the copy in the
PANDORA Archive. We already have a link from the title entry page. Do you
think this is sufficient or do you think the link needs to be from within
the archived version?
Best wishes
Margaret
Simon
Responses inserted below:
You write that at present, there is no machine readable result derived from your 'Source' tag by the NLA. This information is read by search engines.
I've yet to see a search engine that presents this information. Can you point me to one?
I have the feeling I am missing something here. Are all these Dublin Core type Meta tags useless to the NLA?
Online library catalogues have been around for quite a while now but they
came into existence at a time when the Internet and its metadata were not
even dreamed of. The use of metadata is still in its infancy. We
would certainly encourage publishers to include metadata in their online
publications, but the management of that data and how it can be used has not
yet been resolved. I suggest that you read the following online documents:
http://www.searchenginewatch.com/webmasters/features.html
They provide a comparison of a number of major search engines, how they
index documents and whether or not they use meta tags - not all of them do!
http://www.searchenginewatch.com/webmasters/meta.html
I thought the reason for having a database was to be able to retrieve information. If I understand you correctly, then as far as the NLA is concerned, the FOD is like an encyclopaedia - described only as a book with no reference to its content. Is that right?
The reason for having a catalogue certainly is to enable the retrieval of
information. There are, however, limitations to what catalogues are able to
do presently. Many catalogues are not further subdivided into separate
smaller categories as the National Library's is.
Describing FOD as an encyclopaedia is not a bad simile. However, if you
check the subject headings given to FOD you will notice that words such as
"diaries, pictorial works, biography, journeys, description and travel,
Central Australia, Aborigines" are all included. These do provide
information about the content of the publication. They just aren't as
comprehensive as your metadata, as we have strict guidelines to follow
regarding the provision of subject headings. As I have already stated, we
do not yet know whether or not metadata may be included in catalogue records
in the future.
Anne Daniels (adaniels@nla.gov.au). (1998, May 12). Metadata?
>Margaret
1. Thanks for letting me know the server is up again.
2. I'm wondering if I should change the name of `The Flight of Ducks' to `A Flight of Ducks' in order to be less alphabetically disadvantaged.
Up to you.
NOOOOOO! You are in the vanguard and everyone else has yet to catch up.
If they want their work to be discovered by search engines in the future
they are going to have to do what you have done.
Not used at this point in time by the NLA for creation of the catalogue
record. That may change. We are currently changing the computer system
that runs the National Bibliographic Database. The current system was
implemented in the early 1980s, developed in the late 1970s, so in the stone
age as far as computerised catalogues go. We (people on the ground in
Technical Services) don't yet fully understand what the new system will be
able to do. For instance, will it enable searches on fields containing
data not currently searchable? If so, we may want to change our cataloguing
practice to incorporate new data, such as some publisher produced metadata.
Anne and I were discussing this possibility only the other day.
The library catalogue is only one way in which people are locating documents
on the Net. You're right, the search engines are mostly not using metadata
at the moment but it is about to happen and those publishers who don't
supply it will be left behind. The NLA sees encouraging Australian
publishers to use metadata as a major role for itself.
We've just had a very important strategic meeting where we were talking
about access to the Australian publishing output and how important the
metadata aspect is. Believe me, what you have done is very important.
Not quite sure what you mean. You may be meaning what we've just spent all
morning talking about - how to integrate our existing services with the new
ones and what kind of structure we need to do it. How to use the new
technical possibilities to bring it all together and do more than we've ever
done before. How to take the best of the old and build on it to create
something spectacular.
b. I compare this to a screen such as FOD0505.html (a list of screens relating to Hermannsburg Mission) which lists photographs, journal entries etc.
c. I look at the keyword metadata for this screen:
<meta name="keywords" content="Finke river, Albrecht, Aborigine, Mission Central Australia, journals, history, Hermannsburg, combing screens, heat new medium, maps, project, archive, database, non-linear, Mission, truck Strehlow, Murch, Larnach, Pockley, Aboriginal, linear, contents, Johannsen, archival web site, camels, expedition, journey, travel, path, thread, combing screens, new medium, project, expedition, journey, travel, path, thread," >
d. I look at the title
<TITLE >Index............Australian history: Hermannsburg Mission index of screens - The Flight of Ducks.........combing/mission</TITLE >
e. I think - surely it can't be all that hard to build an interpretive engine that can find such a screen or indeed, any one of the screens listed. It seems to me that each screen is actually a separate site. What am I doing wrong here?
You are doing nothing wrong. You are thinking ahead of your time. What you
are suggesting will be done. Many people are currently working on it.
It is difficult, but it is being done.
As above. Having been to the WWW7 conference a couple of weeks ago and
having just been to the meeting this morning I am very confident that what
you are envisioning will happen. There are so many people working all over
the world on all aspects of resource discovery and access that it is
awe-inspiring. The National Library is absolutely committed to providing
Australia with a world class system of information resources, not only in
the library sector, but drawing in other collecting sectors such as museums,
galleries, archives. We are leading in Australia. We are one of the
leaders in the world. It may take a year or two to bring it all together,
but your metadata will not be wasted.
Deborah
Hi - how are you? All your work in getting the FOD etc archived is
making an enormous difference at this end. Keep it up - Margaret has
been wonderful in putting up with this constant invasion. I am trying
to clearly articulate to Cinemedia what kind of infrastructure they
need to build if they are to collect and archive digital work. It's
the tail wagging the dog as far as the FOD is concerned and so far
just a series of questions. I looked at the RLG stuff but I'm not sure
what I can usefully extract. Is it OK to ask you questions about
infrastructure?
Questions make finding answers easier - so here is a difficult one:
Maybe in the literary `collected letters' sense, but they are creating
a much larger and sometimes more intimate record of communication.
Email is a bit of a cross between a postcard and taped phone call. If
you have a look at the current on-line version of
`Killing the Duck to keep the Quack' (FOD0055.html) and go to
the piece by Maggie Jones that
you wrote to me about a year ago in `Death in Custody' and press
`worked out...' you will see why I am asking you about these issues.
The context is important. From my perspective it is important that the
FOD be transparent and that we are all able to explore the ways in
which a work can show (or cannot show) its own making. After all, if a
work doesn't finish, then all it has to show is its construction. This
is probably the hardest paradigm shift that people have to make in
understanding this medium.
I would assume (perhaps wrongly) that sitting on your Library Registry
or even your own or Margaret's hard drive are communications about the
FOD which I don't know about.
What are you going to do with them ?
Regards
Simon
Hello Simon
These are very relevant thoughts you present. The NLA is constantly
faced with decisions of what to keep and what not to keep. It's not an
easy decision. And then there is discussion of what access can be or
should be provided. I would imagine that if you as producer of FOD had
vital material you wished to be part of your creation but not available
for public perusal for 50 years we may consider how to do that, or let
you know that we couldn't do it and allow you the choice to withdraw the
material. I'm sure we would respect your wishes.
This is common to manuscript collections. Unfortunately, electronic
media are frequently robbing us of the record of these communications.
Very few will be kept to be donated to collections such as ours. And if
they are kept, they may have restrictions placed on them such as our
regular manuscripts do, say no public access for 50 years, etc.
Deborah (Dwoodyar@nla.gov.au)
Deborah
I have sent this to Margaret and refer to a previous message from her. Maybe
you have some thoughts on this because it concerns you too. I think it is an
important and difficult issue to discuss.
From the outset the FOD has been a site which shows what it is telling, does
what it says, displays its own making, reflects its own action. This action is
increasingly made up of the electronic conversation which it attracts and which
is part of its making. It is, in fact, one of the most compelling and
fascinating characteristics of this new networked medium. It keeps it alive.
Because the conversation is between humans and not machines I am placed in an
editorial role and find myself stripping away anything that might compromise,
embarrass or betray the norms of propriety that govern civilised communication.
It is not always easy to make the distinction between private and public.
However, I believe that I have edited with sensitivity and respect (if in
doubt I leave it out).
What troubles me is that history has shown us that what we discard today is
often what is most interesting tomorrow. So I am not sure what to do with the
conversation relating to the FOD which, out of propriety, should remain
private.
You pointed out that our conversation relating to the FOD as well as your
internal conversation within the NLA is kept as Library business on an
electronic Registry file where it is confined to the eyes of Library staff
only. There are conversations relating to the FOD in there which I know
nothing about. I too have conversations about the NLA which you know nothing
about. This electronic conversation is data with the same fragilities and
vulnerabilities as the rest of the FOD. I believe it to be part of the
datascape from which the FOD was/is constructed.
>It's not metadata. These messages are documents, objects, in themselves. What becomes of this FOD related conversation - these related objects?
Are they deleted - when ? Are they stored - where ?
It seems to me to be an issue similar to that of the cultural sensitivity of Aboriginal material ?
The secret/sacred tjurungas, for example, would no longer exist if they had not been `collected'. In their traditional context they were not intended to last for long - they were worn out, broken, burnt, eaten by termites, stolen or lost.
Surely, as a collection, these conversations need to reside within, or be attached to, the site which we are trying to send into the future? Do we bundle them up and put them in a password protected area like I have done with the tjurunga?
Who has access ?
Do we take the same view of these objects as some Aboriginal commentators and say, "Let them go"?
Do we allow records to be erased ?
>I am quite uncomfortable about it. When I write messages to you it has been on the understanding that they are for your eyes and perhaps, at your discretion, for those of some of your colleagues. I had certainly not considered that, if you are posting them to your site, they could be open to anyone in the world. Whenever I write, I do always try to be discreet (you will laugh at this) but nevertheless it is disconcerting to learn that a much wider audience might be reading them.
Again, don't misunderstand me. I am not talking about making private or restricted conversations public but I am talking about preserving them. As I said to you before: I too find this uncomfortable, but that doesn't mean we shouldn't discuss it or should allow data which might be of great interest to be lost because of our lack of vision or imagination.
Recently I heard a talk given by a woman who was interested in evolving a taxonomy of the content of email and its protocols. I can imagine such a researcher being grateful that we were able to see beyond our red faces and burning ears.
FODgate ?
Regards Simon
The FOD has been useful to the PANDORA Project for a number of reasons.
It's great for demonstrations because the flying ducks catch people's
attention and we usually use it for this. We have a demonstration to
the Library's Council coming up on 5 December - lots of awe inspiring
bigwigs, including people like Barry Jones and Sir Anthony Mason - and
we are going to use it for that. On a more serious note, it is useful
to us because of the type of publication it is. It's not a serial,
which the majority of e pubs are, so it gives us a case to think about
that is outside the usual when we are thinking through our business
processes as we develop the archive management facility. ('How would
the amf cope with FOD?' is the kind of question we are wont to ask.)
The questions you have asked as you have refined the publications have
also challenged us to think about broader issues and this has assisted
us in the development of the model.
FOD is unique; it contains documentary material of research value,
presented in an innovative and thoughtful way and therefore adds
richness to the PANDORA archive and to the national heritage as well.
Jeeze, you gotta be careful about what you say to you in emails - it
goes public! "Er... is your site moderated, Mr Pockley?"
How important is it that the site or story in Web text based media be
'alive'? How to keep it alive?
felix (mrfelix@netspace.net.au)
Dear Simon
I have had a discussion with my IT knowledgeable colleague, Debbie, about
your question re the ability to display the date in one format to machines
and in another to humans. The way we understand it is as follows: it
probably can be done, but it is introducing complexity which may also
introduce problems. You could enter the date as CCYYMMDD, which is what
most software (the machine) requires and introduce a conversion program
within your publication to display the date in a more user friendly way.
The simplest conversion program would represent the date to humans as
DDMMCCYY with perhaps forward slashes or hyphens between for punctuation. A
more complex conversion could depict the date as 11 September 1997 as you
would prefer, but the greater the complexity the slower the display and the
greater the scope for problems.
A compromise would be to enter the date in the format required by Dublin Core
in the metadata, and to enter it as a string (eg 11 September 1997) within
the publication itself. The average user is not going to look at the
metadata and therefore will not be offended by its unfriendly appearance.
Unfortunately, if you do not enter the data in the format required by
the software, it will not recognise it. Most software does not recognise
11th September 1997 as a date and it will therefore not pick it up in a
search. Database software used to manage a Dublin Core metadata repository
will also be unlikely to recognise it. You are probably therefore wasting
time putting it in in this form.
Another possibility is to input the data twice, in one format for searching
and the other for display. However, this is not currently supported by
the Dublin Core standard.
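The conversion program described above might look something like the following minimal Python sketch. It assumes the metadata stores the date as an 8-digit YYYYMMDD string and that the human-readable form is day, full English month name, year (e.g. 11 September 1997). The function names `machine_to_display` and `display_to_machine` are hypothetical, not part of any Dublin Core tool.

```python
import re
from datetime import date, datetime

# Month names spelled out explicitly so the output does not depend on the system locale.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def machine_to_display(yyyymmdd: str) -> str:
    """Convert a machine-readable YYYYMMDD string to e.g. '11 September 1997'."""
    d = datetime.strptime(yyyymmdd, "%Y%m%d").date()  # raises ValueError if invalid
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


def display_to_machine(text: str) -> str:
    """Convert '11 September 1997' or '11th September 1997' back to YYYYMMDD."""
    m = re.fullmatch(r"\s*(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+)\s+(\d{4})\s*", text)
    if m is None:
        raise ValueError(f"unrecognised date: {text!r}")
    day, month_name, year = m.groups()
    month = MONTHS.index(month_name.capitalize()) + 1  # ValueError if not a month name
    return date(int(year), month, int(day)).strftime("%Y%m%d")
```

Running the conversion once, when a page is generated, rather than in the reader's browser would sidestep the display slowdown mentioned above while still keeping the searchable YYYYMMDD form in the metadata.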
I hope this helps
Best wishes
Margaret(mphillip@nla.gov.au)
Margaret
Many thanks for your comments.
Sure do - particularly Internet Explorer - Netscape3 Gold on a Mac sometimes
does very odd things with animated gifs. But I keep an eye on both Macs and
PCs. I use a Silicon Graphics Indy workstation myself. Because what I do is
so simple I don't often run into problems. The main difference I have found is
the quality of the gamma, I think it is called gamma, (sort of screen
brightness) - While I have been updating the metadata attached to the images I
have been increasing the brightness of many of the images because they were too
dark on a PC. So, if you are updating FOD with Harvest don't assume that the
images are static.
At the time I was looking at the image from different perspectives. First from
the point of view of caption. Second from the point of view of actual image
content. Over time I have done exactly as you suggest and combined the two.
However sometimes, where there is a contextual description in the journal or
somewhere else, I add a second description tag with the content being a URL eg.
FOD0252.html.
Hmmm - this is an interesting shortcoming of language at the moment. Recently,
I had to fill in a form and had a similar problem. I described myself as
`servant to the material'. Author seems conventionally appropriate but
`creator' seems a little pretentious even `godlike' to me. As best I remember I
think I used author because it fitted in with the Hyperwave browser attributes
see : http://www.xanadu.com.au/FOD
I'll try `creator' for a while and see if I can get used to it.
I think you hit the spot when you talk about testing the usefulness of it all.
After all, if it is not useful, then - why bother. As I told you before, I have
chucked out heaps of stuff from the combing screens because it simply was not
useful.
Having Paul Resnick's students working away on PICS will be very interesting.
They are a bright lot at MIT so I'm looking forward to what they come up with.
It should be very useful for (the silent) John Thompson too.
Ok - what is less ambiguous than 10th September 1997 ? I don't want to budge
from this because I don't like writing or reading dates like 19970910 which is
not a date but a number.
What I don't know is how I should tag this - any suggestions?
Once again thanks - I have printed out the W Cathro paper.
Kind regards Simon
Simon
There was one thing more that I meant to say. About the Date element.
The information about Dublin Core elements that I have recommends that
the information about date be provided in a particular format: YYYYMMDD
as defined by ANSI X3.30-1985. It says, 'many other schema are possible
but, if used, they should be identified in an unambiguous manner'.
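One way to act on the advice that a date scheme be "identified in an unambiguous manner" is to name the scheme alongside the value in the HTML meta tag. The Python sketch below is illustrative only: the `DC.date` meta name, the `scheme` attribute and the label `ANSI.X3.30-1985` are assumptions about how such a tag could be written, and the current Dublin Core guidance should be checked before adopting them. The function name `dc_date_meta` is hypothetical.

```python
from datetime import datetime
from html import escape


def dc_date_meta(yyyymmdd: str, scheme: str = "ANSI.X3.30-1985") -> str:
    """Build an HTML meta tag for a Dublin Core date, naming its format scheme.

    Assumption: the 'DC.date' name and the scheme label are illustrative,
    not taken from an official Dublin Core encoding.
    """
    if len(yyyymmdd) != 8 or not yyyymmdd.isdigit():
        raise ValueError("expected an 8-digit YYYYMMDD string")
    datetime.strptime(yyyymmdd, "%Y%m%d")  # raises ValueError for impossible dates
    return (f'<meta name="DC.date" scheme="{escape(scheme)}" '
            f'content="{escape(yyyymmdd)}">')


print(dc_date_meta("19970910"))
```

The visible page can still say 10th September 1997 in words; only the tag carries the machine form.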
Cheers Margaret (mphillip@nla.gov.au)
Simon
Now to your questions. These comments are based on an analysis of
FOD0178.html.
1. The source information looks fine. It gives details about the
original object and where it is currently located.
2. The format looks fine too. We use Netscape 3 which has no trouble
dealing with your images. My IT colleague, Debbie Campbell, raised the
question of whether it is just as well managed by other browsers such as
Internet Explorer. Which browser do you use? Do you try out with
alternate browsers? For instance, if for some reason other browsers did
not deal well with it, you might want to include here something like,
'view with Netscape 3'.
3. In relation to the descriptions, I am curious as to why you have given
two. It may be for some good reason that I do not understand such as
increasing your chances of hits. If not, why not roll it all into one?
eg, "Digitised photograph of Pastor Albrecht with wife Minna behind
children Helen and Theodore, all standing in front of the shady verandah
of their house at Hermannsburg Mission, table - right, metal tubs
hanging on fence - left, January, 1933" (This is probably being picky)
We are concerned about your use of 'author'. Dublin Core Element
Descriptions suggest use of 'creator', which is broader and more
appropriate when you are referring to a photograph, either digitised or
original. One is a creator or a photographer, but not really an author
of a photograph or a digitised version of one.
This raises an interesting point of discussion. What is being described
in this metadata? I assume that you are describing the digitised
photograph. FJA Pockley was the creator of the original photo but
presumably you become the creator of the digitised version of the photo?
People working on Dublin Core have started to do some work on qualifiers
which would refine the meaning of elements such as Creator, although
this is still in proposal form. It might be a bit premature to put it
into practice.
For an interesting discussion of qualifiers you may be interested in a
paper by Warwick Cathro.
http://www.nla.gov.au/nla/staffpaper/cathro3.html
The section headed 'Semantics: the Minimalist/Structuralist Issue' gives
a good discussion of this. (You could print it out and take it with you
to the Warrumbungles :))
Simon, you are way ahead of us in terms of application. All we can
really do is provide you with our opinion. Dublin Core is still
developing. It is great to see someone taking this so seriously. It is
only by creators' willingness to put it into practice that ultimately
the usefulness of it all will be able to be tested.
I was interested to read that FOD is being used as the basis for an
assignment. It will be fascinating to see what comes of this and
whether the students come up with something that might be useful for
you.
Have a relaxing time in the Warrumbungles!
Best wishes Margaret (mphillip@nla.gov.au)
Dear Simon
As usual we have had to put our heads together over your questions and,
even so, we don't have complete answers. We also have some questions
for you. What you are doing is extremely interesting and has wider
implications which I wonder whether you have considered.
We note that you are looking at updating your photographic metadata.
Are you intending to provide this level of detail for every
screen/digital object or only for selected ones, for instance, those
which might warrant direct access in their own right (eg digitised
original photographs and manuscript material)?
The answer to this question depends on what you are trying to achieve
and how far ahead we are looking. For instance, it occurs to me that
there might be a great deal of value for some researchers in the future
to be able to locate source material about Hermannsburg in 1933 without
knowing of the association with FJA Pockley or The Flight of Ducks.
What the NLA would ideally like to be able to achieve eventually
(although there is no guarantee that it will happen) is to enable a
searcher to run a search engine over all metadata in a number of the
databases on our site, including the pictorial database, Images1, the
national guide to manuscripts about Australia, RAAM, and PANDORA, among
others. If a researcher were looking for primary materials about
central Australia, they could find material in each, including Flight of
Ducks in PANDORA if the metadata is set up in the right way.
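The cross-database search described above can be pictured with a very small Python sketch: records from several collections share the same few metadata fields, so one query can run over all of them. The `Record` structure, the field choice and the sample records are assumptions for illustration; only the FOD0505.html entry is abbreviated from that page's own title and keywords, and the Images1 and RAAM entries are invented.

```python
from dataclasses import dataclass


@dataclass
class Record:
    collection: str      # e.g. "PANDORA", "Images1", "RAAM"
    identifier: str      # URL or other identifier of the object
    title: str
    keywords: list[str]


def search(records: list[Record], term: str) -> list[str]:
    """Return 'collection: identifier' for records whose title or keywords contain term."""
    t = term.casefold()  # case-insensitive matching
    hits = []
    for r in records:
        if any(t in field.casefold() for field in [r.title, *r.keywords]):
            hits.append(f"{r.collection}: {r.identifier}")
    return hits


# Hypothetical sample records (see lead-in).
records = [
    Record("PANDORA", "FOD0505.html",
           "Hermannsburg Mission index of screens - The Flight of Ducks",
           ["Hermannsburg", "Finke river", "Albrecht", "Mission"]),
    Record("Images1", "image-0001", "Photograph of a camel team",
           ["Central Australia", "camels"]),
    Record("RAAM", "manuscript-0001", "Mission correspondence",
           ["Hermannsburg", "letters"]),
]

print(search(records, "hermannsburg"))  # ['PANDORA: FOD0505.html', 'RAAM: manuscript-0001']
```

Real services would need shared field definitions (which is what Dublin Core aims to provide) and proper indexing rather than a linear scan, but the principle is the same: the search only works if each record's metadata is filled in consistently.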
Are you open to the idea of contributing to this broader resource
discovery scenario? If so, you might want to consider adding
information such as date in a field where it will be readily
searchable. Unfortunately, the date in the Dublin Core Metadata Element
Set (http://purl.org/metadata/dublin_core_elements) refers to the date
the resource was made available in its present form. We believe that
only a title search including the date would pick up the date in the
title field. You have also included the date in the description which
hopefully future search engines will be able to pick up. The only
additional thought we had on this was to enter the date alone in a
repeated description field in the hope that this might increase its
chances of being picked up. You may know better than we do whether this
is likely.
We notice that you have repeated the keywords element several times. Is
this to increase the likelihood of discovery by a search engine, or a
higher rating on the search list?
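Whether repeated keywords help or hinder ranking is left open above. If it is decided to tidy them, the Python sketch below removes duplicate entries from a meta keywords string while keeping the first occurrence of each in its original order. Matching is case-insensitive; that is a design assumption, and the function name `dedupe_keywords` is hypothetical.

```python
def dedupe_keywords(content: str) -> str:
    """Drop repeated keywords (case-insensitively), keeping first occurrences in order."""
    seen = set()
    kept = []
    for kw in content.split(","):
        kw = kw.strip()
        key = kw.casefold()
        if kw and key not in seen:  # also skips the empty entry left by a trailing comma
            seen.add(key)
            kept.append(kw)
    return ", ".join(kept)


# A fragment of the keywords content quoted earlier from FOD0505.html.
fragment = ("expedition, journey, travel, path, thread, combing screens, new medium, "
            "project, expedition, journey, travel, path, thread,")
print(dedupe_keywords(fragment))
```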
Unfortunately, the Dublin Core elements are not yet fully developed.
There is the potential for using some additional elements which could be
useful in this case, eg 13 Relation and 14 Coverage, but they are still
experimental and may change.
I am interested in the concept of the relation between The Flight of
Ducks in its digital form and the original materials. You have a brief
description of the original materials under 'Description' in the
metadata attached to FOD0001. I wonder if this would not be better
placed under element 11, Source, with Description providing a
description of The Flight of Ducks itself. At all levels the metadata
really refers to the digital object/s. Elements such as Source and
possibly Relation could be used to describe the relationship of the
digital object to the original materials.
Another element you may wish to use is 15, Rights Management. 'The
content of this element is intended to be a link...to a copyright
notice, a rights-management statement...' This may be particularly
important for original materials, especially if/when the stage is
reached whereby searchers are able to access individual objects, without
necessarily having to come in through your home page.
It is certainly not a waste of time, but if you are using the Dublin
Core metadata, we believe that you have put it in the wrong place.
Under DC, the description element is defined as 'a textual description
of the content of the resource, including abstracts in the case of
document-like objects or content descriptions in the case of visual
resources...' The URL or other unique identifier should go in element 10,
Resource Identifier.
The fuller the description, the greater the potential usefulness to
researchers. In the future, when search engines search metadata, it
should also increase the likelihood of discovery. However, it depends
on what you are trying to achieve and how much time you have to spend.
Do you mean in the title field? Help or hindrance to what? We are
unsure whether they would help or hinder discovery. As far as we know
search engines are not case sensitive.
Things are going very well. The PANDORA team has completed the Logical
Data Model for the archive management system and is in the process of
seeking comment on it from other groups within the Library, including
the IT people. The next step is the Technical Infrastructure Document.
We have put in a lot of hard work to reach this stage and it is exciting
that now we have something to show for it. We are just about to
commence some intensive work on screen design for data input.
The proof of concept archive is also progressing. I was able to give a
modest demonstration of a few titles to the AVCC PANDORA Advisory
Committee last week and this went successfully.
People are always fascinated by FOD. They like the flying ducks and the fun of trying to
catch them with the cursor. Unfortunately, to date, we haven't been
able to demonstrate FOD to best advantage as the Harvest software hasn't
successfully captured many of the images. Our most recent modification
to Harvest late last week did a much better job. When I'm happy with it
I'll send you the URL so you can go in and have a look.
By the way, what did you use to create the flying ducks? I am doing
Computers and Computing at the University of Canberra and we have been
using Visual Basic 4 to make a butterfly fly backwards and forwards
across the screen.
Another side question, just out of curiosity. Do you intend, at some
time in the future, even the distant future, once you have finished with
them, to place your father's journals, photographs and artifacts in the
care of a collecting institution such as a State or the National
Library? Researchers discovering the digital object may well seek access
to the original for some reason or other, perhaps for publication of a
photograph in a print book. How will you manage increased public
awareness of your materials?
Simon, I hope this helps a bit. Please feel free to ask more questions.
It makes us think of issues from the publisher's point of view which
helps our own work.
Best wishes Margaret E Phillips
Dear Simon
At last, the answers to your questions. I am sorry they have been so
long in coming.
Flight of Ducks was chosen because it conforms to the selection
guidelines developed by the Library's Selection Committee on Online
Australian Publications (SCOAP). These guidelines are available on our
server at http://www.nla.gov.au/1/scoap/scoapgui.html. Your publication
meets a number of criteria outlined in Section 4. It is about Australia
and is written by an Australian author, it does not exist in print, it
has authority and research value.
In addition, we are pleased to be preserving Flight of Ducks because of
its innovative approach, because you are setting high standards for
online publication, including metadata, and you are willing to discuss
these issues with us and to cooperate with us. This will help us to
develop our own understanding of online publishing and assist us to
develop a national model for preserving it.
To date we have selected over 160 online publications of various types
for preservation. We plan to preserve all titles that conform to the
guidelines. It's a large task and a daunting one at this stage when we
have only a wobbly prototype archive of six titles. Flight of Ducks is
one of them. During the next couple of weeks, I hope to feel confident
enough to give you access to it so that you can see it and we can
benefit from your feedback. At this stage links with the archive are not
reliable.
I am not sure exactly what you are asking here. Digital media is
theoretically anything that is published in digital rather than analogue
format and includes sound, photographs, film as well as text. The
PANDORA Project will not include digital objects which are purely sound
recordings, photographs or film. (We would expect the National Film
and Sound Archive to take care of the film and sound.) PANDORA will of
course archive photographs, film and sound where they are used within
publications that meet the SCOAP guidelines.
The principal focus of PANDORA at the moment is Internet publications,
but we are also considering the need in the future to migrate
information from CD-ROMs and floppy disks, and we envisage managing this
information within the PANDORA model.
If this doesn't answer the question adequately, please let me know.
I shall answer this question from the Library's point of view.
Cost
One of the major issues for us and deposit libraries in general is that
of cost. The British Library Research and Development Department
commissioned a report from a consulting company, Cimtech, which, among
other things, illustrated the high cost to libraries of managing
electronic publications in addition to the continuing and increasing
business of managing print collections.
The costs of setting up manual and automated systems to manage and
preserve digital publications were calculated and compared with the
costs of managing print materials. The cost of managing/preserving a
printed monograph for the first 25 years is 122 pounds. The total cost
of managing/preserving a digital publication over 25 years with an
automated system is 5150 pounds and with a manual system is 2350 pounds.
Cimtech did point out that these comparisons are crude and that there
are factors that ameliorate the extremely high cost for digital
publications. The average digital publication is much larger than the
average print publication, and it may be fairer to compare the cost of
ten or even 20 print publications with the cost of one digital. In
addition, the digital costs are based on a whole system having been put
in place to manage between 500 and 1000 publications per year. There
will be economies of scale when volumes rise. On the other hand, these
costs do not include the additional costs of developing long-term
preservation solutions, and in that sense they underestimate the cost
of preservation.
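To make the "crude" comparison concrete, the calculation below uses only the three Cimtech figures quoted above (pounds, over 25 years); the function name is illustrative and nothing else from the report is modelled (no economies of scale, no long-term preservation costs). Against one printed monograph, a digital publication costs roughly 42 times as much with an automated system and 19 times as much with a manual one. Against 20 monographs, the manual figure falls just below parity.

```python
# Figures as quoted from the Cimtech report (pounds, managing/preserving for 25 years).
PRINT_MONOGRAPH = 122
DIGITAL_AUTOMATED = 5150
DIGITAL_MANUAL = 2350

def ratio(digital_cost: int, n_print: int) -> float:
    """Cost of one digital publication relative to n printed monographs."""
    return digital_cost / (n_print * PRINT_MONOGRAPH)

# Compare one digital publication with 1, 10 and 20 print publications,
# following the report's suggestion that a digital item may equal 10-20 print ones.
for n in (1, 10, 20):
    print(f"{n:>2} print: automated x{ratio(DIGITAL_AUTOMATED, n):.2f}, "
          f"manual x{ratio(DIGITAL_MANUAL, n):.2f}")
```
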
One of the report's conclusions was that, 'as it stands today, the
management and preservation of digital publications is a very expensive
exercise and could only be justified on a very selective basis'.
Selection
This conclusion is an issue in itself for deposit libraries like the
National Library. Traditionally, in print, we have seen it as our
responsibility to collect the nation's documentary heritage
comprehensively. Because of the volume and variable quality of online
publications, as well as the expense of preserving them, we are being
forced to be selective. There is already a good spirit of cooperative
collection development in Australia, particularly among the State and
National libraries (we are envied by our counterparts in the UK and
North America), but electronic publications make this all the more
essential. We cannot afford, at the taxpayers' expense, to undertake
unnecessary duplication.
A higher level of cooperative collection development is therefore one of
the issues that emerges in the preservation of electronic publications.
This is labour intensive and, from past experience, takes time to bring
about. It means that when the Library makes a decision not to select a
title, we have to try to ensure, if it has any worth at all, that it
will be preserved by someone else.
Lack of legal deposit provisions
The issue of selection is also related to another, the lack of legal
deposit provisions for electronic publications. This means that the
Library has to seek them out and negotiate conditions of access and
archiving with individual publishers. This is resource intensive and it
also means that often the conditions of use are not uniform, resulting
in inefficiencies when it comes to applying procedures.
Migration of data
There is the need to build in strategies for managing the migration of
information when one technology expires and another takes over. In the
print world, a library can take in a book or a serial, provide
recommended storage conditions of temperature and humidity and have a
reasonable expectation that the item will still be in good condition in
100 years' time. The preservation strategies for electronic publications
must be much more active (and therefore costly). We have to build into
our archive management facility now the ability to migrate data as it
becomes necessary in the not too distant future. We have to 'future
proof', which means building in more options than may ultimately be
necessary. Again, this is costly.
A complex issue that cuts across a number of the ones already mentioned
is that the very nature of online publishing and one of its major
advantages, its ease of transmission, also brings with it some
disadvantages. The ease of copying and transmission causes publishers
and authors, justifiably, to be concerned about their intellectual
property rights and commercial interests. The National Library respects
these concerns and has no wish to prejudice authors' and publishers'
rights. However, it has an ironic outcome for libraries such as ours.
While the technology theoretically permits greater dissemination of
information together with preservation of one copy only of a given
publication, social and commercial factors actually inhibit this.
As a library with national responsibilities, we find ourselves in the
position of being able to offer our remote users less of a service for
electronic publications than we have been able to offer for print. It
also means that, at least in the short term, we may have to preserve
more copies of an electronic publication than we have had to do with
print. For example, a company that is publishing commercially on
the Internet has just this week agreed to deposit with the National
Library one copy of each of its titles on the condition that we provide
access to in-house users only. We are willing to agree to this. So
that readers in other States also have access to the titles, the
publishing company is generously depositing a copy in each of the State
libraries under the same conditions. This means potentially that eight
copies will be deposited and preserved. In the print world, one copy
would have been deposited in the National Library, one in the relevant
State library, and the National Library would have purchased an
additional copy for interlibrary loan to serve remote users.
Rather than enabling the library community to archive fewer copies of
electronic publications to save costs, when the expense is so much
greater, the technology may make it necessary, at least in the short term
while commercial interests and copyright apply, to archive more.
These are the main issues as I see them. I hope that helps. If you
need more information, please let me know.
Three of us on the PANDORA team have been pondering your most recent
questions on the datascape and associated metadata tag. You are ahead
of us on these matters. The two more technically minded have gone away
to think more about them over the weekend. I have been asked to assure
you that we have provided for the storage of such a structure in the
data model that we are currently working on. One of us will get back to
you on this next week.
Three of us from the Project will be at AusWeb97. Are you going? It
would be good to have the opportunity to meet you since I missed you at
the Metadata seminar.
Best wishes
Margaret (mphillip@nla.gov.au)
Dear Simon,
Thanks for your prompt and provocative response. I must say I do
appreciate an honest and forthright opinion.
First of all - I assumed (one day I will learn never to assume
anything!) that because you had quoted so extensively from 'Capturing
the Rainbow', the conference I organised last year when I was Director
of the National Preservation Office (NPO), that you knew who I was. I
now head NIAC - National Initiatives and Collaboration, a new Branch
within NLA which incorporates the functions of the NPO. I am convenor of
PADI, a group of people from a number of different areas who are
attempting to raise awareness of the consequences of not preserving
access to important digital information. We recognise that much digital
information evolves, but we are also conscious that we should not keep
only the latest version of all electronic objects. Different rules will
be required to deal with different categories of digital objects.
I apologise for including the wrong URL for the 'What's happening?' page
in my last message to you. It should have been:
http://www.nla.gov.au/dnc/tf2001/padi/happen.html. Somehow I included
the URL for your e-pub, which you say is out of date. Is there a later
version?
As far as the search mechanism is concerned I am interested to see that
you did not retrieve any information. Did you use the Boolean search or
the natural language method? The natural language method usually gives
good results. We are thinking of removing the Boolean option.
I am keen to leave my 'comfort zone' - I recognise that we are in a
transition period and that librarians and archivists (and I am neither)
may not even exist in the future but I disagree that none of them are
prepared to tackle the real issues. There are discussions of these
issues. If you are interested I can tell you about some.
Yes, text is totally inadequate to describe images and other
non-document type objects, but work is taking place to find alternative
methods. In the meantime we are in the ridiculous situation where,
simply because we are now operating in a digital environment, we are
finding cataloguing and indexing resources to describe works which have
been in our collections for ages but which have never been catalogued
because the process was considered too labour intensive. And it is
not as if we are presenting the images in new ways; we are merely
creating a digital version of the paper versions.
I would like to have more information on 'The uses of text'.
Cheers, Jan
Dear Simon,
I was very interested to read this paper. I am conscious of your
publication 'Flight of Ducks' through my involvement with the PANDORA
project at the NLA which has identified your work as an electronic
publication worthy of preservation.
As you will be aware from reading 'Capturing the Rainbow' I am very
interested in providing long-term access to digital information and am
attempting through the work of PADI (Preserving Access to Digital
Information) to provide comprehensive and current information on all
aspects of creating and managing digital objects.
I would be interested to hear your comments on the PADI website:
http://www.nla.gov.au/dnc/tf2001/padi/padi.html
We have recently introduced a new development, the 'What's happening?'
page http://www.duckdigital.net/FOD/FOD0055.html
We are also in the process of providing an improved search mechanism for
the site. It can currently be viewed at -
http://www.intext.com.au/intext.txb/rljpadi2/search.htm
I look forward to hearing from you. Jan Lyall - (jlyall@nla.gov.au)
Hello Simon,
Was your trip to Hermannsburg Mission what you were expecting?
The reason for my message...
Maggie Jones came to me this morning to ask if I could contact you about one
of the quotes from her that you used in "Lest We Forget". For some reason
she was looking through your essay again recently when she thought something
didn't quite come out right, and on checking her own paper she realised you
had slightly edited something she said.
It's the quote you put under the "Death in Custody" heading, the last
sentence...
These national mechanisms have yet to
be worked out - other institutions take on roles as both facilitators
and active participants in preserving digital information.
She believes the end of the sentence does not portray the message correctly
and would prefer it if you would remove the end of the sentence instead so
that it would read:
I hope this is not too much to ask. Deborah Woodyard (Dwoodyar@nla.gov.au)
I read your essay with interest. I have a rather professional concern in
this area - I'm a librarian who works for one of the library software
companies (although I'd rather be an historian...).
A number of our clients are getting into electronic services, and
maintaining electronic collections - my job is to get them thinking about
what it is they are actually doing so that they have a clear picture of
what they are trying to achieve, and hopefully to achieve it.
I do find that the demands of the new technologies are rapidly overtaking the
ability of most of our clients to absorb what is happening. As one of our
systems analysts says, "the trouble is they are librarians and not
information technologists". However even he is having difficulty (and he
would not admit it) coming to grips with the net and how to negotiate it,
and its implications. If anyone in our small office wants a search on the
net, they ask me to do it. Partly because I know how to, but also because I
have a different level of the visual and textual literacy needed to analyse
what I have found. And I am by no means a "nethead".
I had already seen many of your reference sites, particularly the Capturing
the Rainbow conference papers. One site you did not mention, which may
interest you, is the D-Lib electronic journal. This is an American e-journal
which discusses information technologies. It is rather dry and academic but
has raised a number of the issues you are concerned with. The address is
www.dlib.org
Now for some random comments.
The essay reads like something that has been bubbling under the surface for
a while, has generated lots of thought, but has only now been put onto
"paper". As a result, it is not as focused as it could be. I thought that
once you got onto HTML and ASCII, it became a second essay that did not gel
with the initial sections. And I am not too sure how Palace16 fits in.
In my professional life, as well as my academic life, I have also spent a
good deal of time thinking about these issues. And of course, when thinking
about it I find myself arguing against myself and holding various mutually
exclusive viewpoints at the same time.
With rapid changes in technology, particularly where electronic and digital
resources are concerned, it often appears that people can't see the wood for
the technology. They lose their critical faculties and ability to step
back and examine the "whole" and the content - it gets confused with the
format. Certainly some of the librarians I deal with suffer from this
syndrome! I don't think it is training, or the nature of the profession. I
think it is more a way that the world is conceptualised.
Anyway some comments...
The first sections of the essay are a fairly "romantic" view of technology
and memory. There is a sense of loss about it. And you make some very
interesting points in the sections about the fragility of cultural memory.
However, is the issue really that the information/data is now spread around
in different media in different places? After all, even in the
"traditional" library and archive environment, collections were also spread
around and in different formats. Part would be in this building, part would
be in that, different institutions, often in different countries, hold
portions of the whole. Only think of the writings of Sylvia Plath - Smith
have some, Colorado(? another US uni) have others, the BBC have sound
recordings and Ted Hughes has more, yet it is still the collected oeuvre of
Sylvia Plath. "Sylvia Plath" does not cease to exist because all the bits of
paper are not on the same shelf.
It seems to me that what you fear(?), or maybe it is what I fear and I
attribute it to you, is the loss of the "grand narrative" that enables us to
imagine these disparate things and formats as part of a whole. And by
extension, the loss of the ability to create the grand narratives of
information in the first place.
I guess it could be likened to the transition period that oral cultures go
through when they become more literate. The stories are still there, but
the links are missing - the stories are not always lost but the social
cohesion goes missing. People don't always realise that they can adapt
and continue with the stories.
I've been thinking about your essay all week, but unfortunately I can't seem
to get my thoughts on screen. I am not sure my comments are relevant, but
you can have them with my good will.(lindac@altarama.com.au)
It is interesting stuff. However,
preservation of information is not a recent problem (let's face it, nothing
is forever), and neither is obsolescence of a delivery mechanism (one could
look at carved stone and hieroglyphics as 'obsolete technologies'). Perhaps
the thrust of this tale is less about storage technologies and more about
access to information and defining relevant information. Important
information will always translate into the latest medium and language (the
Bible) and be easily accessible to all (in the drawer next to every hotel
bed, etc.). Evolving technology is kind of like moving house. It is the
working out of where everything is and what you want to take with you that is
important, more so than the cupboard it's kept in.
Interesting point: word has it that many Asian countries are now gearing up
for CD-ROM technologies rather than online delivery. It appears they like
how you can control what information is put on a CD-ROM, unlike the anarchy
of the Web.(kimba@aim.py.rmit.edu.au)
Preservation of information has always been one of the major goals of the
Xanadu project. I believe Ted Nelson once said that he saw Xanadu as
"the magic place of literary memory, where nothing is forgotten".
This has also long been one of my goals and one of the many reasons why I
became involved with the project - to help build the "posterity machine",
if you will. Therefore, your paper was of considerable interest to me.
Technological Obsolescence
This is one reason why we have long advocated that document
meta-information (sometimes also called "markup") should be stored in a
parallel data stream with separate addressing from the raw data itself,
permitting the same data to have several different kinds of
meta-information applied to it. New file formats could then potentially
be layered over the top of the existing ones. This would also ensure
that the raw data remains readily accessible even when software capable
of interpreting the meta-information is not available.
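Keeping markup in a parallel stream, addressed by position rather than embedded in the text, is often called standoff markup. The sketch below is a minimal illustration under stated assumptions: character offsets are the addressing scheme, output is HTML-like tags, annotations within one layer do not overlap, and `Annotation` and `render` are hypothetical names (this is not Xanadu's actual design). The raw text is never modified, so it stays readable even without the renderer, and two independent layers can describe the same data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character in the raw text
    end: int    # offset one past the last annotated character
    tag: str    # hypothetical tag name, e.g. "cite" or "term"

def render(text: str, annotations: list[Annotation]) -> str:
    """Layer one set of non-overlapping annotations over the raw text."""
    # Build insertion events; at equal offsets, closing tags (0) sort before opening tags (1).
    events = []
    for a in annotations:
        events.append((a.start, 1, f"<{a.tag}>"))
        events.append((a.end, 0, f"</{a.tag}>"))
    events.sort(key=lambda e: (e[0], e[1]))
    out, pos = [], 0
    for offset, _, markup in events:
        out.append(text[pos:offset])  # copy raw text up to the event
        out.append(markup)
        pos = offset
    out.append(text[pos:])
    return "".join(out)

# The raw data and two independent layers of meta-information over it.
raw = "Flight of Ducks is an online publication."
presentation = [Annotation(0, 15, "cite")]
semantic = [Annotation(22, 40, "term")]
print(render(raw, presentation))
print(render(raw, semantic))
```

Because the layers only refer to offsets, a new "file format" amounts to a new layer over the same text, which is the property the paragraph above argues for.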
Actually, it has been extremely interesting to see the astonishingly
broad range of emulation software being created and made available
for the Linux operating system, essentially because the authors had
obsolete documents or software they wanted to preserve. As a result
support is presently available for filesystems including DOS FAT,
Windows 95 VFAT, OS/2 HPFS, the SCO Unix filesystem, MacOS HFS and
the Amiga filesystem and for the execution of programs for the DOS,
Windows 3.1, MacOS, Amiga, CP/M, Commodore 64 and Atari 2600 games
console systems on current PC hardware!
Another example is the widespread support for text-based adventure games
and interactive fiction dating back to the earliest known examples of
the genre. A standard format, the Z-machine story file (originally
compiled by Infocom from its ZIL, the Zork Implementation Language), was
adopted; interpreters for that format are now available for just about
every computing platform, and the earliest games which were not written
for it were translated to that format in order to preserve them.
As a result I can still play Crowther and Woods' original Colossal
Cave Adventure from two decades ago on Windows, OS/2 and Unix, and I
expect I shall still be able to play it another four decades from now
on whatever computer equipment is available at that time.
The key point here, of course, is that these examples have been driven
not by economic considerations but by the desire of talented programmers
to preserve obsolete material because of the value (often entertainment)
they gained from that material in the past. So long as the Internet
culture of open access at essentially no cost to the tools and
information necessary for talented people to create such emulation
software continues, making it possible for people to assist with
preservation on a volunteer basis at the cost primarily of their time,
I believe that it will remain possible to preserve the majority of
digital information available today.
Describing Digital Information
I would prefer to break digital information into three categories rather
than the two defined by the Task Force on Archiving of Digital
Information: Interactive documents including computer programs,
hypermedia (interactive multimedia) and "virtual reality",
non-interactive documents with a time component including audio, video
and multimedia, and static documents which do not have a time component
and are not interactive, such as text and images. The last two
categories have traditionally been delivered and preserved without the
use of computers, and the last category has traditionally been delivered
and preserved without the use of any equipment to access the media.
The bulk of archival holdings falls into the last category because it
represents the longest historical period. The middle category is where
much of the focus of contemporary culture lies, but part of the ongoing
digital revolution is the shift in popular culture towards the first,
interactive category. It is this category which, owing to its
relatively recent emergence and inherently more complex nature, naturally
raises the most unsolved problems for preservation and archiving.
I don't believe that Baudot, which used 5 bit codes capable of
representing 32 distinct characters if no shift codes were used, was
ever unable to represent the 26 characters of the Western alphabet. As
I understand it, the reason for the shift codes was to permit the
representation of numbers and punctuation.
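The arithmetic behind this point is short enough to check directly. The sketch below only counts bit patterns; it deliberately uses no real Baudot/ITA2 code table, and the assumption that two patterns are spent on the letters-shift and figures-shift codes is labelled in the comments. It shows that 32 patterns already cover 26 letters with 6 to spare, so the shifts are needed for figures and punctuation rather than for the alphabet.

```python
# Capacity of a 5-bit code such as Baudot's (pure arithmetic, no code table).
BITS = 5
LETTERS = 26                    # letters of the Latin alphabet, one case only

codes = 2 ** BITS               # 32 distinct bit patterns
spare = codes - LETTERS         # patterns left for space, carriage return, shifts...

# Assumption: two patterns act as the letters-shift and figures-shift codes, and
# every other pattern gets a second meaning in the other shift state. Patterns such
# as space usually mean the same in both states, so this is only an upper bound.
shifted_upper_bound = 2 * (codes - 2)

print(codes, spare, shifted_upper_bound)
```
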
This is simply not the case. There is only one ASCII code and it is
exactly the same on all computers that support it, including the Mac and
PC. However, the ASCII code only defines 128 characters and many
computers define another 128 characters to create a proprietary
"extended ASCII" character set. The majority of these have now been
standardised and codified as ISO character sets, with "Latin-1" being
the commonly assumed default in most modern systems. The character set
in use also does not address semantic issues such as the permissible
lengths of lines or the encoding of the end of a line or a document.
This is where most of the incompatibilities arise; MacOS uses a carriage
return to mark the end of a line, Unix uses a line feed and DOS, Windows
and OS/2 use a carriage return and line feed. These are certainly representation
issues, but not arising from the interpretation of the ASCII code.
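Both halves of this point can be demonstrated with Python's standard codecs. In the sketch below, a byte above 127 decodes to different characters depending on whether ISO 8859-1 (Latin-1) or the classic Mac OS character set is assumed. The line-ending problem is handled separately by a small normaliser. The function name `normalise_newlines` is illustrative, and converting everything to the Unix line feed is just one possible convention.

```python
import unicodedata

ASCII_SIZE = 2 ** 7   # ASCII is a 7-bit code: 128 characters, the same on every machine

# Above 127, a byte's meaning depends on which 8-bit "extended" set is assumed.
byte = b"\xe9"
as_latin1 = byte.decode("latin-1")    # ISO 8859-1
as_mac = byte.decode("mac_roman")     # classic Mac OS character set
print(unicodedata.name(as_latin1), "|", unicodedata.name(as_mac))

def normalise_newlines(data: bytes) -> bytes:
    """Rewrite DOS/Windows/OS/2 CR LF and classic Mac OS CR endings as Unix LF."""
    # CR LF must be replaced first; otherwise it would become two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

dos = b"line one\r\nline two\r\n"
mac = b"line one\rline two\r"
unix = b"line one\nline two\n"
print(normalise_newlines(dos) == normalise_newlines(mac) == unix)
```

The three sample byte strings contain identical ASCII characters and differ only in how the ends of lines are encoded. This is the kind of incompatibility described above, and it is unrelated to the ASCII code itself.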
Markup
I would suggest that "markup" is another name for meta-text, rather
than a variety of it.
This is a common misunderstanding. HTML is not a version of SGML.
SGML is a language in which meta-information codings can be described.
Such applications of SGML which describe a particular markup coding are
known as DTDs (Document Type Definitions), and the various levels of the
HTML standard (with some minor but unfortunate exceptions) are examples
of SGML DTDs. Thus, an HTML document with the corresponding DTD
comprises a perfectly valid example of a full SGML document.
Again, this is incorrect; SGML does not describe documents, it describes
markup which may in turn be used to describe documents. HTML is limited
in its expressive ability because it lacks desirable features which are
being continually added in successive revisions, most notably the recent
addition of "Cascading Style Sheets".
Correct, except that HTML is no longer merely a proposed MIME type but
a standard one (text/html).
Conclusion
An excellent paper which addresses many interesting and important
issues. May I suggest that the Xanadu Project is also relevant to this
paper since, as I mentioned earlier, preservation is one of its key
concerns. The Xanadu home page is at
http://www.xanadu.com.au/xanadu/
Other projects inspired by Xanadu may also be relevant, especially
HyperWave (at http://www.hyperwave.com/), which seeks to separate the
metadata from the document according to the Xanadu principles and
provides translation layers to present existing information stored in
the document database in a variety of formats using the current
Internet standards as they evolve. HyperWave is aimed particularly
at Digital Library applications and is already in use by the European
Space Agency and under investigation by the US Library of Congress.
(avatar@glasswings.com.au)
Many thanks - a most interesting 'interactive essay' which I've now pointed to
from our National Cultural Network (NCN) prototype web site (not yet available
on the Internet).
You may have heard that the NCN got up in the Budget - media release at url
http://www.dca.gov.au/mediarel/online.html
If you are going to be in Canberra at some stage do let me know and come and
see our prototype web site (we've already pointed to 'The Flight of Ducks'
exhibit). We need to encourage others to develop more exhibitions like yours
and to contribute to a national online Forum where 'digitisation' issues can
be aired and discussed. We plan to build a 'virtual gallery' and 'Forum' into
the NCN web site. I'm sure you'd have some good ideas to contribute.
Keep up the good work and keep in touch.(RHewison@dca.gov.au)
I've glanced at your publication and was immediately entertained and
engrossed. I agree with your statements/opinions/viewpoints and will be
happy to respond after I've looked at it all (in both paper and on-line
format). Reading lots of text is still easier on paper. However,
interactivity (and structure) can only be appreciated on a computer, which
is a strange concept in itself: how can 'structure' be more concrete
(evident) in a digital environment?(desmmm@Underdale.UniSA.edu.au)
Thank you very much for sending me your essay. I've shown it to Colin Webb
and Maggie Jones also, and we are extremely pleased to see people such as
yourself outside the National Library interested and concerned about these
issues too.
Your essay is very interesting as promised and it covers the subject and
issues very well.
As you may have anticipated, I would like to reply to your suggestion under
the heading "No Conclusion Only Evolution" that our institution has not
addressed the issues you raised. We are seriously addressing these issues.
However, we do not claim to have solved them. We have asked ourselves the
questions you listed and more, and we are gradually finding answers. There
is steady progress being made and hopefully it will be widely discussed as
soon as we have some results.
Maggie was concerned that you were given the wrong impression in part of her
paper. In the section "Death in Custody" you talk about the preservation
section's lack of skills and equipment and failure to define the problem, but
Maggie was trying to highlight the need for many different areas of the
institution to be involved. Preservation Services are more aware of the
issues than other sections and we are trying to build the infrastructure
needed to link all the areas involved in this complex issue. We believe we
now have defined the problem, and this is not a problem that has existed for
30 years.
Have you heard any more about Vic. State Film Centre's Cinemedia? Last we
had heard, it hadn't been started. I'd be very interested to hear more
about where it's at.
I believe the Library is still very keen to work with you and "The Flight of
Ducks", even if things seem a bit slow on that front at the moment. I hope
we will soon find the opportunity to communicate on these subjects again.
(Dwoodyar@nla.gov.au)
Great ideas which I feel deserve to be published and given wider
circulation. No time tonight to go into more detail, but I will later.
(birdy@aim.py.rmit.edu.au)