Posting to NGC4LIB
Liber has recently published a very interesting article (http://liber.library.uu.nl/publish/issues/2009-2/index.html?000472) in which the creators of the digital library Europeana asked for an outsider’s view of their project and invited Ricky Erway of OCLC to provide it. This was quite a courageous act by Europeana, and I commend them.
The entire article is quite thought-provoking and I am still reading it, but Ms. Erway has some interesting things to say about metadata in a shared, internationalized environment. I take the liberty of quoting the entire section:
My theories on metadata are
1. We do not need another standard.
2. People will use standards, but not in standard ways. Surprising choices are made even in using plain old Dublin Core. Having to hunt for or transform data, based on site-specific rules, does not easily scale.
3. People say they want to be told what to do, but they will not do it, because their situation or collection is unique.
4. No one likes their own metadata.
5. Mapping is a mythical grail.
What follows is a gross generalization (to which I have found no exceptions): Librarians want metasearch or federated searching. They do not like their own implementation. They blame the deficiency on metadata mapping. If they just had a better crosswalk, it would be better. So they change their software, retool with better mapping, and they still do not like it.
The reason is that a butterfly specimen has entirely different metadata than a painting of a butterfly. Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly? Just how many fields can be mapped between these two records?
My recommendation is to require a very small set of common elements and allow the rest to aid free text searching. Europeana’s adoption of OAI-PMH and Dublin Core is a good thing. It precludes the development of yet another approach and adopts one that others may already be using. Requiring some very basic elements makes some advanced searches or filtering possible. If participants are allowed to leave required elements empty, it will render those documents not discoverable. Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That’s pretty much what users do anyway, type words in a box. Google manages to make it work.
User-generated information is intriguing. Access points that users use can be added to the ones we use. And we may get very rich information from experts. But there is a management headache when the data being augmented is in an aggregation. How do you coordinate giving enriched records back to contributors? If you do not or if they do not incorporate them into their catalog, then how do you coordinate updates from the contributor to the records that have been enhanced?
I don’t know if I agree with this or not. I do agree with the idea that metadata creation/cataloging can get far too theoretical and thereby lose a sense of practicality. Her “Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly?” is a good example of this tendency.
Still, I consider this more a matter of practitioners losing focus on the very purpose of the catalog: a “can’t see the forest because of the trees” syndrome. For example, perhaps we should consider that it is relatively unimportant whether somebody/something is a creator or contributor or editor or web manager or whatever. The question should be: would somebody want to find this particular resource by searching this particular entity? If so, would they want to search this entity in a way that has something to do with the creation of the intellectual aspects of the resource, or with the publication and dissemination aspects of the resource, or with the various ways of describing the resource, perhaps the titles and subjects? Naturally there are difficulties with this: is a conference a name, a title, or an “event”? How should dates of creation, editing, etc. be distinguished from dates of publication or issue? But such issues should not divert us from the essence of the matter, and many times these discussions are purely theoretical, with little or no impact on how people search, retrieve and understand metadata.
I think a fundamental idea is being lost among the populace: that a well-organized catalog truly allows searching for *concepts*. For example, she writes: “Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That’s pretty much what users do anyway, type words in a box. Google manages to make it work.” I cannot agree with this. When people type, e.g., “wwi” into a box, it doesn’t follow that they realize that they are searching the *text* “wwi” and not the *concept* of the war that took place from 1914 to 1918. When I have pointed this out to people, they are shocked that by typing “wwi” into the box they miss, by definition, anything written before 1938, because nobody called it WWI until there was a WWII. Once the public realizes this, it becomes clear to them, and they are not so happy with Google results; but unless you have worked at this for a long time, as professional catalogers have, you will never realize it. And of course, when you consider the totality of languages and how languages change, it is a far more complex and subtle matter than any single person can understand. Non-textual materials (music, videos, images, etc.) have entire realms of other considerations as well.
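To make the text-versus-concept distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the four records, the tiny "authority file" of variant terms, and the function names are all made up, and no real catalog or search engine works this simply. The only thing borrowed from practice is the controlled subject heading "World War, 1914-1918".

```python
import re

# Hypothetical toy records: the titles use whatever words the items themselves use.
RECORDS = [
    {"id": 1, "title": "The Great War: a history", "year": 1921,
     "subjects": ["World War, 1914-1918"]},
    {"id": 2, "title": "WWI in photographs", "year": 1995,
     "subjects": ["World War, 1914-1918"]},
    {"id": 3, "title": "Memoirs of the late European war", "year": 1925,
     "subjects": ["World War, 1914-1918"]},
    {"id": 4, "title": "WWII naval battles", "year": 1960,
     "subjects": ["World War, 1939-1945"]},
]

# Hypothetical toy authority file: variant terms point to one controlled heading.
AUTHORITY = {
    "wwi": "World War, 1914-1918",
    "world war i": "World War, 1914-1918",
    "first world war": "World War, 1914-1918",
    "great war": "World War, 1914-1918",
}


def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def text_search(query, records):
    """Return ids of records whose title contains every word of the query."""
    wanted = set(tokens(query))
    return [r["id"] for r in records if wanted <= set(tokens(r["title"]))]


def concept_search(query, records):
    """Resolve the query to a controlled heading, then match on subjects."""
    heading = AUTHORITY.get(query.strip().lower())
    if heading is None:
        return []
    return [r["id"] for r in records if heading in r["subjects"]]


print(text_search("wwi", RECORDS))     # only the record that literally says "WWI"
print(concept_search("wwi", RECORDS))  # every record about the 1914-1918 war
```

In this toy data, the text search finds only the 1995 title, while the concept search also finds the two titles from the 1920s, which never use the string "wwi" at all.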
Also, the conclusion that “Google manages to make it work” does not follow, in my opinion. Google manages to *make people happy* with the results of the search, but that does not mean it really works the way people expect, as the WWI example above demonstrates. “Customer satisfaction” may be the correct goal for a company such as Google, but it is definitely not a satisfactory goal for doctors or lawyers, who are ethically compelled to tell you the truth whether it makes you happy or not. I like to think that librarians belong more to the latter group than to the corporate business group that follows the motto “Let the buyer beware.”
I cannot agree completely with her comment that “People say they want to be told what to do, but they will not do it, because their situation or collection is unique,” either. I think people do want to be told what to do, and, especially since people are scared today, they may be willing to cooperate more than ever; but they will not be dictated to, and therefore cooperation will not be 100%. Cooperation involves a vast amount of give and take among all the groups involved, and that means us as well. Plus, cooperation includes an element of trust that seems to be lacking at the moment.
So, her recommendation “to require a very small set of common elements and allow the rest to aid free text searching,” is absolutely necessary and I agree, but it does not obviate the need for a genuine organization of materials. How that can be done in a world of diminishing resources, higher productivity, and genuinely shared workflows remains to be seen.
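For what it is worth, the recommendation itself is easy to sketch. The Python below is only an illustration under stated assumptions: the choice of "title", "identifier" and "type" as the required elements is mine (the article does not say which elements should be required), and the function name and the two sample records are hypothetical. It rejects a record with an empty required element, keeps the required elements for filtering, and folds everything else into one free-text string.

```python
# Assumed set of required elements; the article does not specify which ones.
REQUIRED = ("title", "identifier", "type")


def index_record(record):
    """Split a record into filterable required elements and a free-text blob.

    Raises ValueError if any required element is missing or empty, since such
    a record could not be found through the common elements.
    """
    missing = [name for name in REQUIRED
               if not str(record.get(name, "")).strip()]
    if missing:
        raise ValueError("missing required elements: " + ", ".join(missing))

    filters = {name: record[name] for name in REQUIRED}
    # Everything beyond the required set is not mapped, only made searchable.
    extra = [str(value) for key, value in record.items() if key not in REQUIRED]
    return {"filters": filters, "free_text": " ".join(extra).lower()}


# Two hypothetical records with very different local metadata.
specimen = {
    "title": "Butterfly specimen",
    "identifier": "spec-001",
    "type": "PhysicalObject",
    "latin_name": "Papilio machaon",
    "habitat": "meadows",
}
painting = {
    "title": "Butterfly in a garden",
    "identifier": "paint-001",
    "type": "Image",
    "creator": "Unknown artist",
    "technique": "oil on canvas",
}

for rec in (specimen, painting):
    print(index_record(rec))
```

Note that nothing here tries to map "latin_name" to "creator" or anything of the kind; the unmapped fields are simply searchable as words, which is exactly where the question of genuine organization comes back in.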