Tuesday, January 19, 2010
Liber has recently published a very interesting article (http://liber.library.uu.nl/publish/issues/2009-2/index.html?000472) in which the creators of the digital library Europeana sought an outsider's view of their project and asked Ricky Erway of OCLC to provide it. This was quite a courageous act by Europeana, and I commend them.
The entire article is quite thought-provoking and I am still reading it, but Ms. Erway has some interesting things to say about metadata in a shared, internationalized environment. I take the liberty of quoting the entire section:
My theories on metadata are
1. We do not need another standard.
2. People will use standards, but not in standard ways. Surprising choices are made even in using plain old Dublin Core. Having to hunt for or transform data, based on site-specific rules, does not easily scale.
3. People say they want to be told what to do, but they will not do it, because their situation or collection is unique.
4. No one likes their own metadata.
5. Mapping is a mythical grail.
What follows is a gross generalization (to which I have found no exceptions): Librarians want metasearch or federated searching. They do not like their own implementation. They blame the deficiency on metadata mapping. If they just had a better crosswalk, it would be better. So they change their software, retool with better mapping, and they still do not like it.
The reason is that a butterfly specimen has entirely different metadata than a painting of a butterfly. Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly? Just how many fields can be mapped between these two records?
My recommendation is to require a very small set of common elements and allow the rest to aid free text searching. Europeana's adoption of OAI-PMH and Dublin Core is a good thing. It precludes the development of yet another approach and adopts one that others may already be using. Requiring some very basic elements makes some advanced searches or filtering possible. If participants are allowed to leave required elements empty, it will render those documents not discoverable. Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That's pretty much what users do anyway, type words in a box. Google manages to make it work.
User-generated information is intriguing. Access points that users use can be added to the ones we use. And we may get very rich information from experts. But there is a management headache when the data being augmented is in an aggregation. How do you coordinate giving enriched records back to contributors? If you do not or if they do not incorporate them into their catalog, then how do you coordinate updates from the contributor to the records that have been enhanced?
I don't know whether I agree with this or not. I do agree that metadata creation/cataloging can get far too theoretical and thereby lose a sense of practicality; her questions, "Who is the creator and what is the title or subject of a butterfly specimen? What is the Latin name or habitat of an impressionistic rendition of a butterfly?", are a good example of this tendency.
Still, I consider this more a matter of practitioners losing focus on the very purpose of the catalog, a case of "can't see the forest for the trees" syndrome. For example, perhaps we should consider that it is relatively unimportant whether somebody/something is a creator or contributor or editor or web manager or whatever. The question should be: would somebody want to find this particular resource by searching for this particular entity? If so, would they want to search for this entity in a way that has something to do with the creation of the intellectual aspects of the resource, with the publication and dissemination aspects of the resource, or with the various ways of describing the resource, such as titles and subjects? Naturally there are difficulties with this: is a conference a name, a title, or an "event"? How should we treat dates of creation, editing, etc. versus dates of publication or issue? But such issues should not divert us from the essence of the matter, and many times these discussions are purely theoretical, with little or no impact on how people search, retrieve and understand metadata.
In my experience, a fundamental idea is being lost among the populace: that a well-organized catalog truly allows searching for *concepts*. For example, she writes: "Allowing data beyond what is required will allow for better retrieval, but just through free text searching. That's pretty much what users do anyway, type words in a box. Google manages to make it work." I cannot agree with this. When people type, e.g., "wwi" into a box, it doesn't follow that they realize they are searching the *text* "wwi" and not the *concept* of the war that took place from 1914 to 1918. When I have pointed this out to people, they are shocked that by typing "wwi" into the box, they miss, by definition, anything written before 1938, because nobody called it WWI until there was a WWII. Once people realize this, it becomes clear to them and they are not so happy with Google results, but unless you have worked at this for a long time, as professional catalogers have, you will never realize it. And of course, when you consider the totality of languages and how languages change, it is a far more complex and subtle matter than any single person can understand. Non-textual materials (music, videos, images, etc.) have entire realms of other considerations as well.
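To make the text/concept distinction concrete, here is a minimal Python sketch. The authority entry, its variant names and the three records are all hypothetical, and a real catalog would search far more than titles; the point is only that a string search for "wwi" finds the one item whose title happens to use that string, while a search that first resolves "wwi" to a concept finds everything the catalogers attached to that concept, whatever it was called at the time.

```python
from __future__ import annotations

# Hypothetical authority data: one concept, with the names used for it over time.
AUTHORITY = {
    "wwi": {
        "preferred": "world war, 1914-1918",
        "variants": {"wwi", "world war i", "first world war", "great war", "european war"},
    },
}

# Hypothetical catalog records; the cataloger assigned the concept, not a string.
RECORDS = [
    {"id": "r1", "year": 1921, "title": "A History of the Great War", "subjects": {"wwi"}},
    {"id": "r2", "year": 1930, "title": "Memoirs of the European War", "subjects": {"wwi"}},
    {"id": "r3", "year": 2004, "title": "WWI in Photographs", "subjects": {"wwi"}},
]


def text_search(query: str) -> list[str]:
    """Match the typed string against the words on the item (here, only titles)."""
    q = query.lower()
    return [r["id"] for r in RECORDS if q in r["title"].lower()]


def concept_search(query: str) -> list[str]:
    """Resolve the typed string to concepts first, then retrieve by concept."""
    q = query.lower()
    concepts = {
        cid for cid, entry in AUTHORITY.items()
        if q == entry["preferred"] or q in entry["variants"]
    }
    return [r["id"] for r in RECORDS if r["subjects"] & concepts]


print(text_search("wwi"))     # ['r3']
print(concept_search("wwi"))  # ['r1', 'r2', 'r3']
```

The 1921 and 1930 items are exactly the ones the string search loses, because their titles use the names current when they were written.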
Also, the conclusion that "Google manages to make it work" does not follow, in my opinion. Google manages to *make people happy* with the results of the search, but that does not mean it really works the way people expect, as the WWI example above demonstrates. "Customer satisfaction" may be the correct goal for a company such as Google, but it is definitely not a satisfactory goal for doctors or lawyers, who are ethically compelled to tell you the truth whether it makes you happy or not. I like to think that librarians belong more to the latter group than to the corporate business group that follows the motto "Let the buyer beware."
Nor can I agree completely with her comment: "People say they want to be told what to do, but they will not do it, because their situation or collection is unique." I think people want to be told what to do, and especially since people are scared today, they may be willing to cooperate more than ever, but they will not be dictated to, and therefore cooperation will not be 100%. Cooperation involves a vast amount of give and take among all the groups involved, and that means us as well. Plus, cooperation includes an element of trust that seems to be lacking at the moment.
So her recommendation "to require a very small set of common elements and allow the rest to aid free text searching" is absolutely necessary, and I agree with it, but it does not obviate the need for a genuine organization of materials. How that can be done in a world of diminishing resources, higher productivity, and genuinely shared workflows remains to be seen.
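Here is a rough Python sketch of what that recommendation might look like in practice. Which elements make up the "very small set" is my assumption (identifier, title and type are hypothetical choices, loosely borrowing Dublin Core element names); Erway does not specify them. Records missing a required element are rejected, the required elements stay available for filtering, and everything else is reduced to a bag of words for free-text searching. Her butterfly specimen and butterfly painting serve as sample records.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical minimal core; Erway does not say which elements to require.
REQUIRED = ("identifier", "title", "type")


def missing_elements(record: dict) -> list[str]:
    """List required elements that are absent or empty."""
    return [e for e in REQUIRED if not str(record.get(e, "")).strip()]


def index_record(record: dict) -> dict:
    """Keep the required core as fields; reduce everything to searchable words."""
    missing = missing_elements(record)
    if missing:
        raise ValueError(f"missing required elements: {missing}")
    fields = {e: record[e] for e in REQUIRED}
    words = " ".join(str(v) for v in record.values()).lower().split()
    return {"fields": fields, "words": set(words)}


def search(index: list[dict], query: str, type_filter: Optional[str] = None) -> list[str]:
    """Free-text match on all query words, optionally filtered on a required field."""
    terms = query.lower().split()
    return [
        entry["fields"]["identifier"]
        for entry in index
        if (type_filter is None or entry["fields"]["type"] == type_filter)
        and all(t in entry["words"] for t in terms)
    ]


specimen = {"identifier": "spec-1", "title": "Papilio machaon", "type": "PhysicalObject",
            "common_name": "swallowtail butterfly", "habitat": "alpine meadows"}
painting = {"identifier": "paint-1", "title": "Butterflies in a Garden", "type": "Image",
            "creator": "unknown impressionist", "subject": "butterfly"}
index = [index_record(specimen), index_record(painting)]

print(search(index, "butterfly"))                       # ['spec-1', 'paint-1']
print(search(index, "butterfly", type_filter="Image"))  # ['paint-1']
```

Note what the sketch cannot do: "butterfly" finds the specimen only because someone happened to put that word in a free-text field. Nothing organizes the two records by concept, which is the gap between free-text retrieval and genuine organization.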
Friday, January 15, 2010
B.G. Sloan wrote:
Barbara Fister writes a nice essay on the future of the academic library:
Thanks for pointing to this excellent article. It's a nice summary of how various people are dealing with a world they see as disintegrating. The main problem is that we are in a moment of transition and it is impossible to know which way to choose. These are matters that seem much easier in retrospect, and I'm sure that in 30 or 40 years, as people reflect on whatever is in store, many will conclude, "Well, really they had no choice but to do ..." So people often conclude that Lincoln, Wilson, Roosevelt and Johnson (just to take U.S. presidents) were all compelled by outside forces to do what they did and had little personal choice. Of course, this ignores their reality, which was in fact filled with anguish and doubt.
It seems to me that there are essentially two ways of dealing with major transitional moments like this: the technical, bureaucratic way (or as I prefer to call it, CYA :-) or the dynamic way.
Both ways assume that the future cannot be foretold. The first method wants to avoid eventual blame if something goes wrong (as it most probably will), so you include as many people as you can: you form committees with pro/con papers and debates, and maybe even take votes occasionally along the way. This method may take quite literally forever, but it has the undoubted advantage of diffusing any blame when bad decisions are made, because no single person can be found to be at fault; the culprit is, in effect, "nobody."
The other way is much more dynamic. Someone gets an idea and runs with it, fixing problems encountered along the way. Some paths may turn out to be completely wrong and those involved have to face up to their mistakes, but if they are allowed to admit the problems quickly enough, the mistakes may not be so dire. More importantly, they may find others doing something similar and they can collaborate. This is the old idea of "trial and error," which works so long as people have the freedom to make "trials," while "errors" are freely admitted and not punished, or at least not too badly.
One method is not necessarily better than the other; a lot depends on the moment in time you happen to be living in. In peaceful times when changes are more or less predictable, the first method may be fine and in fact much better than the second method, which is almost always highly disruptive. But in moments of tremendous change, the first method can lead to extinction.
And now, to bring this back to the topic of "Next Generation Catalogs for Libraries," it is obvious that libraries are prime examples of the first method, while Google represents the second.
I think libraries must begin to freely admit that some things they cherish most are obsolete today. (This finally appears to be happening now.) But once they accept these truths, I believe that libraries, i.e. as a collective endeavor, *can* have a huge advantage over organizations such as Google, which, after all, is a company that must show a profit. Consequently, Google needs to drive people to Google products in all kinds of innovative ways, but we know that there is much, much more out there than Google products. Google is only one tool for us. As a result, we have far more at our disposal than Google does.
If we build tools that take what Google has, plus Yahoo, the Internet Archive, Gallica, scanned library materials scattered around, open archives, etc., plus input from academics, and, most important, if we really and genuinely *cooperate* closely with our colleagues from all over the world in libraries, archives, publishing, open archives, and on and on, broadening and building on our traditional tasks of selection, description and organization, we could build something genuinely new.
For example, one thing is clear to me: the public (both academic and the general public) desperately wants selection and may want it more than description or organization. Just look at the emphasis in Information Literacy classes on "evaluating resources." Instead of putting all the onus on the end users, what can the library community do about it? Well, how about we do "selection"? After all, the public trusts us for books. Then, however, multiple questions arise from all over the place: 1) How in the world can we do selection of web materials in any kind of practical way with the resources we have? (Answer: not the old way!) 2) What does selection mean when related to materials on the Internet? 3) Who else can we involve? ...
These are just a few questions which in turn provoke other questions, including those of description and organization. They go on and on. Are there solutions? The *only way* to know if solutions even exist is through trial and error, not by committees and position papers. Google has demonstrated this brilliantly. Choosing this method is provocative by its very nature, yet if (and I hope when) the library field solves these problems, I'll bet we could create something that could really appeal to the public. And the public could respond very positively.
Naturally, there would be lots of mistakes along the way. The dynamic method is exciting, but frightening at the same time.
A few musings.
Wednesday, January 13, 2010
On Tue, 12 Jan 2010 09:35:30 -0800, Steven C Shadle wrote:
Thanks so much for the clarifications. You make some excellent points. I'd like to make some further clarifications myself.
>>With virtual materials, more and more systems are being built with a link from the article metadata straight to the journal article and the user needs to know practically nothing about the serial as a whole.
Right now, it works only from within a specific database, e.g. within Lexis-Nexis or Ebsco. It breaks down quite a bit once we get outside a database. So, if someone finds an article in Google Scholar, you still have to search different databases much as before computers (only you use a computer!). There are some workarounds for this; I have seen some Firefox plugins that do some quite wondrous linking.
>What metadata are you talking about? SICIs never really caught on. DOIs only get a user to the version that *publisher* has registered. At this point, I know of literally no publishers who submit to CrossRef not only their own online version, but also all of the providers that they have licensed the electronic rights to.
In my own opinion, linking to an item that is already in an online database should not be such a terrible problem from a technical viewpoint. It is a problem that can be fixed when publishers want it badly enough. Many publishers are finding themselves in dire straits today, and I believe they will find the additional incentive to create tools for reliable linking, so that when people find an article they want, they can get to it very easily and decide whether or not to buy it. Various options exist right now that can achieve this. I would say that most of these efforts will happen through Google Scholar, since publishers are looking for the biggest bang for the buck. After all, it's the publishers' bread and butter.
It also demonstrates the absolute importance of metadata, so long as that metadata can be shared in all kinds of ways so that as many people as possible can become aware of materials they may want. In my opinion, it also shows why high-quality standards are important, all of which leads to the importance of cataloging and catalogers.
>However, latest entry is not the panacea for users that you portray, as it really depends on the implementation (i.e., what the user sees). Regina Reynolds & Cindy Hepfer recently did a piece for Information Standards Quarterly, "In Search of Best Practices for Presentation of E-Journals" (Spring 2009, Volume 21, Issue 2, 18-24), where they describe a scenario of a student with an older citation trying to find an article. Former journal titles exist in citations, and unless the discovery system clearly meets the user's title expectations/assumptions early in the process, discovery will fail. Or put another way, a latest entry record in my III WebPAC won't help the user with an earlier title because our above-the-fold display is typically 1XX, 245, local holdings. Mention of any earlier titles happens towards the bottom of the detailed display.
This is really an excellent point and is yet another example of how the current OPAC fails in comparison with the old card catalog in terms of display. For those out there who haven't seen this in action before, here's an example from the Princeton catalog. There are many more complex and better examples out there but, of course, I can't find any right now (I just hope the links work):
Title for earliest entry: Monthly review (London, England : 1749)
The earlier/later links are not implemented in Princeton's catalog, so you have to search manually. But in the card catalog, it looks like this:
When searching in the OPAC, the multiple display of all the various entries is rather incoherent.
My own opinion of all this is that we are still thinking in terms of making separate records, which amounts to making a catalog card, or unit card. A computer display is much more powerful than a single record. With relational databases, it can take bits and pieces of information from all over the place, and off the web as well. We see this now in OPACs that use separate authority and circulation modules, and that maybe import tags from LibraryThing and book covers from Amazon, but they can do far more than that. Other formats are still more powerful. And if there were at least some level of cooperation, the possibilities would be almost endless.
This leads to an argument I have made several times against viewing the "manifestation record" as something separately created and handmade (i.e., the catalog card) instead of as a dynamic entity created from a whole host of different bits and pieces. (That's why I say that there is no such thing as a "manifestation.") This is the way any webpage works: it takes all kinds of files from all over and displays them on your machine.
When looked at in this way, the catalog record (i.e., the public display) becomes far more interesting, at least I think so!
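A minimal Python sketch of the idea, assuming hypothetical in-memory dictionaries that stand in for separately maintained sources: bibliographic data, a name authority file, holdings, and externally imported tags. Nothing here is a real OPAC or LibraryThing API; the point is only that the display is assembled at request time rather than stored as a finished unit record.

```python
from __future__ import annotations

# Hypothetical stand-ins for separately maintained sources.
BIBLIOGRAPHIC = {
    "b1": {"title": "Adventures of Huckleberry Finn", "author_id": "n1", "date": "1885"},
}
NAME_AUTHORITY = {
    "n1": {"heading": "Twain, Mark, 1835-1910",
           "see_from": ["Clemens, Samuel Langhorne, 1835-1910"]},
}
HOLDINGS = {"b1": [{"location": "Main stacks", "status": "On shelf"}]}
EXTERNAL_TAGS = {"b1": ["classic", "satire"]}  # e.g. user tags pulled in from elsewhere


def assemble_display(bib_id: str) -> dict:
    """Build the public display on request instead of storing it as a unit record."""
    bib = BIBLIOGRAPHIC[bib_id]
    name = NAME_AUTHORITY.get(bib["author_id"], {})
    return {
        "title": bib["title"],
        "author": name.get("heading"),
        "also_known_as": name.get("see_from", []),
        "date": bib["date"],
        "holdings": HOLDINGS.get(bib_id, []),
        "tags": EXTERNAL_TAGS.get(bib_id, []),
    }


print(assemble_display("b1")["author"])  # Twain, Mark, 1835-1910
```

Because the display is computed, changing the heading once in the authority source changes every display that uses it, without anyone editing the bibliographic data.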
Somehow it seems that I can always turn the argument into an anti-FRBR rant! :-)
Tuesday, January 12, 2010
Serials are a very delicate matter and, I think, should be reconsidered in their entirety. The major/minor change rules come from the latest ISBDs (available through the CCR Wiki at http://sites.google.com/site/opencatalogingrules/isbd-areas, all under 0.2 Treatment of Resources and 0.2.4 Changes requiring a new description: for continuing resources).
There is a real problem here balancing the needs of maintaining coherent library inventories with the needs of the users. In my own opinion, which is certainly not shared by everyone, latest entry was easier for non-specialists to grasp, since they had a nice overview of the entirety of the serial, seeing how it changed over the years, while successive entry broke everything up, at times making it difficult even for an expert cataloger to get an idea of the whole of a serial.
Yet, in a networked environment it becomes extremely difficult to share records based on latest entry, because each library will have vastly different holdings. If the idea is to share serial records, a library must have an option other than only taking the latest entry, since it would mean, in effect, major editing not only of each library's holdings, but of each bibliographic record, thereby defeating the purpose of sharing records. Thus, we have successive entry, which is designed to provide libraries this option, although as I said, I feel this situation becomes far more difficult for our users.
Are there any options today? I think there are. New formats, such as XML, allow brand new displays never available before, which can perhaps give the best of both worlds, allowing complete nested displays that the user could interact with, such as drop-down menus showing descriptions of earlier titles. But a more radical approach would question even more: how many *library users* want an entire serial? I would bet that 99% or more want individual articles from the serials, or they may be interested in knowing what is in the latest issue of a serial. It is very rare that anyone would have a practical need to know the entire history of the Atlantic Magazine, other than a cataloger or selector for inventory purposes. Lots of people want individual articles from the Atlantic, or they may want a thematic issue if they know about it.
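One way such a display could work, sketched in Python under stated assumptions: the three successive-entry records are hypothetical, linked by "continues"/"continued_by" keys that play the role of the preceding/succeeding entry links (MARC 780/785) in real serial records. Starting from whichever title the user happens to find, the code walks back to the earliest title and forward to the latest, producing a single latest-entry-style overview out of successive-entry records, with the title the user found marked.

```python
from __future__ import annotations

# Hypothetical successive-entry records; the link keys mimic preceding/succeeding entries.
SERIALS = {
    "s1": {"title": "Journal of Example Studies", "years": "1950-1969", "continued_by": "s2"},
    "s2": {"title": "Example Studies Quarterly", "years": "1970-1994",
           "continues": "s1", "continued_by": "s3"},
    "s3": {"title": "Example Studies", "years": "1995-", "continues": "s2"},
}


def title_history(record_id: str) -> list[str]:
    """Return every record in the title chain, earliest first."""
    current, seen = record_id, {record_id}
    while "continues" in SERIALS[current]:      # walk back to the earliest title
        current = SERIALS[current]["continues"]
        if current in seen:
            raise ValueError("circular title links")
        seen.add(current)
    chain = []
    while current is not None:                  # then forward to the latest title
        if current in chain:
            raise ValueError("circular title links")
        chain.append(current)
        current = SERIALS[current].get("continued_by")
    return chain


def render_history(record_id: str) -> str:
    """One combined display; '>' marks the title the user actually found."""
    lines = []
    for rid in title_history(record_id):
        marker = ">" if rid == record_id else " "
        rec = SERIALS[rid]
        lines.append(f"{marker} {rec['title']} ({rec['years']})")
    return "\n".join(lines)


print(render_history("s1"))
```

A user holding an old citation to the first title still sees the whole run, which is what latest entry offered, while each library can keep sharing the separate successive-entry records underneath.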
In the printed world, to get an individual article, people need to know the physical location where a journal is shelved; therefore, they need to know something about how serials work in a catalog and how they are placed on the shelves. With virtual materials, more and more systems are being built with a link from the article metadata straight to the journal article, and the user needs to know practically nothing about the serial as a whole. I agree there is a loss, and it has profound implications in several ways: think of Google News and its impact on newspapers. Newspaper publishers are anguished that their newspapers are losing their identity. But this is exactly how Lexis-Nexis works, along with most other online databases: the individual journals and individual issues disappear almost completely. Academic journal publishers are saying similar things (although I can't find any examples at the moment, I have read them). Nevertheless, we all know that this is happening, and the direction seems to be toward more disintegration, not less.
When we are discussing the *needs of the users* and not the *needs of library inventory* (both equally important), in a world where information is disintegrating into smaller and smaller chunks, I think we have to seriously consider what *we* need as librarians. Serial records will more and more be there for librarians and inventory purposes, not for users.
Monday, January 11, 2010
Emails discussing the sites:
Those were interesting. Have you seen http://projectinfolit.org/pdfs/PIL_Fall2009_Year1Report_12_2009.pdf? "How College Students Seek Information in the Digital Age." Very interesting, but frightening, since it matches my own experiences.
Our users are normal human beings with all of their foibles: most want to get a job done with the least amount of work they can get away with. Therefore, people prefer concise, "reliable" results that they can use before moving on to something else. Occasionally, very occasionally, someone comes along who genuinely wants to learn. What saddened me most in what you sent was the response about library "quality," with almost nobody citing it as a place to learn… Wow. It is one place where you can meet the greatest minds in history and spend time with them, but it is not a place to learn. For me, that is a huge generational change, since that was exactly what a library was.
But I guess that was back in the days when there was still respect for "self-education" as a finer thing than sitting silently in a classroom, bored to death, with alpha waves screaming out of everybody's brains. Today, education is something you pay for, and in return, if you make your teachers and instructors happy enough, you will get a degree, which is the equivalent of a union card today. But a union card meant that a plumber was a decent plumber, or a mechanic knew what he was doing. Today, people get degrees who can't write five words coherently, and while they may have some knowledge of a subject, they often have almost no understanding of it. And let's not even get into the subject of "drawing logical conclusions!" But they have their degree and can go on to graduate school, get their PhD and teach somewhere.
Once in a while, though, someone may surprise you. Most of what I learned of value, I learned on my own. Most of what you learn in school ends up a crock sooner or later, just as most of what I learned about the Soviet Union (my topic) turned out wrong, and now everybody is rethinking modern capitalist economics. So, if you got a degree in economics in 2003, what does that mean you know today? You've got to start all over from scratch.
I've been toying with an idea for a long time now that we are actually living in a "Dark Age," just like back then, when there were accepted truths you couldn't question, authorized experts who had to be quoted, and so on. How would someone know if they were living in a Dark Age? I am sure that if you had told those people back then that their minds were closed, they would have laughed you out of the place. But when I see the power of the media and of political correctness, and how people in the U.S. rarely want to discuss politics because it might be disruptive, and then find things really open over here, it's been very interesting. (Italy has its problems too!)
I got this idea around 15 years ago, and it is only in the last 5 or so years that I have begun to understand more precisely how it is happening, and what to do about it. I think librarians could be very important in this scenario, but we can be just as caught up in it as anybody else. Whether the catalog can survive, or even the library as a separate entity, I don't know, but I think (and hope!) that people will want and need librarians' values and ethical positions.
That’s a very interesting thought, Jim. I tend to think it’s always been like this (think Socrates in Greece), although in some ages it’s worse.
You’re right, but one place that we can’t forget about is Renaissance Florence. (My wife and I went there over the holidays for a couple of days. Fabulously beautiful place. It’s amazing to me that practically all of the Renaissance happened in such a tiny town. Great food and wine, by the way!) Anyway, their minds were really free for a time, compared to what they had before (and what we have now in many ways). Of course, it made lots of people angry, as Giordano Bruno and Galileo discovered. The late 19th and early 20th centuries were pretty free as well. It just seems to me that we are on the cusp of something happening. Everybody’s mad at everybody else; people are waiting for something. I just hope it’s good and/or great.
I have mixed feelings about the cataloging stations… still, one can’t not notice how the only thing left of value here seems to be the materials themselves and the metadata that helps people find, explore and acquire them…
I think so too. But we must adapt to this new world that is coming very fast. That’s why I get nervous when a library gets linked to a physical locality or to a specific physical collection, no matter how great it happens to be. It seems to me that this would be similar to the mammals betting everything they had on the dinosaurs after the weather started getting cold, instead of seeing the changes around them. Librarians who are ready and willing to adapt to the new environment can do so; of this I have no doubt. I just don’t know if our physical collections can adapt as well. More and more, people want the information inside the book, on their own terms and in different formats, but less and less do they want the book itself. Sad, because I am a bookman, but it’s the way of the world.
Tuesday, January 5, 2010
These are some thought-provoking comments. I think there is another consideration that must be included: general expectations.
People expect much more than ever before, and a lot of these expectations are simply unrealistic. In your example, an argument could be made that the populace expects and demands 100% security. Of course, this is an impossibility, but nevertheless, when something untoward happens, as it always will, people believe that "something has gone wrong." Many times, it isn't that something has gone wrong or that anyone has made a mistake; it's that no one has ever achieved 100% security. This means that things will happen sooner or later, and that we will always have things to learn. I think this is something that people outside of the US understand somewhat better. For example, just a couple of days ago there was a bombing at a police station in southern Italy, most probably by the Mafia (http://www.siciliainformazioni.com/giornale//76320/calabria-bomb-seen-general-warning-attack-ndrangheta-signature-investigators.htm), but while people are angry and concerned, there has been no discussion of who made the mistakes that allowed this to happen in the first place, and so on. There have been bombings in the UK, Spain and many other European countries, but the reactions there are quite different from what seems to be going on in the US. We should also keep in mind the problem of relying on 100% economic security.
In these cases, I think there is a lot society can do, but there will still be a lot of risk left over.
Relating this to libraries, what are our patrons' expectations? I have not looked to see if there is any research in this area per se, but it is my professional experience that people's expectations of what they can get through information search and retrieval have changed a lot. For example, if a library doesn't have a book, patrons can get a copy through ILL or through some sort of arrangement that lets them go to another library to get the book. Twenty years ago, that was considered a success. Today, I think our patrons see such an option more as a failure, and increasingly so. People want *everything* and they want it *now* (http://www.youtube.com/watch?v=u61vw_jBAvE), so if they have to go somewhere else, or have to wait a few days or weeks to get the information they want, or, as I have seen more and more, if something is not available in an electronic version where they can search the full text, they see it as a failure of the information search and retrieval system. I am not saying that this is a good thing, but it is happening and I have no idea how it can be stopped. Related to this is Marcia Bates' "Principle of Least Effort," which I discussed on a list and copied to my blog at: http://catalogingmatters.blogspot.com/2009/06/re-in-praise-of-lazy-catalogers-was.html.
"Probably the single most frequently discovered finding on information seeking behavior is that people use the principle of least effort in their information seeking. This may seem reasonable and obvious, but the full significance of this finding must be understood. People do not just use information that is easy to find; they even use information they know to be of poor quality and less reliable--so long as it requires little effort to find--rather than using information they know to be of high quality and reliable, though harder to find."
This has profound consequences for what we are doing, both in terms of systems development and perhaps more crucially, on user education.
This is why I keep harping on the FRBR user tasks, which are necessarily very closely related to user expectations: people want and expect something substantially different today from before. The FRBR user tasks are simply obsolete and the sooner we accept this, the better for our entire field. In fact, it is my experience that the rise of full-text searching has brought a consequent loss of the very concept of authority control (which was poorly understood to begin with), and as a result, even the *idea* of being able to find, for example, "all of the works" of Mark Twain (within the traditional parameters of the rule of three, etc.) no matter the form of the name on the item, is becoming less and less known among patrons and perhaps less and less appreciated. The purpose and power of authority control are definitely not something that can be explained in sixty seconds (for example, see the BBC article "Turning into digital goldfish," http://news.bbc.co.uk/2/hi/science/nature/1834682.stm). This goes equally for all the European, US and UK students I have worked with.
We need to make tools that will be relevant to our users' needs, but before that we need to find out what their needs are; otherwise we are making tools relevant to the world before the 1990s but not to the world of today, let alone the world 20 years from now.