Friday, February 25, 2011

RE: How We Know

Posting to NGC4LIB

Laval Hunsucker wrote:
Library selection does not attempt to be neutral since, as you point out, and as is very well known besides, this is impossible. Librarians have just as many opinions as anyone else, and they hold them just as strongly as anyone out there. Librarians are not neutral. Neutrality is passive, and selection is very active: it actively attempts to include all types of opinions and attitudes that the selector *probably* disagrees with, even violently. I have certainly added such materials to the collection for which I am responsible, materials that I actively dislike and completely disagree with. But I am aware that the library's collection should not be a mirror of my own opinions. While I could make it one, that would be an irresponsible use of what has been entrusted to me and, I will even say, a misuse of my own power. And it is power; let us not minimize that.

My personal library at home is a different matter, since I refuse to pay out of my own pocket for what I believe to be garbage. The collection that I am responsible for professionally is another thing entirely.

Now, what is "The Best"? (You will notice that I always put that in quotation marks.) Will there be, and in fact could there ever be, any kind of agreement on such a question? Of course not. Such a question is, and I hope always will be, highly controversial. But decisions *must be made*. Period. There is no choice: there is only so much money and so much space. Making those choices is what library selectors are paid to do, and we cannot renounce that responsibility, because otherwise someone else will make the decisions. At least librarians are guided by their code of ethics. This has been going on in libraries from time immemorial and is certainly nothing new.

When I say that somebody *must* make the decision, it is similar to a judge who must decide far more difficult matters. For example, when a person has died or been seriously injured through someone else's fault, it eventually comes down to how much money to award that person, or their family, whose lives have been destroyed. In the broader scheme of things, we all know that no amount of money could ever be enough, but a decision must be made. That is the very reason the judge exists. The judge cannot simply walk away and say that a decision cannot be made, because even if the judge did walk away, we would have to find somebody else to make it.

But then Carol goes on to ask, "What is garbage?" Of course, that depends on the collection. A pile of comic books may be useless in one collection but the core of another. This is the way it has always been. As I wrote before, it would be very nice to think that the appearance of so many free materials on the web has eliminated the need for the library task of selection, since with those materials the problems of budget and space are gone. But it is more complicated than that. Remember the story about the teacher who got fired from her job because of pictures on her MySpace page showing her drunk as a student? I'm sure this woman thinks those pictures are garbage and wishes they had been thrown away! Lots of other people have been hurt by the existence of materials that were never thrown away.

So we are seeing a problem, since people now constantly complain that they are finding too much, as the NY Review article discussed very clearly; that is why I started this thread. In the past, this has not really been a problem for libraries, which have generally striven to add as much as they can, depending on space and budget.

Based on that article, I suggested that people still want selection, and in fact they are getting it now, but it is being done by others and not by librarians, who have their ethics. When somebody gets 8 million hits on Google, the "relevance" ranking is a type of selection made by a computer. We should not pretend that this relevance ranking is either "objective" or "neutral". It is programmed by human beings working for a modern corporation, using methods that are secret and that can be manipulated by others around the world for whatever reasons they may have. I am not finding fault, just trying to point out the reality of the situation as I see it.

As a librarian, I happen to know that a lot of very good information does not come up in the first one or two screens in Google: there are open archives along with all kinds of other sites that people may find very useful, but either Google does not have them at all (i.e. they are in the "hidden web") or there is too much "static" or "garbage" in the way.

Sure, we could all just say there is no problem with selection now because the old constraints no longer apply, and that, after all, selection is akin to censorship. But this ignores the facts that people feel overwhelmed by the amount of "information" they receive, that selection now takes place automatically, and that this automatic selection has all kinds of serious problems. It is obvious (at least to me) that traditional library selection must change tremendously, probably as much as traditional library cataloging. I am sure these changes will happen sooner or later, since they are needed so badly, and I can only hope that traditional library ethics will be a major part of them.

Thursday, February 24, 2011

RE: Abbreviations in RDA

Posting to RDA-L

Hal Cain wrote:
The dictum that context imparts meaning is, I think, relevant here. In the context of an ISBD bibliographic record, printed or in a screen display, standard abbreviations have a context; nowadays, even so, possibly not all who see them in that context will understand them.

In contemporary bibliographic displays, the context is often fractured. Therefore the meaning may be obscured. When we prepare to dismantle bibliographic data and mash elements into hitherto unseen combinations, we can assume no particular context.

Therefore it seems to me that abbreviations no longer have a place in our workflows.
This is a very important point, but I have a different take on it. I think it is safe to assume that in the future, the catalog records we make will be mashed up with other "things" out there to create entirely new resources. (At least, I hope they will be, because otherwise our records will be ignored and not used at all.) At this point it is practically impossible to predict how our records will be used and changed, but one thing I think we can assume is that the traditional context will be lost, as Hal mentioned. This means that a bibliographic record will be seen *outside* the catalog, isolated from the rest of the records it relates to by way of headings and descriptive treatments. It will be just like looking at a few catalog cards taken out of a catalog: there are so many relationships that the headings and descriptions make little or no sense outside of it. (To illustrate, someone could ask of a single record: "Why did you use the form 'International Business Machines Corporation' and not 'IBM', which is the way everybody thinks of it?" "Because the other records in the catalog use that same form." And so on.)

In the future, a record will also be seen from within different cultural and linguistic contexts. So when a patron sees a record imported into a future mashup, it may be coming from who knows where: for example, from the Deutsche Nationalbibliothek, where the abbreviation for pages is S., or from the Russian State Library, where it is c. And there are all kinds of other abbreviations in all of these records, too. So while the Russian abbreviations may be incomprehensible to English speakers, the reverse is true as well.
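Spelling such abbreviations out mechanically only works if the language of cataloging is known. A minimal sketch, assuming a small hypothetical lookup table keyed on the MARC language code and covering only the pagination abbreviations mentioned above (the names `EXTENT_ABBREVIATIONS` and `expand_extent` are illustrative, not part of any real system; note that the Russian "c." is actually the Cyrillic letter "с"):

```python
# Hypothetical lookup table: (MARC language code, abbreviation) -> English word.
# Only the pagination abbreviations discussed in the post are included.
EXTENT_ABBREVIATIONS = {
    ("eng", "p."): "pages",
    ("ger", "S."): "pages",  # German "Seiten"
    ("rus", "с."): "pages",  # Russian "страниц" (Cyrillic letter es)
}


def expand_extent(extent: str, lang: str) -> str:
    """Spell out a trailing pagination abbreviation in an extent statement.

    Statements in an unknown language, or without a known abbreviation
    at the very end, are returned unchanged.
    """
    for (code, abbr), word in EXTENT_ABBREVIATIONS.items():
        if code == lang and extent.endswith(abbr):
            return extent[: -len(abbr)] + word
    return extent


print(expand_extent("245 S.", "ger"))  # 245 pages
print(expand_extent("312 с.", "rus"))  # 312 pages
```

The catch is that a record arriving in a mashup may carry no reliable language code at all, which is exactly the loss of context discussed above.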

This is what our patrons will see and experience in the near future (I am sure that many are experiencing it right now), and we must respond. All of these library catalog records will be mashed up sooner or later. Of that I have no doubt, because people want it so desperately. [Concerning this, I suggest the recent CIBER report "Social Media and Research Workflow," p. 29, where it is clear that above all, *everybody* wants from libraries a single search for all electronically licensed resources. I think we need to do more than that and include non-licensed resources, which is what I have attempted to do with the Extend Search in my catalog at AUR.]

For our patrons, the universe of information has gone *far outside* the boundaries of our catalogs, and we must continually look at the information universe through the eyes *of our patrons* and focus less on the information universe *of library catalogs*, which, sadly, has less and less meaning and importance to the world. This involves a total change in the intellectual orientation of catalogers, it's true, but it is vital that we make it. Others have compared it to the intellectual changes people went through when the Earth "ceased" to be the center of the universe, and the Sun "became" the center of one small solar system inside an average galaxy within an immense, almost unlimited universe.

How do, and how will, our records fit into such a universe? Does spelling out abbreviations even play a role in it? How can we "fix" the situation for our patrons when they see so many types of records created under so many rules and often, if not most of the time, under no rules at all?

These are some of the genuine, and serious, issues that our patrons are facing, and by extension, we should face as well.

Tuesday, February 22, 2011

RE: How We Know

Posting to NGC4LIB

john g marr wrote:
I certainly appreciate your main point regarding the need for "selection" of data from vast amounts constantly pouring in, but can we expect mere humans to do unbiased selection? And could librarians perhaps contribute to the solution by stepping up to gather and analyze overlooked data, rather than merely "selecting" what is deemed by social pressure to be appropriate?
I could turn that around and ask whether we can expect mere machines, which can be programmed and hacked at will by all kinds of unscrupulous people, to do unbiased selection. At least librarians are supposed to resist social pressures to censor their collections, as the endless fights over "The Joy of Sex", "Huck Finn" and "Heather Has Two Mommies" have demonstrated; some of those fights have even ended careers, and they are always agonizing for the librarian involved.

I would venture that selection is perhaps the least understood task in librarianship (while cataloging is the most "mysterious"!). In my experience, our public wants "selection" today, and wants it desperately, but they don't use the word "selection" and they understand nothing about it: what its purposes are, how it works, or much of anything at all. Several researchers I have met have even considered it a rather evil practice: after all, how can anybody be presumptuous enough to select for someone else? The answer is that of course people can be that presumptuous, because *somebody* has to select something, somewhere along the way. Budget and space considerations demand decision making, and somebody has to take responsibility.

But on the web, with such a wealth of information available for free, and more coming online all the time, the traditional methods of selection break down. It would seem that the need for decision making has been avoided, yet the problems that selection is supposed to solve have not disappeared. It's just that with all of these "free" materials, a selector's "budget" no longer consists of money, but of our patrons' time and patience. People do not want to wade through lots of junk. And they want what is "The Best" (although they could never tell you what that means).

Just as traditional library selection involves many non-librarians (publishers, dealers, retailers, jobbers, and so on), solving the problem for online resources will need many people from outside the library world. We can't do it all on our own. Plus, the concept of "selection" itself will have to be revamped. Automated means will have to be used somehow, but I don't know how; perhaps as a first-level "triage" to separate the real garbage from the possibilities.

The example of Wikileaks and everybody's reactions to it is highly instructive: none of the reactions, whether from politicians, governments, researchers, journalists, or Amazon, were "unbiased", but in essence, the argument is a question of "selection", either for or against. I don't want to get into the political issues, but I think we have seen how difficult it is to "select against" a resource today.

How does a library catalog enter into it? Not at all, and that is a rather comfortable place to be compared to dealing with complaints about a book by Mark Twain sitting on the shelf, paid for and circulating. Whether or not any cataloger, myself included, makes a record for the Wikileaks site will make no difference at all to whether someone can find it. People get that kind of information using different tools today.

Yet, I still think people want selection, but I am not sure what this will mean in the new environment.

Monday, February 21, 2011

How We Know

Posting to NGC4LIB

Concerning an article in the March 10, 2011 NY Review of Books: "How We Know" by Freeman Dyson (reviewing the book The Information: A History, a Theory, a Flood / James Gleick. Pantheon).

This is a very interesting article; I guess I'm going to have to buy yet another book(!), but a couple of points jump out at me:
"Telescopes and spacecraft have evolved slowly, but cameras and optical data processors have evolved fast. Modern sky-survey projects collect data from huge areas of sky and produce databases with accurate information about billions of objects. Astronomers without access to large instruments can make discoveries by mining the databases instead of observing the sky. [my emphasis--JW] Big databases have caused similar revolutions in other sciences such as biochemistry and ecology."
and the final part:
"The consequence of this freedom is the flood of information in which we are drowning. The immense size of modern databases gives us a feeling of meaninglessness. Information in such quantities reminds us of Borges's library extending infinitely in all directions. It is our task as humans to bring meaning back into this wasteland. As finite creatures who think and feel, we can create islands of meaning in the sea of information."
My own opinion is that library selectors have known this for a long, long time. They, probably more than anyone else, have seen the immensity of the "information universe". That is their job, after all: to take "the best" (however that is defined) from the totality. It is also the selector who is supposed to have one of the best ideas of that "totality". I suspect that the amount of information/noninformation/disinformation is growing so outrageously today mainly because what in the past would have been thrown away as trash is now being retained. I wonder how much of this incredible sea of so-called "information" consists of those pictures of young, drunken students in mid-debauch, an almost infinite number of exact copies of out-and-out pornographic videos, IM chats that consist almost exclusively of "ummm", "er, ..." and "what the ....!", thousands of blog posts that repeat links to the same items, and other things that would be much better discarded. Just imagine all of those Twitter messages that are being saved at the Library of Congress! In this regard, I am reminded of Seneca's "On the Shortness of Life", where he discussed similar concerns:
"It would be tedious to mention all the different men who have spent the whole of their life over chess or ball or the practice of baking their bodies in the sun. They are not unoccupied whose pleasures are made a busy occupation. For instance, no one will have any doubt that those are laborious triflers who spend their time on useless literary problems, of whom even among the Romans there is now a great number. It was once a foible confined to the Greeks to inquire into what number of rowers Ulysses had, whether the Iliad or the Odyssey was written first, whether moreover they belong to the same author, and various other matters of this stamp, which, if you keep them to yourself, in no way pleasure your secret soul, and, if you publish them, make you seem more of a bore than a scholar. But now this vain passion for learning useless things has assailed the Romans also. In the last few days I heard someone telling who was the first Roman general to do this or that; Duilius was the first who won a naval battle, Curius Dentatus was the first who had elephants led in his triumph."
and then goes on:
"...does it serve any useful purpose to know that Pompey was the first to exhibit the slaughter of eighteen elephants in the Circus, pitting criminals against them in a mimic battle? He, a leader of the state and one who, according to report, was conspicuous among the leaders of old for the kindness of his heart, thought it a notable kind of spectacle to kill human beings after a new fashion. Do they fight to the death? That is not enough! Are they torn to pieces? That is not enough! Let them be crushed by animals of monstrous bulk! Better would it be that these things pass into oblivion lest hereafter some all-powerful man should learn them and be jealous of an act that was nowise human. O, what blindness does great prosperity cast upon our minds!"
Just because data can be saved does not mean that it is transformed into "information" or that it should be saved. Seneca's discussion here reminds me of "Buffy studies" and all kinds of similar endeavors.

It seems that instead of a quest for some indefinable "meaning", the subtext of the NYRB article is actually a cry for selection. How the selection will occur, either manually or by automated means or a combination, remains to be seen.

Friday, February 18, 2011

Google Public Data Explorer opens up

Posting to NGC4LIB

Google is now allowing anyone to upload their own datasets to use in conjunction with Google Public Data. There is a standardized format for the information, the Dataset Publishing Language (DSPL), which uses XML to describe a dataset and CSV files to hold the data. It doesn't look *too* difficult, but this is only at first glance, and actually doing it could turn into a monster! Still, I can imagine automatic converters being made that would make conversion very simple.

Of course, the difficult part will be deciding what (if anything) from our bibliographic data to use with this tool. This will be trial and error, but I can imagine trying call numbers, subjects, perhaps formats, lengths of books, and who knows what else, mapped to publication dates (or dates of cataloging), and seeing what happens. For example, I would be very interested to find out how the use of subject subdivisions has changed over the years, because lately it seems as if fewer subdivisions are being assigned, replaced by a practice of assigning multiple "descriptor"-type headings. This could be done by exporting all 6xx $x along with the year of cataloging and seeing how they have changed over time. Trying the other subdivisions may be interesting, too.
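The subdivision count described above can be sketched in a few lines, working from MARCXML rather than ISO2709. This is a minimal sketch under stated assumptions: the "year of cataloging" is taken from 008/00-05 (date entered on file, YYMMDD), two-digit years 68-99 are read as 19xx and 00-67 as 20xx, and the function names are hypothetical. It counts 6xx headings and their $x subdivisions per year and writes a CSV.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Optional

NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def year_entered(record: ET.Element) -> Optional[int]:
    """Year the record was entered on file, from 008/00-05 (YYMMDD).

    Assumption: two-digit years 68-99 mean 19xx, 00-67 mean 20xx.
    """
    field = record.find("marc:controlfield[@tag='008']", NS)
    if field is None or not field.text or len(field.text) < 6:
        return None
    if not field.text[:2].isdigit():
        return None
    yy = int(field.text[:2])
    return 1900 + yy if yy >= 68 else 2000 + yy


def subdivisions_by_year(marcxml: str) -> dict:
    """Map year -> [number of 6xx headings, number of $x subdivisions]."""
    counts = defaultdict(lambda: [0, 0])
    root = ET.fromstring(marcxml)
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        year = year_entered(record)
        if year is None:
            continue  # skip records without a usable 008
        for field in record.findall("marc:datafield", NS):
            if field.get("tag", "").startswith("6"):
                counts[year][0] += 1
                counts[year][1] += len(field.findall("marc:subfield[@code='x']", NS))
    return dict(counts)


def to_csv(counts: dict) -> str:
    """One row per year, suitable as the CSV half of a DSPL dataset."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["year", "headings", "x_subdivisions"])
    for year in sorted(counts):
        headings, subdivisions = counts[year]
        writer.writerow([year, headings, subdivisions])
    return out.getvalue()
```

Counting headings alongside $x makes it possible to compare subdivisions per heading from year to year rather than raw totals. Loading the result into Public Data Explorer would also need a DSPL XML file describing the columns, which is not shown here.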

But I wouldn't try doing this with ISO2709 records! :-)

RE: LJ article on discovery layer services at Sydney Jones Library, University of Liverpool

Posting to Autocat Concerning the article "Liverpool's Discovery

At the risk of blatant self-promotion, this is what I have attempted to do with the Extend Search tool in my catalog. Here is an example of how it works, using "Alfred Hitchcock":

I have also created a Two-Minute Tutorial on it:

Once you change your definition of the "library's collection" from "that which is held by my institution, and/or what my institution pays for" to a definition: "the information that is *really* available to my patrons", the entire universe changes and becomes frightening both for the librarian as well as the searcher: the librarian has to "control" it somehow, while the searcher has to deal with it all.

Still, in one sense, I don't know whether the "information universe" has really changed all that much from what it was before the World Wide Web, since there were always lots of resources out there held by publishers, educational institutions, international organizations, learned societies, and everybody else; but before the Web, it was very difficult even to know about this information, much less use it. The web has made that part immeasurably easier. So, for example, I can go into Google News, search "Barack Obama" and retrieve (as of now):
"Apple's Steve Jobs attends Obama's meeting with tech honchos
International Business Times - 18 minutes ago
US President Barack Obama greets well-wishers upon his arrival in San Francisco February 17, 2011. Obama is visiting nearby Woodside to meet with business ...
Apple's Steve Jobs at tech CEO meeting with Obama - Straits Times
Obama talks jobs with Jobs, other tech leaders - The Hindu
Obama meets tech leaders, including Apple's boss - Monsters and
Business Review USA - The Canadian Press
all 761 news articles » AAPL - GOOG"
I can click a button and get into "The Hindu" of India, or "The Straits Times" of Singapore! The possibilities for cultural exchange are obvious. More important, these resources have been around for a long time, but now I know about them and they are easier to get than ever. Plus 761 other articles. And free besides! Amazing!

From the point of view of the average search (I say "search" rather than "search*er*" since even the most discriminating scholars need incredibly focused searches for only a small percentage of their total information needs), WEMI is normally not what the public wants. Certainly they want to be able to retrieve Hobbes' translation of Thucydides, but today this is easier to achieve than ever: just type Hobbes and Thucydides into any OPAC and it will come up. And if you look in the Internet Archive or in Google Books, you can even download your own copies of some rare editions. But there are other sites as well, and this is what I have tried to allow for in my Extend Search, which increases searchers' possibilities, at least to a point, by searching specific databases I have chosen for my students. For specific reasons, I have not chosen some excellent ones, for example, Gallica.

But this certainly doesn't end it. If you go to "Other database groups" and choose "General Search Engines", you can find several copies in regular Google. And if you go back to "Other database groups" and select "Articles and Open Archives", you really begin to find a wealth of information, primarily about Hobbes and Thucydides. Even in "Government & Policy Documents" you will find a huge amount of information, primarily from different think tanks. I realize this does not end it either.

While I am not trying to toot my own horn (well, maybe a little bit!), what I want to point out is that none of this is all that *new*. These materials have always existed, more or less; it's just that now they are much easier to get to, and in fact, they are so much easier to get to, that we are looking at a change in the fundamental structure of information. The materials on the World Wide Web are now an integral part of our local collections because our patrons use them at least as much as the materials in our traditional "local collections", and the World Wide Web is fast becoming one of the great research libraries in the world.

But we must admit that our traditional methods of bibliographic control completely break down in such an environment. The tools I have made are very crude, I understand, and could be improved at every juncture, but they do provide some level of help, while even a small amount of "cooperation" could improve matters significantly; for example, how about some level of authority control in each of the databases for the different types of names and at least some subjects?
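Even a crude version of that kind of cooperative authority control is easy to picture. A minimal sketch, assuming a small hypothetical table of authorized headings and their variant forms (the table, function names and normalization rules are all illustrative): a name found in any database is normalized and mapped to one authorized form before results are displayed together.

```python
import unicodedata

# Hypothetical mini authority file: authorized heading -> variant forms that
# different databases might use. Real data would come from shared authority files.
AUTHORITIES = {
    "International Business Machines Corporation": [
        "IBM", "I.B.M.", "International Business Machines",
    ],
    "Hobbes, Thomas, 1588-1679": ["Thomas Hobbes", "Hobbes, Thomas"],
}


def normalize(name: str) -> str:
    """Fold case, strip diacritics and punctuation, and collapse spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in letters.lower() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


def build_index(authorities: dict) -> dict:
    """Map every normalized form (authorized or variant) to its authorized heading."""
    index = {}
    for authorized, variants in authorities.items():
        for form in [authorized, *variants]:
            index[normalize(form)] = authorized
    return index


INDEX = build_index(AUTHORITIES)


def authorized_form(name: str) -> str:
    """Return the authorized heading for a name, or the name itself if unknown."""
    return INDEX.get(normalize(name), name)
```

The code is the trivial part; agreeing on, and maintaining, the table across databases is where the real cooperation would be needed.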

My Extend Search betrays my library/cataloger background as well: it was only after I made it that I realized I have split them by *format* (reminiscent of AACR2) and not by *subject* (which my patrons would prefer). So, I have books, moving images, still images, government documents, etc. I thought this was pretty funny when I realized it! Finally, this cost my institution nothing at all except some of my creativity and elbow grease, plus a small server.
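For readers wondering how a tool of this kind can work on little more than a small server, here is a minimal sketch of the general idea, not the actual code behind Extend Search: the query is URL-encoded and dropped into search-URL templates, grouped by format. Every group name and URL below is a hypothetical placeholder on example.org.

```python
from urllib.parse import quote_plus

# Hypothetical database groups, split by format. The URL templates are
# example.org placeholders, not real services.
DATABASE_GROUPS = {
    "Books": {
        "Catalog": "https://catalog.example.org/search?q={q}",
        "Digitized books": "https://books.example.org/find?query={q}",
    },
    "Moving images": {
        "Video archive": "https://video.example.org/search?term={q}",
    },
    "Government documents": {
        "Policy papers": "https://gov.example.org/search?q={q}",
    },
}


def extend_search(query: str, group: str) -> dict:
    """Return {database name: search URL} for every database in one group."""
    encoded = quote_plus(query)  # spaces become '+', special characters escaped
    return {
        name: template.format(q=encoded)
        for name, template in DATABASE_GROUPS[group].items()
    }


for name, url in extend_search("Alfred Hitchcock", "Moving images").items():
    print(name, url)
```

Regrouping by subject instead of format would only mean changing the dictionary keys; deciding which databases belong under which subject is the real work.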

How can librarians deal with such an enormous amount of work when it seems as if our numbers will not grow at an appropriate rate? Does RDA help or hinder? My opinion shouldn't be surprising: RDA posits an information universe of WEMI, which is based on management practices within a world of physical materials. That no longer holds true, and we should be looking for new, sustainable methods.

Thursday, February 17, 2011

Thoughts on Jeopardy and IBM's Watson

Posting to NGC4LIB

Concerning the article in the NY Times: “Computer Wins on ‘Jeopardy!’: Trivial, It’s Not” about the IBM computer “Watson” defeating the greatest champions on Jeopardy.

Although the competition was only semi-fair (at least in my opinion), since Watson could buzz in within 10 milliseconds, the competitive aspects are beside the point. The computer could actually understand questions posed in human language, and it appears that you could, potentially, get into a kind of dialogue with it (in other words, a “reference interview”?). To me, this is much more significant than when Deep Blue beat world chess champion Garry Kasparov, since it was not that Deep Blue played better than Kasparov; it was that Kasparov cracked under the pressure and made some of the worst mistakes of his career. Besides, humans behind the scenes were “tweaking” the computer during the match.

It seems as if something like Watson could be of tremendous importance to the library community, whether we happen to like it or not. Something tells me that the number of librarians is not going to grow tremendously. The Bureau of Labor Statistics predicts a respectable 8% growth by 2018, but consider its analysis:
“Employment of librarians is expected to grow by 8 percent between 2008 and 2018, which is as fast as the average for all occupations. Growth in the number of librarians will be limited by government budget constraints and the increasing use of electronic resources. Both will result in the hiring of fewer librarians and the replacement of librarians with less costly library technicians and assistants. As electronic resources become more common and patrons and support staff become more familiar with their use, fewer librarians are needed to maintain and assist users with these resources. In addition, many libraries are equipped for users to access library resources directly from their homes or offices through library Web sites. Some users bypass librarians altogether and conduct research on their own. However, librarians continue to be in demand to manage staff, help users develop database-searching techniques, address complicated reference requests, choose materials, and help users to define their needs.

Jobs for librarians outside traditional settings will grow the fastest over the decade. Nontraditional librarian jobs include working as information brokers and working for private corporations, nonprofit organizations, and consulting firms. Many companies are turning to librarians because of their research and organizational skills and their knowledge of computer databases and library automation systems. Librarians can review vast amounts of information and analyze, evaluate, and organize it according to a company's specific needs. Librarians also are hired by organizations to set up information on the Internet. Librarians working in these settings may be classified as systems analysts, database specialists and trainers, webmasters or Web developers, or local area network (LAN) coordinators.”
While 8% is as fast as the average for all occupations, compare it to the related occupations listed: 23% for curators, 13% for primary/secondary teachers, 15% for postsecondary teachers, 24% for computer scientists, and 20% for computer systems analysts. Also, the 8% figure presumes a higher rate of retirement among current librarians.

Whether we like any of this or not, librarians are going to need help, and perhaps something along the lines of Watson could provide a first line of reference help. Such a tool may be perfect for a virtual community such as Second Life. People are often very hesitant to ask reference questions, and my own experience bears this out. I get relatively few face-to-face reference questions, but I have created a number of what I call Two-Minute Tutorials on the web that my users can access anywhere, anytime, and these tutorials get used hundreds and hundreds of times each semester, so many times that they completely dwarf what I do in person. Seeing those statistics was eye-opening for me.

I would personally love to have something like Watson to help people, so long as it would be very clear about when it is unsure and in those cases, transfer the question(s) to a human expert.
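The hand-off described above can be expressed as a simple rule: answer only when the best candidate is both confident and clearly ahead of the alternatives, otherwise pass the question to a person. A minimal sketch, assuming a hypothetical question-answering system that returns candidate answers with confidence scores between 0 and 1; the `Candidate` type, the 0.75 threshold and the 0.1 margin are illustrative values, not anything Watson actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    answer: str
    confidence: float  # 0.0-1.0, as reported by a hypothetical QA system


def route(candidates: list, threshold: float = 0.75, margin: float = 0.1) -> tuple:
    """Answer only when the best candidate is confident and clearly ahead.

    Returns ("answer", text) or ("escalate", None); escalation means the
    question is transferred to a human librarian.
    """
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    if not ranked:
        return ("escalate", None)
    best = ranked[0]
    runner_up = ranked[1].confidence if len(ranked) > 1 else 0.0
    if best.confidence >= threshold and best.confidence - runner_up >= margin:
        return ("answer", best.answer)
    return ("escalate", None)
```

The margin check matters as much as the threshold: two nearly equal candidates are a sign that the system is unsure, even if both scores look high.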

Wednesday, February 16, 2011

RE: [RDA-L] rdacontent terms - dataset

Bernhard Eversberg wrote:
15.02.2011 20:48, Weinheimer Jim:
> In my opinion (and not only mine), this is the world we must enter, whether we want to or not. How do you enter this world? By creating Web Services. In order just to start to do this, you must use XML, since this is the language. It is not ISO2709.
Now that is of course right. Only "you must use XML" does precisely *not* mean "use XML as your internal syntax"! It just means "be able to use XML in the production and use of services". That, in fact, is possible on systems that use MARC internally, and even ISO-MARC.

The misunderstanding here is the same that led to the internal use of MARC in ILSs in the first place. That was never really necessary, nor intended by the creators of MARC, for MARC was meant to be a communication format. In modern parlance, a service format, only that it was offline bulk services (magnetic tapes) at that time.

Again, to make it clear: Internally, in the black box that is your system from the viewpoint of the world, you can do whatever you want to structure your data. You can even (although you should think twice) use ISO-MARC - only just let nobody see it. As long as you are able to answer requests in XML *and in other syntaxes that may be asked for*, in services that the world can use. XML is not a be-all or cure-all, and in 10 years' time it may be obsolete - we have no control over that.

May we now put that matter of ISO to rest? I've never liked it myself, and it *ought* to be gotten rid of, but that's actually off-topic in this forum.
This is correct. I never mentioned ISO2709 being used internally. The internal format probably won't be MARC either, but some kind of relational structure, an XML structure (as in Lucene), or a mixture of both (as in Koha). Internally, each system can be, and probably will be, quite different, just as they are today. That is beside the point. The only catalog I know of that stores records internally in ISO2709 is CDS-ISIS, but there are probably others. Ultimately, however, all that matters is that the final product transfers its data according to a specified format.
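The point that the internal structure is invisible, so long as the system can answer in whatever syntax is requested, is easy to demonstrate. A minimal sketch, assuming a made-up internal record layout (a plain Python dictionary) that is serialized on request either to MARCXML, using the real MARC 21 slim namespace, or to JSON; the layout and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)  # serialize as marc:record, etc.

# Hypothetical internal layout: the outside world never sees this structure.
RECORD = {
    "leader": "00000nam a2200000 a 4500",
    "control": {"001": "12345"},
    "data": [
        ("100", "1", " ", [("a", "Hobbes, Thomas,"), ("d", "1588-1679.")]),
        ("245", "1", "0", [("a", "Leviathan.")]),
    ],
}


def _q(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def to_marcxml(rec: dict) -> str:
    """Serialize the internal record as a MARCXML <record> element."""
    root = ET.Element(_q("record"))
    ET.SubElement(root, _q("leader")).text = rec["leader"]
    for tag, value in rec["control"].items():
        ET.SubElement(root, _q("controlfield"), tag=tag).text = value
    for tag, ind1, ind2, subfields in rec["data"]:
        field = ET.SubElement(root, _q("datafield"), tag=tag, ind1=ind1, ind2=ind2)
        for code, value in subfields:
            ET.SubElement(field, _q("subfield"), code=code).text = value
    return ET.tostring(root, encoding="unicode")


def to_json(rec: dict) -> str:
    """The same record in another syntax a client might ask for."""
    return json.dumps(rec, ensure_ascii=False)


SERIALIZERS = {"xml": to_marcxml, "json": to_json}


def respond(rec: dict, fmt: str) -> str:
    """Answer a request in the syntax asked for, if it is supported."""
    if fmt not in SERIALIZERS:
        raise ValueError(f"unsupported format: {fmt}")
    return SERIALIZERS[fmt](rec)
```

Adding a new output syntax later means adding one function to `SERIALIZERS`; the internal layout does not have to change.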

That aside, the matter of ISO2709 *is* of incredible importance for the transfer of our records, because so long as we use it for transferring records, we remain locked into all its deficiencies, no matter how great our internal systems may become. It's like having a dam that serves a drought-stricken population. Your dam may be able to deliver 200,000 gallons of water a second, but if the pipes are old and can only carry 100 gallons a second, you can only deliver 100 gallons a second, and this remains the case even if you do more work and the dam can deliver 400,000 gallons. Although everyone wants your water, and you want to deliver it, the pipes must be upgraded if you are to help. And if we don't upgrade those pipes, we cannot blame the population for looking elsewhere for what they need.

So, perhaps we create wonderful sites that internally have, e.g., 100 subfields in a field. Maybe we want fields beyond the 999 that we have now. None of this can be transferred using ISO2709. *If* we wanted to get rid of the single main entry by making the 100 field, and everything associated with it, repeatable, it would be a huge undertaking in ISO2709, if it could be done at all, but fairly simple in XML. There are lots of problems.
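A small Python sketch can make the "pipes" concrete. It computes the ceilings implied by the fixed-width numbers in an ISO 2709 record as MARC 21 uses it: 3-character tags, 4-digit field lengths and 5-digit starting positions in the directory (leader/20-21), a 5-digit record length, and one-character subfield codes (leader/11). Some of these widths are fixed by ISO 2709 itself and others are MARC 21's settings for parameters the standard leaves open; the function names are hypothetical, and only numeric tags are considered.

```python
import string

# Widths in the MARC 21 implementation of ISO 2709 (assumed values, see lead-in).
TAG_WIDTH = 3              # every tag is exactly 3 characters
FIELD_LENGTH_WIDTH = 4     # leader/20 = "4": digits for each field's length
START_POSITION_WIDTH = 5   # leader/21 = "5": digits for each field's start
RECORD_LENGTH_WIDTH = 5    # leader/00-04: digits for the whole record's length

# Subfield codes are one character after the delimiter (leader/11 = "2");
# MARC 21 restricts them to lowercase letters and digits.
SUBFIELD_CODES = string.ascii_lowercase + string.digits


def largest(width: int) -> int:
    """Largest value that fits in `width` decimal digits."""
    return 10 ** width - 1


def directory_entry(tag: int, length: int, start: int) -> str:
    """Build a 12-character directory entry, refusing values that overflow."""
    for value, width, label in ((tag, TAG_WIDTH, "tag"),
                                (length, FIELD_LENGTH_WIDTH, "field length"),
                                (start, START_POSITION_WIDTH, "start position")):
        if not 0 <= value <= largest(width):
            raise ValueError(f"{label} {value} does not fit in {width} digits")
    return (f"{tag:0{TAG_WIDTH}d}{length:0{FIELD_LENGTH_WIDTH}d}"
            f"{start:0{START_POSITION_WIDTH}d}")


if __name__ == "__main__":
    print("highest numeric tag:", largest(TAG_WIDTH))                # 999
    print("longest field in bytes:", largest(FIELD_LENGTH_WIDTH))    # 9999
    print("longest record in bytes:", largest(RECORD_LENGTH_WIDTH))  # 99999
    print("possible subfield codes:", len(SUBFIELD_CODES))           # 36
    print(directory_entry(245, 30, 120))                             # 245003000120
```

With only 36 possible subfield codes, a field cannot carry 100 distinctly coded subfields, and a four-digit tag simply does not fit in the directory.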

I don't know if XML is the ultimate solution, but that doesn't matter. It would certainly be a step forward, and a step we could take now: developers could start working with our records now, the public--perhaps--might even begin to appreciate them, and it wouldn't cost nearly as much as implementing RDA (to bring us back to the topic).

But you are right. I really do not like saying bad things about RDA on this list. The reason I harp on this is to provide a concrete example of how we could adopt changes that are much less disruptive to us than the adoption of RDA, far less expensive, and that would (or at least could) have far more profound effects on the world "out there".

RE: rdacontent terms - dataset

Posting to RDA-L

Jonathan Rochkind wrote:
On 2/15/2011 10:34 AM, Weinheimer Jim wrote:
> I am being real. The plain text format of MarcEdit *absolutely cannot* do the same as MARCXML. I'll prove this right now. Browsers are built to work with XML, so right now, this second, any webmaster can work on the fly with XML using nothing more than a browser. They need no other tools.
That's not really true. Most browsers don't give you any tools for 'working with' XML, although some (but not all) of them will display it with nice syntax-aware coloring.
Sorry to contradict you, but I have done this myself multiple times. Here is a discussion of it: Anybody can work with XML and XSLTs with a browser, and in fact I have had to do it because I did not have access to the expensive XMLSpy, which verifies your XML.

But I think I finally understand where the disagreement lies. For example, you mentioned:
Nonetheless, I see no reason to think getting either "MarcEdit text format" OR MarcXML to be processed natively by our ILSs would be an improvement for us at all. If that's what's being suggested? I'm not sure what IS being suggested.
No, I am not suggesting ILSs. If everything were based on everyone searching separate library ILSs, everything would be fine, but that is no longer the case. The internet is growing through mashups and APIs. This is an absolute fact. It is a wonderful development and an extremely powerful one. But what does it mean exactly? I think the best explanation is this short video from ZDNet, which I suggest everyone watch.

In my opinion (and not only mine), this is the world we must enter, whether we want to or not. How do you enter this world? By creating web services. Even to begin doing this, you must use XML, since XML is the language of this world. It is not ISO2709. What are some examples?

Let us imagine that a scholarly group wants to build a site about baroque architecture (this is a real case; I know some of the people involved). One thing they should be able to do is make automatic queries behind the scenes (i.e. web services) to bring together all kinds of information and create some new tool of use to their community. They can--right now, today--use Google Maps, Google Books, Amazon, Yahoo, DBpedia, the Internet Archive ... Here are just a few of them available now: In here, we can see that there is a system called BookBump that uses a number of APIs, including the LC SRU API, which is based on providing records in... XML.
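As a rough sketch of what "making automatic queries behind the scenes" involves, here is a Python example that builds an SRU 1.1 searchRetrieve URL and pulls titles out of the XML response using only the standard library. The base URL is a hypothetical placeholder (not the LC endpoint), the "marcxml" schema name is an assumption about what a given server accepts, and the sample response is invented for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SRU_BASE = "https://sru.example.org/catalog"   # hypothetical placeholder endpoint
SRW_NS = "http://www.loc.gov/zing/srw/"         # SRU 1.1 response namespace
MARC_NS = "http://www.loc.gov/MARC21/slim"      # MARCXML namespace


def sru_search_url(cql_query: str, max_records: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL that asks for MARCXML records."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # a CQL query string
        "maximumRecords": str(max_records),
        "recordSchema": "marcxml",           # assumed schema name; servers differ
    }
    return f"{SRU_BASE}?{urlencode(params)}"


def titles_from_response(xml_text: str) -> list:
    """Return the 245 $a of every MARCXML record in an SRU response."""
    root = ET.fromstring(xml_text)
    path = f".//{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']"
    return [(sf.text or "").strip() for sf in root.findall(path)]


# Invented sample response, shaped like an SRU 1.1 reply carrying one record.
SAMPLE_RESPONSE = f"""<zs:searchRetrieveResponse xmlns:zs="{SRW_NS}">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>1</zs:numberOfRecords>
  <zs:records>
    <zs:record>
      <zs:recordSchema>marcxml</zs:recordSchema>
      <zs:recordPacking>xml</zs:recordPacking>
      <zs:recordData>
        <record xmlns="{MARC_NS}">
          <datafield tag="245" ind1="1" ind2="0">
            <subfield code="a">Flume</subfield>
          </datafield>
        </record>
      </zs:recordData>
      <zs:recordPosition>1</zs:recordPosition>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""

print(sru_search_url('dc.title="flume"'))
print(titles_from_response(SAMPLE_RESPONSE))   # ['Flume']
```

A real mashup would fetch the URL over HTTP and merge the titles with data from other services; nothing beyond a standard XML parser is needed to read the reply.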

Unfortunately, there still doesn't seem to be a Google Scholar API, which could be one of the most important APIs for our community.

If we do not enter this world based on APIs and web services, I fear that we will be left behind completely. The general public will never even know about our records. We must let our data enter and interoperate with other APIs in ways we cannot foresee right now. We also cannot expect that everyone will consciously click on the links to our catalogs and search them, because they just won't. Besides, people want to use our information in genuinely new ways that were never possible before.

This is why I feel so strongly that sticking with ISO2709 for transferring records hurts us terribly. The longer we remain in that ISO2709 straitjacket, the less we can enter the world where everything is happening. There are other reasons, too, but in the world of mashups, we cannot assume that people will come to our ILSs, especially when they will be able to use the Google Books API or the LibraryThing API or the Internet Archive API. There must be some kind(s) of library API.
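To illustrate what even a minimal library API could hand to a webmaster, here is a Python sketch that turns already-parsed MARC fields into a simple Dublin Core record in the oai_dc container. The mapping table is a deliberately simplified assumption, not the Library of Congress MARC to Dublin Core crosswalk, and the sample fields come from the Boykan "Flume" record quoted further down this page.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Hypothetical, simplified mapping: (MARC tag, subfield codes, DC element).
MAPPING = [
    ("245", "ab", "title"),
    ("100", "a", "creator"),
    ("700", "a", "contributor"),
    ("260", "b", "publisher"),
    ("260", "c", "date"),
]
ISBD_PUNCTUATION = " /:;,."   # trailing punctuation stripped from each value


def marc_to_simple_dc(fields):
    """fields: list of (tag, [(code, value), ...]) pairs from an already-parsed record."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    seen = set()  # avoid repeating e.g. the same name from several 700 fields
    for tag, codes, element in MAPPING:
        for field_tag, subfields in fields:
            if field_tag != tag:
                continue
            parts = [v.strip(ISBD_PUNCTUATION) for c, v in subfields if c in codes]
            text = " : ".join(p for p in parts if p)
            if text and (element, text) not in seen:
                seen.add((element, text))
                ET.SubElement(root, f"{{{DC_NS}}}{element}").text = text
    return root


FLUME = [
    ("245", [("a", "Flume"), ("h", "[sound recording] :"),
             ("b", "selected chamber works /"), ("c", "Martin Boykan.")]),
    ("700", [("a", "Boykan, Martin."), ("t", "Sonatas,"), ("m", "violin, piano.")]),
    ("700", [("a", "Boykan, Martin."), ("t", "Flume.")]),
]

print(ET.tostring(marc_to_simple_dc(FLUME), encoding="unicode"))
```

Simple DC loses most of the detail in the MARC record, but anyone with an XML parser can use it immediately.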

RE: rdacontent terms - dataset

Posting to RDA-L

Bernhard Eversberg wrote:
Am 15.02.2011 15:27, schrieb Weinheimer Jim:
> Of course, MARCXML doesn't solve all the problems, but one big one will be out of the way.
Oh get real, Jim!
The plain text format of MarcEdit can do the same, with an absolute minimum of effort when compared to MARCXML. Don't overlook the constraints of our actual ecosystem. Where ISO can't be avoided, for some updating or transfer purposes, MarcEdit can still be converted both ways with existing tools. Where MARCXML is desired, it can be produced from both ISO-MARC and MarcEdit. MARCXML must be looked upon as an add-on, not a requirement or necessity to escape the present, unsplendid ISOlation.
I am being real. The plain text format of MarcEdit *absolutely cannot* do the same as MARCXML. I'll prove this right now. Browsers are built to work with XML, so right now, this second, any webmaster can work on the fly with XML using nothing more than a browser. They need no other tools. This is the importance of XML. Here is an example of how it works, which you can change yourself by adding variables and values as you like:
For example, in the XML part (left side), add an element named <thing> with a value of your choice under <food>.
In the XSLT (right side), just after <xsl:value-of select="description"/>, copy and paste this:
<span style="font-weight:bold;font-size:5em;color:red;"><xsl:value-of select="thing"/></span>
Then click the button and see what happens underneath. This particular example runs the transformation on a server, but it can also be done in a browser, or with a combination of the two. There are a thousand variations on using XSL Transformations, some very impressive. I always assume that when you have an XML file, you can do *anything* with the information within it. Anything at all. There are also many more ways of using XML records than XSLTs alone. You cannot do this with a plain text format.
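Python's standard library has no XSLT processor, so the following is not XSLT; it is a rough stand-in for what the stylesheet does in the exercise above, showing how little code it takes to pick a new element out of XML and style it. It assumes the added element is named <thing>, and the menu data is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

# Invented sample data in the shape of the menu exercise, with one added <thing>.
MENU = """<breakfast_menu>
  <food>
    <name>Belgian Waffles</name>
    <description>Two waffles with syrup</description>
    <thing>Hello, world</thing>
  </food>
  <food>
    <name>French Toast</name>
    <description>Thick slices of bread</description>
  </food>
</breakfast_menu>"""

BIG_RED = "font-weight:bold;font-size:5em;color:red;"


def render(xml_text: str) -> str:
    """One line per <food>, plus a big red span for <thing> when present."""
    root = ET.fromstring(xml_text)
    html_parts = []
    for food in root.iter("food"):
        html_parts.append(f"<div>{escape(food.findtext('name', ''))}: "
                          f"{escape(food.findtext('description', ''))}</div>")
        thing = food.findtext("thing")
        # The XSLT would emit an empty span when <thing> is missing; we skip it.
        if thing:
            html_parts.append(f'<span style="{BIG_RED}">{escape(thing)}</span>')
    return "\n".join(html_parts)


print(render(MENU))
```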

Even though I use MarcEdit every day, nobody in the world uses (or will use) it except for libraries and librarians. MARC in its ISO2709 form is used today *only* for transferring records from one *library catalog* to another *library catalog*. It has no other function. This is why I say that so long as we use ISO2709, we are stuck on "Library Island" because nobody else can transfer our records.

Why? Because you can't do anything at all with them until you have parsed them out and transformed them into a format you can deal with. Therefore, if a webmaster wants to work with our records now, they first have to parse them using a separate tool (like MarcEdit) to transform them into XML (or some kind of format that works with a web browser), and this they will not do because they *cannot* work with the records on the fly, as they can easily with the XML above. If we supply people with XML--even DC simple--they can at least work with it to an extent.
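For a sense of the parsing step a webmaster would face, here is a minimal Python sketch that reads one ISO 2709 record through its leader and directory and re-expresses it as MARCXML. It assumes a single well-formed MARC 21 record with UTF-8 data and ignores MARC-8 character sets, files with many records, and error recovery; build_iso2709 is a hypothetical helper included only so the parser can be exercised.

```python
import xml.etree.ElementTree as ET

FT, RT, SD = b"\x1e", b"\x1d", "\x1f"  # field terminator, record terminator, subfield delimiter
MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("", MARC_NS)


def build_iso2709(fields):
    """Hypothetical helper: assemble a toy record from (tag, data) pairs."""
    directory, body, pos = b"", b"", 0
    for tag, data in fields:
        chunk = data.encode("utf-8") + FT
        directory += f"{tag}{len(chunk):04d}{pos:05d}".encode("ascii")
        body += chunk
        pos += len(chunk)
    directory += FT
    base = 24 + len(directory)
    leader = f"{base + len(body) + 1:05d}nam a22{base:05d}   4500"
    return leader.encode("ascii") + directory + body + RT


def parse_iso2709(record: bytes):
    """Split one record into its leader and (tag, data) pairs using the directory."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])           # leader/12-16: base address of data
    directory = record[24:base - 1]     # drop the terminator that ends the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base + start:base + start + length - 1]  # drop field terminator
        fields.append((tag, data.decode("utf-8")))
    return leader, fields


def _q(name):
    return f"{{{MARC_NS}}}{name}"


def to_marcxml(leader, fields):
    rec = ET.Element(_q("record"))
    ET.SubElement(rec, _q("leader")).text = leader
    for tag, data in fields:
        if tag.startswith("00"):        # control fields have no indicators or subfields
            ET.SubElement(rec, _q("controlfield"), {"tag": tag}).text = data
            continue
        df = ET.SubElement(rec, _q("datafield"),
                           {"tag": tag, "ind1": data[0], "ind2": data[1]})
        for chunk in data[2:].split(SD)[1:]:
            ET.SubElement(df, _q("subfield"), {"code": chunk[0]}).text = chunk[1:]
    return rec


raw = build_iso2709([("001", "12345"),
                     ("245", "10" + SD + "aFlume" + SD + "cMartin Boykan.")])
leader, fields = parse_iso2709(raw)
print(ET.tostring(to_marcxml(leader, fields), encoding="unicode"))
```

Every consumer of ISO2709 has to write (or find) something like this before doing anything else; with XML the parsing is already done by tools every platform ships.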

Again, if somebody were to build an ISO2709 parser into every browser, matters might be different (maybe?), but there is no chance of that, and in any case it is we who should change, not everyone else.

Tuesday, February 15, 2011

RE: Cataloguing non print materials

Posting to RDA-L

J. McRee Elrod wrote:
I've had not one suggestion on or off list with any provision in RDA which makes it easier to catalogue electronic resources than using AACR2, which might have been added to AACR2.
That is very interesting and it certainly mirrors my experience. Cataloging electronic resources that you do not control, e.g. a digital copy of a book on the web, is not more difficult than cataloging a regular book. You just handle it differently, and decide if it is, in FRBR terms, a new item or a new manifestation. Keeping a valid URL is the major difficulty.

Working with "real" web materials is completely different, but this does not mean that they are any more difficult to catalog. In my experience, the problems fundamentally stem from the nature of the materials: 1) it's difficult to examine the item. With a book or map or recording, etc., it is pretty easy to examine the entire item before you begin cataloging. With websites, you don't know where a site starts or finishes, whether you have left the site or not, etc.; 2) the information on the item changes constantly and unpredictably, so you can never be sure whether the record you made last week--or even five minutes ago--has anything to do with what the resource looks like now.

But none of this is really new, since these are essentially the same problems we face when cataloging serials and looseleaf publications. That's why I think that the real problem is: 3) web resources change with no notice. With physical resources, the mail arrives, is sorted, and the relevant materials eventually get sent to someone who updates the record. With web materials, this no longer applies: the webmaster can change the title of the resource, the site can even be hijacked, and so on, and the record does not change because we have not been notified. The note we provide giving the date the title was viewed, "Title from home page (viewed April 22, 2002)", is just pathetic and amounts to saying: "Don't blame me, I'm only the cataloger."

None of this has anything to do with cataloging *rules* and much more to do with procedures and using technology to deal with a different kind of material. I still believe the ideas from an article I wrote in Vine Magazine in 1999 could point toward a solution. The article was much too long (my usual problem, I know), but in essence it suggested that metadata embedded within the resource could be checked periodically by a web spider, and if certain information were updated, the catalog record would be updated as well.

I saw the workflow as follows: a selector decides a site is worthwhile and provides some instructions to the cataloger (e.g. analyse certain subsections of the site). This goes to a cataloger who creates the record(s). The webmaster is notified that the site has been selected as especially worthwhile and is given a copy of the metadata record(s) to be placed on specific page(s). Then, if the webmaster changes the title of the site or other information such as the dates or basic description, he or she is required to change the <dc:title> or <dc:date> in the embedded metadata record. (This is simple for the webmaster.) A spider checks the pages periodically, and if any changes are found, they are added to the catalog record automatically, and messages are sent to both the catalogers and the webmaster to notify everyone of the changes. I saw it as a kind of interactive CIP that apportions the labor where it best belongs.
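The spider step of that workflow might look something like this Python sketch. It assumes the embedded record uses the common HTML convention of <meta name="DC.title" content="..."> tags rather than any particular XML serialization; fetching the page (e.g. with urllib.request.urlopen) and actually sending the notifications are left out, and check_page and DCMetaParser are hypothetical names.

```python
from html.parser import HTMLParser


class DCMetaParser(HTMLParser):
    """Collect <meta name="DC.xxx" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if name.startswith("dc.") and attrs.get("content") is not None:
            self.found[name[3:]] = attrs["content"]   # "DC.title" -> "title"


def check_page(page_html: str, catalog_record: dict):
    """Compare embedded DC with the catalog record.

    Returns an updated copy of the record and the messages that would be
    sent to the catalogers and the webmaster.
    """
    parser = DCMetaParser()
    parser.feed(page_html)
    parser.close()
    updated = dict(catalog_record)
    messages = []
    for element, new_value in parser.found.items():
        old_value = catalog_record.get(element)
        if old_value != new_value:
            updated[element] = new_value
            messages.append(f"{element}: {old_value!r} -> {new_value!r}")
    return updated, messages


page = ('<html><head><meta name="DC.title" content="Baroque Architecture Portal">'
        '<meta name="DC.date" content="2011"></head><body></body></html>')
record = {"title": "Baroque Architecture Site", "date": "2011"}
print(check_page(page, record))
```

The cataloger still decides what goes into the record in the first place; the spider only keeps the record in step with changes the webmaster has already declared.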

But I have never seen how changing cataloging *rules* has much to do with the matter.

Sunday, February 13, 2011

The Dirty Little Secrets of Search

Posting to NGC4LIB


Here is another article that people may find interesting, from the New York Times: "The Dirty Little Secrets of Search" by David Segal (February 12, 2011), which contains an excellent discussion of search engine optimization (SEO) and of what Google does to punish companies or individuals that try to get around its guidelines. It is also interesting to note Google's terms of service (under "Quality Guidelines").

Of course, people will, and organizations must, push these guidelines to their limits. An organization such as J.C. Penney (from the NYTimes article) must try to maximize its sales, and advertising is the only way to do that. In this new environment, while Penney could take out ads for its merchandise in, e.g., the NYTimes, people no longer think that way. To find new dresses on the web, people go to Google--not the NYTimes--and newspapers are suffering terribly and even shutting down because of it. Therefore, it is absolutely vital for Penney that when someone searches "dresses" in Google, they see the Penney site on the first page and *not* on page 2. How do you ensure this? By hiring a company that specializes in SEO, or by the only other choice: paying Google so that when someone searches "dresses", an "adword" comes up with a link to Penney's site.

Google has its own guidelines for punishing what it calls "dirty tricks" (read the article), and Penney's site fell from #1 to #50 or so, in any case to a position where those links become essentially useless. Google, of course, claims innocence in all of this, saying that the two parts (adwords and search) are completely disconnected, so that it is not the case that if you make Google mad you are punished but can fix it with some money, even though the only other way to ensure that people will see your "dresses" on Google's first page is to pay Google. Naturally, this is a situation that is ripe for exploitation, in spite of Google's motto "Don't be evil".

The reality is that everyone tries to hover just below Google's "detection screen" to move up a little bit gradually, but not too much. Where does that leave the public? Although they may "feel" free, behind the scenes they are being incredibly manipulated. I especially liked where he wrote: "When you read the enormous list of sites with Penney links, the landscape of the Internet acquires a whole new topography. It starts to seem like a city with a few familiar, well-kept buildings, surrounded by millions of hovels kept upright for no purpose other than the ads that are painted on their walls." Maybe the view of the virtual world is not that of Tron, of either complete control or complete freedom, but much like the view of what the real modern city would become in Blade Runner.

The scholarly/education world is no more virtuous than the regular world, and will suffer from the same problems. In this regard, the article "Academic Search Engine Spam and Google Scholar's Resilience Against it" by Joeran Beel and Bela Gipp (Journal of Electronic Publishing, Volume 13, Issue 3, December 2010, DOI: 10.3998/3336451.0013.305) is even more important, especially since the authors concluded that it is easier to spam Google Scholar than regular Google.

This is a rat race that libraries should do their best to avoid. The Google-inspired tools based on crowdsourcing have their advantages, but as this article makes clear, they have problems as well. We should assume that Google Books and Google Scholar will have essentially the same problems, if not worse. It still seems to me that traditional library goals and ethics, based on standards, can have a role in solving this dilemma, but at this point I don't know how.

Saturday, February 12, 2011

RE: RDA and MARC (was Linked data)

Posting to RDA-L

Karen Coyle wrote:
Quoting Weinheimer Jim:
> But I wonder if what you point out is a genuine problem, especially in an RDA/FRBR universe. The user tasks are to find, identify, yadda -- works, expressions, manifestations, and *items*. Not sub-items.
Jim, I think you're at the wrong end of the WEMI continuum -- what this record lacks is better access to *Works* contained in the manifestation/item. Items are the physical items, the thing you have in hand. The added entries in this record represent persons and works.
Well, it is still a *valid* way of looking at it. The purpose of the traditional unit record is to describe the *thing* you are cataloging as a whole--after you decide what constitutes that "whole"--and then link it to other records in various ways. The analytic (essentially an extra card in the catalog, mostly under main entry) was meant to supplement matters up to a point. The problem is, and always has been: how do you define the "whole"? I'm sure the problem was always in the back of our minds when we cataloged one of those single-volume complete works of Shakespeare. ("But nobody can find Romeo and Juliet by Shakespeare with this record for his complete works!" This is another of those problems that worked more or less in the card catalog and does *not* work in the OPAC.) But, ultimately more important: if you are cataloging a conference, do you catalog each paper separately? If you are cataloging a serial, do you catalog each issue separately, or each article in each issue separately? I have done all of that.

These are some of the most important issues in cataloging because if you decide, e.g., to catalog each conference paper separately, your work may increase tenfold or more, and it is a huge responsibility to keep from crashing and burning. Promising something like that means that you must have the resources to achieve it. Of course, we don't even try with journal articles.

But in today's world even that is not enough. Let's look again at those examples. In the catalog record for Boykan's Flume, we can see the problems clearly:

245 10 |a Flume |h [sound recording] : |b selected chamber works / |c Martin Boykan.

505 0\ |a Sonata for violin and piano (17:54) -- A packet for Susan (19:59) -- Flume : fantasy for clarinet and piano (10:40) -- String quartet no. 1 (18:58)

700 12 |a Boykan, Martin. |t Sonatas, |m violin, piano.
700 12 |a Boykan, Martin. |t Packet for Susan.
700 12 |a Boykan, Martin. |t Flume.
700 12 |a Boykan, Martin. |t Quartets, |m strings, |n no. 1.
The cataloger could have considered the 245 adequate, but providing only that kind of access is considered practically useless in music cataloging, where catalogers are expected to do much deeper analysis than most others. (In book cataloging, this is not seen as such a bad problem.) Still, note that the 505 note is not enhanced, since there is access through the 700s.

But compare this with the Amazon record:
505 0 |t Sonata for violin & piano: molto moderato -- |t Sonata for violin & piano: alla marcia -- |t Sonata for violin & piano: variations -- |t A packet for susan, for mezzo-soprano & piano: it often comes into my head -- |t A packet for susan, for mezzo-soprano & piano: the good-morrow -- |t A packet for susan, for mezzo-soprano & piano: bright star -- |t A packet for susan, for mezzo-soprano & piano: the owl and the pussycat -- |t A packet for susan, for mezzo-soprano & piano: well I remember -- |t Flume, fantasy for clarinet & piano -- |t String quartet no. 1: sostenuto -- |t String quartet no. 1: allegro -- |t String quartet no. 1: interlude, adagio espressivo -- |t String quartet no. 1: leggiero.
which has been parsed into 505 $t subfields, and the "works" have changed. "Sonata for violin and piano" has been split into "molto moderato, alla marcia, variations" (separated only by that horrid punctuation mark!). The other pieces are split in the same way. So someone could say, "If I search in the catalog for 'packet susan owl pussycat' I won't get it, but I will in Amazon. Therefore the Amazon record is superior."
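That complaint can be reproduced with a naive keyword AND search in Python. The library string is the 505 note quoted above and the Amazon string is an excerpt of the Amazon-derived note; real OPAC and Amazon searches are of course far more complex than this.

```python
import re

LIBRARY_505 = ("Sonata for violin and piano (17:54) -- A packet for Susan (19:59) -- "
               "Flume : fantasy for clarinet and piano (10:40) -- "
               "String quartet no. 1 (18:58)")
# Excerpt of the Amazon-derived 505 quoted above.
AMAZON_505_EXCERPT = ("A packet for susan, for mezzo-soprano & piano: the good-morrow -- "
                      "A packet for susan, for mezzo-soprano & piano: the owl and the pussycat")


def words(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_match(record_text: str, query: str) -> bool:
    """True if every query word occurs somewhere in the record (naive AND search)."""
    return words(query) <= words(record_text)


query = "packet susan owl pussycat"
print(keyword_match(LIBRARY_505, query))         # False
print(keyword_match(AMAZON_505_EXCERPT, query))  # True
```

Both versions still match "packet susan", which is the access the library record was designed to give.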

But this still isn't the end. If you look at Jonathan Zittrain's public lecture "The Future of the Internet" (an important talk), it is a complete talk, but it is split into 17 parts, each with a different theme.

But even this is not the end. I have seen some tools, such as HRAF, that index each paragraph! With full-text tools such as Google Books, it is hard to predict what is going to happen. In any case, the very question of "what is a work?" is far from solved today.
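Paragraph-level indexing is easy to sketch. The Python below is a generic inverted index keyed to (document, paragraph) pairs; it is not HRAF's actual method, which isn't described here, and the sample documents are invented.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_paragraph_index(documents: dict) -> dict:
    """Map each word to the set of (document id, paragraph number) where it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        paragraphs = [p for p in text.split("\n\n") if p.strip()]
        for number, paragraph in enumerate(paragraphs, start=1):
            for term in tokenize(paragraph):
                index[term].add((doc_id, number))
    return index


def search(index, query: str) -> list:
    """Return (document, paragraph) pairs containing every query word."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return sorted(hits)


DOCS = {  # invented sample texts
    "report-1": "Kinship terms among herders.\n\nMarriage customs and dowry payments.",
    "report-2": "Dowry disputes in village courts.\n\nHarvest festivals.",
}
idx = build_paragraph_index(DOCS)
print(search(idx, "dowry"))           # [('report-1', 2), ('report-2', 1)]
print(search(idx, "marriage dowry"))  # [('report-1', 2)]
```

Once the unit of retrieval is the paragraph, the question of which "whole" a record describes becomes even harder to answer.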

The modern information environment is fragmenting. It is being sliced and diced into a bunch of little pieces, while others take what they want and make jambalayas of their own personal recipe, sharing and mixing, changing and discarding. To believe that the FRBR framework of work-expression-manifestation-item encompasses this new world, describes this new world, makes sense of this new world, or even helps to comprehend this new world, takes nothing less than--in my opinion--a tremendous leap of faith. And I have said before that I lost my faith a long time ago.

I am sure the Googley guys would never even consider thinking in FRBR terms. They will look at the situation as it stands and try "something". If that doesn't work, they'll try another one and another....

I think we need a new set of eyes.

Friday, February 11, 2011

RE: [RDA-L] general interest in RDA

Posting to RDA-L

Kevin M. Randall wrote:
Jim, it sounds from this comment that you really are not grasping what RDA is all about. If you look at it just in terms of the guidelines themselves, or the resulting MARC records currently being created, certainly it would seem that it's just a little tweaking here and there.

But the underlying philosophy and structure of RDA are nothing short of revolutionary when compared with AACR2. You are asking for change; and a huge change is what RDA is actually helping to bring about. AACR2 is based on the eight areas of ISBD, and guides the cataloger through the process of putting together a description for ISBD display. RDA is based on discrete bits of data (the RDA elements), each uniquely identified (that's a VERY important part), and guides the cataloger in supplying those bits of data, regardless of what kind of display is going to be used.
ISBD/AACR2 guide the cataloger to put together a description for ISBD *display*?! I confess that this is a very strange idea to me. I personally don't think about display when I am cataloging anything. Very few online catalogs use an ISBD display for the unit record; WorldCat, Voyager, Dynix, etc. each have all kinds of displays for their records. (OK, I confess I'm a throwback and I have used a modified ISBD display in my catalog, but I don't have to.)

ISBD mandates standards for the *creation* of the single record (or unit record) such that the description can be shared internationally. Currently, the ISBD standard separates the elements using punctuation, but it could just as easily (and should) be linked to UNIMARC, much as the LCRIs now show the MARC format; and UNIMARC is based much more closely on ISBD than MARC21 is.

AACR2 extends ISBD (with a minimum of differences, we hope?) by stipulating how these separate records should link together, defining strings of text that imply some kind of physical arrangement (primarily alphabetical, or "dictionary"), and this method of alphabetical browsing is carried over into subjects as well.

Now, if we are talking about the displays of *multiple records*, that is another matter, since FRBR discusses e.g. what a user *needs to be able to do* with a work or expression.

While the RDA elements are very important, and they are uniquely identified (and I don't want to be misunderstood: they *are* important), this is nevertheless not something all that new, since a coding such as 260$c and are equivalent. So, we could just as easily have but we want things to work with *English* language terms, something that was not possible back in the 1960s when MARC and ISO2709 were created. With the rise of the FRBR framework, other possibilities emerged which, we must admit, were always there but went beyond the purposes of the catalog (as it was at that time), e.g. the place where a work was created as opposed to the place where it was published, or the extent not only of a manifestation but of an expression. Again, I posit that this is not really new; a cataloger could always have done the extra work to find out where Stephen King wrote "The Shining", but it wasn't seen as worth the effort, so there were no guidelines for establishing or encoding it. For some manuscripts and early printed books, the extent of the actual expression inside the physical items was seen as absolutely important, and has been described in far more detail than for regular, mass-produced books. It remains to be seen whether our predecessors were correct in considering this information not worth the effort, or whether some kind of crowdsourcing or interoperation with other databases will provide a "solution", if indeed it is determined that there is a problem.
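The point that a numeric coding and a named element can carry exactly the same information can be shown with a simple lookup table. In this Python sketch, the English labels are RDA element names used only to illustrate what such labels could look like; the table is a hypothetical fragment, not an official mapping, and the sample values are invented.

```python
# Hypothetical fragment: MARC 21 (tag, subfield) -> RDA element name.
MARC_TO_RDA = {
    ("260", "a"): "Place of Publication",
    ("260", "b"): "Publisher's Name",
    ("260", "c"): "Date of Publication",
}
RDA_TO_MARC = {label: code for code, label in MARC_TO_RDA.items()}


def relabel(coded: dict) -> dict:
    """Swap numeric MARC coding for English element names; the data is unchanged."""
    return {MARC_TO_RDA[key]: value for key, value in coded.items()}


def recode(labeled: dict) -> dict:
    """The reverse: English element names back to MARC coding."""
    return {RDA_TO_MARC[key]: value for key, value in labeled.items()}


record = {("260", "a"): "Anytown :", ("260", "b"): "Example Press,", ("260", "c"): "1999."}
print(relabel(record))
```

The round trip loses nothing, which is the sense in which labeled elements by themselves are not a new capability.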

But ISBD is primarily a standard for *description* and not display, i.e. how to describe an item for maximum clarity and interoperability in the greater world. There are just a few pages of rules for punctuation, while the vast majority discuss which information to input and how to do it. The rules for description are incomparably more important than those for punctuation. Anyway, the display that ISBD mandates is followed by almost no one now and is pretty much ignored, since other methods have overshadowed it. Whether this is wise or not is a matter of debate.

FRBR continues ISBD in a theoretical sense and attempts to create a framework for how records in the aggregate are supposed to "function" with one another, but once again, I suggest that FRBR does nothing more than describe how our catalogs have always worked--and does not discuss whether these are the tasks that users themselves really want and need to do. To give only one instance, the dictionary catalog is *dead*--dead and, it should be, buried.

These are a few of the challenges of *genuine change* that we face, challenges our predecessors could *never have imagined*. I do not see how RDA and FRBR address these issues. This is some of what I have tried to demonstrate in my last two podcasts.

FW: [RDA-L] general interest in RDA

Posting to RDA-L

Barbara Tillett wrote:
James - If we just keep business as usual, I am convinced libraries will go the way of the dinosaurs, and very soon (as we've seen academic and public libraries shutting down branches and closing catalog depts to rely on vendors or technicians to do copy cataloging only).

The metadata we provide has tremendous potential for re-use in the internet environment in ways that will make libraries even more relevant to users everywhere, and that is what we are preparing for with RDA - when we can move to creating well-formed metadata following RDA's elements and relationships, away from the AACR2 mentality of creating only linear citation listings with main entries and authorized headings (it can be done other ways, given labeling the data for machine re-use). We must break with that kind of 19th and 20th century thinking. It's not just a matter of little tweaks to AACR2 and LCRIs.

We definitely need our vendors on board to make all this much easier for catalogers, and we can build a shared vision of where we are going with all this. Why not a shared datafile of the world's bibliographic and authority data, freely accessible for all to use - not behind OCLC's WorldCat with its costs and restrictions and the costly repetition of the same data in local OPACs around the world - why not replace OPACs with much better resource discovery systems? Ex Libris is moving that direction as is III. Those resource discovery systems of tomorrow will be able to answer all sorts of user questions, not just the author/title/subject index choices we give them now, and not just be proprietary to libraries but open to the entire information community. We could be doing so much more for so much less cost by sharing globally and using a structure of well-formed metadata, packaged in an RDA-based XML schema.

I would much rather be energized by such a prospect than wallow in the gloom and doom of today's economic woes. Let's make it less expensive and better than ever.

I agree with what you say almost completely. Libraries must update their "world views" to include what the general public actually uses by adapting to the new information environment, or as I described it in my talk at the RDA@yourlibrary conference, these are matters of Darwinian survival.

Where I disagree is that I believe the changes in RDA really are just little tweaks to AACR2 and the LCRIs; they are not indicative of any real change either for the sharing or production of our records, and will not help or hinder the new directions you outline. But catalogers themselves will be hindered, since everyone will have to learn to use new tools and new terminology to produce what is the same product as today, except for a few cosmetic changes. I have yet to see how RDA will improve the situation for our patrons, while being incredibly disruptive--and expensive--for us. We need CHANGE--not typing out abbreviations and adding a few extra fields. Those are not the problems we face.

But I am repeating myself, and I hate to say bad things about RDA on this list. The more I think about it, the more I believe that Michael Gorman's talk at the conference makes the most sense in the current environment.

Still, I think we all agree about where we want to end up; we just disagree on how to get there.

RE: RDA and MARC (was Linked data)

Posting to RDA-L

Kelley McGrath wrote:
There was a discussion a while ago now about the problems (or not) with MARC. I gave a presentation at ALA Midwinter called "Will RDA Kill MARC?" I was sort of waiting for the official version to be posted, but, although the person organizing the presentation has tried to post it on the ALA/ALCTS site, apparently the site is down a lot. So in order to belatedly get my two cents in, I've put the presentation up at for anyone who might be interested. I guess my point is that we could make MARC do at least most of the things we need it to do to support RDA, but that's probably not the best use of our limited resources.

Interestingly, one of the audience members asked rather if MARC will kill RDA...
Thank you so much for sharing this presentation. It's no secret what I think about RDA, FRBR, and MARC, but I agree with a lot of what you point out. Yet I do have a question. On slide #15, you use an example MARC record (which I discovered to be "Flume" by Martin Boykan) to show how the current coding of our data is inadequate, and at first I could not understand how the coding you presented there was inadequate. But I realized that the problem (if it turns out to be one) is that the information for, e.g., Cyrus Stevens pertains only to the first work (Sonata for violin and piano), which was performed at the Sonic Temple, etc., and that the other pieces of music there had other performers and performance information. As a result, information for the separate pieces is spread all over the record, and as you point out, it cannot be brought together.

[As an aside, for the entire record, you can see the one at Princeton (you need to click into the record; it's the closest I can get), and it is also interesting to compare this with the version from Amazon, using that wonderful conversion tool: By the way, the library record seems superior in every way, allowing controlled access to every name and title.]

But I wonder if what you point out is a genuine problem, especially in an RDA/FRBR universe. The user tasks are to find, identify, yadda --> works, expressions, manifestations, and *items*. Not sub-items. This record seems to allow everything that FRBR requires, plus it allows even more because if I am looking for Boykan's First string quartet, there is a beautiful controlled analytic. This level of analysis is rarely followed in regular cataloging of other materials.

So, as far as finding goes, there is absolutely no problem. It is just that in order to *see* the metadata associated with this particular piece (remember, I can still *find* it!), I have to look at the description for the entire item. Nevertheless, it can be found--and in a controlled way as well since the cataloger did a good job--and this information is available for the later user tasks of identifying, selecting and obtaining. This seems to fall squarely within the FRBR requirements.

Of course, breaking it all down could be achieved today in MARC by creating separate constituent entry/host entry records, so that the information now scattered within the single record would come together in the separate records. With a more flexible format than MARC, the information could be grouped within the same record, e.g.:
<metadata for host entry>
  <metadata for 1st constituent>
    <details of the performance>
    </details of the performance>
  </metadata for 1st constituent>
  <metadata for 2nd constituent>
  </metadata for 2nd constituent>
</metadata for host entry>
None of this would allow more *access* than we have right now, however, since people would still be able to find exactly what they can find today. I am sure that it would be just about as much work for the cataloger as doing constituent/host entries. The real difference would be in display, since the constituents could display within the host entry, if desired (although they wouldn't have to). While this *may* be more flexible than what we have today, I still believe that with XML and XSL Transformations, we could probably have almost the same displays available using our current constituent/host entry cataloging. (By the way, I am *NOT* at all advocating that we institute host/constituent cataloging!)
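To illustrate the point about displays, here is a minimal sketch in Python (rather than XSLT, purely for brevity) of how separately cataloged host and constituent records could be assembled into one nested display. The `Record` class, its `host_id` link (standing in, loosely, for a MARC 773 Host Item Entry), and the `nested_display` function are hypothetical simplifications, not any catalog's actual data model, and the sample data is only illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical, heavily simplified bibliographic record."""
    rec_id: str
    title: str
    notes: list[str] = field(default_factory=list)
    # Link to the host record; None means this record is itself a host.
    # (Loosely analogous to a MARC 773 Host Item Entry.)
    host_id: str | None = None


def nested_display(records: list[Record]) -> str:
    """Render each host with its constituents (and their notes) indented beneath it."""
    constituents: dict[str, list[Record]] = {}
    for rec in records:
        if rec.host_id is not None:
            constituents.setdefault(rec.host_id, []).append(rec)

    lines = []
    for host in (r for r in records if r.host_id is None):
        lines.append(host.title)
        for part in constituents.get(host.rec_id, []):
            lines.append("  " + part.title)
            lines.extend("    " + note for note in part.notes)
    return "\n".join(lines)


if __name__ == "__main__":
    records = [
        Record("h1", "Flume"),
        Record("c1", "Sonata for violin and piano",
               ["Performer: Cyrus Stevens", "Place: Sonic Temple"], host_id="h1"),
        Record("c2", "String quartet no. 1", host_id="h1"),
    ]
    print(nested_display(records))
```

The grouping happens entirely at display time: each piece remains findable through its own record, so access is unchanged, and only the presentation differs.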

A real difference would come if we added controlled access for, e.g., the place and date of a performance, perhaps similar to a conference heading, but I am *NOT* advocating that, either!

I can imagine that a problem could arise with interoperability with another catalog/database that coded these matters separately, e.g. a music recordings database where you could search for specific dates and places of performance. But this raises a whole bunch of side issues that are not pertinent here.

So, what I am asking is this: while what you point out here is true, does it constitute a problem in the practical world? And especially, is it a problem in an FRBR/RDA world, where it is assumed that people want *items*?

Or is this merely a theoretical issue? I personally believe that it is a theoretical problem that does not impede finding and access in any way at all.

But yes, I agree that for all kinds of reasons we need to provide the public with more useful formats than MARC, especially its ISO2709 instantiation. But this needs to be done in steps. The first step could be allowing the public access to an XML format of our records. I don't care if it's MARCXML or MODS or some other variant. After that, we play it by ear and see what the public needs, and what our needs are.
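As a rough illustration of what exposing records in XML could involve, the following Python sketch uses the standard library's `xml.etree.ElementTree` to serialize a few data fields in the MARCXML structure (`record`, `datafield`, and `subfield` elements in the `http://www.loc.gov/MARC21/slim` namespace). The `to_marcxml` function and its tuple input format are assumptions made for this example; it omits the leader and control fields and does no schema validation, so its output is an illustrative fragment, not a complete record.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def to_marcxml(datafields):
    """Serialize (tag, ind1, ind2, [(code, value), ...]) tuples as a MARCXML-style record.

    Leader and control fields are deliberately omitted; this is a sketch,
    not a validated MARC 21 record.
    """
    # Use the MARCXML namespace as the default namespace in the output.
    ET.register_namespace("", MARC_NS)
    record = ET.Element(f"{{{MARC_NS}}}record")
    for tag, ind1, ind2, subfields in datafields:
        df = ET.SubElement(record, f"{{{MARC_NS}}}datafield",
                           {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            sf = ET.SubElement(df, f"{{{MARC_NS}}}subfield", {"code": code})
            sf.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    # Illustrative fields only, not a transcription of a real record.
    print(to_marcxml([
        ("100", "1", " ", [("a", "Boykan, Martin")]),
        ("245", "1", "0", [("a", "Flume")]),
    ]))
```

Once records are available in a form like this, converting them onward (to MODS or to some display format) becomes an ordinary XML transformation task.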

Thursday, February 10, 2011

RE: general interest in RDA

Posting to RDA-L

Nerissa Lindsey wrote:
It is interesting to hear that RDA isn't being taught yet at many of these programs. I personally think that this is unfortunate, because even if RDA is not adopted I think all cataloging students should at least be learning the fundamentals so they know why it is even being considered as a replacement for AACR2. I can understand why people who have worked in the field for many years are 'tired' as Mr. Weinheimer has mentioned. However, graduates from MLS/MLIS programs are going to be shaping the futures of cataloging/metadata departments of all kinds, and I think that educating them in RDA is just as important as teaching AACR2. I just finished my MLIS in June '10 from the University of Washington, and last spring they offered a course called RDA and Metadata taught by Diane Hillman. I gained a lot of insight from auditing this course that I wouldn't have otherwise if I stuck with just the regular cataloging courses. I see a trend across libraries at least in the US where cataloging departments are changing their names to things like cataloging and metadata department or just metadata services. I even applied for a position with the title: Resource Description and Access manager after I had graduated. I have heard stories about libraries who are hiring metadata librarians and not planning on replacing their catalogers when they retire. I do not feel qualified to state whether I think RDA is the best option or not, but I do know that any student hoping to make it in this field after they graduate better have at least a solid educational foundation about RDA.
Thanks for your input. I very rarely get to hear the voice of the "younger generation", so I really appreciate it. But let me mention, as one of those old codgers, that the world of metadata is a huge one with practices you (and I) cannot imagine, much less agree with. The voice of experience suggests that underestimating the complexity of the task facing us will lead straight to failure and ignominy. The people who come after us (and I hope, many of us who are still around--including myself!) absolutely *must* find some way to bring these disparate methods into something approaching harmony. The old methods are shot--I completely agree. The workflows, the methods, the *almost* everything, must change radically. (OK, some things can stay!) One thing I am certain about: if librarians/catalogers don't do it, somebody else will, perhaps Google or Yahoo (both for-profit corporations), perhaps an agency from some government (US, UK, Italian, German, Chinese, Russian?), perhaps an international organization, perhaps some 12-year-old kid in his basement. I don't know which one, but I do know that sooner or later everybody's work will interoperate in some way, even if that means that it is all mashed together semi-mindlessly, on the order of the Google Book metadata that we have now.

The assumption that the 19th-century conceptual framework of RDA/FRBR encompasses this huge, changing universe is rather bewildering, and completely unwarranted, in my opinion. RDA/FRBR are representative of the old methods (again, in my opinion!). While I will admit that there is a remote possibility that this rather ancient world view of FRBR may actually describe what we are facing today, I remain *extremely skeptical*. In fact, it is my belief that if Panizzi, Jewett, Cutter, etc. were alive today, they would be the first to throw out the old ways to find what people *really* need and what our capabilities really are.

I prefer not to say bad things about RDA and FRBR on this list, so I apologize.

RE: Watson - IBM's "question-answering" machine (potential implications for libraries?)

Posting to NGC4LIB

Laval Hunsucker wrote:
I don't exactly follow you, Jim. You wrote :
> But questions that demand more thought and require a deeper understanding will (I hope!) always be asked and I don't see how a computer can answer those.
But wait, Thomas did write ( and you even quoted him ) "To get a question answered you look it up on the web or you ask an expert." And that latter resource is where one would ( and surely should ) go precisely in cases of, as you put it, "questions that demand more thought and require a deeper understanding". The hope you here express ( that such questions will continue to be put ) will certainly not prove futile, but I can't in my wildest fantasies imagine why anyone would choose to put such questions to a librarian rather than to an expert.
Thanks for asking this question, Laval. As I mulled over my reply, I realized that there are certain assumptions I have mentioned throughout my postings, but I don't believe I have ever explicitly stated what I believe the function of the librarian is, especially in the future.

In the traditional information process, there have been various responsibilities, each performed by different groups. There are authors/experts (fiction & non-fiction), there are journalists, there are editors, there are peer reviewers, there are publishers, there are vendors, there are consumers, there are librarians. Each has been responsible for a specific function and each has had an important, and unique, role to play. When we talk about an "expert," it becomes clear that each of these roles also has its own levels of expertise: "expert medical editor," "expert music vendor," and so on.

But when we talk about an "expert" as you do, we have in mind some recognized authority on a topic, probably someone in education, government, or business, who has published highly regarded works on the topic the questioner is interested in. In today's world of the World Wide Web and Web2.0, it would appear that the groups I mentioned above between the author and the consumer can now, in essence, disappear, and the consumer of information can come into direct contact with the author/expert, asking questions and getting answers directly. In such a case, what is/are the role(s) of any of the other groups? I want to focus on librarians here.

The role of the "expert" is to make judgments for those who are not experts; to point out precisely and clearly what that expert believes is the "best", perhaps mentioning a few potential areas of error that the questioner should best avoid. So, when you take a class in university, you expect the professor to present what he or she honestly believes to be the truth. To expect anything else would seem rather strange and certainly violates the notion of academic freedom. While the professors may occasionally point out some areas with which they disagree, these other areas should certainly be given short shrift--after all, that is why they are the experts: to help save your time and prevent you from falling into various types of error.

While this idea sounds good at first blush, it quickly disintegrates. "Experts" are constantly disputing almost every issue (except for Bruno Latour's concept of the black box, which has its own problems, as I discussed in my latest podcast), so how is the searcher to find the "right" expert for his or her needs? What does this idea of "choosing the right expert" (which even sounds bizarre, but which is necessary nevertheless) mean in practice? For example, someone wants to know about the volatile issue of "creation science." Who is "the expert"? How does someone go about choosing one?

Gradually, the idea of impartiality arises. I submit that impartiality and expertise are very difficult to reconcile, if not impossible, since an essential part of expertise involves making judgments of all kinds; otherwise it is difficult to claim expertise. This is where I believe librarianship comes in, and becomes a vital part of it all. We have our ethics:
"VI. We do not advance private interests at the expense of library users, colleagues, or our employing institutions.
VII. We distinguish between our personal convictions and professional duties and do not allow our personal beliefs to interfere with fair representation of the aims of our institutions or the provision of access to their information resources."
This is different from expertise. In some areas, I can claim to have some expertise, e.g. Slavic-languages cataloging, some parts of internet development, the history of chess, and in those areas, I can help questioners who ask for my judgment. However, I am a librarian, striving to follow those parts of the code above as much as I can by furnishing questioners with lists of resources compiled in ways that are as free from bias as humanly possible. In the area of e.g. chess history, I can set aside my expertise and my personal distaste for some works and authors, and do my job as a librarian.

I cannot do my job as a librarian all by myself, and I rely on similarly-minded people (other librarians) to help find and compile these resources. A librarian, as opposed to the expert above, is not supposed to make judgments, but concentrate on impartially furnishing information for the searchers to sift through, consider, and decide for themselves.

Many students/questioners do not care for this because it means that they must devote their own intellectual labor. They would much rather sit back and just be told one particular expert's version of the "truth". I think that this is the real meaning behind the finding that: "Librarians like to search; people like to find." Many people out there want to be told what is the truth and not work at it. Sometimes, I want that too.

But not always.

Therefore, I believe the librarian-function, as laid out in the code of ethics, is absolutely important and vital, but this does not mean that everything will remain as it is now. Will there be "libraries" full of "librarians" using tools that are similar to what we have today? I have no idea at all. Perhaps Wikipedia itself will evolve into something that allows all of that. Perhaps there will be something beyond my imagining now. But believing that the entire matter is solved by being able to directly link a questioner with some kind of "expert" out there is--to me--an absolutely terrifying scenario, ripe for all kinds of abuse.

I certainly don't expect everyone to agree with me on this (and after all, we are all "experts" in our own ways!), but of course, I will resort to that awful rhetorical trick and point out that any disagreement will only tend to bolster my argument! :-)

RE: RDA provisions

Posting to RDA-L

Brenndorfer, Thomas wrote:
What is the economic advantage to actually recording the data, but not using it in a more modern way? The effort is already spent in recording, by someone, somewhere, but not necessarily in linking. If the only display option in the past was the card catalogue, it's not much of a leap in understanding why some fields weren't used, but yet the data was still recorded.
Or, to put it another way, as institutions cast an eye on other systems, such as IMDB, that seem to be doing a fantastic job, how can one argue that libraries can't be doing the same level of quality work-- cost-effectively!--, especially in a collaborative environment, where better tools and mechanisms (and standards!) are regularly appearing?
There is still another way of looking at it, a more modern one, and that is to actually *cooperate* with other communities such as IMDB. In the broad scheme of things, if we were to add relator information for materials that are already in IMDB (which you say is so great, and I am sure you are correct), we would be duplicating effort. That makes no sense today. So, instead of looking at it as a "competition," we should be looking for opportunities for real and genuine cooperation; in fact, we absolutely *must* be looking for such opportunities. There are myriad such possibilities today, and they will mean genuine change for us, as well as for others.

In the current information/economic/budgetary environment, I still believe that for a general cataloger to spend substantially more time cataloging each item is impracticable. Productivity must increase; and increase a lot. For specific communities, it may be a different matter.

I certainly hope that Michael Gorman's talk from the RDA@yourlibrary conference, "The Path Not Taken: A Descriptive Cataloging 'might have been'", in which he discusses some of these matters, is made generally available. We spoke at the same time, so I heard him only after the fact, but his thoughts are very important and we all need to consider them.

Wednesday, February 9, 2011

RE: RDA provisions

Posting to RDA-L

Brenndorfer, Thomas wrote:
The entries are organized by his role in each film: actor, director, producer, soundtrack, composer, miscellaneous crew, camera and electrical equipment. This is a very user-friendly organization.
The whole point to RDA is to allow properly differentiated and interconnected elements to thrive in these kinds of displays. Burying data in text descriptions is just that-- burying. It's wasted effort, and the data is of limited utility, happily living in flat file card-like environments, but not much use elsewhere. It's true that making full use of RDA elements in MARC is a problem, but it would be wise to assert that it is MARC that has the problem, not RDA.
It may not be so much a MARC problem as a conscious decision among catalogers quite some time back that continuing this sort of access was unwarranted. Adding relator codes has always been possible but, while I cannot point to the specific decision from where I am currently, it was obviously decided that it was not worth the effort. I can, however, point to some cards that show earlier practices of "joint author" and "comp."

The original LC AACR2 Rule Interpretation on relators apparently appeared in 1982 (p. 29-30), when LC decided not to apply the option except for "ill." in added entries for illustrators. In addition, certain communities went their own ways, e.g. the art cataloging community, which wanted access for artists such as Albrecht Dürer, who fulfilled many roles.

There is a lot of metadata that could be added to records that is considered not worth the effort. It's important to distinguish between a complete lack of access (i.e. when a name is not recorded at all) as opposed to more "specific" access, such as being able to limit a search to someone as an editor, a publisher, a producer, a "joint author", and so on, although the person's heading can still be found.
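The distinction between finding a heading at all and limiting a search by role can be shown with a small Python sketch. The sample entries, the use of the relator term "ill." (the one exception LC kept, per the Rule Interpretation mentioned above), and the functions `group_by_role` and `search` are all hypothetical. A name recorded without a relator is still found by name, but it cannot be limited by role or placed under a role heading in an IMDB-style display.

```python
from collections import defaultdict

# Hypothetical label for access points recorded without a relator term.
UNSPECIFIED = "(role not recorded)"


def group_by_role(entries):
    """Group (title, relator) pairs for one name heading by role, IMDB-style."""
    groups = defaultdict(list)
    for title, relator in entries:
        groups[relator or UNSPECIFIED].append(title)
    return dict(groups)


def search(entries, role=None):
    """Return titles for the heading, optionally limited to one relator term."""
    if role is None:
        # Every entry is found by name, whether or not a role was recorded.
        return [title for title, _ in entries]
    # Limiting by role only works where a relator was actually recorded.
    return [title for title, relator in entries if relator == role]


if __name__ == "__main__":
    entries = [("Work A", "ill."), ("Work B", None), ("Work C", "ill.")]
    print(search(entries))          # all three titles
    print(search(entries, "ill."))  # only Work A and Work C
    print(group_by_role(entries))
```

"Work B" never drops out of the name search; it simply cannot take part in the more specific, role-based access, which is the cost-benefit trade-off at issue.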

Of course, any information at all can be added, but the unavoidable question is: is it worth the effort to distinguish "blurb writer" from "licensor"? A practice that can be achieved only by 1% or 5% of the cataloging community cannot be considered practicable for the entire cataloging community.

Perhaps in highly specialized databases it is worth the effort, but for a general cataloger, practical matters must enter into it somewhere. And *especially so* today, since the cataloging community is facing highly restricted budgets for a long time to come.

RE: Watson - IBM's "question-answering" machine (potential implications for libraries?)

Posting to NGC4LIB

Thomas Krichel wrote:
B.G. Sloan writes
> What if we had sophisticated affordable "question-answering" machines in ten years? What would that mean for libraries?"
Why would that make any change? The idea that people go to see librarians to get questions answered is already many years out of date, isn't it? To get a question answered you look it up on the web or you ask an expert.
This is absolutely correct: the future is with us now! The number of reference questions asked has tumbled and there is no reason to think that this will change anytime soon.

Of course, almost no question has a single "correct" answer, except for questions such as "How tall is Mt. Everest?" or "True or false: Lincoln belonged to the Democratic Party." Almost every substantive question has several possible answers. For example, one question I was once asked pops into my mind: Does communism lead inevitably to Stalinism? Hard to answer with a yes or no! There is no single "correct" answer.

So, the traditional reference questions termed "ready-reference" are probably already gone from the reference desk. But questions that demand more thought and require a deeper understanding will (I hope!) always be asked and I don't see how a computer can answer those. The traditional library ideal that the librarian furnishes the searcher with information--in an unbiased manner--(or at least so far as is humanly possible) will still be needed, so that people can examine various ways of looking at an area of concern to them, and each can finally arrive at his or her own, personal version of "the truth".

How librarians can help people achieve this sort of ideal in a networked, virtual environment remains to be seen, but it seems to me one of the more interesting of the various challenges we face.

Monday, February 7, 2011

Cataloging Matters, podcast #8


RDA: the Wrong Solution 
for the Wrong Problem
a paper given at the 
RDA@yourlibrary online conference
hosted by the Amigos Consortium
February 4, 2011
I believe the title of this talk pretty well sums up what I intend to discuss. Libraries and their catalogs are in great distress, but although in distress, I remain optimistic since I believe there are many solutions possible. Unfortunately, RDA goes in the wrong direction: it will not help the public use our catalogs any better, and more importantly, RDA ignores the real problems that libraries, librarians, and catalogers are facing. Above all, catalogers need help. And lots of it.

What are these problems? I want to discuss a few of them, but remember that this is *not* an exhaustive list!

Today, with the exponential growth of the Internet, libraries are faced with a huge and ever-growing number of resources that need description and organization; we are also experiencing the apparently paradoxical situation of more and more variants of resources along with fewer copies of them. By this I mean that instead of hundreds or thousands of more-or-less exact physical copies of a single resource, such as we see with multiple copies of a single book, these new resources are truly unique, single websites (i.e. each website equals a single copy), but this single copy can be viewed simultaneously by any number of people wherever they are on the web. At the same time, there is an increasing number of variants of resources as they are reworked in all kinds of novel ways. We have never seen these types of resources before. There are many examples, such as the “Star Wars Kid” video on YouTube, which went viral and spawned multiple versions of that one video.
[The best discussion of the many facets of Star Wars Kid is in Chapter 16 of the lecture Jonathan Zittrain: The Future of the Internet and the many videos are at You must see the original first, though:
There are also “mashups”, i.e. webpages that bring together bits and pieces of other webpages. All of these resources change constantly, sometimes with a new version every few minutes or even seconds, and since the older versions are not saved, they disappear forever. Back when I saw the first examples of these resources, I thought they were the very definition of “ephemera” and therefore out of the scope of library catalogs. That was a simple and satisfactory solution for me, but too bad: I was wrong. These kinds of resources turn out to be very important.

Articles about all of these new resources are written in scholarly journals. Online social networks are providing completely new ways to find information. One such network is Aardvark, which bypasses searching altogether and links you to “someone” who can answer your question! There is little doubt that developments using these, and still more innovative, tools will continue into the future. These are some of the consequences of the changing nature of modern information.

Then, once people have found the resources they want, they can use citation management software that not only manages their documents and citations, but also hooks them into collegial networks where they can collaborate in various ways. This software will even search for new documents automatically, using semantic analysis tools that ferret out your needs, without anybody having to consciously search for anything at all, not even a subject or a name heading! We can all take it for granted that these tools will become increasingly sophisticated as they develop.

None of this is science fiction; it is happening as I speak. As a consequence, there are: 1) resources that never could have existed before the Internet; 2) entirely new methods to find resources; 3) people using these resources in unprecedented ways, reworking them for their own purposes; and as a result, I think it is safe to conclude that 4) the general populace has completely different expectations and needs than ever before.

And yet, we need to realize that many of the changes libraries are facing are not really aimed at us; what we are experiencing is primarily a side effect of the sweeping changes going on in the traditional publishing industry. As a result, all libraries, from the local public library to the research library, are facing more-or-less the same pressures, or will be very soon. The changes that are roiling the traditional physical publication industries find their end point in the library.

The different publishing industries are dealing with the changes in various ways. The music publishers have been reduced to using threats and are despised by many throughout the world. In the traditional print industries, newspapers are in the vanguard, and their demise gives us a glimpse of the future of the entire print publishing industry. It is clear that the old ways are changing. So, the changes I am discussing in libraries are not necessarily focused on us: they represent a sea change in the centuries-old patterns of publication, of how people communicate among themselves, and libraries have been the end point of that process; but this too, may be coming to an end, or at least changing in fundamental ways.

Perhaps none of this is new to anyone listening; I think it shouldn’t be. Something else that shouldn’t be new is the fact that almost every library is facing major budget cuts. And nothing I have read has predicted that the “good old days” of those big, fat library budgets are going to be restored any time soon. (When were those good old days anyway?)

Oh yes, there is also this little thing called Google Books that already has more texts available at the click of a button than many of our entire libraries! Much of this is publicly available now, but once the full text of the entire collection is made available (and it would be wise to assume that this will happen sooner rather than later, possibly very, very soon, like next week?), it will be very difficult to resist our patrons’ demands, and libraries will be forced into subscribing to all of this magnificent full text. Our patrons will be able to search these materials with tools that have no library input whatsoever. This too will develop in ways that are unpredictable. No one can convince me that this will *not* have tremendous effects on the use of library resources.

From all of this, it is obvious that the library community as a whole is facing serious difficulties, but we must admit that traditional cataloging and the local catalog are in absolute crisis. While catalogers are under greater stress than ever before because the numbers of resources are higher than ever, and potentially growing at exponential rates, the number of catalogers is not increasing and, in many cases, is going down. All this is occurring while their final product, i.e. the catalog record, is becoming less and less understandable to the public, who now have far more experience with full-text retrieval tools than with traditional library tools. Of course, in the march toward the future, eventually *everyone* will be a member of the “Google generation”. If the result is that even the idea of “surname [comma] forename” is being forgotten, what does that portend for the far more complex concept of authority control?

The methods used in the library catalog are based on centuries of trial and error using technologies appropriate to an earlier time. For instance, our current procedures for creating cross-references for various forms of personal and corporate names, as well as subject heading access, are based on left-anchored textual strings and are founded on how people browsed card catalogs, something most libraries have not had for 20 or 25 years or more. That is quite some time now.

I am not aware of any studies that have shown that our public *today*--not the public of 100 years ago but the public of today--actually wants what RDA is designed to give them. Remember, RDA is based on FRBR, and the purpose is to allow the user to “find, identify, select, obtain (what?) works, expressions, manifestations, items (how?) by their authors, titles, subjects”. In other words--and this is very important--RDA allows *nothing new* at all, because FRBR explicitly restates the same user needs that have been the underlying purpose of the catalog since at least the early 1840s at the British Museum under Antonio Panizzi. So, if we institute RDA and FRBR, our users will not be able to do anything--and I repeat *anything at all*--that they cannot do today. This is happening at the same time as the very nature of the resources, how the resources are found, and what people can do with them, have changed in ways that have been unpredictable, and these changes are continuing at an incredible pace.

What are the actual changes our users will see with RDA? They are the type that almost no one will even notice. For instance, I am sure almost no one will notice the spelled-out cataloging abbreviations, or dates on personal names changing from, e.g., 1943- to “born 1943”, or the elimination of N.T. and O.T. in the books of the Bible. Also, if RDA is implemented fully--and this requires even further changes than what are considered now--the display of the works, expressions, manifestations, and items *can* be different, if libraries want, and these views will probably look very similar to displays in printed book catalogs, although they will probably be somewhat more interactive. Or they can look more or less the same as today. But it is important to understand that with RDA, patrons will find no change in searching. For example, we will not institute a “methodology” access point for scientific materials, long asked for by many, or anything substantially different, because the basic purpose of RDA and FRBR is exactly the same as what exists now.

Consequently, when I compare RDA to the tremendous changes in the universe of information that I outlined earlier, I do not see how it has any relevance at all.

RDA does not address the fact that people have definite problems using our subject headings, that people almost never browse lists of names arranged alphabetically, or that full-text searching and various types of sorting, such as relevance ranking, are by far the most popular types of searching that people do, even though very few people understand what relevance ranking actually means. RDA does not address how to incorporate related, non-library metadata projects on the web and how catalogers can cooperate with other creators of metadata to get the help they so desperately need. As for higher-quality records, we all know that many libraries are not able to follow the AACR2 rules today and nothing happens to them, so why wouldn’t they just decide not to follow RDA as well?

At the same time, I ask: is it morally justified for libraries, who are facing major budget cuts, to spend significant amounts on training catalogers to learn RDA at the expense of.... what? The simple fact is there will not be new funding, so there absolutely must be tradeoffs: will there be less spending on materials and resources for our patrons? Will more staff be laid off? Will more library branches close? Will more pay raises be deferred, or will more paychecks be reduced? Our British colleagues are facing some of the most draconian budget cuts I have ever heard of. Is it in their interests to cobble together the funding for training for RDA somehow? What are they supposed to give up? Other libraries, such as my own, simply do not have the budget at all for this, period.

Therefore, we see that an unavoidable corollary of RDA implementation will be a split in the library bibliographic world at a highly inopportune moment.

No one has suggested that publishers will provide us with better quality records in RDA than they do now with AACR2. Creating RDA records will not take less time than creating AACR2 records, so it is difficult to even imagine how productivity could increase. It seems to me that sooner or later, someone must demonstrate a sound business case in favor of adopting RDA. I have yet to see one.

As a result, trying to force FRBR’s 19th century view of information onto our new information universe is like those people long ago who continued to insist, while ignoring all the evidence, that the earth is the center of the universe. And also, now that new tools such as Google and Google Books exist that allow each individual to experience this new universe of information for him or herself, and to use it in very personal ways, then to insist that FRBR is “what people need” when any individual can see it is not, is similar to those groups who fervently believe that the U.S. did not really land a man on the moon and that NASA has been involved in a tremendous hoax from that time!

It is important to keep in mind that obtaining the funds for a subscription to the entire full text of Google Books, when it becomes available, will undoubtedly be much easier than obtaining funds for RDA training. Additional funds will most probably be required, but the benefits of the full text of millions of books will be immediately clear to each and every one of our administrators as well as to our patrons. In contrast, demonstrating the advantages of RDA to administrators and patrons will be next to impossible because the business case has not yet been made.

Why does the cataloging community insist on a drastic change in its rules that will have serious backroom impacts on workflow, training and productivity, but that no one will notice in the final product? I have a few theories, one of which I mentioned in my very first podcast, where I discussed “change for change’s sake,” but on further reflection, I realized there is another possibility: the Black Box of Bruno Latour. [One of the best discussions was in an old Lingua Franca issue]

Bruno Latour is a French philosopher who has a unique, and *highly controversial*, method of studying scientists. He studies them as if they were a tribe of primitive people in the South Pacific, and concentrates not on what they produce but on how they produce it and how they relate to one another, or what he calls “science in action”. [Latour, Bruno. 1987. Science in action: how to follow scientists and engineers through society. Cambridge, Mass: Harvard University Press.] So he asks: what is this “thing called science” when it is being done? How are scientific theories made?

He describes how, while a theory is in flux, a huge amount of effort, ego, money, hope, energy, and everything else you can think of is poured into proving one’s own theory; naturally there are competing theories, and just as much effort, ego and so on is thrown into those. When one of these theories finally “wins” and becomes accepted by the general scientific community, it turns into what is essentially a “black box”: information is input at one end, and a solution comes out of the other. Everyone agrees that the black box works correctly and that whatever it produces is “correct”.

It turns out that the longer a black box is in place, the more people have invested in maintaining it. This includes companies that produce and sell scientific instruments and publish information, along with various scientific departments whose individual scientists are all focused on getting grant money and advancing their careers. Therefore, those who seek to “open” that black box do so at their own peril, because they will face many established layers of powerful vested interests.

At certain points, however, when the black box simply stops working, it must be opened. In the case of libraries, I suggest that the black box is the traditional library catalog, and it has already been open for quite some time. It was not librarians who opened it initially, but computer specialists who built their own tools, such as the plasma physics site, along with the entire open access publishing movement and even ingenious kids who built sites such as Facebook.

It is my position that since this is the situation we are facing, librarians too must--and I mean absolutely must--open the black box that has been handed down to us and protected by our predecessors since at least the days of Panizzi, if not before. We must open it for ourselves so that we can reconsider *everything* in it: its purposes, how it functions, and which parts serve the needs of our patrons. For those parts that do not serve our patrons’ needs, are they necessary for librarians, or can they be repurposed in some way? We must include in our deliberations all kinds of other groups, such as interested members of the public, scholars, and many, many others.

Latour mentions that doing this is like opening Pandora’s box: it will be messy; it will be disheartening; it will be humiliating in many ways, and yet we have no choice except to do it because the black box of the library catalog no longer functions as it should and other groups who are far more powerful and important than librarians are reconsidering matters right now without us. I am a cataloger and such an idea is very disturbing to me. Nevertheless, we must involve ourselves or risk remaining completely ignored. I believe that doing this will be a major step in the further advance of our profession.

A lot has been done already by the general information science community and while their findings should be considered, their conclusions should *not* necessarily be accepted.

I think there is a great deal the library cataloging community can do that will have far greater advantages for the public than the cosmetic changes of RDA, while being much less disruptive for us. To take only one example, we can face up to the fact that our traditional system of subject headings simply *does not work* in the online environment. But it doesn’t follow that people do not want the *control* that subject headings allow, or that the headings should therefore be abandoned. That would be an incorrect conclusion. In fact, this is one of those areas where the public has already opened the black box and come up with something called “The Semantic Web”, which in essence seeks to provide many of the same controls as our traditional subject headings and authority control. An example of such a project is DBpedia. See also the project [], which attempts something similar to what librarians have always done through authority control. These projects are far from perfect, and a skilled cataloger who spends just a few moments skimming over some of them will see how much help they need.

This is merely one area where catalogers could make important contributions to a huge, collaborative project that others can readily see and perhaps even come to appreciate, certainly far more than they appreciate our typing out a few abbreviations in local catalogs.

Above all, our creaky old MARC format needs to change into something more modern, and our records need to be liberated from our local catalogs so that they can make their own way in the world outside library catalogs and be reused in all kinds of ways by the public, while still retaining their ties to the library world through linked data.

As I mentioned before, many of these suggestions make me highly uncomfortable. I am sure they will make many other catalogers uncomfortable as well, along with the organizations they work for, but I feel something like this is imperative.

I think an anecdote from my own family history may be appropriate. This comes to me second hand, by way of my father. He told me a story that he had been told about his great-grandfather, my great-great-grandfather, Joel Akers, who passed away before my father was born. Here is a picture of him and his family.

This took place in a little farming town in Kansas, and my father told me how the townspeople told him that “Grandpa Akers” absolutely hated the new automobiles. People had fun remembering that whenever a car drove into town, Grandpa Akers would hobble out into the street, stamp his feet, shake his cane, and cuss and yell the whole time the car was in sight. Folks compared him to a banty rooster. Of course, all of his anger and threats didn’t stop anything, but they gave others something to laugh about.

Although I would have liked to meet my Grandpa Akers, I confess that I don’t want to be like him. There is no use fighting these kinds of changes, because it is wholly unrealistic to imagine that they will vanish and that some earlier time you happen to prefer will return. The fact is: this new world is not going away, and once this revelation is genuinely accepted, the task becomes very simple: Darwinian survival. How do we survive in such a future?

If I followed my instincts, I could let my “Grandpa Akers side” come out. I could cuss everybody and yell out: “I love books! They aren’t going away! Look how many are being published right now! These website things are crazy since anybody can put any blamed thing out there they want! And since you can’t believe what you see there, any fool who believes in those things is crazy too! Aardvark? What kind of a stupid name is that? And it links me up to some idiot out there who I don’t know, but he’s supposed to answer my questions?! What is this insanity? MARC format was good enough for my pappy, so it’s good enough for me!” While I yell this at the top of my voice, I can stamp my feet and shake my fist at anybody who is unlucky enough to come anywhere near me.

Yet, if I actually did this, what would happen today as opposed to the year 1900? Of course, there would be a very good chance that someone around me would have a cell phone. They could record my outburst and upload it to YouTube. A video like that could easily go viral, and I could become just as well known as the Dramatic Chipmunk, the Star Wars Kid, or the Dancing Baby, with people laughing at me not just in the same town, but all over the world, and for a long time to come!
[The Dramatic Chipmunk; the Dancing Baby. Yes, the Dramatic Chipmunk actually did meet the Star Wars Kid.]

If I then saw so many videos of myself, accompanied by homemade voice-overs and sound effects, each varying in its level of hilarity or obscenity, I just might learn something very special about myself.

Catalogers need a new attitude. We can see this attitude in the example where the Library of Congress finally released the subject headings in a format I did not know: SKOS (Simple Knowledge Organization System). I applauded, and still applaud, this project, because making the subject headings generally available has been overdue for many years. It is a great learning project, but as I learned to my own dismay, for several reasons the subfield codes could not be transferred into SKOS. As a result, a subject heading there is the entire text string, with each subdivision separated only by double dashes! While I realize this is only a beginning, and I certainly hope it will be developed further, I find it ironic that the way the subject headings have been implemented there is essentially a replica of the catalog card itself in pre-MARC form! Still, what is so bad about that?
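To make the point about lost subfield codes concrete, here is a minimal Python sketch. It assumes the horse heading discussed below is coded in MARC 21 style ($a main heading, $x general subdivision, $z geographic subdivision, $v form subdivision) and shows that flattening it into a double-dash string keeps the words but throws away what kind of subdivision each word is. The `Subfield` class and the function names are hypothetical, for illustration only; this is not how the Library of Congress or any SKOS tool actually processes the data.

```python
from dataclasses import dataclass

# MARC 21 subject-field subfield codes used in precoordinated headings.
SUBFIELD_MEANING = {
    "a": "main heading",
    "x": "general subdivision",
    "y": "chronological subdivision",
    "z": "geographic subdivision",
    "v": "form subdivision",
}


@dataclass
class Subfield:
    """One coded part of a heading (hypothetical illustration class)."""
    code: str
    value: str


def flatten(subfields):
    """Join the values with double dashes, as a heading appears on a catalog card."""
    return "--".join(sf.value for sf in subfields)


def split_flat(heading):
    """Split a flat heading back into its parts; the subfield codes are gone."""
    return heading.split("--")


# Assumed coding: 650 $a Horses $x Behavior $z United States $v Anecdotes
structured = [
    Subfield("a", "Horses"),
    Subfield("x", "Behavior"),
    Subfield("z", "United States"),
    Subfield("v", "Anecdotes"),
]

flat = flatten(structured)
print(flat)              # Horses--Behavior--United States--Anecdotes
print(split_flat(flat))  # ['Horses', 'Behavior', 'United States', 'Anecdotes']

# With the codes kept, each part still says what kind of subdivision it is...
for sf in structured:
    print(f"${sf.code} {sf.value}: {SUBFIELD_MEANING[sf.code]}")

# ...and a program can ask questions the flat string cannot answer reliably,
# such as "where is this heading limited to a place?"
places = [sf.value for sf in structured if sf.code == "z"]
print(places)  # ['United States']
```

From the flat string alone, a program cannot tell that "United States" is a place and "Anecdotes" is a form; it would have to guess, which is exactly the information the subfield codes used to carry.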

The power of the system of subject headings lay not only in the authorized forms but, even more important, I think, in the subdivisions that refine the main topic in various ways. Of course, there were always tremendous problems with library subject headings, but the resulting benefits were enormous and could easily be seen by everyone who knew how they worked. Still, people faced the problem of finding the authorized form of the main heading, which necessitated (and still requires) a whole slew of cross-references. But this was only part of the problem: to use the subject headings effectively, it was essential to get an *overview of the subdivisions* used under that heading, because when you did, you discovered how the system of subdivisions opened your mind to possibilities you could not have suspected before. For instance, someone interested in horses could find by browsing
“Horses--Behavior--United States--Anecdotes”
or someone interested in Dr. Johnson could find
“Johnson, Samuel, 1709-1784--Knowledge--Manners and customs”

In other words, *when used correctly*, the system of subject headings not only helped you find what was in the collection, but also revealed new ideas you would never have thought of, much less actively searched for, on your own.
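The overview described above, with every subdivision gathered under its main heading, is easy to sketch once headings are available as data. This minimal Python example assumes headings are plain double-dash strings; the function name is hypothetical, and apart from the two headings quoted above, the sample list is an illustration, not an extract from LCSH or any real catalog. It does not handle cross-references or authority lookups.

```python
from collections import defaultdict


def subdivision_overview(headings):
    """Group headings by main heading and list the subdivision strings under each.

    Headings without any subdivision are skipped, since there is nothing to browse.
    """
    overview = defaultdict(set)
    for heading in headings:
        main, *subdivisions = heading.split("--")
        if subdivisions:
            overview[main].add("--".join(subdivisions))
    # Sort main headings and their subdivisions for a browsable display.
    return {main: sorted(subs) for main, subs in sorted(overview.items())}


headings = [
    "Horses--Behavior--United States--Anecdotes",
    "Horses--Behavior",
    "Horses--Breeding",
    "Johnson, Samuel, 1709-1784--Knowledge--Manners and customs",
    "Horses",
]

for main, subs in subdivision_overview(headings).items():
    print(main)
    for sub in subs:
        print("   --" + sub)
# Horses
#    --Behavior
#    --Behavior--United States--Anecdotes
#    --Breeding
# Johnson, Samuel, 1709-1784
#    --Knowledge--Manners and customs
```

Grouping on the first "--" segment is the simplest possible choice; a real display would also need the subfield codes, since the same word can act as a topical, geographic, or form subdivision.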

In one episode of James Burke’s excellent documentary series “The Day the Universe Changed”, he described the development of catalogs and indexes. He showed how indexing brought disparate bits of information together in novel ways, and how this helped people think. He concluded that indexing achieved “1+1=3”. I cannot think of a better way to describe it.

However, it was too difficult to get such an overview by examining hundreds or thousands of cards, so you had to consult the LCSH red books separately to get a coherent overview of the subject heading structure, which, we must admit, very few people did; this method also had its own problems. Transferring such a complex system into online library catalogs has been a complete disaster, leading to general incomprehension among the public of a tool that is potentially incredibly useful.

We need to spend our time making this “1+1=3” system function once again, using today’s tools and for today’s populace. Since it has failed in our online catalogs, we need other options. It is absolutely vital to retain subject subdivisions. SKOS doesn’t allow it. OK, use something else. If nothing out there works and we have to create it ourselves from scratch, that is just fine; we should do it. We *must* do it, even though it may not be “perfect”. I compare this to the Ferrari racing team: if the team decided it needed something and their mechanic said, “Well, I can’t find anything like that in the car parts store,” he would be fired on the spot. That is not how you win Formula 1 races. You create the conditions for your own success yourselves, and the Ferrari team knows this very well. Catalogers, too, need to adopt this kind of attitude in their work. Otherwise, they will come in dead last.

To calm our minds, we can keep telling ourselves that *we* didn’t open that box, others did, but it’s open now. All we can do is the same thing Pandora did.

How do we do this? I would like to close with a quote from Latour:
“Now that [the black box] has been opened, with plagues and curses, sins and ills whirling around, there is only one thing to do, and that is to go even deeper, all the way down into the almost-empty box, in order to retrieve what, according to the venerable legend, has been left at the bottom–yes, hope. It is much too deep for me on my own; are you willing to help me reach it? May I give you a hand?”

Thank you very much for your attention. It really is a great time to be a librarian.

Consider joining the Cooperative Cataloging Rules Wiki!