Saturday, July 30, 2011

Re: Need for change (Was: dates)

Posting to NGC4LIB

On 30/07/2011 14:46, Joe Hourcle wrote:
<snip>
On Jul 29, 2011, at 7:48 PM, john g marr wrote:
This image also characterized around half of the electorate (pretty important people who run all of our lives and budgets), who simply do not understand the issues or the complexities of political rhetoric beyond sound-bites.
Um ... electorate == people who can vote.
Most of 'em don't vote, and most of them don't consider library funding to be the main issue when electing delegates.
</snip>
While I certainly have my own political viewpoints, in these kinds of discussions I feel that political statements are more divisive than anything else. In the case of libraries and their future, it goes beyond politics: be they free-market capitalists or state-planning communists, anarchists or monarchists, all would seem to have an equal need and desire for reliable information that they can access easily and quickly. As Joe points out, it is a part of public education and community development, which is in all people's interest along the entire range of political beliefs.

Attached to this, however, is the need to build tools that people want and need. This means finding out what those users want and need so that you can build those tools. People have never liked to be confronted with "this is the product/service/whatever that we make, and you have to use it or do without." The only time this "works" is when people are dealing with a monopoly, since their only choice is literally to do without. Libraries had a "monopoly" for a long time, but that monopoly went away.

It wasn't all because of the growth of the World Wide Web. Maybe I'm a little slow, but a movie came out only a few years ago that had a big impact on me. I remember seeing "Night at the Museum", where the exhibits come to life in the Natural History Museum at night and our hero, the night watchman, wants to learn more about those dinosaurs, statues, and other things. So where does he go? To Barnes & Noble! I almost fell out of my seat, since in my naivety I thought he would go to the library! I realized I was dating myself. Then I remembered how the Barnes & Noble stores had always been so full, and that libraries have been trying to become more like Barnes & Noble-type bookstores for a long time now. Times were changing a lot back in the 1990s, and now the situation is changing again as the big chains have problems. When the World Wide Web showed up, it was a one-two punch that has staggered the entire field of librarianship.

So, I don't think it's so much a matter of convincing people that they need libraries as it is of building tools that will provide information *they clearly need* and that they can use easily. Maybe people don't know about the services libraries offer and need additional information, but you can't--and shouldn't try to--convince people that they need libraries when they believe that they don't; it's insulting. Far more productive would be to show people that we are changing: building tools that will help them find at least some things more easily, and demonstrating that, if we had the resources, we could do so much more. That is an argument people may find convincing, not only in municipal and state governments, but in foundations, companies and other organizations.

By far, the main tool that a library has is its library catalog, which is currently based on old thinking and just not very relevant to many people's needs today. It was built and designed in another time, for another world. Realize I am talking about the *catalog itself*--how it works and what it assumes--NOT the individual catalog records, which are incredibly under-utilized. As a result, it is unfortunately very difficult to demonstrate how the library is changing.

It's a tough time.

Friday, July 29, 2011

Need for change (Was: dates)

Posting to NGC4LIB

On 29/07/2011 01:52, Alexander Johannesen wrote:
<snip>
You need to do something big and drastic, and all you #$&^%#$*& care about are the minutiae of AACR2, the stupidity of RDA, the generally incomprehensible FRBR, and all the warts of MARC. At no point have you said, hey, how about we come up with a new and crazy way to create reliable metadata? Or how to deal with digital resources? Or what does access mean in a digital economy? What does it mean to fund a library in a global information market-space?
</snip>
and
<snip>
You all got library degrees. Are they relevant to what you want to do?
</snip>
Of course, I agree. The situation for libraries is truly serious, and all trends point to it getting worse. It reminds me of a news clip I saw several years ago, when a French village was being flooded. The video showed a huge wave rolling through the village, washing away cars and the smaller buildings, but on top of one building (whose foundations looked as if they were eroding away) was an old woman sweeping and sweeping her patio! Obviously she was terrified, but had no idea what to do except--sweep her patio.

Librarians have to show that they are relevant and they have to do it soon since the economic/information environment is changing very, very quickly. Our relevance is no longer taken as a given. In this regard, I heard this fellow speak at a conference about "Perfecting the Irrelevant" and his example of the Smith Corona typewriter impressed me. Here is the story and I suggest everyone read and think about it:
http://waynehodgins.typepad.com/ontarget/files/perfecting_the_irrelevant.pdf

To me, so much of what the current library initiatives have in mind is just different variations of this "perfecting the irrelevant." We *must* discover what is relevant to society today and fit ourselves in, and not assume--or rather convince ourselves, because we won't convince anybody else--that the solution to our societal relevance is to provide our users with the FRBR user tasks. Again, where is the evidence for such an outrageous conclusion? I still haven't seen it, although I have seen all kinds of evidence for other powers and abilities that users want and expect. Therefore, FRBR and RDA are fabulous examples of perfecting the irrelevant. Even if we could wave our magic wands like Harry Potter and somehow get WEMI to work right now, today, does anybody really, seriously believe that it would make any substantial difference to our patrons--that they would exclaim, "Yes! Yes! This is what I've always been missing!"? Why would they? And when we consider the so-called tiny "changes" of RDA--how little difference they will make to any of our users, and the significant labor and costs they mean for us--that image of the old French woman furiously sweeping her patio pops into my mind again.

Figuring out how librarians, and what they make, are relevant to today's society--as I firmly believe we can do, since we actually provide services found nowhere else--will not be a simple task, and it will be a humbling experience, I am sure. Many of our most cherished beliefs will be shown to be false, but our field will be stronger for it. What should the library catalog do today? What should reference services be? What does selection mean today? Asking these questions seriously will inevitably lead to painful answers, but it is necessary. The only way to find out a lot of this is to do as Alex says and experiment.

But libraries are bureaucratic, and it is very difficult to justify experiments and development, because experiment and development assume the possibility of failure. An idea may not work out and therefore "be a waste", which is tough to justify, especially today. Of course, just as nothing is a "100% success", nothing is a "100% failure", and all attempts are steps along the same road. False paths are just as valuable as true ones, so long as others know about the false path; otherwise they will make the same mistakes.

Still, this is a very disheartening time for libraries. On the bright side, once we do show that we are relevant and vital to our societies, as I am sure we will, the funding will come. Of that, I have no doubt.

Re: Repeat request for review of W3C LLD Report

Posting to NGC4LIB

On 17/07/2011 18:13, Karen Coyle wrote:
<snip>
we have gotten very few comments on the W3C LLD report [1], yet there are very clearly things that merit discussion in that report. I would really hate for the report to be finalized "unexamined" by a wider audience.
</snip>
I have done some thinking about the report and I am not sure how to include these comments on the site, so at least I'll do it here.

First, I very much appreciate this report and agree that linked data could be one of the main solutions for libraries, yet I think the report itself lacks a certain reality. One of the main problems that I see here is the apparent assumption that the "library community" really is some kind of a single entity, when actually, it is not and has never been a single community. There are huge differences in the skills, labor and system needs among different kinds of librarians: from selectors to acquisitions librarians, to catalogers, to reference librarians, to special collections, to conservation; from librarians in public libraries vs. those in academic libraries vs. special libraries; from librarians in big libraries vs. smaller libraries vs. very small libraries; RDA libraries vs. non-RDA libraries (perhaps); U.S. libraries vs. European vs. the rest of the world, and so on. In my opinion, the idea that this incredible number of communities--which often do not talk with one another and, when they do, rarely understand one another very well--can constitute a "single community" simply beggars the imagination. This single "library community" exists only in utopian dreams.

Strangely enough, it is probable that the librarians who have the best understanding of the entirety of the library field are those in the smaller libraries, because there are fewer opportunities (some may prefer substituting "fewer pressures") for the specialization that occurs in the largest collections. Considering all of these communities to be the same is a mistake similar to FRBR's lumping of all users together ("user tasks").

Another point I would like to make becomes clearer from a short understanding of some history. In the beginning of computerization, there were no off-the-shelf library systems, and consequently libraries had no choice except to try to create their own catalogs. Some of these later became for-profit companies such as NOTIS, but even though some of the local efforts were good and had energetic people behind them, far more often the difficulty and expense to libraries that created their own systems slowly forced them to abandon their own efforts and buy ready-made systems--in other words, libraries have been seriously burned when they have tried to do their own development.

It's true that the open-source, cooperative development going on today is a totally different story and it has been proven to succeed, but memories are very long in librarianship, and even a lot of the younger librarians have heard the highly painful recollections of those earlier times. As a result, many are extremely reluctant to begin such a terrible process again. In any case, out of the aggregate, very few libraries have ever been able to afford any kind of development, and the vast majority have been more or less forced to rely on others. Buying a ready-built ILMS was also always a good solution for bureaucratic reasons: you could say, "if it's good enough for LC [or Columbia, or the Bibliothèque nationale, etc.], it's good enough for us!"

Libraries are now expected to be creative and innovative in areas of cataloging rules and system development--a rather new mentality, and without any tangible benefits as yet. Of course, staffing levels will remain flat at best for a long time to come, and catalogers are already overwhelmed. Therefore, libraries present a very difficult environment in which to expect a great deal of development.

I suspect that the locale for genuine, open source library system development may--strangely enough--wind up taking place primarily in the smaller libraries, who will be forced by budget crunches to give up their expensive ILMSs and to give open source catalogs a genuine chance. Once they see the many advantages, development may really take off.

Finally, the problem of rights becomes incredibly important in a linked data environment, but I see it somewhat differently than what I read here. Linked data is, in traditional library terms, a disassembled record where information can come in from anywhere. As a result, if an essential part of the record that is to be finally assembled disappears, the entire structure becomes useless. Therefore, whoever controls those essential parts will have a certain amount of power and it will be vital to determine the rights for those parts of the data. To put this in concrete terms: in a WEMI/linked data environment, the (W)orks and (E)xpressions parts absolutely must be available, since otherwise, everybody's (M)anifestation and (I)tem parts would become useless and could be held hostage to monopolistic practices: "Pay me x amount of money or I will shut off access to the Work and Expression records." The Google Book project already has taken library materials, digitized them, and if matters work out (perhaps), they could end up charging the library community to access its own materials. Something similar could happen with linked data.
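The dependency described above can be sketched as a toy model. Everything here--the identifiers, the link names, the structure--is invented for illustration; real linked data would use RDF vocabularies. But the dependency is the same: if an essential upstream node disappears, every record assembled from it falls apart.

```python
# Toy sketch of a "disassembled" WEMI record: each level is a separate
# node that links upward, as it would in a linked-data environment.
# All identifiers and field names here are invented for illustration.

nodes = {
    "W1": {"type": "Work", "title": "Epic of Gilgamesh"},
    "E1": {"type": "Expression", "language": "English", "realizes": "W1"},
    "M1": {"type": "Manifestation", "publisher": "Example Press", "embodies": "E1"},
    "I1": {"type": "Item", "barcode": "0001", "exemplifies": "M1"},
}

LINK_KEYS = ("realizes", "embodies", "exemplifies")

def assemble(node_id, graph):
    """Reassemble a full record by following the links upward.
    Raises KeyError if any essential node has disappeared."""
    record = {}
    while node_id is not None:
        node = graph[node_id]  # KeyError here = record held hostage
        record.update({k: v for k, v in node.items() if k not in LINK_KEYS})
        node_id = next((node[k] for k in LINK_KEYS if k in node), None)
    return record

full = assemble("I1", nodes)  # works while every node is available
del nodes["W1"]               # the Work node is withdrawn...
try:
    assemble("I1", nodes)
except KeyError:
    print("Item record is now useless: its Work has disappeared")
```

The Manifestation and Item data are still sitting there untouched, yet they can no longer be assembled into a usable record--which is exactly the leverage a monopolist over the Work and Expression layers would hold.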

Creating a linked data system for WEMI-type records will be expensive and if it demands payment, it could easily morph into a "linked data rupture" between those who can pay and those who cannot. Those who cannot or will not pay must understand their rights because once in, it may be extremely difficult to opt out.

Wednesday, July 27, 2011

Re: dates

Posting to NGC4LIB

On 27/07/2011 02:50, Karen Coyle wrote:
<snip>
Quoting "Beacom, Matthew" <matthew.beacom@YALE.EDU>:
FRBR work and expression records could go pretty far to giving something like you are asking for, but it is not a sure thing that they would have to include dates to unambiguously identify a particular work. The date would be a valuable but not a necessary piece of information. To justify the need for the date of first creation (first conception would not be measurable in any practical way), a new generation catalog would need to be specifically defined and understood by users as a tool that would do much more than current library catalogs do.
I feel like library cataloging has become so focused on the OBJECT that is being cataloged that we almost forget that there is CONTENT in the object, and that the point of the object is to convey that content. I think libraries should be less focused on the object and more active in helping users learn about the content. (When was the last time we had a long discussion about subject analysis or classification?) I don't care if we call it cataloging or subject access or bibliography, just as long as we do it.
</snip>
In cataloging terminology, this is called "content" vs. "carrier". For all kinds of reasons, the vast majority of which are practical, libraries have almost always concentrated on "carrier". This distinction and the reasons for it have been misunderstood by many from the beginning, and I am doing some research on some of the earliest debates about this, now that I can, through the magic powers of the Internet!

If catalogers had an unlimited amount of time to do their work, they could research all kinds of things, but the fact is, catalogers don't even have the time to do what they are assigned to do now and their numbers can hardly be expected to increase anytime soon, if ever. Thus, there are and will continue to be tradeoffs. For example, would you rather that a cataloger spent his or her time researching the original dates of the "works" and "expressions", or that they focus on cataloging new materials? What would the majority of library patrons prefer?

The library catalog that we have inherited is designed to perform in certain ways. We can go a long way toward enhancing those traditional "objectives" as Cutter termed them. But to pile on additional objectives makes the entire thing fall apart. As I wrote in my previous post, *if* we could crowdsource these additional objectives (another version of outsourcing) and put together different databases, e.g. dbpedia for Gilgamesh http://dbpedia.org/page/Epic_of_Gilgamesh, this is something that might be realistically achieved. To do this though, Task No. 1 is to start using XML in some form. Then, all kinds of developers can begin to build tools that will exploit our databases, and we could exploit theirs.
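As a minimal illustration of what "using XML in some form" buys us, here is a sketch using only Python's standard library. The element names are simplified inventions, not real MARCXML (which uses <record>/<datafield>/<subfield>), but the point stands: once a record is serialized as XML, any outside developer can parse and exploit it.

```python
# A minimal catalog record serialized as XML with the standard library.
# Element names are simplified for illustration, not real MARCXML.
import xml.etree.ElementTree as ET

record = ET.Element("record")
ET.SubElement(record, "title").text = "Epic of Gilgamesh"
ET.SubElement(record, "date").text = "2001"
# A link out to another database, e.g. dbpedia, for crowdsourced data:
ET.SubElement(record, "seeAlso").text = "http://dbpedia.org/page/Epic_of_Gilgamesh"

xml_string = ET.tostring(record, encoding="unicode")
print(xml_string)

# Any outside developer can now parse the record and build on it:
parsed = ET.fromstring(xml_string)
print(parsed.findtext("title"))  # Epic of Gilgamesh
```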

To expect catalogers to spend even more of their precious time to figure out the dates of works, expressions, as well as the manifestations, while catalogers watch their numbers remain static or diminish, and their administrators watch the backlogs grow, is simply unrealistic.

We should be focusing on what we can do NOW, with what exists out there NOW. There are so many rich resources out there just waiting to be tapped. They could begin to be exploited NOW. I repeat that this should be the job of the next generation catalog. This would be much better than waiting for something that might appear in ten or fifteen years... maybe. And most probably not ever.

Re: same edition?

Posting to Autocat

On 26/07/2011 17:52, J. McRee Elrod wrote:
<snip>
John Marr said:

.. and the publisher Ingram is not the same as the publisher Berg ...
We would consider Ingram a manufacturer, not a publisher. They reproduced the larger part of an edition published by Berg. Thus we would code Berg as 260$b, and Ingram as 260$f. Both AACR2 and MARC21 have elegant solutions to such situations, if we would but take advantage of them.

James' convoluted choice can be avoided. Both/and is possible.
</snip>
Saved by the 260$f!

Yes, that's true, but when the *only* difference from one item to another is the number of pages, everyone is faced with the convoluted choices. Plus, when considering paging in an international, multi-metadata-community environment, it is even more convoluted. Here is another attempt at self-promotion (I am completely shameless in these things): my chapter in "Conversations with catalogers in the 21st century" edited by Elaine Sanchez (yay!), deals with this, and it is available *for free* in the E-LIS open archive at http://hdl.handle.net/10760/15838

The scans are readable, but I had never used the scanner, so everybody who downloads the article, gets to see my fingers at the bottom!

Re: [NGC4LIB] dates

Posting to NGC4LIB

On 26/07/2011 17:37, Laval Hunsucker wrote:
<snip>
Yes, _dates_ !

Interesting issue. The kind of thing you mention can indeed be annoying. Perhaps even more so in the case of something like, say, the Epic of Gilgamesh, 2001 :-).

But *which*, and how many, dates would or should a catalog record give ? And how ?
</snip>
FRBR provides date attributes for the work, expression and manifestation, but strangely, not for the item--something I am sure that makes sense somehow but the reasoning has always escaped me. It seems that if there is anything you really could provide a date for, it should be for the physical item you can hold in your hand. But... ?

I've always thought that traditional cataloging and MARC were relatively poor on dates, since there are lots of possible dates for metadata, including the effective date of research (something may be published in 2011 although the actual work on the resource finished in 2008 or 2009).

A lot depends on what you want the catalog to do. Currently, it is designed along Cutter's guidelines (from 1876!) http://library.music.indiana.edu/tech_s/manuals/training/catalog/cutter.html. Objective 3H was always sort of lost in the discussion, but the catalog certainly is designed to do everything else there.

When we add more "objectives" onto this list, the whole edifice begins to groan. For instance, a question such as "What do you have by 19th-century women authors from Holland?" (a more realistic question from a patron than others I have read) cannot be answered by the traditional catalog, since it is not designed to do so. The best we could do would be to suggest that people browse the shelves for 19th-century Dutch literature, looking for female names, but browsing has its own problems, and this would not be using the catalog but the arrangement of books. The absolute best would be *if* you could find a reference work that lists women authors from 19th-century Holland and then search each one from the list. In other words, suggest that the users do lots of work.

That is the traditional answer, but today it is possible for different databases to interoperate, so that a database of authors, limited to Holland, 19th century, female, could work in conjunction with our catalogs, or another database that may have the dates of specific works, such as Gilgamesh or Homer.

Catalogers no longer have to do everything from scratch--their systems can work with all kinds of other projects out there. This is what a next-generation library catalog should do, and, I think that if a database does not already exist, there would be many people from the scholarly community and/or the general citizenry who would be very happy to help create these kinds of databases.
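A toy sketch of what such interoperation might look like: a (hypothetical) external authors database answers the "19th-century women authors from Holland" question, and the results drive queries against the catalog. All of the data and names here are invented for illustration.

```python
# Two databases working together: an external authors database and the
# library catalog. Neither alone can answer the patron's question.
# All data here is invented.

authors_db = [
    {"name": "Author A", "country": "Netherlands", "born": 1820, "gender": "female"},
    {"name": "Author B", "country": "Netherlands", "born": 1950, "gender": "female"},
    {"name": "Author C", "country": "France",      "born": 1830, "gender": "female"},
]

catalog = {
    "Author A": ["Title One", "Title Two"],
    "Author C": ["Title Three"],
}

def dutch_women_19th_century(db):
    """Filter the external database: Holland, 19th century, female."""
    return [a["name"] for a in db
            if a["country"] == "Netherlands"
            and 1800 <= a["born"] < 1900
            and a["gender"] == "female"]

# Cross-reference the filtered author list against our own holdings:
holdings = {name: catalog.get(name, [])
            for name in dutch_women_19th_century(authors_db)}
print(holdings)  # {'Author A': ['Title One', 'Title Two']}
```

The catalog contributes only what it was designed to hold; the attributes it was never designed to record (nationality, century, gender) come from the other database. That division of labor is the whole idea.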

Tuesday, July 26, 2011

Re: Some thoughts on books about Wikileaks

Posting to Autocat

On 26/07/2011 10:31, Moore, Richard wrote:
<snip>
"Extremism" is one of those things for which an objective definition is elusive; however, I think the LCSH "Extremist Web sites" is appropriate for works that purport to be about sites the author considers extreme.

Cf. the LCSH "Demoniac possession" and "Demonomania". A rational cataloguer might consider that a work purporting to deal with the former was in fact concerned with the latter, but it's the author's intended subject that should count.
</snip>
The idea that a resource should be presented as the author intended it is correct. I saw this a lot when I would have to assign some Soviet books the subject heading "Anti-Soviet propaganda" when they were clearly Anti-American propaganda, but they presented themselves as Anti-Soviet propaganda. Otherwise I would have been giving my own interpretation.

But we still make certain exceptions. Here is an example I found at random in Worldcat. As a deference to others, I prefer not to include the title of the book here but only give the link: http://www.worldcat.org/oclc/68694413

The subjects assigned:
Television personalities -- United States -- Biography.
Radio personalities -- United States -- Biography.
Television talk shows -- United States.
Radio talk shows -- United States.
Conservatism -- United States.
Hate -- Political aspects -- United States.
Bullying -- Political aspects -- United States.
Propaganda, American.
Mass media -- Political aspects -- United States.
United States -- Politics and government -- 1989-
I don't know if the author would agree that this covers his intended subject, and I cannot tell if the cataloger is conservative, centrist or leftist.

Well done!

Re: same edition?

Posting to Autocat

On 26/07/2011 01:29, Jamie King wrote:
<snip>
I agree with Mac and Mark. From OCLC's When to input a new record: "Specific differences in the extent of item (other than those noted) justify a new record." (A large chunk of plates missing isn't one of the exceptions.) http://www.oclc.org/bibformats/en/input/#CHDJFJHA
Mark Ehlert wrote: I'd vote for new record, judging from the description of the Berg original and your Ingram copy. The 32 pages of plates make up a significant part of the book's content at about 13% of the whole. And the text may make reference to the (non-existent in Ingram's run) plate illustrations.
Dawn Loomis wrote: I have an Ingram print on demand of a publication: Visibly Muslim. ISBN:9781845204327 OCLC: 460711288. This printing does not have the pages of plates indicated in the bib. record. Do I need to create a new record or can I use the above mentioned record.?
</snip>
Well, I just figure that this is why catalogers are paid the BIG money! :-)

In reality, there are a lot of possibilities here, depending on whose guidelines you are following. In LCRI 2.5B9 (formerly 2.5B10) http://sites.google.com/site/opencatalogingrules/2-5b9--leaves-or-pages-of-plates, we are told that, "If the leaves or pages of plates are unnumbered, give the number only when the plates clearly represent an important feature of the book. Otherwise, generally do not count unnumbered leaves or pages of plates." (Before this RI in 1991, we always counted unnumbered plates.) Also, in LCRI 1.0 http://sites.google.com/site/opencatalogingrules/aacr2-chapter-1/1-0--decisions-before-cataloging---rev#TOC-Edition-or-Copy-of-Monograph "Decisions before cataloging --> Edition or Copy of Monograph", we are told that there is a new edition whenever "anything in the following areas or elements of areas differs from one bibliographic record to another: title and statement of responsibility area, edition area, the extent statement of the physical description area, and series area."


Obviously, this becomes tricky when unnumbered plates exist in a book, and whether or not the cataloger considers them "important".

Now, when we contrast this to ALA's "Differences between, changes within" http://www.ala.org/ala/mgrps/divs/alcts/resources/org/cat/differences07.pdf, (pdf p. 12), we see A5a: "A different extent of item, including the specific material designation, indicating a significant difference in extent or in the nature of the resource is MAJOR. Minor variations due to bracketed or estimated information are MINOR. Variation or presence vs. absence of preliminary paging is MINOR. Use of an equivalent conventional term vs. a specific material designation is MINOR."

Finally, the OCLC guidelines are apparently based mainly on the ALA guidelines.

Based on all of this, what can we conclude? A lot depends on what you believe about the original cataloger: did he or she think that the unnumbered leaves were "clearly an important feature of the book" or not? Did the original cataloger not really care and want to go home early, or was he or she overwhelmed with work, or did he or she consult with others and only then decide? Since any of this is obviously impossible to know, what we are seeing here is an illustration of the complexities of leaving matters to "cataloger's judgment", which at first blush seems to make matters easier while saving time. While it may be true that it saves the time of the original cataloger, such a policy can demand a lot more time from *other catalogers* out there who have to interpret that judgment. There is exactly the same problem with the ALA/OCLC guideline on paging:
"A different extent of item, including the specific material designation, indicating a significant difference in extent or in the nature of the resource is MAJOR.
351 p. vs. 353 p. is MINOR"

This is in direct conflict with the LCRI, which says "anything in ... the extent statement of the physical description area...", while many scholars I know would consider the difference between 351 p. and 353 p. to be very MAJOR indeed.
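The disagreement can be made concrete in a small sketch. The two functions paraphrase my reading of the guidelines discussed above; they are not official text, and the function names are my own inventions.

```python
# A sketch of how the two sets of guidelines disagree about the very
# same paging difference. Rules are paraphrased, not quoted verbatim.

def lcri_new_record(old_extent, new_extent):
    """LCRI 1.0 (paraphrased): ANY difference in the extent statement
    means a different edition, and therefore a new record."""
    return old_extent != new_extent

def ala_oclc_new_record(old_extent, new_extent, significant=False):
    """ALA/OCLC (paraphrased): only a SIGNIFICANT difference in extent
    is MAJOR; small paging variations are MINOR. Whether a difference
    is 'significant' is left to cataloger's judgment, which is exactly
    the problem: the next cataloger cannot reconstruct that judgment."""
    return old_extent != new_extent and significant

print(lcri_new_record("351 p.", "353 p."))      # True:  make a new record
print(ala_oclc_new_record("351 p.", "353 p."))  # False: reuse the record
```

Same item, same rules-in-force, opposite answers--and under ALA/OCLC the answer flips again depending on an unknowable judgment call by the original cataloger.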

I discussed this problem of determining "manifestations" in some more detail in my second podcast of my personal journey with FRBR. (I had to throw that in!)

As I stated before somewhere, as a *cataloger* I personally don't care which rules are chosen, just so long as everybody follows the same rules. As a *researcher*, however, I consider the paging/extent of a physical resource to be very important, and I cannot see the horrible difficulty of making a new record and changing the 351 p. to 353 p., so that the record at least tells the truth about the item everyone is working with.

Monday, July 25, 2011

Some thoughts on books about Wikileaks

Posting to Autocat

"How WikiLeaks Books Came to Be Liberated & No Longer Categorized Under 'Extremist Websites'"
http://dissenter.firedoglake.com/2011/07/23/library-of-congress-no-longer-classifies-wikileaks-books-under-extremist-websites/
"The Library of Congress (LOC) and the National Library of Australia (NLA) have, in the past week, reviewed their categorization for WikiLeaks books that were on file. A bottom-up movement of WikiLeaks supporters and writers on Twitter going back and forth on how WikiLeaks books were being categorized led the LOC and NLA to mount this review. And, reviews by the LOC and NLA led to a change in categorization, meaning no longer will WikiLeaks books be categorized under the subject header "Extremist Websites.""
[by the way, it is "web sites"]

I suggest that all catalogers read this very interesting article, but even more interesting is the authority record:
150 __ |a Extremist Web sites
550 __ |w g |a Web sites
670 __ |a Work cat.: Hate on the net : extremist sites, neo-fascism on-line, electronic jihad, 2008.
670 __ |a Oxford handbook of Internet psychology, 2007: |b p. 191 (extremist websites; review of 150 extremist websites revealed large percentage had links to similar sites)
670 __ |a Community in the digital age, 2004: |b p. 191 (extremist websites; HBO documentary "Hate on the Internet" provided number of disturbing examples of how extremist websites influenced disaffected youth to commit hate crimes)
670 __ |a Counterterrorism, 2009: |b p. 152 (increasingly important role played by Internet and extremist websites in radicalizing immigrants, citizens in Western countries)
670 __ |a Internet Watch Foundation WWW site, Mar. 30, 2009 |b (extremist websites; extremist web sites)
952 __ |a 0 bib. record(s) to be changed
952 __ |a LC pattern: Government Web sites
Even after reading this, I still don't really understand what an "extremist web site" is. I could understand "Neo-fascist web sites" (I am purposely ignoring capitalization) or "Hate--Computer network resources". Additionally, the subjects in the LC record for the work cataloged in the authority record, "Hate on the net", are:
Racism --Computer network resources.
Race discrimination --Computer network resources.
Antisemitism --Computer network resources.
Internet.
World Wide Web.
Cyberspace --Social aspects.
Technology --Social aspects.
These subjects are clear to me and I think they are well done.

After a search, the dictionary definitions of "extremism" leave me unsatisfied (e.g. Merriam Webster's "the quality or state of being extreme") and I prefer the Wikipedia definition http://en.wikipedia.org/wiki/Extremism:
"Extremism is any ideology or political act far outside the perceived political center of a society; or otherwise claimed to violate common moral standards. In democratic societies, individuals or groups that advocate the replacement of democracy with an authoritarian regime are usually branded extremists, in authoritarian societies the opposite applies.
The term is invariably, or almost invariably, used pejoratively. Extremism is usually contrasted with moderation, and extremists with moderates. (For example, in contemporary discussions in Western countries of Islam, or of Islamic political movements, it is common for there to be a heavy stress on the distinction between extremist and moderate Muslims. It is also not uncommon to necessarily define distinctions regarding extremist Christians as opposed to moderate Christians, as in countries such as the United States)."
It goes on to discuss how difficult it is to define "extremism". I confess that what pops into my own mind is the Barry Goldwater quote: "Extremism in the defense of liberty is no vice".

So, whether someone likes Wikileaks or not, according to the Wikipedia definition (and the others I have seen), "Extremist Web sites" should not apply to Wikileaks, and I applaud the popular movement to change the subjects on that book. Continuing the same line of thought, however: I don't know if I like the subject itself, "Extremist Web sites", since it does not appear to have a clear meaning and is invariably pejorative, as Wikipedia points out.

I think this is a great illustration of how the public can get involved, and would like to get involved, in some of the issues of cataloging. Naturally, some more negative examples can be given as well, and I could provide some myself.

Re: Progress on tasks?

 Posting to RDA-L
 
On 25/07/2011 13:26, Bernhard Eversberg wrote:
<snip>
On the other hand, fee-based, commercial services are not to be excluded and may provide all sorts of added value, obtainable only by subscription or purchase, pricing entirely left to their discretion. They would, however, receive the same level and scope of access to the text as everybody else: No monopolized access to the textbase. Now that RDA has to be rewritten anyway, the chance is within the scope of possibilities.

I am aware that this excludes the familiar ways of commercial funding of code development and the subsequent monopolized, copyrighted access to the text. This is a thing of the past, definitely, budget crunch or not, and any scheme still based on it can no longer succeed. At least not in a good enough way to result in mentionable, communicable improvements. And not in a big enough way to overcome the AACR2+MARC21 octopus. Just look at the test data if you don't believe this.
</snip>

This is a fundamental point that needs to be emphasized. Making something "open" does not mean that people will no longer need help, help they will be willing to pay for. For instance, if someone needs information on the MARC format, it is not enough just to hand them the manual (as I have seen done). If people are going to use the MARC format, they need help with it. The "added value" is the expert(s) who can actually help the person who needs the information. The same holds for all the cataloging rules and procedures that are incomprehensible to the untrained.

Someone may want to implement the "free" catalog Koha, but they will find that they are paying someone else to install and host it, plus do the basic maintenance. There are many places around the world that host Koha--and make money at it--as well as hosting many other open source programs. It still costs, but normally it costs a whole lot less than a proprietary system. Plus, you are free to make your own changes as you need.

If all of the cataloging rules were made open, other communities could actually think about using them or adapting them. But each one would need help. That sounds like an "unserved market", and this market could be huge. I think that the metadata rules and procedures from the library community are the greatest in the world, and I can imagine that a lot of other metadata communities would want to utilize them in many ways, but currently those library cataloging rules are a closed system. A lot of people could make a lot of money from providing all kinds of services: training, translations to and from various languages (as each community adds its own versions and interpretations), and many other services as well. ALA/OCLC could coordinate a lot of this, supply server space (perhaps for a reasonable price), provide varying levels of training leading to *certification*, and so on. The sky is the limit.

But, it really means heading off into new directions.

Re: [ACAT] How to index statement of responsibility

Posting to Autocat

On 21/07/2011 17:54, J. McRee Elrod wrote:
<snip>
We even like key word searching of 300, so we can find sample records or particular SMDs. The SLC OPAC has keyword search by MARC field, and we find it very helpful.
</snip>
This can show the problems of this kind of searching. When you open up the 300 field to keyword searching, you automatically have problems searching numbers, which can be very important in scientific and technical publications. If you index the 300 field, a search for "123" retrieves everything with 123 p., and a search for "23" picks up everything that is 23 cm. It becomes much harder for users to understand why they are looking at records that have nothing to do with their search and, more importantly, how to work with them.

The same goes for indexing the 260 field. A keyword search for e.g. "New York" will retrieve everything published in New York, which most people do not want. To get around this, you cannot say simply "NOT 260 new york" because you do want items published in New York, so limits must be made specifically for title and subject fields, in other words, a highly complex search that is beyond the capabilities of most users.
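To illustrate the problem, here is a toy sketch (not modeled on any real ILS, and the records are invented) of a per-field keyword index, showing how opening 260/300 to keyword searching floods results with hits on pagination and place of publication:

```python
# Toy per-field keyword index over MARC-like records (invented data).
from collections import defaultdict

records = [
    {"id": 1, "245": "New York in the sixties", "260": "Boston",   "300": "200 p. ; 23 cm"},
    {"id": 2, "245": "Organic chemistry",       "260": "New York", "300": "123 p. ; 23 cm"},
    {"id": 3, "245": "Abstract algebra",        "260": "New York", "300": "410 p. ; 25 cm"},
]

def build_index(fields):
    """Index the chosen MARC fields word by word."""
    index = defaultdict(set)
    for rec in records:
        for tag in fields:
            for word in rec[tag].lower().replace(";", " ").split():
                index[word].add(rec["id"])
    return index

# Indexing only the title (245): "york" finds the book ABOUT New York.
title_index = build_index(["245"])
print(title_index["york"])   # {1}

# Indexing 245+260+300: "123" now matches pagination, and "york" matches
# every book merely PUBLISHED in New York.
all_index = build_index(["245", "260", "300"])
print(all_index["123"])      # {2}, via "123 p."
print(all_index["york"])     # {1, 2, 3}
```

The user searching "123" or "new york" has no way to know why the extra records appeared, which is the complaint above.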

Still, I agree that being able to search all fields and subfields can be very handy for expert searchers (i.e. catalogers), who understand what they are doing and can look for sample records far more quickly and easily when the entire record is indexed. For instance, when setting up a name, I really liked being able to search for 100/700$c for examples of different qualifying terms. But almost nobody else needs to do anything like that.

Matters are different concerning 245$c, since I personally think the advantages of indexing outweigh any disadvantages.

Sunday, July 24, 2011

Re: Progress on tasks?

Posting to RDA-L

On 21/07/2011 17:18, Beacom, Matthew wrote:
<snip>
The MARC pilot project report is available in PDF here
http://www.eric.ed.gov/PDFS/ED029663.pdf
</snip>
Thanks so much for pointing to this. Now that I do not have such good access to these kinds of documents, each one is very much appreciated!

Analysing the document is very interesting. One point is on page 3/9, or pdf p. 14, where it says:
"In the MARC Pilot Project, budget and time constraints influenced design considerations. Among these were:
  1. Project Facilities. The implementation of the project required a facility for the central preparation of machine-readable catalog records. A major change to the computer configuration would be expensive and could not be justified. [Details are given] This decision influenced the design of computer programs.
  2. Mode of Data Collection. In view of the time constraint and the experimental nature of the project, it was deemed inadvisable to disturb the existing internal operations of the Processing Department... The manuscript card ... was reproduced on a preprinted input worksheet which became the source data for MARC. The design of the format, the worksheets, and the editing procedures were all influenced by the use of source data in this form."

These seem to be entirely valid concerns, and I am not criticizing them, but it is important to keep in mind that the original MARC Pilot Project suffered from its own limitations. One additional point is (of course) that they were using keypunch machines with paper tape. I don't have experience with tape, but it brings back some of my own horrible memories of the days before video output, where typos were a lot worse than on a typewriter. It meant that you had to re-punch an entire card or even a card set, and sometimes when I typed an "e" instead of an "s" I would just want to sit down and cry!

They also wanted to put in a publisher code, i.e. a specific code for each publisher (pdf p. 57), but this was abandoned very quickly because of the cost of the "research necessary to establish new codes", and they provide the usual list of problems related to authority records. They took the very practical decision that there would be very little benefit for a significant amount of work. Their conclusions should be kept in mind when considering some of the proposals put forward today.

Also, when looking at the list of participants, all are university/research libraries except for Montgomery County Public Schools and Nassau County Library System. But when you look at the actual reports, you find that Montgomery County Public Schools could not participate at all, and the Nassau County System, although they provided a bit more information, also said that they could not participate because it was just too complicated. As a result, the libraries actually included were almost completely university/research libraries.

Of course, what we see in the report reflects the limited technology and "world view" of the 1960s. In a pre-internet world, it was far more difficult to communicate with large, disparate groups compared with what can be done today. Today, the technology has made it possible for people with much less knowledge and experience to participate in highly technical tasks. Today, everyone who wants to participate can be included, and the fact that this sort of all-inclusive organizational model can succeed has been proven through the development of open source software. Keeping matters secret among a restricted set of all-knowing gurus, who allow limited information to dribble out and then present everyone with what is in essence a "fait accompli" that can be rejected only by admitting a huge waste, is a thing of the past, or at least it should be.

Saturday, July 16, 2011

Re: [NGC4LIB] "Is a Bookless Library Still a Library?"

Posting to NGC4LIB

On Thu, Jul 14, 2011 at 4:48 PM, Karen Coyle wrote:
<snip>
And yet ... look at the cataloging rules and you will see a culture obsessed with the THING itself and with very little concern about the information within the thing. Yes, there are subject headings (3) and one (1) classification code, but note that those aren't addressed in the cataloging code and aren't being discussed today the way that RDA is. Where is our interest in the information?  Why aren't we putting hundreds of thousands of dollars and thousands of person-hours into developing new ways to get at content? (And I don't mean expanding keyword searches over more and more databases with mainly descriptive metadata.)
</snip>

(Apologies for the tardy reply, I have been away)
This is because cataloging has always concerned itself much more with the carrier instead of the content. For printed/physical items, this makes sense from a practical viewpoint. A catalog record I made 20 years ago that describes a printed book still describes that book.

But this is what I think is one of the biggest problems with RDA (and FRBR): they still concentrate on carrier, and bring that same focus to web resources. This results in an almost impossible situation where you have a static catalog record describing a resource that changes unpredictably. So, it's not such a strange occurrence to find that a whole series of items you cataloged last week have changed a lot in the meantime! And if they changed that much in the last week, what will be the differences in 20 years?

The rules we have for describing physical resources have proven themselves, but websites are a completely different animal. They need a completely different approach to ensure that the work done on them today is not simply wasted effort when the item changes, as it inevitably will. PDF will change to QDF or PEF or something--we know that. The URLs will change. Almost everything in the record will. So, the cataloging problem we are facing is not with cataloging rules, but with record maintenance.

Re: [NGC4LIB] "Is a Bookless Library Still a Library?"

Posting to NGC4LIB

On Tue, Jul 12, 2011 at 10:50 PM, B.G. Sloan wrote:
<snip>
From Time magazine:

"We've been hearing about it for years, but the bookless library has finally arrived, making a beachhead on college campuses. At Drexel University's new Library Learning Terrace, which opened just last month, there is nary a bound volume..."

See: http://ti.me/ooq5PI
</snip>
Of course a library with no books is still a library. I think it is vital that librarians reconsider what defines the "library's collection" if they are even to survive. The job of librarians is not, and has never been, to place physical objects on shelves and keep them in order. This is mixing up the tasks of the library with the purposes of the library.

What is a library? What purposes does it serve? What is the library supposed to do to fulfill the needs of the public or other community, who pays for it? The answers of 25 years ago to these questions are no longer valid.

At the moment, there is still a need for printed books, but we should remember that practically everything printed today exists in an electronic version somewhere. All that would have to be done is for those people in charge of those electronic versions to make them generally available. The technology exists for this to happen right now; the public wants it. It is going to happen sooner or later, and probably sooner in the STEM fields, much as we are seeing now.

We must prepare ourselves, and that includes reconsidering the 19th century FRBR model of what users supposedly wanted back then, and RDA.

Re: DLC records not following instructions in Series authority record

Posting to Autocat

For those who are interested, I discussed LC's decision to abandon
series authority in one of my "Open Letters" to Thomas Mann at
http://eprints.rclis.org/handle/10760/7836
I still stand by those comments.

Saturday, July 9, 2011

Re: What is MARC format?

Posting to Autocat

On 09/07/2011 16:55, J. McRee Elrod wrote:
James Weinheimer said:
... there is the ISO2709 format which is used only to transfer complete catalog records (i.e. cards) among library catalogs, and this is definitely obsolete today.
MARC for transferring records at time of ILS migration is very much not obsolete today. Libraries which can't get MARC out of an obsolete ILS are faced with a very expensive problem.
As I said, there is one use for the ISO2709 MARC format: for transferring complete catalog records from one library catalog to another. Period. That's it. No other tool utilizes that format and it must always be transformed before it can be used anywhere. On the other hand, XML can be used by lots of tools out there, including browsers. [See below]
Because of this 1960s foundation, everything from then on becomes limited based on that format: limited record length, a limited number of fields and subfields ...
MARBI seems to have no difficulty adding fields and subfields. Consider the proposed overly complex 26X fields.
What I mean is that a single record can be no more than 99,999 characters long, because in the Leader the first 5 positions (0-4) define the record length. Then comes the directory, which defines the length of each field. I have seen repeated complaints on Autocat that the subfields are used up in some fields. With XML, there are none of these limitations.
and then concerning the information in the fixed fields, this is very difficult to extract without the special library software.
Have you tried reading XML without special software?
Yes, and you can too. All you need is your browser, not some kind of special software that you have to download and learn. Here is a basic example: http://www.w3schools.com/xsl/cdcatalog_with_xsl.xml. If you look at the source code of this page, it is all XML, and it uses the stylesheet to make it readable and useful. This page explains what is happening: http://www.w3schools.com/xsl/xsl_transformation.asp.

Stylesheets can be extremely powerful. If these were MARCXML records, any browser can find, retrieve, resort and reformat a search result in XML in all kinds of ways--and on the fly if you want. By this I mean that I can automatically search a remote database, retrieve x number of records, and--if they are in XML--I can take just the parts I want, reformat those parts, resort them and do all kinds of things with them. People do it every day. I agree that MARCXML is not the best format, but at least it is possible to work with them. Nothing like that can be done with ISO2709 records. They are obsolete except to transfer virtual catalog cards from one library catalog to another. People need to be able to do much more than this and as a result, our records are ignored.
Also, a certain structure is embedded in ISO2709. As I have mentioned, allowing for the possibility of multiple main entries ...
Great. I do not wish to inhabit a bibliographic world of multiple main entries, whatever that could mean, with no set forms for bibliographies and citations, not to mention subject and added entries for works.
What multiple main entries means is that for Masters and Johnson, Johnson will be treated equally with Masters, which is how it should be. I submit that if there had never been book catalogs or card catalogs and we were setting things up from scratch today, nobody would ever come up with the idea of a single main entry. It is a remnant of the printed catalog.

In reality, there is primary and secondary authorship. I have mentioned in other messages how allowing multiple main headings would be very difficult in ISO2709 because of added and subject entries, if it is possible at all, but structuring it would be relatively easy in an XML format.
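A hedged sketch of this point (the element names are my own invention, not any real schema): in XML nothing prevents a record from carrying several equally ranked creators, each tagged with a role, which a flat main-entry/added-entry structure cannot express.

```python
import xml.etree.ElementTree as ET

# Invented record structure: Masters and Johnson as co-equal primary
# authors, with a secondary author alongside them.
record = """<record>
  <creator role="primary">Masters, William H.</creator>
  <creator role="primary">Johnson, Virginia E.</creator>
  <creator role="secondary">Kolodny, Robert C.</creator>
</record>"""

root = ET.fromstring(record)
primaries = [c.text for c in root.findall("creator")
             if c.get("role") == "primary"]
print(primaries)  # ['Masters, William H.', 'Johnson, Virginia E.']
```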

Again, a focus on record transfer using XML instead of ISO2709 is not the complete solution, but it is one important step toward entering the larger information universe.

Concerning bibliographical and citation formats: I have worked with them extensively, and am doing so right now as a consultant figuring out how to create citations for online statistical databases, and I have never seen any citation format that mandates a single main entry. People are supposed to list the authors as they appear on the t.p., using either full name or initials, and distinguish any editors, translators and others, in other words, indicate primary and secondary authorship. But I have never even heard of a citation format that directs people to figure out a single main entry.

What is MARC format? (Was: My ALA talk)

Posting to Autocat

On 08/07/2011 18:04, J. McRee Elrod wrote:
<snip>
> Eric Hellman said:
I don't remember "dismissing the value of almost everything we do as librarians." I'll admit to complaining (sarcastically) about the technical deficiencies and practical drawbacks of MARC ..
We find the "technical deficiencies" to be in present ILSs, not in MARC. Today's ILSs have hardly scratched the surface of what could be done with MARC, or crosswalks from MARC.

It is easier to crosswalk from MARC to other schemas than the reverse. For example, a MARC record crosswalked from IMDb or ONIX takes quite a bit of editing. But with the addition of images, either can be produced from MARC. URLs for full electronic text are a commonplace; for one of our clients, we are now including URLs for cover images. MARC is a far more versatile tool than many suppose.
</snip>
I think it's important to discuss precisely what constitutes the MARC format so that everyone is talking about the same thing. At its most basic level, there is the ISO2709 format, which is used only to transfer complete catalog records (i.e. cards) among library catalogs, and this is definitely obsolete today. Because of this 1960s foundation, everything from then on becomes limited based on that format: limited record length, a limited number of fields and subfields, and fixed-field information that is very difficult to extract without special library software. Also, a certain structure is embedded in ISO2709. As I have mentioned, allowing for the possibility of multiple main entries, to allow for primary/secondary authorship, would be very difficult to institute in ISO2709, if it is even possible, but very easy in XML. I am sure there are other problems as well.

Another level of the MARC format is what specialists (catalogers) work with: the field numbers and subfield codes, such as 245$a, mean something highly specific to us, but are meaningless to anyone else. Still, just converting 245$a to a so-called "human readable" version is very difficult. Having it display as "Title proper" still means nothing to people since this is specialist jargon. I keep going back and forth on this because "245$a" is language independent while <titleProper> means something only to specialist English speakers.

Although I still believe that
<245> 
   <a></a>
</245>
on the order of MARCXML would be better in the long run than English words that don't mean anything to a non-cataloger anyway (just use 245$a and make different stylesheets), I was argued down and had to admit that developers just won't do that. OK--it's just some stupid computer codes: go with English-language coding.

So, if we had the same "semantics" as we do today with MARC, but the new format actually said, instead of
<300>
   <b></b>
</300>
it could be:
<physicalDescription>
   <otherPhysicalDetails></otherPhysicalDetails>
</physicalDescription>
the two would function identically, and catalogers could get it to display the numbers and subfields anyway. Like I said, it's just a bunch of stupid computer codes and any one code can work as well as any other.
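To make the "any one code can work as well as any other" point concrete, here is a small sketch; the lookup table and the element names are my own invention (on the model of MODS-style naming), not an established mapping:

```python
# Invented mapping from MARC tag/subfield to English-style element names.
TAG_TO_NAME = {
    ("300", "b"): ("physicalDescription", "otherPhysicalDetails"),
    ("245", "a"): ("titleStatement", "titleProper"),
}

def to_numeric_xml(tag, code, value):
    """Render a field with MARC-style numeric codes."""
    return f"<{tag}><{code}>{value}</{code}></{tag}>"

def to_semantic_xml(tag, code, value):
    """Render the same field with English element names."""
    field, sub = TAG_TO_NAME[(tag, code)]
    return f"<{field}><{sub}>{value}</{sub}></{field}>"

# The two encodings carry identical information:
print(to_numeric_xml("300", "b", "ill., maps"))
# <300><b>ill., maps</b></300>
print(to_semantic_xml("300", "b", "ill., maps"))
# <physicalDescription><otherPhysicalDetails>ill., maps</otherPhysicalDetails></physicalDescription>
```

A stylesheet could just as easily display either encoding with the numbers and subfield codes catalogers are used to.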

So, there are (at least) three distinct levels to the MARC format: the ISO2709 format, the field numbers with subfield codes, and the higher-level semantics. It is the higher-level semantics that are important to retain, while the remainder (the field/subfield numbers and codes, plus the ISO2709 format) can be safely tossed overboard once the correct systems are in place.

Friday, July 8, 2011

Re: Library data: why bother? by Eric Hellman

Posting to Autocat

On 08/07/2011 17:07, J. McRee Elrod wrote:
<snip>
Brian Briscoe said:
... catalogers understand very well that our catalogs need to become more web-compatible and need to be more user-friendly. There has been much movement toward that area.
Exactly. And that means ILS development, creating a whole new structure, not fiddling with the building blocks as does RDA.

SLC can walk from MARC to any "web friendly" format you want, and has done so (usually for non OPAC use). But the ILSs to utilize these formats do not exist.
</snip>
The more I consider these matters, the more I become convinced that the very premise of the catalog needs to be changed. On the one side are the needs of the users, comprising many, many different types of communities, and on the other side there are the needs of the librarians. And yet, everybody is supposed to use "the catalog". Why?

The only reason that everyone is expected to use the same system is because that is the way it has always been--catalogers always used the same catalog as the patrons. Actually, it would probably be more correct to say that "patrons have always had to use the same catalog as the catalogers" (not counting the old "official catalogs" which were essentially duplicates of the public catalog, the official catalog being off limits to the users, who could mess up the public one and it would not be that big of a disaster, at least not for the catalogers!). The need for everyone to consult the same catalog has always led to an uneasy peace, dating back at least to Panizzi. Today, there is *absolutely no reason* why everybody has to use the same catalog.

The records and updates could easily be ported into a Drupal application and/or other places, and searchers could go crazy with those. But the librarians (selection, reference, cataloging, and the rest) have specific needs for managing the collection. These functions must be fulfilled if they are to have any chance of managing the collection, but the users don't need most of them.

So, we could consider the ILMS as the modern version of the "official catalog", while wherever the records were ported out to could be considered the "public catalog(s)". This would be so much easier, cheaper and more forward-looking than changing our cataloging rules, the implementation of which will not actually change much for anyone at all!

Re: Technology and catalogers (was XML...)

Posting to Autocat

On 08/07/2011 00:23, Kevin M Randall wrote:
<snip>
Aaron Kuperman wrote:
MARC is a format - it isn't a physical thing.

Floppy disks were hardware.

Formats tend to be immortal, [...]
I would definitely disagree here. Formats are just as "mortal" as the media on which they are carried. The lifespan of a format may be longer than that of the carrier--or, in some cases, it may well be shorter. The inability to access data contained in an obsolete format is just as real a problem as the inability to access data contained in an obsolete carrier. (Think of files you may have, copied over from 8-inch floppy to 5.25-inch floppy to 3.5-inch floppy to Zip disk to flash drive to the "cloud", that were created in early word processing or spreadsheet programs, and now can't be read because you don't have software that can do anything with them.)

However, I don't think we need to start worrying about the loss of MARC functionality just yet...
</snip>
There are lots of examples of obsolete file formats. There is at least one initiative to deal with this however: "Keeping Emulation Environments Portable" http://news.bbc.co.uk/2/hi/technology/7886754.stm The article states: "Britain's National Archive estimates that it holds enough information to fill about 580,000 encyclopaedias in formats that are no longer widely available."

Emulating the original program plus emulating the operating system that program used at the time seems to be a huge job. Yet, I quote from the BBC article: "Dr Anderson said emulation was more workable in the long term than the usual method of preserving old files which involves migrating information on to new formats with its attendant risks of data degradation and corruption."

I have suggested that in the future, there will be a job called something like a "Digital Archaeologist" whose task it will be to find and rework/revivify old formats (both physical and digital) to get some obsolete information tool to function again.

Re: Library data: why bother? by Eric Hellman

Posting to Autocat

On 07/07/2011 19:32, J. McRee Elrod wrote:
<snip>
James Weinheimer said:
The same thing happened with printed documents, where if something was not printed and remained only in manuscript, it was ignored by society.
There was a lot of writing and reading before movable type printing was developed in Korea and Europe, as well as libraries with catalogues.

We still pore over manuscripts. Remember the Dead Sea Scrolls?

All of the recorded intellectual and artistic expression of humankind is in the province of libraries.
</snip>
Yes, there are many documents in manuscript--but they get genuinely acknowledged and used by society only after their contents can be communicated to others, and this always meant: being converted into another format. Before printing, copying manuscripts was expensive and time consuming, and could be done on a highly limited basis. It was also incredibly prone to error. It turned out that printing of various types was the only real way of making information generally available until computers and the web appeared.

Before a new and more useful format appears on the scene, people have no choice except to use the old format or go without. This happened when people had to use manuscripts locally--the vast majority had no access to those materials. But when the new, "improved" format arrives, those documents that never get converted into the new format are left behind. They practically stop being used, or are used by even fewer people than before. Once these same materials have been reformatted, they can be reproduced, communicated and used over and over again. This happened with printing and today we see the same thing happening with, e.g. JSTOR--now that so many older journals are available electronically, they are being used much more than when their contents were only in paper and people had to dig it all out by hand.
<snip>
Suzanne Graham said:
I understood his point here to be that we don't just need to describe the item as it is (create a surrogate), but we need to further enhance ...
Yes. That's why we add 505s and 520s to MARC records, particularly items which can not be picked up and examined with one's bifocals. Clients tell us use of e-books rises sharply when 505s and 520s are supplied. We've had less positive feed back for 653 or 695 keywords.
</snip>
This still assumes that people will use the surrogate (or catalog) record. Hellman's argument (as I understand it) is that people will not use the surrogate *when given the choice*, and full text gives them that choice. To this I must agree. For example, I wonder how many people even know about the complete metadata record in Google Books? When people make a search and click into something, they immediately find themselves in the midst of full text, e.g. the search for "iraq war media" http://www.google.com/search?q=iraq+war+media&hl=en&safe=off&prmdo=1&tbs=cdr:1,cd_min:2000,cd_max:2099,bkv:f,bkt:b&tbm=bks&source=lnt; click on "The media and the Rwanda genocide" and you go directly into the text, http://books.google.com/books?id=nJT54Oe2D08C&pg=PA436&dq=iraq+war+media&hl=en&sa=X&oi=book_result&ct=result&resnum=3&ved=0CDcQ6AEwAg#v=onepage&q=iraq%20war%20media&f=false.

But there is another page "About this book" http://books.google.com/books?id=nJT54Oe2D08C&dq=iraq+war+media&source=gbs_navlinks_s which has lots of metadata, far more than catalogers could ever supply. How many people even know about these pages? (I discuss this in much more detail in my chapter in "Conversations with catalogers" now in the E-LIS archive http://hdl.handle.net/10760/15838. I am just shameless in these things!)

Even if people know about these metadata records, how many would look at them? The "library metadata" record, which is actually a very strange thing, comes at the very bottom of the page. I personally think that people would find the "references" the most useful, but the word cloud is a semi-amazing creation.

It seems that Hellman believes that as full text improves and gets richer (and hopefully, there will be more and more full text available), this is the functionality that people will come to expect, rather than finding themselves looking at the metadata page. It does not mean that metadata won't exist; people will just not interact with it the way they do today.

I don't know what I think. He may very well be right. Of course, the purpose of traditional library metadata changes in such a system and FRBR and RDA become practically irrelevant.

Re: Technology and catalogers (was XML...)

Posting to Autocat

On 07/07/2011 19:36, Todd Grooten wrote:
<snip>
This makes me wonder - as someone who graduated library school in 2003, I feel that my skillset is becoming obsolete...I am a cataloger that only knows MARC.

Would you say that catalogers who only know MARC are going to end up like the floppy disk?

How does one remedy this?
</snip>

This is a great question. In my opinion, the answer is: it's tough to say. While it is important to understand XML and what you can do with it (in short: *anything you want*, which is not that easy to really accept), it is not important to actually be able to work with the native XML format. This is similar to being a MARC cataloger who never works with the native ISO2709 format of MARC21, which is a genuine nightmare and far more complex than native XML.

I would emphasize that the intellectual task of cataloging should *not* be attached too closely to the MARC format. When you say that you are a cataloger who only knows MARC, take out MARC, and you remain a cataloger, i.e. someone who can create standardized descriptions of resources and organize them so that others can find them. I can guarantee that not everyone can do that. For instance, untrained people cannot do valid subject analysis. I think the future (here is a prediction!) will emphasize this type of conceptual thinking, with less emphasis on format. Most problems with variant formats will be handled silently behind the scenes by systems.

Thursday, July 7, 2011

Re: Library data: why bother? by Eric Hellman

Posting to Autocat

On 07/07/2011 15:35, J. McRee Elrod wrote:
<snip>
I think Hellman brings up a point that is highly important where he says: "We don't need surrogates" ...
This assumes that (1) all library resources are available in electronic form, or (2) if not, the title accurately reflects the content. "Puritan in Babylon" says nothing about Puritans or Babylon.

Not all library resources are textual these days. A surrogate is certainly needed for a work of art, a motion picture, or an electronic device.
</snip>

I don't know if Hellman would disagree with this. He apparently does assume everything digital, which I think is a fair assumption, in the sense that "if it's not digital, it doesn't exist". The same thing happened with printed documents, where if something was not printed and remained only in manuscript, it was ignored by society. Something similar seems to be happening today but we are in a time of transition. (It's too bad since I am a bookman!)

As I understand Hellman's presentation, he is arguing that there may be a place for metadata, but not as a "surrogate" for the item; rather, the metadata should serve to improve Search Engine Optimization. This next point comes from me and I don't know if he would agree: this metadata can be added in various ways, as embedded metadata or now as microdata, although these "metadata-type improvements" could probably exist separately. Still, the user would probably never see the "surrogate", since it would be used only by the search engine to optimize the result for the searcher in various ways.
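The "embedded metadata" idea can be made concrete with a small sketch of my own (the page snippet and the toy parser are invented for illustration; the `itemprop` terms `name`, `author`, and `datePublished` are real schema.org properties, and the title is the one mentioned elsewhere in this thread). Structured fields are woven into ordinary HTML, so an indexer sees machine-readable metadata while the reader just sees a web page:

```python
from html.parser import HTMLParser

# A fragment of HTML with schema.org microdata attributes embedded in it.
PAGE = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">A Puritan in Babylon</span> by
  <span itemprop="author">William Allen White</span>,
  <span itemprop="datePublished">1938</span>.
</div>
"""

class MicrodataReader(HTMLParser):
    """Collect itemprop name/value pairs, roughly the way an indexer might."""
    def __init__(self):
        super().__init__()
        self.prop = None
        self.data = {}
    def handle_starttag(self, tag, attrs):
        self.prop = dict(attrs).get("itemprop")
    def handle_data(self, text):
        if self.prop and text.strip():
            self.data[self.prop] = text.strip()
            self.prop = None

reader = MicrodataReader()
reader.feed(PAGE)
print(reader.data)
# → {'name': 'A Puritan in Babylon', 'author': 'William Allen White',
#    'datePublished': '1938'}
```

A human reading the page sees only the sentence; the search engine additionally gets an unambiguous title, author, and date it can use for filtering and ranking. That is the SEO role Hellman seems to envision for our metadata.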

I am not saying I agree with this, but I can certainly understand how non-experts, who have been frustrated by traditional ILMSs, and who compare them to what they can do with a Google search or some other modern electronic database, would have few qualms concluding that our ILMSs don't work.

Brian is absolutely right in what he says about subjects, and I will add, in searching for the other "concepts" as well: authors, titles, places, corporate bodies, and so on. This is a nightmare on Google. For instance, there happens to be a James Weinheimer in NYC who apparently works as a nightclub singer. I've listened to some of his songs and he must be my evil twin. Sorry James! When I sing, I sound like an old toad croaking. You don't sing any better than I do, but at least I don't inflict myself on anybody! Anyway, I state here and now that he and I are NOT the same person!

The problem is, to tap into this power of navigating "concepts" in the catalog, especially to use the power of the subjects to their fullest, you have to use it like a card catalog, and that approach has completely fallen apart today. The cross-reference structure, so vitally important, doesn't even work most of the time with keyword searches. So, people wind up comparing our search tool with the Google-type search tools and draw the logical conclusions.

These are the areas where I think substantive improvements should be made to how our library catalogs work, not RDA and FRBR. Unfortunately, those initiatives, although well-intentioned, are pointing us down detours where there will be no users in the future. The real developments are happening in areas such as SEO, and I think we need to work with those kinds of developments... somehow.

Library data: why bother? by Eric Hellman

Posting to Autocat

I recently read Jeffrey Beall's review of Eric Hellman's talk, "Library data: why bother?", given at ALA (http://metadata.posterous.com/review-of-eric-hellmans-talk-at-ala-annual-20), and I finally managed to find the actual slides so that I could form my own impression of this controversial talk. The slides are at: http://www.facebook.com/l.php?u=http%3A%2F%2Fbit.ly%2FipVVoH&h=4AQDVSolC. I suggest people look at these slides as a good example of how catalogs are viewed by many highly influential "information experts".

Jeffrey Beall did not like the talk: "Hellman's talk was among the most arrogant and flippant I had ever attended at an ALA conference. His talk was supposed to be about linked data, but he exploited his position as speaker to unwarrantedly trash libraries, library standards, and librarians." I sympathize with his anger, but I think it is vital today to accept that many non-librarians--and even librarians--share Eric Hellman's conclusion that the library and its catalogs are becoming obsolete, if they have not already been obsolete for some time. Lots of people agree with Hellman that the replacement is, or will be, full-text searching, and they put their faith in SEO, that is, "Search Engine Optimization". Especially in today's economic climate, there is a lot of pressure on administrators to reconsider everything that their organization is doing to maximize their options, and if someone could convince administrators that they had a "magic machine", some, if not many, would snap it up.

I think Hellman brings up a point that is highly important where he says: "We don't need surrogates", which I take to mean that we do not need separate catalog records. Although he doesn't say it in so many words, I believe Hellman is saying that if metadata has a use, it is to *improve* the SEO by inserting more specific dates, some type of description, perhaps even some authorized forms, by using metadata or microdata, but the emphasis of searching should be on SEO.

Whether it turns out that these methods would work or not--my own mind remains open on this suggestion--Hellman's declaration that "we don't need surrogates" reflects a feeling that I have witnessed myself. Most people do not like to use the catalog and use it only when they have to, so that they can get to the books, etc. that they want. Often, they search for a book they know they want just so that they can get into the stacks and browse. In any case, they would much prefer to browse the shelves, and in spite of a lot of my protesting that shelf browsing is OK but one of the least efficient methods of searching, many patrons still resolutely refuse to use the catalog. I don't think this is anything new; after all, the information people want is in the books and serials and maps and in the other parts of our collections, not in the catalog. I remember how rarely I used the catalog myself before I understood exactly what it was and how it was structured. As I remember, I would look up an author's name with the sole idea of getting into the stacks and browsing the shelves where I assumed all the books I would need sat together.

It also turns out that people do not realize that Google itself is based on metadata. The Google search result has a short summary of each resource, where you see the URL and a bit of the keyword you searched in its context (this is "data about data"), and if you select a time frame in the left-hand column, e.g. "Past month", you miraculously see the dates of the pages; when you click on "Timeline", "Reading level" and so on, it becomes clear that there is some kind of metadata being utilized behind the scenes. There is undoubtedly a lot more metadata that we don't see, so these are, of course, metadata records.

Still, people do not relate to these records in the same way as they do to our catalog records; they see it as working directly with the digitized resources.

Of course, I think that library catalog records (or "surrogates") are still very important for information retrieval, but it is clear that the *functionality of our catalogs* needs to be rethought completely. Nevertheless, we should not assume that this attitude is simply taken for granted any longer by the powers-that-be. Such statements must be proven today, often to people who are less than sympathetic. Many love full-text searching; they are familiar with it and find it far more useful than our tools, which are conceptually difficult and definitely more complex. Maybe it should not be that way, but the serious questioning of traditional methods is simply a fact of life today.