Friday, February 26, 2010

FW: [Metadatalibrarians] Webinar: Cataloging: Where are we now? Where are we going?

Posting to various lists
"Cataloging: Where are we now? Where are we going?"
Speakers: Karen Coyle and Renee Register
Broadcast date: Friday, February 19, 2010

Thanks for pointing this out. I recommend that everyone watch this remarkable webcast about cataloging, replete with rock music and spectacular graphics flying all over the screen! It seems as if many of the current postings I have read on some of the lists are correct. From this webcast, it seemed to me that RDA was presented as a done deal. The testing that will take place seems to be designed to shake out some annoying bugs in the system, but RDA itself will not be seen as a bug and will be adopted no matter what. RDA also is presented as looking toward the future, although I personally still do not understand this; how any of the changes demanded by the RDA rules matter in the slightest completely escapes me. Perhaps the rule that changes "ca." to a question mark denotes a fundamental step into the Semantic Web in a way that I simply cannot fathom. RDA still has long parts devoted to detailing how to create authorized forms of names. To me, it would be more forward looking if instead, there were something that recognized that for all kinds of reasons, authorized forms of names are less important in a system using URIs (decoupling the labeling function of a heading from its collating function).

So, while we are supposed to accept silently that RDA is a step into the future, when I stop to look at the actual changes, I cannot help asking myself: how does changing the use of brackets for the statement of responsibility [or insert any other rule change] help enter the new world? How does renumbering AACR2 1.4F6 to RDA [or insert any other rule number] help? What difference does it make?

The other part that struck me was the final words about how everything today is about the user: we should focus on what the user wants. (As an aside, this is a far deeper question than appears at first glance, as any reference librarian will tell you, because they understand that people are often unaware of what they really want and need help to discover it, but here we are entering the realm of psychology and philosophy.) Focusing on what the user needs is fine. During the other parts of the show, however, they said that when we enter this new world of information, our users will be able to do all kinds of things they cannot do in our catalogs: find and create tags, link to or write reviews, use innovative social tools that will lead them to all sorts of other materials, and so on. But never was it said that with RDA, users will finally be able to do what they have always really wanted and needed to do: "find / identify / select / obtain" "works / expressions / manifestations / items" by their "titles / authors / subjects." Of course, if somebody knows how to work a traditional library catalog correctly, they can do that pretty well right now.

To me, their silence was a tacit admission that the FRBR user tasks are not what people actually want to do, and people have moved on to other needs, or perhaps they have had other needs from the beginning. As I have mentioned in other postings, if the FRBR user tasks are incorrect, then the whole of FRBR must become suspect, and by extension RDA; that is, if we are to focus on what our users want. This is not saying that anyone has done anything wrong. It is just Darwin's principle of adaptation in action: the information environment has changed in fundamental ways since FRBR was written, and you either adapt to those changes or you do not.

As one example, it appears that in this new world of information, people want to take our data and refashion it in all kinds of ways for their own purposes. I confess that this has come as a surprise to me, since I thought cataloging information was not valued very highly either by the rest of the library community or by the community at large. The metadata revolution exploded however, and suddenly people discovered that with metadata they could do all sorts of things they couldn't do before.

Focusing on the needs of the users is fine, but if one of the goals is to share our data with them, this should give catalogers pause, since they recognize very clearly that a well-made catalog record has validity only within its own realm of relationships. A subject heading "Yeltsin, Boris Nikolayevich, 1931-2007" is useful only because the record shares that heading with records for similar resources. If a record for a similar resource instead uses "El'cin, Boris Nikolaevič, 1931-2007" or "ЕЛЬЦИН, БОРИС НИКОЛАЕВИЧ, 1931-" [in Russian] the heading loses much of its value. There are ways around this today (the URI mentioned above), but still, we must realize that in the new information world, it will be (if it is not already) as if people can take home their own copies of the cards they want from the card catalog. And not just from my catalog, but from other catalogs all over the world. Although the information remains on the cards, once they are outside the context of their respective catalogs, much of the information (the headings and manifold relationships) loses its utility and purpose. A card on its own, or a card jumbled together with all kinds of other cards that use a whole variety of headings for Yeltsin (called mash-ups in today's jargon), is truly a strange thing.
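
To make the URI idea a little more concrete, here is a rough sketch of what decoupling the label from the collating function could look like. Everything in it is an assumption made for illustration: the URI is invented (it does not come from VIAF or any real authority file), the record layout is a plain dictionary rather than MARC, and the titles are placeholders. It only shows that cards gathered from different catalogs fall apart when grouped by heading but come back together when grouped by identifier.

```python
from collections import defaultdict

# Hypothetical identifier, invented for this sketch; a real system would
# take the URI from an authority service.
YELTSIN = "http://example.org/authority/yeltsin"

# Three "cards" taken home from three catalogs, each with its own heading form.
records = [
    {"title": "Record A", "heading": "Yeltsin, Boris Nikolayevich, 1931-2007", "uri": YELTSIN},
    {"title": "Record B", "heading": "El'cin, Boris Nikolaevič, 1931-2007", "uri": YELTSIN},
    {"title": "Record C", "heading": "ЕЛЬЦИН, БОРИС НИКОЛАЕВИЧ, 1931-", "uri": YELTSIN},
]


def collate(recs, key):
    """Group record titles by the value stored under `key`."""
    groups = defaultdict(list)
    for rec in recs:
        groups[rec[key]].append(rec["title"])
    return dict(groups)


# Collating on the text of the heading scatters the records.
by_heading = collate(records, "heading")

# Collating on the identifier brings them together; each record still
# keeps its own heading for display.
by_uri = collate(records, "uri")

print(len(by_heading), "groups by heading")
print(len(by_uri), "group by URI")
```

The headings are not thrown away here; they simply stop doing the collating work, which is the part that breaks once records leave their home catalog.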

I think that this is the world we are entering and this is the world we must design our records and systems for. A tall task indeed! But an interesting one and one that needs our expertise.

To conclude, I heartily recommend the show featuring two deeply experienced and intelligent catalogers. It was quite stimulating.

Wednesday, February 24, 2010

FW: A couple of interesting articles

Posting to NGC4LIB
I would like to point people to a couple of articles that I think are particularly interesting:

How Google's Algorithm Rules the Web / By Steven Levy, February 22, 2010, Wired March 2010
This article discusses in some depth how Google's algorithm has changed throughout the years and how it may continue to change.

It is especially interesting to read this article alongside another that documents complaints that Google is demoting its competitors' sites in its search rankings.
EU weighs Google rivals' complaints / By Robert Wielaard. The Associated Press February 24, 2010, 6:43AM ET, through BusinessWeek

I have discovered in my Information Literacy workshops that people really and truly "like" and "trust" Google, but no one can tell me precisely what they "trust" Google to do. I tell them that Google is a normal company operating in an area with very few "rules," i.e. the Internet. Libraries and librarians, on the contrary, operate very firmly within the realm of ethics, and this is one of the basic differences between libraries and search engines.

FW: RDA, AACR2 and a simple, commonsense implementation plan

Posting to Autocat

Dear Alan,

Thank you so much for your answers, but the main question still has not been addressed:

You state: "In developing RDA JSC established guiding objectives and principles, including the objective of consistency, which we defined as follows, "The data should be amenable to integration into existing databases (particularly those developed using AACR and related standards)." This consistency helps to control the burden of training and will reduce the impact of implementation on productivity. Future changes are planned to bring RDA closer into alignment with international principles and standards, but it was agreed with constituencies to take a gradual approach rather than making all of the changes at once."

While I understand and sympathize with this sentiment, I still do not see how the adoption of any RDA rule, or collection of rules, that I have seen will change the situation in a more forward looking way. Which rules will do this? Elimination of the rule of three? Changes in the abbreviations? Using "place of publication not identified"? Changing from "Selections" to "Works. Selections."? These look to me like mere changes in procedures that are neither positive nor negative. In fact, in general I do not understand how changing any cataloging rule, i.e. a guideline on input for describing and/or arranging resources for later retrieval, could have that much of an impact on our user communities. Getting rid of the rule of three will most probably have an impact on cataloging productivity, but I doubt if any patrons will even notice it because they don't understand the rule of three in the first place. Certainly they will not notice any other of the current changes, except for the administrators of our budgets who will notice when they decide whether or not to pay to retrain catalogers and rewrite the current local cataloging documentation to refer to the new rules. Plus, the administrators will have to decide whether or not to pay for online subscriptions to RDA *on an ongoing basis*.

On the other hand, sharing our data in accessible formats (i.e. non-MARC) and enabling URI linking to authority records could make a huge difference to our user community. Look what happened already with the CERN library that simply let their catalog records out and within just a couple of days, people were using them in innovative ways. Just think what people could do with our stuff! There are many things that the cataloging community could do right now that would make a difference to our patrons.

"AD: While we all look forward to the day when funding is not under threat, experience suggests it would be a mistake to sit on our hands until it arrives. The underlying business model for resource description is changing and RDA is part of the adjustment libraries are making in response to that change."

This is difficult to answer. If there simply is not enough money, there will be no choice for many libraries. They simply *cannot* adopt RDA even if they want to because the funding does not and will not exist. I personally do not see any advantage in RDA over AACR2 aside from ethereal statements such as: "JSC would contend that this is the most exciting aspect of RDA development. RDA is based on the FRBR and FRAD models with a focus towards the semantic web and its use of linked data. Aligning RDA with these models positions us to benefit from future convergence of the conceptual models for the different sectors within the domain of resource description." While I am sure you and the JSC are sincere, such a statement is exceedingly vague and unsatisfying.

I would like to close this by once again emphasizing that I have the greatest respect and appreciation for all the work everyone has done on RDA. I understand it has been a massive undertaking by some of the best minds in cataloging in the world. But at some point, if a person honestly thinks something is seriously wrong, that person has to stand up and say that we are going down the wrong road and therefore we must find another path.

Libraries need a real choice and that is the reason why some concerned librarians have put forward the Cooperative Cataloging Rules to provide an alternative that would promote consistency with current practice as well as providing a forum for continued development. For more information on the initiative, see:

Monday, February 22, 2010

RE: [RDA-L] Question about RDA relationships (App. J)

Posting to RDA-L

Daniel CannCasciato wrote:
Hal Cain wrote:

> I wonder how far OCLC will let participants go in supplying these kinds of links:

And I agree. I am not allowed to update the pcc records at this time.

I will throw a spanner in the works here and say that in the new world of shared data, it is impossible to predict where our records will show up, how they will look, how they function, and how they will be used, so it is vital that catalogers realize that it will not be catalogers and librarians who will be the ones deciding what will happen to their records. For example, if the records continue to go into Google Books as they are now, it will be Google who decides what kind of links will be allowed, not us and not OCLC. This is an example of what many are calling "losing control." (The legal decision on opening up GBS could come this week, by the way! Hold on!)

However, I do wonder how many catalogers would agree with Karen's assertion that the library concept is that metadata is a one-time creation rather than additive. I certainly don't and have advocated for the iterative process for bibliographic and authority data. As Hal identified later in his message, the "core record" is meant to be a dynamic one. The fact that the practice as yet isn't supported (logistically and administratively) is a fundamental problem for users. Some library administrators, for example, tend to view the iterative process as "tweaking" and needless, rather than inherently required. David Bade's work (and the work of others) certainly gives a strong argument for exploiting language, scholarly, and subject expertise when we can. I hope the iterative process becomes more acceptable regardless of which environment one is working from or in.

But in this new world, other information will be included. Look at the popularity of LibraryThing, which works quite differently. Here is a random record:

I think these views are some of what we need to be studying. This does not mean that we simply imitate LibraryThing or GBS, but we need to learn from their successes. The opposition of "do it once, do it right, forget it" vs. "tweaking" doesn't make a lot of sense in a world that mashes records together and opens them to general collaboration. We should remember that many more people are using LibraryThing than WorldCat, obviously because it fulfills their needs better. (LibraryThing) (WorldCat)

Libraries and their metadata need to become a meaningful part of this bigger universe of metadata. But to do this, we need to rid ourselves of a lot of the old assumptions.

Wednesday, February 17, 2010

New Journals, Free Online, Let Scholars Speak Out

Blog reply to Schmidt, Peter. New Journals, Free Online, Let Scholars Speak Out, Chronicle of Higher Education, February 14, 2010
We need to admit that peer-review is far from the panacea that many scholars want it to be. There have been plenty of examples of shoddiness getting by peer-review. Besides, new information comes up constantly and while something may have legitimately made it through the peer review process 10 or 15 years ago, it never would today. That's why in today's world, there is the possibility for a substantial improvement over traditional tools, with post peer review. Only in this way can someone know, when they are looking at a paper from 1996, that another paper was published in another journal in 2007 and effectively refuted the earlier paper.

These possibilities are relatively easy to implement today with the Web2.0 tools using forums, ratings and the like. These are the directions that would make a real difference not only to our colleagues, but to the public at large, so that everyone can see the debates, ferment, and even intellectual excitement taking place in the academy.

RE: how do i become a cataloger?

Message to Autocat
I think a lot of it depends on the way both you, and the institution you want to work for, look at the future of information retrieval. The handwriting is on the wall that the traditional library controls are slowly disintegrating, above all in the area of selection: from having to buy aggregator databases with tons of materials you would otherwise never pay for, to the possible future of Google Books and the addition of millions of ebooks. Add to this the wonderful projects of open access papers, books, journals and the lot, and selection becomes far more complex than perhaps ever before.

These changes lay corresponding pressure on the local catalogs (the next link in the chain), which are supposed to exist to give access and control to the materials "selected for the collection." Look at the outsourcing of most catalogs for the electronic journal list through resources such as Serials Solutions, along with all of those proprietary ebook databases. What will happen with Google Books? Who will use our local catalogs then? And we can't forget that Google Books is absolutely *not* the only place, or even the best place, for electronic books. Let's not forget those weird things that are not books or serials or scores or recordings, but are something new.

People and institutions can either ignore these changes or embrace them. I won't discuss ignoring them, but embracing them today as a librarian means being only a part, perhaps a small part, of any solutions. This is difficult for many to accept because earlier, if the public wanted information, they had no choice but to go to a library, and once there, they had no choice but to use the bibliographical tools we created. Those days are gone.

Yet, I think that because of our unique skills and knowledge, librarians and catalogers can be an important part of any future solutions, but it is important to act sooner rather than later, because a lot of it is like trying to change the course of an oil tanker--they can't turn on a dime and foresight is needed.

People and institutions (i.e. those people you work for) can have honest, but diametrically different opinions on these very weighty matters and it becomes very hard when people do not agree on such fundamental principles. We are in a transitional moment, and learning AACR2, MARC21 and so on may be similar to learning how to shoe a horse when automobiles first appeared. Or maybe not. Traditional cataloging skills are still needed today but who knows just 10 years from now?

I'll append part of a private message I sent to some library school students who were asking some questions about the changes from AACR2 to RDA, the consequences, the future, and so on.

One point though: don't underestimate the usefulness of your "inexperience." Approaching these issues as an interested person and looking at them through fresh eyes is extremely important, especially today. Another way of considering your situation is that you are lucky that "your mind is not warped." :-)

But seriously, the tools we make are for people exactly like you as you are *right now* (plus a library has additional internal needs). As someone on NGC4LIB mentioned recently, "maybe catalogers are not the best ones to determine how the catalog is used." I think, "Absolutely!" This is where outsiders are essential, especially reference librarians, but others as well, and why I say that the user tasks of "find/identify/select/obtain works/expressions/manifestations/items" can only have been figured out by a cataloger. I can't imagine that users would *ever* come up with that. What are our users really doing? I don't know; nobody knows right now, and research is being done.

So please: keep an open mind and question *everything* you see. And especially today, never ever accept for an answer, "That's just the way it's done."

Tuesday, February 16, 2010

RE: two nominal setbacks on the digital revolution front

Posting to Autocat

On Mon, 15 Feb 2010 09:02:36 -0600, mike tribby wrote:
>"The survey showed that newspapers continue to lead the Internet in nearly all categories for readers seeking local news and information on products being sold in their communities. But it also showed the progress made by alternate sources, with a large share of state residents preferring the Web for information on places to visit in the state and nearly as many using the

> It doesn't say that people don't use other sources of information, but it certainly doesn't sound the death knell for newspapers' usefulness frequently aired on this bandwidth either. I don't claim that it indicates the Internet and digital news sources in general are dead or doomed as a popular information source, though predictions of the imminent demise of paper and ink resources sometimes seem to be based on less factual evidence than this IMNSHO.

The issue for newspapers is money. It has been shown that the public wants their information more than ever, but they don't want to pay. Therefore, Rupert Murdoch keeps the paywall up, and the NY Times is expected to (re-)try the paywall at the beginning of next year.

Who knows if people will pay for it? There has to be the idea in people's minds that if they have a choice of sites for free (which will always happen) and others that are for pay, the pay site had better be significantly better. There have been lots of problems with the accuracy of professional news coverage in the last several years (I won't go into that), so I, at least, am skeptical.

Also, I think that more and more, people will be using tools such as Google News with mashup results. With this type of option, I don't know how many will miss the NY Times. Take, for example, the following story from Google News, with links to CNN, ABC, Reuters, plus 1,223 others.

18 killed in Belgian train crash, official says
CNN - Jessica Hartogs, Cristina Lynch - 1 hour ago
(CNN) -- At least 18 people died and 55 were hurt when two trains collided in Belgium Monday, an official told CNN, adding that the numbers may not be final.
Belgian train crash: up to 25 passengers feared dead
Belgian passenger trains in head-on collision in Halle BBC News
Reuters - Voice of America - AFP - ABC News - Wikipedia: Halle train collision
all 1,223 news articles

No NY Times. Big deal.

My concern is how we will deal with this same situation when (not if) our library metadata is mashed up in the same way.

Friday, February 12, 2010

RE: [RDA-L] RDA requirements in LMS

Excerpted from a private message concerning
My own thoughts on the future of authority control are a bit different however, and relate to why I mentioned web services in my original message. It seems to me that authority control using URIs would be absolutely perfect when used in conjunction with a web service. I do not know precisely how this would work, but now that there is the site, the VIAF, the DBpedia project, which appears to be an "authority file" based on Wikipedia, tools that I don't know about, and others that will come up as the Semantic Web is built, it could become incredibly rich.

This view means that much of authority control will be taken out of the catalog, out of the library, and even out of the library community itself. Libraries can relate to this trend (which I think is the inevitable outcome as the Semantic Web grows) in different ways: by fighting it to retain their "control" (whatever that will mean), or by accepting things and trying to fit in so as to help shape the future.

For all of these reasons, I think the open source movement is so important. The world of information is in a state of flux right now, and only *real* experimentation (i.e. trial and error) by many people will help us find solutions. Large organizations encumbered by proprietary systems that are very difficult to change may not be the appropriate places to find these solutions.

I picture the situation of the dinosaurs, who were so happy and successful for millions of years while the mammals were barely making it. When the comet hit, the dinosaurs died because they couldn't adapt, but the mammals could. It's the same situation today with the libraries, publishers and other producers of intellectual works, who did so well for so long, and then the World Wide Web struck. Everybody is in a state of panic and many (read: the publishers) are trying to keep the old ways of doing things, when it is clear those days are gone. This is when I remember the quote often attributed to Darwin: "It is not the strongest of species that survives, nor the most intelligent; it is the one that is most adaptable to change."

I really believe that this describes the situation as it is today. There is real success possible today for people and organizations who are willing to adapt.

E-Library Economics

Comment to "E-Library Economics" in Inside Higher Education at:
There are so many topics to discuss in this article that it is difficult to begin, but I want to focus on the relationship to libraries. First, I would like to point out one quote: "This is largely due to the fact that e-readers have not managed to replicate certain aspects of the traditional book-reader's user experience: 'You can do a lot with a print book: photocopy or scan as many pages as you like, scrawl in the margins, highlight passages, bookmark pages, skip around, read it in the bathtub, give it to someone else, make art out of it, etc.'"

Different groups doubtlessly will respond in different ways. The publisher will say that you have never been able to photocopy or scan as many pages as you like legally, and now with digital materials, they can finally regularize a situation that has been out of control since the introduction of the Xerox machine.

A librarian will say that you had better not scrawl in the margins, highlight passages, or dog-ear pages of a book that belongs to the library.

If I lend someone a book from my own collection, they had better not read it in the bathtub, where the humidity will warp the binding and they might drop it in as they start to fall asleep, and if they make art out of it or give it to somebody else, they had better be ready to pay to replace my copy. So a lot of these concerns deal only with personal copies, not library copies.

Also, about browsing the shelves, it must be accepted by everyone that browsing is a pleasant activity but has long since ceased to be a reliable way to find information. Almost every collection now annexes some materials and has multiple locations and/or multiple classifications; there are materials scattered in journals, conference publications and collections that are shelved far away from where you may be browsing; and there is so much worthwhile information available only electronically. Browsing the shelves therefore guarantees that you will find fewer and fewer of the materials that you need. Librarians need to be forthright in saying this, even though it may dismay many researchers. It is simply a statement of fact. The task before librarians and other information professionals is to create tools that let us navigate these resources as simply and as clearly as possible.

Friday, February 5, 2010

RE: Why are you for killing libraries?

Comment on blog posting: Why are you for killing libraries?
I think it would be naive to believe that the ebook will not have much of an effect on either bookstores or libraries, once there is an ebook reader the equivalent of the iPod, which will probably happen sooner rather than later.

I am a librarian, and although I love books, I look forward to the future. But the future holds danger. I compare the situation of libraries to the newspaper world: journalists are in danger of losing everything, and librarians face the same danger.

But this is where librarians can learn from the journalists. There is general agreement that journalists are still needed. Many of them are now beginning to differentiate the field of "journalism" from "working for a newspaper." Since newspapers are in real danger, does that mean that the field of journalism is too? How can someone survive as a journalist without having to work for a newspaper? Some are deciding that it may be possible.

In the same way, there is the field of librarianship with our ethics and values, our skills and methods, and this differs from the tasks that we have of managing a physical collection. If libraries do not survive, or remain only as museums of "physical curiosities" (which is a possibility) does it mean that it is also the death knell for librarianship as a field? I don't think so, that is, if librarians reorient themselves.

I think that librarians and journalists can survive and even thrive in the new environment, but I don't know if newspapers and libraries will survive. If not, it will be a sad time, indeed.

Still, it should be a wild ride for everybody!

RE: [RDA-L] Utility of ISBD/MARC vs. URIs (Was: Systems ...)

Bernhard Eversberg wrote:
J. McRee Elrod wrote:
>> imposes structure where it isn't helpful (e.g., where it was based on obsolete card design).
> Every word of your post rang true, until I reached that last sentence. Insofar as the old unit card structure is reflected in the choice and order of elements of the ISBD, it is *very* helpful.

Mac, I wasn't targeting ISBD here, and I'm as convinced as you are about its usefulness and importance. (We only want to get rid of punctuation at the end of subfields.)

Rather, I was getting at the innumerable rules that concern the arrangement of entries and tracings and whether or not an added entry was necessary, and how to control these things. Most of the indicators that concerned card production are not helpful any more but add to the confusion that governs opinions about MARC. Also, stuff like the omission of leading articles in uniform titles, which came into being *only* because that field lacks the indicator.


In addition, I think it's important to consider how it is best to focus our (most probably) ever decreasing resources in a truly shared, open environment. Let us just imagine for the moment, that we can get ONIX or DC copy for every single resource we catalog (that will be quite some time in the future if ever, but let's just imagine) and the cataloger updates the record. Efficiency will probably still dictate that there be copy catalogers who concentrate on the simple updates, and complex catalogers who will do more. How will it look if the copy catalogers report that for the week they have added filing indicators to 200 records and 245$b to 300 records? :-)

Joking aside, I think we have to get to the kernel of what our users need, plus I think we need to accept that once projects such as Google Books come online, fewer and fewer people will search our local catalogs separately. They will come to our catalogs (if at all) from Google Books, where they will find the full-text plus a mashup of our metadata mixed in with who knows what, to find whether a library near them has a physical copy of an item, although they will be able to read the book online. Only time can tell how long it will be before people don't care so much about the physical book. (As an aside, I just bought a Sony ebook reader, and although I am definitely a bookman, I absolutely love it! For the first time, I can actually enjoy reading a book I have taken from the web! I have shown it to people and most want one too.)

I admit this is a terrifying scenario (for me, at least), but it is one that is both logical and easy to predict. Once it is accepted however, we can begin to consider exactly what catalogers can provide our patrons that the Googles and the Yahoos cannot. I think there is an awful lot we can do and we can prove that we are still necessary.

But I don't know how much of it will resemble what we have always done. Is browsing alphabetically by title *really* so important to people that we must devote resources to do it? Would those resources be better used in adding new materials? I don't know but I have my own opinions. I think the situation is becoming so important that today we must make a case why people need something so desperately, e.g. browsing alphabetized lists of book titles, that we must devote staff time to redoing records that are otherwise correct. No longer can we rely on simply continuing current practices. Of course, this goes for all of MARC and the cataloging rules, but one must start somewhere.
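
For anyone less familiar with what a filing indicator actually buys us: in MARC 21 the second indicator of field 245 records how many characters at the start of the title to skip when filing. The sketch below is only an illustration of that one mechanism. The titles and counts are sample data chosen for the example, and the records are simple (title, count) pairs rather than real MARC records.

```python
def filing_key(title, nonfiling):
    """Sort key for a title, skipping `nonfiling` leading characters.

    `nonfiling` plays the role of the 245 second indicator: the count of
    characters (article plus following space) to ignore when filing.
    """
    return title[nonfiling:].casefold()


# Sample data invented for this sketch: (title, nonfiling character count).
titles = [
    ("The wealth of nations", 4),
    ("A tale of two cities", 2),
    ("Der Zauberberg", 4),
    ("Moby Dick", 0),
]

# Browse list that honors the indicator: leading articles are skipped.
browse_list = sorted(titles, key=lambda t: filing_key(*t))

# Browse list that ignores it: every title files under its first word.
naive_list = sorted(titles, key=lambda t: t[0].casefold())

for title, _ in browse_list:
    print(title)
```

The count has to be set by hand on each record, which is exactly the kind of staff time the question above is about.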

Tuesday, February 2, 2010

FW: [RDA-L] Systems v Cataloging was: RDA and granularity

Bernhard Eversberg wrote:

Some metadata creators are inclined to follow no rules except their own, not disclosing what these are. But OK, we should not be pointing fingers at them but try very hard to make sense of everything they might come up with, creating a grand mashup (resisted to write hotchpotch.)

If that is so, and if metadata creators are not interested in getting the most out of our stuff either, why do we keep following extremely complex rules requiring innumerable elements? Dumb down RDA and MARC so we have only one element for keyword indexable text, and a few indispensable codes and dates. Wouldn't that immensely ease the job of creating the mashup? After all, what more is Google doing, and who except us is saying that's not good enough?

I think that each group sincerely believes its own standard to be better than anyone else's. (I believe it!) So long as everyone holds onto such ideas, there can be no change and the result will be that a separate metadata record will forever be made and remade by each metadata community (or when taken to a reductio ad absurdum, even each library/bibliographical agency). This is the situation as it has always been, but before the WWW it was practically impossible to know about and share records with all of the other bibliographic agencies. Those difficulties have now been overcome. This situation becomes uncomfortable however, since earlier, while we honestly could not see the records produced by others, today we either have to pretend not to see them or willfully ignore them. This results in a situation that I don't believe serves anybody very well.

The practice of cataloging is based on the principle of "consistency" which can turn cataloging into the most conservative of endeavors. By following the principle of consistency, catalogers ensure that the records they make today must work with the older records, some of them made 100 or more years ago. If you don't keep this in mind, the result can be hiding the previous records or at least making those earlier records incomprehensible. Of course, lots of practices have changed tremendously, but the basic idea is for everything to work together. Can the principle of consistency be retained in an open, shared, cooperative environment? I think it can.

Perhaps I'm a dreamer, but since it seems as if the general public wants reliable metadata (ref. the Language Log discussion about the metadata in Google Books), I still think that it's not too late, so long as catalogers are willing to adapt to some different practices. If we could simply get the rules pertaining to each separate bit of metadata, e.g. these page numbers follow the rules of the FAO of the UN, or those of CERN, AACR2, Dewey, etc., it could go a long way toward making the information more understandable.

I emphasize that this would be for librarians, who need this level of detail for their work of maintaining the collection, and not for users, who rarely need anything like this.
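
The idea of attaching the governing rules to each separate bit of metadata can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Element` class, its field names, and the rule labels are hypothetical and do not come from MARC, RDA, or any existing system, and the sample values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One bit of metadata plus the rules it was recorded under (hypothetical model)."""
    name: str   # what the element describes, e.g. "extent"
    value: str  # the text as it appears in the record
    rules: str  # illustrative label for the standard the value follows


def rules_used(record):
    """Return the distinct rule labels found in a record, sorted alphabetically."""
    return sorted({element.rules for element in record})


def describe(element):
    """Format one element so that a librarian can see which rules it follows."""
    return f"{element.name}: {element.value} [rules: {element.rules}]"


# A small "mashup" of elements created under different standards.
# The values and labels are illustrative only.
mashup = [
    Element("extent", "xx, 277 p.", rules="AACR2"),
    Element("extent", "277 p", rules="CERN"),
    Element("subject", "Python (Computer program language)", rules="LCSH"),
]

for element in mashup:
    print(describe(element))
print("Rules present:", ", ".join(rules_used(mashup)))
```

With a label like this on every element, two paging statements that look contradictory can at least be read against the right rules, which is the level of detail a librarian would need and a user would not.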

[RDA-L] Systems v Cataloging was: RDA and granularity

Daniel CannCasciato wrote:

Karen Coyle wrote in part:

" all of the needs are user needs . . . "


Pardons, but this is not correct. If we are to manage the collection (whatever "the collection" happens to be), we will need tools, and some of these tools will be designed for library use and not for the users.

There's nothing strange about this: for example, there are many things on an automobile that the general public does not need to understand in order to drive the car safely and correctly. Still, just because I do not understand them, I do not conclude that they are unnecessary. Some of the things may be there for no other reason than to make it easier (and cheaper) for the mechanics to do maintenance. Good! If I insist on knowing what all of these strange things are, I can learn what they are there for, but it is highly presumptuous to conclude that they are unnecessary.

For this reason, something like the number of pages is useful and vital primarily for librarians to manage a collection. What do I mean by this? If a selector is deciding whether to buy a copy of a certain text, e.g. yet another copy of Romeo and Juliet, he or she first needs to know if there is already a copy in the collection. The paging must describe the item well enough so that the selector does not have to march into the stacks to check how many pages the item *really* has. If the selector ends up buying an additional copy of something already in the collection, everybody gets mad because of the waste of money, staff time, and shelf space. But very few patrons, i.e. only the extreme specialists of our general reading public, really care much about how many pages something has.

There are many other areas of the record like this: the publishing/copyright/printing date(s), statement of responsibility, series statement, arguably the series tracing, many of the notes, and so on.

The traditional catalog serves many functions for many people, and one of the primary functions is as an inventory tool. It remains to be seen whether e.g. the incredibly complex system of subject headings is there for users, or more for librarians to ensure reliable retrieval.

In today's mashup world, where all kinds of metadata will be thrown together in ways we cannot predict, it is our task to figure out some way to have all of this make sense. See for example, the current thread in the NGC4LIB list about CERN making their bibliographic data open, which is non-ISBD. I am sure that other libraries will follow and Anglo-American libraries eventually will be forced to do the same. Sooner or later, our metadata, based on different standards, will *HAVE* to interoperate with CERN's metadata, and many other standards.

But let's face it: this is what is happening in our catalogs right now, since they contain various bibliographic standards other than the current flavor of AACR2. Our catalogs have always managed to contain AACR2, AACR1, non-ISBD, Cutter rules, Dewey rules, ALA rules, and on and on. If RDA is implemented, there is yet another standard.

Looked at in this way, the new environment may not be all that much different from what we have today.

Again, I think these are the directions we should take instead of coming up with yet another new set of rules that few metadata creators will follow.

Monday, February 1, 2010

FW: [NGC4LIB] The CERN Library publishes its book catalog as Open Data

Concerning The CERN Library publishing its book catalog as Open Data
The whole dataset can be downloaded from
This is really great and I hope that other libraries will follow.

*But* the question will be how to incorporate all of this together in a coherent way. The standards of CERN are quite different from Anglo-American standards. Below is a record taken at random, with the record of the same item in LC. After a quick look, I see that in CERN there is no size; in the LC record the place of publication reflects AACR2 practice of adding a place within the country of the cataloging agency; there are differences in the date of publication vs. date of copyright; no statement of responsibility and no edition statement in the CERN record; the paging itself is different. These last are important for AACR2's determination of copy vs. new edition. CERN's subjects reflect their narrower collecting focus vs. LCSH's broader focus, e.g. "Python" vs. "Python (Computer program language)." Noel Rappin's name does not have the date of birth as occurs in the NAF. There are several other differences, including some of differing cataloging philosophies.
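
To make these differences concrete, here is a minimal Python sketch of a field-by-field comparison. It is a sketch under assumptions, not a real matching algorithm: the two dictionaries hold a handful of fields transcribed by hand from the two records in this post, with subfields simply joined by spaces, and the `compare` function is a hypothetical helper.

```python
# A few fields from the CERN record, keyed by MARC tag (subfields joined by spaces).
cern = {
    "100": "Pedroni, Samuele",
    "245": "Jython Essentials",
    "260": "Beijing O'Reilly 2002",
    "300": "277 p",
    "700": "Rappin, Noel",
}

# The same fields from the LC record, plus the edition statement CERN lacks.
lc = {
    "100": "Pedroni, Samuele.",
    "245": "Jython essentials / Samuele Pedroni and Noel Rappin ; foreword by Jim Hugunin.",
    "250": "1st ed.",
    "260": "Beijing ; Sebastopol, CA : O'Reilly, c2002.",
    "300": "xx, 277 p. : ill. ; 23 cm.",
    "700": "Rappin, Noel, 1971-",
}


def compare(a, b):
    """Return (tags only in a, tags only in b, shared tags whose values differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(tag for tag in set(a) & set(b) if a[tag] != b[tag])
    return only_a, only_b, differing


only_cern, only_lc, differing = compare(cern, lc)
print("Only in CERN:", only_cern)
print("Only in LC:", only_lc)
print("Present in both but different:", differing)
```

A naive string comparison like this flags every shared field as different, even the main entry, where the only difference is the final period. That is exactly the difficulty: deciding which differences matter requires knowing the rules behind each record.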

None of this is to find fault, but rather, while the sharing is great, that is only a first step. How can we use these records in the best, most efficient way for our own purposes and for our users? Of course, some of these problems can be solved with URIs, but I don't believe everything can. Do we just settle for a mashup or can we do something else?

Jim Weinheimer

<controlfield tag="001">984645</controlfield>
<controlfield tag="005">20071109101316.0</controlfield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">0596002475</subfield>
<subfield code="u">print version, paperback</subfield>
</datafield>
<datafield tag="020" ind1=" " ind2=" ">
<subfield code="a">9780596002475</subfield>
<subfield code="u">print version, paperback</subfield>
</datafield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">eng</subfield>
</datafield>
<datafield tag="080" ind1=" " ind2=" ">
<subfield code="a">004.438.Jython</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Pedroni, Samuele</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Jython</subfield>
<subfield code="b">Essentials</subfield>
</datafield>
<datafield tag="246" ind1=" " ind2=" ">
<subfield code="a">Rapid Scripting in Java</subfield>
<subfield code="i">Cover title</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="a">Beijing</subfield>
<subfield code="b">O'Reilly</subfield>
<subfield code="c">2002</subfield>
</datafield>
<datafield tag="300" ind1=" " ind2=" ">
<subfield code="a">277 p</subfield>
</datafield>
<datafield tag="490" ind1=" " ind2=" ">
<subfield code="a">O'Reilly &amp; Asociates books</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="2">SzGeCERN</subfield>
<subfield code="a">Computing and Computers</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Jython</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Java</subfield>
</datafield>
<datafield tag="653" ind1="1" ind2=" ">
<subfield code="9">CERN</subfield>
<subfield code="a">Python</subfield>
</datafield>
<datafield tag="690" ind1="C" ind2=" ">
<subfield code="a">BOOK</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Rappin, Noel</subfield>
</datafield>
<datafield tag="916" ind1=" " ind2=" ">
<subfield code="d">200609</subfield>
<subfield code="s">h</subfield>
<subfield code="w">200638</subfield>
</datafield>
<datafield tag="960" ind1=" " ind2=" ">
<subfield code="a">21</subfield>
</datafield>
<datafield tag="961" ind1=" " ind2=" ">
<subfield code="c">20080407</subfield>
<subfield code="h">2044</subfield>
<subfield code="l">CER01</subfield>
<subfield code="x">20060920</subfield>
</datafield>
<datafield tag="963" ind1=" " ind2=" ">
<subfield code="a">PUBLIC</subfield>
</datafield>
<datafield tag="970" ind1=" " ind2=" ">
<subfield code="a">002647668CER</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">BOOK</subfield>
</datafield>
<datafield tag="964" ind1=" " ind2=" ">
<subfield code="a">0001</subfield>
</datafield>

LC Control No.: 2003266066
LCCN Permalink:
000 01438cam a22003494a 450
001 13108602
005 20090729142230.0
008 030303s2002 ch a b 001 0 eng
010 __ |a 2003266066
015 __ |a GBA2-Y6751
020 __ |a 0596002475
035 __ |a (OCoLC)ocm49044531
040 __ |a UKM |c UKM |d CUS |d TXA |d CUY |d DAY |d DLC
042 __ |a pcc
050 00 |a QA76.73.J38 |b P43 2002
082 04 |a 005.133 |2 21
100 1_ |a Pedroni, Samuele.
245 10 |a Jython essentials / |c Samuele Pedroni and Noel Rappin ; foreword by Jim Hugunin.
250 __ |a 1st ed.
260 __ |a Beijing ; |a Sebastopol, CA : |b O'Reilly, |c c2002.
300 __ |a xx, 277 p. : |b ill. ; |c 23 cm.
500 __ |a "Rapid scripting in Java"--Cover.
504 __ |a Includes bibliographical references (p. xvi-xvii) and index.
650 _0 |a Java (Computer program language)
650 _0 |a Jython (Computer program language)
650 _0 |a Python (Computer program language)
700 1_ |a Rappin, Noel, |d 1971-
856 42 |3 Publisher description |u
856 42 |3 Contributor biographical information |u
906 __ |a 7 |b cbc |c pccadap |d 2 |e ncip |f 20 |g y-gencatlg
925 0_ |a acquire |b 2 shelf copy |x policy default
955 __ |a ps05 2003-03-03 to ASCD |c jf05 2003-03-11 to subj. |d jf09 2003-03-11 to sl |e jf12 2003-03-12 to Dewey |a jf16 2003-07-11 copy2 to BCCD