Tuesday, November 30, 2010

Imagining different types of standards (Was: Statement Naming More Than One Person, Etc.: Mark of omission before supplied information)

Posting to RDA-L

Brenndorfer, Thomas wrote:
Perhaps it would have been better to use an example from Codex Alimentarius that resembled the textual properties displayed on bibliographic resources, which catalogers must take into account in assisting people in identifying those resources. The General Standard for the Labelling of Prepackaged Foods (http://www.codexalimentarius.net/download/standards/32/CXS_001e.pdf) prescribes a series of instructions for recording the name of the food that is no less onerous than the rules for bibliographic description in libraries:

4.1 The name of the food
4.1.1 The name shall indicate the true nature of the food and normally be specific and not generic: Where a name or names have been established for a food in a Codex standard, at least one of these names shall be used. In other cases, the name prescribed by national legislation shall be used. In the absence of any such name, either a common or usual name existing by common usage as an appropriate descriptive term which was not misleading or confusing to the consumer shall be used.…

Thanks for pointing that out. This is a much better example of what I have in mind. For example, I can imagine that determining a *precise form* of a named entity may become less important as URIs begin to be implemented and displays of names become more fluid. Still, I can imagine a highly predictable form that would, in a sense, "guarantee" access to the name for librarians; in other words, an "expert form" of the name that could continue current AACR2-type practices more or less.

Of course, the same methods could work for subjects as well, and perhaps better. So, if we have a form of subject that really no one would ever think of, e.g. "Byron, George Gordon Byron, Baron, 1788-1824--Homes and haunts--England--London", this would not necessarily be the first thing displayed and it could be something more like "Lord Byron and British pubs"!
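The separation of a stable "expert form" from friendlier display forms could be sketched as a simple lookup keyed on a URI. Everything below is hypothetical: the URI, the audience labels, and the heading strings are illustrative only, not any actual authority service.

```python
# Hypothetical sketch: one identifier, many display forms.
# The URI, audience labels, and structure below are invented for illustration.

AUTHORITY = {
    "http://example.org/auth/byron-homes-london": {
        # The stable "expert form" that guarantees collocation for librarians
        "expert_form": ("Byron, George Gordon Byron, Baron, 1788-1824"
                        "--Homes and haunts--England--London"),
        # Friendlier labels a catalog could show to different audiences
        "display_forms": {
            "public": "Lord Byron and British pubs",
        },
    }
}

def label_for(uri: str, audience: str = "public") -> str:
    """Return a display label for a heading URI, falling back to the expert form."""
    entry = AUTHORITY[uri]
    return entry["display_forms"].get(audience, entry["expert_form"])

print(label_for("http://example.org/auth/byron-homes-london"))
# Lord Byron and British pubs
print(label_for("http://example.org/auth/byron-homes-london", "cataloger"))
# Byron, George Gordon Byron, Baron, 1788-1824--Homes and haunts--England--London
```

The point of the sketch is only that once the URI carries the identity, the string shown to the user becomes a presentation choice rather than the access point itself.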

RE: Copyright vs. publication dates

Posting to Autocat

Marian Veld wrote:
On Thu, Nov 25, 2010 at 4:53 PM, James Weinheimer wrote:
>I repeat, the changes we institute should have practical goals, not abstract, theoretical goals. We no longer have the luxury for theory.
I couldn't disagree more. Without theory we find ourselves adrift on a sea of changes with no idea of how to respond to them. We find ourselves with a cobbled-together system based on reactions to problems or perceived problems, but no concept of how things fit together. We end up with conflicts in our data collection and recording practices because we don't have a rigorous conceptual basis for making decisions. In short, we end up with the catalogs that we now have. I personally like FRBR, but I will accept that other conceptual bases are possible and could even be better than FRBR. But to try to build a complex structure like a library catalog without theory is like trying to build a computer without understanding electronics.
The emphasis in that sentence, as in my entire posting, was on the goals. Thanks for pointing out this lapse--I need to tighten up my writing. The goals of RDA and FRBR, along with their business model, are still wholly abstract and theoretical. As a result, I was trying to point out that it does not make sense to risk so many of our ever-[sharply!]-diminishing resources on retraining and retooling for the sake of vague, abstract, theoretical goals. Especially today, our goals should be practical, just as when everything changed from AACR1 to AACR2: that was for the very practical purpose of getting additional cataloging copy, because the entire Anglo-American world could begin to cooperate not only in the area of bibliographic description, but in the headings as well. The risks of change were certainly there, but there were definite and tangible rewards offered.

I can see the risks today with RDA very clearly, but I still haven't seen what the rewards are yet, except we are supposed to have faith that it is "the wave of the future". I have lost the faith.

RE: Seeking a Web-based FRBR Catalog (catalogue)

Posting to RDA-L

Mike McReynolds wrote:
I've been seeking examples of FRBR catalogs on the Web to point to as examples. Despite searching the RDA-L archives, library literature, the IFLA Web site and Google, I've not been able to locate a single example of a FRBR catalog. This would be helpful to justify the amount of time I've already devoted to modifying our cataloging software to simply accept RDA records imported from OCLC and then the amount of time I will spend re-learning cataloging.
The problem with finding a genuine FRBR catalog is that it exists only in theory: for a true FRBR catalog to exist, you need another structure underlying the edifice, one based on the FRBR entity/attribute model, and nothing like that exists yet (that I know of anyway). For that to happen, we need a complete change in MARC format (which was created to exchange information on separate cards, i.e. complete information for each manifestation or edition), plus we would need changes in rules, to ensure that the information required in each entity is there, e.g. that the work record has the required information for all the relevant authors and subjects, that the expression record has the information for editors and versions, etc. etc. To create such a structure will require quite literally a sea change in how every cataloger works, and more importantly, how they think. Naturally, there would be tremendous concerns over retrospective conversion; otherwise we risk making everything we have now more or less obsolete.
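The entity structure described above, where a work record carries the authors and subjects, an expression record the editors and versions, and a manifestation record the edition-level facts, can be sketched roughly as linked records. This is only an illustrative reading of the FRBR Group 1 chain; the attribute names are simplified guesses, not the official FRBR attribute set.

```python
from dataclasses import dataclass, field
from typing import List

# A minimal, illustrative sketch of the FRBR Group 1 entity chain.
# Attribute names are simplified for the example, not the official set.

@dataclass
class Work:
    title: str
    authors: List[str] = field(default_factory=list)   # belongs at the work level
    subjects: List[str] = field(default_factory=list)

@dataclass
class Expression:
    work: Work
    language: str
    editors: List[str] = field(default_factory=list)   # versions/editors live here

@dataclass
class Manifestation:
    expression: Expression
    publisher: str
    date: str

# Each "edition" becomes a manifestation hanging off a shared expression and
# work, instead of a flat MARC record that repeats everything for each card.
war = Work("War and peace", authors=["Tolstoy, Leo"], subjects=["Russia--History"])
english = Expression(war, language="eng", editors=["Maude, Aylmer"])
m1 = Manifestation(english, publisher="Oxford University Press", date="2010")
m2 = Manifestation(english, publisher="Penguin", date="2005")
assert m1.expression.work is m2.expression.work  # both collocate under one work
```

The contrast with MARC is visible even in this toy: the author is recorded once on the shared work, whereas a card-derived format repeats it in every manifestation-level record.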

In the meantime there are some projects that attempt to replicate the experience of an FRBR catalog, and the others have suggested several excellent ones. I personally like the example at http://zoeken.bibliotheek.be. Such projects are incredibly useful since they demonstrate that there is a lot we can do with the records we have right now, and these projects by no means exhaust the possibilities. I think it would be wise to take a step back and, using these projects which simulate a genuine FRBR tool, to ask seriously: would building a genuine FRBR sort of tool really provide our patrons with what they want or need? Does an FRBR tool answer the real-life questions our public brings to the catalog? Is it best, in these exceedingly trying financial conditions, to redo everything to build a tool that people *may not* find particularly useful?

I am as yet unaware of any user studies along these lines in relation to FRBR/RDA, but there are many studies of users, how they search for information and what they expect from it, from other viewpoints. Two of the latest are at: http://www.libraryjournal.com/lj/communityacademiclibraries/887740-419/discovery_face-off_draws_a_crowd.html.csp (the Charleston Conference. I only read the LJ account, but I just discovered that some of the presentations are up at http://www.slideshare.net/event/2010-charleston-conference) and Project Information Literacy’s report at: http://projectinfolit.org/pdfs/PIL_Fall2010_Survey_FullReport1.pdf There are many other highly useful studies however, some of the most interesting coming from “library anthropologists”(!).

Sunday, November 28, 2010

RE: Statement Naming More Than One Person, Etc.: Mark of omission before supplied information

Posting to RDA-L

J. McRee Elrod wrote:
Mark Ehlert said:
>(Something to fall back on when the RDA text is wishy-washy--which says something about the RDA text as is stands now.)
The end result will be increased variation in practice among those creating bibliographic records.
Although I am a fervent believer in consistency, I believe that the future of bibliographic standards will come to resemble other standards, e.g. standards for food. As an example, you can look at the standards of the Codex Alimentarius and how they work: http://www.codexalimentarius.net/web/standard_list.jsp
If you look at almost any standard (for example, the following is taken from the one for honey), we see provisions such as:
(a) Honeys not listed below - not more than 20%
(b) Heather honey (Calluna) - not more than 23%

3.5.2 Sucrose Content
(a) Honey not listed below - not more than 5 g/100g
(b) Alfalfa (Medicago sativa), Citrus spp., False Acacia (Robinia pseudoacacia), French Honeysuckle (Hedysarum), Menzies Banksia (Banksia menziesii), Red Gum (Eucalyptus camaldulensis), Leatherwood (Eucryphia lucida), Eucryphia milligani - not more than 10 g/100g
(c) Lavender (Lavandula spp), Borage (Borago officinalis) - not more than 15 g/100g
I freely confess that I do not understand the first thing about making honey, so all of this means nothing to me, but I accept that to experts it means something very specific and is very important. And as a consequence, everybody who cares about honey actually cares about these standards, although the vast majority of people who eat honey don't even know these standards exist and even fewer have read them. We can also see from just these little examples that food standards are almost always minimums and not maximums, i.e. they allow plenty of room for additional quality but certain minimums are guaranteed. I think there is a lot we can all learn from such standards.

So, I think that as future bibliographic standards evolve, they will become guidelines for minimums, and not how they are now: "thou shalt transcribe the statement of responsibility from precisely these sources of information using precisely these methods".

Exactly how these new types of standards will work in practice, I cannot very well imagine at this point, but it seems something like this may be the only way to ensure some level of reliability that different bibliographic agencies can achieve. We have to face facts: it is becoming ever more essential that libraries and library catalogers get all the help they can. This will mean real and true cooperation with other relevant bibliographic agencies. This was never possible before, but today, using modern technology, cooperation on a previously unimaginable level is available. This will mean, however, fundamental changes for absolutely *everyone* involved, not least of all libraries. Based on the development of standards in other areas, perhaps determining minimal levels is a more profitable way to go than the traditional library method of: everyone will do *this* in precisely *these ways*. One possible consequence is a lack of consistency, and this must be dealt with in some way. Right now, I don't know how it could be done.
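The idea of a standard as a guaranteed minimum rather than a prescribed maximum, as in the honey example, can be sketched as a simple checklist: a record passes if the required elements are present, and any amount of richer data beyond that is welcome. The element names here are invented for illustration; they are not from any actual bibliographic standard.

```python
# Illustrative only: a "minimum standard" as a checklist of required elements,
# leaving agencies free to add richer data. The element names are invented.

MINIMUM_ELEMENTS = {"title", "date", "identifier"}

def meets_minimum(record: dict) -> bool:
    """A record passes if every required element is present and non-empty."""
    return all(record.get(el) for el in MINIMUM_ELEMENTS)

sparse = {"title": "Honey production"}                       # fails: no date, no id
fuller = {"title": "Honey production", "date": "2010",
          "identifier": "isbn:9781234567890",
          "notes": "Any amount of extra detail is welcome"}  # passes
print(meets_minimum(sparse), meets_minimum(fuller))  # False True
```

Like the honey standard, nothing here dictates *how* an agency records its extra detail; the floor is guaranteed and the ceiling is open, which is exactly where the consistency question raised above would have to be worked out.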

Incredible changes are happening now anyway, and apparently more will come very soon. Here is a recent article from the Guardian that describes a bit of what our British colleagues may be seeing. http://www.guardian.co.uk/books/2010/nov/22/library-cuts-leading-authors-condemn
"Writers Philip Pullman, Kate Mosse and Will Self have criticised government cuts that could see up to a quarter of librarians lose their jobs over the next year. Widespread library closures are expected as councils cut their services and look to volunteers in an attempt to balance budgets hit by the coalition's spending review."
Profound changes are happening to the profession right now and practical methods must be taken to deal with them.

Friday, November 26, 2010

Wordle for First Thus

Some time ago I discovered the site Wordle, which creates word clouds from any text or from web sites you want. It works like normal word clouds (as if word clouds can be normal!): larger words are used more often, while the arrangement is left up to the idiosyncrasies of the computer program. You have some control over the display, but not a lot.
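The "larger words are used more often" behavior boils down to a frequency count. Here is a rough sketch of that counting step, not Wordle's actual algorithm: tokenize, drop common stopwords, count, then scale each word's size by its frequency. The stopword list and sample text are made up.

```python
import re
from collections import Counter

# A rough sketch of the counting behind a word cloud: tokenize, drop
# stopwords, count, then let a word's display size grow with its count.

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "that", "it"}

def word_weights(text: str, top: int = 5):
    """Return the most frequent non-stopwords with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)

sample = "The catalog records the record; catalog records help users find records."
for word, n in word_weights(sample):
    print(word, "*" * n)   # crude "size": more asterisks for more occurrences
```

A real renderer like Wordle then packs the scaled words into the display area, which is where the idiosyncratic arrangement comes from.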

I have played with it privately, mainly with the purpose of self-criticism in mind, to try to get a different sense of what I am writing. Of course, these images include text of whatever I happen to be quoting.

I have found it enlightening to put in some text to see what happens. I thought I would share my latest one, based on the latest RSS feed of my blog.

RE: Copyright vs. publication dates

Posting to Autocat

On Thu, 25 Nov 2010 13:56:59 -0500, Brenndorfer, Thomas wrote:
[Mac wrote:]
>> >It's $c [2009] or $c [not after 2009]
>> How is the redundant: 260$c[2009], c2009, or $c[2009],$g2009, or all three, helpful to anyone?
Because they draw attention to the fact that different data elements are referred to. How hard is it to get people to see that the first element after 260 $c is ALWAYS the date of publication or the probable date of publication? Otherwise we rely on people inferring "Oh, a copyright date by itself. I guess the cataloger made the decision to double up its meaning for both Date of publication and Date of copyright." Quite frankly, I've found some staff and catalog users mystified by the presence of such marked-up data when the expectation is that it's only one kind of data, not one crammed with several.
I have found over the years lots of staff and many more users mystified by all kinds of things in the catalog, but we sure don't do much about them. How about the inane heading "Tolstoy, Leo, graf, 1828-1910" which mixes an English form of his name with his Russian title instead of Count? I've had lots of questions of that sort, and nobody likes the answer. The rule about separate headings for different pseudonyms has made lots of people mad. ("You mean I have to look him up under *all* those forms of name?!" Yes.) How about us not making separate records for items with different ISBNs? The use of "1 v. (various pagings)" is a big one. Subject headings are particularly weird for lots of patrons. The list could expand for a long time.

What I am getting at is just because people don't understand something in the catalog is no reason to consider it a problem. I don't understand lots of things on a car, my tv set, my dvd player, my computer and lots of things at work, but I don't worry about them. If I wanted to learn about them, I know that I could, but it's not that important to me to put out the effort. This happens to every single person in the world every single day. So I ask: why should it be different for a catalog record? Let's be realistic about it: most users don't have the slightest idea how a library catalog works anyway. Understanding a date is the least of their problems!

In any case, I cannot imagine that the proposed RDA practice that repeats redundant information, e.g. [2009], c2009; or 2009, 2009, can possibly be less mystifying to users than what we have now.

We can reform the dates in the catalog record, that would be fine, but it should be for practical purposes (as I mentioned concerning its possible use for cataloging simplification) and we should not delude ourselves that this will be any clearer for our users, because it simply won't be. A lot of the information in a catalog record is not for the user, but for librarians to manage the collection. There is nothing at all strange or wrong with this and as I mentioned above, we run into this same practice in other professions dozens or even hundreds of times every day in our lives, and we forget it immediately. Our users are no different from us: they see lots of things in a library they do not understand. Some of those many things are in catalog records. Big deal. Most of them don't even notice, and those very few that it bothers can ask.

I repeat, the changes we institute should have practical goals, not abstract, theoretical goals. We no longer have the luxury for theory.

Thursday, November 25, 2010

RE: More granularity if imprint year coding?

Posting to RDA-L

Hal Cain wrote:
Quoting Deborah Fritz <deborah@MARCOFQUALITY.COM>:
> I think that what John actually said was "and *not just* with regard to the 260 field", my emphasis added, i.e., plans are afoot for adding granularity to the 260  *and* other fields.
> Which is certainly good news-for however long we are going to continue to use MARC for RDA.
Which for some will be a long time, I think, seeing how many smaller libraries I know that have little or no prospect of getting funding for replacing their existing MARC systems. On the other hand, some will need specialist help to rejig their MARC mapping to accommodate RDA records, but that will come rather cheaper than system replacement. It would be a service to us all to be able to incorporate new MARC subfielding (such as in 260) in one operation.
I agree with Hal on this: any changes will take an awful long time to percolate through the system. The purpose of my original post on this topic was to point out the difficulties of everyone agreeing that "this particular item I am looking at" is the same as "this other particular item I am looking at". In other words, I was trying to point out the real difficulty of determining what is a manifestation. It is only a matter of *definition*, and different bibliographic universes will define their equivalent of a "manifestation" in different ways, and not only that, each individual cataloger/metadata creator who works within a separate bibliographic universe--all of whom may be highly experienced and knowledgeable--will also interpret things in their own ways. I cannot imagine that another bibliographic universe (e.g. publishers, rare book dealers, etc.) will change everything they do simply because our bibliographic universe changes our definition of what is a manifestation. After all, we wouldn't change for them.

If something that should be one of the simplest aspects of cataloging turns out to be so difficult to reconcile in practice ("This is--or is not--a copy of that"), then how in the world does that leave us with any hope at all to reach agreement on expression and work, which I don't think anyone maintains are "simpler" in any way at all? Finally, our records can no longer be considered separately from other records in different bibliographic universes out there, and they *will* (not must) interoperate all together somehow!

Understand my despair?

So, my concern is not so much that we need additional subfields (although Jonathan is absolutely right about systems needing them), because additional subfields necessarily increase complexity. Greater complexity should be avoided because it takes more time, and catalogers need to be trained to input information consistently; otherwise we get hash. Just adding a bunch of subfields that are misused serves no purpose. Nevertheless, in certain *rare* cases, adding subfields may actually *simplify* catalogers' work, and in my experience, 260$c may be an example of one of those cases.

Or maybe not. I think it should be considered, but practical considerations (i.e. simplification) need to take precedence.

Wednesday, November 24, 2010

RE: [RDA-L] More granularity if imprint year coding?

Posting to RDA-L

J. McRee Elrod wrote:
James Weinheimer suggested on Autocat:
>"I believe the basic problem lies in the "260 $c - Date of publication, distribution, etc." We are simply putting far too much information in this [sub]field ..."
Subfield code 260$d is available for copyright year as opposed to publication year. But whether separate subfield coding would be an advantage to patrons depends on how it is applied in RDA.

My suggestion for adding more subfields for additional types of dates is not for the purpose of the patrons at all, but I was thinking of the needs of the (gasp!) cataloger, for the purpose of making cataloging easier. Let me explain:

There are just too many types of dates attached to a resource: date of creation, date of issue, date of publication, date of update, date of research, date of... the list can go on and on, especially as we address the multiplicity of web resources. In my own experience, figuring out which date(s) to add and which date(s) to ignore, in addition to learning or teaching the intricacies of it all, has been a real pain, and ultimately it hasn't worked very well. Added onto this, the expectation that everyone will decide matters in the same way(!) is, in my opinion, completely unrealistic. For example, one thing I have seen many times in relatively simple book cataloging--from LC copy, too--is mistaking the printing date for the publication date, but there are many more problems than this, as the apparently simple example on Autocat showed. What will happen when it really gets tough?

Looking around for a "sustainable solution", it occurred to me that the problem is that we are shoehorning too much into the single date $c. I am sure we can all point to major problems today, and especially as we include the records of other bibliographic agencies in the mix (at least I hope we get some help somewhere!), most probably with all kinds of different practices, it is clear that maintaining the current situation, which doesn't work very well, is unsustainable. There needs to be a change, and if nothing else, there will be eventually, when all of our records are put through the "metadata mash-up blender" in Google Books, as I personally have no doubt will happen sooner or later.

Concerning display, that can be handled in different ways. It could be handled much as the 6xx xyzv subfields are displayed, i.e. without any differentiation. If people are really concerned, perhaps an onmouseover event could be employed that would display an explanation of each date subfield, but that is overkill, I think.

Finally, my own opinion on an item that arrives today with a 2011 copyright on it: assuming that the item was actually manufactured in 2010 is unwarranted. It could have been created in 2005 or 1930 for all I know. The 2010 date is the date of accession, and/or the date of cataloging. That is useful information, too.

Tuesday, November 23, 2010

RE: Copyright vs. publica. dates

Posting to Autocat concerning the thread Copyright vs. publica. dates

This is an excellent example of the mind-twisting problem of determining what is a manifestation. For this one item--and I am assuming everyone is transcribing what they see on exact copies--we have already seen a whole raft of methods. Added onto this in the "macro-world" of metadata, we still need to note the Amazon method: http://www.amazon.com/Civil-Rights-1964-Landmark-Legislation/dp/1608700402
"Publisher: Marshall Cavendish Children's Books; 1 edition (September 2010)"

and Google Books:

and Open Library:
Published 2010 by Marshall Cavendish Benchmark in New York
(same in Worldcat: http://www.worldcat.org/title/civil-rights-act-of-1964/oclc/430736527)

I am sure there are lots of other variants out there.

Unfortunately, nowhere have I managed to see either the t.p. or the sacred "t.p. verso", but from this seemingly simple example, we can see clearly how there are several ways of describing the same item.

One way or another, these records will be mashed/forced/lumped together. Otherwise, the amount of duplicated work is simply intolerable plus the final product is practically incoherent for anybody searching and using these records.
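Any such mashing of records has to start by deciding when two descriptions refer to the same item. Here is a deliberately naive sketch of that matching step: normalize a couple of fields and compare. Real matching at an aggregator is far more sophisticated; the records below merely paraphrase the Amazon and Open Library descriptions quoted above.

```python
import re

# A naive sketch of record matching: normalize title and year, then compare.
# Real dedup pipelines use many more signals; this only illustrates the idea.

def normalize(rec: dict):
    """Reduce a record to a comparable (title, year) pair."""
    title = re.sub(r"[^a-z0-9 ]", "", rec["title"].lower()).strip()
    year = re.search(r"\d{4}", rec.get("date", ""))
    return title, (year.group() if year else "")

def probably_same(a: dict, b: dict) -> bool:
    ta, ya = normalize(a)
    tb, yb = normalize(b)
    # Same year, and one normalized title contains the other
    return ya == yb and (ta in tb or tb in ta)

amazon = {"title": "The Civil Rights Act of 1964", "date": "September 2010"}
openlib = {"title": "Civil Rights Act of 1964.", "date": "2010"}
print(probably_same(amazon, openlib))  # True
```

Even this toy shows where the trouble starts: a reprint date, a copyright-only date, or a differently transcribed title breaks the match, which is exactly why different agencies' records for "the same" item end up as duplicates.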

The mashing of all of these records will be done with the cooperation of libraries or without it. My personal belief is that this will wind up being decided by the people at Google somehow, since they are the big boys on the block, and they already have more full-text books than anybody else in the world.

What is the problem with this scenario? I believe the basic problem lies in the "260 $c - Date of publication, distribution, etc." We are simply putting far too much information in this field; how much does that poor, little "etc." include! Almost any resource has several dates attached to it, and if we had different subfields to put in specifically: copyright date, publication date, reprint date, issue date, distribution date, etc., records would be simultaneously more accurate *and* easier to do because you wouldn't have to decide which dates to include or ignore. It would also be far simpler to train people, because you would focus on real transcription of what you see.
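The proposal above, each kind of date in its own labeled slot rather than one catch-all $c, can be sketched simply. The element names below are chosen for illustration and are not actual MARC 21 subfield assignments.

```python
# A sketch of typed date elements instead of one catch-all 260 $c.
# The element names are invented for illustration, not MARC 21 assignments.
# The cataloger transcribes every date seen, each into its own slot,
# rather than deciding which single date "counts".

dates = {
    "publication": "2010",
    "copyright": "2009",
    "printing": "2011",
}

def display(dates: dict) -> str:
    """Undifferentiated display, much as 6xx $x $y $z $v subfields are shown."""
    return ", ".join(dates.values())

def expert_view(dates: dict) -> str:
    """Labeled view for staff who need to know which date is which."""
    return "; ".join(f"{kind}: {d}" for kind, d in dates.items())

print(display(dates))      # 2010, 2009, 2011
print(expert_view(dates))  # publication: 2010; copyright: 2009; printing: 2011
```

The transcription task becomes pure copying into slots, while the choice of how much to differentiate moves to the display layer, which is the simplification argued for above.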

Of course, the changes to 260 won't happen. But the mashing of the records will. Of that I am certain.

Friday, November 19, 2010

How Google Works

This is one of the clearest explanations of how Google works that I have ever seen. Thanks to http://www.ppcblog.com/how-google-works/


Keyword vs. Controlled vocabulary studies--a case study

Posting to Autocat

Brian Briscoe wrote:
Do any of you have a citation or link to studies that did a comparison between Keyword and Controlled vocabulary searching and possibly included a determination that one searching method was better than the other?
I am already aware of Sevin McCutcheon's study that came out in The Indexer, June 2009. I would appreciate any pointers you can give me to further studies or research on the subject.
This is a topic that interests me as well, so I decided to do what I tell my students to do in my information literacy workshops. I think my results may prove interesting to others on this list.

I found the original article "McCutcheon, S. Keyword vs controlled vocabulary searching: the one with the most tools wins" and looked through the bibliography for an interesting citation. I chose "Gross, T and Taylor, A. G. (2005) What have we got to lose? The effect of controlled vocabulary on keyword searching results. College and Research Libraries 66(3), 212-30." then copied and pasted just the title into Google Scholar.

The result on its own turns up some articles that are quite interesting: http://tinyurl.com/342o9d6. But more important is that under the metadata record for Gross and Taylor's article is "Cited by 28". Again, not all of them are completely relevant, but some are. These in turn are cited in still later articles.

What I wanted to point out, however, are the articles in the right-hand column, which are supposed to be free versions of the articles available in an open archive. Sometimes these are the same as in the left-hand column, e.g. Zavalina's article from Ideals at Illinois (Ideals is a fabulous resource, by the way!), and are both freely available; but as I tell my students, you will see spam: e.g. JSTOR results show up especially often in the right-hand column, and JSTOR, of course, is not an open archive. My students seem to accept this proviso with no problem. Still, look at the number of articles that are available in open archives. I have watched these numbers mushroom in the last three years, I would venture to say.

There is also the click box at the top now, which allows you to do a keyword search only in those articles that cite your article: a huge advance. All of this is obviously very powerful, but even more important: this is free today to anybody who has access to the web, a fact I still find astonishing! Of course, Google will continue to advance these tools. As one example that I can imagine: if they could change the click box to limit the search not only to citing articles but to the entire thread of citations from beginning to end, under the user's control, this could prove to be useful (or perhaps not). Finally, keep in mind that this is only one citation from one article, when there are several other citations available, leading to articles with other citations. Trying to visualize the amount of relevant material can be staggering.

Based on these considerations, this is the way I see it:
These results are definitely useful for anyone and quite easy to obtain. This method also mixes a fairly traditional tool (citation indexing) with keyword possibilities. Also, I suspect that Google will not change their tools for us, at least not very much. By this, I mean that Google will not do a lot of work to fit into our systems; we must design things to fit into theirs, that is, if we want to cooperate. It is inevitable that newer tools such as Mendeley will be incorporated into this eventually (if it has not been done already).

So, for me the question becomes: how do we build tools that fit into this situation usefully, simply, and coherently? How can controlled vocabulary enter into this entire equation where citation analysis (plus other methods as yet unforeseen) are so easy to use? I think it can and must be used, but it is quite a complex task.

Thursday, November 18, 2010

RE: Technical Services Emerging Matters

Posting to Autocat

On Wed, 17 Nov 2010 10:08:15 -0500, Sabanadze, Irina wrote:
>Dear Autocaters,
>Are you familiar with the concept of Learning Organisation (LO) as described by many authors as, for instance, in the blog post "Knowledge: the Key to Competitive Advantage for Learning Organisations" written by Lalita Chumun
>Do you feel the transitioning to the LO is realistic at this time?
Thanks for pointing this out. In my view, this idea of "learning organisations" is similar to Darwinian evolution: organisms (or organizations) must adapt to their environment or be selected against. In times of very little change, such as during the age of dinosaurs, when everybody could get along happily and lazily munch on the leaves that grew everywhere, or just eat one another, nobody had to exhibit much in the way of new ideas or new actions; thus during such easy times there is less need for these "learning organizations". But in times of greater environmental change, everybody must exhibit much more innovation and adaptability or risk being "selected against". It doesn't matter if these organisms are bacteria, dinosaurs, people, or organizations.

So, while I completely agree with this posting, there are other factors that must be taken into account: above all, time and money. Major changes demand time--primarily for trial and error--and in our present climate, there seems to be little time. Plus, training is not free and someone has to pay for it, either the individual or the organization. These costs can be substantial, and often the stated goals of the training are highly nebulous and therefore, difficult to justify during times of restricted budgets.

Finally, there is another factor determining adaptability, perhaps the most important one, when it concerns organisms or organizations with intelligence, and that is: the Goal. Dinosaurs and mammals had no goals of what they wanted to become and things just happened to them, while we can at least try to foresee what would be for the best and try to work our way toward those goals. What are the goals of the library community today? What do we want to change into? Too much change and we become something else; not enough change and we prove ourselves not to be the "fittest". Therefore, the "learning organization" must have shared goals in mind. There is a lot of confusion on these basic issues.

Not that this should be surprising, since almost all "information-based" organizations are confused: the news industry, the music industry, the book industry, and so on; while the confusion within each sector has deep ramifications for society. It just seems we are facing the so-called "perfect storm" today. The dinosaurs had their Cretaceous-Tertiary extinction event, and our society has the Internet-Global Financial Crisis, 2008-2009. (By the way, I just discovered that in LCSH, the financial crisis is over, echoing the pundits. Perhaps Financial crises--21st century would be a better choice? But I find that one too frightening!)

I hope we prove ourselves to be more adaptable than the dinosaurs and display the innovation of our tiny mammal ancestors. One important asset would be instituting these "learning organizations", if that proves possible.

That's my opinion, anyway!

Wednesday, November 17, 2010

RE: EBL (Ebook Library) "problematic" records

Posting to Autocat

Brian Briscoe wrote:
The question concerns whether it is worthwhile to improve the EBL records to make them useful to catalog users.
In Paul Adasiak's original message, I focused on:
Our own Collection Development librarian says this, with numbers to prove it ... If, with poor records, their access is adequate, then why bother with "improved" records at all?
This is a highly important question--perhaps the ultimate question about the utility of catalog records--and backed by solid numbers to boot. I have heard similar questions raised by upper echelons who are trying to manage very tight budgets and are looking for savings everywhere. Such a question is doubtlessly valid and begs to be asked, although it certainly makes me uncomfortable. Sooner or later--especially when the full-text of Google Books becomes available--it will have to be answered. In the present case, if it turns out that most people are finding the EBL books primarily through the full-text search supplied by EBL, thus bypassing the local catalog completely, or if they are finding them *despite* lousy records, either answer makes a huge difference. If it can be shown that the public are *not* really finding these items and the reason is lousy cataloging, now that would be important.

I know several people who say that the way to make the catalog records the most useful to the public is to delete them and scan everything. I do not agree at all, but in these days, I think more and more people will listen. As I mentioned in my latest podcast, I have watched my patrons struggle and fight with the catalog: any catalog. The very concept of controlled vocabulary is getting more difficult to understand and to explain; even the idea of [surname], [forename] is getting weird for people! Look at how the names display in Worldcat as [forename] [surname].

Don't get me wrong: I am as strong a believer as anybody out there in high standards in our catalog records, and it is my stance that high standards (higher than they are now), constantly improving, represent the only way forward for our field. But it's just very hard to say exactly what these "high standards" should be today, as the Collection Development librarian above can point out. The old standards will obviously have to change in the new environment we are entering, but we still have to figure out how. Here is just the newest example I have seen of how research needs are beginning to change: http://www.nytimes.com/2010/11/17/arts/17digital.html. It is clear to me that we have to fit into that kind of world or be left in another time. This was why I mentioned using EBL for a case study, which might help answer some of these questions.

At the same time, there is an actual interest in quality of metadata as Nunberg's "Metadata Trainwreck" shows, which even made it into some of the more popular media: not only the Chronicle, but the Register, Salon, etc. I think we may have an historic and short-lived opportunity to get some of our points through, but let's face it: it probably won't happen.

Tuesday, November 16, 2010

RE: EBL (Ebook Library) "problematic" records

Posting to Autocat

On Tue, 16 Nov 2010 08:44:20 -0600, Brian Briscoe wrote:
>On Mon, Nov 15, 2010 at 1:17 PM, Paul Adasiak wrote:
>Some have said that, even despite their poor bib records, the books are being found and used. Our own Collection Development librarian says this, with numbers to prove it ... If, with poor records, their access is adequate, then why bother with "improved" records at all?
>Because, in reality, their access is not adequate with the poor records. Just because a user happens to stumble across a resource by happenstance does not mean that the access to that item is very good.
The main task is to discover exactly how the patrons are finding these books. Are they using *only* the records in the catalog, or are they accessing them through the full-text search within the EBL website? Of course, this is fated to become an incredibly big and thorny issue when the millions of Google Books are eventually brought online in their entirety.

This could be an interesting project for some library student out there: to find out how many people find the EBL books through the local catalog/metadata records, versus how many find them through the full-text search in EBL. Throwing in the results from NetLibrary could be interesting as well. EBL and NetLibrary may be willing to share their statistics on this, at least in the aggregate. The results could help us determine the impact of the addition of the full-text books from Google.

One of the major challenges that the field of cataloging must face is how it will interoperate with full-text retrieval--what will be necessary for us to maintain and what can safely be tossed overboard. This type of project could make for an interesting case study.

Monday, November 15, 2010

RE: All our eggs in one basket?

Posting to RDA-L

Karen Coyle wrote:
We do not have a single source of data today. We have publisher web sites, Books in Print, publisher ONIX data, online booksellers, Wikipedia, LC's catalog, WorldCat, thousands of library databases, and millions of citations in documents.
There is the question of "is this data authoritative?"...
Also, if the informational world were amenable, a lot of this information *could* come from the item itself. For example, metadata could be harvested from the <meta> fields of a web page. See, as an example, the metadata in the Slavic Cataloging Manual, now at Indiana University http://www.indiana.edu/~libslav/slavcatman/. Look at the "Page Source" (usually found under "View" in most browsers) and you will see some metadata for this item. Spiders could be configured to harvest this data.
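
As a rough sketch of what such a spider might do with a single page, here is a minimal harvester using only Python's standard library. The sample page and its Dublin Core fields below are invented for illustration, not copied from the actual site:

```python
from html.parser import HTMLParser

class MetaHarvester(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""
    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = attrs.get("name")
            if name and "content" in attrs:
                # A name may repeat (e.g. several creators), so keep a list.
                self.metadata.setdefault(name, []).append(attrs["content"])

# Invented sample page standing in for a real site's <head> section.
page = """
<html><head>
<meta name="DC.Title" content="Slavic Cataloging Manual">
<meta name="DC.Creator" content="Indiana University Libraries">
<meta name="keywords" content="Slavic; cataloging">
</head><body>...</body></html>
"""

harvester = MetaHarvester()
harvester.feed(page)
print(harvester.metadata)
```

A real spider would fetch each page over HTTP and feed the HTML to the same parser; the point is only that the metadata is sitting there, machine-readable, if pages supply it consistently.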

Or, in an XML document, a lot of this could come from the information itself, e.g. a title of a book could be encoded as "245a" or "dc.title" (although I would like some way to distinguish a title proper). The ISBD principle of exact transcription would fit in perfectly. Also, as information is updated, the updates could be reflected everywhere immediately.
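
To make this concrete, here is one minimal sketch of the idea in Python, using a Dublin Core title element. The type="title-proper" attribute is my own hypothetical convention for distinguishing a title proper, not part of any existing standard:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
# Exact, ISBD-style transcription of the title; the "type" attribute is a
# hypothetical way of singling out the title proper.
title = ET.SubElement(record, f"{{{DC}}}title", {"type": "title-proper"})
title.text = "An example title"

xml = ET.tostring(record, encoding="unicode")
print(xml)

# Any consumer can now pull the title out by its element name alone:
parsed = ET.fromstring(xml)
print(parsed.find(f"{{{DC}}}title").text)
```

Because the element name carries the semantics, an update to the record could be propagated and understood by every system that reads it, without re-keying.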

The mechanics of much of this exist right now. The main problem is that there is very little agreement over coding or how data is input. For example, look at almost any NY Times article, such as http://www.nytimes.com/2010/11/15/world/asia/15prexy.html, and examine the <meta> fields there. This can give an idea of the possibilities, as well as the challenges, in getting control of all of this.

RE: Again about "Re: Why We Can't Afford Not to Create a Well-Stocked National Digital Library System"

Posting to NGC4LIB

Dan Matei wrote:
> Well, I'm not so concerned about the publishers' revenues. I'm more concerned about the authors' revenues. But I know of an example of a reasonable "business model" for a national digital library:
> www.pim.hu/object.90867f8f-d45e-40f9-8a6b-fe0034f0db87.ivy
> The Hungarian Government pays copyright fees to the most important contemporary Hungarian authors (including the Nobel Prize winner Kertész Imre) and publishes their oeuvre online (for free) !
> Bravo, I would say !
and Karen Coyle replied:
Dan, this is all fine and well in theory, but the publishing industry has incredible clout here in the US (not the authors, who are but chattel). So unless the *publishers* see an economic advantage and are aboard, this is just a numbers game. I agree that it *should* be possible, but *should* doesn't get us very far.
The plans to pay publishers for public use, however, seem to me to be a slippery slope. Already the Assn of American Publishers has been stating that they should get payment for every book lent in a public library (as is done in some countries).
From my standpoint, I see this exchange as a reflection of the European vs. American views of the world of economics. The European considers that the power of the state should predominate (in this case, the Hungarian government), while the American sees the predominance of "private" power (i.e. the publishers). Not that either side necessarily agrees with their respective viewpoint, but there are these assumptions on both sides.

We see it also in the recent turmoil in Europe over cuts in the social safety net, with huge strikes in France, student revolts in Britain, lots of things going on in Greece and Spain, while the U.S. is quite tranquil in comparison.

Book publishers are undergoing huge shocks right now and are having just as much trouble adapting to this new world as the music industry. (See "Stars fall in Amazon protest about ebook prices" http://www.guardian.co.uk/books/2010/nov/03/ebook-prices-kindle-amazon-protests) As ebooks and/or print-on-demand become more widespread, the traditional control of the publisher must change: printing a manuscript in x number of copies locally, and then sending these copies around the world to specific retail outlets, where the public can acquire them, and the copies that go unsold are returned to the publisher who refunds the retailer. Such an economic model makes sense only when there are no other alternatives. Today there are several other alternatives emerging.

We can see movement in the newspaper industry, where some journalists are starting to see that the interest of an individual newspaper does not necessarily coincide with the practice of journalism or of the journalists themselves. With scholarly literature, the open archive movement is showing that the interests of individual researchers (who get nothing for writing the articles but want to be cited as widely as possible) often have little in common with the publisher of the journal, or with the aggregators.

We'll see what happens with books now, but if I were one of those authors mentioned in the article above getting the single stars and horrible reviews, I don't think I would believe that my publisher has my best interests in mind. Things are changing in highly unpredictable ways. This will obviously have major impacts on the library world, but at this point, it is very hard to foretell what they will be.

Sunday, November 14, 2010

The Functional Requirements for Bibliographic Records, a personal journey Part 4

Cataloging Matters No. 6
The Functional Requirements for Bibliographic Records, 
a personal journey
Part 4

Direct Link
Part 1


Hello everyone. My name is Jim Weinheimer and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy.

This installment continues my personal journey with the Functional Requirements for Bibliographic Records (or FRBR). Will I finish it at last? Stay tuned!

This series has gone on for three previous podcasts. I believe that this installment will make no sense at all without the others, so I strongly suggest that you listen to them first, in order. Links to the earlier podcasts, along with everything else discussed here, are available from the transcript.

I have been very busy lately with the school year and other matters, and that is why it has taken some time for me to continue this series. But as I warned in my first podcast, this is a true “irregular” in every sense of the word so don’t expect too much!

I have decided to spare everyone from having to listen to me recite my twelve-step process yet again. If anyone is listening, I can imagine the sighs of relief!--and I will take up from the time that I worked at the Food and Agriculture Organization of the United Nations, where I had entered my Serious Doubts Phase.

I was away from the library cataloging I had become accustomed to and was working with AGRIS cataloging rules, actually indexing separate chapters, papers, and articles when necessary, working with a thesaurus called AGROVOC; all of this along with a certain amount of systems development. While there, I also dealt with bibliographic formats and practices from other organizations, practices I had never seen before. These had to do with statistics, images, geographic information, internal documents, and all kinds of other types of resources. Therefore, I came into contact with separate “metadata worlds,” each world coherent and meaningful on its own, e.g. the metadata world of AACR2, our own metadata world of AGRIS, the metadata worlds of different indexes, the metadata world of statistics, and so on. None was necessarily “better” than any other, and each on its own made sense more or less. I would have loved to import any or all of those records into our catalog, but when I tried to get them to work together, all I got was hash, and it would have been easier to just do everything from scratch. I saw how the power of the newer formats such as XML could manipulate “correctly-encoded data” in all sorts of amazing ways--and I could even do some of it myself!--yet the automatic methods could only go so far.

One sticking point lay in the details of “correctly-encoded data” and exactly what that meant. This went both for the formats as well as for the data itself. It turned out that just getting other organizations to send author information coded consistently was tough enough, or to get people to use the same fields in more or less the same way. But trying to bring uniformity to the data itself, that is, so that everyone would use, e.g. “FAO” and not one of the dozens of other possible forms of its name, turned out to be overwhelming. It was very clear that you could work yourself to death trying to regularize this information. True, there were possible solutions using URIs instead of the exact forms of names, but it still seemed to me that there would have to be an incredible amount of agreement among all sorts of parties before any progress could be made. Being in the middle of all this propelled me deeply into my Serious Doubts phase.
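
The URI idea can be sketched very simply: every known variant of a name maps to one identifier, so records can carry the URI while any preferred display form is chosen later. The URI and the variant list here are invented for illustration:

```python
# Hypothetical authority table: variant name strings -> one identifier.
NAME_AUTHORITY = {
    "FAO": "urn:example:org:fao",
    "F.A.O.": "urn:example:org:fao",
    "Food and Agriculture Organization": "urn:example:org:fao",
    "Food and Agriculture Organization of the United Nations": "urn:example:org:fao",
}

def normalize(name):
    """Return the URI for a known name variant, or None if it is unknown."""
    return NAME_AUTHORITY.get(name.strip())

print(normalize("F.A.O."))                              # resolves to the URI
print(normalize("Food and Agriculture Organization"))   # same URI
print(normalize("some unknown body"))                   # None: needs human review
```

The code is trivial; the overwhelming part, as described above, is the human agreement needed to build and maintain such a table across all the parties involved.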

But these considerations were soon brushed aside, since I accepted a position as Library Director at the American University of Rome, a small, undergraduate institution located in a beautiful area among graceful palazzi on the top of the Janiculum Hill, which provides visitors with some of the most spectacular views of the city that anyone could hope for. I took the position for various reasons: I wanted to be a real librarian again, but also, I had always felt that it was the smallest collections where the materials on the World Wide Web offered both the greatest opportunities, and posed the toughest challenges. A Harvard or a Yale will have a great collection no matter what, and in many cases for the people there, the materials on the web result only in an “extra format” of something already available to them. For a small collection however, such an opportunity should provide the difference between night and day!

But how do you do this in a small library with very little help? Perhaps in another podcast, I’ll talk about my own attempts and what I think are my successes and failures, but for now I want to focus on FRBR.

Returning to the Anglo-American cataloging world got me back into AACR2 and gradually, FRBR. More importantly, I began to have my very first substantial and regular contacts with the public as a reference librarian. I learned a lot and am still learning all the time. What have I learned so far? First, it’s not easy at all to think on your feet when a student has put everything off until the last second and is half frantic. It’s also not easy to wheedle out of someone what they really want to know and not allow yourself to get sidetracked into showing them all kinds of things they do not want, wasting your time and having them think you don’t know what you are doing; or to try and match your understanding of what they want to what is actually available. Plus, for loads of reasons, doing reference is far more difficult when you actively add materials on the World Wide Web into the mix, as I wanted to do: it is a practical impossibility to keep up with them; there are concerns of “quality of information”; World Wide Web sites change their names and locations; and, as I discovered, Google and other search engines are always tweaking their results, so a search that, in a manner of speaking, “worked” yesterday or last week may not work the same today. That can be maddening!

There are dozens or hundreds of very thorny obstacles when you actively try to incorporate the information available on the World Wide Web into part of the local collection, but I felt I had to do it, and I still do. The old methods just don’t work well enough however, and I have succeeded only partially.

Through my experiences with the public as a reference librarian and watching how they tried to work with the library’s catalog, I very soon fell into my Disillusionment phase. I saw firsthand how difficult a library catalog is for the public to use. At the same time, I saw how much easier it is for the public to use tools such as Google and Google Scholar, along with databases such as those from Ebsco or Sage. The Google-type tools and databases obviously were made with the user’s comfort and so-called “customer satisfaction” in mind, while the library catalogs had different ideas. It was at that time, while I reflected on my observations of my users’ troubles, that I remembered the purposes of the catalog as laid out in the FRBR user tasks: to find/identify/select/obtain works/expressions/manifestations/items by their authors/titles/subjects. Therefore, I began thinking about FRBR consciously once again.

And yet the main thing that I discovered very quickly was: when people ask for help, they very, very rarely are asking for works, expressions, manifestations or items. Of course, on a certain level, people continually ask for a book that they cannot find on the shelf, so they are asking for items, but beyond this purely mechanical/clerking need, they ask for answers to their questions, no matter where those answers may come from. Most questions are similar to: “I am writing a paper on the House of the Vestal Virgins, and I can’t find anything useful”, or “I need statistics on drug crime over the last 10 years”, or “I’m updating a book I wrote several years ago and need the newest information”, or similar questions. Rarely, and I emphasize very rarely but it does actually happen, I have gotten a question such as, “I need Hobbes’ translation of the Iliad” or, what I hear more often, “I need the latest edition of such and such a book”. To be honest, I am almost the only person I know of who wants highly specific editions. As one example of my own needs, I have been looking for a specific edition of Thomas Middleton’s play “A game of chess”, where I would like a first edition (yeah, sure!), but so far I can find no scan of the first edition. Still, I can download my own copy of the play from an edition of Middleton’s collected works from the Internet Archive, which includes excellent commentary: http://www.archive.org/details/worksthomasmidd04bullgoog
and there is another copy of this set at HathiTrust. Plus, I do have access to a couple of scans of different frontispieces, one is from the first edition, while the other is probably what the editor in the collected works mentions in his preface to the play:
http://de.wikipedia.org/w/index.php?title=Datei:Middleton_%27A_Game_of_Chess%27.jpg&filetimestamp=20070708233726; http://commons.wikimedia.org/wiki/File:A_Game_of_Chess.jpg.

I have found no copies of the first edition here in Rome, and I hope that no library would make such a rare book available for inter-library loan, but in spite of all that, I still have access to quite a bit of information just by sitting at my desk and knowing where to look. While I am not the only person who wants resources of this type, there are nevertheless very few of us in comparison with everyone else, and such people are certainly not in the majority. Of course, I would also like all of this to be much easier to find. Nevertheless, I freely admit that this is not the primary type of information that I need either, since I search much more often for answers to my own questions no matter where those answers happen to be. So, in the vast majority of situations, I am not much different from my patrons. Naturally, the very idea of relating the concepts of works/expressions/manifestations/items to web resources seemed nonsensical to me: while I agreed that with enough mental effort, you could probably force sites such as youtube, microsoft.com, blogs, or facebook into an FRBR structure, I could not see how the final product would be useful to anyone or worth the effort.

These observations may seem obvious and rather unimportant, but to me, realizing and accepting all of this was simply devastating: if it is true that very few people want works/expressions/manifestations/items, then it follows that people want something else. Then, the conclusion is unavoidable: the catalog does not supply what people want! That is when I found myself on new ground, in a place I did not want to be, and I did not like it one bit.

I squirmed, but I could not avoid the unpleasant conclusion: the very premises of FRBR toward users were clearly and utterly wrong. FRBR had confused “user tasks” with what the traditional catalog actually did. It dawned on me that FRBR describes how the traditional catalog has always functioned and although it may be correct so far as it goes, it does not then logically follow that this is also what people want or need. That is where the fallacy lies. And I could see that fallacy in operation every single day when I worked with my patrons, or even when I did my own searching.

Still, if the premises of FRBR were wrong, what did that entail for everybody out there? What did this mean for RDA, which was just really getting off the ground? And I stood face to face with my Despair phase.

In retrospect, my Disillusionment phase was not so difficult because it passed rather quickly, but my Despair phase (which I confess I still fall into occasionally) lasted significantly longer and was therefore, much more difficult for me. I could still do my job of course: select materials, create records in the catalog, work with patrons and so on, but it all had much less meaning since I saw how the newer tools were being used more often, with more relish, and many times, with results that were really not all that bad. People still had major problems with the new tools of course, especially when it came to writing a paper, but the problems with full-text seemed trivial compared to those they encountered when they worked with a library catalog.

I also became responsible for the university’s bibliographic instruction, or what is now styled as the library parts of information literacy. Many things I learned about my students surprised me, but primarily this: when people type the terms they want into a search box, I have not yet met anyone who understands what they are searching or what is happening. People even find such a question surprising. To the people I have met, a search box is a search box is a search box, no matter what it is and what it is connected to. There is one box and it does everything for them. Therefore, it becomes highly difficult for people to understand that when they are searching a library catalog, they are searching what I now call in my Information Literacy sessions “Summary records”, and they are not searching full-text. Several students have been honest enough to tell me that they didn’t understand what it meant to search by author, title, or subject! This was some of the reality I encountered.

I do not think any of these people are stupid, and in fact, the way they approach matters makes a lot of sense: all search boxes look the same, therefore they are the same, so all should be searched the same way. It seems to me that using search boxes demands an intellectual leap that doesn’t apply when searching physical catalogs and indexes. When working with physical catalogs and indexes, it is very clear what you are searching and what you are not, since it is obvious that there is no way you could be searching the full-text of the books on the shelves when you are using a card catalog, or when leafing through the volumes of an index. But when the catalogs and resources are all virtual, the relationships among catalogs, indexes and everything else become far more nebulous and the searcher can sense no clear boundaries. The entire environment becomes far more abstract, and you don’t know what you are doing, and what you are not doing. That is, you can’t know without doing a lot of work.

This is why I believe people search library catalogs in the same way they search Google, and why they almost always get such poor results when compared to full-text searching. Searching a catalog competently is a skill that must be learned; and not only learned once, it must be exercised or it atrophies, just as any other skill that goes unused. So, even if I had made some strides forward in some of my classes and people actually learned something, it would turn out that after a few months or a year later, they forgot. That should not be surprising, but it was for me.

While I was doing some research on user education, I came across a provocative article, which quoted a Mr. Line from a paper in 1983 entitled “Thoughts of a non-user, non-educator” http://www.londonmet.ac.uk/deliberations/courses-and-resources/wilson.cfm, where he was quoted as saying that the term user education is, “meaningless, inaccurate, pretentious and patronising and that if only librarians would spend the time and effort to ensure that their libraries are more user friendly then they wouldn't have to spend so much time doing user education.”

While this made a lot of sense to me, I am also interested in library history, and one of my favorite authors is William Warner Bishop from the Library of Congress (who also happened to be the first head of cataloging at Princeton University). He gave a talk to the NY State Library School in 1915 (http://www.archive.org/details/catalogingasasse00bishrich). In this talk, he said something about catalogs that always rang true for me and I would like to quote him at some length:

“Now no instrument can always be worked easily, safely and successfully by the chance comer. Herein lies much of the difficulty found in the use of card catalogs.

For who uses a card catalog? For whom is it made? This is the real crux of much of the current discussion of the merits--and failings--of that machine. Obviously it is not for the way-faring man; equally obviously not for the child just entering school. Clearly persons who wish to read or study some definite book or some subject are the normal users of card catalogs. For the idle or the curious browser, there are the open shelves; for the fiction seeker, the finding list and more open shelves; for the child, the children's room; for the man in haste, the reference collection and its attendants.”

and later he says:

“Is it not a perfectly fair statement that in the users of a card catalog there may be presumed some modicum of intelligence and a more than passing interest in some topic? I do not believe that the card catalog can ever be made so easy of operation, especially in this day of huge libraries, that every chance comer can handle it successfully without some instruction.”

He goes on to say that a catalog is complex because books and people are complex. This is beautifully said and very convincing, but I fear, it is absolutely outmoded in our day and age. Mr. Bishop went to great pains to talk about how different kinds of people can avoid using the card catalog, but today with an ever-growing demand to use library collections remotely and the ubiquity of what is now called search, “every chance comer” has no choice but to work with the catalog, and to do it without any real instruction. Since the number of reference questions is also plummeting, patrons even do it without asking anyone for help. Of course they will have poor results! As a result, I reluctantly came to believe that Mr. Line is correct and not Mr. Bishop. Today, people who want information for whatever purpose can avoid not only the catalog, as Mr. Bishop pointed out in 1915 for “the man in haste” and the “curious browser”, but today everyone, including the serious searcher, can avoid the entire library as well.

As a result, I concluded that since there is absolutely no possibility of training all users of our catalogs, it is the catalog that must change and no longer be seen as an impediment. This means that it must change in ways that will be more “user friendly”. But added to this imperative were all of the other problems I have mentioned in my earlier podcasts: a mushrooming number of worthwhile materials available online (Google Books is only one site, and it alone has been adding millions and millions of books, but there is an enormous and ever-growing number of other great sites out there, with new ones popping up all the time, each containing innovative and wonderful resources); a huge amount of metadata that did not interoperate, because formats, data, and bibliographic concepts are not coordinated, so that the metadata others create is not “good enough” and must be redone over and over again by each group; the genuine challenge of full-text retrieval methods plus the new “social web”, which were difficult to assess but showed great promise, so that it only made sense to work with these things somehow; and the fact that almost all of us were also looking at flat budget lines. On and on the problems went. These were some of the real and serious challenges that I saw we were facing, and what was the library community’s response?

FRBR and then, RDA.

To be honest, I had always been looking forward to RDA since it was clear that changes were needed, particularly in raising productivity, and dealing with the new, weird things I saw on the web, where it seemed that the only thing that was constant was that they changed.

I need to pause for a moment here to avoid a potential misunderstanding: when I say that productivity needs to increase, I am absolutely not saying that catalogers are slackers or anything of the sort. Increases in production come primarily through the introduction of technological innovations and adherence to shared standards, not through individuals working harder. There have been relatively few technical improvements in the creation and sharing of catalog records since the introduction of Z39.50. Some tools provide help in making authority records and so on, but a lot more could be done. Much more important in my view is for catalogers to produce records that are of a sufficiently high standard that other bibliographic organizations can just accept them without local editing. I think we all know that while libraries claim that they create records that follow AACR2, they often fail in many ways and local editing is necessary, with the result that the same items are re-cataloged repeatedly, or the volume of copy records that require editing becomes so overwhelming that libraries just give up and accept whatever comes their way. Such a situation cannot be considered adherence to standards and is unsustainable in the long run. If genuine and realistic standards that must be followed were possible to implement, as they are in other industries such as foods and drugs or automobiles, productivity would doubtless increase tremendously.

So this is what I mean when I say that productivity must rise; we work smarter so that we can genuinely cooperate, not that each cataloger must produce 500 original records a day!

To return: while it was clear to me that FRBR did not provide what users wanted, I was very interested in seeing what RDA would come up with. Perhaps the actual practice would improve on theory by avoiding the problems I saw and provide some real solutions. But when RDA came out for general review and I could see it, I plunged into the darkest depths of my Despair phase. I couldn’t even discuss matters of detail of RDA because I saw that it was silent about the tremendous challenges we were really facing: of productivity, how to work with the other “worlds” of metadata, or interoperate with full-text tools. RDA did nothing new except change a couple of procedures, and it stuck faithfully to FRBR. As a result, our patrons’ experience would not change at all.

About this same time the economic bubble burst, and lots of things changed. Before the bubble, I could at least consider retraining and retooling, but afterwards, it was simply unthinkable. Perhaps even then, if I had honestly thought that RDA represented a step forward, I might have considered fighting for funding (still unsuccessfully, I have no doubt), but I could not ignore that in my professional opinion, RDA is not a solution for anything and I could not justify spending precious dollars (or euros) on that.

In the depths of my Despair phase, I contacted others and it turned out that they also shared many of my concerns; they also had no money for retraining staff and switching over to RDA. This was when I found a ray of Hope, because I learned I was not alone, and I decided to initiate the Cooperative Cataloging Rules Wiki (http://sites.google.com/site/opencatalogingrules/). It's still new and I don't know what will happen with it; it may be doomed to oblivion, but at least for me it represents a bit of hope and an option for libraries who either cannot or will not switch to RDA. I thought long and hard before announcing anything, but decided to simply forge ahead.

That pretty much describes my own, personal journey with FRBR up to the present, and the difficulty I experienced of accepting that FRBR changes nothing of substance and avoids the real problems facing modern librarians. Perhaps you will find this ending anticlimactic or unsatisfying, but it is not for me.

Those who have listened to my earlier podcasts may have inferred a very important concern of mine: that the FRBR user tasks are based on the work of Panizzi and Cutter, two giants in the field whom I have admired immensely. For me, renouncing FRBR was equivalent to renouncing Panizzi and Cutter, and this made me exceedingly uncomfortable. Nothing improved until an exchange on the RDA-L list with Bernhard Eversberg, who helped me understand things better: http://www.mail-archive.com/rda-l@listserv.lac-bac.gc.ca/msg02048.html

I was discussing details of how difficult it had been for me to find some small bit of information I wanted (it turned out to be only a single page published over 100 years ago); nevertheless I could do it, and I considered it nothing less than amazing that I actually could find what I wanted. I mentioned that these are the sorts of things people want to do today and they have nothing whatsoever to do with the FRBR user tasks. Bernhard pointed out that in earlier days, people wanted to do the same things, but "they had to first align their intention with a bookish mindset and then walk into a library," which seemed true, and I replied that in a case like mine, it would probably have happened only after prolonged consultation with an experienced reference librarian.

So, perhaps Panizzi and Cutter were right after all, but for them, the existence of a reference librarian was simply too obvious to mention, since it went without saying that untrained people could never use a catalog competently.

The information environment has changed far too much, and the presence of an ever-watchful, skilled reference librarian can no longer be taken for granted. This narrows the choices at our disposal: either we expect patrons to struggle with our catalogs, as we can see them doing now, and decide that if patrons don't find something it's their problem and not ours; or we try to make the catalog more useful and user friendly so that people can operate it more easily. Of course, in one way, shape, or form, our patrons pay our salaries, and since patrons can now actually get worthwhile information without the library, it is logical to assume that if we do nothing and expect everyone to continue fighting with our catalogs, those patrons will see us either as useless or obstructionist, and suddenly their problems really do become our problems. For me, FRBR and RDA head in the wrong direction and are the equivalent of doing nothing.

So, we are left with improving the catalog. There are a lot of things we can and should be doing using the power of the computer systems, plus focusing on increasing quality and standards. Fixing this situation will demand time and imagination, a lot of trial and error; and I hope it will be done with fantasy, taste, and even a bit of fun here and there.

For those of you who have had the patience to share my journey, I hope you have enjoyed it, whether you happen to agree with me or not.

The music I would like to close with is from the first movement of Vivaldi's stirring, and rather dark Double Cello Concerto in G Minor, performed by the King's Consort. http://www.youtube.com/watch?v=IYdTLnlc4q4

That’s it for now. Thank you for listening to “Cataloging Matters” with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic, city in the world.

Friday, November 12, 2010

RE: [RDA-L] 300 Punctuation

Posting to RDA-L

Myers, John F. wrote:
This is what happens when we continue to coopt a communication standard developed to print cards for use as a vehicle to convey data in electronic interfaces. Nearly every quirk in MARC can be traced back to its foundation as a card printing mechanism (and the lack of programming sophistication when it was originally developed).

One thing I think needs to be kept in mind is the purpose of the ISBD punctuation, which is language-independent. Here is a record I took at random from the catalog of the Russian National Library. Even though not everybody reads Russian, any cataloger in the world can immediately understand what the various parts are because of the punctuation. (I switched my email format to HTML, so I hope it works for everybody)

Достоевский, Федор Михайлович (1821-1881).
Село Степанчиково и его обитатели : Из записок
неизвест. / Ф.М.Достоевский. - Изд. для
слабовидящих. - М. : ИПТК "Логос", 1997. - 550 с.
; 20 см. - (Круг чтения).
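To make the point concrete: the punctuation is so language-independent that even a program can label the areas of that record without knowing any Russian. Here is a minimal sketch in Python; the split on ". - " is an assumption about how this particular record encodes the area separator, and a real ISBD parser would need far more care:

```python
# Minimal, illustrative ISBD area splitter -- a sketch, not a full parser.
# It keys only on the prescribed punctuation, never on the language of the data.

def split_isbd_areas(description: str) -> list[str]:
    """Split an ISBD description into areas on '. - ' (point, space, dash, space)."""
    return [area.strip() for area in description.split(". - ")]

record = ('Село Степанчиково и его обитатели : Из записок неизвест. / '
          'Ф.М.Достоевский. - Изд. для слабовидящих. - М. : ИПТК "Логос", '
          '1997. - 550 с. ; 20 см. - (Круг чтения).')

areas = split_isbd_areas(record)
# areas[0]: title area (title : other title info / statement of responsibility)
# areas[1]: edition area
# areas[2]: publication area (place : publisher, date)
# areas[3]: physical description area
# areas[4]: series area
```

Even a non-Russian-speaking cataloger (or a machine) can tell that "Изд. для слабовидящих" is the edition statement, purely from its position between the prescribed marks.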

I think this important function can be retained in a non-ISBD punctuation atmosphere—at least kind of. We can have different interfaces so that each person can decide upon the language he or she wants to view the catalog in, but even then, it seems as if there will be some kind of a limit on the number of languages offered, and the idea above, where any cataloger can understand that record, will not be possible.

Of course, we need to consider the possibility of various types of automatic translations a la Google Translate, and/or automatic transliteration as well.

Retaining the international comprehension would be very nice but maybe it can’t be done.

Wednesday, November 10, 2010

RE: Displaying Work/Expression/Manifestation records

Posting to Autocat

On Tue, 9 Nov 2010 10:31:20 -0500, Brenndorfer, Thomas wrote:
>Linked Data is quite different from regular HTML web linking or even from traditional database design. Linked Data is about sharing structured data using identifiers (which don't even have to be HTML-based).
Yes, I know. I've done it. See below.
>> To institute linked data *right now* in our records, we could do it now and
>> don't need FRBR or RDA.
>Encoding Linked Data URIs in MARC is riddled with problems, as indicated in this MARC Discussion Paper
This isn't the main idea of linked data. It is repurposing and reformatting the data that already exists for new uses. In the example I gave from my catalog, http://www.galileo.aur.it/cgi-bin/koha/opac-detail.pl?bib=26319, the Google Books link is achieved through linked data: in the background, without the user being aware of it, Google Books is searched automatically using the ISBN, and if it finds something, you get the link. This is achieved using JavaScript and XML, and maybe some other things, too. My implementation only searches the ISBN, but it can be done with other information as well. The same goes for the Citations example, where it automatically searches WorldCat and brings back an RSS feed (written in XML) that OCLC rather ingeniously (I believe) formatted for local implementors to work with.
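The mechanism is simple enough to sketch. In my catalog this happens in JavaScript in the browser, but the same idea in a few lines of Python shows the shape of it: build a query against Google's public Books API from the ISBN already sitting in the record. The endpoint and parameter below are taken from Google's published API documentation, but treat the details as illustrative rather than authoritative:

```python
from urllib.parse import urlencode

# The public Google Books search endpoint (per Google's API documentation).
GOOGLE_BOOKS_ENDPOINT = "https://www.googleapis.com/books/v1/volumes"

def google_books_isbn_query(isbn: str) -> str:
    """Build the lookup URL for an ISBN search against the Google Books API."""
    # Strip hyphens so '978-0-13-110362-7' and '9780131103627' query alike.
    normalized = isbn.replace("-", "")
    return GOOGLE_BOOKS_ENDPOINT + "?" + urlencode({"q": f"isbn:{normalized}"})
```

The catalog page simply fires this request in the background, and if the reply contains a match, a "Google Books" link appears; the user never sees any of the machinery.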

There are LibraryThing APIs that can be used, and Amazon.com, the Internet Archive, HathiTrust, and so on and on and on. I haven't worked much with these yet since I can only do so much here, but there are immense possibilities in using the information in our records *right now* and just being a little imaginative here and there.

One of the big problems in the library world is that there are no library APIs (that I know of) that are really useful for our public. It appears possible to use id.loc.gov for linked data purposes, but you would have to add or change all of the current textual subjects to URIs, e.g. the heading "650 \0$aWorld Wide Web" would have to be changed in some fashion to http://id.loc.gov/authorities/sh95000541#concept, and from there you could apparently get the XML response to work with. I don't know of any live implementations. There could be lots of possibilities here, but things have to change around. For example, LC should be able to take the request, e.g. "http://id.loc.gov/authorities/World Wide Web" (easy to implement for almost anybody), and convert it to what they need, http://id.loc.gov/authorities/sh95000541#concept, instead of expecting everybody in the world to change or edit their records locally. Libraries won't do that. If LC implemented something like this, however, it would be much more understandable in a networked world, and could also be used very widely by non-librarians, and for non-library tasks, as well.
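The kind of request I am imagining really is trivial to generate, which is the whole point. A sketch in Python of the hypothetical label-based URL (the path is my invention for illustration; LC does not accept it today, and their side would have to resolve the label to the opaque identifier such as sh95000541 behind the scenes):

```python
from urllib.parse import quote

def loc_label_request(heading: str) -> str:
    """Build the hypothetical label-based request URL described above.
    id.loc.gov would be expected to map the human-readable label to its
    opaque identifier (e.g. sh95000541) -- this resolution does not exist
    today and is the change being proposed."""
    # Percent-encode the heading so spaces survive inside a URL.
    return "http://id.loc.gov/authorities/" + quote(heading)
```

Any library system that already stores the textual heading "World Wide Web" could emit this URL with one line of code, with no local record editing at all.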

But one of the problems, even *if* this were implemented, is that the user would see only the authority record with, it is true, the BT, NT, RT, UF and scope notes, which are useful; but the only links from the record in id.loc.gov are to other internal records, so it is of very little use to the patron, since there are no links out to other items on the web. But in Bernhard Eversberg's version: http://www.biblio.tu-bs.de/db/a30/lcsh.php?com=X%20a30ind--Vuf1=LCS--Vut1=world%20wide%20web (you have to click on "world wide web") there are at least links into Google, Google Scholar, etc.

As an example of something that presents the user with a richer tool, look at http://dbpedia.org/page/World_Wide_Web from DBpedia. Don't look at the formatting, because that can be changed, you can display only the languages you want, etc. Look at all the different kinds of properties, e.g. "isInventionOf: Tim Berners Lee". The "redirects" are the equivalent of UF. There are a lot of problems with this, I agree, but there are lots of links to lots of resources out there. Which would an untrained user find more useful: id.loc.gov or DBpedia?

These are some of the promises (problems?) with linked data, but that is the world we should be getting into. And it can be done *right now* today. But we need imagination. Here's a scenario:

Imagine a world where everybody follows ISBD--forget the punctuation, but the real meat of it: the rules for deciding what to input and how to do it. Then in the linked data "request" we could put in the ISBD information: the equivalent of 245abc, 250a, 260abc, 300a, 4xx, or whatever we wanted. The "reply" could give us the record for the manifestation we requested, or *anything else* we wanted, perhaps something entirely different, such as other subjects, or recommendations, or the latest blog entries from academics, or who knows what. It could even supply us with the work/expression/manifestation FRBR displays if that were seen as so important. It could give it all to us.
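As a sketch of what such a "request" might look like, here is one hypothetical shape: the ISBD elements, keyed by the MARC tags mentioned above, serialized as a small structured payload. Nothing here is an existing protocol; the field names and payload shape are invented purely to show that predictable metadata makes a predictable request trivial to construct:

```python
import json

def build_isbd_request(record: dict) -> str:
    """Serialize selected ISBD/MARC elements into a hypothetical
    linked-data 'request' payload. The tag keys mirror the MARC fields
    discussed above; the JSON shape is an invented illustration."""
    wanted = ["245abc", "250a", "260abc", "300a", "4xx"]
    payload = {tag: record[tag] for tag in wanted if tag in record}
    return json.dumps(payload, ensure_ascii=False, sort_keys=True)

# An invented sample record with the elements filled in ISBD style.
request = build_isbd_request({
    "245abc": "Example title : a subtitle / by An Author",
    "250a": "2nd ed.",
    "260abc": "London : Example Press, 2010",
})
```

The "reply" side could then match on any combination of these elements and hand back whatever the service wanted to offer: the manifestation record, recommendations, or even a FRBR-style display.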

Above all, the key to making this work is high-quality, predictable metadata that can supply a valid and predictable "request" and match it with a valid and predictable "reply." Then the sky really is the limit. But first we must open our minds to absolutely new possibilities.

Tuesday, November 9, 2010

RE: Displaying Work/Expression/Manifestation records

Posting to Autocat

Thomas Brenndorfer wrote:
I'm not sure about this fixation on the "unit record." In Scenario 2, a multitude of bibliographic and authority records work together and are linked together. Likewise, in the card catalog paradigm (Scenario 3), separate records are used to create a linked infrastructure using controlled headings. There is a single MARC record now that concatenates information about different entities, but it's not a self-contained unit record that stores all important information about the entities involved. That being said, the idea of broad container records is still a priority, as seen in the work being done on the Linked Data version of the cataloging universe (and the game in town for Linked Data is RDA, not AACR2). This Q&A page (particularly question 12) for Linked Data applications of bibliographic information is quite useful in understanding how bibliographic entities are handled in this new context. http://www.niso.org/news/events/2010/dublincore/questions/

I don't think I have a fixation on the unit record. What I have been trying to point out is that FRBR does not propose anything that catalogs do not provide now: the user tasks are *precisely* what anybody can do in a catalog right now, and what people have been able to do since at least Panizzi's time in the 1840s. The only thing that is "new" is breaking our current records up into works/expressions/manifestations/items. And I tried to show that this is also not new at all, that it harkens back to the days of the printed catalog. Then I asked what seemed to be a highly obvious question: is trying to recreate 19th century methods what people want today? Is this moving forward or backward? The only other kind of display involving FRBR-type structures I have found was in Fiction Finder, which was very interactive. In my opinion, such a tool would be essentially useless for patrons.

Concerning the point about linked data, I agree that it is important, but there are lots of ways to do it. Linked data is nothing new on the web--webmasters have always had to deal with "linked data" every day when they make a page. Any webpage you look at today probably contains lots of different parts of linked data: separate files for headers and footers, for navigation, styles, and so on. There are all kinds of different ways of implementing it: through server-side includes, linking cascading style sheets, images, various scripts, web services, and so on.

For example, what appears to be a single "webpage" in this record from my catalog, http://www.galileo.aur.it/cgi-bin/koha/opac-detail.pl?bib=26319, is actually made up of--I don't know how many--different files, dozens at least, linked together using all of the methods delineated above. The Google Translate, Google Books, and Sharing sections are all "widgets" that link this record to all kinds of other resources elsewhere on the web. If you click on "Get a citation" you are working with the WorldCat API, which OCLC created, but I had to format what I received a little bit to give my users a citation automatically. My patrons like that one a lot.
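For anyone curious what "formatting what I received" amounts to, the general pattern is easy to sketch: parse the XML reply and pull out the pieces you want to display. The feed below is invented for illustration; OCLC's real response has its own element names and structure:

```python
import xml.etree.ElementTree as ET

# An invented RSS reply standing in for the kind of feed a citation
# service might return. The element names here are illustrative only.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Citations</title>
<item><title>Doe, J. (1997). An example book.</title></item>
<item><title>Roe, R. (2001). Another example.</title></item>
</channel></rss>"""

def extract_citations(rss_text: str) -> list[str]:
    """Pull the citation text out of each <item> in an RSS reply."""
    root = ET.fromstring(rss_text)
    return [item.findtext("title") for item in root.iter("item")]
```

Once the titles are extracted, the local system is free to wrap them in whatever display it likes, which is exactly the kind of local formatting I did for my patrons.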

To institute linked data *right now* in our records, we could do it now and don't need FRBR or RDA. The problem is: we would have to link into something now that people would find really useful, and we don't really have anything. There is the id.loc.gov service, which could be used to search, e.g. in this case "Television series" (since "Television series Great Britain" retrieves nothing) and the patron could find the additional heading: "Television mini-series", and then--you're at a dead end. It's a good beginning but currently, there is nothing for users to do once they get there.

This is why I have linked into Bernhard Eversberg's system "LCSH Browser" using the blue "S" next to each subject. When you click on that, you search his system where, if you clicked on "Television viewers", you would find the BT, NT, UF etc. for that topic, plus links into different things, including even my own catalog (AUR Library). If Bernhard turned this into a web service, you wouldn't have to click: it would come up automatically, and I could format it however I wanted (using an onmouseover event or something).

We could try searching subj3ct.com; the search for "Television series Great Britain" https://subj3ct.com/search?query=Television+series+great+britain retrieves something that could prove useful, e.g. we see the children's series "Freewheelers" (http://dbpedia.org/page/Freewheelers), which provides lots of additional information and links to all kinds of other resources, such as its Wikipedia page, photos, related subjects, and so on. But fundamentally, I personally find the current structure of subj3ct.com confusing and unpredictable. Still--it's something!

Unit record or not, I honestly couldn't care less. We need to build a tool that *our patrons will use* and especially in this economic climate, without risking both the family farm and the baby's shoes. So much could be done right now and I think there are lots of people just aching for some ways forward.

I'll say it one more time: FRBR looks backward, while nothing I have seen in RDA changes anything of substance from what we do now. What we need is real change: something that will make people sit up and take notice. Real change can happen, if we decide to do it.

RE: Centralization (was: Catalogers and [no] respect)

Posting to Autocat

On Sat, 6 Nov 2010 11:51:09 -0800, J. McRee Elrod wrote:
>Marian Veld said:
>>Well, I like to think that our records *are* artisanally crafted, but I certainly hope we're not creating the massive duplication of 500 records for each work.
>I suspect what led to this strange assertion is that we each have *copies* of said record in our individual "silos".
>Personally, I find great comfort in this decentralization. It is a mistake, I think, to place all our eggs in one basket by having a national, continental, or international single linked database.
>Such a singular database could be disrupted or even lost by a natural or human made catastrophe.
I'm sure that everyone would agree with this, but the emphasis of Rick Anderson on the "500 artisanally crafted catalog records" is not, I think, decrying the ultimate duplication of the records themselves, but the duplication of effort. Do we need 500 artisans each creating their own record for the same thing?

Naturally, this is overstating the point a bit. Where I disagree is that often a non-cataloger will see some of the problems and then decide that the records we make are unimportant, especially today with new tools; but attacks on the practice of cataloging have always occurred and are not at all new. Although it must all be repurposed and rethought, it seems obvious to me that bibliographic information is just as important today as ever before (just look at the "metadata trainwreck" debate http://languagelog.ldc.upenn.edu/nll/?p=1701); the problems are in "how" those bibliographic records are created. If an OCLC master record for a book is terribly done, then whenever another library buys that same book and uses the OCLC record for copy, each library has to edit the record. If there were 500 libraries buying that book, then in effect Rick Anderson would be correct and the result is a completely unsustainable situation. RLIN would at least allow you to choose the record from the library you wanted, but with the master record, what you see is what you get.

Of course, the problem we are dealing with here is that our current standards are not being followed in many cases, and in consequence they are standards only "in theory": significant resources must be allotted to copy cataloging, or libraries are forced to give up. I fear that RDA will only complicate this further.

Naturally, in the greater metadata universe of information from publishers, with other bibliographic agencies and lots of others, we meet lots more duplication. When looked at from this viewpoint, the duplication is staggering. I don't know how, but someday, somehow I am sure it will sort itself out and become more efficient.

Monday, November 8, 2010

RE: Displaying Work/Expression/Manifestation records

Posting to Autocat


I think we've both made our respective points and have talked this out. We'll have to agree to disagree. My only addition here is where you point out:

On Fri, 5 Nov 2010 Kevin M. Randall wrote:
What FRBR describes can definitely be done with the unit record. This is explicitly shown in RDA database implementation scenario 3 (http://www.rda-jsc.org/images/pdf.gif) [Actually at www.rda-jsc.org/docs/5editor2rev.pdf JW]. All that FRBR has done is study what we've already been doing for a very, very, very long time, and essentially say, "This is what we've been doing, or trying to do; here are the things we have to include in the records to make sure we keep doing it successfully, and hopefully do it even better." Please note, FRBR is only telling us WHAT needs to be done (identify and relate the entities) and WHY. It isn't even attempting to tell us HOW to do it--that's up to other endeavors (e.g., RDA is trying to do that for content alone).
This document is about implementing RDA, and they are emphasizing that in order to implement RDA in your catalog, you don't have to implement the model as laid out in FRBR, as they clearly say and display in scenario 1: "In the first scenario, RDA data are stored in a relational or object-oriented database structure that mirrors the FRBR and FRAD conceptual models. Descriptive data elements are stored in records that parallel the primary entities in the FRBR model: work records, expression records, manifestation records, and item records," which eliminates the unit record. I believe the document is silent about implementing the FRBR model using separate unit records, but it does say that you can implement RDA within unit records.

At least we do seem to be in agreement that FRBR does not do anything essentially new, but continues what catalogers have been doing for a very, very long time. My stance is: that's exactly the problem. I don't think this is insulting or anything of the sort: I'm a cataloger too and have been for quite a while! But FRBR looks backward, in fact quite a long way back, and this is when we need to find some new ways forward.

RE: Why We Can't Afford Not to Create a Well-Stocked National Digital Library System

Posting to NCG4LIB

Karen Coyle wrote (Concerning David Rothman's Why We Can't Afford Not to Create a Well-Stocked National Digital Library System http://bit.ly/dz23Rj)
So... where would revenue for the publishers come in? He thinks that some kind of monolithic fee system would satisfy the publishers, but where would the money for a tempting fee come from? Libraries manage because they *don't* pay a per-use fee. He somehow thinks that "magic will happen" that will make all of this economically feasible. In fact, he thinks this would save libraries money. I think he engages in utopian thinking.
While this may be true, I think that utopian thinking is critical at certain periods of time, to open up thinking toward genuinely new possibilities. If someone had said to me 20 years ago that there would be millions of worthwhile materials available online at the click of a button, I would have deemed it "utopian," if not indeed crazy; but now the possibilities of the Internet Archive, Google Books, YouTube and so on are nothing less than science fiction come true--and in an incredibly short time. Google audaciously started scanning books, and now those books may be available to everyone very soon; in any case, I'm sure they will be sooner or later.

The publishers cannot stop it forever, but they would apparently like to. (See the Guardian's "Stars fall in Amazon protest about ebook prices," http://www.guardian.co.uk/books/2010/nov/03/ebook-prices-kindle-amazon-protests, where it turns out that people are angry with the *authors* because the *publishers* raised prices on the ebooks based on the "agency model," sometimes even higher than the hardback price. To me, this shows they cannot yet deal with the ebook model.) If I were an author, I believe this would make me very uncomfortable. Occurrences such as these demonstrate that the interests of publishers and their authors are not necessarily the same, although the publishers want to say that they are.

Publishers still have a vital role to play so long as there are printed books distributed in the traditional manner: i.e. physical items created and printed in one geographic area and sent around to different retail outlets, perhaps locally or around the world. This is not, and has never been, a very efficient model but in an entirely physical world, it was difficult to come up with anything else.

As this model changes through the gradual acceptance of ebooks and/or local print on demand, such as the Espresso Book Machine, it wouldn't surprise me if the actual functions of publishers gradually pass on to sites such as Google and/or Amazon since it makes more sense. When we discuss scholarly publishing, the situation is changing even faster.

It's clear that copyright will have to change, since it cannot deal with the fundamental nature of today's technology, which is based on sharing files: the way the internet works is by placing a copy of a file from one machine onto another. Even though this is how the internet has functioned since the 1970s or so, publishers still cannot deal with it. They have been trying to cope with these changes by restricting the consumer's rights to the use of their materials through incredibly long agreements you are forced to click on and accept without the chance of any discussion or negotiation. And then it turns out there are many things you cannot do with your electronic book, while you can literally do *anything* with your printed book except make a copy of it.

So, I see the situation with copyright as stuck in time and serving almost nobody's needs: not the authors, the public, or even the publishers themselves. Changing it will be a fight however, and libraries will only be spectators on the sidelines.