Friday, April 30, 2010

RE: If Academic Libraries Remove Computers, Will Anyone Come?

Posting to NGC4LIB

Unfortunately, I think I have to agree completely with Peter and the assistant university librarian at the U California/Santa Barbara, who think that if you took the computers out of the library, numbers in the library would be cut by half. Computers are a major draw anywhere on campus. Mobile computing is now the big thing, and especially so here in Italy, where it seems that everybody has at least one cell phone. I remember when a student said she was having trouble finding a book and asked where it was, and she showed me the record in our library catalog on her mobile phone. It looked horrible! I confess that I haven't found the energy to attack this problem and have preferred to ignore it... but when I see predictions such as mobile computing arriving within one year, http://wp.nmc.org/horizon2010/chapters/mobile-computing/ while I don't think it's particularly true, it frightens me.

Every student has a laptop, but they don't want to lug them around. A smaller, lighter device would change everything.

Of course, use of the library space has little relation to use of the "library". People may come to the library to study, or check their email, or just hang out, while you can now use the materials available through the library virtually. Still, places to hang out or study do not have to be in the library.

For these reasons, and the eventual approval of the Google-Publisher agreement, I fear that *physical* libraries are doomed to eventual extinction. The idea of "use of the library" must change in relationship to this. Physical libraries will most probably gradually turn into archives that will be consulted rarely, updated and maintained to ensure protection against some sort of catastrophe.

I've already written of my own experiences and thoughts concerning ebook readers and what it means for libraries. Does all this mean that librarians also have to disappear as well?

I don't think so, because in my experience, information does not organize itself, although it seems some on this list feel that something magical exists: what I call "the Philosopher's Algorithm." They feel that there is "the Perfect Algorithm" out there somewhere and it only needs to be found. Once they have it, they can run this algorithm against vast unorganized masses of information, and "the Algorithm" will organize it. I see this as similar to the alchemists who promised so much and searched so diligently for the Philosopher's Stone. :-)

But while we lack such a tool, unless we want to put our faith in the Google-type "black box," it will fall to humans to organize information. Additionally, I think we must assume that finding information that is both relevant and reliable will never be easy and people will need help. Almost all of these tasks will take place in a virtual environment, and the practice of librarianship has always been a personal, social activity. These are some difficult problems to wrestle with.

RE: fairies and subdivision

Posting to AUTOCAT

On Thu, 29 Apr 2010 12:18:54 -0400, William TB Fee wrote:

You make some interesting points, but I don't know if they are worth the costs of changing, see below:

>Two reasons I can think of for a change from 650 to 600. The first, which I have already named, is that the NACO folks, those who already do name authorities, would also be able to do authorities for fictional names as well.

I see this more as a problem of changing inadequate workflows and changing responsibilities, and not one of MARC format and cataloging rules. The NACO/SACO divide has always seemed artificial to me, but while the problem you mention is certainly a problem, it is only indicative of another, wider problem of the need to increase productivity, which needs to be addressed.

Still, anything we can do to improve workflow and productivity would be an improvement, so if it can be shown that changing these 650s to 600s would mean enough of an increase in productivity to offset the maintenance, this is a possible reason to adopt it. But, it is still a stop-gap.

>Secondly, by coding as a 600, there are extra sub-fields that become available. ‡g Miscellaneous information, ‡c Titles and other words associated with a name, ‡q Fuller form of name and ‡b Numeration (useful for fictional heroes who have had many "real names").

This I agree with less. Are we going to start doing:
600 1\ $aTarzan,$cKing of the Jungle,$d1912-
and
600 1\ $aJane,$cconsort of Tarzan, King of the Jungle,$d1912-
and
600 1\ $aBoy,$cheir-apparent of Tarzan, King of the Jungle,$d1913-
although he could be "dauphin."
(dates from Wikipedia)

Of course, I'm just joking, but there are those out there who take these sorts of things seriously and therefore, force us to do so as well. If these would become 600s, do we really think there will not be pressure to change groups of them to corporate bodies? It would be logical after all, and failure to do so would be illogical! So now we have to work on, Fantastic Four, X-Men, Teenage Mutant Ninja Turtles, Justice League of America. Will we then have to scour every nook and cranny for official publications of these bodies?

As I have said: cataloging is facing very serious problems today and those problems must be addressed. These problems do not lie in cosmetics. They lie in adapting to fundamental changes in technology, but more importantly, to basic changes in what the public wants and expects from information, and how they interact with it. Sooner or later, we are going to have to face up to these changes or risk becoming obsolete and irrelevant. If someone can point to a problem of access and/or comprehension of a record, then consider changing *anything*, by all means. If someone can show that our public is having problems finding Tarzan or Jane or Bullwinkle, we must discover the problem and find a solution. Changing 650s to 600s would change nothing at all for our users in this case, but I'm not overly worried about it because I haven't heard of anybody complaining that they couldn't find Betty Boop. What I am worried about is what I have heard from my students, who include those who come here from all over the U.S. and other places in the world: they don't understand even the concept of searching by author, title or subject. They only know Google's single box. *This* is a real problem to deal with.

So, if someone can demonstrate genuine problems--primarily among our users--concerning lack of access or genuine comprehension of the records, that is a problem; otherwise:

If it ain't broke, don't fix it!

Thursday, April 29, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Bernhard Eversberg wrote:

<snip>
They [i.e. Google - JW] know much better than that, obtaining the collective results of a century of library work for free. Not just the catalog data from OCLC (without which they couldn't be as good as they are with the known-item search), but it is obvious that to get carefully built and maintained collections for scanning is worth immensely more than using just any stuff found in the vaults of antiquarian bookstores or wherever.
</snip>

I really want to believe this, but I'm not sure if you could find general agreement in the current environment. I have been looking very carefully at the Google Book Search interface because I think that once the Google-Publisher agreement is eventually approved, the Google Book interface will become the starting point for research. Why wouldn't it be? I know that back when I was a student, I would have absolutely loved it!

But when you examine the Google Book interface closely, it's quite frightening for a librarian. For example, the "cataloging information" is mashed-up and placed on the very bottom of the "About this book" page, precisely the spot where Google knows no one will ever go. But above the bibliographic metadata are just the sorts of tools *right now* that librarians are talking about developing, e.g. links into word clouds, reviews, mapping into Google Maps, and so on. They already exist, and done very well, I might add.

In any case, it is rather difficult to know exactly what is being searched in the Google Book Search text box (does it search the bibliographic metadata?) and more importantly, how the results are arranged for viewing. And I don't know how many people use the Advanced search, which of course, does not use authority files, so you get different results for "Dostoyevsky / Dostoevsky / Dostoevskii" and on and on.
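To make the point about authority files concrete, here is a minimal sketch in Python of what that control does for a search. The heading, the variant list, and the record set are illustrative assumptions, not copied from any real authority record, and `search_author` is a hypothetical helper.

```python
# Illustrative authority file: every variant spelling points to one
# authorized heading. Heading and variants are assumptions for this sketch.
AUTHORIZED = "Dostoyevsky, Fyodor, 1821-1881"

VARIANTS = {
    "dostoyevsky": AUTHORIZED,
    "dostoevsky": AUTHORIZED,
    "dostoevskii": AUTHORIZED,
}

# Toy bibliographic records: each carries the authorized heading,
# whatever spelling appears on the title page.
RECORDS = [
    {"title": "Crime and Punishment", "author": AUTHORIZED},
    {"title": "The Brothers Karamazov", "author": AUTHORIZED},
    {"title": "The Idiot", "author": AUTHORIZED},
]


def search_author(query):
    """Resolve the query through the authority file, then match on the heading."""
    heading = VARIANTS.get(query.strip().lower())
    if heading is None:
        return []
    return [record["title"] for record in RECORDS if record["author"] == heading]
```

With this kind of lookup, `search_author("Dostoevskii")` and `search_author("Dostoyevsky")` return the same list, which is exactly what a plain text match without authority files cannot promise.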

What will be the use of local catalog work in an environment where people will be able to go to Google Books and get millions of things with a touch of a button? Why will we still expect these people to use our tools when they will find plenty to keep them busy in Google Books?

I don't know. I think we can offer many things Google will not be able to, but we have to reconsider what it is we do very deeply. We may encounter some very unpleasant truths. It won't be easy to grapple with much of this however, and that's why I appreciate people like Alex who can help us figure out some of these new directions.

RE: fairies and subdivision

Posting to Autocat

On Wed, 28 Apr 2010 15:27:59 +0100, Helen Buhler wrote:

>On 28 April 2010 at 8:56, Mike Tribby wrote:
>> "So, having set off this whole chain of arguments on footprints, authorship and the kid looking to steal material for a paper, I'll just say that I myself have no issues with a fictitious character in a 600. As Kevin M. Randall said, and Mac echoed, "a name is a name". Rules are the same for construction whether fictitious or not."
>>
>> I agree fully, however I'm mildly concerned about my anthropomorphic friends. Rather than Donald Duck (Fictitious character) in a 650 as it now is, in a 600 would it be:
>> Duck, Donald (Fictitious character),
>> which is like McDuck, Scrooge (Fictitious character)? Or would we retain the distinction between imaginary characters whose "surnames" are their species identification and those with more human-like surnames, yet place them in 600s rather than 650s?
>>

>Good point, Mike. Another one to add to the long list of RDA decisions still to be made... My stepmother would be saying that someone hadn't "thought it through". Very wise advice, I've found.

Of course, the broader question of: why? also remains to be answered. What precisely is the problem, encountered either by catalogers or by our patrons, that this change from 650 to 600 (which will demand at least a certain amount of resources to "fix") is supposed to solve? If there is no problem, why go through the hassle of "fixing" it? In any case, I don't see how this "solution" could solve anything at all. People will still search these things in exactly the same way, and only those few in the know will opt to select "search by subject" instead of by keyword. Still, even this will be the same.

In those systems where you can select "search by subject/personal name, subject/corporate name, subject/topic etc." the very, very few people who understand what this means and use it will notice a difference because they will now have to look up Scrooge McDuck under subject/personal name instead of subject/topic; something that will probably strike them as just as strange as it does now.
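A rough sketch of why the recoding matters only to the tag-specific indexes. The MARC tags 600, 610, 650 and 651 are real subject fields, but the index names and the helper functions here are assumptions made for illustration, not the behavior of any particular system.

```python
# Hypothetical mapping from MARC subject tags to the browse index they feed.
SUBJECT_INDEX_BY_TAG = {
    "600": "subject/personal name",
    "610": "subject/corporate name",
    "650": "subject/topic",
    "651": "subject/geographic name",
}


def build_indexes(fields):
    """Group headings by the tag-specific index each (tag, heading) pair belongs to."""
    indexes = {}
    for tag, heading in fields:
        index_name = SUBJECT_INDEX_BY_TAG.get(tag)
        if index_name is not None:
            indexes.setdefault(index_name, []).append(heading)
    return indexes


def keyword_hits(fields, word):
    """Keyword search ignores the tag entirely and only looks at the text."""
    return [heading for _, heading in fields if word.lower() in heading.lower()]


# The same heading, coded first as a topical subject, then as a personal name.
before = [("650", "Donald Duck (Fictitious character)")]
after = [("600", "Donald Duck (Fictitious character)")]
```

Here `build_indexes(before)` files the heading under "subject/topic" and `build_indexes(after)` under "subject/personal name", while `keyword_hits(before, "duck")` and `keyword_hits(after, "duck")` are identical.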

And for the .001% of this tiny remainder of users who can actually look through browse displays of these subjects and choose to do so, they will notice that Scrooge McDuck has moved from the "S" to the "M" section. Still, I hope the authority record will have a cross-reference from the old form and any potential dismay on the part of our patrons can be avoided. [I have since discovered my error: Scrooge McDuck will not change from current models, but Donald Duck will, or might. In any case, this is not all that vital to the argument I am trying to make. Thanks to Mike Tribby who pointed this out and kept me honest!--JW]

Again, I ask: Why? The only possible utility for changing from a 650 to 600, it seems to me, would be to have the option to reindex the catalog to include 600s when you search for personal name authors. I guess this would be for readers who think that Miss Marple existed and wrote her own books. I wouldn't suggest doing that, however.

This is the main problem I have with RDA: it is a lot of work for us to change our records and practices, to get retrained, but it certainly simplifies nothing substantial; the rules are still as arbitrary as ever. I don't see how it can possibly raise productivity in any way, and it results in no substantive changes that our users will ever notice. It doesn't make anything easier to find, and 95% of the changes are cosmetic. In fact, as I review the "changes," I think RDA actually stands for "Retype Designated Abbreviations."

In many of these cases of typing out abbreviations, it would be much more logical instead to insert special codes for, e.g. "no place of publication" and have each institution display the text it chooses, in a language the user chooses. This way we could begin to utilize the power of the new technology to make our work easier and to create a more comfortable environment for our users.
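A minimal sketch of the coded-value idea, in Python. The codes (`no_place`, `no_publisher`), the language keys, and the `render` helper are all hypothetical; nothing here comes from MARC or RDA except the familiar display strings themselves.

```python
# Hypothetical codes stored in the record, with display text chosen per
# institution and per language. "la" holds the old Latin abbreviations.
DISPLAY_TEXT = {
    "no_place": {
        "en": "Place of publication not identified",
        "it": "Luogo di pubblicazione non identificato",
        "la": "S.l.",
    },
    "no_publisher": {
        "en": "Publisher not identified",
        "it": "Editore non identificato",
        "la": "s.n.",
    },
}


def render(code, language, fallback="en"):
    """Return the display text for a code, falling back when a language is missing."""
    labels = DISPLAY_TEXT[code]
    return labels.get(language, labels[fallback])
```

The record itself would hold only the code; `render("no_place", "it")` and `render("no_place", "la")` show the same fact to different audiences, and nobody has to retype anything when the preferred wording changes.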

Cataloging is in a state of crisis, as I think we all know. Perhaps changing to RDF instead of current MARC format will make a substantive change, I'm not sure, but these are the areas that will matter to our users. Changing the *cataloging rules* makes a difference only to us and not to the public or our institutions, as the example of Scrooge McDuck demonstrates.

Keep in mind the Cooperative Cataloging Rules!
http://sites.google.com/site/opencatalogingrules/

Wednesday, April 28, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Alex,

I think we are beginning to see eye-to-eye again. I agree that library methods are mainly 19th-century, our processes are 1970s, and the *public face* of what we are building is divorced from the needs of our users. I don't think FRBR or RDA solves or even seriously discusses any of this, which is why I think it's all going down the wrong road.

So, the power that is represented by our authority files should be unleashed into... what? Topic maps? OK, I'm all for it! How about putting it all into a wiki or lots of wikis and let the world play? For example, the Mark Twain record is really useful and could be made much better, but not in its present format, and it's locked away in a closed database anyway. Let these things out for general experimentation, see what works and what doesn't, and go from there. This is impossible to do while the data remains walled off however. If it were up to me, all of it would be available in a couple of hours, but...

<snip>
I doubt many of you have deep knowledge of what is possible these days through various forms of AI, clustered facets, semantic models and various degrees of formal logic over reasonably fuzzy structures of knowledge management. Of course it's easy to counter this by saying "well, show me, I haven't seen it." Well of course you haven't ; no librarian specialist has joined up with a systems expert to create it yet (or, at least, I haven't seen it :).
</snip>

I can only hope something like this happens soon. Still, while I am sure I don't know everything about AI, I don't think that matters to the point I am trying to make, which is: currently, it can't do the job that we, as librarians, feel needs to be done. I may be wrong, but I haven't seen it yet, and I haven't seen anything even close. If you want us to believe you, it must be proven to us, even in a prototype. If we bring up the example of how we have controlled Mark Twain's name and ask if that control (not the method but the control) can be replicated in some way with AI, don't try to convince us that retaining that control is not important. Show us what the machines can do, and it is then our job to decide if it's good enough or not. But I am not just going to believe it.

And yes, it is precisely librarians and I think catalogers who should decide because they know what to look for, and not leave it to an inexperienced and naïve public. This is like some of the spiritualists who would claim they could talk to the dead and make tables walk around the room. They could fool lots of people, including professors with vast experience, but they couldn't fool Harry Houdini. It's the specialists in describing and organizing information (catalogers) who know the pitfalls.

Still, the *only* reason we do have the control we currently have over our materials is because of our adherence to the standards. If we throw off the standards, we throw off the control. Therefore, it must be clearly thought out.

<snip>
> Welcome to my world! This is what bibliographic control is all about and it is not simple.

Well, is that because of the tools and methods chosen, or because it's an actual difficult area? The reason I'm asking here is for example the dreaded way of having birth and death years (sometimes, or too often optional, or wrong) as part of the name as an identifier for entities which you have to match and merge in order to make sense out of (and most of the time get wrong). In the past these identifiers were the best one could come up with because there were no computers, no decentralized means of resolving them, no authority control across different bodies. However, now we do. See where this is going?
</snip>

I agree with a lot of this. For example, I think the Wikipedia disambiguation page is far superior to our methods, but still, the actual task itself of "disambiguation" is difficult, although I agree it can and must be made much simpler. We must always keep the standards in focus though; if we were not involved in a cooperative task, we would not have to follow standards shared with others, and we would be free to do anything we want. But shared standards involve other responsibilities.

Should the standards change? Absolutely, and RDA purports to do that, but in my opinion, it's just more of the same using other vocabulary, which is why I initiated the Cooperative Cataloging Rules.

RE: After MARC...MODS?

Posting to NGC4LIB

Alexander Johannesen wrote:

<snip>
First Jim : "Better" and "reliable" are synonyms to the subjective user, and I'm always writing from his or her perspective, never from the librarian perspective, which perhaps not only clarify my language and stance, but why we seem to have these contradicting views which when scrutinized become agreement. I'm sure that "reliable" is - how to put this? - a better goal, however I don't think I see users worry about the operating word here, nor what your goal is per se.
</snip>

This is an important point that separates certain world-views that I must insist is both understood and accepted: "better" and "reliable" are not synonyms at all, and I think the general public can understand the difference when it's explained to them. It is much like many people I know who absolutely hate to drive a car: they hate maintaining the car, fighting the traffic, obeying the traffic signs and rules and so on and so on. But, for many it's the only practical way to get anywhere. As a result, they have no choice except to learn how to drive a car, learn about putting air in the tires, learn the traffic rules, follow the signs, read maps...

"Better" doesn't even enter into the equation. Certainly, people can imagine lots of things as being "better" than this task they absolutely despise, but all of that remains in the realm of fantasy. This is how you drive a car. Period. And there are very good reasons why these rules and methods for driving a car exist, because otherwise if everybody were left to do whatever they think is "better," and ignore one-way streets and speed limits, let their brakes go to pieces, the result would be chaos. If you want to get from here to there reliably in a car, you have no choice except to do it a certain way. And it's very complicated.

This was the situation in the library and its catalog before keyword access. People learned (or were supposed to learn) how the catalog functioned, learned the "rules of the road" and followed along as they were supposed to. It was just like the situation with a car: the tools made were designed to get the patron to the materials they needed reliably, although it was not easy at all. They had no other choice since there was no other access into the collection except to browse the shelves more or less helplessly (if it was open stacks), and many chose this option.

When keyword was introduced, it suddenly allowed people to drive "off the roads" and this was experienced as a sense of freedom. Librarians (I believe), while realizing keyword was introducing a bit of chaos into the catalog, shared this feeling of freedom, seeing it as a very useful addition to the normal means of access since the reliable means of access still existed. Yet in reality, as time passed these "normal means of access" became less and less known among our public (and among many librarians as well) and as a consequence, less and less used, and the former "reliability" was practically forgotten among them. Yet, the catalogers still maintained and added to these increasingly empty streets, pretending that they were being used as much as ever. This is obviously a practice that cannot continue indefinitely.

So again coming around to the point, it seems as if your suggestion is that we abandon these roads that we have been creating and maintaining over the generations in favor of these "wonderful and new" tools that are being created today. I, and many of my colleagues, say that the new methods cannot replicate what the old methods can do--and still do thanks to the fortunate circumstance that they have continued to be maintained throughout the years--and would prefer to adapt the old roads to the needs of the new environment. In this way, they could be rediscovered by the public. Naturally, the new methods you mention can be adopted and added, but the task of "reliable access" needs to be taken up anew and put at the forefront.

<snip>
Apart from that I don't have much to say as you didn't really address many of the larger points I was making. :)
</snip>

I ran out of time, and I am running out again. But let me take one of your points:

<snip>
... because it's a straw man. I can do the same to you; if you searched for Mark Twain you will miss out all of that which was written by Samuel Clemens or Josh or Thomas Jefferson Snodgrass.
</snip>

There's a lot more than that. To see what you are missing and how to use the catalog correctly, see (you have to click on "Authorized and Notes," then "Authority Record" and then "Labelled Display" (Don't blame me. I didn't design it! :-) ):
http://authorities.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=Twain,%20Mark,%201835-1910&Search_Code=SHED_&CNT=50+records+per+page

So, we see many other forms that will lead a searcher to the authorized form of "Twain, Mark, 1835-1910," plus the absolutely critical note:
For works of this author written under other names, search also under
Clemens, Samuel Langhorne, 1835-1910,
Snodgrass, Quintus Curtius, 1835-1910
Louis de Conte, 1835-1910

This is because there is a bibliographic concept of "separate bibliographic identities" and it manifests itself in AACR2 rule 22.2B with the LCRI at: http://sites.google.com/site/opencatalogingrules/22-2b--choice-among-different-names--pseudonyms where it says contemporary authors have separate bibliographic identities, and therefore will have separate authority records for each name, plus a contemporary author is anybody who lived into the 20th century. Twain died in 1910, so therefore, he falls under this rule, and there are some nice guidelines to follow in this RI.

There are also other forms being brought together in the VIAF:
http://viaf.org/viaf/50566653#Twain,%20Mark,%201835-1910

Welcome to my world! This is what bibliographic control is all about and it is not simple. I personally disagree with 22.2B and the entire idea of separate bibliographic identities, but it doesn't matter what I think. I must follow the rules, and that means that anybody who uses the library tools must follow them as well. No single person can know all of it.

So, the example I gave in one of my messages of searching "Dostoyevsky" and getting the typical Google response "Did you mean: Dostoevsky" seems rather facile in comparison.

I think these constructions are far too important to be discarded and it has yet to be shown that the new and wonderful tools, as promising as they may be, can provide even a hint of this control.

Tuesday, April 27, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Alex,

I think we are actually on the same side, more-or-less, but there are some basic points of difference in our world-views. This may, or may not, have to do with the world-views of our respective roles as cataloger or systems person.

A Disquisition on the Idea of "Better"

It seems as if the goal of "better" search results should be accepted as a given. What could be more obvious than that? But it is important to understand above all that this is not true for a library catalog. It's also not true for many other things in our world. That may seem strange, but the evidence is all around us.

Library experience has shown that "better" is far too subjective to be a practical goal. What is better for me may very well be worse for you. As an illustration, I have always considered the process of cataloging to be similar to that of a mechanic (I know this may irk some out there, but coming from a working-class background, I consider comparison to a master mechanic to be a great compliment). The mechanic working on your car seeks to make your machine go the way it is supposed to go. If he tries to make your car go "better," it must give you pause because then it will be going in the way it is not supposed to go. While it may end up genuinely going "better," it may only seem that way to you and it may actually be worse in ways you don't even know about, ways that will appear at unfortunate moments: polluting the atmosphere, being more prone to going out of control on curves, or in other ways.

This problem with coming up with and implementing new ideas was something I pointed out in a thread I began about "lack of imagination" among catalogers. Boy, have I been excoriated for that one! But I still maintain it: the job of catalogers is to adhere to standards--just as mechanics must--and therefore, flights of imagination cannot be included within the system where those standards function. There can be brilliance within the system, and I have witnessed it many, many times, but not "flights of imagination." This may be good or bad, but if catalogers want to let their imaginations free, which I think is vital today, they must work outside the system where those standards function, otherwise, they destroy it.

Library catalogs aim for reliable search results and not the "best" results. If I am interested in "love" and search in a library catalog with full cross-references, and I do it correctly, that is, following the methods laid down in the 19th-century for using card and printed catalogs (please keep reading!), I look under "L" browse to "Love" and find:

Narrower Term: Attachment behavior.
Narrower Term: Communism and love.
Narrower Term: Courtly love
Narrower Term: Courtship.
Narrower Term: God (Christianity)--Love.
Narrower Term: God (Christianity)--Worship and love.
Narrower Term: God (Hinduism)--Worship and love
Narrower Term: God (Islam)--Love [proposed]
Narrower Term: God (Islam)--Worship and love.
Narrower Term: God (Islam)--Worship and love [proposed update]
Narrower Term: God (Judaism)--Love.
Narrower Term: God (Judaism)--Worship and love.
Narrower Term: God--Love.
Narrower Term: God--Worship and love.
Narrower Term: Love, Maternal.
Narrower Term: Love, Paternal.
Narrower Term: Marriage.
Narrower Term: Platonic love.
Narrower Term: Summer romance
Narrower Term: Unrequited love.
Narrower Term: Yoga, Bhakti.
See Also: First loves
See Also: Friendship.
See Also: Intimacy (Psychology)

I think this is a provocative display that actually opens my mind to new possibilities that otherwise, I most probably would never have thought of. (Love to Summer romance, or to Bhakti yoga!) When I look at the individual records under the heading/concept "love", I can be assured that I am looking at all items in my local collection (but not journal articles) where, no matter what language the item is in or when it was published, the topic of "love" equals 20% or more of the content (that is, if the catalogers behind the scenes have been trained to do this job and continue to do their jobs correctly, which is becoming increasingly problematic). Also, since there are so many books where 20% is on love, this topic has had to be subdivided into myriad groupings. And it is really handy to be able to separate out other things such as Love Act (Musical group) and Love actually (Motion picture). Please be aware that I also cannot use my imagination and decide that "Wedlock" is "better" than "Marriage" and enter it instead, because then the whole system breaks down.
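The mechanics behind that display can be sketched in a few lines of Python. The narrower terms are a subset of the list above; the "Wedlock" to "Marriage" reference is an assumption made for illustration, and `assign_heading` and `browse` are hypothetical helpers, not part of any cataloging system.

```python
# A small slice of the controlled vocabulary shown above.
NARROWER = {
    "Love": ["Courtly love", "Courtship", "Marriage", "Platonic love", "Unrequited love"],
}

# Assumed cross-reference: the unauthorized term points to the authorized one.
USE_INSTEAD = {"Wedlock": "Marriage"}

# Every broader term and every narrower term is an authorized heading.
AUTHORIZED = set(NARROWER) | {term for terms in NARROWER.values() for term in terms}


def assign_heading(term):
    """Follow any cross-reference, then refuse terms outside the vocabulary."""
    term = USE_INSTEAD.get(term, term)
    if term not in AUTHORIZED:
        raise ValueError(f"{term!r} is not an authorized heading")
    return term


def browse(term):
    """Produce the browse display lines for one heading."""
    return [f"Narrower Term: {narrower}" for narrower in NARROWER.get(term, [])]
```

The design choice is the whole point: `assign_heading("Wedlock")` quietly becomes "Marriage", and an invented term raises an error instead of creating a new, disconnected heading that no browse display would ever lead to.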

Please understand, there can be no dispute about what I just wrote here. These are facts about how the machine works. These facts are being forgotten, and disputing them would be similar to arguing with your mechanic that the pistons shouldn't work this way, but some other way. While you can have whatever opinion you like, it doesn't change the fact that this is the way the pistons are supposed to work. Creating new records that fit into this system correctly is also highly complex.

Therefore, the library catalog does not aim for "better" results, but "reliable" results and "reliability" is based on a whole number of factors. I maintain that if Google is aiming for "better" results, that's fine but "better" still remains a terribly subjective term that in practice equals "the result that makes me happy, but I may be happy only because I don't know what I may be missing. If I knew what I was missing, I might not be happy, but if I don't know, then ignorance is bliss."

Now to change focus: can and should this traditional library catalog model, which relies on adherence to strict standards, be maintained in the new environment? This I do not know and I don't think anyone will know without some kind of genuine research and "flights of imagination". I will say that it is important to search by differentiated "personal, corporate, and geographic names" and "subjects" (I don't know so much about titles), and that it would be very nice to be able to distinguish one Michael Gorman from another Michael Gorman.

The current method of browsing headings is definitely 100% obsolete, and has been since about 5 minutes after the implementation of keyword searching in the catalog. I would certainly like to see some new attempts to make the power of our headings more obvious in the new environment instead of just abandoning them. At last we are getting some tools such as Aquabrowser, but I personally find them unconvincing. Vivisimo and Grokker are on the right track and have also provided some interesting attempts but I haven't seen anything with library-type headings.

If there are no imaginative attempts to make these traditional controls more useful today however, these standards probably will be abandoned as useless. These imaginative attempts presuppose open data however, and this has not been forthcoming from the library community. For example, while I applaud the id.loc.gov project, it was much too late, and we are still stuck with textual strings that must be browsed to be understandable, and therefore, we are still stuck in the 19th-century. If these incredible files had been put out 10 years ago for open development, they could very well have served as the backbone for the Semantic Web, but now, this will probably shift to something such as dbpedia, where I think we must get on board.

This is enough for now. I think that "better" will probably be the goal of the newer systems, although it must obviously be qualified by many, many caveats.

Monday, April 26, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Peter Schlumpf wrote:

<snip>
Anyway you look at it, both opacs and Google simply match patterns of text and return the results. To say otherwise for either is to suggest some sort of inherent intelligence in the system itself.
</snip>

While this is correct, it needs to be kept in mind that--for better or worse--this is *not* the way the OPAC, or the online public *card catalog* is designed to work. This is part of the knowledge that is being lost.

During the time of the card catalog and before, people never searched "text" like they do today because it wasn't possible. They browsed cards that were arranged in a conceptual manner. You could arrange these concepts primarily in two ways. First was a classified way as they did in many European libraries, and so you would search for "dogs" under animals--vertebrates--mammals... (whatever the classification is). In the US, they opted for the "Dictionary catalog" which used alphabetical order (and then some classified arrangements after that) so that you searched for "dogs" by going to "D" and browsing to "dogs" and perhaps finding a cross-reference to "Canines."

This is still what we do. Isn't that freaky?! This is why I say that taking a record outside of the catalog it belongs to is a bit like taking a fish out of water: it can't really exist on its own because it is so reliant on so many other things. It becomes more or less senseless and will die on its own.

This is why you can search concepts in the catalog, and it shows that people searched concepts before OPACs. In fact, they had no other choice. When computers arrived, catalogers used them as a more efficient way to *make cards,* not to find and access, and when online access came about (Online Public Access Catalogs), everybody just kept on doing their same old thing, while the systems people threw on keyword searching. This actually messed up the traditional, conceptual way to search the catalog, and therefore the card catalog traditions disintegrated in the online environment. Nobody talked about it much; it was never fixed but just ignored.

Does this need to be changed? Of course (and keep in mind that RDA and FRBR do not change these functions in any fundamental way! That's one of the many reasons why I maintain they must be reconsidered), but the access the card catalog provided, and still provides through the OPACs, can still be extremely powerful, so long as they are used properly.

RE:... After MARC...MODS?

Posting to NGC4LIB

On Thu, 22 Apr 2010 21:53:44 +1000, Alexander Johannesen wrote:

>On Thu, Apr 22, 2010 at 21:18, Weinheimer Jim wrote:
>> How about this? Pretend that you are interested in the history of black people in agriculture in the United States. Tell me how you would go about searching and retrieving information in Google or Google Scholar or a related tool using full text.
>
>Well, I pop "black people in agriculture in the United States" and get back 1.3 million hits of which, one can assume, there is valuable information. I go through them, and copy and paste into my research document anything that smacks of gold.

Sorry I didn't respond to this earlier since I only saw it now. Your results and reactions are extremely interesting.

First:
You say that you get back "...1.3 million hits of which, one can assume, there is valuable information. I go through them, and copy and paste into my research document anything that smacks of gold." What??? You go through 1.3 million hits???? You are a really fast reader! Sorry, but that one I will never believe. And this is one of the main problems that my students find when they use a tool such as Google in practice: the result is completely out of control. When they have gotten the same types of results for simple purposes relating to their own general interest, they don't care so much about the search result since it can be fun to "surf." But when they are grappling with something important such as writing a paper, where they have to stick to a specific topic, and they could flunk out if they quote something stupid, they see Google as something much less useful and more similar to a toy. And it frightens them.

They feel there may be something there in the search result of 1.3 million (and I add: or not there, see below), but the results are in a completely unpredictable order that changes constantly, based on the number of links to an item (and thus, place #1 is determined primarily by bloggers); in addition, a number of other factors that determine ranking are business secrets of Google. It has been shown without a doubt that this order can be manipulated for all sorts of purposes (for obvious examples, see Google-Bombing in Wikipedia, but this is being done constantly in far more subtle ways). As I tell my students, the Google use of the term "relevance" does not at all equal their own understanding of the term "relevance" and they should not confuse the two. The Google use is a secretive business term but one chosen strategically to make their customers more comfortable. It works.

Second:
What exactly are you looking at when you see the results from "black people in agriculture in the United States" and also, what are you not looking at? Well, you miss many original documents, because the term "blacks" was not the word used for African-American people in agriculture in the early United States. There were other terms used, some highly insulting today. When a cataloger puts in metadata, it's a completely different matter. In a library catalog, you don't have to search these older terms, but in full-text you do, or they will never come up in the result--and you will never realize it. As a result, you miss entire categories of really useful information.

Other problems: "agriculture" is unnecessarily limiting. You would also have to search at least "farming" but probably others as well. Searching "United States" will miss most of the information in the individual states, where there will be lots of possibly the most interesting resources.
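The point about "agriculture" versus "farming," and "United States" versus the individual states, can be sketched in a few lines of Python. This is only a toy illustration: the three documents, the single subject heading, and both function names are invented for the example, and a real catalog would assign more specific headings than this.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str                                   # words that appear in the item itself
    headings: set = field(default_factory=set)  # subject headings assigned by a cataloger


# Hypothetical mini-collection; the heading string is illustrative, not taken from any real record.
DOCS = [
    Document("A history of agriculture in the United States", {"Agriculture--United States"}),
    Document("Cotton farming in Georgia after 1865", {"Agriculture--United States"}),
    Document("Tenant farmers of Alabama", {"Agriculture--United States"}),
]


def full_text_search(docs, words):
    """Return documents whose text contains every query word (case-insensitive)."""
    words = [w.lower() for w in words]
    return [d for d in docs if all(w in d.text.lower() for w in words)]


def heading_search(docs, heading):
    """Return documents to which the given subject heading was assigned."""
    return [d for d in docs if heading in d.headings]


text_hits = full_text_search(DOCS, ["agriculture", "united states"])
heading_hits = heading_search(DOCS, "Agriculture--United States")
print(len(text_hits), len(heading_hits))
```

The full-text query finds only the one item that happens to use both words, and nothing in the result tells the searcher that the other two exist. The heading search finds all three because the cataloger, not the searcher, did the work of relating the different words to one concept.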

I won't discuss "quality of information" here, which is another huge problem that people have to face every day. You say, "anything that smacks of gold" but how am I supposed to know that? Also, I won't discuss exactly what Google is and is not searching when you do a search, because this is another of their closely-held secrets.

So we see that what at first glance appears to be extremely simple (typing a few words into a box and getting a result) is incredibly complex and terribly limiting. It takes an expert to understand how limiting it is. Google has done an excellent job of making it seem to be simple, and they have done this by designing a tool to make people happy, but we should not confuse this with providing results that are reliable and comprehensible, which is what people really want. And it has serious consequences, as students will tell you.

I would suggest that when people see matters in this way, they will see the immensity of such a task, and that they will have a bit more respect for the work done in catalogs, which smooths the way for people. But of course, "we won't get no respect!"

Perhaps this is too detailed for your purposes, but it certainly is not too detailed for the students I work with, who are being serious about it and, as I say, terribly worried about it since not dealing with it could derail their entire careers.

Library catalogs are designed on different principles and have strengths in exactly these areas, and this is why I think that creating a tool that would bring the strengths of library catalogs together with full-text retrieval tools would be the best. But simply ignoring what our tools can do would be the same as allowing superstition and bias and even censorship to run rampant.

RE: Designing future "catalogs" - where do the users come in?

Posting to NGC4LIB

Alexander Johannesen wrote:

<snip>
B.G. Sloan <bgsloan2@yahoo.com> wrote:
> In short, how do we design systems that help users find what they really need, rather than designing systems where we librarians assume we know what users need?

There's an approach that is rarely used but that I have some experience in doing :

Ask them.
</snip>

Absolutely. This is the major problem with FRBR and the user tasks upon which everything is supposedly based: I don't think anybody would ever say that the FRBR user tasks are what people want and/or need in the current information environment. Nobody has ever pointed to any research that shows FRBR builds a tool that users want. If people question themselves about their own searching habits, I doubt that anybody will say, "What I really want and need are the FRBR user tasks." This doesn't mean that people no longer need to be able to find "stuff" by their authors, titles, and subjects, but we need to recognize the startling fact that in the era of "Google-one-box" services, the very idea of being able to limit a search by "author" or "title" or "subject" is becoming lost. And people on this list know what I think of the use and even artificiality of "Work-Expression-Manifestation-Item," which have no relationship to user tasks (except for those very rare users who need the 1917 Harper's edition of Huckleberry Finn; but let's face it--if somebody needs that level of detail, they probably need more information than what a regular library catalog gives them). And anyway, full-text searching has made clear that people want and expect many other powers as well.

But on a broader level, if we define a catalog as an "aid for finding different resources that exist within a local collection," I think this needs to change because now with the web, where people can see a lot more than ever before, the "local collection" is necessarily limiting from the very beginning. And our users see it clearly. Much more useful for our patrons would be for everyone who feeds into the catalog (i.e. those who select materials for cataloging, plus those who do the actual cataloging itself) to think in broader terms than a "local catalog," envisioning something more akin to a "bibliography."

In this way, selection turns into something other than, "what is being paid for and/or housed in my individual collection and other institutions where I have a special relationship" into "what are the useful materials available to the members of my local community no matter where they happen to be and if they are free or not."

Showing people what is really and truly available to them and not just what is in a local collection is what I mean by thinking in terms of "bibliography" which is not limited to any one collection. Only when you know what is really out there can the individual or institution decide if it is worth the costs of getting access to it. One major step further in this direction would be to create an "annotated bibliography," and this could be done easily enough with Web2.0 tools.

Making such a tool would simply be too much for librarians to do alone, and we would have to enlist the help of scholars, teachers and other experts for the task of selection; for that of description, access, and record maintenance, we would need lots of other help. Of course, we would have to rethink our current standards into what could be done practically to ensure some standards of quality in these records that must be built cooperatively with many other communities out there.

For example, what is the purpose of the "bibliographic description" when the item itself is instantly available with a simple click? While I don't doubt there is a purpose, it must be seen as fundamentally different from the purpose of the ISBD standard for a book that is available only after trudging through the stacks, or after spending two days retrieving it from an annex location off-site, or through a very expensive ILL that involves both staff time and funding from a number of agencies.

Using a tool such as this would probably demand all different kinds of interfaces: one for novices, another for experts, or perhaps even by specialty: for classicists, for architects, for physicists, ...

To me, viewing the problem in these ways would represent a real "maturing" of the local catalog into something greater than it is now, and would create a tool that would become vital to our users, and I think rather quickly.

Friday, April 23, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Alex,

I think this exchange has been extremely valuable since it illuminates an area of genuine difference in the views of catalogers vs. programmers.

I tried to elicit a real-life example with my request for how someone should go about finding information on a specific topic. This happens thousands of times every day when people ask a reference librarian to help them find specific types of information. If you are only mildly interested in a topic, it's one matter, but if you are trying to write a paper for class, or keep from looking like a fool when talking on topics with others who may be experts, you take a more serious approach.

Students have tremendous problems with this as they progress in their studies at university. Research has shown that some 80% or more of people rate their searching abilities as "very good" or "expert". And they may be, for a mundane task such as finding the height of Mt. Everest, getting somebody's email address, or finding and buying a new iPod on Amazon.com. (This is more or less what librarians term "ready reference.") But once they are confronted with the task of finding information for a class paper--even on extremely simple topics such as the one I gave you--they discover they are helpless and don't know anything at all. They don't know where to begin; they don't know how to end; they don't know anything except to type different words into a box and it's not working.

This is when they come to the reference librarian for help, and in my experience, they are more or less in a state of shock and totally panicked. This makes the librarian's job *especially difficult* since your number one task is to calm them down. Still, I think a lot of their panic is from suddenly having to face the undeniable fact that in a realm where they believed they were experts, it has become frighteningly obvious that they don't know what they are doing at all.

One thing that traditional librarians always had--and yes, I am bringing up that dreaded *past* again! :-) -- was the control in a library catalog. They knew that within specified limits, and those limits were clearly laid out, they really could find *all books* by Dostoyevsky in the local collection, no matter in what language or how his name appeared in the item. This is something you--let me be very clear about this--you *absolutely, positively, cannot do* using only full-text tools.
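To make the Dostoyevsky example concrete, here is a minimal sketch in Python. The four records, the field names, and the form of the authorized heading are assumptions made up for the illustration; a real catalog stores this in authority and bibliographic records, not in a list of dictionaries.

```python
# Illustrative authorized heading; every record for this author carries the same form.
AUTHORIZED = "Dostoyevsky, Fyodor, 1821-1881"

# Hypothetical records: the name is transcribed as it appears on each item,
# while the heading is the single controlled form chosen by the cataloger.
RECORDS = [
    {"title": "Crime and punishment", "name_on_item": "Fyodor Dostoyevsky", "heading": AUTHORIZED},
    {"title": "Schuld und Sühne", "name_on_item": "Fjodor Dostojewski", "heading": AUTHORIZED},
    {"title": "Преступление и наказание", "name_on_item": "Ф. М. Достоевский", "heading": AUTHORIZED},
    {"title": "Crime et châtiment", "name_on_item": "Fédor Dostoïevski", "heading": AUTHORIZED},
]


def by_text(records, name):
    """Match the typed name against the name as it appears on the item."""
    return [r for r in records if name.lower() in r["name_on_item"].lower()]


def by_heading(records, heading):
    """Match on the controlled heading, independent of language or spelling."""
    return [r for r in records if r["heading"] == heading]


text_matches = by_text(RECORDS, "Dostoyevsky")
heading_matches = by_heading(RECORDS, AUTHORIZED)
print(len(text_matches), len(heading_matches))
```

Typing the English spelling retrieves only the English edition. The heading retrieves all four, which is the sense in which the catalog can promise "all books" within its stated limits.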

So, the library catalog has always been arranged conceptually (an error shown in the statements of many non-catalogers who discuss our "textual strings" and ignore their conceptual purpose), and therefore, within known parameters, you can find "everything" about the concept "cats" in a library catalog. See for example, this search in LC's catalog with some very nice suggestions for other searches:
http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=cats&Search_Code=SUBJ_&CNT=100&hist=1

Browsing in this way shows clearly what is available in the LC catalog on cats and even provokes the searcher at times with the unexpected, e.g. Cats--Caricatures and cartoons, or Cats--Humor, or Cats--Psychic aspects. This is how a catalog is arranged: by concepts, and how it is supposed to work. When I have shown people how it works, they find it quite powerful and extremely fast. Also, with our conceptual controls, we can separate the concept "cats" from other conceptual entities that are not related but use the same text, e.g. Cats & a Fiddle (Musical group), Cats & dogs (Motion picture) Cats & Jammers (Rock group) and so on. Try searching Cats in Google (it would probably be best to turn on the Safe Search, but perhaps not) and compare it to the subject search in LC and with an open mind, consider if the Google result is anything that could be useful or merely chaotic.

Google and other full-text tools do not have this kind of control, and librarians miss those controls a lot, I assure you. This is why experienced librarians often look askance at the new tools.

What I just outlined is becoming lost knowledge, especially so among the public, but also increasingly among younger librarians. But I submit that ignoring these controls that *are not replicated anywhere else* or dismissing them out of hand actually limits our imagination as to what can be done.

The method I sketched above (which becomes far more complex in practice) is hopelessly obsolete in many ways that I won't go into here, and this I readily admit, but the control it provides is not.

The future I foresee that would be best for our users is not to abandon the controls found in a catalog but to adapt them for the current "user needs" (whatever they are determined to be, and not those determined in FRBR), perhaps dumping some things, and working with the additional access provided through ever-improving full-text searching capabilities.

Perhaps someday, the conceptual work done by human catalogers can be done automatically by machines. But nothing I have seen makes me think it will happen anytime soon. If our controls should vanish, and this may be a possibility with the economic crisis and the generalized view of "traditional cataloging = continued obsolescence," it will impoverish us all.

Thursday, April 22, 2010

RE: After MARC...MODS?

Posting to NGC4LIB

Alexander Johannesen wrote:

<snip>
[...] they [users--JW] don't give a monkey's bottom about the minutiae of the twilight-zone between AACR2 and ISBD; that stuff is only interesting to people who are several steps removed from both reality and the evolution of information science. It's nonsense, it's piffle, it's caring about where you put the life-guards flags up on the beach when the tsunami hits. This stuff is something nobody needs nor cares about, not anymore, except die-hard catalogers who find some sadistic pleasure in rules and regulation that don't mean anything to anyone else.
</snip>

I encounter this type of reaction among non-specialists--which includes many librarians who have never done cataloging by the way--and I find it highly revealing. It would seem to be logical that if there is a general agreement among specialists that certain capabilities within a machine or a system are vital, there would be some level of respect paid to their experience; that although there may be many things that I, as a layman, personally do not understand, I should not conclude that I am in the midst of some type of conspiracy among members of a "modern guild" who are really only trying to retain a dead-hand control over processes and materials. It would be more logical to think that perhaps the knowledge this group has built up over the years and decades and even centuries is a type of collective wisdom that I do not know--and cannot know without a lot of work; that what they know and what they have specialized in over their entire careers should not be dismissed out of hand.

When that specialized group is made up of mechanics, or bakers, or even computer programmers, a certain respect and deference is given to their expertise and experience, but this does not happen with cataloging. Everybody always knows better than a cataloger. To be fair, it has probably always been that way. Witness the scathing attacks Antonio Panizzi had to endure at the British Museum in the mid-1800s. In my research at Princeton University, I discovered that the creation and maintenance of the library's catalog was at first the responsibility of the university's president, who got tired of it; he delegated it to the faculty, where it deteriorated badly until the entire affair disintegrated altogether in the 1870s with one poor soul literally going insane(!). The faculty finally admitted defeat and inability to deal with it (! although they maintained it was lack of interest), and they hired their first real librarian. I still am in awe of the incredible knowledge and abilities this man displayed, and his boundless capacity for hard work. He is probably the greatest practical cataloger I have ever come into contact with.

The Google-type tools today seem to be easy but that is the way they are designed to *appear*. While these tools are incredibly useful and I use them all the time, the cataloger understands that this "easiness" is actually deceptive. So, if you do a search for "mark twain" or "world war i" or the memoirs of a fighter pilot during the invasion of Iraq, or whatever, the search seems to work because you almost always get something that makes you "happy." This is where it ends for the layman, but for the cataloger, it's only the beginning. For the cataloger, it is a very serious issue what people mean when they search something like "world war i" or "blacks" or "Dostoyevsky" and then instead of just making the searcher "happy," to relate that request in a reliable way to what they retrieve. This is not easy.

Research has shown, and my experience confirms 100%, that people "trust" the Google result, although it is totally a black box. I cannot know what is really in the "Google collection"; I cannot know the intricacies of how Google ranks anything. All that is completely secret, rather like the guilds I mentioned above. Therefore, the Google result cannot be checked for reliability. Things are quite different in a traditional library catalog. The technical issues in this regard are highly involved and I have gone on too long already, but this is a complex matter and cannot be understood in five minutes, I assure you.

The non-specialist has never thought about any of this. When I do a Google search during reference work or in information literacy classes, I pause at the Google result and ask, "What are we looking at?" Nobody I have met has ever thought about what is contained in the Google result or why something is number one. This is an issue of prime importance: the retrieval of information that human beings can rely upon so they are not subject to a torrent of misinformation campaigns, spin and superstition. I think that's important.

Enough said on that, but this is a very sore point with many catalogers. While we are supposed to respect everybody else, they can all say we don't know anything at all and dismiss everything we say as backward and useless. Many take offense, although I am more open.

Now,

<snip>
What I just don't get is that catalogers and librarians know so darn well how their knowledge is needed, how the future needs people who can guide and help us all in through the informolasses, yet there's no movement towards doing so on a grand scale! You all sit and dick around with quibbles of whether to use MARC or MODS, and talk about how FRBR and RDA fits into your world. I'll tell you how it fits into *our* world;
</snip>

I think this is very well-said. Do we need change? Of course. Still, at some level we need to discuss nuts and bolts, and this is where I see the MARC-MODS, FRBR-RDA debate. We need deep and substantive changes and the current efforts are in the wrong direction, I completely agree. Still, I think that the foundational basis of cataloging remains as valid as ever and is definitely not supplied by the Googles and Yahoos and Mendeleys out there, although they may desperately try to convince us that people need nothing else. We need them and they need us.

It may take a few decades for people to realize they need specialized-librarian controls, however, much as it took the Princeton faculty quite some time to finally admit they just couldn't handle it. We'll see, but I hope it doesn't take that long.

Wednesday, April 21, 2010

FW: [NGC4LIB] After MARC...MODS?

Posting to NGC4LIB

<snip>
I will add that bringing in views from outside the community of specialists is a good thing. They offer something fresh. The present ugly kludge that is MARC is a product of inbred thinking and cruft accumulated over decades. Such finely grained definitions of what a title is looks a little ridiculous in the real world. I see little of value in the byzantine complexity.
</snip>

While in many senses I agree with this, bringing in new views must be done with care; otherwise they can also offer naivete. It's always rather easy to see a highly complex practice and proclaim that it should be gotten rid of. So, while it may seem ridiculous to have so many definitions of a title, you need to find out why. In this case, it is because there really are so many different types of titles. This is one of those things you don't realize until you think about it more closely.

So, while it may seem that catalogers are imposing a "Byzantine complexity" on materials that in themselves are clear, and that the effort is therefore just not worth it, from another point of view, catalogers would say that we are imposing order and clarity upon materials that, in themselves, are essentially chaotic.

Now, once this is understood or at least accepted, the debate can begin. Do we want to lose this kind of control over the different types of titles? If so, what would be the consequences? And the small group cannot answer these questions on their own, since the only people who can know what the consequences would be are those who use your system. I am at least open to the possibility of dropping all of it, although I am skeptical.

Here is a concrete example of how what may seem to be a small change can erupt into a huge complexity. One rule in subject cataloging is that you have one heading for one concept. In the 1990s LC decided for various reasons to change this excellent rule for the case of the "Soviet Union" when everything fell apart. Suddenly, there was a choice of three possibilities: Russia, Soviet Union, or Former Soviet republics. Seems small and simple enough--that is, until you had to do it.

Suffice it to say, these were some of the most mind-numbing complexities I have ever worked on in my life. We had some of the most experienced catalogers in the country perplexed. Finally, I made some guidelines myself, gave a workshop and then was asked to put it online in my Slavic Cataloging Manual (which I gave to ACRL and Indiana University) at: http://www.indiana.edu/~libslav/slavcatman/rsufsr.html If you want to see some of the complexities, read it.

Little changes can have huge consequences in a catalog. I agree there may be alternate solutions but care must be taken before introducing them.

FW: fall of kapitalsim forestalled

Posting to Autocat

Whether people will be willing to pay to access materials on the web when they have the choice of free alternatives remains to be seen. The NY Times has also decided to start charging people sometime next year. They may be right, or wrong. Only time can tell. And a lot depends on the kind of new tools that are made.

For example, I looked up in the latest Rolling Stone, and found that there is a review for a new album "Here Lies Love." But in Wikipedia, there's a lot more information with some good links: http://en.wikipedia.org/wiki/Here_Lies_Love. There's a link to the Rolling Stone review, but would it bother people terribly if the link just didn't exist? They probably wouldn't even notice it. Also, as I have pointed out before, how popular will sites such as Google News become: http://news.google.com/ where you get more choices than you can deal with. If one site doesn't let you in, there will always be another.

I also think that that article in The New Republic, "Toward a new Alexandria" is fabulous and right on target. I think it points in some important future directions for libraries, but who knows what will happen?

But one thing I do know. All of these established organizations are trying to convince the public that they are doing a better job because they are "professionals," but it has been shown that, e.g. the New York Times hasn't been so great for several years. There may be lots of reasons for that, but I know that I for one, won't subscribe because I don't think it will be worth it. And yes, before you ask, I do subscribe to some magazines online now. That makes it very difficult for these organizations, and I confess that I feel exactly the same worry for libraries. But whatever happens, it won't stay the same for, "the times, they are a-changin'"

RE: [NGC4LIB] After MARC...MODS?

Posting to NGC4LIB

Alexander Johannesen wrote:

<snip>
On Tue, Apr 20, 2010 at 23:02, Lundgren,Jimmie Harrell
<jimlund@uflib.ufl.edu> wrote:
> Oh, please! http://www.loc.gov/marc/

Not sure what to make of that response. MARBI is suppose to pursue the wonderfulness of MARC, no? Where is the framework I'm talking about? Or even the idea of a framework rather than a few scruffy scripts? The mind boggles.

Besides, I've talked about the absolute EVIL of MARC before, a timely reminder for this discussion ;
http://shelter.nu/blog/2008/09/marcxml-beast-of-burden.html
</snip>

Thanks for pointing to your blog posting, but I do have some problems. You write:

"<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Arithmetic /</subfield>
</datafield>

I'll translate what this does for you;

<title>Arithmetic</title>

The MARC tag 245 means "title statement", and the code "a" means, uh, title. This perticular madness comes from the culture of MARC itself which I'll rant about some other time (and have in the past), so I'll try to stick to the pure XML part of it."

This isn't entirely true. 245$a means "title proper," which is a technical term that catalogers use. You can find the guidelines for it in the ISBD at 1.1 Title Proper:
http://www.ifla.org/files/cataloguing/isbd/isbd-cons_2007-en.pdf#page=37
where you find several paragraphs defining it, stating what is and is not a title proper, and what is and is not included in it. Certainly non-specialists will find its meaning almost impossible to understand without a lot of additional work, and at difficult moments, catalogers themselves must return to this definition. Practically all parts of the bibliographic record are defined in these extremely precise ways. This is no different from how any other standard works in any endeavor. For example, without a lot of effort I cannot understand the standards for roofing materials or the requirements to be able to label a certain food as "Chocolate" (apparently, the standards for chocolate are more rigorous here in Italy than in other countries).

So, when a non-specialist says "title" it does not mean the same as when a specialist says "title," who immediately differentiates it in many ways and encodes it in the 245 field, one part of which is the "title proper," which goes into the 245$a. So, in contrast to the others on this list who say that ISBD is a problem, I will say that the library world is very lucky in this respect because of the tremendous work done by our predecessors to create truly international standards based on the ISBD. Are they followed perfectly by everybody? Of course not.
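For readers who want to see what pulling the title proper out of the quoted record looks like in practice, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It makes two assumptions that I want to flag: the fragment is wrapped in a bare, namespace-free record element as in the blog excerpt (real MARCXML normally declares a namespace), and stripping the trailing ISBD punctuation with rstrip is a simplification, not a rule from any standard. The function name is invented for the example.

```python
import xml.etree.ElementTree as ET

# The 245 field as quoted above, wrapped in a bare <record> element for parsing.
MARCXML = """<record>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Arithmetic /</subfield>
  </datafield>
</record>"""


def title_proper(xml_text):
    """Return the text of 245$a without trailing ISBD punctuation, or None if absent."""
    root = ET.fromstring(xml_text)
    for datafield in root.iter("datafield"):
        if datafield.get("tag") != "245":
            continue
        for subfield in datafield.findall("subfield"):
            if subfield.get("code") == "a":
                # Simplification: drop the trailing " /", " :" or " =" that
                # introduces the next element of the title statement.
                return (subfield.text or "").rstrip(" /:=").strip()
    return None


print(title_proper(MARCXML))
```

The code is trivial, which is rather the point: extracting the subfield is easy for a programmer, while deciding what belongs in it is the part governed by the several paragraphs of ISBD 1.1.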

The non-specialist may justifiably ask why titles are treated in such a seemingly arcane manner, but then would get entangled in a myriad of intricacies, just as would happen if I would ask why a certain standard exists for roofing materials. As a short answer to the title question, in addition to ensuring the correct identification of a specific item, these guidelines exist for some older reasons as well, such as filing order in the card catalog, so that, e.g.:
"War and peace : the definitive edition"
and
"War and peace in the nuclear age"
are not interfiled.
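
As a rough illustration of the filing point, here is a minimal sketch in Python. It assumes three hand-made records (the "a novel" subtitle is invented for the example), a simple word-by-word comparison, and a hypothetical "flat" index that drops punctuation and sorts on the whole title string. It is not how any particular catalog works; it only shows why keeping the title proper separate changes the order.

```python
import re

# Hypothetical records: (245$a title proper, 245$b remainder of title).
records = [
    ("War and peace", "the definitive edition"),
    ("War and peace in the nuclear age", ""),
    ("War and peace", "a novel"),
]

def normalize(text):
    """Lowercase, drop punctuation, and split into words for word-by-word filing."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).split()

def flat_key(record):
    """Sort on the whole title string, ignoring where the title proper ends."""
    title, remainder = record
    return normalize(f"{title} {remainder}")

def title_proper_key(record):
    """Sort on the title proper first, then on the remainder of the title."""
    title, remainder = record
    return (normalize(title), normalize(remainder))

flat_order = sorted(records, key=flat_key)
card_order = sorted(records, key=title_proper_key)

for label, ordering in (("flat", flat_order), ("title proper", card_order)):
    print(label)
    for title, remainder in ordering:
        print("   ", title, (": " + remainder) if remainder else "")
```

With the flat key, "War and peace in the nuclear age" lands between the two editions of "War and peace"; with the title proper key, the two editions stay together and the other work follows them.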

I expect to get tomatoes hurled at me for pointing this out, but nevertheless, it is questionable whether people browse titles today as they did in the old days, and to be honest, every computer catalog I have used interfiles them anyway, along with all kinds of other strange interfiling practices. While this has often driven me crazy, the public has never seemed to notice it. So, perhaps this reason is no longer justified, but there are still plenty of other important reasons for retaining the title proper, so it should certainly be retained.

This is why I say that specialist catalogers must be involved in these matters. We can't expect programmers to undertake a deep study of these bibliographic intricacies, just as I mentioned in an earlier post that a specialist cataloger can only become at best a semi-competent programmer.

While I believe things need to change and simplify (which is not the direction of RDA, IMHO) it must be done with a clear understanding of what we have now, what could be gained and what would be lost. This is beyond the abilities of any one person or even one community of specialists. There is nothing wrong with this; it is just the way the world works.

Tuesday, April 20, 2010

RE: On imagination, catalogers, perspectives, negative stereotypes -NGC4LIB

Posting to Autocat

Concerning my original posting, I made a comment to MJ Suhonos on NGC4LIB which I did not post here. I have placed it on my blog at: http://catalogingmatters.blogspot.com/2010/04/re-ngc4lib-on-power-of-imagination-was.html (this is why I made the blog. I found I was getting too confused!)

I am not stating that catalogers are not imaginative (how is that for a double negative?); catalogers can be very imaginative. I am trying to say that through the nature of their jobs, catalogers are trained not to be so, as Suzanne Stauffer mentioned in her post.

A solution to the problem must involve the entire library plus partners outside. If catalogers continue to focus their efforts primarily on printed materials, or even when things change over to primarily digital, only on those materials that our individual library has paid for, we lose out on some of the most exciting cooperative developments out there. By this I mean all different sorts of open-access materials and projects, which include texts, videos, audio, maps, wikis, weird things and so on and on. People want and need these materials, and we should not tell them just to use Google or Yahoo or other non-library tools and thereby force them to ignore the tools we make; we should help our users find materials of interest no matter where they happen to be. Of course, when everybody is looking at precisely the same web site, it makes no sense to catalog the same thing 10,000 times, or even to have 10,000 copy-cataloged records each of which will have to be updated. Solutions must be found.

A few days at a reference desk will make clear that people want help with selection probably more than anything else; people also need help finding this information in the first place because it's not easy, even though research has shown that the public think they are expert searchers. But a few minutes helping people using tools such as Google and Yahoo, and almost any database out there, will make a librarian long for the controls we can get through cataloging. You want to search for a name, so that you can find writings of a "John White" and not get "John Jones" from the White House staff, or "John Smith" selling paint.

Yet, how do we get people to want our help in the first place?

These are huge problems, but librarians who see the needs can begin to imagine a system or group of systems that will solve these problems. It doesn't mean that they can build it on their own, but that is another issue entirely. It is only the very first step. If you don't see the needs, or if you see issues only in terms of what you happen to know at the moment or "how to most efficiently transfer what has been done in the past into the new environment," (which is the way I see FRBR and RDA) it's practically impossible to imagine solutions.

It should be obvious that this is an entirely different task from creating high-quality catalog records that conform to a standard. Both are absolutely necessary and I do not want to slight cataloging, but I am pointing out that each demands a completely different mind-set.

RE: [NGC4LIB] On the Power of the Imagination (Was: What do I need to know?!)

MJ Suhonos wrote:

<snip>
From my limited experience, I vehemently disagree that cataloguers lack imagination; they are generally among the most intelligent and creative people in the organization. Rather, I think what cataloguers tend to lack is bravado, audacity, and daring. I won't speculate on why this is any more than to simply say that the reasons are manifold, and outside the scope of this email.
</snip>

I want to emphasize that I am not at all saying that catalogers are not intelligent; quite the contrary. Some of the most difficult intellectual tasks I have ever undertaken were when I was cataloging. And NACO/SACO revision at a major research library puts scholarly peer review to shame. It can be very tough.

That said, the act of cataloging is basically to fit things into an existing structure. This can be done routinely, mindlessly, cleverly or brilliantly. I have seen it all. There can be brilliant thinking and solutions inside the box but these are times that demand thinking outside that box. Catalogers are not trained or encouraged to do that. A cataloger does not think immediately that "there is a database on the web over here that we may be able to work with." In particular here is Google Books with millions of scanned books. People want them. We must work with that and not ignore it. How do we deal with something like Google Books efficiently, with a bit of elegance and still maintain some level of standards?

Catalogers think in terms of AACR2 (or AACR1, or ALA or Dewey School, or whatever rules they happen to be following at the moment), MARC (or catalog cards) and in highly specific, picky bits of information. This is highly important and I don't want to devalue that, but things are changing today, matters are getting serious and we need some real innovations.

I certainly think catalogers can think outside of the box, but it is not something they normally do. I worry that if we don't find the innovations among ourselves, they will be imposed upon us.

Monday, April 19, 2010

RE: [Koha] The Potential Death of Koha in Pennsylvania Libraries

Posting to Koha List

[...] In this vein, you may be interested in a new OCLC report: "Research Libraries, Risk and Systemic Change" http://www.oclc.org/research/news/2010-03-25.htm which is not just about research libraries.

They discuss risk management and delineate many of the risks as:

"Many of the risks rated as high (impact and certainty) pertain to:

  • human resources and organizational culture, including a lack of attention to cross-training and reallocation of existing staff
  • lack of critical skill sets for managing data sets, engaging directly with research faculty, or retooling technological infrastructure
  • an organizational culture that inhibits innovation
  • difficulty in attracting and retaining staff in a competitive environment where fewer credentialed library professionals are available
  • uncertainties about the appropriate qualifications for library managers who may require skills developed in other sectors."

Then on p. 16 under "Strategies for Mitigation," they continue:

"We believe that increased reliance on *shared infrastructure* along with increased outsourcing and regional consolidation of services will enable more rapid deployment of the services that research library users want and need moving the following risks into a more acceptable range of impact and occurrence:
  • Library cannot adjust fast enough to keep up with rapidly changing technology and user needs (Risk 19).
  • Increased inefficiencies and expenses due to lack of functionality of legacy systems and IT support (Risk 20).
  • Due diligence and sustainability assessment of local or third party services is not completed, tracked or analyzed (Risk 21)."

Their solution of a shared infrastructure would seem to mirror your case in Pennsylvania. While I understand such a conclusion, the upshot of it seems to be "we sink or swim together." I think that instead of having everyone crowd into the same lifeboat, it would be just as logical to foster individual initiatives, while making sure everyone shared their work.

Still, I can understand that going to this level of trust (the foundation of the entire open source movement) does not come easy to an administrator who is responsible for results.

Perhaps you can reach some level of agreement. I am sure that everyone is nervous and still in the search for answers.

Sunday, April 18, 2010

On the Power of the Imagination (Was: What do I need to know?!)

Posting to NGC4LIB

(Yes, I stole the title from one of Montaigne's Essays, but it seems very apt)

I pretty much agree with Alex about what a cataloger needs to know today. Learning to program is a great idea, but you must always keep in mind that the best you can hope for is to become an average programmer, and if you want to concentrate on your cataloging skills, programming will always be an adjunct to that. Alex's idea of keeping up with technology is the better way to go. I suggest that your programming abilities be aimed at being good enough to create prototypes on your own, so that you are not forced to describe your ideas in abstract terms, which can be very difficult for others and frustrating for you. If you can actually point to a semi-working version that people can use, you have a great advantage. A picture truly is worth a thousand words. Yet it is important not to get too attached to anything you may create. Remember, you will only be a semi-competent programmer at best, so anything you may create can be vastly improved at every single point.

But I want to expand into another realm, and to point out an area where I think many librarians, and particularly catalogers, are very bad. With everything changing such as it is today, I think that retaining and nurturing a healthy imagination of the possible is important. I have noticed that catalogers tend toward another type of thinking regarding new technologies: this is what I do now; how can I do the same thing with the new technology? This is terribly limiting in many ways. I think catalogers have reacted in this same way in the past and this is why we can see many of the limitations of the printed catalog transferred into the card catalog, but more important for us, how the card catalog mentality still dominates even today. For example, we still assume the continued need to browse the subject heading strings in order to get the best understanding of them; the retention of the single main entry, once so vital in printed catalogs, loses its meaning in the newer technology and severely limits everyone's possibilities; in authority control, the concern over creating an authorized form of a name and which cross-references to use, almost all of which are based on card and printed catalogs.

Specifically, I believe RDA is an excellent case in point. It adds nothing I see that is essentially new, but rather it seeks to continue, maintain, and even impose the traditional view of information into the environment of the web. I won't go into details here, but our patrons have moved far beyond these traditional methods and I believe the RDA attempt is doomed to failure because it does not look at the new possibilities of what can be done today.

In my opinion, the most important thing for the cataloger today is to nurture your imagination to envision the tremendous promise of combining what is possible with what is needed. How do you do this? To find out what is possible, keep up with the latest in technology, as Alex says. But this must be balanced with what is needed. Along with keeping up with the latest research in how users interact with information, definitely the best way to discover this is to get involved personally in reference work, and really try to involve yourself with real human beings who are dealing with real human problems of information retrieval. Their needs are endless, and it is their needs that are the purpose of what everyone is doing. It seems that this most important aspect of librarianship gets more or less ignored in the entire situation and as a result, everything remains in the realm of the abstract and the nebulous.

So, just following the technology is not enough in my opinion. This represents what is possible, and what is still lacking in this scenario is what is needed. So, roll up your shirtsleeves and work closely with your patrons. Just a couple of days of working with some real, live patrons will make very clear that their needs are not even close to "find-identify-select-obtain --> works-expressions-manifestations-items by their authors-titles-subjects," which is what FRBR and RDA declare as fact. These are obviously based on abstractions of the traditional view of what our patrons need; not at all what real-life human beings want and expect.

As a result, so far as I am concerned, the FRBR user tasks are one of the clearest examples of this obvious lack of imagination among catalogers. Not only Cutter, but Panizzi himself, and perhaps even Thomas Hyde (who wrote down the first real cataloging rules at the Bodleian in the 1600s), if not even earlier librarians, would see practically no change from their handiwork. While I have great respect for all of these giants in our field, continuing this mentality is regressive and absolutely must change into one that deals with what the current technology offers and recognizes how our patrons work with it.

As Alex points out, this is very tough to do. But I maintain that it is precisely librarians who are the experts in this task. Nobody else can do it; certainly not the programmers, and not even the users themselves know. It's the reference librarians, who have the best idea of matching what the users need with what is really out there. They can see what is lacking and suggest what is needed that the programmers can build and that catalogers and others can "populate" with all kinds of metadata.

Someone who can see all of this is rare indeed. It's what I strive to be although I fail, but I can still vividly imagine dozens of tools that could be of help and all kinds of possibilities for cooperation, although I cannot build them, especially on my own.

This is what makes cataloging, information description, storage and retrieval so exciting to me at this time. But I recognize these ideas are radical, and I have no idea what the future holds for libraries and the librarians who work in them.

Friday, April 16, 2010

RE: [RDA-L] Signatory to a treaty

Posting to RDA-L

Karen Coyle wrote:

<snip>
This choice about attribute and entity needs to be made in the data design, not at the point of cataloging, IMO. I'm trying to think of an exception to that, and can't come up with one... Where I think we *could* have choices, although it is not allowed within RDA, is in deciding on the attributes that are associated with an entity. As an example, some specialist communities would like to use attributes like "colour" in the Work entity, but RDA has it in Expression. Someone may want to add an attribute that RDA does not include. In terms of systems, this is not terribly difficult using registered elements and application profiles. Essentially, as long as the data elements are clearly defined they can be used in different relationships without losing their meaning. In the past, data was defined by the record; today we can define data that can be used in any number of different situations and different records. This gives us a freedom we didn't have before.
</snip>

Will the distinction between attribute vs. entity be solved in the data design? I don't know but I am very skeptical. I personally think it is so complex that the data designers will leave it to the drudges to deal with. :-)

But even beyond that, it seems to me as if we should be finding simpler means to stay where we are, and if it becomes more complex for us, there should be an associated gain both for us and for our users. But here we are adding complexity upon complexity to wind up in exactly the same place.

So, here I am, a middle-aged bearded librarian/scholarly-type feeling "breathless and giddy" like Alice in Wonderland! I can't resist quoting:

"Now! Now!" cried the Queen. "Faster! Faster!" And they went so fast that at last they seemed to skim through the air, hardly touching the ground with their feet, till suddenly, just as Alice was getting quite exhausted, they stopped, and she found herself sitting on the ground, breathless and giddy.

The Queen propped her against a tree, and said kindly, "You may rest a little, now."

Alice looked round her in great surprise. "Why, I do believe we've been under this tree the whole time! Everything's just as it was!"

"Of course it is," said the Queen: "what would you have it?"

"Well, in our country," said Alice, still panting a little, "you'd generally get to somewhere else - if you ran very fast for a long time, as we've been doing."

"A slow sort of country!" said the Queen. "Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!"

http://urila.tripod.com/Alice.htm

If we're going to go through all of these mental contortions let's at least end up someplace that is new!

RE: ONIX-MARC integration

Posting to Autocat

I normally go to great pains to document my assertions, but I didn't this time when I wrote how "information managers" (by which I mean non-catalogers), very often slight the importance of strict accuracy in our records. In an upcoming chapter in a book that (I hope!) will be published, I go into the need for accuracy and standards in quite some detail, but people will have to wait. If it's not accepted, I'll put it on my blog.

I have heard this assertion very often that a metadata/cataloging record is for the ease and utility of the user, and very often this assertion is made by "information managers" who have studied the innards of the cataloging database and perhaps even designed major parts of it, but they have rarely, if ever, actually done the dirty work of creating a catalog record from scratch, and often they have never used the library's collection itself.

Although they may even have a subject expertise and therefore some experience as a user, they certainly do not have any reference expertise, which is a completely different task where you can immediately see how good or poor data (not just the computer coding) can help you find or not find relevant information. Yet, because of their computer expertise (not bibliographic or subject expertise) they are in a position of authority, where they can make decisions. Often these people become bean counters and cannot imagine how there could possibly be a problem with something so "mundane" as entering the title. If there is a problem, it must be either with you, or your guidelines.

It would not be good for me or anyone to state any names, and offhand I can't find anything on the web, but I can point out that this is nothing new. (When I am in the realm of library history, I immediately feel more sure of myself!) Ernest Richardson, in his "The curse of bibliographical cataloging," which I cannot find online, began with, "The general function of the library catalog may be said to be, in simplest terms, 'to connect the reader surely and promptly with the book that he wants to use.'" (Quoted in Stevens, Rolland E. "A Summary of the Literature on the Use Made by the Research Worker of the University Library Catalog" Univ. of Illinois Library School Occasional papers, no. 13 Aug. 1950. http://www.ideals.illinois.edu/bitstream/handle/2142/3886/gslisoccasionalpv00000i00013.pdf?sequence=1).

But he goes on to say something I find even more interesting, (quoting from memory) "Everything else that does not pertain to this is luxury." (the rest of the Library school paper is very interesting in this regard, too. Thanks to the U. of Ill. for putting these on the web!)

Also, I maintain that the decision by LC not to trace series is an offshoot of this same idea that everything in the record is for the user, and the needs of the library managers are thereby ignored, as I discussed at some length in my first "Open Reply" to Thomas Mann at
http://eprints.rclis.org/6741/.

Now, I will say that the user tasks in FRBR are only a repetition of the ones codified by Charles Cutter in the 1800s and that they must be completely reconsidered in the light of today's technology, and in this sense, I agree with the "information managers." But that is another topic. I am saying that some of the user needs should be expanded to include the needs of the experts who create and maintain the collection. While this may be obvious to the people on this list, it definitely is not obvious to non-specialists.

RE: [ACAT] WorldCat Rights and Responsibilities for the OCLC Cooperative

A continuation of the private discussion. My further comments in green.

Jim Weinheimer wrote:

> How about online reference assistance ...
> I think it would be great, but I think the reason people don't ask questions of reference librarians is because they don't think they need it ...

How is that different from pre-digital world days? Library users have always shown a varying degree of independence and self-confidence in their search for information and materials. But at least with online references, they wouldn't have to go face-to-face with another person.

In pre-digital days, people had absolutely no choice except to use the materials in the library or do without. Today, they have the tremendous options found on the Internet, and when the millions of materials on Google are widely available as full-text, it will represent even a greater choice for our users where they won't need the library.

> It matches my experience and I've thought a lot about this ...

This is a telling statement. As humans, we all have our own experiences and it is by those experiences that our opinions are often formed. However, as professional librarians with a responsibility to provide information to all people (or as many as will come to us) it is our duty to make our decisions based upon solid study and evidence.

If you refer back to my previous statement, I was saying that the Educause report matched my experience, so I wasn't just making it up. "Look at: http://net.educause.edu/ir/library/pdf/ers0808/rs/ers0808w.pdf p. 54, where it shows that 80% of the students believe they are either "very skilled" or "experts" at finding information on the internet."

>Perhaps it has to do with the "zero search" that we used to talk about so much a long time ago. The zero result was considered a bad result, but now I'm thinking that it may have been a good result--that a zero search gave a clear result that you were doing something wrong. Today, it's tough to get a zero search since your search query may even be reformatted automatically for you.

When a user searches on a traditional catalog, a zero search is still very possible. And in a traditional catalog with a proper authority file, the user would be shown "Better" search options. But "progressive" librarians have now added automatic keyword searching and other automated directional processes to avoid those zero searches because they felt that was what users wanted.

It seems to me that those "progressive" libraries may have made decisions based upon their own perceptions, not fully studying what the future result of those decisions might be.

On this we agree, but steps taken at these "progressive" libraries should be seen as experiments, and since they are experiments, many things will fail or must be completely rethought. This may be one area for reconsideration. By the way, the "Better" search options I have seen are labelled, "Did you mean" and you normally see spelling variants. Variants based on an actual authority file I have personally not seen, although I am sure some exist, but now with Zebra-type indexing and displays, much more can happen.

> Anyway, if people think they are the experts, it means they figure they wouldn't learn anything from asking anyone, especially not from some librarian who knows only about books....

Is that a new phenomenon in this digital world or did we just do a study to make it more evident?

Not completely new, but as I stated above: in a pre-digital library when you had problems you either asked questions, browsed the shelves endlessly or did completely without. Many just browsed the shelves endlessly.

>> I don't know that anyone, certainly not me, believes that sending it
>> off to OCLC is enough. ... </snip>

> It seems as if putting limitations on where we can send our records other than OCLC is precisely what this policy is all about. So, I think the underlying belief is that it really is enough to send it off to OCLC. Where else are our records going, except perhaps to Google Books?

We seem to be missing each other here somewhere. I send my records to OCLC because that is part of my contractual agreement with them. Sure they put it in WorldCat and they consider WorldCat the best way to make all records available to everyone. But that does not make my collection much more useful.

I want users to contact my library to access my holdings. I want them to go to my website, not OCLC's website. And that is the resource sharing that needs to take place.

Here are a few major points of contention. First, I maintain that "my holdings" includes the Internet. No one can tell me that my patrons do not want or need the multiple free copies of Huckleberry Finn, scanned beautifully at the Internet Archive, or anything else there. There are hundreds of thousands, if not millions, of some of the greatest books ever written there, plus recordings, videos and on and on, available to all for free. How do I include them into my catalog efficiently? I conclude I cannot create or copy catalog all of those items there. I cannot even select them because there are too many. Still, my users need to know that when they are looking at a copy of Huck Finn in my own catalog, there are scads of others available for free all over the web at the click of a button, and I say that ignoring these materials simply because they are different borders on the unethical. What do I do? I have no choice except to experiment with entirely new methods and see what works and what fails. But I cannot ignore them.

Second, insisting that people must use any website, including your catalog, in the way you want and in the way that corresponds to what you think "best," and that people should not arrive from other sites such as OCLC, or Google, or someone's personal site, or through a host of other ways, ignores, I believe, the new methods becoming so popular today. These are the very foundations of the popularity of the World Wide Web, which allows links from one site anywhere in the world to another site anywhere in the world.

These facts, taken together, constitute a new reality for every institution, not just libraries, who are asking similar questions: exactly who are my patrons today? Potentially, they come from all over the world. Exactly what is my collection? I don't claim to have the answers, but it's changing by the minute.

Some may view this with complete and hopeless dismay, but others see tremendous opportunities.

>Libraries have shown quite clearly that they don't even know the basics of competition. Our world is changing because the world of our users is changing, and we have to follow them. So, the solution "Libraries can provide links from their catalogs out to the Google books if they prefer," won't work, because this assumes that people will be using our catalogs to find full text in Google. I don't think that makes much sense since people will be using Google right from the start and then we'll be lucky if they ever have the time to get around to our catalogs ...

I fundamentally disagree with your analysis of future information seekers. A recent study of academic users (wish I noted where it was) showed that academic library users use Wikipedia, Google, etc. for basic informational searches and baseline material for their studies. But they use library materials (books, subscribed electronic databases, etc.) to gain deeper understanding and do serious research on a topic.

I believe that users will continue to visit libraries for better, more reliable information. And, in the future, that will be more likely through our websites than through our doors.

I agree, and therein lies the infernal conundrum: people come to our libraries for the physical books on our shelves and the journals that are not scanned yet; they use the library's website (but they don't want to have to come to the library itself) to get to the subscribed electronic databases. There is a recent report from Ithaka that analyses how faculty view the library, and the majority view libraries' most important part as purchasing agents for the institution. The authors consider: "The declining visibility and importance of traditional roles for the library and the librarian may lead to faculty primarily perceiving the library as a budget line, rather than as an active intellectual partner." (p. 13)

While I know they do much more than this, libraries must actively demonstrate that they do more than just buy access to some databases (and soon, Google Books) and give a few links on the library's page which could just as easily appear on the page for Human Resources or Finance or Departmental pages. That is being a budget line. What do we do? The only way to find out is to accept the situation and start experimenting.

>I think we have to reconsider what "quality" means. It will not be us who will determine if a record is high-quality. It will be our users, who will be comparing our records to things they like better, e.g. in Amazon.com, LibraryThing, Google Books and who knows what else?

You are right. But is there any proof to show what our users determine to be "quality?" My personal criteria depend upon the situation of my search at the time. But it usually comes down to whether a search engine (or catalog) can give me a selection of sources for me to choose from succinctly and precisely. I want to know that I am getting all of the novels at a given location by Alexander McCall-Smith in one listing with a high degree of certainty. The library authority file provides that. I don't have to ferret through varying forms of the name, misspellings by another user in his search, or any other false hit brought about by catch-all search engines.

> Take a look at this record:
> LC: http://lccn.loc.gov/2003053220
>
> Then in LibraryThing: http://www.librarything.com/search_works.php?q=0521823277
>
> Then in Amazon: http://www.amazon.com/Oratory-Political-Power-Roman-Republic/dp/0521823277/ref=sr_1_1?ie=UTF8&s=books&qid=1271170846&sr=1-1
>
> Then in Google Books metadata: http://books.google.com/books?id=ZwkJh1cZb1YC&source=gbs_navlinks_s
>
> Which would you pick if you didn't know anything at all about cataloging?

Another point where we are missing each other. I don't search for authority records. I don't search for biographical information about an author in a catalog unless I am searching for a biography. The LC authority file you show above is meant to work behind the scenes to direct all of my searches to one form of name. And my experience tells me that it does it quite well.

What I would care about is the end search result. If I search for materials by Alexander McCall-Smith and LibraryThing or GoogleBooks gave me a more concise display of available materials, then I would be less cautious about the abolishment of our MARC-based data. But I have seen that LT and GB are not as precise. And so I advocate a more measured approach that preserves our current data and formats while we experiment with others.

I certainly never said that we abolish MARC format, although it should be changed to become more flexible. We must enact other ways to exchange those MARC records instead of the outmoded ISO2709 Z39.50 method. Because of those restrictions, we can't change MARC in many ways. Once we abandon that and go to, e.g. MARC-XML, things begin to open up.
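
To make the MARC-XML point a little more concrete, here is a minimal sketch in Python of how the same 245 field can be read with ordinary XML tools once the record is in MARC-XML. The record fragment is hand-written for illustration (it is not taken from any real catalog), and the helper name `subfields` is my own invention; only the namespace and the datafield/subfield element names come from the MARC-XML schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Library of Congress MARC-XML ("slim") schema.
MARCXML_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical, hand-written record fragment for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">War and peace :</subfield>
    <subfield code="b">the definitive edition.</subfield>
  </datafield>
</record>"""

def subfields(record, tag, code):
    """Return the text of every subfield `code` found in every field `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [element.text for element in record.findall(path, MARCXML_NS)]

record = ET.fromstring(SAMPLE)
title_proper = subfields(record, "245", "a")
remainder = subfields(record, "245", "b")

print(title_proper)
print(remainder)
```

Nothing here needs a MARC-specific library or a Z39.50 client; any programmer who can read XML can get at the 245$a, which is the sense in which things "open up."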

Also, I say that the reason people don't search for authority records is because they are mostly boring but they don't have to be. If we changed our world view of what an authority record is, the authority record for Mark Twain could very easily become the page in Wikipedia. We would just use it as our URI. Certainly things would have to be retooled a bit; this is why I have suggested dbpedia. A lot of work would have to be done in dbpedia to make it more useful, but that's never stopped us before!
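
A minimal sketch of that idea in Python, under loud assumptions: the table below is hand-made, the name forms are illustrative rather than copied from any authority file, and the functions `build_index` and `resolve` are hypothetical. The only point is that an "authority record" can be reduced to one URI (here a dbpedia resource URI) plus the name forms that should all lead to it.

```python
# Hypothetical authority data: one URI per person, with the name forms
# that should all resolve to it. The forms are illustrative assumptions.
AUTHORITIES = {
    "http://dbpedia.org/resource/Mark_Twain": [
        "Twain, Mark, 1835-1910",
        "Clemens, Samuel Langhorne, 1835-1910",
        "Mark Twain",
    ],
}

def build_index(authorities):
    """Map every known name form (case-insensitively) to its URI."""
    index = {}
    for uri, forms in authorities.items():
        for form in forms:
            index[form.casefold()] = uri
    return index

def resolve(name, index):
    """Return the URI for a name form, or None if the form is unknown."""
    return index.get(name.strip().casefold())

INDEX = build_index(AUTHORITIES)

print(resolve("clemens, samuel langhorne, 1835-1910", INDEX))
print(resolve("John White", INDEX))
```

A search under any of the listed forms arrives at the same URI, which is what the authority file already does behind the scenes; the difference is that the URI can point to a page people might actually want to read.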

Once we open ourselves to sharing information and collaborating on the web in new and innovative ways, the possibilities are almost endless!