Wednesday, December 28, 2011

Re: [ACAT] What RDA might be good for

Posting to Autocat

On 27/12/2011 19:03, Kevin M Randall wrote:
James Weinheimer wrote:
It seems to me that if we want our catalogs used (and thereby catalogers employed), we shouldn't focus on the needs of the publishers and their practices, but rather we need to focus on the needs of those who will actually use the catalog to find relevant materials: our patrons. 
 I believe that Aaron Kuperman's focus *was* on the needs of the catalog users. When he talked of "linking related records in a way that reflects 21st century publishing practices" the point wasn't that the records should meet the needs of the publishers, but rather that the records should allow the catalog to deal in a better way with the resources that libraries are getting, be they from publishers or from any other source. Libraries are going to get whatever resources and metadata the publishers supply; to work with those things isn't to "focus on the needs" of the suppliers, but to just deal with reality: this is what we got, let's work with it and get it into the form that will best help the patrons find it. That's what Kuperman was getting at.
Put that way, I don't believe anyone could disagree, but I have to ask whether this is enough for the future. There is a basic philosophical difference that I have with many other catalogers. In the future, libraries will (or at least should) be getting materials from all kinds of sources, not limited to "publishers", just as you mentioned. This will mean that there will be many different kinds of metadata structured in a multiplicity of ways, the vast majority of which will not be MARC21 or AACR2 or RDA. Also, we should not assume that any form of metadata as it stands today will be unchangeable, since we know it will change in ways we cannot predict. Such a belief would severely restrict our world views.

The absolute #1 priority in this regard is to discover what the public wants and needs; the task will then be to fulfill those needs. This is what Google and the other information companies do, and it is why they succeed.

As an example of what I think is the older mentality, there was an interview recently in Library Journal with Madeline McIntosh, a big shot with Random House, who said, "We do, of course, have to adapt to readers’ changing preferences and habits, and at Random House we’re actively embracing the very positive opportunities that are opened up by digital publishing and distribution. That said, we do have a fervent belief in the ongoing importance of the physical book and of the places where physical books are found: libraries, bookstores, schools, airports, supermarkets, etc. Without having books embedded in our physical environment, it would be so much harder to help readers connect with new books and authors."

This statement really struck me. Later in the interview, she emphasized the importance of printed books a few times, e.g. "Whatever the future looks like, we want a model that will ensure continued support for physical books, in physical libraries, in local communities. That’s crucial for us." Clearly, she considers digital content only tangentially. It would seem to me that someone who honestly wanted to sell "stuff written in books" would say, "I don't care how people want my content: in print, digital, on leather, on bark or on stone. I will give them these formats and more. I'll give people what they want however they want it, because that's how I can make money!" I would expect most 19th-century businessmen to react this way because their number one goal, above everything else, was to sell their wares.

It seems that the only reason that a publisher would focus on printed books today is either that they are just backward-looking or they are terrified of changing their business model, which I think is what explains these strange reactions.

In a similar way, I don't believe that libraries should be trying to force current information resources into our traditional structures and to assume that we know what our patrons want and need. We simply *cannot* know this without deep research, working with reference librarians who are the ones who work the most closely with the public, and finally, trial and error.

Will the linked data environment provide what the public wants? The fact is, nobody knows. It very well may, but people may just as easily find it all useless. Lots of forward-looking people have questioned the very need for the existence of our type of metadata records. So, the worst thing could actually happen: once everything is online, or whatever the magic percentage of "everything" is, the public may have no use at all for our traditional type of metadata; search engine optimization (SEO) may fulfill every need of a searcher. We just don't know. In any case, when looking at the matter from a universal point of view, that is, from the viewpoint that encompasses the entirety of metadata, I feel pretty safe in predicting that very few non-libraries will structure their metadata in variations of FRBR.

Still, even in such a drastic case of the public rejecting our metadata, I believe there will still be a need for it, but not necessarily as a tool for the public to use, or at least to use directly. This is the challenge of living in Darwinian times, as we are doing now: we have to accept the realities we find, plus the fact that the environment changes, often very quickly. Just like our ancient mammalian predecessors, our main task is survival by adapting to the changing new environments. This means being ready to drop any of our most cherished beliefs if necessary, and to reevaluate our strengths and weaknesses.

Unfortunately, traditional library searching should not be considered one of our strengths today, since fewer and fewer people understand it. Modern advances in searching are making many of our traditional methods obsolete. Some may believe that by transforming our records into FRBR and linked data, the people will return, but I simply don't see it.

What would make people return? I don't know, but I am willing to admit that any of our traditional methods no longer works (I will do so only after debate, along with evidence, of course!) and that *everything* has to be reconsidered. What does the library catalog provide people? Does it really allow them to find/identify/select/obtain etc.? Is this the genuine purpose of the catalog, or is it something different? Also, is the business of libraries really to select, acquire, receive, catalog, shelve, etc., or does a library actually do something different?

I think libraries, and the catalog, actually provide quite different services to our public from what we have always thought. I am not saying that I know what it is that libraries really do and what they really mean to a community, although I have some personal opinions. It is vital to find out. The quickest and easiest way to find out is to create easy-to-add APIs and let the public take our records to play with, in a form they can work with--it has to happen sooner or later anyway. So, as far as I am concerned, until we begin to really open things up to find out what the public really and truly wants, we will remain mired in the realm of old beliefs and superstition.
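To make the idea of "easy-to-add APIs" a little more concrete, here is a minimal sketch of what opening up our records could look like: an internal MARC-like record transformed into plain JSON that outside developers could actually play with. The record, the field mapping, and the function name are all invented for illustration; a real API would of course need far more than this.

```python
import json

def marc_to_public_json(record):
    """Map a few common MARC tags to self-describing JSON keys."""
    mapping = {
        "245": "title",     # title statement
        "100": "author",    # main entry, personal name
        "650": "subjects",  # topical subject headings
        "830": "series",    # series added entry
    }
    public = {}
    for tag, value in record.items():
        key = mapping.get(tag)
        if key is None:
            continue  # internal-only fields are simply not exposed
        public[key] = value
    return json.dumps(public, sort_keys=True)

# An invented sample record, keyed by MARC tag.
record = {
    "100": "Daniels, Roger",
    "245": "The politics of prejudice",
    "650": ["Japanese--California", "California--Race relations"],
    "999": "local system data",  # internal field, not exposed
}
print(marc_to_public_json(record))
```

The point of the design is that the public form uses self-describing keys ("title", "subjects") rather than numeric tags, which is the sort of thing non-librarian developers expect to work with.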

Thursday, December 22, 2011

Re: Cataloging issue of journal as monograph

Posting to Autocat

On 21/12/2011 17:05, J. McRee Elrod wrote:
Jim said:
Although there is a theoretical difference between a series/analysed serial, and we see it in the usage of either the 490/830 or 730, the final product for the searcher is precisely the same level of access.
No. A series tracing in 730 does not appear in the series index of our software. I assume the same is true of other ILS with a series search. Practices need to be adopted with basic cataloguing principles and the end result in mind, not done for convenience on the fly. The LCRI and OCLC guideline are counterproductive.

It's not rocket science as they say.
Unfortunately, I cannot agree with this. Maybe your software does not allow a 730 to appear in the series index, but there are lots of different kinds of software in the world with lots of different types of capabilities. As an open source developer, I would probably decide to change the indexing. The average person *cannot* be expected to know the difference between a regular series and an analysed serial. In this sense, for a layperson it may as well be rocket science. For an experienced cataloger it may not be, but expecting this knowledge is not fair to the users and, ultimately, is counterproductive for everyone, including us. The public barely understands what a title is, but a series title? Or an analysed serial? This doesn't mean they are stupid, but they neither specialize nor have any interest in these matters, and probably never will.
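As a sketch of the kind of indexing change an open-source developer could make: simply feed both the 830 and the 730 into a single series index, so the searcher never needs to know which tag the cataloger happened to choose. The records below are invented MARC-like dicts; the tags are real, but everything else is for illustration only.

```python
from collections import defaultdict

# Index both the 830 (series added entry) and the 730 (uniform title
# added entry, often used for analysed serials) as "series".
SERIES_TAGS = ("730", "830")

def build_series_index(records):
    """Build one series index covering both tags."""
    index = defaultdict(list)
    for rec_id, rec in records.items():
        for tag in SERIES_TAGS:
            for title in rec.get(tag, []):
                index[title.lower()].append(rec_id)
    return index

records = {
    "rec1": {"830": ["University of California publications in history"]},
    "rec2": {"730": ["University of California publications in history"]},
}
index = build_series_index(records)
print(index["university of california publications in history"])
```

With this arrangement, a single series search retrieves both records, regardless of which field the cataloger used.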

The vast majority of people I have met have tremendous problems searching library catalogs--I don't think I'm the only one who has dealt with these types of people. After all, if catalogers have problems with something, how in the world can we expect the average person to understand? I discovered that many younger people have trouble understanding the very concept of a catalog record (what I always called a "summary record") since 95%+ of their experience is with Google and Yahoo.

The practices that we choose should be based on the needs of the *searchers*, that is, the people who will actually be using our records, and cataloging principles should take a back seat. In addition, I have seen many "mistakes" where catalogers put the title of the serial they are analysing into the 830 instead of the 730.

So, what is the lesson from all of this? Ordinary searchers should be very careful when searching series titles, since they don't really understand them. Catalogers, if you have anything that even looks like a series/serial title, had better search both the 730 and the 830; otherwise you may be missing something.

Can this situation be improved and perhaps even become simplified? I think it can.

Tuesday, December 20, 2011

Re: [ACAT] Cataloging issue of journal as monograph

On 20/12/2011 20:45, J. McRee Elrod wrote:
Jim said:
... there is absolutely no way that the untrained layperson can understand something this complicated.
What's complicated? You search the serial/series and get the serial and some analysed issues. The only complication is if some put that serial/series in 730, so that it does not show up in a series search.

The only series about which we get complaints are the 800 ones they can't find by title. Sometimes the 800$t begins with "The", and sometimes it's not indexed. We also get 800 entries reported as duplicates of the 100. Since people are paying us money, they do not hesitate to complain if they don't like something.
Sorry Mac, of course it's complicated: in the case of serials that are partially analysed, you have to know to look into the holdings of the serial record or at a separately cataloged item. I tried to demonstrate how complex it is by showing that different records can be displayed and even interpreted in different ways. A layperson should not be expected even to suspect these sorts of refinements.

And the problem with expecting complaints in cases such as this is that people have to be rather sophisticated to realize there is something wrong in the first place. Perhaps your customers are that sophisticated, but in my experience very few people understand what a series is or how series differ from serials, and even fewer understand analysed serials.

Re: Cataloging issue of journal as monograph

Posting to Autocat

On Tue, Dec 20, 2011 at 4:29 PM, J. McRee Elrod wrote:
If 730 is used for some, that excludes them from the series search, leading to missed items. Most ILS we support, even if they have a separate series search (including our inhouse software), also include 830 series in title search. Search results are more consistent with the use of 830.

We've never had a complaint about partial analysis of serials/series not being understood at the OPAC.
Just because no one complains doesn't mean that they understand it. If people (catalogers yet!) have not even been able to understand the 440--so much so that it had to be abandoned for processes that demand more inputting--or if the concept of "surname [comma] forename" is becoming too cumbersome, there is absolutely no way that the untrained layperson can understand something this complicated. What could very well be happening is that people simply draw the wrong conclusions, or, if everything is classed as a set, they go to the shelves and find the item there. If they have problems, they just do something else.

That said, I agree that there are tons of improvements that can be made, and perhaps we could even find simplifications. For instance, while there is certainly a need to distinguish the 730 (uniform title added entry--primarily for analytics) from the 830 (uniform title specifically for series), why can't we just throw all of the analysed serial titles into the 830 as well? Is anything of value lost?

Re: Cataloging issue of journal as monograph

Posting to Autocat

On Mon, Dec 19, 2011 at 11:42 PM, Layne, Sara wrote:
True about the series authority record. I suppose we've been using the serial record as a de facto authority record for monographic analytics of that serial.
But what I *have* always found counter-intuitive are the LCRIs that you cite-- I want to treat those situations as 'true' series also. Do you suppose the LCRIs are actually an effort to avoid creating the series authority record? And, under current LC policy, to enable the tracing of the 'series'?
Although there is a theoretical difference between a series/analysed serial, and we see it in the usage of either the 490/830 or 730, the final product for the searcher is precisely the same level of access.

The real problem from the searcher's point of view are serials that are analysed only in part. Here is an example in the Princeton University catalog for the title "University of California publications in history" (let's hope the link works!)

The first record is for the serial and contains all volumes (vol. 1-vol. 82), but only 5 volumes are analysed separately, some even in different formats. I think this would be rather difficult for a user to understand: they want, e.g., vol. 12, and it is buried in the serial record, but vol. 18 is in a separate record. Also, it could be interpreted as two different copies: that vol. 71, "Politics of prejudice" by Roger Daniels, is both under F870.J3 D17 1962 and also a copy classed with the rest of the serial under D1 .U558. Difficult to know.

It worked somewhat differently in the card catalog, at least in the practices I have seen.

To see how the Princeton card catalog dealt with this, see the completely different heading "California. University. Publications in history" (not that easy to find today in the card catalog!): with a rather large card set for the serial, and the following cards. We immediately see the note "Vol. 33 and following are cataloged separately; see next cards"

We see that the catalogers added each title to the serial record but did not make separate cards for most of them. Then, as we flip through to e.g. vol. 71, we see that the separate card was filed within the card set for the serial. This is much better than how it works today since one of the consequences of computerization is that catalogers have, in essence, lost control of filing and left it all up to the idiot computers. I haven't seen a level of access for MARC records similar to the card catalog except using components, although that has never been implemented very well, in my opinion.

Still, I think this is the sort of area that is obviously complex--too complex for our patrons--and should be completely rethought to increase the utility of the catalog for the public. Not necessarily for us, since we are the experts and already know how all of this works.

How, using the tremendous power of current systems, could the current methods be improved for everyone? And maybe even for us too?

Thursday, December 15, 2011

Re: What RDA might be good for

Posting to Autocat

On 15/12/2011 17:17, Aaron Kuperman wrote:
However I believe that RDA established a framework, primarily through the 3xx fields, of linking related records in a way that reflects 21st century publishing practices, and that if we want to keep our catalogues useful and relevant (which is a code word for keeping catalogers employed) we should be figuring out how to exploit it. Currently we often see a "work" that manifests itself as a printed work (like it always did, but now the printed form may be an on-demand printout that has been individually bound) as well as an ebook, and perhaps a website or a part of someone else's website, as is the case for most statutes--and our current catalogue rules don't do a great job of connecting them (and I didn't mention things such as adaptation to videos, which are irrelevant for law cataloging). If RDA does address these issues, then we've finally found a problem for which RDA might be the solution.

Does anyone agree with me? If so, this is what we should be talking about when we discuss training needs, or "selling" RDA to library managers.

I come from a different standpoint. It seems to me that if we want our catalogs used (and thereby catalogers employed), we shouldn't focus on the needs of the publishers and their practices, but rather we need to focus on the needs of those who will actually use the catalog to find relevant materials: our patrons. The thread on "Discoverability" discusses that. One basic point that must be accepted before we can make any progress--at least I think--is to accept that in the eyes of our patrons, the traditional library catalog is broken. It does not serve their needs.

This is a hard point to accept for a cataloger who has spent his or her entire career honing specialized skills (including me), but as I mentioned earlier, I think it will be almost impossible to make any progress whatsoever in the future if we do not accept this pronouncement.

Once this is accepted (which, I will readily admit, not every cataloger will accept), the question becomes: what is it that RDA will change in such a fundamental way? It turns out that RDA will definitely *not* change anything fundamentally--that is absolutely clear and is essentially what was written in the LC/NAL/NLM report--but if RDA is seen as a step toward an FRBR universe, then it may be a different matter.

Of course, that in turn depends on whether you believe that an FRBR universe will offer anything essentially different to our patrons. This is one of those silent assumptions where I have never seen any evidence. I have also seen that relatively few people actually want to navigate through works, expressions, manifestations, and items since the information describing much of it--especially the manifestations--is essentially meaningless to our patrons. This information has meaning to librarians. A patron will often prefer the latest edition of a book, but catalogs have always supplied that information, while today it is easier than ever to sort by date of publication, or date of accession, or almost any way you want.

I still see no reason at all for adopting new rules; we need new ideas about how to repurpose the information we now have. Since quite literally everything is in such flux right now, no one seems to know which way to proceed. As a result, it seems as if we are stuck with trial and--necessarily--error which will eventually find some useful ways forward.

But the changes offered by RDA will not make any difference to our patrons--that is more than clear. I still find it rather amazing that an institution could adopt an expensive practice involving major changes to workflow, one that will neither add simplicity nor increase productivity, without a very convincing business case showing what the tangible advantages are.

Re: Discoverability

Posting to Autocat

On Thu, Dec 15, 2011 at 3:23 PM, Dunn, Kathryn M. wrote:
I wonder if you're thinking of the University of Rochester's River Campus Libraries and Nancy Fried Foster, their Director of Anthropological Research:
I have been fortunate enough to hear Ms. Foster speak and even to meet her. Her work is genuinely innovative, highly interesting, and is rewarding for anyone to read. Her book "Studying Students: The Undergraduate Research Project at the University of Rochester" is available for free download, and I very much recommend it.

A library anthropologist! That's great!

Wednesday, December 14, 2011

RE : Old School Search Engines

Posting to Autocat

On Wed, Dec 14, 2011 at 3:27 PM, Mitchell, Michael wrote:
James Weinheimer wrote:

For instance, the subject heading browse (alphabetical) for "chess" mashes together not only the topic with its subdivisions, but people's names, series titles, names of computer programs, corporate bodies, and so on. Additionally, before and after the topic of chess come personal names of people, and all kinds of topics and other entities who have nothing whatsoever to do with chess.
I don't see why a properly functioning catalog would mash together "people's names, series titles, names of computer programs, corporate bodies, and so on" when doing a subject browse. That sort of mashup is what one gets from a keyword search. And, that is why I don't care to use keyword searching in a library catalog except in an initial search to discover LCSH terminology or classification areas.

This is exactly what I tried to demonstrate happens right now in the LC Authority File, but this is not to find fault: all dictionary-type catalogs work the same way. This has been known for a long, long time and has always been one of the main complaints people have had with the dictionary catalog.

For instance, in the LC Authority file, you can look for Dogs. What is the subject heading that comes just before it? It takes a while to go through the personal names, corporate bodies and such (nothing having anything to do with the topic of dogs), but then the first subject heading you come to is Dogrib mythology. So, even if we could magically get rid of everything except 150s, we would still be looking from Dogrib mythology to Dogs. That is a very strange leap, which happens only because of English spelling. If I am interested in Dogs, I don't want to see Dogrib mythology.

So, what comes after Dogs? Again, there are personal names, but then comes a reference from Dogsharks (nothing to do with Dogs). At least there are dogsleds, but after that there is nothing whatsoever to do with dogs. This occurs right now whenever you browse an alphabetically arranged list--and always has.

A classified list can be imagined by disassembling the alphabetical lists we see and rearranging them by the BT, NT, and RT references that are inside each authority record. We can see this to a point already, but a much better example is the Getty Vocabularies.

The Art & Architecture Thesaurus display for "armchairs" includes the cryptically-named "hierarchical position". Click on the little triangle-thingy for "chairs" and you will see the amazing number of different types (NTs) of chairs. The A&AT is arranged alphabetically to a point, as we see here, but the primary arrangement is classified (conceptual).

Once again, compare this to the dictionary-type of authority file in the LCSH, look for "armchair" and the headings before and after it, which includes people, corporate bodies, titles and all kinds of things that have nothing whatsoever to do with any aspect of the concept of "armchair".

So, it's not as if the information is not in the records, because it is. The issue is: what is the best arrangement for someone interested in "dogs" or "armchairs": by alphabetical arrangement or a classified arrangement? This is an old, old debate that will probably come alive again.
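The "classified" alternative can be sketched quite simply: disassemble the flat alphabetical list and regroup each heading under its broader term (BT), as recorded inside the authority records themselves. The terms and hierarchy below are invented for illustration; real authority records are of course far richer.

```python
def build_hierarchy(authority_records):
    """Group each term under its broader term (BT), giving a tree to browse."""
    children = {}
    for term, refs in authority_records.items():
        for broader in refs.get("BT", []):
            children.setdefault(broader, []).append(term)
    return children

# Invented miniature authority file: each heading lists its broader terms.
authorities = {
    "Chairs": {"BT": ["Furniture"]},
    "Armchairs": {"BT": ["Chairs"]},
    "Rocking chairs": {"BT": ["Chairs"]},
    "Dogs": {"BT": ["Domestic animals"]},
}
tree = build_hierarchy(authorities)
print(sorted(tree["Chairs"]))  # the NTs of "Chairs", grouped conceptually
```

Browsing such a tree, someone interested in "Chairs" sees armchairs and rocking chairs together, rather than whatever happens to sit next to "Chairs" in the alphabet.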

Re: Discoverability

Posting to Autocat

On Wed, Dec 14, 2011 at 12:45 AM, Audrey Driscoll wrote [concerning the article at: <>]:
When I read the interview, one thing jumped out at me -- the research and testing Google does on how people search for information.  They review search logs, but also observe real people doing searches, including their eye movements.

Has anyone done this kind of thing in the library world?  James Weinheimer keeps saying that library users don't search the way they used to and our OPACs are not what they want.  Has anyone tried to quantify what they do want?
Just as a point of clarification, I think it goes without saying that when people sit at a computer and enter information into a text box today, they are expecting to get results based on keyword and arranged in various ways--they do not expect lists of headings arranged in alphabetical order.

This is why I am so much against the FRBR "user tasks": the very word "Search" is taking on an entirely new meaning today that couldn't even have been imagined 30 years ago (except possibly by some "futuristic" thinkers such as Marshall McLuhan, but even then, probably not). In an environment that lets an "item" change literally from moment to moment and be mashed up into zillions of variants, the term means something quite different from the "item" that sits on a shelf somewhere.

FRBR describes a world that--I won't say has disappeared--but a world that is being rapidly overtaken by outside changes in the information environment. This development should be seen as very good and positive.

What is it that people want today? I don't know but lots of very important, powerful and rich companies are trying their utmost to find out.

While it is my gut feeling that people will no longer submit themselves to the regime of searching a library catalog "correctly", I think they still would like to take a lot from it. "Expert selection" immediately comes to mind; the idea of "consistent results" comes to my mind but probably not to a non-librarian; "conceptual searching" and so on. These were abilities that the card catalog allowed but have disappeared with today's Googley-type search engines. 

I think there is a major place for those traditional powers today. If we admit that, in the eyes of the public, the library catalog as it is now is broken, then we could really focus our efforts on fixing it--not by trying to impose a 19th-century view of the information universe on something that is fundamentally different.

RE : Old School Search Engines

Posting to Autocat

On 13/12/2011 18:04, J. McRee Elrod wrote:
For us poor spellers, or when not certain of a surname spelling, browse can be very handy.
Not all searches are subject searches. Sometimes one is seeking a partially known item or person.
Of course, a much more powerful way to search for a word you do not know how to spell is the "fuzzy search" available today. Even the open-source catalog Koha has this option, and anyone sees it in action with every search in Google: e.g. a search for "philsophy" automatically "corrects" your search to "philosophy" but allows you to continue with your original search. If you don't know someone's full name, there is the Wikipedia solution, e.g. the "list of people with surname Smith", which leads to a huge number of disambiguation pages. Of course, none of them are in the order surname, forename.
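For the curious, this kind of "fuzzy search" needs nothing exotic; even the Python standard library can approximate it. The vocabulary below is invented; a real catalog would match against its own indexed headings, and production systems use far more sophisticated spelling models.

```python
import difflib

# Invented mini-vocabulary standing in for a catalog's indexed terms.
vocabulary = ["philosophy", "philology", "physiology", "photography"]

def suggest(query, terms, cutoff=0.75):
    """Return indexed terms whose spelling is close to the query."""
    return difflib.get_close_matches(query.lower(), terms, n=3, cutoff=cutoff)

# The misspelling "philsophy" still finds "philosophy".
print(suggest("philsophy", vocabulary))
```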

If you are interested in someone and all you have is the name "Smith", that is really sad for you, but it seems as if the Wikipedia pages, with the possibility of adding a "relator" term to the search, would be at least a little more useful than the library's files.

I still maintain that this would be a great place for cooperation among different communities!

Tuesday, December 13, 2011

RE : Old School Search Engines

Posting to Autocat

On Mon, 2011-12-12 at 12:21 -0800, J. McRee Elrod wrote:
Keyword searching is only a solution in a monolingual collection, and still may miss material with "cute" titles, if lacking good contents, summaries, and subject headings.
I am questioning the utility of the dictionary aspect of the current catalogs, i.e. where someone must do an alphabetical browse to find, e.g., "dogs" or "Argentina--History" and then finds the subjects arranged alphabetically, as opposed to finding records by just searching for keywords. If you are going to browse today, I would think that a classed arrangement would be much more useful to people than an alphabetic one.

For instance, the subject heading browse (alphabetical) for "chess" mashes together not only the topic with its subdivisions, but people's names, series titles, names of computer programs, corporate bodies, and so on. Additionally, before and after the topic of chess come personal names of people, and all kinds of topics and other entities who have nothing whatsoever to do with chess.

In contrast, the tool Visuwords, although graphically horrifying to me personally, at least lets people "browse" the topic conceptually, and it seems to me a much more useful arrangement than the dictionary one.

Who knows? Perhaps variations of Roget's original thesaurus arrangement will become more popular, as Roget's original preface suggests. It begins:
"THE present work is intended to supply, with respect to the English language, a desideratum hitherto unsupplied in any language; namely, a collection of the words it contains and of the idiomatic combinations peculiar to it, arranged, not in alphabetical order as they are in a dictionary, but according to the ideas which they express."
It may turn out that the dictionary arrangement will prove valuable, but it's why I mentioned that the old debate will probably be revisited in the future.

Monday, December 12, 2011

Re: [ACAT] RE : Old School Search Engines

Posting to Autocat

On 12/12/2011 19:00, J. McRee Elrod wrote:
Hal said concerning shelf browsing: 
I'll forestall Mac Elrod (I hope) by pointing out that you may get a similar effect if you use a classified browse index ... 
You do not mention the fastest growing category of all - remote electronic resources. For me, the hope of classed browsing is the major reason to class electronic resources. Classification should be more than "mark and park".
I have gone back and forth throughout the years concerning the value of classifying online resources. My current way of thinking is that classification could be immensely valuable, but its functionality must change substantially from what it is today. Items that are virtual can have multiple class numbers from multiple classification systems, and consequently they can reside on different "shelves" simultaneously, while being rearranged according to the needs of each individual searcher (referring back to my Harry Potter example of the books flying all over the library).
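As a sketch of those multiple simultaneous "shelves": if each virtual item carries class numbers from several schemes, the same collection can be arranged and rearranged per scheme on demand. The items, schemes, and class numbers below are invented for illustration.

```python
from collections import defaultdict

# Invented items, each carrying class numbers from several schemes.
items = [
    {"title": "Politics of prejudice",
     "classes": {"LCC": "F870.J3", "DDC": "305.8"}},
    {"title": "UC publications in history",
     "classes": {"LCC": "D1"}},
]

def virtual_shelf(items, scheme):
    """Arrange the same items on a virtual shelf for one classification scheme."""
    shelf = defaultdict(list)
    for item in items:
        number = item["classes"].get(scheme)
        if number:
            shelf[number].append(item["title"])
    return dict(sorted(shelf.items()))

print(virtual_shelf(items, "LCC"))  # the same collection, rearranged per scheme
```

The same item appears on the LCC "shelf" and the DDC "shelf" at once, which no physical arrangement can offer.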

I suspect that in the future, the members of the cataloging/metadata community will find themselves rehashing a lot of the old debate of classified vs. dictionary arrangements. Keyword access avoids many of the problems within the traditional classified catalogs that made the dictionary catalog such an important option. Best would be a blending of the classification arrangements with the syndetic structures of the subject headings, and something really valuable could result.

That is, if people will still want our catalog records in the new environment.

RE : Old School Search Engines

Posting to Autocat

On 11/12/2011 19:39, Beartooth wrote:
    Among full-time scholars there was, and likely still is, another strategy, conditioned only I think in part by the fact that many of them had carrels, seminar rooms, and/or other footholds far from the card catalog.

    By hook or crook or wandering around, one might find a tier of shelves with some apparent if marginal relevance to what one wanted. At that point one could scan bindings for promising titles (or names, if one knew any); pull one off, open it, and look for a bibliography.

    At that point today, I would pull out a netbook and try the online catalog for two or three items; go to the shelf where they belonged; and iterate the process.

    Back in the Biblio-Carboniferous, of course, I'd've hoofed it down to the card catalog as a last resort, but would first have made a couple stabs at navigating by the seat of my pants to that second shelf.

    In a nutshell, this approach depends entirely on a faith in the library's subject cataloging being reasonably true to Jefferson's original principle of keeping things together that were likely to be used together (rather than, say, trying to emulate Peter Mark Roget).

    The reality was, and I hope still is, that it works fairly well -- and, like browsing in the bookstore sense, is a lot more fun than the methods that catalogers a/o reference librarians mostly advocate. It would be interesting to know whether the current generation does something comparable with its electronic engines.
Yes. Browsing the shelves has always been the most popular, and pleasurable, way to find information in the library. Unfortunately, many people use this as a replacement for using the catalog, and the consequence is that these people assume (or simply hope?) that all of the materials they need are miraculously on those shelves. (I personally imagine a scene from the library in the Harry Potter movies, where all the "important and relevant" books on the topic I want fly magically from wherever they are in the collection to the shelves I happen to be browsing.)

Any cataloger will know that the materials on a topic can be spread all over the library for all kinds of reasons: materials in special collections or special locations, in an annex, journal articles, other formats (e.g. microforms); plus we all know that whenever a topic gets semi-complex, any book could legitimately go into several places in the classification scheme. This has been the situation since at least the library of Alexandria, but still a huge number of the public relies on browsing the shelves as a main method of finding their information. (I, too, was one of their number before I got into library school.) One of the most common reference questions is: where are your books on [fill in the blank]? They want to hear a physical location consisting of several shelves instead of being told the best ways to search the catalog.

Today, we have to add electronic resources into the mix of what is missed when you browse the shelves, but I have thought of an additional wrinkle here. It seems to me that the public considers a full-text search to be the same as browsing the shelves. The short "metadata record" they see in the Google result, with the title of the resource they can click on, followed by a short description, is almost the same as browsing the spine titles of the books on the shelves. When they look at the actual resource, it's the same as taking the book off of the shelf and leafing through the pages. That's the way I feel about it.

Since this is the most pleasant part of working with a physical collection, it only stands to reason that it is also the most pleasant part of working with a virtual collection. In both cases, the searchers are hoping (incorrectly) that they are working with the materials on their topic, or at least the most important, "relevant" ones. Still, it is in many people's interests to maintain the fiction that they really *are* looking at the most important, relevant materials on their topic.

If the system and metadata worked correctly, a virtual book could go onto many different virtual "shelves", and the Harry Potter scene could become a reality.

Sunday, December 11, 2011

Old School Search Engines: Where Are They Now? : Some thoughts

Posting to Autocat

I read an article in WebProNews, “Old School Search Engines: Where Are They Now?” by Chris Crum (November 16, 2011), where the author discusses the “old” search engines (all launched from 1995-1996--not that long ago!), and it gave me a chance to reflect on some of the historical developments I have seen and studied. Among other "old" search engines, the author talks about WebCrawler, which was my own favorite. It still exists(!) and has now become an aggregator of results from other web search engines. He mentions Altavista (acquired by Yahoo and now scheduled for elimination), HotBot and others. Seeing Dogpile brings to mind some forgotten feelings about it: I refused to use Dogpile simply because of its name, which didn't inspire much confidence, to say the least!

The article doesn't discuss pre-web search engines that were at least semi-popular. I am thinking of the old gopher networks that were built before the existence of the World Wide Web; for some reason, the search engines for that technology were named after characters in the “Archie” comic book series: Archie searches, Veronica, Jughead, and perhaps some others. Gopher was all text-based and utilized telnet. For those who are interested, here is what it looked like:

I found it interesting that parts of Gopher still function and that there seems to be a small community trying to keep it going. This seems to me to be a unique type of cultural preservation project. Anyway, as I remember, libraries were just beginning to build major gopher sites when HTML and the World Wide Web came out, and suddenly everything changed to http accessed through new programs called "browsers".

Reflecting back on how all of these systems popped up, only to become essentially forgotten after just a few years, has made me think about library catalogs and how they, too, have died or changed into something quite different. Before the card catalog, there were different types of book catalogs, either printed or manuscript. The later development of card catalogs attempted to incorporate various new technologies, some of which were abandoned: microforms, photography and other attempts. When OPACs started to appear, everything at first was text-based and looked similar to the gopher sites. We can see how this worked by reading documentation produced by the library at Indiana University for NOTIS. From that documentation, we can see that the functionalities and displays of that catalog replicated card catalog browsing as closely as possible, e.g. the guidelines for subjects. Displays of the records themselves began to vary from traditional card displays, e.g. see "The old man and the sea" example at Virginia Community Colleges, but the functionality was text-based, where you had to type in "a" for author, "t" for title, and then type the number of the record, or e.g. LON for the long display.

I remember how hypertext allowed people to just click on links instead of typing in all of those cryptic codes, and how popular that was (including with me). But more importantly, the moment keyword searching was introduced, the public loved it, and fewer and fewer people browsed the headings in the old ways. To be fair to the public, browsing headings requires cross-references to be really useful, and the first computerized heading browses omitted the cross-references. It took a long time to incorporate the authority files, and even today it is not done very well. In any case, as time went by and people used keyword searching almost exclusively, many of the older people forgot about browsing the headings, while the younger people never learned much about it in the first place.
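What a heading browse with cross-references actually did can be sketched in a few lines. This is a minimal illustration (the headings and see-references are hypothetical examples, not drawn from any real authority file) of the piece those first OPACs omitted:

```python
import bisect

# Hypothetical browse index: a sorted file of authorized headings,
# plus "see" cross-references from variant terms.
headings = sorted([
    "Automobiles",
    "Cookery",
    "Moving-pictures",
])
see_refs = {
    "Cars": "Automobiles",       # variant term -> authorized heading
    "Cooking": "Cookery",
    "Movies": "Moving-pictures",
}

def browse(term):
    """Return the alphabetical neighborhood of a term, resolving
    see-references first -- what a card catalog browse provided."""
    term = see_refs.get(term, term)
    i = bisect.bisect_left(headings, term)
    return headings[max(0, i - 1): i + 2]

# A searcher who types the unauthorized "Movies" is still led to the
# authorized heading and its alphabetical neighbors.
print(browse("Movies"))
```

Without the `see_refs` table, the searcher who types "Movies" lands in the wrong part of the alphabet and finds nothing, which is exactly why the early computerized browses felt broken.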

Ultimately, as entire texts became available for searching, the power of full-text searching became clear to everyone; in addition, searchers could access the information they wanted immediately, without going through anything that resembled a "catalog" or "catalog records". At the same time, only a few experts could see or understand the problems in full-text searching. Of course, experts realize there are massive amounts of metadata utilized in the background when you use a search engine such as Google or Yahoo, but the searcher is largely unaware of it.

Nevertheless, the larger story from all of this is that the two systems (search engines and library catalogs) are merging, and doing so of necessity. It is obvious that it is far more pressing for library catalogs to merge with search engines than vice versa since the public has made its preferences very obvious. I see few pressures on search engines to become more like library catalogs, while there are many calls for library catalogs to become more like search engines, or at the very least, more like Amazon.

From this viewpoint, compared to 15 years ago, people behave quite differently when searching for information and have different expectations of what they should be able to do with that information once they have found it. The problems the public experiences are not based on cataloging rules, or if there are problems in that respect, they relate much more to how the catalog *functions* than to the rules themselves. That is, people very rarely browse headings any longer, as they were forced to back in the days of the card catalog and text-based searches as we see in the old gopher searches, and how NOTIS-type OPACs worked. An even more basic challenge is that people today tend to search library tools only *after* searching their favorite search engine or other database.

In spite of these considerations, the traditional heading browses nevertheless provide a power that currently cannot be found anywhere else; but the heading browses have become dysfunctional in today's computer systems. So the primary question should be: how can the power of those browses be incorporated correctly into the information tools that our searchers rely upon, whether or not those tools are located in a local library catalog? Working on this issue could have far more positive results for libraries than an overhaul of our cataloging rules, which will result only in a few insignificant changes in the displays of our catalog records.

One additional point: these search engines are labelled as "old school" even though they came out only around 15 years ago. That is about the same time as when FRBR came out, but in the library world, FRBR is considered to be the most modern statement we have! This illustrates yet another basic difference between the library information world and the "information world at large": radically different perceptions of time. For the "information world at large", 15 years ago represents a fundamentally different information world from the one we are facing today.

Maybe they are right.

Thursday, December 8, 2011

Re: Editing record for additional copies of popular fiction titles

Posting to Autocat

On 08/12/2011 20:21, Marian Veld wrote:
On Mon, Dec 5, 2011 at 3:51 PM, Audrey Driscoll wrote:
Isn't this what FRBR was intended to address? :-)

We "lump" fiction editions, similarly to what has already been described. Because I began my cataloguing career in an academic library, this practice still bothers me, even though I understand the reasons for it. I especially dislike the fact that the bibliographic record describes a specific edition, even though only some of the items attached to it belong to that edition. It would be better if the record was more generic, but we don't go back and rework it when adding other eds.

We also have a practice for "children's classics" which is similar, except that 700s are added for all the different illustrators.
I was waiting for someone to bring this up. All the hoopla of RDA not being based on what patrons actually want completely misses this point. While RDA might not be exactly what patrons want, for public library patrons at least, it's a step in the right direction.
The problem with this, as I see it, stems from the fact that the library catalog is supposed to serve (at least) two equally important functions: 1) to give the public some basic access into the collection, and 2) to be a complete inventory tool for the library managers. An additional point today is that catalog records can be shared much more easily than ever before, and by all kinds of people in all kinds of ways. For instance, one major use of catalog records today is to provide automatic bibliographic references for reference management software (all available for free, so long as you have a computer). Therefore, a student now tends to rely on a catalog record for bibliographic references, so poor catalog records could affect them very seriously. Telling people that they should be careful with automatic bibliographic references can elicit (and, in my case, has elicited) the retort that the records in the catalog should be better.

It is vital to remember that today's systems are very powerful, and library catalogs use only a tiny percentage of that potential. For instance, the public view of the catalog records can be radically different from the view of librarians. While the catalog records can be complete and accurate--for a cataloger--there is a lot of information that members of the public may not need or want. All kinds of views can be generated today, including FRBR displays and other merged types of displays, but I am sure that many other displays would be far more innovative and interesting. Consequently, everybody *can* be happy today.
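The idea of many views generated from one record can be sketched simply. This is a hypothetical illustration (the record fields and function names are my own, not any system's actual API) of a brief public display and a full staff display derived from the same underlying data:

```python
# One record, many views: the same underlying data serves both
# the patron and the cataloger.
record = {
    "title": "The old man and the sea",
    "author": "Hemingway, Ernest, 1899-1961",
    "publisher": "Scribner",
    "date": "1952",
    "fixed_fields": {"encoding_level": "full", "cataloging_source": "DLC"},
}

def public_view(rec):
    # Patrons see only what helps them identify and cite the work.
    return f'{rec["author"]}. {rec["title"]}. {rec["publisher"]}, {rec["date"]}.'

def staff_view(rec):
    # Catalogers see the complete record, fixed fields included.
    return rec

print(public_view(record))
```

Nothing is lost from the inventory function by simplifying the public display; the full record is still there for the librarian, and any number of further views (FRBR-style groupings, citation formats) could be generated the same way.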

In the long run, library catalogs will have to deal with full-text automatic indexing, which we can all assume will get better and better while becoming cheaper and cheaper. It seems to me that catalogers will have one advantage they can definitely point to: while our records will probably never be "more, cheaper, or faster" than automatic full-text, and I even think an insistence that our records are "better" will fall on deaf ears, the one point in our favor will be our *consistency*: the *reliability* of records that follow shared standards and can be retrieved equally well from one day or one week or one year to the next. I don't think I am the only person in the world who is driven crazy by the lack of indexing/cataloging consistency in tools such as Google, where "tweaking" occurs constantly. I often have extreme difficulty finding a site I had seen the day before, and sometimes a week later I don't even try if it's not all that important.

This would be just one single step toward making the library catalog more useful to the public, but it would be an important one. Library cataloging provides consistent and reliable access. It is something that people can rely upon, and if the public comes to understand this, perhaps it may come to be appreciated much more than it is at present.

Monday, December 5, 2011

Re: Editing record for additional copies of popular fiction titles

Posting to Autocat

On 04/12/2011 21:55, Valerie Borthistle wrote:
I have recently begun cataloguing fiction in a public library setting. I am learning that, for popular fiction titles, when added copies are purchased, it is desirable to have all copies of the same format on one bibliographic record to facilitate the holds process. In my previous work at an academic library, we followed standards to create a new bibliographic record if there is a different publisher and date of publication, different ISBN, pagination, etc.
I cannot restrain myself from giving a historical note:
Examples from older practices can be seen in Princeton's scanned catalog where you often find an added entry card, and there would sometimes be the note "For other editions, see main card" and the "main card" was not a single main card but a number of cards.

Here for instance, is a facsimile of Cornwallis' "Discourses upon Seneca", where we see the "other editions" note. But, when we look for the main card we see that there are three versions cataloged separately: a microfilm of a 1601 edition, an original from 1631, and the actual "main card" of the facsimile. In this case, there are three different editions, in three different locations, with three different call numbers.

So we see that, in some cases, this practice aimed to limit the growth of the catalog by cutting down on the number of added entry cards, while the catalog itself still provided the same amount of descriptive information. In older book catalogs, practices were even more varied.