Monday, February 18, 2013

Cataloging Matters Podcast no. 18: Problems with Library Catalogs


https://archive.org/details/Eighteenth


Hello everyone and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy. My name is Jim Weinheimer.

In the last episode, I provided some examples of how people want to manipulate data instead of plowing their way through masses of printed text but I went on to express my doubts that the information in catalog records is actually the type of information that most people want to manipulate. I would like to continue that discussion.

I want to add one more example of the kind of data that people want to manipulate, because it has meaning to me personally.

I used to be a semi-serious chess player. Every beginning player has the experience that, after just a few moves, you find yourself looking at a position you do not understand, while your opponent knows everything. He is smiling, moving quickly and easily, while you are suffering and spending lots of time just to find moves that you hope don't lose. It doesn't take too many of these experiences, and lost games, before you figure out that if you want to have good results, you must prepare your first moves, also called the chess openings, and that means doing research.

This is genuine research by the way—nothing at all like those undergraduate papers where five or six scholarly articles fulfill an assignment that nobody cares about. No, you care. You want the best and you want to be thorough because otherwise, you will suffer and you will lose. So what does it mean to do this kind of research?

In the past, it meant spending money to get the largest library of chess books and magazines you could afford and borrowing anything you could get your hands on. These materials were—and still are—filled with games and notes, and you hoped everything was well indexed, so that you could bring it all together and write—manually—your own “opening book” of good moves, bad moves, plans, ideas and so forth. Doing this could take months of hard work and you were always adding to it.

Today, all this is done with databases and what used to demand so much labor and time to sift through this massive amount of information now takes only a few seconds. The first time I saw one of these tools in action, I was quite literally left speechless! Grandmaster Gennadii Sosonko says that before databases, it took anywhere from a year to a year and a half to prepare a new opening. But because of databases, the research takes only a few seconds, and the data can be mined in new ways, so today to reach the same level of preparation requires only... two weeks! Two weeks versus a year and a half. And you are as well prepared as anyone. That is incredible. For those who are interested, I have added a link to a video that demonstrates this. You don't need to know any chess to see the power of such a tool. Obviously, chess players who do not use these tools are probably at a serious disadvantage.

I have no doubt that others want to do something similar—not with chess, but with whatever topic they prefer. I know I would. The reason it works so well with chess is because the moves that once were printed on paper have now been made into data and that data can be manipulated by computers in all kinds of ways. To do something similar with other topics, it would be necessary to turn the information on paper into a kind of data that computers can understand and work with.

It also shows the problem with catalog information that I discussed in the previous episode of Cataloging Matters. As a chess player, I am interested in the data of the chess games themselves, that is, the individual moves, their evaluations, who played them and when, not the data about the books and the serials and the videos and everything else that contains the information I want. Therefore, as a chess player interested in improving my play, which information from the catalog record would I want to manipulate? The fixed fields, the standard numbers, the main or added entries, the titles, the publication information, the physical information, the series information, the notes, the subjects? None of that helps me improve my play. And yet, I am always interested in finding more “chess data” to put into my database.

In the same way, I think most people are interested in improving their knowledge and understanding of baroque architecture or political issues of their community or plasma physics or whatever interests them, but manipulating the bibliographical details of the containers that hold the information that interests them will not help them understand those topics.

This is why I say that while we can go ahead and “turn our catalog records into data”, it is, lacking any evidence to the contrary, at the very least extremely naive to expect the public to find new insights into the topics that interest them because they will be able to manipulate the standard numbers or the publication information or the notes or the publication patterns, or any of the other information that is in our records.

So, why would anybody need catalog records? What more could I want regarding my chess data? As I said before, I am always interested in finding more “chess data” to put into my chess database, and this is where the catalog information comes in. Although the catalog does not have chess data, it can lead me to chess data.

It can be argued that full-text searches can lead to more chess data, too. What is the difference between these tools?

Everyone recognizes that the public has changed its “information seeking” behavior in fundamental ways from what it was only 20 or so years ago. For those listening who may be relatively young, 20 years may seem like a long time, but in library-time, it must be recognized that 20 years is quite literally the blink of an eye. What this means is that every day almost everyone who uses a library's collection works with materials and records made long ago. Often, those materials are among the most important and valuable parts of the collection. This does not happen in many other fields, such as businesses and most other organizations. For them, the information made before a certain time, say five or ten years ago, is much less important for their needs and is discarded or archived, and on those occasions when it is retained, it is kept as a curiosity.

Materials in libraries are very different.

Full-text search engines have profoundly changed the way people search and even the way people think about searching. It seems that even for many of those who did work in those earlier times, their memories have faded. I know my memories have until I start working to remember.

One example of how deeply we have changed is that today, everyone takes for granted the over-arching importance of “relevance ranking”. Relevance, a word that sounds innocent enough, has taken on semi-propagandistic uses in that it mixes the sense of its meaning in statistics and information science with the way it is more popularly understood. Companies such as Google, which make billions of dollars, are very interested in making sure that these two definitions remain mixed together in people's minds as much as possible.

In spite of what some may prefer to believe, the two senses are definitely not the same, but it can be difficult to see and comprehend the difference. We can discern that difference most clearly when we examine a search engine result versus a search in a library catalog, when the search in the library catalog has been correctly made and the library catalog also works correctly. I emphasize correctly because it is extremely difficult to do today.

How do people find materials with full-text searches? Research on search engines (I have some links in the transcript) has consistently shown that people concentrate almost all their attention on the top three or so results. People almost never go beyond the first page. It should be added that the default number of search results in Google is ten, and since people rarely change a default setting, the first page means ten results.
(Search User Interfaces: Presentation of Search Results / Alexander Schreiner. In: Themen des Information Retrieval : Suchmaschinen und Web-Suche : Beiträge des Seminars im Sommersemester 2012 / Andreas Henrich, Daniel Blank (Hrsg.). p. 35+; and Search User Interfaces / Marti Hearst. Cambridge University Press, 2009. p. 136)

I have personally been fascinated when I watch people work with Google. They put in a word or two or three, look at the top three results, or five at the most, and if they don't find what they want, they immediately try other words, look at the top three or five results, try yet other words, and so on.

I confess I have found myself searching Google in exactly this same way. Such actions betray a number of assumptions on the part of the searchers—and this apparently includes me when I do it.

Many of these assumptions are rather illogical but entirely understandable. As one example of these assumptions, it seems illogical to believe that a search through the vast information resources now on the internet and that retrieves several hundreds of thousands or millions of results could possibly have only a paltry three or four hits that are “relevant” and that the millions of other pages are therefore practically “irrelevant” and can be ignored. That really makes no sense but it is what I see with Google results. After the top few results, the rest really is almost completely irrelevant.

After the first few hits, I see more and more places to buy books or videos or tee shirts or bizarre email exchanges that are (I guess) somehow “relevant” to my search. I have always found this very strange. You would think you would find highly relevant items at first, then slowly you would see less relevant and gradually it would trail off to complete irrelevance, but my experience, which may be different from anyone else's, has been a more or less complete drop off after the first five or ten maximum. Therefore, I think people are right to stop looking after the top few. But I often think: is that true? I can't believe it. Furthermore, to believe that a machine could automatically bring the results to the top that are the “best” and “most appropriate” and to do it for me as an individual at any particular moment, is akin to magical thinking.

It begins to make more sense when we consider the information science meaning of the word “relevance”. That meaning of relevance is quite different and has to do with mathematics and algorithms, with precision versus recall and so forth. This is the meaning of relevance for a Google search—buried in statistics and algorithms (almost all secret by the way)—but it is something I don't believe the average person understands. When people hear that the top hits are the most “relevant” to their search, they confuse this algorithmic sense of “relevance” with “best” or “most appropriate” or “most useful” and then, they eventually come to believe that these pages, by definition, really are the “best” or “most appropriate” or “most useful”.
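As a rough illustration of the information science sense of the word, precision and recall can be computed for a single search as shown here. This is only a sketch: the document identifiers and the set of “relevant” documents are invented for the example, the function name is hypothetical, and it says nothing about how Google actually ranks its results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: the documents the search returned
    relevant:  the documents that a human judged relevant (assumed known)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents actually found

    # Precision: what share of what I was shown is relevant?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of everything relevant was I shown?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Invented example: five results come back, two of them are relevant,
# and two other relevant documents were never retrieved at all.
retrieved = ["doc1", "doc2", "doc3", "doc4", "doc5"]
relevant = ["doc1", "doc3", "doc8", "doc9"]

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # prints: 0.4 0.5
```

Neither number says anything about whether a document is the “best” or “most useful” one for a particular person at a particular moment, which is the popular sense of the word.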

I can't prove it, though I don't think it can be disproved either, but I have come to suspect that Google does not so much find the most relevant sites (even in the information science meaning) as move the completely irrelevant junk that had tormented everyone for such a long time to lower levels in the search result. What is left over is popularly interpreted as the “most relevant” or “best”, but what is genuinely the “most relevant” or “best” may still lie buried inside the search result somewhere, or may not even be in that result at all.

More importantly though, this matter becomes clearer when we compare it with a correctly done catalog search where everything works differently. Let's imagine that I am interested in “popular songs”. A reference librarian would immediately understand that my request most probably reflects a lack of focus and would begin to ask questions such as: popular songs from where, from what time, which genres, am I interested in recordings or texts, and so on. A reference librarian could help me a lot.

But even if I do not consult a reference librarian, there is a lot of help with a correctly done search in a correctly made catalog.

I know that on the lists and in my podcasts I discuss library and cataloging history and I hope it doesn't put too many people off. I do so not out of a sense of nostalgia, but because I believe it is impossible at this point in time to understand our current catalogs and decide in which directions they should change without clearly understanding what they are, and that means knowing at least a bit of their history. And for better or worse, that means discussing catalogs that existed in other formats. Never forget that the records we make today could easily fit into a card catalog of 1870.

During the days of the card catalog where everything was in alphabetical order, I would search for “popular songs” by opening the card drawer as close to “Popular” as possible and eventually come to a card like the one I have placed in the transcript, which is from the Princeton University scanned card catalog. It says:

When imagining someone doing this in reality, it is essential always to keep in mind that I could not come to this card directly as the hyperlink allows. It would take me some time to find this card, because first, I wouldn't know it existed, plus I would be browsing from the beginning of the drawer of cards. In this case, I would have seen and browsed past the title “Popular history of British ferns”, the subject heading “POPULAR LITERATURE--FRANCE”, the title “Popular political economy”, a cross-reference for the corporate body “Popular revolutionary American alliance” and so on. That is, I would see many records that have nothing at all to do with what I want: popular songs.

After this browsing, I would find the card that would tell me that I should look under “Music, Popular (Songs, etc.)” so I would walk over to the “Ms” where I would once again browse just as I did before, seeing even more materials that had nothing at all to do with what I wanted, and would eventually find a special arrangement of cards. There is a link to this arrangement in the transcript. http://bit.ly/XPy0ek. Unfortunately the scans go a bit crazy for a while but you can still see each card. Click on “Next Card” and go through just a few of them. The searcher discovers that this topic “Music, Popular (Songs, etc.)” has been subdivided into groups, such as “Addresses, essays, lectures”, “Bibliography”, “Dictionaries”, and as I continued to browse, I would discover that I could also find popular songs of different geographical areas. Quite a bit of help.

For those who used the printed books of the Library of Congress Subject Headings (those terrifying, big, fat, red books that I never understood before library school), I would again browse alphabetically, looking for “popular songs”. We can see how it worked from a copy of the relevant page found in Google Books. I added it to the transcript.

http://books.google.it/books?id=rREoAAAAMAAJ&dq=library%20congress%20subject%20headings%20music&pg=PA3613#v=onepage&q&f=false

Under “popular songs” I see that I should look under “Popular music”. The historian can see that the heading has changed since the card catalog. In this example, “Popular music” is on the same page in the printed book, so we just go to the top of the page.

We discover that added to the topic “Popular music” is the not very highly readable (May Subd Geog) which, to those who know, means that this can be subdivided by geographical area. We also see related classification numbers, the UF, BT, NT and a scope note. Continuing on, we can see some subdivisions and find that “Popular music – Louisiana” has a Narrower Term of “Zydeco music”.

All of this can be very helpful to someone interested in popular songs, and in the absence of a reference librarian can help people focus their thoughts and perhaps lead them from the vague notion of “popular songs” to something tangible that interests them. In this case “Zydeco music”.

That's how it worked in the printed world. It would have taken a lot more time than I have taken to explain it. You may also have had to wait because someone was using the card drawers you needed. Searching the card catalog was just a pain. And yet, there were advantages.

Let's compare this to browsing entries for the subject “Popular music” in the online LC catalog. There's a link in the transcript. http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=popular%20music&Search_Code=SUBJ_&CNT=100&hist=1. We are assuming we already know that the subject to browse is “Popular music”. What do we see?

We see many, many more subdivisions than those in the printed LC Subject Headings. Each geographical subdivision is displayed separately, resulting in an overwhelming list, and this illustrates how the cryptic (May Subd Geog), although not very comprehensible, actually came in very handy to help someone understand how a topic is sub-arranged. There are also many more subdivisions in this list than we see in the printed LC subject headings, and these come from the list of free-floating subdivisions that can be used under any topical heading. I provide a link to an old version of that list http://www.itcompany.com/inforetriever/form_subdivisions_list.htm, where we can find “Bibliography”, “Bio-bibliography”, “Discography” and many others.

After browsing through ten screens comprising 100 subject headings each under “Popular music” or 1000 subject headings—I repeat: 1000 subject headings—I am only up to “Popular music—France—1901-1910”. It's hard to say how many screens of popular music there are, but I think it is safe to conclude that only the tiniest percentage of a populace used to looking only at the top three hits would last to the bitter end, or even half-way through to see the key Narrower Term reference from Popular music – Louisiana that leads them to “Zydeco music”. No one will do that today. Including me. I refuse.

There was a similar problem with card catalogs of course. Although I can't demonstrate it physically—people will have to just take my word for it—it was a lot easier to flip through the cards in a card catalog or page through the subject headings in a book catalog than plow through these web pages. But it was still a pain.

Once I do find “Zydeco music” in the computer catalog http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=zydeco%20music&Search_Code=SUBJ_&CNT=100&hist=1 I find some other intriguing subjects, such as “Zydeco music—Finland” along with a related term “Cajun music”.

This simple example illustrates that the catalog is based on creating intellectual groupings, that is, sets, of similar items and presenting those sets to the searcher in different ways. There is no concern at all for anything resembling “relevance”. It isn't as if you would look at the 200 items you find listed under “Zydeco music” in the LC catalog and think “I don't see what I want under the first three records listed here so I'll try another search”. At least, I hope searchers do not do that today, although from their point of view if they did it would be fully logical. So people may do this—I don't know. Does anybody know? Somebody should.

The assumption with a library catalog should be: if the information about Zydeco music you want exists, it will definitely be within this grouping labeled “Zydeco music”.

Is that true?

No.

Why? For several reasons. One of the main ones: catalog records base themselves primarily on complete resources—technically speaking, 20% or more of an item, so within a specific collection there may be many materials with information about “Zydeco music” but not everything warrants a separate heading. In fact, there may be a lot of information about Zydeco music in the resources found under the broader term “Popular music—Louisiana”, maybe even “Popular music—Southern States”. It would not be stretching the imagination that there may also be significant information on Zydeco music under materials with the related term “Cajun music”. How can someone be aware of all of that?

Let's look. What happens in the library catalog if I browse the subject headings for “Zydeco music” and I go forward and backward? If I browse backward, I find the heading “Zydeco dance--Study and teaching--Louisiana”, which is perhaps not too bad, but next comes a subject heading about the word “Jew” in Lithuanian.

If I browse forward, I find Zydeco musicians but then come some place names and corporate bodies in Poland. While Zydeco musicians and dancing may be all right for my purposes, those other topics are of absolutely no value to me. They are so far off that they can't even be labeled serendipity. Some have claimed that alphabetical arrangement is essentially no different than random arrangement—or at least a completely arbitrary arrangement—and this demonstrates why.

Obviously, what someone really needs, when looking at records of “Zydeco music” is to know that there may be more information on Zydeco music at least in the groupings “Popular music—Louisiana” and “Cajun music” if not maybe others.
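The relationships described here can be pictured as a small data structure. What follows is a minimal sketch, assuming only the two relationships mentioned in this talk (the broader heading “Popular music--Louisiana” and the related heading “Cajun music”). The class and function names are made up for illustration and do not correspond to any real catalog system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Heading:
    """One subject heading with its thesaurus-style references."""
    label: str
    broader: List[str] = field(default_factory=list)   # BT
    narrower: List[str] = field(default_factory=list)  # NT
    related: List[str] = field(default_factory=list)   # RT


# Only the relationships named in the talk; everything else is left out.
HEADINGS: Dict[str, Heading] = {
    "Zydeco music": Heading(
        "Zydeco music",
        broader=["Popular music--Louisiana"],
        related=["Cajun music"],
    ),
    "Popular music--Louisiana": Heading(
        "Popular music--Louisiana",
        narrower=["Zydeco music"],
    ),
    "Cajun music": Heading("Cajun music", related=["Zydeco music"]),
}


def also_look_under(label: str) -> List[str]:
    """List the other groupings a searcher could be pointed to."""
    heading = HEADINGS[label]
    return heading.broader + heading.related + heading.narrower


print(also_look_under("Zydeco music"))
# prints: ['Popular music--Louisiana', 'Cajun music']
```

The point of the sketch is that the pointers to other groupings come from the recorded relationships, not from whatever happens to sit next to “Zydeco music” in alphabetical order.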

These relationships exist now but as we have just seen, utilizing these relationships is practically impossible since even if you know how to do it, as I do, you have to fight with the catalog. This is why I have stated repeatedly that the catalog is broken.

Why do we have to fight with it? Because the catalog we have today was designed to present everything in alphabetical order, the arrangement you find in a dictionary. This is why Charles Cutter titled his rules “Rules for a Dictionary Catalog”.

That is, a dictionary of the 19th century—not one of the 21st century. For someone using merriam-webster.com or dictionary.com or Wikipedia, all of those tools work completely differently from the dictionaries and encyclopedias in the world of Panizzi and Cutter or even that of only 20 years ago. If I go to merriam-webster.com, I just type in the word I want to know. It helps me even if my spelling is atrocious. I can completely misspell the word “chrysanthemum” http://www.merriam-webster.com/dictionary/krisanthenum and still find it.
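To give a feel for how a tool can forgive a bad spelling, here is a small sketch using Python's standard difflib module. It is only an illustration of approximate matching in general: the short word list is invented, the function name is hypothetical, and nothing here is meant to describe how merriam-webster.com actually works.

```python
import difflib

# Hypothetical mini word list standing in for a dictionary's headwords.
HEADWORDS = ["chrysalis", "chrysanthemum", "chrysolite", "christen"]


def suggest(misspelled, headwords=HEADWORDS):
    """Return the headword most similar to the typed word, or None."""
    matches = difflib.get_close_matches(
        misspelled.lower(), headwords, n=1, cutoff=0.6
    )
    return matches[0] if matches else None


print(suggest("krisanthenum"))  # prints: chrysanthemum
```

The searcher never has to know where the word sits in the alphabet, or even what letter it starts with.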

Try looking for this word in a printed dictionary if you have one and notice along the way how much you see that has nothing to do with chrysanthemums or flowers or even biology. If you don't have a printed dictionary, I have a link where you can look for “chrysanthemum” in a dictionary from 1823. http://books.google.it/books?id=jlZBAAAAcAAJ This link goes to the cover. Don't cheat and do a text search for the word but browse for it like you would in a physical volume! I don't suggest looking up “chrysanthemum” under “k” but if you want, I have a link to volume 2. http://books.google.it/books?id=qVZBAAAAcAAJ.

Therefore, when we read that the library catalog is a dictionary catalog, which it is, these printed dictionaries are what we should envision. That is because the people who designed our catalogs had those tools right before their eyes since everyone used them all the time. Those old catalogers added all sorts of aids to searching their catalog but those were all made for a physical dictionary catalog and those aids have become useless today. The reason they are useless is that browsing alphabetically, and seeing a huge number of materials that are completely irrelevant to our search have become very strange in the modern world. This is a fact whether we like it or not.

The methods I have briefly described clearly do not work in the current environment. They are never, ever coming back and they shouldn't because they genuinely are obsolete. But that is the way our catalogs work now, whether we like those methods or not. Nevertheless I think it is important to consider that just because the methods may be obsolete doesn't mean that everything is obsolete.

What do I mean? Let's consider some differences from the past. What is a heading? For catalogers today, it means the 1xx, 240, 4xx, 6xx, 7xx or 8xx that today contains controlled vocabulary and provides a link that searchers can click on so they can find related records. In the past, it was something much less vague. It was the part written at the top of a card that determined where that card sat in the card catalog. In the transcript I have an example of a card where I denote the heading in red, and often, subject headings were typed in red too.


In book catalogs, the heading was printed one time at the beginning of a group of records and for groupings that went on at some length the heading would be repeated at the top of the column or the top of the page. In the transcript I provide an example of headings in a printed book catalog and again denote the headings in red. We can see how Cicero's name is not repeated even though there are six items.
Catalogue of the Mercantile Library in New York. New York : E.O. Jenkins, 1844. http://books.google.it/books?id=_mtMx5Z8J28C p. 43:

I also have an example of subject headings with subdivisions in a catalog from 1869. We see the beginning of the topic “Moral science” which comes after “Moors in Spain” (dictionary catalog at work) and we see its subdivisions “General works – History” and “Systematic treatises”. There are other subdivisions that come later, such as Miscellaneous works, and all kinds of Special subjects, Anger, Avarice, and others.
Catalogue of the Library of Congress : index of subjects. Washington [D.C.] : GPO, 1869. volume 2, p. 1177. http://books.google.it/books?id=RbtSAAAAcAAJ.

The purpose of the heading as a designation for a group of records on the same topic or author, is very clear in a book catalog. The methods are obsolete, there is no argument about that. But exactly what do we see here that is so obsolete?

No one today is going to look for “Moral science” by starting at “M” and browsing past Metallurgy, Meteorology and Monograms. But it does not necessarily mean that the groupings themselves are obsolete, that is: the sets of records found under each heading.

I believe it is clear that people still want the materials we see grouped together, for instance the materials grouped under the topic Moral science – General works – History. People in 1869 wanted the resources we see grouped there and there is every reason to think that people want exactly those same resources today. Therefore, the grouping, or the set, is not obsolete. Of course it needs to be updated to include modern resources. If we assume that people still want this group today, the question becomes: how does someone want to find this group? Naturally, people would prefer more modern words in place of “Moral science”, such as Ethics, Deontology, Morality, or Morals.

Yet, how can people who want such a group find it if they don't know how it is named, here Moral science – General works – History or even that the group exists? Also, when people find this group, how can they find other groupings of materials that may be of interest to them? Can it be done?

Yes. We saw it in the card catalog. The example with Zydeco music shows how people really could do this in earlier catalogs—if they used those catalogs and other tools correctly. It wasn't easy back then, but it's almost impossible today.

The library catalog provides groupings of resources that have been selected by experts. The groupings and arrangements of the individual resources are based not on statistical relevance, but on the intellectual contents of the items. Naturally, this system has never been perfect but neither are Google or any of the other systems. The traditional way relies on the ups and downs of human frailties, and consequently has missed a lot, but I'll just go ahead and say it: it can't be all that much worse than believing that when I do a full-text search and get a million hits, that only the first three I see are worth considering and that they have risen to the top by magic. I don't believe it.

The reason I don't believe it is that I understand what statistical relevance is and I also understand how library catalogs are supposed to work. I know there must be more.

These are some of the reasons why I don't think RDA or FRBR are going to make any substantial difference in the ways the public uses our catalogs. Even with linked data, why should the public want to manipulate bibliographic data that has no meaning for them? Our catalogs will still be based on the principles of a 19th century dictionary—not even on a 21st century dictionary! The problem is not our records or even the information in them—it is the reliance on alphabetical order that has become obsolete in our new environment.

When I am looking at the set of records under “Ethics”, I want to know that there are many subtopics available for me such as “Cross-cultural studies” that I would never have imagined. I want to know there may be more information under “Philosophy” and “Values”, and that there are all kinds of narrower terms such as “Akrasia” http://lccn.loc.gov/sh2006003161 even though I may have never heard of Akrasia in my life!

So, is there a solution? I think there is and in the next podcast (yes, it continues) I want to discuss something called Information Architecture and how Information Architecture could help library catalogs and even libraries.



The music to end this episode is a little different from what I have chosen before. This time it won't be Italian music, but more in keeping with the spirit of this talk, here is Zydeco music from Louisiana. This is “What you gonna do?” performed by Buckwheat Zydeco and his fabulous group. http://www.youtube.com/watch?v=AQL1eT4crZw

That's it for now. Thank you for listening to Cataloging Matters with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic city in the world.


12 comments:

  1. You say: "Even with linked data, why should the public want to manipulate bibliographic data that has no meaning for them? Our catalogs will still be based on the principles of a 19th century dictionary—not even on a 21st century dictionary! The problem is not our records or even the information in them—it is the reliance on alphabetical order that has become obsolete in our new environment."

    The primary reason for linked data (and more discrete data identification) is not for the end user to manipulate that data. The primary reason is for the SYSTEMS to be able to manipulate the data according to the needs of the user. The problems with the catalog that you talk about are very real. But the deficiencies in our current metadata structures are very real, too. With the pieces of data more clearly defined, systems will be able to make associations between resources, concepts, etc. that are inherent in the descriptions, thesauri, classification schemes, etc. The systems will then be able to take the existing data, which is based on 19th and 20th century technologies, and make it more useful in 21st century technologies.

    I do not understand why you keep insisting that we should be working on improving our catalogs INSTEAD OF implementing FRBR, RDA, linked data, etc. It's not an either/or thing going on. Rather, RDA and linked data are the paths that we have identified as the ones we need to take in order to make better catalogs!

  2. The goal should be the end user, not the machines. As I said in the podcast, what parts of the catalog record do the users want to manipulate when they are looking for information of interest to them? Are they interested in the "associations between resources, concepts, etc. that are inherent(?) in the descriptions, thesauri, classification schemes, etc."? I don't think so and it needs to be proven.

    While people do want to manipulate data, I cannot imagine that too many of them will find new insights into Bram Stoker's Dracula, or into the Cold War, or into the reasons for the latest economic crisis, by manipulating any part of our catalog records. That is, unless someone can show that, by manipulating the paging, size, statements of responsibility, or bibliographic notes, some researcher will find new information about Shakespeare's Hamlet.

    Finally, exactly who has identified that RDA and linked data are the paths we need to take to make better catalogs? It was never put to a vote or even discussed.

    It is long past time to question these propositions and put them to the test. Better that it would fail now on its own rather than have everyone go up in flames with it.

  3. You're still not getting it. The users aren't doing the manipulating, the machines are. And this doesn't mean that the focus is on the machines. Catalogers writing bibliographic descriptions on 3x5 cards were not focussed on the machine (card catalog); they were focussed on the user, using the technology of the card catalog to meet the user's needs. Catalogers putting bibliographic descriptions into MARC tag workforms aren't focussed on the machine (online catalog); they're focussed on the user, using the technology of the online catalog. Catalogers dealing with linked data will not be focussed on the machine; they will be focussed on the user, using the technology of linked data.

    Users aren't going to be "manipulating" the data in the sense that you seem to be imagining. We're not talking about users working with bibliographic data the way they would with, say, ICPSR data. But they WILL be able to get more easily to the information they're looking for because the MACHINES will be able to manipulate the bibliographic data more accurately. Data such as relationships between original works and translations; original works and derivations; works and writers; works and performers; works and awards; works and languages; works and mediation devices; and on and on and on and on and on. Making the machines more powerful in handling bibliographic data, to lead the users to the resources they want, is the whole point of it all. Yes, we need better catalogs. But we need better (and better-defined) data to make the better catalogs possible.

  4. I think I have demonstrated as much or more than anyone else that I "get it". I just question the final product and the focus. I realize that it is all for machines and that once it is good for machines, it will be good for people. I disagree with that mindset.

    In theory that may be OK, but in reality what will be the final product? As I showed in the podcast, it will be making data that people DO NOT WANT TO MANIPULATE.

    We can say how much people will want to manipulate "relationships between original works and translations; original works and derivations; works and writers; works and performers; works and awards; works and languages; works and mediation devices; and on and on and on and on and on" but I think people will care as much about that as I--speaking as a chessplayer who wants to improve my game--care about manipulating all of that. Which means absolutely zero interest. I want the moves and information about the moves. The other is of no interest to me at all. Adding the FRBR relationships (editor, director, WEMI, sequels, supplements ....) will make absolutely no difference.

    Of course, if someone can show research that people want to manipulate this information so badly, I would be willing to reconsider. But research is exactly what the pro-RDA/FRBR people do not want to do.

    It remains to be shown that people want so badly to manipulate bibliographic data as opposed to the data that really interests them. We see it every day in every library: people use the catalog only as a way to get into the collection, which contains the information they want. It will be the same with linked data, where people still won't want the bibliographic information, except as a way to get into the information that they really want.

    Until the cataloging community recognizes this fact, there will be little change.

  5. You obviously do not get it at all, because I keep trying to explain that the point is not to have people manipulate bibliographic data. The point is to have bibliographic data that the machine can manipulate. The FRBR study showed how our bibliographic records have all along contained the details of the resources: titles, authors, subjects, relationships with other resources, etc. Tagging those same details with MARC coding has enabled us to create discovery tools with computers that are much more powerful than card catalogs. Newer methods are showing much promise for tools that are even more powerful.

    NO ONE EXCEPT YOU has been saying that we are trying to create bibliographic data for the end user to manipulate as bibliographic data. What the end user will be doing is telling the discovery tool what things they're looking for, or sometimes just following paths that the tool offers them based on something else they're looking at. The computer will be doing all the manipulating. Unless you think that someone using even the most rudimentary of online catalogs is "manipulating bibliographic data"!

  6. What a strange argument this is! Of course it is the machines that actually do the manipulation of the data, just as I mentioned with the chess program in my podcast. The "manipulation" of that chess data I used to do manually, myself. People "tell" the machine what they want, just as they do in the chess program (if you haven't looked at that video, watch it to see how the program works), and the machine does its magic.

    Catalogs contain information about “containers”: books, serials, recordings, and so on, and very few people need to manipulate that kind of data. OK, they don't need machines to manipulate the information for them. That information is irrelevant for their purposes.

    Perhaps it hurts a cataloger too much to accept the fact: almost nobody needs bibliographic information, except a librarian. For instance, the information in the catalog record cannot help the chess program I mentioned. All the public needs from the catalog record is where an item can be found, and where are other, similar items.

    Finding other, similar items has been broken for a long time for the public, as I demonstrated. And RDA, FRBR, Linked Data, won't fix that.

    It seems clear that you don't get it at all. You seem to be insisting that the public (using machines, of course!) needs the detailed information found in catalog records for some reason I cannot fathom.

  7. "Almost nobody needs bibliographic information, except a librarian." Oh dear me, I guess I don't need the catalog to contain the title of the thing I'm looking for. Or the name of its author. Or when it was published. Or what other works it's related to.

    "All the public needs from the catalog record is where an item can be found, and where are other, similar items." Nobody can find WHERE an item is if there is nothing in the catalog saying WHAT the item is!

    "It seems clear that you don't get it at all. You seem to be insisting that the public (using machines, of course!) needs the detailed information found in catalog records for some reason I cannot fathom." Yes, the users DO need that information. The information is the absolutely essential part of the catalog. Without that information, there is no catalog at all.

    I'm starting to wonder if maybe all you're doing is just making a career out of being a devil's advocate, for the fun of it. Or do you really believe all the stuff you're saying?

  8. No. I am being honest and realistic. Remember that from the very beginnings, the public has come to the library to use the resources in the collection, and the catalog has been their way into the collection. How many people have you heard of--other than (maybe) catalogers--who enjoy searching the catalog? And reading a catalog record? The information people want is in the collection. That is where they learn about the topics that interest them--not from the catalog.

    This fact may be too difficult for catalogers to accept, I don't know. Accepting it was very difficult for me. But events are showing this simple fact to be true.

    Now that there are ways of "finding information" that are much more alluring than library catalogs, and given that library catalogs are broken as finding tools (as I demonstrated in the podcast), things must change.

    To make the information in our catalog into data means that the public (through the miracle of machines, of course!) will be able to manipulate that data in all kinds of new ways. But for someone who wants to learn about the history of alchemy, or the causes of WWII, or the intricacies of Michelangelo's David, they will not learn anything from the catalog. If they are lucky, they will be led to resources where they can learn about those topics.

    If I am wrong, please demonstrate it. Show me how manipulating the information in the catalog records can help me understand the causes of the Cold War. Or what is a war crime. Or why 9/11 happened. Or why Durer painted himself as Jesus Christ.

    You can't. That is because such information is not in the catalog. It's in the collection. And making the catalog information into data cannot change that at all. How will being able to manipulate anything in the catalog record help me learn about those topics? Manipulating the standard numbers? The authors? The catalog notes?

    Manipulating any of that information will make no difference whatsoever--unless you can demonstrate otherwise. Please do so if you can. The best the catalog can do is lead me to resources that might contain that information.

    And the ways the catalog does that do not work at all well today.

    As I pointed out in the podcast and other places, we can go ahead and make the catalog records into data, but to expect it to make any substantial difference to anyone is very naive. It will make our records more widely available, and that would be good although the context will be lost, but there are many, many ways of doing that other than Linked Data.

  9. You are repeatedly failing to grasp the most basic point that I have been trying to make: This has nothing whatsoever to do with giving data to the end user to manipulate in one way or another OTHER THAN as a means of finding the resources they are after. You say "Show me how manipulating the information in the catalog records can help me understand the causes of the Cold War." How can I show you that when it isn't my point at all? The purpose of the bibliographic data is NOT to provide information about the causes of the Cold War; it is to describe, and make available to the user, RESOURCES THAT CONTAIN the information about the causes of the Cold War. For example, this might be done by using LCSH "Cold War". Or it might be by using an identifier that stands for the concept. And from a record describing a resource about the Cold War, a link from the LCSH string or identifier would lead the user to other resources about the Cold War. And there might be additional links to other resources by the same creator, the same publisher, etc. that might be related in some way and provide more information.

    How many ways can I say it? This isn't about trying to make bibliographic data do something that it isn't designed to do, and that it cannot do even if someone wanted it to. It's about formatting it into packages to help it do what it's always been meant to do, but to do it better.

    Of course, at this point you are so married to your straw man that I would be totally surprised to see you concede on this matter.

  10. It is truly unfortunate you take these matters so personally and resort to ad hominem attacks. It makes it difficult to reach any kind of understanding.

    But at least you appear to agree that manipulating the information in catalog records will not help someone understand a topic that interests them. So, you seem to agree with my point that the catalog records are actually signs that lead people toward the resources that may contain the information they want. Once that point is accepted, the public's ability to manipulate the data in the catalog records should not be seen as such a wonderful success since it will make very little difference to them.

    This has been my point all along. Then I go on to ask: What can we do with our catalog records that will make a difference to people? How can we create something that people cannot find anywhere else? And where we do not make our so-called "legacy data" obsolete? Is there anything?

    My answer is yes: by fixing the catalog to make sure it once again becomes a really useful finding tool for today. And this has been the point of my last two podcasts (plus the next one). It is far more complex than changing a few rules and adopting a new format. It is admitting that in the current information environment, there is something wrong in the fundamental workings of the catalog, and how people relate to the catalog and individual catalog records, and then trying to fix all of it.

    RDA, FRBR and Linked Data do not address these issues.

  11. What I find unfortunate is that you continue to argue against a description of RDA and linked data that does not fit anything I have seen anywhere else but in your messages.

    You say: "So, you seem to agree with my point that the catalog records are actually signs that lead people toward the resources that may contain the information they want." Yes, yes, yes!!! That's what I've been saying all along.

    Then you say: "Once that point is accepted, the public's ability to manipulate the data in the catalog records should not be seen as such a wonderful success since it will make very little difference to them." But it sounds here like you are referring to exactly the thing I have been trying to refute in all my previous comments. This has nothing at all to do with providing data for the user to manipulate for any reasons other than the purpose of finding, identifying, selecting, and obtaining the resources they're wanting.

    It is NOT about letting the user do calculations on material types and dimensions, or comparisons of lengths of contents notes, or whatever, the same way they may use a database of research data. What it IS about is helping the user through providing greater depth and accuracy in filtering search results; through making more complete linkages between related resources; etc. To help them get more quickly to the things they're looking for, and once they find something, get more easily to other related things that they may also want.

    It's about helping the user find all the things written by Author X, even those where the author's name appears as Author X1 or Author X2. It's about helping the user find a specific edition of a Finnish translation of a Harry Potter book. It's about helping the user determine whether this CD, or that CD, or some other MP3 file is what they're looking for.

    It's about improving the signs. The signs will have more information; they will be more plentiful; hopefully they will be able to be adjusted to the needs of the user (have access points or labels translated into another language, for example).

    While we can disagree on how well FRBR, RDA, and linked data address the problems of the catalog, I believe it is absolutely incorrect to say that they are not addressing the problems at all. Because addressing those problems is specifically what they are all about. The deficiencies in our catalogs are due in no small part to the deficiencies in the metadata (both content and structure).

  12. I believe I am among the first to ask many of these questions, such as when I questioned whether the FRBR user tasks are all that relevant to the world today. Is that really what people want? The accepted answer today seems to be that these are not the main tasks that people want to carry out in the modern information universe. Certainly we can have a tool that lets people find, identify, etc. resources by their authors, titles and subjects, but the evidence shows that this is rather low among people's priorities and they want something else. What is the evidence? When they can, people have left library catalogs in favor of other tools, where they find information much more easily and better (in their eyes). Cataloging is not very high on the agenda in many libraries, and departments are being cut back.

    The FRBR user tasks can be done by anyone now--right now--in modern faceted catalogs such as Worldcat. But nobody is declaring victory or tooting horns. Nobody seems to care.

    What more evidence do we need?

    It turns out we are in agreement on some things, but you still place a lot of faith in the idea that once our records are placed into the linked data universe, somehow the public will find them more useful there than where they are now. I question that assumption based on several points, but mainly on the fact that our catalogs are broken. RDA certainly doesn't make anything easier to find (it makes it harder in many ways, as I pointed out in Catalogs, Consistency and the Future). Linked data won't fix anything and will only make the “brokenness” felt more widely.

    In my next podcast, I'll be discussing Information Architecture and what it could do for our catalogs. The number one focus of Information Architecture is to create something that “your users” can navigate and find the information they want as quickly and as easily as possible. If somebody has to fight with a site to make it work, or if they experience trouble, they will leave for something else. In the business world, this is the kiss of death.

    This is what has happened with library catalogs. I see nothing in FRBR or RDA or linked data that takes the view of the user in fixing the problem they experience: a broken catalog. One of the main reasons for this is that there have been no real user studies of the problems that users have with the catalog, and so no attempt to find solutions to those problems. More energy has been placed in “information literacy” which teaches people how to search a catalog. And of course, the need to implement RDA.

    We must see the catalog as the users see it: as a whole, as an entirety, not individual records. When we look at it from their point of view, I believe that the problems with catalogs appear much deeper than just deficiencies in the metadata, either content or structure.
