Catalogs, Consistency and the Future
Hello everyone and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy. My name is Jim Weinheimer.
Something about RDA and FRBR has puzzled me for quite some time, but I believe I have finally discovered the source of my confusion. In this episode, I would like to discuss the consequences of implementing RDA and FRBR.
Some of the comments I have read concerning the changes to the library catalog envisioned by RDA and FRBR have puzzled me, but a recent private exchange of emails may contain an explanation. To explain my problem: RDA advocates claim that the changes brought by RDA and FRBR will “improve access” to materials. Of course, I also want to improve access, but everyone seems to be talking past one another. Now I believe I understand why: there are different interpretations of, and attitudes toward, what “increasing access” means.
For the pro-RDA group, “increasing access” appears to mean that after RDA is implemented along with all the necessary changes to MARC format, the public will have more search possibilities than they have today. For instance, information called “relator codes” will be added to the name headings we now use. The director of a film will get the relator code “film director” or the editor of a book will get “editor”. Once this is done, the public will be able to search by more specific terms than are available today, such as “editor” or “actor” or “film producer” or any of a number of other possibilities. Right now in the Internet Movie Database, you can find people in their separate roles, as “actors” versus “producers”, and after the RDA-FRBR-MARC updates, the public will be able to do the same in our catalogs.
RDA and FRBR also allow more specific relationships among resources, so that people will know that a link that goes to Shakespeare's Hamlet actually goes to a summary or a paraphrase or a screenplay or whatever, instead of what they see today: a simple added entry that reads “Shakespeare, William, 1564-1616. Hamlet.” Of course, it will take quite a while to get all of this working, but when it is working, people will have a much clearer idea of what is really available to them through the new relationships and the other information added to the records.
Additionally, when these records are added into the Semantic Web, access will be even greater because library records will then be seen by far more people than ever before. People will come across library records during their normal course of searching without having to seek out and search separate library catalogs as they do today, since—let's face it—library catalogs remain relatively unused in the greater scheme of things.
I understand all of this, but I still cannot agree; in fact, I have maintained that implementing RDA and FRBR will actually reduce access to the materials in our collections. I have stated repeatedly that it is vital to begin to look at the catalog not only through the eyes of the cataloger, but through the eyes of members of the public, people who understand very little about catalogs and who don't want to know. How will the public view the changes in the individual records and the search results? That is what I want to discuss here, and to answer this, I feel that I need to discuss at some length one of the foundational building blocks of the catalog. While I am sure that all catalogers know this, it seems to have been forgotten or set aside in some instances, especially in regard to RDA and FRBR.
In a library catalog, the main idea, of course, is to give access to the materials in the collection, and to do it reliably. Doing it reliably is a lot harder than it sounds. The basic method of achieving this has been for all catalog records to follow, as closely as fallible human catalogers can manage, the principle of consistency. Without consistency, a catalog ceases to be a catalog and becomes a simple list of references with titles or names or whatever entered more or less randomly. In the transcript I give a link to an example: an old document from 1760 that claims to be a catalog but bears little resemblance to what we think of as a catalog. It is more a simple list of books. It is the first published catalog of Princeton University (at that time called The College of New Jersey): http://libweb2.princeton.edu/rbsc2/libraryhistory/1760_Davies.pdf
As you look at it, you will find that the descriptions, which are very short, are arranged both by the size of the item and in an imperfect alphabetical order. The result is that everything is pretty much randomly mixed up under names or titles. For instance, you can find Bibles under “B” (out of strict alphabetical order) but also under “F” for “French Bible”, or under “V” for “Vatabli Biblia Hebraica” (a polyglot Bible). Works by Josephus can be found under “J” but also under “L” as “L'Estrange's Josephus”, for L'Estrange's translation of Josephus, and so on. Something like this may be adequate for personal use, but it would obviously be a very difficult tool for the public to work with.
Librarians solved these kinds of problems by introducing the principle of consistency, which is followed in a host of ways: forms of names; forms of subjects; how to analyse a subject; use of specific MARC fields; where to find the title of an item in a specific format; how to count the pages of a book. This has been going on for a long time. Back in the days of the card catalog, even the size of the cards was standardized. Librarians quickly discovered that cards of all different sizes were too difficult for people to work with, and if libraries wanted to share their cataloging by copying their cards and sending them around, everyone concerned had to use a single size of card with the hole for the rod in the same spot. Consistency has been the predominant idea of a catalog since its beginnings.
This is not for the ease of the cataloger, but for the public. From the point of view of patrons, when they look at a search result for William Shakespeare, they assume that they are looking at the resources by or about William Shakespeare, no matter how his name may appear on an item or what language it is in. Is this assumption correct? In a catalog, it is possible to answer yes, so long as you search the catalog correctly and stipulate certain conditions, such as the “rule of three”, or how the catalog searches: is it a keyword search? Are subjects or 505 notes included in the search? Blah, blah, blah. Most important: have the catalogers done their work correctly? In other tools, such as a full-text search engine, this type of positive answer is not possible.
As one cataloger once told me, “It is more important to be consistently wrong than inconsistently right.” And this is true because if something is wrong but has been done consistently, there is at least a chance that it can be fixed, but when inconsistency is involved, it becomes almost impossible even to discover how big a problem may be. At one institution where I worked, the subject heading for Colombia had been entered very consistently for years as Columbia with a u! It was easy to fix.
Concerning “right” versus “wrong”, the very idea of “right” is so nebulous that there is little chance of general agreement, although there is much more agreement on what is wrong. For instance, what is the “right” name of the capital of Italy? It may be Rome, Roma, Rim, Rzym, depending on where you come from. But all can agree that the capital of Italy is not “London” or “Moskva” or “Beijing”. You will always find someone who disagrees with the form you have chosen to be “right”, but while people may disagree with a form, it can still be applied consistently for retrieval purposes. Of course, today URIs can be introduced so that the forms of headings can become much more flexible than before.
Since consistency has been such a fundamental principle of the catalog from its earliest days, it is vital to consider the consequences very deeply before breaking that consistency; otherwise, you may wind up breaking your catalog. In this sense, cataloging can be a highly conservative endeavor, and it can lead to the extreme view that nothing at all can be changed without serious consequences to the catalog. Many have argued that, and I myself did for a while, until I began to work with other catalogs and other rules, and with the public. My eyes were opened and I began to understand more clearly where consistency could be more, or less, important. Although consistency is always important in a catalog, there are some practical areas where it is much less important than in others.
Changes occur all the time in our lives and these changes must be reflected in the catalog. When a word becomes obsolete, newer terminology is substituted. A simple example from a few years back is the subject heading “Moving-pictures”, obviously from an earlier time, which was changed to “Motion pictures”. Of course, no one today would think of “moving-pictures”, although I personally would have preferred the less formal “Movies”. (See how little agreement there is on what constitutes “right”?) Still, a consistent form was chosen and all previous instances of “Moving-pictures” were changed (or were supposed to be changed) to “Motion pictures”. In some catalogs, each heading on each record must be changed manually, but even though this may be a lot of work, it is a relatively simple task that can be left to lower-level staff or even student helpers. I am sure there was some bewilderment among long-time library users who no longer found anything under “Moving-pictures”, but they were expected to ask a librarian and all would be fine.
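A one-to-one heading change like this is mechanical, which is why it can be delegated. As a minimal sketch, assuming a hypothetical, greatly simplified record (a title plus a list of subject strings) rather than real MARC data or any real system's authority-control tools, the global change might look like this in Python:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical, greatly simplified record: a title and its subject headings."""
    title: str
    subjects: List[str] = field(default_factory=list)


def replace_heading(records: List[Record], old: str, new: str) -> int:
    """Replace an obsolete heading with its single successor on every record.

    Returns the number of records that were changed.
    """
    changed = 0
    for rec in records:
        if old in rec.subjects:
            # Swap only the obsolete heading; leave the other subjects alone.
            rec.subjects = [new if s == old else s for s in rec.subjects]
            changed += 1
    return changed


# Hypothetical sample records.
catalog = [
    Record("A history of film", ["Moving-pictures"]),
    Record("Writing for the screen", ["Moving-pictures", "Authorship"]),
    Record("Industrial workers", ["Labor and laboring classes"]),
]
print(replace_heading(catalog, "Moving-pictures", "Motion pictures"))  # 2
```

The replacement only works because the old heading maps to exactly one new heading; the function has no way to choose among several successors.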
Changing those earlier records is a lot of work, but it is done because otherwise the principle of consistency would be broken and patrons would have to search under two headings for the same concept. And that becomes very difficult for library patrons.
Sometimes, however, breaking consistency is unavoidable, because a single heading may split into two or three separate headings. Updating the older records then becomes impossible without recataloging each item, because you cannot know which of the new headings the older heading refers to. There are not many choices in such a case: recatalog all of the previous materials, or use a method called “superimposition”. How does this work?
One example is the old heading “Labor and laboring classes”, which was used for many years and was then split into three headings: “Labor”, “Labor movement” and “Working class”; all new records received the newer headings. Compared to the Motion pictures example, updating the older records would require actually recataloging thousands of records, which needs a higher level of staff, and so was practically impossible for most collections. As a consequence, the old records have retained the obsolete heading “Labor and laboring classes”. Yet it is important for the public, when looking at a newer record with the subject “Working class”, to know that there are also materials on the same subject cataloged before a certain date, but they are found under a heading that is no longer used: “Labor and laboring classes”. Not simple. Conversely, it is equally important that people looking at the items under “Labor and laboring classes” know that there are more materials cataloged later, under “Labor”, “Labor movement” and/or “Working class”.
The only way to achieve this has been by adding notes to the catalog. In the card catalog, this was done simply enough by adding special cards in the correct places that people would encounter as they browsed the cards in alphabetical order. At the beginning of “Labor” a card would say something like “For items cataloged before [date], see Labor and laboring classes” and at the beginning of “Labor and laboring classes” a card would say “For items cataloged after [date], see Labor, Labor movement, Working class”, but today matters are different. I have never seen an online catalog handle these sorts of notes very well.
Neither in the LC catalog nor in any other catalog that I have seen are there notes indicating earlier practices, even when you do a left-anchored browse search of these headings, so users have no idea that when they search “Labor movement” they must also search “Labor and laboring classes”, and vice versa. I have certainly never seen any attempt to do this in a keyword environment, which is what is needed today.
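As a minimal sketch of how such dated notes might survive in a keyword environment, assuming a hypothetical table of notes keyed by heading (the names are illustrative, not any real catalog's API, and “[date]” stays a placeholder just as on the cards), a keyword search could return the relevant notes alongside its results:

```python
from typing import Dict, List

# Hypothetical superimposition notes, keyed by heading.
SPLIT_NOTES: Dict[str, str] = {
    "Labor and laboring classes": (
        "For items cataloged after [date], see also: "
        "Labor; Labor movement; Working class"
    ),
    "Labor": "For items cataloged before [date], see also: Labor and laboring classes",
    "Labor movement": "For items cataloged before [date], see also: Labor and laboring classes",
    "Working class": "For items cataloged before [date], see also: Labor and laboring classes",
}


def notes_for_query(query: str) -> List[str]:
    """Return the note for every heading containing all of the query's words.

    Matching is a crude case-insensitive substring test, roughly what a
    simple keyword search does; no left-anchored browse is involved.
    """
    words = query.lower().split()
    return [
        f"{heading}: {note}"
        for heading, note in SPLIT_NOTES.items()
        if all(w in heading.lower() for w in words)
    ]


for line in notes_for_query("working class"):
    print(line)
```

The point is only that the note is triggered by the searcher's keywords rather than by arriving at the right spot in an alphabetical file.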
Still, I am not finding fault. Such splits in subjects are unavoidable but they should obviously be kept to a minimum. With the changes planned with RDA however, there will be far greater impacts that will be visible on nearly every record. What does that mean in real terms?
Let's imagine that RDA and the new, improved format have been implemented. Let us also imagine that a Blu-Ray for the 1956 version of “Moby Dick” arrives at the library. The cataloger will add exactly the same names as before—no change—but this time, the cataloger will add “film director” and/or its code to John Huston's heading. The cataloger will do the same with each of the other names, as actor or writer, or whatever role the person fulfilled.
With this change, the consistency of the catalog has been broken. The earlier records just had John Huston's name with no role information. What consequences will this have? It depends on what libraries do with it.
From the display viewpoint, the patron will search John Huston's name and discover that in the films where he was the director, he is labelled as “film director” on some records and on others not, even though he was the director. To be honest, this does not seem—to me—to be much of a problem, since the public spends little time on individual records in a catalog. People see far stranger things on the web every single day and simply forget them. If it bothers someone enough, they can always ask but I think very few will. Most just don't care.
If someone implements a specialized search limited only to “film directors”, as can be done now on the Internet Movie Database, the consequences are different: the search will not find anything cataloged before the Blu-Ray. Your catalog may have records for the videocassette and DVD of Huston's Moby Dick, but a search for John Huston as a film director will not find these items. It also won't find any of the other films where Huston was a director. So someone who is interested in finding out what your collection has where John Huston was the director will find one record for a Blu-Ray, when your collection may have multiple versions of every one of the movies he directed. This is exactly the same problem we saw before: when someone searches for “Working class” they must also know to search “Labor and laboring classes”; otherwise they may miss the majority of the materials available to them.
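To make the effect concrete, here is a minimal Python sketch, assuming hypothetical simplified records in which a name entry either carries a relator term or, for records made before the change, carries none:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NameEntry:
    name: str
    relator: Optional[str] = None  # None: record made before relators were added


@dataclass(frozen=True)
class Record:
    title: str
    carrier: str
    names: Tuple[NameEntry, ...]


def search_by_role(records: List[Record], name: str, role: str) -> List[Record]:
    """Strict role search: a record matches only if the name carries the relator."""
    return [
        rec for rec in records
        if any(e.name == name and e.relator == role for e in rec.names)
    ]


# Hypothetical sample catalog: Huston directed every one of these films.
catalog = [
    Record("Moby Dick (1956)", "videocassette", (NameEntry("Huston, John"),)),
    Record("Moby Dick (1956)", "DVD", (NameEntry("Huston, John"),)),
    Record("The Maltese Falcon (1941)", "DVD", (NameEntry("Huston, John"),)),
    Record("Moby Dick (1956)", "Blu-ray",
           (NameEntry("Huston, John", "film director"),)),
]

hits = search_by_role(catalog, "Huston, John", "film director")
print([(r.title, r.carrier) for r in hits])  # [('Moby Dick (1956)', 'Blu-ray')]
```

All four records describe films Huston directed, but the strict search finds only the one made after relators were introduced.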
We have seen why this happens: because the founding principle upon which the catalog is built is consistency. Adding something that seems so innocuous at first actually breaks that consistency, thereby breaking the functioning of the catalog and adding tremendously to its complexity. This is why I maintain that implementing these changes will in reality reduce access instead of improve it. Without a doubt, if these kinds of changes are implemented, the complexity of doing a reliable search in the catalog goes up tremendously. What can be done?
When AACR2 was implemented, it mandated changing a huge number of headings, and many libraries responded by closing their earlier catalogs and placing the new AACR2 records into a separate catalog, thereby splitting the catalog. There was a lot of debate about the effects this would have on the public, and how libraries should respond.
Our predecessors considered that closing the catalog was a very serious decision, since it essentially forced everyone to search in two different places in two different ways. In my own experience, the public never understood why there were two catalogs, and many believed the computer catalog and the cards were copies of each other that the library placed there for the public's convenience! In any case, it was extremely complex to expect the public to search “Soviet Union” in the computer catalog but “Russia (1923- U.S.S.R.)” in the card catalog, or to search Twain, Mark (plus his different pseudonyms) in the computer catalog but Clemens, Samuel in the card catalog. Corporate bodies were even more complicated. Still, closing the catalog coincided more or less (and I emphasize more or less) with the introduction of computer catalogs, so that helped ease the final decision.
I am not saying that switching to AACR2 was a mistake or anything of the sort, because that's all history and librarians got a lot in return, primarily a larger number of copy catalog records. But there is no doubt that it was more difficult for the public, and it only seems wise for modern day catalogers to learn from previous experience. Will the public understand that when they search “John Huston” as a director, they will not be seeing everything with him as a director? Based on previous experience, there is no way they will understand, and they will think that the Blu-Ray is all that the library has to offer. Multiply this for all writers, actors, editors, and all of the other roles. And of course, the catalogers must assign all roles consistently as well.
Exactly the same problems of inconsistency will be encountered with the WEMI (Work, Expression, Manifestation, Item) relationship fields.
So is there a solution?
Barring some brilliant automated solution, which I cannot even imagine (well, I can imagine some solutions, but I can't imagine that they would actually work!), it seems that any practical solution will have to be primarily manual. To me, there seem to be only a few solutions:
- split the catalogs and train the patrons
- keep a single catalog and train the patrons
- change the searching mechanism so that when you select a relator, e.g. “film director”, you are also searching without that relator code
Let's discuss these.
A. Split the catalogs and train the patrons
This could also be called “closing the catalog”, as nothing would be added to the earlier catalog. This would recreate what happened with AACR2. While this option is probably the simplest for the catalogers, it is probably the most complex for the patrons. Today, the mechanical part of the search, that is, continuing the patron's search into the other database, could be implemented with a simple link that would search the patron's terms in the other catalog, but the search itself would still necessarily be different in each catalog. People would still need to know that a search for a film director in one catalog would have to be a general author search in the other; otherwise they would be seeing too many strange results.
Training and retraining of patrons would be unavoidable and it would be very difficult to predict how they would react. It would be a massive task for public services. Based on previous experience, the patrons would be at least as confused by the catalog as they are currently, and probably much more so.
B. Keep the catalogs together and train the patrons
This is what I have always imagined would happen and how it works now. The RDA and pre-RDA records are all together in the same database. Some libraries apparently did something similar during the transition to AACR2. It's interesting how they did it.
This is an historical aside, but I can't restrain myself. When the heading changed from Samuel Clemens to Mark Twain for example, the Samuel Clemens cards under “C” were physically moved to “T” Twain, Mark and interfiled, but the headings at the top were not changed. The users would find almost everything correctly filed, but on one card, it would say Twain, Mark and the next card would have Clemens, Samuel all under “T”. Also, in the AACR2 records the separate pseudonyms of Mark Twain were under separate headings, but those separate “bibliographic identities” were all merged together in the earlier Samuel Clemens records. This happened for every heading. I understand why the librarians did this, but people found it very confusing and I don't blame them at all.
To do something similar in the computer catalog with RDA relator codes would avoid the mechanical part where the public would have to go into a separate database, but separate searches would still need to be done on the single database, for instance, separate searches for John Huston with and without relator codes. As before, training the patrons would be unavoidable, and once again it would be very difficult to predict how they would react. This would also create a massive task for public services.
C. Change the searching mechanism so that the catalog will search the earlier records automatically
This builds on the previous option, but the catalog would automatically search with and without the relator code, or, when searching for related works such as adaptations, the computer would also look for a simple added entry. This way, the search result would be the most reliable.
This would be simplest from the point of view of user searching since only one search would be needed, but the result could be quite confusing for them. Searching for John Huston as director but then getting a result when he was an actor only, would be very confusing and would probably make people angry. Such an option would in essence ignore all the relator codes and format changes introduced by RDA and FRBR. On the positive side, at least this possibility would ensure that the searchers were getting more or less everything within a single search.
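Option C might be sketched as follows, again with hypothetical simplified records. This version, one possible design rather than anything RDA prescribes, keeps the two groups of results separate so that a display could label the older records whose role was never recorded. Chinatown, where Huston only acted, lands in the uncertain group, which is exactly the kind of confusing result described above:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class NameEntry:
    name: str
    relator: Optional[str] = None  # None: cataloged before relators were added


@dataclass(frozen=True)
class Record:
    title: str
    names: Tuple[NameEntry, ...]


def search_with_fallback(
    records: List[Record], name: str, role: str
) -> Tuple[List[str], List[str]]:
    """Match the name with the requested relator, and also with no relator.

    Returns (exact, uncertain): records where the role is recorded, and
    older records where the name appears but the role is unknown.
    """
    exact: List[str] = []
    uncertain: List[str] = []
    for rec in records:
        entries = [e for e in rec.names if e.name == name]
        if any(e.relator == role for e in entries):
            exact.append(rec.title)
        elif any(e.relator is None for e in entries):
            uncertain.append(rec.title)
    return exact, uncertain


# Hypothetical sample catalog.
catalog = [
    Record("Moby Dick (1956), Blu-ray", (NameEntry("Huston, John", "film director"),)),
    Record("Moby Dick (1956), DVD", (NameEntry("Huston, John"),)),
    Record("Chinatown (1974), DVD", (NameEntry("Huston, John"),)),  # Huston only acted
    Record("Chinatown (1974), Blu-ray", (NameEntry("Huston, John", "actor"),)),
]

exact, uncertain = search_with_fallback(catalog, "Huston, John", "film director")
print(exact)      # ['Moby Dick (1956), Blu-ray']
print(uncertain)  # ['Moby Dick (1956), DVD', 'Chinatown (1974), DVD']
```

The newer Chinatown record is correctly excluded because its role is recorded as actor; the older one cannot be told apart from a directing credit.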
The possibility of automatically converting the older records to some new, wonderful FRBR format where the headings would come from separate “work” or “expression” entities needs to be demonstrated as being practically possible. I harbor very serious doubts.
So, we see that breaking the principle of consistency has many consequences, that is, if we want to preserve the reliability of the search results.
In the email exchange I mentioned earlier, however, my correspondent suggested yet another solution, one perhaps reminiscent of Alexander the Great when he hacked the famed Gordian knot to pieces. I quote:
“We tell them [i.e. the public] to deal with it and feel lucky for the improved though incomplete access I suppose. We have done this with multiple cataloging improvements including subject headings for fiction and genre headings without undertaking retrospective editing.”
end of quote
I am grateful for the frankness of this statement. Many catalogers would probably agree, and I would have too at one time: people will just have to deal with it because of the “improvements” that we are giving them. I am not sure, though, that comparing the changes of RDA and FRBR to the subject headings for fiction and genre headings is all that valid. People rarely use subject headings today, especially as they are designed to be used, so I doubt that a substantial part of the population has used the newer fiction and genre headings all that much. That aside, the fiction and genre headings are not in every record, while the RDA changes for access and description are on a completely different level, affecting almost every record with roles scattered everywhere.
Will the public see these changes as improvements or limitations? Will they see this as even more proof that the library catalog is broken beyond repair and has been so for a long time? No matter what, the changes of RDA and FRBR will involve added complexity and massive training seems unavoidable if people are to get reliable results. Such a level of training seems to be more and more difficult even to imagine today. It is really too bad that there has never been any “product testing”, so there is no real idea how the public will react.
Perhaps the biggest difference between the time AACR2 was implemented and the implementation of RDA today is that today's public has options, many options. In the late 1970s and early 1980s, people had very few choices other than the library catalog if they wanted information: they could “deal with it” and use the catalog, browse the shelves at random, or give up, go home and do without the library and its information altogether.
Everyone must accept that this scenario describes a world that is long past and will never come again. Today and in the future, people have several options available to them and the options grow all the time. Many of these options have entire teams of experts with deep pockets (at Google, Yahoo, Microsoft, and so on) who care very deeply whether the public likes their product or not and are continually working to make their clients happy.
Compare this to the attitude above, which is, in my experience, a very “cataloger attitude”: “We tell the public to deal with it and feel lucky for the improved though incomplete access.” Will people accept that today? Will they feel lucky? It must be admitted that employees at Google or Yahoo would absolutely never say such things—they know people have options, and these options are available quite literally at the click of a button. They take it all deadly seriously and constantly ask, what can we build that will entice people to use our product?
Workers at Google understand very well that if the popular mind decided Bing gave better results, the great Google search site would lie unused, just as the web is littered with old search engines. (I discussed this in an earlier posting http://blog.jweinheimer.net/2011/12/old-school-search-engines-where-are.html) It is very easy to make a new choice among search engines. They understand that Google is but one choice among many choices, and we should understand that libraries are also just another one of those choices. The public does not have to “deal with it” because they have many other options.
The public's expectations are completely different from what they were 20 years ago. From what they were 15 years ago or 10 years ago. They are changing at an incredible rate. I fear that the additional complexity of getting reliable search results will be very off-putting to the public and they will turn to other tools they prefer. Then, it will be the catalogers who will have to “deal with it”.
After all of these considerations, trying to transfer “consistency”, the foundation of the catalog, into the Semantic Web in some sort of semi-coherent way makes me shake my head in despair. If it's this hard with our catalogs, what about the crazy open Internet world? How could a non-librarian web master ever begin to deal with any of it? To those questions, I must reply that I don't have the slightest idea. For my own thoughts on the Semantic Web, you can listen to another podcast of mine. http://blog.jweinheimer.net/2012/03/cataloging-matters-podcast-no-14.html
Does this mean that there can never be any changes at all? Of course not, and others and I make suggestions all the time, but there needs to be a simple acknowledgement of some realities. For example, catalogers need to admit that the dictionary catalog is dead. As dead as the dodo. Sad perhaps, but it's been dead since keyword searching was introduced, and that was a long time ago. Therefore, many practices based on left-anchored text browses no longer make any sense. Except to a cataloger. Those methods are obsolete today. Other important practices just don't work. Cross-references that make subject headings more useful and comprehensible, along with pseudonyms of authors and earlier/later names of corporate bodies, remain hidden from the public in the keyword environment. Catalogers can implement RDA and FRBR and people still will not know, after finding a record for an item with the subject heading “Labor and laboring classes”, that they also need to search “Labor”, “Labor movement” or “Working class” if they are going to find anything new. How about figuring out how catalog records and full-text search engines can best work together to improve both tools?
Instead of increasing the complexity of the catalog, we should be trying to improve what we already have and fix the dysfunctions that people have to deal with every single day. When a cataloger is looking at a catalog record, and the results of a catalog search, they should be doing it not through their own eyes, but through the eyes of someone who knows very little, or preferably nothing, about the catalog. When you do this, the problems of the catalog become clearer: the cataloging abbreviations, relator codes, and relationships pale in comparison with other, massive, obvious problems. Deal with those first.
To conclude, I wanted to clarify how I see the practical problems of implementing RDA and FRBR, which is that the consequences will be to decrease access for the public because of the greater complexities it introduces.
The music I have chosen to end this episode is the overture from Verdi's La Forza del Destino. This is one of the great operas and also apparently has a curse on it. In 1960, an American singer, Leonard Warren, was about to start the aria “Morir, tremenda cosa” or “to die, a momentous thing” and he fell down dead from a cerebral hemorrhage! According to Wikipedia, some singers have refused to perform in this opera, including Pavarotti. This recording comes from the Internet Archive, where you can listen to the rest of the opera if you want. http://archive.org/details/FridayNightAtTheOpera-41610
That's it for now. Thank you for listening to Cataloging Matters with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic city in the world.