Re: [RDA-L] Automatically adding relationship designators (was Cost of Retrospective Conversion for Legacy Data…)

Posting to RDA-L

On 09/12/2013 0:04, Kelley McGrath wrote:

OLAC is attempting a project of this sort for film and video credits. We are trying to teach a computer to recognize the names and roles that appear in 245$c, 260+$b, 508 and 511 (and if we get really brave maybe 505) and also connect them to the correct 1xx/7xx if present. The current program, which uses natural language processing (NLP) techniques, is reasonably successful with personal names and with roles given in English. We are working on building a multilingual vocabulary. It tends to choke on complicated statements that involve a lot of corporate bodies.

I hesitate to bring this up, since most people probably already think of me as a purveyor of doom and gloom, but I still believe that we must consider these things in realistic terms. Although the attempt is laudable, we must first of all see through the eyes of the users who would be interested in this kind of information. For instance, if I were a regular user and wanted to know which movies John Huston directed, what would be the first thing I would think of?

“Google it.” I am sure almost everybody would. So I did a natural-language search, “what movies did john huston direct”, and what happened? (This is linked data in action!) Down below in the list of links (at least in the results I got), #1 is a link to John Huston in Wikipedia, #2 goes to “Category:Films directed by John Huston”, also in Wikipedia, and #3 goes to his page at the IMDB (which I personally prefer). All of them have lists of the movies he directed. This is incredibly easy to do and free to all.

Putting aside the linked data result for the moment, the three links perform exactly the same function as in the past, when someone would ask a reference librarian, “I need a list of the movies John Huston directed,” and the knowledgeable reference librarian would reply, “Here. You can find the list in this book,” and hand the user the latest issue of this title (or something similar), which was very possibly shelved in the reference collection for quick and easy access.

Therefore, just as the reference librarian would once have taken the user’s question and converted it into “He needs to look in Film directors : a complete guide,” a reference librarian today would do the same thing but would answer with, or at least include, “He needs to look in the IMDB.” Without any doubt, that is the ethical answer to such a question, and it will remain so for a long, long time.

The huge difference is that today, people rarely consult reference librarians. A librarian would already know that if you want to find the films of a specific director, the library catalog is not currently the right place to look for this information, and, viewed realistically, it never will be. There is nothing at all wrong with that. Not every tool is good for every use: if you want the latest business news or want to find out why your XML won’t validate, the best place is not JSTOR, and it never will be. That doesn’t mean JSTOR is no good; it just means that you have to look in other places for that kind of information. Today, the correct place to look for the films people have directed is the IMDB or perhaps a few other places on the web. We are really lucky to have such options for free today. Reference librarians could help searchers in these directions if they were asked, but sadly, that is happening less and less.

So, adding the relator codes automatically will still demand manual cleanup, perhaps (probably) on a massive scale, if the result is ever to become as good as the IMDB is right now. I suggest that the correct method for a library catalog is to lead people to the resources they actually want, and perhaps even do it better than Google. In the case of film directors, I find it very difficult even to imagine how we could do better than Google, because the Google search works so incredibly well. Perhaps a film librarian could discover that the IMDB and Wikipedia are incorrect or incomplete. In that respect, library efforts might be better focused on improving the IMDB and Wikipedia than on adding relator codes.

There is also the option of having the library catalog interact with the IMDB (and/or Wikipedia) through their APIs.

This raises a highly pertinent question for me: I don’t even know what a library catalog is supposed to provide in today’s semi-total information environment. This is a great example. We can’t ignore these wonderful sites. What should the catalog do today?