Cataloging Matters Podcast no. 18: Problems with Library Catalogs
https://archive.org/details/Eighteenth
Hello everyone and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy. My name is Jim Weinheimer.
In the last episode, I
provided some examples of how people want to manipulate data instead
of plowing their way through masses of printed text but I went on to
express my doubts that the information in catalog records is actually
the type of information that most people want to manipulate. I would
like to continue that discussion.
In the previous episode, I
provided some examples of the kind of data that people want to
manipulate, and I want to add one more example here because it has
meaning to me personally.
I used to be a
semi-serious chess player. Every beginning player has this experience:
after just a few moves, you find yourself looking at a position
you do not understand, but your opponent knows everything. He is
smiling, moving quickly and easily, while you are suffering and
spending lots of time just to find moves that you hope don't lose. It
doesn't take too many of these experiences, and lost games, before
you figure out that if you want to have good results, you must
prepare your first moves, also called the chess openings, and
that means doing research.
This is genuine
research by the way—nothing at all like those undergraduate
papers where five or six scholarly articles fulfill an assignment
that nobody cares about. No, you care. You want the best and you want
to be thorough because otherwise, you will suffer and you will lose.
So what does it mean to do this kind of research?
In the past, it meant
spending money to get the largest library of chess books and
magazines you could afford and borrowing anything you could get your
hands on. These materials were—and still are—filled with games
and notes, and you hoped everything was well indexed, so that you
could bring it all together and write—manually—your own “opening
book” of good moves, bad moves, plans, ideas and so forth. Doing
this could take months of hard work and you were always adding to it.
Today, all this is
done with databases and what used to demand so much labor and time to
sift through this massive amount of information now takes only a few
seconds. The first time I saw one of these tools in action, I was
quite literally left speechless! Grandmaster Gennadii Sosonko says
that before databases, it took anywhere from a year to a year and a
half to prepare a new opening. But because of databases, the research
takes only a few seconds, and the data can be mined in new ways, so
today to reach the same level of preparation requires only... two
weeks! Two weeks versus a year and a half. And you are as well
prepared as anyone. That is incredible. For those who are interested,
I have added a link to a video that demonstrates this. You don't need
to know any chess to see the power of such a tool.
Obviously, chess players who do not use these tools are probably at a
serious disadvantage.
I have no doubt that
others want to do something similar—not with chess, but with
whatever topic they prefer. I know I would. The reason it works so
well with chess is because the moves that once were printed on paper
have now been made into data and that data can be manipulated by
computers in all kinds of ways. To do something similar with other
topics, it would be necessary to turn the information on paper into a
kind of data that computers can understand and work with.
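For readers of the transcript, here is a minimal sketch of what "moves made into data" can look like. It is not how any particular chess database works; the games below are invented toy data, and the function name `opening_stats` is hypothetical. It only shows that once games are stored as move lists, a question that used to take months of page-turning ("what has been played in this position, and how did it turn out?") becomes a short loop.

```python
from collections import defaultdict

# Invented toy games, for illustration only: (opening moves, result for White).
# Results: 1.0 = White wins, 0.5 = draw, 0.0 = Black wins.
GAMES = [
    (("e4", "c5", "Nf3"), 1.0),
    (("e4", "c5", "Nc3"), 0.5),
    (("e4", "e5", "Nf3"), 0.5),
    (("d4", "d5", "c4"), 0.0),
    (("e4", "c5", "Nf3"), 0.0),
]

def opening_stats(games, prefix):
    """For games that begin with `prefix`, tally each next move:
    how many games chose it and White's average score."""
    prefix = tuple(prefix)
    n = len(prefix)
    tally = defaultdict(lambda: [0, 0.0])  # move -> [game count, total score]
    for moves, result in games:
        if moves[:n] == prefix and len(moves) > n:
            entry = tally[moves[n]]
            entry[0] += 1
            entry[1] += result
    return {move: (count, total / count) for move, (count, total) in tally.items()}

stats = opening_stats(GAMES, ["e4", "c5"])
for move, (count, avg) in sorted(stats.items()):
    print(f"{move}: {count} game(s), White scores {avg:.2f}")
```

Nothing comparable can be computed from the title, publisher or page count of the book in which those games were printed.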
It also shows the problem
with catalog information that I discussed in the previous episode of
Cataloging Matters. As a chess player, I am interested in the
data of the chess games themselves, that is, the individual moves,
their evaluations, who played them and when, not the data
about the books and the serials and the videos and everything else
that contains the information I want. Therefore, as a chess
player interested in improving my play, which information from the
catalog record would I want to manipulate? The fixed fields, the
standard numbers, the main or added entries, the titles, the
publication information, the physical information, the series
information, the notes, the subjects? None of that helps me improve
my play. And yet, I am always interested in finding more “chess
data” to put into my database.
In the same way, I think
most people are interested in improving their knowledge and
understanding of baroque architecture or political issues
of my community or
plasma physics or whatever
interests them, but manipulating the bibliographical details
of the containers that hold the information that interests
them will not help them understand those topics.
This is why I say that
while we can go ahead and “turn our catalog records into data”, it
is—lacking any evidence to the contrary—at the very least,
extremely naive to expect the public to find new insights into
the topics that interest them because they will be able to manipulate
the standard numbers or the publication information or the notes or
the publication patterns, or any of the other information that is in
our records.
So, why would anybody need
catalog records? What more could I want regarding my chess data? As I
said before, I am always interested in finding more “chess data”
to put into my chess database, and this is where the catalog
information comes in. Although the catalog does not have chess data,
it can lead me to chess data.
It can be argued that
full-text searches can lead to more chess data, too. What is the
difference between these tools?
Everyone recognizes
that the public has changed its “information seeking” behavior in
fundamental ways from what it was only 20 or so years ago. For those
listening who may be relatively young, 20 years may seem like a long
time, but in library-time, it must be recognized that 20 years is
quite literally the blink of an eye. What this means is that every
day almost everyone who uses a library's collection works with
materials and records made long ago. Often, those materials are among
the most important and valuable parts of the collection. This does
not happen in many other fields, such as businesses and most
other organizations. For them, information made more than a certain
time ago, say five or ten years, is much less important for their needs
and is discarded or archived; on those occasions when it is retained, it is
kept as a curiosity.
Materials in libraries are
very different.
Full-text search engines
have profoundly changed the way people search and even the way people
think about searching. It seems that even for many of those who did
work in those earlier times, their memories have faded. I know my
memories have until I start working to remember.
One example of how deeply
we have changed is that today, everyone takes for granted the
over-arching importance of “relevance ranking”. Relevance,
a word that sounds innocent enough, has taken on semi-propagandistic
uses in that it mixes the sense of its meaning in statistics and
information science with the way it is more popularly understood.
Companies such as Google, which make billions of dollars, are very
interested in making sure that these two definitions remain mixed
together in people's minds as much as possible.
In spite of what some may
prefer to believe, the two senses are definitely not the same, but it
can be difficult to see and comprehend the difference. We can discern
that difference most clearly when we examine a search engine result
versus a search in a library catalog, when the search in the library
catalog has been correctly made
and the library catalog also works correctly. I
emphasize correctly because
it is extremely difficult to do today.
How do people find
materials with full-text searches? Research on search engines (I have
some links in the transcript) has consistently shown
that people concentrate almost all their attention on the top three
or so results. People almost never go beyond the first page. It
should be added that the default number of search results in Google
is ten, and since people rarely change a default setting, the first
page means ten results.
(Search User Interfaces: Presentation of Search Results / Alexander Schreiner. In: Themen des Information Retrieval : Suchmaschinen und Web-Suche : Beiträge des Seminars im Sommersemester 2012 / Andreas Henrich, Daniel Blank (Hrsg.). p. 35+; and Search User Interfaces / Marti Hearst. Cambridge University Press, 2009. p. 136)
I have personally been
fascinated when I watch people work with Google. They put in a word
or two or three, look at the top three results, or five at the most,
and if they don't find what they want, they immediately try other
words, look at the top three or five results, try yet other words,
and so on.
I confess I have
found myself searching Google in exactly this same way. Such actions
betray a number of assumptions on the part
of the searchers—and this apparently includes me when I do it.
Many of these assumptions
are rather illogical but entirely understandable. As one example of
these assumptions, it seems illogical to believe that a search
through the vast information resources now on the internet and that
retrieves several hundreds of thousands or millions of results could
possibly have only a paltry three or four hits that are “relevant”
and that the millions of other pages are therefore practically
“irrelevant” and can be ignored. That really makes no sense but
it is what I see with Google results. After the top few results, the
rest really is almost completely irrelevant.
After the first few
hits, I see more and more places to buy books or videos or tee shirts
or bizarre email exchanges that are (I guess) somehow “relevant”
to my search. I have always found this very strange. You would think
you would find highly relevant items at first, then slowly you would
see less relevant and gradually it would trail off to complete
irrelevance, but my experience, which may be different from anyone
else's, has been a more or less complete drop off after the first
five or ten maximum. Therefore, I think people are right to stop
looking after the top few. But I often think: is that true? I can't
believe it. Furthermore, to believe that a machine could
automatically bring the results to the top that are the “best”
and “most appropriate” and to do it for me as an individual at
any particular moment, is akin to magical thinking.
It begins to make more
sense when we consider the information science meaning of the
word “relevance”. That meaning of relevance is quite different
and has to do with mathematics and algorithms, with precision versus
recall and so forth. This is the meaning of relevance for a
Google search—buried in statistics and algorithms (almost all
secret by the way)—but it is something I don't believe the average
person understands. When people hear that the top hits are the most
“relevant” to their search, they confuse this algorithmic sense
of “relevance” with “best” or “most appropriate” or “most
useful” and then, they eventually come to believe that these pages,
by definition, really are the “best” or “most
appropriate” or “most useful”.
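To make the information science sense a little more concrete, here is a small sketch of the two measures mentioned above, precision and recall. The definitions are the standard ones; the document IDs and the set of "relevant" items are invented for illustration, and nothing here reflects how Google actually ranks results, which is not public.

```python
def precision_recall(retrieved, relevant):
    """Precision: share of the retrieved items that are relevant.
    Recall: share of the relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs, invented for illustration.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}  # judged useful by a person
top_three = ["d1", "d2", "x9"]                               # all the searcher looks at

p, r = precision_recall(top_three, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up case the top three look good on precision, yet three quarters of the useful material is never seen. Neither number says anything about which item is "best" for a particular person at a particular moment.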
Although I can't prove it
(and I don't think it can be disproved either), I have come to suspect
that Google does not so much find the most relevant sites (even in
the information science meaning) as it has managed to move
the completely irrelevant junk that had tormented everyone for such a
long time to lower levels in the search result. What is left over is
popularly interpreted as the “most relevant” or “best” but
what is genuinely the “most relevant” or “best” may still lie
buried inside the search result somewhere or not even in that result
at all.
More importantly though,
this matter becomes clearer when we compare it with a correctly done
catalog search where everything works differently. Let's imagine that
I am interested in “popular songs”. A reference librarian would
immediately understand that my request most probably reflects a lack
of focus and would begin to ask questions such as: popular songs from
where, from what time, which genres, am I interested in recordings or
texts, and so on. A reference librarian could help me a lot.
But even if I do not
consult a reference librarian, there is a lot of help with a
correctly done search in a correctly made catalog.
I know that on the lists
and in my podcasts I discuss library and cataloging history and I
hope it doesn't put too many people off. I do so not out of a sense
of nostalgia, but because I believe it is impossible at this point in
time to understand our current catalogs and decide in which
directions they should change without clearly understanding what they
are, and that means knowing at least a bit of their history. And for
better or worse, that means discussing catalogs that existed in other
formats. Never forget that the records we make today could easily fit
into a card catalog of 1870.
During the days of the
card catalog where everything was in alphabetical order, I would
search for “popular songs” by opening the card drawer as close to
“Popular” as possible and eventually come to a card like the one
I have placed in the transcript, which is from the Princeton
University scanned card catalog. It says:
When imagining
someone doing this in reality, it is essential always to keep in mind
that I could not come to this card directly as the hyperlink allows.
It would take me some time to find this card, because first, I
wouldn't know it existed, plus I would be browsing from the beginning
of the drawer of cards. In this case, I would have seen and browsed
past the title “Popular history of British ferns”, the subject
heading “POPULAR LITERATURE--FRANCE”, the title “Popular
political economy”, a cross-reference for the corporate body
“Popular revolutionary American alliance” and so on. That is, I
would see many records that have nothing
at all to do with what I want—popular
songs.
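The browse just described can be sketched in a few lines. The entries are the ones named above; the wording of the cross-reference card ("Popular songs") is an assumption, and real card-filing rules were more elaborate than the simple case-insensitive sort used here. The function name `browse` is hypothetical.

```python
import bisect

# Entries named in the talk, plus the cross-reference card (wording assumed).
# A real drawer would hold far more cards than this.
DRAWER = sorted(
    [
        "Popular history of British ferns",
        "POPULAR LITERATURE--FRANCE",
        "Popular political economy",
        "Popular revolutionary American alliance",
        "Popular songs",
    ],
    key=str.lower,
)

def browse(drawer, term, before=2, after=2):
    """Return the cards filed around the place where `term` would sit."""
    keys = [card.lower() for card in drawer]
    pos = bisect.bisect_left(keys, term.lower())
    return drawer[max(0, pos - before): pos + after]

for card in browse(DRAWER, "popular songs", before=4, after=1):
    print(card)
```

Alphabetical filing guarantees where a heading sits, but says nothing about whether its neighbors have anything to do with it.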
After this browsing,
I would find the card that would tell me that I should look under
“Music, Popular (Songs, etc.)” so I would walk over to the “Ms”
where I would once again browse just as I did before, seeing even
more materials that had nothing at all
to do with what I wanted, and would eventually find a special
arrangement of cards. There is a link to this arrangement in the
transcript. http://bit.ly/XPy0ek.
Unfortunately the scans go a bit crazy for a while but you can still
see each card. Click on “Next Card” and go through just a few of
them. The searcher discovers that this topic “Music, Popular
(Songs, etc.)” has been subdivided into groups, such as “Addresses,
essays, lectures”, “Bibliography”, “Dictionaries”, and as I
continued to browse, I would discover that I could also find popular
songs of different geographical areas. Quite a bit of help.
For those who used
the printed books of the Library of Congress Subject Headings (those
terrifying, big, fat, red books that I never
understood before library school), I
would again browse alphabetically, looking for “popular songs”.
We can see how it worked from a copy of the relevant page found in
Google Books. I added it to the transcript.
http://books.google.it/books?id=rREoAAAAMAAJ&dq=library%20congress%20subject%20headings%20music&pg=PA3613#v=onepage&q&f=false
Under “popular
songs” I see that I should look under “Popular music”. The
historian can see that the heading has changed since the card
catalog. In this example, “Popular music” is on the same page in
the printed book, so we just go to the top of the page.
We discover that
added to the topic “Popular music” is the not very highly
readable (May Subd Geog) which,
to those who know, means that this can be subdivided by geographical
area. We also see related
classification numbers, the UF, BT, NT and a scope note. Continuing
on, we can see some subdivisions and find that “Popular music –
Louisiana” has a Narrower Term of “Zydeco music”.
All of this can be very
helpful to someone interested in popular songs, and in the absence of
a reference librarian can help people focus their thoughts and
perhaps lead them from the vague notion of “popular songs” to
something tangible that interests them. In this case “Zydeco
music”.
That's how it worked in
the printed world. It would have taken a lot more time than I have
taken to explain it. You may also have had to wait because someone
was using the card drawers you needed. Searching the card catalog was
just a pain. And yet, there were advantages.
Let's compare this
to browsing entries for the subject “Popular music” in the online
LC catalog. There's a link in the transcript.
http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=popular%20music&Search_Code=SUBJ_&CNT=100&hist=1.
We are assuming we already know that the subject to browse is
“Popular music”. What do we see?
We see many,
many more subdivisions than those in
the printed LC Subject headings. Each geographical subdivision
displays, resulting in an overwhelming list, which illustrates how the
cryptic (May Subd Geog),
although not very comprehensible, actually came in very handy to help
someone understand how a topic is sub-arranged. There are also many
more subdivisions in this list than
we see in the printed LC subject headings, and these come from the
list of free-floating subdivisions that can be used under any topical
heading. I provide a link to an old version of that list
http://www.itcompany.com/inforetriever/form_subdivisions_list.htm,
where we can find “Bibliography”, “Bio-bibliography”,
“Discography” and many others.
After browsing through ten
screens comprising 100 subject headings each under “Popular music”
or 1000 subject headings—I repeat: 1000 subject headings—I am
only up to “Popular music—France—1901-1910”. It's hard to say
how many screens of popular music there are, but I think it is safe
to conclude that only the tiniest percentage of a populace used to
looking only at the top three hits would last to the bitter end, or
even half-way through to see the key Narrower Term reference from
Popular music – Louisiana that leads them to “Zydeco
music”. No one will do that today. Including me. I refuse.
There was a similar
problem with card catalogs of course. Although I can't demonstrate it
physically—people will have to just take my word for it—it was a
lot easier to flip through the cards in a card catalog or page
through the subject headings in a book catalog than plow through
these web pages. But it was still a pain.
Once I do find
“Zydeco music” in the computer catalog
http://catalog.loc.gov/cgi-bin/Pwebrecon.cgi?DB=local&Search_Arg=zydeco%20music&Search_Code=SUBJ_&CNT=100&hist=1
I find some other intriguing subjects, such as “Zydeco
music—Finland” along with a related term “Cajun music”.
This simple example
illustrates that the catalog is based on creating intellectual
groupings, that is, sets, of similar items and
presenting those sets to the searcher in different ways. There is no
concern at all for anything resembling “relevance”. It isn't as
if you would look at the 200 items you find listed under “Zydeco
music” in the LC catalog and think “I don't see what I want under
the first three records listed here so I'll try another search”. At
least, I hope searchers do not do that today, although from their
point of view if they did it would be fully logical. So people may do
this—I don't know. Does anybody know? Somebody should.
The assumption with a
library catalog should be: if the information about Zydeco music you
want exists, it will definitely be within this grouping
labeled “Zydeco music”.
Is that true?
No.
Why? For several reasons.
One of the main ones: catalog records base themselves primarily on
complete resources—technically speaking, 20% or more of an item, so
within a specific collection there may be many materials with
information about “Zydeco music” but not everything warrants a
separate heading. In fact, there may be a lot of information about
Zydeco music in the resources found under the broader term “Popular
music—Louisiana”, maybe even “Popular music—Southern States”.
It would not be stretching the imagination to suppose that there may also be
significant information on Zydeco music under materials with the
related term “Cajun music”. How can someone be aware of all of
that?
Let's look. What happens
in the library catalog if I browse the subject headings for “Zydeco
music” and I go forward and backward? If I browse backward, I find
the heading Zydeco dance--Study and
teaching--Louisiana, which is perhaps not too bad, but next
comes a subject heading about the word “Jew” in Lithuanian.
If I browse forward, I
find Zydeco musicians but then come some place names and
corporate bodies in Poland. While Zydeco musicians and dancing may be
all right for my purposes, those other topics are of absolutely no
value to me. They are so far off that they can't even be labeled
serendipity. Some have claimed that alphabetical arrangement is
essentially no different than random arrangement—or at least a
completely arbitrary arrangement—and this demonstrates why.
Obviously, what someone
really needs, when looking at records of “Zydeco music” is to
know that there may be more information on Zydeco music at least in
the groupings “Popular music—Louisiana” and “Cajun music”
if not maybe others.
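As a rough sketch of what that could mean in practice: the relationships mentioned in this talk can be written down as simple triples and read in both directions. The three relationships come from the examples above; the triple layout and the function name `see_also` are hypothetical, not a real catalog or authority-file format, and "--" stands in for the subdivision dash.

```python
# Relationships mentioned in the talk, stored as (source, kind, target).
# The layout is a hypothetical illustration, not a real authority format.
RELATIONS = [
    ("Popular songs", "use", "Popular music"),
    ("Popular music--Louisiana", "narrower", "Zydeco music"),
    ("Zydeco music", "related", "Cajun music"),
]

def see_also(heading, relations=RELATIONS):
    """Collect the headings worth showing next to `heading`,
    grouped by how they are connected to it."""
    found = {"use": [], "broader": [], "narrower": [], "related": []}
    for source, kind, target in relations:
        if source == heading:
            found[kind].append(target)
        elif target == heading and kind == "narrower":
            found["broader"].append(source)  # a narrower term read backwards
        elif target == heading and kind == "related":
            found["related"].append(source)  # related terms work both ways
    return {kind: terms for kind, terms in found.items() if terms}

print(see_also("Zydeco music"))
```

Someone looking at the records under "Zydeco music" would then be pointed to "Popular music--Louisiana" and "Cajun music" without having to browse a thousand headings to stumble on them.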
These relationships exist
now but as we have just seen, utilizing these relationships is
practically impossible since even if you know how to do it, as I do,
you have to fight with the catalog. This is why I have stated
repeatedly that the catalog is broken.
Why do we have to fight
with it? Because the catalog we have today was designed to present
everything in alphabetical order, the arrangement you find in a
dictionary. This is why Charles Cutter titled his rules “Rules
for a Dictionary Catalog”.
That is, a
dictionary of the 19th
century—not one of the 21st
century. For someone using
merriam-webster.com or dictionary.com or Wikipedia, all of those
tools work completely differently from the dictionaries and
encyclopedias in the world of Panizzi and Cutter or even that of only
20 years ago. If I go to merriam-webster.com, I just type in the word
I want to know. It helps me even if my spelling is atrocious. I can
completely misspell the word “chrysanthemum”
http://www.merriam-webster.com/dictionary/krisanthenum
and still find it.
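How Merriam-Webster actually handles misspellings is not something I can state; the sketch below swaps in a generic technique, similarity matching from Python's standard `difflib` module, just to show that a lookup no longer has to depend on getting the first letters right. The word list is a tiny invented stand-in for a dictionary's headwords, and `suggest` is a hypothetical helper.

```python
import difflib

# A tiny, invented word list standing in for a dictionary's headwords.
HEADWORDS = ["chrysalis", "chrysanthemum", "christen", "crisis", "krypton"]

def suggest(typed, headwords=HEADWORDS):
    """Return the headwords most similar to what was typed, best match first."""
    return difflib.get_close_matches(typed, headwords, n=3, cutoff=0.6)

matches = suggest("krisanthenum")
print(matches)
```

In a printed dictionary, "krisanthenum" would send the reader to the letter K, nowhere near the word being sought.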
Try looking for this
word in a printed dictionary if you have one and notice along the way
how much you see that has nothing to do with chrysanthemums or
flowers or even biology. If you don't have a printed dictionary, I
have a link where you can look for “chrysanthemum” in a
dictionary from 1823. http://books.google.it/books?id=jlZBAAAAcAAJ
This link goes to the cover. Don't cheat and do a text search for
the word but browse for it like you would in a physical volume! I
don't suggest looking up “chrysanthemum” under “k” but if you
want, I have a link to volume 2.
http://books.google.it/books?id=qVZBAAAAcAAJ.
Therefore, when we
read that the library catalog is a dictionary catalog, which it is,
these printed dictionaries are what we should envision. That is
because the people who designed our catalogs had those tools right
before their eyes since everyone used them all the time. Those old
catalogers added all sorts of aids to searching their catalog but
those were all made for a physical dictionary catalog and those aids
have become useless today. The reason they are useless is that
browsing alphabetically and seeing a huge number of materials that
are completely irrelevant to our search have become very strange in
the modern world. This is a fact whether we like it or not.
The methods I have briefly
described clearly do not work in the current environment. They are
never, ever coming back and they shouldn't because they genuinely are
obsolete. But that is the way our catalogs work now, whether we like
those methods or not. Nevertheless I think it is important to
consider that just because the methods may be obsolete doesn't
mean that everything is obsolete.
What do I mean? Let's
consider some differences from the past. What is a heading? For
catalogers today, it means the 1xx, 240, 4xx, 6xx, 7xx or 8xx that
today contains controlled vocabulary and provides a link that
searchers can click on so they can find related records. In the past,
it was something much less vague. It was the part written at the top
of a card that determined where that card sat in the card catalog. In
the transcript I have an example of a card where I denote the heading
in red, and often, subject headings were typed in red too.
In book catalogs,
the heading was printed one time at the beginning of a group of
records and for groupings that went on at some length the heading
would be repeated at the top of the column or the top of the page. In
the transcript I provide an example of headings in a printed book
catalog and again denote the headings in red. We can see how Cicero's
name is not repeated even though there are six items.
Catalogue of the Mercantile Library in New York. New York : E.O. Jenkins, 1844. http://books.google.it/books?id=_mtMx5Z8J28C p. 43:
I also have an
example of subject headings with subdivisions in a catalog from 1869. We see the beginning of the topic “Moral science” which
comes after “Moors in Spain” (dictionary catalog at work) and we
see its subdivisions “General works – History” and “Systematic
treatises”. There are other subdivisions that come later, such as
Miscellaneous works, and all kinds of Special subjects, Anger,
Avarice, and others.
Catalogue of the Library of Congress : index of subjects. Washington [D.C.] : GPO, 1869. volume 2, p. 1177. http://books.google.it/books?id=RbtSAAAAcAAJ.
The purpose of the heading
as a designation for a group of records on the same topic or author,
is very clear in a book catalog. The methods are obsolete, there is
no argument about that. But exactly what do we see here that is so
obsolete?
No one today is going to
look for “Moral science” by starting at “M” and browsing
past Metallurgy, Meteorology and Monograms. But it does
not necessarily mean that the groupings themselves are
obsolete, that is: the sets of records found under each heading.
I believe it is clear that
people still want the materials we see grouped together, for instance
the materials grouped under the topic Moral science – General
works – History. People in
1869 wanted the
resources we see grouped there and there is every reason to think
that people want exactly those same resources today. Therefore, the
grouping, or the set, is not
obsolete. Of course it
needs to be updated to include modern resources. If we assume that
people still want this group today, the question becomes: how does
someone want to find this group? Naturally, people would
prefer more modern words in place of “Moral science”, such as
Ethics, Deontology, Morality, or
Morals.
Yet, how can people who
want such a group find it if they don't know how it is named, here
Moral science – General works – History or even that the
group exists? Also, when people find this group, how can they find
other groupings of materials that may be of interest to them? Can it
be done?
Yes. We saw it in the card
catalog. The example with Zydeco music shows
how people really could do this in earlier catalogs—if
they used those catalogs and other tools correctly. It
wasn't easy back then, but it's almost impossible today.
The library catalog
provides groupings of resources that have been selected by experts.
The groupings and arrangements of the individual resources are based
not on statistical relevance, but on the intellectual contents
of the items. Naturally, this system has never been perfect but
neither are Google or any of the other systems. The traditional way
relies on the ups and downs of human frailties, and consequently has
missed a lot, but I'll just go ahead and say it: it can't be all that
much worse than believing that when I do a full-text search and get a
million hits, that only the first three I see are worth considering
and that they have risen to the top by magic. I don't believe it.
The reason I don't believe
it is that I understand what statistical relevance is and I
also understand how library catalogs are supposed to work. I know
there must be more.
These are some of the
reasons why I don't think RDA or FRBR are going to make any
substantial difference in the ways the public uses our catalogs. Even
with linked data, why should the public want to manipulate
bibliographic data that has no meaning for them? Our catalogs will
still be based on the principles of a 19th century
dictionary—not even on a 21st century dictionary! The
problem is not our records or even the information in them—it is
the reliance on alphabetical order that has become obsolete in our
new environment.
When I am looking at
the set of records under “Ethics”, I want to know that there are
many subtopics available for me such as “Cross-cultural studies”
that I would never have imagined. I want to know there may be more
information under “Philosophy” and “Values”, and that there
are all kinds of narrower terms such as “Akrasia”
http://lccn.loc.gov/sh2006003161
even though I may have never heard of Akrasia in my life!
So, is there a solution? I
think there is and in the next podcast (yes, it continues) I want to
discuss something called Information Architecture and how
Information Architecture could help library catalogs and even
libraries.
The music to end this episode is a little different from what I have chosen before. This time it won't be Italian music, but more in keeping with the spirit of this talk, here is Zydeco music from Louisiana. This is “What you gonna do?” performed by Buckwheat Zydeco and his fabulous group. http://www.youtube.com/watch?v=AQL1eT4crZw
That's it for now. Thank
you for listening to Cataloging Matters with Jim Weinheimer, coming
to you from Rome, Italy, the most beautiful, and the most romantic
city in the world.





You say: "Even with linked data, why should the public want to manipulate bibliographic data that has no meaning for them? Our catalogs will still be based on the principles of a 19th century dictionary—not even on a 21st century dictionary! The problem is not our records or even the information in them—it is the reliance on alphabetical order that has become obsolete in our new environment."
The primary reason for linked data (and more discrete data identification) is not for the end user to manipulate that data. The primary reason is for the SYSTEMS to be able to manipulate the data according to the needs of the user. The problems with the catalog that you talk about are very real. But the deficiencies in our current metadata structures are very real, too. With the pieces of data more clearly defined, systems will be able to make associations between resources, concepts, etc. that are inherent in the descriptions, thesauri, classification schemes, etc. The systems will then be able to take the existing data, which is based on 19th and 20th century technologies, and make it more useful in 21st century technologies.
I do not understand why you keep insisting that we should be working on improving our catalogs INSTEAD OF implementing FRBR, RDA, linked data, etc. It's not an either/or thing going on. Rather, RDA and linked data are the paths that we have identified as the ones we need to take in order to make better catalogs!
The goal should be the end user, not the machines. As I said in the podcast, what parts of the catalog record do the users want to manipulate when they are looking for information of interest to them? Are they interested in the "associations between resources, concepts, etc. that are inherent(?) in the descriptions, thesauri, classification schemes, etc."? I don't think so and it needs to be proven.
While people do want to manipulate data, I cannot imagine that too many of them will find new insights into Bram Stoker's Dracula, or into the Cold War or into the reasons for the latest economic crisis by manipulating any part of our catalog records. That is, unless someone can show that by manipulating the paging, size, statements of responsibility, or bibliographic notes, some researcher will find new information about Shakespeare's Hamlet.
Finally, exactly who has identified that RDA and linked data are the paths we need to take to make better catalogs? It was never put to a vote or even discussed.
It is long past time to question these propositions and put them to the test. Better that it would fail now on its own rather than have everyone go up in flames with it.
You're still not getting it. The users aren't doing the manipulating, the machines are. And this doesn't mean that the focus is on the machines. Catalogers writing bibliographic descriptions on 3x5 cards were not focussed on the machine (card catalog); they were focussed on the user, using the technology of the card catalog to meet the user's needs. Catalogers putting bibliographic descriptions into MARC tag workforms aren't focussed on the machine (online catalog); they're focussed on the user, using the technology of the online catalog. Catalogers dealing with linked data will not be focussed on the machine; they will be focussed on the user, using the technology of linked data.
Users aren't going to be "manipulating" the data in the sense that you seem to be imagining. We're not talking about users working with bibliographic data the way they would with, say, ICPSR data. But they WILL be able to get more easily to the information they're looking for because the MACHINES will be able to manipulate the bibliographic data more accurately. Data such as relationships between original works and translations; original works and derivations; works and writers; works and performers; works and awards; works and languages; works and mediation devices; and on and on and on and on and on. Making the machines more powerful in handling bibliographic data, to lead the users to the resources they want, is the whole point of it all. Yes, we need better catalogs. But we need better (and better-defined) data to make the better catalogs possible.
I think I have demonstrated as much or more than anyone else that I "get it". I just question the final product and the focus. I realize that it is all for machines and that once it is good for machines, it will be good for people. I disagree with that mindset.
In theory that may be OK, but in reality what will be the final product? As I showed in the podcast, it will be making data that people DO NOT WANT TO MANIPULATE.
We can say how much people will want to manipulate "relationships between original works and translations; original works and derivations; works and writers; works and performers; works and awards; works and languages; works and mediation devices; and on and on and on and on and on" but I think people will care as much about that as I--speaking as a chessplayer who wants to improve my game--care about manipulating all of that. Which means absolutely zero interest. I want the moves and information about the moves. The other is of no interest to me at all. Adding the FRBR relationships (editor, director, WEMI, sequels, supplements ....) will make absolutely no difference.
Of course, if someone can show research that people want to manipulate this information so badly, I would be willing to reconsider. But research is exactly what the pro-RDA/FRBR people do not want to do.
It remains to be shown that people want so badly to manipulate bibliographic data as opposed to the data that really interests them. We see it every day in every library: people use the catalog only as a way to get into the collection, which contains the information they want. It will be the same with linked data, where people still won't want the bibliographic information, except as a way to get into the information that they really want.
Until the cataloging community recognizes this fact, there will be little change.
You obviously do not get it at all, because I keep trying to explain that the point is not to have people manipulate bibliographic data. The point is to have bibliographic data that the machine can manipulate. The FRBR study showed how our bibliographic records have all along contained the details of the resources: titles, authors, subjects, relationships with other resources, etc. Tagging those same details with MARC coding has enabled us to create discovery tools with computers that are much more powerful than card catalogs. Newer methods are showing much promise for tools that are even more powerful.
NO ONE EXCEPT YOU has been saying that we are trying to create bibliographic data for the end user to manipulate as bibliographic data. What the end user will be doing is telling the discovery tool what things they're looking for, or sometimes just following paths that the tool offers them based on something else they're looking at. The computer will be doing all the manipulating. Unless you think that someone using even the most rudimentary of online catalogs is "manipulating bibliographic data"!
What a strange argument this is! Of course it is the machines that actually do the manipulation of the data, just as I mentioned with the chess program in my podcast. The "manipulation" of that chess data I used to do manually, myself. People "tell" the machine what they want, just as they do in the chess program, (if you haven't looked at that video, see how the program works) and the machine does its magic.
Catalogs contain information about “containers”: books, serials, recordings, and so on, and very few people need to manipulate that kind of data. OK, they don't need machines to manipulate the information for them. That information is irrelevant for their purposes.
Perhaps it hurts a cataloger too much to accept the fact: almost nobody needs bibliographic information, except a librarian. For instance, the information in the catalog record cannot help the chess program I mentioned. All the public needs from the catalog record is where an item can be found, and where are other, similar items.
Finding other, similar items has been broken for a long time for the public, as I demonstrated. And RDA, FRBR, Linked Data, won't fix that.
It seems clear that you don't get it at all. You seem to be insisting that the public (using machines, of course!) needs the detailed information found in catalog records for some reason I cannot fathom.
"Almost nobody needs bibliographic information, except a librarian." Oh dear me, I guess I don't need the catalog to contain the title of the thing I'm looking for. Or the name of its author. Or when it was published. Or what other works it's related to.
"All the public needs from the catalog record is where an item can be found, and where are other, similar items." Nobody can find WHERE an item is if there is nothing in the catalog saying WHAT the item is!
"It seems clear that you don't get it at all. You seem to be insisting that the public (using machines, of course!) needs the detailed information found in catalog records for some reason I cannot fathom." Yes, the users DO need that information. The information is the absolutely essential part of the catalog. Without that information, there is no catalog at all.
I'm starting to wonder if maybe all you're doing is just making a career out of being a devil's advocate, for the fun of it. Or do you really believe all the stuff you're saying?
No. I am being honest and realistic. Remember that from the very beginnings, the public has come to the library to use the resources in the collection, and the catalog has been their way into the collection. How many people have you heard of--other than (maybe) catalogers--who enjoy searching the catalog? And reading a catalog record? The information people want is in the collection. That is where they learn about the topics that interest them--not from the catalog.
This fact may be too difficult for catalogers to accept, I don't know. Accepting it was very difficult for me. But events are showing this simple fact to be true.
Now that there are ways of "finding information" other than library catalogs that are much more alluring than library catalogs, and now that library catalogs as finding tools are broken (as I demonstrated in the podcast), things must change.
To make the information in our catalog into data means that the public (through the miracle of machines, of course!) will be able to manipulate that data in all kinds of new ways. But for someone who wants to learn about the history of alchemy, or the causes of WWII, or the intricacies of Michelangelo's David, they will not learn anything from the catalog. If they are lucky, they will be led to resources where they can learn about those topics.
If I am wrong, please demonstrate it. Show me how manipulating the information in the catalog records can help me understand the causes of the Cold War. Or what is a war crime. Or why 9/11 happened. Or why Durer painted himself as Jesus Christ.
You can't. That is because such information is not in the catalog. It's in the collection. And making the catalog information into data cannot change that at all. How will being able to manipulate anything in the catalog record help me learn about those topics? Manipulating the standard numbers? The authors? The catalog notes?
Manipulating any of that information will make no difference whatsoever--unless you can demonstrate otherwise. Please do so if you can. The best the catalog can do is lead me to resources that might contain that information.
And the ways the catalog does that do not work at all well today.
As I pointed out in the podcast and other places, we can go ahead and make the catalog records into data, but to expect it to make any substantial difference to anyone is very naive. It will make our records more widely available, and that would be good although the context will be lost, but there are many, many ways of doing that other than Linked Data.
You are repeatedly failing to grasp the most basic point that I have been trying to make: This has nothing whatsoever to do with giving data to the end user to manipulate in one way or another OTHER THAN as a means of finding the resources they are after. You say "Show me how manipulating the information in the catalog records can help me understand the causes of the Cold War." How can I show you that when it isn't my point at all? The purpose of the bibliographic data is NOT to provide information about the causes of the Cold War; it is to describe, and make available to the user, RESOURCES THAT CONTAIN the information about the causes of the Cold War. For example, this might be done by using LCSH "Cold War". Or it might be by using an identifier that stands for the concept. And from a record describing a resource about the Cold War, a link from the LCSH string or identifier would lead the user to other resources about the Cold War. And there might be additional links to other resources by the same creator, the same publisher, etc. that might be related in some way and provide more information.
How many ways can I say it? This isn't about trying to make bibliographic data do something that it isn't designed to do, and that it cannot do even if someone wanted it to. It's about formatting it into packages to help it do what it's always been meant to do, but to do it better.
Of course, at this point you are so married to your straw man that I would be totally surprised to see you concede on this matter.
It is truly unfortunate you take these matters so personally and resort to ad hominem attacks. It makes it difficult to reach any kind of understanding.
But at least you appear to agree that manipulating the information in catalog records will not help someone understand a topic that interests them. So, you seem to agree with my point that the catalog records are actually signs that lead people toward the resources that may contain the information they want. Once that point is accepted, the public's ability to manipulate the data in the catalog records should not be seen as such a wonderful success since it will make very little difference to them.
This has been my point all along. Then I go on to ask: What can we do with our catalog records that will make a difference to people? How can we create something that people cannot find anywhere else? And where we do not make our so-called "legacy data" obsolete? Is there anything?
My answer is yes: by fixing the catalog to make sure it once again becomes a really useful finding tool for today. And this has been the point of my last two podcasts (plus the next one). It is far more complex than changing a few rules and adopting a new format. It is admitting that in the current information environment, there is something wrong in the fundamental workings of the catalog, and how people relate to the catalog and individual catalog records, and then trying to fix all of it.
RDA, FRBR and Linked Data do not address these issues.
What I find unfortunate is that you continue to argue against a description of RDA and linked data that does not fit anything I have seen anywhere else but in your messages.
You say: "So, you seem to agree with my point that the catalog records are actually signs that lead people toward the resources that may contain the information they want." Yes, yes, yes!!! That's what I've been saying all along.
Then you say: "Once that point is accepted, the public's ability to manipulate the data in the catalog records should not be seen as such a wonderful success since it will make very little difference to them." But it sounds here like you are referring to exactly the thing I have been trying to refute in all my previous comments. This has nothing at all to do with providing data for the user to manipulate for any reasons other than the purpose of finding, identifying, selecting, and obtaining the resources they're wanting.
It is NOT about letting the user do calculations on material types and dimensions, or comparisons of lengths of contents notes, or whatever, the same way they may use a database of research data. What it IS about is helping the user through providing greater depth and accuracy in filtering search results; through making more complete linkages between related resources; etc. To help them get more quickly to the things they're looking for, and once they find something, get more easily to other related things that they may also want.
It's about helping the user find all the things written by Author X, even those where the author's name appears as Author X1 or Author X2. It's about helping the user find a specific edition of a Finnish translation of a Harry Potter book. It's about helping the user determine whether this CD, or that CD, or some other MP3 file is what they're looking for.
It's about improving the signs. The signs will have more information; they will be more plentiful; hopefully they will be able to be adjusted to the needs of the user (have access points or labels translated into another language, for example).
While we can disagree on how well FRBR, RDA, and linked data address the problems of the catalog, I believe it is absolutely incorrect to say that they are not addressing the problems at all. Because addressing those problems is specifically what they are all about. The deficiencies in our catalogs are due in no small part to the deficiencies in the metadata (both content and structure).
I believe I am among the first to ask many of these questions, such as when I questioned whether the FRBR user tasks are all that relevant to the world today. Is that really what people want? The accepted answer today seems to be that these are not the main tasks that people want in the modern information universe. Certainly we can have a tool that lets people find, identify, etc. by their authors, titles and subjects, but the evidence shows that this is rather low in people's priorities and they want something else. What is the evidence? When they can, people have left library catalogs in favor of other tools, where they find information much more easily and better (in their eyes). Cataloging is not very high on the agenda in many libraries and departments are being cut back.
The FRBR user tasks can be done by anyone now--right now--in modern faceted catalogs such as Worldcat. But nobody is declaring victory or tooting horns. Nobody seems to care.
What more evidence do we need?
It turns out we are in agreement on some things, but you still place a lot of faith that once our records are placed into the linked data universe, somehow the public will find them more useful than where they are now. I question that assumption based on several points, but mainly on the fact that our catalogs are broken. RDA certainly doesn't make anything easier to find (it makes it harder in many ways, as I pointed out in "Catalogs, Consistency and the Future"). Linked data won't fix anything and will only make the “brokenness” felt more widely.
In my next podcast, I'll be discussing Information Architecture and what it could do for our catalogs. The number one focus of Information Architecture is to create something that “your users” can navigate and find the information they want as quickly and as easily as possible. If somebody has to fight with a site to make it work, or if they experience trouble, they will leave for something else. In the business world, this is the kiss of death.
This is what has happened with library catalogs. I see nothing in FRBR or RDA or linked data that takes the view of the user in fixing the problem they experience of a broken catalog. One of the main reasons for this is that there have been no real user studies of the problems that users have with the catalog, nor any efforts to then find solutions to those problems. More energy has been placed in “information literacy” which teaches people how to search a catalog. And of course, the need to implement RDA.
We must see the catalog as the users see it: as a whole, as an entirety, not individual records. When we look at it from their point of view, I believe that the problems with catalogs appear much deeper than just deficiencies in the metadata, either content or structure.