Posting to NGC4LIB
On 08/11/2012 12:45, Dave Caroline wrote:
I shall pick one small statement and show that “standards” do not do justice to the data or the users. If only the “professionals” looked out at the world beyond and asked themselves what do people want to find.
Do the users really want a limited subset? No, they are more likely to want the right subset. There is a difference. A standard chooses a subset with arbitrary rules that may lean more toward the cost of cataloguing than the usability of the catalogue.
A full-text OCR that Google can see will enable direct finding of information for anybody.
An example from today: I have a trade book by a company called Barber Colman, who made gear hobbing machines. Taking a description of one very special type of hob, “mutilated-tooth single position hob”, and putting that into Google took me straight to the patent, with no messing with any cataloger's view of the information, and few would probably have the domain knowledge.
One thing that library cataloging “standards” specify is to catalogue only the first (main) author/s, a travesty in a book where the sections are all by leading authors.
That is a very good example of the power of full-text searching, but I don’t see how this shows that standards do not do justice to the data or the users. The search you did is only one type of search and, I would submit, is not what most people do most of the time. People are more interested in topics. My favorite example is to search for “World War I”. People who are interested in that topic will enter into the search box “world war i” or “wwi” or whatever, and in Google, the response will be millions of hits on World War I, probably with the first link straight into Wikipedia. That is more than anyone could read in a lifetime. When I have asked people if they are happy with the search, they say “yes, look at all the hits”. When I then ask them if it is a good search, the question surprises them, and no one has figured out the problem until I tell them.
I then go on to say that a search for “World War I” *cannot*–*by definition*–find a very important type of resource. And that is: anything from before 1939, i.e. before World War II began. Nobody called it World War I until there was World War II. Therefore, in a full-text search for “world war i” it is impossible for there to be any primary sources, nor can there be many secondary sources. The public is not used to thinking in this way: full-text, as the term says, is searching *text* while library catalogs were designed to allow people to search for *concepts*. Although people may say they want resources on “World War I”, 99% of them are actually interested in “that big war that took place mainly in Europe from the years 1914-1918 that killed millions, no matter what words have been used to name it”. The fact is, you can search library catalogs for concepts–something you cannot do in Google.
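To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three records, their titles and texts are invented, and the subject strings are only modeled on the form of library subject headings. It shows nothing more than the mechanics: a text search matches the words in the document, while a subject search matches a heading that a cataloger assigned regardless of the words the document uses.

```python
from dataclasses import dataclass

# Illustrative heading string, modeled on the form of a library subject heading.
HEADING = "World War, 1914-1918"


@dataclass
class Record:
    title: str
    year: int
    text: str                       # stands in for the OCR'd full text
    subjects: frozenset = frozenset()  # headings assigned by a cataloger


# Hypothetical records, invented for this sketch.
RECORDS = [
    Record("Dispatches from the Front", 1916,
           "Reports on the Great War as seen from the trenches.",
           frozenset({HEADING})),
    Record("A Short History of World War I", 1995,
           "World War I began in 1914 and ended in 1918.",
           frozenset({HEADING})),
    Record("Gear Hobbing Practice", 1920,
           "Notes on hobs and hobbing machines.",
           frozenset({"Gear-cutting machines"})),
]


def full_text_search(records, phrase):
    """Return titles whose title or text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [r.title for r in records
            if needle in r.title.lower() or needle in r.text.lower()]


def subject_search(records, heading):
    """Return titles of records that carry the given subject heading."""
    return [r.title for r in records if heading in r.subjects]


print(full_text_search(RECORDS, "world war i"))
print(subject_search(RECORDS, HEADING))
```

The text search returns only the 1995 history; the 1916 dispatches never use the words “World War I”, so they are invisible to it. The subject search returns both, because the heading names the concept and not the wording.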
In your example of “mutilated-tooth single position hob”, you were searching for the text, not trying to find out the range of information that may be available on this hob. Is it known under any other terms? Possibly, but that would take research. While people do occasionally want to search text as you did, it seems that people mostly don’t care what the words are, they want to find out about architecture in Rome, the techniques of Leonardo, and so on. These need conceptual searching.
Library catalogs were designed to allow this “conceptual access” to the materials within a collection. In fact, it was physically impossible to do a text search like everyone does today. There has been a complete intellectual change and cataloging has yet to deal with it. Before, all you could do was browse cards–you couldn’t walk up to a card catalog and say: “give me all the cards with the words ‘world war i’ printed on them” and have them all fly into your hands. The most you could do would be to search a card catalog by the beginning words of the title–that is, if the library made title added entry cards–but that was exceedingly hit or miss. On the other hand, you could search the cards by their subjects and that demanded certain methods by both catalogers and searchers, and yes: standards.
I admit that transferring this conceptual access into the computerized catalog has not been done very well at all and has been, in my opinion, one of the biggest disasters in the history of cataloging. Still, I think people would very much like the conceptual access since it is available nowhere else.
Other examples of searching text vs. concepts show that words (text) change constantly. What words would you choose to search full-text for “African Americans” in documents from the 19th century? Even from the 1960s? Or homosexuals? What about different languages? I can remember people from the South calling the U.S. Civil War “The War of Northern Aggression”. Here is an interesting Wikipedia article on the topic: http://en.wikipedia.org/wiki/Naming_the_American_Civil_War. So it is clear that full-text searching, although it seems to be easy, actually has to contend with the vast complexity of language change. It is full of booby traps for the unsuspecting.
The *only way* the library type of conceptual access can work is through adherence to standards by trained experts. While I think full-text is great and use it all the time, we must question whether it is so good that it eliminates the need for the other type of access. I certainly don’t think so. The first step, though, is to admit that our traditional methods for allowing conceptual access have been a disaster in the computer catalogs and to figure out how to fix them for people today. That is a huge task, I agree.
I am not saying that full-text is bad, but just like anything else, it has its strengths and its weaknesses. It is my belief that merging the library catalog with full-text would create something far more powerful than anything we have seen so far. I think it would be really fun to try.
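One way such a merger might look, purely as a sketch: use the catalog's cross-references to expand what the searcher typed into all the known names for the concept, then run those names against the full text. The Python below assumes a tiny hand-made authority table and three invented text snippets; the names `AUTHORITY`, `resolve_heading` and `expanded_full_text_search` are hypothetical and do not describe any existing catalog system.

```python
# Hypothetical authority table: heading -> variant names people have used.
# The variant lists are illustrative, not taken from any real authority file.
AUTHORITY = {
    "World War, 1914-1918": [
        "world war i", "wwi", "first world war", "great war",
    ],
    "United States--History--Civil War, 1861-1865": [
        "civil war", "war between the states", "war of northern aggression",
    ],
}

# Invented snippets standing in for OCR'd full text.
TEXTS = {
    "letter-1916": "News of the Great War reaches us slowly here.",
    "textbook-1995": "World War I ended in 1918.",
    "memoir-1870": "After the War Between the States we returned home.",
}


def resolve_heading(query):
    """Map what the user typed to a heading, or None if it is not a known name."""
    q = query.strip().lower()
    for heading, variants in AUTHORITY.items():
        if q == heading.lower() or q in variants:
            return heading
    return None


def expanded_full_text_search(texts, query):
    """Search the full text for every known name of the concept behind the query."""
    heading = resolve_heading(query)
    # Fall back to a plain text search when the query is not in the table.
    terms = AUTHORITY[heading] if heading else [query.strip().lower()]
    return [doc_id for doc_id, text in texts.items()
            if any(term in text.lower() for term in terms)]


print(expanded_full_text_search(TEXTS, "WWI"))
print(expanded_full_text_search(TEXTS, "civil war"))
```

Here a search for “WWI” reaches the 1916 letter even though that letter only ever says “Great War”, and a search for “civil war” reaches the 1870 memoir. The hard part, of course, is not the lookup but building and maintaining the table, which is exactly the work that standards and trained catalogers exist to do.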
Unfortunately, I fear that people would rather dump this kind of unique access found in library catalogs instead of fixing it. And of course, libraries are in a budget bind having to pay for RDA implementation, which will have no effect on any of this.