RDA-L Some discoveries of search engine results

Posting to RDA-L

On 8/14/2015 5:22 AM, Amanda Cossham wrote:
I agree that this would be important to consider, but there is a lot written on relevance already in literature on information retrieval. Two examples taken somewhat at random from Information Research: Analysing the effects of individual characteristics and self-efficacy on users’ preferences for system features in relevance judgment. Yin-Leng Theng and Sei-Ching Joanna Sin http://www.informationr.net/ir/17-4/paper536.html Seeking relevance in academic information use. Jela Steinerová http://www.informationr.net/ir/13-4/paper380.html
The principle of least effort (Zipf) is also highly relevant, as is that of satisficing – and both have as much impact on a searcher’s decision not to go past the first screen of results as the relevance of those results. Plus, there is a belief by many searchers that what is returned IS relevant, because that’s what they understand search engines to be giving them.

–Apologies in advance for the length of this reply, but the papers Amanda cited made me think and this has become some thoughts on what “relevance” means. This is not about RDA, but it does deal with the catalog–

Thanks for sharing those links. Not easy reading, but I managed to wade through them and a few others they cited. From my understanding, it seems they discovered that “relevance” means entirely different things to different people, and that “relevance” can mean different things to the same individual depending on that person’s information needs, how he or she is feeling, or any number of other criteria. So, a scholar looking for information for an article may consider relevance in one way, but just a few moments later that same scholar may be deciding which new app to download and may view relevance in an entirely different way.

This doesn’t seem to be anything new: it is something that reference librarians have known for a long time. In fact, in many ways that was their job. The answer to a reference question depends on who asked it. It isn’t that “relevance” is such an abstract term that it doesn’t mean anything at all; rather, it is that the meanings of “relevance” are highly subjective and continually in flux.

Therefore, viewed in the aggregate, “relevance” essentially means nothing, because it varies so much; but when considering it in a specific situation, relevance becomes much more tangible. As I mentioned, a reference librarian would understand immediately.

From the viewpoint of computer searching, the IT definition of relevance takes precedence since that is what people see. Because so much of IT relevance is secret, and the consequences of algorithmic ranking are so complex that they cannot readily be explained by a human, no expert could, apart from offering some general guidelines, ever predict or explain why specific items come up in the order they do. And I am sure that if we could examine those algorithms in detail, we would find many, many points where we would disagree with how they rank the results. Google (and probably the other search engines) also tweaks its algorithms about 500-600 times every year! (https://moz.com/google-algorithm-change)

I agree that the principle of least effort is fully in effect with searching, but satisficing is linked closely to what you know about: you may be satisfied with your horse and buggy until you discover cars and airplanes.

So, the term “relevance” varies far too much among individuals to be generally applicable to a “Sort by relevance” button, especially when we know it doesn’t work very well at all, even in relation to giving us advertising that is relevant to our needs and where the advertising system has access to massive amounts of our personal information, some of it very sensitive. I think relevant ads are something that everyone wants and would like to have, but it must be admitted that it clearly doesn’t work.

In fact, and strangely enough, for some people, watching online ads is becoming a moral/ethical issue because “somebody has to watch the ads and if you don’t the companies will stop paying for your free content”. See e.g. “Why Using an Ad Blocker Is Stealing” where the author writes: “Every time you block an ad, what you’re really blocking is food from entering a child’s mouth.”(!!!-my exclamation points-JW)

Of course, these same ethical considerations do not apply to print advertising at all. If I ignore an ad in a newspaper or magazine, I am not told that I must suffer pangs of guilt because I have made babies go hungry!

So, what underlies this concept of “relevance” that turns out to be increasingly strange the more we examine it? I would bet that even those who proclaim how great the Googles are probably wonder silently how much is useful in those hundreds or thousands of links on the pages they don’t look at.

I suspect what we are seeing with “relevance” is not so much a logical attitude or even a type of belief, but it is actually more of a hope among people who desperately want it to work–because they think there is no choice. Therefore, they hope that the IT definition of “relevance” matches their own personal definition of “relevance”–or at least it’s close enough. Why do people think that? (And I include myself.) Imagine for a moment that it was proclaimed publicly that relevance does not work. Where would that leave everyone?

It would leave them overwhelmed and helpless against the torrents of “information” that come at them constantly and relentlessly. Naturally, since people have no choice except relevance, saying that it doesn’t work is something that can never be done because “That way lies madness.”

It has to work because it has to work.

Relating this to the microbiologist’s talk I mentioned in the first post of this thread, his workflow would pretty much disintegrate if relevance didn’t work. From what he said, he uses relevance ranking (from tools such as Google Scholar) and recommendations from friends using tools such as LinkedIn. If relevance didn’t work, he would be reduced to getting recommendations and articles from his colleagues, which would effectively be a return to the earliest days of scholarly communication, when scholars wrote private letters to one another and sometimes a few would be published for others to share.

No, no. Much better to believe that relevance ranking works and not think about it too much.

To bring this back to the catalog and its relevance ranking: first, I am more confused than ever by what “relevance” is supposed to mean in a library catalog. I am not saying we should eliminate it from the catalog because the public wants it and expects it today, but if called on to explain it to someone, I don’t know how I could do it.

Second, I think a lot of the problem comes from the lack of choices other than relevance; therefore a solution would seem to lie in the direction of offering other choices. Are there any possibilities? Librarians would seem to be a great community for that–after all, until the Googles, we were supposed to be the experts in selecting reliable resources and making them available in predictable and reliable ways.

Of course, our catalogs were never based on anything resembling “relevance”. They have always worked by grouping similar things together, so that one record for a book by Dostoyevsky will be found next to another record for another book by Dostoyevsky, no matter how that name is spelled. The task for the user was to find this grouping, and then everything within that grouping was assumed to be relevant. People would determine their own relevance of the materials by examining the items. (If they could, they would browse the shelves; otherwise, they would have to request them.) Our methods and tools are pretty creaky but the final product can be unique and very useful for people.
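
To make the contrast with ranking concrete, here is a minimal sketch of that kind of grouping. It is only an illustration under stated assumptions: the tiny `AUTHORITY` table, the function names and the sample records are all hypothetical, and a real catalog relies on full authority records rather than a lookup table like this.

```python
from collections import defaultdict

# Hypothetical authority table: variant spellings of a name all point
# to one authorized heading. Real authority files are far richer.
AUTHORITY = {
    "dostoyevsky, fyodor": "Dostoyevsky, Fyodor, 1821-1881",
    "dostoevsky, fyodor": "Dostoyevsky, Fyodor, 1821-1881",
    "dostojewski, fjodor": "Dostoyevsky, Fyodor, 1821-1881",
}


def authorized_heading(name):
    """Return the authorized form of a name, or the name itself if unknown."""
    cleaned = name.strip()
    return AUTHORITY.get(cleaned.lower(), cleaned)


def collocate(records):
    """Group (author, title) pairs under their authorized heading."""
    groups = defaultdict(list)
    for author, title in records:
        groups[authorized_heading(author)].append(title)
    # Titles are sorted so the display is predictable; nothing is scored.
    return {heading: sorted(titles) for heading, titles in groups.items()}


# Illustrative records with three different spellings of the same author.
records = [
    ("Dostoevsky, Fyodor", "Crime and Punishment"),
    ("Dostojewski, Fjodor", "Der Idiot"),
    ("Dostoyevsky, Fyodor", "The Brothers Karamazov"),
]
catalog = collocate(records)
```

Note that nothing here computes a score: every record under the heading is treated as equally "relevant", and the judging is left to the person looking at the list.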

As one example, someone studying Dante and wanting some information about the people found in his works would probably like the listing in the catalog–that is, if they knew about the subject heading “Dante Alighieri, 1265-1321–Characters”. See the listing in the LC catalog (e.g. http://1.usa.gov/1LkMdLs). When I search for “dante characters” online, I get links to a computer game and something called the “Infernopedia”–no relation to the 13th century Dante.

The semi-traditional display of the results that we see in LC’s catalog could certainly change from the list we see into … well, there are innumerable options and lots could be tried.

Doing the same search in WorldCat (http://www.worldcat.org/search?qt=worldcat_org_all&q=su%3A%22Dante+Alighieri%22+su%3A%22Characters%22), we see the very useful facets, but we don’t get the listing of the characters that we see in LC’s catalog, which I think people would like. Merging the two displays does not seem impossible and could be a tiny first step.

But the big problem is: how does someone find that grouping found under “Dante Alighieri, 1265-1321–Characters” in the first place? The answer is not to teach everyone in the world how to search catalogs correctly or have everyone take an information literacy workshop. The answer is to have the catalog function for the world of the 21st century. Doing that would actually be re-creating the reference librarian’s job in many ways and I am sure they would have lots of suggestions, along with faculty, teachers, students and others.

I don’t know if people want to go down such a long road to create something new, or if they would just prefer to believe that relevance ranking works. For those who believe it works and only needs a few tweaks here and there, there is no real problem; for those who don’t believe it, there is a long way to go.



One Comment

  1. Ron Murray said:

    I think it important to maintain a distinction between what people do and what computers do.

    I’d say that humans form and make *relevance judgments* and base their selections upon that process. Computer programs compute *relevance values* and put those values to various uses, including organizing lists of items for human inspection or further computational action.

    Computer scientists are less inclined to say that computational relevance algorithms duplicate the process that humans use than to say that they produce results that humans can progressively learn to appreciate and use – e.g. “is this a useful result for my purposes?” Of course, when computer scientists and those who employ them switch from “scientific mode” to “marketing mode,” marketing claims can be constructed to perhaps serve business models a bit more than enhancing scientific – and personal – understanding.

    To see how not just scientists but also marketers and product designers *really* find out what people find relevant in scenarios where choices must be made between products or services, see:


    This gold-standard consumer research method enables investigators to find out not only what (product or service) features are important to subjects, but also which features they will trade off to get other, more desirable product features. Consumer researchers use this method to construct a model of the customer’s mind, against which they can test – without requiring additional human involvement – possible product and service configurations relevant enough to drive consumer choice (i.e., a purchase or a preferential service selection).
    Compare the above marketing, etc. technique with the (marketing, essentially) claims made for computed relevance ranking – and consider that our community has been wise enough so far to support the training of library and other professionals capable of tapping readers’ minds and determining their forms of relevance via “reference interviews” and other human-interactive means.

    September 19, 2015

Comments are closed.