RE: The next generation of discovery tools (new LJ article)

Posting to NGC4LIB

Jonathan Rochkind wrote:

I would be wary of assuming that this is reflected in the _math_ though. Jim, by “my own experience too is that this is correct”, do you mean you’ve actually looked at the distribution of calculated relevance scores in the result, or just that your own judgements of relevance of hits would distribute like that, trail off into non-relevance?

Google is a highly secretive organization, and I would be surprised if they released this sort of data, but maybe they do somewhere. Still, while the actual relevance numbers assigned by Google may be
100, 98, 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1
or
100, 70, 69, 68, 67, 66, 65, 64, 30, 39, 28, 27, 26, 10, 9, 8, 7
the fact is that very few people go past the first screen. This includes me. I don’t think it is so much a matter of laziness as that the results past the first page are just not useful. As a consequence, it seems to me that any score below a certain threshold (I’ll pull a number out of my hat, let’s say 20) may as well be 1 or 0.
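
To make the threshold idea concrete, here is a rough sketch in Python. The two lists are the hypothetical scores given above, not real Google data, and the names `collapse`, `count_useful`, and `THRESHOLD` are made up for this illustration. It only shows what "below 20 may as well be 0" would do to each distribution.

```python
# Hypothetical relevance scores copied from the two example lists in this post.
# They are illustrative only and do not come from Google.
steep = [100, 98, 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1]
gradual = [100, 70, 69, 68, 67, 66, 65, 64, 30, 39, 28, 27, 26, 10, 9, 8, 7]

THRESHOLD = 20  # the number "pulled out of a hat" in the post; an assumption


def collapse(scores, threshold=THRESHOLD):
    """Treat every score below the threshold as 0 and leave the rest alone."""
    return [s if s >= threshold else 0 for s in scores]


def count_useful(scores, threshold=THRESHOLD):
    """Count how many hits stay at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


if __name__ == "__main__":
    for name, scores in (("steep", steep), ("gradual", gradual)):
        print(name, count_useful(scores), collapse(scores))
```

With a cutoff of 20, the first list keeps 5 hits and the second keeps 13, so the same threshold leaves very different amounts of material depending on how the scores trail off. Either way, the reader who stops at the first screen sees only a handful.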

Of course, speaking as the “information specialist” I have no doubt that there is far more that is really relevant on the web than the handful of what I see in the first couple of pages of a Google search (here, I am using the term “relevance” in the normal sense, and not in the Google sense). Since people love Google so much, I always feel I have to add that this is not a criticism of Google. Google is nothing more than a tool, like a hammer or a power saw, and any tool has its strengths and weaknesses. This is simply an illustration of the importance of understanding those strengths and weaknesses.

But in any case, that is why I suggested using Google Scholar results instead, since its ranking seems to be based mainly on the number of citations, and that number is shown with each result.