Posting to NGC4LIB
Karen Coyle wrote:
I have always assumed (and I would love for someone to post some real data related to this) that after a very small number of high ranked results the remainder of the set is “flat” — that is, many items with the same value. What makes this flat section difficult is that there is no plausible order — if your set is ranked:
100, 98, 87, 54, 35, 12, 4, 1, 1, 1, 1, 1, 1, 1, 1, ….
and you go to pick up results for page 2, they will all have the same rank and they will be in no useful order. (probably FIFO).
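To make the flat tail concrete, here is a minimal sketch in Python. Only the top seven scores and the tail value of 1 come from the example above; the document ids, the number of tied items (13), the arrival order, and the page size of 10 are assumptions for illustration. It relies on Python's sort being stable, so tied items stay in the order they arrived.

```python
PAGE_SIZE = 10  # assumed page size, not taken from any real system

top = [100, 98, 87, 54, 35, 12, 4]   # scores from the example above
tail = [1] * 13                      # assumed number of tied low-ranked items

# Hypothetical arrival order: ids are assigned as items come in,
# and the high-scoring items are not necessarily first.
arrival = tail[:5] + top + tail[5:]
results = [(f"doc{i:02d}", score) for i, score in enumerate(arrival)]

# sorted() is stable, so items with equal scores keep their arrival order.
ranked = sorted(results, key=lambda r: r[1], reverse=True)


def page(ranked_results, number):
    """Return the given 1-based page of an already ranked result list."""
    start = (number - 1) * PAGE_SIZE
    return ranked_results[start:start + PAGE_SIZE]


page2 = page(ranked, 2)
page2_scores = {score for _, score in page2}
page2_ids = [doc_id for doc_id, _ in page2]

print(page2_scores)  # a single distinct score means the page has no real ranking
print(page2_ids)     # ties appear in arrival order
```

Under these assumptions every item on page 2 has the same score, so the order the user sees there reflects only when the items arrived, not how relevant they are.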
My own experience is that this is correct. Something that may or may not be relevant to this discussion: I have worked a bit with a Firefox plugin called Cloudlet (http://www.getcloudlet.com/), which takes a search in Google, Yahoo, and some other databases and returns a word cloud. The Wired article at http://www.wired.com/epicenter/2008/12/firefox-add-ons/ mentions that, to get better results, you should set your account to show 100 results per page, but otherwise I haven’t discovered any more details about how it works. I’ve concentrated on trying to find out whether it is genuinely useful.
I still haven’t decided if it is or is not, but something within me says that it *has* to be useful. My concern is: when I click on a word in the cloud, I don’t really know what I’m looking at.
In any case, this is a different take on the same idea as what we are discussing here.
Anyway, a suggestion for Karen is to compare this with Google Scholar, which arranges results (mostly) by number of citations. For more specific searches, i.e. multiple terms rather than single words, the citations die out after a couple of pages or so.