Friday, November 19, 2010

Keyword vs. Controlled Vocabulary Studies: A Case Study

Posting to Autocat

Brian Briscoe wrote:
<snip>
Do any of you have a citation or link to studies that did a comparison between Keyword and Controlled vocabulary searching and possibly included a determination that one searching method was better than the other?
I am already aware of Sevim McCutcheon's study that came out in The Indexer, June 2009. I would appreciate any pointers you can give me to further studies or research on the subject.
</snip>
This is a topic that interests me as well, so I decided to do what I tell my students to do in my information literacy workshops. I think my results may prove interesting to others on this list.

I found the original article, "McCutcheon, S. Keyword vs controlled vocabulary searching: the one with the most tools wins," and looked through the bibliography for an interesting citation. I chose "Gross, T. and Taylor, A. G. (2005) What have we got to lose? The effect of controlled vocabulary on keyword searching results. College and Research Libraries 66(3), 212-30." and then copied and pasted just the title into Google Scholar.

The search on its own turns up some articles that are quite interesting: http://tinyurl.com/342o9d6. More important, however, is that under the metadata record for Gross and Taylor's article is the link "Cited by 28". Again, not all of these are completely relevant, but some are, and they in turn are cited in still later articles.

What I want to point out, however, are the articles in the right-hand column, which are supposed to be free versions of the articles available in an open archive. Sometimes these are the same as in the left-hand column, e.g. Zavalina's article from Ideals at Illinois (Ideals is a fabulous resource, by the way!), and both are freely available. But, as I tell my students, you will also see spam: JSTOR results, for example, show up especially often in the right-hand column, and JSTOR, of course, is not an open archive. My students seem to accept this proviso with no problem. Still, look at the number of articles that are available in open archives. I have watched these numbers mushroom in the last three years, I would venture to say.

There is also now a click box at the top that lets you run a keyword search only within the articles that cite your article: a huge advance. All of this is obviously very powerful, but even more important, it is free today to anybody who has access to the web, a fact I still find astonishing! Of course, Google will continue to advance these tools. As one example I can imagine: if they could extend the click box so that it limits the search not only to the citing articles but to the entire thread of citations from beginning to end, under the user's control, this could prove useful (or perhaps not). Finally, keep in mind that this is only one citation from one article, when there are several other citations available, each leading to articles with still other citations. Trying to visualize the amount of relevant material can be staggering.

Based on these considerations, this is the way I see it:
These results are definitely useful for anyone, and the method is quite easy to follow. It also mixes a fairly traditional tool (citation indexing) with keyword possibilities. I also suspect that Google will not change its tools for us, at least not very much. By this I mean that Google will not do a lot of work to fit into our systems; we must design things to fit into theirs (that is, if we want to cooperate at all). It is inevitable that newer tools such as Mendeley will eventually be incorporated into this (if that has not happened already).

So, for me the question becomes: how do we build tools that fit into this situation usefully, simply, and coherently? How can controlled vocabulary enter into this entire equation when citation analysis (plus other methods as yet unforeseen) is so easy to use? I think it can and must be used, but it is quite a complex task.
