Friday, September 2, 2011

Re: How Google makes improvements to its search algorithm

Posting to NGC4LIB

On 31/08/2011 15:17, Jimmy Ghaphery wrote:
<snip>
I am fascinated by the notion of imprecise or custom search results and the way in which it challenges our expectations in the libraries.

An important aspect of the appropriateness of fuzzy results is the characteristics of the underlying data. In the case of Google we are talking about a huge data set that can at best be loosely corralled. In this context, using additional data such as usage patterns and geographic location of the searcher makes perfect sense to me. For a scientist searching a genomic database, it makes sense that results need to be predictable and repeatable.

It is not crystal clear to me where library data might fit along this continuum. Considering the potential scope of the next generation catalog I do think we need to embrace notions of rich algorithms and rapid iteration to tease out relevant results. In reality our results change every day that we add records (sometimes radically if we are bulk loading). How scientific do we need to be here? Do we entertain requests for a researcher who wants to see results from our previous system or the results we presented from a search even a year ago?
</snip>
I remember a BBC news report about how, because of its various tweaks, Google keeps losing a city in Florida, and about the consequences for the people living in that town! http://news.bbc.co.uk/2/hi/programmes/world_news_america/9038870.stm (When I read a story like this, I often mentally "teleport" back in time 25 years and try to imagine what I would have thought. I would have found this one completely incomprehensible!) I sent a post to Autocat http://catalogingmatters.blogspot.com/2010/09/disappearing-cities.html where I discussed my own views, and there was a short dialog.

One suggestion for fitting in library data was made by Eric Hellman in a talk at ALA, which I mentioned in another post to Autocat that provoked more dialog (http://comments.gmane.org/gmane.education.libraries.autocat/40227). To make sure that I was not misinterpreting him, I wrote to him, and he got involved too, in another thread (http://comments.gmane.org/gmane.education.libraries.autocat/40267). Essentially, he was saying that in the future, people would very rarely interact with library metadata as they do now (i.e. by looking at catalog records), and that it would be used more as "microdata" (http://en.wikipedia.org/wiki/Microdata_%28HTML5%29) behind the scenes, for re-sorting and reworking search results, or for Search Engine Optimization. I mentioned the Google Books project with all of its metadata, which most people probably don't even know about, but there has to be a lot going on behind the scenes there.

There is a very definite role for library metadata in the future. I personally think it has to do with ensuring a level of standardization, so that something like Google's misplacing of towns doesn't happen because of the inevitable tweaks. Also, it becomes clearer and clearer to me that people really don't like to interact with the library's catalog--how it works, how it looks, even what it is. The catalog is becoming a strange thing for the average person. I think Hellman is onto something and may be on the right track toward a solution. Seen in this sense, the Google raters example may prove invaluable.
