Sunday, December 11, 2011

Old School Search Engines: Where Are They Now? : Some thoughts

Posting to Autocat

I read an article in WebProNews, “Old School Search Engines: Where Are They Now?” by Chris Crum (November 16, 2011) http://www.webpronews.com/old-school-search-engines-2011-11, where the author discusses the “old” search engines (all launched in 1995-1996, not that long ago!), and it gave me a chance to reflect on some of the historical developments I have seen and studied. Among other "old" search engines, the author talks about WebCrawler, which was my own favorite. It still exists(!) and has now become an aggregator of results from other web search engines. He mentions AltaVista (acquired by Yahoo and now scheduled for elimination), HotBot and others. Seeing Dogpile brings to mind some forgotten feelings about it: I refused to use Dogpile simply because of its name, which didn't inspire much confidence, to say the least!

The article doesn't discuss pre-web search engines that were at least semi-popular. I am thinking of the old gopher networks that were built before the existence of the World Wide Web. For some reason, the search tools of that era were named after characters in the “Archie” comic book series: Archie (which, strictly speaking, searched FTP archives rather than gopher sites), Veronica, Jughead, and perhaps some others. Gopher was all text-based, and many people reached it through telnet. For those who are interested, here is what it looked like:

I found it interesting that parts of Gopher still function and that there seems to be a small community trying to keep it going: http://gopher.floodgap.com/gopher/gw?a=gopher%3A%2F%2Fgopher.floodgap.com%2F1%2Fworld. This seems to me to be a kind of cultural preservation project. Anyway, as I remember, libraries were just beginning to build major gopher sites when HTML and the World Wide Web came out and suddenly, everything changed to http accessed through new programs called "browsers".

Reflecting back on how all of these systems popped up, only to become essentially forgotten after just a few years, has made me think about library catalogs and how they too, have died, or changed into something quite different. Before the card catalog, there were different types of book catalogs, either printed or manuscript. The later development of card catalogs attempted to include various new technologies and some of these were abandoned: microforms, photography and other attempts. When OPACs started to appear, everything at first was text-based and looked similar to the gopher sites. We can see how this worked by reading documentation produced by the library at Indiana University for NOTIS http://library.music.indiana.edu/tech_s/manuals/training/notisearching.html. From that documentation, we can see that the functionalities and displays of that catalog replicated as closely as possible card catalog browsing, e.g. the guidelines for subjects http://library.music.indiana.edu/tech_s/manuals/training/notisearching.html#SUB. Displays of the records themselves began to vary from traditional card displays, e.g. see the "The old man and the sea" example at Virginia Community Colleges: http://helpnet.vccs.edu/EServices/NOTIS/sampleSearch.htm, but the functionality was text-based, where you had to type in "a" for author, "t" for title, and then type the number of the record, or e.g. LON for the long display.

I remember how I saw that hypertext allowed people to just click on links instead of typing in all of those cryptic codes and how popular that was (including with me), but more important, the moment keyword searching was introduced, the public just loved it and fewer and fewer people browsed the headings in the old ways. To be fair to the public, browsing headings requires cross-references to be really useful and the first attempts for computerized heading browses omitted the cross-references. It took a long time to incorporate the authority files and even today, it is not done very well. In any case, as time went by and people used keyword searching almost exclusively, many of the older people forgot about browsing the headings, while the younger people never learned much about it in the first place.
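To make the point about cross-references concrete, here is a minimal sketch in Python of a heading browse with and without "see" references. Everything in it is an assumption for illustration: the headings, the variant forms, and the function name `browse` are made up, and no real catalog system or authority file format is being reproduced.

```python
from bisect import bisect_left

# Illustrative data only: a tiny browse index of authorized headings.
AUTHORIZED = [
    "Tolstoy, Leo",
    "Trollope, Anthony",
    "Twain, Mark",
    "Tyler, Anne",
]

# Illustrative "see" references: variant form -> authorized heading.
SEE_REFERENCES = {
    "Clemens, Samuel Langhorne": "Twain, Mark",
    "Tolstoi, Lev": "Tolstoy, Leo",
}


def browse(term, with_references=True, window=2):
    """Return the `window` index entries at or after `term`, alphabetically.

    Each entry is a pair (heading, see_target); see_target is None for an
    authorized heading and names the authorized form for a cross-reference.
    """
    entries = [(heading, None) for heading in AUTHORIZED]
    if with_references:
        entries += list(SEE_REFERENCES.items())
    entries.sort(key=lambda entry: entry[0].casefold())

    keys = [entry[0].casefold() for entry in entries]
    start = bisect_left(keys, term.casefold())
    return entries[start:start + window]


if __name__ == "__main__":
    for flag in (False, True):
        print("with references:" if flag else "without references:")
        for heading, target in browse("Clemens", with_references=flag):
            print("  ", heading, f"(see: {target})" if target else "")
```

In this sketch, a searcher who browses for "Clemens" without the references simply lands among unrelated headings, while the version with references lands on an entry that points to the form actually used in the catalog. That is roughly the difference the early computerized browses lost.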

Ultimately, as entire texts became available for searching, the powers of searching full-text became clear to everyone, plus searchers could access the information they wanted immediately without going through anything that resembled a "catalog" or "catalog records". At the same time only a few experts could see or understand the problems in full-text searching. Of course, experts realize there are massive amounts of metadata utilized in the background when you use a search engine such as Google or Yahoo, but the searcher is pretty much not aware of it.

Nevertheless, the larger story from all of this is that the two systems (search engines and library catalogs) are merging, and doing so of necessity. It is obvious that it is far more pressing for library catalogs to merge with search engines than vice versa since the public has made its preferences very obvious. I see few pressures on search engines to become more like library catalogs, while there are many calls for library catalogs to become more like search engines, or at the very least, more like Amazon.

From this viewpoint, compared to 15 years ago, people behave quite differently when searching for information and have different expectations of what they should be able to do with that information once they have found it. The problems the public experiences are not based on cataloging rules, or if there are problems in that respect, they relate much more to how the catalog *functions* than to the rules themselves. That is, people very rarely browse headings any longer, as they were forced to do back in the days of the card catalog and of text-based searching, such as the old gopher searches and the NOTIS-type OPACs. An even more basic challenge is that people today tend to search library tools only *after* searching their favorite search engine or other database.

In spite of these considerations, the traditional heading browses nevertheless provide a power that currently cannot be found anywhere else; but the heading browses have become dysfunctional in today's computer systems. So, the primary question should be: How can the power of those browses be incorporated correctly into the information tools that our searchers rely upon, whether or not those tools will be located in a local library catalog? Working on this issue is one area that could have much more positive results for libraries than an overhaul of our cataloging rules, which will result only in a few insignificant changes in the displays of our cataloging records.

One additional point: these search engines are labelled as "old school" even though they came out only around 15 years ago. That is about the same time as when FRBR came out, but in the library world, FRBR is considered to be the most modern statement we have! This illustrates yet another basic difference in the library information world vs. the "information world at large": radically different perspectives of time. For the "information world at large", 15 years ago represents a fundamentally different information world from what we are facing today.

Maybe they are right.

1 comment:

  1. Love this post. I spend a lot of time on subject authorities ... and I'm involved in a very small system (four high school libraries) but I still think that the catalog is the place for our kids to begin searching (a somewhat futile hope, I realize) and that our catalogs will get better with time. Like the writer I wonder about new cataloging rules (e.g. RDA) as opposed to finding ways to make the information more accessible by whatever means. Anyway, thanks for the post.
