Wednesday, June 27, 2012

Reality Check: What is it that the Public Wants today?

[Links to everything are at the end in the References Section]

Hello everyone.

I want to thank the Committee for allowing me the honor to speak at such an important event and to include me with such a prominent group. Also, I want to thank everyone who is attending. I just wish I could be with you in person, but this is almost a miracle!

A major part of my talk will summarize four presentations--all available online--that I suggest everyone should watch and that I have found to be very important because they can help show the way forward for libraries.

First, however, a few numbers.
Of course, most people want specific webpages rather than entire sites, just as most want journal articles and not entire journals, so to make those 555 million sites useful, we must increase this number by a large factor to turn it into webPAGES. Other formats are incredible too, such as 72 hours of video uploaded to YouTube every minute! These numbers reveal a very definite trend: that non-traditional resources are going up at a phenomenal rate when compared to traditional resources.

We can expect the same phenomenal increase in the creation of metadata, which, after all, is supposed to mirror more or less the resources themselves. Much of this metadata will be made automatically or by people who haven't got the slightest idea what they are doing. If all of this metadata comes together into the same pot, which it probably will eventually (I call this the Metadata Macrocosm), these trends suggest that the records librarians create will disappear into that huge morass like a drop of water into the ocean.
I think this is reality based on looking honestly at the raw numbers.

With that cheerful note aside, the first talk I would like to discuss is David Weinberger's “Too Big to Know,” where he mentions some of this.

In one part of his talk, he says: “metadata is not what it used to be” and when he mentions “what used to be metadata,” he is talking about library cataloging! In his opinion, today EVERYTHING has become metadata because with full text, you can search, e.g. “Call me Ishmael” or any sentence out of the book, and retrieve not only Melville's Moby Dick but also everything by Melville, about Melville, his friends, whales and so on. As a result, for Weinberger, metadata has become something quite different. For him, metadata is what you know and data is what you are looking for. Therefore, everything can function as metadata. He concludes:

I think he is pretty much correct and he makes a very convincing case. This is the world we are entering, if we are not in it already.

Weinberger is describing a type of “information overload”. How can we control it?

To continue with another talk, web guru Clay Shirky said:
“The problem is not information overload. The problem is filter failure.”

What he means is that people have always complained about having too much information. Already in the year 1500, the average literate person had access to more books than he could read in a lifetime. That happened very quickly since printing didn't really get going until around 1470, so we are talking about only 30 years! “Information overload” is nothing new.

As a consequence, in Shirky's opinion, too much information is not the problem. The problem is that people have always relied on methods to eliminate the information they didn't want. This has been the job of publishing houses and editors. The sheer cost of creating and distributing information resources, not to mention buying and storing them, has kept the amount of information down and thus has served as a type of what he calls a filter. In Shirky's opinion, when people complain of information overload, they are actually saying that the filters they have always used have broken down.

Some have concluded that the way to control information is with Web 2.0 tools such as Facebook, where you “befriend” someone and this person, or one of his friends, or one of their friends, or one of their friends, may mention a good resource that may be of interest to you.

One of the many potential problems with such methods is that you may find yourself trapped in what is now known as a “filter bubble,” that is, a situation where all the information you see comes from sources that pretty much agree with you, your friends and your friends' friends, so you slowly become unaware of other sides of arguments. Remember, the dream of Tim Berners-Lee and his Semantic Web is to have mechanical “intelligent agents” gather our information for us automatically and present us with their results, working much like a thermostat. Each person would have his or her own little personalized “research team” working constantly and diligently to bring us exactly what we want while we go about living our lives.

I have serious problems with this dream of intelligent agents for information, but I will not go into them here because I have discussed this issue in podcasts and postings that can be found on my blog. To sum up my own opinion: his dream is my nightmare.

In any case, the filter bubble makes perfect sense to me. And when added to Weinberger's comments on metadata, plus Shirky's filter failure, librarians may just want to throw up their hands and run away screaming, OR, they could see a perfect opportunity for an information field based on ethics, such as librarianship, to provide help. How could this work?

As a first step on the way toward a solution, I would like to mention another talk: The Paradox of Choice. This theory posits that we are all suffering from an intellectual paradox and this paradox can be boiled down to the following inferential statement:
a) Freedom is Good.
b) Freedom means I have more Choices.
c) Therefore, the more Choices I have, the more Freedom I have.

While this may appear logical, it turns out that in reality, when confronted with too many choices, people feel exactly the opposite and in fact, they just shut down. You see all this stuff and you are overwhelmed. When there are too many choices, people worry about making the wrong choice, a stupid choice, falling for a con job, or something else.

From all of these observations, being made in public forums and discussed in the general media, it is clear to me that the public wants help. There is a need for something.

I think that Noam Chomsky, who is a highly controversial figure, but also an accomplished scholar, summed up the public's situation quite well when he said:
I think what Chomsky is discussing is a variation of the same topic as everyone else: the filters we have always had are broken, but he points out that even when they worked better, it was still never enough just to walk into a library. You also needed some kind of a guide, such as a librarian, to help you find materials that are meaningful to you.

To summarize all of this:
The most hopeful part is that none of these people are librarians. These voices come from the public who consider these issues genuinely important. I also think that the values and experience of librarianship address all of these concerns. How can libraries respond?

I suggest that individual libraries, and the entire library field, begin to consider themselves primarily in terms of filters, that is, instead of including, tending toward excluding. Google includes; libraries should be doing something different. It's rather strange that current technology makes it easier to include than to exclude, but that's the way it is. Also, in a sense librarians actually have been “filters of filters” since the beginning, through selection, reference, cataloging, which all serve as filters in various ways.

But what kind of filters can libraries provide? I will leave the important, and tremendous issue of selection aside here. What is it that is unique that catalogs do? I do not think we can honestly say that they give better access, since that is based on judgment and is impossible to prove. But catalogs can provide standardized methods of access that are reliable—and reliable in all sorts of meanings of the word. Reliable selection that guarantees you will see all kinds of opinions; reliable cataloging so that you can find something the same way you found it yesterday; reliable access so that if a site you found disappears or changes, you can still access the information. Of course, experts will be best at using these tools, but that goes for any information tool including Google, which is a lot harder to use than many think.

Above all else, we must acknowledge that the traditional library catalog serves the needs of the library managers: selectors, acquisitions and reference staff, and it allows reliable search results for experts. That has always been the library catalog's main purpose and it does a pretty good job.

There is nothing wrong with this. Such a task is absolutely critical because if librarians cannot do their jobs, nobody can use their libraries, but it is wrong then to conclude that this tool, so necessary for librarians, is also the tool that the public needs.

So what is needed?

It has been my experience that catalogers have a tendency to concentrate on individual records, individual fields and subfields, and often lose sight of the entirety of the catalog. The public looks at the catalog completely differently: they spend little time on an individual record because once they find something of interest, they stop looking at the record and off they go to the resource itself, but the public does spend much more time on the catalog as a whole, that is, looking at the result sets. Therefore, I think we should attempt to reimagine how the public could perceive the result of a search.

In a paper I gave recently in Oslo, I tried to reimagine the individual catalog record and how it could possibly look and work in the future. This time, based on what I have mentioned in this talk, I would like to reimagine how a search result could be made more meaningful in some way. How could this be done?

I think that the field of statistics may offer some valuable insights. How has statistical information been portrayed over the years? A lot has been happening.

At first, it was all tabular, and then statisticians would select some information they thought was interesting to make a few graphs or charts. That's how it's been for a long time: new types of graphs have been introduced along with color, but it was all essentially the same. Today, however, with online databases, some brand new methods can be applied, such as those found in Google Public Data Explorer. Let's take just a moment to see this tool.

[Short discussion of how it works]

Google Public Data Explorer allows displays that could never exist before. They are animated, and far more readily understandable and compelling than the older displays. Suddenly, it is easy for anyone to understand what a time series is. Also, individuals can select the information they find interesting to make their own graphs.

What if we compare this experience to library search result displays? What have patrons seen over the years? The result set has always been a listing of individual records, displayed in various ways. Here are some examples of a search for “Stonehenge”.

[added the headings just for demonstration purposes]

In all the library catalogs, although there may be complete or brief displays, different ways to sort them, and now facets, we see that people still wind up looking at a listing of individual records, probably not that much different from what people saw even when they were in the Library of Alexandria.

In the Worldcat display, people get 5,691 records. That is quite a “paradox of choice”! Which one does someone choose?

With faceted catalogs however, there really is something different: suddenly, the library catalog provides statistical information! This is where the experience of statistical displays can play a part. While the facets in the library catalog are wonderful, my experience shows that people still have problems understanding them. People may click here and there, but with little understanding. This leads me to suspect that people relate to the facets as they would to any tabular display of statistics.

I can imagine someone looking at this Worldcat result and thinking: "There are 434 ebooks and 76 microforms," and relating to it the same as looking at the statistical table and thinking, "In Barrington, there are no deaf and dumb, 1 blind, 2 insane and 2 idiots, while in Bristol there are 5 deaf and dumb, 8 blind, 2 insane and 1 idiot."

It means little to them and something more is needed. What more can be done?

Since now we are dealing with statistical information, one method would be to try to display the tabular information graphically, such as with graphs or perhaps even with displays similar to Google Public Data Explorer, and that certainly should be experimented with. I have no idea how that would turn out, but are there other options?

I believe that there are and I confess I have held something back. There is another method to display statistical data that comes from Narrative Science. This tool takes statistical data and generates a textual interface that is not all that bad. It is used now for box scores for Little League Baseball games, and Forbes Magazine uses it for many business reports. Let's take a moment to examine this generated text.

The first is for a Little League baseball game and the second is from Forbes Magazine.

So, how does Narrative Science work? All it does (ha!) is provide an alternative interface that displays the results not in table or graphic form but in words. Google Public Data Explorer could display the same information but it would be graphical and animated.

Could a textual interface such as what we see at Narrative Science provide a different understanding of a search result in a library catalog? Here is how I think something could work in the faceted search result for Stonehenge.

Catalogers realize that the library catalog and related library files furnish almost all this information right now, but getting at it is not easy: first, you have to know it exists; second, you have to know how to access it; third, and most important, you have to know how to read it. It wouldn't surprise me if, by just perusing the tabular data, expert statisticians could mentally visualize something similar to what we can all see today in Google Public Data Explorer. Why shouldn't there be something similar for library catalogs?

Without any doubt, these summaries would be much better if they were written by experts in the field, but it is clear that can't be done, just as a professional reporter will never write up the results of a Little League baseball game. Technology can provide a practical answer.

I believe that such developments could help many patrons by providing them with a level of context they have never had, except for the rare few who have had an experienced librarian sitting next to them explaining what they are seeing. By turning a complex result that untrained people find only semi-comprehensible into something much less threatening, it lessens the paradox of choice, provides the beginning of an intellectual framework, and gives people some new kinds of filters that may actually help them.

If something like this did prove to be popular with the public, there may be much more demand for other tasks such as selection and reference. I could see reference staff playing a very important role in such a system.

This is only one suggestion, but I believe that these are the sorts of efforts that would, even if only partially successful, make much greater differences in the lives of the patrons than RDA and FRBR could ever hope to do. Such a project would take advantage of the powers of modern systems, and would cause little disruption to the library's everyday work. And I think it would actually be a lot easier to do something like this with the catalog than what we saw with the Little League box scores. That was incredible. Give me access to the XSL sheet and I could probably do the first sentence right now. Maybe more.
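
A minimal sketch of what such a first sentence might look like, written in Python rather than XSL purely for illustration. The function name `summarize_facets` and the wording of the sentence are assumptions, not part of any existing catalog system; the counts are the Worldcat figures for “Stonehenge” quoted earlier in this talk.

```python
def summarize_facets(query, total, facets):
    """Build one plain-English sentence from facet counts (hypothetical helper)."""
    # Largest facets first, so the sentence leads with the most common formats.
    ranked = sorted(facets.items(), key=lambda item: item[1], reverse=True)
    parts = [f"{count:,} {label}" for label, count in ranked]

    sentence = f'Your search for "{query}" found {total:,} records'
    if not parts:
        # No facet information available: report only the total.
        return sentence + "."
    if len(parts) == 1:
        listing = parts[0]
    else:
        # Join all but the last item with commas, then add "and" before the last.
        listing = ", ".join(parts[:-1]) + " and " + parts[-1]
    return f"{sentence}, including {listing}."


# Counts taken from the Worldcat "Stonehenge" result discussed in this talk.
print(summarize_facets("Stonehenge", 5691, {"ebooks": 434, "microforms": 76}))
```

A real system would draw on far more than format counts (dates, subjects, languages), but even this much turns a table of numbers into a sentence a patron can read.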

I shall end with this:

