Hello everyone.
I want to thank the
Committee for allowing me the honor to speak at such an important
event and to include me with such a prominent group. Also, I want to
thank everyone who is attending. I just wish I could be with you in
person, but this is almost a miracle!
A
major part of my talk will summarize four presentations--all
available online--that I suggest everyone should watch and that I
have found to be very important because they can help show the way
forward for libraries.
First, however, a few
numbers.
Of course, most people
want specific webpages rather than entire sites, just
as most want journal articles and not entire journals,
so to make those 555 million sites useful, we must increase this
number by a large factor to turn it into webPAGES. Other formats are
incredible too, such as 72 hours of video uploaded to YouTube every
minute! These numbers reveal a very definite trend: that
non-traditional resources are going up at a phenomenal rate when
compared to traditional resources.
We can expect the same
phenomenal increase in the creation of metadata, which,
after all, is supposed to mirror more-or-less the resources
themselves. Much of this metadata will be made automatically or by
people who haven't got the slightest idea what they are doing. If all
of this metadata comes together into the same pot (I call this the
Metadata Macrocosm), which it probably will eventually, these trends
suggest that the records librarians create will disappear into that
huge morass like a drop of water into the ocean.
I think this is the reality,
based on an honest look at
the raw numbers.
With that cheerful note
aside, the first talk I would like to discuss is David Weinberger's
“Too Big to Know,” where
he mentions some of this.
In
one part of his talk, he says: “metadata is not what it
used to be” and when he mentions “what used to be metadata,”
he is talking about library cataloging! In his opinion, today
EVERYTHING has become metadata because with full text, you can
search for, e.g., “Call me Ishmael” or any sentence out of the book,
and retrieve not only Melville's Moby Dick
but also everything by Melville, about Melville, his friends,
whales and so on. As a result, for Weinberger, metadata
has become something quite different. For him, metadata is
what you know and data is what you are looking for. Therefore,
everything can function as metadata. He concludes:
I think he is pretty much
correct and he makes a very convincing case. This is the world we are
entering, if we are not in it already.
Weinberger is describing a
type of “information overload”. How can we control it?
To continue with another
talk, web guru Clay Shirky said:
The problem is not
information overload. The problem is filter failure.
What
he means is that people have always complained about having too much
information. Already in the year 1500, the average literate person
had access to more books than he could read in a lifetime. That
happened very quickly since printing didn't really get going until
around 1470, so we are talking about only 30 years! “Information
overload” is nothing new.
As a
consequence, in Shirky's opinion, too much information is not the
problem. The problem is that people have always relied on methods to
eliminate the information they didn't want. This has been the job of
publishing houses and editors. The sheer costs of creating and
distributing information resources, not to mention buying and storing
them, have all kept the amount of information down and thus have
served as types of what he calls filters. In Shirky's opinion,
when people complain of information overload, they are actually
saying that the filters they have always used have broken down.
Some have concluded that
the way to control information is with Web 2.0 tools such as
Facebook, where you “befriend” someone and this person, or one of
his friends, or one of their friends, or one of their friends, may
mention a good resource that may be of interest to you.
One of the many potential
problems with such methods is that you may find yourself trapped in
what is now known as a “filter bubble,” that is, where all the
information you see comes from sources that pretty much agree with
you, your friends and your friends' friends, so you slowly become
unaware of other sides of arguments. Remember, the dream of Tim
Berners-Lee and his Semantic Web is to have mechanical “intelligent
agents” gather our information for us automatically and present us
with their results, working much like a thermostat. Each person would
have his or her own little personalized “research team” working
constantly and diligently to bring us exactly what we want while we
go about living our lives.
I have serious problems
with this dream of intelligent agents for information, but I will not
go into them here because I have discussed this issue in podcasts and
postings that can be found on my blog. To sum up my own opinion: his
dream is my nightmare.
In any case, the filter
bubble makes perfect sense to me. And when added to Weinberger's
comments on metadata, plus Shirky's filter failure, librarians may
just want to throw up their hands and run away screaming, OR, they
could see a perfect opportunity for an information field based on
ethics, such as librarianship, to provide help. How could this work?
As a first step on the way
toward a solution, I would like to mention another talk: The
Paradox of Choice. This theory posits that we are all suffering
from an intellectual paradox and this paradox can be boiled down to
the following inferential statement:
a) Freedom is Good.
b) Freedom means I have more Choices.
c) Therefore, the more Choices I have, the more Freedom I have.
While this may appear
logical, it turns out that in reality, when confronted with too many
choices, people feel exactly the opposite and in fact, they just shut
down. You see all this stuff and you are overwhelmed. When there are
too many choices, people worry about making the wrong choice, a
stupid choice, falling for a con job, or something else.
From all of these
observations, being made in public forums and discussed in the
general media, it is clear to me that the public wants help. There is
a need for something.
I think that Noam Chomsky,
who is a highly controversial figure, but also an accomplished
scholar, summed up the public's situation quite well when he said:
I think what Chomsky is
discussing is a variation of the same topic as everyone else: the
filters we have always had are broken, but he points out that even
when they worked better, it was still never enough just to walk into
a library. You also needed some kind of a guide, such as a librarian,
to help you find materials that are meaningful to you.
To summarize all of this:
The most hopeful part is
that none of these people are librarians. These voices come from the
public who consider these issues genuinely important. I also think
that the values and experience of librarianship address all of these
concerns. How can libraries respond?
I suggest that individual
libraries, and the entire library field, begin to consider themselves
primarily in terms of filters, that
is, as tending toward excluding
rather than including.
Google includes; libraries should be doing something different. It's
rather strange that current technology makes it easier to include
than to exclude,
but that's the way it is. Also, in a sense librarians actually
have been “filters of filters” since the beginning, through
selection, reference, cataloging, which all serve as filters in
various ways.
But what kind of filters
can libraries provide? I will leave the important and tremendous
issue of selection aside here. What do catalogs do that is
unique? I do not think we can honestly say that they give better
access, since that is based on judgment and is impossible to prove.
But catalogs can provide standardized methods of access that are
reliable—and reliable in all sorts of meanings of the word.
Reliable selection that guarantees you will see all kinds of
opinions; reliable cataloging so that you can find something the same
way you found it yesterday; reliable access so that if a site you
found disappears or changes, you can still access the information. Of
course, experts will be best at using these tools, but that goes for
any information tool including Google, which is a lot harder to use
than many think.
Above all else, we must
acknowledge that the traditional library catalog serves the needs of
the library managers: selectors, acquisitions and
reference staff, and it allows reliable search results for experts.
That has always been the library catalog's main purpose and it does a
pretty good job.
There is nothing wrong
with this. Such a task is absolutely critical because if librarians
cannot do their jobs, nobody can use their libraries, but it is wrong
then to conclude that this tool, so necessary for librarians, is also
the tool that the public needs.
So what is needed?
It has been my experience
that catalogers have a tendency to concentrate on individual records,
individual fields and subfields, and often lose sight of the entirety
of the catalog. The public looks at the catalog completely
differently: they spend little time on an individual record because
once they find something of interest, they stop looking at the record
and off they go to the resource itself, but the public does spend
much more time on the catalog as a whole, that is,
looking at the result sets. Therefore, I think we should attempt to
reimagine how the public could perceive the result of a search.
In a paper I gave recently
in Oslo, I tried to reimagine the individual catalog record and how
it could possibly look and work in the future. This time, based on
what I have mentioned in this talk, I would like to reimagine how a
search result could be made more meaningful in some
way. How could this be done?
I think that the field of
statistics may offer some valuable insights. How has statistical
information been portrayed over the years? A lot has been happening.
At first, it was all
tabular, and then statisticians would select some information they
thought was interesting to make a few graphs or charts. That's how
it's been for a long time: new types of graphs have been introduced
along with color, but it was all essentially the same. Today, however,
with online databases, some brand new methods can be applied, such
as those found in
Google Public Data Explorer. Let's take
just a moment to see this tool.
[Short discussion of how it works]
Google Public Data
Explorer allows displays that could never exist before. They
are animated, and far more readily understandable and compelling than
the older displays. Suddenly, it is easy for anyone to understand
what a time series is. Also, individuals can select the information
they find interesting to make their own graphs.
What if we compare this
experience to library search result displays? What have patrons seen
over the years? The result set has always been a listing of
individual records, displayed in various ways. Here are some examples
of a search for “Stonehenge”.
[added the headings just for demonstration purposes]
In all the library
catalogs, there may be complete or brief displays, you can
sort them in different ways, and now there are facets, but we see that
people still wind up looking at a listing of individual records, not that
much different from what people probably saw even when they were in
the Library of Alexandria.
In the Worldcat display,
people get 5,691 records. That is quite a “paradox of choice”!
Which one does someone choose?
With faceted catalogs
however, there really is something different: suddenly, the library
catalog provides statistical information! This is where
the experience of statistical displays can play a part. While the
facets in the library catalog are wonderful, my experience shows that
people still have problems understanding them. People may click here
and there, but with little understanding. This leads me to suspect
that people relate to the facets as they would to any tabular display
of statistics.
I can imagine someone looking at this Worldcat result and thinking: "There are 434 ebooks and 76 microforms," and relating to it the same as looking at the statistical table and thinking, "In Barrington, there are no deaf and dumb, 1 blind, 2 insane and 2 idiots, while in Bristol there are 5 deaf and dumb, 8 blind, 2 insane and 1 idiot."
It means little to them
and something more is needed. What more can be done?
Since now we are dealing
with statistical information, one method would be to try to display
the tabular information graphically, such as with graphs or
perhaps even with displays similar to Google Public Data Explorer,
and that certainly should be experimented with. I have no idea how
that would turn out, but are there other options?
I believe that there are
and I confess I have held something back. There is another method to
display statistical data that comes from Narrative
Science (http://www.narrativescience.com).
This tool takes statistical data and generates a textual interface
that is not all that bad. It is used now for box scores for Little
League Baseball games, and Forbes Magazine uses it for many business
reports. Let's take a moment to examine this generated text.
The first is for a Little League baseball game and the second is from Forbes Magazine.
So, how does Narrative
Science work? All it does (ha!) is provide an alternative interface
that displays the results not
in table or graphic form but in words. Google Public Data
Explorer could display the same information but it would be graphical
and animated.
Could a textual interface
such as what we see at Narrative Science provide a different
understanding of a search result in a library catalog? Here is how I
think something could work in the faceted search result for
Stonehenge.
Catalogers realize that
the library catalog and related library files furnish almost all this
information right now, but getting at it is not easy: first, you have
to know it exists; second, you have to know how to access it; and third,
most important, you have to know how to read it. It wouldn't surprise me
if, by just perusing the tabular data, expert statisticians could
mentally visualize something similar to what we can all see today in
Google Public Data Explorer. Why shouldn't there be something similar
for library catalogs?
Without any doubt, these
summaries would be much better if they were written by experts in the
field, but it is clear that can't be done, just as a professional
reporter will never write up the results of a Little
League baseball game. Technology can provide a practical answer.
I believe that such
developments could help many patrons, by providing them with a level
of context they have never had, except for the rare few who have had an
experienced librarian sitting next to them explaining what they are
seeing. By turning a complex result that untrained people find only
semi-comprehensible into something much less threatening, such a display
lessens the paradox of choice, provides the beginning of an intellectual
framework, and gives people some new kinds of filters that may
actually help them.
If something like this did
prove to be popular with the public, there might be much more demand
for other tasks such as selection and reference. I could see
reference staff playing a very important role in such a system.
This is only one
suggestion, but I believe that these are the sorts of efforts that
would, even if only partially successful, make much greater
differences in the lives of the patrons than RDA and FRBR could ever
hope to do. Such a project would take advantage of the powers of
modern systems, and would cause little disruption to the library's
everyday work. And I think it would actually be a lot easier to do
something like this with the catalog than what we saw with the Little
League box scores. That was incredible. Give me access to the XSL
sheet and I could probably do the first sentence right now. Maybe
more.
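To give a rough idea of what that first sentence might look like, here
is a minimal sketch, written in Python rather than XSL. It is only an
illustration under stated assumptions: the function name
summarize_facets and its parameters are hypothetical, the wording of
the sentence is my own guess at what might read well, and the only
real data are the three Worldcat counts for "Stonehenge" quoted
earlier in this talk. A real catalog would read the counts from its own
facet output.

```python
def summarize_facets(query, total, format_counts):
    """Turn the facet counts of a search result into one plain sentence.

    Hypothetical helper for illustration only; not part of any catalog system.
    """
    if total <= 0:
        return f'Your search for "{query}" found no records.'

    # Largest facets first, so the sentence leads with what dominates the result.
    ranked = sorted(format_counts.items(), key=lambda item: item[1], reverse=True)

    parts = []
    for label, count in ranked:
        share = 100 * count / total
        # Avoid printing "about 0%" for very small facets.
        share_text = "under 1%" if share < 1 else f"about {share:.0f}%"
        parts.append(f"{count:,} are {label} ({share_text})")

    sentence = f'Your search for "{query}" found {total:,} records'
    if len(parts) == 1:
        sentence += ", of which " + parts[0]
    elif len(parts) > 1:
        sentence += ", of which " + ", ".join(parts[:-1]) + " and " + parts[-1]
    return sentence + "."


# Counts quoted in this talk for the Worldcat search on "Stonehenge".
print(summarize_facets("Stonehenge", 5691, {"ebooks": 434, "microforms": 76}))
```

The point of the sketch is only that the facet numbers are already
there; turning them into words is a matter of choosing what to say
first and how to phrase the proportions.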
I shall end with this: