Authority in an Age of Open Access (an analysis)

Posting to Autocat and NGC4LIB

I would like to share a talk by Clay Shirky, the Internet guru, entitled “Authority in an Age of Open Access”.

In one part (5 minutes in), he talks about a project of the Smithsonian Institution, in which they put up several thousand images on Flickr and asked people (anyone) to tag them. He says that this shows what happens when you “take a job solely for curators and you invite the public in”. He then goes on to mention that there is now a huge list of tags produced by the public and discusses three tags of interest to him. He obviously considers the huge list of tags a positive, but his talk goes in directions different from what I want to pursue here.

As a cataloger, I look at it a little differently. The public undoubtedly did a huge amount of work on these images and all can see it, but from the viewpoint of access, what is the result? Of course, there are lots of images and I cannot look at them all, but I chose one set at random (15 images) “Mary Agnes Chase Field Books” and considered the tags that were–and were not–assigned.

The first thing I discovered was that there is practically no consistency in the tagging within the set. Just looking at the first two photos illustrates this. The first is a wonderful photo of two little girls in Brazil labeled “Two of Agnes Chase’s favorite subjects.” and it has several tags:
children, girls, two, seated, steps, outdoors, Brazil, 1920s, twenties

The next photo, just as interesting, is labeled “Serra da Gramma [sic]. Dr. Rolfs, jungly bamboo slope between fazendo and Araponga.” But it lacks any tags at all. I don’t know the subject area, but I did find “Arapongas (Paraná, Brazil)” in the NAF. Yet, if you look in the comment section, one person, “Pixel Wrangler”, made some suggestions for corrections, one of which was actually implemented by the Smithsonian. At the same time, the Smithsonian staff member (librarian?) was able to explain a couple of fine points. This led one person to remark “wow …..”, but I don’t know whether it was the photo this person found so amazing or the exchange between “Pixel Wrangler” and “Smithsonian Institution”.

Looking at the photos as a whole, only the first and last had a geographic location tag (Brazil), although a total of 9 were taken in Brazil, 1 in Guatemala, 2 in Mexico, 1 in Nicaragua, 1 in Alaska and 1 in Arizona.

Eight of the 15 (a majority) had no tags at all, other than those the Smithsonian gave to each one: “Smithsonian Institution Archives, Smithsonian Institution, Women’s History Month”. Of those that had tags, some photos had national park areas added, e.g. “Itatiaia National Park”, which is “Parque Nacional do Itatiaia (Brazil)” in the NAF.

Some conclusions from this highly cursory analysis: looking at the huge tag cloud should now give someone pause. We now know that clicking on the tag “Brazil” does not retrieve all the photos of Brazil, even within this small 15-photo collection. We see only two when there should be at least nine. Who knows how many photos of Brazil there are within the rest of the collection? If this is so undeniably true for this single tag, what are you really looking at for the rest of the tags? The first photo has the tags “girls” and “children” but this photo has nothing. When you click on the tag “children” in the huge tag cloud, you will not retrieve this photo. This shows how people assume a lot when they click on a tag. (Of course, this applies equally to all headings in a library catalog.)
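To put a number on the “Brazil” example, here is a minimal sketch in Python that computes what retrieval people call recall: the share of relevant items that a search actually returns. The function name `tag_recall` and the two constants are my own labels for illustration. The counts are simply the ones from this 15-photo set (9 photos taken in Brazil, 2 of them tagged “Brazil”), and nothing here reflects the Flickr collection as a whole.

```python
from fractions import Fraction


def tag_recall(tagged_relevant: int, total_relevant: int) -> Fraction:
    """Share of the relevant photos that a click on the tag retrieves."""
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    if not 0 <= tagged_relevant <= total_relevant:
        raise ValueError("tagged_relevant must be between 0 and total_relevant")
    return Fraction(tagged_relevant, total_relevant)


# Counts taken from the "Mary Agnes Chase Field Books" set discussed above.
BRAZIL_PHOTOS = 9   # photos actually taken in Brazil
BRAZIL_TAGGED = 2   # of those, photos carrying the "Brazil" tag

recall = tag_recall(BRAZIL_TAGGED, BRAZIL_PHOTOS)
missed = BRAZIL_PHOTOS - BRAZIL_TAGGED

print(f"Recall for 'Brazil': {recall} (about {float(recall):.0%}); "
      f"{missed} photos missed")
# prints: Recall for 'Brazil': 2/9 (about 22%); 7 photos missed
```

In other words, within this one set a click on “Brazil” shows fewer than a quarter of the photos it ought to, and nothing on the screen tells the searcher so.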

Or perhaps people don’t assume. Or maybe they don’t care. Nevertheless, they should be aware of something that seems so vital, and yet so easily hidden, as are the 7 photos from this collection when someone clicks on the Brazil tag. How is somebody supposed to know?

My experience shows people don’t understand any of this and are actually embarrassed when you demonstrate it to them. They try to explain it away and then often reply they don’t care, but I believe that is a face-saving maneuver. Are we supposed to believe that they really and truly don’t care what they get from a search?! In my opinion, it is much more the case that people do not want it to be true and prefer to ignore it.

The comments on the photos are indeed very interesting. Some have substantive information, e.g. in one photo there is a discussion about the use of hats in field photographs (led by the Smithsonian), and in a photo of a steamboat in Alaska, someone has linked to Wikipedia and Project Gutenberg to give additional information about this particular steamboat.

All in all, an impressive project by the Smithsonian, but in my opinion, not so much for the reasons Clay Shirky gives. The Smithsonian staff appear to have taken this as an opportunity for genuine outreach and I am sure they have created some very good feelings about the Institution. Kudos to them! It must have been a lot of work but rewarding as well.

After this short analysis, however, the huge tag cloud seems to hide as much as it reveals. It shows the pitfalls of relying on an enthusiastic public who are completely untrained and to whom the idea of providing “consistent, reliable retrieval” is completely alien. Clay Shirky discusses the tags “cyanotype”, “moustache” and “steampunk”. He is obviously assuming something when he clicks on one of these tags. What does he think he is seeing when he clicks on “moustache”, I wonder? Does he realize he is getting only a completely unknown and random percentage, just as we can demonstrate with “Brazil”? Does he care?

In spite of all of this, I agree with the overall tenor of his talk, and found it highly entertaining as well as educational. I recommend it to all.