On 10/13/2013 7:27 AM, Alexander Johannesen wrote:
I’d just like to add a few bits about why Linked Data is or was important. It’s not really about sharing the data anymore, it has become almost a secondary nice to have feature of meta data; surely you give out the meta data in order to make things findable? No, the real importance of why the library world should have been quicker and smarter about it is about namespace real-estate, and the power of identifiers, and it’s this subtler connection in which things are truly found.
So, for example, we want to talk about Mark Twain. I could link my data to a URI (which is just a string of letters to make up an identifier; that it’s a URI that you can plonk in a browser or do a HTTP GET on to resolve it is an added bonus) so that we can make sure that when I talk about Mark Twain, I mean the Mark Twain that is linked to this one
And wouldn’t it be great if that was the case?
[Sorry for the long message, but it is usual with me and I can’t find a way to make it simpler]
If this is not just a rhetorical question but one that is seriously asked, then I have an answer that, so far as my own experience is concerned, is definitive (although others may have other experiences): Wouldn’t it be great if that was the case? The answer is decidedly no.
When id.loc.gov first came out I really, really, really wanted to include it in my catalog in some way. I don’t believe the API had come out yet, but there are other ways if you are creative enough, although they may not be perfect. I showed it to several of my users (students and faculty) and while they found it kind of neat, especially the “Visualization” tool, it did not provide them with any information they thought would be useful for their purposes. I think this offers a clear example of the difference between looking at a tool like this as a developer, as a cataloger, and as a user.
The underlying purpose of the kind of record we see in id.loc.gov is not so much to provide data to manipulate in all kinds of new and wonderful ways, but to help people discover information that is within a particular collection. So, with the record for Mark Twain, what is there? We find various forms of his name, which are not important in and of themselves, but they are there so that when someone searches for e.g. “Tuwen, Make”, they see a reference that says: “See: Twain, Mark, 1835-1910.” (http://1.usa.gov/162o37r)
In this case with Mark Twain, you also discover that he has different “bibliographic identities” (in cataloger-speak), which translates into normal speak as: if you want to find everything by Mark Twain, you also have to look under the names:
Clemens, Samuel Langhorne, 1835-1910
Conte, Louis de, 1835-1910
Snodgrass, Quintus Curtius, 1835-1910
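To make the mechanics concrete, here is a minimal sketch of how a catalog could turn such a record into "See" and "See also" references. The names are the ones quoted above; the dictionary layout and the function name `resolve` are my own assumptions for illustration and are not the id.loc.gov data model or API.

```python
# Hypothetical, hand-built authority record. The names come from the message;
# the structure is an assumption, not the actual id.loc.gov format.
AUTHORITY = {
    "heading": "Twain, Mark, 1835-1910",
    "variants": ["Tuwen, Make"],
    "see_also": [
        "Clemens, Samuel Langhorne, 1835-1910",
        "Conte, Louis de, 1835-1910",
        "Snodgrass, Quintus Curtius, 1835-1910",
    ],
}


def resolve(query, record=AUTHORITY):
    """Return the reference lines a catalog could display for a searched name.

    A variant form yields a "See" reference to the authorized heading, and
    both variant and heading yield "See also" references to the other
    bibliographic identities. An unknown name yields an empty list.
    """
    q = query.strip().casefold()
    lines = []
    if q in (v.casefold() for v in record["variants"]):
        lines.append("See: " + record["heading"])
    elif q != record["heading"].casefold():
        return []
    lines.extend("See also: " + name for name in record["see_also"])
    return lines


if __name__ == "__main__":
    for line in resolve("Tuwen, Make"):
        print(line)
```

Note that everything the function returns is a pointer to another heading in the same catalog, which is the point made here: the record serves discovery within a collection, not much else.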
The rest of the information in the record is for catalogers, documenting where the information for each form of name came from and maybe some more. So, for the user, this information is good only for resource discovery within the realm of the specific catalogs that use these forms. Other catalogs have different rules and different forms. For example, pre-AACR2 rules (but lots of other rules too) treat the concept of “bibliographic identities” differently, and the heading to search for everything by Mark Twain was only “Clemens, Samuel Langhorne, 1835-1910”. We can see how this was handled in the transition at Princeton University with the first card under “Clemens, Samuel” (bit.ly/1ajsS8s), but if you browse to the next cards, you will see that his books are under “Clemens”, as was correct before AACR2.
So, the only real information from id.loc.gov that is of use to the public is that they have to look under three other forms of name to find everything by Twain. To revive this type of information would only result in creating a tool that begins to work the way the catalog was designed to work (i.e. back in the 19th century). That is important, by the way.
If we look for an author who did not use pseudonyms, all we see are different forms of the name, e.g. “Goethe, Johann Wolfgang von, 1749-1832” (http://id.loc.gov/authorities/names/n79003362.html). It is of minimal use for the user to know that Goethe has also been published under “Ko-tê, 1749-1832” although if they search for “Ko-tê” they will find the reference to Goethe.
When we use the VIAF http://viaf.org/viaf/50566653/ we get something that may be more useful to the public, which is the correct form of name to search in different catalogs. So, we discover we need to search “Твен, Марк 1835-1910” in Russian catalogs, and “توين، مارك، 1835-1910” in Arabic catalogs.
A tool could be made to search Mark Twain’s Russian form of name automatically in the correct catalogs, e.g. http://bit.ly/1boDipB for the Russian catalogs. It may, or may not, be useful for someone to know that materials cataloged in Russia use this form and can be searched correctly.
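A rough sketch of what such a tool might do, under stated assumptions: the forms of name are the ones quoted in this message, while the language keys, the `example.org` endpoints, the `author` parameter and the function name `search_url` are all placeholders I made up. Real catalogs each have their own search syntax, and a real tool would fetch the forms from VIAF rather than hard-code them.

```python
from urllib.parse import urlencode

# Forms of name quoted in the message; the language keys are illustrative labels.
FORMS = {
    "en": "Twain, Mark, 1835-1910",
    "ru": "Твен, Марк 1835-1910",
    "ar": "توين، مارك، 1835-1910",
}

# Hypothetical catalog endpoints (example.org placeholders, not real catalogs).
CATALOGS = {
    "en": "https://catalog.example.org/en/search",
    "ru": "https://catalog.example.org/ru/search",
}


def search_url(lang):
    """Build a search URL using the form of name that catalog expects.

    Returns None when either the form of name or the catalog is unknown.
    """
    if lang not in FORMS or lang not in CATALOGS:
        return None
    # urlencode percent-encodes the non-ASCII name as UTF-8.
    return CATALOGS[lang] + "?" + urlencode({"author": FORMS[lang]})


if __name__ == "__main__":
    print(search_url("ru"))
```

The lookup itself is trivial; the open question remains whether the person at the other end wants the result.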
In Worldcat Identities http://www.worldcat.org/identities/lccn-n79-21164 we find different information derived from the catalog. We see genres, roles, his most widely held works and a word cloud of his subjects. Worldcat Identities, and especially the word cloud at the bottom, may be of the most use to the public of all of these tools, but it needs to be tested. Again, when I have shown these tools to people, although they found them interesting, they could not tell me how those tools could help them in any substantive way in anything they could imagine.
Compare these tools to dbpedia http://dbpedia.org/page/Mark_Twain that gives lots of concrete information and tons of links about Mark Twain.
Today, all this can be linked together with linked data (which can definitely be done), but following John Marr’s questions, it seems to me that to do so would be to create the very definition of “information overload”.
I want to make it clear that I am not saying that some kind of tool should not be built, because it definitely should be built, but we must look at it through the eyes of the person consuming it. Otherwise, we may be creating something for us and not for the people who need to use it. Linked data may end up creating a different kind of chaos. This is why I say that linked data may create something useful for the public, but it may just as well confuse them more than ever.