Re: [ACAT] reaction – ALCTS webinar on Future of Technical Services

Posting to Autocat

On 5/11/2016 9:33 PM, Harper, Cynthia wrote:
Just a thought after listening to the ALCTS Webinar “Is Technical Services Dead?” by Amy Weiss and Julie Moore. Julie emphasized something about RDA that I’d never heard anyone mention in the introductory RDA webinars I’ve listened to so far: RDA extends controlled vocabulary to a lot more fields, which we all understand is a good thing for computer description.

It made me think of a definition for linked data that librarians can easily understand – a central function of linked data is as a decentralized controlled vocabulary, where many can add choices for new terms or new fields.

Is that description valid?

I co-authored a chapter of a book with Julie Moore that discussed some of these matters. (The chapter is here, by the way http://eprints.rclis.org/25338/)

One of the basic purposes of linked data is to share your data with others. And to do it in the most useful ways possible. But what does that really mean?

As an example, we can take a look at the Open Library Project at the Internet Archive where many different types of institutions have included their catalogs (https://archive.org/details/ol_data). Anybody in the world can download any of these catalogs for free and do anything they like with them. (I still find that amazing in itself!) These files are in all kinds of formats: much of it is in MARC ISO-2709, MARCXML and MODS (library-driven formats), but there are also records from Random House in ONIX, from CERN (probably in a different format), from Amazon (probably yet another format), something called “indcat” and no doubt many other, quite different types of records.

We see that while it is true that these records have been freely shared, they are not so useful because many are in binary format (i.e. formats such as ISO-2709 where you need another program such as MarcEdit just to be able to read them). Even once you can read them, you have to be an expert to know what a 504 field is, or what an 043 field code is. Maybe you know MARC, but you don’t know ONIX or whatever is used in Amazon, and who knows what CERN uses or the other institutions we see there? Librarians may know the formats they use, but they do not know them all. It must be admitted that most non-librarians would be completely lost. In any case, if someone wanted to make something useful from this, it would be an incredible amount of work.
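To give a feel for why a helper program is needed, here is a minimal Python sketch of what reading an ISO-2709 record involves. The sample record is made up for illustration, and build_record and parse_record are hypothetical helpers, not part of any library. A real project would more likely rely on an existing MARC tool than on hand-written code like this.

```python
FIELD_END = b"\x1e"   # field terminator
RECORD_END = b"\x1d"  # record terminator
SUBFIELD = b"\x1f"    # subfield delimiter


def build_record(fields):
    """Pack (tag, data) pairs into one ISO-2709 byte string (toy encoder)."""
    directory = b""
    body = b""
    for tag, data in fields:
        field = data + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(field), len(body))
        body += field
    directory += FIELD_END
    base = 24 + len(directory)       # where the field data begins
    total = base + len(body) + 1     # +1 for the record terminator
    # 24-byte leader: record length in bytes 0-4, base address in bytes 12-16.
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + body + RECORD_END


def parse_record(raw):
    """Recover the (tag, data) pairs by walking the directory."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]     # drop the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop terminator
        fields.append((tag, data))
    return fields


# Made-up sample: an 043 (geographic area code) and a 504 (bibliography note).
FIELDS = [
    ("043", b"  " + SUBFIELD + b"an-us---"),
    ("504", b"  " + SUBFIELD + b"aIncludes bibliographical references."),
]
record = build_record(FIELDS)
parsed = parse_record(record)
```

Even after parsing, all you get back are tags such as 043 and 504 with raw bytes attached. Nothing in the record itself says what those tags mean, which is exactly the problem for anyone outside the library world.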

Linked data is a method that attempts to make the underlying structures of your data as comprehensible as possible to the outside community (I want to emphasize that), so that others who do not know your structures could know what e.g. an 043 code is and can decide to use it or not according to their own purposes. In this way, your data can be included in someone else’s program or app.

Otherwise, your data will most probably be ignored when people make these new programs and apps. That’s a scary thought.

This isn’t the end of it, however. Even if you build a tool that can actually share all of these formats, you are still stuck with strings of text, so one institution may use the term “Libraries” as a subject but another institution may use “مكتبة” (the Arabic word for libraries) and others use still other forms. If those textual strings were turned into this link (https://www.wikidata.org/wiki/Q7075), which brings together many language forms of the concept “Libraries,” it is possible that a far better sharing of information could occur among radically different institutions.
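As a rough illustration of replacing strings with a shared link, here is a small Python sketch. The lookup table is hand-built and the institution names are invented; a real system would have to get its mappings from somewhere such as Wikidata itself, which this sketch does not attempt.

```python
LIBRARY = "https://www.wikidata.org/wiki/Q7075"

# Hypothetical hand-built table from local subject strings to a shared link.
LABEL_TO_URI = {
    "libraries": LIBRARY,
    "مكتبة": LIBRARY,
}


def to_uri(subject):
    """Return the shared link for a subject string, or None if unknown."""
    return LABEL_TO_URI.get(subject.strip().casefold())


def group_by_concept(records):
    """Group record sources under the link their subject string maps to."""
    groups = {}
    for rec in records:
        groups.setdefault(to_uri(rec["subject"]), []).append(rec["source"])
    return groups


# Invented records from two institutions using different languages.
records = [
    {"source": "Institution A", "subject": "Libraries"},
    {"source": "Institution B", "subject": "مكتبة"},
]
groups = group_by_concept(records)
```

Both records end up under the same key, the Q7075 link, even though their subject strings have nothing in common as text.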

While this scenario may sound wonderful, not everyone agrees with it. For one thing, it will be very expensive to implement and maintain, and institutions will want to see a decent return in some way. (“What is the ROI?” is a question that cannot be summarily dismissed.) There is also a very good chance that nobody will use our data anyway; there are no guarantees at all. Another point: it will take a long time to create such a structure. With the information universe changing so quickly and new tools popping up constantly, it may take so long to implement a linked data universe that when it finally arrives (after decades), people of that time may consider our linked data tools similar to wagon wheels made from wood or axes chipped out of flint. In other words, whatever we make may be obsolete by the time it is built. Finally, some (including me) think that linked data is just too idealistic: while it has some good ideas, there are dozens of serious practical hurdles coming from different parts of the information universe that must be dealt with before investing heavily in linked data.

The idea of linked data is NOT that you can do things with your own data that you couldn’t do before. After all, you already understand your own data structures and are fully in control of your data. You can manipulate it or convert it or do anything with it already.

To turn now to your question

It made me think of a definition for linked data that librarians can easily understand – a central function of linked data is as a decentralized controlled vocabulary, where many can add choices for new terms or new fields.

my answer is yes, that can be done with linked data, but you don’t have to have linked data to do it, and it would be a very expensive way of doing it. There are several other options for doing things like that. If we want new terms or new fields or anything else in our own databases, nothing is stopping us from implementing them now.

If the problem is “dirty data” i.e. text strings that are entered inconsistently because of typos, variant language forms etc., there are several methods for fixing that too, and linked data is not necessary.
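One such method, sketched here in Python with only the standard library, is to normalize each string and then fuzzy-match it against a list of authorized terms. The term list, the reconcile helper and the 0.85 cutoff are all assumptions chosen for illustration, not a recommendation.

```python
import difflib
import unicodedata

# Hypothetical authorized list; a real catalog would load its own.
AUTHORIZED = ["Libraries", "Librarians", "Library science"]


def normalize(term):
    """Unify Unicode forms, collapse spaces, trim punctuation, ignore case."""
    term = unicodedata.normalize("NFKC", term)
    return " ".join(term.split()).strip(" .,;:").casefold()


_INDEX = {normalize(t): t for t in AUTHORIZED}


def reconcile(term, cutoff=0.85):
    """Map a messy string to an authorized term, or None if nothing is close."""
    key = normalize(term)
    if key in _INDEX:
        return _INDEX[key]
    close = difflib.get_close_matches(key, list(_INDEX), n=1, cutoff=cutoff)
    return _INDEX[close[0]] if close else None
```

With this list, reconcile("  LIBRARIES. ") and reconcile("Libaries") both come back as "Libraries", and no links or outside services are involved. Fuzzy matching can also guess wrong, so in practice a person would review the uncertain matches.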

Suppose instead that we decided it would be nice to help people who don’t know the meanings of words they see in our catalog, and that we should link our catalog to a dictionary in some way. If librarians had to build their own electronic dictionary from scratch and make a system for it to operate with our catalogs, it would never happen. In practical terms, it may be a nice idea but impossible.

With linked data, you could find that such a dictionary already exists: WordNet from Princeton (https://wordnet.princeton.edu/), which has been made available as linked data by VU University Amsterdam (https://datahub.io/dataset/vu-wordnet). All it would require would be a programmer to re-tool the catalog to do the lookups automatically. As a result, no librarian would have to change a single thing they do. Something that was formerly unthinkable becomes very possible because of people’s willingness to share their data, and to do so in a useful way.

That is the promise of linked data. There are hundreds of issues connected to such a simplified scenario, but it does hold lots of promise.
