Monday, January 30, 2012

Re: Considerations on Linked Data (Was: Showing birth and death dates)



On 1/28/2012 6:51 PM, Karen Coyle wrote:
<snip>
On 1/28/12 9:03 AM, James Weinheimer wrote:
But concerning linked data: Accessing bits and pieces of bibliographic records in the cloud using URIs may be a good idea, or maybe not. Eliminating the need for multiple, redundant local databases may also be a good idea, or maybe not. There are many questions that would need to be decided before entering on such an arrangement. One of the most critical involves intellectual property. I think we all know that struggles over intellectual property are becoming more complicated and more intense as the internet grows and becomes more important in each person's life.
This is a mis-interpretation of linked data, IMO. There is nothing inherent in linked data that says that you must store your data 'in the cloud' nor that you must use cloud-based data. Linked data is used today in enterprise situations that are "off line." It is a useful data management method in itself.

Most organizations look at a multi-tiered data model today. There is the internal, highly controlled data that is used to manage operational functions, like warehousing, billing, service creation.

Then there is the linking that allows your data to take your users out to the world of Web-based information, or to lead people from the Web into your institution. These links can fail, but the important design decision is to decide where you can risk that failure (e.g. sometimes a user won't get from Wikipedia to the library and vice versa) and when you cannot (e.g. the FRBR Work data must always be available).

This is really not new; we already design systems to 'fail gracefully' for non-essential services, and to capture and control data where failure is catastrophic.

I think it's best to think about linked data like this: if I write a paper and put it on the web, anyone can link to it. This linking enhances discovery, but it doesn't change the content of my paper nor its solidity as a unit. If those people stop linking to me, nothing changes for me. If I link from my paper to other information, I know that information is not guaranteed to be there. If I absolutely need that remote information for the integrity of my work, I generally make a local copy of it. If the remote document disappears, I get a 404 message and I can decide if I want to change something. Much of this negotiation between links now happens as automated processes, and the use of URIs for linked data makes it likely that many links will be made, and un-made, without human intervention.

I envision that libraries will create a controlled pool of library data that is not dependent on the open cloud. This is where cataloging will take place, this is where inventory control will take place, and this is where library systems can pull data for library system displays if they wish. Whether or not we also allow others to link to this data (not changing it or its integrity in any way) is a decision we'll have to make. Meanwhile, library systems will link opportunistically to a wide range of information on the web.

We shouldn't be afraid of the web -- we all use it every day; our users live on it. There is no 100% guarantee that everything out there will be stable, but if it were terribly unstable we wouldn't be using it the way we are today. Use gmail? You have no control over that. Use Wikipedia? That's someone else's data. Use google or bing? Ditto.

Essentially, as a system for information discovery and exchange, the Web works. Yes, it could perhaps fail, but if it does, library linking to resources like Wikipedia or DBPedia will be the least of our worries.
</snip>
I don't think I am misinterpreting linked data; I am just recognizing a reality on the web. In a theoretical world, everyone wants to share and share equally. But we are in a different world, especially in today's climate, where everyone is trying desperately to cut budgets, save money wherever possible, and even generate funds.

Today, there is a tremendous movement among organizations to "monetize" their data and their websites. Step one is to establish "ownership" of this information. These organizations need to do this, so I am not criticizing them, simply recognizing what is happening in the business world, and in the library world as well.

There are different types of links: simple links into a paper, links into Wikipedia and so on. Those can come and go as they please. But in the linked data world as foreseen by the W3C, and especially in the FRBR data model, not all links are the same. For instance, a library catalog can add user reviews from Amazon to its records. Suppose Amazon eventually wants money for linking to those reviews. A library can dump those parts without much fuss, but if you have an FRBR data model and are relying on other agencies for work/expression and maybe even manifestation entities, that is a completely different matter. You are *absolutely dependent* on the agency that supplies this information, and whether you have the right to download copies to your own servers, etc. will have to be negotiated. In the current melees over copyright on the web, it would be extremely naive for a library to simply take such a right for granted.
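
To make the distinction between the two kinds of links concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the names `resolve`, `LinkUnavailable` and `dead_supplier`, the `essential` flag and the example.org URIs are all hypothetical and do not come from any real library system or from the W3C or FRBR documents. An optional link (such as reviews) can simply be dropped when the supplier goes away, while an essential link (such as a work entity) only survives if a local copy is already held.

```python
from typing import Callable, Dict, Optional


class LinkUnavailable(Exception):
    """Hypothetical error: the remote URI could not be dereferenced
    (a 404, a withdrawn service, a new paywall, and so on)."""


def resolve(uri: str,
            fetch: Callable[[str], str],
            local_copies: Dict[str, str],
            essential: bool) -> Optional[str]:
    """Dereference a URI, reacting to failure according to how critical it is."""
    try:
        return fetch(uri)
    except LinkUnavailable:
        if not essential:
            # Optional enrichment (e.g. user reviews): drop it and carry on.
            return None
        if uri in local_copies:
            # Essential entity (e.g. a work record): fall back to the copy
            # we hold, assuming the right to keep one was negotiated.
            return local_copies[uri]
        # Essential and no local copy: the record cannot be completed.
        raise


def dead_supplier(uri: str) -> str:
    """Stand-in for a supplier that has withdrawn or restricted access."""
    raise LinkUnavailable(uri)


if __name__ == "__main__":
    copies = {"http://example.org/work/1": "Work record (local copy)"}
    # Optional link: the failure is absorbed.
    print(resolve("http://example.org/reviews/1", dead_supplier, copies, essential=False))
    # Essential link: only the local copy saves us.
    print(resolve("http://example.org/work/1", dead_supplier, copies, essential=True))
```

The sketch says nothing about whether the library is entitled to hold that local copy in the first place, and that is exactly the part that would have to be negotiated.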

I don't know what the future directions will be with library data: in the cloud or off the cloud, but "libraries" are not so monolithic and will probably implement a variety of solutions. Anyway, there is a world of difference between "being frightened" of the web and approaching it in a responsible, business-like manner, especially after some libraries have already lost rights to their own digitized resources.

Certainly the web is a great tool, but it is undergoing some fundamental changes right now in various areas, and one of the most important is the realm of rights. I am just saying that a simple belief that going to linked data will be the solution could actually lead to nightmares.
