ACAT linked data question

On 24/02/2015 21.31, David Bigwood wrote:
> I still think this distributed system might fail at searching. Sure it can
> pull in data and display it when a record is selected. But when I search
> will it follow the dozen links in each record? And then will it follow the
> links from each of those endpoints?
>
> A patron comes in and does a KW search for Project Apollo. Will it search
> all the TOC links for all records in the collection? And follow all the
> subject links to VIAF and then to Wikipedia and redo the search based on
> all the terms retrieved from those sites? Will it follow all the names in
> all the records to VIAF to see if any of the VIAF or dbpedia match any of
> those to Project Apollo? Then will it follow all the links to the full-text
> of all the summary notes in all the records to see if it gets hits on
> Project Apollo in any of those remote resources? As it’s KW search it could
> be anything, so maybe searching MESH, AAT, AGRICOLA, the NASA Thesaurus,
> GeoRef Thesaurus and so on would also need to happen. Any cross references
> from those sources would have to be searched again against the whole
> system. What if any of these resources are down? One KW search at a
> university could easily generate billions of links.

Systems can be built that will search all of that and display it however we want. And yes, it is easier said than done, especially if the purpose is to build something genuinely useful in practical terms for the public. But to demonstrate it at work, there is the Google Books API, which automatically searches their database and returns different options. For instance, I can search Princeton University’s catalog for “Electronic funds transfers” as a subject and find this record. We can see the Google Books link with it. How did that work? In the background, the catalog searched Google’s database and automatically brought back the book cover and other information. It probably searched by ISBN, but you can search it in all kinds of ways. It could do much more if the programmers wanted it to.
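To make the mechanism concrete, here is a minimal Python sketch of the kind of background lookup the catalog performs. The Google Books API endpoint and the JSON field names (`items`, `volumeInfo`, `imageLinks`, `thumbnail`) come from Google’s documented response format; the two helper functions are hypothetical names of my own, and a real catalog would of course do more:

```python
import urllib.parse

GOOGLE_BOOKS_API = "https://www.googleapis.com/books/v1/volumes"

def books_query_url(isbn):
    """Build the Google Books API query URL for an ISBN lookup."""
    return GOOGLE_BOOKS_API + "?" + urllib.parse.urlencode({"q": "isbn:" + isbn})

def extract_cover(response):
    """Pull the cover-thumbnail URL out of a Google Books API response,
    returning None when the book (or the field) is missing."""
    for item in response.get("items", []):
        links = item.get("volumeInfo", {}).get("imageLinks", {})
        if "thumbnail" in links:
            return links["thumbnail"]
    return None
```

The catalog page would call `books_query_url` with the record’s ISBN, fetch that URL, and hand the parsed JSON to `extract_cover` to decide whether a cover image gets displayed at all.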

It is very possible that if I searched Google Books for “Electronic funds transfers” I would not find this book. It is also important to note that if the Google Books site goes down, you don’t see big empty boxes filled with question marks in Princeton’s catalog: you just don’t see anything at all. Google knows broken boxes would be bad for them and prefers for things to “fail gracefully”, so the user never realizes that something has gone wrong. All in all, this system seems to work pretty well. Of course, it could be broadened widely if you could get back the full-text of the book, but the Google Books-Publishers’ agreement went down the tubes. For now.
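“Failing gracefully” is simple to sketch, assuming the enrichment data is fetched server-side while the page is built: swallow every failure (site down, timeout, garbage response) and return nothing, so the page renders without the extra box instead of showing an error. A minimal version:

```python
import urllib.error
import urllib.request

def fetch_enrichment(url, timeout=2):
    """Try to fetch remote enrichment data; on ANY failure, return an
    empty string so the page simply renders without the extra box.
    The short timeout keeps a slow remote site from slowing the page."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError, UnicodeDecodeError):
        return ""  # fail gracefully: the user never sees an error
```

The page template then only emits the Google Books box when the returned string is non-empty, which is why the user sees “nothing at all” rather than a broken widget.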

One of the problems we are facing is that we are at the very beginning of what will undoubtedly be a long process, and it is almost impossible to imagine what the final product could be. Probably it will be something a lot of us would disagree with. For instance, fundamental changes are occurring right now in the process of “search,” changes that will have profound impacts on what the public expects. One of the latest, and most disturbing, fads is the rise of “predictive search”: algorithms that rely on computers monitoring our every waking (and sleeping) moment, tools that tend to take on lives of their own so that they can predict what we want before we even realize it ourselves.

But in defense of programmers, they would look at linked data in a completely different way and see it as a logical outcome of what they have been doing for decades. Let me try to explain quickly:

Almost no computer program is a single entity; it is actually a conglomeration of lots and lots of smaller programs (called scripts, APIs, or other names) that the programmer brings together (“includes”) for his or her purposes. So, what appears to the user as a single screen on a computer–such as the program you are reading this posting on–is actually composed of dozens (or more) of these smaller scripts, some determining the header, the footer, the navigation, whether you can delete or print or save, and so on.

There are similar capabilities with “server-side includes” (SSI), where a web programmer can include specific pages or bits of other files depending on various criteria. As an example, a site may look different to you depending on whether you are logged in or not. If you are logged in, you may have options to email things to yourself, to save, to see which other users are online, or whatever; if you are not logged in, you see none of that. This is done with SSI, where the programmer has written something like: “if this person is logged in, then add this file (or include it) to the display, or run this program; if not, do not add the file.” There can be many, many variations on this.
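In Apache’s actual SSI syntax the include is a directive embedded in the page, such as `<!--#include virtual="/fragments/member_tools.html" -->`. The conditional logic behind it can be sketched in a few lines of Python; the fragment file name here is hypothetical, and `load_fragment` stands in for whatever reads the included file from disk:

```python
def render_page(logged_in, load_fragment):
    """Sketch of the SSI pattern: conditionally splice another file's
    contents into the page that is sent to the user."""
    parts = ["<header>Site header</header>"]
    if logged_in:
        # the "include": pull in a separate file only for logged-in users
        parts.append(load_fragment("member_tools.html"))
    parts.append("<footer>Site footer</footer>")
    return "\n".join(parts)
```

A logged-in visitor’s page contains the extra fragment; everyone else gets a page assembled from the same template with that step skipped.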

Linked data does something similar: if certain conditions are met, take a file from another site and use it to build the page for the user. The user doesn’t need to be aware of any of it. At base, the function is the same; it is the scale that is different.
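The linked-data version of the “include” can be sketched the same way, under the assumption that each record carries URIs (to VIAF, dbpedia, and so on) and that `fetch_remote` is a hypothetical function that dereferences a URI and returns its parsed description, or nothing if the remote site is down:

```python
def enrich_record(record, fetch_remote):
    """Sketch of a linked-data 'include': for each URI in the record, try
    to pull the remote description and merge its label into the display,
    silently skipping any link that cannot be resolved."""
    display = dict(record)  # leave the original record untouched
    labels = []
    for uri in record.get("links", []):
        remote = fetch_remote(uri)  # e.g. dereference a VIAF or dbpedia URI
        if remote and "label" in remote:
            labels.append(remote["label"])
    display["remote_labels"] = labels
    return display
```

Exactly as with the SSI example, an unreachable link simply contributes nothing to the display, which is the same graceful-failure behavior described above for Google Books.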

In any case, this is what the library and cataloging communities have been aiming at for quite a while now.

James Weinheimer
First Thus
First Thus Facebook Page
Cooperative Cataloging Rules
Cataloging Matters Podcasts