Re: Link stability in Google Books and Project Gutenberg?

Posting to Autocat

On 05/06/2011 05:53 AM, Ziebell, Carl wrote:

<snip>
I was wondering if anyone has looked into the stability of links in Google Books or Project Gutenberg. I came up with a crazy idea (in a dream, no less). I am in charge of a small rare books collection at Ripon College. (Frankly, many of the books in the collection are not all that “rare.”) I’d like to see if I can find digitized versions of items that are not so rare. If so, I’d like to create 856 links for these items, but I’d like these links to be fairly stable if possible. Then they can “circulate” virtually and not be touched physically. That could also provide a basis for a digitization project for the unique book in the collection. Why recreate the wheel for books that have already been digitized by others?

So, my dear autocatters, what advice have you? Who has done this sort of thing before and would be willing to share their knowledge and experience with me?
</snip>

This is one of the most important questions that librarians need to ask themselves. I completely agree that there is normally little reason to re-digitize materials that have already been digitized. But this train of thought can be continued: if this is the case for the rare books in my collection, then why not for other rare books that are on the web but not in my collection? And if this is accepted, why not for any other book that is not in my collection, whether it’s rare or not? If we accept all of this, we find we are reconsidering: what constitutes a “library’s collection” today?

In this sense, *every library* can now have a wonderful “rare book collection” consisting merely of links on the web. It’s simply a matter of changing perspectives. This is *much easier said than done* of course, and the consequence is that the job of everybody in the library changes. The problem of URL stability is only one of these consequences, but a tough one.

Unfortunately, there is very little that the cataloger can do: ensuring URL stability is the job of the webmaster, using URIs, Persistent URLs and related technology. This means that webmasters must ensure that your link will work “forever”, and that, although their database structures will change over time, your link will always be forwarded to the correct place. This is their responsibility; if they don’t do it, there is little you can do about it.

The only real thing that the cataloger can do is take advantage of resources that work with OpenURL (http://en.wikipedia.org/wiki/OpenURL), which is an attempt to standardize one part of a URL (the query), while the URL to the database itself may change. So, if you have a link using an ISBN, e.g.
http://www.myfirstcatalog.edu/cgi?isbn=0836218310 changes to
http://www.mysecondcatalog.edu/cgi?isbn=0836218310

and this change affects not just 1 record but 500,000, all that needs to change is the part before the question mark, since the ISBN is already in your record in an 020 field. In a correctly structured system, the first part, http://www.myfirstcatalog.edu/cgi, will be stored in one place (e.g. in a relational database) so that it can be changed a single time for all records to reflect http://www.mysecondcatalog.edu/cgi.

So, for OpenURL to work, we would need two inputs: one for the base URL, and another for the query. Of course, the webmaster can ensure this as well, by setting up forwards from the earlier website addresses.
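
As a rough sketch of this two-input idea (in Python, purely for illustration): the base URL is kept in one place and the query is built from the ISBN already in each record. The function name build_link and the record layout are my own assumptions, not part of any real catalog system.

```python
from urllib.parse import urlencode

# Assumption: the base URL is stored once for the whole catalog,
# so a move to a new host means changing only this one value.
BASE_URL = "http://www.myfirstcatalog.edu/cgi"


def build_link(base_url: str, isbn: str) -> str:
    """Join the base URL with a query built from the record's ISBN (020 field)."""
    return f"{base_url}?{urlencode({'isbn': isbn})}"


# Hypothetical records; a real catalog would hold 500,000 of these.
records = [{"isbn": "0836218310"}]

links = [build_link(BASE_URL, record["isbn"]) for record in records]
```

If the catalog moves, setting BASE_URL to http://www.mysecondcatalog.edu/cgi regenerates every link without touching a single record.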

To make something like this work in Google Books, take e.g. http://books.google.com/ebooks?id=ajFDAAAAcAAJ, which contains their unique ID (and of course, there is no ISBN for this item). What is important is to make sure that the Google ID is in the record somewhere, so that the link can be created by concatenating everything together correctly. In the Internet Archive, e.g. http://www.archive.org/details/erasmus00jebbrich, the ID is erasmus00jebbrich, which can be used in exactly the same way. HathiTrust, Europeana, Gallica and others have their own methods.

Therefore, each database can have its own identifiers and our records need to allow for that, so that the Google ID goes with the Google link and not with the Internet Archive link. I’m not sure if the current MARC format can allow for this.
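
To illustrate keeping each identifier with its own database, here is a minimal sketch in Python. The URL patterns are copied from the two example links above; the provider labels, the URL_TEMPLATES table and the function build_provider_link are hypothetical names of my own, and say nothing about how MARC would actually carry this data.

```python
from urllib.parse import quote

# Assumption: one URL pattern per database, stored once.
# The patterns are taken from the example links in this message.
URL_TEMPLATES = {
    "google_books": "http://books.google.com/ebooks?id={id}",
    "internet_archive": "http://www.archive.org/details/{id}",
}


def build_provider_link(provider: str, identifier: str) -> str:
    """Combine a database's URL pattern with that database's own identifier."""
    template = URL_TEMPLATES[provider]  # raises KeyError for an unknown database
    return template.format(id=quote(identifier, safe=""))


# Hypothetical record data: each identifier is stored with the database it belongs to.
record_ids = [
    ("google_books", "ajFDAAAAcAAJ"),
    ("internet_archive", "erasmus00jebbrich"),
]

links = [build_provider_link(provider, ident) for provider, ident in record_ids]
```

Because the identifier is always paired with its provider label, the Google ID can never end up in the Internet Archive link, and a change in one database's URL pattern is made in one place.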

This is only one part that librarians need to deal with if we are to consider the resources on the World Wide Web to really be essential parts of our collections. It won’t be easy, but I think it’s necessary.

I hope this helps more than it hinders!
