ACAT Improving catalogues

Posting to Autocat

On 9/8/2014 8:37 PM, Charles Pennell wrote:

If anything, I think Bibframe is on target to create even greater granularity for our data than MARC (including MARCXML) ever could have. Plus, it has the attention of those who are trying to provide better access to that data through the semantic Web and are looking for a more sympathetic data structure and vocabulary than what is currently offered through non-library providers.

This is one of those points I have never understood. Right now, our tools are powerful enough (whether with MySQL or with XML) that we can manipulate any (and I mean ANY) information in our records in any way we could want (and that means ANY WAY). Changing our formats will not change this at all. For instance, if we want to enable people to do the FRBR user tasks (to find/identify/select/obtain works/expressions/manifestations/items by their authors/titles/subjects), that can be done RIGHT NOW. By anybody. This is not because of any changes in our cataloging rules or formats, but because the “innards” of the catalog have been changed with Lucene indexing, which now allows anybody to use the facets (created automatically by the new indexing) to do exactly that. To prove it to yourself, all you have to do is search WorldCat for any uniform title, e.g. Dante’s “Divine Comedy”. With that search, anybody can then click through to different formats, different dates, different languages, and so on. These facets and the user interface can be changed in any way we want, and any uniform title can be used.
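The point above is that facets fall out of indexing automatically, with no change to the records themselves. Here is a minimal sketch of that logic, assuming records are simple dicts (a real catalog would index MARC fields with Lucene/Solr, but the mechanics are the same); the field names and sample data are illustrative, not any real system's schema:

```python
# A hypothetical sketch of automatic faceting over a result set.
from collections import Counter

records = [
    {"uniform_title": "Divina commedia", "language": "eng", "date": "1994", "format": "Book"},
    {"uniform_title": "Divina commedia", "language": "ita", "date": "1979", "format": "Book"},
    {"uniform_title": "Divina commedia", "language": "eng", "date": "2007", "format": "eBook"},
]

def facets(recs, fields=("language", "date", "format")):
    """Count facet values across a result set -- built automatically
    from whatever is already in the records."""
    return {f: Counter(r[f] for r in recs) for f in fields}

def narrow(recs, **selected):
    """'Click' a facet: filter the result set by the chosen value."""
    return [r for r in recs if all(r[k] == v for k, v in selected.items())]

# Search on a uniform title, then facet and narrow the results.
hits = [r for r in records if r["uniform_title"] == "Divina commedia"]
print(facets(hits))
print(narrow(hits, language="eng"))
```

Nothing here required touching the records: the facets are derived from fields catalogers have always supplied.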

Yes, all of this can be improved a lot, the user interface especially, but catalogers need do nothing: these changes can be made by the computer technicians alone. Catalogers just need to keep adding the uniform titles, as they have always done.

To make records more granular means adding information that is currently not in our records. What does this mean? For instance, we could record that a specific person was a translator rather than just an added entry. In a new record we make today, we can add the correct $e relator term. That is pretty easy. But what about in this record, where Henry Francis Cary was the translator? Who is going to add it to that record? It won’t add itself! Or to the other expressions/manifestations of his translation? Or for all of the other translators of all of the other versions? Or for all of the translations of all works? How many records, and how much work, would that be? And that is just translators: what about all of the other relator terms, and the WEMI relationships?
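The scale of that retrospective work can at least be measured by machine, even though the actual relationships must be supplied by a person. A hedged sketch, using made-up dict-shaped records rather than any real MARC API:

```python
# Hypothetical sketch: find which records have added entries that
# lack a relator term. The field names here are illustrative only.
records = [
    # A new record, coded under current practice.
    {"id": 1, "added_entries": [
        {"name": "Cary, Henry Francis, 1772-1844", "relator": "translator"}]},
    # An older record: Cary is an added entry, but nothing says "translator".
    {"id": 2, "added_entries": [
        {"name": "Cary, Henry Francis, 1772-1844", "relator": None}]},
]

needs_recataloging = [
    r["id"] for r in records
    if any(e["relator"] is None for e in r["added_entries"])
]
# The machine can find the gaps, but a human must still decide
# what each relationship actually was.
print(needs_recataloging)
```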


This is what I was getting at in my podcast “Cataloging Matters No. 16: Catalogs, Consistency and the Future.”

If we don’t add the coding to these older records, they will effectively be hidden whenever someone limits a search to translators; that is, only the new headings will include:

Cary, Henry Francis, |d 1772-1844, |e translator

and the old ones will not. That is, until someone recodes them. To make this search useful, every single record will have to be recataloged; otherwise, when people search for Cary as a translator, those older records cannot, by definition, be found.
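The “by definition” part can be shown concretely. A minimal sketch, assuming the same made-up record shape as above: a filter on the relator term can only ever match headings that carry it, so the un-coded record is invisible no matter how relevant it is.

```python
# Sketch of why un-coded records disappear from a relator-limited search.
records = [
    {"id": "new", "heading": "Cary, Henry Francis, 1772-1844", "relator": "translator"},
    {"id": "old", "heading": "Cary, Henry Francis, 1772-1844", "relator": None},
]

def find_translators(recs, name):
    """Return ids of records where `name` is coded as a translator."""
    return [r["id"] for r in recs
            if r["heading"].startswith(name) and r["relator"] == "translator"]

# Only the new, coded record comes back; the old one cannot match.
print(find_translators(records, "Cary, Henry Francis"))
```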

How is this different from other databases that IT people work with? In my experience, people in business, where this happens regularly, deal with it by saying, “Well, we are dealing with old, obsolete information, so we can just archive it. We can put it in a zip file and let people download it if they want it.” I have heard precisely those words.

Maybe this is correct when talking about invoice information that is 5+ years old (or maybe not), or personnel information, or even medical information that is 15 years old or older. But it is 100% incorrect for library catalog information. Why?

Because the materials received 50 or 100 years ago, and the records made for them, may be among the most important and valuable things in your library. And remember, here we are talking about everything made before just two or three years ago; that is quite a bit. If you make those records a lot harder to find, you automatically make the materials they describe harder to find, and as a result the collection itself becomes less useful. Therefore, the information in a library catalog is fundamentally different from the information in most other databases.

Library catalogs have always been based on the rule of consistency, and I still have seen nothing at all that replaces it. For instance, linked data still depends on putting in links consistently. If the information in the records is inconsistent (and adding relator information only to new records is a perfect example of that), it becomes at least a whole lot harder to find the earlier records, and therefore the materials they describe.

Perhaps inconsistency is tolerable in some databases, but absolutely not in a library catalog. If we change our new records, we must change our old ones, or people won’t find them (or at least it will be a lot harder and more confusing). We either care about that or we don’t. If we care, that means massive retrospective conversion, and with our dwindling cataloging departments we must confess that it will never be done. That is a simple fact.

As Mac has said: our records as they are right now could be much more useful to people than they are, but that is a task for the IT people, who would change the catalog’s “innards”. One part of that would be making the authority records actually useful again.

Now, to return to Bibframe etc. Sure, we can and should change our format (it should have been done at least 15 years ago), but that has much more to do with getting out of traditional ILMSs, being able to use cheaper, more powerful tools (e.g. Lucene-based indexes built from MARCXML), and making our data more available for non-library uses.
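Those “cheaper, more powerful tools” really are cheap: pulling data out of MARCXML needs nothing beyond a standard XML parser. A minimal sketch using only the Python standard library, with a hand-made one-field record as the sample input:

```python
# Extract a uniform title (field 240) from a MARCXML fragment
# using only the standard library.
import xml.etree.ElementTree as ET

MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="240" ind1="1" ind2="0">
    <subfield code="a">Divina commedia.</subfield>
    <subfield code="l">English</subfield>
  </datafield>
</record>"""

NS = {"m": "http://www.loc.gov/MARC21/slim"}
root = ET.fromstring(MARCXML)
for df in root.findall("m:datafield[@tag='240']", NS):
    title = df.find("m:subfield[@code='a']", NS).text
    lang = df.find("m:subfield[@code='l']", NS).text
    print(title, lang)
```

Once the record is out of a closed ILMS and into plain XML like this, any programmer, inside or outside libraries, can work with it.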

But that is another topic.