Friday, April 27, 2012

Re: [ACAT] Modern library cataloging does not see the forest because of the trees; WAS: Various other threads

Posting to Autocat

On 27/04/2012 16:13, Brenndorfer, Thomas wrote:
<snip>
That's confusing the use of punctuation for semantic separation of data elements with the use of punctuation for easy eye-readable displays (or abbreviations for the purposes of compact, quick-to-scan displays on a catalog card or printed index).

Right now MARC tags, indicators, fixed fields, and subfield codes are still not enough-- punctuation plays a role in separating and defining elements. MARC and its catalog card origins are tightly bound together, and that historical basis is why catalogers care so much-- semantics are lost without the punctuation.

But RDA, on the other hand, begins the clean break. Elements are elements. Display is display. In the handful of areas where RDA falls short, the blame lies solely with lingering AACR2 quirks (such as in Extent, where content and physical carrier elements are confusingly still crushed together -- "1 atlas (3 volumes)" vs "1 score (232 pages)" vs "3 volumes" vs "232 pages"). Here punctuation plays a role in providing a semantic breakdown of content units and carrier units and subunits of various types, which are required to go together following complex instructions.

It would be much simpler to break those all down into separate elements, as has been done elsewhere in RDA.

People can't have it both ways. They can't cling to punctuation rules as if they're the be-all and end-all and at the same time want to play in the big leagues of data management, where element sets, entity-relationship analyses and data schemas are prerequisites to join the club.

Just as a web programmer uses data and then cascading style sheets to change displays without changing data, so should catalogers begin to think in this way (and in many ways they are forced to already -- most online catalogs don't follow prescribed ISBD order and the explosion in enriched content is making the idea of a unitary, standard record vanish to insignificance).

Data is king, and it's time to stop thinking exclusively of that eye-friendly, quick-to-scan, peruse-the-collocated-headings approach to catalog records, as if we're still producing records for one implementation -- the card catalog.
</snip> 
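Thomas compares this to a web programmer who uses cascading style sheets to change displays without changing data. The Python sketch below illustrates that idea under stated assumptions: one structured record and two interchangeable "stylesheet" functions that render it differently. The record fields, values and function names are all made up for illustration. The card-like style only loosely imitates ISBD punctuation and is not a conformant implementation.

```python
from typing import Callable, Dict

# A hypothetical record: each element is stored separately, with no display punctuation.
Record = Dict[str, str]

record: Record = {
    "title": "Sample score",
    "statement_of_responsibility": "Example Composer",
    "extent": "1 score (232 pages)",
}


def card_style(rec: Record) -> str:
    """Card-like display using ISBD-style separators (approximate, not conformant)."""
    return f"{rec['title']} / {rec['statement_of_responsibility']}. -- {rec['extent']}"


def labeled_style(rec: Record) -> str:
    """Web-style display: one labeled line per element."""
    labels = {"title": "Title", "statement_of_responsibility": "Author", "extent": "Extent"}
    return "\n".join(f"{labels[key]}: {value}" for key, value in rec.items())


def render(rec: Record, style: Callable[[Record], str]) -> str:
    """Apply a display 'stylesheet'; the record itself is never modified."""
    return style(rec)


print(render(record, card_style))
print(render(record, labeled_style))
```

Swapping `card_style` for `labeled_style` changes what the patron sees, but not a single stored value.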
My point was: who are we making these records for? For the public or for catalogers? You mention "... that historical basis is why **catalogers** care so much" [my emphasis] and I agree. Catalogers, including myself, do care, but does it serve the needs of the public? And yes, I understand about data and the supposed "problems" of mixing things up, but any practical problems have yet to be demonstrated. What I am trying to point out is that these matters should not be considered merely as ends in themselves: will anybody find it genuinely useful? More complex for the cataloger it will be, without a doubt.

Especially in the reality of this linked data universe that everyone is so gung-ho about, where the relatively paltry number of our records will disappear into the mush, mixed in with records made by all and sundry, it just seems to me that we should stop and think about where we are going, why we are going there, and what we can expect once we arrive. For instance, let's imagine that we do recode everything so that "1 score (232 pages)", which is (I guess) mixed up beyond belief, becomes "better" because every part is coded separately, even though in the linked data universe there will be no agreement on much of anything at all. In theory, I believe almost everyone can see the difference between:
300 $a1 score (232 pages)

and something like this (I made this up on the spur of the moment; people can differ on the terms, but changing the terms would not change the basic computer functionality):
<extent>
   <formatExtent type="score">
     1
   </formatExtent>
   <physicalExtent type="numberOfPages">
     232
   </physicalExtent>
</extent>
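Here is what the recoding step might look like in practice. This minimal Python sketch uses the standard library's `xml.etree.ElementTree`. It recodes a simple extent string into the made-up `<extent>` markup above, then rebuilds the display string from the separately coded values. Several parts are hypothetical simplifications: the regular expression, the function names, and the assumption that the subunit is always "pages". Anything else, even "1 atlas (3 volumes)", is rejected rather than guessed at.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern covering only the simple "<n> <unit> (<n> pages)" form.
SIMPLE_EXTENT = re.compile(r"^(\d+) (\w+) \((\d+) pages\)$")


def extent_to_xml(statement: str) -> ET.Element:
    """Recode a simple extent string into the ad hoc <extent> markup."""
    match = SIMPLE_EXTENT.match(statement)
    if match is None:
        raise ValueError(f"not a simple extent statement: {statement!r}")
    count, unit, pages = match.groups()
    extent = ET.Element("extent")
    fmt = ET.SubElement(extent, "formatExtent", type=unit)
    fmt.text = count
    phys = ET.SubElement(extent, "physicalExtent", type="numberOfPages")
    phys.text = pages
    return extent


def xml_to_display(extent: ET.Element) -> str:
    """Rebuild the familiar display string; strip() tolerates indented markup."""
    fmt = extent.find("formatExtent")
    phys = extent.find("physicalExtent")
    return f"{fmt.text.strip()} {fmt.get('type')} ({phys.text.strip()} pages)"


element = extent_to_xml("1 score (232 pages)")
print(ET.tostring(element, encoding="unicode"))
print(xml_to_display(element))
```

Going from coded data back to a display is the easy direction. Going from existing strings to coded data is where every additional pattern needs its own rule.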

Yes, computers would just love something like this, but that is almost beside the point. It's not that hard to understand. The overriding question, however, should not be focused on the machines but rather: is this useful for the *patrons*? I think this shows the difference between theory and practice pretty clearly. The second example definitely requires more work from the cataloger and adds complexity. What about older practices such as "5 p.l., 404, [1] p."? I know that some organizations still paginate this way, or at least they claim to. In the linked data universe, our records will be expected to interoperate with all kinds of practices we can't even imagine.
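The older pagination example shows why the extra coding is not free. This minimal Python sketch rests on assumptions made only for illustration. It treats a statement like "5 p.l., 404, [1] p." as a comma-separated list of sequences, where "p.l." marks preliminary leaves and square brackets mark unnumbered pages. The category names and patterns are hypothetical. Even this one statement needs three kinds of subunit, and anything outside the assumed patterns is kept as uncoded text.

```python
import re

# Hypothetical categories for a single pagination sequence (illustrative only).
PATTERNS = [
    ("preliminary_leaves", re.compile(r"^(\d+) p\.l\.$")),
    ("unnumbered_pages", re.compile(r"^\[(\d+)\](?: p\.)?$")),
    ("numbered_pages", re.compile(r"^(\d+)(?: p\.)?$")),
]


def split_pagination(statement: str) -> list:
    """Classify each comma-separated sequence; unmatched text is kept verbatim."""
    result = []
    for part in statement.split(","):
        part = part.strip()
        for kind, pattern in PATTERNS:
            match = pattern.match(part)
            if match:
                result.append((kind, match.group(1)))
                break
        else:
            # No rule applies: keep the sequence as uncoded text.
            result.append(("uncoded", part))
    return result


print(split_pagination("5 p.l., 404, [1] p."))
```

A roman-numeral sequence such as the "xii" in "xii, 404 p." falls through to `uncoded`. Every older convention would need its own rule.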

Shouldn't some kind of testing be done to discover the usefulness of this level of coding before we undertake anything like this? Has any user ever complained that they needed this sort of encoding? Has anyone ever requested anything like it? Of course, adding levels of coding can go on almost forever. Keep in mind that breaking consistency with the earlier records in our catalogs is always a *very serious* matter and should be considered very thoroughly. It is so serious because retrospective conversion will absolutely never, ever be done, so we immediately make those records obsolete. Therefore, it is a very serious decision to make a large percentage of your catalog obsolete, especially when those records are its major value. Finally, lacking any evidence to the contrary, we should assume that almost nobody else in the semantic web/linked data/whatever will be creating such levels of complexity. And this is only one of the simpler examples of this greater level of encoding.

When catalogers face so many different challenges today, and there are so many directions that could definitely help the public, why should we go looking for problems that may not even be there? Or do catalogers think they are actually leading, or, failing that, that they will become the leaders once these sorts of measures are implemented?

1 comment:

  1. I agree with both James and Thomas. We have to question to what degree describing the pagination, illustrative matter, sound, etc. meets user needs. At the same time, RDA has not adopted the data model that FRBR purists would prefer, whereby each element and attribute has its own identifier and its data value is added separately. RDA is also more difficult to implement as a standard from a practical standpoint because of the lack of punctuation in the rules. While libraries use MARC, there is some standardization in data entry, but displays have become varied. Maybe this is better for the user, maybe not. Time will tell. Speaking as the librarian of a small, specialized collection, I find that the lack of display standards integrated within the rules in RDA is an impediment to full adoption.
