Friday, January 21, 2011

RE: Linked data

Posting to RDA-L

Jonathan Rochkind wrote:
Okay, again, give me the algorithm for software to use to figure out what title(s) to use to display to the user (assuming we don't just want to put out the whole 245 with ISBD punctuation and all).
How does software know when to get exactly what from a 240 or 740, and when to use it as a title label? (A 240 is often useless as a title label, e.g. "Selections", that is NOT in fact the title of the work it's attached to, to any user at all; a work cited in a 740 may or may not actually be the work in the record at hand, it could also be a 'related' work in some way).
I am completely confused here. My original contention in this thread was that MARCXML is *not* a perfect solution, but that it still represents one small but important step in the direction libraries need to go, and that it can be implemented relatively easily. There are still problems with MARC in an XML format, but this should not stop us from switching to MARCXML so that we can take that first step toward placing our records into the wider information universe. Then the discussion turned to parsing other titles out of the 245, but I showed that this is unnecessary, since catalogers do this manually at the point of cataloging, so long as the cataloger is following the standards. Any catalogers who do not do this are, automatically, producing sub-standard records.
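
To make the "relatively easily" point concrete, here is a minimal sketch of reading a title statement out of a MARCXML record with nothing but the Python standard library. The sample record and the helper name `subfields` are made up for illustration; only the MARCXML namespace and the `datafield`/`subfield` element names come from the MARCXML schema itself.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC 21 slim") records.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Twain, Mark,</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="4">
    <subfield code="a">The adventures of Tom Sawyer /</subfield>
    <subfield code="c">by Mark Twain.</subfield>
  </datafield>
</record>"""


def subfields(record, tag):
    """Return (code, value) pairs for every subfield of every datafield with this tag."""
    pairs = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
        for sub in field.findall("marc:subfield", MARC_NS):
            pairs.append((sub.get("code"), (sub.text or "").strip()))
    return pairs


record = ET.fromstring(SAMPLE)
title_statement = subfields(record, "245")
```

Any general-purpose XML tool can do the same; no ISO2709-specific reader is involved.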

Now that the problem of parsing has been dealt with, the "problem" shifts to how to *label* the different titles. I want to point out that this is not at all the same issue as the one discussed either by myself or in the article. But OK, let's discuss it.

All of this has been defined for a long time. The 245 is, by definition, the title appearing on the chief source of information of the item the cataloger is describing, transcribed as exactly as possible. 95% of the time, this is a no-brainer; probably 3% of the time it takes some thought but is still easy; 2% of the time, it may be tough and the cataloger must make a decision that others may disagree with. (In my own experience, these kinds of problems mostly stem not from deciding which title to use, but from deciding upon one chief source of information from several choices.) In any case, the real purpose of the 245 field is *identification* and *not* access, although in the past a title added entry card was often made, and 245s are invariably indexed today. Additional access is made for titles that are buried within the title statement.

All of the other title fields are there for additional access for various reasons, e.g. a uniform title, which is used to bring different versions with variant titles together, such as language versions, variant editions and so on.

Another type of title belongs to something larger or smaller in different ways: a series title, host item entry and constituent entry all describe greater "entities" or lesser ones that the item described relates to. These can be dealt with as separate records or within a single record. This is shown in the different treatments of series/serial records, host/constituent or analytics.

Another type of title covers the different ways that the title in the 245 may be entered. Today, these go in the 246 field, e.g. a corrected spelling of the title, variant titles from the chief source of information, etc. The 740 is kind of a grab bag for titles that can't be placed anywhere else. Maybe the 740 is muddled a bit, especially because the 246 field for books was instituted only in the 1990s, but nothing is perfect. (It would be interesting for OCLC to run a test and find out how many 740s there are now and how many of those are not from the 245.)
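
The groupings described above can be written down as a simple lookup table. This is only a sketch: the role labels are my own paraphrases, the tag list is deliberately incomplete, and the names `TITLE_ROLES`, `title_role` and `titles_by_role` are hypothetical.

```python
# Simplified tag-to-role table; not a complete list of MARC 21 title fields.
TITLE_ROLES = {
    "245": "transcribed title (identification)",
    "130": "uniform title (collocation)",
    "240": "uniform title (collocation)",
    "246": "variant form of the 245 title",
    "490": "series (larger entity)",
    "773": "host item (larger entity)",
    "774": "constituent unit (smaller entity)",
    "740": "uncontrolled related or analytical title",
}


def title_role(tag):
    """Look up the role of a field tag; unknown tags get a fallback label."""
    return TITLE_ROLES.get(tag, "not a title field in this sketch")


def titles_by_role(fields):
    """Group (tag, title) pairs by the role their tag plays."""
    grouped = {}
    for tag, title in fields:
        grouped.setdefault(title_role(tag), []).append(title)
    return grouped
```

For example, `titles_by_role([("245", "A"), ("246", "B")])` keeps the transcribed title and its variant under separate labels, which is the distinction the fields already encode.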

This is about as accurate as we can get in practice. If we want to devote the resources for greater specificity, there should be some kind of evidence that the users of the system need it. I think that would be very difficult to prove. Certainly librarians don't need any greater specificity.

Cataloging theory decided long ago which of these titles to display as "THE" title: what is found on the chief source of the item, transcribed as exactly as possible, and this is placed in the 245 field. There is no problem here. A 240 "Selections" is not, by definition, the title of any item, but it is also not useless. It is useless *only* if the remainder of the record is ignored. 240 "Selections" implies, again by definition, a 1xx field. You cannot have a 240 without a 1xx; otherwise you would have a 130. These treatments are followed into the 7xx and 8xx fields and even the 6xx fields when required. (Why there is 1xx/240 as opposed to 1xx$a$t I have never understood, but that is a matter for historians to answer.)
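
As a rough sketch of the rule just described: display the 245, and read a 240 only together with its 1xx. The record layout (a dict mapping each tag to a list of subfield dicts, one value per subfield code), the sample data, and the function names are all assumptions made for illustration; real records have repeatable subfields and more punctuation cases than are handled here.

```python
def first_field(record, tags):
    """Return the first field found under any of the given tags, or None."""
    for tag in tags:
        if record.get(tag):
            return record[tag][0]
    return None


def display_title(record):
    """The title to show: the transcribed 245 ($a plus any $b), minus the trailing ' /'."""
    field = first_field(record, ["245"])
    if field is None:
        return None
    parts = [field[code] for code in ("a", "b") if code in field]
    return " ".join(parts).rstrip(" /")


def collocating_heading(record):
    """Uniform title heading: 1xx plus 240, or a 130 standing alone."""
    uniform = first_field(record, ["240"])
    if uniform is not None:
        main = first_field(record, ["100", "110", "111"])
        if main is None:
            # By definition a 240 implies a 1xx; without one it should be a 130.
            raise ValueError("240 present without a 1xx main entry")
        return f"{main['a'].rstrip(' ,.')}. {uniform['a']}"
    alone = first_field(record, ["130"])
    return alone["a"] if alone is not None else None


# Invented sample record.
sample = {
    "100": [{"a": "Twain, Mark,"}],
    "240": [{"a": "Selections"}],
    "245": [{"a": "The best of Mark Twain /", "c": "edited by A. Nobody."}],
}
```

With this sample, `display_title(sample)` gives the transcribed title, while `collocating_heading(sample)` gives "Twain, Mark. Selections", which is meaningful precisely because the 240 is not read in isolation.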

I personally see no problems with any of this and I think it is pretty well done. There are in-depth rules for all of it. Catalogers have gone out of their way for quite some time to input this information accurately. If there actually is a problem of understanding the practice, it seems to me this may be because everything catalogers do is based on methods designed for card catalogs, where everybody could see how they worked much more clearly than in an OPAC. These methods nevertheless work. And they certainly work far better than *any kind* of metadata I have seen from publishers. So, if our records are poor (which I don't believe is correct), ONIX data cannot be seen as one bit better.

Besides, going back to my original point, none of this justifies the continuation of ISO2709. Certainly MARCXML format as it now stands is not perfect, but we need to take matters one step at a time! Otherwise, we will never start.

This does not end the argument by any means, primarily because we are on the Internet, where not every bibliographic agency follows AACR2 or ISBD, and I am positive many of these other agencies will not follow RDA. Ultimately, we must work with these other agencies. Perhaps sharing our records with the general public could help us solve problems of this sort that we encounter.
