Wednesday, August 3, 2011

Re: Browse and search BNB open data

Posting to RDA-L

On 03/08/2011 08:34, Bernhard Eversberg wrote:
<snip>
02.08.2011 18:34, J. McRee Elrod:
http://www.allegro-c.de/db/a30/bl.htm
Am I correct that there is no MARC display available?
OK, for what it's worth and for good measure, I've added that in; no big deal since we've got what it takes.

Now, MARC appears directly underneath the regular display. But only as complete and as correct as the stuff that was released. The format made available by the BL is an XML schema of their own design, documented here:
http://www.bl.uk/bibliographic/datafree.html
(under "Data model & draft schema")
</snip>
This is interesting. From the table http://www.bl.uk/bibliographic/pdfs/marctordfxmlmappingsv0-3-2.pdf, we can see how some of the semantics of the MARC format are lost in the conversion. As we evolve away from MARC, I am sure the direction will be toward simplification, so it seems valuable to discuss what could be eliminated from MARC with the fewest consequences. From a very quick review of the table, I see the 534 being translated to dcterms:description, losing some handy subfields, and all of the subfields in the 100/700 fields being mapped to dcterms:creator. Also, all of the subfields in the 6xx fields are placed into dcterms:subject, losing the distinctions among the $a, $v, $x, $y, and $z subfields.

I need to emphasize that what is being discussed is the loss of the specific subfield *coding*, NOT the loss of the information itself, e.g.
100 0_ |a Benedict |b XVI, |c Pope, |d 1927-
as opposed to
<dcterms:creator>Benedict XVI, Pope, 1927-</dcterms:creator>

In practical terms for all the various metadata communities, where precisely is the loss here?
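The loss is one-way: a minimal sketch in plain Python (no MARC library; the subfield list below simply stands in for a parsed 100 field) shows that producing the flattened string is trivial, but nothing in the result lets a machine recover which part was $a (name), $b (numeration), $c (title), or $d (dates):

```python
# Hypothetical parsed 100 field: (subfield code, value) pairs.
subfields = [("a", "Benedict"), ("b", "XVI,"), ("c", "Pope,"), ("d", "1927-")]

# Flattening to a dcterms:creator-style string is easy...
flattened = " ".join(value for code, value in subfields)
print(flattened)  # Benedict XVI, Pope, 1927-

# ...but the reverse is not: the flat string carries no marker for
# where the name ends and the numeration, title, or dates begin.
```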

While there is an undoubted loss of semantics, given the future evolution of the MARC format we can ask: do such losses have any practical consequences? Although I think many subfields (though not the information they carry) could disappear without any essential loss, losing some of them would have important consequences for different communities. For instance, we see in the mapping the complete elimination of 245$c, which would obviously have important consequences for *librarians* (it is necessary for the identification of a copy), although its loss would be much less dire for users. The losses with the most serious consequences would seem to be the subfields in the 6xx fields, since those semantics *could* enable novel kinds of computer manipulation: sorting chronologically, geographically, and in all sorts of other ways. Also, the distinctions among:
650$aHistory$xBibliography
650$aHistory$vBibliography
650$aBibliography$xHistory

would be lost.
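A quick sketch of that collision (plain Python; the " -- " separator is an assumption about how the subject strings are joined): once the $x/$v coding is dropped, the first two headings become literally indistinguishable, while the third survives only by accident of word order.

```python
# Three distinct 650 encodings: topical $x vs. form $v vs. swapped $a.
headings = [
    [("a", "History"), ("x", "Bibliography")],
    [("a", "History"), ("v", "Bibliography")],
    [("a", "Bibliography"), ("x", "History")],
]

# Flatten each to a dcterms:subject-style string, dropping the codes.
flattened = [" -- ".join(value for code, value in h) for h in headings]

print(flattened[0] == flattened[1])  # True: $x and $v collapse together
print(flattened[0] == flattened[2])  # False: only the word order differs
```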

Compare this to losing the subfields in the 1xx/7xx, where the consequences would appear to be far less serious.

Yet, compare this to what others want: even more semantics, for example, to encode 300$a even further to specify pages or leaves or whatever. e.g.
<datafield tag="300">
   <subfield code="a">
      <pages>245</pages>
      <leaves>56</leaves>
   </subfield>
</datafield>
etc.
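Such finer-grained coding is, admittedly, trivial for a machine to consume. A minimal sketch with Python's standard library, assuming a well-formed XML rendering of the 300 field above (the <pages> and <leaves> elements are hypothetical, not part of any existing schema):

```python
import xml.etree.ElementTree as ET

# A well-formed rendering of the hypothetical finer-grained 300 field.
record = """\
<datafield tag="300">
  <subfield code="a">
    <pages>245</pages>
    <leaves>56</leaves>
  </subfield>
</datafield>"""

field = ET.fromstring(record)
pages = field.findtext("subfield/pages")
leaves = field.findtext("subfield/leaves")
print(pages, leaves)  # 245 56
```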

There are definite advantages to this level of coding, but on the negative side it is more work, prone to many more errors, and makes it more difficult to train new people, especially as there will be a push to simplify.

I think these questions will begin to be asked (finally!), and answered too. This project from the British Library may be a great catalyst for the discussion.
