Thursday, August 4, 2011

Re: XML vs. MARC

Posting to RDA-L

On 04/08/2011 13:00, Bernhard Eversberg wrote:
<snip>
Briefly: In the long run, inelegance cannot and should not win.
</snip>
Many people believe that the MARC format is terribly inelegant. In any case, it is very difficult for a non-expert to work with, and it makes no sense to put up additional roadblocks for developers who want to use our data. As the old saying goes: 'If the mountain will not come to Muhammad, then Muhammad must go to the mountain.' If we just provide our obsolete formats and expect everyone to drop everything to learn them, we are writing our own obituaries.
<snip>
That's not a built-in bonus of XML. Anyone who programs in Perl, PHP or Python (and others) can produce Web services not using XML. The service may *deliver* stuff wrapped in XML nonetheless, but it does not require the stuff to be wrapped like that in storage. So, for communication purposes you may still provide and enjoy all the comfort that XML may offer in particular environments and processes.
Plus it may be changed whenever needed with no internal changes in your database. The latter is extremely important and much underestimated.
</snip>
I agree completely. Perhaps I have found a point of mutual misunderstanding. The XML I am calling for is a *communications* format, not a storage format, just like MARC. MARC has not been a storage format for a long time either: I am sure your databases are not storing native ISO2709. I keep repeating that ISO2709 is used for *transfer* of records and not for storage. When I download a record from another library catalog, that catalog compiles the ISO2709 on the fly. My catalog then parses it and reworks it for its own structures, which are probably completely different from those of the remote catalog. At this point the ISO2709 format vanishes. If somebody later wants to download the record from my catalog, my catalog recompiles the ISO2709 record, along with any updates I have made, and sends it to the other catalog, and so on. This round trip works *only* among library catalogs. Therefore, ISO2709/MARC21 is a standard for library record *communication* and not for library catalog storage.
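
To make the "compile on the fly, then parse and forget" point concrete, here is a minimal sketch in Python of how a catalog might frame and unframe an ISO2709 record. It is only an illustration under stated assumptions: the function names `build_record` and `parse_record` are mine, the internal list of (tag, data) pairs stands in for whatever structures a real catalog uses, and the leader values are fixed placeholders rather than a complete MARC21 leader policy.

```python
# Minimal sketch of ISO 2709 framing; NOT a full MARC21 implementation.
# build_record/parse_record are hypothetical names used for illustration.
FT = b"\x1e"  # field terminator
RT = b"\x1d"  # record terminator
SF = b"\x1f"  # subfield delimiter


def build_record(fields):
    """Compile [(tag, data_bytes), ...] into one ISO 2709 record."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FT
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += b"%s%04d%05d" % (tag.encode("ascii"), len(data), len(body))
        body += data
    base = 24 + len(directory) + 1   # leader + directory + its terminator
    total = base + len(body) + 1     # plus the record terminator
    # 24-byte leader: record length, fixed placeholder codes, base address.
    leader = b"%05dnam a22%05d a 4500" % (total, base)
    return leader + directory + FT + body + RT


def parse_record(raw):
    """Unpack an ISO 2709 record back into [(tag, data_bytes), ...]."""
    base = int(raw[12:17])            # base address of data, from the leader
    directory = raw[24:base - 1]      # skip leader, stop before terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Slice the field and drop its trailing field terminator.
        fields.append((tag, raw[base + start:base + start + length - 1]))
    return fields


# The catalog's own (assumed) internal form; ISO 2709 exists only in transit.
internal = [("001", b"12345"), ("245", b"10" + SF + b"aXML vs. MARC")]
wire = build_record(internal)
received = parse_record(wire)
```

Once `parse_record` has run, the receiving side holds only its own structures again; the byte string in `wire` can be thrown away, which is exactly the sense in which ISO2709 is a transfer format.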

Exactly the same thing would/will happen with XML: it will be used to *transfer* records from one database with one structure into another system with completely different structures, at which point the original XML will either disappear or be used immediately for display purposes, but it will not be used for storage. In this scenario, live queries can return multiple records and let people ask for only the information they want, and the results can be displayed immediately in a web browser or in an app without any outside program to convert anything, because browsers can work with native XML. Therefore, we gain the advantage of tremendous flexibility compared with today.
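
As a rough sketch of what "multiple records, only the information they want" could look like on the consuming side, the following Python reads a MARCXML collection and pulls one subfield from each record. The sample response, the record identifiers and the helper name `pick` are all invented for illustration; only the MARCXML namespace is the real one, and a real service would of course return the XML over HTTP rather than from a string.

```python
import xml.etree.ElementTree as ET

# The MARCXML ("MARC21 slim") namespace.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical multi-record response from a live query (invented data).
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">rec1</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">First title</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">rec2</controlfield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Second title</subfield>
    </datafield>
  </record>
</collection>"""


def pick(xml_text, tag, code):
    """Return (record id, subfield value) for every record in a collection."""
    root = ET.fromstring(xml_text)
    results = []
    for rec in root.findall("marc:record", NS):
        ident = rec.findtext(
            "marc:controlfield[@tag='001']", default="", namespaces=NS
        )
        value = rec.findtext(
            f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']",
            default="",
            namespaces=NS,
        )
        results.append((ident, value))
    return results


titles = pick(SAMPLE, "245", "a")
```

Nothing here needs a MARC-specific library: a general-purpose XML parser is enough, which is the point about lowering the barrier for developers outside the library world.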

If I provide native XML from my catalog or database, my data can begin to participate in other initiatives on the web (library or non-library) through various types of web services. That frees those who want to use my data from the need to download gigabytes of data, much of which may be irrelevant to their purposes. With a bulk download, they may have to reformat the data before they can do anything with it, place it into a database, and maybe do some "tweaking" where there are problems in the data. Then the data immediately becomes dated, because the database that provided the original gigabytes has been updated in the meantime, so they also have to worry about maintenance. This is not what many developers want, since it duplicates a lot of the work already going on in the original database(s). Developers now expect to work with tools that run live queries against remote databases, and they expect those databases to return flexible results.

Although providing gigabytes of records through open data is great--I am not disputing that and I applaud the British Library--it is only one step and one option for getting into the web environment.

As for providing the schema: as I keep saying, various options exist right now, although we can all agree they are not perfect. There is MODS, or even dc simple, and yes, MARCXML. I don't care. We can provide them all, just so long as we get included in the wider world of the web. Whatever the format, it has to return multiple records for live queries of the database, not single records, and provide a flexible format for results that can be displayed immediately in a browser. Let's see somebody do that with ISO2709. (Also, with XML and tools such as Lucene, you don't even need a database but can just use lots of different indexes, but that's a different discussion.)

This is the sort of debate that should probably be on NGC4LIB.
