Thursday, July 26, 2012

RE: RDA questions from librarians at small libraries

Posting to Autocat

On 26/07/2012 15:33, Marc Truitt wrote:
Similarly, we have more than once heard the argument that "better systems" ought to precede new metadata standards.  This is a chicken-and-egg viewpoint if ever there was one.  The simple, if unpleasant truth is that system designers and vendors are highly reluctant to design and market new systems, absent established metadata standards on which to base them.  Systems people must design to some *specification*, particularly if they are to hope for their products to integrate and be interoperable with the environment in which they function.  Stark evidence of this fact can be seen in the differences between ILS functionality and interoperability in the areas of cataloguing as compared with acquisitions.  Whatever our complaints about the varying ways in which our systems handle MARC-based bibliographic metadata, so long as the metadata itself is reasonably standards compliant, there will always be some level of core functionality on which we can depend.  Even more important, and again as long as the records are reasonably standards compliant, we always have the hope that we can migrate our data to newer, better systems down the road.

Compare this with the scandal that is ILS acquisitions metadata.  There are virtually no standards, save for those promulgated by publishers and vendors -- in other words, not designed for library use -- and even these are as often as not observed at best only in the breach. Acquisitions data are notoriously difficult to migrate between systems, given the lack of an accepted standard and the fact that systems implement proprietary functionality and libraries tend to customize or establish system work-arounds to accommodate local workflows.  Yikes! 

I think a lot of the problems of the cataloger/IT divide stem from different interpretations of the meaning of "metadata". For many IT people, metadata means the coding, while the "information inside the coding" is the "data" which populates the database. IT people are not that interested in the data, except that it is in UTF-8 or ISO-8859 or, if it is a date or a code, that it follows some accepted standard. We see this clearly in statistics databases, which will have the coding for a specific field (this is the metadata), e.g. acreage devoted to wheat, and one instance of this field has the statistic of 500 (this is the data). There are also the provisos that this statistic represents hundreds or thousands (50,000 or 500,000) and is in acres or hectares or square kilometers. This is what many IT people mean when they point out that our "text" (our titles, authors, dates, paging and other bibliographic concepts) should be transformed into "data". The "500" in the statistics database really can do a lot when related to other "data" out there (linked data!), but if it is misinterpreted, and in one database 500 is hundreds while in another it is thousands, the results will be gobbledygook and any interpretations drawn from them will be wrong.
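To make the wheat example concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Statistic` class, its field names and the two conversion tables are my own assumptions, not taken from any real statistics database. It only shows that two records holding the same bare "500" cannot be combined safely until the scale and the unit are spelled out.

```python
from dataclasses import dataclass

# Hypothetical lookup tables, for illustration only.
SCALE = {"hundreds": 100, "thousands": 1_000}
TO_HECTARES = {
    "hectares": 1.0,
    "acres": 0.40468564224,      # one international acre in hectares
    "square kilometers": 100.0,
}


@dataclass(frozen=True)
class Statistic:
    field: str    # the "metadata" in the IT sense: what the number describes
    value: float  # the "data": the bare number held in the database
    scale: str    # proviso 1: hundreds or thousands
    unit: str     # proviso 2: acres, hectares or square kilometers

    def in_hectares(self) -> float:
        """Normalize the bare value using its scale and unit."""
        return self.value * SCALE[self.scale] * TO_HECTARES[self.unit]


# Two databases record "500" for the same field but mean different things.
a = Statistic("acreage devoted to wheat", 500, "hundreds", "acres")
b = Statistic("acreage devoted to wheat", 500, "thousands", "acres")

# Naive linking adds the bare numbers as if they were the same kind of thing.
naive_total = a.value + b.value

# Linking after normalization respects the provisos attached to each record.
normalized_total = a.in_hectares() + b.in_hectares()
```

The naive sum treats both records as equal, while the normalized values differ by a factor of ten. The field name was the same in both databases; what differed was how the people entering the number interpreted it.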

The same mentality is seen in the discussions of IT people about ISBD, where they often focus solely on the two or three pages of punctuation while the rest of it is of no interest to them. IT people look at ISBD and think, "How old and obsolete is that??!!" Catalogers don't care much about the punctuation and are instead interested in the hundreds of pages of absolutely essential guidelines on how to approach an item, determine which information to choose, which to ignore, and how to input it, all in a standardized way. So, when you state "absent established metadata standards on which to base them", lots of catalogers will reply that our metadata standards go back further than anybody else's. It's a different concept of metadata vs. data.

(To add yet another level of complication, David Weinberger gave a talk and mentioned that "Metadata isn't what it used to be". He claims that today, metadata is what you know--a few lines of a poem, a couple of words of a song, the color of a book whose author or title you can't remember--and data is what you are looking for. He may be absolutely right, and then the IT people *AND* catalogers will have even more to argue about! For those who are interested, there is a link to his very interesting talk in a paper of mine.)

Ultimately, I think that much of the problem is that IT people insinuate themselves too much into cataloging matters, and catalogers insinuate themselves too much into IT matters. For me, I simply don't care one bit if MARC 100 changes into "creator", "author", "writer" or "abc123", so long as it works. To me, it's just a bunch of stupid computer codes and one code will work just as well as any other. The computer doesn't care. What matters is that everyone who is inputting real information into that field/area must interpret it in the same way, and that the information is compatible somehow. And there lies the real problem. For instance, in some arguments I have had with IT people, they have maintained that the "creator" of a scan of the Mona Lisa should be the name of the scanner software. After all, it's an image file, not the physical resource of the Mona Lisa hanging in the Louvre, and no human created the file. Others have said that the person who pushed the button of the scanner is the "creator". The concept of "title" can be interpreted in a host of ways, as can each and every bibliographic concept. Arguments such as these are very tiring and depressing to me, and they are why I regret that numbers will not be used, so that everybody would have to look up a field's meaning instead of deciding that they know what "title" means. Numbers will not be used, however, and this is where I fear that gobbledygook genuinely threatens, especially in the looming linked data universe that so many appear to be looking forward to. It frightens me.

One obvious step toward a solution is to separate the IT responsibilities from the cataloging responsibilities. But I don't know if that will happen anytime soon.
