Monday, October 4, 2010

The Functional Requirements for Bibliographic Records, a personal journey Part 3

Link to pt. 2

Hello everyone. My name is Jim Weinheimer and welcome to Cataloging Matters, a series of podcasts about the future of libraries and cataloging, coming to you from the most beautiful, and the most romantic city in the world, Rome, Italy. This installment continues my personal journey with the Functional Requirements for Bibliographic Records (or FRBR).

This series: The Functional Requirements for Bibliographic Records: a personal journey, has gone on for two previous podcasts. I believe that this installment will make very little sense without the first two, so I strongly suggest that you listen first to them, in order. Links to the earlier podcasts are available from the transcript.

I also want to mention that I added a Google Translate widget to my blog, so that those who want it, can now get an instant translation of the transcript, or any of my other postings, in a surprisingly large number of languages. Still, we are all aware of the problems of automatic translation, so keep in mind that Google Translate is designed only to provide help, but translations may turn out to be faulty. If Google Translate has me say something stupid, it is not necessarily my fault!  

Now, to continue my journey:
Let me sum up where I am in my twelve step process, and pardon me for repeating the steps yet again, but I feel it is important:
------Renewed Determination
--------------Serious questioning
----------------Serious doubts

After encountering initial failure in my first efforts to understand FRBR, I now understood it, but based on my widening experience, I was being faced with questions without apparent answers. These questions I could not simply ignore, and consequently, I had entered my Consternation phase. Those first questions had to do with some rather uncomfortable facts concerning the manifestation, or as I was used to calling it, the edition. This had been an aspect of traditional cataloging that I had always taken for granted, and yet it became crystal clear to me that different communities viewed the same physical item in quite different ways. These differences were actually based on definitions, and I saw how those definitions could vary widely among different bibliographic communities: you could be a late-20th century AACR2/ISBD cataloger, a rare book cataloger, someone cataloging for FAO of the United Nations, a cataloger from the late 19th century, and so on, and each person would look at and relate to the same physical object in a unique, and often quite different, way.

I myself had experienced how on one day, a book was a new manifestation or edition, but because of a new LC Rule Interpretation, on the very next day, that same book was an item or copy. And this concerned a concept that to me had earlier seemed so fundamental and solid: whether this thing I am holding in my hands is a copy of something else or not! If there was no agreement on a point such as that, how could there possibly be any agreement on anything at all? (Jumping ahead for just a moment, this is the sort of reasoning that would eventually lead me to my Despair phase)

Still, it is important to say that I had no doubts as to the ultimate correctness of FRBR since its principles were based on Cutter’s Rules, which have served as the foundation of modern catalogs since the latter 19th century. These rules represent the solid base that we could all rely upon, and therefore, as far as the problems that I saw, I could safely put them on a shelf in the back of my mind labelled “Snags”, since I knew--or at least I had faith--that the problems I saw were either some kind of imperfection in my own understanding, or these were examples of some minor anomalies that would be worked out in time. As a result, I felt relatively comfortable and assured while I was in my Consternation phase.

But I couldn’t ignore it all forever and I was quickly entering my Serious Questioning phase. When I worked at the Food and Agriculture Organization of the United Nations (or FAO), I found myself looking at all kinds of cataloging from different communities, such as those I myself was working with, but I also saw other records from journal and book publishers who wanted to share our data, and still others from communities who were making online videos, from the geographic mapping community, and from various others around the world. At the same time, Google Books was just starting to get off the ground; I could see fabulous resources in the Internet Archive, and all sorts of digitized books were becoming available through different projects ranging from the University of Virginia to the University of Heidelberg to think tanks to fabulous antiquarian map sites from dealers. FAO itself was placing very important information online; not only books and documents, but conferences, videos and images, entire workshops, statistics and so on, and I saw how many other organizations and universities were doing the same things.

I discovered that few of these projects followed widely recognized standards, and often, some of these communities had no standards at all, or nothing that you could really label as standards. For example, someone might say that their records “followed the Dublin Core standards”, a peculiar idea that actually meant that the computer coding may have followed Dublin Core: i.e. creator, date, relation, and so on, but the information within the coding, (i.e. what I will call here the actual cataloging information) followed no standards at all, i.e. something authored by the “United Nations” could be entered as “United Nations” or “UN” or “U.N.” or “ONU” or “Naciones Unidas”, or a host of other possibilities.
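The difference between standardized coding and standardized content can be shown in a few lines of Python. This is only a minimal sketch under my own assumptions: the sample records, the report titles, and the small authority list are all made up for illustration and are not taken from any real database. Every record uses the same Dublin Core element name, "creator", yet the values inside it scatter the same body across five headings until they are matched against an agreed list of forms.

```python
# Hypothetical records: the element names follow Dublin Core,
# but the values inside them follow no shared rule.
records = [
    {"creator": "United Nations", "title": "Report A"},
    {"creator": "UN", "title": "Report B"},
    {"creator": "U.N.", "title": "Report C"},
    {"creator": "ONU", "title": "Report D"},
    {"creator": "Naciones Unidas", "title": "Report E"},
]

# Assumed authority list: every variant form points to one preferred heading.
AUTHORITY = {
    "united nations": "United Nations",
    "un": "United Nations",
    "u.n.": "United Nations",
    "onu": "United Nations",
    "naciones unidas": "United Nations",
}


def normalize_creator(value):
    """Return the preferred heading for a known variant, else the value unchanged."""
    return AUTHORITY.get(value.strip().lower(), value)


def group_by_creator(recs, normalize=False):
    """Group titles under their creator string, optionally normalized first."""
    groups = {}
    for rec in recs:
        key = normalize_creator(rec["creator"]) if normalize else rec["creator"]
        groups.setdefault(key, []).append(rec["title"])
    return groups


raw = group_by_creator(records)
controlled = group_by_creator(records, normalize=True)
print(len(raw), "separate headings without a shared standard")
print(len(controlled), "heading once the variants are brought together")
```

Without the lookup, a search under any one form misses the other four records, which is exactly the problem that the coding alone does not solve.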

And yet, at least these communities were semi-organized. Not the least important of these communities was the burgeoning open archive movement, where everything was supposed to be “self-managed” in some sort of way that was completely unclear to me.

Open Archives are a major topic and will be the subject of a future podcast, so I will avoid the discussion of them for now. For those who are interested however, I have placed a link in the transcript that goes to additional information. I suggest, and link to, Peter Suber’s Open Access Overview. For my purposes here, I simply want to point out that I felt I was witnessing an exponential growth in the immediate access to materials in open archives, that is, to materials that are highly important to patrons of any library.

It turns out that so far, my prediction of the rate of growth of these open archives has turned out to be true, although I think it was always a pretty safe bet. Links to the statistics are available in the transcript.
(Statistics of growth in open archives: ROAR (click on graphical analysis to generate it yourself) and OpenDOAR)

At the same time as I realized that the numbers of these new, easily accessible sources of important information had the potential to become genuinely overwhelming, and since I was at FAO and in communication with managers of different web sites, I also realized that these overwhelming numbers of resources did not necessarily imply a complete absence of metadata, since I discovered that metadata was continually being created at practically every level.

This metadata (I cannot find it within myself to call it “cataloging information”) was being created by publishers, journal databases, and sometimes by the authors themselves, as they added their materials into the open archives, but the vast majority of it was not being created by library catalogers. Much of this metadata was created for internal management and workflow purposes; for example, the publisher or editor could find out how far along a specific author may be on a chapter of a book, or if an item has been assigned an ISBN, and so on.

As a result, instead of a problem of quantity, the difficulty seemed rather to lie in the quality of the metadata. I had seen that there was quite a bit of metadata being produced according to accepted standards, such as our metadata that followed the AGRIS standards, but these standards did not follow AACR2 or ISBD. For example, the basic rule of “exact transcription of the title page” does not exist in the AGRIS rules, and catalogers are directed to enter only corrected titles. There are also some bibliographical concepts that do not exist in ISBD or AACR2. Therefore, although our AGRIS descriptions were standardized, the headings were standardized, the subjects were standardized and so on, they were all uncoordinated with respect to AACR2 and vice versa.

There were still other standards that were unrelated to any other standard, and the result was that the same items were cataloged over and over because the standards were not shared. Much of the rest of the metadata I saw did not adhere to any type of standards at all.

While it was certainly my opinion that AACR2-type cataloging was the “best”, I could not deny that there were many other standards that had been around for a long time, and that thousands of people, if not far more, had found them highly useful; therefore, I recognized that my personal bias in favor of AACR2 could just as easily be explained by the fact that my initial library training took place in the United States because I just happened to be born there, and not in some other country.

As an added complication, there was that damned Google that kept drawing me back like a moth to a flame, and doing a pretty good job of finding a lot of the information I wanted, so long as I wasn’t looking for anything in depth. As a result of these considerations, I found myself falling deeper and deeper into my Serious Questioning stage, although FRBR was not in my conscious thoughts in any focused way, as I had other tasks to attract my attention: practical cataloging and now, systems.

My time at FAO was when I made my first major strides into systems. I had made several rather extensive websites earlier, such as those of the Cataloging and Technical Services documentation at Princeton University Library, and several specialized cataloging manuals, one of which was my own Slavic Cataloging Manual, but databases had always remained beyond my abilities. While I had wanted to learn about databases and XML earlier, and had read books and asked people for help, I just could not understand it and nothing worked until I met two colleagues at FAO, who ultimately became my close friends. I am forever in their debt because they sat me down and showed me how to build simple databases and how to actually use XML.

I had been studying XML for some time, and one of the most important things I learned at FAO was that the way I had been approaching XML was completely wrong. XML is short for Extensible Markup Language, and its native format is terrifying to behold. I had focused on creating these terrifying XML files of bibliographic records, and the most that would happen was that I would run a file through a program that would tell me whether the XML document was “well formed” or not. If it wasn’t, then I had more work to do, but sometimes it would say that it was “well formed”, and ... that was it! Since nothing else happened, it was tough to get excited over the “success” that my XML was “well formed”, and it had always been completely anticlimactic. As a result, I could not grasp how there could be any practical advantages in converting our records to XML format, and I was exceedingly skeptical over whether libraries should change to XML from native MARC.
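For anyone who has not seen such a check, here is a minimal sketch of it in Python, using the parser in the standard library. The two sample records are invented for illustration and are not real AGRIS or MARC data. Note that this tests only whether the document is well formed, that is, whether the tags open and close correctly; it does not check validity against a schema, which the standard library parser does not do.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Invented sample records, for illustration only.
good = "<record><title>Soil maps of Africa</title></record>"
bad = "<record><title>Soil maps of Africa</record>"  # <title> is never closed

print(is_well_formed(good))
print(is_well_formed(bad))
```

A yes or a no is all the answer there is, which is why the result felt so anticlimactic on its own.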

At FAO, I discovered that while the XML format is certainly very important, that’s not the fun part. The cool options come from something called XSL-Transformations and related tools that work with XML. So long as your XML document is valid and well formed, then with XSL-Transformations, you can actually transform your XML record into anything you want.

Think about that for just a moment: anything you want.

So, I learned how to take an XML record (in my case, an AGRIS bibliographic record in XML format) and turn it into another format, as I did by turning it into MARC21; or I could make it into a pdf document, or an MS Word or Open Office document. I could transform XML-MARC records into web pages. I even found I could change batches of records into Excel sheets, where I could get some new views that could help for purposes of quality control.
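To give a feel for what such a transformation does, here is a small sketch in Python. It is a stand-in and not XSLT itself: the Python standard library has no XSLT processor, so the two "transformations" are written as ordinary functions. The element names (records, record, title, creator, date) and the sample data are my own assumptions for illustration, not the actual AGRIS or MARC XML schema. The point is only that one XML source can be turned into several quite different outputs, here a comma-separated file that a spreadsheet program can open and a fragment of a web page.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented batch of records; the element names are hypothetical.
XML_BATCH = """\
<records>
  <record>
    <title>Example title one</title>
    <creator>United Nations</creator>
    <date>2004</date>
  </record>
  <record>
    <title>Example title two</title>
    <creator>FAO</creator>
  </record>
</records>
"""

FIELDS = ["title", "creator", "date"]


def records_to_csv(xml_text):
    """Turn the batch into CSV text, one row per record, blank if a field is missing."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(FIELDS)
    for record in root.findall("record"):
        writer.writerow([record.findtext(field, default="") for field in FIELDS])
    return out.getvalue()


def records_to_html(xml_text):
    """Turn the same batch into an HTML list of titles."""
    root = ET.fromstring(xml_text)
    ul = ET.Element("ul")
    for record in root.findall("record"):
        li = ET.SubElement(ul, "li")
        li.text = record.findtext("title", default="")
    return ET.tostring(ul, encoding="unicode")


print(records_to_csv(XML_BATCH))
print(records_to_html(XML_BATCH))
```

In the spreadsheet view, the empty date cell in the second row stands out at once, which is the kind of quality control I mean.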

The first few times I did it, I thought it was magic. I figured that probably I could have even converted those records into a movie if I had wanted to badly enough. What does this mean in the real world? As one example, a newspaper encoded in XML can be published simultaneously (or transformed) as a printed document and as a website by simply applying a new XSL-Transformation, and therefore the newspaper itself only has to be created one time. Once again, this same XML file could probably even be transformed into a video.

So, XML on its own is not very exciting, but when paired with complementary technologies such as XSL, it can provide radically novel displays and can sort and re-sort records in a whole variety of ways. I kept telling myself: with XML, you can transform the record into anything you want. This is not something that can be fully understood immediately since it is so expansive, and I am still coming to grips with it myself. Anything means anything.  Now, as an aside, I will admit that probably it isn’t really anything, but I think it’s important at this point to assume that it literally is anything, so that we can eventually find and learn the limitations.

There are other technologies based on XML documents: XLinks, XQuery, XForms and so on that I have not worked with, but I am sure even these are not the end and that there will probably be other developments in the future. This is one reason why I believe it is vital for the library world to shift to XML formats of some kind, so that they can be transformed. I believed (and still believe) that such a capability will represent a fundamental break with the previous cataloging traditions and will have profound consequences for librarians and others, probably both good and bad, many of which we cannot foresee today.

When you add all of this to the possibilities arising from the complementary power of modern browsers and other systems to display and actually bring in information from distant databases on the fly using what are called web services, where everything can display on a single computer screen, and where the user can interact with it in various ways, the possibilities are literally endless. While I didn’t know how to make all of these things I am talking about, I could do a little bit, and that little bit helped me to understand more, and consequently to imagine possibilities that had never occurred to me earlier. For those who are interested, I have added links to some simple videos about XML and web services from the transcript. I especially recommend What is a Mashup? from ZDNet, which shows how fun it can be, and IBM’s more technical, but I don’t think overly so, An Introduction to XML: The Basics.

Now that I was concentrating on digital resources, I saw their numbers increasing at an unheard-of rate. What were the consequences? In libraries, I had seen and heard stories of major reorganizations, but while I heard specifics in some cases concerning the reorganization of catalog departments, I did not hear anything at all about how the number of catalogers would increase. In fact, I heard the opposite. Slowly, slowly, these realities of how the creation of and access to information were changing began to work their way into my brain, which led me even more deeply into my Serious Doubts phase.

Although at that time I was much less involved in the U.S. cataloging world, there was discussion about it at FAO, and I gave some presentations on FRBR to my colleagues. While I did my very best to describe FRBR, my subconscious doubts began to rise to the surface: What does the creation of works/ expressions/ manifestations/ items accessed by their authors/ titles/ subjects have to do with the enormities of the problems I saw? We were being faced with an avalanche of information from everywhere at once, so it seemed that what should take primary importance was to increase the number of catalog records by a quantum factor in some way. Otherwise, although we can claim that we have some type of “control”, it will be control over a very quickly diminishing percentage of worthwhile resources, until it becomes practically infinitesimal and useless. How in the world could you keep a straight face when you declare that you have control?

But on the other hand, how could productivity be increased like that? Catalogers were working hard already and no one was even suggesting that new catalogers would or even could be hired on in enough numbers that would make an appreciable difference.

But I remembered that the problem did not seem to be with the quantity of metadata, but the quality, and so to me, a major part of any solution was obviously to work together somehow. Yet, I realized that this innocent-sounding little phrase “work together” held a vast number of consequences and troubles and fights that seemed so insoluble that I personally did not want to think about them.

How could everyone possibly work together? There wasn’t even agreement on what a manifestation was, so what did this portend for expressions and works and everything else? At the same time, when resources were available with just a click, for me it was pretty much irrelevant whether it clicked into a pdf file or an html file that might include an image or even a video.

Another concept I learned at FAO, which is very important in that institution and I believe also exists in other fields, and in those where it does not exist, it should, is the concept of sustainability. In agriculture, sustainability is demonstrated in the saying we have all heard: give me a fish and I eat for a day, but teach me to fish and I eat for a lifetime. This is absolutely true, but the key is to realize that such an idea is not limited only to fishing or agriculture. In all fields, there are quick-fix solutions and long-term solutions. Quick-fix solutions may be necessary, but by definition, they deal only with emergencies and cannot be relied upon in the long term. Therefore, such solutions are not sustainable.

On the other hand, a long-term solution is just what it says: a solution that will not necessarily last until the end of time, but at least it will for the foreseeable future and consequently, such a solution is sustainable. To take a specific example, there can be emergency solutions for villages faced with a temporary shortfall of rain and the locals need water for simple survival, so everybody gets together and takes them water. But in the long term, if global warming is turning the area into a desert, other solutions, far more drastic, will be needed.

Emergency solutions normally do not engender too much resistance because after all, everyone is facing an emergency, but long-term solutions inevitably cause tensions because in those cases, people are contemplating genuine, unavoidable changes, that is, changes that will last, for all practical purposes, forever. In such situations, there will be winners, and there will be losers. The winners in the current situation may not be the same after the imposition of the long-term solutions, and they may fear that they will turn out to be the losers; therefore sometimes they will oppose the long-term solutions. Nevertheless, changes will be unavoidable, since in the example above, the desert is advancing inexorably, and it is vital to avoid an endless number of increasingly severe emergencies, each requiring any number of quick-fix emergency solutions, that in any case must all end up in failure and possible catastrophe.

After my time at FAO, I felt that sustainability should be an important concept for catalogs and cataloging, as well.

Around this time, fully in my Serious Doubts phase, I left FAO to become Director of the Library of the American University of Rome, a small, undergraduate institution. For the first time in my library career, I would begin to work both regularly and extensively with the public as a reference librarian, and almost immediately as I started work with undergraduates and faculty on their research, I fell into my Disillusionment phase, followed rather quickly by Despair.

At this point, I shall stop once again, and save the rest of my journey for yet another podcast, part 4. I will do my best to finish in the next installment, but I can’t promise anything because there is still quite a bit to cover: my experiences with the public, and seeing how they work with information and what they expect to be able to do with it, plus my struggles with Information Literacy, led me toward new phases. But I’ll talk about them later.

The music I have chosen to end this segment is an excerpt from a fun piece by Marco Uccellini, called Aria quinta sopra la Bergamasca or the Fifth tune for the Bergomask performed by the group Il Giardino Armonico. For everyone’s information, I discovered that a bergomask is a dance that made fun of the people from Bergamo, a region of northern Italy, who were supposed to be notoriously bad dancers. For example, in Shakespeare’s A Midsummer Night’s Dream, the clowns dance a bergomask. The piece here is an excerpt; if you would like to listen to the entire piece, the link is available from the transcript.

That’s it for now. Thank you for listening to “Cataloging Matters” with Jim Weinheimer, coming to you from Rome, Italy, the most beautiful, and the most romantic, city in the world.
