Wednesday, November 30, 2011

Spelling out abbreviations

Posting to RDA-L

J. McRee Elrod wrote:
Programming to display lower case "p." as "pages" in collation when following a number, will not display upper case "P." in an entry or statement of responsibility as "Pages". Better a one time fairly simple programming provision, than all that keying with increased probability of typos.

Matters which should be addressed by ILS development, rather than AACR2/MARC21 replacement, are being addressed by changing standards which have not yet been fully utilized.
Absolutely right. Also, it must be accepted that the old records can never be upgraded by hand (what a waste of resources that would be!), and every patron will see abbreviations in every single search they do until the end of time. Therefore, if we want to solve this "problem" of abbreviations from the point of view of the *patron*, it can only be done in some kind of automated way. These are simple facts.
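Just to make Mac's "one time fairly simple programming provision" concrete, here is a hedged sketch of such a display-time transformation (the function and regex are my own invention, not actual ILS code):

```python
import re

def expand_collation(text: str) -> str:
    """Expand "p." to "pages" at display time, but only when it follows
    a number, so an upper case "P." in an entry or statement of
    responsibility is left untouched."""
    return re.sub(r"(?<=\d) p\.", " pages", text)

print(expand_collation("xv, 230 p. : ill. ; 24 cm."))  # xv, 230 pages : ill. ; 24 cm.
```

A real provision would need more patterns ("ill.", "v.", and so on) and more care, but the principle stands: one rule applied at display time instead of thousands of hand edits with their increased probability of typos.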

Of course, resources will also be needed to automatically "correct" the abbreviations, and it is only natural to question whether this is the best use of our ever-diminishing resources. People certainly have problems with our catalogs, but are the problems with abbreviations really such a major impediment? In my experience, the idea of authority control is a mystery to most people, and how our subject headings work is more or less incomprehensible to them.

Thursday, November 24, 2011

Re: The bibliographic universe

Posting to RDA-L

On Wed, Nov 23, 2011 at 5:33 AM, Cossham, Amanda wrote:
We know that FRBR is a conceptual model of the bibliographic universe. However, there is no generally accepted definition of 'bibliographic universe' nor does the original IFLA report define it. Some definitions are hugely broad, some exclude maps and music, others imply any textual material but in practice mean what is held by libraries.

So, I'm collecting definitions to see how broad or narrow this universe is, and what FRBR might or might not be useful for. You're welcome to comment on any of the definitions I've collected so far, or add others.
I think that the concept of the "bibliographic universe" has changed in fundamental ways with the advent of the web. Today, any item on the web can become part of this universe, so I think we need to discuss first what is going on. This involves the concept of "metadata," which, I think, is subject to misinterpretation.

When I began as moderator of an ASIS&T listserv for information architecture quite a few years ago, there was a woman from Lucent who was interested in metadata, and we began to exchange emails discussing it. I thought I had a good understanding of what metadata was (i.e. library catalog information, primarily), and we exchanged about five in-depth emails before I realized we were discussing completely different things! She was just as surprised as I was! We used the same terminology in lots of cases, but our "universes" were completely different, so we wound up discussing what seemed to be the same things, but in different planes of reality that never met.

So, what is metadata? As I wrote once somewhere, "metadata" is nothing essentially new: libraries have always kept all kinds of information in different files: card catalogs, shelflists, acquisitions lists, borrowers, desiderata etc. etc. etc. But other organizations have always kept their information in all kinds of files: banks, governments, the military, businesses, courts, universities, doctors, etc. etc. etc. Nothing new. Starting in the 1970s (about), the price of computers began to come down and software was made, so that it became easier and cheaper to handle all of these files using computer databases, and everybody began to put this information into computers. Nothing really new, just change in format.

But then, the internet appeared, and it seemed that almost overnight, all of these separate computers were suddenly linked together. From that point, the information in these different databases could interoperate. That had never really been possible before and, from this point the group term "metadata" could begin to be used in a meaningful way, denoting in essence: the totality of the information stored in all of these different databases. When you think of all of this information, some of it very sensitive, the possibility that all of this metadata can interoperate obviously holds both opportunities and dangers and is something that we are all still dealing with.

In this broad scheme of metadata, library metadata is very small and commands very little respect. This is what I began to realize way back then, when I exchanged emails with that woman from Lucent. Metadata that leads to money and power gets respect. Google and Facebook are based on metadata, but it's their kind of metadata, not ours. Our metadata does not lead to money and power, at least not immediately, which is a major reason why library metadata is handled almost as an afterthought by the big information agencies.

To me, this sea change in the information environment forces a reconsideration of the original idea of the "bibliographic universe," which now seems almost quaint. It must be updated to include everything that is available on the World Wide Web, if not much more, since this is the reality of what people deal with on a daily basis. If someone had told me just 25 years ago that this would happen, I would have said they were crazy!

Concerning the concept of the "document," as Hal pointed out, I think this is changing too since people want things granularized and summarized and repackaged in a whole variety of ways. Therefore, the "document" changes constantly, is often created dynamically, and other "documents" disappear without a trace.

Re: On demand print editions

Posting to Autocat

On 21/11/2011 21:05, Clumpner, Krista E wrote:
Once, in jest, I made up "ANSI standards" for Publishers. One of the standards was you can only use an ISBN for one title, one edition. But, of course, the problem is no one can enforce standards on publishers.
Michael Cohen wrote:
Yes, publishers use ISBNs for their own purposes, and don't understand that they can be used for other purposes as well, like linking out to cover art. Search this title in WorldCat and note the picture you get:  "Evapotranspiration and irrigation scheduling"
I love that example! Too funny! I'll bet there are lots of others too! But what isn't funny is this: how can we believe that publishers will give us RDA-compliant information when they can't even do "the right thing" with the ISBNs that they themselves supply? It just doesn't seem even remotely possible to me.

Saturday, November 19, 2011

Re: On demand print editions

Posting to Autocat

On 19/11/2011 02:30, mike tribby wrote:
Publishers purchase their ISBNs in various sizes of blocks and the sad but immutable fact is that they sometimes, either out of venality, ignorance, or some other unpleasant motivation, re-use their ISBNs, possibly comforted by the thought that they paid for it, and they're going to get all the use they can out of their ISBN block purchase.

The brave new world to which you refer and its herald RDA will likely have little to do with publishers, whether intentionally or not, upsetting our bibliographic-control apple cart by reusing ISBNs. As Ian's posting suggested, that's one reason we have subfield z available in the 020. Some publishers make it a practice to reuse ISBNs. In some cases they have been apprised of the error of their ways-- and yet they persist.
Publishers have a completely different attitude (i.e. a business attitude) toward books and metadata than libraries do. Publishers are interested in money (understandably enough), and therefore, if an item has been out of print for a while, it effectively no longer exists in their universe. Reusing an ISBN is thus no problem in their view, since it still provides a single point of reference to the products they are currently creating. From the publisher's point of view, there is no error in reusing an ISBN because doing so conflicts with none of their own business practices.

In the larger world however, ISBNs have taken on a use that, in the publisher's opinion, is incorrect since ISBNs should be applicable only to the products being currently published. Therefore, the problem is not with them (in their view) but with everybody else.

Unfortunately, I agree that publishers probably won't change their current practices because they do not see it as in their interests to do so. It would be up to the library/bibliographic community to somehow make it worth their while to concern themselves with matters that do not affect their business.

This is a great example of RDA and publishers. If publishers won't even deal with the ISBNs, why in the world would they be more willing to give us RDA-compliant metadata when they don't give us AACR2? I can't see it.

Sunday, November 13, 2011

Re: [ACAT] Bibliographical references - include pages?

Posting to Autocat

On 11/11/2011 17:36, Marcia McKenzie wrote:
Is there a standard practice for whether/when to include pages when citing bibliographical references in a 504 note? There does not appear to be any consistency in OCLC records and the various manuals I've consulted. Of course, if references are scattered throughout a book it would not be possible to include pages, but even when they are gathered in one section, sometimes pages are noted and other times not. And when they are both scattered throughout the book and there is a "References" section at the end, it can be difficult to tell whether the section at the end includes all the sources mentioned throughout the book.
As others have pointed out, the rule is LCRI 2.7B18 [] but the current rule has been simplified significantly from the earlier practice. This is an example of the "cataloging simplification" that has been going on for a long time, and this particular simplification is one I personally never cared for. The original practice, given in 1982 in CSB no. 17, allowed for much greater clarity than the generic "Includes bibliographical references" with page numbers. Many catalogers (thankfully!) have continued to follow the old practice, especially when the bibliography has a separate title, e.g. "List of works of William Hull": p. 242. Even though that example is a single page, such notes were often extremely useful to me as a researcher: something like "Complete bibliography of [author]" was often just what I needed, even though the book did not contain all of those materials. In my own view, the practice of a bare "Includes bibliographical references" represented a serious downgrade in the usefulness of the record.

Of course, this did represent an improvement over the version of LCRI in CSB 44, where the rule did not allow adding the page numbers. Then, after an outcry, in CSB 47, they said to start adding the page numbers again.

The reason for this change, it seemed to me, was that they wanted the "b" in the fixed field to display automatically from that code and the cataloger wouldn't enter the information manually. That never happened; catalogers kept adding information manually, so there was no real savings from typing bibliographical references vs. bibliographical footnotes or even bibliographical endnotes.

It is amazing that I can point to these LCRIs all the way from Rome, Italy! I thank the Library of Congress for making these valuable documents available.

Friday, November 11, 2011

Re: Apocrypha

Posting to RDA-L

On Thu, Nov 10, 2011 at 3:22 PM, Armin Stephan wrote:
The work "Genesis" is the work "genesis". I see no need for any qualifier at all.
(AACR cataloguers use to qualify everything. German cataloging tradition shows, that it is possible to use less qualifiers.)
I would just like to point out the Wikipedia disambiguation page for Genesis:

As I have pointed out before, the disambiguation pages of Wikipedia are one area where we can see a huge improvement over our traditional library tools. I can't imagine anybody preferring our methods to a page like this.

Still, even they add several qualifiers.

Wednesday, November 9, 2011

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

I almost changed the subject line, but this still *seems* to concern the bibliographic framework, or perhaps not.

Of course, relator codes require more work than not assigning them. That is a simple fact that no one can dispute. The question is: are they worth it?

This is not the sort of question that can be answered with a simple "Yes, I think so" or "No, I don't think so". Different aspects must be considered first. The first fact that must be accepted is that the old records will not be upgraded and this has consequences for everything else.

First, will the relator codes be indexed for searching, i.e. will people be able to limit their searches to "editors" or "compilers" or "contestee" or "process contact"? I certainly hope not since the results will be unpredictable. Therefore, if the codes are not there for searching, what are they there for? There seems to be only one answer: for display.

Another aspect is the public's viewpoint, which certainly should never be ignored. Since the old records will never be upgraded to add relator codes, patrons will see records with relator codes and records without them all mixed together in every single search they do. In one record, made post-RDA, there will be a relator code for a specific role; in another record, pre-RDA, there will be no code for exactly the same role. What, then, is the purpose of the relator code? How can we keep people from being confused? How should they approach our records, and how do we inform them what they should and shouldn't believe concerning the relator codes? What are the best ways to use them, and what are poor ways? And remember, these will be exactly the same people who can't be expected to know what "p." or "ill." mean!

Naturally, another important aspect of the matter is the amount of work and the effects on productivity. When an experienced cataloger says that it has a noticeable effect on productivity, that statement should simply be accepted. It is in the nature of things that there will be easy items in English, just as we still get new editions of "The old man and the sea" and with very little work, we can count it as an original catalog record in our statistics. But there are other materials that are not in English, strange items with unclear roles that demand time. These kinds of strange roles can only get stranger with online materials.

It seems that there will be serious consequences both for catalogers and for the public. This is normal when you decide to add new parts to the basic functions of the catalog. The only way to answer these considerations is to do at least some amount of research and find out whether the consequences are worth the effort. Otherwise, we dive into the effort armed only with suppositions based on limited knowledge and personal beliefs.

Of course, in a regular business environment this sort of research would have been done at a very early stage, not at the very end.

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

On 08/11/2011 22:21, J. McRee Elrod wrote:
See Chicago Manual of Style 14th ed. 16.35-38. Up to three authors may be given, but only the first is given in inverted order. Sounds like a main entry to me. One has to choose one to invert. Beyond three, only the first is given. (Entry under first of more than three is closer to RDA than AACR2, but like AACR2 in substituting "et al." for additional authors.) Am I the only one old enough to remember more than one author at the top of the unit card? But *one* was first.
Well, I beg to differ, since I don't see that the mere inversion of the name that happens to appear first on an item is equivalent to the selection of a main entry. Everyone on this list is fully aware that the rules for choosing a single main entry are terribly complex, and the same complexities arise when you have four, five, or more names.

Certainly, *in a bibliographic citation* a single one of all the authors has to come first, but not in a computerized catalog, where displays are (or can be) much more fluid. Articles can get wild (who wants to trace all of them?!). Yet, in the bibliographic citation for such an item, it would be the first three to seven authors, with the first one inverted. Who can maintain that the first person here is equivalent to a *single main entry*? In the future, I would predict that monographs (whatever form they become) could very possibly approach this level of complexity.

In any case, there is no reason why Johnson should be treated subordinately to Masters, except to maintain our old practice of a single main entry. Many bibliographic databases do just fine without the concept of a single main entry. Look at Amazon with three authors: if you look at the cover in the "Look Inside" (I can't see the t.p.), Masters is first, but in the "citation" Kolodny is first. In the CIP, Masters retains main entry. Dublin Core also avoids a single main entry.

Why continue this practice when there are three equal authors or more? In a card or printed catalog, I freely agree that matters are quite different but in a database, matters are completely different.

If we could get rid of those complex rules, cataloging would become simplified a bit and access would remain the same if not improved. 

Still, I realize that I cannot convince you of this, so we can agree to disagree. Yet, wouldn't it be great to at least allow the possibility of something like this? In ISO2709, allowing for such a possibility would be terribly difficult, but as I tried to show in XML, it is almost child's play.

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

On 08/11/2011 22:15, Jonathan Rochkind wrote:
Kind of off topic, but curious why you don't think relator codes are the right thing to do. If we're listing 3 or 5 or 10 people or entities 'responsible' for an artistic work, why wouldn't we want to be able to say the nature/role of each entities responsibility?  Or, if we do, but relator codes are a poor device for this, why?

I answered this in another posting that can be found here

While I have nothing against the relator codes *in theory* I think there are serious practical barriers. Entering the relator codes entails additional work for catalogers and some will not be so simple, but more important, there is the serious problem of legacy data. If catalogers had been adding the relator codes all along, that would be one thing, but the decision was made back then not to add them. We must admit that those records will not be updated. 

Therefore, looking at the situation from the *patron's point of view*: patrons will still--always--have to check and recheck every single citation generated from a library catalog, because there may be editors, compilers, and others who must be cited as such. I see this leading to tremendous confusion and anger. Remember, these are the same people who are not supposed to be able to understand abbreviations such as "p." and "et al." (except in citations, of course!).

I don't think it is wise to promise more than we can deliver.

Tuesday, November 8, 2011

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

On 08/11/2011 17:23, J. McRee Elrod wrote:
Jim said:
Getting rid of a *single* main entry would be the equivalent of DC's <creator> and <contributor> where <creator> is repeatable, thereby creating multiple main entries.
How would you produce single entry bibliographies? How would scholars cite in footnotes? How would cataloguers construct subject and added entries for works? Libraries are part of a larger bibliographic universe, and should adhere to its standards and practices, which would include returning to compiler main entry.
Could you point me in the direction of a bibliographic citation format that demands choosing a *single* main entry? I have worked a lot with them and have never found anything resembling one. While the practices vary, the main rule is: copy the authors in the order they appear on the title page. Some stop at a maximum of four, none at more than seven. Some want the forms of names as spelled out on the item; others say to abbreviate first and middle names. These formats mostly want people to differentiate between authors and others, e.g. editors, compilers, and translators, by putting in "(ed.)" or mentioning translations. Here is the Chicago format. Another nice page is from Ursinus. Here is a guide to the Harvard rules: "For books with two, three or four authors of equal status the names should all be included in the order they appear in the document. Use an and to link the last two multiple authors." These rules, and others, actually use "et al."!

I admit that these considerations would provide a reason to go back to the practice of adding relator codes (which I do *not* think is the right thing to do, by the way).

Now, as far as constructing subject or added entries for works with two or more main entries: it can be done in XML quite easily, though it is more difficult with ISO2709. With XML, for a subject entry for Masters and Johnson (two main entries), you could have (an abbreviated MARCXML-style record, which I think catalogers can follow):

<record>
  <name><a>Smith, John</a> <d>1960-</d></name>
  <title>
    <a>The book by Masters and Johnson</a>
    <b>some thought</b>
    <c>by John Smith</c>
  </title>
  <imprint><a>New York</a></imprint>
  <subjectUniformTitle>
    <name><a>Masters, William Howell</a></name>
    <name><a>Johnson, Virginia</a></name>
    <title><a>Human Sexual Response</a></title>
  </subjectUniformTitle>
</record>

The same could be done with an analytic or series, just replacing <subjectUniformTitle> with <analyticUniformTitle> or <seriesUniformTitle>. How this could be done in ISO2709, I do not know, but I won't say that it cannot be done because somebody may figure out a way, but I can't imagine why anyone should want to. XML can do it right now and it could be utilized by browsers the world over--right now. 

Once we get away from ISO2709, there will be all kinds of novel bibliographic structures that can be implemented. ISO2709 leads catalogers to think in certain ways about how information is structured. There is no need for that any longer.

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

On Tue, Nov 8, 2011 at 7:01 AM, Hal Cain wrote:
However, once I began to see how competent systems handled MARC, it became plain that what they were doing was basically to create a matrix and populate it with the tag values, the indicator values, and the subfield data prefixed by the subfield code.  Then the indexing routines read the matrix (not the raw MARC ISO2709 data) and distributed the data into the appropriate areas of the system's internal table structure.  From those tables, I was able, when required, to obtain what I wanted by direct query on the appropriate part of the database. When it was necessary to export a single MARC record, a group of them, or indeed the whole database, the system had routines which reversed the process (and, last of all, counted the number of characters in order to fill in the record length element of the MARC leader). This was extremely burdensome to programmers who came to the game in the 1990s and had no background in early data processing, chiefly of text rather than numbers, but in its time it was pure genius. Nowadays it's a very special niche, and the foreignness to programmers and designers of the processes involved probably plays a part in keeping us from having really good cataloguing modules and public catalogues; and I can understand the frustration entailed for those who expect to interrogate a database directly.

Bear in mind, though, that using a modern cataloguing module (Horizon is the one I'm most familiar with), I can search for a record on a remote system, e.g. the LC catalog, through Z39.50, and have the record on my screen, in editable form, in a second or two, indistinguishable from a record in the local database. The system's internal routines download the record in MARC format (ISO 2709, hated by Jim) and build the matrix which feeds the screen display.
This is really a nice description of the problems of ISO2709, Hal. Thanks a lot.

I would like to clarify one point, however: do I hate the ISO2709 format? I can answer that honestly: no. It served its purpose well for the environment it was born into. That environment has changed, however, and we need to face up to that. If our modern systems (i.e. modern web browsers) worked with the ISO2709 format, i.e. the files that the machine actually receives, then I would be all for it.

Yet, this is not the reality of the situation. Browsers work with a variety of formats, but they work with XML, which gives us some options. Browsers do not work with ISO2709, and I don't believe they ever will. Therefore, the only systems that can work with ISO2709 records (which is how libraries exchange their cataloging information) are other catalogs, and that automatically restrains us from participating in the wider information universe. As a result, in my own opinion, hanging on to ISO2709 borders on the irrational since we automatically limit the utility of our records, thereby limiting ourselves.
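To make concrete why browsers will never work with it, here is roughly what it takes just to read one ISO2709 record: a fixed 24-byte leader, then a directory of 12-byte entries giving character counts into the data. This is only a minimal sketch of the record structure; real MARC processing must also handle indicators, subfields, and character encodings:

```python
def parse_iso2709(record: bytes) -> dict:
    """Return {tag: raw field bytes} for one ISO2709 record.
    Terminators: 0x1e ends each field and the directory, 0x1d ends the record."""
    leader = record[:24]
    base = int(leader[12:17].decode())        # base address of data, leader chars 12-16
    directory = record[24:base - 1]           # 12-byte entries, followed by 0x1e
    fields = {}
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")       # e.g. "245"
        length = int(entry[3:7])              # field length, including its terminator
        start = int(entry[7:12])              # character offset from the base address
        fields[tag] = record[base + start : base + start + length - 1]
    return fields
```

Contrast this character-counting with XML, where any standard parser reads the structure directly.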

MARCXML has many limitations that I won't discuss here, but *at least* it is in XML which *can* be used in the new environment. It is much more flexible than ISO2709. For instance, I have mentioned before that I believe we should get away from a *single* main entry--that while a single main entry made sense in the card catalog, it makes no sense in a computerized catalog. Others disagree, but no matter what, I think it is vital that we should have that kind of flexibility.

Getting rid of a *single* main entry would be the equivalent of DC's <creator> and <contributor> where <creator> is repeatable, thereby creating multiple main entries. It turns out this is much more difficult than merely making 1xx repeatable, since you also have to allow it in the 6xx, 7xx and 8xx, for books *by* Masters and Johnson, for books *about the books* written by Masters and Johnson, for analytical and series treatments as well.
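In Dublin Core terms, the Masters and Johnson case would simply repeat <creator>. The namespace below is the real DCMES one, but the record itself is just my own illustration:

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Masters, William Howell</dc:creator>
  <dc:creator>Johnson, Virginia</dc:creator>
  <dc:title>Human Sexual Response</dc:title>
</metadata>
```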

You could do this without too much difficulty in XML, even in MARCXML, but in ISO2709, it would be a relative nightmare because you would have to rework the entire structure, from the directory on down. (This is why the MARCXML principle of "roundtripability"--what a word!--needs to be dropped. Otherwise, we still remain trapped in the ISO2709 format!) Anyway, while it may be possible to rework ISO2709 to such an extent, would it be worthwhile to do it on such an old format?

This is just one example of the relative inflexibility of ISO2709, but there are many more.

Still, I don't hate ISO2709. It served its purpose admirably, but it's like the horse and buggy. I'm sure nobody hated horses and buggies after the automobile came out, but eventually, if it turned out that Dad and Grandpa refused to get a car when everybody else had one and the advantages were plain for all to see, Junior very possibly would have wound up hating the horse and buggy he was forced to use.

Monday, November 7, 2011

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

On Mon, Nov 7, 2011 at 11:05 AM, Bernhard Eversberg wrote:
But be that as it may, my point is that even for this function, it is no longer technically necessary. For all intents and purposes, MARC may live on forever without the need to deal with ISO2709. It is technically obsolete, but we need not care.
Perhaps it will live on as one developer described when, last week at lunch, we were discussing the "old days" of the ISO2709 format for AGRIN3 data, which he (and I, and everybody) had to work with before we all changed it to XML.

He mentioned that he keeps the specifications in a drawer of his desk as a memento mori. Once in a while he takes them out just to gaze upon them and remind himself of other realities!

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L 

On Mon, Nov 7, 2011 at 10:21 AM, Bernhard Eversberg wrote:
Jim, my point is, in nuce:
"Yes, MARC is horrible, but ISO is not the reason".
You wrote:
With ISO2709, it is designed to transfer a complete catalog record from one catalog into another catalog.
Yes, but Web services on any MARC based catalog need not suffer from that, Web services can be constructed without paying any attention to the ISO structure. I said that much in my post. It is regrettable that up until now we still have not many useful web services as part of library OPACs. But the reason for this is certainly not ISO2709.
Have you ever seen or heard of a web service based on ISO2709? What then will be the purpose of ISO2709 except one: to transfer a catalog record from one library catalog to another?

But this brings up the second aspect of MARC, which is what most of the discussion is about: not ISO2709 itself, but the coding, e.g. 100b, 300c, and so on. In one sense, this is much less of a problem, because we are talking about mere computer codes, and those codes can display however someone wants them to display.

So, when developers say that they don't like MARCXML, this is a lot of what they are talking about, since they want and expect the coding to say "title" and "date of publication," and they don't want to look up what 245a or 300c means. (There are also the codes that must be dug out of the fixed fields, such as the type of dates and the dates in the 008, the language code, etc., but that is yet another matter.)

Of course, we run into the problem of library jargon here, since 245a is not "title" but "title proper" and not only that, it includes the "alternative title" plus it includes individual titles when an item lacks a collective title. There may be some more nuances as well. Therefore, 245a implies separate access to a lot of other types of titles. Non-cataloger developers cannot be expected to know or understand any of this. So, if the format codes it <title>, that is misleading, while coding it as <titleProper>, developers will just think it's a weird name for a title.
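As a sketch of the relabeling developers keep asking for (the label table and function here are my own invention, not any existing crosswalk), a tiny mapping makes the point; note that choosing "titleProper" over "title" preserves the nuance at the cost of looking weird to non-cataloger developers:

```python
# Hypothetical mapping of opaque MARC tag/subfield pairs to self-describing names.
LABELS = {
    ("245", "a"): "titleProper",
    ("260", "c"): "dateOfPublication",
}

def relabel(fields: dict) -> dict:
    """fields: {(tag, subfield): value} -> {label: value};
    unknown pairs are kept as 'tag$subfield'."""
    return {LABELS.get(k, f"{k[0]}${k[1]}"): v for k, v in fields.items()}

print(relabel({("245", "a"): "Human Sexual Response"}))
```

The hard part, of course, is not the code but deciding what the labels should say, which is exactly the problem of library jargon described above.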

This is complicated and at the moment I don't know how it can be solved. Perhaps our traditional library distinctions will disappear in the new environment, but I hope not.

Re: [RDA-L] Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L

On Mon, Nov 7, 2011 at 8:00 AM, Bernhard Eversberg wrote:
Jim, ISO2709 is a nuisance, agreed. And I dislike it no less than you do because I'm a real programmer and know what it feels like. But don't let's get carried away and rush to premature conclusions with inappropriate metaphors. Rather, consider this: Would you tear down your house and rebuild it from the ground up if the old wallpaper gives you the creeps?

For that's what ISO2709 is: mere wallpaper. Easily replaced or painted over. Nothing serious, nothing that affects any qualities of the building.
I wish that were true. ISO2709 is the standard way libraries exchange their records, and this means that anybody who wants library information must work with ISO2709. ISO2709 was designed to make catalog cards, and that is what it still does today, only the cards are not printed on card stock, they are printed on the computer screen. Certainly, they can be searched in ways different from a card catalog, but this is because of the mere fact that they reside in the computer--not because the format is any more amenable to searching.

Today, most web developers I know do not want to copy, reformat, and maintain duplicates of records that are on different systems. They much prefer to interoperate with them, and they can do this through various APIs. For instance, I can add a Google Books API that will search Google Books--in the background--in all kinds of ways and return one record or multiple records. It does not give me the entire Google metadata record (as ISO2709 does, by definition), nor do I want it. I want to work with the Google metadata *on the fly*, so that I do not have the responsibility of keeping the record current and reformatting it, with all kinds of additional work. Keeping the record current is Google's responsibility--not mine, and I shouldn't have to do it.

ISO2709 is designed to transfer a complete catalog record from one catalog into another catalog; it is not designed for interactivity. Here is a practical example. At LC, they have lots of sets of records that you can interact with. So, I could have a local catalog on, e.g., dance, and--if I set the machine up correctly--search behind the scenes the records for LC's dance instruction manuals. I can display these records as I wish because they are in XML. I would not have to download all the records in ISO2709, convert them in MarcEdit, and put them into my own database, where the URLs and other information may change in the future, since it is potentially a ton of work to maintain records for materials on the web.

Another example is the WorldCat Search API; there is no mention of ISO2709 there. Plus, I implemented the WorldCat Citations API when I was at AUR.
An example: in the right-hand column, you will see "Get a Citation". When you click it, you will see citation formats (in XML, not ISO2709) taken on the fly from WorldCat and reformatted by the system I created. This is a simple example, and matters could become much more complex if someone desired.

The fact is, most developers want to work with APIs in these kinds of ways instead of having to download, convert (often an extremely difficult job if you want anything coherent), upload into your own system, and then maintain those records. That is horribly inefficient, and unnecessary, today.

Why don't more developers work with library metadata? To me, the answer is absolutely obvious. We are not making APIs that developers want to work with, and one reason is that we keep maintaining that if somebody wants our information badly enough, it is "easy" to work with ISO2709 records by downloading, reformatting, and so on. But that is wrong: working with APIs is what is easy, and with ISO2709 you absolutely cannot do that.

Developers don't want--or need--to jump through all of those hoops when they don't have to, and they prefer to work with other systems. So they don't use our records and prefer, e.g. Amazon, which has all kinds of APIs.

Unfortunate. But perhaps it is something that the Bibliographic Framework will address and our metadata will be more usable in the information universe.

Saving libraries but not librarians

Posting to NGC4LIB

This was an article in the Los Angeles Times written by a disillusioned librarian who maintains that librarians get little respect, especially today ("California must value librarians; libraries can't run

She provides numbers showing how budgets and librarians have decreased in the last few years. The most perceptive point she makes is:
"Still, the idea of shutting down a library is unpalatable to most officials. So they lay off librarians to keep the buildings open, supporting the illusion that libraries can simply run themselves.

On school visits, I ask what students think a librarian does. The response is always the same. "Librarians check out books. They read a lot. They tell people to be quiet." These misconceptions are held by adults too. When I told a friend that I was embarking on my graduate degree, he asked, "You need a master's degree in the Dewey Decimal System?"

With that attitude, who cares whether California has any librarians left? Why not replace us with phone trees, self-service checkout machines and volunteers?"
Then another article was published in reply by a fellow at a legal clinic ("Saving libraries but not librarians"), who claims that academics perhaps still need traditional libraries and librarians, but that the general public can get by with Google. His opening sums up his argument:
"The digital revolution, while improving society, has gutted many professions. Machines have replaced assembly-line workers, ATMs have replaced bank tellers, Amazon has replaced bookstores and IBM's Watson may even replace doctors and lawyers. And now, the Internet is replacing librarians.

Or at least it should be.

The digital revolution has made many librarians obsolete. Historically, librarians exclusively provided many services: They organized information, guided others' research and advised community members. But now, librarians compete with the Internet and Google. Unlike libraries, the Internet's information is not bound by walls; from blogs and books to journals and laws, the Internet has them all. And Google makes this information easily accessible to anyone with an Internet connection."
My own take on this issue is what others pointed out in the comments to the second article: the second author does not know what a librarian does, and as the librarian in the first article pointed out, very few people do. Even people who use libraries all the time and claim to love them still do not understand what library work is. But the simple fact is that the materials on the web have caused a true revolution in information that librarians are still trying to deal with. Libraries normally enact changes on a schedule akin to geological time, but web materials change constantly. Google changes its search algorithm about once a day!

The traditional method libraries have used to deal with new materials has been to fold them into the same procedures they used for the old materials: e.g. when photography came in, libraries altered their current methods to include it, and the same happened with computer files and other newer materials. But our old methods have failed when applied to materials on the web--not primarily because online materials are harder to catalog, but because there are just a lot more of them and they change unpredictably, without notice, so the records become outdated.

If the Google-Publisher agreement had been approved, I think we would be taking the issue of the relevance of the library more seriously, but it was rejected, and libraries have gained a breathing space of a few years. Still, all of those materials will definitely be available online sooner or later, and libraries will simply have to deal with a situation where 90% of the resources people want are available online at the click of a button.

Someone wrote to me privately about the Amazon Prime program, where people can borrow one book a month for $80 a year--plus you can watch movies and get other advantages. Depending on the books and movies available, this may be a very popular choice for a lot of people. I would bet that many of those who choose Amazon Prime would feel they no longer need libraries. And in tough economic times, they may be less willing to pay taxes for libraries they don't use.

These issues could perhaps be resolved if the public had a better understanding and appreciation of what librarians do, and what new things they could do with updated tools and in a new information environment.

But librarians themselves will have to change first.

Friday, November 4, 2011

Re: Offlist reactions to the LC Bibliographic Framework statement

Posting to RDA-L concerning "A Bibliographic Framework for the Digital Age"

Well, I take a slightly different position from my esteemed colleagues. The transition from our outdated format will have to come sooner or later, and the sooner we do it, the sooner we can actually enter the larger world of metadata, for better or for worse.

Format considerations themselves are, I think, not the real issue for catalogers. To me, the issues are similar to those back at the end of the 19th century, when libraries wanted to share copies of their catalog cards, but the sizes of the cards were different in each library. This had to be dealt with first. Therefore, it became a duel to the death to get the size of *your own* library's cards as the accepted standard, otherwise you would be forced to recatalog everything on the other sized cards. At Princeton University, their cards were larger than was ultimately accepted, so they tried cutting down the cards and writing what was cut off wherever they could. (I never found one of those, but I would have loved it!) That didn't work out so well, so they had to use other solutions. Still, after all of that fighting, all that everyone had agreed upon was a *blank* card. Then came the real fight, about what information should be written on the card, and where each bit of information should go. That took a long time to agree upon, with ISBD solving part of it.

Deciding on a common format is similar to deciding on the standard-sized card back then. While this is a very important step, it is nevertheless necessary to point out that everyone will be agreeing on an *empty* format. What will fill up that format is what cataloging should focus on. To be honest, what interests me much more than this:

245 10 $a.....

is what actually goes into that 245 field, how to enter the title itself in a standardized way so that others can find and understand the record. Whether the format is:

245 10 $a

or

<marc:datafield tag="245" ind1="1" ind2="0">
     <marc:subfield code="a"> </marc:subfield>
</marc:datafield>

or

<isbd:titleProper> </isbd:titleProper>

I don't really care very much. Most of this must be determined by technicians. Catalogs have their needs and these must be kept in mind, but some of their needs are very probably outdated now. One format may be more accurate, one may have indicators for alphabetical browsing (which almost nobody does anymore), and of course, some formats will be ignored by developers because they are too much of a pain to work with. In my opinion, we need to make our formats as amenable to developers as possible, because then they may be willing to include us rather than exclude us.
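To illustrate how little the particular serialization matters to a program, here is a small sketch (my own, using only Python's standard XML parser, not any cataloging toolkit) that pulls the 245 $a out of a MARCXML record:

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # the MARCXML namespace

def title_proper(marcxml: str) -> str:
    """Return the 245 $a from a MARCXML record, or '' if absent."""
    root = ET.fromstring(marcxml)
    node = root.find(".//{%s}datafield[@tag='245']/{%s}subfield[@code='a']"
                     % (MARC_NS, MARC_NS))
    return (node.text or "") if node is not None else ""

sample = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
  </datafield>
</record>"""
```

Swapping in a different element name for a different format is a few minutes' work; deciding *what belongs in the field* is the part that takes cataloging expertise.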

Once everyone moves to XML-type formats, there will automatically be the flexibility for various groups to add their own "namespaces", e.g. I can imagine something like:
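(Purely a hypothetical sketch: the MARCXML namespace is real, but the "mylib" prefix and its elements are invented for illustration.)

```xml
<record xmlns:marc="http://www.loc.gov/MARC21/slim"
        xmlns:mylib="http://example.org/my-library-extensions">
  <marc:datafield tag="245" ind1="1" ind2="0">
    <marc:subfield code="a">Example title</marc:subfield>
  </marc:datafield>
  <!-- a local community adds its own elements alongside the shared ones -->
  <mylib:shelvingNote>Oversize; does not circulate</mylib:shelvingNote>
</record>
```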

In this way, different communities could add their own metadata, while still being able to cooperate. I think a lot of communities, and individual libraries, will like this possibility. All in all, I think something like this should have been done a long time ago, as a first step before considering RDA. Once the format is dealt with in some way, (just as the standard-sized card so long ago) then changes in cataloging rules may make more sense--or they may not.

Also, in deference to Bernhard and his statement
(ISO2709, BTW, is *not* among the flaws and issues. It is a very marginal issue of a purely internal nature and is in no way related to MARC as a content standard. MARC can perfectly well work without ISO, no one needs to bother with it except the few systems that are still unable to ingest anything else, and they can use MarcEdit to get what they want. Abandoning ISO in favor of the external format MarcEdit uses, you get rid of the 9999 character field length limit as well.)

I must disagree 100%. Maintaining that ISO2709 is not a problem is like saying that the water in the local stream is fine. While you can't drink it immediately, all you have to do is take a few buckets of that water, let them sit for 5 or 6 hours to settle, then skim off what's on top. Boil the water you skimmed off for 10 minutes or so and then throw in a couple of chlorine tablets at the end. Shake it all up and voila! You can drink it. Therefore, the water is safe to drink!

We can't expect people to do that when there are all kinds of other, more friendly methods out there that will let you do what you want without jumping through hoops. I want to be able to drink water directly out of the tap! How in the world can we hope to change our format if we don't see the problems with a format that is over 40 years old and everybody has to work with *before* they can begin to do anything with it?

I am willing to make a wager. I'll bet 100 euros that whatever format they or NISO come up with will *not* be ISO2709. I am not a betting man--I only bet on sure things. And while ISO2709 served its purpose, its time has passed. Of that, there is no doubt at all.

I hope the new format comes out soon, but I doubt it.

Thursday, November 3, 2011

Re: [RDA-L] NISO offers itself as the standards body for future format

Posting to RDA-L

On Thu, Nov 3, 2011 at 8:45 AM, Bernhard Eversberg wrote:
Help with the creation of a new format would be great. What the library world needs here is, of course, an indefinite term commitment. And what we also need is a free and open standard, or else we can forget everything about opening up to other communities and freeing our data in the web for everybody to use. Libraries are there to make recorded knowledge universally available and useful. To assist this, today, they have to make their data universally available and useful, and with that huge body of data, the conventions that constitute its foundation. What we have instead is one not universally open entity in control of the data and another one in possession of the rules. Now, the format is to go into custody of a third?
Good point. I had simply assumed that what they make would be free. It appears that they do make their standards available for free, e.g. the Digital Talking Book Standard. They also say explicitly that they are available at no cost:
"All NISO standards are protected by copyright. NISO standards can be downloaded and reproduced for noncommercial purposes only. NISO standards cannot be translated, modified, redistributed, sold or repackaged in any form without the prior permission of NISO."

Still, this needs to be made very clear. For instance, I can imagine library communities--and individual libraries--wanting to add their own namespaces to whatever NISO would create, so the word "modified" would have to be considered carefully. The restriction on translation also makes me hesitant, although I understand it.

Wednesday, November 2, 2011

Re: NISO offers itself as the standards body for future format

Posting to RDA-L

On 02/11/2011 20:20, Karen Coyle wrote:
Sorry to repeat this to so many lists, but the most recent NISO newsletter:

makes the case that NISO may be the more appropriate body for the development of the future data format for libraries. Quoting from the message by Todd Carpenter:

"The MARC standards office at LC is adeptly led and they have the best of intentions, with a goal of trying to represent and serve all that use this important format. However, there is a fine line between leadership and control. Hopefully, LC is willing to lead while letting the broader community control, as messy as that process might be.

The process for moving MARC into today's information environment is important, as noted above. Wouldn't the process be better served by utilizing the existing and open standards development processes already in place that have served our community so well in so many areas?"
The simple fact is that libraries need help. They need help with the actual task of creating metadata; they need help figuring out what types of metadata are needed today, both by our patrons and for collection management; and they need help coming up with formats. Libraries need help in all of these areas, especially now, since there is practically no chance of their getting additional funding any time soon. There is a downside, of course: the more help you get--substantial help--the more you lose the control you were used to having.

I just figure that NISO or some other organization will do this sooner or later, so it might as well be sooner, so that libraries can get the help. The adjustments may be wrenching for librarians, however. My own concern is that if development is undertaken by non-librarians (i.e. patrons), library management needs must still be maintained in full. Probably there is no worry here, since modern formats can be so flexible that a "library management namespace" can always be created and included.

I think this would be a great development if NISO got involved.

Re: Not really a new edition?

Posting to Autocat

On Tue, Nov 1, 2011 at 3:59 PM, McCormick, Elizabeth wrote:
I have here The invisible hands : top hedge fund traders on bubbles, crashes, and real money by Steven Drobny. It was published in "(c) 2010, 2011" by Wiley. This is a paperback book. In WorldCat, there is a bib (#468969450) for the hardcover, which was published in 2010 by Wiley. Both books have a foreword by Jared Diamond; however, in the paperback, it's called "Foreword to the Previous Edition." The paperback has a new foreword by Nouriel Roubini as well as a new preface by the author in addition to the "Preface to the Previous Edition." Through the magic of the Interwebs, I've been able to compare the tables of contents for both books and they match exactly except for the aforementioned differences. Is this really enough to warrant a new bib - a new foreword and a new preface?

In addition to what everyone has mentioned, there are four main resources:

1) LCRI 1.0

2) ALA's "Differences Between, Changes Within: Guidelines on When to Create a New Record"

3) OCLC's "When to Input a New Record" and the "Field-by-Field Guidelines for New Records"


Following these guidelines (which unfortunately are not all the same) can turn this into an almost automatic decision. A big concern is whether there is a change in the 245$abc and/or in the 300$a. If there are no differences but you still know there are major changes, then it is a matter of judgment; but if you know there are differences in the persons responsible for the text, you should at least describe them in a 500 note. I would think that an additional preface would show up as a difference in the 245$c and in the 300$a, thereby making it clearly a new edition.
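As a sketch of that almost-automatic decision (the record structure and the sample values below are invented purely for illustration, not taken from the actual WorldCat records):

```python
# Compare only the fields the guidelines care most about.
KEY_FIELDS = [("245", "a"), ("245", "b"), ("245", "c"), ("300", "a")]

def needs_new_record(old: dict, new: dict) -> bool:
    """True if any key field differs between two records, where each
    record is a {(tag, subfield): value} mapping."""
    return any(old.get(f) != new.get(f) for f in KEY_FIELDS)

# Invented sample data: a new preface changes the pagination in 300$a.
hardcover = {("245", "a"): "The invisible hands :",
             ("245", "c"): "Steven Drobny.",
             ("300", "a"): "xv, 437 p. ;"}
paperback = dict(hardcover)
paperback[("300", "a")] = "xxxi, 437 p. ;"
```

Of course this only automates the easy cases; when the key fields match but the content has changed, it is back to cataloger's judgment.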

For rare books, more attention is paid to different *states* of the text than with more normal publications. States of the text rarely show up in what is recorded in the normal bibliographic fields. This can become especially detailed when it comes to dealers' catalogs, which can be exceedingly minute in their descriptions.

In the future, when almost everything is online and amenable to various types of computer parsing, I think the emphasis will shift toward much more exact methods using word counts and file comparisons, instead of our rather primitive methods of counting pages and relying on the wording and dates found on the chief sources of information, which are all supplied by the publishers/printers and almost never by the authors.
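The kind of machine comparison I have in mind could start as simply as this sketch, using Python's standard difflib (the sample sentences are invented for illustration):

```python
import difflib

def text_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of how much two texts' word sequences overlap."""
    return difflib.SequenceMatcher(None, a.split(), b.split()).ratio()

first = "It was the best of times it was the worst of times"
second = "It was the best of times it was the age of wisdom"
# Two editions whose texts mostly match would score close to 1.0;
# a genuinely different text would score much lower.
```

Run over full texts rather than title pages, a comparison like this would tell us far more about whether two "editions" really differ than any page count ever could.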

This was one of those areas that I hoped RDA would address, but it didn't.

Tuesday, November 1, 2011

Re: Search Engine Optimization, Google's Algorithm, and Library Selection

Posting to NGC4LIB

On 01/11/2011 13:33, Dave Caroline wrote:
To me Google is the solution, I have a private holding but publish the catalogue, without google I would get no email requests, proper SEO is about sensible data on ones web page rather than understanding Googles algorithms.
I use the webmaster tools they provide and I do notice a steady rise with occasional jumps when either I do something or Google changes its algorithm.

Thinking that the librarian is some how the answer misses the most important points,
why should the user come to you,
how does the user find you have the content,
who is the internet's librarian (possibly the +1 button on websites will make us ALL the internet librarian)
if your collection is not catalogued (box of letters is is a terrible description/title) and published how do you expect to get users at all.
It is the article and Google itself that are questioning the usefulness of the traditional Google algorithm, and this is based on companies utilizing SEO. The purpose of a company on the web is to drive as much web traffic to their pages as possible, but this is not the searcher's purpose. The searchers want to see resources that are as closely aligned to their searches as possible. These are different ideas and purposes. I can't blame websites for trying to get as much traffic as possible, using whatever tools they can because they are all about revenue. But from the searcher's viewpoint, I don't want to see a bunch of junk, such as the pages in the eHow site from Demand Media, as discussed in the article from Tech Republic.

Library selection is designed to avoid the waste of time for the searchers by hiring experts to pre-select useful and reliable resources. The entire process is open with Collection Development policies and so on. Library selection certainly has its own problems, but in any case its underlying purposes are totally different from sites such as Google.

You ask some great practical questions, e.g. why should the user come to us in the first place? Libraries should have been dealing with these issues for a long time now, but unfortunately, it seems to me that they have continued to concentrate on printed resources. I understand why, since they are already dealing with overwhelming numbers of materials, and adding more seems impossible. Libraries already have catalogs online, and these should be revamped to become more useful for all kinds of people. Still, I think librarians offer services found nowhere else, and these could be leveraged somehow. One part of that is selection, although it would have to be adapted to web scale.

The problem with everyone becoming the "internet librarian" is spam. For instance, click spam--where people are hired into sweatshops to click on ads all day (called "click fraud" in Wikipedia)--can lead to huge profits. I do not see why trusting the +1 button would be any different.

My own opinion is that we are still at the very beginning of the internet and web and it is very difficult to see what will happen. Some things that seem absolutely impossible today will be enacted--somehow. This is just like at the very beginnings of printing, nobody foresaw the changes that would happen in the future: either the incredible societal changes that took place, or the huge number of people employed in all the various aspects of bibliography: from authors to editors to publishers to printers to distributors to libraries and bookstores, plus all of the spinoffs: increased paper making, transportation, storage and on and on. It is practically impossible to predict what will happen in the future of the web.

Search Engine Optimization, Google's Algorithm, and Library Selection

Posting to NGC4LIB

I suggest the article "Can Google survive its blind faith in the algorithm?" from Tech Republic about the problems with the Google algorithm:

This article describes the problems with the original Google algorithm as clearly as any I have seen: spam, using Search Engine Optimization (SEO), is overwhelming the traditional Google algorithms, and other methods are needed. This is why the Facebook-type system is now seen as the salvation, since it relies much more on human-made links than on a machine-centered algorithm. I have concerns about this method as well. For instance, while I have friends and acquaintances with whom I want to talk and spend time, they have all kinds of interests--some I don't approve of, and I am positive they have other interests I am not aware of and don't even want to think about. While they are my friends, I certainly wouldn't want them to determine my reading material, and even less so their friends. Add to this the idea that each of these persons' *web surfing history* would be used, and the scenario actually becomes repellent to me. If someone had mentioned such a possibility to me 25 years ago, I would probably have just laughed out loud, finding it too bizarre to even imagine.

It seems to me however, that this entire discussion is actually one that librarians know quite a bit about: selection. How to do it and who should do it. I personally like the idea of a selector who is professional and more important, *ethical*, determining the most important resources for me so that I don't have to waste my time.

Apparently, this is what the Google algorithm is trying to achieve automatically. Once you have a set of sites that are reliable and useful, it would seem that SEO would be far more powerful. Traditional library selection, which is based largely on managing a budget where the selector has only so many dollars or euros, is quite different from selection of free web resources. In any case, I think there would be a great opening for libraries here, by establishing some kind of real cooperative selection. There have been some sites for this, e.g. Infomine; Intute, which was great, has unfortunately been discontinued.

Still, there seems to be an important area of need where the information companies are actually struggling. When Google changes its algorithm, the consequences are pretty much unpredictable, and the sites involved consider themselves punished. Library selection is not punishment or reward, but something completely different.

Maybe libraries could step in.

Re: Radical proposal for RDA inclusions

Posting to RDA-L

On Fri, Oct 28, 2011 at 7:41 AM, Bernhard Eversberg wrote:
I see two big issues here (among many more lesser ones) that should not be taken too lightly:

1. MARC as input standard has made sure that it was (more or less) the same everywhere. Someone trained at X could go to work at Y immediately without a lot of retraining.

2. Dealing with raw data at the person-machine interface of data input has at least two advantages:
-- Directness: What you see is what you get, no layers of transformation and interpretation between you and the data.
-- Ease of human communication: The format became the very language of catalogers' talk about the data; precise, succinct, unambiguous, international (numbers, not words!). Just listen in on any AUTOCAT discussion.

For all the flaws of MARC, these are great advantages.

Considering what modern systems can do, there could be any number of highly convenient but widely different input systems. As soon as two different ones are adopted at X and Y, points 1. and 2. are both lost. And then, modern input systems will evolve, they will change over time, get refined, modified, replaced by new designs. What will that mean for the productivity of the cataloging workforce? And how are they going to talk on AUTOCAT, for instance?
As always, you ask some great questions and I certainly don't have any answers.

Even catalogers don't work with the raw data format of MARC (don't worry, I won't begin my ISO2709 diatribe again!); they are looking at a formatted display. Taking this further, the display catalogers work with could easily show human-language explanations instead of numbers, as many catalogs do now, since they often show the field/subfield along with a description.

Still, the numbers for the fields and subfields allow a degree of almost scientific accuracy when discussing catalog issues that I don't think can be easily replicated into human language.

*Perhaps* the new RDF coding could be the solution, but at least to me, the very idea of catalogers speaking in RDF triples somehow brings to mind images from some of the wilder scenes of sci-fi/horror movies or the show The X-Files. It's enough to give me the shivering horrors!