Sunday, July 29, 2012

Re: [ACAT] RDA/FRBR buy in

Posting to Autocat

On 29/07/2012 01:09, Brenndorfer, Thomas wrote:
<snip>
James Weinheimer wrote: ...
 Because normally, they would be considered separate manifestations. ...
They're always considered separate manifestations. The question is the convention used to process them or represent them. RDA recognizes the entire range of examples you have presented-- from bulk change scenarios of file types to one-off resources with special characteristics. ... It's hard to see what the argument actually is. Do away with the concept of Manifestations because they're difficult to deal with in some scenarios? That misses the point. Establishing a framework for discussion up front makes sense. Enumerating the conventions for handling concepts within the framework makes sense.

That's what FRBR and RDA are doing. Throwing out some examples of complex Manifestations as if they demonstrate that FRBR and RDA are broken doesn't make sense. All they demonstrate is that some convention has to be adopted and perhaps some software developed, but what is being discussed and proposed is still fully covered by the overall framework in FRBR and RDA.
</snip>

In my last posting, I wasn't so much hitting RDA or FRBR as answering the question you put to me: "Why would anyone need to "catalog" these? File characteristic metadata are generated on the fly, and can easily and automatically populate any database."

Then I attempted to show the problem of how to deal with "formats/manifestations" when they can be generated at the click of a button, and disappear just as easily. "The Imitation of Christ" example is really good since it shows both new formats and obsolete ones in the Internet Archive, although they still seem to be there. So, we can imagine this multiplied into the millions, e.g. HathiTrust or Google Books flicking a switch and suddenly all the texts are in avi and mp3 and wmv and epub and open office and daisy and mobi. Each one creates millions of additional manifestations, although those manifestations may only "exist" when you click the button: a kind of print on demand, except that it is format creation on demand. Then those same people can decide that nobody needs, e.g., wmv, and millions of those resources are gone.

This is probably happening now and is certainly well within the bounds of predictability. In this scenario, there are no theoretical problems handling everything in traditional ways, creating separate manifestations for each format of each resource, but in practical terms the workload quite literally makes this solution impossible. So, my observations were not about cataloging rules but everyday cataloging. It will make a huge difference if these formats are handled separately or not.

I don't know what the solution is, although I can imagine a few options. Obviously, there must be some cooperation with the website administrators, or lacking that, a more generic form of catalog record must be considered.

[ACAT] RDA/FRBR buy in

Posting to Autocat


On 28/07/2012 16:00, Brenndorfer, Thomas wrote:
<snip>
James Weinheimer wrote: ....
 So, if you had a database of 10,000 documents and all were in XML, to make those same 10,000 documents available in an audio format, you could generate them automatically through a single XSLT file that transforms the source file. It could even be done on the fly. If you wanted to make them available in some new format, e.g. epub or mobi or some other version, it can be done by adding only one file. Do we want to catalog these sorts of items separately?
Why would anyone need to "catalog" these? File characteristic metadata are generated on the fly, and can easily and automatically populate any database.
</snip>
Because normally, they would be considered separate manifestations. If the metadata creation is handled automatically, I agree that is one option. But I would think that catalogers would believe that such matters should be handled at least semi-consistently in the real world. Let us imagine that a selector has decided upon the Thomas à Kempis, The Imitation of Christ site at http://www.ccel.org/ccel/kempis/imitation/imitation.html. There are obviously different ways to catalog it. I discovered that another translation is still available in an earlier version in the Internet Archive at http://web.archive.org/web/19990422034909/http://ccel.wheaton.edu/kempis/imitation/imitation.html.

I can catalog it as a single webpage or I can catalog each instance of each file, so this can be either one record or a number of records, or I imagine I can do it both ways. As we see, when I first found this site in 1999, there were actually different bibliographical editions but there seems to be only one now. If I am to do any of it automatically, how do I do that? The people who own the site could do some things automatically perhaps, if they were interested enough and saw adequate value but otherwise, it seems to be up to the cataloger.

<snip>
The benefit of the FRBR model is that the content can be cataloged separately from the carrier, which would mean descriptive data of the content, clarifying data about the content, relationships to creators and subjects, etc., could still involve cataloger intervention. There is no need to spend time "cataloging" the carrier data when this can be largely automated, or generated as needed. Relational databases work on the principle of the entities, attributes, and relationships. If you don't like the bibliographic elements in FRBR and RDA then use other ones suitable for the metadata task at hand.

The point that is being missed is that the overall framework and requirements for thinking about how data elements inter-relate and inter-operate are still the same. The point that is being missed is that the FRBR model **** IS BASED ON THE SAME FRAMEWORK USED IN BUILDING ALL RELATIONAL DATABASES ****. The cognitive dissonance resulting from criticizing FRBR but bringing up data problems that have already been solved by the same entity-relationship model used in FRBR is so obvious that all one can do is watch the spectacle of such a trainwreck of illogical and nonsensical ideas.
</snip>

I understand relational databases and how they work. I've built a few myself. It is important to acknowledge that they are not the latest in technology and that there are other options. Relational databases are certainly not good enough for more advanced searching capabilities: for instance, if Google were a relational database, it would blow a bunch of gaskets. Lucene-type indexing technologies have proven themselves superior for those matters. (It's all based on flat-files, by the way!) That is why tools such as Worldcat with facets, which can now provide the FRBR user tasks, can operate as well as they do. Many systems use both: a relational database in the background for the technicians to actually manage the data (to edit and create the records), and Lucene technologies for the actual searching.
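
To make that split concrete, here is a toy sketch in Python. It is my own illustration under stated assumptions, not how Lucene or any real catalog is implemented: the table name `records`, the sample titles, and the functions `build_index` and `search` are all hypothetical. The relational side (sqlite3) is where records are created and edited; the search side is a separate word index rebuilt from it.

```python
import sqlite3
from collections import defaultdict

# Management side: a relational table where records are edited.
# Table name and sample titles are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO records (id, title) VALUES (?, ?)",
    [
        (1, "The Imitation of Christ"),
        (2, "Imitation and innovation"),
        (3, "Rules for a dictionary catalog"),
    ],
)

def build_index(connection):
    """Search side: map each lowercased word to the ids of records using it."""
    index = defaultdict(set)
    for rec_id, title in connection.execute("SELECT id, title FROM records"):
        for word in title.lower().split():
            index[word].add(rec_id)
    return index

def search(index, query):
    """Return sorted ids of records containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

index = build_index(conn)
print(search(index, "imitation"))         # records 1 and 2
print(search(index, "imitation christ"))  # record 1 only
```

The point of the design is only that searching never touches the relational tables: edits go to the database, and the index is regenerated from it.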

And I will say once again that there are many ways of modeling data. FRBR is one way of doing it, but it is not the only way, and that does not mean it is the best way. One of the first steps in modeling is figuring out what is important to the stakeholders (i.e. the people who will use the system) and attempting to give it to them as much as possible. The FRBR data model was based (I assume) on this same idea of giving the public what it wanted, and the model with works/expressions/manifestations/items was created. Yet there was no research to discover whether these really were the tasks that the majority of users wanted; it was just assumed that the model should follow the purposes of the catalog as laid out initially by Panizzi and Cutter and expanded later. Moreover, we live in a different informational world from the one in which FRBR was created (in Internet time, the 1990s are now a different era, that is, pre-Google times) and brand new resources are being created. Now we are standing on the edge of even more profound changes.

The first task should be to fix what has been broken for such a long time: upgrade the antediluvian MARC format, include the absolutely essential syndetic structures into keyword searches, get the subject headings to function coherently again (somehow), and link whatever can be linked. This is plenty to do, but absolutely necessary no matter what else happens. After watching how people work with all of this for a few years, perhaps we will have a better understanding what the public wants and how to adapt to it, and perhaps we will even see that FRBR structures are necessary. But there is no evidence of that now.

Perhaps everything is obvious to those so entranced by the prescriptions of FRBR/RDA "that all one can do is watch the spectacle of such a trainwreck of illogical and nonsensical ideas" of those who criticize, but for me, the amount of unsubstantiated library superstition is equally astonishing.

Re: [ACAT] RDA/FRBR buy in


Posting to Autocat

On 27/07/2012 16:38, Joel Hahn wrote:
<snip>
I see this figure frequently bandied about, and I think that it misses the point to some degree. Perhaps less than 20% of *everything* described in WorldCat or LC's catalog have more than one manifestation, but for a public library, the percentage of new (and old) acquisitions that exist in more than one manifestation and/or more than one expression is significantly higher, quite possibly as high as the inverse percentage.
</snip>

This is an excellent point, but I don't know if I would go so far as to say the inverse percentage. I have also wondered how many of these are for obsolete editions of textbooks, or for materials that just probably won't be used anymore, such as a microform of a public domain book when a copy can be found online for free. More precise research on this would be very useful.

But just as interesting is the case of a document in XML that uses various style sheets to generate all different kinds of formats: .doc, .pdf, .txt, whatever. All you need is a single source file in XML (this would be equivalent to the "expression") and the different manifestations are generated. So, if you had a database of 10,000 documents and all were in XML, to make those same 10,000 documents available in an audio format, you could generate them automatically through a single XSLT file that transforms the source file. It could even be done on the fly. If you wanted to make them available in some new format, e.g. epub or mobi or some other version, it can be done by adding only one file. Do we want to catalog these sorts of items separately?
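
A minimal sketch of that single-source idea, in Python rather than XSLT so that it runs with the standard library alone. Everything here is an assumption for illustration: the `<document>`/`<title>`/`<para>` markup, the `RENDERERS` table, and the two output formats. Each renderer function stands in for one stylesheet; the "manifestation" exists only at the moment `render` is called.

```python
import xml.etree.ElementTree as ET

# Hypothetical source file: the one thing that is actually stored.
SOURCE = """<document>
  <title>The Imitation of Christ</title>
  <para>First paragraph.</para>
  <para>Second paragraph.</para>
</document>"""

def to_text(doc):
    """Plain-text rendering: title, blank line, then one line per paragraph."""
    lines = [doc.findtext("title"), ""]
    lines += [p.text for p in doc.findall("para")]
    return "\n".join(lines)

def to_html(doc):
    """Very small HTML rendering of the same source."""
    paras = "".join(f"<p>{p.text}</p>" for p in doc.findall("para"))
    return f"<html><body><h1>{doc.findtext('title')}</h1>{paras}</body></html>"

# One entry per format; each function plays the role of a stylesheet.
RENDERERS = {"txt": to_text, "html": to_html}

def render(source_xml, fmt):
    """Generate the requested format on the fly from the single source."""
    doc = ET.fromstring(source_xml)
    return RENDERERS[fmt](doc)

print(render(SOURCE, "txt"))
```

Adding one entry to `RENDERERS` makes a new format available for every document at once, and deleting the entry makes that format vanish for every document just as quickly, which is exactly the cataloging problem.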

The first time this came up for me in an email discussion was back on the DC list, when we were discussing the current hot issue "1:1" and I asked how many records this item should be cataloged as: https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind9904&L=dc-general&D=0&P=64390 (the URL for the item has changed to http://www.ccel.org/ccel/kempis/imitation/imitation.html) There are several versions here and I don't know if they are generated automatically or not, although they certainly could be. My original message mentions some formats that are no longer available (e.g. hypercard), so if it was decided to catalog the new formats of the 10,000 documents I mentioned above, those 10,000 items could just as quickly disappear. What is the best way to handle these sorts of issues that potentially create or save massive amounts of work?

But I want to emphasize that whether the public needs or wants to limit to specific editions/manifestation is irrelevant in my opinion. It is vital for the *librarians* to have this information because it is their job to manage the collection so it must be entered somewhere. How those records are displayed to the public and how it is engineered is another matter entirely and there can be great versatility today. For instance, the FRBR displays (yes, there are FRBR displays) where the records for differing editions can be merged into a single view for the public could be done now and does not have to be done by manually creating a different record structure. (See http://www.loc.gov/marc/marc-functional-analysis/tool.html)
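
As a rough sketch of what such a merged display involves (my own toy illustration, not how the LC tool works): the records stay flat, one per edition, and are only grouped at display time. The sample records, the `work_key` normalization on author and title, and the function names are all hypothetical; real matching would need far more care.

```python
from collections import defaultdict

# Hypothetical flat records, one per edition/format, as they sit in the catalog.
RECORDS = [
    {"id": 1, "author": "Author A", "title": "Title X", "year": 1999, "format": "print"},
    {"id": 2, "author": "Author A", "title": "Title X", "year": 2005, "format": "pdf"},
    {"id": 3, "author": "author a", "title": "title x ", "year": 2010, "format": "epub"},
    {"id": 4, "author": "Author B", "title": "Title Y", "year": 2001, "format": "print"},
]

def work_key(record):
    """Crude grouping key: normalized author and title (an assumption)."""
    return (record["author"].strip().lower(), record["title"].strip().lower())

def merged_view(records):
    """Group edition records under one heading each, oldest edition first."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record)].append(record)
    return {key: sorted(recs, key=lambda r: r["year"]) for key, recs in groups.items()}

for key, editions in merged_view(RECORDS).items():
    print(key, [(e["year"], e["format"]) for e in editions])
```

The underlying records are untouched; the librarians still have every edition, and the public sees one merged entry.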

Much more is possible too.

Re: [ACAT] RDA/FRBR buy in

Posting to Autocat

On 27/07/2012 16:25, Frank Newton wrote:
<snip>
James, I don't agree with your statement "The subject heading strings, and the left-handed text browse mechanisms have become as obsolete in the online environment as a stone axe has in our modern world." Subject headings are not a technological tool like an axe. They are a language, like French, or Old Church Slavonic, or Buddhist Hybrid Sanskrit, or Scots Gaelic, or Basic English, or Haitian Creole (or Haitian if that is a better name for it), or Cantonese.

and

But the solution to the "X" factor is -- teaching. Both the proper use of Library of Congress subject headings by searchers and researchers, and the proper _appreciation_ of Library of Congress subject headings, have to be _taught_. First to catalogers, then to reference librarians, then to library users in all their many categories.
</snip>
Teaching people how to use the subject strings doesn't work and never has. The system itself must be completely and totally rethought because people will absolutely not do left-anchored text searches and browse alphabetically through lists. I quoted a "Mr. Line" in one of my podcasts (all I could find): http://blog.jweinheimer.net/2010/11/cataloging-matters-no.html from a paper in 1983 entitled “Thoughts of a non-user, non-educator”. He was quoted as saying that the term user education is, “meaningless, inaccurate, pretentious and patronising and that if only librarians would spend the time and effort to ensure that their libraries are more user friendly then they wouldn't have to spend so much time doing user education.”
In my podcast, I discuss this and compare it to more normal library ideas, but I really think Mr. Line is right. This is because the library catalog is a tool for experts, and as anybody knows, if you don't use an expert tool regularly, you forget it. So, here is a student in an information literacy workshop that lasts for either a couple of hours, or maybe even a semester, and while they might wind up learning how to use the headings, they will most probably have forgotten it in just a few years (or months, or weeks or seconds) and they turn to something that is easier such as Google. Expecting the world to change everything they do to use our catalogs may have worked before the web when there was a single choice to get information: the library, or go without. But today there are many tools out there, each competing wildly for the public's attention. If one is very difficult and complicated and strange, while another is simple, cool, and gives pretty good results, it's a no-brainer which will win.

What is the reality of how users experience subject headings? This is how I witnessed my patrons experience the headings over and over (sorry to refer to another of my own postings, but that's what happens. Besides you can see the whole discussion thread if you want) http://article.gmane.org/gmane.education.libraries.autocat/3962/match=weinheimer+fascism+bologna

Many could, and do, conclude that the subject headings are useless but that is incorrect because when properly used they furnish levels of access not available through any other tools. But expecting the world to change just because catalogers cannot reconceive how the subject headings could work in a new environment (and it's not so new anymore) is the path to oblivion. If catalogers cannot reconceive how the subject headings could work online, perhaps information architects could, or even library science students, that is, once they understand their potential power.

Friday, July 27, 2012

Re: [ACAT] RDA/FRBR buy in

Posting to Autocat

On 26/07/2012 22:58, Aaron Kuperman wrote:
<snip>
RDA is coming out at a bad time economically, and much of its advanced functionality will require a new communications format (replacing MARC) and totally new OPACs. AACR2 has clearly failed, so much that many customers (users) are switching to keyword internet searching and serious thought is being given to liquidating cataloging.
</snip>
RDA is coming out at a truly bad time and a new format is critical. But I don't see how it is that AACR2 has clearly failed, although I admit the *catalog itself* clearly has. This is because it is still designed to function as a card catalog and the authority files--so necessary for comprehensibility--don't work. The subject heading strings, and the left-handed text browse mechanisms have become as obsolete in the online environment as a stone axe has in our modern world. The catalog has to be rethought from scratch, and should have been rethought over a decade ago.

But even then, assuming for the moment that the cataloging rules of AACR2 have been a failure, it remains to be shown how the cataloging rules of RDA are better designed to provide the public with what it wants--unless we are to make the additional assumption that the public wants the FRBR structures, *and* that the current facet approach in the newest catalogs is inadequate for people. This winds up stretching the assumptions so far that they snap, and requires some form of evidence.

After RDA is implemented, the public will see nothing different except the more perceptive may notice some inconsistencies in how a few words display (not "cm" however, which is for some unfathomable reason that I do not care to fathom, a symbol instead of an abbreviation! Ha! Ha!) but the catalogs will work precisely the same so people will experience the same problems.

Eventually, after much more work, much more expense, full FRBR can be implemented that will allow the public to navigate the WEMI almost as easily and simply as they can today with faceted catalogs. Of course, since less than 20% of all resources have more than one manifestation, it won't be noticed that often anyway, plus what is the relevance that the WEMI kind of structure has for the becoming-weirder-and-weirder digital resources? Has anybody seen a second edition of a website where the first edition is still around? I am sure there are some, because you can find almost anything on the web, no matter how bizarre, but how many times does it happen? There are different expressions (web pages in different languages, although some versions are normally more complete than others) but no manifestations, that is, unless we want to start thinking about making separate manifestation records for each capture of a site in the Internet Archive, which really are different manifestations. The White House site currently has 2686 captures in the Internet Archive http://web.archive.org/web/20110728223109/http://www.whitehouse.gov/. If somebody wants to do this as a project, you can count me out!

With RDA, we see standards go down: access points go down, "cataloger judgment" goes up, while the complexity of creating the records increases since there is the additional level of WEMI to figure out--even when there is only a single manifestation.

Going with RDA makes perfect sense to me!

Thursday, July 26, 2012

Re: When 95% digitized, do we still need cataloging?

Posting to Autocat
On 25/07/2012 16:23, Li, Yue wrote:
<snip>
When 95% digitized, do we still need cataloging?
</snip>
I remembered another consideration that I neglected in my previous reply: in the future, how will people interact with the metadata (catalog) records? My papers at Oslo and Anaheim discuss this. We see two fundamentally different ways of how people "interact" with the metadata records with the Internet Archive and Google Books.

Concerning individual items, in the Internet Archive, people search the metadata records (I do not believe you are searching the full-text there), see the metadata records and then get to the item. This is how it works in libraries. HathiTrust works the same way except I believe you are also searching the full-text, but you go into the metadata record before going into the full-text itself. In Google Books however, you search the full-text and the metadata records, but you go into the item immediately. If you want to see the metadata record you can, but it is out of the way and people I have met do not even know about the metadata records in Google Books.

So, the metadata record can be front and center, or it can remain behind the scenes. There are massive amounts of metadata in Google, Yahoo and other search engines, but it remains behind the scenes. As people become more and more used to going directly into the actual resources, interposing the metadata record may seem increasingly strange. Some of the undergraduates I have worked with were already having troubles understanding the relationship between the online catalog and the library's collection and I don't think they were stupid. Expectations and perceptions are changing very quickly and interacting with "summary records" (as I called them) is becoming unusual for people.

It is human nature to devalue what you do not see; this is often why, e.g. sewer lines are ignored until they literally crumble, while if the problem is something everyone talks about continually and is in your face, e.g. potholes in the roads, these tend to be fixed more quickly. The importance people place on metadata/catalog records, and thus whether it is considered necessary or not, could be related to how much people actually will see them--or don't see them--in the future.

RE: RDA questions from librarians at small libraries

Posting to Autocat

On 26/07/2012 15:33, Marc Truitt wrote:
<snip>
Similarly, we have more than once heard the argument that "better systems" ought to precede new metadata standards.  This is a chicken-and-egg viewpoint if ever there was one.  The simple, if unpleasant truth is that system designers and vendors are highly reluctant to design and market new systems, absent established metadata standards on which to base them.  Systems people must design to some *specification*, particularly if they are to hope for their products to integrate and be interoperable with the environment in which they function.  Stark evidence of this fact can be seen in the differences between ILS functionality and interoperability in the areas of cataloguing as compared with acquisitions.  Whatever our complaints about the varying ways in which our systems handle MARC-based bibliographic metadata, so long as the metadata itself is reasonably standards compliant, there will always be some level of core functionality on which we can depend.  Even more important, and again as long as the records are reasonably standards compliant, we always have the hope that we can migrate our data to newer, better systems down the road.

Compare this with the scandal that is ILS acquisitions metadata.  There are virtually no standards, save for those promulgated by publishers and vendors -- in other words, not designed for library use -- and even these are as often as not observed at best only in the breach. Acquisitions data are notoriously difficult to migrate between systems, given the lack of an accepted standard and the fact that systems implement proprietary functionality and libraries tend to customize or establish system work-arounds to accommodate local workflows.  Yikes! 

</snip>
I think a lot of the problems of the cataloger/IT divide stem from different interpretations of the meaning of "metadata". For many IT people, metadata means the coding, while the "information inside the coding" is the "data" which populates the database. IT people are not that interested in the data, except whether it is UTF-8 or ISO-8859 or, if it is a date or a code, that it follows some accepted standard. We see it clearly in statistics databases, which will have the coding for a specific field (this is the metadata), e.g. acreage devoted to wheat, and one instance of this field has the statistic of 500 (this is the data). There are also the provisos that this statistic represents hundreds or thousands (50,000 or 500,000) and is in acres or hectares or square kilometers. This is what many IT people mean when they point out that our "text" (our titles, authors, dates, paging and other bibliographic concepts) should be transformed into "data". The "500" in the statistics database really can do a lot when related to other "data" out there (linked data!) but if it is misinterpreted and in one database 500 is hundreds while in another it is thousands, the results will be gobbledygook and any interpretations from it are wrong.
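
The wheat example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Statistic` class, its field names, and the `to_acres` helper are hypothetical, and the hectare conversion factor is approximate. The same stored "500" yields different quantities depending on the provisos attached to it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statistic:
    field: str       # the coding (the "metadata"), e.g. wheat acreage
    value: float     # the stored number (the "data"), e.g. 500
    multiplier: int  # proviso: does the value count hundreds or thousands?
    unit: str        # proviso: acres or hectares?

# Approximate conversion factors to acres.
ACRES_PER_UNIT = {"acre": 1.0, "hectare": 2.47105}

def to_acres(stat):
    """Resolve a stored value into acres using its provisos."""
    return stat.value * stat.multiplier * ACRES_PER_UNIT[stat.unit]

db_one = Statistic("wheat_acreage", 500, 100, "acre")
db_two = Statistic("wheat_acreage", 500, 1000, "acre")

# Same field, same stored number, tenfold difference in meaning.
print(to_acres(db_one), to_acres(db_two))
```

Linking `db_one.value` to `db_two.value` directly, without the provisos, is where the gobbledygook comes from.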

The mentality is also seen in the discussions of IT people about ISBD where they often focus solely on the two or three pages of punctuation and the rest of it is of no interest to them. IT people look at ISBD and think, "How old and obsolete is that??!!" Catalogers don't care much about the punctuation and are instead interested in the hundreds of pages of absolutely essential guidelines on how to approach an item, determine which information to choose, which to ignore, how to input it, and all in a standardized way. So, when you state "absent established metadata standards on which to base them" lots of catalogers will reply that our metadata standards go back longer than anybody else's. It's a different concept of metadata vs. data.

(To add yet another level of complication, David Weinberger gave a talk and mentioned that "Metadata isn't what it used to be". He is claiming that today, metadata is what you know--a few lines of a poem, a couple of words of a song, the color of a book that you can't remember the author or title--and data is what you are looking for. He may be absolutely right and then the IT people *AND* catalogers will have even more to argue about! For those who are interested, there is a link to his very interesting talk in a paper of mine. http://blog.jweinheimer.net/2012/06/reality-check-what-is-it-that-public.html)

Ultimately, I think that much of the problem is that IT people insinuate themselves too much into cataloging matters, and catalogers insinuate themselves too much into IT matters. For me, I simply don't care one bit if MARC 100 changes into "creator" "author" "writer" or "abc123", so long as it works. To me, it's just a bunch of stupid computer codes and one code will work just as well as any other. The computer doesn't care. What matters is that everyone who is inputting real information into that field/area must interpret it in the same way and the information is compatible somehow. And there lies the real problem. For instance, one argument I have had with IT people is that the "creator" of a scan of the Mona Lisa should be the name of the scanner software. After all, it's an image file, not the physical resource of the Mona Lisa hanging in the Louvre, and no human created the file. Others have said that the person who pushed the button of the scanner is the "creator". The concept of "title" can be interpreted in a host of ways, as can each and every bibliographic concept. Arguments such as these are very tiring and depressing to me, and they are why I regret that numbers will not be used, so that everybody would have to look it up instead of deciding that they know what "title" means. Numbers will not be used however, and this is where I fear that gobbledygook genuinely threatens, especially in the looming linked data universe that so many appear to be looking forward to. It frightens me.

One obvious step toward a solution is to separate the IT responsibilities from the cataloging responsibilities. But I don't know if that will happen anytime soon.

Re: When 95% digitized, do we still need cataloging?

Posting to Autocat

On 25/07/2012 16:23, Li, Yue wrote:
<snip>
When 95% digitized, do we still need cataloging? 
  • Creator creates information in e-file. 
  • Publishers (individual or companies) will digitize and publish/are digitizing and publishing in e- resources/format.
  • Machine will generate bibliographic information and index almost everything for searching.
  • (Can machine find and build up relationships and generate/construct LinkedData? More or less, I am sure.) 
Like it or not, I see it coming. ProQuest takes away all our ETD. 
I am struggling with this question. I know lot of experts here can provide their insight and expertise. 
Some GA (Graduate Assistants) are working here and I am reluctant to recommend cataloging for their future career.
</snip>
I believe this will become the major question facing the cataloging community; and the topic may become acute fairly soon. My own opinion:

There will be a difference between description of the item and "headings/access points/semantics/linked data/whatever we want to call that part". Based on the talk by John Unsworth I mentioned in my last post, plus how the "born digital" materials are developing, the emphasis on catalogers creating works, expressions, manifestations, items may not be what people will want. (As I have mentioned before, I don't know if the public wanted WEMI so much in the past either, but that is a question for historians) For non-fiction materials, many people often want just bits and pieces of a book, especially for research purposes: the "perfect" 20 pages on the topic out of a 350 page book. People have always wanted bits and pieces, but it used to be that the only way toward the 20 pages was through the WEMI structures found in the catalogs. For instance, do you think that the authors who have a 30 or 40 page bibliography read each and every one of those materials completely? At the same time now and in the future, there will be many other ways to get to those 20 pages.

So, if catalogers continue to describe items in ways that are useful for the patrons, these sorts of issues will have to be considered as the public's expectations and needs change. For non-text materials: music, films, images, especially serials, and so on, description will probably always be needed, although Google Image search is impressive and improving.

For fiction materials, you normally want the entire item, and now works of fiction are often written in series/long installments, but I think that the public's needs for fiction will be pretty much served by current *descriptive* practices.

If descriptive cataloging continues, and if resources change to XML coding, much of the description can be taken directly from the resource.

On the other hand, when it comes to assigning headings/access points/etc. I have seen nothing that replaces current library methods for searching concepts but our user interfaces have broken down online. The cataloging community must demonstrate the utility of the headings because it has been forgotten but that will not be easy at all. *If* the library community can demonstrate the value of our conceptual access (subjects, classification, name headings) I think there would be a real demand for those powers, that is, if people really understood what it means to search by author, title, subject and it was easy for the public to do it.

To sum up: conceptual cataloging will always be needed, although it will have to be resold to the public. Also, to return to my old drumbeat--RDA and FRBR are going down a false path. Just going into the Semantic Web is also not a solution.

What does the library catalog provide the searcher--aside from a bunch of catalog records that are arranged in different ways that allow the FRBR user tasks? What more does it do? A lot, I think.

Re: [ACAT] Gender terms in authority was RE: Advance Notice: Phase 1 of the PCCAHITG...

Posting to Autocat

On 24/07/2012 21:01, Marc Truitt wrote:
<snip>
Umm, with due respect to Jim, I'm not so certain that I agree with the implications of this statement.  All new/enhanced functionality has to start *somewhere*, and unless we accept that it may be incomplete or imperfect, we will never implement it.  As I understand Jim's view, one could easily argue that we should never have begun adding contents notes or TOCs to bibliographic records either, unless we intended to go back and do all the millions of legacy records we'd created before that lacked such information.  Or, to take another (perhaps extreme) example, it might be argued that we should never have used MARC for anything at all except card production, lest we otherwise be forced to rely on split files, one online and the other representing the huge pile of paper-based cataloguing.

But of course, that's *precisely* what many of us did, for many, many years.  Yes, there were recon projects in many places -- believe it or not, I'm  working in an institution that is only now nearing the end of that tunnel! -- but we accepted the fact that automating would result in split files for years or even decades.  And some of us continue to accept this situation, cumbersome though it may be.

In my mind's eye, I can imagine an implementation of FRBR that someday might allow me to ask the catalogue -- whatever catalogue -- to show me all the manifestations of all the expressions in a collection that could be said to be traceable back through the Western literary tradition that begins with the work _Romeo and Juliette_.  It's pie-in-the-sky right now, but you might call it my proof-of-the-pudding FRBR fantasy.  But in order to have even a remote hope of getting there, I have to accept that we start drawing relationship links at some point.  We worry about the unlinked previous objects sometime after that point.

We regularly implement new features and functionalities in our systems, knowing full well that they will not apply to work we've already done for a long time, if ever.  It's a part of life, and we all simply deal with it.
</snip>
It seems to me that it is much better, especially given the powers of modern systems, to focus on what we can provide--right now, today, instead of aiming toward possibilities that may or may not give semi-decent results after 15 or 20 years or so, but we know that we will have to undertake huge retrospective projects before decent results are even possible. If new capabilities such as gender are needed so badly, we should discover what other tools exist (e.g. even books!) that may give even better information to the public right now. The catalog was always primarily a pointing device and to expect it to do fundamentally different tasks such as to "find all female authors writing on spiritual matters" is asking the catalog to do more than it was originally designed to do. It seems much the same as someone saying, "Let's add barcodes to the books so that it makes circulation easier. I just made a new field for the barcodes" and walking away. When adding new books, it's easy, so for the catalogers, who focus on adding the new items, it's easy, but for the rest of the library and its patrons, when you are looking at tens or hundreds of thousands of books already in a collection, or more, and they are currently being used, it creates huge problems for everybody else that people will have to solve somehow. Therefore, it will take significant resources for a long time to make the barcodes really useful. It may be decided that adding barcodes may, or may not, be worth the costs and efforts because there are definite benefits, but adding gender is of a completely different nature.

As I said before, if somebody finds the cash to add the gender field or even if the work is crowdsourced it may be worthwhile, but it seems to me that not putting practical considerations front and center means promising more than we could ever furnish and setting ourselves up to fail. Today, there are new possibilities: are there ways of hooking into other systems that may have this information? If so, is it enough to just point to those resources (i.e. catalog them), or do they have to be incorporated into the mechanisms of the catalog? If they are to be incorporated, what does that entail? Is this the best way to use the library's resources? These are logical questions and yes--it means making a business case.

I confess that in many ways, I am incredibly conservative as a cataloger. (Not so much in other ways) I fear that if we are not very careful, we could easily make our database obsolete, or spend time filling it up with lots of information that cannot and will not be used, and thereby promise far too much that can never be fulfilled--at least not without scads of manual updates. That would be setting ourselves up for failure. And our field cannot afford that right now. Saying that something might come in handy someday runs into the same arguments that have provoked the cataloging simplification of the last several decades, where so many notes and other practices were eliminated. We shouldn't start that process again. After all, Seymour Lubetzky himself came up with the question: "Is this rule necessary?" Still wise words today.

At the same time, there is a great deal of data in our records that goes unused right now. We should be concentrating our efforts on that.

In my mind's eye, I can foresee a time when 95% of everything is digitized and OCR'd, so that contents notes and TOCs will be searched in the items themselves, and that same information in the catalog will become more or less obsolete. There will still be cleanup but it will be correcting the OCR--something that many, many people, including myself want. At the same time, with improved OCR all kinds of additional computer manipulation could be done, much like a Google Ngram Viewer, but much improved. http://books.google.com/ngrams. For an idea of what the current directions are in these fields, see John Unsworth's talk at MIT http://video.mit.edu/watch/building-and-using-big-digital-libraries-john-unsworth-11546/ where he discusses some of these new capabilities.

Wednesday, July 25, 2012

Re: [ACAT] Gender terms in authority was RE: Advance Notice: Phase 1 of the PCCAHITG...


Posting to Autocat
Hal Cain wrote:
<snip>
This kind of correlation is extremely difficult to perform in current systems, and unless some clever system design, backed by enhancement of existing authorities and bibliographic data, is done, the usefulness of library systems to library users will be, in terms of comparison with other ways of correlating data, overtaken.  At that point the trickle of questions asking about the value of what cataloguers do will become a torrent.

Questions that look difficult from inside the systems of bibliographic control we inhabit are precisely the ones we have to create good, useful, straightforward answers to; and the data has to be there to make it possible, or people will stop expecting us to be useful.
</snip>
This seems to be wise counsel. It is important to look at what we have now and come up with new ways it can best be utilized for the public, instead of saying, "Well, if everybody had been adding gender all along, then people could search for various genders, so let's just add the information".

When we say, "People would like to find women writers from the Italian Renaissance," I sympathize, but since we have not been putting that kind of information into the records, we would have to start from scratch. When people do a search, they should expect some kind of meaningful coverage, so if the search works only on the records created after 2013, the search misses 99.9999% of everything since the information isn't there. If someone found the money for a special project to add this information, or they decided to crowdsource it, those attempts could be interesting, but I ask again: is that the best use of library resources? If someone wants that kind of information, why not use the catalog to find a book (or other resource that is in the collection) that might have that information already? So, how about trying a search such as subject keyword "women authors dictionaries" and similar searches, where they may find a resource that will provide them with a better and clearer answer than the library catalog ever could.

Sounds to me that if someone has this kind of question, they should go to a reference librarian for help because there is probably a resource in a library somewhere! Plus, there is a nice scope note under "Women authors".
So this leads me to ask: what can we realistically expect from the catalog without going into every record and updating it? It is not there to answer every single question, but there is a lot that the catalog can do.

Monday, July 23, 2012

Re: [ACAT] Must article to read

Posting to Autocat

On 21/07/2012 21:00, MULLEN Allen wrote:
<snip>
As to non-librarian users of catalogs- again no disaster results from RDA that I can perceive (what crashes do you foresee?). As for non-librarian user information needs, you have made the case yourself that user information needs transcend the resources available in any library and that the quality and variety of quality academic and non-academic resources available, now and into the future, are enormous.

 ...

It is easy to define "business case" (Prince2 or whatever the norms in the business world might be) in such a way that no matter what is offered, it can be said it doesn't fit a given definition of what a business case *should* be. However, the test committee report (as well as the Working Group, etc.) have defined a number of goals for greatly expanding the capabilities of the library catalog environment, and substantial work, including RDA is being done and has been done for several years now to achieve those goals. You feel that this is insufficient, I and other Autocat readers understand this and why. However, If RDA fails, users can still walk up to our catalogs and find local holdings as reliably (or not as one's perspective might hold) as they can at present. I can come up with any number of imagined scenarios where this wouldn't be true, but they would be idle speculation unless something more substantial than opinion is offered.

...

But identifying an ad hominem, guilt by association, attack such as ""Today's believers in the superiority of their own creations, like the theologians of yesterday, are sure to blame Adam and Eve---and catalogers---for any and all problems that follow on the implementation of the Next Generation Catalog. That much is entirely predictable" as a profound insight does not seem helpful. It assumes that if RDA fails, catalogers will be blamed - an unfair and unsubstantiated assumption. It assumes that RDA developers are blinded by devotion to their creation without any substantiation that a single person, much less the scores or hundreds of librarians involved in RDA development is thus so arrogant and blind. This perspective has the same basis in reality that the accusations that IPCC and thousands of scientists around the world, along with government agencies and others, are engaged in vast conspiracy to impose massive changes under a delusion that there is anthropogenic climate change. It is tantamount to believing that Judith A. Kuhagen, Barbara Tillett, Chris Oliver, Pat Riva, Alison Hitchens, Daniel Paradis, Clement Arsenault, Jennifer Bowen, Athena Salaba, Robert Maxwell, Adam Schiff, Sue Andrews, Alison Hitchens, Lars Svensson, Mireille Huneault, and as I said, hundreds of others, who are involved in an endeavor bound to fail through their ignorance, and bound to be blamed on catalogers. SRSLY?
</snip>
There definitely will be consequences to RDA implementation. Except for those libraries that have a knight in shining armor who will ride up and drop off the extra cash, money and resources will have to be diverted from other library services that are more vital to the public than implementing RDA. (To disprove this statement means to make the business case) Other consequences include lowered productivity because RDA is even more complex, split files, cleanup, paying for online subscriptions, plus there are always unforeseen costs--and that is only for the relatively small changes of implementing RDA. What will be the result? The public will be looking at the same catalogs that they have already rejected. Also, modern solutions are ignored with RDA. In the case of typing out abbreviations, it is no solution at all since the public will forever be looking at abbreviations until the end of time. This shows a steadfast refusal to look at matters from the public's viewpoint (who will still be seeing the abbreviations in the older records) and is also guilty of implementing 19th century solutions (manual retyping) in defiance of far more powerful and efficient capabilities of the 21st century. The results to the users? Practically none. Except for changing the rule of three to the rule of one. Those who really believe that changing to the rule of one will "release the inner cataloger" or will lead to "catalogers gone crazy" and access points will go up have absolutely no understanding of human nature. It is only realistic to conclude that the number of access points will go down. Of course, when full FRBR implementation comes, I think we all know it will be even more expensive. 
The results on cataloging departments will be more work, more complex work, while staffing levels will at best remain level, and still--since there is no business case--the "bean counters" will be demanding more and more reasons why they should continue to fund tools that are falling farther and farther behind the needs of the general populace. They will not sit still to listen to explanations of theories or ontologies or be swept away by heartfelt declarations of faith that all will be fine when we reach linked data. I have been through all of this before.

So, there definitely are consequences.

I am very concerned about another point, however. Allen appends the list of names: "It is tantamount to believing that Judith A. Kuhagen, Barbara Tillett, Chris Oliver, Pat Riva, Alison Hitchens, Daniel Paradis, Clement Arsenault, Jennifer Bowen, Athena Salaba, Robert Maxwell, Adam Schiff, Sue Andrews, Alison Hitchens, Lars Svensson, Mireille Huneault, and as I said, hundreds of others, who are involved in an endeavor bound to fail through their ignorance, and bound to be blamed on catalogers."

Such a characterization troubles me immensely and I must reject it absolutely and completely.

<soapbox>
  <stepup />
I am a librarian, and one of the major reasons I became a librarian was because I believe very strongly in freedom of thought and freedom of speech. These imply freedom of inquiry. Progress is made possible only when good, pointed questions can be freely asked of anyone. Excellent, uncomfortable questions do not constitute personal attacks and when they are considered so, freedom of inquiry disappears. Anyone--at any time--can be wrong. And of course, I can be wrong as well. History has demonstrated this simple fact innumerable times so it should come as no surprise that any or all of us may be wrong at this very moment. Questioning, so long as it is asked in a civil way, is not any kind of insult nor is it morally suspect. The absolute need for "falsifiability" is a vital part of modern society. If you are not free to ask questions and everyone is supposed to just accept the opinions of certain people as fact because otherwise it is an insult, modern society could not exist. The areas that are off limits are then termed "dogma". Ptolemy, Plato, Aristotle, Aquinas and many many others could not be questioned for a long time. Right now in the Capitoline Museum, there is an absolutely fascinating exhibition from the "Secret Archives" of the Vatican library called "Lux in Arcana". In it, you can see Galileo's "confession" where he was forced to acknowledge that everything he had seen and written about the solar system and universe was wrong. He confessed to be in error and that the Earth is the center of the universe and does not move. Ptolemy, a much greater influence on the world than any of those you mention, or any of us, could not be questioned. Did Galileo really believe that he was in error? No. But he was not free.

Therefore, I feel that librarians, above all other professions, should be strongly in favor of free speech and free inquiry, and this means to be even more diligent when applying it in their own fields of endeavor. Otherwise, it is all empty rhetoric.
  <stepdown />
</soapbox>

I have great respect for those who have worked so hard for RDA, and I have gone on at length about it. Yet that still does not make them immune to being questioned. Especially from a peer, which I consider myself to be. Each and every one of them can be wrong about what users want, and it is no attack to say so and ask for the evidence. Of course, no one has to reply to any questions but in that case it doesn't stop the inquiries. The people who question are equally free to draw their own conclusions from the silence and to share those conclusions with anyone they wish.

Saturday, July 21, 2012

Re: [ACAT] Must article to read

Posting to Autocat

On 21/07/2012 16:15, Brenndorfer, Thomas wrote:
<snip>
If a library, such as the Library of Congress, has established practices which it deems justified to be its business, such as the organization of intellectual or creative works, and resulting expressions and manifestations, supported by records for other entities of interest, then the burden of creating a business case has largely been made.
</snip>
Even the Library of Congress said that they could not make a valid business case http://www.loc.gov/bibliographic-future/rda/source/rdatesting-finalreport-20june2011.pdf p. 4. So, I think it is quite a stretch to conclude that the business case has been made. It is much more logical to conclude that RDA will be implemented without a business case. Certainly there was no public testing or any prototypes of anything. I am just pointing out that that would *never* be allowed in a normal business environment.
<snip>
In addition, the underlying model for FRBR, the entity-relationship model, has a proven track record in creating practical solutions for complex data needs. To not acknowledge such practices and realities seems colossally irresponsible. Assigning a mentality such as
"Today's believers in the superiority of their own creations, like the theologians of yesterday, are sure to blame Adam and Eve---and catalogers---for any and all problems that follow on the implementation of the Next Generation Catalog. That much is entirely predictable."
to practical, reliable, sensible people who have worked through the well-travelled data modeling exercises for bibliographic data is such a mindboggling stretch of thinking. It's absurd. One can easily assign this mentality to any group, such as catalog traditionalists who frown on those who fumble through their well-manicured card catalog creations. Kudos to those who work hard in the trenches to improve library data and systems and user experiences with what they have to work with, and who work hard keeping the historical and human side of the equation in mind when developing new models or tools, and have to endure such relentless maligning from on high.
</snip>
This is the problem with criticizing a project and why I mentioned that when I worked in grocery stores, if the designers had tested those checkstands with real clerks, those clerks were obviously too afraid to tell them the truth, or (I am adding now) their comments were just ignored because they said things the designers did not want to hear. So, in this case, we have "practical, reliable, sensible people who have worked through the well-travelled data modeling exercises for bibliographic data" and therefore, questioning them is just absurd.

Of course, in his article David pointed out how errors of the cockpit designers led to crashes, so what he is saying is not absurd but is based on sad experience. In my career, I have lived through some disastrous library projects in catalogs, and not only in catalogs. (I refuse to discuss them) Designers of systems are only human; they often get very attached to their projects and immediately go into "parental protection mode" when someone criticizes them. They can also be guilty of hubris, of believing that their researches and theories have covered every possible contingency, and if people do not like what they have created, it has to be a problem with the people, who are stupid, inept and ungrateful. Consequently, the very purpose of the project changes from creating a service that fulfills the needs of the users of the system into one that makes the designers happy and proud. That is when it becomes a disaster for the organization that is building the project.

That is why there are project management systems such as the PRINCE2 system I mentioned, and why it places so much authority with the Senior User.

Re: [ACAT] Must article to read

Posting to Autocat

On 20/07/2012 23:30, MULLEN Allen wrote:
<snip>
Lawrence Creider writes:
I think the problem is not the accuracy of the characterization of Mr. Bade's comments as Marxist but the implicit assumption that that label automatically discredits his arguments.
I don't believe it discredits the comments because of an association with Marx, it discredits them (in my opinion) because there is not a well-made argument that indicates that RDA and next generation developers are either seeking to expropriate the means of production, or are slavishly devoted to technology (the Deist analogy). Since these are included at the end of an essay with nothing that I found tangibly supported either of these references. I can see how it might be read that I was disparaging Mr. Bade's essay by drawing out those comments in my analysis. However, they stood out as being an integral part of the concluding paragraph (there was an expropriation mention earlier as well) so I believe were worthy of inclusion in explaining how I perceive Mr. Bade was framing the argument. I didn't think he was associating himself with Marx, rather he uses the associations to label RDA/NGC librarians in a disparaging manner, concluding his ad hominem argument thus. </snip>
This entire thread has been quite enlightening. For one thing, it highlights why I believe introducing politics into these kinds of discussions should not be done because doing so unnecessarily divides people and diverts attention from finding solutions.

My own reading of David's paper focuses more on his discussion of the airline cockpit designers and their attitudes, e.g.
"In the second chapter “Almost without a pilot” he discusses how in the face of problems and crashes following the installation of automated cockpits aeronautic engineers blamed the pilots: it was not a problem with the technologies but the pilots who did not understand how to use the software or how to dialogue (remember Le Marec!) with the computer"
and
"System failures leading to airplane crashes or other accidents are understood in terms of the statistical probabilities of an accident; accidents are normal and expected (cf. Perrow). But when a human is found to be at fault, the accident is no longer normal, no longer a statistical probability but a moral fault, and the pilot is judged according to moral values, not statistical probabilities."
and
"...a rash of accidents erupted in the early 1990s and led to an acknowledgment of the complexity of the real world. The cultural diversity of pilots throughout the world—such as the fact that they did not all speak or read English well or even at all—was finally admitted to be an unavoidable factor in the operation of a global aeronautics communication system."

In various other careers in my past, I have seen precisely this same occurrence whenever new technology was introduced. The workers are supposed to adapt to the new technology and if they cannot, they are labelled stupid, backward, or hostile. I remember when I used to work in grocery stores when the very first scanners, with entirely new checkstands, were installed. The management was planning on getting rid of many people because we would all become "more productive". The checkstands looked very good but the moment you started working in one, it turned out to be less comfortable, more tiring, much slower and was really a drag to keep clean. Obviously, they were designed by people who had done a lot of research, but not by people who actually used them. I decided that if they had tested the checkstands with real clerks, those clerks clearly did not feel free to tell them what they really thought.

Well, it turned out that at best, everyone was able to go about half as fast as with the older checkstands. The management decided that the clerks were going on some kind of slow-down strike (even though the managers were only half as fast too!) and it just added to the many areas of tension between management and employees.

This is one of the purposes of (sorry for repeating it!) a business case, and how it works in the PRINCE2 method that I have been a small part of. There is one role called a Senior User who is one of the top members of the project team. "The Senior User(s) is responsible for specifying the needs of those who will use the project’s products, for user liaison with the project management team, and for monitoring that the solution will meet those needs within the constraints of the Business Case in terms of quality, functionality and ease of use." In other words, what the Senior User says cannot just be summarily dismissed by the Senior Supplier, who is responsible for supplying the needs of the Senior User. Both are at the same level. http://www.prince2.com/prince2-methodology.asp

So, in creating such a business case for the new cataloging environment, who would the Senior User represent? The public and the librarians, one part of which would be the catalogers. All of this involves doing research and outreach, to determine as best as possible what people want. The Senior Supplier then creates prototypes that the Senior User evaluates, and at certain stages, prototypes are laid before the public for their comments. In this kind of system, the Senior Supplier is *forced* to listen to the Senior User. The Senior User at the same time must be understanding of the technical problems and must be flexible but a system such as this at least attempts to bring the needs of the users into serious consideration.

Needless to say, none of this has been done with FRBR or RDA, and the danger is that the library community will go down the same path as the airplane cockpit designers vs. the pilots, when the crashes and malfunctions finally could not just be fobbed off on the pilots. Similar outcomes in the library community should not be a surprise to anybody. And as David ends his paper, "Today’s believers in the superiority of their own creations, like the theologians of yesterday, are sure to blame Adam and Eve—and catalogers—for any and all problems that follow on the implementation of the Next Generation Catalog. That much is entirely predictable."


I consider that a profound insight.

Friday, July 20, 2012

Re: [RDA-L] AACR2 records in OCLC

Posting to RDA-L

On 19/07/2012 21:31, Robert Maxwell wrote:
<snip>
For the same reasons we might upgrade a pre-AACR2 record to AACR2. RDA records have lots of advantages over AACR2 records. The abolishment of the rule of three is an example that comes to mind quickly.
</snip>
Revising the rule of three to the rule of one plus illustrators of children's books(!) is difficult to call an advantage--that is, if you are saying it is an advantage to the public. As I mentioned in my paper in Buenos Aires, somebody has to be realistic sooner or later. My own reality has always been to be happy when I see that fourth author or corporate body--and I don't think I am all that different from, or more terrible than, anybody else! "Thank heaven for the rule of three!" http://blog.jweinheimer.net/2012/02/is-rda-only-way-alternative-option.html

Catalogers are human beings, and to expect human beings to do more than is required is unrealistic. To expect people to do so without any rewards, or when the rewards lie elsewhere, as when catalogers are rewarded for making more records (as is the case almost everywhere today) is even more unrealistic. If rewards were changed from number of items processed to number of access points added, then sure--access points would go up but I don't see cataloging departments changing that way. I can certainly imagine someone looking at the growing number of items waiting for cataloging, and deciding that I can do, e.g. 1/3 more records by following the rule of one. There are certain realities of human beings, and realities of following standards that should be acknowledged. People do what is required of them and very few do any more. Especially if there are no rewards except "spiritual" ones--or you may even be punished because your productivity goes down.

This rule will probably have the biggest consequences for the public.

Re: [ACAT] Lack(?) of research, or, Let's hear it for the nobodies

Posting to Autocat

I'll let some of these comments pass and go to the more important ones:

On 19/07/2012 18:43, Kevin M Randall wrote:
<snip>
James Weinheimer wrote:
The technological triumph that now permits searchers to follow the FRBR functional requirements, and with *no need for any expensive, structural changes* should be universally acclaimed by all librarians as one of the great accomplishments in 21st century catalogs, but it has gone strangely unremarked. Certainly more needs to be done with the user interfaces, but nevertheless, FRBR can be done right now. That is a fact. And it is good.
Catalogs have been "doing FRBR" for a very very long time, since long before the FRBR study was ever begun. The point is, *how well* are the functional requirements being met? I think most people would say that at our current state, we're doing much better than before, but nowhere near as well as we could.
Because of this fact, the purpose of the FRBR structure of WEMI has disappeared.
Such a bold statement needs something to back it up. I have seen nothing showing that there is no need to have various editions of HAMLET relate to each other; or no need to have various renditions of "New York, New York" relate to Kander and Ebb; or no need to be able to find a specific copy of A PRAYER FOR OWEN MEANY when searching by that title or John Irving's name; or to be able to find an article in DICKENS STUDIES NEWSLETTER when the only citation I have is for the later title DICKENS QUARTERLY. These are things that FRBR and the WEMI entities are all about.
Of course there is always room for refinement and improvement in the FRBR model. But any criticism of FRBR must have a basis in an understanding of what FRBR is actually saying. It is not helpful to put forth a misrepresentation of FRBR and criticize that. What would one make of a review of THE WIZARD OF OZ that complained about how poorly it told the story of Goldilocks and the Three Bears?
</snip>
There are various issues here. First is your statement "Catalogs have been "doing FRBR" for a very very long time, since long before the FRBR study was ever begun.  The point is, *how well* are the functional requirements being met?" Concerning the first sentence, this has certainly not been my experience at all. I have pointed to examples of WEMI in old printed and card catalogs in several postings. When keyword was introduced however, people were stopped from finding WEMI for a long time because of the way keyword results have been displayed. At first, as I remember, keyword results were arranged by the order input into the file; then it was by date of publication, then by "relevance". Now, default is normally by relevance but the searchers can change the order by author or date if they wish. So, there has been a disruption in the searchers' ability to find WEMI, which has gone on for a long, long time. Now, some allow searching WEMI to a point, but the searcher must do it with a left-anchored text search, which is like expecting people to go back to rotary telephones. (I miss them by the way!) Some catalogs do not even allow that.

Your second sentence is more consequential, however: "The point is, *how well* are the functional requirements being met?" Again I disagree and say that we must determine first if the functional requirements really are what the public wants. There has simply been too much water flowing under the bridge for too long a time to simply assume that the functional requirements are what the public wants. To accept such an unproven statement unquestioningly is, as I have said, the same as believing in a superstition. But I have said this before and I am sure all are as tired of it as I am.

Finally, concerning my "bold statement", look at the facets in Worldcat for Hamlet: http://bit.ly/NJD0kG (the links I have given before don't seem to be working, so I will use this. The search is just au:shakespeare and ti:hamlet. This search can be made more exact). Take a look at the result: the searcher can limit by all kinds of formats, by other people, e.g. Olivier, by dates, languages, etc. As I keep saying, all this can be improved. Hence my "bold" statement, which is, after all, based on an experiment that can be repeated: If the purpose of all of this is to make it possible to navigate the WEMI, it can be done right now and there is no need for RDA or FRBR, that is, so long as the catalogers add the correct uniform titles and you have the correct system. That simple statement of fact seems to make people angry however ...

But, since people have not been able to do WEMI for such a long time now, it is only logical to question whether they want it now. But I would like to avoid these questions since others are much more interesting, such as: what does the catalog really provide a searcher other than the simple WEMI, or separate unit records? Yet, I guess asking such questions is akin to blasphemy.

Of course, I have demonstrated more than anyone else how I do not understand anything about FRBR! :-)

Thursday, July 19, 2012

Re: [ACAT] Lack(?) of research, or, Let's hear it for the nobodies

Posting to Autocat

Allen,

I appreciate you taking the time to construct such a thoughtful response. A few thoughts of mine:

On 19/07/2012 00:34, MULLEN Allen wrote:
<snip>
A few thoughts:
There is a great deal of research and general literature on library user behaviors and perceptions and on trends in information discovery, and on various aspects of RDA. The validity of this literature is certainly open to challenge though I bristle (as many of you well know) when the nature of the challenge becomes ad hominem rather than merits or dismerits thereof. And while James Weinheimer's point that a business case has not been made for RDA beyond generalities, this is no less true of much of what libraries engage in. If anyone can show us a business case that was made for AACR2, I'd like to see it.
</snip>
There was a very practical purpose to AACR2: to increase the amount of usable copy cataloging by getting everyone to use the same name headings. If that happened, a whole number of savings would follow for everyone. It also followed logically from AACR1 which brought the descriptions together and was a small success. Yet, it was still an environment of cards and adding the local headings to all the cards entailed a lot of work. The actual changes with AACR1 were relatively minor but AACR2 was quite disruptive since so many headings would change. It was decided that the advantages would outweigh the disadvantages, but it was a careful decision with advantages that were clear to all, although many disagreed.

Nobody has shown any similar tangible advantages with RDA or FRBR, and this is what I keep trying to point out.

It seems as if the purpose of FRBR has changed since its inception. The original purpose of it was, as demonstrated in its title: if bibliographic records are to function they must follow these requirements, otherwise they do not function. Therefore, it was designed to allow searchers to navigate the WEMI in various ways. Originally, it was seen that the only way to do this was to implement a new record structure where the WEMI became separate "entities" that exist independently. This was a huge change from anything before (as I have tried to demonstrate in previous posts).

Although none of that was demonstrated, I could see at the time that it followed logically from the historical developmental path of library catalogs. Back in the 1990s, it was difficult to imagine anything else. I certainly wasn't able to, since I could barely keep up with what was going on: retrospective conversions, implementing integrated systems and, for me as a Slavic cataloger, the incredible changes with the end of the Soviet Union, and so on.

Then arose the ideas of Web2.0 and Web3.0, of the Semantic Web and linked data. That was when the purpose of FRBR gradually morphed into getting into the linked data world and "turning our text into data". Perhaps that was already in the back of the minds of some people in the creation of FRBR, but it is definitely not a part of the functional requirements. Even so, in the potentially gigantic world of the Semantic Web--the size of which will dwarf the paltry few tens or at most hundreds of millions of records that library catalogers could ever hope to make and where nobody except a few catalogers will be following FRBR or AACR2 or RDA or perhaps any guidelines at all--the utility of placing what catalogers create into the Semantic Web still remains to be demonstrated. This has also never been done. The attitude has been similar to "build it and they will come".

On the technology front, a couple of interesting developments have taken place: 1) that faceted catalogs allow for people to navigate WEMI as easily as anything contemplated by FRBR; 2) that FRBR structures are not needed to enter the world of linked data, and that even the complex RDF coding originally required has been rendered unnecessary since the same results can be accomplished in much easier ways.

These are simple facts. And people can either accept these facts or ignore them.

The technological triumph that now permits searchers to follow the FRBR functional requirements, and with no need for any expensive, structural changes should be universally acclaimed by all librarians as one of the great accomplishments in 21st century catalogs, but it has gone strangely unremarked. Certainly more needs to be done with the user interfaces, but nevertheless, FRBR can be done right now. That is a fact. And it is good.

Because of this fact, the purpose of the FRBR structure of WEMI has disappeared. It's amazing how that can happen: one little technological change can make hundreds of years of work obsolete overnight, as happened with the introduction of the PC which made the typewriter immediately obsolete. As software improved, physical spreadsheets and overhead transparencies slowly disappeared, and many other things are in the process of becoming obsolete. Televisions and telephones, even physical books and records/CDs/DVDs, physical letters and envelopes, are less and less necessary. This has happened very quickly, and we are only at the beginning of the changes.

So, what is the purpose of FRBR with RDA as its first step? It is the duty of interested people to ask these sorts of questions. And to do it in a forthright fashion. I am not predicting doom and gloom--and I never have--since I have no special insight into the future and everything could turn out just fine, but the fact is: without logical and acceptable reasons, success would not be a matter of logic, knowledge, and foresight, but of sheer luck. In the environment I have laid out, I do not know the purpose of FRBR and RDA, and the case still needs to be laid out. None of this avoids the need for a business case but only makes it more critical.

Re: [ACAT] Lack(?) of research, or, Let's hear it for the nobodies

Posting to Autocat

On 18/07/2012 22:29, Kevin M Randall wrote:
<snip>
James Weinheimer wrote:
I am more than willing to admit that my ideas are wrong but please, demonstrate to me which of these papers performed research on the public to demonstrate that people wanted RDA and FRBR over and above other services.
Since I also mentioned the FRBR Bibliography quite a long time ago, I am disappointed that there is still a demand that someone else find the information you want in it but apparently do not want to look for yourself. Nevertheless, I'll point you to one: "Case Studies in Implementing FRBR: AustLit and MusicAustralia" http://www.nla.gov.au/lis/stndrds/grps/acoc/ayres2004.doc (the link for the HTML version in the bibliography doesn't work; try http://alia.org.au/publishing/alj/54.1/full.text/ayres.html ).

In addition, the eXtensible Catalog is a high profile project incorporating the FRBR model that unfortunately isn't in the FRBR Bibliography (since that bibliography isn't all that up-to-date). Information on XC can be found by starting out at: http://www.extensiblecatalog.org/ There really is stuff out there, if you actually want to find it. A rather old-fashioned, quaint method of starting to do that might be by Googling "frbr user research"...

By the way, I believe you asked for research into FRBR and users, not only those that "demonstrate that people wanted RDA and FRBR over and above other services." The latter doesn't sound like any of the research questions I would imagine having been studied. (I would imagine something more along the lines of "demonstrate that RDA and FRBR help guide the users to more successful catalog searches.")
</snip>
This is really a great example. Thanks for pointing it out. Again, whatever libraries create is not in a vacuum and absolutely must be compared with other tools out there. The reason comparison is so critical is that the public will immediately compare anything we create with what they already know. For instance, let's do a quick search for "waltzing matilda" (as in the paper you mention) in Worldcat (http://www.worldcat.org/search?qt=worldcat_org_all&q=title%3A%22waltzing+matilda%22). We see the results with very nice facets. So, if the premise is that people need FRBR (which seems to be taken for granted so I will too for the purposes of this argument), is this display satisfactory? The searcher can limit by author, by format, by dates, etc. Therefore, it seems as if the FRBR/RDA structure is not necessary, if the purpose is to allow people to navigate through the WEMI. Of course, the current Worldcat interface, and probably the search too, can be vastly improved.
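To make the point about facets concrete, here is a minimal sketch of how limiting by author, format, or date can work over ordinary flat records, with no separate work/expression/manifestation entities. The records and field names below are invented for illustration only; they are not taken from Worldcat, and this is not how Worldcat itself is implemented.

```python
from collections import Counter

# Invented flat records for illustration; the field names are assumptions.
records = [
    {"title": "Waltzing Matilda", "format": "Sound recording", "year": 1995},
    {"title": "Waltzing Matilda", "format": "Music score", "year": 1903},
    {"title": "Waltzing Matilda", "format": "Sound recording", "year": 2004},
    {"title": "Waltzing Matilda", "format": "Video", "year": 2004},
]

def facet_counts(hits, field):
    """Count how many hits carry each value of one field (one facet)."""
    return Counter(hit[field] for hit in hits)

def limit(hits, **criteria):
    """Keep only the hits whose fields match every chosen facet value."""
    return [h for h in hits if all(h.get(k) == v for k, v in criteria.items())]

# Show the facet, then narrow the result set by clicking on two values.
print(facet_counts(records, "format"))
narrowed = limit(records, format="Sound recording", year=2004)
print(narrowed)
```

The only thing the sketch relies on is that each record carries consistent values in the fields being faceted, which is the cataloger's part of the job.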

Still, how else can it be improved even further? How about taking this into Google Videos (https://www.google.com/search?q=Waltzing+Matilda+cowan&tbm=vid)? The results are interesting. Still, let's do something more advanced for a moment and limit it in various ways: https://www.google.com/search?q=intitle%3A"waltzing+matilda"+(mpg|mp3|mp4|avi|wmv). This limits the search to the terms "waltzing matilda" just in the title of the resource, plus limits it to various audio and video formats. This is just off the top of my head and I am sure can be improved.

This is what I mean by pointing out what can be done today. It's simply amazing, but not all that simple and beyond the abilities of a normal patron. So, the question becomes: can this kind of coding be done behind the scenes? I will venture a guess and reply: absolutely YES.
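As a rough illustration of what "behind the scenes" could mean, here is a small sketch that builds the same kind of Google query from a title and a list of format terms, so the patron never has to type the operators. The function name is my own invention, and the sketch only constructs the URL; it assumes nothing about how any real catalog would call it.

```python
from urllib.parse import urlencode

def build_title_format_query(title, formats):
    """Build a Google search URL for `title` in the page title, OR-ing format terms.

    Hypothetical helper for illustration; it only assembles the URL.
    """
    terms = 'intitle:"{}"'.format(title)
    if formats:
        # The pipe character acts as OR between the format terms.
        terms += " (" + "|".join(formats) + ")"
    return "https://www.google.com/search?" + urlencode({"q": terms})

url = build_title_format_query("waltzing matilda", ["mpg", "mp3", "mp4", "avi", "wmv"])
print(url)
```

A catalog interface could fill in the title from the record the searcher is already looking at, which is the whole point: the searcher clicks, and the coding happens out of sight.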

Could a catalog interface such as Worldcat, having limited a search in various ways, extend that search into other databases in a way that a user would find useful? The answer is simply: YES. Could a lot more be done right now to improve all of this? Absolutely YES. Do we need FRBR or RDA for any of this? The answer is very simple: NO.

RDA and FRBR are past it. How much more demonstration is needed? Move on into the new environments along with everybody else.

Re: Lack(?) of research, or, Let's hear it for the nobodies

Posting to Autocat

On 18/07/2012 19:30, Kevin M Randall wrote:
<snip>
James Weinheimer wrote:
With RDA/FRBR, there is still *absolutely zero* evidence that any of this will make any difference at all to the public because nobody has ever done the research.
"Nobody"? I suppose the people involved in the projects listed here are all "nobodies": http://infoserv.inist.fr/wwsympa.fcgi/d_read/frbr/FRBR_bibliography.rtf It's one thing to agree or disagree with results of research. It's quite another to deny the existence of the research.
</snip>
I am not trying to insult anyone but this is not a game of egos and we should be trying to reach the truth. I am more than willing to admit that my ideas are wrong but please, demonstrate to me which of these papers performed research on the public to demonstrate that people wanted RDA and FRBR over and above other services. I won't dispute that they represent a lot of work, but they do not demonstrate how RDA or FRBR will make any meaningful difference to the public. To be honest, I don't really care if catalogers like it or not, so long as the public likes it and finds it useful. To paraphrase Mark Twain, catalogers have little or no influence on society(!).

Of course, all of it definitely will make a difference to libraries and catalogers, but not in additional productivity, greater simplicity or additional access. When I have discussed with real, live scholars (I actually know quite a few) what RDA will do, they do not understand anything about linked data and won't listen to it, but when I tell them what differences they will actually see in their research (practically nothing aside from a few display differences), they just wind up laughing and say, "We are supposed to give up some of our own subscriptions to pay for this?" That is a completely understandable question and is in agreement with the LC/NLM/NAL report. These are often the types of people who are the decision-makers in funding decisions. I have wondered how these people were convinced (or not).

None of the scholars I have discussed this with (at least the ones who don't pass out too soon) care at all about navigating through the WEMI, although many definitely do want specific editions. There is a scholarly side to me, and I am one of those who want specific editions, but as far as navigating through them--no. The Worldcat-type facets achieve everything I could want--and even more.

But what is happening now? The Google-type searching is becoming so powerful that it can change the way we think. I still believe all librarians should watch Daniel Russell's "What Does It Mean To Be Literate in the Age of Google?" that I mentioned earlier http://blog.jweinheimer.net/2012/05/what-does-it-mean-to-be-literate-in-age.html. While the questions he poses may not be very realistic (or maybe they are), the talk shows what can be answered today, and that opens up all kinds of new possibilities that could never have been imagined just 10 or 20 years ago. Just one of the many points he makes is that people who do not know Control-F (find text within a page) are not literate today. A fascinating observation. His blog is unnerving for a librarian/cataloger: http://searchresearch1.blogspot.it/ but this is reality.

At the same time, there are many powers of the traditional catalog that are not available in Google. But they are not FRBR. I won't go into that again however.

Re: [ACAT] Advance Notice: Phase 1 of the PCCAHITG Phased Implementation of RDA to begin soon

Posting to Autocat

On 18/07/2012 18:49, Kevin M Randall wrote:
<snip>
How can the changes demonstrate anything, until they actually have taken place? Fighting against the changes, hoping they don't take place, does not help us get anywhere.
</snip>

That is precisely the purpose of making a business case--to ensure that you are building something that people want. Otherwise, everyone is going only on their gut feelings, and not everyone has those same feelings in their guts. Going on gut feelings is not allowed in a business environment, since businesses have been seriously burned so many times, so today the success or failure of a project is based not on holding your breath and crossing your fingers, but on research, market testing, and discussion. With RDA/FRBR, there is still absolutely zero evidence that any of this will make any difference at all to the public because nobody has ever done the research.

When you mention that fighting against the changes does not help us get anywhere--in the current environment, we don't even know where this "anywhere" is! Nobody knows the right way to go. That's why people are expected to make business cases. Somebody should have stood up against the Ford Edsel. Somebody should have stood up when businesses kept wanting to make horse-drawn buggies. But they didn't.

Wednesday, July 18, 2012

Re: [ACAT] Advance Notice: Phase 1 of the PCCAHITG Phased Implementation of RDA to begin soon

Posting to Autocat

On 17/07/2012 23:13, J. McRee Elrod wrote:
<snip>
Stephen McDonald said:
In fact, most of the 1xx changes will make things a lot _more_ consistent than they ever were before.
It seems to me "consistent" is being used in two ways here. I was speaking of the same form of entry for the same person or body over time and across cataloguing agencies. I assume Stephen means consistent principles for forming entries.

I'm all for consistent practice in establishing *new* entries, but established ones (apart from those representing more than one person) should be left as they are. Those forms will remain in older records in underfunded catalogues, resulting in split files, if changes are made.
</snip> 
Precisely. Consistency in heading construction is really a matter of aesthetics for catalogers. People have never understood how or why a specific heading is the way it is. Why should they need to? As I mentioned in a previous posting, people do not know about the concept of "bibliographic identity". How in the world could they ever understand the apparent inconsistency when one subordinate corporate body is entered subordinately to the main body while another is entered independently? Why does one corporate body use IBM while another one uses "International Business Machines Corporation"? Why should people need to understand? Why should they care, so long as it all works? Still, back when everyone was trapped in a printed catalog environment, book or card, perhaps more of a case could be made for consistency in heading construction since the requirement for left-anchored text search made some things clearer, much more so than in a keyword environment with relevance ranking.

But instead of adding information that already exists in other databases, e.g. gender of an author, address of a corporate body, it would make so much more sense to link these databases together instead of retyping the same information over and over and over and over again. Retyping is 19th century thinking, and avoiding it is one of the major points in favor of linked data, after all! Information about the sex of an author can be found in all kinds of databases so why do we have to duplicate that labor? Yet, it does seem reasonable to ask: why is this information so much more important than other services catalogers could provide?

Also, in a computerized environment where the emphasis is on the URI instead of text, a single form becomes less important because displays become much more flexible. One example of a change that I think would be much appreciated by everyone is if language were added to each form of name. I think many people (including myself) would prefer to be able to choose the English form "Bank of China" to "Zhongguo yin hang" while others could choose French or German or whatever. Catalogers wouldn't necessarily have to do much work to add this information since it is the sort of task that could be crowdsourced.
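Here is a minimal sketch of that idea, assuming a name authority keyed by a URI with each form of name tagged by language. The URI and the structure are hypothetical, made up for illustration; the language tags follow the usual convention ("zh-Latn" for romanized Chinese), and only the two forms mentioned above are included.

```python
# Hypothetical authority data: one URI, several language-tagged forms of name.
AUTHORITY = {
    "http://example.org/authority/bank-of-china": {
        "en": "Bank of China",
        "zh-Latn": "Zhongguo yin hang",
    },
}

def display_name(uri, preferred_languages, authority=AUTHORITY):
    """Return the first available form of name in the user's order of preference."""
    forms = authority[uri]
    for lang in preferred_languages:
        if lang in forms:
            return forms[lang]
    # Fall back to any recorded form so the heading is never blank.
    return next(iter(forms.values()))

uri = "http://example.org/authority/bank-of-china"
print(display_name(uri, ["fr", "en"]))   # no French form recorded, so English is used
print(display_name(uri, ["zh-Latn"]))
```

The record in the catalog would store only the URI; which form appears on the screen becomes a display choice for the searcher, and adding a French or German form later requires no change to any bibliographic record.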

I want to emphasize once again that the RDA changes are all theoretical. While some have requested these changes in the past, it does not follow that just because someone requests something, it is best to fulfill it. For instance, why add the addresses of people and corporate bodies? If someone wants an address, the web is the only decent place for that kind of information now. Are catalogers supposed to check the addresses and update them in the records? Or are catalogers now in the business of creating city directories? The usefulness of adding this type of information that can be found elsewhere, especially at the expense of other services in a time of restricted budgets, must be questioned. As an example of this, our British colleagues have created this map http://libraries.fromconcentrate.net/ which I found illuminating. Even the libraries that are listed as "Saved" I am sure still have seriously lowered budgets. I haven't found a similar one for other countries. Is there anything like this for the US?

I think I have shown that I am not against changes, but any changes introduced must demonstrate that they make genuine and useful differences to the public to provide something that is found nowhere else. Plus, we really should be trying to get the cross-references to work along with making the subject headings functional again, and it's a good idea to add URIs whenever possible.

This would be a modern way of handling it.

Re: [ACAT] Advance Notice: Phase 1 of the PCCAHITG Phased Implementation of RDA to begin soon

Posting to Autocat

On 17/07/2012 17:25, McDonald, Stephen wrote:
<snip>
The only thing that will happen in Phase I is that records which are _not_ RDA compatible and _cannot_ be made more RDA compatible through automated processes will be marked with a 667 field saying so. No changes will be made to any access points, so this will not actually require authority maintenance. The purpose is to mark records which will need manual conversion.

After Phase I is completed, qualified NACO participants can start converting individual records from AACR2 to RDA forms. This will be low volume until Phase II.

Phase II is when changes will happen in a big way, and as you have noted, it will be a lot of work for everyone who wants to keep up. This is when the automated scripts will run, converting a large portion of the records which are not RDA compatible. This will happen as close to DAY 1 of PCC RDA implementation as possible (March 31, 2013, if I recall correctly).

After that, there will be slow cleanup of all the records that still need manual conversion work.
</snip>
I can't hold myself back. When all of the heading changes took place when AACR2 was implemented, at least there was the very practical reasoning that everyone in the Anglo-American world would be using the same forms of names. This built logically on the advances of AACR1, which brought the descriptions into alignment, and then AACR2 brought the name headings into alignment. This meant an incredible amount of cleanup for everyone but the goal was very clear: ultimately to reduce the amount of labor because if everybody followed the standards, the name headings on a record produced in e.g. Australia, could be used with no edits in the UK, US or Canada. Many people still thought it wasn't worth the effort but the advantages were clear. With AACR3/RDA, it would seem logical to try to bring the information into much closer accord internationally, this time by using URIs for name and even subject headings.

But no. The changes in RDA are all theoretical. There is absolutely zero evidence that they will increase the amount of usable copy, as happened with AACR2, or that people will be able to find resources more easily than they can now. All of that is just tacitly assumed. And this added labor for cleanup, which will go on for some time, will be at the expense of .... what? Different libraries will decide for themselves which services and/or staff to cut.

And the greatest irony: the public will still be looking at the same old catalog interfaces they've never really understood.

Why is everybody going through these motions? Or has someone already proved that the public really wants RDA and I've missed it?