Friday, October 12, 2012

Re: [ACAT] New Bibliographic Framework: Update with Eric Miller

Posting to Autocat

On 11/10/2012 16:05, Brenndorfer, Thomas wrote:

Part of the essential problem is that you keep saying that FRBR is about finding WEMI through authors, titles and subjects -- as if it just replicates the existing structure.

This is fundamentally incorrect, as it represents only one thing that FRBR supports -- legacy functions in traditional catalogs.

An entity-relationship analysis is a routine exercise done on data, and is the backbone for much of what makes the world work in terms of databases and their functionality. Such a routine operation would have inevitably been done to traditional cataloging data, and as it turned out it was called FRBR. It would have happened anyways. It is just as necessary as it ever was. It's not radical and shiny and new -- it's a routine process related to broader requirements of data management. FRBR is not meant as an alternative to Lucene-type indexing, but does in fact allow catalog data to take advantage of better indexing tools.

FRBR is not just about traditional structures. Rather, it's about the idea that any entity can be found or identified through any attribute. So a person could be found through a variety of elements, not just name, and not just through a left-anchored browse list. A work could be found through a variety of elements, not just its uniform title. A work could be found through its relationship to other entities.

Those attributes and relationships could be turned into facets. You have made it appear as if facets are something new that supersede FRBR, when the truth is completely different. The purpose in FRBR is to clearly define the "things" of interest in any bibliographic data collection, and use that as the basis for organizing the specific bits of data.

From the FRBR report (section 2.3)

"The first step in the entity analysis technique is to isolate the key objects that are of interest to users of information in a particular domain. These objects of interest or entities are defined at as high a level as possible. That is to say that the analysis first focuses attention not on individual data but on the "things" the data describe. Each of the entities defined for the model, therefore, serves as the focal point for a cluster of data."

The problem you have missed is that the elements used in facets are not as flexible as they could be absent a true FRBRized structure.

It's straightforward to imagine how a FRBRized structure works. In finding the video "Troy", a FRBRized catalog would allow the user to find all related entities -- individuals and corporate bodies connected to the work, different versions (expressions and manifestations) that exist, items that the library owns, and also links to related works that can be navigated, such as the original work.

What's missing now in the traditional catalog is efficient reciprocal linking. RDA's list of relationship designators is organized by reciprocal terms. The linking of entities need only be done once and shared globally for great new functionality to be realized.
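The idea of linking once and getting the reverse direction for free can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `links` store and the `link` helper are hypothetical, and the wording of the reciprocal designator is assumed rather than quoted from RDA.

```python
from collections import defaultdict

# Hypothetical table of reciprocal designator pairs (wording is an assumption).
RECIPROCALS = {
    "adapted as a motion picture (work)": "motion picture adaptation of (work)",
    "motion picture adaptation of (work)": "adapted as a motion picture (work)",
}

# entity -> set of (designator, other entity); a stand-in for a shared store.
links = defaultdict(set)


def link(source, designator, target):
    """Record a relationship once; the reverse direction is stored as well."""
    links[source].add((designator, target))
    links[target].add((RECIPROCALS[designator], source))


NOVEL = "Pasternak, Boris Leonidovich, 1890-1960. Doktor Zhivago"
FILM = "Doctor Zhivago (Motion picture : 1965)"

# A single statement made from the novel's side...
link(NOVEL, "adapted as a motion picture (work)", FILM)

# ...is also navigable from the film's side.
for designator, other in sorted(links[FILM]):
    print(designator, "->", other)
```

Only one `link` call is made, yet both entities carry the relationship, which is the point of organizing designators in reciprocal pairs.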

This is already starting to appear in RDA authority records:

100 1 ‡aPasternak, Boris Leonidovich,‡d1890-1960.‡tDoktor Zhivago 
380 ‡anovel 
530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 1965) 
530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 2002)

While this example of authority record data is still rooted in text-based, left-anchored functionality, the fundamental point to the direction set in RDA is to look at all of this data as supporting the deepest underlying connections between entities, and the co-ordination of data about those entities, such that those connections and data can be acted upon in new and more efficient ways-- going well beyond the traditional notions and conventions of how catalog data is currently structured.
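As a rough illustration of how such text-based fields can still be read as connections between entities, the following sketch splits the display lines above into subfields and pulls out the relationships. The `parse_field` and `related_works` helpers are hypothetical and handle only this simple display form, not real MARC records.

```python
# The authority record lines as displayed above (trailing spaces removed).
SAMPLE = [
    "100 1 ‡aPasternak, Boris Leonidovich,‡d1890-1960.‡tDoktor Zhivago",
    "380 ‡anovel",
    "530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 1965)",
    "530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 2002)",
]


def parse_field(line):
    """Split one display line into (tag, {subfield code: value}).

    Simplification: a repeated subfield code would overwrite the earlier value.
    """
    head, _, rest = line.partition("‡")
    tag = head.split()[0]
    subfields = {}
    for chunk in rest.split("‡"):
        if chunk:
            subfields[chunk[0]] = chunk[1:].strip()
    return tag, subfields


def related_works(lines):
    """Return (designator, related work) pairs taken from the 530 fields."""
    pairs = []
    for line in lines:
        tag, sub = parse_field(line)
        if tag == "530" and "i" in sub and "a" in sub:
            pairs.append((sub["i"].rstrip(":"), sub["a"]))
    return pairs


for designator, target in related_works(SAMPLE):
    print(designator, "->", target)
```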

RDA already relegates the instructions for authorized access points to the back of the chapters for the respective entities. The left-anchored heading and uniform title concepts are supported in RDA, but they are not central. RDA, for example, gives primacy to coded identifiers for identifying entities. RDA recognizes that many other data elements support the tasks of finding and identifying entities, not just left-anchored headings and the bits that make up uniform titles and access points.

Until that is properly understood, the discussions on this topic will remain circular and unproductive.

I keep forgetting that the problem is with my own understanding. Because if I understood, I would agree completely with RDA and FRBR because they are so obviously true.


The example you mention of "Troy" would work right now and you don't need FRBR for it. When you find the movie for "Troy", all of the related names, subjects, titles, will be extracted for further searching. Right now, all of these headings are there for filtering, or in other words, will narrow your results to more closely identify what the searcher wants from the original search, but it could just as easily be opened up so that if you click on e.g. Brad Pitt, you would instead search for anything with Brad Pitt. You could do this just by changing the operator to an OR, or by eliminating the original search altogether. I don't think I would prefer that, but the user interface could certainly do it. It could also provide for both possibilities. It could be set to find it all if you wanted, although that would probably be a waste of resources. It could provide entirely new possibilities for retrieval and display. The power is there--it depends on what you want to do with it.
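To make the difference between those options concrete, here is a minimal sketch using plain Python sets in place of a real search engine. The sample records and the `search_title` and `search_name` helpers are invented for illustration; an actual catalog would run these as queries against its index.

```python
# Hypothetical sample records; the headings are simplified.
RECORDS = [
    {"id": 1, "title": "Troy", "names": {"Pitt, Brad", "Petersen, Wolfgang"}},
    {"id": 2, "title": "Fight Club", "names": {"Pitt, Brad", "Fincher, David"}},
    {"id": 3, "title": "Das Boot", "names": {"Petersen, Wolfgang"}},
    {"id": 4, "title": "Helen of Troy", "names": {"Wise, Robert"}},
]


def search_title(word):
    """Ids of records whose title contains the word (case-insensitive)."""
    return {r["id"] for r in RECORDS if word.lower() in r["title"].lower()}


def search_name(name):
    """Ids of records carrying the given name heading."""
    return {r["id"] for r in RECORDS if name in r["names"]}


original = search_title("troy")     # the searcher's first query
clicked = search_name("Pitt, Brad")  # the heading the searcher clicks on

narrowed = original & clicked  # facet as a filter: AND with the original search
widened = original | clicked   # operator changed to OR
replaced = clicked             # original search eliminated altogether

print(sorted(narrowed), sorted(widened), sorted(replaced))
```

The same extracted heading supports all three behaviors; which one the interface offers is a design decision, not a limitation of the data.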

In Worldcat, uniform titles, which I think are crucial, do not seem to be included in the facets, although I confess they would need work for the user interface. Incidentally, I noticed that the indexing in Worldcat seems to be incomplete. I looked for "Pasternak Doktor Zhivago" and did not see David Lean anywhere in the facets, even though he directed the version with Omar Sharif and is in the records as an added entry (AE). If I am correct, it needs to be fixed.

Anyway, I consider that the "great new functionality" you mention with:
100 1 ‡aPasternak, Boris Leonidovich,‡d1890-1960.‡tDoktor Zhivago 
380   ‡anovel 
530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 1965) 
530 0 ‡wr‡iAdapted as a motion picture (work):‡aDoctor Zhivago (Motion picture : 2002)  
could be "solved" with Lucene and new user interfaces, much as I laid out in my talk at ALA. This way, we wouldn't need the outrageous amount of work to add all motion pictures manually to the authority records (this could go on forever). Instead, it could be done smarter, using the information already in the catalog. For example, for the search "pasternak boris"+ti:"doktor+zhivago"&qt=results_page (I get some weird results here, but that is another topic), the user interface could say that it was made into a movie (utilizing, behind the scenes, the uniform title and the subdivision "Film and video adaptations"), that there are x number of videos made in years y and z, that there are so many translations, and so on and on, and that if the question is about Pasternak's Doktor Zhivago, there are ....
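A rough sketch of what I mean, assuming the bibliographic records already carry a uniform title and the subject subdivision. The records below are made up purely for illustration, and the field names and the `summarize` helper are hypothetical; a real system would compute this from the index rather than from a Python list.

```python
from collections import Counter

WORK = "Pasternak, Boris Leonidovich, 1890-1960. Doktor Zhivago"
SUBDIVISION = " -- Film and video adaptations"

# Invented sample records; years and languages are illustrative only.
RECORDS = [
    {"uniform_title": WORK, "format": "book", "language": "rus", "year": 1958, "subjects": []},
    {"uniform_title": WORK, "format": "book", "language": "eng", "year": 1958, "subjects": []},
    {"uniform_title": WORK, "format": "book", "language": "ita", "year": 1957, "subjects": []},
    {"uniform_title": WORK, "format": "book", "language": "eng", "year": 1991, "subjects": []},
    {"format": "video", "language": "eng", "year": 1965, "subjects": [WORK + SUBDIVISION]},
    {"format": "video", "language": "eng", "year": 2002, "subjects": [WORK + SUBDIVISION]},
]


def summarize(records, work, original_language):
    """Derive adaptation years and translation counts from existing records."""
    heading = work + SUBDIVISION
    video_years = sorted(
        r["year"] for r in records
        if r["format"] == "video" and heading in r["subjects"]
    )
    translations = Counter(
        r["language"] for r in records
        if r.get("uniform_title") == work and r["language"] != original_language
    )
    return {"video_years": video_years, "translations": dict(translations)}


summary = summarize(RECORDS, WORK, original_language="rus")
print(summary)
```

Nothing here is added to the authority record by hand; the summary is assembled from headings that the records are assumed to contain already.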

It can be done today! And you don't need to do it by hand.

And yes, I do understand URIs and have even used them. They can be very useful under the right circumstances but they are not a panacea and FRBR has fetishized them. As we can all prove to ourselves over and over every single day, you don't have to have URIs to build extremely useful tools. The web has demonstrated that several times over.

But even so, a URI can be any string at all so long as it is unique. Consequently, the URI in VIAF for Leo Tolstoy is: but in dbpedia, it is: URIs do not have to be numbers, just something unique.

Our terrible, horrible, text-based headings are based on unique forms. Yes--they are truly unique textual strings--and catalogers have gone to great lengths to make sure they are unique because they understand very clearly that they must be unique. Many in the IT community don't really understand that our forms are--really are--unique. Very few other agencies can actually say that, other than ISO language designators or postal codes. A lot could be built on the truly unique forms that catalogers have created with such care. We should use the powers that are built into the system that we already have.
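As a small sketch of building on those unique strings, a heading can be turned mechanically into a URI and back again. The `https://example.org/heading/` namespace and both helper functions are hypothetical, and the headings are given only as samples.

```python
from urllib.parse import quote, unquote

BASE = "https://example.org/heading/"  # hypothetical namespace, not a real service


def heading_to_uri(heading):
    """Percent-encode a heading so that it can serve as the tail of a URI."""
    return BASE + quote(heading, safe="")


def uri_to_heading(uri):
    """Recover the original heading from a URI made by heading_to_uri."""
    return unquote(uri[len(BASE):])


HEADINGS = [
    "Pasternak, Boris Leonidovich, 1890-1960",
    "Tolstoy, Leo, graf, 1828-1910",
    "Lean, David, 1908-1991",
]

uris = [heading_to_uri(h) for h in HEADINGS]

# Distinct headings yield distinct URIs, and the mapping is reversible.
for heading, uri in zip(HEADINGS, uris):
    print(uri, "<-", uri_to_heading(uri) == heading)
```

The uniqueness comes entirely from the headings themselves; the encoding step adds nothing except a form that is safe to put in a URI.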

But setting all this aside, can we really expect that the public will remain patient and will silently wait until RDA and FRBR and the "new bibliographic framework" are implemented before libraries can begin to build something of genuine use to them? Of course, Eric Miller said that neither RDA nor FRBR is necessary and that the new bibliographic framework should be as simple as possible. But waiting for RDA and FRBR and a complex bibliographic framework will take how many years? Only 5 years? That's very optimistic. How about 10 years? That means we will be a decade further behind the public than we are now. What does that mean? As a yardstick, Google has existed as a company only since 1998 (14 years) and went public in 2004 (8 years). Look how much has changed in that time. Where will people be in 10 more years?

Yet, the fact that we could do a tremendous amount right now to make things better for the public that funds us is deemed to be "circular and unproductive". I guess it's just easier to keep putting things off.

Nevertheless, we are supposed to take solace in knowing that the public will wait until we create something that makes us happy and follows our pre-existing concepts--that's what it's all about after all--because everybody knows that FRBR and RDA really and truly do constitute a "Great Leap Forward"!

Darn it! I keep forgetting that I don't understand much at all! :-)
