Tuesday, November 24, 2009

FW: User tags (was Sarah Palin)

On Mon, 23 Nov 2009 11:22:57 -0700, john g marr wrote:

>On Mon, 23 Nov 2009, James Weinheimer wrote:

>> On Fri, 20 Nov 2009 14:08:15 -0500, Henriksen, Phalbe wrote:
>>> Is the library's catalogue going to degenerate into a free-for-all
>>> political forum???

>> While I certainly sympathize with this view, I think that this may be the
>> price to the library if we are to enter the Web2.0 world: we lose control
>> of a lot of tasks where we were the absolute masters previously.
>
> The question is not whether we should retain a form of "mastery" or "pay
>a price" not to play techno-political conformity games, but whether we
>should continue to be and become increasingly responsible for providing
>factual data for public consumption in a transparent and unbiased manner.

So, are you saying that we should not allow users to tag the records? As I tried to point out, there are essentially two options: 1) not to allow tagging, or 2) to allow tagging. Within the second, there are several options for how to implement user tagging.

If we allow tagging, then it can either be managed or not managed. If it is managed, it will demand library resources (perhaps a lot) and there will be some very tricky moments, I am sure. If we want to scrub out obscenities, we can do a lot of that through automated means, but that still leaves plenty of tags out there. And if we manage it, what criteria do we use? Do we maintain that we are the ones who are "objective," "unbiased," or "fair and balanced" while the others aren't?
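A minimal sketch of the automated scrubbing mentioned above, assuming a simple word blocklist; the entries and function names here are hypothetical, and real moderation would need far more nuance (misspellings, context, multiple languages):

```python
# Hypothetical blocklist; a real one would be much larger and maintained.
BLOCKLIST = {"spamword", "slur"}

def scrub_tags(tags):
    """Drop tags containing any blocklisted word; pass the rest through."""
    kept = []
    for tag in tags:
        words = set(tag.lower().split())
        if words & BLOCKLIST:
            continue  # tag rejected by the automated filter
        kept.append(tag)
    return kept

print(scrub_tags(["politics", "spamword here", "biography"]))
```

Even a filter like this only catches exact word matches, which is why automated means still "leave a lot of tags out there."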

Of course, just because someone may be a teacher or faculty member does not make them immune to bias and subjectivity. How do we explain to users that "our tags" (i.e. traditionally assigned subject headings) are *not* biased, while theirs are? Or that their tags may be, depending on how we feel that day? More importantly, how do we get users to accept our pronouncements without making a huge fuss? As I see it, if we manage user tags, we will be opening a huge political can of worms.

Web services may provide some solutions, but they offer just as many pitfalls. We can import tags and reviews from other sites, but this means that we would still have to decide whether and how to manage them, and (at least I hope!) others will be able to take our records and do what they will with them, including changing them, or displaying them with the tags and keywords *they* want. For example, I have no doubt that there are some people who would love to take our records with the author "William Shakespeare" and change the heading to the person they believe was the "real" author of those works, e.g. Edward de Vere, Queen Elizabeth, Francis Bacon, or whoever is their favorite.

What I am trying to point out is that the world of cataloging is changing because our society is changing and there is nothing we can do about it. I don't like a lot of these changes in many ways, but they are being forced upon us and we must deal with problems and possibilities that our predecessors never had to face. We can decide to ignore it all, to keep "control" of everything, and maintain that ours is "better" than everyone else's, but that seems to me to take us down the path of eventual extinction. As Darwin said, "It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."

As an aside, I also find it very interesting that there is controversy among catalogers over adding user tags and other types of user input (which the public has demonstrated that it wants), but there is almost no controversy over the switch to FRBR structures, which have a much greater impact on cataloging and the catalogs, even though it has not been demonstrated that our public wants them at all.

Monday, November 23, 2009

FW: User tags (was Sarah Palin)

On Fri, 20 Nov 2009 14:08:15 -0500, Henriksen, Phalbe wrote:

>Is the library's catalogue going to degenerate into a free-for-all
>political forum??? Or are we going to limit tagging to patrons who have
>active library cards, or write policies so that we have to go through
>the potential tags every morning and cancel the ones that don't meet our
>policy? Are we going to add to our technical services staff to do that?

While I certainly sympathize with this view, I think that this may be the price to the library if we are to enter the Web2.0 world: we lose control of a lot of tasks where we were the absolute masters previously. So, we either allow tagging and enter the Web2.0 world, or we do not enter that world, or we try to manage the input from that world to retain as much as we had, which demands more resources from an ever-decreasing staff.

Another possibility is to use an API to import tags and reviews on the fly from other online tools, which is possible to do with Amazon.com right now. Conventional thinking is that this improves your catalog, but then you have all of those tags and reviews from amazon.com... (!!)
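A rough sketch of what such on-the-fly importing might look like. The fetch function below is a stub standing in for a real web-service call; the endpoint, response shape, and all names are assumptions for illustration, not any actual API:

```python
def fetch_external_tags(isbn):
    # Stub: a real implementation would call the external service's API
    # and parse its response. Data here is invented for the example.
    return {"0061939897": ["memoir", "politics"]}.get(isbn, [])

def merge_tags(record, isbn):
    """Combine locally assigned tags with externally imported ones."""
    external = fetch_external_tags(isbn)
    merged = dict(record)
    merged["tags"] = sorted(set(record.get("tags", [])) | set(external))
    merged["tag_source"] = "local+external"  # keep provenance visible
    return merged

record = {"title": "Going Rogue", "tags": ["biography"]}
print(merge_tags(record, "0061939897")["tags"])
```

The provenance field is the point: once outside tags flow in, the catalog needs some way to say which tags are whose.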

Still another possibility is that libraries actually decide to share our metadata all over the web in all kinds of ways (as I have suggested) and our records can then be seen on webpages everywhere. Of course, this means that we lose control over how our records are displayed, the context, and so on, and our records can still display alongside those tags.

I don't know what the solution is. Maybe it will be an opportunity for some enterprising person out there to create a new NetNanny, "CatNanny," that will "scrub out" those dirty tags for the catalogers. Or everybody will just get used to seeing bizarre results, just as we do in Google and Yahoo. I confess that sometimes a Google search result strikes me so much that suddenly it seems I have been transported from the year 1990 or so, and I *can't believe* I'm seeing all of these things that would have shocked me deeply back then. The world has changed a lot in not *all that much* time, and it seems to be changing faster and faster.

Still, I do think that it is absolutely imperative for libraries to enter the Web2.0 world, in spite of us losing control. If we don't, I fear we will just be marginalizing ourselves too much.

RE: Will books survive? A scorecard…

Comment to post: Will books survive? A scorecard…

It has seemed to me that the comparison of *a* printed book to *a* digital book is rather missing the point. More apt is to compare *a* digital device to *a* library. In this library are texts, videos, newspapers, magazines, and all kinds of weird things that ended up in the library for all kinds of reasons.

So, when we say that there are a lot of distractions on the web, this is entirely true, but there can also be lots of distractions in a library, with people walking around and talking, different magazines vying for your attention, public lectures scheduled, people doing strange things in the stacks and bathrooms, and so on, all of this going on while you are trying to concentrate on your book.

And, while an ebook device may cost a few hundred dollars, this must be compared to buying an entire library of books. Thanks to all of the digitization projects, many of which allow you to download for free, a single ebook reader now can represent quite literally a million-book library, with some of the finest works ever produced (although not the most recent).

Given all of this, it seems as if a digital book reader would be a great value since it could give me zillions of the best books of all time for free immediately. My hope is that people may actually read some of these older books that are free, compare them to some of the pulp published today, and question which is more valuable. There were a lot of romance novels published earlier that are now in the public domain. Maybe the fact that they are free will make them more interesting to the public again.

Friday, November 20, 2009

FW: Sarah Palin

This is one of the most entertaining threads I have read on Autocat! I would just like to point out that *when* (not if) everything becomes digital, any book will be able to have multiple class numbers.

Naturally, in Web2.0 catalogs, our users will be able to add their tags, arrange things and so on, and in these sorts of areas of political and moral divide, it may be educational to see how the general public tags these materials. I looked it up on LibraryThing and found the tags rather tame so far:
http://www.librarything.com/work/9026270

but the conversations have a bit more:
http://www.librarything.com/topic/76541

The tags at amazon.com are more partisan, some of which I would not want to repeat on this list:
http://www.amazon.com/gp/product/tags-on-product/0061939897/ref=tag_dpp_cust_edpp_sa

At amazonUK, there is much less:
http://www.amazon.co.uk/gp/product/tags-on-product/0061939897/ref=tag_dpp_cust_edpp_sa

At amazonCanada, there is some more partisanship,
http://www.amazon.ca/Going-Rogue-American-Sarah-Palin/dp/0061939897/ref=sr_1_1?ie=UTF8&s=books&qid=1258704033&sr=8-1

but here is a cataloging problem: most user tags are associated with differing versions that haven't been released yet, e.g. the CD:
http://www.amazon.ca/Going-Rogue-Cd-American-Life/dp/0061990736/ref=sr_1_2?ie=UTF8&s=books&qid=1258704033&sr=8-2

and the large print:
http://www.amazon.ca/Going-Rogue-Lp-American-Life/dp/0061979554/ref=sr_1_3?ie=UTF8&s=books&qid=1258704033&sr=8-3

The reason I bring this up is because I feel that, whatever catalogs we come up with, we are witnessing a new function of the catalog. Catalogers will probably be the ones to manage this somehow, both from the point of view of relevance and civility, and also to make sure that tags permeate to all the different versions, all the while balancing this against the concerns of users and free speech.

I have no idea how to solve any of this, just pointing out some considerations as we march into a new world of information.

Thursday, November 19, 2009

[NGC4LIB] FRBR WEMI and identifiers

Ross,

I really appreciate the in-depth answer you provided, but I still have some problems.

First, your example of the SKOS:
owl:sameAs <info:lc/authorities/sh2009120881> ;
skos:inScheme <http://id.loc.gov/authorities#conceptScheme>,
<http://id.loc.gov/authorities#topicalTerms> ;
skos:prefLabel "Communication--Political aspects--United States"@en;
lcsh:coordinates
<http://id.loc.gov/authorities/sh85029027#concept>,
<http://id.loc.gov/authorities/sh00005651#concept>,
<http://purl.org/NET/marccodes/gacs/n-us#location> .

is fine and I believe does exactly what I have been saying that we need. But as you say, we must imagine this sometime in the future since it doesn't work now (not only because the term United States is not yet available, but because the system is not set up that way); i.e., there is currently no link from
http://id.loc.gov/authorities/sh2009120881

to either:

<http://id.loc.gov/authorities/sh85029027#concept>,
<http://id.loc.gov/authorities/sh00005651#concept>,

The reason this does not work currently is because everything is still based on how people browsed a card or printed catalog. It all made perfect sense before, but fell apart with keyword searching. I think I need to stop and explain this because it may be becoming "lost information." For those who know this already, I apologize in advance.

If someone wanted to find books on the politics of communications in the U.S., they would open the "C" catalog drawer (not "P" and not "U") and begin going through the cards until they came to "Communication," which--in theory--would be a raised card with the information now available at http://id.loc.gov/authorities/sh85029027#concept printed on it. They would read and learn whatever was on this raised card, then they would continue to browse (for quite a while sometimes) until they ran across the subdivision "Political aspects" and continued to "United States."

In reality, it never worked that well, because librarians were afraid that the catalog would get too big, so they placed very few guide cards into the catalog, and as a result, almost all of the cross-references were found only in the red books. The red books were therefore vital for the searcher to get all of these cross-references, but relatively few people actually used them. (I confess I did not understand their importance until library school, and I know I am not alone! BTW, a discussion about the red books is going on right now on Autocat.) People, including me, nevertheless muddled through somehow.

This system worked even worse when computers arrived with keyword searching, since people ceased browsing the headings as they were supposed to. With keyword searching, they would jump right into a record placed in the *middle of the file,* then see the subjects and choose "Communication--Political aspects--United States." When they clicked on this link (if the system allowed it), they would be thrown into the *middle* of the old card-catalog browse list and not at the beginning, as it was designed to work. This is how the LC catalog works right now. But the searcher still needs the information found under "Communication," plus lots more along the way, and now the only way to get this information is to browse up and up to the top, often after many, many screens. Of course, nobody does this except weirdos like me who understand how it is *supposed* to work. But it's still a pain to do, and there must be something better.

Therefore, the link from "Communication--Political aspects--United States" to "Communication" is absolutely critical if the headings are to be useful, since the traditional method of browsing does not work anymore, and hasn't for a long time.

Therefore, while the structure you point out may work in the future, it doesn't appear to work right now, and we are forced to imagine. The trouble with imagining is that I and lots of other people can imagine a lot, and once people begin imagining, they can imagine how much more they could and should get, instead of only the internal relationships of "Communication," "Political aspects," and "United States." I think something like http://dbpedia.org/page/Category:Communication would be found pretty useful by lots of people out there. Also, I would like some level of real-world searches to be involved. My example has always been the real-world keyword search for someone who is interested in battles of WWII: "wwii battles", which should retrieve the cross-references:
See: World War, 1939-1945--Aerial operations.
See: World War, 1939-1945--Campaigns.
See: World War, 1939-1945--Naval operations.

which appears now only if you search: "World War, 1939-1945 battles" which nobody would ever do. With a structure as you lay out above and what I think is necessary, it is at least possible because there is a reference for "wwii" in http://id.loc.gov/authorities/sh85148273 which appears nowhere else. This structure reflects how the card catalog functioned. I have written some more on this in one of my "Open replies" to Thomas Mann, where I discuss some of the problems of subjects, at: http://eprints.rclis.org/13059/1/OntheRecordOpenReply.pdf
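A sketch of what keyword-triggered cross-references could look like: map variant entry terms (like "wwii") to the authorized heading, then surface its see-references. The tiny authority file and the word-level matching below are illustrative assumptions, not real LCSH data or a real implementation:

```python
# Toy authority fragment: one heading, its variant entry terms,
# and the see-references that should surface when it is matched.
AUTHORITY = {
    "World War, 1939-1945": {
        "variants": {"wwii", "ww2"},
        "see": [
            "World War, 1939-1945--Aerial operations",
            "World War, 1939-1945--Campaigns",
            "World War, 1939-1945--Naval operations",
        ],
    },
}

def cross_references(query):
    """Return see-references for any heading whose variant matches a query word."""
    terms = set(query.lower().split())
    refs = []
    for heading, entry in AUTHORITY.items():
        if entry["variants"] & terms:  # naive word-level match
            refs.extend(entry["see"])
    return refs

print(cross_references("wwii battles"))
```

The point is that the "wwii" reference buried in the authority record does the work, so the searcher never has to type "World War, 1939-1945 battles."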

<snip>
Your basis for this thread was to mitigate the effort and expense of our current cataloging process by ignoring RDA and FRBR and, instead, tweaking AACR2. But then you ask if we should drop LCSH for dbpedia. These seem completely disjoint. How would we begin to justify the retrospective conversion?
</snip>

I do not want that at all. We should be working hard to make LCSH actually useful for the public, who now approach information retrieval in ways completely different from before (primarily using keyword searching, which, as I tried to show, makes the LCSH browses more or less incoherent). But even more importantly, we must create something that is genuinely useful to our users, and this means *not* merely recreating the functionality of the card catalog; we should try to recreate its power--because there was a power there that is not replicated in our library catalogs (as I have tried to demonstrate) and certainly not in Google and the like. This also shouldn't take 10 years to do.

If it turns out that all we can do is recreate the traditional browses used in the card catalog, I am afraid it may not be worthwhile.

Wednesday, November 18, 2009

[NGC4LIB] FRBR WEMI and identifiers

Ross Singer wrote:

On Tue, Nov 17, 2009 at 4:30 AM, Weinheimer Jim wrote:

> See an earlier message where I mention that the pre-coordinated LCSH strings don't work at all well in an online environment. This is no surprise since they were designed for a completely different technology. I think you can have them work well, but in order for this to happen today, the "strings" must become more flexible, but as I discovered, RDF does not allow the kind of flexibility to do what I mention in this message. Therefore, we either give up on LCSH or seek new solutions.
> https://listserv.nd.edu/cgi-bin/wa?A2=NGC4LIB;QfWMyg;20091104085554%2B0100
<snip>
So are you purposefully ignoring the half dozen-plus counter arguments to your erroneous claim?
</snip>

Not at all, but please point out to me how I can take "Communication--Political aspects--United States" using the power of the semantics contained within the 650 field:
650 \0$aCommunication$xPolitical aspects$zUnited States (topical subject - topical subdivision - geographical subdivision)
to get a more flexible display of the type I pointed out:
topical subdivision - geographical subdivision - topical subject
Political aspects United States Communication,
using the RDF from
http://id.loc.gov/authorities/sh2009120881#concept
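The kind of flexibility argued for here is easy to sketch if the subfield semantics are preserved: keep $a (topic), $x (topical subdivision), and $z (geographic subdivision) as labeled parts, and any display order becomes possible. The parsing below is deliberately simplistic and the function names are my own, a sketch of the idea rather than real MARC processing:

```python
def parse_650(field):
    """Split a '$a...$x...$z...' string into parts labeled by subfield code."""
    parts = {}
    for chunk in field.split("$")[1:]:
        code, value = chunk[0], chunk[1:]
        parts.setdefault(code, []).append(value)
    return parts

def display(parts, order):
    """Rebuild a display string in any subfield order, e.g. 'xza'."""
    return " -- ".join(v for code in order for v in parts.get(code, []))

parts = parse_650("$aCommunication$xPolitical aspects$zUnited States")
print(display(parts, "axz"))  # traditional topic-first order
print(display(parts, "xza"))  # subdivision-first order
```

Once the parts carry their semantics, "Political aspects -- United States -- Communication" is just another ordering, which is precisely what a flat pre-coordinated string cannot give.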

It was my understanding from everything in this thread that this cannot be done using RDF, and then Karen pointed out that the problem is with SKOS. That's fine, but the final result is the same: what we have in id.loc.gov is totally inflexible. That's why I said that other technologies may be needed to do these sorts of things. I understand very clearly that the RDF files from the id.loc.gov site are supposed to be there only for referencing, but I also tried to point out that this is not nearly enough to get a web designer to use it in reality. The web designer must see added value, and on the web, this means links above all else. Can you say why somebody would use id.loc.gov and not dbpedia? And even if we placed our links into the relevant URIs in dbpedia, there would still be no added value, since there would still be no links.

As I stated before, I am not an expert in these matters, but I still don't see how you can get flexible displays from the RDF that I see. If there is a transformation involved, e.g. breaking everything at the hyphens, there are still no semantics of the subdivisions, which are critical.

If I am wrong, please enlighten me, I would love to be wrong on this and learn that perhaps the LCSH may be of real use to everyone, but please focus on what is possible today, now, with the tools and data we have at hand, not what might be done after 10 years and the willing cooperation of half of the people on the internet, because there may be many other solutions available in 10 years, and I don't know how much cooperation we'll get. People have been waiting for a long time already to see something cool emerge from libraries and almost nothing has happened.

I think we can all agree that LCSH as subject strings are not useful for the general public today. My own opinion, although I may be wrong, is that LCSH should be useful, and potentially can be useful, so long as--as I wrote--we link what can be linked, and what I see at id.loc.gov does not allow that. Still, above all else LCSH needs to be flexible; otherwise, we are stuck with pre-20th-century browse displays from a card or printed catalog. Does id.loc.gov allow these possibilities? Or do we need a different technology?

Jim Weinheimer

Tuesday, November 17, 2009

HathiTrust batch load records in the catalog

I handled this in two ways:
1) For catalog records, I concentrated on the open-access books published by the University of Michigan Press. I did it as quick and dirty as I could by hand, using MarcEdit to get the records by ISBN (most in LC). Then I did some minor batch processing (245$h), added the links one-by-one(!!), and added a 710 Hathitrust Digital Library.

It was only at the end that I noticed there is a URI, as opposed to a URL, e.g. http://hdl.handle.net/2027/mdp.39015062870780 instead of http://babel.hathitrust.org/cgi/pt?id=mdp.39015062870780. Apparently simple enough to fix, but I just haven't gotten around to it yet.
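The fix described here, swapping the babel URL for the persistent handle, can be done in batch by extracting the volume id from the URL's query string. A minimal sketch, assuming links of exactly the shape shown above; a real batch job over MARC records would want sturdier parsing and error handling:

```python
from urllib.parse import urlparse, parse_qs

def to_handle_uri(babel_url):
    """Derive the hdl.handle.net URI from a babel.hathitrust.org URL."""
    query = parse_qs(urlparse(babel_url).query)
    volume_id = query["id"][0]  # e.g. "mdp.39015062870780"
    return "http://hdl.handle.net/2027/" + volume_id

print(to_handle_uri(
    "http://babel.hathitrust.org/cgi/pt?id=mdp.39015062870780"))
```

Run over the 856 links, this would turn the one-by-one fix into a single pass.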

It would be really, really nice if somebody did all of this and put it in a file for all of us to share. I would, but my catalog is Koha 2.2.7 and the fixed fields don't come out quite right, so there would still be a lot of work for everyone.

2) My other attempt has been to use the "Extend Search" function of my catalog to automatically search the HathiTrust database, e.g. see the extend search for "Historiography" at: http://www.galileo.aur.it/opac-tmpl/npl/en/extsearch/extsearchebooks.php?q=Historiography and select "HathiTrust." At this point, you can click on "Full-View Only" and get the result.

With limited resources, I find it very difficult to do much more.

James L. Weinheimer j.weinheimer@aur.edu
Director of Library and Information Services
The American University of Rome
Rome, Italy

Monday, November 16, 2009

FW: [NGC4LIB] FRBR WEMI and identifiers

Alexander Johannesen wrote:

<snip>
Sure. My own two Bob's worth is that it is too little, too late, and
also that LCSH in all its glory is a complicated scheme that it
would take mostly librarians to love and use, and hence others out
there couldn't care too much about it. (You spoke of the NRK use case,
of which I have insider information that suggests that LCSH isn't
suitable, mostly by virtue of being poorly and too slowly maintained
as a Norwegian subset.)
</snip>

I think this is the main question, and I think it should be posed not in *theoretical* terms but in *practical* terms. Perhaps some of these linked data projects may work more or less in ten years or so, and perhaps not. While I did not understand the limitations inherent in RDF and SKOS, I do understand very clearly the idea of linked data and how powerful it could become someday; I've written about it myself. But we are working in very *practical times* today, and it seems these solutions will take several years to work out (at least), longer to actually implement, and all this in a time of decreasing budgets and a growing skepticism from the public (who provide our funding) about the usefulness of library records in general. I have noticed that measuring time on the web is different, and seems to me rather similar to the so-called "dog years," where one human year equals 7 years for a dog (the last I read, anyway). I don't know what the relationship is between "library time" and "world wide web time," but it must be somewhat similar. For example, our undergraduates today cannot imagine a world without Google (which they much prefer to our tools), and they find library tools rather strange. In just a few years, they will be the graduate students and the taxpayers, and they will be further away from library records than ever. With each passing year, it will be increasingly difficult to win them back--win them back to our *tools,* not necessarily to our *collections,* although that may prove difficult as well if the Google Book agreement is implemented and we see how it is used.

So, as I look at all of these projects as a web designer, I look at them with an idea of usefulness to those who use the tools I make. It is an axiom of information architecture that a page grows in importance with the number of links it has both to it and from it (this is a measure of how well it is incorporated into the WWW). As a designer, I will link to pages that contain useful links. Linking to a dead page (no links) will serve no purpose and only anger my constituents. This is why I can't imagine anyone linking to id.loc.gov and why they might to dbpedia: one is value-added and the other has none at all. We have to give the web designers out there a genuine reason to put in a link to our tools, otherwise, they won't do it.

But, I believe that LCSH can potentially provide users much more useful browsing, using the great syndetic structure, so long as people do *not* have to navigate it as they did in the card catalog, where it worked a lot better. I think the see alsos are great, so that when I think I want "Authority" I find the exceedingly helpful:

Narrower Term: Divine right of kings.
Narrower Term: Example.
Narrower Term: General will.
Narrower Term: Power (Philosophy)
See Also: Authoritarianism
See Also: Consensus (Social sciences)

and use the links to discover that I really want: Legitimacy of governments. This should be incredibly simple to do today, and I think that if it were, and with appropriate useful links incorporated, people would really like it.
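The navigation described here, from "Authority" through the see-also and narrower-term links to "Legitimacy of governments," is just a walk over a small graph. A sketch of that idea, with a toy fragment of invented link data rather than real LCSH:

```python
from collections import deque

# Toy syndetic fragment: each heading maps to its NT / See Also links.
GRAPH = {
    "Authority": ["Power (Philosophy)", "Authoritarianism",
                  "Consensus (Social sciences)"],
    "Authoritarianism": ["Legitimacy of governments"],
    "Consensus (Social sciences)": [],
    "Power (Philosophy)": [],
}

def reachable(start, depth=2):
    """Headings reachable from `start` within `depth` link hops."""
    seen, queue = set(), deque([(start, 0)])
    while queue:
        heading, d = queue.popleft()
        if heading in seen or d > depth:
            continue
        seen.add(heading)
        for nxt in GRAPH.get(heading, []):
            queue.append((nxt, d + 1))
    seen.discard(start)  # report only the discovered headings
    return sorted(seen)

print(reachable("Authority"))
```

With the full syndetic structure behind an interface like this, a user starting at "Authority" would see "Legitimacy of governments" two clicks away, no left-anchored browsing required.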

But perhaps this is all a matter of doing something like Jane said and getting into cooperative projects such as HIVE. [Hi Jane! And thanks for the link!] This could be one of the areas that I and others have mentioned about "losing control," where catalogers and librarians become merely one among equals. For example, we could add links to the id.loc.gov URIs in dbpedia. I don't know what good that would do, but we could. Nevertheless, I feel there is a huge role for us to play, both in the creation and management of metadata and in other areas of the web, to lend a layer of some type of "authority," but it will probably be with tools created by others, since what we have made just doesn't seem to cut it.

Friday, November 13, 2009

RE: [NGC4LIB] FRBR WEMI and identifiers

Ross Singer wrote:
<snip>
Rather than being an issue of credibility, I would say the biggest
reason that id.loc.gov is getting relatively little use is because the
communities that it's designed for aren't using it: libraries.

LCSH authorities aren't terribly interesting to non-library
communities by themselves. They have simpler or more appropriate
domain-specific thesauri to describe their data. What is interesting
to non-library consumers, however, are the resources we've described
with these subjects. Then when these subjects are related to their
subjects, we have a rosetta stone of sorts.
</snip>

I had written a rather long reply to an earlier message of yours, but this is a better place. Essentially, I wrote that I was surprised to discover that you were absolutely right: RDF does not have the flexibility I think is absolutely essential--to break the headings into individual parts and, in this way, to allow all kinds of new and even exciting possibilities. Otherwise, we remain stuck with the same old textual strings that gave our patrons such trouble in the past, now simply "rewritten" in RDF. By this I mean there can be no links from Italy, Northern--Civilization to Italy, Northern or vice versa, and the only way of bringing them together in a coherent fashion is to throw the user into a left-anchored browse display, which of course is exactly the same functionality as the card catalog (and this without the much superior displays offered by the red books). As a result, it seems as if we are in a less flexible situation than perhaps ever before. I don't understand why RDF cannot do this, but it really doesn't matter.
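The missing links lamented here are mechanically recoverable, since a subdivided string like "Italy, Northern--Civilization" implicitly contains its broader heading. A sketch of generating that link chain by truncating at "--"; this is my own illustration of the idea, not anything id.loc.gov actually provides:

```python
def broader_chain(heading):
    """List each successively broader heading implied by a subdivided string."""
    parts = heading.split("--")
    # Drop the last subdivision, then the next, down to the base heading.
    return ["--".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]

print(broader_chain("Communication--Political aspects--United States"))
print(broader_chain("Italy, Northern--Civilization"))
```

Generating these chains is trivial; the frustration is that the RDF as published gives a consumer no way to express or follow them.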

While this may have been well known and obvious to many on this list, I am not an expert in everything, so this realization has come as rather a shock to me. I find it exceedingly tragic for the cataloging world, and the library world by extension, but I confess it does appeal to my rather twisted sense of humor--it is highly ironic that we are still stuck with the 19th-century world of textual strings (that is, we lack the correct tool: although we need a powerful electric screwdriver with multiple bits, all we have in our toolbox is a hammer). But it's also tragic, since I don't see any solution.

To return to the point of this message, I think that the reason id.loc.gov isn't used is because it is providing people with something they neither need nor want: a tool that is already passé (textual strings with obsolete navigation). No one will use it, because it is a tool designed for another era, a time that has been gone for at least 15 years (which is a *long* time in modern information terms). Why would even a library use it, since the public doesn't like the traditional functionality in the OPACs today? I can't imagine any web designer using it except on some sort of theoretical project, since dbpedia and other tools offer exactly the same functionality plus a lot more (and the designers can even add subjects themselves).

Please correct me if I am wrong, but if all we can do is provide 19th-century browse displays of our headings--and headings are the vast majority of the control we exert over the records we create and the materials in our collections--we can do that right now, and it is met with incomprehension and indifference on the part of the public, with the result that they ignore our tools whenever they can. I don't see that changing now.

Is there a way out of this? Or should we start working a lot more with dbpedia?

Thursday, November 12, 2009

RE: [NGC4LIB] FRBR WEMI and identifiers

Karen Coyle wrote:
<snip>
Jim, it's hard to know what you are suggesting. LCSH is "out there" (and if you count the months of the lcsh.info effort, has been for well over a year) in a linked data format, but it appears that it hasn't found its users. MARC in XML has been available for quite a while, but we aren't seeing uptake for using the library data more broadly. I suspect that there's an underlying problem that isn't related to the format of the data, but the content.
</snip>

Karen,
I guess I don't understand what you are saying either. As my previous post tried to demonstrate, the id.loc.gov does *not provide* linked data. All it does is make a single URI into a completely closed system, e.g.: http://id.loc.gov/authorities/sh85120839 (Shakespeare, William, 1564-1616--Authorship--Oxford theory) provides no links to anything at all, not even links where it could provide them automatically, to the main Shakespeare heading and into the question of his authorship, never mind something that someone would actually want, such as into Worldcat and other places. Compare this to http://dbpedia.org/page/Oxfordian_theory, which provides a huge number of links to interesting resources along with links into related information in dbpedia. Which is more useful? If you were a web designer, which one would you want to work with?

While I applaud the id.loc.gov effort, it seems to me little better than "putting up a table in a pdf file," as TBL put it. As he said, at least put the table in a CSV format if you can't do anything better. The id.loc.gov is potentially useful, but only potentially. One example of a basic improvement is Bernhard's LCSH Browser:
http://www.biblio.tu-bs.de/db/lcsh/page.php?urG=LCS&urA=18&urS=_shakespeare,+william,+1564-1616+--+authorship+--+collaboration, where you can click on the Oxford theory and then search different databases. Sorry for the self-promotion (but I am shameless in these things!): you can click on AUR Library, where you will find nothing, but you can extend the search into multiple databases, finding materials in different places. There is a very interesting result in Google Books based on the LCSH jargon, where the searcher must know enough to delete Shakespeare's dates and the word "Authorship" from the heading. But you will be able to find many things this way. In the Internet Archive, you also need to delete "theory" to retrieve items, and although I have tried to make it as easy as possible to search many other databases and projects, that still doesn't mean it's easy. Yet, is this useful to my patrons? They think so, and I think so, too. Can it be improved 10,000%? Of course.
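The deletions described here, stripping dates and jargon-heavy words before searching an outside database, could themselves be automated. A sketch of the idea; the drop-word list is an assumed per-target tweak, not part of any real tool:

```python
import re

# Hypothetical words to strip for a given target database.
DROP_WORDS = {"authorship", "theory"}

def heading_to_keywords(heading):
    """Turn an LCSH string into a plainer keyword query."""
    text = heading.replace("--", " ")
    text = re.sub(r"\b\d{4}-\d{4}\b", "", text)  # drop life dates
    words = [w.strip(",.") for w in text.split()]
    return " ".join(w for w in words if w.lower() not in DROP_WORDS)

print(heading_to_keywords(
    "Shakespeare, William, 1564-1616--Authorship--Oxford theory"))
```

Doing this behind the scenes would spare the searcher from having to "know enough" about LCSH conventions to edit the heading by hand.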

I think we need to concentrate on making something that is practical and useful, and that exploits to the full all the possible uses of the information we have at our fingertips *right now.* We also need to get away from the idea that we must be in full control of what happens in the catalog, which is simply unrealistic now, and instead provide unique "value-added" options for our users. The information in catalogs and their associated records is truly amazing, but it should all be seen as only a foundation on which we and the general public can construct each community's respective buildings and facades. What those structures will look like, we cannot imagine at this point in time. In any case, we have to provide the building blocks first. Projects such as dbpedia are doing that, but the library community lags behind.

RE: [NGC4LIB] The Dewey Dilemma

Shawne Miksa wrote:
<snip>
Once again, let me play my broken record---there is demonstrated by many professional librarians a sad lack of understanding of the purpose of knowledge classification systems--how to build/construct numbers for representing the intellectual content of a resource in order to show how that content is related to other resources in the collection (i.e., fits into the collection). How would the user know how to read the numbers if the librarian doesn't even understand it?

DDC, LCC, etc., are knowledge classification systems first, not physical arrangement devices. Physical arrangement is a by-product; using the class #s for shelf-arrangement is optional. I only hope these libraries that have switched to something like BISAC (sp?) haven't stripped the DDC numbers completely from the record.
</snip>

As a former dilettante-theorist of the history of classification, one thing I took away from my studies is that any classification is only a mirror of the mind of the person who made the classification. What I mean by this can be seen just by perusing the outlines of any classification scheme: with some consideration, you can know if the classifier believes in God or not, what they think about moral or political questions, when the classification was created, and so on. Sometimes, it is absolutely obvious, such as some medieval classifications that began and ended with God ("I am the alpha and the omega") or in our own LCC, which classes Communism after criminal organizations such as the Mafia. Also in LCC, we see the importance they placed on philosophy in the 19th century, e.g. an entire subclass BH devoted to Aesthetics. The user of the classification system is also locked in time, e.g. Psychology, which is seen as more of a science now and would probably be more useful for browsing purposes in R, is still in B. Examples can go on and on, especially when comparing different systems. That's why I am not surprised that the Open Shelves Classification hasn't been much of a success. Too many cooks spoil the broth.

It wasn't until later, when some of the bigger book collections began to be built, that what we know as library classifications emerged. Of course, librarians almost always placed related books together, but a classification number normally referred to an alcove with a big letter over it, e.g. "Z," which stood for Religion, where you would go and browse the limited number of shelves of books, and where everything got a shelf or press number. Here are some examples at Princeton: http://infoshare1.princeton.edu/rbsc2/libraryhistory/shlfmks/shelfmarks.html

I have gone back and forth about the utility of retaining the classification *number* in an online environment, and I have argued strenuously against it in the past, but I think I have changed (for the moment!) and now see its use more as a handy ordering device behind the scenes for the associated text that describes the subject (the label). For example, with "Abnaki Indians" and E99.A13, the number would be used primarily for arranging the different headings for browsing, which gets away from alphabetical arrangement and the handmade BT/RT/NT structure, and the number may not even have to display.
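As a sketch of how a class number can serve as a behind-the-scenes ordering device, here is a crude sort key for simple LCC numbers. The "Abnaki Indians"/E99.A13 pairing is from above; the comparison headings are hypothetical, and real call numbers have many complications (dates, multiple Cutters, decimal extensions) that this ignores.

```python
import re

def lcc_sort_key(class_number):
    """Crude key for a simple LCC number like 'E99.A13'.
    The Cutter portion is read as a decimal, so .A13 sorts before .A2."""
    m = re.match(r"([A-Z]+)(\d+)(?:\.([A-Z])(\d+))?$", class_number)
    letters, num, cutter_letter, cutter_num = m.groups()
    return (letters,
            int(num),
            cutter_letter or "",
            float("0." + cutter_num) if cutter_num else 0.0)

# Headings browse in classified order, not alphabetical order,
# without the number ever having to display.
headings = [("E99.B2", "hypothetical heading B"),
            ("E99.A13", "Abnaki Indians"),
            ("E99.A2", "hypothetical heading A")]
ordered = sorted(headings, key=lambda h: lcc_sort_key(h[0]))
```

The point of the decimal Cutter comparison is exactly the kind of ordering knowledge that alphabetical sorting of labels throws away.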

But I don't know.

Tuesday, November 10, 2009

FW: [NGC4LIB] FRBR WEMI and identifiers

Ross Singer wrote:

<snip>
It is my recollection that pretty much every cataloging client works by a person typing into a text box next to a display that reads something like "100 ‡a". In this box they enter all of the ISBD punctuation, by hand, to signify the specifics of the data.

Now, is this more or less correct?
</snip>

To a certain extent. One of the points I try to make is that when examining a bibliographic record, what appears to be difficult to a non-expert is actually rather simple, and what seems to be simple is actually quite difficult. Let me give an example. Let's say that I have a title of a resource and I want to add that information to the record. (This also goes for any other part of a bibliographic "record"). Here is a record taken at random from LC: http://lccn.loc.gov/2008022443
The title is displayed in the OPAC as:
Main Title: Marxism, fascism, and totalitarianism : chapters in the intellectual history of radicalism / A. James Gregor.

The MARC coding in the cataloger display is:
245 10 |a Marxism, fascism, and totalitarianism : |b chapters in the intellectual history of radicalism / |c A. James Gregor.
(The underlying ISO2709 format is too terrifying to behold!)

What appears to be difficult here is the coding: all of those numbers and those strange subfields. But this is a misperception, just as someone who looks at, e.g., a text in Russian believes that the hard part is the alphabet, while anybody who tries to learn Russian quickly understands that the alphabet is the easy part. Once you learn that, *then* it becomes difficult.
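The point can be seen in code: peeling the 245 field above into its subfields is trivial. This is a minimal sketch assuming the "|"-delimited cataloger display shown above (real MARC uses a control character as the subfield delimiter). What no short program can do is apply the rules that decided what those values should contain in the first place.

```python
def parse_subfields(field_body):
    """Split a cataloger-display field body like
    '|a Title : |b subtitle / |c statement.' into (code, value) pairs."""
    parts = [p for p in field_body.split("|") if p.strip()]
    return [(p[0], p[1:].strip()) for p in parts]

title = parse_subfields(
    "|a Marxism, fascism, and totalitarianism : "
    "|b chapters in the intellectual history of radicalism / "
    "|c A. James Gregor."
)
# Note that the ISBD punctuation (' : ', ' / ') lives inside the
# values themselves, entered by hand, which was Ross's point.
```

Ten lines dispose of the "hard-looking" part; the thousands of pages of ISBD, AACR2, and LCRI guidance govern the part the program cannot see.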

So, the rules for determining what the title(s) are and how to input them in a standardized manner are extensive. Is it a text? A serial? A movie? A graphic? Where do you take the title from in each case? ISBD devotes many pages to it, the rules in AACR2 go on and on, and from there the LC Rule Interpretations continue. To see some of the complexity, look at: http://sites.google.com/site/opencatalogingrules/aacr2-chapter-1/1-0--decisions-before-cataloging---rev/1-0e--language-and-script-of-the-description---rev for the LCRIs and http://www.ifla.org/files/cataloguing/isbd/isbd-cons_2007-en.pdf#page=37 for the current ISBD rules. To a non-expert, all of this may seem so much superfluous fluff whose only purpose is to provide some dubious employment, but in actuality these rules exist to provide a level of standardization *in the bibliographic data itself,* not in the coding.

[Since I am an inveterate historian, I can't resist an example of how this same thing was done in the past. In the Princeton catalog of 1760, the librarian, Samuel Davies, cataloged one book as "Baxter's Call," found under the "B" Octavos section of that catalog. See it at http://infoshare1.princeton.edu/rbsc2/libraryhistory/1760_Davies.pdf (p. 9, pdf p. 17). The title is actually "A call to the unconverted to turn and live, and accept of mercy while mercy may be had : as they ever would find mercy in the day of extremity from the living God / by his unworthy servant, Richard Baxter." See the record for a variant edition at: http://lccn.loc.gov/00511076]

As a result, a modern bibliographic record is a highly-standardized creation conforming to many rules and norms (or at least, it should be), and something like
100 1_ |a Twain, Mark, |d 1835-1910
becomes far more complex than the non-librarian suspects.

<snip>
If not, ignore the rest of what I'm about to say.

If so, it seems like this is exactly what we need to be moving away from. If ISBD/AACR2 rules were as simple as confining these things to a single subfield, well then the machine could probably figure things out enough to keep the status quo for data entry. But that's not the case. This says to me that some retraining would need to be done just to overcome the roadblocks that current practice throws into the path.

Now, maybe there's a happy medium. I don't know. Really, in the end, the interface should work this out anyway (IMO).
</snip>

While I basically agree, I think this lets me point out another difference between the practicing cataloger and the systems person. "Data entry" is not equivalent to cataloging something according to agreed-upon standards. Yes, there are some similarities between the two (someone must physically enter data into the correct fields), but it is not the same thing. Cataloging consists of 1) creating standardized descriptions of different resources, so that there is a certain amount of guaranteed understanding among those who examine a description, i.e. the title is entered in a single way and taken from the same source, and the publication information and extent are recorded the same way, which serves many purposes ranging from record sharing to inter-library loan; and 2) organizing the description(s) made in step 1 so that they can be found in multiple ways by multiple people.

Tremendous efficiencies can be realized today in step 1, if everyone concerned (a huge number of completely disparate bibliographic/webliographic communities) will agree to cooperate to standardize their work. (This is part of the purpose of the Cooperative Cataloging Rules: to find some common ground for eventual cooperation, and not all dictated by the library community). Concerning step 2, organizing the descriptions, I think there is much less need for agreement and standardization. Certainly, as far as our users are concerned, the traditional library methods of organization are being left behind by the web2.0 and web3.0 capabilities. Nevertheless, I believe there needs to be a level of "guaranteed areas" of access, just as there is a "guaranteed understanding" in the descriptions as dictated by ISBD, since otherwise, access becomes entirely unpredictable.

Monday, November 9, 2009

Are Too Many Students Going to College?

Blog comment to the Chronicle at: http://chronicle.com/article/Are-Too-Many-Students-Going-to/49039/

It seems to me that the undertone of this entire discussion is that public (high-school) education is so bad. Public education should be able to produce a public educated enough to participate in its own government, because otherwise only the small minority of college-educated people will be able to do so.

In any case, it seems to me that there is little difference between real estate agents convincing people that a house is worth the investment when those agents know that, chances are, it will not be (i.e. the housing debacle), and convincing someone that spending tens of thousands of dollars over four years to get a B.A. in English or Art will give them a vocation they can live on comfortably.

People spend money on higher education to get a better job, but higher education as it is today is still based on a world that has always existed for the very few: a world dominated by the old, gentlemanly idea that "sonny-boy" should get some kind of "higher education" so that he has enough culture to rise above the hoi polloi when he enters and eventually takes over daddy's business.

FW: [NGC4LIB] FRBR WEMI and identifiers

Ross Singer wrote:
<snip>
Jim, I get what you're saying here, but I also think you're missing a really important point: there is no universal, one true way, that all people will want all resources.

So let's go with the notion that the FRBR user objectives are antiquated (which I can't say I subscribe to, since this theory hasn't been tested as far as I know, but for the sake of argument...).
</snip>

I can point out that the opposite has never been tested either, i.e. that the FRBR user tasks are what our users *really* want; that should not be accepted as a statement of fact. I'm merely questioning a basic assertion of FRBR, which I believe is such a fundamental point for everything else that it must be demonstrated. But yes, let's continue...

<snip>
If our data was reconfigured into a more FRBR-like model, we would have
/significantly/ more freedom to construct, associate and index our resources in ways that /do/ work for /specific/ user communities for specific /needs, resources and activities/.

By simply applying another coat of paint to AACR2, this sort of flexibility is impossible. Only by breaking our "records" into the individual resources they represent can we begin to represent the data according to the needs of an activity or user group. And to date, FRBR (and, by extension, RDA) has been the only realistic attempt to accomplish this.
</snip>

This is where I have a major difference. If we want to do these things, and I agree with all of them, I don't see that the *rules for input* have to change, which should be the reason for implementing RDA. By all means, let's revamp the MARC format 100%, put in linked data wherever possible, share our data promiscuously, but does it follow that I need different rules telling me how to determine a title or how to input it? No. Are the rules changing for entering publication information? No. Are they changing for counting the pages, or for determining the extent of a resource and inputting that information? No. Are the rules changing even in those areas that were only technological necessities in an earlier time, such as the need for a single main entry in a card or printed catalog? No. (I have discussed this at length in other messages.) Will we be using modern technology more effectively, e.g. determining extent of resource dynamically by word counts and file compares? No. Is RDA going to increase cataloger productivity to any significant degree? No.

And if the rules don't change for the actual inputting (which is the case with RDA, since it has extremely few changes from AACR2) why do they have to be completely reorganized so that I have to relearn my tools? Especially during times when there is very little money in the kitty?

I compare it to a mechanic who has been working for several years with a certain organization of the tools in his or her toolbox, when somebody comes in and "reorganizes" those tools. The mechanic asks why: Am I getting radically new tools? No, they are the same tools. Will this reorganization make me more productive? No, we don't foresee that at all. Why, then, must my tools be reorganized, so that I have to relearn where they are?

And here the mechanic gets a theoretical argument.

The barriers to entering the information world we both want are not the ways we input our data: they are our obsolete formats and the fact that we are reluctant to share our data in the first place. So my argument is: even if we adopt RDA, with all the internal upheaval that will cause, it won't make any difference, because we will still face exactly the same problems of obsolete formats and unshared data, and as a result we will present our users with something very similar to Fiction Finder, which they will most probably ignore because it does not meet their needs.

So what is my solution? Why don't we just put our data out for general use in a format that people can work with? That means non-ISO2709 and non-MARC. (It does *not* necessarily mean we have to abandon MARC completely for our own internal purposes.) So I maintain the opposite of what you stated: adopting FRBR definitely will *not* give libraries the freedom that user communities need--that will only happen when we decide to share our data, and to make sure the data we share is usable by non-librarians. I do not see that the rules for input (the cataloging rules, or what is covered by RDA) need to change very much, if at all, since consistent input is one of the most important ingredients of high-quality standards, and those standards are one of the most important services libraries can provide.
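As a sketch of what "a format people can work with" might look like, the field examples quoted earlier in this post can be re-expressed in plain JSON. The JSON shape here is my own invention for illustration, not any standard (MARCXML is a real serialization in this spirit), and the two fields come from different records above, so they do not describe one real resource.

```python
import json

# Illustrative record assembled from the post's earlier field examples.
record = {
    "100": {"indicators": "1 ",
            "subfields": [["a", "Twain, Mark,"], ["d", "1835-1910"]]},
    "245": {"indicators": "10",
            "subfields": [["a", "Marxism, fascism, and totalitarianism :"],
                          ["b", "chapters in the intellectual history of radicalism /"],
                          ["c", "A. James Gregor."]]},
}

serialized = json.dumps(record, indent=2)
# Any web developer can consume this with one parser call;
# no ISO2709 reader, directory offsets, or field terminators required.
```

The cataloging content is untouched; only the carrier format changes, which is exactly where the barrier lies.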

Naturally, in the future, libraries will most probably come out with major improvements in their data and formats, after a lot of study and research, which most probably will lead to something radically different from what we have today.

OCLC has managed to do quite a bit right now by massaging the records we have. I have been quite impressed. This relates back to what Tim Berners-Lee said in his talk: step number 1 is to share your data in a format others can use. All of these are considerations I made when initiating the Cooperative Cataloging Rules Wiki at http://sites.google.com/site/opencatalogingrules/ (I just had to get that plug in! :-))

Certainly catalogers and librarians in general need to change the view of what they create and how their creations can all fit into the new world, but that is another task entirely.

Friday, November 6, 2009

FW: Changes in Tech Services

On Thu, 5 Nov 2009 10:56:35 -0500, Roe,Kevin wrote:
>In the last few years, I would say that it is rare that we do not have to make some corrections in the record. Some are somewhat insignificant in terms of our users' ability to find the records, while others would create problems if not corrected.
...
>So the dilemma is to decide what changes are needed and what aren't. Many libraries are simply accepting copy from their vendors without checking them, but this can and will only result in a massive cleanup job somewhere in the future, when patrons begin to complain about not finding materials that are in their libraries.

Correct, and this emphasizes that library cataloging/metadata *standards* are not at all like standards in other parts of our society. There are standards for water quality, handling electricity and gas, sewer lines; there are standards for butchers and bakers; there are standards for building codes and automobile manufacture.

Those are real, honest-to-goodness *standards,* and they are standards precisely because if you break them, you will be punished, often severely. It has been quite different in the library world of "nudge-nudge, wink-wink" reactions to these matters. Now that our data is increasingly being used outside of our own world, we are seeing some consequences. The discussion of cataloging quality at Language Log http://languagelog.ldc.upenn.edu/nll/?p=1701 and the reply from Jon Orwant (manager of Google Books) are quite enlightening. Orwant was critical of the quality of human-created metadata, and although I am personally suspicious of the specific examples he gave, problems of quality in library-created records certainly exist.

The real dilemma, as I see it, is that we cannot compete with automated methods on the very important issues of price and quantity: one computer can churn out in an hour as much metadata as 100 catalogers working for a year (I just made that up, but it is probably close to the truth). The only advantage that librarians can offer is better *quality.* That's all.

Scary, but I think people want high quality badly, and issues of "quality" information retrieval will become far more important as the web grows and becomes less and less easy to manage.

What is needed is a general rethinking of what "quality" means in a catalog and in an individual metadata record (which I think may be radically different from what has traditionally been thought); then to relate this to what libraries are in a position to achieve; and finally, to *ensure* that the records created meet these standards. Regrettable as it would be, this may mean we have to "lower" the current standards to something that is achievable. I don't know.

But the current system does not seem to be functioning very well and seems to require some fundamental changes.

FW: [NGC4LIB] FRBR WEMI and identifiers

Jakob Voss wrote:
<snip>
In addition you can harvest the Semantic Web for expressions that other people have created. The rest only depends on nice interfaces that people can use to manage FRBR statements.
</snip>

This is one of the big problems, as I mentioned in an earlier message. Is this creating what our *patrons* want, or is it creating something that *we* want? A place to begin discussion is Fiction Finder at: http://fictionfinder.oclc.org/

I can't link to individual records because it uses session cookies, but just enter the site, click on something in the tag cloud, and look at an individual record for a FRBR view. (I'm looking at Dickens's David Copperfield.) For those who understand, you can see the work (at the top), expressions (by language and format), and multiple manifestations, linking to records in WorldCat. I think OCLC has done a very good job of making it as clear as possible and easy to navigate.

But I repeat my question, which I think is vital: does this give our patrons what they want and need? Or do they need something else, such as links into Google Books, into the Internet Archive, into selected websites, into entertainment and educational videos and lectures, conferences, scholarly websites, reviews, ratings, Wikipedia, dbpedia and who knows what else? Who would choose to use the displays and functions found in Fiction Finder over, e.g. LibraryThing?

I realize that, at bottom, the FRBR displays achieve what library catalogs have been trying to achieve for over 100 years, and it's interesting to see it now. I just don't know how relevant "find/select/identify/obtain --> works/expressions/manifestations/items --> by their authors/titles/subjects/standard numbers" is to the information universe of today.

Now that libraries are facing such difficult times and rethinking how to best apportion library resources, labor, and intellectual capital, I think these questions eventually will have to be addressed and answered, and perhaps sooner than we think.

Tuesday, November 3, 2009

FW: [NGC4LIB] At Univ. of South Carolina, the Card Catalog's Graceful Departure

On Mon, 2 Nov 2009 12:00:44 -0500, McGrath, Kelley C. wrote:

>1. Our data is still designed for printing cards, rather than providing machine-manipulable data for today's environment. The MARC format, despite some visionary elements, was designed for the practical task of printing cards. Our data is overly focused on text strings and not designed for easy extraction and manipulation of parts of the record. We retain practices that were designed to save space on cards. A lot of things don't work well in the OPAC because they were designed to produce data to be interpreted and filed by a human being. We need to modernize what data we record and how we record it. As it is, the form of the date often is an obstacle to developing the systems we need.

This is the entire point, and I think it colors the thinking of catalogers to a very large extent. For an example prototype of what RDA is aiming for, i.e. FRBR displays, take a look at FictionFinder. It has a nice tag cloud, and the whole site works pretty well, but look at an FRBR record, e.g. The Secret Garden, http://tiny.cc/F1kwx. [Actually, this doesn't work. Go to http://fictionfinder.oclc.org/index.html, click on "Orphans" and then "The Secret Garden"]

With this display, most of the "work" information is at the top (although the summary goes to expression--in any case, works seem to be culture based), and everything is then placed into a table that the user can sort by language or format (expressions) and by dates (manifestations) and finally, they can find which libraries have which items.
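The grouping and sorting described above is mechanically simple once the data is modeled that way; here is a toy sketch (all the language codes, formats, and dates below are invented for illustration, not FictionFinder's actual data):

```python
from collections import defaultdict

# Toy manifestations of one work; each dict is one manifestation.
manifestations = [
    {"lang": "eng", "format": "text", "date": 1911},
    {"lang": "eng", "format": "text", "date": 1987},
    {"lang": "eng", "format": "audio", "date": 2004},
    {"lang": "fre", "format": "text", "date": 1992},
]

# Group manifestations into expressions by (language, format),
# then order each expression's manifestations by date.
expressions = defaultdict(list)
for m in manifestations:
    expressions[(m["lang"], m["format"])].append(m)
for group in expressions.values():
    group.sort(key=lambda m: m["date"])
```

The hard part, as this post argues, is not this grouping but deciding whether language, format, and date are the axes users actually want to sort by.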

I think everybody has done a great job here--it works very well. But is this the goal we should be aiming for? Is this what people really want in this new information universe?

Or do they want these things? Here are free versions in the Internet Archive:
http://www.archive.org/search.php?query=%22secret%20garden%22%20AND%20mediatype%3Atexts

Here are links to articles about The Secret Garden in Google Scholar, several of which are available to everyone:
http://scholar.google.com/scholar?hl=en&q=secret+garden+burnett&btnG=Search&as_ylo=&as_vis=0

I am sure they would like a better human display of http://dbpedia.org/page/The_Secret_Garden, but the information is great.

The Wikipedia page: http://en.wikipedia.org/wiki/The_Secret_Garden

and the LibraryThing page, http://www.librarything.com/search_works.php?q=secret%20garden%20burnett

To me, there is absolutely no contest about what will appeal to our patrons and be the most useful to them. People want choices, *real* choices, and I think somebody, somewhere, will provide them with these choices.

I would like it to be the library community. This is part of the new world.